How to Create a Hugging Face Space to Run Your Machine Learning Model with Gradio
Text Generation with IBM-Granite
In this post, we’ll explore how to deploy a machine learning model on Hugging Face Spaces using Gradio. Whether you’re a data scientist, a machine learning enthusiast, or just someone interested in making your models accessible, this guide will take you step-by-step through the process. Let’s dive in!
Introduction
Imagine you have a fantastic machine learning model that can generate code snippets, analyze texts, or even generate artwork. But what good is it if only you can use it? Deploying your model in a way that others can interact with it easily is crucial. Hugging Face Spaces, combined with Gradio, offers a straightforward solution to make your models accessible online.
In this example, we will be using IBM Granite. As IBM describes it:
IBM is building enterprise-focused foundation models to drive the future of business. The Granite family of foundation models spans a variety of modalities, including language and code, as well as other modalities such as time series.
On completing this tutorial, you will have:
IBM-Granite-Text-Generation: a deployed Space for text generation using IBM Granite. Feel free to check it out.
What You’ll Need
Hugging Face Account: Sign up at Hugging Face if you don’t already have an account.
Gradio: A Python library for building user-friendly web interfaces for machine learning models.
Basic Python Knowledge: Understanding of Python will help, but the steps are simple enough for anyone to follow.
Step-by-Step Guide
1. Create a New Hugging Face Space
Navigate to Hugging Face Spaces: Go to Hugging Face Spaces.
Create a New Space: Click on Create new Space, give it a name, and choose its visibility (public or private).
Choose the App Interface: Select Gradio as the interface. It’s user-friendly and great for simple to moderately complex interfaces.
2. Set Up Your Space
Access the Space Repository: Once the Space is created, you’ll be taken to its repository.
Add a New File: Click on Files and versions and then Add file to create a new file named app.py.
Alternatively, you can clone the repository locally, make your changes, and push them back. See the Hugging Face docs on how to clone and push changes back to Spaces.
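The clone-edit-push workflow looks roughly like this; the username and Space name are placeholders for your own, and pushing requires authenticating with your Hugging Face credentials:

```shell
# Clone your Space's repository (replace <username>/<space-name> with yours).
git clone https://huggingface.co/spaces/<username>/<space-name>
cd <space-name>

# Edit app.py and requirements.txt locally, then commit and push back.
git add app.py requirements.txt
git commit -m "Add Gradio app for IBM Granite text generation"
git push
```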
3. Write Your Code
Copy the following code into app.py:
import gradio as gr
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Use a GPU if one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

model_path = "ibm-granite/granite-3b-code-base"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()

def generate_code(input_text):
    # Tokenize the prompt and move each tensor to the selected device.
    input_tokens = tokenizer(input_text, return_tensors="pt")
    for i in input_tokens:
        input_tokens[i] = input_tokens[i].to(device)
    # max_new_tokens caps the completion length; without it, generate()
    # falls back to a very short default and truncates the output.
    output = model.generate(**input_tokens, max_new_tokens=128)
    output_text = tokenizer.batch_decode(output, skip_special_tokens=True)[0]
    return output_text

# Gradio interface: one textbox in, one textbox out.
iface = gr.Interface(
    fn=generate_code,
    inputs=gr.Textbox(lines=2, placeholder="Enter code snippet here..."),
    outputs=gr.Textbox(label="Generated Code"),
)

iface.launch()
Explanation:
Model and Tokenizer: The code loads a pretrained model and tokenizer from Hugging Face.
Device Selection: It checks if a GPU is available and sets the device accordingly.
Generate Code Function: This function generates a code snippet from the input text.
Gradio Interface: It creates a simple interface with an input textbox for the code snippet and an output textbox for the generated code.
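Before deploying, it helps to smoke-test the handler wiring locally. The sketch below swaps the Granite model for a trivial stub (a hypothetical stand-in, so it runs without downloading any weights) and checks that a handler with the same text-in/text-out signature as generate_code returns a string:

```python
# Smoke-test helper for a text-in/text-out handler like generate_code.
# stub_generate is a hypothetical stand-in for the real model call.
def stub_generate(input_text):
    # Pretend the model continues the prompt.
    return input_text + "\n# ...model-generated continuation..."

def smoke_test(handler):
    """Run a handler on a sample prompt and verify it returns text."""
    prompt = "def add(a, b):"
    output = handler(prompt)
    assert isinstance(output, str), "handler must return a string"
    assert output.startswith(prompt), "completion should include the prompt"
    return output

print(smoke_test(stub_generate))
```

Once this passes with the stub, you can point the same check at the real generate_code function before wiring it into gr.Interface.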
4. Specify Dependencies
Create a requirements.txt file to specify your dependencies:
transformers
torch
gradio==4.4.0
accelerate
This ensures that the necessary libraries are installed when the Space is launched.
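A quick stdlib-only check (a sketch for debugging, not part of the tutorial's files) can confirm which of these packages are importable in your environment before the app starts:

```python
import importlib.util

def check_dependencies(packages=("transformers", "torch", "gradio", "accelerate")):
    """Return a dict mapping each package name to whether it is installed."""
    return {pkg: importlib.util.find_spec(pkg) is not None for pkg in packages}

# Print an install status line for each dependency in requirements.txt.
for name, installed in check_dependencies().items():
    print(f"{name}: {'installed' if installed else 'MISSING'}")
```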
5. Deploy Your Space
Choose Hardware: Go to Settings in your Space and select the appropriate hardware (CPU or GPU). A GPU will speed up generation considerably for a 3B-parameter model.
Run the Space: Once your files are committed, the Space builds and launches automatically. When the build finishes, it is live and ready for interaction.
Troubleshooting Tips
Check the Logs: If something isn’t working as expected, check the logs for detailed error messages.
Conclusion
Deploying your machine learning model on Hugging Face Spaces with Gradio is a fantastic way to make it accessible to a broader audience. By following these steps, you can create a user-friendly interface that anyone can use to interact with your model. So why wait? Start deploying your models today and let the world see what they can do!
Thanks for reading! If you found this helpful, subscribe to stay updated with more tutorials and tips on machine learning and model deployment.
Feel free to reach out if you have any questions or suggestions for future topics. Happy coding and happy deploying!