Text Generation & Summarization with Hugging Face in Python

In this article, we'll dive into the world of text generation and summarization using Hugging Face's Transformers library in Python. Transformers is an open-source library that provides a wide range of pre-trained models for natural language processing (NLP) tasks. We'll cover the basics of using the library and walk through examples of text generation and summarization.

Table of Contents

  1. Introduction to Hugging Face
  2. Installation and Setup
  3. Text Generation with Hugging Face
  4. Text Summarization with Hugging Face
  5. Conclusion

Introduction to Hugging Face

Hugging Face maintains Transformers, an open-source library that provides pre-trained models for various NLP tasks such as text generation, summarization, translation, and more. The library is built on top of the popular deep learning frameworks PyTorch and TensorFlow, and it makes it easy for developers to fine-tune and customize pre-trained models for specific tasks.

Installation and Setup

To get started, install the transformers package with the following command:

pip install transformers

Additionally, you'll need to install PyTorch if you haven't done so already:

pip install torch
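With both packages installed, a quick way to check the setup is Transformers' high-level pipeline API, which downloads a default model and handles tokenization for you. This is a minimal sanity-check sketch; the first run will download the GPT-2 weights:

```python
from transformers import pipeline

# Build a text-generation pipeline backed by GPT-2
generator = pipeline("text-generation", model="gpt2")

# Generate a short continuation of a prompt
result = generator("Hello, world", max_length=20, num_return_sequences=1)
print(result[0]["generated_text"])
```

If this prints a continuation of the prompt, the installation is working and you're ready for the lower-level API used in the rest of the article.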

Text Generation with Hugging Face

To generate text using the Hugging Face library, we'll first need to import the required classes and load a pre-trained model. In this example, we'll use GPT-2, a popular generative language model.

from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

Next, we'll create a function to generate text using the model:

def generate_text(prompt, model, tokenizer, max_length=50):
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    # GPT-2 has no pad token, so reuse the end-of-sequence token id
    # to avoid a warning during generation
    output = model.generate(input_ids, max_length=max_length, num_return_sequences=1,
                            pad_token_id=tokenizer.eos_token_id)
    return tokenizer.decode(output[0], skip_special_tokens=True)

Now, let's try generating some text:

prompt = "Once upon a time"
generated_text = generate_text(prompt, model, tokenizer)
print(generated_text)

Text Summarization with Hugging Face

For text summarization, we'll use the BartForConditionalGeneration model along with the BartTokenizer. First, import the required classes and load the pre-trained model:

from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

Next, create a function to perform text summarization:

def summarize_text(text, model, tokenizer, max_length=100):
    # Truncate long inputs to BART's 1024-token limit to avoid errors
    input_ids = tokenizer.encode(text, return_tensors="pt", truncation=True, max_length=1024)
    summary_ids = model.generate(input_ids, max_length=max_length, num_beams=4, early_stopping=True)
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)

Now, let's try summarizing a sample text:

text = "Hugging Face is an open-source library that provides pre-trained models for various NLP tasks such as text generation, summarization, translation, and more. The library is built on top of the popular deep learning framework, PyTorch, and TensorFlow. It makes it easy for developers to fine-tune and customize pre-trained models for specific tasks."
summary = summarize_text(text, model, tokenizer)
print(summary)
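The same result can be obtained with less code via the pipeline API, which wraps the tokenizer and model together. This is a sketch of that alternative; min_length and max_length here count tokens, and the values shown are arbitrary examples:

```python
from transformers import pipeline

# Summarization pipeline backed by the same BART checkpoint
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

text = ("Hugging Face maintains Transformers, an open-source library that provides "
        "pre-trained models for various NLP tasks such as text generation, summarization, "
        "translation, and more. The library is built on top of the popular deep learning "
        "frameworks PyTorch and TensorFlow, and it makes it easy for developers to "
        "fine-tune and customize pre-trained models for specific tasks.")

summary_text = summarizer(text, max_length=60, min_length=20)[0]["summary_text"]
print(summary_text)
```

The explicit tokenizer/model approach shown above gives you finer control (e.g. custom truncation or beam settings), while the pipeline is convenient for quick experiments.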

Conclusion

In this article, we've introduced the Hugging Face library and demonstrated how to use it for text generation and summarization tasks in Python. This powerful library offers a wide range of pre-trained models for various NLP tasks, making it an invaluable tool for developers working with natural language processing.
