Language: Python
ML/AI
Transformers is a Python library developed by Hugging Face that provides state-of-the-art pre-trained models for Natural Language Processing (NLP) tasks such as text classification, translation, summarization, and question answering.
Hugging Face released Transformers in 2018 to simplify the use of transformer-based models like BERT, GPT, RoBERTa, and T5. The library provides easy access to pre-trained models and tokenizers, enabling developers and researchers to leverage powerful NLP models without extensive training or setup.
pip install transformers
conda install -c conda-forge transformers

Transformers provides pre-trained models that can be used directly for inference or fine-tuned on custom datasets. It supports PyTorch, TensorFlow, and JAX backends, and includes tokenizers, pipelines, and Trainer APIs for streamlined workflows.
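The same checkpoint can usually be loaded under more than one backend. A minimal sketch, assuming both PyTorch and TensorFlow are installed (bert-base-uncased publishes weights for both):

from transformers import AutoModel, TFAutoModel

# PyTorch weights (the default backend)
pt_model = AutoModel.from_pretrained('bert-base-uncased')
# TensorFlow version of the same checkpoint
tf_model = TFAutoModel.from_pretrained('bert-base-uncased')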
from transformers import pipeline
classifier = pipeline('sentiment-analysis')
result = classifier('I love using Hugging Face Transformers!')
print(result)

Uses a pre-trained sentiment analysis model via a simple pipeline to classify the sentiment of text.
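Pipelines also accept an explicit model name and a list of inputs. A short sketch; the checkpoint named here is the pipeline's usual English sentiment default, but any compatible model works:

from transformers import pipeline

classifier = pipeline('sentiment-analysis', model='distilbert-base-uncased-finetuned-sst-2-english')
results = classifier(['Great library!', 'This API confuses me.'])
for r in results:
    print(r['label'], r['score'])  # e.g. POSITIVE/NEGATIVE plus a confidence score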
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
inputs = tokenizer('Hello, Hugging Face!', return_tensors='pt')
print(inputs)

Uses a BERT tokenizer to convert text into token IDs suitable for model input.
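To see what the tokenizer is doing, you can inspect the intermediate subword tokens and round-trip the IDs back to text:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
tokens = tokenizer.tokenize('Hello, Hugging Face!')  # subword strings
ids = tokenizer.convert_tokens_to_ids(tokens)        # integer vocabulary IDs
print(tokens)
print(ids)
print(tokenizer.decode(ids))                         # IDs back to text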
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
# train_dataset and eval_dataset must be datasets of tokenized examples
# (input_ids, attention_mask, labels); see the sketch below for one way to build them
training_args = TrainingArguments(output_dir='./results', num_train_epochs=1, per_device_train_batch_size=8)
trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset, eval_dataset=eval_dataset)
trainer.train()

Shows fine-tuning a pre-trained BERT model for a custom classification task using the Trainer API.
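The snippet above assumes train_dataset and eval_dataset already exist. One minimal way to build them, reusing the tokenizer from the snippet above, is to wrap tokenized text in a torch Dataset; the toy texts and labels below are hypothetical placeholders for your own data:

import torch

class ToyDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item
    def __len__(self):
        return len(self.labels)

# Hypothetical toy data, for illustration only
texts, labels = ['great movie', 'awful movie'], [1, 0]
encodings = tokenizer(texts, truncation=True, padding=True)
train_dataset = eval_dataset = ToyDataset(encodings, labels)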
from transformers import pipeline
qa_pipeline = pipeline('question-answering')
context = 'Hugging Face is creating a Transformers library.'
question = 'Who is creating Transformers?'
result = qa_pipeline(question=question, context=context)
print(result)

Uses a pre-trained question-answering model to find answers in a given context.
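The result is a dict holding the extracted span, a confidence score, and the character offsets of the answer within the context. A short sketch:

from transformers import pipeline

qa_pipeline = pipeline('question-answering')
context = 'Hugging Face is creating a Transformers library.'
result = qa_pipeline(question='What is being created?', context=context)
print(result['answer'])                # the extracted answer text
print(result['score'])                 # model confidence
print(result['start'], result['end'])  # character offsets within the context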
from transformers import pipeline
generator = pipeline('text-generation', model='gpt2')
result = generator('Once upon a time', max_length=50)
print(result)

Generates text continuations using a GPT-2 model.
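Sampling parameters control how varied the continuations are, and set_seed makes sampled runs repeatable. A sketch:

from transformers import pipeline, set_seed

set_seed(42)  # make sampling reproducible
generator = pipeline('text-generation', model='gpt2')
results = generator(
    'Once upon a time',
    max_new_tokens=40,      # length of the continuation, excluding the prompt
    do_sample=True,         # sample instead of greedy decoding
    top_k=50,               # restrict sampling to the 50 most likely next tokens
    num_return_sequences=2, # produce two alternative continuations
)
for r in results:
    print(r['generated_text'])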
from transformers import pipeline
summarizer = pipeline('summarization')
text = 'Hugging Face Transformers provides thousands of pre-trained models to perform tasks on texts, images, and audio.'
summary = summarizer(text, max_length=50, min_length=25, do_sample=False)
print(summary)

Uses a summarization pipeline to produce a condensed version of the input text.
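As with the other pipelines, you can pin a specific checkpoint and summarize several documents in one call. A sketch; the model named here is the pipeline's usual default, but any summarization checkpoint works:

from transformers import pipeline

summarizer = pipeline('summarization', model='sshleifer/distilbart-cnn-12-6')
texts = [
    'Hugging Face Transformers provides thousands of pre-trained models to perform tasks on texts, images, and audio.',
    'Pipelines wrap tokenization, model inference, and post-processing behind a single call, so one line of setup is enough for most tasks.',
]
summaries = summarizer(texts, max_length=40, min_length=10, do_sample=False)
for s in summaries:
    print(s['summary_text'])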
Use pipelines for quick inference without dealing with tokenization and model objects directly.
Fine-tune pre-trained models on custom datasets for better performance on specific tasks.
Leverage GPU acceleration for large models and batch inference (see the sketch after this list).
Use the `AutoModel` and `AutoTokenizer` classes to easily switch between architectures.
Keep track of model versions, for example by pinning a revision, to ensure reproducibility (also sketched below).
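A minimal sketch combining the last two tips, assuming a CUDA GPU is available; the revision shown is just the default branch, and in practice you would pin a tag or commit hash:

from transformers import pipeline, AutoModelForSequenceClassification

# device=0 runs the pipeline on the first CUDA device (device=-1 means CPU)
classifier = pipeline('sentiment-analysis', device=0)

# revision pins an exact checkpoint version (branch, tag, or commit hash);
# 'main' is the default branch and stands in here for a pinned hash
model = AutoModelForSequenceClassification.from_pretrained(
    'distilbert-base-uncased-finetuned-sst-2-english',
    revision='main',
)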