Python code to fine-tune an LLM

Example Python code for fine-tuning a language model using URL content and a vector database, along with example logic for answering questions with the fine-tuned model.

Example Code:

To fine-tune a language model using URL content and a vector database, we can follow these steps:

  1. Collect the URL content and preprocess it to prepare it for training.
import requests
from bs4 import BeautifulSoup
from transformers import AutoTokenizer
import re

# Collect URL content (the target URL is left as a placeholder)
url = ""
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")
text = soup.get_text()

# Preprocess text
text = re.sub(r"\s+", " ", text)
tokenizer = AutoTokenizer.from_pretrained("gpt2")
# Truncate to GPT-2's 1,024-token limit so the encoded page fits the model
inputs = tokenizer.encode(text, add_special_tokens=True, truncation=True, max_length=1024, return_tensors="pt")
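Note that GPT-2's context window is only 1,024 tokens, so a real page is usually split into smaller chunks rather than encoded whole. A minimal sketch, where the chunk size of 512 is an arbitrary choice:

# Tokenize the full page without truncation and split the token ids
# into fixed-size chunks that fit GPT-2's 1,024-token context window
all_ids = tokenizer.encode(text)
chunk_size = 512  # arbitrary; anything under the model limit works
chunks = [all_ids[i:i + chunk_size] for i in range(0, len(all_ids), chunk_size)]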
  2. Store the vector representations of the preprocessed URL content in a vector database, such as Pinecone.
import pinecone
import torch
from transformers import AutoModel

# Connect to Pinecone (the API key and environment are placeholders)
pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENVIRONMENT")

# Embed the page by mean-pooling GPT-2 hidden states
embed_model = AutoModel.from_pretrained("gpt2")
with torch.no_grad():
    vector = embed_model(inputs).last_hidden_state.mean(dim=1)

# Create an index matching GPT-2's 768-dim hidden size and store the vector
pinecone.create_index("my_index", dimension=768)
index = pinecone.Index("my_index")
index.upsert(vectors=[("doc-0", vector[0].tolist())])
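A practical refinement, not in the snippet above: store the source text alongside the vector as Pinecone metadata so the retrieval step can recover the context directly. The "text" field name and the 1,000-character cap below are illustrative assumptions:

# Attach the source text as metadata so it can be read back at query time
index.upsert(vectors=[("doc-0", vector[0].tolist(), {"text": text[:1000]})])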
  3. Fine-tune the language model using the stored vectors and a task-specific dataset.
import torch
from transformers import AutoModelForCausalLM, AutoConfig
from torch.utils.data import DataLoader, Dataset

# Define dataset and dataloader
class MyDataset(Dataset):
    def __init__(self, data):
        self.data = data

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return len(self.data)

dataset = MyDataset(["This is a sample sentence", "Another sample sentence"])
dataloader = DataLoader(dataset, batch_size=2)
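The two hard-coded sentences are only a toy dataset. To fine-tune on the page collected in step 1 instead, one option is to decode the token chunks from the earlier sketch back into strings and feed those to the same dataset class; this is a hypothetical continuation, assuming the chunks list defined above:

# Decode the token chunks back into strings and use them as training data
page_texts = [tokenizer.decode(chunk) for chunk in chunks]
dataset = MyDataset(page_texts)
dataloader = DataLoader(dataset, batch_size=2)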

# Define model and optimizer
config = AutoConfig.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", config=config)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

# GPT-2 has no pad token by default; reuse the EOS token for padding
tokenizer.pad_token = tokenizer.eos_token

# Fine-tune model
model.train()
for epoch in range(10):
    for batch in dataloader:
        inputs = tokenizer.batch_encode_plus(batch, padding=True, return_tensors="pt")
        input_ids = inputs["input_ids"]
        attention_mask = inputs["attention_mask"]
        outputs = model(input_ids, attention_mask=attention_mask, labels=input_ids)
        loss = outputs.loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
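After training, the fine-tuned weights should be persisted so the inference code below can reload them. A minimal sketch using the standard save_pretrained API, with an arbitrary directory name:

# Save the fine-tuned model and tokenizer for later inference
model.save_pretrained("fine_tuned_gpt2")
tokenizer.save_pretrained("fine_tuned_gpt2")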

Example Logic for Answering Questions:

To use the fine-tuned language model to answer questions, we can follow these steps:

  1. Collect the question and preprocess it to prepare it for inference.
# Collect and preprocess question
question = "What is the capital of France?"
question = re.sub(r"\s+", " ", question)
inputs = tokenizer.encode(question, add_special_tokens=True, return_tensors="pt")
  2. Retrieve the most similar vector to the preprocessed question from the vector database.
# Retrieve the most similar vector from Pinecone
with torch.no_grad():
    query_vector = embed_model(inputs).last_hidden_state.mean(dim=1)
result = index.query(vector=query_vector[0].tolist(), top_k=1, include_metadata=True)
  3. Generate the answer using the fine-tuned language model and the retrieved context.
# Generate the answer with the fine-tuned language model.
# The context would normally come from the top Pinecone match
# (e.g. result.matches[0].metadata["text"]); a hard-coded context
# stands in for that lookup here.
context = "Paris is the capital of France. "
prompt_ids = tokenizer.encode(context + question, return_tensors="pt")
generated = model.generate(prompt_ids, max_new_tokens=50, pad_token_id=tokenizer.eos_token_id)
answer = tokenizer.decode(generated[0], skip_special_tokens=True)
  4. Return the answer.
# Return the generated answer
print(answer)

This logic generates an answer by prepending the context retrieved from the vector database to the question and passing the combined prompt to the fine-tuned language model. The generated token sequence is then decoded to produce the answer. This approach is often called a “prompt-based” or “zero-shot” method for question answering: the language model is not explicitly trained on the question-answering task, but generates an answer from a given prompt and some context.
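To make that flow concrete, here is a minimal, hypothetical helper that ties the retrieval and generation steps together. The function name, the top_k parameter, and the "text" metadata field are illustrative assumptions rather than part of the original code:

import torch

def answer_question(question, index, embed_model, lm, tokenizer, top_k=1):
    # Embed the question the same way the documents were embedded
    q_ids = tokenizer.encode(question, return_tensors="pt")
    with torch.no_grad():
        q_vec = embed_model(q_ids).last_hidden_state.mean(dim=1)
    # Retrieve the closest stored chunk and recover its text, if stored
    result = index.query(vector=q_vec[0].tolist(), top_k=top_k, include_metadata=True)
    context = (result.matches[0].metadata or {}).get("text", "")
    # Prepend the context to the question and generate a completion
    prompt_ids = tokenizer.encode(context + question, return_tensors="pt")
    generated = lm.generate(prompt_ids, max_new_tokens=50, pad_token_id=tokenizer.eos_token_id)
    return tokenizer.decode(generated[0], skip_special_tokens=True)

Calling answer_question("What is the capital of France?", index, embed_model, model, tokenizer) then reproduces the steps above in a single call.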
