How to Install and Use Salesforce’s CodeGen LLM

CodeGen is a family of large language models (LLMs) from Salesforce that can generate source code and help describe what a piece of code does. It is released under a permissive open-source (BSD-3-Clause) license, performs well for its size, and its smaller checkpoints are lightweight enough to run on a laptop for both inference and fine-tuning. Here is how to set it up and how to use it.

Installation with Hugging Face

This blog post provides instructions on how to use the CodeGen LLM via the Hugging Face Transformers library. It assumes you already have a Python development environment set up and are familiar with Hugging Face.

You’ll need to install the `transformers` and `torch` libraries:

pip install transformers torch

If you intend to use a GPU, ensure you have the correct CUDA drivers and a GPU-enabled PyTorch build installed.
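A quick way to confirm that PyTorch can actually see your GPU is a short check like the following (a minimal sketch that only prints diagnostic information):

import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))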

Model Loading

CodeGen models are published on the Hugging Face Model Hub. You can load a model and its tokenizer using the following code:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM  # CodeGen is a causal (decoder-only) LM

model_name = "Salesforce/codegen-350M-mono"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Use a GPU if one is available (recommended, especially for the larger checkpoints)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

Replace "Salesforce/codegen-350M-mono" with the specific Codegen model name you intend to use. Check the Hugging Face Model Hub for available models.

Code Generation

Here’s how to generate code using the loaded model:

prompt = "Write a Python function to calculate the factorial of a number."

input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)  # Move input to device

outputs = model.generate(input_ids,
                       max_length=200,  # Adjust as needed
                       num_beams=5,      # Adjust for quality/speed trade-off
                       temperature=0.7,  # Adjust for creativity (higher = more creative)
                       top_k=40,         # Adjust for sampling
                       top_p=0.95,        # Adjust for sampling
                       pad_token_id=tokenizer.eos_token_id # Important for some models
                       )

generated_code = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_code)


# Example: completing a partially written function (left-to-right code completion)
prompt = "def my_function(x):\n    # TODO: Calculate the square of x\n    return"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
outputs = model.generate(input_ids, max_new_tokens=32, num_beams=5,
                         pad_token_id=tokenizer.eos_token_id)
generated_code = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_code)
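By default the decoded output includes the prompt as well as the completion. If you only want the newly generated code, you can slice off the prompt tokens before decoding (a small sketch reusing the variables from the example above):

prompt_length = input_ids.shape[1]
completion = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(completion)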

Considerations

Model Selection: CodeGen checkpoints differ in size (350M to 16B parameters) and training data (natural language, multiple programming languages, or Python only). Choose the one that best fits your task and hardware.

Prompt Engineering: Clear, specific prompts are essential for good results. For CodeGen, code-style prompts (comments, docstrings, function signatures) generally work better than conversational instructions.

Parameter Tuning: Experiment with the generation parameters (deterministic decoding vs. sampling, temperature, top_p, maximum length) to find settings that work for your use case.
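For example, deterministic decoding and sampling behave quite differently. The snippet below sketches both setups with illustrative values, reusing the model, tokenizer, and input_ids from earlier:

# Deterministic decoding: reproducible, often fine for short completions
outputs = model.generate(input_ids, max_new_tokens=64, do_sample=False,
                         pad_token_id=tokenizer.eos_token_id)

# Sampling: more varied output; tune temperature and top_p to taste
outputs = model.generate(input_ids, max_new_tokens=64, do_sample=True,
                         temperature=0.4, top_p=0.95,
                         pad_token_id=tokenizer.eos_token_id)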

Resource Management: Large language models can be resource-intensive. Consider using a GPU if available.
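For the larger checkpoints, loading the weights in half precision roughly halves GPU memory use. A sketch, assuming a CUDA device is available (the 2B checkpoint here is just an example):

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Salesforce/codegen-2B-mono",
    torch_dtype=torch.float16,   # load weights in half precision
).to("cuda")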

Output Validation: Always review and test generated code before using it; it may be subtly wrong, insecure, or syntactically invalid, and will often require debugging.
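As a first automated check, you can at least verify that the generated Python parses before you run or review it. A minimal sketch using the standard library:

import ast

try:
    ast.parse(generated_code)
    print("Generated code is syntactically valid Python.")
except SyntaxError as err:
    print("Syntax error in generated code:", err)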
