CodeGen is a family of open large language models (LLMs) from Salesforce that can generate source code as well as describe what a piece of code does. The models are released under the permissive BSD-3-Clause license and offer good performance, and the smaller checkpoints are lightweight enough to run on a laptop for both inference and fine-tuning. Here is how to set it up and how to use it.
Installation with Hugging Face
This blog post provides instructions on how to use the CodeGen LLM via the Hugging Face Transformers library. It assumes you have a development environment set up and are familiar with Hugging Face.
You’ll need to install the `transformers` and `torch` libraries:
pip install transformers torch
If you intend to use a GPU, ensure you have the correct CUDA drivers and a GPU-enabled PyTorch build.
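A quick way to confirm that your PyTorch build can actually see the GPU (a minimal check, not specific to CodeGen):

import torch

print(torch.__version__)
print(torch.cuda.is_available())  # True means a CUDA-capable GPU is usable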
Model Loading
Codegen models are typically available on the Hugging Face Model Hub. You can load a model and its tokenizer using the following code:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM  # CodeGen is a causal (decoder-only) LM

model_name = "Salesforce/codegen-350M-mono"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Move the model to a GPU if one is available (recommended):
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
Replace "Salesforce/codegen-350M-mono" with the specific Codegen model name you intend to use. Check the Hugging Face Model Hub for available models.
Code Generation
Here’s how to generate code using the loaded model:
prompt = "Write a Python function to calculate the factorial of a number."
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device) # Move input to device
outputs = model.generate(input_ids,
max_length=200, # Adjust as needed
num_beams=5, # Adjust for quality/speed trade-off
temperature=0.7, # Adjust for creativity (higher = more creative)
top_k=40, # Adjust for sampling
top_p=0.95, # Adjust for sampling
pad_token_id=tokenizer.eos_token_id # Important for some models
)
generated_code = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_code)
# Example of completing a partially written function:
prompt = "def my_function(x):\n    # TODO: Calculate the square of x\n    return"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
outputs = model.generate(input_ids, max_length=100, num_beams=5, pad_token_id=tokenizer.eos_token_id)
generated_code = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_code)
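For quick experiments you can also use the high-level text-generation pipeline instead of calling generate directly. A minimal sketch, assuming the same 350M mono checkpoint:

from transformers import pipeline

# device=0 selects the first GPU; use device=-1 to stay on CPU
generator = pipeline("text-generation", model="Salesforce/codegen-350M-mono", device=0)
result = generator("def fibonacci(n):", max_length=100)
print(result[0]["generated_text"])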
Considerations
Model Selection: Different CodeGen models have different strengths: the mono variants are tuned for Python, the multi variants cover several programming languages, and larger checkpoints generally produce better code at a higher resource cost. Choose the size and variant that best suit your needs.
Prompt Engineering: Clear and specific prompts are essential for good results; for CodeGen, prompts written as code (a comment, a docstring, or a function signature) typically work better than plain natural-language instructions.
Parameter Tuning: Experiment with the generation parameters (temperature, top_k, top_p, num_beams) to find the optimal settings for your use case.
Resource Management: Large language models can be resource-intensive. Consider using a GPU if available.
Output Validation: The generated code should be reviewed and tested carefully, and it may require debugging; a basic first-pass check is sketched below.
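As one cheap first-pass check, you can verify that the output at least parses as valid Python before reviewing it by hand. A minimal sketch; it assumes generated_code holds the model output from the examples above:

import ast

def is_valid_python(code: str) -> bool:
    """Return True if the code parses as valid Python syntax."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

print(is_valid_python(generated_code))  # Checks syntax only; it says nothing about correctness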