Build Your Own Local AI Agent: A Step By Step Guide

Local AI agents run on your machine. No cloud. No external APIs. Just you, your hardware, and the model. This post walks through the essentials: choosing a model, wiring it up with an agent framework, and running it locally. If you want privacy, speed, or control, this is how you get it.

What Can Local Agents Do?

Local agents can handle a wide range of tasks: summarizing documents, answering questions, automating workflows, scraping websites, or even acting as coding assistants.

In this post, we’ll focus on a simple task: scraping news headlines from a website and summarizing them. It’s fast, useful, and shows the core pieces in action.

Tools We’ll Use

  • Ollama – run language models locally with one command. Gemma or Mistral work fine on a laptop
  • LangChain – structure reasoning, tools, and memory
  • Python – glue everything together

Basic Structure of a Local Agent

  1. Model – the LLM doing the “thinking”
  2. Tools – code the agent can use (like a scraper or file reader)
  3. Prompt – instructions for what the agent should do
  4. Loop – let the agent think and act step-by-step

That’s it. The rest is just wiring.

Getting Started

  1. Install Ollama
    https://ollama.com
    brew install ollama or grab it for your OS.
  2. Pull a model: ollama run mistral
  3. Set up a LangChain agent
    Load the model via LangChain, define a tool, and pass it to the agent. You’ll see how in the example below.

The Code

pip install langchain beautifulsoup4 requests

ollama run mistral

Now create a Python script, for example run.py:

from langchain.llms import Ollama  # on newer LangChain versions: from langchain_community.llms import Ollama

llm = Ollama(model="mistral")
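
A quick sanity check that the model loads (assuming Ollama is running and the mistral model has been pulled):

# send a single prompt to the local model; on older LangChain versions, llm("...") works the same way
print(llm.invoke("Reply with one short sentence: are you running locally?"))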

The scraper:

import requests
from bs4 import BeautifulSoup

def get_headlines(url="https://www.bbc.com"):
    res = requests.get(url)
    soup = BeautifulSoup(res.text, "html.parser")
    headlines = [h.get_text() for h in soup.find_all("h3")]
    return "\n".join(headlines[:10])  # Just take top 10

Wrap it as a LangChain tool:

from langchain.agents import tool

@tool
def scrape_headlines() -> str:
    """Scrapes top headlines from BBC."""
    return get_headlines()

Build the agent:

from langchain.agents import initialize_agent, AgentType

tools = [scrape_headlines]

agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True
)

Run the agent:

agent.run("Get the top news headlines and summarize them in a few bullet points.")

That’s it: you now have a local agent scraping, thinking, and summarizing. All on your machine.

How to run Ollama in CircleCI

Yes, it’s absolutely possible! You can run a small LLM like Gemma 3 4B with Ollama in a basic CircleCI pipeline and integrate AI capabilities directly into your CI/CD workflows. Its capabilities are limited, of course, but you can use it for simple agents or semantic unittests.

Here is an example of a CircleCI config that uses Ollama and runs on the free plan (large resource class). It demonstrates how to use the Ollama Docker image in a CI pipeline and assumes you want to pull a model and run a basic script against the Ollama service.

version: 2.1

jobs:
  ollama-example:
    docker:
      - image: cimg/python:3.9
      - image: ollama/ollama:latest
        name: ollama
    resource_class: large
    steps:
      - checkout
      - run:
          name: Wait for Ollama to start
          command: |
            until curl -s http://ollama:11434/; do
              echo "Waiting for Ollama to start..."
              sleep 5
            done
      - run:
          name: Pull Gemma3 Model Using Web API
          command: |
            curl -X POST http://ollama:11434/api/pull \
              -H "Content-Type: application/json" \
              -d '{"model": "gemma3:4b"}'
      - run:
          name: Run a Python script using Ollama
          command: |
            python script.py

workflows:
  ollama-workflow:
    jobs:
      - ollama-example

And the Python script:

import requests
from pprint import pprint

# Ollama's completion endpoint is /api/generate; stream=False returns a single JSON object
response = requests.post(
    'http://ollama:11434/api/generate',
    json={'model': 'gemma3:4b', 'prompt': 'Hello, Ollama!', 'stream': False}
)
pprint(response.json())
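
If the request fails, a quick way to check whether the model actually landed in the service container is to list the local models via Ollama’s GET /api/tags endpoint. A minimal check could look like this:

import requests

# list the models currently available in the Ollama service container
tags = requests.get('http://ollama:11434/api/tags').json()
print([m['name'] for m in tags.get('models', [])])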

This configuration is simple and can be used as a starting point to work on integrating Ollama into a CI pipeline.

Semantic Unittests

Unit tests traditionally focus on verifying exact outputs, but how do we test output that can change slightly from run to run, such as an LLM’s answer to the same question?

Luckily, with a SemanticTestCase we can test semantic correctness in Python rather than relying on rigid string matches. This is useful for applications like text validation, classification, or summarization, where there’s more than one “correct” answer.

Traditional vs. Semantic Testing

  • Traditional Unit Test

A standard test might look like this:

import unittest
from text_validator import validate_text

class TestTextValidator(unittest.TestCase):
    def test_profane_text(self):
        self.assertFalse(validate_text("This is some bad language!")) 
    def test_clean_text(self):
        self.assertTrue(validate_text("Hello, how are you?"))

Here, validate_text() returns True or False, but it assumes there’s a strict set of phrases that are “bad” or “good.” Edge cases like paraphrased profanity might be missed.

  • Semantic Unit Test

Instead of rigid assertions, we can use SemanticTestCase to evaluate the meaning of the response:

self.assertSemanticallyEqual("Blue is the sky.", "The sky is blue.")

A test case:


class TestTextValidator(SemanticTestCase):
    """
    We're testing the SemanticTestCase here.
    """

    # longer_text is assumed to be defined elsewhere, e.g. a passage
    # about a public holiday in Ireland
    def test_semantic(self):
        self.assertSemanticallyCorrect(longer_text, "It is a public holiday in Ireland")
        self.assertSemanticallyIncorrect(longer_text, "It is a public holiday in Italy")
        self.assertSemanticallyEqual("Blue is the sky.", "The sky is blue.")

Here, assertSemanticallyCorrect() and its siblings use an LLM to classify the input and return a judgment. Instead of exact matches, we test whether the response aligns with our expectation.
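
For illustration, here is a minimal sketch of how such an assertion could be implemented on top of a local Ollama server. This is not the actual SemanticTestCase implementation, just the basic idea; the model name, prompt wording, and endpoint are assumptions:

import unittest
import requests

class MiniSemanticTestCase(unittest.TestCase):
    """Toy version of a semantic assertion backed by a local LLM."""

    def _ask_llm(self, question: str) -> str:
        # query a locally running Ollama instance; stream=False returns a single JSON object
        response = requests.post(
            "http://localhost:11434/api/generate",
            json={"model": "gemma3:4b", "prompt": question, "stream": False},
        )
        return response.json()["response"].strip().upper()

    def assertSemanticallyCorrect(self, text: str, statement: str):
        verdict = self._ask_llm(
            f"Text: {text}\nStatement: {statement}\n"
            "Is the statement supported by the text? Answer only YES or NO."
        )
        self.assertTrue(verdict.startswith("YES"), f"Statement not supported by the text: {statement}")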

Why This Matters

• AI systems often produce slightly different versions of the same sentence when asked repeatedly. This makes traditional unittest assertions nearly useless, but SemanticTestCase can compare such outputs as well.

• Handles paraphrased inputs: Profanity, toxicity, or policy violations don’t always follow exact patterns.

• More flexible testing: Works for tasks like summarization or classification, where exact matches aren’t realistic.

Some words on speed and data protection

Execution speed: Running an LLM for each test could be slower than traditional unit tests. But it is surprisingly fast on my Mac M1 with local Ollama and a laptop-sized LLM such as Gemma.

The speed depends on the size of the prompt (or context); comparing just a few sentences is fast. Furthermore, the LLM stays loaded between assertions, which also contributes to the speed.

Data protection: if handling sensitive data is a concern, run a local LLM, e.g. via Ollama. It is still quite fast.

Working with Copilot

Just something I wanted to share:

Me: When working with Copilot, many things get easier, but the core problems remain the same. Fixing hairy problems such as concurrency issues, or covering test edge cases, is still manual work. It helps mainly with boilerplate code, which isn’t difficult to write anyway, but it saves time.

ChatGPT said:

Yeah, exactly. Copilot and similar AI tools are great for reducing grunt work—boilerplate, repetitive patterns, and even some tedious debugging—but they don’t fundamentally change the nature of software development. The hard parts, like designing good abstractions, handling tricky concurrency issues, and ensuring correctness in edge cases, still require human insight.

It’s like having an assistant that speeds up the easy stuff, but when you hit a truly gnarly problem, you’re still on your own. Have you found any areas where it actually helps with the deeper problems, or is it mostly just a time-saver for you?

“Conclusion”

It is good that the AI agrees with me here.

NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the ‘ssl’ module is compiled with ‘LibreSSL 2.8.3’.

site-packages/urllib3/__init__.py:34: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020

It turns out that urllib3 v2 requires OpenSSL 1.1.1+ to work properly. Early 2.x releases refused to run at all under LibreSSL; later ones only emit this warning. My Python’s ssl module, however, is compiled against LibreSSL, so the idea is to install a urllib3 version that is compatible with LibreSSL.
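
You can check which library your interpreter’s ssl module was linked against:

import ssl

# prints e.g. "LibreSSL 2.8.3" or "OpenSSL 3.0.13 ..."
print(ssl.OPENSSL_VERSION)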

Solution

pip install urllib3==1.26.20

Any release from the urllib3 1.x series works; I picked the latest one. Now my app runs without the warning.

How to install the yaml package for Python?

I want to read the config for my backend from a YAML file, and when installing the yaml package I am getting the following error:

pip install yaml
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
ERROR: Could not find a version that satisfies the requirement yaml (from versions: none)
ERROR: No matching distribution found for yaml

Quick Answer

pip install pyyaml

More Details

This article explaining how to read YAML config files in Python doesn’t show how to install the package. That should be straightforward, except there is no package called yaml on PyPI. A quick search on pypi.org turned up this package instead: https://pypi.org/project/PyYAML/

I tried it and it works like a charm with the same syntax as in the blog post. Now you can go ahead and try Ram’s example.
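
For reference, reading a config with PyYAML looks like this. The file name config.yaml and its contents are just placeholders; note that the package installs as pyyaml but is imported as yaml:

import yaml

# load a YAML config file into a Python dict
with open("config.yaml") as f:
    config = yaml.safe_load(f)

print(config)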

How to Install and Use Salesforce’s CodeGen LLM

CodeGen is an AI (LLM) from Salesforce that can generate source code, as well as describe what a piece of code does. It comes under the Apache license and performs well while being lightweight enough to run on a laptop for both inference and fine-tuning. Here is how to set it up and how to use it.

Installation with HuggingFace

This blog post provides instructions on how to use the Codegen LLM via the Hugging Face Transformers library. It assumes you have a development environment set up and are familiar with Hugging Face.

You’ll need to install the `transformers` and `torch` libraries:

pip install transformers torch

If you intend to use a GPU, ensure you have the correct CUDA drivers and PyTorch/TensorFlow builds for GPU support.
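
A quick way to check which acceleration PyTorch can use (the MPS check applies to Apple Silicon Macs):

import torch

print("CUDA available:", torch.cuda.is_available())
print("MPS (Apple Silicon) available:", torch.backends.mps.is_available())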

Model Loading

Codegen models are typically available on the Hugging Face Model Hub. You can load a model and its tokenizer using the following code:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM  # Or AutoModelForSeq2SeqLM for sequence-to-sequence models

model_name = "Salesforce/codegen-350M-mono"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)  # Or AutoModelForSeq2SeqLM

# For GPU usage (recommended):
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

Replace "Salesforce/codegen-350M-mono" with the specific Codegen model name you intend to use. Check the Hugging Face Model Hub for available models.

Code Generation

Here’s how to generate code using the loaded model:

prompt = "Write a Python function to calculate the factorial of a number."

input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)  # Move input to device

outputs = model.generate(
    input_ids,
    max_length=200,       # adjust as needed
    do_sample=True,       # required for temperature/top_k/top_p to take effect
    num_beams=5,          # adjust for quality/speed trade-off
    temperature=0.7,      # adjust for creativity (higher = more creative)
    top_k=40,             # adjust for sampling
    top_p=0.95,           # adjust for sampling
    pad_token_id=tokenizer.eos_token_id,  # important for some models
)

generated_code = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_code)


# Example with infilling (code completion):
prompt = "def my_function(x):\n    # TODO: Calculate the square of x\n    return"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
outputs = model.generate(input_ids, max_length=100, num_beams=5, pad_token_id=tokenizer.eos_token_id)
generated_code = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_code)

Considerations

  • Model Selection: Different Codegen models have different strengths. Choose the one that best suits your needs.
  • Prompt Engineering: Clear and specific prompts are essential for good results.
  • Parameter Tuning: Experiment with the generation parameters to find the optimal settings for your use case.
  • Resource Management: Large language models can be resource-intensive. Consider using a GPU if available.
  • Output Validation: The generated code should be reviewed and tested carefully. It might require debugging.

Python-Alpaca Dataset

I came across this dataset recently, a collection of 22k Python code examples, tested and verified to work. What really caught my attention is how this was put together—they used a custom script to extract Python code from Alpaca-formatted datasets, tested each snippet locally, and only kept the functional ones. Non-functional examples were separated into their own file.

The dataset pulls from a mix of open-source projects like Wizard-LM’s Evol datasets, CodeUp’s 19k, and a bunch of others, plus some hand-prompted GPT-4 examples. Everything’s been deduplicated, so you’re not stuck with repeats.

It’s especially cool if you’re working on training AI models for coding tasks because it sidesteps one of the biggest issues with open datasets: non-functional or broken code. They even hinted at adapting the script for other languages like C++ or SQL.

If you use the dataset or their script, they ask for attribution: Filtered Using Vezora’s CodeTester. Oh, and they’re working on releasing an even bigger dataset with 220,000+ examples, definitely one to keep an eye on!

On Huggingface: Tested-22k-Python-Alpaca

Read also how to analyze a dataset.
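
If you want to poke at the data yourself, it can presumably be loaded with the Hugging Face datasets library. The repo id below is inferred from the dataset name, so check the dataset page for the exact identifier:

from datasets import load_dataset

# repo id assumed from the dataset name; verify on huggingface.co
ds = load_dataset("Vezora/Tested-22k-Python-Alpaca", split="train")
print(ds[0])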

Role Assignment in Multi-Agent Systems

When working with multi-agent systems, one of the most powerful concepts you can leverage is role assignment. In a multi-agent setup, you can define distinct roles for each agent to create different behaviors, allowing them to collaborate, interact, and solve problems in a simulated environment.

Imagine you’re managing a software development project. You have a project manager, a developer, and a tester, each with a unique perspective and responsibilities. By assigning these roles to different agents in a conversation, you can simulate their interactions to observe how they work together toward a common goal, like completing a feature or identifying a bug.

Why Use Role Assignment?

Role assignment is essential in multi-agent systems because it allows you to create more realistic, diverse behaviors in the simulation. Each agent has specific tasks, which means they’ll react differently based on their role. For example:

  • The project manager might focus on project timelines, priorities, and coordinating tasks.
  • The developer could be focused on writing code, debugging, and creating new features.
  • The tester would be identifying bugs, running test cases, and ensuring the quality of the product.

By assigning different roles, you give each agent context and a purpose, which leads to more meaningful interactions.

How to Assign Roles in the OpenAI Chat API

As the OpenAI API documentation describes, assigning roles is simple: you use system messages to define the specific behavior of each agent. These messages guide each agent’s responses and ensure that it acts within its role.

Here’s how you can structure it:

import openai

response = openai.ChatCompletion.create(
  model="gpt-3.5-turbo",
  messages=[

    {
      "role": "system", 
      "content": "You are the project manager for a software development team. Your role is to coordinate tasks, set deadlines, and ensure the project stays on track. Focus on the big picture and team collaboration."
    },

    {
      "role": "system", 
      "content": "You are a developer working on new features and fixing bugs. Focus on writing clean code, debugging, and offering technical solutions to problems."
    },

    {
      "role": "system", 
      "content": "You are a tester responsible for finding bugs and ensuring that the software is stable. Run tests, identify issues, and communicate them clearly for the team to address."
    },

    {
     "role": "user",
     "content": "Let's start the project. The first task is to build the user authentication feature."
    }
  ]
)

print(response["choices"][0]["message"]["content"])

Note: Don’t be confused by the API role and the role you define

Don’t be confused by the “role” in the API message (e.g., system, user, assistant) and the “role” you define for each agent (e.g., project manager, developer, tester). In the API context, “role” refers to the message sender (system, user, assistant), while in the agent context, “role” refers to the specific persona or responsibility the agent has within the conversation.

In this example:

  • The project manager agent is given a message to manage the project, prioritizing tasks and deadlines.
  • The developer agent is tasked with coding and troubleshooting technical challenges.
  • The tester agent focuses on testing and identifying bugs to ensure a stable product.

Each agent’s system message helps them understand their role and contributes accordingly to the conversation, creating a collaborative environment that mirrors real-world project dynamics.
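
To turn this into an actual back-and-forth between the personas, one common pattern (not the only one) is to give each agent its own system message and its own API call, feeding it the other agents’ latest replies as part of the user message. Here is a rough sketch using the same pre-1.0 openai client as above; the persona prompts and helper names are purely illustrative:

import openai

PERSONAS = {
    "project manager": "You are the project manager. Coordinate tasks and keep the project on track.",
    "developer": "You are a developer. Focus on technical solutions and clean code.",
    "tester": "You are a tester. Look for bugs, risks, and missing test cases.",
}

def run_round(task, history):
    """Let each persona respond once, seeing the conversation so far."""
    for name, system_prompt in PERSONAS.items():
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": task + "\n\nConversation so far:\n" + "\n".join(history)},
        ]
        reply = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
        history.append(f"{name}: " + reply["choices"][0]["message"]["content"])
    return history

for line in run_round("Build the user authentication feature.", []):
    print(line)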

Why It Works

The power of multi-agent systems comes from the interaction between agents with different roles. When agents understand their role and objectives, they can communicate more effectively, mimic real-world collaborations, and help identify solutions more efficiently. You can also test various scenarios to see how different roles react to challenges or changes in the system, all without human intervention.

Wrapping Up

Role assignment in multi-agent systems is a powerful way to simulate complex scenarios with diverse behaviors. By using system messages to define roles, you can create agents that act like real-life colleagues, each contributing in their own way to achieve the common goal. Whether you’re simulating a team of developers or testing a new feature, this approach brings both flexibility and realism to the table.

Next time you’re working with multi-agent systems, try assigning different roles to your agents. You might be surprised at how dynamic and engaging the conversation becomes!

For more information on how to implement these concepts, be sure to check out the OpenAI API Documentation, where you can explore further examples, code snippets, and more to help you make the most of the Chat API in your projects.

Can’t install PyTorch on my Macbook

To my surprise, I wasn’t able to install PyTorch for a project on my MacBook Pro M1 today (macOS Sequoia 15.2). I kept getting this error when running pip3 install -r requirements.txt:

ERROR: Ignored the following versions that require a different python version: 1.21.2 Requires-Python >=3.7,<3.11; 1.21.3 Requires-Python >=3.7,<3.11; 1.21.4 Requires-Python >=3.7,<3.11; 1.21.5 Requires-Python >=3.7,<3.11; 1.21.6 Requires-Python >=3.7,<3.11; 1.26.0 Requires-Python <3.13,>=3.9; 1.26.1 Requires-Python <3.13,>=3.9
ERROR: Could not find a version that satisfies the requirement torch==2.7.0.dev20250116 (from versions: none)
ERROR: No matching distribution found for torch==2.7.0.dev20250116

I tried it manually: pip3 install torch, no luck:

pip install torch
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
ERROR: Could not find a version that satisfies the requirement torch (from versions: none)
ERROR: No matching distribution found for torch

Solution

This is what I came up with and it works fine:

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
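
A quick check that the nightly build imports and reports its version:

import torch

# should print something like 2.7.0.dev... without raising an ImportError
print(torch.__version__)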