Build Your Own Local AI Agent: A Step-by-Step Guide

Local AI agents run on your machine. No cloud. No external APIs. Just you, your hardware, and the model. This post walks through the essentials: choosing a model, wiring it up with an agent framework, and running it locally. If you want privacy, speed, or control, this is how you get it.

What Can Local Agents Do?

Local agents can handle a wide range of tasks: summarizing documents, answering questions, automating workflows, scraping websites, or even acting as coding assistants.

In this post, we’ll focus on a simple task: scraping news headlines from a website and summarizing them. It’s fast, useful, and shows the core pieces in action.

Tools We’ll Use

  • Ollama – run language models locally with one command. Gemma or Mistral run fine on a laptop
  • LangChain – structure reasoning, tools, and memory
  • Python – glue everything together

Basic Structure of a Local Agent

  1. Model – the LLM doing the “thinking”
  2. Tools – code the agent can use (like a scraper or file reader)
  3. Prompt – instructions for what the agent should do
  4. Loop – let the agent think and act step-by-step

That’s it. The rest is just wiring.
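To make those pieces concrete, here is a minimal sketch of parts 1 to 3 wired by hand, assuming a local Ollama server with mistral already pulled. The ask_model helper and the canned fetch_headlines tool are illustrative, not part of any library; LangChain's job in the sections below is to add part 4, the decision loop, on top.

import requests

def ask_model(prompt: str) -> str:
    """Send a prompt to the local Ollama server and return the reply."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "mistral", "prompt": prompt, "stream": False},
        timeout=120,
    )
    return resp.json()["response"]

def fetch_headlines() -> str:
    """Stand-in tool; the real scraper comes later in this post."""
    return "Headline 1\nHeadline 2\nHeadline 3"

observation = fetch_headlines()  # the tool acts
print(ask_model(f"Summarize these headlines as bullet points:\n{observation}"))  # the model thinks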

Getting Started

  1. Install Ollama
    https://ollama.com
    brew install ollama on macOS, or grab the installer for your OS.
  2. Pull a model: ollama pull mistral (ollama run mistral also pulls it, then drops you into an interactive chat)
  3. Set up a LangChain agent
    Load the model via LangChain, define a tool, and pass it to the agent. You’ll see how in the example below.
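
Before writing any agent code, it's worth a quick sanity check that the Ollama server is up and the model is pulled. The /api/tags endpoint lists local models; localhost:11434 is Ollama's default address.

import requests

# /api/tags lists the models available on the local Ollama server
tags = requests.get("http://localhost:11434/api/tags", timeout=5).json()
print([m["name"] for m in tags.get("models", [])])  # e.g. ['mistral:latest']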

The Code

pip install langchain beautifulsoup4 requests

ollama run mistral

Now create a Python script, e.g. run.py:

from langchain.llms import Ollama  # on newer LangChain: from langchain_community.llms import Ollama

llm = Ollama(model="mistral")
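
A quick smoke test confirms LangChain can reach the model. On recent versions the call is invoke; on older ones, calling llm(...) directly does the same:

# One round trip through the local model before wiring up tools
print(llm.invoke("Reply with a single word: ready?"))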

The scraper:

import requests
from bs4 import BeautifulSoup

def get_headlines(url="https://www.bbc.com"):
    res = requests.get(url, timeout=10)
    res.raise_for_status()  # fail loudly instead of parsing an error page
    soup = BeautifulSoup(res.text, "html.parser")
    # BBC currently renders headlines as <h3> elements; adjust if the markup changes
    headlines = [h.get_text(strip=True) for h in soup.find_all("h3")]
    return "\n".join(headlines[:10])  # just take the top 10
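
You can try the scraper on its own before handing it to the agent. What comes back depends on BBC's current markup, so treat the h3 selector as best-effort:

if __name__ == "__main__":
    # Prints up to ten headlines, one per line
    print(get_headlines())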

Wrap it as a LangChain tool:

from langchain.agents import tool

@tool
def scrape_headlines(query: str = "") -> str:
    """Scrapes top headlines from BBC. The query argument is ignored;
    a ReAct agent always passes an action input, so the tool must accept one."""
    return get_headlines()

Build the agent:

from langchain.agents import initialize_agent, AgentType

tools = [scrape_headlines]

agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
    handle_parsing_errors=True,  # local models sometimes break the ReAct format
)

Run the agent:

agent.run("Get the top news headlines and summarize them in a few bullet points.")

That’s it: you now have a local agent scraping, thinking, and summarizing, all on your machine.

How to run Ollama in CircleCI

Yes, it’s absolutely possible! You can run a small LLM such as Gemma 3 4B with Ollama in a basic CircleCI pipeline and integrate AI capabilities directly into your CI/CD workflows. Its capabilities are limited, of course, but it is enough for agents or semantic unit tests.

Here is an example CircleCI config that uses Ollama and runs on the free plan (large resource class). It demonstrates how to run the Ollama Docker image as a secondary container in a CI pipeline, pull a model, and run a basic script against the Ollama service.

version: 2.1

jobs:
  ollama-example:
    docker:
      - image: cimg/python:3.9
      - image: ollama/ollama:latest
        name: ollama
    resource_class: large
    steps:
      - checkout
      - run:
          name: Wait for Ollama to start
          command: |
            until curl -s http://ollama:11434/; do
              echo "Waiting for Ollama to start..."
              sleep 5
            done
      - run:
          name: Pull Gemma3 Model Using Web API
          command: |
            curl -X POST http://ollama:11434/api/pull \
              -H "Content-Type: application/json" \
              -d '{"model": "gemma3:4b"}'
      - run:
          name: Run a Python script using Ollama
          command: |
            pip install requests
            python script.py

workflows:
  ollama-workflow:
    jobs:
      - ollama-example

And the Python script:

import requests
from pprint import pprint

response = requests.post(
    'http://ollama:11434/api/generate',  # Ollama's completion endpoint
    json={
        'model': 'gemma3:4b',
        'prompt': 'Hello, Ollama!',
        'stream': False,  # return one JSON object so response.json() works
    }
)
pprint(response.json())
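
Ollama also exposes a chat-style endpoint, /api/chat, if you prefer message-based prompts. The equivalent call looks like this, again with streaming off so response.json() returns a single object:

import requests

response = requests.post(
    'http://ollama:11434/api/chat',
    json={
        'model': 'gemma3:4b',
        'messages': [{'role': 'user', 'content': 'Hello, Ollama!'}],
        'stream': False,  # one JSON object instead of a chunk stream
    }
)
print(response.json()['message']['content'])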

This configuration is simple and can be used as a starting point to work on integrating Ollama into a CI pipeline.

Semantic Unittests

Unit tests traditionally focus on verifying exact outputs. But how do we test output that can vary slightly from run to run, such as an LLM’s answer to the same question?

Luckily, with a SemanticTestCase we can test semantic correctness in Python rather than relying on rigid string matches. This is useful for applications like text validation, classification, or summarization, where there’s more than one “correct” answer.

Traditional vs. Semantic Testing

  • Traditional Unit Test

A standard test might look like this:

import unittest
from text_validator import validate_text

class TestTextValidator(unittest.TestCase):
    def test_profane_text(self):
        self.assertFalse(validate_text("This is some bad language!"))

    def test_clean_text(self):
        self.assertTrue(validate_text("Hello, how are you?"))

Here, validate_text() returns True or False, but it assumes there’s a strict set of phrases that are “bad” or “good.” Edge cases like paraphrased profanity might be missed.

  • Semantic Unit Test

Instead of rigid assertions, we can use SemanticTestCase to evaluate the meaning of the response:

self.assertSemanticallyEqual("Blue is the sky.", "The sky is blue.")

A test case:

# longer_text is assumed to be defined elsewhere, e.g. a paragraph
# about an Irish public holiday
class TestTextValidator(SemanticTestCase):
    """
    We're testing the SemanticTestCase here
    """

    def test_semantic(self):
        self.assertSemanticallyCorrect(longer_text, "It is a public holiday in Ireland")
        self.assertSemanticallyIncorrect(longer_text, "It is a public holiday in Italy")
        self.assertSemanticallyEqual("Blue is the sky.", "The sky is blue.")

Here, assertSemanticallyCorrect() and its siblings use an LLM to classify the input and return a judgment. Instead of exact matches, we test whether the response aligns with our expectation.
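
The snippets above treat SemanticTestCase as a given. If you want to roll your own, here is a minimal sketch built on a local Ollama model; the yes/no prompt wording, the _llm_yes_no helper, and the gemma3:4b model choice are all assumptions of this sketch, not an existing library API:

import unittest
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumes a local Ollama server

def _llm_yes_no(question: str) -> bool:
    """Ask the local model a yes/no question and parse the first word."""
    resp = requests.post(
        OLLAMA_URL,
        json={
            "model": "gemma3:4b",
            "prompt": question + "\nAnswer only yes or no.",
            "stream": False,
        },
        timeout=120,
    )
    return resp.json()["response"].strip().lower().startswith("yes")

class SemanticTestCase(unittest.TestCase):
    """unittest.TestCase with LLM-backed semantic assertions (sketch)."""

    def assertSemanticallyEqual(self, a: str, b: str):
        self.assertTrue(_llm_yes_no(
            f"Do these two sentences mean the same thing?\n1: {a}\n2: {b}"))

    def assertSemanticallyCorrect(self, text: str, claim: str):
        self.assertTrue(_llm_yes_no(
            f"Text:\n{text}\n\nIs this claim supported by the text: {claim}"))

    def assertSemanticallyIncorrect(self, text: str, claim: str):
        self.assertFalse(_llm_yes_no(
            f"Text:\n{text}\n\nIs this claim supported by the text: {claim}"))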

Why This Matters

• AI systems often output slightly different versions of the same sentence when asked repeatedly. That makes traditional unittest asserts brittle, but SemanticTestCase can compare such outputs as well.

• Handles paraphrased inputs: Profanity, toxicity, or policy violations don’t always follow exact patterns.

• More flexible testing: Works for tasks like summarization or classification, where exact matches aren’t realistic.

Some words on speed and privacy

Execution speed: Running an LLM for each assertion could be slower than a traditional unit test. In practice it is surprisingly fast on my Mac M1 with local Ollama and a laptop-sized LLM such as Gemma.

The speed depends mostly on the size of the prompt (or context); comparing just a few sentences is quick. The LLM also stays loaded between assertions, which helps further.
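
If your suite has long pauses between tests, Ollama's keep_alive request field controls how long the model stays in memory after a request (the default is a few minutes). A warm-up call like this sketch keeps it loaded for half an hour:

import requests

# Warm up the model and ask Ollama to keep it in memory for 30 minutes
requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "gemma3:4b", "prompt": "warm up", "stream": False, "keep_alive": "30m"},
    timeout=120,
)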

Data protection: if handling sensitive data is a concern, run a local LLM, e.g. via Ollama. It is still quite fast.

NotOpenSSLWarning

site-packages/urllib3/__init__.py:34: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020

It turns out that urllib3 v2 needs OpenSSL to work properly. Early 2.x releases refused to run at all under LibreSSL; later ones throw this warning instead. My current ssl module, however, is compiled against LibreSSL, so the fix is to install a urllib3 version that is compatible with it.
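
To see which library your interpreter actually links against, the standard library tells you directly:

import ssl

# Prints the library Python's ssl module was compiled against,
# e.g. 'LibreSSL 2.8.3' on the macOS system Python
print(ssl.OPENSSL_VERSION)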

Solution

pip install urllib3==1.26.20

Any release from the urllib3 1.x series works; I picked the latest one. Now my app runs without the warning.