Build Your Own Local AI Agent: A Step-by-Step Guide

Local AI agents run on your machine. No cloud. No external APIs. Just you, your hardware, and the model. This post walks through the essentials: choosing a model, wiring it up with an agent framework, and running it locally. If you want privacy, speed, or control, this is how you get it.

What Can Local Agents Do?

Local agents can handle a wide range of tasks: summarizing documents, answering questions, automating workflows, scraping websites, or even acting as coding assistants.

In this post, we’ll focus on a simple task: scraping news headlines from a website and summarizing them. It’s fast, useful, and shows the core pieces in action.

Tools We’ll Use

  • Ollama – run language models locally with one command. Gemma or Mistral work fine on a laptop
  • LangChain – structure reasoning, tools, and memory
  • Python – glue everything together

Basic Structure of a Local Agent

  1. Model – the LLM doing the “thinking”
  2. Tools – code the agent can use (like a scraper or file reader)
  3. Prompt – instructions for what the agent should do
  4. Loop – let the agent think and act step-by-step

That’s it. The rest is just wiring.

Getting Started

  1. Install Ollama
    https://ollama.com
    Run brew install ollama, or grab the installer for your OS.
  2. Pull a model: ollama run mistral
  3. Set up a LangChain agent
    Load the model via LangChain, define a tool, and pass it to the agent. You’ll see how in the example below.

The Code

pip install langchain beautifulsoup4 requests

ollama run mistral

Now create a Python script, e.g. run.py:

from langchain.llms import Ollama

llm = Ollama(model="mistral")
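
Note: in newer LangChain releases the Ollama wrapper lives in langchain_community.llms, so adjust the import if you see a deprecation warning. Before wiring up tools, a quick smoke test confirms the model responds (invoke works on recent LangChain versions; older ones also accept calling llm("...") directly):

print(llm.invoke("Say hello in one short sentence."))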

The scraper:

import requests
from bs4 import BeautifulSoup

def get_headlines(url="https://www.bbc.com"):
    res = requests.get(url, timeout=10)  # don't hang forever on a slow site
    res.raise_for_status()
    soup = BeautifulSoup(res.text, "html.parser")
    headlines = [h.get_text(strip=True) for h in soup.find_all("h3")]
    return "\n".join(headlines[:10])  # just take the top 10
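
You can try the scraper on its own before handing it to the agent:

print(get_headlines())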

Wrap it as a LangChain tool:

from langchain.agents import tool

@tool
def scrape_headlines(query: str = "") -> str:
    """Scrapes top headlines from BBC. The input is ignored."""
    # the ReAct agent always supplies an Action Input string, so accept and ignore it
    return get_headlines()

Build the agent:

from langchain.agents import initialize_agent, AgentType

tools = [scrape_headlines]

agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True
)

Run the agent:

agent.run("Get the top news headlines and summarize them in a few bullet points.")

That’s it. You now have a local agent: scraping, thinking, and summarizing. All on your machine.

How to run Ollama in CircleCI

Yes, it’s absolutely possible! You can run a small LLM like Gemma 3 4B with Ollama in a basic CircleCI pipeline to integrate AI capabilities directly into your CI/CD workflows. Its capabilities are limited, of course, but it is enough for agents or semantic unit tests.

Here is an example of a CircleCI config that uses Ollama and runs on the free plan (large resource class). It demonstrates how to use the Ollama Docker image in a CI pipeline and assumes you want to pull a model and run a basic script against the Ollama service.

version: 2.1

jobs:
  ollama-example:
    docker:
      - image: cimg/python:3.9
      - image: ollama/ollama:latest
        name: ollama
    resource_class: large
    steps:
      - checkout
      - run:
          name: Wait for Ollama to start
          command: |
            until curl -s http://ollama:11434/; do
              echo "Waiting for Ollama to start..."
              sleep 5
            done
      - run:
          name: Pull Gemma3 Model Using Web API
          command: |
            curl -X POST http://ollama:11434/api/pull \
              -H "Content-Type: application/json" \
              -d '{"model": "gemma3:4b"}'
      - run:
          name: Run a Python script using Ollama
          command: |
            python script.py

workflows:
  ollama-workflow:
    jobs:
      - ollama-example

And the Python script, script.py:

import requests
from pprint import pprint

# Ollama's text endpoint is /api/generate; with "stream": False the
# whole completion comes back as a single JSON object.
response = requests.post(
    'http://ollama:11434/api/generate',
    json={'model': 'gemma3:4b', 'prompt': 'Hello, Ollama!', 'stream': False}
)
pprint(response.json())

This configuration is simple and can be used as a starting point to work on integrating Ollama into a CI pipeline.

Semantic Unittests

Unit tests traditionally focus on verifying exact outputs. But how do we test output that can vary slightly from run to run, such as an LLM’s answers to the same question?

Luckily, using a SemanticTestCase we can test semantic correctness rather than rigid string matches in Python. This is useful for applications like text validation, classification, or summarization, where there’s more than one “correct” answer.

Traditional vs. Semantic Testing

  • Traditional Unit Test

A standard test might look like this:

import unittest
from text_validator import validate_text

class TestTextValidator(unittest.TestCase):
    def test_profane_text(self):
        self.assertFalse(validate_text("This is some bad language!")) 
    def test_clean_text(self):
        self.assertTrue(validate_text("Hello, how are you?"))

Here, validate_text() returns True or False, but it assumes there’s a strict set of phrases that are “bad” or “good.” Edge cases like paraphrased profanity might be missed.

  • Semantic Unit Test

Instead of rigid assertions, we can use SemanticTestCase to evaluate the meaning of the response:

self.assertSemanticallyEqual("Blue is the sky.", "The sky is blue.")

A test case:


class TestTextValidator(SemanticTestCase):
    """
    We're testing the SemanticTestCase here
    """

    def test_semantic(self):
        self.assertSemanticallyCorrect(longer_text, "It is a public holiday in Ireland")
        self.assertSemanticallyIncorrect(longer_text, "It is a public holiday in Italy")
        self.assertSemanticallyEqual("Blue is the sky.", "The sky is blue.")

Here, assertSemanticallyCorrect() and its siblings use an LLM to classify the input and return a judgment. Instead of exact matches, we test whether the response aligns with our expectation.
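
The internals aren’t shown above, but the idea is easy to sketch. Below is a minimal, hypothetical implementation of assertSemanticallyEqual backed by a local Ollama server; the base class design, model name, and prompt wording are my assumptions, not the library’s actual code:

import unittest

import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumes a local Ollama server


class SemanticTestCase(unittest.TestCase):
    """Hypothetical sketch: ask a local LLM whether two texts mean the same."""

    model = "gemma3:4b"  # any small local model will do

    def _ask_yes_no(self, prompt: str) -> bool:
        resp = requests.post(
            OLLAMA_URL,
            json={"model": self.model, "prompt": prompt, "stream": False},
            timeout=120,
        )
        resp.raise_for_status()
        # with "stream": False the whole answer arrives as one JSON object
        answer = resp.json()["response"].strip().lower()
        return answer.startswith("yes")

    def assertSemanticallyEqual(self, first: str, second: str):
        prompt = (
            "Do these two sentences mean the same thing? "
            'Answer only "yes" or "no".\n'
            f"A: {first}\nB: {second}"
        )
        self.assertTrue(
            self._ask_yes_no(prompt),
            f"Not semantically equal: {first!r} vs {second!r}",
        )

The sibling assertions (assertSemanticallyCorrect, assertSemanticallyIncorrect) would follow the same pattern with a different prompt.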

Why This Matters

• AI systems often output slightly different versions of the same sentence when repeated. This makes things very hard for traditional unittest asserts, but SemanticTestCase can compare such outputs as well.

• Handles paraphrased inputs: Profanity, toxicity, or policy violations don’t always follow exact patterns.

• More flexible testing: Works for tasks like summarization or classification, where exact matches aren’t realistic.

Some words on…

Execution speed: Running an LLM for each test could be slower than traditional unit tests. But it is surprisingly fast on my Mac M1 with local Ollama and a laptop-sized LLM such as Gemma.

The speed is affected by the size of the prompt (or context); comparing just a few sentences is fast. Furthermore, the LLM stays loaded between assertions, which also contributes to its speed.

Data protection: if handling sensitive data is a concern, use a local LLM, e.g. via Ollama. It is still quite fast.

NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the ‘ssl’ module is compiled with ‘LibreSSL 2.8.3’.

site-packages/urllib3/__init__.py:34: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020

It turns out that urllib3 v2 needs OpenSSL to work properly. Early 2.x releases refused to run at all in this situation; later ones merely emit this warning. My ssl module, however, is compiled against LibreSSL, so the idea is to install a urllib3 version that is compatible with LibreSSL.
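
To check which TLS library your Python’s ssl module was compiled against, ssl.OPENSSL_VERSION is a quick diagnostic:

python3 -c "import ssl; print(ssl.OPENSSL_VERSION)"
# prints e.g. "LibreSSL 2.8.3" on a stock macOS Python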

Solution

pip install urllib3==1.26.20

An older version from the urllib3 1.x series works; I picked the latest one. Now my app runs without the warning.

How to install the yaml package for Python?

I want to read the config for my backend from a YAML file, and when installing the yaml package I get the following error:

pip install yaml
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
ERROR: Could not find a version that satisfies the requirement yaml (from versions: none)
ERROR: No matching distribution found for yaml

Quick Answer

pip install pyyaml

More Details

This article explaining how to read YAML config files in Python doesn’t show how to install the package. That should be straightforward, but there is no such package as yaml. A quick search on pypi.org turned up this package: https://pypi.org/project/PyYAML/

I tried it, and it works like a charm with the same syntax as in the blog post. Now you can go ahead and try Ram’s example.
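
For completeness, here is a minimal sketch of reading a YAML config with PyYAML; the file name and keys are made up for illustration:

import yaml  # installed via pip install pyyaml

# hypothetical config.yaml:
#   host: localhost
#   port: 8080

with open("config.yaml") as f:
    config = yaml.safe_load(f)  # safe_load won't execute arbitrary YAML tags

print(config["host"], config["port"])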

Python-Alpaca Dataset

I came across this dataset recently, a collection of 22k Python code examples, tested and verified to work. What really caught my attention is how this was put together—they used a custom script to extract Python code from Alpaca-formatted datasets, tested each snippet locally, and only kept the functional ones. Non-functional examples were separated into their own file.

The dataset pulls from a mix of open-source projects like Wizard-LM’s Evol datasets, CodeUp’s 19k, and a bunch of others, plus some hand-prompted GPT-4 examples. Everything’s been deduplicated, so you’re not stuck with repeats.

It’s especially cool if you’re working on training AI models for coding tasks because it sidesteps one of the biggest issues with open datasets: non-functional or broken code. They even hinted at adapting the script for other languages like C++ or SQL.

If you use the dataset or their script, they ask for attribution: Filtered Using Vezora’s CodeTester. Oh, and they’re working on releasing an even bigger dataset with 220,000+ examples, definitely one to keep an eye on!

On Huggingface: Tested-22k-Python-Alpaca

Read also how to analyze a dataset.

Can’t install PyTorch on my MacBook

To my surprise, I wasn’t able to install PyTorch for a project on my MacBook Pro M1 today (macOS Sequoia 15.2). I kept getting this error when running pip3 install -r requirements.txt:

ERROR: Ignored the following versions that require a different python version: 1.21.2 Requires-Python >=3.7,<3.11; 1.21.3 Requires-Python >=3.7,<3.11; 1.21.4 Requires-Python >=3.7,<3.11; 1.21.5 Requires-Python >=3.7,<3.11; 1.21.6 Requires-Python >=3.7,<3.11; 1.26.0 Requires-Python <3.13,>=3.9; 1.26.1 Requires-Python <3.13,>=3.9
ERROR: Could not find a version that satisfies the requirement torch==2.7.0.dev20250116 (from versions: none)
ERROR: No matching distribution found for torch==2.7.0.dev20250116

I tried it manually with pip3 install torch, but no luck:

pip install torch
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
ERROR: Could not find a version that satisfies the requirement torch (from versions: none)
ERROR: No matching distribution found for torch

Solution

The pinned version torch==2.7.0.dev20250116 is a nightly build, and nightly builds aren’t published on PyPI; they come from PyTorch’s own nightly index. This is what I came up with, and it works fine:

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
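
A quick check that the nightly build actually imports (the exact version string will differ on your machine):

python3 -c "import torch; print(torch.__version__)"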

Fix: No GPU support in TensorFlow

I came across a problem where my TensorFlow installation did not recognize the installed GPU, despite CUDA and the NVIDIA drivers being installed properly.

Test:

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

returned an empty list. Furthermore, it reports that it cannot find the CUDA drivers:

2024-01-30 14:57:42.015454: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.

The output of nvidia-smi is fine, and nvcc shows that CUDA is installed:

ubuntu@ip-bla-foo:~/build-nb$ nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0

Which tells us it is version 12.1. Ahhh!💡

Now, CUDA 12 is a release from 2023, and my idea was that TensorFlow 2.13 might not know this version yet; see https://blog.tensorflow.org/2023/11/whats-new-in-tensorflow-2-15.html

Ok, the latest version pip offered was TF 2.13 on Python 3.8. Here is the fix:

  1. Upgrade Python: sudo apt install python3.9
  2. Create a new venv: virtualenv --python /usr/bin/python3.9 ~/.env-python3.9
  3. Activate it: source ~/.env-python3.9/bin/activate
  4. pip install --upgrade pip
  5. python3 -m pip install tensorflow[and-cuda]==2.15.0.post1

Test: python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

2024-01-30 15:27:04.458720: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-01-30 15:27:04.458772: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-01-30 15:27:04.459601: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-01-30 15:27:04.465334: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-01-30 15:27:05.115551: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-01-30 15:27:05.560865: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-01-30 15:27:05.585883: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-01-30 15:27:05.586100: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

Now we see the GPU in TensorFlow.
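
To double-check that the GPU is not just detected but actually usable for compute, a small test computation works; this one-liner is just an illustrative sketch:

python3 -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"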

Large vs Small LLMs – Thoughts

If you are working on a task that is very specific, a smaller LLM may be able to learn the task-specific patterns more quickly than a larger one. Additionally, if you are running on a resource-constrained device, a smaller LLM may be the only option. Read in this blog post how to prepare an LLM for a specific task.

Benefits of large LLMs, such as 70B

Large language models (LLMs) with more parameters are typically trained on larger datasets. Parameters represent the connections between the neurons in the LLM’s neural network: the more parameters there are, the more connections there are, the more complex the network can be, and the more data it can absorb.

Benefits of smaller LLMs, such as 6B or 770M

If I have a task that requires Python, I don’t need a model trained on Haskell, Go, and Rust. This is because LLMs that are trained on a variety of programming languages can often overfit to the training data, which can make them less effective for generating code in a specific language.

An LLM that is trained on a large dataset of Python, Haskell, Go, and Rust code may be able to generate code in all of these languages. However, it may not be as good at generating idiomatic Python code as an LLM that is specifically trained on Python code.

If you have a task that requires Python, it is generally best to use an LLM that is specifically trained on Python code. This will give you the best chance of generating code that is syntactically correct, semantically meaningful, and idiomatic.

A 6B model is significantly more convenient for many purposes: it is less expensive to operate, it runs on your laptop, and it may even be more accurate on that specific language if the training data is good.

A good way to decide whether to use an LLM that is trained on multiple programming languages or an LLM that is specifically trained on one programming language is to experiment with both and see which one works better for your task.
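
Here is a minimal sketch of such an experiment against a local Ollama server; the two model names and the prompt are placeholders, pick whatever you want to compare:

import requests

PROMPT = "Write a Python function that reverses a string."

# run the same prompt through a small and a larger model and eyeball the results
for model in ["gemma3:4b", "mistral"]:
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": PROMPT, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    print(f"--- {model} ---")
    print(resp.json()["response"])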

Printing Stack Traces in Python Exceptions: A Comprehensive Guide

Uncover the Root of Errors with Detailed Tracebacks

Encountering errors in Python is inevitable. But with stack traces, you can pinpoint the exact location of an exception and diagnose issues effectively. Here’s a step-by-step guide:

1. Import the traceback Module:

import traceback

2. Utilize Exception Handling:

try:
    processEvent()  # call the function that might raise an exception
except Exception as e:
    print("Error encountered:", e)
    traceback.print_exc()  # print the detailed stack trace

Key Points:

  • traceback.print_exc(): Prints a formatted traceback to the console.
  • Error Message: The print("Error encountered:", e) line displays the error message itself.
  • Traceback Structure:
    • Each line represents a function call leading to the exception.
    • The first entry is the outermost call and the last entry is the most recent one, where the exception was raised (hence "most recent call last").

Example Output:

Error encountered: An error occurred!
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "<stdin>", line 2, in processEvent
Exception: An error occurred!

Additional Insights:

  • Customizing Output:
    • traceback.format_exc() returns the formatted traceback as a string for further manipulation.
    • traceback.print_exception() offers more control over output formatting.
  • Logging Stack Traces: Consider logging stack traces for debugging or analysis purposes (see the sketch below).
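
As a sketch of that last point, the standard logging module does this out of the box: logging.exception() records the message together with the current traceback (the file name and handler setup here are just for illustration):

import logging

logging.basicConfig(filename="app.log", level=logging.ERROR)

try:
    1 / 0
except ZeroDivisionError:
    # logging.exception logs at ERROR level and appends the traceback
    logging.exception("Calculation failed")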

Troubleshooting with Confidence:

  • By effectively printing and understanding stack traces, you’ll be equipped to resolve Python exceptions efficiently and maintain code stability.