How to run Ollama in CircleCI

Yes, it’s absolutely possible to run a small LLM like Gemma 3 4B using Ollama in a basic CircleCI pipeline, integrating AI capabilities directly into your CI/CD workflows. Its capabilities are limited, of course, but you can use it for agents or semantic unit tests.

Here is an example of a CircleCI config that uses Ollama and runs on the free plan (large resource class). It demonstrates how to use the Ollama Docker image in a CI pipeline and assumes you want to pull a model and run a basic script against the Ollama service.

version: 2.1

jobs:
  ollama-example:
    docker:
      - image: cimg/python:3.9
      - image: ollama/ollama:latest
        name: ollama
    resource_class: large
    steps:
      - checkout
      - run:
          name: Wait for Ollama to start
          command: |
            until curl -s http://ollama:11434/; do
              echo "Waiting for Ollama to start..."
              sleep 5
            done
      - run:
          name: Pull Gemma3 Model Using Web API
          command: |
            curl -X POST http://ollama:11434/api/pull \
              -H "Content-Type: application/json" \
              -d '{"model": "gemma3:4b"}'
      - run:
          name: Run a Python script using Ollama
          command: |
            python script.py

workflows:
  version: 2
  ollama-workflow:
    jobs:
      - ollama-example

And the Python script:

import requests
from pprint import pprint

# Ollama's generation endpoint is /api/generate; "stream": False returns
# a single JSON object instead of a stream of chunks.
response = requests.post(
    'http://ollama:11434/api/generate',
    json={'model': 'gemma3:4b', 'prompt': 'Hello, Ollama!', 'stream': False}
)
pprint(response.json())

This configuration is simple and can be used as a starting point to work on integrating Ollama into a CI pipeline.

Semantic Unit Tests

Unit tests traditionally focus on verifying exact outputs, but how do we test output that might change slightly from run to run, such as an LLM’s answer to the same question?

Luckily, using a SemanticTestCase we can test semantic correctness in Python rather than rigid string matches. This is useful for applications like text validation, classification, or summarization, where there’s more than one “correct” answer.

Traditional vs. Semantic Testing

  • Traditional Unit Test

A standard test might look like this:

import unittest
from text_validator import validate_text

class TestTextValidator(unittest.TestCase):
    def test_profane_text(self):
        self.assertFalse(validate_text("This is some bad language!")) 
    def test_clean_text(self):
        self.assertTrue(validate_text("Hello, how are you?"))

Here, validate_text() returns True or False, but it assumes there’s a strict set of phrases that are “bad” or “good.” Edge cases like paraphrased profanity might be missed.

  • Semantic Unit Test

Instead of rigid assertions, we can use SemanticTestCase to evaluate the meaning of the response:

self.assertSemanticallyEqual("Blue is the sky.", "The sky is blue.")

A test case:


class TestTextValidator(SemanticTestCase):
    """
    We're testing the SemanticTestCase here
    """

    def test_semantic(self):
        self.assertSemanticallyCorrect(longer_text, "It is a public holiday in Ireland")
        self.assertSemanticallyIncorrect(longer_text, "It is a public holiday in Italy")
        self.assertSemanticallyEqual("Blue is the sky.", "The sky is blue.")

Here, assertSemanticallyCorrect() and its siblings use an LLM to classify the input and return a judgment. Instead of exact matches, we test whether the response aligns with our expectation.
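The library internals aren’t shown here, but a minimal sketch of how such a base class could be wired up might look like this. The llm_judge stub and the prompt wording are my assumptions; a real implementation would send the question to a local model, e.g. via Ollama’s /api/generate, and parse a yes/no answer:

```python
import unittest

def llm_judge(question: str) -> bool:
    """Stand-in for a real LLM call. A real implementation would POST
    `question` to a local model (e.g. Ollama at /api/generate) and parse
    a yes/no answer; this stub only keeps the sketch self-contained."""
    q = question.lower()
    return "the sky is blue" in q and "blue is the sky" in q

class SemanticTestCase(unittest.TestCase):
    judge = staticmethod(llm_judge)  # swap in an LLM-backed judge here

    def assertSemanticallyEqual(self, a: str, b: str) -> None:
        question = ('Do these two sentences mean the same thing? '
                    f'Answer yes or no. 1: "{a}" 2: "{b}"')
        if not self.judge(question):
            self.fail(f"not semantically equal: {a!r} vs {b!r}")
```

With a real judge in place, assertSemanticallyCorrect() and assertSemanticallyIncorrect() follow the same pattern with a different prompt.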

Why This Matters

• AI systems often output slightly different versions of the same sentence when repeated. This makes traditional unittest asserts very hard to use, but SemanticTestCase allows us to compare these outputs as well.

• Handles paraphrased inputs: Profanity, toxicity, or policy violations don’t always follow exact patterns.

• More flexible testing: Works for tasks like summarization or classification, where exact matches aren’t realistic.

Some words on practicalities

Execution speed: Running an LLM for each test could be slower than traditional unit tests. But it is surprisingly fast on my Mac M1 with local Ollama and a laptop-sized LLM such as Gemma.

The speed is affected by the size of the prompt (or context); it is fast when comparing just a few sentences. Furthermore, the LLM stays loaded between assertions, which also contributes to the speed.

Data protection: if handling sensitive data is a concern, install a local LLM e.g. using Ollama. Still quite fast.

NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the ‘ssl’ module is compiled with ‘LibreSSL 2.8.3’.

site-packages/urllib3/__init__.py:34: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020

It turns out that urllib3 version 2 requires OpenSSL to work properly. Early 2.x releases refused to run entirely in this situation; later ones throw this warning. My current ssl module, however, is compiled against LibreSSL. The idea is to install a urllib3 version that is compatible with LibreSSL.
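You can check which SSL library your Python is linked against directly from the standard library:

```python
import ssl

# Reports the library the ssl module was compiled against,
# e.g. "LibreSSL 2.8.3" or "OpenSSL 3.0.13 30 Jan 2024".
print(ssl.OPENSSL_VERSION)
```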

Solution

pip install urllib3==1.26.20

Any release from the urllib3 1.x series works; I picked the latest, 1.26.20. Now my app runs without the warning.

How to install the yaml package for Python?

I want to read the config for my backend from a yaml file and when installing the yaml package I am getting the following error:

pip install yaml
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
ERROR: Could not find a version that satisfies the requirement yaml (from versions: none)
ERROR: No matching distribution found for yaml

Quick Answer

pip install pyyaml

More Details

This article explaining how to read yaml config files in Python doesn’t show how to install the package. It should be straightforward, but there is no package named yaml. A quick search on pypi.org turned up this package: https://pypi.org/project/PyYAML/

I tried it and it works like a charm with the same syntax as in the blog post. Now you can go ahead and try Ram’s example.
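Once PyYAML is installed, the module is imported as yaml. A minimal sketch, using an inline document in place of a config file (the config keys here are made up for illustration):

```python
import yaml  # provided by the PyYAML package

# a small inline document standing in for a config file
doc = """
backend:
  host: localhost
  port: 8080
"""

config = yaml.safe_load(doc)
print(config["backend"]["port"])  # 8080
```

safe_load() is the recommended entry point; it parses plain data without executing arbitrary Python tags.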

Can’t install PyTorch on my Macbook

To my surprise, I wasn’t able to install PyTorch for a project on my MacBook Pro M1 today (macOS Sequoia 15.2). I kept getting this error when running pip3 install -r requirements.txt:

ERROR: Ignored the following versions that require a different python version: 1.21.2 Requires-Python >=3.7,<3.11; 1.21.3 Requires-Python >=3.7,<3.11; 1.21.4 Requires-Python >=3.7,<3.11; 1.21.5 Requires-Python >=3.7,<3.11; 1.21.6 Requires-Python >=3.7,<3.11; 1.26.0 Requires-Python <3.13,>=3.9; 1.26.1 Requires-Python <3.13,>=3.9
ERROR: Could not find a version that satisfies the requirement torch==2.7.0.dev20250116 (from versions: none)
ERROR: No matching distribution found for torch==2.7.0.dev20250116

I tried it manually: pip3 install torch, no luck:

pip install torch
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
ERROR: Could not find a version that satisfies the requirement torch (from versions: none)
ERROR: No matching distribution found for torch

Solution

This is what I came up with and it works fine:

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu

How to build an AI Agent with a memory

How to Build an Agent with a Local LLM and RAG, Complete with Local Memory

If you want to build an agent with a local LLM that can remember things and retrieve them on demand, you’ll need a few components: the LLM itself, a Retrieval-Augmented Generation (RAG) system, and a memory mechanism. Here’s how you can piece it all together, with examples using LangChain and Python. (and here is why a small LLM is a good idea)

Step 1: Set Up Your Local LLM

First, you need a local LLM. This could be a smaller pre-trained model like LLaMA or GPT-based open-source options running on your machine. The key is that it’s not connected to the cloud—it’s local, private, and under your control. Make sure the LLM is accessible via an API or similar interface so that you can integrate it into your system. A good choice is Ollama with an LLM such as Google’s Gemma. I also wrote easy-to-follow instructions on how to set up a T5 LLM from Salesforce locally, but it is also perfectly fine to use a cloud-based LLM.

In case the agent you want to build is about source code, here is an example of how to use CodeT5 with LangChain.

Step 2: Add Retrieval-Augmented Generation (RAG)

TL;DR: Gist on Github

Next comes the RAG. A RAG system works by combining your LLM with an external knowledge base. The idea is simple: when the LLM encounters a query, the RAG fetches relevant information from your knowledge base (documents, notes, or even structured data) and feeds it into the LLM as context.

To set up RAG, you’ll need:

  1. A Vector Database: This is where your knowledge will live. Tools like Pinecone, Weaviate, or even local implementations like FAISS can store your data as embeddings.
  2. A Way to Query the Vector Database: Use similarity search to find the most relevant pieces of information for any given query.
  3. Integration with the LLM: Once the RAG fetches data, format it and pass it as input to the LLM.

I have good experience with LangChain and Chroma:

# requires: langchain, langchain-community, langchain-text-splitters,
#           langchain-chroma, langchain-ollama
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter
from langchain_chroma import Chroma
from langchain_ollama import OllamaEmbeddings, OllamaLLM
from langchain.chains import RetrievalQA

documents = TextLoader("my_data.txt").load()
texts = CharacterTextSplitter(chunk_size=300, chunk_overlap=100).split_documents(documents)
retriever = Chroma.from_documents(texts, OllamaEmbeddings(model="gemma:latest")).as_retriever()

llm = OllamaLLM(model="gemma:latest")
qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)

qa_chain.invoke("What is the main topic of my document?")

Step 3: Introduce Local Memory

Now for the fun part: giving your agent memory. Memory is what allows the agent to recall past interactions or store information for future use. There are a few ways to do this:

  • Short-Term Memory: Store conversation context temporarily. This can simply be a rolling buffer of recent interactions that gets passed back into the LLM each time.
  • Long-Term Memory: Save important facts or interactions for retrieval later. For this, you can extend your RAG system by saving interactions as embeddings in your vector database.

For example:

  1. After each interaction, decide if it’s worth remembering.
  2. If yes, convert it into an embedding and store it in your vector database.
  3. When needed, retrieve it alongside other RAG data to give the agent a sense of history.
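The steps above can be sketched with a tiny pure-Python memory that stores text alongside an embedding and retrieves by cosine similarity. The letter-frequency embed() is a stand-in I made up for illustration; in practice you would use OllamaEmbeddings and a real vector store such as Chroma:

```python
import math

class VectorMemory:
    """Toy long-term memory: stores (text, embedding) pairs and
    retrieves the most similar entries by cosine similarity."""

    def __init__(self, embed):
        self.embed = embed   # function: str -> list[float]
        self.entries = []    # list of (text, embedding)

    def remember(self, text):
        self.entries.append((text, self.embed(text)))

    def recall(self, query, k=1):
        qv = self.embed(query)
        ranked = sorted(self.entries, key=lambda e: -self._cos(qv, e[1]))
        return [text for text, _ in ranked[:k]]

    @staticmethod
    def _cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

# Toy "embedding": letter-frequency vector, purely for illustration.
def embed(text):
    v = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            v[ord(ch) - 97] += 1
    return v

mem = VectorMemory(embed)
mem.remember("The user's favourite colour is blue.")
mem.remember("The meeting is on Friday at 10am.")
print(mem.recall("when is the meeting?"))
```

Swap embed() for a real embedding model and the same remember/recall shape carries over to a vector database.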

Langchain Example

from langchain.memory import ConversationBufferMemory

# Initialize memory
memory = ConversationBufferMemory()

# Save some conversation turns
memory.save_context({"input": "Hello"}, {"output": "Hi there!"})
memory.save_context({"input": "How are you?"}, {"output": "I'm doing great, thanks!"})

# Retrieve stored memory
print(memory.load_memory_variables({}))

Step 4: Put It All Together

Now you can combine these elements:

  • The user sends a query.
  • The system retrieves relevant data via RAG.
  • The memory module checks for related interactions or facts.
  • The LLM generates a response based on the query, retrieved context, and memory.

This setup is powerful because it blends the LLM’s generative abilities with a custom memory tailored to your needs. It’s also entirely local, so your data stays private and secure.

Final Thoughts

Building an agent like this might sound complex, but it’s mostly about connecting the dots between well-known tools. Once you’ve got it running, you can tweak and fine-tune it to handle specific tasks or remember things better. Start small, iterate, and soon you’ll have an agent that feels less like software and more like a real assistant.

Fix: No GPU support in Tensorflow

I came across a problem where my TensorFlow installation did not recognize the installed GPU, despite CUDA and the Nvidia drivers being installed properly.

Test:

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

returned an empty list. Furthermore, TensorFlow reports that it cannot find the CUDA drivers:

2024-01-30 14:57:42.015454: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.

Output of the Nvidia tools is correct and shows CUDA is installed:

nvidia-smi

ubuntu@ip-bla-foo:~/build-nb$  nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0

This tells us it is version 12. Ahhh! 💡

Now, 12 is a version from 2023, and my idea was that TensorFlow 2.13 might not support this version; see https://blog.tensorflow.org/2023/11/whats-new-in-tensorflow-2-15.html

Ok, the latest version pip offered was TF 2.13 on Python 3.8. Here is the fix:

  1. upgrade Python: sudo apt install python3.9
  2. create a new venv: virtualenv --python /usr/bin/python3.9 ~/.env-python3.9
  3. source ~/.env-python3.9/bin/activate
  4. pip install --upgrade pip
  5. python3 -m pip install tensorflow[and-cuda]==2.15.0.post1

Test: python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

2024-01-30 15:27:04.458720: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-01-30 15:27:04.458772: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-01-30 15:27:04.459601: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-01-30 15:27:04.465334: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-01-30 15:27:05.115551: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-01-30 15:27:05.560865: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-01-30 15:27:05.585883: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-01-30 15:27:05.586100: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

Now we see the GPU in Tensorflow.

The Costs of Upgrading a Library in your SW Stack

As developers, we are constantly striving to improve our software, adopting the latest advancements to keep our applications running smoothly and efficiently. However, with progress comes change, and sometimes, that means facing the challenge of upgrading libraries when API concepts have evolved. In this blog post, we’ll dive into the costs associated with upgrading a library due to changes in API concepts and explore strategies to mitigate the impact on your codebase.

Understanding API Changes: The Evolution of Software Libraries

Software libraries are the building blocks of modern applications, providing developers with pre-built functionality and tools to save time and effort. As libraries evolve, their API concepts may change to accommodate new features, improve performance, or fix bugs. While these updates bring value and innovation, they can also introduce compatibility issues with existing code, leading to the need for upgrades.

Identifying the Costs of Upgrading

Upgrading a library with changed API concepts can have several costs associated with it:

a. Code Rewriting: API changes might render parts of your existing code incompatible. This could necessitate rewriting sections of your codebase to align with the updated library, resulting in additional development time and effort.

b. Testing and Debugging: Upgrades can introduce unexpected behavior or bugs. Rigorous testing and debugging are crucial to ensure the new version of the library functions correctly and doesn’t disrupt the existing functionalities.

c. Delayed Development: The process of upgrading can temporarily halt other development tasks, as developers focus on adapting the codebase to the changes. This might impact project timelines and deliverables.

d. Learning Curve: With updated API concepts, developers need time to understand the changes thoroughly. This learning curve can slow down the development process, especially for large or complex libraries.

Mitigating the Impact of Library Upgrades

While the costs of upgrading a library cannot be entirely eliminated, developers can take proactive steps to minimize their impact:

a. Regular Code Maintenance: Consistently reviewing and maintaining your codebase can make it more adaptable to future changes in APIs. This includes using best coding practices, avoiding deprecated features, and documenting crucial elements.

b. Version Control: Leveraging version control systems like Git allows you to manage library updates efficiently. By creating separate branches for upgrades, you can isolate changes and test them before merging into the main codebase.

c. Test-Driven Development (TDD): Implementing TDD ensures that your codebase remains stable even after library upgrades. Writing test cases before modifying code helps catch potential issues early on and ensures that new changes don’t break existing functionalities.

d. Community Support: Utilize online developer communities, forums, and documentation to seek advice and share experiences with library upgrades. Collaborating with other developers can provide valuable insights and solutions.

Conclusion

Upgrading a library when API concepts have changed is a necessary yet challenging aspect of software development. While it incurs costs such as code rewriting, testing, and delayed development, taking a proactive approach and utilizing best practices can help mitigate these challenges. Embrace upgrades as opportunities to improve your application and stay ahead of the curve. By staying informed, collaborating with the community, and maintaining your codebase diligently, you can tackle library upgrades with confidence and continue delivering exceptional software to your users.

How to read long compiler outputs

Reading long compiler outputs can be overwhelming and time-consuming, but there are several steps you can take to make it easier:

  1. Scan for error messages: Look for the word “error” in the output, as this indicates a problem that needs to be fixed. Start by fixing the first error, as it may resolve subsequent errors.
  2. Look for error messages that are repeated: If the same error message is repeated multiple times, it may be easier to resolve all instances of the error at once.
  3. Locate the file and line number of the error: The compiler will usually provide the name of the file and line number where the error occurred. This information can be used to quickly locate the problem in your code.
  4. Read the error message carefully: The error message will usually give you a clue as to what the problem is and how to fix it. Pay close attention to the error message and take the time to understand what it is telling you.
  5. Use a text editor with error navigation: Some text editors have plugins that can automatically parse the compiler output and allow you to quickly navigate to the location of the error in your code.
  6. Consult online resources: If you are not sure how to resolve an error, you can consult online resources such as Stack Overflow, the GCC documentation, or other forums.
  7. Try to understand the root cause of the error: Compiler errors often have multiple causes, so try to understand the root cause of the error so you can fix it for good.

By following these steps, you can make reading long compiler outputs easier and more manageable.
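For step 1, a shell one-liner is often enough. Here is a sketch on a fabricated log file (the file name and error messages are made up for illustration):

```shell
# simulate a long compiler log
printf '%s\n' \
  'main.c: In function main:' \
  'main.c:12:5: error: use of undeclared identifier "foo"' \
  'main.c:14:9: warning: unused variable "x"' \
  'main.c:30:1: error: expected declaration' > build.log

# show only the error lines, first ones first
grep 'error' build.log | head -n 5
```

Fix the first error shown, recompile, and repeat; later errors are often just fallout from the first one.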

Git setup in Jenkins Pipeline

I’ll show how to set up Jenkins and SSH keys to clone a git repo in a build step in the Jenkins pipeline. This wasn’t straightforward at all; several obstacles were in the way and had to be removed.

I am using an Ubuntu 22.04 host, Jenkins 2.375.1, Jenkins Pipeline and docker based agents running Ubuntu as well.

SSH Setup for git clone

Pipeline itself can do the git clone, so we don’t need to hassle with running ‘git clone’ on the agent, which comes with its own problems: we would have to find a safe and secure way to put the ssh keys into the agent. With Pipeline doing the checkout, the key stays with the host.

Basically, what we need to do is generate ssh keys for the jenkins user on Ubuntu, distribute them correctly to the git server, and set up the credentials in Jenkins. Then we can add the git step in the Pipeline.

You might easily run into problems here, as it is a bit tricky to find the right settings. This is what worked for me, let’s start:

Create an ssh key under the jenkins user. The easiest way to do this is to log in as that user and create the keys in its home dir. First, give the jenkins user a password:

sudo passwd jenkins

Now you can login as this user:

su jenkins

This user’s home dir, JENKINS_HOME, is under /var/lib/jenkins/ (at least on Ubuntu 22.04). Make sure you are in this directory when you create your key.

ssh-keygen

You can leave the passphrase empty. This generates a private key and a public key file under /var/lib/jenkins/.ssh/. The .ssh folder and the files need the following permissions:

-rw-------  1 jenkins jenkins 2602 Dec  6 11:10 id_rsa
-rw-r--r--  1 jenkins jenkins  569 Dec  6 11:10 id_rsa.pub

which corresponds to 600 and 644. The .ssh folder itself has 700 (drwx------).
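These permissions can be set with chmod. This sketch uses a temp directory so nothing real is touched; on the Jenkins host you would run the chmod lines against /var/lib/jenkins/.ssh instead:

```shell
# demo in a temp dir; on the real host the directory is /var/lib/jenkins/.ssh
SSH_DIR="$(mktemp -d)/.ssh"
mkdir -p "$SSH_DIR"
touch "$SSH_DIR/id_rsa" "$SSH_DIR/id_rsa.pub"

chmod 700 "$SSH_DIR"             # drwx------
chmod 600 "$SSH_DIR/id_rsa"      # -rw-------
chmod 644 "$SSH_DIR/id_rsa.pub"  # -rw-r--r--

stat -c '%a %n' "$SSH_DIR/id_rsa" "$SSH_DIR/id_rsa.pub"
```

sshd refuses keys whose permissions are too open, so getting these bits right matters.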

Add the key to your git server:

ssh-copy-id git@yourgitserver

Test it:

ssh -vvv git@yourgitserver

If something goes wrong with your key, ssh will offer you password login and it looks like this:

debug3: authmethod_is_enabled publickey
debug1: Next authentication method: publickey
debug1: Offering public key: /var/lib/jenkins/.ssh/id_rsa RSA SHA256:*************************
debug3: send packet: type 50
debug2: we sent a publickey packet, wait for reply
debug3: receive packet: type 51
debug1: Authentications that can continue: publickey,password
debug1: Trying private key: /var/lib/jenkins/.ssh/id_ecdsa
debug3: no such identity: /var/lib/jenkins/.ssh/id_ecdsa: No such file or directory
debug1: Trying private key: /var/lib/jenkins/.ssh/id_ecdsa_sk
debug3: no such identity: /var/lib/jenkins/.ssh/id_ecdsa_sk: No such file or directory
debug1: Trying private key: /var/lib/jenkins/.ssh/id_ed25519
debug3: no such identity: /var/lib/jenkins/.ssh/id_ed25519: No such file or directory
debug1: Trying private key: /var/lib/jenkins/.ssh/id_ed25519_sk
debug3: no such identity: /var/lib/jenkins/.ssh/id_ed25519_sk: No such file or directory
debug1: Trying private key: /var/lib/jenkins/.ssh/id_xmss
debug3: no such identity: /var/lib/jenkins/.ssh/id_xmss: No such file or directory
debug1: Trying private key: /var/lib/jenkins/.ssh/id_dsa
debug3: no such identity: /var/lib/jenkins/.ssh/id_dsa: No such file or directory
debug2: we did not send a packet, disable method
debug3: authmethod_lookup password
debug3: remaining preferred: ,password
debug3: authmethod_is_enabled password
debug1: Next authentication method: password
git@victory's password: 

Check everything again, including the permissions of the files on the git server side, e.g. authorized_keys (spelling, 600 / -rw-------).

If things go well, it looks like this:

debug3: receive packet: type 60
debug1: Server accepts key: /var/lib/jenkins/.ssh/id_rsa RSA SHA256:*************************
debug3: sign_and_send_pubkey: using publickey-hostbound-v00@openssh.com with RSA SHA256:*************************
debug3: sign_and_send_pubkey: signing using rsa-sha2-512 SHA256:*************************
debug3: send packet: type 50
debug3: receive packet: type 52

...

debug2: shell request accepted on channel 0
Welcome to Ubuntu 22.04.1 LTS (GNU/Linux 5.15.0-56-generic x86_64)

Now you can set up an identity to use this key. Go to Dashboard – your project and click Configure. Click Pipeline Syntax (at the bottom), choose the Snippet Generator and select git. Under Credentials there is a + button to add one. Choose “SSH username with private key” and leave the scope as it is. Set ID and Description to whatever makes sense to you. Username: jenkins. Private key: enter directly. Here you paste the content of your ~/.ssh/id_rsa file (the private part). Caution: this shouldn’t slip anywhere else, so sharpen your copy-and-paste skills. Then click Add.

Git in the Pipeline

Now, when you also put your git repo there, you get right away the right git clone snippet for your Jenkins pipeline:

git credentialsId: 'jenkins-git', url: 'git@yourgitserver:/your-repo.git'

Embedded in the pipeline it can look like this:

pipeline {
    agent any

    stages {
        stage('Git Checkout') {
            steps {
                git credentialsId: 'jenkins-git', url: 'git@yourgitserver:/your-repo.git'
            }
        }
    }
}

Now, under Dashboard – Manage Jenkins – Manage Credentials you should see your key, and you can change it there.

Host key acceptance

In order to use git in a Jenkins Pipeline, you must also ensure the host key is accepted. I got errors in the clone step indicating that the host key is not known and cannot be accepted.

stdout: 
stderr: No ECDSA host key is known for victory and you have requested strict checking.

To fix this, you go to Dashboard – Manage Jenkins – Configure Global Security, look for Git Host Key Verification Configuration and change the strategy to Accept first connection.

Now you can build your project and it should be able to clone your git repo.