How to run Ollama in CircleCI

Yes, it’s absolutely possible! You can run a small LLM like Gemma 3 4B using Ollama in a basic CircleCI pipeline to integrate AI capabilities directly into your CI/CD workflows. Of course, its capabilities are limited, but you can use it for agents or semantic unit tests.

Here is an example of a CircleCI config that uses Ollama and runs on the free plan (large resource class). It demonstrates how to use the Ollama Docker image in a CI pipeline and assumes you want to pull a model and run a basic script against the Ollama service.

version: 2.1

jobs:
  ollama-example:
    docker:
      - image: cimg/python:3.9
      - image: ollama/ollama:latest
        name: ollama
    resource_class: large
    steps:
      - checkout
      - run:
          name: Wait for Ollama to start
          command: |
            until curl -s http://ollama:11434/; do
              echo "Waiting for Ollama to start..."
              sleep 5
            done
      - run:
          name: Pull Gemma3 Model Using Web API
          command: |
            curl -X POST http://ollama:11434/api/pull \
              -H "Content-Type: application/json" \
              -d '{"model": "gemma3:4b"}'
      - run:
          name: Run a Python script using Ollama
          command: |
            python script.py

workflows:
  version: 2
  ollama-workflow:
    jobs:
      - ollama-example

And the Python script:

import requests
from pprint import pprint

# The generation endpoint is /api/generate; streaming is disabled so the
# reply arrives as a single JSON object that response.json() can parse.
response = requests.post(
    'http://ollama:11434/api/generate',
    json={'model': 'gemma3:4b', 'prompt': 'Hello, Ollama!', 'stream': False}
)
pprint(response.json())

This configuration is simple and can be used as a starting point to work on integrating Ollama into a CI pipeline.

Semantic Unit Tests

Unit tests traditionally focus on verifying exact outputs, but how do we test output that might vary slightly, such as an LLM’s answers to the same question?

Luckily, using a SemanticTestCase we can test semantic correctness in Python rather than relying on rigid string matches. This is useful for applications like text validation, classification, or summarization, where there’s more than one “correct” answer.

Traditional vs. Semantic Testing

  • Traditional Unit Test

A standard test might look like this:

import unittest
from text_validator import validate_text

class TestTextValidator(unittest.TestCase):
    def test_profane_text(self):
        self.assertFalse(validate_text("This is some bad language!"))

    def test_clean_text(self):
        self.assertTrue(validate_text("Hello, how are you?"))

Here, validate_text() returns True or False, but it assumes there’s a strict set of phrases that are “bad” or “good.” Edge cases like paraphrased profanity might be missed.

  • Semantic Unit Test

Instead of rigid assertions, we can use SemanticTestCase to evaluate the meaning of the response:

self.assertSemanticallyEqual("Blue is the sky.", "The sky is blue.")

A test case:

# longer_text is assumed to hold a passage about a public holiday in Ireland
class TestTextValidator(SemanticTestCase):
    """
    We're testing the SemanticTestCase here
    """

    def test_semantic(self):
        self.assertSemanticallyCorrect(longer_text, "It is a public holiday in Ireland")
        self.assertSemanticallyIncorrect(longer_text, "It is a public holiday in Italy")
        self.assertSemanticallyEqual("Blue is the sky.", "The sky is blue.")

Here, assertSemanticallyCorrect() and its siblings use an LLM to classify the input and return a judgment. Instead of exact matches, we test whether the response aligns with our expectation.
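As an illustration, a minimal SemanticTestCase could be built on top of unittest and a local Ollama server. This is only a sketch under assumptions: the prompt wording, model name, endpoint, and the yes/no parsing are mine, not the original implementation.

```python
# Minimal sketch of a SemanticTestCase built on unittest and a local Ollama
# server. Prompt wording, model name, and yes/no parsing are assumptions.
import json
import unittest
import urllib.request


class SemanticTestCase(unittest.TestCase):
    model = "gemma3:4b"
    base_url = "http://localhost:11434"

    def _ask(self, prompt):
        """Send a prompt to Ollama and return the lower-cased reply."""
        req = urllib.request.Request(
            f"{self.base_url}/api/generate",
            data=json.dumps({"model": self.model,
                             "prompt": prompt,
                             "stream": False}).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)["response"].strip().lower()

    def assertSemanticallyEqual(self, first, second):
        answer = self._ask(
            "Do these two sentences mean the same thing? "
            f"Answer only yes or no.\n1: {first}\n2: {second}")
        self.assertTrue(answer.startswith("yes"),
                        f"Not semantically equal: {first!r} vs {second!r}")

    def assertSemanticallyCorrect(self, text, statement):
        answer = self._ask(
            "Based only on the following text, is the statement correct? "
            f"Answer only yes or no.\nText: {text}\nStatement: {statement}")
        self.assertTrue(answer.startswith("yes"))

    def assertSemanticallyIncorrect(self, text, statement):
        answer = self._ask(
            "Based only on the following text, is the statement correct? "
            f"Answer only yes or no.\nText: {text}\nStatement: {statement}")
        self.assertTrue(answer.startswith("no"))
```

Because the LLM call is isolated in `_ask`, it can be pointed at a different host via `base_url`, or stubbed out entirely when testing the test class itself.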

Why This Matters

• AI systems often output slightly different versions of the same sentence when asked repeatedly. This makes traditional unittest asserts very hard to use, but SemanticTestCase can compare such outputs as well.

• Handles paraphrased inputs: Profanity, toxicity, or policy violations don’t always follow exact patterns.

• More flexible testing: Works for tasks like summarization or classification, where exact matches aren’t realistic.

Some words on …

Execution speed: Running an LLM for each test could be slower than traditional unit tests. But it is surprisingly fast on my Mac M1 with local Ollama and a laptop-sized LLM such as Gemma.

The speed is affected by the size of the prompt (or context); comparing just a few sentences is fast. Furthermore, the LLM stays loaded between assertions, which also contributes to the speed.

Data protection: if handling sensitive data is a concern, run a local LLM, e.g. using Ollama. It is still quite fast.

Python-Alpaca Dataset

I came across this dataset recently, a collection of 22k Python code examples, tested and verified to work. What really caught my attention is how it was put together: they used a custom script to extract Python code from Alpaca-formatted datasets, tested each snippet locally, and only kept the functional ones. Non-functional examples were separated into their own file.
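The filtering pass described above can be sketched in a few lines of Python. The helper names, the timeout, and the subprocess approach are my assumptions; the authors’ actual CodeTester script may work differently.

```python
# Sketch of the filtering idea: run each extracted Python snippet in a
# subprocess and keep only the ones that execute cleanly. The timeout and
# function names are illustrative assumptions.
import os
import subprocess
import sys
import tempfile


def snippet_works(code: str, timeout: float = 10.0) -> bool:
    """Return True if the snippet runs to completion with exit code 0."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path],
                                capture_output=True, timeout=timeout)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False
    finally:
        os.unlink(path)


def split_snippets(snippets):
    """Partition snippets into functional and non-functional ones,
    mirroring the two files the dataset authors describe."""
    good, bad = [], []
    for code in snippets:
        (good if snippet_works(code) else bad).append(code)
    return good, bad
```

Running each snippet in its own interpreter process keeps a crashing or hanging example from taking down the whole filtering run.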

The dataset pulls from a mix of open-source projects like Wizard-LM’s Evol datasets, CodeUp’s 19k, and a bunch of others, plus some hand-prompted GPT-4 examples. Everything’s been deduplicated, so you’re not stuck with repeats.

It’s especially cool if you’re working on training AI models for coding tasks because it sidesteps one of the biggest issues with open datasets: non-functional or broken code. They even hinted at adapting the script for other languages like C++ or SQL.

If you use the dataset or their script, they ask for attribution: Filtered Using Vezora’s CodeTester. Oh, and they’re working on releasing an even bigger dataset with 220,000+ examples, definitely one to keep an eye on!

On Hugging Face: Tested-22k-Python-Alpaca

Read also how to analyze a dataset.

ReDoc: Simplifying API Documentation for Open Source Developers

In the world of open source software development, creating user-friendly and informative API documentation is crucial. The right documentation can be the bridge that connects developers to your project, making it more accessible and inviting collaboration. That’s where ReDoc comes into the picture, offering a powerful solution for generating interactive API documentation with ease.

What Is ReDoc?

ReDoc is an open-source tool designed to simplify the process of creating interactive API documentation. It’s tailored for APIs that adhere to the OpenAPI Specification (formerly known as Swagger), which is a widely adopted standard for describing RESTful APIs. ReDoc takes your OpenAPI Specification file and transforms it into visually appealing and user-friendly documentation that developers love.

Why ReDoc?

As open source developers, we’re constantly seeking ways to make our projects more accessible and inviting to the community. High-quality API documentation is a significant part of this effort. Here’s why ReDoc is a game-changer for open source software development:

1. Interactivity: ReDoc creates interactive documentation that enables developers to explore API endpoints and responses in a user-friendly manner. This interactivity keeps users engaged and simplifies their learning experience.

2. A Modern Look: ReDoc offers a clean and modern design for your API documentation. It’s responsive, visually appealing, and aligns with the high standards that open source projects aim for.

3. OpenAPI Compatibility: If your API is described in an OpenAPI YAML or JSON file (and it should be!), ReDoc can seamlessly generate documentation from it. This ensures that your documentation is always in sync with your API.

4. Customization: ReDoc provides various customization options, allowing you to tailor the documentation to match your project’s branding and style. You can adjust colors, fonts, and other design elements to make it your own.

5. Ease of Integration: Integrating ReDoc into your existing documentation infrastructure is straightforward. You can host the generated documentation on your website, making it easily accessible to users.

6. Community and Support: ReDoc boasts an active and growing community of users and contributors. This means you can find support and resources when you need them.

7. Multiple Themes: ReDoc offers multiple pre-designed themes, making it easy to switch between different looks for your API documentation.

How to Get Started with ReDoc

Using ReDoc is as simple as 1-2-3:

  1. OpenAPI Specification: Make sure your API is described in an OpenAPI YAML or JSON file.
  2. Installation: Install ReDoc and specify the location of your OpenAPI Specification file.
  3. Customization: If desired, customize the documentation to match your project’s branding.
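As a concrete sketch of these steps, ReDoc can be served as a single static HTML page that loads the standalone bundle from a CDN; the spec file name openapi.yaml below is a placeholder for your own OpenAPI file.

```html
<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8" />
    <title>My API documentation</title>
  </head>
  <body>
    <!-- Point spec-url at your OpenAPI file; openapi.yaml is a placeholder -->
    <redoc spec-url="openapi.yaml"></redoc>
    <script src="https://cdn.redoc.ly/redoc/latest/bundles/redoc.standalone.js"></script>
  </body>
</html>
```

Hosting this page next to your spec file is all the “installation” many projects need.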

Conclusion

In the realm of open source software development, user-friendly API documentation is a non-negotiable aspect of project success. ReDoc empowers open source developers to create captivating and interactive documentation effortlessly. With ReDoc, your API documentation can be the key that invites developers into your project and fosters collaboration.

5 reasons why it makes sense to work with branches and tests in a single developer project

Working with branches and automated tests can bring a host of benefits to a single developer project, even if the project isn’t being worked on by multiple people. Here are some of the reasons why:

  1. Enhanced efficiency: When working with branches, a solo developer can tackle multiple features or bug fixes simultaneously, without having to worry about disrupting the main codebase. Additionally, by utilizing automated tests, the developer can validate that changes made in a branch don’t break existing functionality in a fast and efficient manner.
  2. Superior code quality: Automated tests can help catch bugs and issues early in the development process, long before they become problematic and harder to resolve. This leads to a more stable codebase and better code quality overall.
  3. Optimal version control: Branches allow a single developer to switch between different versions of code easily, as well as revert back to a previous version if necessary. This also makes it easier for the developer to manage code reviews and collaborate with other developers if the need arises in the future.
  4. Increased confidence: Automated tests provide a safety net for changes made in the code, which can give the developer more confidence when making modifications. If issues arise, the tests will quickly detect them, allowing the developer to fix them promptly.
  5. Support for experimentation: Branches make it possible for a developer to experiment with new ideas or approaches without affecting the main codebase. This can be especially valuable when exploring new technologies or finding new solutions to problems.

In conclusion, working with branches and automated tests can lead to improved efficiency, better code quality, optimal version control, increased confidence, and support for experimentation even in single developer projects. Whether you’re a beginner or an experienced developer, utilizing these tools can help streamline your development process and lead to better results.

Code Review – A Critical Component of Software Development

Code reviews are an essential aspect of software development that can significantly enhance the quality and reliability of your code. They provide an opportunity for developers to learn from one another, share their expertise, and collaborate on creating better code.

One of the key benefits of code reviews is improved code quality. Through code reviews, developers can identify and resolve potential bugs, performance issues, and security vulnerabilities before the code is released to production. This proactive approach can save time and resources in the long run, as it is more cost-effective to catch and fix problems early in the development process.

In addition to improving code quality, code reviews also facilitate knowledge sharing and best practices. Reviewing the code of others can help developers understand the codebase and learn new techniques for writing high-quality code. This sharing of knowledge and expertise can lead to increased efficiency and better collaboration among team members, as everyone works towards a common goal.

Code reviews also play a crucial role in enhancing team communication. By working together to review and improve code, developers can build a sense of teamwork and collaboration. This can result in better communication and higher-quality code, as everyone works together to ensure the code meets the necessary standards and specifications.

Consistency in code is another important aspect that can be maintained through code reviews. By ensuring that code follows established coding standards, code reviews make it easier to maintain and enhance the code over time. This can greatly reduce the time and effort required for code maintenance and updates, as everyone on the team follows the same standards and best practices.

Finally, code reviews can also improve documentation, making it easier for others to understand and work with the code in the future. By reviewing the documentation and ensuring its completeness and accuracy, code reviews can help ensure that the code is well-documented and easy to understand.

In conclusion, code reviews are a valuable tool that can provide numerous benefits for both individual developers and development teams. Incorporating code reviews into your development process can help you write better code, share knowledge, communicate effectively, maintain consistency, and enhance documentation. Don’t overlook the importance of code reviews – make them a part of your workflow for the best results.

2 real world code reviews compared

I came across this blog post from Python/Django developer Matt Layman, where he compares two different cases of code reviews: one where people sat in a room and spent a lot of time talking about style issues, and one where style checking was automated on GitHub and people started talking about the actual changes and problems the code tries to solve. I recommend reading it.

One thing that really interests me is what the style guide they were not able to automate looked like. Maybe AI-based style checkers like FYT could have worked?

Two side takeaways: isort and flake8 for Python, which I will try out soon as I’m using Python/Django myself; imports especially are a topic I need to look into and refactor, as they can easily get messy.

I’m curious what deciding on and agreeing to rules like flake8’s looks like in a team; it must be a process of its own.

out parameter in C++

What is an out-parameter? An out-parameter is a non-const reference (or pointer) passed to a function, which the function then modifies by setting a value.

In C++, we usually pass arguments by reference to avoid copying the object, but what about the behavior of the function taking these arguments?

Using pass-by-value is clear: the arguments are inputs. Pass-by-reference arguments, however, can be inputs, outputs, or in-outs. This confuses the reader: one has to take extra steps to find out what the function does, because these constructs are not self-documenting.

I still often see methods or functions take one or several references for the purpose of modifying them. This is not intuitive and can lead to unexpected behavior in your C++ code.

If an object needs to be modified, a method on that object could be used instead. The modified object could also be returned, where it is clear that the return type is an ‘output’.

If you need to return several values, a std::tuple or std::pair can be used.