Semantic Unittests

Unit tests traditionally focus on verifying exact outputs, but how do we test output that can legitimately vary, such as an LLM's answers to the same question?

Luckily, using a SemanticTestCase we can test for semantic correctness in Python rather than relying on rigid string matches. This is useful for applications like text validation, classification, or summarization, where there is more than one “correct” answer.

Traditional vs. Semantic Testing

  • Traditional Unit Test

A standard test might look like this:

import unittest
from text_validator import validate_text

class TestTextValidator(unittest.TestCase):
    def test_profane_text(self):
        self.assertFalse(validate_text("This is some bad language!"))

    def test_clean_text(self):
        self.assertTrue(validate_text("Hello, how are you?"))

Here, validate_text() returns True or False, but it assumes a fixed set of phrases that are “bad” or “good.” Edge cases like paraphrased profanity can be missed.

  • Semantic Unit Test

Instead of rigid assertions, we can use SemanticTestCase to evaluate the meaning of the response:

self.assertSemanticallyEqual("Blue is the sky.", "The sky is blue.")

A test case:

class TestTextValidator(SemanticTestCase):
    """
    Exercises the semantic assertions of SemanticTestCase.
    """

    def test_semantic(self):
        # longer_text is some longer input text defined elsewhere
        self.assertSemanticallyCorrect(longer_text, "It is a public holiday in Ireland")
        self.assertSemanticallyIncorrect(longer_text, "It is a public holiday in Italy")
        self.assertSemanticallyEqual("Blue is the sky.", "The sky is blue.")

Here, assertSemanticallyCorrect() and its siblings use an LLM to classify the input and return a judgment. Instead of exact matches, we test whether the response aligns with our expectation.
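To make this concrete, here is a minimal sketch of how such an assertion class could be structured. The real SemanticTestCase delegates to an LLM; in this sketch the LLM “judge” is stubbed with a trivial word-set comparison (an assumption purely for illustration) so that it runs without a model.

```python
import unittest


def llm_judge_equal(text_a: str, text_b: str) -> bool:
    # Stand-in for an LLM call: in a real implementation, both texts
    # would be sent to a model that returns a yes/no verdict.
    def normalize(s: str) -> set:
        return set(s.lower().replace(".", "").replace(",", "").split())
    return normalize(text_a) == normalize(text_b)


class SemanticTestCase(unittest.TestCase):
    """Sketch of an LLM-backed assertion mixin on top of unittest."""

    def assertSemanticallyEqual(self, first: str, second: str):
        if not llm_judge_equal(first, second):
            self.fail(f"Not semantically equal: {first!r} vs {second!r}")
```

The key design point is that the judgment lives in one function, so swapping the stub for a real model call leaves the test cases themselves unchanged.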

Why This Matters

• AI systems often produce slightly different versions of the same sentence when asked repeatedly. This is very hard for traditional unittest asserts, but SemanticTestCase allows us to compare such outputs as well.

• Handles paraphrased inputs: Profanity, toxicity, or policy violations don’t always follow exact patterns.

• More flexible testing: Works for tasks like summarization or classification, where exact matches aren’t realistic.

A few words on practical concerns:

Execution speed: running an LLM for each test can be slower than traditional unit tests. But it is surprisingly fast on my Mac M1 with local Ollama and a laptop-sized LLM such as Gemma.

The speed depends on the size of the prompt (or context); comparing just a few sentences is fast. Furthermore, the LLM stays loaded between assertions, which also contributes to the speed.

Data protection: if handling sensitive data is a concern, run a local LLM, e.g. via Ollama. It is still quite fast.
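As a hedged sketch of what talking to a local model could look like, here is a judge built on Ollama's REST API. The endpoint, model name, and prompt wording are assumptions to adjust to your own setup, not part of SemanticTestCase itself.

```python
import json
import urllib.request

# Assumed local Ollama endpoint; adjust to your setup.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_judge_prompt(text: str, claim: str) -> str:
    # Keep the prompt short: prompt size drives the evaluation speed.
    return (
        "Answer with exactly YES or NO.\n"
        f"Text: {text}\n"
        f"Claim: {claim}\n"
        "Is the claim supported by the text?"
    )


def is_semantically_correct(text: str, claim: str, model: str = "gemma") -> bool:
    payload = json.dumps({
        "model": model,
        "prompt": build_judge_prompt(text, claim),
        "stream": False,
    }).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        answer = json.loads(response.read())["response"]
    return answer.strip().upper().startswith("YES")
```

Since the model runs entirely on your machine, no test input ever leaves it.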

What does ‘nit’ mean in Code Reviews?

What does nit mean in a code review?

Occasionally we can find comments like these in code reviews:

auto result = std::make_pair<uint64_t, std::string>(64, "Hallihallo");;

nit: double semicolon

In a review, a “nit” refers to a small inaccuracy or mistake that does not significantly affect the functionality of the code but should still be corrected. For example, a typo in a comment, an extra semicolon, or an additional blank line. The reviewer points out the mistake but probably does not want you to delay the pull request over this triviality.

And this is how we should handle it: if you are still working on the PR, you can fix it in one of your next commits. But don't delay the integration of the feature or bugfix over this triviality. We all know that waiting for CI can take time, and if you block CI for such a small thing, some people will probably not be very happy.

To learn more about the “slang” used in code reviews, I have compiled a list in this blog post (in English), where you will find explanations of abbreviations such as +1, WIP, lgtm, and others.

The English version of this post can be found here.

Code Review, what to look for

Why Code Reviews

Code review is an essential part of the software development process that helps ensure code quality and catch potential issues before they become a problem. Having other developers review the code helps to identify areas for improvement, promote best practices, and ensure that code is maintainable, scalable, and secure. Code reviews can be conducted with a variety of tools, such as code review platforms, linting tools, automated code review tools, code comparison tools, and code coverage tools.

The goal of code review is to improve the quality of code and make the development process more efficient and effective. Regular code reviews can also promote a culture of collaboration and teamwork within the development team, leading to better code and a more successful project.

What to look for

A code review is an important part of the software development process and developers should look for the following aspects when conducting a code review:

  1. Code Quality: Check if the code is clean, readable, and adheres to established coding standards. Ensure that the code is optimized and free of bugs.
  2. Functionality: Ensure that the code meets the requirements and that it functions as expected.
  3. Security: Check for potential security vulnerabilities and ensure that the code follows best practices for security.
  4. Test Coverage: Ensure that the code is covered by adequate test cases and that the tests are thorough.
  5. Performance: Review the code for performance bottlenecks and ensure that it is optimized for speed and efficiency.
  6. Scalability: Ensure that the code can scale to meet the needs of the users as the system grows.
  7. Maintainability: Check that the code is easy to maintain and can be easily updated and extended in the future.
  8. Documentation: Check if the code is properly documented, including comments and inline documentation, to help other developers understand it.

A successful code review is a collaborative effort that enhances the quality of the code and aligns it with the requirements of both the users and the development team. It should be a constructive process that helps to identify areas for improvement and ensures the code is optimized for maintenance, scalability, and security.

Tools

There are several tools available to help enhance code quality during code reviews:

  1. Linting Tools: These tools scan code for potential issues such as syntax errors, style violations, and semantic problems. Examples include ESLint and JSLint for JavaScript and Pylint for Python.
  2. Code Review Platforms: These platforms provide a centralized place for code review, allowing teams to review, discuss, and track changes to code. Examples include GitHub, GitLab, and Bitbucket.
  3. Automated Code Review Tools: These tools can automatically identify potential issues in code, such as security vulnerabilities, performance bottlenecks, and missing test coverage. Examples include SonarQube, CodeClimate, and Crucible.
  4. Code Comparison Tools: These tools allow developers to compare and merge changes to code. They can highlight differences between code versions and help to identify potential conflicts. Examples include Meld and Beyond Compare.
  5. Code Coverage Tools: These tools measure how much of the code is covered by tests and can identify areas where additional tests are needed. Examples include Cobertura and Istanbul.

Using these tools in combination with manual code review can help ensure that code quality is maintained and improved throughout the development process.
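To make the idea behind linting tools concrete, here is a toy, stdlib-only check in the spirit of such tools (purely illustrative, not how any real linter is implemented): it walks Python's syntax tree and reports the line numbers of bare print() calls, a rule some teams enforce in library code.

```python
import ast


def find_print_calls(source: str) -> list:
    """Toy lint rule: return line numbers of bare print() calls."""
    tree = ast.parse(source)
    lines = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "print"):
            lines.append(node.lineno)
    return lines


sample = "x = 1\nprint(x)\n"
print(find_print_calls(sample))  # line numbers where print() is called
```

Real linters work on the same principle, just with hundreds of such rules plus configuration, suppression comments, and editor integration.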

5 reasons why it makes sense to work with branches and tests in a single developer project

Working with branches and automated tests can bring a host of benefits to a single developer project, even if the project isn’t being worked on by multiple people. Here are some of the reasons why:

  1. Enhanced efficiency: When working with branches, a solo developer can tackle multiple features or bug fixes simultaneously, without having to worry about disrupting the main codebase. Additionally, by utilizing automated tests, the developer can validate that changes made in a branch don’t break existing functionality in a fast and efficient manner.
  2. Superior code quality: Automated tests can help catch bugs and issues early in the development process, long before they become problematic and harder to resolve. This leads to a more stable codebase and better code quality overall.
  3. Optimal version control: Branches allow a single developer to switch between different versions of code easily, as well as revert back to a previous version if necessary. This also makes it easier for the developer to manage code reviews and collaborate with other developers if the need arises in the future.
  4. Increased confidence: Automated tests provide a safety net for changes made in the code, which can give the developer more confidence when making modifications. If issues arise, the tests will quickly detect them, allowing the developer to fix them promptly.
  5. Support for experimentation: Branches make it possible for a developer to experiment with new ideas or approaches without affecting the main codebase. This can be especially valuable when exploring new technologies or finding new solutions to problems.

In conclusion, working with branches and automated tests can lead to improved efficiency, better code quality, optimal version control, increased confidence, and support for experimentation even in single developer projects. Whether you’re a beginner or an experienced developer, utilizing these tools can help streamline your development process and lead to better results.

10 shorthands commonly used in Code Reviews

There are several shorthands and abbreviations commonly used in code reviews:

  1. nit – nitpicking. Refers to minor and cosmetic changes that the reviewer suggests to the code (typos, formatting, etc.)
  2. N/A – Not Applicable, used to indicate that a particular comment or suggestion does not apply to the code being reviewed.
  3. +1 – Indicates agreement or support for a particular change or suggestion.
  4. -1 – Indicates opposition or disapproval of a particular change or suggestion.
  5. ACK – Acknowledge, used to indicate that the reviewer has seen the comment or suggestion and will address it.
  6. WIP – Work In Progress, used to indicate that the code being reviewed is still a work in progress and may not be complete or ready for review.
  7. RTFC – Read The F***ing Code, used to suggest that the reviewer should go back and read the relevant code before making a comment or suggestion.
  8. FIXME – A placeholder used to indicate that a particular piece of code needs to be fixed in the future.
  9. TODO – A placeholder used to indicate that a particular task needs to be completed in the future.
  10. LGTM – Looks Good To Me, used to indicate that the reviewer is fine with the change.

These shorthands and abbreviations are commonly used in code reviews to speed up the review process and make it more efficient. However, it’s important for all participants in the review to understand and agree on their meanings to avoid confusion and ensure effective communication.

If you know or use other shorthands or abbreviations, please let me know.

The 6 drawbacks of linter tools

While linter tools are widely used and can be incredibly helpful in detecting issues and improving code quality, they do have some disadvantages as well. Some of the common disadvantages of using linter tools include:

  1. False positives: Linters may produce false positive warnings or errors, which can be frustrating and lead to wasted time trying to resolve non-issues.
  2. Configuration complexity: Setting up a linter can be challenging, especially for large projects with multiple contributors and a complex codebase. It can be difficult to configure the linter to meet the specific needs of the project and the development team.
  3. Learning curve: Using a linter can require a learning curve for developers, as they need to understand how to use and configure the tool effectively. This can be especially challenging for developers who are new to the tool or the programming language.
  4. Inconsistent enforcement: Linters may not always be enforced consistently, leading to situations where some developers may not adhere to the linter’s recommendations. This can lead to inconsistent code quality and undermine the value of the linter.
  5. Limited scope: Linters are typically limited in scope and can only detect issues related to code syntax, style, and formatting. They may not be able to detect more complex issues such as performance bottlenecks or security vulnerabilities.
  6. Unfamiliar codebase: If a linter is being applied to an unfamiliar codebase, it may produce a large number of warnings and errors that can be overwhelming for the developer to resolve. This can lead to frustration and a sense that the tool is not effective.

In conclusion, while linter tools can be incredibly helpful in detecting issues and improving code quality, they also have some disadvantages that need to be taken into consideration. It is important to weigh the benefits and drawbacks of using a linter and determine if it is the right tool for your specific project and development team.

Code Review – A Critical Component of Software Development

Code reviews are an essential aspect of software development that can significantly enhance the quality and reliability of your code. They provide an opportunity for developers to learn from one another, share their expertise, and collaborate on creating better code.

One of the key benefits of code reviews is improved code quality. Through code reviews, developers can identify and resolve potential bugs, performance issues, and security vulnerabilities before the code is released to production. This proactive approach can save time and resources in the long run, as it is more cost-effective to catch and fix problems early in the development process.

In addition to improving code quality, code reviews also facilitate knowledge sharing and best practices. Reviewing the code of others can help developers understand the codebase and learn new techniques for writing high-quality code. This sharing of knowledge and expertise can lead to increased efficiency and better collaboration among team members, as everyone works towards a common goal.

Code reviews also play a crucial role in enhancing team communication. By working together to review and improve code, developers can build a sense of teamwork and collaboration. This can result in better communication and higher-quality code, as everyone works together to ensure the code meets the necessary standards and specifications.

Consistency in code is another important aspect that can be maintained through code reviews. By ensuring that code follows established coding standards, code reviews make it easier to maintain and enhance the code over time. This can greatly reduce the time and effort required for code maintenance and updates, as everyone on the team follows the same standards and best practices.

Finally, code reviews can also improve documentation, making it easier for others to understand and work with the code in the future. By reviewing the documentation and ensuring its completeness and accuracy, code reviews can help ensure that the code is well-documented and easy to understand.

In conclusion, code reviews are a valuable tool that can provide numerous benefits for both individual developers and development teams. Incorporating code reviews into your development process can help you write better code, share knowledge, communicate effectively, maintain consistency, and enhance documentation. Don’t overlook the importance of code reviews – make them a part of your workflow for the best results.

LGTM meaning

Let’s Gamble Try Merging, as suggested in this Reddit post in r/ProgrammerHumor? Probably not 😅

What most people agree LGTM means

LGTM is an acronym primarily used in software development and code reviews, but it might be used in other contexts as well, and stands for “Looks Good To Me”.

Occasionally when reading code reviews we can find comments like these:

declare function nextMDX(options?: NextMDXOptions): WithMDX

LGTM in a code review

LGTM in a code review stands for Looks Good To Me. The reviewer is OK with the change and would like to see it on the main branch.

Often this is seen not only under code, but also in the summary section of the pull request, indicating that the reviewer is fine with the entire pull request. And sometimes together with a note.

But why write this instead of using the ‘approve’ button in the GitHub or Bitbucket UI? Well, in some setups the approval goes away when the author of the change pushes to that branch again, while the textual note survives.

In the end, when everything is ready, someone still has to give approval via the approve button, though.

However, it’s important to note that “LGTM” should not be used as a substitute for a thorough code review. Reviewers should still take the time to carefully examine the code and make sure it meets the necessary standards and requirements before giving their approval.

2 real world code reviews compared

I came across this blog post from Python/Django developer Matt Layman, where he compares two cases of code reviews: one where people sat in a room and spent a lot of time talking about style issues, and one where style checking was automated on GitHub and people started talking about the actual changes and the problems the code tries to solve. I recommend reading it.

One thing that really interests me is what the style guide they were not able to automate looked like. Maybe AI-based style checkers like FYT could have worked?

Two side takeaways: isort and flake8 for Python, which I will try out soon, as I'm using Python/Django myself, and imports in particular are a topic I need to look into and refactor; it can easily get messy.

I'm curious what deciding on and agreeing to rules like those in flake8 looks like in a team; it must be a process of its own.
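Once a team does agree, both tools can read their settings from a single config file, which keeps the agreed rules in version control. A hypothetical setup.cfg might look like this (the exact values here are assumptions, not recommendations):

```ini
# setup.cfg -- hypothetical shared configuration for flake8 and isort
[flake8]
max-line-length = 88
extend-ignore = E203

[isort]
profile = black
line_length = 88
```

Keeping the file next to the code means CI and every developer's editor enforce the same rules.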

What does ‘nit’ mean in a code review?

Sometimes when reading code reviews we can find comments like this from a colleague:

auto result = std::make_pair<uint64_t, std::string>(64, "hello");;

nit: double semicolon

What is nit in a code review or in a PR?

A nit is a minor finding in a code review that doesn't significantly affect the functionality of the code but is still technically incorrect. The term comes from ‘nitpicking’.

Like a typo in a comment, an extra semicolon, or an extra empty line. The reviewer points out that this is still not correct, but probably would not want you to delay merging the pull request for it.

And here is how we should treat it: if you are still working on the PR, you can fix this in one of your next commits, but don't delay the feature or bugfix integration for it. We all know waiting for CI can take time, and if you clog CI for this, some people might not be happy.

Infographic for code review abbreviations

Infographic showing nit in code reviews and other abbreviations and their meaning

I created this infographic showing common code review abbreviations. Feel free to share or download, and please let me know if you know another one that should be on this list.

More to read

To learn more about common “slang” used in code reviews, I have compiled a list in this blog post, where you find explanations to abbreviations such as +1, WIP, LGTM and others.

By Thomas, updated on Jan 31, 2025.