September 18, 2023 | September 8, 2023 by Thomas

std::erase, erase_if C++20

In this blog post, we’ll explore how std::erase functions simplify container manipulation and improve code readability.

Dealing with the removal of specific elements from a container in your C++ code can often be a cumbersome and error-prone task. Fortunately, C++20 introduces two powerful allies to streamline this process: std::erase and std::erase_if. These functions bring efficiency and clarity to element removal, and in this article, we’ll explore their workings and benefits.

Why We Need Easy Element Removal

When you’re working with C++ containers like vectors or lists, you often want to kick out some elements based on certain conditions or values. Before C++20, this was like navigating a maze blindfolded. You had to write your own loops or reach for the erase-remove idiom, and the right approach differed from container to container. Not exactly a picnic, right?
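For a sense of what this looked like in practice, here is a minimal sketch of the classic erase-remove idiom for a std::vector. The helper name remove_value_pre_cpp20 is made up for this illustration:

```cpp
#include <algorithm>
#include <vector>

// Pre-C++20: std::remove shifts the elements to keep towards the front and
// returns an iterator to the new logical end; erase() then trims the tail.
void remove_value_pre_cpp20(std::vector<int>& v, int value) {
    v.erase(std::remove(v.begin(), v.end(), value), v.end());
}
```

It works, but the intent is buried in iterator plumbing, and forgetting the second argument to erase() is a classic bug.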

Meet std::erase: The Element Eraser

std::erase is like the Marie Kondo of C++20. It helps you tidy up your container by removing all instances of a specific value. It’s super easy to use and works with the sequence containers (std::vector, std::deque, std::list, std::forward_list) as well as std::string. Associative containers such as sets and maps only get std::erase_if. Here’s how it works:

std::vector<int> numbers = {1, 2, 3, 2, 4, 2, 5};
std::erase(numbers, 2); // Say goodbye to all those 2s

In this example, we’re saying, “Hey, std::erase, please get rid of all the 2s.” And just like magic, the vector becomes {1, 3, 4, 5}. Neat, right?

Meet std::erase_if: The Selective Element Picker

Now, what if you want to get a bit pickier and remove elements based on custom conditions? That’s where std::erase_if comes in. It’s like having a personal assistant that follows your criteria. Check it out:

std::vector<int> numbers = {1, 2, 3, 4, 5, 6};
std::erase_if(numbers, [](int n) { return n % 2 == 0; }); // Adios, even numbers!

In this case, we’re using a cool lambda function as a “picker.” It says, “Bye-bye, even numbers,” and, voilà, we’re left with {1, 3, 5}. std::erase_if lets you customize the removal process based on your whims and fancies.

The Perks of std::erase and std::erase_if

These new additions in C++20 bring some serious perks:

Readability: Your code becomes a breeze to read because these functions spell out your intent—removing elements—right in their names.
Simplicity: One function call does the trick, no more convoluted loops or DIY removal code.
Safety: Standard library functions are your trusty sidekicks, reducing the risk of bugs and odd edge cases in your element removal logic.
Performance: For a std::vector, these functions do the same work as the erase-remove idiom, a single linear pass over the elements, so your code stays zippy even when you’re juggling large containers.

In Conclusion

Say goodbye to the headaches of element removal and embrace the simplicity of std::erase and std::erase_if. They make your code cleaner, more readable, and safer. Whether you’re cleaning up by value or by custom criteria, C++20 has your back. So go ahead, give these new features a whirl in your C++ projects, and let your code shine. Happy coding! 🚀👨‍💻

September 14, 2023 | September 8, 2023 by Thomas

Introspect mapbox::value

When you’re working with a container type that can hold arrays, objects or values, it helps a lot if you can look inside to discover its structure and the value it holds. Using the API might be cumbersome, but luckily we can use the toJson() method to dump its content to the console, like so:

value.toJson()

mapbox::value is a type from the Mapbox GL Native library that can hold a variety of different data types (e.g. integers, strings, arrays, etc.). To introspect a mapbox::value object, you can use the following methods:

type() method: This method returns an enumeration value indicating the type of data stored in the mapbox::value object. For example, if the mapbox::value object holds an integer, the type() method will return mapbox::value_type::number_integer.
get_value_type() method: This method returns a string representation of the type stored in the mapbox::value object. For example, if the mapbox::value object holds a string, the get_value_type() method will return “string”.
get<T>() method: This method allows you to access the underlying value stored in the mapbox::value object. You need to specify the data type T that you expect to retrieve. If the type stored in the mapbox::value object is different from T, a mapbox::util::bad_cast exception will be thrown.
is<T>() method: This method allows you to check if the mapbox::value object holds a value of type T. It returns true if the underlying value is of type T, and false otherwise.

You can use these methods to introspect a mapbox::value object and determine its type and underlying value. Once you have determined the type and value of the mapbox::value object, you can use the appropriate methods to access and manipulate the value as needed.

Let’s dive a bit deeper into how you can leverage these methods.

Using `type()` to Determine Data Type

Consider a scenario where you have a mapbox::value object, but you’re not sure what type of data it holds. The type() method comes to the rescue. Here’s how you can use it:

mapbox::value myValue = ...; // Your mapbox::value object
switch (myValue.type()) {
    case mapbox::value_type::number_integer:
        // Handle integer data
        break;
    case mapbox::value_type::string:
        // Handle string data
        break;
    case mapbox::value_type::array:
        // Handle array data
        break;
    // Add more cases for other data types as needed
}

By examining the result of type(), you can take appropriate actions based on the actual data type.

Getting the String Representation with `get_value_type()`

Sometimes, you might not need the specific enumeration value returned by type(). Instead, you might prefer a more human-readable representation of the data type. This is where get_value_type() shines:

mapbox::value myValue = ...; // Your mapbox::value object

std::string valueType = myValue.get_value_type();

// Now, valueType contains a string representation of the data type.

This string can be helpful for logging, reporting, or simply for making your code more understandable.
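If you want to experiment with this idea without the Mapbox headers at hand, the same pattern can be sketched with std::variant from the standard library. Note that this is a stand-in and not the mapbox::value API: the Value alias and the type_name function below are assumptions made up for the illustration.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical stand-in for a value container; this is NOT the mapbox type.
using Value = std::variant<std::monostate, bool, std::int64_t, double, std::string>;

// Returns a human-readable name for the alternative currently held.
std::string type_name(const Value& value) {
    return std::visit([](const auto& held) -> std::string {
        using T = std::decay_t<decltype(held)>;
        if constexpr (std::is_same_v<T, std::monostate>) return "null";
        else if constexpr (std::is_same_v<T, bool>) return "bool";
        else if constexpr (std::is_same_v<T, std::int64_t>) return "integer";
        else if constexpr (std::is_same_v<T, double>) return "double";
        else return "string";
    }, value);
}
```

Because the visitor has to handle every alternative, adding a new type to Value without updating type_name would silently fall into the last branch, so a real implementation might prefer one overload per type.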

Accessing the Underlying Value with `get<T>()`

To access the actual value contained within a mapbox::value object, you can use the get<T>() method. Specify the data type T that corresponds to the expected data type. If the mapbox::value object doesn’t hold a value of that type, an exception will be thrown. Here’s an example:

mapbox::value myValue = ...;  // Your mapbox::value object

try {
    int intValue = myValue.get<int>();  // Attempt to get an integer value
    // Handle intValue
} catch (const mapbox::util::bad_cast&) {
    // Handle the case where myValue doesn't contain an integer
}

This approach ensures type safety and helps prevent runtime errors caused by incompatible data types.

Checking the Type with `is<T>()`

Before attempting to access the value with get<T>(), you might want to check if the mapbox::value object actually holds a value of a specific type. This can be done using the is<T>() method:

mapbox::value myValue = ...;  // Your mapbox::value object

if (myValue.is<int>()) {
    // myValue contains an integer
    int intValue = myValue.get<int>();
} else {
    // Handle the case where myValue is not an integer
}

This approach allows you to safely access the value only if it matches the expected type.
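The check-then-access pattern can be tried out in isolation with std::variant from the standard library. This is a hedged sketch, not mapbox code: the Value alias and the as_integer helper are hypothetical, and std::get_if combines the type check and the access in one step.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <variant>

// Hypothetical stand-in for a value container; this is NOT the mapbox type.
using Value = std::variant<std::monostate, bool, std::int64_t, double, std::string>;

// Returns the integer if one is stored, otherwise std::nullopt.
// std::get_if yields a null pointer instead of throwing on a type mismatch.
std::optional<std::int64_t> as_integer(const Value& value) {
    if (const auto* number = std::get_if<std::int64_t>(&value)) {
        return *number;
    }
    return std::nullopt;
}
```

Returning std::optional keeps the "wrong type" case visible in the function signature, so callers cannot forget to handle it.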

Introspecting a mapbox::value object doesn’t have to be a daunting task. By using the type(), get_value_type(), get<T>(), and is<T>() methods, you can confidently explore the contents of these versatile objects. Whether you’re parsing JSON data, working with Mapbox GL Native, or dealing with any other scenario involving mapbox::value, these introspection techniques will be your reliable companions in understanding and handling your data. Happy coding!

September 11, 2023 | September 8, 2023 by Thomas

Creating API Documentation

Creating API documentation is a crucial step in making your API accessible and understandable to other developers or users. Here’s a general guide on how to create API documentation:

Choose a Documentation Format:
- Decide on the format for your API documentation. Common formats include:
  - Swagger/OpenAPI: A standardized format for describing RESTful APIs. It’s machine-readable and can be used to generate interactive documentation.
  - Markdown: A lightweight, human-readable format often used for creating static API documentation.
  - HTML or PDF: You can create static HTML or PDF documents to document your API.
  - API Documentation Tools: Consider using dedicated API documentation tools like Swagger, Postman, or API Blueprint, which often have built-in documentation features.
Define API Endpoints and Methods:
- List all the endpoints, methods (GET, POST, PUT, DELETE, etc.), and their purposes. This serves as an outline for your documentation.
Document API Endpoints:
- For each endpoint, provide detailed information, including:
  - Endpoint URL: The URL or path for the endpoint.
  - HTTP Method: The HTTP method used (e.g., GET, POST, PUT, DELETE).
  - Parameters: List any query parameters, request headers, or request body parameters.
  - Responses: Describe the possible HTTP responses, including status codes and response bodies.
  - Authentication: Explain any authentication or authorization requirements for the endpoint.
  - Example Requests and Responses: Provide real-world examples of how to make requests and interpret responses.
  - Error Handling: Document how errors are handled and returned to the client.
Add Code Samples:
- Include code samples in various programming languages to show how developers can interact with your API. These code samples should cover common use cases.
Provide Interactive Examples (if possible):
- If using Swagger or a similar tool, you can create interactive documentation that allows users to make API requests directly from the documentation page.
Explain Authentication and Authorization:
- Clearly explain how users can authenticate themselves to access the API and any required API keys, tokens, or OAuth2 flows.
Include Rate Limiting and Usage Policies:
- If applicable, specify rate limiting policies and usage guidelines for your API.
Add Versioning Information:
- Include information about API versioning, especially if your API may undergo changes or updates over time.
Add FAQs and Troubleshooting:
- Address common questions and provide guidance on troubleshooting common issues users may encounter.
Style and Consistency:
- Maintain a consistent style and formatting throughout your documentation. Use headings, bullet points, and clear language to make the content easy to read and navigate.
Host Your Documentation:
- Host your API documentation on a dedicated server or platform. You can use GitHub Pages, GitLab Pages, ReadTheDocs, or other similar services to host static documentation. If you’re using Swagger or a dedicated API documentation tool, they often provide hosting options.
Keep Documentation Updated:
- Regularly update your documentation to reflect changes in your API. Outdated documentation can lead to confusion and frustration for users.
Seek Feedback:
- Encourage users and developers to provide feedback on your documentation. Address any issues or questions raised by users to improve the documentation’s quality.

Creating comprehensive and user-friendly API documentation is an ongoing process. It’s essential to keep it up-to-date and ensure it meets the needs of your API users. Good documentation can significantly improve the adoption and success of your API.

September 8, 2023 by Thomas

The mysterious --> operator in C++

The C++ --> is used for … well. I came across this post on Stack Overflow where someone asks what the --> operator in C++ is for, with the following code snippet:

while (x --> 0)
{
    printf("%d ", x);
}

I asked ChatGPT what the --> operator is, and it correctly stated that the snippet “appears to use a non-standard and unconventional usage of the --> operator, which is not a standard operator in C++.” That is quite good, considering how often it hallucinates. Furthermore, it says:

In C++, the --> operator doesn’t have any predefined meaning or behavior. Instead, it appears that the code you’ve shown is trying to use the -- (decrement) operator followed by the > (greater than) operator. However, this is not valid C++ syntax.

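This is pretty much in line with the answers to the post on Stack Overflow, with one correction: the snippet is valid C++. The compiler simply reads --> as the postfix decrement -- followed by the comparison >.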
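A quick way to convince yourself of how the compiler reads the loop is to record what the body sees. In this sketch (the function name count_down is made up), x --> 0 is tokenized as x-- > 0: the old value of x is compared with 0, and then x is decremented.

```cpp
#include <vector>

// Collects the values of x that the loop body observes.
// `x --> 0` is parsed as `(x--) > 0`, so the body sees x after the decrement.
std::vector<int> count_down(int x) {
    std::vector<int> seen;
    while (x --> 0) {
        seen.push_back(x);
    }
    return seen;
}
```

Calling count_down(3) collects 2, 1, 0, which is the same sequence the printf in the original snippet prints when x starts at 3.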

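Just for fun, I asked ChatGPT what an implementation would look like if it were a valid operator, and this is what it came up with. Keep in mind that C++ only lets you overload existing operators, so this code does not compile; it is a thought experiment: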

#include <iostream>

class CustomClass {
private:
    int value;

public:
    CustomClass(int val) : value(val) {}

    // Overloading the -->(...) operator
    CustomClass operator-->(int incrementValue) {
        CustomClass temp(*this);
        value += incrementValue;
        return temp;
    }

    int getValue() const {
        return value;
    }
};

int main() {
    CustomClass obj(5);

    // Using the custom -->(...) operator
    obj = obj-->(3);

    std::cout << "Value after custom operation: " << obj.getValue() << std::endl;

    return 0;
}

Now I asked it if it could imagine what the --> operator could be used for. Here is the answer:

Let’s imagine a use case for the --> operator in a hypothetical scenario:

Suppose you are working on a complex numerical simulation or mathematical modeling library in C++. In such a library, you might frequently work with mathematical expressions and matrices. You could use the --> operator as a custom operator for matrix operations, specifically for updating matrix elements in a concise and intuitive way.

It also mentions I should keep in mind that this is entirely speculative, as the --> operator doesn’t exist in the C++ language, and inventing custom operators should be done judiciously and with care for code readability and maintainability.

Then it produced some code:

#include <iostream>
#include <vector>

class Matrix {
private:
    std::vector<std::vector<double>> data;

public:
    Matrix(const std::vector<std::vector<double>>& input) : data(input) {}

    // Overload the --> operator for element updates
    double& operator-->(const std::pair<int, int>& indices) {
        int row = indices.first;
        int col = indices.second;

        if (row >= 0 && row < data.size() && col >= 0 && col < data[0].size()) {
            return data[row][col];
        } else {
            throw std::out_of_range("Matrix element out of bounds");
        }
    }

    // Display matrix
    void display() const {
        for (const auto& row : data) {
            for (const double& value : row) {
                std::cout << value << ' ';
            }
            std::cout << '\n';
        }
    }
};

int main() {
    std::vector<std::vector<double>> initialData = {{1.0, 2.0}, {3.0, 4.0}};
    Matrix mat(initialData);

    // Use the custom --> operator to update matrix elements
    mat --> std::make_pair(0, 1) = 5.0;

    // Display the updated matrix
    mat.display();

    return 0;
}

The --> operator is overloaded for the Matrix class to update individual elements of the matrix.
The operator takes a pair of indices (row and column) and returns a reference to the corresponding matrix element.
It performs bounds checking to ensure that the indices are within the matrix dimensions.

Well, if C++ had a --> operator, we’d probably have debates about whether it should be called the “rocket ship operator” or the “arrow arrow operator.” And we’d have t-shirts with clever slogans like “I put the ‘fun’ in ‘function’” for C++ developers who love their custom operators. But for now, let’s stick with the good ol’ . and -> and leave the hypothetical operators to our wildest coding dreams! 😄

September 7, 2023 | February 3, 2025 by Thomas

How to Use CodeT5 on Your Laptop: A Step-by-Step Guide

CodeT5+ (or CodeT5) is an advanced LLM designed for developers. It can generate source code and explain what your code does. It performs well while being lightweight enough to run on a laptop for both inference and fine-tuning, and it can be trained on additional data. I’ll show how to set up and use CodeT5+ on your laptop in minutes.

💡 Try the inference web demo. If you need the LLM to better understand your code or toolchain, you can fine-tune it. It is really not very difficult to set up, and doesn’t require expensive hardware. Read here how to do the training, and here how to use CodeT5 with LangChain.

CodeT5 vs CodeT5+?

CodeT5 is an advanced Transformer-based model designed for both understanding and generating code. It stands out by effectively handling code identifiers, leveraging user-written comments, and excelling in various code-related tasks, surpassing previous methods.

CodeT5, provided by Salesforce, comes pre-trained in small, base and large versions, which differ in the number of parameters. The newer versions are called CodeT5+, which is what I will use here.

Installation on Your Laptop

CodeT5+ delivers impressive performance while remaining capable of running smoothly on a local laptop for inference without any issues. Fine-tuning can also be conducted locally, making CodeT5 an ideal choice for developers to experiment with large language models (LLMs).

Setting Up Your Environment

I’ll show how to set it up and use CodeT5+ fine-tuned with some KDE code.

For an easy installation and a demonstration of fine-tuning, this GitHub repo is well suited and uses the new CodeT5+. It can run as a local server and comes with a simple demo HTML page, a great choice if you want to make a quick REST call from your app and go from there. And for reference, here is the original Salesforce GitHub repo.

The model is capable of describing code accurately, and you can fine-tune it with your own code snippets to improve it or to teach it your latest API changes.

To get started, clone the repository, which also serves as your project folder, and set up the necessary Python packages. Here are the steps:

Clone the repository and change into it:

git clone https://github.com/tm243/CodeT5-KDE.git
cd CodeT5-KDE

Create a virtual environment in your project folder:

virtualenv .env

Activate the virtual environment:

source .env/bin/activate

Install the required Python packages from the provided “requirements.txt” file:

pip install -r requirements.txt

Once you’ve set up the environment, you’ll need to obtain and configure the model weights before running the CodeT5+ model. Here’s how you can do that:

Download the model weights from the specified URL:

wget https://www.opendocstring.com/downloads/weights/codet5/saved-pretrained-kde-cpp-multisum-2023-05-10-06.tar.gz

Unpack the downloaded model weights. You can do this manually or by using the provided script:

Manually:

tar -xzvf saved-pretrained-kde-cpp-multisum-2023-05-10-06.tar.gz

Or by using the script:

./download_weights.sh

After completing these steps, you’ll have the model weights available in the “api/saved-pretrained-kde-…” directory, and your environment will be set up to run the CodeT5 model effectively.

Using CodeT5 for Inference

To test your model you can run the inference.py script in the folder.

The model becomes more useful, though, when you start it as a local server and make requests to it:

uvicorn api.rest:app --port 7999 --reload

The folder also contains a demo.html page that you can open in a browser to send some queries. Try pasting some code and see what it does.

Prompt example: Write a replace method for a string class which replaces the given string with a given set of characters. Try it out on this running demo.

If you find that it describes your code incorrectly, you might want to fine-tune your model. Keep in mind that the model is trained on certain code and programming languages, and may not have seen any code similar to what you tried.

Fine-tuning CodeT5 (Optional)

If you want to add capabilities to the model, such as another programming language, toolkit usage or other usage patterns, you can fine-tune this model.

Why fine-tuning?

Fine-tuning builds on a pre-trained model and simply adds knowledge on top. This way, you don’t need to go through the full, lengthy and expensive training process. Fine-tuning is much faster and cheaper.

How to Fine-tune CodeT5

In order to do that, you will need to create a dataset which is then fed to the training process. To get some results, a dataset of about 100 entries is already sufficient. The entries should show some variety and cover different but related topics.

For example, if you want to improve your model’s capacity to understand Python Django code, you can devote some entries to Django views, some to Django models, and so on. They are close enough to have things in common (Django) and different enough for diversification (models, views, etc.).

The data is organized in simple csv files of the following format:

"code","docstring","url","license"

The training process will then read this file and feed it to the model. Read more about preparing training data in this blog post.
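Since code snippets usually contain quotes and commas, the fields need proper quoting when you generate such a file yourself. Here is a minimal C++ sketch, assuming the training script accepts standard CSV quoting where embedded double quotes are doubled (an assumption, so check it against the repo’s loader). The helper names quote_csv_field and make_csv_row are invented for this example:

```cpp
#include <string>

// Quotes one CSV field: wraps it in double quotes and doubles any embedded
// double quote (RFC 4180-style quoting, assumed here).
std::string quote_csv_field(const std::string& field) {
    std::string quoted = "\"";
    for (char c : field) {
        if (c == '"') quoted += '"';
        quoted += c;
    }
    quoted += '"';
    return quoted;
}

// Builds one row in the "code","docstring","url","license" column order.
std::string make_csv_row(const std::string& code, const std::string& docstring,
                         const std::string& url, const std::string& license) {
    return quote_csv_field(code) + "," + quote_csv_field(docstring) + "," +
           quote_csv_field(url) + "," + quote_csv_field(license);
}
```

Newlines inside the code field are kept as they are, which is fine for CSV readers that support quoted multi-line fields.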

In an upcoming blog post, I will show how to set up such a fine-tuning run on a MacBook M1.

Let me know if you have any questions about running and fine-tuning CodeT5, either in the comment section below or by reaching out directly.

One Two Bytes

Roaming the software world. A smorgasbord of topics I come across while writing software.
