How to Build an AI Agent with a Memory

How to Build an Agent with a Local LLM and RAG, Complete with Local Memory

If you want to build an agent with a local LLM that can remember things and retrieve them on demand, you'll need a few components: the LLM itself, a Retrieval-Augmented Generation (RAG) system, and a memory mechanism. Here's how to piece it all together, with examples using LangChain and Python. (The "LLM vs SLM" section below explains why a small LLM is a good idea.)

Step 1: Set Up Your Local LLM

First, you need a local LLM. This could be a smaller pre-trained model such as LLaMA or a GPT-based open-source option running on your machine. The key is that it's not connected to the cloud: it's local, private, and under your control. Make sure the LLM is accessible via an API or a similar interface so that you can integrate it into your system. A good choice is Ollama with a model such as Google's Gemma. I have also written easy-to-follow instructions on how to set up a T5 LLM from Salesforce locally, but it is also perfectly fine to use a cloud-based LLM.

If the agent you want to build deals with source code, here is an example of how to use CodeT5 with LangChain.

Step 2: Add Retrieval-Augmented Generation (RAG)

TL;DR: Gist on GitHub

Next comes the RAG. A RAG system works by combining your LLM with an external knowledge base. The idea is simple: when the LLM encounters a query, the RAG fetches relevant information from your knowledge base (documents, notes, or even structured data) and feeds it into the LLM as context.

To set up RAG, you’ll need:

  1. A Vector Database: This is where your knowledge will live. Tools like Pinecone, Weaviate, or even local implementations like FAISS can store your data as embeddings.
  2. A Way to Query the Vector Database: Use similarity search to find the most relevant pieces of information for any given query.
  3. Integration with the LLM: Once the RAG fetches data, format it and pass it as input to the LLM.

I have had good experiences with LangChain and Chroma. The snippet below assumes Ollama is running locally with the gemma model pulled:

from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_ollama import OllamaEmbeddings, OllamaLLM
from langchain.chains import RetrievalQA

model_name = "gemma:latest"  # any model pulled into Ollama works here

# Load the document, split it into overlapping chunks, embed and index them
documents = TextLoader("my_data.txt").load()
texts = CharacterTextSplitter(chunk_size=300, chunk_overlap=100).split_documents(documents)
retriever = Chroma.from_documents(texts, OllamaEmbeddings(model=model_name)).as_retriever()

llm = OllamaLLM(model=model_name)
qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
qa_chain.invoke("What is the main topic of my document?")

Step 3: Introduce Local Memory

Now for the fun part: giving your agent memory. Memory is what allows the agent to recall past interactions or store information for future use. There are a few ways to do this:

  • Short-Term Memory: Store conversation context temporarily. This can simply be a rolling buffer of recent interactions that gets passed back into the LLM each time.
  • Long-Term Memory: Save important facts or interactions for retrieval later. For this, you can extend your RAG system by saving interactions as embeddings in your vector database.

For example:

  1. After each interaction, decide if it’s worth remembering.
  2. If yes, convert it into an embedding and store it in your vector database.
  3. When needed, retrieve it alongside other RAG data to give the agent a sense of history.
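The decide-embed-store-retrieve loop above can be sketched without any external dependencies. In this minimal sketch, a toy bag-of-words "embedding" and cosine similarity stand in for a real embedding model and vector database; in practice you would reuse the Chroma store and Ollama embeddings from Step 2:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: a bag-of-words count vector (stand-in for a real model)
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

long_term_memory = []  # list of (text, embedding) pairs

def remember(interaction: str, worth_remembering: bool) -> None:
    # Steps 1 + 2: decide whether to keep it, then embed and store it
    if worth_remembering:
        long_term_memory.append((interaction, embed(interaction)))

def recall(query: str, k: int = 2) -> list:
    # Step 3: retrieve the k most similar past interactions
    q = embed(query)
    ranked = sorted(long_term_memory, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

remember("The user's favorite language is Python", worth_remembering=True)
remember("Small talk about the weather", worth_remembering=False)
print(recall("Which language does the user prefer?"))
```

Swapping `embed` for a real embedding model and `long_term_memory` for a vector database changes nothing about the control flow.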

LangChain Example

from langchain.memory import ConversationBufferMemory

# Initialize memory
memory = ConversationBufferMemory()

# Save some conversation turns
memory.save_context({"input": "Hello"}, {"output": "Hi there!"})
memory.save_context({"input": "How are you?"}, {"output": "I'm doing great, thanks!"})

# Retrieve stored memory
print(memory.load_memory_variables({}))

Step 4: Put It All Together

Now you can combine these elements:

  • The user sends a query.
  • The system retrieves relevant data via RAG.
  • The memory module checks for related interactions or facts.
  • The LLM generates a response based on the query, retrieved context, and memory.

This setup is powerful because it blends the LLM’s generative abilities with a custom memory tailored to your needs. It’s also entirely local, so your data stays private and secure.
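The whole flow fits in one small function. In this sketch the stub retriever, memory lookup, and LLM are placeholders for the Chroma retriever, the conversation memory, and the Ollama model from the earlier steps:

```python
def answer(query: str, retrieve, recall_memory, llm) -> str:
    # 1. The user sends a query; 2. RAG retrieves relevant data
    context = retrieve(query)
    # 3. The memory module checks for related interactions or facts
    history = recall_memory(query)
    # 4. The LLM answers from query + retrieved context + memory
    prompt = (
        f"Context:\n{context}\n\n"
        f"Relevant history:\n{history}\n\n"
        f"Question: {query}\nAnswer:"
    )
    return llm(prompt)

# Stub components so the sketch runs standalone
retrieve = lambda q: "RAG is retrieval-augmented generation."
recall_memory = lambda q: "User asked about RAG yesterday."
llm = lambda prompt: f"(model answer based on {len(prompt)} prompt chars)"

print(answer("What is RAG?", retrieve, recall_memory, llm))
```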

Final Thoughts

Building an agent like this might sound complex, but it’s mostly about connecting the dots between well-known tools. Once you’ve got it running, you can tweak and fine-tune it to handle specific tasks or remember things better. Start small, iterate, and soon you’ll have an agent that feels less like software and more like a real assistant.

Analyzing an LLM dataset for non-data scientists

Large Language Models (LLMs) have become increasingly important for tasks involving natural language processing (NLP). However, their effectiveness hinges on the quality of the datasets used for training and evaluation. While data scientists typically handle the intricacies of these datasets, there are several reasons why non-data scientists, such as developers, project managers, or domain experts, might also need to engage in this process.

Why Analyze an LLM Dataset?

Understanding and analyzing an LLM dataset is essential for several reasons:

  1. Ensuring Model Quality: The performance of an LLM is directly tied to the quality of its training data. By analyzing the dataset, you can identify any potential issues, such as imbalances, biases, or irrelevant data that might negatively impact the model’s output.
  2. Bias Detection and Ethical Considerations: Datasets can inadvertently contain biases that lead to unfair or unethical outcomes. For example, if the training data over-represents certain demographic groups, the LLM might produce biased results. Analyzing the dataset allows you to spot these issues early and address them before the model is deployed.
  3. Customizing for Specific Needs: Not all datasets are created equal. Depending on your application, you might need to fine-tune the LLM on data that is more relevant to your domain. Analyzing the dataset helps you understand its strengths and weaknesses, guiding the fine-tuning process.
  4. Compliance and Documentation: In regulated industries, it’s crucial to ensure that your data practices are compliant with laws and regulations, such as GDPR. Analyzing the dataset is a necessary step in auditing and documenting the data to meet these requirements.

What to Look for in a Dataset

When you’re tasked with analyzing an LLM dataset, focus on these key aspects:

  • Data Distribution: Check if the data covers all relevant categories and is evenly distributed across them. Imbalances can lead to biased models.
  • Quality and Relevance: Assess the quality of the data—look for noise, duplicates, or irrelevant entries that could skew results.
  • Representation of Sensitive Attributes: Pay attention to how sensitive attributes (e.g., race, gender) are represented to avoid introducing bias.
  • Coverage of Domain-Specific Content: Ensure that the dataset contains sufficient examples related to the specific language, terminology, or context relevant to your application.

Practical Steps

  1. Data Profiling: Start with basic profiling to understand the dataset’s structure, including the distribution of data points, missing values, and outliers.
  2. Bias Auditing: Use statistical methods to detect any biases. Simple checks like comparing distributions across different demographic groups can reveal potential issues.
  3. Domain Relevance Check: Evaluate whether the dataset includes enough examples relevant to your specific use case, and consider augmenting it with additional data if necessary.
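As a minimal illustration of steps 1 and 2, the Python standard library is enough to profile a label distribution and compare it across groups. The dataset and field names here are made up for the example; real datasets would be loaded from your corpus:

```python
from collections import Counter

# Toy dataset: in practice, rows would come from your real corpus
rows = [
    {"label": "positive", "group": "A"},
    {"label": "positive", "group": "A"},
    {"label": "negative", "group": "A"},
    {"label": "positive", "group": "B"},
]

# Step 1: profile the overall label distribution
label_counts = Counter(r["label"] for r in rows)
print(label_counts)

# Step 2: compare label rates across demographic groups;
# a large gap between groups is a signal worth investigating
for group in sorted({r["group"] for r in rows}):
    subset = [r for r in rows if r["group"] == group]
    rate = sum(r["label"] == "positive" for r in subset) / len(subset)
    print(f"group {group}: positive rate {rate:.2f}")
```

The same two checks scale to real datasets with tools like pandas, but the logic stays this simple.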

Conclusion

While data scientists usually handle the heavy lifting of dataset analysis, non-data scientists can play a crucial role in ensuring that an LLM performs well and behaves ethically. By engaging in dataset analysis, you not only improve the model’s quality but also help safeguard against potential biases and compliance issues. This approach ensures that the AI systems you contribute to are both effective and responsible.

LLM vs SLM: Why Small Language Models?

Large Language Models (LLMs) and Small Language Models (SLMs) represent distinct approaches to natural language processing. LLMs are massive models trained on vast amounts of text data, enabling them to generate human-quality text, translate languages, write different kinds of creative content, and answer your questions in an informative way. However, their size necessitates substantial computational resources.   

In contrast, SLMs are smaller models trained on more focused datasets. This makes them computationally efficient and suitable for specific tasks. They often excel in particular domains or applications.

Use Cases for LLMs

  • Content generation: Creating various text formats, from articles to code.
  • Machine translation: Translating between different languages.
  • Chatbots and virtual assistants: Providing interactive and informative conversations.
  • Summarization: Condensing long texts into shorter summaries.

LLMs excel at generating diverse text formats, from marketing copy and social media content to scripts, poems, and translations. They can also provide informative answers to a wide range of questions.

Use Cases for SLMs

  • Domain-specific tasks: Excelling in tasks requiring specialized knowledge, such as medical or legal text processing, as well as code-specific tasks
  • Resource-constrained environments: Operating efficiently on devices with limited computational power.
  • Faster training and deployment: Shorter development cycles compared to LLMs.

SLMs demonstrate strengths in specific text-based applications, excelling in tasks such as sentiment analysis, text classification, and named entity recognition. They can also be tailored for specialized domains like healthcare or finance. Additionally, SLMs can be adapted to support niche programming languages, providing solutions for specific development challenges.

Trade-offs and Considerations

LLMs demand substantial computational resources for training and deployment, reflecting their complexity and size. In contrast, SLMs are more efficient due to their smaller scale. While LLMs often excel in diverse language tasks, SLMs can be specialized for specific domains. Data requirements also differ significantly, with LLMs needing vast datasets and SLMs operating on smaller, focused collections. Ultimately, the choice between an LLM and an SLM hinges on factors such as computational budget, required performance, and the nature of the target application.

Hybrid Approaches

Hybrid approaches to language models combine the strengths of large language models (LLMs) and smaller, more specialized language models (SLMs). Transfer learning involves utilizing a pre-trained LLM as a foundation and adapting it to specific tasks through fine-tuning on domain-specific data. This approach benefits from the knowledge captured in the base LLM while tailoring the model to the target domain. Model distillation compresses a large LLM into a smaller, more efficient SLM while preserving key functionalities. This technique enables deployment in resource-constrained environments without significant performance degradation. By strategically combining LLMs and SLMs, organizations can develop robust and adaptable language models capable of handling a wide range of tasks.
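To make the distillation idea concrete, here is a minimal sketch of the temperature-scaled distillation loss: the KL divergence between the teacher's and student's softened output distributions, following Hinton et al.'s formulation. The logits are illustrative values, not from a real model:

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature > 1 softens the distribution, exposing the teacher's
    # relative preferences over non-top classes ("dark knowledge")
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) on softened distributions, scaled by T^2
    # so gradients keep a consistent magnitude across temperatures
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# A student matching the teacher incurs zero loss; a mismatched one does not
print(distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(distillation_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```

During training, this term is typically mixed with the standard cross-entropy loss on the ground-truth labels.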