Start Building an LLM application

sjchin
Aug 27, 2024

Introduction

In this post, I will try to summarize my learning journey in building an LLM application: the essential concepts to understand and the high-level steps required to build one. I avoid diving into the details (such as the use of specific APIs) so as not to duplicate the official documentation, and to account for the rapidly evolving nature of the relevant APIs.

Key Concepts

Large Language Models (LLMs) — examples include ChatGPT, Llama, Mistral AI, etc. The most popular one is ChatGPT.

Orchestration Framework — examples: LangChain/LangSmith/LangGraph/LlamaIndex. Think of these as wrapper modules that let you interact with the base LLM while adding extra features on top of it.

Prompt — the context provided to the LLM so it can understand the problem

Temperature — a parameter that controls the amount of randomness in an LLM's output. It governs the balance between predictability (lower temperature) and creativity (higher temperature) in the generated text.

Memory — the foundational LLM model may not store conversation history (it doesn't remember what you asked or talked about before). One of the first features to add to the LLM app you build is memory.

Getting Started — Building the first simple app

Official documentation from LangChain is here.

High-level steps:

  1. Obtain an API key for the LLM you want to use. If you are using ChatGPT, you will most likely need to top up a small amount to start using the API.
  2. Obtain an API key from the Orchestration Framework (like LangChain)
  3. Store both API keys in a .env file. If you use git for your project, add the .env file to .gitignore.
  4. (Recommended) — For Python projects, set up a virtual environment, and follow typical project set up and dependencies management.
  5. Start coding, and take note of initial settings such as temperature. For example, LangChain's ChatOpenAI class uses a default temperature of 0.7 (the OpenAI API accepts values from 0 to 2); see the sketch after this list.
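
To make step 5 concrete, here is a minimal sketch, assuming python-dotenv and langchain-openai are installed and that OPENAI_API_KEY is stored in the .env file:

from dotenv import load_dotenv
from langchain_openai import ChatOpenAI

load_dotenv()  # reads OPENAI_API_KEY (and any other keys) from the .env file
model = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7)  # the default temperature, made explicit
response = model.invoke("Explain what temperature does in one sentence.")
print(response.content)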

Working with Llama?

There are two options:

  1. Download the Llama model from the official site and run it yourself. Since the model is open source, you can perform customization and fine-tuning. The drawback of this approach is that you need a powerful computer to run (and host) the model.
  2. Use one of the online hosting solutions, such as Groq (see the sketch below).
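
For option 2, a minimal sketch, assuming the langchain-groq package is installed and a GROQ_API_KEY environment variable is set (the model name is illustrative):

from langchain_groq import ChatGroq

# Groq hosts the model; only an API key is needed locally
llm = ChatGroq(model="llama3-8b-8192", temperature=0.7)
print(llm.invoke("Hello, Llama!").content)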

ChatPromptTemplate

Initial thought: the template lets the developer program prompts that share a similar pattern while changing key variables.

Example taken from the LangChain documentation:

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant. Answer all questions to the best of your ability.",
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

# Provided you have a model defined earlier, we can chain the prompt with
# the base model as shown below.
# Note you cannot chain two prompts like prompt1 | prompt2:
# the output of prompt1 cannot simply be passed to prompt2.
chain = prompt | model
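
Once chained, we can invoke it by supplying the messages variable that fills the MessagesPlaceholder (a minimal sketch, assuming the model defined earlier):

from langchain_core.messages import HumanMessage

# The dict key matches variable_name="messages" in the placeholder
response = chain.invoke({"messages": [HumanMessage(content="What is a token?")]})
print(response.content)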

LangChain Expression Language (LCEL)

  • The “pipe” operator | is the key to LCEL chains
  • The order of the elements matters: LCEL executes from left to right
  • An LCEL chain is a sequence of Runnables
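
As a sketch of the left-to-right execution, we can append an output parser as a third Runnable, reusing the prompt and model defined above:

from langchain_core.output_parsers import StrOutputParser

# Executes left to right: prompt -> model -> parser
chain = prompt | model | StrOutputParser()
text = chain.invoke({"messages": [HumanMessage(content="hi")]})  # now returns a plain string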

Token, Memory, and Cost

Definition from ChatGPT: “Tokens are pieces of words used for natural language processing. For English text, 1 token is approximately 4 characters or 0.75 words.”

Why is this important? Cost matters.

Because OpenAI charges for API usage by token count. You can check the latest pricing on their webpage here. For example, ChatGPT-3.5-Turbo currently costs $3 / 1M input tokens.

1M input tokens might seem like a lot, but a typical input with enough context for the model may already be 100 tokens or more. Thus the actual number of Q&A exchanges you can get for $3 could be much lower than you think (fewer than 10,000 conversations).

You can try a cheaper model like GPT-4o mini, but it will still cost you.
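
One way to estimate token counts before sending a request is OpenAI's tiktoken library (a quick sketch, assuming tiktoken is installed):

import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
text = "You are a helpful assistant. Answer all questions to the best of your ability."
print(len(enc.encode(text)))  # number of input tokens this text would consume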

Relation with Memory

LangChain offers a way to manage conversation history (memory) by storing the message history associated with a session ID. This memory management is different from the memory capability offered by ChatGPT. Take the illustration from the official documentation as an example:

Memory loop. Source: https://python.langchain.com/v0.1/docs/modules/memory/#introduction

The above loop is from version 0.1 of the documentation; it is unclear whether it changed in v0.2. This memory management technique means the LangChain API adds the stored history to the prompt sent to the LLM model (like ChatGPT). This increases the token count used and drives up cost.
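
A minimal sketch of the session-ID pattern, based on the v0.2-style RunnableWithMessageHistory wrapper and reusing the chain defined earlier (the store and session ID are illustrative):

from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

store = {}  # maps session_id -> message history

def get_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

chat = RunnableWithMessageHistory(chain, get_history, input_messages_key="messages")
chat.invoke(
    {"messages": [HumanMessage(content="Hi, I am SJ")]},
    config={"configurable": {"session_id": "session-1"}},
)
# Every stored message is re-sent as part of the next prompt, which is
# why token usage (and cost) grows as the conversation gets longer.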

Simple Apps with LangChain

What we can do

  • Basic conversation application with memory
  • Key-value extractor from a given paragraph/passage
  • Sentiment analysis from a given paragraph/passage
  • Use a pre-built chain (from LangChain) to convert queries into SQL
  • Extract information from PDFs (involves the use of RAG)

RAG (Retrieval-augmented generation)

This NVIDIA article gives a pretty good summary of what RAG is.

High-level steps I will need

  1. Load the data
  2. Split it
  3. Convert it into vectors (a process called embedding; this can be costly and needs further investigation)
  4. Store it in a vector database
  5. Chain it into the prompt — using retrieval, etc. (see the sketch below)
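
A minimal sketch of the five steps, assuming langchain-community, langchain-openai, langchain-text-splitters, faiss-cpu, and pypdf are installed (the file name is illustrative):

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

docs = PyPDFLoader("example.pdf").load()  # 1. Load the data
splits = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200
).split_documents(docs)  # 2. Split it
vectorstore = FAISS.from_documents(splits, OpenAIEmbeddings())  # 3 & 4. Embed and store
retriever = vectorstore.as_retriever()  # 5. Chain it into a prompt via retrieval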
