Monthly Archives: February 2025

Building a Customer Support Chatbot With Ollama, Mistral 7B, SQLite, & Docker

(Part 1)

I built a customer support chatbot that can answer user queries and track orders using Mistral 7B, SQLite, and Docker. This chatbot is integrated with Ollama to generate intelligent responses and can retrieve real-time order statuses from a database. The project is fully containerized using Docker and version-controlled with GitHub.

In Part 1, we cover:
 ✅ Setting up Mistral 7B with Ollama (Dockerized Version)
 ✅ Connecting the chatbot to SQLite for order tracking
 ✅ Running everything inside Docker
 ✅ Pushing the project to GitHub

In Part 2, we will expand the chatbot by:
 🚀 Turning it into an API with FastAPI
 🎨 Building a Web UI using Streamlit
 💾 Allowing users to add new orders dynamically

📂 Project Structure

Here’s the structure of the project:

llm-chatbot/
├── Dockerfile # Docker setup for chatbot
├── setup_db.py # Initializes the SQLite database
├── chatbot.py # Main chatbot Python script
├── requirements.txt # Python dependencies
├── orders.db # SQLite Database (created dynamically, not tracked in Git)
└── README.md # Documentation

✅ The database (orders.db) is dynamically created and not committed to Git to keep things clean.

🚀 Tech Stack

  • Ollama + Mistral 7B → local LLM for generating responses
  • Python (requests, sqlite3) → chatbot logic
  • SQLite → lightweight order-tracking database
  • Docker & Docker Compose → containerization
  • GitHub → version control

🔧 Step 1: Setting Up Mistral 7B With Ollama (Dockerized Version)

To generate human-like responses, we use Mistral 7B, an open-source LLM that runs efficiently on local machines. Instead of installing Ollama directly, we use its Dockerized version.

📌 Install Ollama (Docker Version)

Run the following command to pull and start Ollama as a Docker container:

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
  • -d → Runs Ollama in the background
  • -v ollama:/root/.ollama → Saves model files persistently
  • -p 11434:11434 → Exposes Ollama’s API on port 11434
  • --name ollama → Names the container “ollama”
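
To verify the container is running, you can query Ollama's REST API; the /api/tags endpoint lists the models currently available locally:

curl http://localhost:11434/api/tags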

📌 Download Mistral 7B

Once Ollama is running in Docker, download Mistral 7B by running:

docker exec -it ollama ollama pull mistral

Now we have Mistral 7B running inside Docker! 🎉
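
As a quick smoke test, you can also call the same generate endpoint the chatbot will use later (the reply text will vary from run to run):

curl http://localhost:11434/api/generate -d '{"model": "mistral", "prompt": "Say hello!", "stream": false}'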

📦 Step 2: Creating the Chatbot

📌 chatbot.py (Main Chatbot Code)

This Python script:
 ✅ Takes user input
 ✅ Checks if it’s an order query
 ✅ Fetches order status from SQLite (if needed)
 ✅ If not an order query, sends it to Mistral 7B

import requests
import sqlite3
import os

# Ollama API endpoint
url = "http://localhost:11434/api/generate"

# Explicitly set the correct database path inside Docker
DB_PATH = "/app/orders.db"

# Function to fetch order status from SQLite
def get_order_status(order_id):
    # Ensure the database file exists before trying to connect
    if not os.path.exists(DB_PATH):
        return "Error: Database file not found! Please ensure the database is initialized."

    conn = sqlite3.connect(DB_PATH)
    cursor = conn.cursor()

    # Check if the orders table exists
    cursor.execute("SELECT name FROM sqlite_master WHERE type='table' AND name='orders';")
    table_exists = cursor.fetchone()
    if not table_exists:
        conn.close()
        return "Error: Orders table does not exist!"

    cursor.execute("SELECT status FROM orders WHERE order_id = ?", (order_id,))
    result = cursor.fetchone()
    conn.close()

    return result[0] if result else "Sorry, I couldn't find that order."

# System instruction for chatbot
system_prompt = """
You are a customer support assistant for an online shopping company.
Your job is to help customers with order tracking, returns, and product details.
Always be polite and provide helpful answers.
If the user asks about an order, ask them for their order number.
"""

print("Welcome to the Customer Support Chatbot! Type 'exit' to stop.\n")

while True:
    user_input = input("You: ")

    if user_input.lower() == "exit":
        print("Goodbye! 👋")
        break

    # Check if the user provided an order number (5-digit number)
    words = user_input.split()
    order_id = next((word for word in words if word.isdigit() and len(word) == 5), None)

    if order_id:
        chatbot_response = f"Order {order_id} Status: {get_order_status(order_id)}"
    else:
        # Send the question to Mistral for a response
        data = {
            "model": "mistral",
            "prompt": f"{system_prompt}\nCustomer: {user_input}\nAgent:",
            "stream": False
        }

        response = requests.post(url, json=data)
        chatbot_response = response.json()["response"]

    print("Chatbot:", chatbot_response)

Now the chatbot can track orders and answer general questions!
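
A sample session might look like this (the order lookup is deterministic because it comes straight from SQLite; Mistral's free-form answers will vary). Note that the simple parser only matches a bare 5-digit token, so "12345" works but "12345?" would not:

You: What is the status of order 12345
Chatbot: Order 12345 Status: Shipped - Expected delivery: Feb 28, 2025
You: exit
Goodbye! 👋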

🐳 Step 3: Running Everything with Docker & Docker Compose

To make deployment easier, we containerized the chatbot using Docker Compose, which lets us define the chatbot service, its startup command, and the volume that persists the SQLite database in a single file.

📌 docker-compose.yml (Manages Services)

version: '3.8'

services:
  chatbot:
    build: .
    container_name: chatbot_container
    network_mode: host   # so the chatbot can reach Ollama on localhost:11434
    volumes:
      - chatbot_data:/app
    stdin_open: true
    tty: true
    command: >
      sh -c "python setup_db.py && python chatbot.py"

volumes:
  chatbot_data:

What This Does:

  • Defines the chatbot service (chatbot_container)
  • Runs the container on the host network so localhost:11434 reaches the Ollama API
  • Mounts a volume to persist database files
  • Automatically runs setup_db.py before starting chatbot.py
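
With this file in place, build the image and start an interactive session (docker compose run allocates a terminal, which the chatbot's input() loop needs):

docker compose build
docker compose run chatbot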

💾 Step 4: Storing Orders in SQLite

We need a database to store order tracking details. We use SQLite, a lightweight database that’s perfect for small projects.

📌 setup_db.py (Creates the Database)

import sqlite3

# Store the database inside Docker at /app/orders.db
DB_PATH = "/app/orders.db"

# Connect to the database (creates the file if it doesn't exist)
conn = sqlite3.connect(DB_PATH)
cursor = conn.cursor()

# Create the orders table
cursor.execute("""
CREATE TABLE IF NOT EXISTS orders (
    order_id TEXT PRIMARY KEY,
    status TEXT
)
""")

# Insert some sample data
orders = [
    ("12345", "Shipped - Expected delivery: Feb 28, 2025"),
    ("67890", "Processing - Your order is being prepared."),
    ("11121", "Delivered - Your package was delivered on Feb 20, 2025."),
]
cursor.executemany("INSERT OR IGNORE INTO orders VALUES (?, ?)", orders)
conn.commit()
conn.close()

print("✅ Database setup complete inside Docker!")

Now our chatbot can track real orders!
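
Since the Dockerfile installs the sqlite3 CLI, you can sanity-check the seeded data from inside the running container (chatbot_container is the name set in docker-compose.yml):

docker exec -it chatbot_container sqlite3 /app/orders.db "SELECT * FROM orders;"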

📦 Step 5: Building the Docker Image

📌 Dockerfile (Defines the Chatbot Container)

# Use an official Python image
FROM python:3.10

# Install the SQLite CLI inside the container
RUN apt-get update && apt-get install -y sqlite3

# Set the working directory inside the container
WORKDIR /app

# Copy project files into the container
COPY . .

# Install required Python packages
RUN pip install requests

# Bake an initialized database into the image
# (docker-compose also runs setup_db.py at startup)
RUN python setup_db.py

# Expose port (optional, if we later add an API)
EXPOSE 5000

# Run the chatbot script
CMD ["python", "chatbot.py"]

📌 Build and Run in Docker

docker build -t chatbot .
docker run --network host -it chatbot

Note: --network host lets the container reach Ollama's API at localhost:11434 (this flag only has that effect on Linux). On Docker Desktop (macOS/Windows), point the URL in chatbot.py at http://host.docker.internal:11434 instead.

Now the chatbot and database run inside Docker! 🎉

📤 Step 6: Pushing the Project to GitHub

To save our work, we pushed everything to GitHub.

📌 Steps to Push to GitHub

git init
git add .
git commit -m "Initial commit: Add chatbot with SQLite and Docker"
git branch -M main
git remote add origin https://github.com/<YOUR_USERNAME>/llm-chatbot.git
git push -u origin main

Now the project is live on GitHub! 🎉


📌 What’s Next? (Part 2)

In Part 2, we will:
 🚀 Turn the chatbot into an API with FastAPI
 🎨 Build a Web UI using Streamlit
 💾 Allow users to add new orders dynamically

💡 Stay tuned for Part 2! 🚀

Distributed Design Pattern: Data Federation for Real-Time Querying

[Financial Portfolio Management Use Case]

In modern financial institutions, data is increasingly distributed across various internal systems, third-party services, and cloud environments. For senior architects designing scalable systems, ensuring real-time, consistent access to financial data is a challenge that can’t be overstated. Consider the complexity of querying diverse data sources — from live market data feeds to internal portfolio databases and client analytics systems — and presenting it as a unified view.

Problem Context:

As the financial sector moves towards more distributed architectures, especially in cloud-native environments, systems need to ensure that data across all sources is up to date and consistent in real time. This means avoiding stale data reads, which could result in misinformed trades or investment decisions.

For example, a stock trading platform queries live price data from multiple sources. If one of the sources returns outdated prices, a trade might be executed based on inaccurate information, leading to financial losses. This problem is particularly evident in environments like real-time portfolio management, where every millisecond of data staleness can impact trading outcomes.

The Federated Query Processing Solution

Federated Query Processing offers a powerful way to solve these issues by enabling seamless, real-time access to data from multiple distributed sources. Instead of consolidating data into a single repository (which introduces replication and synchronization overhead), federated querying allows data to remain in its source system. The query processing engine handles the aggregation of results from these diverse sources, offering real-time, accurate data without requiring extensive data movement.

How Federated Querying Works

  1. Query Management Layer:
    This layer sits at the front-end of the system, serving as the interface for querying different data sources. It’s responsible for directing the query to the right sources based on predefined criteria and ensuring the appropriate data is retrieved for any given request. As part of this layer, a query optimization strategy is essential to ensure the most efficient retrieval of data from distributed systems.
  2. Data Source Layer:
    In real-world applications, data is spread across various databases, APIs, internal repositories, and cloud storage. Federated queries are designed to traverse these diverse sources without duplicating or syncing data. Each of these data sources remains autonomous and independently managed, but queries are handled cohesively.
  3. Query Execution and Aggregation:
    Once the queries are dispatched to the relevant sources, the results are aggregated by the federated query engine. The aggregation process ensures that users or systems get a seamless, real-time view of data, regardless of its origin. This architecture enables data autonomy, where each source retains control over its data, yet data can be queried as if it were in a single unified repository. The sketch after this list ties the three layers together.
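
To make these layers concrete, here is a minimal Python sketch. Both sources are stubbed with in-memory dicts (the symbols, prices, and function names are illustrative assumptions); a real deployment would call the portfolio database and the market-data feed instead:

from concurrent.futures import ThreadPoolExecutor

# --- Data source layer: each source stays autonomous ---
def query_portfolio_db(symbols):
    positions = {"AAPL": 50, "MSFT": 20}       # stand-in for an internal portfolio DB
    return {s: positions.get(s, 0) for s in symbols}

def query_market_feed(symbols):
    prices = {"AAPL": 189.30, "MSFT": 411.12}  # stand-in for a live market-data API
    return {s: prices.get(s) for s in symbols}

# --- Query management + aggregation layers ---
def federated_quote(symbols):
    # Dispatch sub-queries to both sources in parallel...
    with ThreadPoolExecutor() as pool:
        positions_future = pool.submit(query_portfolio_db, symbols)
        prices_future = pool.submit(query_market_feed, symbols)
    positions = positions_future.result()
    prices = prices_future.result()
    # ...then aggregate the partial results into one unified view
    return {s: {"quantity": positions[s],
                "price": prices[s],
                "value": positions[s] * (prices[s] or 0)}
            for s in symbols}

print(federated_quote(["AAPL", "MSFT"]))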

Architectural Considerations for Federated Querying

As a senior architect, implementing federated query processing involves several architectural considerations:

Data Source Independence:
Federated query systems thrive in environments where data sources must remain independently managed and decentralized. Such environments often involve heterogeneous data formats and data models across sources. Ensuring that each source can stay up to date without disrupting the overall query response time is critical.

Optimization and Scalability:
Query optimization plays a key role. A sophisticated optimization strategy needs to be in place to handle the following (a small caching sketch follows the list):

  • Source Selection: The federated query engine should intelligently decide where to pull data from based on query complexity and data freshness requirements.
  • Parallel Query Execution: Given that data is distributed, executing multiple queries in parallel across nodes helps optimize response times.
  • Cache Mechanisms: Using cache for frequently requested data or complex queries can greatly improve performance.
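
Building on the federated_quote sketch above, here is one way the cache mechanism might look; the TTL value is an illustrative assumption that a real trading system would tune to its freshness requirements:

import time

class TTLCache:
    # Tiny time-based cache for frequently requested federated results
    def __init__(self, ttl_seconds=1.0):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self.store.get(key)
        if entry and entry[0] > time.time():
            return entry[1]  # fresh hit
        return None          # miss, or the entry has gone stale

    def put(self, key, value):
        self.store[key] = (time.time() + self.ttl, value)

cache = TTLCache(ttl_seconds=0.5)  # short TTL because freshness matters in trading

def cached_federated_quote(symbols):
    key = tuple(sorted(symbols))
    result = cache.get(key)
    if result is None:
        result = federated_quote(symbols)  # fall through to the federated engine
        cache.put(key, result)
    return result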

Consistency and Latency:

Real-time querying across distributed systems brings challenges of data consistency and latency. A robust mechanism should be in place to ensure that queries to multiple sources return consistent data. Considerations such as eventual consistency and data synchronization strategies are key to implementing federated queries successfully in real-time systems.

Failover Mechanisms:

Given the distributed nature of data, ensuring that the system can handle failures gracefully is crucial. Federated systems must have failover mechanisms to redirect queries when a data source fails and continue serving queries without significant delay.
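
A minimal sketch of the failover idea, treating each data source as a callable (the names and error handling are illustrative):

def query_with_failover(primary, replicas, request):
    # Try the primary source first, then redirect to each replica in order
    for source in (primary, *replicas):
        try:
            return source(request)
        except Exception:
            continue  # this source failed; fall through to the next one
    raise RuntimeError(f"All data sources failed for request: {request!r}")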

Real-World Performance Considerations

When federated query processing is implemented effectively, significant performance improvements can be realized:

  1. Reduction in Network Overhead:
    Instead of moving large volumes of data into a central repository, federated queries only retrieve the necessary data, significantly reducing network traffic and latency.
  2. Scalability:
    As the number of data sources grows, federated query engines can scale by adding more nodes to the query execution infrastructure, ensuring the system can handle larger data volumes without performance degradation.
  3. Improved User Experience:
    In financial systems, low-latency data retrieval is paramount. By optimizing the query process and ensuring the freshness of data, users can access real-time market data seamlessly, leading to more accurate and timely decision-making.

Federated query processing is a powerful approach that enables organizations to handle large-scale, distributed data systems efficiently. For senior architects, understanding how to implement federated query systems effectively will be critical to building systems that can seamlessly scale, improve performance, and adapt to changing data requirements. By embracing these patterns, organizations can create flexible, high-performing systems capable of delivering real-time insights with minimal latency — crucial for sectors like financial portfolio management.

Distributed Design Pattern: Consistent Hashing for Load Distribution

[A Music Streaming Service Shard Management Case Study]

Imagine you’re building the next Spotify or Apple Music. Your service needs to store and serve millions of music files to users worldwide. As your user base grows, a single server cannot handle the load, so you need to distribute the data across multiple servers. This raises several critical challenges:

  1. Initial Challenge: How do you determine which server should store and serve each music file?
  2. Scaling Challenge: What happens when you need to add or remove servers?
  3. Load Distribution: How do you ensure an even distribution of data and traffic across servers?

Let’s see how these challenges manifest in a real scenario:

Consider a music streaming service with:

  • 10 million songs
  • 4 servers (initially)
  • Need to scale to 5 servers due to increased load

Traditional Approach Using Simple Hash Distribution

The simplest approach would be to use a hash function with modulo operation:

server_number = hash(song_id) % number_of_servers

Problems with this approach:

  1. When scaling from 4 to 5 servers, approximately 80% of all songs need to be redistributed
  2. During redistribution:
  • High network bandwidth consumption
  • Temporary service degradation
  • Risk of data inconsistency
  • Increased operational complexity

For example (the short simulation after this list confirms the ~80% figure):

  • Song “A” with hash 123 → Server 3 (123 % 4 = 3)
  • After adding 5th server → Server 3 (123 % 5 = 3)
  • Song “B” with hash 14 → Server 2 (14 % 4 = 2)
  • After adding 5th server → Server 4 (14 % 5 = 4)
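
A tiny simulation (treating the song ID itself as the hash value) confirms the redistribution cost:

total = 10_000
# Count how many songs land on a different server when the
# cluster grows from 4 to 5 servers under modulo placement
moved = sum(1 for song_id in range(total) if song_id % 4 != song_id % 5)
print(f"{moved / total:.0%} of songs must move")  # -> 80%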

Solution: Consistent Hashing

Consistent Hashing elegantly solves these problems by creating a virtual ring (hash space) where both servers and data are mapped using the same hash function.

How It Works

1. Hash Space Creation:

  • Create a circular hash space (typically 0 to 2²⁵⁶ − 1)
  • Map both servers and songs onto this space using a uniform hash function

2. Data Assignment:

  • Each song is assigned to the next server clockwise from its position
  • When a server is added/removed, only the songs between the affected server and its predecessor need to move

3. Virtual Nodes:

  • Each physical server is represented by multiple virtual nodes
  • Improves load distribution
  • Handles heterogeneous server capacities

Implementation Example

Let’s implement this for our music streaming service:

import hashlib

class ConsistentHash:
    def __init__(self, replicas=3):
        self.replicas = replicas
        self.ring = {}         # Hash -> Server mapping
        self.sorted_keys = []  # Sorted hash values

    def add_server(self, server):
        # Add virtual nodes for each server
        for i in range(self.replicas):
            key = self._hash(f"{server}:{i}")
            self.ring[key] = server
            self.sorted_keys.append(key)
        self.sorted_keys.sort()

    def remove_server(self, server):
        # Remove all virtual nodes for the server
        for i in range(self.replicas):
            key = self._hash(f"{server}:{i}")
            del self.ring[key]
            self.sorted_keys.remove(key)

    def get_server(self, song_id):
        # Find the first server clockwise from the song's position
        if not self.ring:
            return None

        key = self._hash(str(song_id))
        for hash_key in self.sorted_keys:
            if key <= hash_key:
                return self.ring[hash_key]
        return self.ring[self.sorted_keys[0]]  # wrap around the ring

    def _hash(self, key):
        # SHA-256 gives a stable, uniform position on the ring
        # (Python's built-in hash() is salted per process, so
        # placements would not survive restarts)
        return int(hashlib.sha256(key.encode()).hexdigest(), 16)

The Consistent Hashing Ring ensures efficient load distribution by mapping both servers and songs onto a circular space using SHA-256 hashing. Each server is assigned multiple virtual nodes, helping balance the load evenly. When a new server is added, it gets three virtual nodes to distribute traffic more uniformly. To determine where a song should be stored, the system hashes the song_id and assigns it to the next available server in a clockwise direction. This mechanism significantly improves scalability, as only a fraction of songs need to be reassigned when adding or removing servers, reducing data movement and minimizing disruptions.
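
A quick usage sketch of the class above (server names and the song ID are illustrative):

ring = ConsistentHash(replicas=3)
for server in ("server-1", "server-2", "server-3", "server-4"):
    ring.add_server(server)

print(ring.get_server(12345))   # the song's clockwise successor on the ring

# Adding a fifth server only remaps songs in the segments it takes over
ring.add_server("server-5")
print(ring.get_server(12345))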

How This Solves Our Previous Problems

  1. Minimal Data Movement:
    • When adding a new server, only ~K/N songs need to move (where K is the total number of songs and N is the new number of servers)
    • For our 10 million songs example, scaling from 4 to 5 servers:
      • Traditional: ~8 million songs move
      • Consistent Hashing: ~2 million songs move

2. Better Load Distribution:

  • Virtual nodes ensure even distribution
  • Each server handles approximately equal number of songs
  • Can adjust number of virtual nodes based on server capacity

3. Improved Scalability:

  • Adding/removing servers only affects neighboring segments
  • No system-wide recalculation needed
  • Operations can be performed without downtime

The diagram illustrates Consistent Hashing for Load Distribution in a Music Streaming Service. Songs (e.g., Song A and Song B) are assigned to servers using a hash function, which maps them onto a circular hash space. Servers are also mapped onto the same space, and each song is assigned to the next available server in the clockwise direction. This ensures even distribution of data across multiple servers while minimizing movement when scaling. When a new server is added or removed, only the affected segment of the ring is reassigned, reducing disruption and improving scalability.

Real-World Benefits

  • Efficient Scaling: Servers can be added or removed without downtime.
  • Better User Experience: Reduced query latency and improved load balancing.
  • Cost Savings: Optimized network bandwidth usage and lower infrastructure costs.

Consistent Hashing is a foundational pattern used in large-scale distributed systems like DynamoDB, Cassandra, and Akamai CDN. It ensures high availability, efficient load balancing, and seamless scalability — all crucial for real-time applications like music streaming services.

💡 Key Takeaways:
  • Cuts data movement during scaling from ~80% of all songs (modulo hashing) to ~K/N (roughly 2 million instead of 8 million songs in our example)
  • Enables near-linear scalability with minimal operational cost
  • Prevents service disruptions while handling dynamic workloads

This elegant approach turns a brittle, inefficient system into a robust, scalable infrastructure — making it the preferred choice for modern distributed architectures.
