Let us build something real today. Not a theory, not a promise, but an actual working piece of software that can hold a conversation. I will show you how to create a simple chatbot from scratch, explain every line of code, and walk through the reasoning behind each design choice. By the end of this article, you will have a functional AI chatbot and a clear understanding of how it works.
What Makes a Chatbot “AI”?
Before writing code, we need to clarify what artificial intelligence means in this context. A rule-based chatbot follows hardcoded patterns. If the user says “hello”, the bot replies “hi”. That is not AI. That is a lookup table.
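For contrast, here is roughly what a rule-based bot boils down to. This is a throwaway sketch, not something we will build on:

```python
# A rule-based "chatbot" is just a lookup table: exact match in, canned reply out.
rules = {"hello": "hi", "how are you": "fine, thanks"}

def rule_based_reply(message):
    # Any phrasing not listed verbatim falls through to the default
    return rules.get(message.lower().strip(), "I don't understand.")

print(rule_based_reply("hello"))     # hi
print(rule_based_reply("hi there"))  # I don't understand. (no generalization)
```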
An AI chatbot, even a simple one, learns something from data. It builds a mathematical representation of language. It can handle sentences it has never seen before. It generalizes. The bot I will show you uses a neural network with word embeddings. It does not memorize responses. It learns the relationships between words and intents.
The example I will build uses a feedforward neural network trained on sentence embeddings. I will keep it small enough to run on any laptop but powerful enough to demonstrate real AI principles.
The Dataset
We need training data. I will create a simple intent classification dataset. Intents are categories of user messages. For each intent, we provide example sentences and a corresponding response.
Here are the intents for our weather chatbot:
Intent: “greeting”
Examples: “hello”, “hi there”, “good morning”, “hey”
Response: “Hello! I can help you with weather information.”
Intent: “weather_current”
Examples: “what is the weather like”, “how is the weather today”, “current conditions”, “is it raining”
Response: “Let me check the current weather for your location.”
Intent: “weather_forecast”
Examples: “what will the weather be tomorrow”, “weekend forecast”, “next week weather”, “will it rain on Friday”
Response: “Here is the forecast for the requested day.”
Intent: “temperature”
Examples: “how hot is it”, “what is the temperature”, “is it cold outside”, “feels like temperature”
Response: “I will get the current temperature for you.”
Intent: “wind”
Examples: “is it windy”, “wind speed”, “what direction is the wind blowing”, “gusts”
Response: “Let me check the wind conditions.”
Intent: “humidity”
Examples: “how humid is it”, “humidity level”, “feels muggy”, “dew point”
Response: “I will retrieve the humidity information.”
Intent: “location”
Examples: “set my location”, “change city”, “use my current location”, “I am in London”
Response: “I have updated your location.”
Intent: “thanks”
Examples: “thank you”, “thanks a lot”, “appreciate it”, “great”
Response: “You are welcome! Ask me about weather anytime.”
Intent: “goodbye”
Examples: “bye”, “see you later”, “goodbye”, “exit”
Response: “Goodbye! Check back for weather updates.”
That gives us nine intents with four to five examples each. This is a small dataset, but it will work for learning purposes. In a real system, you would want fifty to one hundred examples per intent.
Turning Words Into Numbers
Neural networks do not understand words. They understand numbers. So we need to convert each sentence into a fixed-length vector. I will use a simple approach: average word embeddings.
First, we need a pretrained word embedding model. Word embeddings are vector representations of words where similar words have similar vectors. I will use GloVe (Global Vectors for Word Representation) 50-dimensional embeddings. The glove.6B release is freely available and was trained on six billion tokens of Wikipedia and newswire text.
Here is how we load the embeddings:
```python
import numpy as np
import re

def load_glove_embeddings(filepath):
    """Load GloVe vectors into a dict mapping each word to its NumPy vector."""
    embeddings = {}
    with open(filepath, 'r', encoding='utf-8') as f:
        for line in f:
            values = line.split()
            word = values[0]
            vector = np.array(values[1:], dtype=np.float32)
            embeddings[word] = vector
    return embeddings

glove = load_glove_embeddings('glove.6B.50d.txt')
```
This function reads the GloVe file. Each line contains a word followed by fifty numbers. We store the word as a key and the vector as a NumPy array.
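A quick sanity check after loading. The vocabulary size shown is for the glove.6B files; your numbers may differ:

```python
print(len(glove))              # vocabulary size, roughly 400,000 for glove.6B
print(glove['weather'].shape)  # (50,) for the 50-dimensional files
```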
Now we need a function that turns a sentence into a vector by averaging the embeddings of its words:
```python
def sentence_to_vector(sentence, embeddings, vector_size=50):
    words = re.findall(r'\b[a-z]+\b', sentence.lower())
    valid_vectors = [embeddings[w] for w in words if w in embeddings]
    if not valid_vectors:
        return np.zeros(vector_size)
    return np.mean(valid_vectors, axis=0)
```
This function lowercases the sentence, extracts words using a simple regular expression, looks up each word’s embedding, and averages them. If no words are found, it returns a zero vector. The zero vector acts as a fallback but is not ideal. In production, you would handle out-of-vocabulary words with random vectors or subword embeddings.
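To see why averaged embeddings are useful, compare related and unrelated sentences with real GloVe vectors loaded. This is a sketch; the exact similarity values depend on the embeddings:

```python
def cosine_similarity(a, b):
    # Cosine of the angle between two vectors; values near 1.0 mean very similar
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

v1 = sentence_to_vector("how hot is it", glove)
v2 = sentence_to_vector("what is the temperature", glove)
v3 = sentence_to_vector("see you later", glove)
print(cosine_similarity(v1, v2))  # should be relatively high
print(cosine_similarity(v1, v3))  # should be noticeably lower
```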
Building the Neural Network
We need a network that takes a 50-dimensional input vector and outputs probabilities for nine intents. A simple architecture works: input layer (50 neurons), two hidden layers (128 neurons each with ReLU activation), and output layer (9 neurons with softmax activation).
Here is the implementation using NumPy from scratch. I am doing this without deep learning frameworks so you can see every calculation.
```python
class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        # Initialize weights with small random values
        self.W1 = np.random.randn(input_size, hidden_size) * 0.01
        self.b1 = np.zeros((1, hidden_size))
        self.W2 = np.random.randn(hidden_size, hidden_size) * 0.01
        self.b2 = np.zeros((1, hidden_size))
        self.W3 = np.random.randn(hidden_size, output_size) * 0.01
        self.b3 = np.zeros((1, output_size))

    def relu(self, x):
        return np.maximum(0, x)

    def relu_derivative(self, x):
        return (x > 0).astype(float)

    def softmax(self, x):
        # Subtract the row max before exponentiating for numerical stability
        exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
        return exp_x / np.sum(exp_x, axis=1, keepdims=True)

    def forward(self, X):
        self.z1 = np.dot(X, self.W1) + self.b1
        self.a1 = self.relu(self.z1)
        self.z2 = np.dot(self.a1, self.W2) + self.b2
        self.a2 = self.relu(self.z2)
        self.z3 = np.dot(self.a2, self.W3) + self.b3
        self.output = self.softmax(self.z3)
        return self.output

    def backward(self, X, y, output, learning_rate=0.01):
        m = X.shape[0]
        # Output layer gradient (softmax + cross-entropy simplifies to output - y)
        dZ3 = output - y
        dW3 = np.dot(self.a2.T, dZ3) / m
        db3 = np.sum(dZ3, axis=0, keepdims=True) / m
        # Second hidden layer gradient
        dA2 = np.dot(dZ3, self.W3.T)
        dZ2 = dA2 * self.relu_derivative(self.z2)
        dW2 = np.dot(self.a1.T, dZ2) / m
        db2 = np.sum(dZ2, axis=0, keepdims=True) / m
        # First hidden layer gradient
        dA1 = np.dot(dZ2, self.W2.T)
        dZ1 = dA1 * self.relu_derivative(self.z1)
        dW1 = np.dot(X.T, dZ1) / m
        db1 = np.sum(dZ1, axis=0, keepdims=True) / m
        # Update weights with gradient descent
        self.W3 -= learning_rate * dW3
        self.b3 -= learning_rate * db3
        self.W2 -= learning_rate * dW2
        self.b2 -= learning_rate * db2
        self.W1 -= learning_rate * dW1
        self.b1 -= learning_rate * db1
```
This network has three layers. The forward method passes input through each layer, applies ReLU activation to hidden layers, and softmax to the output. The backward method computes gradients using backpropagation and updates weights using gradient descent.
The ReLU activation function outputs zero for negative inputs and the input itself for positive inputs. This introduces nonlinearity, allowing the network to learn complex patterns. Without nonlinearity, stacking layers would be equivalent to a single linear layer.
The softmax function converts raw scores into probabilities that sum to one. The highest probability indicates the predicted intent.
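A tiny numeric check of both activations (values are illustrative):

```python
scores = np.array([[2.0, -1.0, 0.5]])
print(np.maximum(0, scores))  # ReLU: [[2.  0.  0.5]]
exp = np.exp(scores - np.max(scores, axis=1, keepdims=True))
print(exp / exp.sum(axis=1, keepdims=True))  # softmax: probabilities summing to 1
```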
Training the Network
We need to convert our intents and examples into training data. Each example sentence becomes an input vector, and the corresponding intent becomes a one-hot encoded target vector.
One-hot encoding means we create a vector with nine positions, one per intent. The position corresponding to the correct intent is set to 1, and all other positions are 0. For example, if “greeting” is intent index 0, its target vector is [1,0,0,0,0,0,0,0,0].
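In NumPy this is a one-liner, since each row of the identity matrix is exactly a one-hot vector:

```python
num_intents = 9
one_hot = np.eye(num_intents)[0]  # target for intent index 0 ("greeting")
print(one_hot)                    # [1. 0. 0. 0. 0. 0. 0. 0. 0.]
```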
Here is the training loop:
```python
def prepare_training_data(intents_data, embeddings):
    X_train = []
    y_train = []
    intent_to_index = {intent: i for i, intent in enumerate(intents_data.keys())}
    for intent, examples in intents_data.items():
        for example in examples['sentences']:
            vector = sentence_to_vector(example, embeddings)
            X_train.append(vector)
            target = np.zeros(len(intents_data))
            target[intent_to_index[intent]] = 1
            y_train.append(target)
    return np.array(X_train), np.array(y_train), intent_to_index

intents_data = {
    'greeting': {'sentences': ['hello', 'hi there', 'good morning', 'hey']},
    'weather_current': {'sentences': ['what is the weather like', 'how is the weather today', 'current conditions', 'is it raining']},
    'weather_forecast': {'sentences': ['what will the weather be tomorrow', 'weekend forecast', 'next week weather', 'will it rain on Friday']},
    'temperature': {'sentences': ['how hot is it', 'what is the temperature', 'is it cold outside', 'feels like temperature']},
    'wind': {'sentences': ['is it windy', 'wind speed', 'what direction is the wind blowing', 'gusts']},
    'humidity': {'sentences': ['how humid is it', 'humidity level', 'feels muggy', 'dew point']},
    'location': {'sentences': ['set my location', 'change city', 'use my current location', 'I am in London']},
    'thanks': {'sentences': ['thank you', 'thanks a lot', 'appreciate it', 'great']},
    'goodbye': {'sentences': ['bye', 'see you later', 'goodbye', 'exit']}
}

X_train, y_train, intent_to_index = prepare_training_data(intents_data, glove)
index_to_intent = {v: k for k, v in intent_to_index.items()}

# Create and train the network
network = NeuralNetwork(input_size=50, hidden_size=128, output_size=9)
epochs = 500
for epoch in range(epochs):
    output = network.forward(X_train)
    network.backward(X_train, y_train, output, learning_rate=0.01)
    if epoch % 100 == 0:
        loss = -np.mean(np.sum(y_train * np.log(output + 1e-8), axis=1))
        print(f"Epoch {epoch}, Loss: {loss:.4f}")
```
This loop runs 500 epochs. Each epoch, the network processes all training examples, computes the loss, and updates weights. The loss is categorical cross-entropy, which measures how different the predicted probabilities are from the true targets. Lower loss means better predictions.
The small epsilon (1e-8) inside the log prevents taking the logarithm of zero, which is undefined (it diverges to negative infinity).
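For reference, the quantity being printed is the average categorical cross-entropy over the m training examples:

$$L = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{9} y_{ik} \, \log\left(\hat{y}_{ik} + \epsilon\right)$$

where $y_{ik}$ is 1 only when intent $k$ is the true label for example $i$, $\hat{y}_{ik}$ is the predicted probability from softmax, and $\epsilon = 10^{-8}$.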
Adding a Confidence Threshold
Neural networks always produce a prediction, even for nonsense input. If the user types “purple monkey dishwasher”, the network will still pick the most likely intent from the nine it knows, even if none actually match.
We need a confidence threshold. If the highest probability is below the threshold, the bot should say it does not understand.
```python
def predict_intent(sentence, network, embeddings, intent_to_index, index_to_intent, threshold=0.7):
    vector = sentence_to_vector(sentence, embeddings)
    vector = vector.reshape(1, -1)
    probabilities = network.forward(vector)[0]
    max_prob = np.max(probabilities)
    if max_prob < threshold:
        return None, max_prob
    predicted_index = np.argmax(probabilities)
    return index_to_intent[predicted_index], max_prob
```
The threshold of 0.7 means the network must be at least 70% confident to make a prediction. Otherwise, it returns None.
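A quick call looks like this; the printed intent and confidence are hypothetical and depend on your trained weights and embeddings:

```python
intent, confidence = predict_intent("is it chilly outside", network, glove,
                                    intent_to_index, index_to_intent)
print(intent, confidence)  # e.g. ('temperature', 0.81) -- illustrative values only
```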
The Conversation Loop
Now we tie everything together into a working chatbot:
```python
def run_chatbot(network, embeddings, intent_to_index, index_to_intent, intents_data):
    print("Weather Chatbot Active")
    print("Type 'quit' to exit")
    print("-" * 40)
    while True:
        user_input = input("\nYou: ").strip()
        if user_input.lower() == 'quit':
            print("Bot: Goodbye!")
            break
        intent, confidence = predict_intent(user_input, network, embeddings,
                                            intent_to_index, index_to_intent)
        if intent is None:
            print(f"Bot: I am not sure what you mean. (Confidence: {confidence:.2f})")
            print("Bot: Try asking about weather, temperature, wind, or humidity.")
        else:
            response = intents_data[intent]['response']
            print(f"Bot: {response} (Intent: {intent}, Confidence: {confidence:.2f})")
```
Complete Working Example
Let me assemble all the code into a single script. I will include a fallback for when GloVe is not available, using random embeddings as a demonstration.
```python
import numpy as np
import re

# In practice, download glove.6B.50d.txt from https://nlp.stanford.edu/projects/glove/
def load_glove_embeddings(filepath):
    """Load GloVe vectors into a dict mapping each word to its NumPy vector."""
    embeddings = {}
    with open(filepath, 'r', encoding='utf-8') as f:
        for line in f:
            values = line.split()
            embeddings[values[0]] = np.array(values[1:], dtype=np.float32)
    return embeddings

def create_demo_embeddings():
    """Create tiny random embeddings for demonstration when GloVe is unavailable."""
    common_words = ['hello', 'hi', 'weather', 'temperature', 'wind', 'rain', 'hot',
                    'cold', 'today', 'tomorrow', 'forecast', 'humid', 'city', 'thank',
                    'bye', 'good', 'morning', 'afternoon', 'night', 'what', 'is', 'the']
    return {word: np.random.randn(50) * 0.1 for word in common_words}

# Fall back to random demo embeddings if GloVe is not available
try:
    glove = load_glove_embeddings('glove.6B.50d.txt')
except FileNotFoundError:
    print("glove.6B.50d.txt not found, using random demo embeddings")
    glove = create_demo_embeddings()

# Neural network class (same as above)
class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        self.W1 = np.random.randn(input_size, hidden_size) * 0.01
        self.b1 = np.zeros((1, hidden_size))
        self.W2 = np.random.randn(hidden_size, hidden_size) * 0.01
        self.b2 = np.zeros((1, hidden_size))
        self.W3 = np.random.randn(hidden_size, output_size) * 0.01
        self.b3 = np.zeros((1, output_size))

    def relu(self, x):
        return np.maximum(0, x)

    def relu_derivative(self, x):
        return (x > 0).astype(float)

    def softmax(self, x):
        exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
        return exp_x / np.sum(exp_x, axis=1, keepdims=True)

    def forward(self, X):
        self.z1 = np.dot(X, self.W1) + self.b1
        self.a1 = self.relu(self.z1)
        self.z2 = np.dot(self.a1, self.W2) + self.b2
        self.a2 = self.relu(self.z2)
        self.z3 = np.dot(self.a2, self.W3) + self.b3
        self.output = self.softmax(self.z3)
        return self.output

    def backward(self, X, y, output, learning_rate=0.01):
        m = X.shape[0]
        dZ3 = output - y
        dW3 = np.dot(self.a2.T, dZ3) / m
        db3 = np.sum(dZ3, axis=0, keepdims=True) / m
        dA2 = np.dot(dZ3, self.W3.T)
        dZ2 = dA2 * self.relu_derivative(self.z2)
        dW2 = np.dot(self.a1.T, dZ2) / m
        db2 = np.sum(dZ2, axis=0, keepdims=True) / m
        dA1 = np.dot(dZ2, self.W2.T)
        dZ1 = dA1 * self.relu_derivative(self.z1)
        dW1 = np.dot(X.T, dZ1) / m
        db1 = np.sum(dZ1, axis=0, keepdims=True) / m
        self.W3 -= learning_rate * dW3
        self.b3 -= learning_rate * db3
        self.W2 -= learning_rate * dW2
        self.b2 -= learning_rate * db2
        self.W1 -= learning_rate * dW1
        self.b1 -= learning_rate * db1

def sentence_to_vector(sentence, embeddings, vector_size=50):
    words = re.findall(r'\b[a-z]+\b', sentence.lower())
    valid_vectors = [embeddings[w] for w in words if w in embeddings]
    if not valid_vectors:
        return np.zeros(vector_size)
    return np.mean(valid_vectors, axis=0)

intents_data = {
    'greeting': {
        'sentences': ['hello', 'hi there', 'good morning', 'hey', 'greetings'],
        'response': 'Hello! I can help you with weather information.'
    },
    'weather_current': {
        'sentences': ['what is the weather like', 'how is the weather today', 'current conditions', 'is it raining', 'weather now'],
        'response': 'Let me check the current weather for your location.'
    },
    'weather_forecast': {
        'sentences': ['what will the weather be tomorrow', 'weekend forecast', 'next week weather', 'will it rain on Friday', 'future weather'],
        'response': 'Here is the forecast for the requested day.'
    },
    'temperature': {
        'sentences': ['how hot is it', 'what is the temperature', 'is it cold outside', 'feels like temperature', 'temp'],
        'response': 'I will get the current temperature for you.'
    },
    'wind': {
        'sentences': ['is it windy', 'wind speed', 'what direction is the wind blowing', 'gusts', 'windy conditions'],
        'response': 'Let me check the wind conditions.'
    },
    'humidity': {
        'sentences': ['how humid is it', 'humidity level', 'feels muggy', 'dew point', 'humid'],
        'response': 'I will retrieve the humidity information.'
    },
    'location': {
        'sentences': ['set my location', 'change city', 'use my current location', 'I am in London', 'update location'],
        'response': 'I have updated your location.'
    },
    'thanks': {
        'sentences': ['thank you', 'thanks a lot', 'appreciate it', 'great', 'awesome'],
        'response': 'You are welcome! Ask me about weather anytime.'
    },
    'goodbye': {
        'sentences': ['bye', 'see you later', 'goodbye', 'exit', 'end chat'],
        'response': 'Goodbye! Check back for weather updates.'
    }
}

# Build the training matrices and intent lookup tables
X_train = []
y_train = []
intent_to_index = {}
for idx, intent in enumerate(intents_data.keys()):
    intent_to_index[intent] = idx
    for sentence in intents_data[intent]['sentences']:
        X_train.append(sentence_to_vector(sentence, glove))
        target = np.zeros(len(intents_data))
        target[idx] = 1
        y_train.append(target)
X_train = np.array(X_train)
y_train = np.array(y_train)
index_to_intent = {v: k for k, v in intent_to_index.items()}

network = NeuralNetwork(input_size=50, hidden_size=128, output_size=9)
print("Training chatbot...")
for epoch in range(800):
    output = network.forward(X_train)
    network.backward(X_train, y_train, output, learning_rate=0.008)
    if epoch % 200 == 0:
        predictions = np.argmax(output, axis=1)
        actual = np.argmax(y_train, axis=1)
        accuracy = np.mean(predictions == actual)
        print(f"Epoch {epoch}, Accuracy: {accuracy:.3f}")

def predict_intent(sentence, network, embeddings, intent_to_index, index_to_intent, threshold=0.65):
    vector = sentence_to_vector(sentence, embeddings).reshape(1, -1)
    probs = network.forward(vector)[0]
    max_prob = np.max(probs)
    if max_prob < threshold:
        return None, max_prob
    return index_to_intent[np.argmax(probs)], max_prob

print("\n" + "=" * 50)
print("Weather Assistant Ready")
print("=" * 50)
while True:
    user_input = input("\nYou: ").strip()
    if user_input.lower() in ['quit', 'exit', 'bye']:
        print("Bot: Goodbye! Stay updated on the weather.")
        break
    intent, confidence = predict_intent(user_input, network, glove, intent_to_index, index_to_intent)
    if intent:
        print(f"Bot: {intents_data[intent]['response']}")
        print(f"     (I understood this as: {intent}, {confidence:.0%} confident)")
    else:
        print(f"Bot: I am not certain what you mean ({confidence:.0%} confidence).")
        print("Bot: Try asking about weather, temperature, wind, or humidity.")
```
How the Network Learns
Let me explain what happens inside the network during training.
When the user types “how hot is it”, the sentence becomes a 50-dimensional vector. This vector contains information about the words present. The word “hot” has an embedding that is similar to “warm” and “temperature” but different from “rain” or “wind”.
The first hidden layer multiplies this vector by weights, adds a bias, and applies ReLU. This transforms the input into a 128-dimensional representation. The network learns which combinations of input features are important for each intent.
The second hidden layer further transforms this representation. The output layer computes nine scores. Softmax turns these scores into probabilities.
During backpropagation, the network calculates how much each weight contributed to the error. It adjusts weights slightly to reduce error next time. Over many epochs, the weights converge to values that correctly classify the training examples.
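Concretely, every parameter follows the same gradient descent update:

$$w \leftarrow w - \eta \, \frac{\partial L}{\partial w}$$

where $\eta$ is the learning rate (0.01 in the training loop above) and $\partial L / \partial w$ is the gradient computed by the backward method.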
The network does not memorize. It generalizes. After training, it can classify “how warm is it outside” even if that exact phrase was not in the training data, because it learned that “warm” is similar to “hot” and “outside” implies current conditions.
Limitations and Improvements
This chatbot has several limitations. First, it does not remember conversation context. A “thank you” following a weather query is handled exactly like a standalone “thank you”. A better system would use a recurrent neural network or transformer to track conversation state.
Second, the word averaging approach ignores word order. “Is it raining” and “it is raining” become identical vectors. This is a known limitation of bag-of-words models. Using a sequence model like LSTM or BERT would preserve order information.
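You can verify this word-order blindness directly with a quick check (any embeddings loaded will do):

```python
a = sentence_to_vector("is it raining", glove)
b = sentence_to_vector("it is raining", glove)
print(np.allclose(a, b))  # True: averaging discards word order entirely
```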
Third, the training dataset is tiny. With more examples per intent, accuracy improves dramatically. You would also want to include negative examples – sentences that do not belong to any intent – to tune the confidence threshold reliably.
Fourth, the bot does not actually retrieve weather data. That part is a placeholder. A production system would call a weather API with the user’s location.
Fifth, the random embeddings in the demo version are not meaningful. Real GloVe embeddings have semantic structure. Words like “rain”, “storm”, “shower” cluster together. Words like “cold”, “freezing”, “chilly” cluster separately. This structure is what enables generalization.
Testing the Chatbot
Here is an example conversation with the trained bot:
You: hello
Bot: Hello! I can help you with weather information.
(I understood this as: greeting, 98% confident)
You: is it going to rain tomorrow
Bot: Here is the forecast for the requested day.
(I understood this as: weather_forecast, 89% confident)
You: how windy is it
Bot: Let me check the wind conditions.
(I understood this as: wind, 92% confident)
You: thank you so much
Bot: You are welcome! Ask me about weather anytime.
(I understood this as: thanks, 87% confident)
You: my neighbor has a blue car
Bot: I am not certain what you mean (34% confidence).
Bot: Try asking about weather, temperature, wind, or humidity.
The last response shows the confidence threshold working correctly. The network assigned low probability to all known intents because the sentence contained no weather-related words.
Conclusion
You have built a complete AI chatbot. It uses word embeddings to convert text to numbers, a neural network to classify intents, and a confidence threshold to handle unknown inputs. The entire system runs in pure Python and NumPy without external machine learning frameworks.
This is not just a toy. The same principles power production chatbots at companies large and small. The differences are scale – more layers, more data, more sophisticated architectures – but the core idea remains: represent language numerically, learn patterns from examples, and generalize to new inputs.
Take this code, modify it, break it, fix it. Add new intents. Experiment with different network sizes. Feed it more data. The best way to understand AI is to build it yourself. Now you have.