The brain is made up of neurons¶

Specialized cells that carry signals and power all brain activity¶

The human brain is a complex, living network built from tiny, specialized cells known as neurons. These cells form the core units of the nervous system and are responsible for carrying information in the form of electrical and chemical signals. Almost everything the brain does, including seeing, hearing, thinking, moving, and feeling, relies on how these neurons communicate with one another.

Each neuron has a unique structure that supports its role in signal transmission. It typically includes three major parts. The dendrites are short, branch-like extensions that receive incoming messages from other neurons. The cell body, or soma, contains the nucleus and processes these messages. The axon is a long, fiber-like structure that carries outgoing electrical signals to other neurons, muscles, or glands.

Neurons communicate quickly and precisely through a two-step process.
First, when a neuron is activated, it generates an electrical pulse called an action potential, which travels along the axon toward its end. Second, at the axon terminals, this electrical signal triggers the release of neurotransmitters, chemicals that cross a small gap called the synapse. These neurotransmitters carry the message to the next neuron by binding to receptors on its dendrites.


This continuous exchange of electrical and chemical signals forms the basis for thought, sensation, learning, and memory. With an estimated 86 billion neurons in the brain, and each neuron forming up to 10,000 connections, the brain is capable of creating trillions of dynamic pathways.

These neurons do not work in isolation, but as part of interconnected systems.
Their collective behavior allows the brain to solve problems, adapt to change, and coordinate both simple and complex tasks. The ability of neurons to form, modify, and remove connections based on activity is called neuroplasticity, and it is essential for development, learning, and recovery.

This biological foundation — where intelligent behavior emerges from adaptive signal-processing units — directly influenced the design of Artificial Neural Networks in machine learning, where artificial neurons mimic these communication and learning principles in a digital form.

The brain learns by strengthening connections between neurons¶

This ability to adapt is known as synaptic plasticity¶

Learning in the human brain occurs when the connections between neurons become stronger through repeated use. These connections, called synapses, are tiny gaps where signals are passed from one neuron to another using chemical messengers. Each time a specific pathway is activated, such as while practicing a skill or recalling information, the brain slightly adjusts the strength of that synapse. With enough repetition, the connection becomes more efficient and stable.

This ability of the brain to modify its own wiring is known as synaptic plasticity, and it is a key mechanism behind how we learn, adapt, and remember. When two neurons are frequently active at the same time, the synapse between them becomes more responsive, allowing signals to travel more quickly and reliably. This principle is often summarized by the phrase, "neurons that fire together, wire together".
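The "fire together, wire together" rule can be sketched numerically. The following is a toy Hebbian update, where a weight grows whenever pre- and post-synaptic activity coincide; the learning rate and activity values are illustrative, not biological constants:

```python
# Toy Hebbian update: the weight grows when pre- and post-synaptic
# activity coincide (illustrative values, not biological constants).
def hebbian_update(w, pre, post, lr=0.1):
    return w + lr * pre * post

w = 0.5
# Repeated co-activation (both neurons firing) strengthens the synapse.
for _ in range(10):
    w = hebbian_update(w, pre=1.0, post=1.0)
print(round(w, 2))  # 1.5

# Without co-activation, this rule leaves the weight unchanged.
print(hebbian_update(0.5, pre=1.0, post=0.0))  # 0.5
```

After ten co-activations the connection strength has tripled, while the unused connection stays where it started, which is the basic intuition behind synaptic strengthening through repetition.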


Synaptic plasticity is not one-directional. Connections can become either stronger or weaker depending on how often they are used. If a pathway is used frequently, the brain strengthens it by increasing receptor sensitivity, releasing more neurotransmitters, or creating new branches between the neurons. If a connection is rarely used, the brain may reduce its efficiency or eliminate it altogether, through a process known as synaptic pruning.

This continuous adjustment allows the brain to remain flexible and capable of change throughout life. In childhood, synaptic plasticity supports rapid learning and development. In adulthood, it plays a role in skill improvement, behavior change, and recovery from injury by reassigning functions to other regions.

Every experience reshapes the brain’s wiring.
Studying for a test, repeating a physical movement, or practicing an instrument strengthens the neural circuits involved. As connections are reinforced, tasks that were once difficult become easier and more automatic, not because the body changes, but because the brain has optimized its internal pathways.

This biological concept of strengthening useful connections through experience directly influenced how Artificial Neural Networks learn. In ANNs, weights between artificial neurons are adjusted in response to repeated data exposure, mimicking the brain’s way of learning through repetition.

Neurons form networks that process information¶

Groups of connected neurons work together to handle thinking, learning, and action¶

A single neuron is limited in what it can do on its own. However, when many neurons connect and work together, they form powerful systems known as neural networks. These networks are responsible for almost everything the brain does, including perception, movement, memory, emotion, and reasoning. Signals pass from one neuron to the next, traveling along complex paths that form when neurons link together through synapses.


Each time we see, hear, move, or learn, different patterns of neurons become active. When these patterns repeat, the connections between them grow stronger, allowing the brain to form stable circuits. These circuits become part of larger functional networks, where groups of neurons specialize in processing specific types of information. For instance, the visual network interprets shapes, colors, and motion from the eyes; the auditory network decodes sound; and the motor network controls movement.

These networks often operate in parallel. When you are walking down the street, one network processes what you see, another guides your balance and coordination, and another helps you remember where you are going. The brain integrates all of this information simultaneously to produce smooth and intelligent behavior.

Neural networks are not fixed systems; they are constantly changing and adapting.
Connections are formed, strengthened, weakened, or removed depending on how they are used. This adaptability, known as neuroplasticity, allows the brain to respond to new experiences, recover from damage, and refine its skills through practice.

The brain's networks are organized efficiently, but they are also flexible. If one part of a network is damaged, nearby areas can sometimes take over its function. This is especially true in younger brains, which show high levels of plasticity and can rewire themselves rapidly.

The idea of using groups of connected units that process information together is what inspired the architecture of Artificial Neural Networks. In ANNs, artificial neurons are organized into layers and connected by weighted links. These artificial networks, like the biological ones they are modeled after, can learn to recognize patterns, transform inputs into meaningful outputs, and improve through training.

Connected neurons work together to perform complex mental and physical tasks¶

In the human brain, neurons rarely work alone. They form large, organized groups called neural networks, where thousands or millions of neurons are interconnected through synapses. These connections allow signals to travel from one neuron to the next, creating information pathways. The brain uses these networks to process inputs, generate thoughts, make decisions, store memories, and control body functions — often all at once.

Each time we perform an action or experience something new, specific neurons become active. When this pattern of activity is repeated, the connections between those neurons become stronger and more reliable. As more neurons join in and fire together regularly, they form a stable circuit. These circuits eventually develop into specialized networks, where different regions of the brain work together to perform specific tasks. For example, the visual cortex in the back of the brain handles visual input, the motor cortex controls movement, and the prefrontal cortex helps with decision-making and planning.

What makes neural networks powerful is their ability to work in parallel. Multiple networks can process different pieces of information at the same time. For instance, when you drive a car, one network tracks the road, another monitors your speed, a third processes the sound of traffic, and yet another helps you remember where to turn. The brain coordinates these parallel tasks smoothly and quickly, without conscious effort.

Neural networks are also plastic, meaning they can change based on experience. If a pathway is used frequently, it becomes stronger and faster, like a well-worn trail in a forest. If it is ignored, the connection fades or disappears. This ability to grow, prune, and reroute connections is known as neuroplasticity, and it’s central to how we learn new skills, adapt to change, and recover from injury.

These biological networks are not fixed circuits. They constantly reshape themselves in response to learning, attention, emotion, and environment. The structure of a person's neural networks is always evolving, becoming more refined as we gain experience.

The brain processes input and generates output¶

Perception, decision-making, and control — the core of intelligent behavior¶

The human brain functions as a highly dynamic information system. It continuously performs a cycle of input, processing, and output, enabling us to sense the world, understand it, and respond appropriately. This cycle happens in real time, without conscious effort, and is powered by billions of interconnected neurons.

Input begins with the senses, as organs such as the eyes, ears, skin, and tongue detect signals from the environment — including light, sound, texture, temperature, and chemicals — and convert them into electrical impulses. These impulses are transmitted to the brain through specialized sensory neurons.

Inside the brain, this incoming data is routed to specialized processing regions, each designed to handle a specific type of input.
For example, visual information is directed to the visual cortex, sound is handled in the auditory cortex, and touch-related input is processed in the somatosensory cortex.

These regions do more than just receive signals. They perform complex interpretation — recognizing shapes, sounds, movement, language, and even emotion — by mapping patterns of activation across networks of neurons.

Once input is processed, the brain must decide how to respond.
This decision-making happens in higher-level areas such as the prefrontal cortex, where the brain evaluates current input, retrieves relevant memories, applies learned knowledge, and considers context. Whether it’s pulling your hand away from heat or forming a spoken sentence, the brain compares possibilities and selects an appropriate course of action.

The output is sent to the body through motor pathways, using motor neurons. These neurons activate muscles that control movement, facial expression, speech, or internal bodily functions. In many cases, multiple outputs are coordinated — for example, walking while maintaining balance and scanning the surroundings.

This entire perception-to-action flow happens in milliseconds. Behind the scenes, the brain is running multiple overlapping processes — each governed by a network of neurons — to ensure smooth, real-time behavior.

Artificial Neural Networks (ANNs) attempt to replicate this same model. They receive structured inputs, pass them through interconnected layers of processing units, and produce final outputs, such as classifications, decisions, or predictions. Like the brain, their strength lies in how well the internal layers learn to represent and respond to the input data.

The brain improves with practice¶

Repetition strengthens neural pathways and makes actions easier over time¶

The human brain becomes better at a task when it is repeated. This happens because repetition strengthens the neural pathways responsible for performing that task. Each time a group of neurons is activated together, the connections between them become more efficient. With enough repetition, signals can travel faster and more reliably, making the task feel easier and more automatic.

This process is a direct result of synaptic plasticity, which allows the brain to change based on experience. When a skill is practiced, whether it is solving math problems, typing on a keyboard, or playing a musical instrument, the brain strengthens the specific connections involved. These changes occur at the synapses, where chemical signals are passed from one neuron to another. Repeated activation causes these synapses to become more responsive, reducing the effort needed to trigger them again.

With continued practice, the brain begins to optimize the pathway.
Unused connections are weakened or removed, while useful connections are reinforced. Over time, the pathway becomes so strong and efficient that the task can be performed with little conscious thought. This is why we can ride a bicycle, speak a second language, or recall a phone number without relearning it from scratch.

This principle applies to both mental and physical activities. A person can become better at concentrating, reading, drawing, or playing chess through focused repetition. Similarly, physical actions such as throwing a ball, playing a piano scale, or writing with speed and accuracy can all improve because the brain has optimized the control circuits involved.

Practice does not just improve performance; it also helps protect the skill from being forgotten. A well-established neural pathway is more likely to remain active, even if it is not used for a period of time. However, like any adaptable system, the brain will gradually weaken connections that are not used regularly.

This principle, where learning occurs through repeated activation of specific pathways, is also central to how Artificial Neural Networks are trained. In ANNs, each time the model sees data, it adjusts the connection weights slightly. Repeated exposure to similar examples helps the network reinforce useful pathways and ignore those that do not improve accuracy — mimicking how repetition sharpens skills in the biological brain.

Artificial Neural Networks are inspired by the brain¶

They are built using artificial neurons that process and learn from data¶

Artificial Neural Networks, or ANNs, are machine learning systems designed to mimic how the human brain learns and processes information. In the biological brain, intelligence comes from networks of neurons that communicate through signals and strengthen their connections over time. Inspired by this, scientists created artificial systems made of simple computational units called artificial neurons, also referred to as nodes.

An artificial neuron does not function like a biological cell, but it imitates the key idea. It takes in one or more numerical inputs, multiplies each input by a weight, adds a bias, and passes the result through a mathematical activation function. This function determines how strongly the neuron responds, allowing the network to capture non-linear patterns in the data.
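The computation described above can be sketched in a few lines of numpy. The weights, bias, and inputs below are illustrative values chosen for the example, and ReLU stands in for the activation function:

```python
import numpy as np

def relu(z):
    # ReLU activation: passes positive values through, clips negatives to 0.
    return np.maximum(0.0, z)

def neuron(x, w, b):
    # Weighted sum of inputs plus bias, passed through the activation.
    return relu(np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])   # example inputs (illustrative)
w = np.array([0.8, 0.2, -0.5])   # one weight per input
b = 0.1                          # bias term

print(neuron(x, w, b))  # 0.0 — the weighted sum is -0.7, so ReLU clips it
```

Changing any weight changes how strongly the corresponding input influences the result, which is exactly the quantity that training adjusts.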

The output from one artificial neuron becomes input to others. As signals pass through many interconnected neurons, the network is able to transform raw inputs into meaningful outputs, such as predictions or classifications. These connections between neurons are not fixed; they are adjusted during training based on how accurate the network’s output is.

Each artificial neuron plays a small role, but together they form a powerful system.
Just like in the brain, the strength of connections between artificial neurons changes over time, allowing the network to improve. Through this structure, ANNs learn to recognize patterns, generalize from examples, and solve problems across many fields, including computer vision, natural language processing, and decision-making.

While artificial neurons are much simpler than biological ones, the idea of learning by adjusting connections is the same. This foundational principle makes ANNs one of the most effective tools in modern machine learning.

Artificial neurons are arranged into layers¶

Each layer transforms the data and passes it forward through the network¶

In an Artificial Neural Network, artificial neurons are not used in isolation. They are arranged into structured groups called layers. Each layer is made up of multiple neurons working in parallel, and each layer has a specific role in transforming the data as it flows through the network.

The first layer is called the input layer. It receives raw data, such as numerical values, pixel intensities, or text-based features. Each neuron in the input layer corresponds to a specific feature in the dataset and simply passes that value into the network.

After the input layer, there are one or more hidden layers. These layers are responsible for processing the input, identifying patterns, and building internal representations. The neurons in hidden layers perform weighted calculations and apply activation functions, which allow the network to model complex, non-linear relationships between inputs and outputs.

At the end of the network is the output layer. This layer produces the final prediction, classification, or score, depending on the task. For example, in a binary classification problem, the output layer might contain a single neuron that returns a value between 0 and 1, representing probability.


Data moves through these layers in a forward direction, one step at a time.
As each layer transforms the data, it passes the result to the next layer. This process continues until the output is generated. The entire network, from input to output, works together to learn patterns from data by gradually adjusting internal parameters during training.

The layered structure of ANNs, along with their ability to learn from experience, makes them highly flexible. By stacking multiple hidden layers, networks can capture deeper and more abstract relationships in data, which is why this structure is also called a deep neural network when many layers are used.
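The layer-by-layer flow can be sketched as repeated matrix multiplications. The layer sizes and random weights below are illustrative, chosen only to show the shape of the computation:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

# Illustrative layer sizes: 4 input features -> 3 hidden units -> 2 outputs.
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)
W2, b2 = rng.normal(size=(2, 3)), np.zeros(2)

x = np.array([0.2, 0.7, -0.1, 0.5])  # one input sample

h = relu(W1 @ x + b1)    # hidden layer transforms the input
out = W2 @ h + b2        # output layer produces the final scores
print(out.shape)         # (2,)
```

Each layer's output becomes the next layer's input; stacking more hidden layers simply repeats the middle step, which is what "deep" refers to.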

Each connection between neurons has a weight¶

Weights control how strongly one neuron influences another¶

In an Artificial Neural Network, neurons are connected to one another through weighted links. Each connection between two neurons carries a numeric value called a weight, which determines how much influence the signal from one neuron has on the neuron it connects to.

When a neuron receives inputs from multiple other neurons, each input is multiplied by its corresponding weight. These weighted inputs are then summed, along with a bias term, and passed through an activation function to produce the output of the neuron. The weight plays a central role in this computation — a higher weight increases the impact of that particular input, while a lower or negative weight reduces it.


Learning in ANNs depends on updating these weights.
Initially, weights are assigned randomly or with small values. During training, the network compares its predictions to the actual results and adjusts the weights to reduce the difference between them. This adjustment process allows the network to learn the correct patterns, associations, or mappings from inputs to outputs.

The concept of weighted connections is inspired by the brain’s synaptic strength. In biological systems, some synapses pass signals more effectively than others, depending on experience and prior activity. Similarly, in ANNs, stronger weights help the network recognize important features, while weaker weights suppress irrelevant or less useful signals.

Through careful tuning of weights, an ANN can model highly complex and abstract relationships. Every prediction the network makes is the result of thousands or even millions of weighted contributions, working together across layers to generate meaningful output.

Artificial Neural Networks learn by adjusting their weights¶

Training improves accuracy by minimizing the difference between prediction and reality¶

Artificial Neural Networks improve their performance through a process called training, in which the weights of the connections between neurons are continuously adjusted based on experience. At the core of this learning process is feedback: the network makes a prediction, compares it to the correct answer, and uses the result of that comparison to improve.

When data is passed through the network, each neuron performs its computation, and the output is generated. The network then calculates a loss, or error, which measures how far the prediction is from the true value. A smaller loss indicates a more accurate result, while a larger loss means the network needs to improve.

To reduce this error, the network uses a technique known as backpropagation. In this method, the error is propagated backward from the output layer to the earlier layers. As the error signal moves through the network, it tells each neuron how much it contributed to the mistake. Based on this information, the weights are adjusted using an algorithm such as gradient descent, which fine-tunes each weight slightly to improve performance.
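Gradient descent is easiest to see on a model with a single weight. The sketch below fits y = w · x to one made-up training example with squared-error loss; real backpropagation applies the same chain-rule step to every weight in every layer:

```python
# Minimal gradient-descent sketch on a one-weight model y_hat = w * x.
# The data point and learning rate are illustrative.
x, y = 2.0, 6.0          # one training example (the ideal weight is 3)
w, lr = 0.0, 0.05        # start from w = 0

for _ in range(100):
    y_hat = w * x
    grad = 2 * (y_hat - y) * x   # d(loss)/dw for squared error (y_hat - y)**2
    w -= lr * grad               # step against the gradient

print(round(w, 3))  # 3.0
```

Each update nudges the weight in the direction that reduces the loss, and after enough repetitions the weight settles at the value that makes the prediction match the target.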


This process is repeated across many rounds of training, using batches of data.
With each pass, known as an epoch, the network becomes better at minimizing the loss. The goal is not to memorize the data but to generalize: to learn patterns that apply to new, unseen examples. Over time, the network adjusts its internal weights in a way that produces more accurate and reliable outputs.

This weight-adjustment mechanism is what makes ANNs capable of learning from data, just as the brain modifies its synaptic strengths through experience. As the training continues, the model moves closer to solving the problem it was designed to address.

Artificial Neural Networks can perform a wide range of tasks¶

They learn from data to recognize patterns and make predictions¶

Once trained, Artificial Neural Networks become powerful tools for solving real-world problems. They can be used to recognize images, such as identifying objects in a photo, or predict outcomes, like forecasting stock prices or medical diagnoses. ANNs are also widely used to understand language, enabling applications such as translation, sentiment analysis, and chatbots.

Their versatility comes from learning directly from data, without needing to be explicitly programmed for each task.

In [1]:
# Step 1: Import required libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.utils import to_categorical

# Step 2: Load the digits dataset
digits = load_digits()
X = digits.data
y = digits.target  # Labels: digits from 0 to 9

# Step 3: Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Step 4: Standardize the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Step 5: One-hot encode the labels
y_train_cat = to_categorical(y_train, num_classes=10)
y_test_cat = to_categorical(y_test, num_classes=10)
In [2]:
# Step 6: Build the ANN model using an explicit Input layer
model = Sequential()
model.add(Input(shape=(X.shape[1],)))      # Explicit input layer
model.add(Dense(64, activation='relu'))    # Hidden layer 1
model.add(Dense(32, activation='relu'))    # Hidden layer 2
model.add(Dense(10, activation='softmax')) # Output layer (10 classes)

# Step 7: Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Step 8: Train the model
history = model.fit(X_train_scaled, y_train_cat,
                    epochs=30,
                    batch_size=32,
                    validation_split=0.1,
                    verbose=1)
Epoch 1/30
41/41 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.2119 - loss: 2.2803 - val_accuracy: 0.5625 - val_loss: 1.6112
Epoch 2/30
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.7254 - loss: 1.2771 - val_accuracy: 0.8403 - val_loss: 0.7629
Epoch 3/30
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.8877 - loss: 0.6027 - val_accuracy: 0.9167 - val_loss: 0.4266
Epoch 4/30
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9345 - loss: 0.3241 - val_accuracy: 0.9306 - val_loss: 0.3300
Epoch 5/30
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9694 - loss: 0.1955 - val_accuracy: 0.9306 - val_loss: 0.2719
Epoch 6/30
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9769 - loss: 0.1542 - val_accuracy: 0.9375 - val_loss: 0.2419
Epoch 7/30
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9850 - loss: 0.1129 - val_accuracy: 0.9375 - val_loss: 0.2290
Epoch 8/30
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9905 - loss: 0.0839 - val_accuracy: 0.9375 - val_loss: 0.2186
Epoch 9/30
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9889 - loss: 0.0803 - val_accuracy: 0.9375 - val_loss: 0.1977
Epoch 10/30
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9941 - loss: 0.0580 - val_accuracy: 0.9444 - val_loss: 0.1813
Epoch 11/30
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9935 - loss: 0.0518 - val_accuracy: 0.9444 - val_loss: 0.1763
Epoch 12/30
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9974 - loss: 0.0372 - val_accuracy: 0.9444 - val_loss: 0.1781
Epoch 13/30
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9995 - loss: 0.0255 - val_accuracy: 0.9444 - val_loss: 0.1667
Epoch 14/30
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9980 - loss: 0.0296 - val_accuracy: 0.9444 - val_loss: 0.1669
Epoch 15/30
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9963 - loss: 0.0269 - val_accuracy: 0.9444 - val_loss: 0.1670
Epoch 16/30
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9970 - loss: 0.0214 - val_accuracy: 0.9444 - val_loss: 0.1619
Epoch 17/30
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9963 - loss: 0.0196 - val_accuracy: 0.9444 - val_loss: 0.1631
Epoch 18/30
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9936 - loss: 0.0215 - val_accuracy: 0.9444 - val_loss: 0.1626
Epoch 19/30
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9987 - loss: 0.0133 - val_accuracy: 0.9444 - val_loss: 0.1619
Epoch 20/30
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9979 - loss: 0.0161 - val_accuracy: 0.9444 - val_loss: 0.1576
Epoch 21/30
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 1.0000 - loss: 0.0114 - val_accuracy: 0.9444 - val_loss: 0.1606
Epoch 22/30
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 1.0000 - loss: 0.0096 - val_accuracy: 0.9444 - val_loss: 0.1587
Epoch 23/30
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 1.0000 - loss: 0.0087 - val_accuracy: 0.9444 - val_loss: 0.1618
Epoch 24/30
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 1.0000 - loss: 0.0071 - val_accuracy: 0.9444 - val_loss: 0.1581
Epoch 25/30
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 1.0000 - loss: 0.0065 - val_accuracy: 0.9444 - val_loss: 0.1589
Epoch 26/30
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 1.0000 - loss: 0.0057 - val_accuracy: 0.9444 - val_loss: 0.1597
Epoch 27/30
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 1.0000 - loss: 0.0058 - val_accuracy: 0.9444 - val_loss: 0.1604
Epoch 28/30
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 1.0000 - loss: 0.0052 - val_accuracy: 0.9444 - val_loss: 0.1616
Epoch 29/30
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 1.0000 - loss: 0.0052 - val_accuracy: 0.9444 - val_loss: 0.1594
Epoch 30/30
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 1.0000 - loss: 0.0053 - val_accuracy: 0.9444 - val_loss: 0.1592

Step 6: Build the ANN model using Input and Dense layers¶

This step defines the structure of the neural network¶

We begin by using the Sequential() model, which means layers are stacked one after another in a fixed order. This is the most beginner-friendly way to create a feedforward neural network.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input

model = Sequential()
  • This line creates an empty container to which we will add layers step-by-step.
model.add(Input(shape=(X.shape[1],)))
  • This line specifies the shape of the input data. X.shape[1] gives the number of input features (columns) in the dataset.
  • The Input layer does not perform any computation; it just defines the shape of data entering the network.
model.add(Dense(64, activation='relu'))
  • This is the first hidden layer.
  • Dense(64) means it contains 64 neurons, each fully connected to all inputs.
  • activation='relu' applies the ReLU function: it outputs the input directly if it's positive; otherwise, it returns 0. ReLU helps the model learn non-linear patterns without causing issues like vanishing gradients.
model.add(Dense(32, activation='relu'))
  • This is the second hidden layer, similar to the first one, but with 32 neurons.
  • It takes the output from the first hidden layer as input and applies the same ReLU activation.
model.add(Dense(10, activation='softmax'))
  • This is the output layer.
  • It has 10 neurons, one for each digit class (0 through 9).
  • activation='softmax' converts the output into probabilities that sum to 1, helping the model decide the most likely class.
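The softmax conversion can be computed by hand to see what the output layer produces. The logits below are made-up raw scores:

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; the result sums to 1.
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])  # illustrative raw outputs from the last layer
p = softmax(logits)

print(p.round(3))         # class probabilities
print(round(p.sum(), 6))  # 1.0
print(int(np.argmax(p)))  # 0 — the most likely class
```

The largest logit always receives the largest probability, so taking the argmax of the softmax output gives the predicted class.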

Other options for Dense:

  • use_bias: Whether to use a bias term (default: True)
  • kernel_initializer: Method to initialize the weights, such as 'he_normal', 'glorot_uniform'
  • bias_initializer: How to initialize biases, such as 'zeros'
  • kernel_regularizer: Use L1 or L2 penalties to prevent overfitting

Step 7: Compile the model¶

This step sets up how the model will learn¶

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
  • optimizer='adam' selects the Adam optimizer, which automatically adjusts learning rates and combines momentum and adaptive learning. It's efficient and works well for most problems.
  • loss='categorical_crossentropy' is used when the output labels are one-hot encoded. It measures the difference between the predicted and actual probabilities.
  • metrics=['accuracy'] tells Keras to monitor the percentage of correct predictions during training and testing.

Other options:

  • You could use 'sgd', 'rmsprop', 'adagrad', or custom settings like Adam(learning_rate=0.001)
  • Loss options include 'sparse_categorical_crossentropy' for integer labels, 'binary_crossentropy' for binary classification, and 'mse' for regression
  • You can add metrics like 'precision', 'recall', or 'AUC'

Step 8: Train the model¶

This step runs the training process¶

history = model.fit(X_train_scaled, y_train_cat,
                    epochs=30,
                    batch_size=32,
                    validation_split=0.1,
                    verbose=1)
  • X_train_scaled: This is your input training data, already standardized so that features have mean 0 and standard deviation 1.
  • y_train_cat: This is your training target (label) data, converted to one-hot encoding (e.g., 7 becomes [0,0,0,0,0,0,0,1,0,0]).
  • epochs=30: The model will go through the full training data 30 times.
  • batch_size=32: The data is divided into batches of 32 samples. Weights are updated after each batch.
  • validation_split=0.1: Reserves 10% of training data for validation. This helps track how well the model performs on unseen data.
  • verbose=1: Shows a progress bar during training.
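The one-hot encoding that `to_categorical` performs can be reproduced with plain numpy. `one_hot` here is a hypothetical helper written for illustration:

```python
import numpy as np

def one_hot(labels, num_classes):
    # Each integer label becomes a row vector with a 1 at that index.
    out = np.zeros((len(labels), num_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

# The label 7 becomes a 10-element vector with a 1 in position 7.
print(one_hot([7], 10))  # [[0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]]
```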

Optional arguments:

  • callbacks: You can add dynamic training behavior, like stopping early if performance stops improving (EarlyStopping), or saving the best model (ModelCheckpoint)
  • shuffle: Set False if you don’t want data to be randomly shuffled each epoch (usually best to keep it True)
  • initial_epoch: Start training from a specific epoch (useful when continuing from a previously saved model)
  • class_weight and sample_weight: Useful for handling imbalanced classes or prioritizing certain samples during training
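The idea behind the EarlyStopping callback can be sketched without Keras: stop once the monitored validation loss has not improved for `patience` consecutive epochs. The loss values below are made up for illustration:

```python
def early_stop_epoch(val_losses, patience=3):
    # Return the 0-based epoch at which training would stop, or None.
    best, wait = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, wait = loss, 0   # improvement: reset the counter
        else:
            wait += 1              # no improvement this epoch
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation-loss curve: improves, then plateaus.
losses = [1.0, 0.6, 0.4, 0.41, 0.42, 0.43, 0.39]
print(early_stop_epoch(losses, patience=3))  # 5
```

Training halts at epoch 5 because the loss has failed to beat its best value (0.4) for three epochs in a row; the late improvement at epoch 6 is never reached.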
In [3]:
# Step 9: Evaluate on test set
test_loss, test_accuracy = model.evaluate(X_test_scaled, y_test_cat, verbose=0)
print(f"\nTest Accuracy: {test_accuracy:.4f}")

# Step 10: Plot training and validation accuracy
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.title('Training vs Validation Accuracy')
plt.legend()
plt.grid(True)
plt.show()
Test Accuracy: 0.9722