Week 5: Neural Networks¶

CS50's Introduction to Artificial Intelligence with Python

Biological Inspiration¶

Biological neurons connect to form networks that receive and transmit electrical signals. When input surpasses a threshold, the neuron fires and forwards its signal.

An Artificial Neural Network is a computational framework modeled on this biological system. Networks map inputs to outputs based on their structure and learned parameters — training data shapes the configuration.

The artificial counterpart of a biological neuron is a unit, which is connected to other units.

Basic hypothesis function: h(x₁, x₂) = w₀ + w₁x₁ + w₂x₂
- w₁, w₂ = weights
- w₀ = bias
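As a rough illustration, the hypothesis function can be written as a weighted sum plus a bias. The function name `hypothesis` and the sample numbers are made up for this sketch.

```python
def hypothesis(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias: w0 + w1*x1 + w2*x2 + ..."""
    return bias + sum(w * x for w, x in zip(weights, inputs))


# Illustrative values only: w0 = 1.0, w1 = 0.5, w2 = -1.0
value = hypothesis([2.0, 3.0], weights=[0.5, -1.0], bias=1.0)
print(value)  # 1.0 + 0.5*2.0 - 1.0*3.0 = -1.0
```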

Activation Functions¶

Translate hypothesis function outputs into decisions.

Function	Behavior
Step Function	Outputs 0 below the threshold and 1 once the threshold is reached
Logistic Function	Outputs a value between 0 and 1 (graduated confidence)
ReLU (Rectified Linear Unit)	Passes positive values through unchanged; negative values become 0
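The three functions in the table might be sketched in plain Python as follows. Treating an input exactly at the threshold as "on" is an assumption of this sketch; the notes do not say which side the boundary falls on.

```python
import math


def step(z, threshold=0.0):
    # Assumption: an input exactly at the threshold counts as 1.
    return 1 if z >= threshold else 0


def logistic(z):
    # Smooth curve from 0 to 1; equals 0.5 at z = 0.
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    # Keep positive values, clamp negatives to 0.
    return max(0.0, z)


for z in (-2.0, 0.0, 2.0):
    print(z, step(z), round(logistic(z), 3), relu(z))
```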

Network Structure¶

Layers:
- Input units: receive raw features
- Hidden units: intermediate processing
- Output units: produce final predictions

Each output unit: multiplies inputs by weights, adds bias, applies activation function g.

OR function example: With inputs x₁ and x₂, unit weights of 1, and bias of -1:

g(-1 + 1×x₁ + 1×x₂)

With a step function g that outputs 1 when its argument is at least 0, this unit outputs 0 only when x₁ = x₂ = 0, which matches OR.
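A quick way to check the OR unit is to evaluate it on all four input pairs. This sketch assumes g is a step function that fires when its argument is 0 or greater; the name `or_unit` is illustrative.

```python
def step(z):
    # Assumed activation: 1 when the weighted sum is at least 0, else 0.
    return 1 if z >= 0 else 0


def or_unit(x1, x2):
    # Bias -1 and both weights 1, as in the example above.
    return step(-1 + 1 * x1 + 1 * x2)


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, or_unit(x1, x2))
```

Only the input pair (0, 0) gives a weighted sum below 0, so it is the only case that outputs 0.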

Gradient Descent¶

Minimizes loss during training by adjusting weights.

1. Start with random weights
2. Compute the gradient of the loss using all data points
3. Update the weights in the direction that reduces the loss
4. Repeat steps 2 and 3

Variant	Description
Standard Gradient Descent	Uses all data points
Stochastic Gradient Descent	Uses a single randomly selected point per update
Mini-Batch Gradient Descent	Uses a small random sample per update, trading computation cost against gradient accuracy
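The three variants differ only in how many points feed each gradient estimate, which the sketch below makes explicit through a `batch_size` argument. Everything here is an assumption chosen for illustration: a one-weight linear model, squared-error loss, a fixed learning rate, and toy data on the line y = 2x + 1.

```python
import random


def gradient(w, b, batch):
    """Average gradient of the squared error (w*x + b - y)**2 over a batch."""
    n = len(batch)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in batch) / n
    return grad_w, grad_b


def train(data, batch_size, learning_rate=0.05, steps=2000, seed=0):
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # start with random weights
    for _ in range(steps):
        # batch_size == len(data): standard; 1: stochastic; in between: mini-batch
        batch = rng.sample(data, batch_size)
        grad_w, grad_b = gradient(w, b, batch)
        w -= learning_rate * grad_w  # move against the gradient
        b -= learning_rate * grad_b
    return w, b


# Toy data on the line y = 2x + 1 (made up for this example)
data = [(x / 10, 2 * (x / 10) + 1) for x in range(-10, 11)]

print("standard:  ", train(data, batch_size=len(data)))
print("stochastic:", train(data, batch_size=1))
print("mini-batch:", train(data, batch_size=4))
```

On this noise-free data all three settings should end up close to w = 2 and b = 1; with noisy data the smaller batches would give cheaper but less accurate gradient estimates.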

Multilayer Networks¶

Networks with input, output, and one or more hidden layers enable modeling non-linear data.

- Perceptrons can only learn linear decision boundaries, so they fail on data that is not linearly separable
- Hidden layers with non-linear activation functions overcome this limitation
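XOR is a standard example of data that no single linear boundary can separate. The sketch below uses hand-picked weights (not learned, chosen only for illustration) to show that one hidden layer of two step units is enough to compute it.

```python
def step(z):
    # Assumed activation: 1 when the weighted sum is at least 0, else 0.
    return 1 if z >= 0 else 0


def xor_network(x1, x2):
    # Hidden layer: two units that each draw one linear boundary.
    h_or = step(x1 + x2 - 1)    # fires if at least one input is 1
    h_and = step(x1 + x2 - 2)   # fires only if both inputs are 1
    # Output unit: "OR but not AND", which is XOR.
    return step(h_or - 2 * h_and - 1)


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_network(x1, x2))
```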

Backpropagation¶

Training algorithm for networks with hidden layers:

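1. Calculate the error at the output layer
2. Propagate the error backward one layer at a time, from the output layer toward the first hidden layer
3. Update each layer's weights using the error computed for that layer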

Networks with multiple hidden layers are called deep neural networks.
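The backpropagation steps listed in this section can be sketched for a tiny network with two inputs, two hidden units, and one output. The choices here are assumptions made for the example, not part of the notes: sigmoid activations everywhere, loss 0.5 * (output - y)², a single training example, and arbitrary starting weights.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W1, b1, W2, b2):
    """Return the hidden activations and the network output for one input x."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    output = sigmoid(sum(w * h for w, h in zip(W2, hidden)) + b2)
    return hidden, output


def backprop_step(x, y, W1, b1, W2, b2, lr=0.5):
    """One weight update on a single example."""
    hidden, output = forward(x, W1, b1, W2, b2)

    # 1. Error at the output layer (gradient w.r.t. the output's weighted sum)
    delta_out = (output - y) * output * (1 - output)

    # 2. Propagate the error back to each hidden unit through the old weights
    delta_hidden = [delta_out * W2[j] * hidden[j] * (1 - hidden[j])
                    for j in range(len(hidden))]

    # 3. Update the weights of both layers
    new_W2 = [W2[j] - lr * delta_out * hidden[j] for j in range(len(hidden))]
    new_b2 = b2 - lr * delta_out
    new_W1 = [[W1[j][i] - lr * delta_hidden[j] * x[i] for i in range(len(x))]
              for j in range(len(hidden))]
    new_b1 = [b1[j] - lr * delta_hidden[j] for j in range(len(hidden))]
    return new_W1, new_b1, new_W2, new_b2


# Arbitrary starting weights and one made-up training example
W1, b1 = [[0.1, -0.2], [0.4, 0.3]], [0.0, 0.0]
W2, b2 = [0.5, -0.5], 0.0
x, y = [1.0, 0.0], 1.0

for _ in range(3):
    _, output = forward(x, W1, b1, W2, b2)
    print("loss:", round(0.5 * (output - y) ** 2, 4))
    W1, b1, W2, b2 = backprop_step(x, y, W1, b1, W2, b2)
```

The printed loss should shrink from one step to the next, since each update moves the weights against the gradient.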

Overfitting Prevention¶

Dropout: Randomly deactivates units during training, preventing over-reliance on individual units. All units function normally after training.
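A minimal sketch of dropout on a list of activations follows. Rescaling the surviving units by 1 / (1 - rate) is an assumption of this sketch (it is how "inverted dropout" is commonly done, and it keeps the expected sum unchanged); the notes only describe the random deactivation itself.

```python
import random


def dropout(activations, rate, training, rng=random):
    """Zero each activation with probability `rate` during training only."""
    if not training:
        return list(activations)  # after training, every unit is active
    keep = 1.0 - rate             # rate must be below 1
    # Assumption: survivors are scaled up so the expected total stays the same.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


hidden = [0.2, 0.9, 0.5, 0.7]
print(dropout(hidden, rate=0.5, training=True))   # some units zeroed at random
print(dropout(hidden, rate=0.5, training=False))  # unchanged
```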

TensorFlow Example¶

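import tensorflow as tf

# n_features, X_train and y_train are placeholders for your own data:
# X_train has shape (n_samples, n_features); y_train holds 0/1 labels.

model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    # Hidden layer: 64 units with ReLU activation
    tf.keras.layers.Dense(64, activation='relu'),
    # Randomly drop half of the hidden units on each training step
    tf.keras.layers.Dropout(0.5),
    # Output layer: one sigmoid unit for binary classification
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10)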

Dense layers connect every node in one layer to all nodes in the previous layer.
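To make "every node connects to all nodes in the previous layer" concrete, here is a plain-Python sketch of a dense layer's forward pass. The function name `dense` and the numbers are illustrative and unrelated to Keras internals.

```python
def relu(z):
    return max(0.0, z)


def dense(inputs, weights, biases, activation):
    """weights[j][i] links input i to unit j, so each unit sees every input."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


# Three inputs feeding two units: 3 * 2 weights plus 2 biases
outputs = dense([1.0, 2.0, 3.0],
                weights=[[0.5, 0.0, -1.0], [1.0, 1.0, 1.0]],
                biases=[0.0, -1.0],
                activation=relu)
print(outputs)  # [0.0, 5.0]
```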

Computer Vision¶

Images are made of pixels, each represented by RGB values (0–255 per channel). Feeding raw pixels directly into a fully connected network has two problems:
1. It ignores the spatial structure of the image
2. It requires a very large number of weights

Image Convolution¶

Slides a kernel matrix across an image; each output value is the sum of the pixels under the kernel, each multiplied by the matching kernel entry. Different kernels serve different purposes: edge detection kernels highlight boundaries by amplifying differences between neighboring pixels.
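The sketch below applies a 3×3 edge-detection kernel to a small grayscale image stored as nested lists. No padding, a stride of 1, and the toy pixel values are all assumptions of this example; as is usual in deep learning, the kernel is applied without flipping.

```python
def convolve(image, kernel):
    """Slide a square kernel over a 2D image (no padding, stride 1)."""
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(k) for j in range(k))
             for c in range(cols)]
            for r in range(rows)]


# Entries sum to 0, so flat regions give 0 and boundaries stand out.
edge_kernel = [[-1, -1, -1],
               [-1,  8, -1],
               [-1, -1, -1]]

# Toy image: dark background with a bright block in the lower right
image = [[10, 10, 10, 10],
         [10, 10, 10, 10],
         [10, 10, 50, 50],
         [10, 10, 50, 50]]

print(convolve(image, edge_kernel))  # [[-40, -80], [-80, 200]]
```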

Pooling¶

Max-pooling shrinks the input by keeping only the maximum value in each region, which works because adjacent pixels tend to be similar.
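A short sketch of max-pooling over non-overlapping regions follows. The 2×2 region size and the sample values are assumptions for illustration; rows or columns that do not fill a complete region are ignored.

```python
def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size region."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [[max(feature_map[r * size + i][c * size + j]
                 for i in range(size) for j in range(size))
             for c in range(cols)]
            for r in range(rows)]


feature_map = [[1, 3, 2, 4],
               [5, 6, 7, 8],
               [3, 2, 1, 0],
               [1, 2, 3, 4]]

print(max_pool(feature_map))  # [[6, 8], [3, 4]]
```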

Flattening¶

Converts the processed feature maps into a one-dimensional list of values, the format that traditional (dense) neural network layers expect.
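For a single 2D feature map, flattening is just reading the values out row by row, as in this small sketch (the helper name `flatten` is illustrative).

```python
def flatten(feature_map):
    """Turn a 2D grid into one flat list, row by row."""
    return [value for row in feature_map for value in row]


print(flatten([[6, 8], [3, 4]]))  # [6, 8, 3, 4]
```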

Convolutional Neural Networks (CNNs)¶

Combine convolution, pooling, and dense layers for image analysis:

Extract features through learned convolution filters
Reduce dimensionality through pooling
Feed flattened results into dense layers

Together, convolution and pooling reduce sensitivity to small variations in an image, so pictures taken from slightly different angles produce comparable outputs.
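One possible way to assemble these pieces in Keras is sketched below. The input shape (28×28 grayscale), the filter count, the layer sizes, and the 10 output classes are assumptions picked for illustration, not values from these notes.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    # Assumed input: 28x28 images with a single (grayscale) channel
    tf.keras.Input(shape=(28, 28, 1)),
    # Learn 32 convolution filters of size 3x3
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
    # Reduce each feature map by taking the maximum of every 2x2 region
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    # Flatten the feature maps into one vector for the dense layers
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    # Assumed task: 10 classes, one softmax output per class
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Training would then follow the same `model.fit` pattern as the earlier example, with one-hot encoded labels to match the categorical cross-entropy loss.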

Recurrent Neural Networks (RNNs)¶

Unlike feed-forward networks, which map a fixed-size input to a fixed-size output, RNNs feed their own output back in as input for the next step.

This enables variable-length inputs and outputs, making them suitable for:
- Image captioning
- Video analysis
- Machine translation
- Sequential data processing

Because of this loop, the network processes a sequence one step at a time and can generate an output at each step until the sequence is complete.
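The feedback loop can be sketched with a single recurrent unit whose previous output is fed back in alongside the next input. The tanh activation, the scalar state, and the weight values are assumptions chosen to keep the example small.

```python
import math


def rnn(sequence, w_in=0.5, w_rec=0.8, bias=0.0):
    """Run one recurrent unit over a sequence of numbers of any length."""
    state = 0.0  # nothing has been seen yet
    outputs = []
    for x in sequence:
        # The previous output (state) re-enters alongside the new input.
        state = math.tanh(w_in * x + w_rec * state + bias)
        outputs.append(state)
    return outputs


print(rnn([1.0, 0.0, 0.0]))  # the first input still influences later steps
print(len(rnn([0.5] * 7)))   # the same weights handle a longer sequence
```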