Week 5: Neural Networks¶
CS50's Introduction to Artificial Intelligence with Python
Biological Inspiration¶
Biological neurons connect to form networks that receive and transmit electrical signals. When input surpasses a threshold, the neuron fires and forwards its signal.
An Artificial Neural Network is a computational framework modeled on this biological system. Networks map inputs to outputs based on their structure and learned parameters — training data shapes the configuration.
The artificial counterpart of a biological neuron is a unit, which is connected to other units.
Basic hypothesis function: h(x₁, x₂) = w₀ + w₁x₁ + w₂x₂
- w₁, w₂ = weights
- w₀ = bias
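The hypothesis function above can be sketched in a few lines of Python. The function name `hypothesis` and the example weights are illustrative assumptions, not values from the course.

```python
def hypothesis(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias: w0 + w1*x1 + w2*x2 + ..."""
    return bias + sum(w * x for w, x in zip(weights, inputs))


# Illustrative values (assumed for this sketch): w0 = -1, w1 = 2, w2 = 3
print(hypothesis([1, 1], weights=[2, 3], bias=-1))  # 4
```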
Activation Functions¶
Translate hypothesis function outputs into decisions.
| Function | Behavior |
|---|---|
| Step Function | Output 0 before threshold, then 1 |
| Logistic Function | Output between 0 and 1 (graduated confidence) |
| ReLU (Rectified Linear Unit) | Positive inputs pass through unchanged; negative inputs become 0 |
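As a rough sketch, the three functions in the table might be written as follows. The function names are assumptions, and so is the choice that the step function outputs 1 exactly at the threshold.

```python
import math


def step(z, threshold=0.0):
    """0 below the threshold, 1 at or above it (the boundary case is an assumption)."""
    return 1 if z >= threshold else 0


def logistic(z):
    """Squashes any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    """Passes positive values through unchanged; negatives become 0."""
    return max(0.0, z)


for z in (-2.0, 0.0, 2.0):
    print(z, step(z), round(logistic(z), 3), relu(z))
```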
Network Structure¶
Layers:
- Input units: receive raw features
- Hidden units: intermediate processing
- Output units: produce final predictions
Each output unit multiplies its inputs by their weights, adds the bias, and applies an activation function g.
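A single unit can be sketched as below. The name `unit_output` and the use of a step function for g are assumptions; the demonstration uses weights of 1 and a bias of -1, the values from the OR example in this section.

```python
def step(z):
    """Step activation: 1 when z is at least 0, otherwise 0."""
    return 1 if z >= 0 else 0


def unit_output(inputs, weights, bias, g=step):
    """Multiply inputs by weights, add the bias, then apply activation g."""
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return g(z)


# Truth table for weights (1, 1) and bias -1
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, unit_output([x1, x2], weights=[1, 1], bias=-1))
```

Only the input pair (0, 0) gives a sum below 0, so it is the only case where the unit outputs 0.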
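OR function example: with inputs x₁ and x₂, weights of 1, and a bias of -1, the unit computes g(-1 + x₁ + x₂). With a step function thresholded at 0 (output 1 when the sum is at least 0), this produces the correct OR outputs.
Gradient Descent¶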
Minimizes loss during training by adjusting weights.
- Start with random weights
- Compute gradients across all data points
- Update weights in the direction that reduces loss
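The three steps above can be sketched for a single logistic unit. Everything specific here is an assumption made for illustration: the OR training data, the cross-entropy loss (whose gradient with respect to the weighted sum is prediction minus target), the learning rate, and the number of epochs.

```python
import math
import random


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(params, x):
    w0, w1, w2 = params
    return logistic(w0 + w1 * x[0] + w2 * x[1])


def gradient_descent(data, learning_rate=1.0, epochs=2000, seed=0):
    rng = random.Random(seed)
    # Step 1: start with random weights
    params = [rng.uniform(-1, 1) for _ in range(3)]
    for _ in range(epochs):
        # Step 2: accumulate the gradient across all data points
        grads = [0.0, 0.0, 0.0]
        for x, y in data:
            error = predict(params, x) - y
            grads[0] += error
            grads[1] += error * x[0]
            grads[2] += error * x[1]
        # Step 3: move each weight against its average gradient
        params = [p - learning_rate * g / len(data)
                  for p, g in zip(params, grads)]
    return params


OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
params = gradient_descent(OR_DATA)
print([round(predict(params, x)) for x, _ in OR_DATA])
```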
| Variant | Description |
|---|---|
| Standard Gradient Descent | Uses all data points |
| Stochastic Gradient Descent | Uses a single randomly-selected point |
| Mini-Batch Gradient Descent | Uses small random samples — balances cost and precision |
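The variants differ only in which points feed each gradient computation. A minimal sketch of that selection, with the hypothetical helper `pick_batch` and variant names chosen for this example:

```python
import random


def pick_batch(data, variant, batch_size=2, rng=random):
    """Return the data points used for one gradient update."""
    if variant == "standard":
        return list(data)                    # every data point
    if variant == "stochastic":
        return [rng.choice(data)]            # one randomly selected point
    if variant == "mini-batch":
        return rng.sample(data, batch_size)  # a small random sample
    raise ValueError(f"unknown variant: {variant}")


points = list(range(10))
for variant in ("standard", "stochastic", "mini-batch"):
    print(variant, len(pick_batch(points, variant, batch_size=3)))
```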
Multilayer Networks¶
Networks with input, output, and one or more hidden layers enable modeling non-linear data.
- Perceptrons can only learn linear decision boundaries, so they fail on data that is not linearly separable
- Adding one or more hidden layers with non-linear activation functions overcomes this limitation
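XOR is a standard example of data that no single linear boundary separates. The sketch below shows one hidden layer being enough; the weights are hand-picked for illustration (not learned, and not taken from the course).

```python
def step(z):
    return 1 if z >= 0 else 0


def xor_network(x1, x2):
    # Hidden layer: two units, each drawing one linear boundary
    h_or = step(-1 + x1 + x2)   # fires if at least one input is 1
    h_and = step(-2 + x1 + x2)  # fires only if both inputs are 1
    # Output unit combines the hidden units: OR but not AND
    return step(-1 + h_or - 2 * h_and)


print([xor_network(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 0]
```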
Backpropagation¶
Training algorithm for networks with hidden layers:
- Calculate output layer errors
- Propagate errors backward layer by layer
- Update weights progressively
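The three steps can be sketched for a tiny network with one hidden layer. The details are assumptions made for this example: sigmoid activations, a squared-error loss on a single training example, a 2-2-1 layout, and hand-picked starting weights.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, hidden_w, hidden_b, out_w, out_b):
    hidden = [sigmoid(b + sum(w * xi for w, xi in zip(ws, x)))
              for ws, b in zip(hidden_w, hidden_b)]
    output = sigmoid(out_b + sum(w * h for w, h in zip(out_w, hidden)))
    return hidden, output


def backprop_step(x, y, hidden_w, hidden_b, out_w, out_b, lr=0.5):
    hidden, output = forward(x, hidden_w, hidden_b, out_w, out_b)
    # 1. Output-layer error for loss = 0.5 * (output - y) ** 2
    out_delta = (output - y) * output * (1 - output)
    # 2. Propagate the error backward to each hidden unit
    hidden_delta = [out_delta * w * h * (1 - h)
                    for w, h in zip(out_w, hidden)]
    # 3. Update the weights of both layers
    new_out_w = [w - lr * out_delta * h for w, h in zip(out_w, hidden)]
    new_out_b = out_b - lr * out_delta
    new_hidden_w = [[w - lr * d * xi for w, xi in zip(ws, x)]
                    for ws, d in zip(hidden_w, hidden_delta)]
    new_hidden_b = [b - lr * d for b, d in zip(hidden_b, hidden_delta)]
    return new_hidden_w, new_hidden_b, new_out_w, new_out_b


def loss(x, y, net):
    return 0.5 * (forward(x, *net)[1] - y) ** 2


# Hand-picked starting weights: (hidden_w, hidden_b, out_w, out_b)
net = ([[0.5, -0.5], [0.3, 0.8]], [0.0, 0.0], [0.4, -0.6], 0.1)
x, y = (1, 0), 1
loss_before = loss(x, y, net)
for _ in range(100):
    net = backprop_step(x, y, *net)
loss_after = loss(x, y, net)
print(loss_before, loss_after)
```

The hidden-layer errors are computed from the output weights before those weights are updated, which is why step 2 comes before step 3.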
Networks with multiple hidden layers are called deep neural networks.
Overfitting Prevention¶
Dropout: Randomly deactivates units during training, preventing over-reliance on individual units. All units function normally after training.
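A minimal sketch of dropout on a list of unit activations. The function name is hypothetical, and the rescaling of surviving units by 1 / (1 - rate), often called inverted dropout, is an assumption added here so that the expected total activation stays the same between training and inference.

```python
import random


def dropout(activations, rate, training, rng=random):
    """Zero each activation with probability `rate` during training only."""
    if not training:
        return list(activations)  # all units function normally
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


units = [1.0, 2.0, 3.0, 4.0]
print(dropout(units, rate=0.5, training=True, rng=random.Random(0)))
print(dropout(units, rate=0.5, training=False))
```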
TensorFlow Example¶
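```python
import tensorflow as tf

# n_features, X_train, and y_train are assumed to be defined by the reader.
model = tf.keras.Sequential([
    # Hidden layer: 64 units with ReLU activation
    tf.keras.layers.Dense(64, activation='relu', input_shape=(n_features,)),
    # Randomly drop half of the hidden units on each training step
    tf.keras.layers.Dropout(0.5),
    # Output layer: a single sigmoid unit for binary classification
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10)
```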
Dense layers connect every node in one layer to all nodes in the previous layer.
Computer Vision¶
Images are made of pixels, each represented by RGB values (0–255 per channel). Feeding raw pixels directly into a fully connected network has two problems:
1. It ignores the spatial structure of the image
2. It requires a very large number of weights
Image Convolution¶
Slides a kernel matrix across an image and replaces each pixel with a weighted sum of itself and its neighbors, using the kernel values as weights. Different kernels serve different purposes: edge detection kernels, for example, highlight boundaries by amplifying differences between neighboring pixels.
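A rough sketch of convolution on a grayscale image stored as a list of rows. It assumes a square kernel and no padding, so the output is smaller than the input; the 3×3 edge detection kernel and the pixel values are chosen for illustration.

```python
def convolve(image, kernel):
    """Slide a square kernel over the image and sum the weighted pixels."""
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(k) for j in range(k))
             for c in range(cols)]
            for r in range(rows)]


# The weights sum to 0, so uniform regions produce 0
EDGE_KERNEL = [[-1, -1, -1],
               [-1,  8, -1],
               [-1, -1, -1]]

flat = [[20, 20, 20, 20] for _ in range(4)]   # no edges
edge = [[20, 20, 50, 50] for _ in range(4)]   # vertical boundary

print(convolve(flat, EDGE_KERNEL))  # [[0, 0], [0, 0]]
print(convolve(edge, EDGE_KERNEL))  # [[-90, 90], [-90, 90]]
```

The uniform image maps to zeros, while the image with a boundary produces large positive and negative values on either side of it.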
Pooling¶
Max-pooling reduces input dimensions by keeping only the maximum value in each region, relying on the similarity of adjacent pixels.
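A small sketch of max-pooling over non-overlapping regions. The function name, the 2×2 region size, and the sample values are assumptions for illustration.

```python
def max_pool(image, size=2):
    """Keep the maximum of each non-overlapping size x size region."""
    return [[max(image[r + i][c + j]
                 for i in range(size) for j in range(size))
             for c in range(0, len(image[0]) - size + 1, size)]
            for r in range(0, len(image) - size + 1, size)]


image = [[1, 3, 2, 4],
         [5, 6, 1, 2],
         [7, 2, 9, 1],
         [3, 4, 1, 8]]
print(max_pool(image))  # [[6, 4], [7, 9]]
```

A 4×4 input becomes a 2×2 output, cutting the number of values by a factor of four.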
Flattening¶
Converts the processed image data into a one-dimensional list of values, a format suitable for traditional (dense) neural network layers.
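For a two-dimensional feature map stored as a list of rows, flattening might be sketched as follows (the function name and sample values are illustrative):

```python
def flatten(feature_map):
    """Concatenate the rows of a 2D feature map into one flat list."""
    return [value for row in feature_map for value in row]


print(flatten([[6, 4], [7, 9]]))  # [6, 4, 7, 9]
```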
Convolutional Neural Networks (CNNs)¶
Combine convolution, pooling, and dense layers for image analysis:
- Extract features through learned convolution filters
- Reduce dimensionality through pooling
- Feed flattened results into dense layers
Convolution + pooling together reduce sensitivity to image variations — slightly different angles produce comparable outputs.
Recurrent Neural Networks (RNNs)¶
Unlike feed-forward networks, which map a fixed-size input to a fixed-size output, RNNs feed their own outputs back in as inputs for subsequent steps.
This enables variable-length inputs and outputs, making them suitable for:
- Image captioning
- Video analysis
- Machine translation
- Sequential data processing

The recurrent structure processes a sequence one step at a time, generating outputs at multiple steps until the sequence is complete.
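The feedback loop can be sketched with a single recurrent unit working on scalar values. The tanh activation, the weight values, and the function names are all assumptions made for this example.

```python
import math


def rnn_step(x, prev_output, w_in=0.5, w_rec=0.8, bias=0.0):
    """Combine the current input with the previous output."""
    return math.tanh(w_in * x + w_rec * prev_output + bias)


def run_rnn(sequence):
    """Process a sequence of any length, emitting one output per step."""
    output = 0.0
    outputs = []
    for x in sequence:
        output = rnn_step(x, output)  # the previous output feeds back in
        outputs.append(output)
    return outputs


print(run_rnn([1, 0, 0]))
print(len(run_rnn([1] * 7)))  # 7
```

In the first call the second and third inputs are 0, yet the outputs stay above 0 because each step still sees the output carried over from the step before.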