Week 2: Uncertainty¶
CS50's Introduction to Artificial Intelligence with Python
Probability Fundamentals¶
Possible Worlds and Axioms¶
Every scenario is a possible world (ω). Core axioms: - All probabilities range between 0 (impossible) and 1 (certain) - Probabilities across all possible outcomes sum to 1
Calculating Probabilities¶
Probability = favorable outcomes / total possible outcomes.
Example: Two dice yield 36 equally likely combinations. Only one produces a sum of 12, so P(12) = 1/36.
Unconditional vs. Conditional Probability¶
Unconditional probability: Belief without any evidence.
Conditional probability P(a|b): Likelihood of event a given that b occurred.
This restricts consideration to worlds where b is true, then finds the proportion where a is also true.
Random Variables and Independence¶
A random variable has a domain of possible values with associated probabilities.
Example — flight status:
Independence: One event's probability is unaffected by another.
Bayes' Rule¶
Calculates P(b|a) from P(a|b):
Example: Given: - 80% of rainy days start cloudy → P(cloudy|rain) = 0.8 - 40% of days are cloudy → P(cloudy) = 0.4 - 10% of days are rainy → P(rain) = 0.1
P(rain|cloudy) = (0.1 × 0.8) / 0.4 = 0.2 (20%)
Bayes' rule reverses cause and effect — visible evidence infers hidden causes.
Joint and Marginal Probability¶
Joint probability tables show the likelihood of multiple events co-occurring, enabling conditional distributions through normalization.
Marginalization recovers individual variable probabilities:
Probability Rules Summary¶
| Rule | Formula |
|---|---|
| Negation | P(¬a) = 1 − P(a) |
| Inclusion-Exclusion | P(a ∨ b) = P(a) + P(b) − P(a ∧ b) |
| Marginalization | Sum joint probabilities across hidden variables |
| Conditioning | P(a) = P(a|b)P(b) + P(a|¬b)P(¬b) |
Bayesian Networks¶
A directed graph where: - Each node represents a random variable - Arrows indicate dependency (parent → child) - Each node stores P(X | Parents(X))
Example structure: Rain → Maintenance → Train → Appointment
Parents directly influence children; indirect ancestors don't appear in conditional probability tables.
Inference in Bayesian Networks¶
Components: - Query variable (X): Target of probability calculation - Evidence variables (E): Observed facts - Hidden variables (Y): Unobserved factors - Goal: Calculate P(X|e)
Inference by enumeration: Sums joint probabilities of query and evidence across all hidden variable combinations, then normalizes.
Code Example (Pomegranate)¶
from pomegranate import *
# Root node
rain = Node(DiscreteDistribution({
"none": 0.7, "light": 0.2, "heavy": 0.1
}), name="rain")
# Conditional node
maintenance = Node(ConditionalProbabilityTable([
["none", "yes", 0.4],
["none", "no", 0.6],
# ... additional rows
], [rain.distribution]), name="maintenance")
model = BayesianNetwork()
model.add_states(rain, maintenance)
model.add_edge(rain, maintenance)
model.bake()
Sampling Methods¶
Basic Sampling¶
Generate approximate distributions by repeatedly sampling each variable according to its conditional probability given parent values. Count occurrences to estimate probabilities.
For conditional queries like P(Rain|Train=delayed), discard samples not matching the evidence constraint.
Likelihood Weighting¶
More efficient than rejection sampling:
- Fix evidence variables at observed values
- Sample remaining variables using conditional probabilities
- Weight each sample by P(evidence | sampled parents)
Avoids discarding incompatible samples entirely.
def generate_sample():
sample = {}
parents = {}
for state in model.states:
if isinstance(state.distribution, ConditionalProbabilityTable):
sample[state.name] = state.distribution.sample(parent_values=parents)
else:
sample[state.name] = state.distribution.sample()
parents[state.distribution] = sample[state.name]
return sample
Markov Models¶
The Markov Assumption¶
The current state depends only on a finite fixed number of prior states, making complex prediction tractable.
Markov Chains¶
A sequence where each event's probability depends solely on the previous event.
Requirements: - Transition model: P(Xₜ₊₁ | Xₜ) - Start probabilities: Initial distribution
model = MarkovChain([
DiscreteDistribution({"sun": 0.5, "rain": 0.5}),
ConditionalProbabilityTable([
["sun", "sun", 0.8],
["sun", "rain", 0.2],
["rain", "sun", 0.3],
["rain", "rain", 0.7]
], [start])
])
Hidden Markov Models¶
Systems with unobservable internal states generating observable evidence.
Examples: - Robot position (hidden) vs. sensor readings (observed) - Spoken words (hidden) vs. audio waveforms (observed) - User engagement (hidden) vs. analytics data (observed)
Key components: - Emission model: P(observation | state) - Transition model: P(next state | current state) - Sensor Markov assumption: Evidence depends only on current state
Applications:
| Task | Description |
|---|---|
| Filtering | Current state probability given observations so far |
| Prediction | Future state probability |
| Smoothing | Past state probability given current observations |
| Most likely explanation | Most probable state sequence producing observations |
sun = DiscreteDistribution({"umbrella": 0.2, "no umbrella": 0.8})
rain = DiscreteDistribution({"umbrella": 0.9, "no umbrella": 0.1})
transitions = numpy.array([[0.8, 0.2], [0.3, 0.7]])
starts = numpy.array([0.5, 0.5])
model = HiddenMarkovModel.from_matrix(
transitions, [sun, rain], starts,
state_names=["sun", "rain"])