Skip to main content

Section πŸŽ›οΈ Monte Carlo Simulation

This section introduces the concept of Monte Carlo simulation, a computational method that uses random sampling to estimate numerical quantities such as probabilities, averages, or areas. The following examples illustrate how to use Monte Carlo methods for probability estimation, geometric area approximation, and even a game show problem.

Subsection Overview

Subsubsection Typical Workflow

A Monte Carlo simulation repeatedly runs an experiment with random inputs and records the outcome of each trial. By averaging over many trials, we approximate a theoretical value, such as a probability \(p\text{.}\) The estimate is often denoted by \(\hat{p}\text{.}\)
The typical workflow is:
  1. Choose the number of experiments \(N\text{.}\)
  2. For each \(i=1,\dots,N\text{:}\)
    • Generate random values that represent one realization of the experiment.
    • Record the outcome \(X_i\text{.}\)
  3. Compute the mean outcome:
    \begin{equation*} \hat{p} = \frac{X_1 + X_2 + \dots + X_N}{N}. \end{equation*}
  4. Quantify uncertainty using a confidence interval for \(p\text{.}\)

Subsubsection Reproducibility

When developing and testing code that depends on random numbers, it is helpful to start with a predictable sequence of random values. You can do this in MATLAB by setting the random number generator’s seed:
seed = 123;
rng(seed,'twister');
In your assignments, the random seed will usually be provided as an optional input so results can be reproduced.

Subsection Confidence Intervals

Subsubsection Proportion Estimation

If the random outcomes \(X_i\) are binary (for example, success = 1, failure = 0) with true probability \(P(X_i = 1) = p\text{,}\) then \(\hat{p} = \mathrm{mean}(X)\) estimates \(p\text{.}\)
For large \(N\text{,}\) an approximate 95% confidence interval for \(p\) is
\begin{equation*} \hat{p} \pm z \sqrt{\frac{\hat{p}(1-\hat{p})}{N}}, \end{equation*}
where \(z = 1.96\) corresponds to a 95% confidence level. These approximations work well for large sample sizes and probabilities that are not extreme.
Example 15. Area Under a Curve in the Unit Square.
We can use Monte Carlo simulation to estimate geometric areas. For example, suppose we want to approximate the area of a circle of radius \(r\text{.}\) Because the circle is symmetric, it suffices to estimate the area of a quarter circle in the unit square and multiply by 4.
A random point \((x,y)\) uniformly distributed in \([0,1]^2\) either lies inside the quarter circle (\(x^2 + y^2 \le r^2\)) or it doesn’t. The fraction of points that fall inside should approximate the ratio of the areas:
\begin{equation*} \frac{\text{points inside circle}}{\text{total points}} \approx \frac{\text{area of quarter circle}}{\text{area of unit square}}. \end{equation*}
N = 1000;
r = 0.5;
hits = 0;

% Monte Carlo loop
for k = 1:N
    x1 = rand(); y1 = rand();
    if x1^2 + y1^2 < r^2
        hits = hits + 1;
    end
end

hit_frac_hat = hits/N;           % fraction of points inside
approxArea   = 4 * hit_frac_hat; % approximate full circle area
exactArea    = pi * r^2;         % true area for comparison

% 95% confidence interval
CL = 0.95;
z = z_from_CL(CL);
marginOfError = z * sqrt(hit_frac_hat*(1-hit_frac_hat)/N);

T = table(N, hits, hit_frac_hat, marginOfError, approxArea, exactArea);
disp(T)
Running this produces the following output:
 N      hits    hit_frac_hat    marginOfError    approxArea    exactArea
____    ____    ____________    _____________    __________    _________

1000    192        0.192         Β± 0.024413        0.768        0.7854
Example 16. Distance Between Two Random Points.
Suppose two points are chosen independently and uniformly in the unit square. The following Monte-Carlo simulation estimates the probability that the two points are within distance \(r\) of each other.
N = 100;
r = 0.25;
hits = 0;

for k = 1:N
    x1 = rand(); y1 = rand();
    x2 = rand(); y2 = rand();

    sqrdist = (x2-x1)^2 + (y2-y1)^2;
    if sqrdist < r^2
        hits = hits + 1;
    end
end

hit_frac_hat = hits/N;

CL = 0.95;
z = z_from_CL(CL);
marginOfError = z * sqrt(hit_frac_hat*(1-hit_frac_hat)/N);

T = table(N, hits, hit_frac_hat, marginOfError);
disp(T)
Running this produces the following output:
N     hits    hit_frac_hat    marginOfError
___    ____    ____________    _____________

100     6          0.06         Β± 0.046547
Example 17. The Monty Hall Problem.
The famous Monty Hall game show problem asks: if you pick one of three doors at random, and the host opens another door revealing a goat, should you switch to the remaining door? Monte Carlo simulation can estimate the probability of winning if you switch versus if you stay.
N = 1000;
wins_by_switching = 0;
wins_by_sticking = 0;

for t = 1:N
    carDoor = randi(3);
    pickDoor = randi(3);

    if pickDoor == carDoor
        wins_by_sticking = wins_by_sticking + 1;
    else
        wins_by_switching = wins_by_switching + 1;
    end
end

prob_win_by_switching = wins_by_switching / N;
prob_win_by_sticking  = wins_by_sticking / N;

CL = 0.95;
z = z_from_CL(CL);
se = sqrt(prob_win_by_switching*(1-prob_win_by_switching)/N);

marginOfError = z * se;
T = table(N, wins_by_switching, wins_by_sticking, ...
        prob_win_by_switching, prob_win_by_sticking, marginOfError);
disp(T)
Running this produces the following output:
N      wins_by_switching    wins_by_sticking    prob_win_by_switching    prob_win_by_sticking    marginOfError
____    _________________    ________________    _____________________    ____________________    _____________

1000           669                 331                   0.669                   0.331             Β± 0.029166
Critical \(z\)-Value.
The following function returns an approximate critical \(z\) value for a given confidence level (CL). It is used in all examples to compute the 95% confidence interval.
function z = z_from_CL(CL)
    % Approximate two-sided z critical value for common CLs (0-1)
    if abs(CL - 0.99) < 1e-10
        z = 2.575;
    elseif abs(CL - 0.95) < 1e-10
        z = 1.960;
    elseif abs(CL - 0.90) < 1e-10
        z = 1.645;
    elseif abs(CL - 0.80) < 1e-10
        z = 1.282;
    else
        alpha = 1 - CL;
        z = sqrt(2) * erfinv(1 - alpha);
    end
end

Subsubsection Mean Estimation

In Proportion Estimation, we explored Monte Carlo simulations for binary (yes/no) outcomes and probability estimation. In this section, we extend the idea to experiments with numerical outcomes that can take on many values, such as the number of tosses required to achieve a specific pattern in coin flips. We will also learn how to compute confidence intervals for estimates of expected (mean) values.
When the outcome of a random experiment is a numerical value rather than a simple yes/no indicator, we are estimating a mean rather than a probability. Suppose \(Y\) is a real-valued random variable with true mean \(\mu\text{.}\) We can estimate \(\mu\) using a Monte Carlo average:
\begin{equation*} \hat{\mu} = \frac{Y_1 + Y_2 + \dots + Y_N}{N}. \end{equation*}
To measure uncertainty, we form a confidence interval for \(\mu\text{:}\)
\begin{equation*} \hat{\mu} \pm z \frac{s}{\sqrt{N}}, \end{equation*}
where \(z\) is the critical value corresponding to the chosen confidence level (for example, \(z = 1.96\) for 95%) and \(s = \mathrm{std}(Y_1, Y_2, \dots, Y_N)\) is the sample standard deviation.
Example 18. Expected Tosses to Get Two Consecutive Heads.
Consider repeatedly tossing a fair coin until you see two consecutive heads (HH). The random variable \(T\) represents the number of tosses required. We want to estimate \(E[T]\text{,}\) the expected number of tosses needed to get HH, using Monte Carlo simulation.
The following table shows four example outcomes for the experiment:
Experiment (\(k\)) Observation \(T_k\)
1 H, T, H, H 4
2 T, T, T, H, T, H, H 7
3 H, H 2
4 T, T, T, H, T, H, T, T, T, H, T, H, H 13
Estimating the Expected Number of Tosses to Get Two Heads in a Row
N = 1000;
tosses = zeros(N,1);

% Monte Carlo loop
for t = 1:N

    head_streak = 0;
    toss_count = 0;

    while head_streak < 2
        flip = randi([0 1]);   % 1 = heads, 0 = tails

        if flip == 1
            head_streak = head_streak + 1;  % increase streak
        else
            head_streak = 0;                % reset streak
        end

        toss_count = toss_count + 1;        % count total tosses
    end

    % record result for this trial
    tosses(t) = toss_count;
end

% Compute Monte Carlo estimate
avgTosses = mean(tosses);

% Confidence interval for the mean
CL = 0.95;
z = z_from_CL(CL);
s = std(tosses, 0);       % sample standard deviation
ME = z * s / sqrt(N);

% Display results as a table
ME = categorical("Β± " + ME);
T = table(N, avgTosses, s, ME);
disp(T)
Running this produces the following output:
 N      avgTosses      s          ME    
____    _________    ______    _________

1000      5.839      4.7421    Β± 0.29392
As \(N\) increases, the average number of tosses \(\hat{\mu} \) should approach the true expected value. The 95% confidence interval width decreases proportionally to \(1 / \sqrt{N}\text{.}\)
Local Helper Function.
The helper function below returns an approximate two-sided \(z\)-critical value for a given confidence level. It supports common confidence levels such as 90%, 95%, and 99%.
function z = z_from_CL(CL)
    % Approximation of a two-sided z critical value for confidence level CL.
    if abs(CL - 0.99) < 1e-10
        z = 2.575;
    elseif abs(CL - 0.95) < 1e-10
        z = 1.960;
    elseif abs(CL - 0.90) < 1e-10
        z = 1.645;
    elseif abs(CL - 0.80) < 1e-10
        z = 1.282;
    else
        alpha = 1 - CL;
        z = sqrt(2) * erfinv(1 - alpha);
    end
end