More Monte Carlo


Lecture 25

March 25, 2026

Review

Monte Carlo

  • Stochastic simulation: propagate random variables \(X\) through a model \(f\) to estimate \(\mathbb{E}[f(X)] \approx \frac{1}{n}\sum_{i=1}^n f(X_i)\) using samples \(X_1, \ldots, X_n\).
  • Example: the probability that a standard Gaussian falls in \([0.8, 0.9]\): draw \(X_1, \ldots, X_n \sim N(0, 1)\) and take \[f(x) = \mathbb{I}_{[0.8, 0.9]}(x).\]
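The review example above can be run as a short sketch (Julia; the seed and sample size are our choices, and 0.0278 is the exact value \(\Phi(0.9) - \Phi(0.8)\) for comparison):

```julia
using Random, Statistics

# MC estimate of P(0.8 ≤ X ≤ 0.9) for X ~ N(0, 1)
Random.seed!(1)
n = 100_000
f(x) = 0.8 <= x <= 0.9           # indicator of [0.8, 0.9]
μ̂ = mean(f.(randn(n)))           # exact value: Φ(0.9) - Φ(0.8) ≈ 0.0278
```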

Statistics of Monte Carlo

  • Denote \(Y = f(X)\), \(\mathbb{E}[Y] = \mu\), and \(\frac{1}{n}\sum_{i=1}^n Y_i = \tilde{\mu}_n\).
  • MC is unbiased: \(\mathbb{E}[\tilde{\mu}_n] = \mu\);
  • Monte Carlo standard error (MCSE): \(Y_i\) i.i.d., then \(\text{SE}(\tilde{\mu}_n) = \sigma_Y / \sqrt{n}\).

When Would We Want To Use Monte Carlo?

  1. Ugly integrands: \(f(X)\) is highly nonlinear or not smooth;
  2. Ugly domains;
  3. Higher dimensions;
  4. Distribution of \(X\) is implicit.

Monte Carlo Confidence Intervals


Basic Idea: The Central Limit Theorem says that with enough samples, the error is approximately normally distributed:

\[\tilde{\mu}_n - \mu \approx \mathcal{N}\left(0, \frac{\sigma_Y^2}{n}\right)\]

Monte Carlo Confidence Intervals

The \(1-\alpha\)-confidence interval is: \[\tilde{\mu}_n \pm \Phi^{-1}\left(1 - \frac{\alpha}{2}\right) \frac{\sigma_Y}{\sqrt{n}}\]

For example, the 95% confidence interval is \[\tilde{\mu}_n \pm 1.96 \frac{\sigma_Y}{\sqrt{n}}.\]
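As a sketch of computing this interval for the Gaussian-probability example (variable names like `se` and `ci` are ours):

```julia
using Random, Statistics

Random.seed!(1)
n = 10_000
y = (x -> 0.8 <= x <= 0.9).(randn(n))   # Y_i = f(X_i)
μ̂ = mean(y)
se = std(y) / sqrt(n)                    # estimated MCSE: σ̂_Y / √n
ci = (μ̂ - 1.96 * se, μ̂ + 1.96 * se)     # approximate 95% confidence interval
```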

How Many Samples Do I Need?

Notice that we can’t figure this out ahead of time: if we want to get the MCSE \(\tilde{\sigma}_n\) below \(\varepsilon\), we can solve

\[n \geq \frac{1}{\varepsilon^2}\mathbb{V}\left[Y_i\right],\]

but we can’t even estimate \(\mathbb{V}\left[Y_i\right]\) without doing Monte Carlo!

How Many Samples Do I Need?

Try:

  1. Run a “pilot” MC with a relatively small number of samples to estimate \(\mathbb{V}\left[Y_i\right]\).
  2. Pick a target MCSE \(\varepsilon\), then we need \(N \approx \mathbb{V}\left[Y_i\right] / \varepsilon^2\) samples.
  3. Run the “real” MC with sample size \(N\) and check if the MCSE is right (\(N\) might be off due to error in the variance estimate).
  4. If not, run more.
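The pilot-then-refine procedure above can be sketched as follows (the pilot size, target \(\varepsilon\), and example integrand are our choices):

```julia
using Random, Statistics

Random.seed!(1)
f(x) = 0.8 <= x <= 0.9            # example: P(0.8 ≤ X ≤ 0.9), X ~ N(0,1)

# 1. pilot run with a small sample to estimate V[Yᵢ]
v̂ = var(f.(randn(1_000)))

# 2. target MCSE ε implies N ≈ V[Yᵢ] / ε²
ε = 1e-3
N = ceil(Int, v̂ / ε^2)

# 3. "real" run; check the achieved MCSE against the target
y = f.(randn(N))
mcse = std(y) / sqrt(N)
# 4. if mcse is still too large, run more samples
```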

Implications of Monte Carlo Error

Converging at a rate of \(1/\sqrt{n}\) is not great. But:

  • All models are wrong, and so there always exists some irreducible model error.
  • We often need a lot of simulations. Do we have enough computational power?

Estimating Quantiles with Monte Carlo

MC Estimate of the CDF

Would like to estimate the CDF \(F\) with some approximation \(\hat{F}_n\), then compute \(\hat{z}^\alpha_n = \hat{F}_n^{-1}(\alpha)\) as an estimator of the \(\alpha\)-quantile \(z^\alpha\).

Given samples \(y_1, \ldots, y_n \sim F\), define \[\hat{F}_n(y) = \frac{1}{n} \sum_{i=1}^n \mathbb{I}(y_i \leq y).\]
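The empirical CDF is one line in Julia (the name `F̂` and the toy samples are ours):

```julia
using Statistics

# empirical CDF: F̂ₙ(y) = (1/n) Σᵢ 𝟙(yᵢ ≤ y)
F̂(y, samples) = mean(samples .<= y)

samples = [0.2, 0.5, 0.9, 1.4]
F̂(1.0, samples)   # 3 of the 4 samples are ≤ 1.0, so this is 0.75
```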

Is This An Unbiased Estimator?

\[ \begin{align*} \mathbb{E}[\hat{F}_n(y)] &= \frac{1}{n} \sum_{i=1}^n \mathbb{E}[\mathbb{I}(y_i \leq y)] \\ &= \frac{1}{n} \sum_{i=1}^n \mathbb{P}(y_i \leq y) \\ &= F(y) \end{align*} \]

Monte Carlo Quantile Estimation Error

From the CLT and some (not super important) theory about order statistics: \[\text{Var}(\hat{z}^\alpha_n) \to \frac{\alpha (1 - \alpha)}{n\, f^2(z^\alpha)},\] where \(f\) is the density of \(Y\).

In other words, the smaller the density at the “true” quantile \(z^\alpha\), the greater the error and the more samples required.
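A quantile estimate \(\hat{z}^\alpha_n = \hat{F}_n^{-1}(\alpha)\) can be sketched with the `quantile` function from Julia's Statistics stdlib (2.326 is the true standard-normal 0.99-quantile, included for comparison):

```julia
using Random, Statistics

Random.seed!(1)
y = randn(100_000)
α = 0.99
ẑ = quantile(y, α)   # sample quantile ≈ F̂ₙ⁻¹(α); true z^0.99 ≈ 2.326
```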

More Advanced Monte Carlo Methods (Teasers)

Can We Reduce Sample Size?

The goal of most advanced MC methods is to reduce the variance \(\sigma_Y^2\): the \(1/\sqrt{n}\) rate remains, but the MCSE for a given sample size shrinks.

Some approaches:

  1. Control variates;
  2. Antithetic variates;
  3. Importance sampling.

Control Variates

The idea is to “anchor” the estimate of \(\mathbb{E}[f(X)]\) to a similar but easier-to-calculate value \(\mathbb{E}[g(X)]\).

Anchoring

Let’s play a game:

Write down how long you think it takes to fly from London to New York.

What did you write down?

Anchoring

Now suppose you know that it takes 8 hours and 15 minutes to fly from London to Washington, DC.

Write down how long you think it takes to fly from London to New York.

What did you write down?

Anchoring and Control Variates

It’s easier to estimate a quantity if we can start with a known-but-similar value and tweak it slightly.

We want to estimate \(\mu = \mathbb{E}[f(X)]\), but suppose we know \(\mathbb{E}[g(X)]\) instead. Then:

\[\begin{aligned} \mu &= \mathbb{E}[f(X)] = \mathbb{E}[f(X) - g(X) + g(X)] \\ &= \underbrace{\mathbb{E}[f(X) - g(X)]}_{\text{Estimate with MC}} + \underbrace{\mathbb{E}[g(X)]}_{\text{known}}. \end{aligned}\]

Control Variate Estimate

In other words, if \(\nu = \mathbb{E}[g(X)]\), we can obtain an estimate

\[\tilde{\mu}_n^{\text{CV}} = \frac{1}{n}\sum_{i=1}^n \left(f(X_i) - g(X_i)\right) + \nu.\]

The efficiency gain depends on the correlation between \(f(X)\) and \(g(X)\).

Control Variate Example

Suppose we want to estimate \(\mathbb{E}[\cos(X)]\), where \(X \sim N(0, 1)\).

Let’s use \(g(x) = 1 - \frac{x^2}{2}\), the second-order Taylor expansion of \(\cos(x)\) around 0.

```julia
using Plots, LaTeXStrings

x = -5:0.01:5
f(x) = cos(x)
g(x) = 1 - x^2 / 2   # second-order Taylor expansion of cos(x) around 0

plot(x, f.(x), color=:blue, linewidth=3, label=L"$\cos(x)$")
plot!(x, g.(x), color=:red, linewidth=3, label=L"$g(x)$")
ylims!((-1.25, 1.25))
plot!(size=(400, 350))
```
Figure 1: Control Variate Example Functions

Control Variate Example

We know that if \(X \sim N(\mu, \sigma^2)\), \(\mathbb{E}[X^2] = \mu^2 + \sigma^2\), so \(\mathbb{E}[g(X)] = 1/2\).

1,000 samples:

Naive MC estimate: 0.599 with MCSE 0.014.

Control variate estimate: 0.605 with MCSE 0.008.
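The estimates above can be reproduced with a sketch like the following (our seed, so the exact numbers will differ slightly from the slide):

```julia
using Random, Statistics

Random.seed!(1)
n = 1_000
x = randn(n)
f(x) = cos(x)
g(x) = 1 - x^2 / 2
ν = 0.5                          # E[g(X)] = 1 - (μ² + σ²)/2 = 1/2

y_naive = f.(x)
y_cv = f.(x) .- g.(x) .+ ν       # control variate samples

μ̂_naive = mean(y_naive); se_naive = std(y_naive) / sqrt(n)
μ̂_cv = mean(y_cv);       se_cv = std(y_cv) / sqrt(n)
```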

Antithetic Variates

If the target values of paired samples \(h(X_i)\) and \(h(Y_i)\) are negatively correlated, the estimator \[\frac{1}{2M} \sum_{i=1}^M [h(X_i) + h(Y_i)]\] converges faster than plain MC with \(2M\) i.i.d. samples, \(\frac{1}{2M}\sum_{i=1}^{2M} h(X_i)\):

Efficiency of Antithetic Variates

\[\begin{aligned} \mathbb{V}\left[\frac{1}{2}(h(X_i) + h(Y_i))\right] &= \frac{1}{4}\left[\mathbb{V}[h(X_i)] + \mathbb{V}[h(Y_i)] + 2\,\text{Cov}(h(X_i), h(Y_i))\right] \\ &= \frac{1}{4}\left(\sigma^2 + \sigma^2 + 2\rho\sigma^2\right) = \frac{1 + \rho}{2}\sigma^2 \end{aligned}\]

Antithetic Variate Generation Can Be Difficult in Practice

  • Ensuring anti-correlation can be difficult to verify in general;
  • Gains in efficiency depend on the effectiveness of antithetic variate generation and the shape of \(h(x)\).
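One case where generation is easy: for a symmetric distribution like \(N(0,1)\), the pair \(Y_i = -X_i\) works when \(h\) is monotone. A sketch (the target \(\mathbb{E}[e^X]\) and all names are our choices):

```julia
using Random, Statistics

# Antithetic pairs for E[e^X], X ~ N(0,1): Yᵢ = -Xᵢ, so h(Xᵢ) and h(Yᵢ)
# are negatively correlated because h is monotone.
Random.seed!(1)
M = 10_000
x = randn(M)
h(x) = exp(x)

pairs = (h.(x) .+ h.(-x)) ./ 2
est_av = mean(pairs)                  # E[e^X] = e^{1/2} ≈ 1.6487
se_av = std(pairs) / sqrt(M)

# plain MC with the same 2M function evaluations, for comparison
plain = vcat(h.(x), h.(randn(M)))
se_plain = std(plain) / sqrt(2 * M)
```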

What If We Can’t Sample From The Distribution?

  • We may not be able to generate samples \(X \sim f(x)\) efficiently;
  • Think of estimating tail probabilities: almost no samples land in the tail, so the naive estimate below has large relative error:

\[P(X > k) \approx \frac{1}{M} \sum_{i=1}^M \mathbb{I}(X_i > k)\]

Importance Sampling

Extension of rejection sampling without requiring “rejection”:

  1. Draw samples \(x_i\) from an importance distribution \(g(x)\);
  2. Reweight samples: \[ \mathbb{E}_f[h(X)] = \int \frac{f(x)}{g(x)} g(x) h(x)\, dx \approx \frac{1}{M} \sum_{i=1}^M \frac{f(x_i)}{g(x_i)} h(x_i) \]

Importance Sampling Needs

Technically works with any proposal \(g\) whose support covers that of \(f\), but it is more efficient if \(g\) “covers” \(f\) well (as with rejection sampling), i.e., for some constant \(C\),

\[f(x)/g(x) \leq C < \infty\]
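The tail-probability example is the classic use case. A sketch, assuming the Distributions.jl package and our choice of a proposal shifted into the tail:

```julia
using Random, Statistics, Distributions

# Importance sampling for P(X > 4), X ~ N(0,1).
# Proposal g: a normal centered at the tail threshold (our choice).
Random.seed!(1)
M = 10_000
k = 4.0
g = Normal(k, 1.0)

x = rand(g, M)
w = pdf.(Normal(0, 1), x) ./ pdf.(g, x)   # importance weights f(x)/g(x)
est = mean(w .* (x .> k))                  # ≈ ccdf(Normal(), 4.0) ≈ 3.17e-5
```

Note that a naive MC estimate with \(M = 10{,}000\) samples would almost surely return 0 here.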

Key Points


  • Monte Carlo is an unbiased estimator; confidence intervals given by CLT.
  • Be mindful of Monte Carlo standard error for “naive” MC with iid samples.
  • Advanced: Variance reduction techniques to improve convergence, see e.g. https://artowen.su.domains/mc/Ch-var-basic.pdf.

Perhaps Most Importantly…

Always report Monte Carlo error!

Upcoming Schedule

Next Classes

Wednesday: The Bootstrap
