Lecture 25
March 25, 2026
Basic Idea: The Central Limit Theorem says that with enough samples, the error is approximately normally distributed:
\[\sqrt{n}\left(\tilde{\mu}_n - \mu\right) \to \mathcal{N}\left(0, \sigma_Y^2\right), \quad \text{i.e. for large } n, \quad \tilde{\mu}_n - \mu \approx \mathcal{N}\left(0, \frac{\sigma_Y^2}{n}\right).\]
The \(1-\alpha\)-confidence interval is: \[\tilde{\mu}_n \pm \Phi^{-1}\left(1 - \frac{\alpha}{2}\right) \frac{\sigma_Y}{\sqrt{n}}\]
For example, the 95% confidence interval is \[\tilde{\mu}_n \pm 1.96 \frac{\sigma_Y}{\sqrt{n}}.\]
We don’t know the population standard deviation \(\sigma_Y\).
But we can estimate it using the simulation variance:
\[\tilde{\sigma}^2_n = \frac{1}{n-1} \sum_{i=1}^n \left(Y_i - \tilde{\mu}_n\right)^2 \approx \mathbb{V}\left[Y_i\right].\]
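A minimal sketch of the estimate, MCSE, and confidence interval in NumPy. The example target \(Y = X^2\) with \(X \sim N(0, 1)\) (true mean 1) is an illustrative choice, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
y = rng.standard_normal(n) ** 2          # hypothetical samples Y_i = X_i^2

mu_hat = y.mean()                        # Monte Carlo estimate of the mean
sigma_hat = y.std(ddof=1)                # sample std dev, 1/(n-1) normalization
mcse = sigma_hat / np.sqrt(n)            # Monte Carlo standard error

z = 1.96                                 # Phi^{-1}(0.975), for a 95% interval
ci = (mu_hat - z * mcse, mu_hat + z * mcse)
print(f"estimate = {mu_hat:.3f}, MCSE = {mcse:.4f}, 95% CI = {ci}")
```

Note that `ddof=1` gives the \(1/(n-1)\) normalization from the formula above.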
Notice that we can’t figure this out ahead of time: if we want to get the MCSE \(\tilde{\sigma}_n / \sqrt{n}\) below \(\varepsilon\), we can solve
\[n \geq \frac{1}{\varepsilon^2}\mathbb{V}\left[Y_i\right],\]
but we can’t even estimate \(\mathbb{V}\left[Y_i\right]\) without doing Monte Carlo!
Converging at a rate of \(1/\sqrt{n}\) is not great.
We would like to estimate the CDF \(F\) with some approximation \(\hat{F}_n\), then compute \(\hat{z}^\alpha_n = \hat{F}_n^{-1}(\alpha)\) as an estimator of the \(\alpha\)-quantile \(z^\alpha\).
Given samples \(\hat{\mathbf{y}} = y_1, \ldots, y_n \sim F\), define \[\hat{F}_n(y) = \frac{1}{n} \sum_{i=1}^n \mathbb{I}(y_i \leq y).\]
\[ \begin{align*} \mathbb{E}[\hat{F}_n(y)] &= \frac{1}{n} \sum_{i=1}^n \mathbb{E}\left[\mathbb{I}(y_i \leq y)\right] \\ &= \frac{1}{n} \sum_{i=1}^n \mathbb{P}(y_i \leq y) \\ &= F(y) \end{align*} \]
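A sketch of the empirical CDF and the plug-in quantile estimator, assuming samples from a standard normal (the distribution and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
y = rng.standard_normal(n)

def F_hat(t, samples):
    """Empirical CDF: fraction of samples <= t."""
    return np.mean(samples <= t)

# Plug-in quantile estimator: invert F_hat, which amounts to an order statistic.
alpha = 0.95
z_hat = np.quantile(y, alpha)            # estimator of the alpha-quantile

print(F_hat(0.0, y))                     # near F(0) = 0.5 for a standard normal
print(z_hat)                             # near the true 0.95-quantile, ~1.645
```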
From the CLT and some (not super important) theory about order statistics: \[\text{Var}(\hat{z}^\alpha_n) \to \frac{1}{n}\frac{\alpha (1 - \alpha)}{f^2(z^\alpha)}\]
In other words, the smaller the density at the “true” quantile \(z^\alpha\), the greater the error and the more samples required.
The goal of most advanced MC methods is to reduce the variance so that the MCSE decays faster than \(1/\sqrt{n}\).
Some approaches:
The idea is to “anchor” the estimate of \(\mathbb{E}[f(X)]\) to a similar but easier-to-calculate value \(\mathbb{E}[g(X)]\).
Let’s play a game:
Write down how long you think it takes to fly from London to New York.
What did you write down?
Now suppose you know that it takes 8 hours and 15 minutes to fly from London to Washington, DC.
Write down how long you think it takes to fly from London to New York.
What did you write down?
It’s easier to estimate a quantity if we can start with a known-but-similar value and tweak it slightly.
We want to estimate \(\mu = \mathbb{E}[f(X)]\), but suppose we know \(\mathbb{E}[g(X)]\) instead. Then:
\[\begin{aligned} \mu &= \mathbb{E}[f(X)] = \mathbb{E}[f(X) - g(X) + g(X)] \\ &= \underbrace{\mathbb{E}[f(X) - g(X)]}_{\text{Estimate with MC}} + \underbrace{\mathbb{E}[g(X)]}_{\text{known}}. \end{aligned}\]
In other words, if \(\nu = \mathbb{E}[g(X)]\), we can obtain an estimate
\[\tilde{\mu}_n^{\text{CV}} = \frac{1}{n}\sum_{i=1}^n \left(f(X_i) - g(X_i)\right) + \nu.\]
The efficiency gain depends on the correlation between \(f(X)\) and \(g(X)\): the stronger the correlation, the smaller the variance of \(f(X) - g(X)\).
Suppose we want to estimate \(\mathbb{E}[\cos(X)]\), where \(X \sim N(0, 1)\).
Let’s use \(g(x) = 1 - \frac{x^2}{2}\) based on the Taylor expansion.
We know that if \(X \sim N(\mu, \sigma^2)\), \(\mathbb{E}[X^2] = \mu^2 + \sigma^2\), so \(\mathbb{E}[g(X)] = 1/2\).
1,000 samples:
Naive MC estimate: 0.599 with MCSE 0.014.
Control variate estimate: 0.605 with MCSE 0.008.
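A sketch of this example in NumPy (seed and exact numbers will differ from the figures above):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000
x = rng.standard_normal(n)

f = np.cos(x)
g = 1 - x**2 / 2
nu = 0.5                                  # known E[g(X)] = 1 - E[X^2]/2 = 1/2

mu_naive = f.mean()                       # naive Monte Carlo estimator
mu_cv = (f - g).mean() + nu               # control-variate estimator

mcse_naive = f.std(ddof=1) / np.sqrt(n)
mcse_cv = (f - g).std(ddof=1) / np.sqrt(n)
print(mu_naive, mcse_naive)
print(mu_cv, mcse_cv)                     # smaller MCSE: f and g are correlated
```

The true value is \(\mathbb{E}[\cos(X)] = e^{-1/2} \approx 0.6065\), so both estimators are close, but the control-variate MCSE is noticeably smaller.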
If the target values \(h(X_i)\) and \(h(Y_i)\) of paired samples are negatively correlated, we can increase the rate of convergence of \[\frac{1}{2M} \sum_{i=1}^M \left[h(X_i) + h(Y_i)\right]\] relative to \(\frac{1}{2M} \sum_{i=1}^{2M} h(X_i)\) with \(2M\) independent samples:
\[\begin{aligned} \mathbb{V}\left[\frac{1}{2}\left(h(X_i) + h(Y_i)\right)\right] &= \frac{1}{4}\left[\mathbb{V}[h(X_i)] + \mathbb{V}[h(Y_i)] + 2\,\text{Cov}\left(h(X_i), h(Y_i)\right)\right] \\ &= \frac{1}{4}\left(\sigma^2 + \sigma^2 + 2\rho\sigma^2\right) = \frac{1 + \rho}{2}\sigma^2 \end{aligned}\]
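A sketch of antithetic variates for \(X \sim N(0, 1)\), pairing each \(X_i\) with \(Y_i = -X_i\). The choice \(h(t) = e^t\) is illustrative: for a monotone \(h\), the pair is negatively correlated.

```python
import numpy as np

rng = np.random.default_rng(2)
M = 20_000
x = rng.standard_normal(M)

h = lambda t: np.exp(t)                   # monotone h => negative correlation

pair_means = 0.5 * (h(x) + h(-x))         # one value per antithetic pair
mu_anti = pair_means.mean()               # uses 2M function evaluations total

x2 = rng.standard_normal(2 * M)           # 2M independent samples to compare
mu_indep = h(x2).mean()

var_anti = pair_means.var(ddof=1) / M     # variance of the antithetic estimator
var_indep = h(x2).var(ddof=1) / (2 * M)   # variance of the independent estimator
print(mu_anti, var_anti)                  # true mean is e^{1/2} ~ 1.6487
print(mu_indep, var_indep)
```

Here \(\rho = \text{Corr}(e^X, e^{-X}) < 0\), so by the identity above the antithetic estimator has lower variance than \(2M\) independent draws.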
Motivating example: the naive Monte Carlo estimator of a rare-event probability \[P(X > k) \approx \frac{1}{M} \sum_{i=1}^M \mathbb{I}(X_i > k)\] is inefficient when \(P(X > k)\) is small, since almost every indicator is zero.
Extension of rejection sampling without requiring “rejection”:
Technically works with any proposal \(g\), but more efficient if \(g\) “covers” \(f\) (like with rejection sampling):
\[f(x)/g(x) < M < \infty\]
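A sketch of importance sampling for the tail probability \(P(X > k)\) with \(X \sim N(0, 1)\). The shifted proposal \(g = N(k, 1)\) is an illustrative choice (not from the slides); the weight \(f(x)/g(x) = \varphi(x)/\varphi(x - k) = e^{k^2/2 - kx}\) follows from the normal densities:

```python
import math
import numpy as np

rng = np.random.default_rng(3)
M = 100_000
k = 4.0

# Naive MC: almost every indicator is 0, so the estimate is very noisy.
x_naive = rng.standard_normal(M)
p_naive = np.mean(x_naive > k)

# Importance sampling: draw from g = N(k, 1), reweight by f(x)/g(x).
x_is = rng.normal(loc=k, scale=1.0, size=M)
weights = np.exp(k**2 / 2 - k * x_is)       # phi(x) / phi(x - k)
p_is = np.mean((x_is > k) * weights)

p_true = 0.5 * math.erfc(k / math.sqrt(2))  # exact tail probability, ~3.17e-5
print(p_naive, p_is, p_true)
```

With the same number of samples, the importance-sampling estimate lands within a few percent of the true value, while the naive estimate sees only a handful of nonzero indicators.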
Always report Monte Carlo error!
Wednesday: The Bootstrap