Appendix A — Review of Probability Theory
This appendix reviews the mathematical foundations underlying Monte Carlo methods, covering probability spaces, random variables, estimation theory, and the fundamental limit theorems that justify Monte Carlo inference.
A.1 Probability Spaces and Random Variables
A probability space is a triple \((\Omega, \mathcal{F}, P)\) where \(\Omega\) is the sample space, \(\mathcal{F}\) is a \(\sigma\)-algebra on \(\Omega\), and \(P: \mathcal{F} \to [0,1]\) is a probability measure satisfying Kolmogorov’s axioms.
A random variable is a measurable function \(X: \Omega \to \mathbb{R}\) such that \(\{X \leq x\} \in \mathcal{F}\) for all \(x \in \mathbb{R}\). The cumulative distribution function (CDF) of \(X\) is: \[F_X(x) = P(X \leq x) = P(\{\omega \in \Omega : X(\omega) \leq x\})\]
Continuous random variables have a probability density function (PDF) \(f_X(x)\) such that: \[F_X(x) = \int_{-\infty}^{x} f_X(t) \, dt \quad \text{and} \quad f_X(x) = \frac{dF_X(x)}{dx}\]
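As a quick numerical illustration of this relationship, the sketch below checks that a central-difference derivative of the CDF recovers the PDF for the Exponential(1) distribution; the distribution, evaluation point, and step size are illustrative choices, not part of the formal development.

```python
import math

# Minimal check of the PDF-CDF relationship for the Exponential(1)
# distribution, where F(x) = 1 - exp(-x) and f(x) = exp(-x).
# The evaluation point and finite-difference step are illustrative choices.
def F(x):
    return 1.0 - math.exp(-x)

def f(x):
    return math.exp(-x)

x, h = 1.3, 1e-6
central_difference = (F(x + h) - F(x - h)) / (2 * h)
print(central_difference, f(x))  # both ≈ exp(-1.3) ≈ 0.2725
```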
Discrete random variables with support \(\{x_1, x_2, \ldots\}\) satisfy \(P(X = x_i) = p_i\) where \(\sum_{i} p_i = 1\).
For a random variable \(X\):
Expected value: \[\mathbb{E}[X] = \begin{cases} \sum_{i} x_i \cdot P(X = x_i) & \text{if } X \text{ is discrete} \\ \int_{-\infty}^{\infty} x \cdot f_X(x) \, dx & \text{if } X \text{ is continuous} \end{cases}\]
Variance: \[\text{Var}(X) = \mathbb{E}[(X - \mathbb{E}[X])^2] = \mathbb{E}[X^2] - (\mathbb{E}[X])^2\]
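As a concrete instance of the discrete formulas, the following sketch computes \(\mathbb{E}[X]\) and \(\text{Var}(X)\) directly from a probability mass function; the fair six-sided die is an assumed example distribution.

```python
# E[X] and Var(X) computed directly from a probability mass function,
# using a fair six-sided die as an assumed example distribution.
support = [1, 2, 3, 4, 5, 6]
pmf = [1 / 6] * 6

mean = sum(x * p for x, p in zip(support, pmf))              # E[X] = 3.5
second_moment = sum(x**2 * p for x, p in zip(support, pmf))  # E[X^2] = 91/6
variance = second_moment - mean**2                           # ≈ 2.9167
print(mean, variance)
```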
A.2 Statistical Estimation and the Sample Mean
Consider i.i.d. observations \(X_1, \ldots, X_N\) with common distribution \(F\). We seek to estimate \(\theta = \mathbb{E}[g(X)]\) for some measurable function \(g: \mathbb{R} \to \mathbb{R}\) using the sample mean estimator: \[\hat{\theta}_N = \frac{1}{N} \sum_{i=1}^{N} g(X_i) \tag{A.1}\]
The sample mean estimator \(\hat{\theta}_N\) satisfies:
- Unbiasedness: \(\mathbb{E}[\hat{\theta}_N] = \theta\)
- Variance: \(\text{Var}(\hat{\theta}_N) = \frac{\sigma^2}{N}\) where \(\sigma^2 = \text{Var}(g(X))\)
- Standard Error: \(\text{SE}(\hat{\theta}_N) = \sigma/\sqrt{N}\)
- Mean Squared Error: \(\text{MSE}(\hat{\theta}_N) = \sigma^2/N\), which equals the variance since the estimator is unbiased
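A minimal simulation sketch checking the first two of these properties, assuming \(X \sim \text{Exponential}(1)\) and \(g(x) = x^2\), so that \(\theta = \mathbb{E}[X^2] = 2\) and \(\sigma^2 = \text{Var}(X^2) = 20\); the distribution and the choice of \(g\) are illustrative assumptions.

```python
import numpy as np

# Empirical check of unbiasedness and of SE = sigma / sqrt(N) for the sample
# mean estimator (A.1). Assumed setup: X ~ Exponential(1), g(x) = x^2, so
# theta = E[X^2] = 2 and sigma^2 = E[X^4] - (E[X^2])^2 = 24 - 4 = 20.
rng = np.random.default_rng(0)
N, replications = 1_000, 2_000

estimates = np.array([
    np.mean(rng.exponential(scale=1.0, size=N) ** 2)
    for _ in range(replications)
])

print(estimates.mean())       # ≈ 2.0, consistent with unbiasedness
print(estimates.std(ddof=1))  # ≈ sqrt(20 / N) ≈ 0.141, i.e. sigma / sqrt(N)
```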
For any estimator \(\hat{\theta}\) of parameter \(\theta\), we define:
- Bias: \(\text{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta\)
- Mean Squared Error: \(\text{MSE}(\hat{\theta}) = \mathbb{E}[(\hat{\theta} - \theta)^2] = \text{Var}(\hat{\theta}) + \text{Bias}^2(\hat{\theta})\)
An estimator sequence \(\{\hat{\theta}_N\}\) is consistent if \(\hat{\theta}_N \xrightarrow{P} \theta\) as \(N \to \infty\), and strongly consistent if \(\hat{\theta}_N \xrightarrow{a.s.} \theta\) as \(N \to \infty\).
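The variance-bias decomposition of the MSE can be verified numerically. The sketch below uses a deliberately biased estimator, the uncorrected sample variance with divisor \(N\), as an assumed example; the sample size and distribution are illustrative choices.

```python
import numpy as np

# Numerical check of MSE = Var + Bias^2 using a deliberately biased estimator:
# the uncorrected sample variance (divisor N) of N(0, 1) data, whose target is
# theta = sigma^2 = 1 and whose bias is -sigma^2 / N. N and the distribution
# are assumed for illustration.
rng = np.random.default_rng(1)
theta, N, replications = 1.0, 5, 100_000

estimates = np.array([
    np.var(rng.standard_normal(N))   # np.var uses divisor N by default
    for _ in range(replications)
])

bias = estimates.mean() - theta                   # ≈ -1/N = -0.2
mse = np.mean((estimates - theta) ** 2)
print(mse, estimates.var() + bias**2)             # the two values agree
```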
A.3 Fundamental Limit Theorems
The theoretical foundation of Monte Carlo methods rests on two cornerstone results from probability theory: the Strong Law of Large Numbers and the Central Limit Theorem.
Strong Law of Large Numbers (SLLN). Let \(X_1, X_2, \ldots\) be i.i.d. random variables with \(\mathbb{E}[|X_i|] < \infty\) and \(\mathbb{E}[X_i] = \mu\). Then: \[\frac{1}{N}\sum_{i=1}^{N} X_i \xrightarrow{a.s.} \mu \quad \text{as } N \to \infty\]
Central Limit Theorem (CLT). Let \(X_1, X_2, \ldots\) be i.i.d. random variables with \(\mathbb{E}[X_i] = \mu\) and \(0 < \text{Var}(X_i) = \sigma^2 < \infty\). Then: \[\frac{\sqrt{N}(\bar{X}_N - \mu)}{\sigma} \xrightarrow{d} \mathcal{N}(0,1) \quad \text{as } N \to \infty\] where \(\bar{X}_N = \frac{1}{N}\sum_{i=1}^{N} X_i\) and \(\xrightarrow{d}\) denotes convergence in distribution.
Convergence notation: \(\xrightarrow{P}\) denotes convergence in probability, \(\xrightarrow{a.s.}\) denotes almost sure convergence, and \(\xrightarrow{d}\) denotes convergence in distribution.
A.3.1 Monte Carlo Implications
For our estimator \(\hat{\theta}_N = \frac{1}{N}\sum_{i=1}^{N} g(X_i)\) where \(\theta = \mathbb{E}[g(X)]\):
- Consistency (from SLLN): \(\hat{\theta}_N \xrightarrow{a.s.} \theta\)
- Asymptotic normality (from CLT): \(\sqrt{N}(\hat{\theta}_N - \theta) \xrightarrow{d} \mathcal{N}(0, \sigma^2)\) where \(\sigma^2 = \text{Var}(g(X))\)
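Both implications can be illustrated by simulation. The sketch below assumes \(X \sim \text{Uniform}(0,1)\) and \(g(x) = x^2\), so \(\theta = 1/3\) and \(\sigma^2 = 4/45\); these choices are purely illustrative.

```python
import numpy as np

# Simulation sketch of consistency and asymptotic normality for
# theta = E[g(X)] with X ~ Uniform(0, 1) and g(x) = x^2, so theta = 1/3 and
# sigma^2 = Var(X^2) = 1/5 - 1/9 = 4/45. These choices are illustrative only.
rng = np.random.default_rng(2)
theta, sigma = 1 / 3, np.sqrt(4 / 45)

# Consistency: the estimate approaches theta as N grows.
for N in (10**2, 10**4, 10**6):
    print(N, np.mean(rng.uniform(size=N) ** 2))

# Asymptotic normality: sqrt(N) * (theta_hat - theta) / sigma is close to N(0, 1).
N, replications = 1_000, 5_000
z = np.array([
    np.sqrt(N) * (np.mean(rng.uniform(size=N) ** 2) - theta) / sigma
    for _ in range(replications)
])
print(z.mean(), z.std(), np.mean(np.abs(z) < 1.96))  # ≈ 0, ≈ 1, ≈ 0.95
```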
A.4 Confidence Intervals and Convergence Analysis
Let \(S_N^2 = \frac{1}{N-1}\sum_{i=1}^{N}(g(X_i) - \hat{\theta}_N)^2\) be the sample variance. By the CLT, combined with the fact that \(S_N \to \sigma\) almost surely (so that Slutsky's theorem applies): \[\frac{\hat{\theta}_N - \theta}{S_N/\sqrt{N}} \xrightarrow{d} \mathcal{N}(0,1)\]
An asymptotic \((1-\alpha)\)-level confidence interval is: \[\left[ \hat{\theta}_N - z_{1-\alpha/2} \frac{S_N}{\sqrt{N}}, \quad \hat{\theta}_N + z_{1-\alpha/2} \frac{S_N}{\sqrt{N}} \right]\] where \(z_{1-\alpha/2} = \Phi^{-1}(1-\alpha/2)\) and \(\Phi\) is the standard normal CDF.
The absolute width is \(2z_{1-\alpha/2} \frac{S_N}{\sqrt{N}}\) and the relative width is \(\frac{2z_{1-\alpha/2} S_N}{|\hat{\theta}_N|\sqrt{N}}\).
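A minimal sketch of this interval construction for a single Monte Carlo run, assuming \(X \sim \text{Exponential}(1)\) and \(g(x) = x\) (so \(\theta = 1\)), with \(\alpha = 0.05\) and \(z_{0.975} \approx 1.96\):

```python
import numpy as np

# Asymptotic 95% confidence interval from a single Monte Carlo run.
# Assumed setup: X ~ Exponential(1) and g(x) = x, so the true value is
# theta = 1; z_{0.975} ≈ 1.96.
rng = np.random.default_rng(3)
N = 10_000
g_samples = rng.exponential(scale=1.0, size=N)

theta_hat = g_samples.mean()
S_N = g_samples.std(ddof=1)            # sample standard deviation
half_width = 1.96 * S_N / np.sqrt(N)

print(f"estimate:       {theta_hat:.4f}")
print(f"95% CI:         [{theta_hat - half_width:.4f}, {theta_hat + half_width:.4f}]")
print(f"relative width: {2 * half_width / abs(theta_hat):.4f}")
```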
The sample mean estimator has standard error \(\text{SE}(\hat{\theta}_N) = \sigma/\sqrt{N} = O(N^{-1/2})\), which has several important consequences:
- Square Root Law: Halving the confidence interval width requires four times as many samples
- Dimension Independence: The \(O(N^{-1/2})\) convergence rate is independent of problem dimension for independent samples
- MCMC Caveat: When using MCMC, dependence between samples can effectively reduce the number of independent observations, particularly in high dimensions
Both the Law of Large Numbers and Central Limit Theorem require:
- Independence: Samples must be independent (or satisfy weaker mixing conditions)
- Identical distribution: Samples must be drawn from the same distribution
- Finite moments: Finite mean for LLN, finite variance for CLT
When using MCMC, the independence assumption is violated, requiring analysis of effective sample size.
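To illustrate the effect of dependence, the sketch below uses a stationary AR(1) chain as a stand-in for MCMC output; the autoregressive coefficient, chain length, and the effective-sample-size formula \(N(1-\phi)/(1+\phi)\) (which follows from the AR(1) autocorrelations \(\rho_k = \phi^k\)) are illustrative assumptions rather than output of any particular sampler.

```python
import numpy as np

# Effect of autocorrelation on the sample mean, using a stationary AR(1) chain
# X_t = phi * X_{t-1} + eps_t as a stand-in for MCMC output. With
# autocorrelations rho_k = phi^k, the effective sample size is roughly
# N * (1 - phi) / (1 + phi). phi and N are illustrative assumptions.
rng = np.random.default_rng(4)
phi, N, replications = 0.9, 2_000, 1_000

def ar1_chain(n):
    x = np.empty(n)
    x[0] = rng.standard_normal() / np.sqrt(1 - phi**2)  # start in stationarity
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x

marginal_var = 1 / (1 - phi**2)        # stationary variance of the chain
ess = N * (1 - phi) / (1 + phi)        # ≈ 105 effective samples out of 2000
means = np.array([ar1_chain(N).mean() for _ in range(replications)])
print(means.var(ddof=1), marginal_var / ess)  # both ≈ marginal_var / ESS
```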