Appendix A — Review of Probability Theory

This appendix reviews the mathematical foundations underlying Monte Carlo methods, covering probability spaces, random variables, estimation theory, and the fundamental limit theorems that justify Monte Carlo inference.

A.1 Probability Spaces and Random Variables

A probability space is a triple \((\Omega, \mathcal{F}, P)\) where \(\Omega\) is the sample space, \(\mathcal{F}\) is a \(\sigma\)-algebra on \(\Omega\), and \(P: \mathcal{F} \to [0,1]\) is a probability measure satisfying Kolmogorov’s axioms.

A random variable is a measurable function \(X: \Omega \to \mathbb{R}\) such that \(\{X \leq x\} \in \mathcal{F}\) for all \(x \in \mathbb{R}\). The cumulative distribution function (CDF) of \(X\) is: \[F_X(x) = P(X \leq x) = P(\{\omega \in \Omega : X(\omega) \leq x\})\]

Continuous random variables have a probability density function (PDF) \(f_X(x)\) such that: \[F_X(x) = \int_{-\infty}^{x} f_X(t) \, dt \quad \text{and} \quad f_X(x) = \frac{dF_X(x)}{dx}\]
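As a quick numerical check of this relationship, the sketch below integrates the Exponential(1) density and compares the result with the closed-form CDF \(F_X(x) = 1 - e^{-x}\); the choice of distribution is purely illustrative.

```python
import numpy as np
from scipy.integrate import quad

f = lambda t: np.exp(-t)          # Exponential(1) density (zero for t < 0)
F = lambda x: quad(f, 0, x)[0]    # F_X(x) = integral of f up to x; f vanishes below 0

for x in (0.5, 1.0, 2.0):
    print(F(x), 1 - np.exp(-x))   # numerical integral matches the closed-form CDF
```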

Discrete random variables with support \(\{x_1, x_2, \ldots\}\) satisfy \(P(X = x_i) = p_i\) where \(\sum_{i} p_i = 1\).

Definition (Moments)

For a random variable \(X\):

Expected value: \[\mathbb{E}[X] = \begin{cases} \sum_{i} x_i \cdot P(X = x_i) & \text{if } X \text{ is discrete} \\ \int_{-\infty}^{\infty} x \cdot f_X(x) \, dx & \text{if } X \text{ is continuous} \end{cases}\]

Variance: \[\text{Var}(X) = \mathbb{E}[(X - \mathbb{E}[X])^2] = \mathbb{E}[X^2] - (\mathbb{E}[X])^2\]
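As a concrete instance of the discrete case, the following sketch evaluates both sums for a fair six-sided die (an illustrative example; \(\mathbb{E}[X] = 3.5\) and \(\text{Var}(X) = 35/12 \approx 2.92\)).

```python
import numpy as np

x = np.arange(1, 7)               # support {1, ..., 6}
p = np.full(6, 1 / 6)             # P(X = x_i) = 1/6 for a fair die

mean = np.sum(x * p)              # E[X]   = sum_i x_i * P(X = x_i)
var = np.sum(x**2 * p) - mean**2  # Var(X) = E[X^2] - (E[X])^2

print(mean, var)                  # 3.5, 2.9166...
```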

A.2 Statistical Estimation and the Sample Mean

Consider i.i.d. observations \(X_1, \ldots, X_N\) with common distribution \(F\). We seek to estimate \(\theta = \mathbb{E}[g(X)]\) for some measurable function \(g: \mathbb{R} \to \mathbb{R}\) using the sample mean estimator: \[\hat{\theta}_N = \frac{1}{N} \sum_{i=1}^{N} g(X_i) \tag{A.1}\]
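A minimal sketch of the estimator (A.1), assuming \(X \sim \text{Uniform}(0,1)\) and \(g(x) = x^2\), for which the true value is \(\theta = 1/3\); the function and distribution are chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mean_estimator(g, sampler, N):
    """Estimate theta = E[g(X)] by averaging g over N i.i.d. draws of X."""
    return np.mean(g(sampler(N)))

theta_hat = sample_mean_estimator(g=lambda x: x**2,
                                  sampler=lambda N: rng.uniform(0, 1, N),
                                  N=100_000)
print(theta_hat)   # close to the true value 1/3
```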

Theorem (Properties of the Sample Mean Estimator)

The sample mean estimator \(\hat{\theta}_N\) satisfies:

  1. Unbiasedness: \(\mathbb{E}[\hat{\theta}_N] = \theta\)
  2. Variance: \(\text{Var}(\hat{\theta}_N) = \frac{\sigma^2}{N}\) where \(\sigma^2 = \text{Var}(g(X))\)
  3. Standard Error: \(\text{SE}(\hat{\theta}_N) = \sigma/\sqrt{N}\)
  4. Mean Squared Error: \(\text{MSE}(\hat{\theta}_N) = \sigma^2/N\) (equal to the variance, since the bias is zero)
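Properties 1 and 2 can be checked empirically by replicating the estimator many times and comparing the mean and spread of the replicates with \(\theta\) and \(\sigma^2/N\). The sketch below reuses the uniform/\(x^2\) example, for which \(\sigma^2 = \text{Var}(X^2) = 1/5 - 1/9 = 4/45\).

```python
import numpy as np

rng = np.random.default_rng(1)
N, replications = 1_000, 5_000

# 5,000 independent realizations of theta_hat_N, each built from N samples.
estimates = np.array([np.mean(rng.uniform(0, 1, N) ** 2) for _ in range(replications)])

print(estimates.mean())         # ~ 1/3, illustrating unbiasedness
print(estimates.var(ddof=1))    # ~ sigma^2 / N
print((4 / 45) / N)             # theoretical variance for comparison
```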

For any estimator \(\hat{\theta}\) of parameter \(\theta\), we define:

  • Bias: \(\text{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta\)
  • Mean Squared Error: \(\text{MSE}(\hat{\theta}) = \mathbb{E}[(\hat{\theta} - \theta)^2] = \text{Var}(\hat{\theta}) + \text{Bias}^2(\hat{\theta})\)

An estimator sequence \(\{\hat{\theta}_N\}\) is consistent if \(\hat{\theta}_N \xrightarrow{P} \theta\) as \(N \to \infty\), and strongly consistent if \(\hat{\theta}_N \xrightarrow{a.s.} \theta\) as \(N \to \infty\).

A.3 Fundamental Limit Theorems

The theoretical foundation of Monte Carlo methods rests on two cornerstone results from probability theory.

Theorem (Strong Law of Large Numbers)

Let \(X_1, X_2, \ldots\) be i.i.d. random variables with \(\mathbb{E}[|X_i|] < \infty\) and \(\mathbb{E}[X_i] = \mu\). Then: \[\frac{1}{N}\sum_{i=1}^{N} X_i \xrightarrow{a.s.} \mu \quad \text{as } N \to \infty\]
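A single long stream of draws makes the almost-sure convergence visible: the running mean settles down to \(\mu\). The sketch below uses exponential variates with \(\mu = 2\); the distribution is incidental.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=2.0, size=1_000_000)        # i.i.d. with E[X_i] = 2

running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
for N in (100, 10_000, 1_000_000):
    print(N, running_mean[N - 1])                     # approaches mu = 2 as N grows
```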

Theorem (Central Limit Theorem)

Let \(X_1, X_2, \ldots\) be i.i.d. random variables with \(\mathbb{E}[X_i] = \mu\) and \(0 < \text{Var}(X_i) = \sigma^2 < \infty\). Then: \[\frac{\sqrt{N}(\bar{X}_N - \mu)}{\sigma} \xrightarrow{d} \mathcal{N}(0,1) \quad \text{as } N \to \infty\] where \(\bar{X}_N = \frac{1}{N}\sum_{i=1}^{N} X_i\) and \(\xrightarrow{d}\) denotes convergence in distribution.
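A quick simulation check: generate many independent standardized sample means and verify that roughly 95% of them fall within \(\pm 1.96\), as they would for a standard normal (a sketch, again with exponential variates, for which \(\mu = \sigma = 2\)).

```python
import numpy as np

rng = np.random.default_rng(3)
N, replications = 1_000, 10_000
mu, sigma = 2.0, 2.0                                   # Exponential(scale=2): mean 2, std 2

samples = rng.exponential(scale=2.0, size=(replications, N))
z = np.sqrt(N) * (samples.mean(axis=1) - mu) / sigma   # standardized sample means

print(np.mean(np.abs(z) < 1.96))                       # ~ 0.95, as for N(0, 1)
```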

Convergence notation: \(\xrightarrow{P}\) denotes convergence in probability, \(\xrightarrow{a.s.}\) denotes almost sure convergence, and \(\xrightarrow{d}\) denotes convergence in distribution.

A.3.1 Monte Carlo Implications

For our estimator \(\hat{\theta}_N = \frac{1}{N}\sum_{i=1}^{N} g(X_i)\) where \(\theta = \mathbb{E}[g(X)]\):

  1. Consistency (from SLLN): \(\hat{\theta}_N \xrightarrow{a.s.} \theta\)
  2. Asymptotic normality (from CLT): \(\sqrt{N}(\hat{\theta}_N - \theta) \xrightarrow{d} \mathcal{N}(0, \sigma^2)\) where \(\sigma^2 = \text{Var}(g(X))\)

A.4 Confidence Intervals and Convergence Analysis

Let \(S_N^2 = \frac{1}{N-1}\sum_{i=1}^{N}(g(X_i) - \hat{\theta}_N)^2\) be the sample variance. By the CLT combined with Slutsky's theorem (using that \(S_N \xrightarrow{a.s.} \sigma\)): \[\frac{\hat{\theta}_N - \theta}{S_N/\sqrt{N}} \xrightarrow{d} \mathcal{N}(0,1)\]

An asymptotic \((1-\alpha)\)-level confidence interval is: \[\left[ \hat{\theta}_N - z_{1-\alpha/2} \frac{S_N}{\sqrt{N}}, \quad \hat{\theta}_N + z_{1-\alpha/2} \frac{S_N}{\sqrt{N}} \right]\] where \(z_{1-\alpha/2} = \Phi^{-1}(1-\alpha/2)\) and \(\Phi\) is the standard normal CDF.
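A sketch of this interval at the 95% level for the running uniform/\(x^2\) example; SciPy is used only to evaluate \(\Phi^{-1}(0.975) \approx 1.96\) and could be replaced by that constant.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
alpha, N = 0.05, 100_000

g_x = rng.uniform(0, 1, N) ** 2          # g(X_i) with X ~ Uniform(0, 1), g(x) = x^2
theta_hat = g_x.mean()                   # point estimate of theta = 1/3
s_n = g_x.std(ddof=1)                    # sample standard deviation S_N
z = norm.ppf(1 - alpha / 2)              # z_{1 - alpha/2}

half_width = z * s_n / np.sqrt(N)
print(theta_hat - half_width, theta_hat + half_width)  # covers 1/3 in ~95% of runs
```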

The absolute width is \(2z_{1-\alpha/2} \frac{S_N}{\sqrt{N}}\) and the relative width is \(\frac{2z_{1-\alpha/2} S_N}{|\hat{\theta}_N|\sqrt{N}}\).

Remark (Monte Carlo Convergence Properties)

The sample mean estimator has standard error \(\text{SE}(\hat{\theta}_N) = \sigma/\sqrt{N} = O(N^{-1/2})\), leading to several key properties:

  1. Square Root Law: To halve the confidence interval width requires four times as many samples (see the numerical check after this list)
  2. Dimension Independence: The \(O(N^{-1/2})\) convergence rate is independent of problem dimension for independent samples
  3. MCMC Caveat: When using MCMC, dependence between samples can effectively reduce the number of independent observations, particularly in high dimensions
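The square-root law can be verified directly: quadrupling \(N\) should roughly halve the interval width, as in the sketch below (same uniform/\(x^2\) example as above).

```python
import numpy as np

rng = np.random.default_rng(5)

def ci_width(N, z=1.96):
    """Width of the asymptotic 95% CI for E[X^2], X ~ Uniform(0, 1)."""
    g_x = rng.uniform(0, 1, N) ** 2
    return 2 * z * g_x.std(ddof=1) / np.sqrt(N)

w_small, w_large = ci_width(10_000), ci_width(40_000)
print(w_small, w_large, w_small / w_large)   # ratio close to 2: 4x the samples, half the width
```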

Conditions for Limit Theorems

Both the Law of Large Numbers and Central Limit Theorem require:

  • Independence: samples must be independent (or satisfy weaker mixing conditions)
  • Identical distribution: samples must come from the same distribution
  • Finite moments: a finite mean suffices for the LLN; the CLT additionally requires a finite variance

When using MCMC, the independence assumption is violated, requiring analysis of effective sample size.
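One common heuristic estimates the effective sample size as \(N / (1 + 2\sum_k \rho_k)\), where \(\rho_k\) is the lag-\(k\) autocorrelation of \(g(X_i)\); the truncation rule in the sketch below (stop at the first non-positive autocorrelation) is a simplification of the estimators used in practice.

```python
import numpy as np

def effective_sample_size(x):
    """Rough ESS estimate: N / (1 + 2 * sum of leading positive autocorrelations)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    x = x - x.mean()
    # Lag-k autocorrelations, normalized so that acf[0] = 1.
    acf = np.correlate(x, x, mode="full")[n - 1:] / (np.arange(n, 0, -1) * x.var())
    tau = 1.0
    for k in range(1, n):
        if acf[k] <= 0:                  # simple truncation at the first non-positive lag
            break
        tau += 2.0 * acf[k]
    return n / tau

# Illustration: an AR(1) chain with autocorrelation 0.9 has ESS far below N
# (theoretically about N * (1 - 0.9) / (1 + 0.9) = N / 19).
rng = np.random.default_rng(6)
n = 50_000
chain = np.empty(n)
chain[0] = 0.0
for t in range(1, n):
    chain[t] = 0.9 * chain[t - 1] + rng.normal()
print(effective_sample_size(chain))
```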