3. Monte Carlo #
Monte Carlo methods are a subclass of numerical integration techniques, that is, techniques that approximate the value of an integral by a finite sum over sampled points,
$$
\int f(x)\,\text{d}x \approx \sum_{i = 1}^{N} \alpha_i f(x_i), \tag{3.1}
$$
where $f : \mathbb{R}^d \to \mathbb{R}$ denotes the real-valued integrand, and $\alpha_i \in \mathbb{R}$ and $x_i \in \mathbb{R}^d$ are called weights and integration points, respectively. Evidently, the approximation improves as more points are used.
Textbook deterministic integration schemes, such as Simpson's rule or Gauss quadrature, place their integration points at fixed positions in the domain of $f$ (equidistantly, in the case of Simpson's rule). When, however, the dimensionality $d$ of the domain grows, the number of integration points must increase exponentially to maintain the same approximation error. As a result, computations quickly become infeasible. These methods are therefore said to suffer from the curse of dimensionality.
Monte Carlo methods instead sample integration points randomly from $\text{dom}(f)$, and the approximation error decreases with the square root of the number of integration points, i.e., as $\mathcal O(1 / \sqrt{N})$. Although the constant hidden in the big-$\mathcal O$ may still depend strongly on $d$, the rate itself is independent of the dimensionality. We discuss two Monte Carlo methods below.
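To see this rate in practice, consider the following minimal Python sketch (the helper `mc_integrate`, the integrand $f(x) = \sum_i x_i^2$, and the choices of $d$ and $N$ are illustrative, not from the text). It previews the simple-sampling method of section 3.2 by averaging uniform samples over the unit hypercube:

```python
import random

def mc_integrate(f, d, n, seed=0):
    """Estimate the integral of f over the unit hypercube [0, 1]^d
    using n uniformly sampled integration points (all weights 1/n)."""
    rng = random.Random(seed)
    return sum(f([rng.random() for _ in range(d)]) for _ in range(n)) / n

# Illustrative integrand: f(x) = sum_i x_i^2 integrates to d/3 over [0, 1]^d.
d = 5
f = lambda x: sum(xi * xi for xi in x)
for n in (10**2, 10**4, 10**6):
    print(f"N = {n:>7}: |error| = {abs(mc_integrate(f, d, n) - d / 3):.5f}")
```

Each hundredfold increase in $N$ should cut the error by roughly a factor of ten, in line with the $1/\sqrt{N}$ rate (up to statistical fluctuations).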
3.1 Hit-or-miss Monte Carlo #
3.1.1 Method #
Consider a non-negative integrand $f : \mathbb{R} \to [0, \infty)$. This method is similar to rejection sampling. Enclose the graph of $f$ over the integration interval in the box $X \times Y := [a, b] \times [0, c]$, where $c$ is chosen such that $0 \leq f(x) \leq c$ for all $a \leq x \leq b$. The region of integration is
$$
A := \left\{(x, y) \in X \times Y \, | \, y \leq f(x) \right\},
$$
and the box itself is $R := \{(x, y) \in X \times Y\}$. Let $B$ be the indicator of the event that a point sampled uniformly from $R$ also lies in $A$. Then
$$
\mathbb{E}[B] = \frac{\text{area}\, A}{\text{area}\, R} = \frac{\int_a^b f(x)\,\text{d}x}{c(b-a)}.
$$
Sample $N$ points $(x_i, y_i)$ uniformly in $R$. Denote by $N_A$ the number of hits, i.e., points for which $y_i \leq f(x_i)$. Then $\mathbb{E}[B]$ is approximated by the fraction of samples that hit $A$, i.e., $N_A / N$, and as a consequence
$$
\int_a^b f(x)\,\text{d}x \approx \frac{N_A}{N}c(b-a).
$$
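A minimal Python sketch of the method follows; the test integrand $\sin$ on $[0, \pi]$ (with bound $c = 1$ and exact value $2$) is an illustrative choice, not part of the text:

```python
import math
import random

def hit_or_miss(f, a, b, c, n, seed=0):
    """Hit-or-miss estimate of the integral of f over [a, b],
    assuming 0 <= f(x) <= c on [a, b]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = rng.uniform(a, b)
        y = rng.uniform(0.0, c)
        if y <= f(x):  # the point (x, y) lands in A
            hits += 1
    return hits / n * c * (b - a)

# Example: the integral of sin over [0, pi] is exactly 2.
print(hit_or_miss(math.sin, 0.0, math.pi, 1.0, 100_000))
```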
3.1.2 Error #
More precisely, $B$ is a Bernoulli random variable that is $1$ whenever a point falls in $A$, and $0$ otherwise. For $N$ samples $b_i \sim B$, the sample mean is
$$
\bar{B}_N := \frac{1}{N} \sum_{i=1}^N b_i = \frac{N_A}{N}.
$$
The error of the integral is determined by the error of the sample mean. The latter can be bounded with the help of the standard error of the sample mean,
$$
\mathbb{E}\left[\left|\bar{B}_N - \mathbb{E}[B]\right|\right] \leq \frac{1}{\sqrt{N}} \sqrt{\mathbb{V}[B]} \leq \frac{1}{2\sqrt{N}},
$$
where the last inequality uses that the variance of a Bernoulli variable, $p(1-p)$, is at most $1/4$.
If the area of $A$ is known, e.g., in a theoretical exercise, $\mathbb{V}[B]$ can simply be computed as
$$
\mathbb{V}[B]
= \frac{\text{area}\,A}{\text{area}\,R} - \left(\frac{\text{area}\,A}{\text{area}\,R}\right)^2.
$$
If that is not the case, however, one can instead use the variance of the samples themselves: $\mathbb{V}[B]$ can be approximated by the unbiased sample variance $S_N^2$,
$$
\mathbb{V}[B] \approx S_N^2
= \frac{1}{N-1} \sum_{i=1}^N\left(b_i - \bar{B}_N\right)^2
= \frac{N_A}{N - 1}\left(1 - \frac{N_A}{N}\right).
$$
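Extending the earlier sketch, the estimate and its standard error can be computed together; the factor $c(b-a)$ carries the error of $\bar{B}_N$ over to the integral. The test integrand is again an illustrative choice:

```python
import math
import random

def hit_or_miss_error(f, a, b, c, n, seed=0):
    """Hit-or-miss estimate of the integral of f over [a, b],
    together with its standard error from the unbiased sample variance."""
    rng = random.Random(seed)
    n_a = 0
    for _ in range(n):
        x = rng.uniform(a, b)
        y = rng.uniform(0.0, c)
        if y <= f(x):
            n_a += 1
    area = c * (b - a)                     # area of the box R
    var_b = n_a / (n - 1) * (1 - n_a / n)  # V[B] ~ N_A/(N-1) * (1 - N_A/N)
    return n_a / n * area, math.sqrt(var_b / n) * area

est, err = hit_or_miss_error(math.sin, 0.0, math.pi, 1.0, 100_000)
print(f"{est:.4f} +/- {err:.4f}  (exact value: 2)")
```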
3.2 Simple-sampling Monte Carlo #
3.2.1 Method #
Simple-sampling Monte Carlo approximates the integral by sampling uniformly from the domain of $f$ and averaging the function values at these integration points. Let $x_i \sim X$ be realizations of the uniform random variable $X$ on $[a, b]$. The expectation of the function values, $\mathbb{E}[f(X)]$, is therefore
$$
\mathbb{E}\left[f(X)\right] = \int_a^b \frac{f(x)}{b - a}\,\text{d}x = \frac{1}{b-a} \int_a^b f(x)\,\text{d}x. \tag{3.2}
$$
The first equality follows from the definition of the expected value: a random variable $Q$ with pdf $q$ has $\mathbb{E}[f(Q)] = \int f(x)\,q(x)\,\text{d}x$, and the uniform pdf on $[a, b]$ is $q(x) = 1/(b-a)$.
Rearranging eq. (3.2) gives an expression for the desired integral value:
$$
\int_a^b f(x)\,\text{d}x = (b - a)\mathbb{E}[f(X)]. \tag{3.3}
$$
An estimator of the expectation in eq. (3.2) is obtained via the law of large numbers: all one needs to do is “simply sample” from $X$ and average the function values,
$$
\overline{f(X)}_N := \frac{1}{N} \sum_{i=1}^N f(x_i). \tag{3.4}
$$
An approximation to the integral follows from eqs. (3.3) and (3.4):
$$
\int_a^b f(x)\,\text{d}x \approx \frac{b-a}{N} \sum_{i=1}^N f(x_i).
$$
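A minimal Python sketch of simple sampling, reusing the illustrative test integral $\int_0^\pi \sin(x)\,\text{d}x = 2$:

```python
import math
import random

def simple_sampling(f, a, b, n, seed=0):
    """Simple-sampling estimate: (b - a) times the average of f
    at n points drawn uniformly from [a, b]."""
    rng = random.Random(seed)
    return (b - a) / n * sum(f(rng.uniform(a, b)) for _ in range(n))

# Same test integral as before: sin over [0, pi] equals 2.
print(simple_sampling(math.sin, 0.0, math.pi, 100_000))
```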
3.2.2 Error #
The sample mean is given by eq. (3.4). Using the standard error of the sample mean, the error is bounded by
$$
\mathbb{E}\left[\left|\overline{f(X)}_N - \mathbb{E}[f(X)]\right|\right] \leq \sqrt{\frac{1}{N} \mathbb{V}[f(X)]}.
$$
However, as $\mathbb{V}[f(X)]$ is generally unknown, we can only conclude that the error decreases asymptotically as $\mathcal O(1 / \sqrt{N})$, provided that the function values have bounded variance.
It is again possible to estimate $\mathbb{V}[f(X)]$ from the calculated function values themselves, using the unbiased sample variance:
$$
\mathbb{V}[f(X)] \approx S_N^2 = \frac{1}{N-1} \sum_{i=1}^N \left(f(x_i) - \overline{f(X)}_N\right)^2.
$$
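As a final sketch, the simple-sampling estimate together with its estimated standard error $(b-a)\sqrt{S_N^2/N}$; the test integrand remains an illustrative choice:

```python
import math
import random

def simple_sampling_error(f, a, b, n, seed=0):
    """Simple-sampling estimate of the integral of f over [a, b], together
    with its standard error, estimating V[f(X)] by the sample variance."""
    rng = random.Random(seed)
    values = [f(rng.uniform(a, b)) for _ in range(n)]
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # unbiased S_N^2
    return (b - a) * mean, (b - a) * math.sqrt(var / n)

est, err = simple_sampling_error(math.sin, 0.0, math.pi, 100_000)
print(f"{est:.4f} +/- {err:.4f}  (exact value: 2)")
```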