<iframe width="100%" height="315" src="https://www.youtube-nocookie.com/embed/a6NqGjkCkk0" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
## Introduction
The Monte Carlo method is a very powerful tool of statistical physics. Monte Carlo methods are as useful as they are widespread. For example, one can compute [molecular dynamics using Monte Carlo methods](https://en.wikipedia.org/wiki/Monte_Carlo_molecular_modeling). There's a reason the method is named after Monaco's famous casino: it utilises probability and randomness. In most cases, a system is evolved to a new state which is chosen from a randomly generated ensemble of possible future states. Then, using some criterion, this new state is accepted or rejected with a certain probability. This can be used in many different areas of science, with end goals ranging from reaching a Bose-Einstein ground state, to minimizing the risk of an investment portfolio, to [optimizing the boarding process of an airplane](https://arxiv.org/abs/0802.0733?). Considering the breadth of applications, we choose to center this second project on Monte Carlo methods.
## Monte Carlo integration
While there are multiple categories of Monte Carlo methods, we will focus on Monte Carlo integration. To see the advantage of this technique, consider a system that is described by a Hamiltonian $H(R)$ depending on $R$, the set of all system degrees of freedom. This might include terms like a magnetic field $B$, a potential $V$, etc. We're interested in a specific observable of this system called $A(R)$. Specifically, we'd like to know its expectation value $\langle A\rangle$. From statistical physics, all system state likelihoods can be summarized in the partition function:
$$Z=\int e^{-\beta H(R)}dR$$
where $\beta=1/(k_BT)$ and the integral runs over all system degrees of freedom. The Boltzmann factor $e^{-\beta H(R)}$ weighs the probability of each state. The expression for the expectation value can then be written as:
$$\langle A\rangle = 1/Z \int e^{-\beta H(R)} A(R)dR$$
For most systems, $R$ is a collection of many parameters, making this a high-dimensional integral for which an analytic solution is often impossible. A numerical approach is therefore required to compute the expectation value. In the next section, we will demonstrate the purpose of sampling an integral and converting it into a sum, which is easier for computers to evaluate. A bit later, you will see why Monte Carlo integration becomes beneficial quite quickly.
## A simple example
Take a general, one-dimensional integral $I=\int_a^bf(x)dx$. We can rewrite this integral into a summation as follows:
$$\int_a^bf(x)dx = (b-a)\int_a^b \frac{1}{b-a} f(x)dx= \lim_{N \rightarrow \infty} \frac{b-a}{N} \sum_i^N f(x_i)$$
where $x_i = a + i\,\frac{b-a}{N}$. One could say the $\{x_i\}$ are distributed uniformly in the interval $[a,b]$.
The limit is only needed for the integral and summation to be exactly the same. Instead of placing the $x_i$ on a regular grid, we could also pick them randomly. Doing so results in a Monte Carlo integral:
$$I=\int p(x)f(x)dx \approx \frac{1}{N}\sum_i f(x_i)$$
Now, the $x_i$ are randomly drawn from $p(x)$. In other words: we are sampling the function $f(x)$ with values from $p(x)$. This way, the result of the integral can be constructed from the finite summation. In the previous example, the $x_i$ weren't randomly drawn but rather evenly distributed. Drawing the $x_i$ from $p(x)$ and computing the sum results in an error $\sim \frac{1}{\sqrt{N}}$.
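To see this concretely, here is a minimal sketch in Python comparing the two estimates. The integrand $f(x) = e^{-x^2}$ and all numerical values are choices made purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

def f(x):
    return np.exp(-x**2)  # illustrative integrand, chosen for this example

a, b, N = 0.0, 1.0, 10_000

# Grid-based estimate: x_i = a + i*(b-a)/N, as in the first sum above
x_grid = a + np.arange(1, N + 1) * (b - a) / N
grid_estimate = (b - a) / N * np.sum(f(x_grid))

# Monte Carlo estimate: x_i drawn from the uniform density p(x) = 1/(b-a)
x_rand = rng.uniform(a, b, size=N)
mc_estimate = (b - a) / N * np.sum(f(x_rand))

print(grid_estimate, mc_estimate)  # both approach ~0.7468 for large N
```

Repeating the random estimate many times and looking at the spread of the results reproduces the $\sim \frac{1}{\sqrt{N}}$ error scaling.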
$$\langle A\rangle = \frac{\int P(R)W(R) A(R)dR}{\int P(R)W(R)dR} \approx \frac{\sum_i W(R_i)A(R_i)}{\sum_i W(R_i)}$$
This is the Monte Carlo integral, where the $R_i$ are now sampled from $P(R)$. Already, we can note a number of things:
- It is ideal to have $P(R) = e^{-\beta H(R)}$. This means you're only sampling in the areas where the observable $A(R)$ actually contributes to $\langle A\rangle$.
- Some weights $W(R_i)$ might dominate and skew the resulting $\langle A\rangle$.
- In high dimensions, the overlap of $P(R)$ and $e^{-\beta H(R)}$ vanishes: a slight misalignment between the two in each of the $N$ dimensions compounds into poor overall overlap.
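To make the weighted estimate concrete, here is a minimal sketch with a one-dimensional toy system. The "Hamiltonian" $H(R)=R^2$, the observable $A(R)=R^2$, the standard normal choice for $P(R)$, and the weight $W(R) = e^{-\beta H(R)}/P(R)$ (which makes $P(R)W(R)$ reproduce the Boltzmann factor in the integrals above) are all assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
beta, N = 1.0, 100_000

H = lambda R: R**2  # toy one-dimensional "Hamiltonian" (assumption)
A = lambda R: R**2  # toy observable (assumption)

# Draw the R_i from a distribution P(R) we *can* sample: a standard normal
R = rng.normal(0.0, 1.0, size=N)
P = np.exp(-R**2 / 2) / np.sqrt(2 * np.pi)  # density of the samples

# Weight each sample with W(R_i) = e^{-beta*H(R_i)} / P(R_i)
W = np.exp(-beta * H(R)) / P

estimate = np.sum(W * A(R)) / np.sum(W)
print(estimate)  # exact value here is 1/(2*beta) = 0.5
```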
This leaves us with a practical question: how do we sample from $P(R)$? The answer lies in using Markov chains.
## Metropolis Monte Carlo
So what is a Markov chain? Consider a sequence of random states $\{X_i\} = X_1, X_2, ..., X_N$. The probability of choosing a new random state $X_{i+1}$ depends only on $X_i$, not on any earlier states. One can transition from one state to another with a certain probability, which we denote as $T(X \rightarrow X^\prime)$. This rate is normalised: $\sum_{X^\prime}T(X \rightarrow X^\prime)=1$. This simply means: you have to go to some state.
We can then write a rate equation for the probability $P(X, i+1)$ of finding the system in state $X$ at step $i+1$:
$$P(X, i+1) = P(X, i) - \sum_{X^\prime}P(X, i)T(X \rightarrow X^\prime) + \sum_{X^\prime}P(X^\prime, i)T(X^\prime \rightarrow X)$$
Put in words: the probability of being in state $X$ at step $i+1$ is the probability of being there at step $i$, minus the probability of leaving $X$ for any other state $X^\prime$, plus the probability of entering $X$ from any other state $X^\prime$.
But we want this Markov chain to eventually sample a probability distribution, like the Boltzmann distribution. This means that we want the probability to be stationary, such that we always sample the same distribution:
$$P(X, i+1)=P(X, i)=P(X)$$
This simplifies the rate equation to:
$$\sum_{X^\prime}P(X)T(X \rightarrow X^\prime) = \sum_{X^\prime}P(X^\prime)T(X^\prime \rightarrow X)$$
In turn, this is then solved by the condition:
$$P(X)T(X \rightarrow X^\prime) = P(X^\prime)T(X^\prime \rightarrow X)$$
This condition is very important: it is called **"detailed balance"**.
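As a quick sanity check, consider a minimal two-state example (the numbers are arbitrary illustrative choices): a transition matrix that obeys detailed balance leaves the target distribution stationary.

```python
import numpy as np

# Target distribution for a two-state system (values chosen arbitrarily)
P = np.array([0.25, 0.75])

# Transition probabilities T[i, j] = T(i -> j); each row is normalised
T = np.array([[0.4, 0.6],
              [0.2, 0.8]])

# Detailed balance: P(X) T(X -> X') == P(X') T(X' -> X)
assert np.isclose(P[0] * T[0, 1], P[1] * T[1, 0])

# The rate equation then leaves P unchanged: applying T returns P itself
print(P @ T)  # -> [0.25 0.75]
```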
We can generate a desired probability distribution $P(X)$, including the Boltzmann distribution, by choosing the right transition probabilities $T(X \rightarrow X^\prime)$. This leads to the question: how do we generate the $T$ terms to obtain the Boltzmann distribution?
The Metropolis algorithm uses the following ansatz: $T(X \rightarrow X^\prime) = \omega_{XX^\prime} \cdot A_{XX^\prime}$. Here, $\omega_{XX^\prime}$ is the probability of generating the new state $X^\prime$ from $X$; this is called 'proposing a trial move'. Then, $A_{XX^\prime}$ is the probability of accepting this proposed trial move.
Put into words, we say the probability of going to a new state ($T(X \rightarrow X^\prime)$) is the product of the probabilities of proposing it ($\omega_{XX^\prime}$) and accepting it ($A_{XX^\prime}$).
We usually demand that $\omega_{XX^\prime}=\omega_{X^\prime X}$, which means that our detailed balance equation reduces to:
$$\frac{A_{X^\prime X}}{A_{XX^\prime}} = \frac{P(X)}{P(X^\prime)}$$
There are multiple solutions that fulfil this condition. The solution given by Metropolis is:
$$A_{XX^\prime}=\begin{cases}&1 &\text{if} \ P(X^\prime) > P(X)\\ \\ &\frac{P(X^\prime)}{P(X)} &\text{if} \ P(X^\prime) < P(X)\end{cases}$$
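Note that for the Boltzmann distribution, $P(X) \propto e^{-\beta H(X)}$, the acceptance ratio becomes
$$\frac{P(X^\prime)}{P(X)} = e^{-\beta (H(X^\prime) - H(X))} = e^{-\beta \Delta E},$$
so the unknown partition function $Z$ cancels and never needs to be computed. Moves that lower the energy are always accepted, while moves that raise it by $\Delta E$ are accepted with probability $e^{-\beta \Delta E}$.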
Below, we present pseudocode for performing the Metropolis algorithm.
## Summary of Metropolis algorithm
1. Start with a state $X_i$
2. Propose a trial state $X^\prime$ with probability $\omega_{X_iX^\prime}$
3. Accept the trial move with probability $A_{X_iX^\prime}$: if accepted, set $X_{i+1}=X^\prime$; if rejected, set $X_{i+1}=X_i$
4. Repeat from step 2 until enough samples have been collected
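As a concrete sketch of these steps, here is a minimal Python implementation. The harmonic "Hamiltonian" $H(x)=x^2$, the symmetric uniform trial move, and all parameter values are illustrative assumptions, not prescribed by the algorithm itself:

```python
import numpy as np

rng = np.random.default_rng(1)

def metropolis(H, x0, beta, n_steps, step_size=0.5):
    """Sample states with probability proportional to exp(-beta * H(x))."""
    x = x0
    samples = []
    for _ in range(n_steps):
        # Steps 1-2: propose a trial move; a symmetric step keeps
        # omega_{XX'} = omega_{X'X}
        x_trial = x + rng.uniform(-step_size, step_size)
        # Step 3: accept with probability min(1, e^{-beta*(H(x') - H(x))})
        if rng.random() < np.exp(-beta * (H(x_trial) - H(x))):
            x = x_trial
        # Step 4: a rejected move repeats the current state in the chain
        samples.append(x)
    return np.array(samples)

# Toy example: H(x) = x^2, so the chain samples P(x) proportional to e^{-beta*x^2}
samples = metropolis(lambda x: x**2, x0=0.0, beta=1.0, n_steps=50_000)
print(samples.var())  # should approach <x^2> = 1/(2*beta) = 0.5
```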