Lesson 1: Mathematical Fundamentals

Author

Evan Walter Clark Spotte-Smith

(home)

Mathematical Fundamentals

As the name implies, statistical mechanics (statmech) requires us to use techniques from the mathematical branches of probability and statistics to reason about large systems (or collections of systems, called ensembles) that can exist in many different states.

In this lesson, we’ll go through some basic mathematical concepts that will be useful to you in your statmech journey.

Code
import Pkg
Pkg.activate("../../../")

using InteractiveUtils
using DataStructures # For common data structures; see https://juliacollections.github.io/DataStructures.jl/latest/
using StatsBase # For basic statistics; see https://juliastats.org/StatsBase.jl/stable/
using Distributions # Also for stats purposes; see https://juliastats.org/Distributions.jl/stable/
using Random # For (pseudo)random number generation; see https://docs.julialang.org/en/v1/stdlib/Random/
using Plots # For data visualization; see https://docs.juliaplots.org/stable/
using LaTeXStrings # For string formatting; see https://github.com/JuliaStrings/LaTeXStrings.jl

Limits and Series

Source material: McQuarrie & Simon, “MathChapter C: Series and Limits”, in Molecular Thermodynamics, 1999.

You’ve probably encountered limits before, for instance in your calculus classes. Simply put, a limit is the value that some function or numerical sequence approaches as its input (or argument) gets closer and closer to some value. Note the use of the word “approaches”. When evaluating limits, we are not looking at what happens at some input value, but only what happens as we get closer and closer to it. This is important, because often we’re interested in limits related to quantities that we can’t reach, like infinity (here denoted \(\infty\)).

As an example, we define the derivative of a function \(f(x)\) using limits:

\[ \frac{df}{dx}(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \]

The expression on the right describes how the value of \(f\) changes as \(x\) gets closer and closer to \(a\) (but never exactly reaching \(a\), as this would lead to division by 0); in other words, it is the local rate of change of the function at \(a\).

Evaluating limits is often straightforward. For instance, the limit

\[ \lim_{x \to \infty} \frac{1}{x} \]

is unambiguously 0: the numerator is always 1 while the denominator grows without bound, so the fraction vanishes.

In some cases, though, a limit cannot be directly evaluated. Take, for example

\[ \lim_{x \to 0} \frac{\text{sin}(x)}{x} \]

or

\[ \lim_{x \to 0} x~\text{sin}(1/x) \]

In the first case, we know that \(\text{sin}(0)\) is 0, but \(x\) is also 0 at that point, so, at first glance, our limit seems to take the form \(\frac{0}{0}\), which we can’t evaluate. The second case is even more confusing. We have \(\frac{1}{x}\) blowing up (towards \(\pm \infty\)) as \(x \to 0\), which means \(\text{sin}(1/x)\) goes to… nothing in particular (since \(\text{sin}\) oscillates, it never converges to any one value as its argument continues to grow in magnitude). Of course, the \(x\) term converges to 0, so we have 0 times some undefined quantity.

Luckily, there are some tricks that you can use to evaluate these unruly limits.

The adorably named Squeeze Theorem says:

Let \(I\) be an interval containing the point \(a\). Let \(g\), \(f\), and \(h\) be functions defined on \(I\), except possibly at \(a\) itself. Suppose that for every \(x\) in \(I\) not equal to \(a\), we have \(g(x) ≤ f(x) ≤ h(x)\) and also suppose that \(\lim_{x \to a} g(x) = \lim_{x \to a} h(x) = L\). Then \(\lim_{x \to a} f(x) = L\). (Wikipedia)

Basically, if you can find two functions \(g\) and \(h\) that bound the function of interest, \(f\), around your point of interest \(a\) (but not necessarily at \(a\); yay, limits!), and if you know that the upper and lower bound functions converge to the same value as they approach \(a\), then that tells you that the limit of \(f\) is the limit of \(g\) and \(h\). Picking appropriate bounding functions is tricky and might require some creativity, but once done, evaluating your limit becomes easy.
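
As a quick worked example (using a limit different from the exercise below), since \(-1 \leq \text{sin}(x) \leq 1\) for all \(x\), we have, for \(x > 0\),

\[ -\frac{1}{x} \leq \frac{\text{sin}(x)}{x} \leq \frac{1}{x} \]

Both bounding functions go to 0 as \(x \to \infty\), so the Squeeze Theorem tells us that \(\lim_{x \to \infty} \frac{\text{sin}(x)}{x} = 0\).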

You may not have seen the Squeeze Theorem in your calculus class, but you’ve probably seen (or at least heard of) L’Hôpital’s Rule, which says:

Let \(f\) and \(g\) be functions that are defined on an interval \(I\) and differentiable over \(I\) (except perhaps at the point of interest \(a\)). If \(\lim_{x \to a} f(x) = 0\) or \(\pm \infty\) and \(\lim_{x \to a} g(x) = 0\) or \(\pm \infty\), and if \(\lim_{x \to a} \frac{f'(x)}{g'(x)}\) exists, where \(f'(x)\) and \(g'(x)\) are the first derivatives of \(f\) and \(g\) with respect to \(x\), respectively, then \(\lim_{x \to a} \frac{f(x)}{g(x)} = \lim_{x \to a} \frac{f'(x)}{g'(x)}\).

L’Hôpital’s Rule is somewhat more narrow than the Squeeze Theorem in the kinds of situations where you can use it, but it’s handy because it does not require you to figure out any new bounding functions. All you need is the ability to derive the different parts of your function!
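
As a quick worked example (again, one that isn’t part of the exercise below), consider

\[ \lim_{x \to \infty} \frac{\text{ln}(x)}{x} \]

Both the numerator and denominator tend to \(\infty\), so L’Hôpital’s Rule applies. Differentiating top and bottom gives

\[ \lim_{x \to \infty} \frac{\text{ln}(x)}{x} = \lim_{x \to \infty} \frac{1/x}{1} = 0 \]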

Try it out yourself! Use the Squeeze Theorem and/or L’Hôpital’s Rule to evaluate the two tricky limits listed above: \(\lim_{x \to 0} \frac{\text{sin}(x)}{x}\) and \(\lim_{x \to 0} x~\text{sin}(1/x)\).

Series are, in essence, infinite sums. Series arise in statmech in a number of different contexts. For instance, we make use of series and their approximations when determining the expected (i.e., average; see Probability and Statistics below) value of some property. Series are also useful because they can be used to approximate many common functions. Below are listed some examples of infinite series representations:

\[ e^x = \sum_{n=0}^\infty \frac{x^n}{n!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + … \]

\[ \frac{1}{1-x} = \sum_{n=0}^\infty x^n = 1 + x + x^2 + x^3 + …, x \in (-1, 1) \]

\[ \text{sin}(x) = \sum_{n=0}^\infty (-1)^{n} \frac{x^{2n + 1}}{(2n + 1)!} = x - \frac{x^3}{3!} + \frac{x^5}{5!} - … \]

\[ \text{cos}(x) = \sum_{n=0}^\infty (-1)^{n} \frac{x^{2n}}{(2n)!} = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - … \]

In practice, we cannot sum infinite series using computers, but by truncating the summation, we can create useful approximations.

To illustrate this, let’s show how a finite approximation of the series expansion of \(e^x\) converges towards the desired function. We’ll display the exact function \(f(x) = e^x\) and the approximation:

\[ e^x \approx \sum_{i=0}^{n} \frac{x^i}{i!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + … + \frac{x^{n}}{n!}, \quad \text{with error}~O\left(\frac{x^{n+1}}{(n+1)!}\right) \]

where \(n\) is some integer that we choose and the \(O(\frac{x^{n+1}}{(n+1)!})\) term indicates that there is some error in our calculation that will be on the order of \(\frac{x^{n+1}}{(n+1)!}\) (once the terms begin to shrink, which for any fixed \(x\) happens for large enough \(n\), the error is dominated by the first term we’ve excluded).

Code
xs = 0.0:0.05:5.0
ys_1 = 1.0 .+ xs  # Truncation at n = 1: 1 + x
ys_3 = zeros(size(xs))
ys_5 = zeros(size(xs))
ys_10 = zeros(size(xs))

# Each truncated sum starts at i = 0, so it includes the constant term x^0 / 0! = 1
for i in 0:3
    ys_3 .+= xs .^ i ./ factorial(i)
end

for i in 0:5
    ys_5 .+= xs .^ i ./ factorial(i)
end

for i in 0:10
    ys_10 .+= xs .^ i ./ factorial(i)
end

plot(
    xs,
    [exp.(xs) ys_1 ys_3 ys_5 ys_10],
    label=["exact" "n = 1" "n = 3" "n = 5" "n = 10"],
    xlabel="x",
    ylabel="exp(x)",
    linewidth=3
)

Notice that, close to \(x = 0\), the approximation is accurate (on the scale of the plot) even at small \(n\). We often take advantage of this limiting behavior to simplify our mathematical expressions. For instance, we might write (for small \(x\))

\[ e^x \approx 1 + x + O(x^2) \]

so we now have a linear function (\(1 + x\)) as opposed to an exponential, and linear functions are often much nicer to work with.

Typically, a series representation of a function is only useful to us if it converges; that is, if, as \(n \rightarrow \infty\) (or, put another way, in the infinite limit), the series approaches a single, well-defined value. We use limits to understand behavior in situations like these, answering the question: “As our input moves closer to some value, what tends to happen to the response (in this case, the value of the sum)?”

Our friends in mathematics have developed a number of ways to prove whether a series converges or diverges. We’ll keep it simple and look at the ratio test:

Take \[ L = \lim_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right|, \]

where \(a_i\) is the \(i\)-th term in the series. If \(L > 1\), that means that, as \(n\) keeps increasing, each term in the series will be larger than the last (at least in absolute terms). This means that the sum will either tend towards \(-\infty\) or \(\infty\) or else will oscillate wildly; the series diverges. On the other hand, if \(L < 1\), that means that, eventually (remember, this is in the infinite limit), the series will reach a point where each term is smaller than the last; in this case, the series will converge. If \(L = 1\), the ratio test is inconclusive; the series could converge or diverge.
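
To see the ratio test in action on a series we’ve already met (the exercise below uses \(\text{sin}(x)\), so let’s use \(e^x\) here instead), note that the terms of the \(e^x\) series are \(a_n = \frac{x^n}{n!}\), so \(\left|\frac{a_{n+1}}{a_n}\right| = \frac{|x|}{n+1} \to 0 < 1\) for any fixed \(x\): the series converges everywhere. Here’s a quick numerical sanity check (the value of \(x\) is chosen arbitrarily):

Code
x = 2.0  # An arbitrary fixed value of x

# Terms a_n = x^n / n! of the series for e^x, and the ratios |a_{n+1} / a_n|
terms = [x^n / factorial(n) for n in 0:14]
ratios = [terms[n + 1] / terms[n] for n in 1:(length(terms) - 1)]

println("Ratios: ", round.(ratios, digits=3))  # These shrink towards 0, so L = 0 < 1
println("Partial sum of 15 terms: ", sum(terms))
println("exp(x):                  ", exp(x))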

Try this out yourself! Use the ratio test to evaluate whether (or under what conditions) the series representation for \(\text{sin}(x)\) (see above) converges. This is Problem C-12 in McQuarrie & Simon.

Probability and Statistics

Source material: McQuarrie & Simon, “MathChapter B: Probability and Statistics”, in Molecular Thermodynamics, 1999.

Simply put, a probability \(p_i\) is the likelihood of some event \(i\) occurring. We represent probabilities as numbers, and our common sense tells us that these numbers have two critical properties:

\[ \text{1.}~p_i \in [0, 1] \]

Probabilities are strictly bounded. They are at least \(0\) (a \(p_i = 0\) means that \(i\) is impossible), and they are at most \(1\) (\(p_i = 1\) means that \(i\) is certain). Having a negative probability or a probability greater than \(1\) is meaningless.

Now, say that, in a given experiment, there are multiple possible outcomes, \(i = 1, 2, \dots\). We know that one of our outcomes has to happen in our experiment; in other words, it is certain that one of our outcomes will occur. This gives us our normalization condition:

\[ \text{2.}~\sum_{i = 1}^{n} p_i = 1 \]

where outcomes are indexed sequentially and there are \(n\) possible outcomes.

Typically, in physical and chemical settings, our events are associated with some measurable property. In statmech, we often want to know: what is the average, or expected, value of that property?

For each event \(i\) with probability \(p_i\), let there be an associated value \(x_i\). We then define the expectation value of \(x\), \(\langle x \rangle\):

\[ \langle x \rangle = \sum_{i = 1}^n x_i p_i \]

Let’s look more closely at this equation, as it will be important in this module! We know that we have some events \(I = \{i | i \in \mathbb{N}, 1 \leq i \leq n\}\). In each experiment, one and only one of these events (some \(i\)) occurs, and when that event occurs, we measure the associated value \(x_i\). We know how likely \(i\) is to occur — that’s \(p_i\) — and we can use that probability to assign the relative importance of the \(x_i\) associated with \(i\) to the expectation value.

Now, let’s look at probability and expectation values in practice. Let’s say that we’re rolling a conventional (six-sided) die, but the die is not balanced. We (somehow) know that, instead of each number 1–6 being equally likely, the probabilities are as follows:

  1. 0.15
  2. 0.15
  3. 0.20
  4. 0.20
  5. 0.10
  6. 0.20

Quick check! Make sure that these probabilities obey the bounding and normalization conditions listed above.

In a fair die, our expectation value would be

\[ \frac{1}{6} \times 1 + \frac{1}{6} \times 2 + \frac{1}{6} \times 3 + \frac{1}{6} \times 4 + \frac{1}{6} \times 5 + \frac{1}{6} \times 6 = 3.5. \]

For our die, we instead find that the expectation value is

\[ 0.15 \times 1 + 0.15 \times 2 + 0.20 \times 3 + 0.20 \times 4 + 0.10 \times 5 + 0.20 \times 6 = 3.55 \]

So, when you roll our unbalanced die, your roll will be \(\approx 1\%\) higher, on average.
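
In Julia, this expectation value is just a weighted sum of the values, which we can compute directly (using the same `values` and `probs` arrays that appear in the sampling code further below):

Code
values = [1, 2, 3, 4, 5, 6]  # Possible die values
probs = [0.15, 0.15, 0.20, 0.20, 0.10, 0.20]  # Their probabilities

sum(values .* probs)  # The expectation value; this evaluates to 3.55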

At this point, it’s useful to introduce the law of large numbers, which states that, as the number of independent experiments with identical probability distributions tends towards infinity, the average observed value will converge to the “true” average, i.e., the expectation value.

Put more simply: when you’re dealing with very large numbers of experiments (a large sample size), you should expect the sample as a whole to behave as in the theoretical average case.

We can also expect that the frequency of events in this large-number case should match their probabilities:

\[ \lim_{N \to \infty} \frac{N_i}{N} = p_i, \]

where \(N_i\) is the number of occurrences of event \(i\) and \(N\) is the total number of experiments.

I’ve highlighted two important caveats to the law of large numbers: 1) your experiments need to be completely independent, i.e., the outcome of one experiment in no way affects any future experiments; and 2) each experiment has the exact same probability distribution. For our case, it seems fair to assume that each roll of the die is independent from all others, so the first criterion is satisfied. As for the second, our probabilities are constants by definition.

Now, let’s watch this play out in a real example. I’ll use a (pseudo)random number generator¹ to roll our fictitious, imbalanced six-sided die a number of times. I’ll track how many of each number (1–6) we get, and from those observed frequencies, I’ll calculate the average. What we should see is that, as the number of experiments goes up, the average approaches the expectation value.

Code
values = [1, 2, 3, 4, 5, 6]  # Possible values for our dice roll
probs = [0.15, 0.15, 0.20, 0.20, 0.10, 0.20]  # Probabilities, with matching indices

sample_10 = wsample(values, probs, 10)  # Roll the die 10 times
sample_1000 = wsample(values, probs, 1000)  # Roll the die 1000 times
sample_100000 = wsample(values, probs, 100000)  # Roll the die 100000 times
sample_10000000 = wsample(values, probs, 10000000)  # Roll the die 10000000 times

# Create collections with the number of times that each possible value was rolled
c_10 = counter(sample_10)
c_1000 = counter(sample_1000)
c_100000 = counter(sample_100000)
c_10000000 = counter(sample_10000000)

# Another function from the Julia plotting library `Plots.jl`, this time for bar plots
# Note: the expression `[c_x[i] / x for i in values]` creates an array.
#       We iterate over `values`, performing some operation on each entry in
#       the array, and then each result is added to an array in corresponding
#       order.
b_10 = bar(
    values,
    [c_10[i] / 10 for i in values],  # Relative frequency of each value (count divided by number of rolls)
    xlabel="Die Value",
    ylabel="Relative Frequency",
    linewidth=3,
    legend=false,
    ylim = (0.0, 0.6),
    title="N = 10, Avg: $(round(mean(sample_10), digits=3))"  # Here, we're taking the average and then reporting it
                                                              # as a string
)

b_1000 = bar(
    values,
    [c_1000[i] / 1000 for i in values],
    xlabel="Die Value",
    ylabel="Relative Frequency",
    linewidth=3,
    legend=false,
    ylim = (0.0, 0.6),
    title="N = 1,000, Avg: $(round(mean(sample_1000), digits=3))"
)

b_100000 = bar(
    values,
    [c_100000[i] / 100000 for i in values],
    xlabel="Die Value",
    ylabel="Relative Frequency",
    linewidth=3,
    legend=false,
    ylim = (0.0, 0.6),
    title="N = 100,000, Avg: $(round(mean(sample_100000), digits=3))"
)

b_10000000 = bar(
    values,
    [c_10000000[i] / 10000000 for i in values],
    xlabel="Die Value",
    ylabel="Relative Frequency",
    linewidth=3,
    legend=false,
    ylim = (0.0, 0.6),
    title="N = 10,000,000, Avg: $(round(mean(sample_10000000), digits=3))"
)

# Plot the four bar charts in a 2-by-2 grid
plot(b_10, b_1000, b_100000, b_10000000, layout=4)

Analogous to the expectation value, we are sometimes interested in the second moment of some property, \(\langle x^2 \rangle\):

\[ \langle x^2 \rangle = \sum_{i = 1}^n x_i^2 p_i \]

or the variance of the property, \(\sigma_x^2\):

\[ \sigma_x^2 = \langle (x - \langle x \rangle)^2 \rangle = \sum_{i = 1}^n (x_i - \langle x \rangle)^2 p_i \]

Note that the square root of the variance, \(\sigma_x\), is the standard deviation.
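
If you’d like to check your work on the exercise below numerically, these quantities are one-liners in Julia. Here’s a minimal sketch (the function names are my own, not from any library):

Code
# Illustrative helper functions for a discrete distribution specified by
# parallel arrays of values and probabilities
expectation(vals, ps) = sum(vals .* ps)
second_moment(vals, ps) = sum(vals .^ 2 .* ps)
variance(vals, ps) = sum((vals .- expectation(vals, ps)) .^ 2 .* ps)
std_dev(vals, ps) = sqrt(variance(vals, ps))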

Try this out yourself! Calculate the second moment, variance, and standard deviation of a balanced die and the unbalanced die used in the previous example.

Up until this point, we’ve discussed only discrete probability distributions; that is, distributions that assign finite probability to a set of distinct, separate values and zero probability everywhere else. To see what we mean by this, consider again the die example. There is a probability for the die to roll 1, 2, 3, and so on, but there’s zero probability of the die rolling a 2.05 or a 0.008.

While reality is often discrete (especially at the sub-atomic to molecular scales, where quantum mechanics rules), it is often much more mathematically convenient to work with continuous probability distributions, functions that are (as the name suggests) continuous over the range of values that the random variable(s) can take. Not only is it convenient, but this is also a good approximation; as the number of particles (atoms, molecules, etc.) in a system increases, even though energy levels and other properties remain quantized, they can become so tightly packed together that the distribution becomes quasi-continuous.

One of the main differences between discrete and continuous distributions is the way that we account for each possible event. Whereas, with discrete distributions, we summed over all possible events, now we take an integral. For instance, our normalization condition becomes

\[ \int_{a}^b p(x) dx = 1, \]

where \(a\) and \(b\) are the limits of our random variable’s possible values (most generally, \(a = -\infty\) and \(b = \infty\)). Note that we no longer index our events (they are, in general, uncountably infinite). Similarly, our expression for the expectation value is

\[ \langle x \rangle = \int_{a}^b x p(x)dx \]
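
As a quick sketch of these continuous formulas in code (using the `Distributions` package we loaded earlier and a crude Riemann sum in place of exact integration), we can check both the normalization condition and the expectation value for a normal distribution whose mean we already know:

Code
d = Normal(2.0, 0.5)   # A normal distribution with mean 2.0 and standard deviation 0.5
grid = -3.0:0.001:7.0  # A grid wide enough to capture essentially all of the probability
dx = step(grid)

# Approximate the integrals with Riemann sums
println("∫ p(x) dx   ≈ ", sum(pdf.(d, grid)) * dx)          # Should be ≈ 1 (normalization)
println("∫ x p(x) dx ≈ ", sum(grid .* pdf.(d, grid)) * dx)  # Should be ≈ 2.0 (the mean)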

Another way to think about \(\langle x \rangle\) is using the analogy of probability mass.

Imagine that the property values associated with your events exist in some continuous space (this isn’t strictly true – as we just discussed, we often work with discrete distributions – but it’s helpful for our illustration). You can think of this like our physical space, although our event-probability space may not have 3 dimensions. The probability of each event/associated property is treated as the amount of mass at that point (the probability mass).

Using this metaphor, the expectation value is nothing more and nothing less than the center of mass of our probability distribution. The second moment is analogous to the moment of inertia (taken about the origin), and the variance is analogous to the moment of inertia taken about the center of mass. Both the variance and the standard deviation are measures of how widely the distribution is spread around the average value.

Combinatorics

At its core, combinatorics is about counting. That probably sounds like it’s not a particularly useful or interesting area of maths, but I promise you, there’s a lot that you can do with counting and combinatorial structures. (For interested students, ask me about my work using graph theory!)

For the purposes of this module, the main things that we need to count are permutations and combinations of finite collections. A combination is an unordered selection of elements from a collection, while a permutation is an ordered selection of elements. Unless otherwise stated, these selections occur without replacement (that is, I can’t pick the same element more than once).

As an example, consider a standard deck of 52 playing cards (ignoring Jokers). If we ask, “How many ways are there to order the cards in the deck?”, we’re asking about the number of possible permutations, because the order matters. If, on the other hand, we ask, “How many unique five-card poker hands are possible?”, we’re asking about combinations; it doesn’t matter what order you get the five cards in, just what the five cards are.

The general formula to count permutations of \(k\) elements from a collection of \(n\) is

\[ P(n, k) = \frac{n!}{(n-k)!} \]

Here, \(!\) refers to the factorial function, where

\[ n! = \begin{cases} 1~~\text{if}~~n = 0,\\ n (n-1)!~~\text{if}~~n > 0 \end{cases} \]
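
This recursive definition translates almost word-for-word into Julia (just as a toy; in practice you’d use the built-in `factorial`):

Code
# A toy recursive factorial mirroring the piecewise definition above
my_factorial(n) = n == 0 ? 1 : n * my_factorial(n - 1)

my_factorial(5)  # 120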

So, in our example of the deck, we’d have

\[ P(52, 52) = \frac{52!}{(52 - 52)!} = 52! \]

\[ = 52 \times 51! = 52 \times 51 \times 50! \dots \]

\[ = 52 \times 51 \times \dots \times 1 \approx 8.07 \times 10^{67} \]

Note that the factorial function grows really, really fast, so getting large numbers like this is pretty typical.
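
If you want to reproduce the number above in Julia, note that `factorial(52)` overflows a standard 64-bit integer and throws an error; wrapping the argument in `big` switches to arbitrary-precision integers:

Code
factorial(big(52))  # A 68-digit integer, ≈ 8.07 × 10^67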

The expression for counting combinations is very similar. Basically, we take the permutation formula (\(P(n,k)\) above) and divide it by the number of permutations corresponding to each combination, which is \(k!\) (the number of ways to order the \(k\) selected elements). This gives us

\[ C(n,k) = \binom{n}{k} = \frac{n!}{k!(n-k)!}, \]

where \(\binom{n}{k}\) is commonly used notation for the binomial coefficient.

For our poker example, we have

\[ C(52, 5) = \binom{52}{5} = \frac{52!}{5! \times 47!} \approx 2.60 \times 10^6 \]

unique possible 5-card hands.
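
Julia also provides a built-in `binomial` function for exactly this calculation:

Code
binomial(52, 5)  # 2598960 ≈ 2.60 × 10^6 unique five-card hands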

Finally, for some of the manipulations we’ll use in statmech, we’ll need Stirling’s approximation, which says that

\[ \text{ln}(N!) \approx N\text{ln}(N) - N \]

Stirling’s approximation gets better and better as \(N\) grows, and it is extremely accurate for large \(N\). Since, in statmech, we’re almost always dealing with very large numbers (of particles, of systems in an ensemble, etc.), we can use Stirling’s approximation without fear of (significant) loss of accuracy.
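
To get a feel for how good the approximation is at large \(N\) (the exercise below asks about small \(N\)), here is a quick check at \(N = 10^6\), computing \(\text{ln}(N!)\) as a sum of logarithms rather than via the (astronomically large) factorial itself:

Code
N = 1_000_000
exact = sum(log, 1:N)   # ln(N!) = ln(1) + ln(2) + ... + ln(N)
stirling = N * log(N) - N

println(exact, "   ", stirling)  # The two agree to a relative difference of ~10^-6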

Try this out yourself! Pick a few (reasonably small) numbers. Calculate the exact quantity \(\text{ln}(N!)\), and then use the Stirling approximation. How close is the estimate, and how quickly do the approximation and the exact value converge?

(back to course home)

Footnotes

  1. The numbers generated aren’t actually random. However, for our purposes, they can’t be distinguished from random numbers, so we can treat them as if they’re random.