In Lesson 2, we covered the microcanonical (NVE) ensemble, a somewhat simplistic ensemble where all systems have the same number of particles (\(N\)), volume (\(V\)), and energy (\(E\)).
Here, we’ll extend our analysis to a somewhat more practical ensemble: the canonical ensemble, also known as the NVT ensemble. As the name implies, this is an ensemble where every system has a fixed number of particles, volume, and temperature (\(T\)). In the canonical ensemble, systems can exchange energy with a thermal reservoir. Because the individual systems can have different energies, multiple macrostates are represented, so we cannot apply the equal a priori probabilities postulate directly. However, we can think of the ensemble as a whole as being thermally isolated (with constant energy \(\mathcal{E}\)), and using this trick, we can arrive at expressions for a microstate’s probability and for the average thermodynamic properties of the systems in the ensemble.
Source Material:
Callen, “Chapter 16, Canonical Formalism”, in Thermodynamics and an Introduction to Thermostatistics: 2nd Edition, 1985.
“Partition Function” by “Physical Chemistry” (note that there are a couple of small errors – see the video description)
Code
```julia
import Pkg
Pkg.activate("../../../")
using DataStructures  # For common data structures; see https://juliacollections.github.io/DataStructures.jl/latest/
using StatsBase       # For basic statistics; see https://juliastats.org/StatsBase.jl/stable/
using Distributions   # Also for stats purposes; see https://juliastats.org/Distributions.jl/stable/
using Random          # For (pseudo)random number generation; see https://docs.julialang.org/en/v1/stdlib/Random/
using Plots           # For data visualization; see https://docs.juliaplots.org/stable/
using LaTeXStrings    # For string formatting; see https://github.com/JuliaStrings/LaTeXStrings.jl
```
Activating project at `~/software/ewcss.info`
Deriving the Canonical Microstate Probability and Partition Function
As in the microcanonical ensemble, it’s important for us to keep in mind the constraints on our ensemble.
If we assume that there is a set of discrete microstates available to systems in our ensemble, indexed by \(i\), then we have
\[ \sum_i n_i = \mathcal{N}, \]
where \(\mathcal{N}\) is the total number of systems in our ensemble and \(n_i\) is the number of systems in state \(i\).
We also have that
\[ \sum_i n_i \varepsilon_i = \mathcal{E}, \]
where \(\varepsilon_i\) is the energy of state \(i\). This is a mathematical expression of our statement above that the ensemble energy is fixed, even though different systems within the ensemble can have different energies.
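As a concrete (and entirely made-up) illustration of these two constraints, consider a tiny ensemble in which systems can occupy one of three microstates:

```julia
# A tiny made-up ensemble: three microstates with energies ε_i, and n_i systems
# observed in each state.
ε = [0.0, 1.0, 2.0]        # energy ε_i of microstate i (arbitrary units)
n = [5, 3, 2]              # n_i: how many of the ensemble's systems are in state i
N_ens = sum(n)             # 𝒩, the total number of systems
E_ens = sum(n .* ε)        # ℰ, the total (fixed) ensemble energy
println("𝒩 = $N_ens, ℰ = $E_ens")  # 𝒩 = 10, ℰ = 7.0
```

Any rearrangement of systems among the states that keeps both sums fixed is an allowed configuration of this ensemble.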
With these basics in mind, we can ask an important question: what is the probability of any particular system in our ensemble being in state \(i\)?
The simplest (though not the most insightful) way to express this probability is:
\[ p_i = \frac{\Omega_{\text{res}}(\mathcal{E} - \varepsilon_i)}{\Omega_{\text{total}}(\mathcal{E})}. \]
In this expression, \(\Omega_{\text{total}}(\mathcal{E})\) is the total number of microstates available to the ensemble (not to an individual system!) with total energy \(\mathcal{E}\), and \(\Omega_{\text{res}}(\mathcal{E} - \varepsilon_i)\) is the total number of microstates available to the reservoir (where the reservoir is defined as everything that isn’t our system of interest) with total energy \(\mathcal{E} - \varepsilon_i\). We subtract \(\varepsilon_i\) because, if the system is in state \(i\) with energy \(\varepsilon_i\), then the total amount of energy available to the reservoir is \(\mathcal{E} - \varepsilon_i\). Note that the numerator is written only in terms of the reservoir; because the state of the system is fixed (\(i\)), the number of microstates available to the system is \(1\), which doesn’t affect the numerator.
Although we’re far from a usable expression, we can already obtain some insights into the canonical ensemble from this initial \(p_i\). Intuitively, we might expect that there are more states available when there’s a higher energy, because there would be more ways to choose different amounts of energy to go in different systems (if this is confusing, look back at combinatorics in Lessons 1 and 4!). Since \(\mathcal{E}\) is fixed, our denominator is fixed, which would suggest that \(\Omega_{\text{res}}(\mathcal{E} - \varepsilon_i)\) (and, thus, \(p_i\)) is higher when \(\varepsilon_i\) is lower. Keep this in mind for later.
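To make that counting intuition concrete, here’s a toy model (not specific to this lesson): distributing \(q\) indistinguishable energy quanta among \(N\) distinguishable systems, for which the stars-and-bars count is \(\Omega(q, N) = \binom{q + N - 1}{q}\). The multiplicity grows rapidly with the available energy:

```julia
# Stars-and-bars counting: the number of ways to distribute q indistinguishable
# energy quanta among N distinguishable systems is binomial(q + N - 1, q).
Ω(q, N) = binomial(q + N - 1, q)
N = 5
for q in (0, 1, 5, 10, 20)
    println("q = $q quanta → Ω(q, $N) = $(Ω(q, N))")
end
# Ω grows from 1 (at q = 0) to 10626 (at q = 20): more energy, many more microstates.
```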
Moving on, we can rewrite this expression in terms of \(S\) using the Boltzmann entropy expression (\(S = k_B \ln(\Omega)\); see Lesson 2):
\[ p_i = \frac{e^{S_{\text{res}}(\mathcal{E} - \varepsilon_i)/k_B}}{e^{S_{\text{total}}(\mathcal{E})/k_B}} = \exp(\frac{S_{\text{res}}(\mathcal{E} - \varepsilon_i) - S_{\text{total}}(\mathcal{E})}{k_B}). \]
This transformation is allowed because, while the equal a priori probability principle doesn’t apply to individual systems, it does apply to the ensemble.
Now, we can use some tricks. Let’s say \(U\) is the average system energy (later on, we’ll derive the value of \(U\) in this ensemble). Then, we can say
\[ S_{\text{total}}(\mathcal{E}) = S_{\text{sys}}(U) + S_{\text{res}}(\mathcal{E} - U), \]
because entropy is extensive and thus additive for different components.
Now we manipulate the reservoir term, \(S_{\text{res}}\). We rewrite the energy argument:
\[ S_{\text{res}}(\mathcal{E} - \varepsilon_i) = S_{\text{res}}(\mathcal{E} - U + U - \varepsilon_i). \]
This is allowed because we’re adding \(-U + U = 0\) and thus not changing the expression at all.
Now, because the ensemble contains many systems, we can safely assume that \(U - \varepsilon_i \ll \mathcal{E} - U\). So, we can write this as an expansion:
\[ S_{\text{res}}(\mathcal{E} - U + U - \varepsilon_i) = S_{\text{res}}(\mathcal{E} - U) + (\frac{\partial S}{\partial E})_{V,N}(U - \varepsilon_i), \]
where we’re treating \(U - \varepsilon_i\) as a small perturbation \(\text{d}E\). Since \((\frac{\partial S}{\partial E})_{V,N} = \frac{1}{T}\), we have
\[ p_i = \exp(\frac{S_{\text{res}}(\mathcal{E} - U) + \frac{U - \varepsilon_i}{T} - S_{\text{total}}(\mathcal{E})}{k_B}) = \exp(\frac{-S + \frac{U - \varepsilon_i}{T}}{k_B}), \]
where we have written \(S\) as shorthand for \(S_{\text{sys}}(U)\) and used the additivity relation \(S_{\text{total}}(\mathcal{E}) = S_{\text{sys}}(U) + S_{\text{res}}(\mathcal{E} - U)\) from above. Rearranging the exponent and recognizing the Helmholtz free energy \(A = U - TS\), we arrive at
\[ p_i = \exp(\frac{A - \varepsilon_i}{k_B T}) = \frac{1}{Z} \exp(\frac{-\varepsilon_i}{k_B T}), \quad \text{where } Z = \exp(\frac{-A}{k_B T}). \]
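If you’d like to convince yourself that the first-order expansion above is safe when the perturbation is tiny relative to the reservoir energy, here’s a quick numerical check using a toy ideal-gas-like entropy \(S(E) = c \ln(E)\) (the constant and units are arbitrary, my choice for illustration):

```julia
# A toy "reservoir entropy" S(E) = c·ln(E) (ideal-gas-like; c and units arbitrary).
S(E) = 1.5 * log(E)
dSdE(E) = 1.5 / E                    # the exact derivative ∂S/∂E of the toy model
E_res = 1.0e6                        # reservoir energy scale, ℰ - U (huge)
δ = -2.0                             # the perturbation U - εᵢ (small by comparison)
exact  = S(E_res + δ)                # S_res evaluated exactly
approx = S(E_res) + dSdE(E_res) * δ  # the first-order (Taylor) expansion
println(abs(exact - approx))         # on the order of 1e-12: a negligible error
```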
The Boltzmann Factor
Expressions like \(p_i = \frac{1}{Z} \exp(\frac{-\varepsilon_i}{k_B T})\) pop up all over the place in chemistry and physics. They’re so common that not only does \(Z\) have a special name (the partition function), but so does \(\exp(\frac{-\varepsilon_i}{k_B T})\): the Boltzmann factor. It’s a unitless expression that tends to pop up when we’re looking at the likelihood of some energetic state being occupied.
Code
```julia
# Create sequences to hold our `x` and `y` values
xs = 0.0:0.05:3.0  # The range of energies, from 0 eV to 3 eV, where 1 eV ≈ 1.602 × 10^-19 J
ys_1 = exp.(-xs ./ 8.617e-5)              # The Boltzmann factor at `T = 1 K`; the Boltzmann constant kB is expressed in eV/K
ys_300 = exp.(-xs ./ (8.617e-5 * 300))    # The Boltzmann factor at `T = 300 K`
ys_5000 = exp.(-xs ./ (8.617e-5 * 5000))  # The Boltzmann factor at `T = 5000 K`
ys_50000 = exp.(-xs ./ (8.617e-5 * 50000))  # The Boltzmann factor at `T = 50000 K`
plot(
    xs,
    [ys_1 ys_300 ys_5000 ys_50000],
    label=["T = 1 K" "T = 300 K" "T = 5,000 K" "T = 50,000 K"],
    xlabel="Energy (eV)",
    ylabel=L"Z p_i",
    linewidth=3,
)
```
You can see how, unless you’re at super-high temperatures, the relative probability remains near zero for virtually all energy levels. It’s also apparent that (as we expected) low-energy states are always more likely than high-energy states. More specifically, the lowest-energy state (here, \(\varepsilon = 0~\text{eV}\)) is the most probable state.
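We can also verify numerically that dividing the Boltzmann factors by \(Z\) produces a proper (normalized) probability distribution. The energy levels below are made up purely for illustration:

```julia
kB = 8.617e-5                      # Boltzmann constant in eV/K
T = 300.0                          # room temperature
ε = [0.0, 0.05, 0.10, 0.50]        # hypothetical energy levels in eV
factors = exp.(-ε ./ (kB * T))     # the (unnormalized) Boltzmann factors
Z = sum(factors)                   # the partition function normalizes them...
p = factors ./ Z                   # ...into proper probabilities p_i
println(sum(p))                    # 1 (up to floating-point roundoff)
println(issorted(p, rev=true))     # true: lower energy ⇒ higher probability
```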
More On Partition Functions
Partition functions are kind of a big deal, and they’re something that we’ll return to throughout the rest of this module. It’s therefore important that we understand what the partition function represents and how we can use it to obtain thermodynamic insights.
We’ve already seen the most basic application of the partition function: in probability distributions. The reason it’s called the “partition function” is because it defines how probability is partitioned among the different available states.
But the partition function is so much more than just a normalization factor in a probability expression. It’s more like a skeleton key or a multi-tool (e.g., Swiss army knife) that we can use to unlock other expressions of interest.
From our original expression for the partition function (i.e., \(Z = \exp(\frac{-A}{k_B T})\)), we can see another important expression:
\[ A = -k_B T \ln(Z) \]
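As a quick numerical sanity check on this relation, we can verify that \(-k_B T \ln(Z)\) agrees with \(U - TS\) computed directly from the canonical probabilities (using the Gibbs entropy \(S = -k_B \sum_i p_i \ln p_i\)) for a hypothetical two-level system:

```julia
kB = 8.617e-5                    # Boltzmann constant in eV/K
T = 300.0
ε = [0.0, 0.1]                   # made-up two-level energies in eV
Z = sum(exp.(-ε ./ (kB * T)))    # partition function
A = -kB * T * log(Z)             # Helmholtz free energy from A = -kB·T·ln(Z)
p = exp.(-ε ./ (kB * T)) ./ Z    # canonical probabilities
U = sum(p .* ε)                  # average energy ⟨ε⟩
S = -kB * sum(p .* log.(p))      # Gibbs entropy
println(A ≈ U - T * S)           # true: the two routes to A agree
```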
And, from this, we can arrive at a bunch of other useful properties. For instance, we know that \(S = -(\frac{\partial A}{\partial T})_{V,N}\), so we can differentiate the expression above to get the entropy in the canonical ensemble (I’ve left this as an exercise for you – see below).
Finally (for this lesson), we can look at the heat capacity. We know from good-old macroscopic thermodynamics that \(C_V = (\frac{\partial U}{\partial T})_{V,N}\). So, we can take the expression we just got for \(U\), namely \(U = -(\frac{\partial \ln(Z)}{\partial \beta})_{V,N}\) with \(\beta = \frac{1}{k_B T}\), and just differentiate it! By the chain rule,
\[ C_V = (\frac{\partial U}{\partial T})_{V,N} = (\frac{\partial U}{\partial \beta})_{V,N} \frac{\partial \beta}{\partial T}. \]
We do this because our partition functions are generally expressed in terms of \(k_B T\), and more precisely because we’ve already expressed \(U\) above in terms of \(\beta = \frac{1}{k_B T}\).
Now we plug in our expression for \(U\) and evaluate the second term, \(\frac{\partial \beta}{\partial T} = -\frac{1}{k_B T^2}\), directly:
\[ C_V = (-\frac{\partial^2 \ln(Z)}{\partial \beta^2})_{V,N} (-\frac{1}{k_B T^2}) = \frac{1}{k_B T^2} (\frac{\partial^2 \ln(Z)}{\partial \beta^2})_{V,N}. \]
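To check the heat-capacity result numerically, we can compare a finite-difference \(\frac{\partial U}{\partial T}\) against the equivalent fluctuation form \(C_V = \frac{\langle \varepsilon^2 \rangle - \langle \varepsilon \rangle^2}{k_B T^2}\) (which is what \(\frac{1}{k_B T^2} \frac{\partial^2 \ln(Z)}{\partial \beta^2}\) evaluates to), again using a hypothetical two-level system:

```julia
kB = 8.617e-5                       # Boltzmann constant in eV/K
ε = [0.0, 0.1]                      # made-up two-level energies in eV
Z(T) = sum(exp.(-ε ./ (kB * T)))    # partition function at temperature T
U(T) = sum(exp.(-ε ./ (kB * T)) ./ Z(T) .* ε)  # average energy ⟨ε⟩ at T
T = 300.0
h = 1e-3                            # small temperature step for the finite difference
Cv_fd = (U(T + h) - U(T - h)) / (2h)            # numerical ∂U/∂T
p = exp.(-ε ./ (kB * T)) ./ Z(T)                # canonical probabilities at T
Cv_fluct = (sum(p .* ε .^ 2) - U(T)^2) / (kB * T^2)  # fluctuation formula
println(isapprox(Cv_fd, Cv_fluct; rtol=1e-4))   # true: the two forms agree
```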
Try it yourself! Use the expressions and techniques that we walked through above to derive an expression for the entropy \(S\) in terms of the partition function and related derivatives (Note: If you’re having trouble, I do provide a derivation in the lecture slides).