Homework 4: Nonlinear Least Squares

1. Van der Waals Equation of State

The relationship between pressure (\(P\)), volume (\(V\)), and temperature (\(T\)) is modeled as:

\left(P + \frac{a}{V^2}\right)(V - b) = RT \implies P(V) = \frac{RT}{V-b} - \frac{a}{V^2}

We need to find the parameters \(a\) and \(b\) that minimize the sum of squared residuals between the measured data and the model.

2. Mathematical Ingredients

Model:

P_i(a,b) = \frac{RT}{V_i - b} - \frac{a}{V_i^2}

Residual:

r_i(a,b) = P_i(a,b) - P_i^{\text{obs}}

Jacobian entries:

\frac{\partial r_i}{\partial a} = -\frac{1}{V_i^2} \quad \text{and} \quad \frac{\partial r_i}{\partial b} = \frac{RT}{(V_i - b)^2}

So each row of the Jacobian matrix \( J \) is:

J_i = \left[ -\frac{1}{V_i^2} \quad \frac{RT}{(V_i - b)^2} \right]

Then LM forms:

J^T J = \begin{bmatrix} \sum J_{i1}^2 & \sum J_{i1}J_{i2} \\ \sum J_{i1}J_{i2} & \sum J_{i2}^2 \end{bmatrix}

J^T r = \begin{bmatrix} \sum J_{i1}r_i \\ \sum J_{i2}r_i \end{bmatrix}

3. The Levenberg-Marquardt Algorithm

Your original code likely used first-order gradient descent (\( a \leftarrow a - \eta \frac{\partial J}{\partial a} \) , \( b \leftarrow b - \eta \frac{\partial J}{\partial b} \)). The Levenberg-Marquardt (LM) version instead solves, at each iteration:

(J^T J + \lambda I) \Delta = -J^T r

Then updates the parameter vector \( \theta = \begin{bmatrix} a \\ b \end{bmatrix} \):

\theta_{\text{new}} = \theta + \Delta

\( r_i = P_{\text{calc},i} - P_{\text{obs},i} \) is the residual.
\( J \) is the Jacobian of residuals with respect to \( a, b \).
\( \lambda \) is the damping parameter.

Why LM is better here?

Your model is highly nonlinear in \( b \) because of the term:

\frac{1}{V - b}

This can become stiff when \( b \) gets close to the smallest volume. LM is usually much more stable than plain gradient descent because it uses local curvature information through \( J^T J \), while still damping risky steps.

Subtle point: Strictly speaking, classical LM often uses \( J^T J + \lambda \text{diag}(J^T J) \). The code below uses the simpler and common form with \( I \), which is still a valid LM-style implementation.

Algorithm Steps

Computes residuals \( r \)
Computes Jacobian \( J \)
Forms \( J^T J \) and \( J^T r \)
Solves for \( \Delta \) from \( (J^T J + \lambda I)\Delta = -J^T r \)
Tests whether the step improved the fit
Decreases \( \lambda \) if successful, increases it if not

So:
• small \( \lambda \rightarrow \) behaves more like Gauss-Newton
• large \( \lambda \rightarrow \) behaves more like cautious gradient descent
That is the key LM idea.

4. V vs. P Visualization

Fitted Parameter a (L² bar / mol²)

--

Fitted Parameter b (L / mol)