A step-by-step mathematical derivation of the Exponential Random Graph Model (ERGM). We start from a simple question — "what is the probability of observing this network?" — and arrive at a logistic regression form.
Prerequisites
Familiarity with basic network concepts (nodes, edges, adjacency matrix). Basic understanding of probability and logarithms. No prior ERGM knowledge needed.
Starting Point
What Do We Want?
We want to answer a fundamental question: what is the probability of observing a particular network \(y\)? That is, among all the networks that could exist with these nodes, why did we see this specific pattern of ties?
The Question
$$P(Y = y) = \;?$$
Goal: Build a model that assigns higher probability to networks whose structural patterns (reciprocity, triangles, etc.) match what we observe in real social systems.
Step 1
The Simplest Idea
If every tie in the network were independent and present with probability 1/2, every possible network would be equally likely:
Uniform Distribution
$$P(Y = y) = C \quad \text{(some constant)}$$
Problem: This is too simple — it tells us nothing about network structure. Real networks are not random: friendships cluster, advice flows to experts, and ties tend to be reciprocated. We need a model that captures these patterns.
Step 2
Network Structure Matters
We want to say: certain patterns (reciprocity, triangles) make a network more likely to appear. So we define a "score" for any network:
\(g_1(y)\) = number of reciprocal pairs in the network
\(g_2(y)\) = number of triangles in the network
\(\theta_1, \theta_2\) = weights (parameters) for each pattern
Intuition: Higher score → this network is more likely to be observed. A positive \(\theta_1\) means reciprocity makes a network more probable; a negative one means reciprocity is suppressed.
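To make the score concrete, here is a minimal Python sketch. The example network, the \(\theta\) values, and the transitive-triangle reading of \(g_2\) (counting ordered triples with \(i \to j\), \(j \to k\), \(i \to k\)) are assumptions for illustration; other triangle definitions exist for directed networks.

```python
from itertools import combinations, permutations

# A small, made-up directed network: edges is a set of (i, j) ties.
edges = {(0, 1), (1, 0), (0, 2), (2, 0), (1, 2)}
nodes = [0, 1, 2, 3]

def g1_reciprocal_pairs(edges, nodes):
    """g_1(y): number of node pairs {i, j} tied in both directions."""
    return sum(1 for i, j in combinations(nodes, 2)
               if (i, j) in edges and (j, i) in edges)

def g2_transitive_triangles(edges, nodes):
    """g_2(y): ordered triples with i->j, j->k, i->k (one common
    directed notion of 'triangle'; the exact choice is an assumption)."""
    return sum(1 for i, j, k in permutations(nodes, 3)
               if (i, j) in edges and (j, k) in edges and (i, k) in edges)

# The network's score under illustrative weights theta_1 = 0.8, theta_2 = 0.3:
score = (0.8 * g1_reciprocal_pairs(edges, nodes)
         + 0.3 * g2_transitive_triangles(edges, nodes))
print(g1_reciprocal_pairs(edges, nodes),
      g2_transitive_triangles(edges, nodes))  # 2 3
```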
Step 3
Turn "Score" into "Probability"
Probabilities must be positive, but our score can be any real number. The exponential function fixes this: it maps any real number to a positive value.

Proportional Form

$$P(Y = y) \;\propto\; \exp\!\Big(\theta_1 \cdot g_1(y) + \theta_2 \cdot g_2(y)\Big)$$
Note: The symbol \(\propto\) means "proportional to" — this is not a true probability yet because all probabilities must sum to 1. We fix that in the next step.
Step 4
Make Probabilities Sum to 1
We divide by a normalizing constant \(\kappa\) — the sum of the exponentiated scores over every possible network:
The Full ERGM Formula
$$\boxed{\;P(Y = y) = \frac{1}{\kappa}\,\exp\!\Big(\theta_1 \cdot g_1(y) + \theta_2 \cdot g_2(y)\Big)\;}$$
The catch: For a network with \(n\) nodes, there are \(2^{n(n-1)}\) possible directed networks. Even with just 20 nodes, that's over \(10^{114}\) networks. Computing \(\kappa\) by brute force is impossible. But — as we'll see — it cancels out!
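For a tiny network the brute-force sum is still feasible, which makes the normalization easy to see. A sketch for \(n = 3\) (all \(2^{3 \cdot 2} = 64\) directed networks; the \(\theta\) values and the transitive-triangle \(g_2\) are assumptions for illustration):

```python
import math
from itertools import combinations, permutations, product

n = 3
theta1, theta2 = 0.8, 0.3  # illustrative parameter values (assumed)
dyads = [(i, j) for i in range(n) for j in range(n) if i != j]  # 6 dyads

def g1(edges):  # reciprocal pairs
    return sum(1 for i, j in combinations(range(n), 2)
               if (i, j) in edges and (j, i) in edges)

def g2(edges):  # transitive triangles i->j, j->k, i->k (assumed definition)
    return sum(1 for i, j, k in permutations(range(n), 3)
               if (i, j) in edges and (j, k) in edges and (i, k) in edges)

# Enumerate every possible network and exponentiate its score.
scores = []
for bits in product([0, 1], repeat=len(dyads)):
    edges = {d for d, b in zip(dyads, bits) if b}
    scores.append(math.exp(theta1 * g1(edges) + theta2 * g2(edges)))

kappa = sum(scores)             # the normalizing constant
probs = [s / kappa for s in scores]
print(len(probs), round(sum(probs), 10))  # 64 1.0
```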
Step 5
Zoom In on a Single Tie
Instead of asking about the entire network, ask a more practical question: what is the probability that the tie Alice → Bob exists?
Define two network states:
\(y^+_{ij}\) = the network with the tie Alice → Bob
\(y^-_{ij}\) = the network without the tie Alice → Bob
The odds of this tie existing (holding all other ties fixed):
Odds of a Single Tie
$$\frac{P(Y = y^+_{ij})}{P(Y = y^-_{ij})} = \frac{\;\dfrac{1}{\kappa}\exp\!\Big(\theta_1 g_1(y^+) + \theta_2 g_2(y^+)\Big)\;}{\;\dfrac{1}{\kappa}\exp\!\Big(\theta_1 g_1(y^-) + \theta_2 g_2(y^-)\Big)\;}$$
Step 6
\(\kappa\) Cancels Out
\(\kappa\) appears in both the numerator and denominator, so it disappears. Taking the logarithm of the remaining ratio gives:

Log-Odds of a Single Tie

$$\log \frac{P(Y = y^+_{ij})}{P(Y = y^-_{ij})} = \theta_1 \,\Delta g_1 + \theta_2 \,\Delta g_2, \qquad \Delta g_k = g_k(y^+) - g_k(y^-)$$
In words: \(\Delta g_1\) answers "how much does the reciprocity count change when we add this one tie?" For example, if Bob already has a tie to Alice, then adding Alice → Bob creates one new reciprocal pair, so \(\Delta g_1 = 1\).
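The cancellation can be checked numerically: the odds computed from the full formula and from the change statistics alone give the same number. A sketch with made-up ties and \(\theta\) values (the transitive-triangle \(g_2\) is again an assumed definition):

```python
import math
from itertools import combinations, permutations

nodes = [0, 1, 2]
theta1, theta2 = 0.8, 0.3  # illustrative values (assumed)

def g1(edges):  # reciprocal pairs
    return sum(1 for i, j in combinations(nodes, 2)
               if (i, j) in edges and (j, i) in edges)

def g2(edges):  # transitive triangles i->j, j->k, i->k (assumed definition)
    return sum(1 for i, j, k in permutations(nodes, 3)
               if (i, j) in edges and (j, k) in edges and (i, k) in edges)

def score(edges):
    return math.exp(theta1 * g1(edges) + theta2 * g2(edges))

rest = {(1, 0), (1, 2)}                    # all ties other than (0, 1), held fixed
y_plus, y_minus = rest | {(0, 1)}, rest

# Full formula: the 1/kappa factors are identical, so they divide out.
odds_full = score(y_plus) / score(y_minus)

# Shortcut: only the change statistics are needed.
d1, d2 = g1(y_plus) - g1(y_minus), g2(y_plus) - g2(y_minus)
odds_shortcut = math.exp(theta1 * d1 + theta2 * d2)

print(abs(odds_full - odds_shortcut) < 1e-12)  # True
```

Here (1, 0) already exists, so adding (0, 1) creates one reciprocal pair: \(\Delta g_1 = 1\), matching the worked example above.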
This is exactly the form of logistic regression. The log-odds of a tie existing is a linear combination of the change statistics, weighted by the model parameters.
Interpretation: Each \(\theta_k\) is a log-odds ratio, just like in logistic regression. For instance, if \(\theta_1 = 0.8\), then a tie that creates one new reciprocal pair has \(e^{0.8} \approx 2.2\times\) higher odds of existing.
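Turning a log-odds into a tie probability is the usual inverse-logit step from logistic regression. A short sketch using the \(\theta_1 = 0.8\) example above (the \(\theta_2\) value and change statistics are assumed for illustration):

```python
import math

theta = [0.8, 0.3]   # theta_1 (reciprocity), theta_2 (triangles); illustrative
delta_g = [1, 0]     # the tie creates one reciprocal pair, no triangles

log_odds = sum(t * d for t, d in zip(theta, delta_g))
odds = math.exp(log_odds)                # multiplicative effect on the odds
p_tie = 1 / (1 + math.exp(-log_odds))    # inverse-logit: conditional P(tie)

print(round(odds, 2), round(p_tie, 2))   # 2.23 0.69
```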
The key insight: \(\kappa\) is computationally intractable — but it cancels naturally in the derivation. This is why we can estimate \(\theta\) using MCMC (Markov Chain Monte Carlo) without ever computing \(\kappa\) directly.
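That same \(\kappa\)-free conditional is what MCMC samplers exploit. Below is a toy Gibbs sampler that draws networks from the model by resampling one dyad at a time; it is a sketch only, with assumed parameter values, network size, and the transitive-triangle \(g_2\) from earlier (not the estimation routine a real ERGM package uses).

```python
import math
import random
from itertools import combinations, permutations

random.seed(0)
n = 5
nodes = range(n)
theta1, theta2 = 0.8, 0.3   # assumed, fixed parameters (sampling, not estimation)
dyads = [(i, j) for i in nodes for j in nodes if i != j]

def g1(edges):  # reciprocal pairs
    return sum(1 for i, j in combinations(nodes, 2)
               if (i, j) in edges and (j, i) in edges)

def g2(edges):  # transitive triangles i->j, j->k, i->k (assumed definition)
    return sum(1 for i, j, k in permutations(nodes, 3)
               if (i, j) in edges and (j, k) in edges and (i, k) in edges)

def gibbs_step(edges):
    """Resample one dyad from its exact conditional distribution,
    which is the logistic form above; kappa never appears."""
    i, j = random.choice(dyads)
    y_plus, y_minus = edges | {(i, j)}, edges - {(i, j)}
    log_odds = (theta1 * (g1(y_plus) - g1(y_minus))
                + theta2 * (g2(y_plus) - g2(y_minus)))
    p = 1 / (1 + math.exp(-log_odds))
    return y_plus if random.random() < p else y_minus

edges = set()
for _ in range(2000):          # let the chain mix
    edges = gibbs_step(edges)
print(g1(edges), g2(edges))    # statistics of one sampled network
```

Note that with only positive weights on \(g_1\) and \(g_2\), this model favors dense networks; real ERGM specifications add an edge-count term to control density.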