Theoretical Modeling

Computational Social Science · Foundations 04

Notes from ICPSR 2024 "Theoretical Modeling for the Social and Behavioral Sciences", University of Michigan · Prof. Paul Smaldino

笔记整理自 ICPSR 2024「Theoretical Modeling for the Social and Behavioral Sciences」, University of Michigan · Prof. Paul Smaldino

Theoretical modeling is the art of building simplified, formal representations of social systems to understand how they work. Instead of describing every detail of reality, we identify the essential parts—agents, rules, environments—and explore what happens when these simple pieces interact. The results are often surprising: racial segregation can emerge from mild preferences for similar neighbors, cooperation can evolve among selfish agents, and simple exchange rules inevitably produce inequality. This tree organizes key concepts from the course: What is a model and why build one? (foundations), How do agents interact strategically? (game theory), How does behavior spread and evolve? (dynamics), and How are agents connected? (networks). Click any node to explore.

理论建模是一门构建社会系统简化形式化表示的艺术，目的是理解这些系统如何运作。我们不去描述现实的每一个细节，而是识别出关键的组成部分——行为者、规则、环境——然后探索这些简单元素互动后会产生什么结果。结果往往出人意料：温和的邻居偏好就能导致种族隔离，自私的个体之间也能演化出合作，简单的交换规则必然产生不平等。这棵知识树组织了课程的核心概念：什么是模型，为什么要建模？（基础）、行为者如何进行策略互动？（博弈论）、行为如何传播和演化？（动态过程）、行为者如何连接？（网络）。点击任意节点即可展开。

New to modeling? Don't worry — this course assumes no prior modeling experience. A "model" is simply a set of assumptions written down precisely (as math equations or computer code) so we can explore their logical consequences. Think of it like a thought experiment you can actually run. The course uses NetLogo, a free tool where you can build and see simulations visually. Math needed: basic algebra, functions, and probability. Programming needed: variables, loops, and conditionals.

建模新手？没关系——这门课不需要你有任何建模经验。"模型"不过是一组被精确写下来的假设（用数学公式或计算机代码），这样我们就能探索这些假设的逻辑后果。把它想象成一个你可以实际运行的思想实验。课程使用 NetLogo，一个免费的可视化仿真工具。数学要求：基础代数、函数和概率。编程要求：变量、循环和条件判断。

Click any node to expand / collapse

点击任意节点展开 / 收起

▶ Q1 What Is a Model & Why Build One?什么是模型，为什么要建模？ Foundations基础

▶ M Types of Models模型的类型

Mathematical vs Agent-Based Models数学模型 vs 基于代理的模型

Mathematical models use equations to specify relationships between variables and temporal dynamics. They can sometimes be solved precisely. Agent-based models simulate individuals as explicit computational entities, allowing for greater heterogeneity and structure. These are complementary, not competing approaches!

Equation-based = top-down summary of a system; agent-based = bottom-up simulation of individual behaviors. Neither is "better" — they answer different questions.

Equation-Based vs Agent-Based: Key Trade-offs

Equation-Based (Mathematical)

Can derive exact solutions (sometimes)
Fast to simulate; easy for analytical insight
Assumes homogeneous agents and regular structure
Hard to include spatial effects, heterogeneity
Better for: predicting population-level dynamics

Agent-Based (Computational)

Explicitly simulates each individual
Natural way to include heterogeneity, networks, geography
Hard to understand analytically (need simulation + analysis)
Computationally expensive for large populations
Better for: exploring emergence, heterogeneous interactions

Granularity spectrum: You can also think about fine-grained vs coarse-grained models. A fine-grained model tracks every detail; a coarse-grained model averages over many details. Coarse-grained is faster but loses information; fine-grained preserves detail but becomes intractable. The art is choosing the right granularity for your question.

Popular ABM platforms beyond NetLogo: Java (MASON — fast, scalable), Python (Mesa — modern, well-documented), Julia (Agents.jl — blazingly fast for large simulations), C++ (if you need maximum performance). NetLogo is beginner-friendly; others suit large-scale or production work.

Examples: Lotka-Volterra predator-prey (equation-based: elegant closed form, but assumes homogeneous mixing) vs Boids flocking (agent-based: each bird follows simple local rules, but collective behavior only emerges from simulation).

数学模型使用方程来描述变量之间的关系和时间动态，有时可以精确求解。基于代理的模型将个体模拟为明确的计算实体，允许更大的异质性和结构。这两种方法是互补的，不是竞争关系！

方程模型 = 系统的自上而下总结；代理模型 = 个体行为的自下而上模拟。没有哪个更"好"——它们回答不同的问题。

方程模型 vs 代理模型：关键权衡

方程模型（数学）

可以推导精确解（有时）
模拟快速；易于分析洞察
假设同质代理和规则结构
难以包含空间效应、异质性
最适合：预测群体级动态

代理模型（计算）

显式模拟每个个体
自然地包含异质性、网络、地理
难以分析理解（需要模拟 + 分析）
大规模群体在计算上昂贵
最适合：探索涌现、异质互动

粒度谱：你也可以考虑细粒度与粗粒度模型。细粒度模型追踪每一个细节；粗粒度模型平均许多细节。粗粒度更快但损失信息；细粒度保留细节但变得不可处理。艺术在于为你的问题选择恰当的粒度。

除 NetLogo 外的流行 ABM 平台：Java（MASON——快速、可扩展）、Python（Mesa——现代、文档完善）、Julia（Agents.jl——大规模模拟极快）、C++（需要最高性能时）。NetLogo 对初学者友好；其他适合大规模或生产工作。

例子：Lotka-Volterra 捕食者-猎物模型（方程模型：优雅的闭式解，但假设同质混合）vs Boids 群集（代理模型：每只鸟遵循简单局部规则，但集体行为只在模拟中涌现）。

▶ B Building Blocks of a Model模型的构建要素

Parts, Initialization, Dynamics, Outcomes组件、初始化、动态、结果

Every model needs four things: Parts (agents and their environment, with attributes, behaviors, and relationships), Initialization (what does the world look like at the start?), Dynamics (how does the system change from one moment to the next?), and Outcomes (what are we measuring?). The ODD protocol (Overview, Design, Details) is the standard for describing agent-based models.

The Four Building Blocks in Detail

Parts: What entities exist? (e.g., turtles = agents, patches = environment). What attributes do they have? (e.g., wealth, location, opinion). What behaviors do they perform? (e.g., move, reproduce, consume). Are there relationships? (e.g., networks, hierarchies).
Initialization: What is the starting state? Random? Loaded from data? Deterministic? Include parameter choices here—what ranges do you explore?
Dynamics: What happens each time step? In what order do agents act? (This is scheduling—crucial for emergent behavior.) Do agents update synchronously (all at once) or asynchronously (one at a time, in random order)?
Outcomes: What statistics or patterns do you measure and track over time? What counts as a "result"?

The ODD Protocol (Overview, Design, Details)

This standardized format ensures you document your model completely:

Overview: Purpose, entities, variables, scales (how many agents? how long?)
Design: Principles, scheduling, how agents make decisions
Details: Full algorithmic description, parameter values, initialization

Using ODD makes your model reproducible—others can implement it independently and verify your results.

Simple Economy Example (Boltzmann-Gibbs Distribution): 500 turtles, each starts with \$100. Each time step: give \$1 to a random other turtle. Measuring: wealth distribution over time. Surprising result: after ~10,000 steps, wealth becomes exponentially distributed, matching the Boltzmann-Gibbs law from statistical physics! This emerges from just one simple rule—no external inequality is needed, entropy drives it. This illustrates emergence and why simulation matters: the math is non-obvious.

Scheduling matters: if all agents update synchronously, you might see oscillations; if asynchronously (random order), the system smooths out differently. Simple choice, big consequences.

每个模型需要四个要素：组件（行为者及其环境，包括属性、行为和关系）、初始化（模型开始时世界是什么样的？）、动态（系统如何从一个时刻变化到下一个？）和结果（我们在测量什么？）。ODD 协议（概览、设计、细节）是描述基于代理模型的标准。

四个构建块的详细说明

组件：存在哪些实体？（例如，turtles = 代理，patches = 环境）。它们有什么属性？（例如，财富、位置、意见）。它们执行什么行为？（例如，移动、繁殖、消费）。有关系吗？（例如，网络、等级制度）。
初始化：起始状态是什么？随机？从数据加载？确定性？在这里包括参数选择——你探索什么范围？
动态：每个时间步发生什么？代理以什么顺序行动？（这就是调度——对涌现行为至关重要。）代理同步更新（全部一次）还是异步更新（一次一个，随机顺序）？
结果：你测量和追踪哪些统计数据或模式？什么算作"结果"？

ODD 协议（概览、设计、细节）

这种标准化格式确保你完整记录模型：

概览：目的、实体、变量、尺度（多少个代理？多久？）
设计：原则、调度、代理如何做决策
细节：完整的算法描述、参数值、初始化

使用 ODD 使你的模型可复现——他人可以独立实现它并验证你的结果。

简单经济学例子（Boltzmann-Gibbs 分布）： 500 只 turtles，每只初始 \$100。每个时间步：给随机另一只 \$1。测量：随时间的财富分布。惊人结果：约 10,000 步后，财富变成指数分布，匹配统计物理中的 Boltzmann-Gibbs 定律！这仅从一条简单规则涌现——不需要外部不平等，熵驱动它。这说明涌现为什么重要：数学推导不明显。

调度很重要：如果所有代理同步更新，你可能看到振荡；如果异步（随机顺序），系统平滑不同。简单选择，巨大后果。

▶ P Programming Basics & NetLogo编程基础与 NetLogo

NetLogo Components, Variables, Functions, Control FlowNetLogo 组件、变量、函数、控制流

NetLogo is a free agent-based modeling language. Understanding its core data structures is essential:

Core Components

Patches: Stationary grid cells (the environment). Each patch at a unique (x, y) location. Patches can have state (e.g., color, pheromone level, resource). One patch per grid location.
Turtles: Mobile agents. Can move, die, reproduce, change state. Possess location, direction, and attributes. Can form networks via links.
Links: Connections between turtles. Can be directed (one-way, e.g., "follows") or undirected (mutual). Enable network structures.
Observer: Watches the world, runs the global simulation loop. Can initialize and measure outcomes.

Space and Topology

Toroidal (wraparound): Edges wrap — a turtle at x=100 moving right appears at x=-100. Good for avoiding boundary effects.
Bounded: World has hard edges; turtles can't move beyond. Good for modeling real geographical boundaries.

Variables and Scope

Global variables: accessible by all agents and observer. Declare: globals [wealth-total wealth-average]
Agent-level variables (properties): unique per turtle or patch. Declare: turtles-own [money opinion]
Local variables: temporary, exist only in a procedure. Scope is limited.
Parameters: global variables you adjust for each run (e.g., population-size, cooperation-probability). Often displayed as sliders in the interface.

Procedures and Reporters

Procedures: routines that perform actions. E.g., to move end. Executed by agents or observer.
Reporters: functions that return a value. E.g., to-report count-neighbors ... end. Used to compute statistics.

Control Flow

Loops:
- repeat n [ ... ] — repeat block n times
- loop [ ... ] — infinite loop (use break to exit)
- foreach list [ ... ] — iterate over list
- ask turtles [ ... ] — parallel loop: each turtle executes block
Conditionals:
- if condition [ ... ] — execute if true
- ifelse condition [ ... ] [ ... ] — if-else
- ifelse-value condition [ value-if-true ] [ value-if-false ] — return a value conditionally

Primitives and the User Manual: NetLogo has ~300 built-in functions (primitives) for math, list operations, string handling, random number generation, network operations, etc. The official User Manual is your best resource—it's well-indexed and includes examples for nearly every primitive.

Minimal Example: To simulate flocking, each turtle (bird) looks at nearby neighbors, computes average position and direction, and moves toward them. Simple rules → collective motion. The loops ask agents in parallel, conditionals check if neighbors are in range.

NetLogo 是一个免费的基于代理的建模语言。理解其核心数据结构是必要的：

核心组件

Patches（补丁）：固定网格单元格（环境）。每个补丁在唯一的 (x, y) 位置。补丁可以有状态（例如，颜色、信息素水平、资源）。每个网格位置一个补丁。
Turtles（乌龟）：移动代理。可以移动、死亡、繁殖、改变状态。拥有位置、方向和属性。可以通过链接形成网络。
Links（链接）：乌龟之间的连接。可以是有向的（单向，例如"跟随"）或无向的（相互的）。实现网络结构。
Observer（观察者）：监视世界，运行全球模拟循环。可以初始化和测量结果。

空间和拓扑

环形（围绕）：边缘环绕——位于 x=100 的乌龟向右移动会出现在 x=-100。好处是避免边界效应。
有界：世界有硬边界；乌龟不能移动超出。适合模拟真实地理边界。

变量和作用域

全局变量：所有代理和观察者都可以访问。声明：globals [wealth-total wealth-average]
代理级变量（属性）：每个乌龟或补丁独有。声明：turtles-own [money opinion]
本地变量：临时的，仅在程序中存在。作用域受限。
参数：全局变量，每次运行调整（例如，population-size, cooperation-probability）。通常显示为界面中的滑块。

程序和报告器

程序：执行操作的例程。例如，to move end。由代理或观察者执行。
报告器：返回值的函数。例如，to-report count-neighbors ... end。用于计算统计数据。

控制流

循环：
- repeat n [ ... ] — 重复块 n 次
- loop [ ... ] — 无限循环（使用 break 退出）
- foreach list [ ... ] — 遍历列表
- ask turtles [ ... ] — 并行循环：每只乌龟执行块
条件语句：
- if condition [ ... ] — 如果真则执行
- ifelse condition [ ... ] [ ... ] — if-else
- ifelse-value condition [ value-if-true ] [ value-if-false ] — 条件返回值

原语和用户手册： NetLogo 有约 300 个内置函数（原语）用于数学、列表操作、字符串处理、随机数生成、网络操作等。官方用户手册是你最好的资源——索引清晰，几乎每个原语都有例子。

最小例子：要模拟群集，每只乌龟（鸟）查看附近邻居，计算平均位置和方向，向其移动。简单规则 → 集体运动。循环并行要求代理，条件语句检查邻居是否在范围内。

▶ S Segregation & Simulation (Schelling)隔离与模拟（谢林模型）

Schelling's Segregation Model谢林隔离模型

Thomas Schelling (1971) showed that even mild individual preferences for similar neighbors — not organized discrimination — can produce extreme segregation at the population level. The model: agents of two types on a grid, each with a tolerance threshold for local minority status. If unhappy, they move. The stunning result: even when every agent would be happy with 33% similar neighbors, the outcome is near-complete segregation. This is emergence — macro-level patterns arising from micro-level interactions that no individual intended.

Tolerance threshold: if fraction of similar neighbors ≥ τ → happy; else → move

Neighborhood Definition: Moore vs Von Neumann

Moore neighborhood: 8 cells around your location (including diagonals). More realistic for social interaction; agents see more.
Von Neumann neighborhood: 4 cells (up, down, left, right only). More restrictive; agents see fewer neighbors.
Radius: you can extend to "radius-2" (e.g., 24 cells) or larger. Bigger radius = more neighbors to consider = different dynamics.

Key Model Parameters and Results

Density: fraction of occupied cells. Lower density = more empty space = easier to find a compatible patch. Counterintuitive: lower density paradoxically leads to MORE segregation (less interaction opportunity).
Similarity threshold (τ): What fraction of neighbors must be similar for happiness? Even τ = 0.30 (30% similar) produces strong segregation. At τ = 0.14, some segregation emerges; below that, mixed neighborhoods persist.
Results: Schelling's key finding: segregation is much more extreme than anyone's preferences would predict. A benign preference (like, "I want ~50% of my neighbors to be like me") produces near-total spatial separation.

Batch Runs and Parameter Sweeps

To understand a model, you don't run it once. You run it many times with different parameters:

Batch runs: Repeat the same parameter set multiple times (accounting for randomness). Example: run segregation model 10 times with τ = 0.3, same random seed. Measure average final segregation.
Parameter sweep: Systematically vary one or more parameters. Example: τ from 0.1 to 0.9 in steps of 0.1, for each value run 10 times, measure segregation level. Plot: y-axis = final segregation, x-axis = threshold τ.
Curse of dimensionality: if you have many parameters, the parameter space explodes. Example: 5 parameters × 10 values each = 100,000 runs needed to explore the space exhaustively. Often you focus on a few key parameters or use optimization/sensitivity analysis.

Variant: Contented Loners. Some agents (say 10%) are "contented loners"—they're happy anywhere, don't care about neighbors. Surprising result: the presence of even a small fraction of loners can dramatically reduce segregation, because they're willing to live in mixed neighborhoods and "anchor" them. This shows how subtle individual differences can change macro-level outcomes.

Schelling's model is like a mirror: it doesn't require intentional discrimination to produce segregation, just individual choices that seem innocuous. The surprising gap between micro intentions and macro outcomes is what makes agent-based models valuable.

托马斯·谢林（1971）证明，即使是温和的个人偏好——而非有组织的歧视——也能在群体层面产生极端的隔离。模型：两种类型的代理在网格上，每个都有对本地少数群体状态的容忍阈值。如果不满意就搬家。惊人的结果：即使每个代理在33%相似邻居时就满足了，最终结果仍然是近乎完全的隔离。这就是涌现——个体层面的互动产生了没有任何个体预期的宏观模式。

容忍阈值：如果相似邻居比例 ≥ τ → 满意；否则 → 搬家

邻域定义：Moore vs Von Neumann

Moore 邻域：你位置周围的 8 个单元格（包括对角线）。对社会互动更现实；代理看到更多。
Von Neumann 邻域：4 个单元格（仅上下左右）。更受限；代理看到更少邻居。
半径：你可以扩展到"半径-2"（例如 24 个单元格）或更大。更大半径 = 更多邻居考虑 = 不同动态。

关键模型参数和结果

密度：被占据单元格的比例。低密度 = 更多空地 = 更容易找到兼容补丁。反直觉：低密度悖论上导致更多隔离（更少互动机会）。
相似度阈值 (τ)：多少比例的邻居必须相似才能满意？即使 τ = 0.30（30% 相似）也产生强隔离。在 τ = 0.14 时，出现一些隔离；低于此，混合邻域持续。
结果：谢林的关键发现：隔离比任何人的偏好预测都极端得多。温和偏好（例如，"我希望 ~50% 的邻居像我"）导致近乎完全的空间分离。

批量运行和参数扫描

要理解一个模型，你不能只运行一次。你用不同参数多次运行：

批量运行：用相同参数集重复多次（考虑随机性）。例如：用 τ = 0.3、相同随机种子运行隔离模型 10 次。测量平均最终隔离。
参数扫描：系统地变化一个或多个参数。例如：τ 从 0.1 到 0.9，步长 0.1，每个值运行 10 次，测量隔离水平。绘图：y 轴 = 最终隔离，x 轴 = 阈值 τ。
维度诅咒：如果有很多参数，参数空间爆炸。例如：5 个参数 × 10 个值 = 需要 100,000 次运行来穷尽探索。通常你关注少数关键参数或使用优化/敏感性分析。

变体：满足的独居者。一些代理（比如 10%）是"满足的独居者"——他们在任何地方都满意，不在意邻居。惊人结果：即使只有少量独居者，也能显著减少隔离，因为他们愿意住在混合邻域并"锚定"它们。这表明微妙的个体差异如何改变宏观结果。

谢林的模型就像一面镜子：它不需要有意的歧视来产生隔离，只需要看似无害的个人选择。微观意图和宏观结果之间的惊人差距是什么使基于代理的模型有价值。

▶ A Cellular Automata元胞自动机

1-D & 2-D CA, Rules, Emergence, Game of Life一维和二维元胞自动机、规则、涌现、生命游戏

Cellular automata (CA) are grids of cells where each cell has a state (e.g., 0 or 1, dead or alive, color), and cells update synchronously based on the states of their neighbors. The simplicity is deceptive: CA can generate stunning complexity from trivial rules.

One-Dimensional Cellular Automata

Simplest case: a row of cells, each state 0 or 1. New state depends on cell and two neighbors. There are 2³ = 8 possible neighborhoods, so 2⁸ = 256 possible rules. Stephen Wolfram catalogued them:

Rule 90: XOR rule (new state = left XOR right). Produces Sierpinski triangle patterns—fractal geometry from iterations. Simple rule, complex pattern.
Rule 30: A "chaotic" rule. Produces intricate, hard-to-predict patterns. Used in cryptography. Random-looking despite deterministic origins.

Conway's Game of Life (2-D)

John Conway (1970). A 2-D grid where each cell is alive or dead. Each tick, cell state updates based on neighbor count:

- Underpopulation: <2 live neighbors → cell dies
- Survival: 2–3 live neighbors → cell survives (or stays alive)
- Overpopulation: >3 live neighbors → cell dies
- Reproduction: exactly 3 live neighbors → dead cell becomes alive

Remarkable emergent patterns:

Still lifes: stable configurations (e.g., blocks, beehives). Don't change.
Oscillators: patterns that cycle (e.g., blinkers cycle every 2 steps). Low period.
Spaceships: patterns that move across the grid. Most famous: the glider, a 5-cell pattern that moves diagonally and repeats. Gliders can "collide" and interact—Life is essentially a universal computer.

Why Game of Life matters: It demonstrates that simple, deterministic, local rules can generate elaborate behaviors—growth, decay, interaction, apparent "goals." The rules are non-chaotic for some initial conditions but chaotic for others, showing sensitivity to initial conditions.

Emergence in Cellular Automata

CA beautifully illustrate emergence: local rules don't mention anything about "patterns," "gliders," or "fractals," yet these phenomena arise inevitably. This is the core of why ABM is powerful—rules at the micro level produce macro-level phenomena you didn't explicitly program. The patterns exist in the possibility space, waiting to unfold.

CA is like watching sand pile up grain by grain. Each grain follows gravity (a local rule), but pile size, shape, and avalanche dynamics emerge naturally—you never told any grain about piles or avalanches.

Practical use: CA models have been used to simulate traffic flow, forest fires, epidemics, crystal growth. The discrete time steps and local neighborhood logic fit naturally. Wolfram's 1-D CA analysis revealed a spectrum: some rules converge to fixed patterns (class 1), some to oscillations (class 2), some are chaotic (class 3), some are complex and hard to predict (class 4). Class 4 is "the edge of chaos"—where computation-universal systems live.

元胞自动机 (CA) 是网格化单元格，每个单元格有一个状态（例如，0 或 1，死或活，颜色），单元格基于邻居状态同步更新。简单性是欺骗性的：CA 可以从微不足道的规则生成惊人的复杂性。

一维元胞自动机

最简单的情况：一行单元格，每个状态 0 或 1。新状态取决于单元格和两个邻居。有 2³ = 8 个可能的邻域，所以 2⁸ = 256 个可能的规则。Stephen Wolfram 对其进行了分类：

规则 90：XOR 规则（新状态 = 左 XOR 右）。产生 Sierpinski 三角形模式——迭代中的分形几何。简单规则，复杂模式。
规则 30："混沌"规则。产生错综复杂、难以预测的模式。用于密码学。看起来随机，尽管源于确定性。

康威的生命游戏（2-D）

John Conway（1970）。二维网格，每个单元格活或死。每个时刻，单元格状态基于邻居计数更新：

- 人口不足：<2 个活邻居 → 单元格死亡
- 生存：2–3 个活邻居 → 单元格生存（或保持活）
- 人口过剩：>3 个活邻居 → 单元格死亡
- 繁殖：恰好 3 个活邻居 → 死单元格变活

显著涌现的模式：

静物：稳定配置（例如，方块、蜂房）。不改变。
振荡器：循环的模式（例如，闪烁每 2 步循环一次）。低周期。
宇宙飞船：跨越网格移动的模式。最著名：滑翔机，一个 5 单元格模式，斜向移动并重复。滑翔机可以"碰撞"并互动——生命本质上是通用计算机。

为什么生命游戏重要：它演示简单、确定性、局部规则可以生成复杂行为——增长、衰退、互动、明显的"目标"。规则对某些初始条件非混沌，对其他条件混沌，显示对初始条件的敏感性。

元胞自动机中的涌现

CA 优美地说明了涌现：局部规则不涉及"模式"、"滑翔机"或"分形"，但这些现象不可避免地出现。这是为什么 ABM 强大的核心——微观层规则产生你从未明确编程的宏观现象。模式存在于可能性空间中，等待展开。

CA 就像观看沙子逐粒堆积。每粒沙子遵循重力（局部规则），但堆大小、形状和雪崩动态自然涌现——你永远没告诉任何沙子关于堆或雪崩。

实际用途：CA 模型已用于模拟交通流、森林火灾、流行病、晶体生长。离散时间步和局部邻域逻辑自然适合。Wolfram 的一维 CA 分析揭示了频谱：一些规则收敛到固定模式（类 1），一些到振荡（类 2），一些混沌（类 3），一些复杂难预测（类 4）。类 4 是"混沌边缘"——通用计算系统所在的地方。

▶ Q2 How Do Agents Interact Strategically?行为者如何进行策略互动？ Game Theory博弈论

▶ G Classical Game Theory经典博弈论

Normal-Form Games, Nash Equilibrium, Dilemmas标准式博弈、纳什均衡、困境

Game theory studies strategic interaction: your best choice depends on what others do. A normal-form game has players, strategies, and payoffs. Nash Equilibrium (NE) is a set of strategies where no player can do better by changing unilaterally. The Prisoner's Dilemma shows why rational individuals might not cooperate even when mutual cooperation would be best: defection dominates regardless of what the other does. In coordination games (like driving on the left or right), multiple NE exist and the challenge is selecting one.

Prisoner's Dilemma: Payoff Matrix

Two players, each can Cooperate (C) or Defect (D). Payoffs: (row, column) in each cell.

Prisoner's Dilemma

	C	D
C	(3, 3)	(0, 5)
D	(5, 0)	(1, 1)

Dominant strategy: D. If opponent cooperates, you get 5 (vs 3). If opponent defects, you get 1 (vs 0). D is better in both cases.

Finding Nash Equilibrium

Step 1: Assume opponent plays C. Your best response? D (get 5 vs 3).

Step 2: Assume opponent plays D. Your best response? D (get 1 vs 0).

Conclusion: D is your best response to both moves. Same logic for opponent.

NE is (D, D): both defect, earn (1, 1). But (C, C) is better for both! This is the dilemma.

Coordination Games: Multiple Equilibria

Not all games have a unique NE. Battle of the Sexes: Couple wants to spend evening together but disagree on activity. Woman prefers opera (O), man prefers football (F).

Equilibrium 1: Both go to opera. (Woman gets high payoff, man low, but together is better than apart.)
Equilibrium 2: Both go to football. (Opposite.)
Equilibrium 3: Mixed strategy: woman goes to opera with probability p, football with probability 1−p; man chooses the opposite. If indifferent, p = 2/3.

Mixed Strategy Nash Equilibrium

Sometimes players randomize. Example: Stag Hunt. Two hunters can hunt Stag (risky, requires cooperation, payoff 4 each if both try) or Hare (safe, payoff 3 each, solo). If you hunt Stag and opponent hunts Hare, you starve (payoff 0). Pure NE: both hunt Hare (safe). But there's a mixed NE where players randomize, trying Stag sometimes and catching Hare otherwise.

Hawk-Dove Game

Two animals contest a resource (value V). Hawk: attacks; Dove: shares or flees.

Hawk vs Dove: Hawk gets V, Dove gets 0.
Hawk vs Hawk: fight, each gets (V − C)/2 where C = cost of injury. If C > V, both lose.
Dove vs Dove: split peacefully, each gets V/2.

If C > V (fighting is costly), no pure strategy is always best. NE is mixed: population plays Hawk with probability p, Dove with 1−p. Stable mix depends on C and V. If injury risk is high (C >> V), Doves dominate. If low (C << V), Hawks dominate. At intermediate risk, you get a mixed ESS where Hawks and Doves coexist.

Real example: Animal communication. Some individuals produce honest signals (costly), others cheat. In equilibrium, both strategies can persist if frequencies adjust: if signalers are rare, receivers trust them; if cheaters become common, signals devalue.

博弈论研究策略互动：你的最优选择取决于别人怎么做。标准式博弈包含玩家、策略和收益。纳什均衡是一组策略，其中没有玩家能通过单方面改变策略来获得更好的结果。囚徒困境展示了为什么理性个体可能不合作，即使互相合作对双方都最好：无论对方怎么做，背叛都是优势策略。在协调博弈中（比如靠左还是靠右开车），存在多个纳什均衡，挑战在于如何选择其中一个。

囚徒困境：收益矩阵

两个玩家，每个可以合作 (C) 或背叛 (D)。收益：每个单元格 (行, 列)。

囚徒困境

	C	D
C	(3, 3)	(0, 5)
D	(5, 0)	(1, 1)

优势策略： D。如果对手合作，你得 5（vs 3）。如果对手背叛，你得 1（vs 0）。D 在两种情况下都更好。

寻找纳什均衡

步骤 1：假设对手玩 C。你的最优应对？D（得 5 vs 3）。

步骤 2：假设对手玩 D。你的最优应对？D（得 1 vs 0）。

结论：D 是你对两个动作的最优应对。对手的逻辑相同。

NE 是 (D, D)：双方背叛，收益 (1, 1)。但 (C, C) 对双方都更好！这就是困境。

协调博弈：多个均衡

并非所有博弈都有唯一的 NE。夫妻的战争：夫妇想一起度过晚上，但对活动不同意。女性偏好歌剧 (O)，男性偏好足球 (F)。

均衡 1：都去歌剧。（女性得高收益，男性低，但在一起总比分开好。）
均衡 2：都去足球。（相反。）
均衡 3：混合策略：女性以概率 p 去歌剧，以概率 1−p 去足球；男性选相反。若无差别，p = 2/3。

混合策略纳什均衡

有时玩家随机化。例如：鹿狩。两个猎人可以猎鹿（风险高，需要合作，如果双方尝试收益 4）或兔子（安全，收益 3，独自）。如果你猎鹿而对手猎兔，你挨饿（收益 0）。纯 NE：都猎兔（安全）。但有混合 NE，玩家随机化，有时尝试鹿，有时捕捉兔子。

鹰鸽博弈

两只动物争夺资源（价值 V）。鹰：攻击；鸽：共享或逃离。

鹰 vs 鸽：鹰得 V，鸽得 0。
鹰 vs 鹰：争斗，各得 (V − C)/2，其中 C = 伤害代价。如果 C > V，双方都亏。
鸽 vs 鸽：和平共享，各得 V/2。

如果 C > V（争斗昂贵），没有纯策略总是最优。NE 是混合：群体以概率 p 玩鹰，以 1−p 玩鸽。稳定混合取决于 C 和 V。如果伤害风险高 (C >> V)，鸽占主导。如果低 (C << V)，鹰占主导。在中等风险，你得到混合 ESS，鹰和鸽共存。

真实例子：动物交流。一些个体产生诚实信号（昂贵），其他欺骗。在均衡，如果频率调整，两种策略都能持续：如果信号者稀有，接收者信任；如果欺骗者变普通，信号贬值。

▶ E Evolutionary Game Theory演化博弈论

Replicator Dynamics, ESS, Cooperation Evolution复制者方程、进化稳定策略、合作演化

Classical game theory assumes rational agents. Evolutionary game theory drops this assumption: instead, strategies that produce higher payoffs spread through the population (via reproduction or imitation). The replicator equation describes how strategy frequencies change over time. An Evolutionarily Stable Strategy (ESS) is a strategy that, once prevalent, cannot be invaded by a rare mutant. Cooperation can evolve through mechanisms like kin selection, reciprocity, group selection, and spatial structure.

Replicator equation: dx_i/dt = x_i(f_i - f̄), where f_i = average payoff of strategy i

Understanding the Replicator Equation

The replicator equation says: strategy i's growth rate is proportional to how much better it does than the population average. If f_i > f̄ (above average), x_i increases (spreads). If f_i < f̄ (below average), x_i decreases (dies out). The dynamics are driven by relative fitness not absolute payoff—even if all payoffs are positive, only above-average strategies grow.

Evolutionarily Stable Strategy (ESS)

An ESS is a strategy that cannot be invaded by a rare mutant playing a different strategy. Formally: if proportion (1−ε) play ESS and ε play invader, the ESS player does better. Example from Hawk-Dove:

If population is all Hawks (p=1): Hawk gets (V−C)/2 each; a rare Dove gets 0 (loses fights). Hawk is ESS if C > V.
If population is all Doves (p=0): Dove gets V/2; rare Hawk gets V (beats all Doves). Hawk invades, so Dove alone is not ESS if C < V.
Mixed ESS: At the right frequency p*, Hawks get same payoff as Doves. Population stabilizes; neither can invade because invaders do worse than residents.

Axelrod's Tournament: Evolution of Cooperation

Robert Axelrod (1984) ran a computational tournament of strategies playing Iterated Prisoner's Dilemma (repeat the game many times). Surprising result: Tit-for-Tat (cooperate first, then copy opponent's last move) beat sophisticated strategies. Why?

Nice: TfT never initiates defection.
Retaliatory: TfT punishes defectors immediately.
Forgiving: TfT returns to cooperation if opponent does.
Simple: easy to understand and predict.

Lesson: in repeated games, reputation and reciprocal punishment matter. Long shadows of the future (high iteration count) favor cooperation.

Mechanisms for Evolution of Cooperation

Kin selection (Hamilton's rule): Help relatives because they share genes. Rule: rb > c, where r = relatedness, b = benefit to recipient, c = cost to actor. Explains altruism toward kin.
Direct reciprocity: Repeat encounters with same partner. Remember history. Tit-for-Tat works here.
Spatial structure: Agents clustered geographically; cooperation spreads locally. Spatial games show that even in PD, cooperation clusters can thrive and outcompete defectors.
Group selection: Groups of cooperators outcompete groups of defectors, even if within groups, defectors do better. Controversial but supported in some models.
Indirect reciprocity & reputation: Help those with good reputations. Requires tracking information about others. Enables "upstream reciprocity" (help others, build reputation, receive later help).

Public goods game: N agents get endowment M. Each contributes c (0 to M) to a pool, which multiplies by factor r > 1, then is divided equally. Individual incentive: contribute 0 (free-ride), but if all do, nothing multiplies. Group incentive: all contribute. With punishment (costly but can fine free-riders), cooperation is sustained. The threat of punishment stabilizes cooperation.

经典博弈论假设行为者是理性的。演化博弈论放弃了这个假设：产生更高收益的策略会通过种群传播（通过繁殖或模仿）。复制者方程描述了策略频率如何随时间变化。演化稳定策略（ESS）是一种一旦占据主导就不会被稀有突变策略入侵的策略。合作可以通过亲缘选择、互惠、群体选择和空间结构等机制演化出来。

复制者方程：dx_i/dt = x_i(f_i - f̄)，其中 f_i = 策略 i 的平均收益

理解复制者方程

复制者方程说：策略 i 的增长率与它比群体平均好多少成正比。如果 f_i > f̄（高于平均），x_i 增加（传播）。如果 f_i < f̄（低于平均），x_i 减少（消亡）。动态由相对适应度驱动，而不是绝对收益——即使所有收益都是正，只有高于平均的策略增长。

演化稳定策略 (ESS)

ESS 是一种不能被玩不同策略的稀有突变入侵的策略。形式上：如果比例 (1−ε) 玩 ESS，ε 玩入侵者，ESS 玩家做得更好。来自鹰鸽的例子：

如果群体全是鹰 (p=1)：鹰各得 (V−C)/2；稀有鸽得 0（输得争斗）。如果 C > V，鹰是 ESS。
如果群体全是鸽 (p=0)：鸽得 V/2；稀有鹰得 V（击败所有鸽）。鹰入侵，所以如果 C < V，仅鸽不是 ESS。
混合 ESS：在正确频率 p*，鹰得与鸽相同收益。群体稳定；都不能入侵因为入侵者比居民做得更差。

Axelrod 的竞赛：合作演化

Robert Axelrod（1984）举办了一场计算竞赛，策略玩迭代囚徒困境（多次重复游戏）。惊人结果：以牙还牙（先合作，然后复制对手上一步）击败了复杂策略。为什么？

好的：TfT 从不发起背叛。
报复的：TfT 立即惩罚背叛者。
宽容的：如果对手合作，TfT 回归合作。
简单的：容易理解和预测。

教训：在重复游戏中，声誉和互惠惩罚很重要。长未来阴影（高迭代次数）有利于合作。

合作演化的机制

亲缘选择（Hamilton 规则）：帮助亲属因为他们共享基因。规则：rb > c，其中 r = 亲缘系数，b = 收益人收益，c = 施主成本。解释对亲属的利他。
直接互惠：与同一伙伴重复相遇。记住历史。以牙还牙在这里工作。
空间结构：代理在地理上集聚；合作本地传播。空间博弈显示即使在 PD 中，合作团簇也能茁壮并击败背叛者。
群体选择：合作者群体击败背叛者群体，即使在群体内背叛者做得更好。有争议但在一些模型中得到支持。
间接互惠与声誉：帮助声誉好的人。需要追踪他人信息。实现"上游互惠"（帮助他人，建立声誉，后来受到帮助）。

公共商品博弈：N 个代理得禀赋 M。每个贡献 c（0 到 M）到池，乘以因子 r > 1，然后平分。个体激励：贡献 0（搭便车），但如果都这样，什么都不乘。群体激励：都贡献。有惩罚（昂贵但能罚免费骑者），合作维持。惩罚的威胁稳定合作。

▶ C Costly Signaling代价信号

Signaling Theory and Honest Signals信号论与诚实信号

How can you trust someone's claims about their qualities? Costly signaling theory (Zahavi, Spence) shows that signals can be honest when they are expensive to fake. A peacock's tail is costly: only genuinely fit males can afford one. Similarly, a college degree signals ability partly because it's costly to obtain. The key insight: the cost of the signal must be differentially higher for low-quality signalers.

Spence's Job Market Signaling Model

The Problem: Employers can't observe worker ability directly. Workers claim to be "highly productive" but employers don't know if they're lying. If employers can't distinguish, they offer the same wage to everyone (pooling). High-ability workers lose (underpaid) and low-ability gain (overpaid).

The Solution: High-ability workers acquire education as a signal. Education is costly, but less costly for high-ability workers (they learn faster, enjoy school more). If the cost differential is large enough:

High-ability: education benefit > cost. Gets hired, earns high wage.
Low-ability: education cost > benefit (they'd struggle). Better not to signal; gets hired as low-ability, earns low wage.

In equilibrium, education is a separating signal: high-ability separates from low-ability by signaling. Employers observe education and infer ability correctly. The signal is honest not because it creates ability, but because it's costly enough that only high-ability would incur it.

Zahavi's Handicap Principle

Amotz Zahavi (1975) proposed: conspicuous, costly traits (handicaps) honestly signal quality. Why? Because only high-quality individuals can afford such handicaps. A peacock's tail (bright, large, hard to fly away from predators with) honestly signals genetic fitness—only healthy males can grow and maintain such expensive ornaments. The paradox resolved: the handicap is honest precisely because it's costly and wasteful. Low-quality males can't afford to waste that much.

Separating vs Pooling Equilibria

Separating Equilibrium

Different types (high/low ability) take different actions.
High-ability signals; low-ability doesn't.
Receivers can tell types apart.
Signal is costly enough to discourage imitation.
Example: college degree distinguishes productive from unproductive workers.

Pooling Equilibrium

All types take the same action (no signaling).
Receivers can't distinguish types.
Wage/price reflects average quality.
High-quality lose (underpaid).
Example: no one buys a degree because employers can't use it to infer ability.

Multiple equilibria can coexist: If a signal's cost is intermediate, you might have separating and pooling equilibria. Which equilibrium occurs depends on history and expectations. This is why signaling markets can be fragile and subject to information cascades.

Real-world examples:

Potlatch ceremonies (Pacific Northwest tribes): Chiefs signal status by giving away wealth. The more you give, the higher your status. High-wealth chiefs afford this; low-wealth cannot.
Religious costly rituals (Iannaccone): Strict religious groups impose costly rules (dietary restrictions, time investment, distinctive dress). This filters for genuine believers; free-riders leave because costs are too high.
Academic publishing: Researchers signal expertise by publishing in top journals (costly: hard work, rejection, time). Signal quality is partly about actual knowledge but also about ability to navigate academic gatekeepers.

你怎么能相信别人关于自己品质的声明？代价信号理论（Zahavi, Spence）指出，当信号难以伪造时，它就是诚实的。孔雀的尾巴是昂贵的：只有真正健康的雄性才负担得起。类似地，大学学位之所以能传递能力信号，部分原因是它获取成本高。关键洞察：信号的成本对低质量发送者来说必须不成比例地更高。

Spence 的就业市场信号模型

问题：雇主不能直接观察工人能力。工人声称"高产能力"但雇主不知道他们是否说谎。如果雇主不能区分，他们对所有人提供相同工资（混合）。高能力工人亏（薪酬不足）且低能力得利（薪酬过多）。

解决方案：高能力工人获得教育作为信号。教育是昂贵的，但对高能力工人成本更低（他们学得快，更享受学校）。如果成本差异足够大：

高能力：教育利益 > 成本。被雇用，挣高工资。
低能力：教育成本 > 利益（他们会挣扎）。更好不信号；作为低能力被雇用，挣低工资。

在均衡，教育是分离信号：高能力通过信号与低能力分离。雇主观察教育并正确推断能力。信号是诚实的，不是因为它创造能力，而是因为只有高能力会承担它的成本。

Zahavi 的伤残原理

Amotz Zahavi（1975）提议：显眼的、昂贵的特征（伤残）诚实地信号质量。为什么？因为只有高质量个体才能承受这样的伤残。孔雀的尾巴（亮、大、难以逃离捕食者）诚实地信号遗传适应度——只有健康的雄性才能生长和维持这样昂贵的装饰。悖论解决：伤残诚实恰恰因为它是昂贵和浪费的。低质量雄性不能承受那么多浪费。

分离与混合均衡

分离均衡

不同类型（高/低能力）采取不同行动。
高能力信号；低能力不。
接收者可以区分类型。
信号成本足够高以阻止模仿。
例子：大学学位区分有生产力和无生产力工人。

混合均衡

所有类型采取相同行动（无信号）。
接收者不能区分类型。
工资/价格反映平均质量。
高质量亏（薪酬不足）。
例子：没人买学位因为雇主不能用它推断能力。

多个均衡可以共存：如果信号成本中等，你可能有分离和混合均衡。哪个均衡发生取决于历史和预期。这就是为什么信号市场是脆弱的，易受信息级联。

真实世界例子：

Potlatch 仪式（太平洋西北部落）：酋长通过赠送财富来信号地位。你赠送得越多，地位越高。高财富酋长能承受；低财富不能。
宗教昂贵仪式（Iannaccone）：严格宗教团体施加昂贵规则（饮食限制、时间投资、独特服装）。这过滤出真正信徒；搭便车者因成本太高而离开。
学术出版：研究者通过在顶级期刊发表来信号专业知识（昂贵：艰苦、拒稿、时间）。信号质量部分关于实际知识，也关于导航学术守门人的能力。

▶ Q3 How Does Behavior Spread & Evolve?行为如何传播和演化？ Dynamics动态过程

▶ C Contagion (SIR Models)传染（SIR 模型）

SIR Model, Basic Reproduction Number, Epidemic ThresholdSIR 模型、基本再生数、流行阈值

The SIR model (Kermack-McKendrick) divides a population into Susceptible, Infected, and Recovered. When susceptible individuals contact infected ones, they become infected with probability τ; infected individuals recover with rate γ. The basic reproduction number R₀ = τ/γ determines whether an epidemic occurs: R₀ > 1 means the disease spreads. Agent-based versions allow spatial structure: fast-moving agents cause faster spreading.

dS/dt = -τSI, dI/dt = τSI - γI, dR/dt = γI

SIR Model Variants

SIR (Susceptible → Infected → Recovered): Assumes permanent immunity after recovery (e.g., measles, chickenpox). Once you recover, you're protected forever.
SIS (Susceptible → Infected → Susceptible): No immunity. Recovered individuals become susceptible again (e.g., cold, flu). Disease persists at endemic equilibrium—neither dies out nor explodes.
SIRS (Susceptible → Infected → Recovered → Susceptible): Temporary immunity. Recovered slowly lose immunity, become susceptible again. Models waning immunity or immune escape (variant strains).
SEIR (add Exposed): Individuals who are infected but not yet infectious (incubation period). More realistic for diseases with latency.

Epidemic Threshold and R₀ (Basic Reproduction Number)

R₀ = expected number of secondary infections from one infected individual in a fully susceptible population. In a population of N individuals, if an infected person contacts others with rate k, transmission probability per contact is β, and they're infectious for 1/γ time:

R₀ = β × k × (1/γ)

Interpretation:

R₀ > 1: epidemic (exponential spread initially)
R₀ = 1: critical threshold (disease endemic at low level)
R₀ < 1: disease dies out

Example: COVID-19 early estimates: R₀ ≈ 2-3 (each infected infects 2-3 others). Flu: R₀ ≈ 1-2. Measles: R₀ ≈ 12-18 (highly contagious). To prevent epidemic, vaccination must reduce susceptible fraction below 1 − 1/R₀ (e.g., for R₀ = 3, vaccinate 2/3 of population).

Agent-Based SIR with Spatial Structure

Mathematical SIR assumes homogeneous mixing (everyone meets everyone equally). Agent-based versions are more realistic:

Spatial proximity: Agents move on a grid; only infect nearby neighbors.
Network structure: Transmission only along explicit links (contact network).
Movement speed: Fast-moving agents spread disease further; slow agents contain it locally (quarantine-like effect).
Heterogeneous contact rates: Some agents meet more people (hubs); others are isolated. Spatial clustering can slow epidemic even if R₀ is high.

Continuous-time vs discrete-time: Mathematical models often use continuous differential equations (above). Agent-based models typically use discrete time steps (each tick, agents have chance to infect neighbors). Results can differ! Discrete models might overestimate spread if infection probability is per-tick instead of per-contact.

COVID-19 real-world application: Early models with R₀ ≈ 2.5 predicted exponential growth (doubling every ~3 days). But contact tracing, quarantine, and behavioral changes reduced effective contacts, lowering k in the R₀ formula. Vaccination + natural immunity reduced β. These public health interventions brought effective R below 1, controlling the epidemic. Non-disease example: social media virality. Information spreads like SIR: susceptible users who see a post become infected (exposed), share it to contacts (infectious), eventually the meme dies out (recovered). R₀ for viral content depends on engagement and algorithm amplification.

SIR 模型（Kermack-McKendrick）将群体分为易感者（S）、感染者（I）和康复者（R）。易感者接触感染者后以概率 τ 被感染；感染者以速率 γ 康复。基本再生数 R₀ = τ/γ 决定是否发生流行：R₀ > 1 意味着疾病传播。基于代理的版本允许空间结构：移动速度快的代理导致更快的传播。

dS/dt = -τSI, dI/dt = τSI - γI, dR/dt = γI

SIR 模型变体

SIR（易感 → 感染 → 康复）：假设康复后永久免疫（例如，麻疹、水痘）。一旦康复，你永远受保护。
SIS（易感 → 感染 → 易感）：无免疫。康复个体再次变易感（例如，感冒、流感）。疾病在地方性均衡持续——既不消亡也不爆炸。
SIRS（易感 → 感染 → 康复 → 易感）：临时免疫。康复逐渐失去免疫，再次变易感。模型免疫衰减或免疫逃逸（变异株）。
SEIR（加暴露）：已感染但尚非传染的个体（潜伏期）。对有潜伏期的疾病更现实。

流行阈值和 R₀（基本再生数）

R₀ = 在完全易感群体中，一个感染者产生的预期二次感染数。在 N 个体的群体中，如果感染者接触其他人速率 k，每次接触传播概率 β，他们的传染期是 1/γ：

R₀ = β × k × (1/γ)

解释：

R₀ > 1：流行（初期指数增长）
R₀ = 1：临界阈值（疾病在低水平地方性）
R₀ < 1：疾病消亡

例子：COVID-19 早期估计：R₀ ≈ 2-3（每个感染者感染 2-3 个）。流感：R₀ ≈ 1-2。麻疹：R₀ ≈ 12-18（高度传染性）。为防止流行，疫苗接种必须使易感比例低于 1 − 1/R₀（例如，R₀ = 3，接种 2/3 群体）。

空间结构的基于代理的 SIR

数学 SIR 假设同质混合（每个人平等与所有人相遇）。基于代理的版本更现实：

空间邻近：代理在网格上移动；仅感染附近邻居。
网络结构：传播仅沿显式链接（接触网络）。
移动速度：快速移动代理传播更远；慢速代理本地遏制（隔离效应）。
异质接触率：一些代理见更多人（枢纽）；其他孤立。空间聚类即使 R₀ 高也能减缓流行。

连续时间 vs 离散时间：数学模型常用连续微分方程（上面）。基于代理模型通常用离散时间步（每个时刻，代理有机会感染邻居）。结果可能不同！离散模型如果感染概率是每时刻而非每接触可能高估传播。

COVID-19 真实应用：早期模型有 R₀ ≈ 2.5 预测指数增长（每 ~3 天翻倍）。但接触追踪、隔离和行为改变减少了有效接触，降低了 R₀ 公式中的 k。疫苗接种 + 自然免疫降低了 β。这些公共卫生干预降低有效 R 低于 1，控制流行。非疾病例子：社交媒体病毒性。信息像 SIR 一样传播：看到帖子的易感用户变感染（暴露），与接触分享（传染），最终模因消亡（康复）。病毒内容的 R₀ 取决于参与度和算法放大。

▶ O Opinion Dynamics意见动态

Bounded Confidence, Voter Model, Polarization有界信任、投票模型、极化

How do opinions form and change? In the bounded confidence model (Hegselmann-Krause, Deffuant), agents only update their opinions when interacting with others whose opinions are sufficiently close. This simple mechanism produces clustering and can explain polarization: even starting from random opinions, groups of consensus emerge. The voter model is even simpler: copy a random neighbor's opinion. Despite its simplicity, it predicts how long it takes a population to reach consensus.

Bounded Confidence Models: Two Main Variants

Deffuant Model

Pairwise interaction: two random agents meet.
If |opinion_i − opinion_j| < ε (bounded confidence), they compromise: move toward each other by fraction μ (0 < μ ≤ 1).
If distance ≥ ε, no interaction.
Outcome: clusters form. Each cluster is a consensus group.
How many clusters? Depends on ε. Small ε → many clusters. Large ε → one cluster.

Hegselmann-Krause Model

All-to-all interaction: each agent updates by averaging all neighbors within distance ε.
Update: x_i(t+1) = mean({x_j : |x_i − x_j| < ε})
Parallel/simultaneous update (all agents update at once).
Converges faster than Deffuant to final clusters.
Critical ε: below it, multiple clusters; above it, single consensus.

The Bounded Confidence Parameter ε

ε = maximum opinion distance for interaction. If ε is small, you only talk to those nearly identical to you (echo chamber). If ε is large, you're willing to engage across diverse views. Mathematical result: there exists a critical ε* above which everyone converges to consensus, below which clusters persist. The exact value depends on initial opinion distribution.

Polarization dynamics: Start with opinions uniformly distributed [0, 1]. With ε = 0.2, clusters emerge (e.g., left, center, right). With ε = 0.5, consensus. The narrower your confidence bounds, the more polarized the outcome. This models the danger of echo chambers: if people only listen to similar views, society splinters.

The Voter Model

Simplest opinion dynamics: each agent has binary opinion (0 or 1). At each step, pick a random agent, then they copy a random neighbor's opinion. Remarkably:

Always converges to consensus (everyone agrees on 0 or 1).
Time to consensus: scales with population size N. For a graph, typically O(N²) or O(N log N) depending on network structure.
Which consensus? If starting opinion frequencies are p(0) for state 1, expected consensus is toward the initial majority.
Spatial voter model: agents on a grid update only with neighbors. Creates spatial clusters initially, slow coalescence.

Media Effects and External Bias

Real societies have media and external influencers. Modify bounded confidence model by adding:

Media node: all agents can "interact" with a media agent with fixed opinion (e.g., opinion = 0.8). Media broadcasts to all.
Outcome: instead of multiple clusters, all agents drift toward media opinion.
Filter bubbles: if media shows content aligned with your current view, ε effectively shrinks, increasing polarization.

Echo chambers and algorithms: Social media algorithms often recommend content similar to past views, creating artificial small ε. Users who see only confirming views radicalize faster (opinion moves further from center). This is a computational mechanism for polarization.

Political polarization (US example): In 1990s-2000s, median Democrat opinion and median Republican opinion were ~0.5 apart. By 2020, they'd moved to nearly opposite extremes. Bounded confidence with declining ε fits this pattern: geographic sorting (people move to communities like them), partisan media (different ε thresholds for Dems vs Reps), and algorithm-driven filter bubbles (effective ε shrinks further). Result: two barely-interacting clusters.

意见如何形成和改变？在有界信任模型（Hegselmann-Krause, Deffuant）中，代理只在与观点足够接近的人互动时才更新自己的意见。这个简单机制产生了聚类，并能解释极化：即使从随机意见出发，也会涌现出共识群体。投票模型更简单：复制一个随机邻居的意见。尽管如此简单，它能预测群体达成共识需要多长时间。

有界信任模型：两个主要变体

Deffuant 模型

成对互动：两个随机代理相遇。
如果 |观点_i − 观点_j| < ε（有界信任），他们折衷：相互移动分数 μ（0 < μ ≤ 1）。
如果距离 ≥ ε，无互动。
结果：形成集群。每个集群是一个共识组。
多少个集群？取决于 ε。小 ε → 多个集群。大 ε → 一个集群。

Hegselmann-Krause 模型

全对全互动：每个代理通过平均距离 ε 内的所有邻居更新。
更新：x_i(t+1) = 平均({x_j : |x_i − x_j| < ε})
平行/同步更新（所有代理同时更新）。
比 Deffuant 更快收敛到最终集群。
临界 ε：低于它多个集群；高于它单个共识。

有界信任参数 ε

ε = 互动的最大观点距离。如果 ε 小，你只与几乎相同的人交谈（回音室）。如果 ε 大，你愿意跨越不同观点参与。数学结果：存在临界 ε* 高于它时所有人收敛到共识，低于它时集群持续。精确值取决于初始观点分布。

极化动态：从观点均匀分布 [0, 1] 开始。有 ε = 0.2，集群涌现（例如，左、中、右）。有 ε = 0.5，共识。你的信任界限越窄，结果越极化。这模型回音室的危险：如果人们只听相似观点，社会分裂。

投票模型

最简单的意见动态：每个代理有二元观点（0 或 1）。每一步，选择随机代理，然后他们复制随机邻居的观点。显著地：

总是收敛到共识（每个人同意 0 或 1）。
共识时间：随群体大小 N 扩展。对于图，通常 O(N²) 或 O(N log N) 取决于网络结构。
哪个共识？如果开始观点频率是 p(0) 为状态 1，预期共识趋向初始多数。
空间投票模型：代理在网格上，仅与邻居更新。初期创建空间集群，慢速合并。

媒体效应和外部偏向

真实社会有媒体和外部影响者。通过添加修改有界信任模型：

媒体节点：所有代理可以"互动"具有固定观点的媒体代理（例如，观点 = 0.8）。媒体广播给所有人。
结果：不是多个集群，所有代理向媒体观点漂移。
过滤泡沫：如果媒体显示与当前观点一致的内容，ε 有效缩小，增加极化。

回音室和算法：社交媒体算法通常推荐与过去观点相似的内容，创建人为小 ε。只看确认观点的用户更快激进化（观点更离中心）。这是极化的计算机制。

政治极化（美国例子）：1990 年代-2000 年代，中位民主党观点和中位共和党观点相隔 ~0.5。到 2020 年，他们移到几乎完全相反的极端。有界信任加ε 衰减符合这个模式：地理排序（人搬到像他们的社区），党派媒体（民主党 vs 共和党不同 ε 阈值），和算法驱动过滤泡沫（有效 ε 进一步缩小）。结果：两个几乎不相互作用的集群。

▶ L Social Learning社会学习

Conformity, Prestige Bias, Cultural Evolution从众、声望偏向、文化演化

Social learning is how cultures transmit information. Key biases: conformity bias (copy the majority), prestige bias (copy successful individuals), and content bias (some ideas are inherently more memorable). Rogers' paradox shows that social learners can't outperform individual learners in a changing environment — they free-ride on others' information. Critical social learning, where agents learn individually when uncertain and socially when confident, can resolve this paradox.

Rogers' Paradox: The Dilemma of Social Learning

Setup: Environment changes (new optimal strategy each generation). Two learner types:

Individual learners (IL): Explore and discover the current optimal strategy via trial and error. Accurate but costly (time, energy, errors).
Social learners (SL): Copy others' strategies. Cheap but always one generation behind (you learn what was optimal last generation, not now).

The Paradox: In equilibrium, SL frequency equals the fitness difference between old-optimal and new-optimal strategies. If environment changes slowly, SL can be as fit as IL because old strategies are still nearly optimal. But if environment changes fast, IL must dominate or the population will lock into outdated behaviors.

Why is this a paradox? Social learning seems "clever" (exploit others' knowledge) but it's actually parasitic. SL need IL to do the costly exploration. Yet if too many SL appear, they out-compete IL (free-riders dominate), the population loses explorers, and everyone crashes when environment shifts.

Critical Social Learning (CSL)

Resolution: agents use critical social learning — learn individually when uncertain, socially when confident. Formally: if confidence in your knowledge (e.g., number of correct experiences) exceeds threshold, copy others; else explore.

Learn socially if: (# successes with strategy A) > confidence threshold
Else: explore alternatives individually

Result: population maintains an optimal mix of explorers (IL) and exploiters (SL). The system is robust: if environment shifts, confident agents switch to individual mode, population adapts.

Key Learning Biases

Conformity bias: Copy the majority. Mathematically: P(adopt_i) ∝ (frequency_i)^s, where s > 1 (super-proportional copying). Why? Averaging many opinions reduces error; majority is usually right. But can trap populations in local optima.
Prestige bias: Copy the most successful. Track who has high payoff and copy their strategy. More direct than conformity but requires knowing others' payoffs. Can lead to faster adaptation than conformity.
Content bias: Some ideas are "sticky"—memorable, emotionally engaging, simple, fitting existing beliefs. Not about the source, but the idea itself. Explains why some memes (ideas) spread regardless of accuracy.

Cultural Accumulation and Cumulative Culture

Unique to humans: we build on each other's discoveries. A tool invented by Alice, improved by Bob, further modified by Carol. The "collective ratchet"—knowledge ratchets up across generations. Agent-based models show:

Without social learning: Each generation reinvents from scratch. Complexity plateaus.
With social learning: Innovations accumulate. Complexity grows over time. Small improvements compound.
Critical threshold: need sufficient population size and communication to sustain cumulative culture. Too small, and innovations are lost.

Bottleneck effect: If a culture is isolated or population crashes, accumulated knowledge can be lost. Aboriginal Australian cultures maintained some technologies (boomerangs, didgeridoos) for 60,000+ years through robust transmission despite periodic population bottlenecks—but other technologies were lost (like seafaring in Tasmania 10,000 years ago).

Fashion trends: New style emerges (content bias: aesthetically appealing). Prestige bias: celebrities adopt it (high-payoff individuals). Conformity bias: masses follow (majority effect). Trend spreads widely. Eventually novelty wears, prestige individuals adopt new style, conformists follow. Cycle repeats. Model: each agent has "style" (discrete choice), adopts based on majority (conformity) + prestige individuals (prestige bias). Result: coordinated cascades of adoption/abandonment. No central planner needed.

Technology adoption: Smartphones. Early adopters (high income, tech-savvy) = prestige. Conformists see majority using, adopt. Utility increases as network grows (indirect prestige). Within years, become ubiquitous. Social learning models fit this S-curve adoption pattern well.

社会学习是文化传递信息的方式。关键偏向：从众偏向（跟随多数人）、声望偏向（模仿成功者）和内容偏向（有些想法本身更容易被记住）。罗杰斯悖论表明，在变化的环境中，社会学习者无法超越个体学习者——他们搭别人信息的便车。批判性社会学习（不确定时独立学习，确定时社会学习）可以解决这个悖论。

罗杰斯悖论：社会学习的困境

设置：环境改变（每代新的最优策略）。两种学习者类型：

个体学习者 (IL)：通过试错探索和发现当前最优策略。准确但昂贵（时间、能量、错误）。
社会学习者 (SL)：复制他人策略。便宜但总是落后一代（你学的是上代最优，不是现在）。

悖论：在均衡，SL 频率等于老最优和新最优策略之间的健身差异。如果环境变化缓慢，SL 可以与 IL 一样健康，因为旧策略仍几乎最优。但如果环境变化快，IL 必须占主导或群体将锁定在过时行为。

为什么这是悖论？社会学习看起来"聪明"（利用他人知识），但实际上是寄生。SL 需要 IL 做昂贵探索。但如果太多 SL 出现，他们超竞争 IL（搭便车者占主导），群体失去探险者，环境移动时每个人都会崩溃。

批判性社会学习 (CSL)

解决方案：代理使用批判性社会学习——不确定时独立学习，确定时社会学习。形式上：如果对知识的信心（例如，正确经验数量）超过阈值，复制他人；否则探索。

社会学习如果：(策略 A 的成功 #) > 信心阈值
否则：个体探索替代方案

结果：群体维持探险者 (IL) 和开拓者 (SL) 的最优混合。系统是健壮的：如果环境移动，有信心代理切换到个体模式，群体适应。

关键学习偏向

从众偏向：复制多数人。数学上：P(采用_i) ∝ (频率_i)^s，其中 s > 1（超比例复制）。为什么？平均许多观点减少错误；多数通常正确。但可以困住群体在局部最优。
声望偏向：复制最成功的。追踪谁有高收益并复制他们的策略。比从众更直接但需要知道他人收益。可导致比从众更快适应。
内容偏向：有些想法是"粘的"——易记，情感参与，简单，符合已有信念。不关于来源，但想法本身。解释为什么一些模因（想法）无论准确性如何都传播。

文化积累和累积文化

人类独有：我们基于彼此的发现。Alice 发明的工具，由 Bob 改进，由 Carol 进一步修改。"集体棘轮"——知识跨世代上升。基于代理的模型显示：

没有社会学习：每代从零重新发明。复杂性停止。
有社会学习：创新积累。复杂性随时间增长。小改进复合。
临界阈值：需要足够的群体大小和沟通来维持累积文化。太小，创新丧失。

瓶颈效应：如果文化孤立或群体崩溃，积累的知识可能丧失。澳大利亚原住民文化维持了某些技术（回旋镖、迪吉里杜管）60,000+ 年，通过健壮传输尽管周期性群体瓶颈——但其他技术丧失了（如 10,000 年前塔斯马尼亚的航海）。

时尚趋势：新风格涌现（内容偏向：美学上有吸引力）。声望偏向：名人采用（高收益个体）。从众偏向：群众跟随（多数效应）。趋势广泛传播。最终新奇褪去，声望个体采用新风格，从众者跟随。循环重复。模型：每个代理有"风格"（离散选择），基于多数采用（从众）+ 声望个体（声望偏向）。结果：协调的采用/抛弃级联。不需要中央规划者。

技术采用：智能手机。早期采用者（高收入、技术精通）= 声望。从众者看到多数使用，采用。网络效用增长（间接声望）。数年内，变得无所不在。社会学习模型很好地符合这个 S 曲线采用模式。

▶ Q4 How Are Agents Connected & How Do We Know?行为者如何连接，我们如何知道？ Networks & Inference网络与推断

▶ N Network Fundamentals网络基础

Graphs, Degree Distribution, Clustering, Small Worlds图、度分布、聚类系数、小世界

Networks describe who is connected to whom. Key concepts: degree (number of connections), degree distribution (random networks follow Poisson, real social networks are often scale-free), clustering coefficient (are your friends also friends with each other?), and path length (degrees of separation). Small-world networks (Watts & Strogatz) combine high clustering with short paths — like real social networks. They arise from rewiring just a few random links in a regular lattice.

Network Models and Degree Distributions

Random Networks (Erdős–Rényi)

Model: N nodes, each pair connected with probability p.
Degree distribution: Poisson. Most nodes have similar degree (roughly N×p).
Phase transition at p = 1/N: below it, fragmented; above, giant connected component.
Short paths: log(N) average distance.
Low clustering: friends unlikely to be mutual friends.
Real networks: rarely Poisson. Usually have high variance.

Scale-Free Networks (Barabási–Albert)

Model: Preferential attachment. New nodes connect to existing nodes proportional to their degree (rich get richer).
Degree distribution: Power law P(k) ∝ k^−α. Many low-degree nodes, few hubs.
Real examples: Internet, citing papers, social networks.
Robustness: removing random nodes has little effect; removing hubs breaks network.
Efficiency: few hops to any node via hub shortcuts.

Small-World Networks (Watts & Strogatz)

Reality: social networks have both properties:

High clustering: your friends' friends are likely friends (triangles are common).
Short paths: you can reach anyone in ~6 degrees.

W-S Model: start with a lattice ring (high clustering), then randomly rewire a fraction p of edges. Result:

p = 0 (no rewiring): lattice, high clustering C, long paths L.
p = 1 (all random): ER random graph, low C, short L.
p ≈ 0.01 (just a bit of rewiring): small-world! C stays high, but L drops to log(N). A few random links act as "shortcuts" across the lattice.

Network Measures and Centrality

Degree centrality: simple count of connections. High-degree nodes are "hubs."
Betweenness centrality: how many shortest paths pass through this node? Nodes with high betweenness are "bridges." Removing them fragments the network.
Closeness centrality: average distance to all other nodes. Central nodes can reach everyone quickly (low average distance).
Clustering coefficient (local): if I have k neighbors, how many connections exist among them? C_i = (edges among neighbors) / (k(k-1)/2). Ranges [0,1].
Clustering coefficient (global): average of local clustering. High C means triangles and cliques.

Properties and Implications

Connectivity: what fraction of nodes can reach each other? In a small-world, usually 1 giant component + isolated nodes.
Assortativity: do high-degree nodes connect to other high-degree nodes (assortative) or low-degree (disassortative)? Social networks tend assortative (rich befriend rich); biological networks disassortative (hubs connect to specialists).
Robustness: small-world is resilient to random failures (redundant paths) but vulnerable to targeted hub removal.

Six degrees of separation: Milgram's 1960s experiment. People were asked to pass a letter to acquaintances, trying to reach a target across the US. Median chain length: ~6 hops. Modern empirical studies (Facebook, LinkedIn) confirm: average distance is indeed 4–6, consistent with small-world prediction log(N) ≈ 6 for billion-scale networks.

Kevin Bacon game: connect any actor to Kevin Bacon via co-starring relationships. "Bacon number" is shortest path length. Most actors have Bacon number ≤ 3. This is small-world: few degrees of separation despite network size (million+ actors). Betweenness centrality varies: prolific, central actors have low Bacon numbers.

网络描述了谁与谁相连。关键概念：度（连接数量）、度分布（随机网络服从泊松分布，真实社交网络通常是无标度的）、聚类系数（你的朋友之间也是朋友吗？）和路径长度（分隔度数）。小世界网络（Watts & Strogatz）结合了高聚类和短路径——就像真实的社交网络。只需在规则格子中随机重连少量链接就能产生。

网络模型和度分布

随机网络（Erdős–Rényi）

模型：N 个节点，每对以概率 p 相连。
度分布：泊松。大多数节点度相似（大约 N×p）。
相变在 p = 1/N：低于它分片；高于它巨型连通分量。
短路径：log(N) 平均距离。
低聚类：朋友不太可能相互朋友。
真实网络：很少泊松。通常高方差。

无标度网络（Barabási–Albert）

模型：优先连接。新节点连接现有节点的概率与其度成正比（富者更富）。
度分布：幂律 P(k) ∝ k^−α。许多低度节点，少数枢纽。
真实例子：互联网、引用论文、社交网络。
健壮性：删除随机节点影响小；删除枢纽破坏网络。
效率：通过枢纽快捷方式到任何节点几个跳跃。

小世界网络（Watts & Strogatz）

现实：社交网络有两个属性：

高聚类：你朋友的朋友很可能是朋友（三角形常见）。
短路径：你可以在 ~6 度内到达任何人。

W-S 模型：从格子环（高聚类）开始，然后随机重连比例 p 的边。结果：

p = 0（无重连）：格子，高聚类 C，长路径 L。
p = 1（全随机）：ER 随机图，低 C，短 L。
p ≈ 0.01（只是一点重连）：小世界！C 保持高，但 L 降到 log(N)。少数随机链接充当格子穿梭的"快捷方式"。

网络措施和中心性

度中心性：简单连接计数。高度节点是"枢纽"。
介数中心性：多少最短路径通过此节点？高介数节点是"桥"。删除它们碎裂网络。
接近中心性：到所有其他节点的平均距离。中心节点可快速到达所有人（低平均距离）。
聚类系数（局部）：如果我有 k 个邻居，它们之间存在多少连接？C_i = (邻居间的边) / (k(k-1)/2)。范围 [0,1]。
聚类系数（全局）：局部聚类的平均。高 C 意味着三角形和团。

属性和含义

连通性：多少比例的节点可以相互到达？在小世界，通常 1 个巨型分量 + 孤立节点。
同质性：高度节点连接其他高度节点（同质）还是低度（异质）？社交网络倾向同质（富人与富人交友）；生物网络异质（枢纽连接专家）。
健壮性：小世界对随机失败有弹性（冗余路径），但易受指向性枢纽删除。

六度分离：Milgram 的 1960 年代实验。要求人们通过熟人传递信件，试图到达美国目标。中位链长：~6 跳。现代实证研究（Facebook、LinkedIn）确认：平均距离确实 4–6，与小世界预测 log(N) ≈ 6 对十亿规模网络一致。

Kevin Bacon 游戏：通过共同主演关系连接任何演员到 Kevin Bacon。"Bacon 数"是最短路径长度。大多数演员有 Bacon 数 ≤ 3。这是小世界：尽管网络大小（百万+ 演员），分隔度数少。介数中心性变化：多产、中心演员有低 Bacon 数。

▶ B Science as Bayesian Inference科学即贝叶斯推断

Bayesian Updating, Models as Inference Tools贝叶斯更新、模型作为推断工具

Models are tools for inference. Bayesian thinking: start with a prior belief, observe evidence, update to a posterior belief using Bayes' rule. Science progresses by updating beliefs about competing theories as evidence accumulates. A good model generates clear, falsifiable predictions that help us distinguish between theories. The strength of evidence is captured by the likelihood ratio.

P(H|D) = P(D|H)P(H) / P(D)

Decomposing Bayes' Rule

Prior P(H): your belief before seeing evidence. How plausible was this hypothesis beforehand?
Likelihood P(D|H): probability of observing this data IF the hypothesis is true. How well does the model predict the data?
Evidence P(D): overall probability of the data under all hypotheses. Normalizes the posterior.
Posterior P(H|D): updated belief after seeing data. How plausible is the hypothesis now?

Concrete Example: Disease Testing

Scenario: You take a medical test. 1% of population has the disease; 99% don't. Test accuracy: 95% sensitivity (catches 95% of true cases), 90% specificity (correctly identifies 90% of non-cases as negative). You test positive. What's the probability you actually have the disease?

Intuition fails: Many doctors answer ~95%. Wrong!

Bayesian calculation (for 10,000 people):

100 people have disease; 9,900 don't.
Test correctly detects: 95 of the 100 (95% sensitivity).
Test falsely flags: 10% of 9,900 = 990 don't have it but test positive.
Total positives: 95 + 990 = 1,085.
Posterior: P(disease | positive) = 95 / 1,085 ≈ 8.8%!

Why so low? Even though the test is 95% accurate, the disease is so rare that false positives (990) dwarf true positives (95). Your prior was strong (1% prevalence); the test updates it, but not as much as intuition suggests.

The Likelihood Ratio

The strength of evidence is the likelihood ratio (LR): P(D|H1) / P(D|H2). How much more likely is the data under hypothesis 1 vs 2?

LR = 1: evidence neutral.
LR > 1: favors H1.
LR < 1: favors H2.

In the disease example: LR = P(positive | disease) / P(positive | no disease) = 0.95 / 0.10 = 9.5. The evidence (positive test) is 9.5× more likely under disease than no disease. But with a prior ratio of 1:99, posterior is still skewed toward "no disease."

Science as Bayesian Model Updating

Hypotheses = models. Data = observations. Science progresses by:

Propose theories: set prior beliefs over competing models.
Make predictions: each model generates predicted data distribution P(D|model).
Observe: do the prediction and reality match?
Update: use likelihood ratios to update belief. Model with higher likelihood rises in posterior.
Repeat: as more evidence accumulates, posterior becomes more confident (posterior peaks narrow).

Falsifiability: Popper said a theory is scientific if falsifiable (can be proven wrong). Bayesian view: no theory is ever 100% ruled out, but wrong theories get lower and lower posterior probability as contradictory evidence piles up. Eventually, prior + huge likelihood ratios make the posterior negligible.

Models as Hypothesis Generators

Computational models excel here: they generate precise, quantitative predictions. Instead of vague theory ("aggression depends on provocation"), a model says "aggression increases 2.3× per provocation unit." Data either matches (model gains posterior probability) or doesn't (model loses). This makes models falsifiable and powerful for inference.

Schelling segregation model as Bayesian inference: Competing hypotheses: (1) segregation requires intentional discrimination; (2) mild preferences + self-interested moves suffice. Schelling's model implements hypothesis 2, generates prediction: even 30% in-group preference produces ~90% segregation. Data from real neighborhoods: ~90% segregation observed, consistent with prediction. Likelihood of data under hypothesis 2 > hypothesis 1. Posterior: favors hypothesis 2.

Cognitive psychology example: Is memory decay exponential or power-law? Two competing models. Both fit existing data reasonably well. Each model predicts long-term forgetting differently. New experiment: measure recall at 1 year, 5 years, 10 years. Power-law model predicts slower decay. If data follows power-law, posterior probability of power-law model rises. Model-based science uses likelihood ratios to arbitrate between theories.

模型是推断的工具。贝叶斯思维：从先验信念出发，观察证据，使用贝叶斯规则更新为后验信念。科学通过在证据积累时更新对竞争理论的信念来进步。好的模型能产生清晰、可证伪的预测，帮助我们区分不同理论。证据的强度由似然比捕获。

P(H|D) = P(D|H)P(H) / P(D)

分解贝叶斯规则

先验 P(H)：看证据前的信念。这个假设事先有多可信？
似然 P(D|H)：如果假设真，观察此数据的概率。模型有多好地预测数据？
证据 P(D)：所有假设下数据的总体概率。归一化后验。
后验 P(H|D)：看数据后的更新信念。假设现在有多可信？

具体例子：疾病检测

场景：你进行医学检测。1% 人口有病；99% 没有。测试准确度：95% 灵敏度（捕获 95% 真实病例），90% 特异度（正确识别 90% 非病例为阴性）。你测试阳性。你实际患病的概率是多少？

直觉失败：许多医生答 ~95%。错！

贝叶斯计算（10,000 人）：

100 人有病；9,900 没有。
测试正确检测：100 中的 95（95% 灵敏度）。
测试错误标记：9,900 的 10% = 990 没有病但阳性。
总阳性： 95 + 990 = 1,085。
后验： P(病 | 阳性) = 95 / 1,085 ≈ 8.8%！

为什么那么低？尽管测试 95% 准确，病是如此罕见，假阳性（990）远超真阳性（95）。你的先验很强（1% 患病率）；测试更新它，但不如直觉建议那么多。

似然比

证据强度是似然比 (LR)： P(D|H1) / P(D|H2)。在假设 1 vs 2 下数据多可能？

LR = 1：证据中立。
LR > 1：支持 H1。
LR < 1：支持 H2。

在疾病例子中：LR = P(阳性 | 病) / P(阳性 | 无病) = 0.95 / 0.10 = 9.5。证据（阳性测试）在病下 9.5× 比无病可能。但带 1:99 先验比，后验仍偏向"无病"。

科学即贝叶斯模型更新

假设 = 模型。数据 = 观察。科学通过以下进步：

提议理论：设定竞争模型的先验信念。
做预测：每个模型生成预测数据分布 P(D|model)。
观察：预测和现实匹配吗？
更新：使用似然比更新信念。高似然模型在后验中上升。
重复：随着更多证据积累，后验变更确信（后验峰变窄）。

可证伪性：Popper 说如果理论可证伪（可被证明错误），它是科学的。贝叶斯视角：没有理论 100% 被排除，但错误理论随矛盾证据堆积得到越来越低的后验概率。最终，先验 + 巨大似然比使后验忽视不计。

模型作为假设生成器

计算模型卓越之处：它们生成精确、定量的预测。不是含糊理论（"侵犯取决于激怒"），模型说"每激怒单位侵犯增加 2.3×"。数据要么匹配（模型获得后验概率），要么不（模型失去）。这使模型可证伪和强大用于推断。

Schelling 隔离模型作为贝叶斯推断：竞争假设：(1) 隔离需要有意歧视；(2) 温和偏好 + 自私移动足够。Schelling 模型实现假设 2，生成预测：即使 30% 同族偏好产生 ~90% 隔离。真实社区数据：观察到 ~90% 隔离，与预测一致。数据在假设 2 下的似然 > 假设 1。后验：支持假设 2。

认知心理学例子：记忆衰减是指数还是幂律？两个竞争模型。两个都相当好地拟合现有数据。每个模型预测长期遗忘不同。新实验：测量 1 年、5 年、10 年时的回忆。幂律模型预测较慢衰减。如果数据遵循幂律，幂律模型的后验概率上升。基于模型的科学使用似然比在理论间仲裁。

← Previous

Text as Data

Empirical Modeling