Latent Profile Analysis
A person-centered approach to identifying hidden subgroups
Most statistical methods you've learned (regression, ANOVA, SEM) ask: "How do variables relate to each other?" But what if different people follow entirely different patterns? Latent Profile Analysis (LPA) flips the question — instead of studying variables, it studies people, looking for hidden subgroups who share similar response patterns across multiple indicators. Prerequisites: basic understanding of means, variances, and the concept of probability distributions. Software: R (tidyLPA) or Mplus.
Variable-Centered vs. Person-Centered
Focus: Relationships among variables
Assumption: Single homogeneous population
Examples: Regression, SEM, ANOVA, CFA
Focus: Identifying subgroups of individuals
Assumption: Heterogeneous — distinct latent subgroups exist
Examples: LPA, LCA, Cluster Analysis
Person-Centered Analytical Methods
Cluster Analysis
- Distance-based grouping (e.g., K-means)
- No underlying statistical model
- No formal test for optimal clusters
- Hard assignment only
LPA
- Continuous indicators
- Model-based (Finite Mixture Model)
- Statistical fit indices for class enumeration
- Probabilistic assignment
LCA
- Categorical indicators
- Same framework as LPA
- Item response probabilities instead of means
- Probabilistic assignment
How LPA Works
LPA is built on a Finite Mixture Model: observed data is a mixture of K normal distributions, each representing a hidden subgroup. Using the EM algorithm, LPA infers how many subgroups exist, the mean and variance of each on each indicator, and each person's probability of belonging to each subgroup.
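The mixture idea can be sketched with scikit-learn's `GaussianMixture`, which fits K normal components via EM. This is a stand-in for dedicated LPA software, and the two-profile data below are simulated purely for illustration:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Simulated data for illustration: 300 people on 4 indicators, drawn from
# two hidden profiles ("high-high-low-low" vs. "low-low-high-high").
X = np.vstack([
    rng.normal(loc=[5.0, 5.0, 2.0, 2.0], scale=0.5, size=(150, 4)),
    rng.normal(loc=[2.0, 2.0, 5.0, 5.0], scale=0.5, size=(150, 4)),
])

# Fit a 2-component mixture by EM; diagonal covariances mirror the default
# LPA specification (free indicator variances, within-class covariances = 0).
model = GaussianMixture(n_components=2, covariance_type="diag",
                        n_init=10, random_state=0).fit(X)
print(model.means_.round(1))  # estimated profile means on each indicator
```

The recovered component means should sit close to the true profile centers, which is exactly what an LPA profile plot displays.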
Probabilistic Classification
Each individual has a probability of belonging to each class — not hard assignment. Classification uses the highest posterior probability.
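In the same sketch (sklearn on simulated two-profile data), posterior class probabilities come from `predict_proba`, and modal assignment simply takes the largest one:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Illustrative two-profile data, as in the earlier sketch
X = np.vstack([rng.normal([5, 5, 2, 2], 0.5, (150, 4)),
               rng.normal([2, 2, 5, 5], 0.5, (150, 4))])
model = GaussianMixture(n_components=2, covariance_type="diag",
                        n_init=10, random_state=0).fit(X)

posterior = model.predict_proba(X)      # one row per person; rows sum to 1
modal_class = posterior.argmax(axis=1)  # hard assignment = highest posterior
print(posterior[0].round(3), modal_class[0])
```

Keeping the full posterior matrix, rather than only the modal labels, is what lets later steps quantify classification uncertainty.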
Statistical Fit Indices
BIC (Bayesian Information Criterion) and BLRT (Bootstrapped Likelihood Ratio Test) allow systematic model comparison. Entropy measures classification precision (0–1); values > .80 indicate good quality (Nylund et al., 2007).
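Because the model is likelihood-based, BIC is available directly, and the relative-entropy statistic can be computed from the posterior matrix. A sketch using sklearn's `GaussianMixture` on simulated data (a BLRT would require a bootstrap loop, not shown):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Simulated two-profile data for illustration
X = np.vstack([rng.normal([5, 5, 2, 2], 0.5, (200, 4)),
               rng.normal([2, 2, 5, 5], 0.5, (200, 4))])

k = 2
gm = GaussianMixture(n_components=k, covariance_type="diag",
                     n_init=10, random_state=1).fit(X)
bic = gm.bic(X)  # compare across candidate K; lower is better

# Relative entropy: 1 - total classification uncertainty / (N * ln K).
# Values near 1 mean people are assigned to classes with little ambiguity.
p = gm.predict_proba(X).clip(1e-12, 1.0)
entropy = 1 - (-(p * np.log(p)).sum()) / (len(X) * np.log(k))
print(round(bic, 1), round(entropy, 3))
```

On well-separated data like this, entropy approaches 1; overlapping profiles pull it down toward 0.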
LPA Workflow
Indicators: select theoretically grounded continuous indicators
Models: estimate solutions with 1 to K profiles
Fit Indices: compare BIC, BLRT, and entropy across solutions
Profiles: plot and name the retained profiles
Sample size: N ≥ 300–500 (Nylund-Gibson & Choi, 2018). Random starts: ≥ 500 to avoid local solutions. Smallest class: ≥ 5–8% of sample. Entropy: > .80 indicates good classification — but do NOT use it to select K.
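These guidelines can be folded into the enumeration loop. In the sklearn sketch below (simulated two-profile data, so K = 2 should win on BIC), `n_init` plays the role of random starts, and the smallest-class proportion is recorded so under-5% solutions can be flagged:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# Simulated two-profile data for illustration
X = np.vstack([rng.normal([5, 5, 2, 2], 0.5, (250, 4)),
               rng.normal([2, 2, 5, 5], 0.5, (250, 4))])

results = {}
for k in range(1, 5):
    # Many random starts guard against local maxima of the EM likelihood
    gm = GaussianMixture(n_components=k, covariance_type="diag",
                         n_init=50, random_state=2).fit(X)
    counts = np.bincount(gm.predict(X), minlength=k)
    results[k] = {"bic": gm.bic(X),
                  "smallest_class": counts.min() / len(X)}  # flag if < ~5%

best_k = min(results, key=lambda k: results[k]["bic"])
print(best_k, results[best_k])
```

In real analyses the loop's output feeds the elbow plot and is weighed together with BLRT/aLMR and interpretability, not read off mechanically.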
Article Example
Research Design & Indicator Selection
Organismic Integration Theory (OIT), a sub-theory of Self-Determination Theory, posits that academic motivation is not a simple "high vs. low" dimension but a continuum of self-determination — from external regulation to intrinsic motivation. Variable-centered methods would ask: "For every one-unit increase in intrinsic motivation, does achievement increase by 0.3 points?" But in reality, every student has a simultaneous score on all four motivational types — creating a unique combination pattern.
The person-centered question: Are there distinct subgroups of students characterized by unique combinations of these motivations?
The study hypothesized that (H1) at least 4 distinct profiles would emerge based on OIT motivation types, and (H2) more autonomous profiles would show higher effort, value, competence, and extra time on math. Four motivation types from the Self-Regulation Questionnaire–Academic (SRQ-A; Ryan & Connell, 1989) served as LPA indicators — external regulation (4 items), introjected regulation (4 items), identified regulation (3 items), and intrinsic motivation (3 items) — measured on a 7-point Likert scale. The sample comprised N = 1,151 secondary school students (679 males, 444 females, 28 unreported; age 13–17, M = 14.69, SD = .58) from 5 schools in Singapore. Models were estimated in Mplus 7.2 using the MLR estimator with 10,000 random starts (500 best solutions retained).
Step 1: Model Comparison & Class Enumeration
BIC: Lower is better — look for the "elbow" where decline slows. BLRT: Most accurate across all conditions — significant p means K > K−1. aLMR: Adjusted Lo-Mendell-Rubin test — non-significant p suggests current K is sufficient. When indices disagree: Prioritize BIC + BLRT, combined with theoretical interpretability and class size. Here, the 4-profile solution was selected: the aLMR became non-significant beyond 4 profiles, fit improvements were marginal, and each profile was theoretically interpretable.
Step 2: Interpreting the Profile Plot
Focus on the shape of the line (the pattern across indicators), not just absolute levels. Name each profile based on its most distinctive features.
Step 3: Outcome Validation
Do the profiles differ on meaningful academic outcomes?
Autonomous Advantage
The Autonomous profile (P3) consistently outperformed all other groups across every outcome: effort (3 > 2 > 4 > 1), task value (3 > 2 = 4 > 1), perceived competence (3 > 4 > 2 = 1), and math study hours (3 > 4 = 2 = 1). High autonomous motivation was associated with the most adaptive outcomes.
Effort Is Graded by Self-Determination
Effort showed a clear gradient across profiles: P3 > P2 > P4 > P1. Notably, the Externally Driven group (P2) reported higher effort than the Moderate group (P4), suggesting external pressure can sustain effort — but the Autonomous group's effort still surpassed all others.
Competence Requires Intrinsic Interest
The Externally Driven profile (P2) showed no advantage in perceived competence over the Low Motivation group (P1), despite P2's higher external and identified regulation (2 = 1). In contrast, even the Moderate group (P4) outperformed P2 in competence, suggesting that intrinsic interest — not external pressure — is essential for building academic confidence.
Extensions
Morin et al. (2016) Six-Step Framework:
1. Configural similarity (same # of profiles?)
2. Structural similarity (same means?)
3. Dispersion similarity (same variances?)
4. Distributional similarity (same proportions?)
5. Predictive similarity (same predictors?)
6. Explanatory similarity (same outcomes?)
Latent Transition Analysis (LTA) — Tracks how individuals transition between profiles over time. Estimates transition probabilities.
Growth Mixture Modeling (GMM) — Identifies distinct developmental trajectory classes (e.g., increasing, stable, declining).
Other: Multilevel LCA/LPA, factor mixture models, Bayesian estimation for small samples.
References & Software
Key References
- Beginner: Nylund-Gibson, K., & Choi, A. Y. (2018). Ten frequently asked questions about latent class analysis. Translational Issues in Psychological Science, 4(4), 440–461. https://doi.org/10.1037/tps0000176
- Fit indices: Nylund, K. L., Asparouhov, T., & Muthén, B. O. (2007). Deciding on the number of classes in latent class analysis and growth mixture modeling: A Monte Carlo simulation study. Structural Equation Modeling, 14(4), 535–569. https://doi.org/10.1080/10705510701575396
- Multi-group: Morin, A. J. S., Meyer, J. P., Creusier, J., & Biétry, F. (2016). Multiple-group analysis of similarity in latent profile solutions. Organizational Research Methods, 19(2), 231–254. https://doi.org/10.1177/1094428115621148
- Applied: Wang, C. K. J., Liu, W. C., Nie, Y., et al. (2017). Latent profile analysis of students' motivation and outcomes in mathematics. Heliyon, 3(6), e00308. https://doi.org/10.1016/j.heliyon.2017.e00308
Software
Mplus
Gold standard. Most flexible, best support for LPA/LCA.
R — tidyLPA
Free, user-friendly. Good for learning and basic analyses.
Python — sklearn
GaussianMixture class. No built-in BLRT or entropy (though entropy can be computed from the posteriors), but adequate for exploration.