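How to Run Exploratory Factor Analysis (EFA) in R
Exploratory factor analysis (EFA) is a family of methods for uncovering the latent structure of a set of variables. It explains the relationships among the observed (manifest) variables by finding a smaller set of latent, or hidden, constructs that account for them.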
Importing the data
Most people run exploratory factor analysis in SPSS; this post shows how to do it in R.
Open RStudio and import the data into R.
Click Import Dataset and choose From SPSS. The data for this analysis are saved in SPSS .sav format, so we import them from SPSS.
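In the dialog that appears, click Browse to select the data file. In the lower-left corner you can choose the name the dataset will have in R (the code below calls it efa). Finally, click Import.
The data contain 13 variables and 240 observations, with the variable names in the first row.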
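If you want to follow along without the original .sav file, the sketch below builds a hypothetical stand-in dataset of the same shape (240 rows, 13 items named A1 to A5, B1 to B3, C1 to C5) from an assumed three-factor model. The loadings (0.80, 0.70, 0.85) and the seed are arbitrary choices, so results computed from it will not match the output shown in this post. The commented first line is the code equivalent of the menu import, assuming the haven package (which RStudio's From SPSS importer uses) and a placeholder file path.

```r
# Code equivalent of the menu import (placeholder path, requires the haven package):
# efa <- haven::read_sav("path/to/your_data.sav")

# Hypothetical stand-in data: 240 cases, 13 items, 3 uncorrelated latent factors
set.seed(123)
n <- 240
items <- c(paste0("A", 1:5), paste0("B", 1:3), paste0("C", 1:5))

# Assumed loading pattern: A items on factor 2, B items on factor 3, C items on factor 1
lambda <- matrix(0, nrow = 13, ncol = 3, dimnames = list(items, NULL))
lambda[1:5, 2] <- 0.80
lambda[6:8, 3] <- 0.70
lambda[9:13, 1] <- 0.85

# Item = common part + unique part, scaled so each item has population variance 1
f <- matrix(rnorm(n * 3), ncol = 3)
e <- matrix(rnorm(n * 13), ncol = 13)
x <- f %*% t(lambda) + e %*% diag(sqrt(1 - rowSums(lambda^2)))

efa <- as.data.frame(x)
colnames(efa) <- items
dim(efa)  # 240 rows, 13 columns
```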
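Computing the correlation matrix
Load the psych package, then compute the correlation matrix of the variables, which the following steps use.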
library(psych)
correlations <- cor(efa) # correlation matrix of the variables, stored in correlations
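To make clear what cor() returns, here is a base-R sketch on hypothetical toy data: a Pearson correlation matrix is the cross-product of the standardized columns divided by n - 1. It also shows a practical caveat that is worth checking on real data: if any value is missing, cor() returns NA for the affected pairs unless you pass a use argument such as "pairwise.complete.obs".

```r
# Hypothetical toy data: 240 cases, 4 numeric items
set.seed(1)
x <- matrix(rnorm(240 * 4), ncol = 4,
            dimnames = list(NULL, paste0("item", 1:4)))
x[, 2] <- x[, 2] + x[, 1]   # induce a correlation between item1 and item2

# Pearson correlation = cross-product of z-scores / (n - 1)
z <- scale(x)
r_manual <- crossprod(z) / (nrow(x) - 1)
r_cor <- cor(x)
all.equal(r_manual, r_cor, check.attributes = FALSE)   # TRUE

# A single missing value makes correlations involving item1 NA by default
x_na <- x
x_na[3, 1] <- NA
cor(x_na)["item1", "item2"]                                   # NA
cor(x_na, use = "pairwise.complete.obs")["item1", "item2"]    # uses complete pairs
```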
Determining the number of factors to extract
fa.parallel(correlations, n.obs = 240, fa = "both", n.iter = 100, main = "Scree plot with parallel analysis")
Parallel analysis suggests that the number of factors = 3 and the number of components = 3
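Here fa.parallel() is the psych function that draws a scree plot together with a parallel analysis.
A brief note on parallel analysis:
Parallel analysis decides how many eigenvalues to retain by comparing them with eigenvalues from random data matrices of the same size as the original data. If an eigenvalue from the real data is larger than the mean of the corresponding eigenvalues from a set of random data matrices, that component (or factor) is retained.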
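- correlations: the correlation matrix;
- n.obs: the sample size, needed because a correlation matrix rather than raw data is supplied;
- fa = "both": show both the principal-components and the common-factor results; use fa = "pc" for principal components only, or fa = "fa" for common factors only;
- n.iter: the number of simulated analyses to perform;
- main: the plot title.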
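The retention rule described above can be sketched in base R. This is a simplified, principal-components-only version of parallel analysis using a hypothetical helper, parallel_pc(); fa.parallel() additionally handles the common-factor case (eigenvalues of a factor solution rather than of the full correlation matrix) and resampled data, so its numbers will differ. The demo data are simulated from an assumed structure of 5 + 3 + 5 items on three factors, all loading 0.8.

```r
# Hypothetical helper: principal-components parallel analysis in base R
parallel_pc <- function(x, n_iter = 100) {
  n <- nrow(x)
  p <- ncol(x)
  # Eigenvalues of the observed correlation matrix (sorted, largest first)
  observed <- eigen(cor(x), symmetric = TRUE, only.values = TRUE)$values
  # Eigenvalues from n_iter random normal datasets of the same size (p x n_iter)
  simulated <- replicate(n_iter, {
    z <- matrix(rnorm(n * p), nrow = n, ncol = p)
    eigen(cor(z), symmetric = TRUE, only.values = TRUE)$values
  })
  random_mean <- rowMeans(simulated)
  # Keep the leading components whose eigenvalue exceeds the random mean
  n_keep <- sum(cumprod(observed > random_mean))
  list(observed = observed, random_mean = random_mean, n_keep = n_keep)
}

# Assumed structure for the demo: 5 + 3 + 5 items on 3 factors, loading 0.8
set.seed(42)
n <- 240
f <- matrix(rnorm(n * 3), ncol = 3)
group <- rep(1:3, times = c(5, 3, 5))
x <- sapply(group, function(g) 0.8 * f[, g] + sqrt(1 - 0.8^2) * rnorm(n))

res <- parallel_pc(x, n_iter = 100)
round(cbind(observed = res$observed, random = res$random_mean), 2)
res$n_keep   # number of components to retain for this simulated structure
```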
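After running the code, R reports that parallel analysis suggests 3 factors and 3 components. The scree plot shows why.
In the scree plot, solid lines represent the actual data and dashed lines the simulated data.
For principal components (PC, the line marked with x), the first 3 eigenvalues of the actual data lie above the simulated line. Likewise, for factor analysis (FA, the line marked with triangles), the first 3 eigenvalues of the actual data exceed the mean eigenvalues of the 100 simulated data matrices.
So the scree plot supports retaining 3 factors.
Extracting the common factors
Having decided to extract 3 factors, we obtain the solution with the fa() function.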
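fa(r, nfactors = 1, n.obs = NA, rotate = "oblimin", scores = "regression", fm = "minres")
- r: a correlation matrix (raw data also work);
- nfactors: the number of factors to extract; default 1;
- n.obs: the sample size;
- rotate: the rotation method; default "oblimin";
- scores: how to compute factor scores; default "regression", but scores can only be computed when raw data rather than a correlation matrix are supplied;
- fm: the factoring method: maximum likelihood ("ml"), iterated principal axis ("pa"), weighted least squares ("wls"), generalized weighted least squares ("gls"), or the default minimum residual ("minres").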
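For intuition about fm = "pa", here is a base-R sketch of iterated principal axis factoring: start from squared multiple correlations as communality estimates, put them on the diagonal of the correlation matrix, take the leading eigenvectors, recompute the communalities, and repeat until they stop changing. principal_axis() is a hypothetical helper, not psych's implementation (psych applies its own convergence rules and checks). The demo uses an assumed population correlation matrix built from known loadings, so the recovered communalities can be compared with the true ones.

```r
# Hypothetical helper: iterated principal axis factoring (the idea behind fm = "pa")
principal_axis <- function(R, nfactors, max_iter = 100, tol = 1e-6) {
  # Initial communalities: squared multiple correlations (SMC)
  h2 <- 1 - 1 / diag(solve(R))
  for (i in seq_len(max_iter)) {
    R_reduced <- R
    diag(R_reduced) <- h2                      # reduced correlation matrix
    e <- eigen(R_reduced, symmetric = TRUE)
    vals <- pmax(e$values[1:nfactors], 0)      # guard against negative eigenvalues
    L <- e$vectors[, 1:nfactors, drop = FALSE] %*% diag(sqrt(vals), nfactors)
    h2_new <- rowSums(L^2)                     # updated communalities
    converged <- max(abs(h2_new - h2)) < tol
    h2 <- h2_new
    if (converged) break
  }
  dimnames(L) <- list(rownames(R), paste0("PA", 1:nfactors))
  list(loadings = L, h2 = h2, u2 = 1 - h2, iterations = i)
}

# Assumed population model: R = lambda %*% t(lambda) with unit diagonal
lambda <- matrix(0, nrow = 13, ncol = 3)
lambda[1:5, 2] <- 0.80
lambda[6:8, 3] <- 0.70
lambda[9:13, 1] <- 0.85
R <- lambda %*% t(lambda)
diag(R) <- 1

pa <- principal_axis(R, nfactors = 3)
round(cbind(true = rowSums(lambda^2), estimated = pa$h2), 3)
```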
fa.none <- fa(correlations, nfactors = 3, rotate = "none", fm = "pa")
# extract 3 factors without rotation, using iterated principal axis factoring
fa.none
Factor Analysis using method = pa
Call: fa(r = correlations, nfactors = 3, rotate = "none", fm = "pa")
Standardized loadings (pattern matrix) based upon correlation matrix
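PA1 PA2 PA3 h2 u2 com # PA1-PA3: factor loadings (in this unrotated solution, the correlation between each item and each factor)
A1 0.52 0.45 0.01 0.47 0.53 2.0 # h2: communality, the proportion of each item's variance explained by the factors
A2 0.61 0.54 -0.17 0.70 0.30 2.1 # u2: uniqueness, the proportion not explained by the factors (1 - h2); com: item complexity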
A3 0.64 0.58 -0.09 0.75 0.25 2.0
A4 0.66 0.46 -0.11 0.66 0.34 1.8
A5 0.72 0.53 -0.11 0.81 0.19 1.9
B1 0.32 0.16 0.57 0.45 0.55 1.7
B2 0.35 0.07 0.54 0.42 0.58 1.8
B3 0.29 -0.02 0.57 0.42 0.58 1.5
C1 0.75 -0.41 -0.08 0.74 0.26 1.6
C2 0.76 -0.40 -0.04 0.75 0.25 1.5
C3 0.77 -0.51 -0.10 0.86 0.14 1.8
C4 0.74 -0.49 -0.10 0.80 0.20 1.8
C5 0.71 -0.44 0.03 0.70 0.30 1.7
PA1 PA2 PA3
SS loadings 5.09 2.37 1.04
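Proportion Var 0.39 0.18 0.08 # share of the total variance explained by each factor
Cumulative Var 0.39 0.57 0.65 # cumulative share of the total variance
Proportion Explained 0.60 0.28 0.12
Cumulative Proportion 0.60 0.88 1.00 # cumulative share of the variance explained by the 3 factors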
Mean item complexity = 1.8
Test of the hypothesis that 3 factors are sufficient.
The degrees of freedom for the null model are 78 and the objective function was 8.96
The degrees of freedom for the model are 42 and the objective function was 0.41
The root mean square of the residuals (RMSR) is 0.02
The df corrected root mean square of the residuals is 0.03
Fit based upon off diagonal values = 1
Measures of factor score adequacy
PA1 PA2 PA3
Correlation of (regression) scores with factors 0.98 0.95 0.81
Multiple R square of scores with factors 0.95 0.91 0.66
Minimum correlation of possible factor scores 0.91 0.81 0.32
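The three factors together explain 65% of the total variance (Cumulative Var = 0.65). The Cumulative Proportion row reaches 1.00 only because it expresses each factor's share of the explained variance (0.60 + 0.28 + 0.12), not of the total variance in the data.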
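The summary rows can be reproduced by hand from the loadings, which also shows where the 0.65 and the 1.00 come from. The sketch below re-enters the rounded unrotated loadings printed above, so results match the printout only to about ±0.01: h2 is the row sum of squared loadings, SS loadings are column sums of squares, Proportion Var divides them by the 13 items, and Proportion Explained divides them by their total.

```r
# Unrotated PA loadings copied (rounded) from the output above
items <- c(paste0("A", 1:5), paste0("B", 1:3), paste0("C", 1:5))
L <- matrix(c(
  0.52,  0.45,  0.01,
  0.61,  0.54, -0.17,
  0.64,  0.58, -0.09,
  0.66,  0.46, -0.11,
  0.72,  0.53, -0.11,
  0.32,  0.16,  0.57,
  0.35,  0.07,  0.54,
  0.29, -0.02,  0.57,
  0.75, -0.41, -0.08,
  0.76, -0.40, -0.04,
  0.77, -0.51, -0.10,
  0.74, -0.49, -0.10,
  0.71, -0.44,  0.03
), ncol = 3, byrow = TRUE, dimnames = list(items, c("PA1", "PA2", "PA3")))

h2 <- rowSums(L^2)                 # communality of each item
u2 <- 1 - h2                       # uniqueness
ss <- colSums(L^2)                 # SS loadings per factor
prop_var <- ss / nrow(L)           # share of total variance (13 standardized items)
cum_var <- cumsum(prop_var)        # about 0.65 after three factors
prop_explained <- ss / sum(ss)     # share of the variance the factors explain; sums to 1
round(rbind(ss, prop_var, cum_var, prop_explained), 2)
```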
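Factor rotation
We can rotate the solution above with either an orthogonal or an oblique rotation. An orthogonal rotation artificially forces the three factors to be uncorrelated, whereas an oblique rotation allows them to correlate.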
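Rotation only redistributes the loadings among the factors; it does not change how much of each item's variance the factors explain. A base-R sketch using stats::varimax() and stats::promax() on the rounded unrotated loadings printed above illustrates the difference described here: the varimax rotation matrix T is orthogonal, so the factors stay uncorrelated, whereas the promax matrix is not, and the implied factor correlation matrix is (T'T)^-1. psych applies its own rotation code, so these numbers need not match the psych output exactly.

```r
# Unrotated PA loadings copied (rounded) from the output above
items <- c(paste0("A", 1:5), paste0("B", 1:3), paste0("C", 1:5))
L <- matrix(c(
  0.52,  0.45,  0.01,   0.61,  0.54, -0.17,   0.64,  0.58, -0.09,
  0.66,  0.46, -0.11,   0.72,  0.53, -0.11,   0.32,  0.16,  0.57,
  0.35,  0.07,  0.54,   0.29, -0.02,  0.57,   0.75, -0.41, -0.08,
  0.76, -0.40, -0.04,   0.77, -0.51, -0.10,   0.74, -0.49, -0.10,
  0.71, -0.44,  0.03
), ncol = 3, byrow = TRUE, dimnames = list(items, c("PA1", "PA2", "PA3")))

# Orthogonal rotation: T is orthogonal, so t(T) %*% T is the identity
vm <- varimax(L)
T_vm <- vm$rotmat
round(crossprod(T_vm), 10)        # identity: factors remain uncorrelated

# Oblique rotation: T is not orthogonal; implied factor correlations = (T'T)^-1
pm <- promax(L, m = 4)
T_pm <- pm$rotmat
phi <- solve(crossprod(T_pm))
round(phi, 2)                     # unit diagonal, non-zero off-diagonals

# Communalities are unchanged by either rotation
h2 <- rowSums(L^2)
h2_vm <- rowSums(unclass(vm$loadings)^2)
P <- unclass(pm$loadings)
h2_pm <- diag(P %*% phi %*% t(P)) # pattern %*% Phi %*% t(pattern)
round(cbind(h2, h2_vm, h2_pm), 3)
```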
1. Extracting factors with an orthogonal rotation (varimax)
fa.varimax <- fa(correlations, nfactors = 3, rotate = "varimax", fm = "pa")
fa.varimax
Factor Analysis using method = pa
Call: fa(r = correlations, nfactors = 3, rotate = "varimax", fm = "pa")
Standardized loadings (pattern matrix) based upon correlation matrix
PA1 PA2 PA3 h2 u2 com
A1 0.08 0.65 0.18 0.47 0.53 1.2
A2 0.12 0.83 0.04 0.70 0.30 1.0
A3 0.10 0.85 0.13 0.75 0.25 1.1
A4 0.20 0.78 0.11 0.66 0.34 1.2
A5 0.19 0.87 0.13 0.81 0.19 1.1
B1 0.04 0.18 0.64 0.45 0.55 1.2
B2 0.12 0.14 0.62 0.42 0.58 1.2
B3 0.14 0.03 0.63 0.42 0.58 1.1
C1 0.83 0.18 0.09 0.74 0.26 1.1
C2 0.83 0.18 0.14 0.75 0.25 1.2
C3 0.92 0.13 0.08 0.86 0.14 1.1
C4 0.88 0.12 0.07 0.80 0.20 1.0
C5 0.81 0.10 0.19 0.70 0.30 1.1
PA1 PA2 PA3
SS loadings 3.80 3.35 1.35
Proportion Var 0.29 0.26 0.10
Cumulative Var 0.29 0.55 0.65
Proportion Explained 0.45 0.39 0.16
Cumulative Proportion 0.45 0.84 1.00
Mean item complexity = 1.1
Test of the hypothesis that 3 factors are sufficient.
The degrees of freedom for the null model are 78 and the objective function was 8.96
The degrees of freedom for the model are 42 and the objective function was 0.41
The root mean square of the residuals (RMSR) is 0.02
The df corrected root mean square of the residuals is 0.03
Fit based upon off diagonal values = 1
Measures of factor score adequacy
PA1 PA2 PA3
Correlation of (regression) scores with factors 0.97 0.95 0.82
Multiple R square of scores with factors 0.94 0.91 0.67
Minimum correlation of possible factor scores 0.87 0.82 0.34
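The rotated factors are easier to interpret: A1 to A5 load mainly on the second factor (PA2), B1 to B3 on the third (PA3), and C1 to C5 on the first (PA1).
2. Extracting factors with an oblique rotation (promax)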
fa.promax <- fa(correlations, nfactors = 3, rotate = "promax", fm = "pa")
fa.promax
Factor Analysis using method = pa
Call: fa(r = correlations, nfactors = 3, rotate = "promax", fm = "pa")
Standardized loadings (pattern matrix) based upon correlation matrix
PA1 PA2 PA3 h2 u2 com
A1 -0.04 0.66 0.09 0.47 0.53 1.0
A2 -0.01 0.87 -0.09 0.70 0.30 1.0
A3 -0.04 0.88 0.01 0.75 0.25 1.0
A4 0.08 0.79 -0.02 0.66 0.34 1.0
A5 0.05 0.88 0.00 0.81 0.19 1.0
B1 -0.08 0.08 0.66 0.45 0.55 1.1
B2 0.02 0.03 0.63 0.42 0.58 1.0
B3 0.06 -0.10 0.65 0.42 0.58 1.1
C1 0.85 0.05 -0.02 0.74 0.26 1.0
C2 0.84 0.05 0.03 0.75 0.25 1.0
C3 0.94 -0.01 -0.04 0.86 0.14 1.0
C4 0.91 -0.01 -0.05 0.80 0.20 1.0
C5 0.82 -0.04 0.09 0.70 0.30 1.0
PA1 PA2 PA3
SS loadings 3.84 3.38 1.28
Proportion Var 0.30 0.26 0.10
Cumulative Var 0.30 0.56 0.65
Proportion Explained 0.45 0.40 0.15
Cumulative Proportion 0.45 0.85 1.00
With factor correlations of
PA1 PA2 PA3
PA1 1.00 0.33 0.30
PA2 0.33 1.00 0.34
PA3 0.30 0.34 1.00
Mean item complexity = 1
Test of the hypothesis that 3 factors are sufficient.
The degrees of freedom for the null model are 78 and the objective function was 8.96
The degrees of freedom for the model are 42 and the objective function was 0.41
The root mean square of the residuals (RMSR) is 0.02
The df corrected root mean square of the residuals is 0.03
Fit based upon off diagonal values = 1
Measures of factor score adequacy
PA1 PA2 PA3
Correlation of (regression) scores with factors 0.97 0.96 0.84
Multiple R square of scores with factors 0.95 0.93 0.71
Minimum correlation of possible factor scores 0.90 0.85 0.42
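With larger item sets it helps to read the pattern matrix programmatically. This base-R sketch re-enters the rounded promax pattern loadings printed above, assigns each item to the factor with its largest absolute loading, and flags items with a second loading at or above a cutoff as cross-loading. The 0.30 cutoff is a common rule of thumb used here as an assumption, not something the output above prescribes. The last line uses base R's printer for "loadings" objects, whose cutoff and sort arguments hide small values and group items by factor.

```r
# Promax pattern loadings copied (rounded) from the output above
items <- c(paste0("A", 1:5), paste0("B", 1:3), paste0("C", 1:5))
P <- matrix(c(
  -0.04,  0.66,  0.09,
  -0.01,  0.87, -0.09,
  -0.04,  0.88,  0.01,
   0.08,  0.79, -0.02,
   0.05,  0.88,  0.00,
  -0.08,  0.08,  0.66,
   0.02,  0.03,  0.63,
   0.06, -0.10,  0.65,
   0.85,  0.05, -0.02,
   0.84,  0.05,  0.03,
   0.94, -0.01, -0.04,
   0.91, -0.01, -0.05,
   0.82, -0.04,  0.09
), ncol = 3, byrow = TRUE, dimnames = list(items, c("PA1", "PA2", "PA3")))

cutoff <- 0.30  # assumed rule-of-thumb threshold
primary <- colnames(P)[apply(abs(P), 1, which.max)]
names(primary) <- rownames(P)
cross_loading <- apply(abs(P), 1, function(row) sum(row >= cutoff) > 1)

split(names(primary), primary)   # items grouped by their primary factor
names(which(cross_loading))      # items loading >= cutoff on more than one factor

# Base R's loadings printer hides small values and sorts items by factor
print(structure(P, class = "loadings"), cutoff = cutoff, sort = TRUE)
```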
Plotting the oblique-rotation results
Use the factor.plot() and fa.diagram() functions to plot the EFA results.
factor.plot(fa.promax, labels = rownames(fa.promax$loadings))
fa.diagram(fa.promax, digits = 3)
# digits = 3 displays values with 3 decimal places