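Testing for differences in regression coefficients when the same model is estimated on two different samples? (No dummy variables, Stata method)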

12-29

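Suppose the dependent variable y is regressed on x1 and x2 in both cases, except that one regression uses data on state-owned enterprises and the other uses data on private enterprises.
This gives two estimated equations: y=a0+a1*x1+a2*x2 (state-owned) and y=b0+b1*x1+b2*x2 (private).
So here is the question: how do I test whether a1 differs from b1 (and a2 from b2)?
P.S. Dear all, please do not suggest dummy variables again. I want a method that compares the coefficients directly; I have seen many papers do it this way. Ideally it can be done in Stata. Thanks!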

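If you want to test whether the coefficients of the two groups are the same, add a dummy variable. There is no other way. Plenty of people run the regressions separately, but then there is no way to test the two sets of coefficients against each other. To run the test, you have to pool the samples and add a dummy variable.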

===================

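A quick primer, and a disagreement with @徐惟能's answer.

His understanding of the Hausman test is off.

The setting for a Hausman test: there are two estimators b1 and b2 of the same coefficient vector, such that:

Under H0, both b1 and b2 are consistent, but b1 is efficient.

Under H1, b1 is inconsistent, while b2 remains consistent.

Hausman showed that under H0, var(b2-b1)=var(b2)-var(b1).

So one can construct the statistic (b2-b1)'(var(b2)-var(b1))^(-1)(b2-b1)~chi2(k), where k is the number of coefficients being compared.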
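A short sketch of why the variance identity holds, using the post's b1 and b2. It assumes asymptotic variances throughout, and it relies on the standard fact that an efficient estimator is asymptotically uncorrelated with its difference from any other consistent estimator; the label q for the difference is mine.

```latex
% Under H0, b_1 is efficient, so it is uncorrelated with q = b_2 - b_1:
\operatorname{Cov}(b_1,\; b_2 - b_1) = 0 .
% Write b_2 = b_1 + q and take variances:
\operatorname{Var}(b_2) = \operatorname{Var}(b_1) + \operatorname{Var}(b_2 - b_1)
\;\;\Longrightarrow\;\;
\operatorname{Var}(b_2 - b_1) = \operatorname{Var}(b_2) - \operatorname{Var}(b_1) .
% Hausman statistic, with k the number of coefficients compared:
H = (b_2 - b_1)'\,\bigl[\operatorname{Var}(b_2) - \operatorname{Var}(b_1)\bigr]^{-1}(b_2 - b_1)
\;\overset{a}{\sim}\; \chi^2(k) .
```

If neither estimator is efficient under H0, the first line fails and the variance difference need not even be positive definite.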

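For example, testing whether a linear regression suffers from endogeneity:

H0: no endogeneity

H1: endogeneity

Then b1 is the OLS estimate and b2 is the IV estimate.

Likewise, when choosing between fixed effects and random effects, b1 is the random-effects estimate and b2 is the fixed-effects estimate.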
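For reference, the fixed-effects versus random-effects case looks roughly like this in Stata. This is only a sketch: the dataset (`nlswork`, one of Stata's example datasets) and the choice of regressors are my own assumptions, picked so the block runs on its own.

```stata
* Classic Hausman setting on an example panel dataset (assumed choice)
webuse nlswork, clear
xtset idcode

* b2: fixed effects, consistent under both H0 and H1
xtreg ln_wage age tenure hours, fe
estimates store fe

* b1: random effects, efficient under H0 but inconsistent under H1
xtreg ln_wage age tenure hours, re
estimates store re

* hausman expects the always-consistent estimator first, the efficient one second
hausman fe re
```

The order of the two names matters: swapping them flips the sign of the variance difference.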

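But the example in this question clearly does not satisfy the premises of the Hausman test: the two sets of estimates come from two different subsamples, and neither is the efficient estimator under H0.

So the Hausman test should definitely not be used here.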

============================

Here is an excerpt from the wiki page that @徐惟能 linked to:

Consider the linear model y = bX + e, where y is the dependent variable and X is vector of regressors, b is a vector of coefficients and e is the error term. We have two estimators for b: b0 and b1. Under the null hypothesis, both of these estimators are consistent, but b1 is efficient (has the smallest asymptotic variance), at least in the class of estimators containing b0. Under the alternative hypothesis, b0 is consistent, whereas b1 isn't.

Translated, this says exactly what I said above.

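Thanks for the invite. What you need to compare is whether the parameter estimates of the two regression models differ significantly. This is a textbook case for the Hausman test. In Stata:

reg y x1 x2 if soe == 1

estimates store stateowned

reg y x1 x2 if soe == 0

estimates store private

hausman stateowned private

Here soe is a hypothetical 0/1 indicator for state-owned enterprises, and stateowned and private (italicized in the original post) are names you can choose yourself. Any interaction terms in the reg command have to be generated as new variables before they can enter the regression.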

Reference 1: Hausman test

Reference 2: http://www.stata.com/manuals13/rhausman.pdf

Please point out anything that is wrong.

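Let me give you an answer. Tested and working. If you find it useful, please give me an upvote.

Background: my girlfriend had a paper to revise. In the first draft she compared the coefficients of the two equations directly, and the referee said explicitly that this was not acceptable and that the regressions had to be run by group, so she asked me for help. I had always thought this was simple, but I had never actually done it, and for a moment I was stuck. So I asked my senior labmate, who suggested putting a dummy variable into a single equation. That method did not suit my girlfriend's paper, though.

But I was sure that separately estimated regressions can be compared, because I had seen a paper in Econometrica that compares them directly. See the figure below, especially the highlighted part; the paper appeared in Econometrica, Vol. 83, No. 4 (July, 2015), 1315–1371.

So I kept googling for other solutions, and was lucky enough to find a Stata command that does the comparison directly. The command is suest; see the figure below.

As we can see, the command uses a chi-squared statistic. I honestly do not know how it is constructed, but it gives a p-value directly, as in the first figure.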
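For what it is worth, the chi-squared figure reported by test after suest is a Wald statistic. A sketch for the single restriction a1 = b1 in the question's notation follows; the hats and the symbols R, r, V and q are my own notation, not something shown in the post's figures.

```latex
% Wald statistic for H0: a_1 = b_1, using the joint covariance matrix from suest
W = \frac{(\hat a_1 - \hat b_1)^2}
         {\widehat{\operatorname{Var}}(\hat a_1) + \widehat{\operatorname{Var}}(\hat b_1)
          - 2\,\widehat{\operatorname{Cov}}(\hat a_1, \hat b_1)}
\;\overset{a}{\sim}\; \chi^2(1) .
% General form for q linear restrictions R\theta = r on the stacked coefficients:
W = (R\hat\theta - r)'\,\bigl(R\,\hat V\,R'\bigr)^{-1}(R\hat\theta - r)
\;\overset{a}{\sim}\; \chi^2(q) .
```

When the two subsamples share no observations or clusters, the covariance term should drop out, so the denominator is just the sum of the two robust variances.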

Of course, here is the original source of the method: Stata Code Fragment: Comparing Regression Coefficients Across Groups using Suest.

Finally, research is hard. Good luck.

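I cover this question in detail in my column: [連玉君專欄] 如何檢驗分組回歸後的組間係數差異？ (How to test for between-group coefficient differences after subgroup regressions)

Using worked examples, it introduces three methods and how to implement them in Stata:

Method 1: interaction terms (Chow test)
Method 2: a test based on seemingly unrelated estimation (suest)
Method 3: Fisher's permutation test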
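To give a feel for Method 3, here is a rough permutation-test sketch built on Stata's official permute command. It is not the code from the column: the dataset (`auto`), the regressors, and the helper program name `coefdiff` are all my own assumptions for illustration.

```stata
* Hypothetical helper: difference in the weight coefficient between groups
capture program drop coefdiff
program define coefdiff, rclass
    tempname b_dom
    quietly regress price weight mpg if foreign == 0
    scalar `b_dom' = _b[weight]
    quietly regress price weight mpg if foreign == 1
    return scalar diff = _b[weight] - `b_dom'
end

sysuse auto, clear

* Reshuffle the group label 1000 times and recompute the difference each time;
* the reported p-value is the share of reshuffled differences at least as
* extreme as the observed one
permute foreign diff = r(diff), reps(1000) seed(12345) nodots: coefdiff
```

Shuffling the group label treats the two groups as fully exchangeable under the null, which is somewhat stronger than equality of a single coefficient, so read the p-value with that in mind.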

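Try the seemingly unrelated estimation approach (suest):

* regression on the first group
reg y x1 if z==0

est store a

* regression on the second group
reg y x1 if z==1

est store b

* combine the two sets of estimates with a joint covariance matrix
suest a b

* Wald test that the coefficient on x1 is equal across the two groups
test [a_mean]x1=[b_mean]x1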
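The same pattern extends to two regressors, as in the original question. The sketch below uses Stata's built-in `auto` dataset, with price, weight, mpg and foreign standing in for y, x1, x2 and the ownership indicator; those stand-ins and the stored names dom and frn are assumptions for illustration only.

```stata
sysuse auto, clear

* Separate regressions by group
regress price weight mpg if foreign == 0
estimates store dom
regress price weight mpg if foreign == 1
estimates store frn

* Stack the two sets of estimates
suest dom frn

* One coefficient at a time (the a1 = b1 comparison)
test [dom_mean]weight = [frn_mean]weight

* Both slopes jointly (a1 = b1 and a2 = b2)
test ([dom_mean]weight = [frn_mean]weight) ([dom_mean]mpg = [frn_mean]mpg)
```

Unlike a pooled regression with interactions, this does not force the two groups to share the same error variance.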

The easiest way to perform the test is the Chow test, as suggested by @慧航. However, it inevitably involves a dummy variable.

Another method is seemingly unrelated regression, as suggested by @月生可可.

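I am only a beginner in economics, but the test can certainly be done, because this is exactly what the Chow test does; it just tests whether all the regression coefficients are equal. Adding a dummy variable and forming interaction terms is more convenient, but it also imposes a stronger assumption: the variables that are not interacted have the same effect in both groups, and only the interacted variables differ. My sense is that the Chow test can be implemented by interacting every variable with the dummy and then testing whether all the interaction coefficients are zero; I suspect the Chow test itself cannot test a single variable. In short: the test can be done, but the Chow test cannot handle a single variable.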
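A sketch of the fully interacted regression described above, using Stata's factor-variable notation on the built-in `auto` dataset (the dataset and variables are stand-ins I chose, not the poster's data). The joint test is the Chow-type test; the last line shows that the same regression also allows a test on one interaction term alone, although that is an ordinary t/F test rather than the Chow test proper.

```stata
sysuse auto, clear

* Every regressor interacted with the group dummy
regress price i.foreign##c.weight i.foreign##c.mpg

* Chow-type joint test: the intercept shift and both slope shifts are zero
testparm i.foreign i.foreign#c.weight i.foreign#c.mpg

* Single-coefficient test: only the weight slope is allowed to differ
test 1.foreign#c.weight
```

This version assumes a common error variance across the two groups; adding vce(robust) to the regress line relaxes that.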

OP, did you solve it? Did you use chowreg?

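Blinder-Oaxaca decomposition. For details, see Oaxaca's papers, which I think appeared in Economica? I do not remember exactly.

The main idea is to split the difference between the two groups into a part due to differences in the values of the explanatory variables and a part due to differences in the coefficients; the coefficient part is usually interpreted as discrimination.

The Stata command is: oaxaca <your regression equation>, by(dummy variable) weight(0 or 1). The values 0 and 1 pick different reference groups, so the results differ. There is also one more option, which you can look up with help; once it is added, you can see the detailed decomposition.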
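A minimal sketch of that call. oaxaca is a user-written command, so it has to be installed first, and the dataset and variables here (`nlsw88` with wage, grade, ttl_exp, tenure and union) are my own stand-ins rather than anything from the original question.

```stata
* One-time install of the user-written command (uncomment if needed)
* ssc install oaxaca

sysuse nlsw88, clear

* Twofold decomposition of the union/non-union wage gap;
* weight(1) uses the first group's coefficients as the reference,
* weight(0) would use the second group's
oaxaca wage grade ttl_exp tenure, by(union) weight(1)
```

The output splits the mean gap into an explained part (differences in regressor values) and an unexplained part (differences in coefficients). Note that this decomposes a gap in means; it is not by itself a test of whether one particular coefficient differs across groups.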

Agree with suest.

How do you test the equality of regression coefficients that are generated from two different regressions, estimated on two different samples?

Title: Testing the equality of coefficients across independent areas

Author: Allen McDowell, StataCorp

Date: April 2001; updated July 2005

You must set up your data and regression model so that one model is nested in a more general model. For example, suppose you have two regressions,

y = a1 + b1*x

and

z = a2 + b2*x

You rename z to y and append the second dataset onto the first dataset. Then, you generate a dummy variable, call it d, that equals 1 if the data came from the second dataset and 0 if the data came from the first dataset. You then generate the interaction between x and d, i.e., w = d*x. Next, you estimate

y = a1 + a2*d + b1*x + b2*w

You can now test whether a2 and b2 are separately or jointly zero. This method generalizes in a straightforward manner to regressions with more than one independent variable.

Here is an example:

. set obs 10
obs was 0, now 10

. set seed 2001

. generate x = invnormal(uniform())

. generate y = 10 + 15*x + 2*invnormal(uniform())

. generate d=0

. regress y x

      Source |       SS       df       MS              Number of obs =      10
-------------+------------------------------           F(  1,     8) = 1363.66
       Model |  2369.31814     1  2369.31814           Prob > F      =  0.0000
    Residual |  13.8997411     8  1.73746764           R-squared     =  0.9942
-------------+------------------------------           Adj R-squared =  0.9934
       Total |  2383.21788     9  264.801986           Root MSE      =  1.3181

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   14.88335   .4030394    36.93   0.000     13.95394    15.81276
       _cons |   10.15211   .4218434    24.07   0.000     9.179336    11.12488
------------------------------------------------------------------------------

. save first
file first.dta saved

. clear

. set obs 10
obs was 0, now 10

. set seed 2002

. generate x = invnormal(uniform())

. generate y = 19 + 17*x + 2*invnormal(uniform())

. generate d=1

. regress y x

      Source |       SS       df       MS              Number of obs =      10
-------------+------------------------------           F(  1,     8) =  177.94
       Model |  1677.80047     1  1677.80047           Prob > F      =  0.0000
    Residual |  75.4304659     8  9.42880824           R-squared     =  0.9570
-------------+------------------------------           Adj R-squared =  0.9516
       Total |  1753.23094     9  194.803438           Root MSE      =  3.0706

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |    17.3141   1.297951    13.34   0.000     14.32102    20.30718
       _cons |   18.37409   .9710377    18.92   0.000     16.13488    20.61331
------------------------------------------------------------------------------

. save second
file second.dta saved

. append using first

. generate w = x*d

. regress y x w d

      Source |       SS       df       MS              Number of obs =      20
-------------+------------------------------           F(  3,    16) =  275.76
       Model |  4618.88818     3  1539.62939           Prob > F      =  0.0000
    Residual |   89.330207    16  5.58313794           R-squared     =  0.9810
-------------+------------------------------           Adj R-squared =  0.9775
       Total |  4708.21839    19  247.800968           Root MSE      =  2.3629

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   14.88335   .7224841    20.60   0.000     13.35176    16.41495
           w |   2.430745   1.232696     1.97   0.066    -.1824545    5.043945
           d |   8.221983    1.06309     7.73   0.000     5.968334    10.47563
       _cons |   10.15211    .756192    13.43   0.000     8.549053    11.75516
------------------------------------------------------------------------------

Notice that the constant and the coefficient on x are exactly the same as in the first regression. Here is a simple way to test that the coefficients on the dummy variable and the interaction term are jointly zero. This is, in effect, testing if the estimated parameters from the first regression are statistically different from the estimated parameters from the second regression:

. test _b[d] =0, notest

 ( 1)  d = 0

. test _b[w] = 0, accum

 ( 1)  d = 0
 ( 2)  w = 0

       F(  2,    16) =   31.04
            Prob > F =    0.0000

Here is how you construct the constant from the second regression from the estimated parameters of the third regression:

. lincom _b[_cons] + _b[d]

 ( 1)  d + _cons = 0

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   18.37409   .7472172    24.59   0.000     16.79006    19.95812
------------------------------------------------------------------------------

Here is how you construct the coefficient on x from the second regression using the estimated parameters from the third regression.

. lincom _b[x] + _b[w]

 ( 1)  x + w = 0

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |    17.3141   .9987779    17.34   0.000     15.19678    19.43141
------------------------------------------------------------------------------

Chow test. 王少飛 has a paper that uses exactly this.

Chow test

See: Stata | FAQ: Chow tests

Chow test (鄒至庄檢驗); see https://zh.m.wikipedia.org/zh-cn/%E9%82%B9%E6%A3%80%E9%AA%8C

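I disagree with @慧航's answer: it is indeed possible to run the regressions separately, without adding dummy variables.

See http://www.brynmawr.edu/socialwork/GSSW/Vartanian/Handouts/chowtest.pdf

I happened to be thinking about this question too, so here is my answer.

@慧航 and @徐惟能, please correct me if I am wrong.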

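The Hausman test can be used.

The setting for a Hausman test: there are two estimators b1 and b2 of the same coefficient vector, such that:

Under H0, both b1 and b2 are consistent, but b1 is efficient.

Under H1, b1 is inconsistent, while b2 is consistent.

In this case, H0 is that the state-owned and private groups follow the same regression with the same coefficients. H1 is that the coefficients differ (or that some of them differ).

b1 is the set of coefficients from a single regression on the two groups pooled together. b2 is the set of coefficients from separate regressions on each group.

If H0 is true, both b1 and b2 are consistent, but because b1 comes from one regression on both groups, its variance is smaller, so it is more efficient. Put differently: if H0 is true, the estimation of b1 uses that information and is therefore more efficient, while b2 does not use it, so although it is consistent, it is less efficient.

If H0 is false, that is, H1 is true, then b1 is inconsistent and b2 is consistent.

So the Hausman test can be used.

But applying the Hausman test to this problem is actually rather cumbersome. I do not recommend it.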
