SOLUTIONS TO COMPUTER EXERCISES IN ASSIGNMENT 3

Question C7.2

part (i)

The estimated model is

. reg  lwage educ exper tenure married black south urban

      Source |       SS       df       MS              Number of obs =     935
-------------+------------------------------           F(  7,   927) =   44.75
       Model |  41.8377619     7  5.97682312           Prob > F      =  0.0000
    Residual |  123.818521   927  .133569063           R-squared     =  0.2526
-------------+------------------------------           Adj R-squared =  0.2469
       Total |  165.656283   934  .177362188           Root MSE      =  .36547

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0654307   .0062504    10.47   0.000     .0531642    .0776973
       exper |    .014043   .0031852     4.41   0.000      .007792     .020294
      tenure |   .0117473    .002453     4.79   0.000     .0069333    .0165613
     married |   .1994171   .0390502     5.11   0.000     .1227801     .276054
       black |  -.1883499   .0376666    -5.00   0.000    -.2622717   -.1144281
       south |  -.0909036   .0262485    -3.46   0.001     -.142417   -.0393903
       urban |   .1839121   .0269583     6.82   0.000     .1310056    .2368185
       _cons |   5.395497    .113225    47.65   0.000      5.17329    5.617704
------------------------------------------------------------------------------

The approximate difference in monthly salary between blacks and nonblacks is -18.8% (strictly speaking, you should use the exact formula that we discussed in class), i.e. black men earn about 18.8% less than nonblack men, holding the other factors fixed. The difference is statistically significant (t-stat. = -5.00).
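
As a rough check of the exact formula mentioned above, a log-point coefficient b can be converted to a percentage effect with 100*(exp(b)-1). The short sketch below is written in Python rather than Stata, and its variable names are illustrative only; the coefficient is copied from the output above.

```python
import math

# Coefficient on black from the part (i) regression output.
b_black = -0.1883499

# Approximate effect: read the log-point coefficient directly as a percentage.
approx_pct = 100 * b_black

# Exact effect in a log-level model: 100 * (exp(b) - 1).
exact_pct = 100 * (math.exp(b_black) - 1)

print(f"approximate: {approx_pct:.1f}%   exact: {exact_pct:.1f}%")
```

The exact figure comes out at roughly -17%, somewhat smaller in absolute value than the approximation.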

part (ii)

Including the squares of exper and tenure gives

. reg  lwage educ exper tenure married black south urban exper2 tenure2

      Source |       SS       df       MS              Number of obs =     935
-------------+------------------------------           F(  9,   925) =   35.17
       Model |  42.2353257     9  4.69281397           Prob > F      =  0.0000
    Residual |  123.420958   925  .133428062           R-squared     =  0.2550
-------------+------------------------------           Adj R-squared =  0.2477
       Total |  165.656283   934  .177362188           Root MSE      =  .36528

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0642761   .0063115    10.18   0.000     .0518896    .0766625
       exper |   .0172146   .0126138     1.36   0.173    -.0075403    .0419695
      tenure |   .0249291   .0081297     3.07   0.002     .0089743    .0408838
     married |    .198547   .0391103     5.08   0.000     .1217917    .2753023
       black |  -.1906636   .0377011    -5.06   0.000    -.2646533    -.116674
       south |  -.0912153   .0262356    -3.48   0.001    -.1427035   -.0397271
       urban |   .1854241   .0269585     6.88   0.000     .1325171    .2383311
      exper2 |  -.0001138   .0005319    -0.21   0.831    -.0011576      .00093
     tenure2 |  -.0007964    .000471    -1.69   0.091    -.0017208    .0001279
       _cons |   5.358676   .1259143    42.56   0.000     5.111565    5.605787
------------------------------------------------------------------------------

The F-test for joint significance of exper2 and tenure2 is

. test exper2 tenure2

 ( 1)  exper2 = 0
 ( 2)  tenure2 = 0

       F(  2,   925) =    1.49
            Prob > F =    0.2260

The p-value (.226) shows that the two variables are jointly insignificant even at the 20% significance level.
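
The F statistic reported by the test command can also be reproduced by hand from the residual sums of squares of the restricted model (part (i)) and the unrestricted model (part (ii)). The following is a minimal Python sketch of that calculation; the variable names are made up for illustration and the numbers are copied from the two outputs above.

```python
# Residual sums of squares copied from the two regression outputs above.
ssr_restricted = 123.818521    # part (i): without exper2 and tenure2
ssr_unrestricted = 123.420958  # part (ii): with exper2 and tenure2

q = 2            # number of restrictions being tested
df_resid = 925   # residual degrees of freedom of the unrestricted model

# F = [(SSR_r - SSR_ur) / q] / [SSR_ur / df_resid]
f_stat = ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / df_resid)

print(f"F({q}, {df_resid}) = {f_stat:.2f}")
```

The result should agree with the F(2, 925) value shown in the Stata output up to rounding.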

part (iii)

Adding the interaction term black*educ to the model in part (i) gives

. reg  lwage educ exper tenure married black south urban black_educ

      Source |       SS       df       MS              Number of obs =     935
-------------+------------------------------           F(  8,   926) =   39.32
       Model |  42.0055468     8  5.25069335           Prob > F      =  0.0000
    Residual |  123.650736   926  .133532113           R-squared     =  0.2536
-------------+------------------------------           Adj R-squared =  0.2471
       Total |  165.656283   934  .177362188           Root MSE      =  .36542

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0671153   .0064277    10.44   0.000     .0545008    .0797298
       exper |   .0138259   .0031906     4.33   0.000     .0075642    .0200876
      tenure |    .011787   .0024529     4.81   0.000     .0069732    .0166009
     married |   .1989077   .0390474     5.09   0.000     .1222761    .2755393
       black |   .0948086   .2553994     0.37   0.711    -.4064202    .5960375
       south |  -.0894495   .0262769    -3.40   0.001    -.1410187   -.0378803
       urban |   .1838523   .0269547     6.82   0.000      .130953    .2367516
  black_educ |  -.0226236   .0201827    -1.12   0.263    -.0622326    .0169854
       _cons |   5.374817   .1147027    46.86   0.000      5.14971    5.599925
------------------------------------------------------------------------------

The coefficient on black_educ shows that the return to education for black men is about 2.3 percentage points lower than the return to education for nonblack men (6.7%), but the
difference is statistically insignificant (t-stat. = -1.12) at the usual significance levels.
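
To make the interaction model concrete, the sketch below (Python, with illustrative names) combines the coefficients from the output above into the implied return to education for each group and the implied black/nonblack log-wage gap at a given level of education. The choice of 12 years of education is an assumption made purely for illustration.

```python
# Coefficients copied from the regression with the black_educ interaction.
b_educ = 0.0671153        # return to education for nonblack men
b_black = 0.0948086       # intercept shift for black men (at educ = 0)
b_black_educ = -0.0226236  # difference in the return to education

# Return to education for black men is the sum of the two slope terms.
return_nonblack = b_educ
return_black = b_educ + b_black_educ


def black_gap(educ):
    """Estimated log-wage gap (black minus nonblack) at a given educ level."""
    return b_black + b_black_educ * educ


# educ = 12 is a hypothetical evaluation point, not a value from the source.
gap_at_12 = black_gap(12)

print(f"return (nonblack): {100 * return_nonblack:.1f}%")
print(f"return (black):    {100 * return_black:.1f}%")
print(f"gap at educ = 12:  {gap_at_12:.3f} log points")
```

Note that the coefficient on black alone measures the gap at zero years of education, which is why it is not informative by itself in this specification.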

part (iv)

Choosing single, nonblack men as a base group

. reg  lwage educ exper tenure south urban single_black married_bl married_nbl

      Source |       SS       df       MS              Number of obs =     935
-------------+------------------------------           F(  8,   926) =   39.17
       Model |  41.8849359     8  5.23561699           Prob > F      =  0.0000
    Residual |  123.771347   926  .133662362           R-squared     =  0.2528
-------------+------------------------------           Adj R-squared =  0.2464
       Total |  165.656283   934  .177362188           Root MSE      =   .3656

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0654751    .006253    10.47   0.000     .0532034    .0777469
       exper |   .0141462    .003191     4.43   0.000     .0078837    .0204087
      tenure |   .0116628   .0024579     4.74   0.000      .006839    .0164866
       south |  -.0919894   .0263212    -3.49   0.000    -.1436455   -.0403333
       urban |   .1843501   .0269778     6.83   0.000     .1314053    .2372948
single_black |    -.24082   .0960229    -2.51   0.012    -.4292677   -.0523723
  married_bl |   .0094484   .0560131     0.17   0.866    -.1004789    .1193757
 married_nbl |   .1889147   .0428777     4.41   0.000     .1047659    .2730635
       _cons |   5.403793   .1141222    47.35   0.000     5.179825    5.627762
------------------------------------------------------------------------------

The estimated wage difference between married blacks and married nonblacks is approximately 100*(.0094-.1889) = -18%, i.e., on average, married black men earn about 18% less than married nonblack men, holding all the other factors fixed.
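
The same comparison can be written out for any pair of groups, since every dummy coefficient is measured relative to the base group (single nonblack men). The Python sketch below uses illustrative names and the coefficients from the output above; it also reports the exact percentage 100*(exp(d)-1) alongside the approximation.

```python
import math

# Group effects relative to the base group (single nonblack men = 0).
group_effect = {
    "single_nonblack": 0.0,
    "single_black": -0.24082,
    "married_black": 0.0094484,
    "married_nonblack": 0.1889147,
}


def log_gap(group_a, group_b):
    """Estimated log-wage difference: group_a minus group_b."""
    return group_effect[group_a] - group_effect[group_b]


diff = log_gap("married_black", "married_nonblack")
approx_pct = 100 * diff
exact_pct = 100 * (math.exp(diff) - 1)

print(f"log gap: {diff:.4f}  approx: {approx_pct:.1f}%  exact: {exact_pct:.1f}%")
```

Testing whether this difference is statistically significant would require the standard error of the difference, which is not part of the output shown here.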
 

Question C7.6

part (i)

The estimated equation for men is

. reg  sleep totwrk educ age age2 yngkid if male==1

      Source |       SS       df       MS              Number of obs =     367
-------------+------------------------------           F(  5,   361) =   14.28
       Model |  12345920.8     5  2469184.17           Prob > F      =  0.0000
    Residual |  62412191.5   361  172886.957           R-squared     =  0.1651
-------------+------------------------------           Adj R-squared =  0.1536
       Total |  74758112.4   366  204257.138           Root MSE      =   415.8

------------------------------------------------------------------------------
       sleep |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      totwrk |  -.1903699   .0261503    -7.28   0.000    -.2417959   -.1389439
        educ |  -12.99377    8.09208    -1.61   0.109     -28.9073    2.919767
         age |   6.766816   15.40098     0.44   0.661    -23.52009    37.05372
        age2 |  -.0328957   .1815595    -0.18   0.856    -.3899428    .3241514
      yngkid |   65.06551   63.65015     1.02   0.307    -60.10615    190.2372
       _cons |   3665.397   332.4681    11.02   0.000      3011.58    4319.215
------------------------------------------------------------------------------

and the estimated equation for women is

. reg  sleep totwrk educ age age2 yngkid if male==0

      Source |       SS       df       MS              Number of obs =     283
-------------+------------------------------           F(  5,   277) =    6.03
       Model |   6141361.7     5  1228272.34           Prob > F      =  0.0000
    Residual |  56394191.8   277   203589.14           R-squared     =  0.0982
-------------+------------------------------           Adj R-squared =  0.0819
       Total |  62535553.5   282  221757.282           Root MSE      =  451.21

------------------------------------------------------------------------------
       sleep |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      totwrk |  -.1455954   .0294894    -4.94   0.000    -.2036472   -.0875436
        educ |  -9.284323   10.30645    -0.90   0.368    -29.57323    11.00459
         age |    -29.544   19.86969    -1.49   0.138    -68.65878    9.570786
        age2 |   .3651966   .2394761     1.52   0.128    -.1062278    .8366209
      yngkid |  -79.66814   104.5678    -0.76   0.447    -285.5166    126.1803
       _cons |   4215.637   410.4392    10.27   0.000      3407.66    5023.613
------------------------------------------------------------------------------

There are notable differences in the two equations. For instance, having a young child is associated with more sleep for men (by 1 hour and 5 minutes per week) but less
sleep for women (by 1 hour and 20 minutes per week), although both effects are statistically insignificant (t = 1.02 and t = -0.76).

part (ii)

The Chow test in this case is an F-test on male and the interaction terms in the regression

. reg  sleep totwrk educ age age2 yngkid male male_twrk male_educ male_age male_age2 male_ykid

      Source |       SS       df       MS              Number of obs =     650
-------------+------------------------------           F( 11,   638) =    9.17
       Model |  18775388.4    11  1706853.49           Prob > F      =  0.0000
    Residual |   118806383   638  186216.902           R-squared     =  0.1365
-------------+------------------------------           Adj R-squared =  0.1216
       Total |   137581772   649  211990.403           Root MSE      =  431.53

------------------------------------------------------------------------------
       sleep |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      totwrk |  -.1455954   .0282032    -5.16   0.000    -.2009777   -.0902131
        educ |  -9.284323   9.856919    -0.94   0.347    -28.64025     10.0716
         age |    -29.544   19.00306    -1.55   0.121    -66.86009    7.772096
        age2 |   .3651966   .2290311     1.59   0.111    -.0845494    .8149425
      yngkid |  -79.66814   100.0069    -0.80   0.426    -276.0507    116.7144
        male |  -550.2391    522.631    -1.05   0.293    -1576.524    476.0457
   male_twrk |  -.0447745   .0391405    -1.14   0.253    -.1216344    .0320854
   male_educ |  -3.709445   12.94949    -0.29   0.775    -29.13823    21.71934
    male_age |   36.31082   24.83132     1.46   0.144    -12.45017     85.0718
   male_age2 |  -.3980923   .2965817    -1.34   0.180    -.9804865    .1843019
   male_ykid |   144.7336   119.8545     1.21   0.228    -90.62328    380.0906
       _cons |   4215.637   392.5375    10.74   0.000     3444.815    4986.458
------------------------------------------------------------------------------

.  test male male_twrk male_educ male_age male_age2 male_ykid

 ( 1)  male = 0
 ( 2)  male_twrk = 0
 ( 3)  male_educ = 0
 ( 4)  male_age = 0
 ( 5)  male_age2 = 0
 ( 6)  male_ykid = 0

       F(  6,   638) =    1.67
            Prob > F =    0.1248

The relevant degrees of freedom are (6,638) and we cannot reject the null at the 5% significance level since the p-value of the test is .125 (>.05).
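
The link between the fully interacted regression and the two separate regressions in part (i) can be checked numerically: the unrestricted SSR equals the sum of the SSRs for men and women, and the degrees of freedom follow from the sample sizes. The Python sketch below uses illustrative names. The function chow_f is a hypothetical helper; it needs the SSR of the pooled regression without male and the interactions, which is not reported above, so it is defined but not evaluated on real numbers.

```python
# Residual sums of squares and sample sizes from the separate regressions.
ssr_men, n_men = 62412191.5, 367
ssr_women, n_women = 56394191.8, 283
k = 5  # slope coefficients per equation: totwrk, educ, age, age2, yngkid

# The fully interacted model reproduces the two separate fits, so its SSR
# should equal the sum of the group SSRs (118806383 in the output above).
ssr_unrestricted = ssr_men + ssr_women

# Chow test degrees of freedom: k + 1 restrictions, n - 2(k + 1) residual df.
n = n_men + n_women
df_num = k + 1
df_den = n - 2 * (k + 1)


def chow_f(ssr_pooled, ssr_unrestricted, df_num, df_den):
    """Chow F statistic given the SSR of the pooled (restricted) regression."""
    return ((ssr_pooled - ssr_unrestricted) / df_num) / (ssr_unrestricted / df_den)


print(f"SSR_ur = {ssr_unrestricted:.1f}, df = ({df_num}, {df_den})")
```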

part (iii)

The F-test on the interaction terms only (allowing for a different intercept for men) is now

.  test male_twrk male_educ male_age male_age2 male_ykid

 ( 1)  male_twrk = 0
 ( 2)  male_educ = 0
 ( 3)  male_age = 0
 ( 4)  male_age2 = 0
 ( 5)  male_ykid = 0

       F(  5,   638) =    0.97
            Prob > F =    0.4353

and we cannot reject the null at the usual significance levels.

part (iv)

Since there are no statistically significant differences in the coefficients for the two groups, the final model should not allow for gender differences.

Question C8.2

part (i)

The regression results with non-robust and robust standard errors are

. reg price bdrms lotsize sqrft

      Source |       SS       df       MS              Number of obs =      88
-------------+------------------------------           F(  3,    84) =   57.46
       Model |  6.1713e+11     3  2.0571e+11           Prob > F      =  0.0000
    Residual |  3.0072e+11    84  3.5800e+09           R-squared     =  0.6724
-------------+------------------------------           Adj R-squared =  0.6607
       Total |  9.1785e+11    87  1.0550e+10           Root MSE      =   59833

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       bdrms |   13852.52   9010.145     1.54   0.128     -4065.14    31770.18
     lotsize |   2.067707   .6421258     3.22   0.002      .790769    3.344644
       sqrft |   122.7782   13.23741     9.28   0.000     96.45415    149.1022
       _cons |  -21770.31   29475.04    -0.74   0.462    -80384.66    36844.04
------------------------------------------------------------------------------

. reg price bdrms lotsize sqrft, robust

Regression with robust standard errors                 Number of obs =      88
                                                       F(  3,    84) =   23.72
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.6724
                                                       Root MSE      =   59833

------------------------------------------------------------------------------
             |               Robust
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       bdrms |   13852.52   8478.625     1.63   0.106    -3008.153     30713.2
     lotsize |   2.067707   1.251424     1.65   0.102    -.4208879    4.556301
       sqrft |   122.7782   17.72533     6.93   0.000     87.52942     158.027
       _cons |  -21770.31   37138.21    -0.59   0.559    -95623.71    52083.09
------------------------------------------------------------------------------

The most important difference is associated with the variable lotsize. Its robust std. error is almost twice as large as the non-robust std. error, making lotsize
less significant (t-stat. drops from 3.2 to 1.7). The std. errors and the t-statistics for the other variables are less affected.

part (ii)

For the log-log model, the results are

. reg lprice bdrms llotsize lsqrft

      Source |       SS       df       MS              Number of obs =      88
-------------+------------------------------           F(  3,    84) =   50.42
       Model |   5.1550402     3  1.71834673           Prob > F      =  0.0000
    Residual |  2.86256399    84  .034078143           R-squared     =  0.6430
-------------+------------------------------           Adj R-squared =  0.6302
       Total |   8.0176042    87   .09215637           Root MSE      =   .1846

------------------------------------------------------------------------------
      lprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       bdrms |   .0369585   .0275313     1.34   0.183    -.0177905    .0917075
    llotsize |   .1679666   .0382812     4.39   0.000     .0918403    .2440929
      lsqrft |   .7002324   .0928653     7.54   0.000     .5155596    .8849051
       _cons |   5.610714   .6512837     8.61   0.000     4.315565    6.905863
------------------------------------------------------------------------------

. reg lprice bdrms llotsize lsqrft, robust

Regression with robust standard errors                 Number of obs =      88
                                                       F(  3,    84) =   49.32
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.6430
                                                       Root MSE      =   .1846

------------------------------------------------------------------------------
             |               Robust
      lprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       bdrms |   .0369585   .0306011     1.21   0.231    -.0238952    .0978121
    llotsize |   .1679666   .0414734     4.05   0.000     .0854922    .2504411
      lsqrft |   .7002324   .1038288     6.74   0.000     .4937575    .9067072
       _cons |   5.610714   .7813144     7.18   0.000     4.056985    7.164443
------------------------------------------------------------------------------

It appears that the log transformation mitigated the heteroskedasticity in the data, and the differences between the robust and non-robust std. errors are now relatively small.
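
One simple way to summarize the comparison in parts (i) and (ii) is the ratio of the robust to the non-robust standard error for each regressor. The Python sketch below uses illustrative names and the standard errors copied from the four outputs above; a ratio near 1 suggests that the robust correction matters little.

```python
# (non-robust SE, robust SE) pairs copied from the outputs above.
level_model = {
    "bdrms": (9010.145, 8478.625),
    "lotsize": (0.6421258, 1.251424),
    "sqrft": (13.23741, 17.72533),
}
log_model = {
    "bdrms": (0.0275313, 0.0306011),
    "llotsize": (0.0382812, 0.0414734),
    "lsqrft": (0.0928653, 0.1038288),
}


def se_ratios(model):
    """Ratio of robust to non-robust standard error for each regressor."""
    return {name: robust / usual for name, (usual, robust) in model.items()}


level_ratios = se_ratios(level_model)
log_ratios = se_ratios(log_model)

for label, ratios in (("level", level_ratios), ("log", log_ratios)):
    for name, ratio in ratios.items():
        print(f"{label:5s} {name:9s} robust/non-robust = {ratio:.2f}")
```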

part (iii)

Similar to the discussion in part (ii)
 

Additional Problem 1

part (a)

The estimated probability model is

. reg  lfp lnnlinc age educ nyc noc, robust

Regression with robust standard errors                 Number of obs =     872
                                                       F(  5,   866) =   26.39
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1097
                                                       Root MSE      =  .47188

------------------------------------------------------------------------------
             |               Robust
         lfp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lnnlinc |  -.1891899   .0370341    -5.11   0.000     -.261877   -.1165028
         age |  -.0122417     .00178    -6.88   0.000    -.0157353   -.0087481
        educ |  -.0098705   .0055969    -1.76   0.078    -.0208556    .0011145
         nyc |  -.2482384   .0328051    -7.57   0.000    -.3126252   -.1838515
         noc |  -.0013628   .0159004    -0.09   0.932    -.0325706     .029845
       _cons |   3.141225   .3758997     8.36   0.000     2.403444    3.879006
------------------------------------------------------------------------------

We need to use heteroskedasticity-robust standard errors since, as derived in class, the conditional variance of the error in a linear probability model depends on the regressors. The effects
of income, age and number of young children are strongly significant for the labour force participation of the married women in the sample. Note that the
number of children older than 7 years (noc) has no statistically significant effect on lfp. Including the square of age in the model produces the following results

. reg  lfp lnnlinc age age2 educ nyc noc, robust

Regression with robust standard errors                 Number of obs =     872
                                                       F(  6,   865) =   36.23
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1568
                                                       Root MSE      =   .4595

------------------------------------------------------------------------------
             |               Robust
         lfp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lnnlinc |  -.2375937    .036336    -6.54   0.000    -.3089106   -.1662767
         age |   .0782537   .0123883     6.32   0.000      .053939    .1025684
        age2 |  -.0011104   .0001482    -7.49   0.000    -.0014013   -.0008195
        educ |  -.0081101   .0054628    -1.48   0.138    -.0188319    .0026118
         nyc |  -.2281682    .032189    -7.09   0.000    -.2913459   -.1649905
         noc |  -.0547312   .0176621    -3.10   0.002    -.0893968   -.0200657
       _cons |   1.968587   .4040262     4.87   0.000     1.175601    2.761573
------------------------------------------------------------------------------

It is interesting to see that noc now has a significant effect on lfp and that age has a diminishing effect (inverted U-shape) on lfp. The marginal effects of age on lfp for a 20-year-old and a 50-year-old woman are .078-2*.0011*20=.034 and .078-2*.0011*50=-.032, respectively, i.e. the age effect switches from positive (for a 20-year-old woman) to negative (for a 50-year-old woman). Call the predicted probabilities yhat. Then,

. summarize yhat if yhat<0

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        yhat |        14   -.1027707    .0941968  -.3507182  -.0185634

. summarize yhat if yhat>1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        yhat |         1    1.016574           .   1.016574   1.016574

The results show that there are 14 negative predicted probabilities and 1 predicted probability larger than 1.
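
The age calculations in part (a) follow from differentiating the quadratic: the marginal effect is b_age + 2*b_age2*age, and it changes sign at age = -b_age/(2*b_age2). The Python sketch below uses illustrative names and the unrounded coefficients from the output, so the third decimal can differ slightly from the hand calculation with rounded values.

```python
# Coefficients on age and age2 from the linear probability model with age2.
b_age = 0.0782537
b_age2 = -0.0011104


def age_effect(age):
    """Marginal effect of one more year of age on P(lfp = 1) at a given age."""
    return b_age + 2 * b_age2 * age


# Age at which the estimated effect switches from positive to negative.
turning_point = -b_age / (2 * b_age2)

print(f"effect at 20: {age_effect(20):.3f}")
print(f"effect at 50: {age_effect(50):.3f}")
print(f"turning point: {turning_point:.1f} years")
```

The turning point lies in the mid-thirties, which is consistent with the sign change between ages 20 and 50.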

part (b)

The results for the probit model are

. probit  lfp lnnlinc age age2 educ nyc noc

Iteration 0:   log likelihood = -601.61168
Iteration 1:   log likelihood = -527.81439
Iteration 2:   log likelihood = -526.37879
Iteration 3:   log likelihood = -526.37677

Probit estimates                                  Number of obs   =        872
                                                  LR chi2(6)      =     150.47
                                                  Prob > chi2     =     0.0000
Log likelihood = -526.37677                       Pseudo R2       =     0.1251

------------------------------------------------------------------------------
         lfp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lnnlinc |  -.7385525   .1323584    -5.58   0.000    -.9979702   -.4791347
         age |   .2332357   .0398307     5.86   0.000      .155169    .3113024
        age2 |     -.0033   .0004906    -6.73   0.000    -.0042616   -.0023383
        educ |  -.0232248   .0162045    -1.43   0.152     -.054985    .0085353
         nyc |  -.6431679    .095832    -6.71   0.000    -.8309951   -.4553406
         noc |  -.1581265   .0501046    -3.16   0.002    -.2563297   -.0599234
       _cons |   4.652659   1.407971     3.30   0.001     1.893087     7.41223
------------------------------------------------------------------------------

. predict yhat2, p

. summarize yhat2

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       yhat2 |       872    .4607822    .1985681    .004972   .9493863

. dprobit  lfp lnnlinc age age2 educ nyc noc

Iteration 0:   log likelihood = -601.61168
Iteration 1:   log likelihood = -527.81439
Iteration 2:   log likelihood = -526.37879
Iteration 3:   log likelihood = -526.37677

Probit estimates                                        Number of obs =    872
                                                        LR chi2(6)    = 150.47
                                                        Prob > chi2   = 0.0000
Log likelihood = -526.37677                             Pseudo R2     = 0.1251

------------------------------------------------------------------------------
     lfp |      dF/dx   Std. Err.      z    P>|z|     x-bar  [    95% C.I.   ]
---------+--------------------------------------------------------------------
 lnnlinc |  -.2922545   .0523248    -5.58   0.000   10.6856  -.394809   -.1897
     age |   .0922943   .0157431     5.86   0.000   39.9553   .061438   .12315
    age2 |  -.0013058   .0001939    -6.73   0.000   1707.63  -.001686 -.000926
    educ |  -.0091904   .0064122    -1.43   0.152   9.30734  -.021758  .003377
     nyc |  -.2545096   .0378944    -6.71   0.000   .311927  -.328781 -.180238
     noc |  -.0625726   .0198295    -3.16   0.002   .982798  -.101438 -.023708
---------+--------------------------------------------------------------------
  obs. P |   .4598624
 pred. P |   .4492704  (at x-bar)
------------------------------------------------------------------------------
    z and P>|z| are the test of the underlying coefficient being 0

Now the predicted probabilities are between 0 and 1 (the minimum is .005 and the maximum is .95). The estimated marginal effects from the two models are fairly
similar.
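
The dF/dx column can be related to the probit coefficients through the usual scale factor: for a continuous regressor, the marginal effect at the means is the standard normal density evaluated at the index x-bar*b, multiplied by the coefficient. The Python sketch below (illustrative names, standard library only) recovers the index from the "pred. P (at x-bar)" value in the output and then rescales the coefficients. It treats every regressor as continuous, which is an assumption.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Predicted probability at the sample means, from the dprobit output.
pred_p_at_mean = 0.4492704

# Recover the index x-bar*b and evaluate the normal density there.
index_at_mean = std_normal.inv_cdf(pred_p_at_mean)
scale = std_normal.pdf(index_at_mean)

# Probit coefficients copied from the probit output above.
probit_coefs = {
    "lnnlinc": -0.7385525,
    "age": 0.2332357,
    "age2": -0.0033,
    "educ": -0.0232248,
    "nyc": -0.6431679,
    "noc": -0.1581265,
}

# Marginal effect at the means = density at the index * coefficient.
marginal_effects = {name: scale * b for name, b in probit_coefs.items()}

print(f"scale factor: {scale:.4f}")
for name, effect in marginal_effects.items():
    print(f"{name:8s} {effect: .4f}")
```

These values can be compared line by line with the dF/dx column reported by dprobit.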
 

Additional Problem 2

part (a)

Run the regression in equation (3)

. reg  lwage educ exper tenure

      Source |       SS       df       MS              Number of obs =     935
-------------+------------------------------           F(  3,   931) =   56.97
       Model |  25.6953242     3  8.56510806           Prob > F      =  0.0000
    Residual |  139.960959   931  .150334005           R-squared     =  0.1551
-------------+------------------------------           Adj R-squared =  0.1524
       Total |  165.656283   934  .177362188           Root MSE      =  .38773

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0748638   .0065124    11.50   0.000      .062083    .0876446
       exper |   .0153285   .0033696     4.55   0.000     .0087156    .0219413
      tenure |   .0133748   .0025872     5.17   0.000     .0082974    .0184522
       _cons |   5.496696   .1105282    49.73   0.000     5.279782    5.713609
------------------------------------------------------------------------------

and generate variables for the square of the residuals (res2), fitted values (yhat), square and cube of the fitted values (yhat2 and yhat3). The RESET test is obtained as an F-test on yhat2 and yhat3 in the regression

. reg lwage educ exper tenure yhat2 yhat3

      Source |       SS       df       MS              Number of obs =     935
-------------+------------------------------           F(  5,   929) =   34.16
       Model |  25.7240527     5  5.14481053           Prob > F      =  0.0000
    Residual |  139.932231   929  .150626728           R-squared     =  0.1553
-------------+------------------------------           Adj R-squared =  0.1507
       Total |  165.656283   934  .177362188           Root MSE      =  .38811

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   4.882578   18.34014     0.27   0.790    -31.11034    40.87549
       exper |   .9994516   3.754893     0.27   0.790    -6.369604    8.368507
      tenure |   .8721999   3.276544     0.27   0.790    -5.558086    7.302485
       yhat2 |   -9.30392   36.02601    -0.26   0.796    -80.00571    61.39787
       yhat3 |   .4490062   1.765325     0.25   0.799    -3.015481    3.913493
       _cons |   210.8379    791.544     0.27   0.790    -1342.584     1764.26
------------------------------------------------------------------------------

. test yhat2 yhat3

 ( 1)  yhat2 = 0
 ( 2)  yhat3 = 0

       F(  2,   929) =    0.10
            Prob > F =    0.9091

Since the p-value is very large, we cannot reject the null hypothesis that the model is correctly specified. The White test for heteroskedasticity is computed from

. reg res2 yhat yhat2

      Source |       SS       df       MS              Number of obs =     935
-------------+------------------------------           F(  2,   932) =    2.48
       Model |  .308151255     2  .154075627           Prob > F      =  0.0841
    Residual |  57.8348912   932  .062054604           R-squared     =  0.0053
-------------+------------------------------           Adj R-squared =  0.0032
       Total |  58.1430424   934  .062251651           Root MSE      =  .24911

------------------------------------------------------------------------------
        res2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        yhat |  -2.325484   3.387548    -0.69   0.493    -8.973589    4.322621
       yhat2 |   .1632434   .2489785     0.66   0.512      -.32538    .6518668
       _cons |   8.407836   11.51798     0.73   0.466    -14.19635    31.01202
------------------------------------------------------------------------------

. test yhat yhat2

 ( 1)  yhat = 0
 ( 2)  yhat2 = 0

       F(  2,   932) =    2.48
            Prob > F =    0.0841

Since the p-value is .084, we reject the null of homoskedasticity at the 10% significance level (but not at the 5% level). Therefore, we can use the specification in equation (3) but we need to
compute heteroskedasticity-robust standard errors.
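
Both test statistics in part (a) can be reproduced from the sums of squares in the outputs. The Python sketch below (illustrative names) recomputes the RESET F statistic from the restricted and augmented SSRs, and also shows the LM form of the White test, n times the R-squared of the auxiliary regression. The LM form is an alternative to the F form used above and is included here only as a cross-check.

```python
import math

# RESET: F test on yhat2 and yhat3 from the restricted and augmented SSRs.
ssr_restricted = 139.960959  # lwage on educ, exper, tenure
ssr_augmented = 139.932231   # same regression plus yhat2 and yhat3
reset_f = ((ssr_restricted - ssr_augmented) / 2) / (ssr_augmented / 929)

# White test (special form), LM version: n * R-squared from res2 on yhat, yhat2.
n = 935
r2_aux = 0.308151255 / 58.1430424  # model SS / total SS of the auxiliary regression
lm_stat = n * r2_aux

# For a chi-square with 2 degrees of freedom the upper-tail
# probability has the closed form exp(-x / 2).
lm_pvalue = math.exp(-lm_stat / 2)

print(f"RESET F(2, 929) = {reset_f:.2f}")
print(f"White LM = {lm_stat:.2f}, p-value = {lm_pvalue:.3f}")
```

The LM p-value should be close to the p-value of the F version, leading to the same conclusion at the 10% level.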

part (b)

IQ is included in the model as a proxy for the unobserved ability. If IQ is omitted, we expect a positive bias in the estimated return to education, i.e.,
on average, the estimated return to education will be above its true value. Including IQ in the model from part (a) gives

. reg  lwage educ exper tenure IQ

      Source |       SS       df       MS              Number of obs =     935
-------------+------------------------------           F(  4,   930) =   52.04
       Model |  30.2968781     4  7.57421953           Prob > F      =  0.0000
    Residual |  135.359405   930  .145547747           R-squared     =  0.1829
-------------+------------------------------           Adj R-squared =  0.1794
       Total |  165.656283   934  .177362188           Root MSE      =  .38151

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0555855   .0072675     7.65   0.000     .0413228    .0698481
       exper |    .015425   .0033155     4.65   0.000     .0089182    .0219318
      tenure |   .0123705   .0025519     4.85   0.000     .0073623    .0173787
          IQ |   .0054564   .0009704     5.62   0.000     .0035519    .0073608
       _cons |   5.209858   .1201247    43.37   0.000     4.974111    5.445605
------------------------------------------------------------------------------

As expected, including a proxy for ability corrects for the upward bias in the estimate for return to education and it drops from 7.5% in part (a) to 5.6%.
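
The size of the drop can be tied to the omitted variable algebra: in the sample, the educ coefficient without IQ equals the educ coefficient with IQ plus the IQ coefficient times delta, where delta is the coefficient on educ in a regression of IQ on educ, exper and tenure. That auxiliary regression is not shown above, so the Python sketch below (illustrative names) only backs out the value of delta implied by the two reported regressions.

```python
# Coefficients copied from the regressions in parts (a) and (b).
b_educ_short = 0.0748638  # educ coefficient without IQ (part (a))
b_educ_long = 0.0555855   # educ coefficient with IQ (part (b))
b_iq = 0.0054564          # IQ coefficient (part (b))

# Change in the estimated return to education when IQ is added.
change = b_educ_short - b_educ_long

# Omitted variable identity: b_educ_short = b_educ_long + b_iq * delta,
# so the implied partial association between IQ and educ is:
implied_delta = change / b_iq

print(f"change in return: {100 * change:.2f} percentage points")
print(f"implied IQ points per year of education: {implied_delta:.2f}")
```

A positive implied delta together with a positive IQ coefficient is what produces the upward bias when IQ is left out.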

part (c)

If ability (or a proxy for ability) is not included in the model, the OLS estimator is biased and inconsistent. We can obtain a consistent estimator
for the return to education by IV provided that we select an "appropriate" instrumental variable for education. Using siblings as an instrumental variable yields

. ivreg  lwage exper tenure (educ=sibs)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =     935
-------------+------------------------------           F(  3,   931) =   18.89
       Model |  13.3491054     3   4.4497018           Prob > F      =  0.0000
    Residual |  152.307178   931   .16359525           R-squared     =  0.0806
-------------+------------------------------           Adj R-squared =  0.0776
       Total |  165.656283   934  .177362188           Root MSE      =  .40447

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1338815   .0291925     4.59   0.000     .0765909    .1911722
       exper |   .0294039   .0076291     3.85   0.000     .0144317    .0443762
      tenure |   .0113425   .0028705     3.95   0.000     .0057091    .0169759
       _cons |   4.553756   .4680328     9.73   0.000     3.635235    5.472278
------------------------------------------------------------------------------
Instrumented:  educ
Instruments:   exper tenure sibs
------------------------------------------------------------------------------

Surprisingly, the return to education from the IV regression (13.4%) is higher than its OLS estimate from part (a) (7.5%). This points to some problems in using sibs as an
instrument for educ (see the discussion in Example 15.2 on p.491 in the text).