
3 Multiple Regression

3.1 Introduction

The bivariate regression model analysed so far is quite restrictive, as it allows Y to be influenced by only a single regressor X. A moment's reflection about the two examples that we have been using shows how limited this model is: why shouldn't salary be influenced by both post-school education and experience, and ought we not to consider determinants of consumption other than just income, such as inflation and interest rates?

The question that we address now is how we should go about estimating the parameters of the model

    Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i

One possibility might be to regress Y on X_2 and on X_3 separately, thus obtaining the bivariate regression slope estimates \hat{b}_2 and \hat{b}_3, but two intercept estimates. This non-uniqueness of the intercept estimates is just one problem of this approach. An arguably much more serious defect is that, in general, and as we will show later, \hat{b}_2 and \hat{b}_3 are biased estimates of \beta_2 and \beta_3.

To obtain BLUE estimates, we must estimate a multiple regression by OLS. The PRF now becomes

    E(Y_i) = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i}

and assumption 4 generalises to

    Cov(X_{2i}, \varepsilon_i) = Cov(X_{3i}, \varepsilon_i) = 0

OLS estimates are obtained by minimising

    \sum \hat\varepsilon_i^2 = \sum (Y_i - \hat\beta_1 - \hat\beta_2 X_{2i} - \hat\beta_3 X_{3i})^2

with respect to \hat\beta_1, \hat\beta_2 and \hat\beta_3. This yields three normal equations, which can be solved to obtain (writing lower-case letters for deviations from sample means, e.g. y_i = Y_i - \bar{Y})

    \hat\beta_2 = (\sum y x_2 \sum x_3^2 - \sum y x_3 \sum x_2 x_3) / D
    \hat\beta_3 = (\sum y x_3 \sum x_2^2 - \sum y x_2 \sum x_2 x_3) / D
    \hat\beta_1 = \bar{Y} - \hat\beta_2 \bar{X}_2 - \hat\beta_3 \bar{X}_3

where D = \sum x_2^2 \sum x_3^2 - (\sum x_2 x_3)^2 is the common denominator. It can also be shown that

    V(\hat\beta_2) = \sigma^2 \sum x_3^2 / D,    V(\hat\beta_3) = \sigma^2 \sum x_2^2 / D

These formulae contain \sigma^2, an estimate of which is now given by

    s^2 = \sum \hat\varepsilon_i^2 / (n - 3) = RSS / (n - 3)

It then follows that

    (\hat\beta_j - \beta_j) / se(\hat\beta_j) \sim t(n - 3),    j = 1, 2, 3

(Note that the degrees of freedom are n - 3 because we now have two regressors.) Hypothesis tests and confidence intervals can be constructed in a fashion analogous to simple, bivariate, regression.

However, with two regressors, further hypotheses are of interest. One in particular is the joint hypothesis

    H_0: \beta_2 = \beta_3 = 0

which is to be tested against the alternative that \beta_2 and \beta_3 are not both zero. This can be tested using the method introduced previously, being based on the statistic

    F = ((TSS - RSS)/2) / (RSS/(n - 3)) \sim F(2, n - 3)

where TSS = \sum y^2 is the total sum of squares and RSS is as defined above.
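These formulae can be verified numerically. The sketch below uses simulated data (not the salary example from the text) and assumes NumPy is available; it computes the two-regressor OLS estimates from the deviation-form normal equations and checks them against a general-purpose least-squares routine.

```python
import numpy as np

def ols_two_regressors(y, x2, x3):
    """OLS for Y = b1 + b2*X2 + b3*X3 + e via the deviation-form normal equations."""
    y_d, x2_d, x3_d = y - y.mean(), x2 - x2.mean(), x3 - x3.mean()
    s22, s33, s23 = (x2_d**2).sum(), (x3_d**2).sum(), (x2_d * x3_d).sum()
    sy2, sy3 = (y_d * x2_d).sum(), (y_d * x3_d).sum()
    den = s22 * s33 - s23**2                      # the common denominator D
    b2 = (sy2 * s33 - sy3 * s23) / den
    b3 = (sy3 * s22 - sy2 * s23) / den
    b1 = y.mean() - b2 * x2.mean() - b3 * x3.mean()
    return b1, b2, b3

rng = np.random.default_rng(0)
x2 = rng.uniform(0, 10, 50)
x3 = rng.uniform(0, 10, 50)
y = 1.0 + 2.0 * x2 - 0.5 * x3 + rng.normal(0, 1, 50)

b1, b2, b3 = ols_two_regressors(y, x2, x3)

# The same estimates from a general-purpose least-squares routine:
X = np.column_stack([np.ones_like(y), x2, x3])
check = np.linalg.lstsq(X, y, rcond=None)[0]
```

Both routes give identical estimates; the normal-equation route simply makes the role of the common denominator D explicit.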

Now, analogous to r^2 in bivariate regression, we can define the coefficient of MULTIPLE determination as

    R^2 = 1 - RSS/TSS

and some algebra then yields

    F = (R^2/2) / ((1 - R^2)/(n - 3))

Salaries, education and experience yet again

Returning to our salaries example, multiple regression yields the estimated equation, now denoting education as X_2 and experience as X_3. The variances, and hence standard errors, of the slope estimates are obtained from V(\hat\beta_2) = s^2 \sum x_3^2 / D and V(\hat\beta_3) = s^2 \sum x_2^2 / D, where D is the common denominator defined above. For the variance, and hence standard error, of the intercept, we need

    V(\hat\beta_1) = s^2 (1/n + (\bar{X}_2^2 \sum x_3^2 + \bar{X}_3^2 \sum x_2^2 - 2 \bar{X}_2 \bar{X}_3 \sum x_2 x_3) / D)

The t-ratios for testing the individual hypotheses H_0: \beta_2 = 0 and H_0: \beta_3 = 0 are \hat\beta_2/se(\hat\beta_2) and \hat\beta_3/se(\hat\beta_3). Since neither exceeds the 5% critical value of the t(2) distribution in absolute value, neither hypothesis can be rejected at the 5% level. H_0: \beta_3 = 0 strictly cannot be rejected, but it can be at the 5.02% level (which is known as the p- or prob-value of the test).

Furthermore, the F-statistic for H_0: \beta_2 = \beta_3 = 0 exceeds its 5% critical value, so we can reject H_0. The regressors explain 94.4% of the variation in Y (R^2 = 0.944).
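The p-value terminology can be made concrete. With n - 3 = 2 degrees of freedom, as here, the tail probability of the t distribution has a closed form, so the p-value of any t-ratio can be computed directly; the t-ratio of 4.29 below is illustrative, not the exact value from the example.

```python
import math

def t2_two_sided_p(t):
    """Two-sided p-value for a t-ratio with 2 degrees of freedom.
    For df = 2 the survival function has the closed form
    P(T > t) = (1 - t/sqrt(t^2 + 2))/2, so the two-sided p-value is
    1 - |t|/sqrt(t^2 + 2)."""
    return 1.0 - abs(t) / math.sqrt(t * t + 2.0)

p_crit = t2_two_sided_p(4.303)   # 4.303 is the 5% two-sided critical value of t(2)
p_near = t2_two_sided_p(4.29)    # a t-ratio just short of 4.303: p just above 5%
```

This is why a t-ratio fractionally below the critical value yields a p-value fractionally above 5%, like the 5.02% quoted in the text.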

The estimated multiple regression has some interesting features that are worth comparing with the two bivariate regressions of Y on X_2 and of Y on X_3. First, the estimates of \beta_3 are close to each other in the multiple and the Y on X_3 regressions, but the estimates \hat\beta_2 and \hat{b}_2 of \beta_2 are very different: indeed, they are of different signs, and the former is insignificant. Consequently, it appears that X_2, post-school education, is not a significant factor in determining salaries after all, and only experience counts. Education appears significant only when experience is excluded, so that it is acting as a proxy for experience, and the Y on X_2 regression is spurious.

Second, in the multiple regression we have only two degrees of freedom, so information is extremely limited. This is why we need t-ratios in excess of 4.3 for coefficients to be significant at the 5% level, and this can be difficult to achieve. It is often wise to choose larger significance levels (and hence smaller critical values) for small sample sizes and, conversely, low significance levels for very large samples. Further, note that the F-statistic rejects H_0: \beta_2 = \beta_3 = 0 even though neither individual hypothesis can be rejected on a t-test: just one of the coefficients needs to be non-zero for H_0 to be rejected.
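The proxy effect is easy to reproduce by simulation: with \beta_2 = 0 in truth but X_2 and X_3 positively correlated, the simple regression of Y on X_2 delivers a spuriously positive slope, while the multiple regression does not. The setup below is illustrative, not the salary data.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
x2 = rng.uniform(0, 10, n)
x3 = 2.0 + 0.8 * x2 + rng.normal(0, 1, n)    # X3 positively correlated with X2
beta3 = 3.0
y = 1.0 + beta3 * x3 + rng.normal(0, 1, n)   # true beta2 = 0

# Simple regression of Y on X2: slope = sum(y*x2)/sum(x2^2) in deviations
x2_d, y_d = x2 - x2.mean(), y - y.mean()
b2_simple = (y_d * x2_d).sum() / (x2_d**2).sum()

# Multiple regression of Y on X2 and X3
X = np.column_stack([np.ones(n), x2, x3])
b_multi = np.linalg.lstsq(X, y, rcond=None)[0]

# b2_simple is biased towards beta3 times the slope of X3 on X2 (3.0 * 0.8 = 2.4),
# while the multiple-regression estimate of beta2 stays close to its true value, 0.
```

The simple-regression slope picks up essentially the whole indirect effect of X_3, exactly the bias mechanism derived algebraically below.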

The spurious nature of the Y on X_2 regression can be explained algebraically. Comparing the formulae for \hat{b}_2 and \hat\beta_2,

    \hat{b}_2 = \sum y x_2 / \sum x_2^2,    \hat\beta_2 = (\sum y x_2 \sum x_3^2 - \sum y x_3 \sum x_2 x_3) / D

we see that \hat{b}_2 = \hat\beta_2 only if \sum x_2 x_3 = 0, i.e., only if X_2 and X_3 are uncorrelated (r_{23} = 0, in which case we will also have \hat{b}_3 = \hat\beta_3). If the two estimates are identical, multiple regression collapses to a pair of simple regressions. In general, though, substituting the PRF into the formula for \hat{b}_2 gives

    \hat{b}_2 = \beta_2 + \beta_3 \hat{b}_{32} + \sum x_2 \varepsilon / \sum x_2^2,    where \hat{b}_{32} = \sum x_2 x_3 / \sum x_2^2

Thus

    E(\hat{b}_2) = \beta_2 + \beta_3 \hat{b}_{32}

since the last term in the formula for \hat{b}_2 has zero expectation from assumption 4. Hence, if \beta_3 and \hat{b}_{32} are the same sign, E(\hat{b}_2) > \beta_2 and the bias is positive, whereas if they are of opposite sign the bias is negative. Two related points are worth noting: (i) \hat{b}_{32} is the same sign as the correlation between X_2 and X_3, and (ii) \hat{b}_{32} is the slope coefficient in the regression of X_3 on X_2. We can thus explain why we obtained the results that we did from simple regression: X_2 and X_3 are positively correlated (\hat{b}_{32} > 0) and, if \beta_2 is actually zero and \beta_3 is positive, E(\hat{b}_2) = \beta_3 \hat{b}_{32} > 0, so that obtaining a significantly positive \hat{b}_2 when \hat\beta_2 is insignificant is consistent with the theory.

3.2 Regression with k Explanatory Variables

Let us now consider the general multiple regression case when we have k regressors:

    Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + ... + \beta_k X_{ki} + \varepsilon_i

OLS estimates are BLUE under our set of assumptions.
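In practice the k-regressor fit is delegated to a numerical least-squares routine; a minimal sketch on simulated data (the parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
true_beta = np.array([1.0, 2.0, -0.5, 0.7])            # intercept plus k - 1 = 3 slopes

X = np.column_stack([np.ones(n), rng.uniform(0, 10, (n, 3))])   # regressor matrix
y = X @ true_beta + rng.normal(0, 1, n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)       # OLS estimates

resid = y - X @ beta_hat
rss = (resid**2).sum()
s2 = rss / (n - len(true_beta))                        # estimate of sigma^2, df = n - k
r2 = 1 - rss / ((y - y.mean())**2).sum()               # coefficient of multiple determination
```

The degrees-of-freedom correction n - k in s2 is exactly the generalisation discussed next.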

Explicit formulae for the OLS estimates are impossible to obtain algebraically without using a matrix formulation (which we shall present later in the course), and hence they can only realistically be calculated using an appropriate econometric computer package. Nevertheless, all standard results carry forward with minor alterations to reflect the number of regressors, e.g.,

    s^2 = RSS/(n - k),    (\hat\beta_j - \beta_j)/se(\hat\beta_j) \sim t(n - k)

We have referred to the quantity n - k as the degrees of freedom of the regression. Why is this? Suppose k = 3, so that the normal equations are

    \sum \hat\varepsilon_i = 0,    \sum \hat\varepsilon_i X_{2i} = 0,    \sum \hat\varepsilon_i X_{3i} = 0

These three equations fix the values of three residuals, so that only n - 3 are free to vary. Thus, if there are k = n regressors then there are NO degrees of freedom and the regression technique breaks down (in practice, it becomes problematic well before this limit is reached: recall the salary example!).

Including an additional variable in the regression cannot increase the RSS, for the new variable will always explain some part of the variation of Y, even if only a tiny (and insignificant) amount. Hence, from its definition, R^2 will increase towards 1 as more regressors are added, even though they may be unnecessary. To adjust for this effect, we can define the R-bar-squared statistic

    \bar{R}^2 = 1 - (RSS/(n - k)) / (TSS/(n - 1)) = 1 - (1 - R^2)(n - 1)/(n - k)

\bar{R}^2 will only increase when an additional regressor is included if the t-ratio associated with that regressor exceeds unity, and it can even go negative!

3.3 Hypothesis Tests in Multiple Regression

There is a variety of hypotheses in multiple regression models that we might wish to consider. All can be treated within the general framework of the F-test, by interpreting the null hypothesis as a set of (linear) restrictions imposed on the regression model which we wish to test to find out whether they are acceptable:

    F = ((RSS_R - RSS_U)/r) / (RSS_U/(n - k)) \sim F(r, n - k)

where

    RSS_U is the RSS from the unrestricted regression, i.e., the regression estimated under the alternative hypothesis (without the restrictions imposed);
    RSS_R is the RSS from the restricted regression, i.e., the regression estimated under the null hypothesis (with the restrictions imposed);
    r is the number of restrictions imposed by the null hypothesis.

An equivalent form of the test statistic, which may be easier to compute from regression output, is

    F = ((R_U^2 - R_R^2)/r) / ((1 - R_U^2)/(n - k))

where R_U^2 and R_R^2 are the R^2s from the unrestricted and restricted regressions respectively.

Some examples of hypotheses that might be investigated are the following:

(a) We might be interested in testing whether a subset of the coefficients is zero, e.g.,

    H_0: \beta_{k-r+1} = ... = \beta_k = 0

i.e., that the last r regressors are irrelevant (the theory that suggests including them in the model is false). Here the restricted regression is one that contains just the first k - r regressors (note that the ordering of the regressors is our choice).
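Case (a) can be sketched numerically: fit the unrestricted and restricted regressions, form the two residual sums of squares, and compute F. The data below are simulated, with the last regressor genuinely irrelevant, so the restriction is true.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return (resid**2).sum()

rng = np.random.default_rng(7)
n = 80
x2, x3, x4 = rng.uniform(0, 10, (3, n))
y = 1.0 + 2.0 * x2 + 1.5 * x3 + rng.normal(0, 1, n)    # beta4 = 0 in truth

X_u = np.column_stack([np.ones(n), x2, x3, x4])        # unrestricted: k = 4
X_r = X_u[:, :3]                                       # restricted: drop x4, so r = 1

rss_u, rss_r = rss(X_u, y), rss(X_r, y)
r, k = 1, 4
F = ((rss_r - rss_u) / r) / (rss_u / (n - k))
# Under H0: beta4 = 0, F ~ F(1, 76); the 5% critical value is about 3.97,
# so H0 should usually not be rejected with data generated this way.
```

Note that RSS_R can never be smaller than RSS_U, so F is always non-negative: imposing restrictions can only worsen the fit.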

(b) A more complicated type of restriction is where the coefficients obey a linear restriction of the general form

    c_2 \beta_2 + c_3 \beta_3 + ... + c_k \beta_k = c_0

where the c_j are constants. An example of this type of restriction, which occurs quite regularly in economics, is two coefficients summing to zero, e.g., \beta_2 + \beta_3 = 0. This is obtained from the general form by setting c_2 = c_3 = 1, c_4 = ... = c_k = 0 and c_0 = 0. To construct a test of this hypothesis we have to be able to estimate the restricted regression. Suppose k = 3 for simplicity. The restricted model is then, since the hypothesis implies \beta_3 = -\beta_2,

    Y_i = \beta_1 + \beta_2 X_{2i} - \beta_2 X_{3i} + \varepsilon_i

or

    Y_i = \beta_1 + \beta_2 X_i^* + \varepsilon_i,    X_i^* = X_{2i} - X_{3i}

so that the restricted model is the regression of Y on X^*. An equivalent test is the t-ratio on X_3 in the regression of Y on X^* and X_3 (why?).

Modelling food expenditure

Let us now consider a detailed example of multiple regression modelling. Here we use data provided by Dougherty, Introduction to Econometrics, to model the determinants of food expenditure in the U.S. from 1959 to 1983. The assumed model is that aggregate expenditure on food, Y, depends upon aggregate personal income, Z, aggregate personal taxation, T, and the relative price of food, P.

(Note that we have given the regressors distinct letters for easier recognition, and that t subscripts are used below because we are dealing with time series data.)

In the estimated regression, using annual observations, the t-ratios show that each individual regressor is significant, and R^2, being so close to 1, confirms the overall significance of the regression (indeed, the associated F-statistic is 1012!). The residual sum of squares, RSS_U, is given so that it can be used for subsequent calculations, and the residual standard error, s, is reported rather than the residual variance, s^2, because, being in the same units of measurement as Y, it is easier to interpret.

Suppose that an alternative theory suggests that food expenditure is dependent upon income alone, i.e., that \beta_T = \beta_P = 0. The restricted regression is then that of Y on Z alone, and the F-statistic for testing H_0: \beta_T = \beta_P = 0,

    F = ((RSS_R - RSS_U)/2) / (RSS_U/(n - 4))

far exceeds its 5% critical value. Thus the restrictions are unacceptable and the alternative theory is therefore invalid. Note the large changes in the coefficient estimates in the restricted regression compared to those in the original regression (they have roughly halved in size), providing another example of how omitting important variables, here T and P, can seriously bias the coefficients of the remaining regressors!

Now, note that in the unrestricted regression \hat\beta_Z \approx -\hat\beta_T, which suggests testing the hypothesis H_0: \beta_Z + \beta_T = 0. The easiest way of doing this is to impose the restriction directly onto the regression:

    Y_t = \beta_1 + \beta_Z (Z_t - T_t) + \beta_P P_t + \varepsilon_t

so that we now have a new, combined regressor Z_t - T_t. Estimating this regression and forming the F-statistic for testing H_0: \beta_Z + \beta_T = 0, we see that the restriction fits almost perfectly: \beta_Z is estimated much more precisely (its standard error is now 0.003) and the other coefficients are almost unchanged.

Is there an economic rationale for this restriction? Indeed there is! The tax variable is defined so that Z_t - T_t is personal disposable income, i.e., Y depends (partially) on personal disposable income (PDI) and not on personal income and tax separately.

3.4 Multicollinearity

When estimating multiple regressions there is an important problem, often encountered, that has no counterpart in simple regression: multicollinearity. We investigate this problem by employing another simulation example. Suppose that the true PRF is

    Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i

and that the two regressors are correlated. To design in this correlation, the following generating process for X_3 was used:

    X_{3i} = X_{2i} + v_i

where X_2 was drawn from a uniform distribution between 0 and 10 and v is an error term whose variance, \sigma_v^2, is at our disposal. We now investigate what happens when we reduce \sigma_v^2: i.e., what happens when we make the relationship between X_2 and X_3 tighter and tighter.

The regression estimates as \sigma_v^2 is reduced are shown in the table below (standard errors in parentheses; the table continues beyond this extract):

    sigma_v    r_23     beta_1-hat       beta_2-hat      beta_3-hat      R^2
    4          0.82     48.56 (3.42)     3.28 (0.94)     2.64 (0.38)     0.872
    3          0.892    16.15 (4.11)     4.02 (1.22)     2.03 (0.55)     ...

The coefficient estimates get increasingly volatile and their standard errors blow up. Yet the fit of the regression hardly alters: R^2 stays high and, although the estimates eventually become individually insignificant, the F-statistic continues to imply a strong relationship between Y and the regressors. X_2 and X_3 are therefore jointly, but not individually, significant. What is going on here? We can analyse the problem analytically using the standard least squares formulae.
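The simulation is easy to replicate in outline. The sketch below uses a normally distributed v and illustrative parameter values, so the numbers will not match the table, but it shows the same phenomenon: as the X_2-X_3 relationship tightens, the standard error of \hat\beta_2 blows up while R^2 barely moves.

```python
import numpy as np

def fit_stats(sigma_v, rng, n=30):
    """Fit Y = b1 + b2*X2 + b3*X3 + e with X3 = X2 + v, sd(v) = sigma_v.
    Returns the standard error of beta2-hat and the R^2 of the fit."""
    x2 = rng.uniform(0, 10, n)
    x3 = x2 + rng.normal(0, sigma_v, n)
    y = 2.0 + 3.0 * x2 + 2.0 * x3 + rng.normal(0, 1, n)
    X = np.column_stack([np.ones(n), x2, x3])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = (resid**2).sum()
    s2 = rss / (n - 3)
    cov = s2 * np.linalg.inv(X.T @ X)      # Var(beta-hat) = s^2 (X'X)^(-1)
    r2 = 1 - rss / ((y - y.mean())**2).sum()
    return np.sqrt(cov[1, 1]), r2

rng = np.random.default_rng(0)
se_loose, r2_loose = fit_stats(2.0, rng)    # X2 and X3 only loosely related
se_tight, r2_tight = fit_stats(0.05, rng)   # near-collinear regressors
# se_tight is far larger than se_loose, yet both fits have high R^2.
```

This is precisely the joint-but-not-individual significance pattern in the table: the data can pin down the sum of the two coefficients, but not how it divides between them.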
