Applied Statistics: Chi-Square Tests

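Week Six
Analyzing categorical data: Chi-squared tests

This week's lecture will cover
- Analysing categorical (nominal) data
- Chi-square test of differences between proportions
- Chi-square test of independence

SPSS one-sample nonparametric tests: chi-square test of a population distribution

(1) Purpose: use sample data to infer whether the distribution of the population differs significantly from a known distribution (a goodness-of-fit test). The test applies to statistical inference on categorical data.

(2) Basic hypothesis: H0: the population distribution does not differ significantly from the theoretical distribution.

(3) Basic method: from the known population proportions, compute the expected frequency of each category in the sample, then measure the gap between the observed and expected frequencies by computing the chi-square value. A small chi-square value means the observed and expected frequencies are close. If P is greater than α, H0 cannot be rejected and the population distribution is taken to be not significantly different from the known distribution; conversely, if P is smaller than α, H0 is rejected.

(4) Basic steps in SPSS
- Menu: Analyze - Nonparametric Tests - Chi-Square
- Move the variable to be tested into the Test Variable List box
- Set the range of values to be tested (Expected Range)
  Get from data: use all cases in the data
  Use specified range: a user-defined range of cases
- Specify the expected frequencies (Expected Values)
  All categories equal: every category has the same proportion
  Values: user-defined proportions

Categorical variable
- Variables that describe categories of entities
- We deal with them all the time in statistics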

- Making comparisons among variables, for example whether consumers prefer a particular brand of a product over competing brands
- Checking whether there is a relationship between two categorical variables, for example gender and preference for a product: is the preference for a product independent of gender?

Chi-square test for differences between proportions
- This test involves nominal data produced by a multinomial experiment
- A multinomial experiment is a generalisation of a binomial experiment
- The test evaluates the null hypothesis that data in the target population follow a particular probability distribution

Example 1
We might test whether consumers are indifferent to which of four materials (glass, plastic, steel or aluminium) is used to make soft drink containers. The null hypothesis is that they are indifferent (that is, equal numbers prefer glass, plastic, steel and aluminium).

Example 1: Data
Let pG be the probability that an individual selected at random will nominate glass as his/her preference if required to make a choice. Similarly for pP (plastic), pS (steel) and pA (aluminium).

Hypotheses
H0: pG = pP = pS = pA = 0.25
HA: at least one pi ≠ 0.25
The alternative is that at least one material is more preferred (or less preferred) than the others.

Example 1 (cont.)
Procedure: select a random sample of, say, 100 consumers and determine their preferences.

Under the null hypothesis we expect 25 consumers to nominate glass, 25 to nominate plastic, 25 to nominate steel and 25 to nominate aluminium. These are the expected frequencies, Ei:

Ei = n pi

We compare the expected frequencies with the sample results, the observed frequencies Oi. If they are approximately the same, the sample gives no evidence against the null hypothesis:

Oi ≈ Ei, so H0 is probably true

Example 1 (cont.): Chi square

$$\chi^2_{G-1} = \sum_{i=1}^{G} \frac{(O_i - E_i)^2}{E_i}$$

We require a test statistic to decide whether the difference is large enough to reject the null hypothesis. We use chi square with G - 1 degrees of freedom, where G is the number of groups.

Suppose in our example 39 prefer glass, 16 prefer plastic, 20 prefer steel and 25 prefer aluminium. Recall that the expected frequencies were all 25.
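As a rough check of the arithmetic, the statistic for these counts can be computed in a few lines of Python. This is a minimal sketch rather than part of the lecture; the variable names are illustrative assumptions, and only the counts and the 0.25 probabilities come from Example 1.

```python
# Observed preferences from Example 1: glass, plastic, steel, aluminium.
observed = [39, 16, 20, 25]
n = sum(observed)                      # sample size, 100
null_probs = [0.25, 0.25, 0.25, 0.25]  # H0: all four materials equally preferred

# Expected frequencies under H0: E_i = n * p_i
expected = [n * p for p in null_probs]

# Chi-square statistic: sum of (O_i - E_i)^2 / E_i over the G groups
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
degrees_of_freedom = len(observed) - 1  # G - 1

print(f"expected = {expected}")
print(f"chi-square = {chi_square:.2f} on {degrees_of_freedom} d.f.")
```

The statistic is then compared with the tabulated critical value for the chosen significance level and G - 1 degrees of freedom.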

$$\chi^2_3 = \frac{(39-25)^2}{25} + \frac{(16-25)^2}{25} + \frac{(20-25)^2}{25} + \frac{(25-25)^2}{25} = 12.08$$

Obtain the critical value of chi square at the 5% significance level with 3 d.f. (Table E4, page 742, Berenson et al. 2013):

Critical $\chi^2_3 = 7.82$

i.e. there is only a 5 percent chance or less that $\chi^2_3 > 7.82$ if H0 is true.

Comparison of chi square values
$\chi^2_3 = 12.08 > 7.82$, so reject H0.

Conclusion: at the 5% significance level there is sufficient evidence to reject the null hypothesis. At least one of the probabilities (pi) differs from 0.25. The sample results indicate that the materials are not equally preferred by consumers in the target population. Thus, preferences for at least two materials are different.

Chi square test using SPSS
Example: Suppose that we want to test whether or not customers have a colour preference for packaging. Three different colours, Blue, Green and Purple, are considered. The null hypothesis is that they don't have a colour preference.

Use Analyse / Nonparametric Tests / Chi-Square. The default is that the probabilities are equal.

Main display colour
          Observed N   Expected N   Residual
Blue          26          30.0        -4.0
Green         37          30.0         7.0
Purple        27          30.0        -3.0
Total         90

Observed N: numbers of consumers actually choosing particular colours.
Expected N: numbers of consumers expected to choose particular colours if the null is true.

The observed and expected counts are different, but are they different enough to reject the null?

Test Statistics
                  Main Display Colour
Chi-Square(a)          2.467
df                     2
Asymp. Sig.             .291
a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 30.0.

Chi-Square is the chi-square statistic; df is the degrees of freedom (groups - 1); Asymp. Sig. is the value to check to test the null.

Check the Sig. value to test H0: we cannot reject the null (H0) that all three colours are equally preferred, because Sig. = .291 > 0.05.
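The figures in the SPSS output can be reproduced by hand from the three observed counts. The sketch below is an illustration, not SPSS's own procedure; it relies on the fact that, for 2 degrees of freedom only, the upper-tail probability of the chi-square distribution has the closed form exp(-x/2). The variable names are assumptions made for the example.

```python
import math

# Observed colour choices from the SPSS example: Blue, Green, Purple.
observed = [26, 37, 27]
n = sum(observed)                                   # 90 consumers
expected = [n / len(observed)] * len(observed)      # equal probabilities under H0

residuals = [o - e for o, e in zip(observed, expected)]
chi_square = sum(r ** 2 / e for r, e in zip(residuals, expected))
df = len(observed) - 1

# For df = 2 only, P(chi-square > x) = exp(-x / 2).
# Other degrees of freedom need a statistics library or a table.
p_value = math.exp(-chi_square / 2)

print("residuals:", residuals)
print("chi-square:", round(chi_square, 3), "df:", df, "p:", round(p_value, 3))
```

Because the p-value exceeds 0.05, the sketch leads to the same decision as the Sig. column.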

Conclusion: at the 5% significance level there is insufficient evidence to conclude that consumers in the target population prefer at least one of the three packaging colours over the others.

Tests of independence: Chi-squared test of a contingency table
This test addresses two different problem objectives:
- Are two nominal variables related?
- Are there differences among two or more populations of nominal variables?

Consider the following 3 features: height in centimetres, weight in kilograms and colour of eyes.

Whilst some people are tall and thin, on average taller people weigh more than shorter people.

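Weight and height are not independent.

It seems unlikely that people with blue eyes weigh more, on average, than people with brown eyes. Weight and eye colour are almost certainly independent.

Frequency analysis under cross-classification
Purpose: understand how the data are distributed across the levels of different variables.
- Example: is academic performance associated with gender? (two variables)
- Example: are occupation, gender and fondness for shopping associated? (three variables)

Main steps of the analysis
- Produce the cross-tabulation (contingency table)
- Analyse the relationship between the variables in the table

Producing the cross-tabulation: what is a contingency table?
A contingency table crosses a row variable (here job title) with a column variable (here income). Each cell holds a frequency, and a control variable (here region) can define separate layers.

Job title \ Income     High (persons)   Medium (persons)   Low (persons)
Senior engineer
Engineer
Assistant engineer
Technician
Total

Producing the cross-tabulation: basic steps in SPSS
(1) Menu: Analyze - Descriptive Statistics - Crosstabs
(2) Move one variable into the Row box as the row variable.
(3) Move one variable into the Column box as the column variable.
(4) Optionally move one or more variables into the Layer box as control variables. Control variables in the same layer add their numbers of levels; control variables in different layers multiply them.
(5) Choose whether to display a bar chart for each group (Display clustered bar charts).

Producing the cross-tabulation: further calculations
Cells option: choose which percentages to output in the frequency table.
- Row: row percentages (Row pct)
- Column: column percentages (Col pct)
- Total: total percentages (Tot pct)

Analysing the relationship between the variables in the table
Purpose: use the contingency table to test whether the row and column variables are independent.
Method: the chi-square test, which measures association in qualitative (categorical) data.

Chi-square test: cross-tabulations of age and wage income

Age \ Income    Low   Medium   High
Young           400      0       0
Middle-aged       0    500       0
Old               0      0     600

Age \ Income    Low   Medium   High
Young             0      0     500
Middle-aged       0    600       0
Old             400      0       0

Chi-square test: basic steps
(1) H0: the row and column variables are not associated (they are independent).
(2) Construct the chi-square statistic

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

which follows a chi-square distribution with (r-1)*(c-1) degrees of freedom.
- Count: observed (actual) frequency
- Expected count: expected frequency (the distribution the data would show if H0 held)
- Residual: observed frequency minus expected frequency

          Excellent   Good   Fair   Pass   Total
Male         10         5      5      3      23
Female        8        12      4      1      25
Total        18        17      9      4      48
Percent    37.5      35.4   18.8    8.3     100

              No lung cancer   Lung cancer   Total
Non-smoker         7775             42        7817
Smoker             2099             49        2148
Total              9874             91        9965

[Figure: 1. contingency table; 2. three-dimensional column chart; 3. two-dimensional bar chart, each showing counts of lung cancer and no lung cancer for smokers and non-smokers]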
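The comparison that the bar chart makes visually can also be checked numerically from the smoking and lung cancer counts. The following sketch is an addition for illustration: it computes the share of each group with lung cancer, then the chi-square statistic as the sum of (observed - expected)^2 / expected with expected = row total × column total / n. The dictionary layout and names are assumptions, and no continuity correction is applied.

```python
# 2x2 table from the lecture: rows = smoking status, columns = lung cancer status.
table = {
    "non-smoker": {"no lung cancer": 7775, "lung cancer": 42},
    "smoker": {"no lung cancer": 2099, "lung cancer": 49},
}

# Share of each row that has lung cancer (row percentages).
rates = {}
for group, counts in table.items():
    rates[group] = counts["lung cancer"] / sum(counts.values())
    print(f"{group}: {rates[group]:.2%} with lung cancer")

# Totals needed for the expected counts.
n = sum(sum(counts.values()) for counts in table.values())
col_totals = {
    col: sum(row[col] for row in table.values())
    for col in ("no lung cancer", "lung cancer")
}

# Chi-square statistic: sum over cells of (observed - expected)^2 / expected.
chi_square = 0.0
for counts in table.values():
    row_total = sum(counts.values())
    for col, observed in counts.items():
        expected = row_total * col_totals[col] / n
        chi_square += (observed - expected) ** 2 / expected

print(f"chi-square = {chi_square:.2f} on 1 d.f.")
```

A 2 × 2 table has (2-1)*(2-1) = 1 degree of freedom, so the statistic would be compared with the 1 d.f. critical value.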

Judging visually from charts whether two categorical variables are related:
- The three-dimensional column chart shows clearly the relative sizes of the individual frequencies.
- The two-dimensional bar chart shows that the proportion of lung cancer among smokers is higher than among non-smokers.

Tests of independence (cont.)
Example 2
Suppose we interviewed 400 people and asked them which of three age groups they are in (under 25, 25 to 60, and over 60). We also asked for their response to the statement that “All imports of automobiles should be banned in order to protect the local industry” (agree, no view either way, disagree).

                Attitudes towards banning imports
Age group       Agree   No view   Disagree   Total
Under 25          19       53        25        97
25 - 60           46       94        47       187
Over 60           30       56        30       116
Total             95      203       102       400

Tests of independence (cont.)
Example 2 (cont.)
Null hypothesis: answers to the two questions are independent.

Under the null, by the multiplication rule for independent events:
Prob[over 60 and agree] = Prob[over 60] × Prob[agree]
Expected frequency = Prob[over 60] × Prob[agree] × sample size

Short-cut for expected frequencies, where R_i is the total of row i, C_j is the total of column j and n is the sample size:

$$E_{ij} = \frac{R_i}{n} \cdot \frac{C_j}{n} \cdot n = \frac{R_i C_j}{n}$$

Procedure: we set up a cross-tabulation showing the observed frequencies of answers to the two questions. We calculate the expected frequencies.
Test: our test is based on a comparison of the observed and expected frequencies.

Age * attitude to banning imports: cross tabulation
                              Attitude to ban imports
Age group                     Agree   No view   Disagree   Total
Under 25   Count               19.0     53.0      25.0      97.0
           Expected Count      23.0     49.2      24.7      96.9
25-60      Count               46.0     94.0      47.0     187.0
           Expected Count      44.4     94.9      47.7     187.0
Over 60    Count               30.0     56.0      30.0     116.0
           Expected Count      27.6     58.9      29.6     116.1
Total      Count               95.0    203.0     102.0     400.0
           Expected Count      95.0    203.0     102.0     400.0
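The expected counts in the cross tabulation follow from the row and column totals alone. The sketch below applies the rule row total × column total / sample size to the Example 2 counts; it is an illustration with assumed variable names, not the SPSS procedure itself.

```python
# Observed counts from Example 2: rows = age group, columns = attitude.
row_labels = ["Under 25", "25-60", "Over 60"]
col_labels = ["Agree", "No view", "Disagree"]
observed = [
    [19, 53, 25],
    [46, 94, 47],
    [30, 56, 30],
]

row_totals = [sum(row) for row in observed]        # R_i
col_totals = [sum(col) for col in zip(*observed)]  # C_j
n = sum(row_totals)                                # sample size

# Expected count under independence: E_ij = R_i * C_j / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

print("column order:", col_labels)
for label, row in zip(row_labels, expected):
    print(label, [f"{e:.4f}" for e in row])
```

Each row of expected counts sums to the corresponding observed row total, which is a quick way to check the calculation.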

Calculation of the expected frequency for agree and over 60: 95 × 116 / 400 = 27.55, shown as 27.6 in the table.

The count (observed) and the expected count are different, but are they different enough to reject the null?

Chi-squared test for independence

$$\chi^2_{(r-1)(c-1)} = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

Rationale: O_ij ≈ E_ij, so H0 is probably true

Test statistic
We requi
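The independence statistic can be sketched as a small function that takes any r × c table of counts. This is an illustrative helper, not code from the lecture: the function name chi_square_independence is hypothetical, and it simply sums (observed - expected)^2 / expected over all cells, with expected counts taken from the row and column totals and (r-1)(c-1) degrees of freedom.

```python
def chi_square_independence(observed):
    """Return (statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n  # expected count under H0
            statistic += (o_ij - e_ij) ** 2 / e_ij
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Example 2 counts: rows = under 25, 25-60, over 60;
# columns = agree, no view, disagree.
example_2 = [
    [19, 53, 25],
    [46, 94, 47],
    [30, 56, 30],
]
statistic, df = chi_square_independence(example_2)
print(f"chi-square = {statistic:.3f} on {df} d.f.")
```

The resulting statistic would be compared with the critical value for the reported degrees of freedom, in the same way as in the goodness-of-fit examples.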
