




Machine Learning Foundations (機器學習基石)

Lecture 10: Logistic Regression

Hsuan-Tien Lin (林軒田), htlin@csie.ntu.edu.tw
Department of Computer Science & Information Engineering, National Taiwan University (國立台灣大學資訊工程系)
Hsuan-Tien Lin (NTU CSIE), Machine Learning Foundations

Roadmap
1 When Can Machines Learn?
2 Why Can Machines Learn?
3 How Can Machines Learn?
4 How Can Machines Learn Better?

Lecture 10: Logistic Regression
- Logistic Regression Problem
- Logistic Regression Error
- Gradient of Logistic Regression Error
- Gradient Descent

Recap of Lecture 9 (Linear Regression): analytic solution w_LIN = X†y with linear regression hypotheses and squared error.

Logistic Regression Problem
Heart Attack Prediction Problem (1/2)

An unknown target distribution P(y|x), containing f(x) + noise, generates the training examples D: (x_1, y_1), ..., (x_N, y_N). A learning algorithm A, guided by an error measure err, picks a final hypothesis g ≈ f from the hypothesis set H.

Example patient features: age 40 years, gender male, blood pressure 130/85, cholesterol level 240, weight 70. Question: heart disease? yes/no.

This is binary classification: the ideal f(x) = sign(P(+1|x) − 1/2) ∈ {−1, +1}, because of the classification err.
Logistic Regression Problem
Heart Attack Prediction Problem (2/2)

Same learning setup, but now the question is soft: heart attack? 80% risk. This is soft binary classification: f(x) = P(+1|x) ∈ [0, 1].

Soft Binary Classification

Target function f(x) = P(+1|x) ∈ [0, 1]: same data as hard binary classification, different target function.

Ideal (noiseless) data would reveal the probability itself:
(x_1, y'_1 = 0.9 = P(+1|x_1)), (x_2, y'_2 = 0.2 = P(+1|x_2)), ..., (x_N, y'_N = 0.6 = P(+1|x_N)).

Actual (noisy) data only reveals a label y_n sampled from P(y|x_n), which can be viewed as a noisy 0/1 version of that probability:
(x_1, y'_1 = 1 = [[○ ~ P(y|x_1)]]), (x_2, y'_2 = 0 = [[× ~ P(y|x_2)]]), ..., (x_N, y'_N = 0 = [[× ~ P(y|x_N)]]).

Logistic Hypothesis

For x = (x_0, x_1, x_2, ..., x_d), the features of a patient (age 40 years, gender male, blood pressure 130/85, cholesterol level 240, ...), calculate a weighted risk score

s = Σ_{i=0}^{d} w_i x_i,

and convert the score to an estimated probability with the logistic function θ(s).

Logistic hypothesis: h(x) = θ(w^T x).

Logistic Function

θ(−∞) = 0, θ(0) = 1/2, θ(∞) = 1, and

θ(s) = e^s / (1 + e^s) = 1 / (1 + e^{−s}):

a smooth, monotonic, sigmoid function of s. Logistic regression: use

h(x) = 1 / (1 + exp(−w^T x))

to approximate the target function f(x) = P(+1|x).

Fun Time: Logistic Regression and Binary Classification

Consider any logistic hypothesis h(x) = 1 / (1 + exp(−w^T x)) that approximates P(y|x). Convert h(x) to a binary classification prediction by taking sign(h(x) − 1/2). What is the equivalent formula for the binary classification prediction?
1 sign(w^T x − 1/2)
2 sign(w^T x)
3 sign(w^T x + 1/2)
4 none of the above

Reference answer: 2. When w^T x = 0, h(x) is exactly 1/2, so thresholding h(x) at 1/2 is the same as thresholding w^T x at 0.
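The logistic function and hypothesis above can be sketched in a few lines of Python. This is a minimal illustration, not course code; the names `theta` and `h` are our own.

```python
import math

def theta(s):
    """Logistic (sigmoid) function: theta(s) = 1 / (1 + e^{-s})."""
    return 1.0 / (1.0 + math.exp(-s))

def h(w, x):
    """Logistic hypothesis h(x) = theta(w^T x)."""
    return theta(sum(wi * xi for wi, xi in zip(w, x)))

# Boundary behaviour of theta: theta(0) = 1/2, and the tails approach 0 and 1.
print(theta(0))             # 0.5
print(round(theta(10), 4))  # 1.0 after rounding: the right tail approaches 1

# Thresholding h(x) at 1/2 is the same as thresholding w^T x at 0,
# matching the Fun Time answer above.
```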
Logistic Regression Error
Three Linear Models

All three models share the linear scoring function s = w^T x:
- linear classification: h(x) = sign(s), with the plausible err = 0/1 (hard to minimize)
- linear regression: h(x) = s, with the friendly err = squared (easy to minimize)
- logistic regression: h(x) = θ(s), with err = ? (small for flipping noise)

How should we define E_in(w) for logistic regression?

Likelihood

With target function f(x) = P(+1|x), we have P(y|x) = f(x) for y = +1 and P(y|x) = 1 − f(x) for y = −1.

Consider D = {(x_1, ○), (x_2, ×), ..., (x_N, ×)}.

Probability that f generates D:
P(x_1) f(x_1) × P(x_2) (1 − f(x_2)) × ... × P(x_N) (1 − f(x_N)).

Likelihood that h generates D:
P(x_1) h(x_1) × P(x_2) (1 − h(x_2)) × ... × P(x_N) (1 − h(x_N)).

If h ≈ f, then likelihood(h) ≈ (probability that f generates D), and that probability is usually large for the D we actually observe.
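The likelihood comparison above can be made concrete with a tiny numeric sketch. The probabilities and labels below are made up for illustration; only the qualitative conclusion (a hypothesis close to f has a likelihood close to the probability under f) comes from the lecture.

```python
# f(x_n) = P(+1 | x_n) for three hypothetical points, and the observed labels,
# as in D = {(x1, o), (x2, x), (x3, x)}.
f_vals = [0.9, 0.2, 0.6]
labels = [+1, -1, -1]

def data_likelihood(p_vals, labels):
    """Product of P(y_n | x_n), ignoring the common P(x_n) factors."""
    prod = 1.0
    for p, y in zip(p_vals, labels):
        prod *= p if y == +1 else (1.0 - p)
    return prod

h_close = [0.85, 0.25, 0.55]  # a hypothesis close to f
h_far = [0.1, 0.9, 0.9]       # a hypothesis far from f

print(data_likelihood(f_vals, labels))   # 0.288: probability that f generates D's labels
print(data_likelihood(h_close, labels))  # close to the value above
print(data_likelihood(h_far, labels))    # much smaller
```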
Likelihood of Logistic Hypothesis

Since likelihood(h) ≈ (probability using f), which is large, pick g = argmax_h likelihood(h).

When h is logistic, h(x) = θ(w^T x), and the symmetry of the sigmoid gives 1 − h(x) = h(−x). So

likelihood(h) = P(x_1) h(+x_1) × P(x_2) h(−x_2) × ... × P(x_N) h(−x_N),

and since the P(x_n) factors do not depend on h,

likelihood(logistic h) ∝ ∏_{n=1}^{N} h(y_n x_n).

Cross-Entropy Error

max_h likelihood(logistic h) ∝ ∏_{n=1}^{N} h(y_n x_n)
⇔ max_w ∏_{n=1}^{N} θ(y_n w^T x_n)
⇔ max_w ln ∏_{n=1}^{N} θ(y_n w^T x_n)
⇔ min_w (1/N) Σ_{n=1}^{N} −ln θ(y_n w^T x_n)
⇔ min_w (1/N) Σ_{n=1}^{N} ln(1 + exp(−y_n w^T x_n))   [using θ(s) = 1 / (1 + exp(−s))]
= min_w (1/N) Σ_{n=1}^{N} err(w, x_n, y_n) = min_w E_in(w),

with err(w, x, y) = ln(1 + exp(−y w^T x)): the cross-entropy error.
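The cross-entropy error and E_in can be written directly from the formulas above; the sketch below uses our own function names and a hand-picked example, not anything from the lecture.

```python
import math

def cross_entropy_err(w, x, y):
    """err(w, x, y) = ln(1 + exp(-y * w^T x))."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    return math.log(1.0 + math.exp(-y * s))

def e_in(w, X, Y):
    """E_in(w) = average cross-entropy error over the data set."""
    return sum(cross_entropy_err(w, x, y) for x, y in zip(X, Y)) / len(X)

# On the decision boundary (w^T x = 0) the error is exactly ln 2.
# This matches the Fun Time discussion below: the correct side gives
# err < ln 2, the wrong side gives err >= ln 2.
print(cross_entropy_err([0.0, 0.0], [1.0, 2.0], +1))  # ln 2, about 0.6931
```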
Fun Time

The four statements below help us understand more about the cross-entropy error err(w, x, y) = ln(1 + exp(−y w^T x)). Which statement is not true?
1 For any w, x, and y, err(w, x, y) > 0.
2 For any w, x, and y, err(w, x, y) < 1126.
3 When y = sign(w^T x), err(w, x, y) < ln 2.
4 When y ≠ sign(w^T x), err(w, x, y) ≥ ln 2.

Reference answer: 2. 1126, really? :-) You are highly encouraged to plot the curve of err with respect to some fixed y and some varying score s = w^T x to know more about the error measure. After plotting, it is easy to see that err is not bounded above, and the other three choices are correct.

Gradient of Logistic Regression Error
Minimizing E_in(w)

min_w E_in(w) = (1/N) Σ_{n=1}^{N} ln(1 + exp(−y_n w^T x_n)).

E_in(w) is continuous, differentiable, twice-differentiable, and convex. How to minimize? Locate the valley: want ∇E_in(w) = 0. First step: derive ∇E_in(w).
The Gradient ∇E_in(w)

Starting from E_in(w) = (1/N) Σ_{n=1}^{N} ln(1 + exp(−y_n w^T x_n)), the chain rule gives, for each component w_i:

∂E_in(w)/∂w_i
= (1/N) Σ_{n=1}^{N} [1 / (1 + exp(−y_n w^T x_n))] · exp(−y_n w^T x_n) · (−y_n x_{n,i})
= (1/N) Σ_{n=1}^{N} [exp(−y_n w^T x_n) / (1 + exp(−y_n w^T x_n))] · (−y_n x_{n,i})
= (1/N) Σ_{n=1}^{N} θ(−y_n w^T x_n)(−y_n x_{n,i}),

so

∇E_in(w) = (1/N) Σ_{n=1}^{N} θ(−y_n w^T x_n)(−y_n x_n).

Minimizing E_in(w)

Want ∇E_in(w) = (1/N) Σ_{n=1}^{N} θ(−y_n w^T x_n)(−y_n x_n) = 0: a scaled, θ-weighted sum of the −y_n x_n vectors.
- All θ(·) = 0: only possible when y_n w^T x_n ≫ 0 for every n, i.e. only on linearly separable D.
- Weighted sum = 0: a non-linear equation of w.
Closed-form solution? No :-(

Gradient of Logistic Regression Error
PLA Revisited: Iterative Optimization

PLA: start from some w_0 (say, 0), and correct its mistakes on D. For t = 0, 1, ...
1 find a mistake of w_t, called (x_{n(t)}, y_{n(t)}), with sign(w_t^T x_{n(t)}) ≠ y_{n(t)}
2 (try to) correct the mistake by w_{t+1} ← w_t + y_{n(t)} x_{n(t)}
When stopped, return the last w as g.
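The gradient formula above can be verified against a finite-difference approximation. This is a sketch with made-up data; the helper names are our own.

```python
import math

def theta(s):
    """Logistic function theta(s) = 1 / (1 + e^{-s})."""
    return 1.0 / (1.0 + math.exp(-s))

def e_in(w, X, Y):
    """E_in(w) = (1/N) sum_n ln(1 + exp(-y_n w^T x_n))."""
    return sum(math.log(1.0 + math.exp(-y * sum(wi * xi for wi, xi in zip(w, x))))
               for x, y in zip(X, Y)) / len(X)

def gradient(w, X, Y):
    """grad E_in(w) = (1/N) sum_n theta(-y_n w^T x_n) * (-y_n x_n)."""
    g = [0.0] * len(w)
    for x, y in zip(X, Y):
        s = sum(wi * xi for wi, xi in zip(w, x))
        factor = theta(-y * s) * (-y)
        for i in range(len(w)):
            g[i] += factor * x[i]
    return [gi / len(X) for gi in g]

# Finite-difference check of the first gradient component on toy data.
X = [[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]]
Y = [+1, -1, +1]
w = [0.1, -0.2]
eps = 1e-6
numeric = (e_in([w[0] + eps, w[1]], X, Y) - e_in(w, X, Y)) / eps
analytic = gradient(w, X, Y)[0]
print(abs(numeric - analytic) < 1e-4)  # True
```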
Equivalently, PLA can be seen as picking some n and updating w_t by

w_{t+1} ← w_t + 1 · [[sign(w_t^T x_n) ≠ y_n]] · (y_n x_n),

which fits the generic template w_{t+1} ← w_t + η v with step size η = 1 and update direction v = [[sign(w_t^T x_n) ≠ y_n]] · (y_n x_n). The choice of (η, v) and the stopping condition together define an iterative optimization approach.
Fun Time

Consider the gradient ∇E_in(w) = (1/N) Σ_{n=1}^{N} θ(−y_n w^T x_n)(−y_n x_n). That is, each example (x_n, y_n) contributes to the gradient by an amount proportional to θ(−y_n w^T x_n). For any given w, which example contributes the most to the gradient?
1 the example with the smallest y_n w^T x_n value
2 the example with the largest y_n w^T x_n value
3 the example with the smallest w^T x_n value
4 the example with the largest w^T x_n value

Reference answer: 1. Using the fact that θ is a monotonic function, we see that the example with the smallest y_n w^T x_n value contributes to the gradient the most.

Gradient Descent
Iterative Optimization

For t = 0, 1, ...: update w_{t+1} ← w_t + η v; when stopped, return the last w as g.

In PLA, v comes from mistake correction. For the smooth E_in(w) of logistic regression, choose v to get the ball to roll downhill:
- direction v: (assumed) of unit length
- step size η: (assumed) positive

A greedy approach for some given η > 0: min_{||v||=1} E_in(w_t + η v).

Linear Approximation

The greedy objective min_{||v||=1} E_in(w_t + η v) is still a non-linear optimization, now with constraints: not any easier than min_w E_in(w). A local approximation by a linear form makes the problem easier: if η is really small (Taylor expansion),

E_in(w_t + η v) ≈ E_in(w_t) + η v^T ∇E_in(w_t).

An approximate greedy approach for some given small η:

min_{||v||=1} E_in(w_t) + η v^T ∇E_in(w_t),

where E_in(w_t) is known, η is a given positive constant, and ∇E_in(w_t) is known.
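The Taylor approximation above can be checked numerically: for a small η and a unit direction v, E_in(w_t + ηv) should be very close to the linear form E_in(w_t) + η v^T ∇E_in(w_t). The data and helper names below are made up for this sketch.

```python
import math

def theta(s):
    return 1.0 / (1.0 + math.exp(-s))

def e_in(w, X, Y):
    """E_in(w) = (1/N) sum_n ln(1 + exp(-y_n w^T x_n))."""
    return sum(math.log(1.0 + math.exp(-y * sum(wi * xi for wi, xi in zip(w, x))))
               for x, y in zip(X, Y)) / len(X)

def gradient(w, X, Y):
    """grad E_in(w) = (1/N) sum_n theta(-y_n w^T x_n) * (-y_n x_n)."""
    g = [0.0] * len(w)
    for x, y in zip(X, Y):
        s = sum(wi * xi for wi, xi in zip(w, x))
        factor = theta(-y * s) * (-y)
        for i in range(len(w)):
            g[i] += factor * x[i]
    return [gi / len(X) for gi in g]

X = [[1.0, 2.0], [1.0, -1.0]]
Y = [+1, -1]
w = [0.3, -0.1]
grad = gradient(w, X, Y)

# Unit direction v (here the negative normalized gradient) and a small eta.
norm = math.sqrt(sum(gi * gi for gi in grad))
v = [-gi / norm for gi in grad]
eta = 1e-4

exact = e_in([wi + eta * vi for wi, vi in zip(w, v)], X, Y)
approx = e_in(w, X, Y) + eta * sum(vi * gi for vi, gi in zip(v, grad))
print(abs(exact - approx) < 1e-6)  # True: the linear form matches for small eta
```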
Gradient Descent

In the approximate greedy objective min_{||v||=1} E_in(w_t) + η v^T ∇E_in(w_t), only the v^T ∇E_in(w_t) term depends on v, so the optimal v points in the direction opposite to the gradient:

v = −∇E_in(w_t) / ||∇E_in(w_t)||.

Gradient descent: for small η,

w_{t+1} ← w_t − η ∇E_in(w_t) / ||∇E_in(w_t)||.

Gradient descent is a simple and popular optimization tool.

Choice of η

- η too small: progress is too slow :-(
- η too large: the updates are too unstable :-(
- just right: neither too small nor too large
A better idea: use a changing η that is monotonic in ||∇E_in(w_t)||.

Simple Heuristic for Changing η

Substituting η = η′ · ||∇E_in(w_t)|| into the update above cancels the normalization and gives

w_{t+1} ← w_t − η′ ∇E_in(w_t).

Call η′ the fixed learning rate. Fixed learning rate gradient descent:

w_{t+1} ← w_t − η ∇E_in(w_t).
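Putting the pieces together, fixed learning rate gradient descent for logistic regression can be sketched as follows. The toy data, the choice η = 0.1, and the iteration count are our own illustrative assumptions.

```python
import math

def theta(s):
    return 1.0 / (1.0 + math.exp(-s))

def e_in(w, X, Y):
    """E_in(w) = (1/N) sum_n ln(1 + exp(-y_n w^T x_n))."""
    return sum(math.log(1.0 + math.exp(-y * sum(wi * xi for wi, xi in zip(w, x))))
               for x, y in zip(X, Y)) / len(X)

def gradient(w, X, Y):
    """grad E_in(w) = (1/N) sum_n theta(-y_n w^T x_n) * (-y_n x_n)."""
    g = [0.0] * len(w)
    for x, y in zip(X, Y):
        s = sum(wi * xi for wi, xi in zip(w, x))
        factor = theta(-y * s) * (-y)
        for i in range(len(w)):
            g[i] += factor * x[i]
    return [gi / len(X) for gi in g]

# Fixed learning rate gradient descent: w_{t+1} <- w_t - eta * grad E_in(w_t).
X = [[1.0, 2.0, 1.0], [1.0, -1.0, -0.5], [1.0, 0.5, 2.0], [1.0, -2.0, -1.0]]
Y = [+1, -1, +1, -1]
w = [0.0, 0.0, 0.0]
eta = 0.1

before = e_in(w, X, Y)           # exactly ln 2 at w = 0
for t in range(1000):
    g = gradient(w, X, Y)
    w = [wi - eta * gi for wi, gi in zip(w, g)]
after = e_in(w, X, Y)

print(after < before)  # True: E_in decreases on this convex objective
```

Since E_in is convex, a small enough fixed η keeps decreasing the error; in practice the loop stops when ||∇E_in(w_t)|| is near zero or after enough iterations.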