




Machine Learning Foundations (機器學習基石)

Lecture 10: Logistic Regression

Hsuan-Tien Lin (林軒田), htlin@csie.ntu.edu.tw
Department of Computer Science & Information Engineering, National Taiwan University (國立台灣大學資訊工程系)
Hsuan-Tien Lin (NTU CSIE), Machine Learning Foundations

Roadmap
1 When Can Machines Learn?
2 Why Can Machines Learn?
3 How Can Machines Learn?
4 How Can Machines Learn Better?

Lecture 10: Logistic Regression
- Logistic Regression Problem
- Logistic Regression Error
- Gradient of Logistic Regression Error
- Gradient Descent

Recap of Lecture 9 (Linear Regression): analytic solution w_LIN = X†y with linear regression hypotheses and squared error.

Logistic Regression Problem
Heart Attack Prediction Problem (1/2)

An unknown target distribution P(y|x), containing f(x) + noise, generates the training examples D: (x_1, y_1), ..., (x_N, y_N). A learning algorithm A, guided by an error measure err, picks a final hypothesis g ≈ f from the hypothesis set H.

Example patient features: age 40 years, gender male, blood pressure 130/85, cholesterol level 240, weight 70. Question: heart disease? yes/no.

This is binary classification: the ideal f(x) = sign(P(+1|x) − 1/2) ∈ {−1, +1}, because of the classification err.
Logistic Regression Problem
Heart Attack Prediction Problem (2/2)

Same learning setup, but now the question is soft: heart attack? 80% risk. This is soft binary classification: f(x) = P(+1|x) ∈ [0, 1].

Soft Binary Classification

Target function f(x) = P(+1|x) ∈ [0, 1]: same data as hard binary classification, different target function.

Ideal (noiseless) data would reveal the probability itself:
(x_1, y'_1 = 0.9 = P(+1|x_1)), (x_2, y'_2 = 0.2 = P(+1|x_2)), ..., (x_N, y'_N = 0.6 = P(+1|x_N)).

Actual (noisy) data only reveals a label y_n sampled from P(y|x_n), which can be viewed as a noisy 0/1 version of that probability:
(x_1, y'_1 = 1 = [[○ ~ P(y|x_1)]]), (x_2, y'_2 = 0 = [[× ~ P(y|x_2)]]), ..., (x_N, y'_N = 0 = [[× ~ P(y|x_N)]]).

Logistic Hypothesis

For x = (x_0, x_1, x_2, ..., x_d), the features of a patient (age 40 years, gender male, blood pressure 130/85, cholesterol level 240, ...), calculate a weighted risk score

s = Σ_{i=0}^{d} w_i x_i,

and convert the score to an estimated probability with the logistic function θ(s).

Logistic hypothesis: h(x) = θ(w^T x).

Logistic Function

θ(−∞) = 0, θ(0) = 1/2, θ(∞) = 1, and

θ(s) = e^s / (1 + e^s) = 1 / (1 + e^{−s}):

a smooth, monotonic, sigmoid function of s. Logistic regression: use

h(x) = 1 / (1 + exp(−w^T x))

to approximate the target function f(x) = P(+1|x).

Fun Time: Logistic Regression and Binary Classification

Consider any logistic hypothesis h(x) = 1 / (1 + exp(−w^T x)) that approximates P(y|x). Convert h(x) to a binary classification prediction by taking sign(h(x) − 1/2). What is the equivalent formula for the binary classification prediction?
1 sign(w^T x − 1/2)
2 sign(w^T x)
3 sign(w^T x + 1/2)
4 none of the above

Reference answer: 2. When w^T x = 0, h(x) is exactly 1/2, so thresholding h(x) at 1/2 is the same as thresholding w^T x at 0.
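The logistic function and hypothesis above can be sketched in a few lines of Python. This is a minimal illustration, not course code; the names `theta` and `h` are our own.

```python
import math

def theta(s):
    """Logistic (sigmoid) function: theta(s) = 1 / (1 + e^{-s})."""
    return 1.0 / (1.0 + math.exp(-s))

def h(w, x):
    """Logistic hypothesis h(x) = theta(w^T x)."""
    return theta(sum(wi * xi for wi, xi in zip(w, x)))

# Boundary behaviour of theta: theta(0) = 1/2, and the tails approach 0 and 1.
print(theta(0))             # 0.5
print(round(theta(10), 4))  # 1.0 after rounding: the right tail approaches 1

# Thresholding h(x) at 1/2 is the same as thresholding w^T x at 0,
# matching the Fun Time answer above.
```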
Logistic Regression Error
Three Linear Models

All three models share the linear scoring function s = w^T x:
- linear classification: h(x) = sign(s), with the plausible err = 0/1 (hard to minimize)
- linear regression: h(x) = s, with the friendly err = squared (easy to minimize)
- logistic regression: h(x) = θ(s), with err = ? (small for flipping noise)

How should we define E_in(w) for logistic regression?

Likelihood

With target function f(x) = P(+1|x), we have P(y|x) = f(x) for y = +1 and P(y|x) = 1 − f(x) for y = −1.

Consider D = {(x_1, ○), (x_2, ×), ..., (x_N, ×)}.

Probability that f generates D:
P(x_1) f(x_1) × P(x_2) (1 − f(x_2)) × ... × P(x_N) (1 − f(x_N)).

Likelihood that h generates D:
P(x_1) h(x_1) × P(x_2) (1 − h(x_2)) × ... × P(x_N) (1 − h(x_N)).

If h ≈ f, then likelihood(h) ≈ (probability that f generates D), and that probability is usually large for the D we actually observe.
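The likelihood comparison above can be made concrete with a tiny numeric sketch. The probabilities and labels below are made up for illustration; only the qualitative conclusion (a hypothesis close to f has a likelihood close to the probability under f) comes from the lecture.

```python
# f(x_n) = P(+1 | x_n) for three hypothetical points, and the observed labels,
# as in D = {(x1, o), (x2, x), (x3, x)}.
f_vals = [0.9, 0.2, 0.6]
labels = [+1, -1, -1]

def data_likelihood(p_vals, labels):
    """Product of P(y_n | x_n), ignoring the common P(x_n) factors."""
    prod = 1.0
    for p, y in zip(p_vals, labels):
        prod *= p if y == +1 else (1.0 - p)
    return prod

h_close = [0.85, 0.25, 0.55]  # a hypothesis close to f
h_far = [0.1, 0.9, 0.9]       # a hypothesis far from f

print(data_likelihood(f_vals, labels))   # 0.288: probability that f generates D's labels
print(data_likelihood(h_close, labels))  # close to the value above
print(data_likelihood(h_far, labels))    # much smaller
```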
Likelihood of Logistic Hypothesis

Since likelihood(h) ≈ (probability using f), which is large, pick g = argmax_h likelihood(h).

When h is logistic, h(x) = θ(w^T x), and the symmetry of the sigmoid gives 1 − h(x) = h(−x). So

likelihood(h) = P(x_1) h(+x_1) × P(x_2) h(−x_2) × ... × P(x_N) h(−x_N),

and since the P(x_n) factors do not depend on h,

likelihood(logistic h) ∝ ∏_{n=1}^{N} h(y_n x_n).

Cross-Entropy Error

max_h likelihood(logistic h) ∝ ∏_{n=1}^{N} h(y_n x_n)
⇔ max_w ∏_{n=1}^{N} θ(y_n w^T x_n)
⇔ max_w ln ∏_{n=1}^{N} θ(y_n w^T x_n)
⇔ min_w (1/N) Σ_{n=1}^{N} −ln θ(y_n w^T x_n)
⇔ min_w (1/N) Σ_{n=1}^{N} ln(1 + exp(−y_n w^T x_n))   [using θ(s) = 1 / (1 + exp(−s))]
= min_w (1/N) Σ_{n=1}^{N} err(w, x_n, y_n) = min_w E_in(w),

with err(w, x, y) = ln(1 + exp(−y w^T x)): the cross-entropy error.
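The cross-entropy error and E_in can be written directly from the formulas above; the sketch below uses our own function names and a hand-picked example, not anything from the lecture.

```python
import math

def cross_entropy_err(w, x, y):
    """err(w, x, y) = ln(1 + exp(-y * w^T x))."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    return math.log(1.0 + math.exp(-y * s))

def e_in(w, X, Y):
    """E_in(w) = average cross-entropy error over the data set."""
    return sum(cross_entropy_err(w, x, y) for x, y in zip(X, Y)) / len(X)

# On the decision boundary (w^T x = 0) the error is exactly ln 2.
# This matches the Fun Time discussion below: the correct side gives
# err < ln 2, the wrong side gives err >= ln 2.
print(cross_entropy_err([0.0, 0.0], [1.0, 2.0], +1))  # ln 2, about 0.6931
```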
Fun Time

The four statements below help us understand more about the cross-entropy error err(w, x, y) = ln(1 + exp(−y w^T x)). Which statement is not true?
1 For any w, x, and y, err(w, x, y) > 0.
2 For any w, x, and y, err(w, x, y) < 1126.
3 When y = sign(w^T x), err(w, x, y) < ln 2.
4 When y ≠ sign(w^T x), err(w, x, y) ≥ ln 2.

Reference answer: 2. 1126, really? :-) You are highly encouraged to plot the curve of err with respect to some fixed y and some varying score s = w^T x to know more about the error measure. After plotting, it is easy to see that err is not bounded above, and the other three choices are correct.

Gradient of Logistic Regression Error
Minimizing E_in(w)

min_w E_in(w) = (1/N) Σ_{n=1}^{N} ln(1 + exp(−y_n w^T x_n)).

E_in(w) is continuous, differentiable, twice-differentiable, and convex. How to minimize? Locate the valley: want ∇E_in(w) = 0. First step: derive ∇E_in(w).
The Gradient ∇E_in(w)

Starting from E_in(w) = (1/N) Σ_{n=1}^{N} ln(1 + exp(−y_n w^T x_n)), the chain rule gives, for each component w_i:

∂E_in(w)/∂w_i
= (1/N) Σ_{n=1}^{N} [1 / (1 + exp(−y_n w^T x_n))] · exp(−y_n w^T x_n) · (−y_n x_{n,i})
= (1/N) Σ_{n=1}^{N} [exp(−y_n w^T x_n) / (1 + exp(−y_n w^T x_n))] · (−y_n x_{n,i})
= (1/N) Σ_{n=1}^{N} θ(−y_n w^T x_n)(−y_n x_{n,i}),

so

∇E_in(w) = (1/N) Σ_{n=1}^{N} θ(−y_n w^T x_n)(−y_n x_n).

Minimizing E_in(w)

Want ∇E_in(w) = (1/N) Σ_{n=1}^{N} θ(−y_n w^T x_n)(−y_n x_n) = 0: a scaled, θ-weighted sum of the −y_n x_n vectors.
- All θ(·) = 0: only possible when y_n w^T x_n ≫ 0 for every n, i.e. only on linearly separable D.
- Weighted sum = 0: a non-linear equation of w.
Closed-form solution? No :-(

Gradient of Logistic Regression Error
PLA Revisited: Iterative Optimization

PLA: start from some w_0 (say, 0), and correct its mistakes on D. For t = 0, 1, ...
1 find a mistake of w_t, called (x_{n(t)}, y_{n(t)}), with sign(w_t^T x_{n(t)}) ≠ y_{n(t)}
2 (try to) correct the mistake by w_{t+1} ← w_t + y_{n(t)} x_{n(t)}
When stopped, return the last w as g.
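The gradient formula above can be verified against a finite-difference approximation. This is a sketch with made-up data; the helper names are our own.

```python
import math

def theta(s):
    """Logistic function theta(s) = 1 / (1 + e^{-s})."""
    return 1.0 / (1.0 + math.exp(-s))

def e_in(w, X, Y):
    """E_in(w) = (1/N) sum_n ln(1 + exp(-y_n w^T x_n))."""
    return sum(math.log(1.0 + math.exp(-y * sum(wi * xi for wi, xi in zip(w, x))))
               for x, y in zip(X, Y)) / len(X)

def gradient(w, X, Y):
    """grad E_in(w) = (1/N) sum_n theta(-y_n w^T x_n) * (-y_n x_n)."""
    g = [0.0] * len(w)
    for x, y in zip(X, Y):
        s = sum(wi * xi for wi, xi in zip(w, x))
        factor = theta(-y * s) * (-y)
        for i in range(len(w)):
            g[i] += factor * x[i]
    return [gi / len(X) for gi in g]

# Finite-difference check of the first gradient component on toy data.
X = [[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]]
Y = [+1, -1, +1]
w = [0.1, -0.2]
eps = 1e-6
numeric = (e_in([w[0] + eps, w[1]], X, Y) - e_in(w, X, Y)) / eps
analytic = gradient(w, X, Y)[0]
print(abs(numeric - analytic) < 1e-4)  # True
```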
Equivalently, PLA can be seen as picking some n and updating w_t by

w_{t+1} ← w_t + 1 · [[sign(w_t^T x_n) ≠ y_n]] · (y_n x_n),

which fits the generic template w_{t+1} ← w_t + η v with step size η = 1 and update direction v = [[sign(w_t^T x_n) ≠ y_n]] · (y_n x_n). The choice of (η, v) and the stopping condition together define an iterative optimization approach.
Fun Time

Consider the gradient ∇E_in(w) = (1/N) Σ_{n=1}^{N} θ(−y_n w^T x_n)(−y_n x_n). That is, each example (x_n, y_n) contributes to the gradient by an amount proportional to θ(−y_n w^T x_n). For any given w, which example contributes the most to the gradient?
1 the example with the smallest y_n w^T x_n value
2 the example with the largest y_n w^T x_n value
3 the example with the smallest w^T x_n value
4 the example with the largest w^T x_n value

Reference answer: 1. Using the fact that θ is a monotonic function, we see that the example with the smallest y_n w^T x_n value contributes to the gradient the most.

Gradient Descent
Iterative Optimization

For t = 0, 1, ...: update w_{t+1} ← w_t + η v; when stopped, return the last w as g.

In PLA, v comes from mistake correction. For the smooth E_in(w) of logistic regression, choose v to get the ball to roll downhill:
- direction v: (assumed) of unit length
- step size η: (assumed) positive

A greedy approach for some given η > 0: min_{||v||=1} E_in(w_t + η v).

Linear Approximation

The greedy objective min_{||v||=1} E_in(w_t + η v) is still a non-linear optimization, now with constraints: not any easier than min_w E_in(w). A local approximation by a linear form makes the problem easier: if η is really small (Taylor expansion),

E_in(w_t + η v) ≈ E_in(w_t) + η v^T ∇E_in(w_t).

An approximate greedy approach for some given small η:

min_{||v||=1} E_in(w_t) + η v^T ∇E_in(w_t),

where E_in(w_t) is known, η is a given positive constant, and ∇E_in(w_t) is known.
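The Taylor approximation above can be checked numerically: for a small η and a unit direction v, E_in(w_t + ηv) should be very close to the linear form E_in(w_t) + η v^T ∇E_in(w_t). The data and helper names below are made up for this sketch.

```python
import math

def theta(s):
    return 1.0 / (1.0 + math.exp(-s))

def e_in(w, X, Y):
    """E_in(w) = (1/N) sum_n ln(1 + exp(-y_n w^T x_n))."""
    return sum(math.log(1.0 + math.exp(-y * sum(wi * xi for wi, xi in zip(w, x))))
               for x, y in zip(X, Y)) / len(X)

def gradient(w, X, Y):
    """grad E_in(w) = (1/N) sum_n theta(-y_n w^T x_n) * (-y_n x_n)."""
    g = [0.0] * len(w)
    for x, y in zip(X, Y):
        s = sum(wi * xi for wi, xi in zip(w, x))
        factor = theta(-y * s) * (-y)
        for i in range(len(w)):
            g[i] += factor * x[i]
    return [gi / len(X) for gi in g]

X = [[1.0, 2.0], [1.0, -1.0]]
Y = [+1, -1]
w = [0.3, -0.1]
grad = gradient(w, X, Y)

# Unit direction v (here the negative normalized gradient) and a small eta.
norm = math.sqrt(sum(gi * gi for gi in grad))
v = [-gi / norm for gi in grad]
eta = 1e-4

exact = e_in([wi + eta * vi for wi, vi in zip(w, v)], X, Y)
approx = e_in(w, X, Y) + eta * sum(vi * gi for vi, gi in zip(v, grad))
print(abs(exact - approx) < 1e-6)  # True: the linear form matches for small eta
```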
Gradient Descent

In the approximate greedy objective min_{||v||=1} E_in(w_t) + η v^T ∇E_in(w_t), only the v^T ∇E_in(w_t) term depends on v, so the optimal v points in the direction opposite to the gradient:

v = −∇E_in(w_t) / ||∇E_in(w_t)||.

Gradient descent: for small η,

w_{t+1} ← w_t − η ∇E_in(w_t) / ||∇E_in(w_t)||.

Gradient descent is a simple and popular optimization tool.

Choice of η

- η too small: progress is too slow :-(
- η too large: the updates are too unstable :-(
- just right: neither too small nor too large
A better idea: use a changing η that is monotonic in ||∇E_in(w_t)||.

Simple Heuristic for Changing η

Substituting η = η′ · ||∇E_in(w_t)|| into the update above cancels the normalization and gives

w_{t+1} ← w_t − η′ ∇E_in(w_t).

Call η′ the fixed learning rate. Fixed learning rate gradient descent:

w_{t+1} ← w_t − η ∇E_in(w_t).
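Putting the pieces together, fixed learning rate gradient descent for logistic regression can be sketched as follows. The toy data, the choice η = 0.1, and the iteration count are our own illustrative assumptions.

```python
import math

def theta(s):
    return 1.0 / (1.0 + math.exp(-s))

def e_in(w, X, Y):
    """E_in(w) = (1/N) sum_n ln(1 + exp(-y_n w^T x_n))."""
    return sum(math.log(1.0 + math.exp(-y * sum(wi * xi for wi, xi in zip(w, x))))
               for x, y in zip(X, Y)) / len(X)

def gradient(w, X, Y):
    """grad E_in(w) = (1/N) sum_n theta(-y_n w^T x_n) * (-y_n x_n)."""
    g = [0.0] * len(w)
    for x, y in zip(X, Y):
        s = sum(wi * xi for wi, xi in zip(w, x))
        factor = theta(-y * s) * (-y)
        for i in range(len(w)):
            g[i] += factor * x[i]
    return [gi / len(X) for gi in g]

# Fixed learning rate gradient descent: w_{t+1} <- w_t - eta * grad E_in(w_t).
X = [[1.0, 2.0, 1.0], [1.0, -1.0, -0.5], [1.0, 0.5, 2.0], [1.0, -2.0, -1.0]]
Y = [+1, -1, +1, -1]
w = [0.0, 0.0, 0.0]
eta = 0.1

before = e_in(w, X, Y)           # exactly ln 2 at w = 0
for t in range(1000):
    g = gradient(w, X, Y)
    w = [wi - eta * gi for wi, gi in zip(w, g)]
after = e_in(w, X, Y)

print(after < before)  # True: E_in decreases on this convex objective
```

Since E_in is convex, a small enough fixed η keeps decreasing the error; in practice the loop stops when ||∇E_in(w_t)|| is near zero or after enough iterations.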