




Tagging with Hidden Markov Models
CMPT 882 Final Project
Chris Demwell
Simon Fraser University

The Tagging Task
- Identification of the part of speech of each word of a corpus
- Supervised: training corpus provided, consisting of correctly tagged text
- Unsupervised: uses only plain text

Hidden Markov Models 1
- Observable states (corpus text) generated by hidden states (tags)
- Generative model

Hidden Markov Models 2
- Model: $\lambda = (A, B, \pi)$
- $A$: state transition probability matrix
  - $a_{i,j}$ = probability of changing from state $i$ to state $j$
- $B$: emission probability matrix
  - $b_{j,k}$ = probability that the word at location $k$ is associated with tag $j$
- $\pi$: initial state probability
  - $\pi_i$ = probability of starting in state $i$

Hidden Markov Models 3
Terms in this presentation
- $N$: number of hidden states in each column (distinct tags)
- $T$: number of columns in the trellis (time ticks)
- $M$: number of symbols (distinct words)
- $O$: the observation (the untagged text)
- $b_j(t)$: the probability of emitting the symbol found at tick $t$, given state $j$
- $\alpha_{t,j}$ and $\beta_{t,j}$: the probability of arriving at state $j$ in time tick $t$, given the observation before and after tick $t$ (respectively)

Hidden Markov Models 4
- $A$ is an $N \times N$ matrix
- $B$ is an $N \times T$ matrix
- $\pi$ is a vector of size $N$
[Figure: states 1 and 2, with transitions $a_{1,1}$, $a_{1,2}$ and emissions $b_{1,1}$, $b_{1,2}$]

Forward Algorithm
- Used for calculating likelihood quickly
- $\alpha_{t,i}$: the probability of arriving at trellis node $(t,i)$ given the observation seen "so far"
- Initialization: $\alpha_{1,i} = \pi_i$
- Induction
[Figure: trellis node $\alpha_{2,2}$ computed from $\alpha_{1,1}$, $\alpha_{1,2}$, $\alpha_{1,3}$]

Backward Algorithm
- Symmetrical to the Forward Algorithm
- Initialization: $\beta_{T,i} = 1$ for all $i$
- Induction
[Figure: trellis node $\beta_{1,2}$ computed from $\beta_{2,1}$, $\beta_{2,2}$, $\beta_{2,3}$]

Baum-Welch Re-estimation
- Calculate two new matrices of intermediate probabilities
- Calculate new $A$, $B$, $\pi$ given these probabilities
- Recalculate $\alpha$ and $\beta$, $p(O \mid \lambda)$
- Repeat until $p(O \mid \lambda)$ doesn't change much

HMM Tagging 1
Training method
- Supervised
  - Relative Frequency
  - Relative Frequency with further Maximum Likelihood training
- Unsupervised
  - Maximum Likelihood training with random start

HMM Tagging 2
- Read corpus, take counts and make translation tables
- Train HMM using BW or compute HMM using RF
- Compute most likely hidden state sequence
- Determine the POS role that each state most likely plays

HMM Tagging: Pitfalls 1
- Monolithic HMM
  - Relatively opaque to debugging strategies
  - Difficult to modularize
  - Significant time/space efficiency concerns
  - Varied techniques for prior implementations
- Numerical stability
  - Very small probabilities likely to underflow
  - Log likelihood
- Text chunking
  - Sentences? Fixed? Stream?

HMM Tagging: Pitfalls 2
- State role identification
  - Lexicon giving p(tag | word) from supervised corpus
  - Unseen words
  - Equally likely tags for multiple states
- Local maxima
  - HMM not guaranteed to converge on correct model
- Initial conditions
  - Random
  - Trained
  - Degenerate

HMM Tagging: Prior Work 1
- Cutting et al.
  - Elaborate reduction of complexity (ambiguity classes)
  - Integration of bias for tuning (lexicon choice, initial FB values)
  - Fixed-size text chunks, model averaging between chunks for final model
  - 500,000 words of Brown corpus: 96% accurate after eight iterations

HMM Tagging: Prior Work 2
- Merialdo
  - Contrasted computed (Relative Frequency) vs trained (BWRE) models
  - Constrained training
    - Keep p(tag | word) constant from bootstrap corpus RF
    - Keep p(tag) constant from bootstrap corpus RF
  - Constraints allow degradation, but more slowly
  - Constraints required extensive calculation

Constraints and HMM Tagging 1
- Elworthy: accuracy of classic trained HMM always decreases after some point
[Figure: from Elworthy, "Does Baum-Welch Re-Estimation Help Taggers?"]

Constraints and HMM Tagging 2
- Tagging: an excellent candidate for a CSP
  - Many degrees of freedom in the naive case
  - Linguistically, only some few tagging solutions are possible
  - HMM, like modern CSP techniques, does not make final choices in order
- Merialdo's t and t-w constraints
  - Expensive, but helpful

Constraints and HMM Tagging 3
- Obvious places to incorporate constraints: updates to $A$, $B$, $\pi$
  - Deny an update to $A$ if the tag at (t+1) should not follow the tag at (t)
  - Deny an update to $B$ if we are confident that the word at (t) should not be associated with the tag at (t)
- Merialdo's t and t-w constraints

Constraints and HMM Tagging 4
- Obvious places to incorporate constraints: Forward-Backward calculations
  - Some tags are linguistically impossible sequentially
  - Deny transition probability

Constraints and HMM Tagging 5
- Where to get constraints?
  - Grammar databases (WordNet)
  - Bootstrap corpus
    - Use relative frequencies of tags to guess rules
    - Use frequencies of words to estimate confidence
- Allow violations?

reMarker: Motivation
- reMarker, an implementation in Java of HMM tagging
- Support for multiple models
- Modular updates for constraint implementation

reMarker: The Reality
- HMM component too time-consuming to debug
- Preliminary rule implementations based on corpus RF
- Using Tapas Kanungo's HMM implementation in C, externally

reMarker: Method
- Penn Treebank Wall Street Journal part-of-speech tagged data
- Corpus handled as a stream of words
  - Restriction of Kanungo's HMM implementation
  - Results in enormous resource requirements
  - Results in degradation of accuracy with increase in training data size

reMarker: Experiment
- Two corpora
  - 200 words of PT WSJ Section 00
  - 5000 words of PT WSJ Section 00
- Three training methods
  - Relative Frequency, computed
  - Supervised, but with BWRE
  - Unsupervised BWRE

reMarker: Results

| Training method            | 200 word corpus | 5000 word corpus |
|----------------------------|-----------------|------------------|
| Relative Frequency         | 100%            | 98.0%            |
| Supervised, BW estimated   | 80.09%          | 50.04%           |
| Unsupervised, BW estimated | 43.69%          | 22.96%           |

Future Work
- Fix the reMarker HMM
  - Allow corpus chunking
- Allow more complicated constraints
- Incorporate tighter constraints
  - Merialdo's t and t-w
  - Possible POS for each word: WordNet
  - Machine-learned rules
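
The Forward Algorithm and the numerical stability pitfall (very small probabilities underflow, hence log likelihood) can be illustrated together. The following is a minimal Python sketch, not the reMarker or Kanungo code. It assumes the common textbook convention in which the first column already includes the emission term, and it indexes `B` by symbol (N x M) rather than by time tick as on the slides. The names `forward_log_likelihood`, `logsumexp`, and `safe_log` are illustrative.

```python
import math


def logsumexp(values):
    """Stable log(sum(exp(v))) for a list of log-probabilities."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def safe_log(p):
    """log(p), mapping a zero probability to -inf instead of raising."""
    return math.log(p) if p > 0.0 else float("-inf")


def forward_log_likelihood(A, B, pi, obs):
    """Return log p(O | lambda) for lambda = (A, B, pi).

    A[i][j]: transition probability from state i to state j (N x N).
    B[j][k]: probability that state j emits symbol k (N x M, an assumption).
    pi[i]:   probability of starting in state i.
    obs:     list of symbol indices, one per time tick.
    """
    n = len(pi)
    # Initialization (assumed convention): alpha_1(i) = pi_i * b_i(o_1).
    alpha = [safe_log(pi[i]) + safe_log(B[i][obs[0]]) for i in range(n)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1}).
    for symbol in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + safe_log(A[i][j]) for i in range(n)])
            + safe_log(B[j][symbol])
            for j in range(n)
        ]
    # Termination: sum over all states in the final column.
    return logsumexp(alpha)
```

Working in log space keeps a corpus of thousands of words from collapsing to a probability of zero, which a plain product of probabilities would do in double precision.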
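
The Relative Frequency (RF) method, "read corpus, take counts", amounts to normalizing counts from a tagged corpus. Below is a hedged sketch under the assumption that the corpus is a list of sentences, each a list of `(word, tag)` pairs. The deck itself handles the corpus as a single stream, so sentence boundaries here are an illustrative choice, and `relative_frequency_hmm` is a hypothetical name.

```python
from collections import Counter, defaultdict


def relative_frequency_hmm(tagged_sentences):
    """Estimate (A, B, pi) as nested dicts of relative frequencies.

    tagged_sentences: iterable of sentences, each a list of (word, tag) pairs.
    """
    start_counts = Counter()             # tag -> times it starts a sentence
    trans_counts = defaultdict(Counter)  # previous tag -> next tag -> count
    emit_counts = defaultdict(Counter)   # tag -> word -> count
    for sentence in tagged_sentences:
        prev_tag = None
        for word, tag in sentence:
            emit_counts[tag][word] += 1
            if prev_tag is None:
                start_counts[tag] += 1
            else:
                trans_counts[prev_tag][tag] += 1
            prev_tag = tag

    def normalize(counter):
        # Turn raw counts into probabilities that sum to 1.
        total = sum(counter.values())
        return {key: count / total for key, count in counter.items()}

    pi = normalize(start_counts)
    A = {tag: normalize(c) for tag, c in trans_counts.items()}
    B = {tag: normalize(c) for tag, c in emit_counts.items()}
    return A, B, pi
```

Because nothing is smoothed, any word or tag transition absent from the training corpus gets probability zero, which is one way the "unseen words" pitfall shows up.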
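
The step "compute most likely hidden state sequence" is commonly done with the Viterbi algorithm, which the deck does not name, so treat that choice as an assumption. The sketch below also shows one possible reading of "deny transition probability": an optional, hypothetical `allowed` mask that skips linguistically impossible tag transitions. As in the usual textbook form, `B` is indexed by symbol.

```python
import math


def safe_log(p):
    """log(p), mapping a zero probability to -inf instead of raising."""
    return math.log(p) if p > 0.0 else float("-inf")


def viterbi(A, B, pi, obs, allowed=None):
    """Return the most likely state sequence for obs as a list of state indices.

    allowed: optional N x N matrix of booleans (hypothetical constraint hook);
             allowed[i][j] == False denies the transition i -> j.
    """
    n = len(pi)
    score = [safe_log(pi[i]) + safe_log(B[i][obs[0]]) for i in range(n)]
    backpointers = []
    for symbol in obs[1:]:
        new_score, pointers = [], []
        for j in range(n):
            best_i, best = 0, float("-inf")
            for i in range(n):
                if allowed is not None and not allowed[i][j]:
                    continue  # constraint: this transition is denied
                candidate = score[i] + safe_log(A[i][j])
                if candidate > best:
                    best_i, best = i, candidate
            new_score.append(best + safe_log(B[j][symbol]))
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state to recover the full path.
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

With a mask, the decoder never follows a denied transition, although it can still return a poor path if the constraints rule out every reasonable one, which is the question the deck raises with "Allow violations?".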