Design and Implementation of an Adaptive Gaussian Mixture Model Based on the Mahalanobis Distance (10,000 characters)
Contents

1 Introduction
1.1 Background and significance
1.2 Basic design
2 The Mahalanobis distance
2.1 Overview
2.2 Geometric interpretation
2.3 Derivation of the formula
2.4 Advantages and disadvantages
3 The EM algorithm
3.1 Maximum likelihood estimation
3.2 The EM algorithm
3.2.1 Derivation
3.2.2 Proof of convergence
3.2.3 Properties of the EM algorithm
3.2.4 Applications of the EM algorithm
4 Gaussian mixture models
4.1 Unsupervised learning
4.2 Combining the models
Conclusion
References
Appendix

1 Introduction

1.1 Background and significance

Gaussian density functions are parametric models that come in two forms: the single Gaussian model (SGM) and the Gaussian mixture model (GMM). An SGM can only follow small, gradual changes and cannot adapt to complex background states: when samples drawn from several different distributions are mixed, a single Gaussian can neither represent the sample characteristics accurately nor separate the samples clearly into classes. To represent the different statistical regularities present in the data, and hence describe its statistical properties more precisely, the Gaussian mixture model is introduced to make up for this deficiency of the SGM (陳浩宇, 楊靜萱, 2022). Given enough Gaussian components with suitably chosen weights, a Gaussian mixture can fit samples from essentially any distribution and generate nonlinear functions of arbitrary shape; within this framework, repeated optimization iterations cancel out the errors introduced by the hidden variables and yield better parameters. To remove correlations between the data dimensions, the Mahalanobis distance, expressed through parameters of the Gaussian distribution, is used to describe the degree of overlap between components with different statistical probabilities. Unsupervised learning has become a major trend in machine learning; to achieve adaptive selection, this thesis surveys the literature, compares the performance of candidate algorithms, and settles on a Bayesian learning criterion. It proposes an adaptive Gaussian mixture model based on the Mahalanobis distance that determines the optimal number of components and produces the corresponding optimal adaptive GMM (邱天佑, 秦文軒, 2023).

1.2 Basic design

Because the number of mixture components needed to fit the training data is difficult to determine, this thesis proposes an incremental, Mahalanobis-distance-based GMM that adaptively determines an interval for the component count, and then applies a Bayesian optimality criterion over one hundred runs to fix the final adaptive component count, so that after repeated iterative optimization the GMM optimally fits the given data set. Finally, the proposed algorithm is evaluated on both simulated and measured data sets. In outline:

(1) Determine the distances between samples by adaptively classifying the components of the data.
(2) Classify the data according to the Bayesian information criterion (BIC), obtain the relative positions of the different data through the covariance, and determine the classes.
(3) Fit the classified data with a Gaussian mixture model, refined by continued iterative optimization.
(4) Evaluate the performance of the proposed algorithm by comparison on simulated and measured data sets.

2 The Mahalanobis distance

2.1 Overview

The Mahalanobis distance (MD), introduced by the Indian statistician P. C. Mahalanobis, is built on the correlations between variables, through which different patterns can be identified and analyzed. It measures the similarity between an unknown sample and a known sample set: it is the distance between a sample point and a distribution. It is a covariance-weighted distance, computed over the population as a whole (鄧嘉偉, 李秀敏, 2021). In other words, if the same pair of samples is placed into two different populations, the resulting Mahalanobis distances between them will in general differ, unless the two populations have the same covariance matrix. For a random vector x with mean μ = (μ_1, μ_2, …, μ_p)^T and covariance matrix Σ, the Mahalanobis distance of x from the distribution is

D_M(x) = sqrt( (x − μ)^T Σ^{-1} (x − μ) ).

Likewise, for two data points x and y from the same distribution, the Mahalanobis distance between them is

D_M(x, y) = sqrt( (x − y)^T Σ^{-1} (x − y) ),

where Σ is the covariance matrix of the multivariate random variable and μ is the sample mean. If the covariance matrix is the identity matrix — that is, the dimensions are independent and identically distributed — the Mahalanobis distance reduces to the Euclidean distance.

In its data-analysis strategy this thesis combines traditional statistical methods, such as descriptive statistics and regression analysis, with modern data-mining techniques and their algorithms — for instance, cluster analysis to identify the internal structure of the data, or decision trees for trend prediction. These techniques strengthen the understanding of complex phenomena and help uncover latent relationships in large data sets; quantitative and qualitative methods are combined throughout to provide a comprehensive perspective.

2.2 Geometric interpretation

Use principal component analysis (PCA) to rotate the variables so that the dimensions become mutually independent, then standardize so that the dimensions are identically distributed. From PCA, the principal components are the directions of the eigenvectors, and the variance along each direction is the corresponding eigenvalue; rotating onto the eigenvector directions and rescaling by the eigenvalues therefore yields the required transformation (尤智淵, 吳芳菲, 2021). Figure 1 illustrates this for multivariate Gaussian vectors.

[Figure 1. The chi-square distribution (A), and the Mahalanobis distances between the generated data and the mixture components of the fitted GMM (B).]

2.3 Derivation of the formula

Following the analysis above, first rotate the sample points onto the principal components so that the dimensions are linearly uncorrelated. Writing the eigendecomposition of the covariance as Σ = U Λ U^T, with U orthogonal and Λ = diag(λ_1, …, λ_p), the rotated coordinates are z = U^T (x − μ). After the transformation, the variance of each dimension is its eigenvalue and the dimensions are linearly uncorrelated, so normalizing each coordinate by its standard deviation — the rotation followed by the scaling — turns the Mahalanobis distance into a Euclidean distance. Hence (侯俊傑, 寧曉紅, 2022):

D_M^2(x) = Σ_{i=1}^p z_i^2 / λ_i = (x − μ)^T U Λ^{-1} U^T (x − μ) = (x − μ)^T Σ^{-1} (x − μ).

2.4 Advantages and disadvantages

The Euclidean distance is the distance definition in common use: the straight-line distance between two points in k-dimensional space; in two or three dimensions it is simply the familiar distance between points. In most statistical problems, however, the coordinates fluctuate to different degrees, and the Mahalanobis distance was introduced precisely to deal with this (余睿德, 穆俊馳, 2023). It expresses the degree of difference between two random variables that follow the same distribution with covariance matrix Σ. Because of its dependence on the covariance matrix, however, the computation of the Mahalanobis distance is not always stable.

The results obtained at this stage also guide the subsequent work, in particular by identifying several aspects of the methodology with room for improvement. The earlier experience shows which practices are effective and which should be revised or discarded: data collection, for example, should pay more attention to the breadth and representativeness of the samples, so that they fully reflect the target population, and employing several data-collection techniques for different research questions improves both coverage and credibility.

Because computing the Mahalanobis distance involves inverting a matrix, the covariance matrix must be of full rank, i.e., the data must have as many eigenvalues as the original dimensionality. The total number of samples must exceed the sample dimensionality; otherwise the inverse of the population covariance matrix does not exist. Even when that condition is met, the covariance matrix may still fail to be invertible, in which case the Euclidean distance has to be used instead (邱雨昕, 唐羽澄, 2022). At the same time, the Mahalanobis distance is unaffected by scale and independent of the measurement units of the raw data — the distance between two sample points computed from the deviations of the data from the mean is unchanged — and it eliminates the interference of correlations between variables (許睿羽, 黃澤謙, 2022). For these reasons, this thesis adopts an adaptive Gaussian mixture model based on the Mahalanobis distance.

In the classical Euclidean setting, the distributions are all spherical: the Euclidean distance applies only to the ideal case of straight-line distances in Euclidean space. When several interacting components form ellipsoidal clusters, the Euclidean distance clearly cannot resolve them. The Mahalanobis distance fills this gap: when each variable is normally distributed, it removes the interference of differing scales between components, is free of units, amplifies small but informative variation, and screens out outliers (孫羽航, 周佳慧, 2021). In Figure 2, four points lie at equal Euclidean distance from the center of the observations, yet they do not all belong to the class; the Mahalanobis distance of the blue point is clearly smaller than that of the yellow one, so the blue point is more likely to belong to the class (郭浩, 韓雨萱, 2022).

[Figure 2. Illustration of the Mahalanobis distance.]

3 Automatic model selection

Across unsupervised learning, clustering and dimensionality reduction are the most basic problems. Because the Gaussian mixture model contains covariance matrices, it needs a sufficient amount of data to guarantee accurate parameter estimates; if the covariances are replaced by diagonal matrices, far more Gaussian components are needed to retain adequate discriminative power. Among the dimensionality-reduction approaches for GMMs, local factor analysis has become the mainstream method: it reduces the degrees of freedom of the covariance matrices and improves accuracy (鄧澤洋, 吳彤彤, 2021).

During the data analysis, several statistical methods were used to verify the reliability of the data and search for potential anomalies; by examining the distribution of the data in depth, anomalous points were removed while the core sample information was preserved. In addition, sensitivity experiments were used to measure the effect of parameter fluctuations on the stability and generality of the conclusions.

When choosing the number of components and the local dimensionality in local factor analysis, a commonly used statistical criterion is maximum likelihood learning, but its computation is rather complex. Bayesian Ying-Yang (BYY) learning, proposed in 1994 and continually refined since, has developed into a general learning framework. BYY harmony learning consists of the BYY system together with the basic harmony learning principle; combined with local factor analysis it becomes a regularization method that performs parameter learning and automatic model selection at the same time (何炳福, 周志時, 2022) [2]. In [2], the performance of local factor analysis was compared over maximum likelihood estimation combined with each of AIC, CAIC, and BIC; to avoid local optima caused by initialization, the BYY harmony algorithm was run ten times with the EM algorithm applied correspondingly (許珂茜, 付明哲, 2019), and averages were taken over 100 simulations. The results show that BYY-LFA performs best in both accuracy and computation time.

To validate and refine the theoretical structure, a rich body of data was assembled, covering diverse subjects, periods, and social settings, providing a basis for all-round validation of the framework; analyzing the data with statistical software helps confirm whether the hypotheses of the theory hold and exposes its weaknesses. Follow-up work may incorporate additional factors or larger samples to improve the validity and foresight of the framework.

4 Maximum likelihood estimation

Maximum likelihood estimation (MLE) means, knowing the outcome, comparing probabilities to find the parameter values most likely to have produced that outcome; this requires the notions of prior and posterior probability [3]. We first introduce the simpler Bayesian rule. The classical Bayes formula is (林志博, 何夢琪, 2021):

p(b|a) = p(a|b) p(b) / p(a),

where p(b) is the prior probability, i.e., the distribution of the different classes within the model; p(a|b) is the class-conditional probability; and p(b|a) is the corresponding posterior probability.

In the practice of statistics, parameter-estimation problems typically arise when the likelihood function is so complex in structure that its maximum is hard to compute directly [4]. The EM algorithm deals effectively with these more complicated cases of maximum likelihood estimation and is one of the most common methods for estimating hidden variables. It follows an iterative optimization scheme for finding the optimum and is widely used to find the mixture parameters of the Gaussian components that fit the sample measurement vectors with high likelihood (邱晨曦, 蔣涵瑤, 2019). Each iteration of the EM algorithm involves two steps, which we call the expectation step and the maximization step. The procedure is remarkable partly for the simplicity and generality of the associated theory, and partly for the wide range of examples it covers. When the underlying complete data come from an exponential family whose maximum likelihood estimates are easy to compute, every maximization step of the EM algorithm is likewise easy to compute. The EM algorithm is not restricted to finding the parameters of probability density functions; it can also be used to (朱文靜, 高夢媛, 2020):

(1) detect samples that deviate from a known prior;
(2) find the feature subset that attains the lowest prediction error;
(3) find weighting parameters by weighted least squares.

The composition of the algorithm is shown in Figure 3.

[Figure 3. Composition of the algorithm: the EM algorithm — given an outcome, seek the conditions that make that outcome most likely and take them as the estimate — built from maximum likelihood estimation and Jensen's inequality.]

5 The EM algorithm

Expectation-maximization (EM) is a broadly applicable method for iteratively computing maximum likelihood estimates (MLEs), usable on a wide variety of incomplete-data problems. Maximum likelihood estimation and likelihood-based inference are of central importance in statistical theory and data analysis. Maximum likelihood estimation is a very important general-purpose method — the estimation technique most commonly used within the probabilistic framework — and it is also connected with the Bayesian framework [5]: Bayesian solutions are justified with the help of the likelihood and maximum likelihood estimates, and are related to penalized likelihood estimation [6]. Maximum likelihood estimation is thus a ubiquitous technique, widely used throughout the field of statistics.

In data handling, experience from earlier studies suggests making greater use of emerging technical tools. With the rapid advance of information technology, big-data analysis, machine learning, and similar methods have become an indispensable part of research: they improve the efficiency of large-scale data processing and reveal deep structures and patterns beyond the reach of traditional methods. Subsequent work should therefore explore how to integrate these techniques into the analysis pipeline to strengthen the accuracy and depth of the results.

Classically, the likelihood function is maximized by numerical iteration, for example Newton-Raphson (NR) [7] and its relatives, such as Fisher's scoring method (鄭思遠, 謝雨欣, 2019). Under reasonable assumptions on the sample and a sufficiently accurate initial value, the sequence generated by the NR algorithm enjoys local quadratic convergence, which is its main advantage; analytically and computationally, however, it is quite limited [8]. The EM algorithm provides a more complete alternative and has become the mainstream iterative method for the likelihood function, used to solve all kinds of problems with missing data and incomplete information (張志傑, 陳雨婷, 2022).

5.1 Derivation

Let X = {x_i} (i = 1, 2, …, N) denote the observed data; that is, X is a set of
random variables belonging to an arbitrary sample space, and let θ denote the model parameters. The likelihood of the observations is

L(θ) = Π_{i=1}^N f(x_i | θ),

where f(x|θ) is the pdf of x. The final goal is to find the optimal parameter θ, written θ*, that maximizes L(θ). We introduce hidden variables: for a mixture with g components,

f(x_i | θ) = Σ_{q=1}^g p_q f_q(x_i | θ_q),

where f_q(x_i | θ_q) is the density of component q and p_q is its prior probability. This differs greatly from K-means; the clearest difference between the two is that in K-means each sample is classified with probability 0 or 1 (馮天宇, 張紫怡, 2020), whereas in the EM algorithm the assignment is a probability between 0 and 1. In the special case of Gaussian components, the density function f_q is

f_q(x | μ_q, S_q) = (2π)^{-D/2} ||S_q||^{-1/2} exp( -(1/2) (x − μ_q)^T S_q^{-1} (x − μ_q) ),

where ||S_q|| is the determinant of the matrix and D is the dimension of x.

5.2 Convergence of the EM algorithm

Before the canonical description of the EM algorithm by Dempster et al., researchers who had studied several general convergence questions for the algorithm showed that when the complete data come from a curved exponential family with a compact parameter space, and the Q-function satisfies a certain mild differentiability condition, any EM sequence converges to a stationary point of the likelihood function (not necessarily a maximum) (曾俊傑, 韓璇瑩, 2018) [9].

The initial results of this study agree broadly with earlier computations and with the literature, which speaks for the reliability and validity of the methodology. This consistency both reaffirms the conclusions of earlier work and supplies new evidence for the existing theoretical framework: through a carefully designed study plan, thorough data collection, and strict analysis procedures, the key findings of previous work could be reproduced and then deepened. This raises confidence in the research hypotheses, demonstrates the scientific basis of the chosen methods, and provides grounds for comparative analysis across disciplines toward a more integrated and unified theory.

To find the best parameter θ in maximum likelihood estimation, the EM algorithm is needed for the complicated situation in which only the outcome is known, but not which probability distribution realized it. From the available observations we first make a rough estimate, use that estimate to work out the conditional expectation over the unobserved data, and finally obtain an optimal value θ* from that expectation by maximum likelihood. The optimum produced by a single such estimate is not the final answer: the operation is repeated until convergence, and only the converged result is the value we seek. For the value obtained through repeated optimization to exist, the algorithm must tend to some value; in other words, its result must converge. The proof is as follows (孔鵬飛, 謝茹潔, 2022).

The model parameters are obtained recursively from the maximum likelihood function:

L(θ) = log p(X | θ).

The model contains an unknown hidden variable, say Z; then

L(θ) = log Σ_Z p(X, Z | θ).

The sum here is in effect an expectation over Z. Introducing a distribution g(Z) over the values of Z,

L(θ) = log Σ_Z g(Z) · p(X, Z | θ) / g(Z) ≥ Σ_Z g(Z) log [ p(X, Z | θ) / g(Z) ] = J(Z, θ),

where the inequality follows from Jensen's inequality (see Figure 4) applied to the concave function log, for which f(E[·]) ≥ E[f(·)]. It is then easy to see that L(θ) is bounded below: each iteration of the optimization supplies a lower bound J(Z, θ) and raises it continually.

[Figure 4. Illustration of Jensen's inequality.]

In the convergence derivation a new variable y_i(Z) is introduced. Jensen's inequality holds with equality when p(x_i, Z | θ) / g(Z) is constant in Z, which gives (王文濤, 王雪霏, 2020)

y_i(Z) = p(x_i, Z | θ) / Σ_Z p(x_i, Z | θ) = p(Z | x_i, θ).

5.3 Initial values for the EM algorithm

When the initial values are chosen badly, the convergence of the EM algorithm slows down [9]. There is, moreover, the possibility that when the parameter space is unbounded at its edges, choosing initial values too close to the boundary makes the sequence of estimates generated by the EM algorithm diverge. For applications in which the likelihood equation has multiple roots corresponding to local maxima, the EM algorithm should be applied from a wide selection of initial values when searching over all the local maxima. One variant uses interval-analysis methods [10] to find multiple stationary points of the log-likelihood function within any specified region of the parameter space.

This thesis constructs a new framework model whose information flow and data-analysis techniques both acknowledge and build on earlier work, and advance beyond it. In the design of the information flow in particular, traditional information-management principles are adopted to ensure that every stage from acquisition to analysis is accurate; a strict data-screening mechanism and standardized processing procedures raise the quality of the information while keeping the flow open, transparent, and traceable.

When specifying initial values for a mixture model, the following method can be used (徐星宇, 李若彤, 2020). For independent data under a mixture of g components, the role of the E-step is to update the posterior probabilities of the components; the first E-step can be obtained by specifying τ_j(0) (j = 1, …, n) and executing the first step, which partitions the whole into components. For example, in the case of a mixture of normal components with corresponding covariance matrices, a preliminary partition of the data can be used to supply the two p-variate parameter sets. For high-dimensional
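As a concrete check of the Mahalanobis distance formula in §2.3, the short sketch below (Python with NumPy; the data, covariance, and function name are illustrative choices, not taken from the thesis) reproduces the situation of Figure 2: two points at the same Euclidean distance from the center have different Mahalanobis distances under an elliptical covariance, and the distance reduces to the Euclidean one when Σ is the identity matrix.

```python
import numpy as np

def mahalanobis(x, mu, cov):
    """Mahalanobis distance D_M(x) = sqrt((x - mu)^T cov^{-1} (x - mu))."""
    d = x - mu
    return float(np.sqrt(d @ np.linalg.inv(cov) @ d))

mu = np.zeros(2)
# Elliptical covariance: four times the variance along the first axis.
cov = np.array([[4.0, 0.0],
                [0.0, 1.0]])

a = np.array([2.0, 0.0])  # lies along the long axis of the ellipse
b = np.array([0.0, 2.0])  # lies along the short axis

print(np.linalg.norm(a - mu), np.linalg.norm(b - mu))    # 2.0 2.0 (equal Euclidean)
print(mahalanobis(a, mu, cov), mahalanobis(b, mu, cov))  # 1.0 2.0 (unequal Mahalanobis)
print(mahalanobis(a, mu, np.eye(2)))                     # 2.0 (identity -> Euclidean)
```

Point a is "inside" the spread of the distribution despite its larger coordinate, exactly the blue-versus-yellow situation discussed in §2.4.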
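The Bayes formula of §4 can also be checked with a small numerical example; the priors and class-conditional values below are invented purely for illustration.

```python
# Two classes b1 and b2; priors p(b) and class-conditionals p(a|b) for one observation a.
p_b = {"b1": 0.7, "b2": 0.3}          # prior probabilities p(b)
p_a_given_b = {"b1": 0.2, "b2": 0.9}  # class-conditional probabilities p(a|b)

# Evidence: p(a) = sum_b p(a|b) p(b)
p_a = sum(p_a_given_b[b] * p_b[b] for b in p_b)

# Posterior: p(b|a) = p(a|b) p(b) / p(a)
posterior = {b: p_a_given_b[b] * p_b[b] / p_a for b in p_b}

print(round(p_a, 4))                              # 0.41
print({b: round(v, 4) for b, v in posterior.items()})
```

Even though b1 has the larger prior, the much larger class-conditional probability of b2 makes b2 the more probable class a posteriori.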
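Putting §1.2 and §5.1 together, the sketch below implements plain EM for a one-dimensional g-component Gaussian mixture and selects g by BIC, as a minimal stand-in for the adaptive selection step the thesis proposes (this is a simplified illustration, not the thesis's Mahalanobis-based algorithm; the function names and synthetic data are assumptions of ours). The E-step computes soft responsibilities strictly between 0 and 1 — the contrast with K-means's hard 0/1 assignments noted in §5.1 — and the M-step re-estimates weights, means, and variances from them.

```python
import numpy as np

def em_gmm_1d(x, g, n_iter=200):
    """Plain EM for a g-component 1-D Gaussian mixture.

    Returns (weights, means, variances, final log-likelihood)."""
    w = np.full(g, 1.0 / g)
    mu = np.quantile(x, (np.arange(g) + 0.5) / g)  # spread-out deterministic init
    var = np.full(g, x.var())
    for _ in range(n_iter):
        # E-step: responsibilities y_iq in (0, 1) -- soft, unlike K-means's 0/1.
        dens = w / np.sqrt(2 * np.pi * var) * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the soft counts.
        nq = resp.sum(axis=0)
        w = nq / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nq
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nq
    dens = w / np.sqrt(2 * np.pi * var) * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var)
    ll = np.log(dens.sum(axis=1)).sum()
    return w, mu, var, ll

def bic(ll, g, n):
    # Free parameters: (g - 1) weights + g means + g variances.
    k = 3 * g - 1
    return k * np.log(n) - 2.0 * ll

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-3.0, 1.0, 300), rng.normal(4.0, 1.5, 200)])
scores = {g: bic(em_gmm_1d(x, g)[3], g, len(x)) for g in (1, 2, 3, 4)}
best = min(scores, key=scores.get)
print(best)  # the two-cluster sample should favor g = 2
```

Here BIC plays the role assigned to it in step (2) of §1.2: an extra component is accepted only when its likelihood gain outweighs the k·ln N penalty.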
