Design and Implementation of an Adaptive Gaussian Mixture Model Based on Mahalanobis Distance (10,000 characters)
Contents
1 Introduction
  1.1 Background and Significance
  1.2 Basic Design Architecture
2 Mahalanobis Distance
  2.1 Overview
  2.2 Geometric Interpretation
  2.3 Derivation of the Formula
  2.4 Advantages and Disadvantages
3 The EM Algorithm
  3.1 Maximum Likelihood Estimation
  3.2 The EM Algorithm
    3.2.1 Derivation
    3.2.2 Proof of Convergence
    3.2.3 Properties
    3.2.4 Applications
4 Gaussian Mixture Models
  4.1 Unsupervised Learning
  4.2 Combining the Models
Conclusion
References
Appendix

1 Introduction

1.1 Background and Significance

Gaussian density functions are parametric models and come in two forms: the single Gaussian model (SGM) and the Gaussian mixture model (GMM). An SGM cannot adapt to complex background states, only to small gradual changes, so when samples drawn from several different distributions are mixed together, a single Gaussian can neither represent the sample statistics accurately nor separate the samples into clear classes. To represent the distinct statistical regularities present in such data more precisely, and thus compensate for this weakness of the SGM, the Gaussian mixture model is introduced (陳浩宇,楊靜萱,2022). Given enough Gaussian components with suitably chosen weights, a mixture can fit samples from essentially any distribution and represent nonlinear density shapes of arbitrary form; repeated optimization iterations then correct the errors introduced by the hidden variables and yield better parameter estimates. To remove correlations between data dimensions, the Mahalanobis distance, computed from the parameters of the fitted Gaussian components, is used to describe the overlap between components with different statistical weights. Unsupervised learning has become a major trend in machine learning. To achieve adaptive model selection, this thesis surveys the literature, compares algorithm performance, and adopts a Bayesian learning criterion: an adaptive Gaussian mixture model based on the Mahalanobis distance is proposed to determine the optimal number of components and to generate the optimal adaptive GMM (邱天佑,秦文軒,2023).

1.2 Basic Design Architecture

This work addresses the difficulty of determining the number of mixture components when fitting a GMM to training samples. An incremental, Mahalanobis-distance-based GMM is proposed to adaptively determine a candidate interval for the component count; a Bayesian optimality criterion, evaluated over one hundred runs, then fixes the final adaptive component count, so that the GMM optimally fits the given data set after several optimization iterations. Finally, the proposed algorithm is evaluated on both simulated and measured data sets. In outline:
(1) Adaptively classify the components of the data to determine inter-sample distances.
(2) Classify the data according to the Bayesian information criterion (BIC), using the covariance to obtain the relative positions of different data points and to fix the classes.
(3) Fit the classified data with the best Gaussian mixture model, refined by continued iterative optimization.
(4) Evaluate the performance of the proposed algorithm by comparison on simulated and measured data sets.

2 Mahalanobis Distance

2.1 Overview

The Mahalanobis distance (MD), introduced by the Indian statistician P. C. Mahalanobis, is based on the correlations between variables, through which different patterns can be identified and analyzed. It measures the similarity between an unknown sample set and a known one: it is the distance between a sample point and a distribution. It expresses distance in units of the data's covariance and is computed with respect to the whole population (鄧嘉偉,李秀敏,2021). Consequently, if the same two samples are placed into two different populations, the Mahalanobis distance between them will in general differ, unless the two populations share the same covariance matrix. For a random vector x with mean μ = (μ1, …, μd)^T and covariance matrix Σ, the Mahalanobis distance from x to the distribution is

  D_M(x) = sqrt( (x − μ)^T Σ^{-1} (x − μ) ).

Likewise, for two data points x and y drawn from the same distribution,

  D(x, y) = sqrt( (x − y)^T Σ^{-1} (x − y) ),

where Σ is the covariance matrix of the multivariate random variable and μ is the sample mean. If the covariance matrix is the identity, i.e., the dimensions are independent and identically distributed, the Mahalanobis distance reduces to the Euclidean distance.

In its data-analysis strategy, this thesis uses both traditional statistical methods, such as descriptive statistics and regression analysis, and modern data-mining techniques and algorithms, e.g., cluster analysis to reveal internal data structure and decision trees for trend prediction. These techniques improve the understanding of complex phenomena and help uncover latent relationships in large data sets. Quantitative and qualitative methods are combined to provide a comprehensive research perspective.

2.2 Geometric Interpretation

Principal component analysis (PCA) rotates the variables so that the dimensions become mutually independent; standardization then makes them identically distributed. By PCA, the principal components are the directions of the eigenvectors, and the variance along each direction is the corresponding eigenvalue, so rotating onto the eigenvector directions and rescaling by the eigenvalues yields the desired transformation (尤智淵,吳芳菲,2021).

Figure 1. Chi-square distribution (A), and Mahalanobis distances between generated data and the mixture components of a fitted GMM (B).

2.3 Derivation of the Formula

First rotate the sample points onto the principal components, so that the dimensions are linearly uncorrelated; write the rotated coordinates as z = U^T (x − μ), where the columns of U are the eigenvectors of Σ. After the rotation, the variance of each dimension is the corresponding eigenvalue λ_i. Normalizing each rotated coordinate by its standard deviation turns the Mahalanobis distance into an ordinary Euclidean distance, so the Mahalanobis distance is computed as (侯俊杰,寧曉紅,2022):

  D_M(x)^2 = Σ_i z_i^2 / λ_i
           = (x − μ)^T U Λ^{-1} U^T (x − μ)
           = (x − μ)^T Σ^{-1} (x − μ),   since Σ = U Λ U^T.

2.4 Advantages and Disadvantages

The Euclidean distance is the usual distance definition: the straight-line distance between two points in k-dimensional space; in two and three dimensions it is simply the familiar distance between points. In most statistical problems, however, the coordinates fluctuate to different degrees, and the Mahalanobis distance was proposed precisely to handle this situation (余睿德,穆俊馳,2023). It expresses the degree of difference between two random variables that follow the same distribution with covariance matrix Σ. Because of the influence of the covariance matrix, however, its computation is not always stable.

The intermediate results obtained so far guide the remaining work, and several methodological improvements were identified. Data collection must pay more attention to the breadth and representativeness of the samples, so that the selected samples cover the characteristics of the target population; and, across different research questions, combining several data-collection techniques improves the coverage and credibility of the data.

Since computing the Mahalanobis distance involves inverting the covariance matrix, that matrix must have full rank, i.e., the data must have as many independent features as original dimensions. The total number of samples must exceed the sample dimension; otherwise the inverse of the population covariance matrix does not exist. Even when the sample size exceeds the dimension, the inverse may still fail to exist, in which case the Euclidean distance must be used instead (邱雨昕,唐羽澄,2022). On the other hand, the Mahalanobis distance is unaffected by scale and independent of the measurement units of the original data: the distance computed between two sample points from the raw data equals the one computed after mean-centering, and it also removes the interference of correlations between variables (許睿羽,黃澤謙,2022). For these reasons, this thesis adopts an adaptive Gaussian mixture model based on the Mahalanobis distance.

In the classical Euclidean setting the level sets are spheres, so the Euclidean distance only suits the idealized case of straight-line distance in Euclidean space. When several complex components interfere with one another and the level sets become ellipsoids, the Euclidean distance clearly cannot cope, and the Mahalanobis distance takes over: provided the variables are normally distributed, it removes the influence of units across components, magnifies small but informative variations, and suppresses outliers (孫羽航,周佳慧,2021). In Figure 2, four points have equal Euclidean distance to the observation center, yet they do not all belong to the class: the blue point has a clearly smaller Mahalanobis distance than the yellow one, so it is reasonable to judge that the blue point more likely belongs to the class (郭浩,韓雨萱,2022).

Figure 2. Illustration of the Mahalanobis distance.

3 Automatic Model Selection

In unsupervised learning, clustering and dimensionality reduction are the most fundamental problems. Because the Gaussian mixture model contains covariance matrices, it needs sufficient data to estimate its parameters accurately; if the covariance matrices are restricted to be diagonal, enough Gaussian components are needed to retain high discriminative power. Among dimensionality-reduction methods for Gaussian mixtures, local factor analysis has become the mainstream approach: it reduces the degrees of freedom of the covariance matrices and improves accuracy (鄧澤洋,吳彤彤,2021).

During the data analysis, several statistical methods were used to verify data reliability and detect anomalous values. By examining the distribution of the data, anomalous points were removed precisely while the core sample information was retained; in addition, sensitivity experiments measured how parameter fluctuations affect the stability and generality of the conclusions.

One common statistical criterion for choosing the number of components and the local dimensions in local factor analysis is maximum-likelihood learning, but its computation is complex. The Bayesian Ying-Yang (BYY) framework, proposed in 1994 and refined since, has become a general learning framework. BYY harmony learning consists of the BYY system and the basic harmony-learning principle; combined with local factor analysis, it becomes a regularized method that performs parameter learning and automatic model selection (何炳福,周志時,2022) [2].
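Returning to the distance defined in section 2.1, the contrast with the Euclidean distance in Figure 2 can be made concrete with a minimal NumPy sketch (the example covariance and probe points are illustrative assumptions, not taken from the thesis):

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated 2-D cloud: the level sets are ellipses, not circles,
# so Euclidean and Mahalanobis distances disagree.
true_cov = np.array([[2.0, 1.5],
                     [1.5, 2.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_cov, size=2000)

mu = X.mean(axis=0)
S = np.cov(X, rowvar=False)      # sample covariance; must be full rank
S_inv = np.linalg.inv(S)         # the inverse required in section 2.4

def mahalanobis(x):
    d = x - mu
    return float(np.sqrt(d @ S_inv @ d))

# Two probes at the same Euclidean distance from the mean:
# one along the correlation direction of the data, one across it.
along = np.array([1.5, 1.5])
across = np.array([1.5, -1.5])
d_along, d_across = mahalanobis(along), mahalanobis(across)
# d_along < d_across: the point lying along the data's main axis is
# "closer" to the distribution, exactly the situation of Figure 2.
```

With an identity covariance the same function returns the plain Euclidean distance, matching the special case noted in section 2.1.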
In [2], the performance of local factor analysis was compared over maximum-likelihood estimation combined with each of AIC, CAIC, and BIC. To avoid local optima caused by initialization, the BYY harmony algorithm was run ten times with a corresponding EM run for each (許珂茜,付明哲,2019), and 100 simulations were averaged. The results show that BYY-LFA performs best in both accuracy and computation time.

To validate and refine the theoretical structure, this thesis accumulated a rich body of data, covering diverse research objects across different periods and settings and thereby providing a basis for validating the framework from all sides. Analyzing the data with statistical software helps confirm whether the assumptions of the theory hold and reveals its shortcomings; follow-up work may add further factors or larger samples to improve the validity and foresight of the framework.

4 Maximum Likelihood Estimation

Maximum likelihood estimation (MLE) starts from a known outcome and, by computing and comparing probabilities, finds the parameter values most likely to have produced that outcome; this requires considering both prior and posterior probabilities [3].
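The idea of "finding the parameter most likely to have produced the observed outcome" can be illustrated with the simplest possible case, a Bernoulli sample (a toy example added here, not from the thesis):

```python
import math

# Observed outcome: 7 heads in 10 tosses. Which head-probability p
# makes this outcome most likely?
heads, n = 7, 10

def log_likelihood(p):
    # Log of p^heads * (1-p)^(n-heads), the Bernoulli likelihood.
    return heads * math.log(p) + (n - heads) * math.log(1 - p)

# Scan a grid of candidate parameters and keep the most likely one.
grid = [i / 100 for i in range(1, 100)]
p_hat = max(grid, key=log_likelihood)
# The maximizer is the empirical frequency 7/10, which is also the
# closed-form solution of d/dp log L(p) = 0.
```

The same principle, maximizing the likelihood over the parameters, carries over to the Gaussian mixtures below, where the maximization is no longer available in closed form.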
Here we first introduce the comparatively simple Bayes criterion. The classical Bayes formula is (林志博,何夢琪,2021):

  p(a | b) = p(b | a) p(a) / p(b),

where p(a) is the prior probability, i.e., the distribution of the different classes within the model; p(b | a) is the class-conditional probability; and p(a | b) is the corresponding posterior probability.

In practical statistical applications, when the likelihood function is too complex in structure to maximize directly, parameter-estimation problems arise [4]. The EM algorithm handles these more complicated cases of maximum likelihood estimation effectively and is one of the most common methods for estimating hidden variables. It applies an iterative optimization idea to find optima and is widely used to find the mixture parameters of Gaussian probability density components that fit a set of sample measurement vectors with high likelihood (邱晨曦,蔣涵瑤,2019). Each EM iteration involves two steps, which we call the expectation step and the maximization step. The procedure is remarkable partly for the simplicity and generality of the underlying theory, and partly for the wide range of examples it covers. When the underlying complete data come from an exponential family whose maximum likelihood estimates are easy to compute, each maximization step of the EM algorithm is likewise easy to compute. The EM algorithm is not limited to finding the parameters of probability density functions; it can also be used to (朱文靜,高夢媛,2020):
(1) detect samples that deviate from a known prior;
(2) find the feature subset that minimizes prediction error;
(3) find weighting parameters by weighted least squares.
The composition of the algorithm is shown in Figure 3.

Figure 3. Composition of the algorithm: the EM algorithm, maximum likelihood estimation (given a known outcome, seek the condition that maximizes the probability of that outcome and use it as the estimate), and Jensen's inequality.

5 The EM Algorithm

Expectation-maximization (EM) is a broadly applicable method for iteratively computing maximum likelihood (ML) estimates, usable in all kinds of incomplete-data problems. Maximum likelihood estimation and likelihood-based inference are central to statistical theory and data analysis; MLE is a very important general-purpose method, the most commonly used estimation technique in the probabilistic framework, and it is also connected to the Bayesian framework [5].
"}[5],顯而易見的是貝葉斯解決方案在似然和最大似然估計的幫助下證明得出的,貝葉斯解決方案與懲罰似然估計ADDINCSL_CITATION{"citationItems":[{"id":"ITEM-1","itemData":{"author":[{"dropping-particle":"","family":"Sankhyā","given":"Source","non-dropping-particle":"","parse-names":false,"suffix":""},{"dropping-particle":"","family":"Indian","given":"The","non-dropping-particle":"","parse-names":false,"suffix":""},{"dropping-particle":"","family":"Series","given":"a","non-dropping-particle":"","parse-names":false,"suffix":""},{"dropping-particle":"","family":"Feb","given":"No","non-dropping-particle":"","parse-names":false,"suffix":""},{"dropping-particle":"","family":"Url","given":"Stable","non-dropping-particle":"","parse-names":false,"suffix":""}],"container-title":"OrderAJournalOnTheTheoryOfOrderedSetsAndItsApplications","id":"ITEM-1","issue":"1","issued":{"date-parts":[["2012"]]},"page":"49-66","title":"IndianStatisticalInstituteConsistentEstimationoftheOrderofMixtureModelsAuthor(s):C.KeribinCONSISTENTESTIMATIONOFTHEORDEROF","type":"article-journal","volume":"62"},"uris":["/documents/?uuid=50a925de-9be2-4ef4-a819-e5b860b7be6c"]}],"mendeley":{"formattedCitation":"[6]","plainTextFormattedCitation":"[6]","previouslyFormattedCitation":"[6]"},"properties":{"noteIndex":0},"schema":"/citation-style-language/schema/raw/master/csl-citation.json"}[6],最大似然估計是一種普遍存在的技術(shù),并且在統(tǒng)計學域中得到廣泛使用。在處理數(shù)據(jù)時,以往研究的經(jīng)驗表明應(yīng)加大新興技術(shù)工具的應(yīng)用力度。隨著信息技術(shù)的日新月異,大數(shù)據(jù)分析、機器學習等前沿手段已成為科研不可或缺的部分。這類技術(shù)不僅提高了處理大規(guī)模數(shù)據(jù)的效率,還能揭示傳統(tǒng)方法無法觸及的深層次結(jié)構(gòu)和模式。因此,后續(xù)研究需深入探討如何將這些先進技術(shù)整合進分析流程中,以增強研究成果的準確性和深度。經(jīng)典的似然函數(shù)是用數(shù)值迭代的方法求解,例如牛頓-拉夫森(NR)ADDINCSL_CITATION{"citationItems":[{"id":"ITEM-1","itemData":{"DOI":"10.1109/ICOMSSC45026.2018.8941745","ISBN":"9781538667514","abstract":"Systemidentificationmethodandhybridmodelingmethod,usingamathematicalmodelestablishedbythesametheoreticalmethod,duetotheuncertaintyofthestructure,parameters,andenvironmentofthecontrolledobject,areaffectedbyfactors
suchasthespecificenvironment.Indifferentenvironments,thespecificstructuralparametersofthemathematicalmodelarenotexactlythesame.Therefore,thesystemidentificationmethodhasmorepracticalapplicationvaluethanthetheoreticalmodelingmethod.Inthispaper,aNewton-Raphsonevaluationsystemidentificationmethodisproposed.Theshortcomingsofthepreviousidentificationsystemcanbesatisfactorilyidentified,andsimulationexperimentsandphysicalexperimentscanbecarriedout.Thefeasibilityandeffectivenessoftheproposedidentificationandmodelingmethodareverified.","author":[{"dropping-particle":"","family":"Zhang","given":"Wanjun","non-dropping-particle":"","parse-names":false,"suffix":""},{"dropping-particle":"","family":"Zhang","given":"Feng","non-dropping-particle":"","parse-names":false,"suffix":""},{"dropping-particle":"","family":"Zhang","given":"Jingxuan","non-dropping-particle":"","parse-names":false,"suffix":""},{"dropping-particle":"","family":"Zhang","given":"Jingyi","non-dropping-particle":"","parse-names":false,"suffix":""},{"dropping-particle":"","family":"Zhang","given":"Jingyan","non-dropping-particle":"","parse-names":false,"suffix":""}],"container-title":"Proceedingsof2018InternationalComputers,SignalsandSystemsConference,ICOMSSC2018","id":"ITEM-1","issue":"1","issued":{"date-parts":[["2018"]]},"page":"737-742","publisher":"IEEE","title":"StudyonSystemRecognitionMethodforNewton-RaphsonIterations","type":"article-journal"},"uris":["/documents/?uuid=277a3eaf-b9a1-474c-b8f9-76597eb5fb78"]}],"mendeley":{"formattedCitation":"[7]","plainTextFormattedCitation":"[7]","previouslyFormattedCitation":"[7]"},"properties":{"noteIndex":0},"schema":"/citation-style-language/schema/raw/master/csl-citation.json"}[7]及其系列算法,例如Fisher評分方法(鄭思遠,謝雨欣,2019):從這些反應(yīng)可以推斷出在對樣本做出合理假設(shè)和足夠準確的初始值的情況下,由NR算法生成相應(yīng)的離散序列具有局部二次收斂的特性,這也是該算法的主要優(yōu)勢。但是在分析和計算上,就顯得非常局限ADDINCSL_CITATION{"citationItems":[{"id":"ITEM-1","itemData":{"DOI":"10.1111/j.2517-6161.1977.tb01600.x","abstract":"Abroadlyapplicablealgorithmforc
omputingmaximumlikelihoodestimatesfromincompletedataispresentedatvariouslevelsofgenerality.Theoryshowingthemonotonebehaviourofthelikelihoodandconvergenceofthealgorithmisderived.Manyexamplesaresketched,includingmissingvaluesituations,applicationstogrouped,censoredortruncateddata,finitemixturemodels,variancecomponentestimation,hyperparameterestimation,iterativelyreweightedleastsquaresandfactoranalysis.","author":[{"dropping-particle":"","family":"Dempster","given":"A.P.","non-dropping-particle":"","parse-names":false,"suffix":""},{"dropping-particle":"","family":"Laird","given":"N.M.","non-dropping-particle":"","parse-names":false,"suffix":""},{"dropping-particle":"","family":"Rubin","given":"D.B.","non-dropping-particle":"","parse-names":false,"suffix":""}],"container-title":"JournaloftheRoyalStatisticalSociety:SeriesB(Methodological)","id":"ITEM-1","issue":"1","issued":{"date-parts":[["1977"]]},"page":"1-22","title":"MaximumLikelihoodfromIncompleteDataViatheEMAlgorithm","type":"article-journal","volume":"39"},"uris":["/documents/?uuid=b135b1c9-b1c3-4072-aa15-e323fde49701"]}],"mendeley":{"formattedCitation":"[8]","plainTextFormattedCitation":"[8]","previouslyFormattedCitation":"[8]"},"properties":{"noteIndex":0},"schema":"/citation-style-language/schema/raw/master/csl-citation.json"}[8],此時EM算法提供了一種更加完善的替代方法,成為主流的用于迭代似然函數(shù)的方法,用于解決數(shù)據(jù)丟失和信息不完整的各種問題(張志杰,陳雨婷,2022)。5.1推導我們令X=xi,(i=1,2,…,N)作為觀測數(shù)據(jù),即X是一組隨機變量且屬于任意的樣本空間;θ為g(XL其中fx|θ為x的pdf,最終的目的是找到最優(yōu)參數(shù)θ,記作θ?,讓Lμ,θ我們引入隱變量pa(xi)L式中fqxi|θq的先驗概率。與K-means有很大的不同,兩者之間最明顯的區(qū)別在于在K-means中每個樣本都被以0或1的概率進行分類(馮天宇,張紫怡,2020);在這個大前提下而在EM算法中是在0到1之間的概率值。在特殊情況下,密度函數(shù)fqp式中||Sq||是矩陣的行列式;D是Xp5.2EM算法收斂性在Dempster等人對EM算法進行常規(guī)描述之前,有學者在深入研究了該算法一般性的幾個收斂問題之后提出,當完整數(shù)據(jù)來自具有緊湊參數(shù)空間的彎曲指數(shù)族,并且當Q函數(shù)滿足某個適度的微分條件時,則任何EM序列都收斂到似然函數(shù)的固定點(不一定是最大值)(曾俊杰,韓璇瑩,2018)ADDINCSL_CITATION{"citationItems":[{"id":"ITEM-1","itemData":{"DOI":"10.1109/TSP.2008.917350","ISSN":"1053587X","abstract":"Inthispaper,theexpectatio
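The E-step/M-step alternation of section 5.1, and the monotone ascent of the likelihood discussed in section 5.2, can be sketched for a one-dimensional two-component mixture (a minimal sketch; the data, the initial guess, and the component count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
# Two well-separated 1-D clusters.
x = np.concatenate([rng.normal(-4.0, 1.0, 300), rng.normal(4.0, 1.0, 300)])

def pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Deliberately rough initial guess (cf. section 5.3); EM still recovers
# the true means here.
w, mu, var = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])

lls = []
for _ in range(200):
    # E-step: responsibilities are soft probabilities in (0, 1),
    # unlike the hard 0/1 assignments of K-means (section 5.1).
    dens = w * pdf(x[:, None], mu, var)        # shape (n, 2)
    tot = dens.sum(axis=1)
    resp = dens / tot[:, None]
    lls.append(float(np.log(tot).sum()))       # log-likelihood at current θ

    # M-step: weighted maximum-likelihood updates.
    nk = resp.sum(axis=0)
    w = nk / len(x)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk

    if len(lls) > 1 and lls[-1] - lls[-2] < 1e-10:
        break

mu_est = np.sort(mu)
# mu_est approaches (-4, 4); lls is non-decreasing, as section 5.2 argues.
```

The recorded `lls` sequence is the quantity whose monotonicity the Jensen-inequality argument below guarantees.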
The initial results of this study agree broadly with previous computations and with the literature review, indicating that the methodology is reliable and effective. This consistency reaffirms earlier conclusions and supplies new evidence for the existing theoretical framework. Through a carefully designed study plan, thorough data collection, and strict analysis procedures, the key findings of earlier work could be replicated and deepened, raising confidence in the research hypotheses and demonstrating the scientific basis of the chosen methods; the consistency also provides a basis for comparative analysis across disciplines, toward a more comprehensive and unified theory.

To find the best parameter θ in maximum likelihood estimation, the EM algorithm is used for the complicated case in which only the outcome is known, not which underlying distribution produced it. From the available observations one first makes a rough estimate of the parameters; from that estimate one computes the expectation of the conditional probability of the unobserved data x; finally, maximizing that expectation yields an improved value θ*. A single round of estimation does not give the final answer: the procedure is repeated until it converges, and only the converged result is the value we seek. For the repeated optimization to yield a value at all, the algorithm must tend to some limit, i.e., the iteration must converge. The proof is as follows (孔鵬飛,謝茹潔,2022). The parameters of the model are obtained by recursively maximizing the log-likelihood

  L(θ) = Σ_{i=1}^N log p(x_i | θ).

The model contains an unobserved hidden variable, say Z, so

  L(θ) = Σ_i log Σ_Z p(x_i, Z | θ).

The inner sum is in effect an expectation over Z. Let the responsibilities be y_i(Z) and let Z have distribution g(Z); then

  L(θ) = Σ_i log Σ_Z g(Z) · p(x_i, Z | θ) / g(Z)
       ≥ Σ_i Σ_Z g(Z) log [ p(x_i, Z | θ) / g(Z) ]  =: J(Z, θ),

where the inequality follows from Jensen's inequality (see Figure 4): for a concave function such as log, f(E[·]) ≥ E[f(·)].

Figure 4. Illustration of Jensen's inequality.

It follows easily that L(θ) has the lower bound J(Z, θ); each iteration of the optimization raises this lower bound, and since the likelihood sequence is non-decreasing and bounded, it converges. In the convergence derivation a new variable y_i was introduced; when Jensen's inequality holds with equality, p(x_i, Z | θ)/g(Z) is constant in Z, and then (王文濤,王雪霏,2020)

  y_i(Z) = p(Z | x_i, θ),

i.e., the responsibilities are exactly the posterior probabilities of the hidden labels.

5.3 Initial Values of the EM Algorithm

When the initial values are chosen poorly, the convergence of the EM algorithm slows [9]. Moreover, when the parameter space is unbounded at its edges, choosing initial values too close to the boundary can cause the sequence of estimates generated by EM to diverge. For applications in which the likelihood equation has several roots corresponding to local maxima, EM should be run from a wide selection of initial values in order to search over all local maxima. One variant uses interval-analysis methods [10] to find multiple stationary points of the log-likelihood within any specified region of the parameter space.

This thesis builds a new framework model that, in both its information flow and its data-analysis techniques, acknowledges and inherits previous work while advancing beyond it. In the design of the information flow in particular, traditional information-management principles ensure that every stage from acquisition to analysis is accurate; a strict data-screening mechanism and standardized processing improve data quality while keeping the information flow transparent and traceable.

When specifying initial values for the mixture model, the following method can be used (徐星宇,李若彤,2020). For independent data under a g-component mixture, the role of the E-step is to update the posterior probability of each component, and the first E-step can be carried out by specifying initial responsibilities τ_j^(0) (j = 1, …, n). For example, the data can first be partitioned into groups; in the case of normal components with a corresponding covariance-matrix mixture, this preliminary partition of the data supplies the two p-variate quantities required. For high-dimensional
