




Advanced Artificial Intelligence: Knowledge Discovery
2022/7/15

Overview
A knowledge discovery system built on top of a database draws on statistics, rough sets, fuzzy mathematics, machine learning, expert systems, and other learning methods to distill abstract knowledge from large volumes of data. In doing so it reveals the internal relationships and underlying regularities of the objective world behind the data and makes knowledge acquisition automatic. This is a challenging research topic with broad application prospects.

Outline
- Origins and application areas of KDD
- Definition of KDD
- The steps of KDD
- KDD software
- Conferences and journals in the KDD field

Evolution of Database Technology: from data management to data analysis
- 1960s: Data collection, database creation, IMS and network DBMS.
- 1970s: Relational data model, relational DBMS implementation.
- 1980s: RDBMS, advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.).
- 1990s: Data mining and data warehousing, multimedia databases, and Web technology.

Motivations: "Necessity is the Mother of Invention"
- Data explosion problem: automated data collection tools, mature database technology, and the internet lead to tremendous amounts of data stored in databases, data warehouses, and other information repositories.
- We are drowning in information, but starving for knowledge! (John Naisbitt)
- Data warehousing and data mining:
  - On-line analytical processing
  - Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases.

Important dates of data mining
- 1989: IJCAI Workshop on KDD
  - Knowledge Discovery in Databases (G. Piatetsky-Shapiro and W. Frawley, eds., 1991)
- 1991-1994: Workshops on KDD
  - Advances in Knowledge Discovery and Data Mining (U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, eds., 1996)
- 1995-1998: AAAI Int. Conf. on KDD and DM (KDD95-98)
  - Journal of Data Mining and Knowledge Discovery (1997)
- 1998: ACM SIGKDD
- 1999: SIGKDD99 Conf.

Knowledge Discovery in Databases
- The term appeared in 1989.
- Fayyad's definition (1996): KDD is "the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data".

The virtuous cycle
[Figure: cycle diagram with the stages "Identify Problem or Opportunity", "Act on Knowledge", and "Measure Effect of Action", and the labels Problem, Strategy, Knowledge, Results]

Application Areas and Opportunities
- Marketing: segmentation, customer targeting, ...
- Finance: investment support, portfolio management
- Banking & Insurance: credit and policy approval
- Security: fraud detection
- Science and medicine: hypothesis discovery, prediction, classification, diagnosis
- Manufacturing: process modeling, quality control, resource allocation
- Engineering: simulation and analysis, pattern recognition, signal processing
- Internet: smart search engines, web marketing

The KDD process
[Figure: Data Sources -> Data Consolidation -> Warehouse (Consolidated Data) -> Selection and Preprocessing -> Prepared Data -> Data Mining -> Patterns & Models (e.g. p(x)=0.02) -> Interpretation and Evaluation -> Knowledge]

What is KDD? A process!
- Data mining is a major component of the KDD process: automated discovery of patterns and the development of predictive and explanatory models.

The steps of the KDD process
- Learning the application domain: relevant prior knowledge and goals of application
- Data consolidation: creating a target data set
- Selection and preprocessing
  - Data cleaning (may take 60% of effort!)
  - Data reduction and projection: find useful features, dimensionality/variable reduction, invariant representation
- Choosing functions of data mining: summarization, classification, regression, association, clustering
- Choosing the mining algorithm(s)
- Data mining: search for patterns of interest
- Interpretation and evaluation: analysis of results (visualization, transformation, removing redundant patterns, ...)
- Use of discovered knowledge

Data consolidation and preparation
- Garbage in, garbage out
- The quality of results relates directly to the quality of the data
- 50%-70% of KDD process effort is spent on data consolidation and preparation
- Major justification for a corporate data warehouse

Data consolidation
From data sources to consolidated data repository
[Figure: RDBMS, legacy DBMS, flat files, and external sources pass through "Data Consolidation and Cleansing" into a warehouse (object/relational DBMS, multidimensional DBMS, deductive database, flat files)]

Data consolidation
- Determine preliminary list of attributes
- Consolidate data into working database
  - Internal and external sources
- Eliminate or estimate missing values
- Remove outliers (obvious exceptions)
- Determine prior probabilities of categories and deal with volume bias

Data selection and preprocessing
- Generate a set of examples
  - choose sampling method
  - consider sample complexity
  - deal with volume bias issues
- Reduce attribute dimensionality
  - remove redundant and/or correlating attributes
  - combine attributes (sum, multiply, difference)
- Reduce attribute value ranges
  - group symbolic discrete values
  - quantize continuous numeric values
- Transform data
  - de-correlate and normalize values
  - map time-series data to static representation
- OLAP and visualization tools play key role

Data mining tasks and methods
- Automated exploration/discovery
  - e.g. discovering new market segments
  - clustering analysis
- Prediction/classification
  - e.g. forecasting gross sales given current factors
  - regression, neural networks, genetic algorithms, decision trees
- Explanation/description
  - e.g. characterizing customers by demographics and purchase history
  - decision trees, association rules
[Figure: plots with axes x1, x2 and f(x) against x, plus a sample rule of the form "if age ... 35 and ... $35k then ..."]

Automated exploration and discovery
- Clustering: partitioning a set of data into a set of classes, called clusters, whose members share some interesting common properties.
- Distance-based numerical clustering
  - metric grouping of examples (K-NN)
  - graphical visualization can be used
- Bayesian clustering
  - search for the number of classes that results in the best fit of a probability distribution to the data
  - AutoClass (NASA) is one of the best examples

Prediction and classification
- Learning a predictive model
- Classification of a new case/sample
- Many methods:
  - Artificial neural networks
  - Inductive decision tree and rule systems
  - Genetic algorithms
  - Nearest neighbor clustering algorithms
  - Statistical (parametric and non-parametric)

Generalization and regression
- The objective of learning is to achieve good generalization to new, unseen cases.
- Generalization can be defined as a mathematical interpolation or regression over a set of training points.
- Models can be validated with a previously unseen test set or using cross-validation methods.
[Figure: f(x) plotted against x]

Summarizing: inductive modeling = learning
- Objective: develop a general model or hypothesis from specific examples
- Function approximation (curve fitting)
- Classification (concept learning, pattern recognition)
[Figure: f(x) plotted against x; points of classes A and B over x1 and x2]

Explanation and description
- Learn a generalized hypothesis (model) from selected data
- Description/interpretation of the model provides new knowledge
- Methods:
  - Inductive decision tree and rule systems
  - Association rule systems
  - Link analysis

Exception/deviation detection
- Generate a model of normal activity
- Deviation from the model causes an alert
- Methods:
  - Artificial neural networks
  - Inductive decision tree and rule systems
  - Statistical methods
  - Visualization tools

- Outlier and exception data analysis
- Time-series analysis (trend and deviation)
  - Trend and deviation analysis: regression, sequential pattern, similar sequences, trend and deviation, e.g., stock analysis
  - Similarity-based pattern-directed analysis
  - Full vs. partial periodicity analysis
- Other pattern-directed or statistical analysis

Are all the discovered patterns interesting?
- A data mining system/query may generate thousands of patterns; not all of them are interesting.
- Interestingness measures. A pattern is interesting if it is:
  - easily understood by humans
  - valid on new or test data with some degree of certainty
  - potentially useful
  - novel, or validates some hypothesis that a user seeks to confirm
- Objective vs. subjective interestingness measures
  - Objective: based on statistics and structures of patterns, e.g., support, confidence, etc.
  - Subjective: based on the user's beliefs in the data, e.g., unexpectedness, novelty, etc.

Completeness vs. optimization
- Find all the interesting patterns: completeness.
  - Can a data mining system find all the interesting patterns?
- Search for only interesting patterns: optimization.
  - Can a data mining system find only the interesting patterns?
- Approaches
  - First generate all the patterns and then filter out the uninteresting ones.
  - Generate only the interesting patterns: mining query optimization.

Evaluation
- Statistical validation and significance testing
- Qualitative review by experts in the field
- Pilot surveys to evaluate model accuracy

Interpretation
- In…
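
The objective interestingness measures named earlier (support and confidence) and the "generate all the patterns, then filter" approach can be sketched in a few lines of Python. This is only an illustrative sketch, not a method taken from the slides: the transactions, the function names, and the thresholds are all hypothetical, and only single-item rules are enumerated.

```python
from itertools import combinations

# Hypothetical market-basket transactions, invented for illustration only.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(antecedent | consequent, transactions) / base


def interesting_rules(transactions, min_support, min_confidence):
    """Generate every single-item rule lhs -> rhs, then keep the ones
    that meet both (assumed) thresholds: generate first, filter second."""
    items = sorted(set().union(*transactions))
    kept = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            s = support({lhs, rhs}, transactions)
            c = confidence({lhs}, {rhs}, transactions)
            if s >= min_support and c >= min_confidence:
                kept.append((lhs, rhs, s, c))
    return kept


if __name__ == "__main__":
    for lhs, rhs, s, c in interesting_rules(TRANSACTIONS, 0.5, 0.6):
        print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

Enumerating every candidate rule before filtering is simple but grows quickly with the number of items, which is why the second approach (generating only the interesting patterns) is framed as an optimization problem.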