Data Mining 1: Overview of Business Intelligence and Data Mining

Data Mining: Theory and Applications
Ye Qiang (葉強), School of Management, Harbin Institute of Technology
yeqiang

SAP and Oracle
- SAP: founded in 1972 and headquartered in Walldorf, Germany. It is the world's largest enterprise management software vendor, the largest ERP vendor, and the third-largest independent software vendor. 80% of Fortune Global 500 companies are SAP customers.
- Oracle: founded in 1977 and headquartered in California, USA. It is the world's second-largest independent software vendor and second-largest ERP vendor. In 1998 Oracle announced a major push into the application software market.
What do they have in common? Business Intelligence.

The BI market
An IDC study shows that North American companies are turning to the development and use of business intelligence tools. The BI software market is growing quickly, a bright spot for a sluggish IT industry. According to a Forrester Research survey, at the end of last year 44% of companies said they planned to buy BI software this year. IDC also projects that within five years the BI market will grow from US$5.5 billion today to US$15.7 billion.
Seeing this opportunity, IT providers such as Oracle, Microsoft, IBM, and China's Kingdee are developing and promoting their own BI software. At the same time, commercial and manufacturing companies are adopting BI software to support their decisions.

Graduate research topics
- Customer analysis for Mobil (Mobil Oil Corporation)
- KPMG

Reference books
- Jiawei Han, Micheline Kamber. Data Mining: Concepts and Techniques (reprint edition). Higher Education Press, May 2001 (in English).
- Jiawei Han (韓家煒), Micheline Kamber. Data Mining: Concepts and Techniques (數據挖掘概念與技術). China Machine Press, August 2001 (Chinese translation).
- Margaret H. Dunham. Data Mining Tutorial (數據挖掘教程). Tsinghua University Press, 2003 (in English).
- Shi Zhongzhi (史忠植). Knowledge Discovery (知識發現). Tsinghua University Press, 2002.
- Chen Wenwei (陳文偉), Huang Jincai (黃金才). Data Warehouse and Data Mining (數據倉庫與數據挖掘). Posts & Telecom Press, 2004.
- Olivia Parr Rud. Data Mining in Practice (數據挖掘實踐). China Machine Press, 2003.
- Xing Wenxun (邢文訓). Modern Optimization Methods (現代優化計算方法). Tsinghua University Press, 1999.
- Yan Pingfan (閻平凡), Zhang Changshui (張長水). Artificial Neural Networks and Simulated Evolutionary Computation (人工神經網絡與模擬進化計算). Tsinghua University Press, 2000.
- Liu Yong (劉勇), Kang Lishan (康立山), Chen Yuping (陳毓屏). Non-numerical Parallel Algorithms: Genetic Algorithms (非數值并行算法遺傳算法). Science Press, 2000.

Ranking of MIS Journals
- MISQ (MIS Quarterly)
- ISR (Information Systems Research)
- CACM (Communications of the ACM)
- MS (Management Science)
- JMIS (Journal of Management Information Systems)
- AI (Artificial Intelligence)
- DSI (Decision Sciences)
- HBR (Harvard Business Review)
- IEEETrans (IEEE Transactions)
- AIMag (AI Magazine)
- EJIS (European Journal of Information Systems)
- DSS (Decision Support Systems)

Journals of Data Mining
- Data Mining and Knowledge Discovery (DMKD, since 1997), SCI (2.8), Springer
- Machine Learning, SCI (3.258), monthly, Springer
- IEEE Transactions on Knowledge and Data Engineering (TKDE), SCI (1.243)
- Knowledge and Information Systems (KAIS, since 1999), SCI, Springer
- Many others: DSS, ISR, Data & Knowledge Engineering, ACM Transactions on Information Systems, ACM Transactions on Database Systems, ACM Transactions on Knowledge Discovery from Data (TKDD)

International conferences
- SIGKDD (International Conference on Knowledge Discovery and Data Mining, ACM SIGKDD; August 23-26, 2006, Philadelphia)
- SIGMOD (Special Interest Group on Management of Data, ACM SIGMOD, Association for Computing Machinery; June 2006, Chicago)
- ICIS (International Conference on Information Systems, AIS)
- HICSS (Hawaii International Conference on System Sciences, IEEE)

Online live lecture: Jiawei Han (韓家煒)

1 Overview of Business Intelligence

Meaning of business intelligence
Business Intelligence (BI) refers to intelligent data analysis and processing systems that use computers and computer networks to extract and analyze, from stores of business data, the information an enterprise cares about.

Origin of business intelligence
Late 1990s, United States.

Core technologies of business intelligence
- BI: Business Intelligence
- DW: Data Warehousing
- OLAP: On-Line Analytical Processing
- DM: Data Mining

Architecture of Business Intelligence System (Won Kim, 1998, 2nd Worldwide Computing and its Applications)
[Figure: layered architecture. From top to bottom: Business Intelligence Applications; Data mining engine and OLAP engine; Data Warehouse/Data Mart; Base data sources.]

1) Data warehouse
To meet the data needs of management decision-making, W. H. Inmon first proposed the concept of the data warehouse in 1992. By Inmon's definition, a data warehouse is a subject-oriented, integrated, non-volatile, time-variant collection of data that supports management's decision-making process. It has become an important component of enterprise-level decision systems.

Multidimensional model of the data warehouse
[Figure: data cube with a region dimension, a product dimension (Product 1, Product 2), and a time dimension (Q1, Q2, Q3, Q4).]
- Relational data model (two-dimensional)
- Multidimensional data model:
  - Star schema: a fact table (the central table) with attached dimension tables
  - Snowflake schema
  - Fact constellation schema

Example of Star Schema
- Sales Fact Table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
- time: time_key, day, day_of_the_week, month, quarter, year
- item: item_key, item_name, brand, type, supplier_type
- branch: branch_key, branch_name, branch_type
- location: location_key, street, city, state_or_province, country

Example of Snowflake Schema
- Sales Fact Table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
- time: time_key, day, day_of_the_week, month, quarter, year
- item: item_key, item_name, brand, type, supplier_key
- supplier: supplier_key, supplier_type
- branch: branch_key, branch_name, branch_type
- location: location_key, street, city_key
- city: city_key, city, state_or_province, country

Example of Fact Constellation
- Sales Fact Table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
- Shipping Fact Table: time_key, item_key, shipper_key, from_location, to_location, dollars_cost, units_shipped
- time: time_key, day, day_of_the_week, month, quarter, year
- item: item_key, item_name, brand, type, supplier_type
- branch: branch_key, branch_name, branch_type
- location: location_key, street, city, province_or_state, country
- shipper: shipper_key, shipper_name, location_key, shipper_type

2) OLAP
OLAP emerged in the early 1990s (E. F. Codd, 1993) to meet the need for complex queries and decision analysis over large databases, and to make up for the functional limits of OLTP (On-Line Transaction Processing). OLAP is now an important analysis tool for most data warehouses.

OLAP operations on the multidimensional data model
- Roll-up
- Drill-down
- Slice
- Dice
- Rotate (pivot)
[Figure: operations on the multidimensional data model, shown on the same cube of region, product (Product 1, Product 2), and time (Q1 to Q4) dimensions.]

3) Data mining
Query-driven OLAP can present data to decision makers on request, but it cannot automatically discover useful information hidden in the data, which greatly reduces the value obtained from that data. Data mining emerged in the mid-1990s to discover such hidden information automatically. Data mining is a data analysis technique that grew out of KDD (Knowledge Discovery in Databases) and builds on artificial intelligence. Its main function is to automatically discover potentially useful knowledge in large amounts of data. This knowledge can be expressed as concepts, rules, regularities, patterns, and so on.

2 Overview of Data Mining

What Motivated Data Mining?
Data mining has attracted a great deal of attention in the information industry in recent years, mainly because of the wide availability of huge amounts of data and the imminent need to turn such data into useful information and knowledge.

The diapers-and-beer story
- Wal-Mart, the global retail chain giant
- The Teradata (data warehouse) division, which supplied the data warehouse system

The evolution of database technology
- Data Collection and Database Creation (1960s and earlier)
  - Primitive file processing
- Database Management Systems (1970s-1980s)
  - Hierarchical and network database systems
  - Relational database systems
  - Indexing and data organization techniques: B+ tree, hashing tree
  - Query languages: SQL, etc.
  - OLTP
- Advanced Database Systems (mid-1980s-present)
  - Object-oriented, object-relational
  - Application-oriented: spatial, temporal
- Web-based Database Systems (1990s-present)
  - XML-based database systems
  - Web mining
- Data Warehousing and Data Mining (late 1980s-present)
  - Data warehouse and OLAP technology
  - Data mining and knowledge discovery
- New Generation of Integrated Information Systems (2000-)

What's Data Mining?
Data mining refers to extracting or "mining" knowledge from large amounts of data.
Hand et al. (2000): "Data mining is the process of seeking interesting or valuable information in large data bases."

Data Mining and KDD
- Is data mining a step of KDD?
- Is data mining KDD?

Data Mining: A KDD Process
Data mining is the core of the knowledge discovery process.
[Figure: KDD process. Databases -> Data Cleaning and Data Integration -> Data Warehouse -> Selection -> Task-relevant Data -> Data Mining -> Pattern Evaluation -> Knowledge.]

Data Mining and Business Intelligence
[Figure: pyramid of increasing potential to support business decisions. Layers from top to bottom: Making Decisions; Data Presentation (visualization techniques); Data Mining (information discovery); Data Exploration (OLAP, statistical analysis, querying and reporting); Data Warehouses / Data Marts; Data Sources (files, information providers, database systems, OLTP). Roles from top to bottom: decision maker, business analyst, data analyst, DBA.]

Data mining is a confluence of multiple disciplines
- Database technology
- Statistics
- Machine learning
- Visualization
- Information science
- Other disciplines

Classification of Data Mining Systems
Criteria:
- The kinds of databases mined
- The kinds of knowledge mined
- The kinds of techniques utilized
- The applications adapted

Classification according to the kinds of databases mined
- Data models: relational, object-oriented, object-relational, and data warehouse data mining
- Types of data: spatial, time-series, text, multimedia, and Web data mining

Classification according to the kinds of knowledge mined
- DM for characterization knowledge
- DM for discrimination knowledge
- DM for association knowledge
- DM for classification knowledge
- DM for clustering knowledge
- DM for outlier analysis
- DM for evolution analysis

Classification according to the kinds of techniques utilized
- Machine learning
- Statistics
- Visualization
- Pattern recognition
- Neural networks
- Genetic algorithms

Classification according to the applications adapted
- Finance
- Sales data mining
- Telecommunications
- DNA sequence analysis
- Stock market analysis
- E-mail processing

Data preprocessing
- Data cleaning: missing values, noisy data, inconsistent data
- Data integration and transformation: merging data from multiple data stores and transforming it into forms appropriate for mining
- Data reduction: data cube aggregation, dimensionality reduction, data compression, numerosity reduction
- Discretization and concept hierarchy generation

A concept hierarchy
[Figure: concept hierarchy tree.]
- all
  - computer: laptop, desktop
  - software: application, OS
  - printer: b/w, color
  - computer accessory: wrist pad, mouse
- Vendor level: IBM, HP, Microsoft, Toshiba

Pattern evaluation: interestingness measures
- Simplicity: rule length (conjunctive normal form)
- Certainty: confidence
- Utility: support
- Novelty

Presentation of discovered patterns
- Rules
- Table
- Crosstab
- Pie chart
- Bar chart
- Data cube
- Neural network
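As a rough illustration of the star schema and the OLAP operations listed in section 1, the sketch below joins a small fact table to its dimension tables and then applies roll-up, slice, and dice. The rows, city names, and the helper names `denormalize`, `roll_up`, `slice_`, and `dice` are made up for this example; only the column names come from the slides. A real warehouse would do this in SQL or an OLAP engine.

```python
from collections import defaultdict

# Dimension tables: a tiny, made-up subset of the star schema columns.
time_dim = {
    1: {"quarter": "Q1", "year": 2005},
    2: {"quarter": "Q2", "year": 2005},
}
item_dim = {
    10: {"item_name": "Product 1", "type": "computer"},
    11: {"item_name": "Product 2", "type": "printer"},
}
location_dim = {
    100: {"city": "Harbin", "country": "China"},
    101: {"city": "Chicago", "country": "USA"},
}

# Fact table: foreign keys into each dimension plus one measure.
sales_fact = [
    {"time_key": 1, "item_key": 10, "location_key": 100, "units_sold": 5},
    {"time_key": 1, "item_key": 11, "location_key": 101, "units_sold": 3},
    {"time_key": 2, "item_key": 10, "location_key": 100, "units_sold": 7},
    {"time_key": 2, "item_key": 10, "location_key": 101, "units_sold": 2},
    {"time_key": 2, "item_key": 11, "location_key": 101, "units_sold": 4},
]


def denormalize(fact):
    """Join one fact row with its dimension rows (the star join)."""
    row = {"units_sold": fact["units_sold"]}
    row.update(time_dim[fact["time_key"]])
    row.update(item_dim[fact["item_key"]])
    row.update(location_dim[fact["location_key"]])
    return row


def roll_up(rows, dims):
    """Roll-up: sum units_sold over every dimension not listed in dims."""
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[d] for d in dims)] += r["units_sold"]
    return dict(totals)


def slice_(rows, dim, value):
    """Slice: fix a single dimension to one value."""
    return [r for r in rows if r[dim] == value]


def dice(rows, conditions):
    """Dice: restrict several dimensions to sets of allowed values."""
    return [
        r for r in rows
        if all(r[d] in allowed for d, allowed in conditions.items())
    ]


cube = [denormalize(f) for f in sales_fact]

by_quarter = roll_up(cube, ["quarter"])
q2_by_country = roll_up(slice_(cube, "quarter", "Q2"), ["country"])
diced = dice(cube, {"quarter": {"Q2"}, "country": {"USA"}})

print(by_quarter)     # {('Q1',): 8, ('Q2',): 13}
print(q2_by_country)  # {('China',): 7, ('USA',): 6}
print(len(diced))     # 2
```

Drill-down is the inverse of roll-up: calling `roll_up` with more dimensions, for example `["quarter", "country"]`, returns the finer-grained totals.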
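The preprocessing steps named above (handling missing values, discretization, and generalizing along a concept hierarchy) can be sketched in a few lines. This is a minimal example under stated assumptions: the `ages` values are invented, mean imputation and equal-width binning are only one common choice for each step, and the helper names `equal_width_bins` and `generalize` are hypothetical. The parent links follow the concept hierarchy from the slides.

```python
from statistics import mean

# Made-up attribute values with two missing entries.
ages = [23, None, 35, 41, None, 58]

# Data cleaning: replace missing values with the mean of the known ones.
known = [a for a in ages if a is not None]
fill_value = mean(known)
cleaned = [fill_value if a is None else a for a in ages]


def equal_width_bins(values, k):
    """Discretization: map each value to one of k equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / k
    # The maximum value would land in bin k, so clamp it into bin k - 1.
    return [min(int((v - lo) / width), k - 1) for v in values]


bins = equal_width_bins(cleaned, 3)

# Concept hierarchy from the slides, stored as child -> parent links.
parent = {
    "laptop": "computer",
    "desktop": "computer",
    "application": "software",
    "OS": "software",
    "b/w": "printer",
    "color": "printer",
    "wrist pad": "computer accessory",
    "mouse": "computer accessory",
    "computer": "all",
    "software": "all",
    "printer": "all",
    "computer accessory": "all",
}


def generalize(concept, levels=1):
    """Climb the hierarchy by the given number of levels, stopping at the root."""
    for _ in range(levels):
        concept = parent.get(concept, concept)
    return concept


print(cleaned)                   # [23, 39.25, 35, 41, 39.25, 58]
print(bins)                      # [0, 1, 1, 1, 1, 2]
print(generalize("laptop"))      # computer
print(generalize("laptop", 2))   # all
```

Replacing low-level values with their parents in this way is what lets a mining algorithm find patterns at the "computer" level that are too sparse at the "laptop" level.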
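Support and confidence, the utility and certainty measures named under pattern evaluation, can be computed directly from a list of transactions. The sketch below uses the standard definitions for a rule A => B: support is the fraction of transactions containing both A and B, and confidence is the fraction of transactions containing A that also contain B. The baskets are invented, loosely echoing the diapers-and-beer story, and the function names are hypothetical.

```python
# Made-up market baskets, one set of items per transaction.
transactions = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"diapers", "beer", "bread"},
]


def count_containing(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for basket in baskets if itemset <= basket)


def support(antecedent, consequent, baskets):
    """Support of A => B: fraction of baskets containing both A and B."""
    both = set(antecedent) | set(consequent)
    return count_containing(both, baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Confidence of A => B: of the baskets containing A, the share with B."""
    both = set(antecedent) | set(consequent)
    n_antecedent = count_containing(antecedent, baskets)
    if n_antecedent == 0:
        return 0.0
    return count_containing(both, baskets) / n_antecedent


rule_support = support({"diapers"}, {"beer"}, transactions)
rule_confidence = confidence({"diapers"}, {"beer"}, transactions)

print(rule_support)     # 0.6
print(rule_confidence)  # 0.75
```

A rule is usually reported only when both values exceed user-chosen thresholds. The thresholds themselves are a judgment call and are not specified in the slides.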
