Big Data Collection and Cleaning (1)
Data Collection and Cleaning

Contents
  1 What is big data
  2 Main characteristics of big data
  3 The big data processing flow
  4 The concept of big data collection
  5 Big data collection in practice

1 What is big data

Example: Taobao recommendations
  - Recommendations based on your shopping behavior and preferences.
  - Recommendations based on your recent reading and purchasing behavior.
  - Continual inference of your characteristics from the devices you use.
  - Recommendations that change with the season.

Industry status and outlook
  [Timeline figure: 2014-03, 2015-08, 2016-03, 2017-10, 2018]

In 2019 the Ministry of Human Resources and Social Security planned to publish 15 new occupations, including:
  1. Big data engineering technician
  2. Cloud computing engineering technician
  3. Artificial intelligence engineering technician
  4. Internet of Things engineering technician

Definition: Big data refers to data sets that cannot be acquired, managed, and processed within a given time using traditional, commonly used software techniques and tools.

2 Main characteristics of big data

  Volume: the amount of data is very large and keeps growing.
  Velocity: the speed at which data is created and moved.
  Variety: data comes from many sources, in many types and formats.
  Veracity: the pursuit of high-quality data.
  Value (low value density): as the amount of data grows, the meaningful information it contains does not grow in proportion.

3 The big data processing flow

  1. Data collection: use multiple databases (relational, NoSQL) to store data from different sources.
  2. Data preprocessing: import the collected data from those databases into a large distributed store (currently mainly HDFS or Hive), performing some simple cleaning and preprocessing along the way.
  3. Statistical analysis: group and summarize the data now stored in the distributed store; this covers the analysis needs of ordinary scenarios.
  4. Data mining: analyze the data with various algorithms in order to make predictions and meet more advanced analysis needs.
  5. Data presentation: analyze the processed results or turn them into reports.

4 The concept of big data collection

1. What is data collection
  Data collection is data acquisition. Data sources fall mainly into online data and content data.

2. Traditional data collection versus big data collection
  Traditional data collection: a single source and a fairly small amount of data; a single structure; relational and parallel databases.
  Big data collection: a wide range of sources and a huge amount of data; rich data types; distributed databases.

3. Big data collection methods
  Big data collection applies ETL to the data: by extracting, transforming, and loading it, the potential value of the data can eventually be mined. ETL stands for Extract-Transform-Load.
  Extract: obtain data from the various data sources.
  Transform: convert the source data into target data in the required format.
  Load: load the target data into the data warehouse.

Big data collection systems
  1. Log collection systems (Apache Flume, Scribe)
  2. Web data collection systems (the Scrapy framework, Apache Nutch)
  3. Database collection systems (relational, NoSQL, and other databases)

5 Big data collection in practice

Skills to prepare
  - Python basics
  - Basic operation of the Linux operating system
  - Database basics (working with SQL statements)

Environment to prepare
  - Python
  - JDK (Java environment)
  - A database (MySQL)
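The three ETL steps described in section 4 (extract, transform, load) can be illustrated with a small Python sketch. This is only a minimal illustration under stated assumptions, not the method used in the slides: the sample CSV text, the `orders` table, and the function names `extract`, `transform`, `load`, and `run_etl` are all hypothetical, and an in-memory SQLite database stands in for the data warehouse (the slides name MySQL, which would need a separate driver).

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a file, a log, or an API.
RAW_CSV = """user_id,city,amount
1, Beijing ,12.50
2,shanghai,
3,Shenzhen,7
3,Shenzhen,7
"""


def extract(text):
    """Extract: read rows from the source (here, a CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace, normalize case, drop incomplete and duplicate rows."""
    seen = set()
    cleaned = []
    for row in rows:
        city = row["city"].strip().title()
        amount = row["amount"].strip()
        if not amount:  # skip rows with a missing amount
            continue
        record = (int(row["user_id"]), city, float(amount))
        if record in seen:  # skip exact duplicates
            continue
        seen.add(record)
        cleaned.append(record)
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (user_id INTEGER, city TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(text, conn):
    """Run extract, transform, and load in sequence, then return the stored rows."""
    load(transform(extract(text)), conn)
    query = "SELECT user_id, city, amount FROM orders ORDER BY user_id"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # SQLite stands in for the warehouse
    print(run_etl(RAW_CSV, connection))
    connection.close()
```

With the sample input, the row with a missing amount and the repeated row are dropped during the transform step, so only two cleaned records reach the table. Keeping the three steps as separate functions mirrors the extract, transform, load split and lets each step be swapped out independently, for example by replacing `extract` with a log reader or a web crawler.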

