




版權說明:本文檔由用戶提供并上傳,收益歸屬內容提供方,若內容存在侵權,請進行舉報或認領
文檔簡介
10十二月2022DataWarehousingandOLAPTechnology1數據倉庫和OLAP技術什么是數據倉庫(Whatisadatawarehouse)?多維數據模型(Amulti-dimensionaldatamodel)數據倉庫體系結構(Datawarehousearchitecture)數據倉庫實現(Datawarehouseimplementation)FurtherdevelopmentofdatacubetechnologyFromdatawarehousingtodatamining08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology2數據庫的定義傳統的數據庫技術是以單一的數據資源為中心,同時進行從事務處理,批處理到決策分析的各類處理;數據庫主要是為自動化,精簡工作任務和高速數據采集服務的。它的運行是事務驅動,面向應用的,數據庫的根本任務是完成數據操作,即及時安全地將當前事務所產生的記錄保存下來。08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology3兩種不同的數據處理需求計算機系統中存在著兩類不同的數據處理需求,即:操作型處理(事務處理):主要是對一個或一組記錄的查詢和修改,這時候人們關心的是響應時間、數據的安全性和完整性;分析型處理(信息型處理):用于管理人員的決策分析,如DDS(decisionsupportsystem)、多維分析等。
08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology4為什么要建立數據倉庫?數據DATA知識KNOWLEDGE決定DECISIONSPatternsTrendsFactsRelationsModelsAssociationsSequencesTargetMarketsFundsallocationTradingoptionsWheretoadvertiseCatalogmailinglistSalesgeography財經的Financial經濟的Economic政府Government銷售分數Point-of-Sale人口統計學Demographic生活方式Lifestyle
痛苦:太多數據,無法作出正確判斷!08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology5WhatisDataWarehouse?"數據倉庫是在企業管理和決策中面向主題的,集成的,與時間相關的和不可修改的數據集合“Adatawarehouseisa
subject-oriented,integrated,time-variant,andnonvolatile
collectionofdatainsupportofmanagement’sdecision-makingprocess.”—W.H.InmonDatawarehousing:Theprocessofconstructingandusingdatawarehouses08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology6DataWarehouse—Subject-OrientedOrganizedaroundmajorsubjects,suchascustomer,product,sales.Focusingonthemodelingandanalysisofdatafordecisionmakers,notondailyoperationsortransactionprocessing.Provideasimpleandconciseviewaroundparticularsubjectissuesbyexcludingdatathatarenotusefulinthedecisionsupportprocess.08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology7面向應用舉例采購子系統:訂單(訂單號,供應商號,總金額,日期)訂單細則(訂單號,商品號,類別,單價,數量)供應商(供應商號,供應商名,地址,電話)銷售子系統:顧客(顧客號,姓名,性別,年齡,地址,電話)銷售(員工號,顧客號,商品號,數量,單價日期)庫存管理子系統:領料單(領料單號,領料人,商品號,數量,日期)進料單(進料單號,訂單號,進料人,收料人,日期)庫存(商品號,庫房號,庫存量,日期)庫房(庫房號,倉庫保管員,地點,庫存商品描述)人事管理子系統:員工(員工號,姓名,性別,年齡,部門號)部門(部門號,部門名稱,部門主管,電話)面向主題舉例:商品:商品固有信息:商品號,商品名,類別,顏色等商品采購信息:商品號,供應商號,供應價,供應日期,供應量等商品銷售信息:商品號,顧客號,售價,銷售日期,銷售量等商品庫存信息:商品號,庫房號,日期,庫存量等供應商:供應商固有信息:供應商號,供應商名,地址,電話等供應商品信息:供應商號,商品號,供應價,供應日期,供應量等顧客:顧客固有信息:顧客號,顧客名,性別,年齡,住址,電話等顧客購物信息:顧客號,商品號,售價,購買日期,購買量等08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology8DataWarehouse—IntegratedConstructedbyintegratingmultiple,heterogeneousdatasourcesrelationaldatabases,flatfiles,on-linetransactionrecordsDatacleaninganddataintegrationtechniquesareapplied.Ensureconsistencyinnamingconventions,encodingstructures,attributemeasures,etc.amongdifferentdatasourcesE.g.,Hotelprice:currency,tax,breakfastcovered,etc.Whendataismovedtothewarehouse,itisconverted.08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology9DataWarehouse—TimeVariantThetimehorizonforthedatawarehouseissignificantlylongerthanthatofoperationalsystems.Operationaldatabase:currentvaluedata.Datawarehousedata:provideinformationfromahistoricalperspective(e.g.,past5-10years)EverykeystructureinthedatawarehouseContainsanelementoftime,explicitlyorimplicitlyButthekeyofoperationaldatamayormaynotcontain“timeelement”.08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology10DataWarehouse—Non-VolatileAphysicallyseparatestoreofdatatransformedfromtheoperationalenvironment.Operationalupdateofdatadoesnotoccurinthedatawarehouseenvironment.Doesnotrequiretransactionprocessing,recovery,andconcurrencycontrolmechanismsRequiresonlytwooperationsindataaccessing:initialloadingofdataandaccessofdata.08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology11DataWarehousevs.HeterogeneousDBMSTraditionalheterogeneousDBintegration:Buildwrappers/mediatorsontopofheterogeneousdatabasesQuerydrivenapproachWhenaqueryisposedtoaclientsite,ameta-dictionaryisusedtotranslatethequeryintoqueriesappropriateforindividualheterogeneoussitesinvolved,andtheresultsareintegratedintoaglobalanswersetComplexinformationfiltering,competeforresourcesDatawarehouse:update-driven,highperformanceInformationfromheterogeneoussourcesisintegratedinadvanceandstoredinwarehousesfordirectqueryandanalysis08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology12DataWarehousevs.OperationalDBMSOLTP(on-linetransactionprocessing)MajortaskoftraditionalrelationalDBMSDay-to-dayoperations:purchasing,inventory,banking,manufacturing,payroll,registration,accounting,etc.OLAP(on-lineanalyticalprocessing)MajortaskofdatawarehousesystemDataanalysisanddecisionmakingDistinctfeatures(OLTPvs.OLAP):Userandsystemorientation:customervs.marketDatacontents:current,detailedvs.historical,consolidatedDatabasedesign:ER+applicationvs.star+subjectView:current,localvs.evolutionary,integratedAccesspatterns:updatevs.read-onlybutcomplexqueries08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology13OLTPvs.OLAP08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology14WhySeparateDataWarehouse?HighperformanceforbothsystemsDBMS—tunedforOLTP:accessmethods,indexing,concurrencycontrol,recoveryWarehouse—tunedforOLAP:complexOLAPqueries,multidimensionalview,consolidation.Differentfunctionsanddifferentdata:missingdata:DecisionsupportrequireshistoricaldatawhichoperationalDBsdonottypicallymaintaindataconsolidation:DSrequiresconsolidation(aggregation,summarization)ofdatafromheterogeneoussourcesdataquality:differentsourcestypicallyuseinconsistentdatarepresentations,codesandformatswhichhavetobereconciled08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology15DataWarehousingandOLAPTechnologyWhatisadatawarehouse?Amulti-dimensionaldatamodelDatawarehousearchitectureDatawarehouseimplementationFurtherdevelopmentofdatacubetechnologyFromdatawarehousingtodatamining08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology16FromTablesandSpreadsheetstoDataCubesAdatawarehouseisbasedonamultidimensionaldatamodelwhichviewsdataintheformofadatacubeAdatacube,suchassales,allowsdatatobemodeledandviewedinmultipledimensionsDimensiontables,suchasitem(item_name,brand,type),ortime(day,week,month,quarter,year)Facttablecontainsmeasures(suchasdollars_sold)andkeystoeachoftherelateddimensiontablesIndatawarehousingliterature,ann-Dbasecubeiscalledabasecuboid.Thetopmost0-Dcuboid,whichholdsthehighest-levelofsummarization,iscalledtheapexcuboid.Thelatticeofcuboidsformsadatacube.08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology17Cube:ALatticeofCuboidsalltimeitemlocationsuppliertime,itemtime,locationtime,supplieritem,locationitem,supplierlocation,suppliertime,item,locationtime,item,suppliertime,location,supplieritem,location,suppliertime,item,location,supplier0-D(apex)cuboid1-Dcuboids2-Dcuboids3-Dcuboids4-D(base)cuboid08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology18ConceptualModelingofDataWarehousesModelingdatawarehouses:dimensions&measuresStarschema:AfacttableinthemiddleconnectedtoasetofdimensiontablesSnowflakeschema:Arefinementofstarschemawheresomedimensionalhierarchyisnormalizedintoasetofsmallerdimensiontables,formingashapesimilartosnowflakeFactconstellations:Multiplefacttablessharedimensiontables,viewedasacollectionofstars,thereforecalledgalaxyschemaorfactconstellation
08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology19ExampleofStarSchema
time_keydayday_of_the_weekmonthquarteryeartimelocation_keystreetcityprovince_or_streetcountrylocationSalesFactTable
time_key
item_key
branch_key
location_key
units_sold
dollars_sold
avg_salesMeasuresitem_keyitem_namebrandtypesupplier_typeitembranch_keybranch_namebranch_typebranch08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology20ExampleofSnowflakeSchematime_keydayday_of_the_weekmonthquarteryeartimelocation_keystreetcity_keylocationSalesFactTable
time_key
item_key
branch_key
location_key
units_sold
dollars_sold
avg_salesMeasuresitem_keyitem_namebrandtypesupplier_keyitembranch_keybranch_namebranch_typebranchsupplier_keysupplier_typesuppliercity_keycityprovince_or_streetcountrycity08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology21ExampleofFactConstellationtime_keydayday_of_the_weekmonthquarteryeartimelocation_keystreetcityprovince_or_streetcountrylocationSalesFactTabletime_key
item_key
branch_key
location_key
units_sold
dollars_sold
avg_salesMeasuresitem_keyitem_namebrandtypesupplier_typeitembranch_keybranch_namebranch_typebranchShippingFactTabletime_key
item_key
shipper_key
from_location
to_location
dollars_cost
units_shippedshipper_keyshipper_namelocation_keyshipper_typeshipper08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology22ADataMiningQueryLanguage,DMQL:LanguagePrimitivesCubeDefinition(FactTable)definecube<cube_name>[<dimension_list>]:<measure_list>DimensionDefinition(DimensionTable)definedimension<dimension_name>as(<attribute_or_subdimension_list>)SpecialCase(SharedDimensionTables)Firsttimeas“cubedefinition”definedimension<dimension_name>as<dimension_name_first_time>incube<cube_name_first_time>08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology23DefiningaStarSchemainDMQLdefinecubesales_star[time,item,branch,location]:dollars_sold=sum(sales_in_dollars),avg_sales=avg(sales_in_dollars),units_sold=count(*)definedimensiontimeas(time_key,day,day_of_week,month,quarter,year)definedimensionitemas(item_key,item_name,brand,type,supplier_type)definedimensionbranchas(branch_key,branch_name,branch_type)definedimensionlocationas(location_key,street,city,province_or_state,country)08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology24DefiningaSnowflakeSchemainDMQLdefinecubesales_snowflake[time,item,branch,location]:dollars_sold=sum(sales_in_dollars),avg_sales=avg(sales_in_dollars),units_sold=count(*)definedimensiontimeas(time_key,day,day_of_week,month,quarter,year)definedimensionitemas(item_key,item_name,brand,type,supplier(supplier_key,supplier_type))definedimensionbranchas(branch_key,branch_name,branch_type)definedimensionlocationas(location_key,street,city(city_key,province_or_state,country))08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology25DefiningaFactConstellationinDMQLdefinecubesales[time,item,branch,location]:dollars_sold=sum(sales_in_dollars),avg_sales=avg(sales_in_dollars),units_sold=count(*)definedimensiontimeas(time_key,day,day_of_week,month,quarter,year)definedimensionitemas(item_key,item_name,brand,type,supplier_type)definedimensionbranchas(branch_key,branch_name,branch_type)definedimensionlocationas(location_key,street,city,province_or_state,country)definecubeshipping[time,item,shipper,from_location,to_location]:dollar_cost=sum(cost_in_dollars),unit_shipped=count(*)definedimensiontimeastimeincubesalesdefinedimensionitemasitemincubesalesdefinedimensionshipperas(shipper_key,shipper_name,locationaslocationincubesales,shipper_type)definedimensionfrom_locationaslocationincubesalesdefinedimensionto_locationaslocationincubesales08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology26Measures:ThreeCategoriesdistributive:iftheresultderivedbyapplyingthefunctiontonaggregatevaluesisthesameasthatderivedbyapplyingthefunctiononallthedatawithoutpartitioning.E.g.,count(),sum(),min(),max().algebraic:
ifitcanbecomputedbyanalgebraicfunctionwithMarguments(whereMisaboundedinteger),eachofwhichisobtainedbyapplyingadistributiveaggregatefunction.E.g.,
avg(),min_N(),standard_deviation().holistic:ifthereisnoconstantboundonthestoragesizeneededtodescribeasubaggregate.
E.g.,median(),mode(),rank().08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology27AConceptHierarchy:Dimension(location)allEuropeNorth_AmericaMexicoCanadaSpainGermanyVancouverM.WindL.Chan..................allregionofficecountryTorontoFrankfurtcity08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology28ViewofWarehousesandHierarchiesSpecificationofhierarchiesSchemahierarchyday<{month<quarter;week}<yearSet_groupinghierarchy{1..10}<inexpensive08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology29MultidimensionalDataSalesvolumeasafunctionofproduct,month,andregionProductRegionMonthDimensions:Product,Location,TimeHierarchicalsummarizationpathsIndustryRegionYearCategoryCountryQuarterProductCityMonthWeekOfficeDay08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology30ASampleDataCubeTotalannualsalesofTVinU.S.A.DateProductCountryAll,All,Allsumsum
TVVCRPC1Qtr2Qtr3Qtr4QtrU.S.ACanadaMexicosum08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology31CuboidsCorrespondingtotheCubeallproductdatecountryproduct,dateproduct,countrydate,countryproduct,date,country0-D(apex)cuboid1-Dcuboids2-Dcuboids3-D(base)cuboid08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology32BrowsingaDataCubeVisualizationOLAPcapabilitiesInteractivemanipulation08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology33TypicalOLAPOperationsRollup(drill-up):summarizedatabyclimbinguphierarchyorbydimensionreductionDrilldown(rolldown):reverseofroll-upfromhigherlevelsummarytolowerlevelsummaryordetaileddata,orintroducingnewdimensionsSliceanddice:
projectandselect
Pivot(rotate):
reorientthecube,visualization,3Dtoseriesof2Dplanes.Otheroperationsdrillacross:involving(across)morethanonefacttabledrillthrough:throughthebottomlevelofthecubetoitsback-endrelationaltables(usingSQL)08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology34AStar-NetQueryModel
ShippingMethodAIR-EXPRESSTRUCKORDERCustomerOrdersCONTRACTSCustomerProductPRODUCTGROUPPRODUCTLINEPRODUCTITEMSALESPERSONDISTRICTDIVISIONOrganizationPromotionCITYCOUNTRYREGIONLocationDAILYQTRLYANNUALYTimeEachcircleiscalledafootprint08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology35DataWarehousingandOLAPTechnologyforDataMiningWhatisadatawarehouse?Amulti-dimensionaldatamodelDatawarehousearchitectureDatawarehouseimplementationFurtherdevelopmentofdatacubetechnologyFromdatawarehousingtodatamining08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology36DesignofaDataWarehouse:ABusinessAnalysisFrameworkFourviewsregardingthedesignofadatawarehouseTop-downviewallowsselectionoftherelevantinformationnecessaryforthedatawarehouseDatasourceviewexposestheinformationbeingcaptured,stored,andmanagedbyoperationalsystemsDatawarehouseviewconsistsoffacttablesanddimensiontablesBusinessqueryview
seestheperspectivesofdatainthewarehousefromtheviewofend-user08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology37DataWarehouseDesignProcessTop-down,bottom-upapproachesoracombinationofbothTop-down:Startswithoveralldesignandplanning(mature)Bottom-up:Startswithexperimentsandprototypes(rapid)FromsoftwareengineeringpointofviewWaterfall:structuredandsystematicanalysisateachstepbeforeproceedingtothenextSpiral:rapidgenerationofincreasinglyfunctionalsystems,shortturnaroundtime,quickturnaroundTypicaldatawarehousedesignprocessChooseabusinessprocesstomodel,e.g.,orders,invoices,etc.Choosethegrain(atomiclevelofdata)ofthebusinessprocessChoosethedimensionsthatwillapplytoeachfacttablerecordChoosethemeasurethatwillpopulateeachfacttablerecord08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology38Multi-TieredArchitectureDataWarehouseExtractTransformLoadRefreshOLAPEngineAnalysisQueryReportsDataminingMonitor&IntegratorMetadataDataSourcesFront-EndToolsServeDataMartsOperational
DBsothersourcesDataStorageOLAPServer08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology39SourceDatabasesDataExtraction,Transformation,loadWarehouseAdmin.ToolsExtract,TransformandLoadDataModelingToolCentralMetadataArchitectedDataMartsDataAccessandAnalysisEnd-UserDWToolsCentralDataWarehouseCentralDataWarehouseMid-TierMid-TierDataMartDataMartLocalMetadataLocalMetadataLocalMetadataMetadataExchangeMDBDataCleansingToolRelationalAppl.PackageLegacyExternalRDBMSRDBMS體系結構
[Pieter,1998]08十二月2022DataWarehousingan數據倉庫的焦點問題-數據的獲得、存儲和使用
RelationalPackageLegacyExternalsourceDataCleanToolDataStagingEnterpriseDataWarehouseDatamartDatamartRDBMSROLAPRDBMSEnd-UserToolEnd-UserToolMDBEnd-UserToolEnd-UserTool數據倉庫和集市的加載能力至關重要數據倉庫和集市的查詢輸出能力至關重要數據倉庫的焦點問題-數據的獲得、存儲和使用
RelationETL工具去掉操作型數據庫中的不需要的數據統一轉換數據的名稱和定義計算匯總數據和派生數據估計遺失數據的缺省值調節源數據的定義變化ETL工具去掉操作型數據庫中的不需要的數據10十二月2022DataWarehousingandOLAPTechnology42ThreeDataWarehouseModelsEnterprisewarehousecollectsalloftheinformationaboutsubjectsspanningtheentireorganizationDataMartasubsetofcorporate-widedatathatisofvaluetoaspecificgroupsofusers.Itsscopeisconfinedtospecific,selectedgroups,suchasmarketingdatamartIndependentvs.dependent(directlyfromwarehouse)datamartVirtualwarehouseAsetofviewsoveroperationaldatabasesOnlysomeofthepossiblesummaryviewsmaybematerialized08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology43DataWarehouseDevelopment:ARecommendedApproachDefineahigh-levelcorporatedatamodelDataMartDataMartDistributedDataMartsMulti-TierDataWarehouseEnterpriseDataWarehouseModelrefinementModelrefinement08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology44OLAPServerArchitecturesRelationalOLAP(ROLAP)
Userelationalorextended-relationalDBMStostoreandmanagewarehousedataandOLAPmiddlewaretosupportmissingpiecesIncludeoptimizationofDBMSbackend,implementationofaggregationnavigationlogic,andadditionaltoolsandservicesgreaterscalabilityMultidimensionalOLAP(MOLAP)
Array-basedmultidimensionalstorageengine(sparsematrixtechniques)fastindexingtopre-computedsummarizeddataHybridOLAP(HOLAP)Userflexibility,e.g.,lowlevel:relational,high-level:arraySpecializedSQLserversspecializedsupportforSQLqueriesoverstar/snowflakeschemas08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology45DataWarehousingandOLAPTechnologyforDataMiningWhatisadatawarehouse?Amulti-dimensionaldatamodelDatawarehousearchitectureDatawarehouseimplementationFurtherdevelopmentofdatacubetechnologyFromdatawarehousingtodatamining08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology46EfficientDataCubeComputationDatacubecanbeviewedasalatticeofcuboidsThebottom-mostcuboidisthebasecuboidThetop-mostcuboid(apex)containsonlyonecellHowmanycuboidsinann-dimensionalcubewithLlevels?MaterializationofdatacubeMaterializeevery(cuboid)(fullmaterialization),none(nomaterialization),orsome(partialmaterialization)SelectionofwhichcuboidstomaterializeBasedonsize,sharing,accessfrequency,etc.08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology47CubeOperationCubedefinitionandcomputationinDMQLdefinecubesales[item,city,year]:sum(sales_in_dollars)computecubesalesTransformitintoaSQL-likelanguage(withanewoperatorcubeby,introducedbyGrayetal.’96)SELECTitem,city,year,SUM(amount)FROMSALESCUBEBYitem,city,yearNeedcomputethefollowingGroup-Bys
(date,product,customer),(date,product),(date,customer),(product,customer),(date),(product),(customer)()(item)(city)()(year)(city,item)(city,year)(item,year)(city,item,year)08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology48CubeComputation:ROLAP-BasedMethodEfficientcubecomputationmethodsROLAP-basedcubingalgorithms(Agarwaletal’96)Array-basedcubingalgorithm(Zhaoetal’97)Bottom-upcomputationmethod(Bayer&Ramarkrishnan’99)ROLAP-basedcubingalgorithmsSorting,hashing,andgroupingoperationsareappliedtothedimensionattributesinordertoreorderandclusterrelatedtuplesGroupingisperformedonsomesubaggregatesasa“partialgroupingstep”Aggregatesmaybecomputedfrompreviouslycomputedaggregates,ratherthanfromthebasefacttable08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology49CubeComputation:ROLAP-BasedMethod(2)ThisisnotinthetextbookbutinaresearchpaperHash/sortbasedmethods(Agarwalet.al.
VLDB’96)Smallest-parent:computingacuboidfromthesmallestcubodpreviouslycomputedcuboid.Cache-results:cachingresultsofacuboidfromwhichothercuboidsarecomputedtoreducediskI/OsAmortize-scans:computingasmanyaspossiblecuboidsatthesametimetoamortizediskreadsShare-sorts:sharingsortingcostscrossmultiplecuboidswhensort-basedmethodisusedShare-partitions:sharingthepartitioningcostcrossmultiplecuboidswhenhash-basedalgorithmsareused08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology50Multi-wayArrayAggregationforCubeComputationPartitionarraysintochunks(asmallsubcubewhichfitsinmemory).Compressedsparsearrayaddressing:(chunk_id,offset)Computeaggregatesin“multiway”byvisitingcubecellsintheorderwhichminimizesthe#oftimestovisiteachcell,andreducesmemoryaccessandstoragecost.Whatisthebesttraversingordertodomulti-wayaggregation?AB29303132123459131415166463626148474645a1a0c3c2c1c0b3b2b1b0a2a3CB44285640245236206008十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology51Multi-wayArrayAggregationforCubeComputationAB29303132123459131415166463626148474645a1a0c3c2c1c0b3b2b1b0a2a3C442856402452362060B08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology52Multi-wayArrayAggregationforCubeComputationAB29303132123459131415166463626148474645a1a0c3c2c1c0b3b2b1b0a2a3C442856402452362060B08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology53Multi-WayArrayAggregationforCubeComputation(Cont.)Method:theplanesshouldbesortedandcomputedaccordingtotheirsizeinascendingorder.SeethedetailsofExample2.12(pp.75-78)Idea:keepthesmallestplaneinthemainmemory,fetchandcomputeonlyonechunkatatimeforthelargestplaneLimitationofthemethod:computingwellonlyforasmallnumberofdimensionsIftherearealargenumberofdimensions,“bottom-upcomputation”andicebergcubecomputationmethodscanbeexplored08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology54IndexingOLAPData:BitmapIndexIndexonaparticularcolumnEachvalueinthecolumnhasabitvector:bit-opisfastThelengthofthebitvector:#ofrecordsinthebasetableThei-thbitissetifthei-throwofthebasetablehasthevaluefortheindexedcolumnnotsuitableforhighcardinalitydomainsBasetableIndexonRegionIndexonType08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology55IndexingOLAPData:JoinIndicesJoinindex:JI(R-id,S-id)whereR(R-id,…)S(S-id,…)TraditionalindicesmapthevaluestoalistofrecordidsItmaterializesrelationaljoininJIfileandspeedsuprelationaljoin—arathercostlyoperationIndatawarehouses,joinindexrelatesthevaluesofthedimensionsofastartschematorowsinthefacttable.E.g.facttable:SalesandtwodimensionscityandproductAjoinindexoncitymaintainsforeachdistinctcityalistofR-IDsofthetuplesrecordingtheSalesinthecityJoinindicescanspanmultipledimensions08十二月2022DataWarehousingan10十二月2022DataWarehousingandOLAPTechnology56EfficientProcessingOLAPQueriesDeterminewhichoperationsshouldbeperformedontheavailablecuboids:transformdrill,roll,ocorrespondingSQLand/orOLAPoperations,e.g,dice=selection+projectionDeterminetowhichmaterializedcuboid(s)therelevantoperationsshouldbeapplied.Exploringindexingstructuresandcompressedvs.densearraystructuresinMOLAP08十二月2022DataWarehousingan10
溫馨提示
- 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
- 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯系上傳者。文件的所有權益歸上傳用戶所有。
- 3. 本站RAR壓縮包中若帶圖紙,網頁內容里面會有圖紙預覽,若沒有圖紙預覽就沒有圖紙。
- 4. 未經權益所有人同意不得將文件中的內容挪作商業或盈利用途。
- 5. 人人文庫網僅提供信息存儲空間,僅對用戶上傳內容的表現方式做保護處理,對用戶上傳分享的文檔內容本身不做任何修改或編輯,并不能對任何下載內容負責。
- 6. 下載文件中如有侵權或不適當內容,請與我們聯系,我們立即糾正。
- 7. 本站不保證下載資源的準確性、安全性和完整性, 同時也不承擔用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。
最新文檔
- 醫療機構患者隱私及信息保密協議書范本
- 旗桿采購及戶外照明及廣告發布合同
- 2025合同范本廣告制作委托合同示例
- 2025未簽訂勞動合同證明
- 2025專業版抵押借款合同范本
- 2025年水電站報廢改建工程機電設備更新改造項目招標合同商務條款
- 線練學校高三英語第一次模擬考試卷 (三)
- 基于多傳感信息融合的隧道掌子面炮孔檢測與定位方法研究
- 財務穩健性測試題及答案
- 健康有趣小測試題及答案
- 2023-2024學年廣西貴百河高一下學期5月新高考月考測試數學試卷(含答案)
- 2024年四川省樂山市中考生物試卷附答案
- JBT 14543-2024 無刷穩速直流電動機技術規范(正式版)
- JBT 7041.1-2023 液壓泵 第1部分:葉片泵 (正式版)
- 信息技術智慧樹知到期末考試答案章節答案2024年煙臺職業學院
- 成功求職六步走-知到答案、智慧樹答案
- 第8課《良師相伴 亦師亦友》第1框《良師相伴助力成長》-【中職專用】《心理健康與職業生涯》同步課堂課件
- 第二部 第四章-名著《鋼鐵是怎樣煉成的》閱讀導引+思維導圖+內容概括+原文批注+閱讀訓練
- 2024春期國開電大本科《現代漢語專題》在線形考(任務1至6)試題及答案
- MOOC 跨文化交際-蘇州大學 中國大學慕課答案
- 國開《當代中國政治制度》機考復習題匯總
評論
0/150
提交評論