




How to Select Your Cloud Data Warehouse Platform Strategically
May 2022
Richard Winter, CEO
WinterCorp LLC
The Experts in Analytic Data Management at Scale

Presenter
Richard Winter, CEO and Principal Consultant, WinterCorp
• Career focus in analytic data management at large scale
• Pioneering developer of data warehouse technology and products
• Has performed more than 50 data warehouse evaluations, often including benchmarks, for leading enterprises
• Adviser to industry leaders in analytics, user and vendor
• Skilled at data warehouse strategies, requirements, strategic design, engineering analysis, test design and evaluation
• Experienced consultant and technology educator, widely published
© 2019-2022 WinterCorp LLC, Tyngsboro, MA. All Rights Reserved.

The Data Driven Enterprise of the 2020s
Most Companies are Increasing Use of:
1. Digital, real time business processes
2. Automated decision making (AI/ML)
3. Focus on competing with analytics
All trends accelerated by the increasing speed of business competition

The Data Warehouse Impetus: 1990s
CONSISTENT, ACCURATE REPORTING
[Diagram labels: Source Systems; Orders, Products, Suppliers, Shipments, Inventory, …; Weekly Loads; Data Warehouse (cleansed data, integrated across the enterprise); Report 1, Report 2, …, Report N]

Stages of Data Warehouse Evolution
[Diagram: five stages, grouped under STRATEGIC INTELLIGENCE and OPERATIONAL INTELLIGENCE]
1. REPORTING: WHAT happened?
2. ANALYZING: WHY did it happen?
3. PREDICTING: WHAT WILL happen?
4. OPERATIONALIZING: WHAT IS happening now?
5. ACTIVATING: MAKE it happen!
Workload labels: Batch; Ad Hoc; Analytics; Continuous updates, tactical queries; Event driven
Source: Diagram adapted from Brobst & Rarey, 2003

Data Warehouse Growth
[Chart: estimated data warehouse size in EB (y-axis 1 to 5) for 2010, 2014, 2018 and 2022; annotated "Leading Edge - 200% CAGR"]
Source: WinterCorp Primary Research - surveys and customer reports

Key Dimensions of Data Warehouse Scale

Burj Khalifa, Dubai - Tallest Building in the World
• 2717 feet tall
• 3.3 million square feet interior space
• 57 elevators
• 62 miles of pipes for plumbing (plus 132 miles for the fire emergency system)
• 24k windows
• 62k hp cooling system
• 1100x the size of a single family home
• 165x the size of the average commercial building
Compared with: average single family home (2500 sf); average commercial building (20k sf)

If you only remember one thing today…
SCALE + COMPLEXITY

Data Mart vs Integrated Data Warehouse (IDW)
Star Schema - Data Mart
• Purpose built
• One large table
• Dimensions usually small
• 1-1 or 1-to-many relationships
• Common partitioning keys
• Most joins not so demanding
• User-visible pre-joins can be ok
Tables in the star schema diagram:
Store [keys: PK]: storeid, storename, street, city, state, zip, managerid, storeclass, storesize, salesstrategy, buildingtype, dateopened, dateclosed, numberofregisters, catchmentpopulation, SMSA, censustract
Transaction [keys: PK]: transid, transtype, date, time, productid, storeid, customerid, associateid, basketid, quantity, unitprice, discount, taxcategory, taxamount, extendedprice, paymentmethod, accountid
Product [keys: PK]: productid, name, description, SKU, package, size, color, style, attr1-n
Customer [keys: PK]: customerid, name, street, city, state, zip, phone, maritalstatus, familysize, agegroup, incomegroup, educationallevel, gender, birthyear, attr1-n
Relational Schema - IDW
• Many large tables with multiple join paths
• Many-to-many relationships
• Different partitioning keys
• Joins are more demanding
• User-visible pre-joins are a problem
Tables in the relational schema diagram:
Organization [keys: PK, FK1]: orgID, name, street, city, state, zip, country, phone, fax, email, webaddress, industrycode, numberofemp, businesssize, status, startdate, enddate, orgtype, partyID
Person [keys: PK, FK1, FK2]: personID, name, street, city, state, zip, phone, gender, dob, incomelevel, status, startdate, enddate, householdID, partyID
Household [keys: PK, FK1]: householdID, street, city, state, zip, phone, householdsize, incomelevel, partyID
Party [keys: PK]: partyID, partytype, startdate, enddate
Arrangement [keys: PK]: arrangementID, arrangementtype, name, description, startdate, enddate, notes
Resource [keys: PK, FK1]: resourceID, resourcetype, startdate, enddate, name, description, arrangementID
Location [keys: PK, FK1]: locationID, locationtype, name, street, city, state, zip, partyID
Product [keys: PK, FK2]: productID, productclass, name, description, productstatus, arrangementID
Conditions [keys: PK, FK1]: conditionID, conditiontype, startdate, enddate, amount, arrangementID
Statement [keys: PK, FK1]: statement_id, balance, past_due_amount, pastduedays, eventID
Event [keys: PK, FK1]: eventID, eventtype, eventdate, eventtime, eventamount, accountID
Relationship [keys: PK, FK1, FK2, FK3]: relationshipID, rel_type, accountID, arrangementID, partyID
Account [keys: PK]: accountID, accounttype, startdate, enddate, accountstatus

Illustrative Data Warehouse Uses: Financial Services

The Data Relationships in an Integrated Data Warehouse
• Analysis of actual data relationships in an integrated data warehouse, large US company
• Dots are tables or views
• Lines are data relationships that actually occur in queries
• Diagram illustrates that a combinatorial explosion of joins develops as the business value of the data relationships is exploited

Integrated Data Warehouse
TYPICAL CORE REQUIREMENTS
• Support efficient (big table, big table) joins without co-location or pre-join (too many different joins are required)
• Often larger scale
• Support complex queries along a variety of join paths
• Effective complex query optimization
• Efficient, fast tactical query
• Manage complex, mixed workloads
• Support many concurrent operations
• Support updates throughout the day as well as any nightly batch runs
• High, if not continuous, availability of up-to-date data

Join Performance at Scale
• Challenging problem is when the sets to be matched are large
• Cost and time are many times greater when:
  - Techniques are not scalable
  - Intermediate results spill to disk
• Customers can be misled when the same queries work fine at smaller scale
• Dominates the performance picture in many data warehouses
• All popular data warehouse platforms perform some joins well
• Many perform the larger and more complex joins poorly
• Difference between the best and the worst is often very large

Join Test Database
• TPC-H benchmark database
• Scale Factors: 10, 100, 1000, 10000
• At SF1 database is 1 GB
Table Name | Row Count | Row Size (Bytes) | Estimated Table Size (MB, Uncompressed)
LineItem | 6,000,000 | 121 | 725
Order | 1,500,000 | 110 | 165
Part | 200,000 | 120 | 24
Part-Supplier | 800,000 | 143 | 114
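
The estimated table sizes follow from row count times row size. The sketch below reproduces that arithmetic, assuming the standard TPC-H SF1 cardinalities of 1,500,000 Order rows and 200,000 Part rows, and assuming row counts grow linearly with scale factor. The names `SF1_TABLES` and `table_size_mb` are illustrative and do not come from the presentation. Note that the straight product for LineItem comes to 726 MB, one more than the 725 MB shown on the slide.

```python
# Row counts and row sizes (bytes) at scale factor 1, as listed on the slide.
SF1_TABLES = {
    "LineItem": (6_000_000, 121),
    "Order": (1_500_000, 110),
    "Part": (200_000, 120),
    "Part-Supplier": (800_000, 143),
}


def table_size_mb(rows: int, row_bytes: int, scale_factor: int = 1) -> float:
    """Uncompressed size in MB (10**6 bytes), assuming rows scale linearly with SF."""
    return rows * scale_factor * row_bytes / 1_000_000


for name, (rows, row_bytes) in SF1_TABLES.items():
    print(f"{name}: {table_size_mb(rows, row_bytes):.0f} MB at SF1")
```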

Database Query - Big Join
[Diagram: four-table join; relationship cardinalities shown as N, 4, N, N, 1, 1]
Line Item: 6 million rows, 725 MB
Orders x4: 6 million rows, 660 MB
Part-Supplier: 800K rows, 114 MB
Part: 200K rows, 24 MB
At SF1, resulting join is a table with:
• 24 million rows
• 450 bytes per row
• ≈10.8 GB total size, uncompressed
All sizes increase proportionally with scale factor
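
As a rough check of the join result size, the following sketch multiplies the row count and row width quoted for SF1 and scales the product linearly, as the slide states. The function name `join_result_gb` is illustrative only.

```python
JOIN_ROWS_SF1 = 24_000_000  # rows in the join result at SF1 (from the slide)
JOIN_ROW_BYTES = 450        # bytes per result row (from the slide)


def join_result_gb(scale_factor: int) -> float:
    """Uncompressed join result size in GB (10**9 bytes) at a given scale factor."""
    return JOIN_ROWS_SF1 * scale_factor * JOIN_ROW_BYTES / 1e9


for sf in (1, 10, 100, 1000, 10000):
    print(f"SF{sf}: {join_result_gb(sf):,.1f} GB")
```

At SF1000, where the base tables total about 1 TB, this gives an intermediate result of roughly 10.8 TB.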

Query Performance and Cost vs Data Size
• Run on small virtual data warehouse
• Two servers, each with 8 virtual processors
Data Size | Response Time (Seconds) | On Demand Cost | Compared to Linear Growth
10 GB | 156 | $0.09 | -
100 GB | 1,626 | $0.90 | 104%
1 TB | 30,656 | $17 | 196%
10 TB | 736,072 | $409 | 472%
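
The comparison with linear growth can be recomputed from the response times. This is a minimal sketch, assuming the projection is a straight-line extrapolation from the 10 GB run; `pct_of_linear` is an illustrative name, not something from the presentation. It yields about 104.2%, 196.5% and 471.8%, in line with the 104%, 196% and 472% figures quoted for this test.

```python
# (data size in GB, response time in seconds), taken from the slide
RUNS = [(10, 156), (100, 1626), (1000, 30656), (10000, 736072)]


def pct_of_linear(runs):
    """Map each larger run's data size to its time as a percentage of a
    linear projection from the first (smallest) run."""
    base_gb, base_sec = runs[0]
    result = {}
    for gb, sec in runs[1:]:
        projected_sec = base_sec * gb / base_gb
        result[gb] = 100 * sec / projected_sec
    return result


for gb, pct in pct_of_linear(RUNS).items():
    print(f"{gb} GB: {pct:.1f}% of linear projection")
```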

Query Cost is Worse Than Linear with Data Size
POPULAR CLOUD DATA WAREHOUSE SERVICE
[Chart: query cost ($), actual vs. projected, against data size (GB)]
• $0.90 at 100 GB
• $17 at 1 TB (196% of projected cost); should be $9 at 1 TB
• $409 at 10 TB (472% of projected cost); should be $90 at 10 TB
Source: WinterCorp Cloud Database Lab Testing on GCP

Big Join Experiment: Interpretation
• For the system tested, the cause of the performance problem was that intermediate query results were significantly larger than the SSD per server
• Large intermediate results occur with:
  - Join of multiple big tables: common in the integrated data warehouse
  - Self-join of a large table: common in path analytics, network analytics, IoT
• This specific query pattern occurs in practice, though infrequently
• The same problem occurs when n concurrent queries each need to create intermediate results 1/n as large; this can occur in a wide range of use cases
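
The concurrency point can be illustrated with a deliberately simplified capacity model. Everything in this sketch is an assumption made for illustration: the function `exceeds_local_ssd`, the even spread of intermediate results across servers, and the 500 GB of SSD per server are not taken from the tested system.

```python
def exceeds_local_ssd(intermediate_gb: float, concurrent_queries: int,
                      servers: int, ssd_gb_per_server: float) -> bool:
    """True if the combined intermediate results exceed total local SSD.

    Simplified model (an assumption): intermediate results are spread
    evenly across servers and all queries hold their results at once.
    """
    total_intermediate_gb = intermediate_gb * concurrent_queries
    return total_intermediate_gb > servers * ssd_gb_per_server


# Hypothetical capacity: 2 servers with 500 GB of SSD each.
print(exceeds_local_ssd(10_800, 1, 2, 500))      # one big join producing 10.8 TB
print(exceeds_local_ssd(10_800 / 8, 8, 2, 500))  # eight queries, each 1/8 the size
```

Under this model both cases overflow the same capacity, which is why many moderate queries running together can hit the problem as readily as one very large join.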

Annual Cloud Charges for the Same Data Warehouse Workload with Three Popular Cloud DW Engines
[Chart: annual charges in $ millions (y-axis 3 to 15) for Product A, Product B and Product C]
• Real customer database & workload
• Independent consultant
• Different requirement might favor a different product
No one product is always best!!!
Source: BEZNext White Paper - Which Platform Is Best for Your Cloud Data Warehouse
/#readthepaper

Data Warehousing in the Cloud
THE PROMISE
- Agility
- Elasticity
- Cost
- Scalability
- Workload Management
The sub-text: don't worry, more resources are always available

Data Warehousing in the Cloud
THE REALITY
• Data warehouse engineering challenges are still fundamental
  - Performance
  - Cost
  - Data Availability
  Major risk areas
  Hard to deliver at enterprise scale and complexity
• Resources not free
• Doubling the resources often doesn't double the performance or throughput
• Scale rises fast
Data warehouse products differ greatly in performance and cost, and today we are focusing on BIG differences (usually >3x)

Most Widely Used Cloud Data Warehouse Platforms

Amazon Web Services (AWS): Redshift
  Cloud Availability: AWS
  On Premises Availability: No
  Serverless: No
  Pricing Model: Default: on demand pricing by the second; reserved instance discount (1-3 year term); pay separately for data storage

Cloudera: Cloudera Data Warehouse
  Cloud Availability: AWS, Azure, GCP
  On Premises Availability: Yes, on Hadoop
  Serverless: No
  Pricing Model: Price per hour per server for software; customer pays cloud provider separately for compute and storage

Microsoft: Azure Synapse Analytics
  Cloud Availability: Azure
  On Premises Availability: Microsoft SQL Server (similar but not identical)
  Serverless: No
  Pricing Model: Separate compute and storage. Pay for data stored, with flexible scaling. Pay for the compute level, with option to pause compute entirely. Monthly pricing (with 3 year term)

Google: BigQuery
  Cloud Availability: GCP
  On Premises Availability: No
  Serverless: Yes
  Pricing Model: Default: on demand pricing - pay per query based on data scanned; flat rate pricing for reserved "query slots" (a blended resource unit); pay separately for data storage

Oracle: Autonomous Data Warehouse (ADW)
  Cloud Availability: Oracle Cloud (AWS Not As-a-Service)
  On Premises Availability: Yes, Oracle Cloud@Customer
  Serverless: Two modes: dedicated and shared (~serverless)
  Pricing Model: Unit pricing for "shared" public cloud use for compute and storage; dedicated hardware with license portability available in public cloud and on premise; private cloud on premise available on subscription basis.

Snowflake: Snowflake Data Platform
  Cloud Availability: AWS, Azure, GCP
  On Premises Availability: No
  Serverless: Server T-Shirt Sizes
  Pricing Model: Separation of compute and storage; "Snowflake on Demand" per second pricing; reserved capacity available on request; 3 editions

Teradata: Vantage
  Cloud Availability: AWS, GCP (announced), Microsoft Azure, Teradata Intellicloud
  On Premises Availability: Yes, optionally with consumption pricing
  Serverless: No
  Pricing Model: License portability; consumption pricing - pay only for completed queries (cloud and on prem); customer not charged for overhead operations such as statistics; also offer capacity pricing

© 2020, 2021 WinterCorp LLC, Tyngsboro MA
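
Two of the pricing styles listed above, charging by running time and charging by data scanned, respond very differently to the same query. The sketch below contrasts them. The function names and every rate in it are hypothetical placeholders, not any vendor's published prices; the 1,626 second runtime on two servers is borrowed from the 100 GB test run earlier in the deck.

```python
def time_based_cost(runtime_sec: float, servers: int,
                    rate_per_server_hour: float) -> float:
    """Cost when compute is billed by the second at an hourly rate per server."""
    return runtime_sec / 3600 * servers * rate_per_server_hour


def scan_based_cost(tb_scanned: float, rate_per_tb: float) -> float:
    """Cost when each query is billed by the volume of data it scans."""
    return tb_scanned * rate_per_tb


# Hypothetical rates, for illustration only.
print(f"time-based: ${time_based_cost(1626, servers=2, rate_per_server_hour=1.00):.2f}")
print(f"scan-based: ${scan_based_cost(0.1, rate_per_tb=5.00):.2f}")
```

Under time-based billing a join that slows down worse than linearly costs proportionally more, while under scan-based billing the charge depends only on bytes read, so which model is cheaper depends on the workload.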

The Modern Data Warehouse
Data no longer needs to be in one central store: cloud, on prem, multi-cloud, all of above

The Modern Analytic Data Platform
Data Lake
All types of data, including “unstructured” (large video, scans, etc.)
Manages all types of data for analytic use, including data that is/has:
• large and variable in size
• relatively “unstructured”
• lower value density
• less shared use
• varied levels of curation
• colder
Data Warehouse
Many types of data, most has some structure
Manages many types of data for analytic use that is most:
• intensively used
• widely shared
• highly curated
• integrated
• valuable
Data & requests flow in both directions
Boundaries are unclear and changing
Maybe it will soon be one “data platform”, sometimes positioned as the “data cloud”
Data Lakehouse has some characteristics of both lake and warehouse

Defining Quantified Architectural Requirements
[Diagram labels: Business** Interests/Vision; Current Needs; Industry Trends; Projected Growth**; Projected Requirements (Quantified)]
• Quantified Requirements are:
  - Database macro structure
  - Scale
  - Workload (e.g., query classes and frequencies)
  - User population
  - Service levels
Estimate in ranges, but estimate!!
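
One way to act on "estimate in ranges" is to record each quantified requirement as a low/high pair and project it forward with an assumed growth rate. This is a minimal sketch, not a method from the presentation: the `Range` class, the `project` helper and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """A requirement estimated as a low/high range rather than a single point."""
    low: float
    high: float

    def __post_init__(self):
        if self.low > self.high:
            raise ValueError("low must not exceed high")


def project(current: Range, annual_growth: float, years: int) -> Range:
    """Compound both ends of the range; annual_growth of 1.0 means doubling yearly."""
    factor = (1 + annual_growth) ** years
    return Range(current.low * factor, current.high * factor)


# Hypothetical example: 10 to 20 TB today, doubling every year for 3 years.
print(project(Range(10, 20), annual_growth=1.0, years=3))
```

Carrying the range through the projection keeps the uncertainty visible when candidate platforms are later tested against the target scale.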

Do a Quantified Evaluation, with Measurement
Working from your Estimate of Macro-Requirements
1. Can the candidate platforms perform at your target levels of scale and complexity?
• Not only features, but an engineering assessment
• Require production references with similar macro-structure and scale
2.Criti