




Deployment, Scheduling, and Monitoring of Big Data Systems
Wei Xu (徐葳)

Goals of this lecture
- Why system administration matters
- From bare metal to a big data system
- Maintaining and managing global system state
- Consistency and Chubby / ZooKeeper
- Task scheduling
- Monitoring software and hardware systems

How to listen to this lecture
- Me: researcher and practitioner, a split personality
- The blood and tears of a system administrator (me)

The 200-node cluster I maintain
- Production support: 100s of researchers running "big data" workloads
- Systems research: self-driving big data infrastructure
- Everything managed by …

HPC: the good old days (for sysadmins)
- Rocks Cluster Rolls

John Boyle. Biology must develop its own big-data systems. Nature (World View). July 2013.

Demand 1: customers want flexibility …

Motivation 2: customers demand performance …
- Prof. David Haussler, biologist at UC Santa Cruz: "God Damn I/O!"

We have a variety of applications
- Scientific image processing
- Cryo-EM and protein structure
- Social "big data"
- Social networking
- Online education data
- Lots of dependencies …
- Natural language processing
- * Image courtesy of Prof. Gerard de Melo @ Tsinghua

Resource hungry too …
- C, C++, Java
- Genome analysis

Customer's needs change …
- Protein design

Deployment: from bare metal to a big data system
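Figure source: Juju website

Basic idea: install one machine, then install all the other machines automatically
- Rocks Cluster Rolls
- Head node, compute nodes
- No applicable roll? == Sorry

Network and hardware configuration
- IPMI, DNS, Cisco router, RAID, BMC, VPN, firewall

Solution: customized servers + whole-rack delivery
- Open Data Center Committee (ODCC) whole rack
- OCP whole rack
- * Container-scale delivery and deployment
- Photo from Lintao Zhang

Hardware support
- How do we control bare metal remotely?
- Intelligent Platform Management Interface (IPMI)
  - Implementation: a dedicated BMC chip
  - Functions: reboot the machine, console, voltage, temperature, network connection
- PXE (network boot): BOOTP, TFTP

Operating system and infrastructure
- CentOS, GPU drivers, LDAP, image servers, Cobbler, local repo, storage, OS optimization, network drivers, SSO, OS drivers, security, storage, SR-IOV

Solution: configuration management
- Ubuntu MAAS (Metal as a Service)
- Turn configuration into programs
- Popular configuration management tools
- Configuration management: visualization (figure from Juju website)

Project requirements and deadlines
- Stage 0: choose a project and form a team (this week)
- Stage 1: first meeting with the user; submit the requirements analysis and project plan (November 11)
- Stage 2: meet with the user at least once every two weeks; submit progress reports (November 25, December 9, December 23)
- Stage 3: project presentation (in class, December 30)
- Stage 4: submit the project report (end of week 17)

Course project comments
1. In quite a few groups the background section strays far from the project itself, while the parts relevant to the project are glossed over; it reads like padding.
2. Several groups need to crawl their own data, and the people writing the crawler are the same people building the system; try not to let this delay the later parts.
3. Some groups' technical approach is only a survey of existing techniques and is not yet organized, which may affect later progress.
4. The groups that were assigned to the volunteer projects all did well and deserve encouragement.

Course project: special reminders
- No plagiarism (citing the source does not make it acceptable)
- Reusing some figures is fine, but the source must be credited

Hadoop assignment
- Questions?

Maintaining and managing global system state

Global system state: the challenge
- GFS: master
- MapReduce: master
- Dryad: master
- Question: which node is the master? What if the master node dies?
- Solution? Find someone to decide who is the master
- Problems?

Chubby's solution: a service that provides
- synchronization (leader election, shared environment info)
- reliability
- availability
- easy-to-understand semantics
- performance, throughput, and latency are only secondary

Primary election
- Distributed consensus problem
- Asynchronous communication: loss, delay, reordering

Why is it hard?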
- FLP impossibility result
- A model: the two generals problem

The two generals problem
- Two armies are on opposite sides of a city in the valley
- The two generals should coordinate the attack; each has an initial value (attack or retreat)
- The only communication is through sending messengers, who are prone to being captured or lost in the valley
- No deterministic algorithm for reaching consensus!
- Proof by contradiction

Fischer-Lynch-Paterson (FLP)
- Even if we have reliable message delivery …
- No consensus can be guaranteed in an asynchronous communication system in the presence of any failures.
- Intuition: a "failed" process may just be slow, and can rise from the dead at exactly the wrong time.

Paxos introduction
- Paxos is an asynchronous consensus algorithm.
- The FLP result says no asynchronous consensus algorithm can guarantee both safety and liveness.
- Paxos is guaranteed safe. Consensus is a stable property: once reached it is never violated; the agreed value is not changed.
- Paxos is not guaranteed live. Consensus is reached if "a large enough subnetwork ... is non-faulty for a long enough time." Otherwise Paxos might never terminate.

Paxos: the name

Paxos consensus model

Leslie Lamport
- Turing Award, 2013: "fundamental contributions to the theory and practice of distributed and concurrent systems, notably the invention of concepts such as causality and logical clocks, safety and liveness, replicated state machines, and sequential consistency"
- LaTeX
- Sequential consistency
- Byzantine fault tolerance
- Paxos algorithm
- Photo from Wikipedia

A Paxos round

Replicated state machine
- Maintain replicas by executing operations in exactly the same order
- Requires all replicas to "agree" on the (set and) order of operations
- The point: if one server fails, we can use the other servers, which have exactly the same state
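The Paxos round named on the slides above can be made concrete with a small in-memory simulation. This is a minimal sketch and not code from the lecture: `Acceptor`, `propose`, and the `up` flag are hypothetical names, messages are modelled as direct method calls, and only agreement on a single value is shown (one such agreement would run per log entry).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                 # highest proposal number promised so far
    accepted_n: int = -1               # number of the accepted proposal, if any
    accepted_v: Optional[str] = None   # value of the accepted proposal, if any
    up: bool = True                    # False simulates a crashed or unreachable node

    def prepare(self, n: int) -> Optional[Tuple[int, Optional[str]]]:
        """Phase 1: promise to ignore proposals numbered below n."""
        if not self.up or n <= self.promised:
            return None
        self.promised = n
        return (self.accepted_n, self.accepted_v)

    def accept(self, n: int, v: str) -> bool:
        """Phase 2: accept unless a higher-numbered promise was made."""
        if not self.up or n < self.promised:
            return False
        self.promised = n
        self.accepted_n, self.accepted_v = n, v
        return True


def propose(acceptors: List[Acceptor], n: int, value: str) -> Optional[str]:
    """Run one round; return the chosen value, or None without a majority."""
    majority = len(acceptors) // 2 + 1
    promises = [p for p in (a.prepare(n) for a in acceptors) if p is not None]
    if len(promises) < majority:
        return None
    # Safety rule: adopt the value of the highest-numbered accepted proposal.
    prev_n, prev_v = max(promises, key=lambda p: p[0])
    if prev_n >= 0:
        value = prev_v
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None


acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, n=1, value="PUT(a,b)")
second = propose(acceptors, n=2, value="PUT(a,c)")  # must adopt the chosen value
acceptors[0].up = False
acceptors[1].up = False
third = propose(acceptors, n=3, value="PUT(a,d)")   # no majority reachable
print(first, second, third)
```

The second proposer ends up re-proposing the already chosen value, which is the safety property from the Paxos introduction slide; the third call returns nothing because only a minority is reachable, which illustrates why liveness is not guaranteed.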
Using Paxos
- Three (or five) replicas
- Clients can contact any replica (not just the primary)
- The server appends each client op to a replicated *log* of operations: Put, Get, Update, Delete
- Numbered log entries ("instances", seq)
- Paxos agreement on the content of each log entry
- Note: each instance (log entry) is an entirely separate Paxos agreement with entirely separate proposal numbers

Using Paxos to replicate states
- (Diagram: KV server on top of a Paxos peer library talking to other peers; the log holds instances (log entry #) with client ops such as GET(a), PUT(a,b), ……)

Example 1: Write
- Kvpaxos servers S1, S2, S3
- Client 1 sends PUT(a,b)
- Log entry 3, PUT(a,b) is proposed to each of the three servers
- Log entry 3 becomes PUT(a,b)

Example 2: Read
- Kvpaxos servers S1, S2, S3
- Client 2 sends GET(a)
- Log entry 4, GET(a) is proposed to each of the three servers
- Log: PUT(a,b), GET(a), ……; log entry 4
- Scan up to log entry 4

Consistent during a partition
- Kvpaxos servers S1, S2, S3; clients 1, 2, 3 issue GET, POST, POST
- Partition: one side works, the other side does not work
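The write, read, and partition examples above can be sketched as a key-value replica that replays an agreed log. This is an illustrative sketch, not the lecture's kvpaxos code: `KVReplica`, `apply_up_to`, and `has_majority` are hypothetical names, the log is assumed to have already been agreed entry by entry through Paxos, and entries are indexed from 0 rather than from the slide's entry numbers.

```python
from typing import Dict, List, Optional, Tuple

# One agreed log entry: (operation, key, value); value is None for GET.
Op = Tuple[str, str, Optional[str]]


class KVReplica:
    def __init__(self) -> None:
        self.store: Dict[str, str] = {}
        self.applied = 0  # number of log entries already applied

    def apply_up_to(self, log: List[Op], seq: int) -> Optional[str]:
        """Replay agreed entries in order up to and including index seq.

        Returns the result of the last entry applied by this call.
        """
        result = None
        while self.applied <= seq:
            kind, key, value = log[self.applied]
            if kind == "PUT":
                self.store[key] = value
                result = None
            elif kind == "GET":
                result = self.store.get(key)
            self.applied += 1
        return result


def has_majority(reachable: int, total: int) -> bool:
    """A side of a partition can make progress only with a majority."""
    return reachable >= total // 2 + 1


# Entry 0 is the write from Example 1, entry 1 the read from Example 2.
log: List[Op] = [("PUT", "a", "b"), ("GET", "a", None)]
s1, s2 = KVReplica(), KVReplica()
s1.apply_up_to(log, 0)
read_on_s2 = s2.apply_up_to(log, 1)  # s2 catches up, then answers the read
print(read_on_s2, has_majority(2, 3), has_majority(1, 3))
```

Because every replica applies the same entries in the same order, a read served by any replica sees the earlier write, and only the side of a partition that holds a majority can append new entries.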
Chubby

Design: system structure
- Two main components: server (Chubby cell) and client library
- Figure from the Chubby paper

Design: files, dirs, handles
- FS interface: /ls/cs6464-cell/lab2/test
- Specialized API
- Also via the interface used by GFS

Lock leases
- Session maintained through KeepAlives
- Handles, locks, and cached data remain valid
- Client must acknowledge invalidation messages
- Terminated explicitly, or after lease timeout

Open source alternative: ZooKeeper
- (Diagram: a ZooKeeper service of several servers, one of them the leader, serving many clients)
- All servers store a copy of the data (in memory)
- A leader is elected at startup
- Followers service clients; all updates go through the leader
- Update responses are sent when a majority of servers have persisted the change

Example use of ZooKeeper
- (Well-known address for ZooKeeper)
- Figure copied from [source missing]
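The lease and KeepAlive behaviour listed above is what lets a lock service answer "who is the master?". The sketch below is a toy, single-process model and not the Chubby or ZooKeeper API: `LeaseLock`, `try_acquire`, `keep_alive`, and `master` are hypothetical names, and time is passed in explicitly so the example is deterministic.

```python
from typing import Optional


class LeaseLock:
    """A lock whose ownership lapses unless the holder keeps renewing it."""

    def __init__(self, lease_seconds: float) -> None:
        self.lease_seconds = lease_seconds
        self.holder: Optional[str] = None
        self.expires_at = 0.0

    def _expire(self, now: float) -> None:
        # Lease timeout: the session ends without any explicit release.
        if self.holder is not None and now >= self.expires_at:
            self.holder = None

    def try_acquire(self, client: str, now: float) -> bool:
        """Become master if the lock is free; report whether client holds it."""
        self._expire(now)
        if self.holder is None:
            self.holder = client
            self.expires_at = now + self.lease_seconds
        return self.holder == client

    def keep_alive(self, client: str, now: float) -> bool:
        """Extend the lease; fails if the client has already lost the lock."""
        self._expire(now)
        if self.holder != client:
            return False
        self.expires_at = now + self.lease_seconds
        return True

    def master(self, now: float) -> Optional[str]:
        self._expire(now)
        return self.holder


lock = LeaseLock(lease_seconds=10.0)
a_wins = lock.try_acquire("A", now=0.0)    # A becomes master
b_loses = lock.try_acquire("B", now=1.0)   # lock is held, B stays standby
renewed = lock.keep_alive("A", now=8.0)    # lease now runs until t=18
still_a = lock.master(now=15.0)
b_wins = lock.try_acquire("B", now=20.0)   # A went silent, lease expired
a_late = lock.keep_alive("A", now=21.0)    # A learns it is no longer master
print(a_wins, b_loses, renewed, still_a, b_wins, a_late)
```

In a real deployment the lock service itself is replicated with a consensus protocol, which is why the slides introduce Paxos before Chubby.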
Task scheduling: problems and challenges

Problem: resource sharing in data centers
- Problem: no single framework is optimal for all applications
- Want to run multiple frameworks in a single cluster
  - … to maximize utilization
  - … to share data between frameworks
- (Diagram: Hadoop, Pregel, and MPI on a shared cluster)
- Slide from Lintao Zhang

Solution: resource scheduler
- (Diagram: a resource manager assigning nodes to Hadoop, Pregel, …)
- Slide from Lintao Zhang

What are the "demands"?
- Multiple users
- Jobs and tasks
- Each has different requirements
- Requests come in over time (online)

What are the "resources"?
- CPU, RAM, disk space
- Networking
- Special constraints: location, colocation
- Special hardware

Goals for the scheduler (1)
- What resources are available?
- Resource tracking (who already has what)
- Failure handling

Goals for the scheduler (2)
- Who can get what resource (and when)?
- Fairness
- Improve utilization
- Improve average completion time
- Improve power efficiency
- (Often) conflicting goals

Goals for the scheduler (3)
- How can the user access the resource?
- Naming

Other goals
- Ensure user isolation (containers, VMs)
- Allow users to monitor their services
- A description language / UI for resource specs

Task scheduling example: Borg

Borg
- 10+ years @ Google
- Managing millions of machines

Resources managed by Borg
- ~10,000 (median) servers per cell
- Heterogeneous machines: size, processor type, external IPs, performance
- Special hardware like SSD

Demands
- Jobs and tasks
- Different sizes
- Prod / non-prod
- Online and batch
- Requirement descriptions written in BCL
- Can "update" task requirements
- Rolling updates

Borg architecture
- Source: Borg EuroSys paper

How Borg achieved the goals

Resource tracking
- Through Borglets (local agents on each machine)
- Monitoring + execution
- (Logically) single central Borg master
- Fault tolerant using Chubby (always knows which is the current master)
- Records all jobs in a Paxos store

Borg scheduling policy
- Priority + admission control
- Uses a scoring mechanism
- Minimizes the cost change when placing a job
- vs. "best fit"
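The contrast between a pluggable scoring mechanism and plain "best fit" can be illustrated with a toy placement routine. This is a sketch under stated assumptions, not Borg's scoring function: `Machine`, `place`, `best_fit_cost`, and `spread_cost` are hypothetical, CPU and RAM are added together as if already normalized to comparable units, and the cost functions are stand-ins for whatever cost model a real scheduler uses.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Machine:
    name: str
    free_cpu: float  # free cores (normalized units, an assumption)
    free_ram: float  # free memory (normalized units, an assumption)


def feasible(m: Machine, cpu: float, ram: float) -> bool:
    """Feasibility check: the machine must have room for the task."""
    return m.free_cpu >= cpu and m.free_ram >= ram


def best_fit_cost(m: Machine, cpu: float, ram: float) -> float:
    """Leftover capacity after placement: smaller means a tighter fit."""
    return (m.free_cpu - cpu) + (m.free_ram - ram)


def spread_cost(m: Machine, cpu: float, ram: float) -> float:
    """Opposite preference: favour the machine with the most headroom."""
    return -best_fit_cost(m, cpu, ram)


Cost = Callable[[Machine, float, float], float]


def place(machines: List[Machine], cpu: float, ram: float, cost: Cost) -> Optional[str]:
    """Pick the feasible machine with the lowest cost and reserve resources."""
    candidates = [m for m in machines if feasible(m, cpu, ram)]
    if not candidates:
        return None
    chosen = min(candidates, key=lambda m: cost(m, cpu, ram))
    chosen.free_cpu -= cpu
    chosen.free_ram -= ram
    return chosen.name


def cell() -> List[Machine]:
    return [Machine("m1", 2, 4), Machine("m2", 8, 16), Machine("m3", 1, 1)]


tight = place(cell(), cpu=2, ram=3, cost=best_fit_cost)
spread = place(cell(), cpu=2, ram=3, cost=spread_cost)
rejected = place(cell(), cpu=16, ram=1, cost=best_fit_cost)
print(tight, spread, rejected)
```

Swapping the cost function changes the placement without touching the feasibility check, which is the design point of scoring over a fixed heuristic.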
Naming
- Borg names a process with an IP address + ports
- This allows different jobs to run on a single machine
- Should this be done by the scheduler?

Other things Borg handles
- Package distribution (how to copy the binaries to all machines)
- Autoscaling
- Re-packing tasks
- Containers for performance isolation
- Monitoring UI
- Debugging UI
- Tracing integration (later)
- BCL (Borg Configuration Language)
- Local disk management
- ……

Lessons
- The Borg master should be the kernel of the data center
- Other things can move to separate services
- Should simplify naming and addressing management
- Should have multiple ways to group tasks (not necessarily jobs)
- Too many optimizations for power users; too complicated (230 specifications in BCL)
- Open source: Kubernetes

Task scheduling: Mesos
Mesos demo

Mesos architecture
- Slide from Lintao Zhang

Resource offering
- Resource offers: offer available resources to frameworks, and let them pick which resources to use and which tasks to launch
- Keeps Mesos simple, and lets it support future frameworks
- Decentralized decisions might not be optimal
- Optimization: let frameworks short-circuit rejection by providing a predicate on resources to be offered
  - E.g. "nodes from list L" or "nodes with > 8 GB RAM"
  - Could generalize to other hints as well
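The offer cycle described above, including the predicate that short-circuits rejections, can be sketched in a few lines. This is an illustration only and does not use the real Mesos scheduler API: `Offer`, `make_offers`, and `Framework.on_offers` are hypothetical names, and each offer is treated as usable for at most one task.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Offer:
    node: str
    cpus: int
    ram_gb: int


Predicate = Callable[[Offer], bool]


def make_offers(free: List[Offer], predicate: Optional[Predicate] = None) -> List[Offer]:
    """Master side: skip resources the framework has said it would reject."""
    return [o for o in free if predicate is None or predicate(o)]


class Framework:
    """Framework side: decides which offers to use and what to launch."""

    def __init__(self, name: str, cpus_per_task: int, ram_gb_per_task: int, pending: int) -> None:
        self.name = name
        self.cpus_per_task = cpus_per_task
        self.ram_gb_per_task = ram_gb_per_task
        self.pending = pending  # tasks still waiting to be launched

    def on_offers(self, offers: List[Offer]) -> List[Tuple[str, str]]:
        launched = []
        for o in offers:
            if self.pending == 0:
                break
            if o.cpus >= self.cpus_per_task and o.ram_gb >= self.ram_gb_per_task:
                launched.append((o.node, f"{self.name}-task"))
                self.pending -= 1
        return launched  # offers not listed here are implicitly declined


free_nodes = [Offer("n1", 4, 4), Offer("n2", 8, 16), Offer("n3", 2, 32)]
offers = make_offers(free_nodes, predicate=lambda o: o.ram_gb > 8)
hadoop = Framework("hadoop", cpus_per_task=4, ram_gb_per_task=8, pending=2)
launched = hadoop.on_offers(offers)
print([o.node for o in offers], launched)
```

The master never needs to understand the framework's placement logic; it only filters and forwards offers, which is the simplicity argument made on the slide.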
Slide from Lintao Zhang

Task scheduling: Sparrow
- Problem: what if the scheduler is too slow? Make it distributed?
- Centralized scheduler: knows the global resource state
- Decentralized schedulers: how do they synchronize state?

Task durations are shrinking (10 min, 10 sec, 100 ms, 1 ms)
- 2004: MapReduce batch job
- 2009: Hive query
- 2010: Dremel query
- 2012: Impala query
- 2010: in-memory Spark query
- 2013: Spark streaming

Scheduler throughput needed on 1000 16-core machines (roughly 16,000 cores divided by the task duration, assuming one task per core)

  Task duration    Scheduler throughput
  10 min           26 decisions/second
  10 sec           1.6K decisions/second
  100 ms           160K decisions/second
  1 ms             16M decisions/second

Figure from Kay Ousterhout et al., Sparrow presentation

Problems with multiple schedulers?
- (Diagram: a job arrives at one of several schedulers, all of which place tasks on the same pool of workers)
- Figure from Kay Ousterhout et al., Sparrow presentation

Per-task sampling
- Power of two choices
- (Diagram: a scheduler probes workers for each task of a job)
- Figure from Kay Ousterhout et al., Sparrow presentation
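Per-task sampling with the power of two choices can be compared against purely random placement in a short simulation. This is a sketch under simplifying assumptions, not Sparrow's implementation: the function names are hypothetical, tasks never finish, queue length is the only load signal, and probes return instantly.

```python
import random
from typing import List


def random_placement(num_workers: int, num_tasks: int, rng: random.Random) -> List[int]:
    """Baseline: each task goes to one uniformly random worker."""
    queues = [0] * num_workers
    for _ in range(num_tasks):
        queues[rng.randrange(num_workers)] += 1
    return queues


def two_choice_placement(num_workers: int, num_tasks: int, rng: random.Random,
                         probes: int = 2) -> List[int]:
    """Per-task sampling: probe a few random workers, pick the shortest queue."""
    queues = [0] * num_workers
    for _ in range(num_tasks):
        sampled = rng.sample(range(num_workers), probes)
        target = min(sampled, key=lambda w: queues[w])
        queues[target] += 1
    return queues


rng = random.Random(0)
baseline = random_placement(100, 1000, rng)
two_choice = two_choice_placement(100, 1000, rng)
print("longest queue, random placement:", max(baseline))
print("longest queue, two choices:", max(two_choice))
```

No scheduler needs global state here: each placement touches only the two probed workers, which is what makes the approach attractive when many schedulers run in parallel.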