System Management and Monitoring
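Deployment, Scheduling, and Monitoring of Big Data Systems
Wei Xu (徐葳)

Goals of this lecture
  • Why system administration matters
  • From bare metal to a big data system
  • Maintaining and managing global system state
  • Consistency and Chubby / ZooKeeper
  • Task scheduling
  • Monitoring hardware and software systems

How to follow this lecture
  • Me: researcher and practitioner, a split personality

The blood and tears of a system administrator (me)
  • The 200-node cluster I maintain
  • Production support: 100s of researchers running "big data" workloads
  • Systems research: self-driving big data infrastructure
  • Everything managed by …

HPC: the good old days (for sysadmins)
  • Rocks Cluster
  • Rolls

John Boyle. "Biology must develop its own big-data systems." Nature (World View), July 2013.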

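Demand 1: customers want flexibility …

Motivation 2: customers demand performance …
  • Prof. David Haussler, biologist at UC Santa Cruz: "God Damn I/O!"

We have a variety of applications
  • Scientific image processing: Cryo-EM and protein structure
  • Social "big data": social networking, online education data
  • Natural language processing
  • Genome analysis
  • Lots of dependencies …
  • Resource hungry too …
  • C, C++, Java
  * Image courtesy of Prof. Gerard de Melo @ Tsinghua

Customer's needs change …
  • Protein design

System deployment: from bare metal to a big data system
  Source: Juju website

Basic idea: install one machine, then have it install all the other machines automatically
  • Rocks Cluster Rolls
  • Head node, compute nodes
  • No applicable roll? == Sorry

Network and hardware configuration
  • IPMI, DNS, Cisco router, RAID, BMC, VPN, firewall

Solution: customized servers + whole-rack delivery
  • Open Data Center Committee (ODCC) whole-rack servers
  • OCP whole-rack servers
  * Delivery and deployment at shipping-container scale
  Photo from Lintao Zhang

Hardware support
  • How do you remotely control a bare-metal machine?
  • Intelligent Platform Management Interface (IPMI)
      • Implementation: a dedicated BMC chip
      • Functions: reboot the machine, console, voltage, temperature, network connectivity
  • PXE (network boot): Bootp, TFTP

Operating system and basic infrastructure
  • CentOS, GPU drivers, LDAP, image servers, Cobbler, local repo, storage, OS optimization, network drivers, SSO, OS drivers, security, storage, SR-IOV

Solution: configuration management
  • Ubuntu MAAS (Metal as a Service)
  • Turn configuration into programs
  • Popular configuration management tools

Configuration management: visualization
  Figure from Juju website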
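To make "turn configuration into programs" concrete, here is a minimal sketch, not tied to any particular tool. It assumes a machine's configuration can be modeled as a set of packages and a set of running services. The names NodeState, plan, and apply are hypothetical. The point it illustrates is idempotence: the program computes the difference between desired and actual state, so running it a second time does nothing.

```python
from dataclasses import dataclass, field


@dataclass
class NodeState:
    """Observed or desired state of one machine (hypothetical model)."""
    packages: set = field(default_factory=set)
    services: set = field(default_factory=set)


def plan(desired: NodeState, actual: NodeState) -> list:
    """Return the actions needed to move `actual` to `desired`."""
    actions = []
    for pkg in sorted(desired.packages - actual.packages):
        actions.append(("install", pkg))
    for svc in sorted(desired.services - actual.services):
        actions.append(("start", svc))
    return actions


def apply(actions: list, actual: NodeState) -> None:
    """Apply actions to the in-memory state; a real tool would run commands."""
    for verb, name in actions:
        if verb == "install":
            actual.packages.add(name)
        elif verb == "start":
            actual.services.add(name)


# Desired state is data; the same program can be run against every node.
desired = NodeState(packages={"openjdk", "hadoop"}, services={"datanode"})
node = NodeState(packages={"openjdk"})

first = plan(desired, node)
apply(first, node)
second = plan(desired, node)  # idempotent: nothing left to do
print(first)
print(second)
```

Because the desired state is plain data, it can be versioned and reviewed like code, and a newly installed machine converges to it the same way an existing one does.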

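Project requirements and deadlines
  • Phase 0: choose a project and form a team (this week)
  • Phase 1: first meeting with the user; submit the requirements analysis and project plan (November 11)
  • Phase 2: meet the user at least once every two weeks; submit progress reports (November 25, December 9, December 23)
  • Phase 3: project presentation (in class, December 30)
  • Phase 4: submit the project report (end of week 17)

Course project comments
  1. In quite a few groups, the background section strays far from the project itself, while the parts relevant to the project are mentioned only in passing; it reads like padding.
  2. Several groups need to crawl their own data, and the same people are writing the crawler and building the system; try not to let this delay the later stages.
  3. Some groups' technical roadmaps only introduce existing technology and are not yet organized, which may affect later progress.
  4. The few groups assigned to volunteer organizations all did quite well and deserve encouragement.

Course project: special reminders
  • No plagiarism (not even with the source credited)
  • Quoting some figures is fine, but the source must be cited

Questions about the Hadoop homework?

Maintaining and managing global system state

Global system state: challenges
  • GFS master, MapReduce master, Dryad master
  • Problem: which node is the master? What if the master node dies?
  • Solution? Find someone to decide who the master is
  • Problem?

Chubby's solution: a service that provides
  • synchronization (leader election, shared env. info.)
  • reliability
  • availability
  • easy-to-understand semantics
  • performance, throughput, latency: only secondary

Primary election
  • Distributed consensus problem
  • Asynchronous communication: loss, delay, reordering
  • Why is it hard? The FLP impossibility result

A model: the two generals problem
  • Two armies are on opposite sides of a city in the valley
  • The two generals should coordinate the attack; each has an initial value (attack or retreat)
  • The only communication is through sending messengers, who are prone to being captured or lost in the valley
  • No deterministic algorithm for reaching consensus!
  • Proof by contradiction

Fischer-Lynch-Paterson (FLP)
  • Even if we have reliable message delivery …
  • No consensus can be guaranteed in an asynchronous communication system in the presence of any failures.
  • Intuition: a "failed" process may just be slow, and can rise from the dead at exactly the wrong time.

Paxos introduction
  • Paxos is an asynchronous consensus algorithm.
  • The FLP result says no asynchronous consensus algorithm can guarantee both safety and liveness.
  • Paxos is guaranteed safe. Consensus is a stable property: once reached it is never violated; the agreed value is not changed.
  • Paxos is not guaranteed live. Consensus is reached if "a large enough subnetwork ... is non-faulty for a long enough time." Otherwise Paxos might never terminate.

Paxos: the name

Paxos consensus model

Leslie Lamport: Turing Award, 2013
  • "fundamental contributions to the theory and practice of distributed and concurrent systems, notably the invention of concepts such as causality and logical clocks, safety and liveness, replicated state machines, and sequential consistency"
  • LaTeX, sequential consistency, Byzantine fault tolerance, Paxos algorithm
  Photo from Wikipedia

A Paxos round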
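The slide for a Paxos round is a figure, so the following is a minimal sketch of single-decree Paxos to go with it. It is not a production implementation. It assumes synchronous in-process calls instead of real messages, no crashes, and proposal numbers chosen by the caller. The names Acceptor and propose are illustrative. It shows the safety rule: once a value has been accepted by a majority, any later round must re-propose that value.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1       # highest proposal number promised so far
    accepted_n: int = -1     # number of the accepted proposal, if any
    accepted_v: Any = None   # value of the accepted proposal, if any

    def prepare(self, n: int) -> Optional[Tuple[int, Any]]:
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return (self.accepted_n, self.accepted_v)
        return None  # reject: a higher number was already promised

    def accept(self, n: int, v: Any) -> bool:
        """Phase 2: accept unless a higher-numbered promise was made."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n, self.accepted_v = n, v
            return True
        return False


def propose(acceptors: List[Acceptor], n: int, value: Any) -> Optional[Any]:
    """Run one round; return the chosen value, or None if the round fails."""
    majority = len(acceptors) // 2 + 1
    promises = [p for p in (a.prepare(n) for a in acceptors) if p is not None]
    if len(promises) < majority:
        return None
    # Safety rule: adopt the value of the highest-numbered accepted proposal.
    prev_n, prev_v = max(promises, key=lambda p: p[0])
    if prev_n >= 0:
        value = prev_v
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None


acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, n=1, value="attack")
second = propose(acceptors, n=2, value="retreat")  # must re-propose "attack"
stale = propose(acceptors, n=1, value="retreat")   # old number is rejected
print(first, second, stale)
```

The failed third call hints at the liveness problem: two proposers that keep outbidding each other's numbers can reject each other forever, which is consistent with the FLP result.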

Replicated State Machine
  • Maintain replicas by executing operations in exactly the same order
  • Requires all replicas to "agree" on the (set and) order of operations
  • The point: if one server fails, can use other servers, which have exactly the same state

Using Paxos
  • Three (five) replicas
  • Clients can contact any replica (not just the primary)
  • The server appends each client op to a replicated *log* of operations: Put, Get, Update, Delete
  • Numbered log entries ("instances", seq)
  • Paxos agreement on the content of each log entry
  • Note: each instance (log entry) is an entirely separate Paxos agreement with entirely separate proposal numbers

Using Paxos to replicate states
  (Diagram: a KV server on top of a Paxos peer (library), which talks to the other peers. The log holds client ops such as GET(a), PUT(a,b), …, indexed by instance (log entry) number.)

Example 1: Write
  (Diagram: Kvpaxos servers S1, S2, S3. Client 1 issues PUT(a,b); the servers agree on "log entry 3, PUT(a,b)", and PUT(a,b) is recorded as log entry 3.)

Example 2: Read
  (Diagram: Kvpaxos servers S1, S2, S3. Client 2 issues GET(a); the servers agree on "log entry 4, GET(a)"; the log now reads PUT(a,b), GET(a), …; the server scans up to log entry 4.)

Consistent during a partition
  (Diagram: Kvpaxos servers S1, S2, S3 and Clients 1, 2, 3 issuing GET, POST, POST across a partition. Labels: "Works!" and "Does not work!")
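The write and read examples above can be sketched as a replica that applies an agreed log strictly in order. This is only an illustration. It assumes the content of every log entry has already been decided by a separate Paxos instance, so the log is passed in as a plain list, and the class name KVReplica and the method catch_up are hypothetical. A read is answered by scanning up to the log entry that holds the GET.

```python
from typing import Dict, List, Optional, Tuple

Op = Tuple[str, str, Optional[str]]  # (kind, key, value)


class KVReplica:
    """Applies an agreed-upon log strictly in sequence order."""

    def __init__(self) -> None:
        self.store: Dict[str, Optional[str]] = {}
        self.applied = 0  # number of log entries applied so far

    def catch_up(self, log: List[Op], upto: int) -> Optional[str]:
        """Apply entries up to index `upto`; return the result of that entry.

        Returns None if entry `upto` was already applied earlier.
        """
        result = None
        while self.applied <= upto:
            kind, key, value = self.log_entry(log, self.applied)
            if kind == "PUT":
                self.store[key] = value
                result = None
            elif kind == "GET":
                result = self.store.get(key)
            self.applied += 1
        return result

    @staticmethod
    def log_entry(log: List[Op], seq: int) -> Op:
        """Look up one decided instance; stands in for asking the Paxos peer."""
        return log[seq]


# The decided log: every replica sees the same entries in the same order.
log: List[Op] = [("PUT", "x", "1"), ("PUT", "a", "b"), ("GET", "a", None)]

s1, s2 = KVReplica(), KVReplica()
r1 = s1.catch_up(log, upto=2)  # S1 serves the GET at entry 2
r2 = s2.catch_up(log, upto=2)  # S2 would return the same answer
print(r1, r2, s1.store == s2.store)
```

Putting the GET into the log is what makes the read consistent: the replica cannot answer until it has applied every write ordered before it.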

Chubby

Design: system structure
  • Two main components: server (Chubby cell) and client library
  Figure from the Chubby paper

Design: files, dirs, handles
  • FS interface: /ls/cs6464-cell/lab2/test
  • Specialized API
  • Also via an interface used by GFS

Lock leases
  • Session maintained through KeepAlives
  • Handles, locks, and cached data remain valid
  • Client must acknowledge invalidation messages
  • Terminated explicitly, or after lease timeout

Open source alternative: ZooKeeper
  (Diagram: a ZooKeeper service made of several servers, one of them the leader, each serving several clients.)
  • All servers store a copy of the data (in memory)
  • A leader is elected at startup
  • Followers service clients; all updates go through the leader
  • Update responses are sent when a majority of servers have persisted the change

Example use of ZooKeeper
  • (Well-known address for ZooKeeper)
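The lease idea behind sessions and KeepAlives can be sketched with a toy, single-node lock service. This is not Chubby's or ZooKeeper's API. The class LeaseLockService, its methods, and the path /ls/cell/master are hypothetical, and time is passed in explicitly so the example is deterministic. It shows how a master that stops sending KeepAlives loses its lock once the lease times out, so another node can take over.

```python
from typing import Dict, Optional


class LeaseLockService:
    """Toy single-node lock service with session leases (hypothetical API)."""

    def __init__(self, lease_seconds: float) -> None:
        self.lease_seconds = lease_seconds
        self.expiry: Dict[str, float] = {}  # session id -> lease expiry time
        self.holder: Dict[str, str] = {}    # lock path -> session id

    def keep_alive(self, session: str, now: float) -> None:
        """Create or extend a session lease."""
        self.expiry[session] = now + self.lease_seconds

    def _alive(self, session: str, now: float) -> bool:
        return self.expiry.get(session, float("-inf")) > now

    def acquire(self, path: str, session: str, now: float) -> bool:
        """Grant the lock if it is free or its holder's lease has expired."""
        if not self._alive(session, now):
            return False
        owner = self.holder.get(path)
        if owner is None or owner == session or not self._alive(owner, now):
            self.holder[path] = session
            return True
        return False

    def master(self, path: str, now: float) -> Optional[str]:
        """Return the current live lock holder, if any."""
        owner = self.holder.get(path)
        if owner is not None and self._alive(owner, now):
            return owner
        return None


svc = LeaseLockService(lease_seconds=10.0)
path = "/ls/cell/master"
svc.keep_alive("A", now=0.0)
svc.keep_alive("B", now=0.0)
a_wins = svc.acquire(path, "A", now=1.0)      # A becomes master
b_blocked = svc.acquire(path, "B", now=2.0)   # B is refused while A is alive
svc.keep_alive("B", now=9.0)                  # B renews; A goes silent
vacant = svc.master(path, now=12.0)           # A's lease ran out at t=10
b_wins = svc.acquire(path, "B", now=12.0)     # B takes over
print(a_wins, b_blocked, vacant, b_wins, svc.master(path, now=12.0))
```

A real service replicates this state across a cell with a consensus protocol; the sketch keeps it on one node to isolate the lease logic.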

Task scheduling: problems and challenges

Problem: resource sharing in data centers
  • No single framework is optimal for all applications
  • Want to run multiple frameworks in a single cluster
      • … to maximize utilization
      • … to share data between frameworks
  (Diagram: Hadoop, Pregel, and MPI on a shared cluster.)
  Slide from Lintao Zhang

Solution: resource scheduler
  (Diagram: a resource manager assigns nodes to Hadoop, Pregel, ….)
  Slide from Lintao Zhang

What are the "demands"?
  • Multiple users
  • Jobs and tasks
  • Each has different requirements
  • Requests come in over time (online)

What are the "resources"?
  • CPU, RAM, disk space
  • Networking
  • Special constraints: location, colocation
  • Special hardware

Goals for the scheduler (1): what resources are available?
  • Resource tracking (who already has what)
  • Failure handling

Goals for the scheduler (2): who can get what resource (and when)?
  • Fairness
  • Improve utilization
  • Improve average completion time
  • Improve power efficiency
  • (Often) conflicting goals

Goals for the scheduler (3): how can the user access the resource?
  • Naming

Other goals
  • Ensure user isolation (containers, VMs)
  • Allow users to monitor their services
  • A description language / UI for resource specs

Task scheduling example: Borg

Borg
  • 10+ years @ Google
  • Managing millions of machines

Resources managed by Borg
  • ~10,000 (median) servers per cell
  • Heterogeneous machines: size, processor type, external IPs, performance
  • Special hardware like SSD

Demands
  • Jobs and tasks
  • Different sizes
  • Prod / non-prod
  • Online and batch
  • Requirement descriptions written in BCL
  • Can "update" task requirements
  • Rolling updates

Borg architecture
  Source: Borg EuroSys paper

How Borg achieved the goals

Resource tracking
  • Through Borglets (local agents on each machine): monitoring + execution
  • (Logically) single central Borg master
  • Fault tolerant using Chubby (always knows which is the current master)
  • Records all jobs in a Paxos store

Borg scheduling policy
  • Priority + admission control
  • Uses a scoring mechanism
  • Minimize the cost change when placing a job
  • Vs. "best fit"
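To illustrate the contrast between scoring by cost change and "best fit", here is a small sketch. It is not Borg's actual scoring function. It assumes a single resource dimension and a made-up convex cost (utilization squared), and all names are hypothetical. Both policies first filter out machines that cannot fit the task and then pick the lowest score, but they define the score differently.

```python
from typing import Callable, Dict, Optional

Machine = Dict[str, float]  # {"cap": capacity, "used": amount in use}


def cost(used: float, cap: float) -> float:
    """Hypothetical convex cost of running a machine at this utilization."""
    return (used / cap) ** 2


def score_cost_change(m: Machine, demand: float) -> float:
    """Score = increase in machine cost caused by the placement."""
    return cost(m["used"] + demand, m["cap"]) - cost(m["used"], m["cap"])


def score_best_fit(m: Machine, demand: float) -> float:
    """Score = free capacity left over after the placement."""
    return m["cap"] - m["used"] - demand


def place(machines: Dict[str, Machine], demand: float,
          score: Callable[[Machine, float], float]) -> Optional[str]:
    """Filter feasible machines, then pick the lowest-scoring one."""
    feasible = {name: m for name, m in machines.items()
                if m["used"] + demand <= m["cap"]}
    if not feasible:
        return None
    return min(feasible, key=lambda name: score(feasible[name], demand))


machines = {
    "m1": {"cap": 16.0, "used": 12.0},
    "m2": {"cap": 16.0, "used": 2.0},
    "m3": {"cap": 16.0, "used": 15.0},
}
by_best_fit = place(machines, 4.0, score_best_fit)        # tightest fit
by_cost_change = place(machines, 4.0, score_cost_change)  # smallest increase
too_big = place(machines, 20.0, score_best_fit)           # nothing fits
print(by_best_fit, by_cost_change, too_big)
```

With this cost function, best fit packs the task onto the nearly full machine, while the cost-change score spreads load to the emptier one. Which is preferable depends on whether the operator wants to keep large holes free for big tasks or leave headroom for load spikes.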

Naming
  • Borg names a process with an IP address + ports
  • This allows different jobs to run on a single machine
  • Should this be done by the scheduler?

Other things Borg handles
  • Package distribution (how to copy the binaries to all machines)
  • Autoscaling
  • Re-packing tasks
  • Containers for performance isolation
  • Monitoring UI
  • Debugging UI
  • Tracing integration (later)
  • BCL (Borg Configuration Language)
  • Local disk management
  • ……

Lessons
  • The Borg master should be the kernel of the data center
  • Other things can move to separate services
  • Should simplify naming and addressing management
  • Should have multiple ways to group tasks (not necessarily jobs)
  • Too many optimizations for power users; too complicated (230 specifications in BCL)
  • Open source: Kubernetes

Task scheduling: Mesos

Mesos demo

Mesos architecture
  Slide from Lintao Zhang

Resource offering
  • Resource offers: offer available resources to frameworks, and let them pick which resources to use and which tasks to launch
  • Keeps Mesos simple, and lets it support future frameworks
  • Decentralized decisions might not be optimal
  • Optimization: let frameworks short-circuit rejection by providing a predicate on the resources to be offered
      • E.g. "nodes from list L" or "nodes with > 8 GB RAM"
      • Could generalize to other hints as well
  Slide from Lintao Zhang
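The resource-offer model above can be sketched as a loop in which the master offers each node to frameworks and each framework decides for itself. This is an illustration only, not the Mesos API. Offer, Framework, offer_round, and the framework names are hypothetical; whole nodes are offered, and frameworks are tried in a fixed order. The optional predicate plays the role of the filter that lets a framework short-circuit offers it would reject anyway.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Offer:
    node: str
    cpus: int
    ram_gb: int


class Framework:
    """A framework decides for itself which offered resources to use."""

    def __init__(self, name: str, tasks: int,
                 predicate: Optional[Callable[[Offer], bool]] = None) -> None:
        self.name = name
        self.pending = tasks        # tasks still waiting to launch
        self.predicate = predicate  # filter registered with the master

    def on_offer(self, offer: Offer) -> bool:
        """Accept the offer (launch one task) or reject it."""
        if self.pending > 0:
            self.pending -= 1
            return True
        return False


def offer_round(offers: List[Offer],
                frameworks: List[Framework]) -> Dict[str, str]:
    """Offer each node to frameworks in turn until one accepts."""
    placement: Dict[str, str] = {}
    for offer in offers:
        for fw in frameworks:
            # Short-circuit: never send an offer the predicate rules out.
            if fw.predicate is not None and not fw.predicate(offer):
                continue
            if fw.on_offer(offer):
                placement[offer.node] = fw.name
                break
    return placement


offers = [Offer("n1", 4, 4), Offer("n2", 8, 16), Offer("n3", 8, 32)]
big_mem = Framework("framework-a", tasks=1, predicate=lambda o: o.ram_gb > 8)
any_node = Framework("framework-b", tasks=2)
placement = offer_round(offers, [big_mem, any_node])
print(placement)
```

The master never needs to understand what a framework's tasks look like, which is what keeps it simple. The price is that each framework sees only the offers it is sent, so the overall placement may not be globally optimal.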

Task scheduling: Sparrow

Problem: what if the scheduler is too slow? Go distributed?
  • Centralized scheduler: knows the global state of resources
  • Distributed schedulers: how to synchronize state?

Task durations keep shrinking
  • 2004: MapReduce batch job
  • 2009: Hive query
  • 2010: Dremel query
  • 2012: Impala query
  • 2010: in-memory Spark query
  • 2013: Spark streaming

Scheduler throughput needed on 1,000 16-core machines
  Task duration    Scheduler throughput
  10 min           26 decisions/second
  10 sec           1.6K decisions/second
  100 ms           160K decisions/second
  1 ms             16M decisions/second
  Figure from Kay Ousterhout et al., Sparrow presentation

The problem with multiple schedulers?
  (Diagram: a job, several schedulers, and several workers.)
  Figure from Kay Ousterhout et al., Sparrow presentation

Per-task sampling: power of two choices
  (Diagram: a job, several schedulers, and several workers.)
  Figure from Kay Ousterhout et al., Sparrow presentation
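A small simulation can show why probing two workers per task helps. This is a sketch of the general power-of-two-choices idea, not Sparrow's implementation. It assumes tasks never finish, queue length is the only load signal, and the worker and task counts are arbitrary. It compares the longest queue under purely random placement with the longest queue when each task probes two random workers and joins the shorter queue.

```python
import random
from typing import Callable, List


def place_random(loads: List[int], rng: random.Random) -> None:
    """Baseline: send the task to one worker chosen uniformly at random."""
    loads[rng.randrange(len(loads))] += 1


def place_two_choices(loads: List[int], rng: random.Random) -> None:
    """Probe two distinct random workers; pick the one with the shorter queue."""
    a, b = rng.sample(range(len(loads)), 2)
    loads[a if loads[a] <= loads[b] else b] += 1


def simulate(place: Callable[[List[int], random.Random], None],
             workers: int, tasks: int, seed: int) -> int:
    """Return the longest queue after placing all tasks (no task finishes)."""
    rng = random.Random(seed)
    loads = [0] * workers
    for _ in range(tasks):
        place(loads, rng)
    return max(loads)


max_random = simulate(place_random, workers=10_000, tasks=10_000, seed=0)
max_two = simulate(place_two_choices, workers=10_000, tasks=10_000, seed=0)
print("longest queue, random placement:", max_random)
print("longest queue, two choices:     ", max_two)
```

No scheduler needs global state here: each placement costs two probes, so many schedulers can run in parallel without synchronizing with one another.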
