Big Data Streaming Architecture Ecosystem Analysis

Uploaded: 2023-11-30 | Format: PPTX | Pages: 47 | Size: 1.46 MB


Functional Comparison and Performance Evaluation

Overview
● Streaming Core
● MISC
● Performance Benchmark
● Choose your weapon!

Execution Model + Fault Tolerance Mechanism

Execution model (Source → Operator → Sink)
● Micro-Batch: Apache Spark Streaming*, Apache Storm Trident*
● Continuous Streaming: Apache Flink*, Apache Storm*, Apache Gearpump*, Twitter Heron*

This is the critical part, as it affects many features.

Fault tolerance mechanism
● Micro-Batch, checkpoint per batch: Apache Spark Streaming*, Apache Storm Trident*
● Continuous Streaming, checkpoint "per batch": Apache Flink*, Apache Gearpump*
● Continuous Streaming, ack per record: Apache Storm*, Twitter Heron*

[Diagram: Source → Operator → Sink pipelines for each mechanism. Labels: Driver, JobManager/HDFS, Acker, Storage, job status, id, offset, state, ack.]

Trade-off: low latency, high overhead, low throughput versus low overhead, high throughput.

Delivery Guarantee
● At least once: Ackers know whether a record was processed successfully or not. If it failed, replay it. There is no state consistency guarantee.
● Exactly once: State is persisted in durable storage. Checkpoint is linked with state storage per batch.
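As a rough illustration of the two delivery guarantees, the following Python sketch models them with a word counter. It is a toy model, not the API of any framework named on the slides: the function names, the single simulated failure, and the in-memory dictionary standing in for durable storage are all assumptions made for this example.

```python
def run_at_least_once(records, fail_at_index):
    """Ack-per-record style: a record whose ack is lost is replayed.

    The state is not rolled back, so the replayed record is counted twice.
    """
    counts = {}
    replayed = False
    i = 0
    while i < len(records):
        word = records[i]
        counts[word] = counts.get(word, 0) + 1   # state is updated before the ack
        if i == fail_at_index and not replayed:
            replayed = True                      # ack lost: the source replays this record
            continue
        i += 1
    return counts


def run_exactly_once(records, batch_size, fail_in_batch):
    """Checkpoint-per-batch style: state and source offset are saved together."""
    checkpoint = {"offset": 0, "counts": {}}     # stands in for durable storage
    failed = False
    while checkpoint["offset"] < len(records):
        offset = checkpoint["offset"]
        counts = dict(checkpoint["counts"])      # restore state from the last checkpoint
        batch = records[offset:offset + batch_size]
        for word in batch:
            counts[word] = counts.get(word, 0) + 1
        if offset // batch_size == fail_in_batch and not failed:
            failed = True                        # crash before the checkpoint: work is discarded
            continue
        checkpoint = {"offset": offset + len(batch), "counts": counts}
    return checkpoint["counts"]
```

With the input `["foo", "foo", "bar"]` and one simulated failure, the ack-and-replay version counts "foo" three times, while the checkpointed version returns the exact counts because state and source offset are rolled back together.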

Native State Operator
● Apache Spark Streaming* / Apache Storm Trident*: Yes
    Spark 1.5: updateStateByKey
    Spark 1.6: mapWithState
    Trident: persistentAggregate, State
● Apache Flink* / Apache Gearpump*: Yes
    Flink Java API: ValueState, ListState, ReducingState
    Flink Scala API: mapWithState
    Gearpump: persistState
● Apache Storm* / Twitter Heron*: Yes*
    Storm: KeyValueState
    Heron: X (user maintained)
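The operators listed above share one idea: the framework keeps a piece of state per key and calls a user function to fold new values into it. A minimal plain-Python sketch of that idea follows. `update_state_by_key` and `running_sum` are hypothetical helpers written for this example; they only loosely mirror the shape of Spark's `updateStateByKey`. Unlike the real operator, the sketch touches only keys that appear in the current batch.

```python
def update_state_by_key(state, batch, update):
    """Return a new state dict after folding one batch of (key, value) pairs.

    `update(new_values, old_state)` is called once per key seen in the batch;
    `old_state` is None for a key that has no state yet.
    """
    grouped = {}
    for key, value in batch:
        grouped.setdefault(key, []).append(value)
    new_state = dict(state)
    for key, values in grouped.items():
        new_state[key] = update(values, state.get(key))
    return new_state


def running_sum(new_values, old_state):
    """Example update function: keep a running total per key."""
    return sum(new_values) + (old_state or 0)


state = {}
for batch in [[("foo", 1), ("bar", 1)], [("foo", 1)]]:
    state = update_state_by_key(state, batch, running_sum)
```

After the two batches, `state` holds a count of 2 for "foo" and 1 for "bar". A framework with a native state operator would additionally checkpoint this dictionary so it survives failures.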

API: Declarative
Frameworks: Apache Spark Streaming*, Apache Storm Trident*, Apache Flink*, Apache Gearpump*
● Higher order function operators (filter, mapWithState…)
● Logical plan optimization

    DataStream<String> text = env.readTextFile(params.get("input"));
    DataStream<Tuple2<String, Integer>> counts = text.flatMap(new Tokenizer()).keyBy(0).sum(1);

Data flow: "foo, foo, bar" → "foo", "foo", "bar" → {"foo": 1, "foo": 1, "bar": 1} → {"foo": 2, "bar": 1}
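The data flow of the declarative word count (split the line, emit one pair per word, group by word, sum) can be traced step by step in plain Python. This is only a sketch of the logical plan, with no streaming engine involved; `tokenize` and `word_count` are names chosen for this example.

```python
from collections import Counter


def tokenize(line):
    """Split a comma-separated line into trimmed, non-empty words."""
    return [word.strip() for word in line.split(",") if word.strip()]


def word_count(lines):
    words = [word for line in lines for word in tokenize(line)]  # flatMap
    pairs = [(word, 1) for word in words]                        # map to (word, 1)
    counts = Counter()
    for word, one in pairs:                                      # keyBy(word), then sum
        counts[word] += one
    return dict(counts)


result = word_count(["foo, foo, bar"])
```

For the slide's input line "foo, foo, bar" the result is `{"foo": 2, "bar": 1}`, matching the last step of the data flow shown above.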

API: Statistical (Python/R)
Frameworks: Apache Spark Streaming* (Structured Streaming*), Apache Storm*, Twitter Heron*
● Data scientist friendly
● Dynamic type

Python

    lines = ssc.textFileStream(params.get("input"))
    words = lines.flatMap(lambda line: line.split(","))
    pairs = words.map(lambda word: (word, 1))
    counts = pairs.reduceByKey(lambda x, y: x + y)
    counts.saveAsTextFiles(params.get("output"))

R

    lines <- textFile(sc, "input")
    words <- flatMap(lines, function(line) { strsplit(line, " ")[[1]] })
    wordCount <- lapply(words, function(word) { list(word, 1L) })
    counts <- reduceByKey(wordCount, "+", 2L)

API: SQL
Frameworks: Apache Spark Streaming* (Structured Streaming), Apache Flink*, Apache Storm Trident*

Pure Style (Apache Storm* SQL)

    CREATE EXTERNAL TABLE ORDERS (ID INT PRIMARY KEY, UNIT_PRICE INT, QUANTITY INT)
      LOCATION 'kafka://localhost:2181/brokers?topic=orders'
      TBLPROPERTIES '{...}}'
    INSERT INTO LARGE_ORDERS
      SELECT ID, UNIT_PRICE * QUANTITY AS TOTAL
      FROM ORDERS
      WHERE UNIT_PRICE * QUANTITY > 50

    bin/storm sql XXXX.sql

Fusion Style (Apache Spark Streaming*)

    InputDStream.transform((rdd: RDD[Order], time: Time) => {
      import sqlContext.implicits._
      // Register the current batch as a temporary table so that SQL can query it.
      rdd.toDF.registerTempTable("ORDERS")
      val SQL = "SELECT ID, UNIT_PRICE * QUANTITY AS TOTAL FROM ORDERS WHERE UNIT_PRICE * QUANTITY > 50"
      val largeOrderDF = sqlContext.sql(SQL)
      // Hand the query result back to the DStream as an RDD.
      largeOrderDF.rdd
    })

Summary

Framework                  Compositional  Declarative  Python/R  SQL
Apache Spark Streaming*    X              √            √         √
Apache Storm*              √              X            √         Does NOT support aggregation, windowing and joining
Apache Storm Trident*      X              √            X
Apache Gearpump*           √              √            X         X
Apache Flink*              X              √            X         Supports select, from, where, union
Twitter Heron*             √              X            √         X

Runtime Model
● Single task, single process (each JVM process connects with the local SM): Twitter Heron*
● Multi tasks, multi applications, single process (threads in one JVM process run tasks from application A and application B): Apache Flink*
● Multi tasks, single application, single process: Apache Spark Streaming*, Apache Storm*, Apache Storm Trident*, Apache Gearpump*
    Within a JVM process, a thread runs either multiple tasks (multi tasks, single thread) or one task (single task, single thread).

MISC
● Window Support
● Out-of-order Processing
● Memory Management
● Resource Management
● Web UI
● Community Maturity

Window Support

[Diagram: sliding window, count window and session window on a time axis t. Labels: "smaller than gap", "session gap".]

Framework                  Sliding Window  Count Window  Session Window
Apache Spark Streaming*    √               X             X
Apache Storm*              √               √             X
Apache Storm Trident*      √               √             X
Apache Gearpump*           √               X             X
Apache Flink*              √               √             √
Twitter Heron*             X               X             X
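The three window types in the table differ in how an event is assigned to a window. The sketch below shows one common way to define each of them in plain Python. The function names, the half-open `[start, end)` convention and the integer timestamps are assumptions of this example, not the behaviour of any specific framework.

```python
def sliding_windows(timestamp, size, slide):
    """Return every [start, end) window of length `size` that contains `timestamp`.

    Windows start at multiples of `slide`; with size > slide they overlap.
    """
    last_start = timestamp - (timestamp % slide)
    return [(start, start + size)
            for start in range(last_start, timestamp - size, -slide)]


def count_windows(events, count):
    """Group events into consecutive windows of `count` elements (the last may be shorter)."""
    return [events[i:i + count] for i in range(0, len(events), count)]


def session_windows(timestamps, gap):
    """Split timestamps into sessions.

    An event joins the current session while its distance to the previous
    event is smaller than `gap`; otherwise it opens a new session.
    """
    sessions = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] < gap:
            sessions[-1].append(t)
        else:
            sessions.append([t])
    return sessions
```

For example, with a window size of 10 and a slide of 5, an event at time 7 belongs to both `(5, 15)` and `(0, 10)`, whereas a session window has no fixed length at all and is closed only by a quiet period of at least `gap`.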

Out-of-order Processing

Framework                  Processing Time  Event Time  Watermark
Apache Spark Streaming*    √                √           X
Apache Storm*              √                √           √
Apache Storm Trident*      √                X           X
Apache Gearpump*           √                √           √
Apache Flink*              √                √           √
Twitter Heron*             √                X           X
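To make the event time and watermark columns concrete, here is a small Python sketch of event-time tumbling windows driven by a watermark. It assumes a simple "bounded out-of-orderness" watermark (largest event time seen so far minus a fixed bound); real frameworks offer other strategies, and `event_time_windows` is a hypothetical helper, not a framework API.

```python
def event_time_windows(events, size, max_out_of_orderness):
    """Sum values per event-time tumbling window.

    events: iterable of (event_time, value) in arrival order.
    Returns (results, dropped): `results` lists (window_start, total) in the
    order the windows fire, `dropped` lists events that arrived after their
    window had already fired. Windows still open at the end are not flushed.
    """
    open_windows = {}                 # window_start -> running total
    results, dropped = [], []
    watermark = float("-inf")
    for event_time, value in events:
        start = event_time - (event_time % size)
        if start + size <= watermark:
            dropped.append((event_time, value))       # too late: window already fired
        else:
            open_windows[start] = open_windows.get(start, 0) + value
        # The watermark only moves forward.
        watermark = max(watermark, event_time - max_out_of_orderness)
        for ready in sorted(w for w in open_windows if w + size <= watermark):
            results.append((ready, open_windows.pop(ready)))
    return results, dropped


results, dropped = event_time_windows(
    [(1, 1), (4, 1), (3, 1), (12, 1), (2, 1), (15, 1), (27, 1)],
    size=10,
    max_out_of_orderness=2,
)
```

In this run the event at time 3 arrives out of order but is still counted, because the watermark has not yet passed the end of its window. The event at time 2 arrives after the watermark reached 10 and is dropped.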

Memory Management

Framework                  JVM Manage  Self Manage on-heap  Self Manage off-heap
Apache Spark Streaming*    √           √                    √
Apache Flink*              √           √                    √
Apache Storm*              √           X                    X
Apache Gearpump*           √           X                    X
Twitter Heron*             √           X                    X

Resource Management

Framework                  Standalone  YARN  Mesos
Apache Spark Streaming*    √           √     √
Apache Storm*              √           √     √
Apache Storm Trident*      √           √     √
Apache Gearpump*           √           √     X
Apache Flink*              √           √     X
Twitter Heron*             √           √     √

Web UI

Framework                  Submit Jobs  Cancel Jobs  Inspect Jobs  Show Statistics  Show Input Rate  Check Exceptions  Inspect Conf
Apache Spark Streaming*    X            √            √             √                √                √                 √
Apache Storm*              X            √            √             √                √                √                 √
Apache Gearpump*           √            √            √             √                √                √                 √
Apache Flink*              √            √            √             √                X                √                 √
Twitter Heron*             X            X            √             √                √                √                 √

Community Maturity

Framework                  Initiation Time  Apache Top Project  Contributors
Apache Spark Streaming*    2013             2014                926
Apache Storm*              2011             2014                219
Apache Gearpump*           2014             Incubator           21
Apache Flink*              2010             2015                208
Twitter Heron*             2014             N/A                 44

[Chart: Past Months Summary, GitHub (commits, committers) for Spark, Storm, Gearpump, Flink and Heron. Source website: /apache/spark/pulse/monthly]
[Chart: Past Months Summary, JIRA (created, resolved) for Spark, Storm, Gearpump, Flink and Heron. Source website: /jira/secure/Dashboard.js]

HiBench 6.0

Test Philosophy: "Lazy Benchmarking"
● Simple test cases to infer practical use cases

Cluster Setup

Apache Kafka* Cluster
    CPU: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
    Mem: 128 GB
    Disk: HDD (1TB)
    Network: 10 Gbps

Test Cluster
    CPU: Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
    Core: 24
    Mem: 128 GB
    Disk: HDD (1TB)
    Network: Gbps

Node counts shown on the slide: x7, x3

Name               Version
Java               1.8
Scala              2.11.7
Apache Hadoop*     2.6.2
Apache Zookeeper*  3.4.8
Apache Kafka*
Apache Spark*      1.6.1
Apache Storm*      1.0.1
Apache Flink*      1.0.3
Apache Gearpump*   0.8.1

● Heron* requires a specific operating system (Ubuntu/CentOS/MacOS).
● Structured Streaming doesn't support a Kafka source yet (Spark 2.0).

Architecture

[Diagram labels: Data Generator, three Kafka Brokers with Topic A and Topic B, Test Cluster (Standalone) with Client, Master and seven Slaves (20 Core, 80G Mem each), Metrics Reader, File System, Result, In Time, Out Time.]

Framework Configuration

Framework                  Related Configuration
Apache Spark Streaming*    7 Executor, 140 Parallelism
Apache Flink*              7 TaskManager, 140 Parallelism
Apache Storm*              28 Worker, 140 KafkaSpout
Apache Gearpump*           28 Executors, 140 KafkaSource
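The In Time and Out Time labels in the architecture suggest that latency is measured per record as the difference between the two timestamps. Assuming that reading, and assuming the nearest-rank definition of a percentile (the slides do not say which definition was used), a P99 latency could be computed as in the following sketch. `p99_latency` is a name chosen for this example.

```python
def p99_latency(records):
    """Return the 99th-percentile latency using the nearest-rank method.

    records: non-empty list of (in_time, out_time) pairs in the same time unit.
    """
    if not records:
        raise ValueError("at least one record is required")
    latencies = sorted(out_time - in_time for in_time, out_time in records)
    # Nearest rank: ceil(0.99 * n), computed with integers to avoid rounding issues.
    rank = (99 * len(latencies) + 99) // 100
    return latencies[rank - 1]


# 100 records with latencies 0.01 s, 0.02 s, ..., 1.00 s
sample = [(0, i / 100) for i in range(1, 101)]
p99 = p99_latency(sample)
```

P99 is a tail metric: for the sample above it reports 0.99 s even though the median is about half of that. Tail latency is therefore more sensitive to checkpoint pauses and scheduling delays than an average would be.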

Raw Input Data
● Kafka Topic Partition: 140
● Size Per Message (configurable): 200 bytes
● Raw Input Message Example:
  "0,6,nbizrgdziebsaecsecujfjcqtvnpcnxxwiopmddorcxnlijdizgoi,1991-06-10,0.115967035,Mozilla/5.0 (iPhone; CPU like Mac X) AppleWebKit/420.1 (KHTML Gecko) Version/3.0 Mobile/4A93 Safari/419.3,YEM,YEM-AR,snowdrops,1"
● Strong Type: class UserVisit (ip, sessionId, browser)
● Keep feeding data at a specific rate for 5 minutes

Data Input Rate

Throughput  Message/Second  Kafka Producer Num
40KB/s      0.2K            1
400KB/s     2K              1
4MB/s       20K             1
40MB/s      200K            1
80MB/s      400K            1
400MB/s     2M              10
600MB/s     3M              15
800MB/s     4M              20
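The throughput column follows directly from the message rate and the 200-byte message size. The short Python check below reproduces it, assuming decimal units (1 KB = 1000 bytes, 1 MB = 1,000,000 bytes), which is what makes the slide's numbers line up. The helper names are chosen for this example.

```python
MESSAGE_SIZE_BYTES = 200   # "Size Per Message" from the slide


def throughput_bytes_per_second(messages_per_second, message_size=MESSAGE_SIZE_BYTES):
    """Input throughput is simply rate times message size."""
    return messages_per_second * message_size


def format_rate(bytes_per_second):
    """Format a byte rate with decimal units (an assumption of this sketch)."""
    if bytes_per_second >= 1_000_000:
        return f"{bytes_per_second / 1_000_000:g}MB/s"
    return f"{bytes_per_second / 1_000:g}KB/s"


# Message/Second column: 0.2K, 2K, 20K, 200K, 400K, 2M, 3M, 4M
RATES = [200, 2_000, 20_000, 200_000, 400_000, 2_000_000, 3_000_000, 4_000_000]
throughputs = [format_rate(throughput_bytes_per_second(rate)) for rate in RATES]
```

The computed list matches the Throughput column row by row, from 40KB/s up to 800MB/s.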

Let's start with the simplest case.

Test Case: Identity
The application reads input data from Kafka and then writes the result to Kafka immediately; there is no complex business logic involved.

Result
[Chart: P99 Latency (s) versus Input Rate (MB/s) for Apache Spark*, Apache Flink*, Apache Storm* without Ack and Apache Storm* with Ack.]

For complete information about performance and benchmark results, visit /benchmarks. Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance.

Test Case: Repartition
Basically, this test case stands for the efficiency of data shuffle (network shuffle).

Result
[Chart: P99 Latency (s) versus Input Rate (MB/s) for Apache Spark*, Apache Flink*, Apache Storm* without Ack, Apache Gearpump* and Apache Storm* with Ack.]
[Chart: Throughput (MB/s) versus Input Rate (MB/s) for Apache Spark*, Apache Flink*, Apache Storm* without Ack, Apache Gearpump* and Apache Storm* with Ack.]

Observation
● Spark Streaming needs to schedule tasks with additional context. With a tiny batch interval, the overhead can be dramatically worse than in other frameworks.
● According to our test, the minimum batch interval of Spark is about 80 ms (140 tasks per batch); below that, the task schedule delay keeps increasing.
● Repartition is heavy for every framework, but usually it's unavoidable.
● The latency of Gearpump is still quite low even under 800 MB/s input throughput.
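The remark about an ever-increasing schedule delay can be illustrated with a simple queueing sketch: if a micro-batch takes longer to schedule and process than the batch interval, each batch starts later than the one before. The model below assumes one batch runs at a time; the 100 ms and 60 ms processing times are hypothetical values picked for illustration, not measurements from the benchmark.

```python
def scheduling_delays(batch_interval, processing_time, num_batches):
    """Return, per batch, how long it waits past its due time before starting.

    All arguments use the same time unit (milliseconds in the examples below).
    """
    delays = []
    free_at = 0                        # time at which the previous batch finishes
    for n in range(num_batches):
        due = n * batch_interval       # a new batch becomes due every interval
        start = max(due, free_at)      # it cannot start before the previous one ends
        delays.append(start - due)
        free_at = start + processing_time
    return delays


overloaded = scheduling_delays(batch_interval=80, processing_time=100, num_batches=4)
stable = scheduling_delays(batch_interval=80, processing_time=60, num_batches=4)
```

In the overloaded case the delay grows by 20 ms with every batch and never recovers. In the stable case it stays at zero, which is why the batch interval cannot be pushed below the per-batch cost.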

Test Case: Stateful WordCount
● Native state operator is supported by all frameworks we evaluated
● Stateful operator performance + Checkpoint/Acker cost

Result
[Chart: P99 Latency (s) versus Input Rate (MB/s) for Apache Flink*, Apache Spark*, Apache Flink* without CP, Apache Storm* and Apache Gearpump*.]
[Chart: Throughput (MB/s) versus Input Rate (MB/s) for Apache Spark*, Apache Flink*, Apache Storm* and Gearpump*.]

Observation
● Exactly-once semantics usually require state management and checkpoint. But better guarantees come at a high cost.
● There is no obvious performance difference in Flink when switching fault tolerance on or off.
● Checkpoint mechanisms and storages play a critical role here.

Test Case: Window Based Aggregation
This test case manages a 10-second sliding window.

Result
[Chart: P99 Latency (s) versus Input Rate (MB/s) for Apache Spark*, Apache Flink* and Storm*.]
[Chart: Throughput (MB/s) versus Input Rate (MB/s) for Apache Spark*, Apache Flink* and Storm*.]

So which streaming framework should I use?

Do your own benchmark
● HiBench: a cross-platform micro-benchmark suite for big data (/intel-hadoop/HiBench)
● Open source since 2012
● Better streaming benchmark support will be included in the next release [HiBench 6.0]

Legal Disclaimer
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
Intel does not control
