




版權說明:本文檔由用戶提供并上傳,收益歸屬內容提供方,若內容存在侵權,請進行舉報或認領
文檔簡介
TodayStoragetechnologiesandtrendsLocalityofreferenceCachinginthememoryhierarchyRandom-AccessMemory(RAM)KeyfeaturesRAMistraditionallypackagedasachip.Basicstorageunitisnormallyacell(onebitpercell).MultipleRAMchipsformamemory.StaticRAM(SRAM)Eachcellstoresabitwithafourorsix-transistorcircuit.Retainsvalueindefinitely,aslongasitiskeptpowered.Relativelyinsensitivetoelectricalnoise(EMI),radiation,etc.FasterandmoreexpensivethanDRAM.DynamicRAM(DRAM)Eachcellstoresbitwithacapacitor.OnetransistorisusedforaccessValuemustberefreshedevery10-100ms.Moresensitivetodisturbances(EMI,radiation,…)thanSRAM.SlowerandcheaperthanSRAM.SRAMvsDRAMSummary
Trans. Access Needs Needs perbit time refresh? EDC? Cost ApplicationsSRAM 4or6 1X No Maybe 100x CachememoriesDRAM 1 10X Yes Yes 1X Mainmemories, framebuffersConventionalDRAMOrganizationdxwDRAM:dwtotalbitsorganizedasdsupercellsofsizewbitscolsrows01230123Internalrowbuffer16x8DRAMchipaddrdatasupercell(2,1)2bits/8bits/Memorycontroller(to/fromCPU)ReadingDRAMSupercell(2,1)Step1(a):Rowaccessstrobe(RAS)selectsrow2.Step1(b):Row2copiedfromDRAMarraytorowbuffer.ColsRowsRAS=20123012Internalrowbuffer16x8DRAMchip3addrdata2/8/MemorycontrollerReadingDRAMSupercell(2,1)Step2(a):Columnaccessstrobe(CAS)selectscolumn1.Step2(b):Supercell(2,1)copiedfrombuffertodatalines,andeventuallybacktotheCPU.ColsRows01230123Internalrowbuffer16x8DRAMchipCAS=1addrdata2/8/Memorycontrollersupercell(2,1)supercell(2,1)ToCPUMemoryModules:supercell(i,j)64MBmemorymoduleconsistingofeight8Mx8DRAMsaddr(row=i,col=j)MemorycontrollerDRAM7DRAM00317815162324326339404748555664-bitdoublewordatmainmemoryaddressAbits0-7bits8-15bits16-23bits24-31bits32-39bits40-47bits48-55bits56-6364-bitdoubleword03178151623243263394047485556EnhancedDRAMsBasicDRAMcellhasnotchangedsinceitsinventionin1966.CommercializedbyIntelin1970.DRAMcoreswithbetterinterfacelogicandfasterI/O:SynchronousDRAM(SDRAM)UsesaconventionalclocksignalinsteadofasynchronouscontrolAllowsreuseoftherowaddresses(e.g.,RAS,CAS,CAS,CAS)Doubledata-ratesynchronousDRAM(DDRSDRAM)DoubleedgeclockingsendstwobitspercycleperpinDifferenttypesdistinguishedbysizeofsmallprefetchbuffer:DDR(2bits),DDR2(4bits),DDR4(8bits)By2010,standardformostserveranddesktopsystemsIntelCorei7supportsonlyDDR3SDRAMNonvolatileMemoriesDRAMandSRAMarevolatilememoriesLoseinformationifpoweredoff.NonvolatilememoriesretainvalueevenifpoweredoffRead-onlymemory(ROM):programmedduringproductionProgrammableROM(PROM):canbeprogrammedonceEraseablePROM(EPROM):canbebulkerased(UV,X-Ray)ElectricallyeraseablePROM(EEPROM):electronicerasecapabilityFlashmemory:EEPROMswithpartial(sector)erasecapabilityWearsoutafterabout100,000erasings.UsesforNonvolatileMemoriesFirmwareprogramsstoredinaROM(BIOS,controllersfordisks,networkcards,graphicsaccelerators,securitysubsystems,…)Solidstatedisks(replacerotatingdisksinthumbdrives,smartphones,mp3players,tablets,laptops,…)DiskcachesTraditionalBusStructureConnecting
CPUandMemoryAbusisacollectionofparallelwiresthatcarryaddress,data,andcontrolsignals.Busesaretypicallysharedbymultipledevices.MainmemoryI/ObridgeBusinterfaceALURegisterfileCPUchipSystembusMemorybusMemoryReadTransaction(1)CPUplacesaddressAonthememorybus.
ALURegisterfileBusinterfaceA0AxMainmemoryI/Obridge%eaxLoadoperation:
movlA,%eaxMemoryReadTransaction(2)MainmemoryreadsAfromthememorybus,retrieveswordx,andplacesitonthebus.ALURegisterfileBusinterfacex0AxMainmemory%eaxI/ObridgeLoadoperation:
movlA,%eaxMemoryReadTransaction(3)CPUreadwordxfromthebusandcopiesitintoregister%eax.xALURegisterfileBusinterfacexMainmemory0A%eaxI/ObridgeLoadoperation:
movlA,%eaxMemoryWriteTransaction(1)CPUplacesaddressAonbus.Mainmemoryreadsitandwaitsforthecorrespondingdatawordtoarrive.yALURegisterfileBusinterfaceAMainmemory0A%eaxI/ObridgeStoreoperation:
movl%eax,AMemoryWriteTransaction(2)CPUplacesdatawordyonthebus.yALURegisterfileBusinterfaceyMainmemory0A%eaxI/ObridgeStoreoperation:
movl%eax,AMemoryWriteTransaction(3)MainmemoryreadsdatawordyfromthebusandstoresitataddressA.yALUregisterfilebusinterfaceymainmemory0A%eaxI/ObridgeStoreoperation:
movl%eax,AWhat’sInsideADiskDrive?SpindleArmActuatorPlattersElectronics(includingaprocessorandmemory!)SCSIconnectorImagecourtesyofSeagateTechnologyDiskGeometryDisksconsistofplatters,eachwithtwosurfaces.Eachsurfaceconsistsofconcentricringscalledtracks.Eachtrackconsistsofsectorsseparatedbygaps.SpindleSurfaceTracksTrackkSectorsGapsDiskGeometry(Muliple-PlatterView)Alignedtracksformacylinder.Surface0Surface1Surface2Surface3Surface4Surface5CylinderkSpindlePlatter0Platter1Platter2DiskCapacityCapacity:maximumnumberofbitsthatcanbestored.Vendorsexpresscapacityinunitsofgigabytes(GB),where
1GB=109Bytes(Lawsuitpending!Claimsdeceptiveadvertising).Capacityisdeterminedbythesetechnologyfactors:Recordingdensity(bits/in):numberofbitsthatcanbesqueezedintoa1inchsegmentofatrack.Trackdensity(tracks/in):numberoftracksthatcanbesqueezedintoa1inchradialsegment.Arealdensity(bits/in2):productofrecordingandtrackdensity.Moderndiskspartitiontracksintodisjointsubsetscalledrecordingzones
Eachtrackinazonehasthesamenumberofsectors,determinedbythecircumferenceofinnermosttrack.Eachzonehasadifferentnumberofsectors/track ComputingDiskCapacityCapacity=(#bytes/sector)x(avg.#sectors/track)x (#tracks/surface)x(#surfaces/platter)x (#platters/disk)Example:512bytes/sector300sectors/track(onaverage)20,000tracks/surface2surfaces/platter5platters/diskCapacity=512x300x20000x2x5 =30,720,000,000=30.72GBDiskOperation(Single-PlatterView)ThedisksurfacespinsatafixedrotationalrateBymovingradially,thearmcanpositiontheread/writeheadoveranytrack.Theread/writeheadisattachedtotheendofthearmandfliesoverthedisksurfaceonathincushionofair.spindlespindlespindlespindlespindleDiskOperation(Multi-PlatterView)ArmRead/writeheadsmoveinunisonfromcylindertocylinderSpindleTracksdividedintosectorsDiskStructure-topviewofsingleplatterSurfaceorganizedintotracksDiskAccessHeadinpositionaboveatrackDiskAccessRotationiscounter-clockwiseDiskAccess–ReadAbouttoreadbluesectorDiskAccess–ReadAfterBLUE
readAfterreadingbluesectorDiskAccess–ReadAfterBLUE
readRedrequestschedulednextDiskAccess–SeekAfterBLUE
readSeekforREDSeektored’strackDiskAccess–RotationalLatencyAfterBLUE
readSeekforREDRotationallatencyWaitforredsectortorotatearoundDiskAccess–ReadAfterBLUE
readSeekforREDRotationallatencyAfterREDreadCompletereadofredDiskAccess–ServiceTimeComponentsAfterBLUE
readSeekforREDRotationallatencyAfterREDreadDatatransferSeekRotationallatencyDatatransferDiskAccessTimeAveragetimetoaccesssometargetsectorapproximatedby:Taccess=Tavgseek+Tavgrotation+TavgtransferSeektime(Tavgseek)Timetopositionheadsovercylindercontainingtargetsector.TypicalTavgseekis3—9msRotationallatency(Tavgrotation)Timewaitingforfirstbitoftargetsectortopassunderr/whead.Tavgrotation=1/2x1/RPMsx60sec/1minTypicalTavgrotation=7200RPMsTransfertime(Tavgtransfer) Timetoreadthebitsinthetargetsector.Tavgtransfer=1/RPMx1/(avg#sectors/track)x60secs/1min.DiskAccessTimeExampleGiven:Rotationalrate=7,200RPMAverageseektime=9ms.Avg#sectors/track=400.Derived:Tavgrotation=1/2x(60secs/7200RPM)x1000ms/sec=4ms.Tavgtransfer=60/7200RPMx1/400secs/trackx1000ms/sec=0.02msTaccess=9ms+4ms+0.02msImportantpoints:Accesstimedominatedbyseektimeandrotationallatency.Firstbitinasectoristhemostexpensive,therestarefree.SRAMaccesstimeisabout4ns/doubleword,DRAMabout60nsDiskisabout40,000timesslowerthanSRAM,2,500timesslowerthenDRAM.LogicalDiskBlocksModerndiskspresentasimplerabstractviewofthecomplexsectorgeometry:Thesetofavailablesectorsismodeledasasequenceofb-sizedlogicalblocks(0,1,2,...)Mappingbetweenlogicalblocksandactual(physical)sectorsMaintainedbyhardware/firmwaredevicecalleddiskcontroller.Convertsrequestsforlogicalblocksinto(surface,track,sector)triples.Allowscontrollertosetasidesparecylindersforeachzone.Accountsforthedifferencein“formattedcapacity”and“maximumcapacity”.I/OBusMainmemoryI/ObridgeBusinterfaceALURegisterfileCPUchipSystembusMemorybusDiskcontrollerGraphicsadapterUSBcontrollerMouseKeyboardMonitorDiskI/ObusExpansionslotsforotherdevicessuchasnetworkadapters.ReadingaDiskSector(1)MainmemoryALURegisterfileCPUchipDiskcontrollerGraphicsadapterUSBcontrollermousekeyboardMonitorDiskI/ObusBusinterfaceCPUinitiatesadiskreadbywritingacommand,logicalblocknumber,anddestinationmemoryaddresstoaport(address)associatedwithdiskcontroller.ReadingaDiskSector(2)MainmemoryALURegisterfileCPUchipDiskcontrollerGraphicsadapterUSBcontrollerMouseKeyboardMonitorDiskI/ObusBusinterfaceDiskcontrollerreadsthesectorandperformsadirectmemoryaccess(DMA)transferintomainmemory.ReadingaDiskSector(3)MainmemoryALURegisterfileCPUchipDiskcontrollerGraphicsadapterUSBcontrollerMouseKeyboardMonitorDiskI/ObusBusinterfaceWhentheDMAtransfercompletes,thediskcontrollernotifiestheCPUwithaninterrupt(i.e.,assertsaspecial“interrupt”pinontheCPU)SolidStateDisks(SSDs)Pages:512KBto4KB,Blocks:32to128pagesDataread/writteninunitsofpages.PagecanbewrittenonlyafteritsblockhasbeenerasedAblockwearsoutafter100,000repeatedwrites.FlashtranslationlayerI/ObusPage0Page1PageP-1…Block0…Page0Page1PageP-1…BlockB-1FlashmemorySolidStateDisk(SSD)RequeststoreadandwritelogicaldiskblocksSSDPerformanceCharacteristics Whyarerandomwritessoslow?Erasingablockisslow(around1ms)WritetoapagetriggersacopyofallusefulpagesintheblockFindanusedblock(newblock)anderaseitWritethepageintothenewblockCopyotherpagesfromoldblocktothenewblockSequentialreadtput 250MB/s Sequentialwritetput 170MB/sRandomreadtput 140MB/s Randomwritetput 14MB/sRandreadaccess 30us Randomwriteaccess 300usSSDTradeoffs vsRotatingDisksAdvantagesNomovingpartsfaster,lesspower,moreruggedDisadvantagesHavethepotentialtowearoutMitigatedby“wearlevelinglogic”inflashtranslationlayerE.g.IntelX25guarantees1petabyte(1015bytes)ofrandomwritesbeforetheywearoutIn2010,about100timesmoreexpensiveperbyteApplicationsMP3players,smartphones,laptopsBeginningtoappearindesktopsandserversMetric 1980 1985 1990 1995 2000 2005 2010 2010:1980$/MB 8,000 880 100 30 1 0.1 0.06 130,000access(ns) 375 200 100 70 60 50 40 9typicalsize(MB) 0.064 0.256 4 16 64 2,000 8,000 125,000
StorageTrendsDRAMSRAMMetric 1980 1985 1990 1995 2000 2005 2010 2010:1980$/MB 500 100 8 0.30 0.01 0.005 0.0003 1,600,000access(ms) 87 75 28 10 8 4 3 29typicalsize(MB) 1 10 160 1,000 20,000 160,000 1,500,000 1,500,000DiskMetric 1980 1985 1990 1995 2000 2005 2010 2010:1980$/MB 19,200 2,900 320 256 100 75 60 320access(ns) 300 150 35 15 3 2 1.5 200CPUClockRates
1980 1990 1995 2000 2003 2005 2010 2010:1980CPU 8080 386 Pentium P-III P-4 Core2 Corei7 Clockrate(MHz)1 20 150 600 3300 2000 2500 2500Cycletime(ns) 1000 50 6 1.6 0.3 0.50 0.4 2500Cores 1 1 1 1 1 2 4 4Effectivecycle 1000 50 6 1.6 0.3 0.25 0.1 10,000time(ns)Inflectionpointincomputerhistorywhendesignershitthe“PowerWall”homework6.236.256.316.326.33
TheCPU-MemoryGapThegapwidensbetweenDRAM,disk,andCPUspeeds.DiskDRAMCPUSSDLocalitytotheRescue! ThekeytobridgingthisCPU-MemorygapisafundamentalpropertyofcomputerprogramsknownaslocalityTodayStoragetechnologiesandtrendsLocalityofreferenceCachinginthememoryhierarchyLocalityPrincipleofLocality:
ProgramstendtousedataandinstructionswithaddressesnearorequaltothosetheyhaveusedrecentlyTemporallocality:Recentlyreferenceditemsarelikely
tobereferencedagaininthenearfutureSpatiallocality:Itemswithnearbyaddressestend
tobereferencedclosetogetherintimeLocalityExampleDatareferencesReferencearrayelementsinsuccession(stride-1referencepattern).Referencevariablesumeachiteration.InstructionreferencesReferenceinstructionsinsequence.Cyclethroughlooprepeatedly.sum=0;for(i=0;i<n;i++) sum+=a[i];returnsum;SpatiallocalityTemporallocalitySpatiallocalityTemporallocalityQualitativeEstimatesofLocalityClaim:Beingabletolookatcodeandgetaqualitativesenseofitslocalityisakeyskillforaprofessionalprogrammer.Question:Doesthisfunctionhavegoodlocalitywithrespecttoarraya?intsum_array_rows(inta[M][N]){inti,j,sum=0;for(i=0;i<M;i++)for(j=0;j<N;j++)sum+=a[i][j];returnsum;}LocalityExampleQuestion:Doesthisfunctionhavegoodlocalitywithrespecttoarraya?intsum_array_cols(inta[M][N]){inti,j,sum=0;for(j=0;j<N;j++)for(i=0;i<M;i++)sum+=a[i][j];returnsum;}LocalityExampleQuestion:Canyoupermutetheloopssothatthefunctionscansthe3-darrayawithastride-1referencepattern(andthushasgoodspatiallocality)?intsum_array_3d(inta[M][N][N]){inti,j,k,sum=0;for(i=0;i<M;i++)for(j=0;j<N;j++)for(k=0;k<N;k++)sum+=a[k][i][j];returnsum;}MemoryHierarchiesSomefundamentalandenduringpropertiesofhardwareandsoftware:Faststoragetechnologiescostmoreperbyte,havelesscapacity,andrequiremorepower(heat!).ThegapbetweenCPUandmainmemoryspeediswidening.Well-writtenprogramstendtoexhibitgoodlocality.Thesefundamentalpropertiescomplementeachotherbeautifully.Theysuggestanapproachfororganizingmemoryandstoragesystemsknownasamemoryhierarchy.TodayStoragetechnologiesandtrendsLocalityofreferenceCachinginthememoryhierarchyAnExampleMemoryHierarchyRegistersL1cache(SRAM)Mainmemory(DRAM)Localsecondarystorage(localdisks)Larger,slower,cheaperperbyteRemotesecondarystorage(tapes,distributedfilesystems,Webservers)LocaldisksholdfilesretrievedfromdisksonremotenetworkserversMainmemoryholdsdiskblocksretrievedfromlocaldisksL2cache(SRAM)L1cacheholdscachelinesretrievedfromL2cacheCPUregistersholdwordsretrievedfromL1cacheL2cacheholdscachelinesretrievedfrommainmemoryL0:L1:L2:L3:L4:L5:Smaller,faster,costlierperbyteCachesCache:
Asmaller,fasterstoragedevicethatactsasastagingareaforasubsetofthedatainalarger,slowerdevice.Fundamentalideaofamemoryhierarchy:Foreachk,thefaster,smallerdeviceatlevelkservesasacacheforthelarger,slowerdeviceatlevelk+1.Whydomemoryhierarchieswork?Becauseoflocality,programstendtoaccessthedataatlevelkmoreoftenthantheyaccessthedataatlevelk+1.Thus,thestorageatlevelk+1canbeslower,andthuslargerandcheaperperbit.BigIdea:Thememoryhierarchycreatesalargepoolofstoragethatcostsasmuchasthecheapstoragenearthebottom,butthatservesdatatoprogramsattherateofthefaststoragenearthetop.GeneralCacheConcepts012345678910111213141589143CacheMemoryLarger,slower,cheapermemoryviewedaspartitionedinto“blocks”Dataiscopiedinblock-sizedtransferunitsSmaller,faster,moreexpensivememorycachesasubsetoftheblocks444101010GeneralCacheConcepts:Hit012345678910111213141589143Cache
溫馨提示
- 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
- 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯系上傳者。文件的所有權益歸上傳用戶所有。
- 3. 本站RAR壓縮包中若帶圖紙,網頁內容里面會有圖紙預覽,若沒有圖紙預覽就沒有圖紙。
- 4. 未經權益所有人同意不得將文件中的內容挪作商業或盈利用途。
- 5. 人人文庫網僅提供信息存儲空間,僅對用戶上傳內容的表現方式做保護處理,對用戶上傳分享的文檔內容本身不做任何修改或編輯,并不能對任何下載內容負責。
- 6. 下載文件中如有侵權或不適當內容,請與我們聯系,我們立即糾正。
- 7. 本站不保證下載資源的準確性、安全性和完整性, 同時也不承擔用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。
最新文檔
- 星級酒店集團管理辦法
- 血液中心全面質量管理體系建設及程序文件解讀
- 華為后備梯隊管理辦法
- 公廁建設后續管理辦法
- 水輪機增效優化技術-洞察及研究
- 傳播學領域的爭議、轉向及新聞傳播范疇探討
- 公益健康基金管理辦法
- 公文書寫中的字體規范與設計原則
- 安全生產八不傷害心得體會
- 世界文化遺產申報
- GB/T 18884.2-2015家用廚房設備第2部分:通用技術要求
- GB/T 12239-2008工業閥門金屬隔膜閥
- 軍標類型整理文檔
- 山東中醫藥大學2020-2021學年內科護理學試題及答案1
- DB32T 4174-2021 城市居住區和單位綠化標準
- 基本原理與性能特點多自由度電磁軸承課件
- Q∕SY 1836-2015 鍋爐 加熱爐燃油(氣)燃燒器及安全聯鎖保護裝置檢測規范
- 北京輸變電工程標準工藝應用圖冊(圖文并茂)
- 儀器使用記錄表
- 石河子大學化學化工學院學院綜合測評方案-理學院
- 《汽車電工電子技術》全套教案(完整版)
評論
0/150
提交評論