HPC High-Performance Computing Best Practices
Technological Innovation, Transforming the Future
July 4, 2018

Contents
- Introduction to HPC
- ANSYS HPC software configuration
- ANSYS HPC hardware selection
- Application cases

Introduction to HPC
- What is HPC?
- Benefits of HPC
- How HPC computing works
- ANSYS HPC speedups

What is high-performance computing (HPC)?
High-performance computing generally refers to the practice of pooling computing resources to deliver far more computing power than a single workstation can provide, in order to solve scientific and engineering problems.

Benefits of HPC
- Higher fidelity
- More complex assemblies
- More nonlinearities taken into account
- Validation of more design scenarios
- More optimization analyses

How HPC computing works
- ANSYS HPC decomposes a large computational problem into sub-problems that can be computed in parallel. It distributes them across multiple compute cores (CPUs or GPUs), which makes full use of the computing resources and speeds up the computation.
- ANSYS HPC Parametric Pack distributes the individual parameter sets of a parametric model across multiple compute nodes. It solves the models for the different parameter settings at the same time, which speeds up the study. It can be used together with ANSYS HPC.

Speedup: HPC
Solver Rating is computed by dividing the number of seconds in a day (86,400 seconds) by the number of seconds required to run the benchmark. A higher rating means faster performance.

Speedup: HPC Parametric Pack
[Chart: computation time for a parametric model using different numbers of HPC Parametric Packs (HPC PP). Configurations: 1 job; 4 jobs with 1 HPC PP; 8 jobs with 2 HPC PP; 16 jobs with 3 HPC PP. Series: calculation and geometry update.]

HPC software configuration

Software configuration options
- HPC (per process)
- HPC Pack: an HPC product that rewards volume parallel processing for high-fidelity simulations.
  - Each simulation consumes one or more Packs.
  - The number of parallel-enabled cores increases quickly with added Packs. For 1 to 7 Packs per simulation it is 8, 32, 128, 512, 2,048, 8,192 and 32,768 cores.
  - At Release 19.0 and higher, the first Pack enables 12 cores instead of 8.
- HPC Workgroup: an HPC product that rewards volume parallel processing for increased simulation throughput.
  - It is shared among engineers at a single location or around the world.
  - It provides 16 to 32,768 parallel cores, shared across any number of simulations on a single server.
- HPC Parametric Pack: enables simultaneous execution of multiple design points while consuming just one set of licenses.
- The result is a single HPC solution for FEA, CFD and FSI at any level of fidelity.

ANSYS 19.0 new features
- More products now use ANSYS HPC. Standalone HPC licenses, HPC Packs and HPC Workgroups are more flexible and work across physics with all ANSYS Mechanical, Fluids and Electronics products*.
- 4 built-in HPC cores now across all physics: 4 built-in HPC cores are now included in Mechanical, Fluids and Electronics products, including ANSYS AIM and ANSYS Chemkin Enterprise.
- HPC Packs are now additive to the 4 built-in HPC cores. For example, 1 HPC Pack licenses 8 + 4 = 12 total cores, and 2 HPC Packs license 32 + 4 = 36 total cores.
* Impacted products: ANSYS Mechanical Pro, Premium and Enterprise; ANSYS CFD Premium and Enterprise; ANSYS Mechanical CFD; ANSYS HFSS; ANSYS AIM; ANSYS Q3D Extractor; ANSYS Maxwell; ANSYS Icepak; ANSYS Mechanical CFD Maxwell 3D; ANSYS Chemkin-Pro and Enterprise; ANSYS Mechanical Maxwell 3D; ANSYS SIwave.
Note: the R19.0 license manager is required. For ANSYS Mechanical and Fluids products, the changes are backward compatible. For ANSYS Electronics products, they are compatible with version 19.0 and later.
Note: built-in HPC cores are linked to a solver seat and cannot be shared with other solver seats!
Note: single, standalone HPC licenses are not additive to the Packs.

ANSYS HPC Parametric Pack
An HPC license for running parametric FEA or CFD simulations on multiple CPU cores simultaneously and more cost-effectively.
Key benefits
- Design points are executed automatically and simultaneously while consuming just one set of application licenses.
- It is scalable: the number of simultaneous design points increases quickly with added packs. One to five HPC Parametric Pack licenses enable 4, 8, 16, 32 and 64 design points.
- It amplifies the complete workflow, because design points can include execution of multiple applications (pre-processing, meshing, solve, HPC, post-processing).

HPC Parametric Packs drastically shorten design time
[Figure: a sequential series of design points (dp1 to dp4) with unused cores; label "94% Reduced Time to Innovation".]
HPC Parametric Packs amplify both solver licenses and HPC licenses. This lets you drastically reduce time to innovation without the cost of additional solver or HPC licenses.
[Figure labels: "One solver key without HPC"; "Four solver keys OR one solver key and one HPC Parametric Pack"; "+ 4 HPC keys".]

GPU acceleration
Electronics products: 1 GPU is unlocked by every 8 HPC tasks.
- 4 HPC licenses enable 1 GPU through the available 8 HPC tasks.
- 1 HPC Pack enables up to 12 CPU cores + 1 GPU through the available 12 HPC tasks.
- 2 HPC Packs enable up to 36 CPU cores + 4 GPUs through the available 36 HPC tasks.
Fluids / Structural products: 1 GPU requires 1 HPC task, as long as the number of GPUs does not exceed the number of CPU cores. Examples:
- 2 HPC licenses enable up to 3 CPU cores + 3 GPUs through the available 6 HPC tasks.
- 1 HPC Pack enables up to 6 CPU cores + 6 GPUs through the available 12 HPC tasks.
- 2 HPC Packs enable up to 18 CPU cores + 18 GPUs through the available 36 HPC tasks.
GPU acceleration can be enabled through all ANSYS HPC product licenses: ANSYS HPC, ANSYS HPC Pack and ANSYS HPC Workgroup.

Which configuration should I choose?
- The cost per HPC license decreases as more are purchased, either as HPC Packs or as HPC Workgroups.
- ANSYS HPC and ANSYS HPC Workgroup give flexible use of a pool of licenses.
- ANSYS HPC Pack gives "quick" scale-up but is more restrictive in how users can use it.
- This greater flexibility is why the HPC Workgroup options cost more than the HPC Packs.
- HPC Parametric Pack enables more cost-effective licensing for design exploration and optimization.

Summary: software configuration
- Multiple licensing options fit different requirements: HPC Packs for quick scale-up, HPC Workgroup for flexibility.
- GPUs are treated the same as cores in the licensing model.
- As you scale up, license cost per core decreases, so per-core pricing becomes less of an issue. For example:
  - Running on 2,000 cores instead of 20 cores costs 1.5X, not 100X.
  - Filling a 1,024-core cluster instead of a 128-core cluster with 32-core jobs cuts the price per job in half!
  - Enabling 64 simultaneous design points instead of 4 costs 3X, not 16X.

Which hardware configuration should I choose?
- HDD vs. SSD?
- SMP vs. DMP?
- Interconnects?
- Clusters?
- CPUs?
- GPUs?

HPC hardware terminology
[Diagram: Machine 1 (or Node 1) through Machine N (or Node N), each containing a GPU, Processor 1 (or Socket 1) and Processor 2 (or Socket 2). The machines are linked by an interconnect (GigE or InfiniBand).]

Shared-memory parallel
Single Machine Parallel (SMP) systems share a single global memory image. That memory may be physically distributed across multiple cores, but it is globally addressable. OpenMP is the industry standard.

Distributed-memory parallel
Distributed memory parallel processing (DMP) assumes that the physical memory for each process is separate from that of all other processes. Parallel processing on such a system therefore requires some form of message-passing software to exchange data between the cores. MPI (Message Passing Interface) is the industry standard for this.

Understanding the impact of clock speed: ANSYS Mechanical
[Chart: effect of increased core operating frequencies on the DMP benchmarks running on 12 cores.]
- The influence is highest for the sparse solver benchmarks.
- Using a higher clock speed is always helpful to realize productivity gains.

Understanding the impact of memory bandwidth: is 24 cores equal to 24 cores?
- 3 x (2 x 4) = 24 cores: three machines, each with two Xeon X5570 processors
- 2 x (2 x 6) = 24 cores: two machines, each with two Xeon X5670 processors
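One rough way to see why these two 24-core configurations are not equivalent is to count memory channels per core. The minimal sketch below does that. The helper `channels_per_core` is hypothetical. The value of 3 memory channels per socket for both the X5570 and the X5670 is an assumption made for this illustration. Channel count is also only a proxy for sustainable memory bandwidth, which depends on memory speed and DIMM population as well.

```python
from fractions import Fraction


def channels_per_core(nodes: int, sockets_per_node: int,
                      cores_per_socket: int, channels_per_socket: int) -> tuple[int, Fraction]:
    """Hypothetical helper: return (total cores, memory channels per core)
    for a cluster of identical nodes."""
    total_cores = nodes * sockets_per_node * cores_per_socket
    total_channels = nodes * sockets_per_node * channels_per_socket
    return total_cores, Fraction(total_channels, total_cores)


# ASSUMPTION: both the X5570 and the X5670 provide 3 memory channels per socket.
CHANNELS_PER_SOCKET = 3

x5570_cluster = channels_per_core(3, 2, 4, CHANNELS_PER_SOCKET)  # 3 x (2 x 4)
x5670_cluster = channels_per_core(2, 2, 6, CHANNELS_PER_SOCKET)  # 2 x (2 x 6)

for label, (cores, ratio) in (("3 x (2 x 4), X5570", x5570_cluster),
                              ("2 x (2 x 6), X5670", x5670_cluster)):
    print(f"{label}: {cores} cores, {float(ratio):.2f} memory channels per core")
```

Under these assumptions, the 3 x (2 x 4) layout has 0.75 channels per core versus 0.50 for 2 x (2 x 6). That is 50% more memory channels per core for the same core count.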

Consider memory per core!

Distributed-memory parallel outperforms shared-memory parallel
[Chart: SMP vs. DMP speedup factor vs. number of cores for ANSYS Mechanical.]

GPU acceleration: ANSYS application examples

Understanding the impact of interconnect speed
- Fast interconnects are needed to feed fast processors.
- Each interconnect has two main characteristics: latency and bandwidth.
- Distributed ANSYS is highly bandwidth bound.

    +- D I S T R I B U T E D   A N S Y S   S T A T I S T I C S -+
    Release: 14.5          Build: UP20120802      Platform: LINUX x64
    Date Run: 08/09/2012   Time: 23:07
    Processor Model: Intel(R) Xeon(R) CPU E5-2690 0 2.90GHz
    Total number of cores available    : 32
    Number of physical cores available : 32
    Number of cores requested          : 4 (Distributed Memory Parallel)
    MPI Type: INTELMPI

    Core   Machine Name   Working Directory
      0    hpclnxsmc00    /data1/ansyswork
      1    hpclnxsmc00    /data1/ansyswork
      2    hpclnxsmc01    /data1/ansyswork
      3    hpclnxsmc01    /data1/ansyswork

    Latency time from master to core 1 = 1.171 microseconds
    Latency time from master to core 2 = 2.251 microseconds
    Latency time from master to core 3 = 2.225 microseconds

    Communication speed from master to core 1 = 7934.49 MB/sec  Same machine
    Communication speed from master to core 2 = 3011.09 MB/sec  QDR Infiniband
    Communication speed from master to core 3 = 3235.00 MB/sec  QDR Infiniband

Understanding the impact of interconnect speed: ANSYS Mechanical
For ANSYS Mechanical, GigE does not scale to more than 1 node!
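The statistics above report both interconnect characteristics that the slide names: latency and bandwidth. A common first-order way to combine them is the latency-bandwidth ("alpha-beta") model. It is a standard textbook approximation, not something the ANSYS output computes. It estimates a point-to-point transfer time as latency plus message size divided by bandwidth. The sketch below plugs in the measured values from the log. The message sizes are arbitrary illustrations, and reading "MB" as 10^6 bytes is an assumption about the log's units.

```python
# First-order latency-bandwidth ("alpha-beta") model: t = latency + size / bandwidth.
# Latency (s) and bandwidth (bytes/s) are copied from the Distributed ANSYS statistics;
# ASSUMPTION: the log's "MB" means 10**6 bytes.
LINKS = {
    "core 1 (same machine)": (1.171e-6, 7934.49e6),
    "core 2 (QDR InfiniBand)": (2.251e-6, 3011.09e6),
    "core 3 (QDR InfiniBand)": (2.225e-6, 3235.00e6),
}


def transfer_time(size_bytes: float, latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """Estimated seconds to send one message of size_bytes over a link."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


for size in (1_000, 1_000_000, 100_000_000):  # illustrative sizes: 1 KB, 1 MB, 100 MB
    for link, (lat, bw) in LINKS.items():
        t = transfer_time(size, lat, bw)
        share = lat / t  # fraction of the estimated time spent on latency
        print(f"{size:>11,} B to {link}: {t * 1e6:10.1f} us (latency share {share:.0%})")
```

For 1 KB messages, latency accounts for most of the estimated time. For 100 MB messages, bandwidth dominates almost completely. A workload that moves large blocks of data between cores is therefore limited mainly by bandwidth.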

Understanding the impact of interconnect speed: ANSYS Mechanical
V13sp-5 model: turbine geometry, 2,100 K DOF, SOLID187 FEs; static, nonlinear, one iteration, direct sparse solver; Linux cluster (8 cores per node).
[Chart: interconnect performance, rating (runs/day) at 8, 16, 32, 64 and 128 cores for Gigabit Ethernet and DDR InfiniBand.]
Using faster interconnects can be helpful to realize productivity gains, particularly at higher core/node counts.

Understanding the impact of storage speed: ANSYS Mechanical
- Fast hard drives are needed to feed fast processors, so check the bandwidth specs.
- ANSYS Mechanical can be highly I/O bandwidth bound: the sparse solver in out-of-core memory mode does lots of I/O.
- Distributed ANSYS can be highly I/O latency bound: the seek time to read/write each set of files causes overhead.
- Consider SSDs: high bandwidth and extremely low seek times.
- Consider RAID configurations: RAID 0 for speed; RAID 1 or 5 for redundancy; RAID 10 for speed and redundancy.

Understanding the impact of storage speed: ANSYS Mechanical 18.1
- Place the working directory on the HP Z Turbo Drive G2 and run the BMT models for the CG solver on more than 16 cores: the job speeds up by 1.4 times.
- Place the working directory on the HP Z Turbo Drive G2 and run the BMT models for the SPARSE solver on more than 16 cores: the job speeds up by 1.8 to 2.6 times.
[Charts (higher is better): speedups of 1.4x, 1.4x and 1.4x for CG; 2.6x, 2.1x and 1.8x for SPARSE.]
Hardware configuration: HP Z840 workstation with dual E5-2699 v4 (2.2 GHz), 128 GB of 2400 MHz memory. Optional storage: Micron SATA SSD (no RAID) or HP Z Turbo Drive G2 512 GB (no RAID).

Understanding the impact of storage speed: ANSYS Mechanical
[Chart: rating vs. number of cores.]
Using faster disks can be helpful to realize productivity gains, particularly at higher core/node counts.

- Clock speed
- Memory bandwidth
- Interconnect speed
- GPU acceleration
- Storage speed: I/O is very important for the Mechanical solver; RAID 0 mandatory…
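The RAID guidance above (RAID 0 for speed, RAID 1 or 5 for redundancy, RAID 10 for both) comes down to how many drives hold data and how many can fail. The sketch below tabulates usable capacity and guaranteed drive-failure tolerance for n identical drives, using the standard definitions of each level. The helper `raid_summary` is hypothetical and meant only for comparison. It ignores controller overhead, hot spares and any performance modeling.

```python
def raid_summary(level: str, drives: int, drive_tb: float) -> tuple[float, int]:
    """Hypothetical helper: return (usable TB, drive failures always survived)
    for `drives` identical drives of `drive_tb` TB each."""
    if level == "0":  # striping only: full capacity, no redundancy
        if drives < 2:
            raise ValueError("RAID 0 needs at least 2 drives")
        return drives * drive_tb, 0
    if level == "1":  # every drive holds a full mirror copy
        if drives < 2:
            raise ValueError("RAID 1 needs at least 2 drives")
        return drive_tb, drives - 1
    if level == "5":  # one drive's worth of distributed parity
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drives - 1) * drive_tb, 1
    if level == "10":  # striping across mirrored pairs
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least 4")
        # Only 1 failure is guaranteed; more are survivable if they hit different pairs.
        return drives // 2 * drive_tb, 1
    raise ValueError(f"unsupported RAID level: {level}")


for level in ("0", "1", "5", "10"):
    capacity, tolerance = raid_summary(level, drives=4, drive_tb=1.0)
    print(f"RAID {level:>2} with 4 x 1 TB: {capacity:.1f} TB usable, "
          f"survives {tolerance} drive failure(s)")
```

RAID 0 trades all redundancy for full capacity and striped throughput, which suits a scratch working directory for solver I/O. RAID 10 keeps striping but gives up half the raw capacity to mirroring.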

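Several charts in this deck report performance as a Solver Rating: the number of seconds in a day (86,400) divided by the benchmark's elapsed seconds, i.e. runs per day. The deck also quotes speedups such as the 1.4x to 2.6x storage gains. The sketch below computes both. The `parallel_efficiency` helper is a standard definition added for illustration. The elapsed times in the example are made-up numbers, not results from the deck.

```python
SECONDS_PER_DAY = 86_400


def solver_rating(elapsed_s: float) -> float:
    """Solver Rating = runs per day = 86,400 / elapsed seconds (higher is faster)."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return SECONDS_PER_DAY / elapsed_s


def speedup(baseline_s: float, improved_s: float) -> float:
    """Ratio of ratings, which equals baseline time / improved time."""
    return solver_rating(improved_s) / solver_rating(baseline_s)


def parallel_efficiency(t_1_core_s: float, t_n_cores_s: float, n_cores: int) -> float:
    """Speedup over one core divided by the core count (1.0 = ideal scaling)."""
    return speedup(t_1_core_s, t_n_cores_s) / n_cores


# Hypothetical elapsed times, for illustration only.
print(solver_rating(3_600))                   # 24.0 runs/day for a one-hour run
print(speedup(3_600, 1_440))                  # 2.5
print(parallel_efficiency(3_600, 300, 16))    # 0.75
```

Because the rating is the inverse of elapsed time, the ratio of two ratings is the same as the inverse ratio of their run times. Speedups can therefore be read directly off rating charts.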