This week’s Spotlight is on Yifeng Cui, director of the High Performance GeoComputing Lab (HPGeoC) at the San Diego Supercomputer Center and adjunct professor in the Department of Geological Sciences at San Diego State University.
HPGeoC was recently named a winner of the HPC Innovation Excellence Award by IDC for developing a highly scalable computer code that promises to dramatically cut research times and energy costs in simulating seismic hazards. The video below shows some of the earthquake simulations produced by HPGeoC.
Read Yifeng’s full Spotlight here. Here is an excerpt.
NVIDIA: How are GPUs helping you solve key challenges in your field?
Yifeng: Our team developed a scalable GPU code based on AWP-ODC, an Anelastic Wave Propagation code originally developed by Prof. Kim Olsen of SDSU. The code has two versions that give equivalent results. The first version efficiently calculates extreme-scale ground motions at many sites.
The second efficiently calculates ground motions from many single-site ruptures, a capacity-computing workload. Optimizing the code yields a speedup of around 110x over the CPU in the key strain tensor calculations that are critical to probabilistic seismic hazard analysis.
NVIDIA: What specific approaches do you use to apply GPU computing to your work?
Yifeng: This is a memory-bound stencil code whose compute performance is limited by its low computational intensity and poor data locality. We rewrote the Fortran code in C to maximize throughput and memory locality. Good scalability was achieved through a two-layer 2D decomposition and an algorithm-level communication reduction scheme, which eliminates the stress data exchange otherwise needed in every iteration.
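To make the idea of a 2D decomposition concrete, here is a minimal Python sketch that splits a horizontal grid across a process grid and lists each rank's neighbors for halo exchange. It is an illustration only and is not taken from AWP-ODC: the function names, the row-major rank ordering, and the single decomposition layer shown are all assumptions, and the interview does not describe how the two layers are arranged.

```python
# Illustrative sketch only; names and rank ordering are hypothetical,
# not the AWP-ODC implementation.

def split_extent(n, parts, index):
    """Return the [start, stop) cell range owned by block `index`
    when n cells are divided into `parts` nearly equal blocks."""
    base, extra = divmod(n, parts)
    start = index * base + min(index, extra)
    stop = start + base + (1 if index < extra else 0)
    return start, stop


def decompose_2d(nx, ny, px, py, rank):
    """Map a rank to its subdomain and neighbors in a px-by-py process grid.

    Ranks are assumed to be laid out row-major: rank = iy * px + ix.
    A neighbor of None means the subdomain touches the domain boundary.
    """
    if not 0 <= rank < px * py:
        raise ValueError("rank outside process grid")
    ix, iy = rank % px, rank // px
    x_range = split_extent(nx, px, ix)
    y_range = split_extent(ny, py, iy)
    neighbors = {
        "west": rank - 1 if ix > 0 else None,
        "east": rank + 1 if ix < px - 1 else None,
        "south": rank - px if iy > 0 else None,
        "north": rank + px if iy < py - 1 else None,
    }
    return x_range, y_range, neighbors


if __name__ == "__main__":
    # A 10 x 8 grid split over a 3 x 2 process grid.
    for rank in range(6):
        print(rank, decompose_2d(10, 8, 3, 2, rank))
```

Each rank exchanges boundary data only with the neighbors listed, so communication grows with the subdomain perimeter while computation grows with its area.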
CUDA asynchronous memory copy operations help overlap CPU/PCIe data transfers with GPU computation. A two-layer scalable IO technique was developed to efficiently handle many terabytes of dynamic source and media inputs, as well as 3D volume velocity outputs. We are also tuning co-scheduling to allow full utilization of both CPUs and GPUs in hybrid heterogeneous systems. We are grateful for NVIDIA’s support during our implementation process.
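The benefit of overlapping transfers with computation can be estimated with a simple timing model. The Python sketch below is a back-of-the-envelope illustration, not a measurement of AWP-ODC: it assumes the work is split into equal chunks, that each chunk needs one transfer followed by one kernel, and that the transfer of the next chunk can run while the current chunk computes. The function names and the example timings are hypothetical.

```python
# Illustrative timing model only; timings are made up for the example.

def serial_time(n_chunks, t_copy, t_compute):
    """Total time when every transfer finishes before its kernel starts
    and the next transfer waits for that kernel to finish."""
    return n_chunks * (t_copy + t_compute)


def overlapped_time(n_chunks, t_copy, t_compute):
    """Total time for a two-stage pipeline in which the transfer of
    chunk i+1 runs concurrently with the computation of chunk i."""
    if n_chunks == 0:
        return 0.0
    # The first copy and the last kernel cannot be hidden; in between,
    # the slower of the two stages sets the pace.
    return t_copy + (n_chunks - 1) * max(t_copy, t_compute) + t_compute


if __name__ == "__main__":
    n, t_copy, t_compute = 8, 1.0, 3.0  # arbitrary time units
    print("serial:    ", serial_time(n, t_copy, t_compute))
    print("overlapped:", overlapped_time(n, t_copy, t_compute))
```

When the kernel takes longer than the transfer, as in this example, almost all of the transfer time is hidden behind computation; when the transfer is slower, the PCIe link becomes the bottleneck instead.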
NVIDIA: Tell us about your use of the Titan system at Oak Ridge National Lab.
Yifeng: We simulated realistic 0-10 hertz ground motions on a mesh comprising 443 billion elements in a calculation that includes both small-scale fault geometry and media complexity at a model size far beyond what has been done previously. This was done in collaboration with Profs. Olsen and Steve Day of SDSU; Prof. Thomas Jordan of USC, the Director of SCEC; and others at SCEC. The validation simulation on Titan demonstrated ideal scalability up to 8K Titan nodes, and sustained 2.3 petaflop/s on 16K Titan nodes.
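For a sense of scale, the sustained rate quoted above can be converted to a rough per-node figure. The short Python sketch below does only that arithmetic. Whether "16K" means 16,000 or 16,384 nodes is not stated in the interview, so both readings are shown as assumptions, and the function name is hypothetical.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the interview.

SUSTAINED_FLOPS = 2.3e15  # 2.3 petaflop/s, as quoted


def per_node_gflops(total_flops, nodes):
    """Average sustained rate per node, in gigaflop/s."""
    return total_flops / nodes / 1e9


if __name__ == "__main__":
    # "16K" is ambiguous in the source, so both readings are evaluated.
    for nodes in (16000, 16384):
        rate = per_node_gflops(SUSTAINED_FLOPS, nodes)
        print(f"{nodes} nodes: {rate:.1f} gigaflop/s per node")
```

Under either reading, the average works out to roughly 140 gigaflop/s sustained per node.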
Read the full interview. Read more CUDA Spotlights.
Parallel Forall is the NVIDIA Parallel Programming blog. If you enjoyed this post, subscribe to the Parallel Forall RSS feed! You may contact us via the contact form.