GPGPU computing (General-Purpose GPU computing) is certainly a game changer. The potential speedup for SIMD-rich code is a factor of 10 to 100 for current high-end GPUs versus CPUs. Just as important as the speedup is the fact that such a GPU consumes merely one to two times as much electrical power as the CPU. GPGPUs are basically the reincarnation of the array and vector coprocessors of the '80s and '90s, but this time the broad consumer graphics market has driven their prices down to levels on par with those of CPUs. The future of HPC appears to be clusters of GPUs or similar massively multicore processors. Both NVIDIA and AMD are hurrying to deliver double-precision arithmetic on their GPUs for the HPC market (the consumer graphics market is generally satisfied with single precision).
While hardware price is not a barrier to entry (you can put together a four-teraflop machine in a single computer case for less than $3000), the software framework currently is. NVIDIA has certainly been leading the way with CUDA: Wolfram Research has already demonstrated a version of Mathematica that achieves speedups in that range using CUDA and an NVIDIA card. AMD has its Brook+ compiler. Reportedly, however, neither of these is easy for code developers to master. The OpenCL specification (initially drawn up by Apple) released ten days ago by the Khronos Group is a major step toward an open standard for vendor-agnostic GPGPU programming. However, it still appears too close to the metal for many developers to embrace, as the sketch below suggests. Michael Wolfe, a compiler engineer at the Portland Group, discusses in the November issue of Linux Journal the feasibility of C/C++ and Fortran compilers that would automate most of the memory and process management a developer must currently code by hand in CUDA or Brook+ (and presumably in OpenCL as well), generating parallelized code from a few OpenMP-like compiler directives. Adding to the complexity of the parallelization effort, large-scale HPC will still need MPI to distribute computations among the nodes of a cluster, while using OpenCL or some other approach to accelerate each node's process via its GPU(s).
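To make the "close to the metal" complaint concrete, here is a minimal sketch, written against the OpenCL 1.0 API as specified, of the host-side code needed just to run a one-line SAXPY kernel on a GPU. It is illustrative only (error checking and resource cleanup omitted; the kernel and all names are my own, not from any vendor sample):

```c
#include <CL/cl.h>
#include <stdio.h>

/* The device kernel itself is one line of arithmetic; OpenCL
   compiles this string at run time on the target device. */
const char *src =
    "__kernel void saxpy(float a, __global const float *x,\n"
    "                    __global float *y) {\n"
    "    int i = get_global_id(0);\n"
    "    y[i] = a * x[i] + y[i];\n"
    "}\n";

int main(void) {
    enum { N = 1024 };
    float a = 2.0f, x[N], y[N];
    for (int i = 0; i < N; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    /* Every step below is explicit: pick a platform and device,
       create a context and command queue, compile the kernel,
       allocate device buffers, copy data in, launch, copy back. */
    cl_platform_id plat;  clGetPlatformIDs(1, &plat, NULL);
    cl_device_id dev;
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);
    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "saxpy", NULL);

    cl_mem dx = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                               sizeof x, x, NULL);
    cl_mem dy = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                               sizeof y, y, NULL);

    clSetKernelArg(k, 0, sizeof a, &a);
    clSetKernelArg(k, 1, sizeof dx, &dx);
    clSetKernelArg(k, 2, sizeof dy, &dy);

    size_t global = N;
    clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, dy, CL_TRUE, 0, sizeof y, y, 0, NULL, NULL);

    printf("y[0] = %f\n", y[0]);  /* expect 4.0 */
    return 0;
}
```

Every step (device discovery, context and queue creation, run-time kernel compilation, buffer allocation, data transfer, launch) is the developer's responsibility, and this is exactly the bookkeeping that Wolfe argues a directive-driven compiler could generate automatically.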
So the current uncertainty for code developers is whether to jump in and radically rewrite their codes to take advantage of OpenCL (which is basically C plus a large set of GPU-management functions), or to wait for C++ and Fortran compilers that might require only OpenMP-style changes to their codes. Another uncertainty is what Intel will bring to the table. Intel will certainly not lie back and let NVIDIA and AMD steal the HPC market from it; however, its likely response, the x86-based Larrabee GPU, looks as though it will arrive a little late to market. Intel is, at least, maintaining a presence in the OpenCL initiative.
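For contrast, here is what the directive-driven style looks like for the same SAXPY loop. The pragma below is standard OpenMP and today only parallelizes across CPU threads; the hope described above is that a similar one-line annotation could someday direct the compiler to emit all of the GPU setup, data movement, and kernel-launch code sketched earlier:

```c
/* Standard OpenMP: a single directive asks the compiler to
   parallelize the loop across CPU threads. A GPU-targeting
   analogue of this directive is, for now, hypothetical. */
void saxpy(int n, float a, const float *x, float *y) {
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Built with an OpenMP-capable compiler (e.g. gcc -fopenmp), that single pragma replaces the roughly forty lines of host-management code in the OpenCL sketch, which is why the directive approach is so attractive to developers.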
At any rate, the ascent of cheap and powerful SIMD/MIMD coprocessors marks a watershed for HPC codes. I have a feeling that codes whose developers are not nimble enough to take advantage of these massively parallel, power-efficient coprocessors will disappear from the market as their more nimble rivals produce turnaround times that leave them in the dust. I expect that many unmaintainable in-house legacy codes will disappear when they prove too difficult to port successfully to GPUs, and as newer, better-written codes outperform them by ridiculous margins.