In 1994, two NASA employees connected 16 commodity workstations over a standard Ethernet LAN and installed open-source message-passing software that allowed their number-crunching scientific application to run on the whole “cluster” of machines as if it were a single entity. Their do-it-yourself, MacGyver-like efforts were motivated by frustration with the pricing, availability and maturity of the then-existing massively parallel processors (e.g., nCube, Thinking Machines, Convex and Cray). They named their machine Beowulf. Thomas Sterling and Donald Becker may not have known it at the time, but their ungainly machine would usher in an era of commodity parallel computing that persists today, and 1994 would prove to be a pivotal year in the evolution of High Performance Computing (HPC).
I believe that 2016 will be another such pivotal year. This year will see the launch of two powerful processors from two different companies with two competing visions for HPC. The future of HPC is massive on-chip parallelism: in essence, taking those commodity clusters born in 1994 and putting them on a single chip. Intel and NVIDIA have been rivals in this battle over HPC vision for almost a decade. With its modest introduction in 2007 of the Tesla compute family of GPUs and CUDA, a programming platform and compiler that made it much easier to do general programming on its products, NVIDIA introduced the HPC community to general-purpose GPU computing (GPGPU). HPC computations are, in general, bound either by the speed of calculation (FLOPS) or by the speed of data movement from memory (bandwidth). Compared chip-to-chip against Intel’s Xeon line of CPUs, GPUs have significantly better capability on both FLOPS and bandwidth. This performance advantage has led to the brisk adoption of CUDA and GPUs over the past decade and significant penetration of HPC markets in government, academia and industry. In addition, the past three years have seen a strong resurgence of interest in machine intelligence, deep learning and AI that has largely been enabled by compact, high-performance NVIDIA GPUs and the massive training sets now available on the internet. This year NVIDIA introduces Pascal, the latest offering in the Tesla compute line.
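The distinction between FLOPS-bound and bandwidth-bound work can be made concrete with a simple roofline-style estimate. The sketch below is illustrative only: the peak FLOPS and bandwidth figures are made-up placeholders, not specifications of any Intel or NVIDIA chip, and the function names and kernel intensities are hypothetical.

```python
def attainable_gflops(peak_gflops, peak_bandwidth_gbs, flops_per_byte):
    """Roofline estimate: performance is capped by compute or by memory traffic."""
    return min(peak_gflops, peak_bandwidth_gbs * flops_per_byte)


def limiting_factor(peak_gflops, peak_bandwidth_gbs, flops_per_byte):
    """Return which resource bounds a kernel with the given arithmetic intensity."""
    ridge = peak_gflops / peak_bandwidth_gbs  # intensity where the two limits meet
    return "bandwidth" if flops_per_byte < ridge else "FLOPS"


# Hypothetical chip: 1000 GFLOP/s peak, 100 GB/s memory bandwidth (placeholders).
PEAK_GFLOPS = 1000.0
PEAK_BW_GBS = 100.0

# Hypothetical kernels with assumed arithmetic intensities (FLOPs per byte moved).
kernels = {"vector add": 0.125, "dense matrix multiply": 50.0}

for name, intensity in kernels.items():
    bound = limiting_factor(PEAK_GFLOPS, PEAK_BW_GBS, intensity)
    rate = attainable_gflops(PEAK_GFLOPS, PEAK_BW_GBS, intensity)
    print(f"{name}: bound by {bound}, at most {rate:.1f} GFLOP/s")
```

Under these assumed numbers, extra memory bandwidth helps the first kernel and extra FLOPS helps the second, which is why both figures matter in a chip-to-chip comparison.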
Intel has not stood still, and the success of GPUs in HPC and the emerging AI market has not escaped its attention. The company has presented a consistent vision of a many-core line of x86-compatible chips stretching back to the mid-2000s and the Larrabee project. Larrabee was to be an x86-programmable discrete graphics chip, in other words a chip to compete head-to-head with NVIDIA and ATI (now AMD) in their core business. Product delays and disappointing performance led to the cancellation of Larrabee in May 2010 and its morphing into Knights Ferry, the first product in Intel’s many-core HPC chip family, Xeon Phi. Perhaps recognizing the early success of NVIDIA accelerators in HPC, or perhaps as part of a broader strategic vision, Intel positioned itself and its newly branded Phi line to compete for this emerging accelerator market. As the HPC incumbent, Intel had, and still has, significant advantages, including a huge installed customer base, x86 software compatibility and control of the host system. This year Intel introduces Knights Landing, the latest offering in its Xeon Phi family.
Intel and NVIDIA are battling each other for dominance over the massive number-crunching and data-moving work that is the hallmark of HPC. That work includes huge modeling and simulation tasks: airflow over automobiles and aircraft, climate and weather modeling, seismic processing, reservoir simulation and much more. This year that battle is being played out in the matchup between Knights Landing and Pascal. The stakes are very high, and the value of the HPC hardware market only scratches the surface. The real cost is in the millions of person-hours that will be invested in writing and porting large, complicated technical codes to exploit the full performance benefits of these two platforms. It’s a huge investment for companies and developers, one that will set the future course of HPC.
While GPUs have gained a strong and dedicated following over the last decade as a next-generation HPC platform, many companies, wary of the investment in software development, the scope of the task and their limited experience with GPUs, have chosen a conservative wait-and-see posture. As loyal Intel customers, they have waited almost a decade for a viable many-core computing platform, one optimized for throughput processing of threads. All the while, the performance gap between GPU-based codes and their CPU-based equivalents has grown with each processor generation. The Xeon Phi family, from Larrabee through Knights Corner, has thus far been disappointing. That record stands in stark contrast to the near-military precision, consistent performance and technical excellence that Intel has exhibited in its main Xeon line since the introduction of the Core microarchitecture (Core 2) in 2006. Knights Landing is Intel’s third try. After almost a decade of waiting and promises, expectations for Knights Landing are understandably high, and a failure to match or exceed the performance of Pascal should trigger heated debate in the cubicles, data centers and board rooms where HPC matters.
Will Intel’s Knights Landing finally apply pressure to Pascal and the NVIDIA Tesla line, or will Pascal become Intel’s Knights Mare? This year will tell.