The Human Capital of High-Performance Computing

Mike Fisk, CIO, Los Alamos National Laboratory

The often-overlooked aspect of high-performance computing is not the flashy “big iron” but the people. While the hardware can be spectacular, it won’t benefit your business if you don’t have talent who know how to use it, or if those employees don’t have the supporting infrastructure they need.

For much of its existence, high-performance computing resided in specialized supercomputing centers supporting a relatively small cadre of “users.” Those “users” are in fact sophisticated software developers who use specialized libraries or programming languages to harness the horsepower of supercomputers. These developers are typically domain scientists in fields such as physics, materials science, or geology who build and run simulations of some complex environment or system. Making these people productive is the key to getting value out of your computing investments. Don’t underestimate the human capital investment required to understand how high-performance computing platforms work and how to generate value from them. HPC also requires significant investments in facilities, power measured in megawatts, and cooling capacity to match. Those tangible problems will be surmounted before you take delivery of your HPC system. Other issues, such as network bandwidth between the users and the computing, and the intellectual capacity to use the computer effectively, can linger if not addressed.

The rise of the phrase high-performance computing over the older term supercomputer reveals that performance is no longer achieved by buying a singular supercomputer but instead through a much more complex ecosystem of clusters, power and cooling infrastructure, specialized interconnect networks, languages, and runtime environments. The first supercomputers were simply the fastest computers of their day, but today performance comes from parallel computation across large numbers of often-commodity processors. The leading supercomputers today have millions of compute cores. Orchestrating that parallel computation requires that programmers adopt specialized programming models such as message passing or partitioned global address space (PGAS) languages. A program that can exploit multiple cores on a commodity system generally cannot exploit the scale of HPC without a complete rewrite.
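
To make the message-passing model concrete, here is a minimal sketch using MPI through the mpi4py Python bindings (an illustrative choice; the article names the model, not a specific library). Each process computes a partial result over its own slice of the problem, and explicit communication combines the pieces:

```python
# Minimal message-passing sketch using mpi4py (illustrative, not prescriptive).
# Run with, e.g.: mpirun -n 4 python partial_sums.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # this process's ID within the job
size = comm.Get_size()   # total number of cooperating processes

# Each rank sums its own strided slice of the overall range.
n = 1_000_000
local = sum(range(rank, n, size))

# Explicit communication: rank 0 gathers and combines the partial sums.
total = comm.reduce(local, op=MPI.SUM, root=0)
if rank == 0:
    print(f"total = {total}")
```

Designing that communication structure in from the start is exactly the rewrite the pull toward HPC scale demands; it cannot be bolted onto a program written for a single shared-memory machine.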

To get value out of these complex computing environments, you have to enable teams of scientific developers. HPC programmers are at the leading edge of parallel computing, but often lag behind in their use of contemporary software development practices and tools. Many HPC programmers may be more ingrained in their disciplines than in the fast-moving world of technology startups, scrum masters, and social coding. Provide them with contemporary, collaborative software development platforms such as GitHub, GitLab, or Atlassian. Offer them classes in parallel frameworks, software development tools, and agile software development. Research software has different quality objectives than production software. With the exception of core software libraries, HPC codes may evolve each time they run and serve internal technical teams rather than outside customers. Nonetheless, developer productivity and the ability to effectively evolve software over time remain vital.

Today, enterprises are building out new types of high-performance computing focused on analytics of large data sets. Many data-analytics workloads parallelize more easily than traditional HPC workloads, and in 2004 Google popularized the highly scalable and approachable MapReduce programming model for data analytics. As a result, popular platforms like Hadoop and Spark are now widely supported and increasingly common in enterprises. HPC systems can run analytics workloads, but data-analytics platforms are not well suited to some other kinds of computation. And, as with HPC, power “users” of analytics are really developing software on top of these specialized software stacks. Building an analytics team is yet another investment in specialized developers. Where HPC software teams need domain scientists who understand the phenomenology being simulated, analytics teams need experts in the data sources (be they measurements or customer transactions). Data have artifacts that must be understood and propagated through the analytical process, or you will get answers that misinterpret the data. The mechanics of acquiring, transforming, and loading data into an analytical environment take up more time and energy than developing the first statistic to be computed on the data. A smart analyst or data scientist can be stymied by unapproachable data.
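
For contrast with the message-passing sketch above, the MapReduce model expresses parallelism declaratively and leaves distribution to the runtime. Here is a minimal word-count sketch in PySpark (the input path and application name are placeholders for illustration):

```python
# Minimal MapReduce-style sketch on Spark via PySpark.
# The input path and app name are placeholders, not real endpoints.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount-sketch").getOrCreate()

lines = spark.sparkContext.textFile("hdfs:///data/logs/*.txt")  # placeholder path
counts = (lines
          .flatMap(lambda line: line.split())   # map: emit one record per word
          .map(lambda word: (word, 1))          # map: key each word with a count
          .reduceByKey(lambda a, b: a + b))     # reduce: sum counts per key

for word, n in counts.take(10):
    print(word, n)

spark.stop()
```

The programmer never specifies which machine handles which partition; that is much of what makes the model approachable compared with hand-orchestrated message passing.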

Machine learning software has recently become approachable enough to be added to the analytics practitioner’s toolbox. Machine learning takes us beyond simple statistics to let the computer identify patterns—even high-dimensional patterns that you may not be able to see yourself. It is not a panacea and cannot extrapolate into unforeseen circumstances, but it can do things like match new data against previously labeled categories. The underlying algorithms are powerful, but the practitioner needs to understand the technical capabilities of those algorithms to use them effectively.
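
As a minimal sketch of that “match new data against previously labeled categories” workflow, here is a classifier built with scikit-learn (an illustrative library choice, with synthetic data standing in for real labeled examples):

```python
# Minimal supervised-classification sketch with scikit-learn.
# The data is synthetic; two clusters stand in for two labeled categories.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Previously labeled examples: 100 points per category in 5 dimensions.
X_known = np.vstack([rng.normal(0, 1, (100, 5)), rng.normal(3, 1, (100, 5))])
y_known = np.array([0] * 100 + [1] * 100)

model = RandomForestClassifier(n_estimators=100).fit(X_known, y_known)

# New, unlabeled observations are matched against the learned categories.
X_new = rng.normal(1.5, 1, (5, 5))
print(model.predict(X_new))        # predicted category labels
print(model.predict_proba(X_new))  # model confidence per category
```

Note what the sketch does not do: it says nothing about data drawn from circumstances the labeled examples never covered, which is exactly where a practitioner’s judgment about the algorithm’s limits matters.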

Moving forward, enterprises can look to cloud vendors to manage the hardware and supporting infrastructure of all but the highest-performance computing. This can eliminate capital expenditures, allow for experimental adoption, and be more cost-effective than keeping around a platform that isn’t heavily used. Platform-as-a-Service providers can even maintain the complicated software stacks for you. But what you can’t buy in the cloud is results that matter to your business. For that, you need people.

The one consistent prerequisite for ongoing investment is talent. Build a pipeline of creative people who like to ask the questions that haven’t been asked before, are self-motivated to keep pace with rapidly changing technology, and understand your business enough to ask the right questions. 
