The HPSS Collaboration between IBM Houston Global Services and what are now five DOE National Laboratories (Lawrence Berkeley, Lawrence Livermore, Los Alamos, Oak Ridge, and Sandia) began in the fall of 1992. The goal was to produce a highly scalable, high-performance storage system: the High Performance Storage System (HPSS) needed to provide scalable hierarchical storage management (HSM), archive, and file system services, and no existing product met the requirements. When HPSS design and implementation began, the scientific computing power and storage capabilities of a site such as a DOE national laboratory were measured in a few tens of gigaops of computing power, a few tens of terabytes at most of data archived in HSMs, a few megabytes/s of data throughput to an HSM, and a few gigabytes/day of daily throughput with the HSM. At that time, the DOE national laboratory and IBM HPSS design team recognized that we were headed for a data storage explosion: computing power rising to teraops and then petaops would require data stored in HSMs to rise to petabytes and beyond, data transfer rates with the HSM to rise to gigabytes/s and higher, and daily throughput with an HSM to rise to tens of terabytes/day. Therefore, we set out to design and deploy a system that would scale by a factor of 1,000 or more and evolve from the base above toward these expected targets and beyond.
Because of the highly scalable HPSS architecture, these targets have been successfully met. We now recognize that computing power will rise to exaops by about 2020, with a corresponding need to scale storage in its various dimensions by another factor of 1,000. Further, other major application domains, such as real-time data collection, also require such extreme-scale storage. We believe the HPSS architecture and basic implementation, built around a scalable relational database management system (IBM’s DB2), make it well suited to this challenge.
For a distributed collaboration such as the HPSS Collaboration to succeed in producing a major software system, careful thought had to go into its basic organization. Its governing document is a Collaboration Agreement that spells out the intellectual property rights of the development partners and their management and organization. IBM is responsible for commercialization and for deployment outside the development partners. An Executive Committee, co-chaired by IBM and DOE lab representatives, sets major development and other policies. This group meets several times a year, primarily by teleconference. HPSS development is overseen by a Technical Committee coordinated by an IBM project manager. This group is generally organized around the major architectural modules of the system. It meets by teleconference weekly, and more often as needed, and in person once or twice a year. Development of the system follows industry-standard software engineering practices and has an SEI CMM Level 3 rating. Following these practices has been a major factor in producing a stable, maintainable product.
The HPSS collaboration is based on the premise that no single organization has the experience and resources to meet all the challenges posed by the growing imbalance between computing power and data collection capabilities on the one hand and storage system I/O, capacity, and functionality on the other. Over 20 organizations, including industry, Department of Energy (DOE) and other federal laboratories, universities, National Science Foundation (NSF) supercomputer centers, and the French Commissariat a l'Energie Atomique (CEA), have contributed to various aspects of this effort.
The primary HPSS development team consists of:
- IBM Global Business Services (Houston, TX)
- Los Alamos National Laboratory (Los Alamos, NM)
- Lawrence Livermore National Laboratory (Livermore, CA)
- Lawrence Berkeley National Energy Research Supercomputer Center (Berkeley, CA)
- Oak Ridge National Laboratory (Oak Ridge, TN)
- Sandia National Laboratories (Albuquerque, NM)
- Commissariat a l'Energie Atomique, Direction des Applications Militaires (Bruyeres le Chatel, France)
- Gleicher Enterprises (Tucson, AZ)
|HPSS @ SC15 - SC15 is the 2015 international conference for high performance computing, networking, storage and analysis. SC15 will be in Austin, Texas, from November 16th through 19th. Come visit the HPSS folks at the IBM booth and schedule an HPSS briefing at the IBM Executive Briefing Center.|
|Swift On HPSS - The OpenStack Swift Object Server implementation enables objects created using the Swift API to be accessed by name in HPSS: /account name/container name/object name. Legacy HPSS files can also be accessed using the Swift API. Contact us for more information.|
|2015 HPSS Users Forum - The HPSS User Forum 2015 will be hosted by SciNet in Toronto, Canada, from Monday, September 28 through Friday, October 2.|
|HPSS @ ISC15 - ISC15 is the 2015 International Supercomputing Conference for high performance computing, networking, storage and analysis. ISC15 will be in Frankfurt, Germany, from July 12th through 16th. Come visit the HPSS folks at the IBM booth and schedule an HPSS briefing at the IBM Executive Briefing Center.|
|2015 HPSS Training - The next HPSS System Administration course runs from August 24th through 28th.|
|HPSS @ MSST 2015 - MSST 2015 is the 31st International Conference on Massive Storage Systems and Technology. This year's theme is Media Wars: Disk versus FLASH in the Struggle for Capacity and Performance.|
|NCSA in production with RAIT - A massive 380 petabyte HPSS system, the world’s largest automated near-line data repository for open science, was successfully deployed, as reported by NCSA and HPCwire. The new HPSS system went into production using HPSS Redundant Array of Independent Tapes (RAIT) tiers. RAIT is similar to RAID and provides redundancy for a tape stripe, allowing HPSS customers to meet their performance and redundancy requirements without doubling their tape cost.|
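
The RAID-like redundancy described for RAIT can be illustrated with a minimal sketch. It assumes a single XOR parity block per stripe, which is the simplest RAID-style scheme and is not necessarily how HPSS implements RAIT; the function names (`xor_blocks`, `make_stripe`, `rebuild`, `tape_overhead`) are hypothetical and are not part of any HPSS API. Reading "doubling their tape cost" as keeping a full second copy of every tape is likewise an assumption made for the comparison.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length blocks."""
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Append one parity block (hypothetical 'N data + 1 parity' layout)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, lost_index):
    """Recover the block at lost_index by XOR-ing all surviving blocks."""
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


def tape_overhead(data_tapes, parity_tapes):
    """Extra tape needed for redundancy, as a fraction of the data tapes."""
    return parity_tapes / data_tapes


if __name__ == "__main__":
    # Four data blocks, each imagined as living on a separate tape.
    stripe = make_stripe([b"abcd", b"efgh", b"ijkl", b"mnop"])
    # Pretend the third tape is unreadable and rebuild its block.
    assert rebuild(stripe, 2) == b"ijkl"
    print("overhead with 4 data + 1 parity:", tape_overhead(4, 1))
    print("overhead with a full second copy:", tape_overhead(4, 4))
```

Under these assumptions, a stripe of four data tapes plus one parity tape survives the loss of any single tape while adding a quarter more tape, whereas a full second copy adds as much tape again as the data itself.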