High Performance Storage System
Incremental Scalability
Based on storage needs and deployment schedules, HPSS scales incrementally by adding computer, network and storage resources. A single HPSS namespace can scale from petabytes of data to exabytes of data, from millions of files to billions of files, and from a few file-creates per second to thousands of file-creates per second.
About HPSS:
HPSS for Spectrum Scale (formerly GPFS)
Overview
HPSS for Spectrum Scale
HPSS for Spectrum Scale key terms
- Spectrum Scale: A proven, scalable, high-performance data and file management solution (based upon IBM General Parallel File System, or GPFS, technology). Spectrum Scale is a true distributed, clustered file system. Multiple nodes (servers) are used to manage the data and metadata of a single file system. Individual files are broken into multiple blocks and striped across multiple disks and multiple nodes, which eliminates bottlenecks.
- Spectrum Scale ILM policies: Spectrum Scale Information Lifecycle Management (ILM) policies are SQL-like rules used to identify and sort filenames for processing; a hedged example of such a rule follows this list.
- Disaster recovery services: Software to capture a point-in-time backup of Spectrum Scale to HPSS media on a periodic basis, manage the backups in HPSS, and restore Spectrum Scale using a point-in-time backup.
- Space management services: Software to automatically move data between high-cost low-latency Spectrum Scale storage media and low-cost high-latency HPSS storage media, while managing the amount of free space available on Spectrum Scale. The bulk of data are stored in HPSS, allowing Spectrum Scale to maintain a range of free space for new and frequently accessed data. Data are automatically made available to the user when accessed. Beyond the latency of recalling data from tape, the Spectrum Scale space management activities (migrate, purge and recall) are transparent to the end-user.
- Online storage: Data that are typically available for immediate access (e.g. solid-state or spinning disk).
- Near-line storage: Data that are typically available within minutes of being requested (e.g. robotic tape that is local or remote).
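To make the ILM mechanism concrete, the following is a minimal sketch that evaluates a hypothetical migration rule in test mode. The file system device name, pool names, thresholds, and the external-pool interface script path are illustrative assumptions, not the policies HPSS for Spectrum Scale actually installs.

```python
"""A minimal sketch, assuming hypothetical pool names and paths.

Writes a small SQL-like ILM policy and evaluates it with mmapplypolicy in
test mode, so candidate files are listed but nothing is migrated.
"""
import subprocess
import tempfile

# Hypothetical rules: define an external HSM pool and migrate cold files to it
# once the 'system' pool passes 85% full, draining it back down to 70%.
POLICY = """
RULE 'hsm'     EXTERNAL POOL 'hpss' EXEC '/opt/hpss/bin/hsm_interface'  /* placeholder path */
RULE 'to_hpss' MIGRATE FROM POOL 'system'
               THRESHOLD(85,70)
               WEIGHT(CURRENT_TIMESTAMP - ACCESS_TIME)
               TO POOL 'hpss'
               WHERE FILE_SIZE > 0
"""

def dry_run(device: str = "gpfs01") -> None:
    """Evaluate the policy against a file system without moving any data."""
    with tempfile.NamedTemporaryFile("w", suffix=".pol", delete=False) as f:
        f.write(POLICY)
        policy_file = f.name
    # '-I test' reports which files the rules would select for migration.
    subprocess.run(["mmapplypolicy", device, "-P", policy_file, "-I", "test"],
                   check=True)

if __name__ == "__main__":
    dry_run()
```

In a real deployment the scans are scheduled by the HPSS Session software and the resulting work is distributed across I/O Managers rather than run by hand.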
Many-to-one advantage of HPSS for Spectrum Scale
- A single HPSS may be used to manage one or more Spectrum Scale clusters.
- All of the HPSS storage may be shared by all Spectrum Scale file systems.
- One Spectrum Scale file system may leverage all HPSS storage if required.
- The Max Planck Computing and Data Facility (MPCDF, formerly known as RZG) space manages and captures point-in-time backups for seven Spectrum Scale file systems with one HPSS.
Many Spectrum Scale to one HPSS
What HPSS software is installed on the Spectrum Scale cluster?
- HPSS Session software directs the space management and disaster recovery services.
- HPSS I/O Manager (IOM) software manages data movement between HPSS and Spectrum Scale.
HPSS Software
- HPSS Session software is configured on all Spectrum Scale Quorum nodes but is active only on the CCM (Spectrum Scale Cluster Configuration Manager) node.
- The CCM may fail over to any Quorum node, and the HPSS Session software will follow the CCM.
Where HPSS software runs in the cluster
- HPSS IOM software may be configured to run on any Spectrum Scale node with an available Spectrum Scale mount point.
- Five processes comprise the HPSS Session software: HPSS Process Manager, HPSS Mount Daemon, HPSS Configuration Manager, HPSS Schedule Daemon, and HPSS Event Daemon.
- Three processes comprise the HPSS IOM software: HPSS I/O Manager, HPSS I/O Agent, and HPSS ISHTAR.
- HPSS I/O Agents copy individual files, while ISHTAR copies groups of files (ISHTAR is much like UNIX tar, but faster, indexed, and Spectrum Scale specific); a conceptual sketch of an indexed archive follows this list.
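ISHTAR itself is HPSS software whose format and interfaces are not covered here, so the following is only a conceptual stand-in: a plain tar archive with a sidecar index of member offsets, illustrating why an indexed aggregate lets a single file be read back without scanning the whole archive.

```python
"""Conceptual sketch only: NOT the ISHTAR format or API.

Illustrates the value of an indexed aggregate: a sidecar index of member
offsets lets one file be read back without scanning the whole archive.
"""
import json
import tarfile

def build_indexed_archive(archive: str, index: str, members: list[str]) -> None:
    """Bundle files into a tar archive, then record each member's offset."""
    with tarfile.open(archive, "w") as tar:
        for path in members:
            tar.add(path)
    with tarfile.open(archive, "r:") as tar:
        offsets = {m.name: m.offset for m in tar.getmembers()}
    with open(index, "w") as f:
        json.dump(offsets, f)

def extract_one(archive: str, index: str, member: str, dest: str = ".") -> None:
    """Seek straight to one member instead of reading the archive from the start."""
    with open(index) as f:
        offsets = json.load(f)
    with open(archive, "rb") as raw:
        raw.seek(offsets[member])                 # jump to the member's header
        with tarfile.open(fileobj=raw, mode="r:") as tar:
            tar.extract(tar.next(), path=dest)
```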
Additional Spectrum Scale cluster hardware
- HPSS IOMs are highly scalable and multiple IOMs may be configured on multiple Spectrum Scale nodes for each file system.
- Bandwidth requirements help determine the expected node count for a deployment.
- New Spectrum Scale Quorum nodes for HPSS may need to be added to the cluster.
- HPSS for Spectrum Scale is typically deployed on a set of dedicated nodes that will become the Quorum nodes.
- HPSS Session and IOM software are configured on the new Quorum nodes.
Dedicated HPSS nodes become Quorum nodes
Space manage Spectrum Scale with HPSS
- Periodic ILM policy scans are initiated by the HPSS Session software to identify groups of files that must be copied to HPSS.
- The HPSS Session software distributes the work to the I/O Managers. Spectrum Scale data are copied to HPSS in parallel.
- The HPSS advantage is realized in two areas: (1) high-performance transfers; and (2) how data are organized on tape.
Data flows to HPSS on migration
- When Spectrum Scale capacity thresholds are reached (the file system is running out of storage capacity), unused files are purged from Spectrum Scale, but the inode and other attributes are left behind.
- All file names are visible, and the user may easily identify which files are online and which files are near-line; one hedged way to check this is sketched below.
- The HPSS Session software will automatically recall any near-line file from HPSS back to Spectrum Scale when accessed.
- Tools to efficiently recall large numbers of files are also provided.
Data flows to Spectrum Scale on recall
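HPSS for Spectrum Scale provides its own commands for listing file state (not named above), so the following is only a generic heuristic sketch: in an HSM-managed file system, a purged (near-line) file typically keeps its logical size while holding almost no allocated blocks.

```python
"""Heuristic sketch only: guess whether a space-managed file is online or near-line.

The product ships its own tools for this; the heuristic below is a generic
illustration. A file whose data have been purged (migrated to HPSS with only
the inode left behind) usually reports far fewer allocated blocks than its
logical size would require.
"""
import os
import sys

def looks_near_line(path: str) -> bool:
    """Return True if the file appears to be a stub (data likely in HPSS)."""
    st = os.stat(path)
    allocated = st.st_blocks * 512          # st_blocks is in 512-byte units
    # A purged file keeps its size but holds (almost) no blocks on disk.
    return st.st_size > 0 and allocated < st.st_size

if __name__ == "__main__":
    for p in sys.argv[1:]:
        state = "near-line (stub)" if looks_near_line(p) else "online"
        print(f"{p}: {state}")
```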
Backup and disaster recovery with HPSS for Spectrum Scale
- The backup process captures the following data:
- File data - the file contents are captured by the space management process discussed above.
- Namespace data - the Spectrum Scale namespace is captured using the Spectrum Scale image backup command.
- Cluster configuration data - the cluster configuration is saved to HPSS so that it, too, is protected; a hedged sketch of these capture steps follows this list.
- The HPSS disaster recovery processing minimizes data movement and is ideal for high performance computing (HPC) environments where the goal is to bring the namespace back online quickly.
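As a rough illustration of the capture steps above, the following sketch drives the Spectrum Scale commands for saving configuration data and a namespace image. The device name and staging directory are placeholders, copying the resulting artifacts onto HPSS media is handled by the product's backup tooling (not shown), and command options should be verified against your Spectrum Scale release.

```python
"""A rough sketch of the configuration and namespace capture steps.

Assumptions: 'gpfs01' and the staging directory are placeholders; storing the
resulting artifacts on HPSS media is done by the product's backup tooling and
is not shown here.
"""
import subprocess

DEVICE = "gpfs01"             # placeholder file system device name
STAGE = "/var/backups/gpfs"   # placeholder staging directory for the artifacts

def capture_configuration() -> None:
    """Save file system configuration data with mmbackupconfig."""
    subprocess.run(
        ["mmbackupconfig", DEVICE, "-o", f"{STAGE}/{DEVICE}.config"],
        check=True)

def capture_namespace_image() -> None:
    """Capture a point-in-time image of the namespace with mmimgbackup.

    Real deployments supply work-directory and snapshot options; they are
    omitted here to keep the sketch minimal.
    """
    subprocess.run(["mmimgbackup", DEVICE], check=True)

if __name__ == "__main__":
    capture_configuration()
    capture_namespace_image()
```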
Come meet with us!
2023 HUF - The 2023 HPSS User Forum (HUF) will be an in-person event scheduled October 30th through November 3rd, 2023, in Herndon, VA. This will be a great opportunity to hear from HPSS users, collaboration developers, testers, support folks and leadership (from IBM and DOE Labs). Would you like to Learn More? Please contact us if you are not a customer but would like to attend.
HPSS @ SC23 - The 2023 international conference for high performance computing, networking, storage and analysis will be in Denver, CO from November 12th through 17th, 2023 - Learn More. As we have done each year (pre-pandemic), we are scheduling and meeting with customers via IBM Single Client Briefings. Please contact your local IBM client executive or contact us to schedule an HPSS Single Client Briefing to meet with the IBM business and technical leaders of HPSS.
HPSS @ STS 2024 - The 5th Annual Storage Technology Showcase is in the planning phase, but HPSS expects to support the event. Check out their web site - Learn More.
HPSS @ MSST 2024 - The 38th International Conference on Massive Storage Systems and Technology will be in Santa Clara, California in May of 2024 - Learn More. Please contact us if you would like to meet with the IBM business and technical leaders of HPSS at Santa Clara University.
HPSS @ ISC 2024 - ISC 2024 is the event for high performance computing, machine learning, and data analytics, and will be in Hamburg, Germany at the Congress Center Hamburg, from May 12th through May 16th, 2024 - Learn More. As we have done each year (pre-pandemic), we are scheduling and meeting with folks attending the conference. Please contact us to meet with the IBM business and technical leaders of HPSS.
What's New?
HPSS 10.2 Release - HPSS 10.2 was released on February 16th, 2023, and introduces six new features and numerous minor updates.
HUF 2022 - The HPSS User Forum was hosted by IBM in October 2022 at the IBM Houston Kurland building.
Celebrating 30 Years - Fall 2022 marks the 30th anniversary of the High Performance Storage System (HPSS) Collaboration.
HPSS 10.1 Release - HPSS 10.1 was released on September 30th, 2022, and introduces fourteen new features and numerous minor updates.
Lots of Data - In March 2022, IBM/HPSS delivered a storage solution to a customer in Canada and demonstrated a sustained tape ingest rate of 33 GB/sec (2.86 PB/day peak tape ingest x 2 for dual copy), while simultaneously demonstrating a sustained tape recall rate of 24 GB/sec (2.0 PB/day peak tape recall). HPSS pushed six 18-frame IBM TS4500 tape libraries (scheduled to house over 1.6 exabytes of tape media) to over 3,000 mounts/hour.
DOE Announces HPSS Milestone - Todd Heer, Deputy Program Lead, Advanced Simulation and Computing (ASC) Facilities, Operations, and User Support (FOUS), announced that DOE High Performance Storage Systems (HPSS) have eclipsed one exabyte of stored data.
Atos Press Release - Atos boosts Météo-France’s data storage capacity to over 1 exabyte in 2025 to improve numerical modeling and climate predictions. Want to read more?
Capacity Leader - ECMWF (European Centre for Medium-Range Weather Forecasts) has a single HPSS namespace with over 824 PB spanning over 556 million files.
File-Count Leader - LLNL (Lawrence Livermore National Laboratory) has a single HPSS namespace with over 78 PB spanning 1.746 billion files.
Older News - Want to read more?