HPSS for IBM Storage Scale

HPSS GHI

Our GPFS HPSS Interface, hereafter called HPSS GHI, provides fully automated space management and disaster recovery services for IBM Storage Scale.

IBM Storage Scale (previously IBM Spectrum Scale) is built on the award-winning IBM General Parallel File System (GPFS). IBM Storage Scale provides file, object, and integrated data analytics for high performance computing, big data, Hadoop Distributed File System (HDFS), and private clouds.

HPSS GHI for Space Management

HPSS GHI leverages Storage Scale’s information lifecycle management (ILM) policies and provides a complete solution to copy file system data to tiers of less expensive storage (disk and tape) managed by HPSS. When space thresholds are reached, HPSS GHI automatically purges older, unused files from the file system to free space for more data. Purging is not deleting: it removes a file's data blocks but leaves its metadata in place, so the Storage Scale name space remains whole. When a purged file is accessed, it is automatically re-hydrated (recalled from HPSS) for normal use in the file system.
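The threshold-driven purge described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how HPSS GHI or Storage Scale ILM policies are implemented: the FileEntry fields, the function name, and the 90%/75% thresholds are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    path: str
    size: int           # bytes of data blocks resident in the file system
    last_access: float  # seconds since the epoch
    in_hpss: bool       # True once a copy of the data exists in HPSS


def select_purge_candidates(files, capacity, high_pct=90.0, low_pct=75.0):
    """Return paths whose data blocks could be purged (hypothetical helper).

    Nothing is selected until usage reaches high_pct of capacity. Then the
    least recently accessed files are chosen until usage would fall to
    low_pct. Files without a copy in HPSS are never selected.
    """
    used = sum(f.size for f in files)
    if used * 100.0 < high_pct * capacity:
        return []
    target = low_pct * capacity / 100.0
    candidates = []
    for f in sorted(files, key=lambda entry: entry.last_access):
        if used <= target:
            break
        if not f.in_hpss:
            continue  # purging would lose the only copy of the data
        candidates.append(f.path)
        used -= f.size  # only the data blocks go; the metadata stays behind
    return candidates
```

The sketch skips any file that has not yet been copied to HPSS, because a purged file can only be recalled if its data already exists there.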

HPSS GHI can be configured to aggregate many small Storage Scale files into much larger, tape-friendly HPSS aggregates, which gives HPSS two advantages: (1) because thousands of files are stored as a single HPSS aggregate, the file-operations-per-second burden on the HPSS name space is largely eliminated; and (2) small-file tape transfer performance improves. Aggregation helps reads as well as writes. With a small HPSS disk cache for these small-file aggregates, HPSS can be configured to recall an entire aggregate from tape to the disk cache in one high-bandwidth read, so HPSS GHI serves small-file IO from the lower-latency HPSS disk cache rather than from the higher-latency HPSS tape drive.
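As a rough illustration of small-file aggregation, the following Python sketch packs small files into size-bounded groups and leaves large files to be stored on their own. The function name, the small_limit and target_size parameters, and the simple in-order packing are assumptions for the example; they do not describe HPSS GHI's actual aggregation logic or configuration options.

```python
def plan_aggregates(files, small_limit, target_size):
    """Group small files into aggregates (hypothetical helper).

    files is an iterable of (path, size_in_bytes) pairs. Files smaller than
    small_limit are packed, in order, into aggregates whose total size stays
    at or below target_size. Larger files are returned separately.
    """
    aggregates = []
    standalone = []
    current, current_size = [], 0
    for path, size in files:
        if size >= small_limit:
            standalone.append(path)  # large enough to go to tape on its own
            continue
        if current and current_size + size > target_size:
            aggregates.append(current)  # close the aggregate that is full
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        aggregates.append(current)
    return aggregates, standalone
```

Each inner list stands for one HPSS aggregate, so the HPSS name space sees one object per list instead of one per file.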

HPSS GHI enables an IBM Storage Scale file system provisioned with a few petabytes of storage to contain hundreds of petabytes of files. Because HPSS disk and HPSS tape are not licensed by capacity, it may be more economical to add more HPSS block storage units than to provision the file system with additional capacity, which increases the file system license fee.

HPSS GHI for Backups

HPSS GHI leverages the scale-out backup and restore (SOBAR) feature of IBM Storage Scale to save a snapshot of the file system metadata to HPSS. HPSS GHI also captures the cluster and file system details, which allows the HPSS GHI restore process to completely rebuild your file system from scratch when disaster strikes.

After an HPSS GHI restore, the file system metadata are completely restored, and the file system is ready for use. For high performance computing (HPC), only a small amount of data needs to be recalled before the HPC cluster can be put back to work. Other less-critical data may then be recalled from HPSS by priority.
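Recalling data by priority after a restore can be pictured as draining a priority queue. The Python sketch below is an assumption-laden illustration only: the recall_order function, the numeric priorities, and the example paths are hypothetical and are not part of any HPSS GHI interface.

```python
import heapq


def recall_order(requests):
    """Return paths in the order they would be recalled (hypothetical helper).

    requests is an iterable of (priority, path) pairs, where a lower number
    means the data is needed sooner. Requests with equal priority keep their
    submission order.
    """
    heap = [(priority, seq, path)
            for seq, (priority, path) in enumerate(requests)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, path = heapq.heappop(heap)
        order.append(path)
    return order
```

For example, giving the input data for the next HPC job priority 0 and archival data priority 2 places the job input at the front of the recall order.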

Protect Many File Systems With One HPSS

Multiple IBM Storage Scale file systems can send data to a single HPSS. The Max Planck Computing and Data Facility (MPCDF) space-manages and captures point-in-time backups of eleven (11) Storage Scale clusters with a single multi-site HPSS system near Munich, Germany.

Where HPSS GHI Runs

HPSS GHI session software is installed on all IBM Storage Scale quorum nodes, and runs on the Cluster Configuration Manager (CCM). If the CCM should fail, another quorum node will become the CCM, and HPSS GHI session software will automatically start on the new CCM.

HPSS GHI IO software is responsible for copying files from IBM Storage Scale to HPSS, and for recalling file data back from HPSS. The HPSS GHI IO software does not need to be installed on the quorum nodes, but the nodes that run it must have access to the file system mount point to copy data.