High Performance Storage System

About HPSS   :    HPSS Version 7.5.2

The following is an overview of some of the new features that can be found in HPSS 7.5.2.

New to HPSS 7.5.2

Abort I/O Jobs

HPSS now provides a mechanism for administrators to abort ongoing I/O jobs. The HPSS Core Server has new mechanisms for tracking the ownership of I/O jobs and can provide additional statistics.

The additional job tracking information and the abort interface have been integrated with the RTM functionality within SSM. Administrators can view the job tracking information in the RTM Details via the HPSS GUI and ADM. They can also abort requests via the RTM Summary screen (the "Abort I/O" button) in the HPSS GUI, or with the following command in ADM:

hpssadm> rtm cancel -id

The ADM now supports displaying detailed information:

hpssadm> rtm info -id

It is important to note that aborting a job in HPSS is a "lazy" abort. When an abort request returns successfully, it indicates that the request has been placed in the ABORTING state and that the appropriate abort action has been initiated. However, the request may take some time to complete depending on its circumstances (it may require that a tape be dismounted or an I/O protocol be satisfied, for example). The request will remain in the RTM Details until it ends, with the ABORTING flag shown in its job state.

Aborted requests will return an error to end clients, which in most cases is mapped back to the POSIX EIO (-5) error code. HPSS applications can use the hpss_GetLastHPSSErrno Client API call to determine whether the underlying error is due to an abort or to some other cause.
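Because EIO is a generic error code, the errno alone cannot confirm an abort. The check can be sketched as follows (Python is used for illustration only; a real HPSS application would follow up with the hpss_GetLastHPSSErrno call from the C Client API):

```python
import errno

def is_possible_abort(err: OSError) -> bool:
    """Return True when an I/O failure *may* stem from an administrator abort.

    HPSS maps most abort-induced failures to POSIX EIO (-5). EIO alone is
    ambiguous, so an HPSS application would then call hpss_GetLastHPSSErrno()
    to distinguish an abort from some other cause.
    """
    return err.errno == errno.EIO
```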

See the HPSS Management Guide I/O Abort section.

Db2 Off-Node Support

To increase the resources available to Db2 and the HPSS servers, HPSS now supports Db2 partitioned database instances on nodes other than the HPSS Core Server node.

SCSI PVR Move Queue and Performance Improvements

The HPSS SCSI PVR has been modified to improve mount throughput. In testing, this has shown up to a 30% mount throughput improvement on very busy systems, especially on Spectra Logic and TS4500 robotics. This functionality is provided as an opt-in binary and is not enabled by default; to enable it, change the SCSI PVR Execute Pathname to /opt/hpss/bin/hpss_pvr_scsi_beta.

The SCSI PVR has been modified in several important ways that improve mount rate performance:

  • The SCSI PVR now queues up cartridge move requests internally and can issue them out of order using certain heuristics. This allows for finer control of when moves are requested.
  • The SCSI PVR now always prefers to mount into an empty drive, if one is available, until all drives in the robot are in use. Pending dismounts will not prevent the SCSI PVR from mounting cartridges into empty drives.
  • The SCSI PVR has streamlined communication paths to avoid unnecessary chatter over the wire with the library. This is especially beneficial for libraries with a low communication rate, such as the TS3500.
  • The SCSI PVR now supports SCSI command queuing. It can detect when a SCSI medium changer device supports command queuing and, in some cases, send multiple commands to that device at once to improve performance.
  • The queue contents and statistics about the move queue within the SCSI PVR have been added to the SIGHUP output.
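The out-of-order dispatch described above can be illustrated with a toy sketch. This is illustrative Python, not HPSS code, and it shows only one of the heuristics mentioned (preferring mounts into empty drives over other moves):

```python
import heapq

class MoveQueue:
    """Toy sketch (not HPSS code) of a queue that dispatches cartridge
    move requests out of arrival order, preferring mounts into empty
    drives, one of the heuristics described above."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker preserving arrival order within a class

    def add(self, request, mounts_into_empty_drive):
        # Lower priority value dispatches first; empty-drive mounts win.
        prio = 0 if mounts_into_empty_drive else 1
        heapq.heappush(self._heap, (prio, self._seq, request))
        self._seq += 1

    def next_move(self):
        return heapq.heappop(self._heap)[2]
```

With such a queue, a mount into an empty drive submitted after a dismount would still be dispatched first, keeping drives busy.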

Syslog-Based Logging

Logging has been significantly modified. Logging in HPSS now goes through syslog rather than HPSS logging code. This results in the removal of the Log Daemon (logd) and Log Client (logc), and requires that administrators properly configure syslog for use by HPSS. Configuration considerations for syslog are described in the HPSS Management Guide Logging section. The logging format has changed to a single-line "key=value" format.

Logging policy configuration has been modified to tie servers to an explicit log policy. There is no longer a global default log policy. Log policies can be shared among servers in the Server Configuration window.

Logging for remote HPSS processes, including movers, RAIT engines, and remote PVRs, now requires syslog configuration on those remote hosts, and configuration of the HPSS server running SSM as a syslog server that can receive remote messages. This configuration is described in the HPSS Management Guide Logging section.

Log rotation and archival have been modified significantly. They now rely upon the Linux tool logrotate. A logrotate policy should be configured to manage the space used by logs kept on the system and their archival. A new tool, hpss_log_archive, facilitates the movement of log files into HPSS via logrotate. This configuration is described in the HPSS Management Guide Logging section. More information about hpss_log_archive can be found in its man page, $HPSS_ROOT/man/hpss_log_archive.7.
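As an illustration, a logrotate policy along the following lines could rotate an HPSS syslog file and hand rotated files to hpss_log_archive. The log path and the exact hpss_log_archive invocation are assumptions; consult the HPSS Management Guide Logging section and the hpss_log_archive man page for the supported configuration:

```
# Hypothetical logrotate policy for HPSS syslog output.
# The path and the hpss_log_archive arguments are illustrative only.
/var/hpss/log/hpss.log {
    daily
    rotate 14
    compress
    missingok
    notifempty
    postrotate
        # Without sharedscripts, logrotate passes the log file name
        # as the first argument to the script (illustrative call).
        /opt/hpss/bin/hpss_log_archive "$1"
    endscript
}
```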

Media Access Logging

The Core Server now logs file operations with information on the media and drives used for each operation. These messages are logged with the INFO log level. This allows for analysis of affected files when drives or media become unreliable, yet have not failed outright. For example, if a corruption bug is discovered in a tape drive's firmware, it may be necessary to find out which media have been written to or read from with that drive, and also which files were involved. These logging messages will aid in such investigations.

This feature allows an administrator to understand which files may have been written to or read from a particular disk or tape volume, which tape drives may have been used to read or write a file, and when those operations occurred.

Additionally, the PVL now logs, at the INFO log level, cartridge state changes along with their duration (for example, how long it took to mount or dismount a particular cartridge).

Media access logs occur at the start and end of an operation and include the bitfile being staged, the fileset to which it belongs, and the path from that fileset. When the operation completes, the same information is logged again, along with the drives and volumes used for reading and writing and the status of the request. The log message indicates when multiple volumes were involved in a read or write operation.

In the example below, T4:JD021700 means "tape, device ID 4, volume JD021700", and D2:MY000200 indicates "disk, device ID 2, volume MY000200".

Media access logs look like:

May 30 10:02:07.793192 hpss_core(HPSS)[756]:: #msgtype=INFO #server=Core Server@server #func=bfs_StageThread (line 1478) #rc=0 #msgid=CORE2188 #reqid=6068557350692913156 #msg=Media Access : Stage begin : BFID x'00000539070000000801E841C3C0A743B01E34' : FilesetRoot.29 : ./path/to/file
May 30 10:02:07.793801 hpss_core(HPSS)[756]:: #msgtype=INFO #server=Core Server@server #func=bfs_StageCleanup (line 3046) #rc=0 #msgid=CORE2189 #reqid=6068557350692913156 #msg=Media Access : Stage end : read {T4:JD021700} : wrote {D2:MY000200,D1:MY000100} : BFID x'00000539070000000801E841C3C0A743B01E34' : FilesetRoot.29 : ./path/to/file : Result No error (0)

Media access logs exist for file read, write, migrate, batch migrate, stage, async stage, and copy operations. Media access logging is currently unavailable for Repack. See the Media Access Logging section of the HPSS Management Guide for more details.
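The single-line "key=value" records shown above are amenable to mechanical parsing. A small illustrative parser (not an HPSS tool) that splits the fields and decodes the media tokens might look like:

```python
import re

# Second sample log line from the text above.
SAMPLE = ("May 30 10:02:07.793801 hpss_core(HPSS)[756]:: "
          "#msgtype=INFO #server=Core Server@server "
          "#func=bfs_StageCleanup (line 3046) #rc=0 #msgid=CORE2189 "
          "#reqid=6068557350692913156 "
          "#msg=Media Access : Stage end : read {T4:JD021700} : "
          "wrote {D2:MY000200,D1:MY000100} : "
          "BFID x'00000539070000000801E841C3C0A743B01E34' : "
          "FilesetRoot.29 : ./path/to/file : Result No error (0)")

def parse_media_access(line):
    """Split an HPSS key=value log line into a dict of its #key=value fields."""
    fields = {}
    # Each field starts with '#'; values may contain spaces, so split on ' #'.
    body = line.split(':: ', 1)[1]
    for chunk in body.split(' #'):
        key, _, value = chunk.lstrip('#').partition('=')
        fields[key] = value
    return fields

def parse_volumes(msg):
    """Extract {T4:JD021700}-style tokens as (media kind, device ID, volume)."""
    return [(m[0], int(m[1]), m[2])
            for m in re.findall(r'([TD])(\d+):(\w+)', msg)]
```

Here parse_volumes decodes T4:JD021700 to ("T", 4, "JD021700"), following the token format described above.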

Recover Utility Makes Use of Tape Ordered Recall (TOR)

The recover utility causes data to be staged from an alternate, non-damaged level in the hierarchy up to the top level of the hierarchy. Prior to this change, stages were executed one file at a time. Files are now staged in batches, allowing more stages to occur concurrently and enabling the Core Server to take advantage of its Tape Ordered Recall feature.

Since staging now occurs concurrently, HPSS administrators may need to tune their Core Server Maximum Active Copy Request and Maximum Active I/O settings, as these limit the total number of stages that can occur at the same time. (See the Core Server specific configuration section of the HPSS Management Guide for more details.) Stage requests which fail due to a busy system will be retried a few times before the system gives up on them.

Additionally, while files are being staged to disk during a recover, the disk storage class may run out of free space. As an enhancement to recover, purge will now start (if it is not disabled or suspended) and will run within the rules of the purge policy for the disk storage class. Thus, recover may cause fully migrated files to be purged from disk storage classes in order to make room for files that are being recovered.

Update PFTP to Support TOR

With the addition of Tape Ordered Recall to HPSS, PFTP has been updated to allow users to take advantage of new batch stage functionality. The following site commands have been added to PFTP to allow users to request the batch stage of files:

site stagebatch
site rstagebatch


The site stagebatch command will batch stage any files that match the glob pattern. The site rstagebatch command will batch stage any files that match the glob pattern, and also recursively batch stage any files inside folders that match the glob pattern.

In addition, the mget command has been updated so users can automatically batch stage the requested files before retrieving them. This behavior can be toggled on or off via the following command:

toggleBatchStage

An entry has been added to HPSS.conf to allow system administrators to disable this functionality:

# Attempt batch stage on mget
# Depends on the server supporting TOR
# Default = on
; Disable stagebatch on mget

Use a File System as a Storage Device

HPSS now supports the specification of a sparse file in the Device Name field of a Disk Device configuration. This sparse file is expected to be backed by a file system which supports fsync to hardware. Data written to the file system is managed by the HPSS system, just as data written to a tape or raw device is. This is, in essence, a software device. Depending upon the underlying file system that the sparse file is stored on, features of the file system such as compression, automated backup, and software RAID can be used to enhance the integrity of the data being stored. This feature can also be used to interface with storage technologies that provide a file system interface.
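For illustration, a sparse backing file could be created ahead of time as shown below; the path and size are arbitrary examples, and the resulting path is what would be entered in the Device Name field:

```python
import os

def make_sparse_backing_file(path, size_bytes):
    """Create a sparse file of the given logical size for use as a
    disk-device backing store (the path would go in the Device Name
    field of the Disk Device configuration).

    The path and sizing here are illustrative; as noted above, the
    underlying file system must support fsync to hardware.
    """
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
    try:
        os.ftruncate(fd, size_bytes)  # sets logical size; blocks allocate on write
        os.fsync(fd)
    finally:
        os.close(fd)
```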

Improve Sparse Tape Repack

Repacking tapes which are highly sparse (that is, more than 30% inactive space) has been shown to have poor performance in certain scenarios. In testing, repacking a highly sparse volume has been shown to take two to three times as long as repacking a full tape.

In order to improve sparse tape repack scenarios, a new method of processing has been added to repack. This processing method (called reduced tape positioning) will move entire aggregates, including inactive space, skipping the inactive space elimination and aggregate reorganization steps. This is similar to the way repack operated prior to HPSS 7.4.1 and will allow repack to stream the tape and provide consistent performance despite varying levels of sparsity. This improvement is expected to be most useful in certain deadline-driven technology insertion cases, or when a repack is required due to damaged media and it is more important to get the data off the tape quickly than to reclaim all of the inactive space.

Tape repacks now have the following options:

  • ('--eliminate-whitespace', '-W') Eliminate whitespace mode. This is the default behavior of repack and has been the only method since 7.4.1. In this mode, 'whitespace' (inactive space on the tape) is not copied over and aggregates are re-formed.
  • ('--reduce-tape-positioning', '-P') Reduced tape position mode. This is similar to the pre-7.4.1 repack in that it does not eliminate whitespace or reorganize aggregates. This results in more consistent performance due to the lack of tape seeking.
  • ('--reduce-tape-pos-detection', '-0') Detect reduced tape position mode. This is a compromise between the above options. In this mode, repack will enable the reduced tape position mode for volumes which are highly aggregated and highly sparse. The default definition is any tape where more than 25% of the active bytes are in aggregates and more than 50% of the tape is sparse. Both criteria must be met in order for the tape to be processed with reduced tape positioning.
  • ('--reduce-tape-pos-aggr', '-1') This option modifies the aggregate percentage threshold for the reduced tape position detection option above. This is a value between "0" and "100". Tapes with more than this percentage of active data in aggregates will qualify.
  • ('--reduce-tape-pos-sparse', '-2') This option modifies the sparse percentage threshold for the reduced tape position detection option above. This is a value between "0" and "100". Tapes with more than this percentage of sparsity will qualify. Sparsity is the complement of the active percentage (that is, if a tape is 20% active, it is 80% sparse).

See the repack man page for more details.
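The '-0' detection rule above can be stated compactly: both thresholds must be exceeded. A small sketch of the default criteria:

```python
def sparsity(active_pct):
    """Sparsity is the complement of the active percentage."""
    return 100.0 - active_pct

def use_reduced_positioning(aggr_pct, sparse_pct,
                            aggr_threshold=25.0, sparse_threshold=50.0):
    """Default detection rule for reduced tape positioning: both
    criteria must be exceeded.

    aggr_pct   -- percentage of active bytes that are in aggregates
    sparse_pct -- percentage of the tape that is sparse
    The default thresholds (25 and 50) match those described above and
    correspond to the '-1' and '-2' overrides.
    """
    return aggr_pct > aggr_threshold and sparse_pct > sparse_threshold
```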

Tape Drive Quotas for Tape Recalls

HPSS provides the ability to modify the resource quota configuration (the maximum number of tapes of each drive type that the system may have concurrently mounted for recall) without interrupting HPSS service. The SSM and ADM now allow configuration of drive recall quotas.

Full Aggregate Recall (FAR)

HPSS now supports Full Aggregate Recall (FAR). When a Class of Service is configured for FAR and a user requests the stage of a file which is stored in a tape aggregate, the Core Server will also stage every other file in that tape aggregate. The entire aggregate is read from tape and written to the multiple files on disk in a single I/O operation. All of the associated metadata updates for the data transfer are performed in a single database transaction.

The primary goal of this work is to improve the performance of staging multiple files from the same tape aggregate. The primary use case is for sites where users frequently retrieve multiple files from the same aggregate at or near the same time.

FAR support is configurable as a Class of Service option. The HPSS Client API provides an option for an individual user to override this behavior and stage only the single file that was requested.

Copyright 2018, HPSS Collaboration. All Rights Reserved.