HPSS for GPFS at SC07
       
Are you having problems with your existing data management system?  Are you running into scaling issues, perhaps bottlenecked at a single server?  Are you finding it difficult to back up or HSM-manage your files because you have too many?  Are you experiencing poor tape drive performance?  If so, you really need to keep reading!

[Photo: IBM's Booth @ SC07]

The IBM Billion File Demo showcased the General Parallel File System (GPFS) Information Lifecycle Management (ILM) policy scan performance, which was the springboard for introducing the new High Performance Storage System GPFS/HPSS Interface (GHI).  At the Almaden Research Center, a pre-GA version of GPFS is capable of scanning a single GPFS file system containing a billion files in less than 15 minutes, which works out to more than a million files per second!

GPFS/HPSS Interface (GHI)

Why is the speed of the GPFS ILM policy scan important?  GHI uses the policy scan results to manage the GPFS disk resources using HPSS, IBM's highly scalable Hierarchical Storage Management (HSM) system, and to back up the GPFS namespace to HPSS tape.  The faster the file system can be scanned, the sooner GHI can begin moving data between GPFS and HPSS tape.  Furthermore, policy scans can take place at more frequent intervals, resulting in better management of the file system.

The backup feature of GHI captures a point-in-time snapshot of the GPFS file system.  If the GPFS file system should fail, GHI can help rebuild it.  The restore feature of GHI re-populates the GPFS namespace using a point-in-time backup.  Once the namespace has been restored, the file system is available for use.  As files are accessed, the file data is staged back to GPFS from HPSS tape.

The HSM feature of GHI manages the disk space of the GPFS file system.  GHI allows you to store petabytes of GPFS files on terabytes of GPFS disks.  As files are written to GPFS, they are copied to HPSS tape.  As files age, the file data is removed from the GPFS disks, leaving only the filename behind.  To the user, the file appears unchanged.  If the user accesses one of these files, GHI automatically recalls the file data back to GPFS from HPSS tape.  GPFS ILM policy scans can also be used to bulk stage a set of files back to GPFS from HPSS tape.
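
The lifecycle described above (copy to tape on write, remove aged data from disk, recall on access) can be illustrated with a small toy model. This is only a sketch for intuition, not GHI's implementation: the `ToyHSM` class, its method names, and the fixed age threshold in days are all assumptions made for the example.

```python
class ToyHSM:
    """Hypothetical in-memory model of the HSM lifecycle; not GHI code."""

    def __init__(self, purge_after_days):
        # Assumed policy: purge file data from disk once it reaches this age.
        self.purge_after_days = purge_after_days
        self.disk = {}  # filename -> bytes, or None when only the name remains
        self.tape = {}  # filename -> bytes (stands in for HPSS tape)
        self.age = {}   # filename -> days since last write or recall

    def write(self, name, data):
        # Files written to disk are also copied to tape.
        self.disk[name] = data
        self.tape[name] = data
        self.age[name] = 0

    def advance(self, days):
        # As files age, their data is removed from disk; the name stays.
        for name in self.age:
            self.age[name] += days
            if self.age[name] >= self.purge_after_days:
                self.disk[name] = None

    def is_resident(self, name):
        return self.disk[name] is not None

    def read(self, name):
        # Accessing a purged file recalls its data from tape transparently.
        if self.disk[name] is None:
            self.disk[name] = self.tape[name]
            self.age[name] = 0
        return self.disk[name]


hsm = ToyHSM(purge_after_days=30)
hsm.write("results.dat", b"payload")
hsm.advance(31)
print(hsm.is_resident("results.dat"))  # data purged, name still present
print(hsm.read("results.dat"))         # data recalled from the tape copy
```

In this model the caller of `read` never needs to know whether the data was on disk or had to be recalled, which mirrors the point that the file looks the same to the user.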

The file aggregation feature of GHI improves tape drive performance.  On most file systems, 90% of the files take up 10% of the disk space: lots of small files.  Moving small files to tape usually kills tape drive performance.  Not with GHI.  To maximize tape drive performance, GHI bundles small files into large aggregates.  At SC07, we bundled 10,000 small files into each aggregate, and we were processing a dozen aggregates in parallel.  Rather than writing 120,000 small files to tape on a given policy scan, GHI wrote only twelve files to tape.  This resulted in tape write performance that was close to the tape drive limits!
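
The arithmetic behind the SC07 numbers can be sketched in a few lines. The example below is only an illustration of the bundling idea, not GHI's aggregation code: the `bundle` function and the generated file names are hypothetical, and a real aggregate would contain file data rather than just names.

```python
def bundle(files, files_per_aggregate):
    """Split a list of small files into fixed-size aggregates (hypothetical helper)."""
    return [
        files[i:i + files_per_aggregate]
        for i in range(0, len(files), files_per_aggregate)
    ]


# Made-up file names standing in for the small files selected by one policy scan.
small_files = [f"file_{i:06d}" for i in range(120_000)]

# 10,000 files per aggregate, as in the SC07 demo figures quoted above.
aggregates = bundle(small_files, 10_000)

print(len(small_files))  # 120000 small files selected
print(len(aggregates))   # 12 aggregates written to tape instead
```

Each aggregate becomes a single large tape write, so the tape drive sees twelve large objects rather than 120,000 small ones.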

As the HPSS Collaboration and our other customers know, HPSS has no problem dealing with HUGE files.  The HPSS distributed Mover technology allows HPSS to move huge files to tape FAST!  Do you need more performance than a single tape drive can offer?  HPSS can also stripe data across multiple tapes if needed.  Both GPFS and HPSS are distributed, parallel, and highly scalable by design, and can move data at incredible speeds.  That's why we say...

GPFS + HPSS = Extreme Storage Scalability!



For questions about HPSS for GPFS, contact Jim Gerry