HPSS for GPFS at SC07
Are you having problems with your existing data management
system? Are you running into scaling issues, perhaps
bottlenecked at a single server? Are you
finding it difficult to back up or HSM-manage your
files because you have too many? Are you
experiencing poor tape drive performance? If so,
you really need to keep reading!
The IBM Billion File Demo showcased the
General Parallel File System
(GPFS) Information Lifecycle Management (ILM) policy scan
performance, which was the springboard for introducing the new
GPFS/HPSS Interface (GHI) for the High Performance Storage
System (HPSS). At the Almaden Research Center,
a pre-GA version of GPFS is capable of scanning a single GPFS
file system containing a billion files in less than 15
minutes!
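To put that figure in perspective, the short calculation below works out the minimum sustained scan rate implied by "a billion files in less than 15 minutes". It is only back-of-the-envelope arithmetic on the two numbers quoted above; the variable names are illustrative.

```python
# Figures quoted in the text: one billion files, scanned in under 15 minutes.
files = 1_000_000_000
scan_seconds = 15 * 60  # 15 minutes, the upper bound on the scan time

# Because the scan finished in *less* than 15 minutes,
# this is a lower bound on the sustained scan rate.
min_rate = files / scan_seconds

print(f"more than {min_rate:,.0f} files per second")
```

In other words, the policy scan has to sustain more than a million files per second.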
GPFS/HPSS Interface (GHI)
Why is the speed of the GPFS ILM policy scan
important? GHI uses the policy scan results to manage the
GPFS disk resources using HPSS, IBM's highly scalable Hierarchical
Storage Management (HSM) system, and to back up the GPFS namespace to
HPSS tape. The faster the file system can be scanned, the
sooner GHI can begin moving data between GPFS and HPSS
tape. Furthermore, policy scans can take place at more
frequent intervals, resulting in better management of the file
system.
The backup feature of GHI captures a point-in-time snapshot
of the GPFS file system. If the GPFS file system should
fail, GHI can help rebuild it. The
restore feature of GHI repopulates the GPFS namespace using a
point-in-time backup. Once the namespace has been
restored, the file system is available for use. As files
are accessed, the file data is staged back to GPFS from HPSS tape.
The HSM feature of GHI manages the disk space of the GPFS
file system. GHI will allow you to store petabytes of GPFS
files on terabytes of GPFS disks. As files are written to
GPFS, they are copied to HPSS tape. As files age, the file
data is removed from the GPFS disks, leaving only the filename
behind. To the user, the file remains
unchanged. If the user should access one of these files,
GHI will automatically recall the file data back to GPFS from HPSS
tape. GPFS ILM policy scans can also be used to bulk stage
a set of files back to GPFS from HPSS tape.
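The copy, purge, and recall cycle described above can be pictured with a toy in-memory model. This is a minimal sketch for illustration only: `ManagedFile`, `copy_to_tape`, `purge_if_old`, `read`, and the 30-day threshold are all hypothetical names and values, not part of GHI, GPFS, or HPSS.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ManagedFile:
    """Toy stand-in for one file; not a real GHI data structure."""
    name: str
    disk_data: Optional[bytes]          # None once the data has been purged
    tape_data: Optional[bytes] = None   # None until a tape copy exists
    age_days: int = 0


def copy_to_tape(f: ManagedFile) -> None:
    # "As files are written to GPFS, they are copied to HPSS tape."
    if f.disk_data is not None:
        f.tape_data = f.disk_data


def purge_if_old(f: ManagedFile, max_age_days: int) -> None:
    # Free disk space only when a tape copy exists; the name stays behind.
    if f.age_days >= max_age_days and f.tape_data is not None:
        f.disk_data = None


def read(f: ManagedFile) -> bytes:
    # Accessing a purged file recalls its data from tape transparently.
    if f.disk_data is None:
        if f.tape_data is None:
            raise IOError(f"no copy of {f.name} on disk or tape")
        f.disk_data = f.tape_data
    return f.disk_data


f = ManagedFile("results.dat", b"payload")
copy_to_tape(f)
f.age_days = 90
purge_if_old(f, max_age_days=30)  # hypothetical age threshold
was_purged = f.disk_data is None
data = read(f)                    # triggers the recall
```

After the purge, the file name is still present but its data lives only on "tape"; the call to `read` brings the data back, which is the behavior the user sees as an unchanged file.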
The file aggregation feature of GHI improves tape drive
performance. On most file systems, 90% of the files take
up 10% of the disk space: lots of small files. Moving
small files to tape usually kills tape drive
performance. Not with GHI. To maximize tape
drive performance, GHI bundles small files into large
aggregates. At SC07, we bundled 10,000 small files into
each aggregate, and we were processing a dozen aggregates in
parallel. Rather than writing 120,000 small files to tape
on a given policy scan, GHI wrote only twelve files to
tape. This resulted in tape write performance that was
close to the tape drive limits!
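The arithmetic behind the SC07 numbers can be sketched with a simple batching function. This is an illustrative sketch, not GHI's implementation: the `bundle` helper and the generated file names are assumptions, and only the counts (10,000 files per aggregate, 120,000 files in total) come from the text.

```python
from typing import Iterable, Iterator, List


def bundle(paths: Iterable[str], files_per_aggregate: int) -> Iterator[List[str]]:
    """Group paths into aggregates of at most files_per_aggregate entries."""
    if files_per_aggregate < 1:
        raise ValueError("files_per_aggregate must be at least 1")
    batch: List[str] = []
    for path in paths:
        batch.append(path)
        if len(batch) == files_per_aggregate:
            yield batch
            batch = []
    if batch:
        # A final, partially filled aggregate is still written.
        yield batch


# Hypothetical file names standing in for the results of one policy scan.
small_files = [f"file_{i:06d}" for i in range(120_000)]
aggregates = list(bundle(small_files, files_per_aggregate=10_000))

print(f"{len(small_files):,} small files -> {len(aggregates)} tape writes")
```

Each aggregate is written as a single large object, so the tape drive sees twelve long sequential writes instead of 120,000 short ones.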
As the HPSS Collaboration and our other customers know, HPSS has
no problem dealing with HUGE files. The HPSS distributed
Mover technology allows HPSS to move huge files to tape
FAST! Do you need more performance than a single tape
drive can offer? HPSS can also stripe data across
multiple tapes if needed. Both GPFS and HPSS are
distributed, parallel, and highly scalable by design, and can move
data at incredible speeds. That's why we say...
GPFS + HPSS = Extreme Storage Scalability!
For questions about HPSS for GPFS, contact
Jim Gerry