
Are you having problems with your existing data management system? Are you having scaling issues, or are you bottlenecked at a single server? Are you finding it difficult to back up or HSM-manage your files because you have too many? Are you experiencing poor tape drive performance? If so, you really need to keep reading!
The IBM Billion File Demo showcased the policy scan performance of General Parallel File System (GPFS) Information Lifecycle Management (ILM), which was the springboard for introducing the new High Performance Storage System GPFS/HPSS Interface (GHI). At the Almaden Research Center, a pre-GA version of GPFS is capable of scanning a single GPFS file system containing a billion files in less than 15 minutes, a rate of more than a million files per second!
Why is the speed of the GPFS ILM policy scan important? GHI uses the policy scan results to manage the GPFS disk resources using HPSS, IBM's highly scalable Hierarchical Storage Management (HSM) system, and to back up the GPFS namespace to HPSS tape. The faster the file system can be scanned, the sooner GHI can begin copying data between GPFS and HPSS tape. Furthermore, policy scans can take place at more frequent intervals, resulting in better management of the file system.
The backup feature of GHI captures a point-in-time snapshot of the GPFS file system. If the GPFS file system should fail, GHI can help rebuild your GPFS file system. The restore feature of GHI re-populates the GPFS namespace, using a point-in-time backup. Once the namespace has been restored, the file system is available for use. As files are accessed, the file data are staged back to GPFS, from HPSS tape.
The HSM feature of GHI manages the disk space of the GPFS file system. GHI will allow you to store petabytes of GPFS files on terabytes of GPFS disks. As files are written to GPFS, they are copied to HPSS tape. As files age, the file data are removed from the GPFS disks, leaving only the filename behind. To the user, the file remains unchanged. If the user should access one of these files, GHI will automatically recall the file data back to GPFS, from HPSS tape. GPFS ILM policy scans can also be used to bulk stage a set of files back to GPFS, from HPSS tape.
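The life cycle described above (copy to tape on write, purge the disk copy as the file ages, recall on access) can be illustrated with a toy model. This is only a sketch for intuition, not GHI's implementation: the class names, the `purge_after_days` threshold, and the in-memory bookkeeping are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ManagedFile:
    """Hypothetical record for one file under HSM control."""
    name: str
    size: int
    age_days: int = 0
    on_disk: bool = True    # file data present on the disk tier
    on_tape: bool = False   # a copy exists on the tape tier


class ToyHsm:
    """Toy model: names stay visible, data moves between disk and tape."""

    def __init__(self, purge_after_days):
        self.purge_after_days = purge_after_days  # assumed age threshold
        self.files = {}

    def write(self, name, size):
        # New files land on disk and are copied to tape.
        self.files[name] = ManagedFile(name, size, on_tape=True)

    def run_policy(self):
        # Purge disk data for old files that already have a tape copy.
        for f in self.files.values():
            if f.on_disk and f.on_tape and f.age_days >= self.purge_after_days:
                f.on_disk = False  # only the name remains on disk

    def read(self, name):
        f = self.files[name]
        if not f.on_disk:
            f.on_disk = True  # recall the data from tape on access
        return f

    def disk_bytes(self):
        return sum(f.size for f in self.files.values() if f.on_disk)


hsm = ToyHsm(purge_after_days=30)
hsm.write("a.dat", 100)
hsm.write("b.dat", 200)
hsm.files["a.dat"].age_days = 45   # pretend a.dat has aged
hsm.run_policy()
used_after_purge = hsm.disk_bytes()    # only b.dat still holds disk space
name_still_visible = "a.dat" in hsm.files
hsm.read("a.dat")                      # access triggers a recall
used_after_recall = hsm.disk_bytes()
```

In this model the purged file keeps its entry in the namespace, which is why the user sees it as unchanged even though its data lives only on tape until the next access.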
The file aggregation feature of GHI improves tape drive performance. On most file systems, 90% of the files take up 10% of the disk space -- lots of small files. Copying small files to tape usually kills tape drive performance. Not with GHI. To maximize tape drive performance, GHI bundles small files into large aggregates. At SC07, we bundled 10,000 small files into each aggregate, and we were processing a dozen aggregates in parallel. Rather than writing 120,000 small files to tape, on a given policy scan, GHI only wrote twelve files to tape. This resulted in tape write performance that was close to the tape drive limits!
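The SC07 arithmetic can be checked with a short sketch. It only models the counting (how many objects reach tape), not GHI's actual aggregate format, and the file names and the `bundle` helper are hypothetical.

```python
from itertools import islice


def bundle(paths, files_per_aggregate):
    """Yield lists of at most files_per_aggregate paths, in order."""
    it = iter(paths)
    while True:
        chunk = list(islice(it, files_per_aggregate))
        if not chunk:
            return
        yield chunk


# Hypothetical stand-ins for the small files selected by one policy scan.
small_files = [f"file_{i:06d}" for i in range(120_000)]

# 10,000 files per aggregate, as in the SC07 example.
aggregates = list(bundle(small_files, 10_000))
tape_writes = len(aggregates)
```

With these numbers, `tape_writes` is 12, so the tape drives see a dozen large sequential writes instead of 120,000 small ones.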
As the HPSS Collaboration and our other customers know, HPSS also has no problems dealing with HUGE files. Do you need more performance than a single tape drive can offer -- perhaps you have a requirement to copy a 1 TB file to tape in less than 30 minutes? HPSS can stripe a file across multiple tapes to meet these types of requirements. The HPSS distributed Mover technology allows a single HPSS instance to achieve a very high total system throughput rate.
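To see why striping is needed for the 1 TB example, the required rate can be worked out directly. The sketch below is a back-of-the-envelope calculation: the per-drive rate of 120 MB/s is an assumed, illustrative figure rather than a number from HPSS or any specific drive, and 1 TB is taken as 10^12 bytes.

```python
import math


def required_rate_mb_s(size_bytes, seconds):
    """Sustained rate in MB/s needed to move size_bytes in the given time."""
    return size_bytes / seconds / 1e6


def stripe_width(required_mb_s, drive_mb_s):
    """Smallest number of drives whose combined rate meets the requirement."""
    return math.ceil(required_mb_s / drive_mb_s)


ONE_TB = 1e12                 # decimal terabyte, an assumption
DEADLINE_S = 30 * 60          # 30 minutes
ASSUMED_DRIVE_MB_S = 120.0    # hypothetical per-drive streaming rate

rate = required_rate_mb_s(ONE_TB, DEADLINE_S)
width = stripe_width(rate, ASSUMED_DRIVE_MB_S)
```

Under these assumptions the transfer needs roughly 556 MB/s sustained, which is more than one drive at the assumed rate can deliver, so the file would have to be striped across five tapes written in parallel.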
Both GPFS and HPSS are distributed, parallel and highly scalable by design, and can move data at incredible speeds. That's why we say...
2012 HPSS Users Forum - The 2012 HPSS Users Forum (HUF) will be hosted by CEA and held at the Hotel Pullman (Paris Bercy) in Paris, France. The conference will run October 15th through noon on October 18th. The conference website is under construction and, when available, will allow users to register, view the draft agenda, and obtain other related information about the meeting. The length of the conference is the same as last year - 3 1/2 days, with time available in the evening to visit/tour Paris.
Purdue University: Data Storage Archive - Information Technology at Purdue will be upgrading the Fortress archive system from EMC's DiskXtender (DXUL) to IBM's High Performance Storage System (HPSS). In addition to the new archive data, HPSS will manage and retrieve the legacy data on DXUL tapes. For more details, please visit http://www.rcac.purdue.edu/news/detail.cfm?NewsID=475
Library of Congress: Data Storage Archive - The Library of Congress has acquired HPSS for use in the National Audiovisual Conservation Center. The NAVCC is located in Washington, DC and at the Packard Campus for Audio-Visual Conservation, located in Virginia.