Deploying a High-Performance Filesystem on BGL

Hardware

In these early stages we had some difficulty getting PVFS2 running on all 16 storage nodes. At the moment 12 of them are in service, providing in aggregate a 1.1 TB PVFS2 volume.

Hardware benchmarks

Each storage node has a RAID array for PVFS2 storage. To get some idea of the performance of the disk subsystem, here is a run of bonnie++ on one of the storage nodes (fs2).

Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
fs2             10G 38476  95 49354  14 23415   5 35808  72 63971   5 557.3   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  3113  99 +++++ +++ +++++ +++  3138  99 +++++ +++  9620 100
fs2,10G,38476,95,49354,14,23415,5,35808,72,63971,5,557.3,0,16,3113,99,+++++,+++,+++++,+++,3138,99,+++++,+++,9620,100
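For reference, the "Sequential Output, Block" figure above corresponds to timing large sequential writes to a single file. The following is only a minimal sketch of that kind of measurement, not bonnie++ itself; the file name, block size, and total size are arbitrary choices.

/* seqwrite.c -- minimal sequential-write throughput sketch (not bonnie++).
 * Writes SIZE_MB megabytes in 1 MB blocks and reports MB/s.
 * The output file name and sizes are arbitrary choices for illustration. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

#define BLOCK   (1024 * 1024)   /* 1 MB per write() */
#define SIZE_MB 1024            /* total data written: 1 GB */

int main(void)
{
    char *buf = malloc(BLOCK);
    struct timeval t0, t1;
    double secs;
    int i, fd;

    memset(buf, 0xAA, BLOCK);
    fd = open("testfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    gettimeofday(&t0, NULL);
    for (i = 0; i < SIZE_MB; i++) {
        if (write(fd, buf, BLOCK) != BLOCK) { perror("write"); return 1; }
    }
    fsync(fd);                  /* flush to disk, not just the page cache */
    gettimeofday(&t1, NULL);
    close(fd);

    secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    printf("%d MB in %.2f s = %.1f MB/s\n", SIZE_MB, secs, SIZE_MB / secs);
    free(buf);
    return 0;
}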

Software

The storage, login, and IO nodes are all running a CVS snapshot of PVFS2 from Feb. 16th. Our BlueLight version is DRV521_2004-050113. The login and storage nodes run SLES 9.

Items of note

Benchmarks

mpi-io-test

mpi-io-test is a simple MPI-IO contiguous access benchmark. Each process writes a large chunk of data to a non-overlapping, non-interleaved region of a file and then reads it back. It reports the aggregate IO performance of all processes involved in the job. We would expect this benchmark to give an upper bound on IO performance.
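In simplified form, the access pattern looks like the sketch below. This is not the actual mpi-io-test source; the file name and the 32 MB per-process buffer size are just examples.

/* Sketch of the mpi-io-test access pattern (not the actual benchmark source):
 * each rank writes one contiguous, non-overlapping block at offset
 * rank*BUFSIZE, then reads it back.  File name and buffer size are examples. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mpi.h>

#define BUFSIZE (32 * 1024 * 1024)   /* 32 MB per process */

int main(int argc, char **argv)
{
    int rank;
    char *buf;
    MPI_File fh;
    MPI_Offset offset;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    buf = malloc(BUFSIZE);
    memset(buf, rank & 0xff, BUFSIZE);
    offset = (MPI_Offset)rank * BUFSIZE;   /* non-overlapping, non-interleaved */

    MPI_File_open(MPI_COMM_WORLD, "pvfs2:/pvfs/testfile",
                  MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &fh);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    MPI_File_write_at(fh, offset, buf, BUFSIZE, MPI_BYTE, MPI_STATUS_IGNORE);
    MPI_Barrier(MPI_COMM_WORLD);
    t1 = MPI_Wtime();
    if (rank == 0)
        printf("write time: %f s\n", t1 - t0);

    /* read the same region back */
    MPI_File_read_at(fh, offset, buf, BUFSIZE, MPI_BYTE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}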

We ran mpi-io-test across the entire rack, varying the number of compute nodes as well as the amount of data each process wrote to the PVFS2 file. Read performance topped out at 600 MBytes/sec with 1024 nodes, each transferring 32 MB chunks. Write performance was quite erratic, with peak write bandwidth topping out at around 150 MBytes/sec.

[PVFS2 read performance on 1024 nodes] [PVFS2 write performance on 1024 nodes]

coll_perf

In coll_perf, the program writes a three-dimensional array to a file. All processes perform collective IO. Data pending.
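For reference, a collective write in this style can be expressed with a subarray file view and MPI_File_write_all. The sketch below is not the coll_perf source; the process grid, the 128^3 local block size, and the file name are assumptions for illustration.

/* Sketch of a coll_perf-style collective write (not the actual test source):
 * a global 3D array is block-distributed over a 3D process grid, each rank
 * describes its piece with a subarray datatype, and all ranks write it
 * collectively.  Array dimensions and file name are arbitrary examples. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, nprocs, i;
    int dims[3] = {0, 0, 0}, periods[3] = {0, 0, 0}, coords[3];
    int gsizes[3], lsizes[3], starts[3], nelems = 1;
    MPI_Comm cart;
    MPI_Datatype filetype;
    MPI_File fh;
    int *local;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* build a 3D process grid and find this rank's coordinates in it */
    MPI_Dims_create(nprocs, 3, dims);
    MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods, 0, &cart);
    MPI_Cart_coords(cart, rank, 3, coords);

    for (i = 0; i < 3; i++) {
        lsizes[i] = 128;                    /* local block: 128^3 ints (example) */
        gsizes[i] = lsizes[i] * dims[i];    /* global array size */
        starts[i] = coords[i] * lsizes[i];  /* this rank's offset in the array */
        nelems *= lsizes[i];
    }
    local = malloc(nelems * sizeof(int));
    memset(local, 0, nelems * sizeof(int));

    /* file view: this rank sees only its subarray of the global 3D array */
    MPI_Type_create_subarray(3, gsizes, lsizes, starts, MPI_ORDER_C,
                             MPI_INT, &filetype);
    MPI_Type_commit(&filetype);

    MPI_File_open(cart, "pvfs2:/pvfs/coll_test",
                  MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &fh);
    MPI_File_set_view(fh, 0, MPI_INT, filetype, "native", MPI_INFO_NULL);

    /* every process participates in the collective write */
    MPI_File_write_all(fh, local, nelems, MPI_INT, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Type_free(&filetype);
    free(local);
    MPI_Finalize();
    return 0;
}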

mpi-tile-io

Not done yet


Last update:

Tue Mar 29 16:38:05 CST 2005
