5/14/2012 - Upcoming CIFTS-related publication at a conference
The CIFTS OSU team will present their research and publication, "Monitoring and Predicting Hardware Failures in HPC Clusters with FTB-IPMI", at the International Workshop on System Management Techniques, Processes, and Services (SMTPS), held in conjunction with the International Parallel and Distributed Processing Symposium (IPDPS '12), on May 21, 2012, in Shanghai.
5/10/2011 - Upcoming CIFTS talks/presentations at conferences
The CIFTS team will be presenting several CIFTS-related papers at various conferences and workshops. Highlights include:
- "Realization of User-Level Fault Tolerance Policy Management through a Holistic Approach for Fault Correlation", B-H. Park, T. Naughton et al., IEEE International Symposium on Policies for Distributed Systems and Networks (POLICY), June 2011
- "Strategies for Fault Tolerance in Multicomponent Applications", A. Shet, W. Elwasif et al., Proceedings of the International Conference on Computational Science (ICCS 2011), June 2011
- "Co-Analysis of RAS Log and Job Log on Blue Gene/P", Z. Zheng, L. Yu et al., 25th IEEE International Parallel and Distributed Processing Symposium (IPDPS '11), May 2011
4/25/2011 - Open MPI 1.5.3 released with FTB support
Open MPI 1.5.3 was released today with FTB support. More information and detailed instructions on the release, as well as a list of supported FTB fault events, can be found on the Open Systems Laboratory website.
10/15/2010 - CIFTS at SC'10
It's SC time again! Come visit us at the Supercomputing SC'10 conference this year and share your CIFTS experiences. The CIFTS team has several activities planned for SC'10:
- Tuesday 16th, 12:15 - 1:15pm, CIFTS: Birds of a Feather Session, Room: 388
- Tuesday 16th, 2:00 - 2:45pm, CIFTS Demos, ANL Booth#: 2513
- Tuesday 16th, 3:00 - 4:00pm, CIFTS Round Table Discussion, LBNL Booth#: 2448
- Wednesday 17th, 2:00 - 2:45pm, CIFTS Demos, ANL Booth#: 2513
- Thursday 18th, 1:30 - 2:15pm, FTB-Enabled MPI: Experience with MVAPICH2, a Featured Booth talk by D.K. Panda, ANL Booth#: 2513
Click here to download the SC'10 CIFTS flyer.
11/14/2010 - FTB-enabled MPICH2 1.3.1 Released
MPICH2 1.3.1 from Argonne National Laboratory is FTB-enabled. More information can be found on the MPICH2 Wiki Page. With the 1.3.1 release, MPICH2 is now fully compliant with the FTB MPI standardized events 1.0 and publishes all events described in that standard.
11/13/2010 - FTB-ENABLED PROCESS MIGRATION Support NOW AVAILABLE in MVAPICH2 1.6
MVAPICH2 1.6 from Ohio State University now provides FTB-enabled support for both Checkpoint-Restart and Process Migration. More information on these features can be found on the OSU FTB website. The MVAPICH2 1.6 code base and the associated user guide, which explains how to use these features, can be downloaded from the MVAPICH website.
08/16/2010 - FTB SOFTWARE VERSION 0.6.2 Now Available
The CIFTS team is pleased to announce the release of FTB software version 0.6.2, based on the FTB API 0.5 specification. This release is supported on IBM BG/P, Cray XT, and Linux systems.
The ftb-0.6.2 software is available for download here.
08/06/2010 - FTB SOFTWARE VERSION 0.6.2rc1 now available
FTB software version 0.6.2.rc1 is now available here. Please contact the CIFTS team if you wish to be a part of the testing effort.
07/07/2010 - FTB SOFTWARE VERSION 0.6.2b1 now available
FTB software version 0.6.2.b1 is now available here. Please contact the CIFTS team if you wish to be a part of the testing effort.
06/01/2010 - FTB SOFTWARE VERSION 0.6.2a1 now available
FTB software version 0.6.2.a1 is now available here. Please contact the CIFTS team if you wish to be a part of the testing effort.
2/2/2010 - Download CIFTS Demonstration video from SC'09
Did you miss us at SC'09? Several demonstrations of FTB and FTB-enabled software, such as MPICH2, MVAPICH, Open MPI, FT-LA, the log monitoring tool for Cray, and various applications, were presented at the Supercomputing Conference in Portland in November 2009. If you missed the demonstrations, you can download the POV-RAY demonstration now. This video demonstrates process resilience in Open MPI using CIFTS.
11/10/2009 - CIFTS at SC'09
Come visit us at SC! The CIFTS team will be in Portland, Oregon for the Supercomputing Conference. We have several activities planned for this year:
- Tuesday, Nov 17th, 12:15-1:15 pm - Room D137-138 - CIFTS BOF
- Tuesday, Nov 17th, 2:00-3:00pm - ANL booth (#644) - CIFTS Demos
- Tuesday, Nov 17th, 3:00-4:30pm - LBNL booth (#723) - Round Table Discussion on Checkpoint/Restart Research
- Tuesday, Nov 17th, 5:15-7:00pm, Oregon Ballroom Lobby - Poster on "FTB-Enabled Failure Prediction for Blue Gene/P Systems"
- Wednesday, Nov 18th, 2:00-3:00pm - ANL booth (#644) - CIFTS Demos
The CIFTS demos this year will present failure scenarios with several FTB-enabled software packages. These include FTB-enabled applications such as the Integrated Plasma Simulator, the FTB-enabled FT-LA library, the FTB-enabled MPICH2 library, the FTB-enabled MVAPICH library, FTB-enabled Open MPI, and log monitoring using FTB on Cray, among others. Click here to download the CIFTS Supercomputing'09 flier.
11/10/2009 - FTB-enabled MVAPICH2 1.4 NOW AVAILABLE
09/14/2009 - FTB software version 0.6.1 Now Available
The CIFTS team is pleased to announce the release of FTB Version 0.6.1, based on the FTB API 0.5 specification. This release is supported on IBM BG/P, Cray XT, and Linux (Ubuntu 9.04) systems.
The ftb-0.6.1 software is available for download here.
We welcome your feedback and comments. We look forward to working with the community to develop FTB-enabled software for improving the reliability of large-scale systems.
08/12/2009 - CIFTS BOF at Supercomputing 2009
Following the success of the Birds-of-a-Feather sessions held at the Supercomputing conference for the last two years, we will again be holding a CIFTS BOF at the SC'09 conference. SC'09 will be held this year at the Oregon Convention Center in Portland, Oregon. The CIFTS BOF is scheduled for Tuesday, Nov 17th from 12:15PM - 1:15PM in room D137-138. Time to mark your calendars!
05/21/2009 - CIFTS publication at ICPP 2009
The publication "CIFTS: A Coordinated Infrastructure for Fault-Tolerant Systems", which details the CIFTS framework and Fault Tolerance Backplane architecture, will be published in the Proceedings of the 38th International Conference on Parallel Processing (ICPP 2009), to be held in Vienna, Austria from September 22-25, 2009. The paper can be downloaded here.
1/16/09 - Berkeley Lab Checkpoint/Restart Software v0.8.0 is FTB-enabled
Lawrence Berkeley National Laboratory's popular Checkpoint/Restart Software BLCR (version 0.8.0) is now FTB-enabled. Users can download this software from the BLCR website. Please refer to README.FTB (packaged with the software) for more information on how to install this software and on the fault events it supports with the Fault Tolerance Backplane (FTB).
12/10/08 - CIFTS team and IIT win The Cray Log Analysis Contest
The ORNL CIFTS team, in collaboration with Ziming Zheng from Zhiling Lan's team at IIT, is a winner of "The Cray Log Analysis Contest" at the First USENIX Workshop on the Analysis of System Logs (WASL'08), which was co-located with the 8th USENIX Symposium on Operating Systems Design and Implementation (OSDI'08).
11/10/08 - FTB-enabled InfiniBand Monitoring Software Release 1.0 now available
Ohio State University's Network-based Computing Laboratory is pleased to announce the release of the first version of the FTB-Enabled InfiniBand Monitoring software, which currently supports FTB version 0.6. The FTB-InfiniBand software uses the FTB infrastructure to notify other FTB-enabled components about failures in the InfiniBand network.
You can download the software from the CIFTS Software page or from the NOWLAB FTB-InfiniBand website.
11/04/08 - CIFTS at Supercomputing 2008!
CIFTS has several events going on throughout the conference. Here is a short listing of the activities:
- Tuesday Nov 18th, 12:15-1:15 PM - BallRoom G - CIFTS BOF
- Tuesday Nov 18th, 2:00-3:00 PM - ANL(558) - CIFTS Demos
- Tuesday Nov 18th, 1:30-3:00 PM - LBNL(540) - Checkpoint/Restart Round Table Discussion
- Tuesday Nov 18th, 5:15-7:00 PM - Rotunda Lobby - Poster Presentation on 'Analyzing Failure Events on ORNL's Cray XT4'
- Wednesday Nov 19th, 2:00-3:00 PM - ANL(558) - CIFTS Demos
Click here to download the CIFTS Supercomputing'08 Flyer.
Be sure to drop by and check us out. See you in Austin!
09/04/08 - FTB SOFTWARE version 0.6 now available
The first release of FTB, named FTB version 0.6 and based on the FTB API 0.5 specification, is now publicly available. This release supports Linux (Ubuntu Feisty, Hardy) clusters, IBM BG/L, IBM BG/P, and Cray XT4.
The FTB software can be downloaded from here.
06/01/08 - FTB API 0.5 specification now publicly available
The CIFTS team is proud to release the FTB API version 0.5 specification to the High Performance Computing community. The CIFTS FTB API 0.5 specification defines an interface that allows libraries, runtime systems, and applications to exchange fault-related information.
The specification can be downloaded from here.
Please email feedback to firstname.lastname@example.org (public mailing list). You can also reach us at email@example.com.
08/20/08 - Supercomputing 2008 news
We will be having a CIFTS BOF at SC'08 on Tuesday, November 18th from 12:15PM - 1:15PM.
In addition, we will be having a set of presentations and demos to update the community on the progress of CIFTS and FTB. The schedule will be made available on this website in the coming weeks.
07/29/08 - News Update
First, apologies for being slow on the news front. But we've got good news :-). We have finished porting our CIFTS FTB code to the new IBM BG/P at ANL. Right now, we are gearing up for the public release of the FTB API and the code by the end of August 2008. Stay tuned for more exciting details.
11/27/07 - Thank you!
Thank you everyone for the great feedback at SC'07. We are working on incorporating some of the feedback received into our FTB design and implementation. We are aiming for a public release of our software in mid-2008. Stay tuned. Meanwhile, you can get updates on our activities on our CIFTS wiki at http://wiki.mcs.anl.gov/cifts/index.php
11/01/07 - CIFTS at SC'07
The CIFTS team will be organizing various demos, talks, and a Birds-of-a-Feather session at SC'07.
- Tuesday Nov 13th, 1:30-2:00 pm - ANL(551) - Presentation talk on CIFTS
- Tuesday Nov 13th, 2:00-3:00 pm - ANL(551) - CIFTS Demos
- Wednesday Nov 14th, 10:00-11:00 am - LBNL(351) - Presentation talk on CIFTS and BLCR
- Wednesday Nov 14th, 12:15-1:15 pm - Room A3/A4 - CIFTS BOF
- Wednesday Nov 14th, 2:00-3:00 pm - ANL(551) - CIFTS Demos
- Thursday Nov 15th, 2:00-3:00 pm - ANL(551) - CIFTS Demos
10/25/07 - CIFTS Demos at SC'07
The CIFTS team will be organizing various demos at SC'07. These demos will showcase the following:
- FTB-Enabled MPICH2 with FTB-Enabled Cobalt,
- FTB-Enabled Molecular Dynamics application and
- FTB-Enabled InfiniBand network monitoring
Stay tuned for more information on the demo schedule.
09/15/07 - CIFTS Birds-of-a-Feather (BOF) at Supercomputing'07
We will be having a CIFTS BOF at SC'07 on Wednesday, November 14, 2007 from 12:15PM - 1:15PM in Room A3/A4.
We hope to have an open discussion about the usefulness, impact, and adoption of a comprehensive fault-tolerance framework like CIFTS in enterprise and research environments. We also hope to interact with you to better understand the fault management and fault-tolerance challenges being faced in today's environments.