Here are the results of Otter 3.1 and MACE 1.4 on TPTP v2.3.0.
These are the versions of Otter and MACE that will be entered in the CASC-17 ATP System Competition.
Otter (which searches for proofs) and MACE (which searches for models/counterexamples) are separate programs. Some of the TPTP problems are satisfiable, so the two programs were combined with a shell script (otter-mace, that runs Otter for up to 5 seconds, then (if no proof was found) runs MACE for up to 5 seconds, then (if no counterexample was found) runs Otter for up to for the rest of the allowed time.
TPTP v2.3.0 has 4229 problems. Most have proofs (are unsatisfiable) and some have models. Otter-MACE found proofs for 1902 problems, and models for 194 problems, giving about 49.5% success rate (given 5 minutes for each problem) over the whole TPTP.
These figures were obtained in May 2000 on a (400 MHz PII) linux box.
The table has entries of the following form.
Problem Result Reason Seconds Memory Generated Size --------------------------------------------------------------------------- ALG002-1 PROOF --- 1 223K 1851 10 ALG008-1 MODEL --- 6 - - 3 ALG001-1 fail time 300 ??K ?? - --------------------------------------------------------------------------- Problem: the TPTP problem name Result/Model: PROOF, MODEL, fail Reason: reason for failure Seconds: time used Memory: memory used (applies only to Otter searches) Generated: clauses generated (applies only to Otter searches) Size: proof length or model size
Here is the table (gzipped plain text).
tptp2X -f otter:hypothesis:[set(tptp_eq),set(auto),clear(print_given)] -t rm_equality:stfp
These activities are projects of the Mathematics and Computer Science Division of Argonne National Laboratory.