next up previous
Next: Analysis of Results Up: The Parallel Closure Algorithm Previous: Portable Implementation of Parallelism

Results

(Sam,some ec,imp4,semigroup(223), maybe apbhp)  

In this section we discuss the workings of ROO in practice and present some experimental results.

At first glance one's intuition might suggest that the ability of ROO to deliver the expected performance would be hobbled by the restriction that only one process can be carrying out Task B at a time. If Task B falls behind (the K list grows continually, and never emptied), there will be two bad effects. The first is that the use of an important generated clause may be delayed by a long residence in the K list, thus postponing the proof. The second is that, particularly for large numbers of processes, Task B may not be able to keep the pool of work units filled by adding new, fully processed clauses to the set of support. In this situation, idle processes will await work in the askfor monitor, wasting potential processing power.

In practice, however, this situation does not arise as often as one might think. (It does indeed arise however; see Section ?? below). To understand just what does happen as ROO proceeds, it is useful to study its behavior with a general purpose tool developed at Argonne called Atrace [1].


  
Figure: Beginning of Sam's Lemma
7#7

In Figure [*] we see the situation at the beginning of the run of Sam's Lemma on ROO with sixteen processes on a Sequent Symmetry. (Describe start up, set of support size, task B, etc.) But later in the run (3 / 4sec. later) the situation has changed, and the restriction that only one process can do B at a time is not a restriction. All processes are busy.

(upshot picture of Sam at 750 millisecs.)

What has happened is that although the instances of A are generating new candidate clauses, after a while the existing clause space has expanded to the point where most new clauses are subsumed by Task A and never make it into the K list. When K does become non-empty, it quickly emptied by the next process to become available.


 
Table: Sam's Lemma
  OTTER ROO1 ROO4 ROO8 ROO16 ROO24
seconds 26.5(1.02) 27.1(1.0) 7.5(3.6) 4.1(6.6) 2.6(10.4) 2.1(12.9)
generated 5924 5981 6139 6077 6199 6278
gen/sec 224(1.02) 220(1.0) 819(3.7) 1482(6.7) 2384(10.8) 2990(13.5)
kept 134 132 131 131 130 131
memory 96K 128K 192K 319K 575K 830K
 


 
Table: The IMP-4 problem of Lukaciewicz Logic
  OTTER ROO1 ROO4 ROO8 ROO16 ROO24
seconds 27303(1.1) 30458(1.0) 7398(4.1) 3541(8.6) 1876(16.2) 1199(25.4)
generated 6706380 6668046 6369094 6142292 6352314 6251041
gen/sec 246(1.1) 218(1.0) 860(3.9) 1734(7.9) 3386(15.5) 5213(23.9)
kept 20410 20342 20323 20571 18951 15849
memory 7185K 7664K 8590K 10250K 12965K 12039K
 


 
Table: Finding the size of F3B2
  roo1 roo8 roo16 roo24
run time(*) 1120.38 143.94 71.94 50.57
generated 8610 8610 8610 8610
kept 1429 1429 1429 1429
memory (K) 6259 8334 11464 16541
speedups:        
``proof'' 1.0 7.8 15.6 22.2

 

A smaller version of this problem is described in [2].


next up previous
Next: Analysis of Results Up: The Parallel Closure Algorithm Previous: Portable Implementation of Parallelism
Karen D. Toonen
1998-11-19