
  
Superconductivity--Vortex Structures

(Contributed by Mario Palumbo and Paul Plassmann)


 

Table: Comparison of the CPU and elapsed times for three different cases: (1) the Intel Touchstone DELTA, (2) the IBM SP1 running EUIH, and (3) the IBM SP1 running p4. All cases are for 100 BFGS iterations with a constant global domain size of 17#17.

Number of        Intel DELTA    IBM SP1 (EUIH)      IBM SP1 (p4)
Processors      CPU  Elapsed      CPU  Elapsed      CPU  Elapsed
1                 -        -   203.89   205.46   203.90   205.29
2            307.67   308.00    86.09    86.75    81.26   118.44
4            160.26   160.00    37.54    37.61    33.29   112.46
8             79.33    80.00    21.33    21.50    17.94   196.09
16            43.13    43.00    12.75    12.97        -        -

 



 

Table: Comparison of the CPU and elapsed times for two cases: (1) the Intel Touchstone DELTA and (2) the IBM SP1 running EUIH. All cases are for 100 BFGS iterations with a constant local domain size of 18#18.

Number of        Intel DELTA    IBM SP1 (EUIH)
Processors      CPU  Elapsed      CPU  Elapsed
1             73.71    74.00    15.91    16.01
2             76.27    76.00    17.71    17.90
4             77.85    78.00    19.33    19.49
8             79.33    80.00    21.46    21.57
16            80.57    81.00    22.75    22.98

 


We developed a parallel code that uses the limited-memory BFGS algorithm [12] to find optimal vortex solutions within the three-dimensional anisotropic Ginzburg-Landau model. Our implementation handles arbitrary field orientations as well as various types of random and correlated disorder. The code is currently being used to study properties of uniaxial superconductors such as the lower critical field and the anomalous "vortex-chain" state.
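For readers unfamiliar with limited-memory BFGS, the following is a minimal serial sketch of the generic method (two-loop recursion with a simple backtracking line search). It is not the code described here: the function names, the memory size m, the line-search constants, and the toy quadratic objective are all assumptions chosen for illustration, and the real application minimizes a Ginzburg-Landau free energy over a distributed mesh.

```python
from collections import deque


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def lbfgs(f, grad, x0, m=5, iters=100, tol=1e-10):
    """Minimize f from x0, storing at most m correction pairs (s, y)."""
    x = list(x0)
    g = grad(x)
    pairs = deque(maxlen=m)  # each entry: (s, y, rho) with rho = 1 / (s . y)
    for _ in range(iters):
        if dot(g, g) ** 0.5 < tol:
            break
        # Two-loop recursion: compute r = H * g without forming H.
        q = list(g)
        alphas = []
        for s, y, rho in reversed(pairs):  # newest pair first
            a = rho * dot(s, q)
            alphas.append(a)
            q = [qi - a * yi for qi, yi in zip(q, y)]
        if pairs:
            s, y, _ = pairs[-1]
            gamma = dot(s, y) / dot(y, y)  # scaling of the initial Hessian guess
        else:
            gamma = 1.0
        r = [gamma * qi for qi in q]
        for (s, y, rho), a in zip(pairs, reversed(alphas)):  # oldest pair first
            b = rho * dot(y, r)
            r = [ri + (a - b) * si for ri, si in zip(r, s)]
        d = [-ri for ri in r]  # search direction
        # Backtracking line search with an Armijo sufficient-decrease test.
        t, fx, slope = 1.0, f(x), dot(g, d)
        while t > 1e-12 and f([xi + t * di for xi, di in zip(x, d)]) > fx + 1e-4 * t * slope:
            t *= 0.5
        x_new = [xi + t * di for xi, di in zip(x, d)]
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        sy = dot(s, y)
        if sy > 1e-12:  # keep only pairs that preserve positive curvature
            pairs.append((s, y, 1.0 / sy))
        x, g = x_new, g_new
    return x


# Toy objective (an assumption for illustration): minimum at (1, -2).
def f(x):
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2


def grad(x):
    return [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]


x_min = lbfgs(f, grad, [0.0, 0.0])
print(x_min)
```

Only the last m pairs of position and gradient differences are stored, which is what keeps the memory cost linear in the number of unknowns and makes the method attractive for large meshes.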

The parallelization was achieved through a simple three-dimensional domain decomposition scheme in which the global domain is partitioned across an arbitrary number of processors. Communication between processors is carried out using the Chameleon parallel software package (see Section [*]). The portability of the Chameleon primitives has allowed us to run the code on a variety of parallel platforms, using several different parallel communication paradigms, without any coding changes. Performance comparisons for a selection of these cases are provided in the two tables above. Note the superlinear speedup in the SP1 results in the first table (for example, the EUIH CPU time drops from 203.89 on one processor to 86.09 on two, a factor of about 2.4); this is most likely caused by cache effects. Also note the CPU columns of the SP1 (p4) results: they show very good performance in a time-shared environment, even though the elapsed times are relatively poor (17.94 CPU versus 196.09 elapsed on 8 processors).
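The superlinear speedup can be checked directly from the fixed-global-domain table. The short sketch below computes speedup and parallel efficiency from the IBM SP1 (EUIH) CPU column; the variable and function names are illustrative, and the numbers are copied from the table as printed.

```python
# IBM SP1 (EUIH) CPU times from the fixed-global-domain table,
# keyed by number of processors.
sp1_euih_cpu = {1: 203.89, 2: 86.09, 4: 37.54, 8: 21.33, 16: 12.75}


def speedup_and_efficiency(times, base_procs=1):
    """Return {procs: (speedup, efficiency)} relative to the base run."""
    t_base = times[base_procs]
    result = {}
    for procs, t in sorted(times.items()):
        speedup = t_base / t
        efficiency = speedup / (procs / base_procs)
        result[procs] = (speedup, efficiency)
    return result


scaling = speedup_and_efficiency(sp1_euih_cpu)
for procs, (speedup, efficiency) in scaling.items():
    print(f"{procs:2d} processors: speedup {speedup:5.2f}, efficiency {efficiency:4.2f}")
```

An efficiency above 1 marks superlinear speedup. With these figures it occurs on 2, 4, and 8 processors and falls back to just under 1 on 16, which fits the cache explanation: once each processor's share of the fixed global domain is small enough to sit in cache, further subdivision brings no extra benefit.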

The second table shows the performance as the local domain size is held fixed and the number of unknowns grows proportionally with the number of processors. From 1 to 16 processors the SP1 CPU time rises from 15.91 to 22.75 (about 43%), whereas the DELTA time rises only from 73.71 to 80.57 (about 9%). These results suggest that a local domain of 18#18 (only 4096 mesh points) is too small for the SP1. This is consistent with the SP1's processors being faster relative to its communication than those of the Intel DELTA, and it emphasizes why large per-node memory is an important feature of the SP1.
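The fixed-local-domain comparison can be summarized as a scaled efficiency, defined here as the one-processor time divided by the P-processor time (1.0 would mean the added communication costs nothing). This is a small sketch using the CPU columns of the fixed-local-domain table; the names are illustrative.

```python
# CPU times from the fixed-local-domain table, keyed by number of processors.
delta_cpu = {1: 73.71, 2: 76.27, 4: 77.85, 8: 79.33, 16: 80.57}
sp1_euih_cpu = {1: 15.91, 2: 17.71, 4: 19.33, 8: 21.46, 16: 22.75}


def scaled_efficiency(times):
    """Return {procs: T(1) / T(procs)} for a fixed amount of work per processor."""
    t_one = times[1]
    return {procs: t_one / t for procs, t in sorted(times.items())}


delta_eff = scaled_efficiency(delta_cpu)
sp1_eff = scaled_efficiency(sp1_euih_cpu)
for procs in delta_eff:
    print(f"{procs:2d} processors: DELTA {delta_eff[procs]:.2f}, SP1 {sp1_eff[procs]:.2f}")
```

On 16 processors the DELTA retains a scaled efficiency above 0.9 while the SP1 drops to roughly 0.7, even though the SP1 is several times faster in absolute terms. A larger local domain would raise the ratio of computation to communication on the SP1.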



Karen D. Toonen
1998-11-18