The supercomputers of today are parallel computers. In order to exploit effectively the power that these machines make available, it is necessary to develop parallel algorithms, introducing a new set of issues like contention and load balancing that are absent from traditional sequential algorithms.
Parallel programming is still a highly experimental science in which the designs of both user and system programs are undergoing simultaneous study. System implementors tune systems in response to experimental data from users, and users try to understand the effects of decisions made for them by system implementors as well as the effects of variations in the parallel algorithms being executed by their programs.
Collecting timing statistics and measuring speedups is often insufficient for understanding why the results are what they are. This is because in a sequential program, we know the sequence of events, whereas in a parallel program, not only is the precise sequence of events unknown to us, but it changes from one run to the next. In most cases, we have some expectation of the rough sequence of events, the overall ``behavior'' of the program, but in the parallel case it is very difficult to deduce from readily available statistics whether the program actually behaved according to our expectations or not. A typical situation is one in which our intuition has told us that near-linear speedups should occur, but execution times (easy to measure) tell us that we are not getting them. It is often not at all clear what to do next.