SCEE Invited Talks

Return to SCEE-2000 home page

SCEE-2000
Scientific Computing in Electrical Engineering
August 20 - 23, 2000
Warnemünde, Germany

Heinz K. Dirks, (Christian-Albrechts-University at Kiel)

Parallel Algorithms for Solving Linear Equations in VLSI Circuit Simulation

Analog circuit simulation is indispensable for the design of integrated circuits. Simulation complexity and expenditure grow with increasing packing density of the circuits. Most of the computing time is required for the recurrent direct solution of large systems of linear equations Ax=b. Matrix A has a very sparse and irregular pattern of nonzero coefficients, with changing values during the simulation process. To minimize the number of fill-ins an empirical reordering method (Markowitz) is used. In this contribution parallel algorithms are presented, which reduce simulation time using multiprocessor machines with shared memory. Experimental results rest upon efficient multithreaded methods. Elimination trees and elimination graphs embody the dependence of data during the elimination process. Therefore, they are important means for estimating and controlling parallel solution algorithms. At the so-called coarse sparse-matrix granularity level, for each variable a quantitative measure of the computational effort of the corresponding elimination step is introduced yielding an estimate of the available speed-up. Applying these instruments and a modification of Liu's rotation method, the elimination tree is restructured such that the potential parallelism is increased. This restructuring corresponds to reordering the equations. Using an empirical algorithm, which reduces the critical path length (CPL), the total elimination task is subdivided into a number of partitions. In most cases the employed rules prevent disadvantageous partitions, but often the result is unbalanced. This problem is solved by increasing the number of partitions (pool of tasks) and using a corresponding scheduling mechanism. With the CPL as relevant measure, the quality of the partitioning is estimated comparing the elimination and partition tree. Furthermore, experimental results are brought into comparison with theoretical speed-ups, which are calculated in a symbolic scheduling process, where any overhead like that for synchronization is neglected. For some examples the expectations are exceeded. For the other examples the difference between theoretical and experimental results, and the saturation behaviour is insignificant. Despite the rotation method the speed-up is typically limited to a factor of four. Therefore it is necessary to exploit the potential parallelism based on medium sparse-matrix granularity to further increase speed-up. Because of the synchronization overhead, the use of such a method is successful in special matrix areas only. Such areas are detected by a cost analysis. Static and dynamic multigranular methods, which are based on such a cost analysis, are presented.

SCEE-2000
last updated 14.06.2000