Recently many numeric computing kernels such as matrix multiplication [ATLAS and PHiPAC],
sparse matrix vector product [Sparsity] the fast Fourier transform [FFTW, UHFFT], and
more general DSP transforms [SPIRAL] have utilized code generation and automated
performance tuning to obtain high-performance implementations tuned to specific
platforms. This talk outlines how these techniques may be applied to symbolic
computation and computer algebra systems.
Examples demonstrating the benefit of these techniques are drawn from the
papers [ISSAC06, ISSAC05] which show that the performance of the Taylor shift
operation used in real root isolation can be substantially improved through
delayed carry propagation, radix reduction, and an interlaced coefficient
representation combined with register tiling.
This speedup is obtained over a straightforward implementation that utilizes
the heavily optimized integer addition routines from GMP. The resulting
code is significantly more complicated that the original code utilizing GMP.
Moreover, the code can be tuned using different tile sizes, amounts of
unrolling and different scheduling strategies. Note that compilers are not
capable of performing this tuning due to various dependencies and lack
higher knowledge needed to transform the code.
The particular tuning parameters used depends on the platform (e.g. number
of registers, number of integer units, etc.) and furthermore, the optimal
parameter settings are not easily determined without runtime experiments.
This talk presents a code generator from [Richardson09] that produces code
with different tile sizes and empirically searches for the best implementation
on a given platform.
[ISSAC06] Jeremy Johnson, Werner Krandick, Kevin Lynch, David Richardson, and
Anatole Ruslanov, High-performance implementations of the Descartes method.
In J.-G. Dumas, editor, International Symposium on Symbolic and Algebraic Computation,
pp. 154-161, ACM Press, 2006.
[ISSAC05] Jeremy Johnson, Werner Krandick, and Anatole Ruslanov, Architecture-aware
classical Taylor shift by 1. In M. Kauers, editor, International Symposium on Symbolic
and Algebraic Computation, pp. 200-207, ACM Press, 2005.
[Richardson09] David Richardson, Efficient Programming Techniques for the SACLIB
Computer Algebra Library, Ph.D. Thesis, Drexel University, 2009. |