First presenter:
  Name: Jeremy Johnson *
  E-mail:
  Affiliation: Drexel University
  Department: Department of Computer Science
  City:
  State/Province:
  Country: USA

Co-presenter(s):
  Name: Dave Richardson
  E-mail:
  Name: Werner Krandick
  E-mail:
  Name: Anatole Ruslanov *
  E-mail:

Session: 11 - High-Performance Computer Algebra
Schedule: Friday, 12:00
Related website:

Title: Code Generation and Autotuning in Computer Algebra
Recently, many numeric computing kernels, such as matrix multiplication [ATLAS and PHiPAC], sparse matrix-vector product [Sparsity], the fast Fourier transform [FFTW, UHFFT], and more general DSP transforms [SPIRAL], have used code generation and automated performance tuning to obtain high-performance implementations tuned to specific platforms. This talk outlines how these techniques may be applied to symbolic computation and computer algebra systems.

Examples demonstrating the benefit of these techniques are drawn from [ISSAC06, ISSAC05], which show that the performance of the Taylor shift operation used in real root isolation can be substantially improved through delayed carry propagation, radix reduction, and an interlaced coefficient representation combined with register tiling.

This speedup is obtained over a straightforward implementation that uses the heavily optimized integer addition routines from GMP. The resulting code is significantly more complicated than the original GMP-based code. Moreover, it can be tuned using different tile sizes, amounts of unrolling, and scheduling strategies. Compilers cannot perform this tuning themselves: the relevant loop transformations are blocked by apparent data dependencies, and compilers lack the higher-level algorithmic knowledge needed to restructure the code.

The particular tuning parameters to use depend on the platform (e.g., the number of registers and the number of integer units), and the optimal settings are not easily determined without runtime experiments. This talk presents a code generator from [Richardson09] that produces code with different tile sizes and empirically searches for the best implementation on a given platform.

[ISSAC06] Jeremy Johnson, Werner Krandick, Kevin Lynch, David Richardson, and Anatole Ruslanov, High-performance implementations of the Descartes method. In J.-G. Dumas, editor, International Symposium on Symbolic and Algebraic Computation, pp. 154-161, ACM Press, 2006.

[ISSAC05] Jeremy Johnson, Werner Krandick, and Anatole Ruslanov, Architecture-aware classical Taylor shift by 1. In M. Kauers, editor, International Symposium on Symbolic and Algebraic Computation, pp. 200-207, ACM Press, 2005.

[Richardson09] David Richardson, Efficient Programming Techniques for the SACLIB Computer Algebra Library, Ph.D. Thesis, Drexel University, 2009.