First presenter Co-presenter(s)
Name :  Jeremy Johnson * Name:  Lingchuan Meng *
E-mail: E-mail:  
Affiliation: Drexel University Name:   
Department: Department of Computer Science  E-mail:    
City: Name:   
State/Province:   E-mail:    
Country: USA Name:   
11-08  E-mail:    
Session: 11- High-Performance Computer Algebra Schedule:
Friday, 10:30
Related website:  
Title of
SPIRAL-Generated Modular FFTs
In this talk we present a preliminary investigation on the use of the SPIRAL system ( to generate code for modular Fast Fourier Transforms (FFTs). SPIRAL is a library generation system that automatically generates platform-tuned implementations of digital signal processing algorithms with an emphasis on fast transforms. Currently, SPRIAL can generate highly optimized fixed point and floating-point FFTs for a variety of platforms including vectorization, multi-threaded and distributed memory parallelization. The code produced is competitive with the best available code for these platforms and SPIRAL is used by Intel for its IPP (Intel Performance Primitives) and MKL (Math kernel Library) libraries.

The SPIRAL system uses a mathematical framework for representing and deriving algorithms. Algorithms are derived using rewrite rules and additional rules are used to symbolically manipulate algorithms into forms that take advantage of the underlying hardware. A search engine with a feedback loop is used to tune implementations to particular platforms. New transforms are added by introducing new symbols and their definition and new algorithms can be generated by adding new rules.

We extended SPIRAL to generate algorithms for FFT computation over finite fields. This addition required adding a new data type, several new rules and a new transform (ModDFT) definition. In addition, the unparser (where code is generated) was extended so that it can generate code for modular arithmetic. With these enhancements, the SPRIAL machinery can be applied to modular transforms that are of interest to the computer algebra community. This provides a framework for systematically optimizing these transforms, utilizing vector and parallel computation, and for automatically tuning them to different platforms. In this talk we present preliminary results from this exploration.