Monolithic Compiler Experiments Using C++ Expression Templates*
Lenore R. Mullin**, Edward Rutledge, Robert Bond
HPEC 2002, 25 September 2002, Lexington, MA
* This work is sponsored by the Department of Defense, under Air Force Contract F19628-00-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the Department of Defense.
MIT Lincoln Laboratory 999999-1 XYZ 10/3/02
** Dr. Mullin participated in this work while on sabbatical leave from the Dept. of Computer Science, University of Albany, State University of New York, Albany, NY.
Outline
• Overview
  – Motivation
  – The Psi Calculus
  – Expression Templates
• Implementing the Psi Calculus with Expression Templates
• Experiments
• Future Work and Conclusions
Motivation: The Mapping Problem
y = conv(x)
• intricate math
• intricate memory accesses (indexing)
Map onto:
• Memory Hierarchy (Main Memory, L2 Cache, L1 Cache)
• Parallelism
Mathematics of Arrays Approach
• Math and indexing operations in same expression
• Framework for design space search
  – Rigorous and provably correct
  – Extensible to complex architectures
Example: “raising” array dimensionality
x: < 0 1 2 … 35 >
Map:
< 0 1 2 > < 3 4 5 >
< 6 7 8 > < 9 10 11 >
< 12 13 14 > < 15 16 17 >
< 18 19 20 > < 21 22 23 >
< 24 25 26 > < 27 28 29 >
< 30 31 32 > < 33 34 35 >
[Figure: the blocks of the reshaped array are mapped onto the memory hierarchy and onto parallel resources.]
Basic Idea
Benefits
• Theory based
• High level API
• Efficient
Implementation
• Expression Templates
  – Efficient high-level container operations
  – C++
• PETE Style Array Operations
Theory
• Psi Calculus
  – Array operations that compose efficiently
  – Minimum number of memory reads/writes
Combining Expression Templates and Psi Calculus yields an optimal implementation of array operations
Psi Calculus Key Concepts
Denotational Normal Form (DNF):
• Minimum number of memory reads/writes for a given array expression
• Independent of data storage order
Gamma function: Specifies data storage order
Operational Normal Form (ONF):
• Like DNF, but takes data storage into account
• For 1-d expressions, consists of one or more loops of the form: x[i] = y[stride*i + offset], l ≤ i < u
h = < 5 6 7 >
x’ = cat(reshape(, ), cat(x, reshape(,))) = < 0 0 1 . . . 4 0 0 >
x’_rot = binaryOmega(rotate, 0, iota(N+M-1), 1, x’) = < 0 1 2 3 . . . >
x’_final = binaryOmega(take, 0, reshape(,), 1, x’_rot) = < 0 1 2 >
Prod = binaryOmega(*, 1, h, 1, x’_final) = < 0 6 14 > < 5 12 21 >
Y = unaryOmega(sum, 1, Prod) = < 7 20 38 . . . >
Psi Calculus reduces this to DNF with minimum memory accesses
Typical C++ Operator Overloading
Example: A=B+C vector add
2 temporary vectors created
1. Pass B and C references to operator +
2. Create temporary result vector
3. Calculate results, store in temporary
4. Return copy of temporary
5. Pass results reference to operator=
6. Perform assignment
[Figure: call flow from Main to Operator + to Operator =, passing B&, C&, temp, and temp copy, and ending in A.]
Additional Memory Use
• Static memory
• Dynamic memory (also affects execution time)
Additional Execution Time
• Cache misses/page faults
• Time to create a new vector
• Time to create a copy of a vector
• Time to destruct both temporaries
C++ Expression Templates and PETE
Expression Templates
Example: A=B+C
Parse trees, not vectors, created
Expression Type: BinaryNode
Parse Tree: + with operands B& and C&
1. Pass B and C references to operator +
2. Create expression parse tree
3. Return expression parse tree
4. Pass expression tree reference to operator =
5. Calculate result and perform assignment
[Figure: call flow from Main to Operator + to Operator =, passing B&, C&, and a copy of the parse tree for B+C, and ending in A.]
Reduced Memory Use
• Parse tree only contains references
Reduced Execution Time
• Better cache use
• Loop fusion style optimization
• Compile-time expression tree manipulation
• PETE, the Portable Expression Template Engine, is available from the Advanced Computing Laboratory at Los Alamos National Laboratory
• PETE provides:
  – Expression template capability
  – Facilities to help navigate and evaluate parse trees
PETE: http://www.acl.lanl.gov/pete
Implementing Psi Calculus with Expression Templates
Example: A=take(4,drop(3,rev(B)))
1. Form expression tree
take 4
  drop 3
    rev
      B
2. Add size information
• B: size=10
• rev: size=10
• drop 3: size=7
• take 4: size=4
3. Apply Psi Reduction rules
• B (size=10): A[i]=B[i]
• rev (size=10): A[i]=B[-i+B.size-1]=B[-i+9]
• drop 3 (size=7): A[i]=B[-(i+3)+9]=B[-i+6]
Implementing Psi Calculus with Expression Templates
Recall: Psi Reduction for 1-d arrays always yields one or more expressions of the form: x[i] = y[stride*i + offset], l ≤ i < u