#### Constraint-Based Code Generation

Gabriel Hjort Blindell – KTH, SICS Roberto Castañeda Lozano – SICS Mats Carlsson – SICS Frej Drejhammar – SICS Christian Schulte – KTH, SICS



#### IOSS 2013

## Outline

- **1** Background & Motivation
- 2 Our Approach
- 3 Instruction Selection
- 4 Instruction Scheduling
- 5 Register Allocation
- 6 Challenges and Future Work

#### 1 Background & Motivation

- 2 Our Approach
- 3 Instruction Selection
- 4 Instruction Scheduling
- 5 Register Allocation
- 6 Challenges and Future Work

 $\blacksquare$  Target-independent program representation  $\rightarrow$  Optimized target-specific assembly code

- $\blacksquare$  Target-independent program representation  $\rightarrow$  Optimized target-specific assembly code
  - One of the oldest computer science problems

- $\blacksquare$  Target-independent program representation  $\rightarrow$  Optimized target-specific assembly code
  - One of the oldest computer science problems
- Set of interdependent NP-complete problems

- $\blacksquare$  Target-independent program representation  $\rightarrow$  Optimized target-specific assembly code
  - One of the oldest computer science problems
- Set of interdependent NP-complete problems
  - Traditionally solved using non-optimal heuristics

- $\blacksquare$  Target-independent program representation  $\rightarrow$  Optimized target-specific assembly code
  - One of the oldest computer science problems
- Set of interdependent NP-complete problems
  - Traditionally solved using non-optimal heuristics
  - Phase ordering

# Traditional compiler

## Traditional compiler



## Traditional compiler





Control flow graph (CFG), per function



Control flow graph (CFG), per function



Control flow graph (CFG), per function

Expression tree, per block





Control flow graph (CFG), per function

Directed acyclic graph (DAG)



Select which CPU instructions to use



Select which CPU instructions to use

Find covering with least cost



 $(c_4)$ 

 $(r_{sp})$ 

Select which CPU instructions to useFind covering with least cost

$$(c_{i}) = 0 \qquad (c_{i}) = 1 \qquad$$





addi  $t_0$ ,  $r_{sp}$ , 4 ld  $t_1$ ,  $t_0$ add  $t_z$ ,  $t_1$ ,  $t_1$ 

addi  $t_0$ ,  $r_{sp}$ , 4 ld  $t_1$ ,  $t_0$ add  $t_z$ ,  $t_1$ ,  $t_1$ 

Schedule instructions

addi  $t_0$ ,  $r_{sp}$ , 4 ld  $t_1$ ,  $t_0$ add  $t_z$ ,  $t_1$ ,  $t_1$ 

- Schedule instructions
- Assign temporaries to registers (or spill to memory)

- addi  $t_0$ ,  $r_{sp}$ , 4 ld  $t_1$ ,  $t_0$ add  $t_z$ ,  $t_1$ ,  $t_1$
- addi  $r_0$ ,  $r_{sp}$ , 4 ld  $r_1$ ,  $r_0$ add  $r_2$ ,  $r_1$ ,  $r_1$

Requires 3 registers

- Schedule instructions
- Assign temporaries to registers (or spill to memory)

- addi  $t_0$ ,  $r_{sp}$ , 4 ld  $t_1$ ,  $t_0$ add  $t_z$ ,  $t_1$ ,  $t_1$
- addi  $r_0$ ,  $r_{sp}$ , 4 ld  $r_1$ ,  $r_0$ add  $r_2$ ,  $r_1$ ,  $r_1$

Requires 3 registers

- Schedule instructions
- Assign temporaries to registers (or spill to memory)

addi  $r_0$ ,  $r_{sp}$ , 4 ld  $r_0$ ,  $r_0$ add  $r_0$ ,  $r_0$ ,  $r_0$ 

Requires only 1 register

#### **1** Background & Motivation

#### 2 Our Approach

- 3 Instruction Selection
- 4 Instruction Scheduling
- 5 Register Allocation
- 6 Challenges and Future Work

Constraint programming

- Constraint programming
  - Optimality

Constraint programming

Optimality\*

\*Given enough patience and money to burn while waiting

#### Constraint programming

- Optimality\*
- Flexible model

\*Given enough patience and money to burn while waiting

#### Constraint programming

- Optimality\*
- Flexible model
- Integration

\*Given enough patience and money to burn while waiting

# Our Approach

Constraint programming

- Optimality\*
- Flexible model
- Integration
- Current status

\*Given enough patience and money to burn while waiting

# Our Approach

Constraint programming

- Optimality\*
- Flexible model
- Integration
- Current status
  - Instruction selection concepts / ideas

\*Given enough patience and money to burn while waiting

# Our Approach

Constraint programming

- Optimality\*
- Flexible model
- Integration
- Current status
  - Instruction selection concepts / ideas
  - Instruction scheduling & register allocation prototype + paper\*

\*Given enough patience and money to burn while waiting \*Constraint-based register allocation and instruction scheduling. CP2012. 1 Background & Motivation

2 Our Approach

- 3 Instruction Selection
- 4 Instruction Scheduling
- 5 Register Allocation
- 6 Challenges and Future Work

# Which DAG node do we cover by which pattern?

1 Identify potential use of patterns in the DAG

- 1 Identify potential use of patterns in the DAG
- 2 Find optimal covering

- Identify potential use of patterns in the DAG
  Use existing O(n) techniques
- 2 Find optimal covering

- **1** Identify potential use of patterns in the DAG
  - Use existing O(n) techniques
- 2 Find optimal covering
  - Build and solve a constraint model







 One integer variable for each DAG node to decide by which pattern instance is it covered





- Variables:
  - One integer variable for each DAG node to decide by which pattern instance is it covered
  - A Boolean variable for each pattern instance to decide whether it is used



- Variables:
  - One integer variable for each DAG node to decide by which pattern instance is it covered
  - A Boolean variable for each pattern instance to decide whether it is used



- Variables:
  - One integer variable for each DAG node to decide by which pattern instance is it covered
  - A Boolean variable for each pattern instance to decide whether it is used
- Constraints:



- One integer variable for each DAG node to decide by which pattern instance is it covered
- A Boolean variable for each pattern instance to decide whether it is used
- Constraints:

$$D(\bigcirc^{c_4}) = P_{4,1} \Longleftrightarrow D(\bigcirc^+) = P_{4,1}$$
$$D(\bigcirc^{r_{sp}}) = P_{4,1} \Longleftrightarrow D(\bigcirc^+) = P_{4,1}$$



- One integer variable for each DAG node to decide by which pattern instance is it covered
- A Boolean variable for each pattern instance to decide whether it is used
- Constraints:

$$D(\bigcirc 4) = P_{4,1} \iff D(\bigcirc) = P_{4,1}$$
  
 $D(\bigcirc p) = P_{4,1} \iff D(\bigcirc) = P_{4,1}$   
 $D(\bigcirc) = P_{4,1} \iff D(\bigcirc) = 1$ 



- One integer variable for each DAG node to decide by which pattern instance is it covered
- A Boolean variable for each pattern instance to decide whether it is used
- Constraints:
  - $D(\stackrel{(c_4)}{=} P_{4,1} \iff D(\stackrel{+}{=}) = P_{4,1}$  $D(\stackrel{(r_{sp})}{=} P_{4,1} \iff D(\stackrel{+}{=}) = P_{4,1}$  $D(\stackrel{+}{=}) = P_{4,1} \iff D(B_{P_{4,1}}) = 1$ Objective:



- One integer variable for each DAG node to decide by which pattern instance is it covered
- A Boolean variable for each pattern instance to decide whether it is used
- Constraints:
- $D((c_4)) = P_{4,1} \iff D((+)) = P_{4,1}$  $D((r_{sp})) = P_{4,1} \iff D((+)) = P_{4,1}$  $D((+)) = P_{4,1} \iff D(B_{P_{4,1}}) = 1$  $\bullet \text{Objective:}$ 
  - $\min \sum_{p,i \in P_{p,i}} c_p B_{P_{p,i}}$



 Pattern restrictions can simply be added as constraints 1 Background & Motivation

2 Our Approach

- 3 Instruction Selection
- 4 Instruction Scheduling
- 5 Register Allocation
- 6 Challenges and Future Work

### in which cycle is each instruction issued?

Classic scheduling model with:

precedences among instructions

Classic scheduling model with:

precedences among instructions

if *i* defines a value used by *j*:

i must be issued before j

Classic scheduling model with:

precedences among instructions

if *i* defines a value used by *j*:

*i* must be issued before *j* 

resource constraints

Classic scheduling model with:

precedences among instructions

if *i* defines a value used by *j*:

*i* must be issued before *j* 

resource constraints

if *i* and *j* use the same functional unit: *i* and *j* must be issued in different cycles 1 Background & Motivation

- 2 Our Approach
- 3 Instruction Selection
- 4 Instruction Scheduling
- 5 Register Allocation
- 6 Challenges and Future Work

Several problems

register assignment

Several problems

- register assignment
- spilling

Several problems

- register assignment
- spilling
- coalescing

Several problems

- register assignment
- spilling
- coalescing
- Extra challenge: whole function

• A temp is live while it might still be used:



• A temp is live while it might still be used:



• A temp is live while it might still be used:



Two temps interfere if they are live simultaneously

• A temp is live while it might still be used:



Two temps interfere if they are live simultaneously
 non-interfering temps can share registers

■ *t*<sup>0</sup> is *global*: live in multiple blocks

How to model interference of global temps?

- *t*<sup>0</sup> is *global*: live in multiple blocks
- How to model interference of global temps?
- LSSA: decompose global temps into multiple local temps

- *t*<sub>0</sub> is *global*: live in multiple blocks
- How to model interference of global temps?
- LSSA: decompose global temps into multiple local temps



- *t*<sub>0</sub> is *global*: live in multiple blocks
- How to model interference of global temps?
- LSSA: decompose global temps into multiple local temps



- *t*<sub>0</sub> is *global*: live in multiple blocks
- How to model interference of global temps?
- LSSA: decompose global temps into multiple local temps



- *t*<sub>0</sub> is *global*: live in multiple blocks
- How to model interference of global temps?
- LSSA: decompose global temps into multiple local temps



- *t*<sub>0</sub> is *global*: live in multiple blocks
- How to model interference of global temps?
- LSSA: decompose global temps into multiple local temps



- *t*<sub>0</sub> is *global*: live in multiple blocks
- How to model interference of global temps?
- LSSA: decompose global temps into multiple local temps



Invariant: all temps are local

- *t*<sub>0</sub> is *global*: live in multiple blocks
- How to model interference of global temps?
- LSSA: decompose global temps into multiple local temps



■ Invariant: all temps are local → simple interference model

## Register Assignment

## to which register do we assign each temporary?

Register Assignment

**Rectangle Packing** 

Register Assignment temp live ranges

Rectangle Packing rectangles

Register Assignment

temp live ranges temp size Rectangle Packing rectangles rectangle width

# Register AssignmentRecttemp live rangesrectatemp sizerectainterfering temps cannot share registersrecta

Rectangle Packing

rectangles

rectangle width

rectangles cannot overlap

| Register Assignment                                   | Rectangle Packing         |
|-------------------------------------------------------|---------------------------|
| temp live ranges                                      | rectangles                |
| temp size                                             | rectangle width           |
| interfering temps cannot share registers              | rectangles cannot overlap |
| $\rightarrow$ based on (Pereira <i>et al.</i> , 2008) |                           |











Spilling

consider memory locations as registers too

Spilling

- consider memory locations as registers too
- optional copies to transfer temps

Spilling

- consider memory locations as registers too
- optional copies to transfer temps
- new variables to decide on copy implementation
  - special case of instruction selection

Spilling

- consider memory locations as registers too
- optional copies to transfer temps
- new variables to decide on copy implementation
  - special case of instruction selection
- Coalescing
  - assign copy source and destination to same register

Spilling

consider memory locations as registers too

optional copies to transfer temps

new variables to decide on copy implementation

special case of instruction selection

Coalescing

assign copy source and destination to same register

Global

decomposed temps are assigned to same register

#### 1 Background & Motivation

- 2 Our Approach
- 3 Instruction Selection
- 4 Instruction Scheduling
- 5 Register Allocation
- 6 Challenges and Future Work

 $\vdots$  $t_1 \leftarrow \texttt{store} \ldots$ 

2

÷

 $t_1 \leftarrow \texttt{store} \ldots$ 





 $t_1 \leftarrow \text{store} \ldots$  $t_2 \leftarrow \text{load } t_1$  $\ldots \leftarrow \operatorname{neg} t_2$ .  $t_3 \leftarrow \text{load } t_1$  $\ldots \leftarrow \texttt{inc} t_3$ 

.



Ultimate goal: fully unified code generation

- Ultimate goal: fully unified code generation
- Problem for scalability?

- Ultimate goal: fully unified code generation
- Problem for scalability? not necessarily!

- Ultimate goal: fully unified code generation
- Problem for scalability? not necessarily!
  - many dominant instruction selection decisions

- Ultimate goal: fully unified code generation
- Problem for scalability? not necessarily!
  - many dominant instruction selection decisions why use mult and add when you can use mac?

- Ultimate goal: fully unified code generation
- Problem for scalability? not necessarily!
  - many dominant instruction selection decisions why use mult and add when you can use mac?
  - most instruction selection decisions are local

- Ultimate goal: fully unified code generation
- Problem for scalability? not necessarily!
  - many dominant instruction selection decisions why use mult and add when you can use mac?
  - most instruction selection decisions are local
- Work in progress ....

- Inlining and other optimizations can blow function size
- Not usual, but we cannot just ignore them!

- Inlining and other optimizations can blow function size
- Not usual, but we cannot just ignore them!
- Possible strategies:
  - 1 progressiveness

- Inlining and other optimizations can blow function size
- Not usual, but we cannot just ignore them!
- Possible strategies:
  - progressiveness
  - 2 local search, large neighborhood search, etc.

- Inlining and other optimizations can blow function size
- Not usual, but we cannot just ignore them!
- Possible strategies:
  - 1 progressiveness
  - 2 local search, large neighborhood search, etc.
  - **3** if everything else fails, resort to greedy algorithms
    - decent polynomial solutions always available