Lab 10: Instruction Selection

Project
November 26, 2021

See the slides of Lecture 12 on code generation for an approach to get started.

Objectives

  1. Develop tests for your compiler that explore the edge cases.

  2. Define a transformation strategy that translates ChocoPy expressions to RISC-V code, including variable definitions, integer and boolean constants and operators.

  3. Develop a simplification transformation that transforms expressions such that more concise code can be generated.

Basic Compiler Pipeline

Follow the slides of Lecture 12 to set up a basic compiler pipeline:

  compile-to-rv32im-ast :: Program -> RProgram

  compile-to-rv32im-ast =
    compile-cpy-to-cir
    ; compile-cir-to-rv32im
    ; compile-rv32im

Debugging

Use debug(!msg) or dbg(|msg) to print the current term to the console.

You can also debug your generated code, by creating a .cpy file, transforming it to an .rv32im file using the Spoofax > Generation menu, and copying the generated output to the online Venus editor, where you can use the Simulator tab to run your code or step through each line, and inspect the memory and registers.

From ChocoPy to C-IR

Develop a first compilation phase to transform ChocoPy programs to an intermediate language that makes control flow explicit.

 compile-cpy-to-cir :: Program -> CProgram

 compile-cpy-to-cir =
   explicate-types
   ; desugar
   ; uniquify
   ; remove-complex-operands
   ; explicate-control

Explicate Types

A useful transformation is to type specialize the constructors of the AST, such that the analysis results are no longer needed. That is useful when applying transformations, since preserving those annotations cannot be done for all transformations. The following explicate-types transformation should be invoked on the AST before invoking the compiler transformation.

signature
  constructors
    AddInt : Exp * Exp -> Exp

rules

  explicate-types =
    innermost(type-specialize)

  type-specialize :
    add@Add(e1, e2) -> AddInt(e1, e2)
    where <get-type> add => Int()

Note that the rules for translating integer additions should be adapted to reflect the change in constructors.

Desugar

To improve the result of code generation, it can be useful to transform the source language expression. For example, left-associative additions may produce a better result.

  desugar-exp :
    AddInt(e1, AddInt(e2, e3)) -> AddInt(AddInt(e1, e2), e3)

Define a transformation to desugar ChocoPy programs, making the task of downstream compilation easier.

This simplification can include constant folding, eventually. However, do not yet include constant folding rules, since you want to test the translation of operators. That is, your code generator should be able to translate arbitrary combinations of operators, so be prepared for the general case.

Think about (and try out in the online compiler!) the differences between adding integers and concatenating strings in RISC-V. In ChocoPy, they initially both use the Add(...) constructor, so make sure to disambiguate them into separate constructors you will later use for code generation. The same holds for other operations, but you have to think about those yourself.

Uniquify

Define a transformation that eliminates shadowing. (See slides)

Remove Complex Operands

To simplify instruction selection, simplify expressions to only use atomic expressions (constants or variables) creating new variables to store intermediate results.

Explicate Control using C-IR

Use an intermediate language to explicate control. That is, the explicate-control strategy translates from ChocoPy ASTs

signature
  sorts CID CINT CProgram CBlock CLabel CTail CStmt
        CType CExp CAtom CVar
  constructors
                     : string -> CID
                     : string -> CINT
    CProgram         : List(CBlock) -> CProgram
    CBlock           : CLabel * CTail -> CBlock
    CLabel           : CID -> CLabel
    CReturn          : CExp -> CTail
    CReturnNone      : CTail
    CSeq             : CStmt * CTail -> CTail
    CVarDec          : CVar * CType * CExp -> CStmt
    CAssign          : CVar * CExp -> CStmt
    CIntT            : CType
                     : CAtom -> CExp
    CMin             : CAtom -> CExp
    CAdd             : CAtom * CAtom -> CExp
    CMul             : CAtom * CAtom -> CExp
    CDiv             : CAtom * CAtom -> CExp
    CInt             : CINT -> CAtom
                     : CVar -> CAtom
    CVar             : CID -> CVar

Instruction Selection

Define a transformation from C-IR to RISC-V instructions:

 compile-cir-to-rv32im :: CProgram -> RProgram

 compile-cir-to-rv32im =
   select-instructions-cprogram

The essence of this translation consists of instruction selection, i.e. mapping the high-level instructions of the intermediate language to concrete RISC-V instructions. For example, register allocation for integer constants, variables, and addition may be defined as follows:

  select-instrs-exp(|RArg) :: CExp -> List(RLine)

  select-instrs-exp(|x) :
    CInt(i) -> [RLi(x, <cint-to-rint>i)]

  select-instrs-exp(|x) :
    CVar(y) -> [RMv(x, RVar(<cid-to-string>y))]

  select-instrs-exp(|x) :
    CAdd(y@CVar(_), z@CVar(_)) -> [RAdd(x, <cvar-to-rvar>y, <cvar-to-rvar>z)]

Note that because expressions have been transformed to applications of operators to atomic expressions, this step does not need to invent names for intermediate results. We use RVar(x) as symbolic registers in RISC-V code in order to postpone the mapping to concrete registers.

Define transformation rules for all operators of ChocoPy expressions.

Special Cases

RISC-V provides specialized instructions for some operations. For example, the addi instruction allows directly adding an integer constant (between -2048 and 2047) to a register. A compiler can make use of such instructions, by detecting special patterns in the source language. For example, the following rule, detects additions with an integer constant, and translates those to applications of addi, avoiding the use of an extra register.

select-instrs-exp(|x) :
  CAdd(z@CVar(_), CInt(i)) -> [RAddi(x, <cvar-to-rvar>z, <cint-to-rint>i)]
  with debug(!"select-instrs-exp: ")

Can you detect other specialized instructions and corresponding source language patterns that provide a more concise and/or faster target code?

Booleans

In ChocoPy we compile the Booleans True and False to integers in RISC-V (1 and 0, respectively). Implement the Boolean operators and, or and not, and integer comparison operators ==, !=, <, >, <=, >= (you can ignore is for now since it operates on objects).

You can make use of the online compiler, or the RISC-V instruction set to find the proper instructions in RISC-V.

Patching RISC-V instructions

Finally, we define a compiler stage that patches up the RISC-V code:

 compile-rv32im :: RProgram -> RProgram

 compile-rv32im =
   assign-homes
   ; patch-instructions

Assign Homes

Map symbolic registers to concrete registers. Either by mapping variables to actual registers or by using a stack discipline. We will look at register allocation next, so keep it simple.

Patch Instructions

This is the place for any mopping up that needs to be done.