1. Lexical Analysis
Role in the Compilation Process: The lexical analysis phase (scanner) is the first stage of compilation. It reads the raw source code (a stream of characters) and groups these characters into meaningful sequences called lexemes. For each lexeme, the lexer produces a token – a symbolic representation (token type plus an optional attribute like an identifier name or numeric value) (Compiler Construction 6001310-3). In this way, the lexer converts the free-form text of the source program into a structured sequence of tokens that is easier for the parser to handle. As part of this process, the lexer ignores whitespace and comments, since these are not significant for syntax – they are recognized as lexemes but not passed on as tokens (they are usually discarded) (Lexical analysis – Wikipedia). For example, consider the source fragment:
x = a + b * 2;
A lexical analyzer would output a token sequence such as:
(identifier, "x"), (operator, "="), (identifier, "a"), (operator, "+"), (identifier, "b"), (operator, "*"), (literal, 2), (separator, ";")
each token representing a category of lexeme (Lexical analysis – Wikipedia) (identifier, operator, literal, etc.). This token stream is then passed to the parser.
Typical Tools and Techniques: Lexical analyzers are often generated by tools like Lex/Flex, using rules based on regular expressions that describe the patterns of lexemes. Under the hood, these tools construct finite automata (deterministic finite automata) to recognize tokens efficiently. In fact, tokens are typically defined using regular expressions, which a lexer generator converts into a DFA to scan the input (Lexical analysis – Wikipedia). The lexer operates by reading characters and matching the longest possible lexeme (Maximal Munch rule) against the token patterns. If multiple patterns match, a priority order (usually the order of rules or an explicit precedence) decides which token is produced. This way, the lexeme "==" is recognized as a single "EQ" token rather than two consecutive "=" tokens, and keywords (e.g., while) can be distinguished from identifiers. Simplicity is key: the lexical analyzer defers complex language issues to later phases, only classifying basic words and symbols (Lexical analysis – Wikipedia). Any characters that don’t form a valid token for the language result in a lexical error (e.g., an illegal character).
Handling of Whitespace/Comments: Whitespace (spaces, newlines, tabs) and comments are recognized by the lexer but typically not emitted as tokens – they are treated as separators or ignored altogether (Lexical analysis – Wikipedia). For instance, a rule may match any sequence of whitespace or comment text and have no action (so the lexer simply skips it). This means that x=10; /* comment */ y=20; yields tokens for x, =, 10, ;, y, =, 20, ; and nothing for the comment or spaces. (One exception is in languages where indentation or comments are significant, but C is not such a language.) By stripping out non-significant text, the lexer simplifies the job of the parser.
Example – Simple Lexer in C/Flex: Below is a simplified example of token definitions using Flex (a Lex-compatible tool) for a tiny C-like language. It illustrates how patterns map to token actions:
%%
[ \t\n]+                 { /* ignore whitespace */ }
"//".*                   { /* ignore line comments */ }
"/*"[^*]*"*/"            { /* ignore block comments (simplified: no '*' inside) */ }
"int"                    { return T_INT;     /* keyword token */ }
"return"                 { return T_RETURN;  /* keyword token */ }
[0-9]+                   { yylval.intval = atoi(yytext);
                           return T_NUMBER;  /* integer literal */ }
[A-Za-z_][A-Za-z0-9_]*   { yylval.sym = strdup(yytext);
                           return T_ID;      /* identifier */ }
"=="                     { return T_EQ;      /* '==' operator */ }
"="                      { return T_ASSIGN;  /* '=' operator */ }
"+"                      { return T_PLUS; }
"*"                      { return T_STAR; }
";"                      { return T_SEMI; }
.                        { return yytext[0]; /* any other single char as itself */ }
%%
In this Flex specification, regular expressions define lexemes: e.g., [0-9]+ for integers, the identifier pattern for names, etc. The actions associate tokens (like T_NUMBER, T_ID) with these lexemes. Whitespace and comments have empty actions (thus are skipped). A hand-written C lexer would follow a similar logic: read characters, use a finite-state machine or decision tree to identify token lexemes, and ignore irrelevant characters.
Data Structures: The lexer often relies on a character input buffer and may use state variables for context (e.g., to handle multicharacter lookahead or different lexical states in languages where the interpretation of characters can change). Modern lexer generators produce a DFA that is executed in code to categorize input; hand-written lexers might use chains of switch/if statements. Symbol tables may be used at this stage for tasks like recording identifiers (the lexer might enter new identifier lexemes into the symbol table and store an index as the token’s attribute value).
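To illustrate the hand-written approach, here is a minimal sketch of such a scanner in C. The token codes and the get_token interface are assumptions for this example, not taken from any particular compiler; it only shows the skip-whitespace / classify-by-first-character / maximal-munch structure described above:

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical token codes for this sketch. */
enum { T_EOF, T_NUMBER, T_ID, T_EQ, T_ASSIGN, T_PLUS, T_STAR, T_SEMI };

static int lookahead = ' ';     /* one character of lookahead */

/* Return the next token code; a real lexer would also set an attribute
 * (numeric value, identifier lexeme) and check a keyword table. */
int get_token(FILE *in) {
    while (isspace(lookahead))                  /* skip whitespace */
        lookahead = fgetc(in);
    if (lookahead == EOF) return T_EOF;

    if (isdigit(lookahead)) {                   /* integer literal: [0-9]+ */
        while (isdigit(lookahead))
            lookahead = fgetc(in);
        return T_NUMBER;
    }
    if (isalpha(lookahead) || lookahead == '_') {   /* identifier (or keyword) */
        while (isalnum(lookahead) || lookahead == '_')
            lookahead = fgetc(in);
        return T_ID;
    }
    switch (lookahead) {                        /* operators and separators */
    case '=':
        lookahead = fgetc(in);
        if (lookahead == '=') {                 /* maximal munch: prefer "==" over "=" */
            lookahead = fgetc(in);
            return T_EQ;
        }
        return T_ASSIGN;
    case '+': lookahead = fgetc(in); return T_PLUS;
    case '*': lookahead = fgetc(in); return T_STAR;
    case ';': lookahead = fgetc(in); return T_SEMI;
    default:  lookahead = fgetc(in); return T_EOF;  /* lexical error in a real lexer */
    }
}

int main(void) {
    int t;
    while ((t = get_token(stdin)) != T_EOF)
        printf("token %d\n", t);                /* prints one code per recognized lexeme */
    return 0;
}
```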
Key Takeaways:
- Lexical analysis transforms raw source text into a sequence of tokens, discarding layout text like spaces and comments (Lexical analysis – Wikipedia). This reduces the complexity of the remaining compiler stages by presenting a structured input.
- Lexers use regular expressions to describe tokens and finite automata to recognize them efficiently (Lexical analysis – Wikipedia). Tools like Lex/Flex automate the construction of lexers from regex rules.
- Each token has a type (e.g., identifier, number, keyword) and optionally a value (e.g., the actual identifier name or numeric value). Errors at this stage include unrecognized character sequences.
- The lexer uses the longest-match rule and rule priority to resolve ambiguities in tokenization. For example, for "==" vs. "=", the lexer will match the two-character token if possible.
- By handling low-level details (character encoding, skipping comments) in one place, lexical analysis makes the parser’s job more straightforward and contributes to a modular compiler design (Compiler Construction 6001310-3).
2. Parsing
Parsing Phases – Top-Down vs. Bottom-Up: Parsing is the compiler phase that takes the token stream from the lexer and builds a syntax tree (or some representation of syntactic structure) according to the grammar of the language. Parsers are broadly categorized into top-down and bottom-up methods. A top-down parser (e.g., an LL parser) begins at the start symbol of the grammar and tries to expand it to match the input tokens, essentially predicting the structure from the top of the parse tree and reading tokens left-to-right (LL parser – Wikipedia). In contrast, a bottom-up parser (e.g., LR-family parsers) reads tokens and gradually reduces them to non-terminals, essentially building the parse tree from leaves (tokens) up to the root, reconstructing derivations in reverse (LR parser – Wikipedia).
- LL Parsing (Top-Down): An LL(k) parser (with k tokens of lookahead) reads input Left-to-right and produces a Leftmost derivation of the sentence (LL parser – Wikipedia). A common top-down approach is recursive descent, where each non-terminal in the grammar is implemented by a procedure that consumes the appropriate tokens or calls other procedures (according to the grammar rules). LL(1) grammars (using 1-token lookahead) are popular for hand-written parsers due to their simplicity; many programming languages are designed to be LL(1) to simplify parsing (LL parser – Wikipedia). However, left-recursive grammar rules or certain ambiguity patterns cannot be handled by simple LL(1) parsers without refactoring the grammar (e.g., left recursion must be eliminated or factored into an iterative form). A minimal recursive-descent sketch in C follows this list.
- LR Parsing (Bottom-Up): An LR parser (Left-to-right, producing a Rightmost derivation in reverse) uses a shift-reduce strategy. It shifts (reads) input tokens and pushes them onto a stack until a rule’s right-hand side is on top of the stack, then it reduces that sequence to the non-terminal (applying a production in reverse). The canonical LR(1) method (Knuth’s LR) uses full 1-token lookahead, but simplified variants are used in practice. SLR(1) (Simple LR) and LALR(1) (Lookahead LR) parsers are optimized versions that use fewer states. For example, Yacc and Bison use LALR(1) parsing tables. LR parsers can handle a broader class of grammars than LL (notably grammars with left recursion, or grammars where the choice of production cannot be made until most of a right-hand side has been seen) (LR parser – Wikipedia). They are deterministic and will parse valid sentences in linear time without backtracking (LR parser – Wikipedia). The downside is that constructing LR parsing tables is more complex, and error messages can be less straightforward.
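To make the top-down idea concrete, here is a minimal recursive-descent sketch in C for the toy grammar expr → term ('+' term)*, term → factor ('*' factor)*, factor → DIGIT | '(' expr ')'. The one-character "lexer" and the function names are assumptions for this example only; note how the left-recursive expression rules have been refactored into iteration, exactly as LL parsing requires:

```c
#include <stdio.h>
#include <stdlib.h>

/* Trivial one-token (one-character) lookahead interface for the sketch. */
static const char *input;       /* expression text: digits, +, *, ( ), spaces */
static char tok;                /* current lookahead character */

static void next_token(void) {
    while (*input == ' ') input++;
    tok = *input ? *input++ : '\0';
}
static void expect(char c) {
    if (tok != c) { fprintf(stderr, "syntax error: expected '%c'\n", c); exit(1); }
    next_token();
}

static int expr(void);          /* one function per non-terminal */

static int factor(void) {       /* factor -> DIGIT | '(' expr ')' */
    if (tok == '(') { next_token(); int v = expr(); expect(')'); return v; }
    if (tok >= '0' && tok <= '9') { int v = tok - '0'; next_token(); return v; }
    fprintf(stderr, "syntax error: unexpected '%c'\n", tok); exit(1);
}
static int term(void) {         /* term -> factor ('*' factor)* */
    int v = factor();
    while (tok == '*') { next_token(); v *= factor(); }
    return v;
}
static int expr(void) {         /* expr -> term ('+' term)* */
    int v = term();
    while (tok == '+') { next_token(); v += term(); }
    return v;
}

int main(void) {
    input = "2 + 3 * 4";
    next_token();
    printf("%d\n", expr());     /* prints 14: '*' binds tighter than '+' */
    return 0;
}
```

Here the sketch evaluates the expression directly; a compiler would instead build AST nodes or emit code in the same places where this version computes values.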
Parsing Algorithms (LL, LR, SLR, LALR):
- LL(1): A predictive parser that uses a parsing table indexed by [Non-terminal, Lookahead Token] to decide which production to apply. It requires the grammar to have no ambiguity, no left recursion, and enough token differentiation (no FIRST/FOLLOW conflicts).
- LR(0), LR(1): Bottom-up methods using states representing parser stack contexts. LR(1) uses lookahead tokens in states, making it very powerful (can handle all deterministic CFLs). But LR(1) tables can be huge.
- SLR(1): Uses simpler state items (like LR(0)) but consults FOLLOW sets of non-terminals to decide reductions. It’s simpler but can mistakenly reduce in some cases where lookahead is needed.
- LALR(1): Merges certain LR(1) states that have identical LR(0) cores, producing a compact table (same number of states as SLR) with lookahead power for most practical grammars (LR parser – Wikipedia). LALR(1) handles more grammars than SLR(1) (it resolves certain conflicts by using lookahead differences that SLR would miss) (LR parser – Wikipedia), yet it may still be confused by some contexts that full LR(1) would distinguish. In practice, most Yacc/Bison grammars target LALR(1). Conflicts (shift/reduce or reduce/reduce ambiguities) are resolved by default precedence rules or reported for grammar adjustment.
Handling Ambiguous Grammars: A grammar is ambiguous if there exists some input string that can produce more than one distinct parse tree (leftmost or rightmost derivation) (Lecture02). Ambiguity is undesirable in compilers because the parser wouldn’t know which interpretation to choose. For example, the classic expression grammar without specified precedence is ambiguous (e.g., a + b * c could parse as (a+b)*c or a+(b*c)). Compilers handle ambiguity by reformulating the grammar or by adding disambiguation rules:
- Rewrite Grammar: You can refine the grammar to enforce precedence and associativity. For instance, introduce non-terminals for each precedence level (expr -> expr + term | term, term -> term * factor | factor, etc.) to ensure multiplication binds tighter than addition (Lecture02). The unambiguous grammar yields a unique parse tree.
- Precedence/Associativity Declarations: Tools like Yacc/Bison let you declare operator precedences and associativities (e.g., %left '+' '-' and %left '*' '/'). The parser generator will then resolve parse conflicts (shift/reduce) by favoring the production consistent with those declarations (e.g., '*' higher than '+', and left-associative for sequences). This effectively directs the parser to prefer certain parses, disambiguating the grammar without changing its formal rules. For instance, using precedence rules allows one to keep an ambiguous grammar for expressions but still get a unique parse tree that respects conventional operator priorities (SI413: LR parsing & precedence/associativity).
- Semantic Checks: In certain cases (like the infamous dangling-else ambiguity), the grammar can be ambiguous but a semantic rule or tie-break (e.g., an else matches the nearest unmatched if) resolves it. Parser generators often allow attaching such rules or use default conventions.
In practice, the grammar of C is designed to be LALR(1) (Yacc was originally written with C’s grammar in mind), though some contexts require care – the “dangling else” in C, or (in C++) >> closing nested templates vs. the shift operator. Modern languages sometimes use GLR or PEG parsers if needed to handle more complex grammars, but classic compilers stick to LL or LR methods for efficiency.
Real-World Parser Generators – Yacc/Bison: The venerable UNIX tool Yacc (Yet Another Compiler-Compiler) generates a parser in C from a given grammar specification with actions. Yacc produces an LALR(1) parser by default (Yacc – Wikipedia). GNU Bison is a widely used Yacc-compatible parser generator that extends Yacc with more features (like GLR parsing for ambiguous grammars, internationalization, etc.). A Yacc/Bison grammar file allows one to define token types (from the lexer), grammar rules, and embedded C code for actions. The generator outputs C code for the parsing tables and a driver that uses a stack to implement the LR parsing. These tools automate the construction of efficient parsers, handling the complexity of table generation. Other modern tools include ANTLR (an LL(*) parser generator), which generates top-down parsers with powerful lookahead; hand-written recursive descent is also common for simple or very performance-critical scenarios.
Example – Excerpt of a Bison Grammar (Expressions): Below is a snippet illustrating grammar rules for expressions with semantic actions (in C) that build an abstract syntax tree (AST) or evaluate the expression. Terminals NUM, PLUS, TIMES correspond to tokens for numbers and the + and * operators (with precedence declarations ensuring * binds tighter than +):
%left PLUS
%left TIMES
%%
expr : expr PLUS expr { $$ = makeNode('+', $1, $3); }
| expr TIMES expr { $$ = makeNode('*', $1, $3); }
| NUM { $$ = makeNumNode($1); }
;
%%
This grammar fragment (simplified) tells Bison that an expr can be the sum of two exprs, the product of two exprs, or just a number. The %left lines declare + and * as left-associative and give them precedence (the later the declaration, the higher the precedence, so TIMES > PLUS). The semantic action in braces creates an AST node for the operation, combining the sub-expressions. Bison (and Yacc) will compute parse tables that implement an LALR(1) shift-reduce parser for this grammar.
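The makeNode and makeNumNode helpers used in those actions are not part of Bison; they stand for whatever AST constructors the compiler writer supplies. A minimal sketch in C might look like this:

```c
#include <stdlib.h>

/* A minimal AST node: either an operator with two children or a numeric leaf. */
typedef struct AstNode {
    char op;                       /* '+', '*', ... ; 0 for a numeric leaf */
    int value;                     /* used only when op == 0 */
    struct AstNode *left, *right;
} AstNode;

AstNode *makeNode(char op, AstNode *left, AstNode *right) {
    AstNode *n = malloc(sizeof *n);
    n->op = op;  n->value = 0;
    n->left = left;  n->right = right;
    return n;
}

AstNode *makeNumNode(int value) {
    AstNode *n = malloc(sizeof *n);
    n->op = 0;  n->value = value;
    n->left = n->right = NULL;
    return n;
}
```

In the grammar file, $$ and $1/$3 would then be declared to carry this pointer type (e.g., via %union and a %type declaration for expr).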
During parsing, the compiler constructs either a concrete syntax tree (full parse tree with all grammar details) or more commonly an abstract syntax tree (AST) which is a simplified tree that omits redundant nodes (explained more in the next section). If any token sequence cannot be derived from the grammar, the parser reports a syntax error, often trying error-recovery strategies (like synchronizing at a semicolon) to continue parsing and report further errors.
Key Takeaways:
- The parser checks that the token sequence is syntactically valid according to the language grammar and builds a structured representation (parse tree or AST). It operates in linear time for unambiguous grammars using efficient algorithms (LL or LR parsing).
- Top-down (LL) parsers expand grammar rules from the start symbol downwards, while Bottom-up (LR) parsers reduce substrings of tokens to non-terminals, effectively building the parse from leaves to root. Both achieve the same result but with different strategies (LL parser – Wikipedia) (LR parser – Wikipedia).
- LL parsers are easy to construct (even by hand) and understand; they require grammars without certain conflicts (no left recursion, etc.). LR parsers handle a larger class of grammars and work more automatically (suitable for parser generators like Yacc), but the algorithm and generated tables are more complex. LALR(1) in particular is widely used as it strikes a balance between power and table size (LR parser – Wikipedia).
- Ambiguities in grammar are resolved by redesigning grammar rules or applying precedence and associativity rules. The compiler must ensure that each valid program has a unique parse (or at least a deterministic way to choose one parse) (Lecture02). Operator precedence is usually handled not by making the programmer parenthesize everything, but by the compiler internally enforcing rules that yield the expected parse tree.
- Parser generators like Yacc/Bison automate parser creation. Yacc produces an LALR(1) parser by default (Yacc – Wikipedia). They allow embedding semantic actions that will be executed during parsing to build AST nodes or perform other translations.
- The output of parsing is typically an Abstract Syntax Tree (AST), a structured representation of the program without syntax noise. The AST becomes the input to the next phases (semantic analysis and code generation).
3. Syntax-Directed Translation
Once parsing establishes the structure of the code, syntax-directed translation (SDT) is the process of associating semantic actions with grammar rules, so that as the parser recognizes constructs, it can perform certain translations. In other words, we attach meaning to the structure. A formal way to specify this is via syntax-directed definitions (SDDs), which attach attributes to grammar symbols and give rules for computing these attributes, or via translation schemes, which embed program code (actions) in the grammar productions (Syntax-Directed Translation).
- Syntax-Directed Definitions: In an SDD, each grammar production has an associated set of semantic rules that define how to compute attribute values. Attributes can be synthesized (computed from children up to the parent) or inherited (passed down from parent to children). For example, in an expression grammar, we might have each non-terminal E carry an attribute E.val representing its numeric value. The production E → E1 '+' T could have a rule E.val = E1.val + T.val. Using the parse tree, we can evaluate these rules either through a post-order traversal (for synthesized attributes) or as needed for inherited ones. The end result might be, say, the numerical value of a constant expression or a type for a subexpression. Annotated parse trees are parse trees where each node is augmented with attribute values – this shows the flow of information (e.g., a subtree annotated with the computed value of that expression).
- Translation Schemes: These are similar, but instead of specifying abstract rules, you directly embed program fragments (actions) into the grammar, typically enclosed in braces alongside productions. For instance, one might write E → E1 '+' T { print("+"); } to output a + when that production is recognized, effectively printing a postfix representation of the expression. Translation schemes are more operational – they say “when parsing this production, execute this code.” They are often used in parser generators to directly build AST nodes or emit intermediate code during parsing.
In compilers, syntax-directed translation is used to build the Abstract Syntax Tree (AST) or even to generate intermediate code on the fly. The AST vs. Parse Tree distinction is important: a parse tree (concrete syntax tree) includes every grammar token and rule, including things like parentheses or keyword tokens that don’t affect the meaning. An AST is a condensed tree that represents the syntactic structure of the program without unnecessary details (cs160-lec7_2022 – Compatibility Mode). In an AST, operator nodes are arranged with their operands as children, and we omit syntactic scaffolding nodes. For example, for an input x = (2+3)*4, a full parse tree would have a node for every non-terminal (Expression, Term, Factor, etc.) and every terminal, including the parentheses; the AST, by contrast, might be a tree with = at the root, whose children are x (the variable) and a * (multiply) node. The * node would have children representing 2+3 (a + node with children 2 and 3) and 4. Parentheses don’t appear as nodes – they only guided the grouping. The AST thus abstracts away the concrete grammar rules, focusing on the essential tree of operations and operands (parsing – What is the difference between an Abstract Syntax Tree and a Parse Tree? – Programming Language Design and Implementation Stack Exchange). This makes it much easier for later compiler phases to do their work (e.g., type checking or optimization) because they can ignore irrelevant punctuation or grammar-specific nodes.
Syntax-Directed Translation for AST Construction: Typically, we write semantic actions to create AST nodes when reducing grammar productions. For example, consider a simple expression grammar:
Expr → Expr '+' Term { $$ = new AstNode('+', $1, $3); }
Expr → Term { $$ = $1; }
Term → Term '*' Factor { $$ = new AstNode('*', $1, $3); }
Term → Factor { $$ = $1; }
Factor → NUM { $$ = new NumNode($1); } /* $1 carries the number's value set by the lexer */
Factor → '(' Expr ')' { $$ = $2; }
This Bison-style snippet uses the notation $1, $3, and $$ to refer to the semantic value of the first symbol, the third symbol, and the left-hand side, respectively. So when an Expr is the sum of two subexpressions, we create a new AST node with '+' and link the two operands. When an Expr is just a Term, we carry it up unchanged ($$ = $1). This is a classic syntax-directed translation: as the parse occurs, the AST is built bottom-up. By the end of parsing the whole program, we have an AST representing the input.
Other Uses of SDT: We can similarly direct translation to other tasks:
- Building a symbol table of declarations (e.g., on a production for a variable declaration, insert the variable name into the symbol table with its type).
- Performing type checking by attaching type attributes to expressions and checking them in the semantic rules (reporting an error if a rule is violated).
- Direct code generation: e.g., outputting assembly or intermediate code instructions. One might emit a sequence of three-address code statements as actions. However, a one-pass generation like this can be limiting; many compilers prefer to build an AST or other intermediate form first, then traverse it for code generation (for clarity and better opportunity to optimize).
Example – Translating Arithmetic Expressions to Postfix (SDT): As a simple illustration, consider translating an infix expression to a postfix string (Reverse Polish Notation) using SDT. We can write a grammar with embedded actions that output parts of the expression in postfix order:
Expr → Expr '+' Term { /* after parsing sub Expr and Term */ printf("+ "); }
Expr → Expr '-' Term { printf("- "); }
Expr → Term { /* no action here */ }
Term → Term '*' Factor { printf("* "); }
Term → Term '/' Factor { printf("/ "); }
Term → Factor { /* no action */ }
Factor → NUM { printf("%s ", yytext); }
Factor → '(' Expr ')' { /* no action needed */ }
If the input is 2 + 3 * 4, a parser following this scheme (performing each action when its production is recognized, i.e., on reduction) will print 2 3 4 * +, which is the postfix form. Here we didn’t explicitly build a tree; we just performed a translation on the fly. This is acceptable for simple compilers or interpreters (like expression evaluators). In a real compiler, we might instead output structured intermediate code rather than a flat postfix string.
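For comparison, the same postfix output can be produced after parsing by a post-order walk over an AST. A minimal, self-contained sketch in C (the Ast type and the helper names num, bin, print_postfix are illustrative only):

```c
#include <stdio.h>
#include <stdlib.h>

/* Tiny expression AST: op == 0 marks a numeric leaf. */
typedef struct Ast { char op; int value; struct Ast *left, *right; } Ast;

static Ast *num(int v) {
    Ast *n = malloc(sizeof *n);
    n->op = 0; n->value = v; n->left = n->right = NULL;
    return n;
}
static Ast *bin(char op, Ast *l, Ast *r) {
    Ast *n = malloc(sizeof *n);
    n->op = op; n->value = 0; n->left = l; n->right = r;
    return n;
}

/* Post-order walk: operands first, operator last -- i.e., postfix order. */
static void print_postfix(const Ast *n) {
    if (n->op == 0) { printf("%d ", n->value); return; }
    print_postfix(n->left);
    print_postfix(n->right);
    printf("%c ", n->op);
}

int main(void) {
    Ast *t = bin('+', num(2), bin('*', num(3), num(4)));  /* AST for 2 + 3 * 4 */
    print_postfix(t);                                     /* prints: 2 3 4 * + */
    printf("\n");
    return 0;
}
```

This is the AST-first alternative discussed next: the translation becomes a separate pass over the tree instead of a side effect of parsing.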
Best Practices: Whether building an AST or generating code directly, one must ensure the actions occur in the correct order relative to parsing. With bottom-up parsing, actions can be triggered on reductions (ensuring subcomponents are done first). With top-down, sometimes one uses recursive semantic routines paralleling the recursive descent parser. The design choice – AST-first vs. immediate code gen – affects compiler structure. Many modern compilers favor constructing a full AST and performing semantic analysis on it, because it separates concerns: parsing deals with structure, and then a later phase can handle the meaning with full information.
Takeaways & Benefits:
- Syntax-directed translation ties semantic computation to the parse structure. It provides a framework to perform translations (like building an AST, computing types, or even producing code) during parsing by attaching rules to grammar productions (Syntax-Directed Translation).
- AST vs. Parse Tree: The AST is the primary data structure carrying the code’s structure after parsing. It is more compact and abstract than the concrete parse tree (cs160-lec7_2022 – Compatibility Mode). It omits redundant nodes (like grouping symbols or certain intermediate non-terminals) and is tailored to the semantics of the language (e.g., an if AST node with two children for the then- and else-blocks, rather than many smaller nodes).
- SDT allows implementation of one-pass compilers for simple cases (immediate translation), but complex optimizations usually separate the steps. In practice, one often builds an AST, then does semantic analysis (type checking, etc.) as a separate pass over the AST, and then generates code. Still, the principles of syntax-directed definitions are used in those passes as well (each AST node can have methods or functions that process children and compute results).
- By formally specifying syntax-directed rules (as in an attribute grammar), we can reason about dependencies (e.g., ensuring no circular definitions of attributes, choosing evaluation order). Tools exist (like YACC/Bison for actions, or ANTLR’s actions, or special attribute grammar evaluators) to automate parts of this.
- Example recap: If the input code fragment is something like int x = 2 + 3 * 4;, the parser (with SDT) will ultimately produce an AST node for the assignment = with left child the variable x and right child an expression node +. That + node’s children would be the integer 2 and another node * whose children are 3 and 4. This AST omits the int keyword (which is handled as a declaration separately) and any parse-specific nodes. The AST then becomes the basis for generating intermediate code in the next phase.
4. Runtime Environments
After syntax and semantic analysis, the compiler’s focus shifts to preparing for execution. The runtime environment of a program refers to how the program’s data and control information are laid out in memory during execution and how function calls are implemented (managing control flow, parameters, return values, etc.). The compiler must generate code that correctly sets up and manages this environment at runtime.
Memory Organization: A typical program’s memory is divided into segments:
- Code (Text) Segment: where the machine instructions of the program reside (often read-only).
- Static (Data) Segment: for global or static variables and constants. This is allocated at program load time. It can further be divided into initialized data and uninitialized data (bss).
- Heap: an area for dynamic memory allocation (e.g., malloc in C). Managed via runtime library calls, this grows as needed for dynamic objects. Allocation and deallocation are under program control (or automatic in garbage-collected languages).
- Stack: used for managing function calls (automatic variables and control data). Each time a function is called, a stack frame (activation record) is pushed; when it returns, that frame is popped. The stack grows and shrinks with nested calls.
In a C program, for example, if you call a function f(), a new block of memory is allocated on the stack to hold f’s local variables, arguments, return address, etc. When f returns, that block is freed (automatically, by popping the stack). The stack thus supports last-in-first-out nesting of calls, which matches the structured nature of most languages (functions don’t persist after returning). Variables with static storage duration (like global variables or those marked static inside functions) live in the static data segment for the entire execution.
Activation Records (Call Stack Frames): An activation record contains the information needed to execute a single invocation of a procedure or function. A typical activation record layout includes (Microsoft PowerPoint – lecture10short):
- Return Address: the location in the caller’s code to jump back to when this function returns.
- Control Link (Dynamic Link): a pointer to the previous activation record (the caller’s frame). This is often the old frame pointer value, used to restore the caller’s base pointer on return (lect.dvi). It allows traversal of the stack for debugging or exceptions (a call chain).
- Access Link (Static Link): (if needed) a pointer to another activation record for lexical scoping (the enclosing function in languages with nested functions). C does not have nested lexical scopes across functions (only file scope and block scope), so C’s activation records do not include a static link. But languages like Pascal or Ada, which allow inner procedures to access variables of outer procedures, include a static link to find the defining environment.
- Local Variables and Temporaries: storage for all the local data of the function. This includes compiler-generated temporaries, and sometimes space for saving registers that the function might modify (according to the calling convention).
- Parameters: depending on the convention, parameters passed to the function may be stored in the caller’s frame or the callee’s frame. Often in high-level discussion, parameters are considered part of the callee’s activation record (accessible via the frame pointer). In implementation, some architectures push arguments onto the stack (above the return address) before the call, effectively putting them in the callee’s frame area.
- Saved Registers: if the calling convention specifies that certain registers must be preserved across calls (callee-saved registers), the callee will save those in its frame on entry and restore on exit. Similarly, some conventions have the caller save registers (caller-saved) it cares about around the call.
For example, an activation record for a function in an Algol-like language might look like: { Return Address; Dynamic Link; Static Link; Parameters; Local Variables } (Microsoft PowerPoint – lecture10short). In C’s case, there is no static link, and parameters are often accessed via the same base pointer as locals (just at positive offsets). When a function is called, a new activation record is pushed; the CPU’s stack pointer moves downwards (on most architectures) to allocate it. A frame pointer register (like EBP in x86 or the frame pointer in ARM) often points to a fixed location within the frame (e.g., the start of locals or the end of the saved context) to simplify access to variables via known offsets.
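Purely as a conceptual picture (real frames are raw stack memory whose layout is fixed by the compiler and the ABI, not a declared C type), the contents of such a frame for a C-like language could be modeled as:

```c
#include <stdint.h>

/* Conceptual model of one activation record -- illustration only. */
struct activation_record {
    void    *return_address;   /* where to resume in the caller */
    void    *dynamic_link;     /* saved frame pointer of the caller */
    /* no static link in C: no nested functions capturing outer locals */
    int32_t  parameters[2];    /* incoming arguments (e.g., a and b below) */
    int32_t  locals[1];        /* local variables and temporaries (e.g., c) */
    /* saved callee-save registers would also live here */
};
```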
Parameter Passing Methods: Different programming languages use different parameter passing semantics, and the compiler must implement these:
- Call by Value: The argument’s value is computed and copied into the parameter variable. In C, all parameters are passed by value (ints, pointers, structs, etc.). This means that inside the function you work on a local copy: scalars are copied, arrays decay to pointers (so a copy of the pointer is passed), and structs are copied in full unless the programmer passes a pointer instead. The activation record will have space for each parameter, and the caller will place the argument value there (via registers or the stack).
- Call by Reference: Instead of a value, the address of the argument is passed. The parameter inside the function is effectively an alias for the caller’s variable. Languages like C++ have references, and many languages (Fortran, Pascal’s var parameters) default to reference semantics. Implementation-wise, the caller passes a pointer, and the callee dereferences that pointer to read or write the variable. C does not have true call-by-reference for primitives, but you can achieve a similar effect by explicitly passing pointers (e.g., void foo(int *p)); a short C comparison follows this list.
- Call by Value-Result: Also known as copy-in/copy-out. The caller copies the value in (like call-by-value), and additionally, when the function returns, the final value of the parameter is copied back out to the original argument. Ada has such parameters (in out mode). To implement this, the activation record has a local slot for the parameter (copy-in), and the address of the caller’s variable is kept; on return, the value is written back.
- Call by Name: Used in Algol 60 (largely historical), where an argument is re-evaluated each time it is accessed, somewhat like a macro expansion or using thunks. This is complex to implement (it is like passing in a hidden function to compute the value on demand).
- In modern compiled languages, call by value and call by reference cover most needs. The calling convention (often defined by the platform ABI – Application Binary Interface) will dictate exactly how parameters are passed (which registers or stack locations) and who cleans up. For example, in the cdecl convention on x86, arguments are pushed on the stack right-to-left and the caller cleans up the stack; on many x86-64 ABIs, the first few arguments are passed in registers (e.g., RCX, RDX, R8, R9 for Microsoft x64), with remaining on stack. The compiler’s code generator follows these rules when emitting call and return sequences.
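A minimal C illustration of the value/pointer distinction (the function names are just for this example): incr_value receives a copy, while incr_via_pointer simulates call-by-reference by taking an address:

```c
#include <stdio.h>

void incr_value(int n)        { n = n + 1; }    /* modifies only the local copy */
void incr_via_pointer(int *n) { *n = *n + 1; }  /* modifies the caller's variable */

int main(void) {
    int x = 5;
    incr_value(x);           /* x is copied into the parameter */
    printf("%d\n", x);       /* still 5 */
    incr_via_pointer(&x);    /* the address of x is passed (itself by value) */
    printf("%d\n", x);       /* now 6 */
    return 0;
}
```

Even in the second call the pointer itself is passed by value; the "reference" behavior comes from dereferencing it inside the callee.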
Activation and Control Transfer: When a function is called, the following typically happens (on a system using a stack and frame pointer):
- The caller evaluates each argument and either pushes it on the stack or places it in the designated register.
- The caller then executes a machine instruction like CALL, which pushes the return address (IP/EIP/RIP) onto the stack (or otherwise saves it) and jumps to the callee’s entry.
- On entry, the callee pushes the old frame pointer (so it can restore it later) and sets the frame pointer to the current stack pointer location (establishing a stable base for this frame). It then decreases the stack pointer to allocate space for locals.
- If required, the callee saves callee-save registers onto its stack frame.
- The body of the function executes. Local variables are accessed at known offsets from the frame pointer (or from the stack pointer in setups that omit the frame pointer).
- When returning, the function may place a return value in a specific register (e.g., EAX for an int return on x86) or in memory, as per convention.
- The callee restores any saved registers, restores the old frame pointer (pop), and executes a RET instruction, which pops the return address into the instruction pointer, thus jumping back to the caller.
- The caller then (in caller-clean conventions) adjusts the stack pointer to remove arguments if needed, and uses any return value.
The above sequence is orchestrated by compiler-generated prologue and epilogue code for each function. This ensures each call’s linkage is correct – i.e., the calling convention is respected so that caller and callee agree on where to find things like parameters and return address.
Memory for Local Data: Local (automatic) variables live in the stack frame. If a local is large (like a big array in C) or if the language allows dynamically sized locals (C’s alloca or Ada’s stack allocation), the compiler adjusts the stack pointer accordingly. Some features (like C99 variable-length arrays) require the size to be determined at runtime before stack allocation.
Heap Allocation and Garbage Collection: The heap is a region for dynamic allocation (via malloc/free in C, new/delete in C++, or automatic garbage-collected allocation in Java/C#). The compiler typically calls library routines for allocation and deallocation (in languages with manual memory management, free is explicitly called by the programmer). In languages with garbage collection, however, the runtime environment includes a garbage collector that automatically reclaims memory that the program no longer references. Garbage collection (GC) can be by reference counting or by tracing (mark-and-sweep, generational copying collectors, etc.). The compiler’s role in a garbage-collected language is to emit allocations (which call into the GC allocator) and to provide information (perhaps via generated tables or tags) about what parts of the stack or heap contain pointers, so the GC can find live objects. Garbage collection is essentially automatic memory management – the runtime identifies allocated objects that are no longer reachable by the program and frees them, relieving the programmer of manual deallocation responsibilities (Garbage collection (computer science) – Wikipedia). For example, Java’s runtime periodically runs a GC that finds unused objects and reclaims their heap space. C and C++ do not have built-in GC, but conservative collectors (like the Boehm GC) can be added if needed.
Example – Activation Record in Action: Consider a simple C function:
int add(int a, int b) {
int c = a + b;
return c;
}
When add(5, 7) is called from main, the stack might look like:
- The activation record for main is at the bottom.
- Calling add: main (the caller) pushes b=7, then a=5 (or puts them in registers, as per the ABI), then executes CALL, which pushes the return address; any caller-save registers are saved as needed.
- In add’s frame: at entry, the old EBP is saved and a new EBP set. Space for c is reserved (e.g., sub esp, 4). Now a and b are accessible at, say, EBP+8 and EBP+12 (if passed on the stack), or via registers if the ABI uses them. The code computes a+b (from those locations) and stores the result at [EBP-4] (assuming that is c’s offset), then moves it into EAX (the return register). On return, it restores EBP and executes RET; the return address was on the stack, so RET pops it and jumps back. The caller then adjusts the stack pointer to pop the arguments (in cdecl).
- Thus, after the return, main sees the result in EAX (or the appropriate register) and continues.
All these details (offsets, which registers to use) are decided at compile-time by the compiler, following the platform’s conventions. The compiler emits code to implement the protocol of calls and returns.
Takeaways:
- The runtime environment is structured so that each function call has a stack frame (activation record) containing its locals, parameters, return address, and links (Microsoft PowerPoint – lecture10short). The compiler’s generated code manages this stack frame on calls and returns.
- Calling conventions define how arguments are passed (in registers vs stack) and who cleans up the stack. The compiler must generate code accordingly for every call and return. For instance, in one convention the callee might pop arguments before returning (e.g., stdcall), in another the caller does it (cdecl). These conventions also cover where return values go (commonly a register).
- Proper handling of the activation record is crucial for correct recursion and reentrancy. Each recursive call gets its own frame, with its own set of locals. The dynamic link (saved frame pointer) chains back so one can recover the caller’s context on return.
- Parameter passing methods (value vs reference, etc.) are implemented by the compiler through either copying values or passing addresses. Conceptually, call-by-value is simplest (just copy), while call-by-reference is implemented by pointers (sharing the caller’s variable) and call-by-value-result by a combination (copy in, and copy out at end).
- If the language supports exceptions, the runtime needs additional info (like stack unwinding tables) to destroy activation records properly when an exception jumps out of a function. Similarly, coroutines or threads require more complex management (stacks per thread, etc.), which the compiler and runtime handle together.
- Garbage collection (when applicable) runs alongside the program to manage heap memory (Garbage collection (computer science) – Wikipedia). The compiler may insert write barriers or stack maps to aid the GC. In non-GC languages like C, the compiler’s involvement with the heap is minimal beyond calling malloc/free.
- In summary, the runtime environment set up by the compiler ensures that high-level language features (function calls, variable lifetimes, etc.) have a correct low-level implementation in memory.
5. Intermediate Code Generation
After semantic analysis, many compilers translate the AST or annotated syntax tree into an Intermediate Representation (IR). IR is a code form that is typically lower-level than the AST but not yet tied to a specific machine – it’s like a portable assembly or quads that facilitate optimization. The compiler can then perform optimizations on the IR and eventually translate it into target machine code.
Common IRs: There are various forms of IR:
- Three-Address Code (TAC): a sequence of simple instructions, each with at most three operands (hence the name: x = y op z). Each instruction typically has one operation and is akin to a single three-operand assembly operation, or a high-level assembly for a fictitious machine. For instance, a complex expression t = (a + b) * c might be broken into TAC as:
t1 = a + b
t2 = t1 * c
where t1 and t2 are compiler-generated temporary names. In three-address code, we allow an instruction of the form X := Y op Z (or unary X := op Y), as well as jumps and conditional jumps, etc. (Three-Address Code). It is called “three-address” because an instruction can mention up to three addresses (locations) – for example, the result and two sources (Three-Address Code). This linear sequence of simple instructions is convenient for optimization. Many compilers (including GCC’s older RTL or the intermediate stages of Java JITs) use a form of TAC.
- Control-Flow Graph and Basic Blocks: Often the IR is viewed not just as a list of instructions, but organized into basic blocks (straight-line code with no jumps except into the beginning and out of the end) linked in a control-flow graph (CFG). This is conceptual rather than a different IR grammar, but it is important for optimization.
- Static Single Assignment (SSA) Form: SSA is a modern form of IR where each variable is assigned exactly once, by introducing distinct versions for variables whenever they would be assigned anew (SSA (GNU Compiler Collection (GCC) Internals)). In SSA, our previous example might look like:
t1 = a_1 + b_1
t2 = t1 * c_1
where a_1, b_1, c_1 are the initial versions of a, b, c. If later we reassign a, we would call it a_2, and so on. At join points of control flow, phi-functions merge variables: e.g., after an if-else, x_3 = φ(x_1, x_2) indicates that x_3 is either x_1 (from the then-branch) or x_2 (from the else-branch) (SSA (GNU Compiler Collection (GCC) Internals)). SSA is not an alternative to TAC but a property of the IR – typically SSA is applied to a three-address code representation. SSA simplifies many analyses and optimizations because each variable version has a single definition point, making data-flow relationships explicit. Modern compilers like LLVM use SSA-based IR (LLVM IR is in SSA form) because it provides clear def-use chains and eases optimizations like constant propagation and dead code elimination. In fact, “converting to SSA” is a common step before performing global optimizations (SSA (GNU Compiler Collection (GCC) Internals)).
- High-level IR vs. Low-level IR: Some compilers use multiple IR levels. For instance, one IR might still resemble source constructs (like a tree for a loop with a condition and increment), which is good for certain high-level optimizations, and a lower IR might break everything into machine-like operations (good for register allocation). GCC, for example, uses GENERIC and GIMPLE (in SSA) as higher-level IRs, then eventually lower-level RTL. Java HotSpot uses bytecode (a stack-based IR), then an SSA IR in the JIT.
Translating AST to IR: The compiler’s intermediate code generator walks the AST and emits corresponding IR instructions. Key design decisions include:
- Evaluation Order: The AST may not explicitly dictate evaluation order (if language allows reordering as in expressions with no side effects), but the IR must pick a specific order. The typical choice is left-to-right or based on tree traversal. For example, to translate an infix expression tree, a postorder traversal can naturally produce code that evaluates children then applies the operator.
- Temporary Management: For complex expressions, the compiler introduces fresh temporary variables to hold intermediate results. For every sub-expression node in the AST that isn’t a simple operand, we likely create a temporary (unless we do direct register allocation later).
- Structured vs. Unstructured Control: The AST for a loop or if-statement is structured (with subtrees for the body, etc.). IR can either keep some structure or lower it to jumps. Many intermediate codes use explicit labels and gotos to represent control flow. For example, a high-level while loop node can be translated to something like:
L1: if !(condition) goto L2 // check loop condition; if false, exit
<body IR>
goto L1
L2: ... // exit loop
which is a set of three-address code instructions including conditional and unconditional jumps. This unstructured form (jumping back to the start, jumping out at the end) is convenient for general optimizations (Types of Three-address codes – GeeksforGeeks).
- Procedure calls: The IR might have a pseudo-instruction for calls (e.g., call p, n for a call to p with n arguments, as in the three-address code description (Three-Address Code)) and an instruction for returning a value. Or it might inline procedure bodies if doing interprocedural work.
- Memory accesses: Depending on IR design, the IR might abstract memory access (e.g., having array indexing as a high-level operation) or lower it into address arithmetic and load/store operations. A three-address IR often includes instructions like t = addr(arr, i) and x = load t to represent x = arr[i]. (A minimal sketch of an AST-to-TAC walk follows this list.)
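Putting those decisions together, here is a minimal sketch of an AST-to-three-address-code walk for expressions. The Node type, new_temp, and gen_expr names are illustrative, not from any particular compiler; the walk is post-order, so operands are evaluated before the operator, and each non-leaf node gets a fresh temporary:

```c
#include <stdio.h>

/* Illustrative expression AST: a leaf holds a variable name, an inner node an operator. */
typedef struct Node {
    char op;                     /* '+', '*', ... ; 0 for a leaf */
    const char *name;            /* variable name for leaves */
    struct Node *left, *right;
} Node;

/* Hand out fresh temporary names t1, t2, ... */
static const char *new_temp(void) {
    static char buf[64][8];
    static int n = 0;
    sprintf(buf[n], "t%d", n + 1);
    return buf[n++];
}

/* Post-order walk: generate code for the children first (fixing the evaluation
 * order), then emit one three-address instruction whose result is a new temp. */
static const char *gen_expr(const Node *e) {
    if (e->op == 0)
        return e->name;                          /* a leaf names itself */
    const char *l = gen_expr(e->left);
    const char *r = gen_expr(e->right);
    const char *t = new_temp();
    printf("%s = %s %c %s\n", t, l, e->op, r);   /* emit: t = l op r */
    return t;
}

int main(void) {
    /* (a + b) * c */
    Node a = {0, "a", 0, 0}, b = {0, "b", 0, 0}, c = {0, "c", 0, 0};
    Node sum  = {'+', 0, &a, &b};
    Node prod = {'*', 0, &sum, &c};
    gen_expr(&prod);      /* prints: t1 = a + b   then   t2 = t1 * c */
    return 0;
}
```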
Example – Converting C Constructs to IR:
1. Conditional (if-else): Consider:
if (x < y) max = y; else max = x;
A possible three-address code sequence (Three-Address Code):
if x < y goto L_true
goto L_false
L_true: max = y
goto L_join
L_false: max = x
L_join: ... (continue here)
This uses a conditional jump for the `if` test, directing control to either the true-label or the false-label. After executing one of the assignments, an unconditional jump skips over the other branch’s code. In SSA form, at `L_join` we might have `max_3 = φ(y, x)` merging the two values for `max`.
2. Loop: For a `while (i < n) { i = i + 1; }`, the IR could be (Three-Address Code):
L_top: if i >= n goto L_exit // check loop condition (negated for exit)
i = i + 1 // loop body
goto L_top
L_exit: ... (next code)
This is a typical lowering of a while-loop to a label at the top, a conditional branch to break out, and a back-edge.
3. Arithmetic: The expression `z = (x + 2) * (y - 3)` may translate as (Three-Address Code):
t1 = x + 2
t2 = y - 3
t3 = t1 * t2
z = t3
This sequence ensures each operation is broken out. In a real compiler, some of those temporaries might be optimized away or kept in registers, but at the IR level we treat them as separate storage to make data dependencies explicit. Note how an AST that had `*` at the root with children `(x+2)` and `(y-3)` becomes a linear sequence computing the subexpressions and then the multiply.
4. Function call: A call `p = foo(a, b*2);` might become (Three-Address Code):
t1 = b * 2
param a
param t1
t2 = call foo, 2 // call foo with 2 args, result in t2
p = t2
In this style (common in textbooks for three-address code), you push parameters and then use a `call` opcode. Alternatively, the IR might keep the call as one pseudo-instruction with its arguments embedded.
In designing IR, compilers consider trade-offs:
- A higher-level IR (with nodes for loops, for example) is easier to relate to source constructs and may enable certain high-level optimizations (like loop transformations) more directly.
- A lower-level IR (just jumps and simple operations) is uniform and simpler to implement analyses on (graphs of basic blocks, etc.). It also maps more directly to machine code.
- Static single assignment form is a *choice* to simplify analysis: many global optimizations (constant propagation, value numbering, etc.) become cleaner in SSA form because you don’t have to consider multiple definitions of a variable interfering – each SSA name has one definition, and φ-nodes handle merges.
Modern optimizing compilers often turn the AST into a **three-address code in SSA**. For example, **LLVM IR** resembles a typed three-address code in SSA form (with an unlimited number of virtual registers each defined once). This combination makes analyses like liveness or constant propagation easier to implement.
**Trade-offs and Design:** The IR must be rich enough to represent all operations of the source language (or at least be a platform-independent model of them). If the source has complex features (say, exception handling or closures), the IR needs constructs or conventions for those (like special instructions or function objects). Simpler IR keeps fewer opcodes and relies on library or runtime calls for complex operations.
**Key Takeaways:**
- Intermediate code is a bridge between the high-level AST and low-level machine code. It often takes the form of a generic assembly-like language (e.g., three-address code) that is easier to optimize because it breaks operations into simple components (Three-Address Code).
- **Three-address code (TAC)** is a prevalent choice: it linearizes computations into sequences of simple assignments and jumps (Three-Address Code). Complex expressions are decomposed into multiple TAC instructions using temporary variables, which hold the intermediate results (Three-Address Code).
- **SSA form** is a powerful IR property where each variable is assigned exactly once, making the flow of values explicit. Most modern compilers convert their IR to SSA to perform optimizations, then possibly convert out of SSA for final code generation (SSA (GNU Compiler Collection (GCC) Internals)). SSA introduces φ-nodes at control-flow merges to select the correct incoming value (SSA (GNU Compiler Collection (GCC) Internals)). This greatly simplifies data-flow reasoning.
- The IR is typically organized into basic blocks and a control-flow graph to facilitate control-flow analyses. Constructs like loops and if-else are translated into labels and gotos in the IR (unless the IR keeps them structured). For example, *available expressions analysis* or *liveness analysis* will operate on the CFG of the IR.
- **Example recap:** A high-level language construct, say a `for` loop, will be translated to IR with explicit jump instructions implementing the loop control. This unambiguous, low-level representation is what optimization passes work with. Later, when emitting assembly, each IR instruction (or small group) will map to one or a few machine instructions.
- By designing a good IR, compiler developers isolate machine-dependent parts (done later, in code generation) from the general optimization logic (done on the IR). This improves retargetability (you can reuse the front-end and optimizer for different machines, just by having a new back-end that knows how to translate IR to machine code).
6. Local Optimization
After generating intermediate code, the compiler performs **optimization** to improve the efficiency of the code. Some optimizations are **local** to a small region of code (e.g., a basic block) and can be done quickly, even during code generation. Local optimizations focus on transformations that don’t require whole-program analysis – typically within a single basic block or a small window of consecutive instructions. The goal is to produce code that runs faster or is smaller, without changing the program’s meaning.
**Scope and Rationale:** Local optimizations operate on straight-line code sequences (no control flow joins or branches except at the ends). This limited scope means we can assume any value not defined in the block is unchanged in that block, etc. While local optimizations may miss opportunities that require global knowledge, they are still very useful and usually cheap to perform. They often clean up obvious inefficiencies introduced by earlier phases (like redundant computations from naive translation).
**Peephole Optimization:** This is a classic local optimization technique where the compiler examines a *sliding window* (the "peephole") of a few instructions and replaces that pattern with a better sequence when possible ([Compiler Construction](http://homepage.cs.uiowa.edu/~jones/compiler/notes/38.shtml#:~:text=Peephole%20optimization%20involves%20examination%20of,a%20typical%20peephole%20optimization%20rule)). It was introduced as a method to improve generated assembly code by recognizing inefficient instruction sequences and replacing them:
- For instance, a sequence like
```asm
MOV R0, R1    ; R0 := R1
MOV R1, R0    ; R1 := R0  (redundant)
```
contains a redundant second move: after the first instruction, R0 and R1 already hold the same value, so the second move has no effect and can be deleted (provided no jump targets it).
- Another common peephole rule is to eliminate push; pop pairs to the same place, or to simplify address calculations.
- On modern CPUs, replacing a pair of instructions with a single instruction using an immediate operand can be beneficial. For example, if the IR or naive code gives:
```asm
MOV R3, #5      ; load the literal 5 into R3
ADD R2, R2, R3  ; add R3 to R2
```
a peephole optimizer could detect this pattern and replace it with ADD R2, R2, #5, an immediate add, which is shorter and faster (Compiler Construction).
- Redundant loads/stores are also a target for peephole optimization: e.g., loading a value from memory that is already in a register from just-prior instructions, or storing a variable and then immediately reading it back, can be optimized.
Peephole optimization is typically machine-dependent because it works with the target instruction set and its idioms (though one can apply it to IR as well). It’s often done after code generation, on the assembly or machine code level, because that’s where real instruction sequences and their interactions are visible. It can also fix up suboptimal code patterns emitted by earlier phases.
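Mechanically, a peephole pass is just a window scan over the instruction list. A minimal sketch in C, with a toy Insn type and a single rule (drop the redundant reversed MOV from the example above); the representation is illustrative only:

```c
#include <stdio.h>
#include <string.h>

/* Toy instruction: opcode plus destination and source register names. */
typedef struct { char op[8]; char dst[8]; char src[8]; } Insn;

/* One peephole rule over a 2-instruction window:
 *   MOV x, y   followed by   MOV y, x
 * The second move is redundant (y already equals x) and is dropped,
 * assuming nothing branches to it. Returns the new instruction count. */
int peephole(Insn *code, int n) {
    int out = 0;
    for (int i = 0; i < n; i++) {
        if (out > 0 &&
            strcmp(code[i].op, "MOV") == 0 && strcmp(code[out-1].op, "MOV") == 0 &&
            strcmp(code[i].dst, code[out-1].src) == 0 &&
            strcmp(code[i].src, code[out-1].dst) == 0) {
            continue;                /* skip the redundant second move */
        }
        code[out++] = code[i];       /* keep this instruction */
    }
    return out;
}

int main(void) {
    Insn prog[] = { {"MOV","R0","R1"}, {"MOV","R1","R0"}, {"ADD","R2","R0"} };
    int n = peephole(prog, 3);
    for (int i = 0; i < n; i++)                       /* prints: MOV R0, R1  then  ADD R2, R0 */
        printf("%s %s, %s\n", prog[i].op, prog[i].dst, prog[i].src);
    return 0;
}
```

Real peephole optimizers carry many such pattern/replacement rules and rescan until no rule applies.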
Algebraic Simplifications: Within a basic block or even a single instruction, the compiler can use algebraic identities to simplify computations (compil11a):
- Constant Folding: If an expression has constant operands, compute it at compile time. E.g., replace x = 4 * 2; with x = 8;. This is often done as early as the AST level, but can also occur in IR. (Constant folding can be global or local; locally, we just fold constants that are both available in this instruction.)
- Strength Reduction: Replace an expensive operation with a cheaper one when possible. Classic example: multiplication or division by a power of 2 can be replaced by a left or right shift. E.g., t = y * 8; becomes t = y << 3;. Shifts are generally at least as fast as multiplies (modern CPUs have fast multipliers, but historically a shift was sometimes the only cheap option on machines without a fast multiply). Another strength reduction: use addition instead of multiplication in a loop for index stepping (this is really a loop optimization applied globally, but it can be done locally for single occurrences).
- Algebraic identities: Remove or simplify operations that have no effect or an obvious outcome:
  - x = x + 0 can be removed, and y = 0 + x becomes y = x (addition of 0 eliminated) ([PDF] CS153: Compilers Lecture 19: Optimization – Harvard University).
  - y = x * 1 → y = x (multiplication by 1).
  - z = x * 0 → z = 0 (x * 0 yields 0 regardless of x, assuming no side effects; the compiler must be cautious if x were an expression with side effects, but for a plain variable it is safe).
  - a = a * 2 might be turned into a = a + a (whether that is better depends on the architecture).
  - y = x / 1 → y = x; y = x - 0 → y = x.
  - Boolean simplifications: if (b && true) becomes if (b), etc.
- Reassociation and Constant Combination: Within a basic block, if we have t1 = 4 + 5 and later t2 = t1 + 6, we can fold them into t2 = 15 directly. Compilers often combine constants: for instance, transform ((a + 5) + 3) into (a + 8). This is local if it stays within one expression’s computation. (A small folding/simplification sketch in C follows this list.)
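These rewrites are straightforward to express over a tiny IR instruction. The Quad/Operand types below are illustrative assumptions, and only a few of the rules above are shown:

```c
#include <stdio.h>

/* Toy three-address instruction: dst = a op b, where an operand is either a
 * constant or a named variable. Representation is illustrative only. */
typedef struct { int is_const; int value; char name[8]; } Operand;
typedef struct { char dst[8]; char op; Operand a, b; } Quad;

/* Apply constant folding and a few algebraic identities to one instruction.
 * A simplified instruction is encoded as a copy: "dst = a" with op set to '='. */
void simplify(Quad *q) {
    if (q->a.is_const && q->b.is_const) {                 /* constant folding */
        int r = q->op == '+' ? q->a.value + q->b.value
              : q->op == '*' ? q->a.value * q->b.value
              : 0;
        q->op = '='; q->a.is_const = 1; q->a.value = r;
        return;
    }
    if (q->op == '+' && q->b.is_const && q->b.value == 0) { q->op = '='; return; }  /* x + 0 -> x */
    if (q->op == '*' && q->b.is_const && q->b.value == 1) { q->op = '='; return; }  /* x * 1 -> x */
    if (q->op == '*' && q->b.is_const && q->b.value == 0) {                         /* x * 0 -> 0 */
        q->op = '='; q->a.is_const = 1; q->a.value = 0;
    }
}

int main(void) {
    /* t2 = t1 + 0  should simplify to the copy  t2 = t1 */
    Quad q = { "t2", '+', { 0, 0, "t1" }, { 1, 0, "" } };
    simplify(&q);
    printf("%s %c %s\n", q.dst, q.op, q.a.name);   /* prints: t2 = t1 */
    return 0;
}
```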
Redundant Code Elimination (Local): Within a basic block, if a value is computed multiple times with no intervening modifications to its operands, one can eliminate the redundancy:
- Common subexpression elimination (local variant): If the same expression a + b occurs twice in a block and neither a nor b changed in between, the second occurrence can be replaced by a reuse of the first result. This is often done via value numbering within a basic block, which assigns a number to each unique computed value. For example:
t1 = A + B
...
t2 = A + B
If A and B haven't changed, the optimizer can recognize that A + B was already computed into t1 and simply reuse t1: e.g., delete t2 = A + B and replace uses of t2 with t1. (If needed, one might rename t1 to a more permanent name.) A sketch of local value numbering follows this list.
- Dead code elimination (local): If an instruction in a block computes a value that is never used in any subsequent instruction (within the block or at its end), it can be removed immediately. For instance, if we have x = y * 0; and nothing ever uses x afterwards, that instruction can be removed (assuming no side effects – here the multiplication has none aside from computing x). This often occurs after other transformations; e.g., constant propagation might make an assignment dead.
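Here is a minimal sketch of local value numbering for a single basic block (the TAC struct, one-character variable names, and the restriction to a single commutative '+' operator are simplifying assumptions): each distinct (operator, operand value numbers) pair gets a value number, and a recomputation of an already-numbered expression is rewritten as a copy of the earlier result.
#include <stdio.h>

/* Three-address instruction: dst = a + b (names are single characters). */
typedef struct { char dst, a, b; int is_copy; char copy_of; } TAC;

#define MAXV 128
static int  var_vn[MAXV];          /* value number currently held by each variable  */
static int  expr_vn[MAXV][MAXV];   /* value number of (vn(a) + vn(b)), 0 = unseen   */
static char expr_home[MAXV][MAXV]; /* variable that first computed that expression  */
static int  next_vn = 1;

static int vn_of(char v) {
    if (var_vn[(int)v] == 0) var_vn[(int)v] = next_vn++;   /* fresh incoming value */
    return var_vn[(int)v];
}

/* Local CSE over one basic block. */
static void value_number(TAC *code, int n) {
    for (int i = 0; i < n; i++) {
        int va = vn_of(code[i].a), vb = vn_of(code[i].b);
        if (vb < va) { int t = va; va = vb; vb = t; }       /* '+' is commutative */
        if (expr_vn[va][vb] != 0) {
            /* Same expression, operands unchanged: reuse the earlier result. */
            code[i].is_copy = 1;
            code[i].copy_of = expr_home[va][vb];
            var_vn[(int)code[i].dst] = expr_vn[va][vb];
        } else {
            expr_vn[va][vb]   = next_vn;
            expr_home[va][vb] = code[i].dst;
            var_vn[(int)code[i].dst] = next_vn++;
        }
    }
}

int main(void) {
    TAC block[] = { {'t','A','B',0,0}, {'u','A','B',0,0} };  /* t = A+B; u = A+B */
    value_number(block, 2);
    for (int i = 0; i < 2; i++)
        if (block[i].is_copy) printf("%c = %c   (CSE: reuse earlier result)\n", block[i].dst, block[i].copy_of);
        else                  printf("%c = %c + %c\n", block[i].dst, block[i].a, block[i].b);
    return 0;
}
Because a redefined operand receives a fresh value number, any expression keyed on its old number automatically stops matching, which is exactly the "no intervening modification" condition described above.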
Example of Local Optimization: Suppose our IR for part of a program (basic block) is:
t1 = 4 * 1 // t1 = 4
t2 = t1 + 0 // t2 = t1
x = t2
y = x * 8
z = x * 8
Without optimization, that’s a direct translation. Now, local optimizations:
- Constant fold 4 * 1 to just 4, so t1 = 4.
- Remove the + 0: t2 = t1 + 0 becomes t2 = t1, or simply t2 = 4 (since t1 is 4). That step is really copy propagation: t2 just copies t1. Propagating further, x = t2 means x = 4 as well.
- So now we have x = 4. The multiplies y = x * 8 and z = x * 8 each multiply 4 by 8, yielding 32. We can strength-reduce * 8 to a shift (though at the IR level it might stay a multiply until later). More importantly, y and z compute the same expression x * 8 with no change to x in between. Local common subexpression elimination would compute x * 8 once:
y = x * 8
z = y   // reuse the result for z
Now z = y is a copy to propagate, and if z has no separate further use, it might even be optimized out, depending on context.
- After all this, we might end up with:
x = 4
y = x << 3   // shift left 3 bits to multiply by 8
z = y
Then copy propagation could turn y = x << 3 into y = 4 << 3, which constant folding reduces to y = 32, and hence z = 32. And if y or z aren't used later, dead code elimination can remove them entirely. So a lot can happen from a tiny sequence! In practice, these steps are applied systematically: constant propagation/folding, then algebraic identities, common subexpression elimination, copy propagation, and dead code removal, often iterated until no more changes occur.
Peephole Example (Assembly level): Consider the machine code:
ldr r4, [r3, #0] ; load from address in r3
str r4, [r3, #0] ; store it back to same address
This load/store sequence accomplishes nothing: it writes the value back to the address it was just loaded from, unchanged. A peephole optimizer can spot this and delete the redundant store (and the load as well, if r4 is not needed afterwards). Another example:
sub r1, r1, #0 ; subtract 0 (no effect on r1)
This is useless and can be removed.
Why Local Optimization Matters:
- It catches many obvious inefficiencies that arise from naive code generation or from intermediate transformations. For example, after inlining or unrolling loops, there might be duplicate calculations that local optimization can remove.
- It runs fast because it only considers limited contexts, and thus can be done even in compilers that don't perform heavy global optimization (at the -O1 optimization level, for example, you typically still get peephole improvements).
- It improves code quality without needing complex data-flow analyses (or it uses very simple ones within a block).
Key Takeaways:
- Local optimizations clean up code within basic blocks or small instruction sequences, making it more efficient without considering the larger program context.
- Peephole optimization is a technique of pattern-matching short sequences of instructions and replacing them with more optimal sequences (Compiler Construction). It can remove redundant instructions, combine adjacent instructions, and substitute faster operations (like immediate addressing instead of register moves) (Compiler Construction).
- Algebraic simplifications use mathematical identities to simplify code: remove operations with neutral elements (adding 0, multiplying by 1), replace expensive ops with cheaper ones (multiply -> shift) (compil11a), and pre-compute constant expressions. These make the generated code shorter and faster.
- Common subexpression elimination (CSE) can be done within a block by noticing when the same expression (with identical operands) is computed more than once. By reusing a previous computation, we avoid doing it again. Local CSE is often implemented by value numbering or similar techniques that assign a unique number to each distinct expression in the block.
- Copy propagation replaces unnecessary intermediate copies by using the original value directly. For example, given u = v; w = u; we can simplify to w = v; and perhaps eliminate u if it's no longer needed.
- Dead code elimination (DCE) gets rid of computations whose results are not used. Locally, if a value is computed and not used before being overwritten or before the block ends, that computation can be removed.
- Although each local optimization might only save a small amount of time or space, cumulatively they significantly improve the efficiency of the final code. They often enable further optimizations (e.g., eliminating one instruction might make another one dead or another subexpression common). Therefore, compilers apply a series of local optimizations (sometimes repeatedly) to polish the code.
- Importantly, local optimizations never change the program’s intended results – they must preserve semantics. They only remove or change instructions that are proven to be unnecessary or equivalent to a simpler form. The correctness of these transformations is usually evident in the local context (e.g., “x*0 is always 0, so replace with 0” is safe as long as evaluating x has no needed side effect).
7. Data Flow Analyses
To perform more sophisticated optimizations, a compiler needs knowledge about how values flow through the program. Data flow analysis is a framework for gathering information about possible values computed at various points in a program. It typically involves setting up equations on a control-flow graph and solving them (often by iterative algorithms). Several classic optimizations rely on specific data flow analyses. We’ll discuss a few key ones: constant propagation, liveness analysis, and common subexpression elimination (which relies on available expressions analysis).
Constant Propagation
Purpose: Constant propagation is an optimization that aims to discover values that are constant at compile time and substitute them directly in the code (Constant Propagation – CRAN). For example, if the compiler can deduce that variable N is always 10 at a certain point, it can replace occurrences of N with 10. This often enables further simplifications like constant folding or dead code elimination (e.g., turning if (N == 10) into if (true) when N is known to be 10).
How it works: Constant propagation is formulated as a forward data flow analysis over the control-flow graph. Each program point is given an abstract state that tracks, for each variable, whether we know it to have a constant value (and if so, what that value is). A simple lattice for this analysis has elements: either “unknown” (⊤, top), “constant value v”, or sometimes an explicit “undefined” (⊥, bottom) for variables not yet given a value. Initially, all variables are marked unknown (except maybe those initialized at program start). Then:
- For an assignment x = 5, we infer that x's value is the constant 5 after that statement.
- For y = x + 2, if x is known to be 5 at that point, we infer that y is 7 (also constant).
- If a variable is assigned different constants on different paths (or a constant on one path and unknown on another), then when the paths merge the result for that variable becomes unknown (unless the constants agree). A classic example:
if (...) { c = 1; } else { c = 2; } // after the if-else, c could be 1 or 2, not a single constant
So after the merge, c is marked unknown (or “not constant”).
- If a variable is assigned from a non-constant expression or from input, it becomes unknown (or a special symbol meaning “not a constant”).
The data flow equations propagate these facts. When a block has multiple predecessors, a variable’s constant value is kept only if the same constant arrives from all predecessors; if any predecessor has it unknown or a different constant, it becomes unknown at the join. The process iterates until a fixed point is reached (no more changes).
Algorithmic overview: Often implemented with a worklist algorithm: initialize known constants (e.g., immediate assignments), put affected blocks on a worklist, then repeatedly process a block, updating the constants info for variables defined and used, and if anything changes, propagate that info to successor blocks. A famous algorithm is Wegman and Zadeck’s Sparse Conditional Constant Propagation (SCCP) which combines constant propagation with unreachable code detection, working on SSA form for efficiency ([PDF] Data Flow Analysis • Data Flow Frameworks • Constant Propagation …). But conceptually, even a simple iterative analysis on three-address code can do it.
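A minimal sketch of the pieces such an analysis is built from (the AbsVal type and the names below are illustrative assumptions; this is not a full SCCP implementation): an abstract value per variable, a transfer function that interprets one assignment, and a join that merges the facts arriving at a control-flow merge.
#include <stdio.h>

/* Abstract value for one variable: UNDEF (no info yet), a known constant,
   or NOT_CONST ("unknown"/varying). */
typedef enum { UNDEF, CONST, NOT_CONST } Tag;
typedef struct { Tag tag; long k; } AbsVal;

/* Join of the facts arriving on two paths at a control-flow merge. */
static AbsVal join(AbsVal a, AbsVal b) {
    if (a.tag == UNDEF) return b;
    if (b.tag == UNDEF) return a;
    if (a.tag == CONST && b.tag == CONST && a.k == b.k) return a;
    return (AbsVal){ NOT_CONST, 0 };          /* disagreeing or varying values */
}

/* Transfer function for "x = y + z": constant only if both operands are constant. */
static AbsVal eval_add(AbsVal y, AbsVal z) {
    if (y.tag == CONST && z.tag == CONST) return (AbsVal){ CONST, y.k + z.k };
    return (AbsVal){ NOT_CONST, 0 };
}

int main(void) {
    /* then-branch: c = 1      else-branch: c = 2 */
    AbsVal c_then = { CONST, 1 }, c_else = { CONST, 2 };
    AbsVal c_merge = join(c_then, c_else);     /* 1 vs 2 -> not a constant */

    /* a = 3; b = a + 2;  -> b is the constant 5 */
    AbsVal a = { CONST, 3 }, two = { CONST, 2 };
    AbsVal b = eval_add(a, two);

    printf("c after merge: %s\n", c_merge.tag == CONST ? "constant" : "not constant");
    printf("b = %ld (constant)\n", b.k);
    return 0;
}
At a real merge point the compiler applies join variable by variable to the whole abstract state, and the worklist loop reruns any block whose incoming state changed.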
Example: Suppose we have:
int a = 3;
int b;
b = a + 2;
int c;
if (b < 5) {
c = b * 2;
} else {
c = b * 3;
}
print(c);
Constant propagation starts by knowing that a is 3 (constant). Then it propagates: after b = a + 2, since a is 3, b becomes 5 (constant). The test if (b < 5) evaluates to if (5 < 5), which is false. So the true branch is actually dead (a compiler combining this with constant folding and dead-code removal could delete it). Even without removing it, we know the condition is false, so that branch won’t execute; in the else branch, b is still 5, so c = b * 3 means c = 15 (constant). After the if-else, c is 15 on the only feasible path. The program will print 15 without actually doing the arithmetic at runtime. Essentially, the analysis deduces b’s and c’s values and could replace them:
b = 5;
c = 15;
print(15);
And print(15) might then be reduced to simply outputting the constant.
Use in optimization: Constant propagation enables constant folding across basic blocks. For instance, even if b = a+2 and print(b) are in different blocks, constant propagation can carry the constant so that print(5) is emitted. It also often turns conditional jumps into fixed branches (an if that becomes always-true or always-false), which then allows dead code elimination of the never-taken branch. This can significantly simplify control flow.
It’s important that the analysis is conservative in cases of loops (it might never mark something constant if it changes in a loop) or in presence of unknown inputs. If a value isn’t guaranteed constant on all paths, it must be left as non-constant. However, some path-specific constants might allow conditional code specialization (beyond standard constant propagation, that’s more advanced).
To sum up, constant propagation determines where variables have constant values and substitutes them, thereby reducing computation and enabling further optimization (Constant Propagation in Compiler Design – GeeksforGeeks). This is done by a forward data-flow analysis considering the effects of each assignment on the knowledge of constants.
Liveness Analysis (Live Variable Analysis)
Definition: Liveness analysis determines, for each program point (typically each instruction or each variable at a point), whether a variable’s current value will be used in the future. A variable is live at a point if from that point onward (along some path to program end) there is a use of that variable’s value before the variable is overwritten; conversely, it is dead if it holds a value that will never be needed (Live-variable analysis – Wikipedia). Informally, “alive means its value matters later, dead means we can forget it.”
Why it’s useful: The primary application is in register allocation – deciding which variables to keep in registers. If a variable is not live (dead) at a certain point, the register holding it can be repurposed for another variable without causing errors. Liveness is also used for dead code elimination: if a variable is assigned a value that is never subsequently used (meaning that definition is dead immediately after), that assignment can be removed as useless.
Data-flow nature: Liveness is a classic backward data-flow analysis because whether a variable is needed now depends on future uses (Live-variable analysis – Wikipedia). We traverse the control flow graph from end to start. At the end of the program, no variables are live (assuming program exit doesn’t need variable values). We then move backwards:
- At each instruction, determine how it affects liveness. If the instruction is x = y + z, then:
  - x is defined here, so immediately after the assignment x's old value is no longer needed (it has been overwritten). In data-flow terms, the definition kills x: the value of x coming into this instruction is replaced.
  - The operands y and z are used, so they are certainly needed at this point (thus they must be live before this instruction executes, since their values are read).
- The liveness equations are typically:
Live_In[instr] = (Live_Out[instr] – {defs at instr}) ∪ {uses at instr}
Live_Out[instr] = ∪ (Live_In of all successor instructions)
(for a given instruction or basic block) ([PDF] CS153: Compilers Lecture 20: Dataflow analysis – Harvard University). In words: a variable is live entering an instruction if it is live after the instruction (and not redefined by it) or if it is used by that instruction.
- This analysis propagates backward until a fixed point is reached: e.g., in a loop, a variable might be live throughout the loop body, and that information flows around the back edge. A minimal sketch of the backward iteration appears after this list.
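Here is a minimal sketch of that backward fixed-point iteration in C, using bitsets for the use/def and live sets of a four-instruction toy program (the instruction encoding and successor arrays are assumptions made for this illustration):
#include <stdio.h>

#define N 4                 /* number of instructions in this toy example */
typedef unsigned VarSet;    /* bit i set  <=>  variable i is in the set   */

/* Toy program (variables: 0=a, 1=b, 2=c):
     0: a = ...            def {a}, use {}
     1: b = a + 1          def {b}, use {a}
     2: if (b > 0) goto 1  def {},  use {b}   successors: 1 and 3
     3: c = b              def {c}, use {b}                               */
static VarSet USE[N]  = { 0, 1u<<0, 1u<<1, 1u<<1 };
static VarSet DEF[N]  = { 1u<<0, 1u<<1, 0, 1u<<2 };
static int succ[N][2] = { {1,-1}, {2,-1}, {1,3}, {-1,-1} };

int main(void) {
    VarSet in[N] = {0}, out[N] = {0};
    int changed = 1;
    while (changed) {                        /* iterate to a fixed point          */
        changed = 0;
        for (int i = N - 1; i >= 0; i--) {   /* walk backward over the program    */
            VarSet o = 0;
            for (int s = 0; s < 2; s++)
                if (succ[i][s] >= 0) o |= in[succ[i][s]];   /* Out = U In(succ)   */
            VarSet n = (o & ~DEF[i]) | USE[i];              /* In = (Out-def) U use */
            if (o != out[i] || n != in[i]) changed = 1;
            out[i] = o; in[i] = n;
        }
    }
    for (int i = 0; i < N; i++)
        printf("instr %d: live-in = {%s%s%s }\n", i,
               (in[i] & 1) ? " a" : "", (in[i] & 2) ? " b" : "", (in[i] & 4) ? " c" : "");
    return 0;
}
The same loop structure works block-by-block instead of instruction-by-instruction; only the use/def sets are then computed per basic block.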
Example: Consider this sequence:
1. a = 3;
2. b = a + 2;
3. if (b > 5)
4. c = b - 5;
5. d = b * 2;
6. print(d);
Let’s analyze liveness roughly:
- At the end (after print(d)), no variable is live (assuming print is the last use of d).
- Line 6 uses d, so before line 6, d must be live (its value is needed for the print). It doesn't define any new variable, so liveness out of line 5 is just liveness into line 6. After the print, d is no longer live.
- Line 5 defines d and uses b. After line 5, d is live (until it is printed). What about b? b is not used anywhere after line 5 (print(d) doesn't use b), so b is not live after line 5. During line 5, b was needed (to compute d), so at the start of line 5, b is live; d is not live before line 5 (it gets defined there, so its old value, if any, wasn't needed).
- Line 4 (the if body) defines c and uses b. However, notice that c is never used later (there is no print of c, etc.), so any value assigned to c is dead right after line 4. b is used in line 4, so at that point b must be live (on the path where the if executes).
- The if condition at line 3 uses b, so coming into line 3, b is live (it is needed for the comparison). Line 3 doesn't define anything itself.
- Line 2 defines b and uses a. So for line 2: a must be live coming in (it is used to compute b), and b is not live coming in (it is defined here, so any previous value of b was irrelevant). After line 2, b is live if there is a later use; indeed, there are uses at lines 3, 4, and 5, so after line 2, b is live.
- Line 1 defines a and uses nothing. Is a live after line 1? Only if a is used later before being redefined. a is used at line 2, so yes, a is live at the end of line 1 (its value is needed in line 2). After line 2, a is not used anywhere, so from that point on a is dead (its value 3 is not needed beyond computing b).
Thus, the liveness sets might be: at the start of line 1, nothing is live (assuming no prior context); at the end of line 1, a is live (for line 2). At the end of line 2, b is live (for lines 3–5) and a is dead (it was just used and is not needed further). At the conditional (line 3), b is live (used in the condition, and also in line 5 regardless of the branch). Inside line 4, b is live (used) and c becomes defined (but is dead afterwards). At the end of line 4, b is still live (because of the use in line 5) and c is dead. At the end of line 5, d is live (for the print) and b is dead (no further use). At the end of line 6, d is dead (the program ends).
Application – Register Allocation: Liveness information is used to build an interference graph for register allocation: each variable is a node, and an edge is drawn between two variables if they are live at the same time (at some program point). If two variables interfere (are live simultaneously), they cannot share the same register. Conversely, if one variable is dead wherever another is live, they can potentially use the same register, which enables register reuse and helps avoid spilling. A classic algorithm is graph-coloring register allocation: color the interference graph with k colors (registers) so that no two adjacent nodes share a color. A small sketch of building the interference graph from liveness sets follows.
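As a small, hypothetical illustration of how liveness feeds this step, the following sketch builds an interference matrix using the common rule that a variable defined by an instruction interferes with every variable live out of that instruction (real allocators refine this, e.g., for copy instructions). The live-out sets here are hand-written stand-ins for what a liveness pass would produce.
#include <stdio.h>

#define NVARS 3                        /* 0=a, 1=b, 2=c (toy example)       */
typedef unsigned VarSet;

/* For each instruction: the variable it defines (-1 if none) and the set of
   variables live immediately after it (as liveness analysis would report). */
static int    def_var[]  = { 0, 1, -1, 2 };
static VarSet live_out[] = { 1u<<0, (1u<<0)|(1u<<1), (1u<<0)|(1u<<1), 0 };

static int interferes[NVARS][NVARS];

int main(void) {
    int n = sizeof(def_var) / sizeof(def_var[0]);
    for (int i = 0; i < n; i++) {
        int d = def_var[i];
        if (d < 0) continue;
        for (int v = 0; v < NVARS; v++)
            if (v != d && (live_out[i] & (1u << v))) {
                /* d is written while v's value must survive: they cannot
                   share a register. */
                interferes[d][v] = interferes[v][d] = 1;
            }
    }
    for (int x = 0; x < NVARS; x++)
        for (int y = x + 1; y < NVARS; y++)
            if (interferes[x][y]) printf("var %d interferes with var %d\n", x, y);
    return 0;
}
The resulting graph is what the coloring step consumes: each color class becomes one physical register, and variables that never interfere may end up sharing one.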
Application – Dead Code Elimination: If liveness shows that a variable’s value is never used (i.e., it is not live at any point after it is assigned), then the assignment to that variable is dead code. For example, c = b - 5 above is dead code (if c is not used later). Dead store elimination will remove such an assignment. Note that we require that writing the variable has no other side effect – a plain arithmetic assignment doesn’t, but an assignment might embed a function call or a volatile memory effect, which we wouldn’t remove. If it is purely a local calculation, we can remove it.
Summary: Liveness analysis is essential for optimizing storage. It answers the question “At this point in the program, which variables’ current values might be read later?” (Live-variable analysis – Wikipedia). Those variables need to be kept (in registers or memory) until their use. Variables not in that set can be allowed to “die” – registers can be reused, and if a variable is never live again, we don’t even need to compute it (if we hadn’t computed it yet, we could skip, or if computed and not used, that computation can be erased).
Common Subexpression Elimination (CSE)
We touched on a local version of CSE earlier; now the focus is global (across basic blocks). Common subexpression elimination aims to avoid recomputation of the same expression when it is known to have the same value as before. A common subexpression is an expression that appears multiple times such that between those appearances, none of the variables in the expression have changed. If so, the result computed the first time can be reused.
Theory – Available Expressions: Global CSE relies on available expression analysis (Available expression – Wikipedia). This is a forward data flow analysis which determines, for each program point, which expressions have already been computed and not invalidated (i.e., their operands’ values haven’t changed) on all paths to that point. An expression E (like a+b) is available at a point if, along every path from the entry of the program to that point, E has been computed and neither a nor b (nor any other sub-part of E) has been redefined since that computation (Available expression – Wikipedia). In other words, the value of E is still “available” in a temporary or register.
The data flow for available expressions is similar: at the start of the program, no expressions are available (nothing computed yet). As we go forward:
- If a new expression is computed, it becomes available thereafter (until its operands change).
- If a variable in some available expression is assigned a new value, any expressions involving that variable become unavailable (killed) from that point, because the earlier computed value is no longer valid.
- At control-flow merges, an expression is available only if it is available from all incoming paths (Available expression – Wikipedia).
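A minimal sketch of the forward bitvector form of this analysis (the block structure, gen/kill sets, and expression numbering below are invented for illustration): each bit stands for one expression, In[B] is the intersection of the predecessors' Out sets, and Out[B] = (In[B] − kill[B]) ∪ gen[B].
#include <stdio.h>

#define NBLOCKS 4
#define ALL     0x3u               /* two tracked expressions: bit0 = a+b, bit1 = p*q */
typedef unsigned ExprSet;

/* Per-block gen (expressions computed and not killed later in the block) and
   kill (expressions whose operands the block redefines).  Toy diamond CFG:
   B0 -> B1, B0 -> B2, {B1,B2} -> B3.  B0 computes a+b, B1 computes p*q,
   B2 assigns to a (killing a+b). */
static ExprSet GEN[NBLOCKS]  = { 0x1, 0x2, 0x0, 0x0 };
static ExprSet KILL[NBLOCKS] = { 0x0, 0x0, 0x1, 0x0 };
static int preds[NBLOCKS][2] = { {-1,-1}, {0,-1}, {0,-1}, {1,2} };

int main(void) {
    ExprSet in[NBLOCKS], out[NBLOCKS];
    for (int b = 0; b < NBLOCKS; b++) { in[b] = 0; out[b] = ALL; }  /* optimistic start */
    out[0] = GEN[0];                      /* entry block: nothing available on entry    */

    int changed = 1;
    while (changed) {
        changed = 0;
        for (int b = 1; b < NBLOCKS; b++) {
            ExprSet i = ALL;              /* intersection over all predecessors          */
            for (int p = 0; p < 2; p++)
                if (preds[b][p] >= 0) i &= out[preds[b][p]];
            ExprSet o = (i & ~KILL[b]) | GEN[b];
            if (i != in[b] || o != out[b]) changed = 1;
            in[b] = i; out[b] = o;
        }
    }
    for (int b = 0; b < NBLOCKS; b++)
        printf("B%d: in=%x out=%x\n", b, in[b], out[b]);
    return 0;
}
In this toy CFG, a+b (bit 0) is available on entry to both branch blocks but not at the join, because one branch redefines a; that is exactly the all-paths requirement stated above.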
When we find that an expression E is about to be computed but is already available (meaning we have a previous computation of E in some variable or temporary), we can eliminate the duplicate by using the previously computed value:
- We ensure the earlier value is saved in a temporary (if it wasn’t already).
- Instead of recomputing E, we replace that occurrence with the temporary holding E’s value.
Example:
x = a + b;
...
y = a + b;
If neither a nor b changed between those two lines, then a+b is a common subexpression. CSE introduces a temporary, e.g.,
t = a + b;
x = t;
...
y = t;
Thus the second a+b is eliminated (Common Subexpression Elimination – Code optimization Technique …). For correctness, one must be certain that along every path from the first computation to the second, a and b retained their values. If there is an if in between where a might be reassigned, then on some paths the expression doesn’t have the same value, so it is not “available” on entry to the second assignment, and we cannot safely reuse it without more complex conditions.
Global CSE Implementation: Usually done by:
- Running an available expressions data flow to gather sets of expressions available at block entries (Available expression – Wikipedia).
- During a traversal (or using those results), whenever an expression being computed is already available, redirect the use to a common temporary. Many compilers use value numbering or hash-based approaches to perform CSE within extended basic blocks or globally. In SSA form, some of this happens naturally because φ-functions unify values, though explicit algorithms such as global value numbering are commonly used.
Benefits: Avoiding redundant computations saves execution time. It can also expose other optimizations; for example, hoisting an expensive computation out of a loop is a form of CSE across iterations (loop-invariant code motion) and requires similar analysis.
Limitations: CSE must not reuse a value across intervening operations with side effects that could change it. Also, if the expression is very cheap (like a simple register move), eliminating it might not matter much; but for costly operations or memory accesses, CSE can yield significant gains.
Example with Control Flow:
t1 = p + q;
if (...) {
t2 = p + q; // appears again
r = t2 * 5;
}
If neither p nor q is changed inside the if, then p+q is available at the start of the if’s true block. The data flow would show that p+q is available coming into that block (it must be computed on all paths leading there; here the only path into the true block passes through the earlier computation of t1, so it is). So we can replace t2 = p + q; with t2 = t1; (using the value computed before the if) ([PDF] CS 4120/5120 Lecture 28 Redundancy elimination 11 April 2016). Better still, we can just reuse t1 itself, with no need for t2 at all: do r = t1 * 5; directly in the if. But be careful: reusing t1 requires that it is computed on every path that reaches the use; here it is computed before the if, so that is guaranteed.
Connection to Data Flow: The available expression analysis is the engine behind global CSE (Available expression – Wikipedia). It is a forward must-analysis: an expression counts as available only if it has been computed on all paths reaching the point (sets are intersected at joins), which is exactly what makes the reuse safe. Using its result, we eliminate recomputations. In practice, compilers sometimes combine this with building a graph of expression occurrences or use SSA’s properties to identify equivalent computations.
Takeaways:
- Common Subexpression Elimination ensures an expression is computed only once and reused, provided its operands don’t change in between. It relies on knowing what expressions’ values are still valid (available) at various points (Available expression – Wikipedia).
- This optimization can significantly reduce duplicated work. For example, in code with complex arithmetic or function calls with identical arguments, CSE will call the function once and reuse the result the second time (assuming no side effects in between).
- There is both local CSE (within a basic block, often handled by local value numbering) and global CSE (across multiple blocks, handled by data flow analysis and sometimes called global value numbering as well).
- Compilers must be cautious not to perform CSE when intervening operations might change the outcome (for instance, two identical function calls generally cannot be CSE’d because the function might produce different results or have side effects, unless it’s known to be pure).
- By eliminating common subexpressions, we save time and often enable other optimizations such as loop-invariant code motion or dead code elimination (if some calculation fed only the now-eliminated expression, it can itself be removed). One caveat: keeping a value alive longer so it can be reused can increase register pressure, so compilers weigh the reuse against the cost of an extra live value.
In summary, data flow analyses provide the information needed for powerful optimizations:
- Constant propagation uses a forward analysis to find when values are constants and replaces them (Constant Propagation – CRAN), simplifying code and enabling more optimizations (like folding and branch removal).
- Liveness analysis is a backward analysis that finds what values are needed later (Live-variable analysis – Wikipedia), crucial for resource management (register allocation) and for removing dead stores.
- Common subexpression elimination uses forward analysis (available expressions) to avoid redundant recalculation, improving efficiency (Available expression – Wikipedia).
Each of these analyses operates on the IR’s control-flow graph and typically they are applied repeatedly or in different sequences as part of an optimization pipeline in an advanced compiler.
8. End-to-End Compilation Example
To cement the concepts from lexing through optimization, let’s walk through a small C code snippet and illustrate how each phase of the compiler processes it, resulting in progressively more optimized output. Consider the following simple C program:
int main() {
int a = 2 + 3;
int i = 0;
int sum = 0;
while (i < a) {
sum = sum + i;
i = i + 1;
}
return sum;
}
This program computes the sum of the numbers from 0 to a-1 (with a being 5 in this case, so the sum of 0..4, which is 10). We’ll see how the compiler front-end and optimizer handle it:
1. Lexical Analysis: The lexer reads the characters and produces tokens. Ignoring whitespace, the token sequence might look like:
'int' 'main' '(' ')' '{'
'int' 'a' '=' '2' '+' '3' ';'
'int' 'i' '=' '0' ';'
'int' 'sum' '=' '0' ';'
'while' '(' 'i' '<' 'a' ')' '{'
'sum' '=' 'sum' '+' 'i' ';'
'i' '=' 'i' '+' '1' ';'
'}'
'return' 'sum' ';'
'}'
Each of these is identified with a token type (keyword, identifier, operator, numeric literal, etc.) (Lexical analysis – Wikipedia). Whitespace and comments (if any) are skipped. For instance, "int" is a keyword token; a, i, and sum are identifier tokens (carrying semantic info such as the lexeme string); numbers like 2, 3, 0, and 1 are numeric literal tokens (with their associated values). The lexer also typically records the source location of each token for error reporting.
2. Parsing: Using the C grammar, the parser constructs a syntax tree (or directly an AST). It will recognize structures: a function definition, a variable declaration list, a while-loop, etc.
A possible (simplified) parse tree outline:
TranslationUnit
└── FunctionDefinition
├── TypeSpec "int"
├── Declarator "main()"
└── CompoundStmt (function body)
├── DeclStmt "int a = 2 + 3;"
├── DeclStmt "int i = 0;"
├── DeclStmt "int sum = 0;"
├── WhileStmt
│ ├── Condition (Expr "i < a")
│ └── LoopBody (CompoundStmt)
│ ├── ExprStmt "sum = sum + i;"
│ └── ExprStmt "i = i + 1;"
└── ReturnStmt "return sum;"
The parser uses grammar rules to reduce tokens to non-terminals. For instance, it recognizes '2' '+' '3' as an additive expression and builds a node (with + as the operator and 2 and 3 as operands). It recognizes the while loop structure: 'while' '(' Expr ')' Stmt. Each local variable declaration is recognized, and the symbol table is updated with a, i, and sum as integers.
The resulting Abstract Syntax Tree (AST) would be a cleaned-up version of the parse tree: for example, the while loop node would have as children the condition expression subtree (< with left operand i and right operand a) and the body subtree (a list of two assignments). The 2+3 would be a subtree under the initialization of a. The AST might omit explicit sequence nodes for the compound statement, instead linking declarations separately from statements, etc., depending on compiler design. But logically, we have an AST representing:
- a initialized to (2+3)
- i initialized to 0
- sum initialized to 0
- a loop: condition (i < a), body { sum = sum + i; i = i + 1; }
- return sum
3. Syntax-Directed Translation / Semantic Analysis: During or after building the AST, the compiler performs semantic checks and sets up translations:
- Type Checking: It confirms that 2+3 is an int (2 and 3 are ints), that i < a yields a boolean (in C, treated as an int true/false), and that the returned expression matches the function’s declared return type (int). There are no type errors here.
- Symbol Resolution: The identifiers a, i, and sum are resolved to the declarations recorded in the symbol table; sum = sum + i refers to the local sum and i defined above. This is straightforward, as they are all in the same scope.
- Syntax-Directed Construction of AST/IR: Suppose the compiler builds an AST directly. Using syntax-directed translation, it can then produce an intermediate representation. Some compilers go directly to IR in one pass; others produce an AST and have a separate pass to IR. We will illustrate going to a three-address-like IR.
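As a rough illustration of what this checking step can look like inside a compiler (the AST node layout, type codes, and symbol-table helper below are invented for the sketch), here is a recursive type check for an expression such as sum + i against a tiny symbol table:
#include <stdio.h>
#include <string.h>

/* Hypothetical AST fragment: an expression is an int literal, a variable
   reference, or a binary operation on two sub-expressions. */
typedef enum { TY_ERROR, TY_INT } Type;
typedef enum { N_NUM, N_VAR, N_BINOP } NodeKind;
typedef struct Node {
    NodeKind kind;
    int value;                 /* N_NUM   */
    const char *name;          /* N_VAR   */
    char op;                   /* N_BINOP: '+', '<', ... */
    struct Node *lhs, *rhs;
} Node;

/* Toy symbol table: the three locals of the example program, all int. */
static const char *symbols[] = { "a", "i", "sum" };

static Type lookup(const char *name) {
    for (unsigned k = 0; k < sizeof symbols / sizeof symbols[0]; k++)
        if (strcmp(symbols[k], name) == 0) return TY_INT;
    fprintf(stderr, "error: undeclared identifier '%s'\n", name);
    return TY_ERROR;
}

static Type check(Node *n) {
    switch (n->kind) {
    case N_NUM:   return TY_INT;
    case N_VAR:   return lookup(n->name);
    case N_BINOP: {
        Type l = check(n->lhs), r = check(n->rhs);
        if (l == TY_INT && r == TY_INT) return TY_INT;   /* '+' and '<' on ints: ok */
        fprintf(stderr, "error: operands of '%c' must be int\n", n->op);
        return TY_ERROR;
    }
    }
    return TY_ERROR;
}

int main(void) {
    /* sum + i */
    Node sum = { N_VAR, 0, "sum", 0, 0, 0 };
    Node i   = { N_VAR, 0, "i",   0, 0, 0 };
    Node add = { N_BINOP, 0, 0, '+', &sum, &i };
    printf("type of 'sum + i': %s\n", check(&add) == TY_INT ? "int" : "error");
    return 0;
}
A real front end handles many more types, scopes, and error recovery, but the recursive shape of the check is the same.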
4. Intermediate Code Generation (Pre-optimization IR): Let’s generate a plausible TAC sequence from the AST:
- For int a = 2 + 3;: the expression 2+3 can be evaluated at compile time, but let’s assume the front end initially generates:
t1 = 2 + 3   // compute 2+3, result in temporary t1
a = t1       // assign to a
Or it might directly emit a = 2 + 3 in one instruction, but using a temp is fine.
- int i = 0;  ->  i = 0
- int sum = 0;  ->  sum = 0
- The while (i < a) loop: we need a label at the loop start and a conditional jump for the test.
L1:                  // loop start
t2 = i < a           // evaluate condition
if t2 == 0 goto L2   // if false (0), jump to loop exit
// loop body:
t3 = sum + i
sum = t3             // sum = sum + i
t4 = i + 1
i = t4               // i = i + 1
goto L1              // repeat loop
L2:                  // loop exit
Here, we introduced t2, t3, and t4 for intermediate results. t2 holds the result of i < a (1 for true, 0 for false). The if uses t2 to exit the loop when i < a is false. The loop body updates sum and i.
- return sum;: in IR, we might represent this as copying sum into a dedicated return register or location. For now, we could write:
ret = sum   // pseudo: place sum in the return register/variable
followed by an instruction to actually return. In three-address code we might just mark it as the end.
So the initial IR might look like:
// prologue (not shown: allocate space for a, i, sum if needed)
t1 = 2 + 3
a = t1
i = 0
sum = 0
L1:
t2 = i < a
if t2 == 0 goto L2
t3 = sum + i
sum = t3
t4 = i + 1
i = t4
goto L1
L2:
ret = sum
// end function
This intermediate code captures the logic of the original program.
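For concreteness, a compiler might hold such TAC in an array of instruction records. The struct layout and opcode set below are assumptions for illustration only; the sketch encodes and prints the loop head from the IR above:
#include <stdio.h>

/* Hypothetical in-memory form for three-address code. */
typedef enum { TAC_BINOP, TAC_IFZ_GOTO, TAC_GOTO, TAC_LABEL } TacOp;
typedef struct {
    TacOp op;
    char  dst[4], src1[4], src2[4];   /* variable / temporary names        */
    char  binop;                      /* '+', '<', ... for TAC_BINOP        */
    int   label;                      /* label number for jumps and labels  */
} Tac;

/* L1: ; t2 = i < a ; if t2 == 0 goto L2 */
static const Tac code[] = {
    { TAC_LABEL,    "",   "",   "",  0,   1 },
    { TAC_BINOP,    "t2", "i",  "a", '<', 0 },
    { TAC_IFZ_GOTO, "",   "t2", "",  0,   2 },
};

int main(void) {
    for (unsigned k = 0; k < sizeof code / sizeof code[0]; k++) {
        const Tac *t = &code[k];
        switch (t->op) {
        case TAC_LABEL:    printf("L%d:\n", t->label); break;
        case TAC_BINOP:    printf("  %s = %s %c %s\n", t->dst, t->src1, t->binop, t->src2); break;
        case TAC_IFZ_GOTO: printf("  if %s == 0 goto L%d\n", t->src1, t->label); break;
        case TAC_GOTO:     printf("  goto L%d\n", t->label); break;
        }
    }
    return 0;
}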
5. Optimization (Intermediate Code Optimization): Now, the compiler applies local and global optimizations on this IR.
- Constant Folding: We see t1 = 2 + 3. This can be folded to t1 = 5 (compil11a). In fact, since a is assigned 5 and never changed, a is essentially the constant 5 throughout. A constant propagation pass will realize that.
- Constant Propagation: a is 5, so propagate that to every use of a. Thus t2 = i < a becomes t2 = i < 5 (Constant Propagation in Compiler Design – GeeksforGeeks). Now a itself might not be needed as a variable at all (we could replace it with 5 everywhere and possibly remove its storage).
- We also have the initializations i = 0 and sum = 0, which assign constants.
- Strength Reduction / Algebraic Simplification: There is not much obvious algebraic simplification here aside from the constants. The loop statements i = i + 1 and sum = sum + i can be left as they are.
- Dead Code: t1 is a temp that after folding becomes t1 = 5; a = t1. If we propagate, we know a = 5, and we might not need t1 at all; we could just emit a = 5. Then a, being constant, might later be removed or simply kept as an immutable value.
- Loop Optimization: Constant propagation tells us the loop runs while i < 5. The loop itself cannot be unrolled completely by a simple local optimization, but a compiler might choose to unroll it partially or fully since the trip count is fixed and small (5 iterations). However, fully unrolling a loop just because its bound is constant requires care (here it is easy, but in general a constant bound might still be large).
- Common Subexpression Elimination: There is not much opportunity here because the loop’s expressions vary each iteration (sum + i changes each time because sum and i change). If we had repeated expressions, we would catch them. The test i < 5 is recalculated every iteration; 5 is constant, but i changes, so i < 5 must be evaluated each time (no CSE across iterations, and it is not loop-invariant because i changes).
- Dead Variable Elimination: After propagation, if a is replaced by 5 everywhere, the variable a is never read (except at its definition). So the assignment a = 5 can be seen as dead: since a isn’t used after initialization and nothing observable depends on it, it is effectively a wasted store. A compiler can eliminate a entirely (allocate no space for it and skip the assignment) – this is safe because removing it doesn’t change the output (as long as a isn’t volatile or otherwise externally visible).
- Liveness & Register Allocation: At this point, liveness analysis tells us: a (if still present) is not live after its assignment (it was only needed as the loop bound, and we replaced that with the constant), so a’s store is dead; i is live within the loop (used in the condition and body), but after the loop ends i is not used anymore (not used in the return), so i is dead after the loop and its register can be reused; sum is live through the loop and at the return, since that is what we return; t2, t3, and t4 are temporaries used within a single loop iteration and then overwritten or unused, so a good optimizer will not create fresh temporaries every iteration but will reuse a couple of registers.
- Peephole & Local Cleanups: The sequence t3 = sum + i; sum = t3; can be recognized as just sum = sum + i in one step (t3 is a redundant temp). A peephole pass, or an earlier optimization, can eliminate t3 and directly generate sum = sum + i. Similarly, t4 = i + 1; i = t4; becomes i = i + 1. These local simplifications make the IR leaner, reducing the number of instructions and temporaries.
After optimizations, the IR might be simplified to:
a = 5 // (might be eliminated entirely as discussed)
i = 0
sum = 0
L1:
if i >= 5 goto L2 // flipped the condition to directly check exit (using constant 5)
sum = sum + i
i = i + 1
goto L1
L2:
ret = sum
And since a is not actually needed, it could become:
i = 0
sum = 0
L1:
if i >= 5 goto L2
sum = sum + i
i = i + 1
goto L1
L2:
ret = sum
This is pretty close to optimal for the loop as written. One further improvement would be to realize that the loop’s result can be computed outright (the sum of 0..4 is 10): because the trip count is a small compile-time constant, an aggressive optimizer could fully unroll the loop and constant-fold it down to return 10, and some modern optimizers can even derive a closed-form expression for simple summation loops. Deriving such formulas in general, however, is beyond the basic local and data-flow optimizations discussed here, so we leave the loop in place.
However, a modern compiler might unroll the loop a bit or use strength reduction for addressing if it were array indexing. Here there’s no array, so it’s simple.
6. Code Generation: Finally, the optimized IR is translated into target machine code (say, x86-64 assembly). Register allocation will assign registers to i and sum, and perhaps use a register for the constant 5. For example, one possible x86-64 outcome (pseudo-assembly):
mov DWORD PTR [rbp-8], 0 ; i = 0 (rbp-8 could be i, rbp-12 sum if stored in stack)
mov DWORD PTR [rbp-12], 0 ; sum = 0
jmp .Lcond
.Lloop:
mov eax, DWORD PTR [rbp-8] ; eax = i
add DWORD PTR [rbp-12], eax ; sum = sum + i
add DWORD PTR [rbp-8], 1 ; i = i + 1
.Lcond:
cmp DWORD PTR [rbp-8], 5 ; compare i and 5
jl .Lloop ; if i < 5, loop
.Lexit:
mov eax, DWORD PTR [rbp-12] ; load sum into return register (eax)
leave
ret
This is a straightforward translation of the optimized loop. (If the compiler had kept a in a register or in memory, it would load the 5 from it; but more likely it optimized a away and kept 5 as an immediate or in a register.)
Note that registers can often hold these values without always spilling to memory. A more optimized register allocation (assuming we don’t have to keep everything in stack memory):
xor edi, edi ; i in edi = 0
xor esi, esi ; sum in esi = 0
.Lloop:
add esi, edi ; sum += i (esi = esi + edi)
inc edi ; i++ (edi = edi + 1)
cmp edi, 5
jl .Lloop
mov eax, esi ; return value in eax (32-bit int)
ret
Here edi is used for i, esi for sum, and 5 is an immediate in the compare. This tight loop executes 5 iterations, computing the result 10 in esi, then puts it in eax (the 32-bit int return register) for the return.
From the initial high-level code to this final assembly, we saw:
- Lexing identified tokens like while, i, a, etc. (e.g., while becomes a control-structure keyword token (Lexical analysis – Wikipedia)).
- Parsing structured these tokens into an AST that represents the while-loop and the arithmetic.
- Syntax-directed translation (and constant folding during analysis) figured out that 2+3 is 5 and possibly placed that directly into the IR (compil11a).
- The intermediate code captured the logic in a low-level form (labels and jumps).
- Data flow analysis (constant propagation) and local optimizations removed the unnecessary use of a and combined instructions, resulting in a simpler loop.
- Liveness analysis showed that a was not needed, allowing its removal, and guided register allocation so that i and sum stayed in registers during the loop while i could be freed afterward.
- The final generated code is efficient – it uses only a couple of registers and a small loop, which is about as good as one could expect from a compiler for this code.
Key takeaways from the example: We’ve followed the code through each phase:
- Lexical analysis tokenized the input (e.g., turning while (i < a) into tokens for while, (, identifier i, <, identifier a, and )).
- Parsing & AST established the structure (a loop containing two assignments).
- Intermediate code generation translated structured constructs into a linear IR with labels and temporaries, bridging to a machine-like representation.
- Optimizations (constant propagation, folding, dead code elimination, etc.) greatly simplified the IR: recognizing compile-time constants (like 5) (Constant Propagation – CRAN), removing never-used variables, and streamlining the loop operations.
- Final code generation mapped the optimized IR to actual instructions, using registers and jumps appropriately. The end result is compact and efficient assembly that correctly implements the original program’s logic.
Every step preserved the program’s semantics (sum still ends up 10 for the range 0–4) while improving the form of the code for execution. This end-to-end walkthrough illustrates how a high-level C program is systematically broken down and transformed by the compiler into an optimized executable form, with each compiler phase contributing its part: lexing and parsing ensure the structure is understood; semantic analysis sets the stage (and catches errors if any); IR and optimization make the code efficient; and code generation produces the machine code that runs the computation.