Phase 1: Parse

Here is documentation describing how we parse .mm scripts.

The Grammar

whamm.pest

whamm!'s grammar is written using the Pest parser generator Rust library, which uses Parsing Expression Grammars (PEG) as input. Reading the Pest book first will inform how to read the whamm.pest grammar.

Pest parses a passed .mm script and creates a set of matched Rules that are then traversed in the whamm_parser.rs to generate whamm!'s Abstract Syntax Tree (AST). These Rules correspond to the naming used in the whamm.pest grammar.

The logic for creating the AST from the Pest Rules can be followed by starting at the parsing entrypoint: the parse_script function found in the whamm_parser.rs file.

The Abstract Syntax Tree (AST)

We use an AST to represent the .mm script after parsing. This AST is leveraged in different ways for each of the subsequent compiler phases.

During verification, the AST is used to build the SymbolTable and perform type checking.

While translating the AST into the injection strategy's representation, the AST is visited and restructured in a way that is simpler to compile for each strategy. Each node contains new data unique to each strategy that is helpful while emitting.

While emitting, the simpler AST variation mentioned above is used to lookup global statements and iterate over probe definitions to inject them into locations-of-interest in the Wasm program.