Introduction

Debugging Wasm? Put some whamm! on it!

whamm! is a tool for "Wasm Application Monitoring and Manipulation" [1], a domain-specific language (DSL) inspired by DTrace's D language. If you're building a new dynamic analysis for Wasm and are looking for a framework to support you, you're in the right place! This book will help you get started with the language.

whamm! enables Wasm tool implementers to express their instrumentation using high-level abstractions of program events or locations at various levels of granularity, increasing the expressiveness, intuitiveness, and maintainability of the tool.

Here are some of the goals of whamm!:

  1. Be high-level and intuitive.
  2. Make instrumentation easy to build, test, and debug.
  3. Express instrumentation in terms of predicated probes.
  4. Instrument events at different levels of granularity.
  5. Provide behavior as Wasm functions, and specify in whamm! where to call them.
  6. Write instrumentation once, and let whamm! take care of the injection strategy.

Let's take a moment to consider the scale of impact this DSL could have on developer tooling, given the following facts:

  1. WebAssembly is growing to use cases beyond the browser.
  2. Many languages can compile to Wasm.
  3. With whamm!, you write instrumentation once to support a wide domain of applications:
    • Use engine instrumentation capabilities where available.
    • Use bytecode rewriting to support everything else.

This means that developer tools written in whamm! could support a vast domain of applications, making WebAssembly the future target platform for debugging.

Some helpful terms and concepts

  1. WebAssembly (Wasm): WebAssembly is a binary instruction format for a stack-based virtual machine. It is designed as a portable compilation target for programming languages.
  2. Instrumentation: When we say we are "instrumenting a program," at a high level we mean we are "injecting some code into a program’s execution to do some operation." This definition is intentionally generic, since instrumentation can do almost anything we can imagine! You can use instrumentation to build debuggers, dynamic analyses, telemetry generators, and more.
  3. Dynamic analysis: A dynamic analysis is something that analyzes a program as it is executing (in contrast to a static analysis which analyzes a program that is not running). This type of analysis can gain useful insights into a program as it is able to access information that is not available statically (such as hot code locations, memory accesses over time, code coverage of test suites, etc.).
  4. Bytecode rewriting: This is one strategy for injecting instrumentation logic into an application. It injects instrumentation by literally inserting new instructions into the application bytecode.

1: The 'h' is silent.

Getting Started

Here you will find information on how to begin writing instrumentation in the whamm! DSL.

Installation

Currently, to install whamm! you clone the repository, build the source yourself, and add the resulting binary to your PATH. In the future, once whamm! has stable, tagged releases, users will be able to download pre-built binaries from the GitHub releases page.

Steps:

  1. Clone the whamm! repo
  2. Build the source code with cargo build
  3. Add the built binary to your PATH. This binary should be located at target/debug/whamm [1].

Basic Test

To make sure that the whamm! binary is on your PATH and working as expected, run the following command: whamm --help. The CLI will print information about the available commands and options.

Wasm monitors and manipulators

As mentioned in the introduction, whamm! can be used to either monitor OR manipulate a program's execution.

By monitoring execution, we mean collecting information about a program's dynamic behavior. This is commonly used for debugging, logging, and metric collection.

By manipulating execution, we mean literally changing the program's dynamic behavior. Consider a common debugger feature: a developer can set a breakpoint, inspect the current application state, and change the values of variables. This manipulates the application's dynamic behavior by changing its state, and it is something whamm! will support.

Continue reading through this book's "getting started" content for how to write such monitors and manipulators.

Architecture

TODO

Helpful Tools

Here are some tools that may help when working with Wasm:

  1. wabt, aka the WebAssembly Binary Toolkit
  2. wasm-tools

1: We recommend adding the binary built inside target/ to your path as this will enable you to pull the latest changes on master, build the latest version, and automatically have the latest binary on your PATH.

The Language

whamm! enables tool implementers to express their instrumentation in terms of program events and corresponding predicated actions; "When this event occurs during program execution, do these actions if this predicate (or conditional) evaluates to true." This abstraction provides a high-level and intuitive syntax that can target events at various granularities in the instrumented program.

Read on for an overview of the syntax and semantics of the language.

Language Concepts

  • Variables are used to store data.
  • Logical operations can be used to combine boolean expressions.
  • Ternary Expressions can be used for succinct conditional variable assignments.
  • Primitive types are numbers, booleans, and strings.
  • Various arithmetic operations can be used with numbers.
  • Strings are key for dealing with files, text, etc.
  • Tuples allow using multiple values where one value is expected.
  • Maps are key for storing large amounts of data, but they're implemented quite differently in whamm!.
  • Function definitions can be used to reuse code snippets.
  • Conditionals are if/else/elif statements used for simple control flow.
  • And finally, probes are used to express instrumentation.
  • All of this syntax is used to write whamm! scripts.

Variables

Variables store data, such as numbers and strings.

// Declaring a new variable `<type> <var_name>;`:
i32 i;
// Assigning a value to a variable `<var_name> = <value>;`:
i = 0;

// Variables can also be set to the result of an expression `<var_name> = <expression>;`:
i = 1 + 2;
i = add(1, 2) + 9; // (assuming that the `add` fn is in scope and returns an `i32`)

Scopes

Each variable is associated with some scope, which is the range of the program in which it is active and accessible. We will see how there are scopes tied to functions, probes, and scripts. The syntax for declaring and assigning to variables is consistent across these contexts.

Logical Operators

Logical operators allow joining multiple boolean expressions. Like C/C++ and Java, the && and || operators provide for logical-and and logical-or. Both operators have short-circuit evaluation; they only evaluate the right-hand-side expression if the left-hand-side evaluates to true or false, respectively.

bool a;
// logical-and
a = false && false; // == false
a = false && true;  // == false
a = true && false;  // == false
a = true && true;   // == true
// logical-or
a = false || false; // == false
a = false || true;  // == true
a = true || false;  // == true
a = true || true;   // == true
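whamm! scripts can't be run standalone here, so the following Rust sketch illustrates the same short-circuit rule using Rust's && and ||, which behave as described above. The helper names (rhs, and_count, or_count) are made up for illustration; each function reports the result and how many times the right-hand side was evaluated.

```rust
use std::cell::Cell;

/// Returns `value` and records that the right-hand side was evaluated.
fn rhs(value: bool, evaluated: &Cell<u32>) -> bool {
    evaluated.set(evaluated.get() + 1);
    value
}

/// Evaluates `lhs && r`, returning (result, number of times `r` was evaluated).
fn and_count(lhs: bool, r: bool) -> (bool, u32) {
    let evaluated = Cell::new(0);
    let result = lhs && rhs(r, &evaluated);
    (result, evaluated.get())
}

/// Evaluates `lhs || r`, returning (result, number of times `r` was evaluated).
fn or_count(lhs: bool, r: bool) -> (bool, u32) {
    let evaluated = Cell::new(0);
    let result = lhs || rhs(r, &evaluated);
    (result, evaluated.get())
}
```

A false left-hand side makes && skip its right-hand side, and a true left-hand side makes || skip its right-hand side.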

Ternary Expressions

whamm! supports a version of the "conditional" expression that chooses one of two values based on a condition. The syntax follows C, C++, and Java, which use the ? : operator.

// with declared types
i32 a;
a = 1 > 0 ? 16 : 27; // == 16
a = 1 < 0 ? 17 : 29; // == 29

Short-circuit evaluation

The ternary expression will only evaluate the branch corresponding to the value of the condition. In other words, it short-circuits.
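A minimal Rust sketch of how an evaluator could implement this behavior. The Expr and Value types are hypothetical and are not whamm!'s actual AST. The Fail variant stands in for any expression that would error if evaluated, so a successful result shows that the unchosen branch was never touched.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Value {
    Int(i32),
    Bool(bool),
}

enum Expr {
    Lit(Value),
    Gt(Box<Expr>, Box<Expr>),
    /// Stands in for an expression that fails when evaluated.
    Fail,
    /// cond ? then_e : else_e
    Ternary(Box<Expr>, Box<Expr>, Box<Expr>),
}

fn eval(e: &Expr) -> Result<Value, String> {
    match e {
        Expr::Lit(v) => Ok(*v),
        Expr::Gt(a, b) => match (eval(a)?, eval(b)?) {
            (Value::Int(x), Value::Int(y)) => Ok(Value::Bool(x > y)),
            _ => Err("`>` expects two integers".to_string()),
        },
        Expr::Fail => Err("evaluated a failing expression".to_string()),
        Expr::Ternary(cond, then_e, else_e) => match eval(cond)? {
            // Only the selected branch is evaluated; the other is never visited.
            Value::Bool(true) => eval(then_e),
            Value::Bool(false) => eval(else_e),
            _ => Err("ternary condition must be a bool".to_string()),
        },
    }
}

/// Convenience constructor for an integer literal.
fn int(n: i32) -> Box<Expr> {
    Box::new(Expr::Lit(Value::Int(n)))
}
```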

Primitives

whamm! offers a small set of primitive types that are useful for performing arithmetic and representing data.

Booleans

Booleans, which have only two values (true and false), are represented in whamm! with the type bool.

bool x; // default == false
x = true;
x = false;

Integers

Right now, whamm! supports i32 integers (signed 32-bit values), but will be supporting all numeric types provided by Wasm in the future.

// with declared types
i32 d; // default == 0
d = 0;
d = 9993;
d = -42;

The minimum decimal value for type i32 is -2147483648 (equal to -2^31) and the maximum value is 2147483647 (equal to 2^31 - 1).

Arithmetic

Arithmetic is fundamental to computation. whamm! defines a number of arithmetic operators on the primitive types.

Integer types

i32 a;
a = 0;
a++;       // increment == 1
a--;       // decrement == 0
a = 9 + 3; // add == 12
a = 8 - 2; // subtract == 6
a = 7 * 4; // multiply == 28
a = 9 / 2; // divide == 4
a = 5 % 3; // modulus == 2
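The results listed above, and the i32 bounds from the Integers section, match Rust's i32 arithmetic, so this sketch uses Rust to check them. How whamm! handles overflow and division by zero is not documented here. As an explicit assumption, the sketch uses Rust's checked_* methods, which return None in those cases. BinOp and apply are hypothetical names.

```rust
#[derive(Debug, Clone, Copy)]
enum BinOp {
    Add,
    Sub,
    Mul,
    Div,
    Rem,
}

/// Applies a binary operator to two i32 operands.
/// Returns None on overflow or division by zero (a choice made for this sketch).
fn apply(op: BinOp, a: i32, b: i32) -> Option<i32> {
    match op {
        BinOp::Add => a.checked_add(b),
        BinOp::Sub => a.checked_sub(b),
        BinOp::Mul => a.checked_mul(b),
        BinOp::Div => a.checked_div(b), // integer division truncates: 9 / 2 == 4
        BinOp::Rem => a.checked_rem(b),
    }
}
```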

Strings

NOTE: This functionality hasn't been fully implemented! More docs to come post-implementation!

Tuples

NOTE: This functionality hasn't been fully implemented! More docs to come post-implementation!

Maps

NOTE: This functionality hasn't been fully implemented! More docs to come post-implementation!

Functions

Functions are essential in programming as they enable code modularity, reuse, and organization. By encapsulating specific tasks into discrete units, functions allow developers to write cleaner, more manageable code. They promote code reuse by allowing the same block of code to be executed from multiple places within a program, reducing redundancy and potential for errors, as well as enhancing readability and maintainability.

Compiler-Defined Functions

Some functions are automatically defined by the compiler based on the providers you have included in your script. These can be called just like user-defined functions.

Function Definitions

Before you can call a function, you must define it. Functions can be declared anywhere in a script, as long as they are not nested within another function, an if/else block, or a probe.

Formal Syntax: ID ~ "(" ~ (( type ~ ID ) ~ ("," ~ type ~ ID )*) ? ~ ")" ~ ("->" ~ type) ? ~ block

If there is no declared return type (denoted by "->" followed by a type before the block), the default return type is (), which is effectively "void" (an empty tuple). If a function has a non-void return type, every possible flow through the function must end in a return statement, and the returned value's type must match the function's return type.
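The "every flow must return" rule can be checked with a small recursive analysis. The following Rust sketch is hypothetical: the Stmt type and always_returns function are not part of whamm!'s verifier. A block always returns if some statement in it always returns (anything after that statement is unreachable), and an if statement only counts when both of its branches always return.

```rust
enum Stmt {
    /// Any statement that does not affect control flow (assignment, call, ...).
    Other,
    Return,
    If {
        then_block: Vec<Stmt>,
        else_block: Option<Vec<Stmt>>,
    },
}

/// True if every path through `block` ends in a return statement.
fn always_returns(block: &[Stmt]) -> bool {
    block.iter().any(|s| match s {
        Stmt::Other => false,
        Stmt::Return => true,
        // A missing else branch can fall through, so it never guarantees a return.
        Stmt::If { then_block, else_block } => {
            always_returns(then_block) && else_block.as_deref().map_or(false, always_returns)
        }
    })
}
```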

Examples of Function Definitions:

// This is a function without a return type
i32 i = 0;
set_i(i32 param) {
    i = param;
    return; // this is not required, but allowed
    i++;    // this code is unreachable
}
// This is another function without a return type
i32 count;
incr_count() {
    count++; // no return is required, as there is no return type
}
incr_count2() -> () { // This is functionally equivalent to incr_count
    count++;
}
// Here are functions with a return type
dummy_fn() -> i32 {
    return 5;
}
add_ints(i32 a, i32 b) -> i32 {
    return a + b;
}
larger_than_5(i32 num) -> bool {
    return num > 5;
}
// Here is an example of a function using if/else logic (function calls are covered below)
i32 my_var = 5;
update_var(bool param) -> i32 {
    if(param){
        my_var++;
        return 0;
    }else{
        my_var--;
        return my_var;
    }
    // As all possible flows through the function have a return statement,
    // any code placed here would be unreachable, so no final return is required.
}

Function Calls

After a function is declared, either by the compiler or inside the script, it can be used within other functions and within probes. When called, a function executes the code in its definition and returns a value whose type matches the function's return type.

NOTE: You cannot call functions outside of probes or other functions.

Formal Syntax: ID ~ "(" ~ ( arg )? ~ ( "," ~ arg )* ~ ")"

Examples:

i32 a = 0;
inner_fn() {
    a++;
}
outer_fn() -> i32 {
    inner_fn();
    return a + 5;
}
//"BEGIN" is our probe that executes on wasm startup
BEGIN {
    inner_fn(); // call without assigning to something when void
    i32 local1 = outer_fn(); // call with assigning to a local when non-void
    outer_fn(); // you can call without assigning to something when non-void
}
larger_than_5(i32 num) -> bool {
    return num > 5;
}
//"BEGIN" is our probe that executes on wasm startup
BEGIN{
    bool local1 = larger_than_5(6);
}

Probes

<probe_specification> / <predicate> / { <actions> }

We use the term probe to refer to this triple of probe_specification, predicate and actions.

When performing bytecode rewriting, whamm!:

  1. traverses the application's Wasm module to find the locations-of-interest as specified by each probe's probe_specification.
  2. checks if the predicate evaluates to false statically
    • if it does evaluate to false it continues on, not injecting the probe's actions
    • if it does not evaluate to false, it injects the probe's actions at that location along with the folded predicate.
      • if the predicate evaluates to true statically, it will simply inject the actions into the program un-predicated.
      • if the predicate does not fold to a simple boolean value, it will inject predicated actions into this location. The predicate will then be evaluated dynamically when the application runs to conditionally execute the probe actions.
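The three outcomes of step 2 can be sketched as a small Rust function. The names Folded, Injection, and decide are hypothetical, not whamm!'s internals. The sketch also models a probe with no predicate as None, which behaves like a predicate that folds to true, since its actions always execute.

```rust
/// Result of statically folding a probe's predicate at one matched location.
/// `R` is whatever representation the residual (dynamic) predicate uses.
enum Folded<R> {
    True,
    False,
    Residual(R),
}

/// What the injector does at that location.
#[derive(Debug, PartialEq)]
enum Injection<R> {
    /// Predicate folded to false: do not inject the actions.
    Skip,
    /// Predicate folded to true (or was absent): inject the actions unconditionally.
    Unpredicated,
    /// Inject the actions guarded by the residual predicate, evaluated at run time.
    Predicated(R),
}

fn decide<R>(predicate: Option<Folded<R>>) -> Injection<R> {
    match predicate {
        None | Some(Folded::True) => Injection::Unpredicated,
        Some(Folded::False) => Injection::Skip,
        Some(Folded::Residual(r)) => Injection::Predicated(r),
    }
}
```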

Helpful info in CLI

whamm info --help

The info command provided by the CLI is a great resource to view what can be used as the probe specification. This command provides documentation describing the specification parts as well as the globals and functions in scope, which can help users learn about how to build their instrumentation.

The Probe Specification

provider:package:event:mode

The probe_specification is a way to express some "location" you want to instrument for your program.

| part     | description |
|----------|-------------|
| provider | The name of the provider that supports this instrumentation capability used in this probe. |
| package  | The name of the package within the specified provider that supports the instrumentation capability used in this probe. |
| event    | The name of the event that would correlate with the location to insert this probe in the instrumented program. |
| mode     | The name of the mode that should be used when emitting the probe actions at the event's location, such as before, after, and alt. |

Each part of the probe_specification gradually increases in specificity until reaching the mode of your probe. Consider the following example specification: wasm:bytecode:br_if:before. This spec can be read as "Insert this probe before each of the br_if Wasm bytecode instructions in my program."
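As a rough illustration of the four-part structure, here is a Rust sketch that splits a specification string into its parts. The ProbeSpec struct and parse_probe_spec function are hypothetical; whamm!'s real parser is generated from its Pest grammar. This sketch simply requires exactly four non-empty, colon-separated parts.

```rust
#[derive(Debug, PartialEq)]
struct ProbeSpec<'a> {
    provider: &'a str,
    package: &'a str,
    event: &'a str,
    mode: &'a str,
}

/// Splits `provider:package:event:mode` into its four parts.
fn parse_probe_spec(spec: &str) -> Result<ProbeSpec<'_>, String> {
    let parts: Vec<&str> = spec.split(':').collect();
    match parts.as_slice() {
        // Exactly four parts, none of them empty.
        [provider, package, event, mode] if parts.iter().all(|p| !p.is_empty()) => Ok(ProbeSpec {
            provider: *provider,
            package: *package,
            event: *event,
            mode: *mode,
        }),
        _ => Err(format!("expected provider:package:event:mode, got `{spec}`")),
    }
}
```

For example, parse_probe_spec("wasm:bytecode:br_if:before") yields the provider wasm, package bytecode, event br_if, and mode before.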

Read through our instrumentable events documentation for what we currently support and our future goals.

The Predicate

/ <predicate> /

The predicate is a way to express some "conditional" you want to evaluate to true for the probe's actions to be executed. This aspect of a probe is optional to use. If there is no predicate for some probe, the actions will always execute when the probe's location is reached during program execution.

The Actions

{ <actions> }

The actions are statements that are executed at the probe_specification's location if the predicate evaluates to true.

whamm! Scripts

Instrumentation (aka a monitor) is expressed as a set of predicated probes in a script with the .mm extension.

Here is a high-level view of the grammar for a whamm! script:

// Statements to initialize the global state of the instrumentation
global_statements;
...

// Function definitions to reuse code snippets
fn_name(fn_args) -> ret_type { fn_body; ... }
...

// An example of what a `probe` would look like.
// There can be many of these in a monitor.
provider:package:event:mode / predicate / {
  probe_actions;
  ...
}

Instrumenting with the CLI

whamm instr --help

The instr command provided by the CLI enables developers to actually instrument programs.

Instrumentable Events

Currently available packages:

  • wasm:bytecode, e.g. wasm:bytecode:call:alt

Packages to be added:

  • thread operation events
  • gc operation events
  • function enter/exit/unwind events, e.g. wasm:fn:enter:before
  • memory access (read/write) events
  • table access (read/write) events
  • WASI component operation events, e.g. wasi:http:send_req:alt
  • BEGIN/END events
  • traps
  • exception throw/rethrow/catch events

Libraries

NOTE: This functionality hasn't been fully implemented! More docs to come post-implementation!

Testing Your Instrumentation

NOTE: This functionality hasn't been fully implemented! More docs to come post-implementation! This is a future research goal of ours.

Injection Strategies

Debugging and profiling programs are an integral part of engineering software. This is done through instrumenting the program under observation (inserting instructions that provide insight into dynamic execution).

The most-common instrumentation techniques, such as bytecode rewriting, inject instructions directly into the application code. While this method enables instrumentation to support any application domain, it intrudes on the program state space (possibly introducing bugs), complicates the implementation, limits the scope of observation, and cannot dynamically adapt to program behavior.

One can instead remedy these issues of bytecode rewriting by interfacing with a runtime engine that directly supports instrumentation. This technique can bring powerful capabilities into play, as demonstrated by the Wizard research engine in the ASPLOS paper Flexible Non-intrusive Dynamic Instrumentation for WebAssembly. This paper demonstrated how to build instrumentation support that protects the application under observation, provides consistency guarantees to enable composable tooling, applies instrumentation-specific JIT optimizations that make some tools run even faster than with bytecode rewriting, and more. However, this technique is not as widely used as bytecode rewriting, since it limits a tool's scope to applications that can run on such engines.

This is where whamm! comes in. This DSL abstracts above the instrumentation technique, enabling developer tooling to support a broad domain of applications while leveraging runtime capabilities as available, without reimplementation. With whamm!, you can write instrumentation once and support a wide domain of applications: use engine instrumentation capabilities where available, and use bytecode rewriting to support everything else.

Bytecode Rewriting

walrus

To perform the bytecode rewriting injection strategy, whamm! leverages the walrus Rust library. This library loads a Wasm module into an AST representation that can then be traversed and manipulated to inject the instrumentation logic. Read more about the low-level details in the developer documentation.

Direct Engine Support

Flexible Non-intrusive Dynamic Instrumentation for WebAssembly

NOTE: This functionality hasn't been fully implemented! More docs to come post-implementation!

Examples

NOTE: This functionality hasn't been fully implemented! More docs to come post-implementation!

Branch Monitor

NOTE: This functionality hasn't been fully implemented! More docs to come post-implementation!

For whamm! Developers

Do you want to contribute to whamm! or just learn about the low-level details for fun? Then you're in the right place.

whamm! Implementation Concepts

The four phases of compilation:

  1. Parse
  2. Verify
  3. Encode as a BehaviorTree
  4. Emit

The Four Phases of Compilation

First, what is meant by the term "compilation" depends on the selected injection strategy.

For bytecode rewriting, compilation means generating a new instrumented variation of the passed program.

For direct engine support, compilation means compiling the .mm script to a .v3 program that interfaces with an engine to instrument the program dynamically. The original program is not touched when using this strategy.

The first three phases of whamm! compilation are identical for both strategies; the final emit phase is where they differ. "Emitting" for bytecode rewriting means using the walrus library to insert new instructions into the program, whereas "emitting" for direct engine support means emitting Virgil code that specifies the instrumentation probes in a new format that leverages the target engine's instrumentation API.

These are the four phases of compilation:

  1. Parse
  2. Verify
  3. Encode as a BehaviorTree
  4. Emit

Phase 1: Parse

Here is documentation describing how we parse .mm scripts.

The Grammar

whamm.pest

whamm!'s grammar is written using the Pest parser generator Rust library, which takes Parsing Expression Grammars (PEG) as input. Reading the Pest book first will help you understand the whamm.pest grammar.

Pest parses a given .mm script and creates a set of matched Rules, which are then traversed in whamm_parser.rs to generate whamm!'s Abstract Syntax Tree (AST). These Rules correspond to the naming used in the whamm.pest grammar.

The logic for creating the AST from the Pest Rules can be followed by starting at the parsing entrypoint: the parse_script function found in the whamm_parser.rs file.

The Abstract Syntax Tree (AST)

We use an AST to represent the .mm script after parsing. This AST is leveraged in different ways for each of the subsequent compiler phases.

During verification, the AST is used to build the SymbolTable and perform type checking.

While building the behavior tree, the AST is used to inform what the behavior should be as instrumentation is being injected into the target program (for bytecode rewriting). Since the AST encodes the events utilized by the instrumentation and the predicates that must be partially evaluated during injection, the built behavior tree encodes a flow of actions customized to the instrumentation to be emitted. While building the behavior tree, a simpler variation of the AST is created to optimize the lookup of information that is relevant during the emit phase.

While emitting, the simpler AST variation mentioned above is used to look up global statements and to iterate over probe definitions, injecting them into locations-of-interest in the Wasm program.

Phase 2: Verify

Here is documentation describing how we verify .mm scripts.

The SymbolTable

During verification, the SymbolTable is first built from the AST. There are great resources online that teach about symbol tables if that is helpful for any readers.

At a high-level, the SymbolTable stores metadata about source code symbols. For .mm scripts, symbols can be parts of the probe specification (e.g. provider, package, event, and mode), function names, and variables (local or global). The enum named Record in the verifier/types.rs file defines these symbols and the metadata-of-interest for each of them.

The metadata-of-interest tends to be type information and addressing that corresponds to the ID assigned to the symbol after it's been emitted into the Wasm program.

Type information is used when type checking the script and when emitting the instrumentation (to know the type of each item being emitted).

An example of when the addressing metadata is used is when emitting and calling functions. After a function defined by the instrumentation has been injected, its ID must be stored in the SymbolTable so that it can be looked up when a call to that function is emitted at some later point (see the InitGenerator documentation).

These symbols are contained within some scope. The types of scopes present in a .mm script can be found in the enum named ScopeType in the verifier/types.rs file. The concept of scopes in this context is the same as in other programming languages. A scope defines where variables and functions are accessible based on their location in a program.

In the context of whamm! there are some scopes that exist but aren't accessible to the end-user. Consider the probe specification: provider:package:event:mode. Each part of this specification really has its own scope. This enables each part to introduce its own helpful global variables and functions that the user can leverage to write more expressive instrumentation! These provided globals and functions are added to the AST in the whamm_parser.rs file. See the probes syntax documentation for a helpful CLI tool that enables the user to see what is in-scope for any given probe specification.

Problems / Workarounds

1. Ownership of Records and Scopes.

When writing the SymbolTable structure, there were issues with pushing ownership of Records and Scopes down into each parent. The workaround was to hold a Vec of all Records and a Vec of all Scopes for the entire program in the SymbolTable struct, and to have Records and Scopes refer to each other through usize indices into these Vecs.

It is possible that this could be avoided by just boxing the values in the Records and Scopes, but some experimentation needs to be done.
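A minimal Rust sketch of this workaround, assuming simplified Record and Scope types (the fields and the new_scope, put, and lookup methods are hypothetical, not whamm!'s API). Every Record and Scope lives in a Vec owned by the SymbolTable; scopes refer to their parent and their records by usize index, and lookup walks outward from a scope through its parents, mirroring the scoping rules described above.

```rust
/// Index into `SymbolTable::scopes`.
type ScopeId = usize;
/// Index into `SymbolTable::records`.
type RecordId = usize;

/// Simplified symbols; real records carry more metadata.
enum Record {
    Var { name: String, ty: String },
    Func { name: String },
}

impl Record {
    fn name(&self) -> &str {
        match self {
            Record::Var { name, .. } | Record::Func { name } => name,
        }
    }
}

/// A scope refers to its parent and its records by index instead of owning them.
struct Scope {
    parent: Option<ScopeId>,
    records: Vec<RecordId>,
}

/// The table owns every Scope and Record for the whole program.
struct SymbolTable {
    scopes: Vec<Scope>,
    records: Vec<Record>,
}

impl SymbolTable {
    /// Creates a table whose scope 0 is the global scope.
    fn new() -> Self {
        SymbolTable {
            scopes: vec![Scope { parent: None, records: Vec::new() }],
            records: Vec::new(),
        }
    }

    /// Adds a child scope of `parent` and returns its index.
    fn new_scope(&mut self, parent: ScopeId) -> ScopeId {
        self.scopes.push(Scope { parent: Some(parent), records: Vec::new() });
        self.scopes.len() - 1
    }

    /// Stores `record` in the table and registers its index in `scope`.
    fn put(&mut self, scope: ScopeId, record: Record) -> RecordId {
        self.records.push(record);
        let id = self.records.len() - 1;
        self.scopes[scope].records.push(id);
        id
    }

    /// Searches `scope`, then each enclosing scope, for a symbol called `name`.
    fn lookup(&self, scope: ScopeId, name: &str) -> Option<RecordId> {
        let mut current = Some(scope);
        while let Some(sid) = current {
            let s = &self.scopes[sid];
            if let Some(&rid) = s.records.iter().find(|&&rid| self.records[rid].name() == name) {
                return Some(rid);
            }
            current = s.parent;
        }
        None
    }
}
```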

Building the SymbolTable

The builder_visitor.rs file builds the SymbolTable from the script's AST. The visit_whamm function is the entrypoint for this behavior. This follows the visitor software design pattern. There are great resources online that teach about the visitor pattern if that is helpful for any readers.
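For readers new to the visitor pattern, here is a minimal Rust sketch: a trait with one method per node kind, default implementations that walk the tree, and a concrete visitor that overrides only what it needs. The Script and Item types and the method names are hypothetical; in whamm! the entrypoint is visit_whamm.

```rust
/// A tiny, hypothetical script AST.
enum Item {
    Global { name: String },
    Function { name: String },
    Probe { spec: String },
}

struct Script {
    items: Vec<Item>,
}

/// One method per node kind. The default bodies walk the tree or do nothing,
/// so each visitor overrides only the nodes it cares about.
trait Visitor {
    fn visit_script(&mut self, script: &Script) {
        for item in &script.items {
            self.visit_item(item);
        }
    }
    fn visit_item(&mut self, item: &Item) {
        match item {
            Item::Global { name } => self.visit_global(name),
            Item::Function { name } => self.visit_function(name),
            Item::Probe { spec } => self.visit_probe(spec),
        }
    }
    fn visit_global(&mut self, _name: &str) {}
    fn visit_function(&mut self, _name: &str) {}
    fn visit_probe(&mut self, _spec: &str) {}
}

/// Collects declared global and function names, in declaration order.
#[derive(Default)]
struct NameCollector {
    names: Vec<String>,
}

impl Visitor for NameCollector {
    fn visit_global(&mut self, name: &str) {
        self.names.push(name.to_string());
    }
    fn visit_function(&mut self, name: &str) {
        self.names.push(name.to_string());
    }
}
```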

The TypeChecker

NOTE: This functionality hasn't been fully implemented! More docs to come post-implementation!

Phase 3: Encode as a BehaviorTree

Here is documentation describing how we encode .mm scripts as a BehaviorTree. NOTE: This is only used for bytecode rewriting!

What is a Behavior Tree?

The following pages are great resources describing behavior trees and how they can be used:

  1. What is a behavior tree?
  2. Introduction to BTs

Why even do this?

That's a great question, I'm glad you asked!

The first implementation of this DSL required traversing much of the AST inside the emitter, rather than building something (e.g. a generator) that generically traversed the AST and called the emitter to perform low-level instruction emission. Consider the semantic implications of a probe definition. A probe specifies locations in a program at which to insert some instrumentation code, and one probe could match anywhere from zero to many locations. This means that to emit a probe, the program must be traversed gradually: when a location is found, stop, check whether the predicate tells us to emit the probe, and if it does, emit the probe's actions at that point.

If we think about this logic at a high level without the use of the behavior tree, there is a weird fuzzy layer between the generator and emitter. Either the generator needs to have some emitter logic in it (traversing the program), or the emitter needs to have some generator logic in it (traversing the AST). The BehaviorTree decouples the two since the generator logic is now in terms of the BehaviorTree's control flow which encodes the decisions and actions to be taken while instrumenting the program rather than hardcoding those decisions and actions while traversing the AST!

This also makes adding new instrumentable events easier since the instrumentation logic can be encoded in the BehaviorTree instead of hardcoding yet another conditional block to support the new functionality.
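To make the idea concrete, here is a minimal behavior tree in Rust with the two classic composite nodes (Sequence and Fallback) and two leaf kinds (Condition and Action). Everything here, including the Ctx fields and probe_tree, is hypothetical and much simpler than whamm!'s BehaviorTree. It only shows how a decision such as "at a call site whose predicate does not fold to false, emit the actions" can be encoded as tree structure rather than hardcoded conditionals.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Success,
    Failure,
}

/// Facts about the current program location, plus a log of what was emitted.
struct Ctx {
    is_call: bool,
    predicate_folds_false: bool,
    emitted: Vec<String>,
}

enum Node {
    /// Succeeds only if every child succeeds; stops at the first failure.
    Sequence(Vec<Node>),
    /// Succeeds as soon as one child succeeds; fails if all children fail.
    Fallback(Vec<Node>),
    /// Leaf that checks a condition on the context.
    Condition(fn(&Ctx) -> bool),
    /// Leaf that performs an action and succeeds.
    Action(fn(&mut Ctx)),
}

fn tick(node: &Node, ctx: &mut Ctx) -> Status {
    match node {
        Node::Sequence(children) => {
            for child in children {
                if tick(child, ctx) == Status::Failure {
                    return Status::Failure;
                }
            }
            Status::Success
        }
        Node::Fallback(children) => {
            for child in children {
                if tick(child, ctx) == Status::Success {
                    return Status::Success;
                }
            }
            Status::Failure
        }
        Node::Condition(check) => {
            if check(ctx) { Status::Success } else { Status::Failure }
        }
        Node::Action(act) => {
            act(ctx);
            Status::Success
        }
    }
}

/// Emit only at call sites whose predicate does not fold to false.
fn probe_tree() -> Node {
    Node::Sequence(vec![
        Node::Condition(|c: &Ctx| c.is_call),
        Node::Condition(|c: &Ctx| !c.predicate_folds_false),
        Node::Action(|c: &mut Ctx| c.emitted.push("probe actions".to_string())),
    ])
}
```

Supporting a new event then means adding or rearranging nodes, not writing another conditional branch in the traversal code.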

Visualization for Debugging

whamm vis-script --help

The whamm CLI provides an easy way to generate a visualization of the BehaviorTree, making the control flow of instrumentation easier to debug.

Building the BehaviorTree

The builder_visitor.rs file builds the BehaviorTree from the script's AST. The visit_whamm function is the entrypoint for this action. This follows the visitor software design pattern. There are great resources online that teach about the visitor pattern if that is helpful for any readers.

While building the BehaviorTree, a simpler version of the script's AST is also built (see SimpleAST in builder_visitor.rs). This new representation makes it easier to look up the information relevant to the logic performed in instr_generator.rs.

Using the BehaviorTree

The instr_generator.rs file actually uses the BehaviorTree to follow the logic necessary to make decisions about emitting a probe into a program. The run function is the entrypoint for this action. This follows the visitor software design pattern. There are great resources online that teach about the visitor pattern if that is helpful for any readers.

Phase 4: Emit

Here is documentation describing how we emit .mm scripts.

Some Helpful Concepts

What is a generator? A generator is used to traverse some representation of logic in an abstract way. It then calls the emitter when appropriate to actually emit the code in the target representation.

What is an emitter? The emitter exposes an API that can be called to emit code in the target representation. There will be as many emitters as there are target representations supported by the language.

In the context of whamm!, there are two generators. Each of these generators is used for a specific purpose while emitting instrumentation. The InitGenerator is run first to emit the parts of the .mm script that need to exist before any probe actions are emitted, such as functions and global state. The InstrGenerator is run second to emit the probes.

Both of these generators use the emitter that emits instrumentation as configured by the end-user (either via bytecode rewriting or emitting a .v3 file that interfaces with an engine with direct support for instrumentation).
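The generator/emitter split can be sketched in Rust as a trait with one implementation per target. The Emitter trait, both emitters, and generate are hypothetical. The output strings are only stand-ins (a WAT-like listing for bytecode rewriting and a plain-text listing for an engine monitor), not what whamm! actually produces.

```rust
/// The low-level API a generator calls; one implementation per target representation.
trait Emitter {
    fn emit_global(&mut self, name: &str, value: i32);
    fn emit_call(&mut self, fn_name: &str);
    fn output(&self) -> String;
}

/// Stand-in for bytecode rewriting: records WAT-like text.
#[derive(Default)]
struct WatEmitter {
    lines: Vec<String>,
}

impl Emitter for WatEmitter {
    fn emit_global(&mut self, name: &str, value: i32) {
        self.lines.push(format!("(global ${name} (mut i32) (i32.const {value}))"));
    }
    fn emit_call(&mut self, fn_name: &str) {
        self.lines.push(format!("call ${fn_name}"));
    }
    fn output(&self) -> String {
        self.lines.join("\n")
    }
}

/// Stand-in for direct engine support: records text for a separate monitor program.
#[derive(Default)]
struct EngineEmitter {
    lines: Vec<String>,
}

impl Emitter for EngineEmitter {
    fn emit_global(&mut self, name: &str, value: i32) {
        self.lines.push(format!("var {name} = {value};"));
    }
    fn emit_call(&mut self, fn_name: &str) {
        self.lines.push(format!("{fn_name}();"));
    }
    fn output(&self) -> String {
        self.lines.join("\n")
    }
}

/// A generator walks its own representation and delegates all emission,
/// so it never needs to know which target it is producing.
fn generate(globals: &[(&str, i32)], calls: &[&str], emitter: &mut dyn Emitter) {
    for (name, value) in globals {
        emitter.emit_global(name, *value);
    }
    for f in calls {
        emitter.emit_call(f);
    }
}
```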

4.1 InitGenerator

The init_generator.rs file traverses the AST to emit functions and globals that need to exist before emitting probes. The run function is the entrypoint for this generator. This follows the visitor software design pattern. There are great resources online that teach about the visitor pattern if that is helpful for any readers.

Consider bytecode rewriting. This generator emits new Wasm functions and globals into the program with associated Wasm IDs. These IDs are stored in the SymbolTable for use while running the InstrGenerator. When emitting an instruction that either calls an emitted function or does some operation with an emitted global, the name of that symbol is looked up in the SymbolTable to then use the saved ID in the emitted instruction.
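A hypothetical Rust sketch of this two-pass flow, with IDs modeled as plain u32 values (walrus has its own ID types, which are not shown here). The first pass plays the InitGenerator's role and records each emitted function's ID by name; the second plays the InstrGenerator's role and looks the name up when emitting a call. init_pass, emit_call, and first_free_id are illustrative names only.

```rust
use std::collections::HashMap;

/// Pass 1: "emit" each script function, assigning consecutive IDs starting at
/// `first_free_id`, and remember name -> ID.
fn init_pass(fn_names: &[&str], first_free_id: u32) -> HashMap<String, u32> {
    fn_names
        .iter()
        .enumerate()
        .map(|(i, name)| (name.to_string(), first_free_id + i as u32))
        .collect()
}

/// Pass 2: a call to a script function is emitted using the saved ID.
fn emit_call(ids: &HashMap<String, u32>, name: &str) -> Result<String, String> {
    ids.get(name)
        .map(|id| format!("call {id}"))
        .ok_or_else(|| format!("`{name}` was never emitted"))
}
```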

4.2 InstrGenerator

The instr_generator.rs file traverses the BehaviorTree, which encodes the logic of the instrumentation to emit. The run function is the entrypoint for this generator. This follows the visitor software design pattern. There are great resources online that teach about the visitor pattern if that is helpful for any readers.

This generator calls into the emitter to gradually traverse the program in search of the locations corresponding to each probe.

Constant Propagation and Folding!!

Constant propagation and folding are compiler optimizations that serve a special purpose in whamm!. There are lots of resources online explaining these concepts if that would be useful to the reader.

The whamm info command helps users see various globals that are in scope when using various probe specifications. All of these global variables are defined by whamm!'s compiler and should only be emitted as constant literals. If the variable were ever emitted into an instrumented program or .v3 monitor, the program would fail to execute since the variable would not be defined.

whamm! uses constant propagation and folding to remedy this situation!

The define_* functions in emitters.rs are examples of how compiler constants are defined. These specific globals are defined in the emitter since their definitions are tied to locations in the Wasm program being instrumented.

The ExprFolder in types.rs performs constant propagation and folding on expressions.

When considering a predicated probe, this behavior can be quite interesting. Take the following probe definition for example:

wasm:bytecode:call:alt /
    target_fn_type == "import" &&
    target_imp_module == "ic0" &&
    target_fn_name == "call_perform"
/ { ... }

All three of the globals used in the predicate are statically defined by the compiler and are provided by the call event. This means that all of these variable uses will be replaced by constants and the predicate will fold to a true or false. If the predicate folds to true, the probe actions can be emitted at the found location without condition. If the predicate folds to false, the probe should not be emitted.
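The propagation and folding for this first predicate can be sketched on a toy AST. Everything here (`Expr`, `fold`, and the helper constructors) is hypothetical and far simpler than the real ExprFolder. The `env` map stands in for the values a define_* step would bind at one particular call site.

```rust
use std::collections::HashMap;

/// Hypothetical, minimal predicate AST (not whamm!'s real expression type).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Var(String),
    Str(String),
    Bool(bool),
    Eq(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

fn var(n: &str) -> Expr { Expr::Var(n.to_string()) }
fn lit(s: &str) -> Expr { Expr::Str(s.to_string()) }
fn eq(l: Expr, r: Expr) -> Expr { Expr::Eq(Box::new(l), Box::new(r)) }
fn and(l: Expr, r: Expr) -> Expr { Expr::And(Box::new(l), Box::new(r)) }

/// Constant propagation: replace compiler-defined globals with literals from `env`.
/// Constant folding: evaluate `==` and `&&` once both operands are literals.
fn fold(e: &Expr, env: &HashMap<&str, Expr>) -> Expr {
    match e {
        Expr::Var(name) => env.get(name.as_str()).cloned().unwrap_or_else(|| e.clone()),
        Expr::Eq(l, r) => match (fold(l, env), fold(r, env)) {
            (Expr::Str(a), Expr::Str(b)) => Expr::Bool(a == b),
            (Expr::Bool(a), Expr::Bool(b)) => Expr::Bool(a == b),
            (l, r) => eq(l, r),
        },
        Expr::And(l, r) => match (fold(l, env), fold(r, env)) {
            (Expr::Bool(a), Expr::Bool(b)) => Expr::Bool(a && b),
            (l, r) => and(l, r),
        },
        Expr::Str(_) | Expr::Bool(_) => e.clone(),
    }
}

fn main() {
    // target_fn_type == "import" && target_imp_module == "ic0" && target_fn_name == "call_perform"
    let pred = and(
        and(eq(var("target_fn_type"), lit("import")), eq(var("target_imp_module"), lit("ic0"))),
        eq(var("target_fn_name"), lit("call_perform")),
    );
    // Values bound at one hypothetical call site (an import of ic0.call_perform).
    let env = HashMap::from([
        ("target_fn_type", lit("import")),
        ("target_imp_module", lit("ic0")),
        ("target_fn_name", lit("call_perform")),
    ]);
    println!("{:?}", fold(&pred, &env)); // Bool(true)
}
```

Re-running `fold` with a different `env` for each call site is what lets the same predicate be true at one location and false at another.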

Now, take the next probe definition example:

wasm:bytecode:call:alt /
    target_fn_type == "import" &&
    target_imp_module == "ic0" &&
    target_fn_name == "call_new" &&
    strcmp((arg0, arg1), "bookings") &&
    strcmp((arg2, arg3), "record")
/ { ... }

The predicate of this probe now includes both variables that are defined statically and variables that are defined dynamically, which is totally valid semantically!

So, what happens here? The first three globals are propagated away to constants, the expression is folded, and those constant equality checks evaluate to either true or false. This reduced value is then and-ed with the remaining, dynamically defined portion of the expression. If the static portion folds to false, the whole conjunction is false, so, just as in the previous example, the probe is not emitted at that bytecode location. If it folds to true, the probe is emitted there, again based only on statically determined information. However, this time the emitted actions retain a conditional: the folded predicate, which includes only the dynamic portion of the original predicate.
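The partial folding just described can be sketched as follows. This is a hypothetical model, not the ExprFolder itself. It assumes constant propagation has already turned the three static checks into `true`/`false`, and it keeps the two `strcmp` checks as opaque `Dynamic` nodes. It also assumes dynamic checks have no side effects, so `false && x` can drop `x` entirely.

```rust
/// Hypothetical predicate AST *after* constant propagation: static checks have
/// become literals, and runtime-only checks are kept as opaque `Dynamic` nodes.
#[derive(Debug, Clone, PartialEq)]
enum Pred {
    Bool(bool),
    Dynamic(String),
    And(Box<Pred>, Box<Pred>),
}

fn and(l: Pred, r: Pred) -> Pred { Pred::And(Box::new(l), Box::new(r)) }

/// Fold `&&` using `false && x == false` and `true && x == x` (and their mirror
/// images), assuming dynamic checks are side-effect free so they can be dropped.
fn fold(p: &Pred) -> Pred {
    match p {
        Pred::And(l, r) => match (fold(l), fold(r)) {
            (Pred::Bool(false), _) | (_, Pred::Bool(false)) => Pred::Bool(false),
            (Pred::Bool(true), x) | (x, Pred::Bool(true)) => x,
            (l, r) => and(l, r),
        },
        other => other.clone(),
    }
}

/// What to emit at one probe location.
#[derive(Debug, PartialEq)]
enum Plan {
    Skip,          // predicate folded to false: emit nothing here
    Unconditional, // predicate folded to true: emit the actions as-is
    Guarded(Pred), // emit the actions behind the residual (dynamic) predicate
}

fn plan(p: &Pred) -> Plan {
    match fold(p) {
        Pred::Bool(false) => Plan::Skip,
        Pred::Bool(true) => Plan::Unconditional,
        residual => Plan::Guarded(residual),
    }
}

fn main() {
    let d1 = Pred::Dynamic(r#"strcmp((arg0, arg1), "bookings")"#.to_string());
    let d2 = Pred::Dynamic(r#"strcmp((arg2, arg3), "record")"#.to_string());
    // The three static checks already propagated to `true` at this call site.
    let t = || Pred::Bool(true);
    let pred = and(and(and(and(t(), t()), t()), d1), d2);
    println!("{:?}", plan(&pred));
}
```

All three outcomes from the prose fall out of one match on the folded result: skip, emit unconditionally, or emit guarded by only the dynamic checks.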

Pretty cool, right??

The whamm! CLI

TODO

The .wast Test Harness

A .wast file is used for testing purposes and simplifies the writing of tests for developers. We use .wast files to encode assertions that should pass when running an instrumented variation of a wasm module.

Writing .wast Tests

The high-level structure looks like this:

<module_in_wat>

;; WHAMM --> <some_oneline_whamm_script>
<whamm0_assertion0> ;; The first assertion for the first whamm script
<whamm0_assertion1> ;; The second assertion for the first whamm script

;; WHAMM --> <some_oneline_whamm_script>
<whamm1_assertion0> ;; The first assertion for the second whamm script
<whamm1_assertion1> ;; The second assertion for the second whamm script
<whamm1_assertion2> ;; The third assertion for the second whamm script

The module at the top of the file is used for every whamm/assertion group that follows. To verify that all assertions fail before instrumenting, 5 new .wast files would be generated (one per assertion), each containing the original module, and run on the configured interpreters. To verify that all assertions pass after instrumenting, 2 new .wast files would be generated (one per whamm script), each containing the instrumented module and the assertions listed under that script.
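The file-splitting arithmetic above (5 single-assertion files, 2 per-script files) can be sketched as follows. `Group`, `GeneratedWast` and both functions are hypothetical names, not the harness's real API. Instead of writing the instrumented module, the sketch only records which whamm! script the harness would apply.

```rust
/// One `;; WHAMM -->` group from a .wast test: a script plus the assertions under it.
struct Group {
    script: String,
    assertions: Vec<String>,
}

/// A generated .wast file (hypothetical in-memory form). `script` is the whamm!
/// script to instrument the module with, or `None` to keep the original module.
#[derive(Debug)]
struct GeneratedWast {
    script: Option<String>,
    body: String,
}

/// Before instrumenting: one file per assertion, each with the original module,
/// so every assertion can be checked for failure on its own.
fn pre_instrumentation_files(module: &str, groups: &[Group]) -> Vec<GeneratedWast> {
    groups
        .iter()
        .flat_map(|g| g.assertions.iter())
        .map(|a| GeneratedWast { script: None, body: format!("{}\n{}\n", module, a) })
        .collect()
}

/// After instrumenting: one file per whamm! script with all of that script's assertions.
fn post_instrumentation_files(module: &str, groups: &[Group]) -> Vec<GeneratedWast> {
    groups
        .iter()
        .map(|g| GeneratedWast {
            script: Some(g.script.clone()),
            body: format!("{}\n{}\n", module, g.assertions.join("\n")),
        })
        .collect()
}

fn main() {
    let group = |script: &str, n: usize| Group {
        script: script.to_string(),
        assertions: (0..n).map(|i| format!("(assert_return ...) ;; {} #{}", script, i)).collect(),
    };
    let groups = vec![group("whamm0", 2), group("whamm1", 3)];
    println!("pre: {}", pre_instrumentation_files("(module)", &groups).len()); // pre: 5
    println!("post: {}", post_instrumentation_files("(module)", &groups).len()); // post: 2
}
```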

Below is an example .wast test:

;; Test `wasm:opcode:call` event

;; @instrument
(module
    ;; Auxiliary definitions
    (func $other (param i32) (result i32) (i32.const 1))
    (func $dummy (param i32) (result i32) (local.get 0))

    ;; Test case functions
    (func (export "instrument_me") (result i32)
        (call $dummy (i32.const 0))
    )
)

;; WHAMM --> wasm:opcode:call:before { arg0 = 1; }
(assert_return (invoke "instrument_me") (i32.const 1)) ;; will be run with the above WHAMM instrumentation

;; WHAMM --> wasm:opcode:call:alt { alt_call_by_name("other"); }
(assert_return (invoke "instrument_me") (i32.const 1)) ;; will be run with the above WHAMM instrumentation

Below is an example .wast test using imports:

(module
    (func (export "dummy") (param i32) (result i32)
        local.get 0
    )
)

(register "test")

;; @instrument
(module
    ;; Imports
    (type (;0;) (func (param i32) (result i32)))
    (import "test" "dummy" (func $dummy (type 0)))

    ;; Globals
    (global $var (mut i32) (i32.const 0))

    ;; Global getters
    (func $get_global_var (result i32)
        (global.get $var)
    )

    ;; Test case functions
    (func $foo
        (call $dummy (i32.const 0))
        global.set $var
    )

    (start $foo)
    (export "foo" (func $foo))
    (export "get_global_var" (func $get_global_var))
    (memory (;0;) 1)
)

;; WHAMM --> i32 count; wasm:opcode:call:alt / arg0 == 0 / { count = 5; return 1; }
(assert_return (invoke "get_global_var") (i32.const 1)) ;; alt, so global should be return value
(assert_return (invoke "get_count") (i32.const 5))

There are several conventions to follow when writing .wast test cases for whamm.

  1. Only one module-to-instrument per .wast file.
    • The test setup goes at the top (which can include multiple modules when considering testing imports).
    • The module-to-instrument is the final part of the setup and is marked by ;; @instrument above the module.
  2. Use a comment to specify the whamm! script, with the syntax ;; WHAMM --> <whamm_script>
    • Each script is run on the module-to-instrument in the .wast file.
    • If there are multiple asserts under a whamm! comment, they are all run against the instrumented variation of the module that results from that whamm! script.
  3. All asserts must fail when run without instrumentation.

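A line-based reading of these conventions can be sketched as follows. `ParsedWast` and `parse_wast` are hypothetical and are not the harness's real parser. The sketch also assumes each assertion fits on one line, which real .wast files do not guarantee.

```rust
/// Hypothetical in-memory form of a whamm! .wast test, split per the conventions above.
#[derive(Debug, Default)]
struct ParsedWast {
    /// Everything before the first `;; WHAMM -->` comment (setup + module-to-instrument).
    setup: String,
    /// (whamm! script, assertions listed under it).
    groups: Vec<(String, Vec<String>)>,
}

const INSTRUMENT_MARKER: &str = ";; @instrument";
const WHAMM_PREFIX: &str = ";; WHAMM -->";

/// Line-based split. Simplification: assumes each assertion fits on one line.
fn parse_wast(src: &str) -> Result<ParsedWast, String> {
    // Convention 1: exactly one module-to-instrument per file.
    let marks = src.lines().filter(|l| l.trim() == INSTRUMENT_MARKER).count();
    if marks != 1 {
        return Err(format!("expected one `{}` marker, found {}", INSTRUMENT_MARKER, marks));
    }
    let mut out = ParsedWast::default();
    for line in src.lines() {
        let t = line.trim();
        if let Some(script) = t.strip_prefix(WHAMM_PREFIX) {
            // Convention 2: each `;; WHAMM -->` comment starts a new group.
            out.groups.push((script.trim().to_string(), Vec::new()));
        } else if let Some((_, asserts)) = out.groups.last_mut() {
            // Non-empty, non-comment lines after a WHAMM comment are its assertions.
            if !t.is_empty() && !t.starts_with(";;") {
                asserts.push(t.to_string());
            }
        } else {
            out.setup.push_str(line);
            out.setup.push('\n');
        }
    }
    Ok(out)
}

fn main() {
    let src = "(module)\n;; @instrument\n(module)\n\n;; WHAMM --> wasm:opcode:call:before { arg0 = 1; }\n(assert_return (invoke \"f\") (i32.const 1))\n";
    match parse_wast(src) {
        Ok(p) => println!("{} group(s)\nsetup:\n{}", p.groups.len(), p.setup),
        Err(e) => eprintln!("{}", e),
    }
}
```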
NOTE: For wizard, avoid manipulations that change arg* (doing so requires the frame accessor). For now, consider changing global state instead.

The Harness Code

The harness is located in tests/common/wast_harness.rs, with the main function as its entrypoint. We invoke the harness by calling main from the run_wast_tests test case in tests/integration_test.rs.

Reading the harness code, you can see that it performs the following logic:

  1. Split each .wast file found under tests/wast_suite into its test components, stored as a WastTestCase:
    • a wasm module
    • a whamm script
    • a list of assertions that should be true post-instrumentation
  2. Ensure all assertions fail before instrumenting with whamm. We do this so that we can claim that correct instrumentation is the sole reason a test passes. We enforce this property by first creating new .wast files with one assertion per file and making sure that each one fails when run on a supported interpreter.
    • We do this re-generation of the .wast files because the interpreters we use exit on the first failed assertion per .wast, but we want to guarantee this property for all assertions.
  3. Ensure all assertions pass after instrumenting with whamm.
    • Run the specified whamm script on the module per set of assertions.
    • Output a new .wast file containing the instrumented variation of the module along with the respective assertions.
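Steps 2 and 3 together amount to a simple check, sketched here under assumptions. `RunResult` and `check_case` are hypothetical names. The `run` closure is a placeholder for invoking a real interpreter such as wizeng or the spec interpreter, which this sketch does not do.

```rust
/// Result of running one generated .wast file (hypothetical abstraction over an interpreter).
#[derive(Debug, Clone, Copy, PartialEq)]
enum RunResult {
    Pass,
    Fail,
}

/// Steps 2 and 3: every single-assertion file built from the original module must FAIL,
/// and every per-script file built from the instrumented module must PASS.
fn check_case<F: Fn(&str) -> RunResult>(
    pre_files: &[String],
    post_files: &[String],
    run: F,
) -> Result<(), String> {
    for f in pre_files {
        if run(f.as_str()) == RunResult::Pass {
            return Err(format!("passes without instrumentation: {}", f));
        }
    }
    for f in post_files {
        if run(f.as_str()) == RunResult::Fail {
            return Err(format!("fails after instrumentation: {}", f));
        }
    }
    Ok(())
}

fn main() {
    // Hypothetical file names for one test case.
    let pre = vec!["pre_0.bin.wast".to_string(), "pre_1.bin.wast".to_string()];
    let post = vec!["post_0.bin.wast".to_string()];
    // Fake interpreter: pretend un-instrumented files fail and instrumented ones pass.
    let fake = |f: &str| if f.starts_with("pre") { RunResult::Fail } else { RunResult::Pass };
    println!("{:?}", check_case(&pre, &post, fake)); // Ok(())
}
```

Checking the pre-instrumentation files first means a test that passes for the wrong reason is reported before any instrumentation work is trusted.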

Supported Interpreters

The harness generates *.bin.wast files to run on a list of engines, e.g. wizeng and the spec interpreter.

See the repo's README.md for how to set up the interpreters to run with our test harness.

Some Ideas for Future Improvements

Report Variables

;; Use something like below to assert on the values of some report variable dynamically.
;; REPORT_TRACE(ID) --> 1, 3, 5, 6, 7

;; Use something like below to assert on report variable values!
;; WITH_WHAMM --> (assert_return (invoke "get_report_var" (i32.const 1)) (i32.const 7))

Error Handling

TODO

Contributors

These are the people to thank when you're using whamm!...either genuinely or sarcastically...if you have problems, you should let them know in a GH issue ;)

Elizabeth Gilbert, PhD student at Carnegie Mellon University (CMU).

Alex Bai, undergrad student at Tufts University.