Phase 4: Emit

Here is documentation describing how we emit .mm scripts.

Some Helpful Concepts

What is a generator? A generator is used to traverse some representation of logic in an abstract way. It then calls the emitter when appropriate to actually emit the code in the target representation.

What is an emitter? The emitter exposes an API that can be called to emit code in the target representation. There will be as many emitters as there are target representations supported by the language.

In the context of whamm!, there are two generators. Each of these generators are used for a specific reason while emitting instrumentation. The InitGenerator is run first to emit the parts of the .mm script that need to exist before any probe actions are emitted, such as functions and global state. The InstrGenerator is run second to emit the probes.

Both of these generators use the emitter that emits instrumentation as configured by the end-user (either via bytecode rewriting or emitting a .v3 file that interfaces with an engine with direct support for instrumentation).

4.1 InitGenerator

The init_generator.rs traverses the AST to emit functions and globals that need to exist before emitting probes. The run function is the entrypoint for this generator. This follows the visitor software design pattern. There are great resources online that teach about the visitor pattern if that is helpful for any readers.

Consider bytecode rewriting. This generator emits new Wasm functions and globals into the program with associated Wasm IDs. These IDs are stored in the SymbolTable for use while running the InstrGenerator. When emitting an instruction that either calls an emitted function or does some operation with an emitted global, the name of that symbol is looked up in the SymbolTable to then use the saved ID in the emitted instruction.

4.2 InstrGenerator

The instr_generator.rs traverses the BehaviorTree which encodes the logic of the instrumentation to emit. The run function is the entrypoint for this generator. This follows the visitor software design pattern. There are great resources online that teach about the visitor pattern if that is helpful for any readers.

This generator calls into the emitter to gradually traverse the program in search for the locations corresponding to each probe.

Constant Propagation and Folding!!

Constant propagation and folding are a compiler optimizations that serve a special purpose in whamm!. There are lots of resources online explaining these concepts if that would be useful to the reader.

The whamm info command helps users see various globals that are in scope when using various probe specifications. All of these global variables are defined by whamm!'s compiler and should only be emitted as constant literals. If the variable were ever emitted into an instrumented program or .v3 monitor, the program would fail to execute since the variable would not be defined.

whamm! uses constant propagation and folding to remedy this situation!

The define_* functions in emitters.rs are examples of how compiler constants are defined. These specific globals are defined in the emitter since their definitions are tied to locations in the Wasm program being instrumented.

The ExprFolder in types.rs performs constant propagation and folding on expressions.

When considering a predicated probe, this behavior can be quite interesting. Take the following probe definition for example:

wasm:bytecode:call:alt /
    target_fn_type == "import" &&
    target_imp_module == "ic0" &&
    target_fn_name == "call_perform"
/ { ... }

All three of the globals used in the predicate are statically defined by the compiler and are provided by the call event. This means that all of these variable uses will be replaced by constants and the predicate will fold to a true or false. If the predicate folds to true, the probe actions can be emitted at the found location without condition. If the predicate folds to false, the probe should not be emitted.

Now, take the next probe definition example:

wasm:bytecode:call:alt /
    target_fn_type == "import" &&
    target_imp_module == "ic0" &&
    target_fn_name == "call_new" &&
    strcmp((arg0, arg1), "bookings") &&
    strcmp((arg2, arg3), "record")
/ { ... }

The predicate of this probe now includes both variables that are defined statically and variables that are defined dynamically, which is totally valid semantically!

So, what happens here? The first three globals will be propagated away to constants, the expression will be folded, and those constant equivalence checks will evaluate to either true or false. This reduced value will then be and-ed together with the following dynamically-defined portion of the expression. So, the same goal will be accomplished here as in the previous example (the probe either will or will not be emitted at that bytecode location based on statically determined information). However, this time the actions emitted will retain a conditional, but it will be the folded conditional that only includes the dynamic portion of the original predicate.

Pretty cool, right??