Bytecode Rewriting

For bytecode rewriting, there are two generators. Each of these generators are used for a specific reason while emitting instrumentation. The InitGenerator is run first to emit the parts of the .mm script that need to exist before any probe actions are emitted, such as functions and global state. The InstrGenerator is run second to emit the probes while visiting the app.wasm bytecode (represented as an in-memory IR).

Both of these generators use the emitter that emits Wasm code. The emitter uses utilities that centralize the Wasm emitting logic found at utils.rs

1. `InitGenerator`

The init_generator.rs traverses the AST to emit functions and variables that need to exist before emitting probes. The run function is the entrypoint for this generator. This follows the visitor software design pattern. There are great resources online that teach about the visitor pattern if that is helpful for any readers.

This generator emits new Wasm functions and variables into the program with associated Wasm IDs. These IDs are stored in the SymbolTable for use while running the InstrGenerator. When emitting an instruction that either calls an emitted function or does some operation with an emitted global, the name of that symbol is looked up in the SymbolTable to then use the saved ID in the emitted instruction.

2. `InstrGenerator`

The instr_generator.rs calls into the emitter to gradually traverse the application in search for the locations that correspond to probe events in the .mm's AST. When a probed location is found, the generator emits Wasm code into the application at that point through emitter utilities.

Constant Propagation and Folding!!

Constant propagation and folding are a compiler optimizations that serve a special purpose in whamm!. There are lots of resources online explaining these concepts if that would be useful to the reader.

The whamm info command helps users see various variables that are in scope when using various probe match rules. All of these global variables are defined by whamm!'s compiler and should only be emitted as constant literals. If the variable were ever directly emitted into an instrumented program, with no compiler-provided definition, the program would fail to execute since the variable would not be defined.

whamm! uses constant propagation and folding to remedy this situation!

The define function in visiting_emitter.rs is how compiler constants are defined while traversing the application bytecode. These specific variables are defined in the emitter since their definitions are tied to locations in the Wasm program being instrumented.

The ExprFolder in folding.rs performs constant propagation and folding on expressions.

When considering a predicated probe, this behavior can be quite interesting. Take the following probe definition for example:

wasm:bytecode:call:alt /
    target_fn_type == "import" &&
    target_imp_module == "ic0" &&
    target_fn_name == "call_perform"
/ { ... }

All three of the bound variables used in the predicate are statically defined by the compiler and are provided by the call event. This means that all of these variable uses will be replaced by constants and the predicate will fold to a true or false. If the predicate folds to true, the probe actions can be emitted at the found location without condition. If the predicate folds to false, the probe should not be emitted.

Now, take the next probe definition example:

wasm:bytecode:call:alt /
    target_fn_type == "import" &&
    target_imp_module == "ic0" &&
    target_fn_name == "call_new" &&
    strcmp((arg0, arg1), "bookings") &&
    strcmp((arg2, arg3), "record")
/ { ... }

The predicate of this probe now includes both variables that are defined statically and variables that are defined dynamically, which is totally valid semantically!

So, what happens here? The first three bound variables will be propagated away to constants, the expression will be folded, and those constant equivalence checks will evaluate to either true or false. This reduced value will then be and-ed together with the following dynamically-defined portion of the expression. So, the same goal will be accomplished here as in the previous example (the probe either will or will not be emitted at that bytecode location based on statically determined information). However, this time the actions emitted will retain a conditional, but it will be the folded conditional that only includes the dynamic portion of the original predicate.

Pretty cool, right??

Keyboard shortcuts

whamm!

Bytecode Rewriting

1. InitGenerator

2. InstrGenerator

Constant Propagation and Folding!!

1. `InitGenerator`

2. `InstrGenerator`