Bytecode Rewriting
For bytecode rewriting, there are two generators.
Each of these generators is used for a specific purpose while emitting instrumentation.
The InitGenerator is run first to emit the parts of the .mm script that need to exist before any probe actions are emitted, such as functions and global state.
The InstrGenerator is run second to emit the probes while visiting the app.wasm bytecode (represented as an in-memory IR).
Both of these generators use the emitter that emits Wasm code.
The emitter uses utilities, found at utils.rs, that centralize the Wasm emitting logic.
1. InitGenerator
The generator in init_generator.rs traverses the AST to emit functions and globals that need to exist before emitting probes.
The run function is the entrypoint for this generator.
This follows the visitor software design pattern.
There are great resources online that teach the visitor pattern, if that is helpful for any readers.
This generator emits new Wasm functions and globals into the program with associated Wasm IDs.
These IDs are stored in the SymbolTable for use while running the InstrGenerator.
When emitting an instruction that either calls an emitted function or operates on an emitted global, the name of that symbol is looked up in the SymbolTable so that the saved ID can be used in the emitted instruction.
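The name-to-ID handoff between the two generators can be sketched as follows. This is a minimal illustration, not the actual whamm! SymbolTable: the `EmittedId` and `Instr` types and the `record`, `lookup`, and `emit_use` functions are hypothetical names chosen for this example.

```rust
use std::collections::HashMap;

/// Kind of entity that was emitted into the Wasm module (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EmittedId {
    Func(u32),
    Global(u32),
}

/// Simplified stand-in for the SymbolTable: maps a symbol name to the
/// Wasm ID that was assigned when the symbol was emitted.
#[derive(Default)]
struct SymbolTable {
    ids: HashMap<String, EmittedId>,
}

impl SymbolTable {
    /// Init phase: remember the ID of a freshly emitted function or global.
    fn record(&mut self, name: &str, id: EmittedId) {
        self.ids.insert(name.to_string(), id);
    }

    /// Instrumentation phase: resolve a symbol name to its saved ID.
    fn lookup(&self, name: &str) -> Option<EmittedId> {
        self.ids.get(name).copied()
    }
}

/// A tiny subset of Wasm instructions, enough for the illustration.
#[derive(Debug, PartialEq, Eq)]
enum Instr {
    Call(u32),
    GlobalGet(u32),
}

/// Look up `name` and build the instruction that references its saved ID.
fn emit_use(table: &SymbolTable, name: &str) -> Result<Instr, String> {
    match table.lookup(name) {
        Some(EmittedId::Func(id)) => Ok(Instr::Call(id)),
        Some(EmittedId::Global(id)) => Ok(Instr::GlobalGet(id)),
        None => Err(format!("symbol `{}` was never emitted", name)),
    }
}
```

In this sketch, a lookup of a name that was never recorded is reported as an error rather than silently emitting an instruction with a bogus ID.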
2. InstrGenerator
The generator in instr_generator.rs calls into the emitter to gradually traverse the application in search of the locations that correspond to probe events in the .mm's AST.
When a probed location is found, the generator emits Wasm code into the application at that point through emitter utilities.
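The traversal described above can be pictured with a small sketch. Everything here is an assumption made for illustration: the `Instr` and `Probe` types, the `event_name` mapping, and the `instrument` function are hypothetical, and the sketch always injects the probe body before the matched instruction, which is a simplification.

```rust
/// A tiny stand-in for the in-memory IR of the application.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Instr {
    Call(u32),
    I32Const(i32),
    Drop,
}

impl Instr {
    /// Name of the bytecode event this instruction corresponds to.
    fn event_name(&self) -> &'static str {
        match self {
            Instr::Call(_) => "call",
            Instr::I32Const(_) => "i32.const",
            Instr::Drop => "drop",
        }
    }
}

/// A probe reduced to the event it targets and the code to inject.
struct Probe {
    event: &'static str,
    body: Vec<Instr>,
}

/// Visit every instruction; at each location that matches a probe event,
/// emit the probe body and then the original instruction.
fn instrument(app: &[Instr], probes: &[Probe]) -> Vec<Instr> {
    let mut out = Vec::with_capacity(app.len());
    for instr in app {
        for probe in probes.iter().filter(|p| p.event == instr.event_name()) {
            out.extend(probe.body.iter().cloned());
        }
        out.push(instr.clone());
    }
    out
}
```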
Constant Propagation and Folding!!
In whamm!, constant propagation and folding are compiler optimizations that serve a special purpose.
There are lots of resources online explaining these concepts if that would be useful to the reader.
The whamm info command helps users see the globals that are in scope for a given probe match rule.
All of these global variables are defined by whamm!'s compiler and should only be emitted as constant literals.
If such a variable were ever emitted directly into an instrumented program, with no compiler-provided definition, the program would fail to execute since the variable would not be defined.
whamm! uses constant propagation and folding to remedy this situation!
The define function in visiting_emitter.rs is how compiler constants are defined while traversing the application bytecode.
These specific globals are defined in the emitter since their definitions are tied to locations in the Wasm program being instrumented.
The ExprFolder in folding.rs performs constant propagation and folding on expressions.
When considering a predicated probe, this behavior can be quite interesting. Take the following probe definition for example:
wasm:bytecode:call:alt /
target_fn_type == "import" &&
target_imp_module == "ic0" &&
target_fn_name == "call_perform"
/ { ... }
All three of the globals used in the predicate are statically defined by the compiler and are provided by the call event.
This means that all of these variable uses will be replaced by constants and the predicate will fold to true or false.
If the predicate folds to true, the probe actions can be emitted at the found location without condition.
If the predicate folds to false, the probe should not be emitted.
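To make the fully static case concrete, here is a minimal sketch of propagating constants into the predicate above and folding it to a boolean. It is not the ExprFolder from folding.rs: the `Expr` type, the `Env` map of per-location constants, and the `value`, `fold`, `eq`, `and`, and `call_perform_predicate` helpers are all hypothetical.

```rust
use std::collections::HashMap;

/// Predicate expressions, restricted to what the example probe needs.
enum Expr {
    Var(&'static str),
    Str(&'static str),
    Eq(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

/// Compiler-provided constants for one bytecode location (assumed shape).
type Env = HashMap<&'static str, &'static str>;

/// Propagation: a variable is replaced by its constant, a literal is itself.
fn value(e: &Expr, env: &Env) -> Option<&'static str> {
    match e {
        Expr::Var(name) => env.get(name).copied(),
        Expr::Str(s) => Some(*s),
        _ => None,
    }
}

/// Folding: evaluate the predicate, or return None if it is not fully static.
fn fold(e: &Expr, env: &Env) -> Option<bool> {
    match e {
        Expr::Eq(l, r) => Some(value(l, env)? == value(r, env)?),
        Expr::And(l, r) => Some(fold(l, env)? && fold(r, env)?),
        _ => None,
    }
}

/// Build `var == "lit"`.
fn eq(var: &'static str, lit: &'static str) -> Expr {
    Expr::Eq(Box::new(Expr::Var(var)), Box::new(Expr::Str(lit)))
}

/// Build `l && r`.
fn and(l: Expr, r: Expr) -> Expr {
    Expr::And(Box::new(l), Box::new(r))
}

/// The predicate of the example probe.
fn call_perform_predicate() -> Expr {
    and(
        and(eq("target_fn_type", "import"), eq("target_imp_module", "ic0")),
        eq("target_fn_name", "call_perform"),
    )
}
```

At a call site whose constants all match, `fold` yields `Some(true)` and the actions would be emitted unconditionally. At any other call site it yields `Some(false)` and nothing would be emitted.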
Now, take the next probe definition example:
wasm:bytecode:call:alt /
target_fn_type == "import" &&
target_imp_module == "ic0" &&
target_fn_name == "call_new" &&
strcmp((arg0, arg1), "bookings") &&
strcmp((arg2, arg3), "record")
/ { ... }
The predicate of this probe now includes both variables that are defined statically and variables that are defined dynamically, which is totally valid semantically!
So, what happens here?
The first three globals will be propagated away to constants, the expression will be folded, and those constant equivalence checks will evaluate to either true or false.
This reduced value will then be and-ed together with the dynamically defined portion of the expression that follows.
So, the same goal will be accomplished here as in the previous example (the probe either will or will not be emitted at that bytecode location based on statically determined information).
However, this time the emitted actions will retain a conditional: the folded conditional, which includes only the dynamic portion of the original predicate.
Pretty cool, right??
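The mixed static and dynamic case can be sketched as a fold that returns a residual expression instead of a plain boolean. This is an illustrative model only: the `Expr` and `Emit` types and the `fold` and `decide` functions are hypothetical, the static checks are assumed to be already reduced to `Bool`, and dynamic checks are assumed to be free of side effects so that a statically false conjunct can discard them.

```rust
/// Predicate after constant propagation: static checks are already booleans,
/// dynamic checks are kept as opaque runtime conditions.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Expr {
    Bool(bool),
    /// A check that can only be decided at runtime (e.g. a strcmp on arguments).
    Dynamic(&'static str),
    And(Box<Expr>, Box<Expr>),
}

/// Fold `&&` chains: `false && x` becomes `false`, `true && x` becomes `x`.
fn fold(e: Expr) -> Expr {
    match e {
        Expr::And(l, r) => match (fold(*l), fold(*r)) {
            (Expr::Bool(false), _) | (_, Expr::Bool(false)) => Expr::Bool(false),
            (Expr::Bool(true), other) | (other, Expr::Bool(true)) => other,
            (l, r) => Expr::And(Box::new(l), Box::new(r)),
        },
        other => other,
    }
}

/// What should happen at one bytecode location.
#[derive(Debug, PartialEq, Eq)]
enum Emit {
    /// Predicate folded to false: emit nothing here.
    Skip,
    /// Predicate folded to true: emit the actions without a condition.
    Unconditional,
    /// Only the dynamic residual remains: emit the actions guarded by it.
    Guarded(Expr),
}

/// Decide how to emit a probe from its folded predicate.
fn decide(pred: Expr) -> Emit {
    match fold(pred) {
        Expr::Bool(false) => Emit::Skip,
        Expr::Bool(true) => Emit::Unconditional,
        residual => Emit::Guarded(residual),
    }
}
```

With the static part folded to true, `decide` returns `Emit::Guarded` holding only the two strcmp checks. With the static part folded to false, it returns `Emit::Skip` regardless of the dynamic portion.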