ISSTA2024
PolyTracker: Whole-Input Dynamic Information Flow Tracing
Evan Sultanik, Marek Surovic, Henrik Brodin, Kelly Kaoudis, Facundo Tuesca, Carson Harmon, Lisa Overall, Joseph Sweeney, Bradford Larsen
Abstract
We present PolyTracker, a whole-program, whole-input dynamic information flow tracing (DIFT) framework. Given an LLVM-compatible codebase or a binary that has been lifted to LLVM intermediate representation (IR), PolyTracker compiles it and adds static instrumentation. The instrumented program runs normally with modest performance overhead, but it also outputs a runtime trace artifact in the co-designed TDAG (Tainted Directed Acyclic Graph) format. TDAGs can be post-processed for a variety of analyses, including tracing every input byte through program execution. TDAGs can be generated either by running the program over a corpus of inputs or by using a randomized input generator such as a fuzzer. PolyTracker traces (TDAGs) can support highly localized, targeted dynamic program analysis, as smaller-scale DIFT does, but they are primarily intended for whole-program runtime exploration and bug finding, granular information-flow diffing between program runs, and comparison of different implementations of the same input specification. None of these uses requires emulating and instrumenting the entire running environment. For user-friendliness and reproducibility, the software repository provides a number of examples of PolyTracker-instrumented builds of popular open-source software projects. We also provide a Python analysis library and REPL designed to help users operate over TDAGs.
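The core idea behind a taint DAG can be illustrated with a toy model. Each input byte gets a source label, and a value computed from two labelled values gets a union label. Any label can later be resolved back to the input bytes that flowed into it. The following is a minimal sketch under stated assumptions. The class `TaintDAG` and its methods `source`, `union`, and `input_bytes` are hypothetical. The in-memory representation, with source nodes for input byte offsets and union nodes that have two parents, is an illustrative simplification. It is not PolyTracker's actual TDAG file format or its Python analysis API.

```python
from __future__ import annotations


class TaintDAG:
    """Toy taint-label DAG (hypothetical; not PolyTracker's TDAG format or API)."""

    def __init__(self) -> None:
        # label (list index) -> ("source", offset) or ("union", a, b)
        self.nodes: list[tuple] = []
        self._source_label: dict[int, int] = {}  # input offset -> label
        self._union_label: dict[tuple[int, int], int] = {}  # (a, b) -> label

    def source(self, offset: int) -> int:
        """Return the label for input byte `offset`, creating it on first use."""
        if offset not in self._source_label:
            self._source_label[offset] = len(self.nodes)
            self.nodes.append(("source", offset))
        return self._source_label[offset]

    def union(self, a: int, b: int) -> int:
        """Return the label for a value derived from values labelled `a` and `b`."""
        if a == b:
            return a
        key = (min(a, b), max(a, b))  # union is commutative, so normalize order
        if key not in self._union_label:
            # Parents always precede the new node, so the graph stays acyclic.
            self._union_label[key] = len(self.nodes)
            self.nodes.append(("union", *key))
        return self._union_label[key]

    def input_bytes(self, label: int) -> frozenset[int]:
        """Resolve a label to the set of input byte offsets that flowed into it."""
        seen: set[int] = set()
        stack = [label]
        offsets: set[int] = set()
        while stack:
            lbl = stack.pop()
            if lbl in seen:
                continue
            seen.add(lbl)
            node = self.nodes[lbl]
            if node[0] == "source":
                offsets.add(node[1])
            else:
                stack.extend(node[1:])  # visit both parent labels
        return frozenset(offsets)


# Simulate a tiny instrumented computation: each value carries a shadow label.
data = b"\x02\x03\x07"
dag = TaintDAG()
labels = [dag.source(i) for i in range(len(data))]
s_val, s_lbl = data[0] + data[1], dag.union(labels[0], labels[1])
p_val, p_lbl = s_val * data[2], dag.union(s_lbl, labels[2])
print(p_val, sorted(dag.input_bytes(p_lbl)))  # 35 [0, 1, 2]
```

A union node can only refer to labels created before it, so label order is a topological order and the graph is acyclic by construction. Memoizing unions keeps repeated combinations of the same two labels from growing the graph. Under the same assumptions, diffing two runs could amount to comparing the resolved byte sets of corresponding values.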