Here's one possible design for the taint tracker . . .
Take the example assembly snippet:
1000 mov 1024, rdi
1001 call malloc
1002 mov rax, qword [rbp-24] # [rbp-24] is an 8-byte pointer
1003 mov constant_string, rdi
1004 mov rax, rsi
1005 call strcpy
1006 xor rax, rax
Simple enough. For the less assembly-inclined, that allocates a 1024-byte
block of memory, stores the address on the stack, copies a constant string
into the newly-allocated block, then zeroes out rax (xor'ing a register
against itself is an old trick for reducing the executable size: xor rax, rax
encodes in 3 bytes, whereas a mov of a full 64-bit zero into rax takes 10).
Also note that I have rsi and rdi mixed up for the strcpy call: strcpy takes
dest first, so under the System V AMD64 calling convention the malloc'd
pointer belongs in rdi and constant_string in rsi . . .
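For the curious, the size claim is easy to sanity-check against the raw
encodings. The byte strings below are written out by hand from the x86-64
encoding rules (not produced by an assembler), and the labels use the same
source-then-destination operand order as the snippet:

```python
# Hand-written x86-64 machine code for a few ways of zeroing rax.
ENCODINGS = {
    # REX.W prefix (48) + opcode 31 + ModRM c0
    "xor rax, rax": bytes.fromhex("4831c0"),
    # No REX prefix; writing eax also clears the upper half of rax.
    "xor eax, eax": bytes.fromhex("31c0"),
    # REX.W + c7 /0 with a sign-extended 32-bit immediate
    "mov 0x0, rax (32-bit immediate)": bytes.fromhex("48c7c0" + "00" * 4),
    # REX.W + b8 with a full 64-bit immediate
    "mov 0x0, rax (64-bit immediate)": bytes.fromhex("48b8" + "00" * 8),
}

for text, code in ENCODINGS.items():
    print(f"{len(code):2d} bytes  {text}")
```

So the 10-byte figure is the worst case for mov; an assembler is free to pick
a shorter form, but xor still wins either way.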
So: TaintTracker is passed the Disassembler instance for the program, and
then searches it for malloc calls. When it finds one (in this case, it'll
find one at 1001), it goes through the rest of the program, tracking where
the value in rax is copied and when it gets overwritten. So, step-by-step,
the taint tracker would go through the example snippet given like so:
1001: found malloc call, marking rax as tainted
1002: storing rax at [rbp-24], marking [rbp-24] as tainted
1004: copying rax into rsi, marking rsi as tainted
1006: changing value of rax, marking rax as untainted
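The walk-through above can be sketched in a few lines. This is only a toy
model, not aesalon's actual TaintTracker: the tuple format, the SNIPPET list
and the track() function are all made up for illustration, the instructions
are typed in by hand instead of coming from a Disassembler, and each operand
pair is read as source, then destination. Calls other than malloc are treated
as opaque.

```python
# Toy instruction list: (address, mnemonic, source, destination).
# Hypothetical format, hand-transcribed from the example snippet.
SNIPPET = [
    (1000, "mov", "1024", "rdi"),
    (1001, "call", "malloc", None),
    (1002, "mov", "rax", "[rbp-24]"),
    (1003, "mov", "constant_string", "rdi"),
    (1004, "mov", "rax", "rsi"),
    (1005, "call", "strcpy", None),
    (1006, "xor", "rax", "rax"),
]


def track(instructions):
    """Return (tainted locations, log lines) after a single forward pass."""
    tainted = set()
    log = []
    for addr, op, src, dst in instructions:
        if op == "call":
            # Only malloc is understood; every other call is ignored here.
            if src == "malloc":
                tainted.add("rax")
                log.append(f"{addr}: found malloc call, marking rax as tainted")
        elif op == "mov":
            if src in tainted:
                # Copying a tainted value taints the destination.
                tainted.add(dst)
                log.append(f"{addr}: copying {src} into {dst}, "
                           f"marking {dst} as tainted")
            elif dst in tainted:
                # Overwriting a tainted location with clean data clears it.
                tainted.discard(dst)
                log.append(f"{addr}: overwriting {dst}, "
                           f"marking {dst} as untainted")
        elif op == "xor" and src == dst and dst in tainted:
            # xor'ing a register with itself zeroes it.
            tainted.discard(dst)
            log.append(f"{addr}: changing value of {dst}, "
                       f"marking {dst} as untainted")
    return tainted, log


if __name__ == "__main__":
    final, lines = track(SNIPPET)
    print("\n".join(lines))
    print("still tainted:", sorted(final))
```

Run over the example, this logs the same four addresses as above (1001, 1002,
1004 and 1006) and leaves [rbp-24] and rsi tainted, with rax clean.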
Note there are some problems with this design. The first is the obvious one:
a huge initialization time, since the whole program has to be scanned up
front. This could be fixed by performing the tracking at runtime (e.g. making
MallocObserver add TaintTracker as a breakpoint observer for the
return-address breakpoint, whereupon TaintTracker will do its thing).
The next is slightly harder to spot: strcpy will change rax by itself. In
fact, according to the man page, strcpy returns a pointer to dest (the first
argument), so when dest is the malloc'd block the call actually taints rax
again. So I'm going to have to do some very tricky stuff throughout the
entire program, not just the one executable . . . that is, aesalon is going
to have to parse libc and indeed all dynamically-loaded libraries as well
. . . which is going to be very, very fun. Don't you agree?
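To make the strcpy problem concrete, here is a small sketch of what the
tracker is missing. The after_call() function and the summaries table are
hypothetical: the table is a hand-written stand-in for whatever parsing libc
would eventually discover, and it assumes the System V AMD64 convention
(first argument in rdi, return value in rax). It is not a proposal for how
aesalon should actually get this information.

```python
def after_call(tainted, callee, summaries):
    """Return the tainted set after a call to `callee`.

    `summaries` maps a function name to the argument register whose value
    the function returns in rax (hypothetical, hand-written data).
    """
    result = set(tainted)
    returned_from = summaries.get(callee)
    if returned_from is not None:
        # rax now holds a copy of that argument, so it inherits its taint.
        if returned_from in result:
            result.add("rax")
        else:
            result.discard("rax")
    # With no summary the call is opaque and the set is left untouched,
    # which is exactly the blind spot described above.
    return result


# strcpy returns dest, its first argument, which is passed in rdi.
STRCPY_SUMMARY = {"strcpy": "rdi"}

# State just before a strcpy call whose dest is the malloc'd block.
before = {"rdi", "[rbp-24]"}

naive = after_call(before, "strcpy", {})
informed = after_call(before, "strcpy", STRCPY_SUMMARY)
print("naive:   ", sorted(naive))
print("informed:", sorted(informed))
```

The naive pass never notices that rax is tainted again after the call; only
the version that knows something about strcpy's insides does.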