Where communities thrive


  • Join over 1.5M+ people
  • Join over 100K+ communities
  • Free without limits
  • Create your own community
People
Repo info
Activity
  • Mar 02 21:03

    ivg on master

    fixes, cleans, and optimizes KB… (compare)

  • Mar 02 21:03
    ivg closed #1280
  • Mar 02 19:12
    ivg synchronize #1280
  • Mar 02 19:09
    ivg opened #1280
  • Mar 01 21:35

    ivg on master

    hushes the BIL lifter when ther… (compare)

  • Mar 01 21:35
    ivg closed #1279
  • Mar 01 20:42
    ivg opened #1279
  • Feb 26 20:01

    ivg on master

    moves to centralized and unifor… (compare)

  • Feb 26 20:01
    ivg closed #1278
  • Feb 26 17:17
    ivg synchronize #1278
  • Feb 26 14:44
    ivg synchronize #1278
  • Feb 25 16:43
    ivg opened #1278
  • Feb 24 17:07

    ivg on master

    enables writing program semanti… (compare)

  • Feb 24 17:07
    ivg closed #1277
  • Feb 24 13:24
    ivg synchronize #1277
  • Feb 23 22:03
    ivg opened #1277
  • Feb 23 19:10

    ivg on master

    adds the `--target` parameter t… (compare)

  • Feb 23 19:10
    ivg closed #1276
  • Feb 23 17:24
    ivg opened #1276
  • Feb 22 21:09

    ivg on master

    overhauls target/value abstract… (compare)

Ivan Gotovchits
@ivg
let me clarify, it is not that bts is not supported, just this particular variant is not handled, the (BTS32ri8 EAX EAX 0x1f)
it will help me a lot, if you will create a bug report with it, I will handle this as soon as will get time for it
tgfine
@tgfine
Thanks. I will submit a bug report. Is there a list of instructions and variants of instructions that are not currently lifted?
Ivan Gotovchits
@ivg
Nope, there is no such list, unfortunately
tgfine
@tgfine
No problem. Thanks again!
Ivan Gotovchits
@ivg
it could be also that this is not really a variant but a change of encoding in llvm that we didn't notice. E.g., the encoding that our lifter is expecting is (BTS32ri8 EAX 0x1f), i.e., with one register, but the example has two (which actually makes sense). So probably they have changed the encoding and we missed it. In that case, it is just a pure bug
CryHavoc21
@CryHavoc21
Hi, I was having some issues reading the Bap documentation, and was wondering if someone could clear something up for me. What is the correct way to get a list of sections in an ELF binary (with names intact, since I'm looking for a specific section with a specific name to get its Memory) given an Image of that ELF binary? Right now, I'm getting everything as segments with the Image.segments function, but I can't tell if there is a direct equivalent to that for sections. I'm looking at API the documentation right now, but I'm admittedly pretty lost in it. I can provide code examples of what I have now, if that will help
CryHavoc21
@CryHavoc21

Okay, doing some more reading, I think I found the way to do it. Adapted from the "print all sections" example, I changed my "given an image img" part from

Table.iteri (Image.segments img) ~f:(fun mem seg -> [ ....... ]

to

Image.memory img |> Memmap.to_sequence |> Seq.iter ~f:(fun (mem, section) -> [ .......]

with the rest continuing as the "print all sections" example code does. So instead of directly getting the segments as a table, I just got the Memmap representation of the Image. This is working so far for me

TodAmon
@TodAmon
If an instruction is not lifted, is there a way to distinguish it from a NOP? It would be nice if the default behavior was some sort of exception...
Ivan Gotovchits
@ivg
@TodAmon, not really. In BAP infrastructure we have multiple sources of semantic information each contributing its own bit of semantics. So there is even no such notion as "not lifted". Though, it is always possible that we have absolutely no information about the instruction (e.g., let is_lifted x = Insn.(x <> empty)) but in general, we have at least some info. Do you have any more specific examples (or use cases) when you would need to distinguish non-lifted instructions?
Besides, there is also a log file, where lifters commonly spill out information about the instructions that weren't lifted.
CryHavoc21
@CryHavoc21

Hmm. Does anyone mind if I do some spitballing ideas? I'm hitting a roadblock with what I'm working on but am not sure quite what specific question to ask yet.

I'm currently in the process of writing a plugin that, given an ELF binary (specifically RISC-V 32), will use Bap to iterate over the sections of the binary , pull out the .text section, and do a transform on it. The transform will take the section as a list of int32 representing the bytes of the section (without disassembling it, sidestepping the lack of RISC-V semantics in Bap currently), and spit out two lists of int32: one of which should directly replace the .text section, and one that represents a new section that should go at the end of the program.

At the moment, I have the transform providing me these two lists correctly (after a surprising amount of work lol, it was kinda tough to get to this point). But I'm not really sure how I should approach creating a new ELF binary, or constructing new sections. Does Bap have infrastructure for doing this?

I guess I could try constructing a Memory from that list of int32 and then... somehow getting a section from that Memory object? I'm not entirely certain, but thought someone here might have some idea. Or that writing it out like this would help me "rubber-duck debugging" my way to a solution lol.

I guess I could... have a list of some kind that will get added to with every section as int32, then at the end i take this list of lists of int32, turn it all into one big list, and then try and make a Memory of it, and then... something it
CryHavoc21
@CryHavoc21
or take a giant list of int32 and.. just use OCaml to write it out to a new file in binary mode? I don't know if I'd be missing headers at that point, though
CryHavoc21
@CryHavoc21

Someone (i think i t was Ivan? I'm not sure) gave me this as a function to use:

(* given a Bap.Std.Memory, assume it is a list of 32 bit integers and return it as such. *)
let list_of_mem (mem :Memory.t) : (int32 list) = 
  List.rev @@ 
  Memory.fold mem ~word_size:`r32 ~init:[] ~f:(fun w ws ->
      Word.to_int32_exn w :: ws)

I guess I could try and do the reverse of this? and then I would have a Memory

CryHavoc21
@CryHavoc21
Oh wait a minute, is this it? Memmap.add : 'a t -> mem -> 'a -> 'a t?
https://binaryanalysisplatform.github.io/bap/api/master/bap/Bap/Std/Memmap/index.html#val-add
Ivan Gotovchits
@ivg

@CryHavoc21, first of all, let me state directly - using a list of bytes to represent a binary is probably a worst possible representation. It will hit you eventually. Do not forget that list data type is represented as a singly-linked list, therefore it is slow and inefficient.
The fast and efficient representation is Memory.t that is provided by BAP or Bigstring.t as provided by the Core_kernel libraryf. They use C arrays underneath the hood and is usually just using memory mapping to load the binary into the memory.

I'm currently in the process of writing a plugin that, given an ELF binary (specifically RISC-V 32), will use Bap to iterate over the sections of the binary , pull out the .text section, and do a transform on it. The transform will take the section as a list of int32 representing the bytes of the section (without disassembling it, sidestepping the lack of RISC-V semantics in Bap currently), and spit out two lists of int32: one of which should directly replace the .text section, and one that represents a new section that should go at the end of the program.

So let's start with a better representation of the binary patch. Following your requirements let's represent it as a sequence of chunks and one suffix, e.g.,

  type chunk = {
    offset : int;
    data : Bigstring.t
  }

  type patch = {
    chunks : chunk list;
    suffix : Bigstring.t;
  }

As you can see, we represent data with Bigstring.t, we also need to know the offset so that we can apply the patch. Now let's write the code that takes the input data and applies the binary patch to it and writes the it to the output file. We will use memory mapping for that, e.g.,

open Core_kernel
module Unix = UnixLabels

  let mapfile fd size =
    Bigarray.array1_of_genarray @@
    Mmap.V1.map_file fd
      Bigarray.char Bigarray.c_layout false [|size |]

  let apply_patch ~output src {suffix; chunks}  =
    let fd = Unix.openfile output
        ~mode:Unix.[O_RDWR]
        ~perm:0o600 in
    (* the size of the new file with suffix included *)
    let size = Bigstring.length src + Bigstring.length suffix in
    (* dst holds the uninitialized contents of the output file  *)
    let dst = mapfile fd size in
    (* first blit the input data *)
    Bigstring.blito ~src ~dst ();
    (* next, append the suffix *)
    Bigstring.blito ~src:suffix ~dst ~dst_pos:(Bigstring.length src) ();
    (* now, for each chunk blit (copy) it to the output *)
    List.iter chunks ~f:(fun {offset; data} ->
        Bigstring.blito ~src:data ~dst ~dst_pos:offset ());
    (* Don't forget to close it *)
    Unix.close fd

Now, what is left is to translate your int32 list patches to our representation and the program is ready. The first thing that you might ask is how to get an offset of the memory chunk wrt the original file, here is the answer,

  let offset mem = Bigsubstring.pos (Memory.to_buffer mem)

You really must insist and ask your colleague to rewrite its analysis so that it uses something better than int32 list, but for starters let's use this representation (only if you will promise to switch from it later). So first of all we need to write a function that will transform the int32 list back into Bigstring.t, e.g.,

let bigstring_of_list ints =
  let size = List.length ints * 4 in
  let dst = Bigstring.create size in
  List.iteri ints ~f:(fun i x ->
      Bigstring.set_int32_t_le dst ~pos:(i*4) x);
  dst

Notice how the chosen representation immediately hits us - we need to specify the endianness as we leaked the abstractions of Image and Memory (which hide the endianness)

Now, we have all the parts needed to implement our rewriter,
let rewrite (trans : int32 list -> int32 list * int32 list) image output =
  match find_section image ".text" with
  | None -> failwith "there is no .text section in the binary"
  | Some sec ->
    let chunk,suffix = trans (list_of_mem sec) in
    apply_patch ~output (Image.data image) {
      chunks = [{
          offset = offset sec;
          data = bigstring_of_list chunk
        }];
      suffix = bigstring_of_list suffix;
    }
Ivan Gotovchits
@ivg
Sorry, was driven away, now we can pack all this work into a single plugin and register a command rewrite that applies our transformation. We also add some command-line parsing to make it nice. Here is the full gist with a plugin that does all that you need. Only one hundred lines of code.
CryHavoc21
@CryHavoc21
okay wow, thanks for the response! I'll have to take some time stepping through this to understand all what is going on. There's a lot here I haven't seen before
Ivan Gotovchits
@ivg
yep, look into the gist, I've found a couple of bugs when I ran it. But what is left is to substitute the break_binary function with your colleagues function.
CryHavoc21
@CryHavoc21
Also I believe the reasoning for the choice of int32 list for their function was that their function was originally written in Coq/Galina and then formally verified there, then extracted to OCaml using the Coq built-in extractor. And I think for simplicity of the proof, it made more sense to use int32 list. I'm not sure if the extraction can be changed to extract to something like Bigstring.t, since that would introduce a change in semantics
I will definitely ask about it though, because even on small programs, the current transform I have is really chugging along slowly
Do you have an official link to the full Core_kernel API? I've found a bunch of different links, half of which seem to be dead or incorrect. And the API is partially mirrored on the Bap docs?
Ivan Gotovchits
@ivg
OK, if it is the Coq extraction then I concede :) Probably it is easier to keep it like this. Besides, ask them, maybe it is possible for them to give you some sort of (int * int32) list, i.e., a list of (offset,instr) pairs, which is much more concise.
With that said, I never use it. I use OCaml Merlin that gives me intellisence completion and documentation and search. I have no idea how people could code without having all those facilities) My suggestion is to set it up, see https://github.com/ocaml/merlin
it works for Emacs, Vim, Vscode, and other editors
CryHavoc21
@CryHavoc21
The function they have actually returns (int list * int list list option)
which is (jumptable, newcode), where the jumptable 1:1 replaces the old .text section and the newcode is put in its own section at the end of the file. it's int list list because it might need to return multiple instructions per instruction in the original program
Also gotcha, I'll look into Merlin. I've been just using vanilla vim for this, and dealing with the pain of scouring the documentation. It's actually been a major thorn in my side, since so much of Core_kernel shadows the original standard library Stdlib. So when I call out to a function like, suppose, "fold_left", it's hard to know which one I'm actually getting
(I'm normally a C++ programmer, so I suppose I'm spoiled by having an incredible standard library always to hand)
Ivan Gotovchits
@ivg
I see. It is a little bit suspicious that the jumptable is having the same size as the .text section. Concerning the newcode, I would suggest using List.concat on it, it has type 'a list list -> 'a list
CryHavoc21
@CryHavoc21
Right, that makes sense. That just appends all the inner lists together, one right after another, correct?
Ivan Gotovchits
@ivg
You might find it easier to code in OCaml if you will glimpse through the Real World OCaml book, it uses Janestreet's Core library as the standard library.

Right, that makes sense. That just appends all the inner lists together, one right after another, correct?

Yep, [[1;2;3];[4;5] goes to [1;2;3;4;5]

CryHavoc21
@CryHavoc21
Gotcha. And I remember seeing that book referenced, but I hadn't given it a look yet. I suppose I should actually go and do that, huh?
Ivan Gotovchits
@ivg
also, double check that instructions are written in little-endian, in riscv, they might be in fact written in big-endian, I didn't check it
CryHavoc21
@CryHavoc21
It should be little endian. All my test binaries that I've compiled have been, at least
Ivan Gotovchits
@ivg

I suppose I should actually go and do that, huh?

yep, coding in a language that you haven't learned is always painful

CryHavoc21
@CryHavoc21
I had done OCaml in a few university classes, but it's definitely not the same as doing a complex project. It at least feels like the most "real world" of the common functional languages (at least, more than something like lisp. Or Prolog, but that's not a functional language), but the standard library has been very frustrating to deal with
So in the gist, the mapfile function, you're directly mapping the buffers together? Which library does Mmap.V1.map_file come from?
Ivan Gotovchits
@ivg
that is why we're using Core, it is much bigger (probably much bigger than what C++ can offer you, something closer to Boost)
it came from the mmap library, which is just a compatibility library to enable the uniform interface to this functionality. I was using it to enable compatibility with older versions of OCaml in a matter of habit. But today it is probably a better idea to Unix.map_file, which was introduced in OCaml Unix module in 4.06.0 (and since bap requires at least 4.09, you can safely use it)
CryHavoc21
@CryHavoc21
Gotcha. Is there a difference between the Unix.map_file and UnixLabels.map_file?
Ivan Gotovchits
@ivg
functionally no, syntactically the latter has labels which makes it much easier to use correctly
CryHavoc21
@CryHavoc21
Ah. so the "Labels" part of UnixLabels means it... has labels. I should have probably seen that one coming, honestly.
Ivan Gotovchits
@ivg
haha)) yeap
CryHavoc21
@CryHavoc21

So when you make the call to

let chunk,suffix = trans (list_of_mem sec) in
...

that's using

val trans : ('a, 'b) t -> ('b, 'c) t -> ('a, 'c) t

from janestreet's Base Type_equal library, right? Can you explain what's going on there? I don't see how the call applies a function with that signature to something that is type int32 list

Ivan Gotovchits
@ivg

Nope, trans here is the parameter of the rewriter function:

let rewrite (trans : int32 list -> int32 list * int32 list) image output =
  match find_section image ".text" with
  | None -> failwith "there is no .text section in the binary"
  | Some sec ->
    let chunk,suffix = trans (list_of_mem sec) in
    apply_patch ~output (Image.data image) {
      chunks = [{
          offset = offset sec;
          data = bigstring_of_list chunk
        }];
      suffix = bigstring_of_list suffix;
    }

when you call rewrite you shall pass the Coq-extracted function there (after massaging it to fit the type), in the gist example, I am passing the break_binary function instead of trans, e.g.,

rewrite break_binary image output
CryHavoc21
@CryHavoc21
Oh I completely missed that it was the argument for the function lol.
That makes a lot more sense than what I was thinking