Daniel Peters
@dtpeters
@ivg if I'm running the uaf checker from the bap-toolkit on a binary of around 868 KB, in a VM with a single core and no other serious programs running, is a running time of over 8 hours strange?
Ivan Gotovchits
@ivg
yep, but check the swap usage: if bap started to swap (or any other garbage-collected program is swapping) then it will take eons to finish. I commonly disable swap on all my machines
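For reference, on a typical Linux machine swap usage can be checked (and, if needed, disabled) with the standard utilities below; none of this is bap-specific:

    free -h          # the Swap: row shows total and used swap
    swapon --show    # lists active swap devices; empty output means swap is off
    sudo swapoff -a  # disables all swap until the next reboot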
bgz25
@bgz25
Can bap handle multithreaded programs, and if so, are there any pass examples for multithreaded bugs? The most recent discussion I found was from 2017: BinaryAnalysisPlatform/bap#688
Ivan Gotovchits
@ivg
@bgz25, no models for pthread or other threading libraries are provided (but everything we need to model them is there, e.g., Primus can handle multiple execution states). No analyses that track multi-threading bugs are implemented.
Daniel Peters
@dtpeters
@ivg so it's not swapping (yet), but it has taken a long, long time for a seemingly small binary. More importantly, the process keeps burping up stuff like this in the terminal window: https://dpaste.org/FG5N
almost every other toolkit recipe did the same thing
Daniel Peters
@dtpeters
did I miss the chat window for today?
bgz25
@bgz25
@dtpeters fwiw I ran an analysis on one of the bap-artifacts that was spitting out the same errors you are getting and it still completed and generated incidents
Daniel Peters
@dtpeters
nooooooooo @bgz25 I killed the process after I posted that, the incidents and log were both really full, so there's that at least
thanks for the reply @bgz25
bgz25
@bgz25
no problem
Ivan Gotovchits
@ivg
@dtpeters, yes, it is a bug that is triggered by your input and it will impede proper analysis. I will fix it soon (if you create an issue with the file, it will help me a lot)
Ivan Gotovchits
@ivg
A teaser for the #1217 PR: we now have a (nice) REPL that lets you interact with the knowledge base, well, interactively :)
FactionCube
@FactionCube
A nice teaser.
SlaterLatiao
@SlaterLatiao
Hello, I'm using bap:symbolic-executor to test a binary. The system killed the process when bap tried to acquire 190G of memory. I'm wondering if this is normal and what causes bap to take so much memory. I'm using bap 2.2.0-alpha, and this is my command to run bap:

    bap --run --primus-lisp-add=/home/zenong/bap_lisp/ --primus-lisp-load=bapFile,null-ptr,posix,symbolic-stdio --primus-print-obs=call,call-return,exception,segfault,assert-failure,pc-changed,lisp-message,lisp-type-error,system-stop /home/zenong/libarchive/libarchive_fuzzer --run-entry-points=entry__archive_read_support_format_rar --run-system=bap:symbolic-executor --log-dir=/home/zenong/libarchive/log --primus-print-output=/home/zenong/libarchive/obs

The binary I'm analyzing is 33 MB.
Ivan Gotovchits
@ivg
sounds a bit exaggerated :) Did it actually start the symbolic execution? What do you have in the logs?
SlaterLatiao
@SlaterLatiao
log
This is my log file, which is very long :) I'm not sure if the symbolic execution has started, but the out-of-memory kill happened several hours after I started bap
Ivan Gotovchits
@ivg
@SlaterLatiao, yep, it didn't even get to the symbolic execution and ran out of memory during disassembling. While 33 MB is a lot, I have analyzed binaries that big and even larger; in this case, I suspect, the culprit is the large number of SSE, MMX, and AVX instructions, which explode the state. Given that each MMX instruction is a big program in itself and we model it precisely, this is expected. Our current goal is to be able to cope with binaries that are dozens or even hundreds of megabytes in size, but for that we need to stop keeping the whole program in memory and do other tricks. But Primus (which is driving the symbolic executor) is still using the old abstractions, which are quite memory hungry.
SlaterLatiao
@SlaterLatiao
@ivg Thank you! Is there an estimated time when Primus will be switched to the new abstractions?
Ivan Gotovchits
@ivg
Nope, it is as soon as I get some time to work on it :) But it is in the queue, so eventually it will happen. What I can guarantee is that it won't happen in 2.2.0 (which will be solidified in October/November)
There is a solution to your problem, provided that my guess is correct: you can translate SSE/AVX instructions into intrinsics, which will reduce the size of the disassembly a lot.

Using this option:

       --bil-enable-intrinsics=VAL
           Translate the specified instructions into calls to intrinsic
           functions. The option accepts a list of instruction specifications
           and can be specified multiple times. Each element of the list is
           either a keyword or a parametrized predicate. If an instruction
           matches any of the specifications then it will be translated into a
           call to an intrinsic function. The following keywords are
           recognized, :any - matches with any instruction, :unknown - matches
           with instructions that have unknown (to our lifters) semantics,
           :special - matches with instructions that have special semantics
           (expressed with the special statement by our lifters). The
           following predicates are recognized, asm:<str> matches with
           instructions which assembly strings start with <str>, tag:<str> -
           matches with instructions that have a tag (kind) that starts with
           <str>, <s1>:<s2> - matches with instructions that have opcodes
           starting with <s2> in the encoding that starts with <s1>. For
           predicates, all string comparisons are made case-insensitive.
           Example, :unknown,:special,asm:addsd,llvm:trap.

It is quite cumbersome, so maybe we should disable SSE by default (and use intrinsics for them) and add an opposite option that will enable them. Basically, it is very easy to kill BAP with SSE instructions and heavy inlining.
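For example, appended to your command above (with everything else unchanged), the illustrative predicate list from the help text would look like this; in practice you would list the mnemonics or tags that matter for your binary:

    bap /home/zenong/libarchive/libarchive_fuzzer ... --bil-enable-intrinsics=:unknown,:special,asm:addsd

where ... stands for the rest of the original options.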

SlaterLatiao
@SlaterLatiao
Nice! I'll have a try.
Ivan Gotovchits
@ivg

If I include threads and not lwt.unix, it fails on an undefined unix_exit symbol from lwt.unix, so I don't think it is an exclusively thread related problem.

@philzook58, don't think that I've forgotten about you :) So, I looked at the problem, and it looks like the problem is in the compiler, i.e., in the unix.cmxs file that it generates. It is indeed missing the unix_exit symbol, which is never referenced in their code and is therefore not included. I've sent ocaml/ocaml#9906 upstream, but even if it is merged, it will hit bap no earlier than in a couple of years, which is not an option for us.
Two other solutions are to ask lwt to remove the reference to unix_exit or to add their own stub. There is also a workaround: we can just add external unix_exit : int -> 'a = "unix_exit" to bap_frontend.ml and thus force the linker to link it in, and then we can easily link lwt programs into bap.
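Spelled out, the workaround stub is just this one-line declaration (shown here in isolation; per the workaround above it would go into bap_frontend.ml):

    (* reference the unix_exit C primitive so the linker keeps it,
       letting dynamically loaded code that uses lwt.unix resolve it *)
    external unix_exit : int -> 'a = "unix_exit"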

Also, it looks like it was introduced to lwt only recently (ocsigen/lwt#764), so another option would be to temporarily pin to a lower version of lwt.
Daniel Peters
@dtpeters
@ivg that teaser for the knowledge component looks awesome, like radare2 but with analysis built in
Chris Casinghino
@ccasin
@ivg very nice find on the missing unix_exit symbol.
Philip Zucker
@philzook58
Wow! Thanks Ivan! I think lowering the version of lwt is a reasonable option
Chris Casinghino
@ccasin
I was looking at the core theory documentation and noticed it mentions micro-architectural state like caches. Has anyone used BAP to model those kinds of program effects, or is the point just that this is potential future work that's supported by the extensibility of the core theory?
Ivan Gotovchits
@ivg
I haven't yet heard about anything real, but yes, the idea is that if necessary you can encode the microarchitectural details of a platform, including caches, pipelines, and execution units. An important part is that it is possible to have both a low-level microarchitectural lifter and a high-level regular lifter operating at the same time. The only thing that is left is to write such a lifter :) For context, the Core Theory was also inspired by the SAIL intermediate language, which was used for ISA verification of microarchitectural properties.
Chris Casinghino
@ccasin
Ah - cool. SAIL is great - I've played with it a little as part of the RISC-V formal specification working group, which eventually adopted Peter's SAIL model as the official spec.
Philip Zucker
@philzook58
I was looking into making a liveness analysis and I found this in BAP https://binaryanalysisplatform.github.io/bap/api/master/bap/Bap/Std/Sub/index.html#val-compute_liveness Looking into the implementation it is using Graphlib to implement the fixpoint. I came into this thinking that I wanted to use the Knowledge Base for the fixpoint computation, but I haven't figured out how to get predecessors or successors in the CFG from the knowledge base. Is using Graphlib recommended or is this an older style?
Ivan Gotovchits
@ivg
It is recommended, and it is a different fixed point. In the KB we use a stochastic fixed point computation to reach the convergence of several sources of information (aka promises). There is no static graph of dependencies between them; it is discovered dynamically. When you have a static graph and you want to compute the fixed point solution over it, you should definitely use Graphlib. Btw, why aren't you using the compute_liveness function?
Ivan Gotovchits
@ivg
Besides, with that said, one fixed point doesn't contradict the other, and in the end they should be used together. E.g., you can have an analysis, stored as a promise in the KB, that computes liveness for a given subroutine; during the computation the analysis may also query the liveness properties of the other subroutines (which it calls), thus calling itself recursively, and the KB will run the fixed point over the dynamically discovered call graph.
Philip Zucker
@philzook58
I wanted finer-grained liveness than at the block level, which is what the docstring for compute_liveness seems to imply. I wanted liveness at the statement level.
Ivan Gotovchits
@ivg
To compute liveness at the statement level, I would suggest computing liveness at the block level and then using Blk.free_vars.
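A rough sketch of one way to do that refinement (illustrative only; it uses the per-statement Def.free_vars and Jmp.free_vars rather than Blk.free_vars, and it assumes the Bap.Std names Blk.elts, Term.tid, Def.lhs, and Tid.Map). Given a block and the set of variables live at its exit, obtained from the block-level solution of Sub.compute_liveness (check the documentation for whether the solution is live-in or live-out), it walks the block backwards and records the set of variables live just before every statement:

    (* walk the block backwards, threading the live set and recording
       the live-before set at every def and jmp, keyed by its tid *)
    open Core_kernel
    open Bap.Std

    let stmt_liveness (blk : blk term) ~(live_out : Var.Set.t) =
      let _, live_before =
        Blk.elts ~rev:true blk |>
        Seq.fold ~init:(live_out, Tid.Map.empty)
          ~f:(fun (live, acc) elt -> match elt with
              | `Jmp jmp ->                (* add the variables used by the jump *)
                let live = Set.union live (Jmp.free_vars jmp) in
                live, Map.set acc ~key:(Term.tid jmp) ~data:live
              | `Def def ->                (* kill the lhs, add the rhs uses *)
                let live =
                  Set.union (Set.remove live (Def.lhs def)) (Def.free_vars def) in
                live, Map.set acc ~key:(Term.tid def) ~data:live
              | `Phi _ -> live, acc)       (* phi-nodes are ignored in this sketch *) in
      live_before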
codyroux
@codyroux
Wait
we shouldn't be using the KB for our usual abstract interpretation passes over the CFG?
Ivan Gotovchits
@ivg
Here is my simple point. If you have a CFG to start with, then using the KB is counter-productive. If you don't have a CFG, and the CFG itself is a variable in the system of equations for which the fixed point solution is sought, then the KB is the answer. Does this make sense?
Ivan Gotovchits
@ivg

Going back to @philzook58's question: if we would like to compute the liveness analysis without having a CFG, we can employ the KB for that. E.g.,

(* a sketch: liveness is a KB property (slot) assumed to be declared elsewhere;
   liveness_from_insn, fallthrough, and merge are helpers not defined here *)
let () = KB.promise liveness @@ fun label ->
  KB.collect Theory.Program.Semantics.slot label >>= fun insn ->
  let sol = liveness_from_insn insn in            (* the per-instruction solution *)
  let dests = KB.Value.get Insn.Slot.dests insn in
  fallthrough label >>= fun fall ->
  let dests = Set.add dests fall in               (* include the fall-through edge *)
  Set.to_sequence dests |>
  KB.Seq.fold ~init:sol ~f:(fun sol dest ->
    KB.collect liveness dest >>| merge sol)       (* recurse into the destinations *)

This will compute the fixed point solution without actually building the graph. So asking for the liveness at a given address will trigger some number of instructions (those that belong to the same execution path, potentially all instructions, potentially only the current instruction itself) and iterate until the fixed point is reached. Of course, since we don't have the graph, we can't iterate in the desired order, so we have to rely on the stochastic fixed point.

and in the dests property computation there could be a VSA computation, which may itself access the liveness information, and it will be included in the fixed point as well.
Ivan Gotovchits
@ivg
However, I don't think that liveness can affect VSA (this is just my intuition, I don't have any proof, please correct me if I am wrong), so liveness refinement will not refine the control flow; therefore, the CFG is a constant in this equation. But we can still use the KB for liveness computation as a tradeoff between fewer iterations on a huge static CFG that takes hours to compute vs more iterations on a dynamic CFG that is discovered on demand.
codyroux
@codyroux
It depends what your VSA is doing obviously
but yes I suspect it doesn't
Ok this is informative
Ok, what if you want a /forward/ analysis that requires a KB.collect on the predecessors?
Ivan Gotovchits
@ivg
Well, it would be harder and not as nice, but you can still compute predecessors, as nothing stops you from having your own inner data structure (which would be a mapping from labels to predecessors, read: a graph). But probably there is a more elegant solution, e.g., turning the analysis into a backward form.
Chas Emerick
@cemerick
Hi there :-) I'm curious, why the (seemingly) tight ocaml dependency range of >=4.07.0 & <4.10.0?
Ivan Gotovchits
@ivg
it is imposed by the dependency on the Janestreet Core_kernel library. They break the interface every release, and three compiler versions is roughly the lifetime of one version of core_kernel :(