DukMastaaa
@DukMastaaa

hi @ivg, my colleagues and I are preparing our next PR to bap. currently we have one file aarch64-vector.lisp with all the SIMD instructions we've implemented, but this will get larger and larger as there's many left. also, there are some vector helper functions we've put in aarch64-helper.lisp. do you think it's a good idea to make a subdirectory simd, so the file structure will be

semantics
- simd
  - aarch64-simd-data-movement.lisp
  - aarch64-simd-logical.lisp
  - aarch64-simd-arithmetic.lisp
  ...
  - aarch64-simd.lisp
- aarch64-arithmetic.lisp
- aarch64-atomic.lisp
...
- aarch64.lisp

so basically we make an aarch64-simd package which we (require) in aarch64.lisp?

also, should we also separate the vector helper functions into its own file?
later when we want to look at more floating point/crypto/sve instructions we can also make a fp subdirectory or something
DukMastaaa
@DukMastaaa
also, we have made a primitive nth-reg-in-group to split up the X0_X1 pairs (and larger groups) returned by LLVM. the code is here, but it only works with writing to the register with set$, not for reading in any way (like if 'X0 was used instead of X0).
given this is implemented in OCaml, is there any way to return a Primus variable like that of rd or rn passed in as the argument to one of our lifter functions, as those variables reify into correct BIL code for both reading and writing?
Chloe Fortuna
@fortunac
Is there a reason why the Z3 version for bap-primus-symbolic-executor is less than 4.8.13 for the opam repo? https://github.com/BinaryAnalysisPlatform/opam-repository/blob/testing/packages/bap-primus-symbolic-executor/bap-primus-symbolic-executor.master/opam
Ivan Gotovchits
@ivg

there's many left. also, there are some vector helper functions we've put in aarch64-helper.lisp. do you think it's a good idea to make a subdirectory simd, so the file structure will be

@DukMastaaa, the Primus Lisp features have a flat namespace and are all installed in the same folder, so I wouldn't suggest introducing a deep hierarchy. It may only confuse you.

@fortunac, yes, starting with this version z3 switched to dynamic linking, which doesn't work for us, especially when BAP is delivered as a debian package. With dynamic linking BAP will still work, e.g., when you're using it from opam, so you can safely override this constraint and use a newer version.
Ivan Gotovchits
@ivg

but it only works with writing to the register with set$, not for reading in any way (like if 'X0 was used instead of X0)

@DukMastaaa, according to your implementation it should also contain a value so you should be able to read its value. Make sure that the version of code that you're using is the same as the version that you're referencing. This is the only explanation that comes to mind. You can also use --show-knowledge in bap mc to show the contents of the registers to see if the variable denotation is properly assigned. Also, when you say that it doesn't work, what do you mean by that? What happens in particular?

Kenneth Adam Miller
@KennethAdamMiller
(I deleted my earlier question)
I got curious and wanted to ask you something. Why are there type parameters on the Knowledge run and slot instances, like Program.t and Class.t? What are those useful for?
Ivan Gotovchits
@ivg
The first parameter is the class index, the second parameter is the type of the values stored in this slot. So, Semantics.slot has type (Theory.program, Theory.Semantics.t) slot that tells us that this slot could be used to access the semantics property of the values that have Theory.program class and this property is represented with the Theory.Semantics.t type.
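The two slot parameters described above can be illustrated with a toy model. This is only a sketch in Python and not the BAP Knowledge API: `Slot`, `Obj`, `KnowledgeBase`, `Program`, and `Unit` are hypothetical names made up for the illustration. In OCaml the class index is checked by the type checker at compile time; here the same check is done at run time.

```python
from dataclasses import dataclass
from typing import Any, Dict, Generic, Tuple, Type, TypeVar

K = TypeVar("K")  # class index: which kind of object the slot belongs to
V = TypeVar("V")  # type of the property value stored in the slot


class Program:
    """Hypothetical stand-in for a class of objects such as Theory.program."""


class Unit:
    """A second, unrelated class, used to show that slots do not mix."""


@dataclass(frozen=True)
class Slot(Generic[K, V]):
    cls: Type[K]  # first parameter: the class the slot is declared for
    name: str
    empty: V      # second parameter: the value type (here, its default)


@dataclass(frozen=True)
class Obj(Generic[K]):
    cls: Type[K]
    ident: int


class KnowledgeBase:
    """Toy store mapping (class, object, slot) to a property value."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[type, int, str], Any] = {}

    @staticmethod
    def _check(slot: Slot, obj: Obj) -> None:
        # The class index of the slot must match the class of the object.
        if obj.cls is not slot.cls:
            raise TypeError(
                f"slot {slot.name!r} belongs to class {slot.cls.__name__}, "
                f"not {obj.cls.__name__}"
            )

    def provide(self, slot: Slot, obj: Obj, value: Any) -> None:
        self._check(slot, obj)
        self._data[(obj.cls, obj.ident, slot.name)] = value

    def collect(self, slot: Slot, obj: Obj) -> Any:
        self._check(slot, obj)
        return self._data.get((obj.cls, obj.ident, slot.name), slot.empty)


# A slot of program objects whose property is a string (toy "semantics").
semantics = Slot(Program, "semantics", "")
kb = KnowledgeBase()
insn = Obj(Program, 1)
kb.provide(semantics, insn, "RAX := RAX + 1")
```

Keeping one class index for every slot, as mentioned in the reply below, works in this model too; the first parameter only starts to matter once objects of different classes exist side by side.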
Kenneth Adam Miller
@KennethAdamMiller
Oh. I kind of only used the second parameter, and kept the class index uniform across all of my slots.
Kenneth Adam Miller
@KennethAdamMiller
Also, actually, I deleted my other question, but I have to ask it too. I'm having a non-persistent type instance exception. This instance must receive the command line options, which cannot be promised or provided at toplevel. So, I must call a function with a parameter that evaluates the promise. Then, afterward I try to collect downwind items that should be able to correctly collect the command line options. My non-persistent type instance comes up as none in my other code. My cache is clean and it is a fresh docker run in a build. I'm not sure why my promises and collections aren't matching up.
mpmfrans
@mpmfrans
Hi all, can someone explain to me what method of path evaluation BAP uses to choose the right branches? either:
  1. select the best path heuristically (dynamic analysis, statistical likelihoods)
  2. eliminate paths (user defined context "chopping")
  3. external call abstraction (symbolically represent library functions)
Or perhaps a combination? Or something totally different?
Ivan Gotovchits
@ivg
@mpmfrans can you please elaborate on what you mean by "right branches" and "path evaluation"? Are you talking about the control-flow recovery and building the CFG?
Benjamin Mourad
@bmourad01
@ivg #1545
Ivan Gotovchits
@ivg
:eyes:
Ivan Gotovchits
@ivg
the review is done!
Benjamin Mourad
@bmourad01
:+1: working on addressing the comments
Benjamin Mourad
@bmourad01
Hmm, I see:
00000952: sub bar(bar_result)
00000963: bar_result :: out u32 = low:32[RAX]

000004ee:
00000795: call %00000953 with noreturn

00000953: sub bar@5b2(bar@5b2_result)
00000964: bar@5b2_result :: out u32 = low:32[RAX]
Not sure what happened there
Ivan Gotovchits
@ivg
Looks like the database was reset, so the call to bar@5b2 wasn't properly renamed
It could be that there is still some call to a toplevel-using function inside of the lift function
Benjamin Mourad
@bmourad01
I found some of those being called and replaced them, but the issue is still there. I'll try fixing tomorrow
DukMastaaa
@DukMastaaa

hi @ivg, sorry for delayed response. my colleagues and i have sent through #1546. we'll make the PRs more granular in the future.
the instructions added are now sufficient to fully lift cntlm, and include a substantial amount of SIMD instructions.
please take your time, there's no rush to review all of it.

about the primitive not working, i misspoke earlier; it does work for writing and reading in set$, but fails when i try to concat them. i've described this in the PR message and will reproduce here for convenience

[the] implementation prevents the following expression from reifying correctly:

(concat
  (nth-reg-in-group 'X0_X1 0)
  (nth-reg-in-group 'X0_X1 1))

The expected result is X0.X1, but printing out the result with msg gives 0x30000000000000004.
As a temporary workaround, a helper function (register-pair-concat r-pair) has been defined, containing a large switch statement with cases for each 'Xa_Xb, but this is not ideal.
Some advice on how to resolve this would be much appreciated.
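One way to inspect the printed value is to split it back into the two words that were concatenated. The sketch below is plain Python bit arithmetic, not Primus Lisp, and it rests on two assumptions: each register in the pair is 64 bits wide, and `concat` places its first argument in the upper bits. The function names are made up for the illustration.

```python
from typing import Tuple

WORD_BITS = 64  # assumption: each register in the pair is 64 bits wide


def concat_words(hi: int, lo: int, bits: int = WORD_BITS) -> int:
    """Concatenate two words, with `hi` placed in the upper bits."""
    mask = (1 << bits) - 1
    return ((hi & mask) << bits) | (lo & mask)


def split_words(value: int, bits: int = WORD_BITS) -> Tuple[int, int]:
    """Inverse of concat_words: return the (hi, lo) words of `value`."""
    mask = (1 << bits) - 1
    return (value >> bits) & mask, value & mask


# The value reported by msg in the message above.
printed = 0x30000000000000004
hi, lo = split_words(printed)
print(hex(hi), hex(lo))
```

If the two halves turn out to be small constants rather than register contents, that may hint that the results of the primitive are reduced to numbers before `concat` sees them. This is a guess to check against `--show-knowledge`, not a confirmed diagnosis.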

Melina-Hoffmann
@Melina-Hoffmann
Hi, my colleague and I are relatively new to bap and after going through the python tutorial we are now trying to analyse some PE files.
Is there any convenient way to get strings from binaries using BAP? Our best strategy so far is to get the sections through BAP and then manually parse the section content to get the strings and calculate their addresses. Is there a better/ more convenient way to do this with bap-python and how do we know where the strings from the section appear in the graph representation of the program?
BAP does not seem to resolve imported functions (such as printf) in PE files, all functions are listed as "sub_<ADDR>". Is this a limitation with PE in BAP, or did we mess up something? Our binary is not stripped, and since we could retrieve the symbols using Ghidra, they are for sure within the file.
Ivan Gotovchits
@ivg
@DukMastaaa, thanks a lot, I will happily review the PR. Concerning the concat issue, it is still strange, I will take a deeper look into this issue.
@Melina-Hoffmann, concerning the names, what version of BAP are you using? If you're using the vagrant file from the tutorial, then it is a very old version of bap.
Ivan Gotovchits
@ivg

Is there any convenient way to get strings from binaries using BAP?

Yes, there is the strings plugin specifically for this:

$ bap testsuite/bin/i686-w64-mingw32-echo --strings --strings-print-addr | tail -n 16
419223: MajorImageVersion
419235: hDllHandle
419240: lpreserved
41924b: dwReason
419254: __enative_startup_state
41926c: ExceptionRecord
41927c: sSecInfo
419285: ExceptionRecord
419295: HighPart
41929e: pSection
4192a7: TimeDateStamp
4192b5: pNTHeader
4192bf: Characteristics
4192cf: pImageBase
4192da: VirtualAddress
4192e9: Section
Lenni Hein
@lennihein

what version of BAP are you using?

We are using the current v2.5.0.

Yes, there is the strings plugin specifically for this:

Ah, thanks. We will be looking into this. Is there any way to call this from Python itself?
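One possible way to use the plugin from Python is to run the command shown above as a subprocess and parse its output. This is a minimal sketch: it assumes that `bap` is on the PATH and that every output line has the `<hex address>: <string>` form seen in the listing. The helper names `parse_strings_output` and `bap_strings` are hypothetical, and whether bap-python exposes the plugin more directly is not established here.

```python
import subprocess
from typing import List, Tuple


def parse_strings_output(text: str) -> List[Tuple[int, str]]:
    """Parse lines of the form '<hex-addr>: <string>' into (addr, string)."""
    found = []
    for line in text.splitlines():
        addr, sep, string = line.partition(": ")
        if not sep:
            continue  # not an 'addr: string' line
        try:
            found.append((int(addr, 16), string))
        except ValueError:
            continue  # the prefix was not a hexadecimal address
    return found


def bap_strings(path: str, bap: str = "bap") -> List[Tuple[int, str]]:
    """Run `bap PATH --strings --strings-print-addr` and parse the result."""
    result = subprocess.run(
        [bap, path, "--strings", "--strings-print-addr"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_strings_output(result.stdout)
```

A call such as `bap_strings("pe_min.exe")` would then return address and string pairs that can be matched against the addresses referenced in the lifted program.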

Ivan Gotovchits
@ivg

We are using the current v2.5.0

So which names are not resolved for you, those that are externals, like malloc, strcpy, etc, or internals like the names of the local functions?

I have both resolved correctly in our testsuite PE file, but it was cross-compiled with GCC. What toolchain did you use to build your PE file? Would you mind sharing it with me?
mpmfrans
@mpmfrans

Hi, I receive the following error on Ubuntu 20.04 after installing the pre-built packages when running 'bap'

Failed to load plugin "primus-symbolic-executor": Failed to load z3ml: error loading shared library: Dynlink.Error (Dynlink.Cannot_open_dll "Failure(\"/tmp/z3ml956e09.cmxs: undefined symbol: Z3_mk_u32string\")")
Failed to load 1 plugins, details follow:
The plugin `/usr/local/lib/bap/primus_symbolic_executor.plugin' has failed with the following error:
Failed to load z3ml: error loading shared library: Dynlink.Error (Dynlink.Cannot_open_dll "Failure(\"/tmp/z3ml956e09.cmxs: undefined symbol: Z3_mk_u32string\")")

Ivan Gotovchits
@ivg
You mean you installed the debian packages?
Lenni Hein
@lennihein

I have both resolved correctly in our testsuite PE file, but it was cross-compiled with GCC. What toolchain did you use to build your PE file? Would you mind sharing it with me?

using the Microsoft Compiler, we compiled the following program:

#include <stdio.h>

int main()
{
    char* local_str = "LOCAL_STRING";
    printf("%s\n", local_str);
    return 0;
}

Link to binary

external functions (internals as well), here vfprintf (gcc compiles to puts but that doesn't matter here) are only shown as sub_<ADDR> when using bap -d callgraph --llvm-x86-syntax=intel pe_min.exe

Ivan Gotovchits
@ivg
I see, thanks for the binary, I will play with it! Besides, you can share files here by simply dragging and dropping them
Lenni Hein
@lennihein
We had a similar experience to yours when using Mingw as well, which only prompted issues post stripping. As with ELF, stripping removes the internal names, but externals still can be resolved.

I see, thanks for the binary, I will play with it! Besides, you can share files here by simply dragging and dropping them

Uh, thanks for letting me know :)

Kenneth Adam Miller
@KennethAdamMiller
I have some questions about primus interpreter observations. Suppose, for every bil, I want to know all the values that went in and out, and the values assigned to variables together with the address they came from. The observations have this split up, and I'm having trouble seeing how I would get that without connecting separate observations together between storing them. I think that that is possibly error prone, because I don't want to trust that the machine doesn't just do two loads followed by separate assignments.
Ivan Gotovchits
@ivg
I don't understand you
Kenneth Adam Miller
@KennethAdamMiller
I need to collect correctly associated triples of var, addr, and value. The observations come in pairs of var * value and addr * value. I would like to be assured that when I collect them separately, say in a state that is an option and that I incrementally update as each observation fires, I don't hit things like two addr * value pairs followed by two var * value pairs or something like that, because I have to use the value of one of the variables to finish the result. I can use a var Addr.Map.t combined with a separate value Addr.Map.t Var.Map.t to make sure that re-orderings don't ruin things, but it is best if I use just a tuple. Unless there is a better way to do things, I have to collect the information this way. I think there's a better way to do it, but I'm not sure.
Ah, I might have solved my problem... I should have just read the primus machine source.
Because I think a term is an atomic unit.
Ivan Gotovchits
@ivg
Why don't you just use the written event that already has the var * value payload and use Interpreter.pc to get the current address?
Kenneth Adam Miller
@KennethAdamMiller
The pc is the code address. But I need the address that the data came from.
Ivan Gotovchits
@ivg
Then you can use the loaded observation that provides the address (from which the value is loaded) and the value.
Kenneth Adam Miller
@KennethAdamMiller
ok, I will assemble them together.
Is the pc_changed fired just before the content at the address is executed, or after?
Ivan Gotovchits
@ivg
Just before
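The assembly discussed above can be modeled on a plain event stream. The sketch below is a toy model in Python and not the Primus API. It assumes events arrive as tuples named after the observations mentioned in the conversation (`pc_changed`, `loaded`, `written`), that `pc_changed` fires just before the code at that address runs, and that a written value can be matched to an earlier load by equality. The last assumption is a heuristic: a value computed from a load will not match, and two loads of equal values are paired in arrival order.

```python
from typing import Iterable, List, Tuple

# Event shapes used by this toy model (assumptions, not the Primus types):
#   ("pc_changed", code_addr)
#   ("loaded", data_addr, value)
#   ("written", var, value)
Triple = Tuple[str, int, int]  # (var, data_addr, value)


def collect_triples(events: Iterable[tuple]) -> List[Triple]:
    """Join loads and writes that happen under the same program counter."""
    triples: List[Triple] = []
    pending: List[Tuple[int, int]] = []  # loads seen since the last pc change
    for event in events:
        kind = event[0]
        if kind == "pc_changed":
            pending = []  # a new instruction starts: forget older loads
        elif kind == "loaded":
            _, addr, value = event
            pending.append((addr, value))
        elif kind == "written":
            _, var, value = event
            for i, (addr, loaded) in enumerate(pending):
                if loaded == value:
                    triples.append((var, addr, value))
                    del pending[i]  # each load is used at most once
                    break
    return triples


trace = [
    ("pc_changed", 0x1000),
    ("loaded", 0x2000, 7),
    ("loaded", 0x2008, 9),
    ("written", "RAX", 9),
    ("written", "RBX", 7),
    ("pc_changed", 0x1004),
    ("written", "RCX", 7),
]
result = collect_triples(trace)
```

Because loads are kept in a list until a matching write arrives, two loads followed by two writes are still paired correctly within one instruction. The write to RCX is left out, since no load happened after the program counter changed.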