Alessandro Mantovani
@elManto
To me, it seems that to detect this properly, you need some information about the program state. I mean, you would have to know that the variable i keeps increasing and that it eventually exceeds the buffer length. But honestly, I have no idea how to implement this in Joern. Is there perhaps a better, more Joern-oriented strategy?
Niko Schmidt
@itsacoderepo
@elManto you can query for the condition of the for loop,
e.g.:
joern> cpg.method.controlStructure.expressionDown.order(2).code.l 
res45: List[String] = List("i <= N")
Niko Schmidt
@itsacoderepo
You could also do something like this:
joern> val loopTo = cpg.method.controlStructure.expressionDown.order(2).isCallTo(Operators.lessEqualsThan).argument.order(2).code.l.head 
loopTo: String = "N"

joern> cpg.method.local.typeFullNameExact(s"""int [ $loopTo ]""").code.l 
res60: List[String] = List("buf")
Niko Schmidt
@itsacoderepo
cpg.method                                        // query all methods
   .controlStructure                              // filter for control structures
   .parserTypeName("ForStatement")                // keep only for statements
   .expressionDown                                // go one layer down the AST
   .order(2)                                      // the second child is the loop condition => for(i = 0; i <= N; i++){
   .isCallTo(Operators.lessEqualsThan)            // it has to be a call to "<="
   .argument                                      // go to the arguments of the call to "<="
   .order(2)                                      // the second argument is "N"
   .code                                          // get the code of the second argument
   .l                                             // as a list (here there is only one entry, but there could be more)
   .head                                          // get the first entry in the list
i guess we need to add a "howto find off-by-one errors" example, with comments and everything
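The two queries above boil down to a string heuristic: take the bound of a `<=` loop condition and look for a local array declared with exactly that bound. A minimal Python sketch of that idea, assuming the inputs are the strings those queries return (a condition such as "i <= N", and local names paired with type names such as "int [ N ]"); `off_by_one_candidates` is a hypothetical helper, not part of Joern:

```python
import re

# Matches a loop condition of the form "<var> <= <bound>", e.g. "i <= N".
LE_CONDITION = re.compile(r"^\s*(\w+)\s*<=\s*(\w+)\s*$")


def off_by_one_candidates(loop_conditions, locals_with_types):
    """Pair each `<=` loop bound with local int arrays declared with that bound.

    loop_conditions:   condition strings, e.g. the result of
                       controlStructure.expressionDown.order(2).code.l
    locals_with_types: (name, typeFullName) pairs, e.g. from method.local
    The "int [ N ]" spelling is copied from the chat and is an assumption.
    """
    hits = []
    for cond in loop_conditions:
        match = LE_CONDITION.match(cond)
        if match is None:
            continue  # only `<=` comparisons are of interest here
        bound = match.group(2)
        for name, type_full_name in locals_with_types:
            # Candidate only: we do not check that the loop variable
            # actually indexes this array, just that the sizes line up.
            if type_full_name == f"int [ {bound} ]":
                hits.append((cond, name))
    return hits


print(off_by_one_candidates(["i <= N", "j < M"], [("buf", "int [ N ]"), ("k", "int")]))
# [('i <= N', 'buf')]
```

A real check would still need the data flow from the loop variable to the array index, which is the "state" @elManto mentions; the sketch only reproduces the size match.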
Alessandro Mantovani
@elManto
I see, cool! Thanks
Alessandro Mantovani
@elManto

Hey, sorry guys, I cannot model this UAF:

struct a_type * a;
...
free(a);
a->field--;

My idea was to track the flows from the arguments of free to the <operator>.indirectFieldAccess calls, but so far I'm getting an empty list.
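To make the property concrete, here is a deliberately naive Python sketch that flags `p->...` after `free(p)` by scanning straight-line statements in order. This is a swapped-in textual heuristic, not Joern's data-flow engine and not a fix for the empty-list query; `use_after_free` is a hypothetical helper, and branches, loops, aliasing and macros are all ignored.

```python
import re


def use_after_free(statements):
    """Report (index, variable, statement) for `var->` uses after `free(var)`.

    Naive, order-based text scan over a straight-line list of C statements.
    """
    freed = set()
    findings = []
    for index, stmt in enumerate(statements):
        # Any freed variable dereferenced with `->` here is reported.
        for var in sorted(freed):
            if re.search(rf"\b{re.escape(var)}\s*->", stmt):
                findings.append((index, var, stmt))
        # A plain reassignment such as `a = malloc(8);` makes the pointer valid again.
        assigned = re.match(r"^\s*(\w+)\s*=(?!=)", stmt)
        if assigned:
            freed.discard(assigned.group(1))
        # `free(a);` marks `a` as freed for all later statements.
        for var in re.findall(r"\bfree\(\s*(\w+)\s*\)", stmt):
            freed.add(var)
    return findings


print(use_after_free(["struct a_type * a;", "free(a);", "a->field--;"]))
# [(2, 'a', 'a->field--;')]
```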

Rasmus Lindqvist
@rasmusli_gitlab

> @rasmusli_gitlab right now, it's just dot, so you'd have to convert.

I rewrote the old 'graph_for_funcs.sc' script so that it converts to JSON in that way. But I guess that, for a PR, you'd want a more long-term solution that is not a script.
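For the DOT-to-JSON conversion discussed here, a minimal Python sketch. It is not Rasmus's script: it only reads edge lines in the `"src" -> "dst" [ label = "..."]` shape that joern-export prints (compare the DDG output further down in this log), ignores node attribute lines, and both the JSON layout and `dot_edges_to_json` are hypothetical.

```python
import json
import re

# Edge lines as printed by joern-export, e.g.  "1000105" -> "1000109"  [ label = "x"]
EDGE_LINE = re.compile(
    r'^\s*"([^"]+)"\s*->\s*"([^"]+)"\s*(?:\[\s*label\s*=\s*"([^"]*)"\s*\])?'
)


def dot_edges_to_json(dot_text):
    """Convert the edge lines of a DOT graph into a JSON string.

    Output layout {"nodes": [...], "edges": [...]} is a hypothetical choice.
    """
    nodes, edges = [], []
    for line in dot_text.splitlines():
        match = EDGE_LINE.match(line)
        if match is None:
            continue  # skip "digraph ... {", node declarations and "}"
        src, dst, label = match.groups()
        for node in (src, dst):
            if node not in nodes:
                nodes.append(node)  # keep first-seen order
        edges.append({"src": src, "dst": dst, "label": label or ""})
    return json.dumps({"nodes": nodes, "edges": edges})


print(dot_edges_to_json('"1000105" -> "1000109"  [ label = "x"]'))
# {"nodes": ["1000105", "1000109"], "edges": [{"src": "1000105", "dst": "1000109", "label": "x"}]}
```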

Viktor Bard
@viktorbard_gitlab
Hi everyone! I'm using Python's subprocess module to write commands into the Joern interactive shell. As my dataset is quite large, I decided to split the process into many subsets. This works fine for a small number of splits, but for more than 3 splits the Joern interactive shell freezes and stops processing. Is there a limit on the number of calls that can be processed one after another in the interactive shell, or should this be possible without closing it between calls?
Fabian Yamaguchi
@fabsx00
No intended limit, at least. Can you provide exact steps to reproduce?
@rasmusli_gitlab if you could share the script, that would be great. We can base a long term solution on it.
Viktor Bard
@viktorbard_gitlab
@fabsx00 After some consideration I think the problem lies in the size of the functions to be parsed. I tried filtering out large functions and then it works fine.
shan
@shan12138
@rasmusli_gitlab Hello, are you also reproducing the Devign paper? I ran into the same problem as you while doing so. After the recent changes, Joern no longer seems to support vertex node traversal, so the "graph-for-funcs.sc" script does not run successfully. Do you have a solution to this problem now, and if so, could you share it? Thank you very much.
Rasmus Lindqvist
@rasmusli_gitlab
@shan12138, @fabsx00, yeah sure, I can share the script. I'll be able to do it tomorrow afternoon :)
scolleyuk3
@scolleyuk3
just a quick question: what algorithm does Joern/Ocular use to carry out taint tracking? taint tracking in Joern is interprocedural these days, right?
xiaotianming
@xiaotianming
Does Joern support variable renaming?
@fabsx00 How can I call Joern from Python to analyze source code passed as a string, without generating files in the workspace? So many files are generated this way that it is very slow.
m1cm1c
@m1cm1c
what is the difference between ORDER and ARGUMENT_INDEX? if present, ARGUMENT_INDEX always seems to be the same as ORDER
colorlight
@colorlight
hi everyone, I'm new to Joern. I'm following the quickstart documentation, but when I import code in joern-cli, I get this response:
joern> importCode(inputPath="./x42/c", projectName="x42-c")
Creating project x42-c for code at ./x42/c
Project with name x42-c already exists - overwriting
Support for this language is only available in ShiftLeft Ocular with an appropriate license
res0: Option[Cpg] = None
I'm wondering what the problem is.
Fabian Yamaguchi
@fabsx00
hm, sounds like a bug in the distro. We'll get that fixed. For now, try joern-parse ./x42/c/ instead, and then in joern: importCpg("cpg.bin")
DogeWatch
@DogeWatch

> @fabsx00 How can I call Joern from Python to analyze source code passed as a string, without generating files in the workspace? So many files are generated this way that it is very slow.

I have the same question. Have you found a solution yet?

Claudiu-Vlad Ursache
@ursachec
@colorlight did you try downloading the latest version of Joern using the instructions at https://docs.joern.io/installation?
I've just tested the quickstart instructions on a Linux machine and they work as expected.
If that doesn't help, could you post your system details so we can look for potential issues with the distribution?
Claudiu-Vlad Ursache
@ursachec
@xiaotianming If I understood you correctly, and you'd like to generate a CPG for a subset of files found in a project directory, then I suggest that you point Joern at the subdirectory you intend to analyze. You won't get around generated files in the workspace, that's part of Joern's core functionality. As to your question about Python, you can start Joern as a server (https://docs.joern.io/server) and use a Python client library (https://github.com/joernio/cpgqls-client-python) to send it commands
@DogeWatch ^^
xiaotianming
@xiaotianming
@ursachec When my project contains a lot of files and I generate a CPG for each file one by one, Joern is too slow. When I generate a single CPG for the whole project at once, it is very fast. Is there any way to speed up the per-file analysis?
Claudiu-Vlad Ursache
@ursachec
@xiaotianming triggering an analysis has some ramp-up time, so if you trigger many analyses on small inputs, together they may take longer than a single analysis of one large input. If you want to generate a large number of CPGs using Joern, you might have to set up your own data processing pipeline, maybe using custom scripts (https://docs.joern.io/interpreter).
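A back-of-the-envelope model of the ramp-up effect described above, as a Python sketch. The linear cost model (a fixed start-up cost per run plus a fixed cost per file) and all numbers are hypothetical assumptions for illustration, not Joern measurements:

```python
def total_seconds(file_count, ramp_up, per_file, separate_runs):
    """Linear cost model: each run pays `ramp_up` once, each file costs `per_file`."""
    runs = file_count if separate_runs else 1
    return runs * ramp_up + file_count * per_file


# Hypothetical numbers: 100 files, 10 s start-up per run, 1 s of work per file.
print(total_seconds(100, 10, 1, separate_runs=True))   # 1100
print(total_seconds(100, 10, 1, separate_runs=False))  # 110
```

Under this model the per-run start-up dominates as soon as it is large compared to the work per file, which matches the observation that one big import is much faster than many small ones.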
xiaotianming
@xiaotianming
Thank you, @ursachec!
Nikita Mehrotra
@nikitamehrotra12
Hi, I'm a new Joern user. I was trying to export the generated CPG14 to a dot file, but when running the joern-export command I get the error "command not found".
damaoooo
@damaoooo
Hi, how can I export a Joern CPG into node.csv/edge.csv or another file format that Neo4j can read, and how can I export the three into Python? With old versions of Joern and Neo4j, changing Neo4j's data path did the trick, but with new versions of Joern or Neo4j that no longer works. So how can I get a CPG14 into Neo4j and into Python?
m1cm1c
@m1cm1c
according to "Modeling and Discovering Vulnerabilities with Code Property Graphs", control flow edges need to be labeled: "While these edges need not be ordered as in the case of the abstract syntax trees, it is necessary to assign a label of true, false or ε to each edge." how can these labels be accessed in joern? i assumed that edge labels are modeled as edge properties, but i cannot find a single control flow edge with any properties.
sweetchuck8481
@sweetchuck8481
Hey guys, I installed joern today and encountered the same problem as @colorlight while trying the stuff from your documentation.
I am running a VM with Ubuntu 16.04.5
Thanks in advance for looking into it!
Juilia F
@FJuilia_twitter
hey :) i'm just wondering how i can follow data dependency edges. i can see them when i export the DDG via joern-export. but i don't know what types of edges to look for when i'm in joern. can you help me, please?
sweetchuck8481
@sweetchuck8481
Hello again. I tried version v1.1.55, and with that one it worked fine. Maybe that information helps.
Claudiu-Vlad Ursache
@ursachec

hey @FJuilia_twitter! Joern features a step named ddgIn you can use to follow data dependency edges. For example, in the following program:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[]) {
  if (argc > 1 && strcmp(argv[1], "42") == 0) {
    fprintf(stderr, "It depends!\n");
    exit(42);
  }
  printf("What is the meaning of life?\n");
  exit(0);
}

you can follow DDG edges for the call to strcmp like so:

joern> cpg.call.name("strcmp").ddgIn.l 
res103: List[nodes.TrackingPoint] = List(
  Literal(
    id -> 1000117L,
    code -> "0",
    order -> 2,
    argumentIndex -> 2,
    typeFullName -> "int",
    dynamicTypeHintFullName -> List(),
    lineNumber -> Some(6),
    columnNumber -> Some(43),
    depthFirstOrder -> None,
    internalFlags -> None
  ),
  MethodParameterIn(
    id -> 1000104L,
    code -> "char *argv[]",
    order -> 2,
    name -> "argv",
    evaluationStrategy -> "BY_VALUE",
    typeFullName -> "char * [ ]",
    dynamicTypeHintFullName -> List(),
    lineNumber -> Some(5),
    columnNumber -> Some(19)
  )
)
Claudiu-Vlad Ursache
@ursachec
Additionally, reachableBy might also help:
joern> cpg.call.name("strcmp").reachableBy(cpg.method.parameter).l 
res105: List[MethodParameterIn] = List(
  MethodParameterIn(
    id -> 1000104L,
    code -> "char *argv[]",
    order -> 2,
    name -> "argv",
    evaluationStrategy -> "BY_VALUE",
    typeFullName -> "char * [ ]",
    dynamicTypeHintFullName -> List(),
    lineNumber -> Some(5),
    columnNumber -> Some(19)
  )
)
Juilia F
@FJuilia_twitter

@ursachec thank you for your answer :) unfortunately, your solution does not seem to work. ddgIn always yields an empty list, even when i try your example. i also noticed that ddgOut does not exist:

joern> cpg.call.name("strcmp").ddgIn.l 
res59: List[nodes.TrackingPoint] = List()

joern> cpg.call.name("strcmp").l 
res60: List[Call] = List(
  Call(
    id -> 1000112L,
    code -> "strcmp(argv[1], \"42\")",
    name -> "strcmp",
    order -> 1,
    methodInstFullName -> None,
    methodFullName -> "strcmp",
    argumentIndex -> 1,
    dispatchType -> "STATIC_DISPATCH",
    signature -> "TODO assignment signature",
    typeFullName -> "ANY",
    dynamicTypeHintFullName -> List(),
    lineNumber -> Some(6),
    columnNumber -> Some(18),
    resolved -> None,
    depthFirstOrder -> None,
    internalFlags -> None
  )
)

joern> cpg.call.name("strcmp").ddgIn.l 
res61: List[nodes.TrackingPoint] = List()

joern> cpg.call.name("strcmp").ddgOut.l 
cmd62.sc:1: value ddgOut is not a member of overflowdb.traversal.Traversal[io.shiftleft.codepropertygraph.generated.nodes.Call]
val res62 = cpg.call.name("strcmp").ddgOut.l
                                    ^
Compilation Failed

if i try reachableBy, i also just get an empty list:

joern> cpg.call.name("strcmp").reachableBy(cpg.method.parameter).l 
res62: List[MethodParameterIn] = List()

is there a command that needs to be called first so that these commands work? like a command to build the DDG?

Claudiu-Vlad Ursache
@ursachec
@FJuilia_twitter Ah, right, I forgot to mention that you have to run joern> run.ossdataflow first.
Juilia F
@FJuilia_twitter

@ursachec thank you, it works now! :) however, i cannot re-create the data flow example from the paper "Modeling and Discovering Vulnerabilities with Code Property Graphs". the paper contains a PDG

for this code:

void foo()
{
  int x = source();
  if (x < MAX)
    {
      int y = 2 * x;
      sink(y);
    }
}

i'm trying to prove using joern that there is data flow between int x = source() and sink(y). via ./joern-export --repr ddg --out outdir i get output that includes:

  "1000105" -> "1000109"  [ label = "x"] 
  "1000102" -> "1000109" 
  "1000116" -> "1000114"  [ label = "2"] 
  "1000116" -> "1000114"  [ label = "x"] 
  "1000102" -> "1000114" 
  "1000102" -> "1000116" 
  "1000109" -> "1000116"  [ label = "x"] 
  "1000114" -> "1000119"  [ label = "y"]

from this, i can see that 1000105 → 1000109 → 1000116 → 1000114 → 1000119 is a path. 1000105 is int x = source() and 1000119 is sink(y). this proves the data flow. now i want to re-create this in joern. because ddgOut does not seem to exist, i'm walking backwards (starting at the sink): https://pastebin.com/U8VHFBWD i eventually get to 1000106L which is the call to source() but i never get to the assignment call int x = source()
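The path found by eye above can be reproduced with a short breadth-first search over the exported edge list. This is a Python sketch, not a Joern feature: the edges are copied by hand from the joern-export output above, an empty label stands for an edge printed without one, and `find_path` is a hypothetical helper. With `backwards=True` it follows edges in reverse, roughly the direction ddgIn walks.

```python
from collections import defaultdict, deque

# DDG edges copied from the joern-export output above: (source, target, label).
EDGES = [
    ("1000105", "1000109", "x"),
    ("1000102", "1000109", ""),
    ("1000116", "1000114", "2"),
    ("1000116", "1000114", "x"),
    ("1000102", "1000114", ""),
    ("1000102", "1000116", ""),
    ("1000109", "1000116", "x"),
    ("1000114", "1000119", "y"),
]


def find_path(edges, start, goal, backwards=False):
    """Breadth-first search; returns a shortest list of node ids or None."""
    adjacency = defaultdict(list)
    for src, dst, _label in edges:
        if backwards:
            adjacency[dst].append(src)  # from a node to what it depends on
        else:
            adjacency[src].append(dst)
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk parent links back to start
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adjacency[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


print(find_path(EDGES, "1000105", "1000119"))
# ['1000105', '1000109', '1000116', '1000114', '1000119']
```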

Juilia F
@FJuilia_twitter

reachableBy() does not seem to be the solution because that does not find the data flow either:

joern> cpg.call.id(1000105L).reachableBy(cpg.call.id(1000105L)).l 
res100: List[Call] = List(
  Call(
    id -> 1000105L,
    code -> "x = source()",
    name -> "<operator>.assignment",
    order -> 2,
    methodInstFullName -> None,
    methodFullName -> "<operator>.assignment",
    argumentIndex -> 2,
    dispatchType -> "STATIC_DISPATCH",
    signature -> "TODO assignment signature",
    typeFullName -> "ANY",
    dynamicTypeHintFullName -> List(),
    lineNumber -> Some(3),
    columnNumber -> Some(6),
    resolved -> None,
    depthFirstOrder -> None,
    internalFlags -> None
  )
)

joern> cpg.call.id(1000119L).reachableBy(cpg.call.id(1000119L)).l 
res101: List[Call] = List(
  Call(
    id -> 1000119L,
    code -> "sink(y)",
    name -> "sink",
    order -> 3,
    methodInstFullName -> None,
    methodFullName -> "sink",
    argumentIndex -> 3,
    dispatchType -> "STATIC_DISPATCH",
    signature -> "TODO assignment signature",
    typeFullName -> "ANY",
    dynamicTypeHintFullName -> List(),
    lineNumber -> Some(7),
    columnNumber -> Some(3),
    resolved -> None,
    depthFirstOrder -> None,
    internalFlags -> None
  )
)

joern> cpg.call.id(1000105L).reachableBy(cpg.call.id(1000119L)).l 
res102: List[Call] = List()

joern> cpg.call.id(1000119L).reachableBy(cpg.call.id(1000105L)).l 
res103: List[Call] = List()

i printed the reachability of the nodes to themselves first so you can be sure that i'm at the correct nodes. do you know why this doesn't work? :)

Claudiu-Vlad Ursache
@ursachec
@FJuilia_twitter I am not 100% certain, but I think you're seeing this behavior because when you reference the call to the assignment operator, you're actually referring to the return value of that call, which, in your case, is not part of the flow.
If you take the identifier x at line 3 as the source, you'd find a flow, and the same holds for the call to source, also at line 3:
def source = cpg.identifier.lineNumber(3)
def sink = cpg.call.name("sink")
sink.reachableBy(source).l
Juilia F
@FJuilia_twitter
@ursachec i'll just always use .astChildren then. thank you very much! :)
Claudiu-Vlad Ursache
@ursachec
Glad I could help @FJuilia_twitter !
m1cm1c
@m1cm1c
hi, is it possible to unify two traversals more easily / more efficiently than by turning both of them into lists, concatenating the lists, and then feeding the concatenated lists into the Traversal constructor?