m1cm1c
@m1cm1c
what is the difference between ORDER and ARGUMENT_INDEX? if present, ARGUMENT_INDEX always seems to be the same as ORDER
colorlight
@colorlight
hi everyone, I'm new to joern. I'm following the quick start documentation, but when I import code in joern-cli, I get this response:
joern> importCode(inputPath="./x42/c", projectName="x42-c")
Creating project x42-c for code at ./x42/c
Project with name x42-c already exists - overwriting
Support for this language is only available in ShiftLeft Ocular with an appropriate license
res0: Option[Cpg] = None
I'm wondering what's the problem
Fabian Yamaguchi
@fabsx00
hm, sounds like a bug in the distro. We'll get that fixed. For now, try joern-parse ./x42/c/ instead, and then in joern: importCpg("cpg.bin")
DogeWatch
@DogeWatch

@fabsx00 How can I call Joern from Python to analyze source code given as a string, without generating files in the workspace? So many files are generated this way that it is very slow.

I have the same question, do you have any idea now?

Claudiu-Vlad Ursache
@ursachec
@colorlight did you try downloading the latest version of Joern using the instructions at https://docs.joern.io/installation?
I've just tested the quickstart instructions on a Linux machine and they work as expected.
If that doesn't help, could you post your system details so we can look for potential issues with the distribution?
Claudiu-Vlad Ursache
@ursachec
@xiaotianming If I understood you correctly, and you'd like to generate a CPG for a subset of files found in a project directory, then I suggest that you point Joern at the subdirectory you intend to analyze. You won't get around generated files in the workspace, that's part of Joern's core functionality. As to your question about Python, you can start Joern as a server (https://docs.joern.io/server) and use a Python client library (https://github.com/joernio/cpgqls-client-python) to send it commands
@DogeWatch ^^
xiaotianming
@xiaotianming
@ursachec When my project contains a lot of files and I generate a CPG for each file one by one, Joern is too slow. When I generate one CPG from the whole project at once, it is very fast. Is there any way to speed up the separate analyses?
Claudiu-Vlad Ursache
@ursachec
@xiaotianming triggering an analysis has some ramp-up time, so if you trigger multiple on small inputs, they may end up costing more time than a single large one. If you want to generate a large amount of CPGs using joern, then you might have to set up your own data processing pipeline, maybe using custom scripts (https://docs.joern.io/interpreter)
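The ramp-up argument can be made concrete with a toy cost model. This is only a sketch: the function name and the timings below are hypothetical placeholders, not measurements of Joern.

```python
def total_seconds(num_inputs, rampup_s, per_input_s, batched):
    """Estimated wall time: one ramp-up per analysis run plus per-input work."""
    runs = 1 if batched else num_inputs
    return runs * rampup_s + num_inputs * per_input_s

# Hypothetical numbers, for illustration only (not measured on Joern).
NUM_INPUTS, RAMPUP_S, PER_INPUT_S = 100, 5, 1

separate = total_seconds(NUM_INPUTS, RAMPUP_S, PER_INPUT_S, batched=False)
single = total_seconds(NUM_INPUTS, RAMPUP_S, PER_INPUT_S, batched=True)
print(f"separate runs: {separate}s, one run: {single}s")
```

Under these assumed numbers the fixed ramp-up dominates the separate runs, which is why grouping inputs into fewer analyses tends to pay off.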
xiaotianming
@xiaotianming
Thank you !@ursachec
Nikita Mehrotra
@nikitamehrotra12
Hi, I'm a new Joern user. I was exporting the generated CPG14 to a dot file...but while using joern-export command I am getting error -> "command not found"
damaoooo
@damaoooo
Hi, how can I export a Joern CPG into (node.csv, edge.csv) or another file format that neo4j can read, and how can I load the tree into Python? With old versions of joern and neo4j, changing the neo4j data path was enough, but with new versions of joern or neo4j that no longer works. So how can I export the CPG14 to neo4j and Python?
m1cm1c
@m1cm1c
according to "Modeling and Discovering Vulnerabilities with Code Property Graphs", control flow edges need to be labeled: "While these edges need not be ordered as in the case of the abstract syntax trees, it is necessary to assign a label of true, false or ε to each edge." how can these labels be accessed in joern? i assumed that edge labels are modeled as edge properties. but i cannot find a single control flow edge with any properties
sweetchuck8481
@sweetchuck8481
grafik.png
Hey guys, I installed joern today and encountered the same problem as @colorlight while trying the stuff from your documentation.
I am running a VM with Ubuntu 16.04.5
Thanks in advance for looking into it!
Juilia F
@FJuilia_twitter
hey :) i'm just wondering how i can follow data dependency edges. i can see them when i export the DDG via joern-export. but i don't know what types of edges to look for when i'm in joern. can you help me, please?
sweetchuck8481
@sweetchuck8481
Hello again. I checked Version v1.1.55 and with that it worked fine. Maybe that information can help.
Claudiu-Vlad Ursache
@ursachec

hey @FJuilia_twitter! Joern features a step named ddgIn you can use to follow data dependency edges. For example, in the following program:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[]) {
  if (argc > 1 && strcmp(argv[1], "42") == 0) {
    fprintf(stderr, "It depends!\n");
    exit(42);
  }
  printf("What is the meaning of life?\n");
  exit(0);
}

you can follow DDG edges for the call to strcmp like so:

joern> cpg.call.name("strcmp").ddgIn.l 
res103: List[nodes.TrackingPoint] = List(
  Literal(
    id -> 1000117L,
    code -> "0",
    order -> 2,
    argumentIndex -> 2,
    typeFullName -> "int",
    dynamicTypeHintFullName -> List(),
    lineNumber -> Some(6),
    columnNumber -> Some(43),
    depthFirstOrder -> None,
    internalFlags -> None
  ),
  MethodParameterIn(
    id -> 1000104L,
    code -> "char *argv[]",
    order -> 2,
    name -> "argv",
    evaluationStrategy -> "BY_VALUE",
    typeFullName -> "char * [ ]",
    dynamicTypeHintFullName -> List(),
    lineNumber -> Some(5),
    columnNumber -> Some(19)
  )
)
Claudiu-Vlad Ursache
@ursachec
Additionally, reachableBy might also help:
joern> cpg.call.name("strcmp").reachableBy(cpg.method.parameter).l 
res105: List[MethodParameterIn] = List(
  MethodParameterIn(
    id -> 1000104L,
    code -> "char *argv[]",
    order -> 2,
    name -> "argv",
    evaluationStrategy -> "BY_VALUE",
    typeFullName -> "char * [ ]",
    dynamicTypeHintFullName -> List(),
    lineNumber -> Some(5),
    columnNumber -> Some(19)
  )
)
Juilia F
@FJuilia_twitter

@ursachec thank you for your answer :) unfortunately, your solution does not seem to work. ddgIn always yields an empty list, including if i try your example. i also noticed that ddgOut does not exist:

joern> cpg.call.name("strcmp").ddgIn.l 
res59: List[nodes.TrackingPoint] = List()

joern> cpg.call.name("strcmp").l 
res60: List[Call] = List(
  Call(
    id -> 1000112L,
    code -> "strcmp(argv[1], \"42\")",
    name -> "strcmp",
    order -> 1,
    methodInstFullName -> None,
    methodFullName -> "strcmp",
    argumentIndex -> 1,
    dispatchType -> "STATIC_DISPATCH",
    signature -> "TODO assignment signature",
    typeFullName -> "ANY",
    dynamicTypeHintFullName -> List(),
    lineNumber -> Some(6),
    columnNumber -> Some(18),
    resolved -> None,
    depthFirstOrder -> None,
    internalFlags -> None
  )
)

joern> cpg.call.name("strcmp").ddgIn.l 
res61: List[nodes.TrackingPoint] = List()

joern> cpg.call.name("strcmp").ddgOut.l 
cmd62.sc:1: value ddgOut is not a member of overflowdb.traversal.Traversal[io.shiftleft.codepropertygraph.generated.nodes.Call]
val res62 = cpg.call.name("strcmp").ddgOut.l
                                    ^
Compilation Failed

if i try reachableBy, i also just get an empty list:

joern> cpg.call.name("strcmp").reachableBy(cpg.method.parameter).l 
res62: List[MethodParameterIn] = List()

is there a command that needs to be called first so that these commands work? like a command to build the DDG?

Claudiu-Vlad Ursache
@ursachec
@FJuilia_twitter Ah, right, forgot to mention, you have to run joern> run.ossdataflow first
Juilia F
@FJuilia_twitter

@ursachec thank you, it works now! :) however, i cannot re-create the data flow example of the paper "Modeling and Discovering Vulnerabilities with Code Property Graphs". the paper contains a PDG for this code:

void foo()
{
  int x = source();
  if (x < MAX)
    {
      int y = 2 * x;
      sink(y);
    }
}

i'm trying to prove using joern that there is data flow between int x = source() and sink(y). via ./joern-export --repr ddg --out outdir i get output that includes:

  "1000105" -> "1000109"  [ label = "x"] 
  "1000102" -> "1000109" 
  "1000116" -> "1000114"  [ label = "2"] 
  "1000116" -> "1000114"  [ label = "x"] 
  "1000102" -> "1000114" 
  "1000102" -> "1000116" 
  "1000109" -> "1000116"  [ label = "x"] 
  "1000114" -> "1000119"  [ label = "y"]

from this, i can see that 1000105 → 1000109 → 1000116 → 1000114 → 1000119 is a path. 1000105 is int x = source() and 1000119 is sink(y). this proves the data flow. now i want to re-create this in joern. because ddgOut does not seem to exist, i'm walking backwards (starting at the sink): https://pastebin.com/U8VHFBWD i eventually get to 1000106L which is the call to source() but i never get to the assignment call int x = source()
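The path read off the export above can also be checked mechanically outside of joern. The following is a minimal Python sketch, assuming only the eight edge lines quoted above; the helper names `parse_edges` and `find_path` are made up for illustration, and the regular expression matches just the `"id" -> "id"` shape seen in that excerpt, not the full dot grammar.

```python
import re
from collections import deque

# Edge lines copied from the joern-export output quoted above.
DOT_EDGES = '''
  "1000105" -> "1000109"  [ label = "x"]
  "1000102" -> "1000109"
  "1000116" -> "1000114"  [ label = "2"]
  "1000116" -> "1000114"  [ label = "x"]
  "1000102" -> "1000114"
  "1000102" -> "1000116"
  "1000109" -> "1000116"  [ label = "x"]
  "1000114" -> "1000119"  [ label = "y"]
'''

EDGE_RE = re.compile(r'"(\d+)"\s*->\s*"(\d+)"')

def parse_edges(dot_text):
    """Return an adjacency map {source id: set of target ids}."""
    graph = {}
    for src, dst in EDGE_RE.findall(dot_text):
        graph.setdefault(src, set()).add(dst)
    return graph

def find_path(graph, start, goal):
    """Breadth-first search; returns one shortest path as a list, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in sorted(graph.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

graph = parse_edges(DOT_EDGES)
path = find_path(graph, "1000105", "1000119")
print(" -> ".join(path) if path else "no path")
# prints: 1000105 -> 1000109 -> 1000116 -> 1000114 -> 1000119
```

This only confirms that the exported DDG contains the path; it says nothing about why a query inside joern does or does not report the same flow.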

Juilia F
@FJuilia_twitter

reachableBy() does not seem to be the solution because that does not find the data flow either:

joern> cpg.call.id(1000105L).reachableBy(cpg.call.id(1000105L)).l 
res100: List[Call] = List(
  Call(
    id -> 1000105L,
    code -> "x = source()",
    name -> "<operator>.assignment",
    order -> 2,
    methodInstFullName -> None,
    methodFullName -> "<operator>.assignment",
    argumentIndex -> 2,
    dispatchType -> "STATIC_DISPATCH",
    signature -> "TODO assignment signature",
    typeFullName -> "ANY",
    dynamicTypeHintFullName -> List(),
    lineNumber -> Some(3),
    columnNumber -> Some(6),
    resolved -> None,
    depthFirstOrder -> None,
    internalFlags -> None
  )
)

joern> cpg.call.id(1000119L).reachableBy(cpg.call.id(1000119L)).l 
res101: List[Call] = List(
  Call(
    id -> 1000119L,
    code -> "sink(y)",
    name -> "sink",
    order -> 3,
    methodInstFullName -> None,
    methodFullName -> "sink",
    argumentIndex -> 3,
    dispatchType -> "STATIC_DISPATCH",
    signature -> "TODO assignment signature",
    typeFullName -> "ANY",
    dynamicTypeHintFullName -> List(),
    lineNumber -> Some(7),
    columnNumber -> Some(3),
    resolved -> None,
    depthFirstOrder -> None,
    internalFlags -> None
  )
)

joern> cpg.call.id(1000105L).reachableBy(cpg.call.id(1000119L)).l 
res102: List[Call] = List()

joern> cpg.call.id(1000119L).reachableBy(cpg.call.id(1000105L)).l 
res103: List[Call] = List()

i printed the reachability of the nodes to themselves first so you can be sure that i'm at the correct nodes. do you know why this doesn't work? :)

Claudiu-Vlad Ursache
@ursachec
@FJuilia_twitter I am not 100% certain, but I think the behavior you're seeing is because when you're referencing the call to the assignment operator, you're actually referring to the return value of that call, which in your case is not part of the flow.
if you'd take the source as being the identifier x at line 3, you'd find a flow, and similarly for the call to source also at line 3
def source = cpg.identifier.lineNumber(3)
def sink = cpg.call.name("sink")
sink.reachableBy(source).l
Juilia F
@FJuilia_twitter
@ursachec i'll just always use .astChildren then. thank you very much! :)
Claudiu-Vlad Ursache
@ursachec
Glad I could help @FJuilia_twitter !
m1cm1c
@m1cm1c
hi, is it possible to unify two traversals more easily / more efficiently than by turning both of them into lists, concatenating the lists, and then feeding the concatenated lists into the Traversal constructor?
MeNicefellow
@MeNicefellow
Hi, just want to inquire anyone got any idea what is the best solution to convert the cpg.bin to json so that it could be loaded by python?
m1cm1c
@m1cm1c
@MeNicefellow it might be a better idea to export to the dot format: https://docs.joern.io/exporting/
MeNicefellow
@MeNicefellow
@m1cm1c Thanks man.
Does anyone know how to get the line number that corresponds to a cpg node when I use joern-export to export the graph to dot files?
Alessandro Mantovani
@elManto
Hi! What's the best way to log the results of a query in a file?
Claudiu-Vlad Ursache
@ursachec
@elManto you can use the |> operator, e.g. cpg.method.fullName.l |> "my-fullnames.txt"
Juilia F
@FJuilia_twitter
hey :) i'm trying to distinguish write access from read access. is there a way of finding out whether a specific local variable gets written to? preferably also with some location of where (e.g. which of its identifiers or what call is used)
Claudiu-Vlad Ursache
@ursachec
@FJuilia_twitter it depends what you mean by gets written to. For assignments like x = 2, you can search the graph for CALL nodes with the assignment operator as their method, e.g. cpg.call.methodFullName(Operators.assignment).l. If you're looking for byte-copying stdlib functions with a specific variable as argument, you would search for cpg.call.code(".*strcpy.*").where(_.argument.codeExact("x")) . Other steps from the reference card might be helpful https://docs.joern.io/cpgql/reference-card
Juilia F
@FJuilia_twitter
@ursachec thanks for the ideas. i'm mostly concerned about writes through operators, not through functions that just happen to perform a write. but there are many ways operators can write. i can think of =, +=, -=, *=, /=, %=, |=, &=, ^=, <<=, >>=, var++, ++var, var--, and --var. but there might be more. i think that arrays and structs further complicate things. is there a universal way of detecting writes, at least as far as operators are concerned?
Niko Schmidt
@itsacoderepo
@FJuilia_twitter maybe there is a misunderstanding here. You can do:
call.png
So you can think of it as the method "assignment". If I stick to the code in the screenshot, it is =(res, crypto_scalarmult((unsigned char *)q, (unsigned char *)n, (unsigned char *)p)).
Juilia F
@FJuilia_twitter
@itsacoderepo thanks but i know that operators are implemented as calls. i was wondering whether there is something built-in that finds all calls that definitely perform a write. shortly before you answered, i gave up on finding it and am now using a filter: .filter(node => node.property("NAME") != null && (Array("<operator>.preIncrement", "<operator>.postIncrement", "<operator>.preDecrement", "<operator>.postDecrement").toList.contains(node.property("NAME").toString) || node.property("NAME").toString.slice(0, 21).equals("<operator>.assignment")))
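For post-processing exported node names outside of joern, the same predicate can be sketched in Python. This is an illustration only: `is_write_operator` is a made-up helper, the four increment/decrement names and the `<operator>.assignment` prefix are taken from the filter above, and `<operator>.assignmentPlus` in the sample list is an assumed name for a compound assignment.

```python
# Operator names copied from the filter above.
INC_DEC_OPERATORS = {
    "<operator>.preIncrement",
    "<operator>.postIncrement",
    "<operator>.preDecrement",
    "<operator>.postDecrement",
}

# The original filter compares the first 21 characters, i.e. this prefix.
ASSIGNMENT_PREFIX = "<operator>.assignment"

def is_write_operator(name):
    """True if a call name looks like an operator that writes to its operand."""
    if name is None:
        return False
    return name in INC_DEC_OPERATORS or name.startswith(ASSIGNMENT_PREFIX)

# Sample names; "<operator>.assignmentPlus" is an assumption, not from the chat.
samples = [
    "<operator>.assignment",
    "<operator>.assignmentPlus",
    "<operator>.postIncrement",
    "<operator>.addition",
    "strcpy",
    None,
]
writes = [name for name in samples if is_write_operator(name)]
print(writes)
```

Like the original filter, this only recognizes writes by operator name; writes through pointers passed to functions are out of its scope.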
Niko Schmidt
@itsacoderepo

@itsacoderepo thanks but i know that operators are implemented as calls.

Then i misunderstood your question.

operators.png
you can use regex to get the methods you want ^