x42-c for code at ./x42/c
Hey @FJuilia_twitter! Joern features a step named ddgIn that you can use to follow data dependency edges. For example, in the following program:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[]) {
  if (argc > 1 && strcmp(argv[1], "42") == 0) {
    fprintf(stderr, "It depends!\n");
    exit(42);
  }
  printf("What is the meaning of life?\n");
  exit(0);
}
you can follow DDG edges for the call to strcmp like so:
joern> cpg.call.name("strcmp").ddgIn.l
res103: List[nodes.TrackingPoint] = List(
  Literal(
    id -> 1000117L,
    code -> "0",
    order -> 2,
    argumentIndex -> 2,
    typeFullName -> "int",
    dynamicTypeHintFullName -> List(),
    lineNumber -> Some(6),
    columnNumber -> Some(43),
    depthFirstOrder -> None,
    internalFlags -> None
  ),
  MethodParameterIn(
    id -> 1000104L,
    code -> "char *argv[]",
    order -> 2,
    name -> "argv",
    evaluationStrategy -> "BY_VALUE",
    typeFullName -> "char * [ ]",
    dynamicTypeHintFullName -> List(),
    lineNumber -> Some(5),
    columnNumber -> Some(19)
  )
)
reachableBy might also help:
joern> cpg.call.name("strcmp").reachableBy(cpg.method.parameter).l
res105: List[MethodParameterIn] = List(
  MethodParameterIn(
    id -> 1000104L,
    code -> "char *argv[]",
    order -> 2,
    name -> "argv",
    evaluationStrategy -> "BY_VALUE",
    typeFullName -> "char * [ ]",
    dynamicTypeHintFullName -> List(),
    lineNumber -> Some(5),
    columnNumber -> Some(19)
  )
)
@ursachec Thank you for your answer :) Unfortunately, your solution does not seem to work. ddgIn always yields an empty list, even with your example. I also noticed that ddgOut does not exist:
joern> cpg.call.name("strcmp").ddgIn.l
res59: List[nodes.TrackingPoint] = List()
joern> cpg.call.name("strcmp").l
res60: List[Call] = List(
  Call(
    id -> 1000112L,
    code -> "strcmp(argv[1], \"42\")",
    name -> "strcmp",
    order -> 1,
    methodInstFullName -> None,
    methodFullName -> "strcmp",
    argumentIndex -> 1,
    dispatchType -> "STATIC_DISPATCH",
    signature -> "TODO assignment signature",
    typeFullName -> "ANY",
    dynamicTypeHintFullName -> List(),
    lineNumber -> Some(6),
    columnNumber -> Some(18),
    resolved -> None,
    depthFirstOrder -> None,
    internalFlags -> None
  )
)
joern> cpg.call.name("strcmp").ddgIn.l
res61: List[nodes.TrackingPoint] = List()
joern> cpg.call.name("strcmp").ddgOut.l
cmd62.sc:1: value ddgOut is not a member of overflowdb.traversal.Traversal[io.shiftleft.codepropertygraph.generated.nodes.Call]
val res62 = cpg.call.name("strcmp").ddgOut.l
                                    ^
Compilation Failed
If I try reachableBy, I also just get an empty list:
joern> cpg.call.name("strcmp").reachableBy(cpg.method.parameter).l
res62: List[MethodParameterIn] = List()
Is there a command that needs to be run first for these steps to work, e.g. a command to build the DDG?
@ursachec Thank you, it works now! :) However, I cannot re-create the data flow example from the paper "Modeling and Discovering Vulnerabilities with Code Property Graphs". The paper shows a PDG for this code:
void foo()
{
  int x = source();
  if (x < MAX)
  {
    int y = 2 * x;
    sink(y);
  }
}
I'm trying to use Joern to prove that there is a data flow from int x = source() to sink(y). Running ./joern-export --repr ddg --out outdir, I get output that includes:
"1000105" -> "1000109" [ label = "x"]
"1000102" -> "1000109"
"1000116" -> "1000114" [ label = "2"]
"1000116" -> "1000114" [ label = "x"]
"1000102" -> "1000114"
"1000102" -> "1000116"
"1000109" -> "1000116" [ label = "x"]
"1000114" -> "1000119" [ label = "y"]
From this, I can see that 1000105 → 1000109 → 1000116 → 1000114 → 1000119 is a path. 1000105 is int x = source() and 1000119 is sink(y), so this proves the data flow. Now I want to re-create this in Joern. Because ddgOut does not seem to exist, I'm walking backwards, starting at the sink (https://pastebin.com/U8VHFBWD). I eventually get to 1000106L, which is the call to source(), but I never get to the assignment call int x = source().
reachableBy() does not seem to be the solution either, because it does not find the data flow:
joern> cpg.call.id(1000105L).reachableBy(cpg.call.id(1000105L)).l
res100: List[Call] = List(
  Call(
    id -> 1000105L,
    code -> "x = source()",
    name -> "<operator>.assignment",
    order -> 2,
    methodInstFullName -> None,
    methodFullName -> "<operator>.assignment",
    argumentIndex -> 2,
    dispatchType -> "STATIC_DISPATCH",
    signature -> "TODO assignment signature",
    typeFullName -> "ANY",
    dynamicTypeHintFullName -> List(),
    lineNumber -> Some(3),
    columnNumber -> Some(6),
    resolved -> None,
    depthFirstOrder -> None,
    internalFlags -> None
  )
)
joern> cpg.call.id(1000119L).reachableBy(cpg.call.id(1000119L)).l
res101: List[Call] = List(
  Call(
    id -> 1000119L,
    code -> "sink(y)",
    name -> "sink",
    order -> 3,
    methodInstFullName -> None,
    methodFullName -> "sink",
    argumentIndex -> 3,
    dispatchType -> "STATIC_DISPATCH",
    signature -> "TODO assignment signature",
    typeFullName -> "ANY",
    dynamicTypeHintFullName -> List(),
    lineNumber -> Some(7),
    columnNumber -> Some(3),
    resolved -> None,
    depthFirstOrder -> None,
    internalFlags -> None
  )
)
joern> cpg.call.id(1000105L).reachableBy(cpg.call.id(1000119L)).l
res102: List[Call] = List()
joern> cpg.call.id(1000119L).reachableBy(cpg.call.id(1000105L)).l
res103: List[Call] = List()
I first printed the reachability of each node to itself, so you can be sure I'm at the correct nodes. Do you know why this doesn't work? :)
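The path traced by hand from the joern-export output can also be checked mechanically. What follows is a minimal sketch in plain Scala that does not use Joern's API at all, so it should also paste into the Scala-based Joern REPL. It assumes the DDG edges are copied by hand from the exported DOT output. ddgEdges, successors and findPath are hypothetical helper names introduced only for illustration. The sketch runs a breadth-first search along edge direction, which is the forward walk the question wanted ddgOut for.

```scala
// Hypothetical helpers, not part of Joern: a plain-Scala model of the DDG
// edges copied by hand from the `joern-export --repr ddg` output above.
// Each entry is (from, to, label); "" marks an edge printed without a label.
val ddgEdges: List[(String, String, String)] = List(
  ("1000105", "1000109", "x"),
  ("1000102", "1000109", ""),
  ("1000116", "1000114", "2"),
  ("1000116", "1000114", "x"),
  ("1000102", "1000114", ""),
  ("1000102", "1000116", ""),
  ("1000109", "1000116", "x"),
  ("1000114", "1000119", "y")
)

// Adjacency list: node id -> distinct successor ids. Labels are ignored here,
// so the two parallel edges 1000116 -> 1000114 collapse into one.
val successors: Map[String, List[String]] =
  ddgEdges.groupBy(_._1).map { case (from, es) => from -> es.map(_._2).distinct }

// Breadth-first search along edge direction (the "forward" walk).
// Returns the shortest path from src to dst, or None if dst is unreachable.
def findPath(src: String, dst: String): Option[List[String]] = {
  // Each queue entry is a path stored in reverse (current node first).
  @annotation.tailrec
  def loop(queue: Vector[List[String]], seen: Set[String]): Option[List[String]] =
    queue match {
      case path +: rest =>
        if (path.head == dst) Some(path.reverse)
        else {
          val next = successors.getOrElse(path.head, Nil).filterNot(seen)
          loop(rest ++ next.map(_ :: path), seen ++ next)
        }
      case _ => None
    }
  loop(Vector(List(src)), Set(src))
}

println(findPath("1000105", "1000119")) // forward: x = source() towards sink(y)
println(findPath("1000119", "1000105")) // the reverse direction
```

The first call prints Some(List(1000105, 1000109, 1000116, 1000114, 1000119)), which matches the path read off the export. The second prints None, because DDG edges are directed. This only confirms the path in the exported edge list; it does not explain how Joern's own reachableBy chooses its start and end nodes.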
If you take the identifier x at line 3 as the source, you'd find a flow, and similarly for the call to source, also at line 3:
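// source: identifiers on line 3, i.e. the x in int x = source();
def source = cpg.identifier.lineNumber(3)
// sink: the call to sink(y)
def sink = cpg.call.name("sink")
// list the source nodes from which data flows to the sink
sink.reachableBy(source).l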
If you're looking for assignments such as x = 2, you can search the graph for CALL nodes with the assignment operator as their method, e.g. cpg.call.methodFullName(Operators.assignment).l. If you're looking for byte-copying stdlib functions with a specific variable as an argument, you would search for cpg.call.code(".*strcpy.*").where(_.argument.codeExact("x")). Other steps from the reference card might also be helpful: https://docs.joern.io/cpgql/reference-card