Rajat Khandelwal
@prongs
sure, I can try with a public project, could take some time though. Meanwhile, if I were to retry the long-living-context thing, do you think it should work out of the box?
Ivan Topolnjak
@ivantopo
not out of the box
I think that with a bit of manual instrumentation you should be able to keep the user/session ids in a context and then create a new context with a new Span for each message that comes through that socket
out for lunch, will be back in about an hour!
Ivan Topolnjak
@ivantopo
back
Rajat Khandelwal
@prongs
My websocket actor will be doing some future operations and will send some messages to other actors. If I wrap these operations with Kamon.runWithContext(mySavedContext), then it should work to some extent, right?
Ivan Topolnjak
@ivantopo

sort of, yes.. let me elaborate: if that initial Context has a not-sampled Span in it and you reuse that same Context, then all the other Spans created by those actor messages will not be sampled either. Also, all of the Spans would be part of the same trace, which will most likely make the trace useless if you end up with thousands and thousands of Spans spanning (possibly) hours.

I would recommend you manually create a Context with the user/session ids when the actor handling the websocket is created and then, every time you get a new "command" into that actor you create a new span for that new command, call yourOriginalContext.withEntry(Span.Key, newSpanForThisOperation) and then use Kamon.runWithContext
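
A minimal sketch of that recommendation, assuming a plain Akka actor behind the websocket; SessionKeys, WebSocketMessage and handleWebSocketMessage are made-up names, while Context, Span.Key, Kamon.spanBuilder and Kamon.runWithContext are the Kamon 2.x API mentioned above:

import akka.actor.Actor
import kamon.Kamon
import kamon.context.Context
import kamon.trace.Span

// Hypothetical context keys for the user/session ids mentioned above.
object SessionKeys {
  val UserId: Context.Key[String]    = Context.key("userId", "unknown")
  val SessionId: Context.Key[String] = Context.key("sessionId", "unknown")
}

// Hypothetical message type for commands coming through the socket.
final case class WebSocketMessage(msgType: String)

class WebSocketHandler(userId: String, sessionId: String) extends Actor {

  // Built once, when the actor handling the websocket is created.
  private val sessionContext: Context =
    Context.of(SessionKeys.UserId, userId)
      .withEntry(SessionKeys.SessionId, sessionId)

  def receive: Receive = {
    case w: WebSocketMessage =>
      // One new Span per incoming command, carried inside the session Context.
      val commandSpan    = Kamon.spanBuilder(w.msgType).start()
      val commandContext = sessionContext.withEntry(Span.Key, commandSpan)

      Kamon.runWithContext(commandContext) {
        handleWebSocketMessage(w)
      }

      // Finish here only if handling is synchronous; for async work,
      // finish the span in the completion callback instead.
      commandSpan.finish()
  }

  private def handleWebSocketMessage(w: WebSocketMessage): Unit = ()
}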

Rajat Khandelwal
@prongs
sure, let me try that. Can I still keep the global span (for the entire WebSocket) and manually close that on actor destroy, so that there is one span for the whole session?
Ivan Topolnjak
@ivantopo
Yeap, there is no problem with that!
Rajat Khandelwal
@prongs
alright, so I'm trying out something like this:
    case w: WebSocketMessage ⇒
      // start a new span per websocket message and run the handler inside it
      Kamon.runWithSpan(Kamon.spanBuilder(w.msgType.toString).start()) {
        handleWebSocketMessage(arg1, arg2, arg3)(w)
      }
Rajat Khandelwal
@prongs
I'm able to get some of the messages as individual traces. How do I tie them together in one trace?
and I'm getting individual traces for my outbound HTTP calls, db calls etc.
Ivan Topolnjak
@ivantopo
so, for example, one incoming message on the socket generates a Span and a related JDBC/HTTP call also generates a Span, but those Spans are not tied to the same trace?
Rajat Khandelwal
@prongs
yup
Ivan Topolnjak
@ivantopo
do you have any Cats or Monix in between the websocket and the JDBC/HTTP calls?
Rajat Khandelwal
@prongs
I think the context is somehow not being propagated, with either Kamon.runWithContext or Kamon.runWithSpan
no cats or monix
(akka + play + slick)
Ivan Topolnjak
@ivantopo
ok.. what versions are you using?
of Play, Slick and Kamon
Rajat Khandelwal
@prongs
  val akkaVersion           = "2.6.5"
  val playVersion           = "2.6.13"
  val playSlickVersion      = "3.0.3"
  val kamonVersion          = "2.1.1"
Ivan Topolnjak
@ivantopo
and also using the SBT plugin for dev mode?
Rajat Khandelwal
@prongs
addSbtPlugin("io.kamon" % "sbt-kanela-runner-play-2.6" % "2.0.6")
addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.9.2")
resolvers += Resolver.bintrayIvyRepo("kamon-io", "sbt-plugins")
addSbtPlugin("io.kamon" % "sbt-aspectj-runner-play-2.6" % "1.1.2")
but I'm not running in dev mode as of now. It's packaged as a docker image and running in k8s.
Ivan Topolnjak
@ivantopo
ok ok
are you able to access the status page on that container and see what's the status of the instrumentation modules?
Rajat Khandelwal
@prongs
Actually I was targeting 3 of our services with Kamon. 2 of them are working great. The 3rd one has WebSockets; there too, HTTP is working fine, only the WebSocket part is problematic. I've been debugging it for 2 days now and tried creating the context at different places (mostly wrapping the message passing to worker actors), but it looks like it gets lost somewhere.
curl on the instrumentation status endpoint gives this JSON:
{
  "present": true,
  "modules": {
    "annotation": {
      "name": "Annotation Instrumentation",
      "description": "Provides a set of annotations to create Spans and Metrics out of annotated methods",
      "enabled": true,
      "active": false
    },
    "akka-http": {
      "name": "Akka HTTP Instrumentation",
      "description": "Provides context propagation, distributed tracing and HTTP client and server metrics for Akka HTTP",
      "enabled": true,
      "active": true
    },
    "executor-service": {
      "name": "Executor Service Instrumentation",
      "description": "Provides automatic Context propagation to all non-JDK Runnable and Callable implementations which enables\n         Context propagation on serveral situations, including Scala, Twitter and Scalaz Futures",
      "enabled": true,
      "active": true
    },
    "play-framework": {
      "name": "Play Framework Instrumentation",
      "description": "Provides context propagation, distributed tracing and HTTP client and server metrics for Play Framework",
      "enabled": true,
      "active": true
    },
    "mongo-driver": {
      "name": "Mongo Driver Instrumentation",
      "description": "Provides automatic tracing of client operations on the official Mongo driver",
      "enabled": true,
      "active": false
    },
    "akka-remote": {
      "name": "Akka Remote Instrumentation",
      "description": "Provides distributed Context propagation and Cluster Metrics for Akka",
      "enabled": true,
      "active": true
    },
    "jdbc": {
      "name": "JDBC Instrumentation",
      "description": "Provides instrumentation for JDBC statements, Slick AsyncExecutor and the Hikari connection pool",
      "enabled": true,
      "active": false
    },
    "scala-future": {
      "name": "Scala Future Intrumentation",
      "description": "Provides automatic context propagation to the thread executing a Scala Future's body and callbacks",
      "enabled": true,
      "active": true
    },
    "akka": {
      "name": "Akka Instrumentation",
      "description": "Provides metrics and message tracing for Akka Actor Systems, Actors, Routers and Dispatchers",
      "enabled": true,
      "active": true
    },
    "logback": {
      "name": "Logback Instrumentation",
      "description": "Provides context propagation to the MDC and on AsyncAppenders",
      "enabled": true,
      "active": true
    }
  },
  "errors": {}
}
Ivan Topolnjak
@ivantopo
when it gets to this point and you need to start debugging it gets a bit annoying, but it's relatively easy: println everywhere! :joy:
I would start logging the current trace id everywhere and see where it gets lost
if it is on a future, actor or something we already support then maybe there is an issue with the agent or initialization... if it is on something else that we don't support at the moment then new (or manual) instrumentation will be necessary
can you please try to narrow it down to the specific interaction where the context is lost? and of course, ask if you need any help!
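
A throwaway helper along those lines, sprinkled around the suspect spots, could look like this (the name logCurrentTrace is made up; Kamon.currentSpan and the trace/span identifiers are standard Kamon 2.x API):

import kamon.Kamon

object TraceDebug {
  // Call this before and after every async boundary (actor sends, Future
  // callbacks, scheduler tasks) to see where the trace id changes or
  // becomes empty.
  def logCurrentTrace(where: String): Unit = {
    val span = Kamon.currentSpan()
    println(s"[$where] traceId=${span.trace.id.string} spanId=${span.id.string}")
  }
}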
Rajat Khandelwal
@prongs
appreciate the help :)
Ivan Topolnjak
@ivantopo
if there are any akka streams in the middle, that could also be breaking propagation
Rajat Khandelwal
@prongs
akka implements websockets as streams
:(
Ivan Topolnjak
@ivantopo
but that happens before it gets to your code, right?
Rajat Khandelwal
@prongs
Yup
Ivan Topolnjak
@ivantopo
it shouldn't be a problem if you are not using streams directly in your code. Try to focus on all places where there is an async boundary: creating or transforming futures, sending messages to actors, throwing tasks into schedulers
those are the typical places where a context would be lost
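
If one of those boundaries turns out not to be covered automatically, the context can be carried across it by hand; a sketch of the capture-and-restore pattern, using a hypothetical raw executor as the boundary that drops the context (only Kamon.currentContext and Kamon.runWithContext are real Kamon calls):

import java.util.concurrent.Executors

import kamon.Kamon

object ManualPropagation {
  // A raw, hypothetical executor standing in for whatever async boundary
  // turns out to be dropping the context.
  private val executor = Executors.newSingleThreadExecutor()

  def runDownstream(work: () => Unit): Unit = {
    // Capture the context on the "calling" side of the boundary...
    val captured = Kamon.currentContext()

    executor.submit(new Runnable {
      override def run(): Unit =
        // ...and restore it on the worker thread, so Spans created inside
        // `work` (JDBC, HTTP, etc.) join the same trace.
        Kamon.runWithContext(captured)(work())
    })
  }
}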
Rajat Khandelwal
@prongs

Another weird thing, I have enabled trace id in the logs. I'm able to see logs like this:

[info][2020-06-11_13:34:56.636] [8cb3f9b6f92ceebe|5e401da57b753cea] c.ClassName - Log Msg

Where the format is [%traceID|%spanID].

But when I open the URL http://jaeger-host:16686/trace/8cb3f9b6f92ceebe it shows 404

Ivan Topolnjak
@ivantopo
that probably means that the trace was not sampled
I don't remember whether we have a conversion rule for the sampling decision yet
but that's something that annoys me all the time! seeing a trace id in the logs and then finding out it wasn't sampled
we should encourage users to log the sampling decision as well!
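
A rough sketch of doing that by hand, assuming the sampling decision is exposed on the current Span's trace in Kamon 2.x (the helper name is made up):

import kamon.Kamon

object SamplingDebug {
  // Logs the sampling decision next to the trace id, so a 404 in Jaeger
  // is immediately explained by a DoNotSample decision.
  def logTraceAndSampling(where: String): Unit = {
    val trace = Kamon.currentSpan().trace
    println(s"[$where] traceId=${trace.id.string} sampling=${trace.samplingDecision}")
  }
}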
Rajat Khandelwal
@prongs
alright, sampling is fine, but I do have a lot of async boundaries like you mentioned. Is there any recommended way to deal with that? I already wrapped my first actor forward in a runWithContext; do I need to do this across the whole chain?
And does sampling imply that if I try with just one websocket session, it might not even show up in Jaeger? Do I need to try multiple sessions?
Ivan Topolnjak
@ivantopo
no, the Kamon instrumentation will automatically propagate context across actors and futures
regarding sampling, yes
the first Span of the chain takes a sampling decision and then all related Spans just follow the same sampling decision
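
For single-session testing, one temporary workaround is forcing the sampler to sample everything, assuming the standard kamon.trace.sampler setting and that the tracer picks it up on reconfigure (otherwise the same line can go into application.conf, which requires a restart):

import com.typesafe.config.ConfigFactory
import kamon.Kamon

// Sample every trace while debugging, so a single test session is
// guaranteed to reach Jaeger; revert to the default sampler afterwards.
Kamon.reconfigure(
  ConfigFactory.parseString("kamon.trace.sampler = always")
    .withFallback(Kamon.config())
)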