    Travis Brown
    @travisbrown
    ^ @non
    Travis Brown
    @travisbrown
    Any review appreciated: typelevel/jawn#144 typelevel/jawn#145
    Tharindu Galappaththi
    @TharinduDG

    Hi,
    I'm stuck on an error when parsing a JSON response from Elasticsearch. This is my code:

      def dropLeadingChars: Pipe[F, Byte, Byte] = {
        def go(origStr: Stream[F, Byte], beginning: Boolean): Pull[F, Byte, Unit] = {
          logger.info(s"origStr inside dropLeadingChars -> $origStr")
          origStr.pull
            .unconsN(if (beginning) 1 else CHUNK_SIZE, true)
            .flatMap {
              // still before the opening '[': drop this byte and keep scanning byte by byte
              case Some((Chunk.Bytes(chunk, _, _), str)) if beginning && !chunk.headOption.contains('[') =>
                Pull.output(Chunk.empty) >> go(str, true)
              case Some((seg, str)) =>
                logger.info(s"inside some seg is ${seg.toString} and str is $str")
                Pull.output(seg) >> go(str, false)
              case None => Pull.done
            }
        }
        in => go(in, true).stream
      }
    
    val res2: Stream[F, String] = httpClient.getQueryStream(headers, subject.indexName, query, { resp =>
      resp
        .body
        .through(dropLeadingChars)
        .dropLastIf(_ == '}')
        .dropLastIf(_ == '}')
        .chunks
        .unwrapJsonArray
        .map { json =>
          json.as[Hit].fold(
            err => s"Invalid json: $err for payload $json\n",
            hit => {
              logger.info(s"hit -> $hit")
              val instance = hit._source.instance
              logger.info(s"instnace -> $instance")
              JsonNormalizer.normalizeOld(instance) ++ "\n"
            }
          )
        }
    })

    The error:

    ,"stack_trace":"org.typelevel.jawn.ParseException: expected true got 'text/p...' (line 1, column 2)\n\tat org.typelevel.jawn.Parser.die(Parser.scala:132)\
    Tharindu Galappaththi
    @TharinduDG
    Here is the example JSON that I'm trying to parse: https://jsoneditoronline.org/?id=9dca6e705acc47ca9945bafd145755b9
    Andriy Plokhotnyuk
    @plokhotnyuk
    It looks like your input is wrong (as if an HTTP header is being parsed instead of the body)...
    Tharindu Galappaththi
    @TharinduDG
    @plokhotnyuk I'm using resp.body to get the body from the http4s.Response
    Andriy Plokhotnyuk
    @plokhotnyuk
    Could you please print and check the input before processing it?
    Tharindu Galappaththi
    @TharinduDG
    @plokhotnyuk I checked, actually. I also created a standalone app with this piece of code and fed it the JSON from a file, and it parsed fine. But in the production environment it gives that error when it receives the JSON response.
    Can you think of a reason for this?
    And the response headers don't have any text/p part; it's application/json.
    Andriy Plokhotnyuk
    @plokhotnyuk
    Just print the whole body to the log in prod... it could be something unexpected, like a buffer-reuse error in the blaze/http4s code that serves the HTTP connection... or a runtime issue where you're processing the body of a 4xx/5xx error response...
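
    A minimal sketch of what that logging could look like, assuming http4s + fs2 and reusing the logger from the snippet above; logBody is a hypothetical helper, not part of either library:

      import cats.effect.Sync
      import fs2.Stream
      import org.http4s.Response

      // Decode the body as UTF-8, log every chunk of text, then re-encode so the
      // rest of the pipeline still sees the original bytes.
      def logBody[F[_]: Sync](resp: Response[F]): Stream[F, Byte] =
        resp.body
          .through(fs2.text.utf8Decode)
          .evalTap(s => Sync[F].delay(logger.info(s"raw body -> $s")))
          .through(fs2.text.utf8Encode)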
    Tharindu Galappaththi
    @TharinduDG
    @plokhotnyuk Ok, I'll try that. Thanks for your support.
    Travis Brown
    @travisbrown
    Anyone want to speak up for CharBuilder? typelevel/jawn#194
    Kewei-Wang-Kevin
    @Kewei-Wang-Kevin

    Hey guys, I have a question about flattening JSON into JSON-path-style strings. For example,
    convert this JSON

    { "a": "b21",
      "c": {
         "d": "01"
         "e":[1,2,3]
      }
    }

    to

    /a/b21
    /c/d/01
    /c/e/1
    /c/e/2
    /c/e/3

    Any good ideas about this?
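
    One way to do it is a recursive fold that appends object keys to the path and emits one line per leaf value. A sketch assuming circe's Json AST (the question doesn't name a library, so this is only illustrative):

      import io.circe.Json

      // Walk the document and emit one "/key/.../value" line per leaf, mirroring the
      // example output above: array elements just append their value to the parent path.
      def flatten(json: Json, prefix: String = ""): List[String] =
        json.fold(
          List(s"$prefix/null"),                                                  // null
          b => List(s"$prefix/$b"),                                               // boolean
          n => List(s"$prefix/$n"),                                               // number
          s => List(s"$prefix/$s"),                                               // string
          arr => arr.toList.flatMap(flatten(_, prefix)),                          // array
          obj => obj.toList.flatMap { case (k, v) => flatten(v, s"$prefix/$k") }  // object
        )

      // flatten(io.circe.parser.parse("""{"a":"b21","c":{"d":"01","e":[1,2,3]}}""").toOption.get)
      // -> List(/a/b21, /c/d/01, /c/e/1, /c/e/2, /c/e/3)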

    Travis Brown
    @travisbrown
    Heads up: this morning's 1.0.0-RC1 is broken on JDK 8 (thanks to more specific method types in ByteBuffer).
    I've got a fix here: typelevel/jawn#201
    If any maintainers have a minute for a review I'd appreciate it.
    Vladislav Rybin
    @VladislavRybin
    Hey everyone. Can anyone point out a reason to use Jawn in my project?
    Why is it better than any of the existing solutions?
    I'm eager to try it out, but I can't find a compelling reason.
    Travis Brown
    @travisbrown
    @VladislavRybin Which existing solutions do you have in mind?
    Vladislav Rybin
    @VladislavRybin
    @travisbrown Hi Travis, I'm talking about Json4s, Circe, spray-json, etc
    Ross A. Baker
    @rossabaker
    Jawn doesn’t replace any of those, but integrates with all of them.
    Travis Brown
    @travisbrown
    Right, almost everyone who uses Circe uses Jawn for parsing.
    Ross A. Baker
    @rossabaker
    Jawn is a JSON parser that is faster than many libraries’ native solution. In the case of Circe, which is my strong recommendation, it is the native solution.
    It also supports incremental parsing, which lends itself nicely to streaming solutions like circe-fs2 or jawn-fs2.
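
    For reference, incremental parsing in jawn looks roughly like this (a sketch assuming a pre-1.0 jawn with jawn-ast's JValue and JawnFacade; the chunk boundaries are illustrative):

      import org.typelevel.jawn.AsyncParser
      import org.typelevel.jawn.ast.{JValue, JawnFacade}

      implicit val facade: org.typelevel.jawn.Facade[JValue] = JawnFacade

      // UnwrapArray mode: feed a top-level JSON array in arbitrary pieces and get
      // back each element as soon as it is complete.
      val parser = AsyncParser[JValue](AsyncParser.UnwrapArray)

      val first  = parser.absorb("""[{"id": 1}, {"id""")  // Right(Seq({"id":1}))
      val second = parser.absorb(""": 2}]""")             // Right(Seq({"id":2}))
      val done   = parser.finish()                        // Right(Seq()) when the input was well-formed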
    Vladislav Rybin
    @VladislavRybin
    Got you, guys. Thanks for the explanation.
    Andriy Plokhotnyuk
    @plokhotnyuk
    Jawn is one of the slowest. Please see the results of benchmarks comparing Circe (which uses Jawn) with other JSON parsers for Scala: https://plokhotnyuk.github.io/jsoniter-scala/
    Travis Brown
    @travisbrown
    I'm not sure those comparisons are entirely fair to Jawn. Circe's AST and decoding model have some overhead, which means some of the pairings there are apples-and-oranges comparisons that in any case don't tell you much about Jawn itself.
    In my experience Jawn is competitive with Jackson as a parsing backend for Circe, and it's generally faster than spray-json's parser.
    Andriy Plokhotnyuk
    @plokhotnyuk
    Here is a PR with a direct comparison of Jawn vs. jsoniter-scala for parsing to Jawn's AST: plokhotnyuk/jsoniter-scala#424
    Ross A. Baker
    @rossabaker
    Converting bytes to Strings and then parsing those also slows things down unnecessarily.
    Ross A. Baker
    @rossabaker
    When your source is bytes, you should be using parseFromByteBuffer in jawn.
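
    For reference, that looks roughly like this with jawn-ast's parser (jawn-ast chosen only for concreteness; org.typelevel.jawn.Parser works the same way with any facade):

      import java.nio.ByteBuffer
      import java.nio.charset.StandardCharsets
      import org.typelevel.jawn.ast.{JParser, JValue}

      // Parse straight from bytes instead of building an intermediate String first.
      val bytes = """{"a": 1}""".getBytes(StandardCharsets.UTF_8)
      val parsed: scala.util.Try[JValue] = JParser.parseFromByteBuffer(ByteBuffer.wrap(bytes))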
    Andriy Plokhotnyuk
    @plokhotnyuk
    The PR tests both ways... and it shows that the String option is actually faster on JDK 11.
    Ross A. Baker
    @rossabaker
    I don’t know how many of these benchmarks you had reviewed by people who are experts in the respective projects, but some of the circe usage is a bit dubious.
    I would be a bit more careful making sure to use the libraries correctly before coming into their channels and taking a shit on their work.
    Andriy Plokhotnyuk
    @plokhotnyuk
    @rossabaker feel free to open a PR that makes those numbers better... and ask clarifying questions to dispel any doubts
    Andriy Plokhotnyuk
    @plokhotnyuk
    Also, with my PR you can reproduce the DoS/DoW vulnerability of JawnFacade and evaluate a fix that uses java.util.LinkedHashMap instead of the scala.collection.mutable.HashMap that was shamelessly copied from Circe. To reproduce, please clone the jsoniter-scala repo, check out the jawn-ast branch, and run the following command: sbt -no-colors 'jsoniter-scala-benchmark/jmh:run -i 1 -wi 1 -p size=1,10,100,1000,10000,100000 ExtractFieldsReading.jawn'
    And you should get results like this:
    [info] REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
    [info] why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
    [info] experiments, perform baseline and negative tests that provide experimental control, make sure
    [info] the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
    [info] Do not assume the numbers tell you what you want them to tell.
    [info] Benchmark                                  (size)   Mode  Cnt        Score   Error  Units
    [info] ExtractFieldsReading.jawnByteBufferParser       1  thrpt       1811139.569          ops/s
    [info] ExtractFieldsReading.jawnByteBufferParser      10  thrpt        260677.346          ops/s
    [info] ExtractFieldsReading.jawnByteBufferParser     100  thrpt         19623.857          ops/s
    [info] ExtractFieldsReading.jawnByteBufferParser    1000  thrpt           301.378          ops/s
    [info] ExtractFieldsReading.jawnByteBufferParser   10000  thrpt             1.608          ops/s
    [info] ExtractFieldsReading.jawnByteBufferParser  100000  thrpt             0.006          ops/s
    [info] ExtractFieldsReading.jawnJsoniterScala          1  thrpt       2865937.664          ops/s
    [info] ExtractFieldsReading.jawnJsoniterScala         10  thrpt        330815.496          ops/s
    [info] ExtractFieldsReading.jawnJsoniterScala        100  thrpt         26956.753          ops/s
    [info] ExtractFieldsReading.jawnJsoniterScala       1000  thrpt          2279.925          ops/s
    [info] ExtractFieldsReading.jawnJsoniterScala      10000  thrpt           203.146          ops/s
    [info] ExtractFieldsReading.jawnJsoniterScala     100000  thrpt            16.296          ops/s
    [info] ExtractFieldsReading.jawnStringParser           1  thrpt       2266465.614          ops/s
    [info] ExtractFieldsReading.jawnStringParser          10  thrpt        358482.177          ops/s
    [info] ExtractFieldsReading.jawnStringParser         100  thrpt         24793.306          ops/s
    [info] ExtractFieldsReading.jawnStringParser        1000  thrpt           352.264          ops/s
    [info] ExtractFieldsReading.jawnStringParser       10000  thrpt             1.630          ops/s
    [info] ExtractFieldsReading.jawnStringParser      100000  thrpt             0.006          ops/s
    So a 1 MB request is able to burn a 4 GHz CPU core for about 3 minutes... I hope this (and any other non-direct usage of Scala's HashMap/HashSet) will be fixed before the 1.0.0 release.
    Andriy Plokhotnyuk
    @plokhotnyuk
    Parsing JSON is a minefield... Most AST-based parsers are vulnerable to attacks that exploit the parser's use of recursion: https://github.com/lovasoa/bad_json_parsers
    Andriy Plokhotnyuk
    @plokhotnyuk
    BTW, the Jawn API forces users to introduce yet more security vulnerabilities, like circe/circe#1040
    Srepfler Srdan
    @schrepfler
    Just curious, how do other parsers address this kind of DoS?
    Would adding configurable limits, for example capping big integers at a certain number of digits, be a viable solution?
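
    That kind of cap is straightforward to sketch. This is not jawn's API, just the idea from the question: refuse to materialize a numeric literal past a configured digit limit, so a hostile document can't force an arbitrarily expensive BigInt/BigDecimal conversion. boundedBigInt is a hypothetical guard, and it assumes the literal has already been validated as a number:

      def boundedBigInt(digits: String, maxDigits: Int = 128): Either[String, BigInt] =
        if (digits.length > maxDigits)
          Left(s"numeric literal has ${digits.length} digits, limit is $maxDigits")
        else
          Right(BigInt(digits))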
    Travis Brown
    @travisbrown
    I've just been trying to get back to 1.0.0 preparation and I'm wondering what people think about removing the RawX layer?
    It was introduced in typelevel/jawn#102 to maintain the Facade interface, but in my view it doesn't serve any real purpose, it's badly named, and if we're about to commit ourselves to a long-term 1.0.0 now is the time to get rid of it.
    Travis Brown
    @travisbrown
    Also, if anyone has any objection to Scalafmt-ing the Jawn repo, please let us know asap: typelevel/jawn#210
    Ross A. Baker
    @rossabaker
    I'm not aware of Raw ever seeing use. The author went his own way, and I don't recall seeing it anywhere else.
    Travis Brown
    @travisbrown
    I’ll open a PR in the morning. Maybe we can get the scalafmt one merged before then?
    Ross A. Baker
    @rossabaker
    I gave it a second blessing. Just need to rerun it since the other work was merged.
    Travis Brown
    @travisbrown
    Okay, here's the PR: typelevel/jawn#219
    Travis Brown
    @travisbrown
    This week is our last chance to get changes into 1.0.0: https://github.com/typelevel/jawn/issues/193#issuecomment-573684699