These are chat archives for mdedetrich/scala-json-ast

1st
Mar 2016
Ichoran
@Ichoran
Mar 01 2016 04:30
It's too bad you can't set moods on gitter. So you could set "experimental mood" and everything would be shaded, I don't know, rose, until you turned it off.
Matthew de Detrich
@mdedetrich
Mar 01 2016 04:30
lol
Ichoran
@Ichoran
Mar 01 2016 04:41
Anyway, I'll pick this up tomorrow.
Matthew de Detrich
@mdedetrich
Mar 01 2016 04:41
@Ichoran Thanks
I am adding a benchmark suite for the stuff that needs highest priority testing
Ichoran
@Ichoran
Mar 01 2016 04:41
But I really don't think I'm going to say much about offheap because I am too skeptical that it should go in now, and really want a common AST in.
Matthew de Detrich
@mdedetrich
Mar 01 2016 04:41
Which is the .toSafe and .toFast methods
Yeah I am talking about master, not offheap now
Ichoran
@Ichoran
Mar 01 2016 04:42
I am interested in .toSafe and .toFast performance in master.
Matthew de Detrich
@mdedetrich
Mar 01 2016 04:42
I had my fun for now so I am not going to work on it anymore
Precisely
Adding the benchmark suite into master now
Ichoran
@Ichoran
Mar 01 2016 04:42
And in Travis' concerns.
Matthew de Detrich
@mdedetrich
Mar 01 2016 04:43
Honestly, I just wanted to see if offheap had some really easy wins for little time spent, thats obviously not the case
Matt Farmer
@farmdawgnation
Mar 01 2016 04:46
Hi there.
Just got the notification about this room.
How goes it @mdedetrich ?
Matthew de Detrich
@mdedetrich
Mar 01 2016 04:46
Good!
Finally getting a bit of progress on the SLIP
Matt Farmer
@farmdawgnation
Mar 01 2016 04:47
!
Nice
Matthew de Detrich
@mdedetrich
Mar 01 2016 04:47
I should rename the description so its more clear
@Ichoran All pushed, eat your heart out!
@farmdawgnation If you have any questions, feel free to ask them
Matt Farmer
@farmdawgnation
Mar 01 2016 04:49
Yeah, I guess I'm kind of interested in what's happened with the project recently.
Life got kind of busy for me.
haha
So, what have I missed?
Matthew de Detrich
@mdedetrich
Mar 01 2016 04:50
Same, what happend is that the person in charge of SLIP, Dick Wall, stepped down from his position because the effort was too much
So the SLIP progress stalled. I asked @SethTisue, and he said to repackage the current proposal in the scala.json namespace to submit it
Which I am hoping to do by next week
Matt Farmer
@farmdawgnation
Mar 01 2016 04:51
Okay, cool, so you're targeting this for inclusion in the language proper instead of a common library at this point?
Matthew de Detrich
@mdedetrich
Mar 01 2016 04:51
So the idea is its going to be a Scala module (it won’t be part of stdlib, there is actually effort to modularise and move stuff outside of stdlib)
Matt Farmer
@farmdawgnation
Mar 01 2016 04:52
Ah, right, they've done that with xml, parser-combinators, etc
Starting in 2.11
Matthew de Detrich
@mdedetrich
Mar 01 2016 04:52
Yes, its still going to be a scala module, just like with xml and parser combinators
Exactly
Matt Farmer
@farmdawgnation
Mar 01 2016 04:52
How does that impact the release cycle flexibility?
Are you constrained by Scala's release roadmap by doing that?
Matthew de Detrich
@mdedetrich
Mar 01 2016 04:52
There was a concensus to stop adding so much stuff into the stdlib as it was getting too big
Matt Farmer
@farmdawgnation
Mar 01 2016 04:53
Aye
Matthew de Detrich
@mdedetrich
Mar 01 2016 04:53
I need clarification on that, but the idea of the AST is that, ideally, we only have one release for every major. There is a very strong emphasis on stability and compatiblity
So obviously it will get pushed/updated when a new major is out (or platform), but barring that, we don’t want to make extra releases.
The AST is trivial in design, so this shouldn’t be too much of an issue
Matt Farmer
@farmdawgnation
Mar 01 2016 04:54
Well the AST itself is trivial. I'm a bit concerned about the APIs that naturally have to be built around it.
Though, I suppose you could consider that outside of scope.
Matthew de Detrich
@mdedetrich
Mar 01 2016 04:55
Yup, hence why this is done in collaboratively. Last time this was worked on, we had a lot of support from the people in this room that they would either directly, or indirectly support the AST in its last design
@travisbrown Came up with an issue that we are investigating now with the usage of BigDecimal
The safe AST basically mirrors the design that play-json uses, and the fast mirrors the design that other AST’s use
So none of that has really changed
Matt Farmer
@farmdawgnation
Mar 01 2016 04:57
Okay
Matthew de Detrich
@mdedetrich
Mar 01 2016 04:57
As far as I am aware, the only thing that is causing someone to pull their hair out is JNumber being backed by a BigDecimal, due to stuff like BigDecimal("1e2147483647").toBigInt
Apart from that, its just adding tests for Javascript, and a benchmarking suite which is almost done
There are also spec tests to cover the AST
Matt Farmer
@farmdawgnation
Mar 01 2016 04:59
Yeah in Lift we split Number into JInt and JDouble - which still isn't fool proof, but causes surprisingly few issues.
So if I were to write a converting API between this AST and Lift's, which AST should I target? Safe or Fast?
Matthew de Detrich
@mdedetrich
Mar 01 2016 05:00
The real problem is that the JSON spec specifies that a number can be of any size
Matt Farmer
@farmdawgnation
Mar 01 2016 05:01
Yeah. :unamused:
Matthew de Detrich
@mdedetrich
Mar 01 2016 05:01
There is code that converts from safe to fast and vice versa. Ideally you would implement fast as part of your parser, and then provide an interface for users for both safe and fast
You can implement your querying on either AST, which ever one makes sense to do so.
Matt Farmer
@farmdawgnation
Mar 01 2016 05:02
There's no way we'd replace our AST before Lift 4, I think.
That's a pretty major change.
But I could be wrong.
Matthew de Detrich
@mdedetrich
Mar 01 2016 05:02
Generally what I see happening ideally is that frameworks, if possible, will parse the JSON initially into fast, and allow users to use fast, or safe. Quering can be implemented on either
@farmdawgnation Yeah thats fine, its totally expected that initially, frameworks may only be able to provide an interopt module. Parsing into fast initially is mainly a performance thing
Matt Farmer
@farmdawgnation
Mar 01 2016 05:03
Right, and that's probably a long-term possibility.
Matthew de Detrich
@mdedetrich
Mar 01 2016 05:03
fast is basically designed for performance/parsing, plus some other use cases
nafg
@nafg
Mar 01 2016 05:04
@mdedetrich do you have benchmarks that it's faster?
Matt Farmer
@farmdawgnation
Mar 01 2016 05:04
So, the hard bit here is we're going to have to evoke a lot of implicit conversions to maintain the same API richness that our AST has enjoyed for awhile.
Well, not really "hard"
nafg
@nafg
Mar 01 2016 05:04
@farmdawgnation implicit class ... extends AnyVal you mean?
Matthew de Detrich
@mdedetrich
Mar 01 2016 05:05
@nafg I would have to implement a parser to prove that “generally”. There was a benchmark suite in the json4s-ast repository which showed that fast is much faster on constructing the various AST types
Which of couse makes sense, constructing an Array takes much less memory/faster, than a Vector
Matt Farmer
@farmdawgnation
Mar 01 2016 05:05
@nafg Nah, I mean it looks like this AST won't support querying implementation inside the AST itself. So if we adopted this AST as our primary AST we'd have JObject from this AST and then a Lift JObject that implements \\
Matthew de Detrich
@mdedetrich
Mar 01 2016 05:05
same goes for Map
and String vs BigDecimal
Matt Farmer
@farmdawgnation
Mar 01 2016 05:06
We'd have to promote AST instances to things that we can query over.
nafg
@nafg
Mar 01 2016 05:06
@farmdawgnation precisely, you could enrich \\ on with implicit class
Matthew de Detrich
@mdedetrich
Mar 01 2016 05:06
@farmdawgnation Yeah, the idea is you use implicit classes/implicit conversions to add querying capabilities ontop
nafg
@nafg
Mar 01 2016 05:06
implicit class = good, implicit conversion = bad ;)
Matt Farmer
@farmdawgnation
Mar 01 2016 05:07
Well, they both have a cost associated with them.
Matthew de Detrich
@mdedetrich
Mar 01 2016 05:07
implicit conversions still have their uses in certain situations. Its usually a bad idea to overuse them though
nafg
@nafg
Mar 01 2016 05:07
not the same
Matthew de Detrich
@mdedetrich
Mar 01 2016 05:07
@farmdawgnation Performance, or you are talking about something else?
nafg
@nafg
Mar 01 2016 05:07
@farmdawgnation namely?
Matt Farmer
@farmdawgnation
Mar 01 2016 05:07
As I recall (don't have the link in front of me) a large implicit scope is frequently one of the common causes behind high compile times.
nafg
@nafg
Mar 01 2016 05:08
true, implicit search affects compile time.
not sure if this would be significant though
Matthew de Detrich
@mdedetrich
Mar 01 2016 05:08
I think improvements have been made in that area, but yes that is true. I suppose this is one of the costs of having really good abstractions that Scala has
nafg
@nafg
Mar 01 2016 05:09
The real slowdowns is when it has to do type computations
Matt Farmer
@farmdawgnation
Mar 01 2016 05:09
Yeah, I'm more asking the question than really saying it's an issue.
nafg
@nafg
Mar 01 2016 05:09
like finding the right typeclass instance, based on 3 type arguments ;)
Matt Farmer
@farmdawgnation
Mar 01 2016 05:09
I think "out loud" in chats
nafg
@nafg
Mar 01 2016 05:09
but that's really distinct from implicit search
me too ;)
and otherwise ;)
Matt Farmer
@farmdawgnation
Mar 01 2016 05:10
Alright, so @mdedetrich I think it makes sense for me to build out an initial interop module
Matthew de Detrich
@mdedetrich
Mar 01 2016 05:10
Cool, I actually have to get back to work now, @farmdawgnation if you have any issues, feel free to submit an issue and I will look into it.
Matt Farmer
@farmdawgnation
Mar 01 2016 05:11
Yeah, will do. I'll start building things out on my end and let you know if I find something.
Matthew de Detrich
@mdedetrich
Mar 01 2016 05:11
Yup, but I would wait until it gets approved and released as a module before doing that. You can make a prototype implementation like Play did, but I wouldn’t spend too much time on it right now
Matt Farmer
@farmdawgnation
Mar 01 2016 05:11
Before it's approved is the best time to work on it. Still time for things to change :grin:
nafg
@nafg
Mar 01 2016 05:12
exactly
Matthew de Detrich
@mdedetrich
Mar 01 2016 05:12
Yup, I just don’t want you to go around changing/breaking too much stuff if its going to change again. Probably a MVP of what you need to be comfortable
Matt Farmer
@farmdawgnation
Mar 01 2016 05:12
Yeah, I'm just going to write some conversions and some scalacheck tests
See if I can blow something up.
Matthew de Detrich
@mdedetrich
Mar 01 2016 05:12
:thumbsup:
Matt Farmer
@farmdawgnation
Mar 01 2016 05:13
Is there a proper name for this AST btw?
I've just been calling it the Common JSON AST
Matthew de Detrich
@mdedetrich
Mar 01 2016 05:15
Its probably going to be called scala-json-ast (its the JSON AST which will be part of scala-json)
just like you have scala-xml
eugene yokota
@eed3si9n
Mar 01 2016 06:59
the readme says safe is "Strictly pure. Library has no side effects/throwing errors" if A -> String is not referentially transparent, I don't think it's pure.
you can still keep Map and have purity if you hold on to the field ordering with a Vector like argonaut - https://github.com/argonaut-io/argonaut/blob/v6.0.4/src/main/scala/argonaut/JsonObject.scala#L103-L106
this would satisfy the constant lookup camp and insetion order camp with some penalty of memory
Matthew de Detrich
@mdedetrich
Mar 01 2016 11:25
@eed3si9n Yeah I need to revise the documentation properly. Its meant to say that any representation of safe.JValue is going to be valid, not that any possible JSON string can be passed in an idempotent correct way
Thats not really possible, if your JSON key strings, than any Map representation will cull the duplicates (this is also mentioned)
Matthew de Detrich
@mdedetrich
Mar 01 2016 12:00
@eed3si9n I have just updated the docs to remove this
Bryce Anderson
@bryce-anderson
Mar 01 2016 15:21
TLDR: I think the fast AST is a mistake in the context of a common AST for the scala ecosystem.
I have little to do with this conversation and speak only for myself based on my limited experience. If this is intended for a SLIP submission, I fail to understand how the goal of a common AST is best served by two AST's.
I understand that some projects need maximum throughput but I think those cases will have extremely tight requirements and have a low probability of being served by anything but a custom AST anyway.
nafg
@nafg
Mar 01 2016 15:41
I want to see the benchmarks first
Ichoran
@Ichoran
Mar 01 2016 17:19
@mdedetrich - I think it is not okay for fast to let an innocent user produce not even JSON. Bare NaN and Infinity have no business being in JSON. fast shouldn't mean wrong, just "maybe we don't do everything that is recommended or helpful".
Ichoran
@Ichoran
Mar 01 2016 17:29
I also don't think it helps anything to have to rewrap null, true/false, and string. JSON often has a lot of strings. That's a lot of rewrapping. You unfortunately need to put everything into the same file but otherwise you can have this kind of structure:
package test

package common {
  final class Quux extends one.Bar with two.Bar
}

package one {
  sealed trait Bar
  final class Baz extends Bar
}
package object one {
  type Quux = common.Quux
}

package two {
  sealed trait Bar
  final class Baz extends Bar
}
package object two {
  type Quux = common.Quux
}
Ichoran
@Ichoran
Mar 01 2016 17:49
Safe should deal with Float the same way as Double. Right now you can pass Float.NaN and get a mess.
Ichoran
@Ichoran
Mar 01 2016 17:56
safe's toFast doesn't seem to have a consistent type for the array passed to fast.JArray. I know you can't do much with an empty array, but since the type isn't erased, maybe it's better to get them all the same.
Er, to JObject, I mean. JValue vs JField. Also, no reason to create empty arrays more than once: create one and put them in the companion object as a val.
There isn't any reason I can think of why safe should insist on a BigDecimal. It needs to have it as an option, but Double is immensely faster.
eugene yokota
@eed3si9n
Mar 01 2016 18:23

if your JSON key strings, than any Map representation will cull the duplicates (this is also mentioned)

so in other words, safe violates RFC 4627?

at least for sbt's usage, I have no opinion about name/value duplicates
Ichoran
@Ichoran
Mar 01 2016 18:27
RFC7159 says, "The names within an object SHOULD be unique."
So it is "safe" in that ensures what the JSON SHOULD be (not what it is).
I think the "safe" name does not describe well what it does.
eugene yokota
@eed3si9n
Mar 01 2016 18:28
right. fast should be renamed to correct; safe should be renamed to should? :)
for benchmarking maybe look into what circe is doing? - https://github.com/travisbrown/circe
circe (being a fork of argonaut) also uses a combo of Map and Vector in its JObject
Matthew de Detrich
@mdedetrich
Mar 01 2016 22:38
@bryce-anderson fast also saves an issue @eed3si9n needs to be saved which can’t be served by safe, which is the abiility to have a representation of JSON that preserves ordering JObject, as well as any duplicate keys. SBT needs to be able to do String -> JSON -> String, and make sure the JSON contents are the same (for equality reasons). Its not possible to do that if you use a Map or even an OrderedMap
@Ichoran Yeah regarding the spec, it does indeed say that keys SHOULD be unique. Unfortunately I think its kinda a weasel word, and we need to make a decision one way or another. Since Scala Map, which is by far the best data structure to represent JObject (at least for users), it will cull any duplicate keys

@mdedetrich - I think it is not okay for fast to let an innocent user produce not even JSON. Bare NaN and Infinity have no business being in JSON. fast shouldn't mean wrong, just "maybe we don't do everything that is recommended or helpful”.

Wellfast is intended to be used by parsers, and there is a use case for letting (even incorrect) JSON to be parsed, if at some later point it errors out for performance reasons. Fast doesn’t have an guarantee for correctness in the sense that the JSON may be completely correct (for reasons of NaN for example), or if some invalid JSON causes the serialisation of the JValue to blow up at runtime

Like the anology that ws use before with Byte Buffers, you can easily shoot yourself in the foot in multiple ways, thats what you pay for performance
eugene yokota
@eed3si9n
Mar 01 2016 22:43
"there may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course." - https://tools.ietf.org/html/rfc2119 on SHOULD
Matthew de Detrich
@mdedetrich
Mar 01 2016 22:44

for benchmarking maybe look into what circe is doing? - https://github.com/travisbrown/circe
circe (being a fork of argonaut) also uses a combo of Map and Vector in its JObject

Definitely investigated this, we found doing this (at the time) was too hard for that custom data structure to also be represented as a proper Scala collection

eugene yokota
@eed3si9n
Mar 01 2016 22:44
SHOULD is a non-normative recommendation
Matthew de Detrich
@mdedetrich
Mar 01 2016 22:45

"there may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course." - https://tools.ietf.org/html/rfc2119 on SHOULD

Yeah, but the definition isn’t really helpful in deciding what you want to do. It basically leaves it to the implementor, where the implementer provides a warning

In any case, fast will retain duplicate keys/ordering, so that use case is covered, its one of the reasons for fast existing
eugene yokota
@eed3si9n
Mar 01 2016 22:46
i think it means { "a": 1, "a": 2 } is a valid JSON string, and safe not being able to parse it is a violation of the spec
Matthew de Detrich
@mdedetrich
Mar 01 2016 22:46
I also don’t see how you can use a hybrid Vector/Map data structure to deal with duplicate keys, Map just simply won’t work, and you suddenly have a Map(key) -> List[T]?
eugene yokota
@eed3si9n
Mar 01 2016 22:47
that's correct
so fast is the only RFC-adhering AST
Matthew de Detrich
@mdedetrich
Mar 01 2016 22:48
The impression I get, is that its “valid” but you shoul probably never do this, and its really stupid
and if you are going to do it, there needs to be a good reason
eugene yokota
@eed3si9n
Mar 01 2016 22:48
sure
it flies in the face of the claim that Map is somehow more spec-ahering though
Matthew de Detrich
@mdedetrich
Mar 01 2016 22:49
The situation in reality is that you should never rely on the ability of duplicate keys. I don’t have any issue in renaming safe if it better conveys this. I picked the word safe (rather than correct) to signify this difference
eugene yokota
@eed3si9n
Mar 01 2016 22:49
as noted above, I am ok with making the compromise
Matthew de Detrich
@mdedetrich
Mar 01 2016 22:49
However the docs probably need to be reworded better
Safe to me conotates thats its what is considered to be familiar and reasonable, and a good design for people (end users) to use
it may not be completely correct, which would be, as you say, fast

There isn't any reason I can think of why safe should insist on a BigDecimal. It needs to have it as an option, but Double is immensely faster.

Yeah, I don’t know a nice way to solve this. Definitely open for solutions

eugene yokota
@eed3si9n
Mar 01 2016 22:51
safe is too broad term for me to give me any idea
catching problem at compile-time I think would be a good goal
not losing data captured in JSON string also would be good goal
not sure either would be called safe
Matthew de Detrich
@mdedetrich
Mar 01 2016 22:53

catching problem at compile-time I think would be a good goal

You mean like throwing an Error or Either if its not correct?

eugene yokota
@eed3si9n
Mar 01 2016 22:53
JNumber situation sort of is conflict of two valid goals
Matthew de Detrich
@mdedetrich
Mar 01 2016 22:54
The idea behind safe being called safe, is that if me, as a user, has a JValue, we know (to a certain degree) that its both valid JSON (and what SHOULD be valid JSON, according to the situation), and if we do something like serialize it, it won’t blow up at runtime. It also means if we query it, it wont blow up at runtime
eugene yokota
@eed3si9n
Mar 01 2016 22:54
I think the right thing to do is stuff the number inside the String
but that could also mean we delay the error till later
Matthew de Detrich
@mdedetrich
Mar 01 2016 22:54
We can defininitely do error checking immediately and still store it in a String
There is a regex which will check for proper JNumber, need to hunt it down
@Ichoran regarding NaN/Infinity going to JNull, its what Spray does. Also I believe that its what a lot of other JSON implementations do when they encounter NaN/Infinity
@eed3si9n A better way to think of it, is that safe is kind of meant to represent a String. Its a valid immutable sequence of characters, but we are not going to gaurantee that constructing a String will always be error free. We know that when we work with a String, it will be "proper"
eugene yokota
@eed3si9n
Mar 01 2016 22:58
can fast contain JValue that's invalid?
Matthew de Detrich
@mdedetrich
Mar 01 2016 22:58
It can yes. If there is a NaN, it will store “NaN” as a string
Just like when working with byte buffers, you can easily store stuff that is not valid at all
Not intentionally, mainly for performance reasons
Of course there is nothing stopping a parser from doing the check before constructing fast

So it may be easier to think of

  1. safe -> String
  2. fast -> Array[Byte]

As a better analogy

I definitely think I need to improve the docs though to communicate this better, but its the best compromise I could come up with
I am still very open to storing the number as a String in fast that blows up when constructing the JValue if you put in an invalid number
The situation we have with BigDecimal is a bit unfortunate, Scala doesn’t have a proper unbounded (ideally even lazy) real number type in stdlib
eugene yokota
@eed3si9n
Mar 01 2016 23:03
you can't really predict what a "typical" application needs in term of the number
Matthew de Detrich
@mdedetrich
Mar 01 2016 23:03
I used BigDecimal because it was the only option available for using some sought of number thats in stdlib, not because it may have been the best choice
Well play has been using BigDecimal, and @jrudolph said in the thread that it has caused zero problems for their users
At least that he is aware of
eugene yokota
@eed3si9n
Mar 01 2016 23:05
I guess the only question would be parsing perf
Matthew de Detrich
@mdedetrich
Mar 01 2016 23:06
Personally I don’t see a situatution where people would store absolutely ginourmous numbers in JNumber as being typical. The bigger problem that @travisbrown picked up on is that it was possible to “freeze” when using one of the to methods because of user input creating a huge computation
String with regex would be faster than a BigDecimal, and should also make better performance for using .to for stuff like BigInt
however I think it would be slower for the typical use case, which is BigDecimal
The other really unforunate thing is there isn’t a parameterised Number type that you can use in Scala
eugene yokota
@eed3si9n
Mar 01 2016 23:08
if we're pushing for the standard thing, maybe we should pick one option and list out all the pros and cons and let the other library pick up the other use cases
Matthew de Detrich
@mdedetrich
Mar 01 2016 23:08
Yeah precisely, thats what I am trying to do
eugene yokota
@eed3si9n
Mar 01 2016 23:09
for example state "we don't care about parsing perf."
Matthew de Detrich
@mdedetrich
Mar 01 2016 23:09
Another solution is to use BigDecimal, which falls over to BigInt if the number is too large
Thats what Circe does, I am also open to that
eugene yokota
@eed3si9n
Mar 01 2016 23:10
so that option would be stating "we don't care about memory size" and for example duplicate both BigDecimal and String etc.
Matthew de Detrich
@mdedetrich
Mar 01 2016 23:10
There shouldnt be “too” much of a speed hit for reading the BigDecimal in that case for typical use cases
yeah
I think I am liking String option more
With a regex to verify at run time. If compile time regex exists, it could be even faster
eugene yokota
@eed3si9n
Mar 01 2016 23:11
currently it's hard to have rational discussion because we can argue over each other using different axes
Matthew de Detrich
@mdedetrich
Mar 01 2016 23:11
Yup
I think the important thing to remember is that this AST is for general use, and not for I want to deal with 5 TB of JSON data, or I want to deal with numbers which are e^34234234259791
eugene yokota
@eed3si9n
Mar 01 2016 23:13
yea. makes sense
if you have 5TB of data, you would split it using \n or something anyway
Matthew de Detrich
@mdedetrich
Mar 01 2016 23:14
So for general use, I think BigDecimal is “okay”. Double is also “okay” but for different reasons (Double will have much better performance in typical situation, but also only be able to store smaller numbers in size)
I want to see what @IChoran comes up with for the Double/BigDecimal problem
eugene yokota
@eed3si9n
Mar 01 2016 23:15
Double gives me cringes when I know that the number could just be a Long, but it could just be my knee-jerk reaction
Matthew de Detrich
@mdedetrich
Mar 01 2016 23:16
Unfortunately, JSON (and Javascript) just clump all numbers into one “Number” type
So that kind of an issue is problematic in Javascript land as well
I would have loved JSON to at least specify how large you want a number to be (and if its any larger, use a String and document it), that would have solved a lot of problems
but it is what it is
eugene yokota
@eed3si9n
Mar 01 2016 23:17
sure. we can again state it up front saying here's what this AST is going to support
Double is yicky because it can't represent 0.1 correctly
I'd much rather go for BigDecimal
Matthew de Detrich
@mdedetrich
Mar 01 2016 23:18
Well it can’t repersent a lot of things correctly, hence why its a Double
its meant for speed, not correctness
eugene yokota
@eed3si9n
Mar 01 2016 23:19
so the question is if we care about parsing speed or not
Matthew de Detrich
@mdedetrich
Mar 01 2016 23:20
Yup, fast is meant to take into account the speed concerns, and if you want speed with a bit of safety, you would use a parser that does that with fast. That kind of flexibility I designed intentionally
There is nothing stopping other JSON libraries/parsers putting their own validation ontop of fast, but fast is also meant to take into account the user case where you just quickly parse a String into a fast.JValue and then pass it around somewhere else (apparently this is really common in Spray land)
eugene yokota
@eed3si9n
Mar 01 2016 23:22
let's just decide that optimizing for parsing is not the goal of scala.json
Travis Brown
@travisbrown
Mar 01 2016 23:23
How does Double help with parsing speed? Seems like it's more of a concern for memory use and decoding speed.
Matthew de Detrich
@mdedetrich
Mar 01 2016 23:23
Sure, but fast will also satisfy the use case of accepting what the maximum bound of valid JSON is, so it doesn’t really change in that regard
@travisbrown Ask @Ichoran , he had to develop internal parsers for really high performance and there are some tricks that makes Double faster than anything else
@eed3si9n I think the point is, we can’t really clump the two AST’s together will fullfilling both requirements
At least if you want to use datastructures like Map, which by far is the best way to work with what should be valid JObject, same deal with Vector for JArray
Ichoran
@Ichoran
Mar 01 2016 23:26
@travisbrown - How long does it take to parse a number to BigDecimal? How long to Double?
Matthew de Detrich
@mdedetrich
Mar 01 2016 23:27
There are different use cases, and they are each as common as each other and in certain ways they are very distinct
eugene yokota
@eed3si9n
Mar 01 2016 23:27
@mdedetrich if we can agree that parsing perf is not a determining factor, then we can drop the fast thing and move on to the next axis
Travis Brown
@travisbrown
Mar 01 2016 23:27
@Ichoran Jawn for example does neither—it gives you a string that it's decided is a number.
eugene yokota
@eed3si9n
Mar 01 2016 23:27
which is referential transparency
Matthew de Detrich
@mdedetrich
Mar 01 2016 23:28
@eed3si9n For example, if you are going to be retrieving JSON from something like Postgres JSON type and you want to just send that back to the client, fast is going to be better than safe for that, especially since that you know the JSON is valid since its coming from Postgres
Ichoran
@Ichoran
Mar 01 2016 23:28
@travisbrown - That's better, in that it only slows you down by the time it takes to create the String and do an extra scan of the contents.
Matthew de Detrich
@mdedetrich
Mar 01 2016 23:28
However if you are getting JSON from a POST request, you want to be using safe, especially since you will be querying and validating it
@travisbrown Its not really faster, and actually its slower if you know what datatype you want
Jawn uses lazy to defer to the cost of converting numbers till later
But using lazy itself is a performance hit
eugene yokota
@eed3si9n
Mar 01 2016 23:30
if you have to sell the AST to a wide range of Scala community, I think having two set is confusing.
Ichoran
@Ichoran
Mar 01 2016 23:30
I agree. One set would be considerably better.
Matthew de Detrich
@mdedetrich
Mar 01 2016 23:30
@eed3si9n Sure, but I don’t think its actually possible to provide a single reasonable AST. We went through this before, and it didn’t work
eugene yokota
@eed3si9n
Mar 01 2016 23:30
the only reason fast exist is because we tried to be like Jawn
Matthew de Detrich
@mdedetrich
Mar 01 2016 23:30
It basically made no one happy. Either we couldn’t accept all JSON which is considered valid, or the datastructures used where so bad (problem Scalatra has now) that its not ideal for quering
Ichoran
@Ichoran
Mar 01 2016 23:31
You need different structures for modification than for fast I/O.
Matthew de Detrich
@mdedetrich
Mar 01 2016 23:31

the only reason fast exist is because we tried to be like Jawn

Thats not true

eugene yokota
@eed3si9n
Mar 01 2016 23:31
as in fast parsing using Array and String
Matthew de Detrich
@mdedetrich
Mar 01 2016 23:31
Its also to handle what is all valid JSON, and to deal with the issues for JObject. It just so happens that such a design also lends well to being fast
Having to deal with the idiosyncrasis in JObject (and arguably JNumber as well) lends itself well to being “fast” as well
Its kind of like killing two birds with one stone

But as far as I know,

Sure, but I don’t think its actually possible to provide a single reasonable AST. We went through this before, and it didn’t work

Is the case, and there isn’t any way around that

When people work with a JSON for quering, for example, they expect JObject to be “some sought of map that the native language uses”. Thats a Map in Scala. Using Map means you can’t represent all valid JSON
eugene yokota
@eed3si9n
Mar 01 2016 23:34
even if we know for whatever reason that an incoming JSON string is "valid" if we don't care about parsing perf, we can just re-validate and stick it inside scala.json
Travis Brown
@travisbrown
Mar 01 2016 23:34
I think I'm just unclear on the intended role of this library—if it's primarily an AST designed for interop and will often (usually?) be used in conjunction with a real decoding library, then JNumber(value: String) should be fine, but it doesn't seem like that's what it is.
Matthew de Detrich
@mdedetrich
Mar 01 2016 23:35
@travisbrown JNumber string with a runtime check to see if its a valid number was something that was being explored
So I am definitely open to it
The only real issue is, and its something that @jrudolph meantioned, is that Play uses the BigDecimal representation, and at least for 95% of their uses cases (maybe 100, they never got complaints in this area), BigDecimal was ideal.
Ichoran
@Ichoran
Mar 01 2016 23:39
I don't think Play has very much use for numbers.
(Still trying to catch up on past conversation.)
Regarding having a structure that serializes to not-JSON, like {"foo":Infinity}, I think this is not okay. You just shouldn't be able to produce not-even-JSON from a JSON AST. The whole point is for it to map cleanly onto JSON.
Regarding having an AST that assumes that a parser parsing to it will fail on valid JSON, that is also not okay. The spec says a conforming parser must accept all valid JSON. It can accept weird stuff like {"foo":Infinity} too, but it must accept all valid JSON.
But safe can just silently consume or discard extra values with the same key, so that's not really a problem.
Travis Brown
@travisbrown
Mar 01 2016 23:45
Is printing in scope for this library?
eugene yokota
@eed3si9n
Mar 01 2016 23:45
i was gonna say
we are conflating parsing/pretty printing into AST's data structure
Ichoran
@Ichoran
Mar 01 2016 23:45
Well, it should have a not-insane toString, but generally I would say no.
eugene yokota
@eed3si9n
Mar 01 2016 23:46
why not?
I would say the internal data structure is almost meaningless if we can hide it from the user
Ichoran
@Ichoran
Mar 01 2016 23:46
But if you have something that is a JValue and it contains the text string NaN (without quotes!), something is really wrong. You couldn't have gotten that from JSON.
So you may not be doing any parsing or printing, but you shouldn't fantasize stuff into your AST that you couldn't get from any conforming parse.
Travis Brown
@travisbrown
Mar 01 2016 23:48
@Ichoran agreed.
Ichoran
@Ichoran
Mar 01 2016 23:48
That is what I was complaining about.
A method object JValue { def apply(f: Float) = new JValue(f.toString) } just isn't right.
eugene yokota
@eed3si9n
Mar 01 2016 23:50
the crime above is f.toString, not the fact that JValue holds String
nafg
@nafg
Mar 01 2016 23:50
btw will upickle use this?
Travis Brown
@travisbrown
Mar 01 2016 23:50
Then have multiple JNumber constructors and let the decoder decide whether f.toString needs to be called.
Ichoran
@Ichoran
Mar 01 2016 23:50
@eed3si9n - Exactly.
Matthew de Detrich
@mdedetrich
Mar 01 2016 23:51

Regarding having a structure that serializes to not-JSON, like {"foo":Infinity}, I think this is not okay. You just shouldn't be able to produce not-even-JSON from a JSON AST. The whole point is for it to map cleanly onto JSON.

Sure, this is the use case for safe

Regarding having an AST that assumes that a parser parsing to it will fail on valid JSON, that is also not okay. The spec says a conforming parser must accept all valid JSON. It can accept weird stuff like {"foo":Infinity} too, but it must accept all valid JSON.

Yup, but a couple of other things I want to reiterate here

  1. Most other languages (that I know of), will silently glob, or kill, invalid keys. Even Javascript, with JSON was originally designed to work with, will do this. Ruby does this, so does Python, because they want to represent JObject as whatever the Map/Hash datastructure in their language is, and duplicate keys make no sense
  2. For use cases where you actually do need to deal with duplicate keys, people use their own solutions
Ichoran
@Ichoran
Mar 01 2016 23:52
If f.isNaN you shouldn't actually be able to create a JNumber from it. It's not about decoders deciding. It doesn't actually represent a JSON number.
@mdedetrich - I don't think it's okay to be so "fast" that you're wrong.
Dale Wijnand
@dwijnand
Mar 01 2016 23:53
move fast and break things
Matthew de Detrich
@mdedetrich
Mar 01 2016 23:54
Maybe, the intention was for parsers to handle this how they see fit, and thats mainly because parsers may handle breaking differently. One parser may break straight away on something invalid, another might keep on trying to go and report all of the errors
Travis Brown
@travisbrown
Mar 01 2016 23:54
@Ichoran Your JNumber constructor that takes a Float can do a runtime check to make sure it's not NaN or Infinity (that's what circe does).
Matthew de Detrich
@mdedetrich
Mar 01 2016 23:54
I think Lift does something different in this case (@farmdawgnation can you expand here)
Ichoran
@Ichoran
Mar 01 2016 23:54
Yes, that's what I do also, @travisbrown, but that's not what the current implementation of fast does. That's what I was criticizing.
Travis Brown
@travisbrown
Mar 01 2016 23:55
ah, got it.
Matthew de Detrich
@mdedetrich
Mar 01 2016 23:55
The idea is that if you need something that is both a valid and sane Json AST, thats what safe is for
It also does fill other cases
So its not that I don’t want to neccearily put some cheap error handling in fast, its just I don’t see a way to do it considering the different ways people deal with parsing errors right now in the real world
Ichoran
@Ichoran
Mar 01 2016 23:56
Actually, the AST that I am building for high-performance numeric use does actually admit the NaN case (and +- infinity) in the constructor. BUT it also handles its own serialization, and it prints all those as null. So it is hard to get yourself in trouble that way.
Matthew de Detrich
@mdedetrich
Mar 01 2016 23:56
Throwing exceptions versing using some either type versus grabbing all exceptions at the end versus even ignoring it
Ichoran
@Ichoran
Mar 01 2016 23:57
They can still choose whatever they want. Just don't put a NaN into a JNumber and expect to get a JNumber out.
Matthew de Detrich
@mdedetrich
Mar 01 2016 23:57
I think like @eed3si9n mentioned, there is nothing “wrong” with this, as long as its very clear and documented
Which is something that I obviously need to work on
Ichoran
@Ichoran
Mar 01 2016 23:57
I'm not sure we need something that potentially wrong as a module.
Matthew de Detrich
@mdedetrich
Mar 01 2016 23:58
The other thing I need to make clear, and I think my attitude to this one is similar to what Linus says, is that even though the spec says one thing, if in reality everyone does something else, thats what should be “default"

I'm not sure we need something that potentially wrong as a module.

I wouldn’t put it as “wrong”. Often to be “fast”, you need to be “wrong” (as youself just admitted), one way or another

You are either theoritically wrong but you never show it, or you admit you can be wrong, or you handle the situation
Ichoran
@Ichoran
Mar 01 2016 23:59
Well, I don't think the dichotomy here is about speed at all. It's more to do with immutable vs mutable representations.
eugene yokota
@eed3si9n
Mar 01 2016 23:59
I opened a Github issue to pick some set of goals - mdedetrich/scala-json-ast#2