These are chat archives for nrinaudo/kantan.csv

30th
Jan 2017
Andrew Roberts
@aroberts
Jan 30 2017 17:14
@nrinaudo want a PR and test for that discriminator thing from last week? I’d be more than happy to port it over :)
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 17:59
@aroberts That'd be great! The test might be tricky to write though, I have this whole laws framework set up that is natural to me but not documented at all and probably a mess to objective eyes... can we discuss this later when I'm in front of a proper computer?
Andrew Roberts
@aroberts
Jan 30 2017 18:09
sure thing
Andrew Roberts
@aroberts
Jan 30 2017 19:12
@nrinaudo this one should be simple: my use case is streaming data, rather than a static file. what’s the best way to get a reusable String -> T chain for T: RowDecoder?
Andrew Roberts
@aroberts
Jan 30 2017 19:20
I found stringReaderResource, but it’s not super clear what the proper chain is to go from Resource -> Decoder
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 20:24
mmm... I think you might be mixing type classes here
A Resource is used to manipulate streams (of bytes or characters), a Decoder to turn encoded data into decoded values
@aroberts can you describe your use case in a bit more detail? I'm not clear how you go from streaming data to String -> T
Andrew Roberts
@aroberts
Jan 30 2017 20:27
@nrinaudo sure thing - it looks like the documentation is all oriented towards a use case where the client starts with something more or less like a csv file
be it an input stream, file object, url, whatever
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 20:27
right.
Andrew Roberts
@aroberts
Jan 30 2017 20:27
my use case is a single CSV row
which I get at unpredictable but frequent intervals
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 20:28
In that case, can't you just RowDecoder[YourType].decode(yourRow) ?
Andrew Roberts
@aroberts
Jan 30 2017 20:29
I think that takes Seq[String]
should I just handle the string reading out of band?
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 20:29
sorry, don't know what I was thinking, of course you're right
well, it's not pretty, but you could always yourRow.readCsv[List, YourType]
(provided you've imported kantan.csv.ops._)
that's essentially treating each row as an entire CSV resource
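For illustration only, the idea of treating a single incoming line as a complete one-row CSV input can be sketched without kantan.csv. Everything below (the RowDecoder trait, Reading, decodeLine) is hypothetical and much simpler than kantan.csv's real type classes. In particular, the naive comma split does not handle quoted cells, which a real CSV parser does.

```scala
// Hypothetical, minimal stand-in for a row decoder type class
// (not kantan.csv's actual definition).
trait RowDecoder[A] {
  def decode(row: Seq[String]): Either[String, A]
}

object RowDecoder {
  def apply[A](implicit d: RowDecoder[A]): RowDecoder[A] = d
}

final case class Reading(sensor: String, value: Int)

object Reading {
  implicit val decoder: RowDecoder[Reading] = new RowDecoder[Reading] {
    def decode(row: Seq[String]): Either[String, Reading] = row match {
      case Seq(sensor, value) =>
        scala.util.Try(value.trim.toInt).toOption
          .toRight(s"not an int: $value")
          .map(v => Reading(sensor, v))
      case other => Left(s"expected 2 cells, got ${other.size}")
    }
  }
}

object SingleRow {
  // Treat one incoming line as a complete, one-row CSV input.
  // Naive split: does not handle quoted cells containing commas.
  def decodeLine[A: RowDecoder](line: String): Option[Either[String, A]] =
    if (line.isEmpty) None // nothing to read, like headOption on an empty result
    else Some(RowDecoder[A].decode(line.split(",", -1).toSeq))
}
```

The decoder is resolved once and can be reused for every line that arrives.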
Andrew Roberts
@aroberts
Jan 30 2017 20:31
and then headOption to translate it? that’s what I have now (agreed, not pretty)
yeah- I was a little worried about instantiation overhead with that approach
do you have a gut feeling on that?
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 20:32
Honestly, if there's any overhead, it should be minimal
well, alright, so there's a pretty good solution to your problem, but you won't like it
kantan.csv does have a scalaz-streams module
Andrew Roberts
@aroberts
Jan 30 2017 20:33
hmm
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 20:33
it's kind of deprecated, since scalaz-streams has been superseded by fs2, but it works
Andrew Roberts
@aroberts
Jan 30 2017 20:33
is there anything cats-flavored?
(I don’t currently depend on scalaz, in favor of cats)
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 20:34
not out of the box, but I'm pretty sure fs2 supports either, and it should be fairly simple to write an fs2 module
I've not done it yet because kantan.csv still supports 2.10, but fs2 does not
but now that I think of it, that sounds like a lot of work when what you have is perfectly serviceable
to summarize this whole conversation: now I understand what you mean, and no, kantan.csv does not have first party support for your use case
Andrew Roberts
@aroberts
Jan 30 2017 20:36
I’ll stick with what I have for now, but I might open a ticket for a clearer interface to that - though, you might say, streaming individual CSV lines seems like a bad idea. I’d agree with you - but what can you do?
thanks :)
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 20:38
your use case is actually sorted by treating each row as an entire CSV file. But I've had requests for handling chunks of data, swallowing them up until an entire row was available, then decoding that
that's a much more complicated use case, one for which kantan.csv's iterator-like model is very ill-suited
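The chunked use case mentioned here (swallow chunks until a whole row is available, then decode it) can be sketched roughly as a small buffering class. This is a hypothetical illustration, not part of kantan.csv. It splits on newlines only, so it would break on quoted cells that contain newlines, which a real CSV reader must handle. It also shows why this fits poorly with an iterator-like model: the accumulator has to carry mutable state between calls.

```scala
// Hypothetical sketch: buffer incoming text chunks and release complete lines.
// Caveat: CSV allows newlines inside quoted cells; this naive version does not.
final class LineAccumulator {
  private val buffer = new StringBuilder

  // Appends a chunk and returns every complete line now available;
  // any trailing partial line stays in the buffer for the next call.
  def feed(chunk: String): List[String] = {
    buffer.append(chunk)
    val text = buffer.toString
    val lastNewline = text.lastIndexOf('\n')
    if (lastNewline < 0) Nil
    else {
      buffer.clear()
      buffer.append(text.substring(lastNewline + 1))
      text.substring(0, lastNewline).split("\n", -1).toList
    }
  }
}
```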
if you want to talk about that PR - how would you feel about writing both the decoder and the encoder?
that way, the tests pretty much write themselves, provided you can provide an Arbitrary instance for a test ADT
Andrew Roberts
@aroberts
Jan 30 2017 20:45
I was thinking about that- IS there an encoder? I don’t think a discriminator has meaning in CSV encoding, as the decoder is written (i.e. the downstream decoder is allowed to consume the discriminator as well)
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 20:47
so your discriminator is one cell whose content tells you how to decode the rest of the row, right? And the row is decoded into an ADT - for example, you could have cell 0 have value left or right, and if left, the rest of the row is decoded as a Left, if right as a Right, but either way an Either?
Andrew Roberts
@aroberts
Jan 30 2017 20:47
yes, pretty much
but the encoders would just be for Left and Right
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 20:48
but you wouldn't be able to decode something you've encoded that way though
Andrew Roberts
@aroberts
Jan 30 2017 20:48
no?
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 20:49
mmm...
Andrew Roberts
@aroberts
Jan 30 2017 20:49
let’s say the row type is Int :: Either[Int, String]
oh, hmm
because the either is a cell
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 20:49
right.
Andrew Roberts
@aroberts
Jan 30 2017 20:50
it’s the row type that’s analogous to either
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 20:50
let's take a better example, one where it must be a row and cannot be a cell
sealed abstract class Point
case class Point2d(x: Int, y: Int) extends Point
case class Point3d(x: Int, y: Int, z: Int) extends Point
you'd want to have a RowCodec[Point], that is, both a RowEncoder[Point] and a RowDecoder[Point]
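A self-contained sketch of what the decoding half could look like for this Point ADT: cell 0 selects which alternative the remaining cells are decoded as. RowDecoder here is a hypothetical simplified trait, not kantan.csv's type class, and errors are plain strings.

```scala
import scala.util.Try

// Hypothetical simplified row decoder (not kantan.csv's actual API).
trait RowDecoder[A] { def decode(row: Seq[String]): Either[String, A] }

sealed abstract class Point
final case class Point2d(x: Int, y: Int) extends Point
final case class Point3d(x: Int, y: Int, z: Int) extends Point

object PointDecoding {
  // Parses every cell as an Int, failing on the first bad cell.
  private def ints(cells: Seq[String]): Either[String, List[Int]] =
    cells.toList.foldRight[Either[String, List[Int]]](Right(Nil)) { (c, acc) =>
      for {
        tail <- acc
        i    <- Try(c.trim.toInt).toOption.toRight(s"not an int: $c")
      } yield i :: tail
    }

  // Cell 0 is the discriminator; it decides how to read the rest of the row.
  implicit val pointDecoder: RowDecoder[Point] = new RowDecoder[Point] {
    def decode(row: Seq[String]): Either[String, Point] =
      row.headOption match {
        case Some("2d") => ints(row.tail).flatMap {
          case List(x, y) => Right(Point2d(x, y))
          case other      => Left(s"2d expects 2 values, got ${other.size}")
        }
        case Some("3d") => ints(row.tail).flatMap {
          case List(x, y, z) => Right(Point3d(x, y, z))
          case other         => Left(s"3d expects 3 values, got ${other.size}")
        }
        case other => Left(s"unknown discriminator: $other")
      }
  }
}
```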
Andrew Roberts
@aroberts
Jan 30 2017 20:51
ah
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 20:52
in order for that to work, you'd need both the encoder and the decoder to know that cell 0 describes the kind of point you're dealing with
Andrew Roberts
@aroberts
Jan 30 2017 20:52
so the RowEncoder[Point] would come from a discriminator function f[T <: Point](t: T): EncodeResult[Point]
that’s the part I’m stuck on, though
the index
given that each sub-encoder must encode the csv line in entirety, why does the discriminator-derived encoder care what index the discrim field is?
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 20:53
well, it must care, because it must be able to write it
otherwise you can't decode it later
Andrew Roberts
@aroberts
Jan 30 2017 20:54
I see - there’s sort of an implicit dependency there through Codec
I haven’t looked at Codec at all
I’ll look at what it takes to satisfy that
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 20:55
no, sorry, Codec is really just a helper. You should never have to write a Codec yourself
the only reason it exists is to allow you to declare a Decoder and an Encoder in one call
Andrew Roberts
@aroberts
Jan 30 2017 20:55
right- I’m saying that if you make the single call to codec with a discriminator index, then the decoder will care about (and store) that, but the encoder will not
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 20:55
but alright, let's take a concrete example:
2d,1,2
3d,1,2,3
that's the kind of content you'd expect to decode into my earlier Point, right?
Andrew Roberts
@aroberts
Jan 30 2017 20:56
sure, yep
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 20:57
so if your encoder is not aware that it needs to write the discriminator at index 0, it'll just write:
1,2
1,2,3
which you can't decode anymore
there are actual laws to Decoder and Encoder. One of which is, if you have both a Decoder and an Encoder for a given type, then you should be able to encode then decode again, and get the same value. This violates that law
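The round-trip law described here can be checked mechanically. In this hypothetical sketch (plain functions instead of kantan.csv type classes, and Option instead of a proper error type), the decoder expects the discriminator in cell 0. The encoder that writes it satisfies decode(encode(p)) == Some(p); the one that drops it does not.

```scala
import scala.util.Try

sealed abstract class Point
final case class Point2d(x: Int, y: Int) extends Point
final case class Point3d(x: Int, y: Int, z: Int) extends Point

object RoundTrip {
  // Hypothetical decoder: expects the discriminator in cell 0.
  def decode(row: Seq[String]): Option[Point] = row match {
    case Seq("2d", x, y)    => Try(Point2d(x.toInt, y.toInt)).toOption
    case Seq("3d", x, y, z) => Try(Point3d(x.toInt, y.toInt, z.toInt)).toOption
    case _                  => None
  }

  // Encoder that forgets the discriminator: 1,2 and 1,2,3.
  def encodeWithout(p: Point): Seq[String] = p match {
    case Point2d(x, y)    => Seq(x, y).map(_.toString)
    case Point3d(x, y, z) => Seq(x, y, z).map(_.toString)
  }

  // Encoder that writes the discriminator at index 0: 2d,1,2 and 3d,1,2,3.
  def encodeWith(p: Point): Seq[String] = p match {
    case Point2d(x, y)    => Seq("2d", x.toString, y.toString)
    case Point3d(x, y, z) => Seq("3d", x.toString, y.toString, z.toString)
  }

  // The round-trip law: decoding what was encoded gives back the original value.
  def lawHolds(encode: Point => Seq[String])(p: Point): Boolean =
    decode(encode(p)) == Some(p)
}
```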
Andrew Roberts
@aroberts
Jan 30 2017 20:58
are you expecting to have RowEncoder[Point2d] and RowEncoder[Point3d] instances?
I’m expecting those to exist
and I’m expecting those to know the format entirely (including the discriminator type)
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 20:59
ok, let's assume they exist
Andrew Roberts
@aroberts
Jan 30 2017 20:59
this parallels the decoder type
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 20:59
ah, right, now I see where you're coming from
but Decoder[Point2d] doesn't know about the discriminator. Or does it?
also, if you have an Encoder[Point2d] and an Encoder[Point3d], it unfortunately doesn't help you much if you're trying to encode Point
the compiler will look for an Encoder[Point], which does not exist
Andrew Roberts
@aroberts
Jan 30 2017 21:02
right, Encoder[Point] will be made with Encoder.withDiscriminator[T, S <: T](f: PartialFunction[S, Encode[T]])
I’m guessing a little at the types, and Encode is an alias again to deal with the invariance
f looks like
case p: Point2d => p.asCsvRow
...
(I don’t know the encoding syntax but you get the idea)
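A rough sketch of the "each subtype encodes the whole row" design described in these messages: the subtype encoders write the discriminator themselves and the Point encoder only dispatches through a partial function. The RowEncoder trait and this withDiscriminator are hypothetical simplifications (the S <: T parameter and the Encode alias from the message are dropped). Under this design, Point3d(4,5,6) comes out with 3d at index 0 because the Point3d encoder put it there.

```scala
// Hypothetical simplified encoder (not kantan.csv's API).
trait RowEncoder[A] { def encode(a: A): Seq[String] }

sealed abstract class Point
final case class Point2d(x: Int, y: Int) extends Point
final case class Point3d(x: Int, y: Int, z: Int) extends Point

object WholeRow {
  // Subtype encoders own the full row, discriminator included.
  val point2d: RowEncoder[Point2d] = new RowEncoder[Point2d] {
    def encode(p: Point2d): Seq[String] = Seq("2d", p.x.toString, p.y.toString)
  }
  val point3d: RowEncoder[Point3d] = new RowEncoder[Point3d] {
    def encode(p: Point3d): Seq[String] =
      Seq("3d", p.x.toString, p.y.toString, p.z.toString)
  }

  // Hypothetical analogue of the proposed withDiscriminator: the parent
  // encoder only dispatches and never needs to know the discriminator's index.
  def withDiscriminator[T](f: PartialFunction[T, Seq[String]]): RowEncoder[T] =
    new RowEncoder[T] {
      def encode(t: T): Seq[String] =
        f.applyOrElse(t, (x: T) => throw new IllegalArgumentException(s"no encoder for $x"))
    }

  val point: RowEncoder[Point] = withDiscriminator[Point] {
    case p: Point2d => point2d.encode(p)
    case p: Point3d => point3d.encode(p)
  }
}
```

The trade-off discussed next in the chat is that Point2d's and Point3d's own encoders now include the discriminator, so they no longer describe just those types.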
it sounds like you see a chance for the library to make a bigger inference? I guess S could be context bound to RowEncoder and more of that could be auto-derived based on what’s in scope
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 21:05
well, that's where I'm getting confused - can't decide whether your initial approach wasn't better
Andrew Roberts
@aroberts
Jan 30 2017 21:05
but if the encoder cares about the index, doesn’t it also follow that there needs to be a discriminator cell encoder, AND that it must somehow interact with the cells of the row encoder as the encoding happens?
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 21:05
it's just that it's not symmetrical to the way you write the decoder, and that feels wrong
Andrew Roberts
@aroberts
Jan 30 2017 21:06
I hear what you’re saying, but I think each subtype needs to handle the encoding whole cloth
to give a more complex example, my discriminator column is index 5, and based on that discriminator, the whole line is between 6 and 11 columns
here’s a more pointed question. if RowEncoder[Point] knows about handling the discriminator, then what’s the output when you encode val p: Point3d = Point3d(4,5,6)? does it get the 3d at index 0, or is it just 4?
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 21:08
alright, but that still does not help with the fact that you have encoders for your ADT's alternatives, but not for the "type" itself
Andrew Roberts
@aroberts
Jan 30 2017 21:09
right, I think you’d have to create that with the partial function
much like with the decoder
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 21:10
to answer your question - and I'm not saying this is the best answer - but one could imagine having RowEncoder[Point3d] encode to List(4, 5, 6), and then the discriminator to be inserted at index 0
not the most efficient thing in the world, but you'd have both a clean Point3d encoder and a clean Point encoder
Andrew Roberts
@aroberts
Jan 30 2017 21:11
I do see the ADT-ishness of that approach
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 21:14
and your encoder declaration would look very similar to the decoder:
def discriminatorEncoder[D: CellEncoder, E](index: Int)(discriminator: E => D)(encoders: PartialFunction[E, Seq[String] => EncodeResult[String]]): RowEncoder[E] = ???
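A runnable approximation of that signature, assuming simplified hypothetical CellEncoder and RowEncoder traits: the index is an Int, the two function parameters have distinct names, the partial function returns the cells directly, and errors are plain strings rather than an EncodeResult. The sub-encoding stays clean (Point3d(4, 5, 6) becomes 4, 5, 6) and the discriminator is inserted at the requested index afterwards.

```scala
// Hypothetical simplified type classes (not kantan.csv's actual API).
trait CellEncoder[A] { def encode(a: A): String }
trait RowEncoder[A] { def encode(a: A): Either[String, Seq[String]] }

sealed abstract class Point
final case class Point2d(x: Int, y: Int) extends Point
final case class Point3d(x: Int, y: Int, z: Int) extends Point

object Discriminated {
  implicit val stringCell: CellEncoder[String] = new CellEncoder[String] {
    def encode(s: String): String = s
  }

  // Encodes with the matching sub-encoder, then inserts the encoded
  // discriminator at `index` (0 = first column, cells.size = last).
  def discriminatorEncoder[D: CellEncoder, E](index: Int)(discriminator: E => D)(
      encoders: PartialFunction[E, Seq[String]]
  ): RowEncoder[E] = new RowEncoder[E] {
    def encode(e: E): Either[String, Seq[String]] =
      encoders.lift(e) match {
        case Some(cells) if index >= 0 && index <= cells.size =>
          val d = implicitly[CellEncoder[D]].encode(discriminator(e))
          Right(cells.patch(index, Seq(d), 0)) // insert without replacing
        case Some(cells) => Left(s"index $index out of range for ${cells.size} cells")
        case None        => Left(s"no encoder for $e")
      }
  }

  def pointEncoder(index: Int): RowEncoder[Point] =
    discriminatorEncoder[String, Point](index) {
      case _: Point2d => "2d"
      case _: Point3d => "3d"
    } {
      case Point2d(x, y)    => Seq(x, y).map(_.toString)
      case Point3d(x, y, z) => Seq(x, y, z).map(_.toString)
    }
}
```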
Andrew Roberts
@aroberts
Jan 30 2017 21:14
the thing that bugs me is that it pollutes the index counts in the specific encoder/decoder
this doesn’t happen in the decoder case because my current implementation doesn’t strip out the discriminator field from the result
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 21:15
how so?
ah
right, I see
Andrew Roberts
@aroberts
Jan 30 2017 21:15
I think that’s the tension - ADT correctness versus absolute index
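The index tension can be made concrete with two hypothetical Point2d sub-decoders (plain functions, not kantan.csv's API). One reads absolute positions from the full row, so its indices are shifted by the discriminator. The other reads from the row after the discriminator cell has been stripped. Only the stripped version can read what a clean Point2d encoder writes (just x and y).

```scala
import scala.util.Try

object IndexStyles {
  // Absolute indices: the sub-decoder receives the full row,
  // so for 2d,1,2 it reads x at index 1 and y at index 2.
  def point2dAbsolute(row: Seq[String]): Option[(Int, Int)] =
    Try((row(1).toInt, row(2).toInt)).toOption

  // Stripped: the discriminator is removed first, so the sub-decoder reads
  // x at index 0 and y at index 1, wherever the discriminator sat.
  def point2dStripped(row: Seq[String]): Option[(Int, Int)] =
    Try((row(0).toInt, row(1).toInt)).toOption

  // Removes the cell at `index` (the discriminator) from the row.
  def stripAt(index: Int, row: Seq[String]): Seq[String] =
    row.patch(index, Nil, 1)
}
```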
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 21:15
so your Encoder[Point2d] and Decoder[Point2d] are illegal
Andrew Roberts
@aroberts
Jan 30 2017 21:15
right
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 21:16
well, unlawful.
this is doing my head in, you should be ashamed of doing that to me after an entire day's worth of meetings
Andrew Roberts
@aroberts
Jan 30 2017 21:18
I feel awful :D
this is why I was less eager to tackle the encoding case
terrible decisions to make
I won’t have time to mess with this either way for a week or so, I imagine
so I’m happy to revisit this later
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 21:19
right. Give me some time to think about this. It's an interesting problem, but I don't want to rush a decision one way or the other
hey, if you have another few minutes, can I run something by you?
Andrew Roberts
@aroberts
Jan 30 2017 21:19
side tangent - if I have a string s, and I s.readCsv[List, T].headOption, what does it mean to get None back?
sure!
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 21:20
None => s was empty
right, so, a feature that's too often requested for me to ignore is to add support for header-based decoding
rather than index based
Andrew Roberts
@aroberts
Jan 30 2017 21:21
I thought it was already there?
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 21:21
I'm thinking of creating a HeaderDecoder[A] type, which takes a Seq[String] (the headers) and decodes it into a RowDecoder[A]
nope
Andrew Roberts
@aroberts
Jan 30 2017 21:21
ohhhh— I see
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 21:22
there's no such thing as RowDecoder.decode("foo", "bar", "baz")(Point3d.apply _)
mostly because most of the CSV I deal with has no header
Andrew Roberts
@aroberts
Jan 30 2017 21:22
gotcha - I saw the header bool in readCsv, just walked it back to the skip
ok, I’m with you
wait- this would call Point3d.apply with (<arg under “foo” header>, <arg under “bar”>, <arg under “baz”>), yes?
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 21:25
sorry, wife was asking for my opinion on a new coat
yeah, that'd be the idea
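A self-contained sketch of the HeaderDecoder idea as described in this exchange: a HeaderDecoder[A] receives the header row and produces a RowDecoder[A] whose column indices have been resolved by name. All names here (HeaderDecoder, fromHeader, decoder, cellAt) are hypothetical and limited to three Int columns. They are not kantan.csv's API as of this chat. Because the indices are captured when the RowDecoder is built, the returned decoder is immutable.

```scala
import scala.util.Try

// Hypothetical simplified types (not kantan.csv's API).
trait RowDecoder[A] { def decode(row: Seq[String]): Either[String, A] }

// Takes the header row and, if the expected columns exist, returns a
// RowDecoder that reads by the resolved indices.
trait HeaderDecoder[A] {
  def fromHeader(header: Seq[String]): Either[String, RowDecoder[A]]
}

final case class Point3d(x: Int, y: Int, z: Int)

object HeaderDecoder {
  private def cellAt(row: Seq[String], i: Int): Either[String, Int] =
    Try(row(i).trim.toInt).toOption.toRight(s"bad or missing int at column $i")

  // Hypothetical helper: three Int columns, looked up by name, passed to f.
  def decoder[A](n1: String, n2: String, n3: String)(f: (Int, Int, Int) => A): HeaderDecoder[A] =
    new HeaderDecoder[A] {
      def fromHeader(header: Seq[String]): Either[String, RowDecoder[A]] = {
        val idx = header.zipWithIndex.toMap
        (idx.get(n1), idx.get(n2), idx.get(n3)) match {
          case (Some(i1), Some(i2), Some(i3)) =>
            // Resolved indices are captured here, once, for every later row.
            Right(new RowDecoder[A] {
              def decode(row: Seq[String]): Either[String, A] =
                for {
                  a <- cellAt(row, i1)
                  b <- cellAt(row, i2)
                  c <- cellAt(row, i3)
                } yield f(a, b, c)
            })
          case _ => Left(s"missing one of the columns: $n1, $n2, $n3")
        }
      }
    }
}
```

With a header of baz,foo,bar, the row 3,1,2 decodes to Point3d(1, 2, 3), since each argument is looked up by column name rather than position.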
Andrew Roberts
@aroberts
Jan 30 2017 21:25
no sweat
are you in eur?
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 21:25
yes, I'm in Paris
Andrew Roberts
@aroberts
Jan 30 2017 21:25
oh, cool
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 21:25
(the French one, not the Texan one)
Andrew Roberts
@aroberts
Jan 30 2017 21:26
( ;) course )
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 21:26
I've actually had people tell me "oh, cool, Paris Texas!" which I felt was quite a compliment to my accent, but weird
Andrew Roberts
@aroberts
Jan 30 2017 21:27
I’ve never heard of paris tx
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 21:27
right, so, the problem with the HeaderDecoder approach is that it's not backwards compatible
Andrew Roberts
@aroberts
Jan 30 2017 21:27
but- texas isn’t much my jam, I’m up in Boston
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 21:28
can't say that I've been to Boston yet - most of my trips to the US have been to the west coast
except that one time where I spent a summer in Akron, Ohio, but that was just weird
Andrew Roberts
@aroberts
Jan 30 2017 21:29
hah
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 21:29
also, HeaderDecoder has the issue that it's not symmetrical to a hypothetical HeaderEncoder[A], which would take an A and return a Seq[String]
Andrew Roberts
@aroberts
Jan 30 2017 21:30
ah, interesting
could HeaderEncoder take A and return RowEncoder?
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 21:31
but then you get an encoder and a decoder that do not work on the same types
Andrew Roberts
@aroberts
Jan 30 2017 21:31
ahhhh
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 21:32
so I'm currently stuck there - which is a shame, I felt the idea of a decoder that returned another decoder to be quite elegant
Andrew Roberts
@aroberts
Jan 30 2017 21:32
that’s pretty tricky
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 21:33
I think unfortunately I've already dealt with the easy problems :)
maybe it should encode and decode to Map[String, Int] ?
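One possible reading of this suggestion (an interpretation, not necessarily what was meant) is that header handling would decode a header row into a column-name-to-index map and encode such a map back into a header row, so that both directions work on the same type. A minimal hypothetical sketch follows. The round trip only holds when header names are unique and indices run contiguously from 0.

```scala
object HeaderMap {
  // Decoding: header row -> column name to index.
  def decode(header: Seq[String]): Map[String, Int] =
    header.zipWithIndex.toMap

  // Encoding: column name to index -> header row, ordered by index.
  def encode(columns: Map[String, Int]): Seq[String] =
    columns.toSeq.sortBy(_._2).map(_._1)
}
```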
Andrew Roberts
@aroberts
Jan 30 2017 21:35
that would make sense
my mind keeps wanting to go to e.g. LabelledGeneric
but that’s a step further
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 21:36
no but you're right, that's another place I get stuck
Andrew Roberts
@aroberts
Jan 30 2017 21:37
there’s a bunch of uncomfortable statefulness in this header problem
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 21:38
that's why I liked the idea of decoding to a RowDecoder - the entire state is contained in the returned RowDecoder instance, which is immutable
bah. My brain is fried. I'll think more on this with a clear head tomorrow
(quite impressive the surge of interest I've seen in the last few days - a post on reddit/r/scala, 8 github stars, 1 PR...)
Andrew Roberts
@aroberts
Jan 30 2017 21:47
good stuff
<- also fried. have a good evening!
Nicolas Rinaudo
@nrinaudo
Jan 30 2017 21:47
cheers, you too :)