These are chat archives for nrinaudo/kantan.csv

31st
Aug 2018
Elijah Rippeth
@erip
Aug 31 2018 18:10
Hi all -- anyone here?

I'm curious if kantan.csv can handle nested CSV-to-case class deserialization. I have headers like:

id, "a,b,c", "a,b,d", timeStamp

and I'd love to do something like

final case class B(c: String, d: String)
final case class A(b: B)
final case class Row(id: String, a: A, timeStamp: String)
Not sure if this is tenable.
Nicolas Rinaudo
@nrinaudo
Aug 31 2018 18:14
I’m a bit busy right now but I should be able to take a look in a couple of hours
Elijah Rippeth
@erip
Aug 31 2018 18:14
Take your time! This is pretty low priority. :smile:
Elijah Rippeth
@erip
Aug 31 2018 18:30
@guizmaii is everywhere there's scala... :smile:
Nicolas Rinaudo
@nrinaudo
Aug 31 2018 19:32
@erip right, so, I'm not entirely sure how that'd work
id would go in Row.id, a,b,c in Row.a, but what about a,b,d?
I think maybe your example doesn't exactly match your types, but you'd like to know how to treat data where a cell contains a nested CSV row?
Nicolas Rinaudo
@nrinaudo
Aug 31 2018 19:45
if I'm not misunderstanding your question, it's tricky but certainly doable:
import kantan.csv._
import kantan.csv.ops._

val input = """id, "a,b,c""""

final case class Foo(a: String, b: String, c: String)

final case class Row(id: String, foo: Foo)

// This is the tricky bit: when decoding a cell to a Foo, you re-use kantan.csv's row decoding mechanism.
// This requires your cell to be manually tokenised to a Seq[String] which, in your case, seems to be simply done by
// splitting on commas.
implicit val fooDecoder: CellDecoder[Foo] = CellDecoder.from {
  val decoder: RowDecoder[Foo] = RowDecoder.ordered(Foo)

  s => decoder.decode(s.split(","))
}

implicit val rowDecoder: RowDecoder[Row] = RowDecoder.ordered(Row)

input.readCsv[List, Row](rfc)
// res0: List[ReadResult[Row]] = List(Right(Row(id,Foo(a,b,c))))
not sure why this isn't getting syntax highlighted properly...
Elijah Rippeth
@erip
Aug 31 2018 19:48
@nrinaudo I was thinking that ”a,b,c” could be some arbitrarily deep “nesting” in which each level would be a separate case class
Nicolas Rinaudo
@nrinaudo
Aug 31 2018 19:49
well, it's certainly doable, provided you're capable of having syntax for your nesting that's non-ambiguous and doesn't clash with the CSV format
but you have to implement it manually
Elijah Rippeth
@erip
Aug 31 2018 19:50
I was thinking that maybe a monadic for-comp might be the best way
Nicolas Rinaudo
@nrinaudo
Aug 31 2018 19:51
I don't think that could work, since in your example, you must be mixing row and cell decoders, which are different types. Monads don't compose
also, that's not how the library works - it uses a (boring) AST to represent CSV data: sequences of strings. The various decoders work on different parts of this AST. But nowhere in there do you have "transforming textual data to a CSV AST"
that's another part of the library altogether, which relies on the decoders, but not the other way around
Elijah Rippeth
@erip
Aug 31 2018 19:53
Ah, understood.
Nicolas Rinaudo
@nrinaudo
Aug 31 2018 19:53
but!
Elijah Rippeth
@erip
Aug 31 2018 19:53
Oh, a but. :smile:
Nicolas Rinaudo
@nrinaudo
Aug 31 2018 19:54
if you're capable of doing the parsing yourself, ie of writing a String => Seq[String] for each level of nesting, then you're good to go
it's a bit boilerplatey, but the code I showed can be nested arbitrarily deep
Elijah Rippeth
@erip
Aug 31 2018 19:54
Yeah, I was thinking something similar
It might be abstractable… maybe
I really appreciate it, though! I can give it a shot
Nicolas Rinaudo
@nrinaudo
Aug 31 2018 19:57

honestly not sure how you'd abstract it, though.

I think it'd probably be fairly straightforward to make a helper function:

def createDecoder[A: RowDecoder](f: String => Seq[String]): CellDecoder[A] = ???

But you'd still have to create a decoder for each nesting level, even if each decoder is probably a one-liner

(yeah, createDecoder is just CellDecoder.from(s => RowDecoder[A].decode(f(s))), or something very similar)
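A library-free sketch of that helper, assuming decoders are modelled as plain functions rather than kantan.csv's real CellDecoder and RowDecoder types. The names CellDec, RowDec, Inner and Outer, and the ';' and '|' separators, are hypothetical. The row decoder is also passed explicitly here instead of through a context bound. The point is only that each nesting level needs its own tokeniser and then becomes a one-liner.

```scala
object NestedCellSketch {
  // Simplified stand-ins for kantan.csv's decoders (assumption: the real types
  // carry richer error information than a String).
  type Result[A]  = Either[String, A]
  type CellDec[A] = String => Result[A]
  type RowDec[A]  = Seq[String] => Result[A]

  // Turns a row decoder into a cell decoder, given a tokeniser for that level.
  def createDecoder[A](rowDec: RowDec[A])(tokenise: String => Seq[String]): CellDec[A] =
    cell => rowDec(tokenise(cell))

  final case class Inner(x: String, y: String)
  final case class Outer(name: String, inner: Inner)

  val innerRow: RowDec[Inner] = {
    case Seq(x, y) => Right(Inner(x, y))
    case other     => Left(s"expected 2 cells, got ${other.length}")
  }

  // Innermost level: cells are separated by '|'.
  val innerCell: CellDec[Inner] = createDecoder(innerRow)(_.split('|').toSeq)

  val outerRow: RowDec[Outer] = {
    case Seq(name, inner) => innerCell(inner).map(Outer(name, _))
    case other            => Left(s"expected 2 cells, got ${other.length}")
  }

  // Next level up: cells are separated by ';'.
  val outerCell: CellDec[Outer] = createDecoder(outerRow)(_.split(';').toSeq)
}
```

With these definitions, NestedCellSketch.outerCell("foo;a|b") decodes to an Outer wrapping an Inner, and a cell with the wrong number of components comes back as a Left.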
Elijah Rippeth
@erip
Aug 31 2018 20:04
hmm, so maybe the better way is to do this:
id, “a,b,c”, “a,b,d”, time
should be treated like
fcc Row(id: String, a: A, time: String)
fcc A(c: String, d: String)
Nicolas Rinaudo
@nrinaudo
Aug 31 2018 20:06
I'm still not sure how A works - both of your nested values have three components and A only 2
Elijah Rippeth
@erip
Aug 31 2018 20:06
but I don’t think this is possible since the cells are decoded independently, so there’s no way to combine them (like you said)
Nicolas Rinaudo
@nrinaudo
Aug 31 2018 20:06
ooooh
that's what you want to do?
take a value from the first cell, a value from the second, and combine them?
Elijah Rippeth
@erip
Aug 31 2018 20:06
yeah, sorry: the inner commas aren’t delimiting, they’re just an unfortunate “namespacing”
Nicolas Rinaudo
@nrinaudo
Aug 31 2018 20:07
ignore my answers, I'm answering an entirely different question
Elijah Rippeth
@erip
Aug 31 2018 20:07
I wasn’t very clear — my apologies
Nicolas Rinaudo
@nrinaudo
Aug 31 2018 20:07
but then in that case, it should be possible
so, say that c and d have slightly more specific types - NamespacedString, say
write a decoder for:
final case class Row(id: String, a: A, c: NamespacedString, d: NamespacedString, time: String)
and RowDecoder has a functor, so you can just map RowDecoder[Row] to RowDecoder[WhateverYouWant], which depends on the values in Row
I'm not sure this is very clear
alright, let me try again.
If you know the number of fields in your row, and you need to merge a couple of them, the types of which you know, into a third, that's "just" mapping on the row
does that make more sense?
Elijah Rippeth
@erip
Aug 31 2018 20:12
Maybe I need to read the docs a bit more first — I’m getting caught up between whether cells are consumed one-at-a-time or all at once
Nicolas Rinaudo
@nrinaudo
Aug 31 2018 20:13
I can explain that
Elijah Rippeth
@erip
Aug 31 2018 20:13
i.e., if I implement a RowDecoder[Row], it seems like I’m basically re-writing the CSV parsing logic
Nicolas Rinaudo
@nrinaudo
Aug 31 2018 20:14
right. So, no. The parsing is done for you - turning a row into a sequence of strings
the decoding isn't, this is what you write
what happens under the hood is that each row is consumed entirely, turned into a sequence of strings, and then your row decoder gets that sequence of strings
Elijah Rippeth
@erip
Aug 31 2018 20:15
Ok, so by knowing the order of the fields, I should be able to say “these two neighboring fields can be decoded to an A”?
Nicolas Rinaudo
@nrinaudo
Aug 31 2018 20:16
you could, but you'd be forfeiting all of the comfort that cell decoders bring you
a RowDecoder[A] is basically a Seq[String] => A, so if you can write that function, you can definitely write the decoder you have in mind
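To make the "Seq[String] => A" view concrete, here is a minimal sketch that does not use kantan.csv at all. It assumes the row has already been parsed into four cells and models failure as a plain String, where the real library has its own error types. The Row and A shapes follow the ones discussed earlier in the conversation; decodeRow is a hypothetical name.

```scala
object HandWrittenRowDecoderSketch {
  final case class A(c: String, d: String)
  final case class Row(id: String, a: A, time: String)

  // Stand-in for a row decoder: already-parsed cells in, value or error out.
  // (Assumption: failures are modelled as a String for brevity.)
  def decodeRow(cells: Seq[String]): Either[String, Row] = cells match {
    // Cells 1 and 2 are neighbours that both belong to A.
    case Seq(id, c, d, time) => Right(Row(id, A(c, d), time))
    case other               => Left(s"expected 4 cells, got ${other.length}")
  }
}
```

The trade-off is visible even at this size: every cell is handled by hand, so any per-cell conversion (to an Int, a date and so on) would also have to be written inside decodeRow.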
Elijah Rippeth
@erip
Aug 31 2018 20:16
Heh. :sweat_smile: I’ll go read some docs — I don’t want to waste your time giving me a lesson.
I really appreciate this, though
Nicolas Rinaudo
@nrinaudo
Aug 31 2018 20:17
but then you have to deal with each cell yourself. With CellDecoder, you can tell kantan.csv I want cell at index 2 to be an int, and it'll manage
honestly, the easiest way would be to write a row decoder that turns each cell in your row into an individual type, then map to merge them
RowDecoder[(String, NamespacedString, NamespacedString, LocalDateTime)].map { case (id, a, b, date) => Foo(id, magic(a, b), date) }
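The same two-step idea, sketched without the library so it can be run on its own: first decode each cell into its own type, then map over the flat result to merge fields. Everything here is an assumption made for illustration: Result, namespaced, dateTime, flat and the body of magic are hypothetical, and errors are plain Strings instead of kantan.csv's error types.

```scala
import java.time.LocalDateTime
import scala.util.Try

object DecodeThenMapSketch {
  type Result[A] = Either[String, A]

  final case class NamespacedString(value: String)
  final case class A(c: String, d: String)
  final case class Foo(id: String, a: A, date: LocalDateTime)

  // Per-cell decoders: each one only knows how to read a single cell.
  def namespaced(cell: String): Result[NamespacedString] = Right(NamespacedString(cell))
  def dateTime(cell: String): Result[LocalDateTime] =
    Try(LocalDateTime.parse(cell)).toEither.left.map(_.getMessage)

  // Step 1: decode the row into a flat tuple, one typed value per cell.
  def flat(cells: Seq[String]): Result[(String, NamespacedString, NamespacedString, LocalDateTime)] =
    cells match {
      case Seq(id, c, d, date) =>
        for {
          nc <- namespaced(c)
          nd <- namespaced(d)
          dt <- dateTime(date)
        } yield (id, nc, nd, dt)
      case other => Left(s"expected 4 cells, got ${other.length}")
    }

  // Hypothetical merge function, standing in for `magic` in the message above.
  def magic(c: NamespacedString, d: NamespacedString): A = A(c.value, d.value)

  // Step 2: map over the flat result to build the nested value.
  def decode(cells: Seq[String]): Result[Foo] =
    flat(cells).map { case (id, c, d, date) => Foo(id, magic(c, d), date) }
}
```

Keeping the flat step separate means the per-cell decoders stay reusable, and only the final map knows how the fields combine.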
Elijah Rippeth
@erip
Aug 31 2018 20:18
Yeah, I was thinking about that — just having an intermediate (but very shallow) structure
The problem is that the structure is very shallow: dozens of fields
More annoying than anything, but perhaps the most doable
Nicolas Rinaudo
@nrinaudo
Aug 31 2018 20:19
dozens, plural?
this might be a problem. We're talking about Scala, where functions can't take more than 22 parameters
Elijah Rippeth
@erip
Aug 31 2018 20:19
Maybe ~20 fields
:wink:
Nicolas Rinaudo
@nrinaudo
Aug 31 2018 20:21
well, I think you understand the solution I'm proposing, which is the only reasonable one I can think of off the top of my head
Elijah Rippeth
@erip
Aug 31 2018 20:21
Yeah, it makes sense
Nicolas Rinaudo
@nrinaudo
Aug 31 2018 20:21
always happy to see users trying to do crazy things with kantan.csv, but this is crazier than most :)
keep me posted, I'm interested in knowing how this turned out. If it's not too nasty, that might be something I want to incorporate in future releases
Elijah Rippeth
@erip
Aug 31 2018 20:22
sorry for the craziness on a Friday
Nicolas Rinaudo
@nrinaudo
Aug 31 2018 20:23
I'm not complaining, sorry if it sounded like I was
Elijah Rippeth
@erip
Aug 31 2018 20:27
Oh, I didn’t read it that way. :smile: