These are chat archives for nrinaudo/kantan.csv

18th
Jan 2017
yesterday192
@yesterday192
Jan 18 2017 06:31
@nrinaudo thank you. If I want to keep uploadDate stringly typed, do you have a solution for my case?
Nicolas Rinaudo
@nrinaudo
Jan 18 2017 08:20
not really - you're hit by one of the edge cases you have to deal with when writing a CSV decoder - a lot of CSV writers will omit the last cell if it's empty, so I can't treat that as an error
@yesterday192 I'm a bit confused though - if you're going to use String to store your values, why go through the trouble of putting them in a case class? Wouldn't a Tuple21 or a List[String] be better?
it just seems like this is kind of odd - you're kind of typing your data by putting it in a case class, but you stop halfway through.
is it because you feel decoding to proper types would be hard? If so, it really isn't, and I can help you with that.
yesterday192
@yesterday192
Jan 18 2017 08:49
If I parse the CSV to List[String] or Tuple21, I wouldn't know what each column means.
I don't use proper types because I don't want to validate data when parsing the CSV (e.g. if uploadDate were typed as a date-time and I put "abc" in the uploadDate column of the CSV file, parsing would fail).
Nicolas Rinaudo
@nrinaudo
Jan 18 2017 08:50
do you not want to fail parsing of that row if it contains incorrect data?
and if you do not, and are ready to accept obviously incorrect data, why is abc as a date acceptable, but not the empty string?
(I'm not criticizing the way your code works - trying to understand your use case to see if I can make an intelligent suggestion)
yesterday192
@yesterday192
Jan 18 2017 08:52
yes, because I validate the data elsewhere.
Nicolas Rinaudo
@nrinaudo
Jan 18 2017 08:52
ok - I'd personally validate data at decoding time, but if that's not the way you work, that's not the way you work
but then why wouldn't your validation code deal with the fact that uploadDate contains the empty string?
yesterday192
@yesterday192
Jan 18 2017 08:55
As you can see, I have 21 fields in the case class, all of them Strings.
When the CSV file has 20 columns, parsing succeeds.
But when the CSV file has 19 columns, parsing fails.
Nicolas Rinaudo
@nrinaudo
Jan 18 2017 08:56
yeah. That last cell thing is a bit of a hack, but that's the only way I could get things to work for the majority of the CSV data that's out there.
right, so, in your case. Have you tried using the generic module? it's quite a bit less flexible about these things than "normal" decoders
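(Editor's sketch of the "last cell" leniency described above. This is not kantan.csv's implementation: the object and the `normalise` helper are hypothetical names, and the only assumption is that a row exactly one cell short is treated as ending with an empty cell, while anything shorter is rejected.)

```scala
object TrailingCellSketch {
  // Hypothetical helper, not part of kantan.csv: tolerate exactly one missing
  // trailing cell by treating it as the empty string; reject any other length.
  def normalise(row: Vector[String], expected: Int): Either[String, Vector[String]] =
    if (row.length == expected) Right(row)
    else if (row.length == expected - 1) Right(row :+ "")
    else Left(s"expected $expected cells, found ${row.length}")
}
```

Under that assumption, a 21-field case class would accept a 20-cell row (the last field becomes the empty string) but not a 19-cell one, which matches the behaviour reported in the conversation.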
yesterday192
@yesterday192
Jan 18 2017 09:00
Oh, I tried reading the documentation for the generic module, but I couldn't work out how to apply it to my case. Can you help me?
Nicolas Rinaudo
@nrinaudo
Jan 18 2017 09:00
sure. What the generic module does is automatically generate decoders for you. So, first, you update your SBT file to depend on that module:
libraryDependencies += "com.nrinaudo" %% "kantan.csv-generic" % "0.1.16"
then you import the correct package:
import kantan.csv.generic._
and you're done. You should now have a RowDecoder[AuthorCsv] in scope.
This can be validated with:
val decoder: RowDecoder[AuthorCsv] = RowDecoder[AuthorCsv]
yesterday192
@yesterday192
Jan 18 2017 09:03
Does that mean I don't need to declare a decoder for my case class?
Nicolas Rinaudo
@nrinaudo
Jan 18 2017 09:04
That's exactly what it means. The generic module is capable of creating them automatically for case classes and algebraic data types
but, since the decoder is generated automatically, it's a lot more strict about what it accepts
yesterday192
@yesterday192
Jan 18 2017 09:05
thank you, I will try it now
Nicolas Rinaudo
@nrinaudo
Jan 18 2017 09:05
I'm happy to write the code for you if you can give me some sample data to try out
yesterday192
@yesterday192
Jan 18 2017 09:06
Thank you. I think I can try it myself first. If I have a problem, I will ask you.
Nicolas Rinaudo
@nrinaudo
Jan 18 2017 09:07
in theory, it should be as simple as:
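import kantan.csv._         // ReadResult, RowDecoder
import kantan.csv.ops._     // adds readCsv to java.io.File
import kantan.csv.generic._ // derives RowDecoder[AuthorCsv]

val data: List[ReadResult[AuthorCsv]] = new java.io.File("path/to/csv").readCsv[List, AuthorCsv](',', false)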
right, sure. Have a go and let me know if I can be of further assistance
Nicolas Rinaudo
@nrinaudo
Jan 18 2017 09:21
(also, let me know if you feel the documentation is lacking - you seem to have struggled with generic, was there anything in particular that was confusing or not explained well enough?)
yesterday192
@yesterday192
Jan 18 2017 09:56
I applied the generic module successfully. Thank you.
Nicolas Rinaudo
@nrinaudo
Jan 18 2017 09:56
did that fix your problem?
yesterday192
@yesterday192
Jan 18 2017 10:03
About the generic module documentation: I found it confusing because of the example.
Nicolas Rinaudo
@nrinaudo
Jan 18 2017 10:04
implicit val stringCellDecoder: CellDecoder[String] = CellDecoder.from(s => DecodeResult(s.trim))
this tells kantan.csv that, whenever decoding a string, it should trim it.
do you understand why, or would you like me to explain?
yesterday192
@yesterday192
Jan 18 2017 10:07
yes, thank you.
Nicolas Rinaudo
@nrinaudo
Jan 18 2017 10:08
ok. So the way decoding works is:
  • in order to decode a CSV row as a type A, you need an implicit value of type RowDecoder[A] to be in scope.
  • in order to decode a CSV cell as a type A, you need an implicit value of type CellDecoder[A] to be in scope.
those are called type classes, I'm not sure whether you've heard of them.
very often, instances of RowDecoder are built from instances of CellDecoder - if you have a RowDecoder[List[Int]], for example, it most likely relies on an instance of CellDecoder[Int] to parse actual cells
in your case, you have a RowDecoder[AuthorCsv] generated by the generic module.
this relies on CellDecoder[String] to decode each cell, since all your fields are Strings.
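(Editor's sketch of how a row decoder can be built from a cell decoder. The traits below are simplified stand-ins written for illustration; their names echo kantan.csv's type classes, but the signatures, the Either-based result type and the `decodeRow` helper are assumptions, not the library's actual definitions.)

```scala
object TypeClassSketch {
  // Simplified stand-ins for illustration only, not kantan.csv's real traits.
  trait CellDecoder[A] { def decode(cell: String): Either[String, A] }
  trait RowDecoder[A]  { def decode(row: Seq[String]): Either[String, A] }

  object CellDecoder {
    implicit val int: CellDecoder[Int] = new CellDecoder[Int] {
      def decode(cell: String): Either[String, Int] =
        try Right(cell.toInt)
        catch { case _: NumberFormatException => Left(s"not an int: '$cell'") }
    }
  }

  object RowDecoder {
    // A RowDecoder[List[A]] is available whenever a CellDecoder[A] is in scope.
    implicit def list[A](implicit cell: CellDecoder[A]): RowDecoder[List[A]] =
      new RowDecoder[List[A]] {
        def decode(row: Seq[String]): Either[String, List[A]] =
          row.foldRight[Either[String, List[A]]](Right(Nil)) { (c, acc) =>
            (cell.decode(c), acc) match {
              case (Right(h), Right(t)) => Right(h :: t)
              case (Left(e), _)         => Left(e)
              case (_, Left(e))         => Left(e)
            }
          }
      }
  }

  // Decoding a row as A requires an implicit RowDecoder[A].
  def decodeRow[A](row: Seq[String])(implicit d: RowDecoder[A]): Either[String, A] =
    d.decode(row)
}
```

With these definitions, `TypeClassSketch.decodeRow[List[Int]](Seq("1", "2", "3"))` resolves the row decoder implicitly, which in turn resolves the Int cell decoder; a row containing "abc" yields a Left instead.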
yesterday192
@yesterday192
Jan 18 2017 10:10
I got it.
Nicolas Rinaudo
@nrinaudo
Jan 18 2017 10:10
The default behaviour of the string decoder is to just return its argument. You want something slightly more involved: you'd like the string to be trimmed as well. So you create a new instance of CellDecoder[String], mark it as implicit, and put it in scope to override the default behaviour.
right. Note that I've not tried the code I just typed - it should work, but let me know if it doesn't for some reason
(oh - by "Yes, thank you", did you mean "Yes, I understand, thank you"? Sorry, I understood it to mean "Yes, please explain")
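(Editor's sketch of the override mechanism itself, in plain Scala. The `CellDecoder` trait and `decodeCell` helper here are hypothetical stand-ins rather than kantan.csv's API; the point is only that an implicit declared in local scope is picked ahead of a default living in the companion object.)

```scala
object OverrideSketch {
  // Hypothetical stand-in for a cell decoder type class.
  trait CellDecoder[A] { def decode(cell: String): A }

  object CellDecoder {
    // Default instance: used when no other CellDecoder[String] is in local scope.
    implicit val string: CellDecoder[String] = new CellDecoder[String] {
      def decode(cell: String): String = cell
    }
  }

  def decodeCell[A](cell: String)(implicit d: CellDecoder[A]): A = d.decode(cell)

  // No local instance here, so the companion's default is used.
  val untouched: String = decodeCell[String]("  abc  ")

  val trimmed: String = {
    // A local implicit takes precedence over the companion's default.
    implicit val trimming: CellDecoder[String] = new CellDecoder[String] {
      def decode(cell: String): String = cell.trim
    }
    decodeCell[String]("  abc  ")
  }
}
```

Here `untouched` keeps its surrounding spaces while `trimmed` does not, because the local instance shadows the default only inside the block where it is declared.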
yesterday192
@yesterday192
Jan 18 2017 10:12
So I can declare a CellEncoder if I want to process a cell's data before writing it to CSV, right?
Nicolas Rinaudo
@nrinaudo
Jan 18 2017 10:12
yes, that works the same way
yesterday192
@yesterday192
Jan 18 2017 10:14
(oh - by "Yes, thank you", I meant "Yes, please explain")
Nicolas Rinaudo
@nrinaudo
Jan 18 2017 10:15
so do you need to serialise your data as well then? Say, you have an Iterator[AuthorCsv] and you want to write that somewhere?
yesterday192
@yesterday192
Jan 18 2017 10:18
I also read the kantan.csv documentation here: http://nrinaudo.github.io/kantan.csv/
But I don't see examples of readCsv or writeCsv, or of using ordered instead of decode (maybe I missed them).
Nicolas Rinaudo
@nrinaudo
Jan 18 2017 10:19
well:
oh, actually, this is what you're looking for for readCsv: http://nrinaudo.github.io/kantan.csv/tut/data_as_collection.html
yesterday192
@yesterday192
Jan 18 2017 10:25
I see, thank you.
Nicolas Rinaudo
@nrinaudo
Jan 18 2017 10:26
happy to help