These are chat archives for nrinaudo/kantan.csv

12th
Oct 2017
gthernandez
@gthernandez
Oct 12 2017 13:47
Thanks Nicolas. Yeah, there is weird data. Since I inherited this code base, I don't really want to change it, as it's been working in production. I think what I am going to do is create a new class that sanitizes the document prior to passing it to the CSVReader.
Nicolas Rinaudo
@nrinaudo
Oct 12 2017 13:48
@gthernandez have a look at this page: http://nrinaudo.github.io/kantan.csv/tut/engines.html
the Jackson engine tends to be very flexible, it might support your use case and it takes a few seconds to test
gthernandez
@gthernandez
Oct 12 2017 13:49
I don't know if it's a change you might want to make to your codebase, but I could see the possibility that people might have quotation marks in the middle of a cell.
The data that I am parsing comes out of a CRM, and it's a conversation description field.
Nicolas Rinaudo
@nrinaudo
Oct 12 2017 13:50
you can absolutely have quotation marks in the middle of a cell, you just need to do it the way it's done in CSV :)
"This is a ""quoted"" cell"
gthernandez
@gthernandez
Oct 12 2017 13:50
Ah, then the export from the CRM is exporting it wrong!
Nicolas Rinaudo
@nrinaudo
Oct 12 2017 13:50
have a look at the specs: https://tools.ietf.org/html/rfc4180
yeah, looks like it
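
For readers following along, here is a minimal sketch of the RFC 4180 rule being discussed: a field that contains a double quote must itself be wrapped in double quotes, and each embedded quote is escaped by doubling it. This is not from the original conversation and does not use kantan.csv; it relies on Python's standard `csv` module purely as an illustration, and the helper names `to_csv_line` and `parse_csv_line` are hypothetical.

```python
import csv
import io


def to_csv_line(cells):
    """Serialize one row; the csv module quotes cells and doubles embedded quotes."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\r\n").writerow(cells)
    return buf.getvalue()


def parse_csv_line(line):
    """Parse a single CSV line back into a list of cells."""
    return next(csv.reader(io.StringIO(line)))


line = to_csv_line(['This is a "quoted" cell', "plain"])
print(line, end="")          # "This is a ""quoted"" cell",plain
print(parse_csv_line(line))  # ['This is a "quoted" cell', 'plain']
```

An export that writes the inner quotes without doubling them is outside the RFC, which is why a strict parser may reject it while a more lenient one might accept it.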
gthernandez
@gthernandez
Oct 12 2017 13:51
Cool
Nicolas Rinaudo
@nrinaudo
Oct 12 2017 13:51
it might be supported by other parsers, such as Jackson, but the default kantan.csv reader is RFC compliant.
gthernandez
@gthernandez
Oct 12 2017 13:51
It's very well written.
Nicolas Rinaudo
@nrinaudo
Oct 12 2017 13:52
thanks :)