These are chat archives for nrinaudo/kantan.csv

3rd
May 2016
Nicolas Rinaudo
@nrinaudo
May 03 2016 12:42
@metasim your idea of "\\d+".r.as... for kantan.regex was a good one. There's now an additional layer of type class to let you add "compile to regex" capabilities to any type
currently, only scala.util.matching.Regex, java.util.regex.Pattern and String have it, but it might be interesting for other types later
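A minimal sketch of what such a "compile to regex" layer could look like, assuming a type class that turns a value into a `java.util.regex.Pattern`. The names `RegexCompiler`, `Regexes.allMatches` and `matchesIn` are hypothetical and only illustrate the idea. This is not kantan.regex's actual API.

```scala
import java.util.regex.Pattern
import scala.util.matching.Regex

// Hypothetical type class (not kantan.regex's real API): values of type A
// can be compiled into a java.util.regex.Pattern.
trait RegexCompiler[A] {
  def compile(a: A): Pattern
}

object RegexCompiler {
  // A String is treated as regex source and compiled.
  implicit val stringCompiler: RegexCompiler[String] = new RegexCompiler[String] {
    def compile(s: String): Pattern = Pattern.compile(s)
  }

  // A Pattern is already compiled, so it is returned unchanged.
  implicit val patternCompiler: RegexCompiler[Pattern] = new RegexCompiler[Pattern] {
    def compile(p: Pattern): Pattern = p
  }

  // scala.util.matching.Regex wraps a Pattern, exposed as `pattern`.
  implicit val regexCompiler: RegexCompiler[Regex] = new RegexCompiler[Regex] {
    def compile(r: Regex): Pattern = r.pattern
  }
}

object Regexes {
  // Works for any A with a RegexCompiler instance; returns every full match (group 0).
  def allMatches[A](a: A, input: String)(implicit c: RegexCompiler[A]): List[String] = {
    val m = c.compile(a).matcher(input)
    val found = List.newBuilder[String]
    while (m.find()) found += m.group()
    found.result()
  }

  // Extension syntax: after `import Regexes._`, both "\\d+".matchesIn(...)
  // and "\\d+".r.matchesIn(...) compile.
  implicit class RegexOps[A](val a: A) extends AnyVal {
    def matchesIn(input: String)(implicit c: RegexCompiler[A]): List[String] =
      allMatches(a, input)
  }
}
```

Supporting another type only requires one more implicit instance; the matching code itself does not change.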
Simeon H.K. Fitch
@metasim
May 03 2016 12:45
win!
I think it's a library that will really fill a need.
Nicolas Rinaudo
@nrinaudo
May 03 2016 12:45
it certainly does for me - combined with kantan.xpath, it makes web scraping much less of a hassle
I also work with some odd software whose output is XML whose CDATA has to be analysed, things like <match start="1" end="2">foo bar [FOO]</match>, where I'm interested in 1, 2 and FOO
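For a line like the one quoted above, a standard-library-only sketch (plain `scala.util.matching.Regex`, not kantan.regex) that pulls out the start offset, the end offset and the bracketed token could look like this. The object name `MatchExtractor` and the exact pattern are assumptions based solely on that single example line.

```scala
import scala.util.matching.Regex

object MatchExtractor {
  // Captures the start attribute, the end attribute and the token between [ and ].
  // Assumes the element sits alone on the line and its text contains no '<'.
  val MatchLine: Regex =
    """<match start="(\d+)" end="(\d+)">[^<]*\[([^\]]+)\][^<]*</match>""".r

  // Regex extractors require the whole string to match.
  def extract(line: String): Option[(Int, Int, String)] = line match {
    case MatchLine(start, end, token) => Some((start.toInt, end.toInt, token))
    case _                            => None
  }
}
```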
Simeon H.K. Fitch
@metasim
May 03 2016 12:48
I wrote an akka based scraper a while back, using JSoup. The extraction part was the weakest link (after I replaced native Akka with Akka Streams).
BTW, do you have a twitter handle?
Nicolas Rinaudo
@nrinaudo
May 03 2016 12:48
yeah, @NicolasRinaudo, although I'm not terribly active
Simeon H.K. Fitch
@metasim
May 03 2016 12:49
If you tweet library updates I'll retweet.
Nicolas Rinaudo
@nrinaudo
May 03 2016 12:49
that'd be much appreciated :)
so, you're saying JSoup is not a dead end then. I really thought the project had been dropped years ago and didn't even look into it
I guess I'll put together a JSoup module for kantan.xpath then, shouldn't be too hard
Simeon H.K. Fitch
@metasim
May 03 2016 12:50
I've seen it crop up a number of times in other projects.
In some ways, I think it's super stable.... just kinda "done".
Nicolas Rinaudo
@nrinaudo
May 03 2016 12:50
I remember it coming apart at the seams with HTML 5, but maybe I didn't give it a fair chance.
Simeon H.K. Fitch
@metasim
May 03 2016 12:50
I haven't used the library you're using, so I can't compare. But I do know that it handles "bad" html pretty well.
I don't remember having any major problems, but don't base things on my feedback... it was a short-lived project.
Nicolas Rinaudo
@nrinaudo
May 03 2016 12:51
oh yeah, I published SNAPSHOT artifacts for kantan.csv that fix your issue
0.1.10-SNAPSHOT, should be on Maven Central by now. It also changes the syntax of the various RowDecoder methods - arity is not part of the name anymore, so that might cause you some grief at first
cool tweet, cheers :)
Simeon H.K. Fitch
@metasim
May 03 2016 12:53
heh
I don't tweet much either... so when I actually log in, figure I might as well post something :)
Nicolas Rinaudo
@nrinaudo
May 03 2016 12:54
I use twitter quite a bit to keep up to date with new things, but very rarely to actually write things
it's just a bit painful whenever there's a cool conference I can't attend and my feed fills up with mentions of people enjoying themselves
flatMap Oslo, currently
Simeon H.K. Fitch
@metasim
May 03 2016 12:59
Yeh, I wish I felt more connected with the community on a face-to-face basis. I'm either a) too busy with deadlines, or b) intimidated by the high density of brilliant people. :smirk:
Nicolas Rinaudo
@nrinaudo
May 03 2016 13:00
don't know where you're based, but there's scala.io in a few months, you should come. That's the one conference I get to attend
Simeon H.K. Fitch
@metasim
May 03 2016 13:03
I'm in VA/USA.
Nicolas Rinaudo
@nrinaudo
May 03 2016 13:03
ah, right, so France might be a bit far
Simeon H.K. Fitch
@metasim
May 03 2016 13:04
Certainly tempting!!
Nicolas Rinaudo
@nrinaudo
May 03 2016 13:06
wait, vancouver? As in, "it's 06:00 right now" Vancouver?
Simeon H.K. Fitch
@metasim
May 03 2016 13:18
Virginia :)
Nicolas Rinaudo
@nrinaudo
May 03 2016 13:18
ah, right, a somewhat saner time to be online then
Simeon H.K. Fitch
@metasim
May 03 2016 13:56
@nrinaudo What should I replace calls to RowEncoder.caseEncoder9 with?
Nicolas Rinaudo
@nrinaudo
May 03 2016 13:57
RowEncoder.caseEncoder
the indexes and construction functions have been swapped, so the arity is now part of the signature rather than the name
a more explicit example, old syntax first, then the new one:
RowEncoder.caseEncoder9(Car.apply)(0, 1, 2, 3, 4, 5, 6, 7, 8)
RowEncoder.caseEncoder(0, 1, 2, 3, 4, 5, 6, 7, 8)(Car.apply)
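A self-contained sketch of the "arity in the signature" idea, using a hypothetical miniature `CellEncoder`/`RowEncoder` pair rather than kantan.csv's real types. Both overloads share the name `caseEncoder`, take the column indexes first and an unapply-shaped function second, and the compiler picks the overload from the number of indexes. Only arities 2 and 3 are shown; `CarExample`, `Car` and `carFields` are illustrative.

```scala
// Hypothetical miniature of a CSV encoder API, for illustration only.
trait CellEncoder[A] { def encode(a: A): String }

object CellEncoder {
  implicit val intEncoder: CellEncoder[Int] = new CellEncoder[Int] {
    def encode(a: Int): String = a.toString
  }
  implicit val stringEncoder: CellEncoder[String] = new CellEncoder[String] {
    def encode(a: String): String = a
  }
}

trait RowEncoder[A] { def encode(a: A): Seq[String] }

object RowEncoder {
  // Orders the encoded cells by the column index each field was assigned.
  private def byIndex(cells: (Int, String)*): Seq[String] = cells.sortBy(_._1).map(_._2)

  // Arity 2: the number of indexes selects this overload.
  def caseEncoder[C, A1, A2](i1: Int, i2: Int)(f: C => Option[(A1, A2)])(
      implicit e1: CellEncoder[A1], e2: CellEncoder[A2]): RowEncoder[C] =
    new RowEncoder[C] {
      def encode(c: C): Seq[String] = f(c) match {
        case Some((a1, a2)) => byIndex(i1 -> e1.encode(a1), i2 -> e2.encode(a2))
        case None           => Seq.empty
      }
    }

  // Arity 3: same method name, one more index and one more field.
  def caseEncoder[C, A1, A2, A3](i1: Int, i2: Int, i3: Int)(f: C => Option[(A1, A2, A3)])(
      implicit e1: CellEncoder[A1], e2: CellEncoder[A2], e3: CellEncoder[A3]): RowEncoder[C] =
    new RowEncoder[C] {
      def encode(c: C): Seq[String] = f(c) match {
        case Some((a1, a2, a3)) =>
          byIndex(i1 -> e1.encode(a1), i2 -> e2.encode(a2), i3 -> e3.encode(a3))
        case None => Seq.empty
      }
    }
}

object CarExample {
  final case class Car(make: String, model: String, year: Int)

  // Same shape as Scala 2's synthesized Car.unapply, written out so it also compiles on Scala 3.
  val carFields: Car => Option[(String, String, Int)] = c => Some((c.make, c.model, c.year))
}
```

Because the indexes come first, adding a new arity only means adding another overload; callers never have to spell the arity in the method name.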
Simeon H.K. Fitch
@metasim
May 03 2016 13:58
ah
Isn't there an ordered somethingorother?
Nicolas Rinaudo
@nrinaudo
May 03 2016 13:58
it also makes for much nicer syntax if you're not building a case class but a "normal" one and want to declare the construction function inline
Simeon H.K. Fitch
@metasim
May 03 2016 13:58
For when your columns align?
Thought I saw something like that in the code
Nicolas Rinaudo
@nrinaudo
May 03 2016 13:59
sure there is, with the same modification:
RowEncoder.ordered { (i: Int, s: String) => new FooBar(i, s) }
it's slightly more unpleasant with case classes:
RowEncoder.ordered(FooBar.apply _)
Simeon H.K. Fitch
@metasim
May 03 2016 14:00
That latter's not so bad.
Nicolas Rinaudo
@nrinaudo
May 03 2016 14:00
But then again, if you're using case classes where CSV columns and fields map exactly, you're better off using the shapeless module
Simeon H.K. Fitch
@metasim
May 03 2016 14:01
Before I was passing in FooBar.unapply.... now it should be FooBar.apply?
Nicolas Rinaudo
@nrinaudo
May 03 2016 14:01
Yeah, I'm just not a big fan of that trailing _, but it's not horrible
ah, right, sorry, I mixed apply and unapply
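The mix-up is easy to make because the two functions point in opposite directions: decoding builds a value from its fields (`apply`), while encoding splits a value into its fields (`unapply`). A small standalone sketch with the chat's `FooBar`, whose fields `i: Int` and `s: String` are assumed for illustration:

```scala
// FooBar's fields are assumed for illustration.
final case class FooBar(i: Int, s: String)

object Directions {
  // Decoding direction, fields -> value. With an explicit function type on the
  // left, FooBar.apply is eta-expanded without needing the trailing `_`.
  val build: (Int, String) => FooBar = FooBar.apply

  // Encoding direction, value -> fields. In Scala 2 this is the shape of the
  // compiler-synthesized FooBar.unapply; it is written out by hand here because
  // Scala 3's synthesized unapply returns the FooBar itself rather than an Option.
  val split: FooBar => Option[(Int, String)] = f => Some((f.i, f.s))
}
```

In Scala 2 the trailing `_` is what turns a method into a function value when no function type is expected at that position, which is why it shows up in `RowEncoder.ordered(FooBar.apply _)`.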
Simeon H.K. Fitch
@metasim
May 03 2016 14:01
'k
I don't mind the _ so much.
Nicolas Rinaudo
@nrinaudo
May 03 2016 14:02
I don't know that I would start using 0.1.10-SNAPSHOT in production code just yet though - it should be stable, but I might tweak signatures a bit more
Simeon H.K. Fitch
@metasim
May 03 2016 14:02
Yeh, I just wanted to give you feedback.
Nicolas Rinaudo
@nrinaudo
May 03 2016 14:03
much appreciated :)
Simeon H.K. Fitch
@metasim
May 03 2016 14:03
FYI, if I do RowEncoder.ordered(CuratedDataSetStats.unapply _) I get "could not find implicit value for evidence parameter of type kantan.csv.CellEncoder[Option[(String, Int, Int, Int, Int, Int, Float, Float, Float)]]"
If that's expected, I'm fine with that.
Nicolas Rinaudo
@nrinaudo
May 03 2016 14:04
no, it's absolutely not expected
Simeon H.K. Fitch
@metasim
May 03 2016 14:04
But didn't know if it was a case of missing implicit
Nicolas Rinaudo
@nrinaudo
May 03 2016 14:04
what's CuratedDataSetStats? A tuple?
Simeon H.K. Fitch
@metasim
May 03 2016 14:04
Case class
Nicolas Rinaudo
@nrinaudo
May 03 2016 14:05
that compilation error makes no sense. Would you mind pasting in the declaration of CuratedDataSetStats?
Simeon H.K. Fitch
@metasim
May 03 2016 14:05
Yeh, I agree.
Nicolas Rinaudo
@nrinaudo
May 03 2016 14:05
just the list of fields would help
Simeon H.K. Fitch
@metasim
May 03 2016 14:07
case class CuratedDataSetStats(
  dataSetName: String,
  documents: Int,
  expectedEntities: Int,
  rulesDefined: Int,
  entitiesFound: Int,
  entitiesCorrect: Int,
  precision: Float = 0.0f,
  recall: Float = 0.0f,
  fScore: Float = 0.0f)
Nicolas Rinaudo
@nrinaudo
May 03 2016 14:07
alright. I can reproduce the issue. Not sure what's happening there, I'll take a look. Thanks!
Simeon H.K. Fitch
@metasim
May 03 2016 14:08
It's a synthesized unapply, which should return Option[(String, Int, Int, Int, Int, Int, Float, Float, Float)].
Is there some macro magic going on?
Nicolas Rinaudo
@nrinaudo
May 03 2016 14:08
absolutely not, unless you've imported kantan.csv.generic._
Simeon H.K. Fitch
@metasim
May 03 2016 14:08
I only import kantan.csv.RowEncoder
Nicolas Rinaudo
@nrinaudo
May 03 2016 14:09
I don't think you're doing anything wrong. I, on the other hand, must be
Simeon H.K. Fitch
@metasim
May 03 2016 14:09

At any rate, once I changed to:

RowEncoder.caseEncoder(0, 1, 2, 3, 4, 5, 6, 7, 8)(CuratedDataSetStats.unapply)

everything else in 0.1.10-SNAPSHOT seems to work.

Nicolas Rinaudo
@nrinaudo
May 03 2016 14:10
oh, hey, wait
caseOrdered, I think
Simeon H.K. Fitch
@metasim
May 03 2016 14:10
heh
Yep, that works too! :)
Nicolas Rinaudo
@nrinaudo
May 03 2016 14:11
on the one hand, good news: everything works as expected
on the other hand, it's extremely confusing
I mean I wrote the code and even I had forgotten the distinction between the two. That error message is horrible
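One way to picture the distinction, in a hypothetical two-field miniature (not kantan.csv's code): `ordered` wants a total function from the value to a plain tuple, while `caseOrdered` wants an unapply-shaped function returning an `Option` of a tuple. Assuming the real `ordered` also has a single-field overload shaped like `C => A1`, an unapply-shaped function would be matched against it with `A1 = Option[(...)]`, which would explain why the compiler went looking for a `CellEncoder[Option[(String, Int, ...)]]`.

```scala
// Hypothetical miniature, for illustration only.
trait CellEncoder[A] { def encode(a: A): String }

object CellEncoder {
  implicit val intEncoder: CellEncoder[Int] = new CellEncoder[Int] {
    def encode(a: Int): String = a.toString
  }
  implicit val stringEncoder: CellEncoder[String] = new CellEncoder[String] {
    def encode(a: String): String = a
  }
}

trait RowEncoder[A] { def encode(a: A): Seq[String] }

object RowEncoder {
  // ordered: a total function C => (A1, A2); cells are written in tuple order.
  def ordered[C, A1, A2](f: C => (A1, A2))(
      implicit e1: CellEncoder[A1], e2: CellEncoder[A2]): RowEncoder[C] =
    new RowEncoder[C] {
      def encode(c: C): Seq[String] = {
        val (a1, a2) = f(c)
        Seq(e1.encode(a1), e2.encode(a2))
      }
    }

  // caseOrdered: an unapply-shaped function C => Option[(A1, A2)].
  // It unwraps the Option and reuses ordered; a None is treated as a bug.
  def caseOrdered[C, A1, A2](f: C => Option[(A1, A2)])(
      implicit e1: CellEncoder[A1], e2: CellEncoder[A2]): RowEncoder[C] =
    ordered[C, A1, A2](c => f(c).getOrElse(throw new IllegalArgumentException(s"cannot split $c")))
}

object StatsExample {
  final case class Stats(name: String, documents: Int)

  // Total function to a plain tuple: fits ordered.
  val toTuple: Stats => (String, Int) = s => (s.name, s.documents)

  // Unapply-shaped function (the shape of Scala 2's synthesized Stats.unapply): fits caseOrdered.
  val toFields: Stats => Option[(String, Int)] = s => Some((s.name, s.documents))
}
```

In this sketch, passing `toFields` to `ordered` simply fails to type-check, since an `Option` is not a tuple; the confusing `CellEncoder[Option[...]]` message would only appear if a single-field overload were there to absorb the `Option`.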
Simeon H.K. Fitch
@metasim
May 03 2016 14:12
Yeh, but the error message is kinda outside your control.... unless you can tailor it with @annotation.implicitNotFound
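A minimal sketch of that suggestion using the standard `scala.annotation.implicitNotFound` annotation. The `CellEncoder` here is a hypothetical stand-in and the message wording is only an illustration, not kantan.csv's.

```scala
import scala.annotation.implicitNotFound

// ${A} in the message is replaced by the type for which no instance was found.
@implicitNotFound("No CellEncoder found for ${A}. If ${A} is an Option of a tuple, an unapply-style function may have been passed to ordered instead of caseOrdered.")
trait CellEncoder[A] { def encode(a: A): String }

object CellEncoder {
  implicit val intEncoder: CellEncoder[Int] = new CellEncoder[Int] {
    def encode(a: Int): String = a.toString
  }
}

object Cells {
  // Compiles only when an implicit CellEncoder[A] is available.
  def encodeCell[A](a: A)(implicit e: CellEncoder[A]): String = e.encode(a)
}
```

A call such as `Cells.encodeCell(Option((1, "a")))` should then fail to compile with the custom message, with `${A}` filled in as the offending `Option` type.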
I'm going back to 0.1.9 for now (not a fan of SNAPSHOTs in deployed code).
Nicolas Rinaudo
@nrinaudo
May 03 2016 14:13
you're absolutely right
you don't want my next SNAPSHOT to be included in your deployed code without your control
I'll try to think of a clever way to unify these signatures, maybe there's no need to distinguish between caseOrdered and ordered