These are chat archives for frictionlessdata/chat

25th
Jul 2016
Friedrich Lindenberg
@pudo
Jul 25 2016 12:31
hey all!
pdehaye
@pdehaye
Jul 25 2016 12:32
Hi
Friedrich Lindenberg
@pudo
Jul 25 2016 12:33
so @pwalsh -- re typecast and jts, would love feedback. it's just an attribute right now: https://github.com/pudo/typecast/blob/master/typecast/date.py#L15 and a mapping for the reverse: https://github.com/pudo/typecast/blob/master/typecast/__init__.py#L11
so that people can say: typecast.cast('datetime', '2011-02-02')
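A minimal sketch of the pattern described above, not typecast's actual code: each type carries a `name` attribute, and a reverse mapping from name to type backs a module-level `cast`. The class names, `TYPES` and `stringify` are hypothetical stand-ins for illustration.

```python
from datetime import datetime


class DateTimeType:
    name = 'datetime'  # the attribute that names the type
    fmt = '%Y-%m-%d'   # assumed single format, for brevity

    def cast(self, text):
        return datetime.strptime(text, self.fmt)

    def stringify(self, value):
        return value.strftime(self.fmt)


class IntegerType:
    name = 'integer'

    def cast(self, text):
        return int(text)

    def stringify(self, value):
        return str(value)


# Reverse mapping: type name -> type instance.
TYPES = {t.name: t() for t in (DateTimeType, IntegerType)}


def cast(type_name, text):
    """Look the type up by name and convert the text value."""
    return TYPES[type_name].cast(text)


value = cast('datetime', '2011-02-02')
roundtrip = TYPES['datetime'].stringify(value)
```

Keeping `stringify` next to `cast` is what makes roundtripping possible: the same type object converts in both directions.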
Paul Walsh
@pwalsh
Jul 25 2016 12:36
Hey @pudo
Hey @roll meet @pudo too, who has authored many things that we either use or took lots of inspiration from
Friedrich Lindenberg
@pudo
Jul 25 2016 12:37
i.e. the walking maintenance nightmare
Paul Walsh
@pwalsh
Jul 25 2016 12:37
haha
Friedrich Lindenberg
@pudo
Jul 25 2016 12:37
anyway, @roll -- looking at tabulator-py, it's quite beautiful
Paul Walsh
@pwalsh
Jul 25 2016 12:38
so @pudo the jts types in http://github.com/okfn/jsontableschema-py should be much better than when you last looked. my first version was extracted from goodtables and had an inconsistent API. Since then, several smarter people have tidied up many things
and in general, the API for types is similar/almost the same as that in typecast https://github.com/pudo/typecast (on purpose)
Friedrich Lindenberg
@pudo
Jul 25 2016 12:39
oh, awesome
Paul Walsh
@pwalsh
Jul 25 2016 12:39
btw i really like that you support roundtripping there
Friedrich Lindenberg
@pudo
Jul 25 2016 12:40
the other thing there is what I ripped out of messytables this weekend:
Paul Walsh
@pwalsh
Jul 25 2016 12:40
@roll @pudo if i understand the direction of typecast and jsontableschema-py correctly, we can essentially design it so that typecast provides the lower-level types (as a dependency of the jts lib), and jts builds on that
Friedrich Lindenberg
@pudo
Jul 25 2016 12:41
guesser = typecast.guesser(); guesser.add("299"); guesser.add(...); guesser.best
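One way such a guesser could work, as a rough sketch rather than the real typecast implementation: keep a list of candidate types ordered from most to least specific, drop every candidate that fails on a sample, and report the first survivor. The `Guesser` class and `CHECKS` list here are assumptions made for illustration.

```python
from datetime import datetime


def _is_integer(text):
    try:
        int(text)
        return True
    except ValueError:
        return False


def _is_date(text):
    try:
        datetime.strptime(text, '%Y-%m-%d')
        return True
    except ValueError:
        return False


# Ordered from most to least specific; 'string' accepts everything.
CHECKS = [
    ('integer', _is_integer),
    ('date', _is_date),
    ('string', lambda text: True),
]


class Guesser:
    def __init__(self):
        self.candidates = list(CHECKS)

    def add(self, text):
        # Keep only the types that can still represent every sample seen.
        self.candidates = [(n, c) for n, c in self.candidates if c(text)]

    @property
    def best(self):
        # 'string' never gets dropped, so the list is never empty.
        return self.candidates[0][0]


guesser = Guesser()
guesser.add("299")
guesser.add("42")
first_guess = guesser.best
guesser.add("2011-02-02")
second_guess = guesser.best
```

This variant is strict: one non-conforming value rules a type out. A vote-counting guesser would tolerate dirty data better, at the cost of a threshold to tune.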
Paul Walsh
@pwalsh
Jul 25 2016 12:41
the above is not a new issue, i wrote about using typecast as a dependency back in frictionlessdata/jsontableschema-py#26 but it did not happen
@pudo ok, it would be good to look at our guessing in the jts lib and see what you have now in typecast. presumably lots could be improved in the guessing algorithm
Friedrich Lindenberg
@pudo
Jul 25 2016 12:43
yeah I'd love that
I've been struggling a lot with effective date format guessing
trying to guess around 400 formats means it can be ultra-slow
so now I'm doing some fancy pre-checks which are weird
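The actual pre-checks are not shown in this conversation. As one possible illustration of the idea, the sketch below reduces each value to a cheap "shape" (runs of digits collapsed to `d`) and only runs `strptime` on formats with a matching shape. The format list and helper names are hypothetical, and the trick only covers purely numeric formats.

```python
import re
from datetime import datetime

# A tiny stand-in for the ~400 candidate formats.
FORMATS = ['%Y-%m-%d', '%d/%m/%Y', '%Y-%m-%dT%H:%M:%S', '%d.%m.%Y']


def shape(text):
    """Collapse every run of digits to 'd', e.g. '2011-02-02' -> 'd-d-d'."""
    return re.sub(r'\d+', 'd', text)


# Index formats by the shape of a sample date rendered in that format.
_SAMPLE = datetime(2011, 12, 25, 13, 14, 15)
SHAPES = {}
for fmt in FORMATS:
    SHAPES.setdefault(shape(_SAMPLE.strftime(fmt)), []).append(fmt)


def guess_formats(text):
    """Return the formats that parse text, trying only same-shaped ones."""
    matches = []
    for fmt in SHAPES.get(shape(text), []):
        try:
            datetime.strptime(text, fmt)
        except ValueError:
            continue
        matches.append(fmt)
    return matches
```

Values whose shape matches no known format are rejected with a single regex substitution and a dictionary lookup, without any parsing attempt.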
Paul Walsh
@pwalsh
Jul 25 2016 12:45
yeah, i think right now we are not doing format guessing there, only type guessing. i can imagine the hit on date format guessing to be quite extreme
Friedrich Lindenberg
@pudo
Jul 25 2016 12:46
another question of taste, I guess, is decimals: this currently generates Decimals, not floats. That's more precise, but also a good bit slower. Happy to change
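For reference, the precision side of that trade-off can be seen with the standard library alone (this is plain Python, not typecast code):

```python
from decimal import Decimal

# Binary floats cannot represent 0.1 or 0.2 exactly.
as_float = float('0.1') + float('0.2')

# Decimal keeps the digits exactly as they appeared in the source text.
as_decimal = Decimal('0.1') + Decimal('0.2')

float_is_exact = (as_float == 0.3)
decimal_is_exact = (as_decimal == Decimal('0.3'))
```

The exact result also stringifies back to `'0.3'`, which matters if values are meant to roundtrip.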
Paul Walsh
@pwalsh
Jul 25 2016 12:48
So @pudo, @roll will very very soon be making some changes in the jts lib, and then tabulator, and then, on top of all that, goodtables. I'd say this is right now a good chance to get typecast and jsontableschema in sync, and with a reasonable separation of concerns, presuming, of course, that you do not want to go the whole hog with jts support in typecast directly. WDYT?
Friedrich Lindenberg
@pudo
Jul 25 2016 12:49
what additional hog of JTS support is there?
I mean the concern of typecast is values
not complex structures
but on that level I'd like to support it fully :)
Paul Walsh
@pwalsh
Jul 25 2016 12:51
@pudo ok got it. can we ping you on it as soon as @roll gets to it (def. in the next week), and we can have an api design discussion in here?
Friedrich Lindenberg
@pudo
Jul 25 2016 12:51
perfect!
gimme a shout, this sounds fun :)
Paul Walsh
@pwalsh
Jul 25 2016 12:52
great.
Paul Walsh
@pwalsh
Jul 25 2016 12:58

@pudo we'd also be really happy to see you using tabulator :) , so when you have a play, please open some issues etc. the api is generally quite nice there.

Tabulator is currently focussed on reading, and @roll suggested (and i like the idea a lot) to also add a writer processor (usecase example: recode anything to utf-8 encoded csv as a side effect of the processing pipeline)
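To make the use case concrete, here is a standard-library sketch of recoding CSV bytes to UTF-8. It does not use tabulator, and the `recode_csv` helper is hypothetical; a writer processor would presumably do something similar at the end of the pipeline.

```python
import csv
import io


def recode_csv(raw, source_encoding):
    """Parse CSV bytes in source_encoding and re-emit them as UTF-8 bytes."""
    text = raw.decode(source_encoding)
    out = io.StringIO(newline='')
    writer = csv.writer(out)
    # newline='' leaves line endings alone so the csv module handles them.
    for row in csv.reader(io.StringIO(text, newline='')):
        writer.writerow(row)
    return out.getvalue().encode('utf-8')


latin1_bytes = 'name,city\r\nJosé,Köln\r\n'.encode('latin-1')
utf8_bytes = recode_csv(latin1_bytes, 'latin-1')
```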

Friedrich Lindenberg
@pudo
Jul 25 2016 12:59
it actually looks ideal for aleph
I don't want that to do too much magic
and this looks like just the right amount
Paul Walsh
@pwalsh
Jul 25 2016 13:00
Great. we will review PRs quickly and implement practical features swiftly :)
Friedrich Lindenberg
@pudo
Jul 25 2016 13:00
hm, and ODS is on the "wanna have list"?
Paul Walsh
@pwalsh
Jul 25 2016 13:00
frictionlessdata/tabulator-py#28
it is the next actual feature i want to implement, as there is demand, and it is important that we add ODS support to goodtables
Friedrich Lindenberg
@pudo
Jul 25 2016 13:02
would lxml be too heavy a dep?
i mean it's C, but it's also awesome fast
Paul Walsh
@pwalsh
Jul 25 2016 13:02
well, i want to replace chardet with cchardet, so.... i'm happy with C dependencies that significantly improve performance
roll
@roll
Jul 25 2016 13:26
@pudo thanks! @pwalsh We also heavily use pudo's dataset library in OpenTrials)
Friedrich Lindenberg
@pudo
Jul 25 2016 13:27
@roll: oh, cool, didn't realize!
roll
@roll
Jul 25 2016 13:28
About jts types - my main concern for now is that the things we call types are more like fields, based on common practice (different ORMs etc)
Friedrich Lindenberg
@pudo
Jul 25 2016 13:28
i.e. (value, typeinfo) bundles?
roll
@roll
Jul 25 2016 13:29
e.g. when instantiating a jts type we pass field.name, field.xxx etc into the constructor
and the type's behavior depends on things that really belong to the field
Friedrich Lindenberg
@pudo
Jul 25 2016 13:31
can you link me? trying to understand this
So the field (the jts spec object) is currently an attribute of the type.
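A hypothetical sketch of the inverted arrangement, where the field owns its descriptor and delegates to a type that knows only about values. None of these class names come from jsontableschema-py; only the descriptor keys (`name`, `type`, `constraints.required`) follow the JSON Table Schema spec.

```python
class IntegerType:
    """Value-level type: knows nothing about field names or constraints."""
    name = 'integer'

    def cast(self, text):
        return int(text)


TYPES = {'integer': IntegerType()}


class Field:
    """Field-level object: holds the descriptor and applies constraints."""

    def __init__(self, descriptor):
        self.name = descriptor['name']
        constraints = descriptor.get('constraints', {})
        self.required = constraints.get('required', False)
        self.type = TYPES[descriptor['type']]

    def cast_value(self, text):
        if text == '':
            if self.required:
                raise ValueError('%s is required' % self.name)
            return None
        return self.type.cast(text)


age = Field({'name': 'age', 'type': 'integer'})
parsed = age.cast_value('42')
missing = age.cast_value('')
```

With this split, the type layer could come from a values-only library while the field layer stays in the jts lib.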
Friedrich Lindenberg
@pudo
Jul 25 2016 13:33
wow you implemented a whole json schema-style thing there
roll
@roll
Jul 25 2016 13:33
not me)
Friedrich Lindenberg
@pudo
Jul 25 2016 13:33
haha :)
Paul Walsh
@pwalsh
Jul 25 2016 13:34
@roll yes, i agree on the field concept. it also became even more apparent when i spec'd out frictionlessdata/jsontableschema-models-js#1