These are chat archives for cltk/cltk_api

Mar 2018
Kevin Stadler
Mar 21 2018 12:36
@kylepjohnson @lukehollis I've had another go at reading the Chinese corpora with MyCapytain, but no luck. I've therefore added the implementation of a dedicated TEIXMLReader for CLTK to my proposal, which you should be able to access and comment on via the GSoC website. I'd be grateful for any feedback on the scope and level of detail of the current proposal. I'd be happy to add more technical details about the implementation by Tuesday if you feel it's necessary!
Luke Hollis
Mar 21 2018 23:41
Hi @kevinstadler, thank you for this contribution. I think that, as a community, we need to make a better decision about the textserver and the options there. We have a minimally-scoped cltk_json data format described here:
We ingest this JSON into a Postgres database with something like this: Depending on the goals of the organization (@kylepjohnson @pletcher @suheb @jtauber), we could continue to do something like this in the future. It sounds like, especially with @jtauber and Eldarion, there may be a more robust way of offering this text with metadata in the future?