These are chat archives for cltk/cltk

25th
Sep 2018
Ben Nagy
@bnagy
Sep 25 2018 09:09
Update: the Collatinus app now includes a statistical POS tagger (https://projet.biblissima.fr/en/news/new-version-collatinus-now-available). It uses probabilities from LASLA (1.5 million words analysed by hand). The tagger isn't in pycollatinus, but if you install the standalone app, it has a TCP server that is easy to talk to with a thin Python shim.
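For anyone curious what such a shim could look like, here is a minimal sketch of a generic TCP client. It is not the Collatinus protocol: the function name `query_tcp_server` is hypothetical, and the code assumes the server reads the request until the client closes its write side, then sends a UTF-8 reply and closes the connection. The real host, port, and request syntax would need to be taken from the app's own documentation.

```python
import socket


def query_tcp_server(host: str, port: int, request: str, timeout: float = 5.0) -> str:
    """Send one request over TCP and return the server's full reply as text.

    Assumption (not taken from Collatinus): the server replies once it sees
    end-of-input and then closes the connection.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request.encode("utf-8"))
        # Half-close the socket so the server knows the request is complete.
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty read means the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8")
```

Usage would be something like `reply = query_tcp_server("localhost", port, text)`, with `port` and the format of `text` set to whatever the running server expects. If the server keeps the connection open instead of closing it, the read loop will wait until the timeout fires, so the loop would need a different end-of-reply rule.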
Kyle P. Johnson
@kylepjohnson
Sep 25 2018 16:29
Hey Ben, thank you for raising all these points. We devs are aware of some of the shortcomings of the lemma / POS toolchain, but it takes perspectives like yours to make sure we are on the same track. I'll reply in some of your gists, then we can connect there or here …
Kyle P. Johnson
@kylepjohnson
Sep 25 2018 16:47
@bnagy to get us started, two comments here: https://gist.github.com/bnagy/2e236c52c174435a459778b299323636#gistcomment-2716163 I don't think this will solve your problems, but it hopefully explains where we're at.
Kyle P. Johnson
@kylepjohnson
Sep 25 2018 16:58

About your sfst-python -- I'm intrigued!

I assume your goal is to support any compatible data set (I see 4 available on the SFST homepage). If so, then when this is ready would you help us add your method into the CLTK?

I would leave it to your discretion how (or whether) to do this. My goal is for the CLTK to have implementations, or at least wrappers, of the most important NLP algos. If that doesn't make sense for certain projects, then I would at least like our docs to be a place that points new users in the right direction (e.g., this is what someone is working on for SyntaxNet right now).

Let me know what you are thinking. There's more to talk about, I'm sure, and I may be misunderstanding your goals. I see you're a professional programmer, so to be honest you probably know better than I do on some of these topics.

Kyle P. Johnson
@kylepjohnson
Sep 25 2018 17:15
@bnagy one other related package, which you may not have seen in the issues: cltk/cltk#834