These are chat archives for cltk/cltk

13th
May 2016
James Tauber
@jtauber
May 13 2016 00:20
I'm trying to think of a Greek translation for "Parsey McParseFace" :-)
I guess we just need VERB->deverbal-adjective + [VERB + "face"]->patronymic
for some VERB roughly equivalent to "to parse"
Kyle P. Johnson
@kylepjohnson
May 13 2016 00:28
Brilliant.
le/gw isn't far off. We should check the Eng-Grk dict that Chicago hosts
James Tauber
@jtauber
May 13 2016 02:49
@kylepjohnson I haven't looked at any of the pos-tagging stuff but does any of it use morphology or any kind of word-internal cues or are they all "outside / external" tagging?
Kyle P. Johnson
@kylepjohnson
May 13 2016 03:40
I've only skimmed the article. http://arxiv.org/pdf/1603.06042v1.pdf ...
"following features on a window ±3 tokens centered at the current focus token: word, cluster, character n-gram up to length 3." (p. 5)
What they mean by character n-gram I don't know (assuming the end of the word)
James Tauber
@jtauber
May 13 2016 03:43
@kylepjohnson I actually meant existing POS stuff used in CLTK
Kyle P. Johnson
@kylepjohnson
May 13 2016 03:44
Oh haha. Sorry I've got TF on my mind a lot these days.
James Tauber
@jtauber
May 13 2016 03:44
like the CRF model you just trained
Kyle P. Johnson
@kylepjohnson
May 13 2016 03:44
CRF I don't know yet. It's well documented tho.
The TnT is really interesting. And way back I messed around with custom prefix/suffix algos but didn't do anything great for Greek or Latin
James Tauber
@jtauber
May 13 2016 03:46
character n-grams probably do a decent job although if they are trigrams and rely on a stop character for end-of-word, they're only getting the last two characters :-)
James Tauber
@jtauber
May 13 2016 04:09
although it's given me an idea for a little experiment and blog post :-)
James Tauber
@jtauber
May 13 2016 04:17
(i guess trigrams are fine because co-incidence of, say, ομε, μεν, and εν# would indicate a lot)
Kyle P. Johnson
@kylepjohnson
May 13 2016 04:20
That seems right
If you want to test any of this, including SyntaxNet, go for it
James Tauber
@jtauber
May 13 2016 04:21
or θησ ησο σον οντ ντα ται αι# :-)
oh, I'd definitely like to play around with it
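A minimal sketch of the character-trigram idea discussed above, for anyone who wants to try it. It assumes unaccented input and uses "#" as the end-of-word stop character, as in the examples in the chat; the function name `char_ngrams` and the sample forms λυομεν and λυθησονται are illustrative only and are not taken from CLTK or SyntaxNet.

```python
def char_ngrams(word, n=3, stop="#"):
    """Return the character n-grams of `word`, with a stop character
    appended so that the final n-gram marks the end of the word."""
    padded = word + stop
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


# Hypothetical sample forms, written without accents.
for form in ["λυομεν", "λυθησονται"]:
    grams = char_ngrams(form)
    print(form, grams)
    # The last trigram holds only the final two letters plus the stop character.
    print("  word-final trigram:", grams[-1])
```

The word-final trigram on its own carries just two letters of the ending, but the overlapping trigrams before it (ομε, μεν, εν# or θησ, ησο, σον, οντ, ντα, ται, αι#) together cover the whole ending, which is the point made above.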