These are chat archives for cltk/cltk

14th
Mar 2016
Nathan D. Smith
@nathans
Mar 14 2016 00:14
It appears that the Greek Betacode-to-Unicode conversion does not support `+` to denote diaeresis (I discovered this when running it on the CATSS lxxmorph text)
I don't know if there is a betacode standard per se
but if I get some time in the next couple days I'll see if I can work up a PR.
And @kylepjohnson, yes, you can get in touch re: Elasticsearch.
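For illustration only, here is a minimal sketch of how `+` could be handled when converting Beta Code to Unicode. It is not the CLTK's implementation, and the names `LETTERS`, `DIACRITICS` and `beta_to_unicode` are hypothetical. Each Beta Code diacritic is mapped to a Unicode combining character: `+` becomes U+0308 COMBINING DIAERESIS. The string is then NFC-normalized so that, for example, `i+` composes to ϊ. The marks are reordered before normalizing because NFC only composes Greek characters such as ΐ when the diaeresis (or breathing) precedes the accent. The sketch assumes lowercase letters only and ignores `*` capitals and final sigma.

```python
import unicodedata

# Hypothetical minimal Beta Code letter table (lowercase Greek only).
LETTERS = {
    "a": "α", "b": "β", "g": "γ", "d": "δ", "e": "ε", "z": "ζ", "h": "η",
    "q": "θ", "i": "ι", "k": "κ", "l": "λ", "m": "μ", "n": "ν", "c": "ξ",
    "o": "ο", "p": "π", "r": "ρ", "s": "σ", "t": "τ", "u": "υ", "f": "φ",
    "x": "χ", "y": "ψ", "w": "ω",
}

# Beta Code diacritic -> (Unicode combining character, rank).
# Rank sets the attachment order: breathings and diaeresis first, then
# accents, then iota subscript, which is the order NFC needs in order to
# compose precomposed Greek characters such as U+0390.
DIACRITICS = {
    ")": ("\u0313", 0),   # smooth breathing
    "(": ("\u0314", 0),   # rough breathing
    "+": ("\u0308", 0),   # diaeresis
    "/": ("\u0301", 1),   # acute
    "\\": ("\u0300", 1),  # grave
    "=": ("\u0342", 1),   # circumflex (perispomeni)
    "|": ("\u0345", 2),   # iota subscript (ypogegrammeni)
}


def beta_to_unicode(text: str) -> str:
    """Convert lowercase Beta Code to NFC Unicode Greek (sketch)."""
    out = []
    marks = []  # (rank, combining char) pending for the current base letter

    def flush():
        # sort() is stable, so marks of equal rank keep their source order.
        marks.sort(key=lambda m: m[0])
        out.extend(ch for _, ch in marks)
        marks.clear()

    for ch in text:
        if ch in DIACRITICS:
            combining, rank = DIACRITICS[ch]
            marks.append((rank, combining))
        else:
            flush()
            # Beta Code often writes lowercase Greek with Latin capitals.
            out.append(LETTERS.get(ch.lower(), ch))
    flush()
    return unicodedata.normalize("NFC", "".join(out))
```

With this sketch, `beta_to_unicode("i+")` gives ϊ (U+03CA), and both `"i/+"` and `"i+/"` give ΐ (U+0390), whichever order the source text writes the diacritics in.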
Kyle P. Johnson
@kylepjohnson
Mar 14 2016 00:53
@manu-chroma : For us, a "corpus" (plural: corpora) is a collection of text documents, for example those you see in here with "text" in their name: https://github.com/cltk
Hi @PengFoo, I got your email and will reply tonight.
Kyle P. Johnson
@kylepjohnson
Mar 14 2016 00:58
@nathans Betacode is a standard – but that doesn't mean everyone follows it! For reference, you could use these two guides: http://www.tlg.uci.edu/encoding/BCM.pdf
Kyle P. Johnson
@kylepjohnson
Mar 14 2016 01:15
Please let me know if you run into any thorny issues in how I implemented it. I wouldn't say that what I did was over-thought.
Sourav Singh
@souravsingh
Mar 14 2016 18:24
@kylepjohnson I have made some progress in writing a corpus for Anglo-Saxon(Old English) Language
Should I transfer the repo ownership?
Kyle P. Johnson
@kylepjohnson
Mar 14 2016 18:34
Hi, thank you
@souravsingh Could you please note where you got it from and what its licensing is? Then we'll transfer it.
Sourav Singh
@souravsingh
Mar 14 2016 18:40
I obtained the resource from here: http://www.sacred-texts.com/neu/ascp/ . It is described there as public domain, so we should not need permission.
Kyle P. Johnson
@kylepjohnson
Mar 14 2016 18:42
Ok, great. Give that link in the README, as well as say in it that the texts are Public Domain. Remember, even if a text is free to redistribute, we need to remind our users downstream of this :)
Rob Jenson
@ferthalangur
Mar 14 2016 19:53
Hey folks, not to pee on the cornflakes here, but before republishing, you want to be sure that the material they published that you use is really in the Public Domain. The statement on their site is quite nuanced:
```
The texts presented here are either original scans from books and articles clearly in the public domain, material which has been presented elsewhere on the Internet, or material included under fair use conditions in printed anthologies.
```
Kyle P. Johnson
@kylepjohnson
Mar 14 2016 19:55
@ferthalangur It's good to be a stickler on this kinda stuff. Thank you!
I'll review myself, perhaps write to the site owner, and make a decision
Rob Jenson
@ferthalangur
Mar 14 2016 19:56
Hi @kylepjohnson. Yeah, in my previous? life as an archivist, you would be amazed at what people believed constituted "Fair Use" or "in the Public Domain."
Kyle P. Johnson
@kylepjohnson
Mar 14 2016 19:57
Ha, I've heard of that. "The Internet is Public Domain, man"
Rob Jenson
@ferthalangur
Mar 14 2016 19:57
If they were volunteer-scanned from texts that have passed into the Public Domain, no problem.
OTOH -- "It is on a web or FTP site" ... not so much. :)
Kyle P. Johnson
@kylepjohnson
Mar 14 2016 19:58
Gotcha. I may follow up with you by email if I need help.
Rob Jenson
@ferthalangur
Mar 14 2016 19:58
No problem. Whatever I can do.
Sourav Singh
@souravsingh
Mar 14 2016 21:04
@kylepjohnson As a precaution, I have written to the site owners to ask for permission to use the content.
@ferthalangur Thanks for the info. I didn't quite notice that statement.