These are chat archives for cltk/cltk

14th
Feb 2018
Dhruv Apte
@the-ethan-hunt
Feb 14 2018 02:53
Welcome aboard, @Sedictious! Have a look at https://github.com/cltk/tutorials and the beginner's exercises :smile:
suvamdubey
@suvamdubey
Feb 14 2018 04:02
Is it necessary to know a classical language in order to contribute?
Asutosh
@Asutosh989
Feb 14 2018 04:03
Hi, I have the same question as @suvamdubey
Shashank Shekhar
@thunderandrain
Feb 14 2018 04:06
Me too.
Nishchith Shetty
@inishchith
Feb 14 2018 04:22
@suvamdubey @Asutosh989 @thunderandrain If you're trying to contribute in that domain, you're expected to have a basic grasp of the language, but there are also other issues that you can go ahead and solve :smile:
Shashank Shekhar
@thunderandrain
Feb 14 2018 04:23
But is knowing a language really necessary for it? We only have to do natural language processing, right?
Asutosh
@Asutosh989
Feb 14 2018 04:23
@inishchith can you point out the beginner issues, for starters?
Nishchith Shetty
@inishchith
Feb 14 2018 04:27
@Asutosh989 you can get started by looking at issues with tags like 'documentation', 'bug', and 'enhancement'. You can also open a ticket if you find a bug. Hope this helps :smile:
@thunderandrain for example, if you're trying to add a feature for a classical language, you need at least a basic understanding of it. You can start with the tags mentioned above to get a sense of what I mean :smile: @kylepjohnson correct me if I'm wrong
Asutosh
@Asutosh989
Feb 14 2018 04:38
Thanks @inishchith
Nishchith Shetty
@inishchith
Feb 14 2018 04:39
:smile: :+1:
Shashank Shekhar
@thunderandrain
Feb 14 2018 04:53
Thanks @inishchith
Nishchith Shetty
@inishchith
Feb 14 2018 05:14
:+1:
Dhruv Apte
@the-ethan-hunt
Feb 14 2018 07:08
@Asutosh989, you can try the issues marked Easy and also follow https://github.com/cltk/cltk/wiki/Quickstart-for-contributors
Chatziargyriou Eleftheria
@Sedictious
Feb 14 2018 08:48
I am currently trying to add a Swadesh list for Old Portuguese, and it seems that some words don't exist in the language. What is the intended formatting in this case (e.g. "-")? Edit: It actually looks like this has already been done by another user, who simply used empty strings. I'm still trying to get the hang of GitHub :/
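A minimal sketch of the empty-string convention mentioned above, assuming a Swadesh list is stored as a plain Python list that stays aligned by position with the list of concepts. Everything here is hypothetical and for illustration only: `concepts`, `word_list`, the placeholder words, and `missing_entries` are not actual cltk data or API.

```python
# Hypothetical data: the first few Swadesh concepts and placeholder words.
# An empty string marks a concept with no attested word in the language.
concepts = ["I", "you (singular)", "he", "we"]
word_list = ["word_a", "", "word_c", ""]


def missing_entries(concepts, words):
    """Return the concepts whose slot in the word list is an empty string."""
    if len(concepts) != len(words):
        # Dropping a slot instead of leaving it empty would shift every
        # later word onto the wrong concept.
        raise ValueError("word list must stay aligned with the concept list")
    return [concept for concept, word in zip(concepts, words) if word == ""]


print(missing_entries(concepts, word_list))
```

Keeping an empty string in the slot, rather than removing it, preserves the positional match between concepts and words, so lists for different languages can still be compared index by index.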
Vikrant Goyal
@vikrant97
Feb 14 2018 10:47
I want to add some corpora to cltk core in either Hindi or Punjabi, to get to know the cltk codebase better. But I don't know about the copyright issues. Can anyone please tell me which corpora are eligible to be added to cltk? Thanks.
Vikrant Goyal
@vikrant97
Feb 14 2018 12:48
Can anyone please tell me which tests need to be run when I change the codebase and rebuild it? I have tested it with nosetests only.
Chetanya
@chetanya-shrimali
Feb 14 2018 13:25
Hey, I am new to this community and want to contribute for GSoC 2018. I have been contributing to open source for a long time and have a good knowledge of machine learning and information retrieval (text learning). Where should I begin?
Asutosh
@Asutosh989
Feb 14 2018 14:53
@the-ethan-hunt thanks, I will be going through it.
Chatziargyriou Eleftheria
@Sedictious
Feb 14 2018 15:14

Hey @chetanya-shrimali and welcome! Now, I am new myself so take my advice with a grain of salt.

As the mentors have pointed out, you will probably need to start by going through the easier issues and checking out the beginner's exercises (https://github.com/cltk/cltk/wiki/Beginners'-exercises). Meanwhile, it would probably help to experiment with the software itself and familiarize yourself with the documentation.

Dhruv Apte
@the-ethan-hunt
Feb 14 2018 15:20

> I want to add some corpora to cltk core in either Hindi or Punjabi language to get to know about the cltk codebase better. But I don't know about the copyright issues. Can anyone please help, which corpus is eligible to be added to cltk? Thanks.

@vikrant97, I would suggest looking through the corpora where ancient Punjabi is available. Ancient Punjabi here refers to the 10th century or before, as the language borrowed heavily from Persian and Arabic after the Arab invasions of India. :smile:

Vikrant Goyal
@vikrant97
Feb 14 2018 17:43
@the-ethan-hunt, thanks for your response, I will surely look into this. But for now I took on the task of adding a Swadesh list for Punjabi to cltk and submitted a PR. I request any of the mentors to please take the time to review my PR #677. Thanks :smile:
Asutosh
@Asutosh989
Feb 14 2018 18:37
Hi, I would like to update the corpora for the Oriya language.
I will be submitting a PR shortly.
@the-ethan-hunt What format should the Oriya corpus be in so that it can be integrated?
Pabitra Lenka
@pabitralenka
Feb 14 2018 23:57
@Sedictious yes, there are some missing entries in the Old Portuguese Swadesh list. If you feel you can fill in those missing entries, please open a ticket explaining the issue and submit a PR for it.