SeenivasanSeeni
@Seenivasanseeni
@Erikishiru take a look at this: cltk/cltk#848
and also cltk/cltk#847 to transfer ownership
Ghost
@ghost~5bd5e42dd73408ce4fad0b93
@Seenivasanseeni is there any interpretable way to incorporate numbers in classical Tamil? They're available only in Kalvettu afaik
SeenivasanSeeni
@Seenivasanseeni
@SunilKu12355774_twitter I don't understand. Can you explain ?
Ghost
@ghost~5bd5e42dd73408ce4fad0b93
I am just confused because ancient Tamil numerals are not available in text format.
How can they be scraped?
SeenivasanSeeni
@Seenivasanseeni
@SunilKu12355774_twitter You are correct. We will try to find them but with no assurance.
Piyush Yadav
@Erikishiru
They are available in UTF-8 encoding, and we can use this for conversions: https://pypi.org/project/Open-Tamil/
Ghost
@ghost~5bd5e42dd73408ce4fad0b93
@Erikishiru does it also include ancient Tamil numerals?
Piyush Yadav
@Erikishiru
Ghost
@ghost~5bd5e42dd73408ce4fad0b93
Yep just saw it.
SeenivasanSeeni
@Seenivasanseeni
@Erikishiru Can you review #848?
SeenivasanSeeni
@Seenivasanseeni
We can also refer to this: https://github.com/AshokR/TamilNLP
Piyush Yadav
@Erikishiru
https://github.com/AshokR/TamilNLP is licensed; read the README file
Kyle P. Johnson
@kylepjohnson

@Seenivasanseeni

Can text-books of schools and newspaper articles be used even though they are still owned by others ?

No, they cannot. Also, we only use the pre-modern form of languages, so nothing after the 17th or 18th century (sometimes earlier).

@/all Instead of beginning work on tasks like adding corpora, your time is better spent doing research on available, ancient resources. If you have some answers to the 6 questions at the end of our blog post [http://cltk.org/blog/2018/12/30/under-resourced-languages-cltk.html] then you can email me and I'll put you in touch with a potential mentor.
Ghost
@ghost~5bd5e42dd73408ce4fad0b93
I found this. Can we use it? @Seenivasanseeni, @Erikishiru
Piyush Yadav
@Erikishiru
@SunilKu12355774_twitter looks good.
Did you try filling in the registration form and accessing the data?
I was not able to get through.
Ghost
@ghost~5bd5e42dd73408ce4fad0b93
Not exactly; the site is showing some sort of error. But we can still scrape data from sites like
Ghost
@ghost~5bd5e42dd73408ce4fad0b93
I'll get to work on it.
Tanuj Garg
@tanuj208
Hi, I'm new and I would like to contribute, can someone get me started?
Tanuj Garg
@tanuj208
I have done the beginner's exercises
Piyush Yadav
@Erikishiru
Read the contributors' guide.
Find some issues and get working.
Ashish Ratnawat
@ashish-ratn
Hi @kylepjohnson, I am a computer science undergraduate student interested in deep learning and its intersection with NLP.
Ashish Ratnawat
@ashish-ratn
I want to participate in GSoC this summer. I have done the basic tutorials and exercises. I have read the blog post http://cltk.org/blog/2018/12/30/under-resourced-languages-cltk.html and I have a few queries:
1. Is it necessary to add more datasets, or can we work with the old datasets (like Greek and Latin) and add new algorithms or make the older ones more effective?
2. Can I work on a language like Greek, Latin, or older versions of Sanskrit or Hindi, applying NLP algorithms to do things like POS tagging, translation, and so on?
3. If I have answers to the 6 questions that the blog post asks, where do I send them to you?
AadilMehdi J Sanchawala
@aadilmehdis
Hey, I am new around here. Can someone tell me how I can start contributing?
AadilMehdi J Sanchawala
@aadilmehdis
I've gone through the beginner's guide and documentation
jerryfrancis-97
@jerryfrancis-97
Hi, I'm new here. What should I do first to contribute?
Kyle P. Johnson
@kylepjohnson
@/all read our blog post (and the project page has been updated, too)
This year, we do NOT want small contributions. We want you to focus on making very good project proposals, instead.
Kyle P. Johnson
@kylepjohnson

@ashish-ratn

we can work with the old datasets (like Greek and Latin)

Of course, reuse is fine, but likely we don't have nearly enough.

Think about the tasks you want to accomplish. And remember, you must know a language somewhat well if you want to work on it.

@Erikishiru We prefer that you not give the other students directions.
Ghost
@ghost~5bd5e42dd73408ce4fad0b93
When can we start submitting our proposals for GSoC? @kylepjohnson
Kyle P. Johnson
@kylepjohnson
@SunilKu12355774_twitter No, not yet, but focus on your proposal. If you have answers to the 6 questions in our blog post, DM me here.
Ghost
@ghost~5bd5e42dd73408ce4fad0b93
@kylepjohnson thanks
sainimohit23
@sainimohit23
@kylepjohnson Do language corpora have to be in their original scripts?
For example, I did some research on the Maithili language. It is primarily spoken in the eastern parts of India and in Nepal. The original script for Maithili is Tirhuta, but since the 20th century writers have preferred the Devanagari script. Tirhuta texts have not been digitized yet, so all of the digitized Maithili texts are in Devanagari.
Kyle P. Johnson
@kylepjohnson
Do language corpora have to be in their original scripts?
@sainimohit23 No, they do not. For example, last summer's cuneiform/Akkadian project was done entirely in the Latin alphabet. This is because the scholars who digitize these texts choose the Latin alphabet (and we must use what they have created).
APOORV SACHAN
@apoos-maximus
Hi, I am interested in contributing to the CLTK project.
Could someone direct me to the right resources?
Nishchith Shetty
@inishchith
Welcome to CLTK, @apoos-maximus! You can have a look at the beginners' exercises and also follow up with the quickstart. :smile:
APOORV SACHAN
@apoos-maximus
@inishchith thanks!
APOORV SACHAN
@apoos-maximus
Would CLTK work with a Python 3.7 installation?