Chris Watson
@watzon
[image: image.png]
Too bad I can't spell
Rémy Marronnier
@rmarronnier
I love this UI!
Chris Watson
@watzon
Me too :) I like having a progress bar
I would love to figure out a way to multithread the workflow and have one progress bar per thread, but idk if it's possible
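One way to get "one progress bar per thread" is to split the training samples into chunks, give each worker its own counter, and render one text bar per counter. The following is a minimal Python sketch under that assumption. `MultiProgress` and `train_chunk` are hypothetical names, the per-sample work is a placeholder, and redrawing the bars in place in a terminal is left out. It only shows the bookkeeping, not how BayesClassifier itself would be split up.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class MultiProgress:
    """Hypothetical helper: one progress counter (and text bar) per worker thread."""

    def __init__(self, totals):
        self.totals = list(totals)
        self.done = [0] * len(self.totals)
        self._lock = threading.Lock()  # guards self.done across threads

    def advance(self, worker, step=1):
        with self._lock:
            self.done[worker] += step

    def render(self, width=20):
        """Build one text bar per worker (no in-place terminal redraw)."""
        with self._lock:
            lines = []
            for i, (done, total) in enumerate(zip(self.done, self.totals)):
                filled = width * done // total if total else width
                bar = "#" * filled + "." * (width - filled)
                lines.append(f"worker {i}: [{bar}] {done}/{total}")
            return "\n".join(lines)


def train_chunk(worker, samples, progress):
    for sample in samples:
        _ = sample.lower()  # placeholder for the real per-sample training step
        progress.advance(worker)


samples = [f"sample {i}" for i in range(10)]
chunks = [samples[0::2], samples[1::2]]  # split the workload between two workers
progress = MultiProgress(len(chunk) for chunk in chunks)

with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
    futures = [pool.submit(train_chunk, w, chunk, progress) for w, chunk in enumerate(chunks)]
    for future in futures:
        future.result()  # re-raise any exception from a worker

print(progress.render())
```

The lock keeps the counters consistent while workers update them concurrently. Note that in CPython, threads mostly help with I/O-bound work, so a CPU-heavy training loop would need a different concurrency model to actually run faster.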
Chris Watson
@watzon
Training just finished on a massive amount of data
188 languages
Rémy Marronnier
@rmarronnier
> I would love to figure out a way to multithread the workflow and have one progress bar per thread, but idk if it's possible

I don't see why not. Doesn't apt-get do this when downloading several packages?
Anyway, it's so cool to train your own models with your own code. Congrats!
Chris Watson
@watzon
I'm sure that a multithreaded workload is possible, it's just going to require some heavy refactoring in the actual BayesClassifier class
Rémy Marronnier
@rmarronnier
Yeah, I guess. When MT is on by default in Crystal, we'll have to refactor our algos. (Good luck with GloVe :-p)
Chris Watson
@watzon
Luckily I already have the GloVe algo ready to go
Rémy Marronnier
@rmarronnier
yeah, you're right
Chris Watson
@watzon
It's technically multithreaded already, but it's using a library that spins up new threads itself
Rémy Marronnier
@rmarronnier
> but it's using a library that spins up new threads itself

I'll have to look at your code again, because apart from apatite I don't see any special library.
Chris Watson
@watzon
Oh it looks like I never pushed my last update
That's why
Rémy Marronnier
@rmarronnier
Oh nice! It's Christmas :-D
Chris Watson
@watzon
I got my tokenizer for the language classifier fixed and I'm retraining :D
Rémy Marronnier
@rmarronnier
How many languages will you go up to before you stop? :-p
PR ready for cadmiumcr/classifier and one waiting for cadmiumcr/distance :-)
Chris Watson
@watzon
Awesome!
Rémy Marronnier
@rmarronnier
Also, before I forget: it would be great to factor out the asterite-optimized (tm :-)) character tokenizer and integrate it into Cadmium::Tokenizer, and eventually into Cadmium::Ngrams. WDYT?
the one you used in your POC language detector
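For readers unfamiliar with the idea, a character tokenizer of this kind typically emits overlapping character n-grams. Below is a minimal Python sketch of that technique. It is not asterite's Crystal code or Cadmium's API; the function name `char_ngrams`, the lowercasing, and the single-space padding are assumptions chosen for illustration.

```python
from typing import List


def char_ngrams(text: str, n: int = 3, pad: str = " ") -> List[str]:
    """Return the overlapping character n-grams of `text`.

    The text is lowercased and padded with `pad` on both sides (an
    assumption for this sketch) so that n-grams at word boundaries are kept.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    padded = pad + text.lower() + pad
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


print(char_ngrams("Hello", 3))  # [' he', 'hel', 'ell', 'llo', 'lo ']
```

Padding makes the start and end of a word show up as distinct n-grams (" he", "lo "), which is useful signal for tasks such as language identification.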
Chris Watson
@watzon
I'm planning on it. I actually improved on it a little more, but it has some language-classifier-specific things in there right now
I'm going to modify the Case tokenizer
Rémy Marronnier
@rmarronnier
Wow, so fucking nice :-D
Chris Watson
@watzon
It's definitely going to end up a little more powerful
Definitely a little slower though
Rémy Marronnier
@rmarronnier
That will be so useful in many parts of Cadmium
Chris Watson
@watzon
Definitely :)
Chris Watson
@watzon
First commit is up if you want to play with it https://github.com/cadmiumcr/lang
Rémy Marronnier
@rmarronnier
I guess there were some communication missteps, because I have a question and mixed feelings about this: why was it moved directly to a new cadmiumcr repo (without discussing it first)?
You set up a great way to discuss additions/changes to the cadmiumcr ecosystem: cadmiumcr/rfcs. Why not use it?
Concerning the language identification module, what are your feelings, plans, thoughts?
To be clear: if you're not satisfied with cadmium_language_detector, I'm open to replacing it with your solution, wrapping our algos as engines behind a common API, etc.
Just say it ;-)
Chris Watson
@watzon
Yeah, I suppose it could've been RFC'd, but I just wanted something more accurate and different from what currently exists (I haven't actually seen any language detectors built on a classifier; they all rely on algorithms that are pretty finicky). The cadmium_language_detector project is cool, and we can probably do some API merging, but personally I'd like to go with whichever is more accurate for the actual algorithm.
I'm planning on writing some tests tomorrow, but in my testing so far cadmium_lang is pretty dang accurate, even with small text samples
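To make "a language detector built on a classifier" concrete, here is a tiny multinomial naive Bayes model over character trigrams. This is a hypothetical Python sketch, not the cadmium_lang implementation. The class name, the toy training sentences, and the Laplace (add-one) smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    padded = " " + text.lower() + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesLanguageClassifier:
    """Hypothetical multinomial naive Bayes over character n-grams."""

    def __init__(self, n=3):
        self.n = n
        self.ngram_counts = defaultdict(Counter)  # language -> n-gram counts
        self.doc_counts = Counter()               # language -> number of samples
        self.vocab = set()

    def train(self, text, language):
        grams = char_ngrams(text, self.n)
        self.ngram_counts[language].update(grams)
        self.doc_counts[language] += 1
        self.vocab.update(grams)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_lang, best_score = None, -math.inf
        for lang, counts in self.ngram_counts.items():
            # log prior + sum of add-one-smoothed log likelihoods
            score = math.log(self.doc_counts[lang] / total_docs)
            total = sum(counts.values())
            for gram in char_ngrams(text, self.n):
                score += math.log((counts[gram] + 1) / (total + vocab_size))
            if score > best_score:
                best_lang, best_score = lang, score
        return best_lang


clf = NaiveBayesLanguageClassifier()
clf.train("the quick brown fox jumps over the lazy dog", "en")
clf.train("this is the house that jack built", "en")
clf.train("le renard brun saute par-dessus le chien paresseux", "fr")
clf.train("c'est la maison que jacques a construite", "fr")
print(clf.classify("the dog and the house"))  # en
print(clf.classify("le chien et la maison"))  # fr
```

Working in log space avoids floating-point underflow when many n-gram probabilities are multiplied, and smoothing keeps an unseen n-gram from zeroing out a language entirely. A real model would of course be trained on far more text per language.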
Rémy Marronnier
@rmarronnier
> but personally I'd like to go with whichever is more accurate for the actual algorithm

I'm OK with that criterion.

Can't wait to see your test results. That makes me think, I should write that cadmiumcr/evaluation proposal.

Chris Watson
@watzon
Evaluation? Do tell
Rémy Marronnier
@rmarronnier
Using the Wayback Machine (:-p):

What kind of evaluation?
It will be a collection of Crystal scripts that:
1. Download a dataset
2. Run a Cadmium module against it
3. Compare the results with the expected values

It will be useful to evaluate some algos (Language identification, POS tagging, sentiment analysis) and let our users check for themselves the accuracy of our tools
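Step 3 of such a script mostly comes down to comparing predictions against gold labels and reporting accuracy. A minimal Python sketch, assuming the dataset has already been downloaded and loaded as (text, expected_label) pairs and that the module under test is exposed as a plain function; `evaluate`, `always_english`, and the toy data are hypothetical and not part of any Cadmium API.

```python
from typing import Callable, Iterable, Tuple


def evaluate(predict: Callable[[str], str],
             dataset: Iterable[Tuple[str, str]]) -> float:
    """Return the fraction of (text, expected) pairs that predict() gets right."""
    total = correct = 0
    for text, expected in dataset:
        total += 1
        if predict(text) == expected:
            correct += 1
    if total == 0:
        raise ValueError("dataset is empty")
    return correct / total


def always_english(text: str) -> str:
    """Toy stand-in for the module under test."""
    return "en"


# Toy stand-in for a downloaded dataset (step 1)
dataset = [("hello world", "en"), ("bonjour le monde", "fr"), ("hola mundo", "es")]
print(f"accuracy: {evaluate(always_english, dataset):.2%}")  # accuracy: 33.33%
```

Plain accuracy is the simplest metric. For tasks with imbalanced labels (sentiment, rare POS tags), per-class precision and recall would probably be more informative.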
Chris Watson
@watzon
Ahh yeah
Forgot about that haha. Feel free to submit a proposal :)
Chris Watson
@watzon
Btw not sure if you noticed, but I bought an actual domain https://cadmiumcr.com/
Rémy Marronnier
@rmarronnier
.com? Things are getting serious :-D
Chris Watson
@watzon
Oh yes lol
The other domain was a shitty free one that ended up breaking after a week
Rémy Marronnier
@rmarronnier
Free? I didn't know that was even possible!
Chris Watson
@watzon
Yeah but the free domains are very unreliable
Rémy Marronnier
@rmarronnier
I was thinking about the Cadmium roadmap. Do you have any thought on the subject ?
Chris Watson
@watzon
We need to make one haha. Now that things are pretty well separated, we also need to work on fleshing out the main repo. I don't know if I want to make it a reference repository or actually include everything all at once