Chris Watson
@watzon
Oh it looks like I never pushed my last update
That's why
Rémy Marronnier
@rmarronnier
oh nice ! it's Christmas :-D
Chris Watson
@watzon
I got my tokenizer for the language classifier fixed and I'm retraining :D
Rémy Marronnier
@rmarronnier
At how many languages will you stop ? :-p
PR ready for cadmiumcr/classifier and one waiting for cadmiumcr/distance :-)
Chris Watson
@watzon
Awesome!
Rémy Marronnier
@rmarronnier
Also before I forget, it would be great to factor out the asterite optimized (tm :-)) character tokenizer to integrate it in Cadmium::Tokenizer and eventually in Cadmium::Ngrams WDYT ?
the one you used in your POC language detector
Chris Watson
@watzon
I'm planning on it. I actually improved on it a little more, but it has some language classifier specific things in there right now
I'm going to modify the Case tokenizer
Rémy Marronnier
@rmarronnier
Wow, so fucking nice :-D
Chris Watson
@watzon
It's definitely going to end up a little more powerful
Definitely a little slower though
Rémy Marronnier
@rmarronnier
That will be so useful in many parts of Cadmium
Chris Watson
@watzon
Definitely :)
Chris Watson
@watzon
First commit is up if you want to play with it https://github.com/cadmiumcr/lang
Rémy Marronnier
@rmarronnier
I guess there were some communication missteps because I have a question and mixed feelings about this : Why was it directly moved to a new cadmiumcr repo (without discussing it first) ?
You set up a great way to discuss additions/changes to the cadmiumcr ecosystem : the cadmiumcr/rfcs. Why not use it ?
Concerning the language identification module, what are your feelings, plans, thoughts ?
To be clear : if you're not satisfied with cadmium_language_detector, I'm open to replacing it with your solution, wrapping our algos as engines behind a common API, etc.
Just say it ;-)
Chris Watson
@watzon
Yeah I suppose it could've been RFCd, but I just wanted something more accurate and different from what currently exists (since I haven't actually seen any language detectors made using a classifier; they all rely on algorithms that are pretty finicky). The cadmium_language_detector project is cool, and we can probably do some API merging, but personally I'd like to go with whichever is more accurate for the actual algorithm.
I'm planning on writing some tests tomorrow, but in my testing so far cadmium_lang is pretty dang accurate, even with small text samples
Rémy Marronnier
@rmarronnier

but personally I'd like to go with whichever is more accurate for the actual algorithm

I'm ok with that criterion.

Can't wait to see your test results. That makes me think, I should write that cadmiumcr/evaluation proposal.

Chris Watson
@watzon
Evaluation? Do tell
Rémy Marronnier
@rmarronnier

Using the way back machine (:-p) :

What kind of Evaluation?
It will be a collection of Crystal scripts that :
1 - Download a dataset
2 - Run a Cadmium::module against it
3 - Compare the results with the expected values

It will be useful to evaluate some algos (Language identification, POS tagging, sentiment analysis) and let our users check for themselves the accuracy of our tools
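
The three steps above could look roughly like the sketch below. It is written in Python rather than Crystal purely for illustration, the dataset is inlined instead of downloaded, and `toy_detect` is a hypothetical stand-in for a Cadmium module, not an actual Cadmium API.

```python
from typing import Callable, Iterable, Tuple


def accuracy(samples: Iterable[Tuple[str, str]], predict: Callable[[str], str]) -> float:
    """Return the fraction of (text, expected_label) pairs that predict() gets right."""
    samples = list(samples)
    if not samples:
        raise ValueError("empty dataset")
    correct = sum(1 for text, expected in samples if predict(text) == expected)
    return correct / len(samples)


def toy_detect(text: str) -> str:
    """Hypothetical detector: answers "fr" if the word "le" appears, else "en"."""
    return "fr" if " le " in f" {text.lower()} " else "en"


# Step 1 (assumed): a real script would download this; here it is inlined.
DATASET = [
    ("the cat sleeps", "en"),
    ("le chat dort", "fr"),
    ("bonjour tout le monde", "fr"),
    ("salut", "fr"),  # the toy detector misses this one
]

# Steps 2 and 3: run the module on every sample and compare with the expected labels.
score = accuracy(DATASET, toy_detect)
print(f"accuracy: {score:.2f}")
```

Swapping `toy_detect` for a wrapper around a real detector would let two implementations be scored on the same dataset.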
Chris Watson
@watzon
Ahh yeah
Forgot about that haha. Feel free to submit a proposal :)
Chris Watson
@watzon
Btw not sure if you noticed, but I bought an actual domain https://cadmiumcr.com/
Rémy Marronnier
@rmarronnier
.com ? Things are getting serious :-D
Chris Watson
@watzon
Oh yes lol
The other domain was a shitty free one that ended up breaking after a week
Rémy Marronnier
@rmarronnier
free ? I didn't know that was even possible !
Chris Watson
@watzon
Yeah but the free domains are very unreliable
Rémy Marronnier
@rmarronnier
I was thinking about the Cadmium roadmap. Do you have any thoughts on the subject ?
Chris Watson
@watzon
We need to make one haha, now that things are pretty well separated we also need to work on getting the main repo fleshed out. I don't know if I want to make it a reference repository or actually include everything all at once
Rémy Marronnier
@rmarronnier
Oh you mean putting all modules in the repo instead of shard-linking to them ?
Chris Watson
@watzon
Exactly. Just a "one time import" type of thing.
But idk if I like that idea
Rémy Marronnier
@rmarronnier
We need to have one big repo to generate api docs for the website, and I don't see how we can do that with separate shards
Chris Watson
@watzon
True, but at the same time the documentation generator isn't capable of generating docs for dependencies yet
I'm actually thinking of setting up a workflow to generate all the docs
Rémy Marronnier
@rmarronnier
that's what I feared
that'd be awesome
Chris Watson
@watzon
All we need is a docker container that clones everything, cds into each folder, runs crystal docs, clones the website, and moves all of the generated docs into a subfolder
Then pushes the updated site
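
A minimal sketch of that workflow, written as a Python function that only builds the list of commands (nothing is executed). The `website` repo name and the `api/` subfolder are assumptions made for illustration; `lang` and `classifier` are the shards mentioned in this chat, and `-o` is the output-directory option of `crystal docs`.

```python
from typing import List, Tuple

Step = Tuple[str, List[str]]  # (working directory, command argv)


def docs_plan(org_url: str, shards: List[str], site_repo: str) -> List[Step]:
    """Build the ordered commands for the docs workflow without running them."""
    # Clone the website first so every shard can write straight into it.
    plan: List[Step] = [(".", ["git", "clone", f"{org_url}/{site_repo}.git", "site"])]
    for shard in shards:
        plan.append((".", ["git", "clone", f"{org_url}/{shard}.git", shard]))
        # One subfolder per shard (assumed layout: site/api/<shard>).
        plan.append((shard, ["crystal", "docs", "-o", f"../site/api/{shard}"]))
    # Publish the regenerated docs.
    plan.append(("site", ["git", "add", "api"]))
    plan.append(("site", ["git", "commit", "-m", "Update API docs"]))
    plan.append(("site", ["git", "push"]))
    return plan


plan = docs_plan("https://github.com/cadmiumcr", ["lang", "classifier"], "website")
for cwd, argv in plan:
    print(f"({cwd}) {' '.join(argv)}")
```

Inside the container each step could then be run with `subprocess.run(argv, cwd=cwd, check=True)`.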
Rémy Marronnier
@rmarronnier
hmm maybe move all the folders into a single cadmium one, else we'll have conflicting index files
Chris Watson
@watzon
Yeah I'm thinking for now they'll all be in individual folders
Eventually I want to be able to group them all together though
Rémy Marronnier
@rmarronnier
ok I see
Chris Watson
@watzon
The documentation generator has a long way to go
Rémy Marronnier
@rmarronnier
Can't we change the color to green ?