Chris Watson
@watzon
Try deleting it, running shards install, and recommitting
Rémy Marronnier
@rmarronnier
But the shards.lock is not committed, and GitHub Actions does a fresh shards install every time
that's why I find this weird.
Chris Watson
@watzon
Hmm you're right
Rémy Marronnier
@rmarronnier
Does Crystal master fix this issue ?
Chris Watson
@watzon
Here's an idea, try defining branch: master for the apatite dependency
Rémy Marronnier
@rmarronnier
ok
Chris Watson
@watzon
If you don't define a branch it fetches the latest release
Which isn't compatible with 0.30.1
Rémy Marronnier
@rmarronnier
Well, at least it allowed me to discover a bug in the stemmer shard. Fix in PR :-)
Chris Watson
@watzon
Nice haha, I merged it
Rémy Marronnier
@rmarronnier
Thanks
Rémy Marronnier
@rmarronnier
I just added a build status badge for the summarizer repo. Here is the syntax : ![](https://github.com/cadmiumcr/REPONAME/workflows/WORKFLOWNAME/badge.svg)
The WORKFLOWNAME is probably set to Crystal CI but you can change it
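A minimal sketch of building that badge line, assuming a workflow name with spaces (such as Crystal CI) has to be URL-encoded inside the badge URL. The helper name `badge_markdown` and the choice to link the badge to the repo's Actions tab are illustrative assumptions, not the exact example mentioned in the chat.

```python
from urllib.parse import quote


def badge_markdown(owner, repo, workflow, link_to_actions=True):
    """Build the Markdown for a GitHub Actions build status badge.

    Hypothetical helper: the URL layout follows the syntax quoted in the chat;
    linking to the Actions tab is an assumption.
    """
    # Spaces in the workflow name become %20 in the URL.
    badge_url = f"https://github.com/{owner}/{repo}/workflows/{quote(workflow)}/badge.svg"
    image = f"![]({badge_url})"
    if link_to_actions:
        return f"[{image}](https://github.com/{owner}/{repo}/actions)"
    return image


# REPONAME is left as a placeholder, as in the original message.
print(badge_markdown("cadmiumcr", "REPONAME", "Crystal CI"))
```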
Chris Watson
@watzon
Haha I actually had done that for one of the other repos already
My example links to the workflow as well though. I'll see if I can find it.
Also, I'm playing around with the idea of using the Bayes Classifier to do language detection.
Rémy Marronnier
@rmarronnier
Hehe, you beat me to it ;-)
Chris Watson
@watzon
I've got a poc working already
Rémy Marronnier
@rmarronnier
> Also, I'm playing around with the idea of using the Bayes Classifier to do language detection.

Have fun !

Wow !
That was fast :-)
Chris Watson
@watzon
Basically it tokenizes a string normally, then takes each of the word tokens and breaks it into smaller tokens up to three characters long. That way it can guess the likelihood of a language based on which characters appear next to each other.
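The tokenization described here can be sketched roughly as follows. This is Python rather than Crystal, and it assumes "up to three characters long" means every run of 1, 2, or 3 adjacent characters inside each word; the function name `char_ngrams` is hypothetical and not part of Cadmium.

```python
import re


def char_ngrams(text, max_n=3):
    """Split text into word tokens, then into character n-grams of length 1..max_n."""
    words = re.findall(r"\w+", text.lower())
    grams = []
    for word in words:
        for n in range(1, max_n + 1):
            # Slide a window of width n across the word.
            for i in range(len(word) - n + 1):
                grams.append(word[i:i + n])
    return grams


print(char_ngrams("Hola mundo"))
```

Features like these are what let a classifier pick up on letter combinations that are common in one language and rare in another.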
Rémy Marronnier
@rmarronnier
I'm playing with vectors and matrices for the summarizer
Chris Watson
@watzon
Ooh nice
image.png
The best part, it works fairly well on small text samples
These both return the correct answer
Rémy Marronnier
@rmarronnier
Fucking awesome !
Chris Watson
@watzon
I just need some good sample sets to train on now
Rémy Marronnier
@rmarronnier
I'm going to write a proposal for an Evaluation repo for Cadmium.
Chris Watson
@watzon
What kind of Evaluation?
Rémy Marronnier
@rmarronnier
It will be a collection of Crystal scripts that:
1 - Download a dataset
2 - Run a Cadmium::module against it
3 - Compare the results with the expected values
For example, for language identification : http://www.cs.cmu.edu/~ralf/langid.html
There is a data set of Wikipedia texts in 100+ languages, with the matching correct ISO codes in separate text files
I ran our current language identification algo and got 24 % correct results :-p
Our current algo only recognises 400 languages :-)
You can train your classifier on this data set
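The compare step of such an evaluation script could look something like this sketch. It is Python rather than Crystal, it skips the download step, and `evaluate`, `toy_predict`, and the three in-memory samples are all made up for illustration; a real script would load the dataset's texts and ISO codes instead.

```python
def evaluate(samples, predict):
    """Return the fraction of (text, expected_code) samples that predict() gets right."""
    total = 0
    correct = 0
    for text, expected in samples:
        total += 1
        if predict(text) == expected:
            correct += 1
    return correct / total if total else 0.0


def toy_predict(text):
    # Stand-in for a real language identifier; only knows two languages.
    return "fr" if "bonjour" in text.lower() else "en"


samples = [
    ("Bonjour tout le monde", "fr"),
    ("Hello world", "en"),
    ("Hola mundo", "es"),
]
score = evaluate(samples, toy_predict)
print(f"{score:.0%} correct")
```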
Chris Watson
@watzon
Nice!
Chris Watson
@watzon
Holy shit I've even got it differentiating between Spanish and Portuguese
Which are very similar languages
Chris Watson
@watzon
Training on that dataset is taking a very long time :joy:
Rémy Marronnier
@rmarronnier
HahaHa !
Have you tried with MT :-p
Chris Watson
@watzon
Lol not yet
I'd have to make the classifier support it
Interesting note though, training the classifier on huge chunks of text takes an exorbitant amount of time, but training it on smaller chunks is extremely fast.
At first I tried to do classifier.train on the entire corpus for each of the languages and it was taking hours with no end in sight. But I just now tried training line by line instead and it finished in minutes.
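For illustration, here is a small naive Bayes sketch trained line by line. It is Python, not the Cadmium classifier, and the class, the `features` helper, and the tiny two-language corpus are all invented for the example. It only shows the shape of the per-line training loop and makes no claim about why the whole-corpus call was slow.

```python
import math
import re
from collections import Counter, defaultdict


def features(text, max_n=3):
    """Character n-grams (length 1..max_n) of each word in the text."""
    grams = []
    for word in re.findall(r"\w+", text.lower()):
        for n in range(1, max_n + 1):
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


class NaiveBayes:
    def __init__(self):
        self.feature_counts = defaultdict(Counter)  # label -> feature -> count
        self.doc_counts = Counter()                 # label -> number of train() calls
        self.vocab = set()

    def train(self, text, label):
        feats = features(text)
        self.feature_counts[label].update(feats)
        self.doc_counts[label] += 1
        self.vocab.update(feats)

    def classify(self, text):
        feats = features(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.feature_counts.items():
            # Log prior plus log likelihoods with add-one smoothing.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for feat in feats:
                score += math.log((counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up toy corpus: one string per language, several lines each.
corpus = {
    "en": "the cat sat on the mat\nwhere is the train station\nthank you very much",
    "es": "el gato está en la casa\ndónde está la estación de tren\nmuchas gracias por todo",
}

clf = NaiveBayes()
for label, text in corpus.items():
    for line in text.splitlines():  # one train() call per line, not per corpus
        clf.train(line, label)

print(clf.classify("the station"), clf.classify("la casa"))
```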
Chris Watson
@watzon
Getting close
image.png
Too bad I can't spell
Rémy Marronnier
@rmarronnier
I love this UI !
Chris Watson
@watzon
Me too :) I like having a progress bar
I would love to figure out a way to multithread the workflow and have one progress bar per thread, but idk if it's possible
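One way the bookkeeping for that could work, sketched in Python rather than Crystal: each worker thread owns one slot in a shared progress list, and the main thread redraws all bars on a single line. The names `render_bar` and `train_in_threads` are hypothetical, and the `train_line` callback is assumed to be thread-safe. This only illustrates the per-thread progress idea, not Crystal's multithreading support or any actual speedup.

```python
import threading
import time


def render_bar(done, total, width=20):
    """Render a text progress bar such as [#####-----] 5/10."""
    filled = width * done // total if total else width
    return "[" + "#" * filled + "-" * (width - filled) + f"] {done}/{total}"


def train_in_threads(chunks, train_line, refresh=0.05):
    """Process each chunk of lines in its own thread, with one bar per thread."""
    progress = [0] * len(chunks)
    lock = threading.Lock()

    def worker(index, lines):
        for line in lines:
            train_line(line)  # assumed to be safe to call from several threads
            with lock:
                progress[index] += 1

    threads = [
        threading.Thread(target=worker, args=(i, chunk))
        for i, chunk in enumerate(chunks)
    ]
    for thread in threads:
        thread.start()

    # The main thread redraws every bar until all workers have finished.
    while any(thread.is_alive() for thread in threads):
        with lock:
            snapshot = list(progress)
        bars = [render_bar(d, len(c)) for d, c in zip(snapshot, chunks)]
        print(" | ".join(bars), end="\r")
        time.sleep(refresh)

    for thread in threads:
        thread.join()
    return [render_bar(d, len(c)) for d, c in zip(progress, chunks)]


seen = []
final_bars = train_in_threads([["a", "b"], ["c"]], seen.append)
print("\n".join(final_bars))
```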