James Sutton
@zeryx
which is... very weird
I think that's totally a bug yea?
James Lucas
@AtheMathmo
probably not - maybe a consequence of the curse of dimensionality
James Sutton
@zeryx
I checked what format my data was in when being passed into the ::train() method
James Sutton
@zeryx
The opposite is happening though, it's more accurate with more dimensions
James Lucas
@AtheMathmo
hmm, maybe as it uses clusters to cover more of the data? I'm not too sure :)
I doubt it is a bug.. but maybe..
James Sutton
@zeryx
going to test with each dimension having the exact same value for each sample and see if it changes
James Lucas
@AtheMathmo
good idea, sadly I'm not sure I'll be able to offer much.
James Sutton
@zeryx
yup it barfs if I do that
which shouldn't happen
James Lucas
@AtheMathmo
so you are just creating multiple entries of the same point?
and increasing dimensionality
if each point is the same then it will barf as the variance is 0
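(roughly speaking, the standard multivariate Gaussian density has the covariance determinant in the denominator, so a singular Σ makes the density undefined:)

```latex
p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}}
  \exp\!\Big(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\top}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\Big),
\qquad |\Sigma| = 0 \;\Rightarrow\; \text{undefined}
```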
James Sutton
@zeryx
for each point I'm randomly selecting a value, then copying it for each dimension
so point 1 [0.225,0.225,0.225,...,0.225]
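roughly this (a sketch with the rand crate, not my actual test code; rusty-machine's train() would want it flattened into a Matrix):

```rust
// Sketch: build n samples where one random value is copied across
// every dimension, as described above.
use rand::Rng;

fn cloned_dimension_data(n_samples: usize, n_dims: usize) -> Vec<Vec<f64>> {
    let mut rng = rand::thread_rng();
    (0..n_samples)
        .map(|_| {
            let v: f64 = rng.gen(); // e.g. 0.225
            vec![v; n_dims]         // [0.225, 0.225, ..., 0.225]
        })
        .collect()
}
```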
James Lucas
@AtheMathmo
Ah right - I think that could still cause issues? increasing the dimension will modify the distance between two points
James Sutton
@zeryx
the log_likelihood is 1, which is definitely suspicious
James Lucas
@AtheMathmo
i.e. [1,1] and [2,2] are sqrt(2) apart, and [1,1,1], [2,2,2] are sqrt(3) apart
yeah that is fishy :)
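(the arithmetic behind those numbers: for points whose coordinates are all equal, the distance grows with the square root of the dimension)

```latex
\|(a,\dots,a) - (b,\dots,b)\|_2 = \sqrt{\sum_{i=1}^{d}(a-b)^2} = \sqrt{d}\,|a - b|
```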
James Sutton
@zeryx
yeah the covariance determinant goes to 0 if each dimension is a clone
but intuitively it should be possible to cluster a 100D dataset where D(n) = D(n-1)
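a quick self-contained check of that determinant claim (plain arrays rather than the real rulinalg matrices):

```rust
// Sketch: sample covariance of two cloned dimensions is singular.
// xs and ys are identical, so cov = [[v, v], [v, v]] and det = v*v - v*v = 0.
fn main() {
    let xs = [0.225, 0.7, 0.4, 0.9];
    let ys = xs; // dimension 2 is a clone of dimension 1

    let n = xs.len() as f64;
    let mean = |d: &[f64]| d.iter().sum::<f64>() / n;
    let (mx, my) = (mean(&xs), mean(&ys));

    let cov = |a: &[f64], b: &[f64], ma: f64, mb: f64| {
        a.iter().zip(b).map(|(x, y)| (x - ma) * (y - mb)).sum::<f64>() / (n - 1.0)
    };

    let (sxx, sxy, syy) = (cov(&xs, &xs, mx, mx), cov(&xs, &ys, mx, my), cov(&ys, &ys, my, my));
    let det = sxx * syy - sxy * sxy;
    println!("det(cov) = {det}"); // 0: the Gaussian density is undefined
}
```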
James Sutton
@zeryx
yeah if I just set a single dimension to 0 for all samples, it fails to cluster
which seems wrong
fishiness
James Lucas
@AtheMathmo

I agree something is funky... It looks like I probably won't have time to do the stability improvements tonight, sorry :/

Tomorrow morning I'll definitely have time though. I wrote up AtheMathmo/rusty-machine#152 to track
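(for reference, the usual fix for this failure mode is to regularise the covariance diagonal; a sketch of the idea, not necessarily what #152 will end up doing:)

```rust
// Sketch: diagonal regularisation of a covariance matrix stored
// row-major in a flat Vec. Adding eps to the diagonal keeps the
// matrix positive definite even when a dimension has zero variance.
fn regularize_cov(cov: &mut [f64], dim: usize, eps: f64) {
    for i in 0..dim {
        cov[i * dim + i] += eps;
    }
}

fn main() {
    // Singular covariance from the cloned-dimension experiment.
    let mut cov = vec![0.1, 0.1, 0.1, 0.1]; // det = 0
    regularize_cov(&mut cov, 2, 1e-6);
    let det = cov[0] * cov[3] - cov[1] * cov[2];
    println!("det after regularisation = {det}"); // now > 0
}
```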

James Sutton
@zeryx
nice yeah good points
James Sutton
@zeryx
might be worthwhile to encapsulate some functionality into separate functions? It's a bit hard to read what's going on in train(); I'm not super sure what's going on in the full cov_mat option, for example
James Lucas
@AtheMathmo
Yes definitely - though this part should be pulled out into a covariance function in rulinalg
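something along these lines, I'd imagine (just a sketch over flat slices; the real rulinalg version would take a Matrix):

```rust
// Sketch of a standalone sample-covariance function: data is n rows
// by d columns, stored row-major; returns the d x d covariance matrix.
fn covariance(data: &[f64], n: usize, d: usize) -> Vec<f64> {
    // Column means.
    let mut means = vec![0.0; d];
    for row in data.chunks(d) {
        for (m, x) in means.iter_mut().zip(row) {
            *m += x;
        }
    }
    for m in &mut means {
        *m /= n as f64;
    }

    // Unbiased covariance: sum of outer products of centred rows.
    let mut cov = vec![0.0; d * d];
    for row in data.chunks(d) {
        for i in 0..d {
            for j in 0..d {
                cov[i * d + j] += (row[i] - means[i]) * (row[j] - means[j]);
            }
        }
    }
    for c in &mut cov {
        *c /= (n - 1) as f64;
    }
    cov
}
```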
James Sutton
@zeryx
yeah agreed
James Lucas
@AtheMathmo
@zeryx thought you may want to check this out: AtheMathmo/rusty-machine#155
James Sutton
@zeryx
nice!
James Sutton
@zeryx
One thing that might be worth adding once everything is fixed is an optional seed parameter for the train() function, so we can actually do some accurate module testing
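something like this, roughly (how the seed would reach train() is the open question; this just shows the reproducibility win, using the modern rand-crate API):

```rust
// Sketch: seeding the RNG so k-means/GMM initialisation is
// reproducible across test runs. The wiring into train() is hypothetical.
use rand::rngs::StdRng;
use rand::{Rng, SeedableRng};

fn main() {
    let mut rng = StdRng::seed_from_u64(42);
    let init: Vec<f64> = (0..4).map(|_| rng.gen()).collect();
    println!("{init:?}"); // identical on every run: assertable in a test
}
```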
James Sutton
@zeryx
Big fan of the changes, I was planning on helping on Sunday but got distracted with a broken car
James Sutton
@zeryx
Also @AtheMathmo let me know if you're planning on changing the gmm struct, I've implemented my own serialisation stuff in my fork (will make a PR for it this week)
James Lucas
@AtheMathmo
There is an issue for seeding, AtheMathmo/rusty-machine#138 . We want a solution which will work for all models and so cannot modify train, as this trait is used by deterministic models too
And for the serialization I saw your implementation in the PR but I don't want individual implementations like that for all the models. We should use something like serde or rustc-serialize for the rest. There is an open issue for that too: AtheMathmo/rusty-machine#122
James Sutton
@zeryx
yeah that would be ideal
my serialization method was only done that way as a quick hack, rustc-serialize or serde is far superior
essentially just making all the structs RustcEncodable and RustcDecodable will do it, super simple with rustc-serialize
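e.g. (placeholder struct and fields, not the real gmm struct):

```rust
// Sketch: derive-based JSON round-trip with rustc-serialize.
extern crate rustc_serialize;
use rustc_serialize::json;

#[derive(RustcEncodable, RustcDecodable, Debug)]
struct GmmParams {
    mixture_weights: Vec<f64>,
    means: Vec<Vec<f64>>,
}

fn main() {
    let model = GmmParams {
        mixture_weights: vec![0.5, 0.5],
        means: vec![vec![0.0, 0.0], vec![1.0, 1.0]],
    };
    let encoded = json::encode(&model).unwrap();
    let decoded: GmmParams = json::decode(&encoded).unwrap();
    println!("{:?}", decoded);
}
```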
James Sutton
@zeryx
heh I'm new with PRs @AtheMathmo, I feel like lots of those changes shouldn't actually have been in the PR, but I'm assuming when doing fork pull reqs it always goes to master?
James Lucas
@AtheMathmo
You should be able to specify which branch from and to. In this case the PR is master -> master
James Sutton
@zeryx
good comments / suggestions though, I'll fix that up tonight
James Lucas
@AtheMathmo
My GitFu is pretty weak, but you should be able to make a new branch and rebase master, then make the PR from the new branch onto my master
that way you can keep whichever changes you want out of the PR
James Sutton
@zeryx
nice, yeah I'm good at regular ol' commits & merges, PRs are a new thing for me (I don't work in the backend team ofc heh)
rad I'll do that
James Lucas
@AtheMathmo
thanks :) however you're comfortable doing it is best, like I say I'm fairly clueless with a lot of it...
I've started to get through my backlog of non-rust things a little more. Hopefully that means I'll have some time to really dig into the project again soon :)