James Lucas
@AtheMathmo
if each point is the same then it will barf as the variance is 0
James Sutton
@zeryx
for each point I'm randomly selecting a value, then copying it for each dimension
so point 1 [0.225,0.225,0.225,...,0.225]
James Lucas
@AtheMathmo
Ah right - I think that could still cause issues? Increasing the dimension will modify the distance between two points
James Sutton
@zeryx
the log_likelihood is 1 which is definitely suspicious
James Lucas
@AtheMathmo
i.e. [1,1] and [2,2] are sqrt(2) apart, and [1,1,1], [2,2,2] are sqrt(3) apart
yeah that is fishy :)
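The distance scaling described here can be checked directly; a standalone sketch (plain Rust, not rusty-machine code):

```rust
// Euclidean distance between two equal-length points.
fn euclidean(a: &[f64], b: &[f64]) -> f64 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).powi(2))
        .sum::<f64>()
        .sqrt()
}

fn main() {
    // Cloning a dimension stretches the distance by a factor of sqrt(d):
    let d2 = euclidean(&[1.0, 1.0], &[2.0, 2.0]); // sqrt(2)
    let d3 = euclidean(&[1.0, 1.0, 1.0], &[2.0, 2.0, 2.0]); // sqrt(3)
    println!("{} {}", d2, d3);
}
```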
James Sutton
@zeryx
yeah the covariance determinant goes to 0 if each dimension is a clone
but intuitively it should be possible to cluster a 100D dataset where D(n) = D(n-1)
James Sutton
@zeryx
yeah if I just set a single dimension to 0 for all samples, it fails to cluster
which seems wrong
fishiness
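The singular-covariance failure mode being discussed is easy to reproduce; a minimal sketch with a hand-rolled 2x2 sample covariance (illustrative only, not rusty-machine's implementation):

```rust
// Sample covariance of a tiny 2-D dataset, stored row-per-sample.
fn cov2(data: &[[f64; 2]]) -> [[f64; 2]; 2] {
    let n = data.len() as f64;
    let mean = data
        .iter()
        .fold([0.0, 0.0], |m, r| [m[0] + r[0] / n, m[1] + r[1] / n]);
    let mut c = [[0.0; 2]; 2];
    for r in data {
        let d = [r[0] - mean[0], r[1] - mean[1]];
        for i in 0..2 {
            for j in 0..2 {
                c[i][j] += d[i] * d[j] / (n - 1.0);
            }
        }
    }
    c
}

fn main() {
    // The second dimension is a clone of the first, so the covariance
    // matrix is rank-deficient and its determinant is exactly 0 -- the
    // Gaussian density (which divides by that determinant) blows up.
    let data = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]];
    let c = cov2(&data);
    let det = c[0][0] * c[1][1] - c[0][1] * c[1][0];
    println!("det = {}", det);
}
```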
James Lucas
@AtheMathmo

I agree something is funky... It looks like I probably won't have time to do the stability improvements tonight, sorry :/

Tomorrow morning I'll definitely have time though. I wrote up AtheMathmo/rusty-machine#152 to track

James Sutton
@zeryx
nice yeah good points
James Sutton
@zeryx
might be worthwhile to encapsulate some functionality into separate functions? it's a bit hard to read what's going on in train(), like I'm not super sure what's going on in the full cov_mat option
James Lucas
@AtheMathmo
Yes definitely - though this part should be pulled out into a covariance function in rulinalg
James Sutton
@zeryx
yeah agreed
James Lucas
@AtheMathmo
@zeryx thought you may want to check this out: AtheMathmo/rusty-machine#155
James Sutton
@zeryx
nice!
James Sutton
@zeryx
One thing that might be worth adding once everything is fixed is an optional seed parameter for the train() function, so we can actually do some accurate module testing
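To illustrate why seeding helps testing: with a fixed seed a PRNG stream is reproducible, so tests can assert exact outputs. A toy sketch using a hand-rolled LCG (the constants are Knuth's MMIX ones; rusty-machine's actual RNG plumbing is a separate question):

```rust
// Toy linear congruential generator, used here only to show that a
// fixed seed makes the generated stream -- and hence training -- reproducible.
struct Lcg(u64);

impl Lcg {
    fn next(&mut self) -> u64 {
        // Knuth's MMIX multiplier and increment.
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0
    }
}

fn main() {
    let mut a = Lcg(42);
    let mut b = Lcg(42);
    // Same seed -> identical stream, so a test can assert exact results:
    assert_eq!(a.next(), b.next());
    println!("seeded streams match");
}
```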
James Sutton
@zeryx
Big fan of the changes, I was planning on helping on Sunday but got distracted with a broken car
James Sutton
@zeryx
Also @AtheMathmo let me know if you're planning on changing the gmm struct, I've implemented my own serialisation stuff in my fork (will make a PR for it this week)
James Lucas
@AtheMathmo
There is an issue for seeding, AtheMathmo/rusty-machine#138 . We want a solution which will work for all models and so cannot modify train, as this trait is used by deterministic models too
And for the serialization I saw your implementation in the PR but I don't want individual implementations like that for all the models. We should use something like serde or rustc-serialize for the rest. There is an open issue for that too: AtheMathmo/rusty-machine#122
James Sutton
@zeryx
yeah that would be ideal
my serialization method was only done that way as a quick hack, rustc-serialize or serde is far superior
essentially just deriving RustcEncodable and RustcDecodable on all the structs will do it, super simple with rustc-serialize
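A sketch of that derive-based approach (assumes the external rustc-serialize crate; the struct and its fields below are illustrative, not rusty-machine's actual GMM definition):

```rust
extern crate rustc_serialize;
use rustc_serialize::json;

// Illustrative struct only -- not the real GMM fields.
#[derive(RustcEncodable, RustcDecodable)]
struct GmmParams {
    comp_count: usize,
    mix_weights: Vec<f64>,
}

fn main() {
    let gmm = GmmParams {
        comp_count: 2,
        mix_weights: vec![0.5, 0.5],
    };
    let encoded = json::encode(&gmm).unwrap(); // serialize to JSON
    let decoded: GmmParams = json::decode(&encoded).unwrap(); // and back
    assert_eq!(decoded.comp_count, 2);
}
```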
James Sutton
@zeryx
heh I'm new with PRs @AtheMathmo, I feel like lots of those changes shouldn't actually have been in the PR, but I'm assuming when doing fork pull reqs it always goes to master?
James Lucas
@AtheMathmo
You should be able to specify which branch from and to. In this case the PR is master -> master
James Sutton
@zeryx
good comments / suggestions though, I'll fix that up tonight
James Lucas
@AtheMathmo
My GitFu is pretty weak, but you should be able to make a new branch and rebase master, then make the PR from the new branch onto my master
that way you can keep whichever changes you want out of the PR
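That branch-and-rebase flow, sketched as commands (the remote names `origin` for the fork and `upstream` for AtheMathmo's repo are assumptions):

```shell
# Starting from your fork's master with your commits on it:
git checkout -b my-feature     # move the work onto a new branch
git fetch upstream             # fetch the upstream repo
git rebase upstream/master     # replay your commits on the upstream tip
git push origin my-feature     # then open the PR from my-feature onto upstream's master
```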
James Sutton
@zeryx
nice, yeah I'm good at regular ol' commits & merges, PRs are a new thing for me (I don't work in the backend team ofc heh)
James Lucas
@AtheMathmo
thanks :) however you're comfortable doing it is best, like I say I'm fairly clueless with a lot of it...
I've started to get through my backlog of non-rust things a little more. Hopefully that means I'll have some time to really dig into the project again soon :)
James Sutton
@zeryx
:smile: yeah AtheMathmo/rusty-machine#155 isn't blocking me at a grand scale, just blocking the clustering tool from working
we're clustering just to make KNN faster over a huge dataset so it's not 100% necessary at the moment
hierarchical clustering lets you do neat things like partition datasets when searching for recommendations/etc
James Lucas
@AtheMathmo

@andrewcsmith have you had a chance to see my comments on #155?

I'm taking another look now to try and find the regression

Zack M. Davis
@zackmdavis
@AtheMathmo dunno if you already saw, but it looks like someone scooped you on integrating with LAPACK: https://github.com/masonium/linxal
James Lucas
@AtheMathmo
ha I did see! I'm glad someone did it - truly it's not been a hugely motivating goal for me personally and I haven't really got a good idea of how it should look in rulinalg.
Sean Martin
@phasedchirp
@AtheMathmo re: #145 After trying out some tests and benchmarks for the QR-based lin_reg training method I had, this implementation at least is probably not worth the speed trade-off (the cases where it makes a noticeable difference in parameters aren't super-realistic and it's much slower).
Sean Martin
@phasedchirp
Although it should be noted that some large portion of the slow-down is likely due to my implementation, since the theoretical differences are not nearly so large.
James Lucas
@AtheMathmo
Hey @phasedchirp sorry that I missed this message.
It's possible that your approach has some stability advantages that might be worth looking into. I'm not too sure personally and would need to do some reading I imagine
Sean Martin
@phasedchirp
@AtheMathmo it has some minor advantages in terms of the stability of the solution in informal tests, but only in the really artificial case that I tried.
James Lucas
@AtheMathmo
I'll try to find some time to take a look through the code this weekend. We might be able to close the performance gap a little
James Sutton
@zeryx
hey @AtheMathmo sorry about the PR being on hold forever, we got pretty swamped for the past few months at work and changed gears, will look at closing it sometime this/next week.
James Lucas
@AtheMathmo
@zeryx - no worries at all! I've also had a couple very busy months and had to put a lot of things on hold myself. Take your time :)