prats84
@prats84
Sorry, I missed the chance to connect after June 4th.
This project seems really interesting.
Hugo Gascón
@hgascon
Hi again @prats84
you are welcome to stay around, see how it goes and eventually also contribute
Kacper Sokol
@So-Cool
@prats84 if you have any questions do not hesitate to ask. We're hanging around here all of the time.
prats84
@prats84
Sure, thanks a lot.
prats84
@prats84
Are you guys using any of the cloud providers to test the functionality?
Hugo Gascón
@hgascon
What do you mean by cloud providers?
prats84
@prats84
Hey
I mean using Amazon Web Services, Azure, etc.
Hugo Gascón
@hgascon
Not yet. Are you suggesting anything in particular?
Leland McInnes
@lmcinnes
I'm curious to hear more about your use of hdbscan! I would love to learn about further use cases for it. I also noted that, at the end of the blog post, you discussed working toward anomaly detection. It might be worth noting that the GLOSH algorithm for outlier detection is built into the hdbscan library and can be accessed via the outlier_scores_ attribute of the clusterer object. In case you haven't already encountered that, it may be worth checking out http://hdbscan.readthedocs.io/en/latest/outlier_detection.html
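As a rough illustration of working with those scores, here is a minimal sketch that ranks samples by outlier score. `top_outliers` is a hypothetical helper, not part of hdbscan, and the toy scores are made up; with hdbscan installed the scores would come from the `outlier_scores_` attribute of a fitted clusterer, as shown in the comments.

```python
def top_outliers(scores, k=3):
    """Return the indices of the k highest outlier scores, worst first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Toy scores standing in for clusterer.outlier_scores_ (higher = more anomalous).
scores = [0.10, 0.90, 0.20, 0.75, 0.05]
print(top_outliers(scores, k=2))

# With hdbscan installed, the same helper could be fed real scores:
#   import hdbscan
#   clusterer = hdbscan.HDBSCAN(min_cluster_size=15).fit(data)
#   worst = top_outliers(clusterer.outlier_scores_, k=10)
```

Ranking rather than using a fixed cut-off avoids having to guess a threshold up front; a quantile-based threshold is another common choice.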
Leland McInnes
@lmcinnes
Thanks again, and I would love to hear any use cases or user stories for hdbscan.
Hugo Gascón
@hgascon
hey @lmcinnes, thanks for the suggestion!
Kacper Sokol
@So-Cool
hi @lmcinnes, since you’re here I hope that you don’t mind a few questions.
Are you planning on introducing the possibility to save and load a model? Also, have you thought of separate methods, one for fitting and the other for querying the clustering with new samples?
Also, for outlier detection, I could retrieve the information that a given sample is noise (the -1 class) and get the probability of it being noise, but I couldn’t find a way to tell which cluster it is noise of; can you do that with HDBSCAN?
Leland McInnes
@lmcinnes
Models should be pickleable as per other sklearn models, so hopefully that covers that problem, although I have known people who wrote custom serialization to JSON. The latter was a surprisingly short amount of code, so I haven't felt compelled to provide it, especially since it is somewhat custom to particular needs.
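A minimal sketch of the pickle route, assuming the fitted clusterer pickles like other sklearn-style models. `save_model` and `load_model` are hypothetical helper names, and a plain dict stands in for the fitted model so the snippet runs without hdbscan installed.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    """Serialize any picklable model object to disk."""
    with open(path, "wb") as handle:
        pickle.dump(model, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_model(path):
    """Load a model back; only unpickle files from a trusted source."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Stand-in for a fitted clusterer (assumption: the real object is picklable).
stand_in = {"labels": [0, 0, 1, -1], "min_cluster_size": 5}
path = os.path.join(tempfile.mkdtemp(), "clusterer.pkl")
save_model(stand_in, path)
restored = load_model(path)
```

A pickle file is tied to the library versions that produced it, which is one reason some people prefer a custom JSON format.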
On the question of fitting and then querying with new samples ... see lmcinnes/hdbscan#57 for some discussion on this. The short answer is that while it can be done, it doesn't necessarily do what you would expect, so one has to be fairly careful. I may get around to writing an extra function (as opposed to a predict method) at some point, but it isn't hard if you want to try yourself. I'd be happy to work with/accept pull requests.
Leland McInnes
@lmcinnes
Finally, dealing with outliers. I would recommend the outlier score as the best way to find them. You can indeed find out which cluster it is "nearest" to, although it takes a few lines of code. The key is that the condensed_tree_ attribute actually has all the relevant information, and you can dig through that to find what you need. The first step is to find which cluster in the tree the outlier left to become noise; if you have the to_pandas version of the condensed tree this is effectively just "tree.parent[tree.child == point]". The next step is to find the cluster or clusters that are closest in the tree; they will be descendants of the tree node you just found, so it's really just a matter of walking down the tree until you've found a selected cluster.
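The two steps described above can be sketched on a toy condensed tree. This is an illustration under assumptions, not hdbscan's own code: rows are plain `(parent, child, lambda_val, child_size)` tuples mirroring the columns of the to_pandas output, `nearest_selected_clusters` is a hypothetical helper, and the set of selected cluster ids is assumed to be supplied by the caller.

```python
from collections import deque


def nearest_selected_clusters(tree_rows, point, selected):
    """Return (node the point left, selected clusters reachable below it)."""
    # Step 1: the tree node the point fell out of, i.e. tree.parent[tree.child == point].
    parent = next((p for p, c, _, _ in tree_rows if c == point), None)
    if parent is None:
        raise ValueError("point %r is not in the tree" % (point,))

    # Index cluster-to-cluster edges only (child_size > 1 means the child is a cluster).
    children = {}
    for p, c, _, size in tree_rows:
        if size > 1:
            children.setdefault(p, []).append(c)

    # Step 2: walk down from that node and stop at the first selected cluster on each branch.
    found, queue = [], deque([parent])
    while queue:
        node = queue.popleft()
        if node in selected:
            found.append(node)
        else:
            queue.extend(children.get(node, []))
    return parent, found


# Toy tree: points 0..5, root cluster 6 splits into clusters 7 and 8,
# and point 5 drops out of the root early (low lambda) as noise.
toy_tree = [
    (6, 5, 0.5, 1),
    (6, 7, 1.0, 2), (6, 8, 1.0, 3),
    (7, 0, 2.0, 1), (7, 1, 2.0, 1),
    (8, 2, 2.0, 1), (8, 3, 2.0, 1), (8, 4, 2.0, 1),
]
left_from, nearest = nearest_selected_clusters(toy_tree, point=5, selected={7, 8})
```

With a real model, the rows could be taken from `clusterer.condensed_tree_.to_pandas().itertuples(index=False)`. When several clusters come back, as in the toy case, the point left the tree before those clusters separated, so the tree alone does not favour one of them.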
Kacper Sokol
@So-Cool
Thanks a lot! That’s really helpful. BTW, hdbscan is an amazing piece of work.
Leland McInnes
@lmcinnes
Thank you! I'm glad people are finding it useful.
Vishal Sharma
@VishalCR7
Hello
Any mentors online?
Hugo Gascón
@hgascon
Hi Vishal
Kacper Sokol
@So-Cool
Hi, I'm not a mentor per se, but if you have any questions about the codebase I'm happy to help.
MxResearch
@MxResearch
Hello friends
I need some help with cuckooml
I get an error when I run cuckooml:
Traceback (most recent call last):
  File "cuckoo.py", line 127, in <module>
    test=args.test, ml=args.ml)
  File "cuckoo.py", line 71, in cuckoo_init
    init_cuckooml()
  File "/home/sma-lab/cuckooml-master/modules/processing/cuckooml.py", line 47, in init_cuckooml
    ml.load_simple_features(simple_features_dict)
  File "/home/sma-lab/cuckooml-master/modules/processing/cuckooml.py", line 517, in load_simple_features
    self.extract_simple_features(simple_features)
  File "/home/sma-lab/cuckooml-master/modules/processing/cuckooml.py", line 495, in extract_simple_features
    simple_features = pd.DataFrame(simple_features).T
NameError: global name 'pd' is not defined
Could you kindly suggest what to do about the above error?
Kacper Sokol
@So-Cool
@MxResearch Do you have pandas installed?
MxResearch
@MxResearch
I use Ubuntu and do not use pandas. Is there any relation between my error and pandas?
Kacper Sokol
@So-Cool
pandas is a Python package and it's required by cuckooml. More details, including how to install it, are here: https://pandas.pydata.org/
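One quick way to check for this kind of problem before launching cuckoo.py is to ask Python which required packages it cannot find. A minimal sketch using only the standard library; `missing_packages` is a hypothetical helper, not part of cuckooml, and the list of names passed to it is an assumption based on the traceback above.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "pandas" is the package behind the `pd` name in the traceback; extend the
# list with whatever else the installation requires.
missing = missing_packages(["pandas"])
if missing:
    print("Install these first: " + ", ".join(missing))
```

Run it with the same Python interpreter that runs cuckoo.py, since a package installed for a different interpreter will still show up as missing.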
Ziv00s3
@Ziv00s3
Hello, can anybody share a dataset with cuckoo reports?
hipoudah
@hipoudah
Hello
is anyone here familiar with recommendation systems based on clustering?
Vikas sharma
@vikas623
Hello everyone, I am Vikas Sharma and I am a CSE sophomore. I want to contribute here. Can I get some help?
Kacper Sokol
@So-Cool
@vikas623, what sort of help do you need?
Vikas sharma
@vikas623
I mean I want to contribute to the community, so please tell me how to contribute to the projects.
Kacper Sokol
@So-Cool
This is not quite how it works. Have you tried using the software?
Vikas sharma
@vikas623
Yes
Kacper Sokol
@So-Cool
Is there anything that does not work as expected? Have you had a look at the issues? Maybe you wish the documentation (docstrings) was clearer? The ML module is not tested, maybe you could look into it? Any preferences?
Vikas sharma
@vikas623
Yes, let me use the software a little more, then I'll get back to you.
malware.research
@ResearchMalware_twitter
Hello
I am using cuckooML
I want to know how to set the folder path in the cuckooml config file
Can anyone help me?
Kacper Sokol
@So-Cool
Hi @ResearchMalware_twitter, please have a look at https://github.com/honeynet/cuckooml/blob/master/conf/cuckooml.conf#L3