Sundeep
@pidugusundeep
Hey
Anu-Pra
@Anu-Pra
Hey
Is anyone interested in forming a study group to work through the P1xt Data Science guide?
I need a study partner
Eric Leung
@erictleung
@padunk if you remember, what is the difference between np.std() and pd.std()? I'd like to know :smile:
@pidugusundeep hello!
@Anu-Pra I don't think I have the bandwidth to join, but feel free to use this space to bounce ideas! People around here have a range of expertise, but we're all interested in learning as well. Questions you may have while going through the guide are probably general enough for us to answer.
Eric Leung
@erictleung

For those who use R and the tidyverse, tidyr has been updated to 1.0.0! https://www.tidyverse.org/articles/2019/09/tidyr-1-0-0/

Some notable updates:

  • New pivot_longer() and pivot_wider() provide improved tools for reshaping, superseding spread() and gather(). The new functions are substantially more powerful, thanks to ideas from the data.table and cdata packages, and I’m confident that you’ll find them easier to use and remember than their predecessors.
  • New unnest_auto(), unnest_longer(), unnest_wider(), and hoist() provide new tools for rectangling, converting deeply nested lists into tidy data frames.
  • nest() and unnest() have been changed to match an emerging principle for the design of ... interfaces. Four new functions (pack()/unpack(), and chop()/unchop()) reveal that nesting is the combination of two simpler steps.
  • New expand_grid(), a variant of base::expand.grid(). This is a useful function to know about, but also serves as a good reason to discuss the important role that vctrs plays behind the scenes. You shouldn’t ever have to learn about vctrs, but it brings improvements to consistency and performance.
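
For readers more at home in Python, here is a rough pandas analogue of the pivot_longer()/pivot_wider() idea, just to illustrate the reshape these functions cover (the column names below are made up):

```python
import pandas as pd

# Hypothetical wide-format data: one row per subject, one column per year.
wide = pd.DataFrame({
    "subject": ["a", "b"],
    "2018": [1.0, 2.0],
    "2019": [3.0, 4.0],
})

# Roughly what tidyr's pivot_longer() does: wide -> long.
long = wide.melt(id_vars="subject", var_name="year", value_name="score")

# Roughly what pivot_wider() does: long -> back to wide.
wide_again = long.pivot(index="subject", columns="year", values="score").reset_index()

print(long)
print(wide_again)
```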
Alice Jiang
@becausealice2
@padunk Different denominators. Have a look at this
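
To make the "different denominators" point concrete, a minimal sketch: np.std() defaults to ddof=0 (divide by N), while pandas' .std() defaults to ddof=1 (divide by N - 1):

```python
import numpy as np
import pandas as pd

x = [2, 4, 4, 4, 5, 5, 7, 9]

# NumPy default: population standard deviation, ddof=0 (divide by N).
print(np.std(x))                  # 2.0

# pandas default: sample standard deviation, ddof=1 (divide by N - 1).
print(pd.Series(x).std())         # ~2.138

# They agree once the ddof arguments match.
print(np.std(x, ddof=1))          # ~2.138
print(pd.Series(x).std(ddof=0))   # 2.0
```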
Anu-Pra
@Anu-Pra
Thank you @erictleung . Together we all grow!
Eric Leung
@erictleung

So Google's gonna be Google. I was skimming through Google's AI blog (highly recommended btw) and was reading about some new neural networks, namely "weight agnostic neural networks" or WANNs https://ai.googleblog.com/2019/08/exploring-weight-agnostic-neural.html

So traditionally, you'll need to design the architecture of neural networks (i.e., how many layers, the connections, how many nodes, etc). These WANNs are apparently a way to use automation and have the computer find out which architectures work the best. It is a fascinating thing to think about.

Anandesh Sharma
@Anandesh-Sharma
Hey guys welcome me!
londheshubham
@londheshubham
@erictleung Great find buddy, will surely go through it!
Eric Leung
@erictleung
@Anandesh-Sharma welcome!
TJ-coding
@TJ-coding
hello
Anandesh Sharma
@Anandesh-Sharma
@erictleung Thank you
Alice Jiang
@becausealice2
Have you guys seen this yet?
Eric Leung
@erictleung
@becausealice2 oh man, the box surfing strategy got me laughing so hard :laughing: All this reminds me of the Infinite monkey theorem where a monkey given enough time can type out Hamlet https://en.m.wikipedia.org/wiki/Infinite_monkey_theorem Although those free agents aren't given explicit instructions, they are able to "learn" after millions of iterations. It almost seems inevitable for the computer to eventually find an optimal strategy.
Alice Jiang
@becausealice2
Box surfing nearly knocked me out, as well, but the agents were so cute I was honestly laughing the whole time
jaimecuellar14
@jaimecuellar14
Hey there, I was looking for someone to help me understand some things about deep learning
for semantic segmentation
Alice Jiang
@becausealice2
@jaimecuellar14 I don't know that we have any one person who can help but if you ask we might be able to find you an answer
jaimecuellar14
@jaimecuellar14
I have a set of pictures and their masks, plus a file containing the percentage of items in each mask, for example chair: 38%. I have made a model (a bad one, 60% acc), but I have to give as an answer something like img_predicted.png chair: 10%, table: 40%, and I have no idea how to do this, or how to improve my model.
I am really new to this.
Nao
@Ngoldberg
Is anyone on who is familiar with manipulating data, like dealing with databases, dbf files, free tables? I could use some brainstorming help :)
Eric Leung
@erictleung
@Ngoldberg I don't have experience with dbf files specifically, but I've manipulated my fair share of data in R. What kind of questions do you have?
@jaimecuellar14 I don't have any experience with semantic image segmentation, but happy to brainstorm/debug your issues. Have you seen this document? http://blog.qure.ai/notes/semantic-segmentation-deep-learning-review It seems like it might be of use.
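
Not a full answer, but here is a minimal sketch of the reporting part of the question, assuming the model outputs one integer class id per pixel (the class names below are hypothetical):

```python
import numpy as np

# Hypothetical class ids -> names; adapt to your dataset's label map.
CLASS_NAMES = {0: "background", 1: "chair", 2: "table"}

def class_percentages(pred_mask: np.ndarray) -> dict:
    """Return {class_name: percentage of pixels} for one predicted mask."""
    labels, counts = np.unique(pred_mask, return_counts=True)
    total = pred_mask.size
    return {CLASS_NAMES.get(int(label), str(label)): 100.0 * count / total
            for label, count in zip(labels, counts)}

# Tiny fake 4x4 predicted mask, just to show the output format.
mask = np.array([[0, 0, 1, 1],
                 [0, 1, 1, 1],
                 [0, 2, 2, 2],
                 [0, 2, 2, 2]])
print(class_percentages(mask))
# {'background': 31.25, 'chair': 31.25, 'table': 37.5}
```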
Pedro Henrique Braga da Silva
@pedrohenriquebr
Hi there
I would like to know the best DBSCAN variant algorithms
Eric Leung
@erictleung
@pedrohenriquebr I'm somewhat familiar with DBSCAN, but not its variants. Is DBSCAN not sufficient for your work?
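
For reference, a minimal scikit-learn sketch comparing plain DBSCAN with OPTICS, one commonly used density-based variant that ships with scikit-learn (HDBSCAN is another popular variant, available as a separate package):

```python
import numpy as np
from sklearn.cluster import DBSCAN, OPTICS
from sklearn.datasets import make_moons

# Toy two-moons data, just to run both algorithms side by side.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

db = DBSCAN(eps=0.2, min_samples=5).fit(X)
opt = OPTICS(min_samples=5).fit(X)

def n_clusters(labels: np.ndarray) -> int:
    """Count clusters, ignoring the noise label -1."""
    return len(set(labels)) - (1 if -1 in labels else 0)

print("DBSCAN clusters:", n_clusters(db.labels_))
print("OPTICS clusters:", n_clusters(opt.labels_))
```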
M4H3NDR4N 5P4RK3R
@M4H3NDR4N
Hey guys, how do I serve a YOLO model as a REST API?
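
One common pattern is to wrap the model in a small web service. Here is a minimal Flask sketch, where run_yolo_inference is a hypothetical stand-in for however you load and run your YOLO weights (Darknet, PyTorch, etc.):

```python
import io

from flask import Flask, jsonify, request
from PIL import Image

app = Flask(__name__)

def run_yolo_inference(image: Image.Image) -> list:
    """Hypothetical helper: run your loaded YOLO model and return detections."""
    raise NotImplementedError("plug in your model's inference call here")

@app.route("/predict", methods=["POST"])
def predict():
    # Expect the image as a multipart file upload under the key "image".
    file = request.files["image"]
    image = Image.open(io.BytesIO(file.read())).convert("RGB")
    detections = run_yolo_inference(image)
    return jsonify({"detections": detections})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```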
Ram G Suri
@ramgsuri
Folks, could someone help me? I have a list of orders along with their timestamps at different stages (like makeline / being baked / dispatched for delivery). Now let's say I have a new order and I want to predict its ETA. Please help me pick what model to use.
Eric Leung
@erictleung
@ramgsuri a multivariate linear regression would be a good starting point
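
As a sketch of what that could look like (the feature names and numbers below are entirely made up), scikit-learn's LinearRegression can be fit on historical orders and then used to predict an ETA for a new one:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical historical orders: features known when the order is placed
# and the observed total time to delivery (the target).
history = pd.DataFrame({
    "num_items":           [2, 5, 3, 8, 1, 6],
    "distance_km":         [1.2, 3.5, 2.0, 4.1, 0.8, 2.7],
    "orders_in_queue":     [3, 7, 4, 9, 1, 6],
    "minutes_to_delivery": [22, 41, 28, 55, 18, 37],
})

X = history.drop(columns="minutes_to_delivery")
y = history["minutes_to_delivery"]

model = LinearRegression().fit(X, y)

# Predicted ETA for a new order with hypothetical feature values.
new_order = pd.DataFrame({"num_items": [4], "distance_km": [2.5], "orders_in_queue": [5]})
print(model.predict(new_order))
```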
sa-js
@sa-js
Does anyone have experience dealing with text data? I have a dataset of SEO keywords, and I have to predict clicks from them. I have used HashingVectorizer to convert the text data to vectors, and then I feed these to my model. The main issue is that my solution will be evaluated on a different dataset containing different keywords. I was thinking of using stemming to reduce keywords to their root words and removing common words like of, is, are, etc. with NLTK. Then I would feed this to the vectorizer and finally input these vectors to my model. Is this approach correct? BTW, I have split my dataset into 75/25 training and testing sets and the results are pretty good, but I want to make them better, because I think my technique would fail on another dataset with different keywords. Can anyone guide me?
Eric Leung
@erictleung

@sa-js sounds like you're well on your way to analyzing the data! You've already gotten the data into vector form. Stemming is a great idea, as you mentioned. NLTK should be able to do a lot of this for you, as you suggest. I'd agree this is a good enough approach for now.

Also, it looks like NLTK has a built-in classifier you can use https://pythonspot.com/natural-language-processing-prediction/

Here are some other resources that might help:

Good luck!
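
To make that pipeline concrete, here is a minimal sketch (the keywords and click counts are made up) of NLTK stemming plus stop-word removal feeding scikit-learn's HashingVectorizer and a simple regressor:

```python
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# One-time downloads: nltk.download("stopwords"), nltk.download("punkt")

stemmer = PorterStemmer()
stop_words = set(stopwords.words("english"))

def stem_and_filter(text: str) -> str:
    """Lowercase, drop stop words, and stem the remaining tokens."""
    tokens = word_tokenize(text.lower())
    return " ".join(stemmer.stem(t) for t in tokens if t not in stop_words)

# Hypothetical data: SEO keyword phrases and observed click counts.
keywords = ["best running shoes", "cheap running shoes for men", "buy laptops online"]
clicks = [120, 85, 60]

model = make_pipeline(
    HashingVectorizer(preprocessor=stem_and_filter, n_features=2**18),
    Ridge(),
)
model.fit(keywords, clicks)
print(model.predict(["running shoes for women"]))
```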

mridul037
@mridul037
I am new to data science. I know Python and pandas. What next?
Should I continue with a small project?
Philip Durbin
@pdurbin
maybe matplotlib
Alice Jiang
@becausealice2
@mridul037 Like @pdurbin suggests, you can learn visualization with matplotlib, or you can try learning to work with scikit-learn
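
If it helps to see how small a starting point can be, here is a tiny pandas + matplotlib sketch with made-up numbers:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Tiny made-up dataset, just to show the pandas -> matplotlib workflow.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "exam_score":    [52, 58, 65, 70, 74, 81],
})

df.plot.scatter(x="hours_studied", y="exam_score", title="Score vs. hours studied")
plt.show()
```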
Eric Leung
@erictleung

@mridul037 if you want some practice, you can go through the full workflow of collecting data, cleaning/manipulating it, and visualizing it.

Here is one such initiative to practice this https://github.com/rfordatascience/tidytuesday

Eric Leung
@erictleung
@mridul037 you can also consider going through challenges on https://www.kaggle.com/. This gives a focused and constrained problem space to work in so that you can practice even more. Good luck!
Praveen Raghuvanshi
@praveenraghuvanshi1512
[image: training/validation loss plot]
I have a CNN model (Conv2D -> Conv2D -> Flatten -> Dense) for the CIFAR-10 dataset with 271,146 parameters. I know it's a very small model and the intention is to learn the concepts. After training the model, it seems to overfit. Please share your views on the loss plot shown.
Alice Jiang
@becausealice2
All that chart really suggests to me is that it's overfitting, which you're already aware of.
Josh Goldberg
@GoldbergData
What @becausealice2 said. Try adding regularization?
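
To show where regularization would slot in, here is a minimal Keras sketch of a Conv2D -> Conv2D -> Flatten -> Dense model with dropout and L2 weight decay added (layer sizes are guesses, not the original 271,146-parameter configuration):

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(32, 32, 3),
                  kernel_regularizer=regularizers.l2(1e-4)),
    layers.Conv2D(32, (3, 3), activation="relu",
                  kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dropout(0.25),  # randomly drop activations during training
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```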
mmalinda
@mmalinda
Hello, I am a data science student looking to interact with other data scientists. I hope we can learn from each other. I'm interested in health research (specifically bioinformatics); if anyone has useful resources in that area they can share, I would really appreciate it!
Philip Durbin
@pdurbin
I'm not sure if this helps but I know some people at https://informatics.fas.harvard.edu
Eric Leung
@erictleung

@mmalinda welcome!

Bioinformatics related, here are some I've found useful:

If you have specific questions for bioinformatic data science, feel free to ask around here :smile:

mmalinda
@mmalinda

@pdurbin thank you, I am interested in connecting with them if possible.

Thanks @erictleung, I will look into them.