Eric Leung
@erictleung

@mridul037 if you want some practice, you can work through the full workflow of collecting data, cleaning/manipulating it, and visualizing it.

Here is one such initiative to practice this https://github.com/rfordatascience/tidytuesday

Eric Leung
@erictleung
@mridul037 you can also consider going through challenges on https://www.kaggle.com/. This gives a focused and constrained problem space to work in so that you can practice even more. Good luck!
Praveen Raghuvanshi
@praveenraghuvanshi1512
image.png
I have a CNN model (Conv2D -> Conv2D -> Flatten -> Dense) for the CIFAR-10 dataset with 271,146 parameters. I know it's a very small model, and the intention is to learn the concepts. After training, the model seems to overfit. Please share your views on the loss plot shown.
Alice Jiang
@becausealice2
All that chart really suggests to me is that it's overfitting, which you're already aware of.
Josh Goldberg
@GoldbergData
What @becausealice2 said. Try adding regularization?
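(Editor's sketch, not from the chat.) One cheap form of regularization is early stopping: stop training once the validation loss has not improved for a few epochs and keep the weights from the best epoch. The helper below only illustrates the rule on a list of per-epoch validation losses. The function name, `patience`, and `min_delta` are assumptions for illustration, not something the posters used. Dropout or an L2 weight penalty would be other reasonable things to try.

```python
def best_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch_index, stop_epoch_index) for a patience-based rule.

    val_losses: validation loss per epoch, in training order.
    patience:   epochs to wait without improvement before stopping.
    min_delta:  smallest decrease that counts as an improvement.
    """
    if not val_losses:
        raise ValueError("val_losses must contain at least one value")

    best_idx = 0
    best_loss = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best validation loss: remember it and reset the counter.
            best_idx, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                # No improvement for `patience` epochs in a row: stop here.
                return best_idx, epoch
    # Training ended before the patience ran out.
    return best_idx, len(val_losses) - 1


# Made-up losses shaped like an overfitting curve: falls, then rises.
losses = [1.0, 0.8, 0.7, 0.72, 0.75, 0.8, 0.9]
print(best_epoch(losses, patience=3))
```

If the validation loss in the plot bottoms out early and then climbs while the training loss keeps falling, a rule like this would keep the weights from around the minimum.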
mmalinda
@mmalinda
Hello, I am a data science student looking to interact with other data scientists. Hope we can learn from each other. I'm interested in health research (specifically bioinformatics), if anyone has any useful resources around that that they can share I would really appreciate it!
Philip Durbin
@pdurbin
I'm not sure if this helps but I know some people at https://informatics.fas.harvard.edu
Eric Leung
@erictleung

@mmalinda welcome!

Bioinformatics related, here are some I've found useful:

If you have specific questions for bioinformatic data science, feel free to ask around here :smile:

mmalinda
@mmalinda

@pdurbin thank you, I am interested in connecting with them if possible.

Thanks @erictleung , I will look into them.

Philip Durbin
@pdurbin
Ok, let me know.
Hèlen Grives
@mesmoiron
Hi, long time no see. But good to hear you are all still around. I’m currently enrolled in a tech startup program. I have met great entrepreneurs who mentor us. I also met a few data scientists, so good news so far. I also want to thank you, because without your support I would never have jumped at the opportunity. I hope to catch up a little, though I'm loaded with assignments. I have to incorporate this year, so there's a lot of work to do. Have a great weekend.
Philip Durbin
@pdurbin
you too
Eric Leung
@erictleung
@mesmoiron long time no see! Good to hear you're doing well :+1: Feel free to share any cool things you've learned along the way :smile:
Eric Leung
@erictleung

Hey y'all. I did a fun Twitter analysis of the Disney+ streaming announcement. Here's a nice clean plot focusing on just Pixar films over time and the number of favorites each got on Twitter in the past week.

Here's the code and other plots I made for those interested https://github.com/erictleung/disneyplus-twitter-analysis
frankieliu
@frankieliu
Hello, could anyone tell me the guidelines for this group? I did a search on machine learning and this group showed up, but I'm not sure where to read about suitable topics for discussion and the rules of this group. And if anyone knows a similar active group for talking about machine learning problems, please point me to it. Thanks.
Alice Jiang
@becausealice2
@frankieliu there's not much to the guidelines here, just be friendly, don't veer too far off topic, and no self-promotion. You can have a quick read through the Code of Conduct if you'd like :)
frankieliu
@frankieliu
@becausealice2 thanks for the response, what are suitable topics to discuss for this group?
Alice Jiang
@becausealice2
Of course! Anything and everything Data Science related is encouraged here, including things like datasets, machine learning, AI, visualization, methods, languages and libraries....
Eric Leung
@erictleung
@frankieliu welcome! Feel free to use this room as like a sounding board for ideas or general help. We have a wide range of interests and expertise. I don't think it would be too out of scope to share your journey of machine learning here as well. Are you studying anything in particular in machine learning right now?

...there's not much to the guidelines here, just be friendly, don't veer too far off topic, and no self-promotion. You can have a quick read through the Code of Conduct if you'd like :)

"...no self-promotion." @becausealice2 Lol whoops. I hope my earlier link wasn't too self-promotion-y! (Aren't I a moderator too? I should know the bounds too haha.) Just sharing an analysis I thought others might be interested in seeing :smile:

I've been trying to review some probability and statistics and came across this https://stanford.edu/~shervine/teaching/cme-106/cheatsheet-statistics It is a little math-y, but I like cheatsheets because they give some roadmap for things to further investigate.
Alice Jiang
@becausealice2
Sharing analyses done without expectation of compensation just doesn't feel like self-promotion to me, especially in a Gitter channel focused on data science. It would be a very stale channel if we couldn't share pet analyses here. Maybe I'm misunderstanding the rules but I've been interpreting "self-promotion" as being anyone coming in specifically to direct attention to their product they've come to peddle.
Alice Jiang
@becausealice2
You guys ever right click a youtube video and watch "stats for nerds"?
Eric Leung
@erictleung

@becausealice2 that's fair, my thoughts exactly :smile:

You guys ever right click a youtube video and watch "stats for nerds"?

Sometimes! Never really sat down to learn about them though.

frankieliu
@frankieliu
@erictleung and @becausealice2 thanks for the response. Anyone in the SF bay area (silicon valley) would like to get together and go through Ian Goodfellow's book on Deep Learning -- or any other such books?
Eric Leung
@erictleung
@frankieliu I'm on the west coast but not in SF. I would be interested in Goodfellow's book. I have been casually reading it.
frankieliu
@frankieliu
@erictleung Great, let's get started maybe get some momentum going, I will get a github going and we can post questions to each other.
Eric Leung
@erictleung
@frankieliu sounds good. We could create issues for discussions and contribute to the repo with collective notes and code :+1:
Eric Leung
@erictleung
For those wanting to dig deeper into the "black boxes" of machine learning, here's a book that might be a good resource https://christophm.github.io/interpretable-ml-book/.
AlexKara
@AlexKara
@frankieliu sounds interesting! did you guys start yet? which books are on the list?
Ruksar Kachchhi
@ruksarjk
Hi
Can you use a decision tree to identify images?
Alice Jiang
@becausealice2
I'm sure it's possible but it sounds inefficient
frankieliu
@frankieliu
@AlexKara started writing notes in https://frankliu.org/dlbook-ig let me know if you can access it
milo_An
@Adser89
Hi, I'm wondering what the best way is to read several text files in a folder and then convert them into individual DataFrames.
import os

# direct, my_dir_path and ext (e.g. '.txt') are assumed to be defined earlier.
folder = os.path.join(direct, my_dir_path)

# Create an empty dict
file_dict = {}

txt_files = [i for i in os.listdir(folder) if os.path.splitext(i)[1] == ext]

# Iterate over the selected text files.
for f in txt_files:
    # Read each file and store its content under a key built from the file name.
    # os.path.splitext removes only the extension; f.strip('.txt') would also
    # remove any leading or trailing '.', 't' and 'x' characters from the name.
    with open(os.path.join(folder, f)) as file_object:
        file_dict['mic' + os.path.splitext(f)[0]] = file_object.read()
I did this by creating a dictionary where the filename is the key and the content is the value. But how can I then convert this into individual DataFrames? Is there a better way?
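(Editor's sketch, not from the chat.) One possible answer is to skip the intermediate strings and let pandas parse each file directly, keeping one DataFrame per file in a dictionary. This assumes the files are delimited text that `pandas.read_csv` can parse, which the question does not say. The function name `load_text_frames` and its parameters are hypothetical.

```python
from pathlib import Path

import pandas as pd


def load_text_frames(folder, ext=".txt", prefix="mic", **read_kwargs):
    """Return {prefix + file stem: DataFrame} for each file in `folder` ending in `ext`.

    Extra keyword arguments (e.g. sep="\t", header=None) go to pandas.read_csv.
    """
    frames = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix == ext:
            # path.stem is the file name without its final extension.
            frames[prefix + path.stem] = pd.read_csv(path, **read_kwargs)
    return frames
```

Usage would look like `frames = load_text_frames(some_folder, sep="\t")`, after which `frames["mic<name>"]` is the DataFrame for `<name>.txt`. The files are read one after another rather than simultaneously. For a handful of small files that is usually fine.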
AlexKara
@AlexKara
@frankieliu can access, feel free to pm w books on the list, happy to join in + collaborate on questions
l0k3ndr
@l0k3ndr
Hi guys, you can check out my repo of mostly python notes here :- https://github.com/l0k3ndr/programming-notes
Also, feel free to ping me if you are working on something and just need a sounding board for discussion, especially data science, ML, algorithms, DS, and anything pythonic :D
Alice Jiang
@becausealice2
@Adser89 I don't know if you've found an answer to your question, and I am not able to help, but you can try asking in the Programming Help category of FreeCodeCamp's forum
HARISH GONTU
@HG1112
Hello, I have been working with Spark for the past two years. I am fascinated by its ease of use and performance. I want to learn about its internals. I found some online resources, but I do not understand in which order I should proceed to dismantle the code. Whenever I want to learn from others' code, I go to the main function and then move inward. In this case, it is confusing. Any suggestions?
Alice Jiang
@becausealice2
Oh! I love Spark! I've never looked at its internals, though. Have you tried looking for a developer blog or some kind of explanation for the code? @HG1112
Alice Jiang
@becausealice2
Has anyone here seen The Fix on Netflix? I have mixed-but-leaning-poor feelings about the data they present and especially with the way they present it. :/
Alice Jiang
@becausealice2
Okay so their data expert just presented data on my home state and the information she gave was correct, but had nothing to do with the subject she was trying to provide insight on. I'm actually in awe of how poor her research was...
Alice Jiang
@becausealice2
If you guys are still planning on having a resource discussion group, might I suggest either the Data Science or Machine Learning subforums? The FCC forum already has a bit of momentum and it would allow a more broad participation. Besides, those subforums could use a little push into activity :wink:
Philip Durbin
@pdurbin
makes sense
Alice Jiang
@becausealice2
Heya @pdurbin ! Long time, no see! You should head on over as well and see if there's anything you can ask or contribute as well :wink: