Rahul Bhatia

Cracking an interview can be difficult, especially now that competition in the job market is so cut-throat. After interviewing heavily for Machine Learning and Data Science roles at big companies and startups, I have compiled a detailed list of Data Science Interview Resources that should prepare you well for an upcoming Data Science/ML interview. I update the list frequently with hand-picked, quality resources you can use to prepare. Best of luck!


Dr. Muhammad Anjum
@alexn11 hi sir
Alexandre De Zotti
@anjumuaf123_twitter Hi, sorry, I'm quite busy at the moment and can't really answer your questions. I hope you have made some progress.
discord datascience best channel! i am out!
Eric Leung
Even if you don't understand methylation (I don't), this recent Twitter thread is a nice lesson in data leakage when training your machine learning models https://twitter.com/jmschreiber91/status/1291161574393221123 It contains screenshots of Python code as well so you can play around with it yourself.
Hello, can someone here explain a mathematical notation for me
Eric Leung
@hassanalt hey there! What kind of math notation are you looking for? As a first look, I'd search through this Wikipedia page or this GitHub page for how to decipher/translate some math notation to plain words
@erictleung I was wondering what this meant o(O(f(n))) = o(f(n))
Eric Leung
@hassanalt depends on the context. The O(f(n)) on the left might be referring to Big-O notation, but I'm not sure if you're working in that space of work. But even so within Big-O notation, I'm not sure what the lowercase O is referring to. What area of work is this equation showing up in?
I simply saw it in my lecture and the professor did not quite explain it.
Was in a list of assertions
Eric Leung
@hassanalt mmm okay. Well, the little "o" may mean little-o notation, which is a loose (not tight) upper bound on how fast a function grows https://www.geeksforgeeks.org/analysis-of-algorithems-little-o-and-little-omega-notations/ So the assertion appears to say that anything growing strictly slower than some function in O(f(n)) also grows strictly slower than f(n) itself. Without more context it's hard to say much more, but I hope the notation explanation points you in the right direction. You can read more about time complexities here https://en.wikipedia.org/wiki/Time_complexity#Table_of_common_time_complexities
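One possible reading of the assertion, assuming the standard asymptotic definitions and eventually positive functions (both are assumptions, since the lecture context is unknown): if g is in O(f) and h is in o(g), then h is in o(f). A short derivation sketch:

```latex
% Assumption: f, g, h are eventually positive functions of n.
\begin{align*}
g \in O(f) &\iff \exists\, C > 0,\ n_0 :\ g(n) \le C\, f(n) \quad \forall n \ge n_0, \\
h \in o(g) &\iff \lim_{n \to \infty} \frac{h(n)}{g(n)} = 0, \\
\frac{h(n)}{f(n)} &= \frac{h(n)}{g(n)} \cdot \frac{g(n)}{f(n)}
  \le C \cdot \frac{h(n)}{g(n)} \xrightarrow[n \to \infty]{} 0
  \quad\Longrightarrow\quad h \in o(f).
\end{align*}
```

That gives o(O(f(n))) ⊆ o(f(n)). The other inclusion follows because f(n) is itself in O(f(n)), so under this reading the two sets coincide.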
Hi team, I am trying to build a recommendation engine for a food delivery app. I have been looking into approaches such as Market Basket Analysis. The company has a huge amount of order data from different customers. Could you please help me figure out where to start?
Eric Leung
@dinaklal you could look at association rule learning https://en.wikipedia.org/wiki/Association_rule_learning. A good way to narrow down how to implement this (or any other method you choose) is to pick a programming language (e.g., Python or R), look for packages that can do product recommendations, and build a minimal working example before fine-tuning it.
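As a rough idea of what a minimal working example could look like, here is a sketch in plain Python that mines pairwise association rules ("customers who order X also order Y") from a handful of orders. The function name `pair_rules`, the thresholds, and the toy orders are all made up for illustration; a real project would more likely rely on an existing package and far more data.

```python
from collections import Counter
from itertools import combinations


def pair_rules(orders, min_support=0.3, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) rules for item pairs.

    support    = share of orders containing both items
    confidence = share of orders with lhs that also contain rhs
    """
    n_orders = len(orders)
    item_counts = Counter()
    pair_counts = Counter()
    for order in orders:
        items = sorted(set(order))  # ignore duplicates within one order
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n_orders
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in both directions.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Most confident rules first; ties broken alphabetically.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


# Hypothetical toy data, not real order history.
orders = [
    ["burger", "fries", "cola"],
    ["burger", "fries"],
    ["pizza", "cola"],
    ["burger", "cola"],
    ["pizza", "garlic bread"],
]

rules = pair_rules(orders, min_support=0.3, min_confidence=0.6)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Counting only pairs keeps the sketch short; full algorithms such as Apriori extend the same support/confidence idea to larger item sets.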
Kobi Bar Hanin
Hi! I’ve built an interactive git cli - igit. Check it out: https://github.com/kobibarhanin/igit to install: pip install igit
Vijish Madhavan
Introducing ArtLine, create amazing Line Art Portraits.
Hello, I came here from https://github.com/freeCodeCamp/open-data/tree/master/open-api, and I really need some datasets about students' performance for knowledge tracing research. Could anybody tell me how I can get an "API key"?
Eric Leung
@935462955 thanks for your interest! I don't believe the API is fully developed. However, that repository does have a bunch of other datasets you can consider.
Is that all the FCC data? The data in recent years may be helpful to me, but I don't seem to find it
Eric Leung
@935462955 that should be all the data that we're allowing to the public for now
Alice Jiang
Long time, no see. Do we have any of our regulars around anymore?
Abdul Qoyyuum
I wanted to log back in to FreeCodeCamp but no matter what I try to login with, it won't let me.
I can't find a "Forget password" or a way to contact to reset my account. Can anyone help?
Dr. Muhammad Anjum
@Qoyyuum Dear, there is no other way; you can create a new password.
@Qoyyuum If you sign in with another auth service, such as a Google account or GitHub, that uses the same email address as your original registration, FCC will recognize you as the same person/account.
Dhwaj Sharma
Muhammad Yasir
i have a question regarding dataset imbalance
so i will elaborate
Muhammad Yasir

I have a dataset which is for binary classification ( or at least we are approaching it from a binary classification perspective )

There are a total of 2.5 million rows, with label 0 belonging to around 2,200,000 (2.2 million) rows and label 1 belonging to around 321,000 (0.3 million) rows. There are around 45 features.

The imbalance approaches a ratio of around 1 : 7

My problem is very straightforward: even WITHOUT any data preprocessing, if I try to classify the data,

the classification algorithms, no matter what parameters are set, give around 99% on ALL performance metrics (accuracy, precision, recall, F1 score, etc.)

This would probably suggest a bad case of overfitting, but I am not sure. Feel free to explain and add your opinion on what the reason could be.

I tried to visualize the data using t-SNE and saw that the entire dataset is shaped like an ellipse, with heavy overlap between both labels. This means that (1) the data is badly imbalanced and (2) the data is badly overlapped. I highly doubt I can use anomaly detection here, as all the 'anomalies' (label 1) are sitting close to the 'normal' (label 0) data.

any suggestions on how i should proceed ?
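One sanity check that may help here, sketched in plain Python: compute what a trivial "always predict label 0" classifier would score on these class counts, treating label 1 as the positive class. The counts are the rounded figures quoted above, and `binary_metrics` is a hypothetical helper written for this illustration, not a library function.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts.

    Ratios with a zero denominator are reported as 0.0.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Rounded class counts from the message above (label 1 = positive class).
n_negative = 2_200_000
n_positive = 321_000

# A classifier that always predicts label 0 never produces a positive:
# every negative is a true negative, every positive a false negative.
baseline = binary_metrics(tp=0, fp=0, tn=n_negative, fn=n_positive)
print(baseline)
```

The baseline reaches roughly 87% accuracy but zero precision, recall and F1 for label 1, so imbalance alone would not produce 99% on every metric. Scores that high on heavily overlapping data could instead point to something like a feature that leaks the label or evaluation on rows the model was trained on; that is a guess to verify, not a diagnosis.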

Dr. Muhammad Anjum
@SyedMuhamadYasir Hi Dear I need some help plz
Hi, I need help with the API Key
HELP any chance somebody can help with this mysql install/config/socket error?
I am running "mysql_secure_installation" and getting
sudo mysql_secure_installation

Securing the MySQL server deployment.

Enter password for user root:
Error: Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)
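For what it's worth, the trailing "(2)" in that message is the operating system's "No such file or directory" error code, which usually means the socket file is not there (for example because the server is not running, or it is configured with a different socket path). A small Python sketch to check what actually exists at that path; `socket_status` is a hypothetical helper name, and the path is simply the one from the error above:

```python
import os
import stat


def socket_status(path):
    """Report whether `path` is missing, a Unix socket, or something else."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    return "socket" if stat.S_ISSOCK(mode) else "not-a-socket"


# Path copied from the error message; adjust if your config uses another one.
print(socket_status("/var/run/mysqld/mysqld.sock"))
```

If this prints "missing", checking whether the MySQL service is actually running is probably the next step.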
Michael Li 🚀Publish Reproducible Jupyter Notebook
Does anyone else find Matplotlib's API hard to remember? I have friends who just export their Pandas data and plot with Excel. I spend a lot of time googling Matplotlib help. How do you make your Python Data Plots?
Josh Goldberg
Yes. Matplotlib is not an intuitive API in my opinion. I prefer ggplot2.
@becausealice2 @erictleung Still here after all these years!
Alice Jiang
@GoldbergData Kinda sorta.... I check in every once in a while, but I've made a career change and have been busy trying to keep up with all that comes with that. How have you been? Any interesting projects?
I have a question. Let's say a car has a thrust output in the range [0, 1] and a steering output (left and right) in the range [-1, 1], and an ANN should find the right values. Let's say the ANN should output [thrust, steering angle]. Can I just split the raw output of the last linear layer and apply sigmoid to the first output and tanh to the second, or does the activation have to be the same for all entries of the last linear layer's output?
Quincy Larson

Hey @/all freeCodeCamp is building a data science curriculum with advanced math and data science projects. Learn more here: https://www.freecodecamp.org/news/building-a-data-science-curriculum-with-advanced-math-and-machine-learning/

We are looking for open source contributors and experienced math + CS teachers for (paid) help with instructional design. If you are interested, please reach out to me at quincy@freecodecamp.org

Eric Leung
@theunknown22:matrix.org that's an interesting approach, where you'd apply different activation functions on nodes in a single layer. It appears to be possible, at least in PyTorch https://discuss.pytorch.org/t/control-specific-nodes-in-the-layer/78992. It hypothetically could help and give you more range on what is possible for it to predict. But it will make it more difficult to understand. I hope that helps.
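To make the idea concrete, here is a framework-free sketch of splitting the two raw outputs and squashing each into its own range. The function names are made up for this example; in PyTorch the same thing could be done by applying `torch.sigmoid` and `torch.tanh` to separate slices of the output tensor.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function, output in [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # safe: x < 0, so exp(x) cannot overflow
    return e / (1.0 + e)


def car_outputs(raw):
    """Map the two raw outputs of the last linear layer to control values.

    raw[0] -> thrust in [0, 1] via sigmoid
    raw[1] -> steering in [-1, 1] via tanh
    """
    thrust_raw, steer_raw = raw
    return sigmoid(thrust_raw), math.tanh(steer_raw)


print(car_outputs([0.0, 0.0]))   # neutral raw values
print(car_outputs([2.0, -1.5]))  # more thrust, steering to one side
```

Both activations are differentiable, so gradients still flow through each output independently during training.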
Josh Goldberg
@becausealice2 I still work in the field. No interesting open source projects at this time.
Piyush Hirapara
How hard is it to animate bivariate Gaussian distribution by varying mean, individual variance and covariance?
In Python
Eric Leung
@GoldbergData good to see you around! I couldn't help notice you're at Amazon now. You working mostly in R or Python (or none of the above) these days?
Eric Leung
@Piyush-97 if you're using Jupyter Notebooks, you could consider using Jupyter Widgets to create a slider that can vary mean, variance, and maybe covariance for a distribution you want to create. You can probably then recreate visualizations like this https://rpsychologist.com/cohend/
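The numerical side of this is fairly small. Below is a sketch in plain Python of the bivariate Gaussian density plus one grid of values per animation frame as the covariance changes; the function names, grid size, and covariance values are arbitrary choices for illustration, and the plotting or slider part is left out.

```python
import math


def bivariate_gaussian_pdf(x, y, mean_x, mean_y, var_x, var_y, cov_xy):
    """Density of a 2D Gaussian with covariance [[var_x, cov_xy], [cov_xy, var_y]]."""
    det = var_x * var_y - cov_xy ** 2
    if var_x <= 0 or var_y <= 0 or det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = x - mean_x, y - mean_y
    # Quadratic form d^T * inverse(covariance) * d, written out for the 2x2 case.
    quad = (var_y * dx * dx - 2.0 * cov_xy * dx * dy + var_x * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def density_grid(mean_x, mean_y, var_x, var_y, cov_xy, lo=-3.0, hi=3.0, steps=7):
    """Evaluate the density on a steps x steps grid (rows are y, columns are x)."""
    ticks = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return [[bivariate_gaussian_pdf(x, y, mean_x, mean_y, var_x, var_y, cov_xy)
             for x in ticks] for y in ticks]


# One grid per frame, sweeping the covariance while means and variances stay fixed.
frames = [density_grid(0.0, 0.0, 1.0, 1.0, c) for c in (-0.8, -0.4, 0.0, 0.4, 0.8)]
print(len(frames), "frames of", len(frames[0]), "x", len(frames[0][0]), "values")
```

Each frame could then be drawn as a contour or heat map, with the slider (or an animation loop) picking which parameters to pass in.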

I digitized some roads as multilines, hospitals as multipoints, and a boundary as a polygon, then computed how many roads intersect using the Simple Features (sf) library, with latitudes and longitudes taken from Google Maps, and plotted the result using ggplot2. It worked well.

I then wanted to check and plot how many roads intersect with a hospital, so I created a 200 m buffer around it and tried the st_intersects() function. This only gave 1:1 as the answer, along with a message saying

Sparse geometry binary predicate list of length 1, where the predicate was `intersects' 1: 1

And when I tried plotting it, using ggplot it gives this error message

Error: data must be a data frame, or other object coercible by fortify(), not an S3 object with class sgbp/list Run rlang::last_error() to see where the error occurred.

I have added more details and code in a Stack Overflow question. Please help 🥺.

Link as Plaintext: https://stackoverflow.com/questions/67350113/unable-to-plot-intersections-using-st-intersects-in-r

Alice Jiang
@GoldbergData Good on you! I was losing my mind over the local culture in the field and finally just threw in the towel. It's been a couple years since I looked at any data and I've honestly been missing it. I may take it back up as a hobby just to scratch that itch.
Hi I am new to the concept of gitter / channels. Is it ok to ask here just right away any question related to datascience, e.g. regarding a lavaan CFA model?