Cracking an interview can be difficult, especially now that competition in the job market is cut-throat. After interviewing extensively for Machine Learning and Data Science roles at big companies and startups, I have compiled a detailed list of Data Science Interview Resources that should prepare you well for an upcoming Data Science/ML interview. I update the list frequently with hand-picked, quality resources that you can use to prepare for your interviews. Best of luck!
O(f(n)) on the left might be referring to Big-O notation, but I'm not sure whether you're working in that area. Even within Big-O notation, I'm not sure what the lowercase o is referring to. What area of work is this equation showing up in?
O(f(n))) is equivalent to f(n) itself. This might make sense in the right context, but right here it doesn't mean much. I hope the notation explanation points you in the right direction. You can read more about time complexities here: https://en.wikipedia.org/wiki/Time_complexity#Table_of_common_time_complexities
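For reference, these are the standard textbook definitions of the uppercase and lowercase notations, which may help in working out which one the equation intends. The function g and the constants c and n_0 are generic symbols, not taken from the question.

```latex
% Big-O: f grows no faster than g, up to some constant factor
f(n) = O(g(n)) \iff \exists\, c > 0,\ \exists\, n_0 :\ |f(n)| \le c\,|g(n)| \quad \forall\, n \ge n_0

% little-o: f grows strictly slower than g (the bound holds for every constant)
f(n) = o(g(n)) \iff \forall\, c > 0,\ \exists\, n_0 :\ |f(n)| \le c\,|g(n)| \quad \forall\, n \ge n_0
```

The only difference is the quantifier on c: "there exists" for Big-O, "for all" for little-o.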
I have a dataset for binary classification (or at least we are approaching it from a binary classification perspective).
There are 2.5 million rows in total: label 0 accounts for around 2,200,000 (2.2 million) rows and label 1 for around 321,000 (0.3 million) rows. There are around 45 features.
The imbalance is roughly 1 : 7.
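As a quick sanity check on these numbers, the arithmetic can be sketched as below. It assumes the label-0 count is about 2.2 million (the figure that is consistent with the stated total and ratio); the variable names are illustrative only.

```python
# Counts quoted in the post. The label-0 count assumes the "2.2 million"
# figure, which matches the stated total of about 2.5 million rows.
n_label0 = 2_200_000
n_label1 = 321_000

total = n_label0 + n_label1

# Imbalance expressed as "1 : ratio" (minority : majority).
ratio = n_label0 / n_label1

# Accuracy of a trivial model that always predicts the majority label.
majority_baseline = n_label0 / total

print(f"total rows: {total:,}")
print(f"imbalance: 1 : {ratio:.2f}")
print(f"majority-class accuracy: {majority_baseline:.1%}")
```

Under these assumptions a model that always predicts label 0 already reaches roughly 87% accuracy, so accuracy alone says little here; the minority-class precision and recall are the more informative figures.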
My problem is straightforward: even WITHOUT any data preprocessing, when I try to classify the data,
the classification algorithms give around 99% on ALL performance metrics (accuracy, precision, recall, F1 score, etc.), no matter what parameters are set.
This would probably suggest a bad case of overfitting, but I am not sure. Feel free to explain and add your opinion on what the reason could be.
I tried to visualize the data using t-SNE and saw that the entire dataset is shaped like an ellipse, with heavy overlap between the two labels. This means that (1) the data is badly imbalanced and (2) the data is badly overlapped. I highly doubt I can use anomaly detection here, as all the 'anomalies' (label 1) sit close to the 'normal' (label 0) data.
Any suggestions on how I should proceed?
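One possible thing to check, offered as a guess rather than a diagnosis: near-perfect scores on every metric, combined with heavy class overlap in the t-SNE plot, sometimes mean that a single feature encodes the label (leakage). The sketch below computes, in plain Python, the ROC AUC each feature achieves on its own. The helper names, the 0.99 threshold, and the toy data are all hypothetical and not from the post.

```python
from typing import Dict, List, Sequence


def single_feature_auc(values: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC of one feature used directly as a score (rank-sum formula)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        # Tied values occupy ranks i+1..j and share their average rank.
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_pos += avg_rank * sum(lab for _, lab in pairs[i:j])
        i = j

    # Mann-Whitney U statistic for the positive class, normalised to [0, 1].
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def suspicious_features(
    columns: Dict[str, Sequence[float]],
    labels: Sequence[int],
    threshold: float = 0.99,  # hypothetical cut-off, tune for your data
) -> List[str]:
    """Names of features that nearly separate the labels on their own."""
    flagged = []
    for name, values in columns.items():
        auc = single_feature_auc(values, labels)
        # A feature can separate the classes in either direction.
        if max(auc, 1.0 - auc) >= threshold:
            flagged.append(name)
    return flagged


# Toy data (made up): "leaky" tracks the label, "noise" does not.
toy_labels = [0, 0, 1, 1]
toy_columns = {"leaky": [0.1, 0.2, 0.9, 0.8], "noise": [1, 2, 1, 2]}
print(suspicious_features(toy_columns, toy_labels))
```

If a feature is flagged on the real data, it is worth checking whether it is derived from the label or only becomes known after the outcome. If nothing is flagged, the cause may lie elsewhere, for example duplicated rows shared between the train and test splits.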
sudo mysql_secure_installation
Securing the MySQL server deployment.
Enter password for user root:
Error: Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)
Hey @/all, freeCodeCamp is building a data science curriculum with advanced math and data science projects. Learn more here: https://www.freecodecamp.org/news/building-a-data-science-curriculum-with-advanced-math-and-machine-learning/
We are looking for open source contributors and experienced math + CS teachers for (paid) help with instructional design. If you are interested, please reach out to me at firstname.lastname@example.org
I digitized some roads as multilines, hospitals as multipoints, and a boundary as a polygon, using latitudes and longitudes taken from Google Maps. I then computed how many roads intersect using the Simple Features (sf) library and plotted the result with ggplot2, which worked well.
I then wanted to check and plot how many roads intersect with a hospital, so I created a 200 m buffer around it and tried the st_intersects() function. This only gave 1: 1 as the answer, along with a message saying
Sparse geometry binary predicate list of length 1, where the predicate was `intersects'
 1: 1
And when I tried plotting it with ggplot, it gave this error message:
Error: data must be a data frame, or other object coercible by fortify(), not an S3 object with class sgbp/list
Run rlang::last_error() to see where the error occurred.
I have added more details and code in a Stack Overflow question. Please help 🥺.