These are chat archives for FreeCodeCamp/DataScience

20th
Mar 2018
Ajayi Olabode
@boratonAJ
Mar 20 2018 06:39
Hi all, I am new to the field of data science and have been given a JSON file (https://pastebin.com/gd0vdEBU) with the following tasks:
  • Show any insight into the data that can be extrapolated.
  • If given a profile id, find similar profiles like that one. A combination of similar skills, courses and/or job titles.
  • If given a profile id, recommend what their next job title(s) could be.
Can someone please recommend a good tutorial to help me solve this problem?
evaristoc
@evaristoc
Mar 20 2018 08:47

@bigyankarki sorry I have not come back to you, I have been busy with my projects. I will try to answer your interesting question when I have time to go through the topic and elaborate.

@sabin20 it seems you suspect that your "z" (what is it, in this particular context?) is affected by the encoding you are using. Is this the reason why you are so stuck on that? Regarding pandas factorize, the method is likely very well documented and you can surely find examples online. I can refer you to some, but I invite you to search a bit more and discuss them here. It would be nice if you shared what you find!
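
For reference, a minimal sketch of what `pandas.factorize` does. The colour values are made up for illustration and are not from the data discussed here; the call encodes each distinct value as an integer in order of first appearance.

```python
import pandas as pd

# Hypothetical categorical values, invented for this example.
colours = pd.Series(["red", "blue", "red", "green", "blue"])

# factorize returns integer codes plus the unique values they point to.
codes, uniques = pd.factorize(colours)

print(codes.tolist())    # [0, 1, 0, 2, 1]
print(uniques.tolist())  # ['red', 'blue', 'green']
```

The codes carry no order or magnitude beyond "first seen", which is worth keeping in mind if a downstream result looks affected by the encoding.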

@boratonAJ I haven't opened the file, but for this kind of question it appears that classical recommender-system techniques and IR (information retrieval) could be an option. I know a bit about them, but I would personally prefer that you come back with questions once you get stuck working on your "homework" :) . This is my personal approach. You can always try asking either here or in the forum anyway to see if anyone else can help.

evaristoc
@evaristoc
Mar 20 2018 09:36

PEOPLE

An article from the BBC about Artificial Intelligence: a bit of a myth buster for many. One correction: the journalist suggests NNs are the methodology used for AI, when in practice it might be a combination of several techniques.
http://www.bbc.com/capital/story/20180316-why-a-robot-wont-steal-your-job-yet
Ajayi Olabode
@boratonAJ
Mar 20 2018 09:52
@evaristoc Thank you for that information. It really is a pointer toward solving the problem. Besides, the questions meant to guide me are included in my first message. I have also been working through some online tutorials, but my challenge is that most of them work with CSV files, which are easy to clean and structure. Since I was given a JSON file, the data is structured differently and includes nested lists and dictionaries. I am struggling to clean the data properly and split it into specific columns. I would like to know how to first process JSON with nested lists and dictionaries, and which kind of data structure to use (e.g. a pandas DataFrame or Series).
CamperBot
@camperbot
Mar 20 2018 09:52
boratonaj sends brownie points to @evaristoc :sparkles: :thumbsup: :sparkles:
:cookie: 413 | @evaristoc |http://www.freecodecamp.org/evaristoc
evaristoc
@evaristoc
Mar 20 2018 11:15

@boratonAJ I checked the file. Rather than nesting, you have data points grouped in lists. Use the id as the index of the pandas DataFrame.
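
A minimal sketch of that setup, assuming each record has an `id` plus list-valued fields. The field names `skills` and `job_titles` and the sample values are hypothetical; the real keys in the pastebin file may differ.

```python
import json
import pandas as pd

# Hypothetical records: field names and values are assumptions.
raw = """
[
  {"id": 1, "skills": ["python", "sql"], "job_titles": ["analyst", "data scientist"]},
  {"id": 2, "skills": ["python", "excel"], "job_titles": ["analyst"]}
]
"""

records = json.loads(raw)

# One row per profile, with the id as index; list fields stay as Python lists.
profiles = pd.DataFrame(records).set_index("id")

print(profiles.loc[1, "skills"])
```

With the id as index, a single profile can be looked up directly with `profiles.loc[profile_id]`.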

There are different ways to treat the data. Personally, I wouldn't expand the lists but rather iterate through them and compare them. You save some memory that way, but you have to live with a performance cost if you do so.
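
One way to compare lists without expanding them can be sketched as below. Jaccard overlap is an assumed choice of measure here, not something specified in this conversation, and the profile data and the `skills` field name are hypothetical.

```python
def jaccard(a, b):
    """Share of distinct items two lists have in common (0.0 to 1.0)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical profiles keyed by id; field names are assumptions.
profiles = {
    1: {"skills": ["python", "sql", "statistics"]},
    2: {"skills": ["python", "sql"]},
    3: {"skills": ["photoshop", "illustrator"]},
}


def similar_profiles(profile_id, profiles, field="skills"):
    """Rank every other profile by list overlap with the given one."""
    target = profiles[profile_id][field]
    scores = [
        (other_id, jaccard(target, other[field]))
        for other_id, other in profiles.items()
        if other_id != profile_id
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = similar_profiles(1, profiles)
print(ranking)
```

Every lookup scans all other profiles, which is the performance cost mentioned above; for a handful of records it should not matter.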

If I understand the assignment correctly, a brute-force approach will be more than enough as a minimum requirement for this project, even more so if the only data you have are the 8-10 examples in the JSON file you showed here. If I am right, I suggest you keep it practical rather than fancy.
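
As an illustration of how simple a brute-force pass can be, here is a sketch for the "next job title" task. It assumes each profile lists its job titles in chronological order, which may not hold for the real file; the data and the `next_titles` helper are hypothetical.

```python
from collections import Counter

# Hypothetical data: each list is assumed to be in chronological order.
job_histories = {
    1: ["intern", "analyst"],
    2: ["analyst", "data scientist"],
    3: ["intern", "analyst", "data scientist"],
    4: ["analyst", "team lead"],
}


def next_titles(profile_id, histories, top_n=2):
    """Count which titles followed the profile's latest title in other profiles."""
    current = histories[profile_id][-1]
    followers = Counter()
    for other_id, titles in histories.items():
        if other_id == profile_id:
            continue
        # Walk through each consecutive pair of titles in the other history.
        for earlier, later in zip(titles, titles[1:]):
            if earlier == current:
                followers[later] += 1
    return followers.most_common(top_n)


suggestions = next_titles(1, job_histories)
print(suggestions)  # [('data scientist', 2), ('team lead', 1)]
```

This is plain counting rather than a trained model, which matches the scale of a file with only a few profiles.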

Success!

evaristoc
@evaristoc
Mar 20 2018 11:24

@boratonAJ
Try this pandas function:
from pandas.io.json import json_normalize

It was used at https://www.kaggle.com/residentmario/exploring-freecodecamp-gitter-messages/data.

Read its documentation. I don't remember whether, when using the normalized dataset, I also had to use the json library and its json.loads method to convert lists from strings to Python lists.
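
A minimal sketch of both steps. Note that pandas 1.0 and later expose the function as `pandas.json_normalize`, which is what the example uses. The record layout below (a nested `profile` dict and a `skills` list stored as a string) is hypothetical, built only to show the two cases.

```python
import json
import pandas as pd

# Hypothetical record: a nested dict plus a list that arrived as a JSON string.
records = [
    {
        "id": 1,
        "profile": {"name": "A", "city": "X"},
        "skills": json.dumps(["python", "sql"]),
    },
]

# Nested dict keys are flattened into dotted column names such as "profile.name".
flat = pd.json_normalize(records)
print(sorted(flat.columns))

# A list stored as a string needs json.loads to become a real Python list.
flat["skills"] = flat["skills"].apply(json.loads)
print(flat.loc[0, "skills"])
```

If the lists are already real lists after loading the file, the `json.loads` step is not needed.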

Success!

Ajayi Olabode
@boratonAJ
Mar 20 2018 11:35
@evaristoc thank you very much. I am going through the link you just sent. I will let you know if I encounter any issues.
CamperBot
@camperbot
Mar 20 2018 11:35
boratonaj sends brownie points to @evaristoc :sparkles: :thumbsup: :sparkles:
:cookie: 414 | @evaristoc |http://www.freecodecamp.org/evaristoc
Bigyan Karki
@bigyankarki
Mar 20 2018 14:29
@evaristoc No problem. Take your time :)
what are you working on these days?
Alice Jiang
@becausealice2
Mar 20 2018 20:39
Hello all! Long time no see :)
Is anyone else feeling dread at the thought of public response to Cambridge Analytica?
Josh Goldberg
@GoldbergData
Mar 20 2018 21:23
This may be to data science what the atomic bomb was to physics, or chemical warfare to chemistry. Maybe not as dramatic, or is it? @becausealice2
Alice Jiang
@becausealice2
Mar 20 2018 22:41
@GoldbergData I would say it's at least almost as dramatic. They claim it was all BS and they were hyping themselves up to win a potential client. Whether or not it's true, the CEO has been suspended and multiple government and private organizations are investigating and even bickering over who gets to audit them first.
Even if it was all bullshit they were making up for the sake of winning over the client, this story is out there and looks awful for data science. Public perception has been tainted, and because the case is so high profile it will be extremely difficult to win it back, no matter how much of an outlier it is.