These are chat archives for FreeCodeCamp/DataScience

15th
Jan 2016
Luis Felipe López G.
@luishendrix92
Jan 15 2016 09:30 UTC
@evaristoc I just saw that someone posted a link to a data-scraping website called import.io: https://magic.import.io/?site=http:%2F%2Ffreecodecamp.com%2Fmrskinny
We could parse the links to the solutions and save them inside a .md file or .js file. One way is to extract the part of the URL that contains the solution and replace the HTML entities with the right characters.
If there's another way to fetch the solution links, the parsing can still be done with python or node.
Import.io doesn't seem to display bonfires though.
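A minimal sketch of the URL approach described above, in Python. It assumes the "view solution" link carries the code percent-encoded in a `solution=` query parameter; the example link and the function name `extract_solution` are made up for illustration.

```python
from html import unescape
from urllib.parse import parse_qs, urlparse


def extract_solution(href: str) -> str:
    """Return the code stored in the `solution` query parameter of a link."""
    # An href copied from raw HTML may still contain entities such as &amp;
    url = unescape(href)
    # parse_qs splits the query string and percent-decodes every value
    query = parse_qs(urlparse(url).query)
    values = query.get("solution", [])
    return values[0] if values else ""


# Hypothetical link, only to show the shape of the input
link = (
    "http://www.freecodecamp.com/challenges/bonfire-reverse-a-string"
    "?solution=function%20reverse(s)%20%7B%20return%20s%3B%20%7D"
)
print(extract_solution(link))
```

Note that `parse_qs` treats a literal `+` as a space, so this only works if the site encodes `+` as `%2B`, which is an assumption here.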
evaristoc
@evaristoc
Jan 15 2016 10:08 UTC
Good, @luishendrix92! I will check!
Luis Felipe López G.
@luishendrix92
Jan 15 2016 10:09 UTC
@evaristoc I wrote some code to extract the code, but it's not easy to crawl sites with JavaScript. If you find a way to do it, then it's settled.
I'll paste the codepen link in a few minutes
I'll also do the parser in JavaScript (Node.js) and in Python using BeautifulSoup, but tomorrow; it's 2 am now and I want to sleep
evaristoc
@evaristoc
Jan 15 2016 10:11 UTC
@luishendrix92 I think I have seen import.io before... it is a scraper, isn't it?
an online one...
Luis Felipe López G.
@luishendrix92
Jan 15 2016 10:11 UTC
@evaristoc It is but it's not showing the bonfires
a python crawler would be better
I've done it before but I want to find a JS way
scraper*, not crawler, sorry
evaristoc
@evaristoc
Jan 15 2016 10:13 UTC
@luishendrix92 there is already one: @roelver's top100... I also made another one to analyse data about ziplines
But we are trying not to go that way because it puts a lot of pressure on the FCC database
A good option would be to check the data that was made available: it also has the solutions included
Luis Felipe López G.
@luishendrix92
Jan 15 2016 10:15 UTC
I only managed to extract 2000 but it would be much easier now
Luis Felipe López G.
@luishendrix92
Jan 15 2016 10:15 UTC
I did it with a python script
@evaristoc No, they're from the torrent but I'm 100% sure those are not ALL the bonfire solutions
I had a hard time trying to apply regexp to a big file
so I may have only extracted a small percentage
evaristoc
@evaristoc
Jan 15 2016 10:16 UTC

@luishendrix92: I also have a python script... would you like to talk about this? Shall we set up a Skype call or something?

@/all: everyone is also invited to contribute! Make teams, create projects!

@luishendrix92 can you share your script? We can have a look this week...
Luis Felipe López G.
@luishendrix92
Jan 15 2016 10:23 UTC
@evaristoc It is a small script
I did it all with sublime and regexp replacing
but from what I learned
I can polish that script to accommodate the regexp I did
evaristoc
@evaristoc
Jan 15 2016 10:24 UTC
Yea... load it into GitHub? We can have a look together... that could be of great use for everyone who wants to do something with the torrent data!
Luis Felipe López G.
@luishendrix92
Jan 15 2016 10:25 UTC
lol
here's the codepen I just made, I'm gonna CSS-polish it before sleep
and sure I'll see what I can do with the script!
@evaristoc
evaristoc
@evaristoc
Jan 15 2016 10:34 UTC

@luishendrix92 let's keep talking! I will make a note of this for the next Digest, we could make something interesting with this...

Just one thing: I saw the codepen... what is its purpose, if you can tell me?

Anyway: let's keep in touch!
Luis Felipe López G.
@luishendrix92
Jan 15 2016 10:36 UTC
@evaristoc It is a demonstration of a way to get a "view solution" link and convert it to raw javascript code
it can be used for this:
lol
it can easily be done with a scraper, in case some other solution, such as making an API request to FreeCodeCamp's database, is beyond reach.
evaristoc
@evaristoc
Jan 15 2016 10:38 UTC
@luishendrix92, @vicky002: Let's work on that the three of us? ^^^
I have also something done that can help... let's plan a meeting...
We can make an application and load it to heroku...
Luis Felipe López G.
@luishendrix92
Jan 15 2016 10:41 UTC
That's a great idea. I have school this Monday so I won't be available for meetings, but I can collaborate
in whatever I can, I'm no expert in this matter, I just started FCC some months ago
Luis Felipe López G.
@luishendrix92
Jan 15 2016 10:46 UTC
Anyway I'm off to sleep see y'all later
evaristoc
@evaristoc
Jan 15 2016 11:02 UTC
@luishendrix92 that's good, doctor: we are all in the same situation. Take care!

@luishendrix92, @vicky002 : it is hard to get it because we only have access to heroku at the moment, but I could say we can try to build a recommender?

Something like: after pasting your code in a form like @luishendrix92's codepen, the recommender could show SIMILAR solutions, which could be ordered according to easy-to-get parameters, like length... Very much like: "Check what other people using techniques similar to yours did for this bonfire..."
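
A rough sketch of that "show similar solutions" idea, in Python. Nothing here was chosen in this discussion: `difflib.SequenceMatcher` is only a stand-in text-similarity measure, `rank_similar` is a hypothetical name, and ties are broken by length as the easy-to-get parameter.

```python
from difflib import SequenceMatcher


def rank_similar(submitted: str, others: list[str], top: int = 3) -> list[tuple[float, str]]:
    """Return up to `top` (similarity, solution) pairs, most similar first."""
    scored = [
        (SequenceMatcher(None, submitted, other).ratio(), other)
        for other in others
    ]
    # Highest similarity first; among equal scores, shorter solutions first
    scored.sort(key=lambda pair: (-pair[0], len(pair[1])))
    return scored[:top]


# Made-up solutions, for illustration only
mine = "return str.split('').reverse().join('');"
pool = [
    "for (var i = str.length - 1; i >= 0; i--) { out += str[i]; }",
    "return str.split('').reverse().join('');",
]
best_score, best_code = rank_similar(mine, pool, top=1)[0]
```

Character-level similarity says nothing about efficiency or style, so it would only be a first filter before any real classification.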

In fact... it would be a great topic for the hackathon... Anyway: I will discuss this with you both later?
Also: @jameswinegar, @jbmartinez? ^^ What do you think?
evaristoc
@evaristoc
Jan 15 2016 11:08 UTC
@quatridunull I am taking your observations into consideration... I might invite you to be one of the observers of the hackathon discussions, if you want to participate?
Sunny
@quatridunull
Jan 15 2016 14:26 UTC
@evaristoc Sure, I would love to help however I can. I appreciate that you are considering my ideas!
James Winegar
@jameswinegar
Jan 15 2016 15:07 UTC
@evaristoc the question is what features are relevant. Number of lines/characters doesn't seem like a "similar" indicator. Although there's probably some correlation. I can't think of a feature that would be easy though. Like which functions were called and things like that.
Vikesh Tiwari
@vicky002
Jan 15 2016 15:27 UTC
@luishendrix92 I'm using BeautifulSoup to scrape the data and then replacing HTML entities with the right characters. I'm almost done with the script. All the code will be saved in a new file with the name of the problem. It should be done by today. The problem is I'm very busy nowadays with my internship and college. I'm out from 7 in the morning to 11 at night, so sorry for the late response.
Luis Felipe López G.
@luishendrix92
Jan 15 2016 15:28 UTC
@vicky002 I'm coding a simple restful API using cheeriojs, it would be used to build a web interface for everyone to grab bonfires from anyone.
Vikesh Tiwari
@vicky002
Jan 15 2016 15:29 UTC
Yeah that'd be good
Luis Felipe López G.
@luishendrix92
Jan 15 2016 15:29 UTC
As for the recommending algorithm I can't do it, at most I can load two sets of bonfire solutions side by side for manual comparison.
I don't have access to the database or the official endpoints.
I could do it with the camper top 100 api though
I just fetch the usernames, store them inside an array and look into each's solutions for whatever reason.
Juan Martínez
@jbmartinez
Jan 15 2016 16:45 UTC
@evaristoc The recommending system is an interesting idea, but as @jameswinegar said, the features are a bit tricky. I can think of the use of for or while loops, functional programming style, if vs switch, etc. However, I think it needs human intervention to label the data (recognizing functional programming is not so straightforward, and for and while loops are not always used with the same intention)
the algorithm would be a nice way to see how other people solved a problem after submitting their solution. Some sites just let the user browse for other solutions, which is not fancy but it's effective
evaristoc
@evaristoc
Jan 15 2016 16:51 UTC

@jbmartinez, @luishendrix92, @vicky002, @jameswinegar indeed: we must cluster/classify the solutions somehow and might need some human intervention for sure... perhaps not entirely for a hackathon for people who only want to code, but for a good project for an API type that could...

Anyway, we could see... at least we have a first idea...

@jbmartinez

the algorithm would be a nice way to see how other people solved a problem after submitting their solution. Some sites just let the user browse for other solutions, which is not fancy but it's effective

what do you mean?

@jameswinegar about:

what features are relevant

Those would be emerging features, likely specific to each bonfire... it won't be an exact thing: it is a recommender, so it will bring you to similar solutions to compare...

The only thing is it is a post-recommender, so it is not recommending anything, so probably the name recommender is actually wrongly used here...

evaristoc
@evaristoc
Jan 15 2016 16:57 UTC

But we could suggest how similar the solutions are, without saying anything about efficiency though...

For classifying, we could end up using ad-hoc classes in case we lack any more formal ones...

evaristoc
@evaristoc
Jan 15 2016 17:04 UTC

Anyway: I will be talking to @vicky002 soon... I also suggested another project to him...

And well: @/all are invited to participate!

Juan Martínez
@jbmartinez
Jan 15 2016 17:10 UTC
@evaristoc I was just thinking about applications of the algorithm and the way it's currently done on most sites. It's a sort of rubber duck thing
Luis Felipe López G.
@luishendrix92
Jan 15 2016 17:21 UTC
@evaristoc @vicky002 The restful API is working, I just need to make adjustments
https://bonfirefetcher-luishendrix92.c9users.io/luishendrix92
for example, some bonfires are repeated, not because they were completed more than once (that'd make them appear n*2 times) but because sometimes there are two links inside the same table cell (which is annoying) that link to the same place; the difference is that one is hidden and the other is visible.
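One possible way to handle those repeated links, sketched in Python. It assumes the scraper has already grouped the hrefs by table cell (a list of lists); `dedupe_cells` is a hypothetical helper. Duplicates are dropped only within a cell, so a bonfire that was genuinely completed twice (two separate cells) is kept.

```python
def dedupe_cells(cells: list[list[str]]) -> list[list[str]]:
    """Drop repeated hrefs inside each cell, keeping first-seen order."""
    # dict.fromkeys keeps insertion order, so the first copy of each href survives
    return [list(dict.fromkeys(cell)) for cell in cells]


# Made-up hrefs: the first cell holds a hidden and a visible copy of one link
cells = [
    ["/challenges/a?solution=1", "/challenges/a?solution=1"],
    ["/challenges/a?solution=1"],
    ["/challenges/b?solution=2"],
]
print(dedupe_cells(cells))
```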
evaristoc
@evaristoc
Jan 15 2016 17:24 UTC
@jbmartinez what do you mean with "rubber duck", doctor?
Luis Felipe López G.
@luishendrix92
Jan 15 2016 17:24 UTC
The other thing I want to do is change the methodology I used to get the name of the challenge and use .parent() and .children() and .prev() (these are cheeriojs jquery methods) to avoid using regexp to extract it. As for the date of completion I also need to add it, it will be easy to add once I sort it out.
evaristoc
@evaristoc
Jan 15 2016 17:25 UTC
@jbmartinez: I was looking on the internet and it doesn't seem to be what you really wanted to say... :)
@luishendrix92 you are already using cheeriojs? nice...
Luis Felipe López G.
@luishendrix92
Jan 15 2016 17:26 UTC
@evaristoc I just found about it today lol
lol
evaristoc
@evaristoc
Jan 15 2016 17:27 UTC
@luishendrix92 no sleep: man: coffee is taking the best (and the worst) from you!
Luis Felipe López G.
@luishendrix92
Jan 15 2016 17:29 UTC
I don't want the worst to show
:S
evaristoc
@evaristoc
Jan 15 2016 17:29 UTC
@luishendrix92 is that data about only one person?
Luis Felipe López G.
@luishendrix92
Jan 15 2016 17:29 UTC
Once I implement this restful API on codepen, I can do the bonfire retriever and comparator (loading two sets of solutions side by side)
evaristoc
@evaristoc
Jan 15 2016 17:29 UTC
:)
Luis Felipe López G.
@luishendrix92
Jan 15 2016 17:30 UTC
also adding another property called "methods" with an array of methods used to solve them
if the server detects the usage of ES6 for example, it adds "ES6 Evangelist" to the array
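A small sketch of what that "methods" array could look like, written in Python rather than on the Node server. The label table and `detect_methods` are hypothetical, and the regular expressions are rough heuristics, not a JavaScript parser: they can misfire on code inside strings or comments.

```python
import re

# Hypothetical label -> pattern table; extend or replace as needed
PATTERNS = {
    "ES6 Evangelist": re.compile(r"=>|\b(?:let|const)\b|`"),
    "for loop": re.compile(r"\bfor\s*\("),
    "while loop": re.compile(r"\bwhile\s*\("),
    "functional style": re.compile(r"\.(?:map|filter|reduce)\s*\("),
}


def detect_methods(code: str) -> list[str]:
    """Return the labels whose pattern occurs somewhere in the solution."""
    return [label for label, pattern in PATTERNS.items() if pattern.search(code)]


print(detect_methods("const rev = s => s.split('').reverse().join('');"))
```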
evaristoc
@evaristoc
Jan 15 2016 17:30 UTC
@luishendrix92 hahaha! sounds fun...
@luishendrix92 so your plan is using regex...
Luis Felipe López G.
@luishendrix92
Jan 15 2016 17:32 UTC
I don't know another way.
evaristoc
@evaristoc
Jan 15 2016 17:36 UTC

@luishendrix92 sounds good... but I think I can discern some pitfalls in the whole project...

For example: to load a sample side by side the user should have first found one to compare...

My proposal would be to classify them all in advance (e.g. functional-like solutions, ES6 solutions), and then let the user pick one classification of interest where several algos could be found...

The person can select any one, or a specific one, from that group, according to his/her preferred coding style, and suggest a comparison...

We could eventually add a "like" or "views" so we can use also that as a classifier for the most popular algos, and the person can compare with those accordingly...

Anyway: I am still waiting for @jbmartinez to explain the "rubber duck" term to me... probably he has a point...
evaristoc
@evaristoc
Jan 15 2016 17:41 UTC
Although at the end, it is also about learning by playing with the data...
At least someone can argue here that after finishing the basejumps the mobile wouldn't stop ringing because of clients or work offers...
Juan Martínez
@jbmartinez
Jan 15 2016 17:43 UTC
lol
evaristoc
@evaristoc
Jan 15 2016 17:44 UTC
But really, @jbmartinez: do you know what I found about "rubber ducking"?
This is a public space, otherwise I would make it explicit...
So what were you trying to tell, actually?
Juan Martínez
@jbmartinez
Jan 15 2016 17:44 UTC
it's a way to think deeper about a problem. Wikipedia explains it for debugging, but it's valid for solving any problem
evaristoc
@evaristoc
Jan 15 2016 17:45 UTC
Ahhh!!! No: if wikipedia says that, then the rest is wrong in the use of the terminology...
ok...
Juan Martínez
@jbmartinez
Jan 15 2016 17:45 UTC
lol
I guess, rubber is used for lots of things :laughing:
evaristoc
@evaristoc
Jan 15 2016 17:46 UTC
@jbmartinez, @luishendrix92
Reading the wikipedia... hilarious! Explaining to the duck!!!
Ok....

Yea... I don't think we are going to get to a corrector as I suggested...

Giving it a second thought, that would be like making an editor for JS in the first place, and that is for me like re-inventing the wheel...

IMO it could be more like a more elaborated extension of the CodeReview chatroom...

evaristoc
@evaristoc
Jan 15 2016 17:52 UTC

@jbmartinez, @luishendrix92
In that chatroom people meet to share further developments and improvements to their codes...

We could bring that to another level: instead of a chatroom, an API... Actually both...

Juan Martínez
@jbmartinez
Jan 15 2016 17:54 UTC
interesting
evaristoc
@evaristoc
Jan 15 2016 17:58 UTC
@jbmartinez ahh... you see? ;)
hehehe!!!
no, really: if you like the idea... we can all try to put some work on it... it is not going to be difficult...
even less if we do it together... the only thing is the organising: that is the worst part...
And well: there could be other options...
evaristoc
@evaristoc
Jan 15 2016 18:05 UTC
Anyway: @jbmartinez, @luishendrix92, @vicky002, @jameswinegar... let's see? There are already some ideas and some work already done, I believe something will pop up at the end...
Juan Martínez
@jbmartinez
Jan 15 2016 18:08 UTC
yup, although I'll need another rubber duck
evaristoc
@evaristoc
Jan 15 2016 18:08 UTC
:)
Luis Felipe López G.
@luishendrix92
Jan 15 2016 18:23 UTC
Restful API implemented in the codepen. Right now it crashes the browser because it renders the HTML waypoints with cat pictures and everything. I'll do the filtering on the server.
@evaristoc Yeah I've thought about that. I'm just playing for now but I'll make sure to come up with something.
Luis Felipe López G.
@luishendrix92
Jan 15 2016 18:40 UTC
lol Obviously I need to work on getting rid of the CSS/jquery waypoints
I also want to do some data interpretation based on the results I get
evaristoc
@evaristoc
Jan 15 2016 19:05 UTC
@luishendrix92 : your work is good, man!
Luis Felipe López G.
@luishendrix92
Jan 15 2016 19:06 UTC
@evaristoc thankx! For example, each bonfire has completion date
first bonfire date --> 2nd bonfire date = time elapsed
we could calculate the time elapsed for each bonfire and draw a graph with d3
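The elapsed-time calculation could be sketched like this in Python, assuming each bonfire comes as a (name, completion date) pair; the sample data and the name `elapsed_days` are made up. The resulting list could then be handed to d3 or any other plotting tool.

```python
from datetime import datetime


def elapsed_days(completions: list[tuple[str, datetime]]) -> list[tuple[str, int]]:
    """For each bonfire after the first, return days since the previous one."""
    ordered = sorted(completions, key=lambda item: item[1])
    return [
        (later[0], (later[1] - earlier[1]).days)
        for earlier, later in zip(ordered, ordered[1:])
    ]


# Made-up completion dates, deliberately out of order
sample = [
    ("Reverse a String", datetime(2016, 1, 3)),
    ("Meet Bonfire", datetime(2016, 1, 1)),
    ("Factorialize a Number", datetime(2016, 1, 10)),
]
print(elapsed_days(sample))
```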
evaristoc
@evaristoc
Jan 15 2016 19:06 UTC
I see you are relying on a scraper... again: not advisable. First, it puts pressure on the site, and second: I can tell you they are changing a lot of things right now...
Luis Felipe López G.
@luishendrix92
Jan 15 2016 19:07 UTC
I will make the scraping algo public; I can extract some of the logic behind it to adapt it for more robust data
evaristoc
@evaristoc
Jan 15 2016 19:07 UTC
But I see your idea is about getting data from each camper...
Luis Felipe López G.
@luishendrix92
Jan 15 2016 19:07 UTC
Each piece will eventually be in place, but yeah for now, just single camper data
evaristoc
@evaristoc
Jan 15 2016 19:08 UTC
Believe me: it won't be your algo: if for some reason they change something, your scraper is gone...
A common problem for scrapers...
Luis Felipe López G.
@luishendrix92
Jan 15 2016 19:08 UTC
Yeah I noticed they changed it for /challenge and as long as "solution=" doesn't go away, I'm safe
if not, the work is doomed
evaristoc
@evaristoc
Jan 15 2016 19:08 UTC
And I can point to you comments from Berkeley made in this room to people before you about the effect on the database...
Although in your case it is less, because the search is per camper...
roelver is calling about 28000 people every week
Luis Felipe López G.
@luishendrix92
Jan 15 2016 19:10 UTC
Okay, then I'll come up with some ideas for a bigger picture algorithm
working with whatever data can be available
evaristoc
@evaristoc
Jan 15 2016 19:10 UTC
I am not saying it is bad, eh? I am just saying be aware...
Luis Felipe López G.
@luishendrix92
Jan 15 2016 19:11 UTC
I planned on working with 200 samples up to 1000, something not too crazy, and recent
just to do a little set of graphs
evaristoc
@evaristoc
Jan 15 2016 19:11 UTC
We can think a combination of things... the idea I have doesn't mention any name, which is somehow a pity...
Luis Felipe López G.
@luishendrix92
Jan 15 2016 19:11 UTC
I'll do that on my local machine
evaristoc
@evaristoc
Jan 15 2016 19:12 UTC
that would be great! If you get something let us know?
Luis Felipe López G.
@luishendrix92
Jan 15 2016 19:12 UTC
okk, back to my stuff
I will ;)
evaristoc
@evaristoc
Jan 15 2016 19:14 UTC
But it looks nice! I just clicked on "ta' bueno" in codepen!