These are chat archives for FreeCodeCamp/DataScience

1st
Mar 2016
bitgrower
@bitgrower
Mar 01 2016 02:51
oh man ... I've found the oasis !
Brian
@sludge256
Mar 01 2016 02:58
Population: you
bitgrower
@bitgrower
Mar 01 2016 02:59
well, currently, and you ...
evaristoc
@evaristoc
Mar 01 2016 10:39

@sludge256 thanks for being around :)

@bitgrower keep in touch

@SaintPeter: I did some preliminary cleaning of the text and found that uglifying might not be necessary for this exercise. A simple parser based on a couple of regexes to "normalise" the text and eliminate things like comments is easy to implement. The analysis would also need to consider all words and even all characters except semicolons, as they are meaningful in working JavaScript code.
My idea is to attack this problem using what is called latent semantic indexing. This is probably one of the simplest solutions for this exercise. For the use of SVD for text mining you can simply read this page.
For more advanced solutions, a JS parser could be needed.
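
A minimal sketch of such a regex-based normaliser, assuming "all characters except semicolons" is the intended reading. The names `normalise` and `tokenise` are hypothetical. Plain regexes do not understand string literals, so a `//` inside a string (a URL, for example) would be cut off as if it were a comment.

```javascript
// Hypothetical helper: strip comments and collapse whitespace so that
// solutions differing only in comments or layout compare as equal.
function normalise(source) {
  return source
    .replace(/\/\*[\s\S]*?\*\//g, ' ') // block comments
    .replace(/\/\/.*$/gm, ' ')         // line comments (to end of line)
    .replace(/\s+/g, ' ')              // collapse runs of whitespace
    .trim();
}

// Hypothetical helper: split into identifiers/keywords, digit runs and
// single punctuation characters, dropping semicolons.
function tokenise(source) {
  return normalise(source).match(/[A-Za-z_$][\w$]*|\d+|[^\s\w;]/g) || [];
}

console.log(tokenise('var a = 1; // counter')); // [ 'var', 'a', '=', '1' ]
```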

CamperBot
@camperbot
Mar 01 2016 10:39
evaristoc sends brownie points to @sludge256 and @bitgrower and @saintpeter :sparkles: :thumbsup: :sparkles:
:star: 2595 | @saintpeter | http://www.freecodecamp.com/saintpeter
:star: 1729 | @sludge256 | http://www.freecodecamp.com/sludge256
:star: 678 | @bitgrower | http://www.freecodecamp.com/bitgrower
evaristoc
@evaristoc
Mar 01 2016 10:59
@SaintPeter I am still not sure whether to use pure JS code for the exercise or a python->js or js->python library instead... There is also another interesting exercise I found somewhere that used web workers to bring C/Python code into JS.
Rex Schrader
@SaintPeter
Mar 01 2016 16:08
@evaristoc I actually prefer a slightly lazier analysis. I might search for /\bkeyword\b/ to ensure we're getting a whole keyword and not a partial match (e.g. the for in format). I could also see slightly more complex regexes for things like /for\s*\(/ or /if\s*\(/. I think even a naive analysis can give us important information. Manual review of outliers will enable us to see if our assumptions are bad.
I like uglifier, though, because it eliminates the possibility of false hits from people using var string = some value or var array = [1,2,3].
There is also something to be said for a tried and true uglifier used in production to remove comments and do other translation for us - it means we don't have to "reinvent the wheel".
@evaristoc I use "uglifier" as an example, but really, any context-aware JS minifier would be sufficient. It would need to be able to run as a function, though, not on a file. I have no idea if such a thing even exists.
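
A rough sketch of the naive regex counting described above. The helper names `countKeyword` and `countHeaders` are hypothetical, and both assume the keyword contains no regex special characters. The last call shows the kind of false hit a variable name can produce.

```javascript
// Count whole-word occurrences of a keyword, so "for" does not match "format".
function countKeyword(source, keyword) {
  const pattern = new RegExp('\\b' + keyword + '\\b', 'g');
  return (source.match(pattern) || []).length;
}

// Count statement headers such as "for (" or "if(".
function countHeaders(source, keyword) {
  const pattern = new RegExp('\\b' + keyword + '\\s*\\(', 'g');
  return (source.match(pattern) || []).length;
}

const sample = 'var format = "x"; for (var i = 0; i < 3; i++) { if(i) {} }';
console.log(countKeyword(sample, 'for')); // 1 ("format" is not counted)
console.log(countHeaders(sample, 'if'));  // 1

// A variable that happens to be named "array" still counts as a hit.
console.log(countKeyword('var array = [1,2,3];', 'array')); // 1
```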
Rex Schrader
@SaintPeter
Mar 01 2016 16:15
https://github.com/mishoo/UglifyJS2#the-hard-way describes how to read in code as a string, and later on they describe how to output it as a string (https://github.com/mishoo/UglifyJS2#generating-output).
evaristoc
@evaristoc
Mar 01 2016 16:59

@SaintPeter I agree with you that for the moment we don't know if a simpler characterisation would be enough, in particular if you are doing a similarity analysis within one algorithm as a group (e.g. all palindromes).

Yes: apart from simple parsing, the uglifier tries to build more info around the code. Thanks for the links!

For the approach I am thinking of (comparing only exercises for the same algorithm), an uglifier seems to be overkill; I reached the same conclusion you already mentioned. I am not sure about a minifier.

It is possible that this approach wouldn't be effective enough if the project is to compare solutions for DIFFERENT algorithms, or, as I proposed to someone in this room a few days ago, to compare more complex projects like codepens.

Anyway: I haven't done any comparison yet (only preparing data and the couple of parsing regexes I mentioned...).

CamperBot
@camperbot
Mar 01 2016 16:59
evaristoc sends brownie points to @saintpeter :sparkles: :thumbsup: :sparkles:
:star: 2596 | @saintpeter | http://www.freecodecamp.com/saintpeter
evaristoc
@evaristoc
Mar 01 2016 17:47
@SaintPeter For the analysis you suggest, the SVD would be used to reduce the noise from code samples that are closely similar but differ in a few words/characters. Additionally, the SVD can be used as a classifier. Unlike a plain "bag-of-words" analysis, you would be able to search using the SVD equation, something that you cannot do easily with a simple "bag-of-words". That means part of what interests you for a future project could be somewhat solved by this approach.
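
To make the idea a little more concrete: SVD factors a count matrix into singular values and vectors, and latent semantic indexing keeps only the largest ones so that small differences are smoothed out. The sketch below is not a full SVD. It uses power iteration, a simpler swapped-in technique, to find only the leading singular value and its vectors, and all function names and the toy matrix are hypothetical. A real analysis would more likely rely on a numerical library.

```javascript
// Multiply matrix a (rows x cols) by vector v (length cols).
function matVec(a, v) {
  return a.map(row => row.reduce((sum, x, j) => sum + x * v[j], 0));
}

// Multiply the transpose of a by vector u (length rows).
function transposeVec(a, u) {
  return a[0].map((_, j) => a.reduce((sum, row, i) => sum + row[j] * u[i], 0));
}

function norm(v) {
  return Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
}

// Power iteration for the leading singular triple (sigma, u, v) of a.
// Assumes a is non-empty, not all zeros, and that the all-ones start
// vector is not orthogonal to the leading right singular vector.
function leadingSingular(a, iterations = 100) {
  let v = a[0].map(() => 1);
  let u = [];
  let sigma = 0;
  for (let k = 0; k < iterations; k++) {
    const vLen = norm(v);
    v = v.map(x => x / vLen);   // unit-length right vector
    u = matVec(a, v);
    sigma = norm(u);            // current estimate of the singular value
    u = u.map(x => x / sigma);  // unit-length left vector
    v = transposeVec(a, u);
  }
  const vLen = norm(v);
  return { sigma, u, v: v.map(x => x / vLen) };
}

// Toy counts: rows are solutions, columns are tokens. Solutions 1 and 2
// share the same two tokens; solution 3 uses a different one.
const counts = [
  [1, 1, 0],
  [1, 1, 0],
  [0, 0, 1],
];
const leading = leadingSingular(counts);
console.log(leading.sigma.toFixed(3)); // "2.000"
```

In this toy case the dominant pattern is carried by the first two solutions, and the third one contributes nothing to it, which is the "noise reduction" effect in miniature.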
Rex Schrader
@SaintPeter
Mar 01 2016 18:28
@evaristoc I don't know what SVD is . . . but I'll take your word for it. ;)
evaristoc
@evaristoc
Mar 01 2016 18:31
@SaintPeter hehehe! Singular Value Decomposition: it is a matrix transformation
SaintPeter @SaintPeter is still lost
evaristoc
@evaristoc
Mar 01 2016 18:31
@SaintPeter :)
Rex Schrader
@SaintPeter
Mar 01 2016 18:31
I'll take it as granted that what you propose would be useful in this analysis
evaristoc
@evaristoc
Mar 01 2016 18:35
@SaintPeter Thanks for your suggestions anyway. What you suggest is actually the first step for any eventual solution: creating a matrix of bag-of-words.
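
A small sketch of what a bag-of-words matrix could look like, assuming one row per solution and one column per distinct word. `bagOfWords` is a hypothetical helper, and the `\w+` tokeniser is a simplification that ignores operators and punctuation.

```javascript
// Build a count matrix: one row per document, one column per distinct word.
function bagOfWords(documents) {
  const tokenised = documents.map(doc => doc.match(/\w+/g) || []);
  const vocabulary = [...new Set([].concat(...tokenised))].sort();
  const index = new Map(vocabulary.map((term, i) => [term, i]));
  const matrix = tokenised.map(tokens => {
    const row = new Array(vocabulary.length).fill(0);
    for (const token of tokens) row[index.get(token)] += 1;
    return row;
  });
  return { vocabulary, matrix };
}

const result = bagOfWords(['var i = i', 'var j']);
console.log(result.vocabulary); // [ 'i', 'j', 'var' ]
console.log(result.matrix);     // [ [ 2, 0, 1 ], [ 0, 1, 1 ] ]
```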
CamperBot
@camperbot
Mar 01 2016 18:35
evaristoc sends brownie points to @saintpeter :sparkles: :thumbsup: :sparkles:
:star: 2597 | @saintpeter | http://www.freecodecamp.com/saintpeter
Rex Schrader
@SaintPeter
Mar 01 2016 18:39
@evaristoc What I am imagining is that for some of the early Algos, probably 80-90% are identical - they're so simple they almost have to be. I expect to see "clumps" of solutions. I especially think this would be helpful to see if we see changes in types of solutions after we change the JS curriculum.
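
One simple way to look for such "clumps", sketched under the assumption that solutions count as the same when their text matches exactly after whitespace is collapsed. `clump` is a hypothetical helper. Exact matching is stricter than similarity, so it would only give a lower bound on how many solutions are effectively the same.

```javascript
// Group solutions that are identical after collapsing whitespace and
// report each group's size and share, largest group first.
function clump(solutions) {
  const groups = new Map();
  for (const solution of solutions) {
    const key = solution.replace(/\s+/g, ' ').trim();
    groups.set(key, (groups.get(key) || 0) + 1);
  }
  return [...groups.entries()]
    .map(([code, count]) => ({ code, count, share: count / solutions.length }))
    .sort((a, b) => b.count - a.count);
}

const clumps = clump([
  'return a + b;',
  'return  a + b;',
  'return a + b;\n',
  'var s = a + b; return s;',
]);
console.log(clumps[0].count, clumps[0].share); // 3 0.75
```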
evaristoc
@evaristoc
Mar 01 2016 18:45

@SaintPeter yes... in fact my original idea was to go a bit further and see if we can start exploring a "bot" ( :) ) that gives advice about code direction. If someone is working on the code in an imperative way, how close is the solution? Is the camper interested in a functional solution? Then, what else is needed? Can we create an algorithm that would advise where to find information considering the current or the desired shape of the code? - This is going to be harder now that we are only collecting the updated code, not saving all of it... also, here we would need some of the things the uglifier is meant for, and more...

Anyway, evaluating the final code would help us see whether what we are suggesting is the only thing the campers are using, or whether we should think about providing additional information.

Let the People speak without speaking.

evaristoc
@evaristoc
Mar 01 2016 18:53
@SaintPeter about the similarity... I would bet that the advanced ones also look very similar: in how many ways can you code the Person algo? From what I saw, advanced algos are more about concept implementation, not difficulty...
I think most differences will be found in the algorithms people ask about most...
But you see? Here I don't really know...
Rex Schrader
@SaintPeter
Mar 01 2016 18:54
Quick! To the Data-mobile!
evaristoc
@evaristoc
Mar 01 2016 18:55
??
Rex Schrader
@SaintPeter
Mar 01 2016 19:07
ERm - I mean to say that once we have data we'll be able to answer questions like this.
Sorry, I communicate primarily in cultural references that do not translate well
evaristoc
@evaristoc
Mar 01 2016 19:24
@SaintPeter Ahh!! Batman!!! :) :) :) Sorry, perhaps it is because I am on the other side of the ocean that I am getting the joke with a delay...
The Data-mobile... lol!
evaristoc
@evaristoc
Mar 01 2016 19:33

@SaintPeter there is still some data to work with and approximate some things...

But the most important thing is that we have some assumptions that can be corroborated (or not) with the data available.