These are chat archives for FreeCodeCamp/DataScience

7th
Mar 2018
Eric Leung
@erictleung
Mar 07 2018 01:06
@tesla809 there currently is no Python track for freeCodeCamp. Do you mean on the development of the Python track?
Alice Jiang
@becausealice2
Mar 07 2018 03:23
They were talking about adding Python, it might be on the beta site. I'm away from computers or I'd check.
Anurag Lahon
@anuraglahon16
Mar 07 2018 08:15
''' import pandas as pd
import numpy as np
import statsmodels.api as sm
data= np.loadtxt('bostontrain.csv',delimiter=',')
X=data[:,0:13]
Y=data[:,13]
X = sm.add_constant(X)
est = sm.OLS(Y, X).fit()
test_data=np.loadtxt('bostontest.csv',delimiter=',')
test_predict=est.predict(test_data)
np.savetxt('anuraglahon.csv',test_predict,fmt='%1.5f') '''
CamperBot
@camperbot
Mar 07 2018 08:15
:bulb: to format code use backticks! ``` more info
Anurag Lahon
@anuraglahon16
Mar 07 2018 08:15
import pandas as pd
import numpy as np
import statsmodels.api as sm
data= np.loadtxt('bostontrain.csv',delimiter=',')
X=data[:,0:13]
Y=data[:,13]
X = sm.add_constant(X)
est = sm.OLS(Y, X).fit()
test_data=np.loadtxt('bostontest.csv',delimiter=',')
test_predict=est.predict(test_data)
np.savetxt('anuraglahon.csv',test_predict,fmt='%1.5f')
I want to train and test. Can anyone help me?
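One thing worth checking in the snippet above: the model is fit on X after sm.add_constant, so the matrix passed to est.predict probably needs the same extra column. Assuming bostontest.csv holds only the 13 feature columns, calling sm.add_constant on test_data before predicting should line the shapes up. The sketch below shows the idea with NumPy only and synthetic data; add_intercept and the made-up coefficients are illustrative names and values, not part of the original code.

```python
import numpy as np

def add_intercept(features):
    """Prepend a column of ones, as sm.add_constant does by default."""
    ones = np.ones((features.shape[0], 1))
    return np.hstack([ones, features])

# Synthetic training data (assumption): y = 3 + 2*x1 - 1*x2, no noise.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 2))
y_train = 3.0 + 2.0 * X_train[:, 0] - 1.0 * X_train[:, 1]

# Fit ordinary least squares on the design matrix WITH the intercept column.
coef, *_ = np.linalg.lstsq(add_intercept(X_train), y_train, rcond=None)

# The test features must get the same intercept column before predicting.
X_test = np.array([[1.0, 1.0], [0.0, 2.0]])
predictions = add_intercept(X_test) @ coef
print(predictions)
```

Skipping add_intercept on X_test would leave a two-column matrix against three coefficients, which is the same kind of shape mismatch the statsmodels call would hit.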
evaristoc
@evaristoc
Mar 07 2018 14:11

PEOPLE

Trying to improve the model I was working on for analysing text.

The previous method, counting words, worked fine for text of about 300 words or less, but it was not that easy when working with more words.

For your info, the method is not mine: it is based on concepts by Luhn.

Now I am combining the Luhn method with an entity-centric one.

I am also displaying in browser and terminal.

Still a work in progress and messy as always. But hopefully better than just reading through the full text.

visualtextanalysis.png
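For readers unfamiliar with the word-counting idea mentioned above, here is a rough sketch of a Luhn-style score. It is a simplification and not the code behind the screenshot: it treats each whole sentence as one cluster and scores it as (significant words in the sentence)² divided by the sentence length. The stopword list, the top_n cutoff and the function name are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption, far from complete).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "for", "on"}

def score_sentences(text, top_n=5):
    """Rank sentences by a simplified Luhn-style significance score."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    # The most frequent non-stopwords count as "significant".
    significant = {w for w, _ in Counter(words).most_common(top_n)}
    scored = []
    for sentence in sentences:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        hits = sum(1 for t in tokens if t in significant)
        score = hits ** 2 / len(tokens) if tokens else 0.0
        scored.append((score, sentence))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

sample = "Regression models fit data. Regression needs data and more data. Cats sleep."
ranked = score_sentences(sample, top_n=2)
for score, sentence in ranked:
    print(round(score, 2), sentence)
```

Whole-sentence scoring like this tends to get less useful as texts grow, which fits the remark about texts longer than about 300 words.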

Josh Goldberg
@GoldbergData
Mar 07 2018 14:27
Nice. Is that your output into the browser? Or is that interactive? @evaristoc
evaristoc
@evaristoc
Mar 07 2018 14:31
@GoldbergData It is still the output into the browser. I just started working on this.
I am interacting a bit more through terminal for now.
Josh Goldberg
@GoldbergData
Mar 07 2018 14:34
Ah okay. @evaristoc
Eric Leung
@erictleung
Mar 07 2018 15:12
@anuraglahon16 using scikit-learn, looks like you can use sklearn.model_selection.train_test_split. All training and test sets are just sampling and splitting. It's been a while since I've done Python, but just choose how you wanna split all your data and select the rows based on the indices you've chosen.
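A minimal sketch of the index-based split described above, using NumPy only. The split_rows helper, the 25% test fraction and the synthetic 20x14 array are assumptions for illustration; sklearn.model_selection.train_test_split wraps the same shuffle-and-slice idea.

```python
import numpy as np

def split_rows(data, test_fraction=0.25, seed=0):
    """Shuffle row indices, then slice them into train and test subsets."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(data.shape[0])
    n_test = int(round(data.shape[0] * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return data[train_idx], data[test_idx]

# Synthetic stand-in for a 14-column CSV: 13 features plus 1 target column.
data = np.arange(20 * 14, dtype=float).reshape(20, 14)

train, test = split_rows(data, test_fraction=0.25)
X_train, y_train = train[:, 0:13], train[:, 13]
X_test, y_test = test[:, 0:13], test[:, 13]
print(X_train.shape, X_test.shape)
```

Fixing the seed keeps the split reproducible, so a model can be refit and compared on the same held-out rows.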
@evaristoc nice! Thanks for sharing your work in progress. Very cool :+1:
CamperBot
@camperbot
Mar 07 2018 15:13
erictleung sends brownie points to @evaristoc :sparkles: :thumbsup: :sparkles:
:cookie: 406 | @evaristoc |http://www.freecodecamp.org/evaristoc
evaristoc
@evaristoc
Mar 07 2018 16:33

@erictleung thanks man!

@GoldbergData probably adding something like this: https://stackoverflow.com/questions/28820551/interactive-selection-highlighting-of-text-inside-the-browser

I would introduce a word selection option (probably a tick option) to mark a fixed number of characters (before and after the selected keyword present in text) while making the rest of the text less visible (opacity).
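The span arithmetic behind that idea can be sketched independently of the browser code. This is a Python illustration under assumptions (the function name, the width parameter and case-insensitive matching are all made up here); the UI work itself would live in the front end.

```python
import re

def keyword_windows(text, keyword, width=20):
    """Return (start, end) character spans covering `width` chars around each match."""
    spans = []
    for match in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
        start = max(0, match.start() - width)       # clamp at the start of the text
        end = min(len(text), match.end() + width)   # clamp at the end of the text
        spans.append((start, end))
    return spans

text = "The model counts words. A word that repeats often marks a key sentence."
spans = keyword_windows(text, "word", width=5)
snippets = [text[start:end] for start, end in spans]
print(snippets)
```

Everything outside the returned spans is what would be rendered with reduced opacity.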

CamperBot
@camperbot
Mar 07 2018 16:33
evaristoc sends brownie points to @erictleung and @goldbergdata :sparkles: :thumbsup: :sparkles:
:cookie: 134 | @goldbergdata |http://www.freecodecamp.org/goldbergdata
:cookie: 576 | @erictleung |http://www.freecodecamp.org/erictleung
evaristoc
@evaristoc
Mar 07 2018 16:34
jQuery. Let's keep it simple.
Doing that, I probably won't need the terminal to analyse the text any more.

If I get fancy, I could include a choice to fill in a Google spreadsheet with sentences I am interested in.

Anyway... My main goal is to progress on this so I won't work on any procedure if it takes time. Not really the core of the project right now.

Sorry. Wrong chat.
Bigyan Karki
@bigyankarki
Mar 07 2018 19:39
def error(x, y, initial_thetta):
    error_sum = 0
    m, n = x.shape
    for i in range(1, m):
        error_sum = (initial_thetta.T[i] * x[i] - y[i] ** 2)
    return (error_sum) / (2 * m)
Can anyone look at this code and let me know if I am heading in the right direction? I am implementing multivariate linear regression on a small dataset with 700 samples,
and that function is the sum-of-squared-errors function.
I feel like it's working right, but I am not completely sure.
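A few things stand out in the function above: error_sum is overwritten with = rather than accumulated, ** 2 applies only to y[i], range(1, m) skips the first sample, and initial_thetta is indexed by the sample index i instead of being combined with the whole feature row. A vectorised sketch of the usual cost J(θ) = 1/(2m) · Σ(xᵢ·θ − yᵢ)² follows; it assumes X already includes a column of ones for the intercept, and the function name and toy data are made up for illustration.

```python
import numpy as np

def squared_error_cost(X, y, theta):
    """Return J(theta) = 1/(2m) * sum((X @ theta - y)^2) as a single float."""
    m = X.shape[0]
    residuals = X @ theta - y  # one residual per sample, shape (m,)
    return float(residuals @ residuals) / (2 * m)

# Tiny made-up dataset: a column of ones for the intercept plus one feature.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])

perfect = squared_error_cost(X, y, np.array([0.0, 2.0]))  # theta that fits exactly
zeros = squared_error_cost(X, y, np.zeros(2))             # all-zero starting theta
print(perfect, zeros)
```

The cost comes back as one number, which makes it easy to check that it shrinks as theta improves.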
Bigyan Karki
@bigyankarki
Mar 07 2018 19:48
```
def error(x, y, initial_thetta):
    error_sum = 0
    m, n = x.shape
    for i in range(1, m):
        error_sum += (initial_thetta.T[i] * x[i] - y[i]) ** 2
    return (error_sum) / (2 * m)
```
Initial error is [1.73828125e-01 1.15221354e+01 7.75501562e+03 2.54704036e+03
 3.29631510e+02 9.78108268e+03 5.30021348e+02 1.48438496e-01
 6.07216146e+02]
why am i getting output in exponents?
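On the exponents: NumPy switches to scientific notation when the values in an array span a wide range, as they do here (roughly 0.15 up to 9781). It only affects how the array is printed, not the stored values. A small sketch of turning it off, using the numbers from the output above; the precision of 4 is an arbitrary choice.

```python
import numpy as np

# Values copied from the output above.
errors = np.array([1.73828125e-01, 1.15221354e+01, 7.75501562e+03,
                   2.54704036e+03, 3.29631510e+02, 9.78108268e+03,
                   5.30021348e+02, 1.48438496e-01, 6.07216146e+02])

# Default formatting: scientific notation because of the wide value range.
default_text = np.array2string(errors)

# Fixed-point formatting for this one call.
fixed_text = np.array2string(errors, precision=4, suppress_small=True)

print(default_text)
print(fixed_text)

# To change the default for every later print call instead:
np.set_printoptions(precision=4, suppress=True)
```

Separately, a cost function would normally return a single number; getting nine values back suggests the subtraction and square are being applied element by element to a feature row.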
Alice Jiang
@becausealice2
Mar 07 2018 19:53
Hey all! I'm working on rebuilding this map to see if I can get functionality and speed improved. I know @evaristoc has used it in one of his writings from way back, and I tried to use it as my single project portfolio in a job interview, but URLs were wrong and the best I could manage was the cached thumbnail of the pen from my phone :unamused:
I got the job, but the guy at one point admitted I was a desperation hire, so I'm not sure how that works out for me...
It probably won't get much attention until later this month, but I've fixed the broken URLs so at least now it's something that can be looked at... :/
evaristoc
@evaristoc
Mar 07 2018 21:59

@Sprinting interesting... I haven't competed that much to be honest. I would probably check the data though. It is a topic I used to analyse a lot before.

@bigyankarki

Hope this helps.

evaristoc
@evaristoc
Mar 07 2018 22:41
@becausealice2 cooooooooool!!!!
@becausealice2 If you have Twitter, I would suggest sending it to the freeCodeCamp account.
I would add the approx. date the map was made.
Alice Jiang
@becausealice2
Mar 07 2018 22:43
The data is super old, though. I never rebuilt the scraper.
They kept moving the list around :/
evaristoc
@evaristoc
Mar 07 2018 22:45
Doesn't matter. Just write that you were dusting off your old projects and found this one in your archive, @becausealice2
Alice Jiang
@becausealice2
Mar 07 2018 22:48
I'll wait until I get zoom working correctly, but I'll do that :)