@tesla809 there currently is no Python track for freeCodeCamp. Do you mean on the development of the Python track?

They were talking about adding Python, it might be on the beta site. I'm away from computers or I'd check.

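```
import numpy as np
import statsmodels.api as sm

# Training file: 13 feature columns followed by the target column
data = np.loadtxt('bostontrain.csv', delimiter=',')
X = data[:, 0:13]
Y = data[:, 13]

# Add the intercept column, then fit ordinary least squares
X = sm.add_constant(X)
est = sm.OLS(Y, X).fit()

# The test features need the same intercept column before predicting
# (assumes bostontest.csv holds only the 13 feature columns)
test_data = np.loadtxt('bostontest.csv', delimiter=',')
test_data = sm.add_constant(test_data, has_constant='add')
test_predict = est.predict(test_data)

np.savetxt('anuraglahon.csv', test_predict, fmt='%1.5f')
```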

I want to train and test. Can anyone help me?

Trying to improve the model I was working on for analysing text.

The previous method, counting words, worked fine for text of about 300 words or less, but it was not that easy when working with more words.

For your info, the method is not mine: it is based on concepts by Luhn.

Now I am combining the Luhn method with an entity-centric one.

I am also displaying in browser and terminal.

Still a work in progress and messy as always. But hope better than just reading through the full text.
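
For readers unfamiliar with the word-counting idea mentioned above, here is a rough sketch in the spirit of Luhn's frequency-based scoring. It is not evaristoc's code: the stopword list, the `top_n` cutoff and the function name `score_sentences` are all hypothetical, and each whole sentence is treated as one cluster, which is a simplification of Luhn's original scheme.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "for", "on", "with"}


def score_sentences(text, top_n=5):
    """Rank sentences by how densely they contain the most frequent words."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # Count non-stopword tokens over the whole text and keep the top_n as "significant".
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    significant = {w for w, _ in Counter(words).most_common(top_n)}

    scored = []
    for sentence in sentences:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        hits = sum(1 for t in tokens if t in significant)
        # Simplified Luhn-style score: (significant words)^2 / sentence length.
        score = hits ** 2 / len(tokens) if tokens else 0.0
        scored.append((score, sentence))

    # Highest-scoring sentences first.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


ranked = score_sentences("Cats chase mice. Cats like cats. Dogs bark.", top_n=1)
print(ranked[0][1])
```

Longer texts dilute raw counts across many sentences, which may be one reason plain word counting gets harder to use beyond a few hundred words.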

Nice. Is that your output into the browser? Or is that interactive? @evaristoc

@GoldbergData It is still the output into the browser. I just started working on this.

I am interacting a bit more through terminal for now.

@anuraglahon16 using scikit-learn, looks like you can use `sklearn.model_selection.train_test_split`. All training and test sets are just sampling and splitting. It's been a while since I've done Python, but just choose how you wanna split all your data and select the rows based on the indices you've chosen.

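
The "select the rows based on the indices" idea can be sketched with NumPy alone. This is a minimal illustration, not a replacement for `train_test_split`: the helper name `split_rows`, the 30% test fraction and the toy array are assumptions made up for the example.

```python
import numpy as np


def split_rows(data, test_fraction=0.2, seed=0):
    """Shuffle row indices, then slice them into a train part and a test part."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(data.shape[0])  # random order of row numbers
    n_test = int(round(data.shape[0] * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return data[train_idx], data[test_idx]


# Toy stand-in for a loaded CSV: 10 rows, 4 columns.
data = np.arange(40, dtype=float).reshape(10, 4)
train, test = split_rows(data, test_fraction=0.3)
print(train.shape, test.shape)
```

Every row ends up in exactly one of the two parts, and fixing `seed` makes the split repeatable.
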
@evaristoc nice! Thanks for sharing your work in progress. Very cool :+1:

@erictleung thanks man!

@GoldbergData probably adding something like this: https://stackoverflow.com/questions/28820551/interactive-selection-highlighting-of-text-inside-the-browser

I would introduce a word selection option (probably a tick option) to mark a fixed number of characters (before and after the selected keyword present in text) while making the rest of the text less visible (opacity).
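
The "fixed number of characters before and after the keyword" part can be prototyped outside the browser first. A small sketch, assuming a case-insensitive literal match; the function name `keyword_windows` and the `width` parameter are hypothetical.

```python
import re


def keyword_windows(text, keyword, width=30):
    """Return each occurrence of keyword with `width` characters of context on both sides."""
    if not keyword:
        return []
    # re.escape treats the keyword as literal text, not as a regular expression.
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return [
        text[max(0, match.start() - width): match.end() + width]
        for match in pattern.finditer(text)
    ]


print(keyword_windows("abc Luhn def luhn ghi", "luhn", width=2))
```

The start and end offsets of each match are the same numbers a browser version would need to decide which spans stay fully visible.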

Doing that, I probably won't need the terminal to analyse the text any more.

If I get fancy, I could include a choice to fill in a Google spreadsheet with sentences I am interested in.

Anyway... My main goal is to progress on this so I won't work on any procedure if it takes time. Not really the core of the project right now.

Sorry. Wrong chat.

```
def error(x, y, initial_thetta):
    error_sum = 0
    m, n = x.shape
    for i in range(1, m):
        error_sum = (initial_thetta.T[i] * x[i] - y[i] ** 2)
    return (error_sum) / (2 * m)
```

Can anyone look at this code and let me know if I am heading in the right direction? I am implementing multivariate linear regression on a small dataset with 700 samples,

and that function is the sum-of-squared-errors function.

I feel like it's working right, but I am not completely sure.

```
def error(x, y, initial_thetta):
    error_sum = 0
    m, n = x.shape
    for i in range(1, m):
        error_sum += (initial_thetta.T[i] * x[i] - y[i]) ** 2
    return (error_sum) / (2 * m)
```

```
Initial erorr is [1.73828125e-01 1.15221354e+01 7.75501562e+03 2.54704036e+03
 3.29631510e+02 9.78108268e+03 5.30021348e+02 1.48438496e-01
 6.07216146e+02]
```

why am i getting output in exponents?
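
On the exponent question: NumPy tends to switch to scientific notation when the values in one array span several orders of magnitude, as they do here (roughly 0.15 up to 9781). The numbers themselves are unchanged; only the display differs. A small sketch, using the first three values from the output above; the helper name `format_fixed` is made up for the example.

```python
import numpy as np

# First three values copied from the printed output above.
errors = np.array([1.73828125e-01, 1.15221354e+01, 7.75501562e+03])


def format_fixed(values, precision=4):
    """Format an array in plain fixed-point notation instead of exponents."""
    return np.array2string(
        np.asarray(values, dtype=float),
        precision=precision,
        suppress_small=True,
    )


print(errors)                # default formatting, uses exponents for this range
print(format_fixed(errors))  # fixed-point formatting

# To change the default for every later print in the session:
np.set_printoptions(suppress=True, precision=4)
print(errors)
```
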

Hey all! I'm working on rebuilding this map to see if I can get functionality and speed improved. I know @evaristoc has used it in one of his writings from way back, and I tried to use it as my single project portfolio in a job interview, but URLs were wrong and the best I could manage was the cached thumbnail of the pen from my phone :unamused:

I got the job, but the guy at one point admitted I was a desperation hire, so I'm not sure how that works out for me...

It probably won't get much attention until later this month, but I've fixed the broken URLs so at least now it's something that can be looked at... :/

@Sprinting interesting... I haven't competed that much to be honest. I would probably check the data though. It is a topic I used to analyse a lot before.

@bigyankarki

- I assume you are using `numpy`? I think there are ways to skip the loop.
- Yes... your second script is better expressed. However, if I am not wrong, the error sum is about `model value[i] - average value`? (e.g. https://hlab.stanford.edu/brian/error_sum_of_squares.html, https://en.wikipedia.org/wiki/Partition_of_sums_of_squares) You can also check the following:
  - http://www.dummies.com/education/math/business-statistics/find-the-error-sum-of-squares-when-constructing-the-test-statistic-for-anova/ (for ANOVA)
  - http://www.statisticshowto.com/residual-sum-squares/ (see the chart in this one! It is a good explanation of what the total squares means. Just realize that the distance between the line and the point is not the shortest to the line, but the one parallel to the y-axis, so you are evaluating how far, over or under, the observed point (`y`) is from its estimated point (`f(x)`) as a way to analyse the model fitness).
- It is expected that the more data points you add to the sum, the larger the error. However, the shape of your dataset and your procedure are not clear. I guess `m` are your "examples"?
- In your exercise, are you probably interested in the Mean Squared Error instead?

Hope this helps.
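
One way to skip the loop, as a sketch under stated assumptions: `x` is an `(m, n)` matrix whose rows are the examples (including a column of ones for the intercept), `theta` has length `n`, and `y` has length `m`. The function name `cost` and the toy numbers are made up for the example. Unlike the loop versions above, which index `theta` by the sample number and return one value per column, this takes a full dot product per row and returns a single number.

```python
import numpy as np


def cost(x, y, theta):
    """Sum of squared residuals divided by 2m, computed without a Python loop."""
    m = x.shape[0]
    residuals = x @ theta - y  # predicted minus observed, one value per example
    return float(residuals @ residuals) / (2 * m)


# Toy data: y = x1 exactly, with an intercept column of ones.
x = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])

print(cost(x, y, np.array([0.0, 1.0])))  # perfect fit
print(cost(x, y, np.array([0.0, 0.0])))  # all-zero parameters
```

Dropping the `2` in the denominator gives the Mean Squared Error mentioned above.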

@becausealice2 If you have Twitter, I would suggest sending it to the freeCodeCamp account.

I would add the approx. date the map was made.

The data is super old, though. I never rebuilt the scraper.

They kept moving the list around :/

Doesn't matter. Just write you were dusting your old projects and found this one in your archive, @becausealice2

I'll wait until I get zoom working correctly, but I'll do that :)