These are chat archives for FreeCodeCamp/DataScience

29th
Jan 2018
Kartik Mudgal
@Sprinting
Jan 29 2018 16:22
I would like a look at the first project.
There's one metric I'm interested in that should not be too difficult to measure, and there should already be a fair amount of data available (I think FCC ran surveys on this; I remember filling one out). The basic idea is to measure what percentage of FCC's curriculum people complete. At what point did they leave? How many of them got employed? Were they already employed when they started? Were they employed in a related field?
evaristoc
@evaristoc
Jan 29 2018 17:25
@Sprinting Nice! Are you interested in helping with that, or are you proposing those as questions to be answered? Both are welcome, just to know.
Kartik Mudgal
@Sprinting
Jan 29 2018 17:36
I would love to be involved, but unfortunately I've just got a new job and don't have a lot of time. I would still like to see the report though, and if the data were public, I'd like to have it so I can play with it when I have some time :)
Kartik Mudgal
@Sprinting
Jan 29 2018 17:41
I just want this question to be answered: does course completion correlate with better chances of having a job down the line?
Carlos Jose Fragoso Santoni
@cjfragoso
Jan 29 2018 19:08
Hi
I'm trying to make a linear regression model with a dataframe of values as y and time as x
Can anyone help me?
evaristoc
@evaristoc
Jan 29 2018 19:43
@cjfragoso too soon to say, but it doesn't seem difficult. First: are you talking about Python's pandas or R?
Carlos Jose Fragoso Santoni
@cjfragoso
Jan 29 2018 19:44
python pandas
evaristoc
@evaristoc
Jan 29 2018 19:44
So what exactly is the problem?
Carlos Jose Fragoso Santoni
@cjfragoso
Jan 29 2018 19:46
I have this dataframe which I made:
Price Time
0 1.23586 10:31:48
1 1.23583 10:31:49
2 1.23582 10:31:50
3 1.23579 10:31:51
4 1.23586 10:31:52
5 1.23593 10:31:53
6 1.23595 10:31:54
7 1.23595 10:31:55
8 1.23592 10:31:56
9 1.23593 10:31:57
10 1.23594 10:31:58
11 1.23594 10:31:59
12 1.23596 10:32:00
13 1.23596 10:32:01
14 1.23595 10:32:02
15 1.23597 10:32:03
16 1.23595 10:32:04
17 1.23594 10:32:05
18 1.23594 10:32:06
19 1.23594 10:32:07
20 1.23594 10:32:08
21 1.23594 10:32:09
22 1.23592 10:32:10
23 1.23589 10:32:11
24 1.23590 10:32:12
25 1.23589 10:32:13
26 1.23588 10:32:14
27 1.23588 10:32:15
28 1.23589 10:32:16
29 1.23589 10:32:17
... ... ...
4891 1.23489 12:34:42
4892 1.23493 12:34:43
4893 1.23494 12:34:44
4894 1.23494 12:34:45
4895 1.23487 12:34:46
4896 1.23485 12:34:47
4897 1.23485 12:34:48
4898 1.23488 12:34:49
4899 1.23489 12:34:50
4900 1.23488 12:34:51
4901 1.23489 12:34:52
4902 1.23492 12:34:53
4903 1.23490 12:34:54
4904 1.23483 12:34:55
4905 1.23480 12:34:56
4906 1.23480 12:34:57
4907 1.23478 12:34:58
4908 1.23478 12:34:59
4909 1.23484 12:35:00
4910 1.23483 12:35:01
4911 1.23479 12:35:02
4912 1.23483 12:35:03
4913 1.23483 12:35:04
4914 1.23481 12:35:05
4915 1.23479 12:35:06
4916 1.23481 12:35:07
4917 1.23474 12:35:08
4918 1.23472 12:35:09
4919 1.23470 12:35:10
4920 1.23471 12:35:11
How exactly can I fit a linear regression to this dataframe?
Those are EURUSD forex ticks
evaristoc
@evaristoc
Jan 29 2018 20:08

@cjfragoso a whole copy is not needed, a fragment would have been enough.

There are ways to fit a simple regression line on top of the data you are providing.

As far as I can tell, pandas doesn't ship a regression function of its own, so you will need another library alongside it. You can use:

  • numpy / scipy
  • statsmodels (my strongest suggestion)
  • scikit-learn
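
For the first option, here is a minimal sketch with numpy, assuming the `Price` and `Time` columns from the dataframe posted above and using only its first few rows. The `Elapsed` and `Fitted` column names are made up for illustration. The main step is turning the clock times into a numeric x, because a regression cannot use the time strings directly.

```python
import numpy as np
import pandas as pd

# First rows of the dataframe posted above (Price and Time columns).
df = pd.DataFrame({
    "Price": [1.23586, 1.23583, 1.23582, 1.23579, 1.23586, 1.23593],
    "Time": ["10:31:48", "10:31:49", "10:31:50",
             "10:31:51", "10:31:52", "10:31:53"],
})

# Convert "HH:MM:SS" strings to seconds elapsed since the first tick.
# "Elapsed" is a hypothetical column name, not part of the original data.
seconds = pd.to_timedelta(df["Time"]).dt.total_seconds()
df["Elapsed"] = seconds - seconds.iloc[0]

# A degree-1 polynomial fit is an ordinary least-squares straight line:
# Price ~ slope * Elapsed + intercept
slope, intercept = np.polyfit(df["Elapsed"], df["Price"], deg=1)

# Fitted values of the line at each tick.
df["Fitted"] = slope * df["Elapsed"] + intercept

print("slope per second:", slope)
print("intercept:", intercept)
```

This only gives you the slope and intercept; if you also want standard errors or p-values, one of the other two libraries is probably a better fit.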

Here are a couple of references for statsmodels, the one I recommend most strongly for your case:

Notice that statsmodels resembles R syntax. It is an external library, so you might need to install it first.
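
To show what the R-like syntax looks like, here is a rough sketch using the statsmodels formula interface. It assumes statsmodels is installed (for example with `pip install statsmodels`) and again uses only the first rows of the dataframe posted above; `Elapsed` is a made-up column name for the numeric time axis.

```python
import pandas as pd
import statsmodels.formula.api as smf

# First rows of the dataframe posted above (Price and Time columns).
df = pd.DataFrame({
    "Price": [1.23586, 1.23583, 1.23582, 1.23579, 1.23586,
              1.23593, 1.23595, 1.23595, 1.23592, 1.23593],
    "Time": ["10:31:48", "10:31:49", "10:31:50", "10:31:51", "10:31:52",
             "10:31:53", "10:31:54", "10:31:55", "10:31:56", "10:31:57"],
})

# Numeric x: seconds elapsed since the first tick ("Elapsed" is hypothetical).
seconds = pd.to_timedelta(df["Time"]).dt.total_seconds()
df["Elapsed"] = seconds - seconds.iloc[0]

# R-style formula: regress Price on Elapsed; the intercept is added automatically.
model = smf.ols("Price ~ Elapsed", data=df).fit()

print(model.params)    # Intercept and Elapsed (slope) coefficients
print(model.rsquared)  # share of variance explained by the line
```

Calling `model.summary()` on the full dataframe prints a table similar to what R shows for `lm`.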

If you want to make a figure, my suggestion is seaborn.
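
For the figure, a minimal sketch with seaborn's `regplot`, which draws the points and a fitted regression line in one call. It assumes seaborn and matplotlib are installed; the `Elapsed` column and the output file name `price_trend.png` are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# First rows of the dataframe posted above (Price and Time columns).
df = pd.DataFrame({
    "Price": [1.23586, 1.23583, 1.23582, 1.23579, 1.23586,
              1.23593, 1.23595, 1.23595, 1.23592, 1.23593],
    "Time": ["10:31:48", "10:31:49", "10:31:50", "10:31:51", "10:31:52",
             "10:31:53", "10:31:54", "10:31:55", "10:31:56", "10:31:57"],
})

# Numeric x: seconds elapsed since the first tick ("Elapsed" is hypothetical).
seconds = pd.to_timedelta(df["Time"]).dt.total_seconds()
df["Elapsed"] = seconds - seconds.iloc[0]

# Scatter plot of the ticks with a linear regression line drawn on top.
fig, ax = plt.subplots()
sns.regplot(data=df, x="Elapsed", y="Price", ax=ax)
ax.set_xlabel("Seconds since first tick")
ax.set_ylabel("EURUSD price")

fig.savefig("price_trend.png")  # hypothetical output file name
```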

Hope this helps.

Carlos Jose Fragoso Santoni
@cjfragoso
Jan 29 2018 20:10
@evaristoc THAAAANKSSS