These are chat archives for FreeCodeCamp/DataScience

18th
Feb 2019
macasillas
@macasillas
Feb 18 13:39
hello everyone, does anyone here speak "R"? I just started learning it and am trying to do something with sequences but I can't figure out how.
Philip Durbin
@pdurbin
Feb 18 13:43
I doubt I can help but you should probably go ahead and ask so others can.
macasillas
@macasillas
Feb 18 13:49
image.png

so, that's the table I have.
I'm trying to calculate the standard deviation and variance for it.
I tried doing this in code:
a<-0:100
b<-100:200
c<-200:300
d<-300:800
sd(rep(c(a,b,c,d), times=c(90,140,150,120)))

But it says "invalid 'times' argument".
So I don't really know what to do XD
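
The error comes from how `rep()` reads `times`: it must be either a single number (repeat the whole vector) or a vector with one entry per element of `x`. Here `c(a,b,c,d)` has 804 elements while `times` has only 4. A minimal sketch of one way around it, assuming the goal is to repeat each whole range the stated number of times (the names `seqs`, `counts`, and `repeated` are made up for illustration):

```r
# Keep the four ranges as separate list elements instead of one long vector.
seqs   <- list(0:100, 100:200, 200:300, 300:800)
counts <- c(90, 140, 150, 120)

# Repeat each range as a whole, pairing the i-th range with the i-th count.
repeated <- unlist(Map(function(s, n) rep(s, times = n), seqs, counts))

length(repeated)  # total number of values after repetition
sd(repeated)      # sample standard deviation (n - 1 denominator)
var(repeated)     # sample variance
```

This pools all four ranges into one vector before taking the standard deviation, which may or may not be what the table is meant to describe.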

Eric Leung
@erictleung
Feb 18 16:14
@macasillas can you explain the table a bit more? Are you sampling 90 values from values 0 to 100 (for the second column)?
macasillas
@macasillas
Feb 18 16:41
@erictleung the idea is that values from 0-100 appear 90 times, then from 100-200 (I guess I can put 101-200) appear 140 times, then from 200-300 appear 150 times, and from 300-800 appear 120 times
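
If the table is a grouped frequency table (an assumption, since only the image showed it), a common textbook approximation treats every observation in a class as sitting at the class midpoint. A rough sketch under that assumption; `lower`, `upper`, `freq`, and `mid` are illustrative names, and the result only approximates the true spread:

```r
# Class boundaries and frequencies as described in the message above.
lower <- c(0, 100, 200, 300)
upper <- c(100, 200, 300, 800)
freq  <- c(90, 140, 150, 120)

# Assumption: each observation is represented by its class midpoint.
mid <- (lower + upper) / 2

# Here `times` has one entry per element of `mid`, so rep() accepts it.
approx_values <- rep(mid, times = freq)

grouped_mean <- mean(approx_values)
grouped_sd   <- sd(approx_values)   # n - 1 denominator
grouped_var  <- var(approx_values)  # n - 1 denominator
```

Note that `sd()` and `var()` divide by n - 1; a population figure would need rescaling by (n - 1) / n.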
Eric Leung
@erictleung
Feb 18 18:19
@macasillas this will give you your answer, but I don't think this result is what you intended.
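> library(tibble)  # tribble()
> library(dplyr)   # mutate() and the %>% pipe
> library(purrr)   # map2(), map_dbl()
> tribble(~max, ~seq,   ~num,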
+         100,  0:100,    90,
+         200,  100:200, 140,
+         300,  200:300, 150,
+         800,  300:800, 120) %>%
+   mutate(bunches = map2(seq, num, rep)) %>%
+   mutate(sdvals = map_dbl(bunches, sd))
# A tibble: 4 x 5
    max seq           num bunches        sdvals
  <dbl> <list>      <dbl> <list>          <dbl>
1   100 <int [101]>    90 <int [9,090]>    29.2
2   200 <int [101]>   140 <int [14,140]>   29.2
3   300 <int [101]>   150 <int [15,150]>   29.2
4   800 <int [501]>   120 <int [60,120]>  145.
I used the tibble, dplyr, and purrr packages to encapsulate everything in a (modified) data frame.
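
The first three rows share the same `sdvals` because repeating a whole sequence barely changes its spread (only the n - 1 denominator in `sd()` shifts slightly), and shifting a range from 0:100 to 100:200 does not change it at all. A small base-R check of the repetition part, offered as an illustration rather than as part of the original answer; `pop_sd` is a hypothetical helper:

```r
one_copy    <- 0:100
many_copies <- rep(one_copy, times = 90)

sd_one  <- sd(one_copy)     # sample SD of a single copy
sd_many <- sd(many_copies)  # sample SD after repeating 90 times

# Population-style SD (divide by n) is the same for both vectors.
pop_sd <- function(x) sqrt(mean((x - mean(x))^2))

c(sd_one = sd_one, sd_many = sd_many,
  pop_one = pop_sd(one_copy), pop_many = pop_sd(many_copies))
```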
Eric Leung
@erictleung
Feb 18 18:25
A more interesting result, and the one I think you may actually be after, comes from sampling from those sequences.
> mod_fun <- function(seqval, num) {
+   sample(seqval, num, replace = TRUE)
+ }
> tribble(~max, ~seq,   ~num,
+         100,  0:100,    90,
+         200,  100:200, 140,
+         300,  200:300, 150,
+         800,  300:800, 120) %>%
+   mutate(bunches = map2(seq, num, mod_fun)) %>%
+   mutate(sdvals = map_dbl(bunches, sd))
# A tibble: 4 x 5
    max seq           num bunches     sdvals
  <dbl> <list>      <dbl> <list>       <dbl>
1   100 <int [101]>    90 <int [90]>    27.6
2   200 <int [101]>   140 <int [140]>   31.6
3   300 <int [101]>   150 <int [150]>   28.3
4   800 <int [501]>   120 <int [120]>  143.
Now you can see that the standard deviations vary a bit depending on your sample size. But either way, those two code blocks should help you along. Good luck!
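
Because `sample()` draws at random, rerunning the sampling code gives slightly different standard deviations each time. A sketch of making a run repeatable with `set.seed()`; the seed value 2019 is arbitrary, and `draw_sd` is a hypothetical helper, not something from the original answer:

```r
draw_sd <- function(seqval, num, seed) {
  set.seed(seed)  # fix the random number stream so the draw is repeatable
  sd(sample(seqval, num, replace = TRUE))
}

first_run  <- draw_sd(0:100, 90, seed = 2019)
second_run <- draw_sd(0:100, 90, seed = 2019)

identical(first_run, second_run)  # same seed, same draw
```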
macasillas
@macasillas
Feb 18 20:13
thanks for the help! I'm not familiar with the tribble package, but I'll definitely take a look and dig into it =D