These are chat archives for FreeCodeCamp/DataScience

13th
Aug 2016
Casey Heath
@ExhibitArts
Aug 13 2016 06:02
Anyone willing to help with an experimental startup site?
Need some front-end and back-end devs as well as a few designers.
PM me.
:smile:
Quincy Larson
@QuincyLarson
Aug 13 2016 08:35
@smithbrandon yes I think @BerkeleyTrue and @roelver are going to prioritize the open API once we ship React and some of our new curricula. Thanks for your patience.
CamperBot
@camperbot
Aug 13 2016 08:35
quincylarson sends brownie points to @smithbrandon and @berkeleytrue and @roelver :sparkles: :thumbsup: :sparkles:
:cookie: 275 | @smithbrandon |http://www.freecodecamp.com/smithbrandon
:cookie: 373 | @berkeleytrue |http://www.freecodecamp.com/berkeleytrue
:cookie: 537 | @roelver |http://www.freecodecamp.com/roelver
evaristoc
@evaristoc
Aug 13 2016 09:25

@alicejiang1 Planning to join you at Berkeley's edX training. It is more PySpark.

Also noticed that skale.me's syntax closely resembles Spark's. I think they did it as an invitation for Spark users to easily transition into skale.me. So if you learn one, you will have an easy introduction to the other one.

evaristoc
@evaristoc
Aug 13 2016 09:36

People

Those who are interested in online training in Data Analysis with R

Check DataCamp and Kaggle. Remember that on Kaggle you can try the FCC Survey 2016.

Those interested in Volunteering as Data Analysts for Social-Minded Projects

Check (of course...) Kaggle, DataKind, and DrivenData.

evaristoc
@evaristoc
Aug 13 2016 14:54

@alicejiang1: already going for the second week of the first course (Introduction). I always had this idea that Spark was an amazing tool. I have been in contact with the Big Data / Data Science community for a while, but my first introduction to Big Data was Hadoop. It was not until a conference about Spark about 1.5 years ago that I personally met Paco Nathan, one of the people behind Spark / Databricks. Really nice guy. He is a real promoter of Open Source and Learning Access for Everyone.

Of all, what surprises me the most is the Databricks Platform. It is amazing that you can do some Big Data / Data Science exercises without worrying much about settings. These people in Berkeley really did an excellent job.

Immediately after that meeting about Spark, everyone in the Big Data circles here in Amsterdam started to talk about it and never went back. I remember one guy who was a regular attendee at those meetings saying "Hadoop? Hadoop techs are dead!". He is a Scala guy who at that time was working for a Big Data start-up here in Holland doing projects in the automobile sector. At that time it was more about electricity use because they were more into electric cars, but I wouldn't be surprised if they are moving into the Internet of Things in that sector...

This will be the first time that I go through it. I am lucky to know SQL (for Hive), Python and Hadoop, and to have some idea of distributed/multiprocessing theory and practice: it makes it easier.

I think, though, they should replace the Piazza forum with Discourse, although I think it has to do with edX, not Databricks... I think Discourse will become the de facto forum in the future for sure... I think they are using Piazza because of its registration system? I also suspect that it lets them get deeper access to the data for analytical purposes.
evaristoc
@evaristoc
Aug 13 2016 15:31
@alicejiang1: ... and do you want to know a secret?? Have you tried Databricks' display feature for displaying a dataset? Have you seen that you can also plot data, etc. when using that feature? Well, the plots seem to be made with d3.js or similar...
evaristoc
@evaristoc
Aug 13 2016 17:06
@Lightwaves Spark is mostly about Big Data handling, although it also includes some ML features... if you are interested? Let me know... I am not sure why, but I think setting up an architecture would be more interesting for you. Am I right?
Lightwaves
@Lightwaves
Aug 13 2016 17:08
I'm interested in both, I like to understand the architecture from both the implementation side and software side of things, but I'm very interested in learning machine learning.
Is this a course?
evaristoc
@evaristoc
Aug 13 2016 17:10
Yes, edX. Have a look. There are some essentials about ML but... hmmm... the problem with these trainings is where they put more emphasis: in this case, you will see some ML but the focus is on learning to use the tool...
There is A LOT to learn...
A lot of methods...
Lightwaves
@Lightwaves
Aug 13 2016 17:12
Which course is it?
evaristoc
@evaristoc
Aug 13 2016 17:12
Spark series
Lightwaves
@Lightwaves
Aug 13 2016 17:12
introduction to apache spark?
evaristoc
@evaristoc
Aug 13 2016 17:12
Yes... I am doing the basic one now...
They are really short...
I am already in the second week doing the lab... this one does not seem that difficult for me because I have some introduction to all the things they are using... PySpark is also similar to pandas so I can quickly get used to the methods. But it is still huge; it can be overwhelming if you don't have some idea about all the things that are combined in the course (Python, pandas, SQL, distributed/multiprocessing, functional programming and MapReduce, etc). Also, previous knowledge of Hadoop makes it easier...
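For readers new to the functional / MapReduce style mentioned above, here is a minimal sketch in plain Python. It is not Spark itself and not taken from the course; the sample lines and the helper name `add_pair` are made up for illustration.

```python
from collections import Counter
from functools import reduce

# Hypothetical sample data standing in for a (much larger) distributed dataset.
lines = ["spark is fast", "hadoop is older", "spark is popular"]

# "Map" step: turn every line into (word, 1) pairs.
pairs = [(word, 1) for line in lines for word in line.split()]

# "Reduce" step: fold the pairs into a running count per word.
def add_pair(counts, pair):
    word, n = pair
    counts[word] += n
    return counts

word_counts = reduce(add_pair, pairs, Counter())
print(word_counts.most_common(2))
```

In PySpark the same idea is usually expressed on an RDD with `flatMap`, `map` and `reduceByKey`, with the work spread over several machines instead of a single list.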
evaristoc
@evaristoc
Aug 13 2016 17:18
Also analytics, ML...
I have been able to rush this particular course so far because of all that...
The next one, which starts Monday, is the one I am more interested in: it will be about working on distributed systems
So it will use more parallel processing and possibly introduce users to how to make a simple, good parallel request...
Again, many methods used here are very similar to the ones suggested by skale.me...
Lightwaves
@Lightwaves
Aug 13 2016 17:21
haha I bet this will save some sysadmins some time analysing those logs :P
evaristoc
@evaristoc
Aug 13 2016 17:21
:)

If you want to do it, go ahead and let me know? @alicejiang1 said she was doing it too... We can use this channel to discuss it... we can eventually come up with some projects involving node, JS together with Spark/skale.me...

I am obsessed with that project, the skale.me one... I think it could be really interesting stuff... but maybe I am the only one who thinks that :)

Something more FCC...
Lightwaves
@Lightwaves
Aug 13 2016 17:30
I don't blame you
Lightwaves
@Lightwaves
Aug 13 2016 17:37
skale seems like a good project
Alice Jiang
@becausealice2
Aug 13 2016 18:55
@evaristoc I'm glad you're enjoying the course. The second one just got archived yesterday and they sent out an email after saying 28000 students from 150 countries were registered. 25% were engaged in the course and 10% passed. 5% paid for the verified certificate and, of them, 89% passed.
I got 100% (which isn't hard to do) :)
I've enjoyed these courses much more than Microsoft's DS/ML, but there's still a whole bunch more that I want to try so I won't declare any favorites yet ;)
evaristoc
@evaristoc
Aug 13 2016 19:04
@alicejiang1 Great! I am doing the recently archived one but it is still accessible and it looks like it will be better to have that done before going to the most advanced one... I think I will finish the second week lab today and tomorrow I will try the third week section. I think I am going to pay for the verified certificate for the most advanced one...
What are you going to do? Are you planning to take the next one?
Alice Jiang
@becausealice2
Aug 13 2016 19:07
I am planning on taking the next one (without paying) as well as a bunch more from the data science curriculum by Microsoft. So far those have been so easy they can be finished in just a few hours, so those have been more about my resume than learning experience. Columbia also has a good xseries that I started for DS/ML
Harvard has a couple xseries on DS for life sciences/genomics that I'm curious about, and then there's bigdatauniversity.com that I've been meaning to try...
I could be an education blogger. Write honest reviews of courses I take, preferably for money ;P
Darwin RC
@darwinrc
Aug 13 2016 19:10
@alicejiang1 Hello. I did all 3 courses from Columbia (paying) and they were the worst waste of my money and time. I don't recommend them at all
evaristoc
@evaristoc
Aug 13 2016 19:11
I haven't seen the MS courses, I guess they have to do with Azure? I am not really interested... The Columbia ones I don't know. Harvard is WOW. MIT even MORE WOW. I think @erictleung has seen a couple of them... I would like to take several full MIT/Harvard courses that are on YouTube but... it takes time and they are more theory than practice...
Alice Jiang
@becausealice2
Aug 13 2016 19:11
Oh really? I'm still only in the first course and it's been good statistics review :/ that's disappointing. What about the courses did you dislike so much?
Darwin RC
@darwinrc
Aug 13 2016 19:11
I'm also taking the spark series (the first course) and it is awesome.
evaristoc
@evaristoc
Aug 13 2016 19:12
@darwinrc @alicejiang1 what are those Columbia ones?
@darwinrc Spark series with who?
Darwin RC
@darwinrc
Aug 13 2016 19:13
@evaristoc BerkeleyX: CS105x Introduction to Apache Spark ... I believe it is the same one you are taking
Alice Jiang
@becausealice2
Aug 13 2016 19:13
@evaristoc some of them are about Azure. They are MS product-centric, but they are good lessons in theory (all the code is pre-written for most of the courses) for noobs
evaristoc
@evaristoc
Aug 13 2016 19:13
Ah! Yes, the same one we are taking... :)
Lightwaves
@Lightwaves
Aug 13 2016 19:13
I'm going through the first week, and I'm enjoying it.
evaristoc
@evaristoc
Aug 13 2016 19:14
@darwinrc Gonna take the next one?
Alice Jiang
@becausealice2
Aug 13 2016 19:15
@evaristoc I can't remember the name, but if you go to the edX courses search page and under filter by subject choose data analysis and statistics (or whatever they call it) it's listed there. And there's a bunch of others as well
There's a course coming up on DS ethics I've got my eye on that I found there ;)
evaristoc
@evaristoc
Aug 13 2016 19:16
@Lightwaves I really recommend it indeed... the first week is nice... then you will start getting your hands dirty in the second week...
Darwin RC
@darwinrc
Aug 13 2016 19:16
@alicejiang1 First of all, Columbia's series is not challenging at all... the grades are just quizzes on the lecture videos. Second, the lecturers are just a bunch of professors from all around the world (it seems they were forced to do the lectures) and all the topics are disconnected; they cram almost an entire CS/Math curriculum into 3 courses
Alice Jiang
@becausealice2
Aug 13 2016 19:16
The second course is a lot more fun ;) the last lab of the second course was my absolute most favorite so far!
Darwin RC
@darwinrc
Aug 13 2016 19:16
@evaristoc yes... I have to finish first, though
evaristoc
@evaristoc
Aug 13 2016 19:17
@alicejiang1 I will check... although if it is about Data Analysis and Statistics... hmmmm... I think it is enough for me... a review would come in handy but nothing else...
Alice Jiang
@becausealice2
Aug 13 2016 19:17
@darwinrc thanks for clarifying. That's so disappointing :(
CamperBot
@camperbot
Aug 13 2016 19:17
alicejiang1 sends brownie points to @darwinrc :sparkles: :thumbsup: :sparkles:
:cookie: 424 | @darwinrc |http://www.freecodecamp.com/darwinrc
Alice Jiang
@becausealice2
Aug 13 2016 19:18
@evaristoc that's just the name of the subject. Anything related to "data" in any way is there.
evaristoc
@evaristoc
Aug 13 2016 19:19
Oh!!! @alicejiang1 That one about Distributed!!! Nooo! I missed it!
Alice Jiang
@becausealice2
Aug 13 2016 19:19
I may never actually work in the field because I'm too busy doing all the courses :'(
Darwin RC
@darwinrc
Aug 13 2016 19:19
@alicejiang1 yes... Those courses have nothing to do with data science or machine learning... Just a lot of theory of algorithms, statistics and electronics but no context for DS/ML.
Alice Jiang
@becausealice2
Aug 13 2016 19:20
@evaristoc this wasn't the first run of the course, I'm sure it won't be the last. You can still do everything, it just won't be graded, and the piazza forum is closed for new discussion
evaristoc
@evaristoc
Aug 13 2016 19:21
@alicejiang1 you need that base in statistics and algos to do a good ML/DS job though... but ML... there are a lot of heuristics added to it
Alice Jiang
@becausealice2
Aug 13 2016 19:21
@darwinrc I'm probably going to still go through them because I am interested in the maths review and theory. I would have expected more practical application from the data science department at Columbia :/
evaristoc
@evaristoc
Aug 13 2016 19:23
@alicejiang1 yes, I am sure they are going to repeat the training in the future but it can take a year... well, nothing that I can do about it now...
Darwin RC
@darwinrc
Aug 13 2016 19:23
@alicejiang1 Maybe the first course is the most decent
evaristoc
@evaristoc
Aug 13 2016 19:23
@darwinrc first course? Introduction seems to be the first, doesn't it?
Alice Jiang
@becausealice2
Aug 13 2016 19:24
@evaristoc if you work through it now, I actually found the code from one of the previous runs and it's almost identical.
Lightwaves
@Lightwaves
Aug 13 2016 19:24
I need a much stronger statistical base
Alice Jiang
@becausealice2
Aug 13 2016 19:24
You can get everything all finished and then be ready for the next run:)
Darwin RC
@darwinrc
Aug 13 2016 19:25
@evaristoc Are we referring to the same series? Spark or Columbia?
Spark one rocks, Columbia sucks
evaristoc
@evaristoc
Aug 13 2016 19:25
:)
I am still in the Spark mode...
Columbia, it is not included in my dialogue, no...
Lightwaves
@Lightwaves
Aug 13 2016 19:26

I've been slowly going through Udacity's descriptive statistics and I'm on the 6th lesson.

The stuff I really need to learn is PDFs, conditional probability, Bayes' rule, etc., especially on the algorithms/implementation side of things.
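On the implementation side, Bayes' rule is only a few lines of code. A minimal sketch in Python follows; the function name, parameter names and the numbers are all made up for illustration.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) given P(H), P(E | H) and P(E | not H)."""
    # Total probability of seeing the evidence E at all.
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    # Bayes' rule: P(H | E) = P(E | H) * P(H) / P(E).
    return likelihood * prior / evidence

# Made-up numbers: 1% of cases are H, a test catches 90% of them,
# and it wrongly flags 5% of the cases that are not H.
posterior = bayes_posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.05)
print(round(posterior, 3))
```

Even with a fairly accurate test the posterior stays low here, because the prior is small and the false positives from the large "not H" group dominate.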

Darwin RC
@darwinrc
Aug 13 2016 19:26
@evaristoc jeje. Better then.
evaristoc
@evaristoc
Aug 13 2016 19:27
@Lightwaves : well... in short, you need the very basics... that is in many ways the essence of the rest...
Darwin RC
@darwinrc
Aug 13 2016 19:27
@evaristoc I just have time to take a look at the video about skale.me. I'll let you know
evaristoc
@evaristoc
Aug 13 2016 19:28
Ask questions here if you are lost... I will try to help (as long as I know the answer... :)) @Lightwaves
@darwinrc Nice! It is a bit long and the presenter went through a couple of difficulties so it could be a bit hard to digest (I think it was the second exercise), but I think you will get a better idea about how the project works after all the Spark training!
Darwin RC
@darwinrc
Aug 13 2016 19:33
:sparkles:
evaristoc
@evaristoc
Aug 13 2016 19:33
@alicejiang1 yes, I will check the project archives for sure... were you saying that there is some material that is similar between the two Spark courses you took? What was the most interesting thing about the recently closed one, the one about distributed?
@alicejiang1 If you are suggesting that they have sections that are too similar, then I would like to put my efforts in those sections where the closed one is different and more interesting...
Alice Jiang
@becausealice2
Aug 13 2016 19:39
@evaristoc No, there were similarities between the second course and a previous course on Spark from last year. Honestly, I was so busy trying to cram everything in before the closing date that I didn't pay much attention to what I was learning. It felt the same as the first course, though. There wasn't a lot of explanation on distributed systems. Just more sciencing data
@evaristoc no, the first and second courses are not the same. The second course is the same as a course Berkeley ran over a year ago, CS190X
evaristoc
@evaristoc
Aug 13 2016 19:43

@alicejiang1 and for you, second course means this one I am doing, the CS105, doesn't it? I think they are ordered as:
First == CS105 (Introduction)
Second == CS110 (Data Analysis)
Third == CS120 (Distributed)

If that is true, I would expect the third one (the one that recently closed) to be the most difficult of those 3... But you said it was not that difficult, didn't you?

And I might be wrong in the ordering anyway...

Alice Jiang
@becausealice2
Aug 13 2016 19:45
The second and third are switched. It wasn't very difficult. The most difficult part for me has been remembering variable and function names, and where to find them scrolling up when I forget.
It wasn't easy, but it wasn't overly difficult either. It felt just right for a learning experience.
evaristoc
@evaristoc
Aug 13 2016 19:47
Positive then! Yes... I think the key to these courses, as I was explaining to @Lightwaves, is more about having an introduction to the tool rather than learning the depths of DS/ML...
Alice Jiang
@becausealice2
Aug 13 2016 19:48
Yeah it's definitely making an assumption that all you're there for is how to Spark lol
evaristoc
@evaristoc
Aug 13 2016 19:48
@)
Let's Spark the word!
Awful... it sounds too much ML...
Ok... back to my Sparkled world, to see if I can finish this in a few minutes...
Alice Jiang
@becausealice2
Aug 13 2016 19:51
I'm sure you can ;)
evaristoc
@evaristoc
Aug 13 2016 19:51
:)
Lightwaves
@Lightwaves
Aug 13 2016 19:51
Seems like @evaristoc is trying to make the world sparkle
Alice Jiang
@becausealice2
Aug 13 2016 19:52
:sparkles: Spark :sparkles:
Lightwaves
@Lightwaves
Aug 13 2016 19:53
:sparkles: world :sparkles:
evaristoc
@evaristoc
Aug 13 2016 20:50

@alicejiang1 @Lightwaves @darwinrc I am sparkled enough for today.... Done with 2 sections... Maybe I will finish the training tomorrow. Just sitting on my bench, coffee, a few snacks and... done. Happy that I can do it so quickly now; some months ago I think I would have struggled a lot...

See you around!