These are chat archives for FreeCodeCamp/DataScience

25th
Aug 2016
Joseph Parkton
@hippybear
Aug 25 2016 01:16
Can I get an API key please
evaristoc
@evaristoc
Aug 25 2016 07:43
@hippybear unfortunately the API key is still not available
Quincy Larson
@QuincyLarson
Aug 25 2016 07:48
@evaristoc awesome! I just updated the article to include these stats. Thanks! https://medium.freecodecamp.com/the-economics-of-working-remotely-28d4173e16e2#.4g3grcxrh
CamperBot
@camperbot
Aug 25 2016 07:48
quincylarson sends brownie points to @evaristoc :sparkles: :thumbsup: :sparkles:
:cookie: 307 | @evaristoc |http://www.freecodecamp.com/evaristoc
Albert Jonathan
@albert2309
Aug 25 2016 08:23
@CodeNonprofit For the non-profit organization survey, are you planning to expand to organizations outside the US, like MSF (also known as Doctors Without Borders)?
What about the classification of organizations? I am slightly concerned about that, since some organizations expand to other countries, like Unicef, while others only operate in their country of origin, like March of Dimes.
evaristoc
@evaristoc
Aug 25 2016 10:01

@CodeNonprofit I checked the NPTFG survey: I don't think it is bad, it is only focused on the specific case of the use of social media... I think NPTFG would like to focus on social media consultancy for NGOs/nonprofits?

It is not a benchmark for the FCC project, as FCC's interest is totally different...

And here is an example of why the sampling design aspect is relevant, @CodeNonprofit:

One serious problem with the NPTFG survey is that many people in the same organisation can answer it. That situation will pose serious challenges in the analysis. If there are many respondents from the same organisation, which one should I take? Worse: what if the information given by several of them disagrees? It is better to have only one answer that you DON'T know to be wrong, and can assume is ok, than to have several and not know which to choose!!!

Therefore it is better for the data per organisation to be UNIQUE, to avoid discrepancies: ONE organisation, ONE questionnaire only.
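
A minimal sketch of how such discrepancies could be detected, assuming each response carries some organisation identifier (the `org` field and the question names here are hypothetical, and an anonymous survey may not collect one at all). It groups responses per organisation and lists the questions on which respondents from the same organisation disagree:

```python
from collections import defaultdict


def find_conflicts(responses):
    """Group responses by organisation and report disagreeing answers.

    `responses` is a list of dicts; the `org` key and the question names
    are hypothetical, not taken from any real survey.
    """
    by_org = defaultdict(list)
    for r in responses:
        by_org[r["org"]].append(r)

    conflicts = {}
    for org, rows in by_org.items():
        if len(rows) < 2:
            continue  # a single respondent cannot disagree with anyone
        questions = set().union(*(row.keys() for row in rows)) - {"org"}
        differing = sorted(
            q for q in questions
            if len({row.get(q) for row in rows}) > 1
        )
        if differing:
            conflicts[org] = differing
    return conflicts


# Invented sample data, for illustration only.
responses = [
    {"org": "A", "staff": "1-10", "uses_social_media": "yes"},
    {"org": "A", "staff": "11-50", "uses_social_media": "yes"},
    {"org": "B", "staff": "1-10", "uses_social_media": "no"},
]
print(find_conflicts(responses))
```

The check only finds the conflicts; deciding which respondent to keep is still a judgement call.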

Alice Jiang
@becausealice2
Aug 25 2016 11:39
@QuincyLarson something I would mention, especially with median salaries that high, is that according to the survey, remote workers tend to be more experienced. I wouldn't want to get someone's hopes up that, right out of the gate after the front end cert, they can find a remote position for $100K+ without difficulty...
Michael D. Johnson
@CodeNonprofit
Aug 25 2016 14:08
@evaristoc I'm not sure it is going to be possible to limit it to one respondent per organization. This survey will be anonymous.
Philip Durbin
@pdurbin
Aug 25 2016 14:34
@CodeNonprofit does higher education count as non-profit?
evaristoc
@evaristoc
Aug 25 2016 14:49
@CodeNonprofit Ok... it could pose a serious validity challenge... I will investigate how to mitigate possible effects...
Michael D. Johnson
@CodeNonprofit
Aug 25 2016 15:51
@pdurbin Nonprofits under our definition include only 501(c)(3) charitable organizations (or foreign equivalent). Higher education does not fall under that umbrella.
Philip Durbin
@pdurbin
Aug 25 2016 16:18
@CodeNonprofit ok, just curious. Thanks. Open source is used a lot in higher ed. And higher ed sometimes funds open source development.
CamperBot
@camperbot
Aug 25 2016 16:18
pdurbin sends brownie points to @codenonprofit :sparkles: :thumbsup: :sparkles:
:cookie: 147 | @codenonprofit |http://www.freecodecamp.com/codenonprofit
evaristoc
@evaristoc
Aug 25 2016 19:29

People

Today at a Data Science meetup: PyData (Amsterdam)

https://www.meetup.com/PyData-NL/events/232899698/

Main topics:

  • Deep Learning (convolutional NN) applied in Natural Language Processing for sentiment analysis and topic extraction:
    • Excellent exercise showing the power and the limitations of this approach
    • The advice: if you use a convolutional network, keep in mind that it ignores the position of the word in the text (position is less important in image ML), and it should be GPU-based rather than CPU-based; if you want to go NN, think about recurrent networks instead, but otherwise go TF-IDF + SVM or just use the Google suite, which recently launched a summarising tool...
  • Individual vs Company name recognition:
    • The project focused on a system to easily distinguish individuals from companies in datasets coming from different sources, where the only possible primary key is the name of the person or the company; it is also used as a lookup tool...
    • Useful project for easing the identification of
    • The final best result consisted of a few rule-of-thumb steps to clean the data before passing the remaining data through an ensemble of a simple NN + logistic regression.
    • Nice project, with a demo on site that proved to be very effective.
If you keep track of the meetup you will eventually get the slides...
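
To give an idea of what "rule-of-thumb steps" before a model could look like in the name recognition project, here is a rough sketch. It is not the presenter's code: the suffix list, the `&` rule and the function name are assumptions made up for illustration. Names the rules cannot decide return `None` and would be the remaining data passed on to a statistical model.

```python
import re

# Hypothetical list of legal-form suffixes; a real project would need a
# much longer, country-specific list.
COMPANY_SUFFIXES = {"ltd", "inc", "llc", "bv", "nv", "gmbh", "corp", "plc"}


def rule_of_thumb_label(name):
    """Return 'company' when a simple rule fires, otherwise None.

    None means "undecided": the name is left for a later model.
    """
    # Lowercase and drop dots so that "B.V." and "BV" look the same.
    tokens = re.findall(r"[a-z0-9&]+", name.lower().replace(".", ""))
    if not tokens:
        return None
    if tokens[-1] in COMPANY_SUFFIXES:
        return "company"  # ends with a legal-form suffix
    if any("&" in t for t in tokens):
        return "company"  # assumed rule: "&" is rare in personal names
    return None


for name in ["Acme B.V.", "Smith & Sons", "Jan de Vries"]:
    print(name, "->", rule_of_thumb_label(name))
```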
evaristoc
@evaristoc
Aug 25 2016 19:41
For the Deep Learning project mentioned above, some keywords were:
  • GPU
  • Recurrent instead of Convolutional for some cases
  • Convolutional tends to bias on the positive control of the kernel data
  • GloVe or word2vec to feed the NN (NLP case)
  • A lot of tuning!!!!!
  • Instead of sklearn, start thinking TensorFlow or, in Python, Theano (find a list of other libraries on Quora...); the presenter used Keras
  • Don't go straight to NN and complex settings: start simple
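
As a sketch of the "start simple" point: the TF-IDF half of a TF-IDF + SVM baseline fits in a few lines of plain Python. This is not the presenter's code. The weighting below is one common smoothed variant (other TF-IDF definitions exist), the tokenizer is a bare whitespace split, and the sample documents are invented. In practice you would likely use a library vectoriser and feed the vectors to a linear SVM.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Weight = raw term count * (log((1 + N) / (1 + df)) + 1), where N is
    the number of documents and df the number of documents containing
    the term.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: count each term once per document.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


# Invented sample texts, for illustration only.
docs = ["great talk great demo", "boring talk", "great slides"]
vectors = tfidf(docs)
for doc, vec in zip(docs, vectors):
    print(doc, "->", {t: round(w, 3) for t, w in vec.items()})
```

Terms that appear in fewer documents (like "demo") get a higher weight per occurrence than terms shared across documents (like "talk").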
Michael D. Johnson
@CodeNonprofit
Aug 25 2016 22:05
@pdurbin No problem - I think open source should be utilized by every sector. With regard to the Code Grants, what we're doing aligns more with the free software movement than it does with the open source initiative. A good amount of what we do for nonprofits is not open source, although in the future this ratio could change.