These are chat archives for FreeCodeCamp/DataScience
discussion on how we can use statistical methods to measure and improve the efficacy of http://freeCodeCamp.com
Just got back from a meetup about Data Science. Ran into old friends...
Representatives of important companies like booking.com (online hotel bookings, probably the biggest nowadays) and Transavia (airline, Europe) gave talks. Also a CQM data scientist. The host company was Travix (a European sister company of Travelocity, from Expedia).
About standardisation and model simplicity in the use of models:
I must admit that I was surprised by the common use of k-means clustering and classification trees, though. I guess either there were differences in some details that were not mentioned or, simply put, the simple models were much better than their more expensive counterparts...
Another driver for preferring the simplest possible model could have been cost. When I was practising on Kaggle I actually discussed one project based on that premise: more precise algorithms require A LOT more hardware (remember: I said before that solving Data Mining problems shares analogies with solving an NP-hard problem using approximation algorithms!!!). So a company that has to use the results of any Data Mining / Machine Learning implementation would have to invest a lot more in hardware to really go for precision. That precision comes with a cost that MUST be justified, not only in money, but culturally (for those who still don't know: organisations have CULTURES).
In the next few days I will prepare a page to show what the projects of the data science room are... again: if you are interested, please come and share
I was going to bed when I found an article that I still want to share with you, but first...
@alicejiang1 perhaps... Spark, together with Cassandra, Kafka, and other Apache projects, has been taking over. Anyway, in general there are interesting trends, but the report should be taken cautiously. Example: there shouldn't be many questions on Stack Overflow about technology used specifically in niche sectors like Big Data or parallel computation. There shouldn't be much about closed-source software either. By the way, a nice link about distributed parallel computing: https://computing.llnl.gov/tutorials/parallel_comp/
And now the link I found on Medium, for developers, about the rise of technologies around Natural Language Processing and AI: