Discussion on how we can use statistical methods to measure and improve the efficacy of http://freeCodeCamp.com.
For those who use R and the tidyverse, tidyr has been updated to 1.0.0! https://www.tidyverse.org/articles/2019/09/tidyr-1-0-0/
Some notable updates (rough example sketches for each follow after the list):
- `pivot_longer()` and `pivot_wider()` provide improved tools for reshaping, superseding `spread()` and `gather()`. The new functions are substantially more powerful, thanks to ideas from the `data.table` and `cdata` packages, and I’m confident that you’ll find them easier to use and remember than their predecessors.
- `unnest_auto()`, `unnest_longer()`, `unnest_wider()`, and `hoist()` provide new tools for rectangling, converting deeply nested lists into tidy data frames.
- `nest()` and `unnest()` have been changed to match an emerging principle for the design of `...` interfaces. Four new functions (`pack()`/`unpack()`, and `chop()`/`unchop()`) reveal that nesting is the combination of two simpler steps.
- `expand_grid()`, a variant of `base::expand.grid()`. This is a useful function to know about, but also serves as a good reason to discuss the important role that vctrs plays behind the scenes. You shouldn’t ever have to learn about vctrs, but it brings improvements to consistency and performance.
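A minimal sketch of the new reshaping verbs (the tibble, column names, and values below are invented for illustration):

```r
library(tidyr)
library(dplyr)

# Hypothetical wide table: one row per subject, one column per year
wide <- tibble(
  subject = c("a", "b"),
  `2018`  = c(10, 20),
  `2019`  = c(11, 21)
)

# Wide -> long, the job gather() used to do
long <- wide %>%
  pivot_longer(c(`2018`, `2019`), names_to = "year", values_to = "score")

# Long -> wide again, the job spread() used to do
long %>%
  pivot_wider(names_from = year, values_from = score)
```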
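A rough sketch of the rectangling tools on a hand-built list-column, standing in for what a JSON API might return (the field names are made up):

```r
library(tidyr)
library(dplyr)

# Hypothetical nested records, one per row
users <- tibble(
  json = list(
    list(name = "Ana", langs = list("R", "SQL")),
    list(name = "Bo",  langs = list("Python"))
  )
)

# unnest_wider(): one new column per field of each record
users %>% unnest_wider(json)

# hoist(): pull out only the fields you name, leaving the rest nested
users %>% hoist(json, person = "name")
```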
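The pack()/chop() decomposition can be seen directly; as I understand the post, this two-step pipeline is roughly what nest() does in one go (toy data again):

```r
library(tidyr)
library(dplyr)

df <- tibble(g = c(1, 1, 2), x = 1:3, y = 4:6)

# One step: collapse the x/y columns into one nested data frame per g
df %>% nest(data = c(x, y))

# Two steps: pack() bundles x/y into a data-frame column,
# then chop() collapses each g group's rows into a list entry
df %>% pack(data = c(x, y)) %>% chop(data)
```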
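And expand_grid(), which works like base::expand.grid() but returns a tibble and (unlike the base default) doesn't convert strings to factors:

```r
library(tidyr)

# Every combination of the inputs, first column varying slowest
expand_grid(x = 1:2, y = c("a", "b"))
#> rows: (1, "a"), (1, "b"), (2, "a"), (2, "b")
```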
So Google's gonna be Google. I was skimming through Google's AI blog (highly recommended btw) and was reading about some new neural networks, namely "weight agnostic neural networks" or WANNs: https://ai.googleblog.com/2019/08/exploring-weight-agnostic-neural.html
Traditionally, you need to design a neural network's architecture yourself (i.e., how many layers, the connections, how many nodes, etc.). WANNs are apparently a way to automate that search and have the computer find architectures that perform well even before the weights are trained. It's a fascinating thing to think about.
@sa-js sounds like you're well on your way to analyzing the data! You've already got the data in vector form. Stemming the words is a great idea, as you mentioned, and NLTK should be able to do a lot of this for you, as you suggest. I'd agree this is a good enough approach for now.
Also, it looks like NLTK has a built-in classifier you can use: https://pythonspot.com/natural-language-processing-prediction/
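Since most of this thread is R-flavored, here's a hedged R analogue of the tokenize-and-stem step, using tidytext and SnowballC in place of NLTK (the toy reviews and labels are invented):

```r
library(dplyr)
library(tidytext)
library(SnowballC)

# Invented toy corpus standing in for the real data
docs <- tibble(
  id    = 1:4,
  text  = c("loved the product", "terrible waste of money",
            "great quality, loved it", "money wasted, awful"),
  label = c("pos", "neg", "pos", "neg")
)

docs %>%
  unnest_tokens(word, text) %>%           # one token per row
  anti_join(stop_words, by = "word") %>%  # drop common stop words
  mutate(stem = wordStem(word)) %>%       # stem each token
  count(label, stem, sort = TRUE)         # stem frequency per class
```

Those per-class stem counts are the kind of features the NLTK tutorial above feeds into its classifier.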
Here are some other resources that might help:
Good luck!
@mridul037 if you want some practice, try going through the full workflow: collecting data, cleaning/manipulating it, and visualizing it.
Here is one such initiative for practicing exactly that: https://github.com/rfordatascience/tidytuesday
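Here's a minimal sketch of that collect -> clean -> visualize loop. It uses a built-in dataset so it runs as-is; for TidyTuesday you'd swap in readr::read_csv() on that week's posted file:

```r
library(dplyr)
library(ggplot2)

# Collect: a built-in dataset stands in for a downloaded TidyTuesday file
cars_df <- as_tibble(mtcars, rownames = "model")

cars_df %>%
  filter(!is.na(mpg)) %>%         # clean: drop rows with missing values
  mutate(cyl = factor(cyl)) %>%   # manipulate: treat cylinders as a category
  ggplot(aes(cyl, mpg)) +         # visualize
  geom_boxplot() +
  labs(x = "Cylinders", y = "Miles per gallon",
       title = "Fuel efficiency by cylinder count")
```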