For those who use R and the tidyverse, tidyr has been updated to 1.0.0. Some notable updates:
- pivot_longer() and pivot_wider() provide improved tools for reshaping, superseding spread() and gather(). The new functions are substantially more powerful, thanks to ideas from the data.table and cdata packages, and I'm confident that you'll find them easier to use and remember than their predecessors.
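The pivot functions themselves are R, but the wide-to-long reshape they perform translates to other tools. Here's a minimal pandas sketch of the same idea (the toy table and the pandas melt/pivot calls are analogues I'm supplying, not tidyr's API):

```python
import pandas as pd

# Wide table: one row per country, one column per year (toy data).
wide = pd.DataFrame({
    "country": ["A", "B"],
    "1999": [10, 30],
    "2000": [20, 40],
})

# Long form: roughly what tidyr::pivot_longer(cols = c(`1999`, `2000`),
# names_to = "year", values_to = "cases") would give.
long = wide.melt(id_vars="country", var_name="year", value_name="cases")

# And back again: roughly tidyr::pivot_wider(names_from = year,
# values_from = cases).
wide_again = long.pivot(index="country", columns="year",
                        values="cases").reset_index()
```

The round trip is the point: pivoting longer and pivoting wider are inverses of each other.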
- unnest_longer(), unnest_wider(), and hoist() provide new tools for rectangling, converting deeply nested lists into tidy data frames.
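"Rectangling" just means flattening nested records into ordinary columns. A hedged Python sketch of the same idea, using pandas' json_normalize on made-up JSON-ish data (not tidyr's API):

```python
import pandas as pd

# A nested list-of-records, like what a JSON API returns (toy data).
repos = [
    {"name": "tidyr", "owner": {"login": "tidyverse", "id": 1},
     "topics": ["r", "data"]},
    {"name": "nltk", "owner": {"login": "nltk", "id": 2},
     "topics": ["python", "nlp"]},
]

# Flatten the nested fields into plain columns, similar in spirit to
# tidyr::hoist() / unnest_wider(): owner.login, owner.id, etc.
flat = pd.json_normalize(repos)
```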
- nest() and unnest() have been changed to match an emerging principle for the design of ...interfaces. Four new functions (pack(), unpack(), chop(), and unchop()) reveal that nesting is the combination of two simpler steps.
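The "chop" half of that decomposition is just collapsing each group's rows into a list-column. A rough pandas sketch of chop and its inverse (toy data; the pandas calls are my analogues, not tidyr's functions):

```python
import pandas as pd

df = pd.DataFrame({"g": ["a", "a", "b"], "x": [1, 2, 3]})

# "Chop": collapse each group's values into a list-column,
# roughly what tidyr::chop(df, x) does.
chopped = df.groupby("g", as_index=False).agg({"x": list})
# chopped has g = ['a', 'b'] and x = [[1, 2], [3]]

# "Unchop": explode the list-column back into one row per element,
# roughly tidyr::unchop().
unchopped = chopped.explode("x")
```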
- expand_grid(), a variant of base::expand.grid(). This is a useful function to know about, but it also serves as a good reason to discuss the important role that vctrs plays behind the scenes. You shouldn't ever have to learn about vctrs, but it brings improvements to consistency and performance.
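What expand_grid() computes is the Cartesian product of its inputs, one row per combination. The Python equivalent is itertools.product (a sketch of the concept, not tidyr's API):

```python
import itertools

# Every combination of the two inputs, like
# tidyr::expand_grid(letter = c("a", "b"), num = 1:3).
combos = list(itertools.product(["a", "b"], [1, 2, 3]))
# [('a', 1), ('a', 2), ('a', 3), ('b', 1), ('b', 2), ('b', 3)]
```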
So Google's gonna be Google. I was skimming through Google's AI blog (highly recommended btw) and was reading about some new neural networks, namely "weight agnostic neural networks" or WANNs https://ai.googleblog.com/2019/08/exploring-weight-agnostic-neural.html
So traditionally, you need to design the architecture of a neural network yourself (how many layers and nodes, how they're connected, etc.). WANNs turn that into a search problem: the computer looks for architectures that perform well even with random, shared weights, so the topology itself does most of the work rather than tuned weights. It is a fascinating thing to think about.
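The core trick can be shown in a toy way: score each candidate "architecture" by its performance *averaged over several shared weight values*, so no single tuned weight can carry it. Everything below is made up for illustration (the task, the weight grid, and reducing an "architecture" to a choice of activation function), and it is not the actual WANN search algorithm:

```python
import math
import random

def pearson(a, b):
    # Plain Pearson correlation, used as the fitness score.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb) if va and vb else 0.0

# Candidate "architectures": here just a single unit's activation.
ACTIVATIONS = {"linear": lambda z: z, "abs": abs, "tanh": math.tanh}

def score(act_name, xs, ys, weights=(-2.0, -1.0, 0.5, 1.5)):
    # Weight-agnostic scoring: every connection shares the same weight w,
    # and we average the fit over several values of w.
    act = ACTIVATIONS[act_name]
    return sum(pearson([act(w * x) for x in xs], ys)
               for w in weights) / len(weights)

random.seed(0)
xs = [random.uniform(-3, 3) for _ in range(200)]
ys = [abs(x) for x in xs]  # target function the network should realise

best = max(ACTIVATIONS, key=lambda name: score(name, xs, ys))
print(best)  # -> abs: the right topology fits the target at *any* weight
```

The |.| unit wins because abs(w*x) is proportional to |x| for every weight, while the other units only fit (or anti-fit) depending on the sign of w, which averages out.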
@sa-js sounds like you're well on your way to analyzing the data! You've already gotten the data into vector form. Stemming is a great idea, as you mentioned. NLTK should be able to do a lot of this for you, as you suggest. I'd agree this is a good enough approach for now.
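To show what stemming buys you without pulling in the dependency, here's a crude suffix-stripping sketch. The suffix list is invented for illustration; in practice you'd use NLTK's real stemmer (from nltk.stem import PorterStemmer) rather than this toy:

```python
# Crude stemming sketch: strip common English suffixes so related word
# forms collapse to one token. The rule list is a toy stand-in for a
# real stemmer like NLTK's PorterStemmer.
SUFFIXES = ("ingly", "edly", "ing", "ed", "ly", "es", "s")

def crude_stem(word):
    for suf in SUFFIXES:
        # Keep at least 3 characters so short words survive intact.
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[: -len(suf)]
    return word

tokens = ["running", "runs", "jumped", "quickly", "cat"]
print([crude_stem(t) for t in tokens])
# ['runn', 'run', 'jump', 'quick', 'cat']
```

Note "running" and "runs" don't collapse to the same stem here; that's exactly the kind of edge case a real stemmer handles for you.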
Also, it looks like NLTK has a built-in classifier you can use https://pythonspot.com/natural-language-processing-prediction/
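That built-in is a Naive Bayes classifier; the underlying idea fits in a few lines. Here's a from-scratch sketch with made-up training data (it shows the technique, not NLTK's NaiveBayesClassifier API):

```python
import math
from collections import Counter, defaultdict

# Toy labelled data; real training sets would be much larger.
train = [
    ("great movie loved it", "pos"),
    ("what a great film", "pos"),
    ("terrible boring movie", "neg"),
    ("hated it boring", "neg"),
]

word_counts = defaultdict(Counter)  # class -> word frequencies
class_counts = Counter()            # class -> number of documents
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for c in word_counts.values() for w in c}

def predict(text):
    def log_prob(label):
        total = sum(word_counts[label].values())
        lp = math.log(class_counts[label] / sum(class_counts.values()))
        for w in text.split():
            # Laplace smoothing so unseen words don't zero out a class.
            lp += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        return lp
    return max(class_counts, key=log_prob)

print(predict("loved this great film"))  # -> pos
```

The classifier just multiplies per-word likelihoods per class (in log space) and picks the larger; that's all Naive Bayes is.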
Here are some other resources that might help:
@mridul037 if you want some practice, try working through the full workflow: collecting the data, cleaning/manipulating it, and visualizing it.
Here is one such initiative to practice this https://github.com/rfordatascience/tidytuesday