So Google's gonna be Google. I was skimming through Google's AI blog (highly recommended btw) and was reading about some new neural networks, namely "weight agnostic neural networks" or WANNs https://ai.googleblog.com/2019/08/exploring-weight-agnostic-neural.html
So traditionally, you'll need to design the architecture of neural networks by hand (i.e., how many layers, the connections, how many nodes, etc) and then train the weights. These WANNs flip that around: an automated search looks for architectures that perform well even *without* training the weights, e.g. when every connection shares a single weight value. It is a fascinating thing to think about.
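To make the "weight agnostic" idea concrete, here's a tiny sketch of my own (not Google's code): take one fixed toy architecture and evaluate it with a single shared weight substituted for every connection, the way the WANN search scores candidate architectures across many shared weight values instead of training them.

```python
import numpy as np

def forward(x, w):
    # Toy fixed 2-4-1 network where every connection uses the same
    # shared weight value w (no training involved).
    h = np.tanh(x @ (w * np.ones((2, 4))))
    return np.tanh(h @ (w * np.ones((4, 1))))

# WANN-style evaluation: score the same architecture under several
# shared weight values; an architecture is "good" if it works across
# the whole range, not just at one trained setting.
x = np.array([[0.0, 1.0], [1.0, 0.0]])
scores = [float(np.mean(forward(x, w))) for w in (-2.0, -1.0, 1.0, 2.0)]
```

The real paper evolves the architecture itself (adding nodes/connections) and averages task reward over the shared weight sweep; this just shows the evaluation idea.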
@sa-js sounds like you're well on your way to analyzing the data! You've already gotten the data into vector form. Stemming is a great idea, as you mentioned. NLTK should be able to do a lot of this for you, as you suggest. I'd agree this is a good enough approach for now.
Also, it looks like NLTK has a built-in classifier you can use https://pythonspot.com/natural-language-processing-prediction/
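Roughly, the stemming + built-in classifier flow looks like this (a minimal sketch; the example texts and labels are made up, so swap in your own feature extraction):

```python
import nltk
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def features(text):
    # Lowercase, split on whitespace, stem each token, and use the
    # stems as bag-of-words features (the dict format NLTK expects).
    return {stemmer.stem(tok): True for tok in text.lower().split()}

# Tiny made-up training set: (feature dict, label) pairs.
train = [
    (features("loved this movie great acting"), "pos"),
    (features("great film loved it"), "pos"),
    (features("terrible boring waste of time"), "neg"),
    (features("boring and terrible acting"), "neg"),
]

# NLTK's built-in Naive Bayes classifier, as in the linked tutorial.
classifier = nltk.NaiveBayesClassifier.train(train)
label = classifier.classify(features("loved the great acting"))
```

With real data you'd also want a held-out test split (`nltk.classify.accuracy`) before trusting the results.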
Here are some other resources that might help:
@mridul037 if you want some practice, you can work through the full workflow: collecting data, cleaning/manipulating it, and visualizing it.
Here is one such initiative to practice this https://github.com/rfordatascience/tidytuesday
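In Python that workflow can be sketched in a few lines with pandas (the toy dataset below is made up; with TidyTuesday you'd `pd.read_csv` a raw CSV URL from the repo instead):

```python
import pandas as pd

# "Collect": a tiny made-up dataset standing in for a downloaded CSV.
raw = pd.DataFrame({
    "species": ["Adelie", "adelie ", "Gentoo", None, "Gentoo"],
    "body_mass_g": ["3750", "3800", "5000", "4200", "bad"],
})

# Clean: normalize text, coerce numbers, drop unusable rows.
clean = raw.assign(
    species=raw["species"].str.strip().str.title(),
    body_mass_g=pd.to_numeric(raw["body_mass_g"], errors="coerce"),
).dropna()

# "Visualize": a summary table; for an actual chart you'd hand this
# to matplotlib, e.g. summary.plot(kind="bar").
summary = clean.groupby("species")["body_mass_g"].mean()
```

Most of the time in real projects goes into that middle cleaning step, which is exactly what the TidyTuesday datasets are good practice for.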
On the bioinformatics side, here are some I've found useful:
If you have specific questions for bioinformatic data science, feel free to ask around here :smile: