I have been benchmarking commonly used frameworks/libraries for unsupervised learning of word embeddings (word2vec). Since learning embeddings is such a widely used technique, I hope this will be helpful to many people working in the field.
I am currently comparing TensorFlow (CPU/GPU), gensim, Deeplearning4j, and the original C code on standard metrics: training time, peak memory usage, and quality of the learned vectors.
Link to my GitHub repo (still a work in progress).
I took the training code for each framework directly from the example in its official GitHub repository. I ran the benchmark on the text8 corpus (I plan to run it on a much larger corpus later for a truer picture), and it gave me strange results.
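For what it's worth, here is a minimal sketch of how I think about measuring training time and Python-level peak memory with only the standard library. The `benchmark` and `dummy_train` names are hypothetical, and `tracemalloc` only sees Python allocations, so for native backends (TensorFlow ops, the C code) peak RSS via `resource.getrusage` would be a better proxy:

```python
import time
import tracemalloc

def benchmark(train_fn, *args, **kwargs):
    """Run train_fn once; return (result, wall-clock seconds, peak bytes).

    Note: tracemalloc tracks only Python-level allocations, so memory
    used inside native extensions (TensorFlow, the original C code)
    will not show up here.
    """
    tracemalloc.start()
    start = time.perf_counter()
    result = train_fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak

# Stand-in "training" function for illustration only:
def dummy_train(n):
    return sum(i * i for i in range(n))

result, seconds, peak_bytes = benchmark(dummy_train, 100_000)
```

A real run would swap `dummy_train` for the framework's training call, and for whole-process memory I would additionally record `resource.getrusage(resource.RUSAGE_SELF).ru_maxrss` before and after.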
I would really appreciate it if you could have a look at the TensorFlow word2vec code and give feedback or suggest changes.
Thanks for your time!