These are chat archives for ramin-git/word_tree_structure

22nd Apr 2015
ramin-git
@ramin-git
Apr 22 2015 11:12
  1. Why does the BillionWords data have 3 word_tree files rather than one? Is it because of size?
  2. Could you describe the structure of the word_treex.th7 files?
    I understand the structure of a Tensor, but what is the correspondence between a parent word and its child words?
Nicholas Léonard
@nicholas-leonard
Apr 22 2015 15:27
It was for the SoftMaxForest experiment, which uses 3 different word trees. For SoftMaxTree, you only need one.
Parents are clusters of children.
There are 10 parents, each with 10 children, which themselves have 10 children, and so on until the children at the bottom level are words.
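A minimal Python sketch of the parent/child correspondence described above, assuming (as with the `hierarchy` argument of nnx's `SoftMaxTree`) that a word tree maps each parent id to the ids of its children. The toy tree, its ids (words 1..6, clusters 101 and 102) and the root id -1 are hypothetical, and it uses 3 children per node instead of 10 to stay short. Inverting the map gives each node's parent, so you can walk from a word up to the root:

```python
from typing import Dict, List

# Hypothetical toy hierarchy: keys are parent ids, values are child ids.
# Word ids are 1..6; cluster ids (101, 102) and the root id (-1) are
# made up for illustration. Real trees have 10 children per node.
hierarchy: Dict[int, List[int]] = {
    -1: [101, 102],   # root -> two top-level clusters
    101: [1, 2, 3],   # cluster 101 groups words 1, 2, 3
    102: [4, 5, 6],   # cluster 102 groups words 4, 5, 6
}


def invert(tree: Dict[int, List[int]]) -> Dict[int, int]:
    """Build a child -> parent lookup from a parent -> children map."""
    parent_of: Dict[int, int] = {}
    for parent, children in tree.items():
        for child in children:
            parent_of[child] = parent
    return parent_of


def path_to_root(node_id: int, parent_of: Dict[int, int]) -> List[int]:
    """Return the ids visited from node_id up to the root (inclusive)."""
    path = [node_id]
    while path[-1] in parent_of:  # the root has no parent, so the loop stops there
        path.append(parent_of[path[-1]])
    return path


parent_of = invert(hierarchy)
print(path_to_root(5, parent_of))  # [5, 102, -1]
```

Each word's probability in a tree-structured softmax is computed along exactly this path, which is why only the parent-to-children map needs to be stored.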
ramin-git
@ramin-git
Apr 22 2015 16:07
Do you have a script that generates a word tree from word frequencies?
Nicholas Léonard
@nicholas-leonard
Apr 22 2015 16:14
Nope. It took me a month with an SQL database to generate those, and it used all of the text.
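For anyone who wants a quick starting point, here is a naive Python sketch that builds a balanced tree from word frequencies. It is not the SQL-based clustering described above: it simply sorts words by descending count, assigns ids 1..N, and cuts each level into consecutive chunks of `branching` nodes, each chunk becoming the children of a new parent. The resulting clusters group words by frequency rank only, not by meaning. The function name `build_tree`, the id scheme (cluster ids after the last word id) and the root id -1 are hypothetical choices.

```python
from typing import Dict, List, Tuple


def build_tree(counts: Dict[str, int], branching: int = 10,
               root_id: int = -1) -> Tuple[Dict[str, int], Dict[int, List[int]]]:
    """Group words into a balanced tree by frequency rank (naive sketch).

    Returns (word_ids, hierarchy), where hierarchy maps each parent id to
    the list of its child ids. The last chunk at each level may be smaller
    than `branching`.
    """
    # Most frequent words get the smallest ids; ties are broken alphabetically.
    words = sorted(counts, key=lambda w: (-counts[w], w))
    word_ids = {w: i + 1 for i, w in enumerate(words)}

    hierarchy: Dict[int, List[int]] = {}
    level = list(word_ids.values())
    next_id = len(words) + 1  # cluster ids start after the last word id
    while len(level) > branching:
        parents = []
        for start in range(0, len(level), branching):
            hierarchy[next_id] = level[start:start + branching]
            parents.append(next_id)
            next_id += 1
        level = parents
    hierarchy[root_id] = level  # the top level hangs off the root
    return word_ids, hierarchy


counts = {"the": 50, "on": 20, "cat": 7, "sat": 7, "mat": 3}
word_ids, hierarchy = build_tree(counts, branching=2)
print(hierarchy[-1])  # [9, 10]
```

With `branching=2` the five words become ids 1..5, are grouped into clusters 6, 7 and 8, then into 9 and 10, which are the root's children. A better tree would cluster words by context similarity rather than by rank.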