These are chat archives for jdrudolph/goenrich

9th
Sep 2017
Alexander Lenail
@zfrenchee
Sep 09 2017 15:05
Hi Jan! I'm curious what the propagation step is for
Could you explain it to me a little?
here's my guess
""" Example graph
            r
          /   \
        c1     c2
          \   /  \
           \ /    \
            c3    c4
        """
if any gene has the GO term annotation c3, then it should also have c1 and c2 (and r)
even if that's not in the dataframe already
That makes sense, but I guess I just feel like the dataframe should be complete
in the first place
What do you think?
Jan Rudolph
@jdrudolph
Sep 09 2017 15:54
You are correct. The ontology is hierarchical, so terms need to be propagated up the DAG. If you download the annotations for GO, you get only the most specific annotations for each gene, which makes the propagation necessary.
Non-hierarchical annotations, such as KEGG, do not include a propagation step.
You could put all the information into a pd.DataFrame, which will make it quite big, but I don't see the use case for having it in a pd.DataFrame.
Propagation is straightforward in the graph. Generating a pd.DataFrame from the results shouldn't be difficult.
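
For illustration, here is a minimal sketch of the propagation idea on the example graph from the question. It is not goenrich's actual implementation: the `PARENTS` mapping, the helper names `ancestors` and `propagate`, and the genes `geneA` and `geneB` are all hypothetical, and plain dictionaries stand in for a real graph structure.

```python
from collections import defaultdict

# Hypothetical child -> parents mapping for the example graph (r is the root).
PARENTS = {
    "r": [],
    "c1": ["r"],
    "c2": ["r"],
    "c3": ["c1", "c2"],
    "c4": ["c2"],
}


def ancestors(term, parents):
    """Return the term itself plus every term reachable by following parent edges."""
    seen = set()
    stack = [term]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen


def propagate(annotations, parents):
    """Map each term to the genes annotated with it or with any of its descendants."""
    propagated = defaultdict(set)
    for gene, terms in annotations.items():
        for term in terms:
            for t in ancestors(term, parents):
                propagated[t].add(gene)
    return dict(propagated)


# Only the most specific annotations are given, as in a downloaded annotation file.
annotations = {"geneA": {"c3"}, "geneB": {"c4"}}
result = propagate(annotations, PARENTS)

for term in sorted(result):
    print(term, sorted(result[term]))
```

With these made-up inputs, geneA (annotated only with c3) also ends up under c1, c2 and r, while geneB (annotated only with c4) ends up under c2 and r but not under c1.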