Bhavay Aggarwal
@Chokerino
Hey, can I get some help getting started with this package?
I managed to set up the environment, but running the sample code produces a big stack trace ending in the error: linux-gnu.so Aborted (core dumped)
Arian Jamasb
@a-r-j
Hi @Chokerino, could you share the stack trace? If I had to guess, I expect there's a problem with your CUDA and PyTorch / PyTorch Geometric versions.
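A minimal, standard-library sketch for collecting the versions that need to agree. The package list, the module-to-distribution mapping and the helper names are illustrative assumptions. The CUDA build is only queried if PyTorch is importable.

```python
import importlib.util
from importlib import metadata

# Module name -> distribution name (assumed mapping for these packages).
PACKAGES = {"torch": "torch", "torch_geometric": "torch-geometric", "dgl": "dgl"}


def report_versions(packages=PACKAGES):
    """Return {module: version}, using None when the module is not importable."""
    report = {}
    for module, dist in packages.items():
        if importlib.util.find_spec(module) is None:
            report[module] = None
            continue
        try:
            report[module] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            # Importable, but no metadata under this distribution name
            # (e.g. DGL's CUDA wheels use names like dgl-cu102).
            report[module] = "unknown"
    return report


def torch_cuda_info():
    """Return the CUDA version PyTorch was built with, or None without PyTorch."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch
    return {"built_with_cuda": torch.version.cuda,
            "cuda_available": torch.cuda.is_available()}


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}: {version}")
    print("torch CUDA:", torch_cuda_info())
```

If `built_with_cuda` is None or differs from the toolkit you installed, that would be consistent with the mismatch suspected above.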
Bhavay Aggarwal
@Chokerino
Hey, I'm trying it again on my system and will let you know the results.
Could you tell me how you're using the other libraries, like GetContacts? Also, I am currently creating my dataset of proteins and will have at most about 1,500 proteins. Do you think such complex graphs would give good classification results compared to a CNN-type model with sequence features, or even something like ProtVec?
Bhavay Aggarwal
@Chokerino
Ah, while using PyTorch I'm getting the following error from DGL:
"ImportError: cannot import name 'BatchedDGLHeteroGraph' from 'dgl'"
Bhavay Aggarwal
@Chokerino
If I use TensorFlow, I get a big error, mainly related to:
"tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1';"
Bhavay Aggarwal
@Chokerino
@a-r-j
Arian Jamasb
@a-r-j
It looks like you don't have CUDA installed. Try reinstalling it, then make sure to install the correct CUDA builds of PyTorch, PyTorch Geometric and DGL afterwards.
Bhavay Aggarwal
@Chokerino
Hey, CUDA 10.2 is installed, and everything else also seems to be installed correctly.
Arian Jamasb
@a-r-j
@Chokerino, could you share the output of nvidia-smi and nvcc --version? This could also be a driver issue, but I'm quite confident it's unrelated to Graphein.
You should also make sure your LD_LIBRARY_PATH is set correctly.
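A standard-library sketch of the loader-side checks: it tries to open the driver library named in the TensorFlow error and lists the entries on LD_LIBRARY_PATH. The helper names are illustrative. Note that libcuda.so.1 ships with the NVIDIA driver rather than the CUDA toolkit, so a missing or broken driver also shows up here.

```python
import ctypes
import os


def ld_library_path_dirs(env=None):
    """Split LD_LIBRARY_PATH (colon-separated on Linux) into non-empty entries."""
    env = os.environ if env is None else env
    return [d for d in env.get("LD_LIBRARY_PATH", "").split(":") if d]


def can_load(library="libcuda.so.1"):
    """Return True if the dynamic loader can open the shared library."""
    try:
        ctypes.CDLL(library)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    print("libcuda.so.1 loadable:", can_load())
    for directory in ld_library_path_dirs():
        status = "exists" if os.path.isdir(directory) else "missing"
        print(f"  {directory} ({status})")
```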
Bhavay Aggarwal
@Chokerino
Yeah, I think it's unrelated to Graphein too. I think it's something to do with PyTorch3D, and I'm not able to fix it.
Bhavay Aggarwal
@Chokerino
@a-r-j Hey, I finally got it working! Is there a good way to visualise the generated graph?
Arian Jamasb
@a-r-j
Hi again @Chokerino, really glad to hear that! Hope it’s helpful. Re visualisation, I’m adding extended plotting functionality in the near future (1-2 weeks, I expect). Until then, you can construct NetworkX graphs and plot those how you normally would.
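A minimal sketch of that NetworkX workaround. The residue labels, the toy coordinates and the 8 Å contact cut-off are all hypothetical. It connects residues closer than the cut-off and draws the graph with networkx.draw, using each residue's x/y coordinates as its position.

```python
import math

import matplotlib
matplotlib.use("Agg")  # render to a file; no display needed
import matplotlib.pyplot as plt
import networkx as nx

# Toy input (hypothetical): residue id -> (x, y, z) coordinates in Å.
coords = {
    "ALA1": (0.0, 0.0, 0.0),
    "GLY2": (3.8, 0.0, 0.0),
    "SER3": (7.6, 0.5, 0.0),
    "LEU4": (11.4, 1.0, 0.0),
}
CUTOFF = 8.0  # assumed contact distance threshold in Å


def contact_graph(coords, cutoff=CUTOFF):
    """Connect every pair of residues closer than `cutoff`."""
    g = nx.Graph()
    for node, xyz in coords.items():
        g.add_node(node, coords=xyz)
    nodes = list(coords)
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            if math.dist(coords[a], coords[b]) < cutoff:
                g.add_edge(a, b)
    return g


def plot_graph(g, path="protein_graph.png"):
    """Draw the graph using each residue's x/y coordinates as its position."""
    pos = {n: data["coords"][:2] for n, data in g.nodes(data=True)}
    fig, ax = plt.subplots(figsize=(5, 4))
    nx.draw(g, pos=pos, ax=ax, with_labels=True, node_color="lightblue")
    fig.savefig(path, dpi=150)
    plt.close(fig)
    return path
```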
Bhavay Aggarwal
@Chokerino
Thanks!
Arian Jamasb
@a-r-j
Hi @Chokerino, I've added some rudimentary plotting functionality to the dev branch. Hope it's helpful!
Bhavay Aggarwal
@Chokerino
@a-r-j Hey, do you have any experience building graph classification models with PyTorch? I don't have much experience and keep running into more and more errors.
Arian Jamasb
@a-r-j
@Chokerino I have some experience. What do you need help with?
Bhavay Aggarwal
@Chokerino
@a-r-j I was trying to build classification models for the graphs I generated with Graphein. All the PyTorch Geometric tutorials require me to create a torch dataset object, but their tutorial on creating a new dataset (https://pytorch-geometric.readthedocs.io/en/latest/notes/create_dataset.html#creating-in-memory-datasets) does not show how to create attributes like batch or contains_isolated_nodes(), which are used in their tutorials on building GNNs (https://colab.research.google.com/drive/1I8a0DfQ3fI7Njc62__mVXUlcAleUclnb?usp=sharing#scrollTo=OfQmEavtvOcN).
Arian Jamasb
@a-r-j

What I've done in the past is create the PyG Data objects from a DGL graph g:

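import torch
from torch_geometric.data import Data

# g is a Graphein DGLGraph; pssm and label are per-residue arrays
# (one row per node) prepared separately.
node_features = torch.cat((g.ndata['h'],
                           g.ndata['ss'],
                           g.ndata['asa'],
                           g.ndata['rsa'],
                           torch.Tensor(pssm)), dim=1)

label = torch.Tensor(label).unsqueeze(dim=1)

src, dst = g.edges()
geom_graph = Data(x=node_features,
                  y=label,  # keep the targets out of the input features
                  edge_index=torch.stack((src, dst), dim=0),  # PyG expects shape [2, num_edges]
                  edge_attr=g.edata['rel_type'].float())  # stored as float64 in the DGL graph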

You can then put these Data objects in a list and pass it to PyTorch Geometric's DataLoader, e.g. geometric_DataLoader(train_data).

There are some examples here: https://github.com/a-r-j/graphein/blob/master/datasets/ppisp/ppisp_example.ipynb

This example may be more useful.

https://github.com/a-r-j/graphein/blob/master/datasets/pscdb/example_model_nb.ipynb

A word of warning about the examples: I wrote them quickly to illustrate a workflow rather than as a well-developed project, so there may be mistakes.
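On the batch attribute asked about earlier in the thread: as far as I understand PyTorch Geometric, you don't build it when creating the dataset. Its DataLoader adds it when collating a list of Data objects into one large disconnected graph. Below is a pure-Python sketch of that collation step. The collate helper and the toy graphs are illustrative, not PyG's implementation. Node indices in each graph's edge_index are shifted by the number of nodes before it, and batch[i] records which graph node i came from.

```python
def collate(graphs):
    """Merge graphs given as (num_nodes, (sources, targets)) into one batch.

    Edges use PyG's [2, num_edges] layout: a list of sources and a list of targets.
    Returns (merged_edge_index, batch).
    """
    sources, targets, batch = [], [], []
    offset = 0
    for graph_id, (num_nodes, (src, dst)) in enumerate(graphs):
        # Shift node indices so graphs don't overlap in the merged graph.
        sources.extend(s + offset for s in src)
        targets.extend(t + offset for t in dst)
        batch.extend([graph_id] * num_nodes)
        offset += num_nodes
    return [sources, targets], batch


# Two toy graphs: a 3-node path and a single 2-node edge.
edge_index, batch = collate([(3, ([0, 1], [1, 2])), (2, ([0], [1]))])
print(edge_index)  # [[0, 1, 3], [1, 2, 4]]
print(batch)       # [0, 0, 0, 1, 1]
```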

Bhavay Aggarwal
@Chokerino
@a-r-j Thank you so much!
Bhavay Aggarwal
@Chokerino
@a-r-j I wanted to ask something about the generated graphs. For example, this is one of the graphs that is generated: DGLGraph(num_nodes=551, num_edges=1447, ndata_schemes={'id': Scheme(shape=(), dtype=torch.int64), 'residue_name': Scheme(shape=(), dtype=torch.int64), 'h': Scheme(shape=(7,), dtype=torch.float32), 'coords': Scheme(shape=(3,), dtype=torch.float32), 'ss': Scheme(shape=(9,), dtype=torch.float32), 'asa': Scheme(shape=(1,), dtype=torch.float32), 'rsa': Scheme(shape=(1,), dtype=torch.float32)} edata_schemes={'rel_type': Scheme(shape=(17,), dtype=torch.float64), 'norm': Scheme(shape=(), dtype=torch.float32)})
Here, what does the "h" node feature represent? Also, how is the secondary structure represented, and what edge features are being used?
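Reading the shapes off that schema gives the width of the node feature matrix built by the torch.cat call earlier in this thread: 7 (h) + 9 (ss) + 1 (asa) + 1 (rsa) = 18 columns before the PSSM. The PSSM width depends on how it was computed. The 20 columns below (one per standard amino acid) are an assumption. A small sketch of that sum, useful when setting a model's input dimension:

```python
# Node feature shapes copied from the DGLGraph schema above.
NODE_SCHEMES = {"id": (), "residue_name": (), "h": (7,), "coords": (3,),
                "ss": (9,), "asa": (1,), "rsa": (1,)}
PSSM_WIDTH = 20  # assumption: one column per standard amino acid


def feature_width(schemes, keys):
    """Sum the per-node width of the selected features (scalars count as 1)."""
    width = 0
    for key in keys:
        shape = schemes[key]
        width += shape[0] if shape else 1
    return width


base = feature_width(NODE_SCHEMES, ["h", "ss", "asa", "rsa"])
print(base, base + PSSM_WIDTH)  # 18 38
```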
Bhavay Aggarwal
@Chokerino
@a-r-j Hey, I was also trying to run Graphein on a custom PDB file that only contains binding sites, but I'm running into an error that says it cannot create DSSP. Do you know anything about this?
Bhavay Aggarwal
@Chokerino
The residues are not complete.
Is there any way to bypass this?
Arian Jamasb
@a-r-j
You can set include_ss=False when initialising the ProteinGraph class. This should stop Graphein from using DSSP for featurisation.
Bhavay Aggarwal
@Chokerino
Could you also tell me what the "h" feature stands for?
Arian Jamasb
@a-r-j
The "h" features are the seven-dimensional amino acid embeddings (hence the shape (7,) in the schema) from Meiler et al. (2001).
Bhavay Aggarwal
@Chokerino
Hey @a-r-j, sorry to bother you again. I was trying this example https://github.com/a-r-j/graphein/blob/master/datasets/ppisp/ppisp_example.ipynb and I'm encountering this error when I call the loss function: ValueError: Expected input batch_size (49971) to match target batch_size (64). For some reason, the number of nodes is being taken as the batch size. Have you encountered this, or do you know how to deal with it?
Arian Jamasb
@a-r-j
Hi, I'm not sure what's going on there. Are you running the notebook exactly as is? It might be that some dependency has changed.
But I would guess that something is going on with your graph readout.
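One likely cause, if the task is graph classification, is a missing graph-level readout. In that case the model emits one prediction per node (49,971 nodes across the batch), but the loss expects one per graph (64 graphs). Pooling node outputs by the batch vector reconciles the shapes. PyTorch Geometric provides this as torch_geometric.nn.global_mean_pool(x, batch). Below is a pure-Python sketch of mean pooling with toy values. The mean_readout name is hypothetical.

```python
def mean_readout(x, batch, num_graphs=None):
    """Average node feature vectors per graph.

    x: list of per-node feature vectors; batch: graph id of each node.
    Returns one averaged vector per graph.
    """
    if num_graphs is None:
        num_graphs = max(batch) + 1 if batch else 0
    dim = len(x[0]) if x else 0
    sums = [[0.0] * dim for _ in range(num_graphs)]
    counts = [0] * num_graphs
    for features, graph_id in zip(x, batch):
        counts[graph_id] += 1
        for j, value in enumerate(features):
            sums[graph_id][j] += value
    # Graphs with no nodes keep a zero vector instead of dividing by zero.
    return [[s / c for s in row] if c else row for row, c in zip(sums, counts)]


x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [10.0, 0.0], [20.0, 2.0]]
batch = [0, 0, 0, 1, 1]
pooled = mean_readout(x, batch)
print(len(x), "->", len(pooled))  # 5 -> 2: one row per graph, matching the targets
print(pooled)                     # [[3.0, 4.0], [15.0, 1.0]]
```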
Bhavay Aggarwal
@Chokerino
I haven't run your code as is; I tried running it on my own graph files. The example that uses DGL-Sci works perfectly fine, but this one throws an error. To my understanding, both of them use the same collate function and DataLoader, so the batch issue shouldn't be there. I can't find this on any of the forums either.
Bhavay Aggarwal
@Chokerino
@a-r-j Are there any plans to upgrade to DGL 0.5.3?
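Whether code written for DGL 0.4.x survives such an upgrade largely comes down to the 0.5 API changes. In 0.5, dgl.batch returns a plain DGLGraph, and the separate batched-graph classes, such as the BatchedDGLHeteroGraph whose import fails earlier in this thread, are gone. A small version-check sketch. The helper names are hypothetical, and the 0.5 cut-off is the assumption the check relies on.

```python
import re


def major_minor(version):
    """Parse a version string such as '0.4.3.post2' into (major, minor)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def predates_dgl_0_5(version):
    """Return True for DGL releases older than 0.5."""
    return major_minor(version) < (0, 5)


if __name__ == "__main__":
    try:
        import dgl
    except ImportError:
        print("DGL is not installed")
    else:
        # Assumption: pre-0.5 releases still ship the Batched* graph classes.
        if predates_dgl_0_5(dgl.__version__):
            print(f"DGL {dgl.__version__}: pre-0.5 API")
        else:
            print(f"DGL {dgl.__version__}: 0.5+ API, dgl.batch returns a DGLGraph")
```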
Arian Jamasb
@a-r-j
Hi @Chokerino, we're doing a big rewrite at the moment to a functional API (you can check it out here: https://github.com/a-r-j/graphein/tree/graphein-api). The core internal data representations are NetworkX graphs and pandas dataframes. We'll have conversion functions to get them into DGL, which will make supporting future versions easier. If you'd like to contribute, we'd be really happy to provide support.
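A sketch of what such a conversion boils down to, assuming nothing about Graphein's eventual API. NetworkX node labels (here hypothetical residue ids) are mapped to contiguous integers. Edges become the source/destination index lists that both DGL (dgl.graph((src, dst))) and PyTorch Geometric (edge_index) consume. For an undirected NetworkX graph, each edge is emitted in both directions.

```python
import networkx as nx


def to_index_lists(g):
    """Return (node_order, src, dst) with nodes relabelled 0..n-1."""
    node_order = list(g.nodes())
    index = {node: i for i, node in enumerate(node_order)}
    src, dst = [], []
    for u, v in g.edges():
        src.append(index[u])
        dst.append(index[v])
        if not g.is_directed():
            # DGL and PyG store directed edges, so add the reverse direction too.
            src.append(index[v])
            dst.append(index[u])
    return node_order, src, dst


g = nx.Graph()
g.add_edges_from([("A:ALA:1", "A:GLY:2"), ("A:GLY:2", "A:SER:3")])
nodes, src, dst = to_index_lists(g)
print(nodes)     # ['A:ALA:1', 'A:GLY:2', 'A:SER:3']
print(src, dst)  # [0, 1, 1, 2] [1, 0, 2, 1]
```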
Bhavay Aggarwal
@Chokerino
@a-r-j I would love to, but I have many projects going on right now, as well as an internship.