What I've done in the past is to create the PyG `Data` objects from a DGL graph `g`:

```python
import torch
from torch_geometric.data import Data

# Concatenate the per-node feature tensors stored on the DGL graph
node_features = torch.cat((g.ndata['h'],
                           g.ndata['ss'],
                           g.ndata['asa'],
                           g.ndata['rsa'],
                           torch.Tensor(pssm)), dim=1)
label = torch.Tensor(label).unsqueeze(dim=1)

# PyG expects edge_index with shape [2, num_edges], so the (src, dst)
# tensors returned by g.edges() are stacked along dim=0
geom_graph = Data(x=torch.cat((node_features, label), dim=1),
                  edge_index=torch.stack(g.edges(), dim=0),
                  edge_attr=g.edata['rel_type'])
```
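One detail worth checking here: PyG's `edge_index` convention is a tensor of shape `[2, num_edges]`, so the pair of node-id tensors that DGL's `g.edges()` returns should be stacked along `dim=0`; stacking along `dim=1` yields `[num_edges, 2]` instead. A quick sanity check with plain `torch` (toy `src`/`dst` tensors, not from the original post):

```python
import torch

# Toy source/destination node ids, shaped like the
# (src, dst) tuple that dgl.DGLGraph.edges() returns
src = torch.tensor([0, 1, 2])
dst = torch.tensor([1, 2, 0])

# PyG convention: edge_index has shape [2, num_edges]
edge_index = torch.stack((src, dst), dim=0)
print(edge_index.shape)  # torch.Size([2, 3])
```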
You can then take a list of these `Data` objects and pass it to PyG's `DataLoader` (from `torch_geometric.data`), e.g. `DataLoader(train_data)`.
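Under the hood, PyG's dataloader collates a batch of `Data` objects into one large disconnected graph: node features are concatenated, and each graph's edge indices are shifted by the running node count of the graphs before it. A minimal sketch of that collation in plain `torch` (toy graphs, invented for illustration):

```python
import torch

# Two toy graphs as (node_features, edge_index) pairs
g1 = (torch.randn(3, 4), torch.tensor([[0, 1], [1, 2]]))
g2 = (torch.randn(2, 4), torch.tensor([[0], [1]]))

# Collate into one disconnected graph: concatenate node features
# and offset edge indices by the preceding graphs' node counts
xs, eis, offset = [], [], 0
for x, ei in (g1, g2):
    xs.append(x)
    eis.append(ei + offset)
    offset += x.size(0)

batch_x = torch.cat(xs, dim=0)    # shape [5, 4]
batch_ei = torch.cat(eis, dim=1)  # shape [2, 3]
```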
There are some examples here: https://github.com/a-r-j/graphein/blob/master/datasets/ppisp/ppisp_example.ipynb
This example may be more useful:
https://github.com/a-r-j/graphein/blob/master/datasets/pscdb/example_model_nb.ipynb
A word of warning for the examples, I wrote them quickly to illustrate a workflow rather than a well-developed project. There may be mistakes.
```python
DGLGraph(num_nodes=551, num_edges=1447,
         ndata_schemes={'id': Scheme(shape=(), dtype=torch.int64), 'residue_name': Scheme(shape=(), dtype=torch.int64), 'h': Scheme(shape=(7,), dtype=torch.float32), 'coords': Scheme(shape=(3,), dtype=torch.float32), 'ss': Scheme(shape=(9,), dtype=torch.float32), 'asa': Scheme(shape=(1,), dtype=torch.float32), 'rsa': Scheme(shape=(1,), dtype=torch.float32)},
         edata_schemes={'rel_type': Scheme(shape=(17,), dtype=torch.float64), 'norm': Scheme(shape=(), dtype=torch.float32)})
```

This graph comes from the `ProteinGraph` class.
```
ValueError: Expected input batch_size (49971) to match target batch_size (64).
```

For some reason, the total number of nodes in the batch is being used as the batch size rather than the number of graphs. Have you encountered this, or do you know how to deal with it?
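This mismatch typically arises when the model emits one output per node (all 49971 nodes in the collated batch) while the loss receives one label per graph (the batch of 64). One common fix is to pool node outputs per graph before computing the loss. A sketch with made-up sizes in plain `torch`, using the per-node graph assignment vector that PyG exposes as `Batch.batch`:

```python
import torch

# Hypothetical batch of 4 graphs with varying node counts
num_graphs = 4
nodes_per_graph = torch.tensor([3, 5, 2, 4])

# batch[i] = index of the graph that node i belongs to,
# analogous to PyG's Batch.batch attribute
batch = torch.repeat_interleave(torch.arange(num_graphs), nodes_per_graph)

# Per-node model output over the whole collated batch (14 nodes, 8 dims)
node_out = torch.randn(int(nodes_per_graph.sum()), 8)

# Mean-pool node outputs per graph so predictions align with
# the per-graph labels: sum by graph id, then divide by node counts
pooled = torch.zeros(num_graphs, 8).index_add_(0, batch, node_out)
pooled = pooled / nodes_per_graph.unsqueeze(1)
print(pooled.shape)  # torch.Size([4, 8])
```

With this, `pooled` has one row per graph and can be compared against a `[64]`-sized target in the real setting.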