    Omid Tarkhaneh
    @OmidTarkhaneh
    Can anyone say what advantages DeepChem has? Does it only do vectorizing? Why not use PyTorch or Keras for that? Which problems does DeepChem actually solve?
    Bharath Ramsundar
    @rbharath
    @elemets Was the peptide represented as a SMILES string for input?
    And how large were the peptides in question?
    Omid Tarkhaneh
    @OmidTarkhaneh
    Hello everybody. Could you please recommend some resources on graph neural networks that are useful for deep learning in molecular and chemical sciences? Many thanks.
    Bharath Ramsundar
    @rbharath
    @OmidTarkhaneh I'd recommend checking out the deepchem tutorials (see tutorials link on deepchem.io)
    Omid Tarkhaneh
    @OmidTarkhaneh
    @rbharath Thank you so much.
    Sahar RZ
    @SaharRohaniZ
    Hi DeepChem team - I am working with ConvMolFeaturizer and I'd like to know how the feature matrix is generated. I used this featurizer to featurize a molecule (loaded from a PDB file) that has 155 atoms, and the feature matrix has shape [95,75]. I want to understand why only 95 atoms were featurized. Any help would be appreciated.
    Bharath Ramsundar
    @rbharath
    @SaharRohaniZ My guess is that the hydrogens were dropped, but I'd have to dig into the source code to know for sure
    The featurization is done by internal methods. Probably fastest to take a look at the ConvMolFeaturizer source directly since the featurization methods aren't part of DeepChem's public API
    Sahar RZ
    @SaharRohaniZ
    @rbharath I did the math and you are right. Thanks for your help.
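The arithmetic behind that check can be sketched in a few lines. This is a rough illustration, not DeepChem's featurization code: it assumes a standard fixed-column PDB file in which the element symbol sits in columns 77-78, and `count_atoms` is a hypothetical helper name. If hydrogens are what got dropped, the heavy-atom count should equal the number of rows in the feature matrix (95 of 155 atoms in the case above, which would mean 60 hydrogens).

```python
def count_atoms(pdb_lines):
    """Return (total_atoms, heavy_atoms) for an iterable of PDB lines.

    Assumes fixed-column PDB records with the element symbol in
    columns 77-78. Records with a blank element field are counted
    as heavy atoms, which may overcount for non-standard files.
    """
    total = 0
    hydrogens = 0
    for line in pdb_lines:
        # Only ATOM and HETATM records describe atoms.
        if not line.startswith(("ATOM", "HETATM")):
            continue
        total += 1
        element = line[76:78].strip().upper()
        if element == "H":
            hydrogens += 1
    return total, total - hydrogens


# Hypothetical usage (the file name is a placeholder):
#   with open("molecule.pdb") as handle:
#       total, heavy = count_atoms(handle)
#   print(total, heavy)
```

Comparing `heavy` with the first dimension of the featurizer output is a quick way to test the dropped-hydrogens guess without reading the featurizer source.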
    Arthur Funnell
    @elemets
    @rbharath Hey, yes, they were represented as SMILES strings. The peptide lengths vary, but they are pretty big
    Bharath Ramsundar
    @rbharath
    @elemets It's possible the graphconv is just struggling with lengths. If you're interested, it would actually be very useful if you could contribute a small peptide benchmark dataset for us. None of DeepChem's sample datasets use peptides so we've never benchmarked for that use case
    Omid Tarkhaneh
    @OmidTarkhaneh
    Hello, pytorch-geometric does not install for me in Google Colab; I am using PyTorch version 1.9.0+cu102. I would appreciate any help. Thanks. The error message is as follows: Detected that PyTorch and torch_sparse were compiled with different CUDA versions. PyTorch has CUDA version 10.2 and torch_sparse has CUDA version 11.0. Please reinstall the torch_sparse that matches your PyTorch install.
    I tried to install torch_sparse with cu102 but it does not work for me
    Omid Tarkhaneh
    @OmidTarkhaneh
    These are the installation commands that I used:
    import torch
    !pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.9.0+cu102.html
    !pip install torch-sparse -f https://pytorch-geometric.com/whl/torch-1.9.0+cu102.html
    !pip install torch-cluster -f https://pytorch-geometric.com/whl/torch-1.9.0+cu102.html
    !pip install torch-spline-conv -f https://pytorch-geometric.com/whl/torch-1.9.0+cu102.html
    !pip install torch-geometric
    Vignesh Venkataraman
    @VIGNESHinZONE
    There seems to be an error in the docs; DeepChem uses CUDA 11.0. Also, which version of DeepChem are you using?
    Omid Tarkhaneh
    @OmidTarkhaneh
    @VIGNESHinZONE No, this is not related to DeepChem. Here I just tried to install pytorch-geometric in Google Colab.
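One way to rule out a mismatch between the wheel index and the installed PyTorch build is to derive the index URL from the PyTorch version itself instead of typing it by hand. The sketch below is only an illustration: `pyg_wheel_index` is a hypothetical helper, the URL pattern is copied from the commands above, and the `cpu` tag for CPU-only builds is an assumption that it follows the same pattern.

```python
def pyg_wheel_index(torch_version, cuda_version):
    """Build a wheel index URL matching a given PyTorch build.

    torch_version: e.g. "1.9.0+cu102" (torch.__version__) or "1.9.0"
    cuda_version:  e.g. "10.2" (torch.version.cuda), or None for CPU-only
    """
    base = torch_version.split("+")[0]  # drop any "+cu102"-style suffix
    if cuda_version is None:
        tag = "cpu"  # assumption: CPU wheels use the same URL pattern
    else:
        tag = "cu" + cuda_version.replace(".", "")
    return f"https://pytorch-geometric.com/whl/torch-{base}+{tag}.html"


# Hypothetical notebook usage:
#   import torch
#   print(pyg_wheel_index(torch.__version__, torch.version.cuda))
```

If the printed URL already matches the one used in the install commands, the mismatch more likely comes from a previously installed torch_sparse build than from the index URL.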
    Hannes Stärk
    @HannesStark

    Hello!
    Is there someone who is familiar with the BACE dataset from MoleculeNet?
    I was using the BACE dataset through SNAP Stanford's OGBG library but needed access to the SMILES representations of the molecules, so I downloaded the BACE dataset directly from http://moleculenet.ai/datasets-1
    However, the annotations for the scaffold split in the CSV somewhat confuse me:
    There is the "Model" annotation, which gives the following number of molecules for each split:

    Train: 203
    Valid: 45
    Test: 1265

    This seems strange to me, especially considering the split in the OGBG library:

    Train: 1210
    Valid: 151
    Test: 152

    Is there something I am missing? Many thanks for any help!

    Bharath Ramsundar
    @rbharath
    @HannesStark Sorry for the slow response! I saw your email but didn't have a chance to respond.
    This is correct as structured (I was the author on the BACE paper who added the dataset into moleculenet)
    I think the OGB folks have restructured the dataset somehow but I'm not familiar with what they've done
    For this paper, the reason the train/valid/test split was asymmetrical was that we were trying to understand the effects of having a small amount of training data, as is standard in drug discovery settings
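The per-split counts quoted above can be reproduced by tallying the "Model" column of the CSV. The following is a minimal sketch using only the standard library; `split_sizes` is a hypothetical helper and the file name in the usage comment is a placeholder. Note that both sets of counts quoted above sum to 1513 molecules, so the two sources differ in how the molecules are assigned to splits, not in the total.

```python
import csv
from collections import Counter


def split_sizes(csv_file, column="Model"):
    """Count rows per split label in an open CSV file.

    `column` is the name of the annotation column; "Model" is the
    name mentioned in the thread for the BACE CSV.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


# Hypothetical usage (the file name is a placeholder):
#   with open("bace.csv", newline="") as handle:
#       print(split_sizes(handle))
```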
    Hannes Stärk
    @HannesStark
    @rbharath Thank you very much!
    I would like to add a question: what do the split indices in http://deepchem.io.s3-website-us-west-1.amazonaws.com/trained_models/Hyperparameter_MoleculeNetv3.tar.gz (for instance, those in bace_cscaffold123.pkl) refer to? When I use them, I end up with a split of these sizes:
    Train: 1208
    Valid: 153
    Test: 152
    y6q9
    @yuanqidu
    Hi! I have a question about two molecule featurizers, RDKitDescriptors and MordredDescriptors. Is there any correspondence between the returned numpy feature array and real-world descriptors? I checked the Mordred website, which lists 1826 descriptors in total, while the returned array has 1613 when the ignore-3D parameter is set to True. Is there any way to recover the name of the real-world descriptor behind each feature?
    Bharath Ramsundar
    @rbharath
    @HannesStark I believe the three datasets were combined and then split using an 80/10/10 scaffold split
    @yuanqidu Hmm, I'd recommend just checking the source. I think we just called the mordred API directly
    So I don't think we know the names and would have to look at the mordred docs to figure those out
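As a rough sketch of what that lookup might look like, the Mordred library exposes its descriptor objects on the calculator, and the string form of each object is its name. The code below assumes, without confirmation from the DeepChem source, that the featurizer's columns follow the same order as the calculator's descriptor list; that assumption should be checked before relying on the names. `name_columns` and `mordred_descriptor_names` are hypothetical helper names.

```python
def name_columns(descriptors, row):
    """Pair descriptor names with the values in one feature row.

    `descriptors` is any sequence of objects whose str() is the
    descriptor name. Raises ValueError if the lengths differ, which
    would indicate that the column-order assumption does not hold.
    """
    names = [str(d) for d in descriptors]
    if len(names) != len(row):
        raise ValueError(
            f"{len(names)} descriptor names but {len(row)} feature values"
        )
    return dict(zip(names, row))


def mordred_descriptor_names(ignore_3D=True):
    """List Mordred descriptor names (requires the mordred package)."""
    # Imported lazily so name_columns works without mordred installed.
    from mordred import Calculator, descriptors

    calc = Calculator(descriptors, ignore_3D=ignore_3D)
    return [str(d) for d in calc.descriptors]
```

A length mismatch between the name list and the feature row is a useful early warning that the two do not line up one to one.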
    y6q9
    @yuanqidu
    @rbharath Oh, got it! Thanks!
    Omid Tarkhaneh
    @OmidTarkhaneh
    Hello everybody. While featurizing with RDKit, I received this error, and I do not know how to fix my SMILES dataset. Any suggestions? The error is below:
    RDKit ERROR: [03:13:24] Explicit valence for atom # 4 N, 4, is greater than permitted
    stanleydrift
    @stanleydrift
    Can you share the SMILES that are causing trouble?
    Omid Tarkhaneh
    @OmidTarkhaneh
    @stanleydrift The dataset is huge, and I do not actually know which SMILES caused this problem or how to fix it.
    Bharath Ramsundar
    @rbharath
    @OmidTarkhaneh Sometimes large datasets have weirdly formatted molecules (valence of 4 on a nitrogen doesn't usually make chemical sense)
    It's probably fine in that it tried to do something reasonable on that edge case
    You can go try to manually fix the molecular structure if you understand the system chemistry
    Omid Tarkhaneh
    @OmidTarkhaneh
    @rbharath Thanks a lot, Dr. Ramsundar. This is one example of a SMILES that caused this problem: C1SCC[NH2][C@@H]1[C]([O])=O. I may have to manually fix the dataset.
    @stanleydrift Hi. This is one example of a SMILES that caused the problem: C1SCC[NH2][C@@H]1[C]([O])=O.
    Omid Tarkhaneh
    @OmidTarkhaneh
    @rbharath Actually, my major is computer science, but I am doing a deep learning project in chemistry. Could you please recommend a basic chemistry book? I do not even know what valence means. Thanks a lot, Dr. Ramsundar
    Bharath Ramsundar
    @rbharath
    Valence here means just the number of bonds
    Nitrogen can't usually form 4 bonds (for example, ammonia is NH3)
    I think in the structure you've linked above there are two hydrogens and two carbons bonded to the nitrogen
    Which wouldn't quite make chemical sense. I'm not an expert chemist, so I'm not sure how I would fix that molecule and might just leave it out were I doing the analysis
    For reading material, I'd recommend wikipedia as a great place to start
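For a large dataset, the offending entries can be located and set aside programmatically instead of by hand. The sketch below is one possible approach, not something prescribed in the thread: `split_valid_invalid` and `rdkit_parse` are hypothetical helper names, and the approach relies on RDKit's `Chem.MolFromSmiles` returning `None` when it cannot build a molecule from a SMILES string.

```python
def split_valid_invalid(smiles_list, parse):
    """Separate SMILES strings that parse from those that do not.

    `parse` is any callable that returns None for an unparseable
    SMILES string. Returns (valid, invalid), where invalid holds
    (index, smiles) pairs so the rows can be found in the source data.
    """
    valid, invalid = [], []
    for index, smiles in enumerate(smiles_list):
        if parse(smiles) is None:
            invalid.append((index, smiles))
        else:
            valid.append(smiles)
    return valid, invalid


def rdkit_parse(smiles):
    """Parse with RDKit (requires the rdkit package)."""
    # Imported lazily so split_valid_invalid has no hard dependency.
    from rdkit import Chem

    return Chem.MolFromSmiles(smiles)


# Hypothetical usage:
#   valid, invalid = split_valid_invalid(all_smiles, rdkit_parse)
#   print(len(invalid), "entries could not be parsed")
```

Keeping the index alongside each rejected string makes it possible to review the dropped molecules later, or to repair them manually if the chemistry is understood.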
    Omid Tarkhaneh
    @OmidTarkhaneh
    @rbharath Thanks a billion. I have also read your DeepChem book, which was a great source and paved the way for me in this topic. I will start with Wikipedia, as you suggested.
    Pantelispanka
    @Pantelispanka
    Hello! I have an issue using Tensorflow in the docker image with a GPU
    I get the error
    2021-07-17 17:52:58.377296: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
    avinash90000
    @avinash90000
    I need to use deepchem-1.2.0. What Python version do I need to use?
    Bharath Ramsundar
    @rbharath
    @avinash90000 I'd recommend against deepchem-1.2.0 if possible; it's really old! Can you use a newer DeepChem?
    @Pantelispanka This happens sometimes on CUDA installs if the installation is a bit off
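A quick way to confirm whether the CUDA runtime is visible inside the container is to try loading the library named in the warning. This is a small diagnostic sketch using only the standard library, not a fix: `can_load` and `report` are hypothetical helper names, and the default library name is taken from the warning message above.

```python
import ctypes
import os


def can_load(library_name):
    """Return True if the dynamic loader can open the named library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


def report(library_name="libcudart.so.11.0"):
    """Summarize whether the library loads and which search path is set."""
    search_path = os.environ.get("LD_LIBRARY_PATH", "")
    status = "loadable" if can_load(library_name) else "NOT loadable"
    return f"{library_name}: {status} (LD_LIBRARY_PATH={search_path!r})"


# Hypothetical usage inside the container:
#   print(report())
```

If the library is reported as not loadable, the CUDA runtime is either missing from the image or installed in a directory that is not on `LD_LIBRARY_PATH`.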