    alat-rights
    @alat-rights

    @Tonylac77 dataset.itersamples() might be what you’re looking for.

    For more info: https://deepchem.readthedocs.io/en/latest/api_reference/data.html#deepchem.data.DiskDataset
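    For reference, a minimal sketch of that iteration (assuming dataset is the DiskDataset in question):

    # itersamples() streams one (X, y, w, id) tuple per sample without
    # loading the whole DiskDataset into memory
    for X, y, w, sample_id in dataset.itersamples():
        print(sample_id, y)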

    Bharath Ramsundar
    @rbharath
    @Tonylac77 Can you share your code? I've definitely done something similar to what you're describing so I'm not sure what the difference is
    su-chao
    @su-chao
    @Tonylac77 In my view, if you want to use a DeepChem model, you should use one of the load functions to load your dataset. The deepchem.molnet.load_function.molnet_loader module has a good explanation and example.
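    As a minimal sketch of that pattern (load_delaney is used purely as an illustrative loader; swap in whichever MoleculeNet load function matches your task):

    import deepchem as dc

    # each MolNet load function returns the task names, the split datasets,
    # and any transformers that were applied during loading
    tasks, datasets, transformers = dc.molnet.load_delaney(
        featurizer='ECFP', splitter='random')
    train, valid, test = datasets
    print(tasks, len(train), len(valid), len(test))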
    davidRFB
    @davidRFB
    Hi, I am trying to run some local tests for the inclusion of SwissProt into MolNet. However, I am getting this error in the pytest execution:
    [screenshot of the pytest error]
    I already added the function load_swissprot in swissprot_datasets.py
    I already added the line from deepchem.molnet.load_function.swissprot_datasets import load_swissprot
    in the __init__.py file of molnet
    I don't know if anyone can help me
    davidRFB
    @davidRFB
    Thank you very much
    Bharath Ramsundar
    @rbharath
    Oh this is weird
    Can you run other test files normally?
    su-chao
    @su-chao
    [screenshot of the error]
    Can someone help with the problem that some of my small molecules are too large? For example:
    CC(C)c1ccc(cc1)NC(=O)O[C@@H]1CO[C@H]2[C@H](CO[C@@H]12)NC(=O)Nc1cccc(C(F)(F)F)c1
    My featurizer: dc.feat.CircularFingerprint()
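    For what it's worth, CircularFingerprint always produces a fixed-length bit vector, so even large molecules featurize to the same width; a minimal sketch (radius 2 and 2048 bits are illustrative settings, not the only choice):

    import deepchem as dc

    smiles = "CC(C)c1ccc(cc1)NC(=O)O[C@@H]1CO[C@H]2[C@H](CO[C@@H]12)NC(=O)Nc1cccc(C(F)(F)F)c1"
    # the fingerprint length is fixed by the size argument, independent of molecule size
    feat = dc.feat.CircularFingerprint(radius=2, size=2048)
    fp = feat.featurize([smiles])
    print(fp.shape)  # (1, 2048)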
    su-chao
    @su-chao
    Hi, can anyone tell me why the output predict.shape is always (#, #, 2), i.e. 3D,
    and why predict[:,:,0] and predict[:,:,1] hold different values? Which one is the correct predicted value?
    su-chao
    @su-chao
    It's very confusing
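    For context: if this is a classification model, the trailing axis of size 2 most likely holds the per-class probabilities, so predict[:, :, 1] is the probability of the positive class. A hedged sketch, assuming model and test_dataset are the (hypothetical) classifier and dataset in question:

    import numpy as np

    probs = model.predict(test_dataset)      # shape (n_samples, n_tasks, 2)
    p_positive = probs[:, :, 1]              # probability of class 1 for each task
    hard_preds = np.argmax(probs, axis=-1)   # hard 0/1 predictions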
    davidRFB
    @davidRFB
    Hi @rbharath, thank you for the answer. The same error appears with other tests.
    [screenshot of the pytest error]
    I am running deepchem 2.6.0dev inside the deepchem directory that I forked from GitHub. Maybe an environment reset could work?
    Tonylac77
    @Tonylac77

    Dear all, sorry for the late answer. It seems we have fixed the iteration over DiskDatasets. In this case we use a for loop that splits the tuples generated by the k-fold split function and then writes them out as two CSV files (we want to do this for use in other machine learning software).

    # k_fold_split returns a list of (train, cv) dataset pairs, one per fold
    k = splitter.k_fold_split(dataset=dataset, k=10)

    for a, (train_ds, cv_ds) in enumerate(k, start=1):
        # keep only the molecule IDs and write each fold out to its own CSV
        train_ids = train_ds.to_dataframe()['ids']
        cv_ids = cv_ds.to_dataframe()['ids']
        train_ids.to_csv("k" + str(a) + "_train.csv")
        cv_ids.to_csv("k" + str(a) + "_cv.csv")

    This works fine when loading our dataset from CSV (with the CSVLoader function) without an ID field. However, if we try to use a dataset with ChEMBL IDs (in this case) we get the following RDKit error when performing the k-fold split (see below). Would love any input on this!

    ArgumentError                             Traceback (most recent call last)
    <ipython-input-8-604eaa868421> in <module>
          1 # split dataset
    ----> 2 k = splitter.k_fold_split(dataset=dataset, k=10)
    
    C:\Anaconda3\envs\deepchem38\lib\site-packages\deepchem\splits\splitters.py in k_fold_split(self, dataset, k, directories, **kwargs)
         84       frac_fold = 1. / (k - fold)
         85       train_dir, cv_dir = directories[2 * fold], directories[2 * fold + 1]
    ---> 86       fold_inds, rem_inds, _ = self.split(
         87           rem_dataset,
         88           frac_train=frac_fold,
    
    C:\Anaconda3\envs\deepchem38\lib\site-packages\deepchem\splits\splitters.py in split(self, dataset, frac_train, frac_valid, frac_test, seed, log_every_n)
       1107     for ind, smiles in enumerate(dataset.ids):
       1108       mols.append(Chem.MolFromSmiles(smiles))
    -> 1109     fps = [AllChem.GetMorganFingerprintAsBitVect(x, 2, 1024) for x in mols]
       1110 
       1111     # calcaulate scaffold sets
    
    C:\Anaconda3\envs\deepchem38\lib\site-packages\deepchem\splits\splitters.py in <listcomp>(.0)
       1107     for ind, smiles in enumerate(dataset.ids):
       1108       mols.append(Chem.MolFromSmiles(smiles))
    -> 1109     fps = [AllChem.GetMorganFingerprintAsBitVect(x, 2, 1024) for x in mols]
       1110 
       1111     # calcaulate scaffold sets
    
    ArgumentError: Python argument types in
        rdkit.Chem.rdMolDescriptors.GetMorganFingerprintAsBitVect(NoneType, int, int)
    did not match C++ signature:
        GetMorganFingerprintAsBitVect(class RDKit::ROMol mol, unsigned int radius, unsigned int nBits=2048, class boost::python::api::object invariants=[], class boost::python::api::object fromAtoms=[], bool useChirality=False, bool useBondTypes=True, bool useFeatures=False, class boost::python::api::object bitInfo=None, bool includeRedundantEnvironments=False)
    Vignesh Ram Somnath
    @vsomnath
    This feels like a molecule / SMILES is invalid.
    Bharath Ramsundar
    @rbharath
    @davidRFB Looks like an error in your environment setup? We have some issues with installing from head right now (because of the deepchem 2.6.0 delay and TensorFlow) so perhaps try installing the environment manually?
    @Tonylac77 Seconding @vsomnath. So basically RDKit couldn't load the SMILES you provided into a Mol type and returned None instead, which is triggering a downstream error. Perhaps check if you have an invalid SMILES.
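    A minimal sketch of that check. Note from the traceback that the scaffold splitter calls Chem.MolFromSmiles on dataset.ids, so if the IDs column holds ChEMBL identifiers rather than SMILES strings, every entry will come back as None:

    from rdkit import Chem

    # list every dataset ID that RDKit cannot parse as a SMILES string
    bad = [s for s in dataset.ids if Chem.MolFromSmiles(str(s)) is None]
    print(len(bad), "unparseable entries; first few:", bad[:5])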
    Antonio Parra
    @parrasevilla91_twitter
    Hi there! Can anyone recommend the most suitable training model for predicting the activity of a molecule, based on a MoleculeNet dataset?
    Bharath Ramsundar
    @rbharath
    @parrasevilla91_twitter dc.models.GraphConvModel is probably a good place to start
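    A minimal sketch of that starting point (Tox21 is chosen purely as an illustrative MoleculeNet dataset; adjust the loader, mode, and metric to your own task):

    import deepchem as dc

    # load a MoleculeNet dataset with graph-convolution featurization
    tasks, datasets, transformers = dc.molnet.load_tox21(featurizer='GraphConv')
    train, valid, test = datasets

    # multitask classification model over the dataset's tasks
    model = dc.models.GraphConvModel(n_tasks=len(tasks), mode='classification')
    model.fit(train, nb_epoch=10)

    metric = dc.metrics.Metric(dc.metrics.roc_auc_score)
    print(model.evaluate(valid, [metric], transformers))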
    alat-rights
    @alat-rights
    I was wondering if we should do anything about the deprecation warning about imp raised by a number of DeepChem unit tests?
    [screenshot of the deprecation warning]
    I’m not sure if I’ve brought it up before so sorry if I have
    Antonio Parra
    @parrasevilla91_twitter

    @parrasevilla91_twitter dc.models.GraphConvModel is probably a good place to start

    thank you so much @rbharath

    Bharath Ramsundar
    @rbharath
    @alat-rights We should definitely fix these deprecation warnings. They seem to primarily be coming from TensorFlow though, so maybe they will be lessened as we migrate to PyTorch over time
    alat-rights
    @alat-rights
    Sounds good!
    Arthur Funnell
    @elemets
    I've been training a GraphConv model on peptide data but can't seem to get it predicting decently; the R-squared is -0.02. When I used a DAG model I trained for a lot less time and still got an R-squared of around 0.6. Would anyone have a good suggestion as to why, and what should I try next?
    Omid Tarkhaneh
    @OmidTarkhaneh
    Can anyone say what merits DeepChem has? Does it only do vectorizing? Why not use PyTorch or Keras directly? Which problems does DeepChem actually solve?
    Bharath Ramsundar
    @rbharath
    @elemets Was the peptide represented as a SMILES string for input?
    And how large were the peptides in question?
    Omid Tarkhaneh
    @OmidTarkhaneh
    Hello everybody. Could you please recommend some resources on graph neural networks useful for deep learning in the molecular and chemical sciences? Many thanks.
    Bharath Ramsundar
    @rbharath
    @OmidTarkhaneh I'd recommend checking out the deepchem tutorials (see tutorials link on deepchem.io)
    Omid Tarkhaneh
    @OmidTarkhaneh
    @rbharath Thank you so much.
    Sahar RZ
    @SaharRohaniZ
    Hi DeepChem team - I am working with ConvMolFeaturizer and I'd like to know how the feature matrix is generated. I used this featurizer to featurize a molecule (fed through a PDB file) that has 155 atoms, and the feature matrix has shape [95, 75]. I want to understand why only 95 atoms were featurized. Any help would be appreciated.
    Bharath Ramsundar
    @rbharath
    @SaharRohaniZ My guess is hydrogens were dropped likely but I'd have to dig into the source code to know for sure
    The featurization is done by internal methods. Probably fastest to take a look at the ConvMolFeaturizer source directly since the featurization methods aren't part of DeepChem's public API
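    A quick way to test that guess (a sketch only; my_structure.pdb stands in for the actual PDB file):

    from rdkit import Chem

    # compare the total atom count with the heavy-atom (non-hydrogen) count;
    # if the latter is 95, hydrogens were indeed dropped during featurization
    mol = Chem.MolFromPDBFile('my_structure.pdb', removeHs=False)
    print("all atoms:", mol.GetNumAtoms())
    print("heavy atoms:", Chem.RemoveHs(mol).GetNumAtoms())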
    Sahar RZ
    @SaharRohaniZ
    @rbharath I did the math and you are right. Thanks for your help.
    Arthur Funnell
    @elemets
    @rbharath Hey, yes, it was represented by SMILES strings, and the peptide lengths vary but they are pretty big
    Bharath Ramsundar
    @rbharath
    @elemets It's possible the graphconv is just struggling with lengths. If you're interested, it would actually be very useful if you could contribute a small peptide benchmark dataset for us. None of DeepChem's sample datasets use peptides so we've never benchmarked for that use case
    Omid Tarkhaneh
    @OmidTarkhaneh
    Hello, pytorch-geometric will not install for me in Google Colab; I am using PyTorch version 1.9.0+cu102. I'd appreciate any help, thanks. The error message is as follows: Detected that PyTorch and torch_sparse were compiled with different CUDA versions. PyTorch has CUDA version 10.2 and torch_sparse has CUDA version 11.0. Please reinstall the torch_sparse that matches your PyTorch install.
    I tried to install torch_sparse with cu102 but it does not work for me
    Omid Tarkhaneh
    @OmidTarkhaneh
    These are the installation commands that I have used:
    import torch
    !pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.9.0+cu102.html
    !pip install torch-sparse -f https://pytorch-geometric.com/whl/torch-1.9.0+cu102.html
    !pip install torch-cluster -f https://pytorch-geometric.com/whl/torch-1.9.0+cu102.html
    !pip install torch-spline-conv -f https://pytorch-geometric.com/whl/torch-1.9.0+cu102.html
    !pip install torch-geometric
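    One hedged sanity check before reinstalling: confirm which CUDA build the Colab runtime's torch actually reports, then force-reinstall the extension wheels from the matching index (a stale cu110 wheel left over from an earlier install would produce exactly this mismatch):

    import torch

    # both strings must match the wheel index, e.g. torch-1.9.0+cu102
    print(torch.__version__)   # e.g. 1.9.0+cu102
    print(torch.version.cuda)  # e.g. 10.2

    # then, in a shell cell, force a clean reinstall of the mismatched package:
    # !pip install --force-reinstall torch-sparse -f https://pytorch-geometric.com/whl/torch-1.9.0+cu102.html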
    Vignesh Venkataraman
    @VIGNESHinZONE
    There seems to be an error in the docs; DeepChem uses CUDA 11.0. Also, which version of DeepChem are you using?
    Omid Tarkhaneh
    @OmidTarkhaneh
    @VIGNESHinZONE No, this is not related to DeepChem. Here I just tried to install PyTorch Geometric in Google Colab.
    Hannes Stärk
    @HannesStark

    Hello!
    Is there someone who is familiar with the BACE dataset from MoleculeNet?
    I was using the BACE dataset through SNAP Stanford's OGBG library but needed access to the smiles representations of the molecules and downloaded the BACE dataset directly from http://moleculenet.ai/datasets-1
    However, the annotations for the scaffold split in the CSV somewhat confuse me:
    There is the "Model" annotation, which gives the following number of molecules for each split:

    Train: 203
    Valid: 45
    Test: 1265

    This seems strange to me, especially considering the split in the OGBG library:

    Train: 1210
    Valid: 151
    Test: 152

    Is there something I am missing? Many thanks for any help!
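    For comparison, a minimal sketch of loading BACE through DeepChem's own loader with a scaffold split (load_bace_classification and its default 80/10/10 split fractions are assumed here); with the 1,513 molecules the CSV's own counts add up to (203 + 45 + 1265), an 80/10/10 split gives sizes close to the OGB numbers rather than the "Model" column:

    import deepchem as dc

    # load BACE with a scaffold split and inspect the resulting split sizes
    tasks, (train, valid, test), transformers = dc.molnet.load_bace_classification(
        featurizer='ECFP', splitter='scaffold')
    print(len(train), len(valid), len(test))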