    9H-Fluorene (Kasitinard M.)
    @ninehfluorene:matrix.org
    [m]
    again thanks for your help @rbharath
    Atreya Majumdar
    @atreyamaj
    Hey everyone! I was wondering if we could maybe add a description to the PyPI release of DeepChem? I would love to contribute and write the description as well if someone could point me in the right direction as to where I can start writing it!
    alat-rights
    @alat-rights
    I think that would be really helpful! I’m not too sure how that would work. Maybe @rbharath would have a better idea?
    alat-rights
    @alat-rights
    Would it make sense for us to separate the flaky tests from the non-flaky tests in the test-suite so that our “build passing/failing” badge is more useful?
    Bharath Ramsundar
    @rbharath
    @atreyamaj That would be a great idea :). If you'd like to help, DM me your PyPI username and I'll give you access to edit the description. It might be good to keep the description documented in the main repo, and add a note to the release docs to update the PyPI description if needed
    @alat-rights It's a little tricky basically. The flaky tests are already separated out so most of their failures don't affect the CI, but we have hundreds of tests and we run into edge cases that are hard to route around. The only solution right now is to just look into each case individually and try to understand the cause of the failure, but maybe there's a better idea
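For readers following along: one common way to keep a "build passing" badge meaningful is to tag nondeterministic tests with a custom pytest marker and exclude them from the badge-driving CI job. The sketch below is a generic pytest pattern with made-up test names, not a description of how DeepChem's CI is actually configured.

```python
# conftest.py: register a custom "flaky" marker so pytest recognizes it.
def pytest_configure(config):
    config.addinivalue_line(
        "markers", "flaky: marks tests with nondeterministic failures")


# test_example.py: hypothetical tests illustrating the split.
import pytest

@pytest.mark.flaky
def test_training_converges():
    """Depends on random initialization; excluded from the badge job."""
    assert True

def test_featurizer_shapes():
    """Deterministic; safe to gate the build badge on."""
    assert True
```

The badge-driving job would then run `pytest -m "not flaky"`, and the flaky tests could run in a separate, non-blocking job.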
    Atreya Majumdar
    @atreyamaj
    Thank you, I have sent you a DM
    meihua
    @MhDang
    Hello! I am new to AI for drug discovery and the DeepChem library. I have to say this work is remarkable and very user-friendly to beginners like me. I am now trying to play with the models and reproduce the results at http://moleculenet.ai/latest-results. I found the benchmark scripts in deepchem/examples/benchmark.py, and I am wondering whether you happen to keep a record of the hyperparameters needed to reproduce the results?
    Bharath Ramsundar
    @rbharath
    @MhDang Welcome to the project! I'd suggest checking out the new moleculenet repo: https://github.com/deepchem/moleculenet
    We have a new leaderboard up with maintained results. The original model benchmarks were run on TF 1.x and the underlying libraries have changed a lot so it's not easy to directly replicate those results
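For anyone who wants to try re-running a benchmark against the current library, a minimal sketch is below. It uses the stock MoleculeNet loader and default GraphConvModel settings, so the numbers will not match the original leaderboard exactly; Tox21 and the 50-epoch setting are only illustrative choices.

```python
import numpy as np
import deepchem as dc

# Load a MoleculeNet dataset with graph-convolution features.
tasks, datasets, transformers = dc.molnet.load_tox21(featurizer="GraphConv")
train, valid, test = datasets

# Default hyperparameters; the published benchmarks may use tuned settings.
model = dc.models.GraphConvModel(n_tasks=len(tasks), mode="classification")
model.fit(train, nb_epoch=50)

metric = dc.metrics.Metric(dc.metrics.roc_auc_score, np.mean)
print("valid ROC-AUC:", model.evaluate(valid, [metric], transformers))
```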
    meihua
    @MhDang
    @rbharath Thanks a lot for the reference, let me try to reproduce the new results!
    kurokawaikki
    @kurokawaikki
    @rbharath I am sorry for the late response. In my research, the host compounds are cyclic (ring-like) compounds such as valinomycin. I am trying to find a guest compound that can fit into the ring area and to predict the binding energy. Host–guest chemistry is currently being tested in the SAMPL challenge. Since I would like to build a prediction model, I hope to have as many samples as possible. Is there such a dataset in DeepChem? Or do you have any suggestions for where I could find suitable datasets? Thank you very much!
    Bharath Ramsundar
    @rbharath
    @kurokawaikki Ah, I see! That makes sense. Hmm, unfortunately, I'm not aware of a good dataset in DeepChem for host-guest interactions. Closest would be pdbbind but that's for more generic protein-ligand interactions and not for the types of host-guest interactions you're envisioning
    Sahar RZ
    @SaharRohaniZ
    Hi DeepChem team - The GraphConvModel is failing with PDBBind data, with an error saying ndarray has no attribute atom_features. Is this a known error? Is there a workaround to make GraphConvModel work with PDBBind data? Thanks for your input in advance.
    Bharath Ramsundar
    @rbharath
    @SaharRohaniZ For pdbbind data, you'd probably want an interaction fingerprint or something that handles the protein ligand complexes (check out tutorials 13/14 for examples). You can do graph conv on the ligands only but you might need to do some custom processing
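For the ligand-only route, a rough sketch of the custom processing might look like the following. The SMILES strings and affinities here are placeholders; in practice they would be parsed from the PDBBind ligand files and index file.

```python
import numpy as np
import deepchem as dc

# Placeholder ligand SMILES and binding affinities standing in for PDBBind entries.
ligand_smiles = ["CCO", "c1ccccc1O", "CC(=O)Nc1ccc(O)cc1"]
affinities = np.array([4.2, 5.1, 6.3])

# Featurize only the ligands with graph-convolution features.
featurizer = dc.feat.ConvMolFeaturizer()
X = featurizer.featurize(ligand_smiles)
dataset = dc.data.NumpyDataset(X, affinities)

# GraphConvModel expects ConvMol features, which avoids the
# "ndarray doesn't have atom_features" error seen with complex-level features.
model = dc.models.GraphConvModel(n_tasks=1, mode="regression")
model.fit(dataset, nb_epoch=10)
```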
    Sahar RZ
    @SaharRohaniZ
    Thanks @rbharath for your reply.
    Vignesh Venkataraman
    @VIGNESHinZONE

    Hi everyone,
    I have been working with generative modelling for molecules (SMILES) and I was exploring the AspuruGuzikAutoEncoder given in seqtoseq.py. The original paper has a Gaussian Process step for exploring the latent space, and I couldn't find its implementation in DeepChem. It would be really helpful if someone could suggest generative-modelling research or frameworks that provide the option of exploring the latent space to find more optimized molecules.

    reference -

    1. Aspuru-Guzik's Mol VAE paper - https://arxiv.org/abs/1610.02415 (the Gaussian Process step is described on page 11, "Optimization of molecules via properties")

    Thanks in advance :)

    Bharath Ramsundar
    @rbharath
    @VIGNESHinZONE Have you checked out the normalizing flows or the new molgan?
    I don't think we have a good out-of-box technique for exploring the latent space but something should work
    *might work :)
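For anyone who wants to approximate the paper's Gaussian Process step by hand, one option is to fit a GP on encoded latent vectors outside DeepChem and use it to rank candidate points before decoding. The sketch below uses scikit-learn with placeholder arrays; in practice the latent vectors would come from encoding known molecules with the autoencoder's embedding method, and the targets would be the property being optimized.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Placeholder latent vectors and property values (e.g. logP or QED).
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 64))   # latent vectors from the encoder
y = rng.normal(size=200)         # property values for those molecules

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(Z, y)

# Score candidate latent points and keep the most promising ones to decode
# back into SMILES. A real loop would use a proper acquisition function
# (e.g. expected improvement) rather than this simple upper-confidence pick.
candidates = rng.normal(size=(1000, 64))
mean, std = gp.predict(candidates, return_std=True)
best = candidates[np.argsort(mean + std)[-10:]]
```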
    Atreya Majumdar
    @atreyamaj

    I found this repo for the paper you linked above: https://github.com/HIPS/molecule-autoencoder

    It's outside of deepchem, but hope this helps!
    @VIGNESHinZONE

    Vignesh Venkataraman
    @VIGNESHinZONE
    @rbharath I just checked them out and they might be useful. Thank you :)
    @atreyamaj Thanks for the link :) I will definitely check them out
    Gökhan Tahıl
    @gokhantahil
    Hello everyone, I am trying to optimize the hyperparameters of a random forest with DeepChem, but I think there may be a bug.
    Here is the result of the first hyperparameter search:
    _criterionmae_max_depth_8_min_samples_leaf_3_min_samples_split_3_min_weight_fraction_leaf_0_n_estimators_120: 0.6832978488194399
    and the R2 score on the test set: 0.7238619516613416
    When I add another value for min_samples_leaf, both the validation score and the test R2 score change.
    The validation scores become:
    _criterionmae_max_depth_8_min_samples_leaf_1_min_samples_split_3_min_weight_fraction_leaf_0_n_estimators_120_njobs-1: 0.6514039887382881
    _criterionmae_max_depth_8_min_samples_leaf_3_min_samples_split_3_min_weight_fraction_leaf_0_n_estimators_120_njobs-1: 0.6667454007419835
    and the R2 score on the test set: 0.7514118547850513
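One thing worth checking before calling this a bug: random forests are themselves stochastic, so the same hyperparameters can produce different validation scores across runs unless the random seed is pinned. The sketch below reproduces the search with plain scikit-learn on synthetic data (everything here is made up, just to test whether the variation disappears once random_state is fixed).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data; substitute your featurized dataset arrays.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
y = X[:, 0] * 2.0 + rng.normal(scale=0.5, size=300)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

param_grid = {
    "min_samples_leaf": [1, 3],
    "min_samples_split": [3],
    "max_depth": [8],
    "n_estimators": [120],
}

# Fixing random_state makes repeated searches reproducible; without it,
# scores for identical hyperparameters drift between runs.
# ("absolute_error" is called "mae" in older scikit-learn releases.)
rf = RandomForestRegressor(criterion="absolute_error", random_state=42)
search = GridSearchCV(rf, param_grid, scoring="r2", cv=3)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
print("test R2:", search.best_estimator_.score(X_test, y_test))
```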
    Saurav Maheshkar
    @SauravMaheshkar
    Hello guys, I was working on issue deepchem/deepchem#631 and opened up a draft PR, deepchem/deepchem#2501. I'm quite new to DeepChem and would appreciate any help I can get. It involves the use of the BindingPocketFeaturizer.
    Bharath Ramsundar
    @rbharath
    @SauravMaheshkar Will try to take a look within a day or two :)
    @gokhantahil Sorry, I'm not sure what the bug is here. Would you mind clarifying a bit more?
    simonaxelrod
    @simonaxelrod
    Hi everyone - I have a basic question about the pdbbind data. My understanding is that a pdbbind model can take either the ligand or the protein-ligand complex as input, and produce -ln(kd/ki) as output. Are the proteins all the same or are they different? If they're different, how can a purely ligand-based model be trained to predict -ln(kd/ki)? Wouldn't it also need some information about the protein as input?
    Bharath Ramsundar
    @rbharath
    @simonaxelrod The proteins are different. (I think a few proteins are repeated, but these are the exceptions.) The purely ligand models are really learning a measure of "ligand-ness" and are more of a baseline control on the protein-ligand models. The delta between the protein-ligand model and the ligand-only model is a measure of how much information about the protein the model is actually using
    simonaxelrod
    @simonaxelrod
    Thanks @rbharath! That makes sense that the ligand models are really just a baseline control. Though I think some papers would benefit from saying this explicitly - for example, the ChemProp paper just notes that their model outperforms all MoleculeNet models for all tasks other than QM and pdbbind, but it should probably have said that a ligand-only model definitely shouldn't work for pdbbind, or something would be really wrong
    Bharath Ramsundar
    @rbharath
    Yes agreed! This is a subtle point that comparisons in the literature often miss
    kingscolour
    @kingscolour

    I'm looking into featurizing a set of molecules with the ConvMolFeaturizer. I'm interested in featurizing the chemical environment of the atoms within the molecule so I presume that I'd want to set the per_atom_fragmentation parameter. In the docs it notes:

    This option is typically used in combination with a FlatteningTransformer to split the lists into separate samples.

    I can't find any mention of FlatteningTransformer in the docs, can someone point me somewhere?

    Bharath Ramsundar
    @rbharath
    @kingscolour per_atom_fragmentation is a new feature so this may be a docs error. Check out the new tutorial at https://github.com/deepchem/deepchem/blob/master/examples/tutorials/Training_a_Normalizing_Flow_on_QM9.ipynb
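In case it helps while the docs catch up, a rough sketch of the intended pairing is below. It assumes FlatteningTransformer is importable as dc.trans.FlatteningTransformer and that its constructor just takes the dataset; the SMILES strings are placeholders.

```python
import deepchem as dc

# Placeholder molecules.
smiles = ["CCO", "c1ccccc1O"]

# per_atom_fragmentation=True makes the featurizer return, for each molecule,
# a list of ConvMol fragments with one atom removed at a time.
featurizer = dc.feat.ConvMolFeaturizer(per_atom_fragmentation=True)
X = featurizer.featurize(smiles)
dataset = dc.data.NumpyDataset(X)

# FlatteningTransformer splits those per-molecule lists into separate samples,
# so each atom-depleted fragment becomes its own row.
transformer = dc.trans.FlatteningTransformer(dataset)
dataset = transformer.transform(dataset)
```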
    kingscolour
    @kingscolour
    Thanks! I actually missed that one because I skipped over the files without a number. The Atomic Contributions for Molecules tutorial was also helpful for my understanding. Cheers for your work!
    Bharath Ramsundar
    @rbharath
    Oh my bad! Meant to link the atomic contributions tutorial and not the normalizing flows one lol
    kingscolour
    @kingscolour
    No worries! The Normalizing Flow tutorial seems to be helpful too. I'd like to model my own small-molecule data with DeepChem tools, but it's a bit overwhelming because I've only done basic MLPs and decision trees/random forests thus far. I have a basic understanding of GraphConv and Transformers, but I'm still trying to bridge that understanding to implementation. So again, thanks!
    ivy1997
    @ivy1997:matrix.org
    [m]
    Hi! I have a question regarding inconsistent numbers of samples before and after Keras model prediction. I have 1809 data points; however, after I ran data_generator -> fit_generator -> predict_on_generator, the number of predictions became 1856. Could you help me find the problem? Thank you.
    (My code was attached as screenshots; n_classes=3.) Thank you very much.
    hcy5561
    @hcy5561
    Hi, I am relatively new to this. I have 26 compounds, and for these I have a dataset in CSV format that includes around 5000 descriptors I have calculated, plus an IC50 value. I would like to perform QSAR analysis with deep learning (artificial neural networks) and predict IC50 values. However, my coding skills are limited. Can anyone help?
    Thanks
    Bharath Ramsundar
    @rbharath
    @ivy1997:matrix.org This looks like a batching issue :). predict_on_generator pads to full batches by default. Just trim off the last few elements to recover the dataset predictions (null datapoints are appended by default, I believe)
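A minimal sketch of that trimming step, with a zero array standing in for the real predict_on_generator output:

```python
import numpy as np

# 1809 real samples, but the padded output has 1856 rows because the final
# batch was filled out to the full batch size.
n_samples, n_classes = 1809, 3
padded_preds = np.zeros((1856, n_classes))  # stand-in for the padded predictions

# Trim the padded tail to recover one prediction per real datapoint.
preds = padded_preds[:n_samples]
assert preds.shape == (n_samples, n_classes)
```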
    @hcy5561 Check out our tutorials :). Tutorial 4 in particular might help you get started with QSAR analysis
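And for the QSAR question above: the sketch below wires precomputed descriptors from a CSV into a small DeepChem regressor. The file name and the "ic50" column name are assumptions, and with only 26 compounds and roughly 5000 descriptors, heavy regularization and careful cross-validation would be essential.

```python
import numpy as np
import pandas as pd
import deepchem as dc

# Hypothetical file and column names; adjust to your CSV layout.
df = pd.read_csv("descriptors.csv")
y = df["ic50"].to_numpy().reshape(-1, 1)
X = df.drop(columns=["ic50"]).to_numpy(dtype=np.float32)

dataset = dc.data.NumpyDataset(X, y)
train, valid, test = dc.splits.RandomSplitter().train_valid_test_split(dataset)

# A small fully connected regressor; dropout helps, but with 26 compounds
# and ~5000 descriptors, overfitting is still the main risk.
model = dc.models.MultitaskRegressor(
    n_tasks=1, n_features=X.shape[1], layer_sizes=[64], dropouts=0.5)
model.fit(train, nb_epoch=50)

metric = dc.metrics.Metric(dc.metrics.pearson_r2_score)
print("valid R2:", model.evaluate(valid, [metric]))
```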