    Bharath Ramsundar
    @rbharath
    @SaharRohaniZ For pdbbind data, you'd probably want an interaction fingerprint or something else that handles the protein-ligand complexes (check out tutorials 13/14 for examples). You can do graph conv on the ligands only, but you might need to do some custom processing
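    For reference, the complex-based featurization suggested above can be tried through the MolNet loader. A minimal sketch, assuming the DeepChem 2.x dc.molnet.load_pdbbind API (option names may differ between versions):

        import deepchem as dc

        # Grid-style interaction featurization of the full protein-ligand complexes
        tasks, datasets, transformers = dc.molnet.load_pdbbind(
            featurizer="grid", split="random", subset="core")
        train, valid, test = datasets

        # A ligand-only alternative would featurize just the ligand structures,
        # e.g. with dc.feat.ConvMolFeaturizer, as mentioned above.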
    Sahar RZ
    @SaharRohaniZ
    Thanks @rbharath for your reply.
    Vignesh Venkataraman
    @VIGNESHinZONE

    Hi everyone,
    I have been working with generative modelling for molecules (SMILES) and was exploring the AspuruGuzikAutoEncoder in seqtoseq.py. The original paper has a Gaussian process step for exploring the latent space, and I couldn't find its implementation in DeepChem. It would be really helpful if someone could suggest generative-modelling research or frameworks that provide the option of exploring the latent space to find more optimized molecules.

    Reference:

    1. Aspuru-Guzik's molecular VAE paper - https://arxiv.org/abs/1610.02415 (the Gaussian process step is described on page 11, "Optimization of molecules via properties")

    Thanks in advance :)
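    For reference, the Gaussian-process step from the paper can be sketched outside DeepChem. A minimal sketch using scikit-learn, assuming hypothetical `latent_vectors` (embeddings from a trained autoencoder) and matching `properties` arrays:

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor

        # Fit a surrogate model mapping latent coordinates to the target property
        gp = GaussianProcessRegressor().fit(latent_vectors, properties)

        # Score random latent candidates and keep the most promising ones
        candidates = np.random.normal(size=(1000, latent_vectors.shape[1]))
        pred_mean = gp.predict(candidates)
        best = candidates[np.argsort(pred_mean)[-10:]]  # top 10 by predicted value
        # The autoencoder's decoder then maps `best` back to SMILES strings.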

    Bharath Ramsundar
    @rbharath
    @VIGNESHinZONE Have you checked out the normalizing flows or the new molgan?
    I don't think we have a good out-of-box technique for exploring the latent space but something should work
    *might work :)
    Atreya Majumdar
    @atreyamaj

    I found this repo for the paper you linked above: https://github.com/HIPS/molecule-autoencoder

    It's outside of deepchem, but hope this helps!
    @VIGNESHinZONE

    Vignesh Venkataraman
    @VIGNESHinZONE
    @rbharath I just checked them out and they might be useful. Thank you :)
    @atreyamaj Thanks for the link :) I will definitely check it out
    Gökhan Tahıl
    @gokhantahil
    Hello everyone, I'm trying to optimize the hyperparameters of an RF on DeepChem, but I guess there is a bug.
    _criterionmae_max_depth_8_min_samples_leaf_3_min_samples_split_3_min_weight_fraction_leaf_0_n_estimators_120: 0.6832978488194399 is the result of the first hyperparameter search,
    and the r2 score on testing: 0.7238619516613416.
    When I add another value to min_samples_leaf, the validation score and the r2 score on testing both change.
    So the validation scores:
    _criterionmae_max_depth_8_min_samples_leaf_1_min_samples_split_3_min_weight_fraction_leaf_0_n_estimators_120_njobs-1: 0.6514039887382881
    _criterionmae_max_depth_8_min_samples_leaf_3_min_samples_split_3_min_weight_fraction_leaf_0_n_estimators_120_njobs-1: 0.6667454007419835
    and the r2 score on testing: 0.7514118547850513.
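    For reference, the grid search described above looks roughly like this. A minimal sketch, assuming the DeepChem 2.x GridHyperparamOpt API and existing train/valid datasets; note that a RandomForestRegressor is stochastic unless random_state is fixed, so validation scores can shift between otherwise identical runs:

        import deepchem as dc
        from sklearn.ensemble import RandomForestRegressor

        def rf_builder(**model_params):
            # Wrap an sklearn forest so DeepChem's optimizer can drive it;
            # fixing random_state makes repeated searches comparable.
            return dc.models.SklearnModel(
                RandomForestRegressor(random_state=0, **model_params))

        params_dict = {
            "criterion": ["mae"],
            "max_depth": [8],
            "min_samples_leaf": [1, 3],
            "min_samples_split": [3],
            "n_estimators": [120],
        }

        metric = dc.metrics.Metric(dc.metrics.pearson_r2_score)
        optimizer = dc.hyper.GridHyperparamOpt(rf_builder)
        best_model, best_params, all_results = optimizer.hyperparam_search(
            params_dict, train_dataset, valid_dataset, metric)
        # all_results maps keys like "_criterionmae_max_depth_8_..." to scores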
    Saurav Maheshkar
    @SauravMaheshkar
    Hello guys, I was working on issue deepchem/deepchem#631 and opened up a draft PR, deepchem/deepchem#2501. I'm quite new to DeepChem and would appreciate any help I can get. It involves the use of the BindingPocketFeaturizer.
    Bharath Ramsundar
    @rbharath
    @SauravMaheshkar Will try to take a look within a day or two :)
    @gokhantahil Sorry, I'm not sure what the bug is here. Would you mind clarifying a bit more?
    simonaxelrod
    @simonaxelrod
    Hi everyone - I have a basic question about the pdbbind data. My understanding is that a pdbbind model can take either the ligand or the protein-ligand complex as input, and produce -ln(kd/ki) as output. Are the proteins all the same or are they different? If they're different, how can a purely ligand-based model be trained to predict -ln(kd/ki)? Wouldn't it also need some information about the protein as input?
    Bharath Ramsundar
    @rbharath
    @simonaxelrod The proteins are different (I think a few proteins are repeated, but those are the exceptions). The purely ligand models are really learning a measure of "ligand-ness" and are more of a baseline control on the protein-ligand models. The delta between the protein-ligand model and the ligand-only model is a measure of how much information about the protein the model is actually using
    simonaxelrod
    @simonaxelrod
    Thanks @rbharath! That makes sense that the ligand models are really just a baseline control. Though I think some papers would benefit from saying this explicitly. For example, the ChemProp paper just notes that their model outperforms all MoleculeNet models on all tasks other than QM and pdbbind, but it probably should have said that it definitely shouldn't work for pdbbind, or something would be really wrong
    Bharath Ramsundar
    @rbharath
    Yes agreed! This is a subtle point that comparisons in the literature often miss
    kingscolour
    @kingscolour

    I'm looking into featurizing a set of molecules with the ConvMolFeaturizer. I'm interested in featurizing the chemical environment of the atoms within each molecule, so I presume I'd want to set the per_atom_fragmentation parameter. The docs note:

    This option is typically used in combination with a FlatteningTransformer to split the lists into separate samples.

    I can't find any mention of FlatteningTransformer in the docs, can someone point me somewhere?
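    For reference, the combination the docs describe would look roughly like this. A minimal sketch, assuming a DeepChem version recent enough to have both per_atom_fragmentation and dc.trans.FlatteningTransformer:

        import numpy as np
        import deepchem as dc

        smiles = ["CCO", "c1ccccc1"]
        featurizer = dc.feat.ConvMolFeaturizer(per_atom_fragmentation=True)
        X = featurizer.featurize(smiles)  # one list of per-atom fragments per molecule

        dataset = dc.data.NumpyDataset(X, y=np.array([0.0, 1.0]))
        # Split each molecule's fragment list into separate per-atom samples
        dataset = dc.trans.FlatteningTransformer(dataset).transform(dataset)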

    Bharath Ramsundar
    @rbharath
    @kingscolour per_atom_fragmentation is a new feature so this may be a docs error. Check out the new tutorial at https://github.com/deepchem/deepchem/blob/master/examples/tutorials/Training_a_Normalizing_Flow_on_QM9.ipynb
    kingscolour
    @kingscolour
    Thanks! I actually missed that one because I skipped over the files without a number. The Atomic Contributions for Molecules tutorial was also helpful for my understanding. Cheers for your work!
    Bharath Ramsundar
    @rbharath
    Oh my bad! Meant to link the atomic contributions tutorial and not the normalizing flows one lol
    kingscolour
    @kingscolour
    No worries! The Normalizing Flow tutorial seems helpful too. I'd like to model my own small-molecule data with DeepChem tools, but it's a bit overwhelming because I've only done basic MLPs and decision trees/random forests so far. I have a basic understanding of GraphConv and Transformers, but I'm still trying to bridge that understanding to implementation. So again, thanks!
    ivy1997
    @ivy1997:matrix.org
    [m]
    Hi! I have a question about an inconsistent number of samples before and after Keras model prediction. I have 1809 data points; however, after I ran data_generator -> fit_generator -> predict_on_generator, the number of samples became 1856. Could you help me find the problem? Thank you.
    The pictures above show my code; n_classes=3. Thank you very much.
    hcy5561
    @hcy5561
    Hi, I am relatively new to this. I have 26 compounds, and for these I have a dataset in CSV format that includes around 5000 descriptors I have calculated, plus an IC50 value for each. I would like to perform QSAR analysis with deep-learning neural networks and predict IC50 values. However, my coding skills aren't sufficient. Can anyone help?
    Thanks
    Bharath Ramsundar
    @rbharath
    @ivy1997:matrix.org This looks like a batching issue :). predict_on_generator pads to full batches by default, so just trim off the last few elements to recover the dataset predictions (null datapoints are appended by default, I believe)
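    A minimal sketch of that trimming step, assuming DeepChem's KerasModel API and that `model`, `data_generator`, and `dataset` are the questioner's own (hypothetical) objects:

        # Predictions come back padded to a multiple of batch_size
        y_pred = model.predict_on_generator(data_generator(dataset))
        # Keep only the first len(dataset) rows (e.g. 1856 -> 1809)
        y_pred = y_pred[:len(dataset)]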
    @hcy5561 Check out our tutorials :). Tutorial 4 in particular might help you get started with QSAR analysis
    hcy5561
    @hcy5561
    This doesn't answer what I'm asking for. I don't use fingerprints, and I won't calculate any descriptors for my molecules in DeepChem; I just gave the molecules numbers like 1, 2, 3, 4. The CSV file already contains the descriptors calculated for all complexes by external programs. I want to use this CSV and build my model with deep-learning neural networks.
    Bharath Ramsundar
    @rbharath
    You can adapt the example to process your data. Take a look at https://deepchem.readthedocs.io/en/latest/api_reference/featurizers.html#userdefinedfeaturizer. This will require some familiarity with the DeepChem API though since we don't have an out-of-box example for your use case
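    For reference, feeding precomputed CSV descriptors into DeepChem looks roughly like this. A minimal sketch, assuming a CSV with a "compound_id" column, an "ic50" target column, and the remaining columns holding the descriptors (all column names hypothetical):

        import deepchem as dc
        import pandas as pd

        df = pd.read_csv("descriptors.csv")
        feature_fields = [c for c in df.columns if c not in ("compound_id", "ic50")]

        # UserDefinedFeaturizer reads the descriptor columns as-is
        loader = dc.data.UserCSVLoader(
            tasks=["ic50"], id_field="compound_id",
            featurizer=dc.feat.UserDefinedFeaturizer(feature_fields))
        dataset = loader.create_dataset("descriptors.csv")

        # A simple fully connected regressor over the precomputed descriptors
        model = dc.models.MultitaskRegressor(n_tasks=1,
                                             n_features=len(feature_fields))
        model.fit(dataset)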
    hcy5561
    @hcy5561
    Thanks.
    I will search for other deep learning programs.
    ivy1997
    @ivy1997:matrix.org
    [m]
    So I originally have 1809 elements. When I used batch_size=64, the number of predicted results became 1856, which is 47 (64 - 17) more than the original dataset. Then I tried batch_size=32, and the number of predicted results became 1824, which is 15 (32 - 17) more than the original dataset. 🥲
    Bharath Ramsundar
    @rbharath
    @ivy1997:matrix.org I believe what's happening is that the last batch is getting padded here. It's a little surprising to me that the predictions aren't all null! This may just be undefined behavior in the predictions (I don't recall how precisely the padded elements are generated and it's possible there's some variation there)
    We should really have better documentation on this...
    Bharath Ramsundar
    @rbharath
    @ivy1997:matrix.org I've started up deepchem/deepchem#2513 to document and hopefully fix this
    ivy1997
    @ivy1997:matrix.org
    [m]
    OK, thank you!! Hope it can be solved. 🙂
    paulsonak
    @paulsonak
    Hi all, are there any plans (or existing implementations) for exporting deepchem models into a standard format such as ONNX or PMML?
    Bharath Ramsundar
    @rbharath
    @paulsonak There's some interest! We're working towards establishing a modelhub and adopting some common framework like ONNX/PMML for weight storage would be useful. We don't have any infrastructure for this yet though. See the discussion https://forum.deepchem.io/t/a-sketch-of-a-modelhub/445
    Karthik Viswanathan
    @nickinack
    Hey, I am trying to reproduce a paper that requires the following version of deepchem: https://github.com/deepchem/deepchem/tree/july2017. Unfortunately, this link is inactive. How do I download and use this version?
    Bharath Ramsundar
    @rbharath
    That was released in July 2017
    Karthik Viswanathan
    @nickinack
    @rbharath Thank you very much :) I also wanted to ask how ConvMolFeaturizer() works. Do you have any documentation that explains how the featurization takes place?
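    For reference, one quick way to see what ConvMolFeaturizer produces is to inspect a featurized molecule directly. A minimal sketch against DeepChem 2.x:

        import deepchem as dc

        featurizer = dc.feat.ConvMolFeaturizer()
        conv_mol = featurizer.featurize(["CCO"])[0]
        # Per-atom feature vectors (75 features per atom by default)
        print(conv_mol.get_atom_features().shape)
        # Neighbor lists describing the molecular graph used by GraphConv
        print(conv_mol.get_adjacency_list())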