    rfhari
    @rfhari

    Hi @rbharath, I'm working on the QM7/QM7b dataset. I downloaded the dataset from the following link: http://moleculenet.ai/datasets-1.

    But the .mat file doesn't seem to contain the SMILES strings of the molecules. It only has 3-D Cartesian coordinates and electronic properties. Could you please guide me on getting the SMILES as well?

    Bharath Ramsundar
    @rbharath
    @rfhari I think you can use RDKit to load molecules from an SDF file:
    from rdkit import Chem
    suppl = Chem.SDMolSupplier('file.sdf')
    mols = [x for x in suppl if x is not None]  # SDMolSupplier yields None for entries it cannot parse
    rfhari
    @rfhari
    Thanks @rbharath! But I think only the .mat file is available at the link above (http://moleculenet.ai/datasets-1). There isn't an SDF file or any other file there.
    Bharath Ramsundar
    @rbharath
    Ah, I see. You can load .mat files with scipy: https://scipy-cookbook.readthedocs.io/items/Reading_mat_files.html
    I think the original datasets were processed with MATLAB for some reason
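A minimal sketch of reading a QM7-style .mat file with `scipy.io.loadmat`, assuming the keys `X` (Coulomb matrices) and `T` (atomization energies) used in the quantum-machine.org release of qm7.mat; check `loadmat(path).keys()` on your own copy, since qm7b.mat and other versions may use different keys. The helper name `load_qm7_mat` is hypothetical, and the demo writes a tiny fake file so it runs without the real data.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat


def load_qm7_mat(path):
    """Return (coulomb_matrices, energies) from a QM7-style .mat file.

    Assumes the keys "X" and "T" (hypothetical for other versions of the file).
    """
    data = loadmat(path)
    X = np.asarray(data["X"], dtype=float)
    # MATLAB stores vectors as 2-D row/column arrays, so flatten the labels.
    T = np.asarray(data["T"], dtype=float).ravel()
    return X, T


# Self-contained demo: write a tiny fake file with the same layout, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_qm7.mat")
    X_fake = np.zeros((3, 23, 23))           # placeholder matrices, not real data
    T_fake = np.array([[-1.0, -2.0, -3.0]])  # shape (1, 3), like a MATLAB row vector
    savemat(path, {"X": X_fake, "T": T_fake})
    X, T = load_qm7_mat(path)
    print(X.shape, T.shape)
```

Note that this yields numeric arrays only: the .mat file carries no SMILES, so the molecules themselves still have to come from another source.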
    rfhari
    @rfhari
    Oh, okay. Thanks @rbharath!
    ^ This file shows how the .mat files can be loaded into DeepChem datasets
    rfhari
    @rfhari
    Yeah, thanks a lot, this works!
    Could you please clarify what the label "u0_atom" refers to exactly?
    Bharath Ramsundar
    @rbharath
    I don't recall off the top of my head, but I think it might be the atomization energy
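Assuming `u0_atom` follows the usual definition of atomization energy (the molecule's energy U0 minus the summed energies of its isolated atoms), the arithmetic looks like this. The function name and the reference energies are hypothetical placeholders, not real values.

```python
def atomization_energy(molecule_energy, atom_counts, atom_reference_energies):
    """Molecular energy minus the summed energies of its isolated atoms.

    A negative result means the molecule is more stable than its separated atoms.
    """
    isolated = sum(count * atom_reference_energies[element]
                   for element, count in atom_counts.items())
    return molecule_energy - isolated


# Hypothetical numbers in arbitrary units, chosen only to show the arithmetic:
# -13.0 - (1 * -10.0 + 4 * -0.5) = -13.0 - (-12.0) = -1.0
refs = {"C": -10.0, "H": -0.5}
print(atomization_energy(-13.0, {"C": 1, "H": 4}, refs))
```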
    rfhari
    @rfhari
    Oh, okay. Thanks a lot for the guidance!
    Bharath Ramsundar
    @rbharath
    As a reminder, folks, GSoC applications are due tomorrow! Please make sure to submit if you plan on applying
    macca1996-bit
    @macca1996-bit
    Hey guys, hope everyone is staying healthy. Quick question: Is it possible to convert a graph conv vector representation of a molecule back into a SMILES string?
    Bharath Ramsundar
    @rbharath
    @macca1996-bit Good question! By graph conv vector representation, do you mean the extracted neural fingerprint? There's unfortunately not a great way to translate back to the original (the transformation wasn't designed to be invertible)
    What's your application? There might be a workaround possible for what you're looking to do
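A toy illustration of why such fingerprints can't be decoded: a graph readout that sums per-atom vectors maps different inputs to the same output, so information is lost before the fingerprint exists. This is a simplified sum readout for illustration, not DeepChem's actual graph-gather layer, and the feature values are made up.

```python
import numpy as np


def sum_readout(node_features):
    """Collapse per-atom feature vectors into one molecule-level vector by summing.

    The sum discards which atom contributed which features.
    """
    return np.asarray(node_features, dtype=float).sum(axis=0)


# Two different (hypothetical) sets of per-atom features...
graph_a = [[1.0, 0.0], [0.0, 1.0]]
graph_b = [[0.5, 0.5], [0.5, 0.5]]
# ...map to the same pooled vector, so no decoder could tell them apart.
print(sum_readout(graph_a), sum_readout(graph_b))
print(np.array_equal(sum_readout(graph_a), sum_readout(graph_b)))
```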
    rfhari
    @rfhari
    Hi @rbharath, I'm working on the QM7 dataset. I got SMILES from the following link: https://github.com/deepchem/deepchem/blob/master/deepchem/molnet/load_function/qm7_datasets.py
    But it seems to contain only one label, so how was the "multitask" setting mentioned in the MoleculeNet paper performed? Could you please explain?
    Bharath Ramsundar
    @rbharath
    @rfhari I believe the multitask training was done by combining data from many different datasets together
    I think there might have been one massive multitask model that combined all (or a large subset) of moleculenet
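The exact MoleculeNet procedure isn't given in this thread, but one common way to combine single-label datasets is a shared label matrix `y` plus a weight matrix `w` that is 0 wherever a molecule has no measurement for a task (DeepChem datasets carry a similar `w` array). A minimal sketch; the helper `merge_single_task`, the task names, and the labels are all hypothetical.

```python
import numpy as np


def merge_single_task(datasets):
    """Combine {task_name: {smiles: label}} dicts into one multitask table.

    Returns (smiles_list, task_names, y, w): y holds labels (0.0 where missing)
    and w is 1.0 where a label exists and 0.0 where it does not, so a
    weighted loss can ignore the missing entries.
    """
    task_names = sorted(datasets)
    smiles_list = sorted({s for labels in datasets.values() for s in labels})
    row = {s: i for i, s in enumerate(smiles_list)}
    y = np.zeros((len(smiles_list), len(task_names)))
    w = np.zeros_like(y)
    for j, task in enumerate(task_names):
        for smiles, label in datasets[task].items():
            y[row[smiles], j] = label
            w[row[smiles], j] = 1.0
    return smiles_list, task_names, y, w


# Hypothetical toy labels; task names and values are made up.
smiles, tasks, y, w = merge_single_task({
    "task_a": {"C": 1.0, "CO": 2.0},
    "task_b": {"CO": 3.0, "CCO": 4.0},
})
print(smiles, tasks)
print(w)
```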
    rfhari
    @rfhari
    Oh, but from the link above I could only get a .csv file with a single label. Could you please guide me on getting this big combined dataset?
    Bharath Ramsundar
    @rbharath
    Hmm, I'll have to go back and check the manuscript, tbh. I didn't run the models for this part, so I don't recall how it was done
    I remember that the multitask models didn't actually work that well
    Are you trying to get a high-performing model, or are you interested in directly replicating MoleculeNet?
    If it's the former, I think there's no need to do multitask
    rfhari
    @rfhari
    I'm trying to replicate and understand all the models. The paper mentions that for the QM7 dataset, KRR (CM), i.e. the multitask model, performed the best of all. That's why I'm a bit confused. Sorry for bothering you so much
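For context on the KRR (CM) baseline: a Coulomb matrix featurizes a molecule from its nuclear charges Z and coordinates R, and kernel ridge regression fits its weights in closed form. This is a minimal numpy sketch assuming a Laplacian kernel and made-up hyperparameters (`sigma`, `lam`); it is not MoleculeNet's exact configuration, and the diatomic toy molecules and targets are hypothetical.

```python
import numpy as np


def coulomb_matrix(charges, coords):
    """Coulomb matrix: 0.5 * Z_i**2.4 on the diagonal, Z_i * Z_j / |R_i - R_j| off it."""
    Z = np.asarray(charges, dtype=float)
    R = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    np.fill_diagonal(dist, 1.0)  # avoid 0/0 on the diagonal; it is overwritten below
    M = np.outer(Z, Z) / dist
    np.fill_diagonal(M, 0.5 * Z ** 2.4)
    return M


def laplacian_kernel(A, B, sigma):
    """k(a, b) = exp(-||a - b||_1 / sigma) for every row a of A and row b of B."""
    l1 = np.abs(A[:, None, :] - B[None, :, :]).sum(axis=-1)
    return np.exp(-l1 / sigma)


def krr_fit(X, y, sigma=1.0, lam=1e-8):
    """Closed-form kernel ridge regression weights: alpha = (K + lam * I)^-1 y."""
    K = laplacian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)


def krr_predict(X_train, alpha, X_new, sigma=1.0):
    """Predict with the weights from krr_fit (use the same sigma as during fitting)."""
    return laplacian_kernel(X_new, X_train, sigma) @ alpha


# Hypothetical toy data: H-H "molecules" at four bond lengths, made-up targets.
mols = [([1, 1], [[0.0, 0.0, 0.0], [0.0, 0.0, d]]) for d in (0.7, 0.8, 0.9, 1.0)]
X = np.array([coulomb_matrix(z, r).ravel() for z, r in mols])
y = np.array([-1.0, -1.2, -1.1, -0.9])
alpha = krr_fit(X, y)
print(np.round(krr_predict(X, alpha, X), 3))  # close to y because lam is tiny
```

Real pipelines usually also make the Coulomb matrix invariant to atom ordering (for example by sorting rows by norm) and pad it to a fixed size, such as QM7's 23 atoms, before flattening.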
    Bharath Ramsundar
    @rbharath
    No worries at all! I'm glad to help
    It's just that MoleculeNet was a large project with lots of contributors and lots of details
    So I don't know off-hand all the details. I think the paper and the code in the repo now are our best resources
    Have you gotten singletask models on QM7 running and giving reasonable numbers?
    rfhari
    @rfhari
    Oh, okay, thanks a lot @rbharath! I'll refer to those. Yeah, I got the single-task model working well.
    iherath
    @iherath
    Hi, I am new to DeepChem and was going through the tutorial titled "Predicting Ki of Ligands to a Protein", and I have some questions:
    Bharath Ramsundar
    @rbharath
    Welcome @iherath!
    What issues are you running into?
    iherath
    @iherath
    Thank you!
    Yeah, how do you load your own protein into the notebook?
    Is there a place where I can find the kind of CSV file that's required?
    Bharath Ramsundar
    @rbharath
    Ah, so that model is for a dataset of binding measurements for a particular protein
    It's not a structure-based model where you can load a separate protein
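Since that tutorial model learns from measurements against one fixed protein, its input is essentially one row per ligand. The column names `smiles` and `ki_nM`, the helper `load_ligand_table`, and the values below are hypothetical, not the tutorial's actual file; the sketch only shows reading such a table and converting Ki to pKi = -log10(Ki in mol/L), a common regression target.

```python
import csv
import io
import math

# Hypothetical file contents: one protein, one row per ligand.
# The column names "smiles" and "ki_nM" are assumptions, not the tutorial's.
CSV_TEXT = """smiles,ki_nM
CCO,100
c1ccccc1O,1
"""


def load_ligand_table(text):
    """Return [(smiles, pKi)] where pKi = -log10(Ki in mol/L)."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        ki_molar = float(record["ki_nM"]) * 1e-9  # nanomolar -> molar
        rows.append((record["smiles"], -math.log10(ki_molar)))
    return rows


for smiles, pki in load_ligand_table(CSV_TEXT):
    print(smiles, round(pki, 2))
```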
    iherath
    @iherath
    Oh okay thank you
    Bharath Ramsundar
    @rbharath
    To work on your own protein, you'll need to write a bit of custom code. Take a look at deepchem.molnet.load_pdbbind and see how it featurizes protein/ligands from the pdbbind dataset
    We should definitely have better documentation on how to do this. I'm working on revamping the docs over the next few weeks so I'll try to improve the explanations
    iherath
    @iherath
    Oh okay thanks so much!
    iherath
    @iherath
    Also, what kind of preprocessing of the protein does one need to do before performing these analyses? And how do I account for the different conformations of the protein and the ligands?
    Bharath Ramsundar
    @rbharath
    So you need a "co-crystal" pose. You can get this by docking
    To process different protein/ligand conformations, you basically process them as separate datapoints
    Let's say you have a trained protein-ligand binding energy model (trained on pdbbind)
    And you have a bunch of binding poses of different conformations of protein/ligand
    You could run them through your trained model and perhaps take the max
    to get the optimal binding free energy
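The max-over-poses idea as a minimal sketch: group per-pose predictions by complex and keep the best one. Which direction counts as "best" depends on the model's output, which is assumed here: for an affinity-style target such as -log(Kd), take the max; for a predicted free energy, where more negative means tighter binding, take the min. The function name, complex IDs, and scores are hypothetical.

```python
from collections import defaultdict


def best_pose_scores(predictions, higher_is_better=True):
    """Reduce per-pose model outputs to one score per protein-ligand pair.

    predictions: iterable of (complex_id, pose_score) pairs, one per conformation.
    Use higher_is_better=True for affinity-style outputs (e.g. -log Kd) and
    False for free-energy outputs, where more negative means tighter binding.
    """
    grouped = defaultdict(list)
    for complex_id, score in predictions:
        grouped[complex_id].append(score)
    pick = max if higher_is_better else min
    return {cid: pick(scores) for cid, scores in grouped.items()}


# Hypothetical predictions for three docked poses of one complex and two of another.
preds = [("cplx1", 6.1), ("cplx1", 7.4), ("cplx1", 5.9), ("cplx2", 4.2), ("cplx2", 4.8)]
print(best_pose_scores(preds))
```

Treating each pose as its own datapoint keeps the trained model unchanged; only this final reduction step encodes the assumption about which pose is most favorable.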