    Vishesh Mangla
    @XtremeGood
    Do you mean like in convolutional neural networks?
    move a filter across the image?
    Bharath Ramsundar
    @rbharath
    @macca1996-bit Glad to be of help :). You want to transform after the split. The reason for this is to prevent information leakage from the validation/test set into the training set
    @XtremeGood Yes, graph convolutions are the generalization of 2D convolutions to arbitrary graphs rather than grids. I'd recommend working through DeepChem tutorial #4 as an intro.
    Vishesh Mangla
    @XtremeGood
    ya I got it
    graph can be represented by adjacency matrix
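To make the adjacency-matrix picture concrete, here is a minimal NumPy sketch of a single neighbor-aggregation step, the core idea behind a graph convolution. It is a simplified illustration, not DeepChem's graph convolution layer; the molecule (the heavy atoms of ethanol), the two-element feature vectors, and the helper name `aggregate_neighbors` are assumptions chosen for the example.

```python
import numpy as np

# Heavy-atom graph of ethanol (C-C-O): atoms are nodes, bonds are edges.
adjacency = np.array([
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
])

# Toy per-atom features (an assumption for illustration): [is_carbon, is_oxygen].
features = np.array([
    [1, 0],
    [1, 0],
    [0, 1],
])


def aggregate_neighbors(adjacency, features):
    """Sum each atom's features with those of its bonded neighbors."""
    # Adding the identity keeps each atom's own features in the sum.
    with_self_loops = adjacency + np.eye(len(adjacency), dtype=adjacency.dtype)
    return with_self_loops @ features


aggregated = aggregate_neighbors(adjacency, features)
print(aggregated)
```

A trainable layer would typically follow this aggregation with a learned weight matrix and a nonlinearity. On an image grid every pixel has the same neighbor layout, which is why a fixed sliding filter suffices there, while on a molecule the adjacency matrix supplies the neighborhood.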
    Vishesh Mangla
    @XtremeGood
    by the way are there any prerequisites like merging a pr or something with deepchem?
    Vishesh Mangla
    @XtremeGood
    Also, the page has more open/merged requests than tutorials.
    macca1996-bit
    @macca1996-bit

    @rbharath okay thanks. So is this suitable:

    splitter = dc.splits.RandomSplitter()
    train_dataset, valid_dataset, test_dataset = splitter.train_valid_test_split(dataset)

    transformers = [dc.trans.BalancingTransformer(transform_w=True, dataset=dataset)]

    for dataset in [train_dataset, valid_dataset, test_dataset]:
        for transformer in transformers:
            dataset = transformer.transform(dataset)

    mukeshb23
    @mukeshb23
    @rbharath thank you sir, but I don't know how to write a script that separates my original data. Please help me solve this, sir.
    Bharath Ramsundar
    @rbharath
    @XtremeGood It's not a prerequisite to apply, but applications are competitive, so it helps to have a record of contributing. Our GSoC student last year had already made a few smaller contributions, I think
    @macca1996-bit When initializing the transformer, make sure to use train_dataset. So dc.trans.BalancingTransformer(transform_w=True, dataset=train_dataset)
    @mukeshb23 Have you tried working through the DeepChem tutorials? They walk through use of the splitters and can probably provide guidance
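The split-then-transform advice above can be sketched without DeepChem at all. The pure-Python example below splits first and then derives class-balancing weights from the training labels only, so nothing about the validation or test labels leaks into the fitted weights. The helpers `random_split`, `fit_balancing_weights`, and `apply_weights` are hypothetical names for this illustration, the labels are synthetic, and the weighting rule (equal total weight per class) is an assumption about what balancing means here, not a copy of DeepChem's BalancingTransformer.

```python
import random


def random_split(n_samples, n_train, n_valid, seed=0):
    """Shuffle the indices and cut them into train/valid/test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return (indices[:n_train],
            indices[n_train:n_train + n_valid],
            indices[n_train + n_valid:])


def fit_balancing_weights(labels):
    """Compute per-class weights from the given labels only."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present to balance them")
    # Negatives keep weight 1; positives are scaled so both classes
    # carry the same total weight.
    return {0: 1.0, 1: n_neg / n_pos}


def apply_weights(labels, class_weights):
    """Look up the weight for each label."""
    return [class_weights[y] for y in labels]


# Synthetic, imbalanced binary labels: 20 positives and 80 negatives.
labels = [1] * 20 + [0] * 80

# 1. Split first.
train_idx, valid_idx, test_idx = random_split(len(labels), n_train=80, n_valid=10)
train_y = [labels[i] for i in train_idx]
valid_y = [labels[i] for i in valid_idx]
test_y = [labels[i] for i in test_idx]

# 2. Fit the weights on the training labels only.
class_weights = fit_balancing_weights(train_y)

# 3. Apply the same fitted weights to every split.
train_w = apply_weights(train_y, class_weights)
valid_w = apply_weights(valid_y, class_weights)
test_w = apply_weights(test_y, class_weights)
```

Passing dataset=train_dataset when constructing the transformer plays the role of step 2, and calling transform on each split plays the role of step 3.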
    rfhari
    @rfhari

    Hi @rbharath, I'm working on the QM7/QM7b dataset. From the following link, I downloaded the dataset - http://moleculenet.ai/datasets-1.

    But the .mat file doesn't seem to have the SMILES strings of the molecules. Only 3D Cartesian data and electronic properties are available. Can you please guide me in getting the SMILES as well?

    Bharath Ramsundar
    @rbharath
    @rfhari I think you can use RDKit to load molecules from an SDF file:
    from rdkit import Chem
    suppl = Chem.SDMolSupplier('file.sdf')
    mols = [x for x in suppl if x is not None]  # entries RDKit cannot parse come back as None
    rfhari
    @rfhari
    Thanks @rbharath! But I think, only .mat file is there in the above link (http://moleculenet.ai/datasets-1). There ain't any sdf file or other files in the link
    Bharath Ramsundar
    @rbharath
    Ah, I see. You can load .mat files with scipy: https://scipy-cookbook.readthedocs.io/items/Reading_mat_files.html
    I think the original datasets were processed with matlab for some reason
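As a rough sketch of the scipy route mentioned above: `scipy.io.loadmat` returns a dictionary mapping MATLAB variable names to NumPy arrays. The example writes a tiny stand-in file first so it runs without a download; the file name `toy.mat`, the keys "R" and "T", and the array contents are made up for illustration and should not be read as the actual layout of the QM7 file.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

# Write a tiny stand-in .mat file (hypothetical keys and values).
path = os.path.join(tempfile.mkdtemp(), "toy.mat")
savemat(path, {"R": np.zeros((2, 3, 3)), "T": np.array([1.0, 2.0])})

contents = loadmat(path)

# Keys that start with "__" hold file metadata rather than data.
arrays = {k: v for k, v in contents.items() if not k.startswith("__")}
for name, value in arrays.items():
    print(name, value.shape)
```

Note that 1-D arrays come back as 2-D row vectors, so a call such as `arrays["T"].ravel()` may be needed before using them as labels. Printing the keys and shapes of the real file is a quick way to see what it contains.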
    rfhari
    @rfhari
    ohh okay. Thanks @rbharath!
    ^ This file shows how the .mat files can be loaded into deepchem datasets
    rfhari
    @rfhari
    Yeah, Thanks a lot, this works!
    Can you please clarify - the label "u0_atom" refers to what exactly?
    Bharath Ramsundar
    @rbharath
    I don't recall off the top of my head, but it might be atomization energy I think
    rfhari
    @rfhari
    ohh okay. Thanks a lot for the guidance!
    Bharath Ramsundar
    @rbharath
    As a reminder folks, GSoC applications are due tomorrow! Please make sure to submit if you plan on applying
    macca1996-bit
    @macca1996-bit
    Hey guys, hope everyone is staying healthy. Quick question: Is it possible to convert a graph conv vector representation of a molecule back into a SMILES string?
    Bharath Ramsundar
    @rbharath
    @macca1996-bit Good question! By graph conv vector representation, do you mean the extracted neural fingerprint? There's unfortunately not a great way to translate back to the original (the transformation wasn't designed to be invertible)
    What's your application? There might be a workaround possible for what you're looking to do
    rfhari
    @rfhari
    Hi @rbharath, I'm working on QM7 dataset. I got smiles from the following link https://github.com/deepchem/deepchem/blob/master/deepchem/molnet/load_function/qm7_datasets.py
    But it seems to contain only one label, so how is the "multitask" training mentioned in the MoleculeNet paper performed on it? Can you please explain?
    Bharath Ramsundar
    @rbharath
    @rfhari I believe that the multitask was done by combining data from many different datasets together
    I think there might have been one massive multitask model that combined all (or a large subset) of moleculenet
    rfhari
    @rfhari
    ohh, but from the above link, I could get a .csv file with a single label. Can you please guide me on getting this combined big dataset.
    Bharath Ramsundar
    @rbharath
    Hmm, I'll have to go back and check the manuscript tbh. I didn't run the models for this part so I don't recall how it was done
    I remember that the multitask models didn't actually work that great
    Are you trying to get a high performing model or are you interested in directly replicating MoleculeNet
    If it's the first, I think no need to do multitask
    rfhari
    @rfhari
    I'm trying to replicate and understand all the models. It's mentioned in the paper that, for the QM7 dataset, KRR (CM), i.e. the multitask model, performed the best among all. So that's why I'm a bit confused. Sorry for bothering you too much
    Bharath Ramsundar
    @rbharath
    No worries at all! I'm glad to help
    It's just that MoleculeNet was a large project with lots of contributors and lots of details
    So I don't know off-hand all the details. I think the paper and the code in the repo now are our best resources
    Have you gotten singletask models on QM7 running and giving reasonable numbers?
    rfhari
    @rfhari
    ohh okay, Thanks a lot @rbharath! I'll refer to those. Yeah, I got the single-task model working good.
    iherath
    @iherath
    Hi I am new to deepchem and was going through the tutorial titled "Predicting Ki of Ligands to a Protein", and I have some questions:
    Bharath Ramsundar
    @rbharath
    Welcome @iherath!
    What issues are you running into?
    iherath
    @iherath
    Thank you!
    Yeah how do you load your own protein into the notebook?
    Is there a place where I can find the necessary kind of csv file specified?
    Bharath Ramsundar
    @rbharath
    Ah, so that model is for a dataset of binding measurement for a particular protein
    It's not a structure based model where you can load a separate protein