    mukeshb23
    @mukeshb23
    @rbharath Dear sir, thank you so much for clarifying. Sir, I have one more doubt regarding this. When I mixed this dataset of 77 cardiovascular drugs with another large dataset and then reran the modelling solubility tutorial, I got a positive R^2 value and also more valid data after validation. Sir, what I am wondering is: is it OK for me to do this?
    Bharath Ramsundar
    @rbharath
    It's not unreasonable at all, but the main thing to do is to make sure your validation R^2 is measured on compounds from your original dataset (it's not useful if your numbers are for molecules you don't care about)
    Vishesh Mangla
    @XtremeGood
    @rbharath I'm not sure why you are using Jax. I think the new TensorFlow 2.0 is complete in itself, with even more support if we hit any errors. It is a totally different architecture from 1.0, which the TensorFlow community also believes was a problem.
    PyTorch and TensorFlow are both good.
    mukeshb23
    @mukeshb23
    @rbharath Dear sir, I am getting approx. 110 molecules after validating this dataset. Then how do I do this only for the target drugs? Is any specific code required to separate out these target drugs for measuring the R^2 value?
    macca1996-bit
    @macca1996-bit
    Would love to hear people's thoughts on this paper if anyone gets time to read it
    macca1996-bit
    @macca1996-bit
    Also, today I've started getting a new error when trying to use DeepChem in Colab; will make a forum post:
    ----> 3 import deepchem as dc
    Bharath Ramsundar
    @rbharath
    @macca1996-bit This is due to Colab updating to TF2.X as default. You need to add a fix as in this post: https://forum.deepchem.io/t/getting-deepchem-running-in-colab/81/3
    @XtremeGood Check out the discussion in https://forum.deepchem.io/t/gsoc-project-dynamic-deepchem/103 for more details about Jax/TF2.0
    @mukeshb23 You will need to write a script that separates your original dataset into train/valid/test. If you have an additional dataset, merge that dataset with your training set. Then train a model, and evaluate on your original valid dataset
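
The workflow described above (split the original dataset, add the extra data to the training portion only, then score on the original validation portion) can be sketched in plain Python. This is a minimal illustration, not DeepChem code: the records, the `split_original` helper, and the mean-value baseline are hypothetical stand-ins for a real dataset, splitter, and model.

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination; returns NaN if y_true has no variance."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        return float("nan")
    return 1.0 - ss_res / ss_tot


def split_original(records, frac_train=0.8, frac_valid=0.1, seed=0):
    """Hypothetical helper: shuffle and cut records into train/valid/test."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(frac_train * len(shuffled))
    n_valid = int(frac_valid * len(shuffled))
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


# Made-up stand-in data: (identifier, measured value) pairs.
original = [(f"drug_{i}", float(i % 7)) for i in range(77)]
extra = [(f"extra_{i}", float(i % 5)) for i in range(500)]

# 1. Split ONLY the original dataset.
train, valid, test = split_original(original)

# 2. The additional dataset goes into the training portion only.
train_merged = train + extra

# 3. Stand-in "model": always predict the mean of the merged training values.
train_mean = sum(value for _, value in train_merged) / len(train_merged)
valid_pred = [train_mean] * len(valid)

# 4. Score on the original validation compounds only.
valid_r2 = r_squared([value for _, value in valid], valid_pred)
print(f"validation R^2 on original compounds: {valid_r2:.3f}")
```

Because `valid` is drawn from the original dataset before any merging, the reported R^2 describes only the compounds of interest, regardless of how large the extra dataset is.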
    Bharath Ramsundar
    @rbharath
    Thanks to @nd-02110114 for a timely fix on colab! deepchem/deepchem#1789
    The tutorials should now run correctly with this update in place, CC @macca1996-bit
    Vishesh Mangla
    @XtremeGood
    hi @rbharath
    For the project Dynamic DeepChem do you require some existing code to be rewritten in Jax?
    Is that the project's goal?
    Vishesh Mangla
    @XtremeGood
    If that's the case, can you tell me which models are to be reworked so that I can prepare the timeline for my proposal?
    macca1996-bit
    @macca1996-bit
    @rbharath that fixed the problem, thanks a lot for your help :)
    macca1996-bit
    @macca1996-bit
    quick question: Is it better to transform my data before or after the train, test, valid split? In my case I'm using a balancing transformer
    Based on my own research I get the impression that I should be doing it after data splitting
    Bharath Ramsundar
    @rbharath
    @XtremeGood The basic goal would be to implement graph convolutions in Jax. And perhaps another model or two if time permits
    Vishesh Mangla
    @XtremeGood
    Do you mean like in convolutional neural networks?
    move a filter across the image?
    Bharath Ramsundar
    @rbharath
    @macca1996-bit Glad to be of help :). You want to transform after the split. The reason for this is to prevent information leakage from the validation/test set into the training set
    @XtremeGood Yes, graph convolutions are the generalization of 2D convolutions to arbitrary graphs rather than grids. I'd recommend working through DeepChem tutorial #4 as an intro.
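
The "transform after the split" advice given to @macca1996-bit above can be illustrated with a simple normalization example in plain Python. This is only a sketch under assumed toy values: `fit_normalizer` and `apply_normalizer` are hypothetical helpers, not DeepChem functions. The point is that the statistics are computed from the training values alone and then reused unchanged on the validation and test values.

```python
import statistics


def fit_normalizer(train_values):
    """Compute mean and population std from the training split only."""
    mean = statistics.mean(train_values)
    std = statistics.pstdev(train_values)
    return mean, std


def apply_normalizer(values, mean, std):
    """Apply already-fitted statistics to any split."""
    return [(v - mean) / std for v in values]


# Toy label values for the three splits (made up for illustration).
train_y = [1.0, 2.0, 3.0, 4.0]
valid_y = [10.0]
test_y = [0.0]

# Fit on train only: valid_y and test_y never influence mean or std.
mean, std = fit_normalizer(train_y)

train_z = apply_normalizer(train_y, mean, std)
valid_z = apply_normalizer(valid_y, mean, std)
test_z = apply_normalizer(test_y, mean, std)
```

Had the statistics been fitted on all six values before splitting, the outlying validation value of 10.0 would have shifted the mean and scale seen during training, which is the leakage being warned about.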
    Vishesh Mangla
    @XtremeGood
    ya I got it
    graph can be represented by adjacency matrix
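
To make the adjacency-matrix picture concrete, here is a deliberately simplified sketch of one graph-convolution-style update in plain Python. It is an assumption-laden toy, not DeepChem's or Jax's implementation: real graph convolutions use learned weight matrices, whereas this version uses two scalar weights (`self_weight`, `neighbor_weight`) so the arithmetic is easy to follow.

```python
def graph_conv_step(adjacency, features, self_weight, neighbor_weight):
    """One simplified update: each node mixes its own features with the
    sum of its neighbors' features, then applies a ReLU."""
    n = len(adjacency)
    updated = []
    for i in range(n):
        neighbor_sum = [0.0] * len(features[i])
        for j in range(n):
            if adjacency[i][j]:
                for k, value in enumerate(features[j]):
                    neighbor_sum[k] += value
        updated.append([
            max(0.0, self_weight * own + neighbor_weight * nb)
            for own, nb in zip(features[i], neighbor_sum)
        ])
    return updated


# Heavy atoms of ethanol as a 3-node path graph: C - C - O.
adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
# Toy one-hot atom features: [is_carbon, is_oxygen].
features = [
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
]

new_features = graph_conv_step(adjacency, features,
                               self_weight=1.0, neighbor_weight=0.5)
print(new_features)  # [[1.5, 0.0], [1.5, 0.5], [0.5, 1.0]]
```

Unlike a 2D image filter, which always sees a fixed grid of neighboring pixels, the set of neighbors here is read from the adjacency matrix, so the same update works for nodes with any number of bonds.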
    Vishesh Mangla
    @XtremeGood
    by the way are there any prerequisites like merging a pr or something with deepchem?
    Vishesh Mangla
    @XtremeGood
    also the page has more open/merged requests than tutorials.
    macca1996-bit
    @macca1996-bit

    @rbharath okay thanks. So is this suitable:
    splitter = dc.splits.RandomSplitter()
    train_dataset, valid_dataset, test_dataset = splitter.train_valid_test_split(dataset)

    transformers = [dc.trans.BalancingTransformer(transform_w=True, dataset=dataset)]

    for dataset in [train_dataset, valid_dataset, test_dataset]:
        for transformer in transformers:
            dataset = transformer.transform(dataset)

    mukeshb23
    @mukeshb23
    @rbharath thank you sir, but I don't know how to write a script that separates my original data. Please help me to solve this, sir.
    Bharath Ramsundar
    @rbharath
    @XtremeGood It's not a prerequisite to apply, but applications are competitive, so it helps to have a record of contributing. Our GSoC student last year had made a few smaller contributions already I think
    @macca1996-bit When initializing the transformer, make sure to use train_dataset. So dc.trans.BalancingTransformer(transform_w=True, dataset=train_dataset)
    @mukeshb23 Have you tried working through the DeepChem tutorials? They walk through use of the splitters and can probably provide guidance
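
The reason for initializing the balancing transformer with `train_dataset` can be shown with a small plain-Python sketch of class balancing. This is not DeepChem's `BalancingTransformer` implementation; the helper names and the weighting rule (largest class count divided by each class count) are assumptions chosen for illustration. What matters is that the class weights are derived from the training labels only and then applied as-is to the other splits.

```python
from collections import Counter


def fit_balancing_weights(train_labels):
    """Derive one weight per class from the training labels only, so that
    every class carries the same total weight within the training set."""
    counts = Counter(train_labels)
    largest = max(counts.values())
    return {label: largest / count for label, count in counts.items()}


def apply_weights(labels, class_weights):
    """Look up the fitted weight for each example in any split.
    Assumes every label was seen during fitting."""
    return [class_weights[label] for label in labels]


# Toy binary labels: the training set is imbalanced 6:2.
train_labels = [0, 0, 0, 0, 0, 0, 1, 1]
valid_labels = [0, 1, 1]

class_weights = fit_balancing_weights(train_labels)
train_w = apply_weights(train_labels, class_weights)
valid_w = apply_weights(valid_labels, class_weights)

print(class_weights)  # {0: 1.0, 1: 3.0}
print(valid_w)        # [1.0, 3.0, 3.0]
```

If the weights were fitted on the full dataset before splitting, the class ratio of the validation and test examples would leak into the weights used for training.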
    rfhari
    @rfhari

    Hi @rbharath, I'm working on the QM7/QM7b dataset. I downloaded the dataset from the following link: http://moleculenet.ai/datasets-1.

    But the .mat file doesn't seem to have the SMILES format of the molecules. There are only 3D Cartesian data and electronic properties available. Can you please guide me in getting the SMILES as well?

    Bharath Ramsundar
    @rbharath
    @rfhari I think you can use RDKit to load molecules from an SDF file:
    from rdkit import Chem
    suppl = Chem.SDMolSupplier('file.sdf')
    mols = [x for x in suppl if x is not None]  # skip entries RDKit could not parse
    rfhari
    @rfhari
    Thanks @rbharath! But I think, only .mat file is there in the above link (http://moleculenet.ai/datasets-1). There ain't any sdf file or other files in the link
    Bharath Ramsundar
    @rbharath
    Ah, I see. You can load .mat files with scipy: https://scipy-cookbook.readthedocs.io/items/Reading_mat_files.html
    I think the original datasets were processed with matlab for some reason
    rfhari
    @rfhari
    ohh okay. Thanks @rbharath!
    ^ This file shows how the .mat files can be loaded into deepchem datasets
    rfhari
    @rfhari
    Yeah, Thanks a lot, this works!
    Can you please clarify: what exactly does the label "u0_atom" refer to?
    Bharath Ramsundar
    @rbharath
    I don't recall off the top of my head, but it might be atomization energy I think
    rfhari
    @rfhari
    ohh okay. Thanks a lot for the guidance!
    Bharath Ramsundar
    @rbharath
    As a reminder folks, GSoC applications are due tomorrow! Please make sure to submit if you plan on applying
    macca1996-bit
    @macca1996-bit
    Hey guys, hope everyone is staying healthy. Quick question: Is it possible to convert a graph conv vector representation of a molecule back into a SMILES string?
    Bharath Ramsundar
    @rbharath
    @macca1996-bit Good question! By graph conv vector representation, do you mean the extracted neural fingerprint? There's unfortunately not a great way to translate back to the original (the transformation wasn't designed to be invertible)
    What's your application? There might be a workaround possible for what you're looking to do