    mukeshb23
    @mukeshb23
    Dear sir, I have a dataset of 76 cardiovascular drugs with experimental solubility values, which I prepared myself. When I load this dataset in the DeepChem modeling solubility tutorial, I find only 8 valid datapoints after validation, and I am also getting a negative R^2 value. Is it possible to get a positive R^2 value and good results with 77 datapoints? Please help me with this.
    Bharath Ramsundar
    @rbharath
    @XtremeGood This sounds like a project from a different OpenChemistry team. The ones I'm personally mentoring are the DeepChem project track (the Jax and transfer learning project). The application process is competitive, so it's normal to have multiple applications for the same project. If you'd like feedback on your ideas for the DeepChem project track, feel free to ping me anytime (you can DM me on here)
    @mukeshb23 76 datapoints is a very small dataset. The machine learning tools we have will likely struggle with a dataset this small. I'd recommend trying to build a simple random forest model to start and going from there, but it might be hard. Maybe see if there's a way to get more datapoints to improve learning
    mukeshb23
    @mukeshb23
    @rbharath Dear sir, thank you so much for clarifying. Sir, I have one more doubt regarding this. When I mixed these 77 cardiovascular drugs with another large dataset and reran the modeling solubility tutorial, I got a positive R^2 value and also more valid data after validation. Sir, is it acceptable to do this?
    Bharath Ramsundar
    @rbharath
    It's not unreasonable at all, but the main thing to do is to make sure your validation R^2 is measured on compounds from your original dataset (it's not useful if your numbers are for molecules you don't care about)
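A minimal sketch of the point above about measuring validation R^2 only on compounds from the original dataset. It uses plain Python rather than DeepChem, and the numbers, the `is_target` flags, and the helper names are made up for illustration.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def r2_on_target(y_true, y_pred, is_target):
    """R^2 computed only over rows flagged as coming from the original dataset."""
    pairs = [(t, p) for t, p, keep in zip(y_true, y_pred, is_target) if keep]
    return r2_score([t for t, _ in pairs], [p for _, p in pairs])


# Made-up validation values: the first three rows are target drugs,
# the rest come from the larger auxiliary dataset.
y_true = [-2.0, -3.0, -4.0, -1.0, -5.0, -6.0]
y_pred = [-3.5, -3.0, -2.5, -1.1, -5.0, -5.9]
is_target = [True, True, True, False, False, False]

overall_r2 = r2_score(y_true, y_pred)
target_r2 = r2_on_target(y_true, y_pred, is_target)
print(f"all rows: {overall_r2:.3f}, target drugs only: {target_r2:.3f}")
```

In this toy example the pooled R^2 is positive while the R^2 on the target drugs alone is negative, which is the situation the pooled number would hide.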
    Vishesh Mangla
    @XtremeGood
    @rbharath I'm not sure why you are using Jax. I think the new TensorFlow 2.0 is complete in itself, with even more support available if we run into errors. It is a totally different architecture from 1.0, which the TensorFlow community also believes was a problem.
    PyTorch and TensorFlow are both good.
    mukeshb23
    @mukeshb23
    @rbharath Dear sir, I am getting approx. 110 molecules after validating this dataset. How do I measure R^2 only for the target drugs? Is any specific code required to separate out these target drugs?
    macca1996-bit
    @macca1996-bit
    Would love to hear people's thoughts on this paper if anyone gets time to read it
    macca1996-bit
    @macca1996-bit
    Also, today I've started getting a new error when trying to use DeepChem in Colab, will make a forum post:
    ----> 3 import deepchem as dc
    Bharath Ramsundar
    @rbharath
    @macca1996-bit This is due to Colab updating to TF2.X as default. You need to add a fix as in this post: https://forum.deepchem.io/t/getting-deepchem-running-in-colab/81/3
    @XtremeGood Check out the discussion in https://forum.deepchem.io/t/gsoc-project-dynamic-deepchem/103 for more details about Jax/TF2.0
    @mukeshb23 You will need to write a script that separates your original dataset into train/valid/test. If you have an additional dataset, merge that dataset with your train set. Then train a model, and evaluate on your original valid dataset
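The split-then-merge recipe above can be sketched as follows. This is plain Python rather than DeepChem's splitters, and the row contents, the 80/10/10 fractions, and the names `target_rows` and `extra_rows` are assumptions for illustration; real data would be read from your own files.

```python
import random


def split_indices(n, frac_train=0.8, frac_valid=0.1, seed=0):
    """Shuffle 0..n-1 and cut the result into train/valid/test index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = int(frac_train * n)
    n_valid = int(frac_valid * n)
    return (indices[:n_train],
            indices[n_train:n_train + n_valid],
            indices[n_train + n_valid:])


# Hypothetical stand-ins for (molecule, solubility) rows.
target_rows = [(f"target_{i}", -float(i % 7)) for i in range(76)]
extra_rows = [(f"extra_{i}", -float(i % 5)) for i in range(1000)]

train_idx, valid_idx, test_idx = split_indices(len(target_rows))

# Only the training set is augmented with the larger dataset;
# valid and test contain target drugs exclusively.
train_rows = [target_rows[i] for i in train_idx] + extra_rows
valid_rows = [target_rows[i] for i in valid_idx]
test_rows = [target_rows[i] for i in test_idx]

print(len(train_rows), len(valid_rows), len(test_rows))
```

If the additional dataset might contain some of the same molecules as the original one, it is probably worth removing those duplicates from it before merging, so that validation molecules do not end up in the training set.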
    Bharath Ramsundar
    @rbharath
    Thanks to @nd-02110114 for a timely fix on colab! deepchem/deepchem#1789
    The tutorials should now run correctly with this update in place, CC @macca1996-bit
    Vishesh Mangla
    @XtremeGood
    hi @rbharath
    For the project Dynamic DeepChem do you require some existing code to be rewritten in Jax?
    Is that the project's goal?
    Vishesh Mangla
    @XtremeGood
    If that's the case, can you tell me which models are to be reworked so that I can prepare the timeline for my proposal?
    macca1996-bit
    @macca1996-bit
    @rbharath that fixed the problem, thanks a lot for your help :)
    macca1996-bit
    @macca1996-bit
    Quick question: is it better to transform my data before or after the train, test, valid split? In my case I'm using a balancing transformer
    Based on my own research I get the impression that I should be doing it after data splitting
    Bharath Ramsundar
    @rbharath
    @XtremeGood The basic goal would be to implement graph convolutions in Jax. And perhaps another model or two if time permits
    Vishesh Mangla
    @XtremeGood
    Do you mean like in convolutional neural networks?
    move a filter across the image?
    Bharath Ramsundar
    @rbharath
    @macca1996-bit Glad to be of help :). You want to transform after the split. The reason for this is to prevent information leakage from the validation/test set into the training set
    @XtremeGood Yes, graph convolutions are the generalization of 2D convolutions to arbitrary graphs rather than grids. I'd recommend working through DeepChem tutorial #4 as an intro.
    Vishesh Mangla
    @XtremeGood
    ya I got it
    graph can be represented by adjacency matrix
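To make the adjacency-matrix view concrete, here is a small sketch of one simplified graph-convolution step: each node averages its own features with those of its neighbors, applies a weight matrix, and then a ReLU. This is an illustrative simplification in plain Python, not DeepChem's graph convolution layer or the proposed Jax implementation, and the toy graph and weights are made up.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def graph_conv(adjacency, features, weight):
    """One simplified graph-convolution step.

    Each node averages its own feature vector with its neighbors'
    (as given by the adjacency matrix), then applies `weight` and a ReLU.
    """
    n = len(adjacency)
    n_feat = len(features[0])
    aggregated = []
    for i in range(n):
        group = [i] + [j for j in range(n) if adjacency[i][j] and j != i]
        aggregated.append([sum(features[j][c] for j in group) / len(group)
                           for c in range(n_feat)])
    projected = matmul(aggregated, weight)
    return [[max(0.0, v) for v in row] for row in projected]


# Toy path graph 0 - 1 - 2 with two features per node.
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
features = [[1.0, 0.0],
            [0.0, 1.0],
            [1.0, 0.0]]
identity = [[1.0, 0.0],
            [0.0, 1.0]]

out = graph_conv(adjacency, features, identity)
print(out)
```

A 2D image convolution is the special case where the graph is a regular pixel grid and every node has the same neighborhood layout; on a molecule the neighborhood size varies from atom to atom, which is why the aggregation is written per node here.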
    Vishesh Mangla
    @XtremeGood
    by the way are there any prerequisites like merging a pr or something with deepchem?
    Vishesh Mangla
    @XtremeGood
    also the page has open/merged requests rather than tutorials.
    macca1996-bit
    @macca1996-bit

    @rbharath okay thanks. So is this suitable:
    splitter = dc.splits.RandomSplitter()
    train_dataset, valid_dataset, test_dataset = splitter.train_valid_test_split(dataset)

    transformers = [dc.trans.BalancingTransformer(transform_w=True, dataset=dataset)]

    for dataset in [train_dataset, valid_dataset, test_dataset]:
        for transformer in transformers:
            dataset = transformer.transform(dataset)

    mukeshb23
    @mukeshb23
    @rbharath thank you sir, but I don't know how to write a script that separates my original data. Please help me to solve this, sir.
    Bharath Ramsundar
    @rbharath
    @XtremeGood It's not a prerequisite to apply, but applications are competitive, so it helps to have a record of contributing. Our GSoC student last year had made a few smaller contributions already I think
    @macca1996-bit When initializing the transformer, make sure to use train_dataset. So dc.trans.BalancingTransformer(transform_w=True, dataset=train_dataset)
    @mukeshb23 Have you tried working through the DeepChem tutorials? They walk through use of the splitters and can probably provide guidance
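The advice to transform after the split, and to initialize the transformer from the training set, can be illustrated with a small sketch. It does not use DeepChem: the weighting formula (total count divided by number of classes times class count) is one common balancing scheme chosen here as an assumption, not necessarily what BalancingTransformer computes, and the labels are made up.

```python
from collections import Counter


def fit_class_weights(train_labels):
    """Derive one weight per class from the training labels only."""
    counts = Counter(train_labels)
    n = len(train_labels)
    return {label: n / (len(counts) * count) for label, count in counts.items()}


def apply_weights(labels, class_weights):
    """Look up the already-fitted weight for each label."""
    return [class_weights[label] for label in labels]


train_labels = [0, 0, 0, 0, 0, 0, 1, 1]
valid_labels = [0, 1, 1]

# Statistics are fitted on the training split only, so nothing about the
# validation label distribution leaks into them.
class_weights = fit_class_weights(train_labels)
train_w = apply_weights(train_labels, class_weights)
valid_w = apply_weights(valid_labels, class_weights)

print(class_weights)
```

Had the weights been fitted on the full dataset before splitting, the class counts of the validation and test molecules would have influenced the training weights, which is the information leakage mentioned above.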
    rfhari
    @rfhari

    Hi @rbharath, I'm working on the QM7/QM7b dataset. From the following link, I downloaded the dataset - http://moleculenet.ai/datasets-1.

    But the .mat file doesn't seem to have the SMILES format of the molecules. There are only 3D Cartesian data and electronic properties available. Can you please guide me in getting the SMILES as well?

    Bharath Ramsundar
    @rbharath
    @rfhari I think you can use RDKit to load molecules from an SDF file:
    from rdkit import Chem
    suppl = Chem.SDMolSupplier('file.sdf')
    mols = [x for x in suppl]
    rfhari
    @rfhari
    Thanks @rbharath! But I think, only .mat file is there in the above link (http://moleculenet.ai/datasets-1). There ain't any sdf file or other files in the link
    Bharath Ramsundar
    @rbharath
    Ah, I see. You can load .mat files with scipy: https://scipy-cookbook.readthedocs.io/items/Reading_mat_files.html
    I think the original datasets were processed with matlab for some reason
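A minimal sketch of reading a .mat file with scipy, as suggested above. To stay runnable without the real download, it first writes a small stand-in file; the key names "coords" and "energy" are hypothetical and are not claimed to match the keys in the QM7 file.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat


def load_mat_arrays(path):
    """Load a .mat file and drop the metadata keys that loadmat adds."""
    contents = loadmat(path)
    return {k: v for k, v in contents.items() if not k.startswith("__")}


# Stand-in file so the sketch runs on its own; replace demo_path with the
# path of the downloaded .mat file to inspect the real arrays.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.mat")
    savemat(demo_path, {"coords": np.zeros((4, 3)),
                        "energy": np.array([[-1.5, -2.5]])})
    arrays = load_mat_arrays(demo_path)

# Printing names and shapes is a quick way to see what a file contains.
for name, value in arrays.items():
    print(name, value.shape)
```

Listing the keys and shapes this way is usually the first step in working out which array holds coordinates and which holds the property values.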
    rfhari
    @rfhari
    ohh okay. Thanks @rbharath!
    ^ This file shows how the .mat files can be loaded into deepchem datasets
    rfhari
    @rfhari
    Yeah, Thanks a lot, this works!
    Can you please clarify what exactly the label "u0_atom" refers to?
    Bharath Ramsundar
    @rbharath
    I don't recall off the top of my head, but it might be atomization energy I think
    rfhari
    @rfhari
    ohh okay. Thanks a lot for the guidance!
    Bharath Ramsundar
    @rbharath
    As a reminder folks, GSoC applications are due tomorrow! Please make sure to submit if you plan on applying