    Shibo Li
    @lichman0405

    Hello everyone, is there a tutorial for the Seq2Seq model? I am trying to predict the sigma profile of molecules from their SMILES and I can't find any tutorials. Suggestions are most welcome, thanks

    The gensim library provides seq2seq modules, if I'm not wrong. Kenkov showed a demo of it: https://github.com/kenkov/seq2seq

    Bharath Ramsundar
    @rbharath
    I'm pleased to share that DeepChem will again be part of Google Summer of Code! Please see our idea list at https://forum.deepchem.io/t/brainstorming-gsoc-2022-topics/658
    4 replies
    Please consider applying
    Usmanovsky
    @Usmanovsky
    Hello all, please is there a way to display the training loss of the Seq2Seq model? I need to figure out if I'm overtraining my model. I've been going through the source code with no luck
    Bharath Ramsundar
    @rbharath
    @Usmanovsky You can use a callback to do this
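    The callback pattern Bharath mentions can be sketched without DeepChem installed. DeepChem's fit() accepts a callbacks argument and invokes each callback as callback(model, step); the LossLoggingCallback and MockModel below are hypothetical stand-ins for illustration, not DeepChem classes:

    ```python
    # Sketch of the callback pattern used by DeepChem's fit(): each callback
    # is a callable invoked as callback(model, step). MockModel is a stand-in
    # so the example runs without DeepChem; its current_loss attribute is an
    # assumed hook, not a real DeepChem API.

    class LossLoggingCallback:
        """Record the model's current training loss every `interval` steps."""

        def __init__(self, interval=10):
            self.interval = interval
            self.history = []  # list of (step, loss) pairs

        def __call__(self, model, step):
            if step % self.interval == 0:
                self.history.append((step, model.current_loss))


    class MockModel:
        """Stand-in for a DeepChem model; pretends loss decays over steps."""

        def __init__(self):
            self.current_loss = 1.0

        def fit(self, n_steps, callbacks):
            for step in range(1, n_steps + 1):
                self.current_loss *= 0.99  # fake optimization progress
                for cb in callbacks:
                    cb(self, step)


    logger = LossLoggingCallback(interval=50)
    model = MockModel()
    model.fit(200, callbacks=[logger])
    print(logger.history[0][0], logger.history[-1][0])  # steps 50 and 200
    ```

    With a real DeepChem model, the usual route is dc.models.ValidationCallback(valid_dataset, interval, [metric]) passed via callbacks=[...], which evaluates metrics on a validation set at a fixed interval during training.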
    Usmanovsky
    @Usmanovsky
    Thanks @rbharath. I tried using a callback and got this error. Am I doing something wrong? (error screenshot attached)
    Anshuman Mishra
    @shivance
    Is anyone aware of use cases of CFD (computational fluid dynamics) in the field of drug discovery and medicine?
    Bharath Ramsundar
    @rbharath
    @Usmanovsky I think the arguments aren't set correctly, perhaps. One of the tutorials (perhaps Advanced Training) has a usage example that might help
    @shivance There is some work modeling fluid flow from the heart and in blood that I believe may be relevant
    Usmanovsky
    @Usmanovsky
    Can anyone please tell me what type of loss function this is? I saw it in the SeqToSeq class:
    (screenshot attached)
    deloragaskins
    @deloragaskins

    @shivance I would check out the abstracts at last fall's DFD: https://meetings.aps.org/Meeting/DFD21/SessionIndex2

    and also search arXiv. As Bharath said, there's heart-related work:
    https://arxiv.org/search/?query=heart+CFD&searchtype=all&abstracts=show&order=-announced_date_first&size=50

    There's a lot out there so hopefully you find something fun to play with (:

    1 reply
    andrew_007
    @andrew_007:matrix.org [m]

    Hello DeepChem community, I'm Andrew. Thank you for the excellent library (I'm new to it).

    I have this task of classifying some images related to a project I'm working on and want to use CNNs. Before I start with my task, I was hoping to get my hands dirty with the popular MNIST dataset and use DeepChem's CNN implementation on it. I have been through the docs of CNN but I can't understand what the parameters "n_tasks" and "n_features" represent. Is n_features = image length x image height x number of channels? It would be helpful if someone could give me some idea about it.

    Thank you in advance.

    Bharath Ramsundar
    @rbharath
    n_tasks is the number of outputs predicted. For MNIST this should be 1, I think
    For n_features, offhand this may be the number of channels? I'm not sure but would recommend checking out the CNN docs. Hope that helps!
    andrew_007
    @andrew_007:matrix.org [m]

    Hello Bharath, I went through the docs, but they only say n_tasks = number of tasks and n_features = number of features, which was unclear to me. However, after some trial and error I'm able to produce high accuracy with the following parameters:

    n_samples = 10000
    n_features = 28
    n_tasks = 1
    
    # Define Metric.
    classification_metric = dc.metrics.Metric(dc.metrics.roc_auc_score)
    
    model = dc.models.CNN(
        n_tasks,
        n_features,
        dims=1,
        dropouts=0,
        kernel_size=3,
        mode="classification",
        learning_rate=0.003,
        n_classes=10)
    
    # Fit Model.
    model.fit(dataset, nb_epoch=50)
    
    # Evaluate Model.
    scores = model.evaluate(dataset, [classification_metric], n_classes=10)
    print(f"score = {scores[classification_metric.name]}")

    I have 2 doubts:
    First: if n_tasks is the number of outputs, shouldn't n_tasks be 10, since I want the output to be classwise probabilities? (And MNIST has 10 classes.)
    Second: why does n_features = 28 even work? If image height and width were different, which one would we enter?

    Bharath Ramsundar
    @rbharath
    n_tasks is 1 since there is only one task (predict class of image)
    I'm not sure on height vs. width. I believe we should support rectangular images, but not sure offhand!
    Sherif Elsabbagh
    @sherifelsabbagh
    Hi, I have a question related to graph convolutions on the Delaney dataset.
    I split the dataset and used this code:
    model = dc.models.GraphConvModel(n_tasks=1, mode="regression", dropout=0.2)
    When I try to fit the model on the train set using this code:
    model.fit(train, nb_epoch=300, loss=dc.models.losses.L1())
    I get an error, and when I remove the loss function it runs fine. So what is the problem with the loss function here?
    Bharath Ramsundar
    @rbharath
    Hmm, I'm not sure. You typically don't need to set a loss function for models (they have a default loss). Maybe just try removing the loss?
    Sherif Elsabbagh
    @sherifelsabbagh
    I am following Tutorial 9, "Advanced Training", but I can't find dc.models.MultiTaskClassifier. Was it removed or moved to another name?
    Sherif Elsabbagh
    @sherifelsabbagh

    One other question: when I use model.predict on a classification task, it always returns two values, like this:

    array([[[9.8883396e-01, 1.1165977e-02]],

       [[3.7872460e-01, 6.2127537e-01]],
    
       [[8.4630173e-01, 1.5369828e-01]],

    instead of returning one value (0 or 1). How can I solve this?

    Bharath Ramsundar
    @rbharath
    The two values are the class probability for the 0 and 1 class
    You can threshold the values to get the class choice
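    The shape Bharath describes can be sketched with NumPy: classification predict() returns per-class probabilities of shape (n_samples, n_tasks, n_classes), and argmax over the class axis (or 0.5 thresholding in the binary case) gives the class choice. The probabilities below are just the example values from the message above:

    ```python
    import numpy as np

    # Probabilities shaped like DeepChem's classification predict() output:
    # (n_samples, n_tasks, n_classes) -- here 3 samples, 1 task, 2 classes.
    probs = np.array([[[9.8883396e-01, 1.1165977e-02]],
                      [[3.7872460e-01, 6.2127537e-01]],
                      [[8.4630173e-01, 1.5369828e-01]]])

    # Class choice: index of the largest probability along the class axis.
    classes = np.argmax(probs, axis=-1).squeeze()
    print(classes)  # -> [0 1 0]

    # Equivalent binary thresholding on the probability of class 1:
    thresholded = (probs[:, 0, 1] > 0.5).astype(int)
    ```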
    Sherif Elsabbagh
    @sherifelsabbagh

    (error screenshot attached)

    This is what I get. This happens also with GATModel.

    Should I install PyTorch?

    Bharath Ramsundar
    @rbharath
    Yes, I think installing PyTorch should fix the issue
    Aryan Amit Barsainyan
    @ARY2260
    What platform will be used to hold regular conversations between mentors and GSoC contributors?
    1 reply
    Julius Park
    @juliusgeo
    Is there a summary of M1 support / how to install tensorflow-metal in combination with DeepChem? If not, I would be interested in submitting a PR for that.
    Bharath Ramsundar
    @rbharath
    @juliusgeo Yes, please go ahead! We have very limited M1 support right now and improving our docs would help
    10 replies
    LinuxFTW
    @LinuxFTW
    Welp, I accidentally pressed delete instead of edit.
    First things first: the website displays the latest stable release as 2.5.0, despite it being 2.6.1 on GitHub, which shows the PyPI and Conda packages at version 2.6.1. Is there a reason for this? The documentation all displays information for 2.6.1, which can lead some of us astray. Secondly, and this is more of a theoretical question: I'm trying to generate enzymes from scratch given a chemical reaction. Would it be a good proof of concept to use a featurized reaction as input and a featurized protein as output? Later on I'll have to incorporate 3D structure into the mix, but this is the plan for now.
    10 replies
    Bharath Ramsundar
    @rbharath
    @LinuxFTW Ah oops, the website should be updated. 2.6.1 is the latest stable release, so you should use that
    Generating enzymes from scratch is a hard challenge. In general, the more complex the output, the harder the learning challenge.
    It's today at 7pm PST, and I'm glad to talk through your question in more detail there
    LinuxFTW
    @LinuxFTW
    Just created a PR for the website deepchem/deepchem.github.io#3
    Julius Park
    @juliusgeo
    Just created a forum post for m1 mac support: https://forum.deepchem.io/t/deepchem-m1-mac-support/746
    Bharath Ramsundar
    @rbharath
    Great thanks folks!
    LinuxFTW
    @LinuxFTW
    For some reason when I try to featurize sequences, the featurization finishes, then the RAM spikes and the script crashes.
    I have roughly 19,084 sequences that I'm featurizing, with a maximum length of ~7000
    7 replies
    LinuxFTW
    @LinuxFTW
    (error screenshot attached)
    Bharath Ramsundar
    @rbharath
    @LinuxFTW Which featurizer are you using?
    There are some tricks to control memory usage when featurizing
    Biggest one is to use a DataLoader class which featurizes data in chunks at a time and should have bounded memory usage
    You might be trying to featurize directly in memory which could be causing the blowup
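    The shard idea above can be sketched with a plain generator so it runs standalone (the real fix is one of DeepChem's loader classes, e.g. dc.data.CSVLoader or dc.data.FASTALoader, which featurize shard-by-shard and write each shard to disk; featurize_one below is a toy stand-in for a real featurizer):

    ```python
    # Sketch of shard-wise featurization: process sequences in fixed-size
    # chunks so peak memory is bounded by the shard size, not the dataset
    # size. featurize_one() is a hypothetical stand-in for a real featurizer.

    def featurize_one(seq):
        """Toy featurizer: just the length of the sequence."""
        return len(seq)

    def featurize_in_shards(sequences, shard_size=1000):
        """Yield featurized shards one at a time instead of one giant array."""
        shard = []
        for seq in sequences:
            shard.append(featurize_one(seq))
            if len(shard) == shard_size:
                yield shard  # caller can write this shard to disk
                shard = []
        if shard:
            yield shard  # final partial shard

    sequences = ["ACGT" * n for n in range(1, 2501)]  # 2,500 toy sequences
    shards = list(featurize_in_shards(sequences, shard_size=1000))
    print(len(shards), len(shards[0]), len(shards[-1]))  # -> 3 1000 500
    ```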
    LinuxFTW
    @LinuxFTW
    I think currently I'm going to stick with the 9,000 sequences I have set up in a DeepChem dataset and move on with it.
    1 reply
    Although in the future I will come back and set it up properly with all 19,084.
    gayanechilingar
    @gayanechilingar
    Hi everyone, I downloaded a MoleculeNet dataset from DeepChem. I want to know: are the SMILES canonical or not?
    gayanechilingar
    @gayanechilingar
    tasks, datasets, transformers = dc.molnet.load_{dataset['dataset_name']}(splitter=dataset['split_type'], featurizer = 'ECFP')
    it is my command
    Bharath Ramsundar
    @rbharath
    @gayanechilingar I believe the SMILES are canonicalized by default, but I'm not sure offhand
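    One way to check for yourself, assuming RDKit is installed: Chem.MolToSmiles produces RDKit's canonical SMILES by default, so a string that equals its own round-trip is already in canonical form.

    ```python
    from rdkit import Chem

    def canonical(smiles):
        """Return RDKit's canonical SMILES for the input string."""
        return Chem.MolToSmiles(Chem.MolFromSmiles(smiles))

    # Two spellings of ethanol canonicalize to the same string:
    print(canonical("C(C)O") == canonical("OCC"))  # -> True

    # To check a dataset entry, compare it to its own canonical form:
    s = "CCO"
    print(s == canonical(s))
    ```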