    Henrique Junior
    @HenriqueCSJ

    hello @HenriqueCSJ , no worries at all! Have you looked over the DeepChem tutorials? Those might be what you’re looking for :)

    Dear @alat-rights , thank you for your kind reply. I was intending to use DeepChem in my PhD project, but from reading around the web it looks like the DeepChem approach may be a little hard to adapt to organometallic compounds and my own DataFrame (which is based on Cartesian coordinates). That is why I'm trying to learn how to do it "by hand".

    Bharath Ramsundar
    @rbharath
    @HenriqueCSJ Ah interesting! I don't have much experience with organometallic compounds and I'd be glad for us to add better DeepChem support. A couple of questions: do you have SDF files with your compounds available? If so, you can use the CoulombMatrix featurizer to featurize them directly: https://deepchem.readthedocs.io/en/latest/featurizers.html#coulombmatrix. The featurizer outputs numpy arrays, so they can be fed directly into simple machine learning models like dc.models.MultitaskRegressor.
    Also +1 to @alat-rights 's recommendation of the DeepChem tutorials. We don't cover organometallic compounds, but working through them will give you a very solid grounding in how to use DeepChem to solve problems :)
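The Coulomb matrix that the CoulombMatrix featurizer computes can be sketched in plain numpy: M_ii = 0.5 * Z_i^2.4 on the diagonal and M_ij = Z_i * Z_j / |R_i - R_j| off it. A minimal sketch with a toy water-like geometry (DeepChem's featurizer additionally handles padding and atom ordering on top of this):

```python
import numpy as np

def coulomb_matrix(Z, R):
    """Coulomb matrix: M_ii = 0.5 * Z_i**2.4, M_ij = Z_i * Z_j / |R_i - R_j|."""
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    n = len(Z)
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                M[i, j] = 0.5 * Z[i] ** 2.4
            else:
                M[i, j] = Z[i] * Z[j] / np.linalg.norm(R[i] - R[j])
    return M

# toy water-like geometry in angstroms: O at the origin, two H atoms
Z = [8, 1, 1]
R = [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]
M = coulomb_matrix(Z, R)
print(M.shape)  # (3, 3)
```

The matrix is symmetric and needs no file format at all, only atomic numbers and Cartesian coordinates, which is why it suits data parsed straight from quantum chemistry outputs.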
    joylannister
    @joylannister
    hello, every time I load the dataset like
    clintox_tasks, clintox_datasets, transformers = load_clintox(split="random")
    train_dataset, valid_dataset, test_dataset = clintox_datasets
    it doesn't work: the test dataset doesn't change. My deepchem version is 2.3.
    Even though I choose "random", the test set is the same every time. I don't know why.
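A toy numpy splitter (not DeepChem's actual RandomSplitter) shows the behavior a random split with no fixed seed should have: different test indices on every call.

```python
import numpy as np

def random_split(n, frac_train=0.8, frac_valid=0.1, seed=None):
    """Toy random splitter: shuffle indices, then slice into train/valid/test."""
    rng = np.random.RandomState(seed)
    perm = rng.permutation(n)
    n_train = int(frac_train * n)
    n_valid = int(frac_valid * n)
    return (perm[:n_train],
            perm[n_train:n_train + n_valid],
            perm[n_train + n_valid:])

# without a fixed seed, two calls should give different test indices
_, _, test_a = random_split(1000)
_, _, test_b = random_split(1000)
print(set(test_a) == set(test_b))  # almost surely False for a truly random split
```

If the printed value were True on every call, that would be the symptom described above: the splitter is effectively running with a fixed seed.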
    alat-rights
    @alat-rights
    @joylannister Very strange! Trying to reproduce this rn. I’ll get back to you.
    okay I actually have calc homework due in two hours, I’ll get back to you in a few hours!
    joylannister
    @joylannister
    @alat-rights Okay, I'm working in Colab. I'm really confused. I don't know if I'm the only one with this problem
    Henrique Junior
    @HenriqueCSJ

    @HenriqueCSJ Ah interesting! I don't have much experience with organometallic compounds and I'd be glad for us to add better DeepChem support. A couple of questions: do you have SDF files with your compounds available? If so, you can use the CoulombMatrix featurizer to featurize them directly: https://deepchem.readthedocs.io/en/latest/featurizers.html#coulombmatrix. The featurizer outputs numpy arrays, so they can be fed directly into simple machine learning models like dc.models.MultitaskRegressor.

    @rbharath , thank you so much for your kind reply. Unfortunately, I don't have SDF files because I'm feeding pandas (pun intended haha) directly with results from the quantum chemistry package that I'm using. It is a lot of regular expressions parsing data from large outputs with XYZ and internal coordinates. I'm reading your excellent tutorials, and I see in Tutorial Part 5: Creating Models with TensorFlow and PyTorch that the accuracy achieved on the test set was 0.77. My first trial gave me ~0.7, but I was hoping for something a little better. Is something in the 0.7 range considered good enough?

    Bharath Ramsundar
    @rbharath
    @joylannister I just tried this on the nightly build of deepchem and it seems to work. Could you try the nightly build? If you check out the deepchem tutorials, you can see an example of how to use the nightly build on Colab :)
    Bharath Ramsundar
    @rbharath
    @HenriqueCSJ Ah, I see! If there's some other standard file format, I think we'd be glad to put support for it on the development roadmap.
    As for accuracies, it's likely possible to do better, but it will take some effort. Check out dc.hyper for hyperparameter tuning capabilities to search for better-performing model parameterizations.
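The idea behind a grid search like dc.hyper's can be sketched generically; the objective function below is a made-up stand-in for "train a model with these parameters, then score it on the validation set":

```python
from itertools import product

def grid_search(param_grid, objective):
    """Try every combination in param_grid; return the best params and score."""
    best_params, best_score = None, float("-inf")
    for values in product(*param_grid.values()):
        params = dict(zip(param_grid.keys(), values))
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# stand-in objective: pretend validation score peaks at lr=1e-3, 64 filters
def fake_objective(p):
    return -abs(p["learning_rate"] - 1e-3) - abs(p["n_filters"] - 64) / 1000

grid = {"learning_rate": [1e-2, 1e-3, 1e-4], "n_filters": [32, 64, 128]}
best, score = grid_search(grid, fake_objective)
print(best)
```

In practice the objective would call model.fit() on the training set and an evaluation metric on the validation set, never the test set.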
    alat-rights
    @alat-rights
    Hey DeepChem team, is the meeting still at the normal time tonight at 2PM?
    Bharath Ramsundar
    @rbharath
    Yep! Usual time of 3pm PST today. I'll review the survey results and suggest new timings for next week
    Bharath Ramsundar
    @rbharath
    New timings are now up. I've sent out revised updates for the new timings. I've also added a new "deepchem office hours" as free time for users to get help on problems they're working through :)
    If you'd like to be added to any of these calls, please send me a note!
    joylannister
    @joylannister
    [image.png attached]
    Thank you for your reply. I tried to load the dataset again, but I still can't split it randomly. I don't know if I wrote it wrong; every time the test set is the same. @rbharath
    Gvinkc
    @Gvinkc
    @rbharath , good morning! I was reading MoleculeNet. I see the nice bar plots with error bars on top of the bars. Does DeepChem have a script for such plots? Could you please share how you plotted those, or what tools you used? Thank you!
    alat-rights
    @alat-rights
    That should be a feature in Matplotlib, @Gvinkc. Take a look at Matplotlib's error bar support :)
    Bharath Ramsundar
    @rbharath
    @joylannister Hmm, that's an interesting result. I haven't recently checked the randomness on the splitter. Are you upgraded to DeepChem 2.4.0.rc (the nightly build)? If you're still seeing the issue on the latest build, then I'll take a look at it to see if there's a deeper randomness issue :)
    @Gvinkc I believe those were matplotlib plots, as @alat-rights mentions :). I don't think we actually have the scripts that generated the plots open sourced. A bit of an oversight on our part! We just ran the experiments a few times, and I think the rest should be possible to figure out from the matplotlib docs
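For reference, a bar plot with error bars in the MoleculeNet style can be produced with matplotlib's yerr argument; the model names and scores below are made up:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt

# made-up scores from repeated runs of three models (e.g. 5 random seeds each)
runs = {
    "RF": [0.71, 0.73, 0.70, 0.72, 0.74],
    "GraphConv": [0.80, 0.78, 0.82, 0.79, 0.81],
    "Weave": [0.76, 0.77, 0.75, 0.78, 0.74],
}
names = list(runs)
means = [np.mean(runs[m]) for m in names]
stds = [np.std(runs[m]) for m in names]

# bar plot with error bars on top, as in the MoleculeNet figures
plt.bar(names, means, yerr=stds, capsize=4)
plt.ylabel("ROC-AUC")
plt.savefig("scores.png")
```

Running the same experiment with several seeds and plotting mean ± standard deviation is the usual way those error bars are generated.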
    James Y
    @yuanjames

    Hi, does anyone know where the default weights/variables are generated? For instance, if I run:

    model = deepchem.models.GraphConvModel(
        len(tasks),
        graph_conv_layers=[n_filters] * 2,
        dense_layer_size=n_fully_connected_nodes,
        batch_size=batch_size,
        learning_rate=learning_rate,
        random_seed=seed,
        mode='regression')

    Bharath Ramsundar
    @rbharath
    You can get the weights for the model by running model.model.get_weights() in this case. model.model pulls out the underlying Keras model to work with
    James Y
    @yuanjames
    Thanks,
    I am wondering which initialization algorithm tf.keras uses to generate the default weights
    Bharath Ramsundar
    @rbharath
    Good question. I'm sure the default is glorot-uniform, but there isn't a good way to control this from the user level
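Glorot (Xavier) uniform initialization samples weights from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)); a minimal numpy sketch of the scheme:

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, seed=None):
    """Sample a weight matrix from U(-limit, limit), limit = sqrt(6/(fan_in+fan_out))."""
    rng = np.random.RandomState(seed)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

W = glorot_uniform(64, 128, seed=0)
print(W.shape)  # (64, 128)
```

The limit shrinks as layers get wider, which keeps activation variances roughly constant across layers at initialization.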
    James Y
    @yuanjames
    Yes, thanks a lot
    I am researching HPS, so your answer is very helpful
    BTW, I want to make sure: weights are internal variables of a model, created when the model is initialized. Calling model.fit() will not re-generate random weights, right? Only the training/fitting process will update them via GD.
    Bharath Ramsundar
    @rbharath
    Some weights might be lazily generated upon first invocation, but after that only GD updates them
    So some models won't construct all their weights until they are invoked on something for the first time. This is default Keras behavior that we inherit
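The lazy-build behavior can be illustrated with a toy layer (plain Python, not Keras itself): weights do not exist until the first call, and a second call does not re-initialize them.

```python
import numpy as np

class LazyDense:
    """Toy dense layer mimicking Keras' build-on-first-call behavior."""
    def __init__(self, units, seed=0):
        self.units = units
        self.seed = seed
        self.W = None  # not built yet

    def __call__(self, x):
        if self.W is None:  # build weights on first invocation only
            rng = np.random.RandomState(self.seed)
            self.W = rng.randn(x.shape[-1], self.units)
        return x @ self.W

layer = LazyDense(4)
print(layer.W is None)        # True: no weights before the first call
_ = layer(np.ones((2, 3)))
print(layer.W.shape)          # (3, 4): shape inferred from the input's last dim
W_before = layer.W.copy()
_ = layer(np.ones((2, 3)))    # a second call does not re-initialize
print(np.array_equal(W_before, layer.W))  # True
```

Deferring the build is what lets the layer infer its input dimension from the first batch instead of requiring it up front.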
    James Y
    @yuanjames
    Thanks
    Appreciation
    Bharath Ramsundar
    @rbharath
    Glad to be of help!
    Casey Galvin
    @cjgalvin

    Hello all,
    Using Deepchem nightly build from earlier this week ('2.4.0-rc1.dev').
    I am trying to use the ValidationCallback callback with Tensorboard active in the model. I am getting the following error (I'll put the full trace at the end, for tidiness):
    AttributeError: 'GraphConvModel' object has no attribute '_log_value_to_tensorboard'

    I looked for the _log_value_to_tensorboard function through the inherited classes, and could not find it. Am I missing something, or is logging to Tensorboard not implemented for use with ValidationCallback?

    If I turn Tensorboard off in the model, the callback seems to work as expected.

    As a side note, setting save_on_minimum to True in ValidationCallback, and specifying a ValidationCallback save_dir that is different from the model's model_dir will result in saving different checkpoints in different locations. The minimum ValidationCallback checkpoints will go to save_dir and the other checkpoints will go to model_dir. This is not necessarily good or bad; I just wanted to point out the behavior.

    Full traceback:

    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-132-045b1267119a> in <module>
          1 num_epochs = 5
          2 for epoch in range(num_epochs):
    ----> 3     gcm.fit(train_dataset, nb_epoch=1, callbacks=callbacks)
          4 #     gcm.evaluate(dataset=val_dataset, metrics=metrics, n_classes=None)
    
    ~/miniconda3/envs/blah/lib/python3.7/site-packages/deepchem/models/keras_model.py in fit(self, dataset, nb_epoch, max_checkpoints_to_keep, checkpoint_interval, deterministic, restore, variables, loss, callbacks, all_losses)
        324             dataset, epochs=nb_epoch,
        325             deterministic=deterministic), max_checkpoints_to_keep,
    --> 326         checkpoint_interval, restore, variables, loss, callbacks, all_losses)
        327 
        328   def fit_generator(self,
    
    ~/miniconda3/envs/blah/lib/python3.7/site-packages/deepchem/models/keras_model.py in fit_generator(self, generator, max_checkpoints_to_keep, checkpoint_interval, restore, variables, loss, callbacks, all_losses)
        432         manager.save()
        433       for c in callbacks:
    --> 434         c(self, current_step)
        435       if self.tensorboard and should_log:
        436         with self._summary_writer.as_default():
    
    ~/miniconda3/envs/blah/lib/python3.7/site-packages/deepchem/models/callbacks.py in __call__(self, model, step)
         79     if model.tensorboard:
         80       for key in scores:
    ---> 81         model._log_value_to_tensorboard(tag=key, simple_value=scores[key])
         82     if model.wandb:
         83       import wandb
    
    AttributeError: 'GraphConvModel' object has no attribute '_log_value_to_tensorboard'
    Bharath Ramsundar
    @rbharath
    @cjgalvin Ah thanks for the report! It might just be the case that no one has ever tried GraphConvModel with ValidationCallback and Tensorboard logging. Would you mind raising an issue on Github with this same report and possibly a small reproducing code snippet?
    I think this is a bug in the model possibly, but I haven't used Tensorboard + callbacks so might be good to see if someone else has experience.
    Casey Galvin
    @cjgalvin
    @rbharath Sure, no problem. Thanks for getting back quickly.
    Do you have an alternative approach you use?
    Bharath Ramsundar
    @rbharath
    I just keep things simple and track losses manually! If you use the all_losses argument for fit(), you can get a readout of all losses from the model. I generally plot in matplotlib/jupyter notebook. I'm also a vi user and code in the terminal, so I might just be too tied to simplistic tools though ^_^
    I know Tensorboard is very useful so we should try to get this fixed
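A sketch of that workflow, assuming fit(..., all_losses=losses) fills the provided list with per-batch losses (the all_losses parameter appears in the fit signature in the traceback above); the loss values here are made up:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

# losses as fit(..., all_losses=losses) would fill it; these values are made up
losses = [2.0, 1.4, 1.1, 0.9, 0.85, 0.8, 0.78, 0.77]
# in real use, something like:
#   losses = []
#   model.fit(train_dataset, nb_epoch=10, all_losses=losses)

plt.plot(losses)
plt.xlabel("step")
plt.ylabel("loss")
plt.savefig("losses.png")
```

This avoids the TensorBoard callback path entirely while still giving a training curve to inspect.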
    Casey Galvin
    @cjgalvin
    Cool, I will look into the all_losses flag. Maybe going simpler is the right path :)
    James Y
    @yuanjames
    model = GraphConvModel()
    model.fit(nb_epoch=1)
    model.fit(nb_epoch=9)
    is equal to
    model.fit(nb_epoch=10). Am I right?
    Bharath Ramsundar
    @rbharath
    @yuanjames Yep, that's right!
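That equivalence holds because fit() resumes from the current weights rather than re-initializing them. A toy gradient-descent sketch (plain numpy, not DeepChem) where 1 + 9 steps produce the same result as 10:

```python
import numpy as np

def fit(w, X, y, nb_epoch, lr=0.1):
    """Plain gradient descent on squared error; resumes from the given w."""
    for _ in range(nb_epoch):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

rng = np.random.RandomState(0)
X = rng.randn(20, 3)
y = rng.randn(20)
w0 = np.zeros(3)

# fit for 1 epoch, then 9 more, starting from where we left off
w_split = fit(fit(w0, X, y, 1), X, y, 9)
# fit for 10 epochs in one call
w_full = fit(w0, X, y, 10)
print(np.allclose(w_split, w_full))  # True: fit resumes, never re-initializes
```

The caveat from earlier in the thread still applies: any lazily built weights are created on the first fit call, so the two schedules only match after that first build.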
    alat-rights
    @alat-rights
    Hey guys I’m trying to fit the WGAN on that failing code sample Bharath helped with last time, and I’m getting a dimension mismatch. Was wondering if yall could help? https://colab.research.google.com/drive/1bN_pxfox7CQGkBSQ1mCI9JAZ4mAK_E0I#scrollTo=j2C88lZ_Dxyr
    James Y
    @yuanjames
    @rbharath Thanks!
    James Y
    @yuanjames

    Hi @rbharath , thanks for organizing the talk. I forgot to ask a question: during my use, I found that a lot of models are initialized with a random seed, but most of the models won't use it. For example,

    elif model_name == 'graphconvreg':
        batch_size = hyper_parameters['batch_size']
        nb_epoch = hyper_parameters['nb_epoch']
        learning_rate = hyper_parameters['learning_rate']
        n_filters = hyper_parameters['n_filters']
        n_fully_connected_nodes = hyper_parameters['n_fully_connected_nodes']
    
        model = deepchem.models.GraphConvModel(
            len(tasks),
            graph_conv_layers=[n_filters] * 2,
            dense_layer_size=n_fully_connected_nodes,
            batch_size=batch_size,
            learning_rate=learning_rate,
            random_seed=seed,
            mode='regression')

    Is it just for coding convenience? As I saw, the preset configurations of all models have random seed parameters.

    Bharath Ramsundar
    @rbharath
    @yuanjames Great to see you on the meeting! I sprained my neck badly and unfortunately can't sit and focus right now. If it's OK, let me answer your question on Monday. I think I have to take the weekend off away from the computer
    James Y
    @yuanjames
    @rbharath Take care, hope you recover soon.