```python
def featurizing_smis(smi):
    featurized = featurizer(smi)
    return featurized

vectorized_featurizing = np.vectorize(featurizing_smis)
array_of_feats = vectorized_featurizing(array_of_smis)
```
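For anyone trying this pattern, here is a self-contained sketch. The featurizer below is a hypothetical stand-in (it just returns the string length) for whatever DeepChem featurizer `featurizer` refers to in the snippet above:

```python
import numpy as np

def featurizing_smis(smi):
    # Hypothetical stand-in featurizer: returns the SMILES length.
    # In the real code this would call a DeepChem featurizer on `smi`.
    return len(smi)

vectorized_featurizing = np.vectorize(featurizing_smis)
array_of_smis = np.array(["CCO", "c1ccccc1"])
array_of_feats = vectorized_featurizing(array_of_smis)
print(array_of_feats)  # → [3 8]
```

Note that `np.vectorize` is a convenience loop, not a true vectorized (compiled) operation, so it won't speed up featurization over a plain Python loop.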
@ignaczgerg I'm currently working on this! I was offline most of the last week but just starting to come back online and get to work on the migration. I'll post more information soon
@rbharath That is awesome, thank you! If I could help with anything regarding this, I am more than happy to help.
@MasunNabhanHoms_twitter Can you report more details about the error that you're seeing? I'm not sure what the issue is
When it crashes, it says "Session crashes with no reason". The log messages are:

```
WARNING:root:kernel dab80b04-c3b5-45ae-827c-33975f71d502 restarted
KernelRestarter: restarting kernel (1/5), keep random ports
2021-06-04 08:15:54.625105: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
NotImplementedError: Cannot convert a symbolic Tensor (gradient_tape/private__graph_conv_keras_model/graph_gather/sub:0) to a numpy array. This error may indicate that you're trying to pass a Tensor to a NumPy call, which is not supported
```
Hi, which Python version are you using?
I got this error when I accidentally used python=3.9 with deepchem. Try downgrading to python=3.7 and you should be fine.
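As a quick sanity check before importing deepchem, you can test the interpreter version. This is a sketch based only on the report above (3.7 worked, 3.9 crashed), not an official DeepChem compatibility matrix:

```python
import sys

def is_known_good(version_info=sys.version_info):
    # Based on the report above: Python 3.7 worked, while 3.9 triggered
    # the symbolic-Tensor NotImplementedError. This is an assumption from
    # this thread, not an official DeepChem compatibility statement.
    return tuple(version_info[:2]) == (3, 7)

print(is_known_good((3, 7)), is_known_good((3, 9)))
```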
@Tonylac77 dataset.itersamples() might be what you’re looking for.
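For context, `itersamples()` yields one `(X, y, w, id)` tuple per sample rather than whole arrays. The stand-in generator below (plain NumPy, not the real DiskDataset internals) mimics the shape of that loop:

```python
import numpy as np

# Illustrative data standing in for a DeepChem dataset's contents;
# shapes and names here are assumptions, not DiskDataset internals.
X = np.arange(6).reshape(3, 2)
y = np.array([0.0, 1.0, 0.5])
w = np.ones(3)
ids = np.array(["CCO", "CCC", "c1ccccc1"])

def itersamples():
    # Mimics dataset.itersamples(): one (X, y, w, id) tuple per sample.
    for i in range(len(ids)):
        yield X[i], y[i], w[i], ids[i]

for x_i, y_i, w_i, id_i in itersamples():
    print(id_i, y_i)
```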
Dear all, sorry for the late answer. It seems we have fixed the iteration over DiskDatasets. We use a `for` loop to unpack the tuples generated by the k-fold split function, then write each fold out as two CSV files (we want to do this for use in other machine learning software).
```python
# k1-k10 are the (train, cv) tuples returned by splitter.k_fold_split
folds = [k1, k2, k3, k4, k5, k6, k7, k8, k9, k10]
for a, (train, cv) in enumerate(folds, start=1):
    train_ids = train.to_dataframe()['ids']
    cv_ids = cv.to_dataframe()['ids']
    train_ids.to_csv("k" + str(a) + "_train.csv")
    cv_ids.to_csv("k" + str(a) + "_cv.csv")
```
This works fine when loading our dataset from CSV (with the CSVLoader function) without an ID field. However, if we try to use a dataset with ChEMBL IDs (in this case), we get the following RDKit error when performing the k-fold split (see below). We would love any input on this!
```
ArgumentError                             Traceback (most recent call last)
<ipython-input-8-604eaa868421> in <module>
      1 # split dataset
----> 2 k = splitter.k_fold_split(dataset=dataset, k=10)

C:\Anaconda3\envs\deepchem38\lib\site-packages\deepchem\splits\splitters.py in k_fold_split(self, dataset, k, directories, **kwargs)
     84       frac_fold = 1. / (k - fold)
     85       train_dir, cv_dir = directories[2 * fold], directories[2 * fold + 1]
---> 86       fold_inds, rem_inds, _ = self.split(
     87           rem_dataset,
     88           frac_train=frac_fold,

C:\Anaconda3\envs\deepchem38\lib\site-packages\deepchem\splits\splitters.py in split(self, dataset, frac_train, frac_valid, frac_test, seed, log_every_n)
   1107     for ind, smiles in enumerate(dataset.ids):
   1108       mols.append(Chem.MolFromSmiles(smiles))
-> 1109     fps = [AllChem.GetMorganFingerprintAsBitVect(x, 2, 1024) for x in mols]
   1110
   1111     # calcaulate scaffold sets

C:\Anaconda3\envs\deepchem38\lib\site-packages\deepchem\splits\splitters.py in <listcomp>(.0)
   1107     for ind, smiles in enumerate(dataset.ids):
   1108       mols.append(Chem.MolFromSmiles(smiles))
-> 1109     fps = [AllChem.GetMorganFingerprintAsBitVect(x, 2, 1024) for x in mols]
   1110
   1111     # calcaulate scaffold sets

ArgumentError: Python argument types in
    rdkit.Chem.rdMolDescriptors.GetMorganFingerprintAsBitVect(NoneType, int, int)
did not match C++ signature:
    GetMorganFingerprintAsBitVect(class RDKit::ROMol mol, unsigned int radius, unsigned int nBits=2048, class boost::python::api::object invariants=, class boost::python::api::object fromAtoms=, bool useChirality=False, bool useBondTypes=True, bool useFeatures=False, class boost::python::api::object bitInfo=None, bool includeRedundantEnvironments=False)
```
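The `NoneType` in the signature mismatch suggests `Chem.MolFromSmiles` returned `None`: the scaffold splitter treats `dataset.ids` as SMILES, and ChEMBL identifiers are not parseable SMILES. The sketch below uses a hypothetical stand-in for `MolFromSmiles` (RDKit itself isn't imported here) to show a pre-check you could run on your ids before splitting:

```python
def mol_from_smiles(s):
    # Hypothetical stand-in for rdkit.Chem.MolFromSmiles: returns None
    # for strings that are not valid SMILES (here, anything that looks
    # like a ChEMBL identifier).
    return s if not s.startswith("CHEMBL") else None

ids = ["CCO", "CHEMBL25", "c1ccccc1"]
mols = [mol_from_smiles(s) for s in ids]
# These are the ids that would make GetMorganFingerprintAsBitVect
# fail with a NoneType argument error:
bad_ids = [i for i, m in zip(ids, mols) if m is None]
print(bad_ids)  # → ['CHEMBL25']
```

With real RDKit, running `Chem.MolFromSmiles` over `dataset.ids` and checking for `None` (or loading the CSV so that the SMILES column, not the ChEMBL ID, becomes the id field) should avoid the crash.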