Is there someone who is familiar with the BACE dataset from MoleculeNet?
I was using the BACE dataset through SNAP Stanford's OGBG library, but I needed access to the SMILES representations of the molecules, so I downloaded the BACE dataset directly from http://moleculenet.ai/datasets-1.
However, the annotations for the scaffold split in the CSV somewhat confuse me:
There is the "Model" annotation, which gives the following number of molecules for each split:
This seems strange to me, especially considering the split in the OGBG library:
Is there something I am missing? Many thanks for any help!
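For reference, here is a minimal sketch of how the per-split counts can be tallied from the "Model" column. The toy rows, the `mol` column name, the `Train`/`Valid`/`Test` labels, and the helper name `count_split_sizes` are assumptions for illustration only; they are not taken from the actual file.

```python
import csv
import io
from collections import Counter


def count_split_sizes(csv_file, split_column="Model"):
    """Count how many rows carry each split label in the given column."""
    reader = csv.DictReader(csv_file)
    return Counter(row[split_column].strip() for row in reader)


# Toy stand-in for the downloaded CSV: placeholder rows, NOT real BACE entries.
# The "mol" column name and the label spellings are assumptions.
toy_csv = io.StringIO(
    "mol,Model\n"
    "C,Train\n"
    "CC,Train\n"
    "CCC,Valid\n"
    "CCCC,Test\n"
)

sizes = count_split_sizes(toy_csv)
print(dict(sizes))
```

To run this on the real download, the same function can be given an open file handle instead, e.g. `open(path, newline="")`, where `path` points to wherever the CSV was saved.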
bace_cscaffold123.pkl refer to, since when using them I end up with a split of these sizes: