These are chat archives for FreeCodeCamp/DataScience
discussion on how we can use statistical methods to measure and improve the efficacy of http://freeCodeCamp.com
@srniranjan has a point. If you are doing similarity, though, it is because the target set does NOT contain elements that could be found in the learning/test set, or because your task doesn't consist in finding them.
A similarity test is applicable if you just want to see how similar they are "overall".
What CNNs do nicely is sampling (by panels): can you do something similar? Can you then think of a way of indexing? You can index the panels and compare from there. Something I have also seen used along the same lines is hashing.
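A minimal sketch of the panel-plus-hash idea, assuming the image is just a 2-D list of grayscale pixel values (the function names and panel size here are hypothetical, not from any particular library): split the image into fixed-size panels and compute a tiny average-hash per panel, so comparisons run over short bit strings instead of raw pixels.

```python
def panel_hashes(image, panel_size=4):
    """Return one bit-string hash per panel of `image` (a list of rows).

    Each panel is hashed by thresholding its pixels against the panel
    mean: 1 bit per pixel, so similar panels give similar bit strings.
    """
    h, w = len(image), len(image[0])
    hashes = []
    for top in range(0, h, panel_size):
        for left in range(0, w, panel_size):
            panel = [row[left:left + panel_size]
                     for row in image[top:top + panel_size]]
            pixels = [p for row in panel for p in row]
            mean = sum(pixels) / len(pixels)
            bits = ''.join('1' if p >= mean else '0' for p in pixels)
            hashes.append(bits)
    return hashes

def hamming(a, b):
    """Number of differing bits between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))
```

Two images can then be compared panel-by-panel with `hamming` on the corresponding hashes, and the hashes themselves can serve as index keys.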
Then you can try to find a branch-and-bound search algorithm over the sampled panels/hashes, which would be more efficient than comparing all images one by one. That assumes similarity has to do with the way the image is displayed (e.g. position).
An indexed branch-and-bound algo is also "parallelizable".
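One way the bounding could look, as a hedged sketch (assuming each image is already reduced to a list of panel hashes, as above): accumulate the Hamming distance panel by panel and abandon a candidate as soon as its running distance exceeds the best distance found so far. Since each candidate is scanned independently, the outer loop is the part that parallelizes.

```python
def best_match(query_hashes, candidates):
    """Return (key, distance) of the candidate closest to the query.

    `candidates` maps a key to a list of panel hashes. The running
    distance is checked against the best so far after every panel,
    which prunes hopeless candidates early (the "bound" step).
    """
    best_key, best_dist = None, float('inf')
    for key, cand in candidates.items():
        dist = 0
        for qh, ch in zip(query_hashes, cand):
            dist += sum(x != y for x, y in zip(qh, ch))
            if dist >= best_dist:  # cannot beat the current best: prune
                break
        else:
            best_key, best_dist = key, dist
    return best_key, best_dist
```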
Just an idea.