These are chat archives for beniz/deepdetect

1st
Jun 2016
Jacky Yang
@anguoyang
Jun 01 2016 11:24
Hi, @beniz, could I send the image search requirement here?
Emmanuel Benazera
@beniz
Jun 01 2016 11:39
@anguoyang hello, this is an open forum, you can ask whatever you want. There may not be enough people here for you to get useful answers, which is why I mentioned other forums as well.
issues are mainly for problems with the software (and also for Chinese residents to ask questions since google groups are blocked IIRC)
Jacky Yang
@anguoyang
Jun 01 2016 11:58
Hi @beniz, thanks a lot for your understanding and concern :) Yes, just as you said, Google Groups is blocked, which is very sad. However, we Chinese have found many clever or tricky ways, such as VPNs, to fight the Great Firewall (it is not the Great Wall, it is the Great Firewall). It is really funny that the first internet message sent from China in 1987 was "Across the Great Wall, I could go everywhere in the world", and now we have to use a VPN to cross the Great Firewall before we can go everywhere in the world, haha.
Emmanuel Benazera
@beniz
Jun 01 2016 12:01
ah nice, didn't know
at least gitter is not blocked, nor github. Yes, VPNs are good, but their packets can get blocked too, from what I heard.
Jacky Yang
@anguoyang
Jun 01 2016 12:15
Yes, you are right, VPN packets can also be blocked if they know the IP address of the overseas VPN server. So we Chinese found another tricky way: changing the IP address and username/password dynamically. Some providers even change them every half an hour. It sounds crazy, as we have to refresh the web page that contains the VPN server information every half an hour, re-dial the VPN connection, and then ask questions on Google Groups... life is not easy... I hope gitter and github will always be OK...
Emmanuel Benazera
@beniz
Jun 01 2016 12:16
well, github is a crucial economic resource for China I guess, that's a lot of code to power companies, etc...
Jacky Yang
@anguoyang
Jun 01 2016 12:29
yes, maybe important for China, at least for the economy. As for me, if github gets blocked, I will have to find an opportunity to move overseas :)
I got a response from torch: @anguoyang
A basic idea: extract features from resnet for the images in both datasets A and B, and use cosine similarity as the measure of similarity.
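A minimal sketch of that idea, assuming the resnet features have already been extracted into one row vector per image. The random arrays below are hypothetical stand-ins for real features, and the function name is illustrative only; it is not part of torch or deepdetect.

```python
import numpy as np

def cosine_similarity_matrix(a, b, eps=1e-12):
    """Cosine similarity between every row of `a` and every row of `b`."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    # L2-normalize each row; eps guards against division by zero.
    a_unit = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), eps)
    b_unit = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), eps)
    # The dot product of unit vectors is the cosine of the angle between them.
    return a_unit @ b_unit.T

# Hypothetical stand-ins for features of dataset A (3 images) and B (5 images).
rng = np.random.default_rng(0)
features_a = rng.normal(size=(3, 8))
# The first row of B is a rescaled copy of A[1], mimicking a copied image.
features_b = np.vstack([features_a[1] * 2.0, rng.normal(size=(4, 8))])

sims = cosine_similarity_matrix(features_a, features_b)  # shape (3, 5)
best_match = sims.argmax(axis=1)  # index in B of the closest image per A row
```

Because cosine similarity ignores vector length, the rescaled copy still scores 1.0 against its original. At the scale of millions of images, an approximate nearest-neighbour index would likely replace this dense matrix product.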
Jacky Yang
@anguoyang
Jun 01 2016 12:34
Hi, all,
Could anyone kindly help me with this issue? Thanks. Here is a requirement from my friend:
"We have lots of photos/images, say 10 million or more. They are original photos/images from our customers that need to be protected (to prevent plagiarism). We call this dataset A.
We have also collected lots of images with a web crawler, from bloggers, websites, forums, etc. Some of these images are simply copied from dataset A, and some have an additional watermark added. We call this dataset B. It currently contains about 300,000 images, but it will grow day by day.
We will take one or several images from dataset A, which we call dataset C. We want to search B for images that are similar to those in C, and list all the similar images.
We want to use deep learning for the similarity search, but most of the images in dataset A have no tags. Could we train a specific model on these images, so that we get more accurate results when searching for similar images?"
Thanks a lot for your patience in reading this long requirement, and have a nice day!
Emmanuel Benazera
@beniz
Jun 01 2016 14:08
by 'more accurate', you mean better search results than with a pre-trained model, I guess?
Jacky Yang
@anguoyang
Jun 01 2016 14:52
yes, it is
maybe training the model on dataset A will be more accurate than using a pre-trained model?
I don't know yet, maybe it is better to do it?
Is it possible to compare the cosine distance based on resnet features when using the imgsearch demo?
Jacky Yang
@anguoyang
Jun 01 2016 15:21
ok, great, one more question: could the angular comparison result be converted into a similarity percentage? They need to set a threshold on similarity, say 90%, so that anything above this value counts as similar. In the imgsearch demo, it seems the only threshold is the number of outputs, --search-size
Emmanuel Benazera
@beniz
Jun 01 2016 15:32
This is left to you as an exercise :) https://en.wikipedia.org/wiki/Cosine_similarity
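One possible answer to the exercise, as a sketch only: the linked page defines an angular distance as arccos(cosine similarity) / pi, and one minus that value is a similarity in [0, 1] that can be read as a percentage. The function names and the 90% default below are illustrative assumptions, and this assumes the search returns (or can be converted to) a cosine similarity in [-1, 1], which is not confirmed for the imgsearch demo.

```python
import math

def similarity_percent(cos_sim):
    """Map a cosine similarity in [-1, 1] to an angular similarity in percent."""
    # Clamp to absorb tiny floating-point overshoots before calling acos.
    c = max(-1.0, min(1.0, cos_sim))
    angle = math.acos(c)  # angle between the two vectors, in radians
    return 100.0 * (1.0 - angle / math.pi)

def is_similar(cos_sim, threshold_percent=90.0):
    """Hypothetical thresholding rule: similar if at or above the threshold."""
    return similarity_percent(cos_sim) >= threshold_percent
```

With this mapping, identical directions give 100%, orthogonal vectors give 50%, and opposite vectors give 0%. A 90% threshold corresponds to an angle of 18 degrees, i.e. a cosine similarity of roughly 0.951.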