Aleksander Nowiński
@axnow
[in Polish] Why is it so quiet here?
Łukasz Bolikowski
@bolo1729
In English, please, in English ;)
Aleksander Nowiński
@axnow
Oops, sorry.
Titipat Achakulvisut
@titipata
Hi guys! I have a question. Do you have a CERMINE interface for Python?
Titipat Achakulvisut
@titipata
I'd like a Python library that takes an affiliation string and returns a Python dictionary, e.g. {'Institution': '...', 'Address': '...', 'Country': '...'}. In any case, you made a really great library, thanks!
Dominika Tkaczyk
@dtkaczyk
Hi titipata! Thank you for your interest in CERMINE :)
Unfortunately, no, only a Java interface is available.
It should be possible to execute the CERMINE JAR from your Python code, but of course you will have to parse the output XML.
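The parsing half of that suggestion might look like the sketch below. It assumes the affiliation output is a small XML fragment using JATS-style tags (`institution`, `addr-line`, `country`); the sample string and the tag-to-key mapping are illustrative assumptions, not confirmed CERMINE output, so check them against what the JAR actually prints.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample output; the real CERMINE XML may use different tags.
SAMPLE_XML = (
    '<aff id="aff"><institution>ICM, University of Warsaw</institution>, '
    '<addr-line>Warsaw</addr-line>, <country>Poland</country></aff>'
)

# Assumed XML tag names mapped to the dictionary keys asked for above.
TAG_TO_KEY = {
    "institution": "Institution",
    "addr-line": "Address",
    "country": "Country",
}


def affiliation_xml_to_dict(xml_text):
    """Collect the text of each known tag into a plain dictionary."""
    root = ET.fromstring(xml_text)
    result = {}
    for element in root.iter():
        key = TAG_TO_KEY.get(element.tag)
        if key is None:
            continue  # skip the wrapper element and any unknown tags
        text = "".join(element.itertext()).strip()
        # Join repeated fields (e.g. two address lines) with a comma.
        result[key] = f"{result[key]}, {text}" if key in result else text
    return result


print(affiliation_xml_to_dict(SAMPLE_XML))
```

Only the standard library is used, so the function can be pointed at the real output once the tag names are confirmed.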
Titipat Achakulvisut
@titipata
Thanks dtkaczyk! I will try to use the JAR file from Python and parse the output as you suggested. Thanks again for creating this great library :D
Dominika Tkaczyk
@dtkaczyk
You will find information on how to use the JAR in the CERMINE GitHub README. If you come across any problems, just let me know.
Titipat Achakulvisut
@titipata
Definitely. I'm doing the mvn part right now.
Titipat Achakulvisut
@titipata
Hello! Does anyone know how to call a Java JAR file from Python?
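One common way to do this is Python's standard `subprocess` module. The sketch below is generic and not CERMINE-specific: the helper names `build_java_command` and `run_jar` are made up for illustration, and the main class and its options are placeholders that should be taken from the CERMINE README.

```python
import subprocess


def build_java_command(jar_path, main_class, args):
    """Build the argument list for `java -cp <jar> <main class> <args...>`."""
    return ["java", "-cp", jar_path, main_class, *args]


def run_jar(jar_path, main_class, args, timeout=120):
    """Run the JAR and return whatever it prints to standard output."""
    completed = subprocess.run(
        build_java_command(jar_path, main_class, args),
        capture_output=True,  # collect stdout and stderr (Python 3.7+)
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit
        timeout=timeout,      # seconds before TimeoutExpired is raised
    )
    return completed.stdout


# Placeholder values only; substitute the class and options from the README.
print(build_java_command("path/to/cermine.jar", "some.main.Class", ["-option", "value"]))
```

Passing the command as a list avoids shell quoting problems when an argument such as an affiliation string contains spaces or quotes. Each call starts a new JVM, so startup time is paid on every invocation.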
Titipat Achakulvisut
@titipata
Also, can I pass several affiliation strings (for example, a list of strings) at once? It takes about 700 ms per string.
Titipat Achakulvisut
@titipata
Never mind, guys, I'll call it from Scala. It's much faster that way.
abhi09rawat
@abhi09rawat
Hi, I am new to the Git environment. I wanted to use CERMINE offline, so I thought I would import it as a Java project, but I am getting errors in the project when I do so. Can anyone tell me the procedure to make this project work offline? Thanks.
Dominika Tkaczyk
@dtkaczyk
Hi @abhi09rawat Adding a proper dependency to your pom.xml file should work. It is described in the README of the project. How did you import the project and what kind of problems are you experiencing?
abhi09rawat
@abhi09rawat
Hi @dtkaczyk, I have added the dependency to the pom.xml and imported it as a Java project. The problem is with these imports:
import org.apache.commons.cli.ParseException;
import org.apache.commons.io.FileUtils;
import org.apache.commons.lang.exception.ExceptionUtils;
import org.jdom.Element;
import org.jdom.output.Format;
import org.jdom.output.XMLOutputter;
import com.google.common.collect.Lists;
org.apache, org.jdom, and com.google are underlined in red.
Dominika Tkaczyk
@dtkaczyk
Have you tried building the project after adding the dependency? Most probably the jars aren't downloaded yet and that's why they are underlined. Building the project should fix this.
Dominika Tkaczyk
@dtkaczyk
@abhi09rawat What exactly are you trying to do? Do you have your own Maven project in which you want to use CERMINE, or do you want to use CERMINE as a standalone application?
abhi09rawat
@abhi09rawat
I want to use CERMINE as a standalone application.
Dominika Tkaczyk
@dtkaczyk
In that case it will be better to download the JAR file from http://maven.icm.edu.pl/artifactory/simple/kdd-releases/pl/edu/icm/cermine/cermine-impl/ (for example cermine-impl-1.11-jar-with-dependencies.jar) and execute it with Java. Please see the README for details.
You do not need to import the code into your IDE.
abhi09rawat
@abhi09rawat
Thank you for the help.
Anton Kulaga
@antonkulaga
Hi all!
Anton Kulaga
@antonkulaga
I wrote a simple tool ( https://github.com/antonkulaga/extractor ) for internal use that uses CERMINE, and I have to admit that DOI recognition is terrible! For the papers that I have in "files", half of the DOIs are not recognized at all, and the other half get an extra ")" that I have to delete manually.
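The manual cleanup of the extra ")" could be automated with a small post-processing step. The sketch below is a workaround applied to the extracted string, not a fix inside CERMINE; the function name and the example DOIs are made up for illustration. It removes a trailing ")" only when the string has more closing than opening parentheses, so DOIs that legitimately end in a balanced parenthesis are left alone.

```python
def strip_unbalanced_closing(doi):
    """Remove trailing ')' characters that have no matching '(' in the DOI."""
    cleaned = doi.strip()
    while cleaned.endswith(")") and cleaned.count(")") > cleaned.count("("):
        cleaned = cleaned[:-1]
    return cleaned


# Made-up example DOIs, for illustration only.
print(strip_unbalanced_closing("10.1000/xyz123)"))  # 10.1000/xyz123
print(strip_unbalanced_closing("10.1000/j(01)"))    # 10.1000/j(01)
```

This does not help with the DOIs that are not recognized at all.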
Erik Bauch
@ebauch
@dtkaczyk Hi Dominika, I have been playing with CERMINE for a bit and must say I am very impressed. I am a co-founder of open rev (openrev.org), a platform for discussing science papers in the browser.
We have been looking for something like this for a long time, especially for reference extraction.
I noticed, though, that it takes about 20 s per PDF to get the metadata, references, etc. out.
Is there an option to extract only the references, i.e. not the authors, abstract, etc.?
Getting the metadata for a paper is a lot easier than getting the references, so we are mostly interested in the latter.
In general, what path forward do you suggest to speed up CERMINE, maybe bringing it down to a few seconds per PDF?
Thanks a lot for your help.
Dominika Tkaczyk
@dtkaczyk
Hi @ebauch, it might be possible to lower the processing time, although I am not sure whether it will be easy to get down to a few seconds. How do you run CERMINE: from code, or do you use the JAR file?