Dmitry Mozzherin
@dimus
of its syntax
But the most amazing thing is that Rust is a language without GC and with memory safety
and it is completely new and revolutionary
dashish333
@dashish333
What's your take on the new language Julia? Lately people have been recommending it.
Dmitry Mozzherin
@dimus
I hear good things about it. It seems to be positioned as a language for data scientists, but I did not see any advantages in Julia for myself
dashish333
@dashish333
Your work mostly revolves around NLP and algorithms?
Dmitry Mozzherin
@dimus
finding, normalizing and verifying scientific names is about all I do
dashish333
@dashish333
Great, great!!
Well, I must say bye for now. It was nice to have a conversation with you; I learned a couple of things while resolving the installation issue. Thanks a ton!
Dmitry Mozzherin
@dimus
have a good night @dashish333
dashish333
@dashish333
One quick question: which should we prefer, given that our use case involves PDFs and URLs as input? Should we stick to a local installation of GNRD or go with gnfinder?
Dmitry Mozzherin
@dimus
GNRD is aging. If you are able to convert your input to plain UTF-8 text, I would go with gnfinder. I used gnfinder to go through 20 million books and it worked just fine
dashish333
@dashish333
Ok...Thanks!
R. Prabhakar
@rsprabha
Thanks Dima. Can we meet at 11:30 am UTC, which will be 5:00 pm IST?
Dmitry Mozzherin
@dimus
sounds good @rsprabha
Dmitry Mozzherin
@dimus

@dashish333 and @rsprabha here are some other GN resources you might find useful:

https://github.com/gnames/gntagger

https://github.com/gnames/gnverify

https://gitlab.com/gogna/gnparser

Dmitry Mozzherin
@dimus
@rsprabha and @dashish333 I made a draft RFC document for gnfinder+ project: https://github.com/gnames/gnfinder/wiki/RFC-for-%22gnfinder-plus%22-project
Dmitry Mozzherin
@dimus
we can discuss the gnfinder+ project there
R. Prabhakar
@rsprabha
Thanks for the document. Took a quick look at it.
Harsh Zalavadiya
@harshzalavadiya

Hi @dimus,

After reading RFC I wrote a tiny go CLI application for gnfinder+
https://github.com/harshzalavadiya/gnfinder-plus

As of now it's just a simple binary; there's no gRPC or file URL support included
Cross compiled binaries are available in releases

p.s. I'm from strand biodiv team

however pdfcpu lacks plain text output, so I have to use a library that uses unipdf behind the scenes
R. Prabhakar
@rsprabha
@dimus I posted your RFC on our chat channel, and our colleague took it up and did this in a spare hour.
Dmitry Mozzherin
@dimus
@harshzalavadiya that's great! I think it is much better to add new functionality little by little, finding and solving practical problems as they appear, than to try to implement everything at once and then find that the project cannot be used for one reason or another. I am looking at your code now
@rsprabha I am glad we have some practical results from our conversation!
Dmitry Mozzherin
@dimus
@harshzalavadiya a first quick note: I see that you released v1.0.0. From the point of view of semantic versioning that is a very serious statement: it means you are not going to make breaking changes to the API of your project until the release of v2.x.x. I would suggest starting with v0.x.x and moving forward until you feel that the API is well designed and stable for the foreseeable future
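
The semantic versioning rule in this note can be sketched in Go as follows. This is an illustration only: the function names `parseMajor` and `apiMayBreak` are hypothetical, and the sketch looks at the major version number alone, ignoring pre-release tags and build metadata.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMajor extracts the major version number from a string such as "v1.4.2".
func parseMajor(version string) (int, error) {
	trimmed := strings.TrimPrefix(version, "v")
	parts := strings.SplitN(trimmed, ".", 2)
	return strconv.Atoi(parts[0])
}

// apiMayBreak reports whether semantic versioning permits breaking API
// changes between the two given versions.
func apiMayBreak(from, to string) (bool, error) {
	fromMajor, err := parseMajor(from)
	if err != nil {
		return false, fmt.Errorf("bad version %q: %w", from, err)
	}
	toMajor, err := parseMajor(to)
	if err != nil {
		return false, fmt.Errorf("bad version %q: %w", to, err)
	}
	// Major version 0 is for initial development: anything may change.
	if fromMajor == 0 || toMajor == 0 {
		return true, nil
	}
	// From 1.0.0 on, only a major version bump may break the public API.
	return fromMajor != toMajor, nil
}

func main() {
	pairs := [][2]string{
		{"v0.1.0", "v0.2.0"},
		{"v1.0.0", "v1.5.3"},
		{"v1.5.3", "v2.0.0"},
	}
	for _, p := range pairs {
		breaking, err := apiMayBreak(p[0], p[1])
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s: API may break = %v\n", p[0], p[1], breaking)
	}
}
```

Staying on v0.x.x keeps the first case available for as long as the API is still being designed.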
Dmitry Mozzherin
@dimus
@harshzalavadiya https://github.com/lu4p/cat is a nice find
Dmitry Mozzherin
@dimus
@rsprabha do I understand correctly that if gnfinder-plus uses an AGPL-licensed library, the project itself has to be AGPL as well? Using the MIT-licensed gnfinder should not be a problem for an AGPL project
R. Prabhakar
@rsprabha
@dimus I was referring to unipdf, which is under the AGPL license. Are we OK with this? So are you saying lu4p cannot be used? And you are right about the semantic versioning. We should go slowly, @harshzalavadiya
R. Prabhakar
@rsprabha
@dimus I looked up the Unlicense that lu4p uses. It seems to be approved by the FSF and GPL compatible: https://en.wikipedia.org/wiki/Unlicense Do you see a problem with using this library?
Dmitry Mozzherin
@dimus
https://github.com/lu4p/cat uses unipdf (cloned for some reason), which in my understanding means that it should also be AGPL-licensed, or have an AGPL-compatible license, instead of https://github.com/lu4p/cat/blob/master/LICENSE
@rsprabha :arrow_up:
Dmitry Mozzherin
@dimus
Looks like gnfinder-plus needs to have an AGPL license, another good reason to keep gnfinder and gnfinder-plus separate
Harsh Zalavadiya
@harshzalavadiya
v1.0.0 was tagged only because I was trying things out for fun and wanted to trigger a CI build. As of now this repo under @harshzalavadiya is completely throwaway, and the git history will be removed if we decide to use it, so please ignore versions and commits for now :P
Harsh Zalavadiya
@harshzalavadiya
I also found one more library, https://github.com/sajari/docconv (MIT License), so we can avoid AGPL. This might make things easier for everyone, including folks who use it commercially. It also has optional https://github.com/otiai10/gosseract support, which is exactly what we need. I have updated the repo to use docconv now (https://github.com/harshzalavadiya/gnfinder-plus), and I have started using proper semantic releases from now on
Dmitry Mozzherin
@dimus
@harshzalavadiya great news! Can you investigate whether unipdf and docconv differ in how they handle 2-column PDFs? If one of them supports 2-column layouts and the other does not, I think that is more important than the AGPL license. Sadly, most tools do not support 2 columns.
Dmitry Mozzherin
@dimus
@harshzalavadiya looks like docconv is a wrapper around the pdftotext command line tool, which introduces an external dependency that has to be installed independently of gnfinder-plus. Because of that, I would say your unipdf solution has an advantage over the docconv solution.
I added info about docconv and gnfinder-plus to https://github.com/gnames/gnfinder/wiki/RFC-for-%22gnfinder-plus%22-project
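
The external-dependency concern can be made concrete with a small Go sketch that checks at startup whether a required command line tool is on the PATH. The helper name `missingTools` is hypothetical, and the tool name `pdftotext` is an assumption used for illustration; neither comes from gnfinder-plus or docconv.

```go
package main

import (
	"fmt"
	"os/exec"
)

// missingTools returns the names from tools that cannot be found on PATH.
func missingTools(tools []string) []string {
	var missing []string
	for _, tool := range tools {
		if _, err := exec.LookPath(tool); err != nil {
			missing = append(missing, tool)
		}
	}
	return missing
}

func main() {
	// "pdftotext" is the assumed external converter in this sketch.
	required := []string{"pdftotext"}
	if missing := missingTools(required); len(missing) > 0 {
		fmt.Println("missing external tools:", missing)
		return
	}
	fmt.Println("all external tools found")
}
```

A pure Go library needs no such check, which is the advantage described above: the binary works without anything else being installed.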
Dmitry Mozzherin
@dimus
@gdower, @mjy looks like the symposia at TDWG are very collection oriented, and the work on CoL-BHL integration does not fit. However, there is a plenary section that seems to fit this work. Should we write an abstract to fit this section? https://www.tdwg.org/conferences/2020/session-list/#pd01%20avenues%20into%20integration:%20communicating%20taxonomic%20intelligence%20from%20sender%20to%20recipient
Dmitry Mozzherin
@dimus
My understanding is that it is more about what Nico Franz is doing. However, I think taxonomic intelligence has a broader sense: 'Animal', 'Bear', 'Ursus arctos', 'Ursus arctos L. 1758', 'Ursus arctos L. 1758 sensu A.B. C' all carry some taxonomic intelligence, and their meaning can shift significantly depending on context
Debbie Paul
@debpaul
Hi all, if you can't find a good fit, do submit to open, and we can then see after all talks are in, where it might fit.
Dmitry Mozzherin
@dimus
thanks @debpaul
Matt
@mjy
@dimus SYM09 Technical and standards implications in data liberation and semantic publishing for biodiversity seems very much related?
Dmitry Mozzherin
@dimus
@mjy yes, good point, I'll think about an abstract from this angle
Geoff Ower
@gdower
Sorry I missed your message, @dimus. I can still help edit tonight.
Dmitry Mozzherin
@dimus
@gdower I submitted an abstract for approval. We will be able to work on it further if it is accepted
You should get an email with a link that will let you see and edit it in the future.
diatomsRcool
@diatomsRcool
looks like gnrd is down?
Dmitry Mozzherin
@dimus
something is wrong, it goes down and up. I have not had a chance to look into what causes it yet
Dmitry Mozzherin
@dimus
@diatomsRcool got GNRD back, it was a Kubernetes problem :-/