Activity
  • Oct 22 2018 21:12

    dimus on v4.0.3

    (compare)

  • Oct 22 2018 21:12

    dimus on master

    v 4.0.3 gems update (compare)

  • Sep 18 2017 08:28

    dimus on v4.0.2

    (compare)

  • Sep 18 2017 08:28

    dimus on master

    add span for resolution (compare)

  • Sep 17 2017 18:57

    dimus on master

    make rubocop happy (compare)

  • Sep 17 2017 14:24

    dimus on v4.0.1

    (compare)

  • Sep 17 2017 14:24
    dimus closed #43
  • Sep 17 2017 14:24

    dimus on master

    Fix #43 partially The idea was… (compare)

  • Sep 14 2017 22:19

    dimus on 43-unify-resolution-injestion

    wip (compare)

  • Sep 14 2017 21:47

    dimus on 43-unify-resolution-injestion

    wip (compare)

  • Sep 13 2017 22:02

    dimus on 43-unify-resolution-injestion

    Fix #43 unify injection and res… (compare)

  • Sep 13 2017 21:30

    dimus on 43-unify-resolution-injestion

    (compare)

  • Sep 13 2017 21:26
    dimus opened #43
  • Sep 13 2017 20:20

    dimus on v4.0.0

    (compare)

  • Sep 13 2017 20:20

    dimus on 42-better-eta

    (compare)

  • Sep 13 2017 20:20
    dimus closed #42
  • Sep 13 2017 20:20

    dimus on master

    Fix #42 - better speed estimati… (compare)

  • Sep 12 2017 22:16

    dimus on 42-better-eta

    wip (compare)

  • Sep 12 2017 21:46

    dimus on 40-concurrency

    (compare)

  • Sep 12 2017 21:45

    dimus on 42-better-eta

    Fix #42 - better speed estimati… (compare)

dashish333
@dashish333
Alright! Well, it seems that I must give Go a try sometime.
Dmitry Mozzherin
@dimus
So Python is fine as glue between fast libraries, but not as a language for writing such libraries
A big advantage of Go over C is garbage collection, some niceties borrowed from languages like Python, and very fast compilation times
so writing in Go is as fast as writing in Python
However, garbage collection does slow Go down, so I am considering adding Rust where I need the maximum possible performance
dashish333
@dashish333
I find the GC in Python not very effective. Recently I was running a couple of ML heuristics on an Intel Xeon E5-2620 with 64 GiB of memory and found that memory was not being freed very effectively.
Dmitry Mozzherin
@dimus
GCs can become huge resource eaters in some programs: if the heap grows out of control, the GC has to traverse huge chunks of memory all the time
dashish333
@dashish333
OK, that's nice! Rust will be useful.
Dmitry Mozzherin
@dimus
the main problem with Rust is that it is about 10 times harder to learn than Go, and about 5 times harder to read
but it is the most innovative and exciting language
dashish333
@dashish333
Hahaha!! I just looked it up, and it seems the syntax is an amalgamation of Python, Java, and C.
Dmitry Mozzherin
@dimus
Rust stole much of its syntax from Scala and Haskell
But the most amazing thing is that Rust is a language without a GC and with memory safety
and it is completely new and revolutionary
dashish333
@dashish333
What's your take on the new language Julia? Recently people have been encouraging it.
Dmitry Mozzherin
@dimus
I hear good things about it; it seems to be positioned as a language for data scientists, but I did not see any advantages in Julia for my own work
dashish333
@dashish333
Your work mostly revolves around NLP and algorithms?
Dmitry Mozzherin
@dimus
finding, normalizing and verifying scientific names is about all I do
dashish333
@dashish333
Great, great!!
Well, I must say bye for now. It was nice having a conversation with you; I learned a couple of things while resolving the installation issue. Thanks a ton!
Dmitry Mozzherin
@dimus
have a good night @dashish333
dashish333
@dashish333
One quick question: what should we prefer, given that our use case involves PDFs and URLs as input? Should we stick with a local installation of GNRD or go with gnfinder?
Dmitry Mozzherin
@dimus
GNRD is aging. If you are able to convert your input to plain UTF-8 text, I would go with gnfinder. I used gnfinder to go through 20 million books and it worked just fine
dashish333
@dashish333
Ok...Thanks!
R. Prabhakar
@rsprabha
Thanks Dima, can we meet at 11:30 a.m. UTC, which will be 5:00 p.m. IST?
Dmitry Mozzherin
@dimus
sounds good @rsprabha
Dmitry Mozzherin
@dimus

@dashish333 and @rsprabha here are some other GN resources you might find useful:

https://github.com/gnames/gntagger

https://github.com/gnames/gnverify

https://gitlab.com/gogna/gnparser

Dmitry Mozzherin
@dimus
@rsprabha and @dashish333 I made a draft RFC document for the gnfinder+ project: https://github.com/gnames/gnfinder/wiki/RFC-for-%22gnfinder-plus%22-project
Dmitry Mozzherin
@dimus
we can discuss the gnfinder+ project there
R. Prabhakar
@rsprabha
Thanks for the document. Took a quick look at it.
Harsh Zalavadiya
@harshzalavadiya

Hi @dimus,

After reading the RFC, I wrote a tiny Go CLI application for gnfinder+:
https://github.com/harshzalavadiya/gnfinder-plus

As of now it's just a simple binary; there's no gRPC or file URL support included.
Cross-compiled binaries are available in the releases.

P.S. I'm from the Strand biodiv team.

However, pdfcpu lacks plain-text output, so I have to use a library that uses unipdf behind the scenes.
R. Prabhakar
@rsprabha
@dimus I posted your RFC on our chat channel, and our colleague took it up and did this in his spare hour.
Dmitry Mozzherin
@dimus
@harshzalavadiya that's great! I think it is much better to add new functionality little by little, finding and solving the practical problems that appear in the process, than to try to implement everything at once and then find that the project cannot be used for one reason or another. I am looking at your code now.
@rsprabha I am glad we have some practical results from our conversation!
Dmitry Mozzherin
@dimus
@harshzalavadiya first, a quick note. I see that you released v1.0.0. From the point of view of semantic versioning that is a very serious statement: it means you are not going to make incompatible changes to your project's API until the release of v2.x.x. I would suggest starting with v0.x.x and moving forward until you feel that the API is well designed and stable for the foreseeable future.
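The versioning rule in the note above can be written down as a small Go sketch. It follows the Semantic Versioning 2.0.0 rules that anything may change while the version is `0.y.z` and that, after `1.0.0`, only a MAJOR bump may break the public API. `parseVersion` and `mayBreakAPI` are hypothetical helpers, and pre-release or build suffixes are simply rejected.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version is the MAJOR.MINOR.PATCH part of a semantic version.
type version struct{ major, minor, patch int }

// parseVersion accepts "v1.2.3" or "1.2.3". Pre-release or build suffixes
// such as "1.0.0-alpha" are rejected by this simplified parser.
func parseVersion(s string) (version, error) {
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) != 3 {
		return version{}, fmt.Errorf("not MAJOR.MINOR.PATCH: %q", s)
	}
	var n [3]int
	for i, p := range parts {
		v, err := strconv.Atoi(p)
		if err != nil || v < 0 {
			return version{}, fmt.Errorf("bad component %q in %q", p, s)
		}
		n[i] = v
	}
	return version{n[0], n[1], n[2]}, nil
}

// mayBreakAPI reports whether SemVer allows incompatible API changes between from and to:
// anything may change while MAJOR is 0; after 1.0.0 only a MAJOR bump may break the API.
func mayBreakAPI(from, to version) bool {
	if from.major == 0 {
		return true
	}
	return to.major > from.major
}

func main() {
	pairs := [][2]string{{"v0.3.1", "v0.4.0"}, {"v1.0.0", "v1.5.2"}, {"v1.5.2", "v2.0.0"}}
	for _, p := range pairs {
		from, err1 := parseVersion(p[0])
		to, err2 := parseVersion(p[1])
		if err1 != nil || err2 != nil {
			fmt.Println("parse error:", err1, err2)
			continue
		}
		fmt.Printf("%s -> %s: may break API: %v\n", p[0], p[1], mayBreakAPI(from, to))
	}
}
```

Under these rules `v0.3.1 -> v0.4.0` and `v1.5.2 -> v2.0.0` may break the API, while `v1.0.0 -> v1.5.2` may not, which is why tagging v1.0.0 early commits a project to a stable API.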
Dmitry Mozzherin
@dimus
@harshzalavadiya https://github.com/lu4p/cat is a nice find
Dmitry Mozzherin
@dimus
@rsprabha do I understand correctly that if gnfinder-plus uses an AGPL-licensed library, the project itself has to be AGPL as well? Using MIT-licensed gnfinder should not be a problem for an AGPL project.
R. Prabhakar
@rsprabha
@dimus I was referring to unipdf, which is under the AGPL license. Are we OK with this? So are you saying lu4p cannot be used? And you are right about semantic versioning. We should go slowly, @harshzalavadiya.
R. Prabhakar
@rsprabha
@dimus I looked up lu4p's Unlicense. It seems to be approved by the FSF and GPL-compatible. https://en.wikipedia.org/wiki/Unlicense Do you see a problem with using this library?
Dmitry Mozzherin
@dimus
https://github.com/lu4p/cat uses unipdf (cloned for some reason), which in my understanding means that it should also be AGPL-licensed, or have an AGPL-compatible license instead of https://github.com/lu4p/cat/blob/master/LICENSE
@rsprabha :arrow_up:
Dmitry Mozzherin
@dimus
Looks like gnfinder-plus needs to have an AGPL license, another good reason to keep gnfinder and gnfinder-plus separate
Harsh Zalavadiya
@harshzalavadiya
v1.0.0 was only because I was trying things out for fun and wanted to trigger a CI build. For now, this repo under @harshzalavadiya is completely disposable, and the git history will be removed if we decide to use it, so please ignore versions and commits for now :P
Harsh Zalavadiya
@harshzalavadiya
I also found one more library, https://github.com/sajari/docconv (MIT License), so we can avoid AGPL; this might make things easier for everyone, including folks who use it commercially. It also has optional https://github.com/otiai10/gosseract (OCR) support, which is exactly what we need. I have updated the repo to use docconv: https://github.com/harshzalavadiya/gnfinder-plus. I have also started using proper semantic versioning for releases from now on.
Dmitry Mozzherin
@dimus
@harshzalavadiya great news! Can you investigate whether unipdf and docconv differ in how they handle 2-column PDFs? If one of them supports 2-column layouts and the other does not, I think that is more important than the AGPL license. Sadly, most tools do not support 2 columns.
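Why 2-column support matters for name finding: an extractor that reads strictly row by row interleaves the two columns, so a name broken across a line ends up glued to text from the neighbouring column. A minimal sketch of a column-aware reading order, assuming the extractor exposes text runs with page coordinates; the `fragment` type and `twoColumnText` are hypothetical and are not the unipdf or docconv API.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// fragment is a hypothetical positioned text run: (X, Y) is its top-left
// corner in points, with Y growing down the page.
type fragment struct {
	X, Y float64
	Text string
}

// twoColumnText reads a simple two-column page: runs starting left of the
// midline first (top to bottom), then the right column. Spanning titles,
// figures, and footnotes would need more than this heuristic.
func twoColumnText(frags []fragment, pageWidth float64) string {
	mid := pageWidth / 2
	var left, right []fragment
	for _, f := range frags {
		if f.X < mid {
			left = append(left, f)
		} else {
			right = append(right, f)
		}
	}
	// Sort a column top to bottom, breaking ties left to right.
	byPosition := func(col []fragment) {
		sort.SliceStable(col, func(i, j int) bool {
			if col[i].Y != col[j].Y {
				return col[i].Y < col[j].Y
			}
			return col[i].X < col[j].X
		})
	}
	byPosition(left)
	byPosition(right)
	lines := make([]string, 0, len(frags))
	for _, f := range append(left, right...) {
		lines = append(lines, f.Text)
	}
	return strings.Join(lines, "\n")
}

func main() {
	frags := []fragment{
		{X: 320, Y: 115, Text: "small cat."},
		{X: 50, Y: 100, Text: "Puma concolor is a"},
		{X: 320, Y: 100, Text: "Felis catus is a"},
		{X: 50, Y: 115, Text: "large cat."},
	}
	fmt.Println(twoColumnText(frags, 612)) // 612 pt = US Letter width
}
```

For the sample in `main` the output is the whole left column ("Puma concolor is a", "large cat.") followed by the right one, instead of alternating between columns line by line.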
Dmitry Mozzherin
@dimus
@harshzalavadiya looks like docconv is a wrapper around the pdftotext command-line tool, which means introducing an external dependency that has to be installed independently of gnfinder-plus. Because of that, I would say your unipdf solution has an advantage over the docconv solution.
I added info about docconv and gnfinder-plus to https://github.com/gnames/gnfinder/wiki/RFC-for-%22gnfinder-plus%22-project
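The cost of such an external dependency shows up at run time: a Go library is compiled into the binary, while a wrapper around pdftotext only works if that program is installed on the user's machine. A minimal sketch of the kind of start-up check a wrapper-based build would need; `requireTools` is a hypothetical helper, not part of docconv or gnfinder-plus.

```go
package main

import (
	"fmt"
	"os/exec"
)

// requireTools is a hypothetical start-up check: it collects every external
// command-line tool that is missing from PATH, so a program that shells out
// to, for example, pdftotext can fail early with a clear message.
func requireTools(names ...string) error {
	var missing []string
	for _, n := range names {
		if _, err := exec.LookPath(n); err != nil {
			missing = append(missing, n)
		}
	}
	if len(missing) > 0 {
		return fmt.Errorf("missing external tools on PATH: %v", missing)
	}
	return nil
}

func main() {
	if err := requireTools("pdftotext"); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("pdftotext found")
}
```

A self-contained library avoids this class of failure entirely, which is the advantage noted above.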