    Harshit Agarwal
    @harshitagarwal2
    Hey, I am trying to get the most-searched keywords in a search engine. Would I be able to do that with crawler4j?
    Federico Tolomei
    @s17t
    @pgalbraith thank you for your contributions. The Guava library is in the dependencies again. I would like to avoid including Guava 'just' for InternetDomainName. Is there any alternative implementation? Maybe from Apache?
    Paul Galbraith
    @pgalbraith
    @s17t Hi, I looked into this one quite a bit and wasn't able to find any satisfying alternatives other than the two I found (i.e. Guava for static lookup, and the https://github.com/whois-server-list/public-suffix-list lib for external download/compare) ... ultimately, though, I still think this is beyond what crawler4j should be doing ... just provide the URL and let the consumer decide if they need to do more work to determine public/private host.
    Federico Tolomei
    @s17t
    I can't merge anymore in yasserg/crawler4j. Does anybody know how to contact Yasserg? Thx
    rz
    @rzo1
    Maybe via an issue on GitHub? It seems that he revoked your contributor rights...
    Federico Tolomei
    @s17t
    I opened one two days ago: yasserg/crawler4j#384
    still no luck via his LinkedIn
    Federico Tolomei
    @s17t
    I have created an organization, https://github.com/crawler4j/crawler4j, with an import of the main repo. I have added @yasserg as admin (we will see if he responds). @pgalbraith, @rzo1 and @Chaiavi have write permission in the repo.
    rz
    @rzo1
    still no luck @s17t? if so, we should proceed on the organizational repository and request Sonatype permissions for releases...
    Federico Tolomei
    @s17t
    sadly no
    as for Nexus: at the time of the 4.0/4.5 release I uploaded my GPG keys, so I still have authorization to push artifacts
    Federico Tolomei
    @s17t
    Concerns have been raised about keeping the name 'crawler4j' for the fork. I support keeping the name 'crawler4j' as long as the original Yasser copyright statement is maintained in the README and in the documentation. The license is Apache 2, so the license is not an issue.
    rz
    @rzo1
    the question is whether we should change the package / groupId structure ...
    that would be a clean break in terms of a real fork.
    Federico Tolomei
    @s17t
    that is another question
    I would like to avoid the change. I will try to contact the edu.uci.ics admin
    rz
    @rzo1
    ok.
    rz
    @rzo1
    any updates from Yasserg?
    Federico Tolomei
    @s17t
    nope :(
    I have been talking with @pgalbraith by email about a hard fork
    these days. We will propose something in the next few days, I think.
    rz
    @rzo1
    alright :)
    i am happy to see the project going forward. there is a lot of open work :)
    rz
    @rzo1
    Okay. It seems the official repo is "dead" :/
    Federico Tolomei
    @s17t
    Hi, everybody interested in an evolutionary fork of crawler4j is invited to vote on the fork's name: http://www.strawpoll.me/17363919
    rz
    @rzo1
    ok
    rotis23
    @rotis23
    Hi all. Anyone still around on this? Was this successfully forked?
    Sai Aditya Harish
    @77aditya77
    Hi
    Anyone around?
    Can anyone tell whether moving this if-else logic to another thread would increase the crawler's crawling performance?
    Paul Galbraith
    @pgalbraith
    Fork never happened, I think we both got too busy :-), though I'd still like to fork. I have particular interest in being able to deploy crawlers as J2EE batch processes, so if I get around to it, that's what I'll be adding in a new fork.
    Vincent Lee
    @MightyVincent
    Hi everyone, there seem to be a bunch of new commits but no release since 2018. Is this repo "dead"?