    bnewbold
    @bnewbold
    Hi @Umeaboy! I am the person who can start a new translation.
    We are currently at the limit on the number of languages our weblate account allows for a single project. I will get in touch with them about raising the limit, but it could be a couple of weeks before we can start a new language
    Gerard Meijssen
    @GerardMeijssen_twitter
    Hey @bnewbold why not consider translatewiki.net, you then get the whole Wikimedia translator crowd as a bonus.
    There is indeed a fatcat identifier at Wikidata so you can link an author to a Wikidata item. EVERY author of a paper is notable enough.. even better having a fatcat identifier is a qualifier as well.. (meagre as a data point.. but hey)
    bnewbold
    @bnewbold
    @GerardMeijssen_twitter we have been really happy with weblate so far. there seems to be a pretty large community of active contributors, it runs on free software (which we could run ourselves in the future if need be), the interface is easy to understand, the machine translation feature is easy to use, the shared/community translation corpus is large, and it integrates very nicely with our development process. translatewiki.net might have other benefits but it would be a bunch of logistics to switch and I wouldn't want to lose our existing contributors
    bnewbold
    @bnewbold
    as a heads up, fatcat.wiki is going to be in read-only maintenance mode for about a day some time this week. API and web interface reads (searches, etc) should continue to work fine.
    this is an upgrade of operating system (from ubuntu xenial to ubuntu focal) and postgresql (from 11 to 13), and also sets us up for replicated operation, which should keep the service up during power outages (expected to be more frequent in California this year)
    stencil
    @stencil:matrix.org
    Weblate features such as suggestions from past translations, nearby keys, history, other languages, and comments are very useful
    stencil
    @stencil:matrix.org
    Their software isn't as polished as weblate IMO, quite a few projects have moved to weblate
    bnewbold
    @bnewbold
    hey folks, let's please keep the discussion here on-topic. continued comparisons on translation tools could be taken to a 1-to-1 conversation or another venue
    do appreciate the recommendations and input thus far
    bnewbold
    @bnewbold
    we have started rolling out a citation graph ("refcat") as part of fatcat.wiki: https://guide.fatcat.wiki/reference_graph.html
    billions of references! we include some references from wikipedia to papers, and from papers to open library books.
    planning to improve parsing of citation strings; improve reference matching; and handle references to general web resources better (eg, showing wayback status/URL and possibly some site-specific metadata)
    feedback welcome!
    Stencil
    @stencil:matrix.org
    not sure this is the place, but is there a way to bundle links to research papers, like "Research papers on software development with x library" or "Machine learning in x subject"?
    bnewbold
    @bnewbold
    @stencil:matrix.org that could certainly be built on top of fatcat/scholar, and use the corpus to build it. I don't think it is likely to be part of the core service/API itself
    https://paperswithcode.com/ is a pretty good start! it would likely be possible to expand that with content from scholar. for instance, I bet there are a bunch of papers mentioning or referencing specific datasets, software, and models, which could be discovered using scholar fulltext search API and/or the refcat citation graph. and then update paperswithcode with that information
    datasets and software are frequently mentioned by name instead of a formal citation. there are efforts to identify these using entity recognition, but a simple keyword/phrase search over a fulltext index, where there are only a few thousand libraries/datasets of interest, should be possible
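
A minimal sketch of the keyword/phrase idea above, run over plain text rather than a real fulltext index. The watch list `NAMES` and both function names are hypothetical; nothing here calls the scholar search API or refcat.

```python
import re

# Hypothetical watch list; a real one might hold a few thousand names.
NAMES = ["scikit-learn", "ImageNet", "TensorFlow"]


def build_matcher(names):
    """Compile one case-insensitive pattern that matches any name as a whole token."""
    # Longest names first, so a longer phrase wins over a shorter prefix.
    ordered = sorted(names, key=len, reverse=True)
    alternation = "|".join(re.escape(n) for n in ordered)
    # Lookarounds instead of \b, so names containing hyphens or dots still
    # have to stand alone rather than sit inside a longer word.
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)


def find_mentions(text, matcher, names):
    """Count mentions per canonical name in one document's text."""
    canonical = {n.lower(): n for n in names}
    found = {}
    for match in matcher.finditer(text):
        name = canonical[match.group(0).lower()]
        found[name] = found.get(name, 0) + 1
    return found


matcher = build_matcher(NAMES)
sample = "We trained on ImageNet using scikit-learn and imagenet features."
mentions = find_mentions(sample, matcher, NAMES)
```

Plain name matching like this will produce false positives for names that are also ordinary words, so hits would still need review before being pushed anywhere.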
    bnewbold
    @bnewbold
    starting a deploy of fatcat API v0.4.0 to production servers. API changes are pretty minor; proposal and CHANGELOG entry will go out as part of deploy
    bnewbold
    @bnewbold
    fatcat.wiki and api.fatcat.wiki are going to have downtime for several hours during the day this Wednesday (USA/Pacific time), due to power work in the building. scholar.archive.org and search.fatcat.wiki should work fine
    at some point we plan to set up read-only replication of the main api.fatcat.wiki database server, which should allow us to continue serving some requests during such downtime, but this work hasn't been scheduled yet
    bnewbold
    @bnewbold
    reminder: downtime tomorrow for many hours
    Hashi
    @stencil:matrix.org
    What's the gitter room for normal chat?
    bnewbold
    @bnewbold
    I'm deploying a series of large code style cleanups to fatcat services. everything has been tested, so hopefully no noticeable changes!
    bnewbold
    @bnewbold
    Fatcat has a few hours of planned downtime scheduled for Monday, 2021-12-20. This is for building power maintenance. We have plans for replicating the API and web interfaces to prevent such planned downtime in the future, but it isn't ready yet
    almugabo
    @almugabo

    Greetings. Is anyone planning to import publication records from so-called "policy documents"? I mean things like IPCC reports (chapters) and their references, like here:

    https://doi.org/10.5281/zenodo.5475442

    almugabo
    @almugabo
    I am asking because I would like to work on this, but would like to avoid reinventing the wheel and would rather join forces with others
    almugabo
    @almugabo
    Thanks for publishing the mapping of fatcat ids to external identifiers (fatcat_bulk_exports_2021-12-01/release_extid.tsv.gz).
    I have a question about the headers: I see (on a sample of the first 100 rows) that it has 6 fields, and could work out that 2 = revision_id, 3 = doi, 4 = pmc, 5 = pubmed, and 6 = wikidata.
    My question: what is the first field? I was expecting it to be the work_id or ident, but this does not seem to be the case. (here is an example:
    edcbcbd3-18a4-4340-ba71-a129f3859270 a454dc99-f473-4e21-8e24-1054dab1e85b 10.3390/insects6040869 PMC4693176 26512699 Q30394287)
    Am I missing something?
    bnewbold
    @bnewbold
    Hi @almugabo !
    This seems like it is likely to be a good contribution. "policy documents" fall on the edge of scope for fatcat, but if they cite and are cited then they are in scope
    documents that get a DOI, of any type, are usually imported automatically. eg, the document you linked is in the catalog here: https://fatcat.wiki/release/koyv44ordjaarcna6mq7medk5y
    our auto-crawling has gotten behind; catching that up, and crawling things like PDFs on zenodo.org from the past year, is something I am working on actively
    (oh, I see, this particular zenodo item is a set of files, not a single PDF)
    (and you probably mean the referred docs, not the IPCC report itself)
    I don't know anybody working on this directly, so probably not going to conflict
    bnewbold
    @bnewbold
    regarding the extid dumps: these can cause confusion. they are used for internal data quality checks but are not easily reusable (and, eg, don't include all extid fields).
    the first two columns are the release ident and work ident in the form of a UUID, which is how they are stored in the database. when transformed to JSON, we encode the UUID as a hash-like string instead of UUID, but this export is generated directly from the database so they haven't been transformed.
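
A rough sketch of converting between the two forms. It assumes the hash-like ident is the lowercase base32 encoding of the 16 UUID bytes with padding stripped; that matches the 26-character idents seen in fatcat.wiki URLs, but the encoding is not spelled out in this conversation. The function names are illustrative.

```python
import base64
import uuid


def uuid2fcid(s: str) -> str:
    """UUID string -> 26-char ident (assumed encoding: lowercase base32, no padding)."""
    raw = uuid.UUID(s).bytes  # 16 bytes
    # 16 bytes encode to 26 significant base32 characters plus 6 '=' of padding.
    return base64.b32encode(raw)[:26].lower().decode("ascii")


def fcid2uuid(s: str) -> str:
    """26-char ident -> canonical UUID string (inverse of uuid2fcid)."""
    raw = base64.b32decode(s.upper() + "======")
    return str(uuid.UUID(bytes=raw))


# Example row from the extid dump quoted above; the real file is tab-separated.
row = ("edcbcbd3-18a4-4340-ba71-a129f3859270\t"
       "a454dc99-f473-4e21-8e24-1054dab1e85b\t"
       "10.3390/insects6040869\tPMC4693176\t26512699\tQ30394287")
release_uuid, work_uuid, doi, pmcid, pmid, wikidata_qid = row.split("\t")

release_ident = uuid2fcid(release_uuid)
work_ident = uuid2fcid(work_uuid)
```

If the assumption holds, `release_ident` is the string that would appear after `/release/` in a fatcat.wiki URL.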
    I would recommend using the release export dump (JSON lines) directly. if the size of that file is too large for you to download, I can do a quick transform using jq locally and upload JSON lines of just the ident and release extids, if that is helpful. You can ping (@) me here or email me (username @archive.org) as well
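
For anyone who would rather do that transform themselves, here is a hedged Python equivalent of the jq step. It assumes each line of the release export is one JSON object with an `ident` field and an `ext_ids` object; both field names and the sample records are assumptions, not taken from this conversation.

```python
import json


def extract_extids(lines):
    """Yield {'ident', 'ext_ids'} for each release record in a JSON lines stream."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        release = json.loads(line)
        yield {
            "ident": release.get("ident"),            # assumed field name
            "ext_ids": release.get("ext_ids") or {},  # assumed field name
        }


# Made-up sample records, shaped like the assumed schema.
sample_lines = [
    '{"ident": "aaaaaaaaaaaaaaaaaaaaaaaaaa", "title": "x", '
    '"ext_ids": {"doi": "10.3390/insects6040869", "pmid": "26512699"}}',
    '',
    '{"ident": "bbbbbbbbbbbbbbbbbbbbbbbbbb", "title": "y"}',
]
rows = list(extract_extids(sample_lines))

# Against a real dump, stream the file instead of loading it whole, e.g.:
#   import gzip
#   with gzip.open("release_export.json.gz", "rt", encoding="utf-8") as f:  # hypothetical filename
#       for row in extract_extids(f):
#           print(json.dumps(row))
```

Streaming line by line keeps memory use flat, which matters given the size of the full export.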
    bnewbold
    @bnewbold
    The Fatcat API (and web interface) are going to be in read-only mode for maintenance for a couple of time spans this week. Likely from now for up to 24 hours, then again later in the week.
    there will be a banner on the site linking here for updates. searching, browsing, and viewing should not be impacted
    bnewbold
    @bnewbold
    ^ this work is complete, fatcat is back in read/write mode and catching up on recent publications
    Hashi
    @stencil:matrix.org
    I have found a dspace domain that I would like to submit for auto crawling, how can I do this?
    bnewbold
    @bnewbold
    @stencil:matrix.org can you share the domain here? do you know if any specific persistent identifier is used (DOI, OAI-PMH, ark, etc)? we have crawled many OAI-PMH repositories, but have not indexed them into fatcat.wiki yet. we have also crawled many DOIs for repositories (often registered via datacite)