was used, but I'd agree that the above is ambiguous and probably warrants rewording. tldr has many clients written in different programming languages; the one in the brew package, which I think you have installed, is here.
I have never used this client, so I am currently checking whether there is a way to turn off the colors.
Thoughts on us maintaining a translation dataset using tl;dr as the source?
Ideally, I'd like to submit it to Opus as a public dataset, so projects that need lots of translations can use it for whatever natural language use-case they have.
I'm mainly hoping to do this to extend the datasets Argos Translate and LibreTranslate have access to. (Both free, open-source, and self-hostable machine translations.)
I've already written a script which more or less works. (Yes I know it's very messy!)
https://gist.github.com/SethFalco/3efb0c73017d7d2e656fb148cc8692e4
I'll attach the output of it atm, which is also a mess, but I was just rushing it a bit.
If we don't want to maintain it in the repo, that's fine with me.
I could still just clean it up and provide the script/dump to Opus if they'd be happy to take it.
I just figured it could be nice to make it an artifact in CI, similar to the PDF we export.
For context on Argos Translate and LibreTranslate:
Ahh, I'll elaborate tomorrow.
I'm a bit busy now and need to sleep soon. 🥱
Work tomorrow. 🥲
I'll try to keep this concise so it's not an essay. 😆
OPUS is a website that hosts free public datasets for mapping natural language. (i.e. translations)
These datasets vary in size, though some are quite small.
They are used mostly for aiding projects that need large amounts of translation data.
Two projects I like, Argos Translate and LibreTranslate, use these datasets to train models for machine translation, and they've built a cloud service comparable to Google Translate, except it's free, open-source, self-hostable, and works offline.
I would like to generate a dataset which matches the schema for OPUS, using tl;dr as the source. Then this can be used to train models as well by Argos Translate.
My request was, if I cleaned up the script I made above, could we commit that to the repo and actually make the dataset an artefact in CI (GitHub Actions) as well, similar to the tldr PDF/book?
If not, I'll just provide the script and dataset to OPUS directly.
The way the script works, it reads all English pages, and finds where we have a translated page.
Then it parses the markdown, and creates pairs of command descriptions to commands. (During the clean up, I should have it process the overall page description as well, currently only taking command descriptions.)
When comparing descriptions between languages, I use the command as an ID so we know we're getting the correct thing.
Since our translations can be inconsistent, i.e. the order changes or commands are added/removed without the translation being updated, we can't just assume everything matches out of the box.
Then we remove all translatable elements from commands, i.e. {{...}}, so we can use the command as a nice ID. (Because we translate {{path/to/file}}, for example.)
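For illustration, stripping the {{...}} placeholders could look something like this (a rough sketch, not the actual script from the gist):

```python
import re

# tldr placeholder tokens look like {{path/to/file}}. Replacing them
# with a fixed marker makes the command string language-independent,
# so it can be used as an ID across translations.
PLACEHOLDER = re.compile(r"\{\{.*?\}\}")

def command_id(command: str) -> str:
    """Normalize a tldr example command into a placeholder-free ID."""
    return PLACEHOLDER.sub("{{}}", command)

# English and translated pages normalize to the same ID:
command_id("tar cf {{target.tar}} {{file1}} {{file2}}")
# → "tar cf {{}} {{}} {{}}"
```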
We then remove all descriptions/commands that are duplicates, since we have no way of reliably selecting the right translation to map them to.
Then we find the same command in both the English page and the translated page, and save the description pair to the dataset.
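The matching and deduplication steps above could be sketched roughly like this (hypothetical helper, with pages modeled as lists of (command, description) tuples; the real script is in the gist linked earlier):

```python
from collections import Counter

def pair_descriptions(english, translated):
    """Pair up descriptions of matching commands across two pages.

    Each page is a list of (command, description) tuples, where the
    command has already had its {{...}} placeholders stripped so it
    can serve as a language-independent ID.
    """
    # Drop commands that appear more than once on either page: with
    # duplicates we can't tell which translation belongs to which.
    en_counts = Counter(cmd for cmd, _ in english)
    tr_counts = Counter(cmd for cmd, _ in translated)
    tr_by_cmd = {cmd: desc for cmd, desc in translated if tr_counts[cmd] == 1}

    pairs = []
    for cmd, en_desc in english:
        if en_counts[cmd] == 1 and cmd in tr_by_cmd:
            pairs.append((en_desc, tr_by_cmd[cmd]))
    return pairs
```

This also handles the inconsistency problem mentioned above: commands that were added or reordered without the translation being updated simply produce no pair, rather than a wrong one.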
I think it'd be nice to keep it in a separate repo actually!
I was thinking that myself after I sent the messages, but didn't want to spam the chat about this since no one had responded yet. ^-^'
In that case, on Friday probably I'll read up what format it actually needs to be, and how I can improve the script.
I think the only con to separating is that we either only maintain the script to generate the dataset, or put in some extra effort so it builds on every commit to the main project, i.e. build via webhook or something. I'm assuming just maintaining the script and instructions on how to build on demand would be fine, though; while convenient, I don't think constant builds are actually that useful in this case.
This would clean up a lot of clutter.
To be exact, it would bring down the number of issues from 158 to 70.
Ayy, Matrix is growing more and more.
Just learned that Rocket.Chat has adopted Matrix as well now!
So now we have multiple established clients supporting it: Element, FluffyChat, Fractal, Thunderbird, etc.
Multiple non-profits have moved to it from IRC, like Mozilla, GNOME, and KDE, and some governments are using Matrix now, like France; there was also a deal to move the German healthcare system onto it.
GitLab let Element acquire Gitter, so that they could maintain it as part of the Matrix ecosystem.
Now Rocket.Chat is adding native support for Matrix.
Nextcloud already works together with Rocket.Chat to integrate their chats, so I suspect that'll come with it, or follow up soon after Rocket.Chat has stable support.
https://matrix.org/blog/2022/05/30/welcoming-rocket-chat-to-matrix
Just in time for the EU to start mandating chat interoperability. (Law passed on 5th July 2022.)
The DMA is set to force changes in companies' businesses, requiring them to make their messaging services interoperable and provide business users access to their data.
I still doubt in the end companies will use Matrix, they'll probably end up creating their own "Big Tech"-standard which is anti-encryption and advertiser friendly, but we'll wait and see.
Maybe if Matrix grows fast enough, it'll be harder to excuse.
I'm wary that something like what Twitter is doing will happen, though: we've had ActivityPub for years, yet Twitter likes to pretend it doesn't exist and claims to be revolutionising social media by creating its own standard.
tldr-pages:main and highlights it, while all other PRs show only main
We are having issues with the Build PDF step on CI.
AttributeError: module 'importlib' has no attribute 'util'
Some Py dev in the room?
https://github.com/tldr-pages/tldr/runs/7360708494?check_suite_focus=true
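If it's the usual cause of that error, a plain `import importlib` doesn't guarantee the `util` submodule is loaded, so `importlib.util` can fail with exactly that AttributeError on some Python versions. Importing the submodule explicitly should fix it (a guess without seeing the script, but worth trying):

```python
# Instead of relying on `import importlib` alone, import the
# submodule explicitly so the attribute is guaranteed to exist:
import importlib.util

# Example: look up a module's import spec.
spec = importlib.util.find_spec("json")
print(spec is not None)  # True when the module can be found
```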