jim
@jimrothstein
Actual behavior: ls -1 | cut -c 1-16 is to KEEP the first 16 characters
Desktop
Downloads
R
R_daily_update.l
bin
client_secret_46
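
To make the behavior above concrete, here is a minimal Python sketch of what `cut -c 1-16` does to each line: it keeps characters 1 through 16 and discards the rest. The helper `cut_chars` and the two full file names are hypothetical (only the truncated output appears in the chat), and the sketch assumes plain ASCII names.

```python
def cut_chars(line: str, start: int, end: int) -> str:
    """Mimic `cut -c start-end`: keep the 1-indexed, inclusive character range."""
    return line[start - 1:end]


# Hypothetical directory listing; the two long names are invented for illustration.
names = ["Desktop", "Downloads", "R_daily_update.log", "client_secret_4600.json"]

# Names shorter than 16 characters pass through unchanged; longer ones are truncated.
kept = [cut_chars(name, 1, 16) for name in names]
for entry in kept:
    print(entry)
```

The range selects what is kept, not what is removed, which is why "cut out" reads ambiguously.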
sethi
@sethi:one.ems.host
[m]
I could totally see why the term cut out was used, but I'd agree that the above is ambiguous and probably warrants rewording.
@jimrothstein If you're up for it, you could think what you'd rather it said instead and open a PR?
Stathis Kapnidis
@StathisKap
Do you guys know how to change tldr's colors? It's unreadable on my terminal
CleanMachine1
@CleanMachine1
Which client are you using @StathisKap
Stathis Kapnidis
@StathisKap
what do you mean?
I'm on macOS. tldr version is tldr v1.4.3 (v1.4.3)
CleanMachine1
@CleanMachine1

TLDR has many clients made in different programming languages. The one in the brew package, which I think you have installed, is here

I have never used this client, so I am currently checking whether there is a way to turn off the colors

CleanMachine1
@CleanMachine1
From what I can tell, there is no config file. However, you could either use the Python client, which has the configuration you need and can be found here, or change the ANSI codes being used here to something else, like "\033[37m" (the color white) or anything from here, then recompile the code and move the binary to the correct place
However, the edits may cause problems when displaying codes :shrug:
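
As a rough illustration of the ANSI codes mentioned above, here is a small Python sketch (not the client's actual C source). `"\033[37m"` selects a white foreground and `"\033[0m"` resets the terminal attributes; the helper name `colorize` is hypothetical.

```python
WHITE = "\033[37m"  # the escape code quoted in the chat: white foreground
RESET = "\033[0m"   # resets all colors and attributes


def colorize(text: str, code: str = WHITE) -> str:
    """Wrap text in an ANSI color code and reset the terminal afterwards."""
    return f"{code}{text}{RESET}"


# On a terminal that supports ANSI escapes, this line is shown in white.
print(colorize("example text"))
```

Swapping `WHITE` for another foreground code changes the color in the same way.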
Stathis Kapnidis
@StathisKap
ok got it. I'll probably just go for the python version. Thanks a lot man
marchersimon
@marchersimon
Can I merge #7872? I'm not sure, because I haven't been around lately and it's not a trivial change
4 replies
waldyrious
@waldyrious:matrix.org
[m]
I'm not sure when I'll be able to re-review it. However, most of my concerns have been described in my past comments there. I trust that you guys are able to discern whether they have been addressed or not. In any case, additional changes can be submitted as separate PRs, so don't fret too much about it :)
Starbeamrainbowlabs
@sbrl
Sorry for the extended absence, everyone!
What with PhD work and being burnt out on tldr-pages (it's still a bit scary thinking of looking at my GH notifications), I haven't been around much here
But while I've been away I've realised that my experiences here have been very valuable, and I'm proud to call myself an open-source maintainer.
So I'll try my best to keep up around here, but I might be a bit sparse for a little while as I get into it again :-)
waldyrious
@waldyrious:matrix.org
[m]
A PhD takes up a lot of one's mental energy, @sbrl :) it's perfectly understandable that you might not have the bandwidth to remain actively maintaining the project (burnout notwithstanding). Thanks for all your work over the years btw!
sethi
@sethi:one.ems.host
[m]

Thoughts on us maintaining a translation dataset using tl;dr as the source?
Ideally, I'd like to submit it to Opus as a public dataset, so projects that need lots of translations can use it for whatever natural language use-case they have.

I'm mainly hoping to do this to extend the datasets Argos Translate and LibreTranslate have access to. (Both free, open-source, and self-hostable machine translations.)

I've already written a script which more or less works. (Yes I know it's very messy!)
https://gist.github.com/SethFalco/3efb0c73017d7d2e656fb148cc8692e4

I'll attach the output of it atm, which is also a mess, but I was just rushing it a bit.

sethi
@sethi:one.ems.host
[m]

If we don't want to maintain it in the repo, that's fine with me.
I could still just clean it up and provide the script/dump to Opus if they'd be happy to take it.

I just figured it could be nice to make it an artifact in CI, similar to the PDF we export.

For context on Argos Translate and LibreTranslate:

Starbeamrainbowlabs
@sbrl
I'm not sure what you're suggesting @sethi:one.ems.host?
sethi
@sethi:one.ems.host
[m]

Ahh, I'll elaborate tomorrow.
I'm a bit busy now and need to sleep soon. 🥱

Work tomorrow. 🥲

Starbeamrainbowlabs
@sbrl
no worries :-)
sethi
@sethi:one.ems.host
[m]

I'll try to keep this concise so it's not an essay. 😆

OPUS is a website that hosts free public datasets for mapping natural language. (i.e. translations)
These datasets vary in size, though some are quite small.
They are used mostly for aiding projects that need large amounts of translation data.

Two projects I like called Argos Translate and LibreTranslate use these datasets to train models for machine translations, and made a cloud service which is comparable to Google Translate, except it's free, open-source, self-hostable, and works offline.

I would like to generate a dataset which matches the schema for OPUS, using tl;dr as the source. Then this can be used to train models as well by Argos Translate.


My request was, if I cleaned up the script I made above, could we commit that to the repo and actually make the dataset an artefact in CI (GitHub Actions) as well, similar to the tldr PDF/book?

If not, I'll just provide the script and dataset to OPUS directly.


The way the script works, it reads all English pages, and finds where we have a translated page.
Then it parses the markdown, and creates pairs of command descriptions to commands. (During the clean up, I should have it process the overall page description as well, currently only taking command descriptions.)

When comparing descriptions between languages, I use the command as an ID so we know we're getting the correct thing.
Since our translations can be inconsistent, i.e. the order changes or commands are added/removed without the translation being updated, we can't just assume everything matches out of the box.

Then we remove all translatable elements from commands, i.e. {{...}} so we can use the command as a nice ID. (because we translate {{path/to/file}}, for example)
We then remove all descriptions/commands that are duplicates, since we have no way of 100% selecting the right translation to map it to.

Then find the same command between the English page, and the translated page, and save the description to the dataset.
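
The steps above can be sketched roughly as follows. This is not the script from the gist, just a minimal illustration assuming pages follow the usual layout of a `- description:` line followed by a backticked command. The function names and the sample pages (including the command `example`) are made up.

```python
import re
from collections import Counter

PLACEHOLDER = re.compile(r"\{\{.*?\}\}")  # translatable parts such as {{path/to/file}}


def parse_page(markdown):
    """Return (description, command) pairs from a tldr-style page."""
    pairs, description = [], None
    for line in markdown.splitlines():
        line = line.strip()
        if line.startswith("- "):
            description = line[2:].rstrip(":").strip()
        elif line.startswith("`") and line.endswith("`") and description is not None:
            pairs.append((description, line.strip("`")))
            description = None
    return pairs


def command_id(command):
    """Strip {{...}} placeholders so the command can serve as a stable ID."""
    return " ".join(PLACEHOLDER.sub("", command).split())


def index_by_id(pairs):
    """Map command ID to description, dropping IDs that occur more than once."""
    counts = Counter(command_id(cmd) for _, cmd in pairs)
    return {command_id(cmd): desc for desc, cmd in pairs if counts[command_id(cmd)] == 1}


def align(english_md, translated_md):
    """Pair descriptions whose commands match after placeholder removal."""
    en = index_by_id(parse_page(english_md))
    tr = index_by_id(parse_page(translated_md))
    return [(en[key], tr[key]) for key in en if key in tr]


# Invented sample pages: the translation reorders entries and translates a placeholder.
english = """# example

- List items:

`example --list {{path/to/file}}`

- Remove an item:

`example --remove {{name}}`
"""
german = """# example

- Ein Element entfernen:

`example --remove {{name}}`

- Elemente auflisten:

`example --list {{pfad/zu/datei}}`
"""
dataset = align(english, german)
print(dataset)
```

Because matching is keyed on the command rather than on position, reordered or missing entries in a translation simply produce fewer pairs instead of wrong ones.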

Starbeamrainbowlabs
@sbrl
Ohhh I see! Sure thing
go ahead
so long as the licence of the repo (CC-BY) is adhered to, feel free :-)
waldyrious
@waldyrious:matrix.org
[m]
I think that's a great idea, Seth :) it's especially useful to provide a tech-related corpus for machine translation, since for many languages such specialized vocabulary is often missing in these resources
However, I wonder if we shouldn't put it in a separate repository (still in the tldr-pages org) because unlike e.g. the pdf or the zipped archive, the output is not a generally consumable resource, but rather a format specific to the Opus project, IIUC
WDYT?
sethi
@sethi:one.ems.host
[m]

I think it'd be nice to keep it in a separate repo actually!
I was thinking that myself after I sent the messages, but didn't want to spam the chat about this since no one had responded yet. ^-^'

In that case, on Friday I'll probably read up on what format it actually needs to be in, and how I can improve the script.

  1. Just so it's not a cluttered mess.
  2. So I can take more time to evaluate how to maximise valid mappings.

I think the only con to separating is that we either only maintain the script to generate the dataset, or put in some extra effort so it builds on commit to the main project, i.e. build on webhook or something. I'm assuming just maintaining the script and instructions on how to build on demand would be fine though. While convenient, I don't think constant builds are actually that useful in this case.

Starbeamrainbowlabs
@sbrl
I'm sure a GitHub Action or webhook would be possible
waldyrious
@waldyrious:matrix.org
[m]
Yeah, we could make it build periodically, but given the scale that these corpora operate on, I agree that building on each commit is overkill.
waldyrious
@waldyrious:matrix.org
[m]
Yeah, given the scale that machine translation corpora operate on, I agree that building it on each commit is overkill. But we could still build it automatically on a schedule that would be reasonable (say, monthly or so)
Starbeamrainbowlabs
@sbrl
sounds good @waldyrious
I wonder if it's possible to do with GitHub Actions? hmmmmm
marchersimon
@marchersimon
What do you think about moving all Page request issues into a single one? This would clean up a lot of clutter.
I'm not sure what we would do with new requests

This would clean up a lot of clutter.

To be exact, it would bring down the number of issues from 158 to 70

waldyrious
@waldyrious:matrix.org
[m]
IMO the downsides (e.g. ceasing to have easy issues that new contributors can address and actually close, or not being able to have a detailed conversation about a particular command without making the thread excessively long) outweigh the benefits (of having fewer issues in the tracker, which btw I hardly see as a significant benefit anyway).
marchersimon
@marchersimon
That's a good point. However I feel like most of those issues are almost never addressed/closed anyways.
Starbeamrainbowlabs
@sbrl
That would make it harder to discuss individual pages I think
marchersimon
@marchersimon
Then let's leave it
CleanMachine1
@CleanMachine1
I would also like for there to be less clutter, however as mentioned, the downsides outweigh the upsides unfortunately.
Starbeamrainbowlabs
@sbrl
You can always filter to hide issues with a given tag IIRC
Matthew Peveler
@MasterOdin
The GitHub API token for doing releases for tldr-c-client has expired (https://github.com/tldr-pages/tldr-c-client/runs/7313233179?check_suite_focus=true), and so someone needs to set a new one.
6 replies
sethi
@sethi:one.ems.host
[m]

Ayy, Matrix is growing more and more.
Just learned that Rocket.Chat has adopted Matrix as well now!

So now we have multiple established clients with it, Element, FluffyChat, Fractal, Thunderbird, etc.
Multiple non-profits have moved to it from IRC, like Mozilla, GNOME, and KDE, and some governments are using Matrix now, like France. There was also a deal to move the German healthcare system onto it.

GitLab let Element acquire Gitter, so that they could maintain it as part of the Matrix ecosystem.
Now Rocket.Chat is adding native support for Matrix.
Nextcloud already works together with Rocket.Chat to integrate their chats, so I suspect that'll come with it, or follow up soon after Rocket.Chat has stable support.

https://matrix.org/blog/2022/05/30/welcoming-rocket-chat-to-matrix

Just in time for the EU to start mandating chat interoperability. (Law passed on 5th July 2022.)

The DMA is set to force changes in companies' businesses, requiring them to make their messaging services interoperable and provide business users access to their data.

https://www.reuters.com/technology/eu-lawmakers-pass-landmark-tech-rules-enforcement-worry-2022-07-05/

I still doubt in the end companies will use Matrix, they'll probably end up creating their own "Big Tech"-standard which is anti-encryption and advertiser friendly, but we'll wait and see.

Maybe if Matrix grows fast enough, it'll be harder to excuse.

I'm wary that something like what Twitter is doing will happen though, we've had ActivityPub for years, yet Twitter like to pretend that it doesn't exist and are revolutionising social media by creating their own standard.

What I'm referring to by the Twitter comments:
https://nitter.net/jack/status/1204766078468911106
sethi
@sethi:one.ems.host
[m]
Oh shoot… sorry! ^-^'
I just realized I put this in the main chat, not #tldr-pages_off-topic:gitter.im. DX
Matthew Peveler
@MasterOdin
Also, could someone rename the default branch of the tldr-c-client to main so that it matches the other repos? There are no open PRs, so it's pretty low risk.
2 replies
Emily Grace Seville
@EmilySeville7cfg
Can I merge #7951 PR? Not all comments are resolved.
7 replies
marchersimon
@marchersimon
Can anybody see something wrong with the branches in #8205? I use Refined GitHub, which shows me when someone opens a PR against a non-default branch. In this PR it shows tldr-pages:main and highlights it, while all other PRs show only main
2 replies
Axel Navarro
@navarroaxel

We are having issues with the Build PDF step on CI.

AttributeError: module 'importlib' has no attribute 'util'

Any Python devs in the room?
https://github.com/tldr-pages/tldr/runs/7360708494?check_suite_focus=true

5 replies
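
For context on that error message: it typically appears when code accesses `importlib.util` after only running `import importlib`, since the `util` submodule is not guaranteed to be loaded unless it is imported explicitly. Whether that is the cause in this CI run is an assumption, as the failing code isn't shown here. A minimal sketch of the explicit import, with a hypothetical helper `module_available`:

```python
import importlib.util  # explicit submodule import; plain `import importlib` may not expose `util`


def module_available(name: str) -> bool:
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


print(module_available("json"))
```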