Sune Simonsen
@sunesimonsen
There is also the memory usage to consider. It is mainly to be able to do something like this:
await program(
  glob('./packages/*/package.json'),   // expand the glob into matching file paths
  jsonStream('.dependencies.*'),       // stream-parse each file, emitting the values at this JSON path
  tap()                                // print whatever reaches the end of the pipeline
)
But you might be right that it is always fast and better to do something like this:
await program(
  glob('./packages/*/package.json'),   // expand the glob into matching file paths
  map(fs.readFile),                    // read each whole file into memory
  map(JSON.parse),                     // parse each document in one go
  jsonFind('.dependencies.*'),         // pick the dependency values out of the parsed object
  tap()                                // print whatever reaches the end of the pipeline
)
Andreas Lind
@papandreou
If that's the main use case, I don't think you'll get any performance or memory benefits from streaming the JSON as opposed to JSON.parse(str).dependencies. You still want a big fraction of the ...
yeah
I think you need to be in a situation where the JSON files are in the GB range where you want to pick out just small bits.
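In other words, something like this (a minimal sketch of the non-streaming approach being compared here; readDependencies is a made-up helper, not part of any library mentioned in this conversation):

const fs = require('fs');

const readDependencies = async (path) => {
  // Parse the whole document at once, then keep only the part you need.
  const json = JSON.parse(await fs.promises.readFile(path, 'utf8'));
  return json.dependencies; // the rest of the parsed object becomes garbage right away
};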
Sune Simonsen
@sunesimonsen
Yes probably, but it is still a lot of stuff you need in memory, so that is still something to consider.
Andreas Lind
@papandreou
It'll immediately be scavenge-garbage collected if you only keep a reference to the stuff you need right after JSON.parse
Sune Simonsen
@sunesimonsen
:+1:
I'll ice it till I have GBs of JSON :-)
Andreas Lind
@papandreou
You know that I'm a streaming nut and would usually welcome any excuse to apply it if there was a benefit :)
Sune Simonsen
@sunesimonsen
I think you might be right, that most bigger JSON data is line delimited.
Andreas Lind
@papandreou
Yeah, I think that's the lazy (geek point-wise) but extremely efficient and pragmatic way to do it :)
Sune Simonsen
@sunesimonsen
I've gotten pretty far with the initial documentation https://github.com/sunesimonsen/transformation/blob/master/packages/core/Readme.md, it's starting to look like something :-)
Andreas Lind
@papandreou
That looks great. I'll try picking it up when I run into a suitable problem :)
Andreas Lind
@papandreou
I have not looked at that lint-staged replacement idea for some time now, but the breakdown of that is something like:
  • Open .git/index, parse and stream the entries (metadata about each staged file)
  • Apply some sort of filtering based on the extensions that have to be linted
  • Maybe fan out to a worker pool:
    • Read the files in the staged state from git's object store based on the sha in the index entry
    • Run the contents of each file through the linters configured for the file type (optionally in --fix mode, which would mean that the linters can't run in parallel on one file)
    • If --fix mode, write each changed file back to the git object storage
    • If any (non-fixable) errors happen, indicate it on the screen as soon as possible (probably has to involve the main thread?)
    • Report a status about each file back to the main thread
  • Back in the main thread after all the steps have completed
  • Rewrite the git index if any files were auto-fixed.
  • Exit with 1 and a report if at least one file had linting errors
The invocation of the linters would happen through their programmatic APIs, so there would be a benefit from require-ing each of them only once.
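A rough sketch of the first couple of steps, assuming plain Node.js and git's CLI rather than parsing .git/index by hand (lintFile is a hypothetical placeholder, not an existing function):

const { execFileSync } = require('child_process');

// List staged entries: "<mode> <sha> <stage>\t<path>" per line.
const entries = execFileSync('git', ['ls-files', '--stage'])
  .toString()
  .trim()
  .split('\n')
  .map((line) => {
    const [meta, path] = line.split('\t');
    const [mode, sha] = meta.split(' ');
    return { mode, sha, path };
  })
  // Only keep the extensions we have linters configured for.
  .filter(({ path }) => /\.(js|jsx|ts)$/.test(path));

for (const { sha, path } of entries) {
  // Read the staged (not working-tree) contents from git's object store.
  const contents = execFileSync('git', ['cat-file', 'blob', sha]).toString('utf8');
  // lintFile(path, contents) would run the configured linters here,
  // possibly fanned out to a worker pool.
}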
Sune Simonsen
@sunesimonsen
It is a long recipe, but it sounds doable. If you can imagine it happening sequentially, then you can always apply concurrency to individual parts of the pipeline.
Andreas Lind
@papandreou
Ah, thanks for that!
That's good for inspiration
Sune Simonsen
@sunesimonsen
There are still a lot of things that could be a bit simpler, but I usually just implement things as I need them :-)
One thing that is a bit hard to get used to is that you should try to avoid global state, but using memoization seems to be one of the tricks to get around that.
Andreas Lind
@papandreou
I'll probably try to write it out naively first, just to see what the lower bound on the potential improvement is :)
Sune Simonsen
@sunesimonsen
The nice thing is that you can put a tap at the end and just start building up the pipeline. You will be able to see what you get while you write it.
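For instance, reusing the operators from the earlier snippets (just an illustration of the workflow, not a complete program):

await program(
  glob('./packages/*/package.json'),
  // add map(fs.readFile), map(JSON.parse), ... one step at a time
  tap()   // keeps printing whatever currently reaches the end of the pipeline
)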
Andreas Lind
@papandreou
:heart_eyes:
Sune Simonsen
@sunesimonsen
I don't want to hash the files twice, but I don't want coordination either, so I'm just using this trick https://github.com/sunesimonsen/workspace-cache/blob/eed8b11f8df74d99cc3415abbd7f1e65cfb4dbed/lib/calculatePackageHash.js#L4
You can try it and see if you think it is fun.
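That trick amounts to something like this (a simplified sketch, not the actual code behind that link): memoize the hash computation behind a promise, so the first caller does the work and every later caller reuses the same result, with no explicit coordination.

const crypto = require('crypto');
const fs = require('fs');

const cache = new Map();

const hashFile = (path) => {
  if (!cache.has(path)) {
    // Store the promise itself, so concurrent callers share one computation.
    cache.set(
      path,
      fs.promises
        .readFile(path)
        .then((buf) => crypto.createHash('sha256').update(buf).digest('hex'))
    );
  }
  return cache.get(path);
};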
Alex J Burke
@alexjeffburke
I’m also still keen to try this - started to read the CSP paper a couple of weekends back and got my mind blown
@sunesimonsen do you have a sense yet of what kinds of problems are well expressed in this form? I think that's my big question - when you habitually implement things using a certain set of approaches, I think you have to learn to spot cases where you can do something different
Sune Simonsen
@sunesimonsen
@alexjeffburke anything that reads input, does one or more transformations, and outputs something. Input can come from anywhere and output can go anywhere. So it is a pretty large set of problems you can solve this way.
I have made the yarn workspace cache manager with it, and a program that reads a CSV of stock transactions, makes a CSV for each year, and outputs a report webpage.
I'm trying to add enough tools that it would be appropriate for any kind of scripting I do: moving files around, calling external programs, that kind of stuff.
Sune Simonsen
@sunesimonsen
Programs that continually read user input from the console and drive an interaction flow should also be possible.
The only requirement is that things are linear.
Gert Sønderby
@gertsonderby
I got to thinking. The trick here is to read a 'chunk' of JSON that is parseable, right? So you have only a few syntactic cases: a boolean value, a number, a string, lists and dictionaries. Wouldn't it be doable to create a very simple parser to determine these cases? The primitives should be easy enough - true|false|(?:0(?:x(?:0)?)?\d+(?:\.\d+)?(?:e\d+)?)|"[^"]*" - so you'd just need a way to count parentheses, as it were.
Then throw that chunk at the native JSON parser and bob's your uncle.
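A rough sketch of the chunking idea being described (just an illustration; escape handling and the primitive cases are simplified, and extractChunk is a made-up name):

const extractChunk = (text, start = 0) => {
  let depth = 0;
  let inString = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (ch === '\\') i++;                 // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;                       // ignore brackets inside strings
    } else if (ch === '{' || ch === '[') {
      depth++;
    } else if (ch === '}' || ch === ']') {
      depth--;
      // Balanced again: hand the complete chunk to the native parser.
      if (depth === 0) return JSON.parse(text.slice(start, i + 1));
    }
  }
  return undefined; // no complete chunk yet; wait for more input
};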
Sune Simonsen
@sunesimonsen
You usually would use something like this under the covers https://www.npmjs.com/package/sax
Gert Sønderby
@gertsonderby
What I mean is to deal with the whole speed thing. Parsing the stream on the fly can't be done with JSON.parse. So what you need is a fast chunking algorithm that can.
Sune Simonsen
@sunesimonsen
Yes, that is what a sax parser will do.
It gives you events when it sees tokens.
Gert Sønderby
@gertsonderby
Yeah. But does it eat JSON?
Sune Simonsen
@sunesimonsen
Ahh sorry, wrong pointer :-)
Gert Sønderby
@gertsonderby
And the tokens we mostly care about would be only []{}, right?
(And checking if those are inside "" pairs so we can disregard those.)
Sune Simonsen
@sunesimonsen
This one looks about right: https://www.npmjs.com/package/stream-json
But yes, if you are just looking at picking out things, you might be able to find a substring you can call JSON.parse with.
But if you are already parsing tokens, I think it would be more performant to just build up the objects as you see them; that is pretty trivial.
Gert Sønderby
@gertsonderby
True enough. I guess the benchmarks will tell!
Sune Simonsen
@sunesimonsen
@papandreou convinced me that you will probably never run into a case where JSON.parse won't be appropriate. Even if it is an interesting problem.
For large things you will probably always use https://en.wikipedia.org/wiki/JSON_streaming
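For the line-delimited variant that article mostly describes, a minimal sketch is just reading line by line and parsing each line on its own (readNdjson is a made-up name):

const fs = require('fs');
const readline = require('readline');

const readNdjson = async (path, onRecord) => {
  const rl = readline.createInterface({
    input: fs.createReadStream(path),
    crlfDelay: Infinity,
  });
  for await (const line of rl) {
    // Each non-empty line is a complete JSON document.
    if (line.trim() !== '') onRecord(JSON.parse(line));
  }
};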