Christian Schneider
@schnaader
At the moment I'm revising the recompression steps. Before, we had 4 steps (decompression -> recompression -> decompression -> recompression), the last 2 only used for partial matches. It looks like the last 2 can be removed, leading to a slightly worse compression ratio on these streams, but speeding things up a lot (for silesia.zip: 20% speedup, 0.2% worse compression). The next commit will address that - two temporary files fewer!
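To make the "recompression step" concrete, here is a minimal sketch of the underlying idea with zlib's public API: inflate the stream once, then search for deflate parameters that reproduce the original bytes exactly. This is illustrative only and not Precomp's actual code; the helper name and the simple full-search loop are assumptions.

```cpp
// Sketch: find zlib parameters that re-create the original deflate bytes.
#include <zlib.h>
#include <cstring>
#include <vector>

// Try to find (level, memLevel) so that re-deflating `raw` reproduces `original`.
// Returns true on a full (bit-identical) match; partial matches are not handled here.
bool find_recompression_params(const std::vector<unsigned char>& raw,
                               const std::vector<unsigned char>& original,
                               int& best_level, int& best_mem_level) {
  for (int level = 9; level >= 1; --level) {
    for (int mem_level = 9; mem_level >= 1; --mem_level) {
      z_stream strm{};
      // windowBits = -15: raw deflate, no zlib header
      if (deflateInit2(&strm, level, Z_DEFLATED, -15, mem_level,
                       Z_DEFAULT_STRATEGY) != Z_OK)
        continue;
      std::vector<unsigned char> out(deflateBound(&strm, raw.size()));
      strm.next_in = const_cast<unsigned char*>(raw.data());
      strm.avail_in = static_cast<uInt>(raw.size());
      strm.next_out = out.data();
      strm.avail_out = static_cast<uInt>(out.size());
      int ret = deflate(&strm, Z_FINISH);
      size_t produced = out.size() - strm.avail_out;
      deflateEnd(&strm);
      if (ret == Z_STREAM_END && produced == original.size() &&
          std::memcmp(out.data(), original.data(), produced) == 0) {
        best_level = level;
        best_mem_level = mem_level;
        return true;  // stream can be restored bit-identically on decompression
      }
    }
  }
  return false;  // no full match found with these parameters
}
```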
Christian Schneider
@schnaader
That was easier than expected - schnaader/precomp-cpp@faa4776
kraileth
@kraileth
Nice! 20% less time spent for just slightly worse compression is the kind of trade-off that makes sense. Any plans to bring the old method back as a switch for people who want to squeeze the last bit of compression possible out of some file?
Christian Schneider
@schnaader
Sorry, I don't think I'll add such a switch, for 3 reasons: 1. Using the old code (with temporary files) would be a bad mix when everything else is done without temporary files, 2. Adjusting the code to do the 2 stages in memory would be complex as all 4 stages depend on each other, 3. When the new deflate handling ("difflate") is ready, there will be no more partial matches.
Christian Schneider
@schnaader
However, on second thought, a switch to disable partial matches completely might be interesting - there might be cases with many small partial matches that hurt compression.
Christian Schneider
@schnaader
"The real hero of programming is the one who writes negative code." - score today: -154 LOC
Christian Schneider
@schnaader
OK, another 15% speedup on silesia.zip (compression is exactly the same; nothing changed except removing the need for one of the temporary files). The next thing is to do zLib decompression in memory; after that, issue #14 will be done and temporary files will only be used for GIF and some special files like multi-PNGs and JPG/MP3 files bigger than 64 MB.
Christian Schneider
@schnaader
Fixed two cases where the new versions with less temporary file usage had worse compression even for non-partial matches. Reason was Precomp thinking it had found the best compression/memory level combination although it hadn't. This was already a bug even before removing the 2 last recompression steps, but these steps prevented most of the impact, so the bug was hidden.
Christian Schneider
@schnaader
Quick status update: Just finished liblzma on-the-fly compression and decompression in sftt's branch; you can test it here: https://github.com/sftt/precomp-cpp/tree/schnaader - at the moment, you can use "-cl" (lzma), "-cx" (xz) and "-cm" (lzma2, multithreaded). I will try to clean up and merge soon. After that, the last temporary file will fall and 0.4.6 will come closer.
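For readers unfamiliar with liblzma: a minimal single-shot xz encode with liblzma's buffer API looks roughly like the sketch below. This is only an assumption about what the "-cx" output path corresponds to conceptually; Precomp's actual integration streams data on the fly and the function name here is made up for illustration.

```cpp
// Sketch: one-shot xz compression of an in-memory buffer with liblzma.
#include <lzma.h>
#include <cstdint>
#include <vector>

std::vector<uint8_t> xz_compress(const std::vector<uint8_t>& in, uint32_t preset = 6) {
  // Worst-case output size for an xz stream of this input size.
  std::vector<uint8_t> out(lzma_stream_buffer_bound(in.size()));
  size_t out_pos = 0;
  lzma_ret ret = lzma_easy_buffer_encode(
      preset, LZMA_CHECK_CRC64, nullptr,   // default allocator
      in.data(), in.size(),
      out.data(), &out_pos, out.size());
  if (ret != LZMA_OK)
    out_pos = 0;                           // signal failure with an empty buffer
  out.resize(out_pos);
  return out;
}
```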
PrinceGupta200
@PrinceGupta200
I get a 1 KB file using -cm, and Precomp stops at 68.93% without any warning or anything.
Tested 3 times, same result every time.
PrinceGupta200
@PrinceGupta200
Tested file: Test.tar (167,150 KB)
precomp -intense0 -cn --> 305,947 KB
precomp -intense0 -cl --> 104,174 KB
precomp -intense0 -cx --> 104,174 KB
precomp -intense0 -cm --> 1 KB
Christian Schneider
@schnaader
Interesting, thanks for reporting. I haven't tested -cm on big files (> 10 MB) yet, there might still be bugs, I'll have a look at it.
PrinceGupta200
@PrinceGupta200
here is the console output
https://drive.google.com/open?id=0BxQSCiRtuvAScEItaUowbjhBM3M
Maybe it's helpful for you.
PrinceGupta200
@PrinceGupta200
One more thing: no temp files were left behind in any case.
Abhishek Sharma
@RamiroCruzo
Hey schnaader sir, I wonder - the partial matching code only uses about the first 100 bytes, so why not shift those operations into memory entirely?
Christian Schneider
@schnaader
Doing everything in memory already is in progress, when it's done, no temporary files will be used anymore for zLib streams.
Abhishek Sharma
@RamiroCruzo
Nice. Also, for brute, why don't ya use a ring buffer? It can speed things up a lot.
Christian Schneider
@schnaader
Brute and intense mode are special cases - brute originally was meant to be used to analyze small files only.
The good news is, I recently found out that intense mode is slowed down massively by temporary files, so these two modes will be much faster when done.
Abhishek Sharma
@RamiroCruzo
Haha, well, I've been saying this for a long time now.
As for brute being for small files: at the time it was built, it was supposed to be run on small files only, but what's the problem in speeding it up now? ;)
Christian Schneider
@schnaader
Well, even after speeding it up, it will still be very slow. Adding file format support for files that contain headerless deflate streams is much better than using brute mode, which is slow and has many false positives that decrease the compression ratio.
Abhishek Sharma
@RamiroCruzo
Well, that's true in a way too.
Why don't ya, instead of trying to decompress deflate at every position, just try to decode the Huffman tables?
It's faster than the full deflate approach.
Christian Schneider
@schnaader
In the end, this is the same thing that zLib does here if the stream is invalid - decoding Huffman and returning an error code. Removing the zLib wrapper overhead shouldn't speed it up that much.
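To illustrate the point: probing a file offset for a headerless deflate stream with zlib already fails almost immediately when the bits at that position don't form valid Huffman tables or block headers. The sketch below is not Precomp's brute-mode code; the function name, buffer sizes, and acceptance criterion (requiring a complete stream) are assumptions for illustration.

```cpp
// Sketch: probe a position for a raw (headerless) deflate stream with zlib.
#include <zlib.h>
#include <cstddef>

bool looks_like_raw_deflate(const unsigned char* data, size_t size, size_t pos) {
  z_stream strm{};
  if (inflateInit2(&strm, -15) != Z_OK)   // -15 = raw deflate, no zlib header
    return false;

  unsigned char sink[4096];               // throwaway output buffer
  strm.next_in = const_cast<unsigned char*>(data + pos);
  strm.avail_in = static_cast<uInt>(size - pos);

  bool ok = false;
  for (;;) {
    strm.next_out = sink;
    strm.avail_out = sizeof(sink);
    int ret = inflate(&strm, Z_NO_FLUSH);
    if (ret == Z_STREAM_END) { ok = true; break; }        // complete stream found
    if (ret != Z_OK) break;                               // invalid data -> reject fast
    if (strm.avail_in == 0 && strm.avail_out != 0) break; // ran out of input
  }
  inflateEnd(&strm);
  return ok;
}
```

A real detector would likely also require a minimum decompressed size to weed out the false positives mentioned above.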
Abhishek Sharma
@RamiroCruzo
Hmmm, okie then :D Well, I was only collecting even the smallest speed gains we can get.
On top of that, should we be focusing on MT too?
Christian Schneider
@schnaader
I haven't tested it, though, so it actually might help a bit. But my guess would be like 5-10% faster whereas removing temporary files will make things 5-10 times faster.
Yeah, MT will dominate the 0.4.7 work; it can be used both in compression and restoring and is an easily achievable linear speedup.
Abhishek Sharma
@RamiroCruzo
After MT, those 5-10% would be great XD
Also, even after removing the temps, you should reduce disk activity; Precomp seeks forward and backward many times, leading to slower access.
Christian Schneider
@schnaader
Yes, especially the input file buffering could be done better.
Abhishek Sharma
@RamiroCruzo
Yep :D
Abhishek Sharma
@RamiroCruzo
Dead room eh...
Abhishek Sharma
@RamiroCruzo
There sir? Got a few interesting results with Precomp
Christian Schneider
@schnaader
Yes, I'm online.
Christian Schneider
@schnaader
Finally merged with the lzma2 branch. The new multithreaded lzma2 compression method is now used by default.
Abhishek Sharma
@RamiroCruzo
Nicee :D
Prince Gupta
@guptaprince

File: html book 20161029.zip

Precomp (schnaader/precomp-cpp@9ddc049): -intense0 -cn
Time: 33 minute(s), 15 second(s)
Size: 162,947,142 bytes

PZLIB V3 (Hotfix): -m2 -x -s -b128m -t1
Time: 2.8745 minutes
Size: 170,417,418 bytes

Reflate (v1l1): c
Time: 15.781 seconds
Size: 170,417,418 bytes

Don't you think Precomp is way too slow?
Is it because of very small zlib streams inside, or something else?

Christian Schneider
@schnaader
Intense mode is very slow because of the temporary files that are created even if they aren't used; the next version will fix that. After that, it should be comparable to the pzlib result. Note that reflate will still be faster because it only recompresses once, something that will be addressed by difflate.
Abhishek Sharma
@RamiroCruzo
Yahallo sir, long time no see. Is difflate finally coming?
Prince Gupta
@guptaprince
What should be the blueprint for removing the temps?
Prince Gupta
@guptaprince
I'm thinking of taking the lazy approach: fmemopen for Linux, and for Windows using CreateFileMapping and then _fdopen to get a FILE*.
For the buffers, I think half of Precomp would have to be rewritten.
Christian Schneider
@schnaader
Look at the paq variants with zlib routines; they are what I'll use as blueprints. Basically, there's no need to keep the whole file in memory, only 64 KB portions, since deflate uses 32 KB windows and recompression needs another 32 KB as "lookback".
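As a rough illustration of that 64 KB idea (again, an assumption sketched by the editor, not Precomp's actual data structure): a fixed-size ring buffer that keeps only the most recent window of decompressed data - 32 KB for deflate's back-references plus 32 KB of extra lookback for recompression.

```cpp
// Sketch: 64 KB ring buffer holding only the most recent decompressed data.
#include <cstddef>
#include <cstdint>

class SlidingWindow {
 public:
  static constexpr size_t kSize = 64 * 1024;  // 32 KB deflate window + 32 KB lookback

  void append(const uint8_t* data, size_t len) {
    for (size_t i = 0; i < len; ++i) {
      buf_[pos_ % kSize] = data[i];
      ++pos_;
    }
  }

  // Bytes still addressable as back-references / recompression lookback.
  size_t available() const { return pos_ < kSize ? pos_ : kSize; }

  // Copy `len` bytes starting `dist` bytes before the current position.
  // Caller must ensure dist <= available() and len <= dist.
  void copy_back(size_t dist, uint8_t* out, size_t len) const {
    size_t start = pos_ - dist;
    for (size_t i = 0; i < len; ++i)
      out[i] = buf_[(start + i) % kSize];
  }

 private:
  uint8_t buf_[kSize] = {};
  size_t pos_ = 0;  // total bytes ever written
};
```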