Christian Schneider
@schnaader
Precomp doesn't store a hash if -cn is used, see schnaader/precomp-cpp#66
AntiZ has a slightly different approach to the params: it uses some heuristics to make guesses about them. Also, IIRC, it doesn't remember parameters that were used before, which is something that speeds up Precomp for most files
jagannatharjun
@jagannatharjun
AntiZ tries to extract information from the headers. Precomp doesn't?
Christian Schneider
@schnaader
Precomp does it too, but the headers are too small and unreliable. See RFC 1950: https://www.ietf.org/rfc/rfc1950.txt
There are 2 bits called "compression level" (FLEVEL) that indicate whether fastest/fast/default/max compression was used, but with this information, you can't really determine the actual compression level setting (0-9). At least the window size in there is reliable.
On the other hand, in other formats like ZIP, there simply is no ZLIB header, so you don't know anything.
These are pure deflate streams, https://www.ietf.org/rfc/rfc1951.txt
Christian Schneider
@schnaader
Precomp uses just the window size and ignores FLEVEL because it could be anything and it's not needed for decompression. AntiZ uses the information, but I think this is bad if FLEVEL is not correct.
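To make the header discussion concrete, here is a minimal Python sketch of reading the two-byte ZLIB header as laid out in RFC 1950. It is not taken from Precomp or AntiZ; the function name `parse_zlib_header` is made up for illustration. It shows why FLEVEL is only a coarse hint (2 bits) while the window size can be read directly.

```python
def parse_zlib_header(data: bytes):
    """Return (window_size, flevel, fdict) for a plausible RFC 1950 header, else None."""
    if len(data) < 2:
        return None
    cmf, flg = data[0], data[1]
    cm = cmf & 0x0F      # compression method, 8 = deflate
    cinfo = cmf >> 4     # log2(window size) - 8, at most 7
    if cm != 8 or cinfo > 7:
        return None
    # RFC 1950: CMF*256 + FLG must be a multiple of 31 (FCHECK bits).
    if (cmf * 256 + flg) % 31 != 0:
        return None
    window_size = 1 << (cinfo + 8)
    flevel = flg >> 6            # 0..3: fastest / fast / default / max
    fdict = bool(flg & 0x20)     # preset dictionary flag
    return window_size, flevel, fdict
```

Because FLEVEL only has four values, several of the ten zlib levels necessarily share one value, so the exact level still has to be found by trying candidates.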
jagannatharjun
@jagannatharjun
Setting the window size to -15 and initializing the stream byte by byte helps me detect them and gives me an almost correct inflation size
Test
But for the first file the ratio should be around 316 (by pzlib); I assume this is because of diff
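A rough Python sketch of the byte-by-byte idea, assuming the standard `zlib` module: a negative `wbits` value of -15 tells zlib to expect a raw deflate stream without a header, and a fresh decompressor is started at every offset. The function name `find_raw_deflate` and the `min_output` threshold are assumptions for illustration, not what Precomp or the code discussed here actually does.

```python
import zlib

def find_raw_deflate(data: bytes, min_output: int = 32):
    """Yield (offset, compressed_len, inflated_len) for offsets that inflate as raw deflate."""
    for offset in range(len(data)):
        # wbits=-15: raw deflate, no ZLIB header, 32 KiB window.
        inflater = zlib.decompressobj(wbits=-15)
        try:
            out = inflater.decompress(data[offset:])
        except zlib.error:
            continue  # not a valid deflate stream at this offset
        # Only accept streams that reached their end marker and are not tiny.
        if inflater.eof and len(out) >= min_output:
            consumed = len(data) - offset - len(inflater.unused_data)
            yield offset, consumed, len(out)
```

This scan is quadratic in the worst case and can still report false positives, so a real detector would need extra checks on top of it.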
jagannatharjun
@jagannatharjun
How about the abstraction in hxim/paq8px@b1f1f50, recently introduced in paq8px_v125?
Christian Schneider
@schnaader
Yeah, I've seen that one. The latest Precomp version (0.4.6) is quite similar (everything in memory until 64 MB exceeded), but still writes temporary files when recursion is used - so it could be useful there.
The culprits in Precomp are things like the Multi PNGs and various parsings that jump between file positions. Also, recursion gets a filename instead of an abstract file/memory pointer. Changing it is possible, but quite some work, so I delayed it for now.
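The "in memory until a limit is exceeded, then a temporary file" behaviour can be sketched in Python as a small file-like class. This is only an illustration of the idea; `HybridBuffer` and its attributes are hypothetical names and have nothing to do with the actual Precomp or paq8px code.

```python
import io
import tempfile

class HybridBuffer:
    """File-like buffer that stays in memory up to `limit` bytes, then spills to a temp file."""

    def __init__(self, limit: int = 64 * 1024 * 1024):
        self.limit = limit
        self.on_disk = False
        self._f = io.BytesIO()

    def write(self, chunk: bytes) -> int:
        # Spill before the write that would push the data past the limit.
        if not self.on_disk and self._f.tell() + len(chunk) > self.limit:
            self._spill()
        return self._f.write(chunk)

    def _spill(self) -> None:
        disk = tempfile.TemporaryFile()
        pos = self._f.tell()
        disk.write(self._f.getvalue())  # copy what was buffered so far
        disk.seek(pos)                  # keep the caller's position
        self._f = disk
        self.on_disk = True

    def seek(self, pos: int, whence: int = io.SEEK_SET) -> int:
        return self._f.seek(pos, whence)

    def tell(self) -> int:
        return self._f.tell()

    def read(self, n: int = -1) -> bytes:
        return self._f.read(n)

    def close(self) -> None:
        self._f.close()
```

Callers only see `write`/`seek`/`read`, so code that jumps between positions does not need to know where the bytes live. Python's standard library offers `tempfile.SpooledTemporaryFile` with a similar `max_size` behaviour.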
jagannatharjun
@jagannatharjun
Something like this can give a window of 64 MB or so
Christian Schneider
@schnaader
Things I'd like to do before are refactorings (at the moment much code is duplicated) and unit tests (so it can be automatically determined that such big changes didn't break anything)
Even small changes can lead to subtle bugs, like schnaader/precomp-cpp#76 shows
jagannatharjun
@jagannatharjun
such abstraction will help in refactoring
ZlibWrapper
jagannatharjun
@jagannatharjun
and unit tests
jagannatharjun
@jagannatharjun
"fout_fput*": what's wrong with casting the data directly to char?
Christian Schneider
@schnaader
Endianness mostly, though I guess a change in endianness would break something else anyway..
Christian Schneider
@schnaader
Since the software and the PCF files are platform independent, I wanted to make sure that 0x1234 is always written as 0x12, 0x34 instead of the usual 0x34, 0x12, and that this won't change with platforms/compilers. I also don't like the 3412 format because it isn't readable in hex editors :smile:
But there might be some clever way to both cast and ensure endianness.
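As an illustration of the byte order point (in Python, not Precomp's C++), the sketch below writes integers most significant byte first, independent of the host. The helper names are made up; the real `fout_fput*` functions are not shown in this chat, so this only mirrors the behaviour described above.

```python
import struct
import sys

def write_u16_be(out, value: int) -> None:
    """Write a 16-bit value as high byte, then low byte, using shifts only."""
    out.write(bytes([(value >> 8) & 0xFF, value & 0xFF]))

def write_u32_be(out, value: int) -> None:
    """Same idea for 32 bits; the '>' prefix forces big-endian in struct."""
    out.write(struct.pack(">I", value))

def native_u16(value: int) -> bytes:
    """What a direct cast would produce: host byte order ('=' prefix)."""
    return struct.pack("=H", value)

def host_is_little_endian() -> bool:
    return sys.byteorder == "little"
```

The shift-based version and the explicit `>` format give the same bytes on every platform, while `native_u16` is the variant whose output depends on the machine it runs on.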
jagannatharjun
@jagannatharjun
C++20 will have compile-time endianness detection
I think we can still do it with constexpr
Christian Schneider
@schnaader
I'm open to any code quality suggestions like this. The code as it is uses very few modern concepts; I'm pretty sure parts of it are still from the original code from 2006.
Too many features, not enough refactoring, not enough time, the usual..
jagannatharjun
@jagannatharjun
Would you accept an external library for the CLI?
Maybe deprecate precomf?
Christian Schneider
@schnaader
This would help very much, yes. As long as it keeps the parameter syntax, of course. Would also solve schnaader/precomp-cpp#9 , I guess
jagannatharjun
@jagannatharjun
i can but can we use this, then we don't have to worry about such things
Boost also has its own CLI interface, so maybe start using Boost; you never know what you need next
Christian Schneider
@schnaader
Will look into both, thanks. Boost also has endian buffers (http://www.boost.org/doc/libs/1_62_0/libs/endian/doc/buffers.html ), so yeah, like you said, never know what's needed next
jagannatharjun
@jagannatharjun
i can start working on it if you say
jagannatharjun
@jagannatharjun
can i add ZlibWrapper to precomp
Christian Schneider
@schnaader
Sure, feel free to fork and send pull requests :+1: Thanks in advance!
pasha-zzz
@pasha-zzz
Sometimes streams are encoded by base64 (maybe mime or uue)... What about recompression of such streams?
Christian Schneider
@schnaader
Base64 wrapped by mime should already work, others most certainly won't. The problem is that most text could be base64, so many potential false positives. I'll try a potential solution I had in mind that is similar to intense/brute mode - it will decode base64 always, so false positives, too, but will only keep it if something else (PNG/gif/...) is found in recursion
Also see schnaader/precomp-cpp#43
pasha-zzz
@pasha-zzz
Maybe add option to test for b64? My file contains "base64:77u/PCFET0NUWVB....", "base64:iVBORw0KGgoAAAANSU..." and so on. About text files: texts w/o spaces? Rare case. Simply add minimal b64 length for detection I think...
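The two ideas from this exchange (a minimum run length, and keeping a decoded run only if something recognizable is found inside) can be combined in a short Python sketch. Everything here is an assumption for illustration: the name `find_base64_payloads`, the default of 16 characters, and the small signature table are not Precomp's.

```python
import base64
import re

# A few well-known file signatures; purely illustrative, not Precomp's list.
MAGIC = {
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"GIF87a": "GIF",
    b"GIF89a": "GIF",
    b"\xff\xd8\xff": "JPG",
}

def find_base64_payloads(data: bytes, min_len: int = 16):
    """Yield (offset, kind, decoded) for base64 runs that decode to a known signature."""
    pattern = re.compile(rb"[A-Za-z0-9+/]{%d,}={0,2}" % min_len)
    for match in pattern.finditer(data):
        run = match.group()
        run = run[: len(run) - len(run) % 4]  # keep whole 4-character groups only
        try:
            decoded = base64.b64decode(run)
        except ValueError:
            continue
        # Keep the run only if the decoded bytes start with a known signature.
        for magic, kind in MAGIC.items():
            if decoded.startswith(magic):
                yield match.start(), kind, decoded
                break
```

Ordinary words are shorter than the minimum length and are skipped, and longer runs that decode to nothing recognizable are dropped. A run that is glued to preceding alphanumeric text would be misaligned and missed by this sketch.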
jagannatharjun
@jagannatharjun
can you provide an overview of pcf format?
Christian Schneider
@schnaader
Good idea, I can add the format description to the wiki. Basically, it's a small header ("PCF" + version) followed by two types of blocks, interleaved: 1. unmodified (copied) data, 2. processed streams
jagannatharjun
@jagannatharjun
Actually i am trying to make restoration parallel but the source code is too complex for me
Christian Schneider
@schnaader
I think the restoration part isn't that complex, it's the code around that is a big mess. Let me see if I can show you the relevant bits.
Christian Schneider
@schnaader
It goes through the file and looks for the mentioned block identifiers (0 = uncompressed, else processed streams)
In the big switch case afterwards are the routines for all the stream types. I'd suggest starting with MP3 and/or JPG for parallelism changes. They are quite slow and they don't involve recursion, which would make things complicated.
My main idea for it was to start an async task with the reconstruction code and write zero bytes as a placeholder to the output file (the length of the reconstructed stream is known). This way, the task can later write to the output file and only I/O has to be synchronized.
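The placeholder idea could look roughly like the following Python sketch. It assumes a simplified block list of `(block_id, payload, out_len)` tuples and a caller-supplied `reconstruct` function; both are hypothetical stand-ins, not the PCF format or Precomp's reconstruction routines. Copied blocks are written directly, processed blocks get zero bytes first and are filled in by worker tasks, with a single lock around the file I/O.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def restore(blocks, out_path, reconstruct, workers: int = 4) -> None:
    """blocks: iterable of (block_id, payload, out_len); block_id 0 means copied data."""
    write_lock = threading.Lock()
    with open(out_path, "wb") as out, ThreadPoolExecutor(max_workers=workers) as pool:

        def write_at(offset: int, data: bytes) -> None:
            with write_lock:  # only the seek+write pair is synchronized
                out.seek(offset)
                out.write(data)

        def rebuild(offset: int, payload: bytes, out_len: int) -> None:
            data = reconstruct(payload)  # slow part, runs outside the lock
            if len(data) != out_len:
                raise ValueError("reconstructed length mismatch")
            write_at(offset, data)

        futures = []
        offset = 0
        for block_id, payload, out_len in blocks:
            if block_id == 0:
                write_at(offset, payload)
            else:
                write_at(offset, bytes(out_len))  # zero placeholder
                futures.append(pool.submit(rebuild, offset, payload, out_len))
            offset += out_len
        for future in futures:
            future.result()  # re-raise any error from a worker
```

Python threads only run in parallel while the reconstruction code releases the interpreter lock, so this shows the structure of the approach rather than a performance result.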