Mikhail Yakshin
@GreyCat
If a language includes a native serializer/deserializer for VLQ, it will be very effective and straightforward to use
Bouke Versteegh
@boukeversteegh
ah yes, i've looked at the definition for that..
it is implemented basically in an OOP manner, with objects for each byte
Mikhail Yakshin
@GreyCat
Native implementation will not store anything
Bouke Versteegh
@boukeversteegh
that should be significantly slower compared to a native implementation that only does bitwise operations
right!
Mikhail Yakshin
@GreyCat
There's no point in storing individual bytes and wrapping them in some convoluted object structure, if you just care about the final result
On the other hand, if you care about representing the structure (e.g. for educational purposes or for research), you'll need some structure like the one introduced in KS
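To illustrate the "native" approach discussed here, below is a minimal sketch of a VLQ decoder that uses only bitwise operations and keeps nothing but the final result. It assumes the unsigned little-endian base-128 variant (7 payload bits per byte, high bit set on every byte except the last); the function name is made up for this example and is not part of any Kaitai Struct runtime.

```python
from typing import Tuple


def decode_vlq_base128_le(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one unsigned little-endian base-128 VLQ value.

    Returns (value, number_of_bytes_consumed). Assumed variant: the low
    7 bits of each byte are payload, the high bit means "more bytes follow".
    """
    value = 0
    shift = 0
    for consumed, byte in enumerate(data[offset:], start=1):
        value |= (byte & 0x7F) << shift  # merge the 7 payload bits
        if byte & 0x80 == 0:             # continuation bit clear: last byte
            return value, consumed
        shift += 7
    raise EOFError("VLQ value is truncated")


value, length = decode_vlq_base128_le(bytes([0xE5, 0x8E, 0x26]))
print(value, length)
```

No per-byte objects are created; the trade-off is that the individual groups are no longer available for inspection afterwards.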
Bouke Versteegh
@boukeversteegh
indeed.. and i think that there will always be some overhead, because the structure of the kaitai definition requires references to objects, whereas many of those structures you don't need in the end
Mikhail Yakshin
@GreyCat
That's right, it all boils down to the question of what you really need in the end
Bouke Versteegh
@boukeversteegh
interesting..
well, KS is in general a great starting point for understanding a binary format, if the definition is available
and can be a baseline to provide support in many languages
if needed, it can be reimplemented in a target language if performance isn't the best
ok, thanks for your insights @GreyCat ! enjoy your afternoon :)
Mikhail Yakshin
@GreyCat
:+1:
dgelessus
@dgelessus
Some performance anecdata from me: a few months ago I took a decompressor written in pure Python (for a non-standard legacy compression format) and rewrote it to use Kaitai Struct for parsing the compressed data. The test suite for the decompressor ran twice as slow with the KS-based implementation compared to the pure Python one. I don't think this is representative of other languages (or even other Python implementations) though - I was testing with CPython, which doesn't do JIT compilation and doesn't have many other runtime optimizations. The results were probably also affected by kaitai-io/kaitai_struct#804.
https://gitter.im/kaitai_struct/Lobby?at=5e800119c1880d2c9b49a426
Kenny Root
@kruton
In SSH, there are message numbers represented as u1, but a range of messages depend on what state it's in. If you started a GSS-API authentication, it should represent message IDs 60-66 as representing GSS-API messages. If you start a publickey authentication, message ID 60 will be to prove possession of the publickey. Is there a good way of representing this in Kaitai Struct definition files? I can't think of a way to have a switch-on enum where part of the cases would depend on the params.
dgelessus
@dgelessus
If it's not practical to create two slightly different variants of the enum (because there are a lot of other enum values that are identical between the two), the best solution is probably to rename the value so that all possible meanings are obvious, or give it a name like auth_method_specific_60 to make it clear that the meaning depends on the authentication method. KS enums are completely constant and don't support parameters, so there's no way to change the meaning of some enum values depending on context.
Kenny Root
@kruton
Okay, that's where I was headed. I'll see if I can define as much as possible in the .ksy file to simplify the business logic.
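One way the business-logic side could look, as a rough sketch: the .ksy file reads the message number as a plain u1, and the application resolves its meaning from the authentication method in progress. The table below is illustrative only. The labels are modeled on the message names in the SSH RFCs and should be checked against them, and `AuthMethod` and `message_name` are hypothetical names invented for this example.

```python
from enum import Enum


class AuthMethod(Enum):
    # Hypothetical enum describing which authentication method is active.
    PUBLICKEY = "publickey"
    GSSAPI = "gssapi-with-mic"


# Context-dependent meanings of method-specific message numbers.
# Labels are illustrative; verify them against the relevant RFCs.
METHOD_SPECIFIC_MESSAGES = {
    AuthMethod.PUBLICKEY: {
        60: "userauth_pk_ok",
    },
    AuthMethod.GSSAPI: {
        60: "userauth_gssapi_response",
        61: "userauth_gssapi_token",
        66: "userauth_gssapi_mic",
    },
}


def message_name(message_id: int, method: AuthMethod) -> str:
    """Resolve a raw u1 message number using the current auth method."""
    names = METHOD_SPECIFIC_MESSAGES[method]
    # Fall back to a neutral name, as suggested for the enum itself.
    return names.get(message_id, f"auth_method_specific_{message_id}")


print(message_name(60, AuthMethod.PUBLICKEY))
print(message_name(60, AuthMethod.GSSAPI))
```

The parser stays context-free this way, and only the lookup depends on protocol state.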
Bouke Versteegh
@boukeversteegh
@Maxzor_gitlab i'm ready, see email :)
dromer
@dromer
hi all. just found kaitai_struct in my search for parsing apev2 tags. still looking around, but having the apev2 spec I hope it should be fairly easy to get started
dromer
@dromer
hmm, getting this error:
/types/frame/seq/3/size: invalid type: expected integer, got BytesLimitType(IntNum(4),None,false,None,None)
On:
  frame:
    seq:
      - id: item_size
        size: 4
      - id: item_flags
        size: 4
      - id: item_key
        type: str
        size: 16
        terminator: 0
        encoding: UTF-8
      - id: item_value
        size: item_size
I have a delimited string there, but the parser keeps complaining about type of the size
just following the docs
dromer
@dromer
hm, managed to get no errors by changing the type of some fields to bytes (which also helps with reading the spec), but so far the parser doesn't give any objects with my test-file. Here is the definition so far (any comments welcome :) ):
meta:
  id: apev2
  file-extension: apetag
  endian: be
types:
  tag:
    seq:
      - id: header
        type: header
        size: 32
      - id: frames
        type: frame
      - id: footer
        type: footer
        size: 32
  header:
    seq:
      - id: preamble
        contents: 'APETAGEX'
        size: 8
      - id: version_number
        type: b32
      - id: tag_size
        type: b32
      - id: item_count
        type: b32
      - id: tag_flags
        type: b32
      - id: reserved
        contents: [0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0]
        size: 8

  frame:
    seq:
      - id: item_size
        type: b32
      - id: item_flags
        type: b32
      - id: item_key
        type: str
        terminator: 0
        encoding: UTF-8
      - id: item_value
        size: item_size

  footer:
    seq:
      - id: preamble
        contents: 'APETAGEX'
        size: 8
      - id: version_number
        type: b32
      - id: tag_size
        type: b32
      - id: item_count
        type: b32
      - id: tag_flags
        type: b32
      - id: reserved
        contents: [0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0]
        size: 8
dgelessus
@dgelessus
@dromer When you declare a field with only size and no type, it becomes a byte array. If you want item_size to be an integer, you need to declare it as type: u4 instead of size: 4. The error you're getting from the compiler basically means "the value of size (under item_value) must be an integer, but you supplied a byte array (namely item_size)".
type: b32 is mostly equivalent to type: u4, but you should really always use type: u4 unless you have a specific reason for using b32.
In the second code block that you posted, you're defining some types, but you don't have a top-level seq declaration. That's probably why you're not getting anything from the parser. I'm a bit surprised that the compiler allows this without errors...
dromer
@dromer
@dgelessus yes I changed a number of those fields to u4
don't see any errors, but also no result yet ;)
do I need to call the parser explicitly, or is it always evaluating?

ah, I have to add a

seq:
  - id: tag
    type: tag

To actually go over the sequence, right?

then I get a nice
Call stack: undefined KaitaiEOFError: requested 16777216 bytes, but only 758 bytes available
so at least it's trying to get data .. except way too much hehe
dgelessus
@dgelessus
hm, that sort of error is a bit difficult to debug without already knowing the format and seeing the data :) 16777216 in decimal is 0x1000000 in hex - maybe check that the endianness is correct and that you don't have a size field mixed up with a flags field?
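A quick way to see why that number hints at an endianness mix-up, as a small sketch: the four bytes below are a made-up example of a size field holding the value 1 in little-endian order, not data from the actual file.

```python
# Hypothetical 4-byte size field: the value 1 stored little-endian.
raw = bytes([0x01, 0x00, 0x00, 0x00])

as_little = int.from_bytes(raw, "little")  # intended reading
as_big = int.from_bytes(raw, "big")        # reading with the wrong endianness

print(as_little)    # 1
print(as_big)       # 16777216
print(hex(as_big))  # 0x1000000
```

A small value read with the wrong byte order turns into a huge one, which is why the parser then asks for far more bytes than the file contains.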
dromer
@dromer
hah, endianness was the trick indeed
I can now read the header and the very first item, now I need to figure out how to define that my 'frame' has a certain number of items (as defined in the header)
dgelessus
@dgelessus
you're looking for repeat: expr probably
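For comparison, here is roughly what "repeat the frame item_count times" amounts to when written by hand in Python. This is a sketch under assumptions: the field layout is taken from the definition posted above, all integers are assumed to be little-endian u4, the footer is ignored, and the sample bytes are synthetic. `parse_tag` and `build_item` are names made up for this example.

```python
import struct

# preamble, version_number, tag_size, item_count, tag_flags, reserved
HEADER = struct.Struct("<8sIIII8s")  # 32 bytes, little-endian (assumed)


def parse_tag(data: bytes):
    """Read the header, then exactly item_count frames (footer ignored)."""
    preamble, version, _tag_size, item_count, _tag_flags, _reserved = (
        HEADER.unpack_from(data, 0)
    )
    if preamble != b"APETAGEX":
        raise ValueError("missing APETAGEX preamble")
    pos = HEADER.size
    items = []
    for _ in range(item_count):  # the hand-written analogue of repeat: expr
        item_size, item_flags = struct.unpack_from("<II", data, pos)
        pos += 8
        key_end = data.index(b"\x00", pos)  # key is zero-terminated
        key = data[pos:key_end].decode("utf-8")
        pos = key_end + 1
        value = data[pos:pos + item_size]   # value length comes from item_size
        pos += item_size
        items.append((key, item_flags, value))
    return version, items


def build_item(key: str, value: bytes) -> bytes:
    """Build one synthetic frame for the demo below."""
    return struct.pack("<II", len(value), 0) + key.encode("utf-8") + b"\x00" + value


frames = build_item("Artist", b"Someone") + build_item("Title", b"Song")
sample = HEADER.pack(b"APETAGEX", 2000, len(frames) + 32, 2, 0, bytes(8)) + frames
print(parse_tag(sample))
```

In the .ksy file the loop is expressed declaratively, with the repeat count referring to the item count field parsed in the header.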
dromer
@dromer
yup :D
ok, can at least select everything in my file now! thnx :)
@dgelessus everything that is currently still a 'b' type will get its own sub-type probably. there are a number of blocks of bits that each have their own meaning and such
so if I want to parse those I will specify those more in depth
now to see how to use this with python
dromer
@dromer
hmm, trying to build kaitai-struct-compiler with scala from debian stable. not much luck yet ..
I think the scala version may be too old? in build.sbt I see scalaVersion := "2.12.4"
(I have never used scala in my life, btw)
dromer
@dromer
ok, managed to get the binary download to work and generated a python struct version from the ksy :)
Bouke Versteegh
@boukeversteegh
well done @dromer ! seems you got the hang of it quickly. was it hard to figure out how it all works?
dgelessus
@dgelessus
yeah, I would almost always recommend just using the prebuilt downloads for the command-line compiler. There's normally no need to build it yourself unless you're working on the compiler source code.
If you do want to build it from source - it should be enough if you get a recent enough version of sbt installed. sbt will then download the correct version of Scala when building the project.