librarianmage
@librarianmage:matrix.org
[m]
oh dear
TDV Alinsa
@alinsavix
That's not nearly as bad as I expected when I clicked it.
librarianmage
@librarianmage:matrix.org
[m]
after doing some experimentation it seems like kaitai struct isn't well-suited to parsing QOI images quite yet 😔
aside from the lack of backtracking facilities (not being able to parse the 8-bit tags before the 2-bit ones), the image file terminates with a multi-byte "magic" section that I can't figure out how to parse because there is no easily computable length of the file (since chunks can expand into multiple pixels, width * height is insufficient)
dgelessus
@dgelessus
Ah yep, that looks like another thing that would require proper lookahead or backtracking :(
actually, looking at the official decoder, it calculates the end of the image data by simply subtracting the fixed size of the header from the total file size
dgelessus
@dgelessus
you should be able to do something similar in Kaitai Struct, for example:
seq:
  - id: header
    type: header
  - id: body
    type: chunks
    size: _io.size - header._sizeof - footer._sizeof
  - id: footer
    contents: [0, 0, 0, 0, 0, 0, 0, 1]
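The same size arithmetic can be sketched outside Kaitai Struct. This is only an illustration: the function name split_qoi is made up here, and it assumes the 14-byte QOI header (magic, width, height, channels, colorspace) and the 8-byte end marker shown in the snippet above.

```python
from typing import Tuple

# Assumption: QOI header = magic(4) + width(4) + height(4) + channels(1) + colorspace(1)
QOI_HEADER_SIZE = 14
# End marker as given in the .ksy snippet above: seven 0x00 bytes, then 0x01
QOI_END_MARKER = bytes([0, 0, 0, 0, 0, 0, 0, 1])


def split_qoi(data: bytes) -> Tuple[bytes, bytes, bytes]:
    """Split a QOI file into (header, chunk bytes, end marker).

    Hypothetical helper: the chunk area is everything between the
    fixed-size header and the fixed-size end marker.
    """
    if len(data) < QOI_HEADER_SIZE + len(QOI_END_MARKER):
        raise ValueError("file too short to be a QOI image")
    body_end = len(data) - len(QOI_END_MARKER)
    header = data[:QOI_HEADER_SIZE]
    body = data[QOI_HEADER_SIZE:body_end]
    footer = data[body_end:]
    if footer != QOI_END_MARKER:
        raise ValueError("missing QOI end marker")
    return header, body, footer
```

The chunk bytes still have to be decoded one by one afterwards; this only finds where they stop.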
librarianmage
@librarianmage:matrix.org
[m]
ooh thanks!
Ruslan0Dev
@Ruslan0Dev
Hi everybody! How can I implement this?
A sequence in which some sections may be missing:
section1, section2, section3, section4
section1, section3, section4
section2, section3
sample.bin
sample-wo2.bin without section2
meta:
  id: sample
  endian: le
seq:
- id: section1
  type: t_sig1
- id: section2
  type: t_sig2
- id: section3
  type: t_sig3
- id: section4
  type: t_sig4
types:
  t_sig1:
    seq:
    - id: sig
      contents: ['ksy']
    - id: data
      size: 2
  t_sig2:
    seq:
    - id: sig
      contents: ['head']
    - id: data
      size: 4
  t_sig3:
    seq:
    - id: sig
      contents: ['thx']
    - id: data
      size: 3
  t_sig4:
    seq:
    - id: sig
      contents: ['hi']
    - id: data
      size: 5
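One way to think about the optional sections is "peek at the signature, then decide whether the section is present". The sketch below shows that logic in plain Python rather than in ksy; the signatures and sizes are copied from the .ksy above, while parse_sections and the SECTIONS table are names invented for this example.

```python
# (name, signature, payload size) in file order, copied from the .ksy above
SECTIONS = [
    ("section1", b"ksy", 2),
    ("section2", b"head", 4),
    ("section3", b"thx", 3),
    ("section4", b"hi", 5),
]


def parse_sections(data: bytes) -> dict:
    """Return the payload of every section that is present, in order.

    Hypothetical helper: a section counts as present only if its
    signature is found at the current position.
    """
    pos = 0
    found = {}
    for name, sig, size in SECTIONS:
        if data.startswith(sig, pos):
            start = pos + len(sig)
            payload = data[start:start + size]
            if len(payload) != size:
                raise ValueError(f"{name} is truncated")
            found[name] = payload
            pos = start + size
    if pos != len(data):
        raise ValueError(f"unrecognized data at offset {pos}")
    return found
```

With this approach a missing section is simply skipped, so all three layouts listed above go through the same code path.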
Ruslan0Dev
@Ruslan0Dev
Why are there no errors, and why is it not working correctly?
image.png
meta:
  id: hmm
  endian: le
seq:
- id: main
  type: t_test
  repeat: expr
  repeat-expr: 2
types:
  t_test:
    seq:
    - id: animal
      size: 4
      type:
        switch-on: about_type
        cases:
          'e_t::dog': rec_type_1
          'e_t::cat': rec_type_2
          _: t_def
    - id: about_type
      type: u1
      enum: e_t
  t_def:
    seq:
    - id: reserved
      size: 4
  rec_type_1:
    seq:
    - id: x
      type: u2
    - id: y
      type: u2
  rec_type_2:
    seq:
    - id: ax
      type: u2
    - id: ay
      type: u2
enums:
  e_t:
    7: dog
    109: cat
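A likely cause, offered as a guess: seq fields are read in order, so animal is parsed before about_type has been read, and the switch cannot see the real tag yet. The Python sketch below shows the order the data actually requires for this layout (4 payload bytes, then the 1-byte tag): keep the payload raw, read the tag, then interpret. The name parse_record is hypothetical.

```python
import struct

DOG, CAT = 7, 109  # values from the e_t enum above


def parse_record(buf: bytes, pos: int = 0):
    """Read one 5-byte record: 4 payload bytes followed by a 1-byte tag.

    The payload is held as raw bytes until the tag is known, because
    the tag comes after the data it describes.
    """
    raw_animal = buf[pos:pos + 4]
    about_type = buf[pos + 4]
    if about_type in (DOG, CAT):
        # both record types above are two little-endian u2 values
        animal = struct.unpack("<HH", raw_animal)
    else:
        animal = raw_animal  # default case: keep the reserved bytes as-is
    return about_type, animal
```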
LetsGoRosco
@LetsGoRosco
Screenshot 2022-04-13 at 18.57.59.png
Hi, I have a binary file that I need to parse which consists of chunks of data that start with a "magic" value and end with 0x0d0a. The chunks are completely sequential and of variable length, but there is no size data to be parsed out. I'm (naively) hoping I can match a type based on the starting value, but the lack of a fixed size or size data seems to be a real issue. I've been reading the docs and struggling to see if there is an approach to this kind of problem. Could anyone point me in the right direction?
dgelessus
@dgelessus
Hi! Unfortunately, Kaitai Struct currently doesn't support multi-byte terminators (see kaitai-io/kaitai_struct#158), so formats like this are often not easy to parse using Kaitai Struct. As a workaround, if your data doesn't contain 0x0d on its own anywhere, you can use 0x0d as the terminator and then check that there is a 0x0a byte after it.
Also, it sounds like your format might actually be text-based in some way - 0x0d0a is a standard Windows newline. It might be easier if you use your favorite language's APIs for reading the file line by line and then parse each line separately using Kaitai Struct.
LetsGoRosco
@LetsGoRosco
@dgelessus thank you, I think you're probably right. It's some old mainframe format.
unfortunately the data does contain 0x0d in some places, but I like the idea of just parsing it line by line. I considered modifying the file before processing with Kaitai Struct to add some size data, but I prefer your idea so thanks for the steer.
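The line-by-line idea could look roughly like this in Python. It is a sketch only: split_records is a made-up name, and it assumes the two-byte sequence 0x0d 0x0a never occurs inside a record (a lone 0x0d is fine).

```python
def split_records(data: bytes) -> list:
    """Split a buffer into records terminated by 0x0d 0x0a.

    Assumption: the pair 0x0d 0x0a only ever appears as a terminator.
    A lone 0x0d inside a record is left untouched.
    """
    records = data.split(b"\r\n")
    if records and records[-1] == b"":
        records.pop()  # the buffer ended with a terminator
    return records
```

Each returned record starts with its "magic" value and could then be handed to a per-record parser.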
Thell 'Bo' Fowler
@Thell
Question: What is your go-to Rust crate for binary parsing when you want to translate from a ksy? I'm looking most closely at binrw and nom-derive at the moment but am open to others.
Daniel Petitfils
@eapetitfils

Hi, I am trying to write a .ksy for a binary format containing a simple list of telemetry. Everything is defined as integers, but to get the actual engineering value I sometimes have to multiply by a factor (like voltages given as a uint in mV), and in other cases I have an integer corresponding to a mode of operation (0: manual, 1: automatic).

For the first ones, I am using value instances, is that the correct way?
For the second ones (lookup tables), I tried having a switch-on entry in a value instance but got an error message. Is this doable?

 expected string, got Map(switch-on -> raw_mode, cases -> Map(0 -> Automatic, 1 -> Manual))
mode:
        value:
          switch-on: raw_mode
          cases:
            0: 'Automatic'
            1: 'Manual'
dgelessus
@dgelessus

Hi! The switch-on/cases syntax only works inside types. In expressions, you can use conditional operators to get the same effect, for example:

      mode:
        value: |
          raw_mode == 0 ? 'Automatic'
          : raw_mode == 1 ? 'Manual'
          : 'invalid'

But in this case, a better solution would be to use enums. Then you can directly declare the valid values and their meanings and don't have to use a "raw" field with an intermediate instance. For example:

seq:
  - id: mode
    type: u1
    enum: mode
enums:
  mode:
    0: automatic
    1: manual
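For comparison, here is how the two ideas (a scaled value and an enum lookup) might look on the consuming side in Python. This is not the code Kaitai generates; Mode, decode_mode and millivolts_to_volts are illustrative names, and the value-to-name mapping follows the enum snippet above.

```python
from enum import IntEnum


class Mode(IntEnum):
    # mapping taken from the enum snippet above
    AUTOMATIC = 0
    MANUAL = 1


def millivolts_to_volts(raw_mv: int) -> float:
    """Scale a raw integer in mV to volts, like a value instance would."""
    return raw_mv / 1000.0


def decode_mode(raw_mode: int) -> str:
    """Map a raw integer to a mode name, or 'invalid' if it is unknown."""
    try:
        return Mode(raw_mode).name.lower()
    except ValueError:
        return "invalid"
```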
abheekd
@abheekd:matrix.adawesome.tech
[m]
hey! I'm new here but I wanted to try my hand at creating a .ksy file for a sound cache file from the witcher 3. I have an example file as well as a bms script and I was able to parse the header but I'm having trouble actually reaching the data. it's just an uncompressed archive file so it simply stores all the constituent files in the file itself. here's the BMS script:
    getdstring SIGN 4
    goto 0
    if SIGN == "CS3W"

        idstring "CS3W"
        get VER long
        get DUMMY longlong
        get INFO_OFF long
        get FILES long
        get NAMES_OFF long
        get NAMES_SIZE long

        log MEMORY_FILE NAMES_OFF NAMES_SIZE

        goto INFO_OFF
        for i = 0 < FILES
            get NAME_OFF long
            get OFFSET long
            get SIZE long
            goto NAME_OFF MEMORY_FILE
            get NAME string MEMORY_FILE
            log NAME OFFSET SIZE
        next i
I'm kinda stuck on the part where it says goto INFO_OFF (which I'm guessing is information offset) because i'm not sure how to replicate that in ksy
dgelessus
@dgelessus
Hi! I think you're looking for instances and pos, which let you parse data from a specific byte offset. The INFO_OFF part could be translated to ksy syntax roughly like this:
seq:
  # ... SIGN, VER, DUMMY ...
  - id: info_off
    type: u4
  - id: files
    type: u4
  # ... remaining fields ...
instances:
  file_infos:
    pos: info_off
    type: file_info
    repeat: expr
    repeat-expr: files
types:
  file_info:
    seq:
      - id: name_off
        type: u4
      # ... remaining fields from inside "for i = 0 < FILES" ...
abheekd
@abheekd:matrix.adawesome.tech
[m]
yes I just figured it out after reading the docs a bit more thoroughly. your answer is super helpful and appreciated! thank you very much!
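To make the "jump to an offset, then loop" structure of the BMS script concrete, here is a rough Python equivalent. Assumptions, none confirmed by the chat: little-endian data, the version-1 layout where every "long" is 32 bits, and names stored as NUL-terminated ASCII. The function names are invented for this sketch.

```python
import struct


def read_cstring(buf: bytes, pos: int) -> str:
    """Read a NUL-terminated ASCII string starting at pos."""
    end = buf.index(b"\x00", pos)
    return buf[pos:end].decode("ascii")


def parse_cs3w_index(data: bytes) -> list:
    """Return (name, offset, size) for every file listed in the index.

    Header fields follow the BMS script: SIGN, VER, DUMMY, INFO_OFF,
    FILES, NAMES_OFF, NAMES_SIZE.
    """
    sign, ver, _dummy, info_off, files, names_off, names_size = struct.unpack_from(
        "<4sIQIIII", data, 0
    )
    if sign != b"CS3W":
        raise ValueError("not a CS3W file")
    # equivalent of "log MEMORY_FILE NAMES_OFF NAMES_SIZE"
    names = data[names_off:names_off + names_size]
    entries = []
    for i in range(files):
        # equivalent of "goto INFO_OFF" plus the three "get ... long" reads
        name_off, offset, size = struct.unpack_from("<III", data, info_off + 12 * i)
        entries.append((read_cstring(names, name_off), offset, size))
    return entries
```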
abheekd
@abheekd:matrix.adawesome.tech
[m]
one more thing: in the bms script it says that if the version is 1 then the header data is u4 but if it's higher then it's u8. based on what i saw from the conditionals, I couldn't figure out a way to implement that. any suggestions?
dgelessus
@dgelessus
To change a field's type depending on some other value, you can use switch-on:
seq:
  - id: version
    type: u4
  - id: dummy
    type: u8
  - id: info_off
    type:
      switch-on: version
      cases:
        1: u4
        _: u8 # default case
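The same "width depends on version" rule can be sketched in Python. This assumes little-endian data; read_offset is a hypothetical helper name.

```python
import struct


def read_offset(buf: bytes, pos: int, version: int):
    """Read a 4-byte offset for version 1, otherwise an 8-byte offset.

    Returns the value and the position just past it.
    """
    fmt = "<I" if version == 1 else "<Q"
    (value,) = struct.unpack_from(fmt, buf, pos)
    return value, pos + struct.calcsize(fmt)
```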
Gavin Ray
@GavinRay97

Hiya all :wave:

Noob question: If I want to have an enum whose keys are 1-byte char/string identifiers, like 'A', 'B', etc, how would I model that?
Would it be with a one-letter string in quotations, or would I need to use the hex representation of the char byte?

seq:
    - id: type # A single byte character, identifying the message type. IE: 'Q' for Query
      type: u1

    - id: length
      type: u4

    - id: body
      size: length
      type:
          switch-on: type
          cases:
              '"B"': bind_message
              '"Q"': query_message
I feel like this is probably wrong
dgelessus
@dgelessus
Hi! Quite a few new people here lately, welcome :)
You can parse a single character like this either as a 1-byte number (type: u1) or as a 1-character string (type: str, size: 1). Which one you choose doesn't matter much, but you have to keep the types consistent. So if you parse it as an integer, you have to compare it against numeric ASCII codes (0x42, 0x51), and if you parse it as a string, you can only compare against string literals ("B", "Q").
dgelessus
@dgelessus
In most target languages, the integer version will be more compact in memory and probably also parses a little bit faster. The string version has the advantage that the characters are more readable. But you can also work around that to some extent by defining an enum with all valid character codes - that way you still have compact integers, but with meaningful names:
seq:
  - id: type
    type: u1
    enum: type
  # ...
enums:
  type:
    0x42: bind
    0x51: query
(wow, Gitter's syntax highlighting for YAML is really broken here...)
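On the Python side, the "compact integers with meaningful names" idea corresponds to an integer enum keyed by character codes. A small sketch; MessageType is an illustrative name, not something Kaitai generates under that name.

```python
from enum import IntEnum


class MessageType(IntEnum):
    """One-byte message identifiers, stored as their ASCII codes."""
    BIND = ord("B")   # 0x42
    QUERY = ord("Q")  # 0x51


def message_type_from_byte(value: int) -> MessageType:
    """Look up the enum member for a raw byte; raises ValueError if unknown."""
    return MessageType(value)
```

The member still behaves like an integer, so chr() on it gives back the readable character when needed.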
Gavin Ray
@GavinRay97

"the integer version will be more compact in memory and probably also parses a little bit faster."

Ahh this is good to know -- I was wondering if there was any pro/con to doing it one way or the other

Thank you for the help, this clears everything up! :pray:

abheekd
@abheekd:matrix.adawesome.tech
[m]
does anyone know the best way to store something similar to a C++ vector?
struct EventActionObject
{
    EventActionScope scope;
    EventActionType action_type;
    std::uint32_t game_object_id;
    std::uint8_t parameter_count;
    std::vector<EventActionParameterType> parameters_types;
    std::vector<std::int8_t> parameters;
};
I'm trying to sort of map this cpp object to a kaitai type
abheekd
@abheekd:matrix.adawesome.tech
[m]
wait i think i can just use s1 for eventactionparametertype and repeat parameter_count times
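A count-prefixed list like this can be sketched in Python as "read the count, then read that many items". The layout here is an assumption, not taken from the real file format: a u1 parameter_count, then that many 1-byte parameter types, then that many signed 1-byte parameters. read_parameters is a made-up name.

```python
import struct


def read_parameters(buf: bytes, pos: int = 0):
    """Read parameter_count, then two lists of that many signed bytes.

    Assumed layout: u1 count, count x s1 types, count x s1 values.
    Returns (types, values, position after the last value).
    """
    count = buf[pos]
    pos += 1
    types = list(struct.unpack_from(f"<{count}b", buf, pos))
    pos += count
    values = list(struct.unpack_from(f"<{count}b", buf, pos))
    pos += count
    return types, values, pos
```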
Misha Veldhoen
@MishaVeldhoen

Just a quick question regarding memory usage when reading a "wrong" file.

Kaitai generates the following (python) code for me:

    def _read(self):
        self.file_header = self._io.read_bytes(12)
        self.n_files = self._io.read_u4le()
        self.files = [None] * ((self.n_files + 1))
        for i in range((self.n_files + 1)):
            self.files[i] = MyParser.MyFile(self._io, self, self._root)

This works fine whenever I pass a correct file into the parser. However, when I pass an arbitrary file, the value of self.n_files becomes arbitrary, and the parser allocates a ton of memory and crashes the system.

Is there a Kaitai way of handling this sort of problem? For example, can I set an upper bound on the value of n_files? I'd be looking for something like max_value in the following ksy:

seq:
  - id: my_len
    type: u4
    max_value: 1000
  - id: my_str
    type: str
    size: my_len
    encoding: UTF-8
dgelessus
@dgelessus
Yes, under valid you can set a max value for a field. That gets checked right after the field is read and throws an exception if the condition isn't fulfilled, so it should prevent the problem of allocating a huge list for garbage input.
seq:
  - id: my_len
    type: u4
    valid:
      max: 1000
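The effect of such a check can be illustrated in plain Python: validate the count right after reading it, before any list is allocated. This is a sketch and not the code Kaitai generates; read_file_count and MAX_FILES are hypothetical, and the limit of 1000 is just the value from the example.

```python
import io
import struct

MAX_FILES = 1000  # hypothetical upper bound, mirroring "max: 1000" above


def read_file_count(stream, max_count: int = MAX_FILES) -> int:
    """Read a little-endian u4 count and reject implausible values early."""
    raw = stream.read(4)
    if len(raw) != 4:
        raise EOFError("stream ended while reading the file count")
    (n_files,) = struct.unpack("<I", raw)
    if n_files > max_count:
        # fail before allocating anything sized by n_files
        raise ValueError(f"n_files={n_files} exceeds the limit of {max_count}")
    return n_files
```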
demberto
@demberto
How to use a 12 byte integer type?
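One possible approach, since Kaitai Struct's built-in integer types stop at 8 bytes (u8/s8): read the field as 12 raw bytes (size: 12) and convert it in application code. A Python sketch, assuming an unsigned value and little-endian order unless told otherwise; read_uint96 is a made-up name.

```python
def read_uint96(raw: bytes, byteorder: str = "little") -> int:
    """Convert 12 raw bytes to an unsigned integer.

    Assumption: the byte order of the real format is not known here,
    so it is left as a parameter.
    """
    if len(raw) != 12:
        raise ValueError("expected exactly 12 bytes")
    return int.from_bytes(raw, byteorder)
```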
Andreas
@andreas-volz
Hi, after writing a lot of ksy files for my project, I now generate the Graphviz (SVG) documentation, which gives a good overview of the API. But I don't see all my ksy doc strings in this view. Is there a more HTML-like doc generator that shows, beside the graph, the text descriptions well formatted as documentation?
James Elliott
@brunchboy
The language in which you are generating parsers may well have an HTML doc generator. For example, you could generate Java classes from the .ksy files, and then use the normal JavaDoc tool to generate API documentation from that class source. In the case of my own Crate Digger project, that gives results like this.
However, generated API documentation is never enough as introductory or explanatory documentation, so I used AsciiDoc and Antora to build a detailed static documentation site providing that level of documentation in a language-independent way. Creating quality documentation like that is a huge amount of work, but it makes my reverse-engineering efforts vastly more useful to more developers. There have been a number of quality projects that came out of them.
I even ended up building a byte-field diagram generator in order to create the quality of diagrams I wanted, and contributed that to the AsciiDoc community so that it could be used in other people’s documentation as well.
Andreas
@andreas-volz
@brunchboy OK, good idea for a first shot. I extended doxygen to also process the folder with the generated Kaitai sources. Sure, this gives me an overview. But I see two drawbacks. First, I have to reference the generated dot images, as they give much more understanding than the doxygen dot images; I need to check whether this is possible with some annotations in the source. Second, the text I write in the doc area of the ksy gets wrapped in an ugly way into one line in doxygen. I don't know why this happens.
ah, \n in the ksy does the trick in doxygen :-)
Andreas
@andreas-volz
Hm, it works in doxygen, but I'm not getting the @image string in the generated header file. There is no global "doc:" keyword in ksy where I could generate documentation just before the class keyword for doxygen. Do you know a way?
James Elliott
@brunchboy
I am afraid I am not sure what you are asking, because I have not ever used doxygen. It seems like the top level doc string in my .ksy is also not passed to JavaDoc anywhere, so I used JavaDoc’s own package-documentation facilities to create that level of documentation. Again, for full documentation, I found I needed to go far beyond the API documentation integration that Kaitai offers, but that is a very nice foundation to build on.
Andreas
@andreas-volz
In the end it's not a question of doxygen, but rather of the C++ code generator: how do I get doc strings from the ksy to the top, above the C++ class?
Is that deep-symmetry tool able to get data from ksy? Is there a transformation tool? Or is it all hand-written documentation in your case?
James Elliott
@brunchboy
Yeah, I was trying to say the same issue seems to apply to the Java code generator. I think you’d need to submit a PR to the code generators to fix that. But I take that back about Java; the things I put at the top-level doc: nodes in my .ksy files do end up as Java class-level documentation.
bytefield-svg is independent of KSY. I find that great diagrams need to be hand-specified, so I created my own domain-specific language for creating the diagrams that I needed. I build hand-written documentation as a complement to the API documentation that Kaitai offers. API documentation is nice for reference when you know what you are doing, but is no substitute for carefully written introductory, conceptual, and tutorial documentation.