Kurt Sansom
@kayarre
maybe is there an example that uses something like recursion for repeated nested structures?
Andreas
@andreasv:matrix.org
[m]
Hi, I have a file whose structure contains absolute offsets pointing to other absolute offsets, which in turn point to a structure whose size depends dynamically on some opcodes. I tried to write a ksy file but failed, as this was too complicated for me. So I'm trying again from the beginning, with some help. The first step was getting the outer offset list:
seq:
  - id: entree_offsets
    type: entree_offset_type
    repeat: expr
    repeat-expr: 343
    doc: |
      tbd

types:
  entree_offset_type:
    seq:
      - id: number
        type: u2
      - id: offset
        type: u2
Now I'd like to read the values located at each offset, which in turn contain another number and offset.
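To make the two-level lookup concrete, here is a minimal Python sketch of the idea outside of Kaitai. It assumes little-endian u2 fields and that each offset is absolute from the start of the file; both are assumptions, and the function names are made up for illustration.

```python
import struct

def read_offset_table(data: bytes, count: int, start: int = 0):
    """Return a list of (number, offset) pairs read from consecutive u2 fields."""
    entries = []
    for i in range(count):
        # "<HH" = two unsigned 16-bit little-endian integers (endianness is an assumption)
        number, offset = struct.unpack_from("<HH", data, start + i * 4)
        entries.append((number, offset))
    return entries

def read_targets(data: bytes, entries):
    """Follow each absolute offset and read the (number, offset) pair stored there."""
    return [struct.unpack_from("<HH", data, offset) for _, offset in entries]

# Tiny synthetic file: a two-entry table followed by the two records it points to.
sample = struct.pack("<HHHH", 1, 8, 2, 12) + struct.pack("<HH", 10, 0) + struct.pack("<HH", 20, 0)
table = read_offset_table(sample, 2)
targets = read_targets(sample, table)
```

In a ksy file the second step would correspond to an instance that uses `pos` with the offset taken from the table entry.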
Kurt Sansom
@kayarre
File_diagram.png

Similar to the RIFF format and the diagrams shown here: https://www.johnloomis.org/cpe102/asgn/asgn1/riff.html

I made a diagram showing the tagged format, where the base structure is a tag/ID, a size, and the data, and the subchunks are nested. I'm looking for the right combination of things in Kaitai Struct to parse the file. EDIT: simplified diagram.

Kurt Sansom
@kayarre
You don't know up front how many nested chunks or subchunks there could be. Essentially it would be some kind of tree structure, but I'm not sure how to move beyond the linear-sequence mental model.
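The tree idea can be sketched in plain Python as a function that calls itself on the body of a container chunk. The layout here (4-byte tag, 4-byte little-endian size, no padding) and the `LIST` container tag are assumptions borrowed from RIFF, not taken from the actual file format.

```python
import struct

# Hypothetical layout: 4-byte ASCII tag, 4-byte little-endian size, then `size` bytes of data.
# Tags listed in CONTAINER_TAGS are assumed to hold nested chunks instead of raw data.
CONTAINER_TAGS = {b"LIST"}

def parse_chunks(buf: bytes):
    """Parse a byte string into a list of (tag, payload) tuples, recursing into containers."""
    chunks = []
    pos = 0
    while pos + 8 <= len(buf):
        tag, size = struct.unpack_from("<4sI", buf, pos)
        body = buf[pos + 8 : pos + 8 + size]
        if tag in CONTAINER_TAGS:
            # The body is itself a sequence of chunks, so parse it with the same function.
            chunks.append((tag, parse_chunks(body)))
        else:
            chunks.append((tag, body))
        pos += 8 + size
    return chunks

inner = b"DATA" + struct.pack("<I", 3) + b"abc"
outer = b"LIST" + struct.pack("<I", len(inner)) + inner + b"TAIL" + struct.pack("<I", 0)
tree = parse_chunks(outer)
```

The recursion depth is never declared up front; it follows from the sizes stored in the data. A ksy type that refers to itself inside a size-limited field should be able to express the same shape.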
Andreas
@andreasv:matrix.org
[m]
@kayarre: I made a first PoC that parses the binary, though not completely. Luckily I have another parser (with C source) against which I can check the results. But there are still problems, which I'll describe as questions now:
Here is a screenshot of my format problem:
https://ibb.co/tqdgvF2
Andreas
@andreasv:matrix.org
[m]
The problem is as follows. The offset points to a list of opcodes, but that list has no size and no end terminator. The alternative C implementation solves this by building a list of all opcode offsets and stops reading further opcodes into a list as soon as it hits an opcode that is addressed by another offset. Crazy!
Any ideas how to solve that? Maybe with a "custom processing routine"?
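The rule described above can be sketched as a small pre-pass: collect every known start offset, sort them, and treat the next start as the end of the current list. This is only an illustration of that logic in Python; the function name and the use of the file size as the final boundary are assumptions.

```python
def list_boundaries(offsets, file_size):
    """Map each start offset to the offset where its opcode list must end."""
    starts = sorted(set(offsets))          # duplicates and table order do not matter
    ends = starts[1:] + [file_size]        # the last list runs to the end of the file
    return dict(zip(starts, ends))

bounds = list_boundaries([40, 10, 25, 10], file_size=64)
```

With the boundaries known in advance, each list has an effective size of `end - start`, which is the kind of value a `size` key could consume.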
Ruslan0Dev
@Ruslan0Dev
Hi all! I expect 0x0A but I get 0x02. Why?
meta:
  id: example
  endian: le
  bit-endian: le

# 0x14      0x02      0xFE
# 00101000 01000000 01111111

seq:
  - id: first_bit # 0 = 0x00 ([0]0101000 01000000 01111111)
    type: b1
  - id: test
    type:
      switch-on: first_bit.to_i
      cases:
        0 : a # 01010000 = 0x0A (0[0101000 0]1000000 01111111)
        1 : b # 0        = 0x00 (0[0]101000 01000000 01111111)

types:
  a:
    seq:
      - id: bits
        type: b8
  b:
    seq:
      - id: bit
        type: b1
image.png
Without the switch:
image.png
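Both values can be reproduced by hand with a small Python sketch of little-endian bit reading. One possible explanation, stated here as an assumption rather than confirmed behavior, is that the user-defined type `a` starts at the next byte boundary instead of continuing in the middle of the first byte.

```python
def read_bits_le(data: bytes, bit_pos: int, n: int) -> int:
    """Read n bits starting at bit_pos, consuming each byte from its least significant bit."""
    as_int = int.from_bytes(data, "little")
    return (as_int >> bit_pos) & ((1 << n) - 1)

data = bytes([0x14, 0x02, 0xFE])
first_bit = read_bits_le(data, 0, 1)   # lowest bit of 0x14
continued = read_bits_le(data, 1, 8)   # 8 bits continuing right after first_bit
realigned = read_bits_le(data, 8, 8)   # 8 bits starting at the next byte boundary
print(hex(first_bit), hex(continued), hex(realigned))  # 0x0 0xa 0x2
```

Reading on from bit 1 gives the expected 0x0A, while starting again at bit 8 gives the observed 0x02.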
Ruslan0Dev
@Ruslan0Dev
Andreas
@andreasv:matrix.org
[m]
I don't want to bug you with my complicated special case, but I'll try once more with a more detailed example :-)
I don't see an easy way to detect, while parsing, where a command list ends. The format seems to say: "parse until you reach a line that another script points to as its start".
This logic is so detailed and implicit that I wonder whether it might be better not to implement it in Kaitai.
And this example is misleading, because the pointer that marks the end does not have to be the next one; it could also come from another place. So I need to maintain a "pointer list" and always check against it.
Jim Pivarski
@jpivarski

Hi! Is there a process for developing new language targets?

I'm considering a project that would use a modified version of the cpp_stl target to generate Awkward Arrays in Python. The idea is to parse bytes in C++ in the usual way, but instead of creating a collection of (permanent) kaitai::kstruct objects, it would use Awkward's LayoutBuilder to fill an Awkward Array, for use in Python.

The purpose of this is to be able to read large binary files for scientific use-cases (direct dark matter and gamma ray astronomy), in which

  • the data analysis tools are in Python (yt in particular)
  • the data structures are complex, with nested, variable-length lists (Awkward Array is good at wrangling data like that)
  • the binary format is custom (hence, Kaitai)
  • and the data volume is huge (so Kaitai's Python target is insufficient—anything involving native Python objects, rather than arrays, would be)

Various pieces of this project are still being brought into place, so it's not certain that it will happen. However, I'd like to know if we'd be able to contribute the new target back to Kaitai if we do end up with a general Kaitai → Awkward Array solution. Kaitai has many language targets, and this could be considered a new one, although it's technically addressing two languages, C++ and Python, and the implementation would look a lot like the cpp_stl target. It may even be a subclass of io.kaitai.struct.languages.CppCompiler.

Is there someone in particular we should contact about the process? We'd like to stay in touch during development, so that the final product fits Kaitai's coding conventions and other constraints. Thanks!

Andreas
@andreasv:matrix.org
[m]
Meanwhile I started a PoC implementation with the help of https://doc.kaitai.io/user_guide.html#opaque-types
My plan is to populate a singleton or something similar that remembers the global offset pointers, and then let the custom code decide at parser runtime based on a lookup of these pointers.
Petr Pučil
@generalmimon

@jpivarski

Hi! Is there a process for developing new language targets?

Hi, the recommended procedure for adding support for a new language is here: https://doc.kaitai.io/new_language.html

Jim Pivarski
@jpivarski

Thanks, @generalmimon! Somehow I didn't find this.

But are you guys okay with the idea of a "language" being C++/Python/Awkward Array? It's not what one traditionally thinks of as a programming language, but it would be useful for scientific use-cases.

Petr Pučil
@generalmimon

@jpivarski

But are you guys okay with the idea of a "language" being C++/Python/Awkward Array?

Yes, that's not a problem - a compiler target can be anything. KS compiler already has some support for generating structures for the Construct library in Python, which is also not a general-purpose programming language. We'd also love to have a compiler target for Wireshark dissectors, for example (kaitai-io/kaitai_struct#50).

The (ideal and unattainable) goal of Kaitai Struct is "write a format specification once, use everywhere", so new targets (even relating to one programming language where it makes sense) are expected to be created for different use cases and needs.

dgelessus
@dgelessus
By the way, there's been some discussion about Awkward Array support before (kaitai-io/kaitai_struct#589), but I don't know if anything ever came out of it.
Jim Pivarski
@jpivarski

Thanks for pointing me to that issue! I don't personally know @GabriellaNicoleRamirez, but it sounds like she worked in the U. Colorado group that I'm working with now. (Between then and now, there was another student who worked on this, sometime in 2020.) Awkward Array has come a long way since then (it's clear you were talking about 0.x; we'll be releasing 2.0 early next month), and I know a developer who already has a lot of the background knowledge and would have an easier time getting started. It's still an open question, though, whether all the pieces will come together.

But at least I know that you would welcome a new language target, if we provide one, and also how to get in touch. Cheers!

Andreas
@andreasv:matrix.org
[m]

```
instances:
  scpe_offsets:
    type: scpe_type(_index)
    repeat: expr
    repeat-expr: 342
    doc: |
      tbd

types:
  entree_offset_type:
    seq:
      - id: iscript_id
        type: u2
      - id: offset
        type: u2

  scpe_type:
    params:
      - id: i
        type: u2
    instances:
      scpe_header:
        pos: _parent.entree_offsets[i].offset
        type: scpe_header_type
      scpe_content:
        pos: _parent.entree_offsets[i].offset + 8
        type: scpe_content_type
        repeat: expr
        repeat-expr: |
          (scpe_header.scpe_content_type == 0) ? 2 :
          (scpe_header.scpe_content_type == 1) ? 2 :
          (scpe_header.scpe_content_type == 2) ? 4 :
          (scpe_header.scpe_content_type == 12) ? 14 :
```

I couldn't understand why access to entree_offset_type works in the web IDE, while from C++ I always get zero for all values. It crashes in the seek() function with no bytes left. Maybe the _parent logic doesn't work for some reason. Anyhow...
Jim Pivarski
@jpivarski

Is someone online?

I have what I think is a quick question: I define a value instance (the instances section of a type) and I want it to be 64-bit because I'm doing bit-shift operations on it that assume 64 bits. I understand that Javascript/the web IDE can't do this, but when I generate C++ code, it's being generated as an int32_t. How do I get the value instance to be int64_t?

Making a term in the expression .as<u8> doesn't do it.
Jim Pivarski
@jpivarski

My problem is

which might be solvable by setting CalcIntType, if there was any way I could do that. (I wouldn't mind if every integer were 64 bits.)

Peter
@russinpeter_gitlab

Hi! Is there a way to verify a C struct using GTest and the cpp target?
I tried the following test but run into an error that has been discussed before

STATUS_t data;
data.uptime = 123;

std::string buf;
buf.append((const char *) &data, sizeof(STATUS_t));
std::istringstream is(buf);
kaitai::kstream ks(&is);
status_t status_t(&ks);

ASSERT_EQ(data.uptime, status_t.uptime());

Then I get the following error:

unknown file: Failure
C++ exception with description "ios_base::clear: unspecified iostream_category error" thrown in the test body.
Petr Pučil
@generalmimon

@jpivarski

Making a term in the expression .as<u8> doesn't do it.

It should work, make sure it's the last operation in the expression:

instances:
  foo:
    value: '(...).as<u8>'
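To illustrate why the width of the value instance matters for bit shifts, here is a small Python sketch that models storing the same expression in a 32-bit and in a 64-bit signed integer. The helper `wrap_signed` and the sample values are made up for illustration; this models plain truncation and is not a statement about what the generated C++ does internally.

```python
def wrap_signed(value: int, bits: int) -> int:
    """Interpret value as a two's-complement signed integer of the given width."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value >> (bits - 1) else value

hi, lo = 0x12, 0x9ABCDEF0
combined = (hi << 32) | lo

as_i32 = wrap_signed(combined, 32)   # upper half is lost and the sign bit flips
as_i64 = wrap_signed(combined, 64)   # full value survives
```

Only the 64-bit interpretation keeps the shifted-in upper half.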
Jim Pivarski
@jpivarski
@generalmimon That did it! Thanks!
Maxim Poliakovski
@maximumspatium
First of all, thanks a lot for bringing us Kaitai Struct! I'm trying to create a KSY file for the legacy Apple PEF format. This works well as long as the internal data is laid out sequentially. The loader section of a PEF file usually contains relocation data followed by a predefined sequence of structures describing exported symbols. The loader header contains a field that specifies the offset to the beginning of the exported symbol descriptors. I therefore decided to put them all into the instances section so I can specify their positions using pos:
    seq:
      - id: header
        type: pef_loader_header
      - id: import_libs
        type: pef_imported_library
        repeat: expr
        repeat-expr: header.imp_lib_count
      - id: import_symbols
        type: pef_imported_symbol
        repeat: expr
        repeat-expr: header.imp_sym_count
      - id: reloc_headers
        type: pef_reloc_header
        repeat: expr
        repeat-expr: header.reloc_section_count
    instances:
      strtab:
        pos: header.loader_str_offs
        size: header.exp_hash_offs - header.loader_str_offs
      exp_hash_tab:
        pos: (header.exp_hash_offs + 3) & -4
        type: u4
        repeat: expr
        repeat-expr: 1 << header.exp_hash_tab_pow
      exp_key_tab:
        pos: ((header.exp_hash_offs + 3) & -4) + (1 << header.exp_hash_tab_pow) * 4
        type: pef_exp_key_tab_entry
        repeat: expr
        repeat-expr: header.exp_sym_count
      exp_sym_tab:
        pos: ((header.exp_hash_offs + 3) & -4) + (1 << header.exp_hash_tab_pow) * 4 + 4 * header.exp_sym_count
        type: pef_exp_symbol
        repeat: expr
        repeat-expr: header.exp_sym_count
Going this way requires me to specify ugly pos keys for each of the exported symbol descriptors (exp_hash_tab, exp_key_tab and exp_sym_tab), although they are laid out sequentially.
I suppose I can't use seq in instances. At least the Kaitai IDE doesn't allow me to do that.
Maxim Poliakovski
@maximumspatium
How can I fix the above issue? TL;DR: I need to jump to some position in the file where another sequential structure is located. I assume I need some kind of substream but have no idea how to do that. Thank you in advance!
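The three pos expressions above share one aligned base, and each table starts where the previous one ends. A short Python sketch of that arithmetic, using the field names from the snippet; the function names and the sample numbers are made up:

```python
def align4(offset: int) -> int:
    """Round offset up to the next multiple of 4, same as (offset + 3) & -4."""
    return (offset + 3) & -4

def export_table_positions(exp_hash_offs: int, exp_hash_tab_pow: int, exp_sym_count: int):
    """Compute the start of each exported-symbol table from the loader header fields."""
    hash_tab = align4(exp_hash_offs)
    key_tab = hash_tab + (1 << exp_hash_tab_pow) * 4   # hash table holds 2**pow u4 entries
    sym_tab = key_tab + 4 * exp_sym_count              # 4 bytes per key entry, as in the pos expression
    return {"exp_hash_tab": hash_tab, "exp_key_tab": key_tab, "exp_sym_tab": sym_tab}

positions = export_table_positions(exp_hash_offs=0x65, exp_hash_tab_pow=2, exp_sym_count=3)
```

Since only the first position needs computing, one option worth trying is to wrap the three tables in a single user-defined type with an ordinary seq and give just that one instance a pos.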
Andreas
@andreasv:matrix.org
[m]
It's not yet bug-free, but it can remember the start/end offsets from the parent in C++ code and just parse the arguments until all are done. So my takeaway is: if you reach a point with Kaitai that looks impossible, use a custom object :-)
Maxim Poliakovski
@maximumspatium
@andreasv:matrix.org Thank you for your suggestion! I'll leave it as is for now. There is another obstacle with the exported symbols in PEF that I can't solve in the declarative fashion provided by Kaitai Struct. The exported symbol names are stored at three distinct locations: the name itself resides in the string table, but its position and length are held separately in two arrays. The name of an exported symbol is decoded like this:
sym_index = 1
str_len = exp_key_tab[sym_index].str_len
str_offset = exp_sym_tab[sym_index].class_and_value & 0xFFFFFF
strcpy(name, str_tab[str_offset], str_len)
I suppose I can't do this kind of parsing the declarative way.
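For reference, the pseudocode above translates to the following runnable Python sketch, which could live in application code on top of the parsed tables. The two dataclasses are hypothetical stand-ins for the generated pef_exp_key_tab_entry and pef_exp_symbol objects, and ASCII decoding is an assumption.

```python
from dataclasses import dataclass

@dataclass
class KeyEntry:      # hypothetical stand-in for pef_exp_key_tab_entry
    str_len: int

@dataclass
class SymEntry:      # hypothetical stand-in for pef_exp_symbol
    class_and_value: int

def symbol_name(index: int, exp_key_tab, exp_sym_tab, strtab: bytes) -> str:
    """Combine length, offset and string table to recover one exported symbol name."""
    str_len = exp_key_tab[index].str_len
    str_offset = exp_sym_tab[index].class_and_value & 0xFFFFFF  # keep the low 24 bits
    return strtab[str_offset : str_offset + str_len].decode("ascii")

strtab = b"mainfoo"
keys = [KeyEntry(4), KeyEntry(3)]
syms = [SymEntry(0x02000000), SymEntry(0x02000004)]
names = [symbol_name(i, keys, syms, strtab) for i in range(2)]
```

The lookup itself is a pure function of three already-parsed fields, so it does not need to happen during parsing.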
Matthew Turk
@matthewturk
I just wanted to drop a line to say that I just found out about ksdump (I usually use ksv and the webide) and it's awesome!