meta:
  id: sample
  endian: le
seq:
  - id: section1
    type: t_sig1
  - id: section2
    type: t_sig2
  - id: section3
    type: t_sig3
  - id: section4
    type: t_sig4
types:
  t_sig1:
    seq:
      - id: sig
        contents: ['ksy']
      - id: data
        size: 2
  t_sig2:
    seq:
      - id: sig
        contents: ['head']
      - id: data
        size: 4
  t_sig3:
    seq:
      - id: sig
        contents: ['thx']
      - id: data
        size: 3
  t_sig4:
    seq:
      - id: sig
        contents: ['hi']
      - id: data
        size: 5
meta:
  id: hmm
  endian: le
seq:
  - id: main
    type: t_test
    repeat: expr
    repeat-expr: 2
types:
  t_test:
    seq:
      - id: animal
        size: 4
        type:
          switch-on: about_type
          cases:
            'e_t::dog': rec_type_1
            'e_t::cat': rec_type_2
            _: t_def
      - id: about_type
        type: u1
        enum: e_t
  t_def:
    seq:
      - id: reserved
        size: 4
  rec_type_1:
    seq:
      - id: x
        type: u2
      - id: y
        type: u2
  rec_type_2:
    seq:
      - id: ax
        type: u2
      - id: ay
        type: u2
enums:
  e_t:
    7: dog
    109: cat
Hi, I am trying to write a .ksy for a binary format containing a simple list of telemetry values. Everything is defined as integers, but to get the actual engineering value I sometimes have to multiply by a scale factor (like voltages given as a uint in mV), and for others I have an integer corresponding to a mode of operation (0: manual, 1: automatic).
For the first ones, I am using value instances, is that the correct way?
For the second ones (lookup tables), I tried having a switch-on entry in a value instance but got an error message. Is this doable?
expected string, got Map(switch-on -> raw_mode, cases -> Map(0 -> Automatic, 1 -> Manual))
mode:
  value:
    switch-on: raw_mode
    cases:
      0: 'Automatic'
      1: 'Manual'
Hi! The switch-on/cases syntax only works inside type keys. In expressions, you can use conditional operators to get the same effect, for example:
mode:
  value: |
    raw_mode == 0 ? 'Automatic'
    : raw_mode == 1 ? 'Manual'
    : 'invalid'
But in this case, a better solution would be to use enums. Then you can directly declare the valid values and their meanings and don't have to use a "raw" field with an intermediate instance. For example:
seq:
  - id: mode
    type: u1
    enum: mode
enums:
  mode:
    0: automatic
    1: manual
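To show what the two ideas in this thread amount to on the consuming side (a scaled value and a named mode), here is a minimal Python sketch that does not use the Kaitai runtime. The 3-byte record layout (one mode byte followed by a little-endian u2 voltage in mV) and the names Mode, Telemetry and parse_telemetry are assumptions made up for illustration, not part of the original format.

```python
import struct
from dataclasses import dataclass
from enum import IntEnum


class Mode(IntEnum):
    # Mirrors the `mode` enum from the .ksy snippet above.
    automatic = 0
    manual = 1


@dataclass
class Telemetry:
    mode: Mode
    raw_voltage: int  # millivolts, as stored in the file

    @property
    def voltage(self) -> float:
        # Plays the role of a value instance: derived on access, not stored.
        return self.raw_voltage * 0.001


def parse_telemetry(buf: bytes) -> Telemetry:
    # Hypothetical layout: u1 mode, then u2le voltage in mV.
    raw_mode, raw_voltage = struct.unpack_from("<BH", buf, 0)
    # In this sketch, Mode(...) raises ValueError for an unknown code.
    return Telemetry(Mode(raw_mode), raw_voltage)
```

With this layout, parse_telemetry(bytes([1, 0xE4, 0x0C])) gives a record whose mode is Mode.manual and whose raw_voltage is 3300.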
I'm trying to write a .ksy file for a sound cache file from The Witcher 3. I have an example file as well as a BMS script, and I was able to parse the header, but I'm having trouble actually reaching the data. It's just an uncompressed archive file, so it simply stores all the constituent files in the file itself. Here's the BMS script:
getdstring SIGN 4
goto 0
if SIGN == "CS3W"
idstring "CS3W"
get VER long
get DUMMY longlong
get INFO_OFF long
get FILES long
get NAMES_OFF long
get NAMES_SIZE long
log MEMORY_FILE NAMES_OFF NAMES_SIZE
goto INFO_OFF
for i = 0 < FILES
    get NAME_OFF long
    get OFFSET long
    get SIZE long
    goto NAME_OFF MEMORY_FILE
    get NAME string MEMORY_FILE
    log NAME OFFSET SIZE
next i
You can use instances and pos, which let you parse data from a specific byte offset. The INFO_OFF part could be translated to ksy syntax roughly like this:
seq:
  # ... SIGN, VER, DUMMY ...
  - id: info_off
    type: u4
  - id: files
    type: u4
  # ... remaining fields ...
instances:
  file_infos:
    pos: info_off
    type: file_info
    repeat: expr
    repeat-expr: files
types:
  file_info:
    seq:
      - id: name_off
        type: u4
      # ... remaining fields from inside "for i = 0 < FILES" ...
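For comparison, the "jump to an offset, then read a repeated record" pattern can be sketched by hand in Python. This follows the field order of the BMS script and assumes little-endian values with every "long" being 4 bytes and "longlong" 8 bytes; read_file_table is a hypothetical helper name, and none of this is Kaitai-generated code.

```python
import struct


def read_file_table(buf: bytes) -> list[tuple[str, int, int]]:
    # Header as read by the BMS script: SIGN, VER, DUMMY, INFO_OFF,
    # FILES, NAMES_OFF, NAMES_SIZE (little-endian is an assumption).
    sign, _ver, _dummy, info_off, files, names_off, names_size = (
        struct.unpack_from("<4sIQIIII", buf, 0)
    )
    if sign != b"CS3W":
        raise ValueError("not a CS3W file")

    # Equivalent of "log MEMORY_FILE NAMES_OFF NAMES_SIZE".
    names = buf[names_off:names_off + names_size]

    entries = []
    for i in range(files):
        # Equivalent of an instance with pos: each record is 3 x u4.
        name_off, offset, size = struct.unpack_from("<III", buf, info_off + 12 * i)
        # Names are NUL-terminated strings inside the names block.
        end = names.index(b"\x00", name_off)
        entries.append((names[name_off:end].decode("ascii"), offset, size))
    return entries
```

Each returned tuple holds the name, data offset and size of one contained file, which is the same information the script's final "log NAME OFFSET SIZE" uses.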
switch-on:
seq:
  - id: version
    type: u4
  - id: dummy
    type: u8
  - id: info_off
    type:
      switch-on: version
      cases:
        1: u4
        _: u8 # default case
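As a rough illustration of what this version switch means when reading bytes, the following Python sketch picks the field width from the version number. The helper name read_info_off is hypothetical, and little-endian byte order is an assumption.

```python
import struct


def read_info_off(buf: bytes, pos: int, version: int) -> tuple[int, int]:
    # Version 1 stores a 4-byte offset; every other version falls into the
    # default case and stores 8 bytes (little-endian assumed).
    fmt = "<I" if version == 1 else "<Q"
    (value,) = struct.unpack_from(fmt, buf, pos)
    # Return the value and the position of the next field.
    return value, pos + struct.calcsize(fmt)
```

Returning the new position matters because every field after info_off shifts by 4 bytes depending on the version.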
seq:
  - id: type # A single byte character, identifying the message type, e.g. 'Q' for Query
    type: u1
  - id: length
    type: u4
  - id: body
    size: length
    type:
      switch-on: type
      cases:
        '"B"': bind_message
        '"Q"': query_message
You can parse the type field either as an integer (type: u1) or as a 1-character string (type: str, size: 1). Which one you choose doesn't matter much, but you have to keep the types consistent. So if you parse it as an integer, you have to compare it against numeric ASCII codes (0x42, 0x51), and if you parse it as a string, you can only compare against string literals ("B", "Q").
You could also define an enum with all valid character codes - that way you still have compact integers, but with meaningful names:
seq:
  - id: type
    type: u1
    enum: type
  # ...
enums:
  type:
    0x42: bind
    0x51: query
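The same consistency rule shows up in most languages. As a loose Python analogy (not Kaitai-generated code; MessageType and message_kind are hypothetical names), a byte read as an integer has to be matched against integer codes, which an enum can then name:

```python
from enum import IntEnum


class MessageType(IntEnum):
    bind = 0x42   # ASCII 'B'
    query = 0x51  # ASCII 'Q'


def message_kind(buf: bytes) -> MessageType:
    # Indexing bytes yields an int, so it is matched against numeric codes,
    # never against the one-character strings "B" or "Q".
    return MessageType(buf[0])
```

Here message_kind(b"Q") returns MessageType.query, while comparing b"Q"[0] to the string "Q" would simply be false, which is the kind of type mismatch the quoted cases above run into.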
struct EventActionObject
{
    EventActionScope scope;
    EventActionType action_type;
    std::uint32_t game_object_id;
    std::uint8_t parameter_count;
    std::vector<EventActionParameterType> parameters_types;
    std::vector<std::int8_t> parameters;
};
Just a quick question regarding memory usage when reading a "wrong" file.
Kaitai generates the following (python) code for me:
def _read(self):
    self.file_header = self._io.read_bytes(12)
    self.n_files = self._io.read_u4le()
    self.files = [None] * ((self.n_files + 1))
    for i in range((self.n_files + 1)):
        self.files[i] = MyParser.MyFile(self._io, self, self._root)
This works fine whenever I pass a correct file into the parser. However, when I pass an arbitrary file, the value of self.n_files becomes arbitrary, and the parser allocates a ton of memory and crashes the system.
Is there a Kaitai way of handling this sort of problem? For example, can I set an upper bound on the value of n_files? So I'd be looking for something like max_value in the following ksy:
seq:
  - id: my_len
    type: u4
    max_value: 1000
  - id: my_str
    type: str
    size: my_len
    encoding: UTF-8
With valid you can set a max value for a field. That gets checked right after the field is read and throws an exception if the condition isn't fulfilled, so it should prevent the problem of allocating a huge list for garbage input.
seq:
  - id: my_len
    type: u4
    valid:
      max: 1000
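The effect of such a check can be sketched in plain Python as "validate the count before allocating anything based on it". This is a hand-written illustration, not the code Kaitai generates; MAX_FILES and read_file_count are hypothetical names, and the 12-byte header offset is taken from the generated _read above.

```python
import struct

MAX_FILES = 1000  # hypothetical limit, chosen by whoever writes the spec


def read_file_count(buf: bytes, pos: int = 12) -> int:
    # The count follows the 12-byte file header, as in the generated code.
    (n_files,) = struct.unpack_from("<I", buf, pos)
    if n_files > MAX_FILES:
        # Reject garbage input before any list of this size is allocated.
        raise ValueError(f"n_files={n_files} exceeds limit {MAX_FILES}")
    return n_files
```

The important design point is the ordering: the bound is tested immediately after the read, so a random 32-bit value never reaches the `[None] * n` allocation.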
.ksy files, and then use the normal JavaDoc tool to generate API documentation from that class source. In the case of my own Crate Digger project, that gives results like this.
doc string in my .ksy is also not passed to JavaDoc anywhere, so I used JavaDoc’s own package-documentation facilities to create that level of documentation. Again, for full documentation, I found I needed to go far beyond the API documentation integration that Kaitai offers, but that is a very nice foundation to build on.
doc: nodes in my .ksy files do end up as Java class-level documentation.