Jon Lederman
@jon2718
OK. Can you provide instructions?
Jon Lederman
@jon2718
Flatc seems to generate wrong code because it is trying to generate a struct for FlatBuffersBuilder and pass that to a static method, which cannot be marked mutating. Something appears quite wrong.
cyberquarks
@cyberquarks
@mikkelfj thanks
adsharma
@adsharma

I put together a bunch of schemas in flatbuffer format, primarily to see if the ideas expressed in my earlier blog post (https://adsharma.github.io/flattools/) actually work.

https://github.com/adsharma/fbs-schemas/

This is all experimental work - there may be bugs or syntax errors in the schemas in the submodules. Please file bug reports or send me fixes if it looks interesting. The telegram schema is the most extensive one I've found to date.

MikkelFJ
@mikkelfj
I also use the flatcc compilers library to parse flatbuffers in another program and generate protocol code for fast simple memory layout, but without nested type support. Someone else once mentioned using flatbuffers schema to drive a huge project, not sure what it was, UI elements or OpenGL or something.
I prefer the int32 and uint64 over int and ulong types, since you don't have to be very familiar with flatbuffers to understand the size of the types. That is especially important for a cross-protocol IDL.
FYI you can use int32 as an alias for int in Flatbuffers schema.
MikkelFJ
@mikkelfj

From the blog:

Using the deprecated attribute is recommended.

I can’t wait till we deprecate that feature ;)

adsharma
@adsharma

I prefer the int32 and uint64 over int and ulong types
Good tip thanks. I'd use int/ulong only when the author wants to use the corresponding types in the languages we generate from the schema.

Yeah - deprecation is complicated. But I like it so much better than field numbering when generating schema at a conceptual level (telegram_api.tl stuff).

Maxim Zaks
@mzaks
A few suggestions from me regarding the core fbs:
  • IMHO language should be an enum instead of a string. The set is final, so there is no need to waste space: even when deduplication for strings is on, a reference to a string is 4 bytes long by itself.
  • The MimeType type is a bit complicated, as the set can be non-final, but in this case I would also define most of the known types as an enum, and then add a custom case.
Maxim Zaks
@mzaks
enum StandardMimeType: byte {application_octstream, application_json, ... , custom}
table MimeType {
  mimeType: StandardMimeType;
  customMimeType: string;
  params:string;
}
And for URL, I think it makes sense to break it up. In a payload, at least the base URL often repeats itself. In this case IMHO it is better to turn string deduplication on and save some space.
something like:
table URL {
  baseUrl: string;
  pathComponents: [string];
  parameters: string;
}
Maxim Zaks
@mzaks
pathComponents is an array of strings because I guess parts of the path often repeat themselves.
I made parameters just one concatenated string, because my guess is that the params are often repeated as a whole as well. But those are just my speculations; if you have examples, it is better to check those and see what separation would be better.
Maxim Zaks
@mzaks
And why would you box String and Bool? If it is for null compatibility, there is a new feature which just implemented default nulls for scalars. In fact, types like Date, Duration, int128, int256 should be defined as structs. Structs are not evolvable, but we are talking about base types which should not be evolved, and structs don't have the overhead of a virtual table and are stored inline.
Well, regarding structs, I guess if you use just the fbs and not FlatBuffers as is, then it probably does not matter.
Maxim Zaks
@mzaks
Ah OK, I looked a bit further through your examples. I guess you have your own generator from fbs to other languages, or do you use flatc? It seems to me that the generated classes are not typical for flatc output. So you don't use FlatBuffers for FlatBuffers' sake but rather just the schema syntax. So all the deduplication and tables vs. structs remarks I made are not really applicable in this case.
MikkelFJ
@mikkelfj

Yeah - deprecation is complicated.

It is, but I was being sarcastic:
deprecated has been deprecated ...

As to boxed types, I can see why you would want them all to be tables, but boxing actually makes more sense with structs as Max points out, except for variable length strings. int128 can be represented as a struct int128 (align: 16) { val: [byte:16]; }. I have actually thought about adding typedef support for such wrappers. These structs have no overhead compared to native integers, but contrary to native integers (for supported sizes), they can be used in Unions.
MikkelFJ
@mikkelfj
You might want to use Int128 in upper case, and also define Int64 as struct { val: int64; } etc. because then all types can work in Unions, and all wrapped types are consistently uppercase. Lowercase int128 could theoretically conflict with a future FlatBuffer native type.
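To make the "no overhead" point concrete, here is a minimal Python sketch of what a 16-byte Int128 wrapper struct would hold. The helper names `pack_int128` and `unpack_int128` are hypothetical, and the little-endian two's-complement layout is an assumption of this sketch (the thread only says the struct wraps 16 bytes); it is not generated FlatBuffers code.

```python
INT128_SIZE = 16  # bytes, matching the [byte:16] field discussed above


def pack_int128(value: int) -> bytes:
    """Encode a signed integer into exactly 16 bytes.

    Assumption: little-endian two's complement, chosen for this sketch.
    Raises OverflowError if the value does not fit in 128 bits.
    """
    return value.to_bytes(INT128_SIZE, byteorder="little", signed=True)


def unpack_int128(raw: bytes) -> int:
    """Decode the 16 raw bytes back into a Python int."""
    if len(raw) != INT128_SIZE:
        raise ValueError(f"expected {INT128_SIZE} bytes, got {len(raw)}")
    return int.from_bytes(raw, byteorder="little", signed=True)


big = 2**100 + 7
encoded = pack_int128(big)
# The payload is the whole struct: no length prefix, no vtable, no offset.
print(len(encoded))
print(unpack_int128(encoded) == big)
```

The wrapper carries nothing beyond the 16 payload bytes, which is why such a struct costs the same as a native integer of that width would.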
MikkelFJ
@mikkelfj
@mzaks I would keep URL simple. If needed you could add a ParsedURL or something in a separate library. Optimizing for reuse of string components likely costs more than it offers unless you have a lot of URLs with long reused components. E.g. a 6 byte component would use 16 bytes: 4 for the reference in the components array, 4 for length, 6 for content, 1 for the 0 terminator, and 1 for padding. Embedding it in another string would only use 7 bytes including a '/'. But of course this is for metadata and not actual contents.
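The byte count above can be reproduced with a small Python sketch. The function names are hypothetical, and the sketch assumes the string object is padded to a 4-byte boundary in isolation; in a real buffer the padding depends on where the string lands, so treat this as an estimate rather than an exact layout.

```python
def string_in_vector_cost(content_len: int) -> int:
    """Approximate bytes used by one string stored as a [string] element."""
    offset_entry = 4    # the 4-byte reference held in the vector
    length_prefix = 4   # length stored in front of the bytes
    terminator = 1      # trailing 0 byte
    unpadded = length_prefix + content_len + terminator
    padding = -unpadded % 4  # round the string object up to 4 bytes
    return offset_entry + unpadded + padding


def embedded_cost(content_len: int) -> int:
    """Bytes used when the component lives inside a longer path string."""
    return content_len + 1  # content plus the '/' separator


for n in (3, 6, 40):
    print(n, string_in_vector_cost(n), embedded_cost(n))
```

For a 6-byte component this gives 16 bytes as a separate string versus 7 bytes embedded, so splitting only pays off when a component is long and shared often enough for deduplication to win back the per-string overhead.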
adsharma
@adsharma

Ah OK, I looked a bit further through your examples. I guess you have your own generator from fbs to other languages, or do you use flatc? It seems to me that the generated classes are not typical for flatc output. So you don't use FlatBuffers for FlatBuffers' sake but rather just the schema syntax. So all the deduplication and tables vs. structs remarks I made are not really applicable in this case.

That's right. This is the compiler I use:

https://github.com/adsharma/flattools/blob/master/bin/flatc.py

adsharma
@adsharma

The idea is that flatbuffers are useful when you have large buffers and you want zero overhead deserialization. Perhaps at other times you want to choose a different serialization format (thrift supported multiple, but I haven't looked lately).

So what if each table can specify its own serialization format as an annotation? It should be possible to translate that into a swift attribute?

table message (serde: flatbuffers) {
...
}

could end up with:

@flatbuffer
class Message {
...
}

But perhaps you want to use @grpc or some other attribute for other tables. This way the choice of serialization format doesn't drive the choice of IDL.

As to boxed types, I can see why you would want them all to be tables, but boxing actually makes more sense with structs as Max points out, except for variable length strings. int128 can be represented as a struct int128 (align: 16) { val: [byte:16]; }. I have actually thought about adding typedef support for such wrappers. These structs have no overhead compared to native integers, but contrary to native integers (for supported sizes), they can be used in Unions.

I didn't write those types. The boxed types exist in Telegram's TL (type language) described here:

https://core.telegram.org/mtproto/TL

The included script (compiler.py - which I derived from another project) spits out all the fbs files in the directory.

Yes - unions are a major use case for boxed types. Having multiple constructors is another I think.

adsharma
@adsharma

You might want to use Int128 in upper case, and also define Int64 as struct { val: int64; } etc. because then all types can work in Unions, and all wrapped types are consistently uppercase. Lowercase int128 could theoretically conflict with a future FlatBuffer native type.

Yup. Int128 makes more sense. Will do.

https://github.com/adsharma/fbs-schemas/blob/main/core.fbs

Most of these types came from ActivityStreams 2.0 (which is a JSON-LD schema spec). I feel the flatbuffer schema is 100x more readable for the average programmer, even if JSON-LD is more general and might have other capabilities.

MikkelFJ
@mikkelfj

So what if each table can specify its own serialization format as an annotation?

That might get a bit clumsy, and you might want to use multiple serialization formats, for example in a gateway.

Lijie.Jiang
@lijie-jiang
Would it be possible to include multiple input paths with the '-I' option? google/flatbuffers#6346
Sargun Dhillon
@sargun
I noticed flatbuffers doesn't have canonicalization:
RE: On purpose, the format leaves a lot of details about where exactly things live in memory undefined, e.g. fields in a table can have any order, and objects to some extent can be stored in many orders. This is because the format doesn't need this information to be efficient, and it leaves room for optimization and extension (for example, fields can be packed in a way that is most compact). Instead, the format is defined in terms of offsets and adjacency only. This may mean two different implementations may produce different binaries given the same input values, and this is perfectly valid.
Is there appetite for it?
MikkelFJ
@mikkelfj

Is there appetite for it?

What do you mean?
If implementations use this to pack better?
I'm not sure how much, but for example flatcc packs vtables at the end of the buffer by default.
And in general, data are placed as they arrive as much as possible on all implementations that I am aware of, which in itself helps performance by avoiding buffering.

Sargun Dhillon
@sargun
Is there a plan to add canonicalization to the specification, or similar?
Maxim Zaks
@mzaks
What would be the benefit of it, at this point in time?
I think if there were a FlatBuffers 2.0, canonicalization might make sense. But pushing it on an established format without big benefits is questionable.
I think it is OK to introduce guidelines explaining that the following form is more efficient and that people should follow it if they can. But the format in its current state is flexible, and there is data already created in different ways, so it needs to be supported anyway.
Sargun Dhillon
@sargun
Are structs themselves always guaranteed to be reproducible?
adsharma
@adsharma

Is there a plan to add canonicalization to the specification, or similar?

https://adsharma.github.io/flattools/ - pick a canonical serialization that works for you and implement it as a decorator in your favorite language, while enjoying the benefits of flatbuffer syntax as IDL.

Maxim Zaks
@mzaks

Are structs themselves always guaranteed to be reproducible?

Yes. They are rigid. You can not evolve them. You can specify some special layout properties through attributes though. But you can't change it after you have used it as it will be a breaking change.

@sargun ^
MikkelFJ
@mikkelfj
@sargun you can also print to JSON without spaces, that is probably as close as you can get. And yes, structs are always the same, except for potential flaws in exports where padding space is not zeroed. I just found a bug in flatcc that failed to ensure that, because in some cases user code is allowed to influence it via a raw copy.
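As a rough illustration of the "JSON without spaces" idea, the sketch below uses only Python's standard `json` module on already-decoded data; it does not call flatc or flatcc. The `canonical_json` helper is hypothetical, and sorting the keys is an extra assumption of this sketch that goes beyond what was suggested above.

```python
import hashlib
import json


def canonical_json(value) -> bytes:
    """Serialize to compact JSON: no spaces, keys sorted (sketch assumption)."""
    text = json.dumps(
        value,
        sort_keys=True,          # fixed key order, independent of insertion
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,
    )
    return text.encode("utf-8")


# Same logical content, built in a different field order.
a = {"type": "user", "id": "42"}
b = {"id": "42", "type": "user"}

print(canonical_json(a))
print(canonical_json(a) == canonical_json(b))
print(hashlib.sha256(canonical_json(a)).hexdigest() ==
      hashlib.sha256(canonical_json(b)).hexdigest())
```

Two buffers that differ only in layout would then compare or hash equal through their compact JSON form, at the cost of a full decode.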
adsharma
@adsharma

One more blog post on flattools and where it fits in the stack:

https://adsharma.github.io/flattools-programs/

Happy New Year!

cyberquarks
@cyberquarks
Hi can this work with Flatbuffers?
message Entity {
  string dir = 1; 
  string entity_type = 2;
  string entity_id = 3;
  repeated EntityBlob blobs = 4;
  map<string, EntityProperty> properties = 5;
  map<string, EntityIdList> related_entities = 6;
}
message EntityProperty {
  oneof property_value {
    string string_value = 1;
    EntityArrayProperty array_value = 2;
    EntityObjectProperty object_value = 3;
    bool bool_value = 4;
    double double_value = 5;
  }
}
message EntityArrayProperty {
  repeated EntityProperty values = 1;
}
message EntityObjectProperty {
  map<string, EntityProperty> property_map = 1;
}
message EntityIdList {
  repeated EntityId ids = 1;
}
message EntityBlob {
  string blob_name = 1;
  bytes blob_bytes = 2;
}
message EntityId {
  string type = 1;
  string id = 2;
}
cyberquarks
@cyberquarks

I tried to translate this with flatc and I got this with the "Anonymous0" table:

// Generated from schema.proto

namespace ;

table Entity {
  dir:string;
  entity_type:string;
  entity_id:string;
  blobs:[EntityBlob];
  properties:[MapFieldEntry];
  links:[MultimapFieldEntry];
}

table EntityProperty {
  property_value:EntityProperty_.Anonymous0;
}

namespace EntityProperty_;

table Anonymous0 {
  string_value:string;
  array_value:EntityArrayProperty;
  object_value:EntityObjectProperty;
  bool_value:bool;
  double_value:double;
}

namespace ;

table EntityArrayProperty {
  values:[EntityProperty];
}

table EntityObjectProperty {
  properties:[MapFieldEntry];
}

table EntityIdList {
  ids:[EntityId];
}

table EntityBlob {
  blob_name:string;
  blob_bytes:[ubyte];
}

table EntityId {
  type:string;
  id:string;
}

table MapFieldEntry {
  key:string;
  value:EntityProperty;
}

table MultimapFieldEntry {
  key:string;
  value:EntityId;
}

What does this Anonymous0 mean?

Wouter van Oortmerssen
@aardappel
nice @adsharma
@cyberquarks Protobuf to FlatBuffers is not a 1:1 mapping, and FlatBuffers doesn't have the oneof construct. You can just rename it to something else. And since it's the only field in EntityProperty, you can just replace it with EntityProperty directly. Or use a FlatBuffers union.
Also, MapFieldEntry should probably have a (key) attribute on the key field, so you can actually use it with dictionary lookup.
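The reason a (key) attribute enables dictionary lookup is that entries can be kept sorted by that field and then found by binary search. A minimal plain-Python sketch of the idea follows; the sample entries and the `lookup_by_key` helper are hypothetical and stand in for MapFieldEntry rows, not for the generated FlatBuffers API.

```python
from bisect import bisect_left

# Hypothetical (key, value) pairs standing in for MapFieldEntry rows.
raw_entries = [
    ("owner", "alice"),
    ("color", "blue"),
    ("size", "large"),
]

# Entries must be stored sorted by key for binary search to work.
entries = sorted(raw_entries, key=lambda entry: entry[0])
keys = [key for key, _ in entries]


def lookup_by_key(key: str):
    """Return the value for key via binary search, or None if absent."""
    index = bisect_left(keys, key)
    if index < len(keys) and keys[index] == key:
        return entries[index][1]
    return None


print(lookup_by_key("color"))
print(lookup_by_key("missing"))
```

Lookup cost grows with the logarithm of the number of entries, but only if the writer stored the vector in sorted order; an unsorted vector would need a linear scan instead.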
vjani
@vjani
@here I had a question about tags for the flatbuffers repo: do they indicate the official releases? If so, it has been quite some time since the last one (March 2020) and I need to consume some of the later fixes. What is a good way to do this?