Why I quit using Flexible Arrays (C++)

At my previous job, my experience with Flexible Arrays (FAs from here on) was positive. I initially got there through my institute and later started working full-time. Part of my work was developing a server and a client for delivering updates, and afterwards for other activities. Until recently I didn’t even know that FAs are not valid C++ as far as the standard is concerned: flexible array members come from C99, and C++ compilers such as GCC and Clang only accept them as an extension.

Also, the client was written in a completely different language than the server, so my experience with FAs was limited to the server.

Deserialization was as simple as casting a buffer-pointer to a struct-pointer, e.g.:

struct Foo {
    uint8_t someField;
    uint8_t data[]; // my FA ♥
};
// [snip]
Foo* foo = (Foo*) received_buf; // reinterpret the received bytes in place

Serialization could have been harder, but I barely noticed: I was in furious-learning mode, everything was hard and new, and I was afraid of missing a deadline. Sometimes I just couldn’t see whether something could be improved; other times I simply couldn’t afford the time to explore, so I had to do the best I could and move on.

Now I’m implementing a Pidgin plugin for the IMPP (Trillian) protocol, and I tell you, FA stands for Fucking Annoying! Too bad I understood this so late that I now have to rewrite lots of code, but oh well. The more you know…

First of all, let me explain what made the problem so dramatic compared to my previous experience. The protocol of my ex-job’s software was pretty simple: a struct with some fields and an FA at the end. The struct Foo you see above is pretty much it. IMPP, on the other hand, consists of TLVs, and depending on the situation the header, the body, and the units of data can each have at least 2 different sizes. For example, in a TLV unit (the structure that carries the data), a certain bit of the type field determines whether the next field, which holds the size of the data, is a uint16_t or a uint32_t.

This means the “simplicity of deserialization” is no more. Instead of casting the whole thing, I had to cast a small part, decrease the size of the data left (or shift a pointer), check a certain field, then check whether I have enough data to proceed and return a specific error if not; if everything is okay, do more calculations with the remaining sizes, and repeat the whole thing for the next field, again and again. Does it sound confusing? I hope it does, because in actual code it is! Too many places to make a mistake.
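
To make the bookkeeping concrete, here is a minimal sketch (C++17) of how the "check size, read, advance" steps could be folded into a small cursor class. The layout is an assumption made up for illustration, not taken from the IMPP spec: a big-endian 16-bit type whose top bit selects a 32-bit length instead of a 16-bit one. Reader, Tlv, parse_tlv and kWideLengthBit are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// ASSUMPTION (not from the IMPP spec): the top bit of a big-endian 16-bit
// type selects a 32-bit length field instead of a 16-bit one.
constexpr std::uint16_t kWideLengthBit = 0x8000;

// Read cursor over a received buffer; every read checks the remaining size,
// so the pointer and the size left are updated in exactly one place.
class Reader {
public:
    Reader(const std::uint8_t* data, std::size_t size) : p_(data), left_(size) {}

    std::optional<std::uint16_t> u16() {
        if (left_ < 2) return std::nullopt;
        auto v = static_cast<std::uint16_t>((p_[0] << 8) | p_[1]);
        advance(2);
        return v;
    }
    std::optional<std::uint32_t> u32() {
        if (left_ < 4) return std::nullopt;
        std::uint32_t v = (std::uint32_t{p_[0]} << 24) | (std::uint32_t{p_[1]} << 16) |
                          (std::uint32_t{p_[2]} << 8) | std::uint32_t{p_[3]};
        advance(4);
        return v;
    }
    // Returns a pointer to n bytes inside the buffer, or nullptr if too short.
    const std::uint8_t* bytes(std::size_t n) {
        if (left_ < n) return nullptr;
        const std::uint8_t* start = p_;
        advance(n);
        return start;
    }
    std::size_t left() const { return left_; }

private:
    void advance(std::size_t n) { p_ += n; left_ -= n; }
    const std::uint8_t* p_;
    std::size_t left_;
};

struct Tlv {
    std::uint16_t type;
    std::vector<std::uint8_t> value; // copied out of the buffer
};

// Parses one TLV; std::nullopt means "not enough data". On failure the
// cursor may already have moved past the fields that were read.
std::optional<Tlv> parse_tlv(Reader& r) {
    auto type = r.u16();
    if (!type) return std::nullopt;
    std::size_t len = 0;
    if (*type & kWideLengthBit) {
        auto l = r.u32();
        if (!l) return std::nullopt;
        len = *l;
    } else {
        auto l = r.u16();
        if (!l) return std::nullopt;
        len = *l;
    }
    const std::uint8_t* v = r.bytes(len);
    if (!v) return std::nullopt;
    return Tlv{*type, std::vector<std::uint8_t>(v, v + len)};
}
```

Calling parse_tlv in a loop until it returns std::nullopt walks a whole buffer. The point of the design is only that the size arithmetic lives inside Reader instead of being repeated at every field.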

I’m sure it’s possible to automate this somehow, but to figure out how I’d have to write down all the use cases. And to avoid building unnecessary abstractions, the best way would be to write half the code and then look at the boilerplate. Which is a problem: I once caught myself staring at the code for more than an hour, trying to understand everything I needed just to move to the next level. Needless to say, I spent many hours debugging various parts of such code that kept throwing buffer overflows when run under libasan. The last time it recurred, I gave up and decided to look at what APIs the serialization libs out there are using; ATM I’m in the process of hooking up Cereal.

  1. The first problem is simple struct initialization. How would you initialize our hero Foo from above? Well, you can do Foo foo = {1, {2,3,4,5}}, fine. Now consider the following: your array is actually some other particular struct, so you want to initialize that struct and then cast it to the array. Something like Foo foo = {1, *(uint8_t(*)[5])myOtherStruct}. I have explored every possible way; I have even tried the rarely seen beast “pointer to array”, which you can see in the previous sentence. It never works, because the array always decays to a pointer. Your only bet is taking pointers and then using memcpy/std::copy. And unless you’re a robot, adding yet more pointer arithmetic to code that is already crowded with it for the aforementioned reasons doesn’t make it look any better.

    Now that I think of it, though, I could probably make a constructor accepting a pointer and a size. It would have to blindly write beyond the this->data pointer, though, which I’m pretty sure is undefined behavior; but then again, the concept of FAs doesn’t even exist in the standard! I dunno if it’s a good idea.
    I think the issue per se arises from FAs not being valid in the standard, so their interaction with the rest of C++ hasn’t been explored by whoever invented the concept.

  2. The second problem kind of grows from the first. When you have lots of structs with FAs at the end, you have to slowly initialize them from the bottom up, one by one: initialize, copy, initialize, copy, rinse and repeat. It is a lot of boilerplate code and a lot of unnecessary copying. And remember: it is a real lot of arithmetic, because you can’t just take sizeof(mystruct) and be done with it; instead you have to check certain fields to find out whether a particular field has one size or another.

  3. The third problem: you can’t have a proper default constructor for a struct with an FA. Well, technically you can, but it will obviously initialize the FA to hold no data. There is a subtle consequence: you can’t use the Cereal serialization library for such a struct, because the lib doesn’t know how much space needs to be preallocated.
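
For the first problem, the memcpy route could look roughly like the sketch below. It relies on the flexible array extension that GCC and Clang accept in C++ (expect a warning under -pedantic), and it allocates the extra bytes explicitly instead of writing blindly past the end of the struct. Payload, FreeDeleter, FooPtr and make_foo are hypothetical names introduced only for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <memory>

struct Foo {
    std::uint8_t someField;
    std::uint8_t data[]; // flexible array member: a compiler extension in C++
};

// Hypothetical "other struct" whose bytes should end up in Foo::data.
struct Payload {
    std::uint8_t a;
    std::uint8_t b;
};

// The storage comes from malloc, so it has to be released with free.
struct FreeDeleter {
    void operator()(void* p) const { std::free(p); }
};
using FooPtr = std::unique_ptr<Foo, FreeDeleter>;

// Allocates sizeof(Foo) + n bytes and copies n bytes into the array part.
// Returns an empty pointer if the allocation fails.
FooPtr make_foo(std::uint8_t field, const void* bytes, std::size_t n) {
    void* raw = std::malloc(sizeof(Foo) + n);
    if (!raw) return nullptr;
    Foo* foo = static_cast<Foo*>(raw);
    foo->someField = field;
    std::memcpy(foo->data, bytes, n);
    return FooPtr(foo);
}
```

With a Payload p{7, 9}, calling make_foo(1, &p, sizeof p) yields a Foo whose data holds the two payload bytes. Note that the size n still has to be tracked separately by the caller, which is exactly the kind of bookkeeping the second problem complains about.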

As an alternative I’m gonna use a pointer. It might not follow the logic of the protocol as closely, because upon deserialization the data it points to will probably live in a completely different place than the rest of the struct, but at least I can avoid a bunch of horrible hacks and a maintenance nightmare if this plugin finally gets released. For the same reason I’m going to stick to Cereal: unfortunately it doesn’t allow zero-copy deserialization, but, well… consider me an asshole.
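
A rough sketch of what the pointer-based layout might look like, assuming C++17. Here std::vector stands in for the raw pointer plus size, and the serialize/deserialize functions are hand-rolled for illustration; they are not Cereal’s API. FooOwned and both function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Same wire layout as Foo (one byte, then the payload), but the payload
// lives in separately owned storage instead of trailing the struct.
struct FooOwned {
    std::uint8_t someField = 0;
    std::vector<std::uint8_t> data; // owns a copy of the payload
};

// Copies the payload out of the buffer; std::nullopt means "buffer too short".
std::optional<FooOwned> deserialize(const std::uint8_t* buf, std::size_t size) {
    if (size < 1) return std::nullopt;
    FooOwned foo;
    foo.someField = buf[0];
    foo.data.assign(buf + 1, buf + size);
    return foo;
}

// Writes the fixed field followed by the payload bytes.
std::vector<std::uint8_t> serialize(const FooOwned& foo) {
    std::vector<std::uint8_t> out;
    out.reserve(1 + foo.data.size());
    out.push_back(foo.someField);
    out.insert(out.end(), foo.data.begin(), foo.data.end());
    return out;
}
```

The struct is default-constructible and knows its own payload size, which are the two things the FA version lacks. The price is the copy in deserialize, so this is not zero-copy either.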
