Dealing with TCP packets that have a little-endian length header in Erlang gen_tcp
Posted by Jim Morris on Mon Apr 13 01:11:57 -0700 2009
Erlang has a very neat way of reading TCP packets that start with a header giving the size of the rest of the packet. As long as you send that header as a big-endian integer, you can use the built-in mechanism (the {packet, 1 | 2 | 4} socket option), and gen_tcp takes care of making sure the entire packet has been read before passing it on to you.
Here is an example of a simple server receiving packets from a client over a simple binary protocol. For instance, the client sends this binary packet:
<<0,0,0,10,1,2,3,4,5,6,7,8,9,0>>
The first 4 bytes are the length of the rest of the packet (in this case 10), followed by the 10 bytes of data. With {packet, 4}, gen_tcp strips the header and delivers the 10 data bytes to the receive statement below only once the low-level code has read all of them.
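A minimal sketch of such a server, assuming a single client and printing each packet; the module name be_server and the demo/0 loopback helper are hypothetical, not the original listing. The {packet, 4} option is set on the listening socket, and the accepted socket inherits it, so each {tcp, Socket, Body} message carries one complete body with the header already removed:

```erlang
-module(be_server).
-export([serve/1, demo/0]).

%% Accept one client on Port and print every complete packet it sends.
%% {packet, 4}: gen_tcp reads a 4-byte big-endian length header, waits for
%% the whole body, strips the header and delivers only the body.
serve(Port) ->
    {ok, Listen} = gen_tcp:listen(Port, [binary, {packet, 4}, {active, true},
                                         {reuseaddr, true}]),
    {ok, Socket} = gen_tcp:accept(Listen),  % inherits the listen options
    loop(Socket).

loop(Socket) ->
    receive
        {tcp, Socket, Body} ->
            io:format("packet: ~p~n", [Body]),
            loop(Socket);
        {tcp_closed, Socket} ->
            ok
    end.

%% Loopback check: a raw client ({packet, raw}) sends the framed bytes from
%% the text and the {packet, 4} side receives just the 10-byte body.
demo() ->
    {ok, Listen} = gen_tcp:listen(0, [binary, {packet, 4}, {active, true}]),
    {ok, Port} = inet:port(Listen),
    {ok, Client} = gen_tcp:connect({127,0,0,1}, Port, [binary, {packet, raw}]),
    {ok, Server} = gen_tcp:accept(Listen),
    ok = gen_tcp:send(Client, <<0,0,0,10, 1,2,3,4,5,6,7,8,9,0>>),
    Result = receive
                 {tcp, Server, Body} -> Body
             after 2000 -> timeout
             end,
    ok = gen_tcp:close(Client),
    ok = gen_tcp:close(Server),
    ok = gen_tcp:close(Listen),
    Result.
```

The client in demo/0 deliberately uses {packet, raw} so it sends the header bytes itself; a client that also set {packet, 4} would have gen_tcp add the header for it.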
However, if (like me) you made the wrong decision 10 years ago and used a little-endian length header in your binary protocol (because back then you were only talking to Intel-based Windows PCs), you are SOL. Well, not quite. A fairly simple state machine handles the packets quite nicely, although I am sure it is not as efficient as the built-in mechanism.
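To see why the built-in option cannot simply be reused, here is a small sketch (the module endian_demo is hypothetical) that uses the bit syntax to encode the length 10 in both byte orders, and to read a little-endian header the way {packet, 4} does, as a big-endian unsigned integer:

```erlang
-module(endian_demo).
-export([headers/1, misread/1, read_le/1]).

%% The same length encoded in network (big-endian) and little-endian order.
headers(Len) ->
    {<<Len:32/big-unsigned>>, <<Len:32/little-unsigned>>}.

%% How {packet, 4} interprets a 4-byte header: big-endian unsigned.
misread(<<Len:32/big-unsigned>>) -> Len.

%% How the legacy little-endian protocol intends the header to be read.
read_le(<<Len:32/little-unsigned>>) -> Len.
```

headers(10) gives {<<0,0,0,10>>, <<10,0,0,0>>}, and misread(<<10,0,0,0>>) returns 167772160, so the built-in framing would treat a 10-byte packet as announcing a 160 MiB body.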
There are at least two ways to go about this:
If you can spare two processes for each connected client (processes are really cheap in Erlang), have one process do a blocking gen_tcp:recv (with {active, false}, {packet, 0}) until all the bytes of a packet have been read, then send the packet as a message to the other client process, just as gen_tcp does in the example above. This is easy and clean, but it uses two processes per client, and my goal is to have 100,000 clients connected per server, which may be stretching even Erlang. The code for this case is shown at the end as example3.
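A minimal sketch of this two-process approach, assuming a 4-byte unsigned little-endian header; the module le_reader and its {packet, Body} / {closed, Reason} messages are hypothetical and this is not the author's example3. The reader process owns a passive socket and relies on gen_tcp:recv(Socket, N), which in raw mode blocks until exactly N bytes have arrived or an error occurs:

```erlang
-module(le_reader).
-export([start/2, demo/0]).

%% Spawn a reader for Socket (opened with [binary, {packet, 0},
%% {active, false}]). It sends Owner {packet, Body} for every complete
%% packet and {closed, Reason} when the connection ends.
start(Socket, Owner) ->
    Reader = spawn(fun() -> receive go -> read_loop(Socket, Owner) end end),
    ok = gen_tcp:controlling_process(Socket, Reader),
    Reader ! go,
    Reader.

read_loop(Socket, Owner) ->
    %% Blocks until all 4 header bytes have arrived.
    case gen_tcp:recv(Socket, 4) of
        {ok, <<Len:32/little-unsigned>>} ->
            case read_body(Socket, Len) of
                {ok, Body} ->
                    Owner ! {packet, Body},
                    read_loop(Socket, Owner);
                {error, Reason} ->
                    Owner ! {closed, Reason}
            end;
        {error, Reason} ->
            Owner ! {closed, Reason}
    end.

%% recv(Socket, 0) would return "whatever is available", so an empty body
%% is handled here; otherwise recv blocks until exactly Len bytes arrive.
read_body(_Socket, 0) -> {ok, <<>>};
read_body(Socket, Len) -> gen_tcp:recv(Socket, Len).

%% Loopback check: two packets sent in chunks that split the first header.
demo() ->
    {ok, Listen} = gen_tcp:listen(0, [binary, {packet, 0}, {active, false}]),
    {ok, Port} = inet:port(Listen),
    {ok, Client} = gen_tcp:connect({127,0,0,1}, Port, [binary, {packet, 0}]),
    {ok, Server} = gen_tcp:accept(Listen),
    start(Server, self()),
    ok = gen_tcp:send(Client, <<3,0>>),
    ok = gen_tcp:send(Client, <<0,0,$a,$b,$c, 2,0,0,0,$d,$e>>),
    Packets = [receive {packet, P} -> P after 2000 -> timeout end
               || _ <- [1, 2]],
    ok = gen_tcp:close(Client),
    ok = gen_tcp:close(Listen),
    Packets.
```

The go handshake makes sure the reader only starts calling recv after ownership of the socket has been handed over to it.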
Write your own state machine in the single process that handles each client. This is the approach I took, and the example code is below.
It is a little complicated because TCP is a byte stream: a single read may return too little data, too much data, or exactly the amount you want. In practice a read rarely returns just a byte or two, but I assume the worst case and allow for getting one byte per read. There is also the case where the client sends several packets in one TCP segment, which can easily happen depending on how the client is written, so we need to keep any excess data and possibly process several packets from a single read.
Included at the end are two eunit tests covering the two extreme cases.
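A minimal sketch of the single-process version, again assuming a 4-byte unsigned little-endian header; the module le_client, the Handler fun and the test names are hypothetical and not the author's code. parse/1 pulls every complete packet out of the buffer and returns the leftover bytes, which covers the too-little, exact and too-much cases, and the two eunit tests mirror the extreme cases described above:

```erlang
-module(le_client).
-export([parse/1, loop/2]).
-include_lib("eunit/include/eunit.hrl").

%% Split Buffer into all complete packets plus the leftover bytes.
%% Too little data: nothing is extracted and the buffer is kept as is.
%% Too much data: every complete packet is extracted, the rest is kept.
parse(Buffer) ->
    parse(Buffer, []).

parse(<<Len:32/little-unsigned, Body:Len/binary, Rest/binary>>, Acc) ->
    parse(Rest, [Body | Acc]);
parse(Incomplete, Acc) ->
    {lists:reverse(Acc), Incomplete}.

%% Per-client loop for a socket opened with [binary, {packet, 0}].
%% Handler is a fun/1 called once per complete packet.
loop(Socket, Handler) ->
    inet:setopts(Socket, [{active, once}]),
    loop(Socket, Handler, <<>>).

loop(Socket, Handler, Buffer) ->
    receive
        {tcp, Socket, Data} ->
            {Packets, Rest} = parse(<<Buffer/binary, Data/binary>>),
            lists:foreach(Handler, Packets),
            inet:setopts(Socket, [{active, once}]),  % ask for the next chunk
            loop(Socket, Handler, Rest);
        {tcp_closed, Socket} ->
            ok
    end.

%% Worst case: two packets arrive one byte per read.
one_byte_at_a_time_test() ->
    Stream = <<2,0,0,0,$h,$i, 1,0,0,0,$!>>,
    {Packets, Rest} =
        lists:foldl(fun(Byte, {Got, Buf}) ->
                            {New, Buf1} = parse(<<Buf/binary, Byte>>),
                            {Got ++ New, Buf1}
                    end, {[], <<>>}, binary_to_list(Stream)),
    ?assertEqual([<<"hi">>, <<"!">>], Packets),
    ?assertEqual(<<>>, Rest).

%% Several packets, including an empty one, plus the start of another
%% arrive in a single read.
many_packets_in_one_read_test() ->
    ?assertEqual({[<<"ab">>, <<>>, <<"c">>], <<5,0,0>>},
                 parse(<<2,0,0,0,$a,$b, 0,0,0,0, 1,0,0,0,$c, 5,0,0>>)).
```

Using {active, once} and re-arming after each message keeps a slow client process from having its mailbox flooded; with {active, true} the loop works the same way, just without that flow control.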
Remember that this is my first Erlang project. The state machine went through many iterations: as I read more about Erlang, I refactored and cleaned up the code until I got to something similar to the example above. I welcome any feedback from Erlang gurus.
The moral of this story is to make the right decision in the first place and pass big-endian integers over the network; that is why big-endian is also called network byte order. (Hey, I was young and naive ;)
To be fair, this only added about 23 lines of extra code overall.
Here is the code for example3, the case where we use two processes per client and a blocking recv to get the packets. It is slightly less code, but not nearly as sexy ;)
UPDATE: I actually rewrote this as a gen_fsm in my project. The idea is similar, but it uses the FSM mechanism. If there is any interest in the gen_fsm code, drop me a note and I'll publish it.
Hey Wolfmanjm,
Nice post, thanks for that.
I would be interested in the fsm implementation. It will really help me in what I'm doing.
Thank you,
Alin
Brilliant post,
I am also interested in the fsm implementation.
Thank you.
Janis