<?xml version='1.0' encoding='utf-8' ?>
<rss version='2.0' xmlns:dc='http://purl.org/dc/elements/1.1/'>
  <channel>
    <title>Wolfmans Howlings: Dealing with a TCP packet with a little endian header in Erlang gen_tcp</title>
    <link>http://blog.wolfman.com/articles/2009/4/13/dealing-with-an-tcp-packet-with-a-little-endian-header-in-erlang-gen-tcp</link>
    <description>A programmer's blog about Ruby, Rails and a few other issues</description>
    <language>en-us</language>
    <ttl>40</ttl>
    <item>
      <title>Dealing with a TCP packet with a little endian header in Erlang gen_tcp</title>
      <description>
        &lt;p&gt;Erlang has a very neat way of reading TCP packets that start with a
        header specifying how big the rest of the packet is. So long as you
        send that header as a big-endian integer, you can use the built-in
        {packet, N} socket option (N being 1, 2 or 4 header bytes). gen_tcp then
        takes care of making sure the entire packet is read before passing it
        on to you.&lt;/p&gt;
        
        &lt;p&gt;Here is an example of a simple server getting packets from a client
        using a simple binary protocol. For instance, take this binary
        packet:&lt;/p&gt;
        
        &lt;pre&gt;&lt;code&gt;&amp;lt;&amp;lt;0,0,0,10,1,2,3,4,5,6,7,8,9,0&amp;gt;&amp;gt;
        &lt;/code&gt;&lt;/pre&gt;
        
        &lt;p&gt;The first 4 bytes are the length of the rest of the packet, in this
        case 10 bytes, followed by the 10 bytes of data. The data (with the
        header already stripped off) is delivered to the receive statement below
        once all the bytes have been read by the low-level gen_tcp code.&lt;/p&gt;
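        
        &lt;p&gt;For the sending side, building such a packet is a one-liner with the
        bit syntax. This is only a minimal sketch: the module name frame_be and
        the function frame/1 are made up for illustration and are not part of
        gen_tcp.&lt;/p&gt;
        
        &lt;pre&gt;-module(frame_be).
        -export([frame/1]).
        
        %% Hypothetical helper: prefix Payload with its length as a 4 byte
        %% big-endian unsigned integer, the layout {packet, 4} expects.
        frame(Payload) when is_binary(Payload) -&amp;gt;
            &amp;lt;&amp;lt;(byte_size(Payload)):32/big-unsigned-integer, Payload/binary&amp;gt;&amp;gt;.
        &lt;/pre&gt;
        
        &lt;p&gt;Calling frame_be:frame(&amp;lt;&amp;lt;1,2,3,4,5,6,7,8,9,0&amp;gt;&amp;gt;) gives the 14 byte
        packet &amp;lt;&amp;lt;0,0,0,10,1,2,3,4,5,6,7,8,9,0&amp;gt;&amp;gt;. If the sending socket is
        itself opened with {packet, 4}, gen_tcp:send adds this header for you, so
        you would not frame by hand as well.&lt;/p&gt;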
        
        &lt;pre&gt;-module(example1).
        -export([listen/1]).
        
        %% state to hold the client's state
        -record(state, {stuff=default}).
        
        %% Receives a packet whose header is 4 bytes of 
        %% packet count in big-endian order followed by that 
        %% length of data as binary
        listen(Port) -&amp;gt;
            {ok, LSocket} = gen_tcp:listen(Port, [binary, {packet, 4}, 
                                                {active, false}]),
            do_accept(LSocket).
        
        %% Each client gets its own process
        do_accept(LSocket) -&amp;gt;
            case gen_tcp:accept(LSocket) of
                {ok, Socket} -&amp;gt;
                    Pid = spawn(fun() -&amp;gt; init(Socket) end),
                    gen_tcp:controlling_process(Socket, Pid);
        
                {error, Reason} -&amp;gt;
                    io:format(&amp;quot;Socket accept error: ~p~n&amp;quot;, [Reason])
            end,
            do_accept(LSocket).
        
        init(Socket) -&amp;gt;
            State = #state{stuff=something},
            loop(Socket, State).
        
        
        %% {active,once} means we receive one message for each iteration of the
        %% loop. This is good as it means TCP/IP flow control will be used to
        %% throttle the client, rather than filling up the server's buffers.
        %% With {packet, 4} we get one message for each complete packet received,
        %% so we know Data is always a complete packet
        loop(S, State) -&amp;gt;
            inet:setopts(S,[{active,once}]),
            receive
                {tcp,S,Data} -&amp;gt;
                    NewState = process(Data, State),
                    loop(S, NewState);
                {tcp_closed,S} -&amp;gt;
                    io:format(&amp;quot;Socket ~w closed [~w]~n&amp;quot;,[S,self()]),
                    ok
            end.
        
        process(Data, State) -&amp;gt;
            %% do something cool, then hand back the (possibly updated) state
            io:format(&amp;quot;We got the following Data: ~p~n&amp;quot;, [Data]),
            State.
        &lt;/pre&gt;
        
        &lt;p&gt;However, if (like me) 10 years ago you made the wrong decision and used a
        little-endian packet count in your binary protocol (because you were
        only talking to Intel-based Windows PCs back then), you are SOL, as the
        built-in mechanism only understands big-endian headers. Well, not
        quite. A fairly simple state machine will handle the packets quite
        nicely, although, I am sure, not as efficiently as the built-in
        mechanism.&lt;/p&gt;
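        
        &lt;p&gt;To see why {packet, 4} cannot simply be pointed at such a stream, here
        is a small sketch that decodes the same 4 header bytes both ways. The
        module and function names are made up for illustration.&lt;/p&gt;
        
        &lt;pre&gt;-module(endian_demo).
        -export([as_big/1, as_little/1]).
        
        %% Read a 4 byte header the way {packet, 4} does (big-endian).
        as_big(&amp;lt;&amp;lt;N:32/big-unsigned-integer&amp;gt;&amp;gt;) -&amp;gt; N.
        
        %% Read the same 4 bytes the way a little-endian protocol wrote them.
        as_little(&amp;lt;&amp;lt;N:32/little-unsigned-integer&amp;gt;&amp;gt;) -&amp;gt; N.
        &lt;/pre&gt;
        
        &lt;p&gt;A little-endian header for a 10 byte packet is &amp;lt;&amp;lt;10,0,0,0&amp;gt;&amp;gt;.
        as_little/1 returns 10 for it, but as_big/1 returns 167772160, so the
        built-in mechanism would treat it as the start of a roughly 160MB packet
        rather than a 10 byte one.&lt;/p&gt;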
        
        &lt;p&gt;There are at least two ways to go for this one.&lt;/p&gt;
        
        &lt;ol&gt;
        &lt;li&gt;&lt;p&gt;If you can spare two processes for each connected client (sure,
        processes are really cheap in Erlang), then have one process do a
        blocking gen_tcp:recv (using {active, false}, {packet, 0}) until all
        the bytes are read, and then send a message to the other client process,
        just as gen_tcp does in the example above. This is pretty easy and
        clean, but it does use two processes per client, and my goal is to have
        100,000 clients connected per server, which may be stretching even
        Erlang. The example code for this case is shown at the end as example3.&lt;/p&gt;&lt;/li&gt;
        &lt;li&gt;&lt;p&gt;Write your own state machine in the one process that handles each
        client. This is the approach I took, and the example code is below.&lt;/p&gt;&lt;/li&gt;
        &lt;/ol&gt;
        
        &lt;pre&gt;-module(example2).
        -export([listen/1]).
        
        %% state to hold the client's state, and the read state of the packet
        -record(state, {rdstate, pktsize, pktbuf, testcount= 0}).
        
        %% definitions for a little endian count, a little endian short and a
        %% byte
        -define(LONG, 32/unsigned-little-integer).
        -define(SHORT, 16/unsigned-little-integer).
        -define(BYTE, 8/unsigned-little-integer).
        
        %% Receives a packet whose header is 4 bytes of packet count in
        %% little-endian order followed by that length of data as binary. Note
        %% that packet is set to 0 as we can't use the built-in packet stuff
        listen(Port) -&amp;gt;
            {ok, LSocket} = gen_tcp:listen(Port, [binary, {packet, 0}, 
                                                {active, false}]),
            do_accept(LSocket).
        
        %% Each client gets its own process
        do_accept(LSocket) -&amp;gt;
            case gen_tcp:accept(LSocket) of
                {ok, Socket} -&amp;gt;
                    Pid = spawn(fun() -&amp;gt; init(Socket) end),
                    %% we need this so that the client process gets its own tcp messages
                    gen_tcp:controlling_process(Socket, Pid);
        
                {error, Reason} -&amp;gt;
                    io:format(&amp;quot;Socket accept error: ~p~n&amp;quot;, [Reason])
            end,
            do_accept(LSocket).
        
        init(Socket) -&amp;gt;
            %% this sets the initial packet read state to read the size, the
            %% initial number of bytes to read to 4, and the packet buffer
            %% to empty
            State = #state{rdstate=rdsize, pktbuf= &amp;lt;&amp;lt;&amp;gt;&amp;gt;, pktsize=4},
            loop(Socket, State).
        
        
        %% {active,once} means we receive one message for each iteration of the
        %% loop. This is good as it means TCP/IP flow control will be used to
        %% throttle the client, rather than filling up the server's buffers.
        %% With {packet, 0} each message is just whatever bytes have arrived, which
        %% may be part of a packet or several packets, so handle_data_loop has to
        %% reassemble them
        loop(S, State) -&amp;gt;
            inet:setopts(S,[{active,once}]),
            receive
                {tcp,S,Data} -&amp;gt;
                    NewState =  handle_data_loop(Data, State),
                    loop(S, NewState);
                {tcp_closed,S} -&amp;gt;
                    io:format(&amp;quot;Socket ~w closed [~w]~n&amp;quot;,[S,self()]),
                    ok
            end.
        
        %% Handles reading the data until a complete packet is read, and
        %% accumulates any excess, which is used towards the next packet.
        %% May process several packets/commands if it got more than one packet.
        %% Stores packet information in State between reads from the client.
        handle_data_loop(Data, State) -&amp;gt;
            {NewState, Acc, ReadState, Size} =
            handle_data_loop(Data, State#state.pktbuf, State#state.rdstate, State#state.pktsize, State),
            NewState#state{pktbuf=Acc, pktsize=Size, rdstate=ReadState}.
        
        %% We need this second version so we can loop and process multiple
        %% packets we may have received in a single TCP/IP packet
        handle_data_loop(Data, Acc, ReadState, Size, State) -&amp;gt;
            case handle_data(ReadState, &amp;lt;&amp;lt;Acc/binary, Data/binary&amp;gt;&amp;gt;, Size) of
                %% need more data from client, so go back to the read
                %% loop to wait for more data from the client
                {need_more, Rest, NewReadState, NewSize} -&amp;gt;
                    {State, Rest, NewReadState, NewSize};
        
                %% we got a complete packet so process it, then do a tail
                %% recursive loop until all complete packets have been
                %% processed
                {packet, Pkt, Rest} -&amp;gt;
                    NewState = process(Pkt, State),
                    handle_data_loop(Rest, &amp;lt;&amp;lt;&amp;gt;&amp;gt;, rdsize, 4, NewState)
            end.
        
        
        %% the actual state machine that reads the packet size then the data
        %% each of these handles a different state (the first parameter)
        
        %% state 1 where we are reading the packet header but we don't have
        %% enough yet
        handle_data(rdsize, Data, Size) when byte_size(Data) &amp;lt; Size -&amp;gt;
            {need_more, Data, rdsize, Size};
        
        %% state 2 where we are still reading the packet header and we now
        %% have enough, we extract the little endian packet count and switch
        %% state to read data
        handle_data(rdsize, Data, Size) when byte_size(Data) &amp;gt;= Size -&amp;gt;
            &amp;lt;&amp;lt;N:?LONG, Rest/binary&amp;gt;&amp;gt; = Data,
            handle_data(rddata, Rest, N);
        
        %% state 3 where we are reading the data portion of the packet but
        %% don't have enough yet
        handle_data(rddata, Data, Size) when byte_size(Data) &amp;lt; Size -&amp;gt;
            {need_more, Data, rddata, Size};
        
        %% state 4 where we are still reading the data portion of the packet,
        %% and now have enough for a complete packet
        %% extract the complete pkt, and any excess data
        handle_data(rddata, Data, Size) when byte_size(Data) &amp;gt;= Size -&amp;gt;
            &amp;lt;&amp;lt;Pkt:Size/binary, Rest/binary&amp;gt;&amp;gt; = Data,
            {packet, Pkt, Rest}.
        
        
        
        %% Actually process the packet we received
        process(Data, State) -&amp;gt;
            %% do something cool and maybe change the State
            io:format(&amp;quot;We got the following Data: ~p~n&amp;quot;, [Data]),
            %% for test purposes we just count the number of packets we got
            Cnt= State#state.testcount,
            State#state{testcount= Cnt+1}.
        
        
        %%
        %% We even have Unit tests for the state machine to test the two extreme
        %% cases, 1 byte at a time, and multiple packets in one chunk
        %%
        -define(TEST, 1).
        -ifdef(TEST).
        -include_lib(&amp;quot;eunit/include/eunit.hrl&amp;quot;).
        -endif.
        
        -ifdef(TEST).
        
        handle_data_1byte_at_a_time_test() -&amp;gt;
            ?assertMatch({need_more, &amp;lt;&amp;lt;3&amp;gt;&amp;gt;, rdsize, 4}, handle_data(rdsize, &amp;lt;&amp;lt;3&amp;gt;&amp;gt;, 4)),    
            ?assertMatch({need_more, &amp;lt;&amp;lt;3,0&amp;gt;&amp;gt;, rdsize, 4}, handle_data(rdsize, &amp;lt;&amp;lt;3,0&amp;gt;&amp;gt;, 4)),
            ?assertMatch({need_more, &amp;lt;&amp;lt;3,0,0&amp;gt;&amp;gt;, rdsize, 4}, handle_data(rdsize, &amp;lt;&amp;lt;3,0,0&amp;gt;&amp;gt;, 4)),   
            ?assertMatch({need_more, &amp;lt;&amp;lt;&amp;gt;&amp;gt;, rddata, 3}, handle_data(rdsize, &amp;lt;&amp;lt;3,0,0,0&amp;gt;&amp;gt;, 4)),   
        
            ?assertMatch({need_more, &amp;lt;&amp;lt;11&amp;gt;&amp;gt;, rddata, 3}, handle_data(rddata, &amp;lt;&amp;lt;11&amp;gt;&amp;gt;, 3)),   
            ?assertMatch({need_more, &amp;lt;&amp;lt;11,22&amp;gt;&amp;gt;, rddata, 3}, handle_data(rddata, &amp;lt;&amp;lt;11,22&amp;gt;&amp;gt;, 3)),   
            ?assertMatch({packet, &amp;lt;&amp;lt;11,22,33&amp;gt;&amp;gt;, &amp;lt;&amp;lt;&amp;gt;&amp;gt;}, handle_data(rddata, &amp;lt;&amp;lt;11,22,33&amp;gt;&amp;gt;, 3)).
        
        handle_data_loop_2_packets_all_bytes_at_once_test() -&amp;gt;
            C1 = #state{rdstate=rdsize, pktsize=4, pktbuf = &amp;lt;&amp;lt; &amp;gt;&amp;gt;, testcount=0},
            Bin =  &amp;lt;&amp;lt;4, 0, 0, 0, 250, 1, 0, $z, 5, 0, 0, 0, 251, 2, 0, $a, $b&amp;gt;&amp;gt;,
            C2 = handle_data_loop(Bin, C1),
            ?assertMatch(#state{pktsize=4, rdstate=rdsize, pktbuf = &amp;lt;&amp;lt;&amp;gt;&amp;gt;, testcount=2}, C2).
        
        -endif.
        &lt;/pre&gt;
        
        &lt;p&gt;It is a little complicated because TCP is a byte stream with no
        message boundaries, so a single read may give you too little data, too
        much data or exactly the amount you want. In practice you rarely get
        just a byte or two at a time, however I take the worst case scenario
        and allow for getting one byte at a time per read. There is also the
        case where several of the client's packets arrive in one read. This
        could easily happen depending on how you write the client, so we need
        to store all excess data, and maybe process multiple packets from a
        single read.&lt;/p&gt;
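        
        &lt;p&gt;The same three cases can also be handled with a single binary pattern
        that only matches once a whole packet is buffered. This is a sketch of an
        alternative to the explicit state machine in example2, not a replacement
        for it. The names le_framer, feed/2 and feed_all/1 are made up for
        illustration.&lt;/p&gt;
        
        &lt;pre&gt;-module(le_framer).
        -export([feed/2, feed_all/1]).
        
        %% feed(Chunk, Buffer) -&amp;gt; {Packets, NewBuffer}
        %% Appends Chunk to whatever was left over from earlier reads and
        %% returns every complete packet plus the unconsumed remainder.
        feed(Chunk, Buffer) -&amp;gt;
            split(&amp;lt;&amp;lt;Buffer/binary, Chunk/binary&amp;gt;&amp;gt;, []).
        
        %% Matches only when the 4 byte little-endian length AND that many
        %% payload bytes are present (too much data, or exactly enough).
        split(&amp;lt;&amp;lt;Len:32/little-unsigned-integer, Pkt:Len/binary, Rest/binary&amp;gt;&amp;gt;, Acc) -&amp;gt;
            split(Rest, [Pkt | Acc]);
        %% Too little data: keep it all for the next read.
        split(Incomplete, Acc) -&amp;gt;
            {lists:reverse(Acc), Incomplete}.
        
        %% Test helper: push a list of chunks through feed/2 in order.
        feed_all(Chunks) -&amp;gt;
            lists:foldl(fun(Chunk, {Pkts, Buf}) -&amp;gt;
                                {New, Buf1} = feed(Chunk, Buf),
                                {Pkts ++ New, Buf1}
                        end, {[], &amp;lt;&amp;lt;&amp;gt;&amp;gt;}, Chunks).
        &lt;/pre&gt;
        
        &lt;p&gt;Feeding the two-packet test binary from example2 to feed_all/1, either
        as one chunk or as a list of single bytes, yields the same two packets
        with an empty remainder.&lt;/p&gt;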
        
        &lt;p&gt;Included at the end are two eunit tests to test the two extreme cases.&lt;/p&gt;
        
        &lt;p&gt;Now remember this is my first Erlang project. The state machine I am
        using went through many iterations; as I read more about Erlang I
        refactored and cleaned up the code until I got to something similar to
        the above example. I welcome any feedback from Erlang gurus.&lt;/p&gt;
        
        &lt;p&gt;The moral of this story is to make the right decision in the first
        place and pass big-endian integers over the network; that is why
        big-endian is also called network byte order. (Hey, I was young and naive ;)&lt;/p&gt;
        
        &lt;p&gt;To be fair this only added about 23 lines of extra code overall.&lt;/p&gt;
        
        &lt;p&gt;Here is the code for example 3, the case where we have two processes per
        client and use a blocking recv to get the packets. It is slightly
        less code, but not nearly as sexy ;)&lt;/p&gt;
        
        &lt;pre&gt;-module(example3).
        -export([listen/1]).
        
        %% state to hold the reader's state (state) and the server's state (cc)
        -record(state, {pid, testcount}).
        -record(cc, {testcount}).
        
        listen(Port) -&amp;gt;
            {ok, LSocket} = gen_tcp:listen(Port, [binary, {packet, 0}, 
                                                {active, false}]),
            do_accept(LSocket).
        
        %% Each client gets its own process
        do_accept(LSocket) -&amp;gt;
            case gen_tcp:accept(LSocket) of
                {ok, Socket} -&amp;gt;
                    Pid = spawn(fun() -&amp;gt; init(Socket) end),
                    gen_tcp:controlling_process(Socket, Pid);
        
                {error, Reason} -&amp;gt;
                    io:format(&amp;quot;Socket accept error: ~p~n&amp;quot;, [Reason])
            end,
            do_accept(LSocket).
        
        %% Each client also gets another process to handle the packets
        init(Socket) -&amp;gt;
            Pid = spawn(fun() -&amp;gt; server(#cc{testcount=0}) end),
            State = #state{testcount=0, pid=Pid},
            loop(Socket, State).
        
        
        %% This is a blocking loop that blocks to get exactly 4 bytes, which is
        %% a count, then blocks reading that count number of bytes, sending a
        %% message to the server process when done. Any recv error (including
        %% closed) is reported to the server process as a close.
        %% Note: recv with a length of 0 returns whatever bytes are available, so
        %% this assumes the client never sends an empty packet.
        loop(S, State) -&amp;gt;
            case gen_tcp:recv(S, 4) of
                {ok, B} -&amp;gt;
                    &amp;lt;&amp;lt;Cnt:32/unsigned-little-integer&amp;gt;&amp;gt; = B,
                    case gen_tcp:recv(S, Cnt) of
                        {ok, Data} -&amp;gt;
                            State#state.pid ! {mytcp, Data},
                            loop(S, State);
                        {error, _Reason} -&amp;gt;
                            State#state.pid ! {mytcp_closed}
                    end;
                {error, _Reason} -&amp;gt;
                    State#state.pid ! {mytcp_closed}
            end.
        
        %% the actual server that handles the packets asynchronously
        server(State) -&amp;gt;
            receive
                {mytcp,Data} -&amp;gt;
                    NewState = process(Data, State),
                    server(NewState);
                {mytcp_closed} -&amp;gt;
                    io:format(&amp;quot;my Socket closed [~w]~n&amp;quot;,[self()]),
                    ok
            end.
        
        process(Data, State) -&amp;gt;
            %% do something cool
            io:format(&amp;quot;We got the following Data: ~p, State: ~p~n&amp;quot;, [Data, State]),
            Cnt= State#cc.testcount,
            State#cc{testcount= Cnt+1}.
        
        &lt;/pre&gt;
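        
        &lt;p&gt;To try the little-endian framing end to end without writing a separate
        client, something like the following loopback check can be used. It is a
        sketch under assumptions: the module name le_loopback is made up, it
        listens on an ephemeral port on 127.0.0.1, and it reads the packet with
        the same two blocking recv calls as example3 (with a 5 second timeout
        added) rather than starting that server.&lt;/p&gt;
        
        &lt;pre&gt;-module(le_loopback).
        -export([run/0]).
        
        %% Hypothetical self-check: send one little-endian framed packet over a
        %% local socket pair and read it back with two blocking recv calls.
        run() -&amp;gt;
            Opts = [binary, {packet, 0}, {active, false}],
            %% port 0 asks the OS for any free port
            {ok, L} = gen_tcp:listen(0, [{ip, {127,0,0,1}} | Opts]),
            {ok, Port} = inet:port(L),
            {ok, C} = gen_tcp:connect({127,0,0,1}, Port, Opts),
            {ok, S} = gen_tcp:accept(L),
            Payload = &amp;lt;&amp;lt;&amp;quot;hello&amp;quot;&amp;gt;&amp;gt;,
            Frame = &amp;lt;&amp;lt;(byte_size(Payload)):32/little-unsigned-integer, Payload/binary&amp;gt;&amp;gt;,
            ok = gen_tcp:send(C, Frame),
            %% first the 4 byte count, then exactly that many bytes
            {ok, &amp;lt;&amp;lt;Cnt:32/little-unsigned-integer&amp;gt;&amp;gt;} = gen_tcp:recv(S, 4, 5000),
            {ok, Data} = gen_tcp:recv(S, Cnt, 5000),
            ok = gen_tcp:close(C),
            ok = gen_tcp:close(S),
            ok = gen_tcp:close(L),
            Data.
        &lt;/pre&gt;
        
        &lt;p&gt;le_loopback:run() returns the payload that was sent if the framing
        round-trips correctly.&lt;/p&gt;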
        
      </description>
      <author>Jim Morris</author>
      <pubDate>Mon, 13 Apr 2009 01:45:30 -0700</pubDate>
      <link>http://blog.wolfman.com/articles/2009/4/13/dealing-with-an-tcp-packet-with-a-little-endian-header-in-erlang-gen-tcp</link>
      <guid isPermaLink='false'>urn:uuid:2eee82d9-c7f9-413f-b34c-53a03b907477</guid>
    </item>
    <item>
      <title>"Dealing with a TCP packet with a little endian header in Erlang gen_tcp" by wolfmanjm</title>
      <description>UPDATE: I actually rewrote this as a gen_fsm in my project. The idea is similar, but I use the FSM mechanism. If there is any interest in the gen_fsm code, drop me a note and I'll publish it.</description>
      <pubDate>Tue, 05 May 2009 23:17:28 -0700</pubDate>
      <link>http://blog.wolfman.com/posts/45#comment-249</link>
      <guid isPermaLink='false'>urn:uuid:06e940c0-593e-4592-86dc-b85a14bf14ab</guid>
    </item>
    <item>
      <title>"Dealing with a TCP packet with a little endian header in Erlang gen_tcp" by alin.popa@gmail.com</title>
      <description>
        Hey Wolfmanjm,
        
        Nice post, thanks for that.
        I would be interested in the fsm implementation. It will really help me in what I'm doing.
        
        Thank you,
        
        Alin
      </description>
      <pubDate>Sun, 18 Jul 2010 11:08:55 -0700</pubDate>
      <link>http://blog.wolfman.com/posts/45#comment-351</link>
      <guid isPermaLink='false'>urn:uuid:de107ec5-fb55-477b-8384-b6ca8e7b7373</guid>
    </item>
  </channel>
</rss>
