XMPP versus MQTT: comparing apples with pears

Recently I had a discussion about using either XMPP or MQTT within an IoT project. One of the arguments for MQTT was that it's a very efficient protocol with very little overhead, whereas XMPP is considered very verbose.

Of course, message size is only one aspect to consider when choosing a protocol, but for now let's focus on message sizes.

Spoiler

MQTT is clearly the most efficient protocol on the wire. When encryption is taken into account the gap is not so big anymore. Considering all the extra functions you get from XMPP the overhead might be acceptable, depending on your use case.

Protocol   Raw message size   On the wire (encrypted)
MQTT       30 bytes           70 bytes (estimated, TLS)
XMPP       491 bytes          308 bytes (measured, TLS + zlib)

XMPP pub/sub uses a transaction to publish a new item: the publisher first sends the new item to the pub/sub service, and the service confirms that it received the message. This is the main reason the total byte count for publishing gets so large.

I noticed that subsequent XMPP requests take up less space on the wire, down to the point where a complete transaction takes only 180 bytes. I guess this is the compression kicking in.

Using EXI, the size of the XMPP messages can be optimised even further, but as this is a relatively new XMPP extension proposal it is currently not implemented in XMPP servers (as far as I know).

The test

Let's consider a topic universe/earth (14 bytes) and a message Hello World! (12 bytes) and see what this message would take on the wire.

Since MQTT is a pure pub/sub protocol, I will compare it with the pub/sub service in XMPP (which might not be the most efficient choice for your use case).

MQTT

It's pretty easy to determine what the message size in MQTT would be. MQTT adds 4 bytes of headers on top of the topic and the message, so the total length of the message would be 30 bytes:

  • 2 bytes for the fixed header
  • 2 bytes for the topic length
  • 26 bytes for the topic and the payload
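The breakdown above can be checked with a few lines of code. This is a rough sketch assuming MQTT 3.1 with QoS 0 (the function name is mine, not part of any MQTT library):

```javascript
// Rough sketch of the size of an MQTT 3.1 QoS 0 PUBLISH packet.
// mqttPublishSize is an illustrative name, not from any MQTT library.
function mqttPublishSize(topic, payload) {
  const topicLen = Buffer.byteLength(topic, 'utf8');
  const payloadLen = Buffer.byteLength(payload, 'utf8');
  // variable header (2-byte topic length prefix + topic) + payload
  const remaining = 2 + topicLen + payloadLen;
  // fixed header: 1 control byte + 1..4 byte "remaining length" varint
  let lenBytes = 0;
  let n = remaining;
  do { lenBytes += 1; n = Math.floor(n / 128); } while (n > 0);
  return 1 + lenBytes + remaining;
}

console.log(mqttPublishSize('universe/earth', 'Hello World!')); // 30
```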

If we add encryption the message grows, as encryption adds overhead. If the MQTT packet were encrypted with TLS, the overhead per packet would be around 40 bytes.

The total message size of an encrypted MQTT message would then be around 70 bytes.

Note that I did not actually test this on the wire (yes, I am lazy).

XMPP

For XMPP it's a bit harder to determine the message size, as it depends on encryption and compression. Let's consider the following XMPP pub/sub message:

<iq to='pubsub.servicelab.org' type='set' id='123457' xmlns='jabber:client'>
  <pubsub xmlns='http://jabber.org/protocol/pubsub'>
    <publish node='universe/earth' jid='spaceship@servicelab.org'>
      <item>
        <entry>Hello World!</entry>
      </item>
    </publish>
  </pubsub>
</iq>

Note that this message has information about the identity of the publisher. This information is lost in the MQTT message.

When removing the white space from the message the length would be 252 bytes. But what does it take on the wire?

To test this I’ve used the following:

  • An XMPP server: Prosody
  • An XMPP client: Psi
  • tcpdump to get the wire message size

The server and the client are configured to use TLS and zlib compression. On the wire this message will initially be 202 bytes. When sending the message again (with another id) the message size will be a bit smaller: 90 bytes. I guess this is because of the stream compression.

Because XMPP pub/sub uses a request-response mechanism for publishing messages to a pub/sub node, the server will send a message back to the client for each message that is published. That message will look like:

<iq type='result' to='pubsub.servicelab.org' from='spaceship@servicelab.org' id='123456'>
  <pubsub xmlns='http://jabber.org/protocol/pubsub'>
    <publish node='universe/earth'>
      <item id='ae890ac52d0df67ed7cfdf51b644e901'/>
    </publish>
  </pubsub>
</iq>

This message is 239 bytes large. On the wire it will take 138 bytes. Subsequent messages with another id and item id will be around 106 bytes in size.

The total message size on the wire is initially 308 bytes for a new connection, but sizes may vary when a connection is kept alive longer. The total size of the XML is 491 bytes.

Comparing other functions

Above I've focused on comparing message sizes, but what about other functions? Message size isn't the only thing to look at when choosing a protocol. XMPP offers a lot of features that MQTT does not, like identity, federation and all kinds of extensions, some of them specific to the IoT.

MQTT is really nice as a messaging bus in a single domain where devices belong to a single entity. The strength of XMPP lies in the fact that it federates, which makes it more suitable in a multi-domain, multi-entity (or multiple business roles) environment.

Posted in internet of things, machine to machine, protocols

Smart Grid Research with the ELK Stack. You Know, For Science.

The Elasticsearch tagline is “you know, for search”. But in our case, also for science. At Servicelab, we’re using Elasticsearch, Logstash and Kibana to monitor and analyze a Smart Grid pilot. Read on for the why and how.

Posted in Cloud computing, DevOps, internet of things

Installing git on windows with putty

There are many solutions available for running Git on your Windows machine, but this is the setup that I find most useful: msysgit with PuTTY.

  1. First download the latest PuTTY (using the installer) and install it. If you download the executables directly instead, you also have to download plink.exe.
  2. Download the latest msysgit and install it. During installation it will ask you to choose an SSH executable. If PuTTY is installed correctly it will have been detected; otherwise you have to select the ‘Use (Tortoise)Plink’ option and use the button to select the plink executable on your system (when using the default PuTTY installer this will be ‘C:\Program Files (x86)\PuTTY\plink.exe’).
  3. Now you can start using git and use pageant for public-private key authentication!


Posted in Uncategorized

DevOps on Mobile

The word on the street is DevOps. There’s technology out there to make both developers’ and sysadmins’ lives easier and more intertwined, such as:

  • Containerization: packaging software in a lightweight virtualization container with a low overhead, with bonuses in maintainability and portability, and
  • Large-scale cluster management: configuration and management of thousands of servers using easy-to-use tooling and scripting,

but it’s currently focused almost exclusively on the let’s-call-it-the-cloud domain. So we – a very small team of network engineers, information security experts and mobile platform wizards – secured some funding from the Dutch Ministry of Defence, and very limited time, to do a small-scale project to see if the aforementioned technologies work in the mobile domain.

Posted in DevOps, Hardware, internet of things, Networking

Streaming audio between browsers with WebRTC and WebAudio

UPDATE (November 10, 2013): the demo page is up and running again.

As an experiment I wanted to see whether it is possible to stream music files from one browser to another using WebRTC. This post describes how to achieve this. I’ve published the result of my experiment as a demo web application; you’ll find the link to the demo below.

What is WebRTC

WebRTC is a JavaScript API that enables web developers to create real-time communication (RTC) applications. WebRTC uses peer-to-peer connections to send data between browsers, without the need for servers in the data path. WebRTC is mostly known for making audio and video calls from one browser to another, making Skype-like communication possible using only browser technology.

But there is more to WebRTC than real-time communication only. Personally I believe that RTCDataChannel, which makes it possible to send arbitrary data between browsers, is the most disruptive feature WebRTC has to offer.

WebRTC explicitly does not handle the signaling messages that are needed to set up a peer-to-peer connection between two browsers.

Today I want to write about using WebRTC in combination with the WebAudio API. This article will not cover WebRTC basics like sending around the signalling; great tutorials on this already exist.

What is WebAudio

WebAudio is a JavaScript API for processing and synthesizing audio in web applications. It can be used for mixing, processing and filtering audio using only a web browser.

HTML5 Rocks has a nice introduction to the basics of WebAudio; please go there for more details on this API.

Tying the two together

Right at the center of the WebAudio API is the AudioContext object. The AudioContext is mostly used as a singleton for routing audio signals and representing audio objects. Below is a simple example of how to play an mp3 file using the WebAudio API:

var context = new AudioContext();

function handleFileSelect(event) {
  var file = event.target.files[0];

  if (file) {
    if (file.type.match('audio.*')) {
      var reader = new FileReader();
      reader.onload = (function(readEvent) {
        context.decodeAudioData(readEvent.target.result, function(buffer) {
          // create an audio source and connect it to the decoded file buffer
          var source = context.createBufferSource();
          source.buffer = buffer;
          // connect the audio source to the audio hardware
          source.connect(context.destination);
          // start playing
          source.start(0);
        });
      });

      reader.readAsArrayBuffer(file);
    }
  }
}

This code will make an mp3 file, selected using a file input element, play through the audio hardware of the host computer. To send the audio stream over a WebRTC connection we have to add a few extra lines of code:

var context = new AudioContext();
// create a peer connection
var pc = new RTCPeerConnection(pc_config);

function handleFileSelect(event) {
  var file = event.target.files[0];

  if (file) {
    if (file.type.match('audio.*')) {
      var reader = new FileReader();

      reader.onload = (function(readEvent) {
        context.decodeAudioData(readEvent.target.result, function(buffer) {
          // create an audio source and connect it to the decoded file buffer
          var source = context.createBufferSource();
          source.buffer = buffer;

          // connect the audio stream to the audio hardware
          source.connect(context.destination);

          // create a destination for the remote browser
          var remote = context.createMediaStreamDestination();

          // connect the source to the remote destination as well
          source.connect(remote);

          // add the stream to the peer connection
          pc.addStream(remote.stream);

          // create a SDP offer for the new stream
          pc.createOffer(setLocalAndSendMessage);

          // start playing
          source.start(0);
        });
      });

      reader.readAsArrayBuffer(file);
    }
  }
}

I’ve tried playing the audio on the receiving side using the WebAudio API as well, but this did not seem to work at the time of writing. So for now we’ll have to attach the incoming stream to an <audio/> element:

function gotRemoteStream(event) {
  // create a player; we could also get a reference to an existing player in the DOM
  var player = new Audio();
  // attach the media stream
  attachMediaStream(player, event.stream);
  // start playing
  player.play();
}

And we’re done.

Demo

Screenshot of the demo application

Building upon this example I’ve created a demo web application (requires Chrome) to show this functionality. The demo basically uses the example above, but it connects the audio stream for local playback to a GainNode object so the local audio volume can be controlled. The demo application allows you to play an mp3 file from a webpage and lets others listen in. On the listener page the music is played using an <audio/> element. The listener page also receives some id3 metadata over an RTCDataChannel connection. Finally, the listener page displays the bit rate of the music stream that is played.
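The bit-rate display mentioned above boils down to a byte-count delta over time. A minimal sketch (the function name and the numbers are mine, not from the demo's source):

```javascript
// Sketch: estimate a stream's bit rate from byte counts sampled some
// interval apart, roughly as the listener page might do it.
function bitRate(bytesNow, bytesBefore, intervalMs) {
  return ((bytesNow - bytesBefore) * 8) / (intervalMs / 1000); // bits per second
}

// e.g. 4000 bytes received over one second ≈ 32 kbit/s,
// about the rate observed in the demo
console.log(bitRate(12000, 8000, 1000)); // 32000
```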

The demo application has a few loose ends:

  • Currently it only works on Chrome (mainly because it plays mp3s)
  • On the receiving side it doesn’t use the WebAudio API for playing the audio
  • There is static on the line when the audio stream is paused. I haven’t been able to detect, on the receiving side, without signaling, whether a stream has been removed.
  • The audio quality is quite poor: the audio stream has a bit rate of around 32 kbit/s, and increasing the bandwidth constraints does not seem to influence the audio stream yet.

It is good to see that WebRTC and WebAudio play together quite nicely. In the future it will be possible to expand on this concept and create all kinds of cool applications, for example sending a voice message when a user isn’t available to accept a WebRTC call, or a DJ-ing web application that others can listen in on!


XMPP Service Discovery extensions for M2M and IoT

Within the Webinos project we have investigated whether we could use XMPP as the signalling protocol. Although the Webinos project decided not to use XMPP, we believe we’ve made some interesting observations and found solutions for some of the problems we encountered in our investigation.

We presented our findings in the Jabber dev-room at FOSDEM ’13 and got some interesting feedback from the gurus present.

In this blog post we want to share our insights on service discovery and invocation in XMPP, which we believe are very well applicable to machine-to-machine communications and the Internet of Things.

About Webinos

If you would like to know what Webinos is all about, I suggest you watch this short film that describes Webinos quite well. But to summarize: Webinos is about sharing device capabilities with applications running on different devices. For instance, one can have an application running on a thermostat at home that uses the GPS capabilities of a car to track whether your home should be heated or not.

One of the requirements is that a device can have multiple instances of the same capability, for example a smartphone has an internal GPS sensor but can also pair with a Bluetooth GPS sensor that has better accuracy, hence the GPS service would have 2 instances on this device.

XMPP to the rescue

XMPP has some great out-of-the-box features that we could use to discover services, for example XMPP Service Discovery (XEP-0030) and XMPP Entity Capabilities (XEP-0115). These mechanisms work as described in the call flow shown below:

[call flow diagram: capability discovery via presence]

Bob publishes his presence state to the server, and the server propagates it to all his subscribers. Encoded in his presence message is a hash based upon his device capabilities; it is unique for every state, so if a capability gets added or removed the hash will change. If a receiving node does not know what the hash means, that node can ask Bob’s device for more information, and can then cache that information using the hash it received earlier. Note that all of Bob’s own devices are also subscribed to his presence, so each of his devices will receive the presence updates of his other devices.

On the wire this looks as follows:

<presence from="someone@servicelab.org/mobile">
  <c xmlns="http://jabber.org/protocol/caps" 
      hash="sha-1"
      node="http://webinos.org/android"
      ver="QgayPKawpkPSDYmwT/WM94uAlu0="/>
</presence>

The ver attribute holds the hash that is generated from the capabilities of someone’s mobile device. The message is sent to the server and received by someone’s TV. The TV is unaware of the meaning of the hash and requests additional information:

<iq from="someone@servicelab.org/tv" 
    id="discodisco"
    to="someone@servicelab.org/mobile" 
    type="get">
  <query xmlns="http://jabber.org/protocol/disco#info"
      node="http://webinos.org/android#QgayPKawpkPSDYmwT/WM94uAlu0="/>
</iq>

This message tells someone’s mobile that someone’s TV wants to know what the hash code means. Someone’s mobile will answer the message:

<iq from="someone@servicelab.org/mobile" 
    id="discodisco"
    to="someone@servicelab.org/tv" 
    type="result">
  <query xmlns="http://jabber.org/protocol/disco#info"
      node="http://webinos.org/android#QgayPKawpkPSDYmwT/WM94uAlu0=">
    <identity category="client" name="Webinos for Android vX.X.X" type="mobile"/>

    <feature var="urn:services-webinos-org:geolocation"/>
    <feature var="urn:services-webinos-org:geolocation"/>

  </query>
</iq>

In this case the hash code means that there are 2 instances of the geolocation service available on someone’s mobile. But how can someone’s TV differentiate between the 2 services? Currently in XMPP it is possible to add additional information about a service to the message (XEP-0128), but there is no way to differentiate between multiple service instances within the same namespace. The nice thing about XMPP is that you can extend it… and that is exactly what we did:

<iq from="someone@servicelab.org/mobile" 
    id="discodisco"
    to="someone@servicelab.org/tv" 
    type="result">
  <query xmlns="http://jabber.org/protocol/disco#info"
      node="http://webinos.org/android#QgayPKawpkPSDYmwT/WM94uAlu0=">
    <identity category="client" name="Webinos for Android vX.X.X" type="mobile"/>

    <feature var="urn:services-webinos-org:geolocation">
      <!-- new stuff -->
      <instance xmlns="webinos:rpc#disco" id="gps1">
        <displayName>Internal GPS</displayName>
        <description>Internal GPS sensor</description>
        ...
      </instance>
      <!-- back to normal -->
    </feature>

    <feature var="urn:services-webinos-org:geolocation">
      <instance xmlns="webinos:rpc#disco" id="gps2">
        <displayName>Bluetooth GPS</displayName>
        <description>Bluetooth GPS sensor</description>
        ...
      </instance>
    </feature>

  </query>
</iq>

We added an instance element to each of the feature elements. The instance element holds the identifier for each service instance and some additional information about the service. The additional information can be extended with extra details, for example the accuracy of the GPS device.

Service invocation

Now that someone’s TV is aware of the location services on someone’s mobile, it can choose which of these service instances it wants to invoke.

Since Webinos is all about web technologies, we’ve chosen not to use XMPP RPC (XEP-0009) but to transport JSON-RPC payloads over XMPP.

<iq from="someone@servicelab.org/tv"
    id="rpc1"
    to="someone@servicelab.org/mobile"
    type="get">
  <query xmlns="urn:services-webinos-org:geolocation">
    <payload xmlns="webinos:rpc#invoke" id="gps1">
      { "method": "getCurrentPosition"
      , "params": {}
      , "id": "rpc1"
      }
    </payload>
  </query>
</iq>

The request is then answered by someone’s mobile:

<iq from="someone@servicelab.org/mobile"
    id="rpc1"
    to="someone@servicelab.org/tv"
    type="result">
  <query xmlns="urn:services-webinos-org:geolocation">  
    <payload xmlns="webinos:rpc#result" id="gps1">
        { "result":
          { "coords":
            { "latitude": 52.5
            , "longitude": 5.75
            }
          , "timestamp": 1332410024338
          }
        , "id": "rpc1"
        , "callbackId": "app1"
        }
    </payload>
  </query>
</iq>
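On the receiving side this boils down to routing on the instance id, executing the JSON-RPC call, and returning a JSON-RPC result. A hedged sketch (all names and the hard-coded coordinates are illustrative, not Webinos's actual implementation):

```javascript
// Sketch: dispatch a JSON-RPC payload received over XMPP to the right
// service instance. Illustrative only.
const instances = {
  gps1: {
    getCurrentPosition: () => ({
      coords: { latitude: 52.5, longitude: 5.75 },
      timestamp: Date.now()
    })
  }
};

function dispatch(instanceId, payloadJson) {
  const rpc = JSON.parse(payloadJson);
  const result = instances[instanceId][rpc.method](rpc.params);
  // echo the request id so the caller can correlate the response
  return JSON.stringify({ result: result, id: rpc.id });
}

const reply = dispatch('gps1',
  '{ "method": "getCurrentPosition", "params": {}, "id": "rpc1" }');
console.log(reply);
```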

Feedback

As mentioned above, we presented our idea in the Jabber dev-room at FOSDEM. The initial feedback we got from the XMPP gurus in the room was that the XMPP Service Discovery disco#info feature we extended is about getting information on what exactly a node is capable of. One could then use disco#items to discover the instances of a feature.

The problem we have with that approach is that the disco#items are not used to create the version hash as described in the Entity Capabilities XEP. So if a service instance goes away but another instance of the same service is still present, the presence information of that node would not change and other nodes would not be notified about the change. This could be a problem in a situation where someone’s TV has an accuracy requirement on the GPS service instance it wants to use. An example:

  • Someone’s mobile has one location service available. It has an accuracy of 30 meters.
  • To notify other devices about this feature it sends out a presence update with version hash 123.
  • Someone’s TV gets the details that belong to version hash 123 and uses disco#items to query for the instances of the location service. The single instance that is present is not accurate enough for someone’s TV, as the application running there needs an accuracy of 10 meters.
  • Someone pairs a Bluetooth GPS device with an accuracy of 10 meters with the mobile. The disco#items of someone’s mobile change, but the feature list stays the same, so the version hash will stay 123 and the presence state will not be updated.
  • Someone’s TV will never become aware of the fact that its service requirements are now met, because someone’s mobile’s feature set is still the same.

Conclusion

The approach we’ve chosen for service discovery proved to work quite well for our requirements, but XMPP’s flexibility makes other solutions to the same problem viable as well.

Posted in domotica, internet of things, machine to machine, Networking, programming, protocols, telecommunications

Installing a Storm cluster on CentOS hosts

Storm is a distributed, realtime computation system to reliably process unbounded streams of data. The following picture shows how data is processed in Storm:

[diagram: how data is processed in Storm]

This tutorial will show you how to install Storm on a cluster of CentOS hosts. A Storm cluster contains the following components:

[diagram: Storm cluster components]

Nimbus is the name of the master node. Nimbus is responsible for distributing code around the cluster, assigning tasks to machines, and monitoring for failures. The nodes that perform the work each contain a supervisor, and each supervisor is in control of one or more workers on that node. ZooKeeper is used for coordination between Nimbus and the supervisors.

All nodes

We start by disabling SELinux and iptables on every host. This is a bad idea if you are running your cluster on publicly accessible machines, but it makes it a lot easier to debug network problems. SELinux is enabled by default on CentOS. To disable it, we need to edit /etc/selinux/config:

SELINUX=disabled

We need to reboot the machine for this to take effect.

The firewall has some default rules we want to get rid of:

iptables --flush
iptables --table nat --flush
iptables --delete-chain
iptables --table nat --delete-chain
/etc/init.d/iptables save

Storm and ZooKeeper are both fail-fast systems, which means that a Storm or ZooKeeper process will kill itself as soon as an error is detected. It is therefore necessary to put the Storm and ZooKeeper processes under supervision. This will make sure that each process is restarted when needed. For supervision we will use supervisord. Installation is performed like this:

rpm -Uvh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm
yum install supervisor

ZooKeeper node

We will now create a single ZooKeeper node. Take a look at the ZooKeeper documentation to install a cluster.

yum -y install java-1.7.0-openjdk-devel wget
cd /opt
wget http://apache.xl-mirror.nl/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz
tar zxvf zookeeper-3.4.5.tar.gz
mkdir /var/zookeeper
cp zookeeper-3.4.5/conf/zoo_sample.cfg zookeeper-3.4.5/conf/zoo.cfg

Now edit the zookeeper-3.4.5/conf/zoo.cfg file:

dataDir=/var/zookeeper

Edit the /etc/supervisord.conf file and add a section about ZooKeeper to it:

[program:zookeeper]
command=/opt/zookeeper-3.4.5/bin/zkServer.sh start-foreground
autostart=true
autorestart=true
startsecs=1
startretries=999
redirect_stderr=false
stdout_logfile=/var/log/zookeeper-out
stdout_logfile_maxbytes=10MB
stdout_logfile_backups=10
stdout_events_enabled=true
stderr_logfile=/var/log/zookeeper-err
stderr_logfile_maxbytes=100MB
stderr_logfile_backups=10
stderr_events_enabled=true

Start the supervision and thereby the ZooKeeper service:

chkconfig supervisord on
service supervisord start

Running the supervisorctl command should result in something like this:

zookeeper      RUNNING    pid 1115, uptime 1 day, 0:07:33

Nimbus and Supervisor nodes

Every Storm node has a set of dependencies that need to be satisfied. We start with ZeroMQ and JZMQ:

yum -y install gcc gcc-c++ libuuid-devel make wget
cd /opt
wget http://download.zeromq.org/zeromq-2.2.0.tar.gz
tar zxvf zeromq-2.2.0.tar.gz
cd zeromq-2.2.0
./configure
make install
ldconfig

yum install java-1.7.0-openjdk-devel unzip libtool
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.9.x86_64
cd /opt
wget https://github.com/nathanmarz/jzmq/archive/master.zip
mv master master.zip
unzip master.zip
cd jzmq-master
./autogen.sh
./configure
make install

Then we move onto Storm itself:

cd /opt
wget https://github.com/downloads/nathanmarz/storm/storm-0.8.1.zip
unzip storm-0.8.1.zip
mkdir /var/storm

Now edit the storm-0.8.1/conf/storm.yaml file, replacing the IP addresses as needed:

storm.zookeeper.servers:
 - "10.20.30.40"
nimbus.host: "10.20.30.41"
storm.local.dir: "/var/storm"

Finally we edit the supervision configuration file /etc/supervisord.conf:

[program:storm_nimbus]
command=/opt/storm-0.8.1/bin/storm nimbus
autostart=true
autorestart=true
startsecs=1
startretries=999
redirect_stderr=false
stdout_logfile=/var/log/storm-nimbus-out
stdout_logfile_maxbytes=10MB
stdout_logfile_backups=10
stdout_events_enabled=true
stderr_logfile=/var/log/storm-nimbus-err
stderr_logfile_maxbytes=100MB
stderr_logfile_backups=10
stderr_events_enabled=true

[program:storm_ui]
command=/opt/storm-0.8.1/bin/storm ui
autostart=true
autorestart=true
startsecs=1
startretries=999
redirect_stderr=false
stdout_logfile=/var/log/storm-ui-out
stdout_logfile_maxbytes=10MB
stdout_logfile_backups=10
stdout_events_enabled=true
stderr_logfile=/var/log/storm-ui-err
stderr_logfile_maxbytes=100MB
stderr_logfile_backups=10
stderr_events_enabled=true

And start the supervision:

chkconfig supervisord on
service supervisord start

Running the supervisorctl command should result in something like this:

storm_nimbus   RUNNING    pid 1119, uptime 1 day, 0:20:14
storm_ui       RUNNING    pid 1121, uptime 1 day, 0:20:14

The Storm UI should now be accessible. Point a web browser at port 8080 on the Nimbus host, and you should get something like this:

[screenshot: the Storm UI]

Note that the screenshot also shows an active topology, which will not be available if you just followed the steps in this tutorial and haven’t deployed a topology to the cluster yet.

Posted in Cloud computing, programming