Installing a Storm cluster on CentOS hosts

Storm is a distributed, realtime computation system to reliably process unbounded streams of data. The following picture shows how data is processed in Storm:

[Image: storm-processing]

This tutorial will show you how to install Storm on a cluster of CentOS hosts. A Storm cluster contains the following components:

[Image: storm-cluster]

Nimbus is the name for the master node. Nimbus is responsible for distributing code around the cluster, assigning tasks to machines, and monitoring for failures. The nodes that perform the work contain a supervisor and each supervisor is in control of one or more workers on that node. ZooKeeper is used for coordination between nimbus and the supervisors.

All nodes

We start with disabling SELinux and iptables on every host. This is a bad idea if you are running your cluster on publicly accessible machines, but makes it a lot easier to debug network problems. SELinux is enabled by default on CentOS. To disable it, we need to edit /etc/selinux/config:

SELINUX=disabled

We need to reboot the machine for this to take effect.

The firewall has some default rules we want to get rid of:

iptables --flush
iptables --table nat --flush
iptables --delete-chain
iptables --table nat --delete-chain
/etc/init.d/iptables save

Storm and ZooKeeper are both fail-fast systems, which means that a Storm or ZooKeeper process will kill itself as soon as an error is detected. It is therefore necessary to put the Storm and ZooKeeper processes under supervision. This will make sure that each process is restarted when needed. For supervision we will use supervisord. Installation is performed like this:

rpm -Uvh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm
yum install supervisor

ZooKeeper node

We will now create a single ZooKeeper node. Take a look at the ZooKeeper documentation to install a cluster.

yum -y install java-1.7.0-openjdk-devel wget
cd /opt
wget http://apache.xl-mirror.nl/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz
tar zxvf zookeeper-3.4.5.tar.gz
mkdir /var/zookeeper
cp zookeeper-3.4.5/conf/zoo_sample.cfg zookeeper-3.4.5/conf/zoo.cfg

Now edit the zookeeper-3.4.5/conf/zoo.cfg file:

dataDir=/var/zookeeper

Edit the /etc/supervisord.conf file and add a section about ZooKeeper to it:

[program:zookeeper]
command=/opt/zookeeper-3.4.5/bin/zkServer.sh start-foreground
autostart=true
autorestart=true
startsecs=1
startretries=999
redirect_stderr=false
stdout_logfile=/var/log/zookeeper-out
stdout_logfile_maxbytes=10MB
stdout_logfile_backups=10
stdout_events_enabled=true
stderr_logfile=/var/log/zookeeper-err
stderr_logfile_maxbytes=100MB
stderr_logfile_backups=10
stderr_events_enabled=true

Start the supervision and thereby the ZooKeeper service:

chkconfig supervisord on
service supervisord start

Running the supervisorctl command should result in something like this:

zookeeper      RUNNING    pid 1115, uptime 1 day, 0:07:33

Nimbus and Supervisor nodes

Every Storm node has a set of dependencies that need to be satisfied. We start with ZeroMQ and JZMQ:

yum -y install gcc gcc-c++ libuuid-devel make wget
cd /opt
wget http://download.zeromq.org/zeromq-2.2.0.tar.gz
tar zxvf zeromq-2.2.0.tar.gz
cd zeromq-2.2.0
./configure
make install
ldconfig

yum install java-1.7.0-openjdk-devel unzip libtool
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.9.x86_64
cd /opt
wget https://github.com/nathanmarz/jzmq/archive/master.zip
mv master master.zip
unzip master.zip
cd jzmq-master
./autogen.sh
./configure
make install

Then we move onto Storm itself:

cd /opt
wget https://github.com/downloads/nathanmarz/storm/storm-0.8.1.zip
unzip storm-0.8.1.zip
mkdir /var/storm

Now edit the storm-0.8.1/conf/storm.yaml file, replacing the IP addresses as needed:

storm.zookeeper.servers:
 - "10.20.30.40"
nimbus.host: "10.20.30.41"
storm.local.dir: "/var/storm"

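Finally we edit the supervision configuration file /etc/supervisord.conf. The sections below are for the Nimbus host; on the worker nodes, use the same layout with the command /opt/storm-0.8.1/bin/storm supervisor: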

[program:storm_nimbus]
command=/opt/storm-0.8.1/bin/storm nimbus
autostart=true
autorestart=true
startsecs=1
startretries=999
redirect_stderr=false
stdout_logfile=/var/log/storm-nimbus-out
stdout_logfile_maxbytes=10MB
stdout_logfile_backups=10
stdout_events_enabled=true
stderr_logfile=/var/log/storm-nimbus-err
stderr_logfile_maxbytes=100MB
stderr_logfile_backups=10
stderr_events_enabled=true

[program:storm_ui]
command=/opt/storm-0.8.1/bin/storm ui
autostart=true
autorestart=true
startsecs=1
startretries=999
redirect_stderr=false
stdout_logfile=/var/log/storm-ui-out
stdout_logfile_maxbytes=10MB
stdout_logfile_backups=10
stdout_events_enabled=true
stderr_logfile=/var/log/storm-ui-err
stderr_logfile_maxbytes=100MB
stderr_logfile_backups=10
stderr_events_enabled=true
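
The supervisord sections in this tutorial differ only in the program name, the command and the log file prefix. As a rough sketch, a small Node.js helper could generate them. The function name supervisorSection and the log prefix for the worker node are hypothetical and not part of Storm or supervisord:

```javascript
// Hypothetical helper: renders one supervisord [program:x] section using the
// same settings as the hand-written sections in this tutorial.
function supervisorSection(name, command, logPrefix) {
  return [
    '[program:' + name + ']',
    'command=' + command,
    'autostart=true',
    'autorestart=true',
    'startsecs=1',
    'startretries=999',
    'redirect_stderr=false',
    'stdout_logfile=' + logPrefix + '-out',
    'stdout_logfile_maxbytes=10MB',
    'stdout_logfile_backups=10',
    'stdout_events_enabled=true',
    'stderr_logfile=' + logPrefix + '-err',
    'stderr_logfile_maxbytes=100MB',
    'stderr_logfile_backups=10',
    'stderr_events_enabled=true'
  ].join('\n') + '\n';
}

// Example: a section for a worker node. The log file prefix is an assumption
// that follows the naming pattern used for nimbus and ui.
var section = supervisorSection(
  'storm_supervisor',
  '/opt/storm-0.8.1/bin/storm supervisor',
  '/var/log/storm-supervisor'
);
console.log(section);
```

The printed text can be appended to /etc/supervisord.conf on each worker node.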

And start the supervision:

chkconfig supervisord on
service supervisord start

Running the supervisorctl command should result in something like this:

storm_nimbus   RUNNING    pid 1119, uptime 1 day, 0:20:14
storm_ui       RUNNING    pid 1121, uptime 1 day, 0:20:14

The Storm UI should now be accessible. Point a web browser at port 8080 on the Nimbus host, and you should get something like this:

[Image: storm-ui]

Note that the screenshot also shows an active topology, which will not be available if you just followed the steps in this tutorial and haven’t deployed a topology to the cluster yet.

Posted in Cloud computing, programming

Installing Apache Libcloud on CentOS

Apache Libcloud is a standard Python library that abstracts away differences among multiple cloud provider APIs. At the moment it can be used to manage four different kinds of cloud services: servers, storage, loadbalancers and DNS. Here are the steps to install Libcloud on a machine running CentOS 6:

rpm -Uvh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm
yum install python-pip
pip-python install apache-libcloud

The first step installs the Extra Packages for Enterprise Linux (EPEL) repository. This repository contains the pip command, which is a package manager for Python. If you want to deploy nodes on the different clouds, you need an additional package:

yum install gcc python-devel
pip-python install paramiko

The paramiko package adds SSH capabilities to Python and allows the Libcloud library to SSH into your nodes and perform initial configuration.

Posted in Cloud computing, programming

Building applications for health devices with Antidote and NodeJS

In this post I will explain how to build health apps that use Continua Health Alliance certified health devices with NodeJS. To communicate with these devices I am using the D-Bus health service from the open source Antidote IEEE 11073 stack library. Signove, the authors of the Antidote library, did excellent work creating an open source software stack that can be used to develop health applications. They provided good developer documentation (PDF) that helped me a lot to get things working.

Please note that the IEEE 11073 specification is not an open specification. You can purchase the specification at IEEE. Without this information it is difficult to build an application, since all of the device and attribute definitions are defined in this specification. Searching the web may, or may not, help you to get along without the specifications…

Setup

Roughly my setup breaks down into 3 components:

[Image: Overview]

The Antidote Health Service handles all communications with the health devices. I’ve only managed to get Antidote running on Linux, but it supports more platforms. Please refer to their documentation for more info on this.

My health application itself is written for NodeJS and uses node-dbus to communicate with the Antidote Health Service. There are a couple of D-Bus modules available for NodeJS, but node-dbus was the only one that worked for me. I did not have prior experience with D-Bus programming, and not all of the examples included with node-dbus made sense to me. I spent quite some time figuring out how to communicate with the Health Service via D-Bus. The Python example included in the Antidote software helped me out quite a lot.

As for the health device: I tested my setup using a Continua Certified Omron body composition monitor (or, if you prefer, a weighing scale…).

Tying it together

To be able to use the health service, an object should be registered on the dbus that will listen to messages of the health service. The code fragments below show how this can be done.

First, require the dependencies and make some definitions.

var dbus = require("dbus");
var xpath = require('xpath');
var dom = require('xmldom').DOMParser;

// data type of a body weight scale device
var BODY_WEIGHT_SCALE = 0x100f;

// metric id for body mass measurement
var MDC_MASS_BODY_ACTUAL = 57664;

As mentioned above, node-dbus is used to communicate with the Health Service. The xpath and xmldom modules are used to parse the information that is received from the weighing scale. The weighing scale’s data type is identified by the integer stored in BODY_WEIGHT_SCALE. The information from a measurement event is received as an XML document. Within the XML document the body mass (weight) is identified by the value of MDC_MASS_BODY_ACTUAL.

The following code fragment shows how to connect to the system bus and obtain the manager interface of the health service.

dbus.start(function() {
  var bus = dbus.system_bus();
  var manager;

  try {
    manager = dbus.get_interface(
      bus,
      "com.signove.health",
      "/com/signove/health",
      "com.signove.health.manager"
    );
  } catch (err) {
    console.log('Is the healthd process running?');
    process.exit(1);
  }

When the reference to the manager interface of the health manager is made, the health application can register itself as a listener to the health service. The code below shows how this is done.

First we get a reference to the dbus registration mechanism and request a name on the dbus, in this case org.servicelab.healthapp. Then a name is created for the object we are going to register. To make the name unique, the process id of the currently running process is appended.

Then the methods that will be listened to are defined in the Methods object. All messages that are received from the health service will generate callbacks to functions that are defined in this object. An example of the Methods object will be given below.

The Methods object is registered at the dbus using the objectName. The Methods object will implement the com.signove.health.agent interface.

  var register = new dbus.DBusRegister(dbus, bus);
  dbus.requestName(bus, 'org.servicelab.healthapp');
  var objectName = '/org/servicelab/healthapp/' + process.pid;

  var Methods = { /* ... */ };

  register.addMethods(
    objectName,
    'com.signove.health.agent',
    Methods
  );

  manager.ConfigurePassive(objectName, [BODY_WEIGHT_SCALE]);

This concludes the registration of the listener. Only the Methods object needs to be implemented to get things working. The interface of this object is documented in Antidote’s documentation. Not all methods are implemented and the Continua device that I used did not support all features either. The code example below shows how to request the device attributes of the device that is connecting and how to get the measured weight from the measurement data.

var Methods = {
  Connected: function(device, address) { },
  Associated: function (device, xmldata) {
    device = dbus.get_interface(
      bus,
      'com.signove.health',
      device,
      'com.signove.health.device'
    );
    device.RequestDeviceAttributes();
  },
  MeasurementData: function(device, xmldata) {
    var doc = new dom().parseFromString(xmldata);
    var weight = parseFloat(
      xpath.select("//meta-data[meta='" + MDC_MASS_BODY_ACTUAL + "']/../simple/value/text()", doc)
    );
    console.log('Measured weight is: ' + weight);
  },
  DeviceAttributes: function(device, xmldata) {
    console.log(xmldata);
  },
  Disassociated: function(device) { },
  Disconnected: function(device) { },
  PMStoreData: function(device, handle, xmldata) { },
  SegmentInfo: function (device, handle, xmldata) { },
  SegmentDataResponse: function(device, handle, segment, response) { },
  SegmentData: function(device, handle, segment, xmldata) { },
  SegmentCleared: function(device, handle, segment, xmldata) { }
};

When the device becomes associated with the health service, the Associated function gets called. Within this function the device can be queried for its attributes. The device will answer this request using the DeviceAttributes function. The device attributes are passed to this function as an XML document.

Measurement data will be delivered to the application via the MeasurementData function. The measurement data is in XML format. The example above shows how to get the weight from a measurement using an XPath query.
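
If the XPath query matches nothing, parseFloat will produce NaN. A slightly more defensive variant could separate building the XPath expression from converting the result. This is only a sketch: the helper names metricXPath and firstNumber are hypothetical, and it assumes that xpath.select() returns an array of DOM text nodes for a text() query. A plain object stands in for a text node here so the fragment runs without the xpath and xmldom modules.

```javascript
// Builds the XPath expression used above for a given metric id.
function metricXPath(metricId) {
  return "//meta-data[meta='" + metricId + "']/../simple/value/text()";
}

// Converts the result of an XPath query to a number, or null if there is
// no usable value. Assumption: `nodes` is an array of DOM text nodes.
function firstNumber(nodes) {
  if (!nodes || nodes.length === 0) {
    return null;
  }
  var value = parseFloat(nodes[0].nodeValue);
  return isNaN(value) ? null : value;
}

// Stand-in text node for demonstration; in the real handler this would be
// xpath.select(metricXPath(MDC_MASS_BODY_ACTUAL), doc).
var weight = firstNumber([{ nodeValue: '72.5' }]);
var missing = firstNumber([]);
console.log(weight, missing);
```

Inside MeasurementData you could then skip the log line when the result is null.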

Gist

The source code is also available as a gist.

This configuration is used in a demonstrator of the Figaro project, which demonstrates how IP-based and non-IP based home networks can be converged (PDF). This demonstrator will be shown at the IEEE booth at CES this coming January.

Posted in programming

Wirelessly control an Arduino with NodeJS over Bluetooth

I wanted to control my Arduino via Bluetooth using NodeJS but I could not find a Node module to do it. That is why I decided to build my own. This post describes how to use it.

Arduino setup

First, let’s take a look at the Arduino setup I am using. It is a simple Arduino Uno with breadboard. For Bluetooth connectivity I’ve added a Bluetooth shield. For testing purposes I’ve configured a simple layout on the breadboard that allows me to control a LED. The picture below shows the configuration.

[Image: arduinosetup]

I wrote a simple sketch to control the LED. The program changes the status of the LED according to the value that is read from the serial Bluetooth connection. It also allows reading the current state of the LED.

Bluetooth-serial-port

On the NodeJS side I have created a module that allows a script to communicate via a Bluetooth serial connection. The module can be used to communicate via Bluetooth as well as to search for Bluetooth devices and serial port channels.

The module supports Linux, Mac OS X and Windows (thanks Elmar!).

The module is available on npm and can be installed by issuing:

$ npm install bluetooth-serial-port

Using the module

To use the module you’ll have to import it into your script. Below is a simple example program that controls the Arduino configuration described above.

var BTSP = require('bluetooth-serial-port');
var serial = new BTSP.BluetoothSerialPort();

serial.on('found', function(address, name) {

    // you might want to check the found address with the address of your
    // bluetooth enabled Arduino device here.

    serial.findSerialPortChannel(address, function(channel) {
        // connect to the device that was just found on the discovered channel
        serial.connect(address, channel, function() {
            console.log('connected');
            process.stdin.resume();
            process.stdin.setEncoding('utf8');
            console.log('Press "1" or "0" and "ENTER" to turn on or off the light.');

            process.stdin.on('data', function (data) {
                serial.write(data);
            });

            serial.on('data', function(data) {
                console.log('Received: ' + data);
            });
        }, function () {
            console.log('cannot connect');
        });
    });
});

serial.inquire();
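
The example above forwards whatever is typed on stdin, including the trailing newline. If you would rather send only valid commands, the input could be filtered first. This is a sketch under the assumption that the Arduino program only understands the characters '1' and '0'; the helper name toLedCommand is hypothetical.

```javascript
// Maps a line typed on stdin to the single character sent to the Arduino.
// Assumption: the Arduino program only understands '1' (on) and '0' (off).
function toLedCommand(input) {
  var trimmed = String(input).trim();
  return (trimmed === '1' || trimmed === '0') ? trimmed : null;
}

console.log(toLedCommand('1\n'));   // prints 1
console.log(toLedCommand('on\n'));  // prints null
```

In the stdin handler you would then call serial.write only when the result is not null.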

Open issue

Currently the module works quite well. The only thing not working is when a script wants to reconnect the Bluetooth connection.

When a connection is ended, for example when the Arduino is switched off, and the script starts a new Bluetooth inquiry, the module will find the Bluetooth serial channel again but does not connect to it.

My current workaround for this issue is to terminate my script when a connection has ended and then restart the script. To achieve this I’m using forever.
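
The terminate-and-restart approach could look roughly like the sketch below. It assumes that the port object emits a 'closed' event when the link drops, which may depend on the version of the module you have installed. The helper name exitOnClose is hypothetical, and a plain EventEmitter stands in for the serial port so the fragment runs on its own.

```javascript
var EventEmitter = require('events').EventEmitter;

// Registers a handler that ends the process when the connection closes,
// so that a process monitor such as forever can start a fresh instance.
// Assumption: the port object emits a 'closed' event when the link drops.
function exitOnClose(port, exit) {
  port.on('closed', function () {
    console.log('connection closed, exiting so the script can be restarted');
    exit(1);
  });
}

// Demonstration with a plain EventEmitter standing in for the serial port
// and a stub that records the exit code instead of calling process.exit.
var fakePort = new EventEmitter();
var exitCodes = [];
exitOnClose(fakePort, function (code) { exitCodes.push(code); });
fakePort.emit('closed');
```

In the real script you would call exitOnClose(serial, process.exit) and run the script under forever so that it is started again after each exit.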

For example…

I hope this post helps you to build cool stuff using NodeJS and Bluetooth. I’m curious about the applications you’ll come up with. Please drop me a note ;-)

I’ve used the above configuration to make a UPnP controllable Bluetooth lightbulb prototype. For the UPnP side of the prototype I used the upnp-device module. The prototype will be part of the Figaro demonstrator that will demonstrate how IP-based and non-IP based home networks can be converged (PDF). This demonstrator will be shown at the IEEE booth at CES this coming January.

All sources from this post are available as a gist.

Happy programming!

Posted in programming

Creating a Java WebStart (JNLP) application

Java WebStart, which is built on the Java Network Launching Protocol (JNLP), allows you to launch Java applications directly from the internet using a web browser. In this article we will create a simple application and all configuration files necessary to launch it through Java WebStart.

We start with a simple Java program with a GUI:

package nl.jansipke.samplegui;

import javax.swing.JFrame;
import javax.swing.SwingUtilities;

public class SampleGUI extends JFrame {

    private static final long serialVersionUID = 522159447010444143L;

    public SampleGUI() {
        setTitle("Sample GUI");
        setSize(300, 200);
        setLocationRelativeTo(null);
        setDefaultCloseOperation(EXIT_ON_CLOSE);
    }

    public static void main(String[] args) {
        SwingUtilities.invokeLater(new Runnable() {
            public void run() {
                SampleGUI sampleGUI = new SampleGUI();
                sampleGUI.setVisible(true);
            }
        });
    }
}

Java WebStart applications run in a sandbox with very limited default capabilities. The application can ask for extra permissions in the JNLP file, as can be seen in the following file.

<?xml version="1.0" encoding="utf-8"?>
<jnlp spec="1.0+" codebase="http://www.jansipke.nl" href="SampleGUI.jnlp">

    <information>
        <title>SampleGUI</title>
        <vendor>Some Vendor Name</vendor>
        <homepage href="http://www.jansipke.nl"/>
        <description>SampleGUI description</description>
    </information>

    <security>
        <all-permissions/>
    </security>

    <resources>
        <j2se version="1.6+"/>
        <jar href="SampleGUI.jar"/>
    </resources>

    <application-desc main-class="nl.jansipke.samplegui.SampleGUI"/>

</jnlp>

The JAR file needs to be signed for these permissions to take effect. We need to create a keystore file for that first. If needed you may change the alias and the keystore file name.

keytool -genkey -alias alias -keystore keystore.bin

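Answer the questions the keytool command asks and copy the file into a directory where the following ANT build script can find it. The alias, keystore file name and keystore password must match the alias, keystore and storepass attributes of the signjar task: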

<?xml version="1.0" encoding="UTF-8"?>
<project name="samplegui" basedir=".">

    <property name="dir.build" value="bin" />
    <property name="dir.dist" value="dist" />
    <property name="dir.src" value="src" />
    <property name="file.jar" value="SampleGUI.jar" />

    <path id="compile.classpath">
        <fileset dir=".">
            <include name="lib/*.jar" />
        </fileset>
    </path>

    <target name="clean" description="Clean project">
        <delete dir="${dir.build}" />
    </target>

    <target name="prepare" description="Prepare project">
        <mkdir dir="${dir.build}" />
    </target>

    <target name="compile" description="Compile project" depends="prepare">
        <javac destdir="${dir.build}" classpathref="compile.classpath" debug="true" includeantruntime="false">
            <src path="${dir.src}" />
        </javac>
    </target>

    <target name="jar" description="Build jar file" depends="compile">
        <mkdir dir="${dir.dist}" />
        <jar destfile="${dir.dist}/${file.jar}" basedir="${dir.build}">
            <manifest>
                <attribute name="Main-Class" value="nl.jansipke.samplegui.SampleGUI"/>
            </manifest>
        </jar>
    </target>

    <target name="signjar" description="Sign jar file" depends="jar">
        <signjar jar="${dir.dist}/${file.jar}" alias="alias" storepass="secret" keystore="keystore.bin"/>
    </target>

</project>

Now run the ANT script (target signjar) and copy the resulting JAR file and the JNLP file to a directory on your web server. Fire up a web browser and point it to the JNLP file. If all goes well, it will present you with a warning about permissions. Accept it and the application will start.

Posted in programming

Creating network diagrams with D3.js

D3.js is a JavaScript library for manipulating documents based on data. It can be used for all sorts of visualizations including network diagrams. In this article we will create a network diagram with nodes and directed links between them, visualized by circles and lines with arrowheads. We start with the file index.html that holds the HTML and basic SVG structure:

<!DOCTYPE html>
<html>
    <head>
        <meta http-equiv="content-type" content="text/html;charset=utf-8">
        <title>Cloud</title>
        <script type="text/javascript" src="d3.v2.js"></script>
    </head>
    <body>
        <svg id="cloud" width="800" height="600">
            <defs>
                <marker id="arrow" viewBox="0 -5 10 10" refX="18" refY="0"
                        markerWidth="6" markerHeight="6" orient="auto">
                    <path d="M0,-5L10,0L0,5Z"/>
                </marker>
           </defs>
        </svg>
        <link href="cloud.css" rel="stylesheet" type="text/css" />
        <script src="cloud.js" type="text/javascript"></script>
    </body>
</html>

The file cloud.js contains the JavaScript code that generates the SVG elements from some JSON content:

// match the width and height of the <svg> element in index.html
var width = 800;
var height = 600;

var color = d3.scale.category10();

var force = d3.layout.force()
    .charge(-180)
    .linkDistance(70)
    .size([width, height]);

var svg = d3.select("#cloud");

d3.json("cloud.json", function(json) {
    force
        .nodes(json.nodes)
        .links(json.links)
        .start();

    var links = svg.append("g").selectAll("line.link")
        .data(force.links())
        .enter().append("line")
        .attr("class", "link")
        .attr("marker-end", "url(#arrow)");

    var nodes = svg.selectAll("circle.node")
        .data(force.nodes())
        .enter().append("circle")
        .attr("class", "node")
        .attr("r", 8)
        .style("fill", function(d) { return color(d.group); })
        .call(force.drag);

    nodes.append("title")
        .text(function(d) { return d.name; });

    force.on("tick", function() {
        links.attr("x1", function(d) { return d.source.x; })
            .attr("y1", function(d) { return d.source.y; })
            .attr("x2", function(d) { return d.target.x; })
            .attr("y2", function(d) { return d.target.y; });

        nodes.attr("cx", function(d) { return d.x; })
            .attr("cy", function(d) { return d.y; });
    });
});

The file cloud.json contains the JSON that the JavaScript uses to create the SVG:

{
    "nodes":
        [
            {"name":"Client 1",       "group":1},
            {"name":"Loadbalancer 1", "group":2},
            {"name":"Webserver 1",    "group":3},
            {"name":"Webserver 2",    "group":3}
        ],
    "links":
        [
            {"source":0, "target":1, "value":1},
            {"source":1, "target":2, "value":1},
            {"source":1, "target":3, "value":1}
        ]
}
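
In this file the source and target values of each link are indices into the nodes array, so a link that points past the end of the array will break the layout. As a sketch, a small check like the following could be run on the JSON before handing it to the force layout. The function name findBrokenLinks is hypothetical and not part of D3.

```javascript
// Returns the links whose source or target is not a valid index into nodes.
function findBrokenLinks(graph) {
  var count = graph.nodes.length;
  function valid(i) {
    return typeof i === 'number' && i % 1 === 0 && i >= 0 && i < count;
  }
  return graph.links.filter(function (link) {
    return !valid(link.source) || !valid(link.target);
  });
}

var graph = {
  nodes: [
    { name: 'Client 1', group: 1 },
    { name: 'Loadbalancer 1', group: 2 }
  ],
  links: [
    { source: 0, target: 1, value: 1 },
    { source: 1, target: 2, value: 1 } // target 2 does not exist
  ]
};
var broken = findBrokenLinks(graph);
console.log(broken.length);
```

The check has to run before force.start() is called, because the layout replaces the numeric indices with references to the node objects.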

The final file cloud.css contains the CSS to make things prettier:

circle.node {
    stroke: #fff;
    stroke-width: 3px;
}

line.link {
    stroke-width: 2px;
    stroke: #999;
    stroke-opacity: 0.6;
}

marker#arrow {
    stroke: #999;
    fill: #999;
}
Posted in programming

Welcome to Framework Limbo! Using Eclipse, Maven, GWT and Lombok

One of the main advantages of service engineering in Java is that there are quite a lot of environments, libraries, frameworks, IDEs and plugins to choose from. That’s also a major drawback. We ran into this while doing a recent prototyping project, and once again discovered the hard way that there is a certain threshold before you run into framework limbo, dependency hell and assorted related afflictions.

There is a saying in Dutch about donkeys and bumping into the same stone twice. Our new general rule of thumb is that you should never use more than about two (plus or minus one) environments, libraries, frameworks, IDEs and/or plugins at once. Ever. You’ll save yourself from being a donkey by using only the stuff you really need. So read on if you want to risk countless hours of frustration with Eclipse, Maven, Google Web Toolkit and Lombok.

Disclaimer: we have switched to using another GUI framework which is more suited to our needs.


Posted in programming