Making your own smart ‘machine learning’ thermostat using Arduino, AWS, HBase, Spark, Raspberry PI and XBee

Previous part:
2. Reading data from a ‘dumb’ thermostat and various temperature sensors (Arduino)

3. Sending data, at 1,000 values per second, to a Raspberry PI (Python)

3.1 1,000 values per second

When I started the project I had no idea at what resolution I had to collect data in order to analyze the workings of a normal thermostat. In my job I had worked on a project that involved sensor data from industrial equipment installed in a natural gas field. In that project, collecting data from analogue sensors was a problem because of the large number of values per second that had to be stored and processed. So for my own thermostat project I wanted to collect sensor information at the sub-second level as well.

3.2 In-memory database on a Raspberry PI

I set my goal at collecting one value every millisecond. In this scenario the Arduino would send 1,000 values per second to the Raspberry PI. That meant the Raspberry PI would have to be able to ingest and process values at least at that rate.

For storing data on the Raspberry PI I wanted to use a database, mostly because I am familiar with databases and because they make averaging data very simple. I was very disappointed when I read the performance reviews of common database systems like MySQL or PostgreSQL on the Raspberry PI. My initial experimentation with SQLite showed that it would not meet the performance requirements either. That changed when I stored the SQLite database in memory. This more than allowed for ingesting and processing the sensor values at 1,000 per second. I wanted to use a cloud server for permanent storage, so the Raspberry PI did not have to store the data for a long period of time.

Storing the sensor data in an in-memory database means that several tasks have to access that same in-memory database. The only method I got working (in Python) was a single multi-threaded Python script. All threads have to be started from that one script to be able to access the same database. I used Roger Binns’ Another Python SQLite Wrapper (APSW) to support multithreading.
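The idea of several threads sharing one in-memory database can be sketched as follows. This is a minimal illustration, not the script used in this project: it swaps APSW for Python's standard `sqlite3` module, and the table name `sensor_value`, its columns, and the `writer` helper are assumptions made for the example.

```python
import sqlite3
import threading

# A ":memory:" database exists only inside the connection that created it,
# so every thread has to reuse this one connection object.
conn = sqlite3.connect(":memory:", check_same_thread=False)
db_lock = threading.Lock()  # serialise access from the worker threads

# Hypothetical table, for illustration only.
conn.execute("CREATE TABLE sensor_value (sensor_id TEXT, value INTEGER)")


def writer(sensor_id, values):
    """Insert a batch of readings for one sensor."""
    for v in values:
        with db_lock:
            conn.execute(
                "INSERT INTO sensor_value (sensor_id, value) VALUES (?, ?)",
                (sensor_id, v),
            )


def count_rows():
    """Count all stored readings through the shared connection."""
    with db_lock:
        return conn.execute("SELECT COUNT(*) FROM sensor_value").fetchone()[0]


# Two writer threads started from the same script.
threads = [
    threading.Thread(target=writer, args=("A01", range(500))),
    threading.Thread(target=writer, args=("A02", range(500))),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

total_rows = count_rows()
print(total_rows)
```

Because the database disappears when the connection closes, starting a second Python script would give that script its own empty database, which is why everything has to live in one process.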

3.3 Data model for sensor information in the Raspberry PI database

The data model of the database is a key-value store. In addition, the key is split into separate fields so that different parts of the key can be indexed. The key design is discussed in the next part. Values are stored as 64-bit integers.
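As a rough sketch of what such a table could look like in SQLite: the actual key layout is covered in the next part and is not reproduced here, so the `<sensor>_<milliseconds>` key format, the column names, and the `insert_value` helper below are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical layout: the full key plus its parts as separate, indexable columns.
conn.execute(
    """
    CREATE TABLE sensor_value (
        key       TEXT PRIMARY KEY,   -- full key, assumed "<sensor>_<ms>" format
        sensor_id TEXT NOT NULL,      -- key part 1, stored separately for indexing
        ts_ms     INTEGER NOT NULL,   -- key part 2, milliseconds since the epoch
        value     INTEGER NOT NULL    -- SQLite INTEGER holds 64-bit signed values
    )
    """
)
conn.execute("CREATE INDEX idx_sensor_ts ON sensor_value (sensor_id, ts_ms)")


def insert_value(sensor_id, ts_ms, value):
    """Build the full key from its parts and store the reading."""
    key = f"{sensor_id}_{ts_ms:013d}"
    conn.execute(
        "INSERT INTO sensor_value VALUES (?, ?, ?, ?)",
        (key, sensor_id, ts_ms, value),
    )
    return key


example_key = insert_value("A01", 1_400_000_000_000, 2150)

# The largest 64-bit signed integer survives a round trip through the table.
insert_value("A02", 1_400_000_000_001, 2**63 - 1)
stored = conn.execute(
    "SELECT value FROM sensor_value WHERE sensor_id = ?", ("A02",)
).fetchone()[0]
print(example_key, stored)
```

Keeping the key parts in their own columns means a query such as "all values of sensor A01 in the last 20 seconds" can use the index instead of parsing the key string.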

3.4 Connecting the XBee to the Raspberry PI

The schematic and breadboard layout of the Arduino and XBee were discussed in the previous part. Connecting the XBee to the Raspberry PI is explained in Michael Bouvy’s excellent blog post Raspberry PI + XBee: UART / Serial howto.

3.5 XBee Arduino code

Both the Arduino’s and the Raspberry PI’s XBee run in API mode. This allows bi-directional data transfer and makes future extensions possible, because API mode can handle more than two XBee nodes in the network. The downside is that XBees in API mode are more difficult to program.

On the Arduino side I followed the blog post of Desert Home on sending data using an XBee 2 in API mode. I found out immediately that the XBee 2 is not capable of sending 1,000 values per second and that it requires a pause between sends. Because I was sending different sensor values, I made a carousel that sends the value of a different sensor at every send interval. All sensors have an identifier starting with A followed by a two-digit integer. The carousel passes the sensor id and the value to a function that actually sends the value to the Raspberry PI through the XBee.

The code below shows the Arduino program I used. I did not have stability issues, but when evaluating the code, several problems popped up. First, I believe you are not supposed to use character arrays in combination with functions. Second, the variable ‘previousMillisSend’ is a long whereas the variable ‘currentMillis’ is an unsigned long. On the Arduino a signed long reaches its maximum after about 24.9 days of milliseconds, while the unsigned long returned by millis() only rolls over after about 49.7 days. This probably causes problems when previousMillisSend rolls over and currentMillis does not.
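The rollover arithmetic can be checked with a few lines of Python that emulate 32-bit counters. This is only an illustration of the commonly recommended pattern (keep both timestamps unsigned and compare by subtraction); the helper name `elapsed_ms` is made up for the example and is not part of the Arduino program.

```python
UINT32 = 2**32                     # millis() counts in a 32-bit unsigned long
MS_PER_DAY = 24 * 60 * 60 * 1000


def elapsed_ms(current_ms, previous_ms):
    """Elapsed time as 32-bit unsigned subtraction would compute it."""
    return (current_ms - previous_ms) % UINT32


# How long each type can count milliseconds before it runs out.
signed_limit_days = 2**31 / MS_PER_DAY      # signed 32-bit long
unsigned_limit_days = UINT32 / MS_PER_DAY   # unsigned 32-bit long

# The previous send happened 30 ms before the counter wrapped,
# and the current time is 20 ms after the wrap.
previous = UINT32 - 30
current = 20
wrapped_elapsed = elapsed_ms(current, previous)

print(round(signed_limit_days, 1), round(unsigned_limit_days, 1), wrapped_elapsed)
```

With both values unsigned, the subtraction still yields the true 50 ms across the rollover, so an interval check of the form "current minus previous is at least the interval" keeps working.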

3.6 Stability issues when sending more than 20 values per second

The code above also highlights another issue. I was never able to send more than 20 values per second without stability issues. When the Arduino was both receiving data from and sending data to the Raspberry PI, I had to increase the interval to 100 ms to prevent crashes. The XBee 2 manual states that 50 Hz is the maximum sample rate, so it is possible that this is not a coincidence.

To see how far I could push the number of values sent per second, I included the possibility to handle an array of values on the Raspberry PI side. The payload of the XBee can be a maximum of 72 bytes. Subtracting 4 bytes for the sensor id leaves 68 bytes for sensor values. Every sensor value has a maximum of 4 digits. Including a comma as separator, each value takes up to 5 bytes, so about 12 values can be sent in each payload. The Arduino code to actually send an array of values, as shown below, was not implemented until recently and has not been running stably, most likely due to the issues discussed above, especially the use of character arrays in combination with functions.
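The payload arithmetic can be written out as a small calculation. The three constants come from the text above; the function names are made up for the example, and the sketch assumes commas only appear between values, not after the last one.

```python
MAX_PAYLOAD = 72   # bytes per XBee payload, from the text
ID_BYTES = 4       # bytes reserved for the sensor id, from the text
MAX_DIGITS = 4     # digits per sensor value, from the text


def payload_length(n_values, id_bytes=ID_BYTES, digits=MAX_DIGITS):
    """Bytes needed for the id, n values, and the commas between them."""
    return id_bytes + n_values * digits + max(n_values - 1, 0)


def max_values(payload=MAX_PAYLOAD, id_bytes=ID_BYTES, digits=MAX_DIGITS):
    """Largest n for which n values of `digits` digits still fit."""
    room = payload - id_bytes
    # n values need n * digits bytes plus (n - 1) commas:
    # n * (digits + 1) - 1 <= room
    return (room + 1) // (digits + 1)


upper_bound = max_values()
print(upper_bound, payload_length(upper_bound), payload_length(12))
```

Under these assumptions the hard upper bound is 13 values per payload, so sending about 12 leaves a few bytes of headroom.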

3.7 XBee Raspberry PI Python Code

On the Raspberry PI side the Python code was relatively easy to implement. The basis is a thread that receives the data from the XBee and puts it in the in-memory database for further processing by other threads. By default the function receiving the data takes a regular sensor payload from an XBee. A special case is made for data received from the serial port of the other XBee. In this case it is assumed that the other XBee is connected to an Arduino and that the serial port is not the actual sensor identifier. Instead the first 3 characters of the payload, as explained in the Arduino part, are added to the sensor identifier and the rest of the payload is treated as an array of values.
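Splitting such a payload could look roughly like the sketch below. It is not the receive function from this project: the name `parse_payload` is hypothetical, and it assumes an ASCII payload in which the values after the 3-character sensor id are separated by commas.

```python
def parse_payload(payload):
    """Split an Arduino payload into (sensor_id, list of integer values).

    Assumes the first 3 characters are the sensor id (for example "A01")
    and the remainder is a comma-separated list of integers.
    """
    text = payload.decode("ascii") if isinstance(payload, bytes) else payload
    sensor_id, rest = text[:3], text[3:]
    # Skip empty pieces so a leading or trailing comma does not break parsing.
    values = [int(part) for part in rest.split(",") if part.strip()]
    return sensor_id, values


single = parse_payload(b"A01512")
several = parse_payload(b"A02512,498,503")
print(single, several)
```

A payload with a single value and a payload with an array of values go through the same code path, which keeps the receiving thread simple.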

For inserting the values in the database I made an insert function that correctly sets the different parts of the key and finally the key itself, as will be explained in the next part, Storing data in the Amazon Cloud.

A second thread creates a rolling average for each of the sensor values. Every second it calculates the average measured temperature of the last 20 seconds. This value is also inserted into the main sensor value table so that the average values are also uploaded to the cloud server. A third thread runs every 5 seconds and deletes all entries in the sensor table older than 2 minutes, keeping two minutes of data on the Raspberry PI.
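The work done on each tick of the second and third thread can be sketched with two SQL statements. This is an assumption-laden illustration rather than the original code: it uses the standard `sqlite3` module, a simplified hypothetical table with a timestamp in seconds, and made-up function names, and it leaves out the threading and the write-back of the average.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical table: one row per reading, timestamp in seconds.
conn.execute("CREATE TABLE sensor_value (sensor_id TEXT, ts REAL, value INTEGER)")


def rolling_average(sensor_id, now, window_s=20):
    """Average of one sensor over the last `window_s` seconds (None if empty)."""
    row = conn.execute(
        "SELECT AVG(value) FROM sensor_value "
        "WHERE sensor_id = ? AND ts > ? AND ts <= ?",
        (sensor_id, now - window_s, now),
    ).fetchone()
    return row[0]


def purge_old(now, keep_s=120):
    """Delete readings older than `keep_s` seconds; return the number removed."""
    cur = conn.execute("DELETE FROM sensor_value WHERE ts < ?", (now - keep_s,))
    return cur.rowcount


# Demo data: one reading per second for 200 seconds, value equal to the timestamp.
conn.executemany(
    "INSERT INTO sensor_value VALUES (?, ?, ?)",
    [("A01", float(t), t) for t in range(200)],
)

now = 199.0
average = rolling_average("A01", now)   # readings with ts 180..199
removed = purge_old(now)                # readings with ts below 79
remaining = conn.execute("SELECT COUNT(*) FROM sensor_value").fetchone()[0]
print(average, removed, remaining)
```

Passing `now` in as an argument keeps both functions easy to test; a real thread would call them with the current time once per second and once every 5 seconds respectively.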

Thread 1: receiving sensor data and inserting in database

Thread 2: creating rolling average

Thread 3: deleting old data

Next part:
4. Storing data in the Amazon Cloud (HBase)

