Saving costs with a new scheduler in Cloud Foundry Diego

Cloud Foundry

In the Mendix Cloud we run thousands of Mendix apps on Cloud Foundry on AWS. These are Mendix Runtime Engines that currently run in containers with 2, 4, 8 or 16 GB of memory. Mendix developers can start, stop, scale and upload new versions of their app themselves via our Developer Portal.

As a result, we must have diego-cell instances with at least 16 GB of memory available at all times, so that a Mendix developer can start their 16 GB Runtime Engine.

We found that the way Diego schedules LRPs (Long-Running Processes) on diego-cell EC2 instances is not optimal for our use case. Diego has only one scheduling algorithm. In a nutshell, app instances (LRPs) get deployed to the diego-cell with the most resources available. This way app instances are balanced equally across diego-cell instances.

Nima and Jen did a really nice presentation during the last Cloud Foundry Summit in The Hague about how scheduling in Cloud Foundry works.

Let’s say you have a number of AWS EC2 m5.4xlarge (64 GB memory) diego-cell instances. At some point all diego-cell instances are filled up equally with app instances (LRPs) and all of them have about 16 GB memory available. A little later this drops to 14~15 GB memory available. Then we have to add diego-cell instances to keep supporting the deployment of 16 GB Mendix Runtime Engines. But when more app instances (LRPs) are deployed after scaling up, they get scheduled to the new diego-cell instances, even when they are 2, 4 or 8 GB app instances, until all diego-cell instances have ~16 GB available again.

In practice it looks like this (20 diego-cell instances, 64GB memory):

Graph: Remaining Memory (per diego-cell)

Result: 25% of the memory of our diego-cell instances is unused, wasted.

Now we could scale up to AWS EC2 m5.8xlarge (128 GB memory), so we only waste 12.5%, but at some point we also want to support app instances with 32 GB memory.
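The percentages above follow from dividing the memory that has to stay free for the largest app size by the total memory of a diego-cell. A small Python sketch of that arithmetic (the function name is made up for illustration):

```python
def reserved_fraction(cell_memory_gb, largest_app_gb=16):
    """Fraction of a diego-cell's memory kept free for the largest app size."""
    return largest_app_gb / cell_memory_gb


for cell_memory_gb in (64, 128):
    fraction = reserved_fraction(cell_memory_gb)
    print(f"{cell_memory_gb} GB cell: {fraction:.1%} kept free")
```

Calling `reserved_fraction(128, largest_app_gb=32)` shows why bigger cells only postpone the problem: with 32 GB app instances the reserved share is back at 25%.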

We have looked into isolation segments, for example having one isolation segment per app instance size. Unfortunately that does not work for us. Mendix developers don’t notice this because it’s abstracted away for them, but they run different app instance sizes in one “Org”, and isolation segments apply to an “Org”.

The quest for a new scheduling algorithm

I’ve been looking at this inefficient usage of resources for quite a while now. I also investigated how the scheduling algorithm in Diego works before Nima and Jen gave the presentation. During the Cloud Foundry Summit I had a chat with Nima about whether it would make sense to invest time in adding or changing the scheduler in Diego. Project Eirini, where app instances run on Kubernetes, was close to a version 1.0 release. Kubernetes is more flexible with scheduling algorithms, so that could solve our issue as well.

First I thought: “No, let’s wait for Eirini.” But it would probably still take a year before we would migrate to Eirini in production. Having an improved scheduler in Diego would save us costs right now.

Goal of the new scheduling algorithm

Mendix apps are memory heavy. In a shared environment, with many Mendix Runtime Engines running on one Cloud Foundry diego-cell instance, we notice that there are more than enough CPU resources available. Mendix developers mainly scale up their app by adding more memory (or adding more instances). So in our case we want to fill up diego-cell instances as much as possible.

How scheduling LRPs in Diego works technically

Like Nima explained in the presentation, the scheduler makes a decision where to deploy an app instance (LRP) based on a score the diego-cell instances provide. The lowest score wins. The score is calculated here:

It basically boils down to:

Score = ((Memory + Disk + Containers) / 3) + StartingContainers + Locality

  • Memory: fraction of memory that is in use
  • Disk: fraction of disk that is in use
  • Containers: fraction of the container slots that is in use (max 256 per diego-cell)
  • StartingContainers: number of starting containers x weight (usually 0.25)
  • Locality: 1000 when it’s already hosting an instance of the same app

For example:

((0.5 + 0.5 + 0.39) / 3) + 0.25 + 0 = 0.7133

  • Memory: diego-cell has 50% of its memory available
  • Disk: diego-cell has 50% of its disk available
  • Containers: diego-cell runs 100 containers (100/256)
  • StartingContainers: there is currently 1 container starting
  • Locality: this diego-cell does not run an instance of the same app
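As a rough sketch (not the actual Diego code, which is written in Go), the formula and the example above can be expressed in Python like this. The function and parameter names are made up for illustration; the three resource values are passed in as the fractions used in the example.

```python
def cell_score(memory, disk, containers, starting_containers,
               starting_container_weight=0.25, hosts_same_app=False):
    """Score of one diego-cell for an LRP; the cell with the lowest score wins."""
    resource_score = (memory + disk + containers) / 3
    starting_score = starting_containers * starting_container_weight
    locality = 1000 if hosts_same_app else 0
    return resource_score + starting_score + locality


score = cell_score(memory=0.5, disk=0.5, containers=0.39, starting_containers=1)
print(round(score, 4))  # 0.7133
```

The large Locality value means a diego-cell that already hosts an instance of the same app effectively only wins when no other cell can take the instance.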

The idea: Bin Pack First Fit Weight

Scaling the number of diego-cell instances up and down is based on the index number BOSH assigns to an instance. When you add 1 diego-cell instance and after that remove 1 diego-cell instance, the instance that was just created gets removed.

What if we could make a diego-cell more attractive to deploy to based on its index number? This way diego-cell instances with a lower index number would be filled up first, as long as they have enough resources available. This could be called Bin Pack First Fit.

The index number can be displayed using the “bosh instances” command:

$ bosh -d cf instances -i --column=instance --column=index
Instance                                                    Index
diego-cell/0342c42b-756e-4951-8280-495261e38f53            	0	
diego-cell/16be34ce-bd34-4837-8431-51f6bc4a0fa8            	1	
diego-cell/e3bec1d3-0899-4502-9f43-4049f53721b1            	2	
diego-cell/2581addf-4f08-421e-ab9d-c52772f50315            	3	

Like with “StartingContainers”, we could add some weight to the total score based on the index number of a diego-cell instance. This way it is still possible to completely disable the Bin Pack First Fit weight component by setting the weight to 0, which keeps the algorithm Diego currently has.

It will work like this:

Score = ((Memory + Disk + Containers) / 3) + StartingContainers + Locality + Index

  • Memory: fraction of memory that is in use
  • Disk: fraction of disk that is in use
  • Containers: fraction of the container slots that is in use (max 256 per diego-cell)
  • StartingContainers: number of starting containers x weight (usually 0.25)
  • Locality: 1000 when it’s already hosting an instance of the same app
  • Index: BOSH index number x weight

Let’s take the previous example, assume all diego-cell instances are filled up equally and add an index weight of 0.25:

  • diego-cell 0: ((0.5 + 0.5 + 0.39) / 3) + 0.25 + 0 + (0*0.25) = 0.7133
  • diego-cell 1: ((0.5 + 0.5 + 0.39) / 3) + 0.25 + 0 + (1*0.25) = 0.9633
  • diego-cell 2: ((0.5 + 0.5 + 0.39) / 3) + 0.25 + 0 + (2*0.25) = 1.2133
  • diego-cell 3: ((0.5 + 0.5 + 0.39) / 3) + 0.25 + 0 + (3*0.25) = 1.4633

In this case the next app instance will be deployed to diego-cell 0. Exactly what we want. The weight, 0.25 currently, can be increased to make diego-cell instances with a lower BOSH index number even more attractive.
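The selection can be sketched in Python as follows. This is an illustration of the idea only, not the Proof of Concept code; the function name and parameters are made up, and the values are the ones from the example above.

```python
def score_with_index(resource_fractions, starting_containers, index,
                     starting_container_weight=0.25,
                     bin_pack_first_fit_weight=0.25,
                     hosts_same_app=False):
    """Existing score plus a penalty that grows with the BOSH index number."""
    resource_score = sum(resource_fractions) / len(resource_fractions)
    starting_score = starting_containers * starting_container_weight
    locality = 1000 if hosts_same_app else 0
    index_score = index * bin_pack_first_fit_weight
    return resource_score + starting_score + locality + index_score


# Four equally filled diego-cells, each with one starting container.
scores = {index: score_with_index((0.5, 0.5, 0.39), 1, index) for index in range(4)}
winner = min(scores, key=scores.get)
print(winner, {index: round(score, 4) for index, score in scores.items()})
```

With `bin_pack_first_fit_weight=0` all four cells get the same score again, which is the current Diego behaviour.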

A Proof of Concept

As a Proof of Concept the above has been developed in Diego:

To test the updated scheduling algorithm, while this is not part of the official diego-release (yet), we create a custom diego-release and use that in our Cloud Foundry setup.

NOTE: this diego-release is based on diego-release v2.34 (cf-deployment v9.5)

git clone --recurse-submodules --branch bin-pack-first-fit
cd diego-release
bosh --sha2 cr --timestamp-version --tarball=diego-release-bin-pack-first-fit-v2.34.0-5-g0b5569154.tgz --force

Upload diego-release-bin-pack-first-fit-v2.34.0-5-g0b5569154.tgz somewhere online and create an ops file to deploy this diego-release version instead of the default one:

- type: replace
  path: /releases/name=diego
  value:
    name: diego
    url: https://<your-domain.tld>/diego-release-bin-pack-first-fit-v2.34.0-5-g0b5569154.tgz
    sha1: <sha1 of diego-release-bin-pack-first-fit-v2.34.0-5-g0b5569154.tgz>
    version: <version from the `bosh --sha2 cr` command>
- type: replace
  path: /instance_groups/name=scheduler/jobs/name=auctioneer/properties/diego/auctioneer/bin_pack_first_fit_weight?
  value: 0.25

The result: Weighted Bin Pack First Fit

The result is actually pretty amazing 🙂 (15 diego-cell instances, 128GB memory):

Graph: Remaining Memory (per diego-cell)

This graph shows a 48 hour period, where the deployment pattern of Mendix app instances is equal to the previous graph. It is definitely noticeable that the added “Bin Pack First Fit Weight” has impact. App instances (LRPs) are not spread equally anymore. In this case we could remove 2 or 3 diego-cell instances, while keeping at least 2 to 3 diego-cell instances with 16 GB memory available. 😀

And the cost saver? An AWS On-Demand EC2 m5.4xlarge instance costs around $18.432 per day in AWS region us-east-1. Let’s say you run 100 diego-cell instances in total and you could now remove 20 to 25, while keeping 16 GB memory available on at least a couple of diego-cell instances. That is a saving of $368.64~$460.80 per day, or $134,553.60~$168,192.00 per year of On-Demand EC2 costs. 😎 (With Reserved or Spot Instances this is of course less.)
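The savings estimate is simple arithmetic on the daily On-Demand price mentioned above. A small Python sketch (the function name is made up, and a year is assumed to be 365 days):

```python
from decimal import Decimal

DAILY_PRICE_USD = Decimal("18.432")  # m5.4xlarge On-Demand price per day, as quoted above


def savings(removed_instances, days_per_year=365):
    """Return (saving per day, saving per year) in USD for the removed instances."""
    per_day = DAILY_PRICE_USD * removed_instances
    return per_day, per_day * days_per_year


for removed in (20, 25):
    per_day, per_year = savings(removed)
    print(f"{removed} instances: ${per_day:,.2f} per day, ${per_year:,.2f} per year")
```

`Decimal` is used instead of floats so the dollar amounts are not affected by binary rounding.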

We’re hiring

Want to join our team and work on cool stuff like this?
Apply for a job at Mendix:

Server stats with collectd, InfluxDB and Grafana (with downsampling)

Almost 10 years ago I started developing a web frontend for collectd, Collectd Graph Panel (CGP). A PHP frontend that displays graphs in PNG format using rrdtool and the RRD files created by collectd.

A lot has happened since then. Because of the IoT hype time series databases like Graphite, InfluxDB and TimescaleDB became more popular. Also visualization tools gained more traction, of which Grafana is the most popular one.

In this blogpost I’m going to show a replacement of collectd, RRD files and CGP, by using collectd, InfluxDB and Grafana. I will:

  1. Hook up collectd to InfluxDB to store the metrics
  2. Configure InfluxDB to aggregate data over time (it doesn’t do this automatically like RRD)
  3. Use a Grafana dashboard to display the graphs with the same colors and styling I was used to in CGP

Hooking up collectd to InfluxDB

This is pretty simple. First of all follow the installation guide to install the InfluxDB service.

InfluxDB supports the collectd protocol. It can be configured to listen on UDP port 25826, which collectd clients can send metrics to.

I more or less used the default values that were already provided in /etc/influxdb/influxdb.conf:

[[collectd]]
  enabled = true
  bind-address = ":25826"
  database = "collectd"
  retention-policy = ""
  typesdb = "/usr/share/collectd/types.db"
  security-level = "none"
  batch-size = 5000
  batch-pending = 10
  batch-timeout = "10s"
  read-buffer = 0

In the configuration of the collectd clients, InfluxDB can be configured as server in the network plugin:

LoadPlugin network
<Plugin network>
  Server "<InfluxDB-IP-address>" "25826"
</Plugin>

The metrics the collectd clients collect are now sent to InfluxDB.

Downsampling data in InfluxDB

Unlike the RRD files created by collectd, InfluxDB doesn’t come with a default downsampling policy. Metrics are simply sent by the collectd clients every 10 seconds, saved in InfluxDB and kept indefinitely. You will have super detailed graphs when you, for example, zoom in on some hourly statistics from 5 months ago, but your InfluxDB data set will keep growing, resulting in gigabytes of data per collectd client.

In my experience for server statistics you want to have detailed graphs for the most recent metrics. This is useful when you want to debug an issue. Older metrics are nice to display weekly, monthly, quarterly or yearly graphs to spot trends. For graphs with these timeframes 10 second metrics are not required. Metrics for these graphs can be aggregated.

In InfluxDB the combination of “Retention Policies” (RPs) and “Continuous Queries” (CQs) can be used to downsample the metrics. One of the things you can define with an RP is for how long InfluxDB keeps the data. CQs automatically and periodically execute pre-defined queries. This can be used to aggregate the metrics to a different RP.

I’ve been fairly happy with the aggregation policy in the RRD files used by collectd. Let’s try to set up the same data aggregation system in InfluxDB.

Information about the aggregation policy can be extracted from the RRD file by using the rrdinfo command. Let’s take for example the cpu-idle.rrd file. This shows that this RRD file contains 1 metric per 10 seconds:

$ rrdinfo cpu-idle.rrd | grep step
step = 10

And this shows the different aggregation policies for the average value of the metrics:

$ rrdinfo cpu-idle.rrd | grep AVERAGE -A6 | egrep '(rows|pdp_per_row)'
rra[0].rows = 1200
rra[0].pdp_per_row = 1
rra[3].rows = 1235
rra[3].pdp_per_row = 7
rra[6].rows = 1210   
rra[6].pdp_per_row = 50
rra[9].rows = 1202
rra[9].pdp_per_row = 223
rra[12].rows = 1201
rra[12].pdp_per_row = 2635

There are 5 different aggregations. They all have Primary Data Points per row (pdp_per_row), which means that for example 1 row (metric) is an aggregation of 7 Primary Data Points. And it shows the number of rows that are kept.

Summarized this RRD file contains:

  • 1200 metrics of a 10 second interval (12000s of data =~ 3.33 hours)
  • 1235 metrics of a (7*10) 70 second interval (86450s of data =~ 1 day)
  • 1210 metrics of a (50*10) 500 second interval (605000s of data =~ 1 week)
  • 1202 metrics of a (223*10) 2230 second interval (2680460s of data =~ 31 days)
  • 1201 metrics of a (2635*10) 26350 second interval (31646350s of data =~ 366 days)
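The numbers in this summary can be derived from the rrdinfo output: the interval of an archive is pdp_per_row times the step, and the covered period is that interval times the number of rows. A short Python sketch of the calculation (variable and function names are made up):

```python
STEP = 10  # seconds, from "rrdinfo cpu-idle.rrd | grep step"

# (rows, pdp_per_row) of the AVERAGE archives shown above
ARCHIVES = [(1200, 1), (1235, 7), (1210, 50), (1202, 223), (1201, 2635)]


def coverage(rows, pdp_per_row, step=STEP):
    """Return (interval in seconds, total seconds of data) for one archive."""
    interval = pdp_per_row * step
    return interval, rows * interval


for rows, pdp_per_row in ARCHIVES:
    interval, total = coverage(rows, pdp_per_row)
    print(f"{rows} rows x {interval}s = {total}s ({total / 86400:.2f} days)")
```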

Let’s connect to our influxdb instance and configure the same using RPs and CQs.

$ influx
Connected to http://localhost:8086 version 1.7.6
InfluxDB shell version: 1.7.6
Enter an InfluxQL query
> show databases
name: databases
> use collectd
Using database collectd
> show retention policies
name    duration  shardGroupDuration replicaN default
----    --------  ------------------ -------- -------
autogen 0s        168h0m0s           1        true

The database by default contains the “autogen” RP, with a duration of 0s. No data will be thrown away. First modify the duration of the autogen retention policy to 200 minutes:

> alter retention policy "autogen" on "collectd" duration 200m shard duration 1h
> show retention policies
name    duration  shardGroupDuration replicaN default
----    --------  ------------------ -------- -------
autogen 3h20m0s   1h0m0s             1        true  

Now add the additional RPs:

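> create retention policy "day" on "collectd" duration 1d replication 1 shard duration 1h
> create retention policy "week" on "collectd" duration 7d replication 1 shard duration 1d
> create retention policy "month" on "collectd" duration 31d replication 1 shard duration 1d
> create retention policy "year" on "collectd" duration 366d replication 1 shard duration 7d
> show retention policies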
name    duration  shardGroupDuration replicaN default
----    --------  ------------------ -------- -------
autogen 3h20m0s   1h0m0s             1        true  
day     24h0m0s   1h0m0s             1        false
week    168h0m0s  24h0m0s            1        false
month   744h0m0s  24h0m0s            1        false
year    8784h0m0s 168h0m0s           1        false

For downsampling in InfluxDB I want to use more logical durations compared to what was in the RRD file:

  • 70s -> 60 seconds
  • 500s -> 300 seconds (5 minutes)
  • 2230s -> 1800 seconds (30 minutes)
  • 26350s -> 21600 seconds (6 hours)

These CQs will downsample the data accordingly:

> CREATE CONTINUOUS QUERY "cq_day" ON "collectd" BEGIN SELECT mean(value) as value INTO "collectd"."day".:MEASUREMENT FROM /.*/ GROUP BY time(60s),* END
> CREATE CONTINUOUS QUERY "cq_week" ON "collectd" BEGIN SELECT mean(value) as value INTO "collectd"."week".:MEASUREMENT FROM /.*/ GROUP BY time(300s),* END
> CREATE CONTINUOUS QUERY "cq_month" ON "collectd" BEGIN SELECT mean(value) as value INTO "collectd"."month".:MEASUREMENT FROM /.*/ GROUP BY time(1800s),* END
> CREATE CONTINUOUS QUERY "cq_year" ON "collectd" BEGIN SELECT mean(value) as value INTO "collectd"."year".:MEASUREMENT FROM /.*/ GROUP BY time(21600s),* END
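To illustrate what such a CQ computes, here is a rough Python sketch of taking the mean per fixed time bucket. It is not how InfluxDB implements it, and it ignores tags (the `*` in the GROUP BY clause); the function name and the sample data are made up.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, interval_s):
    """Average (timestamp, value) points per interval_s-second bucket."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        bucket_start = timestamp - timestamp % interval_s
        buckets[bucket_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Two minutes of 10-second samples become two 60-second data points.
raw = [(t, t // 10) for t in range(0, 120, 10)]
print(downsample(raw, 60))
```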

With these CQs and RPs configured you will get 5 data streams: autogen (the default), day, week, month and year. To retrieve the aggregated metrics from a specific RP you have to prefix the measurement in your select query with it. So for example to get the cpu idle metrics you can execute this to get the metrics in the 10s resolution:

> select * from "cpu_value"
# or
> select * from "autogen"."cpu_value"

To get it in 60s resolution (RP “day”):

> select * from "day"."cpu_value"

This is important to know when creating graphs in Grafana. When you want to show a “month” or “year” graph you cannot simply do select value from "cpu_value" where type_instance='idle', because you will only get the metrics from the “autogen” RP. You have to explicitly define the RP.

Collectd graphs in Grafana

To install Grafana follow the installation guide.

Create a user in InfluxDB that can be used in Grafana to read data from InfluxDB:

> create user grafana with password '<PASSWORD>'
> grant read on collectd to grafana

To get access to the collectd data in InfluxDB you need to configure a data source in Grafana:

Configure CollectD data source.

Now let’s for example create a graph for the load average.

Select Retention Policy in query

As you can see you have to explicitly select the RP for the metrics you want to display in the graph. There is no easy way to get metrics automatically from all RPs at once. This is of course not really convenient, because once the graph on your dashboard is configured you want to be able to change the time range and just see the data from whatever RP that has the metrics in the most detailed way. So ideally you want the RP to be automatically selected based on the time range that is selected.

Luckily more people have run into this issue, and Talek found a nice workaround for it.

We can create a variable that executes a query based on the current “From” and “To” time range values in Grafana to find out what the correct RP is. This variable can be refreshed every time the time range changes. The query to find out the correct RP is executed on measurement “rp_config” that has a separate RP (forever) without a duration so this data never gets deleted.

Configure the extra RP and insert the RP data:

CREATE RETENTION POLICY "forever" ON "collectd" DURATION INF REPLICATION 1
INSERT INTO forever rp_config,idx=1 rp="autogen",start=0i,end=12000000i,interval="10s" -9223372036854775806
INSERT INTO forever rp_config,idx=2 rp="day",start=12000000i,end=86401000i,interval="60s" -9223372036854775806
INSERT INTO forever rp_config,idx=3 rp="week",start=86401000i,end=604801000i,interval="300s" -9223372036854775806
INSERT INTO forever rp_config,idx=4 rp="month",start=604801000i,end=2678401000i,interval="1800s" -9223372036854775806
INSERT INTO forever rp_config,idx=5 rp="year",start=2678401000i,end=31622401000i,interval="21600s" -9223372036854775806

In the start and end times I added one extra second (86400000i -> 86401000i), because I noticed that when selecting for example the “Last 24 hours” range in Grafana, $__to - $__from was never exactly 86400000 milliseconds.
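The lookup that the variable performs can be sketched in Python. This assumes the RP is chosen by checking whether the length of the selected time range (in milliseconds) lies between the start and end values of an rp_config row; the exact comparison operators and the function name are assumptions.

```python
# (rp, start, end) in milliseconds, same values as the rp_config rows above
RP_CONFIG = [
    ("autogen", 0, 12000000),
    ("day", 12000000, 86401000),
    ("week", 86401000, 604801000),
    ("month", 604801000, 2678401000),
    ("year", 2678401000, 31622401000),
]


def select_rp(from_ms, to_ms):
    """Pick the most detailed RP whose window covers the length of the range."""
    span = to_ms - from_ms
    for rp, start, end in RP_CONFIG:
        if start < span <= end:
            return rp
    return None  # range is longer than what the "year" RP keeps


HOUR_MS = 3600 * 1000
print(select_rp(0, 1 * HOUR_MS))       # autogen
print(select_rp(0, 24 * HOUR_MS))      # day
print(select_rp(0, 7 * 24 * HOUR_MS))  # week
```

Note that only the length of the range is used, which is exactly why this approach only works for ranges that end at the current time.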

Create the variable in Grafana:

Create $rp variable in Grafana

And use the $rp variable as RP in the queries to create the graph:

Configure $rp in query

There is one caveat with this solution. It only works when the end of the time range is now (current time), for example by selecting a “Quick range” that starts with “Last …”. The query only looks at how long the time range is. Not if the RP contains the full time range. I’ve not been able to achieve this by using the available variables in Grafana like $__from, $__to and $__timeFilter and the possibilities that InfluxQL has. I’ve tried to adjust the query to do something like select rp from rp_config where $__from > now() - "end", but that is not supported by InfluxDB and returns an empty result.

The effect of the caveat is that when you zoom in on older metrics, the $rp variable will select an RP that does not contain the data anymore. When changing the $rp variable manually you can see that less detailed metrics are available in different RPs. For example:

GIF of different retention policies

Result: Less storage required

I monitor 6 systems with collectd in my small home-setup. After configuring the collectd clients to send the metrics to InfluxDB and running this setup without RPs and CQs for a couple of weeks it already required 6 gigabyte of storage. After configuring the RPs and CQs the collectd InfluxDB now uses 72 MB. The RRD files in my previous setup used ~186 MB for these 6 systems.

Free space (var-lib-influxdb)

Grafana Dashboard available

To make things easy I’ve already created a dashboard that uses the same colors and styling as Collectd Graph Panel. It can be downloaded here:

Grafana: CollectD Graph Panel

Measuring Power Consumption with Broadlink SP3S, python, influxdb and grafana

A while ago I was researching the possibilities to measure the power consumption of some devices in my house via Wifi. I came across the Broadlink SP3S Smart Plug. It met my requirements: relatively cheap, power measurement and Wifi. It comes with an iOS and Android app. There is a big chance the app does not connect directly to the SP3S, but to “the Cloud” the SP3S sends its data to. This is how most companies design their products nowadays. I wasn’t really looking forward to sharing my power consumption data with Broadlink in “the Cloud”. With the app you can also turn the power on/off, which scares me a little bit. The Broadlink Cloud controlling this power switch? Nah, not for me.

I will explain how I installed the Broadlink SP3S without it making a connection to the internet and show how I use a python script to read the power meter data from the SP3S, store it to InfluxDB and use Grafana to display the collected data in a graph.

Note: When you want to buy a Broadlink SP3S, please make sure you buy the SP3S and not the SP3, which only is a power switch, not a power meter.

Install the Broadlink SP3S

In the step-by-step instructions below I will configure the SP3S to connect to my Wifi, so I can connect to it from my local network to retrieve the power meter data. I use a laptop running Linux to initially connect to the SP3S and configure it. I also run a Debian Linux machine as router to control the firewall between the local network and the internet.

  • Plug the SP3S in a wall socket
  • Press the On/Off button for 6 seconds to reset the SP3S. The power button starts blinking rapidly.
  • Press the On/Off button another 6 seconds to enable the Wifi Access Point on the SP3S. The power button blinks rapidly with pauses.
  • Connect to the Wifi Access Point, it should be called “BroadlinkProv”
  • Look up the MAC address of the SP3S
$ ip neigh dev wlp3s0 lladdr 34:ea:34:79:7b:ff REACHABLE
  • Block the MAC address to access the Internet in the router (I’m using a Debian Linux machine as router). It is important to block the MAC address before connecting the SP3S to your Wifi network so that it will never be able to access the internet.
$ iptables -A FORWARD -m mac --mac-source 34:ea:34:79:7b:ff -j DROP
$ ip6tables -A FORWARD -m mac --mac-source 34:ea:34:79:7b:ff -j DROP
$ git clone
$ cd python-broadlink
$ python3 -m venv venv
$ . venv/bin/activate
$ pip3 install pyaes
$ mkdir lib
$ ln -s broadlink lib/broadlink
$ python3
Python 3.5.3 (default, Sep 27 2018, 17:25:39)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import broadlink
>>> broadlink.setup('myssid', 'mynetworkpass', 3)
  • Now you will get disconnected from the SP3S Wifi Access Point. The SP3S will connect to the Wifi network configured above

When this firewall rule is added to the router as well, you will see that the SP3S immediately tries to connect to the internet.

$ iptables -I FORWARD -m mac --mac-source 34:ea:34:79:7b:ff -j LOG --log-level debug --log-prefix "Broadlink: "

$ tail /var/log/syslog
Broadlink: IN=eth1 OUT=eth0 MAC=e0:69:95:73:10:bf:34:ea:34:79:7b:ff:08:00 SRC= DST= LEN=76 TOS=0x00 PREC=0x00 TTL=63 ID=258 PROTO=UDP SPT=16404 DPT=16384 LEN=56
Broadlink: IN=eth1 OUT=eth0 MAC=e0:69:95:73:10:bf:34:ea:34:79:7b:ff:08:00 SRC= DST= LEN=76 TOS=0x00 PREC=0x00 TTL=63 ID=259 PROTO=UDP SPT=16404 DPT=1812 LEN=56
Broadlink: IN=eth1 OUT=eth0 MAC=e0:69:95:73:10:bf:34:ea:34:79:7b:ff:08:00 SRC= DST= LEN=76 TOS=0x00 PREC=0x00 TTL=63 ID=260 PROTO=UDP SPT=16404 DPT=8080 LEN=56
Broadlink: IN=eth1 OUT=eth0 MAC=e0:69:95:73:10:bf:34:ea:34:79:7b:ff:08:00 SRC= DST= LEN=76 TOS=0x00 PREC=0x00 TTL=63 ID=261 PROTO=UDP SPT=16404 DPT=80 LEN=56
Broadlink: IN=eth1 OUT=eth0 MAC=e0:69:95:73:10:bf:34:ea:34:79:7b:ff:08:00 SRC= DST= LEN=76 TOS=0x00 PREC=0x00 TTL=63 ID=262 PROTO=UDP SPT=16404 DPT=8090 LEN=56
Broadlink: IN=eth1 OUT=eth0 MAC=e0:69:95:73:10:bf:34:ea:34:79:7b:ff:08:00 SRC= DST= LEN=76 TOS=0x00 PREC=0x00 TTL=63 ID=263 PROTO=UDP SPT=16404 DPT=16384 LEN=56
Broadlink: IN=eth1 OUT=eth0 MAC=e0:69:95:73:10:bf:34:ea:34:79:7b:ff:08:00 SRC= DST= LEN=76 TOS=0x00 PREC=0x00 TTL=63 ID=264 PROTO=UDP SPT=16404 DPT=1812 LEN=56
Broadlink: IN=eth1 OUT=eth0 MAC=e0:69:95:73:10:bf:34:ea:34:79:7b:ff:08:00 SRC= DST= LEN=76 TOS=0x00 PREC=0x00 TTL=63 ID=265 PROTO=UDP SPT=16404 DPT=8080 LEN=56
Broadlink: IN=eth1 OUT=eth0 MAC=e0:69:95:73:10:bf:34:ea:34:79:7b:ff:08:00 SRC= DST= LEN=76 TOS=0x00 PREC=0x00 TTL=63 ID=266 PROTO=UDP SPT=16404 DPT=80 LEN=56
Broadlink: IN=eth1 OUT=eth0 MAC=e0:69:95:73:10:bf:34:ea:34:79:7b:ff:08:00 SRC= DST= LEN=76 TOS=0x00 PREC=0x00 TTL=63 ID=267 PROTO=UDP SPT=16404 DPT=8090 LEN=56

Let’s try to find out what the destination IP addresses are by using tcpdump.

$ tcpdump -ni eth1 host and port 53
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
12:38:22.460717 IP > 0+ A? (44)
12:38:22.460870 IP > 0 1/0/0 A (60)
12:38:38.480835 IP > 0+ A? (46)
12:38:38.480962 IP > 0 1/0/0 A (62)

So the SP3S immediately tries to contact ( and ( on ports 16384, 1812, 8080, 80 and 8090 once it has a network connection. The iptables DROP rules in my router block this traffic. 🙂
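The list of destination ports can also be pulled out of the syslog lines with a few lines of Python. A minimal sketch, with a made-up function name and shortened versions of the log lines shown above as sample input:

```python
import re

DPT_PATTERN = re.compile(r"\bDPT=(\d+)")


def destination_ports(log_lines):
    """Return the unique destination ports, in order of first appearance."""
    ports = []
    for line in log_lines:
        match = DPT_PATTERN.search(line)
        if match and int(match.group(1)) not in ports:
            ports.append(int(match.group(1)))
    return ports


sample = [
    "Broadlink: IN=eth1 OUT=eth0 PROTO=UDP SPT=16404 DPT=16384 LEN=56",
    "Broadlink: IN=eth1 OUT=eth0 PROTO=UDP SPT=16404 DPT=1812 LEN=56",
    "Broadlink: IN=eth1 OUT=eth0 PROTO=UDP SPT=16404 DPT=8080 LEN=56",
    "Broadlink: IN=eth1 OUT=eth0 PROTO=UDP SPT=16404 DPT=80 LEN=56",
    "Broadlink: IN=eth1 OUT=eth0 PROTO=UDP SPT=16404 DPT=8090 LEN=56",
    "Broadlink: IN=eth1 OUT=eth0 PROTO=UDP SPT=16404 DPT=16384 LEN=56",
]
print(destination_ports(sample))
```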

Using broadlink_cli to retrieve meter data

Using “broadlink_cli” from python-broadlink the current energy consumption can be retrieved from the SP3S. To make “broadlink_cli” work, some things need to be modified when using the cloned git repository as python library.

Create a symlink to “broadlink_cli”:

$ ln -s cli/broadlink_cli

Edit “broadlink_cli” and change this:

import broadlink
import sys

to this:

import sys
sys.path.insert(0, './')
import broadlink

Retrieve the current usage from the SP3S using “broadlink_cli”:

$ ./broadlink_cli --type 0x947a --host --mac 34ea34797bff --energy

Turn on the power:

$ ./broadlink_cli --type 0x947a --host --mac 34ea34797bff --turnon
== Turned * ON * ==

Store the power meter data to InfluxDB

The python script below reads the power consumption every 30 seconds from the SP3S and stores it to InfluxDB.

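#!/usr/bin/env python3
import sys
import time
import datetime
from influxdb import InfluxDBClient

# Use the cloned python-broadlink repository as library
sys.path.insert(0, './')
import broadlink

name = '<NAME>' # What is the SP3S connected to?
type = int(0x947a) # Device type of the SP3S
host = '' # IP address of the SP3S
mac = bytearray.fromhex('34ea34796e9c') # The MAC address of the SP3S (without colons!)

dev = broadlink.gendevice(type, (host, 80), mac)
dev.auth() # Authenticate with the device before requesting data
influx_client = InfluxDBClient('<INFLUXDB_HOSTNAME>', 8086, '<USERNAME>', '<PASSWORD>', '<DATABASE>')

def get_data():
    # Current power usage as reported by the SP3S
    return dev.get_energy()

def influx(value):
    if value is None:
        return
    json_body = [
        {
            "measurement": name,
            "time": datetime.datetime.utcnow().strftime('%Y-%m-%dT%H:%M:%SZ'),
            "fields": {
                "usage": float(value),
            },
        }
    ]
    try:
        influx_client.write_points(json_body)
    except Exception:
        print('Failed to write to influxdb')

while True:
    try:
        influx(get_data())
    except Exception as err:
        print('Error: %s' % str(err))
    time.sleep(30) # Read the power consumption every 30 seconds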

Graphing the result in Grafana

In Grafana use this configuration for the graph. Replace <NAME> with the name that is in the script.

Some interesting results

Measuring the power usage of several devices gives interesting insight into what a device is actually doing power-wise. Some examples are below.

The washer consumes around 2200 Watt at the beginning of a ~1:45h, 40°C program. And at the end about 500 Watt to centrifuge to dry the clothes a little bit.
The washer consumes 2200 Watt a bit longer in case of a ~1:45h, 60°C program.
My washer is actually a wash-dry combination. When starting the dry program after a ~1:45h, 40°C program you see that drying consumes even more energy than washing.
The fridge consumes around 80 Watt about 30% of the time to keep the fridge cool. When you look closely you actually see 3 mini-spikes in the morning where I opened the fridge and the light turned on.
The electric heatpump starts heating the 150 liter hot water tank at 23:00. It ramps up to 1250 Watt. It starts exactly when electricity switches to low tariff, smart 🙂 The heatpump also heats the house and tries to keep it around one temperature level. This is the most power efficient for a well insulated house, they say. The heatpump is consuming 700 Watt for this continuously when it gets colder in the house during the night.
When it gets too warm in the house the heatpump also has the ability to cool. This consumes less power than heating.
And sometimes this heating/cooling system is just stupid. During the day it is too warm and the same night it is too cold.

Routed IPTV via a Debian router (XS4ALL or KPN)

At home my FTTH Internet connection is provided by XS4ALL. They provide a FRITZ!Box router to connect to the Internet. Instead of using the FRITZ!Box I’ve always used my own Debian GNU/Linux machine to route traffic to the internet.

The XS4ALL uplink has 2 VLANs:

  • VLAN4: TV (bridged, RFC1483)
  • VLAN6: PPPoE IPv4 + IPv6 internet connection

My XS4ALL uplink is connected to a managed switch. My Motorola 1963 TV Receiver is directly connected to an untagged VLAN4 port on my switch. This way the TV Receiver is directly connected to the TV platform on OSI Layer 2.

Recently I got a letter from XS4ALL saying that this setup is going to change. The TV Receiver cannot be connected to the TV platform directly anymore, but needs to be part of the internal network. This adds the ability to support Internet services (like YouTube, Netflix, etc.) on the TV Receiver.

Current setup

In my current setup the upstream connection is connected to a managed switch. VLAN4 and VLAN6 are tagged on this switchport. The TV Receiver is connected to an untagged VLAN4 switchport. It can directly communicate with the TV platform. The Debian Router is connected to a tagged VLAN6 switchport for internet access and a tagged VLAN1 switchport for the local network. Devices on the local network connect to the Internet via the Debian Router on VLAN1.

New setup

In the new setup the TV Receiver is not in untagged VLAN4 anymore. Instead VLAN4 is now tagged on the switchport of the Debian Router as it will function as a gateway to the TV Platform. I created VLAN104 in which the TV Receiver will be. It’s also possible to create a setup where the TV Receiver is in VLAN1, but my Managed Switch currently doesn’t support IGMP Snooping. The result of that would be that if you are watching TV, all other devices in VLAN1 also receive the IPTV multicast traffic.

Layer 2 / Layer 3 view

In a more detailed view, leaving out the physical hardware, it looks like the diagram below. Local devices on VLAN1 access the Internet through the Debian Router, which routes the traffic to VLAN6. The TV Receiver on VLAN104 accesses the TV Platform through the Debian Router, which routes it to VLAN4. The Debian Router runs an igmpproxy to route Multicast Traffic (IPTV) from VLAN4 to VLAN104. The red arrow shows that the TV Receiver is now also able to access the Internet for services like YouTube or Netflix.

How is the Debian Router configured?

First of all the Debian Router has 1 physical interface, 4 VLAN interfaces and 1 PPPoE interface. They are configured in /etc/network/interfaces:

auto eth0
iface eth0 inet manual
    up ip link set up dev eth0
    down ip link set down dev eth0

auto vlan1
iface vlan1 inet manual
    pre-up ip link add link eth0 name vlan1 type vlan id 1
    up ip link set up dev vlan1
    up ip addr add brd + dev vlan1
    down ip addr del dev vlan1
    down ip link set down dev vlan1
    post-down ip link delete vlan1

auto vlan4
iface vlan4 inet manual
    pre-up ip link add link eth0 name vlan4 type vlan id 4
    up ip link set up dev vlan4
    post-up dhclient vlan4
    pre-down dhclient -x
    down ip link set down dev vlan4
    post-down ip link delete vlan4

# Internet (PPPoE)
auto vlan6
iface vlan6 inet manual
    pre-up ip link add link eth0 name vlan6 type vlan id 6
    up ip link set up dev vlan6
    down ip link set down dev vlan6
    post-down ip link delete vlan6

# IPTV (Internal)
auto vlan104
iface vlan104 inet manual
    pre-up ip link add link eth0 name vlan104 type vlan id 104
    up ip link set up dev vlan104
    up ip addr add brd + dev vlan104
    down ip addr del dev vlan104
    down ip link set down dev vlan104
    post-down ip link delete vlan104

auto xs4all
iface xs4all inet ppp
    provider xs4all

The DHCP client configuration in /etc/dhcp/dhclient.conf will request a subnet-mask (option 1), broadcast-address (option 28), routers (option 3) and Classless Static Routes (option 121) on VLAN4:

option rfc3442-classless-static-routes code 121 = array of unsigned integer 8;
interface "vlan4" {
  request subnet-mask, broadcast-address, routers, rfc3442-classless-static-routes;
  send vendor-class-identifier "IPTV_RG";
}

As a result, the vlan4 interface gets an IP address and additional routes are added to the routing table of the Debian Router so it can reach the TV Platform:

# ip addr show dev vlan4
5: vlan4@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:1b:21:c3:f8:90 brd ff:ff:ff:ff:ff:ff
    inet brd scope global vlan4
       valid_lft forever preferred_lft forever

# ip route | grep vlan4
dev vlan4 proto kernel scope link src
via dev vlan4
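Option 121 arrives as a flat list of integers, which makes it hard to read by eye. The sketch below decodes such a list into routes, following the RFC 3442 encoding (prefix length, then only the significant destination octets, then the 4 gateway octets). The function name and the sample values are made up for illustration; they are not the routes XS4ALL hands out.

```shell
# Decode DHCP option 121 (RFC 3442) integers into "destination/prefix via gateway".
# Each route is: prefix length, ceil(prefix/8) destination octets, 4 gateway octets.
decode_rfc3442() {
    while [ "$#" -gt 0 ]; do
        prefix=$1; shift
        octets=$(( (prefix + 7) / 8 ))
        dest=""
        i=1
        while [ "$i" -le 4 ]; do
            if [ "$i" -le "$octets" ]; then
                # significant octet, taken from the option data
                dest="$dest$1"; shift
            else
                # octets that were left out of the encoding are zero
                dest="${dest}0"
            fi
            if [ "$i" -lt 4 ]; then
                dest="$dest."
            fi
            i=$((i + 1))
        done
        echo "$dest/$prefix via $1.$2.$3.$4"
        shift 4
    done
}

# Hypothetical sample: one /24 route and a default route via 10.0.0.1
decode_rfc3442 24 192 168 5 10 0 0 1 0 10 0 0 1
```

Comparing the decoded lines with the output of "ip route | grep vlan4" is a quick way to check that dhclient installed everything the DHCP server offered.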

Configure /etc/igmpproxy.conf to forward multicast traffic from VLAN4 to VLAN104:

phyint vlan4 upstream  ratelimit 0  threshold 1

phyint vlan104 downstream  ratelimit 0  threshold 1

Make sure IPv4 forwarding is enabled:

# cat /proc/sys/net/ipv4/ip_forward
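The file should contain 1. A small check like the one below could be dropped into a script; the function name is my own, and the optional file argument only exists so the check can be tried against a test file.

```shell
# Succeeds when IPv4 forwarding is enabled (the kernel file contains "1").
forwarding_enabled() {
    [ "$(cat "${1:-/proc/sys/net/ipv4/ip_forward}" 2>/dev/null)" = "1" ]
}

if forwarding_enabled; then
    echo "IPv4 forwarding is enabled"
else
    echo "IPv4 forwarding is disabled; enable it with: sysctl -w net.ipv4.ip_forward=1"
fi
```

To keep the setting across reboots, net.ipv4.ip_forward=1 can be set in /etc/sysctl.conf.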

And configure IPTables to allow the traffic we want to allow:

# allow igmpproxy traffic to the TV Receiver
iptables -A INPUT -i vlan104 -j ACCEPT
iptables -A OUTPUT -o vlan104 -j ACCEPT

# allow dhclient + igmpproxy traffic to the TV Platform
iptables -A INPUT -i vlan4 -d -j ACCEPT
iptables -A OUTPUT -o vlan4 -p udp --dport 68 -j ACCEPT
iptables -A OUTPUT -o vlan4 -p igmp -d -j ACCEPT

# allow TV Receiver traffic to the TV Platform and apply Source NAT
iptables -A FORWARD -i vlan104 -o vlan4 -j ACCEPT
iptables -A FORWARD -i vlan4 -o vlan104 -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A FORWARD -i vlan4 -o vlan104 -p udp -d -j ACCEPT
iptables -t nat -A POSTROUTING -o vlan4 -j MASQUERADE

# allow TV Receiver traffic to the internet
iptables -A FORWARD -i vlan104 -o ppp0 -j ACCEPT
iptables -A FORWARD -i ppp0 -o vlan104 -m state --state RELATED,ESTABLISHED -j ACCEPT
iptables -t nat -A POSTROUTING -o ppp0 -j MASQUERADE

Download “NPO Radio 2 – Top 2000” in mp3 format

In a marathon program from Christmas to New Year’s Eve, NPO Radio 2 broadcasts the so-called “Top 2000”: a list of the 2000 most popular songs of all time. Because I’m not able to listen to all 2000 songs in one go, I like to have them on a USB drive in MP3 format, so that I can listen to them in my car, for example.

The shell script below downloads the full “Top 2000” of 2018 in MP3 format from the official website. 80 MP3 files, ~12GB in size.

#!/bin/bash
set -e

year=2018

# collect the broadcast pages of 25 to 31 December
for i in $(seq 25 31); do
    curl -s$i-12-$year | grep '/gemist/uitzending' | cut -d'"' -f 2 | xargs -i echo "{}" | tac >> pages
done

# collect the mp3 stream url of every broadcast page
for p in $(cat pages); do
    curl -s $p | grep broadcaststream | cut -d '"' -f 2 | xargs -i echo "https:{}" >> mp3
done

# remove the 1st 4 items (00:00-02:00, 02:00-04:00, 04:00-06:00, 06:00-08:00)
tail -n +5 mp3 | sponge mp3

# download all mp3 files
for m in $(cat mp3); do
    curl -OL $m
done
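The script uses sponge because "tail -n +5 mp3 > mp3" would truncate the file before tail gets to read it. sponge ships in the moreutils package; if that is not installed, a temporary file does the same job. A minimal sketch, with a function name and sample list I made up for the example:

```shell
# Drop the first COUNT lines of FILE in place, without sponge:
# write the remainder to a temp file first, then copy it back.
drop_first_lines() {
    file=$1
    count=$2
    tmp="$(mktemp)"
    tail -n "+$((count + 1))" "$file" > "$tmp"
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example with a throwaway list of six entries
list="$(mktemp)"
printf '%s\n' a b c d e f > "$list"
drop_first_lines "$list" 4
cat "$list"
rm -f "$list"
```

The example prints only the last two entries, e and f.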

Safe in-place upgrade to a slim Debian stretch running i3

Debian Stretch was released last month, so it is time to upgrade my laptop. I’m an i3 window manager user. Previously my procedure was to back up /home, reinstall using the network installer (I don’t like apt-get dist-upgrade, I like to start clean), tick the “Debian Desktop Environment … GNOME” checkbox and, after the installer was done, install i3 and the rest of my tools.

While I use some of the tools from Gnome, like gnome-terminal, network manager, nautilus and Eye of Gnome, I do not really need the complete Gnome desktop environment and 100s of software packages that come with it. This time I want a basic system with only the tools I need. And I want to upgrade in-place without losing my Debian jessie install.

Some notes about the installation:

  • It’s a Lenovo Thinkpad T450s
  • I currently have 2 partitions: 1 for /boot and the other is encrypted with luks
  • The encrypted partition contains 3 logical volumes for /, /home and swap
  • A 4th lv (logical volume) will be created for the new root partition for Debian stretch
  • The new lv will be BTRFS formatted and I’ll use a BTRFS subvolume to be able to create snapshots of it
  • The minimal required software will be installed to run the i3 window manager, including some tools I regularly use.

Let’s prepare the root volume:

VG=bento # my lvm2 volume group is called bento
LV=stretch # the new lv will be called stretch
LABEL=stretch # label for btrfs

lvcreate -L10G -n $LV $VG
mkfs.btrfs -L $LABEL /dev/$VG/$LV

mkdir /mnt/$LV
mount /dev/$VG/$LV /mnt/$LV
cd /mnt/$LV
# create the root subvolume
btrfs subvolume create @
cd -
umount /mnt/$LV
# mount the subvolume instead
mount -o subvol=/@ /dev/$VG/$LV /mnt/$LV

The boot partition (/boot) will be reused/shared between the current Debian jessie install and the new stretch install. Because I still use Debian jessie daily for my work, I still want to be able to boot jessie as a fallback. To see what happens to /boot/grub and especially /boot/grub/grub.cfg I will make a git repository in /boot/grub.

cd /boot/grub
git init
git add -A .
git commit -am 'grub at the time jessie was still installed'

Let’s start with the install:

/usr/sbin/debootstrap --include udev,openssh-server,linux-image-amd64 stretch /mnt/$LV

# the new system needs to know about the encrypted partition, so copy crypttab
cp /etc/crypttab /mnt/$LV/etc/crypttab

# configure /mnt/$LV/etc/fstab
# example contents (replace $UUID with the uuid of /boot, replace $LABEL with the btrfs label)
UUID=$UUID /boot ext4 defaults 0 2
LABEL=$LABEL / btrfs subvol=/@,defaults,noatime 0 0
LABEL=$LABEL /btrfs-root btrfs subvol=/,defaults,noatime 0 0
# optionally add the existing mountpoint for /home

# The root of the BTRFS filesystem is mounted at `/btrfs-root`. From here we can manage the subvolumes and snapshots
mkdir /mnt/$LV/btrfs-root

# chroot into the new system
mount -o bind /boot /mnt/$LV/boot/
mount -o bind /dev /mnt/$LV/dev/
mount -t proc proc /mnt/$LV/proc
mount -t sysfs sys /mnt/$LV/sys
chroot /mnt/$LV

# set a root password
passwd
# create a normal user account
adduser pommi

# we do not want recommended packages to be installed automatically
echo 'APT::Install-Recommends "false";' > /etc/apt/apt.conf.d/00InstallRecommends

# stable and security updates
cat > /etc/apt/sources.list <<EOT
deb stretch main contrib non-free
deb stretch-updates main contrib non-free
deb stretch/updates main contrib non-free
EOT
apt-get update
apt-get upgrade

# install some basics
apt-get install cryptsetup lvm2 locales busybox less grub-pc git vim-nox initramfs-tools btrfs-progs

# set the default locale (to for example en_US.UTF-8)
dpkg-reconfigure locales

# set the timezone
dpkg-reconfigure tzdata
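With the BTRFS root mounted at /btrfs-root, taking a snapshot of the root subvolume before a risky change is a single command. The sketch below only prints that command so it can be reviewed first; the "@-<date>" naming scheme and the function name are my own assumptions, not something the setup above requires.

```shell
# Print (do not run) the command that snapshots the root subvolume "@".
# -r makes the snapshot read-only. The optional argument overrides the date stamp.
snapshot_command() {
    stamp=${1:-$(date +%Y-%m-%d)}
    echo "btrfs subvolume snapshot -r /btrfs-root/@ /btrfs-root/@-$stamp"
}

# Example with a fixed, hypothetical date
snapshot_command 2017-07-15
```

Run the printed line as root once the new system is booted; "btrfs subvolume list /btrfs-root" then shows the snapshot next to @.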

The basics are done. Now install the Desktop Environment. To log in graphically after booting I chose lightdm, which is a lightweight display manager. Gnome comes with gdm (Gnome Display Manager), but that installs ~87 other software packages I don’t want.

# suckless-tools for dmenu, x11-xserver-utils for xrandr
apt-get install lightdm i3 i3status suckless-tools xserver-xorg x11-xserver-utils

# networking, including wifi, gnome-keyring to store wifi passwords
apt-get install network-manager-gnome firmware-iwlwifi firmware-linux gnome-keyring

# and a terminal and a browser
apt-get install gnome-terminal firefox-esr

The system is now ready. Let’s check the changes in /boot/grub before we reboot.

cd /boot/grub
git status
git diff
# commit the changes
git add -A .
git commit -m 'after installing stretch'

In my case I had (at least) 2 kernels present in /boot. An active one for jessie (/vmlinuz-3.16.0-4-amd64) and a new one for stretch (/vmlinuz-4.9.0-3-amd64). When update-grub was executed from stretch just now, it changed all the grub menu items to boot into the new stretch system (every “linux” line now contains: root=/dev/mapper/$VG-$LV). To be able to still boot to jessie, I revert the changes for the “linux” lines that are supposed to boot the jessie system (/vmlinuz-3.16.0-4-amd64 kernels).

For example, change this:

linux /vmlinuz-3.16.0-4-amd64 root=/dev/mapper/bento-stretch ro quiet

back to (“root” is my current root lv):

linux /vmlinuz-3.16.0-4-amd64 root=/dev/mapper/bento-root ro quiet
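Editing every jessie entry by hand is error-prone, so the same change can also be scripted. This is a minimal sketch, assuming the kernel version and lv names from this post and “linux” lines that look like the example above; the function name is made up. It prints the result instead of changing the file, so the output can be compared with git diff first.

```shell
# Rewrite root= only on "linux" lines that boot the jessie kernel.
fix_jessie_root() {
    sed -e '/vmlinuz-3\.16\.0-4-amd64/ s#root=/dev/mapper/bento-stretch#root=/dev/mapper/bento-root#' "$1"
}

# Try it on a two-line sample instead of the real grub.cfg
sample="$(mktemp)"
cat > "$sample" <<'EOF'
linux /vmlinuz-4.9.0-3-amd64 root=/dev/mapper/bento-stretch ro quiet
linux /vmlinuz-3.16.0-4-amd64 root=/dev/mapper/bento-stretch ro quiet
EOF
fix_jessie_root "$sample"
rm -f "$sample"
```

Only the 3.16 line changes; the stretch kernel keeps pointing at the new lv. Running the same sed expression with -i on /boot/grub/grub.cfg applies it in place.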

And commit the result:

git add grub.cfg
git commit -m 'boot jessie with the old kernel'

Time to reboot to stretch

# exit the chroot
exit
umount /mnt/$LV/sys
umount /mnt/$LV/proc
umount /mnt/$LV/dev/
umount /mnt/$LV/boot/
umount /mnt/$LV
reboot

Lightdm will start and show you a login screen. Log in as the normal user you just created and i3 will start.

Additional software and configuration

Lock screen (Ctrl+Alt+l and after 5 minutes):

apt-get install i3lock xautolock
echo 'exec xautolock -time 5 -locker i3lock' >> .config/i3/config
echo 'bindsym Control+$alt+l exec xautolock -locknow' >> .config/i3/config

Start Network Manager Applet on startup

echo 'exec --no-startup-id nm-applet' >> .config/i3/config

Open urls from gnome-terminal in firefox:

apt-get install xdg-utils
xdg-settings get default-web-browser
xdg-settings set default-web-browser firefox-esr.desktop

Collectd Graph Panel v1

v1 is here. CGP is finished 😆

Joking aside. It has been requested multiple times, so let’s get it over with. The last version was released more than 3.5 years ago. This will be the last tagged version of CGP. Every commit in the master branch after this release can be considered a new release. 😉

Use git and “git pull” to keep up-to-date or download the latest version here.

Notable Changes since v0.4.1:

  • mobile support (responsive design)
  • automatic support for all plugins (markup/styling in json)
  • hybrid graph type (canvas graph on detail page, png on the others)
  • svg graph support
  • support for newer PHP versions
  • deprecate support for collectd 4

Special thanks for this version go to Peter Wu for improving security, Manuel Luis for maintaining jsrrdgraph and Vincent Brillault for his many contributions.

Git: git clone

Nagios notifications via Telegram

This post shows you how to use Telegram for Nagios notifications. First create a Telegram Bot by talking to the BotFather. The Telegram Bot will be the sender of the Nagios alerts.


You’ll receive an API token that also includes the UserID of the Bot:

  • Token: 200194008:AAEG6djWC9FENEZaVIo3y3vZm24P3GTMetw
  • UserID: 200194008
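Since the UserID is simply the part of the token before the colon, a script can derive it instead of storing both values. A tiny sketch (the function name is made up):

```shell
# Strip everything from the first colon onwards to get the bot's UserID.
bot_user_id() {
    echo "${1%%:*}"
}

bot_user_id "200194008:AAEG6djWC9FENEZaVIo3y3vZm24P3GTMetw"
```

For the example token above this prints 200194008.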

Download the script that will send the alerts via Telegram:

wget -O /usr/local/bin/
chmod 755 /usr/local/bin/

This is the configuration you need in Nagios (of course replace the token with your own):

# commands to send host/service notifications
define command {
  command_name     notify-host-by-telegram
  command_line     /usr/local/bin/ --token 200194008:AAEG6djWC9FENEZaVIo3y3vZm24P3GTMetw --object_type host --contact "$CONTACTPAGER$" --notificationtype "$NOTIFICATIONTYPE$" --hoststate "$HOSTSTATE$" --hostname "$HOSTNAME$" --hostaddress "$HOSTADDRESS$" --output "$HOSTOUTPUT$"
}
define command {
  command_name     notify-service-by-telegram
  command_line     /usr/local/bin/ --token 200194008:AAEG6djWC9FENEZaVIo3y3vZm24P3GTMetw --object_type service --contact "$CONTACTPAGER$" --notificationtype "$NOTIFICATIONTYPE$" --servicestate "$SERVICESTATE$" --hostname "$HOSTNAME$" --servicedesc "$SERVICEDESC$" --output "$SERVICEOUTPUT$"
}

# 2 example contact definitions
define contact {
  contact_name                    John Doe
  pager                           12345678
  service_notification_commands   notify-service-by-telegram
  host_notification_commands      notify-host-by-telegram
}
define contact {
  contact_name                    Telegram Group Chat
  pager                           -23456789
  service_notification_commands   notify-service-by-telegram
  host_notification_commands      notify-host-by-telegram
}

The Telegram Nagios plugin is able to send alerts to a single contact or to a group chat. As you can see Telegram GroupIDs are negative numbers.

How to get your UserID or GroupID?

Download and install this Telegram CLI. The CLI makes it easier to discover your UserID and GroupIDs.

$ telegram-cli
> get_self
User John Doe @johndoe (#12345678):
        phone: XXXXXXXXXXX
        offline (was online [2016/03/15 11:57:46])

There is your UserID (#12345678). First start a conversation with the Bot you just created to be able to receive messages (Nagios alerts) from the Bot and to be able to invite it to a Telegram group chat.

To receive Nagios alerts in a Telegram group chat, create a group chat and invite the Bot. You need at least 2 other users in the group.

$ telegram-cli
> create_group_chat "Nagios Alerts" user#200194008 user#12345678 user#33333333
[21:28]  Nagios Alerts John Doe created chat Nagios Alerts. 3 users

> chat_info Nagios_Alerts
Chat Nagios Alerts updated photo admin members
Chat Nagios Alerts (id 23456789) members:
                Nagios Bot invited by John Doe at [2016/03/08 21:28:59]
                John Doe invited by John Doe at [2016/03/08 21:28:59] admin

There is the GroupID (id 23456789) of the Nagios Alerts group chat which needs to be configured in the Nagios configuration as a negative number (-23456789).

Let’s send some test messages!

--token 200194008:AAEG6djWC9FENEZaVIo3y3vZm24P3GTMetw --object_type service --contact "-23456789" --servicestate "OK" --hostname "hostname.domain.tld" --servicedesc "load" --output "OK - load average: 0.02 0.01 0.01"
--token 200194008:AAEG6djWC9FENEZaVIo3y3vZm24P3GTMetw --object_type service --contact "-23456789" --servicestate "WARNING" --hostname "hostname.domain.tld" --servicedesc "load" --output "WARNING - load average: 3.48 4.19 2.74"
--token 200194008:AAEG6djWC9FENEZaVIo3y3vZm24P3GTMetw --object_type service --contact "-23456789" --servicestate "CRITICAL" --hostname "hostname.domain.tld" --servicedesc "load" --output "CRITICAL - load average: 233.29 154.35 15.05"
--token 200194008:AAEG6djWC9FENEZaVIo3y3vZm24P3GTMetw --object_type host --contact "-23456789" --hoststate "UNREACHABLE" --hostname "hostname.domain.tld" --hostaddress "2001:DB8::1" --output "Network Unreachable (hostname.domain.tld)"
--token 200194008:AAEG6djWC9FENEZaVIo3y3vZm24P3GTMetw --object_type host --contact "-23456789" --hoststate "DOWN" --hostname "hostname.domain.tld" --hostaddress "2001:DB8::1" --output "PING CRITICAL - Packet loss = 100%"
--token 200194008:AAEG6djWC9FENEZaVIo3y3vZm24P3GTMetw --object_type host --contact "-23456789" --hoststate "UP" --hostname "hostname.domain.tld" --hostaddress "2001:DB8::1" --output "PING OK - Packet loss = 0%, RTA = 3.74 ms"

And here is the result 😎


Upgrade Oracle Java without interrupting a Mendix App

In the “Mendix Cloud” we are hosting thousands of Mendix Apps. All these Apps are running on top of the Oracle Java Runtime Environment (JRE) in Debian Linux environments. We use java-package to package the Oracle JRE to be able to easily redistribute it to all our servers.

After packaging and putting the Debian package in our local apt repository the Oracle JRE can be easily installed via apt-get.

# apt-get install oracle-java8-jre

When there is an update available of the Oracle JRE, we again package the new version and put it in our local apt repository. The update will now be available to all our Debian Linux environments.

# apt-get -V upgrade
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages will be upgraded:
  oracle-java8-jre (8u40 => 8u45)
1 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 39.4 MB of archives.
After this operation, 26.6 kB of additional disk space will be used.
Do you want to continue [Y/n]?

But wait… it doesn’t warn you about it, but do you remember these screens when using Windows or Mac OSX?

Screenshots: Java update prompts on Windows and Mac OS X

This doesn’t mean that this doesn’t apply to Linux. 😉 On Linux it’s also required to restart all Java processes. In the case of an Oracle JRE update this meant that we had to plan maintenance windows and restart all Mendix Apps while rolling out the update.

A new approach

It would have been much nicer if we could roll out updates without thinking about the Mendix Apps that are currently using the installed Java version. In the Linux universe this is not an unfamiliar issue. Look for example at the Linux kernel. The Linux kernel that is currently running also cannot be replaced or uninstalled. You would run into all kinds of issues regarding kernel modules and libraries that have been changed or removed. Therefore the packaging system is keeping the last X Linux kernels installed including the one you are currently running.

Since Debian 8.0 (Jessie) the apt package (since version contains this file: “/etc/kernel/postinst.d/apt-auto-removal”. This file is executed after the installation (during “postinst”) of each “linux-image*” package. The “apt-auto-removal” script lists all installed kernels and creates an “APT::NeverAutoRemove” list in “/etc/apt/apt.conf.d/01autoremove-kernels” containing the 3 most recent versions plus the one that is currently in use. “linux-image*” packages that are not on that list may be “AutoRemoved”.

For Oracle JRE we can exactly use the same procedure. There are a few requirements:

  1. java-package needs to create versioned packages so we can install multiple versions at the same time.
  2. The oracle-java8uXX-jre package must run an apt-auto-removal script after installation to update an APT::NeverAutoRemove list.
  3. The apt-auto-removal script needs to be in a separate package, because it’s already required on installation of an oracle-java8uXX-jre package.
  4. We need an oracle-java8-jre-latest dependency package to install the latest oracle-java8uXX-jre package, also so that for example oracle-java8uXX-jre is marked as automatically installed so it can be removed using apt-get autoremove when it’s not on the APT::NeverAutoRemove list.


Versioned packages with java-package

java-package needed to be patched to produce versioned packages. Instead of “oracle-java8-jre” we needed to have “oracle-java8uXX-jre” where XX is the update version number, for example “oracle-java8u45-jre“.

Besides the package name, the package content needed to be installed in a different place. With “oracle-java8-jre” all files are installed in “/usr/lib/jvm/jre-8-oracle-x64/“. This needed to change to “/usr/lib/jvm/jre-8uXX-oracle-x64/“.

Changing 4 lines of bash gave the expected result:

diff --git a/lib/ b/lib/
index cd41772..bc981e1 100644
--- a/lib/
+++ b/lib/
@@ -57,8 +57,8 @@ j2sdk_run() {
     diskfree "$j2se_required_space"
-    j2se_package="$j2se_vendor-java$j2se_release-jdk"
-    j2se_name="jdk-$j2se_release-$j2se_vendor-$j2se_arch"
+    j2se_package="$j2se_vendor-java${j2se_release}u$j2se_update-jdk"
+    j2se_name="jdk-${j2se_release}u$j2se_update-$j2se_vendor-$j2se_arch"
     local target="$package_dir/$j2se_name"
     install -d -m 755 "$( dirname "$target" )"
     extract_bin "$archive_path" "$j2se_expected_min_size" "$target"
diff --git a/lib/ b/lib/
index ecd6d41..b209fcb 100644
--- a/lib/
+++ b/lib/
@@ -42,8 +42,8 @@ j2re_run() {
     diskfree "$j2se_required_space"
-    j2se_package="$j2se_vendor-java$j2se_release-jre"
-    j2se_name="jre-$j2se_release-$j2se_vendor-$j2se_arch"
+    j2se_package="$j2se_vendor-java${j2se_release}u$j2se_update-jre"
+    j2se_name="jre-${j2se_release}u$j2se_update-$j2se_vendor-$j2se_arch"
     local target="$package_dir/$j2se_name"
     install -d -m 755 "$( dirname "$target" )"
     extract_bin "$archive_path" "$j2se_expected_min_size" "$target"
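The braces in ${j2se_release}u are what make the new names work: without them the shell would look for a variable called j2se_releaseu. The sketch below tries the two changed lines in isolation. The variable values are my guess at what they would be for Oracle JRE 8u45 on amd64; only the two assignments come from the patch.

```shell
# Illustrative values (assumed, not taken from java-package itself)
j2se_vendor=oracle
j2se_release=8
j2se_update=45
j2se_arch=x64

# The two assignments from the patched j2re_run()
j2se_package="$j2se_vendor-java${j2se_release}u$j2se_update-jre"
j2se_name="jre-${j2se_release}u$j2se_update-$j2se_vendor-$j2se_arch"

echo "$j2se_package"
echo "$j2se_name"
```

This prints oracle-java8u45-jre and jre-8u45-oracle-x64, matching the package name and install directory described above.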

Now we were able to install multiple Oracle JRE versions alongside each other. I thought it was also nice to have a “/usr/bin/java8” symlink, which always points to the latest version. This was also easily implemented:

diff --git a/lib/ b/lib/
index adb3dc2..bdd2b91 100644
--- a/lib/
+++ b/lib/
@@ -124,6 +124,10 @@ fi
 install_no_man_alternatives $jvm_base$j2se_name/jre/lib $oracle_jre_lib_hl
 install_alternatives $jvm_base$j2se_name/bin $oracle_bin_jdk
+if [[ -f "$jvm_base$j2se_name/bin/java" ]]; then
+    update-alternatives --install "/usr/bin/java$j2se_release" "java$j2se_release" "$jvm_base$j2se_name/bin/java" $j2se_priority
+fi
 # No plugin for ARM architecture yet
 if [ "${DEB_BUILD_ARCH:0:3}" != "arm" ]; then
@@ -148,6 +152,8 @@ fi
 remove_alternatives $jvm_base$j2se_name/jre/lib $oracle_jre_lib_hl
 remove_alternatives $jvm_base$j2se_name/bin $oracle_bin_jdk
+update-alternatives --remove "java$j2se_release" "$jvm_base$j2se_name/bin/java"
 # No plugin for ARM architecture yet
 if [ "${DEB_BUILD_ARCH:0:3}" != "arm" ]; then
diff --git a/lib/ b/lib/
index 3958ea7..fcc2287 100644
--- a/lib/
+++ b/lib/
@@ -96,6 +96,10 @@ install_alternatives $jvm_base$j2se_name/bin $oracle_jre_bin_jre
 install_no_man_alternatives $jvm_base$j2se_name/bin $oracle_no_man_jre_bin_jre
 install_no_man_alternatives $jvm_base$j2se_name/lib $oracle_jre_lib_hl
+if [[ -f "$jvm_base$j2se_name/bin/java" ]]; then
+    update-alternatives --install "/usr/bin/java$j2se_release" "java$j2se_release" "$jvm_base$j2se_name/bin/java" $j2se_priority
+fi
 for b in $browser_plugin_dirs;do
     install_browser_plugin "/usr/lib/\$b/plugins" "" "\$" "\$plugin_dir/"
@@ -114,6 +118,8 @@ remove_alternatives $jvm_base$j2se_name/bin $oracle_jre_bin_jre
 remove_alternatives $jvm_base$j2se_name/bin $oracle_no_man_jre_bin_jre
 remove_alternatives $jvm_base$j2se_name/lib $oracle_jre_lib_hl
+update-alternatives --remove "java$j2se_release" "$jvm_base$j2se_name/bin/java"
 for b in $browser_plugin_dirs;do
     remove_browser_plugin "\$" "\$plugin_dir/"

And the last part regarding java-package was to execute “/etc/oracle-java/postinst.d/apt-auto-removal” after installation:

diff --git a/lib/ b/lib/
index fcc2287..ebebb1f 100644
--- a/lib/
+++ b/lib/
@@ -104,6 +104,10 @@ plugin_dir="$jvm_base$j2se_name/lib/$DEB_BUILD_ARCH"
 for b in $browser_plugin_dirs;do
     install_browser_plugin "/usr/lib/\$b/plugins" "" "\$" "\$plugin_dir/"
+if [ -d "/etc/oracle-java/postinst.d" ]; then
+    run-parts --report --exit-on-error --arg=$j2se_vendor-java${j2se_release}u$j2se_update-jre /etc/oracle-java/postinst.d
+fi

apt-auto-removal and APT::NeverAutoRemove

To generate the “APT::NeverAutoRemove” list, we’ve taken the “apt-auto-removal” script from the apt package and modified it to support oracle-java packages:

#!/bin/sh
set -e

# Author: Pim van den Berg <>
# This is a modified version of the /etc/kernel/postinst.d/apt-auto-removal
# script from the apt package to mark kernel packages as NeverAutoRemove.
# Mark as not-for-autoremoval those oracle-java packages that are currently in use.
# We generate this list and save it to /etc/apt/apt.conf.d instead of marking
# packages in the database because this runs from a postinst script, and apt
# will overwrite the db when it exits.

eval $(apt-config shell APT_CONF_D Dir::Etc::parts/d)
test -n "${APT_CONF_D}" || APT_CONF_D="/etc/apt/apt.conf.d"
config_file="${APT_CONF_D}/01autoremove-oracle-java"

eval $(apt-config shell DPKG Dir::bin::dpkg/f)
test -n "$DPKG" || DPKG="/usr/bin/dpkg"

if [ ! -e /bin/fuser ]; then
	echo "WARNING: /bin/fuser is missing, could not generate reliable $config_file"
	# assumption: leave any existing list untouched and stop here
	exit 0
fi

# collect the packages that own a java binary which is currently in use
java_versions=""
for java_binary in /usr/lib/jvm/*/bin/java; do
	if /bin/fuser $java_binary > /dev/null 2>&1; then
		java_versions="$java_versions
$(dpkg -S $java_binary | sed 's/: .*//')"
	fi
done

versions="$(echo "$java_versions" | sort -u | sed -e 's#\.#\\.#g' )"

generateconfig() {
	cat <<EOF
// DO NOT EDIT! File autogenerated by $0
APT::NeverAutoRemove
{
EOF
	for version in $versions; do
		echo "   \"^${version}$\";"
	done
	echo '};'
}
generateconfig > "${config_file}.dpkg-new"
mv "${config_file}.dpkg-new" "$config_file"

The “java-auto-removal” script will go through all “/usr/lib/jvm/*/bin/java” files and check whether they are in use, using the “/bin/fuser” command. When in use, the package the java binary is part of will be added to the “APT::NeverAutoRemove” list. This list will be written to /etc/apt/apt.conf.d/01autoremove-oracle-java.
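To get a feeling for what ends up in that file, the sketch below emits a list for a given set of package names. It mirrors the layout apt uses for its kernel list and escapes dots the same way the script does; the function name is made up and the package name is just an example.

```shell
# Emit an APT::NeverAutoRemove block for the package names given as arguments.
never_autoremove_list() {
    echo 'APT::NeverAutoRemove'
    echo '{'
    for pkg in "$@"; do
        # escape dots so the name is matched literally as a regex
        escaped=$(printf '%s' "$pkg" | sed -e 's#\.#\\.#g')
        printf '   "^%s$";\n' "$escaped"
    done
    echo '};'
}

# Example: one JRE version is still in use by a running Mendix App
never_autoremove_list oracle-java8u40-jre
```

With this list in place, “apt-get autoremove” leaves oracle-java8u40-jre alone until no process uses it anymore and the list is regenerated.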

Great improvement 😀

That’s it. We are now able to upgrade Oracle Java while the Mendix App keeps running. Once the Mendix App is stopped and then started by the customer, it will start to use the new version of Java. Once another new Oracle Java update is installed or the “java-auto-removal” script is run, the “APT::NeverAutoRemove” list is updated. After that the Oracle Java version that was in use by the Mendix App before it stopped can be “AutoRemoved”. 😀

Installing elementary OS on my late 2006 MacBook 2,1

In January 2007 I bought my first OS X device, a white 13.3-inch MacBook, running OS X 10.4 Tiger on a 2.0 GHz Intel Core 2 Duo processor with 1GB of RAM. Along the way I upgraded it to 2GB RAM and gave it a fantastic boost by replacing the HDD with an Intel 320 Series SSD. I also upgraded OS X to 10.5 Leopard, 10.6 Snow Leopard and in the end to 10.7 Lion. Around the same time (2008) YouTube started offering 720p HD videos. Now almost all videos are available in 720p or higher. What always frustrated me a bit is that this MacBook wasn’t fully capable of playing 720p YouTube videos. They were watchable, but with annoying frame drops here and there.

Lately I stumbled upon this “Linux Sucks” YouTube video, which showed the enormous growth of elementary OS in the past years. I was interested. Normally I use Debian with i3 or sometimes Gnome3, but I wanted to try this lightweight Ubuntu-based OS as a replacement for OS X on my MacBook.

I’d like to explain how I installed elementary OS on my MacBook including full disk encryption.

Creating a MacBook compatible bootable USB stick

First of all I downloaded the 32-bit ISO of the latest elementary OS release (Freya). To be able to boot this ISO from a USB stick on a MacBook, you have to create a FAT32-formatted USB stick which contains an EFI/BOOT/ folder with 2 files in it:

Installing elementary OS

Now you can boot from the USB stick. After a minute or 2 the live CD is started.


Start the installation by clicking the bottom-right CD icon and follow the wizard.


When you get a question about unmounting /dev/sdb, just say “No”. /dev/sdb is your USB device. At the “Installation type” screen choose “Something else”.


The interesting part starts now. This screen shows the partition layout of the recognized devices. /dev/sda here is the internal hard drive. In my case it also says INTEL SSD at the bottom. Again ignore /dev/sdb, this is your USB device.

When OS X is installed you have a couple of partitions on the internal hard drive:

  • /dev/sda1: EFI partition, required for booting
  • /dev/sda2: HFS+ partition containing Mac OS X
  • /dev/sda3: An optional ~650MB recovery partition (since OS X 10.7 Lion)

Remove /dev/sda2 and /dev/sda3. Now create 2 new partitions on /dev/sda:

  • a 256MB ext2 partition, this will be the /boot partition
  • fill up the rest with a partition that will be used as “physical volume for encryption”


The installer tries to be smart now, by marking sda3_crypt to be formatted as ext4. Change this partition and configure it to not format it. After that “Quit” the installer.

We just quit the installer, because we want to create 3 logical volumes in the encrypted sda3_crypt for the root partition (/), swap partition and home partition (/home) using LVM2. It is not possible to configure this via the installer.


Open the terminal App. As you can see /dev/sda3 is encrypted and referred to as /dev/mapper/sda3_crypt. Now execute the pvcreate, vgcreate and lvcreate commands. We’ll create a 10GB root partition, 4GB swap partition and the rest is for the home partition. You’ll see it also created some device symlinks in /dev/mapper.
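The commands boil down to something like the sketch below. The volume group name “apple” is inferred from the /dev/mapper/apple-* names the installer shows later; the function itself is made up and only prints the commands, so nothing is touched until you run the lines yourself as root.

```shell
# Print the LVM commands for: 10GB root, 4GB swap, the rest for home.
lvm_plan() {
    vg=$1
    pv=$2
    echo "pvcreate $pv"
    echo "vgcreate $vg $pv"
    echo "lvcreate -L 10G -n root $vg"
    echo "lvcreate -L 4G -n swap $vg"
    echo "lvcreate -l 100%FREE -n home $vg"
}

lvm_plan apple /dev/mapper/sda3_crypt
```

After running the printed commands, /dev/mapper should contain apple-root, apple-swap and apple-home.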

Now run through the installer once again. At the “Installation type” screen choose “Something else” once again.

Now you see that the installer sees the partitions we just created. Configure the partitions:

  • /dev/mapper/apple-home: btrfs partition mounted at /home
  • /dev/mapper/apple-root: ext4 partition mounted at /
  • /dev/mapper/apple-swap: swap partition
  • /dev/sda2: ext2 partition mounted at /boot


Continue by clicking “Install Now” and click “Continue” to confirm writing the changes to disk.


Follow the wizard until it starts installing elementary OS and wait for a while.


It will fail to install the bootloader. Continue without a bootloader. After that the installation from the wizard is complete. Choose “Continue Testing” here.


Now we have to fix the bootloader manually. Open the Terminal App. Mount the required partitions and chroot into the new elementary OS installation. Install grub-efi-ia32 and run grub-install. Copy a file to /boot/grub/locale/ and run grub-mkconfig to generate the grub configuration.


To make the initial RAMdisk (/boot/initrd.img) aware that /dev/sda3 is an encrypted partition, put the desired configuration in /etc/crypttab and update the initial RAMdisk.
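A crypttab line has four fields: the mapped name, the source device, a key file and options. The sketch below builds such a line; the function name is made up, the UUID is a placeholder, and “none luks” (ask for the passphrase at boot) is my assumption of a sensible default rather than the exact line I used.

```shell
# Build a crypttab line: <mapped name> <source device> <key file> <options>
crypttab_entry() {
    echo "$1 UUID=$2 none luks"
}

# Placeholder UUID; the real one comes from: blkid -s UUID -o value /dev/sda3
crypttab_entry sda3_crypt 00000000-0000-0000-0000-000000000000
```

Inside the chroot, append the resulting line to /etc/crypttab and run “update-initramfs -u” to rebuild the initial RAMdisk.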

You can now reboot the MacBook. The funny thing is that elementary OS is snappier than OS X on this MacBook. And it now plays 720p videos flawlessly. 😀