Postman neural network classifier

Making a smart home less dumb

How smart is a smart home, really?

So my home has a sensor that tells me when mail has been posted, and a camera pointed at the doorstep to capture whoever is posting it. When mail is posted, it captures a picture and sends it to me in a Telegram message captioned “You’ve got mail!”:

You've got mail

Which means I can check whether it’s either: a) the postman - worth going to get the mail, or b) a junk mailer - safe to leave on the door mat. It’s neat, but not exactly ‘smart’.

What if the house identified the person posting and then decided to tell me whether it’s worth going to get the mail?

Sounds like a problem for a neural network!

Postman training

I’d collected a snapshot every time mail was posted or the doorbell was rung over the last six months or so - this amounted to 500 images. I manually went through them, dropping each into one of two folders: ‘positive’ (postie) and ‘negative’ (everyone else).

Sounds tedious, but with large thumbnails in a file browser you can zip through them in no time at all. The classes split into 171 positive / 329 negative examples.

I then randomly split these into an 80% training set and a 20% validation set.
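The split itself is simple enough to sketch in a few lines of Python - the filenames below are placeholders standing in for the 500 collected snapshots:

```python
import random

# Placeholder filenames standing in for the 500 collected snapshots
filenames = [f"snapshot_{i:03d}.jpg" for i in range(500)]

# Shuffle, then take the first 80% for training, the rest for validation
random.seed(42)  # fixed seed just to make the sketch reproducible
random.shuffle(filenames)
split = int(len(filenames) * 0.8)
train_set, validation_set = filenames[:split], filenames[split:]
```

In practice the files then get moved into train/validation directories, each keeping the positive/negative subfolder structure so Keras can pick the labels up from the directory names.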

It’s not a huge number of images, but as it’s only a two-class problem, hopefully it will suffice. As an observation, the positive examples tend to be distinctive: our postie always carries a standard-issue red Royal Mail bag, and wears either a light blue shirt or a reflective overall - so this is what we’re hoping the neural network learns to spot.

Incidentally, our postie is usually the same lovely chap, but when he’s off we’ll get someone else covering, so hopefully there’s enough variety in the training data that the network doesn’t overfit to match only our usual postie!

Using Keras

I went about training a convolutional neural network model with the great Python library Keras (TensorFlow backend).

The input to the neural network was the training data as a tensor, downscaled to 168x95 - roughly preserving the aspect ratio of the full-size HD frames from the camera to minimise quality loss.

The neural network was three convolution/max-pooling layers, followed by a 32-unit dense layer and finally a single-unit dense layer with sigmoid activation for the output. Since it’s a two-class problem, it was fitted using the binary cross-entropy loss function.

There was also a dropout layer before the final dense layer to help prevent overfitting.

Defined as follows:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

img_width, img_height = 168, 95
input_shape = (img_height, img_width, 3)

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.5))  # dropout before the final layer, as described above
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

The input images were augmented by rescaling by 1/255 (i.e. to the 0-1 range) and randomly zooming by up to 5%.

I used a batch size of 16 and trained for 25 epochs. The final model reached an accuracy of 95% on the validation set - pretty happy with that result!
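With Keras this augmentation is typically expressed as ImageDataGenerator configuration - a minimal sketch of what the setup above might look like (the exact generator arguments are my reconstruction, not lifted from the original code):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

img_width, img_height = 168, 95

# Training images: rescale pixel values from 0-255 down to 0-1,
# plus random zooms of up to 5%
train_datagen = ImageDataGenerator(rescale=1. / 255, zoom_range=0.05)

# Validation images only get the rescaling - no augmentation,
# so validation accuracy reflects unmodified inputs
validation_datagen = ImageDataGenerator(rescale=1. / 255)
```

Each generator would then be pointed at its image directory via flow_from_directory with target_size=(img_height, img_width), batch_size=16 and class_mode='binary', and the model fitted for 25 epochs.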


Deployment

The next challenge was deploying the model - most of my home automation runs on a relatively lowly SBC (single board computer - an Odroid C2). Like the Raspberry Pi, this runs an ARM processor, and TensorFlow support for ARM is lacking - it’s hard to find up-to-date pre-built binaries, and I didn’t really fancy compiling TensorFlow from source on this system.

In the end I deployed the classifier, wrapped in a simple REST API, onto the AWS Lambda serverless platform. This has sufficient grunt, and as Lambda runs on Intel processors, TensorFlow is far better supported there. There’s still a bit of pain in getting the whole package under Lambda’s size limits, but that’s for another blog post.
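The shape of such a Lambda-hosted API can be sketched as below - this is illustrative only: classify is a stand-in for loading the Keras model and running predict, and the base64 body format is an assumption about how the snapshot arrives via API Gateway:

```python
import base64
import json

def classify(image_bytes):
    # Placeholder for the real work: decode the JPEG, downscale to
    # 168x95, rescale to 0-1 and run model.predict. Returns the
    # sigmoid output, i.e. the probability the image shows the postie.
    return 0.97

def handler(event, context=None):
    # Assumed event format: the snapshot arrives base64-encoded in the body
    image_bytes = base64.b64decode(event["body"])
    score = classify(image_bytes)
    return {
        "statusCode": 200,
        "body": json.dumps({"postie": score > 0.5, "score": score}),
    }
```

The home automation box then only needs to POST the snapshot and read back a yes/no, keeping all the TensorFlow heavy lifting off the ARM board.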

Final result