Description

I knocked together this webcam software a few years ago to run on an RPi with a fixed desktop webcam, when a similar package disappeared from Ubuntu. Recently (since late 2018), I experimented with some motion detection. No great research done — I'd just tried ZoneMinder, and wanted to reduce the number of false alarms, particularly from waving leaves and passing clouds. I'm not claiming an original algorithm, nor criticizing ZoneMinder. These are the results.

This software is not integrated with ZoneMinder at this time. I also found it to be a bit too intensive to run on an old RPi. Seems okay at about 8-9fps on an RPi4.

Function

Images are captured using streamer. It continuously captures 1000 JPEGs to individual files numbered 000 to 999 at (say) 10 frames per second. inotifywait detects these files being closed, and optionally delivers them to jpegtran to rotate them 180° if necessary.

To serve on the Web, a CGI script uses inotify to detect the potentially rotated images, and generates a never-ending multipart sequence of JPEGs. A native C program is used to avoid calling subprocesses like inotifywait, which get left behind when the client terminates the connection, because Apache sends SIGKILL to terminate the CGI script.
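To illustrate the stream format only (the real server is the native C program mentioned above, not a shell script), here is a minimal sketch of an NPH response carrying a multipart/x-mixed-replace sequence. The boundary string frame and the function names emit_header and emit_part are assumptions made for this example.

```shell
#!/bin/sh
# Illustrative only: BOUNDARY and the function names are assumptions,
# not taken from the stecam sources.
BOUNDARY=frame

# An NPH script emits the HTTP status line itself, then the headers.
emit_header() {
    printf 'HTTP/1.0 200 OK\r\n'
    printf 'Content-Type: multipart/x-mixed-replace; boundary=%s\r\n\r\n' "$BOUNDARY"
}

# Emit one JPEG file ($1) as one part of the never-ending sequence.
emit_part() {
    printf '%s%s\r\n' '--' "$BOUNDARY"
    printf 'Content-Type: image/jpeg\r\n'
    printf 'Content-Length: %d\r\n\r\n' "$(( $(wc -c < "$1") ))"
    cat "$1"
    printf '\r\n'
}
```

A serving loop would call emit_header once, then emit_part for each new image as it appears.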

For the motion detection, inotifywait is used once again to pick up the latest images, then each image is reduced by ImageMagick convert, and submitted to a small C program.

Motion-detection algorithm

Image reduction
Each frame is stripped of colour, and scaled to a low resolution. The contrast is also normalized.

Stripping the colour simplifies the rest of the algorithm. The pixelization eliminates motion too subtle to cross a pixel boundary. Contrast normalization should help to maintain sensitivity in different light levels; not sure how effective this is.
Elimination of cyclic motion
By default, the last 51 reduced images are retained, so that a running average of the first 50 of those frames can be kept. A running average is kept of the most recent images too, but by default that's just one image anyway.

The averaging of recent frames should help to avoid cyclic changes from triggering detection, like waves, or flapping leaves.
Pixel comparison
One average image is subtracted from the other, and each pixel in the result is compared with the one to its right, and the three below it. Each comparison yields a magnitude of zero if the pixels have the same sign, or the negative product of the pixels otherwise (yielding a positive value). For each cell (except those at the edge), a vector is computed as the sum of vectors to each of the eight neighbours, with each contributing vector scaled by the magnitude yielded by the comparison of the corresponding pixels.

The aim is to detect motion as a darkening of one pixel adjacent to one that brightened. A sudden brightening or darkening of the whole image (as a cloud passes or a light turns on) should not by itself yield a vector of high magnitude.
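The comparison can be sketched for a single interior cell as follows. This is a toy illustration in shell, not the C program itself; the function names mag and cell_vector are invented here, and the direction vectors to diagonal neighbours are left unnormalised, which is an assumption.

```shell
#!/bin/sh
# Magnitude of one comparison: zero if the two differences share a
# sign (or either is zero), otherwise the negated product.
mag() {
    if [ $(( $1 * $2 )) -lt 0 ]; then
        echo $(( -($1) * ($2) ))
    else
        echo 0
    fi
}

# Vector for the centre of a 3x3 neighbourhood of differences given
# in row-major order (nine arguments; the fifth is the centre cell).
# Assumption: offsets to neighbours are used unnormalised.
cell_vector() {
    c=$5
    vx=0
    vy=0
    i=0
    for p in "$@"; do
        dx=$(( i % 3 - 1 ))
        dy=$(( i / 3 - 1 ))
        i=$(( i + 1 ))
        if [ "$dx" -eq 0 ] && [ "$dy" -eq 0 ]; then
            continue
        fi
        m=$(mag "$c" "$p")
        vx=$(( vx + dx * m ))
        vy=$(( vy + dy * m ))
    done
    echo "$vx $vy"
}

# Centre brightened (+5), left neighbour darkened (-4), right
# neighbour brightened (+3): only the left neighbour contributes.
cell_vector 0 0 0  -4 5 3  0 0 0    # prints "-20 0"
```

If every cell changes in the same direction, as when a cloud passes, all magnitudes are zero and so is the vector.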
Motion motion
The vector of each cell is compared with the previous vectors of its adjacent cells by taking the dot product of it with each of them. The dot products are summed to give a score for that cell. The mean and standard deviation of all dot-product sums are computed, and a motion score for the entire image is taken as a half of the mean and an eighth of the deviation.

This step aims to avoid false positives due to auto-exposure, as it expects genuine motion itself to move. Auto-exposure as the sky brightens could lead to shadows darkening, and the contrast between the shadow and the lit area could be mistaken for motion, but this ‘motion’ will appear in the same place in adjacent frames.
Recording
The motion score generated after each image submission is compared to a threshold. If not recording, and the score stays above the threshold, a counter increases; otherwise, it decreases. If the counter reaches a threshold of its own, recording starts. In other words, recording starts when the threshold is attained for long enough, but a failure to meet the threshold does not immediately undo earlier detection.

If recording, and the score is above a threshold, the counter is reset to a fixed level, e.g., 10; otherwise, it is decremented. If the counter reaches zero, recording stops. In other words, recording lingers until there is positively no motion for a fixed number of frames.

Recording is achieved (at the point that recording is deemed to have stopped) by submitting the most recent frames (from the point that recording is deemed to have started) to FFmpeg.
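The start/stop logic described above can be sketched as a small state machine. This is a simplified illustration, not the actual implementation; it borrows the THRESHOLD, HESITATE and LINGER names from the configuration described later, uses a single threshold, and assumes that the counter never drops below zero while idle and that "above" means strictly greater.

```shell
#!/bin/sh
THRESHOLD=400   # score above which a frame counts as motion
HESITATE=4      # counter level at which recording starts
LINGER=10       # counter level restored by motion while recording

recording=no
counter=0

# Feed one motion score into the state machine.
step() {
    if [ "$recording" = no ]; then
        if [ "$1" -gt "$THRESHOLD" ]; then
            counter=$(( counter + 1 ))
        elif [ "$counter" -gt 0 ]; then
            counter=$(( counter - 1 ))    # assumed floor of zero
        fi
        if [ "$counter" -ge "$HESITATE" ]; then
            recording=yes
            counter=$LINGER
        fi
    else
        if [ "$1" -gt "$THRESHOLD" ]; then
            counter=$LINGER
        else
            counter=$(( counter - 1 ))
        fi
        if [ "$counter" -le 0 ]; then
            recording=no
            counter=0
        fi
    fi
}

for s in 500 500 300 500 500 500; do
    step "$s"
    echo "score=$s counter=$counter recording=$recording"
done
```

With these scores, the single dip to 300 only sets the counter back by one, and recording starts on the sixth score.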

Here's a demonstration of the early parts of the algorithm applied to a source sequence originally captured as a motion event by ZoneMinder, with the original image in the top left, its scaled and grey version bottom left, the average of 5 adjacent frames in the bottom right, and the subtraction of adjacent, non-overlapping averages in the top right:

In the top-right quarter, pixels lighter than mid-grey that are adjacent to pixels darker than mid-grey are considered motion. The contrast has been exaggerated to make the biggest indicators of motion stand out. Although lots of differences can be seen over much of the image, very few brightenings are found adjacent to darkenings.

Here's the script I used for the demonstration, heavily relying on ImageMagick:

# Source frames are numbered 00001-capture.jpg to 00055-capture.jpg.
NF=55
# Merge every 5 adjacent frames.
AV=5

# Create small greyscales of each image.
for i in $(seq 1 $NF)
do
    convert "$(printf '%05d-capture.jpg\n' $i)" -scale 16x12 \
            -set colorspace Gray -separate -average -depth 8 \
            "$(printf '%05d-reduced.pgm\n' $i)"
done

# Average adjacent images.
for i in $(seq 1 $((NF+1-AV)))
do
    convert $(printf '%05d-reduced.pgm\n' $(seq $i $((i+AV-1)))) \
            -average $(printf '%05d-average.pgm\n' $((i+AV-1)))
done

# Compare adjacent, non-overlapping average images.  Contrast
# has been exaggerated.
for i in $(seq $AV $((NF-AV)))
do
    convert \( $(printf '%05d-average.pgm\n' $((i+AV))) \
                $(printf '%05d-average.pgm\n' $i) \
                -compose subtract -composite \) \
            \( -size 16x12 canvas:gray50 \) \
            -compose ModulusAdd -composite \
            -brightness-contrast 0x90 \
            $(printf '%05d-diff.pgm\n' $((i+1)))
done

# Make a montage of each high-contrast comparison and
# related sources.
for i in $(seq $((AV+1)) $((NF+1-AV)))
do
    convert \( \( $(printf '%05d-capture.jpg\n' $i) -scale 320x240 \) \
            \( $(printf '%05d-reduced.pgm\n' $i) -scale 320x240 \) \
            -append \) \
            \( \( $(printf '%05d-diff.pgm\n' $i) -scale 320x240 \) \
            \( $(printf '%05d-average.pgm\n' $((i+AV-1))) \
            -scale 320x240 \) -append \) \
            +append $(printf '%05d-montage.jpg\n' $i)
done

Installation

The software needs Binodeps to build:

git clone https://github.com/simpsonst/binodeps /tmp/binodeps
cd /tmp/binodeps
make && sudo make install

Build and install the software:

git clone https://github.com/simpsonst/stecam
cd stecam
cat > stecam-env.mk <<EOF
CFLAGS += -O2 -g
CFLAGS += -std=gnu11
CPPFLAGS += -D_XOPEN_SOURCE=600
CPPFLAGS += -D_GNU_SOURCE=1
CPPFLAGS += -pedantic -Wall -W -Wno-unused-parameter
CPPFLAGS += -Wno-missing-field-initializers
EOF
sudo make install-apt
make && sudo make install

Configuration goes in /etc/stecam.d. Another target creates this directory, and installs a systemd service:

sudo make install-systemd

Each file matching /etc/stecam.d/*.conf defines an instance of the stecam service, e.g., /etc/stecam.d/foo.conf defines stecam@foo. The file is sourced by Bash to specify the settings for a given camera. The defaults are specified in /usr/local/share/stecam/defaults.sh.
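As a rough illustration, a configuration for stecam@foo might look like the following. The values are simply the documented defaults, apart from ROTATION, which is set here for the sake of the example; the parameters themselves are described in the sections below.

```shell
# /etc/stecam.d/foo.conf (hypothetical example; defines stecam@foo)

# Image capture
DEVICE=/dev/video0
RATE=10
CAPDIMS=640x480

# Image correction: camera mounted upside-down (example value)
ROTATION=180

# Motion detection
DETDIMS=16x12
THRESHOLD=400
HESITATE=4
LINGER=10

# Recording
MOVDIR=/var/spool/stecam
PREFIX=motion
```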

Image capture

CAPDIR=/var/run/stecam/capture: Directory for deposit of JPEGs
Note that CAPDIR is unconditionally set to /run/stecam-foo/capture when running as a systemd unit instance stecam@foo.
DEVICE=/dev/video0: The device to capture from
RATE=10: The capture rate for use with streamer
CAPDIMS=640x480: Image size to capture with streamer

If DEVICE begins with /, it is taken to be a v4l device path. streamer -r $RATE -f jpeg will be used to capture images, dropping them into the capture directory.

If DEVICE begins with rtsp:, it is passed to FFmpeg. This is known to work with at least one commercial IP camera.

Otherwise, it is assumed to be the URL of a multipart/x-mixed-replace stream of image/jpeg, fetched using curl with a custom script to drop each image into the capture directory. Use this to hook into an on-line webcam, for example:

DEVICE=http://webcam.local/videostream.cgi

If you have to include credentials, you could embed them in the URL, as in http://username:password@webcam.local/videostream.cgi, but you'd have to make sure the configuration file is readable only by root. Alternatively, because curl is invoked with --netrc-optional, you can place credentials in ~/.netrc. For example:

machine webcam.local login username password password

Make sure ~/.netrc is not world- or group-readable.

Image correction

ROTATION (unset): Image rotation in degrees (0, 90, 180, 270), to be applied after capture
LIVEDIMS (unset): Dimensions of the live stream
Set to wxh to scale down the image for live streaming, allowing recording at a higher resolution.
WORKDIR=/var/run/stecam/work: Directory for modified images
Note that WORKDIR is unconditionally set to /run/stecam-foo/work when running as a systemd unit instance stecam@foo.

Motion-detection parameters

DETDIMS=16x12

Size of image for motion analysis

Unset to disable detection.

DETRATE=$RATE

The average rate of frames submitted to detection

This should not be more than $RATE. It may also be a rational value, e.g., 1/2 to use one frame every two seconds. When a recording is made, it will use all frames captured over the detected period. By setting DETRATE to a lower value, you can reduce the processing overhead, while still recording at the full capture rate.
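To see what an average rate means for a rational DETRATE, here is one way the frame selection could be done, using an accumulator. This is an assumed scheme for illustration, not necessarily how stecam does it, and the function name frames_submitted is made up.

```shell
#!/bin/sh
# usage: frames_submitted RATE DETRATE NFRAMES
# Prints how many of NFRAMES captured frames would go to detection.
frames_submitted() {
    rate=$1
    detrate=$2
    total=$3
    # Split a rational such as 1/2 into numerator and denominator.
    case $detrate in
        */*) num=${detrate%/*}; den=${detrate#*/} ;;
        *)   num=$detrate; den=1 ;;
    esac
    acc=0
    submitted=0
    frame=0
    while [ "$frame" -lt "$total" ]; do
        frame=$(( frame + 1 ))
        # Each captured frame is worth num/(den*rate) detection frames.
        acc=$(( acc + num ))
        if [ "$acc" -ge $(( den * rate )) ]; then
            acc=$(( acc - den * rate ))
            submitted=$(( submitted + 1 ))
        fi
    done
    echo "$submitted"
}

frames_submitted 10 1/2 100   # 10 s of capture at 10 fps: prints 5
frames_submitted 10 5 100     # every second frame: prints 50
```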

DETDIR=/var/run/stecam/detect

Deposit for images to be motion-analyzed

Note that DETDIR, like CAPDIR and WORKDIR, is unconditionally set to a directory under /run/stecam-foo (by analogy, /run/stecam-foo/detect) when running as a systemd unit instance stecam@foo.

POWER=9

Base-10 power of scale factor on score

Increase to get more digits on the over-all score. If you increase this by 1, multiply the threshold by 10 to compensate.

GATHER=1

MERGE=50

The sum of these is the number of recent reduced images to retain for averaging before comparison. $GATHER identifies the most recent images, so 51 images are retained by default, and the most recent is compared with the average of the others. A high $MERGE should help to deal with repeating oscillations of leaves and branches. Cells which experience high standard deviation contribute less to the over-all difference between two averaged images.

THRESHOLD=400

Threshold above which analysis score implies motion

You can also set a range, e.g., THRESHOLD=100-150. The higher number will be used to trigger recording, i.e., it's the start threshold. The lower is compared against while recording, i.e., the stop threshold.
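Splitting such a setting into its two thresholds can be sketched with plain parameter expansion. The function name parse_threshold and the START and STOP variables are invented for this example, and accepting the two numbers in either order is an assumption.

```shell
#!/bin/sh
# Set START (trigger) and STOP (while recording) from a THRESHOLD
# value that is either a single number or a range such as 100-150.
parse_threshold() {
    case $1 in
        *-*) a=${1%-*}; b=${1#*-} ;;
        *)   a=$1; b=$1 ;;
    esac
    # The higher number starts recording; the lower one stops it.
    if [ "$a" -gt "$b" ]; then
        START=$a
        STOP=$b
    else
        START=$b
        STOP=$a
    fi
}

parse_threshold 100-150
echo "start=$START stop=$STOP"    # prints "start=150 stop=100"
```

A gap between the two values gives hysteresis: a score hovering around a single threshold would otherwise start and stop recordings repeatedly.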

HESITATE=4

Number of adjacent over-threshold analyses before recording starts

LINGER=10

Number of under-threshold frames before stopping recording

OVERLAY

Filename of a PNG to merge into the detection image to disable sensitivity in certain areas

The overlay will be scaled to fit across the image just before it is scaled down and decoloured. Solid areas will be blanked out, so they won't contribute to motion detection. Use this feature to mask out wavy tree branches, etc.

Note that the mask is not applied to live images or recorded video.

Looks like this actually makes things worse.

To help calibrate these parameters, you can experiment by running:

sudo stecam-capture -f /etc/stecam.d/main.conf

It will start capturing images and performing motion detection, and print out the score for each frame. It will also say when it would be recording, without actually doing it. If the score remains at zero despite activity, increase POWER gradually until you do see a response.

Recording configuration

RECORD=yes

Whether to save recordings at all; clear it to disable recording

MOVDIR=/var/spool/stecam

PREFIX=motion

SUFFIX

CHOWN (unset)

DATEFMT="%Y-%m-%dT%H-%M-%S.%3N%z"

Where to store recordings, and how to name them

Files are saved as $MOVDIR/$PREFIX-$DATE$SUFFIX.mp4, where DATE is set to the output of date "+$DATEFMT". If $CHOWN is set, the file is owned by the specified user.
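Put together, the naming rule amounts to something like the following sketch. It uses the default values listed above with an empty SUFFIX; note that the %3N (milliseconds) specifier in the default DATEFMT is a GNU date extension.

```shell
#!/bin/sh
MOVDIR=/var/spool/stecam
PREFIX=motion
SUFFIX=
DATEFMT="%Y-%m-%dT%H-%M-%S.%3N%z"

# DATE is the current time rendered with DATEFMT.
DATE=$(date "+$DATEFMT")
FILE="$MOVDIR/$PREFIX-$DATE$SUFFIX.mp4"
echo "$FILE"
```

With GNU date this gives a name of the form /var/spool/stecam/motion-YYYY-MM-DDTHH-MM-SS.mmm+ZZZZ.mp4, which sorts chronologically and contains no colons.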

Expose $MOVDIR on your webserver to make motion captures available. You might also want to run a cronjob to clear out old recordings. For example, to delete videos at 0417 each day that have not been accessed in the last 90 days:

17 4 * * * find /var/spool/stecam/ -name "motion-*.mp4" -atime +90 -delete

FFMPEG_OUT=(-vcodec copy)

Codec parameters for FFmpeg on motion-capture file

Browsers might not be able to play the MJPEG videos created by default, so use a different codec, at the risk of some extra overhead:

FFMPEG_OUT=(-vcodec libx264)

You might have to add other parameters to get the format right, or conserve the quality:

FFMPEG_OUT=(-vcodec libx264 -vf "format=yuv420p" -qscale:v 2 -preset:v slow -tune stillimage)

FONT=Courier

GRAVITY=NorthWest

POINTSIZE=24

Typeface for a label overlain onto the recording showing timestamp and detection score; overlay position; font size

Choose a monospaced font to avoid jittery characters. Get a list of fonts with:

identify -list font | grep '^  Font' | sort | less

See ImageMagick - Command-line options for valid GRAVITY settings.

EMAIL_FROM

EMAIL_TO

PUBPREFIX

Information to build notification emails

All three parameters must be set. $PUBPREFIX must be the URI prefix of files in $MOVDIR, and the leafname derived from $DATEFMT is suffixed to it to form a link to the new recording in the body of the email. Make sure it ends in a slash. $EMAIL_FROM sets the From: header, and $EMAIL_TO specifies the recipient.

Email is sent using sendmail. Make sure you have set this up with the necessary configuration and credentials, without turning the host into an open relay.

The message will contain a header X-Stecam-Capture-Time:, which you can detect with procmail or similar, if you need to do some filtering on it.

Enabling and starting the service

After setting configuration, you can enable and start capture of a particular camera with:

sudo systemctl enable --now stecam@main

Live video

To serve a camera's live feed, create an NPH CGI script such as the following:

#!/bin/sh

exec stecam-serve -f /etc/stecam.d/main.conf

Make sure the script's name is prefixed with nph- to make it an NPH script.

Security

No attempt is made to secure motion captures or live feeds. That's up to you.

To do…

Allow cameras to trigger each other to record. This should be easier now that frames are indexed by timestamp rather than by sequence number, which is accomplished by building a movie from filenames and durations.
Maybe apply aggressive contrast normalization so that night shots are as likely to yield positives as day shots. Right now, it seems that dark images just don't produce the necessary contrast to trigger positives.
Maybe retain colour for detection, treating each colour as a 3D vector from black (or white?), and computing the angle between two vectors as the difference. That could be problematic for infrared images that lack much colour variation.