I recently needed to install 10 motion-detecting cameras on 5 doorways. My first thought was to use Ethernet cameras, but at \$300 each that gets expensive. My second plan was to use Linksys WRTSL54GS units running OpenWrt and cheap web cameras with the ‘motion’ package. That would sort of work, but it takes far more CPU than those little boxes have.
The problem is that all that JPEG decoding, motion detection on 640x480 pixels, and JPEG encoding is just too much computation. Fortunately we can be simpler.
Many cheap web cameras have JPEG encoders built in and deliver their images as JPEGs, though sometimes missing a critical table. By keeping a copy of the original camera data it is possible to avoid the JPEG encoding step and just insert the missing data table.
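As a rough illustration of the table-insertion step, here is a minimal Python sketch. It assumes the missing table is a Huffman table (a DHT segment), which cameras often omit but which is not stated above. The names `insert_table_if_missing` and `table_segment` are made up for this example, and the caller has to supply the complete segment bytes (marker, length, and payload).

```python
import struct

DHT = 0xC4  # "define Huffman table" marker code (assumed to be the missing one)
SOS = 0xDA  # "start of scan" marker code; entropy-coded data follows it


def insert_table_if_missing(jpeg: bytes, table_segment: bytes,
                            marker: int = DHT) -> bytes:
    """Return jpeg unchanged if it already has the table, else splice it in.

    The header segments are walked one by one. If the wanted marker shows up
    before the scan starts, nothing is changed. Otherwise the supplied
    segment is inserted immediately before the SOS marker.
    """
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        code = jpeg[pos + 1]
        if code == 0xFF:           # fill byte before a marker, skip it
            pos += 1
            continue
        if code == marker:         # table already present
            return jpeg
        if code == SOS:            # reached the scan without seeing the table
            return jpeg[:pos] + table_segment + jpeg[pos:]
        # Segment length is big-endian and counts its own two bytes.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        pos += 2 + length
    raise ValueError("no SOS marker found")
```

Because the original bytes are only spliced, never re-encoded, the cost is a header scan plus one copy of the frame.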
We can do better though. We can avoid most of the decoding and also reduce the data for motion detection by a factor of 64. The key is that the luminance of a JPEG file is encoded in 8x8 pixel blocks. Each 8x8 pixel block has 64 coefficients which are used to regenerate the 64 pixels of data (more on these coefficients later). The first coefficient is the ‘DC’ coefficient, which is proportional to the average luminance of the 8x8 block. This is outstanding for motion detection! We get a low pass filter, a factor of 64 data reduction, and all we have to do is skip a bunch of multiplication and addition in the decoding process.
With a process like the following, each tiny router can support two cameras at 640x480, in color, at 10 frames per second, with motion detection on 5 frames per second.
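To make the "DC coefficient is the block average" point concrete, here is a small Python sketch using the standard JPEG forward DCT definition. With that scaling, the DC term works out to 8 times the mean of the level-shifted (minus 128) samples. In an actual file the DC values are additionally quantized and stored as differences from the previous block, which this sketch leaves out. The function names are invented for the example.

```python
import math


def dct_coefficient(block, u, v):
    """Forward 8x8 DCT coefficient (u, v) with the usual JPEG scaling.

    block is an 8x8 list of 8-bit samples; they are level-shifted by 128
    before the transform, as baseline JPEG does.
    """
    cu = 1 / math.sqrt(2) if u == 0 else 1.0
    cv = 1 / math.sqrt(2) if v == 0 else 1.0
    total = 0.0
    for x in range(8):
        for y in range(8):
            total += ((block[x][y] - 128)
                      * math.cos((2 * x + 1) * u * math.pi / 16)
                      * math.cos((2 * y + 1) * v * math.pi / 16))
    return 0.25 * cu * cv * total


def dc_grid_size(width, height):
    """Number of 8x8 luminance blocks across and down (rounded up)."""
    return ((width + 7) // 8, (height + 7) // 8)


# A ramp block with samples 0..63: the mean is 31.5.
block = [[x * 8 + y for y in range(8)] for x in range(8)]
mean = sum(map(sum, block)) / 64
dc = dct_coefficient(block, 0, 0)      # equals 8 * (mean - 128)

cols, rows = dc_grid_size(640, 480)    # 80 x 60 blocks
reduction = (640 * 480) // (cols * rows)
```

For a 640x480 frame that leaves 4,800 values to compare instead of 307,200 pixels, which is the factor of 64.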
- Read JPEG image from camera over USB.
- Hold a copy of the image.
- Decode the image enough to extract the DC coefficients of the luminance channel.
- Compare to the last frame’s coefficients and decide if there is motion.
- If we are going to save the image, then insert the missing JPEG tables into the image if needed and write it out to storage (NFS in my case).
- If motion has stopped, then write a sentinel file so independent processes examining the image streams know that the motion event is complete.
- Repeat forever.
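The compare-and-decide part of the loop above could look something like the following Python sketch. It is an illustration under assumptions, not the code that runs on the routers: the thresholds, the `quiet_frames` hold-off before an event is declared finished, and the names `changed_blocks` and `MotionTracker` are all hypothetical. Each frame is represented as a flat list of luminance DC coefficients.

```python
def changed_blocks(prev_dc, curr_dc, block_threshold):
    """Count blocks whose DC value moved by more than block_threshold."""
    return sum(1 for p, c in zip(prev_dc, curr_dc)
               if abs(p - c) > block_threshold)


class MotionTracker:
    """Tracks one camera and says what to do with each frame.

    update() returns "save" while a motion event is in progress, "end" on
    the frame where the event is considered finished (the point to write
    the sentinel file), and "idle" otherwise. All defaults are guesses.
    """

    def __init__(self, block_threshold=24, min_blocks=12, quiet_frames=10):
        self.block_threshold = block_threshold
        self.min_blocks = min_blocks
        self.quiet_frames = quiet_frames
        self.prev_dc = None
        self.in_event = False
        self.quiet = 0

    def update(self, curr_dc):
        prev, self.prev_dc = self.prev_dc, curr_dc
        moving = (prev is not None and
                  changed_blocks(prev, curr_dc, self.block_threshold)
                  >= self.min_blocks)
        if moving:
            self.in_event = True
            self.quiet = 0
            return "save"
        if self.in_event:
            self.quiet += 1
            if self.quiet >= self.quiet_frames:
                self.in_event = False
                return "end"
            return "save"      # keep saving during the hold-off period
        return "idle"
```

Requiring several changed blocks, rather than one, is a cheap guard against sensor noise and flicker triggering an event.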
Astute readers will notice that I can only afford to motion check every other frame with the two camera setup. I’m not happy about this. Essentially all of the CPU time is used in the Huffman decoding of the JPEG data. A long time ago, in the age when VAX 8530s roamed my world and I was busy trying to move X-rays and MRIs to radiologists, I wrote a state-machine-based Huffman decoder that could process 3 Mbit/s of input data, the fastest we could get over Ethernet at the time. Those were 4 MIPS machines; these little routers run at something like 200 MHz. Each camera generates about 4 Mbit/s. I have high hopes that this will be doable.
I have other fish to fry and probably won’t get around to the faster Huffman code anytime soon. After I run these things for a week or so I’ll release the code in case anyone else wants to read it or use it. I attached the man page in case you wanted to peek at it.
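For readers who have not met the idea, here is a generic Python sketch of a table-driven, state-machine Huffman decoder. It is not the old VAX decoder and it is not JPEG-specific: it assumes a complete prefix-free code given as bit strings, consumes input four bits at a time, and ignores the extra magnitude bits that follow each symbol in a JPEG stream. All names are invented for the example.

```python
def build_tree(codes):
    """codes maps symbol -> bit string. Returns a list of [child0, child1].

    A child is either an int (index of another internal node) or a
    ("leaf", symbol) tuple. Node 0 is the root.
    """
    nodes = [[None, None]]
    for sym, bits in codes.items():
        node = 0
        for bit in bits[:-1]:
            b = int(bit)
            if nodes[node][b] is None:
                nodes.append([None, None])
                nodes[node][b] = len(nodes) - 1
            node = nodes[node][b]
        nodes[node][int(bits[-1])] = ("leaf", sym)
    return nodes


def build_state_table(nodes):
    """table[state][nibble] -> (next_state, symbols emitted by that nibble)."""
    table = []
    for start in range(len(nodes)):
        row = []
        for value in range(16):
            node, out = start, []
            for shift in (3, 2, 1, 0):
                child = nodes[node][(value >> shift) & 1]
                if child is None:
                    raise ValueError("code is not complete")
                if isinstance(child, tuple):
                    out.append(child[1])   # finished a symbol, back to root
                    node = 0
                else:
                    node = child
            row.append((node, tuple(out)))
        table.append(row)
    return table


def decode(table, data):
    """Decode bytes with two table lookups per byte and no per-bit branching."""
    state, out = 0, []
    for byte in data:
        for shift in (4, 0):
            state, symbols = table[state][(byte >> shift) & 0xF]
            out.extend(symbols)
    return out


codes = {"a": "0", "b": "10", "c": "110", "d": "111"}
table = build_state_table(build_tree(codes))
# Bits 0 10 110 111 followed by seven 0 bits, packed into two bytes.
decoded = decode(table, bytes([0x5B, 0x80]))
```

The speed comes from doing all the bit-by-bit tree walking once, when the table is built, so the inner loop is just indexing.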
Update: Looks like I have an evil twin in Alexander K. Seewald who has written autofocus software using the AC coefficients. I like his Huffman decoder. I must benchmark it against the slow one I’m currently using to see the range of differences.