Jim's Depository

this code is not written
 

I recently needed to install 10 motion-detecting cameras on 5 doorways. My first thought was to use Ethernet cameras, but at $300 each that gets expensive. My second plan was to use Linksys WRTSL54GS units running OpenWrt and cheap web cameras with the ‘motion’ package. That would sort of work, but it takes far more CPU than those little boxes have.

The problem is that all that JPEG decoding, motion detection on 640x480 pixels, and JPEG encoding is just too much computation. Fortunately we can be simpler.

Many cheap web cameras have JPEG encoders built in and deliver their images as JPEGs, though sometimes missing a critical table (commonly the Huffman tables). By keeping a copy of the original camera data it is possible to skip the JPEG encoding step entirely and just insert the missing table.

We can do better though. We can avoid most of the decoding and also reduce the data for motion detection by a factor of 64. The key is that luminance of JPEG files is encoded in 8x8 pixel blocks. Each 8x8 pixel block has 64 coefficients which are used to regenerate the 64 pixels of data (more on these coefficients later). The first coefficient is the ‘DC’ coefficient, the average luminosity of the 8x8 block. This is outstanding for motion detection! We get a low pass filter, a factor of 64 data reduction, and all we have to do is not perform a bunch of multiplication and addition in the decoding process.

With a process like the following, each tiny router can support two cameras at 640x480, color, 10 frames per second, with motion detection on 5 frames per second.

  1. Read JPEG image from camera over USB.
  2. Hold a copy of the image.
  3. Decode the image enough to extract the DC coefficients of the luminance channel.
  4. Compare to the last frame’s coefficients and decide if there is motion.
  5. If we are going to save the image, then insert the missing JPEG tables into the image if needed and write it out to storage (NFS in my case).
  6. If motion has stopped, then write a sentinel file so independent processes examining the image streams know that the motion event is complete.
  7. Repeat forever.
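The per-frame motion test over those DC coefficients is then trivially cheap. A sketch of what I mean (function name and thresholds are mine, not the real code; both thresholds are tuning knobs):

```c
#include <stddef.h>

/* Compare the DC (average luminance) coefficient of each 8x8 block
   against the previous frame.  A block "moved" if its average
   luminance changed by more than `threshold`; the frame has motion
   if at least `min_changed` blocks moved. */
int frame_has_motion(const int *dc_prev, const int *dc_cur, size_t nblocks,
                     int threshold, size_t min_changed)
{
    size_t changed = 0;
    for (size_t i = 0; i < nblocks; i++) {
        int d = dc_cur[i] - dc_prev[i];
        if (d < 0)
            d = -d;
        if (d > threshold && ++changed >= min_changed)
            return 1;
    }
    return 0;
}
```

At 640x480 that is only 80x60 = 4800 subtractions and compares per frame, which is nothing even on a 200MHz router.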

Astute readers will notice that I can only afford to motion check every other frame with the two camera setup. I’m not happy about this. Essentially all of the CPU time is used in the Huffman decoding of the JPEG data. A long time ago, in the age when VAX 8530s roamed my world and I was busy trying to move X-rays and MRIs to radiologists, I wrote a state machine based Huffman decoder that could process 3 Mbit/s of input data, the fastest we could get over Ethernet at the time. Those were 4 MIPS machines; these little routers are something like 200MHz. Each camera generates about 4 Mbit/s, so I have high hopes that this will be doable.
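For the curious, the trick in that sort of decoder is to index a lookup table with the next few bits of input instead of walking the code tree one bit at a time: every table slot whose prefix starts with a code carries that code’s symbol and true length, so one lookup replaces several tree steps. A toy sketch with a made-up three-symbol code (nothing like the real JPEG tables, and a real decoder pads the tail bits):

```c
#include <stddef.h>
#include <stdint.h>

#define PEEK_BITS 2

struct entry { uint8_t symbol; uint8_t length; };

/* Toy canonical code: 'a' = 0, 'b' = 10, 'c' = 11.
   Indexed by the next PEEK_BITS bits of input; both slots that begin
   with 0 map to 'a', so the decoder needn't care which bit followed. */
static const struct entry table[1 << PEEK_BITS] = {
    {'a', 1}, {'a', 1},   /* 00, 01 -> 'a' */
    {'b', 2},             /* 10     -> 'b' */
    {'c', 2},             /* 11     -> 'c' */
};

/* Decode `nbits` of MSB-first input from `in` into `out`.
   Returns the number of symbols produced. */
size_t huff_decode(const uint8_t *in, size_t nbits, uint8_t *out)
{
    size_t pos = 0, n = 0;
    while (pos + PEEK_BITS <= nbits) {
        /* Peek the next PEEK_BITS bits (may straddle a byte boundary). */
        unsigned window = 0;
        for (int i = 0; i < PEEK_BITS; i++) {
            size_t p = pos + i;
            window = (window << 1) | ((in[p >> 3] >> (7 - (p & 7))) & 1);
        }
        const struct entry *e = &table[window];
        out[n++] = e->symbol;
        pos += e->length;   /* consume only the real code length */
    }
    return n;
}
```

Scale PEEK_BITS up to the longest code and the inner loop becomes one table hit per symbol, which is where the speed comes from.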

I have other fish to fry and probably won’t get around to the faster Huffman code anytime soon. After I run these things for a week or so I’ll release the code in case anyone else wants to read it or use it. I attached the man page in case you wanted to peek at it.

Update: Looks like I have an evil twin in Alexander K. Seewald who has written autofocus software using the AC coefficients. I like his Huffman decoder. I must benchmark it against the slow one I’m currently using to see the range of differences.

Attachments

Hi - Your motion detection scheme is very interesting! I wonder if you have had a chance to develop it further?
Thanks
Steve
sgulick (at) wildlandsecurity.org 
A word about cheap web cameras:

Many webcams are cheap webcams. Unfortunately some of them cost a lot of money. Logitech I have found to be a crap shoot. Some of their camera models are nice devices, others are utter crap sensors. The problem is you can't tell which is which without buying one. After getting burned with a 'pro' model that was built on a terrible imaging element I now buy web cams that are crap, know they are crap, and are priced like they are crap.

My current favorite is the Aiptek Mini PenCam 1.3, which is a 1.3Mpixel camera, maybe, if you count the red, green, and blue elements separately and round up… a couple of times. 640x480 pixels, JPEG encoded, 10 frames per second using the gspca drivers in Linux. Their autobrightness logic is insane and will drift off to unintelligible pictures over time, but that’s OK; I do my own autobrightness. The gspca driver is wrong about how to set and retrieve the brightness, contrast, and saturation parameters, but I fix that. The nice part is that the cameras are $9.99, with a stand, cable, and a handy leatherette carrying pouch that you can throw in the trashcan.

I don't mind a crappy camera that is honest about it.

When writing daemons there is always the question of “How will I check on the status as it runs?”. Solutions include:

  • Have a debug flag and write to stderr.
  • Periodically write a file in a known place.
  • On a particular signal, write a file in a known place.
  • Keep a socket open to write status if someone connects.

I add one more:

  • Have an HTTP interface and let people talk to you with a browser.

libmicrohttpd is just the thing for embedding an HTTP server in your daemon. It adds about 25k to your code: a couple of lines to start it, and then a dispatching function you write to handle the requests.

It has three threading models, the least obtrusive of which is to just let it make some threads and live in them. The others are useful if you can’t tolerate threads, but you have to cooperate to get select() calls made.

I don’t think you’d want to let it take a slashdotting, so either keep it behind a firewall or check the incoming IPs.

It seems to be solid and well written, but I did notice that it raises a SIGPIPE that kills the host application when a connection closes in the middle of a request. (3 word fix in the source: add MSG_NOSIGNAL to the send() calls.) That was found in the first minutes of testing, so it doesn’t get the “bullet proof - use and forget” stamp of trust. Maybe after 6 months of daily use it can earn that.
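For reference, the flag in question makes send() fail with EPIPE instead of raising a process-wide SIGPIPE when the peer has already hung up (wrapper name is mine):

```c
#include <sys/socket.h>
#include <unistd.h>
#include <errno.h>

/* With MSG_NOSIGNAL, a send() on a connection the peer has already
   closed returns -1 with errno == EPIPE instead of delivering a
   SIGPIPE that kills the whole daemon. */
ssize_t safe_send(int fd, const void *buf, size_t len)
{
    return send(fd, buf, len, MSG_NOSIGNAL);
}
```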

Bleh, timeout was harder than I had hoped. I use a simple watchdog thread per request scheme, but even that takes a mutex and some care to get everything deallocated safely.

Worse, if thread A is in an fgets() on file F when it is pthread_cancel()ed, then when thread B tries to fclose(F) it hangs. I suppose there is a lock inside the FILE *. I punted stdio and just did my input at the socket level. I was already doing output at the socket level to avoid a copy operation.
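Socket-level input is short anyway. A sketch of the sort of reader I mean (helper name is mine, not the real code):

```c
#include <unistd.h>
#include <stddef.h>

/* Read one LF-terminated line from `fd` into `buf` (NUL terminated),
   one byte at a time so there is no stdio lock to hang on if the
   thread gets cancelled.  Returns the line length, 0 on EOF before
   any data, -1 on error.  One read() per byte is fine at request
   sizes; buffer it later if it ever shows up in a profile. */
ssize_t sock_readline(int fd, char *buf, size_t cap)
{
    size_t n = 0;
    while (n + 1 < cap) {
        char c;
        ssize_t r = read(fd, &c, 1);
        if (r < 0)
            return -1;
        if (r == 0)
            break;                /* EOF */
        buf[n++] = c;
        if (c == '\n')
            break;
    }
    buf[n] = '\0';
    return (ssize_t)n;
}
```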

Now to add some comments, forget all about this code, and move on with the actual problem.

Added the maximum concurrent connection support. It took -3 lines of code. sem_init(), sem_post(), sem_wait() is nicer for this than using pthread mutexes on variables.
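The semaphore version is roughly this (constant and names invented for the sketch):

```c
#include <semaphore.h>

#define MAX_CONN 16

static sem_t slots;

/* Call once at startup: allow MAX_CONN simultaneous request threads. */
int init_limit(void)    { return sem_init(&slots, 0, MAX_CONN); }

/* Block before accept() until a slot frees up... */
int take_slot(void)     { return sem_wait(&slots); }

/* ...and release the slot when the request thread finishes. */
int release_slot(void)  { return sem_post(&slots); }
```

The counting is in the semaphore itself, which is why the mutex-and-counter version came out longer.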
Well that wasn't half hard. wc reports 279 lines of code weighing in at 7.5kb source and just under 4k of binary for an HTTP/1.0 and HTTP/1.1 compliant httpd function. (Well, still a few more lines to enforce a maximum concurrent thread limit and a thread timeout so I needn't fear nefarious people… but it is nearly done.)

I thought going in that getting HTTP/1.1 pipelining right was going to be the trickiest part, but on further investigation none of the major browsers use it. Apparently enough web servers screw it up to prevent it.

In the absence of pipelining I decided to forgo keep-alive entirely in favor of simplicity. By careful use of TCP options I only need 3 outgoing packets for each request (up to 1.mumble kbytes). The SYN-ACK, an ACK-Data-FIN, and a FIN-ACK.

An interesting performance issue: Safari shotguns me with many simultaneous connections, to which my httpd responds quickly. If I were supporting keep-alive I think Safari would be encouraged to only use two connections and serialize the requests over them. I wonder which is faster? I may have to add keep-alive support just to answer this question.

Another interesting tidbit: Some people on the web maintain that TCP_QUICKACK and TCP_DEFER_ACCEPT are inherited from the listener socket by the accepted socket. I don't think so. At least the only way I can get QUICKACK turned off is to not use TCP_DEFER_ACCEPT on the listening socket and to slam TCP_QUICKACK off on the accepted socket before the first data arrives. Otherwise I end up sending an ACK before my first data packet.

And a last tidbit: You can keep your TCP_CORK in all the way to shutdown(); that gets your FIN piggybacked on your last data packet.

Why are there so many packages ruined by people using autoconf and libtool? Just write a simple Makefile and let it be.

The application where I was going to use libmicrohttpd requires me to cross-compile, and after hours of thrashing about I still can't get it to build.

Back to the bit heap with it. I'll write my own httpd code.

I can't help but wonder though, can it be simpler? I'll probably have to read the HTTP 1.1 spec and see how simple a daemon I can write.

openload, which is called openwebload at SourceForge, is a simple URL load tester. Just the right sized tool for checking whether your spiffy new web page is efficient enough to be tolerated.

$ openload localhost 10
URL: http://localhost:80/
Clients: 10

MaTps 355.11, Tps 355.11, Resp Time 0.015, Err 0%, Count 511
MaTps 339.50, Tps 199.00, Resp Time 0.051, Err 0%, Count 711
MaTps 343.72, Tps 381.68, Resp Time 0.032, Err 0%, Count 1111
MaTps 382.04, Tps 727.00, Resp Time 0.020, Err 0%, Count 1838
MaTps 398.54, Tps 547.00, Resp Time 0.018, Err 0%, Count 2385
MaTps 425.78, Tps 670.90, Resp Time 0.014, Err 0%, Count 3072

<You press enter to stop the run>
Total TPS: 452.90
Avg. Response time: 0.021 sec.
Max Response time: 0.769 sec

There is a Debian package.

Update: I couldn’t leave it alone. The attached patch will add bits/second reports. It is also worth noting that you can add “-l number” to sample for a certain number of seconds, and “-o csv” to put out a CSV line of the data at the end for easier scripting. Perhaps it needs a man page too.

Update: If you are an especially trusting sort of person you could install the Debian package attached and not have to build it yourself.

Update: Best wait on this. There is something odd going on with openload that is giving unstable results for bytes transferred. More diagnosis required.

Attachments

openload.patch 3971 bytes