DIY Data Capture via Web Cam

Automate data collection with a camera and optical character recognition

Photo: Paul Wallich

A while back, I got a cheap wireless LaCrosse brand weather station, with sensors for temperature, humidity, wind speed and direction, and rainfall. The stand-alone display shows weather conditions using seven-segment LCD numerals. The box the station came in promised that it “Connects to your PC!,” which appealed to me because I’d be able to automatically log the data and pass it around my home network. Well, it turns out that this promise could be honored only if I had a PC of the precise operating system and vintage that the station’s USB-to-wireless dongle called for. I’d also need just the right version of some proprietary weather software. Unfortunately, my Macintosh setup met none of these requirements.

Sure, if I had access to an already working installation (and a lot of time), I might have tried reverse engineering the embedded hardware and software. In theory, I could have figured out what the weather-station components were telling the dongle, what the dongle was telling the PC, and what commands might be flowing in the other direction. Then, with some more work and time, I could have reproduced that conversation on a Linux or OS X box. But I didn’t have the access or the time.

So I decided to make do with the station’s stand-alone display. I’ve been learning how to use OpenCV, the comprehensive image-processing library initially created at Intel and now supported mostly by Itseez and the open-source roboticists at Willow Garage. The library is designed to make it easy to extract information about a scene from the raw images coming from a camera. It’s available for many platforms, including Mac.

With such software at hand, and with webcams being cheap to the point of being disposable, I thought it should be a simple matter to take a picture of the screen, extract the text displaying the weather conditions, run the text through an optical character recognition (OCR) system to get data in numeric form, and log the result. Because I needed to make a log entry only every few minutes, I didn’t need a lot of computational horsepower, so I could do the processing using spare background cycles on my Mac.

An early challenge was figuring out which of the many ways I could use OpenCV would be most effective. At first I thought that it would mostly be a matter of correcting the image for distortions caused by my camera’s perspective (my webcam is off to one side to let me still read the display by eye) and uneven lighting. Once I had something to pass to OCR software, the hard work would be done. However, it turns out that almost all free OCR software is designed for printed text, where each letter forms a continuous contour. It’s not at all good for numbers and letters made up of disconnected segments, as in my station’s display.

So then I was entranced by the idea of template matching: comparing small “model” images of digits with those from the webcam feed, seeing whether and where they matched, and collating that with the positions of the temperature, wind speed, and other indicators on the display. But that would have meant waiting until all the digits from 0 to 9 appeared on the local weather display at least once so I could save their images to a file. Or I could have drawn sample digits by hand, but OpenCV’s standard template-matching function is unforgiving of mismatches in size or orientation.
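
To show what template matching involves, here is a minimal pure-Python sketch of the sliding-window idea. It is an illustration only, not OpenCV's implementation (in OpenCV the corresponding call is cv2.matchTemplate); the function name, the tiny 0/1 images, and the sum-of-squared-differences score are all assumptions made for the example.

```python
def match_template(image, template):
    """Slide `template` over `image` and return (best_score, (row, col)).

    Both arguments are lists of rows of numbers. The score is the sum of
    squared pixel differences, so lower is better and 0 is a perfect match.
    Returns None if the template is larger than the image.
    """
    image_h, image_w = len(image), len(image[0])
    templ_h, templ_w = len(template), len(template[0])
    best = None
    for r in range(image_h - templ_h + 1):
        for c in range(image_w - templ_w + 1):
            ssd = sum(
                (image[r + dr][c + dc] - template[dr][dc]) ** 2
                for dr in range(templ_h)
                for dc in range(templ_w)
            )
            if best is None or ssd < best[0]:
                best = (ssd, (r, c))
    return best


# Hypothetical data: 1 marks a dark pixel, 0 a light one.
TEMPLATE_ONE = [
    [0, 1],
    [0, 1],
    [0, 1],
]
IMAGE = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

score, position = match_template(IMAGE, TEMPLATE_ONE)
print(score, position)  # prints 0 (1, 3)
```

Because the comparison is made pixel by pixel at a fixed size, a digit that appears larger, smaller, or tilted relative to the template no longer lines up, which is the weakness noted above.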

Then I found a program called SSOCR, or Seven-Segment Optical Character Recognition, which was developed by Erik Auerswald of the Technical University of Kaiserslautern, in Germany. SSOCR makes it possible to paste one-time-password codes from a security fob into a Web page log-in screen. Optimized for the fob’s single fixed line of six digits, SSOCR turned out to be a bit too specialized. It requires a close-up image under unchanging lighting. However, the light on my weather station’s screen varies depending on the time of day, and the camera has to be far enough back to capture the station’s wider, multiline screen. So I stole some ideas about how to slice up seven-segment images from SSOCR and wrote my own recognizer, based simply on which segments had enough black pixels to be considered “on.”
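
The segment-counting approach can be sketched in a few lines of Python. This is not SSOCR's code or the recognizer described above, only a minimal illustration that assumes each digit has already been cropped to its own black-and-white image. The 7-by-5 layout of the segment regions, the 50 percent threshold, and all the names are hypothetical choices.

```python
GRID_ROWS, GRID_COLS = 7, 5

# Each segment's region as (row_start, row_end, col_start, col_end),
# in units of a 7-by-5 grid laid over the cropped digit. End values
# are exclusive. The layout is an assumption for this sketch.
SEGMENTS = {
    "a": (0, 1, 1, 4),  # top bar
    "b": (1, 3, 4, 5),  # upper right
    "c": (4, 6, 4, 5),  # lower right
    "d": (6, 7, 1, 4),  # bottom bar
    "e": (4, 6, 0, 1),  # lower left
    "f": (1, 3, 0, 1),  # upper left
    "g": (3, 4, 1, 4),  # middle bar
}

# Standard seven-segment patterns for the digits 0 through 9.
DIGITS = {
    frozenset("abcdef"): 0, frozenset("bc"): 1, frozenset("abdeg"): 2,
    frozenset("abcdg"): 3, frozenset("bcfg"): 4, frozenset("acdfg"): 5,
    frozenset("acdefg"): 6, frozenset("abc"): 7, frozenset("abcdefg"): 8,
    frozenset("abcdfg"): 9,
}


def lit_segments(pixels, threshold=0.5):
    """Return the set of segments whose region is at least `threshold` dark.

    `pixels` is a list of rows, with 1 for a dark pixel and 0 for a light one.
    """
    height, width = len(pixels), len(pixels[0])
    lit = set()
    for name, (r0, r1, c0, c1) in SEGMENTS.items():
        rows = range(r0 * height // GRID_ROWS, r1 * height // GRID_ROWS)
        cols = range(c0 * width // GRID_COLS, c1 * width // GRID_COLS)
        total = len(rows) * len(cols)
        dark = sum(pixels[r][c] for r in rows for c in cols)
        if total and dark / total >= threshold:
            lit.add(name)
    return frozenset(lit)


def read_digit(pixels, threshold=0.5):
    """Return the digit shown, or None if the lit segments match no digit."""
    return DIGITS.get(lit_segments(pixels, threshold))


def parse(glyph):
    """Turn a list of strings into pixel rows; '#' is dark, anything else light."""
    return [[1 if ch == "#" else 0 for ch in line] for line in glyph]


TWO = parse([
    ".###.",
    "....#",
    "....#",
    ".###.",
    "#....",
    "#....",
    ".###.",
])

print(read_digit(TWO))  # prints 2
```

Counting dark pixels per region, rather than tracing outlines, is what lets this kind of recognizer cope with segments that do not touch one another. With real webcam images the threshold would likely need tuning as the lighting changes.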

I still had to correct for the camera’s perspective, which I thought would be easy, as software for solving this exact problem is already available. However, the weather-station display is made of gray plastic, and the LCD is dark gray on light gray, so there aren’t any distinctive points for the corner-finding routines normally used for this kind of correction to lock onto. Finally, I just cut out four little circles of red construction paper and glued them to the display; the correction routines have no trouble finding those. Once an image has been acquired, my OCR routine does its work, and the results are saved into a text file for further processing.
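
For readers curious about what the perspective correction computes, the sketch below derives the mapping from four marker positions to the corners of an upright rectangle. It is a pure-Python illustration under stated assumptions, not the routine used in this project: the marker coordinates are made up, the function names are hypothetical, and in OpenCV the equivalent work is done by cv2.getPerspectiveTransform and cv2.warpPerspective.

```python
def solve(a, b):
    """Solve the linear system a * x = b by Gauss-Jordan elimination
    with partial pivoting. `a` is a list of rows, `b` a list of numbers."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate markers (for example, three in a line)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src, dst):
    """Return the eight coefficients h0..h7 of the perspective mapping
    that sends each of the four `src` points to the matching `dst` point."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    return solve(a, b)


def warp_point(h, x, y):
    """Apply the mapping to one camera-image point."""
    w = h[6] * x + h[7] * y + 1.0
    return (
        (h[0] * x + h[1] * y + h[2]) / w,
        (h[3] * x + h[4] * y + h[5]) / w,
    )


# Hypothetical marker centers in the webcam image, clockwise from top left,
# and the corners of the 400-by-300 rectangle they should map to.
MARKERS = [(112, 80), (498, 131), (470, 402), (95, 340)]
CORNERS = [(0, 0), (400, 0), (400, 300), (0, 300)]

H = homography(MARKERS, CORNERS)
for marker in MARKERS:
    print(marker, "->", warp_point(H, *marker))
```

Four point pairs give eight equations, exactly enough to pin down the eight unknowns of the mapping, which is why four easily detected markers are sufficient.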

What am I going to do with all my weather data? Over the long term, I’m going to match it to the data from the nearest official weather station, so that I can figure out how the weather there correlates with the weather here. But the first thing I need to do is to create a weather page on my Mac’s personal Web server. Then I can use my tablet or phone to see just how stormy it is outside while I’m still lying in bed at the other end of the house from the station. The easiest way is to just fire up a browser, but with MIT (formerly Google) App Inventor, which allows drag-and-drop assembly of Android apps, I should need just a couple of hours to write a program—once I get yet another development environment set up.

This article originally appeared in print as “Point-and-Shoot Weather Data.”

