In my job at EE Internet, we do a lot of remote data collection, including collection from cameras. Now, the makers of these cameras have figured out how to put the date and time as a graphic in the image itself (Umiat Airfield), but for some reason they haven't figured out how to embed that same information in the EXIF Data for that image. Very aggravating.
So, Python to the rescue. I came up with a script that extracts the area given, converts it to black and white, inverts it (because these cameras have light text on dark background, and OCR doesn't seem to like that), and then feeds that file to a command-line OCR program. It then takes that text, parses it for the date, and then prints it out in any date/time format desired. Works quite well.
Here it is in all its glory.
Comments, improvements, critiques, etc., are always welcome. And yes, you can post comments pointing to another program/project/whatever that does what I did and does it way better.
- It requires PIL, the Python Imaging Library.
- It defaults to using the gocr command. This is available in the 'gocr' package in Ubuntu. It also seems to work well with ocrad, which is also available in Ubuntu.
- There should not be any symlink vulnerabilities in this, as an exclusive lock is held on the file while data is being written to it. Once the data is written and the file is closed, it is passed to the OCR program as an input file. Then it is removed.
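For anyone who wants to see the write, close, OCR, remove sequence from the last note in isolation, here is a minimal sketch. It is not the script itself: the function name `ocr_image_bytes` is made up for illustration, and it uses `tempfile.mkstemp` (which creates the file exclusively, so a pre-planted symlink cannot be followed) rather than whatever locking the script actually uses. It also assumes the image bytes are already in a format the OCR program reads, such as PNM, and that the program takes the input file name as its last argument, which is how both gocr and ocrad can be called.

```python
import os
import subprocess
import tempfile


def ocr_image_bytes(image_bytes, ocr_command=("gocr",)):
    """Write image_bytes to a private temp file, run OCR on it, return the text.

    Hypothetical helper for illustration; ocr_command is the program plus
    any options, and the temp file path is appended as the last argument.
    """
    # mkstemp opens the file with O_EXCL and owner-only permissions, so the
    # name cannot already exist as a symlink planted by someone else.
    fd, path = tempfile.mkstemp(suffix=".pnm")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(image_bytes)
        # The file is closed here, before the OCR program opens it by name.
        result = subprocess.run(
            list(ocr_command) + [path],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout
    finally:
        # Remove the file whether or not the OCR program succeeded.
        os.remove(path)
```

Swapping in ocrad would then be a matter of passing `ocr_command=("ocrad",)`.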
UPDATE: Modularized it so you can call it from the command line, or import it into your own Python script and call it that way.
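The general shape of a module that works both ways is roughly the following. This is only a sketch under assumptions, not the actual layout of the script: the names `parse_stamp` and `main` are hypothetical, and the timestamp pattern assumes an overlay that looks like `2011-06-01 12:34:56`, which your camera may well not use.

```python
import argparse
import re
from datetime import datetime

# Assumed overlay layout, e.g. "2011-06-01 12:34:56"; real cameras will differ.
STAMP_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})\s+(\d{2}):(\d{2}):(\d{2})")


def parse_stamp(ocr_text):
    """Return the first datetime found in OCR output, or None if none is valid."""
    match = STAMP_RE.search(ocr_text)
    if match is None:
        return None
    try:
        return datetime(*(int(part) for part in match.groups()))
    except ValueError:
        # The digits matched the pattern but are not a real date,
        # which usually means the OCR misread a character.
        return None


def main(argv=None):
    """Command-line entry point: reformat a timestamp found in OCR text."""
    parser = argparse.ArgumentParser(description="Reformat an OCR'd timestamp.")
    parser.add_argument("text", help="raw OCR output containing the timestamp")
    parser.add_argument("--format", default="%Y-%m-%dT%H:%M:%S",
                        help="strftime format for the result")
    args = parser.parse_args(argv)
    stamp = parse_stamp(args.text)
    if stamp is None:
        parser.exit(1, "no timestamp found\n")
    print(stamp.strftime(args.format))
    return 0
```

Another script can `import` the file and call `parse_stamp()` directly. Adding the usual `if __name__ == "__main__": raise SystemExit(main())` guard at the bottom makes the same file runnable from the command line without the import triggering anything.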