Not that I know anything about OCR, but you wouldn't really think that machine-printed characters in a TIFF file would be all that difficult to recognize. For the most part, the Accusoft toolkit does a good job, but it has some problems with characters in a box: if the left side of the box is close to the first character, and the number happens to be -9, it will invariably drop the minus sign. After some experimentation, I found it becomes more accurate if you tell it that the box is in a zone that contains only numbers.
But that led to a fascinating issue when I ran a release build: the minus sign was still being dropped. My application is written in C++, and the toolkit exposes a ZONE structure for setting up the area where the numbers are. Eventually, after creating the ZONE on the stack, I added code to initialize the struct the same way that Visual Studio initializes it in debug mode:
memset(&zone, 0xcc, sizeof(zone));
And now it works like a charm in release.
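For anyone curious what that fill actually does to a struct, here is a minimal sketch. The `FakeZone` type and its fields are hypothetical stand-ins, since I am not reproducing the toolkit's real ZONE layout here. The one thing I am sure of is the pattern itself: with runtime checks enabled (`/RTC1`, the debug default), MSVC fills uninitialized stack variables with 0xCC bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the toolkit's ZONE struct. The real field
// names and layout are not shown in this post.
struct FakeZone {
    std::int32_t left;
    std::int32_t top;
    std::int32_t right;
    std::int32_t bottom;
    std::uint8_t numericOnly;   // hypothetical flag
    std::uint8_t reserved[3];   // hypothetical reserved bytes
};

// True if every byte of the struct equals the given fill value.
inline bool allBytesEqual(const FakeZone& zone, unsigned char value) {
    unsigned char bytes[sizeof(FakeZone)];
    std::memcpy(bytes, &zone, sizeof(zone));
    for (unsigned char b : bytes) {
        if (b != value) return false;
    }
    return true;
}

// Mimics what an MSVC debug build does to an uninitialized local:
// every byte becomes 0xCC, so no field is left at zero.
inline FakeZone makeDebugFilledZone() {
    FakeZone zone;
    std::memset(&zone, 0xCC, sizeof(zone));
    assert(allBytesEqual(zone, 0xCC));
    return zone;
}

// The "obvious" initialization: every byte is zero.
inline FakeZone makeZeroedZone() {
    FakeZone zone;
    std::memset(&zone, 0, sizeof(zone));
    return zone;
}
```

The point of the comparison is that after the 0xCC fill every flag-like byte is nonzero and every 32-bit field holds the bit pattern 0xCCCCCCCC, so a field the library expects to be "set" looks set by accident. In real code you would still assign the fields you actually care about after the fill.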
I suspected that a problem like this was going to come up. One of the first things I did while trying to get the app to work was to set the ZONE to all zeroes before passing it to the toolkit, and that caused it to fail. I opened a ticket with Accusoft, partly because this is a bug that they should know about, but mostly out of curiosity to find out what it is that only works when its bytes are initialized to 0xCC :)
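If you would rather not wait on a support ticket, one way to narrow it down is to start from the all-zero buffer that fails and flip one byte at a time to 0xCC. This is only a sketch of that idea, not something I ran against the toolkit: `findSensitiveBytes` and the `works` predicate are hypothetical names, and the predicate is where you would copy the buffer into a real ZONE and run the recognition. It also only finds bytes that fix things on their own, so it will miss a case where two or more bytes have to be nonzero together.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Starting from an all-zero buffer of `size` bytes, set each byte in turn
// to 0xCC, ask the caller-supplied predicate whether the result works,
// then restore the byte to zero. Returns the offsets that passed.
inline std::vector<std::size_t> findSensitiveBytes(
    std::size_t size,
    const std::function<bool(const unsigned char*)>& works)
{
    assert(works && "a predicate is required");

    std::vector<std::size_t> hits;
    std::vector<unsigned char> buffer(size, 0x00);

    for (std::size_t i = 0; i < size; ++i) {
        buffer[i] = 0xCC;               // flip exactly one byte
        if (works(buffer.data())) {
            hits.push_back(i);          // this byte alone is enough
        }
        buffer[i] = 0x00;               // restore before the next trial
    }
    return hits;
}
```

With the offsets in hand, you can compare them against the struct definition in the toolkit's header to see which field they land in.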