[RESOLVED] How to make colors different in a picture?
The background scenario for this is explained in the attachment's readme.txt file.
Basically there are PNG images with "price" captions to be extracted as text. These images tend to be fairly monochromatic but the contrast isn't perfect for OCR. Some have dark, nearly black text on a light background; others have light, nearly white text on a dark background.
Office MODI is used for OCR, and top speed isn't needed. The real requirement was for relatively brief code that could be understood by a low-skill VB6 programmer. I think the "guy" is primarily a PHPer. In any case I got paid.
What I have works and met all requirements and expectations, but the recent thread here about "finding similar colors" made me wonder. Perhaps a better approach to contrast enhancement would help alleviate my concerns about the quality of the result.
Here's what I'm doing in the version I'm toying with:
Code:
Private Sub Enhance(ByRef BmpFileBytes() As Byte)
    Dim I As Long
    Dim Sum As Integer

    'Pixel "bits" are 32-bit RGBA format, leave A alone.
    For I = SIZE_HEADERS To UBound(BmpFileBytes) Step 4
        Sum = CInt(BmpFileBytes(I)) _
            + CInt(BmpFileBytes(I + 1)) _
            + CInt(BmpFileBytes(I + 2))
        If Sum > CONTRAST_WHITE_THRESHOLD Then
            BmpFileBytes(I) = 255
            BmpFileBytes(I + 1) = 255
            BmpFileBytes(I + 2) = 255
        ElseIf Sum < CONTRAST_BLACK_THRESHOLD Then
            BmpFileBytes(I) = 0
            BmpFileBytes(I + 1) = 0
            BmpFileBytes(I + 2) = 0
        End If
    Next
End Sub
Originally I changed all pixels between the two thresholds to [127, 127, 127] (gray) but leaving those pixels alone appears to give better OCR results.
However I really don't like the hackish nature of this part of the program:
Code:
Private Const CONFIDENCE_MINIMUM = 50 'MODI. Range is 0 to 999.
    :
    For Each Word In .Images(0).Layout.Words
        With Word
            Debug.Print .RecognitionConfidence, """" & .Text & """"
            If .RecognitionConfidence >= CONFIDENCE_MINIMUM Then
                If IsNumeric(.Text) Then
                    Text = Text & .Text & vbNewLine
                End If
            End If
        End With
    Next
I.e. I'd prefer to enhance the image quality enough to let me specify a much higher confidence level for filtering out noise hits. And I would really like to then eliminate the awful IsNumeric() test as well, without which I get junk slipping through.
OCR being what it is, and images from the wild being what they are... I'm not sure how much more can be done. I just thought that there might be a better algorithm for contrast enhancement of color images for OCR purposes.
To play with this you'll need Office XP (2002) or later for the MODI library and Windows Vista or later (or XP SP1 or later with WIA 2.0 installed).
The attachment is bulky due to the sample images it contains. You can focus on the OcrClient project and safely ignore the MiniServer project included unless you get curious. Again, see the readme.txt file.
Interesting challenge. Without diving too deeply into your existing code, here's an approach I've used in similar tasks. (In my case, reading dates and diagnostic text from the edges of radiographs and other medical images.)
1) Crop everything you don't need out of the image. In your case, I guess that's just the bottom 60 pixels.
2) Convert the cropped image to grayscale. Color channels tend to just mess up OCR. For best results, use one of the ITU formulas, like
Code:
Gray = (Red * 0.299 + Green * 0.587 + Blue * 0.114)
3) Scan the image once to find the mean color value. (e.g. add up all grayscale values, and divide by the number of pixels.)
4) Use the result of (3) as your contrast threshold. This tends to work well in situations like yours, where you have to cover both light-on-dark and dark-on-light cases. The mean should always fall between the text color and the background color, guaranteeing good separation.
Obviously I'm using "image" a lot here, but once you convert to grayscale, a normal byte array works just fine.
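For what it's worth, steps 2 to 4 might look something like the sketch below in VB6. It is only an illustration and has not been tried against MODI. The names ToGray, MeanGray, Binarize, and the PixelOffset parameter are made up for the example, and it assumes 32-bit pixels stored as 4 bytes each in B, G, R, A order (the usual BMP layout), beginning at PixelOffset in a 0-based byte array with at least one pixel.

```vb
Option Explicit

'Step 2: convert 32-bit pixels (assumed B, G, R, A byte order) to one
'gray byte per pixel using the 0.299 / 0.587 / 0.114 weights.
Public Function ToGray(ByRef BmpFileBytes() As Byte, _
                       ByVal PixelOffset As Long) As Byte()
    Dim Gray() As Byte
    Dim I As Long
    Dim N As Long

    ReDim Gray(0 To (UBound(BmpFileBytes) - PixelOffset + 1) \ 4 - 1)
    For I = PixelOffset To UBound(BmpFileBytes) - 3 Step 4
        'Integer weights (they sum to 1000) avoid floating point.
        Gray(N) = (114& * BmpFileBytes(I) _
                 + 587& * BmpFileBytes(I + 1) _
                 + 299& * BmpFileBytes(I + 2)) \ 1000&
        N = N + 1
    Next
    ToGray = Gray
End Function

'Step 3: mean gray level over the whole (already cropped) image.
Public Function MeanGray(ByRef Gray() As Byte) As Long
    Dim I As Long
    Dim Total As Double 'Double so a very large image cannot overflow.

    For I = LBound(Gray) To UBound(Gray)
        Total = Total + Gray(I)
    Next
    MeanGray = Int(Total / (UBound(Gray) - LBound(Gray) + 1))
End Function

'Step 4: clamp every pixel to black or white around the threshold.
Public Sub Binarize(ByRef Gray() As Byte, ByVal Threshold As Long)
    Dim I As Long

    For I = LBound(Gray) To UBound(Gray)
        If Gray(I) > Threshold Then Gray(I) = 255 Else Gray(I) = 0
    Next
End Sub
```

Usage would be along the lines of Gray = ToGray(BmpFileBytes, SIZE_HEADERS) followed by Binarize Gray, MeanGray(Gray). Getting the result back into a form MODI can read (for example, writing each gray value back to the B, G, and R bytes) is left out here.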
These steps should give you a decent black/white image. Further refinement is definitely possible, but this requires more complicated area-based methods, like hysteresis. A really simple black/white clean-up method works something like this:
1) Scan the black and white image and count occurrences of each color. Assume the lesser of the two colors represents the meaningful data.
2) Scan the image again, focusing only on meaningful colors. (Let's say Black is the meaningful color for this example.) If a black pixel is surrounded by 4 white pixels, turn it to white. This eliminates noise. If your font is always large enough that no letters are 1-pixel wide, you can also turn black pixels to white if they are surrounded by 3 white pixels. This rounds off rough edges on the text, and depending on the OCR engine, can help with identification of characters with similar shapes (e.g. 1 and l).
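A rough VB6 sketch of that clean-up pass follows, under a few assumptions of mine: the image is a 0-based byte array with one byte per pixel (0 or 255) stored row by row, pixels outside the image edge count as background, and the names RemoveSpecks, ImgWidth, ImgHeight, and MaxTextNeighbors are invented for the example. Passing 0 for MaxTextNeighbors gives the "surrounded by 4" rule, and passing 1 gives the more aggressive "surrounded by 3" rule.

```vb
Option Explicit

'Remove isolated specks from a black/white image (one byte per pixel).
Public Sub RemoveSpecks(ByRef Bw() As Byte, _
                        ByVal ImgWidth As Long, ByVal ImgHeight As Long, _
                        ByVal MaxTextNeighbors As Long)
    Dim Src() As Byte
    Dim X As Long, Y As Long, I As Long
    Dim Blacks As Long
    Dim Fore As Byte, Back As Byte
    Dim Neighbors As Long

    'Step 1: the less common color is assumed to be the text.
    For I = 0 To ImgWidth * ImgHeight - 1
        If Bw(I) = 0 Then Blacks = Blacks + 1
    Next
    If Blacks * 2 <= ImgWidth * ImgHeight Then
        Fore = 0: Back = 255
    Else
        Fore = 255: Back = 0
    End If

    'Step 2: test against an unmodified copy so edits do not cascade.
    Src = Bw
    For Y = 0 To ImgHeight - 1
        For X = 0 To ImgWidth - 1
            I = Y * ImgWidth + X
            If Src(I) = Fore Then
                'Count text-colored neighbors left, right, above, below.
                'Nested Ifs because VB6 "And" does not short-circuit.
                Neighbors = 0
                If X > 0 Then If Src(I - 1) = Fore Then Neighbors = Neighbors + 1
                If X < ImgWidth - 1 Then If Src(I + 1) = Fore Then Neighbors = Neighbors + 1
                If Y > 0 Then If Src(I - ImgWidth) = Fore Then Neighbors = Neighbors + 1
                If Y < ImgHeight - 1 Then If Src(I + ImgWidth) = Fore Then Neighbors = Neighbors + 1
                If Neighbors <= MaxTextNeighbors Then Bw(I) = Back
            End If
        Next
    Next
End Sub
```

Reading from the Src copy matters: if the test looked at Bw directly, removing one pixel could make its neighbor look isolated too, and thin strokes could get eaten away in a single pass.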
I've had pretty good results with these steps in my applications. MODI can be pretty hit or miss, but I'm hopeful that just the first set of steps works well enough, so you don't have to mess with additional heuristics and filtering on the b/w image.
Whether you are snorting in scorn or getting visions of riches in your dreams... I didn't really get paid anything for the code above. The payment (and token payment at that) was for spending 3 hours face to face with the "guy" explaining it. Sort of a good will thing attached to a real contract.
My first enhancement attempt produced a 3-color image: black, white, and about 50% gray. This wasn't bad but a little dicey since [127, 127, 127] OCRed more reliably than [128, 128, 128] for the middle color. Just a little too arbitrary and sensitive for my tastes.
That's when I went "clamp to black, white, or keep original color." Reliability actually rose considerably! This seems completely counter-intuitive but I have no idea how MODI handles color vs. grayscale and the sparse docs I have don't discuss the point at all. Edit: Perhaps font antialiasing is a factor here?
But I like the idea of a full scan through the cropped portion of the image to identify a middling brightness level, then applying that as the threshold for clamping to 2-color black/white. Twice the work but maybe 4 times the results?
Thanks for the input. I'll play with some things and see.
Last edited by dilettante; Apr 7th, 2015 at 04:13 AM.
"Twice the work but 4 times the result" seems to define all MODI interactions, in my experience.
Normally, heavy pre-processing isn't advisable, because it screws with whatever the OCR engine does on its own. MODI seems to be the exception to this. Also, for some reason, the 2010 version is way worse than the 2007 version. Not sure if that's relevant here, but maybe good to know in case you end up doing more of this kind of work.
Font antialiasing isn't a bad guess for MODI complications. One way to minimize AA problems is to double the size of the image, do your preprocessing, then shrink the image back to its original size. This can result in a "cleaner" version of the text, with less impact from antialiasing.
But honestly, most OCR tasks are black magic. Sacrificing animals to the OCR gods is probably more productive than guessing at MODI's inner workings.
I tried a scheme that was cleverer about analyzing brightness and using it in two separate passes. For some images it improved results and for others it degraded results. By tweaking for specific cases I was able to get solid results but haven't figured out an "autotweaking" algorithm. Casting and reading the chicken bones requires a human touch and a lot of luck.
I hope the guy who wanted this has "real world" images better than the ones I cooked up. For whatever reason his actual goals were very hush-hush and I was never given real images to test against. But at least I got paid for those leftover hours at the end of my contract there.
Sometimes the reward for coming in under an estimate is no reward.