How to OCR images and printed documents using Office 2007 on Vista

July 29, 2007

Recently we became interested in a problem: one of our writers lacked a scanner but needed to get the text out of a document, a process known as “OCR” or Optical Character Recognition.  There weren’t many good freeware options, and the few paid ones we found were fairly lacking.  As it turned out, all he needed was access to a computer running Vista with Office 2007 installed, which is not hard to find in our parts.

I’m the one who eventually discovered how to do it relatively painlessly.  There was a tutorial that would have you do some things manually using Office Word 2007, but that was too hard and too time consuming, so I found an easier option.  It’s really too bad Word doesn’t have this built in.

You will need Microsoft Office 2007 with OneNote installed.  OneNote has OCR abilities built right in, and it is accurate to the point of being scary.  All you need to do is save the image containing the text you want in a standard format.  I stuck with JPEG; others might work, but I wouldn’t wander too far off the path.
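Since I stuck with JPEG, it can be worth confirming that a file really is a JPEG before dropping it into OneNote, because a file extension doesn’t always tell the truth.  Here is a minimal sketch, assuming you have Python installed.  The function names are my own inventions, and nothing here talks to OneNote; it only checks that a file starts with the three bytes every JPEG begins with.

```python
from pathlib import Path

# Every JPEG file starts with these three bytes (the start-of-image
# marker followed by the first byte of the next marker).
JPEG_MAGIC = b"\xff\xd8\xff"


def looks_like_jpeg(path):
    """Return True if the file at `path` starts with the JPEG signature."""
    with open(path, "rb") as f:
        return f.read(3) == JPEG_MAGIC


def find_non_jpegs(folder):
    """Return the sorted names of files in `folder` that are not JPEGs.

    Only the leading bytes are checked, so this is a quick sanity
    check rather than a full validation of the image.
    """
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and not looks_like_jpeg(p)
    )
```

Calling find_non_jpegs on the folder holding your scans or camera photos gives you the files you may want to re-save as JPEG first.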

The image with the text you want can come from anywhere: something you found online, a scanned document or a digital camera.  Insert the image into a new OneNote document, right click the image and select “Copy Text from Picture.”  Depending on the amount of text, this could take a while.  Once it has finished, open another new document in OneNote and select Edit > Paste, which will paste the text gathered from the image into the new document.  That was easy enough, wasn’t it?

