“whichotherwasmotoristwhyit…” um…pardon?

This was actually a pretty fun exercise and I got a couple laughs out of it. Maybe someone is trying to make up for the tears in previous weeks…? Whatever the reason, I had a good time testing out the capabilities (and limits) of the OCR app I used.

I read through the article recommending OCR apps and chose based on price (free), compatibility (iOS), and capability (functional). I concluded that Office Lens was my best option, as I have Office 365, which would allow me to unlock all of the functions.

I had quite a lot of fun seeing what I could do with the app. There are options for capturing documents, whiteboards, business cards, and photos. There is also an ‘actions’ option that can extract text from printed documents and tables, and can read text aloud. I turned a page from an old textbook into a Word document. The text was accurate and the app even kept the photos! The reading function worked well with that page as well. I did find that high-contrast print and pages work best. When I tried to extract text from a newspaper page or a book with pages that would not lie flat, the results were incoherent, to say the least. I reference the example in the title…

I chose a newspaper page for my PDF scan. It is from a copy of the Grand Forks Gazette published this last October. My grandma had mailed it to me because it contains an article I wrote for work. I thought it would be useful to have a digital copy for my own records anyways, so this was an excellent opportunity to get that done. The document can be found here. If anyone cares to have a look at the article, you will learn all about the treatment of priority invasive plants in the Regional District of Kootenay Boundary.

Continued:

Unfortunately, I wrote too soon. It seems all was not quite as fine and dandy as I had hoped, but it does appear I have sorted out the problem at last. The first file I uploaded was only a picture and, it turns out, could not be searched for text. I tried three more apps to see if I could search the original scan, but without any more luck than before. I ended up sticking with Office Lens. I scanned the document and saved it as a Word document in the app. I then sent the document to my computer, where I re-saved it as a PDF copy. From there, I had no more issues uploading it to my Omeka site.

I chose a textbook page for this second document, as it was much easier to capture text from than a newspaper with a more challenging format. Have fun reading page 435, part 5, chapter 19 of Elements of Ecology, which you can see here. I have left up my original PDF because it might be interesting, and I added a photo scan of the second document to compare accuracy. I was very impressed that it kept the photos! You can see all of the documents here.

