Is Deep Analysis of Textual Content in Natural Scenes for the Greater Good? Can It Be a Privacy Concern?
It is not uncommon to stumble upon references to "Artificial Intelligence" (AI) in articles or on social media and wonder whether this is some new, emerging technology. You would be forgiven for thinking so; however, AI has been around for some time, dating as far back as the 1950s.
"Optical Character Recognition "(OCR) is another area of AI which was invented in the 1970s by Ray Kurzweil who commercialised the process of converting printed text in almost any font into textual content. These same principles also apply when converting a scanned document into an image with printed textual content
At snapWONDERS we are most interested in textual content in natural scenes. If you are wondering what exactly this means, it is best explained with an illustration.
An example of Textual Content in Natural Scenes
Consider the photo below that I took on one of my holiday trips:
If you are from the state of Victoria in Australia, you will very likely recognise this famous landmark: the entrance to the Great Ocean Road, a popular tourist attraction. At the site there are platforms you can stand on to pose for a photo with the grand entrance in the backdrop.
The photo below shows the textual content in this natural scene, marked in red after the photo was passed to the snapWONDERS "analyse tool" for analysis. The textual content is: "GREAT OCEAN ROAD".
If you have concluded that "this is just OCR in an everyday photo", then you would be correct: that is exactly what textual content in natural scenes is all about.
Challenges of Textual Content in Natural Scenes
While many scientists consider OCR of printed documents to be a solved problem, OCR in natural scenes is far from solved, because text in natural scenes poses some difficult problems.
Some of the problems are:
- Text content may be unstructured, so it must be deciphered in the context of where it appears in the natural scene. For example, a photo of a person posing may contain a slogan on a T-shirt, a famous brand name on the person’s cap and a tattoo with the word "BELIEVE" on the forearm. AI would need not only to decipher the text but also to consider the context in which it was found.
- Text content is often skewed or non-horizontal, partially occluded, and written in a wide variety of font styles.
- To complicate things further, text in natural scenes may have visibility issues, such as low contrast between the text and its background, glare and reflections, or the uneven lighting one would expect in natural scenes.
- Text may also be blurry because of camera defects, motion blur or other aberrations in the image itself.
Steps to overcome some of the challenges:
Clearly, there are still challenges to be solved. However, there have been increasing successes in using AI to:
- Perform text localisation, locating the regions where text resides within the natural scene. How this works using AI and machine learning is a topic for another day.
- Pass the localised text regions through further post-processing to improve OCR results.
- Segment the image to classify the regions where the textual content resides. This provides more context for the textual information and a way to structure it into something more meaningful. How this works is also a topic for another day.
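To make the post-processing step a little more concrete, here is a minimal sketch in plain Python. It assumes a localisation step has already produced bounding boxes (the `Box` tuples and all function names are hypothetical) and applies two simple clean-up operations, min-max contrast stretching and fixed-threshold binarisation, to each cropped region before it would be handed to an OCR engine. It illustrates the general idea only and is not the snapWONDERS pipeline; production systems typically rely on learned text detectors and more robust techniques such as adaptive thresholding.

```python
from typing import List, Tuple

# A grayscale image as rows of pixel intensities in the range 0..255.
Image = List[List[int]]
# Hypothetical bounding box from a text localisation step: (x, y, width, height).
Box = Tuple[int, int, int, int]


def crop(image: Image, box: Box) -> Image:
    """Cut the localised text region out of the full scene."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


def stretch_contrast(region: Image) -> Image:
    """Min-max stretch so the darkest pixel becomes 0 and the brightest 255.

    This counteracts low contrast between the text and its background.
    """
    pixels = [p for row in region for p in row]
    lo, hi = min(pixels), max(pixels)
    if hi == lo:  # flat region: nothing to stretch
        return [row[:] for row in region]
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in region]


def binarise(region: Image, threshold: int = 128) -> Image:
    """Map each pixel to 0 (dark) or 255 (light) using a fixed threshold."""
    return [[0 if p < threshold else 255 for p in row] for row in region]


def prepare_for_ocr(image: Image, boxes: List[Box]) -> List[Image]:
    """Crop, stretch and binarise every localised region before OCR."""
    return [binarise(stretch_contrast(crop(image, b))) for b in boxes]


if __name__ == "__main__":
    # Toy 4x4 "scene" with a faint, low-contrast patch in the middle.
    scene = [
        [10, 10, 10, 10],
        [10, 100, 120, 10],
        [10, 110, 100, 10],
        [10, 10, 10, 10],
    ]
    print(prepare_for_ocr(scene, [(1, 1, 2, 2)]))  # [[[0, 255], [0, 0]]]
```

The fixed threshold of 128 is a simplifying assumption; with uneven lighting, a single global threshold often fails, which is one reason the challenges listed earlier remain hard.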
Using the "GREAT OCEAN ROAD" example above, such information might be structured so that it can be described as: a signboard with the words "GREAT OCEAN ROAD" overhanging a road, with trees and mountains in the background.
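One way to picture that structure is as a small record that combines the OCR output with labels from segmentation. The sketch below assumes a hypothetical `SceneText` schema whose fields (`container`, `relation`, `subject`, `background`) stand in for what localisation and segmentation might produce; it only shows how such a record could be rendered as the description above, not how snapWONDERS actually stores or generates it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SceneText:
    """Recognised text plus the scene context it was found in (hypothetical schema)."""
    text: str        # OCR output for the localised region
    container: str   # segmented object that carries the text
    relation: str    # spatial relation between the container and the subject
    subject: str     # main segmented object the container relates to
    background: List[str] = field(default_factory=list)  # other segmented classes

    def describe(self) -> str:
        """Render the structured record as a one-sentence description."""
        sentence = f'A {self.container} with the words "{self.text}" {self.relation} {self.subject}'
        if self.background:
            # Joins every background label with "and", e.g. "trees and mountains".
            sentence += f", with {' and '.join(self.background)} in the background"
        return sentence + "."


if __name__ == "__main__":
    record = SceneText(
        text="GREAT OCEAN ROAD",
        container="signboard",
        relation="overhanging",
        subject="a road",
        background=["trees", "mountains"],
    )
    # A signboard with the words "GREAT OCEAN ROAD" overhanging a road, with trees and mountains in the background.
    print(record.describe())
```

Keeping the text and its context as separate fields, rather than as a single sentence, means the same record could also be queried, for example to find every photo where text appears on a signboard.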
Does Textual Content in Natural Scenes provide Deeper Insights for the greater good? Can it be a Privacy Concern?
These are the questions that we’re most passionate about at snapWONDERS:
Photos and videos may contain textual content that provides further clues and context about the image under deeper analysis. Such information is often hidden in the background or in areas that are not obvious. If you share content that you would not have shared had you known what text it leaked, then that is a privacy concern.
On the flip side, deeper insights help Accessibility. Someone who is vision impaired or legally blind may not have the means to decipher information directly from the image, so the extra clues and context can greatly assist in the realm of Accessibility.
In January 2022, snapWONDERS extended its deep analysis to include textual content in natural scenes (see "Deep Image Analysis": https://snapwonders.com/upload/analyse-photo-or-image).
We will continue to push on this front throughout the year, bringing more updates as our milestones are reached. Stay tuned.