New apps for visually impaired users provide virtual labels for controls and a way to explore images

June 12, 2023
Written By:
Emily France, College of Engineering

With VizLens, users can touch buttons while their phones read out the labels, and ImageExplorer provides a workaround for bad or missing alt text

A panel of three screenshots. The first shows the camera's view of a microwave control panel with a finger hovering over the 5. A text overlay reading "5" indicates the audio readout. The second shows a virtual control panel on the smartphone screen, with buttons that users can press to hear the labels. All buttons were correctly read. The third panel shows the interfaces menu, listing Kitchen Microwave as a favorite, with Printer, Fridge, TV remote, Dryer, Washing Machine, Thermostat and Kitchen Microwave as other saved interfaces. Image credit: Human-AI Lab
VizLens uses a smartphone’s camera to view control interfaces, such as the one on this microwave, and read each label. When a user touches the button in the camera’s view, the smartphone can read out the label. Image credit: Human-AI Lab

Visually impaired iPhone users have two new free tools at their disposal, developed by a team now based at the University of Michigan. One can read the labels on control panels while the other identifies features in an image so that users can explore it through touch and audio feedback.

VizLens is essentially a screen reader that can function in the real world. It reads labels at the direction of the user, who points with their fingers at buttons of interest on control panels. With it, users can employ their smartphone cameras to understand and operate a variety of interfaces in their everyday environments, including home appliances and public kiosks.

“A blind user can take a picture of an interface, and we use optical character recognition to automatically detect the text labels. A user can first familiarize themself with the layout on their smartphone touchscreen. Then, they can move their finger on the physical appliance control panel, and the app will speak out the button under the user’s finger,” said Anhong Guo, U-M assistant professor of computer science and engineering, who led the development of both apps.
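The lookup Guo describes, from a fingertip position to the label beneath it, can be illustrated with a minimal Python sketch. It is not the VizLens implementation. It assumes that optical character recognition has already produced a labeled rectangle for each button and that the fingertip position is reported in the same coordinates as the reference photo. The names LabeledButton, label_under_finger and PANEL are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LabeledButton:
    """One button found in the reference photo (hypothetical data model)."""
    label: str
    left: float
    top: float
    width: float
    height: float

    def contains(self, x: float, y: float) -> bool:
        # True when the point lies inside this button's rectangle.
        return (self.left <= x <= self.left + self.width
                and self.top <= y <= self.top + self.height)


def label_under_finger(buttons: List[LabeledButton],
                       x: float, y: float) -> Optional[str]:
    """Return the label of the first button containing the fingertip, if any."""
    for button in buttons:
        if button.contains(x, y):
            return button.label
    return None  # The finger is between buttons, so there is nothing to speak.


# Made-up layout standing in for OCR output on a microwave panel.
PANEL = [
    LabeledButton("4", left=0, top=0, width=40, height=40),
    LabeledButton("5", left=50, top=0, width=40, height=40),
    LabeledButton("Start", left=0, top=50, width=90, height=40),
]

print(label_under_finger(PANEL, 60, 10))  # fingertip over the "5" button
```

A real app would also need to locate the fingertip in the live camera view and align that view with the reference photo. The sketch leaves out both steps.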

The second app, ImageExplorer, helps visually impaired individuals better understand the content of images. For this purpose, Guo and his team have integrated a suite of object detection and segmentation models, including Meta’s Detectron2 visual recognition library and Google OCR (optical character recognition) and image analysis models, to enable visually impaired users to explore what is in an image and how the different objects relate to one another.

The screenshot shows two women in business dress in the foreground of the photo, and two park benches in the background. One bench has an older woman sitting on it, facing the camera, and the other has two older men who are looking away. ImageExplorer identifies all these in red outline. The wall of greenery, fence, sidewalk and street are not outlined. Image credit: Human-AI Lab
ImageExplorer identifies the people, benches and bags in this photo. It correctly autocaptions the image as “a couple of women walk down a sidewalk.” The app accurately recognizes some clothing types, like skirts, while most tops are simplified to “shirts.” Image credit: Human-AI Lab

Guo’s aim is to offer visually impaired people agency when alt text is missing or incomplete, as AI-generated captions are often not sufficient.

“There are a number of automated caption programs out there that blind people use to understand images, but they often have errors, and it’s impossible for users to debug them because they can’t see the images,” Guo said. “Our goal, then, was to stitch together a bunch of AI tools to give users the ability to explore images in more detail with a greater degree of agency.”

When a user uploads an image, ImageExplorer provides a thorough analysis of its content. It gives a general overview of the image, including the objects detected, relevant tags and a caption. The app also features a touch-based interface that allows users to explore the spatial layout and content of the image by pointing to different areas.

ImageExplorer is unique in the level of detail it provides. It gives users a comprehensive description of the objects in an image, down to the level of what type of clothing a person is wearing and what activities they are engaged in, as well as the position of these objects in the image.
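The two levels of description, an overview followed by touch exploration, can be sketched in Python as follows. This is an illustration under stated assumptions, not ImageExplorer's code. It assumes the detection models return a name, a bounding box and optional details for each object. The names DetectedObject, overview, describe_touch and SCENE are hypothetical, and the scene data is made up.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DetectedObject:
    """One detected object (hypothetical data model)."""
    name: str
    box: Tuple[float, float, float, float]  # left, top, right, bottom (0 to 1)
    details: Tuple[str, ...] = ()

    def area(self) -> float:
        left, top, right, bottom = self.box
        return (right - left) * (bottom - top)

    def contains(self, x: float, y: float) -> bool:
        left, top, right, bottom = self.box
        return left <= x <= right and top <= y <= bottom


def overview(caption: str, objects: List[DetectedObject]) -> str:
    """First level: the caption plus a count of each kind of object."""
    counts = Counter(obj.name for obj in objects)
    parts = [f"{count} {name}" for name, count in sorted(counts.items())]
    return f"{caption}. Objects: {', '.join(parts)}."


def describe_touch(objects: List[DetectedObject],
                   x: float, y: float) -> List[str]:
    """Second level: everything under the touch point, smallest object first."""
    hits = sorted((obj for obj in objects if obj.contains(x, y)),
                  key=DetectedObject.area)
    return [f"{obj.name} ({', '.join(obj.details)})" if obj.details else obj.name
            for obj in hits]


# Made-up detections for a street scene.
SCENE = [
    DetectedObject("person", (0.1, 0.2, 0.4, 0.9), ("skirt", "walking")),
    DetectedObject("bag", (0.3, 0.5, 0.4, 0.7)),
    DetectedObject("bench", (0.6, 0.4, 0.9, 0.6)),
]

print(overview("a couple of women walk down a sidewalk", SCENE))
print(describe_touch(SCENE, 0.35, 0.6))
```

Listing the smallest object first is a design choice made for this sketch. It lets a touch on a bag report the bag before the person carrying it.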

“ImageExplorer helps users understand the content of an image even though they cannot see it,” Guo said.

Hundreds of visually impaired user-testing participants have experimented with VizLens and ImageExplorer, offering feedback to Guo’s team, which is continuing to develop these tools. First discussed in 2022, ImageExplorer is a much newer concept than VizLens, which made its academic debut in 2016. Some of ImageExplorer’s details need further refinement. For instance, most tops are simplified to “shirts,” and different tools within ImageExplorer sometimes give conflicting information.

“The accuracy relies on the models we use, and as they improve, ImageExplorer will improve,” Guo said. “In spite of these errors, the results we presented in 2022 show that ImageExplorer enables users to make more informed judgements of the accuracy of the AI-generated captions.”

Guo is also looking forward to the feedback that will come with public deployment.

“We will be able to observe how people use these tools and adapt them to their lives,” he said.

VizLens: Apple App Store, Website, 2016 study

ImageExplorer: Apple App Store, Website, 2022 study

The research is funded by the University of Michigan with additional support from Google.