
How AI Unlocks Street View For The Visually Impaired

2025-10-30 · 6 minute read
AI
Accessibility
Innovation

Bridging the Accessibility Gap in Virtual Exploration

Interactive streetscape tools from major mapping services have fundamentally changed how we explore the world virtually. From planning routes to visiting tourist destinations from our homes, these tools are incredibly powerful. However, for the blind and low-vision community, this digital world has remained largely inaccessible. Screen readers cannot interpret the visual data in street view imagery, and descriptive alt text is typically absent.

Now, multimodal AI and advanced image understanding present an opportunity to make these immersive experiences inclusive for everyone. This technology could transform services like Google Street View, with its massive library of over 220 billion images, into a fully accessible tool for exploration and navigation.

A new paper, titled "StreetReaderAI: Making Street View Accessible Using Context-Aware Multimodal AI" and presented at UIST’25, introduces a proof-of-concept prototype that does just that. StreetReaderAI was developed by a team of blind and sighted researchers, taking inspiration from accessible games and navigation tools like Shades of Doom, BlindSquare, and SoundScape.

Key features of StreetReaderAI include:

  • Real-time, AI-generated descriptions of roads, intersections, and nearby places.
  • Dynamic conversations with a multimodal AI agent about the surrounding environment.
  • Accessible navigation controls using voice commands or keyboard shortcuts to pan and move through scenes.

How StreetReaderAI Works: An Audio-First Experience

StreetReaderAI offers an immersive, first-person exploration experience that functions much like a video game where audio is the primary interface. Users can navigate seamlessly using both their keyboard and voice.

By pressing the left and right arrow keys, users can pan their view. As they do, the system provides audio feedback on their heading, such as “Now facing: North.” It also announces if they can move forward or if they are facing a point of interest. To move through the virtual world, the up and down arrows allow for "virtual steps" forward and backward. With each step, StreetReaderAI describes the distance traveled and highlights key geographic details. For longer distances, users can also use "jump" or "teleport" commands.
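To make the interaction model concrete, here is a minimal sketch of the pan/step loop described above. It is not the StreetReaderAI implementation; names like StreetViewSession and on_key are hypothetical, and the distance value is a placeholder for what the real system would read from panorama metadata.

```python
# Minimal sketch of the keyboard-driven pan/step interaction described above.
# StreetViewSession and on_key are hypothetical names, not from StreetReaderAI.

HEADINGS = ["North", "Northeast", "East", "Southeast",
            "South", "Southwest", "West", "Northwest"]

class StreetViewSession:
    def __init__(self):
        self.heading_deg = 0  # 0 = North

    def pan(self, delta_deg: int) -> str:
        """Rotate the virtual camera and return the spoken feedback."""
        self.heading_deg = (self.heading_deg + delta_deg) % 360
        label = HEADINGS[round(self.heading_deg / 45) % 8]
        return f"Now facing: {label}"

    def step(self, forward: bool) -> str:
        """Take a 'virtual step' to an adjacent panorama, if one exists."""
        # The real system would fetch the next panorama and compute the
        # distance travelled from its metadata; 10 m is a placeholder.
        distance_m = 10
        direction = "forward" if forward else "backward"
        return f"Moved {direction} {distance_m} meters"

def on_key(session: StreetViewSession, key: str) -> str:
    """Map arrow keys to panning and virtual steps, returning audio feedback."""
    if key == "ArrowLeft":
        return session.pan(-45)
    if key == "ArrowRight":
        return session.pan(+45)
    if key == "ArrowUp":
        return session.step(forward=True)
    if key == "ArrowDown":
        return session.step(forward=False)
    return ""
```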

The Twin AI Engines: Describer and Chat

The intelligence behind StreetReaderAI comes from two core AI subsystems powered by Gemini: AI Describer and AI Chat. Both systems use a combination of static prompts, user profiles, and dynamic information—such as the user's current location, nearby places, and the street view image—to provide context-aware responses.
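The ingredients of that context are listed in the paper summary above; the sketch below shows one plausible way to assemble them into a single prompt. The field names and wording are illustrative assumptions, not the paper's actual prompt schema, and the current street view image would be attached as a separate multimodal input alongside this text.

```python
# Illustrative assembly of a context-aware prompt from a static prompt,
# a user profile, and dynamic geographic context. Field names are assumptions.

from dataclasses import dataclass

@dataclass
class GeoContext:
    lat: float
    lng: float
    heading_deg: int
    nearby_places: list[str]

def build_prompt(static_prompt: str, user_profile: str, ctx: GeoContext) -> str:
    nearby = ", ".join(ctx.nearby_places) or "none known"
    return (
        f"{static_prompt}\n\n"
        f"User profile: {user_profile}\n"
        f"Current location: ({ctx.lat:.5f}, {ctx.lng:.5f}), "
        f"facing {ctx.heading_deg} degrees\n"
        f"Nearby places: {nearby}\n"
        "Describe the attached street-level image for a blind pedestrian."
    )
```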

AI Describer

The AI Describer acts as a real-time scene description tool. It analyzes the current street view image along with geographic data to generate an audio summary. It operates in two modes: a "default" mode focused on navigation and safety for pedestrians, and a "tour guide" mode that offers historical and architectural context. The system also proactively suggests follow-up questions relevant to the scene.
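A simple way to picture the two modes is as alternative instruction blocks swapped into the prompt. The wording below is illustrative only, not the prompts used in the paper.

```python
# Hypothetical mode-specific instructions for the AI Describer.

DESCRIBER_MODES = {
    "default": (
        "Focus on pedestrian navigation and safety: sidewalks, crosswalks, "
        "curb cuts, traffic signals, and obstacles along the walking path."
    ),
    "tour_guide": (
        "Focus on points of interest: architecture, history, businesses, "
        "and notable landmarks visible in the scene."
    ),
}

def describer_instruction(mode: str) -> str:
    """Pick the mode's instruction and ask for proactive follow-up suggestions."""
    base = DESCRIBER_MODES.get(mode, DESCRIBER_MODES["default"])
    return base + " End with two short follow-up questions the user might ask."
```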

AI Chat

AI Chat expands on the describer's capabilities by allowing users to ask specific questions about their surroundings. Using Google's Multimodal Live API, the agent supports real-time interaction and retains a memory of the user's session. This memory is powerful; with a context window that can hold over 4,000 input images, the AI can recall past locations. A user can walk past a bus stop, turn a corner, and ask, “Wait, where was that bus stop?” The agent can then recall its previous context and provide a precise answer, like “The bus stop is behind you, approximately 12 meters away.”
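The key idea behind that recall is that every panorama the user visits stays in the agent's running context, so a later question can be answered from an earlier frame. The sketch below illustrates that session-memory pattern with hypothetical classes; it is not Google's Multimodal Live API.

```python
# Sketch of session memory for the chat agent: each visited panorama is
# appended to the conversation so a later question like "Where was that
# bus stop?" can be answered from earlier frames. Hypothetical classes only.

from dataclasses import dataclass, field

@dataclass
class Frame:
    image_bytes: bytes
    lat: float
    lng: float
    heading_deg: int

@dataclass
class ChatSession:
    history: list[dict] = field(default_factory=list)

    def add_frame(self, frame: Frame) -> None:
        # Every visited panorama, with its location, becomes model context.
        self.history.append({
            "role": "user",
            "parts": [
                {"inline_image": frame.image_bytes},
                {"text": f"(location {frame.lat:.5f}, {frame.lng:.5f}, "
                         f"heading {frame.heading_deg} degrees)"},
            ],
        })

    def ask(self, question: str) -> list[dict]:
        # The full multimodal request: all prior frames plus the new question.
        return self.history + [{"role": "user", "parts": [{"text": question}]}]
```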

Putting StreetReaderAI to the Test: User Feedback and Insights

To evaluate the system, an in-person lab study was conducted with eleven blind screen reader users. Participants used the tool to explore various locations and plan walking routes. The feedback was overwhelmingly positive, with an average usefulness rating of 6.4 out of 7. Users praised the interplay between virtual navigation and AI, the seamless chat interface, and the quality of the information provided.

During the study, participants explored over 350 panoramic images and made over 1,000 AI requests. Notably, the AI Chat feature was used six times more frequently than the AI Describer, showing a strong preference for personalized, conversational interaction. While the system was seen as a major accessibility advancement, some users faced challenges with orientation and determining the limits of the AI's knowledge.

The study also provided the first-ever analysis of the kinds of questions blind users ask about street imagery. The four most common categories were:

  • Spatial orientation (27.0%): Questions about the location and distance of objects, e.g., "How far is the bus stop?"
  • Object existence (26.5%): Queries about the presence of features like sidewalks or crosswalks.
  • General description (18.4%): Broad requests like, "What's in front of me?"
  • Object/place location (14.9%): Questions to find specific things, such as, "Where is the nearest intersection?"

Analyzing the AI's Performance and Accuracy

Given the system's reliance on generative AI, response accuracy is crucial. Of the 816 questions posed to AI Chat during the study:

  • 86.3% were answered correctly.
  • 3.9% were incorrect.
  • The remainder were either partially correct (3.2%) or the AI refused to answer (6.6%).

Of the incorrect responses, most were false negatives (e.g., saying an object wasn't there when it was), while others were misidentifications (e.g., confusing a speed bump with a crosswalk).

The Future of Accessible Virtual Navigation

StreetReaderAI is a significant first step toward making streetscape tools universally accessible. The research study highlights the clear demand for this technology and the potential of multimodal AI to meet that need.

Future development could expand on this work in several ways:

  • Geo-visual Agents: Creating a more autonomous AI agent that could explore on its own to answer questions like, “What’s the next bus stop down this road?”
  • Route Planning: Enabling the AI to “pre-walk” a route between two points, generating a blind-friendly summary that notes obstacles and identifies key landmarks like building entrances.
  • Richer Audio Interface: Moving beyond speech to incorporate richer, non-verbal feedback, including spatialized audio and immersive 3D soundscapes generated from the images themselves.

While still a research prototype, StreetReaderAI clearly demonstrates a path toward a more accessible and inclusive digital world.
