Android 7.0+ · v1.2.0 · Free forever

Download PhotoLens for Android

The accessible photo gallery that speaks. PhotoLens runs Google Gemma 4 entirely on your device: no cloud, no account, and no internet needed after the one-time model download. Install the APK below, download a model, and start hearing your photographs.

SHA-signed APK hosted on GitHub Releases. To install, allow "Install unknown apps" for your browser when prompted.

In this release

What's new in v1.2.0


What's New in PhotoLens

All notable changes to PhotoLens are documented here. Versions follow Semantic Versioning.

[1.2.0] — 2026-05-10

Theme: Multi-photo chat, rich sound design, secure vault, full accessibility polish, and a settings overhaul.

New Features

Multi-Photo Chat (URI Preview)

Attach multiple photos in a single chat session — add up to N photos from your gallery or camera directly into a conversation
Horizontal photo strip above the message list shows each attached image with a remove button
Camera capture within AskScreen — take a photo on the spot and send it immediately to the AI
Photo picker integration in AskScreen, same flow as HomeScreen's "New Photo" sheet

Sound Design

Five ambient audio cues (processing loop, reply ding, success chime, page turn, and delete sweep), all played off the main thread so they never block the UI
Haptic feedback with configurable sensitivity (0–100%)
All sounds independently togglable per-event in Settings

Secure Vault (Collections)

Per-collection security — lock any album behind biometric or a custom password; vault photos are physically moved to filesDir/secure_photos, not merely hidden
Vault passwords hashed with SHA-256 + 16-byte random salt + 10,000 iterations; raw password is never written to disk
Constant-time password comparison to prevent timing attacks
Security progress bottom sheet with animated progress bar during vault operations
showSecureCollectionsInList setting to control vault visibility in the Collections tab

TTS Reader — Complete Rewrite

ReaderComponent embedded in every chat bubble and OCR result sheet
Reading modes: Characters, Words, Sentences, Paragraphs, Lines — switchable on the fly
Prev / Play-Pause / Next segment controls with TalkBack-labelled state
autoplayOnLoad and stopOnBackground lifecycle hooks
Non-blocking architecture: TTS settings are read on Dispatchers.IO and tts.speak() is posted to Main, so the UI thread never waits
Voice selection, pitch, rate, and engine configurable in Settings

Full Settings Screen

Appearance & Gallery — theme (System/Light/Dark/AMOLED), accent color (5 colors + Dynamic Material You), startup screen selection, grid density (2–5 columns), keep screen awake toggle
Advanced Filters — saved per-session; min/max width & height, date range picker, MIME type chip filter
AI Tuning — temperature, top-P, top-K sliders; response length (Brief / Balanced / Detailed / Extremely Detailed); thinking budget; streaming toggle; auto-generate descriptions
Sound & Vibration — per-event toggles for all five sound events plus haptic sensitivity slider
Security — biometric vs. custom-password vault mode; change vault password; disable vault with confirmation dialog
TTS — engine picker (all installed TTS engines), language picker (50+ languages from languages.json), pitch, rate, voice selection
Secure Sharing — strip GPS and full EXIF metadata before sharing
Language — AI response language picker with human-readable names (13+ languages from languages.json)

Onboarding Flow

5-page pager onboarding on first launch, fully TalkBack-navigable
Pages: Welcome, Smart Gallery, AI Features, Accessibility, Privacy & Security
Model download prompt triggered automatically after onboarding if no model is present

ReasoningBlock Component

Expandable "chain-of-thought" block in chat bubbles and photo description — shows the model's internal reasoning before the final answer
Collapsed by default; animates open/close with a chevron

Markdown Renderer

Full MarkdownContent composable with no external dependency
Supports: headings (H1–H4), bold, italic, bold-italic, inline code, code blocks, blockquotes, unordered lists, ordered lists, horizontal rules, links (tappable)

About, Help & Support, Privacy, Terms screens

All rendered via the generic MarkdownScreen composable backed by res/raw/*.md files

Improvements

Smart Collections empty state — contextual message depending on whether any AI descriptions exist; CTA button to go to Settings when descriptions haven't been generated yet
Photo detail screen — pinch-to-zoom + two-finger rotate with spring-back animation; rotation persisted per photo; zoom/rotation reset on navigation
Bulk operations — add to favorites, generate descriptions, recognize text, share, add to collection, delete — all from the selection top bar
OCR bottom sheet — TTS reader embedded; copy + share buttons; streaming progress indicator
ListScreen — collection/album detail with the same sort/filter/bulk-select as HomeScreen
Session memory optimization — "Preparing AI Environment" overlay with polite live-region announcement so TalkBack users know when chat is ready
Scroll-to-bottom FAB in AskScreen — appears when the user scrolls up; animates in/out with fade + scale
Chat export — copy all messages to clipboard or share as plain text
Message regeneration — re-run the last AI turn from the same message index
SpeechRecognizer lifecycle fix — recognizer is properly destroyed when AskScreen leaves composition, preventing a native listener leak after back-navigation

Bug Fixes

GemmaAdapter.processResponse now builds JSON with JSONObject/JSONArray instead of string interpolation — fixes crashes on model output containing quotes, backslashes, or newlines
Vault password stored as SHA-256 hash + salt rather than plaintext — security fix
LiteRtLmManager single-session constraint: properly prevents a second createConversation while one is already open; description session re-opens when chat session ends
TtsManager no longer blocks the UI thread — all TTS calls post to Main via withContext(Dispatchers.Main) from Dispatchers.IO
Photo URI resolution handles both file:// scheme and content:// URIs via ParcelFileDescriptor /proc/self/fd trick
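The JSONObject/JSONArray fix matters because interpolating raw model output into a JSON template breaks as soon as the text contains a quote, backslash, or newline. On Android, `JSONObject().put("description", text).toString()` does the escaping for you. The stdlib-only sketch below spells that escaping out so it runs on a plain JVM. `jsonQuote` and `toolCallJson` are hypothetical helpers, not PhotoLens code.

```kotlin
// Broken pattern being replaced: "{\"description\":\"$text\"}" is invalid JSON
// whenever text contains a quote, backslash, or newline.

// Hypothetical helper: encode a string as a JSON string literal (RFC 8259 escaping).
fun jsonQuote(s: String): String {
    val sb = StringBuilder(s.length + 2)
    sb.append('"')
    for (c in s) {
        when (c) {
            '\"' -> sb.append("\\\"")
            '\\' -> sb.append("\\\\")
            '\n' -> sb.append("\\n")
            '\r' -> sb.append("\\r")
            '\t' -> sb.append("\\t")
            '\b' -> sb.append("\\b")
            '\u000C' -> sb.append("\\f")
            // Any other control character must be written as a backslash-u escape.
            else -> if (c < ' ') sb.append(String.format("\\u%04x", c.code)) else sb.append(c)
        }
    }
    sb.append('"')
    return sb.toString()
}

// Hypothetical: build a flat tool-call style object from string fields.
fun toolCallJson(fields: Map<String, String>): String =
    fields.entries.joinToString(separator = ",", prefix = "{", postfix = "}") { (k, v) ->
        jsonQuote(k) + ":" + jsonQuote(v)
    }
```

With this helper, a description such as `A "red" door` survives as `{"description":"A \"red\" door"}` instead of ending the string early.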

[1.1.0] — 2026-04-14

Theme: Multi-model architecture with model-specific adapters for full control over tokenisation strategy.

New Features

Multi-Model Architecture

ModelAdapter interface — pluggable adapter per model that controls system instruction, content building, conversation config creation, and response post-processing
GemmaAdapter — dedicated adapter for Gemma 4 models; implements tool-call structured JSON response parsing with proper escaping
FastVlmAdapter — adapter skeleton for FastVLM-class models with different tokenisation strategy
Models defined in assets/models.json — adapter field determines which adapter is instantiated at runtime

Model-Specific Behaviour

getSystemInstruction() — per-adapter system prompt customisation
buildAnalysisContent() — per-adapter content construction for photo analysis turns
buildAskContent() — per-adapter content construction for interactive chat turns
createConversationConfig() — per-adapter ConversationConfig including sampler config and tool registration
processResponse() — per-adapter post-processing of raw model output and tool call arguments

`JsonModel` Data Class

New fields: adapter (string, selects adapter class), preferredBackend (overrides global CPU/GPU), toolCall (boolean), memoryMinRequired, memoryRecommended
ModelStatus wraps JsonModel with download state, progress bytes, speed, ETA, and error message

Models Manager

Redesigned ModelConfigScreen — card-per-model layout showing size, RAM requirements, download status, speed, ETA
Download/cancel/delete controls per model; active model shown with a checkmark badge
Model switch triggers full LiteRtLmManager.shutdown() + re-init to ensure correct adapter is loaded

Improvements

LiteRtLmManager session mode enum (DESCRIPTION / CHAT) replaces boolean flag — clearer invariants and easier to extend
Speculative decoding enabled via ExperimentalFlags.enableSpeculativeDecoding = true for faster token generation
ModelDownloadService — foreground service with resume support via HTTP Range header; progress reported via SharedFlow
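The DESCRIPTION / CHAT session-mode enum can be illustrated with a small lock-guarded state holder. This sketch rests on several assumptions and is not LiteRtLmManager itself:

- `SessionGate` and the extra `NONE` idle state are hypothetical.
- The open/close callbacks stand in for creating and closing a LiteRT-LM Conversation.
- A JVM `ReentrantLock` plays the role of the mutex.

```kotlin
import java.util.concurrent.locks.ReentrantLock
import kotlin.concurrent.withLock

// DESCRIPTION and CHAT come from the changelog; NONE is an assumed idle state.
enum class SessionMode { NONE, DESCRIPTION, CHAT }

// Hypothetical gate enforcing "at most one open conversation per engine".
class SessionGate(
    private val openSession: (SessionMode) -> Unit,  // stand-in for creating a Conversation
    private val closeSession: (SessionMode) -> Unit, // stand-in for closing it
) {
    private val lock = ReentrantLock()

    var mode: SessionMode = SessionMode.NONE
        private set

    // Switch to `target`, closing whatever session is currently open first.
    fun acquire(target: SessionMode) {
        require(target != SessionMode.NONE) { "call release() to go idle" }
        lock.withLock {
            if (mode == target) return
            if (mode != SessionMode.NONE) closeSession(mode)
            openSession(target)
            mode = target
        }
    }

    // End the current session; when CHAT ends, hand the engine back to DESCRIPTION.
    fun release(resumeDescription: Boolean = true) {
        lock.withLock {
            val ending = mode
            if (ending == SessionMode.NONE) return
            closeSession(ending)
            mode = SessionMode.NONE
            if (ending == SessionMode.CHAT && resumeDescription) {
                openSession(SessionMode.DESCRIPTION)
                mode = SessionMode.DESCRIPTION
            }
        }
    }
}
```

One benefit of an enum over a boolean shows up in `release()`. The gate knows which mode is ending, so it can reopen DESCRIPTION after CHAT, which matches the "description session re-opens when chat session ends" fix listed in 1.2.0.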

Bug Fixes

ModelAdapter.processResponse with tool calls no longer crashes on model output that embeds special JSON characters — replaced string interpolation with JSONObject.put() throughout

[1.0.0] — 2026-04-14

Theme: Initial release — on-device AI photo gallery built for accessibility.

New Features

Core AI Integration

On-device AI inference via Google AI Edge LiteRT-LM with Gemma 4 multimodal model
Natural language photo descriptions — full sentences generated by Gemma 4 locally, no internet
Interactive Ask Mode — streaming chat interface to ask any question about a photo
Smart categorisation — photos automatically grouped into Nature, People, Food, Documents, Travel, Architecture, Pets, Sports via tool calls
OCR / text recognition — extract text from any image using the same on-device model
Thinking Mode — chain-of-thought reasoning visible before the final answer
Multilingual output — 13 languages selectable for AI response language

Gallery & Navigation

Grid and list view — togglable; grid supports 2–5 columns
Date-grouped photo timeline with sticky headers
Local Albums via Android MediaStore
Smart Collections — dynamic albums built from AI-generated categories
Favorites — star any photo; dedicated Favorites tab
Bottom navigation: Photos / Collections / Favorites
Full-screen photo detail with share, favorite, rotate, more-menu

Privacy & Security

Zero cloud processing — all AI runs on-device GPU/CPU/NPU
No analytics, no telemetry, no account required
Secure Vault foundation — architecture for secure collections in place

Accessibility Foundation

Full TalkBack semantic labelling on every UI element
WCAG 2.1 Level AA — high contrast, generous touch targets, predictable navigation
Live regions for progress announcements
Voice input for Ask Mode
Built-in TTS with segment-based reading controls

Model Management

Foreground download service for model files (~2.4 GB)
Progress tracking with speed and ETA
GPU / CPU backend selection

Settings

AI backend (GPU / CPU), temperature, response language
Basic gallery preferences (view mode, grid columns, sort order)

Other Screens

About, Help & Support — Markdown-rendered
Privacy Policy and Terms of Use — in-app Markdown

Technical Foundation

MVVM + Repository pattern; single source of truth in PhotoRepository
Jetpack Compose + Material 3 throughout
Hilt dependency injection
Room database (schema version 9) for photo metadata and description state
DataStore for persistent preferences
Coil for image loading with HEIC→JPEG auto-conversion

For the full feature list, see features.md.

Capabilities

Everything PhotoLens can do

PhotoLens — Complete Feature Reference

Version 1.2.0 · Android 7.0+ (API 24+) · 100% On-Device · No Cloud · No Account

On-Device AI

Feature	Details
AI Engine	Google AI Edge LiteRT-LM 0.11
Model	Google Gemma 4 E2B (2.41 GB) or E4B (4.56 GB)
Inference	100% on-device — GPU, CPU, or NPU
Internet required	One-time model download only
Photo data sent to cloud	Never

Photo Description

Natural language descriptions generated by Gemma 4 in full sentences (not keywords)
Four response length presets: Brief (1 sentence) · Balanced (≤3 sentences) · Detailed (step-by-step) · Extremely Detailed (exhaustive full-pass)
Technical details mode — includes lighting, resolution, and quality assessment when enabled
Tool-call structured output — description, categories, tags, mood, and technical quality returned as parsed JSON
Auto-generate toggle — describes photos automatically as they scroll into view (bounded queue, one inference at a time)
Manual generate — trigger description for any single photo from the gallery or detail view

Interactive Ask Mode (AI Chat)

Chat interface with streaming token output per message
Attach multiple photos in a single conversation (horizontal photo strip)
Add photos from gallery picker or camera capture without leaving the chat
Regenerate any AI message with one tap
Stop streaming mid-response
Export or copy all messages in a conversation
Session isolation — chat session and description session never overlap; explicit hand-off protocol prevents native engine conflicts

OCR / Text Recognition

Extract printed text, signs, labels, documents, and handwriting from any image
On-device, no cloud OCR
Result displayed in a bottom sheet with copy, share, and TTS reader

Smart Categorisation

Automatic photo categories: Nature · People · Food · Documents · Travel · Architecture · Pets · Sports · Other
Categories stored in the Room database alongside descriptions
Categories drive Smart Collections: photos categorised as "Food" appear in the Food smart album automatically

Thinking Mode

Chain-of-thought reasoning block visible before the final description or answer
Expandable / collapsible ReasoningBlock component in photo detail and chat bubbles
Configurable thinking budget (0 = fast, higher = deeper reasoning)

AI Settings

Backend: GPU (default) or CPU — model preferredBackend can override the global setting
Temperature (0.0–1.5) — controls output randomness
Top-P (0.0–1.0) — nucleus sampling probability
Top-K (1–100) — token selection pool size
Response language — 13+ languages via assets/languages.json
Enable streaming — toggle token-by-token output vs. single-shot response
Provide technical details — adds lighting/resolution qualifiers to prompts
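To make the Temperature, Top-P, and Top-K sliders concrete, the sketch below shows the textbook way these three parameters reshape a next-token distribution. It is a generic illustration, not LiteRT-LM's sampler, whose internals this page does not describe. The name `candidateDistribution` is hypothetical. Applying top-K before top-P and treating temperature 0 as greedy decoding are both assumptions.

```kotlin
import kotlin.math.exp

// Returns token index -> renormalised probability for the tokens that survive filtering.
fun candidateDistribution(
    logits: DoubleArray,
    temperature: Double,
    topK: Int,
    topP: Double,
): Map<Int, Double> {
    require(logits.isNotEmpty() && topK >= 1 && topP >= 0.0)
    // Assumption: temperature 0 is treated as greedy decoding (keep only the best token).
    if (temperature <= 0.0) return mapOf(logits.indices.maxByOrNull { logits[it] }!! to 1.0)
    // Softmax with temperature; subtracting the max logit keeps exp() from overflowing.
    val scaled = logits.map { it / temperature }
    val max = scaled.maxOrNull()!!
    val weights = scaled.map { exp(it - max) }
    val total = weights.sum()
    // Top-K: rank by probability and keep at most topK tokens.
    val ranked = weights.indices
        .map { it to weights[it] / total }
        .sortedByDescending { it.second }
        .take(topK)
    // Top-P (nucleus): keep the smallest prefix whose cumulative probability reaches topP.
    val kept = mutableListOf<Pair<Int, Double>>()
    var cumulative = 0.0
    for (entry in ranked) {
        kept += entry
        cumulative += entry.second
        if (cumulative >= topP) break
    }
    // Renormalise the survivors so they sum to 1 before sampling.
    val keptTotal = kept.sumOf { it.second }
    return kept.associate { (index, p) -> index to p / keptTotal }
}
```

Higher temperature flattens the distribution, lower Top-K caps how many tokens compete, and lower Top-P trims the long tail. That is why small values of any of the three make output more predictable.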

Multi-Model Support

ModelAdapter interface allows each model to define its own tokenisation strategy, system instruction, content construction, and response post-processing
GemmaAdapter — optimised for Gemma 4 tool-call JSON output with full special-character escaping
FastVlmAdapter — skeleton for FastVLM-class models with different content layout
Model registry in assets/models.json — add new .litertlm models without code changes
Per-model preferredBackend, memoryMinRequired, memoryRecommended, visionSupport, toolCall flags
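The registry-driven design above can be sketched as a `when` over the model entry's `adapter` field. The interface here is cut down to two of the methods named in the changelog, with simplified, assumed signatures; the real ones work with LiteRT-LM content and config types. The following are all hypothetical:

- the adapter keys `"gemma"` and `"fastvlm"`
- the placeholder prompts
- the `Backend` enum
- the `effectiveBackend` helper

```kotlin
// Subset of the fields listed for assets/models.json entries.
data class JsonModel(
    val name: String,
    val adapter: String,                  // selects the adapter class
    val preferredBackend: String? = null, // "GPU" / "CPU"; null = follow the global setting
    val toolCall: Boolean = false,
    val memoryMinRequired: Long = 0,
    val memoryRecommended: Long = 0,
)

enum class Backend { GPU, CPU }

// Cut-down adapter contract; real signatures differ.
interface ModelAdapter {
    fun getSystemInstruction(): String
    fun processResponse(raw: String): String
}

class GemmaAdapter : ModelAdapter {
    override fun getSystemInstruction() = "placeholder: Gemma tool-call system prompt"
    override fun processResponse(raw: String) = raw.trim()
}

class FastVlmAdapter : ModelAdapter {
    override fun getSystemInstruction() = "placeholder: FastVLM plain-prose system prompt"
    override fun processResponse(raw: String) = raw.trim()
}

// Runtime selection: a new model with a known adapter key needs only a JSON entry.
fun adapterFor(model: JsonModel): ModelAdapter = when (model.adapter.lowercase()) {
    "gemma" -> GemmaAdapter()
    "fastvlm" -> FastVlmAdapter()
    else -> throw IllegalArgumentException("Unknown adapter '${model.adapter}' for ${model.name}")
}

// A model's preferredBackend overrides the global GPU/CPU choice when present.
fun effectiveBackend(model: JsonModel, global: Backend): Backend =
    model.preferredBackend?.let { Backend.valueOf(it.uppercase()) } ?: global
```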

Gallery

Views & Layout

Grid view — 2, 3, 4, or 5 columns (user-selectable)
List view — single-column with metadata visible
Toggle between grid and list from the toolbar
Date-grouped timeline with sticky date headers in list view
Description preview in gallery cells (toggleable)

Collections & Albums

Local Albums — MediaStore-backed; mirrors device folders automatically
Smart Collections — AI-powered dynamic albums built from photo categories; refresh on demand
Smart collections shown in a 2-column grid with an "AI" badge
Secure collections — lock any album behind vault authentication
Rename any collection from a long-press bottom sheet
Copy all photos from one collection to another
Share all photos in a collection

Favorites

Star/unstar any photo; persisted in Room database
Dedicated Favorites tab in the bottom navigation bar

Search & Filter

Search — live-filter by photo name across the current view
Sort — Date taken (newest/oldest), Date added (newest/oldest), File name (A–Z / Z–A), File size (largest/smallest)
Filters (persistent per session):
- Show / hide videos, screenshots, RAW photos, hidden files
- Group similar photos toggle
- Min / max width and height (pixels)
- Date range picker (from / to)
- MIME type filter chips (JPEG, PNG, WebP, HEIF, GIF, MP4, WebM)
- Reset all filters button
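The persistent filters above amount to a predicate over each photo's metadata. Below is a minimal sketch built on a hypothetical `PhotoMeta` record and a `PhotoFilters` holder. The names and fields are illustrative, not the app's MediaStore model. A null bound means "no limit", and an empty MIME set means "all types".

```kotlin
import java.time.LocalDate

// Hypothetical metadata for one MediaStore item.
data class PhotoMeta(
    val name: String,
    val width: Int,
    val height: Int,
    val dateTaken: LocalDate,
    val mimeType: String,
    val isVideo: Boolean = false,
    val isScreenshot: Boolean = false,
)

// Hypothetical filter state; null bounds are unset.
data class PhotoFilters(
    val showVideos: Boolean = true,
    val showScreenshots: Boolean = true,
    val minWidth: Int? = null, val maxWidth: Int? = null,
    val minHeight: Int? = null, val maxHeight: Int? = null,
    val from: LocalDate? = null, val to: LocalDate? = null,
    val mimeTypes: Set<String> = emptySet(), // empty = all types
    val query: String = "",                  // live search by file name
)

fun PhotoFilters.matches(p: PhotoMeta): Boolean {
    fun inRange(v: Int, lo: Int?, hi: Int?) = (lo == null || v >= lo) && (hi == null || v <= hi)
    return (showVideos || !p.isVideo) &&
        (showScreenshots || !p.isScreenshot) &&
        inRange(p.width, minWidth, maxWidth) &&
        inRange(p.height, minHeight, maxHeight) &&
        (from == null || !p.dateTaken.isBefore(from)) && // inclusive date range
        (to == null || !p.dateTaken.isAfter(to)) &&
        (mimeTypes.isEmpty() || p.mimeType in mimeTypes) &&
        p.name.contains(query, ignoreCase = true)
}
```

In this model, "Reset all filters" is just replacing the state with `PhotoFilters()`.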

Multi-Select Bulk Operations

Long-press any photo to enter selection mode
Tap additional photos to extend selection
Bulk: Favorite · Generate descriptions · Recognize text · Share · Add to collection · Delete
Contextual selection top bar with photo count

Photo Detail

Full-screen photo viewer
Pinch-to-zoom with spring-back and bounds clamping
Two-finger rotate with spring-back; rotation persisted per photo
Double-tap to reset zoom and rotation
Share (with optional metadata stripping)
Toggle favorite
Rotate 90° (clockwise; persisted)
Add / remove from album
Delete with confirmation dialog
Description display with streaming indicator
Reasoning block (Thinking Mode output)
Ask button — opens AI Chat preloaded with the current photo
Photo metadata sheet (date, dimensions, size, location, MIME type, bucket)
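Bounds clamping for pinch-to-zoom typically does two things: it limits the scale to a range, and it limits the pan so the zoomed image cannot be dragged past its edges. The sketch below shows that arithmetic under several assumptions:

- The image fills the view at 1×.
- Scaling happens about the view centre.
- The scale range is 1×–5×.

`ZoomState` and `clampZoom` are hypothetical names. In Compose, the clamped values would feed `graphicsLayer`'s `scaleX`/`scaleY` and `translationX`/`translationY`.

```kotlin
// Hypothetical transform for an image scaled about the centre of its view.
data class ZoomState(val scale: Float = 1f, val offsetX: Float = 0f, val offsetY: Float = 0f)

// Clamp the scale to [minScale, maxScale], then clamp the pan so the image edges
// never move inside the viewport. The default bounds are assumptions.
fun clampZoom(
    state: ZoomState,
    viewWidth: Float,
    viewHeight: Float,
    minScale: Float = 1f,
    maxScale: Float = 5f,
): ZoomState {
    val s = state.scale.coerceIn(minScale, maxScale)
    // Scaled about the centre, the image overhangs each side by (s - 1) * size / 2.
    val maxX = (s - 1f) * viewWidth / 2f
    val maxY = (s - 1f) * viewHeight / 2f
    return ZoomState(
        scale = s,
        offsetX = state.offsetX.coerceIn(-maxX, maxX),
        offsetY = state.offsetY.coerceIn(-maxY, maxY),
    )
}
```

For a spring-back effect, the gesture can overshoot freely while fingers are down. When the gesture ends, the transform animates from its current values to `clampZoom(...)`.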

TTS Reader

Available in: chat bubbles, OCR results sheet, photo descriptions.

Setting	Options
Reading mode	Characters · Words · Sentences · Paragraphs · Lines
Autoplay on load	On / Off
Stop on background	On / Off
TTS engine	Any installed Android TTS engine
Language	Full BCP-47 locale picker
Voice	Voice picker filtered by language
Pitch	0.5× – 2.0×
Rate	0.5× – 2.0×

Prev / Play-Pause / Next segment controls
Each button labelled with the current reading mode for TalkBack
Non-blocking: settings are read on Dispatchers.IO and TTS calls are posted to Main, so the main thread never waits
Settings cached to skip redundant JNI calls on re-render
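Each reading mode is a way of splitting text into the segments that Prev / Next step through. The JVM-only sketch below uses `java.text.BreakIterator` for words and sentences. The `ReadingMode` enum mirrors the modes listed above. These parts are assumptions rather than the app's implementation:

- the `segment` helper
- the blank-line paragraph rule
- the default chunk size of 1

```kotlin
import java.text.BreakIterator
import java.util.Locale

// Mirrors the five reading modes listed above.
enum class ReadingMode { CHARACTERS, WORDS, SENTENCES, PARAGRAPHS, LINES }

// Split text into the segments that Prev / Next step through.
fun segment(
    text: String,
    mode: ReadingMode,
    chunkSize: Int = 1,                   // "Character chunk size" (assumed default)
    locale: Locale = Locale.getDefault(),
): List<String> = when (mode) {
    ReadingMode.CHARACTERS -> text.chunked(chunkSize).filter { it.isNotBlank() }
    ReadingMode.WORDS -> breakText(text, BreakIterator.getWordInstance(locale))
        .filter { seg -> seg.any { it.isLetterOrDigit() } } // drop spaces and punctuation
    ReadingMode.SENTENCES -> breakText(text, BreakIterator.getSentenceInstance(locale))
    ReadingMode.PARAGRAPHS -> text.split(Regex("\\n\\s*\\n")).map { it.trim() }.filter { it.isNotEmpty() }
    ReadingMode.LINES -> text.lines().map { it.trim() }.filter { it.isNotEmpty() }
}

// Walk BreakIterator boundaries and collect trimmed, non-empty pieces.
fun breakText(text: String, iterator: BreakIterator): List<String> {
    iterator.setText(text)
    val pieces = mutableListOf<String>()
    var start = iterator.first()
    var end = iterator.next()
    while (end != BreakIterator.DONE) {
        val piece = text.substring(start, end).trim()
        if (piece.isNotEmpty()) pieces += piece
        start = end
        end = iterator.next()
    }
    return pieces
}
```

Prev / Next then reduces to moving an index within the returned list. Switching modes "on the fly" means re-segmenting the text and mapping the current character offset to the new segment.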

Accessibility

PhotoLens was built by a visually impaired developer specifically for visually impaired users. Accessibility is the primary use case, not a feature checkbox.

TalkBack

Every UI element has a precise contentDescription
Every status change (description generated, download complete, session ready) is announced via polite live regions without requiring focus navigation
Headings marked with Modifier.semantics { heading() } for efficient swipe navigation
Selection mode top bar announces count: "3 photos selected"
Collection cards announce name and photo count

Standards Compliance

WCAG 2.1 Level AA — high-contrast colour schemes, generous touch targets (≥48dp), predictable navigation
Every screen reachable via swipe navigation alone
No time-limited interactions

Voice Input

Voice typing in Ask Mode via Android SpeechRecognizer
Microphone permission properly gated with rationale sheet and Settings deep-link fallback
Recognizer properly destroyed when AskScreen leaves composition (prevents native listener leak)

Other

Onboarding pager fully navigable with TalkBack
Model download progress announced at every percentage update
"Preparing AI Environment" overlay announced so users know when chat is ready
Every settings toggle announces its on/off state
Keep Screen Awake option for users who need extended viewing time

Privacy & Security

Zero Data Collection

No analytics SDK
No crash reporter (no Firebase Crashlytics, Sentry, or similar)
No advertising SDK
No telemetry
No account, no sign-in, no email required

Zero Cloud Processing

AI inference runs on-device GPU/CPU/NPU only
Photos are never uploaded for analysis
No API calls during photo description or chat

Secure Vault

Lock any collection behind biometric (fingerprint / face) or a custom password
Vault photos physically moved to filesDir/secure_photos — inaccessible to other apps without root
On lock: photos removed from MediaStore-visible paths
On unlock: a biometric challenge via BiometricPrompt, or the vault password prompt in custom-password mode
Password hashing: SHA-256 + 16-byte random salt + 10,000 iterations
Constant-time comparison prevents timing attacks
Raw password is never written to disk or DataStore
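The hashing scheme above can be sketched with the JDK alone: SHA-256, a 16-byte random salt, 10,000 iterations, and a constant-time comparison. This illustrates the stated parameters and is not PhotoLens's code. The helper names, the salt-then-input order in each round, and keeping salt and hash together in one record are assumptions. `MessageDigest.isEqual` compares in constant time on current JDKs.

```kotlin
import java.security.MessageDigest
import java.security.SecureRandom

// Hypothetical record persisted instead of the password (e.g. Base64-encoded in DataStore).
class StoredVaultHash(val salt: ByteArray, val hash: ByteArray)

// Iterated, salted SHA-256: round 1 hashes salt || password, later rounds hash salt || previous digest.
fun deriveVaultHash(password: String, salt: ByteArray, iterations: Int = 10_000): ByteArray {
    require(iterations >= 1)
    val md = MessageDigest.getInstance("SHA-256")
    md.update(salt)
    var digest = md.digest(password.toByteArray(Charsets.UTF_8)) // digest() also resets md
    repeat(iterations - 1) {
        md.update(salt)
        digest = md.digest(digest)
    }
    return digest
}

fun createVaultHash(password: String): StoredVaultHash {
    val salt = ByteArray(16).also { SecureRandom().nextBytes(it) } // 16-byte random salt
    return StoredVaultHash(salt, deriveVaultHash(password, salt))
}

// Constant-time comparison, so response timing does not reveal how many bytes matched.
fun verifyVaultPassword(candidate: String, stored: StoredVaultHash): Boolean =
    MessageDigest.isEqual(stored.hash, deriveVaultHash(candidate, stored.salt))
```

An iterated salted hash like this is a simplified relative of standard password KDFs such as PBKDF2. The iteration count exists to make each guess more expensive for an attacker.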

Share Privacy Controls

Strip GPS/location data before sharing
Strip full EXIF device metadata (make, model, software) before sharing
HEIC→JPEG auto-conversion on share (for compatibility without metadata leakage)

Sound & Haptics

Sound	Trigger
Processing loop	AI description or chat inference in progress
Reply	AI message received in chat
Success	Photo description completed
Page turn	TTS reader segment advance
Delete	Photo deleted

Each sound independently toggleable in Settings
Haptic feedback with configurable intensity (0–100%)
Non-blocking: sound playback runs off the main thread via SoundManager
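One way to map the 0–100% haptic slider onto Android's vibration API is shown below. `VibrationEffect.createOneShot(durationMs, amplitude)` takes an amplitude from 1 to 255, so the percentage is scaled into that range, and 0% skips vibrating entirely. The linear mapping and the `hapticAmplitude` name are assumptions; this page does not say how PhotoLens scales the value.

```kotlin
import kotlin.math.roundToInt

// Hypothetical mapping from the 0–100 % slider to a VibrationEffect amplitude.
// Returns null at 0 % (don't vibrate); otherwise a value in 1..255.
fun hapticAmplitude(percent: Int): Int? {
    val p = percent.coerceIn(0, 100)
    if (p == 0) return null
    return (p / 100.0 * 255).roundToInt().coerceIn(1, 255)
}
```

On a device, the result would be passed to `VibrationEffect.createOneShot` (API 26+) after checking `Vibrator.hasAmplitudeControl()`. Devices without amplitude control only support on/off vibration.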

Model Manager

Card per model showing name, version, size, RAM requirements, description
Download / Cancel / Delete controls per model
Live download progress: bytes downloaded, total size, speed (KB/s), ETA
Download runs in a ForegroundService with a persistent notification
Resume support via HTTP Range header (recovers from interrupted downloads)
Active model shown with a checkmark; switching model triggers full AI engine shutdown and re-init
Backend override per model (e.g., a CPU-preferred model ignores the global GPU setting)
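Resuming via the HTTP Range header means asking the server only for the bytes after what is already on disk. The helpers below sketch that bookkeeping plus the speed/ETA arithmetic shown on each model card. The helper names are hypothetical, and the 206-versus-200 handling follows standard HTTP range semantics rather than anything stated on this page. With OkHttp, the header would be set via `Request.Builder().header("Range", value)`.

```kotlin
// Hypothetical helpers for a resumable download of one model file.

// Range header asking for everything after the bytes already on disk; null means start fresh.
fun rangeHeaderFor(existingBytes: Long): String? =
    if (existingBytes > 0) "bytes=$existingBytes-" else null

// 206 Partial Content: the server honoured the range, so append to the partial file.
// 200 OK: the server ignored the range and is resending everything, so truncate and restart.
fun shouldAppend(statusCode: Int, requestedRange: Boolean): Boolean =
    requestedRange && statusCode == 206

data class Progress(val speedBytesPerSec: Double, val etaSeconds: Long?)

// Speed is averaged over the current session; ETA = remaining bytes / speed (null until speed is known).
fun progressOf(downloaded: Long, total: Long, sessionStartBytes: Long, elapsedMillis: Long): Progress {
    val sessionBytes = downloaded - sessionStartBytes
    val speed = if (elapsedMillis > 0) sessionBytes * 1000.0 / elapsedMillis else 0.0
    val eta = if (speed > 0) ((total - downloaded) / speed).toLong() else null
    return Progress(speed, eta)
}
```

Dividing `speedBytesPerSec` by 1024 gives the KB/s figure shown on the card.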

Settings Reference

Appearance & Gallery

Setting	Options
Gallery layout	Grid / List
Grid density	2 (Standard) · 3 (Compact) · 4 (Dense) · 5 (Extreme)
Show description in gallery	On / Off
Auto-refresh collections	On / Off
Convert HEIC to JPEG	On / Off
Theme	System / Light / Dark / AMOLED
Accent color	Blue · Green · Purple · Orange · Red · Dynamic
Startup screen	Gallery / Collections / Favorites
Keep screen awake	On / Off
Sort photos by	8 options (see Gallery section)
Group similar photos	On / Off
Stack burst photos	On / Off
Show screenshots	On / Off
Show hidden files	On / Off
Show RAW photos	On / Off
Show videos	On / Off

Advanced Filters (saved)

Min/max width, min/max height, from/to date, file size range, MIME type set.

AI Settings

Setting	Range / Options
AI Backend	GPU / CPU
Current model	Any downloaded model
Response mode	Fast / Reasoning
Thinking budget	0–4096 tokens
Response language	13+ languages
Response length	Brief / Balanced / Detailed / Extremely Detailed
Provide technical details	On / Off
Enable streaming	On / Off
Auto-generate descriptions	On / Off
Temperature	0.0–1.5
Top-P	0.0–1.0
Top-K	1–100

Sound & Vibration

Per-event toggles for: processing loop, success sound, reply sound, delete sound, page-turn sound. Haptic feedback on/off + sensitivity slider.

Secure Sharing

Strip GPS on share · Strip EXIF metadata on share.

Security

Enable secure collections · Show secure collections in list · Encryption mode (Biometric / Custom password) · Change vault password · Disable vault.

TTS

Engine · Language · Pitch · Rate · Voice.

Reader

Reading mode (Characters / Words / Sentences / Paragraphs / Lines) · Autoplay on load · Stop on background · Character chunk size.

Architecture

UI (Jetpack Compose + Material 3)
    ↓ StateFlow / collectAsStateWithLifecycle
ViewModel (HomeViewModel · PhotoViewModel · AskViewModel · SettingsViewModel · ModelConfigViewModel)
    ↓
Repository (PhotoRepository — single source of truth, bounded generation queue)
    ↓                    ↓
Room Database        LiteRtLmManager (single-session AI engine manager)
DataStore Prefs          ↓
                     ModelAdapter (GemmaAdapter / FastVlmAdapter)
                         ↓
                     Google Gemma 4 via LiteRT-LM (100% on-device)

Key Design Decisions

Single-session constraint — LiteRT-LM allows only one Conversation per Engine; enforced via a mutex + SessionMode enum with explicit DESCRIPTION ↔ CHAT hand-off
Bounded generation queue — a Channel consumer serialises auto-describe requests so only one inference runs at a time regardless of scroll speed
Non-blocking TTS — TtsManager reads settings on IO, posts tts.speak() to Main; UI thread never waits
Adapter pattern — ModelAdapter interface lets any .litertlm model plug in without touching LiteRtLmManager
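The bounded generation queue can be approximated on a plain JVM with a single worker thread fed by a small bounded queue. The page says PhotoLens uses a coroutine Channel; this stand-in uses `java.util.concurrent` so it runs without kotlinx.coroutines. `DescribeQueue` is hypothetical. The capacity of 8 and the drop-oldest policy, which discards stale scroll requests first, are also assumptions.

```kotlin
import java.util.concurrent.ArrayBlockingQueue
import java.util.concurrent.ThreadPoolExecutor
import java.util.concurrent.TimeUnit

// Hypothetical describe queue: one worker thread means one inference at a time, and
// a bounded queue with DiscardOldestPolicy drops stale requests when the user scrolls fast.
class DescribeQueue(capacity: Int = 8, private val describe: (Long) -> Unit) {
    private val executor = ThreadPoolExecutor(
        1, 1,                               // exactly one worker thread
        0L, TimeUnit.MILLISECONDS,
        ArrayBlockingQueue<Runnable>(capacity),
        ThreadPoolExecutor.DiscardOldestPolicy(),
    )

    // Enqueue a photo; if the queue is full, the oldest pending request is dropped.
    fun request(photoId: Long) = executor.execute { describe(photoId) }

    // Let queued work finish, then stop the worker.
    fun shutdownAndWait(timeoutSeconds: Long = 10): Boolean {
        executor.shutdown()
        return executor.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)
    }
}
```

However fast requests arrive, at most one `describe` call runs at any moment. The most recent requests, which match the photos currently on screen, are never the ones dropped.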

Tech Stack

Layer	Library / Version
Language	Kotlin 2.2
UI	Jetpack Compose + Material 3 (BOM 2024.12)
Navigation	Navigation Compose
State	ViewModel + StateFlow
DI	Hilt 2.57
Database	Room 2.6
Preferences	DataStore 1.1
Photo Access	MediaStore (API 24+ compatible)
AI Inference	Google AI Edge LiteRT-LM 0.11
AI Model	Google Gemma 4 E2B / E4B
Image Loading	Coil 2.7
Networking (download)	OkHttp 4.12
Permissions	Accompanist Permissions 0.36
Biometrics	AndroidX Biometric
Open-source notices	Google OSS Licenses Plugin

Supported Platforms

Requirement	Value
Minimum Android	7.0 (API 24)
Target Android	API 35
Compiled with	SDK 37
Minimum RAM	4 GB (E2B model) · 6 GB (E4B model)
Recommended RAM	4 GB (E2B) · 8 GB (E4B)
Storage required	~2.5 GB (E2B) · ~4.7 GB (E4B)
Physical device	Required (AI requires GPU/NPU)

For version history, see whatsnew.md.

Why this matters

Your help is required to keep PhotoLens free, forever

PhotoLens has no investors, no ads, no subscription, and no data to sell. It is built and maintained by one visually impaired engineer who refuses to put a paywall between blind users and their own photographs.

Every sponsor pays for a real cost — the GitHub releases that host this APK, the Android device farm used for accessibility testing, the screen readers, and the time spent reading every TalkBack bug report. None of it is free to keep running.

If PhotoLens helped you — or could help someone you love — please consider sponsoring. Even one cup of coffee a month keeps this app honest, offline, and ad-free.

Help keep PhotoLens free, private, and independent

PhotoLens is built by one visually impaired engineer, with no investors, no ads, and no plans to ever sell your data — because there is no data to sell. If this work matters to you, your sponsorship pays for the devices, the model testing, and the hours that make accessibility actually accessible.

Sponsor PhotoLens on GitHub

One-time or monthly · Cancel anytime


Every photo described here is a moment someone almost lost.
