Android 7.0+ · v1.2.0 · Free forever
Download PhotoLens for Android
The accessible photo gallery that speaks. PhotoLens runs Google Gemma 4 entirely on your device: no cloud, no account, and no internet after the one-time model download. Install the APK below and start hearing your photographs in under a minute.
SHA-signed APK hosted on GitHub Releases. To install, allow "Install unknown apps" for your browser when prompted.
In this release
What's new in v1.2.0
What's New in PhotoLens
All notable changes to PhotoLens are documented here. Versions follow Semantic Versioning.
[1.2.0] — 2026-05-10
Theme: Multi-photo chat, rich sound design, secure vault, full accessibility polish, and a settings overhaul.
New Features
Multi-Photo Chat (URI Preview)
- Attach multiple photos in a single chat session — add up to N photos from your gallery or camera directly into a conversation
- Horizontal photo strip above the message list shows each attached image with a remove button
- Camera capture within AskScreen — take a photo on the spot and send it immediately to the AI
- Photo picker integration in AskScreen, same flow as HomeScreen's "New Photo" sheet
Sound Design
- Five ambient audio cues — processing loop, reply ding, success chime, page turn, and delete sweep; all non-blocking and run off the main thread
- Haptic feedback with configurable sensitivity (0–100%)
- All sounds independently togglable per-event in Settings
Secure Vault (Collections)
- Per-collection security — lock any album behind biometric or a custom password; vault photos are physically moved to `filesDir/secure_photos`, not merely hidden
- Vault passwords hashed with SHA-256 + 16-byte random salt + 10,000 iterations; raw password is never written to disk
- Constant-time password comparison to prevent timing attacks
- Security progress bottom sheet with animated progress bar during vault operations
- `showSecureCollectionsInList` setting to control vault visibility in the Collections tab
TTS Reader — Complete Rewrite
- ReaderComponent embedded in every chat bubble and OCR result sheet
- Reading modes: Characters, Words, Sentences, Paragraphs, Lines — switchable on the fly
- Prev / Play-Pause / Next segment controls with TalkBack-labelled state
- `autoplayOnLoad` and `stopOnBackground` lifecycle hooks
- Non-blocking architecture: all TTS work runs on `Dispatchers.IO`; UI thread is never held
- Voice selection, pitch, rate, and engine configurable in Settings
Full Settings Screen
- Appearance & Gallery — theme (System/Light/Dark/AMOLED), accent color (6 options + Dynamic Material You), startup screen selection, grid density (2–5 columns), keep screen awake toggle
- Advanced Filters — saved per-session; min/max width & height, date range picker, MIME type chip filter
- AI Tuning — temperature, top-P, top-K sliders; response length (Brief / Balanced / Detailed / Extremely Detailed); thinking budget; streaming toggle; auto-generate descriptions
- Sound & Vibration — per-event toggles for all five sound events plus haptic sensitivity slider
- Security — biometric vs. custom-password vault mode; change vault password; disable vault with confirmation dialog
- TTS — engine picker (all installed TTS engines), language picker (50+ languages from `languages.json`), pitch, rate, voice selection
- Secure Sharing — strip GPS and full EXIF metadata before sharing
- Language — AI response language picker with human-readable names (13+ languages from `languages.json`)
Onboarding Flow
- 5-page pager onboarding on first launch, fully TalkBack-navigable
- Pages: Welcome, Smart Gallery, AI Features, Accessibility, Privacy & Security
- Model download prompt triggered automatically after onboarding if no model is present
ReasoningBlock Component
- Expandable "chain-of-thought" block in chat bubbles and photo description — shows the model's internal reasoning before the final answer
- Collapsed by default; animates open/close with a chevron
Markdown Renderer
- Full `MarkdownContent` composable with no external dependency
- Supports: headings (H1–H4), bold, italic, bold-italic, inline code, code blocks, blockquotes, unordered lists, ordered lists, horizontal rules, links (tappable)
About, Help & Support, Privacy, Terms screens
- All rendered via the generic `MarkdownScreen` composable backed by `res/raw/*.md` files
Improvements
- Smart Collections empty state — contextual message depending on whether any AI descriptions exist; CTA button to go to Settings when descriptions haven't been generated yet
- Photo detail screen — pinch-to-zoom + two-finger rotate with spring-back animation; rotation persisted per photo; zoom/rotation reset on navigation
- Bulk operations — add to favorites, generate descriptions, recognize text, share, add to collection, delete — all from the selection top bar
- OCR bottom sheet — TTS reader embedded; copy + share buttons; streaming progress indicator
- ListScreen — collection/album detail with the same sort/filter/bulk-select as HomeScreen
- Session memory optimization — "Preparing AI Environment" overlay with polite live-region announcement so TalkBack users know when chat is ready
- Scroll-to-bottom FAB in AskScreen — appears when the user scrolls up; animates in/out with fade + scale
- Chat export — copy all messages to clipboard or share as plain text
- Message regeneration — re-run the last AI turn from the same message index
- SpeechRecognizer lifecycle fix — recognizer is properly destroyed when AskScreen leaves composition, preventing a native listener leak after back-navigation
Bug Fixes
- `GemmaAdapter.processResponse` now builds JSON with `JSONObject`/`JSONArray` instead of string interpolation — fixes crashes on model output containing quotes, backslashes, or newlines
- Vault password stored as SHA-256 hash + salt rather than plaintext — security fix
- `LiteRtLmManager` single-session constraint: properly prevents a second `createConversation` while one is already open; description session re-opens when chat session ends
- `TtsManager` no longer blocks the UI thread — all TTS calls post to Main via `withContext(Dispatchers.Main)` from `Dispatchers.IO`
- Photo URI resolution handles both `file://` scheme and `content://` URIs via the `ParcelFileDescriptor` / `/proc/self/fd` trick
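To make the first fix above concrete, the sketch below shows why string interpolation breaks on quotes, backslashes, and newlines. On Android, `JSONObject` performs this escaping for you; the hand-written `escape` helper here exists only so the example runs on a plain JVM, and the class name and the single `description` field are illustrative assumptions rather than the app's code.

```java
final class JsonEscapeSketch {
    /** Naive interpolation: the result is invalid JSON once text contains a quote, backslash, or newline. */
    static String interpolate(String description) {
        return "{\"description\":\"" + description + "\"}";
    }

    /** Escapes the characters that may not appear raw inside a JSON string literal. */
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    // Remaining control characters need a four-digit hex escape.
                    if (c < 0x20) out.append(String.format("\\u%04x", (int) c));
                    else out.append(c);
            }
        }
        return out.toString();
    }

    /** Same shape as interpolate(), but safe for arbitrary model output. */
    static String build(String description) {
        return "{\"description\":\"" + escape(description) + "\"}";
    }
}
```

With the input `a "sign"` followed by a newline, `interpolate` emits a string that closes early at the first quote, while `build` emits a single valid string value.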
[1.1.0] — 2026-04-14
Theme: Multi-model architecture with model-specific adapters for full control over tokenisation strategy.
New Features
Multi-Model Architecture
- `ModelAdapter` interface — pluggable adapter per model that controls system instruction, content building, conversation config creation, and response post-processing
- `GemmaAdapter` — dedicated adapter for Gemma 4 models; implements tool-call structured JSON response parsing with proper escaping
- `FastVlmAdapter` — adapter skeleton for FastVLM-class models with different tokenisation strategy
- Models defined in `assets/models.json` — `adapter` field determines which adapter is instantiated at runtime
Model-Specific Behaviour
- `getSystemInstruction()` — per-adapter system prompt customisation
- `buildAnalysisContent()` — per-adapter content construction for photo analysis turns
- `buildAskContent()` — per-adapter content construction for interactive chat turns
- `createConversationConfig()` — per-adapter `ConversationConfig` including sampler config and tool registration
- `processResponse()` — per-adapter post-processing of raw model output and tool call arguments
JsonModel Data Class
- New fields: `adapter` (string, selects adapter class), `preferredBackend` (overrides global CPU/GPU), `toolCall` (boolean), `memoryMinRequired`, `memoryRecommended`
- `ModelStatus` wraps `JsonModel` with download state, progress bytes, speed, ETA, and error message
Models Manager
- Redesigned `ModelConfigScreen` — card-per-model layout showing size, RAM requirements, download status, speed, ETA
- Download/cancel/delete controls per model; active model shown with a checkmark badge
- Model switch triggers full `LiteRtLmManager.shutdown()` + re-init to ensure correct adapter is loaded
Improvements
- `LiteRtLmManager` session mode enum (DESCRIPTION/CHAT) replaces boolean flag — clearer invariants and easier to extend
- Speculative decoding enabled via `ExperimentalFlags.enableSpeculativeDecoding = true` for faster token generation
- `ModelDownloadService` — foreground service with resume support via HTTP Range header; progress reported via `SharedFlow`
Bug Fixes
- `ModelAdapter.processResponse` with tool calls no longer crashes on model output that embeds special JSON characters — replaced string interpolation with `JSONObject.put()` throughout
[1.0.0] — 2026-04-14
Theme: Initial release — on-device AI photo gallery built for accessibility.
New Features
Core AI Integration
- On-device AI inference via Google AI Edge LiteRT-LM with Gemma 4 multimodal model
- Natural language photo descriptions — full sentences generated by Gemma 4 locally, no internet
- Interactive Ask Mode — streaming chat interface to ask any question about a photo
- Smart categorisation — photos automatically grouped into Nature, People, Food, Documents, Travel, Architecture, Pets, Sports via tool calls
- OCR / text recognition — extract text from any image using the same on-device model
- Thinking Mode — chain-of-thought reasoning visible before the final answer
- Multilingual output — 13 languages selectable for AI response language
Gallery & Navigation
- Grid and list view — togglable; grid supports 2–5 columns
- Date-grouped photo timeline with sticky headers
- Local Albums via Android MediaStore
- Smart Collections — dynamic albums built from AI-generated categories
- Favorites — star any photo; dedicated Favorites tab
- Bottom navigation: Photos / Collections / Favorites
- Full-screen photo detail with share, favorite, rotate, more-menu
Privacy & Security
- Zero cloud processing — all AI runs on-device GPU/CPU/NPU
- No analytics, no telemetry, no account required
- Secure Vault foundation — architecture for secure collections in place
Accessibility Foundation
- Full TalkBack semantic labelling on every UI element
- WCAG 2.1 Level AA — high contrast, generous touch targets, predictable navigation
- Live regions for progress announcements
- Voice input for Ask Mode
- Built-in TTS with segment-based reading controls
Model Management
- Foreground download service for model files (~2.4 GB)
- Progress tracking with speed and ETA
- GPU / CPU backend selection
Settings
- AI backend (GPU / CPU), temperature, response language
- Basic gallery preferences (view mode, grid columns, sort order)
Other Screens
- About, Help & Support — Markdown-rendered
- Privacy Policy and Terms of Use — in-app Markdown
Technical Foundation
- MVVM + Repository pattern; single source of truth in `PhotoRepository`
- Jetpack Compose + Material 3 throughout
- Hilt dependency injection
- Room v9 database for photo metadata and description state
- DataStore for persistent preferences
- Coil for image loading with HEIC→JPEG auto-conversion
For the full feature list, see features.md.
Capabilities
Everything PhotoLens can do
PhotoLens — Complete Feature Reference
Version 1.2.0 · Android 7.0+ (API 24+) · 100% On-Device · No Cloud · No Account
On-Device AI
| Feature | Details |
|---|---|
| AI Engine | Google AI Edge LiteRT-LM 0.11 |
| Model | Google Gemma 4 E2B (2.41 GB) or E4B (4.56 GB) |
| Inference | 100% on-device — GPU, CPU, or NPU |
| Internet required | One-time model download only |
| Photo data sent to cloud | Never |
Photo Description
- Natural language descriptions generated by Gemma 4 in full sentences (not keywords)
- Four response length presets: Brief (1 sentence) · Balanced (≤3 sentences) · Detailed (step-by-step) · Extremely Detailed (exhaustive full-pass)
- Technical details mode — includes lighting, resolution, and quality assessment when enabled
- Tool-call structured output — description, categories, tags, mood, and technical quality returned as parsed JSON
- Auto-generate toggle — describes photos automatically as they scroll into view (bounded queue, one inference at a time)
- Manual generate — trigger description for any single photo from the gallery or detail view
Interactive Ask Mode (AI Chat)
- Chat interface with streaming token output per message
- Attach multiple photos in a single conversation (horizontal photo strip)
- Add photos from gallery picker or camera capture without leaving the chat
- Regenerate any AI message with one tap
- Stop streaming mid-response
- Export or copy all messages in a conversation
- Session isolation — chat session and description session never overlap; explicit hand-off protocol prevents native engine conflicts
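The session-isolation rule in the last bullet can be pictured as a small gate around the engine. This is a sketch under assumptions: the changelog mentions a mutex, a `SessionMode` enum, and that the description session re-opens when chat ends, but the class name, method names, and the exact hand-off logic below are hypothetical.

```java
import java.util.concurrent.locks.ReentrantLock;

final class SessionGateSketch {
    enum SessionMode { DESCRIPTION, CHAT }

    private final ReentrantLock lock = new ReentrantLock();
    private SessionMode current = null; // null means no session is open

    /** Opens a session; refuses if any session is already open. */
    boolean open(SessionMode requested) {
        lock.lock();
        try {
            if (current != null) return false;
            current = requested;
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Closes the open session; ending a chat hands the engine back to DESCRIPTION. */
    void close() {
        lock.lock();
        try {
            current = (current == SessionMode.CHAT) ? SessionMode.DESCRIPTION : null;
        } finally {
            lock.unlock();
        }
    }

    /** Returns the open session's mode, or null when the engine is idle. */
    SessionMode current() {
        lock.lock();
        try {
            return current;
        } finally {
            lock.unlock();
        }
    }
}
```

Starting a chat therefore means closing the description session first and then opening `CHAT`; a second `open` while either session is live simply returns `false` instead of reaching the native engine.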
OCR / Text Recognition
- Extract printed text, signs, labels, documents, and handwriting from any image
- On-device, no cloud OCR
- Result displayed in a bottom sheet with copy, share, and TTS reader
Smart Categorisation
- Automatic photo categories: Nature · People · Food · Documents · Travel · Architecture · Pets · Sports · Other
- Categories stored in the Room database alongside descriptions
- Drive Smart Collections — photos categorised as "Food" appear in the Food smart album automatically
Thinking Mode
- Chain-of-thought reasoning block visible before the final description or answer
- Expandable / collapsible `ReasoningBlock` component in photo detail and chat bubbles
- Configurable thinking budget (0 = fast, higher = deeper reasoning)
AI Settings
- Backend: GPU (default) or CPU — model `preferredBackend` can override the global setting
- Temperature (0.0–1.5) — controls output randomness
- Top-P (0.0–1.0) — nucleus sampling probability
- Top-K (1–100) — token selection pool size
- Response language — 13+ languages via `assets/languages.json`
- Enable streaming — toggle token-by-token output vs. single-shot response
- Provide technical details — adds lighting/resolution qualifiers to prompts
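For readers unfamiliar with the three sampling controls above, the following sketch shows one common way temperature, top-K, and top-P combine to shape a token distribution. It illustrates the general technique only; it is not LiteRT-LM's sampler, and the class name, method name, and the choice to treat a temperature of 0 as greedy selection are assumptions.

```java
import java.util.Arrays;
import java.util.Comparator;

final class SamplerSketch {
    /**
     * Returns the renormalised probabilities left after temperature scaling,
     * top-K truncation, and top-P (nucleus) truncation. Assumes logits is non-empty.
     */
    static double[] filter(double[] logits, double temperature, int topK, double topP) {
        int n = logits.length;
        double[] out = new double[n];

        // Token indices ordered from most to least likely.
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> -logits[i]));

        // Assumption: temperature 0 means "always pick the best token".
        if (temperature <= 0.0) {
            out[order[0]] = 1.0;
            return out;
        }

        // Softmax over temperature-scaled logits (max subtracted for stability).
        double max = logits[order[0]];
        double[] p = new double[n];
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            p[i] = Math.exp((logits[i] - max) / temperature);
            sum += p[i];
        }

        // Keep at most topK tokens, stopping early once their mass reaches topP.
        double kept = 0.0;
        int limit = Math.min(Math.max(topK, 1), n);
        for (int rank = 0; rank < limit; rank++) {
            int idx = order[rank];
            out[idx] = p[idx] / sum;
            kept += out[idx];
            if (kept >= topP) break;
        }

        // Renormalise the survivors so they sum to 1.
        for (int i = 0; i < n; i++) out[i] /= kept;
        return out;
    }
}
```

Lower temperature sharpens the distribution, a smaller Top-K caps how many candidates survive, and a smaller Top-P cuts the tail once the leading candidates already cover the requested probability mass.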
Multi-Model Support
- `ModelAdapter` interface allows each model to define its own tokenisation strategy, system instruction, content construction, and response post-processing
- `GemmaAdapter` — optimised for Gemma 4 tool-call JSON output with full special-character escaping
- `FastVlmAdapter` — skeleton for FastVLM-class models with different content layout
- Model registry in `assets/models.json` — add new `.litertlm` models without code changes
- Per-model `preferredBackend`, `memoryMinRequired`, `memoryRecommended`, `visionSupport`, `toolCall` flags
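One way to picture how an `adapter` field in the model registry could select an implementation at runtime is a small lookup table. This is a sketch only: the method names come from the changelog, but their signatures, the registry keys `"gemma"` and `"fastvlm"`, the placeholder method bodies, and the `AdapterRegistrySketch` class are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

final class AdapterRegistrySketch {
    /** Reduced to two of the five documented methods; signatures are assumed. */
    interface ModelAdapter {
        String getSystemInstruction();
        String processResponse(String rawModelOutput);
    }

    static final class GemmaAdapter implements ModelAdapter {
        public String getSystemInstruction() { return "Describe the photo in full sentences."; }
        public String processResponse(String rawModelOutput) { return rawModelOutput.trim(); }
    }

    static final class FastVlmAdapter implements ModelAdapter {
        public String getSystemInstruction() { return "Describe the image."; }
        public String processResponse(String rawModelOutput) { return rawModelOutput.trim(); }
    }

    // Maps the registry's adapter string to a factory for the matching class.
    private static final Map<String, Supplier<ModelAdapter>> ADAPTERS = new HashMap<>();
    static {
        ADAPTERS.put("gemma", GemmaAdapter::new);
        ADAPTERS.put("fastvlm", FastVlmAdapter::new);
    }

    /** Instantiates the adapter named by a model entry, failing loudly on unknown names. */
    static ModelAdapter forName(String adapterField) {
        Supplier<ModelAdapter> factory = ADAPTERS.get(adapterField);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown adapter: " + adapterField);
        }
        return factory.get();
    }
}
```

The engine manager only ever sees the `ModelAdapter` interface, which is what lets a new model be registered without changes to the manager itself.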
Gallery
Views & Layout
- Grid view — 2, 3, 4, or 5 columns (user-selectable)
- List view — single-column with metadata visible
- Toggle between grid and list from the toolbar
- Date-grouped timeline with sticky date headers in list view
- Description preview in gallery cells (toggleable)
Collections & Albums
- Local Albums — MediaStore-backed; mirrors device folders automatically
- Smart Collections — AI-powered dynamic albums built from photo categories; refresh on demand
- Smart collections shown in a 2-column grid with an "AI" badge
- Secure collections — lock any album behind vault authentication
- Rename any collection from a long-press bottom sheet
- Copy all photos from one collection to another
- Share all photos in a collection
Favorites
- Star/unstar any photo; persisted in Room database
- Dedicated Favorites tab in the bottom navigation bar
Search & Filter
- Search — live-filter by photo name across the current view
- Sort — Date taken (newest/oldest), Date added (newest/oldest), File name (A–Z / Z–A), File size (largest/smallest)
- Filters (persistent per session):
  - Show / hide videos, screenshots, RAW photos, hidden files
  - Group similar photos toggle
  - Min / max width and height (pixels)
  - Date range picker (from / to)
  - MIME type filter chips (JPEG, PNG, WebP, HEIF, GIF, MP4, WebM)
  - Reset all filters button
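The dimension, date, and MIME filters listed above compose naturally as predicates that are AND-ed together. The sketch below is a minimal illustration; the `Photo` fields, the epoch-day date representation, and all names are assumptions, not the app's data model.

```java
import java.util.Set;
import java.util.function.Predicate;

final class PhotoFilterSketch {
    /** Hypothetical minimal photo record used only for this example. */
    static final class Photo {
        final int width;
        final int height;
        final String mimeType;
        final long takenEpochDay;

        Photo(int width, int height, String mimeType, long takenEpochDay) {
            this.width = width;
            this.height = height;
            this.mimeType = mimeType;
            this.takenEpochDay = takenEpochDay;
        }
    }

    /** Min / max width and height, inclusive, in pixels. */
    static Predicate<Photo> sizeBetween(int minW, int maxW, int minH, int maxH) {
        return p -> p.width >= minW && p.width <= maxW
                && p.height >= minH && p.height <= maxH;
    }

    /** Date range picker (from / to), inclusive. */
    static Predicate<Photo> takenBetween(long fromEpochDay, long toEpochDay) {
        return p -> p.takenEpochDay >= fromEpochDay && p.takenEpochDay <= toEpochDay;
    }

    /** MIME type chips: a photo passes if its type is among the selected ones. */
    static Predicate<Photo> mimeIn(Set<String> allowed) {
        return p -> allowed.contains(p.mimeType);
    }
}
```

Chaining with `Predicate.and` gives the active filter set, and "Reset all filters" corresponds to replacing the chain with a predicate that accepts everything.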
Multi-Select Bulk Operations
- Long-press any photo to enter selection mode
- Tap additional photos to extend selection
- Bulk: Favorite · Generate descriptions · Recognize text · Share · Add to collection · Delete
- Contextual selection top bar with photo count
Photo Detail
- Full-screen photo viewer
- Pinch-to-zoom with spring-back and bounds clamping
- Two-finger rotate with spring-back; rotation persisted per photo
- Double-tap to reset zoom and rotation
- Share (with optional metadata stripping)
- Toggle favorite
- Rotate 90° (clockwise; persisted)
- Add / remove from album
- Delete with confirmation dialog
- Description display with streaming indicator
- Reasoning block (Thinking Mode output)
- Ask button — opens AI Chat preloaded with the current photo
- Photo metadata sheet (date, dimensions, size, location, MIME type, bucket)
TTS Reader
Available in: chat bubbles, OCR results sheet, photo descriptions.
| Setting | Options |
|---|---|
| Reading mode | Characters · Words · Sentences · Paragraphs · Lines |
| Autoplay on load | On / Off |
| Stop on background | On / Off |
| TTS engine | Any installed Android TTS engine |
| Language | Full BCP-47 locale picker |
| Voice | Voice picker filtered by language |
| Pitch | 0.5× – 2.0× |
| Rate | 0.5× – 2.0× |
- Prev / Play-Pause / Next segment controls
- Each button labelled with the current reading mode for TalkBack
- Non-blocking: all TTS work runs on `Dispatchers.IO`; main thread never blocked
- Settings cached to skip redundant JNI calls on re-render
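To illustrate how the five reading modes could turn one block of text into Prev / Next segments, here is a minimal sketch using the JDK's `BreakIterator` and regular expressions. The splitting rules (English sentence boundaries, blank lines as paragraph breaks, whitespace-only segments skipped) and every name in the block are assumptions; the app's reader may segment differently.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class ReaderSegmentsSketch {
    enum Mode { CHARACTERS, WORDS, SENTENCES, PARAGRAPHS, LINES }

    /** Splits text into the segments a reader would step through in the given mode. */
    static List<String> segment(String text, Mode mode) {
        switch (mode) {
            case LINES:      return nonBlank(text.split("\\R"));
            case PARAGRAPHS: return nonBlank(text.split("\\R\\s*\\R"));
            case WORDS:      return nonBlank(text.trim().split("\\s+"));
            case SENTENCES:  return breaks(text, BreakIterator.getSentenceInstance(Locale.ENGLISH));
            default:         return breaks(text, BreakIterator.getCharacterInstance(Locale.ENGLISH));
        }
    }

    /** Trims each piece and drops the ones that are empty. */
    private static List<String> nonBlank(String[] parts) {
        List<String> out = new ArrayList<>();
        for (String part : parts) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) out.add(trimmed);
        }
        return out;
    }

    /** Walks the iterator's boundaries and collects the text between them. */
    private static List<String> breaks(String text, BreakIterator it) {
        it.setText(text);
        List<String> pieces = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            pieces.add(text.substring(start, end));
        }
        return nonBlank(pieces.toArray(new String[0]));
    }
}
```

Switching modes on the fly then amounts to re-segmenting the same text and mapping the current position to the nearest segment in the new list.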
Accessibility
PhotoLens was built by a visually impaired developer specifically for visually impaired users. Accessibility is the primary use case, not a feature checkbox.
TalkBack
- Every UI element has a precise `contentDescription`
- Every status change (description generated, download complete, session ready) is announced via Polite live regions without requiring focus navigation
- Headings marked with `Modifier.semantics { heading() }` for efficient swipe navigation
- Selection mode top bar announces count: "3 photos selected"
- Collection cards announce name and photo count
Standards Compliance
- WCAG 2.1 Level AA — high-contrast colour schemes, generous touch targets (≥48dp), predictable navigation
- Every screen reachable via swipe navigation alone
- No time-limited interactions
Voice Input
- Voice typing in Ask Mode via Android SpeechRecognizer
- Microphone permission properly gated with rationale sheet and Settings deep-link fallback
- Recognizer properly destroyed when AskScreen leaves composition (prevents native listener leak)
Other
- Onboarding pager fully navigable with TalkBack
- Model download progress announced at every percentage update
- "Preparing AI Environment" overlay announced so users know when chat is ready
- Every settings toggle announces its on/off state
- Keep Screen Awake option for users who need extended viewing time
Privacy & Security
Zero Data Collection
- No analytics SDK
- No crash reporter (no Firebase Crashlytics, Sentry, or similar)
- No advertising SDK
- No telemetry
- No account, no sign-in, no email required
Zero Cloud Processing
- AI inference runs on-device GPU/CPU/NPU only
- Photos are never uploaded for analysis
- No API calls during photo description or chat
Secure Vault
- Lock any collection behind biometric (fingerprint / face) or a custom password
- Vault photos physically moved to `filesDir/secure_photos` — inaccessible to other apps without root
- On lock: photos removed from MediaStore-visible paths
- On unlock: biometric or password challenge presented via `BiometricPrompt`
- Password hashing: SHA-256 + 16-byte random salt + 10,000 iterations
- Constant-time comparison prevents timing attacks
- Raw password is never written to disk or DataStore
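As an illustration of the hashing and comparison steps listed above, here is a minimal Java sketch. The source states only the ingredients (SHA-256, a 16-byte random salt, 10,000 iterations, constant-time comparison); how PhotoLens chains the iterations is not documented, so the salt-then-rehash loop and all class and method names below are assumptions, not the app's actual code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

final class VaultPasswordSketch {
    static final int ITERATIONS = 10_000;
    static final int SALT_BYTES = 16;

    /** Generates a fresh random salt; it is stored next to the hash, never reused. */
    static byte[] newSalt() {
        byte[] salt = new byte[SALT_BYTES];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    /** Assumed scheme: SHA-256(salt || password), then the digest re-hashed until 10,000 rounds are done. */
    static byte[] hash(char[] password, byte[] salt) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            sha256.update(salt);
            byte[] digest = sha256.digest(new String(password).getBytes(StandardCharsets.UTF_8));
            for (int i = 1; i < ITERATIONS; i++) {
                digest = sha256.digest(digest);
            }
            return digest;
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be present on every Java platform, so this is unexpected.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    /** Recomputes the hash for an attempt and compares it in constant time. */
    static boolean verify(char[] attempt, byte[] salt, byte[] storedHash) {
        return MessageDigest.isEqual(hash(attempt, salt), storedHash);
    }
}
```

Only the salt and the final digest need to be persisted; `MessageDigest.isEqual` avoids the early exit of a plain array comparison, which is what would otherwise leak timing information.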
Share Privacy Controls
- Strip GPS/location data before sharing
- Strip full EXIF device metadata (make, model, software) before sharing
- HEIC→JPEG auto-conversion on share (for compatibility without metadata leakage)
Sound & Haptics
| Sound | Trigger |
|---|---|
| Processing loop | AI description or chat inference in progress |
| Reply | AI message received in chat |
| Success | Photo description completed |
| Page turn | TTS reader segment advance |
| Delete | Photo deleted |
- Each sound independently toggleable in Settings
- Haptic feedback with configurable intensity (0–100%)
- Non-blocking: sound playback runs off the main thread via `SoundManager`
Model Manager
- Card per model showing name, version, size, RAM requirements, description
- Download / Cancel / Delete controls per model
- Live download progress: bytes downloaded, total size, speed (KB/s), ETA
- Download runs in a ForegroundService with a persistent notification
- Resume support via HTTP `Range` header (recovers from interrupted downloads)
- Active model shown with a checkmark; switching model triggers full AI engine shutdown and re-init
- Backend override per model (e.g., a CPU-preferred model ignores the global GPU setting)
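The resume and progress items above reduce to a little arithmetic. The sketch below shows the `Range` header value for a partially downloaded file, the status check that tells a resumed response from a full restart, and a simple ETA calculation. The helper names are hypothetical, and the rounding and "unknown ETA" conventions are assumptions.

```java
final class DownloadResumeSketch {
    /** Header value asking for everything from the first missing byte onward. */
    static String rangeHeader(long bytesOnDisk) {
        return "bytes=" + bytesOnDisk + "-";
    }

    /** 206 Partial Content means the range was honoured (append); 200 means start over. */
    static boolean serverResumed(int httpStatus) {
        return httpStatus == 206;
    }

    /** Average speed in KB/s over a measurement window; 0 if the window is empty. */
    static long speedKBps(long bytesInWindow, long windowMillis) {
        if (windowMillis <= 0) return 0;
        return (bytesInWindow * 1000L / windowMillis) / 1024L;
    }

    /** Seconds remaining, rounded up; -1 when speed is unknown or stalled. */
    static long etaSeconds(long totalBytes, long bytesDone, long bytesPerSecond) {
        if (bytesPerSecond <= 0) return -1;
        long remaining = Math.max(0, totalBytes - bytesDone);
        return (remaining + bytesPerSecond - 1) / bytesPerSecond;
    }
}
```

A download interrupted at 1,048,576 bytes would be retried with `Range: bytes=1048576-`; if the server answers 200 instead of 206, the partial file has to be discarded rather than appended to.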
Settings Reference
Appearance & Gallery
| Setting | Options |
|---|---|
| Gallery layout | Grid / List |
| Grid density | 2 (Standard) · 3 (Compact) · 4 (Dense) · 5 (Extreme) |
| Show description in gallery | On / Off |
| Auto-refresh collections | On / Off |
| Convert HEIC to JPEG | On / Off |
| Theme | System / Light / Dark / AMOLED |
| Accent color | Blue · Green · Purple · Orange · Red · Dynamic |
| Startup screen | Gallery / Collections / Favorites |
| Keep screen awake | On / Off |
| Sort photos by | 8 options (see Gallery section) |
| Group similar photos | On / Off |
| Stack burst photos | On / Off |
| Show screenshots | On / Off |
| Show hidden files | On / Off |
| Show RAW photos | On / Off |
| Show videos | On / Off |
Advanced Filters (saved)
Min/max width, min/max height, from/to date, file size range, MIME type set.
AI Settings
| Setting | Range / Options |
|---|---|
| AI Backend | GPU / CPU |
| Current model | Any downloaded model |
| Response mode | Fast / Reasoning |
| Thinking budget | 0–4096 tokens |
| Response language | 13+ languages |
| Response length | Brief / Balanced / Detailed / Extremely Detailed |
| Provide technical details | On / Off |
| Enable streaming | On / Off |
| Auto-generate descriptions | On / Off |
| Temperature | 0.0–1.5 |
| Top-P | 0.0–1.0 |
| Top-K | 1–100 |
Sound & Vibration
Per-event toggles for: processing loop, success sound, reply sound, delete sound, page-turn sound. Haptic feedback on/off + sensitivity slider.
Secure Sharing
Strip GPS on share · Strip EXIF metadata on share.
Security
Enable secure collections · Show secure collections in list · Encryption mode (Biometric / Custom password) · Change vault password · Disable vault.
TTS
Engine · Language · Pitch · Rate · Voice.
Reader
Reading mode (Characters / Words / Sentences / Paragraphs / Lines) · Autoplay on load · Stop on background · Character chunk size.
Architecture
UI (Jetpack Compose + Material 3)
    ↓  StateFlow / collectAsStateWithLifecycle
ViewModel (HomeViewModel · PhotoViewModel · AskViewModel · SettingsViewModel · ModelConfigViewModel)
    ↓
Repository (PhotoRepository — single source of truth, bounded generation queue)
    ↓                   ↓
Room Database       LiteRtLmManager (single-session AI engine manager)
DataStore Prefs         ↓
                    ModelAdapter (GemmaAdapter / FastVlmAdapter)
                        ↓
                    Google Gemma 4 via LiteRT-LM (100% on-device)
Key Design Decisions
- Single-session constraint — LiteRT-LM allows only one `Conversation` per `Engine`; enforced via a mutex + `SessionMode` enum with explicit `DESCRIPTION ↔ CHAT` hand-off
- Bounded generation queue — a `Channel` consumer serialises auto-describe requests so only one inference runs at a time regardless of scroll speed
- Non-blocking TTS — `TtsManager` reads settings on `IO`, posts `tts.speak()` to Main; UI thread never waits
- Adapter pattern — `ModelAdapter` interface lets any `.litertlm` model plug in without touching `LiteRtLmManager`
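The bounded generation queue can be approximated on a plain JVM as follows. The app is described as using a Kotlin `Channel` consumer; this sketch swaps in a single-threaded `ThreadPoolExecutor` with a bounded `ArrayBlockingQueue`, which gives the same "one inference at a time, limited backlog" behaviour. Silently dropping requests when the backlog is full is an assumption, as are the class and method names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class DescribeQueueSketch {
    /** Records the order in which photos were processed. */
    final List<String> described = Collections.synchronizedList(new ArrayList<>());
    private final ThreadPoolExecutor worker;

    DescribeQueueSketch(int capacity) {
        // One worker thread means one inference at a time; the queue bounds the backlog.
        worker = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(capacity),
                new ThreadPoolExecutor.DiscardPolicy()); // full backlog: drop the request
    }

    /** Called as photos scroll into view; never blocks the caller. */
    void request(String photoId) {
        worker.execute(() -> described.add(photoId)); // stand-in for one model inference
    }

    /** Stops accepting work and waits for the backlog to drain. */
    boolean finish(long timeoutMillis) {
        worker.shutdown();
        try {
            return worker.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Because a single worker drains the queue in FIFO order, fast scrolling can enqueue many requests without ever running two inferences concurrently.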
Tech Stack
| Layer | Library / Version |
|---|---|
| Language | Kotlin 2.2 |
| UI | Jetpack Compose + Material 3 (BOM 2024.12) |
| Navigation | Navigation Compose |
| State | ViewModel + StateFlow |
| DI | Hilt 2.57 |
| Database | Room 2.6 |
| Preferences | DataStore 1.1 |
| Photo Access | MediaStore (API 24+ compatible) |
| AI Inference | Google AI Edge LiteRT-LM 0.11 |
| AI Model | Google Gemma 4 E2B / E4B |
| Image Loading | Coil 2.7 |
| Networking (download) | OkHttp 4.12 |
| Permissions | Accompanist Permissions 0.36 |
| Biometrics | AndroidX Biometric |
| Open-source notices | Google OSS Licenses Plugin |
Supported Platforms
| Requirement | Value |
|---|---|
| Minimum Android | 7.0 (API 24) |
| Target Android | API 35 |
| Compiled with | SDK 37 |
| Minimum RAM | 4 GB (E2B model) · 6 GB (E4B model) |
| Recommended RAM | 4 GB (E2B) · 8 GB (E4B) |
| Storage required | ~2.5 GB (E2B) · ~4.7 GB (E4B) |
| Physical device | Required (AI requires GPU/NPU) |
For version history, see whatsnew.md.
Why this matters
Your help is required to keep PhotoLens free, forever
PhotoLens has no investors, no ads, no subscription, and no data to sell. It is built and maintained by one visually impaired engineer who refuses to put a paywall between blind users and their own photographs.
Every sponsor pays for a real cost — the GitHub releases that host this APK, the Android device farm used for accessibility testing, the screen readers, and the time spent reading every TalkBack bug report. None of it is free to keep running.
If PhotoLens helped you — or could help someone you love — please consider sponsoring. Even one cup of coffee a month keeps this app honest, offline, and ad-free.
Help keep PhotoLens free, private, and independent
Every photo described here is a moment someone almost lost.
PhotoLens is built by one visually impaired engineer, with no investors, no ads, and no plans to ever sell your data — because there is no data to sell. If this work matters to you, your sponsorship pays for the devices, the model testing, and the hours that make accessibility actually accessible.
One-time or monthly · Cancel anytime