Android 7.0+ · v1.2.0 · Free forever

Download PhotoLens for Android

The accessible photo gallery that speaks. PhotoLens runs Google Gemma 4 entirely on your device: no cloud, no account, and no internet after the one-time model download. Install the APK below and start hearing your photographs in under a minute.

SHA-signed APK hosted on GitHub Releases. To install, allow "Install unknown apps" for your browser when prompted.

In this release

What's new in v1.2.0

What's New in PhotoLens

All notable changes to PhotoLens are documented here. Versions follow Semantic Versioning.


[1.2.0] — 2026-05-10

Theme: Multi-photo chat, rich sound design, secure vault, full accessibility polish, and a settings overhaul.

New Features

Multi-Photo Chat (URI Preview)

  • Attach multiple photos in a single chat session — add up to N photos from your gallery or camera directly into a conversation
  • Horizontal photo strip above the message list shows each attached image with a remove button
  • Camera capture within AskScreen — take a photo on the spot and send it immediately to the AI
  • Photo picker integration in AskScreen, same flow as HomeScreen's "New Photo" sheet

Sound Design

  • Five ambient audio cues — processing loop, reply ding, success chime, page turn, and delete sweep; all non-blocking and run off the main thread
  • Haptic feedback with configurable sensitivity (0–100%)
  • All sounds independently togglable per-event in Settings

Secure Vault (Collections)

  • Per-collection security — lock any album behind biometric or a custom password; vault photos are physically moved to filesDir/secure_photos, not merely hidden
  • Vault passwords hashed with SHA-256 + 16-byte random salt + 10,000 iterations; raw password is never written to disk
  • Constant-time password comparison to prevent timing attacks
  • Security progress bottom sheet with animated progress bar during vault operations
  • showSecureCollectionsInList setting to control vault visibility in the Collections tab
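
The hashing and comparison steps listed above could look roughly like the following sketch. The release notes give only the parameters (SHA-256, 16-byte salt, 10,000 iterations), so the chaining scheme and every function name here are assumptions, not the app's actual code.

```kotlin
import java.security.MessageDigest
import java.security.SecureRandom

// Parameters taken from the release notes above.
const val SALT_BYTES = 16
const val ITERATIONS = 10_000

/** Returns a fresh random salt from a cryptographically strong source. */
fun newSalt(): ByteArray = ByteArray(SALT_BYTES).also { SecureRandom().nextBytes(it) }

/**
 * Assumed scheme: digest = SHA-256(salt + password), then the digest is
 * re-hashed until ITERATIONS rounds have run in total.
 */
fun hashPassword(password: CharArray, salt: ByteArray): ByteArray {
    val sha256 = MessageDigest.getInstance("SHA-256")
    val passwordBytes = password.concatToString().toByteArray(Charsets.UTF_8)
    var digest = sha256.digest(salt + passwordBytes)
    repeat(ITERATIONS - 1) { digest = sha256.digest(digest) }
    return digest
}

/** MessageDigest.isEqual compares in constant time, so a mismatch position leaks no timing signal. */
fun verifyPassword(attempt: CharArray, salt: ByteArray, expected: ByteArray): Boolean =
    MessageDigest.isEqual(hashPassword(attempt, salt), expected)
```

Only the salt and the final digest would need to be stored; the raw password never has to touch disk.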

TTS Reader — Complete Rewrite

  • ReaderComponent embedded in every chat bubble and OCR result sheet
  • Reading modes: Characters, Words, Sentences, Paragraphs, Lines — switchable on the fly
  • Prev / Play-Pause / Next segment controls with TalkBack-labelled state
  • autoplayOnLoad and stopOnBackground lifecycle hooks
  • Non-blocking architecture: all TTS work runs on Dispatchers.IO; UI thread is never held
  • Voice selection, pitch, rate, and engine configurable in Settings
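
As a rough illustration of the five reading modes, the sketch below splits text into the segments a reader could step through with Prev / Next. It relies on the JDK's BreakIterator; the ReadingMode enum and the segment function are hypothetical names, and the real ReaderComponent may segment differently.

```kotlin
import java.text.BreakIterator
import java.util.Locale

enum class ReadingMode { CHARACTERS, WORDS, SENTENCES, PARAGRAPHS, LINES }

/** Collects the non-blank pieces between the boundaries that [iterator] finds in [text]. */
fun collectPieces(text: String, iterator: BreakIterator): List<String> {
    iterator.setText(text)
    val pieces = mutableListOf<String>()
    var start = iterator.first()
    var end = iterator.next()
    while (end != BreakIterator.DONE) {
        val piece = text.substring(start, end).trim()
        if (piece.isNotEmpty()) pieces += piece
        start = end
        end = iterator.next()
    }
    return pieces
}

/** Splits [text] into the units that one Prev / Next press would move across. */
fun segment(text: String, mode: ReadingMode, locale: Locale = Locale.ENGLISH): List<String> =
    when (mode) {
        ReadingMode.CHARACTERS -> collectPieces(text, BreakIterator.getCharacterInstance(locale))
        // Drop pure punctuation so the reader does not stop on "." or ",".
        ReadingMode.WORDS -> collectPieces(text, BreakIterator.getWordInstance(locale))
            .filter { piece -> piece.any { it.isLetterOrDigit() } }
        ReadingMode.SENTENCES -> collectPieces(text, BreakIterator.getSentenceInstance(locale))
        ReadingMode.PARAGRAPHS -> text.split(Regex("\\n\\s*\\n")).map { it.trim() }.filter { it.isNotEmpty() }
        ReadingMode.LINES -> text.lines().map { it.trim() }.filter { it.isNotEmpty() }
    }
```

For example, `segment("A dog runs. It barks!", ReadingMode.SENTENCES)` yields the two sentences as separate segments.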

Full Settings Screen

  • Appearance & Gallery — theme (System/Light/Dark/AMOLED), accent color (6 options + Dynamic Material You), startup screen selection, grid density (2–5 columns), keep screen awake toggle
  • Advanced Filters — saved per-session; min/max width & height, date range picker, MIME type chip filter
  • AI Tuning — temperature, top-P, top-K sliders; response length (Brief / Balanced / Detailed / Extremely Detailed); thinking budget; streaming toggle; auto-generate descriptions
  • Sound & Vibration — per-event toggles for all five sound events plus haptic sensitivity slider
  • Security — biometric vs. custom-password vault mode; change vault password; disable vault with confirmation dialog
  • TTS — engine picker (all installed TTS engines), language picker (50+ languages from languages.json), pitch, rate, voice selection
  • Secure Sharing — strip GPS and full EXIF metadata before sharing
  • Language — AI response language picker with human-readable names (13+ languages from languages.json)

Onboarding Flow

  • 5-page pager onboarding on first launch, fully TalkBack-navigable
  • Pages: Welcome, Smart Gallery, AI Features, Accessibility, Privacy & Security
  • Model download prompt triggered automatically after onboarding if no model is present

ReasoningBlock Component

  • Expandable "chain-of-thought" block in chat bubbles and photo description — shows the model's internal reasoning before the final answer
  • Collapsed by default; animates open/close with a chevron

Markdown Renderer

  • Full MarkdownContent composable with no external dependency
  • Supports: headings (H1–H4), bold, italic, bold-italic, inline code, code blocks, blockquotes, unordered lists, ordered lists, horizontal rules, links (tappable)

About, Help & Support, Privacy, Terms screens

  • All rendered via the generic MarkdownScreen composable backed by res/raw/*.md files

Improvements

  • Smart Collections empty state — contextual message depending on whether any AI descriptions exist; CTA button to go to Settings when descriptions haven't been generated yet
  • Photo detail screen — pinch-to-zoom + two-finger rotate with spring-back animation; rotation persisted per photo; zoom/rotation reset on navigation
  • Bulk operations — add to favorites, generate descriptions, recognize text, share, add to collection, delete — all from the selection top bar
  • OCR bottom sheet — TTS reader embedded; copy + share buttons; streaming progress indicator
  • ListScreen — collection/album detail with the same sort/filter/bulk-select as HomeScreen
  • Session memory optimization — "Preparing AI Environment" overlay with polite live-region announcement so TalkBack users know when chat is ready
  • Scroll-to-bottom FAB in AskScreen — appears when the user scrolls up; animates in/out with fade + scale
  • Chat export — copy all messages to clipboard or share as plain text
  • Message regeneration — re-run the last AI turn from the same message index
  • SpeechRecognizer lifecycle fix — recognizer is properly destroyed when AskScreen leaves composition, preventing a native listener leak after back-navigation

Bug Fixes

  • GemmaAdapter.processResponse now builds JSON with JSONObject/JSONArray instead of string interpolation — fixes crashes on model output containing quotes, backslashes, or newlines
  • Vault password stored as SHA-256 hash + salt rather than plaintext — security fix
  • LiteRtLmManager single-session constraint: properly prevents a second createConversation while one is already open; description session re-opens when chat session ends
  • TtsManager no longer blocks the UI thread: settings are read on Dispatchers.IO, and only the tts.speak() call is posted to Main via withContext(Dispatchers.Main)
  • Photo URI resolution handles both file:// scheme and content:// URIs via ParcelFileDescriptor /proc/self/fd trick
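
The JSON fix in the first bullet above can be illustrated with a small before/after sketch. It uses org.json, which ships with Android (on a plain JVM it is a separate dependency). The function names and the two field names are illustrative assumptions, not the real GemmaAdapter code.

```kotlin
import org.json.JSONArray
import org.json.JSONObject

/** Fragile: produces invalid JSON as soon as [description] contains a quote, backslash, or newline. */
fun toJsonByInterpolation(description: String): String =
    "{\"description\": \"$description\"}"

/** Robust: JSONObject escapes special characters when it serialises the value. */
fun toJsonWithBuilder(description: String, categories: List<String>): String =
    JSONObject()
        .put("description", description)
        .put("categories", JSONArray(categories))
        .toString()
```

Parsing the builder's output returns the original string unchanged, while the interpolated variant fails to parse once the model emits a quoted phrase.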

[1.1.0] — 2026-04-14

Theme: Multi-model architecture with model-specific adapters for full control over tokenisation strategy.

New Features

Multi-Model Architecture

  • ModelAdapter interface — pluggable adapter per model that controls system instruction, content building, conversation config creation, and response post-processing
  • GemmaAdapter — dedicated adapter for Gemma 4 models; implements tool-call structured JSON response parsing with proper escaping
  • FastVlmAdapter — adapter skeleton for FastVLM-class models with different tokenisation strategy
  • Models defined in assets/models.json; the adapter field determines which adapter is instantiated at runtime

Model-Specific Behaviour

  • getSystemInstruction() — per-adapter system prompt customisation
  • buildAnalysisContent() — per-adapter content construction for photo analysis turns
  • buildAskContent() — per-adapter content construction for interactive chat turns
  • createConversationConfig() — per-adapter ConversationConfig including sampler config and tool registration
  • processResponse() — per-adapter post-processing of raw model output and tool call arguments
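
To make the shape of the adapter contract concrete, here is a minimal sketch. The five method names come from the list above; all parameter types, return types, the SketchConversationConfig class, the EchoAdapter, and the adapterFor lookup are hypothetical stand-ins, since the real signatures are not given here.

```kotlin
/** Hypothetical stand-in for the real conversation config (sampler settings, tool registration). */
data class SketchConversationConfig(
    val systemInstruction: String,
    val temperature: Float,
    val toolsEnabled: Boolean,
)

/** Method names follow the release notes; every signature is assumed for illustration. */
interface ModelAdapter {
    fun getSystemInstruction(): String
    fun buildAnalysisContent(imagePath: String): List<String>
    fun buildAskContent(question: String, imagePaths: List<String>): List<String>
    fun createConversationConfig(temperature: Float): SketchConversationConfig
    fun processResponse(rawOutput: String): String
}

/** Toy adapter that only shows how the pieces fit together. */
class EchoAdapter : ModelAdapter {
    override fun getSystemInstruction() = "Describe photos in full sentences."
    override fun buildAnalysisContent(imagePath: String) = listOf("image:$imagePath", "Describe this photo.")
    override fun buildAskContent(question: String, imagePaths: List<String>) =
        imagePaths.map { "image:$it" } + question
    override fun createConversationConfig(temperature: Float) =
        SketchConversationConfig(getSystemInstruction(), temperature, toolsEnabled = false)
    override fun processResponse(rawOutput: String) = rawOutput.trim()
}

/** Mirrors the idea of selecting an adapter from the "adapter" string in the model registry. */
fun adapterFor(name: String): ModelAdapter = when (name) {
    "echo" -> EchoAdapter()
    else -> throw IllegalArgumentException("Unknown adapter: $name")
}
```

The point of the pattern is that the engine manager only talks to the interface, so a new model needs a new adapter class and a registry entry rather than changes to the manager.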

JsonModel Data Class

  • New fields: adapter (string, selects adapter class), preferredBackend (overrides global CPU/GPU), toolCall (boolean), memoryMinRequired, memoryRecommended
  • ModelStatus wraps JsonModel with download state, progress bytes, speed, ETA, and error message

Models Manager

  • Redesigned ModelConfigScreen — card-per-model layout showing size, RAM requirements, download status, speed, ETA
  • Download/cancel/delete controls per model; active model shown with a checkmark badge
  • Model switch triggers full LiteRtLmManager.shutdown() + re-init to ensure correct adapter is loaded

Improvements

  • LiteRtLmManager session mode enum (DESCRIPTION / CHAT) replaces boolean flag — clearer invariants and easier to extend
  • Speculative decoding enabled via ExperimentalFlags.enableSpeculativeDecoding = true for faster token generation
  • ModelDownloadService — foreground service with resume support via HTTP Range header; progress reported via SharedFlow
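
Resuming a download with the HTTP Range header, as mentioned for ModelDownloadService, can be sketched as follows. The sketch uses the JDK's HttpURLConnection to stay dependency-free, which is a substitution rather than the app's actual networking code. Both function names are hypothetical.

```kotlin
import java.io.File
import java.io.FileOutputStream
import java.net.HttpURLConnection
import java.net.URL

/** Range header value requesting everything after the bytes already on disk, or null for a fresh download. */
fun rangeHeaderFor(partialFile: File): String? =
    if (partialFile.exists() && partialFile.length() > 0) "bytes=${partialFile.length()}-" else null

/** Downloads [url] into [target], appending only when the server honours the Range request. */
fun downloadWithResume(url: URL, target: File) {
    val connection = url.openConnection() as HttpURLConnection
    rangeHeaderFor(target)?.let { connection.setRequestProperty("Range", it) }
    try {
        // 206 Partial Content: the server resumed. 200 OK: it restarted from byte 0, so overwrite.
        val append = connection.responseCode == HttpURLConnection.HTTP_PARTIAL
        // Error statuses (for example 416 when the file is already complete) throw here in this sketch.
        connection.inputStream.use { input ->
            FileOutputStream(target, append).use { output -> input.copyTo(output) }
        }
    } finally {
        connection.disconnect()
    }
}
```

Checking for status 206 before appending matters: a server that ignores Range replies with the full file, and appending that to the partial file would corrupt the model.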

Bug Fixes

  • ModelAdapter.processResponse with tool calls no longer crashes on model output that embeds special JSON characters — replaced string interpolation with JSONObject.put() throughout

[1.0.0] — 2026-04-14

Theme: Initial release — on-device AI photo gallery built for accessibility.

New Features

Core AI Integration

  • On-device AI inference via Google AI Edge LiteRT-LM with Gemma 4 multimodal model
  • Natural language photo descriptions — full sentences generated by Gemma 4 locally, no internet
  • Interactive Ask Mode — streaming chat interface to ask any question about a photo
  • Smart categorisation — photos automatically grouped into Nature, People, Food, Documents, Travel, Architecture, Pets, Sports via tool calls
  • OCR / text recognition — extract text from any image using the same on-device model
  • Thinking Mode — chain-of-thought reasoning visible before the final answer
  • Multilingual output — 13 languages selectable for AI response language

Gallery & Navigation

  • Grid and list view — togglable; grid supports 2–5 columns
  • Date-grouped photo timeline with sticky headers
  • Local Albums via Android MediaStore
  • Smart Collections — dynamic albums built from AI-generated categories
  • Favorites — star any photo; dedicated Favorites tab
  • Bottom navigation: Photos / Collections / Favorites
  • Full-screen photo detail with share, favorite, rotate, more-menu

Privacy & Security

  • Zero cloud processing — all AI runs on-device GPU/CPU/NPU
  • No analytics, no telemetry, no account required
  • Secure Vault foundation — architecture for secure collections in place

Accessibility Foundation

  • Full TalkBack semantic labelling on every UI element
  • WCAG 2.1 Level AA — high contrast, generous touch targets, predictable navigation
  • Live regions for progress announcements
  • Voice input for Ask Mode
  • Built-in TTS with segment-based reading controls

Model Management

  • Foreground download service for model files (~2.4 GB)
  • Progress tracking with speed and ETA
  • GPU / CPU backend selection

Settings

  • AI backend (GPU / CPU), temperature, response language
  • Basic gallery preferences (view mode, grid columns, sort order)

Other Screens

  • About, Help & Support — Markdown-rendered
  • Privacy Policy and Terms of Use — in-app Markdown

Technical Foundation

  • MVVM + Repository pattern; single source of truth in PhotoRepository
  • Jetpack Compose + Material 3 throughout
  • Hilt dependency injection
  • Room v9 database for photo metadata and description state
  • DataStore for persistent preferences
  • Coil for image loading with HEIC→JPEG auto-conversion

For the full feature list, see features.md.

Capabilities

Everything PhotoLens can do

PhotoLens — Complete Feature Reference

Version 1.2.0 · Android 7.0+ (API 24+) · 100% On-Device · No Cloud · No Account


On-Device AI

Feature | Details
AI Engine | Google AI Edge LiteRT-LM 0.11
Model | Google Gemma 4 E2B (2.41 GB) or E4B (4.56 GB)
Inference | 100% on-device (GPU, CPU, or NPU)
Internet required | One-time model download only
Photo data sent to cloud | Never

Photo Description

  • Natural language descriptions generated by Gemma 4 in full sentences (not keywords)
  • Four response length presets: Brief (1 sentence) · Balanced (≤3 sentences) · Detailed (step-by-step) · Extremely Detailed (exhaustive full-pass)
  • Technical details mode — includes lighting, resolution, and quality assessment when enabled
  • Tool-call structured output — description, categories, tags, mood, and technical quality returned as parsed JSON
  • Auto-generate toggle — describes photos automatically as they scroll into view (bounded queue, one inference at a time)
  • Manual generate — trigger description for any single photo from the gallery or detail view
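
The "bounded queue, one inference at a time" behaviour of auto-generate can be approximated with a single worker thread and a fixed-size queue. This sketch uses a JDK ThreadPoolExecutor rather than whatever the app uses internally, and the drop-oldest policy, the DescriptionQueue class, and the describe callback are all assumptions.

```kotlin
import java.util.concurrent.ArrayBlockingQueue
import java.util.concurrent.ThreadPoolExecutor
import java.util.concurrent.TimeUnit

/**
 * One worker thread plus a bounded queue: at most one job runs at a time, and
 * when the queue is full the oldest waiting request is dropped (assumed policy).
 */
class DescriptionQueue(capacity: Int, private val describe: (String) -> Unit) {
    private val executor = ThreadPoolExecutor(
        1, 1, 0L, TimeUnit.MILLISECONDS,
        ArrayBlockingQueue<Runnable>(capacity),
        ThreadPoolExecutor.DiscardOldestPolicy(),
    )

    /** Queues a description request for [photoId]; returns immediately. */
    fun request(photoId: String) {
        executor.execute { describe(photoId) }
    }

    /** Stops accepting work and waits for queued jobs to finish. */
    fun shutdownAndWait(timeoutSeconds: Long = 5): Boolean {
        executor.shutdown()
        return executor.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)
    }
}
```

Dropping the oldest waiting request suits fast scrolling, where photos that have already left the screen matter less than the ones now visible.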

Interactive Ask Mode (AI Chat)

  • Chat interface with streaming token output per message
  • Attach multiple photos in a single conversation (horizontal photo strip)
  • Add photos from gallery picker or camera capture without leaving the chat
  • Regenerate any AI message with one tap
  • Stop streaming mid-response
  • Export or copy all messages in a conversation
  • Session isolation — chat session and description session never overlap; explicit hand-off protocol prevents native engine conflicts

OCR / Text Recognition

  • Extract printed text, signs, labels, documents, and handwriting from any image
  • On-device, no cloud OCR
  • Result displayed in a bottom sheet with copy, share, and TTS reader

Smart Categorisation

  • Automatic photo categories: Nature · People · Food · Documents · Travel · Architecture · Pets · Sports · Other
  • Categories stored in the Room database alongside descriptions
  • Categories drive Smart Collections: photos categorised as "Food" appear in the Food smart album automatically

Thinking Mode

  • Chain-of-thought reasoning block visible before the final description or answer
  • Expandable / collapsible ReasoningBlock component in photo detail and chat bubbles
  • Configurable thinking budget (0 = fast, higher = deeper reasoning)

AI Settings

  • Backend: GPU (default) or CPU — model preferredBackend can override the global setting
  • Temperature (0.0–1.5) — controls output randomness
  • Top-P (0.0–1.0) — nucleus sampling probability
  • Top-K (1–100) — token selection pool size
  • Response language — 13+ languages via assets/languages.json
  • Enable streaming — toggle token-by-token output vs. single-shot response
  • Provide technical details — adds lighting/resolution qualifiers to prompts
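
For readers unfamiliar with the three sampling settings, the following generic sketch shows how temperature, top-K, and top-P are commonly applied to a model's output scores. It illustrates the general technique only; it is not LiteRT-LM's implementation, and both function names are hypothetical.

```kotlin
import kotlin.math.exp

/** Softmax with temperature: lower values sharpen the distribution, higher values flatten it. */
fun softmax(logits: List<Double>, temperature: Double): List<Double> {
    require(temperature >= 0.0) { "temperature must not be negative" }
    if (temperature == 0.0) {
        // Temperature 0 is conventionally treated as greedy decoding: all mass on the best token.
        val best = logits.indices.maxByOrNull { logits[it] }
        return logits.indices.map { if (it == best) 1.0 else 0.0 }
    }
    val scaled = logits.map { it / temperature }
    val max = scaled.maxOrNull() ?: return emptyList()
    val exps = scaled.map { exp(it - max) } // subtract max for numerical stability
    val sum = exps.sum()
    return exps.map { it / sum }
}

/**
 * Returns the token indices that survive top-K then top-P filtering, mapped to
 * renormalised probabilities. A sampler would draw the next token from this pool.
 */
fun candidatePool(probs: List<Double>, topK: Int, topP: Double): Map<Int, Double> {
    val ranked = probs.withIndex().sortedByDescending { it.value }.take(topK)
    val kept = mutableListOf<IndexedValue<Double>>()
    var cumulative = 0.0
    for (entry in ranked) {
        kept += entry
        cumulative += entry.value
        if (cumulative >= topP) break // the nucleus is complete
    }
    val total = kept.sumOf { it.value }
    return kept.associate { it.index to it.value / total }
}
```

With probabilities 0.5, 0.3, 0.15, 0.05, a top-K of 3 and a top-P of 0.7 keep only the first two tokens, renormalised to 0.625 and 0.375.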

Multi-Model Support

  • ModelAdapter interface allows each model to define its own tokenisation strategy, system instruction, content construction, and response post-processing
  • GemmaAdapter — optimised for Gemma 4 tool-call JSON output with full special-character escaping
  • FastVlmAdapter — skeleton for FastVLM-class models with different content layout
  • Model registry in assets/models.json — add new .litertlm models without code changes
  • Per-model preferredBackend, memoryMinRequired, memoryRecommended, visionSupport, toolCall flags

Gallery

Views & Layout

  • Grid view — 2, 3, 4, or 5 columns (user-selectable)
  • List view — single-column with metadata visible
  • Toggle between grid and list from the toolbar
  • Date-grouped timeline with sticky date headers in list view
  • Description preview in gallery cells (toggleable)

Collections & Albums

  • Local Albums — MediaStore-backed; mirrors device folders automatically
  • Smart Collections — AI-powered dynamic albums built from photo categories; refresh on demand
  • Smart collections shown in a 2-column grid with an "AI" badge
  • Secure collections — lock any album behind vault authentication
  • Rename any collection from a long-press bottom sheet
  • Copy all photos from one collection to another
  • Share all photos in a collection

Favorites

  • Star/unstar any photo; persisted in Room database
  • Dedicated Favorites tab in the bottom navigation bar

Search & Filter

  • Search — live-filter by photo name across the current view
  • Sort — Date taken (newest/oldest), Date added (newest/oldest), File name (A–Z / Z–A), File size (largest/smallest)
  • Filters (persistent per session):
    • Show / hide videos, screenshots, RAW photos, hidden files
    • Group similar photos toggle
    • Min / max width and height (pixels)
    • Date range picker (from / to)
    • MIME type filter chips (JPEG, PNG, WebP, HEIF, GIF, MP4, WebM)
    • Reset all filters button

Multi-Select Bulk Operations

  • Long-press any photo to enter selection mode
  • Tap additional photos to extend selection
  • Bulk: Favorite · Generate descriptions · Recognize text · Share · Add to collection · Delete
  • Contextual selection top bar with photo count

Photo Detail

  • Full-screen photo viewer
  • Pinch-to-zoom with spring-back and bounds clamping
  • Two-finger rotate with spring-back; rotation persisted per photo
  • Double-tap to reset zoom and rotation
  • Share (with optional metadata stripping)
  • Toggle favorite
  • Rotate 90° (clockwise; persisted)
  • Add / remove from album
  • Delete with confirmation dialog
  • Description display with streaming indicator
  • Reasoning block (Thinking Mode output)
  • Ask button — opens AI Chat preloaded with the current photo
  • Photo metadata sheet (date, dimensions, size, location, MIME type, bucket)

TTS Reader

Available in: chat bubbles, OCR results sheet, photo descriptions.

Setting | Options
Reading mode | Characters · Words · Sentences · Paragraphs · Lines
Autoplay on load | On / Off
Stop on background | On / Off
TTS engine | Any installed Android TTS engine
Language | Full BCP-47 locale picker
Voice | Voice picker filtered by language
Pitch | 0.5× – 2.0×
Rate | 0.5× – 2.0×
  • Prev / Play-Pause / Next segment controls
  • Each button labelled with the current reading mode for TalkBack
  • Non-blocking: all TTS work runs on Dispatchers.IO; main thread never blocked
  • Settings cached to skip redundant JNI calls on re-render

Accessibility

PhotoLens was built by a visually impaired developer specifically for visually impaired users. Accessibility is the primary use case, not a feature checkbox.

TalkBack

  • Every UI element has a precise contentDescription
  • Every status change (description generated, download complete, session ready) is announced via Polite live regions without requiring focus navigation
  • Headings marked with Modifier.semantics { heading() } for efficient swipe navigation
  • Selection mode top bar announces count: "3 photos selected"
  • Collection cards announce name and photo count

Standards Compliance

  • WCAG 2.1 Level AA — high-contrast colour schemes, generous touch targets (≥48dp), predictable navigation
  • Every screen reachable via swipe navigation alone
  • No time-limited interactions

Voice Input

  • Voice typing in Ask Mode via Android SpeechRecognizer
  • Microphone permission properly gated with rationale sheet and Settings deep-link fallback
  • Recognizer properly destroyed when AskScreen leaves composition (prevents native listener leak)

Other

  • Onboarding pager fully navigable with TalkBack
  • Model download progress announced at every percentage update
  • "Preparing AI Environment" overlay announced so users know when chat is ready
  • Every settings toggle announces its on/off state
  • Keep Screen Awake option for users who need extended viewing time

Privacy & Security

Zero Data Collection

  • No analytics SDK
  • No crash reporter (no Firebase Crashlytics, Sentry, or similar)
  • No advertising SDK
  • No telemetry
  • No account, no sign-in, no email required

Zero Cloud Processing

  • AI inference runs on-device GPU/CPU/NPU only
  • Photos are never uploaded for analysis
  • No API calls during photo description or chat

Secure Vault

  • Lock any collection behind biometric (fingerprint / face) or a custom password
  • Vault photos physically moved to filesDir/secure_photos — inaccessible to other apps without root
  • On lock: photos removed from MediaStore-visible paths
  • On unlock: biometric or password challenge presented via BiometricPrompt
  • Password hashing: SHA-256 + 16-byte random salt + 10,000 iterations
  • Constant-time comparison prevents timing attacks
  • Raw password is never written to disk or DataStore

Share Privacy Controls

  • Strip GPS/location data before sharing
  • Strip full EXIF device metadata (make, model, software) before sharing
  • HEIC→JPEG auto-conversion on share (for compatibility without metadata leakage)

Sound & Haptics

Sound | Trigger
Processing loop | AI description or chat inference in progress
Reply | AI message received in chat
Success | Photo description completed
Page turn | TTS reader segment advance
Delete | Photo deleted
  • Each sound independently toggleable in Settings
  • Haptic feedback with configurable intensity (0–100%)
  • Non-blocking: sound playback runs off the main thread via SoundManager

Model Manager

  • Card per model showing name, version, size, RAM requirements, description
  • Download / Cancel / Delete controls per model
  • Live download progress: bytes downloaded, total size, speed (KB/s), ETA
  • Download runs in a ForegroundService with a persistent notification
  • Resume support via HTTP Range header (recovers from interrupted downloads)
  • Active model shown with a checkmark; switching model triggers full AI engine shutdown and re-init
  • Backend override per model (e.g., a CPU-preferred model ignores the global GPU setting)
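
The live progress figures listed above (percentage, speed in KB/s, ETA) come down to simple arithmetic. The sketch below assumes a plain average speed over the elapsed time; the DownloadProgress class and progressOf function are hypothetical, and the app may smooth its figures differently.

```kotlin
/** Snapshot of the values a progress card would display; etaSeconds is null until speed is known. */
data class DownloadProgress(val percent: Int, val speedKBps: Double, val etaSeconds: Long?)

/** Derives percent, average speed, and remaining time from raw byte counts. */
fun progressOf(downloadedBytes: Long, totalBytes: Long, elapsedMillis: Long): DownloadProgress {
    require(totalBytes > 0 && downloadedBytes in 0..totalBytes) { "byte counts out of range" }
    val percent = (downloadedBytes * 100 / totalBytes).toInt()
    val bytesPerSecond = if (elapsedMillis > 0) downloadedBytes * 1000.0 / elapsedMillis else 0.0
    val eta = if (bytesPerSecond > 0) ((totalBytes - downloadedBytes) / bytesPerSecond).toLong() else null
    return DownloadProgress(percent, bytesPerSecond / 1024.0, eta)
}
```

For a resumed download, the speed calculation should count only the bytes fetched in the current session, otherwise the bytes already on disk inflate the average.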

Settings Reference

Appearance & Gallery

Setting | Options
Gallery layout | Grid / List
Grid density | 2 (Standard) · 3 (Compact) · 4 (Dense) · 5 (Extreme)
Show description in gallery | On / Off
Auto-refresh collections | On / Off
Convert HEIC to JPEG | On / Off
Theme | System / Light / Dark / AMOLED
Accent color | Blue · Green · Purple · Orange · Red · Dynamic
Startup screen | Gallery / Collections / Favorites
Keep screen awake | On / Off
Sort photos by | 8 options (see Gallery section)
Group similar photos | On / Off
Stack burst photos | On / Off
Show screenshots | On / Off
Show hidden files | On / Off
Show RAW photos | On / Off
Show videos | On / Off

Advanced Filters (saved)

Min/max width, min/max height, from/to date, file size range, MIME type set.

AI Settings

Setting | Range / Options
AI Backend | GPU / CPU
Current model | Any downloaded model
Response mode | Fast / Reasoning
Thinking budget | 0–4096 tokens
Response language | 13+ languages
Response length | Brief / Balanced / Detailed / Extremely Detailed
Provide technical details | On / Off
Enable streaming | On / Off
Auto-generate descriptions | On / Off
Temperature | 0.0–1.5
Top-P | 0.0–1.0
Top-K | 1–100

Sound & Vibration

Per-event toggles for: processing loop, success sound, reply sound, delete sound, page-turn sound. Haptic feedback on/off + sensitivity slider.

Secure Sharing

Strip GPS on share · Strip EXIF metadata on share.

Security

Enable secure collections · Show secure collections in list · Encryption mode (Biometric / Custom password) · Change vault password · Disable vault.

TTS

Engine · Language · Pitch · Rate · Voice.

Reader

Reading mode (Characters / Words / Sentences / Paragraphs / Lines) · Autoplay on load · Stop on background · Character chunk size.


Architecture

UI (Jetpack Compose + Material 3)
    ↓ StateFlow / collectAsStateWithLifecycle
ViewModel (HomeViewModel · PhotoViewModel · AskViewModel · SettingsViewModel · ModelConfigViewModel)
    ↓
Repository (PhotoRepository — single source of truth, bounded generation queue)
    ↓                    ↓
Room Database        LiteRtLmManager (single-session AI engine manager)
DataStore Prefs          ↓
                     ModelAdapter (GemmaAdapter / FastVlmAdapter)
                         ↓
                     Google Gemma 4 via LiteRT-LM (100% on-device)

Key Design Decisions

  • Single-session constraint — LiteRT-LM allows only one Conversation per Engine; enforced via a mutex + SessionMode enum with explicit DESCRIPTION ↔ CHAT hand-off
  • Bounded generation queue — a Channel consumer serialises auto-describe requests so only one inference runs at a time regardless of scroll speed
  • Non-blocking TTS: TtsManager reads settings on IO, posts tts.speak() to Main; UI thread never waits
  • Adapter pattern: ModelAdapter interface lets any .litertlm model plug in without touching LiteRtLmManager
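
The single-session constraint with an explicit DESCRIPTION ↔ CHAT hand-off can be sketched as a small state holder guarded by a lock. SessionMode is named in the text above; the SessionGate class, its methods, and the use of a JVM monitor lock in place of a coroutine mutex are assumptions made to keep the example dependency-free.

```kotlin
enum class SessionMode { DESCRIPTION, CHAT }

/** Tracks which single session currently owns the engine; all names here are hypothetical. */
class SessionGate {
    private val lock = Any()
    private var active: SessionMode? = null

    /** Opens [mode] only if nothing else is open; returns false rather than allowing a second session. */
    fun tryOpen(mode: SessionMode): Boolean = synchronized(lock) {
        if (active != null) false else { active = mode; true }
    }

    /** Atomically passes the engine from [from] to [next]; fails if [from] is not the active session. */
    fun handOff(from: SessionMode, next: SessionMode): Boolean = synchronized(lock) {
        if (active != from) false else { active = next; true }
    }

    /** Releases the engine if [mode] is the active session. */
    fun close(mode: SessionMode) {
        synchronized(lock) { if (active == mode) active = null }
    }

    fun current(): SessionMode? = synchronized(lock) { active }
}
```

Doing the hand-off inside one critical section means no third caller can slip in between closing the description session and opening the chat session.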

Tech Stack

Layer | Library / Version
Language | Kotlin 2.2
UI | Jetpack Compose + Material 3 (BOM 2024.12)
Navigation | Navigation Compose
State | ViewModel + StateFlow
DI | Hilt 2.57
Database | Room 2.6
Preferences | DataStore 1.1
Photo Access | MediaStore (API 24+ compatible)
AI Inference | Google AI Edge LiteRT-LM 0.11
AI Model | Google Gemma 4 E2B / E4B
Image Loading | Coil 2.7
Networking (download) | OkHttp 4.12
Permissions | Accompanist Permissions 0.36
Biometrics | AndroidX Biometric
Open-source notices | Google OSS Licenses Plugin

Supported Platforms

Requirement | Value
Minimum Android | 7.0 (API 24)
Target Android | API 35
Compiled with | SDK 37
Minimum RAM | 4 GB (E2B model) · 6 GB (E4B model)
Recommended RAM | 4 GB (E2B) · 8 GB (E4B)
Storage required | ~2.5 GB (E2B) · ~4.7 GB (E4B)
Physical device | Required (AI requires GPU/NPU)

For version history, see whatsnew.md.

Why this matters

Your help is required to keep PhotoLens free, forever

PhotoLens has no investors, no ads, no subscription, and no data to sell. It is built and maintained by one visually impaired engineer who refuses to put a paywall between blind users and their own photographs.

Every sponsor pays for a real cost — the GitHub releases that host this APK, the Android device farm used for accessibility testing, the screen readers, and the time spent reading every TalkBack bug report. None of it is free to keep running.

If PhotoLens helped you — or could help someone you love — please consider sponsoring. Even one cup of coffee a month keeps this app honest, offline, and ad-free.

Help keep PhotoLens free, private, and independent

PhotoLens is built by one visually impaired engineer, with no investors, no ads, and no plans to ever sell your data — because there is no data to sell. If this work matters to you, your sponsorship pays for the devices, the model testing, and the hours that make accessibility actually accessible.

Sponsor PhotoLens on GitHub

One-time or monthly · Cancel anytime