Release notes
What's new in PhotoLens
Every change shipped, since day one. Versions follow Semantic Versioning.
[1.2.0] — 2026-05-10
Theme: Multi-photo chat, rich sound design, secure vault, full accessibility polish, and a settings overhaul.
New Features
Multi-Photo Chat (URI Preview)
- Attach multiple photos in a single chat session — add up to N photos from your gallery or camera directly into a conversation
- Horizontal photo strip above the message list shows each attached image with a remove button
- Camera capture within AskScreen — take a photo on the spot and send it immediately to the AI
- Photo picker integration in AskScreen, same flow as HomeScreen's "New Photo" sheet
Sound Design
- Five ambient audio cues — processing loop, reply ding, success chime, page turn, and delete sweep; all non-blocking and run off the main thread
- Haptic feedback with configurable sensitivity (0–100%)
- All sounds independently togglable per-event in Settings
Secure Vault (Collections)
- Per-collection security — lock any album behind biometric or a custom password; vault photos are physically moved to filesDir/secure_photos, not merely hidden
- Vault passwords hashed with SHA-256 + 16-byte random salt + 10,000 iterations; raw password is never written to disk
- Constant-time password comparison to prevent timing attacks
- Security progress bottom sheet with animated progress bar during vault operations
- showSecureCollectionsInList setting to control vault visibility in the Collections tab
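The salted, iterated hashing and constant-time comparison described above can be sketched roughly as follows. This is an illustration only, not PhotoLens's actual code: the notes do not say how the 10,000 iterations are applied, so the sketch assumes PBKDF2 with HMAC-SHA256, and the names VaultHash, derive, hashPassword, and verifyPassword are hypothetical.

```kotlin
import java.security.MessageDigest
import java.security.SecureRandom
import javax.crypto.SecretKeyFactory
import javax.crypto.spec.PBEKeySpec

// Hypothetical holder for what gets persisted: salt and derived hash, never the password.
class VaultHash(val salt: ByteArray, val hash: ByteArray)

const val ITERATIONS = 10_000   // iteration count stated in the release notes
const val SALT_BYTES = 16       // salt size stated in the release notes
const val KEY_BITS = 256        // assumed output size (the width of SHA-256)

// Assumption: the iterations are applied via PBKDF2-HMAC-SHA256.
fun derive(password: CharArray, salt: ByteArray): ByteArray {
    val spec = PBEKeySpec(password, salt, ITERATIONS, KEY_BITS)
    try {
        return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
            .generateSecret(spec)
            .encoded
    } finally {
        spec.clearPassword()    // wipe the password copy held by the spec
    }
}

// Creates a fresh random salt and derives the hash to store.
fun hashPassword(password: CharArray): VaultHash {
    val salt = ByteArray(SALT_BYTES).also { SecureRandom().nextBytes(it) }
    return VaultHash(salt, derive(password, salt))
}

// MessageDigest.isEqual compares in constant time, which avoids leaking
// how many leading bytes matched.
fun verifyPassword(attempt: CharArray, stored: VaultHash): Boolean =
    MessageDigest.isEqual(derive(attempt, stored.salt), stored.hash)
```

Under these assumptions, only the salt and hash would need to be written to disk; unlocking re-derives the hash from the entered password and the stored salt.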
TTS Reader — Complete Rewrite
- ReaderComponent embedded in every chat bubble and OCR result sheet
- Reading modes: Characters, Words, Sentences, Paragraphs, Lines — switchable on the fly
- Prev / Play-Pause / Next segment controls with TalkBack-labelled state
- autoplayOnLoad and stopOnBackground lifecycle hooks
- Non-blocking architecture: all TTS work runs on Dispatchers.IO; UI thread is never held
- Voice selection, pitch, rate, and engine configurable in Settings
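The five reading modes amount to different ways of splitting the same text into segments that the Prev / Next controls step through. A minimal sketch of that idea, assuming simple regex-based splitting; the ReadingMode enum and segment function are hypothetical names, and the real ReaderComponent may segment differently (for example with locale-aware break iterators):

```kotlin
// Hypothetical enum mirroring the five reading modes listed above.
enum class ReadingMode { CHARACTERS, WORDS, SENTENCES, PARAGRAPHS, LINES }

// Splits text into the segments a reader would speak one at a time.
// Blank segments are dropped, so whitespace is never read as its own segment.
fun segment(text: String, mode: ReadingMode): List<String> {
    val parts = when (mode) {
        ReadingMode.CHARACTERS -> text.map { it.toString() }
        ReadingMode.WORDS -> text.split(Regex("\\s+"))
        // Split on whitespace that follows sentence-ending punctuation.
        ReadingMode.SENTENCES -> text.split(Regex("(?<=[.!?])\\s+"))
        // A paragraph break is an empty (or whitespace-only) line.
        ReadingMode.PARAGRAPHS -> text.split(Regex("\\n\\s*\\n"))
        ReadingMode.LINES -> text.lines()
    }
    return parts.map { it.trim() }.filter { it.isNotEmpty() }
}
```

Because each mode yields a plain list, switching modes on the fly only requires re-segmenting the text and mapping the current position into the new list.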
Full Settings Screen
- Appearance & Gallery — theme (System/Light/Dark/AMOLED), accent color (6 options + Dynamic Material You), startup screen selection, grid density (2–5 columns), keep screen awake toggle
- Advanced Filters — saved per-session; min/max width & height, date range picker, MIME type chip filter
- AI Tuning — temperature, top-P, top-K sliders; response length (Brief / Balanced / Detailed / Extremely Detailed); thinking budget; streaming toggle; auto-generate descriptions
- Sound & Vibration — per-event toggles for all five sound events plus haptic sensitivity slider
- Security — biometric vs. custom-password vault mode; change vault password; disable vault with confirmation dialog
- TTS — engine picker (all installed TTS engines), language picker (50+ languages from languages.json), pitch, rate, voice selection
- Secure Sharing — strip GPS and full EXIF metadata before sharing
- Language — AI response language picker with human-readable names (13+ languages from languages.json)
Onboarding Flow
- 5-page pager onboarding on first launch, fully TalkBack-navigable
- Pages: Welcome, Smart Gallery, AI Features, Accessibility, Privacy & Security
- Model download prompt triggered automatically after onboarding if no model is present
ReasoningBlock Component
- Expandable "chain-of-thought" block in chat bubbles and photo description — shows the model's internal reasoning before the final answer
- Collapsed by default; animates open/close with a chevron
Markdown Renderer
- Full MarkdownContent composable with no external dependency
- Supports: headings (H1–H4), bold, italic, bold-italic, inline code, code blocks, blockquotes, unordered lists, ordered lists, horizontal rules, links (tappable)
About, Help & Support, Privacy, Terms screens
- All rendered via the generic MarkdownScreen composable backed by res/raw/*.md files
Improvements
- Smart Collections empty state — contextual message depending on whether any AI descriptions exist; CTA button to go to Settings when descriptions haven't been generated yet
- Photo detail screen — pinch-to-zoom + two-finger rotate with spring-back animation; rotation persisted per photo; zoom/rotation reset on navigation
- Bulk operations — add to favorites, generate descriptions, recognize text, share, add to collection, delete — all from the selection top bar
- OCR bottom sheet — TTS reader embedded; copy + share buttons; streaming progress indicator
- ListScreen — collection/album detail with the same sort/filter/bulk-select as HomeScreen
- Session memory optimization — "Preparing AI Environment" overlay with polite live-region announcement so TalkBack users know when chat is ready
- Scroll-to-bottom FAB in AskScreen — appears when the user scrolls up; animates in/out with fade + scale
- Chat export — copy all messages to clipboard or share as plain text
- Message regeneration — re-run the last AI turn from the same message index
- SpeechRecognizer lifecycle fix — recognizer is properly destroyed when AskScreen leaves composition, preventing a native listener leak after back-navigation
Bug Fixes
- GemmaAdapter.processResponse now builds JSON with JSONObject/JSONArray instead of string interpolation — fixes crashes on model output containing quotes, backslashes, or newlines
- Vault password stored as SHA-256 hash + salt rather than plaintext — security fix
- LiteRtLmManager single-session constraint: properly prevents a second createConversation while one is already open; description session re-opens when chat session ends
- TtsManager no longer blocks the UI thread — all TTS calls post to Main via withContext(Dispatchers.Main) from Dispatchers.IO
- Photo URI resolution handles both file:// scheme and content:// URIs via the ParcelFileDescriptor /proc/self/fd trick
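To illustrate why the processResponse fix matters, the sketch below contrasts splicing raw model output into a JSON template with escaping it first. On Android the fix relies on JSONObject/JSONArray, as noted above; here jsonQuote is a hand-rolled, hypothetical stand-in for that escaping so the example runs without Android classes, and the "description" key is also an assumption.

```kotlin
// Hand-rolled stand-in for the escaping a JSON library performs; illustration only.
fun jsonQuote(s: String): String = buildString {
    append('"')
    for (c in s) {
        when {
            c == '"' -> append("\\\"")
            c == '\\' -> append("\\\\")
            c == '\n' -> append("\\n")
            c == '\r' -> append("\\r")
            c == '\t' -> append("\\t")
            // Remaining control characters must be written as \uXXXX escapes.
            c < ' ' -> append("\\u%04x".format(c.code))
            else -> append(c)
        }
    }
    append('"')
}

// Naive version: raw model output is interpolated straight into the template,
// so a quote or newline in the output produces invalid JSON.
fun naiveJson(description: String): String = "{\"description\":\"$description\"}"

// Safer version: the value is escaped before it is embedded.
fun safeJson(description: String): String = "{\"description\":${jsonQuote(description)}}"
```

With an input such as `a "b"` followed by a newline, naiveJson yields a string with unbalanced quotes, while safeJson yields a document that any JSON parser accepts.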
[1.1.0] — 2026-04-14
Theme: Multi-model architecture with model-specific adapters for full control over tokenisation strategy.
New Features
Multi-Model Architecture
- ModelAdapter interface — pluggable adapter per model that controls system instruction, content building, conversation config creation, and response post-processing
- GemmaAdapter — dedicated adapter for Gemma 4 models; implements tool-call structured JSON response parsing with proper escaping
- FastVlmAdapter — adapter skeleton for FastVLM-class models with different tokenisation strategy
- Models defined in assets/models.json — adapter field determines which adapter is instantiated at runtime
Model-Specific Behaviour
- getSystemInstruction() — per-adapter system prompt customisation
- buildAnalysisContent() — per-adapter content construction for photo analysis turns
- buildAskContent() — per-adapter content construction for interactive chat turns
- createConversationConfig() — per-adapter ConversationConfig including sampler config and tool registration
- processResponse() — per-adapter post-processing of raw model output and tool call arguments
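The adapter pattern described in this release can be sketched as an interface plus a registry keyed by the adapter field. This is a simplified illustration under stated assumptions: only two of the five methods are shown because the others depend on LiteRT-LM types, the method signatures are guesses, the prompt strings are placeholders, and the registry keys "gemma" and "fastvlm" are hypothetical values for the adapter field.

```kotlin
// Simplified, hypothetical shape of the interface; content building and
// conversation config are omitted because their types come from LiteRT-LM.
interface ModelAdapter {
    fun getSystemInstruction(): String
    fun processResponse(raw: String): String
}

class GemmaAdapter : ModelAdapter {
    // Placeholder prompt, not the app's real system instruction.
    override fun getSystemInstruction() = "Describe the photo for a screen-reader user."
    override fun processResponse(raw: String) = raw.trim()
}

class FastVlmAdapter : ModelAdapter {
    // Placeholder prompt, not the app's real system instruction.
    override fun getSystemInstruction() = "Describe the image."
    override fun processResponse(raw: String) = raw.trim()
}

// Registry keyed by the adapter string from assets/models.json.
// The key values are assumptions made for this sketch.
val adapterFactories: Map<String, () -> ModelAdapter> = mapOf(
    "gemma" to { GemmaAdapter() },
    "fastvlm" to { FastVlmAdapter() },
)

// Instantiates the adapter named in the model definition, failing loudly on unknown names.
fun adapterFor(name: String): ModelAdapter =
    adapterFactories[name]?.invoke()
        ?: throw IllegalArgumentException("Unknown adapter: $name")
```

Keeping the lookup in one map means a new model family would only need a new adapter class and one registry entry, which matches the "pluggable" goal stated above.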
JsonModel Data Class
- New fields: adapter (string, selects adapter class), preferredBackend (overrides global CPU/GPU), toolCall (boolean), memoryMinRequired, memoryRecommended
- ModelStatus wraps JsonModel with download state, progress bytes, speed, ETA, and error message
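As a rough picture of these two types, the sketch below uses the field names from the notes; every type, default value, and unit is an assumption, and DownloadState and effectiveBackend are hypothetical helpers added for illustration.

```kotlin
// Field names come from the release notes; types, defaults, and units are assumptions.
data class JsonModel(
    val adapter: String,                   // selects the adapter class
    val preferredBackend: String? = null,  // null means "use the global CPU/GPU setting"
    val toolCall: Boolean = false,
    val memoryMinRequired: Long = 0,       // assumed to be bytes
    val memoryRecommended: Long = 0,       // assumed to be bytes
)

// Hypothetical set of download states.
enum class DownloadState { NOT_DOWNLOADED, DOWNLOADING, DOWNLOADED, FAILED }

// Wraps the static model definition with its live download status.
data class ModelStatus(
    val model: JsonModel,
    val state: DownloadState = DownloadState.NOT_DOWNLOADED,
    val progressBytes: Long = 0,
    val speedBytesPerSec: Long = 0,
    val etaSeconds: Long? = null,
    val errorMessage: String? = null,
)

// A per-model preference, when present, overrides the global backend choice.
fun effectiveBackend(model: JsonModel, globalBackend: String): String =
    model.preferredBackend ?: globalBackend
```

Separating the immutable definition from the mutable status lets the UI observe download progress without touching the parsed models.json entries.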
Models Manager
- Redesigned ModelConfigScreen — card-per-model layout showing size, RAM requirements, download status, speed, ETA
- Download/cancel/delete controls per model; active model shown with a checkmark badge
- Model switch triggers full LiteRtLmManager.shutdown() + re-init to ensure correct adapter is loaded
Improvements
- LiteRtLmManager session mode enum (DESCRIPTION / CHAT) replaces boolean flag — clearer invariants and easier to extend
- Speculative decoding enabled via ExperimentalFlags.enableSpeculativeDecoding = true for faster token generation
- ModelDownloadService — foreground service with resume support via HTTP Range header; progress reported via SharedFlow
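Resuming a download with the HTTP Range header comes down to asking the server for the bytes after what is already on disk. A minimal sketch of that calculation, together with an ETA estimate of the kind the Models Manager displays; the function names are hypothetical and the networking itself is left out:

```kotlin
// Returns the Range header value for resuming a partial download,
// or null when nothing has been downloaded yet and no header is needed.
fun resumeRangeHeader(bytesOnDisk: Long): String? =
    if (bytesOnDisk > 0) "bytes=$bytesOnDisk-" else null

// Estimated whole seconds remaining, or null when the speed is unknown.
fun etaSeconds(totalBytes: Long, bytesOnDisk: Long, bytesPerSecond: Long): Long? =
    if (bytesPerSecond > 0) {
        (totalBytes - bytesOnDisk).coerceAtLeast(0) / bytesPerSecond
    } else {
        null
    }
```

A server that honours the request answers with 206 Partial Content and the data can be appended to the existing file; a plain 200 response means the range was ignored and the download has to restart from the beginning.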
Bug Fixes
- ModelAdapter.processResponse with tool calls no longer crashes on model output that embeds special JSON characters — replaced string interpolation with JSONObject.put() throughout
[1.0.0] — 2026-04-14
Theme: Initial release — on-device AI photo gallery built for accessibility.
New Features
Core AI Integration
- On-device AI inference via Google AI Edge LiteRT-LM with Gemma 4 multimodal model
- Natural language photo descriptions — full sentences generated by Gemma 4 locally, no internet
- Interactive Ask Mode — streaming chat interface to ask any question about a photo
- Smart categorisation — photos automatically grouped into Nature, People, Food, Documents, Travel, Architecture, Pets, Sports via tool calls
- OCR / text recognition — extract text from any image using the same on-device model
- Thinking Mode — chain-of-thought reasoning visible before the final answer
- Multilingual output — 13 languages selectable for AI response language
Gallery & Navigation
- Grid and list view — togglable; grid supports 2–5 columns
- Date-grouped photo timeline with sticky headers
- Local Albums via Android MediaStore
- Smart Collections — dynamic albums built from AI-generated categories
- Favorites — star any photo; dedicated Favorites tab
- Bottom navigation: Photos / Collections / Favorites
- Full-screen photo detail with share, favorite, rotate, more-menu
Privacy & Security
- Zero cloud processing — all AI runs on-device GPU/CPU/NPU
- No analytics, no telemetry, no account required
- Secure Vault foundation — architecture for secure collections in place
Accessibility Foundation
- Full TalkBack semantic labelling on every UI element
- WCAG 2.1 Level AA — high contrast, generous touch targets, predictable navigation
- Live regions for progress announcements
- Voice input for Ask Mode
- Built-in TTS with segment-based reading controls
Model Management
- Foreground download service for model files (~2.4 GB)
- Progress tracking with speed and ETA
- GPU / CPU backend selection
Settings
- AI backend (GPU / CPU), temperature, response language
- Basic gallery preferences (view mode, grid columns, sort order)
Other Screens
- About, Help & Support — Markdown-rendered
- Privacy Policy and Terms of Use — in-app Markdown
Technical Foundation
- MVVM + Repository pattern; single source of truth in PhotoRepository
- Jetpack Compose + Material 3 throughout
- Hilt dependency injection
- Room v9 database for photo metadata and description state
- DataStore for persistent preferences
- Coil for image loading with HEIC→JPEG auto-conversion
For the full feature list, see features.md.