Release notes

What's new in PhotoLens

Every change shipped, since day one. Versions follow Semantic Versioning.


[1.2.0] — 2026-05-10

Theme: Multi-photo chat, rich sound design, secure vault, full accessibility polish, and a settings overhaul.

New Features

Multi-Photo Chat (URI Preview)

  • Attach multiple photos in a single chat session — add up to N photos from your gallery or camera directly into a conversation
  • Horizontal photo strip above the message list shows each attached image with a remove button
  • Camera capture within AskScreen — take a photo on the spot and send it immediately to the AI
  • Photo picker integration in AskScreen, same flow as HomeScreen's "New Photo" sheet

Sound Design

  • Five ambient audio cues — processing loop, reply ding, success chime, page turn, and delete sweep; all non-blocking and run off the main thread
  • Haptic feedback with configurable sensitivity (0–100%)
  • All sounds independently togglable per-event in Settings

Secure Vault (Collections)

  • Per-collection security — lock any album behind biometric or a custom password; vault photos are physically moved to filesDir/secure_photos, not merely hidden
  • Vault passwords hashed with SHA-256 + 16-byte random salt + 10,000 iterations; raw password is never written to disk
  • Constant-time password comparison to prevent timing attacks
  • Security progress bottom sheet with animated progress bar during vault operations
  • showSecureCollectionsInList setting to control vault visibility in the Collections tab
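
The hashing and comparison steps above can be sketched roughly as follows. This is a minimal illustration, not PhotoLens's actual code: the names VaultCredential, deriveHash, createCredential, and verifyPassword are hypothetical, and the way the 10,000 iterations are chained (re-hashing the previous digest) is an assumption, since the notes do not spell it out.

```kotlin
import java.security.MessageDigest
import java.security.SecureRandom

// Hypothetical record of what is persisted: the salt and the hash, never the raw password.
class VaultCredential(val salt: ByteArray, val hash: ByteArray)

// Assumed scheme: SHA-256 over salt + password, then the digest is re-hashed
// until `iterations` rounds have run in total.
fun deriveHash(password: String, salt: ByteArray, iterations: Int = 10_000): ByteArray {
    val sha256 = MessageDigest.getInstance("SHA-256")
    sha256.update(salt)
    var result = sha256.digest(password.toByteArray(Charsets.UTF_8))
    repeat(iterations - 1) { result = sha256.digest(result) }
    return result
}

fun createCredential(password: String): VaultCredential {
    val salt = ByteArray(16).also { SecureRandom().nextBytes(it) } // 16-byte random salt
    return VaultCredential(salt, deriveHash(password, salt))
}

// MessageDigest.isEqual compares every byte regardless of where the first
// mismatch occurs, so timing does not reveal how much of the hash matched.
fun verifyPassword(attempt: String, stored: VaultCredential): Boolean =
    MessageDigest.isEqual(deriveHash(attempt, stored.salt), stored.hash)
```

Calling createCredential("secret") once and storing the result, then passing each unlock attempt through verifyPassword, would mirror the flow described in the bullets.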

TTS Reader — Complete Rewrite

  • ReaderComponent embedded in every chat bubble and OCR result sheet
  • Reading modes: Characters, Words, Sentences, Paragraphs, Lines — switchable on the fly
  • Prev / Play-Pause / Next segment controls with TalkBack-labelled state
  • autoplayOnLoad and stopOnBackground lifecycle hooks
  • Non-blocking architecture: all TTS work runs on Dispatchers.IO; UI thread is never held
  • Voice selection, pitch, rate, and engine configurable in Settings
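
As a rough illustration of the reading modes and the Prev / Next controls listed above, text can be split into segments and walked with a cursor. The names ReadingMode, segment, and SegmentCursor are hypothetical, and the splitting rules (for example, treating ".", "!" and "?" as sentence ends) are simplifying assumptions rather than the app's actual logic.

```kotlin
enum class ReadingMode { CHARACTERS, WORDS, SENTENCES, PARAGRAPHS, LINES }

// Splits text into the units that a reader would speak one at a time.
fun segment(text: String, mode: ReadingMode): List<String> = when (mode) {
    ReadingMode.CHARACTERS -> text.filterNot { it.isWhitespace() }.map { it.toString() }
    ReadingMode.WORDS -> text.split(Regex("\\s+")).filter { it.isNotEmpty() }
    // Split on whitespace that follows sentence-ending punctuation.
    ReadingMode.SENTENCES -> text.split(Regex("(?<=[.!?])\\s+")).map { it.trim() }.filter { it.isNotEmpty() }
    // A paragraph break is one or more blank lines.
    ReadingMode.PARAGRAPHS -> text.split(Regex("\\n\\s*\\n")).map { it.trim() }.filter { it.isNotEmpty() }
    ReadingMode.LINES -> text.lines().map { it.trim() }.filter { it.isNotEmpty() }
}

// Tracks the current segment; next() and prev() clamp at the ends instead of wrapping.
class SegmentCursor(private val segments: List<String>) {
    var index = 0
        private set
    val current: String? get() = segments.getOrNull(index)
    fun next(): String? { if (index < segments.lastIndex) index++; return current }
    fun prev(): String? { if (index > 0) index--; return current }
}
```

Switching modes on the fly would then amount to re-running segment with the new mode and creating a fresh cursor.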

Full Settings Screen

  • Appearance & Gallery — theme (System/Light/Dark/AMOLED), accent color (6 options + Dynamic Material You), startup screen selection, grid density (2–5 columns), keep screen awake toggle
  • Advanced Filters — saved per-session; min/max width & height, date range picker, MIME type chip filter
  • AI Tuning — temperature, top-P, top-K sliders; response length (Brief / Balanced / Detailed / Extremely Detailed); thinking budget; streaming toggle; auto-generate descriptions
  • Sound & Vibration — per-event toggles for all five sound events plus haptic sensitivity slider
  • Security — biometric vs. custom-password vault mode; change vault password; disable vault with confirmation dialog
  • TTS — engine picker (all installed TTS engines), language picker (50+ languages from languages.json), pitch, rate, voice selection
  • Secure Sharing — strip GPS and full EXIF metadata before sharing
  • Language — AI response language picker with human-readable names (13+ languages from languages.json)
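
The Advanced Filters entry above combines several optional bounds. One way such a filter could be expressed is as a single predicate, sketched below. PhotoMeta, AdvancedFilter, inRange, and matches are hypothetical names with assumed fields; the app's real data model is not shown in these notes.

```kotlin
import java.time.LocalDate

// Hypothetical, simplified photo metadata.
data class PhotoMeta(val width: Int, val height: Int, val taken: LocalDate, val mimeType: String)

// Every bound is optional; null (or an empty MIME set) means "no restriction".
data class AdvancedFilter(
    val minWidth: Int? = null, val maxWidth: Int? = null,
    val minHeight: Int? = null, val maxHeight: Int? = null,
    val from: LocalDate? = null, val to: LocalDate? = null,
    val mimeTypes: Set<String> = emptySet(),
)

// Inclusive range check that ignores missing bounds.
fun <T : Comparable<T>> inRange(value: T, min: T?, max: T?): Boolean =
    (min == null || value >= min) && (max == null || value <= max)

fun AdvancedFilter.matches(p: PhotoMeta): Boolean =
    inRange(p.width, minWidth, maxWidth) &&
        inRange(p.height, minHeight, maxHeight) &&
        inRange(p.taken, from, to) &&
        (mimeTypes.isEmpty() || p.mimeType in mimeTypes)
```

A gallery list could then be narrowed with something like photos.filter { filter.matches(it) }.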

Onboarding Flow

  • 5-page pager onboarding on first launch, fully TalkBack-navigable
  • Pages: Welcome, Smart Gallery, AI Features, Accessibility, Privacy & Security
  • Model download prompt triggered automatically after onboarding if no model is present

ReasoningBlock Component

  • Expandable "chain-of-thought" block in chat bubbles and photo description — shows the model's internal reasoning before the final answer
  • Collapsed by default; animates open/close with a chevron

Markdown Renderer

  • Full MarkdownContent composable with no external dependency
  • Supports: headings (H1–H4), bold, italic, bold-italic, inline code, code blocks, blockquotes, unordered lists, ordered lists, horizontal rules, links (tappable)
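
To give a feel for how a dependency-free renderer might recognise some of the block types listed above, here is a small line classifier. It is only a sketch under assumptions: BlockKind, Block, and classify are hypothetical names, and inline styles, links, and fenced code blocks (which need state across lines) are deliberately left out.

```kotlin
enum class BlockKind { HEADING, BLOCKQUOTE, UNORDERED_ITEM, ORDERED_ITEM, RULE, PARAGRAPH }

data class Block(val kind: BlockKind, val text: String, val level: Int = 0)

val headingRe = Regex("^(#{1,4})\\s+(.+)$")        // H1 to H4 only
val orderedRe = Regex("^\\d+[.)]\\s+(.+)$")
val unorderedRe = Regex("^[-*+]\\s+(.+)$")
val ruleRe = Regex("^(-{3,}|\\*{3,}|_{3,})$")

// Classifies one line of Markdown; anything unrecognised is treated as paragraph text.
fun classify(rawLine: String): Block {
    val line = rawLine.trim()
    headingRe.matchEntire(line)?.let { m ->
        return Block(BlockKind.HEADING, m.groupValues[2], level = m.groupValues[1].length)
    }
    if (ruleRe.matches(line)) return Block(BlockKind.RULE, "")
    if (line.startsWith(">")) return Block(BlockKind.BLOCKQUOTE, line.removePrefix(">").trim())
    unorderedRe.matchEntire(line)?.let { return Block(BlockKind.UNORDERED_ITEM, it.groupValues[1]) }
    orderedRe.matchEntire(line)?.let { return Block(BlockKind.ORDERED_ITEM, it.groupValues[1]) }
    return Block(BlockKind.PARAGRAPH, line)
}
```

A composable could map each Block to a styled text element, with the heading level choosing the typography.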

About, Help & Support, Privacy, Terms screens

  • All rendered via the generic MarkdownScreen composable backed by res/raw/*.md files

Improvements

  • Smart Collections empty state — contextual message depending on whether any AI descriptions exist; CTA button to go to Settings when descriptions haven't been generated yet
  • Photo detail screen — pinch-to-zoom + two-finger rotate with spring-back animation; rotation persisted per photo; zoom/rotation reset on navigation
  • Bulk operations — add to favorites, generate descriptions, recognize text, share, add to collection, delete — all from the selection top bar
  • OCR bottom sheet — TTS reader embedded; copy + share buttons; streaming progress indicator
  • ListScreen — collection/album detail with the same sort/filter/bulk-select as HomeScreen
  • Session memory optimization — "Preparing AI Environment" overlay with polite live-region announcement so TalkBack users know when chat is ready
  • Scroll-to-bottom FAB in AskScreen — appears when the user scrolls up; animates in/out with fade + scale
  • Chat export — copy all messages to clipboard or share as plain text
  • Message regeneration — re-run the last AI turn from the same message index
  • SpeechRecognizer lifecycle fix — recognizer is properly destroyed when AskScreen leaves composition, preventing a native listener leak after back-navigation

Bug Fixes

  • GemmaAdapter.processResponse now builds JSON with JSONObject/JSONArray instead of string interpolation — fixes crashes on model output containing quotes, backslashes, or newlines
  • Vault password stored as SHA-256 hash + salt rather than plaintext — security fix
  • LiteRtLmManager single-session constraint: properly prevents a second createConversation while one is already open; description session re-opens when chat session ends
  • TtsManager no longer blocks the UI thread; work starts on Dispatchers.IO and TTS calls are posted to Main via withContext(Dispatchers.Main)
  • Photo URI resolution handles both file:// and content:// URIs via the ParcelFileDescriptor /proc/self/fd trick
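
The single-session constraint in the LiteRtLmManager fix above can be pictured with a small guard object. This sketch makes no LiteRT-LM calls; SessionGuard is a hypothetical stand-in, and the rule that the description session returns after chat ends is taken from the bullet and modelled in the simplest possible way.

```kotlin
enum class SessionMode { DESCRIPTION, CHAT }

// Hypothetical bookkeeping for "at most one open conversation".
class SessionGuard {
    var active: SessionMode? = null
        private set

    // Refuses to open a second conversation while one is already open.
    @Synchronized
    fun open(mode: SessionMode): Boolean {
        if (active != null) return false
        active = mode
        return true
    }

    // Closing a chat session re-opens the description session; closing a
    // description session leaves nothing open.
    @Synchronized
    fun close() {
        val ended = active
        active = if (ended == SessionMode.CHAT) SessionMode.DESCRIPTION else null
    }
}
```

Returning false from open gives the caller a chance to wait or to close the current session first, instead of crashing on a second conversation.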

[1.1.0] — 2026-04-14

Theme: Multi-model architecture with model-specific adapters for full control over tokenisation strategy.

New Features

Multi-Model Architecture

  • ModelAdapter interface — pluggable adapter per model that controls system instruction, content building, conversation config creation, and response post-processing
  • GemmaAdapter — dedicated adapter for Gemma 4 models; implements tool-call structured JSON response parsing with proper escaping
  • FastVlmAdapter — adapter skeleton for FastVLM-class models with different tokenisation strategy
  • Models defined in assets/models.json; the adapter field determines which adapter is instantiated at runtime

Model-Specific Behaviour

  • getSystemInstruction() — per-adapter system prompt customisation
  • buildAnalysisContent() — per-adapter content construction for photo analysis turns
  • buildAskContent() — per-adapter content construction for interactive chat turns
  • createConversationConfig() — per-adapter ConversationConfig including sampler config and tool registration
  • processResponse() — per-adapter post-processing of raw model output and tool call arguments
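
The adapter mechanism described above might look roughly like this in outline. The signatures are heavily simplified assumptions (the real interface works with LiteRT-LM types that these notes do not show), only two of the five methods are included, the prompt strings are placeholders, and both adapterFor and the "gemma" / "fastvlm" values of the adapter field are hypothetical.

```kotlin
// Simplified, assumed shape of the adapter contract.
interface ModelAdapter {
    fun getSystemInstruction(): String
    fun processResponse(raw: String): String
}

class GemmaAdapter : ModelAdapter {
    override fun getSystemInstruction() = "Placeholder system prompt for Gemma"
    override fun processResponse(raw: String) = raw.trim()
}

class FastVlmAdapter : ModelAdapter {
    override fun getSystemInstruction() = "Placeholder system prompt for FastVLM"
    override fun processResponse(raw: String) = raw.trim()
}

// Maps the adapter string from the model definition to an adapter instance.
// Unknown names fail loudly rather than silently falling back to a default.
fun adapterFor(name: String): ModelAdapter = when (name.lowercase()) {
    "gemma" -> GemmaAdapter()
    "fastvlm" -> FastVlmAdapter()
    else -> throw IllegalArgumentException("Unknown adapter: $name")
}
```

Keeping the lookup in one function means adding a model family touches a single when branch plus the new adapter class.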

JsonModel Data Class

  • New fields: adapter (string, selects adapter class), preferredBackend (overrides global CPU/GPU), toolCall (boolean), memoryMinRequired, memoryRecommended
  • ModelStatus wraps JsonModel with download state, progress bytes, speed, ETA, and error message
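
For orientation, the two classes above could be shaped something like the sketch below. The field names adapter, preferredBackend, toolCall, memoryMinRequired, and memoryRecommended come from these notes; everything else (the types, the name field, DownloadState, and the byte-count properties) is assumed for illustration.

```kotlin
data class JsonModel(
    val name: String,               // assumed field
    val adapter: String,            // selects the adapter class
    val preferredBackend: String?,  // null is assumed to mean "use the global CPU/GPU setting"
    val toolCall: Boolean,
    val memoryMinRequired: Long,    // assumed to be in bytes
    val memoryRecommended: Long,
)

enum class DownloadState { NOT_DOWNLOADED, DOWNLOADING, DOWNLOADED, FAILED }

data class ModelStatus(
    val model: JsonModel,
    val state: DownloadState,
    val downloadedBytes: Long = 0,
    val totalBytes: Long = 0,
    val bytesPerSecond: Long = 0,
    val error: String? = null,
) {
    // ETA is only meaningful while bytes are actually flowing.
    val etaSeconds: Long?
        get() = if (state == DownloadState.DOWNLOADING && bytesPerSecond > 0)
            (totalBytes - downloadedBytes) / bytesPerSecond
        else null
}
```

Deriving the ETA from the other fields, rather than storing it, keeps the status object consistent whenever progress or speed changes.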

Models Manager

  • Redesigned ModelConfigScreen — card-per-model layout showing size, RAM requirements, download status, speed, ETA
  • Download/cancel/delete controls per model; active model shown with a checkmark badge
  • Model switch triggers a full LiteRtLmManager.shutdown() + re-init to ensure the correct adapter is loaded

Improvements

  • LiteRtLmManager session mode enum (DESCRIPTION / CHAT) replaces boolean flag — clearer invariants and easier to extend
  • Speculative decoding enabled via ExperimentalFlags.enableSpeculativeDecoding = true for faster token generation
  • ModelDownloadService — foreground service with resume support via HTTP Range header; progress reported via SharedFlow
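
Resume support via the HTTP Range header, mentioned in the ModelDownloadService entry above, generally comes down to two decisions: what to request, and how to treat the reply. The sketch below shows both as pure functions; resumeRangeHeader and shouldAppend are hypothetical names, and the actual service logic is not shown in these notes.

```kotlin
import java.io.File

// Returns the Range header value that asks for the bytes still missing,
// or null when there is no partial file and the download starts from zero.
fun resumeRangeHeader(partial: File): String? {
    val have = if (partial.exists()) partial.length() else 0L
    return if (have > 0) "bytes=$have-" else null
}

// 206 Partial Content means the server honoured the range, so new bytes are appended.
// A plain 200 means the whole file is being sent again, so the partial data is discarded.
fun shouldAppend(statusCode: Int): Boolean = when (statusCode) {
    206 -> true
    200 -> false
    else -> throw IllegalStateException("Unexpected HTTP status $statusCode")
}
```

Checking the status code matters because not every server supports range requests; appending a full response to a partial file would corrupt the model.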

Bug Fixes

  • ModelAdapter.processResponse with tool calls no longer crashes on model output that embeds special JSON characters — replaced string interpolation with JSONObject.put() throughout
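
The failure mode behind this fix is easy to reproduce. In the sketch below, unsafeJson shows the string-interpolation approach, and safeJson shows what escaping achieves. Note that jsonEscape is a hand-written stand-in so the example runs without dependencies; the app itself uses JSONObject.put(), which performs this kind of escaping internally. All three function names are hypothetical.

```kotlin
// Stand-in for the escaping a JSON builder applies to string values.
fun jsonEscape(s: String): String = buildString {
    for (c in s) when (c) {
        '"' -> append("\\\"")
        '\\' -> append("\\\\")
        '\n' -> append("\\n")
        '\r' -> append("\\r")
        '\t' -> append("\\t")
        // Remaining control characters are written as \u00XX escapes.
        else -> if (c < ' ') append("\\u%04x".format(c.code)) else append(c)
    }
}

// Interpolation: quotes, backslashes, or newlines in the value break the document.
fun unsafeJson(description: String) = "{\"description\": \"$description\"}"

// Escaped: the value can contain any model output and the JSON stays valid.
fun safeJson(description: String) = "{\"description\": \"${jsonEscape(description)}\"}"
```

With a model reply containing a quoted word and a line break, unsafeJson yields text that a JSON parser rejects, while safeJson round-trips cleanly.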

[1.0.0] — 2026-04-14

Theme: Initial release — on-device AI photo gallery built for accessibility.

New Features

Core AI Integration

  • On-device AI inference via Google AI Edge LiteRT-LM with Gemma 4 multimodal model
  • Natural language photo descriptions — full sentences generated by Gemma 4 locally, no internet
  • Interactive Ask Mode — streaming chat interface to ask any question about a photo
  • Smart categorisation — photos automatically grouped into Nature, People, Food, Documents, Travel, Architecture, Pets, Sports via tool calls
  • OCR / text recognition — extract text from any image using the same on-device model
  • Thinking Mode — chain-of-thought reasoning visible before the final answer
  • Multilingual output — 13 languages selectable for AI response language

Gallery & Navigation

  • Grid and list view — togglable; grid supports 2–5 columns
  • Date-grouped photo timeline with sticky headers
  • Local Albums via Android MediaStore
  • Smart Collections — dynamic albums built from AI-generated categories
  • Favorites — star any photo; dedicated Favorites tab
  • Bottom navigation: Photos / Collections / Favorites
  • Full-screen photo detail with share, favorite, rotate, more-menu

Privacy & Security

  • Zero cloud processing — all AI runs on-device GPU/CPU/NPU
  • No analytics, no telemetry, no account required
  • Secure Vault foundation — architecture for secure collections in place

Accessibility Foundation

  • Full TalkBack semantic labelling on every UI element
  • WCAG 2.1 Level AA — high contrast, generous touch targets, predictable navigation
  • Live regions for progress announcements
  • Voice input for Ask Mode
  • Built-in TTS with segment-based reading controls

Model Management

  • Foreground download service for model files (~2.4 GB)
  • Progress tracking with speed and ETA
  • GPU / CPU backend selection

Settings

  • AI backend (GPU / CPU), temperature, response language
  • Basic gallery preferences (view mode, grid columns, sort order)

Other Screens

  • About, Help & Support — Markdown-rendered
  • Privacy Policy and Terms of Use — in-app Markdown

Technical Foundation

  • MVVM + Repository pattern; single source of truth in PhotoRepository
  • Jetpack Compose + Material 3 throughout
  • Hilt dependency injection
  • Room v9 database for photo metadata and description state
  • DataStore for persistent preferences
  • Coil for image loading with HEIC→JPEG auto-conversion

For the full feature list, see features.md.