Backend changes:
- Create ResponseMetadata and TokenMetrics models for API responses
- Modify call_gemini() and call_gemini_with_function_tools() to return (response, log_id) tuple
- Add _build_response_metadata() helper to extract metadata from AICallLog
- Update routines API (/suggest, /suggest-batch) to populate validation_warnings, auto_fixes_applied, and metadata
- Update products API (/suggest) to populate observability fields
- Update skincare API to handle new return signature
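A minimal sketch of the shape these backend changes describe, using stdlib dataclasses as a stand-in for the real models. The field names on TokenMetrics and ResponseMetadata, the dict-shaped log row, and call_model_stub() are assumptions for illustration, not the actual definitions:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenMetrics:
    # Hypothetical fields; the real model may expose more or different ones.
    prompt_tokens: Optional[int] = None
    completion_tokens: Optional[int] = None
    total_tokens: Optional[int] = None


@dataclass
class ResponseMetadata:
    log_id: str
    model: Optional[str] = None
    token_metrics: Optional[TokenMetrics] = None


def build_response_metadata(log: dict) -> ResponseMetadata:
    """Map a stored AI call log row (a plain dict here) to response metadata."""
    return ResponseMetadata(
        log_id=str(log["id"]),
        model=log.get("model"),
        token_metrics=TokenMetrics(
            prompt_tokens=log.get("prompt_tokens"),
            completion_tokens=log.get("completion_tokens"),
            total_tokens=log.get("total_tokens"),
        ),
    )


def call_model_stub(prompt: str) -> tuple[str, str]:
    """Stand-in for call_gemini(): returns a (response, log_id) tuple."""
    return f"echo: {prompt}", "log-123"


# Callers unpack the tuple and use log_id to look up the log row.
response, log_id = call_model_stub("hi")
```

Returning the log_id alongside the response lets each endpoint build its metadata from the persisted log rather than from transient SDK objects.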
Frontend changes:
- Add TypeScript types: TokenMetrics, ResponseMetadata
- Update RoutineSuggestion, BatchSuggestion, ShoppingSuggestionResponse with observability fields
Next: Create UI components to display warnings, reasoning chains, and token metrics
Add comprehensive token breakdown logging to understand MAX_TOKENS behavior
and verify documentation claims about thinking tokens.
New Fields Added to ai_call_logs:
- thoughts_tokens: Thinking tokens (thoughtsTokenCount) - documented as
separate from output budget
- tool_use_prompt_tokens: Tool use overhead (toolUsePromptTokenCount)
- cached_content_tokens: Cached content tokens (cachedContentTokenCount)
Purpose:
Investigate token counting mystery from production logs where:
prompt_tokens: 4400
completion_tokens: 589
total_tokens: 8489 ← expected 4400 + 589 = 4989; 3500 tokens unaccounted for!
According to Gemini API docs (Polish translation):
totalTokenCount = promptTokenCount + candidatesTokenCount
(thoughts NOT included in total)
But production logs show a 3500-token gap. New logging will reveal:
1. Are thinking tokens actually separate from max_output_tokens limit?
2. Where do the 3500 unaccounted-for tokens come from?
3. Does MEDIUM thinking level consume output budget despite docs?
4. Are tool use tokens included in total but not shown separately?
Changes:
- Added 3 new integer columns to ai_call_logs (nullable)
- Enhanced llm.py to capture all usage_metadata fields
- Used getattr() for safe access (fields may not exist in all responses)
- Database migration: 7e6f73d1cc95
This will provide complete data for future LLM calls to diagnose:
- MAX_TOKENS failures
- Token budget behavior
- Thinking token costs
- Tool use overhead
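A rough sketch of the getattr()-based capture and the gap calculation described above. The snake_case attribute names are assumed to mirror the camelCase API fields named in this message, and extract_token_metrics() and token_gap() are hypothetical helper names, not the code in llm.py:

```python
from types import SimpleNamespace
from typing import Any, Optional

# Column name in ai_call_logs -> assumed attribute name on usage_metadata.
USAGE_FIELDS = {
    "prompt_tokens": "prompt_token_count",
    "completion_tokens": "candidates_token_count",
    "total_tokens": "total_token_count",
    "thoughts_tokens": "thoughts_token_count",
    "tool_use_prompt_tokens": "tool_use_prompt_token_count",
    "cached_content_tokens": "cached_content_token_count",
}


def extract_token_metrics(usage_metadata: Any) -> dict[str, Optional[int]]:
    """Read every known field, storing None when a response omits it."""
    return {
        column: getattr(usage_metadata, attr, None)
        for column, attr in USAGE_FIELDS.items()
    }


def token_gap(metrics: dict[str, Optional[int]]) -> Optional[int]:
    """Tokens in the total that prompt + completion do not explain."""
    total = metrics["total_tokens"]
    prompt = metrics["prompt_tokens"]
    completion = metrics["completion_tokens"]
    if total is None or prompt is None or completion is None:
        return None
    return total - prompt - completion


# The production numbers quoted above; thoughts_tokens was not logged yet.
usage = SimpleNamespace(
    prompt_token_count=4400,
    candidates_token_count=589,
    total_token_count=8489,
)
metrics = extract_token_metrics(usage)
gap = token_gap(metrics)  # 8489 - 4400 - 589 = 3500
```

Once thoughts_tokens and tool_use_prompt_tokens are populated, comparing them against this gap should show which of them accounts for it.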
Resolves validation failures where LLM fabricated full UUIDs from 8-char
prefixes shown in context, causing 'unknown product_id' errors.
Root Cause Analysis:
- Context showed 8-char short IDs: '77cbf37c' (Phase 2 optimization)
- Function tool returned full UUIDs: '77cbf37c-3830-4927-...'
- LLM saw BOTH formats, got confused, invented UUIDs for final response
- Validators rejected fabricated UUIDs as unknown products
Solution: Consistent 8-char short_id across LLM boundary:
1. Database: New short_id column (8 chars, unique, indexed)
2. Context: Shows short_id (was: str(id)[:8])
3. Function tools: Return short_id (was: full UUID)
4. Translation layer: Expands short_id → UUID before validation
5. Database: Stores full UUIDs (no schema change for existing data)
Changes:
- Added products.short_id column with unique constraint + index
- Migration populates from UUID prefix, handles collisions via regeneration
- Product model auto-generates short_id for new products
- LLM contexts use product.short_id consistently
- Function tools return product.short_id
- Added _expand_product_id() translation layer in routines.py
- Integrated expansion in suggest_routine() and suggest_batch()
- Validators work with full UUIDs (no changes needed)
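One way the "populate from UUID prefix, regenerate on collision" step could look, shown in memory rather than as a real migration. assign_short_ids() is a hypothetical name, and regenerating from a random UUID is an assumption about how collisions are resolved:

```python
import uuid


def assign_short_ids(product_ids: list[uuid.UUID]) -> dict[uuid.UUID, str]:
    """Give every product a unique 8-char short_id, preferring its UUID prefix."""
    assigned: dict[uuid.UUID, str] = {}
    taken: set[str] = set()
    for product_id in product_ids:
        candidate = str(product_id)[:8]
        # Collision: fall back to random 8-char hex strings until one is free.
        while candidate in taken:
            candidate = uuid.uuid4().hex[:8]
        taken.add(candidate)
        assigned[product_id] = candidate
    return assigned
```

In the database the unique constraint remains the final guard; this loop only keeps the migration from violating it.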
Benefits:
✅ LLM never sees full UUIDs, no format confusion
✅ Maintains Phase 2 token optimization (~85% reduction)
✅ Indexed short_id lookups (O(log n) with a typical B-tree index) vs O(n) pattern matching
✅ Unique constraint prevents collisions at DB level
✅ Clean separation: 8-char for LLM, 36-char for application
From production error:
Step 1: unknown product_id 77cbf37c-3830-4927-9669-07447206689d
(LLM invented the last 28 characters)
Now resolved: LLM uses '77cbf37c' consistently, translation layer
expands to real UUID before validation.
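A minimal sketch of the translation step, assuming a prebuilt short_id → UUID mapping. expand_product_id() here is illustrative rather than the actual _expand_product_id() in routines.py, the UUIDs are made up, and returning None for unknown IDs is an assumption about how failures reach the validators:

```python
from typing import Optional


def expand_product_id(raw_id: str, short_to_full: dict[str, str]) -> Optional[str]:
    """Translate an ID from the LLM into a full UUID, or None if unknown."""
    if raw_id in short_to_full:
        return short_to_full[raw_id]
    # Tolerate a full UUID only when it is one we actually know about.
    if raw_id in short_to_full.values():
        return raw_id
    return None


# Made-up UUID for illustration only.
mapping = {"77cbf37c": "77cbf37c-0000-4000-8000-000000000001"}
expanded = expand_product_id("77cbf37c", mapping)
fabricated = expand_product_id("77cbf37c-1111-4111-8111-111111111111", mapping)
```

Here `expanded` is the real UUID and `fabricated` is None, so an invented suffix on a valid prefix is still rejected.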
Two critical bugs identified from production logs:
1. UUID Mismatch Bug (0 products returned from function tools):
- Context shows 8-char short IDs: '63278801'
- Function handler expected full UUIDs: '63278801-xxxx-...'
- LLM requested short IDs, handler couldn't match → 0 products
Fix: Index products by BOTH full UUID and short ID (first 8 chars)
in build_product_details_tool_handler. Accept either format.
Added deduplication to handle duplicate requests.
Maintains Phase 2 token optimization (no context changes).
2. MAX_TOKENS Error (response truncation):
- max_output_tokens=4096 includes thinking tokens (~3500)
- Only ~500 tokens left for JSON response
- MEDIUM thinking level (Phase 2) consumed budget
Fix: Increase max_output_tokens from 4096 → 8192 across all
creative endpoints (routines/suggest, routines/suggest-batch,
products/suggest). Updated default in get_creative_config().
Gives headroom: ~3500 thinking + ~4500 response = ~8000 total
From production logs (ai_call_logs):
- Log 71699654: Success but response_text null (function call only)
- Log 2db37c0f: MAX_TOKENS failure, tool returned 0 products
Both issues now resolved.
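The dual-index fix for bug 1 can be sketched as follows. Products are plain dicts with an "id" key here, and build_product_index() and lookup_products() are hypothetical names, not the code inside build_product_details_tool_handler:

```python
def build_product_index(products: list[dict]) -> dict[str, dict]:
    """Index each product under both its full UUID and its first 8 chars."""
    index: dict[str, dict] = {}
    for product in products:
        full_id = str(product["id"])
        index[full_id] = product
        index[full_id[:8]] = product
    return index


def lookup_products(requested_ids: list[str], index: dict[str, dict]) -> list[dict]:
    """Resolve requested IDs in either format, skipping unknowns and duplicates."""
    found: list[dict] = []
    seen: set[str] = set()
    for requested in requested_ids:
        product = index.get(requested)
        if product is None:
            continue
        full_id = str(product["id"])
        if full_id in seen:
            continue  # same product requested twice, possibly in both formats
        seen.add(full_id)
        found.append(product)
    return found
```

Deduplicating on the full UUID matters because the same product can be requested once by short ID and once by full ID in a single tool call.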
When products are loaded from PostgreSQL, JSON columns (effect_profile,
context_rules) are deserialized as plain dicts, not Pydantic models.
The build_product_context_summary function was accessing these fields
as object attributes (.safe_with_compromised_barrier) which caused:
AttributeError: 'dict' object has no attribute 'safe_with_compromised_barrier'
Fix: Add isinstance(value, dict) checks like build_product_context_detailed already does.
Handle both dict (from DB) and object (from Pydantic) cases.
Traceback from production:
File "llm_context.py", line 91, in build_product_context_summary
if product.context_rules.safe_with_compromised_barrier:
AttributeError: 'dict' object has no attribute...
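A small sketch of the dict-or-object handling. read_field() is a hypothetical helper; the actual fix inlines isinstance checks in build_product_context_summary:

```python
from types import SimpleNamespace
from typing import Any


def read_field(container: Any, name: str, default: Any = None) -> Any:
    """Read a field from a dict (as loaded from a JSON column) or from an object."""
    if container is None:
        return default
    if isinstance(container, dict):
        return container.get(name, default)
    return getattr(container, name, default)


# Both shapes give the same answer.
from_db = {"safe_with_compromised_barrier": True}
from_model = SimpleNamespace(safe_with_compromised_barrier=True)
same = (
    read_field(from_db, "safe_with_compromised_barrier")
    == read_field(from_model, "safe_with_compromised_barrier")
)
```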
- Add tiered context system (summary/detailed/full) to reduce token usage by 70-80%
- Replace old _build_products_context with build_products_context_summary_list (Tier 1: ~15 tokens/product vs 150)
- Optimize function tool responses: exclude INCI list by default (saves ~15KB/product)
- Reduce actives from 24 to top 5 in function tools
- Add reasoning_chain field to AICallLog model for observability
- Implement _extract_thinking_content to capture LLM reasoning (MEDIUM thinking level)
- Strengthen prompt enforcement for prohibited fields (dose, amount, quantity)
- Update get_creative_config to use MEDIUM thinking level instead of LOW
Token Savings:
- Routine suggestions: 9,613 → ~1,300 tokens (-86%)
- Batch planning: 12,580 → ~1,800 tokens (-86%)
- Function tool responses: ~15KB → ~2KB per product (-87%)
Failures discovered in log analysis (ai_call_log.json):
- Lines 10, 27, 61, 78: LLM returned prohibited dose field
- Line 85: MAX_TOKENS failure (output truncated)
Phase 2 complete. Next: two-phase batch planning with safety verification.
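A sketch of the function tool payload trimming listed above (INCI excluded by default, actives capped at 5). The payload keys and build_tool_payload() are assumptions, and "top 5" is taken to mean the first five entries of an already ordered list:

```python
def build_tool_payload(
    product: dict,
    include_inci: bool = False,
    max_actives: int = 5,
) -> dict:
    """Build a compact function-tool response for one product."""
    payload = {
        "name": product["name"],
        # Assumes actives are already sorted by relevance.
        "actives": list(product.get("actives", []))[:max_actives],
    }
    if include_inci:
        # The INCI list is the bulky part, so it is opt-in.
        payload["inci"] = list(product.get("inci", []))
    return payload
```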
Two bugs in /routines/suggest where the LLM could override hard constraints:
1. Products with min_interval_hours (e.g. retinol at 72h) were passed to
the LLM even if used too recently. The LLM reasoned away the constraint
in at least one observed case. Fix: added _filter_products_by_interval()
which removes ineligible products before the prompt is built, so they
don't appear in AVAILABLE PRODUCTS at all.
2. Minoxidil was included in the available products list regardless of the
include_minoxidil_beard flag. Only the objectives context was gated,
leaving the product visible to the LLM which would include it based on
recent usage history. Fix: added include_minoxidil param to
_get_available_products() and threaded it through suggest_routine and
suggest_batch.
Also refactored _build_products_context() to accept a pre-supplied
products list instead of calling _get_available_products() internally,
ensuring the tool handler and context text always use the same filtered set.
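The interval filter from fix 1 could look roughly like this. The dict-based product shape and the last_used mapping are assumptions, and this is not the actual _filter_products_by_interval() signature:

```python
from datetime import datetime, timedelta


def filter_products_by_interval(
    products: list[dict],
    last_used: dict[str, datetime],
    now: datetime,
) -> list[dict]:
    """Drop products whose min_interval_hours has not elapsed since last use."""
    eligible = []
    for product in products:
        interval = product.get("min_interval_hours")
        used_at = last_used.get(product["id"])
        if interval and used_at is not None:
            if now - used_at < timedelta(hours=interval):
                continue  # used too recently: never shown to the LLM
        eligible.append(product)
    return eligible
```

Filtering before the prompt is built means the constraint is enforced in code, so the LLM has nothing to reason away.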
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Keep /products/suggest lean by exposing product UUIDs and fetching INCI, safety rules, actives, and usage notes on demand through Gemini function tools. Add conservative fallback behavior for tool roundtrip limits and expand helper tests to cover tool wiring and payload handlers.
Keep the /routines/suggest base context lean by sending only active names and fetching detailed safety, actives, usage notes, and INCI on demand. Add a conservative fallback when tool roundtrip limits are hit to preserve safe outputs instead of failing the request.
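One possible reading of the roundtrip limit with a conservative fallback, sketched with a stubbed model callable. The reply dict shape, MAX_TOOL_ROUNDTRIPS, and the choice to make one final call with tools disabled are all assumptions; the actual fallback may work differently:

```python
from typing import Callable

MAX_TOOL_ROUNDTRIPS = 2  # hypothetical limit


def run_with_tools(
    call_model: Callable[[list, bool], dict],
    handle_tool: Callable[[str, dict], dict],
    max_roundtrips: int = MAX_TOOL_ROUNDTRIPS,
) -> dict:
    """Loop over tool calls; on hitting the limit, force a tool-free answer."""
    tool_results: list[dict] = []
    for _ in range(max_roundtrips):
        reply = call_model(tool_results, True)
        if reply["type"] == "final":
            return reply
        # The model asked for data: run the handler and feed the result back.
        tool_results.append(handle_tool(reply["name"], reply["args"]))
    # Limit reached: one last call with tools disabled instead of failing.
    return call_model(tool_results, False)
```

The request still succeeds when the model keeps asking for tools; it just has to answer from the data gathered so far.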
Enable on-demand INCI retrieval in /routines/suggest through Gemini function calling so detailed ingredient data is fetched only when needed. Persist and normalize tool_trace data in AI logs to make function-call behavior directly inspectable via /ai-logs endpoints.
Expose leave-on behavior, contraindications, safety alerts, and compact usage notes in AVAILABLE PRODUCTS so Gemini can make safer routine decisions with real-world product constraints.
Introduces `get_extraction_config` and `get_creative_config` to standardize Gemini API calls.
* Defines explicit config profiles with appropriate `temperature` and `thinking_level` for Gemini 3 Flash.
* Extraction tasks use minimal thinking and temp=0.0 to reduce latency and token usage.
* Creative tasks use low thinking, temp=0.4, and top_p=0.8 to balance naturalness and safety.
* Applies these helpers across products, routines, and skincare endpoints.
* Also updates default model to `gemini-3-flash-preview`.
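The two profiles, sketched as plain dicts rather than the SDK's config object. The key names and the string values for thinking_level are assumptions; only the numeric settings come from this message:

```python
def get_extraction_config() -> dict:
    """Deterministic profile for parsing/extraction tasks."""
    return {
        "temperature": 0.0,
        "thinking_level": "minimal",  # keeps latency and token usage low
    }


def get_creative_config() -> dict:
    """Profile for suggestion endpoints: some variety, still constrained."""
    return {
        "temperature": 0.4,
        "top_p": 0.8,
        "thinking_level": "low",
    }
```

Centralizing the profiles means a later change (for example, raising the thinking level) happens in one place instead of per endpoint.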
- Add POST /api/products/suggest endpoint that analyzes skin condition
and inventory to suggest product types (e.g., 'Salicylic Acid 2% Masque')
- Add MCP tool get_shopping_suggestions() for MCP clients
- Add 'Suggest' button to Products page in frontend
- Add /products/suggest page with suggestion cards
- Include product type, key ingredients, target concerns, why_needed,
recommended_time, and frequency in suggestions
- Fix stock logic: sealed products now count as available inventory
- Add legend to clarify ✓ (in stock) vs ✗ (not in stock) markers
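- Remove _build_inventory_context; fold pao_months into DOSTĘPNE PRODUKTY ("AVAILABLE PRODUCTS") entries
- Remove duplicate "Otwarte równolegle" ("Open in parallel") section from prompt
- Rename OSTATNIE RUTYNY (7 dni) → OSTATNIE RUTYNY (in English: "RECENT ROUTINES (7 days)" → "RECENT ROUTINES")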
- Add _build_day_context and SuggestRoutineRequest.leaving_home (optional bool)
- System prompt: replace unconditional PAO rule with a conditional one; add SPF factor
selection logic based on KONTEKST DNIA ("DAY CONTEXT") leaving_home value
- Frontend: leaving_home checkbox (AM only) + i18n keys pl/en
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Gemini API rejects int-valued enums (StrengthLevel) in response_schema,
raising a validation error before any request is sent. Fix by introducing
AIActiveIngredient (inherits ActiveIngredient, overrides strength_level and
irritation_potential as Optional[int]) and ProductParseLLMResponse used only
as the Gemini schema. The two-step validation converts ints back to StrengthLevel
via Pydantic coercion. Adds a test covering the numeric strength level path.
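A stdlib-only sketch of the int → StrengthLevel round trip. The real code relies on Pydantic coercion; dataclasses stand in for the models here, and the StrengthLevel members and to_domain() helper are assumptions:

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class StrengthLevel(IntEnum):
    # Hypothetical members; the real enum may differ.
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class ActiveIngredient:
    name: str
    strength_level: Optional[StrengthLevel] = None


@dataclass
class AIActiveIngredient(ActiveIngredient):
    # Schema-facing variant: a plain int the LLM schema can express.
    strength_level: Optional[int] = None


def to_domain(parsed: AIActiveIngredient) -> ActiveIngredient:
    """Second validation step: turn the int back into the enum."""
    level = None
    if parsed.strength_level is not None:
        level = StrengthLevel(parsed.strength_level)  # ValueError if out of range
    return ActiveIngredient(name=parsed.name, strength_level=level)
```

Keeping the int-typed model separate confines the schema workaround to the LLM boundary, so the rest of the application still sees StrengthLevel.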
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>