Add comprehensive token breakdown logging to understand MAX_TOKENS behavior
and verify documentation claims about thinking tokens.
New Fields Added to ai_call_logs:
- thoughts_tokens: Thinking tokens (thoughtsTokenCount) - documented as
separate from output budget
- tool_use_prompt_tokens: Tool use overhead (toolUsePromptTokenCount)
- cached_content_tokens: Cached content tokens (cachedContentTokenCount)
Purpose:
Investigate a token-counting discrepancy seen in production logs, where:
prompt_tokens: 4400
completion_tokens: 589
total_tokens: 8489 ← Should be 4400 + 589 = 4989, missing 3500!
According to Gemini API docs (Polish translation):
totalTokenCount = promptTokenCount + candidatesTokenCount
(thoughts NOT included in total)
But production logs show a 3,500-token gap. The new logging will reveal:
1. Are thinking tokens actually separate from max_output_tokens limit?
2. Where did the 3500 missing tokens go?
3. Does MEDIUM thinking level consume output budget despite docs?
4. Are tool use tokens included in total but not shown separately?
Changes:
- Added 3 new integer columns to ai_call_logs (nullable)
- Enhanced llm.py to capture all usage_metadata fields
- Used getattr() for safe access (fields may not exist in all responses)
- Database migration: 7e6f73d1cc95
This will provide complete data for future LLM calls to diagnose:
- MAX_TOKENS failures
- Token budget behavior
- Thinking token costs
- Tool use overhead
Resolves validation failures where the LLM fabricated full UUIDs from the
8-char prefixes shown in context, causing 'unknown product_id' errors.
Root Cause Analysis:
- Context showed 8-char short IDs: '77cbf37c' (Phase 2 optimization)
- Function tool returned full UUIDs: '77cbf37c-3830-4927-...'
- LLM saw BOTH formats, got confused, invented UUIDs for final response
- Validators rejected fabricated UUIDs as unknown products
Solution: Consistent 8-char short_id across LLM boundary:
1. Database: New short_id column (8 chars, unique, indexed)
2. Context: Shows short_id (was: str(id)[:8])
3. Function tools: Return short_id (was: full UUID)
4. Translation layer: Expands short_id → UUID before validation
5. Database: Stores full UUIDs (no schema change for existing data)
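A minimal sketch of the translation layer (the real _expand_product_id lives in routines.py and queries the indexed short_id column; the lookup dict here is a stand-in for that query):

```python
def expand_product_id(candidate: str, short_to_full: dict[str, str]) -> str:
    """Map an 8-char short_id back to the full UUID before validation.

    Full 36-char UUIDs pass through unchanged, so validators keep working
    on real UUIDs. An unknown short_id is returned as-is and fails
    validation downstream with a clear 'unknown product_id' error.
    """
    if len(candidate) == 36:  # already a full UUID, nothing to expand
        return candidate
    return short_to_full.get(candidate, candidate)
```

Because expansion happens before validation, the validators never see the 8-char form and need no changes.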
Changes:
- Added products.short_id column with unique constraint + index
- Migration populates from UUID prefix, handles collisions via regeneration
- Product model auto-generates short_id for new products
- LLM contexts use product.short_id consistently
- Function tools return product.short_id
- Added _expand_product_id() translation layer in routines.py
- Integrated expansion in suggest_routine() and suggest_batch()
- Validators work with full UUIDs (no changes needed)
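The migration's collision handling can be sketched like this (the helper name is hypothetical; the real code writes into products.short_id under the unique constraint):

```python
import uuid

def assign_short_ids(full_ids: list[str]) -> dict[str, str]:
    """Derive an 8-char short_id from each UUID's prefix; on a prefix
    collision, regenerate from a fresh random UUID until unique,
    mirroring the migration's collision handling."""
    taken: set[str] = set()
    mapping: dict[str, str] = {}
    for fid in full_ids:
        short = fid[:8]
        while short in taken:  # prefix collision: regenerate until unique
            short = uuid.uuid4().hex[:8]
        taken.add(short)
        mapping[fid] = short
    return mapping
```

The DB-level unique constraint is the real safety net; the regeneration loop just avoids tripping it during the backfill.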
Benefits:
✅ LLM never sees full UUIDs, no format confusion
✅ Maintains Phase 2 token optimization (~85% reduction)
✅ O(1) indexed short_id lookups vs O(n) pattern matching
✅ Unique constraint prevents collisions at DB level
✅ Clean separation: 8-char for LLM, 36-char for application
From production error:
Step 1: unknown product_id 77cbf37c-3830-4927-9669-07447206689d
(LLM invented the last 28 characters)
Now resolved: LLM uses '77cbf37c' consistently, translation layer
expands to real UUID before validation.
- Add tiered context system (summary/detailed/full) to reduce token usage by 70-80%
- Replace old _build_products_context with build_products_context_summary_list (Tier 1: ~15 tokens/product vs 150)
- Optimize function tool responses: exclude INCI list by default (saves ~15KB/product)
- Reduce actives from 24 to top 5 in function tools
- Add reasoning_chain field to AICallLog model for observability
- Implement _extract_thinking_content to capture LLM reasoning (MEDIUM thinking level)
- Strengthen prompt enforcement for prohibited fields (dose, amount, quantity)
- Update get_creative_config to use MEDIUM thinking level instead of LOW
Token Savings:
- Routine suggestions: 9,613 → ~1,300 tokens (-86%)
- Batch planning: 12,580 → ~1,800 tokens (-86%)
- Function tool responses: ~15KB → ~2KB per product (-87%)
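A sketch of the Tier 1 summary builder; the field names are assumptions about the product model, and the real build_products_context_summary_list may format differently:

```python
def build_products_context_summary_list(products: list[dict]) -> str:
    """Tier 1 context: one compact line per product (~15 tokens) instead
    of the full ~150-token record. INCI is omitted entirely and fetched
    on demand via function tools; actives are capped at the top 5."""
    lines = []
    for p in products:
        actives = ", ".join(p.get("actives", [])[:5])  # top 5 actives only
        lines.append(f"{p['short_id']} | {p['name']} | {actives}")
    return "\n".join(lines)
```

Tiers 2 and 3 (detailed/full) would add fields back on the same per-product line structure when the LLM requests them.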
Failures discovered in log analysis (ai_call_log.json):
- Lines 10, 27, 61, 78: LLM returned prohibited dose field
- Line 85: MAX_TOKENS failure (output truncated)
Phase 2 complete. Next: two-phase batch planning with safety verification.
Enable on-demand INCI retrieval in /routines/suggest through Gemini function calling so detailed ingredient data is fetched only when needed. Persist and normalize tool_trace data in AI logs to make function-call behavior directly inspectable via /ai-logs endpoints.
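Normalizing tool_trace entries before persisting them might look like the sketch below. The record shape is an assumption; the actual fields come from the Gemini function-call parts:

```python
import json

def normalize_tool_trace(raw_calls: list[dict]) -> str:
    """Flatten heterogeneous function-call records into one stable JSON
    shape so the /ai-logs endpoints can render them uniformly."""
    normalized = [
        {
            "name": call.get("name", "unknown"),
            "args": call.get("args") or {},
            # store response size, not the payload, to keep log rows small
            "response_bytes": len(json.dumps(call.get("response") or {})),
        }
        for call in raw_calls
    ]
    return json.dumps(normalized, ensure_ascii=False)
```

Storing a normalized trace rather than raw SDK objects keeps ai_call_logs stable across SDK versions.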
When Gemini stops generation early (e.g. due to safety filters or
thinking-model quirks), finish_reason != STOP but no exception is raised,
so the caller receives truncated JSON and a confusing 502 "invalid
JSON" error. Now:
- finish_reason is extracted from candidates[0] and stored in ai_call_logs
- any non-STOP finish_reason raises HTTP 502 with a clear message
- Alembic migration adds the finish_reason column to ai_call_logs
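The guard can be sketched as follows. A plain exception stands in for the HTTP 502 raised in the real endpoint; finish_reason is the string extracted from candidates[0] as described above:

```python
class UpstreamGenerationError(RuntimeError):
    """Stand-in for the HTTP 502 the real endpoint raises."""

def check_finish_reason(finish_reason: str) -> str:
    """Raise instead of silently returning truncated output when the
    model stopped for any reason other than STOP (safety filters,
    MAX_TOKENS, thinking-model quirks, ...)."""
    if finish_reason != "STOP":
        raise UpstreamGenerationError(
            f"Gemini stopped early: finish_reason={finish_reason}"
        )
    return finish_reason
```

Logging finish_reason even on the success path is what makes the stored ai_call_logs rows useful for the MAX_TOKENS investigation above.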
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add include_minoxidil_beard flag to SuggestRoutineRequest and SuggestBatchRequest
- Detect minoxidil products by scanning name, brand, INCI and actives; pass them
to the LLM even though they are medications
- Inject a CELE UŻYTKOWNIKA ("user objectives") context block into prompts when the flag is enabled
- Add _build_objectives_context() returning empty string when flag is off
- Add call_gemini() helper that centralises Gemini API calls and logs every
request/response to a new ai_call_logs table (AICallLog model + /ai-logs router)
- Nginx: raise client_max_body_size to 16 MB for photo uploads
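The detection heuristic might be sketched like this (field names are assumptions about the product model; the real scan runs over Product rows):

```python
def is_minoxidil_product(product: dict) -> bool:
    """Scan name, brand, INCI and actives for 'minoxidil'; matching
    products are passed to the LLM even though they are medications."""
    haystacks = [
        product.get("name", ""),
        product.get("brand", ""),
        *product.get("inci", []),
        *product.get("actives", []),
    ]
    return any("minoxidil" in h.lower() for h in haystacks)
```

A simple substring match suffices here because "minoxidil" is the INCI name as well as the common name.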
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add alembic 1.14 to dependencies (uv sync → 1.18.4 installed)
- Configure alembic/env.py: loads DATABASE_URL from env, imports all
SQLModel models so metadata is fully populated for autogenerate
- Generate initial migration (c2d626a2b36c) covering all 9 tables:
products, product_inventory, medication_entries, medication_usages,
lab_results, routines, routine_steps, grooming_schedule,
skin_condition_snapshots — with all indexes and constraints
- Add ExecStartPre to innercontext.service: runs alembic upgrade head
before uvicorn starts (idempotent, safe on every restart)
- Update DEPLOYMENT.md: add migration step to backend setup and update
flow; document alembic stamp head for existing installations
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>