Commit graph

83 commits

c8fa80be99 fix(api): rename 'metadata' to 'response_metadata' to avoid Pydantic conflict
The field name 'metadata' conflicts with the reserved 'metadata' ClassVar
on SQLModel models (SQLAlchemy's declarative MetaData).
Renamed to 'response_metadata' throughout:
- Backend: RoutineSuggestion, BatchSuggestion, ShoppingSuggestionResponse
- Frontend: TypeScript types and component usages

This fixes the AttributeError when setting metadata on SQLModel instances.
2026-03-06 16:16:35 +01:00
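A minimal sketch of the rename described above, on one of the affected models (the field type is illustrative; the real models carry full suggestion payloads):

```python
from typing import Any, Optional

from sqlmodel import SQLModel


class RoutineSuggestion(SQLModel):
    # Before: metadata: Optional[dict[str, Any]] = None
    # 'metadata' shadows the MetaData ClassVar that SQLModel models inherit
    # from SQLAlchemy's declarative base, so the field is renamed:
    response_metadata: Optional[dict[str, Any]] = None
```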
3c3248c2ea feat(api): add Phase 3 observability - expose validation warnings and metadata to frontend
Backend changes:
- Create ResponseMetadata and TokenMetrics models for API responses
- Modify call_gemini() and call_gemini_with_function_tools() to return (response, log_id) tuple
- Add _build_response_metadata() helper to extract metadata from AICallLog
- Update routines API (/suggest, /suggest-batch) to populate validation_warnings, auto_fixes_applied, and metadata
- Update products API (/suggest) to populate observability fields
- Update skincare API to handle new return signature

Frontend changes:
- Add TypeScript types: TokenMetrics, ResponseMetadata
- Update RoutineSuggestion, BatchSuggestion, ShoppingSuggestionResponse with observability fields

Next: Create UI components to display warnings, reasoning chains, and token metrics
2026-03-06 15:50:28 +01:00
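A plausible shape for the observability fields added here, with names taken from the message and types assumed:

```python
from typing import Optional

from pydantic import BaseModel


class TokenMetrics(BaseModel):
    prompt_tokens: Optional[int] = None
    completion_tokens: Optional[int] = None
    total_tokens: Optional[int] = None


class ResponseMetadata(BaseModel):
    log_id: Optional[str] = None          # id of the matching AICallLog row
    model: Optional[str] = None
    tokens: Optional[TokenMetrics] = None


class RoutineSuggestion(BaseModel):
    # ...suggestion payload...
    validation_warnings: list[str] = []
    auto_fixes_applied: list[str] = []
    response_metadata: Optional[ResponseMetadata] = None
```

Returning (response, log_id) from call_gemini() is what lets _build_response_metadata() find the AICallLog row for the call that just completed.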
3bf19d8acb feat(api): add enhanced token metrics logging for Gemini API
Add comprehensive token breakdown logging to understand MAX_TOKENS behavior
and verify documentation claims about thinking tokens.

New Fields Added to ai_call_logs:
- thoughts_tokens: Thinking tokens (thoughtsTokenCount) - documented as
  separate from output budget
- tool_use_prompt_tokens: Tool use overhead (toolUsePromptTokenCount)
- cached_content_tokens: Cached content tokens (cachedContentTokenCount)

Purpose:
Investigate token counting mystery from production logs where:
  prompt_tokens: 4400
  completion_tokens: 589
  total_tokens: 8489  ← Should be 4400 + 589 = 4989, missing 3500!

According to Gemini API docs (Polish translation):
  totalTokenCount = promptTokenCount + candidatesTokenCount
  (thoughts NOT included in total)

But production logs show a 3500-token gap. The new logging will reveal:
1. Are thinking tokens actually separate from max_output_tokens limit?
2. Where did the 3500 missing tokens go?
3. Does MEDIUM thinking level consume output budget despite docs?
4. Are tool use tokens included in total but not shown separately?

Changes:
- Added 3 new integer columns to ai_call_logs (nullable)
- Enhanced llm.py to capture all usage_metadata fields
- Used getattr() for safe access (fields may not exist in all responses)
- Database migration: 7e6f73d1cc95

This will provide complete data for future LLM calls to diagnose:
- MAX_TOKENS failures
- Token budget behavior
- Thinking token costs
- Tool use overhead
2026-03-06 12:17:13 +01:00
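A sketch of the capture logic, assuming the google-genai usage_metadata attribute names; each read is guarded with getattr() since not every response carries every field:

```python
def extract_token_metrics(usage) -> dict:
    """Map response.usage_metadata onto the ai_call_logs columns."""
    return {
        "prompt_tokens": getattr(usage, "prompt_token_count", None),
        "completion_tokens": getattr(usage, "candidates_token_count", None),
        "total_tokens": getattr(usage, "total_token_count", None),
        # The three new nullable columns for the missing-token investigation:
        "thoughts_tokens": getattr(usage, "thoughts_token_count", None),
        "tool_use_prompt_tokens": getattr(usage, "tool_use_prompt_token_count", None),
        "cached_content_tokens": getattr(usage, "cached_content_token_count", None),
    }
```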
5bb2ea5f08 feat(api): add short_id column for consistent LLM UUID handling
Resolves validation failures where the LLM fabricated full UUIDs from the
8-char prefixes shown in context, causing 'unknown product_id' errors.

Root Cause Analysis:
- Context showed 8-char short IDs: '77cbf37c' (Phase 2 optimization)
- Function tool returned full UUIDs: '77cbf37c-3830-4927-...'
- LLM saw BOTH formats, got confused, invented UUIDs for final response
- Validators rejected fabricated UUIDs as unknown products

Solution: Consistent 8-char short_id across LLM boundary:
1. Database: New short_id column (8 chars, unique, indexed)
2. Context: Shows short_id (was: str(id)[:8])
3. Function tools: Return short_id (was: full UUID)
4. Translation layer: Expands short_id → UUID before validation
5. Database: Stores full UUIDs (no schema change for existing data)

Changes:
- Added products.short_id column with unique constraint + index
- Migration populates from UUID prefix, handles collisions via regeneration
- Product model auto-generates short_id for new products
- LLM contexts use product.short_id consistently
- Function tools return product.short_id
- Added _expand_product_id() translation layer in routines.py
- Integrated expansion in suggest_routine() and suggest_batch()
- Validators work with full UUIDs (no changes needed)

Benefits:
- LLM never sees full UUIDs, no format confusion
- Maintains Phase 2 token optimization (~85% reduction)
- O(1) indexed short_id lookups vs O(n) pattern matching
- Unique constraint prevents collisions at DB level
- Clean separation: 8-char for LLM, 36-char for application

From production error:
  Step 1: unknown product_id 77cbf37c-3830-4927-9669-07447206689d
  (LLM invented the last 28 characters)

Now resolved: LLM uses '77cbf37c' consistently, translation layer
expands to real UUID before validation.
2026-03-06 10:58:26 +01:00
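A sketch of the two sides of the boundary; apart from _expand_product_id (named in the message), the helper names and lookup structure are hypothetical:

```python
import secrets
from uuid import UUID


def generate_short_id(product_id: UUID, taken: set[str]) -> str:
    # Default to the UUID's first 8 hex chars (what the context used to
    # show); regenerate on collision, as the migration does.
    candidate = product_id.hex[:8]
    while candidate in taken:
        candidate = secrets.token_hex(4)  # another 8-char hex id
    return candidate


def _expand_product_id(short_id: str, by_short_id: dict[str, UUID]) -> UUID:
    # Translation layer: the LLM only ever sees 8-char ids; expand to the
    # stored full UUID before the validators run.
    return by_short_id[short_id]
```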
710b53e471 fix(api): resolve function tool UUID mismatch and MAX_TOKENS errors
Two critical bugs identified from production logs:

1. UUID Mismatch Bug (0 products returned from function tools):
   - Context shows 8-char short IDs: '63278801'
   - Function handler expected full UUIDs: '63278801-xxxx-...'
   - LLM requested short IDs, handler couldn't match → 0 products

   Fix: Index products by BOTH full UUID and short ID (first 8 chars)
   in build_product_details_tool_handler. Accept either format.
   Added deduplication to handle duplicate requests.
   Maintains Phase 2 token optimization (no context changes).

2. MAX_TOKENS Error (response truncation):
   - max_output_tokens=4096 includes thinking tokens (~3500)
   - Only ~500 tokens left for JSON response
   - MEDIUM thinking level (Phase 2) consumed budget

   Fix: Increase max_output_tokens from 4096 → 8192 across all
   creative endpoints (routines/suggest, routines/suggest-batch,
   products/suggest). Updated default in get_creative_config().

   Gives headroom: ~3500 thinking + ~4500 response = ~8000 total

From production logs (ai_call_logs):
- Log 71699654: Success but response_text null (function call only)
- Log 2db37c0f: MAX_TOKENS failure, tool returned 0 products

Both issues now resolved.
2026-03-06 10:44:12 +01:00
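A sketch of the dual-keyed index fix for bug 1 (the product shape and helper names are assumed):

```python
def build_product_index(products) -> dict:
    """Index every product under both its full UUID string and its 8-char
    prefix so the tool handler matches whichever format the LLM sends."""
    index = {}
    for product in products:
        full_id = str(product.id)
        index[full_id] = product
        index[full_id[:8]] = product
    return index


def lookup_products(requested_ids, index) -> list:
    """Accept either id format and deduplicate repeated requests."""
    seen, found = set(), []
    for pid in requested_ids:
        product = index.get(pid)
        if product is not None and product.id not in seen:
            seen.add(product.id)
            found.append(product)
    return found
```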
3ef1f249b6 fix(api): handle dict vs object in build_product_context_summary
When products are loaded from PostgreSQL, JSON columns (effect_profile,
context_rules) are deserialized as plain dicts, not Pydantic models.

The build_product_context_summary function was accessing these fields
as object attributes (.safe_with_compromised_barrier) which caused:
AttributeError: 'dict' object has no attribute 'safe_with_compromised_barrier'

Fix: Add isinstance(dict) checks like build_product_context_detailed already does.
Handle both dict (from DB) and object (from Pydantic) cases.

Traceback from production:
  File "llm_context.py", line 91, in build_product_context_summary
    if product.context_rules.safe_with_compromised_barrier:
  AttributeError: 'dict' object has no attribute...
2026-03-06 10:34:51 +01:00
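The fix in miniature, using the field name from the traceback:

```python
def _barrier_safe(context_rules) -> bool:
    # context_rules is a plain dict when loaded from a PostgreSQL JSON
    # column, and a Pydantic model when built in memory; support both.
    if isinstance(context_rules, dict):
        return bool(context_rules.get("safe_with_compromised_barrier"))
    return bool(context_rules.safe_with_compromised_barrier)
```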
594dae474b refactor(api): remove redundant field ban language from prompts
Schema enforcement already prevents LLM from returning fields outside
the defined response_schema (_SingleStepOut, _BatchStepOut). Explicit
field bans (dose, amount, quantity, application_amount) are redundant
and add unnecessary token cost.

Removed:
- 'KRYTYCZNE' ('CRITICAL') warning about schema violations
- 'ZABRONIONE POLA' ('FORBIDDEN FIELDS') explicit field list
- 4-line 'ABSOLUTNIE ZABRONIONE' ('ABSOLUTELY FORBIDDEN') dose prohibition section

Token savings: ~80 tokens per prompt (system instruction overhead)

Trust the schema - cleaner prompts, same enforcement.
2026-03-06 10:30:36 +01:00
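The enforcement being trusted here is Gemini's structured output: with a response_schema set, the model cannot emit fields the schema does not define. A minimal sketch assuming the google-genai SDK, with illustrative schema fields:

```python
from google import genai
from google.genai import types
from pydantic import BaseModel


class _SingleStepOut(BaseModel):
    product_id: str
    reason: str
    # No dose/amount/quantity fields exist here, so constrained decoding
    # cannot produce them; the prompt-level ban was redundant.


client = genai.Client()
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Suggest the next routine step.",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=_SingleStepOut,
    ),
)
```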
c87d1b8581 feat(api): implement Phase 2 token optimization and reasoning capture
- Add tiered context system (summary/detailed/full) to reduce token usage by 70-80%
- Replace old _build_products_context with build_products_context_summary_list (Tier 1: ~15 tokens/product vs 150)
- Optimize function tool responses: exclude INCI list by default (saves ~15KB/product)
- Reduce actives from 24 to top 5 in function tools
- Add reasoning_chain field to AICallLog model for observability
- Implement _extract_thinking_content to capture LLM reasoning (MEDIUM thinking level)
- Strengthen prompt enforcement for prohibited fields (dose, amount, quantity)
- Update get_creative_config to use MEDIUM thinking level instead of LOW

Token Savings:
- Routine suggestions: 9,613 → ~1,300 tokens (-86%)
- Batch planning: 12,580 → ~1,800 tokens (-86%)
- Function tool responses: ~15KB → ~2KB per product (-87%)

Issues discovered in log analysis (ai_call_log.json):
- Lines 10, 27, 61, 78: LLM returned prohibited dose field
- Line 85: MAX_TOKENS failure (output truncated)

Phase 2 complete. Next: two-phase batch planning with safety verification.
2026-03-06 10:26:29 +01:00
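A sketch of the Tier 1 summary line (field names are assumed; the detailed tiers stay behind function tools):

```python
def build_product_context_summary(product) -> str:
    # Tier 1: ~15 tokens per product instead of ~150; just enough for the
    # LLM to decide which products to request details for via tools.
    top_actives = ", ".join(a.name for a in (product.actives or [])[:5])
    return f"- {product.short_id} {product.name} ({product.category}): {top_actives}"


def build_products_context_summary_list(products) -> str:
    return "\n".join(build_product_context_summary(p) for p in products)
```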
e239f61408 style: apply black and isort formatting
Run formatting tools on Phase 1 changes:
- black (code formatter)
- isort (import sorter)
- ruff (linter)

All linting checks pass.
2026-03-06 10:17:00 +01:00
2a9391ad32 feat(api): add LLM response validation and input sanitization
Implement Phase 1: Safety & Validation for all LLM-based suggestion engines.

- Add input sanitization module to prevent prompt injection attacks
- Implement 5 comprehensive validators (routine, batch, shopping, product parse, photo)
- Add 10+ critical safety checks (retinoid+acid conflicts, barrier compatibility, etc.)
- Integrate validation into all 5 API endpoints (routines, products, skincare)
- Add validation fields to ai_call_logs table (validation_errors, validation_warnings, auto_fixed)
- Create database migration for validation fields
- Add comprehensive test suite (9/9 tests passing, 88% coverage on validators)

Safety improvements:
- Blocks retinoid + acid conflicts in same routine/day
- Rejects unknown product IDs
- Enforces min_interval_hours rules
- Protects compromised skin barriers
- Prevents prohibited fields (dose, amount) in responses
- Validates all enum values and score ranges

All validation failures are logged and responses are rejected with HTTP 502.
2026-03-06 10:16:47 +01:00
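A sketch of one of the critical checks; the ingredient sets and product fields are illustrative, and the real validators also cover intervals, barrier state, enums, and score ranges:

```python
RETINOIDS = {"retinol", "tretinoin", "adapalene"}
EXFOLIATING_ACIDS = {"glycolic acid", "salicylic acid", "lactic acid"}


def check_retinoid_acid_conflict(steps, products_by_id) -> list[str]:
    """Return validation errors; any error causes the response to be
    logged and rejected with HTTP 502."""
    errors, actives = [], set()
    for step in steps:
        product = products_by_id.get(step.product_id)
        if product is None:
            errors.append(f"unknown product_id: {step.product_id}")
            continue
        actives.update(name.lower() for name in product.active_names)
    if actives & RETINOIDS and actives & EXFOLIATING_ACIDS:
        errors.append("retinoid and exfoliating acid in the same routine")
    return errors
```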
e3ed0dd3a3 fix(routines): enforce min_interval_hours and minoxidil flag server-side
Two bugs in /routines/suggest where the LLM could override hard constraints:

1. Products with min_interval_hours (e.g. retinol at 72h) were passed to
   the LLM even if used too recently. The LLM reasoned away the constraint
   in at least one observed case. Fix: added _filter_products_by_interval()
   which removes ineligible products before the prompt is built, so they
   don't appear in AVAILABLE PRODUCTS at all.

2. Minoxidil was included in the available products list regardless of the
   include_minoxidil_beard flag. Only the objectives context was gated,
   leaving the product visible to the LLM which would include it based on
   recent usage history. Fix: added include_minoxidil param to
   _get_available_products() and threaded it through suggest_routine and
   suggest_batch.

Also refactored _build_products_context() to accept a pre-supplied
products list instead of calling _get_available_products() internally,
ensuring the tool handler and context text always use the same filtered set.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 23:36:15 +01:00
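A sketch of the interval filter from fix 1 (the function name is from the message; the last-used lookup is assumed):

```python
from datetime import datetime, timedelta, timezone


def _filter_products_by_interval(products, last_used_at: dict):
    """Remove products whose min_interval_hours has not elapsed, so they
    never appear in AVAILABLE PRODUCTS and the LLM cannot reason the
    constraint away."""
    now = datetime.now(timezone.utc)
    eligible = []
    for product in products:
        interval = getattr(product, "min_interval_hours", None)
        last = last_used_at.get(product.id)
        if interval and last is not None and now - last < timedelta(hours=interval):
            continue  # e.g. retinol at 72h used two days ago: excluded
        eligible.append(product)
    return eligible
```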
7a66a7911d feat(backend): include last-used date in product LLM details 2026-03-05 16:48:49 +01:00
40d26514a1 refactor(backend): consolidate product LLM function tools 2026-03-05 16:44:03 +01:00
b99b9ed68e feat(profile): add profile settings and LLM user context 2026-03-05 15:57:21 +01:00
db3d9514d5 fix(routines): remove dose from AI routine suggestions 2026-03-05 14:19:18 +01:00
0a4ccefe28 feat(repo): expand lab results workflows across backend and frontend 2026-03-05 12:46:49 +01:00
013492ec2b refactor(products): remove usage notes and contraindications fields 2026-03-05 10:11:24 +01:00
30315fdf56 fix(backend): create pricetier enum before migration 2026-03-04 23:16:55 +01:00
0e439b4ca7 feat(backend): move product pricing to async persisted jobs 2026-03-04 22:46:16 +01:00
c869f88db2 chore(backend): enable psycopg binary dependency 2026-03-04 21:46:38 +01:00
83ba4cc5c0 feat(products): compute price tiers from objective price/use 2026-03-04 14:47:18 +01:00
c5ea38880c refactor(products): remove obsolete interaction fields across stack 2026-03-04 12:42:12 +01:00
1d8a8eafb8 refactor(api): remove MCP server integration and docs references 2026-03-04 12:28:30 +01:00
5dd8242985 fix(routines): simplify inventory preference in system prompt 2026-03-04 12:18:07 +01:00
b58fcb1440 feat(api): add tool-calling flow for shopping suggestions
Keep /products/suggest lean by exposing product UUIDs and fetching INCI, safety rules, actives, and usage notes on demand through Gemini function tools. Add conservative fallback behavior for tool roundtrip limits and expand helper tests to cover tool wiring and payload handlers.
2026-03-04 12:05:33 +01:00
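A sketch of how such a function tool might be declared, assuming the google-genai SDK (the tool name and parameters are illustrative):

```python
from google.genai import types

get_product_details = types.FunctionDeclaration(
    name="get_product_details",
    description=(
        "Fetch INCI list, safety rules, actives, and usage notes for the "
        "given product ids."
    ),
    parameters=types.Schema(
        type=types.Type.OBJECT,
        properties={
            "product_ids": types.Schema(
                type=types.Type.ARRAY,
                items=types.Schema(type=types.Type.STRING),
            ),
        },
        required=["product_ids"],
    ),
)

tools = [types.Tool(function_declarations=[get_product_details])]
```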
558708653c feat(api): expand routines tool-calling to reduce prompt load
Keep the /routines/suggest base context lean by sending only active names and fetching detailed safety, actives, usage notes, and INCI on demand. Add a conservative fallback when tool roundtrip limits are hit to preserve safe outputs instead of failing the request.
2026-03-04 11:52:07 +01:00
cfd2485b7e feat(api): add INCI tool-calling with normalized tool traces
Enable on-demand INCI retrieval in /routines/suggest through Gemini function calling so detailed ingredient data is fetched only when needed. Persist and normalize tool_trace data in AI logs to make function-call behavior directly inspectable via /ai-logs endpoints.
2026-03-04 11:35:19 +01:00
c0eeb0425d fix(routines): include product safety and usage signals in prompts
Expose leave-on behavior, contraindications, safety alerts, and compact usage notes in AVAILABLE PRODUCTS so Gemini can make safer routine decisions with real-world product constraints.
2026-03-04 02:42:16 +01:00
9bbc34ffd2 test(api): fix ruff issues in routine tests 2026-03-04 02:23:19 +01:00
472a3034a0 feat(routines): refine therapeutic and travel-mode prompt rules 2026-03-04 02:22:39 +01:00
820d58ea37 feat(routines): enrich single AI suggestions with concise context 2026-03-04 01:22:57 +01:00
88f3642387 test(api): add tests for ai suggestion endpoints and helpers 2026-03-03 22:06:33 +01:00
5ad9b66a21 build(backend): add pytest-cov configuration and report generation 2026-03-03 22:06:24 +01:00
ba1f10d99f refactor(llm): optimize Gemini config profiles for extraction and creativity
Introduces `get_extraction_config` and `get_creative_config` to standardize Gemini API calls.

* Defines explicit config profiles with appropriate `temperature` and `thinking_level` for Gemini 3 Flash.
* Extraction tasks use minimal thinking and temp=0.0 to reduce latency and token usage.
* Creative tasks use low thinking, temp=0.4, and top_p=0.8 to balance naturalness and safety.
* Applies these helpers across products, routines, and skincare endpoints.
* Also updates default model to `gemini-3-flash-preview`.
2026-03-03 21:24:23 +01:00
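A sketch of the two profiles, assuming the google-genai SDK; the exact SDK field used for thinking_level is not shown in the log, so it is left as a comment:

```python
from google.genai import types


def get_extraction_config(**overrides) -> types.GenerateContentConfig:
    # Deterministic parsing: temp=0.0, minimal thinking to cut latency and tokens.
    return types.GenerateContentConfig(temperature=0.0, **overrides)


def get_creative_config(**overrides) -> types.GenerateContentConfig:
    # Suggestions: temp=0.4 and top_p=0.8 balance naturalness and safety;
    # a low thinking level is also set here (field omitted in this sketch).
    return types.GenerateContentConfig(temperature=0.4, top_p=0.8, **overrides)
```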
78df7322a9 refactor(api): remove shopping assistant logic from mcp_server 2026-03-03 20:51:42 +01:00
0e7a39836f refactor(routines): use category and short UUID for recent history representation 2026-03-03 20:29:36 +01:00
28fb74b9bf refactor(routines): translate prompt input keys to English to reduce the language-switch penalty 2026-03-03 20:24:56 +01:00
9574c91be1 refactor(routines): remove hardcoded grooming actions from system prompt 2026-03-03 20:22:59 +01:00
4627ec70bf refactor(routines): remove examples from inventory management rule to avoid bias 2026-03-03 20:07:13 +01:00
30ebc093bf feat(routines): adjust inventory management prompt to allow opening better suited sealed products 2026-03-03 20:06:38 +01:00
877051cfaf feat(routines): add actives and recent usage tracking to product context 2026-03-03 20:01:39 +01:00
1109d9f397 fix(products): only suggest when real need exists 2026-03-03 19:51:49 +01:00
609995732b feat(routines): add minimize_products option for batch suggestions 2026-03-03 00:50:49 +01:00
40f9a353bb feat(products): add shopping suggestions feature
- Add POST /api/products/suggest endpoint that analyzes skin condition
  and inventory to suggest product types (e.g., 'Salicylic Acid 2% Masque')
- Add MCP tool get_shopping_suggestions() for MCP clients
- Add 'Suggest' button to Products page in frontend
- Add /products/suggest page with suggestion cards
- Include product type, key ingredients, target concerns, why_needed,
  recommended_time, and frequency in suggestions
- Fix stock logic: sealed products now count as available inventory
- Add legend to clarify ✓ (in stock) vs ✗ (not in stock) markers
2026-03-02 22:38:08 +01:00
389ca5ffdc fix(backend): resolve ty check errors across api, mcp, and lifespan typing 2026-03-02 15:51:14 +01:00
c85ca355df refactor(routines): streamline suggest prompt — merge inventory context, add leaving_home SPF hint
- Remove _build_inventory_context; fold pao_months into the DOSTĘPNE PRODUKTY (AVAILABLE PRODUCTS) entries
- Remove the duplicate "Otwarte równolegle" ("open in parallel") section from the prompt
- Rename OSTATNIE RUTYNY (7 dni) → OSTATNIE RUTYNY (RECENT ROUTINES, dropping the "7 days" qualifier)
- Add _build_day_context and SuggestRoutineRequest.leaving_home (optional bool)
- System prompt: replace the unconditional PAO rule with a conditional one; add SPF factor
  selection logic based on the KONTEKST DNIA (DAY CONTEXT) leaving_home value
- Frontend: leaving_home checkbox (AM only) + i18n keys pl/en

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 23:47:54 +01:00
258b8c4330 refactor(routines): use SQLAlchemy is_(False) for product filters 2026-03-01 23:23:04 +01:00
d3bd2ff30d feat(skincare): allow HEIC/HEIF uploads in skin analysis 2026-03-01 23:23:04 +01:00
f1acfa21fc feat(routines): add inventory-aware product selection rules 2026-03-01 22:15:47 +01:00
914c6087bd fix(products): work around Gemini int-enum schema rejection in parse-text
Gemini API rejects int-valued enums (StrengthLevel) in response_schema,
raising a validation error before any request is sent. Fix by introducing
AIActiveIngredient (inherits ActiveIngredient, overrides strength_level and
irritation_potential as Optional[int]) and ProductParseLLMResponse used only
as the Gemini schema. The two-step validation converts ints back to StrengthLevel
via Pydantic coercion. Adds a test covering the numeric strength level path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 22:00:48 +01:00
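The workaround in miniature; class names are from the message, enum values are illustrative:

```python
from enum import IntEnum
from typing import Optional

from pydantic import BaseModel


class StrengthLevel(IntEnum):  # the int-valued enum Gemini's schema rejects
    LOW = 1
    MEDIUM = 2
    HIGH = 3


class ActiveIngredient(BaseModel):
    name: str
    strength_level: StrengthLevel
    irritation_potential: StrengthLevel


class AIActiveIngredient(ActiveIngredient):
    # Gemini-facing schema only: plain ints instead of the enum.
    strength_level: Optional[int] = None
    irritation_potential: Optional[int] = None


# Step two of the validation: Pydantic coerces the ints back to the enum.
step_two = ActiveIngredient.model_validate(
    {"name": "retinol", "strength_level": 2, "irritation_potential": 3}
)
```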