Commit graph

153 commits

dac787b81b test(auth): add multi-user regression coverage
- Enable backend tests in CI (remove if: false)
- Fix test_products_helpers.py to pass current_user parameter
- Fix test_routines_helpers.py to include short_id in products
- Fix llm_context.py to use product_effect_profile correctly
- All 221 tests passing
2026-03-12 16:42:00 +01:00
b11f64d5a1 refactor(frontend): route protected API access through server session 2026-03-12 16:27:24 +01:00
1d5630ed8c feat(api): add admin household management endpoints 2026-03-12 16:02:11 +01:00
4bfa4ea02d chore(deploy): wire OIDC runtime configuration 2026-03-12 15:55:32 +01:00
ffa3b71309 feat(api): enforce ownership across health routines and profile flows 2026-03-12 15:48:13 +01:00
cd8e39939a feat(frontend): add Authelia OIDC session flow 2026-03-12 15:40:55 +01:00
803bc3b4cd feat(api): scope products and inventory by owner and household 2026-03-12 15:37:39 +01:00
1f47974f48 refactor(api): centralize tenant authorization helpers 2026-03-12 15:26:06 +01:00
4782fad5b9 feat(auth): validate Authelia tokens in FastAPI 2026-03-12 15:13:55 +01:00
2704d58673 feat(db): backfill tenant ownership for existing records 2026-03-12 14:54:24 +01:00
04daadccda feat(auth): add local user and household models 2026-03-12 14:45:43 +01:00
e29d62f949 feat(frontend): auto-generate TypeScript types from backend OpenAPI schema
Replace manually maintained types in src/lib/types.ts with auto-generated
types from FastAPI's OpenAPI schema using @hey-api/openapi-ts. The bridge
file re-exports generated types with renames, Require<> augmentations for
fields that are optional in the schema but always present in responses, and
manually added relationship fields excluded from OpenAPI.

- Add openapi-ts.config.ts and generate:api npm script
- Generate types into src/lib/api/generated/types.gen.ts
- Rewrite src/lib/types.ts as bridge with re-exports and augmentations
- Fix null vs undefined mismatches in consumer components
- Remove unused manual type definitions from api.ts
- Update AGENTS.md docs with type generation workflow
2026-03-12 09:17:40 +01:00
470d49b061 docs: restructure AGENTS.md into hierarchical knowledge base 2026-03-11 11:35:47 +01:00
157cbc425e feat(frontend): improve lab result filter ergonomics 2026-03-10 12:44:19 +01:00
ed547703ad chore(repo): ignore local test artifacts 2026-03-10 12:26:59 +01:00
0253b2377d feat(frontend): unify page shell and move create flows to dedicated routes 2026-03-10 12:25:25 +01:00
e20c18c2ee fix(api): tighten shopping suggestion response rules
Constrain shopping target concerns to SkinConcern enums and add a regression test for invalid values. Simplify the shopping prompt so repurchase suggestions stay practical, use shorter product types, and avoid leaking raw scoring/debug language into user-facing copy.
2026-03-09 17:26:24 +01:00
d91d06455b feat(products): improve replenishment-aware shopping suggestions
Replace product weight and repurchase intent fields with per-package remaining levels and inventory-first restock signals. Enrich shopping suggestions with usage-aware replenishment scoring so the frontend and LLM can prioritize real gaps and near-empty staples more reliably.
2026-03-09 13:37:40 +01:00
bb5d402c15 feat(products): improve shopping suggestion decision support 2026-03-08 22:30:30 +01:00
5d9d18bd05 fix(api): constrain shopping suggestion enums 2026-03-08 12:06:39 +01:00
cebea2ac86 fix(api): avoid distinct on json product fields in shopping suggestions 2026-03-08 11:55:16 +01:00
fecfa0b9e4 feat(api): align routine context windows with recent skin history 2026-03-08 11:53:59 +01:00
1c457d62a3 feat(api): include 7-day upcoming grooming context in routine suggestions 2026-03-07 01:40:42 +01:00
5d69a976c4 chore(infra): align systemd units and Forgejo runners
Point services to /opt/innercontext/current release paths, remove stale phase completion docs, and switch Forgejo workflows to run on the lxc runner label.
2026-03-07 01:21:01 +01:00
2efdb2b785 fix(deploy): make LXC deploys atomic and fail-fast
Rebuild the deployment flow to prepare releases remotely, validate env/sudo prerequisites, run migrations in-release, and auto-rollback on health failures. Consolidate deployment docs and add a manual CI workflow so laptop and CI use the same push-based deploy path.
2026-03-07 01:14:30 +01:00
d228b44209 feat(i18n): add Phase 3 observability translations (EN + PL)
Added translations for all observability components:
- Validation warnings panel
- Auto-fixes badge
- AI reasoning process viewer
- Debug information panel
- Structured error display

English translations (en.json):
- observability_validationWarnings: "Validation Warnings"
- observability_autoFixesApplied: "Automatically adjusted"
- observability_aiReasoningProcess: "AI Reasoning Process"
- observability_debugInfo: "Debug Information"
- observability_model/duration/tokenUsage: Debug panel labels
- observability_validationFailed: "Safety validation failed"

Polish translations (pl.json):
- observability_validationWarnings: "Ostrzeżenia walidacji"
- observability_autoFixesApplied: "Automatycznie dostosowano"
- observability_aiReasoningProcess: "Proces rozumowania AI"
- observability_debugInfo: "Informacje debugowania"
- All debug panel labels translated
- observability_validationFailed: "Walidacja bezpieczeństwa nie powiodła się"

Updated components:
- ValidationWarningsAlert: Uses m.observability_validationWarnings()
- AutoFixBadge: Uses m.observability_autoFixesApplied()
- ReasoningChainViewer: Uses m.observability_aiReasoningProcess()
- MetadataDebugPanel: All labels now use i18n
- StructuredErrorDisplay: Translates error prefixes

All components now fully support English and Polish locales.
2026-03-06 16:28:23 +01:00
b2886c2f2b refactor(frontend): align observability panels with editorial design system
Replace hardcoded gray-* colors with design system tokens:
- border-gray-200 → border-muted
- bg-gray-50 → bg-muted/30
- text-gray-600/700 → text-muted-foreground/foreground
- hover:bg-gray-100 → hover:bg-muted/50

Updated components:
- MetadataDebugPanel: now matches Card aesthetic
- ReasoningChainViewer: now uses warm editorial tones

Benefits:
- Consistent with existing reasoning/summary cards
- Matches warm editorial aesthetic (hsl(42...) palette)
- DRY: reuses design system tokens
- Documented collapsible panel pattern in cookbook

This fixes the cool gray panels that looked out of place among the warm beige editorial UI.
2026-03-06 16:25:47 +01:00
c8fa80be99 fix(api): rename 'metadata' to 'response_metadata' to avoid Pydantic conflict
The field name 'metadata' conflicts with Pydantic's internal ClassVar.
Renamed to 'response_metadata' throughout:
- Backend: RoutineSuggestion, BatchSuggestion, ShoppingSuggestionResponse
- Frontend: TypeScript types and component usages

This fixes the AttributeError when setting metadata on SQLModel instances.
2026-03-06 16:16:35 +01:00
d00e0afeec docs: add Phase 3 completion summary
Document all Phase 3 UI/UX observability work:
- Backend API enrichment details
- Frontend component specifications
- Integration points
- Known limitations
- Testing plan and deployment checklist
2026-03-06 15:55:06 +01:00
5d3f876bec feat(frontend): add Phase 3 UI components for observability
Components created:
- ValidationWarningsAlert: Display validation warnings with collapsible list
- StructuredErrorDisplay: Parse and display HTTP 502 errors as bullet points
- AutoFixBadge: Show automatically applied fixes
- ReasoningChainViewer: Collapsible panel for LLM thinking process
- MetadataDebugPanel: Collapsible debug info (model, duration, token metrics)

CSS changes:
- Add .editorial-alert--warning and .editorial-alert--info variants

Integration:
- Update routines/suggest page to show warnings, auto-fixes, reasoning, and metadata
- Update products/suggest page with same observability components
- Replace plain error divs with StructuredErrorDisplay for better UX

All components follow design system and pass svelte-check with 0 errors
2026-03-06 15:53:46 +01:00
3c3248c2ea feat(api): add Phase 3 observability - expose validation warnings and metadata to frontend
Backend changes:
- Create ResponseMetadata and TokenMetrics models for API responses
- Modify call_gemini() and call_gemini_with_function_tools() to return (response, log_id) tuple
- Add _build_response_metadata() helper to extract metadata from AICallLog
- Update routines API (/suggest, /suggest-batch) to populate validation_warnings, auto_fixes_applied, and metadata
- Update products API (/suggest) to populate observability fields
- Update skincare API to handle new return signature

Frontend changes:
- Add TypeScript types: TokenMetrics, ResponseMetadata
- Update RoutineSuggestion, BatchSuggestion, ShoppingSuggestionResponse with observability fields

Next: Create UI components to display warnings, reasoning chains, and token metrics
2026-03-06 15:50:28 +01:00
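The response-envelope shape described above can be sketched roughly as follows. This is a minimal illustration using stdlib dataclasses in place of the project's actual Pydantic models; the field names and the `build_response_metadata` helper are assumptions modeled on the commit text, not the real implementation.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TokenMetrics:
    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0

@dataclass
class ResponseMetadata:
    model: str = ""
    duration_ms: int = 0
    token_usage: TokenMetrics = field(default_factory=TokenMetrics)
    log_id: Optional[str] = None

def build_response_metadata(log: dict) -> ResponseMetadata:
    """Extract observability fields from an AICallLog-like record (hypothetical shape)."""
    return ResponseMetadata(
        model=log.get("model", ""),
        duration_ms=log.get("duration_ms", 0),
        token_usage=TokenMetrics(
            prompt_tokens=log.get("prompt_tokens", 0),
            completion_tokens=log.get("completion_tokens", 0),
            total_tokens=log.get("total_tokens", 0),
        ),
        log_id=log.get("id"),
    )

meta = build_response_metadata(
    {"model": "gemini-2.0", "duration_ms": 1200,
     "prompt_tokens": 4400, "completion_tokens": 589,
     "total_tokens": 4989, "id": "log-1"}
)
```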
3bf19d8acb feat(api): add enhanced token metrics logging for Gemini API
Add comprehensive token breakdown logging to understand MAX_TOKENS behavior
and verify documentation claims about thinking tokens.

New Fields Added to ai_call_logs:
- thoughts_tokens: Thinking tokens (thoughtsTokenCount) - documented as
  separate from output budget
- tool_use_prompt_tokens: Tool use overhead (toolUsePromptTokenCount)
- cached_content_tokens: Cached content tokens (cachedContentTokenCount)

Purpose:
Investigate token counting mystery from production logs where:
  prompt_tokens: 4400
  completion_tokens: 589
  total_tokens: 8489  ← Should be 4400 + 589 = 4989, missing 3500!

According to Gemini API docs (Polish translation):
  totalTokenCount = promptTokenCount + candidatesTokenCount
  (thoughts NOT included in total)

But production logs show 3500 token gap. New logging will reveal:
1. Are thinking tokens actually separate from max_output_tokens limit?
2. Where did the 3500 missing tokens go?
3. Does MEDIUM thinking level consume output budget despite docs?
4. Are tool use tokens included in total but not shown separately?

Changes:
- Added 3 new integer columns to ai_call_logs (nullable)
- Enhanced llm.py to capture all usage_metadata fields
- Used getattr() for safe access (fields may not exist in all responses)
- Database migration: 7e6f73d1cc95

This will provide complete data for future LLM calls to diagnose:
- MAX_TOKENS failures
- Token budget behavior
- Thinking token costs
- Tool use overhead
2026-03-06 12:17:13 +01:00
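The defensive `getattr()` capture described above can be sketched like this. The attribute names follow the fields the commit mentions; the `SimpleNamespace` stands in for the SDK's usage-metadata object, which may omit any of them, so this is an illustration rather than the actual `llm.py` code.

```python
from types import SimpleNamespace

def extract_token_fields(usage):
    # getattr with a None default: fields may not exist in all responses.
    return {
        "prompt_tokens": getattr(usage, "prompt_token_count", None),
        "completion_tokens": getattr(usage, "candidates_token_count", None),
        "total_tokens": getattr(usage, "total_token_count", None),
        "thoughts_tokens": getattr(usage, "thoughts_token_count", None),
        "tool_use_prompt_tokens": getattr(usage, "tool_use_prompt_token_count", None),
        "cached_content_tokens": getattr(usage, "cached_content_token_count", None),
    }

# The production numbers from the commit: totals don't add up without
# the thinking-token breakdown.
usage = SimpleNamespace(prompt_token_count=4400,
                        candidates_token_count=589,
                        total_token_count=8489,
                        thoughts_token_count=3500)
fields = extract_token_fields(usage)
gap = fields["total_tokens"] - fields["prompt_tokens"] - fields["completion_tokens"]
```

With the new fields captured, the 3,500-token gap lines up with `thoughts_tokens`, which is the hypothesis the logging is meant to confirm.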
5bb2ea5f08 feat(api): add short_id column for consistent LLM UUID handling
Resolves validation failures where LLM fabricated full UUIDs from 8-char
prefixes shown in context, causing 'unknown product_id' errors.

Root Cause Analysis:
- Context showed 8-char short IDs: '77cbf37c' (Phase 2 optimization)
- Function tool returned full UUIDs: '77cbf37c-3830-4927-...'
- LLM saw BOTH formats, got confused, invented UUIDs for final response
- Validators rejected fabricated UUIDs as unknown products

Solution: Consistent 8-char short_id across LLM boundary:
1. Database: New short_id column (8 chars, unique, indexed)
2. Context: Shows short_id (was: str(id)[:8])
3. Function tools: Return short_id (was: full UUID)
4. Translation layer: Expands short_id → UUID before validation
5. Database: Stores full UUIDs (no schema change for existing data)

Changes:
- Added products.short_id column with unique constraint + index
- Migration populates from UUID prefix, handles collisions via regeneration
- Product model auto-generates short_id for new products
- LLM contexts use product.short_id consistently
- Function tools return product.short_id
- Added _expand_product_id() translation layer in routines.py
- Integrated expansion in suggest_routine() and suggest_batch()
- Validators work with full UUIDs (no changes needed)

Benefits:
- LLM never sees full UUIDs, no format confusion
- Maintains Phase 2 token optimization (~85% reduction)
- O(1) indexed short_id lookups vs O(n) pattern matching
- Unique constraint prevents collisions at DB level
- Clean separation: 8-char for LLM, 36-char for application

From production error:
  Step 1: unknown product_id 77cbf37c-3830-4927-9669-07447206689d
  (LLM invented the last 28 characters)

Now resolved: LLM uses '77cbf37c' consistently, translation layer
expands to real UUID before validation.
2026-03-06 10:58:26 +01:00
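The translation layer described above can be sketched as a small lookup: the LLM only ever emits 8-char short_ids, and the boundary expands them to full UUIDs before validation. This is a hypothetical sketch of `_expand_product_id()`, using the UUID from the commit's production error; the real version presumably queries the indexed `short_id` column.

```python
def expand_product_id(short_or_full: str, products_by_short: dict) -> str:
    """Expand an 8-char short_id from the LLM to the full stored UUID.

    If the value is already a full UUID (or unknown), pass it through
    unchanged and let the validators reject it.
    """
    if len(short_or_full) == 8 and short_or_full in products_by_short:
        return products_by_short[short_or_full]
    return short_or_full

full = "77cbf37c-3830-4927-9669-07447206689d"
index = {full[:8]: full}
```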
710b53e471 fix(api): resolve function tool UUID mismatch and MAX_TOKENS errors
Two critical bugs identified from production logs:

1. UUID Mismatch Bug (0 products returned from function tools):
   - Context shows 8-char short IDs: '63278801'
   - Function handler expected full UUIDs: '63278801-xxxx-...'
   - LLM requested short IDs, handler couldn't match → 0 products

   Fix: Index products by BOTH full UUID and short ID (first 8 chars)
   in build_product_details_tool_handler. Accept either format.
   Added deduplication to handle duplicate requests.
   Maintains Phase 2 token optimization (no context changes).

2. MAX_TOKENS Error (response truncation):
   - max_output_tokens=4096 includes thinking tokens (~3500)
   - Only ~500 tokens left for JSON response
   - MEDIUM thinking level (Phase 2) consumed budget

   Fix: Increase max_output_tokens from 4096 → 8192 across all
   creative endpoints (routines/suggest, routines/suggest-batch,
   products/suggest). Updated default in get_creative_config().

   Gives headroom: ~3500 thinking + ~4500 response = ~8000 total

From production logs (ai_call_logs):
- Log 71699654: Success but response_text null (function call only)
- Log 2db37c0f: MAX_TOKENS failure, tool returned 0 products

Both issues now resolved.
2026-03-06 10:44:12 +01:00
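The dual-format fix for the tool handler can be sketched as indexing each product under both keys and deduplicating requests. A minimal sketch, assuming products are `(id, name)` pairs; the real `build_product_details_tool_handler` returns richer payloads.

```python
def build_product_index(products):
    """Index products by both the full UUID and its first 8 chars,
    so the handler accepts whichever format the LLM sends."""
    index = {}
    for pid, name in products:
        index[pid] = name
        index[pid[:8]] = name
    return index

def lookup_products(index, requested_ids):
    # Deduplicate repeated requests while preserving order.
    seen, results = set(), []
    for rid in requested_ids:
        if rid in seen or rid not in index:
            continue
        seen.add(rid)
        results.append(index[rid])
    return results

idx = build_product_index([("63278801-aaaa-bbbb-cccc-dddddddddddd", "Serum")])
```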
3ef1f249b6 fix(api): handle dict vs object in build_product_context_summary
When products are loaded from PostgreSQL, JSON columns (effect_profile,
context_rules) are deserialized as plain dicts, not Pydantic models.

The build_product_context_summary function was accessing these fields
as object attributes (.safe_with_compromised_barrier) which caused:
AttributeError: 'dict' object has no attribute 'safe_with_compromised_barrier'

Fix: Add isinstance(dict) checks like build_product_context_detailed already does.
Handle both dict (from DB) and object (from Pydantic) cases.

Traceback from production:
  File "llm_context.py", line 91, in build_product_context_summary
    if product.context_rules.safe_with_compromised_barrier:
  AttributeError: 'dict' object has no attribute...
2026-03-06 10:34:51 +01:00
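The dict-vs-object fix can be sketched in a few lines: accept either a plain dict (a JSON column deserialized by the driver) or an attribute-bearing Pydantic-style object. The field name comes from the traceback in the commit; the helper itself is illustrative.

```python
def barrier_safe(context_rules) -> bool:
    """Read safe_with_compromised_barrier from either a dict
    (loaded from a Postgres JSON column) or an object (Pydantic model)."""
    if isinstance(context_rules, dict):
        return bool(context_rules.get("safe_with_compromised_barrier"))
    return bool(getattr(context_rules, "safe_with_compromised_barrier", False))

class Rules:
    # Stand-in for a Pydantic model with the same field.
    safe_with_compromised_barrier = True
```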
594dae474b refactor(api): remove redundant field ban language from prompts
Schema enforcement already prevents LLM from returning fields outside
the defined response_schema (_SingleStepOut, _BatchStepOut). Explicit
field bans (dose, amount, quantity, application_amount) are redundant
and add unnecessary token cost.

Removed:
- 'KRYTYCZNE' ("CRITICAL") warning about schema violations
- 'ZABRONIONE POLA' ("FORBIDDEN FIELDS") explicit field list
- 4-line 'ABSOLUTNIE ZABRONIONE' ("ABSOLUTELY FORBIDDEN") dose prohibition section

Token savings: ~80 tokens per prompt (system instruction overhead)

Trust the schema - cleaner prompts, same enforcement.
2026-03-06 10:30:36 +01:00
c87d1b8581 feat(api): implement Phase 2 token optimization and reasoning capture
- Add tiered context system (summary/detailed/full) to reduce token usage by 70-80%
- Replace old _build_products_context with build_products_context_summary_list (Tier 1: ~15 tokens/product vs 150)
- Optimize function tool responses: exclude INCI list by default (saves ~15KB/product)
- Reduce actives from 24 to top 5 in function tools
- Add reasoning_chain field to AICallLog model for observability
- Implement _extract_thinking_content to capture LLM reasoning (MEDIUM thinking level)
- Strengthen prompt enforcement for prohibited fields (dose, amount, quantity)
- Update get_creative_config to use MEDIUM thinking level instead of LOW

Token Savings:
- Routine suggestions: 9,613 → ~1,300 tokens (-86%)
- Batch planning: 12,580 → ~1,800 tokens (-86%)
- Function tool responses: ~15KB → ~2KB per product (-87%)

Failures discovered in log analysis (ai_call_log.json):
- Lines 10, 27, 61, 78: LLM returned prohibited dose field
- Line 85: MAX_TOKENS failure (output truncated)

Phase 2 complete. Next: two-phase batch planning with safety verification.
2026-03-06 10:26:29 +01:00
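The Tier 1 summary context described above can be sketched as one compact line per product instead of a full detail block, with actives capped at five. Field names and the output shape here are assumptions for illustration, not the project's actual format.

```python
def product_summary_line(p: dict) -> str:
    """Tier 1: ~15 tokens per product vs ~150 for the detailed block."""
    actives = ",".join(p.get("actives", [])[:5])  # top 5 actives only
    return f"{p['short_id']} {p['name']} [{actives}]"

def build_products_context_summary_list(products) -> str:
    return "\n".join(product_summary_line(p) for p in products)

ctx = build_products_context_summary_list([
    {"short_id": "77cbf37c", "name": "Retinol 0.5%",
     "actives": ["retinol", "squalane"]},
])
```

The LLM can then request full details for specific products via the function tool, which is what keeps the 86% token reduction from degrading answer quality.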
e239f61408 style: apply black and isort formatting
Run formatting tools on Phase 1 changes:
- black (code formatter)
- isort (import sorter)
- ruff (linter)

All linting checks pass.
2026-03-06 10:17:00 +01:00
2a9391ad32 feat(api): add LLM response validation and input sanitization
Implement Phase 1: Safety & Validation for all LLM-based suggestion engines.

- Add input sanitization module to prevent prompt injection attacks
- Implement 5 comprehensive validators (routine, batch, shopping, product parse, photo)
- Add 10+ critical safety checks (retinoid+acid conflicts, barrier compatibility, etc.)
- Integrate validation into all 5 API endpoints (routines, products, skincare)
- Add validation fields to ai_call_logs table (validation_errors, validation_warnings, auto_fixed)
- Create database migration for validation fields
- Add comprehensive test suite (9/9 tests passing, 88% coverage on validators)

Safety improvements:
- Blocks retinoid + acid conflicts in same routine/day
- Rejects unknown product IDs
- Enforces min_interval_hours rules
- Protects compromised skin barriers
- Prevents prohibited fields (dose, amount) in responses
- Validates all enum values and score ranges

All validation failures are logged and responses are rejected with HTTP 502.
2026-03-06 10:16:47 +01:00
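One of the safety checks listed above — the retinoid + acid conflict — can be sketched as a set-intersection test over a routine's steps. The ingredient lists and the `(product_name, actives)` step shape are hypothetical; the real validators work on the full suggestion payload.

```python
RETINOIDS = {"retinol", "tretinoin", "adapalene"}
ACIDS = {"glycolic acid", "salicylic acid", "lactic acid"}

def check_retinoid_acid_conflict(steps):
    """Return validation errors if a routine combines a retinoid
    with an exfoliating acid (one of the commit's critical checks)."""
    has_retinoid = any(RETINOIDS & set(actives) for _, actives in steps)
    has_acid = any(ACIDS & set(actives) for _, actives in steps)
    if has_retinoid and has_acid:
        return ["retinoid + acid conflict in the same routine"]
    return []
```

On failure the real endpooints log the errors and reject the response with HTTP 502, per the commit.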
e3ed0dd3a3 fix(routines): enforce min_interval_hours and minoxidil flag server-side
Two bugs in /routines/suggest where the LLM could override hard constraints:

1. Products with min_interval_hours (e.g. retinol at 72h) were passed to
   the LLM even if used too recently. The LLM reasoned away the constraint
   in at least one observed case. Fix: added _filter_products_by_interval()
   which removes ineligible products before the prompt is built, so they
   don't appear in AVAILABLE PRODUCTS at all.

2. Minoxidil was included in the available products list regardless of the
   include_minoxidil_beard flag. Only the objectives context was gated,
   leaving the product visible to the LLM which would include it based on
   recent usage history. Fix: added include_minoxidil param to
   _get_available_products() and threaded it through suggest_routine and
   suggest_batch.

Also refactored _build_products_context() to accept a pre-supplied
products list instead of calling _get_available_products() internally,
ensuring the tool handler and context text always use the same filtered set.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 23:36:15 +01:00
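The first fix above — removing ineligible products before the prompt is built — can be sketched like this. The dict-based product shape is hypothetical; the point is that a product inside its `min_interval_hours` window never appears in AVAILABLE PRODUCTS, so the LLM cannot reason the constraint away.

```python
from datetime import datetime, timedelta

def filter_products_by_interval(products, now):
    """Drop products used more recently than their min_interval_hours."""
    eligible = []
    for p in products:
        interval = p.get("min_interval_hours")
        last = p.get("last_used")
        if interval and last and now - last < timedelta(hours=interval):
            continue  # used too recently: exclude before the prompt is built
        eligible.append(p)
    return eligible

now = datetime(2026, 3, 5, 23, 0)
products = [
    {"name": "retinol", "min_interval_hours": 72,
     "last_used": now - timedelta(hours=24)},  # 24h ago, needs 72h
    {"name": "moisturizer"},  # no interval constraint
]
ok = filter_products_by_interval(products, now)
```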
7a66a7911d feat(backend): include last-used date in product LLM details 2026-03-05 16:48:49 +01:00
40d26514a1 refactor(backend): consolidate product LLM function tools 2026-03-05 16:44:03 +01:00
b99b9ed68e feat(profile): add profile settings and LLM user context 2026-03-05 15:57:21 +01:00
db3d9514d5 fix(routines): remove dose from AI routine suggestions 2026-03-05 14:19:18 +01:00
c4be7dd1be refactor(frontend): align lab results filters with products style 2026-03-05 13:14:33 +01:00
7eca2391a9 perf(frontend): trim unused Cormorant Google font weight 2026-03-05 12:53:14 +01:00
0a4ccefe28 feat(repo): expand lab results workflows across backend and frontend 2026-03-05 12:46:49 +01:00
f1b104909d docs(repo): define agent skills and frontend cookbook workflow 2026-03-05 10:49:07 +01:00
013492ec2b refactor(products): remove usage notes and contraindications fields 2026-03-05 10:11:24 +01:00
9df241a6a9 feat(frontend): localize skin active concerns with enum multi-select 2026-03-04 23:37:14 +01:00