feat(api): add short_id column for consistent LLM UUID handling

Resolves validation failures where LLM fabricated full UUIDs from 8-char
prefixes shown in context, causing 'unknown product_id' errors.

Root Cause Analysis:
- Context showed 8-char short IDs: '77cbf37c' (Phase 2 optimization)
- Function tool returned full UUIDs: '77cbf37c-3830-4927-...'
- LLM saw BOTH formats, got confused, invented UUIDs for final response
- Validators rejected fabricated UUIDs as unknown products

Solution: Consistent 8-char short_id across LLM boundary:
1. Database: New short_id column (8 chars, unique, indexed)
2. Context: Shows short_id (was: str(id)[:8])
3. Function tools: Return short_id (was: full UUID)
4. Translation layer: Expands short_id → UUID before validation
5. Database: Stores full UUIDs (no schema change for existing data)

Changes:
- Added products.short_id column with unique constraint + index
- Migration populates from UUID prefix, handles collisions via regeneration
- Product model auto-generates short_id for new products
- LLM contexts use product.short_id consistently
- Function tools return product.short_id
- Added _expand_product_id() translation layer in routines.py
- Integrated expansion in suggest_routine() and suggest_batch()
- Validators work with full UUIDs (no changes needed)

Benefits:
 LLM never sees full UUIDs, no format confusion
 Maintains Phase 2 token optimization (~85% reduction)
 O(1) indexed short_id lookups vs O(n) pattern matching
 Unique constraint prevents collisions at DB level
 Clean separation: 8-char for LLM, 36-char for application

From production error:
  Step 1: unknown product_id 77cbf37c-3830-4927-9669-07447206689d
  (LLM invented the last 28 characters)

Now resolved: LLM uses '77cbf37c' consistently, translation layer
expands to real UUID before validation.
This commit is contained in:
Piotr Oleszczyk 2026-03-06 10:58:26 +01:00
parent 710b53e471
commit 5bb2ea5f08
8 changed files with 176 additions and 14 deletions

View file

@ -142,6 +142,12 @@ class Product(ProductBase, table=True):
__domains__: ClassVar[frozenset[Domain]] = frozenset({Domain.SKINCARE})
id: UUID = Field(default_factory=uuid4, primary_key=True)
short_id: str = Field(
max_length=8,
unique=True,
index=True,
description="8-character short ID for LLM contexts (first 8 chars of UUID)",
)
# Override 9 JSON fields with sa_column (only in table model)
inci: list[str] = Field(
@ -214,6 +220,11 @@ class Product(ProductBase, table=True):
if self.price_currency is not None:
self.price_currency = self.price_currency.upper()
# Auto-generate short_id from UUID if not set
# Migration handles existing products; this is for new products
if not hasattr(self, "short_id") or not self.short_id:
self.short_id = str(self.id)[:8]
return self
def to_llm_context(