innercontext/backend/innercontext/models/ai_log.py
Piotr Oleszczyk 3bf19d8acb feat(api): add enhanced token metrics logging for Gemini API
Add comprehensive token breakdown logging to understand MAX_TOKENS behavior
and verify documentation claims about thinking tokens.

New Fields Added to ai_call_logs:
- thoughts_tokens: Thinking tokens (thoughtsTokenCount) - documented as
  separate from output budget
- tool_use_prompt_tokens: Tool use overhead (toolUsePromptTokenCount)
- cached_content_tokens: Cached content tokens (cachedContentTokenCount)

Purpose:
Investigate a token-counting mystery from production logs, where:
  prompt_tokens: 4400
  completion_tokens: 589
  total_tokens: 8489  ← Should be 4400 + 589 = 4989, missing 3500!

According to Gemini API docs (Polish translation):
  totalTokenCount = promptTokenCount + candidatesTokenCount
  (thoughts NOT included in total)

But production logs show a 3500-token gap. The new logging will reveal:
1. Are thinking tokens actually separate from max_output_tokens limit?
2. Where did the 3500 missing tokens go?
3. Does MEDIUM thinking level consume output budget despite docs?
4. Are tool use tokens included in total but not shown separately?
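The arithmetic behind the mystery can be made explicit. If thoughtsTokenCount turns out to equal the gap, the totals reconcile exactly (a sketch of the hypothesis, using the production numbers above):

```python
# Production numbers from the logs quoted above.
prompt_tokens = 4400
completion_tokens = 589
total_tokens = 8489

# Per the documented formula, total should be prompt + candidates.
documented_total = prompt_tokens + completion_tokens  # 4989

# The unexplained remainder -- the hypothesis is that this is thoughtsTokenCount.
gap = total_tokens - documented_total
print(gap)  # 3500
```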

Changes:
- Added 3 new integer columns to ai_call_logs (nullable)
- Enhanced llm.py to capture all usage_metadata fields
- Used getattr() for safe access (fields may not exist in all responses)
- Database migration: 7e6f73d1cc95

This will provide complete data for future LLM calls to diagnose:
- MAX_TOKENS failures
- Token budget behavior
- Thinking token costs
- Tool use overhead
2026-03-06 12:17:13 +01:00


from datetime import datetime
from typing import Any, ClassVar
from uuid import UUID, uuid4

from sqlalchemy import JSON, Column
from sqlmodel import Field, SQLModel

from .base import utc_now
from .domain import Domain


class AICallLog(SQLModel, table=True):
    __tablename__ = "ai_call_logs"
    __domains__: ClassVar[frozenset[Domain]] = frozenset()

    id: UUID = Field(default_factory=uuid4, primary_key=True)
    created_at: datetime = Field(default_factory=utc_now, nullable=False)
    endpoint: str = Field(index=True)
    model: str
    system_prompt: str | None = Field(default=None)
    user_input: str | None = Field(default=None)
    response_text: str | None = Field(default=None)
    prompt_tokens: int | None = Field(default=None)
    completion_tokens: int | None = Field(default=None)
    total_tokens: int | None = Field(default=None)
    duration_ms: int | None = Field(default=None)
    finish_reason: str | None = Field(default=None)
    tool_trace: dict[str, Any] | None = Field(
        default=None,
        sa_column=Column(JSON, nullable=True),
    )
    success: bool = Field(default=True, index=True)
    error_detail: str | None = Field(default=None)

    # Validation fields (Phase 1)
    validation_errors: list[str] | None = Field(
        default=None,
        sa_column=Column(JSON, nullable=True),
    )
    validation_warnings: list[str] | None = Field(
        default=None,
        sa_column=Column(JSON, nullable=True),
    )
    auto_fixed: bool = Field(default=False)

    # Reasoning capture (Phase 2)
    reasoning_chain: str | None = Field(
        default=None,
        description="LLM reasoning/thinking process (MEDIUM thinking level)",
    )

    # Enhanced token metrics (Phase 2 - Gemini API detailed breakdown)
    thoughts_tokens: int | None = Field(
        default=None,
        description="Thinking tokens (thoughtsTokenCount) - separate from output budget",
    )
    tool_use_prompt_tokens: int | None = Field(
        default=None,
        description="Tool use prompt tokens (toolUsePromptTokenCount)",
    )
    cached_content_tokens: int | None = Field(
        default=None,
        description="Cached content tokens (cachedContentTokenCount)",
    )