fix(backend): drop response_mime_type=application/json to avoid constrained decoding

Constrained decoding is ~10x slower and consumes hidden tokens for constraint processing, causing truncation at ~1000 chars even with 8192 max_output_tokens. The system prompt already instructs the model to output raw minified JSON; our NaN/markdown-fence sanitisation handles edge cases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-28 22:03:49 +01:00
commit 3fbf6d7041
parent 26069f5d66
1 changed files with 1 additions and 2 deletions
--- a/backend/innercontext/api/products.py
+++ b/backend/innercontext/api/products.py
@@ -359,8 +359,7 @@ def parse_product_text(data: ProductParseRequest) -> ProductParseResponse:
        contents=f"Extract product data from this text:\n\n{data.text}",
        config=genai_types.GenerateContentConfig(
            system_instruction=_product_parse_system_prompt(),
-            response_mime_type="application/json",
-            max_output_tokens=65536,
+            max_output_tokens=8192,
            temperature=0.0,
        ),
    )
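
With `response_mime_type` removed, the model's reply is plain text, so the caller has to tolerate markdown fences and bare `NaN` literals before parsing. The commit message refers to existing sanitisation for this, but that code is not part of this diff. The sketch below is only an illustration of the idea; `sanitise_model_json` and `_FENCE_RE` are hypothetical names, not the project's actual helpers.

```python
import json
import re
from typing import Any

# Hypothetical pattern: an optional ```lang opening fence and a closing ```
# wrapped around the whole reply. Not taken from the project's code.
_FENCE_RE = re.compile(r"^```[A-Za-z]*\s*(.*?)\s*```$", re.DOTALL)


def sanitise_model_json(raw: str) -> Any:
    """Parse a model reply that should be JSON but may be fenced or contain NaN."""
    text = raw.strip()

    # Unwrap a markdown code fence if the model added one despite the prompt.
    match = _FENCE_RE.match(text)
    if match:
        text = match.group(1)

    # json.loads accepts NaN/Infinity by default; parse_constant lets us map
    # them to None instead, without touching "NaN" inside string values.
    return json.loads(text, parse_constant=lambda _name: None)
```

For example, `sanitise_model_json('```json\n{"price":NaN}\n```')` returns `{"price": None}`. Using `parse_constant` rather than a textual replace is a design choice of this sketch: it avoids rewriting the substring `NaN` when it appears inside a legitimate string field.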