fix(backend): drop response_mime_type=application/json to avoid constrained decoding

Constrained decoding is ~10x slower and consumes hidden tokens for constraint
processing, causing truncation at ~1000 chars even with 8192 max_output_tokens.
The system prompt already instructs the model to output raw minified JSON; our
NaN/markdown-fence sanitisation handles edge cases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Piotr Oleszczyk 2026-02-28 22:03:49 +01:00
parent 26069f5d66
commit 3fbf6d7041


@@ -359,8 +359,7 @@ def parse_product_text(data: ProductParseRequest) -> ProductParseResponse:
         contents=f"Extract product data from this text:\n\n{data.text}",
         config=genai_types.GenerateContentConfig(
             system_instruction=_product_parse_system_prompt(),
-            response_mime_type="application/json",
-            max_output_tokens=65536,
+            max_output_tokens=8192,
             temperature=0.0,
         ),
     )
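
With `response_mime_type` removed, the response is plain text, so the NaN and markdown-fence sanitisation mentioned in the commit message becomes the only safeguard before parsing. That code is not part of this diff. The following is a minimal sketch of what such a step might look like; `sanitize_json_text` and `parse_model_json` are hypothetical names, and mapping non-finite numbers to `None` is an assumption, not necessarily what the repository does.

```python
import json
import re

# Hypothetical helpers; the repository's real sanitiser is not shown in this diff.

# Matches a reply wrapped entirely in a markdown fence, with an optional "json" tag.
_FENCE_RE = re.compile(r"^```(?:json)?\s*(.*?)\s*```$", re.DOTALL | re.IGNORECASE)


def sanitize_json_text(raw: str) -> str:
    """Strip surrounding whitespace and a wrapping markdown fence, if present."""
    text = raw.strip()
    match = _FENCE_RE.match(text)
    if match:
        text = match.group(1).strip()
    return text


def parse_model_json(raw: str):
    """Parse model output as JSON, mapping NaN/Infinity/-Infinity to None.

    json.loads accepts these non-standard tokens by default and passes their
    names to parse_constant, so string values containing "NaN" are untouched.
    Truncated output still raises json.JSONDecodeError for the caller to handle.
    """
    text = sanitize_json_text(raw)
    return json.loads(text, parse_constant=lambda _name: None)
```

Because the model is no longer constrained to emit valid JSON, a caller would probably want to catch `json.JSONDecodeError` around `parse_model_json` and report truncated or malformed output explicitly.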