fix(llm): switch from thinking_budget to thinking_level=LOW for Gemini 3
gemini-flash-latest resolves to gemini-3-flash-preview, which uses thinking_level instead of the legacy thinking_budget (mixing both returns HTTP 400). Use LOW to reduce thinking overhead while keeping basic reasoning, replacing the now-incompatible thinking_budget=0.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
parent ada5f2a93b
commit cc657998e8
1 changed file with 8 additions and 3 deletions
@@ -46,11 +46,16 @@ def call_gemini(
     with suppress(Exception):
         user_input = str(contents)
 
-    # Disable thinking by default — Gemini 2.5 Flash thinking tokens count toward
-    # max_output_tokens, leaving too little room for actual JSON output.
+    # Limit thinking by default — Gemini 3 Flash defaults to "high" thinking which
+    # consumes most of the token budget before generating actual output.
+    # Use "low" to reduce latency while keeping basic reasoning intact.
     if config.thinking_config is None:
         config = config.model_copy(
-            update={"thinking_config": genai_types.ThinkingConfig(thinking_budget=0)}
+            update={
+                "thinking_config": genai_types.ThinkingConfig(
+                    thinking_level=genai_types.ThinkingLevel.LOW
+                )
+            }
         )
 
     start = time.monotonic()
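The incompatibility the commit message describes can be illustrated with a standalone sketch: Gemini 3 models take a thinking level while legacy 2.5 models take a thinking budget, and a request carrying both fields is rejected with HTTP 400, so exactly one should be emitted. The helper name, the model-prefix check, and the camelCase REST field names below are assumptions for illustration, not part of this repository:

```python
def thinking_config(model: str, level: str = "LOW", budget: int = 0) -> dict:
    """Build the thinkingConfig portion of a generateContent request body.

    Emits thinkingLevel for Gemini 3 models and thinkingBudget for legacy
    models, never both, since mixing the two fields returns HTTP 400.
    """
    if model.startswith("gemini-3"):
        return {"thinkingLevel": level}
    return {"thinkingBudget": budget}


# gemini-flash-latest now resolves to a Gemini 3 preview model:
print(thinking_config("gemini-3-flash-preview"))  # {'thinkingLevel': 'LOW'}
print(thinking_config("gemini-2.5-flash"))        # {'thinkingBudget': 0}
```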