Gemini API I/O updates – Google Developers Blog

The Gemini API offers developers a streamlined way to build innovative applications with cutting-edge generative AI models. Google AI Studio simplifies the process of testing all the API capabilities, allowing for rapid prototyping and experimentation with text, image, and even video prompts. When developers want to test and build at scale, they can leverage all the capabilities available through the Gemini API.


New models available through the API

Gemini 2.5 Flash Preview – We’ve added a new 2.5 Flash preview (gemini-2.5-flash-preview-05-20) which is better than the previous preview at reasoning, code, and long context. This version of 2.5 Flash is currently #2 on the LMarena leaderboard, behind only 2.5 Pro. We’ve also improved Flash cost-efficiency with this latest update, reducing the number of tokens needed for the same performance, resulting in 22% efficiency gains on our evals. Our goal is to keep improving based on your feedback, and make both generally available soon.

Gemini 2.5 Pro and Flash text-to-speech (TTS) – We also announced 2.5 Pro and Flash previews for text-to-speech (TTS) that support native audio output for both single and multiple speakers, across 24 languages. With these models, you can control TTS expression and style, creating rich audio output. With multispeaker, you can generate conversations with multiple distinct voices for dynamic interactions.

Gemini 2.5 Flash native audio dialog – In preview, this model is available via the Live API to generate natural-sounding voices for conversation, in over 30 distinct voices and 24+ languages. We’ve also added proactive audio so the model can distinguish between the speaker and background conversations, so it knows when to respond. In addition, the model responds appropriately to a user’s emotional expression and tone. A separate thinking model enables more complex queries. This now makes it possible for you to build conversational AI agents and experiences that feel more intuitive and natural, like enhancing call center interactions, developing dynamic personas, crafting unique voice characters, and more.

Lyria RealTime – Live music generation is now available in the Gemini API and Google AI Studio to create a continuous stream of instrumental music using text prompts. With Lyria RealTime, we use WebSockets to establish a persistent, real-time communication channel. The model continuously produces music in small, flowing chunks and adapts based on inputs. Imagine adding a responsive soundtrack to your app or designing a new type of musical instrument! Try out Lyria RealTime with the PromptDJ-MIDI app in Google AI Studio.

Gemini 2.5 Pro Deep Think – We’re also testing an experimental reasoning mode for 2.5 Pro. We’ve seen incredible performance with these Deep Thinking capabilities for highly complex math and coding prompts. We look forward to making it broadly available for you to experiment with soon.

Gemma 3n – Gemma 3n is a generative AI open model optimized for use in everyday devices, such as phones, laptops, and tablets. It can handle text, audio and vision inputs. This model includes innovations in parameter-efficient processing, including Per-Layer Embedding (PLE) parameter caching and a MatFormer model architecture that provides the flexibility to reduce compute and memory requirements.


New capabilities in the API

Thought summaries

To help developers understand and debug model responses, we’ve added thought summaries for 2.5 Pro and Flash in the Gemini API. We take the model’s raw thoughts and synthesize them into a helpful summary with headers, relevant details and tool calls. The raw chain-of-thoughts in Google AI Studio has also been updated with the new thought summaries.


Thinking budgets

We launched 2.5 Flash with thinking budgets to give developers control over how much models think, so they can balance performance, latency, and cost for the apps they’re building. We will be extending this capability to 2.5 Pro soon.

from google import genai
from google.genai import types

client = genai.Client(api_key="GOOGLE_API_KEY")
prompt = "What is the sum of the first 50 prime numbers?"
response = client.models.generate_content(
  model="gemini-2.5-flash-preview-05-20",
  contents=prompt,
  config=types.GenerateContentConfig(
    # Cap thinking at 1024 tokens and ask for thought summaries in the response
    thinking_config=types.ThinkingConfig(
      thinking_budget=1024,
      include_thoughts=True
    )
  )
)

# Parts flagged as thoughts carry the summary; the rest carry the answer
for part in response.candidates[0].content.parts:
  if not part.text:
    continue
  if part.thought:
    print("Thought summary:")
    print(part.text)
    print()
  else:
    print("Answer:")
    print(part.text)
    print()

Python

Sample code to enable and retrieve thought summaries without streaming, returning a final thought summary with the response.

New URL Context tool

We added a new experimental tool, URL context, to retrieve more context from links that you provide. This can be used by itself or in conjunction with other tools such as Grounding with Google Search. This tool is a key building block for developers looking to build their own version of research agents with the Gemini API.

from google import genai
from google.genai.types import GenerateContentConfig, GoogleSearch, Tool, UrlContext

client = genai.Client()
model_id = "gemini-2.5-flash-preview-05-20"

# Enable URL context together with Grounding with Google Search
tools = []
tools.append(Tool(url_context=UrlContext()))
tools.append(Tool(google_search=GoogleSearch()))

response = client.models.generate_content(
    model=model_id,
    contents="Give me a three-day events schedule based on YOUR_URL. Also let me know what needs to be taken care of considering weather and commute.",
    config=GenerateContentConfig(
        tools=tools,
        response_modalities=["TEXT"],
    )
)

for each in response.candidates[0].content.parts:
    print(each.text)
# get URLs retrieved for context
print(response.candidates[0].url_context_metadata)

Python

Sample code for Grounding with Google Search and URL Context

Computer use tool

We’re bringing Project Mariner’s browser control capabilities to the Gemini API via a new computer use tool. To make it easier for developers to use this tool, we’re enabling the creation of Cloud Run instances optimally configured for running browser control agents via one click from Google AI Studio. We’ve begun early testing with companies like Automation Anywhere, UiPath and Browserbase. Their valuable feedback will be instrumental in refining its capabilities for a broader experimental developer release this summer.


Improvements to structured outputs

The Gemini API now has broader support for JSON Schema, including much-requested keywords such as “$ref” (for references) and those enabling the definition of tuple-like structures (e.g., prefixItems).
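
As a rough illustration of the two keywords named above, the sketch below builds a JSON Schema as a plain Python dictionary. The schema (a route whose points reuse one shared definition) and the `resolve_ref` helper are hypothetical and only show what the keywords mean; this post does not confirm which other keywords the API accepts or how the schema is attached to a request, so check the developer docs for that.

```python
import json

# Hypothetical schema for illustration only; the names are not from the Gemini docs.
route_schema = {
    "type": "object",
    "$defs": {
        # Tuple-like structure: a [latitude, longitude] pair, in that order.
        "point": {
            "type": "array",
            "prefixItems": [
                {"type": "number"},  # latitude
                {"type": "number"},  # longitude
            ],
            "minItems": 2,
            "maxItems": 2,
        }
    },
    "properties": {
        "name": {"type": "string"},
        # "$ref" reuses the shared definition instead of repeating it.
        "start": {"$ref": "#/$defs/point"},
        "waypoints": {"type": "array", "items": {"$ref": "#/$defs/point"}},
    },
    "required": ["name", "start"],
}


def resolve_ref(schema: dict, ref: str) -> dict:
    """Follow a local reference such as '#/$defs/point' inside the schema."""
    node = schema
    for key in ref.lstrip("#/").split("/"):
        node = node[key]
    return node


point_schema = resolve_ref(route_schema, route_schema["properties"]["start"]["$ref"])
print(json.dumps(point_schema["prefixItems"], indent=2))
```

Defining `point` once and referencing it twice keeps the two positions in the tuple consistent wherever a coordinate appears.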


Video understanding improvements

The Gemini API now allows YouTube video URLs or video uploads to be added to a prompt, enabling users to summarize, translate, or analyze the video content. With this recent update, the API supports video clipping, enabling flexibility in analyzing specific parts of a video. This is particularly useful for videos longer than 8 hours. We have also added support for dynamic frames per second (FPS), allowing 60 FPS for videos like games or sports where speed is critical, and 0.1 FPS for videos where speed is less of a priority. To help users save tokens, we have also introduced support for 3 different video resolutions: high (720p), standard (480p), and low (360p).
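
To get a feel for how much clipping and the FPS setting change the amount of video the model has to look at, here is a back-of-the-envelope sketch. It assumes frames are sampled uniformly at the requested rate over the clipped range; the `sampled_frames` helper and the clip boundaries are hypothetical, and the calculation says nothing about actual token counts or billing.

```python
def sampled_frames(start_s: float, end_s: float, fps: float) -> int:
    """Approximate number of frames sampled from a clip at the given rate."""
    if end_s < start_s:
        raise ValueError("end_s must not be earlier than start_s")
    if fps <= 0:
        raise ValueError("fps must be positive")
    return round((end_s - start_s) * fps)


# Hypothetical clip: seconds 1250 to 1570 of a longer video (320 seconds).
clip_start, clip_end = 1250, 1570

fast_action = sampled_frames(clip_start, clip_end, fps=60)    # e.g. sports
mostly_static = sampled_frames(clip_start, clip_end, fps=0.1)  # e.g. a lecture

print(f"60 FPS:  {fast_action} frames")
print(f"0.1 FPS: {mostly_static} frames")
```

Under that assumption, the same 320-second clip yields 19,200 frames at 60 FPS but only 32 at 0.1 FPS, which is why picking the lowest rate that still captures the motion you care about matters.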


Async function calling

The cascaded architecture in the Live API now supports asynchronous function calling, ensuring user conversations remain smooth and uninterrupted. This means your Live agent can continue generating responses even while it is busy executing functions in the background, by simply adding the behavior field to the function definition and setting it to NON-BLOCKING. Read more about this in the Gemini API developer documentation.
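
A minimal sketch of what such function definitions might look like as plain Python dictionaries. The function names and descriptions are hypothetical, and the exact spelling of the enum value (written here as `NON_BLOCKING`, with an underscore) is an assumption to verify against the developer documentation before use.

```python
# Hypothetical tool declarations; names and descriptions are illustrative only.
turn_on_the_lights = {
    "name": "turn_on_the_lights",
    "description": "Turn on the lights in the room.",
    # Assumed enum spelling: lets the agent keep talking while this runs.
    "behavior": "NON_BLOCKING",
}

get_room_temperature = {
    "name": "get_room_temperature",
    "description": "Read the current room temperature.",
    # No behavior field: the call is treated as a regular, blocking one.
}

tools = [{"function_declarations": [turn_on_the_lights, get_room_temperature]}]

# List which declared functions will run in the background.
non_blocking = [
    decl["name"]
    for tool in tools
    for decl in tool["function_declarations"]
    if decl.get("behavior") == "NON_BLOCKING"
]
print(non_blocking)
```

Marking only the slow or side-effect-only functions this way keeps quick lookups synchronous, so the agent can still wait for values it needs before answering.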


Batch API

We’re also testing a new API, which lets you easily batch up your requests and get them back within a maximum 24-hour turnaround time. The API will come at half the price of the interactive API and with much higher rate limits. We hope to roll that out more widely later this summer.


Start building

That’s a wrap on I/O for this year! With the Gemini API and Google AI Studio, you can turn your ideas into reality, whether you’re building conversational AI agents with natural-sounding audio or developing tools to analyze and generate code. As always, check out the Gemini API developer docs for all the latest code samples and more.

Explore this announcement and all Google I/O 2025 updates on io.google.