> [!NOTE] **TL;DR**
> In this post, we cover more advanced things you can do with Gemini's structured outputs, like required fields. We also cover things you _can't_ do, like union types and tuples.

This post is a follow-up to [[Google Gemini 101 - Object Detection with Vision and Structured Outputs]]. Here, we explore some advanced and alternative setups for Gemini structured outputs, as well as some gotchas.

# Motivation: Required Fields

One of the first things you might notice about the [[Google Gemini 101 - Object Detection with Vision and Structured Outputs|previous blog post]] is that none of the keys are required. That is, the model is free to return any subset of the keys, which can lead to funky results with no localization (i.e., no bounding box), e.g.:

```json
{
  "type": "Teacup"
}
```

There are three ways to get required fields with AI Studio:

1. use raw JSON schemas
2. use protobufs
3. use the `openai` package and API, but with the Gemini endpoint

I'm rather partial to the third, so that's what we'll cover here. If you'd prefer one of the other two solutions, see the Gemini API documentation.

# Use the `openai` Package

The `openai` structured output approach enforces that _every_ key is required (link to docs).
This means that if we swap from the `google-generativeai` package to `openai`, all keys will be required:

```python
import argparse
import base64
import os
from typing import Literal

import openai
from PIL import Image
from pydantic import BaseModel

# Helper from the previous post. The relative import assumes this file lives in a
# package and is run as a module (e.g. `python -m <package>.<module> image.jpg`).
from .utils import draw_bounding_box


class TeaSet(BaseModel):
    type: Literal["Teacup", "Teapot"]
    bounding_box: list[int]


class TeaSets(BaseModel):
    tea_sets: list[TeaSet]


def main(client: openai.OpenAI, image_path: str, size: float = 1024) -> None:
    # Read the image twice: as base64 for the API, and as a PIL image for drawing
    with open(image_path, "rb") as image_file:
        image_base64 = base64.b64encode(image_file.read()).decode("utf-8")
    image = Image.open(image_path)

    # `parse` turns the Pydantic model into a strict JSON schema,
    # which marks every field as required
    response = client.beta.chat.completions.parse(
        model="gemini-2.0-flash-exp",
        n=1,
        messages=[
            {
                "role": "system",
                "content": """Find all the teacups and teapots in the image.
Return your answer as a list of JSON objects with the type and bounding box.
Return the bounding box in [ymin, xmin, ymax, xmax] format.""",
            },
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Here's the image:"},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{image_base64}"
                        },
                    },
                ],
            },
        ],
        response_format=TeaSets,
    )

    # `parsed` is Optional, so guard against None before drawing
    if tea_sets := response.choices[0].message.parsed:
        for tea_set in tea_sets.tea_sets:
            draw_bounding_box(image, tea_set.type, tea_set.bounding_box)
        image.show()


if __name__ == "__main__":
    # Point the OpenAI client at Gemini's OpenAI-compatible endpoint
    client = openai.OpenAI(
        base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
        api_key=os.getenv("GOOGLE_API_KEY"),
    )

    parser = argparse.ArgumentParser()
    parser.add_argument("image_path", help="Path to image file to process")
    args = parser.parse_args()

    main(client, args.image_path)
```

# Gemini Structured Mode Gotchas

However, if you want to do more complex things (that OpenAI _can_ do), it's not _that_ easy!
The following things aren't supported by Gemini's structured outputs:

- union types
- tuples

## Union Types

For instance, union types aren't supported:

```python
from pydantic import BaseModel


class Teacup(BaseModel):
    is_empty: bool
    bounding_box: list[int]


class Teapot(BaseModel):
    rating: int
    bounding_box: list[int]


class TeaSets(BaseModel):
    tea_sets: list[Teacup | Teapot]
```

That also means that `Optional` types (or `X | None` in Python 3.10+) aren't supported:

```python
from typing import Literal, Optional

from pydantic import BaseModel


class TeaSet(BaseModel):
    type: Literal["Teacup", "Teapot"]
    bounding_box: Optional[list[int]]


class TeaSets(BaseModel):
    tea_sets: list[TeaSet]
```

## Tuples

In the previous post, you might have thought to yourself: "Why implement the bounding box as a list of integers when there are only 4 items? Shouldn't you just use a 4-tuple?" That's a great question! Unfortunately, tuples aren't supported either. Give it a shot yourself:

```python
class TeaSet(BaseModel):
    type: Literal["Teacup", "Teapot"]
    bounding_box: tuple[int, int, int, int]


class TeaSets(BaseModel):
    tea_sets: list[TeaSet]
```

## What _is_ supported?

The [Vertex AI `Schema` reference](https://cloud.google.com/vertex-ai/docs/reference/rest/v1/Schema) lists the schema fields that are supported.
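If you'd rather catch the union, `Optional`, and tuple gotchas locally before sending a request, one option is to inspect the JSON schema Pydantic generates for your model. Below is a minimal sketch, assuming Pydantic v2, which emits `anyOf` for unions and `Optional` fields and `prefixItems` for fixed-length tuples. The helper `find_unsupported`, the `UNSUPPORTED_KEYWORDS` set, and the test models are hypothetical names for illustration; treating those keywords as unsupported is an assumption based on the gotchas above, not an official list from Google.

```python
from typing import Any, Literal, Optional, Union

from pydantic import BaseModel

# Hypothetical list: JSON Schema keywords Pydantic v2 emits for unions/Optional
# (anyOf) and fixed-length tuples (prefixItems). Treating them as unsupported
# is an assumption based on the gotchas above, not an official Gemini list.
UNSUPPORTED_KEYWORDS = {"anyOf", "oneOf", "prefixItems"}


def find_unsupported(schema: Any, path: str = "$") -> list[str]:
    """Return the path of every UNSUPPORTED_KEYWORDS key found in a JSON schema."""
    hits: list[str] = []
    if isinstance(schema, dict):
        for key, value in schema.items():
            child = f"{path}.{key}"
            if key in UNSUPPORTED_KEYWORDS:
                hits.append(child)
            # Recurse so nested models ($defs) and array items are checked too
            hits.extend(find_unsupported(value, child))
    elif isinstance(schema, list):
        for i, item in enumerate(schema):
            hits.extend(find_unsupported(item, f"{path}[{i}]"))
    return hits


class ListTeaSet(BaseModel):
    type: Literal["Teacup", "Teapot"]
    bounding_box: list[int]


class TupleTeaSet(BaseModel):
    type: Literal["Teacup", "Teapot"]
    bounding_box: tuple[int, int, int, int]


class OptionalTeaSet(BaseModel):
    type: Literal["Teacup", "Teapot"]
    bounding_box: Optional[list[int]]


class Teacup(BaseModel):
    is_empty: bool
    bounding_box: list[int]


class Teapot(BaseModel):
    rating: int
    bounding_box: list[int]


class UnionTeaSets(BaseModel):
    tea_sets: list[Union[Teacup, Teapot]]


if __name__ == "__main__":
    for model in (ListTeaSet, TupleTeaSet, OptionalTeaSet, UnionTeaSets):
        print(model.__name__, find_unsupported(model.model_json_schema()))
```

Running it should print an empty list for `ListTeaSet` and flag `$.properties.bounding_box.prefixItems`, `$.properties.bounding_box.anyOf`, and `$.properties.tea_sets.items.anyOf` for the other three models. This only checks keywords, so it is a quick lint rather than a guarantee that Gemini will accept the schema.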