> [!NOTE] **TL;DR**
> In this post, we cover more advanced things you can do with Gemini's structured outputs, like required fields. We also cover things you _can't_ do, like union types and tuples.
This is a follow-up to [[Google Gemini 101 - Object Detection with Vision and Structured Outputs]], where we explore some advanced/alternative setups for Gemini structured outputs, as well as some gotchas.
# Motivation: Required Fields
One of the first things you'll notice from the [[Google Gemini 101 - Object Detection with Vision and Structured Outputs|previous blog post]] is that none of the keys are required. That is, the model is free to return any subset of the keys, which can lead to funky results with no localization at all, e.g.:
```json
{
"type": "Teacup"
}
```
There are three ways to get required fields with AI Studio:
1. use raw JSON schemas
2. use protobufs
3. use the `openai` package and API, but pointed at the Gemini endpoint
I'm rather partial to the third, so that's what we're going to cover here. However, feel free to follow the above links to the documentation for the other solutions, if you'd prefer.
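Before moving on, it helps to see what "required" looks like at the schema level. In JSON Schema terms, it's just a `required` array listing the keys that must be present. Here's a minimal sketch that uses Pydantic (v2 assumed) purely to print that schema for the `TeaSet` model from the previous post. It doesn't call Gemini at all, and it isn't meant to show how any of the three options above are wired up.

```python
import json
from typing import Literal

from pydantic import BaseModel


class TeaSet(BaseModel):
    type: Literal["Teacup", "Teapot"]
    bounding_box: list[int]


# Pydantic v2 exposes the JSON Schema for a model via model_json_schema().
schema = TeaSet.model_json_schema()

# Fields without a default value end up in the schema's "required" array.
print(json.dumps(schema, indent=2))
print(schema["required"])
```

Any field you give a default value drops out of that `required` array, which is a handy way to check what you're actually asking the model for.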
# Use the `openai` Package
The `openai` structured output approach enforces that _every_ key is required (link to docs). This means that if we swap from the `google-generativeai` package to `openai`, all keys will be required:
```python
from typing import Literal
from pydantic import BaseModel
from PIL import Image  # Pillow; needed for Image.open() below
from .utils import draw_bounding_box
import argparse
import openai
import base64
import os


class TeaSet(BaseModel):
    type: Literal["Teacup", "Teapot"]
    bounding_box: list[int]


class TeaSets(BaseModel):
    tea_sets: list[TeaSet]


def main(client: openai.OpenAI, image_path: str, size: float = 1024) -> None:
    # Encode the image as base64 so it can be sent inline as a data URL.
    with open(image_path, "rb") as image_file:
        image_base64 = base64.b64encode(image_file.read()).decode("utf-8")
    image = Image.open(image_path)

    response = client.beta.chat.completions.parse(
        model="gemini-2.0-flash-exp",
        n=1,
        messages=[
            {
                "role": "system",
                "content": """Find all the teacups and teapots in the image.
Return your answer as a list of JSON objects with the type and bounding box.
Return the bounding box in [ymin, xmin, ymax, xmax] format.""",
            },
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Here's the image:"},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{image_base64}"
                        },
                    },
                ],
            },
        ],
        # Passing the Pydantic model makes every key required.
        response_format=TeaSets,
    )

    # `parsed` is None if the response could not be parsed into TeaSets.
    if tea_sets := response.choices[0].message.parsed:
        for tea_set in tea_sets.tea_sets:
            draw_bounding_box(image, tea_set.type, tea_set.bounding_box)
        image.show()


if __name__ == "__main__":
    # Point the OpenAI client at Gemini's OpenAI-compatible endpoint.
    client = openai.OpenAI(
        base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
        api_key=os.getenv("GOOGLE_API_KEY"),
    )
    parser = argparse.ArgumentParser()
    parser.add_argument("image_path", help="Path to image file to process")
    args = parser.parse_args()
    main(client, args.image_path)
```
# Gemini Structured Mode Gotchas
However, if you want to do more complex things (that OpenAI _can_ do), it's not *that* easy! The following things aren't supported by Gemini's structured outputs:
- union types
- tuples
## Union Types
For instance, union types aren't supported:
```python
class Teacup(BaseModel):
    is_empty: bool
    bounding_box: list[int]


class Teapot(BaseModel):
    rating: int
    bounding_box: list[int]


class TeaSets(BaseModel):
    tea_sets: list[Teacup | Teapot]
```
That also means that `Optional` types (or `| None` in more recent Python) aren't supported, since an optional field is just a union with `None`:
```python
from typing import Optional


class TeaSet(BaseModel):
    type: Literal["Teacup", "Teapot"]
    bounding_box: Optional[list[int]]


class TeaSets(BaseModel):
    tea_sets: list[TeaSet]
```
## Tuples
In the previous post, you might have thought to yourself: "Why implement bounding box as a list of integers when there are only 4 items? Shouldn't you just use a 4-tuple?" That's a great question!
Unfortunately, it's not supported. Give it a shot yourself:
```python
class TeaSet(BaseModel):
    type: Literal["Teacup", "Teapot"]
    bounding_box: tuple[int, int, int, int]


class TeaSets(BaseModel):
    tea_sets: list[TeaSet]
```
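One possible workaround, if you still want the "exactly four items" guarantee: keep `list[int]` in the schema and check the length on the Python side instead. The sketch below assumes Pydantic v2. It compares the two schemas (the tuple version emits `prefixItems`, which I'm guessing is the unsupported part) and adds a `field_validator` that rejects boxes with the wrong number of coordinates. The validator doesn't constrain what the model generates; it only makes parsing fail loudly when the box is malformed.

```python
from typing import Literal

from pydantic import BaseModel, ValidationError, field_validator


class TupleTeaSet(BaseModel):
    type: Literal["Teacup", "Teapot"]
    bounding_box: tuple[int, int, int, int]


class ListTeaSet(BaseModel):
    type: Literal["Teacup", "Teapot"]
    bounding_box: list[int]

    @field_validator("bounding_box")
    @classmethod
    def check_four_coordinates(cls, value: list[int]) -> list[int]:
        # Runs after parsing; it does not change the generated JSON schema.
        if len(value) != 4:
            raise ValueError(f"expected 4 coordinates, got {len(value)}")
        return value


tuple_box = TupleTeaSet.model_json_schema()["properties"]["bounding_box"]
list_box = ListTeaSet.model_json_schema()["properties"]["bounding_box"]

# The tuple field uses "prefixItems"; the list field uses plain "items".
print(sorted(tuple_box))
print(sorted(list_box))

try:
    ListTeaSet(type="Teacup", bounding_box=[10, 20, 30])
except ValidationError as error:
    print(error.errors()[0]["msg"])
```

Since `response_format=TeaSets` parses the response into your Pydantic models, a validator like this should run as part of that parsing step, so a three-item box would surface as an error rather than a silently wrong drawing.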
## What _is_ supported?
See the `Schema` reference in the Vertex AI docs: https://cloud.google.com/vertex-ai/docs/reference/rest/v1/Schema