Structured Output: Getting JSON, XML, and Tables from AI

One of the most common challenges in AI application development is getting models to produce output in a specific, parseable format. Whether you need valid JSON for an API response, a markdown table for a report, or structured XML for data interchange, the techniques for reliable structured output are essential knowledge for any developer working with language models.

The Structured Output Challenge

Language models naturally produce free-form text. Getting them to consistently produce valid structured data requires specific prompting strategies. The core challenge is that models generate tokens one at a time, and a single misplaced comma, missing bracket, or wrong data type can make an entire JSON response unparseable.

Fortunately, modern AI APIs are increasingly offering native structured output features. OpenAI's JSON mode, Anthropic's tool use, and various open-source libraries like Instructor and Outlines provide programmatic ways to enforce structure. But even with these tools, understanding the prompting principles behind structured output remains valuable.

"Structured output turns AI from a conversational partner into a data pipeline component. It is the bridge between natural language understanding and programmatic automation."

Getting Reliable JSON Output

JSON is the most common structured format requested from AI models. Here is a proven approach for getting valid JSON every time:

Provide the Exact Schema

Show the model the exact JSON structure you expect, including data types and descriptions for each field:

Extract information from the following text and return it as valid
JSON matching this exact schema:

{
  "product_name": "string - the name of the product",
  "price": "number - price in USD",
  "category": "string - one of: Electronics, Clothing, Food, Home",
  "in_stock": "boolean - whether the item is available",
  "features": ["string - array of key product features"],
  "rating": "number - rating from 1.0 to 5.0, null if not mentioned"
}

Return ONLY the JSON object. No markdown code fences, no
explanatory text before or after.

Use Few-Shot Examples

Providing one or two examples of the input-output mapping dramatically improves JSON consistency. The examples serve as both a format demonstration and a calibration for the level of detail expected in each field.

Key Takeaway

Always provide the complete JSON schema with data types and allowed values for each field. Ambiguity in the schema produces inconsistency in the output.

Generating Markdown Tables

Markdown tables are useful for reports, documentation, and content that will be rendered in markdown-compatible systems:

Create a comparison table with the following specifications:
- Columns: Feature, Free Plan, Pro Plan, Enterprise Plan
- Rows: Storage, Users, API Calls, Support, Price
- Use checkmarks and crosses for boolean features
- Format prices as monthly costs in USD
- Output as a properly formatted markdown table

XML and Custom Formats

For XML output, the same principles apply: provide a template showing the exact element names, attributes, and nesting structure you need. For custom formats like YAML, TOML, or proprietary data formats, include a clear example of the expected output alongside a description of the schema rules.

API-Level Structured Output

Modern AI APIs provide built-in mechanisms for structured output that are more reliable than prompt-only approaches:

OpenAI JSON Mode: Set response_format: {"type": "json_object"} to guarantee valid JSON output.
OpenAI Structured Outputs: Provide a JSON Schema and the model is constrained to produce conforming output.
Anthropic Tool Use: Define tools with input schemas to extract structured data from conversations.
Libraries like Instructor: Python libraries that combine Pydantic models with LLM calls for type-safe structured output with automatic validation and retry.

Handling Edge Cases and Validation

Even with the best prompting techniques, structured output can fail. Build your applications to handle these common issues:

Always validate: Parse the output and validate it against your expected schema before using it downstream.
Implement retries: If validation fails, send the malformed output back to the model with a correction request.
Set defaults: Define default values for optional fields that the model might omit.
Handle null values: Explicitly tell the model how to represent missing or unknown data in the structure.
Strip wrapper text: Models sometimes add explanatory text around the structured output. Use regex or string parsing to extract just the structured portion.

Key Takeaway

Reliable structured output requires a three-layer approach: good prompting for the first attempt, API-level constraints where available, and programmatic validation with retry logic as a safety net.

The Structured Output Challenge

Getting Reliable JSON Output

Provide the Exact Schema

Use Few-Shot Examples

Key Takeaway

Generating Markdown Tables

XML and Custom Formats

API-Level Structured Output

Handling Edge Cases and Validation

Key Takeaway

Related Posts

Prompt Engineering for Code: Getting Better Code from AI

Prompt Engineering for Data Analysis and Visualization

Prompt Chaining: Breaking Complex Tasks into Steps