# Form Schema

> Pre-define field locations for faster, more consistent PDF form filling

Form schemas let you define exactly where form fields are located in a PDF, what type they are, and how they should be filled. Instead of relying on Edit to detect fields each time, you provide the field definitions upfront.

This matters for two reasons: **speed** and **consistency**. With a form schema, Edit skips field detection and description generation, so forms process significantly faster. And because the same fields are targeted on every run, field locations and types stay fixed across thousands of form fills.

***

## The Workflow

1. **Run Edit once without a `form_schema`** to let Reducto detect all fields
2. **Save the returned `form_schema`** from the response
3. **Use that schema for subsequent fills** of the same form type

<CodeGroup>
  ```python Python theme={null}
  from reducto import Reducto
  import json

  client = Reducto()

  # First run: let Edit detect fields
  result = client.edit.run(
      document_url="https://example.com/w9-blank.pdf",
      edit_instructions="Fill with: Name: Test Corp, EIN: 12-3456789"
  )

  # Save the detected schema (convert Pydantic objects to dicts)
  schema_dicts = [field.model_dump() for field in result.form_schema]
  with open("w9_schema.json", "w") as f:
      json.dump(schema_dicts, f)

  # Subsequent runs: much faster with saved schema
  with open("w9_schema.json") as f:
      saved_schema = json.load(f)

  result = client.edit.run(
      document_url="https://example.com/w9-blank.pdf",
      edit_instructions="Fill with: Name: Acme Inc, EIN: 98-7654321",
      form_schema=saved_schema
  )
  ```

  ```javascript Node.js theme={null}
  import Reducto from 'reductoai';
  import fs from 'fs';

  const client = new Reducto();

  // First run: let Edit detect fields
  const result = await client.edit.run({
    document_url: 'https://example.com/w9-blank.pdf',
    edit_instructions: 'Fill with: Name: Test Corp, EIN: 12-3456789'
  });

  // Save the detected schema
  fs.writeFileSync('w9_schema.json', JSON.stringify(result.form_schema, null, 2));

  // Subsequent runs: much faster with saved schema
  const savedSchema = JSON.parse(fs.readFileSync('w9_schema.json', 'utf-8'));

  const result2 = await client.edit.run({
    document_url: 'https://example.com/w9-blank.pdf',
    edit_instructions: 'Fill with: Name: Acme Inc, EIN: 98-7654321',
    form_schema: savedSchema
  });
  ```

  ```bash cURL theme={null}
  # First run: detect fields and save schema
  curl -s -X POST https://platform.reducto.ai/edit \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "document_url": "https://example.com/w9-blank.pdf",
      "edit_instructions": "Fill with: Name: Test Corp, EIN: 12-3456789"
    }' | jq '.form_schema' > w9_schema.json

  # Subsequent runs: much faster with saved schema
  SCHEMA=$(cat w9_schema.json)
  curl -X POST https://platform.reducto.ai/edit \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "document_url": "https://example.com/w9-blank.pdf",
      "edit_instructions": "Fill with: Name: Acme Inc, EIN: 98-7654321",
      "form_schema": '"$SCHEMA"'
    }'
  ```
</CodeGroup>

The first call runs field detection and generates descriptions. Subsequent calls with the schema skip those steps entirely.

| Scenario              | Pipeline                                  |
| --------------------- | ----------------------------------------- |
| Without `form_schema` | Detection → Context → Descriptions → Fill |
| With `form_schema`    | Fill only                                 |
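
The detect-once, reuse-afterwards pattern can be wrapped in a small caching helper. The sketch below is a minimal illustration and not part of the Reducto SDK: `load_or_detect_schema` and its `detect` callback are hypothetical names, and `detect` stands in for whatever code performs the first Edit run and returns the schema as a list of plain dicts.

```python
import json
from pathlib import Path
from typing import Callable


def load_or_detect_schema(path: str, detect: Callable[[], list]) -> list:
    """Return the cached schema at `path`, or call `detect` once and cache it.

    Hypothetical helper: `detect` is any zero-argument callable that returns
    the form schema as a list of plain dicts.
    """
    cache = Path(path)
    if cache.exists():
        # A saved schema exists, so detection is not needed.
        return json.loads(cache.read_text())

    schema = detect()
    cache.write_text(json.dumps(schema, indent=2))
    return schema
```

With the Python client shown earlier, `detect` could wrap the first `client.edit.run(...)` call and convert the returned fields with `model_dump()`. Keeping one cache file per form type avoids reusing a schema on a layout it was not detected from.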

***

## Schema Structure

A form schema is an array of field definitions:

```python theme={null}
form_schema = [
    {
        "bbox": {
            "left": 0.227,      # Distance from left edge (0-1)
            "top": 0.144,       # Distance from top edge (0-1)
            "width": 0.15,      # Width as fraction of page
            "height": 0.025,    # Height as fraction of page
            "page": 1           # Page number (1-indexed)
        },
        "description": "Bank Routing/ABA Number",
        "type": "text",
        "fill": True,           # Let LLM determine value (default)
        "value": None           # No fixed value
    },
    {
        "bbox": {"left": 0.432, "top": 0.144, "width": 0.2, "height": 0.025, "page": 1},
        "description": "Bank Name",
        "type": "text",
        "value": "Wells Fargo"  # Fixed value bypasses LLM
    },
    {
        "bbox": {"left": 0.227, "top": 0.54, "width": 0.02, "height": 0.02, "page": 1},
        "description": "Domestic wire checkbox",
        "type": "checkbox"
    }
]
```

### Field Properties

| Property      | Type    | Required | Description                                                                        |
| ------------- | ------- | -------- | ---------------------------------------------------------------------------------- |
| `bbox`        | object  | Yes      | Normalized `left`, `top`, `width`, `height` (0-1, from top-left) and 1-indexed `page` |
| `description` | string  | Yes      | Used by the LLM to map instructions to this field                                  |
| `type`        | string  | Yes      | `text`, `checkbox`, `dropdown`, or `barcode`                                       |
| `fill`        | boolean | No       | Whether to fill this field (default: `true`)                                       |
| `value`       | string  | No       | Fixed value that bypasses the LLM                                                  |
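
A hand-edited schema can be checked against the table above before it is sent. The following is a minimal client-side sketch: `validate_field` is a hypothetical helper, the rules are taken only from the property table on this page, and the API's own validation may be stricter or different.

```python
ALLOWED_TYPES = {"text", "checkbox", "dropdown", "barcode"}
REQUIRED_KEYS = ("bbox", "description", "type")
BBOX_KEYS = ("left", "top", "width", "height")


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_field(field: dict) -> list:
    """Return a list of problems found in one field definition (empty if none)."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in field:
            problems.append(f"missing required property: {key}")

    if "type" in field and field["type"] not in ALLOWED_TYPES:
        problems.append(f"unknown type: {field['type']!r}")

    bbox = field.get("bbox")
    if isinstance(bbox, dict):
        for key in BBOX_KEYS:
            value = bbox.get(key)
            if not _is_number(value) or not 0 <= value <= 1:
                problems.append(f"bbox.{key} must be a number in [0, 1]")
        page = bbox.get("page")
        if not _is_number(page) or int(page) != page or page < 1:
            problems.append("bbox.page must be an integer >= 1")
    elif "bbox" in field:
        problems.append("bbox must be an object")
    return problems
```

Running this over every entry of a schema before calling Edit surfaces typos such as a misspelled `type` early, without spending an API call.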

### Bounding Box

Coordinates are normalized (0-1), measured from the **top-left corner**:

```
(0,0) ─────────────────────── (1,0)
  │                              │
  │    ┌─────────┐               │
  │    │  Field  │ left: 0.1     │
  │    └─────────┘ top: 0.2      │
  │                              │
(0,1) ─────────────────────── (1,1)
```

<Note>
  Page numbers are 1-indexed. The first page is `page: 1`.
</Note>
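
Many PDF tools report rectangles in points measured from the bottom-left corner of the page, so they need converting before they can be used as a `bbox`. The sketch below shows one way to do that. It assumes the input rectangle is given as lower-left and upper-right corners in the same units as the page size; `normalize_bbox` is a hypothetical helper, not part of the Reducto SDK.

```python
def normalize_bbox(x0, y0, x1, y1, page_width, page_height, page=1):
    """Convert a bottom-left-origin rectangle to a normalized top-left bbox.

    (x0, y0) is the lower-left corner and (x1, y1) the upper-right corner,
    both measured from the bottom-left of the page.
    """
    return {
        "left": x0 / page_width,
        "top": 1 - y1 / page_height,        # flip Y: distance from the top edge
        "width": (x1 - x0) / page_width,
        "height": (y1 - y0) / page_height,
        "page": page,                        # 1-indexed
    }


# Example: a field on a US Letter page (612 x 792 points).
bbox = normalize_bbox(153, 594, 306, 633.6, page_width=612, page_height=792)
```

The only non-obvious step is the Y flip: the top edge of the field, `y1`, becomes the distance from the top of the page.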

***

## Fill Control

The `fill` and `value` properties control how each field is handled:

| fill             | value         | Behavior                                    |
| ---------------- | ------------- | ------------------------------------------- |
| `true` (default) | `null`        | LLM determines value from instructions      |
| `true`           | `"something"` | Uses this exact value, ignores instructions |
| `false`          | any           | Field left empty                            |

Use `value` for fields that should always contain the same thing (form version, tax year). Use `fill: false` for fields that should stay blank (signature boxes).
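
One way to apply these settings to a saved schema is to patch it just before each run. The sketch below is illustrative only: `apply_fill_overrides` is a hypothetical helper, and matching fields by their exact `description` text is an assumption made for this example.

```python
import copy


def apply_fill_overrides(schema, fixed_values=None, leave_blank=()):
    """Return a copy of `schema` with fixed values and blank fields applied.

    fixed_values: dict mapping a field description to the value it should always get.
    leave_blank: descriptions of fields that should never be filled.
    """
    fixed_values = fixed_values or {}
    blank = set(leave_blank)
    updated = copy.deepcopy(schema)  # leave the saved schema untouched
    for field in updated:
        description = field.get("description")
        if description in blank:
            field["fill"] = False                 # field stays empty
        elif description in fixed_values:
            field["fill"] = True
            field["value"] = fixed_values[description]  # bypasses the LLM
    return updated
```

For example, `apply_fill_overrides(saved_schema, {"Tax year": "2024"}, ["Signature"])` would pin the tax year and keep the signature box blank, assuming the schema contains fields with exactly those descriptions.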

***

## Widget Types

### Text

Standard text input fields. The LLM extracts the relevant value from your instructions based on the field's description.

```python theme={null}
{
    "type": "text",
    "description": "Social Security Number in XXX-XX-XXXX format",
    "bbox": {"left": 0.55, "top": 0.72, "width": 0.4, "height": 0.03, "page": 1}
}
```

Include format expectations in the description. "SSN in XXX-XX-XXXX format" produces better results than just "SSN" because it tells the LLM how to format the output.

### Checkbox

Boolean fields that get checked or unchecked. The LLM interprets your instructions to determine whether the box should be checked.

```python theme={null}
{
    "type": "checkbox",
    "description": "US Citizen - Yes",
    "bbox": {"left": 0.15, "top": 0.35, "width": 0.02, "height": 0.02, "page": 1}
}
```

Checkbox bounding boxes should be small and roughly square. Make descriptions explicit about what checking means: "US Citizen - Yes" is clearer than "Citizenship" when there are Yes/No checkbox pairs.

### Dropdown

Selection fields with predefined options. The LLM suggests a value, and Edit selects the matching option from the PDF's dropdown.

```python theme={null}
{
    "type": "dropdown",
    "description": "State of incorporation",
    "bbox": {"left": 0.3, "top": 0.4, "width": 0.2, "height": 0.03, "page": 1}
}
```

<Warning>
  The value must exactly match an available option. If your instructions say "CA" but the dropdown only contains "California", the field is skipped silently. Consider listing options in the description: "State (CA, NY, TX, ...)" to help the LLM match correctly.
</Warning>
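
If the option list is known, the description can be generated rather than typed by hand. This is a small sketch under that assumption: `describe_dropdown` is a hypothetical helper, and the option strings must be copied from the PDF's actual dropdown.

```python
def describe_dropdown(label, options, max_listed=10):
    """Build a description such as 'State (CA, NY, TX)' from a list of options."""
    options = list(options)
    shown = ", ".join(options[:max_listed])
    if len(options) > max_listed:
        shown += ", ..."  # signal that the list was truncated
    return f"{label} ({shown})"


field = {
    "type": "dropdown",
    "description": describe_dropdown("State of incorporation", ["CA", "NY", "TX"]),
    "bbox": {"left": 0.3, "top": 0.4, "width": 0.2, "height": 0.03, "page": 1},
}
```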

### Barcode

Special fields for barcode data. They are typically detected automatically in forms that have barcode regions, and they are filled with encoded data.

```python theme={null}
{
    "type": "barcode",
    "description": "Document tracking code",
    "bbox": {"left": 0.7, "top": 0.9, "width": 0.25, "height": 0.05, "page": 1}
}
```

***

## Troubleshooting

<AccordionGroup>
  <Accordion title="Text appears in wrong location">
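    Coordinates are from the top-left (0,0), with Y increasing downward. If you measured from the bottom-left, flip the Y values: `top = 1 - bottom - height`, where `bottom` is the normalized distance from the bottom edge of the page to the bottom edge of the field.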

    Start with a single field, verify it works, then add more incrementally.
  </Accordion>

  <Accordion title="Schema doesn't match existing PDF widgets">
    Edit matches schema fields to existing widgets using bounding box overlap. If overlap is less than 50%, a new widget is created.

    Run Edit without a schema first to see actual widget positions, then adjust your coordinates to match.
  </Accordion>

  <Accordion title="Checkbox not checking">
    1. Ensure `type` is `"checkbox"`, not `"text"`
    2. Keep the bounding box small and roughly square
    3. Be explicit in instructions: "Check the 'Yes' checkbox for US citizenship"
  </Accordion>

  <Accordion title="Fields skipped silently">
    Check for:

    * `fill: false` on the field
    * Dropdown value not matching available options exactly
    * Instructions that don't mention data for this field
    * Unsupported widget types (signatures, images)
  </Accordion>
</AccordionGroup>
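
To debug the widget-matching case above, it can help to measure how much a schema field overlaps a detected widget. The sketch below is a rough diagnostic only. This page does not say how Edit computes overlap, so the function assumes it is the intersection area divided by the schema field's area; `overlap_fraction` is a hypothetical helper and may not match Edit's internal rule.

```python
def overlap_fraction(field_bbox, widget_bbox):
    """Fraction of `field_bbox` area covered by `widget_bbox` (0.0 to 1.0).

    Assumption: overlap is intersection area over the schema field's area.
    """
    if field_bbox["page"] != widget_bbox["page"]:
        return 0.0

    # Edges of the intersection rectangle in normalized top-left coordinates.
    left = max(field_bbox["left"], widget_bbox["left"])
    top = max(field_bbox["top"], widget_bbox["top"])
    right = min(field_bbox["left"] + field_bbox["width"],
                widget_bbox["left"] + widget_bbox["width"])
    bottom = min(field_bbox["top"] + field_bbox["height"],
                 widget_bbox["top"] + widget_bbox["height"])

    intersection = max(0.0, right - left) * max(0.0, bottom - top)
    area = field_bbox["width"] * field_bbox["height"]
    return intersection / area if area > 0 else 0.0
```

Comparing each hand-written field against the `bbox` values returned by a schema-less run shows which fields sit far enough from their widget to risk creating a new one.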
