# Contract & Legal Document Review

> Parse redlined legal documents and extract insertions, deletions, and annotations as structured data

Reducto's `change_tracking` feature extracts strikethroughs, underlines, and annotations from redlined contracts as structured HTML tags, so you can list changes programmatically, categorize them by type, and build approval workflows.

***

## What you'll build

By the end of this cookbook, you'll have a pipeline that parses a redlined contract and extracts its revisions as structured data.

```
Redlined Contract (PDF/DOCX)
        |
        v
Reducto Parse (change_tracking enabled)
        |
        v
Structured output with <change>, <s>, <u> tags
        |
        v
Python extraction → List of all changes
```

***

## Create API Key

<Steps>
  <Step title="Open Studio">
    Go to [studio.reducto.ai](https://studio.reducto.ai) and sign in. From the home page, click **API Keys** in the left sidebar.

    <Frame>
      <img src="https://mintcdn.com/reducto/9Avr4qdsIoNo7JLQ/cookbooks/dummy-docs/screenshots/api-1.png?fit=max&auto=format&n=9Avr4qdsIoNo7JLQ&q=85&s=6fda1435e042681807741c7743273da2" alt="Studio home page with API Keys in sidebar" width="3164" height="1922" data-path="cookbooks/dummy-docs/screenshots/api-1.png" />
    </Frame>
  </Step>

  <Step title="View API Keys">
    The API Keys page shows your existing keys. Click **+ Create new API key** in the top right corner.

    <Frame>
      <img src="https://mintcdn.com/reducto/9Avr4qdsIoNo7JLQ/cookbooks/dummy-docs/screenshots/api-2.png?fit=max&auto=format&n=9Avr4qdsIoNo7JLQ&q=85&s=10db7406c2ac7217e4b1d75e028b58e1" alt="API Keys page with Create button" width="3164" height="1922" data-path="cookbooks/dummy-docs/screenshots/api-2.png" />
    </Frame>
  </Step>

  <Step title="Configure Key">
    In the modal, enter a name for your key and set an expiration policy (or select "Never" for no expiration). Click **Create**.

    <Frame>
      <img src="https://mintcdn.com/reducto/9Avr4qdsIoNo7JLQ/cookbooks/dummy-docs/screenshots/api-3.png?fit=max&auto=format&n=9Avr4qdsIoNo7JLQ&q=85&s=afb60f6cfb4d33940669d534dd007343" alt="New API Key modal with name and expiration fields" width="3164" height="1922" data-path="cookbooks/dummy-docs/screenshots/api-3.png" />
    </Frame>
  </Step>

  <Step title="Copy Your Key">
    Copy your new API key and store it securely. You won't be able to see it again after closing this dialog.

    <Frame>
      <img src="https://mintcdn.com/reducto/9Avr4qdsIoNo7JLQ/cookbooks/dummy-docs/screenshots/api-4.png?fit=max&auto=format&n=9Avr4qdsIoNo7JLQ&q=85&s=c861b1c2f593244957cf15c6fd717f60" alt="Copy API key dialog" width="3164" height="1922" data-path="cookbooks/dummy-docs/screenshots/api-4.png" />
    </Frame>

    Set the key as an environment variable:

    ```bash theme={null}
    export REDUCTO_API_KEY="your-api-key-here"
    ```
  </Step>
</Steps>

Install the SDK:

<CodeGroup>
  ```bash Python theme={null}
  pip install reductoai requests
  ```

  ```bash JavaScript theme={null}
  npm install reductoai
  ```
</CodeGroup>

***

## Sample document

For this cookbook, we use a 165-page union labor agreement between AFSCME Local 328 and Oregon Health & Science University, with extensive redlines showing proposed contract changes. The document includes:

* Strikethroughs for deleted clauses
* Underlines for new language
* Inline annotations explaining changes

**Download the sample:**

```
https://static1.squarespace.com/static/5cee0f8eb1a76b0001ca1d78/t/5e45817fe03fd3048a4b1792/1581613498729/Redline+Contract+with+Annotation.pdf
```

<Tip>
  Reducto works with both Word documents (which have native track changes metadata) and PDFs (where it visually detects underlines and strikethroughs). Word documents give the best results since the change metadata is embedded in the file.
</Tip>

***

## Step 1: Parse with change tracking

### Upload the document

First, upload your redlined contract to Reducto:

<CodeGroup>
  ```python Python theme={null}
  from pathlib import Path
  from reducto import Reducto

  client = Reducto()

  upload = client.upload(file=Path("redlined_contract.pdf"))

  print(f"File ID: {upload.file_id}")
  ```

  ```javascript JavaScript theme={null}
  import Reducto from "reductoai";
  import fs from "fs";

  const client = new Reducto();

  const upload = await client.upload({
    file: fs.createReadStream("redlined_contract.pdf")
  });

  console.log(`File ID: ${upload.file_id}`);
  ```
</CodeGroup>

```
File ID: reducto://e1894955-d89d-42ef-a118-f08d61ee890d.pdf
```

### Parse with change tracking enabled

The key setting is `formatting.include: ["change_tracking"]`. This tells Reducto to detect underlines and strikethroughs and wrap them in HTML tags.

<CodeGroup>
  ```python Python theme={null}
  result = client.parse.run(
      input=upload.file_id,
      formatting={
          "include": ["change_tracking"]
      }
  )

  print(f"Pages: {result.usage.num_pages}")
  print(f"Credits: {result.usage.credits}")
  ```

  ```javascript JavaScript theme={null}
  const result = await client.parse.run({
    input: upload.file_id,
    formatting: {
      include: ["change_tracking"]
    }
  });

  console.log(`Pages: ${result.usage.num_pages}`);
  console.log(`Credits: ${result.usage.credits}`);
  ```
</CodeGroup>

```
Pages: 165
Credits: 188.0
```

**Why `change_tracking`?**

Without this option, Reducto returns plain text. With it enabled, revisions appear as HTML tags that you can parse programmatically:

* `<s>` wraps strikethrough text (deletions)
* `<u>` wraps underlined text (insertions)
* `<change>` groups related revisions together

***

## Step 2: Handle large documents

For large documents like our 165-page contract, Reducto returns results as a URL rather than inline data. This keeps response sizes manageable.

<CodeGroup>
  ```python Python theme={null}
  import requests

  # Check if result is a URL
  if hasattr(result.result, 'url'):
      print(f"Result URL: {result.result.url[:80]}...")
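      response = requests.get(result.result.url, timeout=60)
      response.raise_for_status()  # stop on a failed download instead of parsing an error body
      data = response.json()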
      chunks = data.get('chunks', [])
      full_content = "\n".join([c.get('content', '') for c in chunks])
  else:
      full_content = "\n".join([c.content for c in result.result.chunks])

  print(f"Content length: {len(full_content)} characters")
  ```

  ```javascript JavaScript theme={null}
  let fullContent;

  // Check if result is a URL (large documents)
  if (result.result.type === "url") {
    console.log(`Result URL: ${result.result.url.slice(0, 80)}...`);
    const response = await fetch(result.result.url);
    if (!response.ok) throw new Error(`Download failed: ${response.status}`);
    const data = await response.json();
    const chunks = data.chunks || [];
    fullContent = chunks.map(c => c.content || "").join("\n");
  } else {
    fullContent = result.result.chunks.map(c => c.content).join("\n");
  }

  console.log(`Content length: ${fullContent.length} characters`);
  ```
</CodeGroup>

```
Result URL: https://prod-storage20241010144745140900000001.s3.amazonaws.com/ac80631f...
Content length: 365675 characters
```
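The two branches above differ only in the shape of each chunk: plain dicts when the JSON is downloaded from the URL, attribute-style objects when the SDK returns results inline. As a minimal sketch, a small helper can accept either shape. The name `join_chunk_content` is hypothetical, and `SimpleNamespace` only stands in for SDK chunk objects here.

```python
from types import SimpleNamespace

def join_chunk_content(chunks):
    """Join chunk content whether chunks are dicts or attribute-style objects."""
    parts = []
    for chunk in chunks:
        if isinstance(chunk, dict):
            # Shape of chunks in the downloaded JSON payload.
            parts.append(chunk.get("content") or "")
        else:
            # Shape of chunks returned inline (stand-in objects in this sketch).
            parts.append(getattr(chunk, "content", None) or "")
    return "\n".join(parts)

# Stand-ins for the two shapes; real chunks carry more fields.
from_json = [{"content": "Article 1"}, {"content": None}, {}]
from_sdk = [SimpleNamespace(content="Article 1"), SimpleNamespace(content="Article 2")]

print(repr(join_chunk_content(from_json)))
print(repr(join_chunk_content(from_sdk)))
```

Missing or `None` content becomes an empty string, so a malformed chunk does not break the join.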

***

## Step 3: Understand the output

With change tracking enabled, revisions appear as HTML markup in the content. Here's what we found in our sample contract:

```
<change> tags found: 555
<s> (strikethrough) tags found: 289
<u> (underline) tags found: 438
```
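Counts like these can be produced by searching for the opening tags. The sketch below is one way to do it, run on a short hypothetical `sample` string that stands in for the `full_content` built in Step 2; `count_tags` is an illustrative helper, not part of the SDK.

```python
import re

# Hypothetical stand-in for the `full_content` string built in Step 2.
sample = (
    "Dues <change><u>and any PEOPLE deduction</u></change> are withheld for "
    "<change><s>forty-eight (48)</s> <u>fifty (50)</u></change> weeks."
)

def count_tags(content):
    """Count opening <change>, <s>, and <u> tags in parsed content."""
    return {tag: len(re.findall(rf"<{tag}>", content)) for tag in ("change", "s", "u")}

counts = count_tags(sample)
print(f"<change> tags found: {counts['change']}")
print(f"<s> (strikethrough) tags found: {counts['s']}")
print(f"<u> (underline) tags found: {counts['u']}")
```

Because each pattern includes the closing `>`, `<s>` does not match longer tag names that start with the same letter.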

### Real examples from the document

**Simple insertion** (new language added):

```html theme={null}
<change><u>and any PEOPLE deduction</u></change>
```

**Clause deletion** (entire section removed):

```html theme={null}
<change><s>a. An employee's chosen form of dues or payment in lieu of dues
shall recommence upon reinstatement following a period of layoff or
extended leave.</s></change>
```

**Replacement** (old language replaced with new):

```html theme={null}
<change><s>forty-eight (48)</s> <u>fifty (50)</u></change>
```

### Tag meanings

| Tag        | Meaning         | Visual in document                  |
| ---------- | --------------- | ----------------------------------- |
| `<s>`      | Strikethrough   | ~~deleted text~~                    |
| `<u>`      | Underline       | <u>inserted text</u>                |
| `<change>` | Revision region | Groups related deletions/insertions |

A single `<change>` block can contain:

* Just a deletion: `<change><s>removed text</s></change>`
* Just an insertion: `<change><u>new text</u></change>`
* Both: `<change><s>old</s> <u>new</u></change>`
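
Since `<s>` marks deleted text and `<u>` marks inserted text, the same markup can yield both the original and the revised wording. The following is a minimal sketch under that reading of the tags; `render_versions` is a hypothetical helper, and it collapses whitespace, so line breaks in the source are not preserved.

```python
import re

def render_versions(content):
    """Return (original, revised) plain text from change-tracked content."""
    # Original wording: drop underlined insertions, keep struck-through text.
    original = re.sub(r"<u>.*?</u>", "", content, flags=re.DOTALL)
    # Revised wording: drop struck-through text, keep underlined insertions.
    revised = re.sub(r"<s>.*?</s>", "", content, flags=re.DOTALL)

    def clean(text):
        # Strip the remaining tags, then collapse the whitespace left behind.
        text = re.sub(r"</?(?:change|s|u)>", "", text)
        return re.sub(r"\s+", " ", text).strip()

    return clean(original), clean(revised)

sample = "Notice within <change><s>forty-eight (48)</s> <u>fifty (50)</u></change> hours."
original, revised = render_versions(sample)
print(original)  # Notice within forty-eight (48) hours.
print(revised)   # Notice within fifty (50) hours.
```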

***

## Step 4: Extract changes programmatically

Now we parse the HTML tags to get a structured list of all changes. This function uses regex to find every `<change>` block and extract the deletions and insertions within it.

<CodeGroup>
  ```python Python theme={null}
  import re

  def extract_changes(content):
      """Extract all revision regions from parsed content."""
      changes = []

      pattern = r'<change>(.*?)</change>'
      for match in re.finditer(pattern, content, re.DOTALL):
          change_text = match.group(1)

          # Extract deletions (strikethrough)
          deletions = re.findall(r'<s>(.*?)</s>', change_text, re.DOTALL)

          # Extract insertions (underline)
          insertions = re.findall(r'<u>(.*?)</u>', change_text, re.DOTALL)

          changes.append({
              "deleted": deletions,
              "inserted": insertions,
          })

      return changes
  ```

  ```javascript JavaScript theme={null}
  function extractChanges(content) {
    const changes = [];

    // Match all <change>...</change> blocks
    const changeRegex = /<change>([\s\S]*?)<\/change>/g;
    let match;

    while ((match = changeRegex.exec(content)) !== null) {
      const changeText = match[1];

      // Extract deletions (strikethrough)
      const deletions = [];
      const delRegex = /<s>([\s\S]*?)<\/s>/g;
      let delMatch;
      while ((delMatch = delRegex.exec(changeText)) !== null) {
        deletions.push(delMatch[1]);
      }

      // Extract insertions (underline)
      const insertions = [];
      const insRegex = /<u>([\s\S]*?)<\/u>/g;
      let insMatch;
      while ((insMatch = insRegex.exec(changeText)) !== null) {
        insertions.push(insMatch[1]);
      }

      changes.push({
        deleted: deletions,
        inserted: insertions
      });
    }

    return changes;
  }
  ```
</CodeGroup>

**Why regex?**

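The HTML tags are simple and well-structured, so regex is fast and sufficient for basic extraction. The non-greedy patterns stop at the first closing tag, however, so a `<change>` nested inside another `<change>` would be split incorrectly. For documents with nested changes, consider using an HTML parser like BeautifulSoup.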
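
If you would rather avoid an extra dependency, Python's standard-library `html.parser` can also track nesting. The sketch below is an illustration rather than the method Reducto prescribes: `ChangeParser` and `extract_changes_nested` are hypothetical names, and the choice to fold nested `<change>` blocks into their outermost block is an assumption you may want to change.

```python
from html.parser import HTMLParser

class ChangeParser(HTMLParser):
    """Collect deleted and inserted text per outermost <change> block."""

    def __init__(self):
        super().__init__()
        self.changes = []
        self.depth = 0      # nesting level of <change> tags
        self.stack = []     # currently open <s>/<u> tags
        self.current = None

    def handle_starttag(self, tag, attrs):
        if tag == "change":
            if self.depth == 0:
                self.current = {"deleted": [], "inserted": []}
            self.depth += 1
        elif tag in ("s", "u") and self.depth > 0:
            self.stack.append(tag)
            key = "deleted" if tag == "s" else "inserted"
            self.current[key].append("")

    def handle_endtag(self, tag):
        if tag == "change" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.changes.append(self.current)
                self.current = None
        elif tag in ("s", "u") and self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        # Only text inside an open <s> or <u> within a <change> is recorded.
        if self.depth > 0 and self.stack:
            key = "deleted" if self.stack[-1] == "s" else "inserted"
            self.current[key][-1] += data

def extract_changes_nested(content):
    parser = ChangeParser()
    parser.feed(content)
    parser.close()
    return parser.changes

# Hypothetical input with one nested <change> block.
sample = (
    "<change><s>old</s> <change><u>nested</u></change> <u>new</u></change>"
    " and <change><u>added</u></change>"
)
for change in extract_changes_nested(sample):
    print(change)
```

The output has the same `deleted` / `inserted` shape as `extract_changes`, so the later steps work unchanged.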

### Run the extraction

<CodeGroup>
  ```python Python theme={null}
  changes = extract_changes(full_content)

  print(f"Found {len(changes)} revisions")
  for i, change in enumerate(changes[:5]):
      print(f"\nRevision {i+1}:")
      if change["deleted"]:
          print(f"  Deleted: {change['deleted'][0][:60]}...")
      if change["inserted"]:
          print(f"  Inserted: {change['inserted'][0][:60]}...")
  ```

  ```javascript JavaScript theme={null}
  const changes = extractChanges(fullContent);

  console.log(`Found ${changes.length} revisions`);
  for (let i = 0; i < Math.min(5, changes.length); i++) {
    const change = changes[i];
    console.log(`\nRevision ${i + 1}:`);
    if (change.deleted.length > 0) {
      console.log(`  Deleted: ${change.deleted[0].slice(0, 60)}...`);
    }
    if (change.inserted.length > 0) {
      console.log(`  Inserted: ${change.inserted[0].slice(0, 60)}...`);
    }
  }
  ```
</CodeGroup>

**Output from our sample contract:**

```
Found 555 revisions

Revision 1:
  Deleted: Employees in the bargaining unit are required either to b...

Revision 2:
  Deleted: a. An employee's chosen form of dues or payment in lieu o...

Revision 3:
  Deleted: b. Dues and payments in-lieu-of dues for employees workin...

Revision 4:
  Inserted: Employees covered by this Agreement shall have the right...

Revision 5:
  Inserted: 1.2.2 Holder of Record. During the life of this Agreemen...
```

***

## Step 5: Categorize changes

Not all changes are equal. Some are pure deletions (language removed), some are pure insertions (new language added), and some are replacements (old swapped for new). Categorizing helps prioritize review.

<CodeGroup>
  ```python Python theme={null}
  deletions_only = sum(1 for c in changes if c["deleted"] and not c["inserted"])
  insertions_only = sum(1 for c in changes if c["inserted"] and not c["deleted"])
  replacements = sum(1 for c in changes if c["deleted"] and c["inserted"])

  print(f"Total revisions: {len(changes)}")
  print(f"  - Deletions only: {deletions_only}")
  print(f"  - Insertions only: {insertions_only}")
  print(f"  - Replacements: {replacements}")
  ```

  ```javascript JavaScript theme={null}
  const deletionsOnly = changes.filter(c => c.deleted.length > 0 && c.inserted.length === 0).length;
  const insertionsOnly = changes.filter(c => c.inserted.length > 0 && c.deleted.length === 0).length;
  const replacements = changes.filter(c => c.deleted.length > 0 && c.inserted.length > 0).length;

  console.log(`Total revisions: ${changes.length}`);
  console.log(`  - Deletions only: ${deletionsOnly}`);
  console.log(`  - Insertions only: ${insertionsOnly}`);
  console.log(`  - Replacements: ${replacements}`);
  ```
</CodeGroup>

**Output:**

```
Total revisions: 555
  - Deletions only: 191
  - Insertions only: 270
  - Replacements: 94
```

In this contract, 191 revisions only remove language, 270 only add language, and 94 replace old language with new.
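
The three counts can also be derived from a single labeling function, which makes the category available per change for later routing. This is a sketch with hypothetical sample data in the shape returned by `extract_changes`; the `"empty"` label is an addition that surfaces `<change>` blocks containing neither `<s>` nor `<u>`, which the counts above silently skip.

```python
from collections import Counter

def categorize(change):
    """Label one change dict in the shape produced by extract_changes."""
    if change["deleted"] and change["inserted"]:
        return "replacement"
    if change["deleted"]:
        return "deletion"
    if change["inserted"]:
        return "insertion"
    return "empty"  # a <change> block with no <s> or <u> inside

# Hypothetical sample in the shape returned by extract_changes.
changes = [
    {"deleted": ["forty-eight (48)"], "inserted": ["fifty (50)"]},
    {"deleted": [], "inserted": ["and any PEOPLE deduction"]},
    {"deleted": ["a. An employee's chosen form of dues"], "inserted": []},
    {"deleted": [], "inserted": []},
]

summary = Counter(categorize(c) for c in changes)
for label, count in summary.items():
    print(f"{label}: {count}")
```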

***

## Using Studio

You can also extract changes visually in Reducto Studio without writing code.

<Steps>
  <Step title="Upload your document">
    Go to [studio.reducto.ai](https://studio.reducto.ai) and upload your redlined contract.
  </Step>

  <Step title="Enable change tracking">
    In the **Configurations** tab, switch to **Advanced** mode. Expand the **Formatting** section and check `change_tracking`.

    <img src="https://mintcdn.com/reducto/9Avr4qdsIoNo7JLQ/images/parse-change-tracking.png?fit=max&auto=format&n=9Avr4qdsIoNo7JLQ&q=85&s=129cea3f50a3f1b9bb8b08ac0e9428c0" alt="change_tracking option in the Formatting section" width="2804" height="1664" data-path="images/parse-change-tracking.png" />
  </Step>

  <Step title="Run and review">
    Click **Run**. The results show the parsed content with `<change>`, `<s>`, and `<u>` tags visible in the output. You can search for specific changes using Ctrl+F (Cmd+F on macOS).
  </Step>

  <Step title="Export or deploy">
    Copy the results, download as JSON, or deploy the pipeline with these settings for repeated use on similar documents.
  </Step>
</Steps>

***

## Complete example

Here's a full script that parses a redlined contract and generates a change summary:

<CodeGroup>
  ```python Python theme={null}
  import re
  import requests
  from pathlib import Path
  from reducto import Reducto

  def extract_changes(content):
      """Extract revision regions from content."""
      changes = []
      pattern = r'<change>(.*?)</change>'

      for match in re.finditer(pattern, content, re.DOTALL):
          change_text = match.group(1)
          deletions = re.findall(r'<s>(.*?)</s>', change_text, re.DOTALL)
          insertions = re.findall(r'<u>(.*?)</u>', change_text, re.DOTALL)
          changes.append({"deleted": deletions, "inserted": insertions})

      return changes

  # Parse the document
  client = Reducto()

  upload = client.upload(file=Path("redlined_contract.pdf"))

  result = client.parse.run(
      input=upload.file_id,
      formatting={"include": ["change_tracking"]}
  )

  # Handle URL result for large documents
  if hasattr(result.result, 'url'):
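      response = requests.get(result.result.url, timeout=60)
      response.raise_for_status()  # stop on a failed download instead of parsing an error body
      data = response.json()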
      chunks = data.get('chunks', [])
      full_content = "\n".join([c.get('content', '') for c in chunks])
  else:
      full_content = "\n".join([c.content for c in result.result.chunks])

  # Extract and categorize
  changes = extract_changes(full_content)

  deletions_only = sum(1 for c in changes if c["deleted"] and not c["inserted"])
  insertions_only = sum(1 for c in changes if c["inserted"] and not c["deleted"])
  replacements = sum(1 for c in changes if c["deleted"] and c["inserted"])

  # Print summary
  print(f"Document: {result.usage.num_pages} pages")
  print(f"Total revisions: {len(changes)}")
  print(f"  - Deletions only: {deletions_only}")
  print(f"  - Insertions only: {insertions_only}")
  print(f"  - Replacements: {replacements}")
  ```

  ```javascript JavaScript theme={null}
  import Reducto from "reductoai";
  import fs from "fs";

  function extractChanges(content) {
    const changes = [];
    const changeRegex = /<change>([\s\S]*?)<\/change>/g;
    let match;

    while ((match = changeRegex.exec(content)) !== null) {
      const changeText = match[1];
      const deletions = [...changeText.matchAll(/<s>([\s\S]*?)<\/s>/g)].map(m => m[1]);
      const insertions = [...changeText.matchAll(/<u>([\s\S]*?)<\/u>/g)].map(m => m[1]);
      changes.push({ deleted: deletions, inserted: insertions });
    }

    return changes;
  }

  // Parse the document
  const client = new Reducto();

  const upload = await client.upload({
    file: fs.createReadStream("redlined_contract.pdf")
  });

  const result = await client.parse.run({
    input: upload.file_id,
    formatting: { include: ["change_tracking"] }
  });

  // Handle URL result for large documents
  let fullContent;
  if (result.result.type === "url") {
    const response = await fetch(result.result.url);
    if (!response.ok) throw new Error(`Download failed: ${response.status}`);
    const data = await response.json();
    fullContent = (data.chunks || []).map(c => c.content || "").join("\n");
  } else {
    fullContent = result.result.chunks.map(c => c.content).join("\n");
  }

  // Extract and categorize
  const changes = extractChanges(fullContent);

  const deletionsOnly = changes.filter(c => c.deleted.length > 0 && c.inserted.length === 0).length;
  const insertionsOnly = changes.filter(c => c.inserted.length > 0 && c.deleted.length === 0).length;
  const replacements = changes.filter(c => c.deleted.length > 0 && c.inserted.length > 0).length;

  // Print summary
  console.log(`Document: ${result.usage.num_pages} pages`);
  console.log(`Total revisions: ${changes.length}`);
  console.log(`  - Deletions only: ${deletionsOnly}`);
  console.log(`  - Insertions only: ${insertionsOnly}`);
  console.log(`  - Replacements: ${replacements}`);
  ```
</CodeGroup>

**Output from our sample contract:**

```
Document: 165 pages
Total revisions: 555
  - Deletions only: 191
  - Insertions only: 270
  - Replacements: 94
```

***

## How change tracking works

Reducto uses different detection methods depending on document type:

| Document type | Detection method                                                   |
| ------------- | ------------------------------------------------------------------ |
| Word (.docx)  | Reads native track changes metadata. Most accurate.                |
| PDF (digital) | Detects colored text and formatting via embedded character data.   |
| PDF (scanned) | Uses ML models to visually identify underlines and strikethroughs. |

<Note>
  For best results, use Word documents with Track Changes enabled. The metadata is preserved natively. PDFs require visual detection, which works well but depends on clear formatting.
</Note>

***

## Best practices

<AccordionGroup>
  <Accordion title="Use Word documents when possible">
    Word's native Track Changes stores revision metadata directly in the file. This gives Reducto exact information about what was added or removed, including author and timestamp. PDFs require visual detection.
  </Accordion>

  <Accordion title="Categorize changes for efficient review">
    Replacements (where old text is swapped for new) often need careful review. Pure insertions may be less risky. Route different categories to appropriate reviewers.
  </Accordion>

  <Accordion title="Combine with Extract for clause analysis">
    Use Parse with change tracking to get the revisions, then pipe specific clauses through Extract to pull structured fields like dates, amounts, or party names.
  </Accordion>

  <Accordion title="Handle nested changes carefully">
    Some documents have nested revisions (changes within changes). The regex patterns above handle simple cases. For complex documents, consider using an HTML parser like BeautifulSoup.
  </Accordion>
</AccordionGroup>

***

## Use cases

<AccordionGroup>
  <Accordion title="Contract review automation">
    Extract all changes from incoming redlines and route them to the appropriate reviewer based on clause type. Send indemnification changes to legal, pricing changes to finance.
  </Accordion>

  <Accordion title="Change approval workflows">
    Build approval queues where each revision must be explicitly accepted or rejected before finalizing the agreement. Track who approved what.
  </Accordion>

  <Accordion title="Compliance tracking">
    Monitor changes to policies and procedures. Flag modifications to critical sections for compliance review before they go into effect.
  </Accordion>

  <Accordion title="Negotiation summaries">
    Generate executive summaries showing what the counterparty changed. Brief stakeholders without requiring them to read a 165-page document.
  </Accordion>
</AccordionGroup>
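
As a rough illustration of the routing idea in the first use case, the sketch below assigns each change to a reviewer queue by keyword. Everything here is hypothetical: the `ROUTES` table, the queue names, and the keywords are placeholders to replace with terms from your own contracts, and plain substring matching is only a starting point.

```python
# Hypothetical routing table: reviewer queue -> keywords. Adjust to your contracts.
ROUTES = {
    "legal": ("indemnif", "liability", "arbitration"),
    "finance": ("price", "pricing", "fee", "wage", "dues"),
}

def route_change(change, default="general"):
    """Pick a reviewer queue for one change dict from extract_changes."""
    text = " ".join(change["deleted"] + change["inserted"]).lower()
    for queue, keywords in ROUTES.items():
        # Substring match: "indemnif" covers indemnify, indemnification, etc.
        if any(keyword in text for keyword in keywords):
            return queue
    return default

# Hypothetical changes in the shape returned by extract_changes.
examples = [
    {"deleted": [], "inserted": ["Vendor shall indemnify Customer"]},
    {"deleted": ["a. An employee's chosen form of dues"], "inserted": []},
    {"deleted": [], "inserted": ["and any PEOPLE deduction"]},
]
for change in examples:
    print(route_change(change))
```

Queues are checked in dictionary order, so a change that matches several queues goes to the first one listed.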

***

## Next steps

<CardGroup cols={2}>
  <Card title="Additional Document Data" icon="file-lines" href="/configs/parse/additional-document-data">
    Learn about highlights, hyperlinks, and signatures.
  </Card>

  <Card title="Extract" icon="filter" href="/extract/overview">
    Pull structured data from specific clauses.
  </Card>

  <Card title="Batch Processing" icon="layer-group" href="/cookbooks/batch-processing">
    Process multiple contracts in parallel.
  </Card>

  <Card title="Studio Guide" icon="browser" href="/studio-parse">
    Visual walkthrough of the Parse pipeline.
  </Card>
</CardGroup>
