When you parse a document, Reducto extracts the main text content by default. But documents often contain additional information layered on top: revision marks from Track Changes, margin comments, highlighted passages, hyperlinks, and signatures. The formatting.include option lets you extract these.
These formatting options are available in the Python SDK, Node.js SDK, and via cURL. The Go SDK has limited support: only enable_underlines (for change tracking) is currently available.
Change Tracking
Legal documents, contracts, and collaborative drafts often use underlines and strikethroughs to show what changed between versions. Reducto can detect these and wrap them in HTML tags so you can programmatically identify revisions.
The <s> tag marks strikethrough (typically deletions), <u> marks underlines (typically insertions), and <change> wraps the entire revision region.
How it works: For digital PDFs and Word documents, Reducto reads the embedded formatting information. For scanned documents, it uses a segmentation model to visually detect underlines and strikethroughs on the page image.
Common uses:
- Contract review: automatically extract what changed between versions
- Compliance: track modifications to policies and procedures
- Editorial workflows: preserve editor suggestions in parsed output
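As a rough illustration of working with these tags downstream, the sketch below pulls the struck-through and underlined text out of each <change> region using only the Python standard library. The sample string, the nesting of <s> and <u> inside <change>, and the helper names extract_revisions and accept_changes are assumptions made for illustration, not part of the Reducto SDK; real output may nest or order the tags differently.

```python
import re

# Hypothetical parsed content; the sentence is made up for illustration.
SAMPLE = (
    "Payment is due within "
    "<change><s>30 days</s><u>45 days</u></change> of the invoice date."
)

CHANGE_RE = re.compile(r"<change>(.*?)</change>", re.DOTALL)
DELETED_RE = re.compile(r"<s>(.*?)</s>", re.DOTALL)
INSERTED_RE = re.compile(r"<u>(.*?)</u>", re.DOTALL)


def extract_revisions(content):
    """Return one dict per <change> region with its deleted and inserted text."""
    revisions = []
    for region in CHANGE_RE.findall(content):
        revisions.append({
            "deleted": DELETED_RE.findall(region),    # text inside <s>...</s>
            "inserted": INSERTED_RE.findall(region),  # text inside <u>...</u>
        })
    return revisions


def accept_changes(content):
    """Drop struck-through text, keep underlined text, and strip the tags."""
    without_deletions = DELETED_RE.sub("", content)
    return re.sub(r"</?(?:change|u)>", "", without_deletions)


if __name__ == "__main__":
    print(extract_revisions(SAMPLE))
    print(accept_changes(SAMPLE))
```

Treating <s> as a deletion and <u> as an insertion follows the "typically" wording above; an underline used purely for emphasis would be misread by this sketch, so a real pipeline may want to look only at <s> and <u> that sit inside a <change> region.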
Comments
PDF sticky notes, Word margin comments, and Excel cell notes contain reviewer feedback, questions, and instructions that are separate from the document content itself. Reducto extracts these as distinct blocks.
Highlights
Highlighted text usually signals importance. Reducto can detect highlighted passages and wrap them in <mark> tags, letting you identify what reviewers or authors emphasized.
Common uses:
- Extract key passages from research documents
- Identify what reviewers marked as significant during review
- Use highlights as importance signals for summarization
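A minimal sketch of collecting highlighted passages from parsed content, assuming the highlights arrive as inline <mark>...</mark> spans as described above. The sample text and the function name extract_highlights are hypothetical and exist only for this example.

```python
import re

# Hypothetical parsed content; the wording is made up for illustration.
SAMPLE = (
    "The lease begins on the first of the month. "
    "<mark>Either party may terminate with 60 days notice.</mark> "
    "Renewal is <mark>automatic</mark> unless cancelled."
)

MARK_RE = re.compile(r"<mark>(.*?)</mark>", re.DOTALL)


def extract_highlights(content):
    """Return highlighted passages in document order, whitespace-normalized."""
    passages = []
    for raw in MARK_RE.findall(content):
        text = " ".join(raw.split())  # collapse line breaks inside a highlight
        if text:                      # skip empty <mark></mark> spans
            passages.append(text)
    return passages


if __name__ == "__main__":
    for passage in extract_highlights(SAMPLE):
        print("-", passage)
```

The returned list can be passed to a summarizer as a set of priority passages, or stored alongside the document as reviewer-selected excerpts.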
Hyperlinks
Documents contain links to external resources, internal references, and citations. Reducto extracts these and converts them to Markdown link format, preserving both the display text and the URL.
Common uses:
- Build reference lists from academic papers
- Audit documents for broken or outdated links
- Extract cited sources for verification
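Because links come back in Markdown format, a small regular expression is often enough to build a reference list. The sketch below assumes plain [text](url) links with no parentheses or spaces inside the URL; the sample content, the example.com address, and the helper names extract_links and external_links are illustrative assumptions rather than anything Reducto provides.

```python
import re
from urllib.parse import urlparse

# Hypothetical parsed content; the links are placeholders for illustration.
SAMPLE = (
    "See [the dataset](https://example.com/data) and "
    "[Section 2](#section-2). ![chart](chart.png)"
)

# Match [text](url) but skip Markdown images, which start with "!".
LINK_RE = re.compile(r"(?<!!)\[([^\]]+)\]\(([^)\s]+)\)")


def extract_links(content):
    """Return (display_text, url) pairs in document order."""
    return LINK_RE.findall(content)


def external_links(content):
    """Keep only links whose URL has an http or https scheme."""
    return [
        (text, url)
        for text, url in extract_links(content)
        if urlparse(url).scheme in ("http", "https")
    ]


if __name__ == "__main__":
    for text, url in external_links(SAMPLE):
        print(f"{text}: {url}")
```

Separating external links from internal references keeps a link-checking job from requesting fragment-only targets. URLs that contain parentheses would need a full Markdown parser instead of this pattern.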
Signatures
Forms and contracts often contain signature fields. Reducto can detect where signatures appear, which is useful for determining whether a document has been signed or for locating signature regions for downstream processing.
Common uses:
- Verify that forms have been signed before processing
- Route unsigned documents back for completion
- Classify documents as signed vs. unsigned