Change Tracking & Comments
Extract document changes and PDF annotations with markup and location data
Extract underlined/strikethrough text with HTML markup and PDF comments with their locations.
Change Tracking
Wrap underlined and strikethrough text in HTML tags so that document changes can be detected.
Configuration
Requirements: Only works with hybrid or metadata extraction mode (not ocr).
Output
<u>underlined text</u> for underlined text
<s>deleted text</s> for strikethrough text
<change><s>old</s> <u>new</u></change> for change sequences
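The markup above can be post-processed with ordinary string tools. The following is a minimal sketch, not part of the API: it assumes the parsed text arrives as a plain string containing the <change>, <s>, and <u> tags described above, and the helper name extract_changes is hypothetical.

```python
import re

# Patterns for the three tags described in the Output section.
CHANGE_RE = re.compile(r"<change>(.*?)</change>", re.DOTALL)
DELETED_RE = re.compile(r"<s>(.*?)</s>", re.DOTALL)
INSERTED_RE = re.compile(r"<u>(.*?)</u>", re.DOTALL)


def extract_changes(text):
    """Return one (old, new) pair per <change> block found in text.

    Hypothetical helper: multiple <s> or <u> runs inside a single
    block are joined with a space.
    """
    pairs = []
    for block in CHANGE_RE.findall(text):
        old = " ".join(DELETED_RE.findall(block))
        new = " ".join(INSERTED_RE.findall(block))
        pairs.append((old, new))
    return pairs


# Illustrative input only; real output depends on the source document.
sample = "The fee is <change><s>$100</s> <u>$120</u></change> per month."
print(extract_changes(sample))  # [('$100', '$120')]
```

Regular expressions are sufficient here because the tags do not nest within themselves. For output that mixes this markup with other HTML, a proper HTML parser may be a safer choice.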
PDF Comments
Extract text annotations from PDF documents with their content and locations.
Configuration
Output
Comments include content and normalized bounding box coordinates:
The bbox array contains [left, top, width, height] normalized to [0,1] relative to page dimensions.
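To draw or highlight a comment on a rendered page, the normalized values have to be scaled back to page units. The sketch below assumes the origin is at the top-left corner of the page (suggested by the "top" field but not stated explicitly) and that the caller knows the page width and height. The function name bbox_to_absolute is hypothetical.

```python
def bbox_to_absolute(bbox, page_width, page_height):
    """Convert a normalized [left, top, width, height] box to
    absolute (x0, y0, x1, y1) corners in page units.

    Assumes a top-left origin; page_width and page_height are
    supplied by the caller (for example, pixels of a rendered image).
    """
    left, top, width, height = bbox
    x0 = left * page_width
    y0 = top * page_height
    x1 = (left + width) * page_width
    y1 = (top + height) * page_height
    return (x0, y0, x1, y1)


# Illustrative values only: a box on an 800 x 1000 pixel page render.
print(bbox_to_absolute([0.25, 0.5, 0.5, 0.125], 800, 1000))
# (200.0, 500.0, 600.0, 625.0)
```

Returning corner coordinates rather than width and height matches what most drawing and cropping routines expect.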