Parse

The Parse.Run() method converts documents into structured JSON with text, tables, and figures. It runs OCR, detects document layout, and returns content organized into chunks optimized for LLM and RAG workflows.

Basic Usage

package main

import (
    "context"
    "fmt"
    "io"
    "os"
    
    reducto "github.com/reductoai/reducto-go-sdk"
    "github.com/reductoai/reducto-go-sdk/option"
    "github.com/reductoai/reducto-go-sdk/shared"
)

func main() {
    client := reducto.NewClient(option.WithAPIKey(os.Getenv("REDUCTO_API_KEY")))
    
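    // Upload the local file; the returned FileID is passed to Parse below
    file, err := os.Open("invoice.pdf")
    if err != nil {
        fmt.Printf("Open error: %v\n", err)
        return
    }
    defer file.Close()
    
    upload, err := client.Upload(context.Background(), reducto.UploadParams{
        File: reducto.F[io.Reader](file),
    })
    if err != nil {
        fmt.Printf("Upload error: %v\n", err)
        return
    }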
    
    // Parse
    result, err := client.Parse.Run(context.Background(), reducto.ParseRunParams{
        ParseConfig: reducto.ParseConfigParam{
            DocumentURL: reducto.F[reducto.ParseConfigDocumentURLUnionParam](
                shared.UnionString(upload.FileID),
            ),
        },
    })
    if err != nil {
        fmt.Printf("Parse error: %v\n", err)
        return
    }
    
    // Access results using union type
    if result.Result.Type == shared.ParseResponseResultTypeFull {
        fullResult := result.Result.AsUnion().(shared.ParseResponseResultFullResult)
        for _, chunk := range fullResult.Chunks {
            fmt.Println(chunk.Content)
            for _, block := range chunk.Blocks {
                fmt.Printf("  %s on page %d\n", block.Type, block.Bbox.Page)
            }
        }
    } else if result.Result.Type == shared.ParseResponseResultTypeURL {
        urlResult := result.Result.AsUnion().(shared.ParseResponseResultURLResult)
        fmt.Printf("Large document - fetch from URL: %s\n", urlResult.URL)
    }
}

Method Signatures

Synchronous Parse

func (s *ParseService) Run(
    ctx context.Context,
    body ParseRunParams,
    opts ...option.RequestOption,
) (*shared.ParseResponse, error)

Asynchronous Parse

func (s *ParseService) RunJob(
    ctx context.Context,
    body ParseRunJobParams,
    opts ...option.RequestOption,
) (*ParseRunJobResponse, error)

The RunJob method returns a JobID that you can use with client.Job.Get() to retrieve results.

Input Options

The DocumentURL parameter accepts several formats:

// From upload
DocumentURL: reducto.F[reducto.ParseConfigDocumentURLUnionParam](
    shared.UnionString(upload.FileID),
)

// Public URL
DocumentURL: reducto.F[reducto.ParseConfigDocumentURLUnionParam](
    shared.UnionString("https://example.com/doc.pdf"),
)

// Presigned S3 URL
DocumentURL: reducto.F[reducto.ParseConfigDocumentURLUnionParam](
    shared.UnionString("https://bucket.s3.amazonaws.com/doc.pdf?X-Amz-..."),
)

// Reprocess previous job
DocumentURL: reducto.F[reducto.ParseConfigDocumentURLUnionParam](
    shared.UnionString("jobid://7600c8c5-a52f-49d2-8a7d-d75d1b51e141"),
)
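
The `jobid://` form is a fixed prefix followed by the ID of an earlier job. The following is a minimal sketch of a helper that builds that string; `jobRef` is a hypothetical name used for illustration and is not part of the SDK. The returned string would then be wrapped with `shared.UnionString` in the same way as the other inputs.

```go
package main

import (
	"fmt"
	"strings"
)

// jobRef is a hypothetical helper (not part of the SDK) that builds the
// "jobid://" reference string used to reprocess a previous job.
// Input that already carries the prefix is returned unchanged.
func jobRef(jobID string) string {
	const prefix = "jobid://"
	if strings.HasPrefix(jobID, prefix) {
		return jobID
	}
	return prefix + jobID
}

func main() {
	fmt.Println(jobRef("7600c8c5-a52f-49d2-8a7d-d75d1b51e141"))
}
```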

Configuration Examples

Chunking

By default, Parse returns the entire document as one chunk. For RAG applications, use variable chunking:

result, err := client.Parse.Run(context.Background(), reducto.ParseRunParams{
    ParseConfig: reducto.ParseConfigParam{
        DocumentURL: reducto.F[reducto.ParseConfigDocumentURLUnionParam](
            shared.UnionString(upload.FileID),
        ),
        Options: reducto.F(shared.BaseProcessingOptionsParam{
            Chunking: reducto.F(shared.BaseProcessingOptionsChunkingParam{
                ChunkMode: reducto.F(shared.BaseProcessingOptionsChunkingChunkModeVariable),
                ChunkSize: reducto.Int(1000), // Optional: chunk size in tokens
            }),
        }),
    },
})

Table Output Format

Control how tables appear in the output:

result, err := client.Parse.Run(context.Background(), reducto.ParseRunParams{
    ParseConfig: reducto.ParseConfigParam{
        DocumentURL: reducto.F[reducto.ParseConfigDocumentURLUnionParam](
            shared.UnionString(upload.FileID),
        ),
        AdvancedOptions: reducto.F(shared.AdvancedProcessingOptionsParam{
            TableOutputFormat: reducto.F(shared.AdvancedProcessingOptionsTableOutputFormatHTML),
        }),
    },
})

Page Range

Process only specific pages:

result, err := client.Parse.Run(context.Background(), reducto.ParseRunParams{
    ParseConfig: reducto.ParseConfigParam{
        DocumentURL: reducto.F[reducto.ParseConfigDocumentURLUnionParam](
            shared.UnionString(upload.FileID),
        ),
        AdvancedOptions: reducto.F(shared.AdvancedProcessingOptionsParam{
            PageRange: reducto.F[shared.AdvancedProcessingOptionsPageRangeUnionParam](
                shared.PageRangeParam{
                    Start: reducto.Int(1),
                    End:   reducto.Int(10),
                },
            ),
        }),
    },
})

Filter Blocks

Remove specific content types from output:

result, err := client.Parse.Run(context.Background(), reducto.ParseRunParams{
    ParseConfig: reducto.ParseConfigParam{
        DocumentURL: reducto.F[reducto.ParseConfigDocumentURLUnionParam](
            shared.UnionString(upload.FileID),
        ),
        Options: reducto.F(shared.BaseProcessingOptionsParam{
            FilterBlocks: reducto.F([]shared.BaseProcessingOptionsFilterBlock{
                shared.BaseProcessingOptionsFilterBlockHeader,
                shared.BaseProcessingOptionsFilterBlockFooter,
                shared.BaseProcessingOptionsFilterBlockPageNumber,
            }),
        }),
    },
})

Asynchronous Processing

For long-running operations, use Parse.RunJob() to submit a job and poll for results:

// Submit async job
jobResponse, err := client.Parse.RunJob(context.Background(), reducto.ParseRunJobParams{
    DocumentURL: reducto.F[reducto.ParseRunJobParamsDocumentURLUnion](
        shared.UnionString(upload.FileID),
    ),
    Webhook: reducto.F(shared.WebhookConfigNewParam{
        Mode: reducto.F(shared.WebhookConfigNewModeSvix),
        Channels: reducto.F([]string{"https://your-webhook-url.com"}),
    }),
})
if err != nil {
    return err
}

jobID := jobResponse.JobID

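// Poll for results. This fragment is meant to sit inside a function that
// returns error, and it needs the "time" import.
var job *reducto.JobGetResponse
for {
    time.Sleep(2 * time.Second) // Wait 2 seconds between status checks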
    job, err = client.Job.Get(context.Background(), jobID)
    if err != nil {
        return err
    }
    
    if job.Status == reducto.JobGetResponseStatusCompleted {
        break
    } else if job.Status == reducto.JobGetResponseStatusFailed {
        return fmt.Errorf("job failed: %s", job.Reason)
    }
}

// Access result
result := job.Result
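
The polling loop above runs until the job completes or fails, so a job that never reaches either state would block forever. One way to bound the wait is to drive the loop from a `context` with a deadline. The sketch below is independent of the SDK: `pollUntil`, `runDemo`, and `runTimeout` are hypothetical names, and the `check` callback stands in for the place where a call such as `client.Job.Get` and the status comparison would go.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// pollUntil calls check immediately and then once per interval until check
// reports done, check returns an error, or ctx is cancelled or times out.
func pollUntil(ctx context.Context, interval time.Duration, check func(context.Context) (bool, error)) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		done, err := check(ctx)
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}

// runDemo polls a fake job that completes on the third check and returns
// the number of checks made.
func runDemo() (int, error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	attempts := 0
	err := pollUntil(ctx, 10*time.Millisecond, func(context.Context) (bool, error) {
		attempts++
		return attempts >= 3, nil
	})
	return attempts, err
}

// runTimeout polls a fake job that never completes, so the deadline fires.
func runTimeout() error {
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	return pollUntil(ctx, 10*time.Millisecond, func(context.Context) (bool, error) {
		return false, nil
	})
}

func main() {
	attempts, err := runDemo()
	fmt.Println(attempts, err)
	fmt.Println(errors.Is(runTimeout(), context.DeadlineExceeded))
}
```

Passing the same context into the status call lets the deadline also cancel an in-flight request, not only the wait between checks.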

Response Structure

result, err := client.Parse.Run(context.Background(), params)
if err != nil {
    return err
}

// Top-level fields
fmt.Printf("Job ID: %s\n", result.JobID)
fmt.Printf("Duration: %.2f\n", result.Duration)
fmt.Printf("Pages: %d\n", result.Usage.NumPages)
fmt.Printf("Credits: %.2f\n", result.Usage.Credits)

// Result content using union type
if result.Result.Type == shared.ParseResponseResultTypeFull {
    fullResult := result.Result.AsUnion().(shared.ParseResponseResultFullResult)
    for _, chunk := range fullResult.Chunks {
        fmt.Println(chunk.Content)
        fmt.Println(chunk.Embed)
        
        for _, block := range chunk.Blocks {
            // Use SDK constants for block type comparisons
            if block.Type == shared.ParseResponseResultFullResultChunksBlocksTypeTable {
                fmt.Printf("Found table on page %d\n", block.Bbox.Page)
            }
        }
    }
} else if result.Result.Type == shared.ParseResponseResultTypeURL {
    urlResult := result.Result.AsUnion().(shared.ParseResponseResultURLResult)
    fmt.Printf("Large document - fetch from URL: %s\n", urlResult.URL)
}

Union Types

The Go SDK uses union types for responses that can have different structures. A ParseResponseResult is either a ParseResponseResultFullResult (inline content) or a ParseResponseResultURLResult (a URL from which to fetch the content). Always check the Type field first, then call AsUnion() and type-assert the value to the matching type. An unchecked assertion to the wrong type panics, which is why the Type check comes first:

if result.Result.Type == shared.ParseResponseResultTypeFull {
    fullResult := result.Result.AsUnion().(shared.ParseResponseResultFullResult)
    // Access fullResult.Chunks directly
    for _, chunk := range fullResult.Chunks {
        // ...
    }
} else if result.Result.Type == shared.ParseResponseResultTypeURL {
    urlResult := result.Result.AsUnion().(shared.ParseResponseResultURLResult)
    // Fetch content from urlResult.URL
}
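
An alternative to checking Type and then asserting is a Go type switch on the value returned by `AsUnion()`, which handles an unexpected variant as an error instead of a panic. The sketch below uses local stand-in types (`fullResult`, `urlResult`) so that it runs on its own; they are hypothetical and only mirror the shape of the SDK types named above.

```go
package main

import "fmt"

// Stand-in types for illustration only; the real result types live in the
// SDK's shared package.
type fullResult struct{ Chunks []string }
type urlResult struct{ URL string }

// describe dispatches on the dynamic type of v with a type switch, so an
// unknown variant produces an error rather than a failed-assertion panic.
func describe(v interface{}) (string, error) {
	switch r := v.(type) {
	case fullResult:
		return fmt.Sprintf("inline result with %d chunks", len(r.Chunks)), nil
	case urlResult:
		return "result stored at " + r.URL, nil
	default:
		return "", fmt.Errorf("unexpected result variant %T", v)
	}
}

func main() {
	fmt.Println(describe(fullResult{Chunks: []string{"a", "b"}}))
	fmt.Println(describe(urlResult{URL: "https://example.com/result.json"}))
	fmt.Println(describe(42))
}
```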

Error Handling

result, err := client.Parse.Run(context.Background(), params)
if err != nil {
    fmt.Printf("Parse failed: %v\n", err)
    return
}
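
Printing the error is enough for a script, but a service usually needs to decide whether a failure is worth retrying. The sketch below shows the standard `errors.As` pattern for that decision. The `apiError` type is a hypothetical stand-in defined here so the example runs on its own; it is an assumption that the SDK returns an error value exposing an HTTP status code, and the real type name should be taken from the SDK reference.

```go
package main

import (
	"errors"
	"fmt"
)

// apiError is a hypothetical stand-in for an SDK error that carries the
// HTTP status code of the failed request.
type apiError struct{ StatusCode int }

func (e *apiError) Error() string {
	return fmt.Sprintf("api error: status %d", e.StatusCode)
}

// isRetryable reports whether err is, or wraps, an apiError with a
// rate-limit (429) or server-side (5xx) status.
func isRetryable(err error) bool {
	var apiErr *apiError
	if !errors.As(err, &apiErr) {
		return false
	}
	return apiErr.StatusCode == 429 || apiErr.StatusCode >= 500
}

// wrapParse mimics a caller adding context to an error with %w.
func wrapParse(err error) error {
	return fmt.Errorf("parse invoice.pdf: %w", err)
}

func main() {
	fmt.Println(isRetryable(wrapParse(&apiError{StatusCode: 429})))
	fmt.Println(isRetryable(wrapParse(&apiError{StatusCode: 400})))
	fmt.Println(isRetryable(errors.New("network down")))
}
```

Because `errors.As` walks the wrap chain, the check keeps working when intermediate layers add context with `%w`.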

Next Steps

Learn about extracting specific fields from parsed documents
Explore response format details for complete structure
Check out best practices for optimization
