Overview
This cookbook demonstrates how to:
- Navigate to Apple.com’s investor relations section using Stagehand AI actions
- Automatically download PDFs when links are clicked
- Poll the Browserbase Downloads API until the file is ready
- Extract the PDF from the ZIP archive downloaded from Browserbase
- Upload the PDF to Reducto and extract structured iPhone net sales data
- Output the extracted financial data as formatted JSON
Prerequisites
Before starting, you need:
- A Browserbase account with API key and project ID
- A Reducto account with API key
- A Google API key for Gemini (used by Stagehand)
- Python 3.9+
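Before running anything, it can help to confirm that the credentials listed above are actually set. The following is a minimal sketch, assuming the keys are supplied through environment variables; the variable names (in particular `GOOGLE_API_KEY`) and the `load_settings` helper are assumptions for illustration, not names taken from the template.

```python
import os
from typing import Dict, Mapping

# Assumed variable names; adjust them to match your own setup.
REQUIRED_ENV_VARS = (
    "BROWSERBASE_API_KEY",
    "BROWSERBASE_PROJECT_ID",
    "REDUCTO_API_KEY",
    "GOOGLE_API_KEY",
)


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required credentials, or fail fast listing what is missing."""
    missing = [name for name in REQUIRED_ENV_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_ENV_VARS}
```

Calling `load_settings()` at the top of the script surfaces a missing key immediately instead of partway through a browser session.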
Step-by-step breakdown
Step 1: Initialize Stagehand and create a session
Stagehand provides AI-powered browser automation on top of Browserbase. Initialize the clients and start a session.
Step 2: Navigate and trigger download with AI actions
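A rough sketch of this step, assuming a page object that exposes async `goto` and `act` methods in the style of a Stagehand page. The `AiPage` protocol, the `navigate_and_download` helper, the `RecordingPage` stand-in, and the instruction wording are all hypothetical; exact method signatures depend on the Stagehand SDK version you have installed.

```python
import asyncio
from typing import List, Protocol, Sequence, Tuple


class AiPage(Protocol):
    """Hypothetical minimal page interface: async goto() and act()."""

    async def goto(self, url: str) -> object: ...

    async def act(self, instruction: str) -> object: ...


# Illustrative wording only; real instructions depend on the live page.
INSTRUCTIONS = [
    "Click the link to the investor relations section",
    "Click the link to the latest financial statements PDF",
]


async def navigate_and_download(
    page: AiPage, start_url: str, instructions: Sequence[str]
) -> List[str]:
    """Open start_url, then run each plain-English instruction in order."""
    await page.goto(start_url)
    done: List[str] = []
    for instruction in instructions:
        await page.act(instruction)
        done.append(instruction)
    return done


class RecordingPage:
    """Stand-in page that records calls instead of driving a browser."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str]] = []

    async def goto(self, url: str) -> None:
        self.calls.append(("goto", url))

    async def act(self, instruction: str) -> None:
        self.calls.append(("act", instruction))
```

Running `asyncio.run(navigate_and_download(RecordingPage(), "https://www.apple.com", INSTRUCTIONS))` exercises the flow without a browser; swapping in a real page object is the only change needed.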
Use Stagehand’s AI-powered actions to navigate the page with plain-English instructions.
Step 3: Poll the Downloads API
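One way to sketch the polling loop for this step, using only the standard library. The `fetch_archive` callable is a hypothetical placeholder for whatever request you make to the Downloads API for your session; this sketch treats the download as ready once the response is a ZIP archive containing at least one file, which is an assumption rather than documented API behavior.

```python
import io
import time
import zipfile
from typing import Callable


def archive_has_files(data: bytes) -> bool:
    """True if data is a valid ZIP archive with at least one file entry."""
    if not data:
        return False
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return any(not info.is_dir() for info in zf.infolist())
    except zipfile.BadZipFile:
        return False


def poll_for_download(
    fetch_archive: Callable[[], bytes],
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call fetch_archive repeatedly until it returns a non-empty ZIP."""
    deadline = time.monotonic() + timeout_s
    while True:
        data = fetch_archive()
        if archive_has_files(data):
            return data
        if time.monotonic() >= deadline:
            raise TimeoutError("download was not ready before the timeout")
        sleep(interval_s)
```

Passing the fetch function in as a parameter keeps the loop independent of any particular HTTP client or SDK, and makes the timeout behavior easy to test.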
Browserbase stores downloads and makes them available through the Downloads API. Poll until the download completes.
Step 4: Extract PDF from ZIP
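A minimal sketch of the extraction for this step, assuming the archive bytes are already in memory. The `extract_first_pdf` helper is hypothetical; it picks the first entry whose name ends in `.pdf` and strips any directory prefix from the entry name.

```python
import io
import zipfile
from pathlib import PurePosixPath
from typing import Tuple


def extract_first_pdf(zip_bytes: bytes) -> Tuple[str, bytes]:
    """Return (file name, contents) of the first PDF found in the archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for entry in zf.namelist():
            if entry.lower().endswith(".pdf"):
                # Keep only the base name so a nested entry cannot
                # dictate where the file is later written.
                return PurePosixPath(entry).name, zf.read(entry)
    raise FileNotFoundError("no PDF found in the downloaded archive")
```

The returned bytes can then be written to disk with `pathlib.Path(name).write_bytes(data)` or passed on to the upload step directly.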
Browserbase returns downloads as a ZIP archive. Extract the PDF from it.
Step 5: Extract data with Reducto
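As an illustration of what an extraction schema for this step might look like, here is a sketch written as a JSON Schema style dictionary, assuming that is the form the extraction request accepts. The field names (`iphone_net_sales`, `period`, `net_sales_usd_millions`) and the `missing_fields` helper are hypothetical; the upload and extract calls themselves are omitted because their exact signatures depend on the Reducto SDK version.

```python
from typing import Any, Dict, List

# Hypothetical schema: one record per reporting period.
IPHONE_SALES_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "iphone_net_sales": {
            "type": "array",
            "description": "iPhone net sales, one entry per reporting period",
            "items": {
                "type": "object",
                "properties": {
                    "period": {
                        "type": "string",
                        "description": "Reporting period label as printed",
                    },
                    "net_sales_usd_millions": {
                        "type": "number",
                        "description": "iPhone net sales in millions of USD",
                    },
                },
                "required": ["period", "net_sales_usd_millions"],
            },
        }
    },
    "required": ["iphone_net_sales"],
}


def missing_fields(result: Dict[str, Any]) -> List[str]:
    """List required fields that are absent from an extraction result."""
    rows = result.get("iphone_net_sales")
    if not isinstance(rows, list):
        return ["iphone_net_sales"]
    items = IPHONE_SALES_SCHEMA["properties"]["iphone_net_sales"]["items"]
    problems: List[str] = []
    for i, row in enumerate(rows):
        for key in items["required"]:
            if not isinstance(row, dict) or key not in row:
                problems.append(f"iphone_net_sales[{i}].{key}")
    return problems
```

Checking the result with `missing_fields` before using it catches incomplete extractions early; an empty list means every required field came back.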
Upload the PDF to Reducto and extract structured financial data using a schema.
Step 6: Output results
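Formatting the result for this step needs nothing beyond the standard library. The sketch below assumes the extraction result is already a plain dictionary; the `format_results` helper and the sample record are hypothetical, and the sample values are placeholders rather than real figures.

```python
import json
from typing import Any, Dict


def format_results(extracted: Dict[str, Any]) -> str:
    """Render an extraction result as indented, human-readable JSON."""
    return json.dumps(extracted, indent=2, ensure_ascii=False)


# Placeholder record: field names and values are illustrative only.
SAMPLE_RESULT = {
    "iphone_net_sales": [
        {"period": "PLACEHOLDER", "net_sales_usd_millions": 0}
    ]
}

print(format_results(SAMPLE_RESULT))
```

`ensure_ascii=False` keeps any non-ASCII characters from the source document readable in the output instead of escaping them.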
The extracted data is returned as structured JSON matching your schema.
Full implementation
For the complete implementation with all helper functions, error handling, and Reducto extraction schema, see the full source code on GitHub.
Resources
- Browserbase Template: View the original Browserbase template
- Source Code: Full source code on GitHub
- Extract API: Learn more about Reducto’s Extract API
- Array Extraction: Extract multiple records from documents