How to Use AI for SOV Data Extraction
SOVs are among the hardest insurance documents to parse because they vary wildly by broker, contain multi-page property schedules, and mix addresses with building characteristics in inconsistent column structures. API-first tools handle simple SOVs; complex multi-sheet formats call for enterprise intelligent document processing (IDP).
A Statement of Values (SOV) is the document that makes or breaks a commercial property submission. It lists every location in a property schedule with building characteristics, total insured values, and risk details that an underwriter needs to price the account. And it is, without exaggeration, one of the most difficult document types in insurance to extract data from automatically.
Our team has processed over 2,000 SOVs across three different extraction platforms over the past 18 months. We work primarily with mid-market commercial property underwriters who receive SOVs in every conceivable format: multi-tab Excel workbooks, scanned PDFs of printed spreadsheets, broker-proprietary templates, and occasionally hand-typed documents with no consistent structure at all. This guide covers what we learned about using AI extraction tools for SOVs, including where these tools work well and where they fail.
What Is an SOV and Why It Matters
An SOV (Statement of Values, sometimes called a Schedule of Values or Property Schedule) is a structured document that lists all the properties, buildings, or locations covered under a commercial property insurance policy. For a single-location account, the SOV might be one row in a spreadsheet. For a large real estate portfolio or national retail chain, it can be hundreds or thousands of rows across multiple worksheets.
Each row in a typical SOV contains some combination of the following fields:
- Location number or identifier
- Street address, city, state, ZIP code
- Building description (e.g., “3-story office building,” “warehouse”)
- Construction type (frame, masonry, non-combustible, fire-resistive, modified fire-resistive)
- Year built and year renovated
- Square footage
- Number of stories
- Occupancy or use description
- Protection class or fire protection details (sprinkler type, alarm system)
- Building TIV (total insured value for the building itself)
- Contents value
- Business interruption (BI) or business income value
- Combined TIV (building + contents + BI)
- SIC or NAICS code
- Flood zone designation
- Distance to coast or wildfire exposure
This data is the foundation of commercial property underwriting. Without accurate SOV data, an underwriter cannot assess aggregate exposure, run catastrophe models, or price the account correctly. Manual SOV data entry for a 200-location account can take an experienced underwriting assistant 4 to 8 hours. That time cost, multiplied across hundreds of submissions per month, is what drives the demand for automated extraction.
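As a concrete sketch, the row fields listed above can be modeled as a typed record. The field names here are illustrative choices, not a standard schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SOVLocation:
    """One row of a Statement of Values. Field names are illustrative."""
    location_id: str
    street: str
    city: str
    state: str
    zip_code: str
    construction_type: Optional[str] = None
    year_built: Optional[int] = None
    square_footage: Optional[int] = None
    stories: Optional[int] = None
    building_tiv: Optional[float] = None
    contents_tiv: Optional[float] = None
    bi_tiv: Optional[float] = None

    @property
    def combined_tiv(self) -> float:
        """Building + contents + BI, treating missing values as zero."""
        return sum(v or 0.0 for v in (self.building_tiv, self.contents_tiv, self.bi_tiv))
```

Making most fields optional reflects reality: few SOVs populate every column, and downstream code has to tolerate gaps.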
Why SOV Extraction Is Uniquely Difficult
We have worked with AI extraction tools on many insurance document types: ACORD forms, loss runs, policy declarations, and claims correspondence. SOVs are harder than all of them, for specific technical reasons.
No Standard Format
ACORD forms have a defined layout. SOVs do not. Every broker, every MGA, and every large insured creates SOVs in their own format. Column names vary (“Building Value” vs “Bldg TIV” vs “Structure Replacement Cost”), column order varies, and the level of detail varies. Some SOVs include COPE data (construction, occupancy, protection, exposure) in separate columns; others combine it into a single “Building Description” field. Some list flood zones; others do not.
This format variability means you cannot build a single extraction template that works across all SOVs. Extraction tools need either (a) a library of templates for common formats, or (b) AI-powered layout detection that can adapt to unseen formats.
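One common building block for option (a) is a synonym table that maps the many broker spellings of a column onto one canonical field name. A minimal sketch, with example synonym lists that are nowhere near an exhaustive production mapping:

```python
from typing import Optional

# Illustrative header normalization. Each canonical field name maps to the
# broker spellings we might see for it; the lists here are examples only.
HEADER_SYNONYMS = {
    "building_tiv": {"building value", "bldg tiv", "structure replacement cost"},
    "contents_tiv": {"contents value", "contents tiv", "personal property"},
    "year_built": {"year built", "yr built", "construction year"},
}

def normalize_header(raw: str) -> Optional[str]:
    """Return the canonical field name for a raw column header, or None."""
    key = " ".join(raw.lower().replace("_", " ").split())
    for canonical, synonyms in HEADER_SYNONYMS.items():
        if key in synonyms:
            return canonical
    return None
```

A `None` result is the useful signal: it tells the pipeline a header needs human mapping before the column's data can be trusted.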
Multi-Page and Multi-Sheet Tables
SOVs frequently span 10, 20, or 50+ pages in PDF form, or multiple worksheets in Excel. A table that starts on page 1 continues on page 15, and the column headers may or may not repeat on each page. Excel SOVs commonly have separate tabs for different property types (buildings, equipment, inventory) or different geographic regions.
Extraction tools that process documents page by page often struggle with these multi-page tables because they lose context about column headers and row continuity across page breaks.
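The fix for lost row continuity is to stitch per-page fragments back into one table, treating a repeated header row as noise rather than data. A simplified sketch, assuming the extractor returns each page as a list of rows:

```python
def stitch_pages(pages):
    """Concatenate per-page table fragments into one table.

    `pages` is a list of row lists as a page-by-page extractor might
    return them. The header comes from the first page; a repeated header
    row on a continuation page is dropped rather than treated as data.
    Illustrative only: real SOVs also need fuzzy header matching.
    """
    header = pages[0][0]
    rows = list(pages[0][1:])
    for page in pages[1:]:
        for row in page:
            if row == header:  # repeated header on a continuation page
                continue
            rows.append(row)
    return header, rows
```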
Merged Cells and Irregular Layouts
SOVs from large brokers frequently use merged cells to group locations under a single street address or to span a “building description” across multiple data columns. These merged cells break grid-based extraction approaches that assume each cell contains one piece of data.
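When an extractor emits a merged cell's value only on its first row and blanks on the rows it spans, the usual repair for grouping columns (such as a shared street address) is to carry the value down. A sketch of that forward fill, assuming blanks arrive as `None`:

```python
def forward_fill_merged(rows, merge_cols):
    """Forward-fill columns where merged cells leave blanks in later rows.

    `merge_cols` lists the column indexes known to use merged grouping.
    Filling is only correct for grouping columns; applying it to data
    columns would fabricate values, so the column list must be explicit.
    """
    filled, last = [], {}
    for row in rows:
        row = list(row)
        for c in merge_cols:
            if row[c] is None:
                row[c] = last.get(c)  # inherit the merged value from above
            else:
                last[c] = row[c]
        filled.append(row)
    return filled
```

Restricting the fill to named columns is deliberate: as discussed under error patterns later, blindly copying merged values into every row is itself a known failure mode.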
Mixed Data Types in Single Columns
A single SOV column might contain street addresses (text), construction codes (categorical), dollar amounts (numeric), and percentages (numeric) depending on the row. Worse, some SOVs use a “notes” or “comments” column that contains unstructured text mixed in with the structured data.
Inconsistent Number Formatting
TIV values in SOVs appear as “$1,250,000” or “1250000” or “1,250” (meaning $1,250,000 with implied thousands) or “1.25M.” Different brokers use different conventions, and within a single SOV, formatting can change between worksheets. Misinterpreting a TIV value by three orders of magnitude is a real and consequential error.
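A parser for these conventions has to combine literal parsing with a contextual check against peer values in the same column. A heuristic sketch, not a production parser; real SOVs need per-broker rules and human review:

```python
import re

def parse_tiv(raw, column_median=None):
    """Parse a TIV cell into a dollar amount, or None if unparseable.

    Handles "$1,250,000", "1250000", "1.25M", "500K", and an
    implied-thousands heuristic keyed off the column median.
    """
    s = str(raw).strip().replace("$", "").replace(",", "")
    m = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([mk])?", s, re.IGNORECASE)
    if not m:
        return None
    value = float(m.group(1))
    suffix = (m.group(2) or "").lower()
    if suffix == "m":
        value *= 1_000_000
    elif suffix == "k":
        value *= 1_000
    # Implied-thousands heuristic: if peers in the column are ~1000x
    # larger, "1,250" likely means $1,250,000. A rough signal only;
    # a three-orders-of-magnitude miss here is exactly the error to avoid.
    if column_median and value and column_median / value > 100:
        value *= 1_000
    return value
```

Anything the parser cannot resolve should be routed to human review rather than guessed, given the cost of a magnitude error.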
What Needs to Be Extracted
For a typical commercial property underwriting workflow, the minimum viable extraction from an SOV includes:
| Field | Priority | Difficulty | Notes |
|---|---|---|---|
| Location address (full) | Critical | Medium | Address parsing, geocoding challenges |
| Building TIV | Critical | Medium | Number format inconsistency |
| Contents TIV | Critical | Medium | Sometimes combined with building |
| BI TIV | High | Medium | Not always present |
| Construction type | High | Hard | Varies between coded and descriptive |
| Year built | High | Easy | Usually numeric, straightforward |
| Square footage | High | Easy | Usually numeric |
| Number of stories | Medium | Easy | Usually numeric |
| Occupancy description | Medium | Hard | Unstructured text, varies wildly |
| Protection class | Medium | Hard | Multiple coding systems |
| SIC/NAICS code | Medium | Easy | When present, usually clean |
| Flood zone | Medium | Easy | When present, usually coded |
The “difficulty” column reflects our experience with AI extraction accuracy, not the inherent complexity of the data type.
Tool Comparison: Sensible vs Reducto vs SortSpoke
We tested three tools on a standardized set of 150 SOVs representing different formats, sizes, and quality levels. Here is what we found.
| Capability | Sensible | Reducto | SortSpoke |
|---|---|---|---|
| Approach | JSON config templates + AI fallback | AI-first table extraction | Insurance-specific IDP |
| SOV-specific templates | Pre-built for common formats | No pre-built templates | Pre-built for insurance submissions |
| Multi-page table handling | Good (with config) | Very good (AI-detected continuity) | Good (underwriter-configurable rules) |
| Excel SOV support | Yes (XLSX upload) | Yes (XLSX and CSV) | Yes (XLSX, CSV, PDF) |
| Merged cell handling | Requires manual config | Good (AI interprets merged regions) | Moderate (rule-based) |
| TIV accuracy (clean format) | 94% field-level | 96% field-level | 92% field-level |
| TIV accuracy (messy format) | 78% field-level | 85% field-level | 80% field-level |
| Address parsing quality | Good | Very good | Good |
| Output format | JSON (structured) | JSON (structured) | JSON + ACORD-mapped fields |
| Pricing model | Per-page | Credit-based | Per-document |
| Best for | Teams with developer resources | Highest accuracy needs | Underwriter-configurable workflows |
| Weakest area | Complex multi-sheet Excel | No insurance-specific templates | Lower accuracy on novel formats |
Accuracy figures are from our 150-SOV test set. Field-level accuracy means the percentage of individual extracted fields that matched our human-verified ground truth. “Clean format” means well-structured Excel SOVs with clear headers. “Messy format” means scanned PDFs, irregular layouts, and inconsistent formatting.
Sensible: Template-Driven Extraction
How It Works
Sensible uses a configuration-based approach: you define JSON extraction configs that tell the system where to find specific fields in a document layout. For SOVs, this means creating a config that maps column positions, header labels, and data types for each SOV format you encounter.
Sensible also offers AI-powered “instruct” queries that let you extract data without a rigid template, but for production SOV extraction at scale, the template-based approach is more reliable.
SOV Strengths
Sensible has pre-built templates for several common broker SOV formats, which accelerates initial setup. Its per-page pricing makes it cost-effective for variable-length SOVs (you pay proportionally for a 5-page SOV vs a 50-page SOV). The JSON config approach gives you precise control over extraction behavior, which matters when you need to handle edge cases like “this broker puts contents TIV in the same column as building TIV, separated by a slash.”
SOV Weaknesses
The template approach means someone needs to create and maintain configs for each SOV format your team encounters. For a team that receives SOVs from 50 different brokers, that can mean 50+ configs. Sensible’s AI fallback reduces this burden, but we found it less accurate than template-based extraction on complex SOVs.
Multi-sheet Excel SOVs were Sensible’s weakest area. When data spans multiple worksheets with different column structures per sheet, the config-based approach requires separate configs per sheet layout, which adds maintenance overhead.
Pricing
Sensible charges per page processed. For SOV extraction, expect to spend $0.10 to $0.50 per page depending on your volume tier. A 20-page SOV costs $2 to $10 to process.
Reducto: AI-First Table Extraction
How It Works
Reducto takes an AI-first approach to table extraction. Rather than requiring pre-built templates, Reducto’s models detect table boundaries, infer column headers, handle merged cells, and extract structured data with minimal configuration. You upload a document, specify what you want to extract (or let the system auto-detect), and receive structured JSON output.
SOV Strengths
Reducto had the highest accuracy in our testing, particularly on “messy” SOVs with irregular layouts, merged cells, and multi-page table continuity. Its AI models are strong at detecting where a table continues across page breaks and at interpreting merged cell regions. For address parsing, Reducto consistently outperformed the other two tools, correctly separating street, city, state, and ZIP even when the SOV combined them in a single column.
Reducto also handled the TIV formatting challenge better than the alternatives. When SOVs used implied-thousands notation (“1,250” meaning $1,250,000), Reducto’s contextual analysis was more likely to interpret the value correctly by looking at surrounding values in the same column.
SOV Weaknesses
Reducto has no insurance-specific templates or field mappings. You get raw structured data, and your team needs to map it to your underwriting system’s expected format. For teams that need ACORD-aligned field names or specific risk system mappings, Reducto adds a data transformation step.
Credit-based pricing can be harder to predict for budgeting purposes. Complex SOVs consume more credits than simple ones, and the relationship between document complexity and credit consumption is not always transparent.
Pricing
Reducto uses a credit-based system. A simple SOV might cost 5 to 10 credits; a complex multi-page SOV with merged cells and irregular formatting might cost 20 to 40 credits. At typical credit pricing ($0.05 to $0.15 per credit), a complex SOV costs $1 to $6 to process.
SortSpoke: Insurance-Specific Workflow
How It Works
SortSpoke is built specifically for insurance submission intake, which means SOV extraction is a core use case rather than a generic document processing feature. The platform provides underwriter-configurable rules for extraction, pre-built mappings to ACORD field standards, and a review interface where underwriters can correct extraction errors.
SOV Strengths
SortSpoke’s insurance-specific design means it understands the business context of SOV data. It knows that “Construction Type” should map to ISO construction classes, that TIV values should sum to a total that matches the schedule total, and that location numbers should be sequential. This domain knowledge catches errors that generic extraction tools miss.
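The domain checks described here are straightforward to express in code. A sketch of two of them, using illustrative key names, as a generic pipeline might bolt on after any of these tools:

```python
def validate_sov(rows, schedule_total=None, tolerance=0.01):
    """Insurance-aware sanity checks on extracted SOV rows.

    `rows` are dicts with 'location_no', 'building_tiv', 'contents_tiv',
    and 'bi_tiv' keys (illustrative names). Returns warning strings.
    """
    warnings = []
    # Location numbers should run sequentially from 1.
    numbers = [r["location_no"] for r in rows]
    if numbers != list(range(1, len(rows) + 1)):
        warnings.append("location numbers are not sequential from 1")
    # Row TIVs should sum to the stated schedule total, within tolerance.
    total = sum(r["building_tiv"] + r["contents_tiv"] + r["bi_tiv"] for r in rows)
    if schedule_total and abs(total - schedule_total) / schedule_total > tolerance:
        warnings.append(
            f"row TIVs sum to {total:,.0f}, schedule states {schedule_total:,.0f}"
        )
    return warnings
```

A sum mismatch is a strong signal of column confusion or phantom rows, which is why it catches errors that field-by-field extraction checks miss.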
The underwriter-configurable rules are valuable for teams that want to adjust extraction behavior without developer involvement. If a new broker format requires a column mapping change, an underwriter can make the adjustment through the interface rather than filing a ticket with the engineering team.
SortSpoke’s review interface is the best of the three for human-in-the-loop correction. Extracted values are displayed alongside the source document, and corrections feed back into the extraction model.
SOV Weaknesses
SortSpoke’s accuracy on novel formats (SOVs from brokers that look nothing like any template in its library) was lower than Reducto’s. The rule-based approach works well for formats within its training set and less well for outliers.
Per-document pricing can be expensive for very large SOVs (100+ pages) compared to per-page pricing.
Pricing
SortSpoke prices per document processed. For SOVs, expect $5 to $25 per document depending on size and complexity. Volume discounts apply for teams processing hundreds of SOVs per month.
Testing Your SOV Parser: What to Measure
If you are evaluating SOV extraction tools, run a structured test. Here is the methodology we used.
Build a Test Set
Collect 50 to 100 SOVs that represent the range of formats your team actually receives. Include:
- Clean, well-structured Excel SOVs from large brokers (the easy cases)
- Scanned PDF SOVs with varying scan quality
- Multi-sheet Excel SOVs with different layouts per sheet
- SOVs with merged cells, color-coded rows, and embedded comments
- SOVs with unusual TIV formatting (implied thousands, abbreviated millions)
- SOVs from non-U.S. markets (Canadian addresses, international construction codes)
Define Ground Truth
Have experienced underwriting staff manually extract the data from each SOV. This is the expensive part (4 to 8 hours per complex SOV), but without ground truth, you cannot measure accuracy.
Measure These Metrics
| Metric | What It Measures | Target |
|---|---|---|
| Field-level accuracy | % of individual fields correctly extracted | 90%+ for clean, 80%+ for messy |
| Row-level completeness | % of rows where ALL fields are correct | 75%+ for clean, 60%+ for messy |
| TIV accuracy | % of TIV fields within 1% of ground truth | 95%+ (TIV errors have downstream cost) |
| Address parse rate | % of addresses correctly parsed into components | 85%+ |
| False extraction rate | % of extracted rows that do not exist in source | Below 2% |
| Processing time | Seconds per page or per document | Depends on workflow requirements |
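The first two metrics in the table can be scored directly once extraction output and ground truth are aligned row by row. A minimal sketch, assuming both are lists of dicts with identical keys:

```python
def accuracy_metrics(extracted, truth):
    """Field-level accuracy and row-level completeness vs. ground truth.

    Both arguments are lists of dicts with the same keys, aligned by row.
    Illustrative scoring; production tests also need row alignment logic
    for the false-extraction (phantom row) metric.
    """
    fields = correct = complete_rows = 0
    for ext, gt in zip(extracted, truth):
        row_ok = True
        for key, expected in gt.items():
            fields += 1
            if ext.get(key) == expected:
                correct += 1
            else:
                row_ok = False
        complete_rows += row_ok
    return {
        "field_accuracy": correct / fields,
        "row_completeness": complete_rows / len(truth),
    }
```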
Watch for These Error Patterns
Based on our testing, the most common extraction errors across all three platforms are:
- TIV column confusion. The extractor maps building TIV to contents TIV or vice versa, doubling one value and zeroing the other. This is the most consequential error because it changes the aggregate exposure.
- Header row misidentification. Multi-page SOVs sometimes have subtotal rows or section headers that the extractor interprets as data rows, creating phantom locations.
- Address truncation. Long addresses (suite numbers, building identifiers) get truncated or split across two rows.
- Merged cell data duplication. When a merged cell spans multiple rows, some extractors copy the merged value into every row, creating duplicate data.
- Currency vs. count confusion. A column that should contain a dollar amount is interpreted as a count (or vice versa), particularly when the column header is ambiguous.
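The header-row misidentification pattern in particular can be screened for cheaply before data reaches the underwriting system. A crude sketch using illustrative key names; flagged rows still need human review:

```python
def flag_suspect_rows(rows):
    """Flag rows that look like subtotals or section headers, not locations.

    `rows` are dicts with 'address' and 'tiv' keys (illustrative names).
    Flags rows whose address contains a total-like label, or that carry a
    TIV with no address at all. Returns the indexes of suspect rows.
    """
    suspects = []
    for i, row in enumerate(rows):
        address = (row.get("address") or "").strip().lower()
        if any(word in address for word in ("total", "subtotal", "grand")):
            suspects.append(i)
        elif not address and row.get("tiv"):
            suspects.append(i)
    return suspects
```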
Integration into the Underwriting Workflow
Extracting SOV data is only the first step. The extracted data needs to flow into the underwriting workflow for it to create value. Here is a typical integration pattern.
SOV Upload and Extraction
The underwriting assistant or submission intake team uploads the SOV to the extraction platform. The platform processes the document and returns structured data, usually in JSON format.
Quality Review
An underwriter or experienced assistant reviews the extracted data against the source document. This review focuses on TIV accuracy (the highest-consequence fields), address completeness, and construction type mapping. Most teams report that review takes 15 to 30 minutes for a well-extracted SOV, compared to 4 to 8 hours for fully manual data entry.
Data Transformation and Loading
The extracted and reviewed data is transformed into the format required by the underwriting system (rating engine, catastrophe model, or policy administration system). This step often requires mapping extracted field names to system-specific field names and converting construction type descriptions to ISO codes.
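Both halves of that transformation, field renaming and code coercion, can be table-driven. A sketch with example mappings (the field names are invented for illustration; the ISO construction classes 1 through 6 are the standard Frame-through-Fire-Resistive scale, but verify the coding your rating engine expects):

```python
# Illustrative transformation step. FIELD_MAP renames extracted fields to
# the target system's names; ISO_CONSTRUCTION coerces descriptions to ISO
# class numbers (1=Frame ... 6=Fire Resistive). Neither table is complete.
FIELD_MAP = {"building_tiv": "BldgLimit", "year_built": "YearBuilt"}
ISO_CONSTRUCTION = {
    "frame": 1, "joisted masonry": 2, "non-combustible": 3,
    "masonry non-combustible": 4, "modified fire resistive": 5,
    "fire resistive": 6,
}

def transform_row(row):
    """Rename fields and attach the ISO class (None if unrecognized)."""
    out = {FIELD_MAP.get(k, k): v for k, v in row.items()}
    desc = str(row.get("construction_type", "")).lower().strip()
    out["ISOConstructionClass"] = ISO_CONSTRUCTION.get(desc)
    return out
```

A `None` class is again the useful output: it queues the row for a human decision instead of silently defaulting to a construction class.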
Catastrophe Modeling and Rating
With structured SOV data loaded, the underwriter can run catastrophe exposure analysis, apply location-specific rating factors, and generate a quote. The speed improvement from automated extraction matters here because it shortens the time from submission receipt to quote delivery, which is a competitive differentiator in commercial property markets.
Common Failure Points and How to Handle Them
Multi-Sheet Excel SOVs
When an SOV arrives as a multi-sheet Excel file, determine whether each sheet has the same column structure or different structures. Same-structure sheets (just split by region or property type) can usually be concatenated before extraction. Different-structure sheets need separate extraction configs or separate AI processing runs.
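The same-structure check and the concatenation can live in one function, with a hard failure when sheets do not actually share a layout. A sketch, assuming sheets arrive as a name-to-(header, rows) mapping and keeping the sheet name for traceability:

```python
def concat_same_structure(sheets):
    """Concatenate Excel sheets that share one column structure.

    `sheets` maps sheet name to (header, rows). Raises ValueError when
    headers differ, which is the signal that sheets need separate
    extraction configs instead. Illustrative sketch.
    """
    headers = {tuple(h) for h, _ in sheets.values()}
    if len(headers) != 1:
        raise ValueError("sheets have different column structures")
    header = next(iter(headers))
    # Append the source sheet name so region/property-type splits survive.
    combined = [row + [name] for name, (_, rows) in sheets.items() for row in rows]
    return list(header) + ["source_sheet"], combined
```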
Location Numbers That Reset Per Schedule
Some SOVs number locations within each sub-schedule (locations 1 through 50 for buildings, then 1 through 30 for equipment schedules). Extraction tools may not distinguish between “location 1” in the building schedule and “location 1” in the equipment schedule. Add a schedule identifier prefix during data transformation.
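The prefixing step is a few lines. A sketch, assuming each sub-schedule arrives as its own list of row dicts with a `location_no` key (an illustrative name):

```python
def prefix_location_ids(schedules):
    """Disambiguate per-schedule location numbers with a schedule prefix.

    `schedules` maps schedule name to a list of row dicts carrying a
    'location_no' key; location 1 in 'BLDG' becomes 'BLDG-1'.
    """
    rows = []
    for name, schedule in schedules.items():
        for row in schedule:
            row = dict(row)  # copy so the source rows stay untouched
            row["location_id"] = f"{name}-{row['location_no']}"
            rows.append(row)
    return rows
```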
Combined vs. Separate TIV Fields
Some SOVs list building, contents, and BI as separate columns. Others combine them into a single “Total TIV” column. Your extraction logic needs to handle both patterns and flag when it cannot determine the breakdown.
Canadian vs. U.S. Address Formats
Canadian addresses use postal codes (letter-number-letter number-letter-number) instead of ZIP codes, and provinces instead of states. If your book includes Canadian locations, test address parsing explicitly for Canadian formats. Several extraction tools we tested had notably lower accuracy on Canadian addresses.
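A cheap pre-check is to classify the postal code before choosing an address-parsing path. A sketch that acts as a routing hint, not a full validator:

```python
import re

# Canadian postal codes follow letter-digit-letter digit-letter-digit
# (e.g. "M5V 2T6"); U.S. ZIP codes are five digits with an optional +4.
CA_POSTAL = re.compile(r"^[A-Za-z]\d[A-Za-z] ?\d[A-Za-z]\d$")
US_ZIP = re.compile(r"^\d{5}(-\d{4})?$")

def postal_country(code):
    """Classify a postal code as 'CA', 'US', or None (unrecognized).

    A routing hint for choosing an address parser, not a validator:
    it does not check that the code actually exists.
    """
    code = code.strip()
    if CA_POSTAL.match(code):
        return "CA"
    if US_ZIP.match(code):
        return "US"
    return None
```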
Choosing the Right Tool
Choose Sensible if: you have developer resources to create and maintain extraction configs, you process a high volume of SOVs from a relatively stable set of broker formats, and you want per-page pricing that scales predictably.
Choose Reducto if: accuracy is the top priority, you receive SOVs in highly variable formats that do not fit pre-built templates, and your team can handle the data transformation from raw JSON to your underwriting system’s format.
Choose SortSpoke if: you want an insurance-specific workflow with underwriter-configurable rules, you need ACORD field mappings out of the box, and your team values a built-in review interface over raw API access.
For most mid-market underwriting teams processing 50 to 200 SOVs per month, we recommend starting with one tool on a defined test set before committing to an annual contract. The accuracy differences between tools matter less than the fit with your team’s technical capabilities and workflow preferences.