How to Use AI for SOV Data Extraction
SOVs are among the hardest insurance documents to parse because they vary wildly by broker, contain multi-page property schedules, and mix addresses with building characteristics in inconsistent column structures. API-first tools handle simple SOVs; complex multi-sheet formats call for enterprise intelligent document processing (IDP).
A Statement of Values (SOV) is the document that makes or breaks a commercial property submission. It lists every location in a property schedule with building characteristics, total insured values, and risk details that an underwriter needs to price the account. And it is, without exaggeration, one of the most difficult document types in insurance to extract data from automatically.
Our team has processed over 2,000 SOVs across three different extraction platforms over the past 18 months. We work primarily with mid-market commercial property underwriters who receive SOVs in every conceivable format: multi-tab Excel workbooks, scanned PDFs of printed spreadsheets, broker-proprietary templates, and occasionally hand-typed documents with no consistent structure at all. This guide covers what we learned about using AI extraction tools for SOVs, including where these tools work well and where they fail.
What Is an SOV and Why It Matters
An SOV (Statement of Values, sometimes called a Schedule of Values or Property Schedule) is a structured document that lists all the properties, buildings, or locations covered under a commercial property insurance policy. For a single-location account, the SOV might be one row in a spreadsheet. For a large real estate portfolio or national retail chain, it can be hundreds or thousands of rows across multiple worksheets.
Each row in a typical SOV contains some combination of the following fields:
- Location number or identifier
- Street address, city, state, ZIP code
- Building description (e.g., “3-story office building,” “warehouse”)
- Construction type (frame, masonry, non-combustible, fire-resistive, modified fire-resistive)
- Year built and year renovated
- Square footage
- Number of stories
- Occupancy or use description
- Protection class or fire protection details (sprinkler type, alarm system)
- Building TIV (total insured value for the building itself)
- Contents value
- Business interruption (BI) or business income value
- Combined TIV (building + contents + BI)
- SIC or NAICS code
- Flood zone designation
- Distance to coast or wildfire exposure
This data is the foundation of commercial property underwriting. Without accurate SOV data, an underwriter cannot assess aggregate exposure, run catastrophe models, or price the account correctly. Manual SOV data entry for a 200-location account can take an experienced underwriting assistant 4 to 8 hours. That time cost, multiplied across hundreds of submissions per month, is what drives the demand for automated extraction.
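As a concrete sketch, the row fields listed above can be modeled as a typed record. The field names here are illustrative choices, not a standard schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SOVLocation:
    """One row of a Statement of Values. Field names are illustrative."""
    location_id: str
    street: str
    city: str
    state: str
    zip_code: str
    construction_type: Optional[str] = None
    year_built: Optional[int] = None
    square_footage: Optional[int] = None
    stories: Optional[int] = None
    building_tiv: Optional[float] = None
    contents_tiv: Optional[float] = None
    bi_tiv: Optional[float] = None

    @property
    def combined_tiv(self) -> float:
        """Building + contents + BI, treating missing values as zero."""
        return sum(v or 0.0 for v in (self.building_tiv, self.contents_tiv, self.bi_tiv))
```

Making most fields optional reflects reality: few SOVs populate every column, and downstream code has to tolerate gaps.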
Why SOV Extraction Is Uniquely Difficult
We have worked with AI extraction tools on many insurance document types: ACORD forms, loss runs, policy declarations, and claims correspondence. SOVs are harder than all of them, for specific technical reasons.
No Standard Format
ACORD forms have a defined layout. SOVs do not. Every broker, every MGA, and every large insured creates SOVs in their own format. Column names vary (“Building Value” vs “Bldg TIV” vs “Structure Replacement Cost”), column order varies, and the level of detail varies. Some SOVs include COPE data (construction, occupancy, protection, exposure) in separate columns; others combine it into a single “Building Description” field. Some list flood zones; others do not.
This format variability means you cannot build a single extraction template that works across all SOVs. Extraction tools need either (a) a library of templates for common formats, or (b) AI-powered layout detection that can adapt to unseen formats.
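One common building block for option (a) is a synonym table that maps the many broker spellings of a column onto one canonical field name. A minimal sketch, with example synonym lists that are nowhere near an exhaustive production mapping:

```python
from typing import Optional

# Illustrative header normalization. Each canonical field name maps to the
# broker spellings we might see for it; the lists here are examples only.
HEADER_SYNONYMS = {
    "building_tiv": {"building value", "bldg tiv", "structure replacement cost"},
    "contents_tiv": {"contents value", "contents tiv", "personal property"},
    "year_built": {"year built", "yr built", "construction year"},
}

def normalize_header(raw: str) -> Optional[str]:
    """Return the canonical field name for a raw column header, or None."""
    key = " ".join(raw.lower().replace("_", " ").split())
    for canonical, synonyms in HEADER_SYNONYMS.items():
        if key in synonyms:
            return canonical
    return None
```

A `None` result is the useful signal: it tells the pipeline a header needs human mapping before the column's data can be trusted.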
Multi-Page and Multi-Sheet Tables
SOVs frequently span 10, 20, or 50+ pages in PDF form, or multiple worksheets in Excel. A table that starts on page 1 continues on page 15, and the column headers may or may not repeat on each page. Excel SOVs commonly have separate tabs for different property types (buildings, equipment, inventory) or different geographic regions.
Extraction tools that process documents page by page often struggle with these multi-page tables because they lose context about column headers and row continuity across page breaks.
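The fix for lost row continuity is to stitch per-page fragments back into one table, treating a repeated header row as noise rather than data. A simplified sketch, assuming the extractor returns each page as a list of rows:

```python
def stitch_pages(pages):
    """Concatenate per-page table fragments into one table.

    `pages` is a list of row lists as a page-by-page extractor might
    return them. The header comes from the first page; a repeated header
    row on a continuation page is dropped rather than treated as data.
    Illustrative only: real SOVs also need fuzzy header matching.
    """
    header = pages[0][0]
    rows = list(pages[0][1:])
    for page in pages[1:]:
        for row in page:
            if row == header:  # repeated header on a continuation page
                continue
            rows.append(row)
    return header, rows
```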
Merged Cells and Irregular Layouts
SOVs from large brokers frequently use merged cells to group locations under a single street address or to span a “building description” across multiple data columns. These merged cells break grid-based extraction approaches that assume each cell contains one piece of data.
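When an extractor emits a merged cell's value only on its first row and blanks on the rows it spans, the usual repair for grouping columns (such as a shared street address) is to carry the value down. A sketch of that forward fill, assuming blanks arrive as `None`:

```python
def forward_fill_merged(rows, merge_cols):
    """Forward-fill columns where merged cells leave blanks in later rows.

    `merge_cols` lists the column indexes known to use merged grouping.
    Filling is only correct for grouping columns; applying it to data
    columns would fabricate values, so the column list must be explicit.
    """
    filled, last = [], {}
    for row in rows:
        row = list(row)
        for c in merge_cols:
            if row[c] is None:
                row[c] = last.get(c)  # inherit the merged value from above
            else:
                last[c] = row[c]
        filled.append(row)
    return filled
```

Restricting the fill to named columns is deliberate: as discussed under error patterns later, blindly copying merged values into every row is itself a known failure mode.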
Mixed Data Types in Single Columns
A single SOV column might contain street addresses (text), construction codes (categorical), dollar amounts (numeric), and percentages (numeric) depending on the row. Worse, some SOVs use a “notes” or “comments” column that contains unstructured text mixed in with the structured data.
Inconsistent Number Formatting
TIV values in SOVs appear as “$1,250,000” or “1250000” or “1,250” (meaning $1,250,000 with implied thousands) or “1.25M.” Different brokers use different conventions, and within a single SOV, formatting can change between worksheets. Misinterpreting a TIV value by three orders of magnitude is a real and consequential error.
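A parser for these conventions has to combine literal parsing with a contextual check against peer values in the same column. A heuristic sketch, not a production parser; real SOVs need per-broker rules and human review:

```python
import re

def parse_tiv(raw, column_median=None):
    """Parse a TIV cell into a dollar amount, or None if unparseable.

    Handles "$1,250,000", "1250000", "1.25M", "500K", and an
    implied-thousands heuristic keyed off the column median.
    """
    s = str(raw).strip().replace("$", "").replace(",", "")
    m = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([mk])?", s, re.IGNORECASE)
    if not m:
        return None
    value = float(m.group(1))
    suffix = (m.group(2) or "").lower()
    if suffix == "m":
        value *= 1_000_000
    elif suffix == "k":
        value *= 1_000
    # Implied-thousands heuristic: if peers in the column are ~1000x
    # larger, "1,250" likely means $1,250,000. A rough signal only;
    # a three-orders-of-magnitude miss here is exactly the error to avoid.
    if column_median and value and column_median / value > 100:
        value *= 1_000
    return value
```

Anything the parser cannot resolve should be routed to human review rather than guessed, given the cost of a magnitude error.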
What Needs to Be Extracted
For a typical commercial property underwriting workflow, the minimum viable extraction from an SOV includes:
| Field | Priority | Difficulty | Notes |
|---|---|---|---|
| Location address (full) | Critical | Medium | Address parsing, geocoding challenges |
| Building TIV | Critical | Medium | Number format inconsistency |
| Contents TIV | Critical | Medium | Sometimes combined with building |
| BI TIV | High | Medium | Not always present |
| Construction type | High | Hard | Varies between coded and descriptive |
| Year built | High | Easy | Usually numeric, straightforward |
| Square footage | High | Easy | Usually numeric |
| Number of stories | Medium | Easy | Usually numeric |
| Occupancy description | Medium | Hard | Unstructured text, varies wildly |
| Protection class | Medium | Hard | Multiple coding systems |
| SIC/NAICS code | Medium | Easy | When present, usually clean |
| Flood zone | Medium | Easy | When present, usually coded |
The “difficulty” column reflects our experience with AI extraction accuracy, not the inherent complexity of the data type.
Tool Comparison: Sensible vs Reducto vs SortSpoke
We tested three tools on a standardized set of 150 SOVs representing different formats, sizes, and quality levels. Here is what we found.
| Capability | Sensible | Reducto | SortSpoke |
|---|---|---|---|
| Approach | JSON config templates + AI fallback | AI-first table extraction | Insurance-specific IDP |
| SOV-specific templates | Pre-built for common formats | No pre-built templates | Pre-built for insurance submissions |
| Multi-page table handling | Good (with config) | Very good (AI-detected continuity) | Good (underwriter-configurable rules) |
| Excel SOV support | Yes (XLSX upload) | Yes (XLSX and CSV) | Yes (XLSX, CSV, PDF) |
| Merged cell handling | Requires manual config | Good (AI interprets merged regions) | Moderate (rule-based) |
| TIV accuracy (clean format) | 94% field-level | 96% field-level | 92% field-level |
| TIV accuracy (messy format) | 78% field-level | 85% field-level | 80% field-level |
| Address parsing quality | Good | Very good | Good |
| Output format | JSON (structured) | JSON (structured) | JSON + ACORD-mapped fields |
| Pricing model | Per-page | Credit-based | Per-document |
| Best for | Teams with developer resources | Highest accuracy needs | Underwriter-configurable workflows |
| Weakest area | Complex multi-sheet Excel | No insurance-specific templates | Lower accuracy on novel formats |
Accuracy figures are from our 150-SOV test set. Field-level accuracy means the percentage of individual extracted fields that matched our human-verified ground truth. “Clean format” means well-structured Excel SOVs with clear headers. “Messy format” means scanned PDFs, irregular layouts, and inconsistent formatting.
Sensible: Template-Driven Extraction
How It Works
Sensible uses a configuration-based approach: you define JSON extraction configs that tell the system where to find specific fields in a document layout. For SOVs, this means creating a config that maps column positions, header labels, and data types for each SOV format you encounter.
Sensible also offers AI-powered “instruct” queries that let you extract data without a rigid template, but for production SOV extraction at scale, the template-based approach is more reliable.
SOV Strengths
Sensible has pre-built templates for several common broker SOV formats, which accelerates initial setup. Its per-page pricing makes it cost-effective for variable-length SOVs (you pay proportionally for a 5-page SOV vs a 50-page SOV). The JSON config approach gives you precise control over extraction behavior, which matters when you need to handle edge cases like “this broker puts contents TIV in the same column as building TIV, separated by a slash.”
SOV Weaknesses
The template approach means someone needs to create and maintain configs for each SOV format your team encounters. For a team that receives SOVs from 50 different brokers, that can mean 50+ configs. Sensible’s AI fallback reduces this burden, but we found it less accurate than template-based extraction on complex SOVs.
Multi-sheet Excel SOVs were Sensible’s weakest area. When data spans multiple worksheets with different column structures per sheet, the config-based approach requires separate configs per sheet layout, which adds maintenance overhead.
Pricing
Sensible charges per page processed. For SOV extraction, expect to spend $0.10 to $0.50 per page depending on your volume tier. A 20-page SOV costs $2 to $10 to process.
Reducto: AI-First Table Extraction
How It Works
Reducto takes an AI-first approach to table extraction. Rather than requiring pre-built templates, Reducto’s models detect table boundaries, infer column headers, handle merged cells, and extract structured data with minimal configuration. You upload a document, specify what you want to extract (or let the system auto-detect), and receive structured JSON output.
SOV Strengths
Reducto had the highest accuracy in our testing, particularly on “messy” SOVs with irregular layouts, merged cells, and multi-page table continuity. Its AI models are strong at detecting where a table continues across page breaks and at interpreting merged cell regions. For address parsing, Reducto consistently outperformed the other two tools, correctly separating street, city, state, and ZIP even when the SOV combined them in a single column.
Reducto also handled the TIV formatting challenge better than the alternatives. When SOVs used implied-thousands notation (“1,250” meaning $1,250,000), Reducto’s contextual analysis was more likely to interpret the value correctly by looking at surrounding values in the same column.
SOV Weaknesses
Reducto has no insurance-specific templates or field mappings. You get raw structured data, and your team needs to map it to your underwriting system’s expected format. For teams that need ACORD-aligned field names or specific risk system mappings, Reducto adds a data transformation step.
Credit-based pricing can be harder to predict for budgeting purposes. Complex SOVs consume more credits than simple ones, and the relationship between document complexity and credit consumption is not always transparent.
Pricing
Reducto uses a credit-based system. A simple SOV might cost 5 to 10 credits; a complex multi-page SOV with merged cells and irregular formatting might cost 20 to 40 credits. At typical credit pricing ($0.05 to $0.15 per credit), a complex SOV costs $1 to $6 to process.
SortSpoke: Insurance-Specific Workflow
How It Works
SortSpoke is built specifically for insurance submission intake, which means SOV extraction is a core use case rather than a generic document processing feature. The platform provides underwriter-configurable rules for extraction, pre-built mappings to ACORD field standards, and a review interface where underwriters can correct extraction errors.
SOV Strengths
SortSpoke’s insurance-specific design means it understands the business context of SOV data. It knows that “Construction Type” should map to ISO construction classes, that TIV values should sum to a total that matches the schedule total, and that location numbers should be sequential. This domain knowledge catches errors that generic extraction tools miss.
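The domain checks described here are straightforward to express in code. A sketch of two of them, using illustrative key names, as a generic pipeline might bolt on after any of these tools:

```python
def validate_sov(rows, schedule_total=None, tolerance=0.01):
    """Insurance-aware sanity checks on extracted SOV rows.

    `rows` are dicts with 'location_no', 'building_tiv', 'contents_tiv',
    and 'bi_tiv' keys (illustrative names). Returns warning strings.
    """
    warnings = []
    # Location numbers should run sequentially from 1.
    numbers = [r["location_no"] for r in rows]
    if numbers != list(range(1, len(rows) + 1)):
        warnings.append("location numbers are not sequential from 1")
    # Row TIVs should sum to the stated schedule total, within tolerance.
    total = sum(r["building_tiv"] + r["contents_tiv"] + r["bi_tiv"] for r in rows)
    if schedule_total and abs(total - schedule_total) / schedule_total > tolerance:
        warnings.append(
            f"row TIVs sum to {total:,.0f}, schedule states {schedule_total:,.0f}"
        )
    return warnings
```

A sum mismatch is a strong signal of column confusion or phantom rows, which is why it catches errors that field-by-field extraction checks miss.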
The underwriter-configurable rules are valuable for teams that want to adjust extraction behavior without developer involvement. If a new broker format requires a column mapping change, an underwriter can make the adjustment through the interface rather than filing a ticket with the engineering team.
SortSpoke’s review interface is the best of the three for human-in-the-loop correction. Extracted values are displayed alongside the source document, and corrections feed back into the extraction model.
SOV Weaknesses
SortSpoke’s accuracy on novel formats (SOVs from brokers that look nothing like any template in its library) was lower than Reducto’s. The rule-based approach works well for formats within its training set and less well for outliers.
Per-document pricing can be expensive for very large SOVs (100+ pages) compared to per-page pricing.
Pricing
SortSpoke prices per document processed. For SOVs, expect $5 to $25 per document depending on size and complexity. Volume discounts apply for teams processing hundreds of SOVs per month.
Testing Your SOV Parser: What to Measure
If you are evaluating SOV extraction tools, run a structured test. Here is the methodology we used.
Build a Test Set
Collect 50 to 100 SOVs that represent the range of formats your team actually receives. Include:
- Clean, well-structured Excel SOVs from large brokers (the easy cases)
- Scanned PDF SOVs with varying scan quality
- Multi-sheet Excel SOVs with different layouts per sheet
- SOVs with merged cells, color-coded rows, and embedded comments
- SOVs with unusual TIV formatting (implied thousands, abbreviated millions)
- SOVs from non-U.S. markets (Canadian addresses, international construction codes)
Define Ground Truth
Have experienced underwriting staff manually extract the data from each SOV. This is the expensive part (4 to 8 hours per complex SOV), but without ground truth, you cannot measure accuracy.
Measure These Metrics
| Metric | What It Measures | Target |
|---|---|---|
| Field-level accuracy | % of individual fields correctly extracted | 90%+ for clean, 80%+ for messy |
| Row-level completeness | % of rows where ALL fields are correct | 75%+ for clean, 60%+ for messy |
| TIV accuracy | % of TIV fields within 1% of ground truth | 95%+ (TIV errors have downstream cost) |
| Address parse rate | % of addresses correctly parsed into components | 85%+ |
| False extraction rate | % of extracted rows that do not exist in source | Below 2% |
| Processing time | Seconds per page or per document | Depends on workflow requirements |
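The first two metrics in the table can be scored directly once extraction output and ground truth are aligned row by row. A minimal sketch, assuming both are lists of dicts with identical keys:

```python
def accuracy_metrics(extracted, truth):
    """Field-level accuracy and row-level completeness vs. ground truth.

    Both arguments are lists of dicts with the same keys, aligned by row.
    Illustrative scoring; production tests also need row alignment logic
    for the false-extraction (phantom row) metric.
    """
    fields = correct = complete_rows = 0
    for ext, gt in zip(extracted, truth):
        row_ok = True
        for key, expected in gt.items():
            fields += 1
            if ext.get(key) == expected:
                correct += 1
            else:
                row_ok = False
        complete_rows += row_ok
    return {
        "field_accuracy": correct / fields,
        "row_completeness": complete_rows / len(truth),
    }
```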
Watch for These Error Patterns
Based on our testing, the most common extraction errors across all three platforms are:
- TIV column confusion. The extractor maps building TIV to contents TIV or vice versa, doubling one value and zeroing the other. This is the most consequential error because it changes the aggregate exposure.
- Header row misidentification. Multi-page SOVs sometimes have subtotal rows or section headers that the extractor interprets as data rows, creating phantom locations.
- Address truncation. Long addresses (suite numbers, building identifiers) get truncated or split across two rows.
- Merged cell data duplication. When a merged cell spans multiple rows, some extractors copy the merged value into every row, creating duplicate data.
- Currency vs. count confusion. A column that should contain a dollar amount is interpreted as a count (or vice versa), particularly when the column header is ambiguous.
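The header-row misidentification pattern in particular can be screened for cheaply before data reaches the underwriting system. A crude sketch using illustrative key names; flagged rows still need human review:

```python
def flag_suspect_rows(rows):
    """Flag rows that look like subtotals or section headers, not locations.

    `rows` are dicts with 'address' and 'tiv' keys (illustrative names).
    Flags rows whose address contains a total-like label, or that carry a
    TIV with no address at all. Returns the indexes of suspect rows.
    """
    suspects = []
    for i, row in enumerate(rows):
        address = (row.get("address") or "").strip().lower()
        if any(word in address for word in ("total", "subtotal", "grand")):
            suspects.append(i)
        elif not address and row.get("tiv"):
            suspects.append(i)
    return suspects
```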
Integration into the Underwriting Workflow
Extracting SOV data is only the first step. The extracted data needs to flow into the underwriting workflow for it to create value. Here is a typical integration pattern.
SOV Upload and Extraction
The underwriting assistant or submission intake team uploads the SOV to the extraction platform. The platform processes the document and returns structured data, usually in JSON format.
Quality Review
An underwriter or experienced assistant reviews the extracted data against the source document. This review focuses on TIV accuracy (the highest-consequence fields), address completeness, and construction type mapping. Most teams report that review takes 15 to 30 minutes for a well-extracted SOV, compared to 4 to 8 hours for fully manual data entry.
Data Transformation and Loading
The extracted and reviewed data is transformed into the format required by the underwriting system (rating engine, catastrophe model, or policy administration system). This step often requires mapping extracted field names to system-specific field names and converting construction type descriptions to ISO codes.
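Both halves of that transformation, field renaming and code coercion, can be table-driven. A sketch with example mappings (the field names are invented for illustration; the ISO construction classes 1 through 6 are the standard Frame-through-Fire-Resistive scale, but verify the coding your rating engine expects):

```python
# Illustrative transformation step. FIELD_MAP renames extracted fields to
# the target system's names; ISO_CONSTRUCTION coerces descriptions to ISO
# class numbers (1=Frame ... 6=Fire Resistive). Neither table is complete.
FIELD_MAP = {"building_tiv": "BldgLimit", "year_built": "YearBuilt"}
ISO_CONSTRUCTION = {
    "frame": 1, "joisted masonry": 2, "non-combustible": 3,
    "masonry non-combustible": 4, "modified fire resistive": 5,
    "fire resistive": 6,
}

def transform_row(row):
    """Rename fields and attach the ISO class (None if unrecognized)."""
    out = {FIELD_MAP.get(k, k): v for k, v in row.items()}
    desc = str(row.get("construction_type", "")).lower().strip()
    out["ISOConstructionClass"] = ISO_CONSTRUCTION.get(desc)
    return out
```

A `None` class is again the useful output: it queues the row for a human decision instead of silently defaulting to a construction class.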
Catastrophe Modeling and Rating
With structured SOV data loaded, the underwriter can run catastrophe exposure analysis, apply location-specific rating factors, and generate a quote. The speed improvement from automated extraction matters here because it shortens the time from submission receipt to quote delivery, which is a competitive differentiator in commercial property markets.
Common Failure Points and How to Handle Them
Multi-Sheet Excel SOVs
When an SOV arrives as a multi-sheet Excel file, determine whether each sheet has the same column structure or different structures. Same-structure sheets (just split by region or property type) can usually be concatenated before extraction. Different-structure sheets need separate extraction configs or separate AI processing runs.
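The same-structure check and the concatenation can live in one function, with a hard failure when sheets do not actually share a layout. A sketch, assuming sheets arrive as a name-to-(header, rows) mapping and keeping the sheet name for traceability:

```python
def concat_same_structure(sheets):
    """Concatenate Excel sheets that share one column structure.

    `sheets` maps sheet name to (header, rows). Raises ValueError when
    headers differ, which is the signal that sheets need separate
    extraction configs instead. Illustrative sketch.
    """
    headers = {tuple(h) for h, _ in sheets.values()}
    if len(headers) != 1:
        raise ValueError("sheets have different column structures")
    header = next(iter(headers))
    # Append the source sheet name so region/property-type splits survive.
    combined = [row + [name] for name, (_, rows) in sheets.items() for row in rows]
    return list(header) + ["source_sheet"], combined
```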
Location Numbers That Reset Per Schedule
Some SOVs number locations within each sub-schedule (locations 1 through 50 for buildings, then 1 through 30 for equipment schedules). Extraction tools may not distinguish between “location 1” in the building schedule and “location 1” in the equipment schedule. Add a schedule identifier prefix during data transformation.
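The prefixing step is a few lines. A sketch, assuming each sub-schedule arrives as its own list of row dicts with a `location_no` key (an illustrative name):

```python
def prefix_location_ids(schedules):
    """Disambiguate per-schedule location numbers with a schedule prefix.

    `schedules` maps schedule name to a list of row dicts carrying a
    'location_no' key; location 1 in 'BLDG' becomes 'BLDG-1'.
    """
    rows = []
    for name, schedule in schedules.items():
        for row in schedule:
            row = dict(row)  # copy so the source rows stay untouched
            row["location_id"] = f"{name}-{row['location_no']}"
            rows.append(row)
    return rows
```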
Combined vs. Separate TIV Fields
Some SOVs list building, contents, and BI as separate columns. Others combine them into a single “Total TIV” column. Your extraction logic needs to handle both patterns and flag when it cannot determine the breakdown.
Canadian vs. U.S. Address Formats
Canadian addresses use postal codes (letter-number-letter number-letter-number) instead of ZIP codes, and provinces instead of states. If your book includes Canadian locations, test address parsing explicitly for Canadian formats. Several extraction tools we tested had notably lower accuracy on Canadian addresses.
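A cheap pre-check is to classify the postal code before choosing an address-parsing path. A sketch that acts as a routing hint, not a full validator:

```python
import re

# Canadian postal codes follow letter-digit-letter digit-letter-digit
# (e.g. "M5V 2T6"); U.S. ZIP codes are five digits with an optional +4.
CA_POSTAL = re.compile(r"^[A-Za-z]\d[A-Za-z] ?\d[A-Za-z]\d$")
US_ZIP = re.compile(r"^\d{5}(-\d{4})?$")

def postal_country(code):
    """Classify a postal code as 'CA', 'US', or None (unrecognized).

    A routing hint for choosing an address parser, not a validator:
    it does not check that the code actually exists.
    """
    code = code.strip()
    if CA_POSTAL.match(code):
        return "CA"
    if US_ZIP.match(code):
        return "US"
    return None
```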
Choosing the Right Tool
Choose Sensible if: you have developer resources to create and maintain extraction configs, you process a high volume of SOVs from a relatively stable set of broker formats, and you want per-page pricing that scales predictably.
Choose Reducto if: accuracy is the top priority, you receive SOVs in highly variable formats that do not fit pre-built templates, and your team can handle the data transformation from raw JSON to your underwriting system’s format.
Choose SortSpoke if: you want an insurance-specific workflow with underwriter-configurable rules, you need ACORD field mappings out of the box, and your team values a built-in review interface over raw API access.
For most mid-market underwriting teams processing 50 to 200 SOVs per month, we recommend starting with one tool on a defined test set before committing to an annual contract. The accuracy differences between tools matter less than the fit with your team’s technical capabilities and workflow preferences.