From Raw Records to Court-Ready Data

Financial and operational data produced in discovery is rarely clean. Our forensics process transforms whatever raw material exists — incomplete, inconsistent, or deliberately obscured — into a structured, auditable dataset that supports your analysis and withstands opposing scrutiny.

Phase 01

Evidence Intake & Chain of Custody

Every dataset we receive is logged at intake — file name, format, size, hash value, date received, and producing party. From that moment forward, we maintain a documented chain of custody: every transformation, cleaning operation, calculation, and export is logged with a timestamp and a description of what was done and why.
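As an illustration only, an intake entry of the kind described above might be captured as follows. This is a minimal sketch, not our production tooling: the `IntakeRecord` and `log_intake` names are hypothetical, and SHA-256 is assumed as the hash algorithm.

```python
import hashlib
import os
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class IntakeRecord:
    """One immutable intake log entry (hypothetical structure)."""
    file_name: str
    file_format: str
    size_bytes: int
    sha256: str
    received_at: str
    producing_party: str


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large productions do not exhaust memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def log_intake(path: str, producing_party: str) -> IntakeRecord:
    """Record name, format, size, hash, receipt time, and producing party."""
    extension = os.path.splitext(path)[1].lstrip(".").lower()
    return IntakeRecord(
        file_name=os.path.basename(path),
        file_format=extension or "unknown",
        size_bytes=os.path.getsize(path),
        sha256=sha256_of_file(path),
        received_at=datetime.now(timezone.utc).isoformat(),
        producing_party=producing_party,
    )
```

Re-hashing the file at any later point and comparing against the logged value is a simple way to show that the source was not altered after receipt.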

This chain-of-custody documentation is not administrative formality. It is the foundation of data provenance — the ability to trace every number in the final analysis back to a specific record in a specific source file, proving that nothing was introduced, altered, or lost in the process of building the model.

Phase 02

Data Reconstruction & Gap Analysis

Incomplete records are the norm in litigation. Systems change, files are lost, periods are not retained, and opposing parties produce data that is partial by design or by negligence. We reconstruct missing periods using the statistical techniques appropriate to the data type — interpolation for continuous series, imputation for categorical fields, extrapolation with explicit uncertainty quantification for extended gaps.

Every gap and every reconstruction is documented: what was missing, what method was used to address it, and what uncertainty that reconstruction introduces into downstream calculations. If a gap is too large or too material to address without compromising the analysis, we say so directly.
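To make the idea concrete, here is a minimal sketch of linear interpolation over a continuous series in which every reconstructed value stays flagged. The function name and the choice to refuse leading or trailing gaps are assumptions made for illustration; the method actually used depends on the data and the matter.

```python
from typing import Optional


def interpolate_gaps(series: list[Optional[float]]) -> list[tuple[float, bool]]:
    """Fill interior gaps (None) by linear interpolation.

    Returns (value, reconstructed) pairs so filled values remain identifiable.
    Leading or trailing gaps would require extrapolation, so they are rejected.
    """
    known = [i for i, value in enumerate(series) if value is not None]
    if not known or known[0] != 0 or known[-1] != len(series) - 1:
        raise ValueError("series must start and end with observed values")

    result = []
    for i, value in enumerate(series):
        if value is not None:
            result.append((float(value), False))
            continue
        left = max(k for k in known if k < i)    # nearest observation before the gap
        right = min(k for k in known if k > i)   # nearest observation after the gap
        weight = (i - left) / (right - left)
        filled = series[left] + weight * (series[right] - series[left])
        result.append((filled, True))
    return result
```

Keeping the flag alongside each value means downstream calculations can report how much of any total rests on reconstructed rather than observed figures.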

Phase 03

Integrity Verification & Anomaly Detection

We run the reconstructed dataset through a systematic integrity verification protocol: internal consistency checks, cross-validation against independent data sources (tax returns vs. accounting records, POS data vs. bank statements), Benford's Law analysis for fabrication indicators, and isolation forest algorithms for statistical outlier detection.
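Cross-validation between two independent sources can be sketched as a period-by-period reconciliation. The example below is illustrative: the `reconcile_totals` name, the period keys, and the one-cent tolerance are assumptions, not a description of any specific engagement.

```python
from decimal import Decimal


def reconcile_totals(
    source_a: dict[str, Decimal],
    source_b: dict[str, Decimal],
    tolerance: Decimal = Decimal("0.01"),
) -> list[dict]:
    """Compare period totals from two sources and list every discrepancy."""
    findings = []
    for period in sorted(set(source_a) | set(source_b)):
        a = source_a.get(period)
        b = source_b.get(period)
        if a is None or b is None:
            # The period exists in only one of the two sources.
            findings.append({"period": period, "issue": "missing from one source"})
        elif abs(a - b) > tolerance:
            findings.append(
                {"period": period, "issue": "amount mismatch", "difference": a - b}
            )
    return findings
```

For instance, `source_a` might hold monthly POS totals and `source_b` monthly bank deposits; each finding is a starting point for review rather than a conclusion.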

Anomalies are classified by type — data entry error, coding inconsistency, structural outlier, or potential fabrication indicator — and each classification is supported by specific evidence from the data. We do not accuse. We present what the data shows and let counsel determine how to proceed.

Phase 04

Sanitization & Standardization

With integrity verified and anomalies documented, we produce a sanitized dataset: deduplicated, consistently formatted, correctly coded, and organized into a schema that supports the specific analytical questions the matter requires. Date fields are standardized, currency fields are normalized, and categorization schemes are reconciled across inconsistent source systems.
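A simplified sketch of date and currency standardization follows. The list of accepted date formats, the US month/day ordering, the dollar sign, and the parentheses-for-negatives convention are all assumptions for this example; in practice these are confirmed against each source system before any parsing.

```python
from datetime import date, datetime
from decimal import Decimal

# Assumed input formats for this example; ambiguous inputs such as
# 03/04/2021 are read as month/day/year here.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d-%b-%Y")


def parse_date(raw: str) -> date:
    """Try each known format and fail loudly on anything unrecognized."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")


def parse_amount(raw: str) -> Decimal:
    """Normalize '$1,234.50' and accounting-style '(500.00)' to Decimal."""
    text = raw.strip().replace("$", "").replace(",", "")
    negative = text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1]
    value = Decimal(text)
    return -value if negative else value
```

Using `Decimal` rather than floating point avoids rounding drift in currency totals, and raising on unknown formats keeps silent misreads out of the dataset.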

The sanitized dataset is the single source of truth for all subsequent analysis — every calculation in the damages model, every figure in the expert report, every exhibit in the trial package traces back to a record in this dataset. That traceability is what makes the analysis defensible.

Phase 05

Court-Ready Output Packages

The final output includes the sanitized dataset in production-ready format (CSV, Excel, SQL), the data dictionary documenting every field and its source, the anomaly report with supporting exhibits, the chain-of-custody log, and the methodology memorandum describing every cleaning and reconstruction decision made during the process.

This package is designed to be produced in discovery without embarrassment — every decision is documented, every source is traceable, and every gap is disclosed. It gives your expert a foundation they can defend on the stand.

What Sanitization Addresses

The Six Most Common Data Problems in Litigation

Financial data produced in litigation is almost never clean out of the box. These are the issues we encounter and resolve in nearly every engagement.

Duplicate Records

Transactions recorded multiple times due to system migration errors, manual re-entry, or intentional manipulation. We identify duplicates using multi-field matching and hash comparison, not just record ID, since sophisticated duplicates alter non-key fields to evade simple deduplication.
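One way to sketch multi-field matching with hash comparison is shown below. The field names in `MATCH_FIELDS` are hypothetical, and the example assumes dates and amounts have already been standardized; real engagements choose the matching fields per dataset.

```python
import hashlib
from collections import defaultdict

# Hypothetical fields that define "the same transaction" for this example.
MATCH_FIELDS = ("date", "amount", "counterparty")


def record_fingerprint(record: dict) -> str:
    """Hash the normalized matching fields, ignoring record ID and free text."""
    normalized = "|".join(
        str(record[field]).strip().lower() for field in MATCH_FIELDS
    )
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicate_groups(records: list[dict]) -> list[list[dict]]:
    """Group records that share a fingerprint; return groups of two or more."""
    groups = defaultdict(list)
    for record in records:
        groups[record_fingerprint(record)].append(record)
    return [group for group in groups.values() if len(group) > 1]
```

Because the fingerprint excludes the record ID and memo text, two entries with different IDs but the same date, amount, and counterparty land in the same group for review.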

Inconsistent Categorization

Revenue and expense categories coded differently across periods, entities, or systems — producing apparent changes in cost structure that reflect reclassification rather than actual operations. We rebuild a consistent chart of accounts across all periods before any analysis begins.

Missing Periods

Gaps in the time series — absent months, quarters with no transactions, or records that stop abruptly at a legally significant date. We characterize each gap (data loss vs. operational cessation vs. non-production) and address it with the appropriate statistical technique.
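Locating the gaps is the first step. The sketch below assumes monthly data keyed by `YYYY-MM` strings, which is an assumption for illustration; it only finds absent periods and says nothing about why they are absent.

```python
def missing_months(observed: set[str], start: str, end: str) -> list[str]:
    """List every YYYY-MM period from start to end (inclusive) with no records."""
    year, month = map(int, start.split("-"))
    end_year, end_month = map(int, end.split("-"))
    missing = []
    while (year, month) <= (end_year, end_month):
        label = f"{year:04d}-{month:02d}"
        if label not in observed:
            missing.append(label)
        # Advance one calendar month, rolling over at year end.
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return missing
```

Classifying each gap as data loss, operational cessation, or non-production still requires reviewing the surrounding records and the production history.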

Inter-Entity Transactions

Transactions between related entities that inflate or deflate revenue and expense figures for individual entities in a corporate group. We identify and eliminate or separately disclose inter-company transactions before any profit analysis is run.
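In its simplest form, separating inter-company activity can be sketched as a partition of the transaction list. The `entity` and `counterparty` field names are hypothetical, and the example assumes the set of related entities has already been established from corporate records.

```python
def split_intercompany(
    transactions: list[dict], group_entities: set[str]
) -> tuple[list[dict], list[dict]]:
    """Separate transactions between group members from external activity."""
    external, intercompany = [], []
    for txn in transactions:
        both_in_group = (
            txn["entity"] in group_entities
            and txn["counterparty"] in group_entities
        )
        (intercompany if both_in_group else external).append(txn)
    return external, intercompany
```

Returning both lists, rather than discarding the inter-company items, supports either elimination or separate disclosure.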

Benford's Law Violations

Digit frequency distributions in financial records that deviate from Benford's Law. Such deviations are a statistical indicator of potential fabrication or systematic rounding, not proof of either. We run this analysis on all revenue and expense datasets and disclose any violations for further investigation.
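A first-digit test can be sketched as a chi-square comparison between observed leading digits and the Benford expectation, P(d) = log10(1 + 1/d). This is a minimal illustration with hypothetical function names; it assumes the amounts span several orders of magnitude, since Benford's Law is not a reasonable expectation for narrowly ranged or artificially capped figures.

```python
import math
from collections import Counter


def first_digit(value: float) -> int:
    """Return the leading significant digit of a nonzero number."""
    # Scientific notation always places the leading digit first.
    return int(f"{abs(value):.15e}"[0])


def benford_chi_square(amounts: list[float]) -> float:
    """Chi-square statistic of observed first digits against Benford's Law."""
    digits = [first_digit(a) for a in amounts if a != 0]
    n = len(digits)
    if n == 0:
        raise ValueError("no nonzero amounts to test")
    counts = Counter(digits)
    statistic = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        statistic += (counts.get(d, 0) - expected) ** 2 / expected
    return statistic
```

With nine digit classes the test has 8 degrees of freedom, so a statistic above roughly 15.51 indicates a deviation at the 5% significance level. That result flags the dataset for closer review; it does not by itself establish fabrication.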

Timing Manipulations

Revenue or expense items recorded in periods other than when they economically occurred, for example by accelerating recognition before a measurement date or deferring costs to avoid a period of analysis. We identify these by matching transaction dates against underlying documentation and economic substance.
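The date-matching step can be sketched as a comparison between the ledger date and the date on the supporting document. The field names `recorded_date` and `document_date` are hypothetical, and the example treats a calendar month as the reporting period purely for illustration.

```python
from datetime import date


def flag_period_shifts(transactions: list[dict]) -> list[dict]:
    """Flag items booked in a different month than their documentation shows."""
    flagged = []
    for txn in transactions:
        recorded: date = txn["recorded_date"]
        documented: date = txn["document_date"]
        if (recorded.year, recorded.month) == (documented.year, documented.month):
            continue  # Same period: no shift to report.
        flagged.append(
            {
                **txn,
                "direction": "accelerated" if recorded < documented else "deferred",
                "days_shifted": (recorded - documented).days,
            }
        )
    return flagged
```

A flagged item is only a candidate; whether the shift reflects manipulation or a legitimate accounting policy depends on the underlying documentation and the economic substance of the transaction.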

Common Questions

Data Forensics: Questions & Answers

What file formats do you accept?

Any structured or semi-structured format: Excel, CSV, SQL dumps, Access databases, JSON, XML, and most accounting software exports including QuickBooks, Sage, NetSuite, SAP, and Oracle. We have also worked with paper records that required manual data entry and OCR extraction. The format is rarely a limitation — the completeness and accuracy of what is in the file is what matters.

How do you handle confidential business data?

All data is received and stored under the terms of the applicable protective order. We do not retain client data beyond the engagement period, do not use it for any purpose other than the specific engagement, and transfer it back or destroy it on engagement close per counsel's instruction. Chain-of-custody documentation supports a complete accounting of how the data was handled throughout.

Can you work from incomplete records?

Yes, and we routinely do. The methodology adjusts based on what is available — complete records allow precise analysis; incomplete records require statistical reconstruction with explicit uncertainty disclosure. In every case, we will tell you what confidence level the available data supports and flag where gaps create material uncertainty in the downstream damages calculations.

At what stage of litigation should data forensics begin?

As early as possible, ideally before or concurrent with document requests. Understanding the data landscape early informs which records to request, which custodians to depose, and which production deficiencies to challenge. Waiting until discovery closes means working with whatever was produced, often without recourse to address gaps or inconsistencies that early engagement would have surfaced.
