Methodology
State and local courts handle a wide range of civil legal issues that deeply impact the lives of low-income Americans, including eviction, debt collection, and domestic violence. There is limited data available on the prevalence and nature of these legal issues because there is no centralized database across the thousands of US state and local courts. The Civil Court Data Initiative (CCDI) aims to close this gap by collecting, standardizing, and analyzing civil court records to make insights more accessible for legal aid providers, policymakers, researchers, and the public.
This document outlines how the CCDI collects, standardizes, and analyzes civil court records to produce the estimates for filings, representation, and judgments that appear on this website and in external work that makes use of CCDI data.
Data Collection
CCDI collects court records from a range of data sources. The primary method is collecting court records from court lookup websites. Many state and local courts provide websites where a user can search for case information by case number, party name, or date filed. We develop software to systematically identify all cases filed from 2016 through the present. Most often, this software searches the court website for each date in the time period and then looks up each case returned. Where court websites do not provide the option to search by date, we identify whether the court uses a pattern for assigning case numbers. For instance, a court might assign the first civil case filed in 2016 the number “2016-CV-000001”, where “2016” is the year the case is filed, “CV” is assigned to civil cases, and “000001” is a number that increases by one with each new case filed. Where we identify these patterned numbering conventions, we develop software that generates the possible case numbers and then collects the case record for each one.
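The case-number approach can be illustrated with a minimal sketch. It assumes the “2016-CV-000001” convention from the example above. The `fetch` callable, the function names, and the rule of stopping after a run of missing numbers are hypothetical choices made for illustration, not a description of CCDI's software.

```python
from typing import Callable, List, Optional


def format_case_number(year: int, sequence: int, case_code: str = "CV", width: int = 6) -> str:
    """Build a case number such as 2016-CV-000001."""
    return f"{year}-{case_code}-{sequence:0{width}d}"


def collect_by_case_number(
    year: int,
    fetch: Callable[[str], Optional[dict]],
    max_consecutive_misses: int = 50,
) -> List[dict]:
    """Walk the sequence upward and stop after a run of missing case numbers.

    `fetch` is a hypothetical lookup that returns a case record, or None
    when the court website has no case with that number.
    """
    records: List[dict] = []
    misses = 0
    sequence = 1
    while misses < max_consecutive_misses:
        record = fetch(format_case_number(year, sequence))
        if record is None:
            misses += 1
        else:
            misses = 0  # a hit resets the run of misses
            records.append(record)
        sequence += 1
    return records


# Stand-in for a court website: only these case numbers exist.
known = {format_case_number(2016, n) for n in (1, 2, 4)}
found = collect_by_case_number(
    2016,
    fetch=lambda number: {"case_number": number} if number in known else None,
    max_consecutive_misses=3,
)
print([record["case_number"] for record in found])
```

The stopping threshold is a trade-off: a small value ends the search quickly but risks stopping early when a court skips numbers, while a larger value is slower but more tolerant of gaps.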
A civil case begins with an initial filing or complaint, and additional information is entered as the case proceeds until it is ultimately closed. We strive to collect information about civil court records across the full lifecycle from filing through case closure.
We also engage in data-sharing to access court records in jurisdictions that do not make court records publicly available in a manner that facilitates data collection. Data-sharing is a preferred means of data collection because we have greater confidence in the data and it avoids the potential inaccuracies that scraping individual court records can introduce. We engage in data-sharing directly with the courts in multiple jurisdictions. These courts make civil court records available for research purposes, often governed by a data-sharing agreement. These datasets are commonly provided either as a bulk extract that is updated periodically or as daily extracts covering the cases with activity on the previous day. We also engage in data-sharing with software developers who collect court records from local jurisdictions and provide the data to the Legal Services Corporation (LSC) free of charge.
Data Validation
We strive to provide data that is representative of the full caseload that courts process. To measure how closely our data mirrors the full caseload, we compare the number of cases filed per year by county and by case type in our database to the number of filings that courts report in their annual reports. We have compiled county-level filing counts for counties in over 30 states from annual statewide court reports. We have made this dataset, known as the Baseline Court Statistics (BCS) dataset, available to the public for secondary analysis. The dataset and a codebook are available here.
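As a rough illustration of this comparison, the sketch below computes the share of officially reported filings that appear in a collected dataset for each county, case type, and year. The function name, the dictionary layout, and the example county and counts are all invented for illustration; they are not taken from the BCS dataset.

```python
from typing import Dict, Tuple

# Key: (county, case type, year). Value: number of filings.
Counts = Dict[Tuple[str, str, int], int]


def coverage_ratios(collected: Counts, reported: Counts) -> Dict[Tuple[str, str, int], float]:
    """Return collected filings divided by officially reported filings.

    Groups with no reported filings are skipped, because a ratio is
    undefined when the official count is zero.
    """
    ratios = {}
    for key, official_count in reported.items():
        if official_count == 0:
            continue
        ratios[key] = collected.get(key, 0) / official_count
    return ratios


# Made-up counts for a hypothetical county.
reported = {("Example County", "eviction", 2019): 1000,
            ("Example County", "eviction", 2020): 400}
collected = {("Example County", "eviction", 2019): 950}

for key, ratio in sorted(coverage_ratios(collected, reported).items()):
    print(key, f"{ratio:.0%}")
```

A ratio well below 100% would point to missing cases, for example sealed records, while a ratio above 100% could suggest duplicated records or a mismatch in how case types are defined.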
There are numerous reasons why the data we collect might be inaccurate; some causes are based on inaccuracies in the information that courts provide on their websites while others are due to data processing errors after collecting the records.
Some examples of court-related data issues include:
- In some jurisdictions, certain cases are sealed or masked at some point after they are filed to protect the privacy of the parties involved in the case, meaning that these case records are not made available to the public. For example, a jurisdiction may have a law that seals any eviction case that does not result in the tenant being removed from the rental home in order to protect the tenant from future housing discrimination. If a jurisdiction automatically seals some cases after filing, the number of cases scraped from a public court website might differ significantly from the number ultimately reported in the court’s annual report.
- Courts have migrated from physical to digital records over the past few decades. In some jurisdictions, older cases have not been fully digitized and will not be available for incorporation into our database. As a result, our database may have no data or incomplete data for these jurisdictions in years prior to 2016, even if aggregate case data is available through official court reports.
- Courts occasionally change case types in response to new state and local laws and policies. For instance, the case types associated with eviction in some counties in Florida have changed at least twice since 2016. We continuously monitor case types to ensure they are recoded properly; however, some case types will not be recoded immediately, which might lead to inaccurate filing counts for a period of time.
Some examples of data processing issues include:
- When scraping court records, we must systematically identify every case filed within the time period of analysis. Most court websites allow you to enter a range of dates and will return all cases filed between those dates; however, some websites do not provide this functionality. For locations that do not provide this option, we must determine how the courts assign case numbers and construct all possible case numbers (see the Data Collection section for more details). If a court system changes its numbering system or does not strictly follow it, then our software will not capture the full caseload.
- Each court website structures data differently. One court might identify eviction cases under the case type “unlawful detainer” while another will identify these cases under “forcible entry and detainer”. We must reclassify these case types to a single taxonomy to facilitate analysis. While we take great pains to classify case types accurately, we might make mistakes which would lead to inaccuracies in the data and visualizations we produce.
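Reclassifying case types to a single taxonomy can be sketched as a lookup table. The mapping below uses only the eviction terms mentioned in this document. The function name and the “unclassified” fallback label are assumptions made for illustration and do not describe CCDI's actual taxonomy.

```python
# Raw court labels mapped to a shared category (illustrative subset only).
CASE_TYPE_TAXONOMY = {
    "unlawful detainer": "eviction",
    "forcible entry and detainer": "eviction",
    "landlord-tenant": "eviction",
    "dispossessory": "eviction",
}


def standardize_case_type(raw_case_type: str) -> str:
    """Map a court-specific case type label to the shared taxonomy.

    Labels are lowercased and whitespace is collapsed before the lookup.
    Unknown labels are marked "unclassified" so they can be reviewed
    rather than silently counted under the wrong category.
    """
    normalized = " ".join(raw_case_type.lower().split())
    return CASE_TYPE_TAXONOMY.get(normalized, "unclassified")


for label in ["Unlawful Detainer", "FORCIBLE ENTRY  AND DETAINER", "Small Claims"]:
    print(label, "->", standardize_case_type(label))
```

Keeping an explicit fallback makes new or renamed case types visible, which matters when courts change their case types over time.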
This validation process gives us confidence that we have collected close to the full caseload; however, this process does not validate the quality of the data collected. Ideally, we would be able to validate the representation rates and judgment rates produced by our database against official court statistics. Unfortunately, only a handful of courts report representation or judgment counts or rates in a manner that can be compared to our database. We attempt to overcome this limitation by conducting internal consistency checks.
We assess the consistency of representation rates by examining each combination of county, case type, and party type (plaintiff or defendant) over time. Specifically, if any annual representation rate deviates from the average representation rate over all years by more than 10%, we flag it for additional investigation. We intend to consult with the local courts and legal aid providers to confirm that the data is valid before sharing it publicly. This work is underway, and estimates could change as we refine our methods.
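A minimal sketch of this check for one county, case type, and party type follows. It assumes that “10%” means 10 percentage points of absolute difference from the multi-year average; the text does not say whether the threshold is absolute or relative, so this reading is an assumption. The function name and the example rates are invented.

```python
from statistics import mean
from typing import Dict, List


def flag_inconsistent_years(rates_by_year: Dict[int, float], threshold: float = 0.10) -> List[int]:
    """Return the years whose rate differs from the all-year average
    by more than `threshold` (rates are fractions between 0 and 1)."""
    average = mean(rates_by_year.values())
    return sorted(
        year for year, rate in rates_by_year.items()
        if abs(rate - average) > threshold
    )


# Made-up defendant representation rates for one county and case type.
rates = {2016: 0.20, 2017: 0.22, 2018: 0.21, 2019: 0.45}
print(flag_inconsistent_years(rates))
```

In this example the average is 0.27, so only 2019 (0.18 above the average) exceeds the threshold and would be flagged for review.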
Data Cleaning
The data we collect varies in structure and quality across jurisdictions. Courts use different terms to refer to similar legal issues. For instance, eviction cases might be called “landlord-tenant”, “unlawful detainer”, or “dispossessory”, among other terms. Similarly, courts identify judgments in a wide range of ways. Some online court record systems explicitly state the judgment type (default, dismissal, etc.), the judgment amount, and any additional costs or interest rates. Other systems only include this information in a table of docket entries; these tables often contain information entered as free text. We attempt to extract useful information from this free text; however, a lack of standardization in this area means that our data will never fully capture every judgment.
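To illustrate what extracting information from free-text docket entries can involve, the sketch below uses regular expressions to look for a judgment type and a dollar amount. The patterns, the function name, and the sample entries are assumptions made for illustration. Real docket text is far more varied, and this does not represent CCDI's actual extraction rules.

```python
import re
from typing import Dict, Optional, Union

# Illustrative patterns only; actual docket wording differs by court.
JUDGMENT_PATTERNS = [
    ("default", re.compile(r"\bdefault\s+judgment\b", re.IGNORECASE)),
    ("dismissal", re.compile(r"\bdismiss(?:ed|al)\b", re.IGNORECASE)),
]
# A dollar sign followed by digits, optional commas, and optional cents.
AMOUNT_PATTERN = re.compile(r"\$\s*(\d[\d,]*(?:\.\d{2})?)")


def parse_docket_entry(text: str) -> Dict[str, Optional[Union[str, float]]]:
    """Pull a judgment type and amount out of one free-text docket entry.

    Either value is None when no pattern matches.
    """
    judgment_type = None
    for label, pattern in JUDGMENT_PATTERNS:
        if pattern.search(text):
            judgment_type = label
            break

    amount = None
    amount_match = AMOUNT_PATTERN.search(text)
    if amount_match:
        amount = float(amount_match.group(1).replace(",", ""))

    return {"judgment_type": judgment_type, "judgment_amount": amount}


entries = [
    "DEFAULT JUDGMENT ENTERED FOR PLAINTIFF IN THE AMOUNT OF $1,250.00",
    "Case dismissed without prejudice",
    "Hearing scheduled",
]
for entry in entries:
    print(parse_docket_entry(entry))
```

Entries that match no pattern return empty values rather than a guess, which mirrors the limitation noted above: some judgments will not be captured.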
Every court system structures court records slightly differently. While the use of common technology providers enforces some standardization across jurisdictions, there are often small differences in data structure. After collecting court records from a given jurisdiction, we convert the jurisdiction’s data to a standard structure that we have devised. The standard structure groups common case information into a few categories: general case information (i.e., case number, date filed, date closed, case type, case title, claim amount), party information (party name, party type, party address, attorney representation), disposition/judgment information (judgment type, date judged, judgment amount, interest rate), and event information (event type, date, and additional event information).
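One way to picture the four categories is as a set of record types. The sketch below is an assumption-laden illustration: the class names, field names, and types are hypothetical and chosen to mirror the categories listed above, not CCDI's actual schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Party:
    name: str
    party_type: str                      # e.g. "plaintiff" or "defendant"
    address: Optional[str] = None
    has_attorney: bool = False


@dataclass
class Judgment:
    judgment_type: str                   # e.g. "default" or "dismissal"
    date_judged: Optional[date] = None
    amount: Optional[float] = None
    interest_rate: Optional[float] = None


@dataclass
class Event:
    event_type: str
    event_date: Optional[date] = None
    details: Optional[str] = None


@dataclass
class CaseRecord:
    case_number: str
    date_filed: date
    case_type: str
    case_title: Optional[str] = None
    date_closed: Optional[date] = None
    claim_amount: Optional[float] = None
    parties: List[Party] = field(default_factory=list)
    judgments: List[Judgment] = field(default_factory=list)
    events: List[Event] = field(default_factory=list)

    def is_represented(self, party_type: str) -> bool:
        """True if any party of the given type has an attorney."""
        return any(p.has_attorney for p in self.parties if p.party_type == party_type)


# Made-up example case.
case = CaseRecord(case_number="2016-CV-000001", date_filed=date(2016, 1, 4), case_type="eviction")
case.parties.append(Party(name="Example Landlord LLC", party_type="plaintiff", has_attorney=True))
case.parties.append(Party(name="Example Tenant", party_type="defendant"))
print(case.is_represented("plaintiff"), case.is_represented("defendant"))
```

Optional fields default to empty values because, as described above, many courts do not publish every item for every case.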
Eviction Policy Summaries
State and local eviction policy summaries are based on Legal Services Corporation’s Eviction Laws Database, which was first released in July 2021 in partnership with the Center for Public Health Law Research at Temple University’s Beasley School of Law. The Database includes eviction laws and procedures in effect as of January 1, 2021 for all 50 states, the District of Columbia, Puerto Rico, American Samoa, Guam, the U.S. Virgin Islands, the Northern Mariana Islands, the Federated States of Micronesia, the Marshall Islands, and Palau, as well as a selection of 30 local jurisdictions.
Detailed documentation on the methods used to create the Eviction Laws Database can be found here for the State/Territory-level dataset, and here for the Local-level dataset. The complete Eviction Laws Database can be explored on LSC’s website.
Unless otherwise specified, all laws and procedures documented in the eviction policy summaries were in effect as of January 1, 2021. In other words, the policy summaries are a “snapshot” of what eviction looked like in a jurisdiction on January 1, 2021. In the coming months, the CCDI team will be working to update the policy summaries to incorporate recent changes to eviction laws and procedures.