Court Case Verdicts Dataset
Government & Civic Records
Free
About
This dataset provides a curated collection of United States Supreme Court cases for research in natural language processing and other data-driven applications. Its primary purpose is to support predictive models that forecast the Court's judgment from the factual details of a case. By offering a structured, annotated set of legal documents, it helps address the scarcity of publicly available, high-quality legal datasets for this kind of work. The dataset includes a target variable, "First Party Winner", indicating whether the first party prevailed in a given case, so that models can be trained to approximate the Court's decision (Supreme Court cases are decided by the justices rather than by a jury).
Columns
- index: An internal sequential identifier for each record.
- ID / id: Unique identifiers for each Supreme Court case.
- name: The widely recognised name of the court case (e.g., Roe v. Wade).
- href: A URL link providing access to more detailed information about the specific court case.
- docket: The official docket number assigned to the case by the court.
- term: The specific court term during which the case was heard.
- first_party: The name of the initial party involved in the legal dispute.
- second_party: The name of the opposing party in the legal dispute.
- facts: A detailed textual description of the factual background pertinent to the case, crucial for NLP applications.
- facts_len: The calculated length of the 'facts' column content.
- First Party Winner: A binary target variable, where 'true' signifies that the first party won the case, and 'false' indicates the second party was victorious.
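As a rough illustration of how the columns above might be read, the following sketch parses a CSV with Python's standard `csv` module and converts the "First Party Winner" field to a boolean. The column names are taken from the list above; the two sample rows, the `example.org` URLs, and the helper names `parse_bool` and `load_cases` are invented for the example and are not part of the dataset. The exact spelling of the ID column and of the true/false values in the real file may differ.

```python
import csv
import io

# Column names follow the listing above. The rows are invented placeholders,
# not real records from the dataset.
SAMPLE_CSV = """index,ID,name,href,docket,term,first_party,second_party,facts,facts_len,First Party Winner
0,1001,Alpha v. Beta,https://example.org/a,00-001,1970,Alpha,Beta,Alpha sued Beta over a contract.,32,True
1,1002,Gamma v. Delta,https://example.org/b,00-002,1985,Gamma,Delta,Gamma challenged a state statute.,33,False
"""


def parse_bool(value):
    """Map 'true'/'false' (any letter case) to bool; return None otherwise."""
    text = value.strip().lower()
    if text == "true":
        return True
    if text == "false":
        return False
    return None


def load_cases(handle):
    """Read case rows from an open CSV handle into a list of dicts."""
    cases = []
    for row in csv.DictReader(handle):
        # Hypothetical extra key holding the parsed target variable.
        row["first_party_won"] = parse_bool(row["First Party Winner"])
        cases.append(row)
    return cases


cases = load_cases(io.StringIO(SAMPLE_CSV))

# Share of labelled cases in which the first party won.
labelled = [c for c in cases if c["first_party_won"] is not None]
win_rate = sum(c["first_party_won"] for c in labelled) / len(labelled)
print(len(cases), win_rate)
```

For the real file, the `io.StringIO(SAMPLE_CSV)` argument would be replaced by an open file handle, for example `open(path, newline="", encoding="utf-8")`, where `path` points to wherever the CSV was saved.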
Distribution
The dataset is provided as a CSV file and contains 3,304 individual Supreme Court cases. Each record includes key identifiers, the textual facts of the case, and the eventual outcome. A notable feature is the inclusion of the case facts, which are often absent from comparable legal datasets and make this one more useful for natural language processing tasks.
Usage
This dataset is ideally suited for Natural Language Processing (NLP) research and various data-driven applications within the legal domain. Potential uses include:
- Predicting court case outcomes by analysing the factual details presented.
- Developing and training predictive models that can classify a court's judgment based on textual information.
- Identifying and revealing underlying patterns that influence judicial decisions.
- Creating AI systems that attempt to emulate the Court's decisions by generating a predicted verdict automatically.
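The outcome-prediction use cases above can be sketched with a very small text classifier. The example below is a minimal multinomial naive Bayes over word counts, written with the standard library only; it is an illustrative baseline under stated assumptions, not the method used by the dataset's authors. The training sentences are invented stand-ins for `facts` values, the labels stand in for "First Party Winner", and the names `tokenize` and `NaiveBayes` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over word counts."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.class_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy examples standing in for (facts, First Party Winner) pairs.
train_facts = [
    "The petitioner challenged the statute as unconstitutional.",
    "The petitioner argued the search violated the Fourth Amendment.",
    "The respondent moved to dismiss for lack of standing.",
    "The respondent claimed the appeal was filed too late.",
]
train_labels = [True, True, False, False]

model = NaiveBayes().fit(train_facts, train_labels)
prediction = model.predict("The petitioner challenged the search.")
print(prediction)
```

On the real data, a held-out test split and comparison against a majority-class baseline would be needed before reading anything into the accuracy of such a model.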
Coverage
The dataset covers United States Supreme Court cases from 1955 to 2021. Its focus is on legal outcomes rather than demographics: the records give only indirect information about the parties involved, and identities are sometimes masked to protect plaintiffs. The listing is available globally, but the content is specific to US Supreme Court proceedings.
License
CC0
Who Can Use It
This dataset is highly beneficial for:
- NLP Researchers: To build and test models for legal text analysis and outcome prediction.
- Data Scientists: For applying machine learning techniques to real-world legal challenges.
- Legal Scholars and Analysts: To gain data-driven insights into judicial trends and decision-making.
- Academic Institutions: For educational purposes and advanced research projects in law and AI.
Dataset Name Suggestions
- Supreme Court Decisions Prediction
- US Judicial Case Outcomes
- SCOTUS Legal Judgment Data
- Court Case Verdicts Dataset
- American Supreme Court Judgments
Attributes
Original Data Source: Supreme Court Judgment Prediction