Judicial Outcome Prediction Dataset
Government & Civic Records
Free
About
This dataset provides 3304 cases from the Supreme Court of the United States, spanning 1955 to 2021. Each entry includes case identifiers, a factual summary of the case, and its final decision outcome. Unlike many similar datasets, it includes the detailed facts of each case, which makes it particularly valuable for Natural Language Processing (NLP) applications. Its primary purpose is to support predicting case outcomes from their factual descriptions: the target variable indicates whether the first party won the case. Researchers and data scientists can apply NLP techniques to the 'facts' column to extract features for predictive modelling of judicial decisions.
Columns
- index: An integer index for the record.
- ID: A unique identifier for each case.
- name: The name of the Supreme Court case.
- href: A hyperlink associated with the case.
- docket: The docket number assigned to the case.
- term: The judicial term during which the case was heard.
- first_party: The name of the first party involved in the case.
- second_party: The name of the second party involved in the case.
- facts: The textual summary of the case's facts.
- facts_len: The character length of the 'facts' column.
- Target Variable (First Party Winner): A boolean indicating whether the first party prevailed in the case.
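As a minimal sketch of how these columns might be read and sanity-checked, the snippet below parses a small in-memory CSV using only the Python standard library. The two sample rows are invented for illustration, and the header name `first_party_winner` and the "True"/"False" encoding of the target are assumptions; check the header row of the actual file before relying on them.

```python
import csv
import io

# Invented sample rows; the real file has more columns and 3304 cases.
# The header name "first_party_winner" is an assumption, not confirmed by the listing.
SAMPLE_CSV = """ID,name,term,facts,facts_len,first_party_winner
1,Example v. Sample,1970,Made-up facts for illustration.,31,True
2,Other v. Another,2001,A second invented summary.,26,False
"""


def load_cases(handle):
    """Read cases from a CSV handle, converting facts_len to int and the target to bool."""
    rows = []
    for row in csv.DictReader(handle):
        row["facts_len"] = int(row["facts_len"])
        # Assumes the target is stored as the text "True" or "False".
        row["first_party_winner"] = row["first_party_winner"].strip().lower() == "true"
        rows.append(row)
    return rows


cases = load_cases(io.StringIO(SAMPLE_CSV))

# facts_len is described as the character length of 'facts', so the two should agree.
mismatches = [c["ID"] for c in cases if len(c["facts"]) != c["facts_len"]]

# Share of cases in which the first party prevailed.
win_rate = sum(c["first_party_winner"] for c in cases) / len(cases)
```

Passing an open file object instead of `io.StringIO` would apply the same checks to the full dataset.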
Distribution
The dataset contains 3304 individual Supreme Court cases. Data files are typically provided in CSV format. The length of the 'facts' column varies significantly, from 26 to 6201 characters. Fact lengths are most concentrated between 643.50 and 1261.00 characters (1,618 cases), followed by 1261.00 to 1878.50 characters (818 cases); together these two ranges account for about 74% of the cases. The judicial terms covered span from 1955 to 2020. The 'ID', 'facts_len', and 'term' columns each take many distinct values.
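The quoted boundaries (643.50, 1261.00, 1878.50) are consistent with splitting the range 26 to 6201 into ten equal-width bins of 617.5 characters each. The listing does not state the bin count, so ten is an inference from those figures. A short sketch of that arithmetic, with hypothetical helper names:

```python
def bin_edges(lo, hi, n_bins):
    """Return the n_bins + 1 boundaries of equal-width bins covering [lo, hi]."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def bin_index(value, edges):
    """Return the zero-based bin a value falls into, clamping to the first and last bins."""
    n_bins = len(edges) - 1
    width = (edges[-1] - edges[0]) / n_bins
    idx = int((value - edges[0]) // width)
    return min(max(idx, 0), n_bins - 1)


# Assumption: ten bins, inferred from the boundaries quoted in the listing.
edges = bin_edges(26, 6201, 10)
```

Under this assumption, `edges[1]` and `edges[2]` bound the most populated range, and `bin_index(len(facts), edges)` assigns a case to its bin.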
Usage
This dataset is ideally suited for:
- Developing and testing Natural Language Processing (NLP) models for legal text analysis.
- Predicting the outcomes of Supreme Court cases based on their factual descriptions.
- Building features from unstructured text data using NLP techniques.
- Research into judicial behaviour, legal trends, and the factors influencing court decisions.
- Creating machine learning applications for legal tech.
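To illustrate the outcome-prediction use case, here is a rough sketch of a bag-of-words Naive Bayes classifier with add-one smoothing, written with the standard library only. It is a simple baseline chosen for illustration, not a method associated with this dataset, and the four training texts and their labels are invented stand-ins for the 'facts' column and the target.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train_nb(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the class with the highest log-probability; unseen words are ignored."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok in vocab:
                # Add-one (Laplace) smoothing over the vocabulary.
                score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy data; True means the first party won.
toy_facts = [
    "The petitioner challenged the statute as unconstitutional",
    "The petitioner appealed the conviction after an unlawful search",
    "The respondent defended the agency ruling on the contract",
    "The respondent argued the contract dispute was settled",
]
toy_labels = [True, True, False, False]

model = train_nb(toy_facts, toy_labels)
prediction = predict_nb(model, "An unlawful search of the petitioner")
```

On real data, a held-out test split and a comparison against always predicting the majority class would be needed before reading anything into the accuracy.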
Coverage
The dataset covers cases from the Supreme Court of the United States decided between 1955 and 2021. The data pertains to judicial decisions at the national level within the United States.
License
CC0
Who Can Use It
- Data Scientists and Machine Learning Engineers: To build and train predictive models for legal outcomes.
- Legal Researchers and Academics: For empirical studies on judicial decision-making, legal linguistics, and historical legal analysis.
- NLP Practitioners: To develop and benchmark new text processing techniques on challenging legal narratives.
- Legal Tech Developers: To create applications that offer insights into court cases or assist in legal strategy.
- Students and Educators: As a rich resource for projects and coursework in data science, law, and AI.
Dataset Name Suggestions
- Supreme Court Judgment Prediction
- US Supreme Court Case Outcomes
- Judicial Outcome Prediction Dataset
- Supreme Court Legal Facts and Decisions
- United States Supreme Court Case Data
Attributes
Original Data Source: Supreme Court Judgment Prediction