Opendatabay APP

Legal Text Analysis Dataset

Government & Civic Records

Tags and Keywords

Law

Text

NLP

Government

Australia

Free

About

This dataset features Australian legal cases from the Federal Court of Australia (FCA), specifically collected from AustLII. It includes cases from 2006 through to 2009, offering rich text content and metadata. Each document captures catchphrases, citation sentences, citation catchphrases, and citation classes, which indicate the type of treatment given to cases cited within the current document. This dataset serves as a valuable resource for developing models to perform text classification on legal data and for exploring key terms within various case categories.

Columns

  • case_id: A unique identifier assigned to each legal case, with 24,985 distinct values present in the dataset.
  • case_outcome: The treatment given to a cited case, which serves as the classification label. Values include 'cited' (49%), 'referred to' (18%), and 'Other' (34%, 8,382 entries); the shares sum to 101%, presumably from rounding.
  • case_title: The official title of the legal case, containing 18,581 distinct titles.
  • case_text: The full textual content of the legal case document, with 17,921 unique text entries.
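
As a rough illustration of how these four columns might be read and summarised, the sketch below parses CSV data with Python's standard `csv` module and computes the share of each `case_outcome` label. The sample rows are invented placeholders, not records from the dataset, and the assumption that the file has a header row with exactly these column names is not confirmed by the listing.

```python
import csv
import io
from collections import Counter

# Column names as listed above; the header layout of the real file is an assumption.
EXPECTED_COLUMNS = ["case_id", "case_outcome", "case_title", "case_text"]

# Invented placeholder rows for illustration only (not real dataset records).
SAMPLE_CSV = """case_id,case_outcome,case_title,case_text
Case1,cited,Example v Example (No 1),Placeholder text about costs.
Case2,referred to,Sample Pty Ltd v Sample,Placeholder text about appeals.
Case3,cited,Demo v Demo,Placeholder text about evidence.
Case4,cited,Test v Test,Placeholder text about procedure.
"""


def load_rows(handle):
    """Read rows as dicts and check that the expected columns are present."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def outcome_shares(rows):
    """Return each case_outcome label's share of all rows, as a fraction."""
    counts = Counter(row["case_outcome"] for row in rows)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


rows = load_rows(io.StringIO(SAMPLE_CSV))
shares = outcome_shares(rows)
```

To run the same check against the downloaded file, `load_rows` could be passed a handle opened with `newline=""` and an explicit encoding in place of the in-memory sample.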

Distribution

The data files are typically provided in a format such as CSV and cover legal cases primarily from 2006 to 2009. The listing does not state the exact number of rows, but the 24,985 distinct case_id values imply at least that many records. Each record captures detailed information about a legal case, its content, and its citation pattern.

Usage

  • Text Classification: Develop and train machine learning models to classify legal documents based on their content and outcomes.
  • Exploratory Data Analysis (EDA): Identify the keywords and phrases most strongly associated with each legal case category.
  • Natural Language Processing (NLP): Apply NLP techniques for information extraction, sentiment analysis, or summarisation within the legal domain.
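
The exploratory analysis described above can be sketched with a simple per-category term count. The snippet below is a minimal standard-library example, assuming rows shaped like the columns listed earlier; the sample rows, the stop-word list, and the function names are illustrative inventions rather than part of the dataset or any library.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "of", "a", "an", "and", "to", "in", "was", "on", "for"}

# Invented placeholder rows (not real dataset records).
SAMPLE_ROWS = [
    {"case_outcome": "cited", "case_text": "The appeal was dismissed and costs awarded."},
    {"case_outcome": "cited", "case_text": "Costs of the appeal were reserved."},
    {"case_outcome": "referred to", "case_text": "The tribunal decision was set aside."},
]


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def top_terms_by_outcome(rows, n=3):
    """Count tokens per case_outcome label and return the n most common for each."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["case_outcome"]].update(tokenize(row["case_text"]))
    return {label: counter.most_common(n) for label, counter in counts.items()}


top_terms = top_terms_by_outcome(SAMPLE_ROWS)
```

Raw counts favour words that are common everywhere in legal text, so a weighting such as TF-IDF would likely give more distinctive terms per category; the plain count is used here only to keep the sketch short.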

Coverage

  • Geographic Scope: The dataset is focused on Australia, specifically drawing cases from the Federal Court of Australia (FCA).
  • Time Range: It encompasses legal cases from a four-year period, including 2006, 2007, 2008, and 2009.
  • Data Availability: All cases from the Federal Court of Australia within the specified years are included in the dataset, ensuring a consistent collection over this period.

License

CC0

Who Can Use It

  • Data Scientists and Machine Learning Engineers: For building and refining models for legal text classification and legal analytics.
  • Legal Researchers and Scholars: To study legal trends, citation patterns, and judicial outcomes.
  • Academic Institutions: Particularly those involved in Computer Science and Engineering, for research into Natural Language Processing applied to legal texts.
  • Government Analysts: For insights into legal precedents and case management.

Dataset Name Suggestions

  • Australian Federal Court Cases
  • Legal Case Citation Dataset
  • AustLII Text Classification Data
  • FCA Legal Document Collection
  • Legal Text Analysis Dataset

Attributes

Listing Stats

  • VIEWS: 0
  • DOWNLOADS: 0
  • LISTED: 11/06/2025
  • REGION: GLOBAL
  • Universal Data Quality Score (UDQS) QUALITY: 5 / 5
  • VERSION: 1.0
