TOI Crime News Dataset
Data Science and Analytics
Free
About
This dataset aims to facilitate a faster judging process in India, a country where some legal cases have been known to persist for over a century [1]. Recognising that justice delayed is justice denied, this collection of crime articles serves as a valuable resource for legal professionals and the data science community alike [1]. It is designed to assist judges in reaching verdicts more quickly by providing access to similar historical cases, while also empowering lawyers with recent case examples to strengthen their arguments [1]. Furthermore, it offers a practical dataset for individuals engaged in Natural Language Processing (NLP) and recommender systems projects [1].
Columns
The 7k Unique crime articles.csv file within this dataset contains the following columns:
- heading: The main title of the article [2].
- content_summary: A concise summary of the article's content [2].
- article_link: The URL leading to the full article [2].
- img_link: The URL for any image associated with the article [2].
- month_date: The month and day of publication [2].
- time: The time of publication [2].
- Year: The year of publication [2].
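As a minimal sketch of how these columns might be read and checked, the snippet below uses Python's standard csv module. The function name load_articles, the sample row, and its value formats (for example "Jan 5" and "10:30") are illustrative assumptions; the description above does not specify how month_date or time are formatted.

```python
import csv
import io

# Column names as listed in the description above.
EXPECTED_COLUMNS = [
    "heading", "content_summary", "article_link",
    "img_link", "month_date", "time", "Year",
]


def load_articles(source):
    """Read article rows from an open text file or any file-like object.

    Hypothetical helper: raises ValueError if a listed column is absent.
    """
    reader = csv.DictReader(source)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


# Made-up sample standing in for the real file; values are illustrative only.
SAMPLE = io.StringIO(
    "heading,content_summary,article_link,img_link,month_date,time,Year\n"
    "Example headline,Example summary,https://example.com/a1,,Jan 5,10:30,2023\n"
)

rows = load_articles(SAMPLE)
print(rows[0]["heading"], rows[0]["Year"])
```

With the downloaded file, the same function could be called on open("7k Unique crime articles.csv", newline="", encoding="utf-8"); the UTF-8 encoding is an assumption and may need adjusting.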
Distribution
The dataset comprises two CSV files: Crime_Articles.csv and 7k Unique crime articles.csv [3]. Crime_Articles.csv contains 80,000 repetitive articles, of which over 6,000 are unique [3]. The 7k Unique crime articles.csv file features nearly 8,000 unique crime articles [3]. The data has been extracted from the Times of India website [3]. Images corresponding to individual articles are planned to be attached in the future [3].
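Because Crime_Articles.csv is described as containing repeated articles, one possible way to reduce it to unique entries is sketched below. Treating article_link as the identity of an article is an assumption; the description does not say how the unique file was actually derived, and the helper name unique_articles is hypothetical.

```python
def unique_articles(rows, key="article_link"):
    """Keep the first row seen for each distinct value of `key`.

    Assumption: two rows with the same article_link are the same article.
    """
    seen = set()
    unique = []
    for row in rows:
        value = row[key].strip()
        if value not in seen:
            seen.add(value)
            unique.append(row)
    return unique


# Made-up rows for illustration; the real rows would come from the CSV file.
sample_rows = [
    {"heading": "A", "article_link": "https://example.com/a"},
    {"heading": "A", "article_link": "https://example.com/a"},
    {"heading": "B", "article_link": "https://example.com/b"},
]

deduped = unique_articles(sample_rows)
print(len(sample_rows), "->", len(deduped))
```

Matching on the link is stricter than matching on the text: the same story republished under a different URL would still count as two articles here.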
The dataset covers articles primarily from 2021 (33%) and 2022 (29%), with additional data from 2023 spanning from 1st January to 31st December [4, 5]. The quality of the dataset is rated 5 out of 5, and its current version is 1.0 [3, 6].
Usage
This dataset is ideally suited for:
- Developing recommender systems for legal professionals [1].
- Assisting judges in quickly identifying precedents and similar cases for more efficient verdict delivery [1].
- Providing lawyers with relevant and recent case examples to inform their arguments and strategies [1].
- Supporting data scientists working on NLP tasks, text analysis, and machine learning models related to legal or journalistic data [1].
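To illustrate the recommender use case, here is a minimal sketch that ranks articles by TF-IDF cosine similarity over their summaries, using only the standard library. This is one common baseline technique, not a method stated by the dataset authors; the function names, the tokenizer, and the sample summaries are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict of term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({
            # Smoothed inverse document frequency keeps every weight positive.
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in term_freq.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_index, docs, top_k=2):
    """Return indices of the top_k documents most similar to docs[query_index]."""
    vectors = tfidf_vectors(docs)
    scores = [
        (cosine(vectors[query_index], vec), idx)
        for idx, vec in enumerate(vectors)
        if idx != query_index
    ]
    scores.sort(reverse=True)
    return [idx for _, idx in scores[:top_k]]


# Made-up summaries standing in for the content_summary column.
summaries = [
    "Man arrested for bank robbery in Mumbai",
    "Police arrest two suspects after bank robbery in Pune",
    "Court grants bail in land fraud case",
]

print(most_similar(0, summaries, top_k=1))
```

For the full dataset, recomputing the vectors on every query would be wasteful; a practical version would build them once and likely rely on an established library rather than this hand-rolled baseline.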
Coverage
The dataset focuses on Indian crime articles [1]. The articles collected span a time range that includes 2021, 2022, and 2023, with detailed date and time information for each entry [4, 5]. Specifically, the 2023 data covers the entire year from 01/01/2023 to 31/12/2023 [4]. There is no specific demographic scope mentioned, as the data consists of crime news articles.
License
CC BY-NC-SA
Who Can Use It
- Lawyers: To research similar cases, understand recent legal trends, and find examples for court arguments [1].
- Judges: To streamline the judgment process by referencing relevant prior cases [1].
- Data Scientists and Analysts: For projects involving Natural Language Processing, building recommender systems, and exploring large text datasets [1].
- Researchers: Studying patterns in Indian crime reporting or the legal system.
Dataset Name Suggestions
- Indian Crime Articles from TOI
- Justice AI: Indian Legal Data
- TOI Crime News Dataset
- Indian Judiciary Support Data
- Legal Recommendation Dataset (India)
Attributes
Original Data Source: 7K UNIQUE & 80K REPETITIVE INDIAN CRIME ART.(TOI)