Historical AI and ML Models
Data Science and Analytics
About
A collection of influential and notable machine learning models. A model is included if it meets criteria such as achieving a state-of-the-art improvement on a recognised benchmark, being highly cited (over 1,000 citations), having historical relevance, or demonstrating significant use. The models were selected from various sources, including literature reviews, historical accounts, and suggestions from individuals.
Columns
- System: The unique, best-known name of the model, used as the primary key.
- Domain: The high-level machine learning domain of application (e.g., Language, Vision, Audio).
- Organization: The organisation(s) responsible for creating the model.
- Organization categorization: The category of the creating organisation (e.g., Industry, Academia, Non-profit).
- Country (from Organization): The country associated with the developing organisation.
- Authors: A comma-separated list of the model's authors.
- Publication date: The publication, announcement, or release date of the model (YYYY-MM-DD).
- Reference: The title of the literature reference, such as a journal or conference paper.
- Link: A URL to the best available source documenting the model.
- Citations: The number of citations the model has received.
- Notability criteria: The reason for the model's inclusion (e.g., SOTA improvement, highly cited).
- Notability criteria notes: Additional text explaining the notability.
- Parameters: The number of parameters in the model.
- Parameters notes: Explanations or sources for the parameter count.
- Training compute (FLOP): The amount of computation used for training, measured in FLOP.
- Training compute notes: Notes regarding the training compute calculation.
- Training dataset: The name of the dataset used for training the model.
- Training dataset notes: Additional details about the training data.
- Training dataset size (datapoints): The size of the training dataset.
- Dataset size notes: Further information on the dataset size.
- Epochs: The number of epochs the model was trained for.
- Training time (hours): The duration of the model's training in hours.
- Training time notes: Additional context on the training time.
- Training hardware: The hardware used for training (e.g., Google TPU v3).
- Hardware quantity: The number of hardware units used.
- Hardware utilization: The efficiency of hardware use during training.
- Training compute cost (2023 USD): The estimated cost of training the model.
- Compute cost notes: Explanations for how the cost was estimated.
- Confidence: The level of confidence in the accuracy of the data entry.
- Abstract: The abstract from the model's reference paper.
- Model accessibility: The availability status of the model (e.g., Unreleased).
- Base model: The foundational model used for fine-tuning.
- Finetune compute (FLOP): The amount of computation used for fine-tuning.
- Finetune compute notes: Notes regarding the fine-tuning compute.
- Batch size: The batch size used during training.
- Batch size notes: Further details on the batch size.
- Frontier model: A boolean indicating if the model is considered a frontier model.
- Training power draw (W): The power consumed during training in watts.
Distribution
- Format: CSV
- Size: 1.53 MB
- Structure: 867 rows/records and 38 columns.
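A minimal loading sketch in Python with pandas, assuming the CSV has been downloaded locally as historical_ai_and_ml_models.csv (the filename is hypothetical) and that the header names match the column list above:

```python
import pandas as pd

# Filename is hypothetical; use the name of the downloaded CSV.
df = pd.read_csv("historical_ai_and_ml_models.csv", parse_dates=["Publication date"])

# The listing states 867 rows/records and 38 columns.
print(df.shape)

# Inspect a handful of the columns described above.
print(df[["System", "Domain", "Organization", "Publication date", "Parameters"]].head())
```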
Usage
This dataset is ideal for analysing trends in machine learning research and development. It can be used for data visualisation to track the evolution of model parameters, training compute, and costs over time. Researchers can perform data analytics to identify patterns in model development across different domains (like Language and Vision) and organisation types (such as Academia and Industry). It is also suitable for classification tasks and for creating historical accounts of AI progress.
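As a rough sketch of the kind of trend visualisation described above, the snippet below plots training compute against publication date, split by domain, on a log scale. The filename is assumed as before, and the column names follow the "Columns" list; the actual CSV headers may differ slightly.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Filename is hypothetical; column names follow the "Columns" list above.
df = pd.read_csv("historical_ai_and_ml_models.csv", parse_dates=["Publication date"])
subset = df.dropna(subset=["Publication date", "Training compute (FLOP)"])

# Scatter training compute over time, one series per domain.
for domain, group in subset.groupby("Domain"):
    plt.scatter(group["Publication date"], group["Training compute (FLOP)"], s=10, label=domain)

plt.yscale("log")
plt.xlabel("Publication date")
plt.ylabel("Training compute (FLOP)")
plt.legend(fontsize="small")
plt.tight_layout()
plt.show()
```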
Coverage
- Geographic: The data includes models from organisations worldwide, with a significant portion from the United States of America.
- Time Range: The dataset covers models with publication dates ranging from July 1950 to September 2024.
- Demographic: The data primarily concerns organisations and authors within the machine learning field; there is no specific demographic focus on end-users.
- Completeness: Information is not available for every field for every model; for instance, training compute cost and hardware quantity have a high percentage of missing values.
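Because several fields are sparsely populated, a quick completeness check before analysis can be useful. A sketch under the same filename assumption, with column names taken from the list above:

```python
import pandas as pd

df = pd.read_csv("historical_ai_and_ml_models.csv")

# Fraction of missing values per column, highest first; expect fields such as
# "Training compute cost (2023 USD)" and "Hardware quantity" near the top.
missing_share = df.isna().mean().sort_values(ascending=False)
print(missing_share.head(10).round(2))
```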
License
Attribution 4.0 International (CC BY 4.0)
Who Can Use It
- AI Researchers and Data Scientists: To analyse trends in model scale, compute usage, and performance benchmarks.
- Technology Historians and Journalists: To document the evolution and key milestones in artificial intelligence.
- Policy Makers and Analysts: To understand the resource requirements and organisational landscape of cutting-edge AI development.
- Students and Educators: As a resource for projects and learning about significant models in the history of machine learning.
Dataset Name Suggestions
- Notable AI Models
- Influential Machine Learning Models Database
- AI Model Development Trends
- Historical AI and ML Models
- State-of-the-Art AI Models Collection
Attributes
Original Data Source: Historical AI and ML Models