Global LLM Release Metrics
Data Science and Analytics
Free
About
Detailed historical data on every major Large Language Model (LLM) and chatbot released between 2018 and 2024. The dataset provides essential technical specifications critical for understanding the development, growth, and complexity of modern artificial intelligence. It serves as a resource for tracking industry evolution by detailing model parameters, token counts, training data, and associated companies.
Columns
The dataset contains 11 specific fields detailing model information:
- Model: The official designation or name of the language model.
- Company: The corporation or entity responsible for developing the model. Google and Meta AI are frequently represented.
- Arch: Describes the underlying model architecture, such as Transformer or Recurrent Neural Network (RNN). Some values are designated as To Be Announced (TBA).
- Parameters: The measure of the model's complexity, expressed in billions of weights.
- Tokens: The volume of sub-word units the model was trained on or can process, recorded in billions. Roughly 25% of records are missing this value.
- Ratio: Typically the parameter-to-token ratio; this field is sparsely populated (e.g., 20:1 for Olympus).
- ALScore: A calculated metric intended as a quick rating of a model's power, computed as the square root of Parameters multiplied by Tokens (see the sketch after this list).
- Training dataset: The primary data sources used to train the model, often listing resources like Wikipedia, books, and common crawl data.
- Release Date: The anticipated or confirmed date when the model was made available.
- Notes: Provides supplementary details, such as whether the model functions as a Chatbot.
- Playground: A URL linking to a site where users can interact with the model or find additional information.
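As a minimal illustration of how the ALScore described above could be reproduced, the sketch below takes the square root of Parameters multiplied by Tokens, both in billions. Any additional normalisation the dataset may apply is not stated in the description, so the function name and example values are assumptions.

import math

def alscore(parameters_b: float, tokens_b: float) -> float:
    # Quick power rating as described above: sqrt(Parameters x Tokens),
    # with both quantities expressed in billions. Any further scaling
    # the dataset applies is unknown and omitted here.
    return math.sqrt(parameters_b * tokens_b)

# Hypothetical example: a 70B-parameter model trained on 2,000B tokens
print(round(alscore(70, 2000), 2))  # 374.17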
Distribution
The data is distributed as a single CSV file named "Large language models (2024).csv", containing 342 records across 11 columns. Updates are expected on a quarterly basis. Note that certain fields, such as Tokens, Ratio, and ALScore, have substantial portions of missing values.
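As a hedged starting point, the sketch below loads the CSV named above with pandas and reports the share of missing values in the sparsest columns. The column names follow the Columns section; the exact headers in the file may differ.

import pandas as pd

# Load the distributed file (name as given above; adjust the path if needed).
df = pd.read_csv("Large language models (2024).csv")

print(df.shape)  # expected to be (342, 11) per the description

# Share of missing values in the columns noted as incomplete
print(df[["Tokens", "Ratio", "ALScore"]].isna().mean())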
Usage
This data product is ideally suited for benchmarking and comparing Large Language Models across various metrics like size, complexity, and training data source. It is perfect for tracking trends in AI development over the period 2018 to 2024, identifying dominant architectures like Dense models, and performing historical analysis of corporate involvement in AI. It is highly useful for generating industry reports and visualisations showing the growth curve of LLM capabilities.
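A simple trend analysis of model size over the 2018 to 2024 period might look like the sketch below. The column names are taken from the Columns section, and the parsing assumes Parameters is stored as a plain number of billions; both are assumptions about the raw file.

import pandas as pd

df = pd.read_csv("Large language models (2024).csv")

# Parse dates and sizes defensively; formats in the raw file are assumptions.
df["Release Date"] = pd.to_datetime(df["Release Date"], errors="coerce")
df["Parameters"] = pd.to_numeric(df["Parameters"], errors="coerce")

# Median parameter count (in billions) per release year, 2018-2024
trend = df.groupby(df["Release Date"].dt.year)["Parameters"].median()
print(trend)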
Coverage
The temporal scope of the data encompasses the years 2018 through 2024, covering releases within that period. The data pertains to globally released major models and chatbots. The scope focuses purely on technological specifications and corporate development entities rather than geographic or demographic variables.
License
CC0: Public Domain
Who Can Use It
The dataset holds high value (rated 10.00 for usability) for several key user groups:
- Artificial Intelligence Researchers: For studying the relationship between parameter count, token volume, and model release date (a starting-point sketch follows this list).
- Data Scientists: For advanced modelling and predictive analysis related to AI growth trajectories.
- Academics and Students: For educational purposes, specifically understanding LLM taxonomy and architecture types.
- Industry Analysts: For tracking the activities of key companies like Google and Meta AI in the LLM space.
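To make the first of these uses concrete, the sketch below computes a simple correlation between model size and training-token volume. It is only a starting point; the column names and numeric formats are assumptions carried over from the Columns section.

import pandas as pd

df = pd.read_csv("Large language models (2024).csv")
for col in ["Parameters", "Tokens"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

# Pairwise correlation between model size and training-token volume,
# ignoring records where either value is missing.
print(df[["Parameters", "Tokens"]].corr())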
Dataset Name Suggestions
- AI Language Model Specifications 2018-2024
- Global LLM Release Metrics
- Major Chatbot and LLM Technical History
Attributes
Original Data Source: Global LLM Release Metrics