HF Model Performance Snapshot
Data Science and Analytics
Free
About
This dataset captures metadata for every model publicly uploaded to the HuggingFace model hub, more than 10,000 in total, which makes it useful for understanding the landscape of available models. The idea came from a desire to analyse publicly available models on HuggingFace after attending a live session of the transformers course.
-
Columns
The information is split across two primary files, huggingface_models.csv and huggingface_modelcard_readme.csv, which can be joined using the modelId column.
From huggingface_models.csv (Primary Metadata):
- modelId: The unique ID of the model as it appears on the HuggingFace website.
- lastModified: The timestamp indicating when the model was last updated.
- tags: Keywords or tags associated with the model, typically provided by the maintainer.
- pipeline_tag: Denotes which pipeline the model is intended to be used with, if this information exists.
- files: A list showing the files available within the model repository.
- publishedBy: A custom column derived from the modelId, specifying the publisher.
- downloads_last_month: The number of times the model has been downloaded during the preceding month.
- library: The name of the framework or library the model belongs to (e.g., transformers, spacy, timm).
From huggingface_modelcard_readme.csv (Detailed Information):
- modelId: The ID of the model used to join the two files.
- modelCard: Contains the contents of the model's README file (referred to as a model card in the HuggingFace environment). This includes valuable information regarding model training, benchmarks, and author notes. Note that 36% of the records in this column are missing.
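The join on modelId described above can be sketched with the Python standard library. This is a minimal illustration and not the dataset author's method: the two inline CSV strings are invented toy rows standing in for the real files, and the helper names read_rows and left_join_on_model_id are hypothetical. It keeps every model from the metadata file and treats an absent or empty model card as missing.

```python
import csv
import io

# Toy stand-ins for the two files; these rows are invented for illustration only.
MODELS_CSV = """modelId,library,downloads_last_month
org-a/model-one,transformers,120
org-b/model-two,spacy,15
model-three,timm,7
"""

CARDS_CSV = """modelId,modelCard
org-a/model-one,Short README text
org-b/model-two,
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def left_join_on_model_id(models, cards):
    """Attach modelCard to each metadata row; None when absent or empty."""
    cards_by_id = {row["modelId"]: row["modelCard"] for row in cards}
    joined = []
    for row in models:
        card = cards_by_id.get(row["modelId"]) or None
        joined.append({**row, "modelCard": card})
    return joined


models = read_rows(MODELS_CSV)
cards = read_rows(CARDS_CSV)
joined = left_join_on_model_id(models, cards)

# Share of joined rows with no usable model card.
missing_share = sum(r["modelCard"] is None for r in joined) / len(joined)
print(f"{missing_share:.0%} of rows have no model card")
```

With the real files, the same functions could be fed from open("huggingface_models.csv", newline="", encoding="utf-8") instead of the inline strings. Long README fields may exceed the csv module's default field size limit, in which case csv.field_size_limit can be raised before reading.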
-
Distribution
The dataset is tabular and is provided in CSV format. It contains metadata for over 10,000 models. The huggingface_modelcard_readme.csv file is 18.01 MB in size and has 2 columns, with 10,406 unique values for modelId.
-
Usage
This material is suitable for Data Visualization and Exploratory Data Analysis. It can be used to analyse model trends, publishing patterns, and the popularity of different libraries (like Transformers). The model card details allow users to investigate how specific models were trained and benchmarked.
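As a starting point for this kind of exploratory analysis, the sketch below counts models and downloads per library and models per publisher. It is an illustration under stated assumptions: the rows are invented toy data, and publisher_from_model_id is a hypothetical helper that assumes the publisher is the part of modelId before the first "/". The source does not say how publishedBy was actually derived.

```python
from collections import Counter

# Toy rows shaped like huggingface_models.csv; values are invented for illustration.
rows = [
    {"modelId": "org-a/model-one", "library": "transformers", "downloads_last_month": "120"},
    {"modelId": "org-a/model-two", "library": "transformers", "downloads_last_month": "30"},
    {"modelId": "org-b/model-three", "library": "spacy", "downloads_last_month": "15"},
    {"modelId": "model-four", "library": "timm", "downloads_last_month": "7"},
]


def publisher_from_model_id(model_id):
    """Assumed rule: text before the first '/' is the publisher.

    Ids without a '/' are treated as having no publisher namespace (None).
    """
    namespace, sep, _ = model_id.partition("/")
    return namespace if sep else None


# Number of models per library.
models_per_library = Counter(r["library"] for r in rows)

# Total downloads in the preceding month per library.
downloads_per_library = Counter()
for r in rows:
    downloads_per_library[r["library"]] += int(r["downloads_last_month"])

# Number of models per (assumed) publisher.
models_per_publisher = Counter(publisher_from_model_id(r["modelId"]) for r in rows)

for library, count in models_per_library.most_common():
    print(library, count, downloads_per_library[library])
```

Counter.most_common orders the libraries by model count, which maps directly onto a bar chart of library popularity.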
-
Coverage
Time Range: The core information was collected specifically between 15th June and 20th June 2021.
Update Frequency: The dataset is expected to be updated quarterly.
-
License
CC0: Public Domain
-
Who Can Use It
The dataset is appropriate for beginners in data analysis. Intended users include Data Analysts and Data Scientists looking to perform exploratory analysis on the machine learning model ecosystem. Developers and researchers can use the detailed model card information to gain insights into model creation and performance metrics.
-
Dataset Name Suggestions
- HuggingFace Model Index
- ML Model Hub Metadata
- HF Model Performance Snapshot
- Public Model Metadata Analysis
-
Attributes
Original Data Source: HF Model Performance Snapshot
