Satellite-Text Pairing Dataset

Data Science and Analytics

Tags and Keywords

Image
Text
Computer Vision
Agriculture
Image-to-text

About

This collection consists of roughly 11,000 satellite images paired with detailed textual descriptions (the published splits total 10,921). It is designed for training and evaluating algorithms in remote sensing and automatic image-to-text generation. By linking high-resolution aerial imagery with human language, it lets machine learning models learn to describe geographical scenes. Each image is annotated with five captions written by different individuals, giving varied linguistic descriptions of the same content.

Columns

The primary association file, provided in CSV format, includes two fields:

captions: This field contains the set of five unique, human-generated textual descriptions corresponding to the aerial image.
filepath: This field identifies the relative location and filename (e.g., train/airport_1.jpg) of the associated satellite image.
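The two fields above can be flattened into one (image, caption) pair per caption, which is the shape most captioning and contrastive training loops expect. The following is a minimal sketch, not the provider's loader. It assumes the captions cell stores its captions as a Python-style list literal; the listing does not say how the five captions are serialised, so that parsing step may need changing. The helper name load_pairs and the sample row are hypothetical.

```python
import ast
import csv
import io


def load_pairs(csv_file):
    """Yield one (filepath, caption) pair per caption in the file.

    Assumption: each `captions` cell holds a Python-style list literal,
    e.g. "['caption one', 'caption two', ...]". Adjust the parsing if the
    real file serialises captions differently.
    """
    for row in csv.DictReader(csv_file):
        captions = ast.literal_eval(row["captions"])
        for caption in captions:
            yield row["filepath"], caption.strip()


# Made-up sample row for illustration (two captions instead of five);
# it is not taken from the dataset.
sample = io.StringIO(
    "captions,filepath\n"
    "\"['a plane on a runway', 'an airport seen from above']\",train/airport_1.jpg\n"
)
pairs = list(load_pairs(sample))
```

For the real file, the same function could be given a handle opened with open("train.csv", newline="", encoding="utf-8"), assuming train.csv follows the column layout described above.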

Distribution

The collection contains 10,921 satellite images, pre-partitioned into three sets for model development and testing: 8,734 images for training, 1,093 for testing, and 1,094 for validation. Data files are typically in CSV format; the main association file (train.csv) is 2.97 MB.
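The split counts can be checked with a few lines of arithmetic. They sum to 10,921 images, which works out to roughly an 80/10/10 division between training, testing, and validation. The variable names here are illustrative only.

```python
# Split sizes as stated in the listing.
splits = {"train": 8734, "test": 1093, "validation": 1094}

total = sum(splits.values())
shares = {name: count / total for name, count in splits.items()}

print(f"total images: {total}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```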

Usage

This resource suits research and practical applications in artificial intelligence that combine visual and linguistic intelligence:

Creating systems for automatic caption generation from remote sensing images.
Training advanced multimodal models, such as those based on the CLIP architecture, to align visual and textual domains.
Developing embedding-based semantic search engines that query aerial imagery with natural language descriptions.
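The semantic search use case comes down to ranking image embeddings by their similarity to a query embedding. The sketch below shows only that ranking step, using cosine similarity in plain Python. It assumes the embeddings have already been produced by some multimodal encoder (for example a CLIP-style model), which is not shown. The three-dimensional vectors, the search helper, and every file name other than train/airport_1.jpg are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def search(query_vec, image_index, top_k=2):
    """Return the top_k file paths ranked by similarity to the query."""
    ranked = sorted(
        image_index.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [path for path, _ in ranked[:top_k]]


# Toy 3-d vectors standing in for real image embeddings (hypothetical values).
image_index = {
    "train/airport_1.jpg": [0.9, 0.1, 0.0],
    "train/farmland_1.jpg": [0.1, 0.9, 0.1],
    "train/residential_1.jpg": [0.2, 0.2, 0.9],
}
# Stand-in for the embedding of a text query such as "planes near a runway".
query_vec = [0.8, 0.2, 0.1]
results = search(query_vec, image_index)
```

A real index over thousands of images would normally store normalised vectors and use a vectorised or approximate nearest-neighbour search rather than a Python sort, but the ranking logic is the same.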

Coverage

The dataset captures a variety of scenes from satellite imagery, covering diverse geographical features. Examples include transportation hubs such as airports, dense residential areas, and agricultural land. The listing does not specify geographic regions or a time range. The data is expected to be updated monthly.

License

CC0: Public Domain

Who Can Use It

The collection is valuable for professionals and researchers focused on merging visual and linguistic intelligence:

Data Engineers: Utilising the high volume of annotated pairs for training robust machine learning pipelines.
Computer Vision Scientists: Focusing on image understanding and multimodal data tasks in specialised domains.
Academic Researchers: Studying model performance in areas such as remote sensing or precision agriculture.

Dataset Name Suggestions

Remote Sensing Image Caption Archive
Satellite-Text Pairing Dataset
GeoVision Description Set
Aerial Annotation Data

Attributes

Original Data Source: Satellite-Text Pairing Dataset

Listing Stats

Listed: 20/11/2025
Region: Global
Quality: 5 / 5
Version: 1.0
