Cleaned Midjourney V5.1 User Selection Data
Data Science and Analytics
Free
About
This data product offers a cleaned selection of Midjourney V5.1 image generation prompts and their associated metadata. It is a subset of a larger scraped collection, filtered to include only prompts whose images users chose to upscale. Because upscaling is a deliberate user action, the dataset serves as a sample of user-preferred generative outputs, which makes it useful for studying successful prompting techniques and prevailing aesthetic preferences in AI art. The data has been cleaned to separate the core prompts from ancillary metadata and parameters, such as the version number and aspect ratio used during generation.
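The kind of separation described above can be illustrated with a minimal sketch. It is not the cleaning procedure used to build this dataset; it assumes a bare prompt string with Midjourney-style trailing flags such as `--v 5.1` and `--ar 16:9`, and the function name `split_prompt` is hypothetical.

```python
import re

# Assumption: parameters appear as "--flag value" pairs, as in Midjourney prompts.
VERSION_RE = re.compile(r"--(?:v|version)\s+(\d+(?:\.\d+)?)", re.IGNORECASE)
ASPECT_RE = re.compile(r"--(?:ar|aspect)\s+(\d+:\d+)", re.IGNORECASE)
# Matches any "--flag" with an optional value that does not itself start with "-".
PARAM_RE = re.compile(r"\s*--\w+(?:\s+[^\s-][^\s]*)?")


def split_prompt(raw: str) -> dict:
    """Split a raw prompt into its core text, version, and aspect ratio (hypothetical helper)."""
    version = VERSION_RE.search(raw)
    aspect = ASPECT_RE.search(raw)
    return {
        "clean_prompt": PARAM_RE.sub("", raw).strip(),
        "version": version.group(1) if version else None,
        "aspect": aspect.group(1) if aspect else None,
    }


parsed = split_prompt("a red fox in snow, cinematic lighting --ar 16:9 --v 5.1")
print(parsed)
```

Real Content values may carry extra Discord formatting around the prompt, which this sketch does not handle.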
Columns
The product contains 10 specific data fields:
- Content: The original prompt text as scraped, including any associated metadata.
- Attachments: The URL linking directly to the generated image file.
- Reactions: Records of emojis used on Discord messages related to the generation.
- user_name: The Discord user name of the individual who authored the prompt.
- upscaled: A Boolean flag indicating that the image was upscaled by the user (all records in this file are True).
- version: The Midjourney version number utilised, up to 5.1.
- aspect: The custom aspect ratio requested by the user, if different from the default square format.
- clean_prompts: A refined version of the prompt, stripped of extraneous metadata for focused analysis.
- Date: The timestamp indicating when the image was generated.
- Row Index: A sequential numbered column for indexing each record.
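As a quick sanity check before analysis, the header of a downloaded copy can be compared against the fields listed above. This is a sketch under assumptions: the header names are taken to match the listing exactly, the index column is left out because its real header is not stated, and `missing_columns` is a hypothetical helper.

```python
import csv
import io

# Column names as given in the listing (assumed to match the CSV header exactly).
# The row index column is omitted because its header name is not stated.
EXPECTED = [
    "Content", "Attachments", "Reactions", "user_name", "upscaled",
    "version", "aspect", "clean_prompts", "Date",
]


def missing_columns(fileobj) -> list:
    """Return the expected column names that are absent from the CSV header."""
    header = csv.DictReader(fileobj).fieldnames or []
    return [name for name in EXPECTED if name not in header]


# Small in-memory stand-in for the real file, with the "aspect" column left out.
sample = io.StringIO(
    ",Content,Attachments,Reactions,user_name,upscaled,version,clean_prompts,Date\n"
)
print(missing_columns(sample))
```

With the real file, the same function could be called on a handle opened with `open("upscaled_prompts_df.csv", newline="", encoding="utf-8")`; the encoding is an assumption.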
Distribution
The data is available in a standard CSV file format, named upscaled_prompts_df.csv, with a file size of 280.84 MB. The set is large, containing approximately 446,000 valid records (rows) available for immediate use. The structure consists of 10 distinct columns.
Usage
This resource is ideal for various advanced applications, including:
- Prompt Engineering Research: Analysing successful syntax and keywords used by real users to generate preferred outputs.
- Machine Learning Model Training: Developing models that predict high-quality or popular image characteristics based on input text.
- Trend Analysis: Identifying evolving user preferences, common image themes, and popular generative styles within the Midjourney ecosystem.
- Parameter Optimisation: Investigating the correlation between specific parameters (e.g., aspect ratio or version) and successful image upscaling.
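A first pass at the prompt and parameter analyses listed above might look like the following sketch. It assumes rows are dictionaries keyed by the column names in this listing (for example, as produced by `csv.DictReader`); the helper names and the three-character word cutoff are illustrative choices, not part of the dataset.

```python
import re
from collections import Counter


def top_terms(rows, n=3):
    """Count the most frequent words in clean_prompts, ignoring very short words."""
    counts = Counter()
    for row in rows:
        words = re.findall(r"[a-z]+", (row.get("clean_prompts") or "").lower())
        counts.update(w for w in words if len(w) > 2)
    return counts.most_common(n)


def aspect_counts(rows):
    """Tally requested aspect ratios; empty values are grouped under "default"."""
    return Counter(row.get("aspect") or "default" for row in rows)


# Made-up rows for illustration only; they are not records from the dataset.
rows = [
    {"clean_prompts": "red fox in snow", "aspect": "16:9"},
    {"clean_prompts": "fox, snow, neon", "aspect": ""},
    {"clean_prompts": "a fox", "aspect": "16:9"},
]
print(top_terms(rows, 2))
print(aspect_counts(rows))
```

Since every record in this file is upscaled, counts like these describe what upscaled prompts look like; comparing against non-upscaled prompts would need the larger source collection.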
Coverage
The data covers image generation activities spanning a period from 20 April 2023 to 10 May 2023. As the source material is scraped from a global platform, the dataset inherently reflects inputs from a diverse, worldwide user base. It focuses predominantly on Midjourney Version 5.1 outputs, though records may include earlier versions (like V3, V4, and V5) as specified by users.
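The stated coverage window can be checked against the Date column with a sketch like the one below. The timestamp format of the Date column is not given in this listing, so the default `fmt` here is an assumption and should be adjusted to the actual values; `in_coverage` is a hypothetical helper.

```python
from datetime import date, datetime

# Coverage window stated in the listing (inclusive on both ends).
START, END = date(2023, 4, 20), date(2023, 5, 10)


def in_coverage(stamp: str, fmt: str = "%Y-%m-%d %H:%M:%S") -> bool:
    """Return True if the timestamp falls within the stated coverage period."""
    return START <= datetime.strptime(stamp, fmt).date() <= END


print(in_coverage("2023-04-25 14:30:00"))
print(in_coverage("05/11/2023", fmt="%m/%d/%Y"))
```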
License
CC0: Public Domain
Who Can Use It
- Data Scientists: For training and validating generative AI models and performing quantitative analysis on prompt effectiveness.
- Generative AI Artists: To study real-world, high-performance prompts and improve personal art generation techniques.
- Academic Researchers: For studying human-computer interaction, creativity metrics, and the societal impact of large-scale text-to-image systems.
- Application Developers: For building prompt suggestion tools or integrating knowledge of effective prompts into new platforms.
Dataset Name Suggestions
- Midjourney V5.1 High-Quality Prompts (Upscaled Only)
- Cleaned Midjourney V5.1 User Selection Data
- Popular AI Art Prompts (April-May 2023)
- Midjourney Upscale Prompt Library
Attributes
Original Data Source: Cleaned Midjourney V5.1 User Selection Data
