Argo Solutions Travel Data
Data Science and Analytics
Free
About
This dataset provides a simulated collection of corporate travel activities, focusing primarily on flight and hotel bookings. It was created for a 2019 Datathon that challenged participants to apply machine learning, AI, and data science techniques to a real-world business case. Its purpose is to support exploration and insight generation that can help Argo Solutions, a Latin American technology company, offer its customers the best travel experience by using technology to simplify expense management and corporate travel processes.
Columns
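- travelCode: An identifier for each trip. Values range from 0 to 135,943, with a mean of about 68,000. Because there are at most 135,944 distinct codes but about 272,000 records, codes repeat (roughly two records per code on average), so travelCode alone cannot serve as a row-level primary key.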
- userCode: A unique identifier for each user, serving as a foreign key. Values range from 0 to 1,339, with a mean of 668.
- from: The origin of the trip, given as a city name and state code, such as Florianopolis (SC) or Aracaju (SE). Florianopolis (SC) is the most common origin, at 21% of records.
- to: The destination of the trip, in the same format, again including Florianopolis (SC) and Aracaju (SE). Florianopolis (SC) is the most common destination, at 21% of records.
- flightType: The class of flight. firstClass is the most frequent value (43%), followed by premium (29%).
- price: The monetary cost of the travel, ranging from 301.51 to 1,754.17 with a mean of about 957 (the currency is not stated).
- time: The flight duration, ranging from 0.44 to 2.44 with a mean of 1.42 (the units are not stated).
- distance: The distance covered by the flight, ranging from 168.22 to 937.77 with a mean of about 547 (the units are not stated).
- agency: The travel agency used. Rainbow and CloudFy each account for 43% of records.
- date: The date of travel, spanning 26 September 2019 to 24 July 2023; the mean date is around 11 January 2021.
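A minimal loading sketch, using only the Python standard library, shows how these columns map onto typed fields. It assumes the CSV header uses exactly the column names listed above, and it keeps `date` as text because the listing does not document the date format. The `Flight` class, the helper names, and the sample rows are hypothetical and are not taken from the dataset. Note that `from` is a reserved word in Python, so it cannot be an attribute name and is renamed `origin` here.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, TextIO


@dataclass(frozen=True)
class Flight:
    """One CSV row; field names follow the documented columns."""
    travel_code: int
    user_code: int
    origin: str        # CSV column "from" (a Python keyword, so renamed)
    destination: str   # CSV column "to"
    flight_type: str
    price: float       # currency not documented
    time: float        # units not documented
    distance: float    # units not documented
    agency: str
    date: str          # kept as text: the date format is not documented


def parse_row(row: Dict[str, str]) -> Flight:
    """Convert one csv.DictReader row into a typed Flight."""
    return Flight(
        travel_code=int(row["travelCode"]),
        user_code=int(row["userCode"]),
        origin=row["from"],
        destination=row["to"],
        flight_type=row["flightType"],
        price=float(row["price"]),
        time=float(row["time"]),
        distance=float(row["distance"]),
        agency=row["agency"],
        date=row["date"],
    )


def load_flights(stream: TextIO) -> List[Flight]:
    """Parse every data row of a CSV stream whose first line is the header."""
    return [parse_row(row) for row in csv.DictReader(stream)]


def shares(values: Iterable[str]) -> Dict[str, float]:
    """Fraction of records per distinct value (e.g. per flightType or agency)."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: n / total for key, n in counts.items()}


# Hypothetical rows for illustration only; they are NOT taken from the dataset.
SAMPLE = """\
travelCode,userCode,from,to,flightType,price,time,distance,agency,date
0,0,Aracaju (SE),Florianopolis (SC),firstClass,1400.00,1.80,700.00,Rainbow,2019-09-26
0,0,Florianopolis (SC),Aracaju (SE),firstClass,1400.00,1.80,700.00,Rainbow,2019-09-29
1,1,Florianopolis (SC),Aracaju (SE),premium,900.00,1.20,500.00,CloudFy,2019-10-03
"""

flights = load_flights(io.StringIO(SAMPLE))
print(shares(f.flight_type for f in flights))
```

To read the actual file, pass `open(path, newline="")` instead of the `StringIO` sample (the listing does not give a file name, so `path` is a placeholder); `shares(f.agency for f in flights)` gives the agency split the same way.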
Distribution
This synthetic dataset is distributed as a CSV file and simulates a real corporate travel system. It contains about 272,000 records from more than one thousand users, and every column has 272,000 valid values, with no mismatched or missing entries. The dataset is not expected to be updated frequently.
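The row count and the "no missing values" claim can be checked on a local copy with a single profiling pass. The sketch below uses only the Python standard library and assumes the CSV header uses exactly the column names listed in this description; the `profile` function and the sample rows are hypothetical, and one sample row has a deliberately blank price to show what a gap looks like. On the full file one would expect `empty_cells` to be empty and `rows` near 272,000, and, since travelCode tops out at 135,943, `max_rows_per_travel_code` must be at least 2.

```python
import csv
import io
from collections import Counter
from typing import Dict, TextIO

# Column names as listed in this description (assumed to match the CSV header).
EXPECTED_COLUMNS = ["travelCode", "userCode", "from", "to", "flightType",
                    "price", "time", "distance", "agency", "date"]


def profile(stream: TextIO) -> Dict[str, object]:
    """Count rows, empty cells per column, and distinct travel/user codes."""
    reader = csv.DictReader(stream)
    header = reader.fieldnames or []
    absent = [c for c in EXPECTED_COLUMNS if c not in header]
    if absent:
        raise ValueError(f"missing columns: {absent}")

    rows = 0
    empty = Counter()            # column -> number of blank or absent cells
    per_travel_code = Counter()  # travelCode -> number of rows using it
    users = set()
    for row in reader:
        rows += 1
        for col in EXPECTED_COLUMNS:
            value = row[col]     # None when a short row lacks this field
            if value is None or not value.strip():
                empty[col] += 1
        per_travel_code[row["travelCode"]] += 1
        users.add(row["userCode"])

    return {
        "rows": rows,
        "empty_cells": dict(empty),
        "distinct_travel_codes": len(per_travel_code),
        "distinct_users": len(users),
        "max_rows_per_travel_code": max(per_travel_code.values(), default=0),
    }


# Hypothetical rows for illustration only; the second has a blank price.
SAMPLE = """\
travelCode,userCode,from,to,flightType,price,time,distance,agency,date
0,0,Aracaju (SE),Florianopolis (SC),firstClass,1400.00,1.80,700.00,Rainbow,2019-09-26
0,0,Florianopolis (SC),Aracaju (SE),firstClass,,1.80,700.00,Rainbow,2019-09-29
1,1,Florianopolis (SC),Aracaju (SE),premium,900.00,1.20,500.00,CloudFy,2019-10-03
"""

print(profile(io.StringIO(SAMPLE)))
```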
Usage
This dataset is ideal for:
- Developing machine learning models to predict travel costs or optimal routes.
- Analysing user behaviour patterns in corporate travel.
- Identifying trends in flight types, destinations, and travel agencies.
- Creating innovative solutions to enhance customer travel experiences and streamline expense management.
- Benchmarking data science techniques against a real-world business challenge.
Coverage
The dataset covers corporate travel within Latin America; Florianopolis (SC) and Aracaju (SE), both in Brazil, are the most frequent origins and destinations. Travel dates span 26 September 2019 to 24 July 2023, and the records cover more than one thousand corporate travellers (user codes 0 to 1,339).
License
Attribution 4.0 International (CC BY 4.0)
Who Can Use It
- Data scientists and machine learning engineers looking for a realistic dataset for model development and insight generation.
- Business analysts seeking to understand corporate travel patterns and identify areas for service improvement.
- Students and researchers in data science, AI, and business analytics for educational purposes and case studies.
- Technology companies like Argo Solutions, aiming to refine and innovate their travel and expense management solutions.
Dataset Name Suggestions
- Corporate Travel Insights 2019-2023
- Argo Solutions Travel Data
- Global Corporate Journeys
- Enterprise Flight & Hotel Log
- Datathon 2019 Travel Data
Attributes
Original Data Source: Argo Solutions Travel Data