India Travel Hospitality Dataset
Data Science and Analytics
Free
About
This dataset describes approximately 4,000 Indian hotels listed on Goibibo, a prominent Indian travel website. It covers hotel attributes, facilities, user review counts and ratings, and geographical coordinates across many cities and regions in India, which makes it a useful resource for market analysis and trend identification within the Indian hotel industry. This pre-crawled collection is a subset of a much larger dataset.
Columns
- additional_info: Provides extra information, often relating to room services.
- address: The physical address of the hotel property.
- area: The sub-city region where the hotel is geographically situated.
- city: The city in which the hotel is located.
- country: The country where the hotel is located; consistently India in this dataset.
- crawl_date: The specific date when the data was extracted.
- guest_recommendation: Indicates the percentage of guests who have recommended the hotel.
- hotel_brand: The brand or chain that owns the hotel, if applicable.
- hotel_category: Categorisation of the hotel (e.g., 'regular', 'gostays').
- hotel_description: A detailed description of the hotel as provided by the lister.
- hotel_facilities: A list of facilities available at the hotel.
- hotel_star_rating: The star rating of the hotel on an out-of-five scale.
- image_count: The total number of images associated with the hotel listing.
- latitude: The geographical latitude coordinate of the hotel.
- locality: The local area or neighbourhood of the hotel, similar to 'area'.
- longitude: The geographical longitude coordinate of the hotel.
- pageurl: The direct URL of the hotel's listing page on Goibibo.
- point_of_interest: Nearby locations or attractions that guests might find interesting.
- property_id: A unique identification number for the hotel property.
- property_name: The registered name of the hotel property.
- property_type: The classification of the property (e.g., 'Hotel', 'Resort').
- province: The province or larger administrative division where the hotel is located.
- qts: A timestamp indicating when the data was crawled.
- query_time_stamp: A duplicate of the 'qts' field.
- review_count_by_category: A breakdown of reviews across various categories.
- room_area: The estimated area of the rooms in square feet.
- room_count: The total number of rooms available at the hotel.
- room_facilities: A list of facilities specific to the rooms.
- room_type: The different types of rooms offered (e.g., 'Deluxe Room', 'Standard Room').
- similar_hotel: Links to other hotels that are considered similar.
- site_review_count: The total number of user reviews posted on the site for the hotel.
- site_review_rating: The overall rating given by users on the website.
- site_stay_review_rating: Detailed ratings across categories such as Service Quality, Amenities, and Cleanliness.
- sitename: The name of the website from which the data was sourced (Goibibo.com).
- state: The state in India where the hotel is located.
- uniq_id: A unique identifier assigned by the website for each listing.
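As a quick sanity check, the column list above can be compared against the header row of a downloaded copy. The sketch below uses only Python's standard `csv` module. The function name `summarize_columns` is illustrative, the demo rows are made up, and the UTF-8 encoding in the commented usage is an assumption rather than a confirmed property of the file.

```python
import csv
import io

# The 36 column names documented in the list above.
EXPECTED_COLUMNS = """
additional_info address area city country crawl_date guest_recommendation
hotel_brand hotel_category hotel_description hotel_facilities
hotel_star_rating image_count latitude locality longitude pageurl
point_of_interest property_id property_name property_type province qts
query_time_stamp review_count_by_category room_area room_count
room_facilities room_type similar_hotel site_review_count
site_review_rating site_stay_review_rating sitename state uniq_id
""".split()


def summarize_columns(csv_file):
    """Compare a CSV header with EXPECTED_COLUMNS and count non-empty cells.

    Returns a tuple (missing, unexpected, non_empty_counts).
    """
    reader = csv.DictReader(csv_file)
    found = list(reader.fieldnames or [])
    missing = [c for c in EXPECTED_COLUMNS if c not in found]
    unexpected = [c for c in found if c not in EXPECTED_COLUMNS]
    counts = {name: 0 for name in found}
    for row in reader:
        for name in found:
            value = row.get(name)  # None when a row has too few fields
            if value is not None and value.strip():
                counts[name] += 1
    return missing, unexpected, counts


# Tiny in-memory stand-in for the real file (made-up rows, not real data).
demo = io.StringIO("property_id,city,latitude\n1,Goa,15.5\n2,,12.9\n")
missing, unexpected, counts = summarize_columns(demo)
print(len(missing), unexpected, counts)

# With a downloaded copy (encoding is an assumption):
# with open("goibibo_com-travel_sample.csv", newline="", encoding="utf-8") as f:
#     print(summarize_columns(f))
```

The per-column counts of non-empty cells give a rough view of how complete each field is before any analysis starts.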
Distribution
This dataset is typically provided as a CSV file named goibibo_com-travel_sample.csv, with a file size of 9.79 MB. It comprises 4,000 individual hotel listings structured across 36 distinct columns. Per-field record counts are available, and most key attributes are complete for all 4,000 entries.
Usage
This dataset is ideal for:
- Market Analysis: Identifying trends in hotel ratings, amenities, and guest preferences across Indian regions.
- Natural Language Processing (NLP): Analysing hotel descriptions to extract common words, phrases, and their relationship to amenities offered.
- Geospatial Analysis: Exploring hotel distribution, concentration, and points of interest in different cities and states within India.
- Recommendation Systems: Building models that suggest similar hotels based on various attributes.
- Competitive Intelligence: Gaining insights into the offerings and guest feedback of hotels listed on a major travel platform.
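To make the geospatial use case concrete, here is a minimal sketch that ranks hotels by great-circle distance from a chosen property using the `latitude`, `longitude`, `property_id`, and `property_name` columns. It assumes rows arrive as dictionaries (for example from `csv.DictReader`) and that coordinates are decimal degrees stored as text. The helper names and the 6371 km mean Earth radius are choices made for this illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, adequate for a rough ranking


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def parse_coords(row):
    """Return (lat, lon) as floats, or None if either value is unusable."""
    try:
        return float(row["latitude"]), float(row["longitude"])
    except (KeyError, TypeError, ValueError):
        return None


def nearest_hotels(rows, target_id, k=3):
    """Return up to k (distance_km, property_name) pairs closest to target_id."""
    target = next((r for r in rows if r.get("property_id") == target_id), None)
    origin = parse_coords(target) if target is not None else None
    if origin is None:
        raise ValueError(f"no usable coordinates for property_id {target_id!r}")
    ranked = []
    for row in rows:
        coords = parse_coords(row)
        if row is target or coords is None:
            continue  # skip the target itself and rows with missing coordinates
        ranked.append((haversine_km(*origin, *coords), row.get("property_name")))
    ranked.sort(key=lambda pair: pair[0])
    return ranked[:k]
```

A call such as `nearest_hotels(rows, "12345", k=5)` (with a hypothetical `property_id`) returns the five closest listings, which could serve as a simple baseline for a location-based recommender.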
Coverage
The dataset focuses exclusively on hotels located within India. Geographically, it covers various states such as Maharashtra and Karnataka, and cities including Goa and Bangalore. The data was primarily crawled between 26 June 2016 and 28 August 2016. While no specific demographic scope is outlined, the data reflects the general Indian hotel market. Updates to this dataset are expected to occur annually.
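Geographic coverage can be checked by tallying listings per `state` or `city`. The following is a small sketch under the assumption that rows are dictionaries keyed by column name and that blank values should be ignored. The function name `listings_per` is illustrative, and no normalisation beyond trimming whitespace is attempted.

```python
from collections import Counter


def listings_per(rows, column):
    """Count listings for each distinct non-empty value of the given column."""
    counts = Counter()
    for row in rows:
        value = (row.get(column) or "").strip()
        if value:
            counts[value] += 1
    return counts
```

Calling `listings_per(rows, "state").most_common(10)` would list the ten states with the most listings, a quick way to see where the sample is concentrated.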
License
CC BY-SA 4.0
Who Can Use It
- Data Analysts: For uncovering market trends and patterns in the Indian hospitality sector.
- Researchers: To conduct academic studies on travel patterns, customer satisfaction, or regional economic impacts.
- Travel and Hospitality Businesses: For competitive benchmarking, strategic planning, and understanding customer needs.
- Machine Learning Engineers: For training models on sentiment analysis of hotel reviews or developing recommendation engines.
- Urban Planners/Geographers: To visualise hotel distribution and identify areas with high tourist interest.
Dataset Name Suggestions
- Goibibo India Hotel Listings
- Indian Accommodation Data by Goibibo
- Goibibo Hotel Market Insights - India
- India Travel Hospitality Dataset
Attributes
Original Data Source: India Travel Hospitality Dataset