Factori Web Data for AI & ML | 247-Country Global Coverage
Data Science and Analytics
£263,000
About
Factori's AI & ML training data is tested and reviewed for use in production model development. The dataset delivers web activity signals from users browsing popular websites worldwide, which teams can use to train models for natural language processing, sentiment analysis, audience modeling, and related AI applications. Validation and quality assurance checks maintain data integrity for ML pipeline consumption.
Attribute Domains
Identity & Device: Anonymous_id, IDType, Estid, Ttd_id, Adnxs_id, Ip, userAgent, browserFamily, deviceType, Os
Temporal & Behavioral: Timestamp, mappedEvent, Channel
URL & Navigation: Url_metadata_canonical_url, Url_metadata_raw_query_params, refDomain
Search & Intent: searchQuery, Keywords
Semantic Classification: Categories, Entities, Concepts
Data Schema
IDType
Timestamp
Estid
Ip
userAgent
browserFamily
deviceType
Os
Url_metadata_canonical_url
Url_metadata_raw_query_params
refDomain
mappedEvent
Channel
searchQuery
Ttd_id
Adnxs_id
Keywords
Categories
Entities
Concepts
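A quick header check can catch schema drift before an export enters a pipeline. The sketch below is illustrative only: it assumes the export is a CSV file whose header row uses exactly the field names listed above, which should be confirmed against the dataset documentation. The helper name missing_columns and the sample row are hypothetical.

```python
import csv
import io

# Column names copied from the Data Schema list above.
SCHEMA_COLUMNS = [
    "IDType", "Timestamp", "Estid", "Ip", "userAgent", "browserFamily",
    "deviceType", "Os", "Url_metadata_canonical_url",
    "Url_metadata_raw_query_params", "refDomain", "mappedEvent", "Channel",
    "searchQuery", "Ttd_id", "Adnxs_id", "Keywords", "Categories",
    "Entities", "Concepts",
]


def missing_columns(csv_file):
    """Return the schema columns that are absent from the CSV header row."""
    reader = csv.DictReader(csv_file)
    header = set(reader.fieldnames or [])  # fieldnames is None for an empty file
    return [name for name in SCHEMA_COLUMNS if name not in header]


# Made-up two-column export, used only to exercise the check.
sample = io.StringIO("IDType,Timestamp\ncookie,2026-01-19T00:00:00Z\n")
print(missing_columns(sample))  # lists every schema column except the first two
```

An empty result means every listed field is present; extra columns in the file are ignored.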
AI Use Cases
Audience Modeling & Feature Engineering: Use behavioral signals such as searchQuery, Categories, and mappedEvent to construct high-dimensional user feature vectors for downstream classification and segmentation models.
LLM Grounding / RAG Enrichment: Use the Entities, Concepts, and Keywords fields to build domain-aware retrieval corpora that improve factual grounding in retrieval-augmented generation pipelines.
Fraud Detection & Cybersecurity: Cross-reference Ip, userAgent, Anonymous_id, and the identity graph fields (Ttd_id, Adnxs_id) against behavioral patterns to train anomaly detection models that identify fraudulent activity across digital channels.
Intent Classification & NLP Training: The raw searchQuery and Keywords fields provide training signal for intent classifiers, query understanding models, and semantic search systems, both for initial training and for fine-tuning.
Identity Graph Construction: Multi-identifier records spanning Anonymous_id, Estid, Ttd_id, and Adnxs_id support entity resolution and cross-device identity linkage into a unified user graph.
Recommendation System Development: Browsing sequences derived from Url_metadata_canonical_url, refDomain, and Categories provide navigation graph data suitable for training collaborative filtering and content-based recommendation models.
Market Intelligence & Competitive Analysis: Aggregate web activity patterns across 247 countries enable analysis of category-level interest trends, competitive landscape mapping, and proximity-of-interest modeling.
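As one illustration of the audience modeling use case, the sketch below counts mappedEvent values and category labels per Anonymous_id to form sparse per-user feature vectors. It is a minimal sketch, not Factori's method: the "|" separator for Categories, the function name build_user_features, and the sample rows are all assumptions, since the listing does not document how multi-valued fields are encoded.

```python
from collections import Counter, defaultdict

# Assumed, not documented in the listing: Categories is a "|"-separated string.
CATEGORY_SEPARATOR = "|"


def build_user_features(rows):
    """Count mappedEvent values and category labels per Anonymous_id."""
    features = defaultdict(Counter)
    for row in rows:
        user = row["Anonymous_id"]
        if row.get("mappedEvent"):
            features[user]["event=" + row["mappedEvent"]] += 1
        for category in (row.get("Categories") or "").split(CATEGORY_SEPARATOR):
            category = category.strip()
            if category:  # skip empty fragments from blank or trailing values
                features[user]["category=" + category] += 1
    return {user: dict(counts) for user, counts in features.items()}


# Made-up rows for illustration; real field values will differ.
rows = [
    {"Anonymous_id": "u1", "mappedEvent": "page_view", "Categories": "Sports|News"},
    {"Anonymous_id": "u1", "mappedEvent": "search", "Categories": "Sports"},
    {"Anonymous_id": "u2", "mappedEvent": "page_view", "Categories": ""},
]
features = build_user_features(rows)
print(features["u1"])
```

The resulting dictionaries can be turned into a numeric matrix with a dictionary vectorizer (for example scikit-learn's DictVectorizer) before training a classification or segmentation model.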
Delivery & Integration
Coverage: 247 countries; attributes include Country, Anonymous ID, IP addresses, Search Query, and associated metadata
Data collection: Dynamic; each export reflects the most current available data and insights
Refresh cadence: Available at daily, weekly, or monthly intervals depending on use case requirements
Export method: Delivered via the method best suited to the use case, determined at time of engagement
Online-to-offline enrichment: Consumer profiles support holistic audience segment construction for enriched ML feature sets
Talk to an expert: https://www.factori.ai/talk-to-expert/?utm_source=direct&utm_medium=referral&utm_campaign=opendatabay
Dataset documentation: https://docs.factori.ai/docs/web?utm_source=direct&utm_medium=referral&utm_campaign=opendatabay
Listing Stats
Views: 111
Delivery: Instant download
Listed: 19/01/2026
Updated: 27/03/2026
Region: Global
Quality: 5 / 5
Download Dataset in CSV Format
