Factori Web Data for AI & ML | 247-Country Global Coverage

Data Science and Analytics

Tags and Keywords

ML

Training

Global

Factori Web Data for AI & ML | 247-Country Global Coverage Dataset on Opendatabay data marketplace

£263,000

About

Factori's AI & ML training data is thoroughly tested and reviewed to ensure quality suitable for production model development. This dataset delivers web activity signals from users browsing popular websites worldwide, enabling teams to train models for natural language processing, sentiment analysis, audience modeling, and related AI applications. Data integrity is maintained through rigorous validation processes and quality assurance checks that ensure the data is reliable as input to ML pipelines.

Attribute Domains

Identity & Device: Anonymous_id, IDType, Estid, Ttd_id, Adnxs_id, Ip, userAgent, browserFamily, deviceType, Os
Temporal & Behavioral: Timestamp, mappedEvent, Channel
URL & Navigation: Url_metadata_canonical_url, Url_metadata_raw_query_params, refDomain
Search & Intent: searchQuery, Keywords
Semantic Classification: Categories, Entities, Concepts
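When engineering features, it can help to keep these attribute domains as a lookup table so that each group of columns can be handled separately (for example, treating identifiers differently from free-text fields). The Python sketch below copies the column names from the domain list; the `ATTRIBUTE_DOMAINS` constant and `split_by_domain` helper are hypothetical names for illustration, and the sketch assumes each record arrives as a flat dictionary keyed by column name.

```python
# Attribute domains with column names copied from the listing.
ATTRIBUTE_DOMAINS: dict[str, list[str]] = {
    "Identity & Device": [
        "Anonymous_id", "IDType", "Estid", "Ttd_id", "Adnxs_id",
        "Ip", "userAgent", "browserFamily", "deviceType", "Os",
    ],
    "Temporal & Behavioral": ["Timestamp", "mappedEvent", "Channel"],
    "URL & Navigation": [
        "Url_metadata_canonical_url", "Url_metadata_raw_query_params", "refDomain",
    ],
    "Search & Intent": ["searchQuery", "Keywords"],
    "Semantic Classification": ["Categories", "Entities", "Concepts"],
}


def split_by_domain(record: dict[str, str]) -> dict[str, dict[str, str]]:
    """Group a flat record's fields by attribute domain.

    Hypothetical helper; assumes records are flat dicts keyed by column name.
    Columns absent from the record are skipped, so a domain may map to {}.
    """
    return {
        domain: {col: record[col] for col in cols if col in record}
        for domain, cols in ATTRIBUTE_DOMAINS.items()
    }
```

For a record containing only `Ip`, `searchQuery`, and `Categories`, each value lands in its own domain group and the remaining groups come back empty.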

Data Schema

IDType
Timestamp
Estid
Ip
userAgent
browserFamily
deviceType
Os
Url_metadata_canonical_url
Url_metadata_raw_query_params
refDomain
mappedEvent
Channel
searchQuery
Ttd_id
Adnxs_id
Keywords
Categories
Entities
Concepts
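Before an export enters a training pipeline, checking its header against this schema catches renamed or dropped columns early. This is a minimal Python sketch, assuming the export is a CSV file whose header row uses exactly the column names listed above; `SCHEMA_COLUMNS` and `read_export` are hypothetical names, and list-like fields such as Keywords or Categories are left as raw strings because their delimiter is not documented here.

```python
import csv
import io

# Column names copied from the Data Schema list.
SCHEMA_COLUMNS = [
    "IDType", "Timestamp", "Estid", "Ip", "userAgent", "browserFamily",
    "deviceType", "Os", "Url_metadata_canonical_url",
    "Url_metadata_raw_query_params", "refDomain", "mappedEvent", "Channel",
    "searchQuery", "Ttd_id", "Adnxs_id", "Keywords", "Categories",
    "Entities", "Concepts",
]


def read_export(stream: io.TextIOBase) -> list[dict[str, str]]:
    """Read a CSV export and verify its header covers every schema column.

    Assumption: the header uses exactly the names in SCHEMA_COLUMNS.
    Extra columns are kept; missing ones raise ValueError.
    """
    reader = csv.DictReader(stream)
    header = reader.fieldnames or []  # fieldnames is None for an empty file
    missing = [col for col in SCHEMA_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"export is missing schema columns: {missing}")
    # Values stay as raw strings; list-like fields (Keywords, Categories, ...)
    # are not split because their delimiter is an unknown.
    return list(reader)
```

For a file on disk, open it with `newline=""` (as the `csv` module documentation recommends) and pass the file handle to `read_export`.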

AI Use Cases

Audience Modeling & Feature Engineering: Use behavioral signals such as searchQuery, Categories, and mappedEvent to construct high-dimensional user feature vectors for downstream classification and segmentation models.
LLM Grounding / RAG Enrichment: Leverage the Entities, Concepts, and Keywords fields to build domain-aware retrieval corpora that improve factual grounding in retrieval-augmented generation pipelines.
Fraud Detection & Cybersecurity: Cross-reference Ip, userAgent, Anonymous_id, and identity graph fields (Ttd_id, Adnxs_id) against behavioral patterns to train anomaly detection models that identify fraudulent activity across digital channels.
Intent Classification & NLP Training: Raw searchQuery text and Keywords fields provide training signal for training and fine-tuning intent classifiers, query understanding models, and semantic search systems.
Identity Graph Construction: Multi-identifier records spanning Anonymous_id, Estid, Ttd_id, and Adnxs_id support entity resolution and cross-device identity linkage for building a unified user graph.
Recommendation System Development: Browsing sequences derived from Url_metadata_canonical_url, refDomain, and Categories provide navigation graph data suitable for training collaborative filtering and content-based recommendation models.
Market Intelligence & Competitive Analysis: Aggregate web activity patterns across 247 countries enable analysis of category-level interest trends, competitive landscape mapping, and proximity-of-interest modeling.
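The Identity Graph Construction use case can be illustrated with a union-find (disjoint-set) structure: identifiers that appear together in a row are merged into one cluster, and clusters grow transitively across rows. This Python sketch is a naive deterministic linkage under an assumed rule that all non-empty identifiers in one row belong to the same user; it is not Factori's method, and `UnionFind` and `link_identities` are hypothetical helpers.

```python
# Identifier columns named in the Identity Graph Construction use case.
ID_COLUMNS = ("Anonymous_id", "Estid", "Ttd_id", "Adnxs_id")


class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}
        self.size: dict[str, int] = {}

    def find(self, x: str) -> str:
        # Register unseen nodes as their own singleton set.
        self.parent.setdefault(x, x)
        self.size.setdefault(x, 1)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the smaller tree under the larger
        self.size[ra] += self.size[rb]


def link_identities(rows: list[dict[str, str]]) -> list[set[str]]:
    """Cluster identifiers that co-occur in at least one row.

    Assumption: every non-empty ID in a row refers to the same user.
    """
    uf = UnionFind()
    for row in rows:
        # Prefix with the column name so equal strings from different ID spaces don't collide.
        ids = [f"{col}:{row[col]}" for col in ID_COLUMNS if row.get(col)]
        if not ids:
            continue
        uf.find(ids[0])  # ensures single-ID rows still form a cluster
        for other in ids[1:]:
            uf.union(ids[0], other)
    clusters: dict[str, set[str]] = {}
    for node in list(uf.parent):
        clusters.setdefault(uf.find(node), set()).add(node)
    return list(clusters.values())
```

Prefixing each value with its column name keeps identical strings from different ID spaces apart. Because one shared identifier is enough to merge two clusters, a production pipeline would likely need additional rules or confidence thresholds before trusting such merges.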

Delivery & Integration

Coverage: 247 countries; attributes include Country, Anonymous ID, IP addresses, Search Query, and associated metadata
Data collection: Dynamic; each export reflects the most current available data and insights
Refresh cadence: Available at daily, weekly, or monthly intervals depending on use case requirements
Export method: Delivered via the best-suited method, determined at the time of engagement
Online-to-offline enrichment: Consumer profiles support holistic audience segment construction for enriched ML feature sets

Talk to an expert: https://www.factori.ai/talk-to-expert/?utm_source=direct&utm_medium=referral&utm_campaign=opendatabay
Dataset documentation: https://docs.factori.ai/docs/web?utm_source=direct&utm_medium=referral&utm_campaign=opendatabay

Listing Stats

Views: 111
Delivery: Instant download
Listed: 19/01/2026
Updated: 27/03/2026
Region: Global
Universal Data Quality Score (UDQS): 5 / 5


Download Dataset in CSV Format