MarketSignal Solutions
Licensed LLM Data Provider
Details
Location: 14 Rue des Pins
Joined: 13/03/2026
Response time: Instant
About
Who We Are
MarketSignal Solutions is an alternative data provider based in Quebec, Canada, founded by Keven Dubé. We build AI-powered financial feature datasets designed for quantitative researchers, algorithmic traders, and systematic portfolio managers.
Our mission is straightforward: deliver production-grade, ML-ready feature matrices that combine real news sentiment with robust technical and macro indicators — so quants can focus on modeling, not data engineering.
What We Produce
We publish daily feature matrices with 90–94 engineered columns per asset, organized into 11 feature groups:
- AI News Sentiment — Article-level sentiment scored from real published articles (not synthetic or LLM-generated text), updated daily
- Technical Indicators — RSI, MACD, Bollinger Bands, ADX, momentum quality, and more
- Cross-Asset Correlations — S&P 100 breadth, 6-sector return decomposition, relative performance, rolling beta
- Volatility Regime Detection — Vol-of-vol, percentile-based regime classification (low/normal/high/extreme)
- Macro Indicators — VIX level and dynamics, yield curve spread, credit spread proxy, USD index
- Options-Derived — Implied volatility approximation, IV-RV spread
- Earnings Calendar — Days-to-next-earnings, earnings day flag
- Social Attention — Wikipedia pageviews, Google Trends interest
- News Volume Analytics — Article counts, z-scores, spike detection
- Sentiment-Price Interactions — Rolling correlations, divergence flags, sentiment x volatility regime
- Forward Labels — 1/3/5-day returns and direction labels for supervised learning
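As an illustration of the forward-label group, here is a minimal sketch of how 1-day and 3-day labels could be derived from a series of daily closes. The function name `forward_labels`, the use of simple (not log) returns, and the 1/0 direction encoding are assumptions made for this example; the listing does not specify them.

```python
def forward_labels(closes, horizon):
    """Return (forward_returns, directions) for a list of daily closes.

    Row t is labeled with the simple return from t to t + horizon.
    The last `horizon` rows have no future price, so they get None.
    """
    returns, directions = [], []
    for i, price in enumerate(closes):
        j = i + horizon
        if j < len(closes):
            r = closes[j] / price - 1.0
            returns.append(r)
            directions.append(1 if r > 0 else 0)  # assumed encoding: 1 = up, 0 = flat or down
        else:
            returns.append(None)
            directions.append(None)
    return returns, directions


closes = [100.0, 102.0, 101.0, 105.0, 104.0, 110.0]
ret_1d, dir_1d = forward_labels(closes, 1)
ret_3d, dir_3d = forward_labels(closes, 3)
```

Because these columns look into the future by construction, they belong on the target side of a model and should never be fed back in as inputs.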
Coverage
Assets:
- Magnificent 7 Tech Stocks: NVDA, AAPL, TSLA, AMZN, META, MSFT, GOOG
- Gold (XAU/USD)
Time Span:
- Historical: 6 years (January 2020 – December 2025), ~1,500 trading days per ticker
- Live Feeds: 2026 data, updated weekly with new trading days
Our Data Pipeline
Herodote AI is our proprietary NLP pipeline that processes 100+ financial articles daily per asset:
- Article Collection — Sources include GDELT Project API and public RSS feeds. All sources are compliant — no scraping, no paywalled content.
- AI Sentiment Analysis — Each article is scored for market sentiment by Google Gemini. The scores come from real published articles, not from synthetic text generated by an LLM.
- Price & Market Data — Close prices, S&P 100 cross-section, and macro indicators via Yahoo Finance (public API).
- Feature Engineering — 94-column feature matrix computed with NumPy. Every feature is numerical; no copyrighted text appears in any deliverable.
Data integrity is non-negotiable. Every dataset passes a 12-criteria automated audit before publication: column count validation, date continuity, sentiment coverage, label integrity, and more.
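For readers who want to re-run similar checks on their own copy, the sketch below covers three of the named criteria: column count, date continuity, and sentiment coverage. It is not the provider's audit. The `audit` function, the `date` and `sentiment` field names, the 5-day gap tolerance, and the 90% coverage threshold are all assumptions chosen for illustration.

```python
from datetime import date

EXPECTED_COLUMNS = 94          # column count stated in the listing
MAX_GAP_DAYS = 5               # assumed tolerance for weekends plus holidays
MIN_SENTIMENT_COVERAGE = 0.9   # assumed threshold, not taken from the listing


def audit(header, rows):
    """Return a list of failure messages; an empty list means every check passed."""
    failures = []

    if len(header) != EXPECTED_COLUMNS:
        failures.append(f"column count: expected {EXPECTED_COLUMNS}, got {len(header)}")

    dates = [row["date"] for row in rows]
    if dates != sorted(set(dates)):
        failures.append("date order: dates must be unique and increasing")
    for prev, cur in zip(dates, dates[1:]):
        gap = (cur - prev).days
        if gap > MAX_GAP_DAYS:
            failures.append(f"date continuity: {gap}-day gap before {cur.isoformat()}")

    covered = sum(1 for row in rows if row["sentiment"] is not None)
    if rows and covered / len(rows) < MIN_SENTIMENT_COVERAGE:
        failures.append(f"sentiment coverage: {covered}/{len(rows)} rows")

    return failures


header = [f"col_{i}" for i in range(94)]   # placeholder column names
rows = [
    {"date": date(2025, 1, 2), "sentiment": 0.1},
    {"date": date(2025, 1, 3), "sentiment": -0.2},
    {"date": date(2025, 1, 6), "sentiment": 0.0},
]
clean_report = audit(header, rows)
broken_report = audit(
    header[:-1],
    rows + [{"date": date(2025, 1, 20), "sentiment": None}],
)
```

Returning messages instead of raising on the first problem lets one run report every failed criterion at once.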
Why Choose MarketSignal Solutions
Real Sentiment, Not Synthetic. Every sentiment score traces back to actual financial articles analyzed by AI. We process the full text — not just headlines. This is fundamentally different from LLM-generated synthetic sentiment datasets flooding the market.
Production-Grade Feature Engineering. Cross-asset correlations across 6 market sectors. Volatility regime detection with percentile-based classification. Sentiment-price divergence flags. Momentum quality scores. These are the features institutional quants actually use.
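To make the percentile-based regime idea concrete, the following sketch ranks each day's volatility against a trailing window of strictly earlier values and maps the rank to one of the four regimes. The cut points (25th, 75th, 95th percentile), the lookback length, and the function names are assumptions for this example, not the dataset's documented parameters.

```python
def percentile_rank(history, value):
    """Share of past observations that are <= value, in [0, 1]."""
    return sum(1 for v in history if v <= value) / len(history)


def classify_regime(vols, lookback=252, min_history=20, cuts=(0.25, 0.75, 0.95)):
    """Label each volatility reading as low / normal / high / extreme.

    Only values before day i are used as the reference window, so the
    label for day i never depends on later data.
    """
    labels = []
    for i, vol in enumerate(vols):
        history = vols[max(0, i - lookback):i]
        if len(history) < min_history:
            labels.append(None)  # not enough history to rank against
            continue
        rank = percentile_rank(history, vol)
        if rank < cuts[0]:
            labels.append("low")
        elif rank < cuts[1]:
            labels.append("normal")
        elif rank < cuts[2]:
            labels.append("high")
        else:
            labels.append("extreme")
    return labels


# Tiny window sizes so the example is easy to follow by hand.
vols = [0.10, 0.12, 0.11, 0.13, 0.09, 0.30, 0.12]
regimes = classify_regime(vols, lookback=5, min_history=4)
```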
ML-Ready Format. Clean CSV files. No NaN surprises — missing values are handled consistently. Forward-looking labels included (1d, 3d, 5d returns + direction). Drop-in ready for any ML pipeline: scikit-learn, PyTorch, XGBoost, LightGBM.
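A minimal loading sketch, using only the standard library, shows one way to separate inputs from targets before handing them to a model. The column names (`rsi_14`, `news_sentiment`, `fwd_ret_1d`, `fwd_dir_1d`), the `fwd_` prefix convention, and the empty-string encoding of unavailable labels are hypothetical; check the actual schema shipped with the files.

```python
import csv
import io

# Inline stand-in for a delivered CSV file; real files have 94 columns.
SAMPLE_CSV = """date,rsi_14,news_sentiment,fwd_ret_1d,fwd_dir_1d
2025-01-02,55.1,0.20,0.012,1
2025-01-03,58.4,-0.10,-0.004,0
2025-01-06,57.0,0.05,,
"""

LABEL_PREFIX = "fwd_"  # assumed naming convention for forward labels


def split_features_labels(csv_text, target):
    """Return (feature_names, X, y), skipping rows whose target is missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    feature_cols = [
        c for c in reader.fieldnames
        if c != "date" and not c.startswith(LABEL_PREFIX)
    ]
    X, y = [], []
    for row in reader:
        if row[target] == "":
            continue  # most recent rows have no realized forward label yet
        X.append([float(row[c]) for c in feature_cols])
        y.append(int(row[target]))
    return feature_cols, X, y


feature_cols, X, y = split_features_labels(SAMPLE_CSV, "fwd_dir_1d")
```

Excluding every forward-label column from the inputs, not just the chosen target, avoids leaking future returns into training.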
Point-in-Time Correct. We take look-ahead bias seriously. Wikipedia pageviews use a 1-day lag. Google Trends data uses strict publication-date filtering. Earnings calendar uses pattern-based proxies with documented limitations. Every feature is computed with only information that was available at market open.
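The 1-day lag mentioned for Wikipedia pageviews can be sketched as a simple shift: the value attached to day t is the one observed on day t - 1. The `lag` helper and the sample numbers below are illustrative only.

```python
def lag(values, periods=1):
    """Shift a series so that row t holds the value observed at t - periods.

    The first `periods` rows have no earlier observation and become None.
    """
    if periods < 0:
        raise ValueError("periods must be non-negative")
    n = len(values)
    return ([None] * periods + list(values))[:n]


pageviews = [1200, 1500, 900, 2000]   # made-up daily counts
pageviews_lagged = lag(pageviews, 1)
```

Pairing `pageviews_lagged` rather than `pageviews` with same-day prices means a model never sees a count that was still accumulating during that trading day.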
Updated Weekly. Live feeds are updated every week with the latest trading days. Historical datasets provide the full backtest foundation.
Products & Pricing
Historical Datasets — £199.99 (one-time)
Full 2020–2025 backtest data. ~1,500 trading days per ticker. 94 columns. Everything you need to develop and validate strategies.
Live Data Feeds — £29.99/month
2026 data updated weekly. Same 94-column schema as historical. Seamless continuation for production systems.
MAG7 Bundle
All 7 Magnificent Seven tickers in one package — for portfolio-level analysis, sector rotation strategies, and cross-asset research.
- MAG7 Historical Bundle — All 7 tickers, 2020–2025
- MAG7 Live Bundle — All 7 tickers, 2026 feed
Data Sources & Compliance
All data inputs are sourced from publicly available, compliant sources:
| Source | Data | Access Method |
|--------|------|---------------|
| GDELT Project | Global news articles | Free public API |
| Public RSS feeds | Financial news | Open RSS endpoints |
| Yahoo Finance | Price data, VIX, yields, options | Public API (yfinance) |
| Wikimedia | Wikipedia pageviews | Free REST API |
| Google Trends | Search interest | Public data |
No copyrighted article text is included in any deliverable. Our datasets contain only numerical features derived from article analysis. The raw articles are processed in our pipeline but never distributed.
Disclaimer
These datasets are provided for quantitative research and educational purposes only. They do not constitute investment advice, trading recommendations, or solicitation to buy or sell any security. Past patterns in the data do not guarantee future results. Users are solely responsible for their own investment decisions and should consult qualified financial professionals before trading.
MarketSignal Solutions, Quebec, Canada
Website: marketsignal.solutions

