The Thesis
Weather prediction markets are one of the purest forms of forecasting competition. Unlike sports or politics, weather has an objective, measurable ground truth. The thermometer doesn't lie. This creates a unique opportunity: if you can forecast better than the market, you can consistently extract value.
But here's the thing most people miss: you don't need to be a meteorologist to beat weather markets. You need to be a better aggregator of meteorological information.
That's our edge.
The Problem with Weather Markets
Kalshi offers temperature markets for four cities: NYC, Chicago, Miami, and Austin. Each day, you can bet on whether the high temperature will be above or below various thresholds.
Most traders approach these markets in one of two ways:
- The Gut Feelers: They check weather.com, see "72°F" and bet accordingly. No edge.
- The Model Simps: They pick one weather model (usually GFS) and follow it blindly. Sometimes edge, sometimes disaster.
Both approaches fail because they ignore the fundamental uncertainty in weather forecasting.
Our Approach: Ensemble of Ensembles
We don't try to model the atmosphere ourselves. We try to estimate what the high temperature will be, given the current state of all available forecasts.
Our system combines three major weather models:
The Models
HRRR (High-Resolution Rapid Refresh)
- Resolution: 3km
- Update frequency: Hourly
- Strength: Short-term accuracy (0-18 hours)
- Weakness: Degrades quickly beyond day 1
GFS (Global Forecast System)
- Resolution: 13km
- Update frequency: Every 6 hours
- Strength: Good 3-7 day forecasts
- Weakness: Can miss mesoscale features
ECMWF (European Centre)
- Resolution: 9km
- Update frequency: Every 12 hours
- Strength: Most accurate overall
- Weakness: Expensive, less frequent updates
The Weighting
We don't treat all models equally. Our ensemble weights are dynamic, based on:
Weight_model = f(forecast_horizon, recent_accuracy, current_synoptic_regime)
For same-day forecasts:
- HRRR: 50%
- ECMWF: 35%
- GFS: 15%
For next-day forecasts:
- ECMWF: 45%
- HRRR: 30%
- GFS: 25%
These weights adjust based on each model's recent performance at each location.
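One way the horizon-based weights could be nudged by recent accuracy is to scale each base weight by the inverse of the model's recent mean absolute error and then renormalize. This is a minimal sketch only: the name `adjust_weights`, the inverse-MAE rule, and the sample error figures are assumptions for illustration, not the exact weighting function f described above.

```python
# Base weights by forecast horizon, taken from the lists above.
BASE_WEIGHTS = {
    "same_day": {"HRRR": 0.50, "ECMWF": 0.35, "GFS": 0.15},
    "next_day": {"ECMWF": 0.45, "HRRR": 0.30, "GFS": 0.25},
}


def adjust_weights(horizon, recent_mae):
    """Scale base weights by inverse recent MAE, then renormalize to sum to 1.

    recent_mae maps model name -> recent mean absolute error in °F
    (hypothetical input; every value must be positive).
    """
    base = BASE_WEIGHTS[horizon]
    raw = {model: w / recent_mae[model] for model, w in base.items()}
    total = sum(raw.values())
    return {model: r / total for model, r in raw.items()}


# Made-up error figures: GFS has been twice as far off as the others lately.
weights = adjust_weights("same_day", {"HRRR": 1.5, "ECMWF": 1.5, "GFS": 3.0})
print(weights)
```

When all three models have the same recent error, the function returns the base weights unchanged. A model that has been missing badly loses share to the others.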
Signal Generation
A "signal" is when our ensemble forecast differs significantly from market prices. Here's the process:
Step 1: Convert Forecast to Probability Distribution
Weather models give point forecasts. Markets need probabilities. We convert using historical error distributions:
```python
import numpy as np
import scipy.stats


def forecast_to_distribution(point_forecast, model, location, horizon):
    """
    Convert point forecast to probability distribution
    based on historical model errors at this location.
    """
    # get_model_errors is a helper not shown here; errors are assumed to be
    # (forecast - observed), so a warm bias is positive
    historical_errors = get_model_errors(model, location, horizon)
    std_dev = np.std(historical_errors)
    bias = np.mean(historical_errors)

    # Adjust for bias
    adjusted_forecast = point_forecast - bias

    # Return normal distribution (simplified)
    return scipy.stats.norm(adjusted_forecast, std_dev)
```
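Step 2 works with a single ensemble distribution, so the per-model distributions have to be merged first. One option, sketched here under stated assumptions, is to treat the ensemble as a weighted mixture of the per-model normals and approximate it with one normal that matches the mixture's mean and variance. The name `combine_distributions`, the moment-matching choice, and the per-model sigmas are illustrative, not necessarily what the production system does. The sketch uses the standard library's `statistics.NormalDist` in place of `scipy.stats.norm` so that it runs without dependencies.

```python
import math
from statistics import NormalDist


def combine_distributions(means, std_devs, weights):
    """Moment-match a weighted mixture of normals with a single normal.

    means, std_devs, weights: dicts keyed by model name; weights sum to 1.
    """
    mix_mean = sum(weights[m] * means[m] for m in weights)
    # Mixture variance = weighted within-model variance + between-model spread
    mix_var = sum(
        weights[m] * (std_devs[m] ** 2 + (means[m] - mix_mean) ** 2)
        for m in weights
    )
    return NormalDist(mix_mean, math.sqrt(mix_var))


# Illustrative inputs only: same-day weights with made-up per-model sigmas.
ensemble = combine_distributions(
    means={"HRRR": 44.0, "ECMWF": 45.0, "GFS": 43.0},
    std_devs={"HRRR": 2.0, "ECMWF": 2.0, "GFS": 2.5},
    weights={"HRRR": 0.50, "ECMWF": 0.35, "GFS": 0.15},
)
print(round(ensemble.mean, 2), round(ensemble.stdev, 2))
```

The between-model term matters: when the models disagree, the ensemble gets wider, which pulls probabilities toward 50% and shrinks the apparent edge.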
Step 2: Calculate Probability for Each Strike
For each market strike (e.g., "Above 65°F"), we calculate our ensemble probability:
```python
def probability_above_strike(strike, ensemble_distribution):
    """
    What's the probability temperature exceeds strike?
    """
    # sf(x) equals 1 - cdf(x) but is more accurate in the tails
    return ensemble_distribution.sf(strike)
```
Step 3: Compare to Market
We pull live Kalshi prices and compare:
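```python
def calculate_edge(our_prob, market_prob):
    """
    Positive edge = we think probability is higher than market.
    """
    return our_prob - market_prob


def expected_value(our_prob, market_price, stake=100):
    """
    Expected value of a bet of `stake` dollars (default $100), before fees.
    market_price is the contract price in dollars (e.g. 0.58 for 58¢).
    """
    win_payout = stake * (1 / market_price) - stake  # profit if the contract settles at $1
    lose_payout = -stake  # the full stake is lost otherwise
    ev = (our_prob * win_payout) + ((1 - our_prob) * lose_payout)
    return ev
```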
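The same edge is worth very different returns depending on the price paid. The sketch below expresses expected profit per dollar staked for either side of a binary contract. `ev_per_dollar` and its `side` argument are hypothetical names. It also assumes the NO side can be bought at one minus the YES price (no spread) and ignores fees.

```python
def ev_per_dollar(our_prob, yes_price, side="YES"):
    """Expected profit per $1 staked on a binary contract, ignoring fees.

    our_prob:  our probability that the YES outcome occurs
    yes_price: market price of YES in dollars (0 < yes_price < 1)
    side:      "YES" buys at yes_price; "NO" buys at 1 - yes_price
    """
    if not 0 < yes_price < 1:
        raise ValueError("yes_price must be strictly between 0 and 1")
    if side == "YES":
        win_prob, cost = our_prob, yes_price
    elif side == "NO":
        win_prob, cost = 1 - our_prob, 1 - yes_price
    else:
        raise ValueError("side must be 'YES' or 'NO'")
    # Each contract pays $1 on a win, so EV per contract is win_prob - cost
    return (win_prob - cost) / cost


# A 5-point edge at three different prices
for price in (0.10, 0.50, 0.90):
    print(price, round(ev_per_dollar(price + 0.05, price), 3))
```

A 5-point edge on a 10¢ contract is a 50% expected return on the stake. The same edge on a 90¢ contract is under 6%. A flat edge threshold treats those two trades as equivalent, which is worth keeping in mind when sizing.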
Step 4: Signal Generation
We generate a signal when:
- Edge threshold: Our probability differs from the market's by at least 5 percentage points
- Confidence threshold: Our model uncertainty is below a set threshold
- Liquidity threshold: Sufficient market depth exists
- Kelly criterion: Position size is meaningful but not reckless
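```python
def generate_signal(our_prob, market_prob, market_liquidity, bankroll):
    # Signal and calculate_confidence are helpers not shown here
    edge = our_prob - market_prob

    if abs(edge) < 0.05:
        return None  # Edge too small

    # Kelly criterion for position sizing on a binary contract.
    # Buying ABOVE (YES) costs market_prob; buying BELOW (NO) costs 1 - market_prob.
    if edge > 0:
        kelly_fraction = edge / (1 - market_prob)
    else:
        kelly_fraction = -edge / market_prob  # keeps the size positive for BELOW

    position_size = min(
        kelly_fraction * bankroll * 0.25,  # Quarter Kelly
        market_liquidity * 0.1  # Don't take >10% of liquidity
    )

    return Signal(
        direction="ABOVE" if edge > 0 else "BELOW",
        edge=edge,
        confidence=calculate_confidence(our_prob),
        position_size=position_size
    )
```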
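For a binary contract, the Kelly fraction depends on which side is bought. The sketch below applies the standard Kelly formula f* = (p·b − q)/b to each side, where b is the net odds. `kelly_fraction` and `position_size` are hypothetical helpers written for illustration. They assume the NO side trades at one minus the YES price, and they ignore fees.

```python
def kelly_fraction(our_prob, yes_price, side):
    """Full-Kelly fraction of bankroll for one side of a binary contract.

    Uses f* = (p * b - q) / b, with p the win probability, q = 1 - p,
    and b the net odds (profit per $1 staked on a win).
    Returns 0.0 when the chosen side has no positive edge.
    """
    if not 0 < yes_price < 1:
        raise ValueError("yes_price must be strictly between 0 and 1")
    if side == "YES":
        p, cost = our_prob, yes_price
    elif side == "NO":
        p, cost = 1 - our_prob, 1 - yes_price
    else:
        raise ValueError("side must be 'YES' or 'NO'")
    b = (1 - cost) / cost
    f = (p * b - (1 - p)) / b
    return max(f, 0.0)


def position_size(our_prob, yes_price, side, bankroll, liquidity):
    """Quarter-Kelly stake, capped at 10% of available liquidity."""
    stake = 0.25 * kelly_fraction(our_prob, yes_price, side) * bankroll
    return min(stake, 0.10 * liquidity)


print(round(kelly_fraction(0.85, 0.58, "YES"), 4))
print(round(kelly_fraction(0.30, 0.50, "NO"), 4))
```

For the YES side the formula simplifies to (p − price) / (1 − price). For the NO side it simplifies to (price − p) / price. Both are non-negative whenever the chosen side actually has an edge.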
Real Example: January 7, 2025
Let's walk through a real signal from this week.
Market: NYC High Temperature, Above 42°F
Market Price: 58¢ (implying 58% probability)
Our Models Said:
- HRRR: 44°F (updated 6am)
- GFS: 43°F (updated 12am)
- ECMWF: 45°F (updated 12am)
Our Ensemble: 44.2°F with σ = 2.1°F
Our Probability: P(T > 42°F) = 85%
Edge: 85% - 58% = +27%
Signal: Strong BUY on Above 42°F
Result: Actual high was 45°F. ✅
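The numbers in this example can be reproduced from the same-day weights and a normal distribution. This short check uses the standard library's `statistics.NormalDist` as a stand-in for `scipy.stats.norm`. The variable names are illustrative, and σ = 2.1°F is taken as given rather than derived.

```python
from statistics import NormalDist

# Same-day weights and the three point forecasts from the example (°F)
weights = {"HRRR": 0.50, "ECMWF": 0.35, "GFS": 0.15}
forecasts = {"HRRR": 44.0, "ECMWF": 45.0, "GFS": 43.0}

ensemble_mean = sum(weights[m] * forecasts[m] for m in weights)
ensemble = NormalDist(mu=ensemble_mean, sigma=2.1)

strike = 42.0
market_price = 0.58
our_prob = 1 - ensemble.cdf(strike)  # P(T > 42°F)
edge = our_prob - market_price

print(f"Ensemble mean: {ensemble_mean:.1f}°F")
print(f"P(T > {strike:.0f}°F): {our_prob:.0%}")
print(f"Edge: {edge:+.0%}")
```

The weighted mean comes out at 44.2°F, the strike sits about one standard deviation below it, and the resulting probability rounds to 85%, which gives the 27-point edge over the 58¢ market price.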
Performance Tracking
We track every signal with full transparency:
| Metric | Value |
|---|---|
| Total Signals (2025) | 1,247 |
| Win Rate | 63.2% |
| Average Edge | 8.4% |
| ROI | +18.7% |
| Sharpe Ratio | 2.1 |
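For reference, headline metrics like these could be computed from a log of settled signals along the following lines. The `SettledSignal` record, its fields, and the three sample rows are hypothetical and exist only to make the sketch runnable. They are not drawn from our results. ROI is assumed here to mean total profit divided by total dollars staked. Sharpe ratio is left out because it depends on how returns are bucketed over time.

```python
from dataclasses import dataclass


@dataclass
class SettledSignal:
    edge: float   # |our probability - market probability| at entry
    stake: float  # dollars risked
    pnl: float    # realized profit or loss in dollars


def summarize(signals):
    """Return signal count, win rate, average edge, and ROI."""
    n = len(signals)
    total_staked = sum(s.stake for s in signals)
    return {
        "total_signals": n,
        "win_rate": sum(s.pnl > 0 for s in signals) / n,
        "average_edge": sum(s.edge for s in signals) / n,
        "roi": sum(s.pnl for s in signals) / total_staked,
    }


# Toy data for illustration only
log = [
    SettledSignal(edge=0.08, stake=100.0, pnl=72.0),
    SettledSignal(edge=0.06, stake=100.0, pnl=-100.0),
    SettledSignal(edge=0.10, stake=50.0, pnl=40.0),
]
print(summarize(log))
```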
What Makes This Work
Three key factors give us edge:
1. Information Latency
Weather models update at fixed times. Markets often lag in incorporating new data, especially overnight updates. We're usually among the first to act on fresh model runs.
2. Ensemble Wisdom
No single model is best in all conditions. By combining models intelligently, we smooth out individual model biases and arrive at a more accurate forecast.
3. Proper Uncertainty Quantification
Most traders treat forecasts as certain. They see "45°F" and bet like it's guaranteed. We know there's a distribution, and we price that uncertainty correctly.
Current Limitations
We believe in transparency about our weaknesses:
- Sample size: Weather markets are new. Our track record is months, not years.
- Model access: ECMWF data is expensive. We use delayed data.
- Extreme events: Our system underperforms during unusual weather patterns.
- Liquidity: Position sizing is constrained by market depth.
What's Next
We're working on several improvements:
- Real-time METAR integration: Airport stations issue METAR observations at least hourly, giving us ground truth throughout the day. We're building systems to detect when observations diverge from the models as it happens.
- Regime detection: Different synoptic patterns favor different models. We're building classifiers to identify the current regime and adjust weights.
- Expanded coverage: More cities, more markets, more edge.
The Bottom Line
Weather markets offer genuine alpha for those willing to do the work. Our edge comes not from secret models or insider information, but from rigorous aggregation of public data and disciplined signal generation.
We show our math. We track our results. We hunt +EV relentlessly.
That's Arctic Calculus.
Want to see our signals in action? Check out the Weather Dashboard for real-time updates.