How We Beat Weather Markets: The Arctic Odds Methodology

A deep dive into our ensemble forecasting system, signal generation process, and the math behind hunting +EV opportunities in Kalshi temperature markets.

🐻‍❄️
PolarBearQuant
@polarbearquant
January 8, 2025
12 min read

The Thesis

Weather prediction markets are one of the purest forms of forecasting competitions. Unlike sports or politics, weather has ground truth—the thermometer doesn't lie. This creates a unique opportunity: if you can forecast better than the market, you can consistently extract value.

But here's the thing most people miss: you don't need to be a meteorologist to beat weather markets. You need to be a better aggregator of meteorological information.

That's our edge.

The Problem with Weather Markets

Kalshi offers temperature markets for four cities: NYC, Chicago, Miami, and Austin. Each day, you can bet on whether the high temperature will be above or below various thresholds.

Most traders approach these markets one of two ways:

  1. The Gut Feelers: They check weather.com, see "72°F" and bet accordingly. No edge.
  2. The Model Simps: They pick one weather model (usually GFS) and follow it blindly. Sometimes edge, sometimes disaster.

Both approaches fail because they ignore the fundamental uncertainty in weather forecasting.

Our Approach: Ensemble of Ensembles

We don't try to predict the weather. We try to predict what the weather will be given the current state of all available forecasts.

Our system combines three major weather models:

The Models

HRRR (High-Resolution Rapid Refresh)

  • Resolution: 3km
  • Update frequency: Hourly
  • Strength: Short-term accuracy (0-18 hours)
  • Weakness: Degrades quickly beyond day 1

GFS (Global Forecast System)

  • Resolution: 13km
  • Update frequency: Every 6 hours
  • Strength: Good 3-7 day forecasts
  • Weakness: Can miss mesoscale features

ECMWF (European Centre)

  • Resolution: 9km
  • Update frequency: Every 12 hours
  • Strength: Most accurate overall
  • Weakness: Expensive, less frequent updates

The Weighting

We don't treat all models equally. Our ensemble weights are dynamic, based on:

Weight_model = f(forecast_horizon, recent_accuracy, current_synoptic_regime)

For same-day forecasts:

  • HRRR: 50%
  • ECMWF: 35%
  • GFS: 15%

For next-day forecasts:

  • ECMWF: 45%
  • HRRR: 30%
  • GFS: 25%

These weights adjust based on each model's recent performance at each location.
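
As a minimal sketch of how horizon-based weights turn three point forecasts into one ensemble value, the snippet below uses the base weights listed above and the model readings from the January 7 example later in this post. The `adjust_weights` function is a hypothetical illustration of a performance adjustment (inverse recent mean absolute error, renormalized); the names `BASE_WEIGHTS`, `adjust_weights`, and `ensemble_mean` are assumptions made for this sketch, not our production code.

```python
# Base weights by forecast horizon, taken from the lists above.
BASE_WEIGHTS = {
    "same_day": {"HRRR": 0.50, "ECMWF": 0.35, "GFS": 0.15},
    "next_day": {"ECMWF": 0.45, "HRRR": 0.30, "GFS": 0.25},
}


def adjust_weights(base, recent_mae):
    """Hypothetical skill adjustment: scale each base weight by the inverse
    of the model's recent mean absolute error, then renormalize to sum to 1."""
    raw = {model: weight / recent_mae[model] for model, weight in base.items()}
    total = sum(raw.values())
    return {model: value / total for model, value in raw.items()}


def ensemble_mean(forecasts, weights):
    """Weighted average of the models' point forecasts (°F)."""
    return sum(weights[model] * forecasts[model] for model in forecasts)


# Point forecasts from the January 7 example (°F).
forecasts = {"HRRR": 44.0, "GFS": 43.0, "ECMWF": 45.0}
print(round(ensemble_mean(forecasts, BASE_WEIGHTS["same_day"]), 1))  # 44.2

# With equal recent errors the adjustment leaves the base weights unchanged.
equal_mae = {"HRRR": 2.0, "ECMWF": 2.0, "GFS": 2.0}
adjusted = adjust_weights(BASE_WEIGHTS["same_day"], equal_mae)
```

A model with a smaller recent error would gain weight at the expense of the others, while the weights still sum to 1.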

Signal Generation

A "signal" fires when the probability implied by our ensemble forecast differs significantly from the probability implied by the market price. Here's the process:

Step 1: Convert Forecast to Probability Distribution

Weather models give point forecasts. Markets need probabilities. We convert using historical error distributions:

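import numpy as np
import scipy.stats

def forecast_to_distribution(point_forecast, model, location, horizon):
    """
    Convert a point forecast to a probability distribution
    based on historical model errors at this location.
    """
    # get_model_errors (not shown) is assumed to return past errors in °F,
    # measured as forecast minus observed, for this model/location/horizon.
    historical_errors = get_model_errors(model, location, horizon)
    std_dev = np.std(historical_errors)
    bias = np.mean(historical_errors)

    # Adjust for bias: a model that runs warm (positive mean error) is shifted down
    adjusted_forecast = point_forecast - bias

    # Return normal distribution (simplified)
    return scipy.stats.norm(loc=adjusted_forecast, scale=std_dev)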

Step 2: Calculate Probability for Each Strike

For each market strike (e.g., "Above 65°F"), we calculate our ensemble probability:

def probability_above_strike(strike, ensemble_distribution):
    """
    What's the probability temperature exceeds strike?
    """
    return 1 - ensemble_distribution.cdf(strike)
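
To make the strike calculation concrete, here is a small sketch that plugs in the ensemble mean and standard deviation from the January 7 example later in this post (44.2°F, σ = 2.1°F). It uses `statistics.NormalDist` from the Python standard library instead of SciPy purely to keep the example dependency-free; the normal shape is the same simplification used in Step 1.

```python
from statistics import NormalDist

# Ensemble distribution from the January 7 example: mean 44.2°F, sigma 2.1°F.
ensemble = NormalDist(mu=44.2, sigma=2.1)

strike = 42.0
# Probability that the high exceeds the strike: 1 minus the CDF at the strike.
p_above = 1 - ensemble.cdf(strike)

print(f"P(T > {strike:.0f}°F) = {p_above:.0%}")
```

The strike sits roughly one standard deviation below the ensemble mean, which is why the probability lands at about 85%.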

Step 3: Compare to Market

We pull live Kalshi prices and compare:

def calculate_edge(our_prob, market_prob):
    """
    Positive edge = we think probability is higher than market.
    """
    return our_prob - market_prob

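def expected_value(our_prob, market_price, stake=100):
    """
    Expected profit from staking `stake` dollars (default $100) on a
    contract bought at market_price, given as a fraction (e.g. 0.58 for 58¢).
    """
    # Each contract pays $1, so `stake` buys stake / market_price contracts
    win_payout = stake * (1 / market_price) - stake
    lose_payout = -stake

    ev = (our_prob * win_payout) + ((1 - our_prob) * lose_payout)
    return ev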
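
As a quick check on the arithmetic, the sketch below restates the expected-value function and compares it with the equivalent closed form stake × (our_prob / market_price − 1). The inputs are the probability and price from the January 7 example later in this post. Fees and the bid-ask spread are ignored, which is a simplifying assumption of this sketch; `expected_value_closed_form` is a name introduced here for illustration.

```python
def expected_value(our_prob, market_price, stake=100):
    """Expected profit of staking `stake` dollars at `market_price` (0-1)."""
    win_payout = stake / market_price - stake  # profit if the contract settles at $1
    lose_payout = -stake                       # the whole stake is lost otherwise
    return our_prob * win_payout + (1 - our_prob) * lose_payout


def expected_value_closed_form(our_prob, market_price, stake=100):
    """Same quantity after simplifying the two-outcome sum."""
    return stake * (our_prob / market_price - 1)


ev = expected_value(0.85, 0.58)
closed = expected_value_closed_form(0.85, 0.58)
print(f"EV per $100 staked: ${ev:.2f}")

# When our probability equals the market price, the bet is break-even.
break_even = expected_value(0.58, 0.58)
```

The closed form makes the logic easy to see: the expected value is positive exactly when our probability exceeds the price we pay.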

Step 4: Signal Generation

We generate a signal when:

  1. Edge threshold: Our probability differs from the market-implied probability by at least 5 percentage points
  2. Confidence threshold: Our model uncertainty is below threshold
  3. Liquidity threshold: Sufficient market depth exists
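  4. Kelly criterion: Position size is meaningful but not reckless

# Signal and calculate_confidence are helpers that are not shown here.
def generate_signal(our_prob, market_prob, market_liquidity, bankroll):
    edge = our_prob - market_prob

    if abs(edge) < 0.05:
        return None  # Edge too small

    # Kelly criterion for position sizing. We buy the side we think is
    # underpriced: "Above" costs market_prob, "Below" costs 1 - market_prob.
    # Kelly fraction = |edge| / (1 - price of the side we buy), so the
    # fraction stays positive for BELOW signals as well.
    if edge > 0:
        kelly_fraction = edge / (1 - market_prob)
    else:
        kelly_fraction = -edge / market_prob

    position_size = min(
        kelly_fraction * bankroll * 0.25,  # Quarter Kelly
        market_liquidity * 0.1  # Don't take >10% of liquidity
    )

    return Signal(
        direction="ABOVE" if edge > 0 else "BELOW",
        edge=edge,
        confidence=calculate_confidence(our_prob),
        position_size=position_size
    )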
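
The sizing rule for "Above" signals follows from the standard Kelly criterion applied to a binary contract. As a short derivation, write p for our probability and c for the contract price (0 < c < 1), and assume the contract pays $1 if it settles in our favor and that fees are ignored (a simplification made for this sketch):

```latex
\begin{aligned}
b &= \frac{1 - c}{c}
  && \text{(net odds: profit per dollar staked on a win)} \\
f^{*} &= \frac{b\,p - (1 - p)}{b}
       = p - \frac{(1 - p)\,c}{1 - c}
       = \frac{p - c}{1 - c}
\end{aligned}
```

With c equal to the market price of "Above", p - c is the edge, so the full Kelly fraction is edge / (1 - market_prob). Betting "Below" is the same problem with probability 1 - p and price 1 - c, which gives (c - p) / c. Multiplying by 0.25 gives the quarter-Kelly stake.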

Real Example: January 7, 2025

Let's walk through a real signal from this week.

Market: NYC High Temperature, Above 42°F
Market Price: 58¢ (implying 58% probability)

Our Models Said:

  • HRRR: 44°F (updated 6am)
  • GFS: 43°F (updated 12am)
  • ECMWF: 45°F (updated 12am)

Our Ensemble: 44.2°F with σ = 2.1°F

Our Probability: P(T > 42°F) = 85%

Edge: 85% - 58% = +27 percentage points

Signal: Strong BUY on Above 42°F

Result: Actual high was 45°F. ✅
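
To show how the sizing rules from Step 4 would apply to this signal, here is a rough sketch. The bankroll ($10,000) and available liquidity ($5,000 and $50,000) are hypothetical numbers chosen for illustration, not our actual figures, and `quarter_kelly_size` is a name introduced for this example.

```python
def quarter_kelly_size(our_prob, market_prob, bankroll, market_liquidity):
    """Quarter-Kelly stake in dollars, capped at 10% of available liquidity."""
    edge = our_prob - market_prob
    # Price of the side we would buy: "Above" if edge > 0, otherwise "Below".
    price = market_prob if edge > 0 else 1 - market_prob
    kelly_fraction = abs(edge) / (1 - price)
    return min(kelly_fraction * bankroll * 0.25, market_liquidity * 0.1)


# January 7 signal: our probability 85%, market price 58¢.
thin_market = quarter_kelly_size(0.85, 0.58, bankroll=10_000, market_liquidity=5_000)
deep_market = quarter_kelly_size(0.85, 0.58, bankroll=10_000, market_liquidity=50_000)

print(f"Thin market stake: ${thin_market:,.2f}")
print(f"Deep market stake: ${deep_market:,.2f}")
```

The full Kelly fraction here is 0.27 / 0.42, or about 64% of bankroll, so quarter Kelly asks for roughly 16%. In the thin market the 10% liquidity cap binds and the stake is limited to $500; in the deep market the quarter-Kelly amount is the binding constraint.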

Performance Tracking

We track every signal with full transparency:

Metric                  Value
Total Signals (2025)    1,247
Win Rate                63.2%
Average Edge            8.4%
ROI                     +18.7%
Sharpe Ratio            2.1
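
For readers who want to compute the same kind of summary from their own log, here is a minimal sketch. The four log entries are made-up toy data, not our results, and the definitions are assumptions made for this sketch: win rate is wins divided by settled signals, and ROI is total profit divided by total dollars staked. `SettledSignal` and `summarize` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SettledSignal:
    edge: float    # absolute gap between our probability and the price at entry
    price: float   # price paid per $1 contract, in dollars
    stake: float   # dollars risked
    won: bool      # whether the contract settled in our favor


def summarize(signals):
    """Win rate, average edge, and ROI (profit / total staked) for a signal log."""
    staked = sum(s.stake for s in signals)
    # A winning stake at `price` returns stake / price; a losing stake returns 0.
    profit = sum((s.stake / s.price - s.stake) if s.won else -s.stake
                 for s in signals)
    return {
        "win_rate": sum(s.won for s in signals) / len(signals),
        "average_edge": sum(s.edge for s in signals) / len(signals),
        "roi": profit / staked,
    }


# Toy data for illustration only.
log = [
    SettledSignal(edge=0.10, price=0.50, stake=100, won=True),
    SettledSignal(edge=0.06, price=0.40, stake=100, won=False),
    SettledSignal(edge=0.08, price=0.80, stake=100, won=True),
    SettledSignal(edge=0.08, price=0.50, stake=100, won=False),
]
summary = summarize(log)
print(summary)
```

In this toy log half the signals win, yet the ROI is negative because one of the wins was bought at 80¢ and paid little. Win rate alone says nothing about profitability without the prices paid.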

What Makes This Work

Three key factors give us edge:

1. Information Latency

Weather models update at fixed times. Markets often lag in incorporating new data, especially overnight updates. We're usually among the first to act on fresh model runs.

2. Ensemble Wisdom

No single model is best in all conditions. By combining models intelligently, we smooth out individual model biases and capture a more accurate forecast.

3. Proper Uncertainty Quantification

Most traders treat forecasts as certain. They see "45°F" and bet like it's guaranteed. We know there's a distribution, and we price that uncertainty correctly.

Current Limitations

We believe in transparency about our weaknesses:

  1. Sample size: Weather markets are new. Our track record is months, not years.
  2. Model access: ECMWF data is expensive. We use delayed data.
  3. Extreme events: Our system underperforms during unusual weather patterns.
  4. Liquidity: Position sizing is constrained by market depth.

What's Next

We're working on several improvements:

  • Real-time METAR integration: Airport weather stations provide ground truth every minute. We're building systems to detect model divergence in real-time.
  • Regime detection: Different synoptic patterns favor different models. We're building classifiers to identify current regime and adjust weights.
  • Expanded coverage: More cities, more markets, more edge.

The Bottom Line

Weather markets offer genuine alpha for those willing to do the work. Our edge comes not from secret models or insider information, but from rigorous aggregation of public data and disciplined signal generation.

We show our math. We track our results. We hunt +EV relentlessly.

That's Arctic Calculus.


Want to see our signals in action? Check out the Weather Dashboard for real-time updates.
