# DNS1 Best Performer Methodology

Generated: 2026-05-17

This document explains the best performer from the completed real-data one-year
DNS1 strategy sweep. It is intentionally written as an implementation and
research note, not as a capital deployment approval.

The result table is:

`artifacts/final_real_backtest_1y/all_strategy_results.csv`

The runnable sweep code is:

`scripts/run_real_strategy_sweep.py`

The reusable backtest engine is:

`src/dn_research/strategy_lab.py`

## 1. What The Premise Of The Problem Is

The problem is to find a repeatable Polymarket strategy that can deploy
meaningful capital while staying economically market-neutral enough to avoid
being just a directional betting system.

The DNS1 premise is that binary prediction markets sometimes price the selected
side below its true empirical resolution probability. If the market offers a
side at 50c and the historically comparable bucket resolves in favor of that
side 70% of the time, there is a 20c gross expected edge before fees and
execution costs.

The practical trading problem is harder than that simple edge statement. The
system must answer all of these questions at once:

1. Is the apparent edge real, or is it an artifact of stale data, leakage, or
   a badly matched bucket?
2. Can the strategy deploy enough capital to matter?
3. Can the strategy survive actual Polymarket execution: spread, depth, queue
   position, adverse selection, fee drag, and late market information?
4. Does the rule fail catastrophically around live sports, esports, fast-moving
   event markets, or markets whose public quote already contains information
   that the daily historical panel does not represent?
5. Can the strategy be written as a repeatable algorithm that does not depend on
   manual judgment?

The latest backtest here focuses on the first two questions and partially on
the third. It uses real historical panel rows and realized market outcomes. It
does not use generated prices, synthetic outcomes, or fabricated fills. However,
it still uses explicit execution-cost assumptions rather than attributed
historical fills, so its PnL is a ranking signal, not a deployable PnL claim.

## 2. What Data Was Used

The sweep used the local empirical Polymarket resolution panel:

`artifacts/empirical_resolution_panel.parquet`

Panel coverage:

- Rows loaded: 3,093,957
- Unique markets: 478,788
- Unique token IDs: 452,265
- Quote-date coverage: 2024-09-19 to 2026-04-17
- Evaluation period: 2025-06-17 to 2026-04-17
- Walk-forward folds: 11
- Scored candidate rows over the evaluation folds: 2,418,194

The backtest used:

- historical quoted side prices from the empirical panel,
- realized binary resolutions from the empirical panel,
- market metadata such as category, subcategory, title, volume, market ID,
  token ID, quote date, and end date,
- explicit maker/taker fee and slippage assumptions,
- no generated prices,
- no generated outcomes,
- no fabricated trade fills.

The execution model is still not complete live-market replay. It is a daily
historical strategy-ranking harness. The ranking is useful. The absolute dollar
PnL is not a final live-capital estimate.

## 3. What The Solution Is

The solution tested here is a purged walk-forward empirical carry strategy.

At a high level:

1. Build empirical resolution surfaces from past markets.
2. Score current candidate rows against those surfaces.
3. Estimate each candidate side's edge after costs.
4. Select only candidates with enough historical evidence and enough edge.
5. Rank candidates by edge and volume.
6. Size positions dynamically up to a fixed gross exposure target.
7. Hold to resolution or apply a simple daily drawdown-stop proxy.
8. Repeat across every day in the one-year evaluation period.
9. Run the same evaluation across a grid of strategy parameters.
10. Write one row per strategy configuration into one CSV.

The completed sweep tested 1,440 strategies. Each strategy used the same
one-year evaluation window, the same training lookback rule, the same purge and
embargo logic, and the same result metrics.

The best historical row was:

`broad_50_95__dte1__edge120__yes__maker_base__pos25000__dd3c`

There is a tied row:

`broad_50_95__dte1__edge120__yes__maker_base__pos25000__hold`

Both rows produced the same metrics because the 3c drawdown-stop rule never
changed the realized winning path in this daily panel. This result should not
be read as evidence that the stop provides protection; it simply tied the
hold-to-resolution version in this run.

## 4. Exact Best-Performer Parameters

The rank-one CSV row has these parameters:

- Price band: 50c to 95c selected-side price
- Max days to expiry: 1 day
- Side: YES only
- Minimum edge after cost: 12c
- Historical evidence threshold: at least 50 bucket observations
- Surface type: category-aware empirical surface with fallback to overall
  surface
- Execution profile: maker-base
- Entry slippage assumption: 30 bps
- Exit slippage assumption: 50 bps
- Fee assumption: 25 bps
- Maker rebate assumption: 0 bps
- Max position size: $25,000
- Min position size: $1,000
- Target gross exposure: $1,000,000
- Max gross exposure: $1,250,000
- Max category fraction: 35%
- Dynamic sizing: enabled
- Edge scale: 20c
- Exit policy in rank-one row: 3c daily drawdown-stop proxy
- Tied exit policy: hold to resolution
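These parameters can be collected into a small config object for reference. The sketch below is illustrative only: the field names are assumptions, and the authoritative values live in `best_strategy/params.json`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrategyParams:
    """Illustrative container for the rank-one row's parameters.

    Field names are assumptions, not the sweep's actual config schema.
    Prices are in dollars per $1 payout; costs are in basis points.
    """
    price_band: tuple[float, float] = (0.50, 0.95)
    max_dte_days: int = 1
    side: str = "YES"
    min_edge_after_cost: float = 0.12
    min_bucket_obs: int = 50
    entry_slippage_bps: float = 30.0
    exit_slippage_bps: float = 50.0
    fee_bps: float = 25.0
    maker_rebate_bps: float = 0.0
    max_position: float = 25_000.0
    min_position: float = 1_000.0
    target_gross: float = 1_000_000.0
    max_gross: float = 1_250_000.0
    max_category_frac: float = 0.35
    edge_scale: float = 0.20
```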

The important practical interpretation is that the winning row is not the
original narrow near-100c DNS1 profile. It is a broader terminal YES-side
strategy. It buys YES between 50c and 95c when the empirical bucket says the
selected side has at least 12c edge after modeled cost, and it does so within
one day of expiry.

## 5. Exact Best-Performer Metrics

From `all_strategy_results.csv`:

- Rank: 1 of 1,440
- Objective score: 160.466706
- Total PnL: $145,737,193.53
- Average daily deployed: $719,528.06
- Deployment p05: $437,500.00
- Deployment p50: $732,110.47
- Deployment p95: $1,000,000.00
- Days above $500k deployed: 66.89%
- Positive day rate: 99.34%
- Annualized Sharpe: 56.29
- Max drawdown: -$71,752.26
- Trade count: 9,197
- Win rate: 86.29%
- Total entry notional: $219,456,058.24
- Signal count after the per-day ranked cap: 118,885

The largest category contribution came from Crypto-Markets:

- Crypto-Markets trades: 5,207
- Crypto-Markets PnL: $112,723,700+
- Other-Miscellaneous trades: 2,764
- Other-Miscellaneous PnL: $26,749,000+

Entry price distribution:

- 0th percentile: 50.0c
- 25th percentile: 50.0c
- Median: 50.0c
- 75th percentile: 51.0c
- 95th percentile: 68.5c
- Max: 82.5c

Days-to-expiry distribution:

- 0th percentile: 0 days
- Median: 0 days
- 75th percentile: 1 day
- Max: 1 day

That distribution is the key diagnostic. The winning historical row is mostly a
same-day or one-day terminal YES strategy around 50c. It is not mostly a
high-price carry strategy.

## 6. What The Algorithm Is

The sweep algorithm is deterministic.

### 6.1 Load the real panel

The harness reads `artifacts/empirical_resolution_panel.parquet` and keeps the
columns needed for historical scoring:

- token ID
- quote timestamp
- quote date
- YES mark
- market ID
- title
- end date
- numeric resolution
- main category
- subcategory
- volume
- days to expiry

Rows with missing date, price, resolution, or DTE fields are dropped, and
prices outside the valid binary range are excluded.

### 6.2 Convert each market into side rows

Each original market row is converted into two possible side rows:

- YES side:
  - side price = YES mark
  - side resolution = numeric resolution
- NO side:
  - side price = 1 - YES mark
  - side resolution = 1 - numeric resolution

This makes the strategy symmetric at the data level. A strategy can later choose
YES only, NO only, or both sides.
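The side expansion can be sketched as follows; the field names are illustrative, not the panel's actual schema:

```python
def to_side_rows(yes_mark: float, resolution: float) -> dict:
    """Expand one market row into its YES and NO side views.

    `yes_mark` is the quoted YES price in [0, 1]; `resolution` is the
    realized binary outcome (1.0 if YES resolved true). The NO side is
    the complement of both price and resolution.
    """
    return {
        "YES": {"side_price": yes_mark, "side_resolution": resolution},
        "NO": {"side_price": 1.0 - yes_mark, "side_resolution": 1.0 - resolution},
    }
```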

### 6.3 Bucket the side price and time to expiry

Each side row is assigned:

- a side-price bucket such as 50-65%, 65-80%, 80-90%, 90-95%, 95-97%,
  97-99%, or 99-100%,
- a DTE bucket such as 0-1d, 1-3d, 3-7d, 7-14d, 14-30d, and longer buckets.

The strategy does not assume every individual market is identical. It estimates
resolution rates at the side-price and DTE bucket level, and optionally within
category.

### 6.4 Build purged walk-forward folds

The evaluation period is one year, ending at the last available quote date in
the panel. For this run the evaluated dates were 2025-06-17 through
2026-04-17.

For each fold:

1. Use a 365-day training lookback.
2. Apply a 30-day purge so markets resolving near the test period do not leak
   into training.
3. Apply a 7-day embargo between training and testing.
4. Score only the test period from the fold.

This is materially stricter than fitting on the full panel and then pretending
the result is out-of-sample.
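The fold construction is plain date arithmetic. A minimal sketch, assuming the purge is implemented by dropping training rows that resolve inside the purge window; the harness's actual fold builder and field names may differ:

```python
from datetime import date, timedelta


def make_fold(test_start: date, test_days: int,
              lookback_days: int = 365, purge_days: int = 30,
              embargo_days: int = 7) -> dict:
    """Compute one purged walk-forward fold's date windows.

    Training ends `embargo_days` before the test window, and training rows
    whose markets resolve after `purge_resolutions_after` are dropped so
    they cannot leak information about the test period.
    """
    train_end = test_start - timedelta(days=embargo_days)
    train_start = train_end - timedelta(days=lookback_days)
    purge_cutoff = test_start - timedelta(days=purge_days)
    return {
        "train_start": train_start,
        "train_end": train_end,
        "purge_resolutions_after": purge_cutoff,
        "test_start": test_start,
        "test_end": test_start + timedelta(days=test_days - 1),
    }
```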

### 6.5 Build empirical surfaces

For each fold, the training rows are grouped by:

- side,
- side-price bucket,
- DTE bucket.

For each group, the harness computes:

- empirical observation count,
- empirical selected-side resolution probability,
- average selected-side price.

It also computes category-specific versions when enough category evidence is
available. If the category sample is too thin, it falls back to the overall
surface.
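Assuming a pandas-style groupby (column names below are illustrative, not the panel's actual schema), the surface build looks roughly like:

```python
import pandas as pd


def build_surface(train: pd.DataFrame) -> pd.DataFrame:
    """Aggregate training side rows into an empirical resolution surface.

    Expects columns `side`, `price_bucket`, `dte_bucket`,
    `side_resolution`, `side_price`. Returns one row per
    (side, price bucket, DTE bucket) cell with the observation count,
    empirical resolution rate, and average selected-side price.
    """
    return (
        train.groupby(["side", "price_bucket", "dte_bucket"])
        .agg(
            n_obs=("side_resolution", "size"),
            resolve_rate=("side_resolution", "mean"),
            avg_price=("side_price", "mean"),
        )
        .reset_index()
    )
```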

### 6.6 Score test candidates

For each test candidate:

1. Look up the empirical side-resolution probability from the surface.
2. Compute gross edge:
   - empirical selected-side probability minus selected-side price.
3. Compute cost edge:
   - entry slippage + fee - maker rebate, in bps.
4. Compute predicted edge after cost:
   - gross edge minus cost edge.
5. Keep only candidates above the strategy's edge threshold.

For the winning row:

- minimum edge after cost was 12c,
- the cost profile was maker-base,
- entry cost was modeled as 30 bps slippage plus 25 bps fee.
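The scoring arithmetic for a single candidate can be written out directly. The defaults below mirror the maker-base profile; the function name is illustrative:

```python
def edge_after_cost(empirical_prob: float, side_price: float,
                    entry_slippage_bps: float = 30.0,
                    fee_bps: float = 25.0,
                    maker_rebate_bps: float = 0.0) -> float:
    """Predicted edge after modeled entry cost, in price units ($1 payout).

    Gross edge is the empirical selected-side resolution probability minus
    the quoted side price; costs are expressed in basis points of notional.
    """
    gross_edge = empirical_prob - side_price
    cost = (entry_slippage_bps + fee_bps - maker_rebate_bps) / 10_000.0
    return gross_edge - cost
```

For example, a 70% empirical bucket against a 50c quote yields a 19.45c predicted edge under the maker-base profile, comfortably above the 12c threshold.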

### 6.7 Rank candidates within each day

Eligible candidates are ranked by:

- predicted edge after cost,
- volume-weighted score.

The sweep kept the top 500 ranked eligible candidates per day per strategy. This
cap does not fabricate or alter any data; it is a deterministic computational
limit on candidate rows. It exists because a $1M book with a $25k max position
size cannot use every low-ranked candidate on days with thousands of eligible
rows. The cap is applied consistently across every strategy.
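A minimal sketch of the per-day cap, assuming a lexicographic tiebreak of edge then volume (the exact combination of edge and volume in the harness's ranking key is not specified here):

```python
def top_candidates(cands: list[dict], cap: int = 500) -> list[dict]:
    """Rank eligible candidates and keep the top `cap` per day.

    Sorts descending by predicted edge after cost, breaking ties by
    volume; dict keys are illustrative.
    """
    ranked = sorted(
        cands,
        key=lambda c: (c["edge_after_cost"], c["volume"]),
        reverse=True,
    )
    return ranked[:cap]
```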

### 6.8 Size positions

Position sizing is dynamic:

- max position size is $25,000,
- edge scale is 20c,
- a higher predicted edge gets closer to the max position,
- a lower predicted edge receives a smaller allocation,
- max gross exposure is constrained,
- category exposure is capped at 35% of max gross exposure.

The target gross exposure is $1,000,000. The winning strategy averaged about
$719,528 deployed per day.
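One plausible reading of the dynamic sizing rule is linear scaling in edge up to the 20c edge scale, clamped to the position band. The harness's exact sizing curve is not specified here, so treat this as a sketch:

```python
def position_size(edge: float, edge_scale: float = 0.20,
                  min_size: float = 1_000.0,
                  max_size: float = 25_000.0) -> float:
    """Scale position size linearly with predicted edge.

    An edge at or above `edge_scale` earns the full max position; smaller
    edges scale down linearly, floored at the minimum position size.
    Gross-exposure and category caps would apply on top of this.
    """
    frac = min(max(edge / edge_scale, 0.0), 1.0)
    return min(max(frac * max_size, min_size), max_size)
```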

### 6.9 Close positions

Most positions close at settlement. In the tied rank-one drawdown-stop row, the
daily 3c stop did not alter the winning path. In the best-strategy detail
directory, 9,175 trades closed at settlement and 22 residual positions were
forced closed at the end of the backtest window.

### 6.10 Score every strategy

For every strategy configuration, the harness computes:

- total PnL,
- average daily deployment,
- deployment p05, p50, p95,
- fraction of days above $500k deployed,
- positive day rate,
- annualized Sharpe,
- max drawdown,
- max drawdown as fraction of target gross,
- trade count,
- win rate,
- total entry notional,
- objective score.

The objective rewards PnL, Sharpe, deployment, and positive days, and penalizes
underdeployment and drawdown.

## 7. What The Latest Version Is

The latest local version created by this work is:

`real-sweep-v1`

It consists of:

- `scripts/run_real_strategy_sweep.py`
- `artifacts/final_real_backtest_1y/all_strategy_results.csv`
- `artifacts/final_real_backtest_1y/best_strategy_manifest.json`
- `artifacts/final_real_backtest_1y/best_strategy/`
- `artifacts/final_real_backtest_1y/BEST_STRATEGY_METHODOLOGY.md`

The command used for the full run was:

```bash
uv run python scripts/run_real_strategy_sweep.py --output-dir artifacts/final_real_backtest_1y --checkpoint-every 25 --max-signals-per-day 500
```

This latest version is not the live DNS1 production strategy. It is the latest
local, real-panel, one-year strategy sweep.

The prior live-oriented DNS1 line was closer to a high-confidence near-expiry
carry profile. The new sweep says the highest historical objective comes from a
much broader terminal YES profile. That is an important research finding, but
it should be treated with caution because it changes the strategy's economic
character.

## 8. What The Result Means

The result means:

1. The real historical panel contains a very strong terminal YES-side anomaly
   under this empirical surface methodology.
2. The anomaly is strongest in broad 50-95c selected-side prices, not only in
   90c+ near-certain markets.
3. The anomaly is strongest at 0-1 day to expiry.
4. The top result remains strong under both maker-base and taker-stress modeled
   cost profiles.
5. The top result is highly concentrated in Crypto-Markets and
   Other-Miscellaneous.
6. Strategies confined to the 95-99.9c terminal bucket ranked at the bottom of
   the sweep, especially under taker-stress cost.

The result does not prove:

1. That the strategy could actually fill $25k per selected market.
2. That the displayed daily marks were available at executable depth.
3. That live sports, esports, and fast-resolving event markets are safe.
4. That the high Sharpe survives point-in-time universe reconstruction.
5. That the system can deploy live without adverse selection.
6. That the terminal same-day edge is not partly caused by timestamp or
   settlement-timing artifacts.

The correct interpretation is: the sweep found where the historical panel says
the money is, and it also identified the next validation gates required before
any live deployment.

## 9. What The Improvement Should Be

The improvement should not be "deploy the rank-one row as-is." The improvement
should be a new execution-realistic DNS1 version that keeps the empirical edge
engine but adds stronger live-market controls.

### 9.1 Replace the objective with execution-adjusted objective

The current objective is too friendly to terminal same-day opportunities. It
should be replaced or supplemented with an objective that penalizes:

- low executable depth,
- large spread,
- high same-day settlement risk,
- excessive category concentration,
- excessive crypto concentration,
- live sports or esports fast-information markets,
- markets whose quote timestamp is too close to resolution,
- markets whose settlement timing is ambiguous,
- signals that disappear under L2 depth replay,
- signals that cannot be filled at modeled VWAP.

The new objective should optimize expected live PnL after:

- VWAP slippage,
- queue-position haircut,
- maker non-fill probability,
- taker adverse-selection penalty,
- category risk penalty,
- terminal-event risk penalty.

### 9.2 Add historical L2 replay or depth proxy

The current sweep uses a daily historical panel. The next version should join
signals to historical order-book depth wherever available.

For each candidate, the validator should compute:

- best bid and ask at decision time,
- spread,
- depth within 10 bps, 30 bps, 50 bps, and 100 bps,
- expected VWAP for $5k, $10k, $25k, and $50k,
- whether the strategy could actually deploy its target size,
- how much edge remains after filling the full intended size.

Any candidate whose depth-adjusted edge is below threshold should be removed.

### 9.3 Add point-in-time universe reconstruction

The panel backtest should be checked against a point-in-time universe. The
validator must prove that a market was visible and tradable at the decision time
with the metadata that the strategy used.

This matters because a resolved-market research panel can accidentally include
rows or classifications that were not available in the same form at the time of
trade.

The required check is:

- market existed at decision time,
- token ID was known at decision time,
- quote was known at decision time,
- market had not already effectively resolved,
- category metadata was available or safely defaulted,
- resolution timestamp was after the decision timestamp.
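The checklist above can be expressed as a single boolean gate; the field names are assumptions, not the panel's actual schema:

```python
from datetime import datetime


def point_in_time_ok(row: dict, decision_ts: datetime) -> bool:
    """Gate a candidate on point-in-time availability.

    Every piece of metadata the strategy uses must have existed at
    decision time, and the market must not have already resolved.
    """
    return (
        row["listed_at"] <= decision_ts        # market existed
        and row["quote_ts"] <= decision_ts     # quote was known
        and row.get("token_id") is not None    # token ID was known
        and row["resolution_ts"] > decision_ts # not already resolved
    )
```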

### 9.4 Add terminal-market exclusions

The top row's median DTE is zero. That is exactly where timestamp, settlement,
and information-latency risks are highest.

The next version should test and probably enforce:

- no entry after a market-specific cutoff before scheduled start or resolution,
- no entry when a sporting event or live contest is already underway,
- no entry when the market title indicates live sports, esports, or in-game
  scoring unless an explicit live-event model exists,
- no entry when resolution source can update before the quote feed reflects it,
- no entry when spread or depth deteriorates near expiry.

This is especially important because the prior May 9 live-loss investigation
showed that live game/sports-type markets can dominate losses when exits and
exclusions are insufficient.

### 9.5 Add Sentinel/reversion controls to the backtest

The latest production lesson from DNS1 is that entry quality alone is not
enough. The strategy needs a sell/revert layer when the premise changes.

The next backtest should include:

- live-equivalent stale quote detection,
- event-start detection,
- price gap detection,
- order-book collapse detection,
- reversion trigger when edge after depth drops below zero,
- forced liquidation when a market enters a prohibited live state,
- post-entry stop logic that uses actual intraday marks, not just daily marks.

The daily 3c drawdown-stop proxy did not prove much in the winning row because
it tied hold-to-resolution. The real improvement has to use intraday/live
signals.

### 9.6 Add category concentration constraints

Crypto-Markets contributed most of the winning PnL. That may be real, but it is
also a concentration risk.

The next version should test:

- max Crypto-Markets fraction,
- max Other-Miscellaneous fraction,
- max single-title template fraction,
- max single event family fraction,
- max market-maker or source cluster fraction,
- separate performance by category and by quote hour.

The live allocator should not allow a single market family to dominate the
book just because the historical surface says the edge is high.

### 9.7 Add capacity realism

Average deployed notional was $719k, below the $1M target. That is decent, but
not full-capacity.

The next version should report:

- desired notional,
- depth-limited notional,
- actually fillable notional,
- unfilled notional,
- deployment lost to category caps,
- deployment lost to depth caps,
- deployment lost to excluded live-event markets.

Capital allocation should be based on fillable edge, not raw historical edge.

### 9.8 Run nested validation on the final strategy

After adding the execution layer, the best candidate should be tested with:

- purged walk-forward,
- combinatorial purged cross-validation,
- time-block holdout,
- category holdout,
- market-family holdout,
- live-forward paper trading,
- post-loss replay on known bad periods.

The current sweep is one large ranking pass. The next version needs final-model
validation after the strategy class is narrowed.

## 10. Recommended Next Strategy Version

The next version should be called something like:

`real-sweep-v2-execution-adjusted`

It should keep:

- empirical side-resolution surfaces,
- purged walk-forward training,
- category-aware fallback,
- dynamic sizing,
- one CSV row per tested strategy.

It should add:

- L2/depth-adjusted edge,
- point-in-time universe proof,
- live-event and terminal-market exclusions,
- category concentration penalties,
- Sentinel/reversion exits,
- intraday stop testing,
- capacity-adjusted objective,
- post-loss replay gate.

The candidate rule to start from is not "buy everything in the rank-one row."
It is:

1. Start with the broad 50-95c YES-side terminal edge because it ranked first.
2. Remove anything that fails point-in-time and depth validation.
3. Remove live sports/esports/game-state markets unless separately modeled.
4. Penalize crypto concentration until the strategy survives category holdout.
5. Re-run the full one-year sweep with those constraints.
6. Only then promote the best remaining rule to live paper shadow.

## 11. Retention And Cleanup Plan

The folder that now contains the completed backtest result is:

`artifacts/final_real_backtest_1y`

It contains:

- `all_strategy_results.csv`: one CSV with every tested strategy result.
- `all_strategy_results.json`: JSON copy of the same ranking.
- `summary.md`: short result summary.
- `BEST_STRATEGY_METHODOLOGY.md`: this document.
- `panel_manifest.json`: real input-data coverage.
- `fold_manifest.json`: walk-forward fold proof.
- `sweep_grid.json`: exact tested parameter grid.
- `best_strategy_manifest.json`: best selected strategy manifest.
- `best_strategy/daily.csv`: best strategy daily equity/deployment path.
- `best_strategy/trades.csv`: best strategy closed trade rows.
- `best_strategy/signals.csv`: best strategy signal rows.
- `best_strategy/params.json`: best strategy parameters.
- `best_strategy/metrics.json`: best strategy metrics.

I did not delete the rest of the project because the requested deletion scope is
ambiguous and destructive. The safe next cleanup step is to preserve this folder,
the canonical strategy methodology document, and the runnable code paths, then
delete only explicitly approved old artifacts and smoke-test directories.

