HyperAIHyperAI

Command Palette

Search for a command to run...

The Decode-Work Law: Margin-Governed, Provably-Exact Spatial Joins over Compressed Geometry

Madhulatha Mandarapu Sandeep Kunkunuru

Abstract

Filter-and-refine spatial joins have always avoided touching exact geometry for certified candidate pairs, but the field never modeled the decompression cost of the pairs that survive the filter. When geometry is stored in a compressed, progressively-decodable multiresolution codec, the join's true cost is bytes decoded. We study provably-exact polygon intersection joins over a Douglas–Peucker level-of-detail (LOD) ladder, certified by a two-sided Hausdorff-margin test, and make two contributions. First, a reproducible mechanism and harness: on real U.S. Census TIGER water polygons, our progressive certificate join returns the exact result while decoding 3.4–16.8× (median 5.9×) fewer vertices than naive decompress-then-refine, and about 4.9× fewer than the single-approximation multi-step baseline of Brinkhoff et al. [1994], with zero correctness violations across 31 workloads. Second, a characterization we call the decode-work law: decode work is governed by each pair's signed-clearance margin—how close it is to the predicate-flip boundary—independent of object size, because the certificate descends the ladder only until its resolution beats the margin. The law is clean on controlled geometry (held-out R2=0.87R ^ { 2 } = 0 . 8 7R2=0.87 , size-independent) and directional on real data (R20.55)( R ^ { 2 } \approx 0 . 5 5 )(R20.55) . We are explicit about what does not hold: a near-boundary-vertex predictor is the wrong model (we pre-registered one and rejected it), a selectivity regime forecaster did not materialize, and the worst case is the trivial Ω(v)\Omega ( v )Ω(v) read bound on adversarially interleaved boundaries. We contribute the mechanism, budget-honest decode accounting, and an open harness; we do not claim a new index.

One-sentence Summary

Researchers at VaidhyaMegha Private Limited present a progressive certificate join for provably-exact polygon intersection over compressed Douglas–Peucker LOD geometry that uses a two-sided Hausdorff-margin test, decodes 3.416.8×3.4\text{--}16.8\times3.416.8× fewer vertices with zero correctness violations, and establishes the decode-work law: decode work depends on signed-clearance margin, not object size (held-out R2=0.87R ^ { 2 } = 0 . 8 7R2=0.87, real data (R20.55)( R ^ { 2 } \approx 0 . 5 5 )(R20.55)) while a near-boundary-vertex predictor is rejected.

Key Contributions

  • A progressive certificate join over a Douglas–Peucker level-of-detail ladder with a two-sided Hausdorff-margin test returns exact set-equality on real U.S. Census TIGER water polygons while decoding 3.4–16.8× fewer vertices than naive decompress-then-refine and approximately 4.9× fewer than a single-approximation baseline, with zero correctness violations across 31 workloads.
  • The decode-work law shows that each candidate pair’s decode cost is governed by its signed-clearance margin (distance to the predicate-flip boundary), independent of polygon size, because the certificate descends the ladder only until its resolution beats the margin; the law holds cleanly on controlled geometry (R² = 0.87, size-independent) and directionally on real data (R² ≈ 0.55).
  • A transparent benchmark harness reports explicit negative findings: rejection of a pre-registered near-boundary vertex predictor, non-materialization of a selectivity-based regime forecaster, and the worst-case Ω(v) read bound for adversarially interleaved boundaries, all reproducible from a single command.

Introduction

Spatial joins are fundamental to geographic data systems, typically employing a filter-and-refine pipeline where a cheap minimum-bounding-rectangle step generates candidate pairs and an exact geometric test resolves them. Modern storage formats keep geometry heavily compressed, so the dominant cost of refinement is no longer CPU cycles or I/O but the number of coordinate bytes that must be decoded. While classic multi-step joins use a single intermediate approximation to avoid decoding full geometry for some pairs, prior work never modeled or minimized the decode cost of the candidates that survive filtering. The authors address this gap by introducing a progressive certificate join that builds a multi-resolution Douglas–Peucker ladder, certifying intersections or disjointness at the coarsest level possible and only decoding finer detail when the clearance margin demands it. The key contribution is a provably exact, decode-efficient algorithm that reduces vertex decodes by 3.4–16.8× over naive decompress-then-refine, along with an empirical decode-work law showing that the cost is governed by the per-pair signed-clearance margin rather than polygon size.

Method

The authors formulate the problem around pairs of polygons whose vertices are quantized to a fixed grid, eliminating any lossy rounding gap so that certificate pruning remains a sound approximation of the exact stored geometry. For each polygon, a level-of-detail (LOD) ladder is built using the Douglas–Peucker algorithm, which retains a subset of the exact vertices at each level \ell and guarantees a known Hausdorff distance η\eta_{\ell}η between the original polygon ggg and its level-\ell approximation g~\tilde{g}_{\ell}g~, with ηk=0\eta_k = 0ηk=0 at the finest level. Decoding to level \ell incurs a cost proportional to the cumulative vertices/bytes of that level.

The core mechanism is a two-sided certificate that exploits geometric error bounds. For a candidate pair (r,s)(r, s)(r,s) at levels (r,s)(\ell_r, \ell_s)(r,s), the method constructs an outer approximation Og=g~B(η)O_g = \tilde{g}_\ell \oplus B(\eta_\ell)Og=g~B(η) via Minkowski sum with a disc of radius η\eta_\ellη, giving a sound superset (gOgg \subseteq O_ggOg). Symmetrically, an inner approximation Ig=g~B(η)I_g = \tilde{g}_\ell \ominus B(\eta_\ell)Ig=g~B(η) is obtained through erosion. The pair is immediately resolved by testing these volumes: OrOs=O_r \cap O_s = \emptysetOrOs= certifies disjointness, while IrIsI_r \cap I_s \neq \emptysetIrIs= certifies intersection. When neither certificate holds, the pair is ambiguous and the algorithm descends to a finer level on the coarser-side polygon. At η=0\eta = 0η=0 the test reduces to the exact spatial predicate, and the soundness of the outer test is straightforward. Soundness of the inner (erosion) test is enforced by a correctness gate that verifies set-equality against a full-precision oracle on every run, together with an adversarial control procedure; a pre-registration rule mandates halting and hardening if any violation is detected, and during the reported experiments no violation occurred.

The depth at which a pair becomes decidable is governed by a signed-clearance margin

m(r,s)={+inradius(rs)if rs (overlap depth)dist(r,s)if disjoint (gap).m(r, s) = \begin{cases} +\operatorname{inradius}(r \cap s) & \text{if } r \cap s \neq \varnothing \text{ (overlap depth)} \\ -\operatorname{dist}(r, s) & \text{if disjoint (gap)}. \end{cases}m(r,s)={+inradius(rs)dist(r,s)if rs= (overlap depth)if disjoint (gap).

The absolute value m|m|m captures how far the pair is from the predicate-flip boundary. Since a level-\ell certificate can only resolve features coarser than η\eta_\ellη, the pair certifies at the first level where η<m\eta_\ell < |m|η<m. The primary efficiency metric is the decoded fraction ϕpair=decoded vertices/(r+s)\phi_{\text{pair}} = \text{decoded vertices} / (|\mathbf{r}| + |\mathbf{s}|)ϕpair=decoded vertices/(r+s) and the certifying depth.

To ensure a fair accounting of decode costs, all compared methods start from the same minimum bounding rectangle (MBR) candidate set, so measured differences arise solely from refinement depth. The byte tally includes everything a method reads, particularly the LOD headers that encode level structure. Two baselines are established: naive-refine decodes all vertices of every MBR survivor and then tests with exact precision, while the Brinkhoff baseline uses a two-level scheme (one coarse approximation followed by exact geometry). Ground truth is always evaluated at full precision, providing a consistent oracle for both verification and benchmark comparisons.

Experiment

The evaluation setup uses real U.S. TIGER water polygon self-joins under translation sweeps plus controlled synthetic and adversarial geometries, with pre-registration and exact Shapely/GEOS oracles. The progressive join decodes far fewer vertices than naive or single-approximation baselines while delivering provably exact results, with the multi-level level-of-detail ladder providing most of the savings. The central qualitative finding is a decode-work law showing that per-pair decoding cost is governed by the signed margin between boundaries, not polygon size — a clean, size-independent relationship on controlled data that becomes directional but weaker on real multi-scale geometry, while adversarial interlocking structures force full decoding. The work candidly identifies limitations, including a rejected pre-registered predictor and the modest predictive power of the margin law on real geometry.

Across real TIGER workloads, the progressive join decodes a median 5.9× fewer vertices than naive decompress-then-refine and 4.9× fewer than a single-approximation Brinkhoff baseline, while decoding only 6–29% of vertices and returning exactly the oracle’s pair set. The multi-level ladder amplifies Brinkhoff’s savings roughly fivefold by focusing decode on the small fraction of near-tangent pairs that a single coarse approximation cannot resolve. Median vertex decode reduction of 5.9× over naive refine and 4.9× over Brinkhoff’s single approximation, with all 31 workloads producing provably exact results. Decoded fraction of vertices spans 0.06 to 0.29, showing that the method avoids decompressing the vast majority of geometry. The multi-level ladder roughly quintuples Brinkhoff’s savings by resolving near-tangent pairs that a single coarse approximation leaves undecided.

The evaluation uses real TIGER workloads to compare progressive join decoding against a naive decompress-then-refine baseline and a single-approximation Brinkhoff baseline. The progressive join decodes only a small fraction of vertices while still returning the exact oracle pair set, demonstrating that it avoids decompressing most geometry without sacrificing correctness. The multi-level ladder further amplifies the savings by focusing decode on the small number of near-tangent pairs that a single coarse approximation cannot resolve.


Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding
Ready-to-use GPUs
Best Pricing

HyperAI Newsletters

Subscribe to our latest updates
We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning
Powered by MailChimp