HyperAIHyperAI

Command Palette

Search for a command to run...

デコードワークの法則:圧縮幾何データに対するマージン支配型・証明可能な厳密空間結合

Madhulatha Mandarapu Sandeep Kunkunuru

概要

フィルタリング・精緻化型の空間結合では、常に認証された候補ペアに対して厳密な幾何データに触れることを避けてきたが、フィルタを通過したペアの解凍コストをモデル化したことはなかった。幾何データが圧縮された段階的デコード可能な多重解像度コーデックで保存されている場合、結合の真のコストはデコードされたバイト数となる。我々は、ダグラス・ポイカー詳細度(LOD)ラダー上での多角形交差結合を、両側ハウスドルフマージンテストによって認証し、証明可能な厳密性を保証する手法を研究し、二つの貢献を行う。第一に、再現可能なメカニズムとハーネス:実際の米国国勢調査TIGER水域ポリゴンを用いた実験において、我々の段階的認証結合は、単純な解凍後精緻化と比較して3.4~16.8倍(中央値5.9倍)少ない頂点数で厳密な結果を返し、Brinkhoffら[1994]の単一近似多段階ベースラインと比較して約4.9倍少ない頂点数で、31のワークロード全体で正当性違反ゼロを達成した。第二に、我々がデコードワークの法則と呼ぶ特性:デコード作業量は、各ペアの符号付きクリアランスマージン(述語反転境界への近さ)によって支配され、オブジェクトサイズには依存しない。なぜなら、認証はその解像度がマージンを上回るまでラダーを降りるだけだからである。この法則は、制御された幾何データでは明瞭であり(ホールドアウトR2=0.87R^{2}=0.87R2=0.87、サイズ非依存)、実データでは方向性を示す(R20.55R^{2}\approx0.55R20.55)。我々は成立しないことを明示する:近境界頂点予測子は誤ったモデルであり(事前登録し棄却した)、選択率レジーム予測子は具現化せず、最悪ケースは敵対的に交錯する境界に対する自明なΩ(v)\Omega(v)Ω(v)の読み取り下限である。我々はメカニズム、予算を考慮した正直なデコード会計、およびオープンハーネスを提供するが、新しいインデックスを主張するものではない。

One-sentence Summary

Researchers at VaidhyaMegha Private Limited present a progressive certificate join for provably-exact polygon intersection over compressed Douglas–Peucker LOD geometry that uses a two-sided Hausdorff-margin test, decodes 3.416.8×3.4\text{--}16.8\times3.416.8× fewer vertices with zero correctness violations, and establishes the decode-work law: decode work depends on signed-clearance margin, not object size (held-out R2=0.87R ^ { 2 } = 0 . 8 7R2=0.87, real data (R20.55)( R ^ { 2 } \approx 0 . 5 5 )(R20.55)) while a near-boundary-vertex predictor is rejected.

Key Contributions

  • A progressive certificate join over a Douglas–Peucker level-of-detail ladder with a two-sided Hausdorff-margin test returns exact set-equality on real U.S. Census TIGER water polygons while decoding 3.4–16.8× fewer vertices than naive decompress-then-refine and approximately 4.9× fewer than a single-approximation baseline, with zero correctness violations across 31 workloads.
  • The decode-work law shows that each candidate pair’s decode cost is governed by its signed-clearance margin (distance to the predicate-flip boundary), independent of polygon size, because the certificate descends the ladder only until its resolution beats the margin; the law holds cleanly on controlled geometry (R² = 0.87, size-independent) and directionally on real data (R² ≈ 0.55).
  • A transparent benchmark harness reports explicit negative findings: rejection of a pre-registered near-boundary vertex predictor, non-materialization of a selectivity-based regime forecaster, and the worst-case Ω(v) read bound for adversarially interleaved boundaries, all reproducible from a single command.

Introduction

Spatial joins are fundamental to geographic data systems, typically employing a filter-and-refine pipeline where a cheap minimum-bounding-rectangle step generates candidate pairs and an exact geometric test resolves them. Modern storage formats keep geometry heavily compressed, so the dominant cost of refinement is no longer CPU cycles or I/O but the number of coordinate bytes that must be decoded. While classic multi-step joins use a single intermediate approximation to avoid decoding full geometry for some pairs, prior work never modeled or minimized the decode cost of the candidates that survive filtering. The authors address this gap by introducing a progressive certificate join that builds a multi-resolution Douglas–Peucker ladder, certifying intersections or disjointness at the coarsest level possible and only decoding finer detail when the clearance margin demands it. The key contribution is a provably exact, decode-efficient algorithm that reduces vertex decodes by 3.4–16.8× over naive decompress-then-refine, along with an empirical decode-work law showing that the cost is governed by the per-pair signed-clearance margin rather than polygon size.

Method

The authors formulate the problem around pairs of polygons whose vertices are quantized to a fixed grid, eliminating any lossy rounding gap so that certificate pruning remains a sound approximation of the exact stored geometry. For each polygon, a level-of-detail (LOD) ladder is built using the Douglas–Peucker algorithm, which retains a subset of the exact vertices at each level \ell and guarantees a known Hausdorff distance η\eta_{\ell}η between the original polygon ggg and its level-\ell approximation g~\tilde{g}_{\ell}g~, with ηk=0\eta_k = 0ηk=0 at the finest level. Decoding to level \ell incurs a cost proportional to the cumulative vertices/bytes of that level.

The core mechanism is a two-sided certificate that exploits geometric error bounds. For a candidate pair (r,s)(r, s)(r,s) at levels (r,s)(\ell_r, \ell_s)(r,s), the method constructs an outer approximation Og=g~B(η)O_g = \tilde{g}_\ell \oplus B(\eta_\ell)Og=g~B(η) via Minkowski sum with a disc of radius η\eta_\ellη, giving a sound superset (gOgg \subseteq O_ggOg). Symmetrically, an inner approximation Ig=g~B(η)I_g = \tilde{g}_\ell \ominus B(\eta_\ell)Ig=g~B(η) is obtained through erosion. The pair is immediately resolved by testing these volumes: OrOs=O_r \cap O_s = \emptysetOrOs= certifies disjointness, while IrIsI_r \cap I_s \neq \emptysetIrIs= certifies intersection. When neither certificate holds, the pair is ambiguous and the algorithm descends to a finer level on the coarser-side polygon. At η=0\eta = 0η=0 the test reduces to the exact spatial predicate, and the soundness of the outer test is straightforward. Soundness of the inner (erosion) test is enforced by a correctness gate that verifies set-equality against a full-precision oracle on every run, together with an adversarial control procedure; a pre-registration rule mandates halting and hardening if any violation is detected, and during the reported experiments no violation occurred.

The depth at which a pair becomes decidable is governed by a signed-clearance margin

m(r,s)={+inradius(rs)if rs (overlap depth)dist(r,s)if disjoint (gap).m(r, s) = \begin{cases} +\operatorname{inradius}(r \cap s) & \text{if } r \cap s \neq \varnothing \text{ (overlap depth)} \\ -\operatorname{dist}(r, s) & \text{if disjoint (gap)}. \end{cases}m(r,s)={+inradius(rs)dist(r,s)if rs= (overlap depth)if disjoint (gap).

The absolute value m|m|m captures how far the pair is from the predicate-flip boundary. Since a level-\ell certificate can only resolve features coarser than η\eta_\ellη, the pair certifies at the first level where η<m\eta_\ell < |m|η<m. The primary efficiency metric is the decoded fraction ϕpair=decoded vertices/(r+s)\phi_{\text{pair}} = \text{decoded vertices} / (|\mathbf{r}| + |\mathbf{s}|)ϕpair=decoded vertices/(r+s) and the certifying depth.

To ensure a fair accounting of decode costs, all compared methods start from the same minimum bounding rectangle (MBR) candidate set, so measured differences arise solely from refinement depth. The byte tally includes everything a method reads, particularly the LOD headers that encode level structure. Two baselines are established: naive-refine decodes all vertices of every MBR survivor and then tests with exact precision, while the Brinkhoff baseline uses a two-level scheme (one coarse approximation followed by exact geometry). Ground truth is always evaluated at full precision, providing a consistent oracle for both verification and benchmark comparisons.

Experiment

The evaluation setup uses real U.S. TIGER water polygon self-joins under translation sweeps plus controlled synthetic and adversarial geometries, with pre-registration and exact Shapely/GEOS oracles. The progressive join decodes far fewer vertices than naive or single-approximation baselines while delivering provably exact results, with the multi-level level-of-detail ladder providing most of the savings. The central qualitative finding is a decode-work law showing that per-pair decoding cost is governed by the signed margin between boundaries, not polygon size — a clean, size-independent relationship on controlled data that becomes directional but weaker on real multi-scale geometry, while adversarial interlocking structures force full decoding. The work candidly identifies limitations, including a rejected pre-registered predictor and the modest predictive power of the margin law on real geometry.

Across real TIGER workloads, the progressive join decodes a median 5.9× fewer vertices than naive decompress-then-refine and 4.9× fewer than a single-approximation Brinkhoff baseline, while decoding only 6–29% of vertices and returning exactly the oracle’s pair set. The multi-level ladder amplifies Brinkhoff’s savings roughly fivefold by focusing decode on the small fraction of near-tangent pairs that a single coarse approximation cannot resolve. Median vertex decode reduction of 5.9× over naive refine and 4.9× over Brinkhoff’s single approximation, with all 31 workloads producing provably exact results. Decoded fraction of vertices spans 0.06 to 0.29, showing that the method avoids decompressing the vast majority of geometry. The multi-level ladder roughly quintuples Brinkhoff’s savings by resolving near-tangent pairs that a single coarse approximation leaves undecided.

The evaluation uses real TIGER workloads to compare progressive join decoding against a naive decompress-then-refine baseline and a single-approximation Brinkhoff baseline. The progressive join decodes only a small fraction of vertices while still returning the exact oracle pair set, demonstrating that it avoids decompressing most geometry without sacrificing correctness. The multi-level ladder further amplifies the savings by focusing decode on the small number of near-tangent pairs that a single coarse approximation cannot resolve.


AIでAIを構築

アイデアからローンチまで — 無料のAIコーディング支援、すぐに使える環境、最高のGPU価格でAI開発を加速。

AI コーディング補助
すぐに使える GPU
最適な料金体系

HyperAI Newsletters

最新情報を購読する
北京時間 毎週月曜日の午前9時 に、その週の最新情報をメールでお届けします
メール配信サービスは MailChimp によって提供されています