HyperAIHyperAI

Command Palette

Search for a command to run...

La loi du travail de décodage : jointures spatiales exactes et prouvables, régies par la marge, sur géométrie compressée

Madhulatha Mandarapu Sandeep Kunkunuru

Résumé

Les jointures spatiales de type filtre-et-raffinement ont toujours évité de toucher la géométrie exacte pour les paires candidates certifiées, mais le domaine n'a jamais modélisé le coût de décompression des paires qui survivent au filtre. Lorsque la géométrie est stockée dans un codec compressé, multi-résolution et décodable progressivement, le coût réel de la jointure se mesure en octets décodés. Nous étudions les jointures d'intersection de polygones prouvablement exactes sur une échelle de niveau de détail (LOD) de Douglas–Peucker, certifiée par un test de marge de Hausdorff bilatéral, et apportons deux contributions. Premièrement, un mécanisme reproductible et un banc d'essai : sur des polygones réels de zones aquatiques du recensement américain TIGER, notre jointure par certificat progressif retourne le résultat exact tout en décodant 3,4 à 16,8 fois (médiane 5,9×) moins de sommets que l'approche naïve de décompression puis raffinement, et environ 4,9 fois moins que la méthode de référence en plusieurs étapes à approximation unique de Brinkhoff et al. [1994], sans aucune violation de justesse sur 31 charges de travail. Deuxièmement, une caractérisation que nous appelons la loi du travail de décodage : le travail de décodage est régi par la marge de dégagement signée de chaque paire — sa proximité au seuil de basculement du prédicat — indépendamment de la taille de l'objet, car le certificat ne descend dans l'échelle que jusqu'à ce que sa résolution batte la marge. La loi est nette sur géométrie contrôlée (R2R^2R2 en validation = 0,87, indépendant de la taille) et directionnelle sur données réelles (R20,55R^2 \approx 0,55R20,55). Nous explicitons ce qui ne tient pas : un prédicteur de sommets proches de la frontière est un mauvais modèle (nous en avons pré-enregistré un et l'avons rejeté), un prévisionniste de régime de sélectivité ne s'est pas matérialisé, et le pire cas est la borne triviale de lecture Ω(v)\Omega(v)Ω(v) sur des frontières entrelacées de manière adversariale. Nous apportons le mécanisme, une comptabilité honnête du budget de décodage, et un banc d'essai ouvert ; nous ne revendiquons pas un nouvel index.

One-sentence Summary

Researchers at VaidhyaMegha Private Limited present a progressive certificate join for provably-exact polygon intersection over compressed Douglas–Peucker LOD geometry that uses a two-sided Hausdorff-margin test, decodes 3.416.8×3.4\text{--}16.8\times3.416.8× fewer vertices with zero correctness violations, and establishes the decode-work law: decode work depends on signed-clearance margin, not object size (held-out R2=0.87R ^ { 2 } = 0 . 8 7R2=0.87, real data (R20.55)( R ^ { 2 } \approx 0 . 5 5 )(R20.55)) while a near-boundary-vertex predictor is rejected.

Key Contributions

  • A progressive certificate join over a Douglas–Peucker level-of-detail ladder with a two-sided Hausdorff-margin test returns exact set-equality on real U.S. Census TIGER water polygons while decoding 3.4–16.8× fewer vertices than naive decompress-then-refine and approximately 4.9× fewer than a single-approximation baseline, with zero correctness violations across 31 workloads.
  • The decode-work law shows that each candidate pair’s decode cost is governed by its signed-clearance margin (distance to the predicate-flip boundary), independent of polygon size, because the certificate descends the ladder only until its resolution beats the margin; the law holds cleanly on controlled geometry (R² = 0.87, size-independent) and directionally on real data (R² ≈ 0.55).
  • A transparent benchmark harness reports explicit negative findings: rejection of a pre-registered near-boundary vertex predictor, non-materialization of a selectivity-based regime forecaster, and the worst-case Ω(v) read bound for adversarially interleaved boundaries, all reproducible from a single command.

Introduction

Spatial joins are fundamental to geographic data systems, typically employing a filter-and-refine pipeline where a cheap minimum-bounding-rectangle step generates candidate pairs and an exact geometric test resolves them. Modern storage formats keep geometry heavily compressed, so the dominant cost of refinement is no longer CPU cycles or I/O but the number of coordinate bytes that must be decoded. While classic multi-step joins use a single intermediate approximation to avoid decoding full geometry for some pairs, prior work never modeled or minimized the decode cost of the candidates that survive filtering. The authors address this gap by introducing a progressive certificate join that builds a multi-resolution Douglas–Peucker ladder, certifying intersections or disjointness at the coarsest level possible and only decoding finer detail when the clearance margin demands it. The key contribution is a provably exact, decode-efficient algorithm that reduces vertex decodes by 3.4–16.8× over naive decompress-then-refine, along with an empirical decode-work law showing that the cost is governed by the per-pair signed-clearance margin rather than polygon size.

Method

The authors formulate the problem around pairs of polygons whose vertices are quantized to a fixed grid, eliminating any lossy rounding gap so that certificate pruning remains a sound approximation of the exact stored geometry. For each polygon, a level-of-detail (LOD) ladder is built using the Douglas–Peucker algorithm, which retains a subset of the exact vertices at each level \ell and guarantees a known Hausdorff distance η\eta_{\ell}η between the original polygon ggg and its level-\ell approximation g~\tilde{g}_{\ell}g~, with ηk=0\eta_k = 0ηk=0 at the finest level. Decoding to level \ell incurs a cost proportional to the cumulative vertices/bytes of that level.

The core mechanism is a two-sided certificate that exploits geometric error bounds. For a candidate pair (r,s)(r, s)(r,s) at levels (r,s)(\ell_r, \ell_s)(r,s), the method constructs an outer approximation Og=g~B(η)O_g = \tilde{g}_\ell \oplus B(\eta_\ell)Og=g~B(η) via Minkowski sum with a disc of radius η\eta_\ellη, giving a sound superset (gOgg \subseteq O_ggOg). Symmetrically, an inner approximation Ig=g~B(η)I_g = \tilde{g}_\ell \ominus B(\eta_\ell)Ig=g~B(η) is obtained through erosion. The pair is immediately resolved by testing these volumes: OrOs=O_r \cap O_s = \emptysetOrOs= certifies disjointness, while IrIsI_r \cap I_s \neq \emptysetIrIs= certifies intersection. When neither certificate holds, the pair is ambiguous and the algorithm descends to a finer level on the coarser-side polygon. At η=0\eta = 0η=0 the test reduces to the exact spatial predicate, and the soundness of the outer test is straightforward. Soundness of the inner (erosion) test is enforced by a correctness gate that verifies set-equality against a full-precision oracle on every run, together with an adversarial control procedure; a pre-registration rule mandates halting and hardening if any violation is detected, and during the reported experiments no violation occurred.

The depth at which a pair becomes decidable is governed by a signed-clearance margin

m(r,s)={+inradius(rs)if rs (overlap depth)dist(r,s)if disjoint (gap).m(r, s) = \begin{cases} +\operatorname{inradius}(r \cap s) & \text{if } r \cap s \neq \varnothing \text{ (overlap depth)} \\ -\operatorname{dist}(r, s) & \text{if disjoint (gap)}. \end{cases}m(r,s)={+inradius(rs)dist(r,s)if rs= (overlap depth)if disjoint (gap).

The absolute value m|m|m captures how far the pair is from the predicate-flip boundary. Since a level-\ell certificate can only resolve features coarser than η\eta_\ellη, the pair certifies at the first level where η<m\eta_\ell < |m|η<m. The primary efficiency metric is the decoded fraction ϕpair=decoded vertices/(r+s)\phi_{\text{pair}} = \text{decoded vertices} / (|\mathbf{r}| + |\mathbf{s}|)ϕpair=decoded vertices/(r+s) and the certifying depth.

To ensure a fair accounting of decode costs, all compared methods start from the same minimum bounding rectangle (MBR) candidate set, so measured differences arise solely from refinement depth. The byte tally includes everything a method reads, particularly the LOD headers that encode level structure. Two baselines are established: naive-refine decodes all vertices of every MBR survivor and then tests with exact precision, while the Brinkhoff baseline uses a two-level scheme (one coarse approximation followed by exact geometry). Ground truth is always evaluated at full precision, providing a consistent oracle for both verification and benchmark comparisons.

Experiment

The evaluation setup uses real U.S. TIGER water polygon self-joins under translation sweeps plus controlled synthetic and adversarial geometries, with pre-registration and exact Shapely/GEOS oracles. The progressive join decodes far fewer vertices than naive or single-approximation baselines while delivering provably exact results, with the multi-level level-of-detail ladder providing most of the savings. The central qualitative finding is a decode-work law showing that per-pair decoding cost is governed by the signed margin between boundaries, not polygon size — a clean, size-independent relationship on controlled data that becomes directional but weaker on real multi-scale geometry, while adversarial interlocking structures force full decoding. The work candidly identifies limitations, including a rejected pre-registered predictor and the modest predictive power of the margin law on real geometry.

Across real TIGER workloads, the progressive join decodes a median 5.9× fewer vertices than naive decompress-then-refine and 4.9× fewer than a single-approximation Brinkhoff baseline, while decoding only 6–29% of vertices and returning exactly the oracle’s pair set. The multi-level ladder amplifies Brinkhoff’s savings roughly fivefold by focusing decode on the small fraction of near-tangent pairs that a single coarse approximation cannot resolve. Median vertex decode reduction of 5.9× over naive refine and 4.9× over Brinkhoff’s single approximation, with all 31 workloads producing provably exact results. Decoded fraction of vertices spans 0.06 to 0.29, showing that the method avoids decompressing the vast majority of geometry. The multi-level ladder roughly quintuples Brinkhoff’s savings by resolving near-tangent pairs that a single coarse approximation leaves undecided.

The evaluation uses real TIGER workloads to compare progressive join decoding against a naive decompress-then-refine baseline and a single-approximation Brinkhoff baseline. The progressive join decodes only a small fraction of vertices while still returning the exact oracle pair set, demonstrating that it avoids decompressing most geometry without sacrificing correctness. The multi-level ladder further amplifies the savings by focusing decode on the small number of near-tangent pairs that a single coarse approximation cannot resolve.


Créer de l'IA avec l'IA

De l'idée au lancement — accélérez votre développement IA avec le co-codage IA gratuit, un environnement prêt à l'emploi et le meilleur prix pour les GPU.

Codage assisté par IA
GPU prêts à l’emploi
Tarifs les plus avantageux

HyperAI Newsletters

Abonnez-vous à nos dernières mises à jour
Nous vous enverrons les dernières mises à jour de la semaine dans votre boîte de réception à neuf heures chaque lundi matin
Propulsé par MailChimp