From 03c25d9b627690d12d584fcd1e35d0b1be1dd231 Mon Sep 17 00:00:00 2001 From: Hermes Agent Date: Fri, 15 May 2026 20:51:56 -0400 Subject: [PATCH] docs: evaluate geotiff importer strategy --- docs/research/geotiff-import-strategy.md | 195 +++++++++++++++++++++++ 1 file changed, 195 insertions(+) create mode 100644 docs/research/geotiff-import-strategy.md diff --git a/docs/research/geotiff-import-strategy.md b/docs/research/geotiff-import-strategy.md new file mode 100644 index 0000000..6e5559c --- /dev/null +++ b/docs/research/geotiff-import-strategy.md @@ -0,0 +1,195 @@ +# GeoTIFF Import Strategy Research Note + +> **Status:** Spike / decision note. Documentation-only — no code or `Cargo.toml` +> changes accompany this note. +> +> **Context:** Phase 4 Milestone C, Task C3 ("Add GeoTIFF spike behind an +> optional feature"). This note records the crate survey and the recommended +> direction so the implementation slice can start from a settled decision. + +## Recommendation + +**Start with a pure-Rust, opt-in GeoTIFF read experiment behind an +`import-geotiff` feature, using `geotiff-reader` (with its `geotiff-core` +support crate) as the first parser.** Do not pull GDAL into the default build, +the test suite, or CI. Treat any GDAL-backed path as future, separately +reviewed work. + +Rationale: + +- **No native dependency in the default build.** Phase 4 explicitly requires + default builds to stay clean and fast (`cargo fmt --check`, `cargo test`, + `cargo clippy`, `git diff --check`). A pure-Rust crate keeps GeoTIFF support + fully optional and removes the C toolchain / `libgdal` install burden from + contributors and CI. +- **License compatibility.** `geotiff-reader` is `MIT OR Apache-2.0`, matching + the project's clean-room, permissively licensed posture. GDAL itself is + MIT-licensed at the library level, but a vendored/native build drags in a + large transitive dependency surface that is harder to audit. +- **Scope fit.** The importer only needs to produce a `HeightGrid` plus + `TerrainSourceMetadata` from a single-band elevation raster. That is a narrow + read-only subset; a full GDAL binding is overkill for the spike. +- **Reversible.** Keeping GeoTIFF behind a feature flag means the crate choice + can be swapped (or upgraded to GDAL) later without disturbing default users. + +The first slice should parse a **tiny generated synthetic GeoTIFF fixture** +only — no proprietary VistaPro data, no large downloaded DEMs. + +## Decision matrix + +Facts below are from `cargo info` at spike time. "GeoTIFF semantics" means the +crate understands GeoTIFF's geo-tags (model tie point / pixel scale, CRS, +elevation band), not just baseline TIFF tags. + +### Pure-Rust candidates + +| Crate | Version | License | MSRV | Native deps | GeoTIFF semantics | Notes | +|---|---|---|---|---|---|---| +| `geotiff-reader` (+ `geotiff-core`) | 0.4.0 | MIT OR Apache-2.0 | rust 1.77 | None | Yes (read-only) | **Recommended.** Pure Rust, read-only; optional `cog` feature pulls `reqwest` for Cloud-Optimized GeoTIFF over HTTP — leave `cog` off. | +| `tiff-reader` | 0.4.0 | MIT OR Apache-2.0 | — | None ("no C deps") | Partial / TIFF-level | Pure-Rust TIFF reader, no C deps; useful fallback or low-level reader, but not a full GeoTIFF semantics layer on its own. | +| `geotiff` (georust, older) | 0.1.0 | MIT | rust 1.70 | None | Yes (limited) | Older, low-version georust crate; narrow API and stale. Acceptable fallback only if `geotiff-reader` regresses. | +| `image` TIFF (`image-rs/tiff`) | 0.10.3 | MIT | rust 1.74 | None | No | Robust pure-Rust TIFF decode/encode, already adjacent to the project's `image` use, but it does **not** interpret GeoTIFF geo-tags. Could back a hand-rolled geo-tag reader if needed. | +| `wbraster` | 0.1.5 | MIT OR Apache-2.0 | — | None (Rust) | Raster I/O | Whitebox raster abstraction; broad raster model, early-stage (0.1.x), heavier than the spike needs. | +| `wbgeotiff` | 0.1.2 | MIT OR Apache-2.0 | — | None (Rust) | Yes | Whitebox GeoTIFF reader/writer; very early (0.1.2). Watch as an alternative if `geotiff-reader` stalls. | +| `georaster` | 0.2.0 | MIT/Apache-2.0 | — | None | Yes (raster) | Pure-Rust georeferenced raster reader; reasonable secondary option, still 0.2.x. | + +### GDAL-backed candidates + +| Crate | Version | License | MSRV | Native deps | Notes | +|---|---|---|---|---|---| +| `gdal` | 0.19.0 | MIT | rust 1.80 | **Yes** — system `libgdal`, or `gdal-src`/`bindgen` to build from source | Mature, full-format coverage. Native GDAL is the main risk: install friction, version skew, large audit surface, brittle CI. Optional `gdal-src` vendoring trades that for long C/C++ build times and a `bindgen`/Clang requirement. | +| `gdal-sys` | 0.12.0 | MIT | rust 1.77 | **Yes** — raw FFI bindings to `libgdal` | Low-level; never used directly here. Same native risk as `gdal`. | +| `rstiff` | 0.2.0 | MIT | — | **Yes** — "powered by GDAL" | A GeoTIFF-focused wrapper, but still GDAL-backed, so it carries the same native dependency risk; early-stage (0.2.x). | +| `oxigdal-geotiff` | 0.1.4 | Apache-2.0 | rust 1.85 | None declared; pure-Rust OxiGDAL driver | Newest entrant, high MSRV (rust 1.85), very early (0.1.4). Too immature and too new-MSRV for the spike even though it avoids native GDAL. | + +## Licensing assessment + +- Project posture: clean-room, open-source, permissively licensed (see + [`docs/legal/asset-policy.md`](../legal/asset-policy.md)). +- `geotiff-reader` / `geotiff-core`, `tiff-reader`, `wbraster`, `wbgeotiff`, and + `georaster` are all `MIT OR Apache-2.0` (or `MIT/Apache-2.0`) — fully + compatible. +- `image`'s TIFF crate, `geotiff` (georust), `gdal`, `gdal-sys`, and `rstiff` + are MIT — compatible. +- `oxigdal-geotiff` is Apache-2.0 only — compatible, but it removes the dual + MIT/Apache option for downstream consumers of that path. +- No copyleft crate appears in the survey. The recommended `geotiff-reader` + path keeps the dual `MIT OR Apache-2.0` license clean throughout. +- **Fixtures, not crates, are the real licensing hazard.** Per + `asset-policy.md`, do not commit proprietary sample landscapes or data. + GeoTIFF test inputs must be synthetic and generated by this project, or + public-domain (USGS/NASA) and redistributed under documented source terms. + +## Native dependency risk + +- A GDAL-backed crate (`gdal`, `gdal-sys`, `rstiff`) requires either a system + `libgdal` (version drift across developer machines, distros, and CI) or a + source build via `gdal-src` (long C/C++ compile, `bindgen` + Clang + toolchain). Either path makes default builds slow or fragile and enlarges the + audited dependency surface. +- Phase 4's plan (Task C3) explicitly anticipates this: "If GDAL is required, + keep the feature opt-in and document native setup," and the cross-milestone + checks expect `cargo test --features import-geotiff` to either pass *or* + produce "a documented skip if native GDAL is not installed." +- **Mitigation:** the recommended pure-Rust path has **zero** native + dependencies, so `import-geotiff` builds and tests run anywhere `cargo` runs. + The "documented skip" escape hatch is then unnecessary for the chosen path + and is reserved only for a hypothetical future GDAL feature. + +## Feature flag plan + +Consistent with Task C1's existing feature scaffold (`import-dem`, +`import-hgt`, `import-geotiff` under `default = []`): + +- `import-geotiff` — **the spike target.** Enables the pure-Rust + `geotiff-reader` + `geotiff-core` dependency and the `src/import/geotiff.rs` + module. Off by default. +- Do **not** enable `geotiff-reader`'s optional `cog` feature; it pulls + `reqwest` and network/TLS dependencies that are out of scope for local file + import. +- Reserve a separate, *not-yet-created* feature name (e.g. `import-geotiff-gdal`) + for any future GDAL-backed path so the two backends never share a flag. This + keeps the native-dependency build strictly opt-in and clearly labeled. +- Default and `--no-default-features` builds must remain GeoTIFF-free; only + `cargo test --features import-geotiff` and `--all-features` exercise it. + +## Fixture and test strategy + +- **Synthetic only.** Generate a tiny single-band elevation GeoTIFF (e.g. 3×3 + or 4×4, signed/float elevation samples, a simple model tie point + pixel + scale, a basic CRS tag) as a project-owned fixture. This mirrors the HGT + approach in Task B1, which uses inline synthetic bytes. +- **Generation, not download.** Prefer generating the fixture programmatically + (a small build helper, a checked-in generator test, or a documented + one-off). If a writer crate is needed, `image`'s TIFF encoder or `wbgeotiff` + can produce baseline bytes; keep any generator code clearly separate from the + parser slice per the plan's commit strategy. +- **What to assert:** parsed dimensions, elevation unit, a couple of known + sample values, and that geo metadata maps into `TerrainSourceMetadata` + (format `"geotiff"`, width, height, `ElevationUnit`). +- **Malformed-input tests:** truncated file, missing geo-tags, unsupported + multi-band or compressed input → typed `ImportError` variants, matching the + rejection style of Task B2. +- **No proprietary data, no large DEMs in the repo.** A real USGS/NASA tile may + be used locally for manual verification but must stay out of git + (`reference/`, `.work/` are already ignored). +- The committed fixture should be only as large as the test needs (target: + well under a kilobyte). + +## Follow-up implementation slice + +Concrete next slice for Task C3, to be done TDD (RED → minimal GREEN → +validation) on a separate implementation branch: + +1. **Add the feature + dependency.** In `Cargo.toml`, make `geotiff-reader` an + optional dependency gated by `import-geotiff` (default `cog` feature + disabled). Confirm `cargo test --no-default-features` and the default build + stay GeoTIFF-free. +2. **Create `src/import/geotiff.rs`.** A `cfg(feature = "import-geotiff")` + module exposing something like + `parse_geotiff_bytes(&[u8]) -> Result`, + reusing the `ImportedTerrain` / `ImportError` / `TerrainSourceMetadata` + types from Milestone A. +3. **Generate the synthetic fixture.** Add a tiny generated single-band + elevation GeoTIFF as a project-owned fixture (separate commit from the + parser, per the plan's commit strategy). +4. **RED test.** `cargo test geotiff --features import-geotiff` — assert + dimensions, sample values, and metadata; assert malformed input yields a + typed error. +5. **Minimal GREEN.** Parse only the tiny supported subset (single band, + uncompressed, known sample type). Reject everything else explicitly; + document that broad real-world GeoTIFF coverage is future work. +6. **CLI plumbing (optional, can be a later slice).** Mirror the HGT pattern + (Task B3): an `--import-geotiff ` render input, gated so it only + appears when the feature is built. +7. **Validation.** `cargo test --no-default-features`, + `cargo test --features import-geotiff`, `cargo test --all-features`, + `cargo fmt --check`, `cargo clippy --all-targets -- -D warnings`, + `git diff --check`. +8. **Update `openvistapro info`** to report GeoTIFF support honestly — only + once a real parser and fixture exist (consistent with Task A3). + +If `geotiff-reader` proves insufficient during the spike (missing geo-tag +coverage, API gaps), fall back in this order: `georaster` → `wbgeotiff` → +hand-rolled geo-tag reading on top of `image`'s TIFF decoder. A GDAL-backed +path stays out of scope until a separate review approves the native-dependency +cost. + +## Sources + +Survey performed via `cargo info`; verify current versions before +implementation. + +- `geotiff-reader` — · +- `geotiff-core` — · +- `tiff-reader` — · +- `geotiff` (georust) — · · +- `image` TIFF crate — · · +- `georaster` — · +- `wbraster` — · +- `wbgeotiff` — · +- `oxigdal-geotiff` — · +- `gdal` — · · +- `gdal-sys` — · +- `rstiff` — · +- Project constraints — [`docs/legal/asset-policy.md`](../legal/asset-policy.md), [`docs/plans/phase-4-formats-scripts-ui.md`](../plans/phase-4-formats-scripts-ui.md) -- 2.39.5