Files
openvistapro/docs/research/geotiff-import-strategy.md
2026-05-16 16:00:51 -04:00

13 KiB
Raw Permalink Blame History

GeoTIFF Import Strategy Research Note

Status: Decision implemented. The recommended pure-Rust path has landed as the optional import-geotiff feature; the crate survey below is kept for provenance.

Context: Phase 4 Milestone C, Task C3 ("GeoTIFF importer behind an optional feature"). This note recorded the crate survey and recommended direction; the "Implementation notes" section below describes what was actually built.

Recommendation

Start with a pure-Rust, opt-in GeoTIFF read experiment behind an import-geotiff feature, using geotiff-reader (with its geotiff-core support crate) as the first parser. Do not pull GDAL into the default build, the test suite, or CI. Treat any GDAL-backed path as future, separately reviewed work.

Rationale:

  • No native dependency in the default build. Phase 4 explicitly requires default builds to stay clean and fast (cargo fmt --check, cargo test, cargo clippy, git diff --check). A pure-Rust crate keeps GeoTIFF support fully optional and removes the C toolchain / libgdal install burden from contributors and CI.
  • License compatibility. geotiff-reader is MIT OR Apache-2.0, matching the project's clean-room, permissively licensed posture. GDAL itself is MIT-licensed at the library level, but a vendored/native build drags in a large transitive dependency surface that is harder to audit.
  • Scope fit. The importer only needs to produce a HeightGrid plus TerrainSourceMetadata from a single-band elevation raster. That is a narrow read-only subset; a full GDAL binding is overkill for the spike.
  • Reversible. Keeping GeoTIFF behind a feature flag means the crate choice can be swapped (or upgraded to GDAL) later without disturbing default users.

The first slice should parse a tiny generated synthetic GeoTIFF fixture only — no proprietary VistaPro data, no large downloaded DEMs.

Decision matrix

Facts below are from cargo info at spike time. "GeoTIFF semantics" means the crate understands GeoTIFF's geo-tags (model tie point / pixel scale, CRS, elevation band), not just baseline TIFF tags.

Pure-Rust candidates

Crate Version License MSRV Native deps GeoTIFF semantics Notes
geotiff-reader (+ geotiff-core) 0.4.0 MIT OR Apache-2.0 rust 1.77 None Yes (read-only) Recommended. Pure Rust, read-only; optional cog feature pulls reqwest for Cloud-Optimized GeoTIFF over HTTP — leave cog off.
tiff-reader 0.4.0 MIT OR Apache-2.0 None ("no C deps") Partial / TIFF-level Pure-Rust TIFF reader, no C deps; useful fallback or low-level reader, but not a full GeoTIFF semantics layer on its own.
geotiff (georust, older) 0.1.0 MIT rust 1.70 None Yes (limited) Older, low-version georust crate; narrow API and stale. Acceptable fallback only if geotiff-reader regresses.
image TIFF (image-rs/tiff) 0.10.3 MIT rust 1.74 None No Robust pure-Rust TIFF decode/encode, already adjacent to the project's image use, but it does not interpret GeoTIFF geo-tags. Could back a hand-rolled geo-tag reader if needed.
wbraster 0.1.5 MIT OR Apache-2.0 None (Rust) Raster I/O Whitebox raster abstraction; broad raster model, early-stage (0.1.x), heavier than the spike needs.
wbgeotiff 0.1.2 MIT OR Apache-2.0 None (Rust) Yes Whitebox GeoTIFF reader/writer; very early (0.1.2). Watch as an alternative if geotiff-reader stalls.
georaster 0.2.0 MIT/Apache-2.0 None Yes (raster) Pure-Rust georeferenced raster reader; reasonable secondary option, still 0.2.x.

GDAL-backed candidates

Crate Version License MSRV Native deps Notes
gdal 0.19.0 MIT rust 1.80 Yes — system libgdal, or gdal-src/bindgen to build from source Mature, full-format coverage. Native GDAL is the main risk: install friction, version skew, large audit surface, brittle CI. Optional gdal-src vendoring trades that for long C/C++ build times and a bindgen/Clang requirement.
gdal-sys 0.12.0 MIT rust 1.77 Yes — raw FFI bindings to libgdal Low-level; never used directly here. Same native risk as gdal.
rstiff 0.2.0 MIT Yes — "powered by GDAL" A GeoTIFF-focused wrapper, but still GDAL-backed, so it carries the same native dependency risk; early-stage (0.2.x).
oxigdal-geotiff 0.1.4 Apache-2.0 rust 1.85 None declared; pure-Rust OxiGDAL driver Newest entrant, high MSRV (rust 1.85), very early (0.1.4). Too immature and too new-MSRV for the spike even though it avoids native GDAL.

Licensing assessment

  • Project posture: clean-room, open-source, permissively licensed (see docs/legal/asset-policy.md).
  • geotiff-reader / geotiff-core, tiff-reader, wbraster, wbgeotiff, and georaster are all MIT OR Apache-2.0 (or MIT/Apache-2.0) — fully compatible.
  • image's TIFF crate, geotiff (georust), gdal, gdal-sys, and rstiff are MIT — compatible.
  • oxigdal-geotiff is Apache-2.0 only — compatible, but it removes the dual MIT/Apache option for downstream consumers of that path.
  • No copyleft crate appears in the survey. The recommended geotiff-reader path keeps the dual MIT OR Apache-2.0 license clean throughout.
  • Fixtures, not crates, are the real licensing hazard. Per asset-policy.md, do not commit proprietary sample landscapes or data. GeoTIFF test inputs must be synthetic and generated by this project, or public-domain (USGS/NASA) and redistributed under documented source terms.

Native dependency risk

  • A GDAL-backed crate (gdal, gdal-sys, rstiff) requires either a system libgdal (version drift across developer machines, distros, and CI) or a source build via gdal-src (long C/C++ compile, bindgen + Clang toolchain). Either path makes default builds slow or fragile and enlarges the audited dependency surface.
  • Phase 4's plan (Task C3) explicitly anticipates this: "If GDAL is required, keep the feature opt-in and document native setup," and the cross-milestone checks expect cargo test --features import-geotiff to either pass or produce "a documented skip if native GDAL is not installed."
  • Mitigation: the recommended pure-Rust path has zero native dependencies, so import-geotiff builds and tests run anywhere cargo runs. The "documented skip" escape hatch is then unnecessary for the chosen path and is reserved only for a hypothetical future GDAL feature.

Feature flag plan

Consistent with Task C1's existing feature scaffold (import-dem, import-hgt, import-geotiff under default = []):

  • import-geotiffthe spike target. Enables the pure-Rust geotiff-reader + geotiff-core dependency and the src/import/geotiff.rs module. Off by default.
  • Do not enable geotiff-reader's optional cog feature; it pulls reqwest and network/TLS dependencies that are out of scope for local file import.
  • Reserve a separate, not-yet-created feature name (e.g. import-geotiff-gdal) for any future GDAL-backed path so the two backends never share a flag. This keeps the native-dependency build strictly opt-in and clearly labeled.
  • Default and --no-default-features builds must remain GeoTIFF-free; only cargo test --features import-geotiff and --all-features exercise it.

Fixture and test strategy

  • Synthetic only. Generate a tiny single-band elevation GeoTIFF (e.g. 3×3 or 4×4, signed/float elevation samples, a simple model tie point + pixel scale, a basic CRS tag) as a project-owned fixture. This mirrors the HGT approach in Task B1, which uses inline synthetic bytes.
  • Generation, not download. Prefer generating the fixture programmatically (a small build helper, a checked-in generator test, or a documented one-off). If a writer crate is needed, image's TIFF encoder or wbgeotiff can produce baseline bytes; keep any generator code clearly separate from the parser slice per the plan's commit strategy.
  • What to assert: parsed dimensions, elevation unit, a couple of known sample values, and that geo metadata maps into TerrainSourceMetadata (format "geotiff", width, height, ElevationUnit).
  • Malformed-input tests: truncated file, missing geo-tags, unsupported multi-band or compressed input → typed ImportError variants, matching the rejection style of Task B2.
  • No proprietary data, no large DEMs in the repo. A real USGS/NASA tile may be used locally for manual verification but must stay out of git (reference/, .work/ are already ignored).
  • The committed fixture should be only as large as the test needs (target: well under a kilobyte).

Implementation notes

The recommended pure-Rust path has landed. What was built:

  1. Feature + dependency. Cargo.toml declares import-geotiff = ["dep:geotiff-reader"]; geotiff-reader is an optional dependency built with its local feature and default-features = false, so the network-oriented cog feature stays off. Default and --no-default-features builds remain GeoTIFF-free.
  2. src/import/geotiff.rs. A cfg(feature = "import-geotiff") module exposing parse_geotiff_bytes(&[u8]) -> Result<ImportedTerrain, ImportError>, reusing the ImportedTerrain / ImportError / TerrainSourceMetadata types from Milestone A. It reads the payload in memory, rejects rasters that are not single-band, decodes the raster as f32 into a row-major HeightGrid, and records TerrainSourceMetadata with format "geotiff". Reader and decode failures map to ImportError::MalformedSource.
  3. Synthetic fixtures only. No GeoTIFF binary is committed. Tests generate a tiny single-band elevation tile in memory with the geotiff-writer dev-dependency (alongside ndarray) and parse it back — mirroring the inline-bytes approach used for HGT. Real USGS/NASA tiles may be used for local manual verification only, under the already-ignored reference/ and .work/ directories.
  4. Tests. cargo test geotiff --features import-geotiff asserts dimensions, sample values, and metadata, and checks that non-GeoTIFF input yields a typed ImportError::MalformedSource.
  5. Supported subset. Only the narrow single-band f32 case is parsed; broader real-world GeoTIFF coverage (multi-band, exotic compression, full geo-tag interpretation) remains future work.
  6. openvistapro info. supported_importers() and info_text() list geotiff only when the feature is built, keeping info honest.
  7. GDAL is not part of the landed feature. The importer has zero native dependencies; a GDAL-backed path stays out of scope until a separate review approves the native-dependency cost, and would use its own opt-in feature name rather than import-geotiff.

Not yet done: CLI render plumbing. A render --import-geotiff <path> input mirroring the HGT pattern (Task B3) is still future work; the parser is reachable through the library API and reported by info, but not yet wired into render.

If geotiff-reader later proves insufficient (missing geo-tag coverage, API gaps), the documented fallback order remains georasterwbgeotiff → hand-rolled geo-tag reading on top of image's TIFF decoder.

Sources

Survey performed via cargo info; verify current versions before implementation.