docs: evaluate GeoTIFF importer strategy #4

Merged
moldybits merged 1 commits from spike/geotiff-import-strategy-t_a4a59fc5 into main 2026-05-15 21:12:49 -04:00
Showing only changes of commit 03c25d9b62 - Show all commits
+195
View File
@@ -0,0 +1,195 @@
# GeoTIFF Import Strategy Research Note
> **Status:** Spike / decision note. Documentation-only — no code or `Cargo.toml`
> changes accompany this note.
>
> **Context:** Phase 4 Milestone C, Task C3 ("Add GeoTIFF spike behind an
> optional feature"). This note records the crate survey and the recommended
> direction so the implementation slice can start from a settled decision.
## Recommendation
**Start with a pure-Rust, opt-in GeoTIFF read experiment behind an
`import-geotiff` feature, using `geotiff-reader` (with its `geotiff-core`
support crate) as the first parser.** Do not pull GDAL into the default build,
the test suite, or CI. Treat any GDAL-backed path as future, separately
reviewed work.
Rationale:
- **No native dependency in the default build.** Phase 4 explicitly requires
default builds to stay clean and fast (`cargo fmt --check`, `cargo test`,
`cargo clippy`, `git diff --check`). A pure-Rust crate keeps GeoTIFF support
fully optional and removes the C toolchain / `libgdal` install burden from
contributors and CI.
- **License compatibility.** `geotiff-reader` is `MIT OR Apache-2.0`, matching
the project's clean-room, permissively licensed posture. GDAL itself is
MIT-licensed at the library level, but a vendored/native build drags in a
large transitive dependency surface that is harder to audit.
- **Scope fit.** The importer only needs to produce a `HeightGrid` plus
`TerrainSourceMetadata` from a single-band elevation raster. That is a narrow
read-only subset; a full GDAL binding is overkill for the spike.
- **Reversible.** Keeping GeoTIFF behind a feature flag means the crate choice
can be swapped (or upgraded to GDAL) later without disturbing default users.
The first slice should parse a **tiny generated synthetic GeoTIFF fixture**
only — no proprietary VistaPro data, no large downloaded DEMs.
## Decision matrix
Facts below are from `cargo info` at spike time. "GeoTIFF semantics" means the
crate understands GeoTIFF's geo-tags (model tie point / pixel scale, CRS,
elevation band), not just baseline TIFF tags.
### Pure-Rust candidates
| Crate | Version | License | MSRV | Native deps | GeoTIFF semantics | Notes |
|---|---|---|---|---|---|---|
| `geotiff-reader` (+ `geotiff-core`) | 0.4.0 | MIT OR Apache-2.0 | rust 1.77 | None | Yes (read-only) | **Recommended.** Pure Rust, read-only; optional `cog` feature pulls `reqwest` for Cloud-Optimized GeoTIFF over HTTP — leave `cog` off. |
| `tiff-reader` | 0.4.0 | MIT OR Apache-2.0 | — | None ("no C deps") | Partial / TIFF-level | Pure-Rust TIFF reader, no C deps; useful fallback or low-level reader, but not a full GeoTIFF semantics layer on its own. |
| `geotiff` (georust, older) | 0.1.0 | MIT | rust 1.70 | None | Yes (limited) | Older, low-version georust crate; narrow API and stale. Acceptable fallback only if `geotiff-reader` regresses. |
| `image` TIFF (`image-rs/tiff`) | 0.10.3 | MIT | rust 1.74 | None | No | Robust pure-Rust TIFF decode/encode, already adjacent to the project's `image` use, but it does **not** interpret GeoTIFF geo-tags. Could back a hand-rolled geo-tag reader if needed. |
| `wbraster` | 0.1.5 | MIT OR Apache-2.0 | — | None (Rust) | Raster I/O | Whitebox raster abstraction; broad raster model, early-stage (0.1.x), heavier than the spike needs. |
| `wbgeotiff` | 0.1.2 | MIT OR Apache-2.0 | — | None (Rust) | Yes | Whitebox GeoTIFF reader/writer; very early (0.1.2). Watch as an alternative if `geotiff-reader` stalls. |
| `georaster` | 0.2.0 | MIT/Apache-2.0 | — | None | Yes (raster) | Pure-Rust georeferenced raster reader; reasonable secondary option, still 0.2.x. |
### GDAL-backed candidates
| Crate | Version | License | MSRV | Native deps | Notes |
|---|---|---|---|---|---|
| `gdal` | 0.19.0 | MIT | rust 1.80 | **Yes** — system `libgdal`, or `gdal-src`/`bindgen` to build from source | Mature, full-format coverage. Native GDAL is the main risk: install friction, version skew, large audit surface, brittle CI. Optional `gdal-src` vendoring trades that for long C/C++ build times and a `bindgen`/Clang requirement. |
| `gdal-sys` | 0.12.0 | MIT | rust 1.77 | **Yes** — raw FFI bindings to `libgdal` | Low-level; never used directly here. Same native risk as `gdal`. |
| `rstiff` | 0.2.0 | MIT | — | **Yes** — "powered by GDAL" | A GeoTIFF-focused wrapper, but still GDAL-backed, so it carries the same native dependency risk; early-stage (0.2.x). |
| `oxigdal-geotiff` | 0.1.4 | Apache-2.0 | rust 1.85 | None declared; pure-Rust OxiGDAL driver | Newest entrant, high MSRV (rust 1.85), very early (0.1.4). Too immature and too new-MSRV for the spike even though it avoids native GDAL. |
## Licensing assessment
- Project posture: clean-room, open-source, permissively licensed (see
[`docs/legal/asset-policy.md`](../legal/asset-policy.md)).
- `geotiff-reader` / `geotiff-core`, `tiff-reader`, `wbraster`, `wbgeotiff`, and
`georaster` are all `MIT OR Apache-2.0` (or `MIT/Apache-2.0`) — fully
compatible.
- `image`'s TIFF crate, `geotiff` (georust), `gdal`, `gdal-sys`, and `rstiff`
are MIT — compatible.
- `oxigdal-geotiff` is Apache-2.0 only — compatible, but it removes the dual
MIT/Apache option for downstream consumers of that path.
- No copyleft crate appears in the survey. The recommended `geotiff-reader`
path keeps the dual `MIT OR Apache-2.0` license clean throughout.
- **Fixtures, not crates, are the real licensing hazard.** Per
`asset-policy.md`, do not commit proprietary sample landscapes or data.
GeoTIFF test inputs must be synthetic and generated by this project, or
public-domain (USGS/NASA) and redistributed under documented source terms.
## Native dependency risk
- A GDAL-backed crate (`gdal`, `gdal-sys`, `rstiff`) requires either a system
`libgdal` (version drift across developer machines, distros, and CI) or a
source build via `gdal-src` (long C/C++ compile, `bindgen` + Clang
toolchain). Either path makes default builds slow or fragile and enlarges the
audited dependency surface.
- Phase 4's plan (Task C3) explicitly anticipates this: "If GDAL is required,
keep the feature opt-in and document native setup," and the cross-milestone
checks expect `cargo test --features import-geotiff` to either pass *or*
produce "a documented skip if native GDAL is not installed."
- **Mitigation:** the recommended pure-Rust path has **zero** native
dependencies, so `import-geotiff` builds and tests run anywhere `cargo` runs.
The "documented skip" escape hatch is then unnecessary for the chosen path
and is reserved only for a hypothetical future GDAL feature.
## Feature flag plan
Consistent with Task C1's existing feature scaffold (`import-dem`,
`import-hgt`, `import-geotiff` under `default = []`):
- `import-geotiff`**the spike target.** Enables the pure-Rust
`geotiff-reader` + `geotiff-core` dependency and the `src/import/geotiff.rs`
module. Off by default.
- Do **not** enable `geotiff-reader`'s optional `cog` feature; it pulls
`reqwest` and network/TLS dependencies that are out of scope for local file
import.
- Reserve a separate, *not-yet-created* feature name (e.g. `import-geotiff-gdal`)
for any future GDAL-backed path so the two backends never share a flag. This
keeps the native-dependency build strictly opt-in and clearly labeled.
- Default and `--no-default-features` builds must remain GeoTIFF-free; only
`cargo test --features import-geotiff` and `--all-features` exercise it.
## Fixture and test strategy
- **Synthetic only.** Generate a tiny single-band elevation GeoTIFF (e.g. 3×3
or 4×4, signed/float elevation samples, a simple model tie point + pixel
scale, a basic CRS tag) as a project-owned fixture. This mirrors the HGT
approach in Task B1, which uses inline synthetic bytes.
- **Generation, not download.** Prefer generating the fixture programmatically
(a small build helper, a checked-in generator test, or a documented
one-off). If a writer crate is needed, `image`'s TIFF encoder or `wbgeotiff`
can produce baseline bytes; keep any generator code clearly separate from the
parser slice per the plan's commit strategy.
- **What to assert:** parsed dimensions, elevation unit, a couple of known
sample values, and that geo metadata maps into `TerrainSourceMetadata`
(format `"geotiff"`, width, height, `ElevationUnit`).
- **Malformed-input tests:** truncated file, missing geo-tags, unsupported
multi-band or compressed input → typed `ImportError` variants, matching the
rejection style of Task B2.
- **No proprietary data, no large DEMs in the repo.** A real USGS/NASA tile may
be used locally for manual verification but must stay out of git
(`reference/`, `.work/` are already ignored).
- The committed fixture should be only as large as the test needs (target:
well under a kilobyte).
## Follow-up implementation slice
Concrete next slice for Task C3, to be done TDD (RED → minimal GREEN →
validation) on a separate implementation branch:
1. **Add the feature + dependency.** In `Cargo.toml`, make `geotiff-reader` an
optional dependency gated by `import-geotiff` (default `cog` feature
disabled). Confirm `cargo test --no-default-features` and the default build
stay GeoTIFF-free.
2. **Create `src/import/geotiff.rs`.** A `cfg(feature = "import-geotiff")`
module exposing something like
`parse_geotiff_bytes(&[u8]) -> Result<ImportedTerrain, ImportError>`,
reusing the `ImportedTerrain` / `ImportError` / `TerrainSourceMetadata`
types from Milestone A.
3. **Generate the synthetic fixture.** Add a tiny generated single-band
elevation GeoTIFF as a project-owned fixture (separate commit from the
parser, per the plan's commit strategy).
4. **RED test.** `cargo test geotiff --features import-geotiff` — assert
dimensions, sample values, and metadata; assert malformed input yields a
typed error.
5. **Minimal GREEN.** Parse only the tiny supported subset (single band,
uncompressed, known sample type). Reject everything else explicitly;
document that broad real-world GeoTIFF coverage is future work.
6. **CLI plumbing (optional, can be a later slice).** Mirror the HGT pattern
(Task B3): an `--import-geotiff <path>` render input, gated so it only
appears when the feature is built.
7. **Validation.** `cargo test --no-default-features`,
`cargo test --features import-geotiff`, `cargo test --all-features`,
`cargo fmt --check`, `cargo clippy --all-targets -- -D warnings`,
`git diff --check`.
8. **Update `openvistapro info`** to report GeoTIFF support honestly — only
once a real parser and fixture exist (consistent with Task A3).
If `geotiff-reader` proves insufficient during the spike (missing geo-tag
coverage, API gaps), fall back in this order: `georaster``wbgeotiff`
hand-rolled geo-tag reading on top of `image`'s TIFF decoder. A GDAL-backed
path stays out of scope until a separate review approves the native-dependency
cost.
## Sources
Survey performed via `cargo info`; verify current versions before
implementation.
- `geotiff-reader` — <https://crates.io/crates/geotiff-reader> · <https://docs.rs/geotiff-reader>
- `geotiff-core` — <https://crates.io/crates/geotiff-core> · <https://docs.rs/geotiff-core>
- `tiff-reader` — <https://crates.io/crates/tiff-reader> · <https://docs.rs/tiff-reader>
- `geotiff` (georust) — <https://crates.io/crates/geotiff> · <https://docs.rs/geotiff> · <https://github.com/georust/geotiff>
- `image` TIFF crate — <https://crates.io/crates/tiff> · <https://docs.rs/tiff> · <https://github.com/image-rs/image-tiff>
- `georaster` — <https://crates.io/crates/georaster> · <https://docs.rs/georaster>
- `wbraster` — <https://crates.io/crates/wbraster> · <https://docs.rs/wbraster>
- `wbgeotiff` — <https://crates.io/crates/wbgeotiff> · <https://docs.rs/wbgeotiff>
- `oxigdal-geotiff` — <https://crates.io/crates/oxigdal-geotiff> · <https://docs.rs/oxigdal-geotiff>
- `gdal` — <https://crates.io/crates/gdal> · <https://docs.rs/gdal> · <https://github.com/georust/gdal>
- `gdal-sys` — <https://crates.io/crates/gdal-sys> · <https://docs.rs/gdal-sys>
- `rstiff` — <https://crates.io/crates/rstiff> · <https://docs.rs/rstiff>
- Project constraints — [`docs/legal/asset-policy.md`](../legal/asset-policy.md), [`docs/plans/phase-4-formats-scripts-ui.md`](../plans/phase-4-formats-scripts-ui.md)