Files
openvistapro/docs/research/geotiff-import-strategy.md
2026-05-16 16:00:51 -04:00

202 lines
13 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# GeoTIFF Import Strategy Research Note
> **Status:** Decision implemented. The recommended pure-Rust path has landed
> as the optional `import-geotiff` feature; the crate survey below is kept for
> provenance.
>
> **Context:** Phase 4 Milestone C, Task C3 ("GeoTIFF importer behind an
> optional feature"). This note recorded the crate survey and recommended
> direction; the "Implementation notes" section below describes what was
> actually built.
## Recommendation
**Start with a pure-Rust, opt-in GeoTIFF read experiment behind an
`import-geotiff` feature, using `geotiff-reader` (with its `geotiff-core`
support crate) as the first parser.** Do not pull GDAL into the default build,
the test suite, or CI. Treat any GDAL-backed path as future, separately
reviewed work.
Rationale:
- **No native dependency in the default build.** Phase 4 explicitly requires
default builds to stay clean and fast (`cargo fmt --check`, `cargo test`,
`cargo clippy`, `git diff --check`). A pure-Rust crate keeps GeoTIFF support
fully optional and removes the C toolchain / `libgdal` install burden from
contributors and CI.
- **License compatibility.** `geotiff-reader` is `MIT OR Apache-2.0`, matching
the project's clean-room, permissively licensed posture. GDAL itself is
MIT-licensed at the library level, but a vendored/native build drags in a
large transitive dependency surface that is harder to audit.
- **Scope fit.** The importer only needs to produce a `HeightGrid` plus
`TerrainSourceMetadata` from a single-band elevation raster. That is a narrow
read-only subset; a full GDAL binding is overkill for the spike.
- **Reversible.** Keeping GeoTIFF behind a feature flag means the crate choice
can be swapped (or upgraded to GDAL) later without disturbing default users.
The first slice should parse a **tiny generated synthetic GeoTIFF fixture**
only — no proprietary VistaPro data, no large downloaded DEMs.
## Decision matrix
Facts below are from `cargo info` at spike time. "GeoTIFF semantics" means the
crate understands GeoTIFF's geo-tags (model tie point / pixel scale, CRS,
elevation band), not just baseline TIFF tags.
### Pure-Rust candidates
| Crate | Version | License | MSRV | Native deps | GeoTIFF semantics | Notes |
|---|---|---|---|---|---|---|
| `geotiff-reader` (+ `geotiff-core`) | 0.4.0 | MIT OR Apache-2.0 | rust 1.77 | None | Yes (read-only) | **Recommended.** Pure Rust, read-only; optional `cog` feature pulls `reqwest` for Cloud-Optimized GeoTIFF over HTTP — leave `cog` off. |
| `tiff-reader` | 0.4.0 | MIT OR Apache-2.0 | — | None ("no C deps") | Partial / TIFF-level | Pure-Rust TIFF reader, no C deps; useful fallback or low-level reader, but not a full GeoTIFF semantics layer on its own. |
| `geotiff` (georust, older) | 0.1.0 | MIT | rust 1.70 | None | Yes (limited) | Older, low-version georust crate; narrow API and stale. Acceptable fallback only if `geotiff-reader` regresses. |
| `image` TIFF (`image-rs/tiff`) | 0.10.3 | MIT | rust 1.74 | None | No | Robust pure-Rust TIFF decode/encode, already adjacent to the project's `image` use, but it does **not** interpret GeoTIFF geo-tags. Could back a hand-rolled geo-tag reader if needed. |
| `wbraster` | 0.1.5 | MIT OR Apache-2.0 | — | None (Rust) | Raster I/O | Whitebox raster abstraction; broad raster model, early-stage (0.1.x), heavier than the spike needs. |
| `wbgeotiff` | 0.1.2 | MIT OR Apache-2.0 | — | None (Rust) | Yes | Whitebox GeoTIFF reader/writer; very early (0.1.2). Watch as an alternative if `geotiff-reader` stalls. |
| `georaster` | 0.2.0 | MIT/Apache-2.0 | — | None | Yes (raster) | Pure-Rust georeferenced raster reader; reasonable secondary option, still 0.2.x. |
### GDAL-backed candidates
| Crate | Version | License | MSRV | Native deps | Notes |
|---|---|---|---|---|---|
| `gdal` | 0.19.0 | MIT | rust 1.80 | **Yes** — system `libgdal`, or `gdal-src`/`bindgen` to build from source | Mature, full-format coverage. Native GDAL is the main risk: install friction, version skew, large audit surface, brittle CI. Optional `gdal-src` vendoring trades that for long C/C++ build times and a `bindgen`/Clang requirement. |
| `gdal-sys` | 0.12.0 | MIT | rust 1.77 | **Yes** — raw FFI bindings to `libgdal` | Low-level; never used directly here. Same native risk as `gdal`. |
| `rstiff` | 0.2.0 | MIT | — | **Yes** — "powered by GDAL" | A GeoTIFF-focused wrapper, but still GDAL-backed, so it carries the same native dependency risk; early-stage (0.2.x). |
| `oxigdal-geotiff` | 0.1.4 | Apache-2.0 | rust 1.85 | None declared; pure-Rust OxiGDAL driver | Newest entrant, high MSRV (rust 1.85), very early (0.1.4). Too immature and too new-MSRV for the spike even though it avoids native GDAL. |
## Licensing assessment
- Project posture: clean-room, open-source, permissively licensed (see
[`docs/legal/asset-policy.md`](../legal/asset-policy.md)).
- `geotiff-reader` / `geotiff-core`, `tiff-reader`, `wbraster`, `wbgeotiff`, and
`georaster` are all `MIT OR Apache-2.0` (or `MIT/Apache-2.0`) — fully
compatible.
- `image`'s TIFF crate, `geotiff` (georust), `gdal`, `gdal-sys`, and `rstiff`
are MIT — compatible.
- `oxigdal-geotiff` is Apache-2.0 only — compatible, but it removes the dual
MIT/Apache option for downstream consumers of that path.
- No copyleft crate appears in the survey. The recommended `geotiff-reader`
path keeps the dual `MIT OR Apache-2.0` license clean throughout.
- **Fixtures, not crates, are the real licensing hazard.** Per
`asset-policy.md`, do not commit proprietary sample landscapes or data.
GeoTIFF test inputs must be synthetic and generated by this project, or
public-domain (USGS/NASA) and redistributed under documented source terms.
## Native dependency risk
- A GDAL-backed crate (`gdal`, `gdal-sys`, `rstiff`) requires either a system
`libgdal` (version drift across developer machines, distros, and CI) or a
source build via `gdal-src` (long C/C++ compile, `bindgen` + Clang
toolchain). Either path makes default builds slow or fragile and enlarges the
audited dependency surface.
- Phase 4's plan (Task C3) explicitly anticipates this: "If GDAL is required,
keep the feature opt-in and document native setup," and the cross-milestone
checks expect `cargo test --features import-geotiff` to either pass *or*
produce "a documented skip if native GDAL is not installed."
- **Mitigation:** the recommended pure-Rust path has **zero** native
dependencies, so `import-geotiff` builds and tests run anywhere `cargo` runs.
The "documented skip" escape hatch is then unnecessary for the chosen path
and is reserved only for a hypothetical future GDAL feature.
## Feature flag plan
Consistent with Task C1's existing feature scaffold (`import-dem`,
`import-hgt`, `import-geotiff` under `default = []`):
- `import-geotiff`**the spike target.** Enables the pure-Rust
`geotiff-reader` + `geotiff-core` dependency and the `src/import/geotiff.rs`
module. Off by default.
- Do **not** enable `geotiff-reader`'s optional `cog` feature; it pulls
`reqwest` and network/TLS dependencies that are out of scope for local file
import.
- Reserve a separate, *not-yet-created* feature name (e.g. `import-geotiff-gdal`)
for any future GDAL-backed path so the two backends never share a flag. This
keeps the native-dependency build strictly opt-in and clearly labeled.
- Default and `--no-default-features` builds must remain GeoTIFF-free; only
`cargo test --features import-geotiff` and `--all-features` exercise it.
## Fixture and test strategy
- **Synthetic only.** Generate a tiny single-band elevation GeoTIFF (e.g. 3×3
or 4×4, signed/float elevation samples, a simple model tie point + pixel
scale, a basic CRS tag) as a project-owned fixture. This mirrors the HGT
approach in Task B1, which uses inline synthetic bytes.
- **Generation, not download.** Prefer generating the fixture programmatically
(a small build helper, a checked-in generator test, or a documented
one-off). If a writer crate is needed, `image`'s TIFF encoder or `wbgeotiff`
can produce baseline bytes; keep any generator code clearly separate from the
parser slice per the plan's commit strategy.
- **What to assert:** parsed dimensions, elevation unit, a couple of known
sample values, and that geo metadata maps into `TerrainSourceMetadata`
(format `"geotiff"`, width, height, `ElevationUnit`).
- **Malformed-input tests:** truncated file, missing geo-tags, unsupported
multi-band or compressed input → typed `ImportError` variants, matching the
rejection style of Task B2.
- **No proprietary data, no large DEMs in the repo.** A real USGS/NASA tile may
be used locally for manual verification but must stay out of git
(`reference/`, `.work/` are already ignored).
- The committed fixture should be only as large as the test needs (target:
well under a kilobyte).
## Implementation notes
The recommended pure-Rust path has landed. What was built:
1. **Feature + dependency.** `Cargo.toml` declares
`import-geotiff = ["dep:geotiff-reader"]`; `geotiff-reader` is an optional
dependency built with its `local` feature and `default-features = false`, so
the network-oriented `cog` feature stays off. Default and
`--no-default-features` builds remain GeoTIFF-free.
2. **`src/import/geotiff.rs`.** A `cfg(feature = "import-geotiff")` module
exposing `parse_geotiff_bytes(&[u8]) -> Result<ImportedTerrain, ImportError>`,
reusing the `ImportedTerrain` / `ImportError` / `TerrainSourceMetadata` types
from Milestone A. It reads the payload in memory, rejects rasters that are
not single-band, decodes the raster as `f32` into a row-major `HeightGrid`,
and records `TerrainSourceMetadata` with format `"geotiff"`. Reader and
decode failures map to `ImportError::MalformedSource`.
3. **Synthetic fixtures only.** No GeoTIFF binary is committed. Tests generate a
tiny single-band elevation tile in memory with the `geotiff-writer`
dev-dependency (alongside `ndarray`) and parse it back — mirroring the
inline-bytes approach used for HGT. Real USGS/NASA tiles may be used for
local manual verification only, under the already-ignored `reference/` and
`.work/` directories.
4. **Tests.** `cargo test geotiff --features import-geotiff` asserts dimensions,
sample values, and metadata, and checks that non-GeoTIFF input yields a typed
`ImportError::MalformedSource`.
5. **Supported subset.** Only the narrow single-band `f32` case is parsed;
broader real-world GeoTIFF coverage (multi-band, exotic compression, full
geo-tag interpretation) remains future work.
6. **`openvistapro info`.** `supported_importers()` and `info_text()` list
`geotiff` only when the feature is built, keeping `info` honest.
7. **GDAL is not part of the landed feature.** The importer has zero native
dependencies; a GDAL-backed path stays out of scope until a separate review
approves the native-dependency cost, and would use its own opt-in feature
name rather than `import-geotiff`.
**Not yet done:** CLI render plumbing. A `render --import-geotiff <path>` input
mirroring the HGT pattern (Task B3) is still future work; the parser is reachable
through the library API and reported by `info`, but not yet wired into `render`.
If `geotiff-reader` later proves insufficient (missing geo-tag coverage, API
gaps), the documented fallback order remains `georaster``wbgeotiff`
hand-rolled geo-tag reading on top of `image`'s TIFF decoder.
## Sources
Survey performed via `cargo info`; verify current versions before
implementation.
- `geotiff-reader` — <https://crates.io/crates/geotiff-reader> · <https://docs.rs/geotiff-reader>
- `geotiff-core` — <https://crates.io/crates/geotiff-core> · <https://docs.rs/geotiff-core>
- `tiff-reader` — <https://crates.io/crates/tiff-reader> · <https://docs.rs/tiff-reader>
- `geotiff` (georust) — <https://crates.io/crates/geotiff> · <https://docs.rs/geotiff> · <https://github.com/georust/geotiff>
- `image` TIFF crate — <https://crates.io/crates/tiff> · <https://docs.rs/tiff> · <https://github.com/image-rs/image-tiff>
- `georaster` — <https://crates.io/crates/georaster> · <https://docs.rs/georaster>
- `wbraster` — <https://crates.io/crates/wbraster> · <https://docs.rs/wbraster>
- `wbgeotiff` — <https://crates.io/crates/wbgeotiff> · <https://docs.rs/wbgeotiff>
- `oxigdal-geotiff` — <https://crates.io/crates/oxigdal-geotiff> · <https://docs.rs/oxigdal-geotiff>
- `gdal` — <https://crates.io/crates/gdal> · <https://docs.rs/gdal> · <https://github.com/georust/gdal>
- `gdal-sys` — <https://crates.io/crates/gdal-sys> · <https://docs.rs/gdal-sys>
- `rstiff` — <https://crates.io/crates/rstiff> · <https://docs.rs/rstiff>
- Project constraints — [`docs/legal/asset-policy.md`](../legal/asset-policy.md), [`docs/plans/phase-4-formats-scripts-ui.md`](../plans/phase-4-formats-scripts-ui.md)