| Title: | Hydrofabric Subsetter |
|---|---|
| Description: | Subset Hydrofabric Data in R. |
| Authors: | Mike Johnson [aut, cre], Justin Singh-Mohudpur [aut] |
| Maintainer: | Mike Johnson <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.3.2 |
| Built: | 2025-01-10 01:26:27 UTC |
| Source: | https://github.com/lynker-spatial/hfsubsetR |
A lazy data frame for GDAL vector data sources. 'as_ogr()' is DBI-compatible and designed to work with dplyr.
as_ogr(x, layer, ..., query = NA, ignore_lyrs = "gpkg_|rtree_|sqlite_")

## S3 method for class 'character'
as_ogr(x, layer, ..., query = NA, ignore_lyrs = "gpkg_|rtree_|sqlite_")

## S3 method for class 'OGRSQLConnection'
as_ogr(x, layer, ..., query = NA, ignore_lyrs = "gpkg_|rtree_|sqlite_")
| Argument | Description |
|---|---|
| x | the data source (file path, URL, or database connection) |
| layer | layer name (varies by driver; may be a file name without extension) |
| ... | parameter(s) passed on to st_as_sf |
| query | SQL query to pass in directly |
| ignore_lyrs | pattern for layers to be ignored |
The output of 'as_ogr()' is a 'tbl_OGRSQLConnection' that extends 'tbl_dbi' and may be used with functions and workflows in the normal DBI way; see [OGRSQL()] for the as_ogr DBI support.

To obtain an in-memory data frame, use an explicit 'collect()' or 'st_as_sf()'. A call to 'collect()' is triggered by 'st_as_sf()' and will add the sf class to the output.

Returns a 'tbl_OGRSQLConnection'.
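For illustration, a minimal sketch of the lazy workflow. The file name, layer name, and column name below ('conus_nextgen.gpkg', 'network', 'id') are hypothetical stand-ins for a real source:

library(hfsubsetR)
library(dplyr)

# Lazily reference a layer; no features are read yet
net <- as_ogr("conus_nextgen.gpkg", "network")

# dplyr verbs are translated to SQL and pushed down to GDAL
q <- filter(net, id == "wb-10026")

# Materialize: collect() returns a data frame;
# st_as_sf() triggers collect() and adds the sf class
collect(q)
sf::st_as_sf(q)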
dbConnect for sources that can be read by package sf
## S4 method for signature 'OGRSQLDriver'
dbConnect(drv, DSN = "", readonly = TRUE, ...)
| Argument | Description |
|---|---|
| drv | OGRSQLDriver created by OGRSQL() |
| DSN | data source name |
| readonly | open in read-only mode ('TRUE' is the only option) |
| ... | ignored |
The 'OGRSQL' dialect available is documented with GDAL: https://gdal.org/user/ogr_sql_dialect.html
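As a sketch of the DBI workflow, assuming a local GeoPackage 'conus_nextgen.gpkg' containing a 'network' layer with 'id' and 'toid' columns (all hypothetical):

library(DBI)
library(hfsubsetR)

# Open a read-only connection to a GDAL-readable source
con <- dbConnect(OGRSQL(), "conus_nextgen.gpkg")

# Standard DBI calls; queries use GDAL's OGRSQL dialect
dbListTables(con)
dbGetQuery(con, "SELECT id, toid FROM network LIMIT 5")

dbDisconnect(con)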
Find an origin from indexed IDs
find_origin(
  network,
  id,
  type = c("id", "comid", "hl_uri", "poi_id", "nldi_feature", "xy")
)
| Argument | Description |
|---|---|
| network | A 'dplyr'-compatible object. |
| id | A queryable identifier of type 'type'. |
| type | An index type describing 'id'. |
A network origin. If a single origin is not found, then an exception is raised.
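A brief sketch, assuming a lazy 'network' table opened with as_ogr() and an id that exists in it (file and identifier are hypothetical):

# Lazy reference to a network layer
network <- as_ogr("conus_nextgen.gpkg", "network")

# Resolve a single origin by hydrofabric id;
# raises an exception unless exactly one origin is found
origin <- find_origin(network, id = "wb-10026", type = "id")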
Downloads a hydrofabric GeoPackage from a specified URL and saves it to a local file.
get_hydrofabric(
  url = "https://lynker-spatial.s3-us-west-2.amazonaws.com/hydrofabric",
  version = "2.2",
  domain = "conus",
  type = "nextgen",
  outfile = NULL,
  overwrite = FALSE
)
| Argument | Description |
|---|---|
| url | A character string specifying the base URL of the hydrofabric repository. Defaults to 'https://lynker-spatial.s3-us-west-2.amazonaws.com/hydrofabric'. |
| version | A character string indicating the version of the hydrofabric to download. Defaults to '2.2'. |
| domain | A character string specifying the geographic domain of the hydrofabric. Defaults to 'conus'. |
| type | A character string indicating the type of hydrofabric. Defaults to 'nextgen'. |
| outfile | A character string specifying the path to save the downloaded file. If 'NULL', the file will not be saved. Defaults to 'NULL'. |
| overwrite | A logical value indicating whether to overwrite an existing file. Defaults to 'FALSE'. |
The function returns the path to the downloaded file ('outfile').
## Not run:
# Download the default hydrofabric file
get_hydrofabric(outfile = "conus_nextgen.gpkg")

# Specify a different domain and version
get_hydrofabric(
  version = "3.0",
  domain = "hawaii",
  outfile = "hawaii_nextgen.gpkg",
  overwrite = TRUE
)
## End(Not run)
Build a hydrofabric subset
get_subset(
  id = NULL,
  comid = NULL,
  hl_uri = NULL,
  poi_id = NULL,
  nldi_feature = NULL,
  xy = NULL,
  lyrs = c("divides", "flowpaths", "network", "nexus"),
  gpkg = NULL,
  source = "s3://lynker-spatial/hydrofabric",
  hf_version = "2.2",
  type = "nextgen",
  domain = "conus",
  outfile = NULL,
  overwrite = FALSE
)
| Argument | Description |
|---|---|
| id | hydrofabric id. datatype: string or vector of strings, e.g., 'wb-10026' or c('wb-10026', 'wb-10355') |
| comid | NHDPlusV2 COMID. datatype: int or vector of ints, e.g., 61297116 or c(61297116, 6129261) |
| hl_uri | hydrolocation URI. datatype: string or vector of strings, e.g., 'HUC12-010100100101' or c('HUC12-010100100101', 'HUC12-010100110104') |
| poi_id | POI identifier. datatype: int or vector of ints, e.g., 266387 or c(266387, 266745) |
| nldi_feature | list with names 'featureSource' and 'featureID', where 'featureSource' is derived from the "source" column of the response of dataRetrieval::get_nldi_sources() and 'featureID' is a known identifier from the specified 'featureSource'. datatype: a url, e.g., 'https://labs.waterdata.usgs.gov/api/nldi/linked-data/census2020-nhdpv2' |
| xy | location given as a vector of XY in EPSG:4326 (longitude, latitude, crs) |
| lyrs | layers to extract |
| gpkg | a local gpkg file |
| source | hydrofabric source (local root directory or s3 link) |
| hf_version | hydrofabric version |
| type | hydrofabric type |
| domain | hydrofabric domain |
| outfile | if a gpkg file path is provided, data will be written to that file |
| overwrite | overwrite an existing outfile. Default is 'FALSE' |
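A sketch of two common calls; the identifiers and file names are illustrative only:

## Not run:
# Subset by hydrofabric id from the default remote source,
# writing the result to a local GeoPackage
get_subset(
  id = "wb-10026",
  outfile = "wb-10026_subset.gpkg",
  overwrite = TRUE
)

# Subset by NHDPlusV2 COMID from a local GeoPackage,
# restricting the extracted layers
get_subset(
  comid = 61297116,
  gpkg = "conus_nextgen.gpkg",
  lyrs = c("divides", "flowpaths")
)
## End(Not run)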
Retrieve and Process Vector Processing Unit (VPU) Hydrofabric Layers
This function retrieves and optionally filters spatial data layers from a GeoPackage (GPKG) based on a specified Vector Processing Unit ID (VPU ID). The function can either return the filtered layers as a list or write them to an output file.
get_vpu_fabric(gpkg, vpuid = NULL, outfile = NULL)
| Argument | Description |
|---|---|
| gpkg | A string specifying the path to the GeoPackage file. |
| vpuid | A vector of VPU IDs to filter the layers. If 'NULL', no filtering is applied. Default is 'NULL'. |
| outfile | A string specifying the path to write the filtered layers to a new GeoPackage. If 'NULL', the layers are returned as a list. Default is 'NULL'. |
The function reads all layers from the provided GeoPackage, excluding the "error" layer. For each layer, the data is optionally filtered by the provided 'vpuid' and then processed into 'sf' objects. If an output file path is provided, the filtered layers are written to a new GeoPackage. Otherwise, the layers are stored in a list and returned.
If 'outfile' is 'NULL', returns a list where each element is a filtered spatial layer ('sf' object). If 'outfile' is provided, returns the path to the output GeoPackage.
## Not run:
# Example 1: Retrieve filtered layers as a list
fabric <- get_vpu_fabric("path/to/geopackage.gpkg", vpuid = c("01", "02"))

# Example 2: Write filtered layers to a new GeoPackage
get_vpu_fabric(
  "path/to/geopackage.gpkg",
  vpuid = c("01", "02"),
  outfile = "output.gpkg"
)
## End(Not run)
OGRSQL driver; use with [dbConnect()] to connect to a data source readable by sf.
OGRSQL()
Initialize a new hfsubset query
query()
A 'hfsubset_query' object
Set the identifier of a query
query_set_id(
  query,
  identifier,
  type = c("id", "comid", "hl_uri", "poi_id", "nldi_feature", "xy")
)
| Argument | Description |
|---|---|
| query | A 'hfsubset_query' object |
| identifier | Identifier value |
| type | Identifier type |
'query' with the identifier included
Set the layers of a query
query_set_layers(query, layers)
| Argument | Description |
|---|---|
| query | A 'hfsubset_query' object |
| layers | A 'character' vector of layer names |
'query' with the layers included
Set the sink of a query
query_set_sink(query, sink, overwrite = FALSE)
| Argument | Description |
|---|---|
| query | A 'hfsubset_query' object |
| sink | A character path to sink |
| overwrite | If 'TRUE', an existing sink is overwritten |
'query' with the sink included
Set the source of a query
query_set_source(query, src)
| Argument | Description |
|---|---|
| query | A 'hfsubset_query' object |
| src | A 'hfsubset_query_source' object |
'query' with the source included
query_source_arrow, query_source_sf
Create a new 'arrow' query source.
query_source_arrow(srcname, ...)
| Argument | Description |
|---|---|
| srcname | URI to an 'arrow'-compatible dataset |
| ... | Unused |
An 'hfsubset_query_source_arrow' object
Create a new 'sf' query source.
query_source_sf(srcname, ...)
| Argument | Description |
|---|---|
| srcname | Path or VSI URI to source |
| ... | Unused |
An 'hfsubset_query_source_sf' object
Execute a query subset
query_subset(query)
| Argument | Description |
|---|---|
| query | A 'hfsubset_query' object |
A list of hydrofabric layers, or the path to the sink of the query
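The query setters compose into a pipeline that query_subset() executes. A sketch, assuming a local GeoPackage source and an id that exists in it (both hypothetical):

## Not run:
q <- query() |>
  query_set_source(query_source_sf("conus_nextgen.gpkg")) |>
  query_set_id("wb-10026", type = "id") |>
  query_set_layers(c("divides", "flowpaths", "network")) |>
  query_set_sink("wb-10026_subset.gpkg", overwrite = TRUE)

# Returns the sink path here; without a sink,
# a list of hydrofabric layers is returned
query_subset(q)
## End(Not run)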
Read an Arrow multi-file dataset and create an sf object
read_sf_dataset(dataset, find_geom = FALSE)
| Argument | Description |
|---|---|
| dataset | a 'Dataset' object created by arrow::open_dataset, or an 'arrow_dplyr_query' |
| find_geom | logical. Only needed when returning a subset of columns. Should all available geometry columns be selected and added to the dataset query without being named? Default is 'FALSE' |
This function is primarily for use after opening a dataset with arrow::open_dataset. Users can then query the arrow Dataset using dplyr methods such as filter or select. Passing the resulting query to this function will parse the datasets and create an sf object. The function expects consistent geographic metadata to be stored with the dataset in order to create sf objects. Adopted from wcjochem/sfarrow.
object of class sf
See also: open_dataset, st_read, st_read_parquet
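A sketch of the intended pattern, assuming a partitioned dataset directory 'hydrofabric/divides' with a 'vpuid' column (both hypothetical):

## Not run:
library(arrow)
library(dplyr)

ds <- open_dataset("hydrofabric/divides")

# Build a lazy dplyr query, then materialize it as sf
divides_01 <- ds |>
  filter(vpuid == "01") |>
  read_sf_dataset()
## End(Not run)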
Force collection of an OGR query: convert the output of as_ogr to a data frame or sf object
## S3 method for class 'tbl_OGRSQLConnection'
st_as_sf(x, ...)
| Argument | Description |
|---|---|
| x | output of [as_ogr()] |
| ... | passed to [collect()] |
a data frame from 'collect()', or an sf data frame from 'st_as_sf()' (only if it contains an 'sfc' geometry column)
Read a Parquet file to an sf object. Uses standard metadata information to identify geometry columns and coordinate reference system information.
st_read_parquet(dsn, col_select = NULL, props = NULL, ...)
| Argument | Description |
|---|---|
| dsn | character file path to a data source |
| col_select | A character vector of column names to keep. Default is 'NULL', which returns all columns |
| props | Now deprecated in read_parquet |
| ... | additional parameters to pass to read_parquet |
Reference for the metadata used: https://github.com/geopandas/geo-arrow-spec. These are standard with the Python GeoPandas library.
Adopted from wcjochem/sfarrow
object of class sf
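For example, to read a GeoParquet file written by st_write_parquet() (file and column names hypothetical):

## Not run:
# Read all columns; geometry and CRS come from the file metadata
divides <- st_read_parquet("divides.parquet")

# Keep a subset of columns
st_read_parquet("divides.parquet", col_select = c("divide_id", "geometry"))
## End(Not run)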
Write an sf object to a Parquet file. Converts a simple features spatial object from sf to a Parquet file using write_parquet. Geometry columns (type 'sfc') are converted to well-known binary (WKB) format.
st_write_parquet(
  obj,
  dsn,
  hf_version = "2.2",
  license = "ODbL",
  source = "lynker-spatial",
  ...
)
| Argument | Description |
|---|---|
| obj | object of class 'sf' |
| dsn | data source name. A path and file name with a .parquet extension |
| hf_version | dataset version |
| license | dataset license |
| source | dataset source |
| ... | additional options to pass to write_parquet |
Adopted from wcjochem/sfarrow
'obj', invisibly
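A sketch using the 'nc' sample data shipped with sf; the output path is arbitrary:

## Not run:
nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"))

# Geometry is stored as WKB; hf_version, license, and source
# are written into the file metadata
st_write_parquet(nc, dsn = "nc.parquet")
## End(Not run)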
Write an sf object to an Arrow multi-file dataset
write_sf_dataset(
  obj,
  path,
  format = "parquet",
  partitioning = dplyr::group_vars(obj),
  hf_version = "2.2",
  license = "ODbL",
  source = "lynker-spatial",
  ...
)
| Argument | Description |
|---|---|
| obj | object of class 'sf' |
| path | string path referencing a directory for the output |
| format | output file format ("parquet" or "feather") |
| partitioning | character vector of columns in 'obj' for grouping, or the dplyr::group_vars |
| hf_version | dataset version |
| license | dataset license |
| source | dataset source |
| ... | additional arguments and options passed to write_dataset |
Translate an sf spatial object to a data.frame with WKB geometry columns and then write to an arrow dataset with partitioning. Allows for dplyr grouped datasets (using group_by) and uses those variables to define partitions. Adopted from wcjochem/sfarrow.

'obj', invisibly
See also: write_dataset, st_read_parquet
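A sketch of partitioned output using the 'nc' sample data shipped with sf; the derived 'region' column exists only for this example:

## Not run:
library(dplyr)

nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"))

# group_by() variables define the dataset partitions
nc |>
  mutate(region = ifelse(BIR74 > 1000, "high", "low")) |>
  group_by(region) |>
  write_sf_dataset("nc_dataset", format = "parquet")
## End(Not run)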