Zero-fill species presence data by adding zero observation counts (absences) to an existing naturecounts dataset.
format_zero_fill(
  df_db,
  by = "SamplingEventIdentifier",
  species = "all",
  fill = "ObservationCount",
  extra_species = NULL,
  extra_event = NULL,
  warn = TRUE,
  verbose = TRUE
)
df_db: Either a data frame or a connection to a database with a naturecounts table (a data frame is returned).
by: Character vector. Column name(s) to fill by. Defaults to "SamplingEventIdentifier"; alternatively, a vector of specific column names (see details).
species: Character vector. Either "all", for all species in the data, or a vector of species ID codes to fill in.
fill: Character. The column name to fill in. Defaults to "ObservationCount".
extra_species: Character vector. Extra columns/fields uniquely associated with species_id to keep in the data (all columns not in by, species, fill, extra_species, or extra_event will be omitted from the result).
extra_event: Character vector. Extra columns/fields uniquely associated with the Sampling Event (the field defined by by) to keep in the data (all columns not in by, species, fill, extra_species, or extra_event will be omitted from the result).
warn: Logical. If TRUE, stop zero-filling when the data contain more than 100 species and more than 1000 unique sampling events. If FALSE, skip this check and proceed.
verbose: Logical. Show messages?
Returns a data frame.
by refers to the combination of columns used to detect missing values. By default, SamplingEventIdentifier is used. Otherwise, users can specify their own combination of columns.
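The idea behind filling by a combination of columns can be sketched in base R. This is only an illustration on toy data, not the naturecounts implementation: `zero_fill_sketch()` and its `species_col` argument are hypothetical names, and the `site` and `date` columns are invented for the example.

```r
# Hypothetical helper illustrating zero-filling; NOT how format_zero_fill()
# is implemented internally.
zero_fill_sketch <- function(df, by = "SamplingEventIdentifier",
                             species_col = "species_id",
                             fill = "ObservationCount") {
  # Unique events are defined by the combination of columns in `by`
  events <- unique(df[, by, drop = FALSE])
  species <- unique(df[, species_col, drop = FALSE])

  # Cross join: every event paired with every species
  grid <- merge(events, species, by = NULL)

  # Left join the observed counts; unobserved pairs become NA, then 0
  out <- merge(grid, df[, c(by, species_col, fill)], all.x = TRUE)
  out[[fill]][is.na(out[[fill]])] <- 0

  # Sort by event columns, then species
  out[do.call(order, unname(as.list(out[c(by, species_col)]))), ]
}

# Toy data (invented): events are identified by site + date
obs <- data.frame(
  site = c("A", "A", "B"),
  date = c("2020-05-01", "2020-05-01", "2020-05-02"),
  species_id = c(230, 880, 230),
  ObservationCount = c(4, 1, 2)
)

filled <- zero_fill_sketch(obs, by = c("site", "date"))
filled
```

Here species 880 was not recorded at site B, so the sketch adds one row for that event with a count of 0, while the three observed rows keep their original counts.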
If species is supplied, all records will be used to determine observation events, but only records (zero-filled or otherwise) that correspond to a species in species will be returned (all others will be omitted). Note that records where species_id is NA (generally zero counts for presence/absence) will be converted to a set of zeros for the individual species.
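The difference between zero-filling for a subset of species and simply filtering the data first can be shown with a small base R sketch. It uses invented toy data and is not the package's internal code; the point is only that events are taken from all records before the species subset is applied.

```r
# Toy data (invented): species 230 was seen at only one of three events
obs <- data.frame(
  SamplingEventIdentifier = c("E1", "E2", "E3"),
  species_id = c(230, 880, 880),
  ObservationCount = c(4, 1, 2)
)

# Events are derived from ALL records, including other species
events <- unique(obs["SamplingEventIdentifier"])

# Only the species of interest is paired with each event (cross join)
target <- data.frame(species_id = 230)
grid <- merge(events, target, by = NULL)

# Left join observed counts; events without the species get a zero
goose <- merge(grid, obs, all.x = TRUE)
goose$ObservationCount[is.na(goose$ObservationCount)] <- 0
goose <- goose[order(goose$SamplingEventIdentifier), ]
goose
```

Filtering `obs` to species 230 before filling would leave a single event (E1) and lose the absences at E2 and E3, which is why the events need to come from the full set of records.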
# Download data (with "core" fields to include 'CommonName')
sample <- nc_data_dl(collection = c("SAMPLE1", "SAMPLE2"), fields_set = "core",
                     username = "sample", info = "nc_example")
#> Using filters: collections (SAMPLE1, SAMPLE2); fields_set (BMDE2.00)
#> Collecting available records...
#> collection nrecords
#> 1 SAMPLE1 991
#> 2 SAMPLE2 995
#> Total records: 1,986
#>
#> Downloading records for each collection:
#> SAMPLE1
#> Records 1 to 991 / 991
#> SAMPLE2
#> Records 1 to 995 / 995
# Remove casual observations (i.e. 'AllSpeciesReported' = "No")
library(dplyr) # For filter function
#>
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#>
#> filter, lag
#> The following objects are masked from ‘package:base’:
#>
#> intersect, setdiff, setequal, union
sample <- filter(sample, AllSpeciesReported == "Yes")
# Remove data with "X" ObservationCount (only keep numeric obs)
sample <- filter(sample, ObservationCount != "X")
# Zero fill by all species present
sample_all_zeros <- format_zero_fill(sample)
#> - Converted 'fill' column (ObservationCount) from character to numeric
# Zero fill only for Canada Goose
goose <- format_zero_fill(sample, species = "230")
#> - Converted 'fill' column (ObservationCount) from character to numeric
# Keep species-specific variables
goose <- format_zero_fill(sample, species = "230", extra_species = "CommonName")
#> - Converted 'fill' column (ObservationCount) from character to numeric
# Keep sampling-event-specific variables
coords <- format_zero_fill(sample, extra_event = c("latitude", "longitude"))
#> - Converted 'fill' column (ObservationCount) from character to numeric
# By species, keeping extra species variables and event variables
goose_coords <- format_zero_fill(sample, species = "230",
                                 extra_species = "CommonName",
                                 extra_event = c("latitude", "longitude"))
#> - Converted 'fill' column (ObservationCount) from character to numeric
# Only return event information
events <- format_zero_fill(sample, fill = NA,
                           extra_event = c("latitude", "longitude"))