Chapter 3 Understanding and Viewing Data
This chapter begins with a brief introduction to the structure of the NatureCounts database, followed by a description of access levels and how to create a user account. We then provide instructions on how to view data from various collections and apply filters.
The code in this chapter will not work unless you replace "testuser" with your actual user name. You will be prompted to enter your password.
3.1 Data Structure
The Bird Monitoring Data Exchange (BMDE) was developed as a standardized data exchange schema to promote the sharing and analysis of avian observational data. The schema is the core sharing standard of the Avian Knowledge Network. The BMDE (currently version 2.0) includes 169 core fields (variables) capable of capturing all metrics and descriptors associated with a bird observation. The schema was extended in 2018, and the complete version now includes 265 fields (variables).
Fields are variables or columns in a data set.
By default, the naturecounts package downloads the data with the minimum set of fields/columns. However, for more advanced applications, users may wish to specify which fields/columns to return using the field_set and fields options in the nc_data_dl() function. For help with this feature, see the naturecounts article ‘Selecting columns and fields to download’.
3.2 Levels of Data Access
NatureCounts hosts many datasets, representing more than 170 million occurrence records, with a primary focus on Canadian bird monitoring data. Many of those datasets are from projects led by Birds Canada and/or its partners. While we strive to make our data as openly available as possible, we also need to recognize the needs of our partners and funders.
NatureCounts has six Levels of Data Access (0 to 5), which define how each dataset can be used. Those levels are set individually for each dataset, in consultation with the various partners and data custodians involved.
Level 0: most restricted (archival only)
Level 1: archival only, metadata visible
Level 2: data used for visualizations only
Level 3: data available to third parties by request
Level 4: data shared with external portals and available by request
Level 5: open access
All contributing members of NatureCounts have complete authority over the use of the data they have provided, and can withhold data at any time from any party or application. All users of any NatureCounts data must clearly acknowledge the contribution of the members who are making data available. Each dataset comes with its own Data Sharing Policy that defines the various conditions for data usage.
You can view the Data Access Level for each collection on the NatureCounts Datasets page or using the metadata function (see akn_level):
collections <- meta_collections()
View(collections)
You can create a standalone table for any of the metadata tables using the same syntax.
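As a minimal sketch of how the akn_level column can be used, the example below tabulates and filters a small made-up table. The data frame collections_demo and its collection names are hypothetical stand-ins; only the akn_level column name comes from the text above. With the naturecounts package loaded, the same subsetting should apply to the table returned by meta_collections().

```r
# Toy stand-in for the table returned by meta_collections().
# Collection names and levels here are invented for illustration.
collections_demo <- data.frame(
  collection = c("DEMO_A", "DEMO_B", "DEMO_C", "DEMO_D"),
  akn_level = c(5, 3, 4, 5),
  stringsAsFactors = FALSE
)

# Count how many collections fall under each access level
level_counts <- table(collections_demo$akn_level)
print(level_counts)

# Keep only the open access (Level 5) collections
open_access <- collections_demo[collections_demo$akn_level == 5, ]
print(open_access$collection)
```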
3.3 Authorizations
To access data using the naturecounts R package, you must sign up for a free account. Further, if you would like to access Level 3 or 4 collections you must make a data request. For step-by-step visual instructions, we encourage you to watch: NatureCounts: An Introductory Tutorial.
Create your free account now before continuing with this workbook.
3.4 Viewing information about NatureCounts collections
First, let's use the naturecounts R package to view the number of records available for different collections. To do this we use the nc_count() function. You can view all the available collections and the number of observations using the default setting.
If a username is provided, the collections are filtered to only those available to the user. Otherwise all counts from all data sources are returned (default: show = "all").
nc_count()
Or you can view the collections to which you have access using your username/password.
nc_count(username = "testuser")
Further refinements can be applied to the nc_count() function using filters. Options include: collections, project_id, species, years, doy (day-of-year), region, and site_type.
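One way to keep several filters together is to build them as a named list before running a query. The sketch below is an illustration only: the variable names are ours, the two-element start/end form for years and doy is an assumption based on the examples later in this chapter and should be checked against the nc_count() help page, and the query itself is left commented out because it needs the package and an internet connection.

```r
# Convert calendar dates to day-of-year with base R ("%j" is day of year)
doy_start <- as.integer(format(as.Date("2019-05-01"), "%j"))
doy_end <- as.integer(format(as.Date("2019-07-31"), "%j"))

# Collect the filters in one named list (argument names from the list above)
filters <- list(
  region = list(statprov = "ON"),
  years = c(2015, 2019),        # assumed to mean start year, end year
  doy = c(doy_start, doy_end)   # assumed to mean start day, end day
)
str(filters)

# Not run: pass the list to nc_count() in one step
# counts <- do.call(naturecounts::nc_count, filters)
```

Defining the filters once makes it easier to check the record count first and then reuse the same settings for a download.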
3.5 Metadata codes and descriptions
There are metadata associated with the various arguments used in the nc_count() and nc_data_dl() functions; you will use the latter in Chapter 4. These are stored locally and can be accessed anytime to help filter your data view or download query. They include:
meta_country_codes(): country codes
meta_statprov_codes(): state/province codes
meta_subnational2_codes(): subnational2 codes
meta_iba_codes(): Important Bird Area (IBA) codes
meta_bcr_codes(): Bird Conservation Region (BCR) codes
meta_utm_squares(): UTM square codes
meta_species_authority(): species taxonomic authorities
meta_species_codes(): alpha-numeric codes for avian species
meta_species_taxonomy(): codes and taxonomic information for all species
meta_collections(): collection names and descriptions
meta_breeding_codes(): breeding codes and descriptions
meta_project_protocols(): project protocols
meta_projects(): project ids, names, websites, and descriptions
meta_protocol_types(): protocol types and descriptions
You can explore the metadata materials using two lines of code. For example, you can view the Important Bird Area (IBA) metadata using:
iba <- meta_iba_codes()
View(iba)
3.6 Region & Species filtering
Filtering will often be done based on geographic extent (i.e., region). To filter by region you must provide a named list with one of the following:
country: country code (e.g., CA for Canada)
statprov: state/province code (e.g., MB for Manitoba)
subnational2: subnational (type 2) code (e.g., CA.MB.07 for the Brandon Area)
iba: Important Bird Areas (IBA) code (e.g., AB001 for Beaverhill Lake in Alberta)
bcr: Bird Conservation Regions (e.g., 2 for Western Alaska)
utm_squares: UTM square code (e.g., 10UFE96 for a grid in Alberta)
bbox: bounding box coordinates (e.g., c(left = -101.097223, bottom = 50.494717, right = -99.511239, top = 51.027557) for a box containing Riding Mountain National Park in Manitoba). On the NatureCounts web portal there is a handy Within Coordinates tool to help you retrieve custom coordinates for your data query and/or download.
To use the region argument:
nc_count(region=list(statprov="ON"))
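For the bbox option, a small helper can catch swapped or out-of-range coordinates before a query is sent. make_bbox() below is a hypothetical helper written for illustration, not part of naturecounts; it assumes coordinates are in decimal degrees and that the box does not cross the 180° meridian.

```r
# Hypothetical helper: build a named bounding box and check its values.
make_bbox <- function(left, bottom, right, top) {
  stopifnot(
    is.numeric(c(left, bottom, right, top)),
    left >= -180, right <= 180, left < right,   # longitudes, west before east
    bottom >= -90, top <= 90, bottom < top      # latitudes, south before north
  )
  c(left = left, bottom = bottom, right = right, top = top)
}

# Box containing Riding Mountain National Park (coordinates from the text above)
riding_mountain <- make_bbox(
  left = -101.097223, bottom = 50.494717,
  right = -99.511239, top = 51.027557
)

# Wrap the box in the named list expected by the region argument
region_filter <- list(bbox = riding_mountain)
str(region_filter)

# Not run (needs naturecounts and an internet connection):
# nc_count(region = region_filter)
```

If left and right (or bottom and top) are swapped, make_bbox() stops with an error instead of returning a box.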
Another commonly used filter is species. To filter by species you need the species id codes. These are numeric codes that reflect species identity.
For all species, you can search by scientific, English or French name with the search_species() function.
search_species("chickadee")
For birds, you can also search by alphanumeric species codes with the search_species_code() function. This function also gives you the option of returning all species codes (including subspecies) related to a single species, and is considered a more robust method for ensuring that you do not miss observations.
search_species_code("BCCH")
Additional examples and more advanced options for retrieving region and species codes are available online.
3.7 Examples
Here are a few examples for you to work through to become familiar with the nc_count() function.
Example 1: Determine the number of collections and records for a specific region. The options include: country, statprov, subnational2, iba, bcr, utm_squares, and bbox. You can find details and examples on how to use search_region() at the link provided.
The following code will retrieve all available collections and number of records for British Columbia:
search_region("British Columbia", type = "statprov")
nc_count(region = list(statprov = "BC"))
Example 2: Determine the number of records for a specific species. You can find details and examples on how to use search_species_code(), based on the 4-letter alpha code, and search_species(), based on common names, at the links provided.
The following code will retrieve all available collections and number of records for Red-headed Woodpecker:
search_species("Red-headed Woodpecker")
search_species_code("RHWO")
RHWO <- nc_count(species = 10060)
View(RHWO)
Example 3: We can further refine the Red-headed Woodpecker example (above) by filtering the species-specific data by region (e.g., Bird Conservation Region 11), time period (e.g., 2015-2019), or a combination of both.
RHWO_11 <- nc_count(species = 10060, region = list(bcr = "11"))
View(RHWO_11)
RHWO_year <- nc_count(species = 10060, years = c(2015, 2019))
View(RHWO_year)
RHWO_11_year <- nc_count(species = 10060, region = list(bcr = "11"), years = c(2015, 2019))
View(RHWO_11_year)
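To compare collections, for instance to see which ones hold the most records for a species, the table of counts can be sorted. The sketch below uses a made-up table; the column names collection and nrecords are assumptions about the nc_count() output, so check names() on your own result and adjust as needed.

```r
# Toy stand-in for a table of counts; names and numbers are invented.
counts_demo <- data.frame(
  collection = c("DEMO_A", "DEMO_B", "DEMO_C", "DEMO_D"),
  nrecords = c(120, 4500, 37, 980),
  stringsAsFactors = FALSE
)

# Sort from most to fewest records and keep the top three collections
sorted <- counts_demo[order(counts_demo$nrecords, decreasing = TRUE), ]
top3 <- head(sorted, 3)
print(top3)

# Total number of records across all collections in the table
total_records <- sum(counts_demo$nrecords)
print(total_records)
```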
3.8 Exercises
Now apply your newly acquired skills!
Exercise 1: If you are interested in doing a research project on Snowy Owls in Quebec, which three collections are you most likely to consider using (i.e., which have the most data)?
Answer: EBird-CA-QC, OISEAUXQC, CBC
Exercise 2: How many records of Gadwall are in the British Columbia Coastal Waterbird Survey collection? If you are only interested in records from 2010-2019, how many records are available?
Answer: 702, 389
Full answers to the exercises can be found in Chapter 7.