Chapter 3 Understanding and Viewing Data

This chapter begins with a brief introduction to the structure of the NatureCounts database, followed by a description of access levels and how to create a user account. We then provide instructions on how to view data from various collections and apply filters.

The code in this Chapter will not work unless you replace "testuser" with your actual user name. You will be prompted to enter your password.

3.1 Data Structure

The Bird Monitoring Data Exchange (BMDE) was developed to be a standardized data exchange schema to promote the sharing and analysis of avian observational data. The schema is the core sharing standard of the Avian Knowledge Network.The BMDE (currently version 2.0) includes 169 core fields (variables) that are capable of capturing all metrics and descriptors associated with a bird observation. The BMDE schema was extended in 2018, and the complete version now includes 265 fields (variables).

Fields are variables or columns in a data set

By default, the naturecounts package downloads the data with the minimum set of fields/columns. However, for more advanced applications, users may wish to specify which fields/columns to return using the field_set and fields options in the nc_data_dl() function. For help with this feature, see the naturecounts article ‘Selecting columns and fields to download’.

3.2 Levels of Data Access

NatureCounts hosts many datasets, representing in excess of 170 million occurrence records, with a primary focus on Canadian bird monitoring data. Many of those datasets are from projects lead by Birds Canada and/or its partners. While we strive to make our data as openly available as possible, we also need to recognize the needs of our partners and funders.

NatureCounts has five Levels of Data Access, which define how each dataset can be used. Those levels are set individually for each dataset, in consultation with the various partners and data custodians involved.

  • Level 0: most restricted (archival only)

  • Level 1: archival only, metadata visible

  • Level 2: data used for visualizations only

  • Level 3: data available to third parties by request

  • Level 4: data shared with external portals and available by request

  • Level 5: open access

All contributing members of NatureCounts have complete authority over the use of the data they have provided, and can withhold data at any time from any party or application. All users of any NatureCounts data must clearly acknowledge the contribution of the members who are making data available. Each dataset comes with its own Data Sharing Policy that defines the various conditions for data usage.

You can view the Data Access Level for each collection on the NatureCounts Datasets page or using the metadata function (see akn_level):

collections<-meta_collections()
View(collections)

You can create a stand alone table for any metadata table using similar syntax as above

3.3 Authorizations

To access data using the naturecounts R package, you must sign up for a free account. Further, if you would like to access Level 3 or 4 collections you must make a data request. For step-by-step visual instructions, we encourage you to watch: NatureCounts: An Introductory Tutorial.

Create your free account now before continuing with this workbook

3.4 Viewing information about NatureCounts collections

First, lets use the naturecounts R package to view the number of records available for different collections. To do this we use the nc_count() function. You can view all the available collections and the number of observations using the default setting.

If a username is provided, the collections are filtered to only those available to the user. Otherwise all counts from all data sources are returned (default: show = "all").

nc_count() 

Or you can view the collections for which you have access using your username/password.

nc_count(username = "testuser")

Further refinements can be applied to the nc_count() function using filters Options include: collections, project_id, species, years, doy (day-of-year), region, and site_type.

3.5 Metadata codes and descriptions

There are metadata associated with the various arguments used in the nc_count() and nc_data_dl() functions, the latter you will use in Chapter 4. These are stored locally and can be accessed anytime to help filter your data view or download query. They include:

  • meta_country_codes(): country codes

  • meta_statprov_codes(): state/Province codes

  • meta_subnational2_codes(): subnational2 codes

  • meta_iba_codes(): Important Bird Area (IBA) codes

  • meta_bcr_codes(): Bird Conservation Region (BCR) codes

  • meta_utm_squares(): UTM Square codes

  • meta_species_authority(): species taxonomic authorities

  • meta_species_codes(): alpha-numeric codes for avian species

  • meta_species_taxonomy(): codes and taxonomic information for all species

  • meta_collections(): collections names and descriptions

  • meta_breeding_codes(): breeding codes and descriptions

  • meta_project_protocols(): project protocols

  • meta_projects(): projects ids, names, websites, and descriptions

  • meta_protocol_types(): protocol types and descriptions

You can explore the metadata materials using two lines of code. For example, you can view the Important Bird Area (IBA) metadata using:

iba<-meta_iba_codes()
View(iba)

3.6 Region & Species filtering

Filtering will often be done based on geographic extent (i.e., region). To filtering by region you must provide a named list with one of the following:

  • country: country code (e.g., CA for Canada)

  • statprov: state/province code (e.g., MB for Manitoba)

  • subnational2: subnational (type 2) code (e.g., CA.MB.07 for the Brandon Area)

  • iba: Important Bird Areas (IBA) code (e.g., AB001 for Beaverhill Lake in Alberta)

  • bcr: Bird Conservation Regions (e.g., 2 for Western Alaska)

  • utm_squares: UTM square code (e.g., 10UFE96 for a grid in Alberta)

  • bbox: bounding box coordinates (e.g., c(left = -101.097223, bottom = 50.494717, right = -99.511239, top = 51.027557) for a box containing Riding Mountain National Park in Manitoba). On the NatureCounts web portal there is a handy Within Coordinates) tool to help you retrieve custom coordinates for your data query and/or download.

To use the region argument:

nc_count(region=list(statprov="ON"))

Another commonly used filter is specific to species. In order to filter by species you need to get the species id codes. These are numeric codes that reflect species identity.

For all species, you can search by scientific, English or French name with the search_species() function.

search_species("chickadee")

For birds, you can also search by alphanumeric species codes with the search_species_code() function. This function also gives you the option of returning all species codes (including subspecies) related to a single species, and is considered a more robust method for ensuring that you do not miss observations.

search_species_code("BCCH")

For additional examples and more advanced options are available online for retrieving Region and Species codes.

3.7 Examples

Here are a few examples for you to work through to become familiar with the nc_count() function.

Example 1: Determine the number of collections and records for a specific region. The options include: country, statprov, subnational2, iba, bcr, utm_squares, and bbox. You can find details and examples on how to search_region() at the link provided.

The following code will retrieve all available collections and number of records for British Columbia

search_region("British Columbia", type = "statprov")
nc_count(region = list(statprov = "BC"))

Example 2: Determine the number of records for a specific species. You can find details and examples on how to search_species_code() based on 4 letter alpha code and search_species() based on common names at the links provided.

The following code will retrieve all available collections and number of records for Red-headed Woodpecker

search_species("Red-headed Woodpecker")
search_species_code("RHWO")

RHWO<-nc_count(species = 10060)
View(RHWO)

Example 3: We can further refine the Red-headed Woodpecker example (above) by filtering the species-specific data by region (e.g., Bird Conservation Region 11), time period (e.g., 2015-2019), or a combination of both.

RHWO_11 <- nc_count(species = 10060, region = list(bcr = "11"))
View(RHWO_11)

RHWO_year <- nc_count(species = 10060, year = c(2015, 2019))
View(RHWO_year)

RHWO_11_year <- nc_count(species = 10060, region = list(bcr = "11"), year = c(2015, 2019))
View(RHWO_11_year)

3.8 Exercises

Now apply your newly acquired skills!

Exercise 1: If you are interesting in doing a research project on Snowy Owls in Quebec, which three collections are you most likely to consider using (i.e., which have the most data)?

Answer: EBird-CA-QC, OISEAUXQC, CBC

Exercise 2: How many records of Gadwal are in the British Columbia Coastal Waterbird Survey collection? What if you are only interested in records from 2010-2019, how many records are available?

Answer: 702, 389

Full answers to the exercises can be found in Chapter 7.