Chapter 3 Zero-fill Matrix
Many analyses require records not only of when and where species were detected, but also of where they were not detected. PFW contains records of species presence only. However, we can infer absence when a species was not detected during a survey, provided the survey was done within the bounds of the species range.
Let’s start this Chapter by reloading our libraries and resetting our working directories in the event you took a break and are starting a new session with your newly filtered and cleaned data set.
require(tidyverse)
require(reshape)
require(data.table)
require(tibble)
require(raster)
require(sp)
require(rgeos)
require(rgdal)
out.dir <- paste("Output/")
dat.dir <- paste("Data/")
3.1 Load Filtered Data
First we will load each of the filtered and cleaned PFW datasets into your working Environment under a different data frame name.
#Old data
dat1<-fread("Data/PFW_Canada_1988-1995.csv")
dat2<-fread("Data/PFW_Canada_1995-2000.csv")
dat3<-fread("Data/PFW_Canada_2000-2005.csv")
dat4<-fread("Data/PFW_Canada_2005-2010.csv")
dat5<-fread("Data/PFW_Canada_2010-2015.csv")
dat6<-fread("Data/PFW_Canada_2015-2020.csv")
#New data
dat7<-fread("Data/PFW_Canada_2020-2021.csv")
dat8<-fread("Data/PFW_Canada_2021-2022.csv")
Notice that the newest dataset provided by Cornell Labs is missing the plus_code column (i.e., dat7 has one fewer column). We will add this column in the right spot to make the bind possible.
dat7<-add_column(dat7, plus_code = 0, .after = 12)
#Data directly from Cornell formatted slightly differently.
dat8<-dat8 %>% dplyr::select(-alt_full_spp_code, -user_id, -observer_id, -housing_density, -no_birds)
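Before binding, it can help to confirm which columns differ between files rather than counting them by eye. The sketch below uses two small made-up tables (`old` and `new` are hypothetical stand-ins for an older and a newer PFW file, and every column name other than plus_code is illustrative). It lists the missing columns with `setdiff()` and then adds the column after a named column, which is less fragile than a fixed position such as 12.

```r
library(tibble)

# Toy stand-ins for an older and a newer PFW file (column names are illustrative)
old <- tibble(loc_id = "L1", sub_id = "S1", plus_code = 0, how_many = 3)
new <- tibble(loc_id = "L2", sub_id = "S2", how_many = 1)

# Columns present in the old file but absent from the new one
missing_cols <- setdiff(names(old), names(new))
missing_cols

# Add the missing column after a named column instead of a fixed position
new <- add_column(new, plus_code = 0, .after = "sub_id")

# TRUE when both tables now have the same columns in the same order
identical(names(old), names(new))
```

The same two checks could be run on dat1 and dat7 (and on dat1 and dat8) before the bind in the next step.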
Bind and write your full Canadian PFW dataset to the Data directory for use later in the analysis.
data<-rbind(dat1, dat2, dat3, dat4, dat5, dat6, dat7, dat8)
write.table(data, paste(dat.dir,"PFW_Canada_All.csv", sep=""), row.names = FALSE, col.name = TRUE, append = FALSE, quote = FALSE, sep = ",")
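As an alternative to lining the columns up by hand, `data.table::rbindlist()` can match columns by name and pad any that are missing. This is only a sketch on made-up tables (`a` and `b` are hypothetical), and it is not the approach used in this chapter. Note that padded values are NA rather than the 0 assigned to plus_code above.

```r
library(data.table)

# Toy tables: the second lacks plus_code and has its columns in a different order
a <- data.table(loc_id = c("L1", "L2"), plus_code = c(1, 0), how_many = c(2, 5))
b <- data.table(how_many = 4, loc_id = "L3")

# use.names = TRUE matches columns by name; fill = TRUE pads missing columns with NA
all_dat <- rbindlist(list(a, b), use.names = TRUE, fill = TRUE)
all_dat
```

Because the padding is NA, any later step that expects a numeric plus_code would need to handle those missing values explicitly.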
3.2 Species Range
We will zero-fill the data two different ways. Why? Because the first way is used for standardized reporting by Cornell and Birds Canada, while the second is more biologically appropriate and should be adopted for research purposes.
For reporting purposes, we will zero-fill the data matrix based on Province (or State). Specifically, if a species was detected >10 times in the past 10 years within a specific Province, it will be considered in range.
First, let’s create the zero-fill range file for each Province. We start by creating a master list that links all loc_id from PFW to Province, then we determine if a species was detected >=10 times. This is done using a loop, which creates a Range output table. Note that we use REPORT_AS for the species ID field, not species_code (which at first glance would seem to make sense, but in the long run it doesn’t work out. Trust me!).
# Using the past 10 years of data to assign range limits for species
dat<-data %>% filter(year>=2010)
# Keep one row per Province (removes duplicates)
master<-dat %>% dplyr::select(Prov) %>% distinct()
# Write table to your output directory for safe keeping. You will use the master list later.
write.table(master, file = paste(out.dir, "master_prov.csv", sep=""), row.names = FALSE, sep = ",")
sp.list<-unique(data$REPORT_AS) # n = 303 in this example
# Create a species loop
for(m in 1:length(sp.list)) {
  # m<-1 #for testing each species
  sp.data<-NULL
  sp.data <- filter(dat, REPORT_AS == sp.list[m]) %>%
    droplevels()
  # Count number of observations in each Province
  sp.data<-sp.data %>% group_by(Prov) %>%
    dplyr::summarize(nobs=length(how_many)) %>%
    mutate(count=ifelse(nobs>=5, 1, 0)) %>%
    dplyr::select(-nobs)
  # Rename the count column to the species ID
  colnames(sp.data)[colnames(sp.data) == "count"] <- sp.list[m]
  #Optional: if there are less than 2 Provinces containing a record of a species, remove. Considered rare and/or out of range.
  # if(nrow(sp.data)<2){
  # sp.data<-NULL
  # }else{
  master<-left_join(master, sp.data, by = "Prov" )
  # } #end if statement
} #end sp.list loop
# Provinces with no records of a species are out of range (0)
master[is.na(master)]=0
# Write your new range table to an output file
write.table(master, file = paste(out.dir, "Range_prov.csv", sep=""), row.names = FALSE, sep = ",")
Each row under the species ID indicates whether the Province is within range (1) or out of range (0).
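For readers who prefer to avoid a loop, the same kind of table can be sketched with a grouped count followed by a reshape. This is an illustrative alternative on made-up data, not the method used to build Range_prov.csv. The species codes `bkcchi` and `norcar` and the name `min_obs` are hypothetical, and the cut-off mirrors the `nobs>=5` test in the loop.

```r
library(dplyr)
library(tidyr)

# Toy detections: one row per observation (values are made up)
toy <- data.frame(
  Prov = c(rep("ON", 6), rep("QC", 2), rep("ON", 1), rep("QC", 5)),
  REPORT_AS = c(rep("bkcchi", 8), rep("norcar", 6)),
  how_many = 1
)

min_obs <- 5  # hypothetical name; same cut-off as the nobs>=5 test in the loop

range_tab <- toy %>%
  count(Prov, REPORT_AS, name = "nobs") %>%            # observations per Province and species
  mutate(in_range = as.integer(nobs >= min_obs)) %>%   # 1 = in range, 0 = out of range
  dplyr::select(-nobs) %>%
  pivot_wider(names_from = REPORT_AS, values_from = in_range, values_fill = 0)

range_tab
```

Here `values_fill = 0` plays the role of the final NA replacement in the loop: a Province with no records of a species is treated as out of range.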
3.3 Sampling Events
Now that we have a handle on the winter distribution/range of each species, we need to know when each PFW site was sampled. This way we only add a zero-count for a site that was being watched. This is done for each sub_id, which covers the two-day count period.
First we create new effort fields. The first sums the number of half days the feeder site was watched (max = 4) and the second changes the effort hours into a factor.
# Create the full data table. This step is repetitive.
data<-rbind(dat1, dat2, dat3, dat4, dat5, dat6, dat7, dat8)
# Create effort days field (max = 4)
data<-data %>% mutate(Effort_days = (day1_am +day1_pm +day2_am +day2_pm)/2)
# Create effort hours field that is a factor
data<-data %>% mutate(Effort_hrs = as.factor(effort_hrs_atleast))
levels(data$Effort_hrs)<-c("0_1", "1_4", "4_8", ">8")
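Assigning labels with `levels()<-` relies on the factor having exactly those four levels, in sorted order. The sketch below shows a more defensive variant on made-up data that names the codes and labels together. It assumes the four half-day fields are 0/1 indicators, and the effort codes 0, 1, 4 and 8 are placeholders rather than the real values stored in effort_hrs_atleast.

```r
library(dplyr)

# Toy submissions; the effort codes 0, 1, 4, 8 are placeholders, not the real PFW codes
toy <- data.frame(
  sub_id = c("S1", "S2", "S3"),
  day1_am = c(1, 1, 0), day1_pm = c(1, 0, 0),
  day2_am = c(1, 1, 1), day2_pm = c(1, 0, 0),
  effort_hrs_atleast = c(8, 1, 1)
)

toy <- toy %>%
  mutate(
    Effort_days = (day1_am + day1_pm + day2_am + day2_pm) / 2,
    # Naming levels and labels together keeps each label tied to its code,
    # even when a code (here 0 and 4) never occurs in the data
    Effort_hrs = factor(effort_hrs_atleast,
                        levels = c(0, 1, 4, 8),
                        labels = c("0_1", "1_4", "4_8", ">8"))
  )

toy[, c("sub_id", "Effort_days", "Effort_hrs")]
```

To apply this to the real data, replace the placeholder codes with the values returned by `sort(unique(data$effort_hrs_atleast))`.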
Now we are ready to create the full sampling events layer using the filtered Canadian PFW dataset.
event.data <- data %>%
  dplyr::select(loc_id, sub_id, latitude, longitude, month, day, year, Period, Effort_days, Effort_hrs, Prov, region) %>%
  group_by(loc_id, sub_id, latitude, longitude, month, day, year, Period, Effort_days, Effort_hrs, Prov, region) %>%
  distinct() %>%
  ungroup() %>%
  as.data.frame()
# write your new sampling events table to an Output folder
write.table(event.data, paste(out.dir,"Events.csv", sep=""), row.names = FALSE, col.name = TRUE, append = FALSE, quote = FALSE, sep = ",")
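To illustrate how the sampling events and range tables are meant to work together, here is a minimal sketch of zero-filling a single species on made-up data. All object names (`events`, `range_prov`, `obs`, `sp`) and the species code `bkcchi` are hypothetical, and the actual zero-filling workflow may differ from this.

```r
library(dplyr)

# Toy inputs (all values made up for illustration)
events <- data.frame(sub_id = c("S1", "S2", "S3"),
                     Prov   = c("ON", "ON", "BC"))
range_prov <- data.frame(Prov = c("ON", "BC"), bkcchi = c(1, 0))
obs <- data.frame(sub_id = "S1", REPORT_AS = "bkcchi", how_many = 4)

sp <- "bkcchi"  # hypothetical species code

# Provinces where the species is considered in range
in_range <- range_prov$Prov[range_prov[[sp]] == 1]

zero_filled <- events %>%
  filter(Prov %in% in_range) %>%                              # keep only events inside the range
  left_join(filter(obs, REPORT_AS == sp), by = "sub_id") %>%  # attach counts where they exist
  mutate(REPORT_AS = sp,
         how_many = ifelse(is.na(how_many), 0, how_many))     # watched but not recorded = 0

zero_filled
```

The event in BC is dropped because the species is out of range there, while the Ontario event with no record receives a count of zero.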
Now you have the tables that you need to zero-fill your data matrix. We can now start creating data summaries in Chapter 4.