Downloading raw data on your computer
Vessel Monitoring System (VMS) data comes organized by year. The
function vms_download()
automatically downloads it and
store into the working directory in a VMS-data folder. Within the folder
raw data are organized by monthly folders (with names in Spanish) that
contain several .csv files that usually store byweekly data intervals.
Each file have different rows and some have different column names. For
that we highly recommend to use the vms_clean()
function.
The latter corrects several inconsistencies within the raw data. If you
have any suggestion or spot some errors we will be very pleased if you
create an issue.
The function below downloads data from the year 2019.
library(dafishr)
vms_download(year = 2019, destination.folder = getwd())
Using the cleaning functions
The first vms_clean()
function works on the VMS
data.frame
. You can either load downloaded data or use the
sample_dataset
that you can call and clean like so:
The vms_clean()
function returns a message with the
number of rows that were cleaned because they contained
NULL
values in coordinates.
Spatial wrangling
Once the dataset is wrangled, there are some other preprocessing
steps to follow. First, all points that fall inland should be
eliminated. This is because VMS data are vessels, thus points falling
inland are errors in data registration. For that we will upload the
mx_inland
shapefile which helps eliminating all the points
within a certain distance from the coastline.
data("mx_inland") # Shapefile of inland Mexico area
vms_cleaned_land <- clean_land_points(vms_cleaned, mx_inland)
Associating port data
Once all land points are eliminated, we can use the
join_ports_locations()
function to label all the points
where a vessel was inside a port or a marina. We achieve this by using
the mx_ports
shapefile that will be used to create a buffer
around each port or marina location. Then each VMS point that falls
within these buffers will be labelled as at_port
or
at_sea
in a new column that will be automatically called
location
.
data("mx_ports")
# If you are just testing, it is a good idea to subsample...
# it takes a while on the full data!
vms_subset <- dplyr::sample_n(vms_cleaned, 1000)
with_ports <- join_ports_locations(vms_subset)
Now we can check the results in a map: