Downloading raw data to your computer

Vessel Monitoring System (VMS) data are organized by year. The function vms_download() downloads them automatically and stores them in a VMS-data folder inside the working directory. Within that folder, raw data are organized in monthly folders (with names in Spanish), each containing several .csv files that usually cover biweekly intervals. The files differ in their number of rows, and some have different column names. For this reason we highly recommend using the vms_clean() function, which corrects several inconsistencies in the raw data. If you have a suggestion or spot an error, we would be very pleased if you created an issue.

The function below downloads data from the year 2019.

library(dafishr)

vms_download(year = 2019, destination.folder = getwd())
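
To illustrate why files with different column names need a cleaning step, here is a minimal base R sketch. It does not use the real download: it builds a tiny stand-in for the VMS-data folder in a temporary directory, and the month folder name, file names, and column names are all hypothetical. It is not how vms_clean() works internally, only an illustration of the problem.

```r
# Build a tiny stand-in for the VMS-data folder (all names are hypothetical)
root <- file.path(tempdir(), "VMS-data")
dir.create(file.path(root, "ENERO"), recursive = TRUE, showWarnings = FALSE)
write.csv(data.frame(Nombre = "A", Latitud = 24.5, Longitud = -110.2),
          file.path(root, "ENERO", "01-15.csv"), row.names = FALSE)
write.csv(data.frame(Nombre = "B", Latitud = 25.1, Longitud = -111.0, Rumbo = 90),
          file.path(root, "ENERO", "16-31.csv"), row.names = FALSE)

# Find every .csv file inside the monthly folders and read each one
files  <- list.files(root, pattern = "\\.csv$", recursive = TRUE, full.names = TRUE)
tables <- lapply(files, read.csv)

# The files do not share the same columns, so add the missing ones as NA
# and put all columns in the same order before stacking the tables
all_cols <- unique(unlist(lapply(tables, names)))
aligned <- lapply(tables, function(x) {
  for (col in setdiff(all_cols, names(x))) x[[col]] <- NA
  x[all_cols]
})
combined <- do.call(rbind, aligned)
```

Calling rbind() directly on the two raw tables would fail because their columns do not match, which is the kind of inconsistency a dedicated cleaning function saves you from handling by hand.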

Using the cleaning functions

The first cleaning function, vms_clean(), works on the VMS data.frame. You can either load downloaded data or use the bundled sample_dataset, which you can load and clean like so:

library(dafishr)
data("sample_dataset")
vms_cleaned <- vms_clean(sample_dataset)

The vms_clean() function reports, in a message, the number of rows that were cleaned because their coordinates contained NULL values.
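
As a rough idea of what a check for NULL coordinates can look like, here is a minimal base R sketch on a toy data frame. It assumes that missing coordinates arrive as the literal string "NULL" and that the affected rows are dropped; both are assumptions made for illustration, and the column names and values are hypothetical. It is not the package's implementation.

```r
# Toy VMS-like table; coordinates are read as text (hypothetical values)
vms_raw <- data.frame(
  vessel    = c("A", "B", "C", "D"),
  latitude  = c("24.5", "NULL", "25.1", "23.9"),
  longitude = c("-110.2", "-111.0", "NULL", "-109.4"),
  stringsAsFactors = FALSE
)

# Treat the literal string "NULL" as missing, then convert to numeric
to_number <- function(x) as.numeric(replace(x, x == "NULL", NA))
vms_raw$latitude  <- to_number(vms_raw$latitude)
vms_raw$longitude <- to_number(vms_raw$longitude)

# Keep only rows where both coordinates are present and report the rest
keep <- !is.na(vms_raw$latitude) & !is.na(vms_raw$longitude)
message("Rows with missing coordinates: ", sum(!keep))
vms_valid <- vms_raw[keep, ]
```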

Spatial wrangling

Once the dataset is cleaned, there are some other preprocessing steps to follow. First, all points that fall inland should be eliminated. VMS data record vessel positions, so points falling inland are errors in data registration. For that we will load the mx_inland shapefile, which helps eliminate all the points within a certain distance from the coastline.

data("mx_inland") # Shapefile of inland Mexico area
vms_cleaned_land <- clean_land_points(vms_cleaned, mx_inland)
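
The idea behind removing inland points can be sketched with the sf package on toy data. The square polygon below is a hypothetical stand-in for mx_inland, and the plain point-in-polygon test is an assumption about the general technique, not a description of what clean_land_points() does internally.

```r
library(sf)

# Toy "land" polygon: a one-degree square (hypothetical, not mx_inland)
land <- sf::st_sfc(
  sf::st_polygon(list(rbind(
    c(-110, 24), c(-109, 24), c(-109, 25), c(-110, 25), c(-110, 24)
  ))),
  crs = 4326
)

# Toy VMS positions: vessel A falls inside the square, vessel B outside
pts <- data.frame(
  vessel    = c("A", "B"),
  longitude = c(-109.5, -111.0),
  latitude  = c(24.5, 24.5)
)
pts_sf <- sf::st_as_sf(pts, coords = c("longitude", "latitude"),
                       crs = 4326, remove = FALSE)

# st_intersects() returns, for each point, the polygons it touches;
# a non-empty result means the point is on land
on_land <- lengths(sf::st_intersects(pts_sf, land)) > 0

# Keep only the positions that are not on land
pts_at_sea <- pts[!on_land, ]
```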

Associating port data

Once all land points are eliminated, we can use the join_ports_locations() function to label all the points where a vessel was inside a port or a marina. We achieve this with the mx_ports shapefile, which is used to create a buffer around each port or marina location. Each VMS point is then labelled at_port if it falls within one of these buffers and at_sea otherwise, in a new column that is automatically named location.

data("mx_ports")

# If you are just testing, it is a good idea to subsample...
# it takes a while on the full data!

vms_subset <- dplyr::sample_n(vms_cleaned_land, 1000)
with_ports <- join_ports_locations(vms_subset)
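
To make the buffer idea concrete, here is a minimal base R sketch that labels points by their distance to the nearest port. It swaps the spatial buffer for a great-circle (haversine) distance threshold, which is equivalent to a circular buffer on a sphere. The port location, the 2000 m radius, and the helper haversine_m() are hypothetical and are not taken from join_ports_locations().

```r
# Great-circle distance in metres between points given in decimal degrees
# (hypothetical helper; assumes a spherical Earth of radius 6371 km)
haversine_m <- function(lon1, lat1, lon2, lat2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Toy port and VMS positions (hypothetical coordinates)
ports <- data.frame(port = "P1", longitude = -110.30, latitude = 24.15)
vms <- data.frame(
  vessel    = c("A", "B"),
  longitude = c(-110.301, -110.80),
  latitude  = c(24.151, 24.60)
)

buffer_m <- 2000  # hypothetical buffer radius in metres

# Distance from each VMS point to its nearest port
nearest <- vapply(seq_len(nrow(vms)), function(i) {
  min(haversine_m(vms$longitude[i], vms$latitude[i],
                  ports$longitude, ports$latitude))
}, numeric(1))

# Points inside the buffer are at_port, all others are at_sea
vms$location <- ifelse(nearest <= buffer_m, "at_port", "at_sea")
```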

Now we can check the results on a map:

with_ports_sf <- sf::st_as_sf(with_ports,
                              coords = c("longitude", "latitude"),
                              crs = 4326)

data("mx_shape")

library(ggplot2)
ggplot2::ggplot(mx_shape) +
  geom_sf(col = "gray90") +
  geom_sf(data = with_ports_sf, aes(col = location)) +
  facet_wrap(~ location) +
  theme_bw()