US Droughts Per County

The U.S. Drought Monitor is jointly produced by the National Drought Mitigation Center at the University of Nebraska-Lincoln, the United States Department of Agriculture, and the National Oceanic and Atmospheric Administration. The joint venture provides an API to download this data under a variety of parameters, e.g. time period, % population affected, % area affected, etc.

Using this data and R we can enrich and visualise droughts over time, from 01-01-2019 onwards (the data goes back further), through an animated map. This piece is primarily a place to store and show the 13mb GIF (Twitter doesn't allow > 2.5mb), but while we're here let's talk through the process of getting there as well!

First we download the data; the only specification that changes is the time span you wish to map across, while the other parameters should be set to county level and categories. You can always filter the dates down inside the script, but you're better off saving the memory for the larger crunching later on.

Import & Clean


library(tidyverse)

us_drought <- 
  read_csv(here::here("dm_export_20190101_20210727.csv")) %>%
  pivot_longer(cols = None:D4, names_to = "drought_lvl", values_to = "pct_pop") %>%
  janitor::clean_names() %>%
  # FIPS codes need zero-padding to five characters so they join cleanly later
  mutate(fips = str_pad(fips, width = 5, pad = "0")) %>%
  mutate(drought_lvl = factor(drought_lvl, 
                              levels = c("None", "D0", "D1", "D2", "D3", "D4"),
                              labels = c("None", "Abnormal", "Moderate", "Severe", "Extreme", "Exceptional"))) %>%
  group_by(valid_start, fips) %>%
  slice_max(order_by = pct_pop, n = 1) %>%
  ungroup()

Now, if you use setwd() in your scripts then PLEASE STOP DOING THAT, and read this article to understand why. It essentially boils down to making your script very fragile: as soon as you rename or move the file, a script with setwd() breaks. This is all too real for someone else who may need to run your script, where a setwd() call was built off a very different environment. here() from the {here} package sets the default file path to "the path to the top-level of my current project", so you only need to specify any further folder structure you want embedded and the file name.
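As a tiny sketch of the difference (the hard-coded path below is hypothetical):

```r
# setwd() bakes in a machine-specific path -- fragile, breaks when moved:
# setwd("C:/Users/me/projects/us_droughts")

# here() anchors the path at the project root instead:
library(here)
here::here("dm_export_20190101_20210727.csv")
```

Anyone cloning the project can now run the script from anywhere inside it.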

Due to the wide format of the csv data from Drought Monitor, we need to tidy the data into a data matrix (long format) instead of the denormalised wide format. This will let us work with our data in a far easier form. We use pivot_longer() to specify the columns we wish to collapse, cols = c(None:D4), as well as what we want to call the two columns we collapse into: the observation name column and the observation value. In this instance we get a drought level and the % of the population affected by that level as our value.
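A minimal sketch of what pivot_longer() does to one county-week (toy numbers, not real Drought Monitor figures):

```r
library(tidyr)

# one county-week in the wide export format (toy values)
wide <- tibble::tibble(
  fips = "01001",
  None = 20, D0 = 30, D1 = 25, D2 = 15, D3 = 10, D4 = 0
)

# collapse the six level columns into name/value pairs
long <- pivot_longer(wide, cols = None:D4,
                     names_to = "drought_lvl", values_to = "pct_pop")

long # 6 rows: one drought level and its % population per row
```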

The last part of the wrangle is the group_by() %>% slice_max() combo, which lets us group by an observation and then take the top of each group. This brings back the highest drought level for each week (valid_start) in each county (fips), even though the dataset shows us all six levels (None, D0, D1, D2, D3, D4) for each county in each week, broken down by the % of the population affected. That means we plot only one value per county, not six. Don't forget to ungroup().
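A toy illustration of the group_by() %>% slice_max() pattern (made-up numbers):

```r
library(dplyr)

# one county-week split across three levels (toy values)
week <- tibble(
  fips        = c("01001", "01001", "01001"),
  drought_lvl = c("None", "Moderate", "Severe"),
  pct_pop     = c(20, 55, 25)
)

# keep only the level covering the largest share of the county's population
top <- week %>%
  group_by(fips) %>%
  slice_max(order_by = pct_pop, n = 1) %>%
  ungroup()

top # one row per county: here the "Moderate" row
```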

In order to map, we must download shapefiles to map onto, which we can do using the R package {tigris}.



# import state and counties from tigris so we can filter out states we do not want to use (Alaska , Hawaii etc)

counties <- 
  tigris::counties() %>% 
  select(state_id = STATEFP, county_id = GEOID, county_name = NAMELSAD)

# lookup_code for "Alaska" , "Hawaii" , "Puerto Rico" to strip them out of map

#The code for Alaska is '02'.
#The code for Hawaii is '15'. 
#The code for Puerto Rico is '72'.

counties <- 
  counties %>%
  rename(fips = county_id) %>%
  filter(!state_id %in% c("02", "15", "72")) %>%
  sf::st_simplify(dTolerance = 0.060)

The {tigris} package has a very useful selection of US data, one part of which is county-level data with polygon information for mapping, which we grab by loading the package, library(tigris), and then calling the counties() function to save the data. We also don't want to play with all the data the counties() function returns, so we do some research and strip out the areas we don't want: here Alaska, Hawaii and Puerto Rico.

The last thing we do here is probably the most important part of the entire script for large amounts of data, which can be anywhere past 1 million observations at this point: st_simplify(). st_simplify() converts our polygons to a lower definition using the GEOS implementation of the Douglas-Peucker algorithm to reduce the vertex count. Essentially it lowers the detail of the shapefiles, as shown below. This is very important because we're about to join these shapefiles to our drought data, which covers every week for every county over several years. When I did this for 2018-present data I had nearly 1 million observations to join shapefiles to; without lowering the detail of those polygons I would have ended up running out of memory and crashing out.
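To see the size saving for yourself, the North Carolina counties file that ships with {sf} can stand in for the tigris polygons (the tolerance here is illustrative):

```r
library(sf)

# demo dataset bundled with {sf}
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# lower-detail copy of the same polygons; nc is in lon/lat, so the
# tolerance is in degrees and sf will warn, but the principle holds
nc_simp <- st_simplify(nc, dTolerance = 0.05)

object.size(st_geometry(nc))      # full-detail geometry
object.size(st_geometry(nc_simp)) # noticeably smaller
```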

# generic st_simplify() usage on a projected geometry, tolerance in metres
seine_simp = st_simplify(seine, dTolerance = 2000) # 2000 m



# join into one table of geoinfo and droughts
counties_drought <- 
  counties %>% 
  inner_join(us_drought, by = "fips") 

The {parallel} package, included with R since 2.14.0, provides functions for parallel execution of R code on machines with multiple cores or processors, or across multiple computers. Again, we're trying to give our script as much help as possible for the last crunch. In the end the actual join (if you've simplified the polygons and set up a parallel cluster) should not be such a bad time.
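The cluster setup itself isn't shown above, so here is a minimal sketch of the {parallel} pattern (the squaring is a stand-in for whatever per-chunk crunching you need):

```r
library(parallel)

n_cores <- max(1L, detectCores() - 1L) # leave one core free for the OS
cl <- makeCluster(n_cores)

# spread independent chunks of work across the workers
res <- parLapply(cl, 1:8, function(x) x^2)

stopCluster(cl)

unlist(res) # 1 4 9 16 25 36 49 64
```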

GIF Creation & Static Graphics

library(showtext); showtext_auto(); font_add_google("Amatic SC", "amatic")

library(ggplot2)
library(gganimate)

drought_map_simp <-
  counties_drought %>%
  #filter(valid_start >= "2020-07-01") %>% # testing filter for gganimate
  #filter(valid_start == max(valid_start)) %>% # turn on for static map
  ggplot(aes(fill = drought_lvl)) +
  geom_sf() +
  borders("state") +
  transition_manual(valid_start) + # turn off for static
  scale_fill_viridis_d(option = "rocket", begin = 0, end = 1) +
  labs(
    title = "County Drought Levels Over Time Throughout the United States",
    subtitle = "{current_frame}", # turn off for static
    caption = "Source: | Graphic: @NearandDistant",
    fill = "Drought Level") +
  ggthemes::theme_map() +
  coord_sf() +
  theme(
    ########################################################### all theme elements
    #text = element_text(family = "BM Hanna Air Regular"), # turn off for animation
    plot.title = element_text(vjust = -110, size = 60, face = "bold"),
    plot.subtitle = element_text(vjust = -105, hjust = 0.95, size = 60), # turn off for static
    plot.caption = element_text(vjust = -10, hjust = 0, size = 40),
    legend.title = element_text(size = 50, face = "bold"),
    legend.text = element_text(size = 40),
    legend.position = c(0, 0.05),
    plot.margin = margin(-2, 0, 1, 0, unit = "cm"))

We can turn off the animation features (transition, subtitle render etc.) and pull a static graphic from the above, or turn them on to lock in a GIF to be animated. The static graphic is shown below.

Lastly we need to take our plot and animate it into a GIF using {gganimate}, shown below, again using here::here() to save it properly!



drought_map_simp %>%
  animate(fps = 55, nframes = 300,
          height = 2000, width = 2340,
          start_pause = 1, end_pause = 1,
          renderer = gifski_renderer(here::here("us_droughts_simp.gif")))
