tidyverse

The tidyverse is a suite of packages that simplifies a huge number of the commonest tasks I do in R. It’s become indispensable for me, and I’ll make heavy use of it.

I draw your attention to dplyr, one of the tidyverse packages. It provides a set of functions that makes manipulating data frames much tidier. You can filter, select, sort, and create new columns far more neatly than with R’s… esoteric… native syntax.

I strongly recommend visiting — and bookmarking — their website.

Here’s one example of how the clarity of a piece of code can improve. Suppose you want to subset the (inbuilt) iris data frame according to the width of the sepals and the length of the petals. In the traditional R way, you might write

iris[iris$Sepal.Width < 3.25 & iris$Petal.Length < 5, ]

But using dplyr, it’s

library(dplyr)

iris %>%
  filter(Sepal.Width < 3.25) %>%
  filter(Petal.Length < 5)

I will frequently use filter, select, arrange, and mutate from the dplyr package, crossing from the tidyr package, and many functions from the stringr package.
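As a rough sketch of how those functions fit together, here is a short example. The column name Petal.Area, the thresholds, and the little letter/digit table are all made up for illustration; nothing here comes from a real puzzle.

```r
library(dplyr)
library(tidyr)
library(stringr)

# dplyr: chain the four verbs on the inbuilt iris data frame
small_flowers <- iris %>%
  filter(Sepal.Width < 3.25, Petal.Length < 5) %>%      # keep rows meeting both conditions
  select(Species, Petal.Length, Petal.Width) %>%         # keep only these columns
  mutate(Petal.Area = Petal.Length * Petal.Width) %>%    # Petal.Area is a made-up column
  arrange(desc(Petal.Area))                              # sort, largest first

head(small_flowers)

# tidyr + stringr: every combination of two vectors, then build a label
codes <- crossing(letter = c("a", "b"), digit = c("1", "2", "3")) %>%
  mutate(label = str_c(str_to_upper(letter), digit))

codes$label
```

Passing two conditions to a single filter call, separated by a comma, keeps the rows that satisfy both, so it behaves like the two chained filter calls above.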

magrittr

The magrittr package is a great resource for making R more readable… and more writable.

Here’s an example of hard-to-read code:

print(head(rev(toupper(letters))))

If you came across that in someone else’s code, you might gag. And rightly so. It takes a few moments to work out what’s going on: print the first six letters (six being the default for head) of the reverse of the upper case of the letters of the alphabet. Not very readable. If an entire R program comprises lines like that, you might just chuck it in for good. But magrittr offers us another way, piping an object into a function using the %>% operator:

letters %>% toupper() %>% rev() %>% head() %>% print()

It’s longer. But the whitespace hopefully makes it more readable. I think so.

Another thing that helps is that the syntactic and logical orders of the functions are the same. What I mean here is that the order in which the functions appear is also the order they are processed in. So here, you take the letters of the alphabet, upper case them, reverse them, take the first six, then print them.

It’s even more readable if you spread it all out over several lines:

letters %>%
  toupper() %>%
  rev() %>%
  head() %>%
  print()

There should be no doubt at all what’s going on. Very readable. And, more writable: suppose you forgot something. The above code snippets all produce the result Z Y X W V U. But what if your output was meant to look like this: Z26 Y25 X24 W23 V22 U21? In the first snippet, you would need to dig into those nested brackets and come up with

print(head(rev(paste0(toupper(letters), 1:26))))

Ugh. But with the pipe operator, it’s easy:

letters %>%
  toupper() %>%
  paste0(1:26) %>%
  rev() %>%
  head() %>%
  print()

I hope you agree that inserting a line into the logical order of the transformations you’re performing is a lot easier than delving into those brackets. For this reason, the code I use in this blog will tend to use magrittr pipes a lot. The magrittr manual details several variations on the theme, including %<>% and %T>%, which I use quite a bit too.
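For a flavour of those two variations, here is a minimal sketch. The variables x and total are made up for illustration. The %<>% operator pipes an object through a function and assigns the result back to the same name, while %T>% (the "tee" operator) runs a function for its side effect and then passes the original value along.

```r
library(magrittr)

x <- c(3, 1, 2)

# Assignment pipe: shorthand for x <- x %>% sort()
x %<>% sort()

# Tee pipe: print(x) runs as a side effect, then x itself
# (not the return value of print) is passed on to sum()
total <- x %T>%
  print() %>%
  sum()
```

After running this, x holds the sorted values and total holds their sum, with the sorted vector printed along the way.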

hello woRld

This blog is about the intersection of my two big passions: Geocaching and R programming.

Geocaching is an international treasure hunt, where you seek containers hidden by other cachers, using your phone or a GPS receiver.

R is a programming language which is commonly used in statistics-heavy academic and professional organisations.

But how do these intersect? Geocaching is about getting out into nature, walking up hills, climbing trees. I don’t know whether you’ve been outdoors lately, but rain is a real thing. You need a computer to run R, and a wet computer is not a happy computer.

R gets involved in the world of geocaching because of a type of geocache called the “mystery” (alias “unknown”). With this type of cache, you have to solve some kind of puzzle before you are given the coordinates. These puzzles range from incredibly straightforward to extremely difficult, and the more complex ones can sometimes be tackled using a programmatic approach.

Since mysteries are my favourite kind of cache, there is a natural overlap for me. And that’s what this blog will cover: my use of R in journeying through the world of geocaches.