Chapter 4 Visualization

Common problems

As you start to run R code, you’re likely to run into problems. Don’t worry — it happens to everyone. I have been writing R code for years, and every day I still write code that doesn’t work!

Start by carefully comparing the code that you’re running to the code in the book. R is extremely picky, and a misplaced character can make all the difference. Make sure that every ( is matched with a ) and every " is paired with another ". Sometimes you’ll run the code and nothing happens. Check the left-hand side of your console: if it’s a +, it means that R doesn’t think you’ve typed a complete expression and it’s waiting for you to finish it. In this case, it’s usually easiest to start from scratch by pressing ESCAPE to abort the current command.
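To see what an incomplete expression looks like outside the console, here is a small sketch. It feeds a deliberately broken call (a made-up example, not from the text above) to parse(), which reports the same condition that the + prompt signals.

```r
# A call with a missing closing parenthesis (deliberately broken).
incomplete <- "mean(c(1, 2, 3)"

# parse() reports what the console signals with the + prompt:
# the input ended before the expression was complete.
msg <- tryCatch(
  {
    parse(text = incomplete)
    "complete expression"
  },
  error = function(e) conditionMessage(e)
)
msg

# Adding the missing ) gives a complete expression that runs normally.
eval(parse(text = "mean(c(1, 2, 3))"))
```

In the console you would simply see a + on the next line until the final ) is typed or you press ESCAPE.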

If you’re still stuck, try the help. You can get help about any R function by running ?function_name in the console, or selecting the function name and pressing F1 in RStudio. Don’t worry if the help doesn’t seem that helpful - instead skip down to the examples and look for code that matches what you’re trying to do.
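For instance, the calls below are a few common ways to reach the help from the console. The function mean is only an arbitrary stand-in for whatever function you are stuck on.

```r
# Open the help page for a function (mean is just an arbitrary example).
?mean
help("mean")        # equivalent to ?mean

# Run the code from the Examples section of the help page.
example("mean")

# Search the installed help pages when you don't know the function name.
help.search("standard deviation")
```

Running example() is a quick way to follow the advice above: it executes the examples so you can see what each one produces.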

If that doesn’t help, carefully read the error message. Sometimes the answer will be buried there! But when you’re new to R, even if the answer is in the error message, you might not yet know how to understand it. Another great tool is Google: try googling the error message, as it’s likely someone else has had the same problem and has gotten help online.

plot() and ggplot()

The base plotting paradigm is "ink on paper", whereas the lattice and ggplot paradigms are basically writing a program that uses the grid package to accomplish the low-level output to the target graphics devices. The ggplot paradigm has the "Grammar of Graphics" design, which tries to integrate a variety of different plotting functions into one coherent package. It does require loading the ggplot2 package, whereas R starts up with the graphics and grDevices packages already loaded. Both ggplot2 and lattice functions require the use of an explicit print call when they are used inside a function.
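The need for an explicit print inside a function can be sketched as follows. The helper plot_and_count is hypothetical, and the mpg data set that ships with ggplot2 is used only for illustration.

```r
library(ggplot2)

# Hypothetical helper: draws a scatterplot as a side effect and then
# returns something else, so the plot is never auto-printed.
plot_and_count <- function(data) {
  p <- ggplot(data, aes(displ, hwy)) + geom_point()
  print(p)          # without this explicit print(), nothing would be drawn
  nrow(data)
}

n <- plot_and_count(mpg)
n
```

At the top level of the console the plot object is printed automatically, which is why the issue only shows up inside functions, loops, and similar constructs.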

With ggplot2 you assign the result of a plotting call to an object name and then further modify it. When it's ready for "publication" you get the output processed and sent to a device with print. ggplot2 graphics often get progressively modified by adding "layers" to a base plot created with qplot or ggplot (qplot is deprecated in recent ggplot2 releases), using the + operator, which is implemented by the +.gg method.
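A minimal sketch of that workflow, assuming the mpg data set from ggplot2 and an arbitrary choice of layers:

```r
library(ggplot2)

# Nothing is drawn yet: p is just an R object describing the plot.
p <- ggplot(mpg, aes(displ, hwy))

# Each + adds a layer and returns a new plot object.
p <- p + geom_point(aes(colour = class))
p <- p + geom_smooth(method = "lm", se = FALSE)

# Rendering happens only when the object is printed.
print(p)
```

Because p is an ordinary object, you can keep several variants around and print whichever one you need.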

In the case of base-graphics there is no R object that holds results. The commands get processed immediately and inscribed on the "paper" of the current device. You then issue further commands to augment the output on that device. The plotrix package gives a good example of the development of advanced plotting facilities using the base-graphics paradigm.
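The contrast can be seen in a short base-graphics sketch. The built-in mtcars data set and the fitted line are illustrative choices only.

```r
# Each call draws immediately on the current graphics device.
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Later calls add to what is already on the "paper".
fit <- lm(mpg ~ wt, data = mtcars)
abline(fit, col = "red")
title(main = "Fuel economy vs. weight")

# plot() returns NULL: there is no plot object to store or modify later.
result <- plot(mtcars$wt, mtcars$mpg)
is.null(result)
```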

One major limitation of ggplot2 functions versus base and lattice graphics functions is that ggplot2 does not have any 3D plotting functions. The lattice package is no longer being actively developed, but it seemed fairly mature at the point that active development stopped, and if you find a bug it will probably be fixed. The latticeExtra and gridExtra packages extend lattice and ggplot2 capabilities. There is also a gridBase package that supports merging base output with grid output, i.e. lattice or ggplot2. It is certainly true that the ggplot paradigm seems to be the target of more sustained activity in recent years.

Scales

The third way you can make your plot better for communication is to adjust the scales. Scales control the mapping from data values to things that you can perceive. Normally, ggplot2 automatically adds scales for you. For example, when you type:

ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class))

ggplot2 automatically adds default scales behind the scenes:

ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  scale_x_continuous() +
  scale_y_continuous() +
  scale_colour_discrete()

Note the naming scheme for scales: scale_ followed by the name of the aesthetic, then _, then the name of the scale. The default scales are named according to the type of variable they align with: continuous, discrete, datetime, or date. There are lots of non-default scales which you’ll learn about below.
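As a quick illustration of the naming scheme with a different variable type, the sketch below uses the economics data set that ships with ggplot2 (an assumption for this example). Its date column is of class Date, so the matching x scale is scale_x_date().

```r
library(ggplot2)

# economics$date is a Date, so the x aesthetic pairs with scale_x_date();
# unemploy is numeric, so the y aesthetic pairs with scale_y_continuous().
ggplot(economics, aes(date, unemploy)) +
  geom_line() +
  scale_x_date() +
  scale_y_continuous()
```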

The default scales have been carefully chosen to do a good job for a wide range of inputs. Nevertheless, you might want to override the defaults for two reasons:

You might want to tweak some of the parameters of the default scale. This allows you to do things like change the breaks on the axes, or the key labels on the legend.
You might want to replace the scale altogether, and use a completely different algorithm. Often you can do better than the default because you know more about the data.
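Both kinds of override can be sketched on the plot from earlier in this section. The particular breaks and the "Set1" palette are arbitrary choices made for illustration.

```r
library(ggplot2)

# 1. Tweak a default scale: same scale, different breaks on the y axis.
ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  scale_y_continuous(breaks = seq(15, 40, by = 5))

# 2. Replace a default scale: swap the discrete colour scale for a
#    ColorBrewer palette.
ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  scale_colour_brewer(palette = "Set1")
```

In the first plot only a parameter of the default scale changes. In the second, the colour scale itself is swapped for a different one.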