Chapter 2.1.1.2 Using Atomic Vectors

It’s useful to review some of the important tools for working with different types of atomic vector. These include:

How to convert from one type to another, and when that happens automatically.
How to tell if an object is a specific type of vector.
What happens when you work with vectors of different lengths.
How to name the elements of a vector.
How to pull out elements of interest.

Scalars

Each of the four primary types of atomic vector has special syntax to create an individual value, also known as a scalar¹², and its own missing value:

Strings are surrounded by " ("hi") or ' ('bye'). The string missing value is NA_character_. Special characters are escaped with \; see ?Quotes for full details.
Doubles can be specified in decimal (0.1234), scientific (1.23e4), or hexadecimal (0xcafe) forms. There are three special values unique to doubles: Inf, -Inf, and NaN. The double missing value is NA_real_.
Integers are written similarly to doubles but must be followed by L¹³ (1234L, 1e4L, or 0xcafeL), and cannot contain fractional values. The integer missing value is NA_integer_.
Logicals can be spelled out (TRUE or FALSE), or abbreviated (T or F). The logical missing value is NA.
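
As a quick check of the syntax listed above, the following sketch uses typeof() to confirm the type that each literal and each missing value produces. The particular values are only illustrations.

```r
# Each literal creates a length-one vector of the corresponding type.
typeof("hi")
#> [1] "character"
typeof(0.1234)
#> [1] "double"
typeof(1234L)
#> [1] "integer"
typeof(TRUE)
#> [1] "logical"

# Hexadecimal and scientific notation are alternative spellings of numbers.
0xcafe == 51966
#> [1] TRUE
typeof(1e4L)
#> [1] "integer"

# Each type has its own missing value; a bare NA is logical.
typeof(NA_character_)
#> [1] "character"
typeof(NA_real_)
#> [1] "double"
typeof(NA_integer_)
#> [1] "integer"
typeof(NA)
#> [1] "logical"
```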

Scalars and recycling rules

As well as implicitly coercing the types of vectors to be compatible, R will also implicitly coerce the length of vectors. This is called vector recycling, because the shorter vector is repeated, or recycled, to the same length as the longer vector.

This is generally most useful when you are mixing vectors and “scalars”. I put scalars in quotes because R doesn’t actually have scalars: instead, a single number is a vector of length 1. Because there are no scalars, most built-in functions are vectorised, meaning that they will operate on a vector of numbers. That’s why, for example, this code works:

sample(10) + 100
#> [1] 109 108 104 102 103 110 106 107 105 101
runif(10) > 0.5
#> [1] TRUE TRUE FALSE TRUE TRUE TRUE FALSE TRUE TRUE TRUE

In R, basic mathematical operations work with vectors. That means that you should never need to perform explicit iteration when performing simple mathematical computations.
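
To make the contrast concrete, here is a small sketch that doubles a vector twice: once with an explicit loop and once with vectorised arithmetic. The variable names are made up for illustration.

```r
x <- c(1.5, 2, 10)

# Explicit iteration: works, but is verbose.
doubled_loop <- numeric(length(x))
for (i in seq_along(x)) {
  doubled_loop[i] <- x[i] * 2
}

# Vectorised: the multiplication applies to every element at once.
doubled_vec <- x * 2

identical(doubled_loop, doubled_vec)
#> [1] TRUE
```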

It’s intuitive what should happen if you add two vectors of the same length, or a vector and a “scalar”, but what happens if you add two vectors of different lengths?

1:10 + 1:2
#>  [1]  2  4  4  6  6  8  8 10 10 12

Here, R expands the shorter vector to the same length as the longer one, a process called recycling. This happens silently unless the length of the longer vector is not an integer multiple of the length of the shorter:

1:10 + 1:3
#> Warning in 1:10 + 1:3: longer object length is not a multiple of shorter
#> object length
#>  [1]  2  4  6  5  7  9  8 10 12 11

While vector recycling can be used to create very succinct, clever code, it can also silently conceal problems. For this reason, the vectorised functions in the tidyverse will throw errors when you recycle anything other than a scalar. If you do want to recycle, you’ll need to do it yourself with rep().
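
A minimal sketch of recycling by hand with rep(), using only base R. The times and each arguments control whether the whole vector or each element is repeated.

```r
x <- 1:10

# Repeat the whole shorter vector five times so that it matches length(x).
y <- rep(1:2, times = 5)
y
#>  [1] 1 2 1 2 1 2 1 2 1 2

# The lengths now match, so nothing is recycled implicitly.
x + y
#>  [1]  2  4  4  6  6  8  8 10 10 12

# Repeat each element instead of the whole vector.
rep(1:2, each = 5)
#>  [1] 1 1 1 1 1 2 2 2 2 2
```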

Making longer vectors with `c()`

To create longer vectors from shorter vectors, use c():

dbl_var <- c(1, 2.5, 4.5)
int_var <- c(1L, 6L, 10L)
lgl_var <- c(TRUE, FALSE)
chr_var <- c("these are", "some strings")

In diagrams, I’ll depict vectors as connected rectangles, so the above code could be drawn as follows:

Coercion

There are two ways to convert, or coerce, one type of vector to another:

Explicit coercion happens when you call a function like as.logical(), as.integer(), as.double(), or as.character(). Whenever you find yourself using explicit coercion, you should always check whether you can make the fix upstream, so that the vector never had the wrong type in the first place. For example, you may need to tweak your readr col_types specification.
Implicit coercion happens when you use a vector in a specific context that expects a certain type of vector. For example, when you use a logical vector with a numeric summary function, or when you use a double vector where an integer vector is expected.

Because explicit coercion is used relatively rarely, and is largely easy to understand, I’ll focus on implicit coercion here.

You’ve already seen the most important type of implicit coercion: using a logical vector in a numeric context. In this case TRUE is converted to 1 and FALSE converted to 0. That means the sum of a logical vector is the number of trues, and the mean of a logical vector is the proportion of trues.
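
A short sketch of that counting idiom, with made-up values: a comparison produces a logical vector, and sum() and mean() then treat it as 0s and 1s.

```r
x <- c(3, 8, 15, 1, 22)

# The comparison returns one logical value per element.
big <- x > 5
big
#> [1] FALSE  TRUE  TRUE FALSE  TRUE

# How many values are greater than 5?
sum(big)
#> [1] 3

# What proportion are greater than 5?
mean(big)
#> [1] 0.6
```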

You may see some code (typically older) that relies on implicit coercion in the opposite direction, from integer to logical:

if (length(x)) {
  # do something
}

In this case, 0 is converted to FALSE and everything else is converted to TRUE. I think this makes it harder to understand your code, and I don’t recommend it. Instead, be explicit: length(x) > 0.
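
The explicit form might look like the sketch below. describe_length() is a hypothetical helper introduced only for this example.

```r
# Hypothetical helper: the comparison states the intent directly.
describe_length <- function(x) {
  if (length(x) > 0) {
    "non-empty"
  } else {
    "empty"
  }
}

describe_length(integer(0))
#> [1] "empty"
describe_length(c(2, 4))
#> [1] "non-empty"
```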

An atomic vector cannot have a mix of different types because the type is a property of the complete vector, not the individual elements. If you need to mix multiple types in the same vector, you should use a list, which you’ll learn about shortly.
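
The difference can be seen in a small sketch: c() forces a single type on all of its inputs, while list() keeps each element's own type.

```r
# c() coerces everything to one common type, here character.
mixed_vec <- c(1, "a", TRUE)
typeof(mixed_vec)
#> [1] "character"
mixed_vec
#> [1] "1"    "a"    "TRUE"

# A list keeps the type of each element.
mixed_list <- list(1, "a", TRUE)
sapply(mixed_list, typeof)
#> [1] "double"    "character" "logical"
```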

Testing and coercing

Generally, you can test if a vector is of a given type with an is.*() function, but these functions need to be used with care.

is.character(), is.double(), is.integer(), and is.logical() do what you might expect: they test if a vector is a character, double, integer, or logical.

Beware is.vector(), is.atomic(), and is.numeric(): they don’t test if you have a vector, atomic vector, or numeric vector, and you’ll need to carefully read the docs to figure out what they actually do.
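
A few of the surprises, as a sketch in base R. The "units" attribute and the factor are made-up examples; the comments paraphrase the behaviour described in each function's help page.

```r
x <- 1:3
is.vector(x)
#> [1] TRUE

# is.vector() returns FALSE once x has any attribute other than names.
attr(x, "units") <- "cm"
is.vector(x)
#> [1] FALSE

# It also returns TRUE for lists, which are not atomic.
is.vector(list(1, "a"))
#> [1] TRUE

# is.numeric() is TRUE for both integers and doubles...
is.numeric(1L)
#> [1] TRUE
is.numeric(1.5)
#> [1] TRUE

# ...but FALSE for a factor, even though a factor is stored as integers.
f <- factor(c("a", "b"))
typeof(f)
#> [1] "integer"
is.numeric(f)
#> [1] FALSE
is.atomic(f)
#> [1] TRUE
```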

The type is a property of the entire atomic vector, so all elements of an atomic vector must be the same type. When you attempt to combine different types, they will be coerced to the most flexible one (from most to least flexible: character, double, integer, logical).

For example, combining a character and a double yields a character:

str(c("a", 1))
#>  chr [1:2] "a" "1"
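
The same rule applies to the other combinations. The sketch below walks up the hierarchy one step at a time, using arbitrary values.

```r
# logical + integer -> integer
typeof(c(TRUE, 1L))
#> [1] "integer"

# integer + double -> double
typeof(c(1L, 2.5))
#> [1] "double"

# double + character -> character
typeof(c(2.5, "a"))
#> [1] "character"

# With three types, the most flexible one present wins.
c(TRUE, 1L, 2.5)
#> [1] 1.0 1.0 2.5
```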

Coercion often happens automatically. Most mathematical functions (+, log, abs, etc.) will coerce to numeric.

This coercion is particularly useful for logical vectors because TRUE becomes 1 and FALSE becomes 0.

x <- c(FALSE, FALSE, TRUE)
as.numeric(x)
#> [1] 0 0 1

# Total number of TRUEs
sum(x)
#> [1] 1

# Proportion that are TRUE
mean(x)
#> [1] 0.333

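Logical operations (&, |, any(), etc.) will coerce their arguments to logical, with 0 becoming FALSE and every other non-missing number becoming TRUE. Since this might lose information, any() and all() warn when given a type other than logical or integer, such as a double; & and | coerce numeric operands without a warning.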

Generally, you can deliberately coerce by using an as. function, like as.character(), as.double(), as.integer(), or as.logical().

Failed coercions from strings generate a warning and a missing value:

as.integer(c("1", "1.5", "a"))
#> Warning: NAs introduced by coercion
#> [1]  1  1 NA
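
Because a failed coercion only produces a warning, it can help to check for the new missing values yourself. The sketch below is one way to find the inputs that failed to parse; raw and parsed are illustrative names.

```r
raw <- c("1", "1.5", "a")

# Suppress the warning here because the NAs are checked explicitly below.
parsed <- suppressWarnings(as.double(raw))
parsed
#> [1] 1.0 1.5  NA

# Inputs that were not missing before coercion but are missing after it.
raw[is.na(parsed) & !is.na(raw)]
#> [1] "a"
```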

Test functions

Sometimes you want to do different things based on the type of vector.

One option is to use typeof().
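
As a sketch of the typeof() approach, the hypothetical describe() function below switches on the type string. The labels it returns are arbitrary.

```r
# Hypothetical helper that branches on the result of typeof().
describe <- function(x) {
  switch(typeof(x),
    logical = "logical vector",
    integer = ,                   # an empty argument falls through to the next one
    double = "numeric vector",
    character = "character vector",
    list = "list",
    "something else"              # default for all other types
  )
}

describe(c(TRUE, FALSE))
#> [1] "logical vector"
describe(1:3)
#> [1] "numeric vector"
describe("a")
#> [1] "character vector"
describe(mean)
#> [1] "something else"
```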

Another is to use a test function which returns a TRUE or FALSE.

Base R provides many functions like is.vector() and is.atomic(), but they often return surprising results. Instead, it’s safer to use the is_* functions provided by purrr, which are summarised in the table below.

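| | lgl | int | dbl | chr | list |
|---|---|---|---|---|---|
| `is_logical()` | x | | | | |
| `is_integer()` | | x | | | |
| `is_double()` | | | x | | |
| `is_numeric()` | | x | x | | |
| `is_character()` | | | | x | |
| `is_atomic()` | x | x | x | x | |
| `is_list()` | | | | | x |
| `is_vector()` | x | x | x | x | x |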

Each predicate also comes with a “scalar” version, like is_scalar_atomic(), which checks that the length is 1. This is useful, for example, if you want to check that an argument to your function is a single logical value.
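
To illustrate the idea without depending on a package, here is a base R sketch of such an argument check. Both is_single_logical() and safe_mean() are hypothetical names invented for this example; they are stand-ins, not the packaged predicates.

```r
# Hypothetical stand-in for a scalar predicate: logical type and length 1.
is_single_logical <- function(x) {
  is.logical(x) && length(x) == 1
}

# Hypothetical function that validates its argument before using it.
safe_mean <- function(x, na_rm = FALSE) {
  if (!is_single_logical(na_rm)) {
    stop("`na_rm` must be a single TRUE or FALSE", call. = FALSE)
  }
  mean(x, na.rm = na_rm)
}

safe_mean(c(1, NA, 3), na_rm = TRUE)
#> [1] 2
is_single_logical(c(TRUE, FALSE))
#> [1] FALSE
is_single_logical("TRUE")
#> [1] FALSE
```

Checking the argument up front means a call such as safe_mean(1:3, na_rm = "yes") fails immediately with a clear message.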