Chapter 6.2.4 Logistic Regression

Logistic regression with glm(family = "binomial")

Figure 15.3: The inverse logit function used in binary logistic regression to convert logits to probabilities.
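To make the conversion concrete, here is a minimal sketch of the inverse logit in R. The helper name inv.logit is made up for this illustration; plogis() is base R's built-in version of the same function.

```r
# Inverse logit: maps any real-valued logit to a probability between 0 and 1.
# inv.logit is a hypothetical helper name used only for this sketch.
inv.logit <- function(x) {
  1 / (1 + exp(-x))
}

logits <- c(-3, -1, 0, 1, 3)
probs <- inv.logit(logits)

# A logit of 0 corresponds to a probability of exactly 0.5
inv.logit(0)

# Base R's plogis() computes the same transformation
all.equal(probs, plogis(logits))
```

Large negative logits map to probabilities near 0, large positive logits map to probabilities near 1, and the curve is steepest around a logit of 0.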

The most common non-normal regression analysis is logistic regression, where your dependent variable takes only the values 0 and 1.

To do a logistic regression analysis with glm(), use the family = "binomial" argument.

Let’s run a logistic regression on the diamonds dataset. First, I’ll create a binary variable called value.g190 indicating whether the value of a diamond is greater than 190 or not. Then, I’ll conduct a logistic regression with our new binary variable as the dependent variable. We’ll set family = "binomial" to tell glm() that the dependent variable is binary.
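The steps just described might look roughly like the sketch below. So that the example runs on its own, it simulates a stand-in data frame called diamonds. The predictor columns (weight, clarity, color) and the numbers used to generate value are assumptions for illustration, not the real dataset. The model name diamond.glm is also just a label chosen here.

```r
set.seed(100)
n <- 150

# Simulated stand-in for the diamonds data (column names and
# generating coefficients are assumptions for this sketch)
diamonds <- data.frame(
  weight  = rnorm(n, mean = 10, sd = 2),
  clarity = rnorm(n, mean = 1, sd = 0.2),
  color   = sample(2:8, size = n, replace = TRUE)
)
diamonds$value <- 148 + 2.2 * diamonds$weight + 21.7 * diamonds$clarity -
  0.45 * diamonds$color + rnorm(n, mean = 0, sd = 5)

# Step 1: create the binary variable (TRUE if value is greater than 190)
diamonds$value.g190 <- diamonds$value > 190

# Step 2: logistic regression with the binary variable as the dependent variable
diamond.glm <- glm(formula = value.g190 ~ weight + clarity + color,
                   data = diamonds,
                   family = "binomial")

# Coefficients are on the logit scale
summary(diamond.glm)$coefficients

# Predicted logits, and the same predictions converted to probabilities
pred.logit <- predict(diamond.glm)
pred.prob  <- predict(diamond.glm, type = "response")
```

Because the coefficients are on the logit scale, predict() returns logits by default. Setting type = "response" applies the inverse logit so you get probabilities instead.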

Modeling Conditional Probabilities

So far, we have looked either at estimating the conditional expectations of continuous variables (as in regression) or at estimating distributions. In many situations, however, we are interested in input-output relationships, as in regression, but the output variable is discrete rather than continuous. In particular, there are many situations where we have binary outcomes (it snows in Pittsburgh on a given day, or it doesn’t; this squirrel carries plague, or it doesn’t; this loan will be paid back, or it won’t; this person will get heart disease in the next five years, or they won’t). In addition to the binary outcome, we have some input variables, which may or may not be continuous. How could we model and analyze such data?

Transforming skewed variables prior to standard regression

If you have a highly skewed variable that you want to include in a regression analysis, you can do one of two things.

Option 1 is to use the generalized linear model glm() with an appropriate family (like family = Gamma; note the capital G, since gamma() in R is a different function).

Option 2 is to do a standard regression analysis with lm(), but first transform the variable into something less skewed. For highly skewed data, the most common transformation is a log-transformation.
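The two options can be compared side by side in a small sketch. The data below are simulated, and the variable names (x, y, gamma.glm, log.lm) are made up for illustration. The Gamma model uses a log link, which is a choice made here so that its slope is on the same scale as the log-transformed lm(); the default link for Gamma is the inverse.

```r
set.seed(1)
n <- 200

# Simulated data: y is positive and strongly right-skewed (an assumption
# for this sketch, not data from the text)
x <- runif(n, min = 0, max = 2)
y <- exp(1 + 0.8 * x + rnorm(n, mean = 0, sd = 0.5))

# Option 1: generalized linear model with a Gamma family and a log link
gamma.glm <- glm(y ~ x, family = Gamma(link = "log"))

# Option 2: standard regression on the log-transformed variable
log.lm <- lm(log(y) ~ x)

# Compare the estimated slopes for x from the two approaches
coef(gamma.glm)["x"]
coef(log.lm)["x"]
```

In this simulated case both slopes should land close to the generating value of 0.8. The intercepts need not match, because the Gamma model describes the log of the mean of y, while lm() describes the mean of log(y).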