8.1 The many faces of regression

From R in Action (pages 199-202)

The term regression can be confusing because there are so many specialized varieties (see table 8.1). In addition, R has powerful and comprehensive features for fitting regression models, and the abundance of options can be confusing as well. For example, in 2005, Vito Ricci created a list of over 205 functions in R that are used to generate regression analyses (http://cran.r-project.org/doc/contrib/Ricci-refcard-regression.pdf).

Table 8.1 Varieties of regression analysis

Type of regression Typical use

Simple linear Predicting a quantitative response variable from a quantitative explanatory variable

Polynomial Predicting a quantitative response variable from a quantitative explanatory variable, where the relationship is modeled as an nth order polynomial

Multiple linear Predicting a quantitative response variable from two or more explanatory variables

Multivariate Predicting more than one response variable from one or more explanatory variables

Logistic Predicting a categorical response variable from one or more explanatory variables

Poisson Predicting a response variable representing counts from one or more explanatory variables

Cox proportional hazards Predicting time to an event (death, failure, relapse) from one or more explanatory variables

Time-series Modeling time-series data with correlated errors

Nonlinear Predicting a quantitative response variable from one or more explanatory variables, where the form of the model is nonlinear

Nonparametric Predicting a quantitative response variable from one or more explanatory variables, where the form of the model is derived from the data and not specified a priori

Robust Predicting a quantitative response variable from one or more explanatory variables using an approach that’s resistant to the effect of influential observations

In this chapter, we’ll focus on regression methods that fall under the rubric of ordinary least squares (OLS) regression, including simple linear regression, polynomial regression, and multiple linear regression. OLS regression is the most common variety of statistical analysis today. Other types of regression models (including logistic regression and Poisson regression) will be covered in chapter 13.

8.1.1 Scenarios for using OLS regression

In OLS regression, a quantitative dependent variable is predicted from a weighted sum of predictor variables, where the weights are parameters estimated from the data. Let’s take a look at a concrete example (no pun intended), loosely adapted from Fwa (2006).

An engineer wants to identify the most important factors related to bridge deterioration (such as age, traffic volume, bridge design, construction materials and methods, construction quality, and weather conditions) and determine the mathematical form of these relationships. She collects data on each of these variables from a representative sample of bridges and models the data using OLS regression.
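In symbols, the weighted-sum model described at the start of this section can be sketched as follows. The notation is conventional and is not defined elsewhere in this section: for n observations and k predictor variables, Y_i is the observed response for observation i, X_ji is the value of the jth predictor for that observation, and the hatted β's are the weights estimated from the data. In the bridge example, Y_i could be some measure of bridge i's deterioration and the X's its age, traffic volume, and so on. OLS chooses the weights that minimize the sum of squared differences between observed and predicted responses:

```latex
\begin{aligned}
\hat{Y}_i &= \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + \cdots + \hat{\beta}_k X_{ki}, \qquad i = 1, \ldots, n \\
(\hat{\beta}_0, \ldots, \hat{\beta}_k) &= \arg\min_{\beta_0, \ldots, \beta_k} \sum_{i=1}^{n} \bigl( Y_i - \beta_0 - \beta_1 X_{1i} - \cdots - \beta_k X_{ki} \bigr)^2
\end{aligned}
```

With a single predictor (k = 1) this reduces to simple linear regression; multiple linear regression corresponds to k ≥ 2.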

The approach is highly interactive. She fits a series of models, checks their compliance with underlying statistical assumptions, explores any unexpected or aberrant findings, and finally chooses the “best” model from among many possible models. If successful, the results will help her to

• Focus on important variables, by determining which of the many collected variables are useful in predicting bridge deterioration, along with their relative importance.

• Look for bridges that are likely to be in trouble, by providing an equation that can be used to predict bridge deterioration for new cases (where the values of the predictor variables are known, but the degree of bridge deterioration isn’t).

• Take advantage of serendipity, by identifying unusual bridges. If she finds that some bridges deteriorate much faster or slower than predicted by the model, a study of these “outliers” may yield important findings that could help her to understand the mechanisms involved in bridge deterioration.

Bridges may hold no interest for you. I’m a clinical psychologist and statistician, and I know next to nothing about civil engineering. But the general principles apply to an amazingly wide selection of problems in the physical, biological, and social sciences. Each of the following questions could also be addressed using an OLS approach:

• What’s the relationship between surface stream salinity and paved road surface area (Montgomery, 2007)?

• What aspects of a user’s experience contribute to the overuse of massively multiplayer online role-playing games (MMORPGs) (Hsu, Wen, & Wu, 2009)?

• Which qualities of an educational environment are most strongly related to higher student achievement scores?

• What’s the form of the relationship between blood pressure, salt intake, and age? Is it the same for men and women?

• What’s the impact of stadiums and professional sports on metropolitan area development (Baade & Dye, 1990)?

• What factors account for interstate differences in the price of beer (Culbertson & Bradford, 1991)? (That one got your attention!)

Our primary limitation is our ability to formulate an interesting question, devise a useful response variable to measure, and gather appropriate data.

8.1.2 What you need to know

For the remainder of this chapter I’ll describe how to use R functions to fit OLS regression models, evaluate the fit, test assumptions, and select among competing models.

It’s assumed that the reader has had exposure to least squares regression as typically taught in a second-semester undergraduate statistics course. However, I’ve made efforts to keep the mathematical notation to a minimum and focus on practical rather than theoretical issues. A number of excellent texts are available that cover the statistical material outlined in this chapter. My favorites are John Fox’s Applied Regression Analysis and Generalized Linear Models (for theory) and An R and S-Plus Companion to Applied Regression (for application). They both served as major sources for this chapter.

A good nontechnical overview is provided by Licht (1995).
