You will likely find approach (2) to be the most useful in practice because, in many cases, you will want to change the granularity of your categorical variables. For example, you may not want a factor “city” with many different levels/values: a regression equation with a zillion dummy variables in it is hard to read and has little generalizable business value. This is what I mean by changing granularity.

# lm(formula = Strength ~ No + Cement + Slag + Water + CoarseAgg +
# Residual standard error: 2.664 on 94 degrees of freedom
# F-statistic: 142.2 on 8 and 94 DF, p-value: < 2.2e-16

As one might expect, the adjusted R² has gone up slightly relative to the kitchen sink model.

10.6 Standardized regression coefficients

Recall the interpretation of the coefficients: “A one unit change in variable \(X_i\) is associated with a \(\beta_i\) change in the response variable.” So here, a one unit change in Cement is associated with a 0.097848 change in Strength. Whether 0.097848 is big or small depends critically on the units used for the Cement variable. If the “one unit change” is measured in grams, then 0.097848 is likely a large effect.
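To see how much a coefficient depends on units, here is a minimal sketch using made-up data. The data frame toy, the columns CementKg and CementG, and the choice of kilograms versus grams are all assumptions for illustration; none of them come from the Con data set. The same relationship is fitted twice, once per unit, and then again after standardizing every variable with scale().

```r
# Hypothetical data: CementKg, CementG and all values are invented
# for illustration and are not taken from the Con data set.
set.seed(1)
toy <- data.frame(CementKg = runif(50, min = 150, max = 400))
toy$Strength <- 10 + 0.1 * toy$CementKg + rnorm(50, sd = 3)
toy$CementG <- toy$CementKg * 1000  # same quantity, expressed in grams

# Raw coefficients: the slope changes when the unit changes.
b_kg <- unname(coef(lm(Strength ~ CementKg, data = toy))["CementKg"])
b_g  <- unname(coef(lm(Strength ~ CementG,  data = toy))["CementG"])

# Standardize every column (subtract the mean, divide by the
# standard deviation) and refit.
toy_std <- as.data.frame(scale(toy))
b_std_kg <- unname(coef(lm(Strength ~ CementKg, data = toy_std))["CementKg"])
b_std_g  <- unname(coef(lm(Strength ~ CementG,  data = toy_std))["CementG"])

# The raw slopes differ by the unit conversion factor (1000),
# while the standardized slopes agree with each other.
print(c(b_kg = b_kg, b_g = b_g, ratio = b_kg / b_g))
print(c(b_std_kg = b_std_kg, b_std_g = b_std_g))
```

The standardized coefficient reads as “a one standard deviation change in the predictor is associated with this many standard deviations of change in the response”, so it no longer depends on whether cement was recorded in grams or kilograms. With a single predictor, as in this sketch, it is equal to the correlation between the two variables.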
There is no “AirEntrainNo” variable because “No” has been selected as the base case (when AirEntrainYes = 0). We can isolate AirEntrain and show this visually:

ggplot(data = Con %>% mutate(AirEntrainDummy = if_else(AirEntrain == "Yes", 1, 0)),
       mapping = aes(x = AirEntrainDummy, y = Strength)) +
  geom_point() +
  geom_smooth(method = lm, col = "blue", se = FALSE)

Here, I have had to manually create a temporary AirEntrainDummy variable in order to get the regression line to plot correctly. But you can see the basic idea: each measure of concrete strength falls on either the AirEntrainDummy = 0 or the AirEntrainDummy = 1 tick mark. It turns out that the mean of the points at AirEntrainDummy = 0 is higher than the mean of the points at AirEntrainDummy = 1. As such, the best-fit line slopes downwards. This is what the negative coefficient for the AirEntrainYes variable tells us: adding air leads to an average decrease in strength of 6.068252.
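The link between the group means and the slope can be checked directly. The sketch below uses a tiny invented data frame (toy and its six values are assumptions, not rows from Con) and base R only. When the dummy is the only predictor, the slope is exactly the difference between the two group means; in a model with other predictors, such as the one above, the coefficient is instead the difference after adjusting for those predictors, so it will not generally match the raw difference.

```r
# Hypothetical toy data; the values below are invented for illustration
# and are not taken from the Con data set.
toy <- data.frame(
  AirEntrain = factor(c("No", "No", "No", "Yes", "Yes", "Yes")),
  Strength   = c(40, 42, 44, 34, 36, 38)
)

# Recode the factor as a 0/1 dummy, with "No" as the base case.
toy$AirEntrainDummy <- ifelse(toy$AirEntrain == "Yes", 1, 0)

# Simple regression of Strength on the dummy alone.
fit <- lm(Strength ~ AirEntrainDummy, data = toy)

# Mean strength at each value of the dummy.
group_means <- tapply(toy$Strength, toy$AirEntrainDummy, mean)

# The intercept is the mean of the base case (dummy = 0); the slope is
# the mean at dummy = 1 minus the mean at dummy = 0.
slope     <- unname(coef(fit)["AirEntrainDummy"])
mean_diff <- unname(group_means["1"] - group_means["0"])
print(c(slope = slope, mean_diff = mean_diff))
```

In this toy example the “No” group averages 42 and the “Yes” group averages 36, so both numbers come out as -6.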
You can see in the table of coefficients that a new variable called “AirEntrainYes” has been added to the model automatically. R adds the “Yes” suffix to remind us that the original values of AirEntrain have been mapped to “Yes” = 1.
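If you want to see the dummy variable that R builds behind the scenes, model.matrix() will show it. This is a small sketch on invented data (the toy data frame is an assumption, not part of Con), relying on R's default treatment contrasts, under which the first factor level becomes the base case.

```r
# Hypothetical toy data, invented for illustration only.
toy <- data.frame(
  AirEntrain = factor(c("No", "Yes", "No", "Yes")),
  Strength   = c(41, 35, 43, 37)
)

# Factor levels are sorted alphabetically by default, so "No" comes
# first and is used as the base case.
print(levels(toy$AirEntrain))

# The design matrix that lm() would use: an intercept column plus one
# 0/1 column named after the factor and the non-base level.
X <- model.matrix(Strength ~ AirEntrain, data = toy)
print(colnames(X))
print(X[, "AirEntrainYes"])

# Choosing a different base case flips the naming.
toy$AirEntrainAlt <- relevel(toy$AirEntrain, ref = "Yes")
X_alt <- model.matrix(Strength ~ AirEntrainAlt, data = toy)
print(colnames(X_alt))
```

The column name is simply the variable name pasted together with the level that is coded as 1, which is where “AirEntrainYes” comes from.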
# Residual standard error: 4.01 on 94 degrees of freedom
# F-statistic: 56.21 on 8 and 94 DF, p-value: < 2.2e-16

As we should expect, this result is identical to the kitchen sink model in Excel (R² = 0.8271).

# Residual standard error: 2.679 on 93 degrees of freedom
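The degrees of freedom in these summaries follow from the number of observations and the number of estimated coefficients. “8 and 94 DF” is consistent with 103 observations, eight predictors and an intercept (103 - 8 - 1 = 94), and adding one more dummy variable leaves 93. The sketch below reproduces that bookkeeping on random data; the 103 rows are inferred from the output above, and the toy data frame and its columns V1 to V8 are assumptions for illustration only.

```r
# Hypothetical data with the same shape as the models above:
# 103 rows, eight numeric predictors, one two-level factor.
set.seed(42)
n <- 103
toy <- as.data.frame(matrix(rnorm(n * 8), nrow = n))  # columns V1..V8
toy$Strength   <- rnorm(n)
toy$AirEntrain <- factor(rep(c("No", "Yes"), length.out = n))

# Eight predictors plus an intercept: 103 - 9 = 94 residual df.
fit8 <- lm(Strength ~ . - AirEntrain, data = toy)

# Adding the factor adds one dummy coefficient: 103 - 10 = 93 residual df.
fit9 <- lm(Strength ~ ., data = toy)

print(c(df8 = df.residual(fit8), df9 = df.residual(fit9)))

# The pieces printed at the bottom of summary() can be pulled out by name.
s <- summary(fit8)
print(s$fstatistic[c("numdf", "dendf")])
print(c(r2 = s$r.squared, adj_r2 = s$adj.r.squared, rse = s$sigma))
```

Because the data here are pure noise, the R² and F-statistic values themselves mean nothing; only the degrees of freedom are meant to line up with the output above.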