Linear Models -- Foundation

Class Exercise Handout: What Type of Variables?
  Before you begin: what are your choices (main categories, subcategories)?

What Is the Response Variable?
  Before you begin: what is a response variable?

Linear Models: A Categorization Scheme
  [Diagram: methods numbered 1 to 5, labeled "1 Factor" and "2 Factors"]
  All use a common foundation of theory.

Class Exercise Handout: Which Test? Why?
  Identify the response variable and the explanatory variable(s).
  Determine which type of variable each is.
  Use the table to identify the method to use.

Example Data: Sex & Direction
  A sample of 30 males and 30 females was taken to an unfamiliar wooded park and given spatial orientation tests, including pointing to the south. The absolute pointing error (APE), in degrees, was recorded. The results are in the SexDirection.csv file on the webpage.
  Is there a difference in sense of direction between men and women?
  Source: Sholl, M.J., J.C. Acacio, R.O. Makar, and C. Leon. 2000. The relation of sex and sense of direction to spatial orientation in an unfamiliar environment. Journal of Environmental Psychology 20:17-28.

Example Data: Sex & Direction
  What are the hypotheses?
    H0: $\mu_m - \mu_f = 0$
    HA: $\mu_m - \mu_f \neq 0$
  Which hypothesis test should be used? Two-sample t-test.
  What is the conclusion from the handout? No significant difference in mean APE between males and females.

Competing Models
  Characteristic   Full Model   Simple Model
  Hypothesis       HA           H0
  # Parameters     More         Fewer
  Relative Fit     Better       Worse
  [Figure: absolute error (0 to 150) by sex (Female, Male), with the simple (H0) and full (HA) models]

Competing Models: 2-Sample T, H0
  H0: $\mu_i = \mu$
  The mean for each group equals a single grand mean, i.e., no difference in group means.
  [Figure: absolute error (0 to 150) by sex (Female, Male), simple model]

Competing Models: 2-Sample T, HA
  HA: $\mu_i = \mu_i$ (where $\mu_1 \neq \mu_2$)
  Each group mean equals a different value, i.e., a difference in group means.
  [Figure: absolute error (0 to 150) by sex (Female, Male), full model]
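The two competing models can be sketched in R as two `lm()` fits. This is only an illustration: the ten `ape` values below are hypothetical and are not the SexDirection.csv data, and the variable names are assumptions made for the example.

```r
# Hypothetical absolute pointing errors (degrees); NOT the SexDirection.csv data
ape <- c(20, 35, 50, 65, 80, 30, 55, 70, 95, 120)
sex <- factor(rep(c("Female", "Male"), each = 5))

# Simple model (H0): a single grand mean predicts every individual
simple <- lm(ape ~ 1)
# Full model (HA): a separate mean predicts each group
full <- lm(ape ~ sex)

# Number of estimated mean parameters in each model
n_par_simple <- length(coef(simple))
n_par_full <- length(coef(full))

# Predictions made by each model
pred_simple <- unname(fitted(simple)[1])              # grand mean
pred_full <- as.numeric(tapply(fitted(full), sex, mean))  # one mean per group

print(c(simple = n_par_simple, full = n_par_full))
print(pred_simple)
print(pred_full)
```

The full model uses one more parameter than the simple model, which is the added complexity that its better fit has to justify.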

Competing Models
  Is the benefit of a better fit worth the cost of added complexity?
  [Figure: two panels of absolute error (0 to 150) by sex (Female, Male)]

Measuring Fit
  [Figure: two panels of absolute error (0 to 150) by sex (Female, Male)]

Measuring Fit: Notation
  $Y_{ij}$ = Y measurement on individual j in group i
  I = total number of groups
  $n_i$ = number of individuals in group i
  n = number of individuals in all groups
  $\bar{Y}_{i\cdot}$ = group i sample mean (i.e., group mean)
  $\bar{Y}_{\cdot\cdot}$ = sample mean of all individuals (i.e., grand mean)
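As a rough illustration of the notation, the quantities I, n_i, n, the group means, and the grand mean can each be computed in one line of R. The data are hypothetical (not the SexDirection.csv values) and the object names are assumptions for this sketch.

```r
# Hypothetical absolute pointing errors (degrees); NOT the SexDirection.csv data
ape <- c(20, 35, 50, 65, 80, 30, 55, 70, 95, 120)
sex <- factor(rep(c("Female", "Male"), each = 5))

n_groups <- nlevels(sex)                # I: total number of groups
n_i <- tapply(ape, sex, length)         # n_i: individuals in each group
n <- length(ape)                        # n: individuals in all groups
group_means <- tapply(ape, sex, mean)   # group i sample means
grand_mean <- mean(ape)                 # grand mean of all individuals

# The grand mean is the n_i-weighted average of the group means
weighted_mean <- sum(n_i * group_means) / n

print(group_means)
print(grand_mean)
```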

Measuring Fit: Notation Examples
  ith group sample mean: $\bar{Y}_{i\cdot} = \frac{1}{n_i}\sum_{j=1}^{n_i} Y_{ij}$
  Grand sample mean: $\bar{Y}_{\cdot\cdot} = \frac{1}{n}\sum_{i=1}^{I}\sum_{j=1}^{n_i} Y_{ij}$

Measuring Fit: SS
  A sum of squares (SS) measures the lack-of-fit of a model to a set of data.

Measuring Fit: SSTotal
  $SS_{Total} = \sum_{i=1}^{I}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 = 115465$
  [Figure: absolute error (0 to 150) by sex (Female, Male), showing the data and the simple model]

Measuring Fit: SSWithin
  $SS_{Within} = \sum_{i=1}^{I}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2 = 110496$
  [Figure: absolute error (0 to 150) by sex (Female, Male), showing the data and the full model]

Measuring Fit: SSWithin & SSTotal
  SSTotal = 115465
  SSWithin = 110496
  SSWithin is never larger than SSTotal, so the full model ALWAYS fits at least as well as the simple model.

Measuring Fit: SSTotal Partitions
  SSTotal = SSWithin + SSAmong, where $SS_{Among} = \sum_{i=1}^{I} n_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2$
  SSAmong is:
    the difference in SS between the full and simple models
    the improvement in lack-of-fit when using the full model (rather than the simple model)
    a measure of how different the group means are

Measuring Fit: SSAmong
  Must not forget about differences in model complexity!
  What would make SSAmong large?
  [Figure: absolute error (0 to 150) by sex (Female, Male)]
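The partition of SSTotal can be checked numerically. The sketch below computes the three sums of squares from their definitions on hypothetical data (not the SexDirection.csv values), then applies the same subtraction to the SSTotal and SSWithin values reported on the slides. Object names are assumptions for the example.

```r
# Hypothetical absolute pointing errors (degrees); NOT the SexDirection.csv data
ape <- c(20, 35, 50, 65, 80, 30, 55, 70, 95, 120)
sex <- factor(rep(c("Female", "Male"), each = 5))

grand_mean <- mean(ape)
group_mean_each <- ave(ape, sex)   # each individual's own group mean

# Lack-of-fit of the simple model (single grand mean)
ss_total <- sum((ape - grand_mean)^2)
# Lack-of-fit of the full model (separate group means)
ss_within <- sum((ape - group_mean_each)^2)
# Spread of the group means around the grand mean, one term per individual
ss_among <- sum((group_mean_each - grand_mean)^2)

print(c(total = ss_total, within = ss_within, among = ss_among))

# Same partition applied to the values reported on the slides
ss_among_reported <- 115465 - 110496
print(ss_among_reported)
```

SSAmong grows when the group means move further from the grand mean, that is, when the group means differ more from each other.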

Measuring Complexity
  df = n - number of predictions
  Simple model: dfTotal = n - 1
  Full model: dfWithin = n - I
  dfTotal = dfWithin + dfAmong
  dfAmong = I - 1
    Difference in the number of model parameters
    Added complexity of the full model

Fit vs. Complexity
  Factor out the difference in the number of parameters from the fit calculation by dividing SS by df.
  The result is a mean square (MS).
  MS are sample variances:
    MSTotal = $s^2$ = total variability among individuals around the grand mean
    MSWithin = $s_p^2$ = pooled variability among individuals around the group means
    MSAmong = variability of the group means around the grand mean

Fit vs. Complexity: MS
  Suppose that MSAmong = 10.
    Is this large if MSWithin = 100?
    Is this large if MSWithin = 1?
  F = MSAmong / MSWithin

Fit vs. Complexity: F Distribution
  F = MSAmong / MSWithin
  Has numerator and denominator df
    numerator df from dfAmong
    denominator df from dfWithin
  Right-skewed, all positive numbers
  The p-value is always the upper tail
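The df, MS, and F calculations can be followed step by step using the SSTotal and SSWithin values reported earlier, with n = 60 (30 males and 30 females) and I = 2. This is a sketch of the arithmetic only, with object names chosen for the example; it does not read the data file.

```r
n <- 60          # 30 males + 30 females
n_groups <- 2    # I: Female, Male

ss_total <- 115465
ss_within <- 110496
ss_among <- ss_total - ss_within

# Complexity: df = n - number of predictions made by the model
df_total <- n - 1
df_within <- n - n_groups
df_among <- n_groups - 1

# Mean squares: SS divided by df
ms_total <- ss_total / df_total
ms_within <- ss_within / df_within
ms_among <- ss_among / df_among

# F ratio and its upper-tail p-value
f_stat <- ms_among / ms_within
p_value <- pf(f_stat, df_among, df_within, lower.tail = FALSE)

print(c(F = f_stat, p = p_value))
# Mean squares do not add up the way sums of squares do
print(c(ms_among + ms_within, ms_total))
```

With these values the p-value is larger than 0.05, which agrees with the handout conclusion of no significant difference in mean APE between males and females.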

Fit vs. Complexity: p-value
  MSAmong = SSAmong / dfAmong
  F = MSAmong / MSWithin

  Large p-value?                        Small p-value?
  Small F                               Large F
  Small MSAmong relative to MSWithin    Large MSAmong relative to MSWithin
  Small SSAmong                         Large SSAmong
  Full model not better                 Full model is better
  Group means do not differ             Group means do differ
  [Figure: absolute error (0 to 150) by sex (Female, Male)]
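The chain from group-mean differences to the p-value can be illustrated with a small constructed example. The helper `f_and_p()` below is hypothetical: it builds two groups of five values with identical within-group spread, shifts the second group by `delta`, and returns the F and p-value from `anova()`. None of the numbers come from the SexDirection.csv data.

```r
# Hypothetical helper: F and p-value when two group means differ by 'delta'
f_and_p <- function(delta) {
  within_dev <- c(-20, -10, 0, 10, 20)   # same within-group spread in both groups
  group <- factor(rep(c("A", "B"), each = 5))
  y <- 50 + delta * (group == "B") + rep(within_dev, times = 2)
  tab <- anova(lm(y ~ group))
  c(delta = delta, F = tab[["F value"]][1], p = tab[["Pr(>F)"]][1])
}

# MSWithin stays fixed; only the difference between group means changes
results <- t(sapply(c(5, 15, 45), f_and_p))
print(results)
```

As `delta` grows, SSAmong and MSAmong grow while MSWithin stays the same, so F increases and the upper-tail p-value shrinks.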

Linear Models in R (handout)
  Note the use of lm(), summary(), coef(), confint(), fitPlot(), and anova().

Things To Remember
  There are always two models.
    The full model is a separate mean for each group.
    The simple model is a single mean for all groups.
  SSTotal partitions into two parts: SSAmong + SSWithin = SSTotal.
    SSAmong is the improvement in lack-of-fit from using the full model.
  MS are SS/df and are variances.
    MSTotal is the variance of Y.
    MSWithin is the pooled common variance.
  dfAmong is the increase in complexity of the full model.
  MSAmong + MSWithin does not equal MSTotal (because of different df).
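The base R functions named under "Linear Models in R" can be tried on a small example. This is a minimal sketch on hypothetical data (not the SexDirection.csv values, whose column names are not given here); `fitPlot()` is left out because it is not part of base R. The final lines compare the ANOVA F with the squared statistic from an equal-variance two-sample t-test.

```r
# Hypothetical absolute pointing errors (degrees); NOT the SexDirection.csv data
ape <- c(20, 35, 50, 65, 80, 30, 55, 70, 95, 120)
sex <- factor(rep(c("Female", "Male"), each = 5))

full <- lm(ape ~ sex)        # full model: separate mean for each group
print(summary(full))         # coefficient table, residual SE, overall F
print(coef(full))            # intercept = Female mean; sexMale = Male - Female
print(confint(full))         # 95% confidence intervals for the coefficients

aov_tab <- anova(full)       # SSAmong and SSWithin, df, MS, F, p-value
print(aov_tab)

# With two groups, the ANOVA F equals the squared equal-variance t statistic
t_res <- t.test(ape ~ sex, var.equal = TRUE)
print(c(F = aov_tab[["F value"]][1], t_squared = unname(t_res$statistic)^2))
```

In the `anova()` table the `sex` row holds SSAmong, dfAmong, and MSAmong, and the `Residuals` row holds SSWithin, dfWithin, and MSWithin.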