16.1 Example School Data

The data used in this first example come from a publicly available data set, the National Education Longitudinal Study of 1988 (yes, the data are a bit old, but they are sufficient for our purposes here). In this data set, math scores are recorded for 519 students from 23 schools.

List a few characteristics that you think are associated with math performance, and at what level they are measured.

The School23 data set contains the following variables:

  • School (macro) level variables
    • School type
    • Class structure
    • School size
    • Urbanity
    • Geographic region
    • Percent minority
    • Student-teacher ratio
  • Student (micro) level variables
    • Gender
    • Race
    • Time spent on math homework
    • SES
    • Parental education
    • Math score

Imagine a model of math score based on school type (\(X_{1}\): 1 for public, 0 for private) and socioeconomic status, SES (\(X_{2}\)).

\[ Y_{i} = \beta_{0} + \beta_{1}X_{1i} + \beta_{2}X_{2i} + \epsilon_{i}, \qquad i = 1, \ldots, n=519 \]
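As a rough illustration of how this single-level model could be fit, the Python sketch below estimates \(\beta_{0}\), \(\beta_{1}\), and \(\beta_{2}\) by ordinary least squares. Everything in it is an assumption made for illustration: the data are simulated rather than taken from School23, the values in `TRUE_BETA` are hypothetical, the way school types and students are assigned to schools is arbitrary, and `solve` is a small helper written for this sketch.

```python
import random

random.seed(1988)  # arbitrary seed so the simulated data are reproducible

N_STUDENTS = 519
N_SCHOOLS = 23

# Simulated stand-in data (NOT the real School23 values).
# Hypothetical rule: every third school is private (0), the rest public (1).
school_type = [0 if s % 3 == 0 else 1 for s in range(N_SCHOOLS)]
# Hypothetical rule: students are dealt out to schools in turn.
school_of = [i % N_SCHOOLS for i in range(N_STUDENTS)]

x1 = [school_type[s] for s in school_of]              # school type (macro level)
x2 = [random.gauss(0, 1) for _ in range(N_STUDENTS)]  # SES (micro level)

TRUE_BETA = (50.0, -2.0, 4.0)  # hypothetical coefficients used to simulate Y
y = [TRUE_BETA[0] + TRUE_BETA[1] * a + TRUE_BETA[2] * b + random.gauss(0, 8)
     for a, b in zip(x1, x2)]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


# Design matrix with columns: intercept, school type, SES.
X = [[1.0, float(a), b] for a, b in zip(x1, x2)]

# Normal equations: (X'X) beta = X'y.
xtx = [[sum(row[j] * row[k] for row in X) for k in range(3)] for j in range(3)]
xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(3)]
beta_hat = solve(xtx, xty)

# Residuals from the fitted model.
residuals = [yi - sum(b * v for b, v in zip(beta_hat, row)) for row, yi in zip(X, y)]

print("Estimated coefficients (b0, b1, b2):", [round(b, 2) for b in beta_hat])
```

Note that this fit treats all 519 simulated students as independent observations; nothing in it uses the fact that students share a school.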

This model does not take into account the hierarchical nature of the data: students are nested within schools. School type is a macro-level variable, whereas SES is a micro-level variable. We could consider adding indicator variables for the schools (with the first school as the reference category) to create a fixed effects model,

\[ Y_{i} = \beta_{0} + \beta_{1}(\text{SchoolType})_{i} + \beta_{2}(\text{SES})_{i} + \beta_{3}(\text{School2})_{i} + \ldots + \beta_{24}(\text{School23})_{i} + \epsilon_{i} \]

but we are already well aware of the problems with fitting models that have this many parameters, particularly when some schools have only a few students in them. So we need a different model.
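To make the parameter count concrete, the Python sketch below builds one row of the fixed effects design matrix and counts its columns. It is only an illustration under stated assumptions: the school labels and the unequal enrollment pattern are simulated, and `fixed_effects_row` is a hypothetical helper, not part of any library or of the School23 data.

```python
import random
from collections import Counter

random.seed(23)  # arbitrary seed so the simulated enrollment is reproducible

N_STUDENTS = 519
N_SCHOOLS = 23

# Hypothetical, unequal enrollment: schools are labeled 1..23 and
# some are drawn more often than others (simulated, not real data).
weights = [1 + (s % 5) for s in range(N_SCHOOLS)]
school_of = random.choices(range(1, N_SCHOOLS + 1), weights=weights, k=N_STUDENTS)


def fixed_effects_row(school_type, ses, school):
    """Return one design-matrix row: intercept, school type, SES,
    then 0/1 indicators for schools 2..23 (school 1 is the reference)."""
    indicators = [1.0 if school == s else 0.0 for s in range(2, N_SCHOOLS + 1)]
    return [1.0, float(school_type), float(ses)] + indicators


# Example row for a student in a public school (1), SES 0.3, attending school 5.
row = fixed_effects_row(school_type=1, ses=0.3, school=5)

n_params = len(row)  # beta_0 through beta_24
print("Number of regression coefficients:", n_params)  # 25

# Students per school in the simulated enrollment.
sizes = Counter(school_of)
print("Smallest and largest simulated school:", min(sizes.values()), max(sizes.values()))
```

With 519 students and 25 coefficients, there are on average only about 21 observations per coefficient, and a school with few students contributes very little information to its own indicator.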