The data relates to credit approval decisions, in particular whether or not a client is approved for credit depending on their credit score, years of credit history, revolving balance, revolving utilization and home ownership status. The natural dependent variable to be predicted is the credit score, which depends on the years of credit history, the revolving balance and the revolving utilization. Apart from the credit score, the approval decision can also be used as a categorical (binary) dependent variable. It should however be noted that, with a binary dependent variable, ordinary least squares no longer produces the best linear unbiased estimator; the regression it fits is a linear probability model (LPM), in which the fitted values are read as approval probabilities. In this case the straight regression line will not be a good fit for the data, because the observed outcomes take only the values 0 and 1, which implies that usual measures such as the coefficient of determination (R²) are often unreliable.
Moreover, linear probability models are characterised by heteroskedasticity and can produce fitted values that are greater than 1 or less than 0. This makes them difficult to interpret, because the fitted values are meant to be probabilities, which must lie between 0 and 1. The error term in such models is also non-normal: for any given values of the explanatory variables it can take only two values, one for each outcome of the dependent variable. Finally, the relationship between the variables is also likely to be non-linear, which suggests that a different type of regression curve would be required to fit the data more accurately, for instance an ‘S’ shaped curve.
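The out-of-range and heteroskedasticity problems can be illustrated with a minimal sketch. The data here is purely hypothetical and is not the credit data discussed above: scores run from 300 to 850 in steps of 10, and approval is assumed to switch from 0 to 1 at a cutoff of 600. The helper names fit_ols and lpm_error_variance are introduced only for this illustration. A straight line is fitted by ordinary least squares, and the fitted values are then checked against the interval from 0 to 1.

```python
# Hypothetical data (an assumption for illustration, not the source data):
# credit scores 300, 310, ..., 850, with approval = 1 from an assumed
# cutoff of 600 upwards and 0 below it.
scores = list(range(300, 851, 10))
approved = [1 if s >= 600 else 0 for s in scores]


def fit_ols(x, y):
    """Return (intercept, slope) of the simple OLS line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Linear probability model: OLS applied directly to the 0/1 outcome.
intercept, slope = fit_ols(scores, approved)
fitted = [intercept + slope * s for s in scores]

# Fitted "probabilities" that fall outside the interval [0, 1].
out_of_range = [(s, round(p, 3)) for s, p in zip(scores, fitted)
                if p < 0 or p > 1]


def lpm_error_variance(p):
    """Var(u | x) = p(1 - p) when y is Bernoulli with P(y = 1 | x) = p."""
    return p * (1 - p)


print("lowest fitted value: ", round(min(fitted), 3))
print("highest fitted value:", round(max(fitted), 3))
print("scores with fitted values outside [0, 1]:", len(out_of_range))
print("error variance at p = 0.5:", lpm_error_variance(0.5))
print("error variance at p = 0.9:", lpm_error_variance(0.9))
```

In this sketch the fitted line dips below 0 at the lowest scores and rises above 1 at the highest ones, and the error variance p(1 - p) changes with the fitted probability, which is the source of the heteroskedasticity. An ‘S’ shaped curve avoids the first problem because it flattens out as it approaches 0 and 1.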