Statistics versus Efficacy

What is statistics?

===> It is very difficult to manage the two preceding aspects jointly.

What to do?

Are there problems with regression?

What are the remaining problems?

Introduction

Concentration of Lactose versus number of Leucocytes

Different types of models

Individual influence

===> Sensitivity to specific data

Outliers with respect to Y

Outliers with respect to the xi

Leverage points with low residuals

Individual influence diagnostics

Diagnostics based on the residuals

Examination of residuals

 

Prediction influence diagnostics

Comparison of individual influence diagnostics

Robust regression

            Either individual diagnostics with a classical (non-robust) fit,

            or a robust fit and detection of extreme points.
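
The two strategies above can be sketched side by side. This is a minimal illustration only, not the method used in the original slides: the function names ols_with_diagnostics and huber_fit are hypothetical, the robust fit is assumed to be a Huber M-estimate computed by iteratively reweighted least squares, and the tuning constant c = 1.345 and the MAD-based scale are conventional choices rather than values taken from the source.

```python
import numpy as np

def ols_with_diagnostics(X, y):
    """Classical least-squares fit plus leverage and internally studentized residuals."""
    X1 = np.column_stack([np.ones(len(y)), X])          # add an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    # Diagonal of the hat matrix H = X1 (X1'X1)^-1 X1' = X1 @ pinv(X1)
    leverage = np.einsum("ij,ji->i", X1, np.linalg.pinv(X1))
    n, p = X1.shape
    s2 = resid @ resid / (n - p)                         # residual variance estimate
    studentized = resid / np.sqrt(s2 * (1.0 - leverage))
    return beta, leverage, studentized

def huber_fit(X, y, c=1.345, n_iter=50, tol=1e-8):
    """Huber M-estimate by iteratively reweighted least squares (assumed variant)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)        # start from the classical fit
    w = np.ones(len(y))
    for _ in range(n_iter):
        resid = y - X1 @ beta
        # Robust scale: normalized median absolute deviation of the residuals
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        if scale == 0:
            break
        u = np.abs(resid) / scale
        # Huber weights: 1 for small residuals, c/u for large ones
        w = np.minimum(1.0, c / np.maximum(u, 1e-12))
        sw = np.sqrt(w)
        new_beta, *_ = np.linalg.lstsq(X1 * sw[:, None], y * sw, rcond=None)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta, w                                       # w < 1 flags extreme points
```

With the first strategy, observations are examined through their leverage and studentized residuals after the classical fit; with the second, the extreme points are those that receive a weight below 1 in the robust fit.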

 

Alternatives

Least absolute deviations

Example:

M regression

GM regression

Least median squares method

Example

Least trimmed squares method
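
For reference, the alternatives listed in this part are usually defined by the following objective functions. The original formulas did not survive, so the notation here is an assumption: r_i(\beta) = y_i - x_i^\top \beta is the i-th residual, s a robust scale estimate, \rho a loss function with derivative \psi (for example Huber's), w(x_i) a weight that decreases with leverage, r^2_{(i)} the i-th smallest squared residual, and h \le n the number of observations retained. The GM line shows the Mallows form, which is only one common variant.

```latex
\begin{align*}
\text{Least absolute deviations:} \quad & \min_\beta \sum_{i=1}^{n} \lvert r_i(\beta) \rvert \\
\text{M regression:} \quad & \min_\beta \sum_{i=1}^{n} \rho\!\left(\frac{r_i(\beta)}{s}\right) \\
\text{GM regression (Mallows form):} \quad & \sum_{i=1}^{n} w(x_i)\, \psi\!\left(\frac{r_i(\beta)}{s}\right) x_i = 0 \\
\text{Least median of squares:} \quad & \min_\beta \operatorname{med}_{i}\, r_i^2(\beta) \\
\text{Least trimmed squares:} \quad & \min_\beta \sum_{i=1}^{h} r^2_{(i)}(\beta)
\end{align*}
```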

Sampling

At least four aspects in sampling

Selection on predictors

           *    Recall that selection cannot (and should not) be done on the responses.

Increasing precision

The variance of the estimator of the regression slope is:

When x is drawn at random from a population, the expectation is:

It can be useful to increase the variance of X by selecting at the extremes.
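
The standard simple-regression results behind this remark can be written as follows. The notation is an assumption, since the original formulas are missing: the model is taken to be y_i = \alpha + \beta x_i + \varepsilon_i with independent errors of variance \sigma^2, and \sigma_X^2 denotes the variance of X in the population. Which expectation the original slide displayed is not known; the second line gives the expectation of the denominator.

```latex
\begin{align*}
\operatorname{Var}\!\left(\hat\beta \mid x_1, \dots, x_n\right)
  &= \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \\
\operatorname{E}\!\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]
  &= (n-1)\,\sigma_X^2 .
\end{align*}
```

The larger the spread of the selected x values, the larger the denominator and the smaller the variance of the slope estimator, which is why selecting at the extremes helps.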

What has been done?

Univariate selection procedures

Increasing precision

random sampling                               0.05075342
40/20/40:  ]-4,-1[ / [-1,1] / ]1,4[           0.03537267
40/20/40:  ]-2,-1[ / [-1,1] / ]1,2[           0.03863843
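
The kind of comparison reported above can be explored with a small simulation. This is a sketch under stated assumptions and is not expected to reproduce the figures of the original study: the predictor is assumed to be standard normal, the true slope and the error standard deviation are set to 1, the sample size is 100, and all function names (slope_sd, random_sampling, selection_40_20_40, truncated_normal) are hypothetical. The 40/20/40 scheme takes 40% of the sample in each outer interval and 20% in the central one.

```python
import numpy as np

def slope_sd(x_sampler, n=100, n_rep=2000, beta=1.0, sigma=1.0, seed=0):
    """Empirical standard deviation of the OLS slope over repeated samples."""
    rng = np.random.default_rng(seed)
    slopes = np.empty(n_rep)
    for r in range(n_rep):
        x = x_sampler(rng, n)
        y = beta * x + rng.normal(0.0, sigma, size=n)
        slopes[r] = np.polyfit(x, y, 1)[0]      # first coefficient is the slope
    return slopes.std(ddof=1)

def random_sampling(rng, n):
    """Predictor values drawn at random from the (assumed) N(0, 1) population."""
    return rng.normal(0.0, 1.0, size=n)

def truncated_normal(rng, size, low, high):
    """Rejection sampling from N(0, 1) restricted to the open interval (low, high)."""
    out = np.empty(0)
    while out.size < size:
        z = rng.normal(size=4 * size + 100)
        out = np.concatenate([out, z[(z > low) & (z < high)]])
    return out[:size]

def selection_40_20_40(rng, n, bound=4.0):
    """40% in ]-bound,-1[, 20% in [-1,1], 40% in ]1,bound[."""
    n_tail = int(round(0.4 * n))
    n_mid = n - 2 * n_tail
    return np.concatenate([
        truncated_normal(rng, n_tail, -bound, -1.0),
        truncated_normal(rng, n_mid, -1.0, 1.0),
        truncated_normal(rng, n_tail, 1.0, bound),
    ])

if __name__ == "__main__":
    print("random sampling     :", slope_sd(random_sampling))
    print("40/20/40, bound = 4 :", slope_sd(selection_40_20_40))
    print("40/20/40, bound = 2 :",
          slope_sd(lambda rng, n: selection_40_20_40(rng, n, bound=2.0)))
```

Under these assumptions the selected samples have a larger spread in x than the random ones, so the empirical standard deviation of the slope should come out smaller, in line with the ordering of the values listed above.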

 

Increasing precision

                                                                              Correlation of the estimators
random sampling                                       0.05011554  0.05131047  -0.01838
selection on the predictors                           0.03452182  0.03477434   0.01189
selection on a linear combination of the predictors   0.04430179  0.04344855  -0.3664