Statistics versus Efficacy

What is statistics?

===> It is very difficult to manage the two preceding aspects jointly.

What to do?

Are there problems with regression?

What are the remaining problems?


Concentration of Lactose versus number of Leucocytes

Different types of models

Individual influence

===> Sensitivity to specific data

Outliers with respect to Y

Outliers with respect to the x_i

Leverage points with low residuals

Individual influence diagnostics

Diagnostics based on the residuals

Examination of residuals
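
As an illustration of residual-based individual diagnostics, here is a minimal sketch for simple linear regression (one predictor), using only the Python standard library. It computes three commonly used quantities per observation: the leverage, the internally studentized residual, and Cook's distance. The function names and the data are hypothetical and chosen only for this example; the slides do not say which diagnostics or software were actually used.

```python
from math import sqrt

def ols_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return ybar - b * xbar, b

def influence_diagnostics(x, y):
    """Leverage, studentized residual and Cook's distance per observation."""
    n, p = len(x), 2  # p = number of fitted coefficients (intercept, slope)
    a, b = ols_line(x, y)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance estimate
    rows = []
    for xi, e in zip(x, resid):
        h = 1.0 / n + (xi - xbar) ** 2 / sxx      # leverage (hat value)
        r = e / sqrt(s2 * (1.0 - h))              # internally studentized residual
        cook = r * r * h / (p * (1.0 - h))        # Cook's distance
        rows.append({"leverage": h, "studentized": r, "cook": cook})
    return rows

# Hypothetical data: points close to y = 2x, plus one observation far out in x
# that does not follow the same line.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 15.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 14.0]
diag = influence_diagnostics(x, y)
most_influential = max(range(len(x)), key=lambda i: diag[i]["cook"])
```

In this example the last observation has by far the largest leverage and Cook's distance, although its raw residual is not the largest: the fitted line has been pulled towards it.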


Prediction influence diagnostics

Comparison of individual influence diagnostics

Robust regression

            Either individual diagnostics with a classical (non-robust) fit,

            or a robust fit followed by detection of the extreme points.



Least absolute deviations
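
A minimal sketch of least absolute deviations for a single predictor, assuming a small data set. It relies on the known property that some optimal LAD line passes through at least two data points, so an exhaustive search over pairs of points is enough. This brute-force search is an illustration only (practical implementations usually use linear programming); the function name and the data are hypothetical.

```python
from itertools import combinations

def lad_line(x, y):
    """LAD fit of y = a + b*x by exhaustive search over lines through two points."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line is not a valid regression line
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        # Sum of absolute residuals for this candidate line
        loss = sum(abs(yk - (a + b * xk)) for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best[1], best[2]

# Hypothetical data: exact line y = 1 + 2x, except for one gross outlier in Y.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 40.0]
a_lad, b_lad = lad_line(x, y)
```

Here the LAD line ignores the outlier in Y and recovers the line followed by the other six points.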




M regression
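
A minimal sketch of M regression for a single predictor, assuming Huber weights, a scale re-estimated at each step from the median absolute residual, and iteratively reweighted least squares. These choices (including the tuning constant 1.345 and the fixed number of iterations) are assumptions made for the illustration; the slides do not specify them. Names and data are hypothetical.

```python
from statistics import median

def weighted_line(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ybar - b * xbar, b

def huber_line(x, y, c=1.345, n_iter=50):
    """Huber M-estimate of the line by iteratively reweighted least squares."""
    w = [1.0] * len(x)  # first pass is ordinary least squares
    for _ in range(n_iter):
        a, b = weighted_line(x, y, w)
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust scale: median absolute residual, rescaled for normal errors
        scale = median(abs(e) for e in resid) / 0.6745
        if scale == 0.0:
            break  # at least half of the points are fitted exactly
        # Huber weights: 1 inside the threshold, decreasing outside
        w = [1.0 if abs(e) <= c * scale else c * scale / abs(e) for e in resid]
    return a, b

# Hypothetical data: y = 1 + 2x plus small noise, with one outlier in Y at x = 9.
x = [float(i) for i in range(10)]
noise = [0.1, -0.2, 0.15, -0.1, 0.05, -0.15, 0.2, -0.05, 0.1, 0.0]
y = [1.0 + 2.0 * xi + e for xi, e in zip(x, noise)]
y[9] += 30.0
a_ols, b_ols = weighted_line(x, y, [1.0] * len(x))
a_hub, b_hub = huber_line(x, y)
```

The ordinary least squares slope is strongly inflated by the outlier, while the Huber slope stays close to 2. Note that M regression of this kind bounds the influence of large residuals only; it does not protect against leverage points.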



GM regression

Least median of squares method


Least trimmed squares method
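
A minimal sketch of least trimmed squares for a single predictor: the criterion is the sum of the h smallest squared residuals. The search strategy used here (start from every line through two points, then apply a few concentration steps that refit ordinary least squares on the h best-fitted points) and the default h = (n + 3) // 2 are assumptions made for this illustration, suitable only for small data sets. Names and data are hypothetical.

```python
from itertools import combinations

def ols_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return ybar - b * xbar, b

def trimmed_loss(x, y, a, b, h):
    """Sum of the h smallest squared residuals."""
    sq = sorted((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return sum(sq[:h])

def lts_line(x, y, h=None, n_csteps=10):
    """Approximate LTS fit: two-point starts followed by concentration steps."""
    n = len(x)
    if h is None:
        h = (n + 3) // 2  # assumed default coverage for two coefficients
    best = None
    for i, j in combinations(range(n), 2):
        if x[i] == x[j]:
            continue
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        for _ in range(n_csteps):
            # Keep the h observations best fitted by the current line, then refit
            order = sorted(range(n), key=lambda k: (y[k] - (a + b * x[k])) ** 2)
            xs = [x[k] for k in order[:h]]
            ys = [y[k] for k in order[:h]]
            if len(set(xs)) < 2:
                break  # cannot fit a slope on identical x values
            a, b = ols_line(xs, ys)
        loss = trimmed_loss(x, y, a, b, h)
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best[1], best[2]

# Hypothetical data: seven points near y = 1 + 2x and three bad leverage points.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
y = [3.1, 4.9, 7.05, 8.95, 11.1, 12.9, 15.0, 5.0, 5.2, 4.9]
a_ols, b_ols = ols_line(x, y)
a_lts, b_lts = lts_line(x, y)
```

With these data the ordinary least squares slope is dragged far from 2 by the three leverage points, whereas the trimmed fit follows the majority of the observations.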


At least four aspects in sampling

Selection on predictors

           *    recall that selection cannot (and should not) be done on the responses.


Increasing precision

The variance of the estimator of the regression slope is:

when x is drawn at random from a population, the expectation is:

It can be useful to increase the variance of X by selecting at the extremes.
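
To make the gain concrete, here is a minimal sketch based on the standard result for simple linear regression with constant error variance sigma^2: Var(b) = sigma^2 / sum_i (x_i - x_bar)^2. It compares a random sample of x values with a sample selected at the two extremes of the same population. The population (standard normal), its size, the sample size, the seed and sigma^2 = 1 are all assumptions chosen for the illustration; they are not the settings behind the figures reported in these slides.

```python
import random

def slope_variance(x, sigma2=1.0):
    """Variance of the OLS slope for design points x and error variance sigma2."""
    n = len(x)
    xbar = sum(x) / n
    return sigma2 / sum((xi - xbar) ** 2 for xi in x)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
population = [rng.gauss(0.0, 1.0) for _ in range(2000)]
n = 40

# Design 1: simple random sample of the predictor values
random_design = rng.sample(population, n)

# Design 2: selection at the extremes (n/2 lowest and n/2 highest x values)
ranked = sorted(population)
extreme_design = ranked[: n // 2] + ranked[-(n // 2):]

var_random = slope_variance(random_design)
var_extreme = slope_variance(extreme_design)
```

Selecting at the extremes increases the spread of x and therefore lowers the variance of the slope estimator; the price is that such a design carries little information about departures from linearity in the middle of the range.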

What has been done?

Univariate selection procedures

Increasing precision

random sampling                            0.05075342
40/20/40: ]-4,-1[ / [-1,1] / ]1,4[         0.03537267
40/20/40: ]-2,-1[ / [-1,1] / ]1,2[         0.03863843


Increasing precision

      Correlation of the estimators
random sampling                                        0.05011554   0.05131047   -0.01838
selection on the predictors                            0.03452182   0.03477434    0.01189
selection on a linear combination of the predictors    0.04430179   0.04344855   -0.3664