- When there is no need for statistics, there is no need for the "help" of a statistician.
- When there is some need for (elementary) statistics, the "help" of a statistician can be avoided (fortunately).
- When there is some need for (less elementary) statistics, the "help" of a statistician can be avoided (only for a while, unfortunately).
- When the problem is still complicated, the "help" becomes necessary.
- Once the "help" has been requested, the problem becomes intractable, but it is too late to backtrack: the fox has been set to mind the geese.

- a theoretically well-established methodology for modelling and analysing problems involving uncertainty (mathematics)
- an empirical, know-how-driven approach concerned with extracting the valuable information contained in data (engineering)

===> a great difficulty in managing these two aspects jointly.

- try to find the midpoint between efficacy and simplicity

- No problem at all as far as theory is concerned: it has been settled for years.
- Some problems when you try to use regression in a specific context.
- Many problems when you try to define rules of application and to achieve some transparency.

- Model (models)
- Criterion (criteria)
- Sampling
- Precision
- Outliers
- Transparency

Concentration of Lactose versus number of Leucocytes

- Linear models (the most used: classical regression).
- "Extended linear models" (PCR, LRR, PLS).
- Generalized linear models (used by NL to re-predict sex).
- Non-linear models (at least neural nets).
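The first two families can be sketched in a few lines of numpy (synthetic data; ridge regression stands in here for the shrinkage-type "extended linear models", an illustrative choice, not necessarily the variant meant above):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, 0.5, 0.0, 0.0, -2.0])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Classical regression: beta_hat = (X'X)^{-1} X'y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge regression, a shrinkage variant:
# beta_hat = (X'X + lambda*I)^{-1} X'y pulls all coefficients toward zero
lam = 5.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```

The closed forms make the family resemblance explicit: the "extended" estimator differs from the classical one only through the penalty term added to X'X.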

- Robustness of the estimators

===> Sensitivity to specific data

- outliers
- leverage points
- subpopulations (?)

**High residual**

**Leverage points**

- Linked with the identification of outliers
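The two notions can be separated numerically. A minimal numpy sketch (synthetic data, assumed setup): observation 0 is a leverage point (extreme in x but on the true line), observation 1 is an outlier (gross error in y at an ordinary x); the diagonal of the hat matrix H = X(X'X)^{-1}X' measures leverage.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=20)
x[0] = 10.0                      # leverage point: extreme in x, on the true line
y = 2.0 * x + rng.normal(size=20)
y[1] += 15.0                     # outlier: gross error in y at an ordinary x

X = np.column_stack([np.ones_like(x), x])
H = X @ np.linalg.solve(X.T @ X, X.T)     # hat matrix
leverage = np.diag(H)

beta = np.linalg.solve(X.T @ X, X.T @ y)  # classical least-squares fit
resid = y - X @ beta

print(np.argmax(leverage))       # observation 0: highest leverage, small residual
print(np.argmax(np.abs(resid)))  # observation 1: largest residual, ordinary leverage
```

This is why the two diagnostics must be read together: the leverage point is invisible in the residuals, and the outlier is invisible in the leverages.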

**Either**
individual diagnostics and classical (non-robust) adjustment

**or**
robust adjustment and detection of extremes.

- Evaluation of robustness: breakdown point, minimal contamination level

- Least absolute deviations method.
- M-regression.
- Least median of squares method.
- Least trimmed squares method.

- Estimation of $\beta$ by minimisation of $\sum_i \lvert y_i - x_i^\top \beta \rvert$ (least absolute deviations)

- Robustness to outliers in Y
- Weak robustness to leverage points

- Efficiency: 64% (asymptotically, relative to least squares under normal errors)
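A numpy sketch of the first of these properties, assuming the criterion is the sum of absolute residuals (least absolute deviations), fitted here by iteratively reweighted least squares, an illustrative algorithmic choice: a few gross errors in y barely move the LAD line, while they drag the least-squares intercept upwards.

```python
import numpy as np

def lad_fit(x, y, iters=100, eps=1e-6):
    """Least-absolute-deviations line by iteratively reweighted least squares."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.solve(X.T @ X, X.T @ y)              # start from least squares
    for _ in range(iters):
        w = 1.0 / np.maximum(np.abs(y - X @ beta), eps)   # weight ~ 1/|residual|
        beta = np.linalg.solve((X.T * w) @ X, (X.T * w) @ y)
    return beta

rng = np.random.default_rng(2)
x = rng.normal(size=50)
y = 1.0 + 2.0 * x + 0.3 * rng.normal(size=50)   # true line: intercept 1, slope 2
y[:5] += 20.0                                    # five gross outliers in y

X = np.column_stack([np.ones_like(x), x])
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)     # pulled toward the outliers
beta_lad = lad_fit(x, y)                         # stays near (1, 2)
```

Had the five contaminated points been extreme in x as well (leverage points), the LAD fit would have been dragged along with them, which is the weakness noted above.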

- Estimation of $\beta$ by minimisation of $\sum_i \rho\big((y_i - x_i^\top \beta)/\sigma\big)$ for a suitable function $\rho$ (M-regression)

- Robustness to outliers in Y

- Weak robustness to leverage points

- representativity (sample survey)
- sample size
- sub-populations
- selection on predictors (precision improvement)

- selection has usually been made on the predictors
* remark that the variables on which the selection operates MUST be included in the model (BE)

---> In this case no problem.

- selection has been made on combinations of predictors

* recall that selection cannot (and should not) be done on the responses.
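This warning can be checked by simulation (illustrative setup, assumed sample sizes and cut point): when observations are kept because the response is extreme, the error term no longer averages out given x, and the slope estimate is biased, here upwards.

```python
import numpy as np

def ols_slope(x, y):
    xc = x - x.mean()
    return xc @ (y - y.mean()) / (xc @ xc)

rng = np.random.default_rng(4)
x = rng.normal(size=200_000)
y = x + rng.normal(size=x.size)          # true slope = 1

keep = np.abs(y) > 1.0                   # selection on the RESPONSE: keep extremes
slope_full = ols_slope(x, y)             # close to the true slope 1
slope_sel = ols_slope(x[keep], y[keep])  # biased upwards
```

Selecting on the predictors leaves the conditional distribution of y given x untouched, which is why the same trick is harmless (indeed beneficial) there.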

The variance of the estimator of the slope of the regression is:

$\operatorname{Var}(\hat\beta_1) = \sigma^2 \Big/ \sum_i (x_i - \bar{x})^2$

When x is randomly drawn from a population, its expectation is approximately:

$\sigma^2 \big/ \big((n-1)\operatorname{Var}(X)\big)$

It can be useful to increase the variance of X by selecting at the extremes.
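A small numpy illustration of this formula (the 20-point designs are assumptions): with the same number of observations and the same error variance, placing the x values at the extremes maximises the spread term in the denominator and so minimises the variance of the slope estimator.

```python
import numpy as np

def slope_variance(x, sigma2=1.0):
    # Var(beta_hat) = sigma^2 / sum((x_i - x_bar)^2)
    return sigma2 / np.sum((x - x.mean()) ** 2)

x_random = np.linspace(-1.0, 1.0, 20)            # points spread over the range
x_extreme = np.array([-1.0] * 10 + [1.0] * 10)   # same range, all mass at the extremes

print(slope_variance(x_random))    # larger variance
print(slope_variance(x_extreme))   # smaller variance: 1/20 = 0.05
```

The gain comes purely from the design: nothing about the error distribution changes between the two cases.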

- random selection
- over-selection of the extremes (40/20/40) on a predictor (?)
- over-selection of the extremes on a linear combination of the predictors (the predicted lean meat %).
- over-selection of the extremes on some predictors.

| Sampling | Selection intervals | Variance of the slope estimator |
| --- | --- | --- |
| random sampling | | 0.05075342 |
| 40/20/40 | ]-4,-1[ / [-1,1] / ]1,4[ | 0.03537267 |
| 40/20/40 | ]-2,-1[ / [-1,1] / ]1,2[ | 0.03863843 |

**Correlation of the estimators**

| Sampling | Var($\hat\beta_1$) | Var($\hat\beta_2$) | Correlation |
| --- | --- | --- | --- |
| random sampling | 0.05011554 | 0.05131047 | -0.01838 |
| selection on the predictors | 0.03452182 | 0.03477434 | 0.01189 |
| selection on a linear combination of the predictors | 0.04430179 | 0.04344855 | -0.3664 |
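The qualitative pattern in the comparison above can be reproduced by Monte Carlo (the exact design behind these figures is not given, so the sample sizes, cut points and 40/20/40 split below are assumptions): over-selecting the tails of the predictor lowers the variance of the slope estimator relative to random sampling.

```python
import numpy as np

rng = np.random.default_rng(5)

def ols_slope(x, y):
    xc = x - x.mean()
    return xc @ (y - y.mean()) / (xc @ xc)

def simulate(select, reps=2000, n=60, pool=1000):
    """Monte Carlo variance of the OLS slope under a sampling rule."""
    slopes = np.empty(reps)
    for r in range(reps):
        x = select(rng.normal(size=pool))[:n]
        y = x + rng.normal(size=x.size)          # true slope = 1
        slopes[r] = ols_slope(x, y)
    return slopes.var()

def random_sampling(x):
    return x

def forty_twenty_forty(x):
    # 40/20/40 over-selection of the tails of the predictor (cut at +/-1)
    low, mid, high = x[x < -1], x[np.abs(x) <= 1], x[x > 1]
    return np.concatenate([low[:24], mid[:12], high[:24]])

v_random = simulate(random_sampling)
v_tails = simulate(forty_twenty_forty)
print(v_random, v_tails)   # tail selection gives the smaller variance
```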