Task 2: Assessment of classification methods 


EC Regulation No 3127/94 states three statistical constraints.

      The method for assessing the lean meat content of pig carcasses has to be either double regression or other statistically proven procedure.

      The resulting precision is at least equal to that obtained using standard regression techniques on 120 carcasses.

      Authorisation of the grading methods shall, moreover, be subject to the root mean square deviation of the errors, measured about zero, being less than 2.5.
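The third constraint can be illustrated with a short calculation. The following is only a minimal sketch: the function name `rmse_about_zero`, the divisor n (rather than a degrees-of-freedom correction) and the carcass values are assumptions made for illustration, not taken from the regulation.

```python
import math

def rmse_about_zero(reference, predicted):
    """Root mean square of the prediction errors, measured about zero.

    "About zero" means the errors are NOT centred on their mean first,
    so any systematic bias is included in the result.
    Assumption: the sum of squares is divided by n.
    """
    errors = [p - r for r, p in zip(reference, predicted)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical lean meat percentages (dissection reference vs. predicted)
reference = [58.0, 61.5, 55.2, 60.1]
predicted = [57.0, 62.5, 56.2, 59.1]

rmse = rmse_about_zero(reference, predicted)
meets_threshold = rmse < 2.5  # threshold quoted in the third constraint
print(f"RMSE about zero: {rmse:.3f}, below 2.5: {meets_threshold}")
```

Because the deviation is taken about zero, a method with small scatter but a large systematic offset can still fail the threshold.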

For double regression, formulas for checking the last two constraints have been developed, but they need to be disseminated more widely within the EU. Another technique, called regression with surrogate predictors, has been devised to reduce experimental cost, but the second constraint still has to be made explicit for it. As measuring procedures have become more complex, other techniques such as principal components regression (PCR) and partial least squares (PLS) have come into use. For these methods, compliance with the last two constraints has not been demonstrated; in particular, the criterion of root mean square deviation of the errors is not adequate.

When a very large number of measurements is collected per carcass, reducing experimental cost is also of interest. The constraints have not been made explicit for any combination of these methods.

Other methods, for instance projection pursuit regression, multivariate calibration, kriging or neural networks, seem potentially interesting, but here too the constraints have not been made explicit.



The objective of this task is to enable the EU countries to meet the constraints of the EU regulations on pig classification on the same scientific basis, and to make it possible to use the most advanced statistical techniques correctly and at the lowest cost. It is therefore very important to solve the present problems, to anticipate future ones and to form a basis for the evolution of the regulations.

Work in progress

In order to manage both outliers and influential observations, robust estimation seems to be an efficient tool.
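The effect of a single outlier on a prediction formula can be sketched as follows. The text does not say which robust estimator is under study; the Theil-Sen estimator (median of pairwise slopes) is used here only as one simple, well-known example, and the fat depth and lean meat values are hypothetical.

```python
from itertools import combinations
from statistics import fmean, median

def ols_fit(x, y):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def theil_sen_fit(x, y):
    """Theil-Sen estimator: median of all pairwise slopes, then median intercept."""
    points = list(zip(x, y))
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(points, 2)
              if x2 != x1]
    slope = median(slopes)
    intercept = median(yi - slope * xi for xi, yi in points)
    return slope, intercept

# Hypothetical data: fat depth (mm) and lean meat (%), following 70 - 0.8 * depth,
# except for the last carcass, which carries a gross measurement error.
fat_depth = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
lean_meat = [62.0, 60.4, 58.8, 57.2, 55.6, 64.0]

ols_slope, ols_intercept = ols_fit(fat_depth, lean_meat)
ts_slope, ts_intercept = theil_sen_fit(fat_depth, lean_meat)
print(f"OLS:       slope {ols_slope:.3f}, intercept {ols_intercept:.2f}")
print(f"Theil-Sen: slope {ts_slope:.3f}, intercept {ts_intercept:.2f}")
```

In this constructed example the least squares slope is pulled far from -0.8 by the one aberrant carcass, while the median-based fit stays close to it.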

A study and description of all relevant statistical methods for estimating prediction formulas is in progress. Ordinary least squares (OLS) and PLS have been compared. Data from Autofom have been processed by PLS using SAS software.

Some of the work concerns the criteria for validating the estimates and assessing predictive ability. It is proposed to use the average Mean Squared Prediction Error (MSPE), where the average is taken over a random sample from the national population. Specific attention is paid to bias.
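One reason bias deserves specific attention is that the MSPE over a sample splits exactly into squared bias plus the variance of the prediction errors. The sketch below illustrates this identity; the function name `mspe_decomposition` and the carcass values are hypothetical, and the variance uses the divisor n so that the identity holds exactly.

```python
from statistics import fmean, pvariance

def mspe_decomposition(reference, predicted):
    """Return (mspe, bias, error_variance) for a sample of carcasses.

    mspe           = mean of squared prediction errors
    bias           = mean prediction error
    error_variance = population variance of the errors (divisor n)
    so that mspe == bias**2 + error_variance up to rounding.
    """
    errors = [p - r for r, p in zip(reference, predicted)]
    mspe = fmean([e * e for e in errors])
    bias = fmean(errors)
    error_variance = pvariance(errors)
    return mspe, bias, error_variance

# Hypothetical lean meat percentages for a small random sample
reference = [58.0, 61.5, 55.2, 60.1]
predicted = [59.0, 62.0, 56.7, 60.1]

mspe, bias, error_variance = mspe_decomposition(reference, predicted)
print(f"MSPE {mspe:.4f} = bias^2 {bias ** 2:.4f} + variance {error_variance:.4f}")
```

A formula that systematically over-predicts, as in this example, contributes to the MSPE through the squared bias term even when the scatter of its errors is small.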