Aspects of sampling in pig grading

Bas Engel, Willem Buist

Institute for Animal Science and Health (ID-Lelystad), Lelystad, The Netherlands.

Classification of pig carcasses in the European Community is based on the lean meat percentage of the carcass. The lean meat percentage is predicted from instrumental carcass measurements obtained in the slaughterline. The prediction formula employed has to meet requirements for authorisation as laid down in EC regulations. These requirements concern the sampling procedure, the sample size and the accuracy of prediction. Formulae are often derived by linear regression. We will discuss a type of sampling scheme that has been submitted for authorisation on a number of occasions, but that lacks formal statistical justification when employed in conjunction with linear regression. Our aim is to assess the performance of the prediction formula that follows from the potentially faulty combination of the sampling scheme and linear regression, in relation to the requirements in the EC regulations.

In linear regression, inference is based on the distribution of the dependent variable, i.e. the lean meat percentage, conditional upon the independent variables, i.e. the instrumental carcass measurements. To put it less formally, we make an educated guess of the lean meat percentage on the basis of the likelihood of the different lean meat percentages that may correspond to the observed instrumental carcass measurements, e.g. fat and muscle depth at specific places on the carcass. The carcasses may be selected on the basis of the fat and muscle depth measurements, because our only interest is in the random variation of the lean meat percentage given the observed fat and muscle depth measurements. We have no particular interest in the random variation in the fat and muscle depth measurements themselves. It is in fact well known that selecting carcasses with more extreme fat or muscle depth measurements will improve the accuracy of prediction. Therefore, carcasses are usually not selected randomly but according to a sampling scheme that over-represents carcasses with more extreme instrumental measurements.
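The point that selection on the instrumental measurements themselves is harmless, and even helpful, can be illustrated with a small simulation. The sketch below is only an illustration under assumed values: the population parameters, the single fat depth predictor, the sample size `n` and the helper `fit_line` are all hypothetical and are not taken from the Dutch data discussed in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population (all numbers are assumptions for illustration):
# x = fat depth (mm), y = lean meat percentage, linear in x plus noise.
N = 100_000
x = rng.normal(16.0, 4.0, N)
y = 65.0 - 0.6 * x + rng.normal(0.0, 2.0, N)

def fit_line(xs, ys):
    """Ordinary least squares for ys = a + b * xs; returns (a, b, se_b)."""
    X = np.column_stack([np.ones_like(xs), xs])
    coef, *_ = np.linalg.lstsq(X, ys, rcond=None)
    resid = ys - X @ coef
    s2 = resid @ resid / (len(ys) - 2)                    # residual variance
    se_b = np.sqrt(s2 / np.sum((xs - xs.mean()) ** 2))    # standard error of slope
    return coef[0], coef[1], se_b

n = 200  # hypothetical sample size

# Scheme 1: simple random sample from the population.
random_idx = rng.choice(N, n, replace=False)

# Scheme 2: only carcasses whose fat depth is more than one standard
# deviation away from the population mean (selection on x alone).
extreme_pool = np.flatnonzero(np.abs(x - 16.0) > 4.0)
extreme_idx = rng.choice(extreme_pool, n, replace=False)

a_r, b_r, se_r = fit_line(x[random_idx], y[random_idx])
a_e, b_e, se_e = fit_line(x[extreme_idx], y[extreme_idx])

print(f"random sample : slope {b_r:.3f}, standard error {se_r:.4f}")
print(f"extreme sample: slope {b_e:.3f}, standard error {se_e:.4f}")
```

Under these assumptions both schemes estimate the same underlying slope, while the sample of extreme carcasses does so with a smaller standard error, because the spread of the fat depth measurements in the sample is larger.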

The EC regulations (EC, 1994) state that a sample of pig carcasses should be representative of a national or regional pig population. Possibly because of a misunderstanding about the intention of the regulation, carcasses are regularly selected not only on the basis of the instrumental measurements, but also on the basis of other variables, such as carcass weight. These additional selection variables are not intended to be included in the prediction formula. However, because of the additional selection on e.g. carcass weight, the selected carcasses no longer offer a correct impression of the most likely values for the lean meat percentage given the observed instrumental measurements. Inference based on traditional regression theory may then be misleading. The accuracy of prediction may actually be lower than required in the regulations, which means that a formula may be wrongly authorised. Obviously the European Community is not well served by the authorisation of inaccurate formulae, with adverse effects for harmonisation between countries. Alternatively, when the prediction formula is more accurate than it appears to be on the basis of standard linear regression theory, it may mistakenly not be authorised. This is quite problematic for the region or country involved, since considerable effort and expense are invested in the introduction of a new measurement instrument in the slaughterline.
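The effect of additional selection on a variable that is left out of the formula can be sketched in the same way. The example below is a hypothetical illustration, not the simulation reported in the talk: it assumes that lean meat percentage depends on carcass weight as well as on fat depth, uses invented coefficients, and uses a deliberately crude selection rule (only carcasses above a weight threshold). The helper `fit_and_score` is likewise an assumption made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population (all coefficients are assumptions for illustration):
# x = fat depth (mm), w = carcass weight (kg), y = lean meat percentage.
N = 100_000
x = rng.normal(16.0, 4.0, N)
w = 90.0 + 0.8 * (x - 16.0) + rng.normal(0.0, 6.0, N)
y = 65.0 - 0.6 * x - 0.15 * (w - 90.0) + rng.normal(0.0, 2.0, N)

def fit_and_score(idx):
    """Fit y = a + b * x on the sample idx; weight is NOT in the formula.

    Returns the residual standard deviation seen in the sample, and the
    mean error (bias) and root mean squared error in the whole population.
    """
    X = np.column_stack([np.ones(len(idx)), x[idx]])
    coef, *_ = np.linalg.lstsq(X, y[idx], rcond=None)
    resid = y[idx] - X @ coef
    rsd_sample = np.sqrt(resid @ resid / (len(idx) - 2))
    err = (coef[0] + coef[1] * x) - y          # prediction error, all carcasses
    return rsd_sample, err.mean(), np.sqrt(np.mean(err ** 2))

n = 200  # hypothetical sample size

# Scheme 1: simple random sample.
random_idx = rng.choice(N, n, replace=False)

# Scheme 2: additional selection on weight (here: heavy carcasses only).
heavy_pool = np.flatnonzero(w > 95.0)
heavy_idx = rng.choice(heavy_pool, n, replace=False)

rsd_rand, bias_rand, rmse_rand = fit_and_score(random_idx)
rsd_sel, bias_sel, rmse_sel = fit_and_score(heavy_idx)

print(f"random  : sample RSD {rsd_rand:.2f}, bias {bias_rand:+.2f}, RMSE {rmse_rand:.2f}")
print(f"selected: sample RSD {rsd_sel:.2f}, bias {bias_sel:+.2f}, RMSE {rmse_sel:.2f}")
```

In this assumed setting the formula fitted to the weight-selected sample is biased when applied to the whole population, and the residual standard deviation computed from the sample understates the prediction error actually incurred. This corresponds to the case in which a formula might be wrongly authorised; with other assumed coefficients or selection rules the discrepancy could go in the opposite direction.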

In this talk we discuss the performance of the potentially faulty sampling scheme on the basis of results from computer simulation. Initially, simulated data are based on actual and historical data from The Netherlands. The instrumental measurements are a fat depth and a muscle depth measurement obtained with the Hennessy Grading Probe. The additional selection variable is carcass weight. We study other data configurations as well. These are relevant, for instance, when formulae for two measurement instruments are derived from the same sample of dissected carcasses. In that case selection may be based on instrumental measurements that appear in one of the formulae but not in the other. For the latter instrument, results from traditional regression will be in doubt, and our simulation results will indicate how the quality of prediction may be affected.