Biostatistics and Biometrics Open Access Journal

Mini Review

An Improved Estimation Procedure of the Mean of a Sensitive Variable Using Auxiliary Information

Tanveer AT¹ and Housila PS²

¹Department of Computer Science and Engineering, Islamic University of Science and Technology, India

²School of Studies in Statistics, Vikram University Ujjain, India

Submission: August 22, 2017; Published: October 02, 2017

*Corresponding author: Tanveer A Tarray, Department of Computer Science and Engineering, Islamic University of Science and Technology, India, Email: tanveerstat@gmail.com

How to cite this article: Tanveer AT, Housila PS. An Improved Estimation Procedure of the Mean of a Sensitive Variable Using Auxiliary Information. Biostat Biometrics Open Acc J. 2017; 3(2): 555607. DOI:10.19080/BBOAJ.2017.03.555607

Abstract

This paper proposes new ratio and regression estimators for the mean of a sensitive variable utilizing information on a non-sensitive auxiliary variable. Expressions for the biases and mean square errors of the suggested estimators, correct up to the first order of approximation, are derived. It is shown that, under a very realistic condition, the suggested ratio and regression estimators are better than the conventional unbiased estimators that do not utilize the auxiliary information, the ratio estimator of Sousa et al. [1] and the regression estimator of Gupta et al. [2]. Numerical illustrations are given in support of the present study.

Keywords: Ratio estimator; Regression estimator; Randomized response technique; Mean square error; Bias; Auxiliary variable

Introduction

Let Y be the variable under study, a sensitive variable that cannot be observed directly. Let X be a non-sensitive auxiliary variable that is strongly correlated with Y. Let S be a scrambling variable independent of the study variable Y and the auxiliary variable X. The usual additive model for gathering information on a quantitative sensitive variable is due to Himmelfarb & Edgell [3]. Their model allows the interviewee to hide personal information by adding a scrambling variable to the response. The respondent is asked to report a scrambled response for the study variable Y (based on the additive model) given by Z_a = Y + S, but is asked to provide a true response for the auxiliary variable X [1].

Hussain [4] has discussed the use of subtractive scrambling. Following Hussain [4], the respondent is asked to report a scrambled response for the study variable Y (based on the subtractive model) given by Z_s = Y − S, but is asked to provide a true response for the auxiliary variable X. It is interesting to mention that the model proposed here generalizes both the usual additive and subtractive models. Gjestvang & Singh [5] have pointed out that “the practical application of an additive model is much easier than the multiplicative model, that is, respondents may like to add two numbers rather than doing painstaking work of multiplying two numbers or dividing two numbers: thus the improvement of the additive model has its own importance in the literature”. Looking at the form of the additive model, the subtractive model and the above argument due to Gjestvang & Singh [5], we introduce a new model (which is additive in nature):

Z_φ = Y + φS

where φ is a known scalar such that −1 ≤ φ ≤ 1.
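As a minimal sketch of how a single report is formed under this model, the helper below builds Z_φ from a true value and a scrambling draw. The function name `scramble` and the numbers are illustrative assumptions, not part of the original study.

```python
def scramble(y, s, phi):
    """Scrambled response Z_phi = Y + phi * S for one respondent."""
    if not -1.0 <= phi <= 1.0:
        raise ValueError("phi must lie in [-1, 1]")
    return y + phi * s


# Hypothetical respondent: true value 40, scrambling draw 6.
additive = scramble(40.0, 6.0, 1.0)      # phi = 1 gives Z_a = Y + S
subtractive = scramble(40.0, 6.0, -1.0)  # phi = -1 gives Z_s = Y - S
unscrambled = scramble(40.0, 6.0, 0.0)   # phi = 0 returns Y itself
```

The two conventional models appear as the end points φ = 1 and φ = −1, which is the sense in which the model generalizes them.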

Thus, keeping the proposed model Z_φ = Y + φS in view, the respondent is asked to report a scrambled response Z_φ for Y but is asked to provide a true response for X. Let a simple random sample of size n be drawn without replacement from a finite population U = (U_1, U_2, ..., U_N). For the ith unit (i = 1, 2, ..., N), let y_i and x_i respectively be the values of the study variable Y and the auxiliary variable X. Further, let (ȳ, x̄, z̄_φ) be the sample means and (Ȳ, X̄, Z̄_φ) be the population means of Y, X and Z_φ respectively. We assume that the population mean X̄ of the auxiliary variable X is known and that S̄ = E(S) = 0.

Thus, E(Z_φ) = E(Y). We also define

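where C_x and C_zφ are the coefficients of variation of X and Z_φ respectively, and ρ_xz is the correlation coefficient between X and Z_φ. Because S is independent of Y with E(S) = 0, the square of the coefficient of variation of Z_φ is

C_zφ² = C_y² + φ² S_s² / Ȳ²

where Ȳ is the population mean of Y, C_y = S_y / Ȳ is the coefficient of variation of Y, and S_y² and S_s² are the variances of Y and S respectively.

If information on the auxiliary variable X is ignored, then the mean square error of the estimator z̄_a based on the conventional additive model Z_a = Y + S is given by

MSE(z̄_a) = ((1 − f)/n)(S_y² + S_s²)    (1.1)

where f = n/N is the sampling fraction.

Further, if the information on the auxiliary variable X is not utilized, then the mean square error of the estimator z̄_s based on the conventional subtractive model Z_s = Y − S is given by

MSE(z̄_s) = ((1 − f)/n)(S_y² + S_s²)    (1.2)

It follows from (1.1) and (1.2) that

MSE(z̄_a) = MSE(z̄_s)

The mean square error of the estimator z̄_φ based on the suggested additive model Z_φ = Y + φS is given by

MSE(z̄_φ) = ((1 − f)/n)(S_y² + φ² S_s²)    (1.3)

which does not exceed the common value in (1.1) and (1.2) whenever |φ| ≤ 1. Thus, in the proposed additive model Z_φ = Y + φS, the choice of the value of the scalar φ between −1 and +1 is justified.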
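The comparison of the three sample-mean estimators can be checked numerically. The sketch below assumes simple random sampling without replacement and a scrambling variable independent of Y with zero mean, so that Var(Z_φ) = S_y² + φ² S_s². The function name and the design values are hypothetical.

```python
def mse_scrambled_mean(n, N, var_y, var_s, phi):
    """MSE of the sample mean of Z_phi = Y + phi*S under SRSWOR.

    Assumes S is independent of Y with E(S) = 0, so that
    Var(Z_phi) = var_y + phi**2 * var_s.
    """
    f = n / N  # sampling fraction
    return (1.0 - f) / n * (var_y + phi ** 2 * var_s)


# Hypothetical design: n = 100 from N = 1000, S_y^2 = 25, S_s^2 = 16.
mse_additive = mse_scrambled_mean(100, 1000, 25.0, 16.0, 1.0)
mse_subtractive = mse_scrambled_mean(100, 1000, 25.0, 16.0, -1.0)
mse_half = mse_scrambled_mean(100, 1000, 25.0, 16.0, 0.5)
```

Under these assumed values the additive and subtractive models give the same MSE, and any |φ| < 1 gives a smaller one, because φ enters only through φ².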

We note that Sousa et al. [1] have mentioned in their study (about their proposed estimators) that "there is hardly any difference in the first order and second order approximations for mean square error (MSE) even for small sample sizes". Keeping this in view, we have studied the properties of the proposed estimators in the subsequent sections only to the first order of approximation. The merits of the proposed estimators are examined through numerical illustration.

The suggested ratio estimator

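We consider the following ratio estimator for the population mean Ȳ of the study variable Y using the known population mean X̄ of the auxiliary variable X:

t_R(φ) = z̄_φ (X̄ / x̄)    (2.1)

where z̄_φ and x̄ are the sample means of Z_φ and X respectively. We note that for φ = 1, the proposed estimator t_R(φ) reduces to the estimator

t_R(1) = z̄_a (X̄ / x̄)    (2.2)

which is due to Sousa et al. [1], where z̄_a is the sample mean of the additive scrambled responses Z_a = Y + S. For φ = 0, the estimator t_R(φ) reduces to the classical ratio estimator

t_R = ȳ (X̄ / x̄)    (2.3)

based on true responses of the variables Y and X.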
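A ratio estimator of this kind multiplies the sample mean of the scrambled responses by the ratio of the known population mean of X to its sample mean. The sketch below computes it from sample data; the function name and the data are illustrative assumptions.

```python
from statistics import fmean


def ratio_estimator(z, x, x_pop_mean):
    """Ratio estimate of the mean of Y from scrambled responses z.

    z          : scrambled responses Z_phi observed in the sample
    x          : true auxiliary values observed in the sample
    x_pop_mean : known population mean of X
    """
    return fmean(z) * x_pop_mean / fmean(x)


# Hypothetical sample of four respondents, with known X-bar = 11.
z_sample = [12.0, 15.0, 9.0, 14.0]
x_sample = [10.0, 13.0, 8.0, 12.0]
estimate = ratio_estimator(z_sample, x_sample, 11.0)
```

When the sample mean of X falls below the known population mean, as in this made-up sample, the estimate is adjusted upward from the raw mean of the scrambled responses.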

Expressing (2.1) in terms of e_zφ and e_x, we have

We assume that |e_x| < 1 so that (1 + e_x)^-1 is expandable. Expanding the right hand side of (2.4), multiplying out and neglecting terms of e's having power greater than two, we have

Taking expectation of both sides of (2.5), we get the bias of t_R(φ) to the first order of approximation as

It is observed from (2.6) that the bias of the proposed estimator t_R(φ) is independent of φ. So whatever the value of φ, the bias of t_R(φ) remains the same as given in (2.6). Thus the bias of the proposed estimator and the bias of the estimator due to Sousa et al. [1] are the same. This fact can also be seen from (2.6) and (2.9).

Squaring both sides of (2.5) and neglecting terms of e's having power greater than two we have

Taking expectation of both sides of (2.7), we get the mean square error (MSE) of t_R(φ) to the first degree of approximation as

Expression (2.8) indicates that the MSE of the proposed estimator depends on the scalar φ, so the choice of φ can increase or decrease the MSE of t_R(φ). One should therefore be very cautious about the selection of the value of φ. Setting φ = 1 in (2.6) and (2.8), we get the bias and MSE of the Sousa et al. [1] estimator t_R(1) to the first degree of approximation respectively as

Efficiency Comparison

Thus the proposed estimator is more efficient than the usual unbiased estimator as long as the condition (2.11) is satisfied. The condition (2.11) also ensures that the proposed estimator is better than the usual estimator based on the subtractive model.

This is the condition under which the classical ratio estimator t_R in (2.3) is better than the usual unbiased estimator. It follows from (2.11) and (2.12) that the proposed estimator is more efficient than the unbiased estimators if the condition (2.11) holds true.

Further from (2.8) and (2.10) we have

Thus it follows from (2.11), (2.12) and (2.13) that the suggested estimator t_R(φ) is more efficient than the unbiased estimators and the ratio-type estimator t_R(1) due to Sousa et al. [1].
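The behaviour described in this section can be illustrated with the standard first-order expressions for a ratio estimator, using Var(Z_φ) = S_y² + φ² S_s² and Cov(Z_φ, X) = Cov(Y, X), which hold when S is independent of Y and X. The sketch below is written under that assumption; the function name and population values are hypothetical, and the coded expressions are the textbook forms rather than a transcription of (2.6) and (2.8).

```python
def ratio_first_order(n, N, y_mean, x_mean, var_y, var_x, cov_yx, var_s, phi):
    """Return (bias, mse) of the ratio estimator to first order.

    Standard first-order results for a ratio estimator built on
    Z_phi = Y + phi*S, with S independent of Y and X and E(S) = 0.
    """
    lam = (1.0 - n / N) / n  # (1 - f) / n
    r = y_mean / x_mean      # R = Y-bar / X-bar
    bias = lam * (r * var_x - cov_yx) / x_mean
    mse = lam * (var_y + phi ** 2 * var_s + r ** 2 * var_x - 2.0 * r * cov_yx)
    return bias, mse


# Hypothetical population values.
args = dict(n=50, N=500, y_mean=20.0, x_mean=10.0,
            var_y=16.0, var_x=4.0, cov_yx=6.4, var_s=4.0)
bias_sousa, mse_sousa = ratio_first_order(phi=1.0, **args)
bias_prop, mse_prop = ratio_first_order(phi=0.3, **args)
```

With these assumed values the bias is identical for both choices of φ, while the MSE at φ = 0.3 is smaller than at φ = 1, since φ enters the MSE only through the term φ² S_s².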

Remark 2.1: If the correlation between the scrambled response Z_φ and the auxiliary variable X is highly negative, then one can consider the following product-type estimator for the population mean as

The exact bias of the proposed product-type estimator t_p(φ) is given by

which is the same as the bias of the classical product estimator

based on true responses of the variables Y and X.
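A product-type estimator replaces the ratio adjustment with its reciprocal, multiplying the sample mean of the scrambled responses by the sample mean of X over its known population mean. The sketch below assumes that classical form; the function name and data are illustrative only.

```python
from statistics import fmean


def product_estimator(z, x, x_pop_mean):
    """Product-type estimate: mean(z) * mean(x) / X-bar.

    Assumes the classical product form applied to the scrambled
    responses z; suited to a negative correlation between Z_phi and X.
    """
    return fmean(z) * fmean(x) / x_pop_mean


# Hypothetical negatively related sample, with known X-bar = 10.
z_sample = [18.0, 14.0, 11.0, 9.0]
x_sample = [6.0, 9.0, 12.0, 15.0]
estimate = product_estimator(z_sample, x_sample, 10.0)
```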

It is observed from (2.15) that the bias expression of t_p(φ) is free from the scalar φ. So whatever the value of φ, the bias of t_p(φ) remains the same as given in (2.15).

The mean square error of the estimator t_p(φ) to the first degree of approximation is given by

which depends on the value of the scalar φ. So one should be careful in selecting the value of φ.

From (1.3) and (2.17) we have

which is the same condition under which the classical product estimator t_p is better than the usual unbiased estimator.

Empirical Study

To judge the superiority of the proposed estimator t_R(φ) over the conventional unbiased estimators and the ratio-type estimator t_R(1) due to Sousa et al. [1], we have computed the percent relative efficiencies (PREs) of t_R(φ) with respect to these estimators by using the formulae:

For the purpose of computing the PREs, we assume for the sake of simplicity that where α is a scalar in percent (i.e. α%), as mentioned in Sousa et al. [1] and Gupta et al. [2]. Under the above assumptions, the PRE formulae given by (2.14), (2.15) and (2.16) respectively reduce to:

It is observed from Tables 1-3 that:

I. For fixed values of α = 10%, 20%, 30%, a larger gain in efficiency is observed by using the proposed estimator over the conventional unbiased estimators, which do not utilize the auxiliary information.

II. For α = 10%, the gain in efficiency by using the proposed estimator over the ratio-type estimator t_R(1) due to Sousa et al. [1] is marginal, while for α = 20% and 30% it is substantial.

III. For fixed values of (α, ρ_yx), the PRE values increase as the value of φ increases up to zero and start decreasing when φ goes beyond zero.

IV. The maximum gain in efficiency is observed when φ = 0, which is expected because the proposed additive model Z_φ then becomes free from scrambling.

V. For fixed values of (φ, α), the PRE values increase as the value of the correlation coefficient ρ_yx increases.

Overall we conclude that the proposed estimator is to be preferred in practice when:

i. The standard deviation of the scrambling variable S is closer to the standard deviation of the auxiliary variable X.

ii. The value of φ is closer to zero and the value of the correlation coefficient ρ_yx is larger.
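The pattern in the percent relative efficiencies (rising as φ approaches zero, falling beyond it) can be reproduced with a small sketch. It assumes the standard first-order MSE of a ratio estimator built on Z_φ = Y + φS and uses hypothetical population values, not those behind Tables 1-3; all names are illustrative.

```python
def pre(mse_reference, mse_proposed):
    """Percent relative efficiency of the proposed estimator."""
    return 100.0 * mse_reference / mse_proposed


def mse_ratio(lam, var_y, var_x, cov_yx, var_s, r, phi):
    """Assumed first-order MSE of the ratio estimator for a given phi."""
    return lam * (var_y + phi ** 2 * var_s + r ** 2 * var_x - 2.0 * r * cov_yx)


# Hypothetical population values (not taken from the paper's tables).
lam, var_y, var_x, cov_yx, var_s, r = 0.018, 16.0, 4.0, 6.4, 4.0, 2.0
reference = mse_ratio(lam, var_y, var_x, cov_yx, var_s, r, 1.0)  # phi = 1

phis = [-1.0, -0.5, 0.0, 0.5, 1.0]
pre_by_phi = {
    phi: pre(reference, mse_ratio(lam, var_y, var_x, cov_yx, var_s, r, phi))
    for phi in phis
}
best_phi = max(pre_by_phi, key=pre_by_phi.get)
```

Because φ enters the MSE only through φ², the PRE in this sketch is symmetric about zero and peaks at φ = 0, in line with observations III and IV.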

Proposed Regression Estimator

To obtain the regression estimator of the population mean Ȳ, we first define the difference estimator for Ȳ as

t_d = z̄_φ + d(X̄ − x̄)    (3.1)

where z̄_φ and x̄ are the sample means of Z_φ and X, X̄ is the known population mean of X, and d is a suitably chosen constant. It is easy to verify that the difference estimator t_d is an unbiased estimator of the population mean Ȳ.

The variance of the estimator t_d is given by

Substitution of (3.3) in (3.1) yields the resulting optimum difference estimator for the population mean Ȳ as

We note that the value of β_zφx is unknown in practice. In such a situation we replace β_zφx by its consistent estimate

where β̂_zφx is the sample regression coefficient of Z_φ on X, Z_φ = Y + φS is the scrambled response on Y, and the sample covariance and sample variance used to form it are unbiased estimators of S_zφx and S_x² respectively. Thus the resulting regression estimator for the population mean Ȳ is given by

To obtain the bias of the regression estimator t_lr we further write

such that

E(e₁) = E(e₂) = 0

and from Sukhatme & Sukhatme [6] we have

We assume that |e₂| <1 so that (1+e₂)^-1 is expandable. Now expanding the right hand side of (3.7), multiplying out and neglecting terms of e's having power greater than two we have

Taking expectation of both sides of (3.9) we get the bias of t_lr to the first degree of approximation as

showing that the proposed regression estimator t_lr is a biased estimator. The bias will be negligible if the sample size n is sufficiently large.
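A regression estimator of this type adjusts the sample mean of the scrambled responses by the estimated slope times the gap between the known and sample means of X. The sketch below assumes the usual form z̄_φ + β̂(X̄ − x̄), with β̂ the ratio of the sample covariance to the sample variance; the function name and data are hypothetical.

```python
from statistics import fmean


def regression_estimator(z, x, x_pop_mean):
    """Regression estimate of the mean of Y from scrambled responses z.

    Uses the sample slope of z on x, with (n - 1) divisors so that the
    sample covariance and variance are the usual unbiased versions.
    """
    n = len(z)
    z_bar, x_bar = fmean(z), fmean(x)
    s_zx = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x)) / (n - 1)
    s_xx = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    beta_hat = s_zx / s_xx
    return z_bar + beta_hat * (x_pop_mean - x_bar)


# Hypothetical sample in which z = 2x + 1 exactly, with known X-bar = 3.
x_sample = [1.0, 2.0, 3.0, 4.0]
z_sample = [3.0, 5.0, 7.0, 9.0]
estimate = regression_estimator(z_sample, x_sample, 3.0)
```

In this artificial exactly linear sample, the estimate equals the value of the fitted line at the known population mean of X.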

Squaring both sides of (3.10) and neglecting terms of e’s having power greater than two we have

Taking expectation of both sides of (3.11) we get the mean square error of t_lr to the first degree of approximation as

In the light of (3.13), the expression (3.12) reduces to:

It is observed from (3.14) that the MSE of t_lr depends on the scalar φ, so the value of φ will affect the MSE of t_lr. Thus one should be very cautious about the selection of the value of the scalar φ. Assuming a linear relationship between Y and X, Gupta et al. [2] suggested the following regression estimator for the population mean as

where the estimated slope is the sample regression coefficient between Z_a and X, and Z_a = Y + S is the scrambled response on Y. Setting φ = 1 in (3.10), one can easily get the bias of the regression estimator of Gupta et al. [2] as

The mean square error of the regression estimator of Gupta et al. [2] to the first degree of approximation is given by

which can also be obtained from (3.14) just by setting φ = 1.

Efficiency Comparisons

From (1.1) and (3.14) we have

which is always positive if

It follows from (3.19), (3.20), (3.21) and (3.22) that the proposed estimator t_lr is more efficient than:

(i) The conventional unbiased estimator and the regression estimator due to Gupta et al. [2], as long as the condition |φ| < 1 is satisfied.

(ii) The usual unbiased estimator z̄_φ.

(iii) The ratio estimator t_R(1) considered by Sousa et al. [1], unless R = β_yx, in which case the estimators t_R(1) and t_lr are equally efficient.
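The comparison in (iii) can be checked with the standard first-order MSE expressions, assuming S is independent of Y and X. Under that assumption the gap between the ratio estimator at φ = 1 and the regression estimator at a given φ is ((1 − f)/n)[(1 − φ²) S_s² + S_x² (R − β_yx)²], where R = Ȳ/X̄ and β_yx = S_yx/S_x². The sketch verifies this identity numerically; the names and values are hypothetical, and the MSE forms are the textbook ones rather than a transcription of the paper's equations.

```python
def mse_ratio_sousa(lam, var_y, var_x, cov_yx, var_s, r):
    """Assumed first-order MSE of the ratio estimator with phi = 1."""
    return lam * (var_y + var_s + r ** 2 * var_x - 2.0 * r * cov_yx)


def mse_regression(lam, var_y, var_x, cov_yx, var_s, phi):
    """Assumed first-order MSE of the regression estimator for a given phi."""
    return lam * (var_y - cov_yx ** 2 / var_x + phi ** 2 * var_s)


def mse_gap(lam, var_x, cov_yx, var_s, r, phi):
    """Closed form of MSE(ratio, phi = 1) - MSE(regression, phi)."""
    beta = cov_yx / var_x
    return lam * ((1.0 - phi ** 2) * var_s + var_x * (r - beta) ** 2)


# Hypothetical population values.
lam, var_y, var_x, cov_yx, var_s, r = 0.018, 16.0, 4.0, 6.4, 4.0, 2.0
direct = (mse_ratio_sousa(lam, var_y, var_x, cov_yx, var_s, r)
          - mse_regression(lam, var_y, var_x, cov_yx, var_s, 0.5))
closed = mse_gap(lam, var_x, cov_yx, var_s, r, 0.5)
```

Both terms of the gap are non-negative for |φ| ≤ 1, and at φ = 1 the gap vanishes exactly when R = β_yx.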

Empirical Study

To judge the merits of the suggested regression estimator t_lr over the regression estimator of Gupta et al. [2], we have computed the percent relative efficiency (PRE) of the suggested estimator t_lr with respect to the estimator of Gupta et al. [2] by using the formula:

Under the assumption and , where α is a scalar in percent (i.e. α%), the PRE reduces to:

We have computed the values of the PRE in (3.24) for α = 10%, 20%, 30% and ρ_yx = 0.55, (0.6) 0.9, and the findings are presented in Tables 4-6.

Tables 4-6 clearly indicate that the PRE values are larger than 100. So the proposed regression estimator t_lr is more efficient than the regression estimator of Gupta et al. [2] when |φ| < 1. There is considerable gain in efficiency from using the proposed regression estimator t_lr over the regression estimator of Gupta et al. [2] when the value of φ is in the neighborhood of the origin, the value of ρ_yx is close to unity and the value of α is moderately large. Thus, in such situations, our recommendation is to use the proposed regression estimator t_lr as long as |φ| < 1.
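The qualitative findings above can be reproduced with a short sketch. It assumes the standard first-order MSE of a regression estimator built on Z_φ = Y + φS, namely ((1 − f)/n)[S_y²(1 − ρ_yx²) + φ² S_s²], and takes the Gupta et al. [2] estimator as the case φ = 1. The function names and input values are hypothetical and are not those behind Tables 4-6.

```python
def mse_regression(var_y, rho_yx, var_s, phi, lam=1.0):
    """Assumed first-order MSE of the regression estimator for a given phi."""
    return lam * (var_y * (1.0 - rho_yx ** 2) + phi ** 2 * var_s)


def pre_vs_gupta(var_y, rho_yx, var_s, phi):
    """PRE of the phi-scrambled regression estimator relative to phi = 1.

    The common factor (1 - f)/n cancels, so it is left at its default.
    """
    return (100.0 * mse_regression(var_y, rho_yx, var_s, 1.0)
            / mse_regression(var_y, rho_yx, var_s, phi))


# Hypothetical values: S_y^2 = 16, S_s^2 = 4.
pre_at_zero = pre_vs_gupta(16.0, 0.8, 4.0, 0.0)
pre_low_rho = pre_vs_gupta(16.0, 0.6, 4.0, 0.5)
pre_high_rho = pre_vs_gupta(16.0, 0.9, 4.0, 0.5)
```

In this sketch the PRE exceeds 100 for every |φ| < 1, is largest at φ = 0, and grows with ρ_yx for fixed φ, which matches the direction of the reported findings.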

Acknowledgement

The authors are thankful to the Editor-in-Chief and to the anonymous learned referee for valuable suggestions regarding the improvement of the paper.