...
Overview
Notation:
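- \(n\) is the sample size to be drawn, and \(N\) is the number of units in the population.
- \(M_i\) is the size measure of the \(i\)-th population unit \(Y_i\).
- The total size of the population is \(\displaystyle{M_0=\sum_{i=1}^{N}M_i}\).
- \(Z_i\) is the probability that \(Y_i\) is selected on each draw. Under \(\mathrm{PPS}\) sampling,
\[Z_i=\frac{M_i}{M_0}=\frac{M_i}{\sum\limits_{j=1}^{N}M_j}.
\]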
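Estimator of the population total: the Hansen-Hurwitz (HH) estimator. For \(n\) draws with replacement, \(y_i\) is the value observed on the \(i\)-th draw and \(Z_i\) is the selection probability of the unit drawn.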
\[\hat Y_{HH}=\frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{Z_i}.
\]
Under \(\mathrm{PPS}\) sampling,
\[\hat Y_{HH}=\frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{Z_i}=\frac{M_0}{n}\sum_{i=1}^{n}\frac{y_i}{M_i}.
\]
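The PPS form of the estimator can be sketched in a few lines of Python. This is only an illustration: the function name `hh_estimate_pps` and the sample numbers are hypothetical and do not come from the text above.

```python
def hh_estimate_pps(y, m, m0):
    """Hansen-Hurwitz estimate of the population total under PPS sampling.

    y  : observed values, one per draw (repeats are allowed because
         draws are made with replacement)
    m  : size measures M_i of the drawn units, in the same order as y
    m0 : total size measure M_0 of the population
    """
    n = len(y)
    if n == 0 or n != len(m):
        raise ValueError("y and m must be non-empty and of equal length")
    # (M_0 / n) * sum of y_i / M_i
    return m0 / n * sum(yi / mi for yi, mi in zip(y, m))


# Hypothetical sample of n = 2 draws from a population with M_0 = 10.
estimate = hh_estimate_pps([10, 30], [2, 5], 10)
print(estimate)  # 55.0
```

Here \(y_1/M_1=5\) and \(y_2/M_2=6\), so the estimate is \((10/2)(5+6)=55\).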
Expectation and variance of the HH estimator
Theorem: \(\hat Y_{HH}\) is an unbiased estimator of the population total \(Y\), that is,
\[\mathbb{E}(\hat {Y}_{HH})=Y.
\]
First compute the expectation of \(y_i/Z_i\) for a single draw:
\[\mathbb{E}\left(\frac{y_i}{Z_i}\right)=\sum_{j=1}^{N}Z_j\frac{Y_j}{Z_j}=\sum_{j=1}^{N}Y_j=Y,
\]
Then, because every draw in sampling with replacement follows this same distribution, linearity of expectation gives
\[\mathbb{E}(\hat{Y}_{HH})=\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}\left(\frac{y_i}{Z_i}\right)=\frac{1}{n}\sum_{i=1}^{n}Y=Y.
\]
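Note that each \(Z_i\) is tied to its \(Y_i\): whenever unit \(Y_i\) is drawn, the observed value can be treated as \(Y_i/Z_i\), and the expectation is then the probability-weighted sum over this discrete distribution.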
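The unbiasedness result can be checked exactly on a small population by listing every ordered sample of \(n\) draws with replacement. The sketch below uses a hypothetical three-unit population and exact rational arithmetic; the helper name `exact_hh_mean` is an assumption made for illustration.

```python
from fractions import Fraction
from itertools import product


def exact_hh_mean(Y, Z, n):
    """Exact E[Y_HH] over all N**n ordered samples drawn with replacement."""
    total = Fraction(0)
    for draw in product(range(len(Y)), repeat=n):
        prob = Fraction(1)  # probability of this ordered sample
        est = Fraction(0)   # running sum of y_i / Z_i
        for i in draw:
            prob *= Z[i]
            est += Fraction(Y[i]) / Z[i]
        total += prob * est / n
    return total


# Hypothetical toy population: values Y_i and size measures M_i.
Y = [3, 7, 10]
M = [1, 2, 5]
Z = [Fraction(m, sum(M)) for m in M]  # PPS selection probabilities

mean_hh = exact_hh_mean(Y, Z, n=2)
print(mean_hh)  # 20
```

The result equals \(Y=3+7+10=20\). Replacing `Z` with any other positive probabilities that sum to one gives the same value, since the argument above does not rely on PPS.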
Theorem: the variance of \(\hat Y_{HH}\) is
\[\mathbb{D}(\hat Y_{HH})=\frac{1}{n}\sum_{i=1}^{N}Z_i\left(\frac{Y_i}{Z_i}-Y \right)^2.
\]
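Similarly, first compute the variance of each \(y_i/Z_i\), then use the fact that the draws are independent and identically distributed to obtain the variance of the estimator:
\[\begin{aligned}
\mathbb{D}\left(\frac{y_i}{Z_i} \right)&=\sum_{j=1}^{N}Z_j\left(\frac{Y_j}{Z_j}-Y \right)^2,\\
\mathbb{D}(\hat Y_{HH})&=\mathbb{D}\left(\frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{Z_i} \right)=\frac{1}{n^2}\sum_{i=1}^{n}\mathbb{D}\left(\frac{y_i}{Z_i} \right)=\frac{1}{n}\sum_{i=1}^{N}Z_i\left(\frac{Y_i}{Z_i}-Y \right)^2.
\end{aligned}
\]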
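As a sanity check, the closed-form variance can be compared with the variance obtained by brute-force enumeration of all ordered samples. This is a minimal sketch on a hypothetical three-unit population; both function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def hh_variance_formula(Y, Z, n):
    """Closed form (1/n) * sum_i Z_i * (Y_i / Z_i - Y)^2."""
    total = sum(Y)
    return sum(z * (Fraction(y) / z - total) ** 2 for y, z in zip(Y, Z)) / n


def hh_variance_enumerated(Y, Z, n):
    """Variance of the HH estimator from all N**n ordered samples."""
    first = Fraction(0)   # accumulates E[Y_HH]
    second = Fraction(0)  # accumulates E[Y_HH^2]
    for draw in product(range(len(Y)), repeat=n):
        prob = Fraction(1)
        est = Fraction(0)
        for i in draw:
            prob *= Z[i]
            est += Fraction(Y[i]) / Z[i]
        est /= n
        first += prob * est
        second += prob * est ** 2
    return second - first ** 2


# Hypothetical toy population with PPS selection probabilities.
Y = [3, 7, 10]
M = [1, 2, 5]
Z = [Fraction(m, sum(M)) for m in M]

var_formula = hh_variance_formula(Y, Z, n=2)
var_enum = hh_variance_enumerated(Y, Z, n=2)
print(var_formula, var_enum)  # 14 14
```

For this population the ratios \(Y_i/Z_i\) are 24, 28 and 16, so the single-draw variance is \(\tfrac{1}{8}\cdot 16+\tfrac{2}{8}\cdot 64+\tfrac{5}{8}\cdot 16=28\), and dividing by \(n=2\) gives 14.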
Unbiased estimator of the variance of the HH estimator
Theorem: when \(n>1\), an unbiased estimator of \(\mathbb{D}(\hat Y_{HH})\) is
\[\begin{aligned}
v(\hat Y_{HH})&=\frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{y_i}{Z_i}-\hat Y_{HH} \right)^2,\\
\mathbb{E}(v(\hat Y_{HH}))&=\frac{1}{n}\sum_{i=1}^{N}Z_i\left(\frac{Y_i}{Z_i}-Y \right)^2.
\end{aligned}
\]
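Let \(t_i\) denote the number of times unit \(Y_i\) enters the sample. Then \(\displaystyle{\sum_{i=1}^{N}t_i=n}\), \(t_i\sim B(n, Z_i)\), the vector \((t_1,\dots,t_N)\) follows a multinomial distribution, and
\[\begin{aligned}
\mathbb{E}(t_i)&=nZ_i,\quad \mathbb{D}(t_i)=nZ_i(1-Z_i),\\
\mathbb{E}(t_it_j)&=n(n-1)Z_iZ_j\quad(i\neq j),\\
\mathrm{cov}(t_i,t_j)&=-nZ_iZ_j\quad(i\neq j).
\end{aligned}
\]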
To prove the theorem, it suffices to show that
\[\mathbb{E}\left[\sum_{i=1}^{n}\left(\frac{y_i}{Z_i}-\hat Y_{HH} \right)^2 \right]=(n-1)\sum_{i=1}^{N}Z_i\left(\frac{Y_i}{Z_i}-Y \right)^2=n(n-1)\mathbb{D}(\hat{Y}_{HH}).
\]
Note that
\[\hat Y_{HH}=\frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{Z_i},
\]
so
\[\begin{aligned}
\sum_{i=1}^{n}\left(\frac{y_i}{Z_i}-\hat Y_{HH} \right)^2&=\sum_{i=1}^{n}\left(\frac{y_i}{Z_i} \right)^2-n\hat Y_{HH}^2\\
&=\sum_{i=1}^{n}\left(\frac{y_i}{Z_i}-Y \right)^2-n(\hat Y_{HH}-Y)^2.
\end{aligned}
\]
Here \(\displaystyle{\mathbb{E}\left(\frac{y_i}{Z_i} \right)=\mathbb{E}(\hat Y_{HH})=Y}\), and therefore
\[\begin{aligned}
\mathbb{E}\left[\sum_{i=1}^{n}\left(\frac{y_i}{Z_i}-\hat Y_{HH} \right)^2 \right]&=\mathbb{E}\left[\sum_{i=1}^{n}\left(\frac{y_i}{Z_i}-Y \right)^2-n(\hat Y_{HH}-Y)^2 \right]\\
&=\mathbb{E}\left[\sum_{i=1}^{N}t_i\left(\frac{Y_i}{Z_i}-Y \right)^2 \right]-n\mathbb{D}(\hat Y_{HH})\\
&=\sum_{i=1}^{N}\mathbb{E}(t_i)\left(\frac{Y_i}{Z_i}-Y \right)^2-n\mathbb{D}(\hat Y_{HH})\\
&=n\sum_{i=1}^{N}Z_i\left(\frac{Y_i}{Z_i}-Y \right)^2-n\mathbb{D}(\hat{Y}_{HH})\\
&=n^2\mathbb{D}(\hat{Y}_{HH})-n\mathbb{D}(\hat{Y}_{HH})\\
&=n(n-1)\mathbb{D}(\hat{Y}_{HH}),
\end{aligned}
\]
This proves the claim.
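The unbiasedness of \(v(\hat Y_{HH})\) can also be confirmed numerically by averaging it over every ordered sample, weighted by the sample's probability. The sketch below assumes a hypothetical three-unit population; `expected_variance_estimate` is an illustrative name.

```python
from fractions import Fraction
from itertools import product


def expected_variance_estimate(Y, Z, n):
    """Exact E[v(Y_HH)] over all N**n ordered samples drawn with replacement."""
    expected = Fraction(0)
    for draw in product(range(len(Y)), repeat=n):
        prob = Fraction(1)
        for i in draw:
            prob *= Z[i]
        ratios = [Fraction(Y[i]) / Z[i] for i in draw]
        y_hat = sum(ratios) / n
        # v = 1 / (n (n - 1)) * sum of squared deviations from Y_HH
        v = sum((r - y_hat) ** 2 for r in ratios) / (n * (n - 1))
        expected += prob * v
    return expected


# Hypothetical toy population with PPS selection probabilities.
Y = [3, 7, 10]
M = [1, 2, 5]
Z = [Fraction(m, sum(M)) for m in M]

expected_v = expected_variance_estimate(Y, Z, n=2)
print(expected_v)  # 14
```

For this population the single-draw variance is 28, so \(\mathbb{D}(\hat Y_{HH})=28/2=14\) when \(n=2\), matching the printed value.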
Corollary: under \(\mathrm{PPS}\) sampling, substituting \(Z_i=\dfrac{M_i}{M_0}\) gives
\[v(\hat{Y}_{HH})=\frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{y_i}{Z_i}-\hat Y_{HH} \right)^2=\frac{M_0^2}{n(n-1)}\sum_{i=1}^{n}\left(\frac{y_i}{M_i}-\frac{\hat Y_{HH}}{M_0} \right)^2.
\]
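The two forms of the variance estimator in the corollary can be compared on a concrete sample. This is a sketch with hypothetical data (three draws from a population with \(M_0=8\)); the function names are illustrative, and integer inputs are assumed so that the arithmetic stays exact.

```python
from fractions import Fraction


def hh_variance_estimate(y, z):
    """General form: 1 / (n (n - 1)) * sum (y_i / Z_i - Y_HH)^2."""
    n = len(y)
    ratios = [Fraction(yi) / zi for yi, zi in zip(y, z)]
    y_hat = sum(ratios) / n
    return sum((r - y_hat) ** 2 for r in ratios) / (n * (n - 1))


def hh_variance_estimate_pps(y, m, m0):
    """PPS form: M_0^2 / (n (n - 1)) * sum (y_i / M_i - Y_HH / M_0)^2."""
    n = len(y)
    ratios = [Fraction(yi, mi) for yi, mi in zip(y, m)]
    y_hat_over_m0 = sum(ratios) / n  # equals Y_HH / M_0 under PPS
    squared = sum((r - y_hat_over_m0) ** 2 for r in ratios)
    return Fraction(m0) ** 2 * squared / (n * (n - 1))


# Hypothetical sample: observed values and size measures of the drawn units.
y = [3, 10, 7]
m = [1, 5, 2]
m0 = 8
z = [Fraction(mi, m0) for mi in m]

v_general = hh_variance_estimate(y, z)
v_pps = hh_variance_estimate_pps(y, m, m0)
print(v_general, v_pps)  # 112/9 112/9
```

Both forms agree because \(y_i/Z_i=M_0\,y_i/M_i\) and \(\hat Y_{HH}=M_0\cdot\hat Y_{HH}/M_0\), so each squared deviation picks up the common factor \(M_0^2\).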