
[Sampling] Unequal-probability sampling: simple unequal-probability sampling with replacement

最编程 2024-05-03 21:06:09

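Overview

Notation:

  • The sample size to be drawn is \(n\), and the population contains \(N\) units.

  • \(M_i\) is the size measure of the \(i\)-th population unit \(Y_i\).

  • The total size of the population is \(\displaystyle{M_0=\sum_{i=1}^{N}M_i}\).

  • \(Z_i\) is the probability that \(Y_i\) is selected on each draw. Under \(\mathrm{PPS}\) (probability proportional to size) sampling,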

    \[Z_i=\frac{M_i}{M_0}=\frac{M_i}{\sum\limits_{i=1}^{N}M_i}. \]

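Estimator of the population total \(\displaystyle{Y=\sum_{i=1}^{N}Y_i}\): the Hansen-Hurwitz (HH) estimator, where \(y_i\) is the value of the unit obtained on the \(i\)-th draw and \(Z_i\) is that unit's selection probability.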

\[\hat Y_{HH}=\frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{Z_i}. \]

Under \(\mathrm{PPS}\) sampling, this becomes

\[\hat Y_{HH}=\frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{Z_i}=\frac{M_0}{n}\sum_{i=1}^{n}\frac{y_i}{M_i}. \]
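The estimator above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `pps_probabilities` and `hh_estimate`, the four-unit population, and the list of drawn indices are all hypothetical and chosen only to make the arithmetic easy to follow. Exact fractions are used so that no rounding obscures the result.

```python
from fractions import Fraction


def pps_probabilities(sizes):
    """Per-draw selection probabilities Z_i = M_i / M_0 under PPS sampling."""
    m0 = sum(sizes)
    return [Fraction(m, m0) for m in sizes]


def hh_estimate(y, z):
    """Hansen-Hurwitz estimate (1/n) * sum(y_i / Z_i) of the population total.

    y: values observed on the n draws (repeats allowed, since draws are
       made with replacement)
    z: selection probability of the unit obtained on each draw
    """
    if not y or len(y) != len(z):
        raise ValueError("y and z must be non-empty and of equal length")
    return sum(Fraction(yi) / zi for yi, zi in zip(y, z)) / len(y)


# Hypothetical population of N = 4 units (illustrative numbers only).
sizes = [10, 20, 30, 40]    # M_i, so M_0 = 100
values = [12, 18, 33, 37]   # Y_i, so the true total is Y = 100
z_all = pps_probabilities(sizes)

# Suppose n = 3 draws with replacement returned units 3, 1, 3 (0-based).
drawn = [3, 1, 3]
estimate = hh_estimate([values[i] for i in drawn], [z_all[i] for i in drawn])
print(estimate, float(estimate))
```

Unit 3 appears twice in the sample and is simply counted twice, which is what distinguishes this estimator from estimators built for sampling without replacement.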

Expectation and variance of the HH estimator

Theorem: \(\hat Y_{HH}\) is an unbiased estimator of the population total \(Y\), that is,

\[\mathbb{E}(\hat {Y}_{HH})=Y. \]

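First compute the expectation of \(y_i/Z_i\) for a single draw. The draw returns unit \(j\) with probability \(Z_j\), so

\[\mathbb{E}\left(\frac{y_i}{Z_i}\right)=\sum_{j=1}^{N}Z_j\frac{Y_j}{Z_j}=\sum_{j=1}^{N}Y_j=Y, \]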

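Then, because the draws in sampling with replacement are independent and identically distributed, linearity of expectation gives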

\[\mathbb{E}(\hat{Y}_{HH})=\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}\left(\frac{y_i}{Z_i}\right)=\frac{1}{n}\sum_{i=1}^{n}Y=Y. \]

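Note that each \(Z_i\) is tied to its \(Y_i\): when unit \(Y_i\) is actually drawn, the observed quantity can be regarded as \(Y_i/Z_i\), which occurs with probability \(Z_i\). The expectation is then the probability-weighted sum over this discrete distribution.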
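The unbiasedness claim can be checked exactly on a small case by enumerating every ordered sample of size \(n\) drawn with replacement and weighting each estimate by its probability. The sketch below assumes a hypothetical three-unit population with arbitrary selection probabilities (not necessarily PPS); the helper name `exact_expectation_hh` is made up for this illustration.

```python
from fractions import Fraction
from itertools import product


def exact_expectation_hh(values, z, n):
    """Exact E(Y_hat_HH), summing over all N**n ordered samples."""
    expectation = Fraction(0)
    for sample in product(range(len(values)), repeat=n):
        # Draws are independent, so the sample probability is a product.
        prob = Fraction(1)
        for i in sample:
            prob *= z[i]
        estimate = sum(Fraction(values[i]) / z[i] for i in sample) / n
        expectation += prob * estimate
    return expectation


# Hypothetical population: Y_i and per-draw probabilities Z_i (sum to 1).
values = [3, 5, 10]
z = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

expected = exact_expectation_hh(values, z, n=2)
print(expected, sum(values))
```

Enumeration costs \(N^n\) evaluations, so it is only practical as a sanity check on toy populations.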

Theorem: the variance of \(\hat Y_{HH}\) is

\[\mathbb{D}(\hat Y_{HH})=\frac{1}{n}\sum_{i=1}^{N}Z_i\left(\frac{Y_i}{Z_i}-Y \right)^2. \]

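Similarly, first compute the variance of a single \(y_i/Z_i\), then use the fact that the draws are independent and identically distributed to obtain the variance of the estimator: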

\[\begin{aligned} \mathbb{D}\left(\frac{y_i}{Z_i} \right)&=\sum_{i=1}^{N}Z_i\left(\frac{Y_i}{Z_i}-Y \right)^2,\\ \mathbb{D}(\hat Y_{HH})&=\mathbb{D}\left(\frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{Z_i} \right)=\frac{1}{n}\mathbb{D}\left(\frac{y_i}{Z_i} \right)=\frac{1}{n}\sum_{i=1}^{N}Z_i\left(\frac{Y_i}{Z_i}-Y \right)^2. \end{aligned} \]
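As a numerical cross-check of the variance formula, the following sketch compares the closed form with a brute-force variance computed over all ordered samples. The population and the function names `hh_variance_formula` and `hh_variance_enumerated` are hypothetical, introduced only for this example.

```python
from fractions import Fraction
from itertools import product


def hh_variance_formula(values, z, n):
    """Closed form (1/n) * sum_i Z_i * (Y_i / Z_i - Y)**2."""
    total = sum(values)
    return sum(p * (Fraction(y) / p - total) ** 2
               for y, p in zip(values, z)) / n


def hh_variance_enumerated(values, z, n):
    """Variance of the HH estimator by enumerating all N**n ordered samples."""
    total = sum(values)  # equals E(Y_hat_HH), since the estimator is unbiased
    variance = Fraction(0)
    for sample in product(range(len(values)), repeat=n):
        prob = Fraction(1)
        for i in sample:
            prob *= z[i]
        estimate = sum(Fraction(values[i]) / z[i] for i in sample) / n
        variance += prob * (estimate - total) ** 2
    return variance


# Hypothetical population, same style as a toy example: Y_i / Z_i = 6, 15, 60.
values = [3, 5, 10]
z = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

by_formula = hh_variance_formula(values, z, n=2)
by_enumeration = hh_variance_enumerated(values, z, n=2)
print(by_formula, by_enumeration)
```

The ratios \(Y_i/Z_i\) here are far from constant, which is why the variance is large; if \(Z_i\) were exactly proportional to \(Y_i\), every ratio would equal \(Y\) and the variance would be zero.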

Unbiased estimator of the variance of the HH estimator

Theorem: when \(n>1\), an unbiased estimator of \(\mathbb{D}(\hat Y_{HH})\) is

\[\begin{aligned} v(\hat Y_{HH})&=\frac{1}{n}\frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{y_i}{Z_i}-\hat Y_{HH} \right)^2,\\ \mathbb{E}(v(\hat Y_{HH}))&=\frac{1}{n}\sum_{i=1}^{N}Z_i\left(\frac{Y_i}{Z_i}-Y \right)^2. \end{aligned} \]

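Let \(t_i\) denote the number of times \(Y_i\) enters the sample. Then \(\displaystyle{\sum_{i=1}^{N}t_i=n}\), each \(t_i\sim B(n, Z_i)\), the counts \((t_1,\dots,t_N)\) jointly follow a multinomial distribution, and for \(i\neq j\)

\[\begin{aligned} \mathbb{E}(t_i)&=nZ_i,\quad \mathbb{D}(t_i)=nZ_i(1-Z_i),\\ \mathbb{E}(t_it_j)&=n(n-1)Z_iZ_j,\\ \mathrm{cov}(t_i,t_j)&=-nZ_iZ_j. \end{aligned} \]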
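The mixed moment and the covariance can be derived with indicator variables. As a sketch, using notation introduced here rather than taken from the original: let \(I_k(i)=1\) if the \(k\)-th draw selects unit \(i\) and \(I_k(i)=0\) otherwise, so that \(\displaystyle{t_i=\sum_{k=1}^{n}I_k(i)}\). For \(i\neq j\), a single draw cannot select both units, so \(I_k(i)I_k(j)=0\), while distinct draws are independent. Hence

\[\begin{aligned} \mathbb{E}(t_it_j)&=\sum_{k=1}^{n}\sum_{l\neq k}\mathbb{E}\left[I_k(i)I_l(j)\right]=n(n-1)Z_iZ_j,\\ \mathrm{cov}(t_i,t_j)&=\mathbb{E}(t_it_j)-\mathbb{E}(t_i)\mathbb{E}(t_j)=n(n-1)Z_iZ_j-n^2Z_iZ_j=-nZ_iZ_j. \end{aligned} \]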

To prove the theorem, it suffices to show that

\[\mathbb{E}\left[\sum_{i=1}^{n}\left(\frac{y_i}{Z_i}-\hat Y_{HH} \right)^2 \right]=(n-1)\sum_{i=1}^{N}Z_i\left(\frac{Y_i}{Z_i}-Y \right)^2=n(n-1)\mathbb{D}(\hat{Y}_{HH}). \]

Note that

\[\hat Y_{HH}=\frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{Z_i}, \]

so

\[\begin{aligned} \sum_{i=1}^{n}\left(\frac{y_i}{Z_i}-\hat Y_{HH} \right)^2=\sum_{i=1}^{n}\left(\frac{y_i}{Z_i} \right)^2-n\hat Y_{HH}^2=\sum_{i=1}^{n}\left(\frac{y_i}{Z_i}-Y \right)^2-n(\hat Y_{HH}-Y)^2 \end{aligned}, \]

Here \(\displaystyle{\mathbb{E}\left(\frac{y_i}{Z_i} \right)=\mathbb{E}(\hat Y_{HH})=Y}\). Grouping the sample terms by population unit, with unit \(i\) appearing \(t_i\) times, gives

\[\begin{aligned} \mathbb{E}\left[\sum_{i=1}^{n}\left(\frac{y_i}{Z_i}-\hat Y_{HH} \right)^2 \right]&=\mathbb{E}\left[\sum_{i=1}^{n}\left(\frac{y_i}{Z_i}-Y \right)^2-n(\hat Y_{HH}-Y)^2 \right]\\ &=\mathbb{E}\left[\sum_{i=1}^{N}t_i\left(\frac{Y_i}{Z_i}-Y \right)^2 \right]-n\mathbb{D}(\hat Y_{HH})\\ &=\sum_{i=1}^{N}\mathbb{E}(t_i)\left(\frac{Y_i}{Z_i}-Y \right)^2-n\mathbb{D}(\hat Y_{HH})\\ &=n\sum_{i=1}^{N}Z_i\left(\frac{Y_i}{Z_i}-Y \right)^2-n\mathbb{D}(\hat{Y}_{HH})\\ &=n^2\mathbb{D}(\hat{Y}_{HH})-n\mathbb{D}(\hat{Y}_{HH})\\ &=n(n-1)\mathbb{D}(\hat{Y}_{HH}), \end{aligned} \]

which proves the claim.
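The unbiasedness of \(v(\hat Y_{HH})\) can also be confirmed exactly on a toy case. The sketch below averages the variance estimate over every ordered sample of size \(n=3\) and compares the result with the true variance. The population and the names `hh_variance_estimate` and `expected_variance_estimate` are hypothetical, used only for illustration.

```python
from fractions import Fraction
from itertools import product


def hh_variance_estimate(ratios):
    """v = 1 / (n (n - 1)) * sum((y_i / Z_i - Y_hat_HH)**2), requires n > 1.

    ratios: the sampled values y_i / Z_i, one entry per draw.
    """
    n = len(ratios)
    if n < 2:
        raise ValueError("the variance estimator needs at least two draws")
    center = sum(ratios) / n  # this is Y_hat_HH
    return sum((r - center) ** 2 for r in ratios) / (n * (n - 1))


def expected_variance_estimate(values, z, n):
    """Exact E(v), summing over all N**n ordered samples with replacement."""
    population_ratios = [Fraction(y) / p for y, p in zip(values, z)]
    expectation = Fraction(0)
    for sample in product(range(len(values)), repeat=n):
        prob = Fraction(1)
        for i in sample:
            prob *= z[i]
        v = hh_variance_estimate([population_ratios[i] for i in sample])
        expectation += prob * v
    return expectation


# Hypothetical population and selection probabilities.
values = [3, 5, 10]
z = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
n = 3

total = sum(values)
true_variance = sum(p * (Fraction(y) / p - total) ** 2
                    for y, p in zip(values, z)) / n
mean_of_v = expected_variance_estimate(values, z, n)
print(true_variance, mean_of_v)
```

Individual samples can give estimates far from the true variance (a sample that repeats one unit \(n\) times gives \(v=0\)); only the average over all samples matches.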

Corollary: under \(\mathrm{PPS}\) sampling, since \(Z_i=\dfrac{M_i}{M_0}\), we have

\[v(\hat{Y}_{HH})=\frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{y_i}{Z_i}-\hat Y_{HH} \right)^2=\frac{M_0^2}{n(n-1)}\sum_{i=1}^{n}\left(\frac{y_i}{M_i}-\frac{\hat Y_{HH}}{M_0} \right)^2. \]
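In the PPS case, the estimate and its variance estimate can be computed directly from the sampled values \(y_i\), their sizes \(M_i\), and the total size \(M_0\). The following is a minimal sketch under assumed inputs: the function name `pps_hh_with_variance` and the three-draw sample are hypothetical and serve only to show the arithmetic.

```python
from fractions import Fraction


def pps_hh_with_variance(y, m, m0):
    """HH estimate and its variance estimate under PPS sampling.

    y:  values observed on the n draws (with replacement)
    m:  size measures M_i of the units obtained on those draws
    m0: total size M_0 of the population
    """
    n = len(y)
    if n < 2 or len(m) != n:
        raise ValueError("need at least two draws and matching lengths")
    per_size = [Fraction(yi) / Fraction(mi) for yi, mi in zip(y, m)]
    # Y_hat_HH = (M_0 / n) * sum(y_i / M_i)
    estimate = Fraction(m0) * sum(per_size) / n
    # v = M_0**2 / (n (n - 1)) * sum((y_i / M_i - Y_hat_HH / M_0)**2)
    center = estimate / m0
    variance = (Fraction(m0) ** 2
                * sum((r - center) ** 2 for r in per_size)
                / (n * (n - 1)))
    return estimate, variance


# Hypothetical sample of n = 3 draws; the same unit was drawn twice.
estimate, variance = pps_hh_with_variance(
    y=[37, 18, 37], m=[40, 20, 40], m0=100)
print(estimate, variance)
```

Because the ratios \(y_i/M_i\) in this sample are nearly equal, the variance estimate is small, which illustrates why PPS sampling works well when the size measure is roughly proportional to the variable of interest.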