User:The Jedi Math Squirrel/joinwork


Coverage Error

All colored circles are in the target population. Green and orange circles are in the sampling frame. Green circles are a random sample drawn from the sampling frame. The sampling frame exhibits overcoverage because John and Jack are the same person, who therefore appears in the frame more than once. It exhibits undercoverage because not all of the target population is included in the frame.

Coverage error is one component of total survey error. It is a non-sampling error that occurs when there is not a one-to-one correspondence between the target population and the sampling frame from which a sample is drawn.[1] In survey sampling, a sampling frame is used to draw a random sample from the population. In a census, a sampling frame is still used, but the intent is to include the entire population. Differences between the target population of the survey and the sampling frame result in coverage error.[2]

For example, suppose a researcher uses Twitter to gauge United States public opinion on a recent action taken by the President. U.S. Twitter users are not demographically representative of the U.S. population, so the data source introduces a type of error called undercoverage. Undercoverage is the error that results when part of the target population of a study is missing from the sampling frame. A longitudinal study is particularly susceptible to undercoverage because populations change over time.[3] In addition, not every account belongs to an individual, and one individual might hold multiple accounts, so the data source also introduces a type of error called overcoverage. Overcoverage is the error that results when the frame contains entities that should not be counted or counts entities more than once.[4][5] Both overcoverage and undercoverage lead to sampling bias.

As another example, a researcher may wish to study the opinions of registered voters (target population) by calling residences listed in a telephone directory (sampling frame). The phone numbers of some registered voters are likely not listed in the directory, resulting in undercoverage of the target population and potential bias if the characteristics of listed voters differ from those of unlisted voters. Bias can also occur if some registered voters are over-represented in the directory, as happens when a residence has more than one telephone listed. Bias might also occur if some of the listed phone numbers do not belong to registered voters.[6]
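The three problems in the telephone-directory example (missing voters, duplicate listings, and listings that are not voters) can be checked mechanically when both lists are available. The following is a minimal sketch under that assumption; the function name `coverage_summary` and the voter and directory data are hypothetical and serve only as illustration.

```python
from collections import Counter


def coverage_summary(target_population, sampling_frame):
    """Compare a sampling frame (a list that may repeat units) with a target population."""
    target = set(target_population)
    frame_counts = Counter(sampling_frame)
    frame_units = set(frame_counts)

    # Undercoverage: target units that never appear in the frame.
    undercovered = target - frame_units
    # Overcoverage, case 1: frame units that are not in the target population.
    ineligible = frame_units - target
    # Overcoverage, case 2: target units listed more than once in the frame.
    duplicated = {u for u, n in frame_counts.items() if n > 1 and u in target}

    return {
        "undercovered": undercovered,
        "ineligible": ineligible,
        "duplicated": duplicated,
        "undercoverage_rate": len(undercovered) / len(target),
    }


# Hypothetical data: five registered voters and a telephone directory.
voters = {"ana", "ben", "cy", "dee", "eli"}
directory = ["ana", "ben", "ben", "cy", "zed"]  # ben has two lines; zed is not a voter

summary = coverage_summary(voters, directory)
print(summary)
```

In practice the target population is rarely enumerated this way, which is why the estimation methods described in the next section are needed.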

Ways to Quantify Coverage Error

Many different methods have been used to quantify and correct for coverage error. Methods used in mathematical statistics to identify a plausible statistical model can be applied. Often, the methods are specific to particular agencies and organizations. For example, the United States Census Bureau has used the U.S. Postal Service's Delivery Sequence File, IRS 1040 address data, commercially available foreclosure counts, and other data to develop models capable of predicting undercount by census block. The Census Bureau has reported some success fitting zero-inflated negative binomial and zero-inflated Poisson (ZIP) models to these data.[7] See zero-inflated model.
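For reference, a zero-inflated Poisson distribution mixes a point mass at zero with an ordinary Poisson count, which suits data where many units (for example, census blocks) have no errors at all. The sketch below only evaluates that probability mass function. It is not the Census Bureau's model, and the parameter names `lam` and `pi` and the example values are assumptions chosen for illustration.

```python
import math


def zip_pmf(k, lam, pi):
    """Probability of count k under a zero-inflated Poisson distribution.

    lam: mean of the Poisson component (assumed name).
    pi:  probability of a structural zero, 0 <= pi < 1 (assumed name).
    """
    poisson = math.exp(-lam) * lam**k / math.factorial(k)
    if k == 0:
        # A zero can come from the structural-zero component or from the Poisson.
        return pi + (1 - pi) * poisson
    return (1 - pi) * poisson


# Hypothetical parameters: 30% of blocks are structural zeros,
# and the remaining blocks have a Poisson mean of 2 missed addresses.
for k in range(4):
    print(k, round(zip_pmf(k, lam=2.0, pi=0.3), 4))
```

Compared with a plain Poisson distribution with the same `lam`, the ZIP distribution places more probability on zero and proportionally less on every positive count.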

Another method to quantify coverage error is to perform an evaluation study, an approach similar to mark-recapture methodology.[5][8] In capture-recapture methods, a sample is taken directly from the population, marked, and released back into the population. A second sample is then taken, and the proportion of previously marked individuals in it is used to estimate the actual population size. The method can be extended to assess a sampling frame by taking one sample directly from the target population ("capture") and another from the sampling frame ("recapture") in order to estimate undercoverage.[9] For example, after a census is completed, random samples drawn from the frame are counted again, and the difference between the two counts of the same sampled area is used to estimate coverage error.[5]
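The arithmetic behind two-sample capture-recapture can be sketched with the classical Lincoln-Petersen estimator, together with Chapman's variant, which is less biased for small samples. The cited sources do not state which estimator a given evaluation study uses, so this is an illustration of the general technique, and the counts below are invented.

```python
def lincoln_petersen(n_first, n_second, n_both):
    """Estimate population size from two samples.

    n_first:  units found in the first sample (e.g. the original count).
    n_second: units found in the second, independent sample (e.g. a recount).
    n_both:   units found in both samples.
    """
    if n_both == 0:
        raise ValueError("no overlap between samples; the estimate is undefined")
    return n_first * n_second / n_both


def chapman(n_first, n_second, n_both):
    """Chapman's variant of the estimator; defined even when n_both is 0."""
    return (n_first + 1) * (n_second + 1) / (n_both + 1) - 1


# Hypothetical area: the census counted 900 people, an independent recount
# found 100 people, and 90 of those 100 matched a census record.
estimate = lincoln_petersen(900, 100, 90)
coverage_rate = 90 / 100  # share of the recount sample that the census had covered
print(estimate, coverage_rate, round(chapman(900, 100, 90), 3))
```

Here the estimated population is 1,000, so the original count of 900 implies an undercoverage of about 10 percent. The estimate is only valid if the two samples are independent and records can be matched reliably between them.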

Ways to Reduce Coverage Error

One way to reduce coverage error is to rely on multiple sources to either build a sampling frame or solicit information. This is called a mixed-mode approach. For example, Washington State University students conducted Student Experience Surveys by building a sampling frame from both street addresses and email addresses. In another example, the 2010 U.S. Census relied primarily on residential mail responses and then deployed field interviewers to follow up with nonrespondents. This approach also reduced cost, because the majority of people responded by mail and did not require a field visit.[2]

Another way to reduce coverage error is to use paradata. For example, paradata can help produce a sampling frame of telephone numbers. Suppose the target population is households. Because telephone numbers can belong to businesses, overcoverage is a concern. One method assigns each phone number a score that indicates how likely the number is to belong to a person or to a business.[4]

2010 Census

The U.S. Census Bureau prepares and maintains a Master Address File of some 144.9 million addresses that it uses as a sampling frame for the U.S. decennial census and other surveys. Despite the efforts of some 111,105 field representatives and an expenditure of nearly half a billion dollars, the Census Bureau still found a significant number of addresses that were missing from the Master Address File.[7]

Coverage Follow-Up (CFU) and Field Verification (FV) were United States government operations in the 2010 Census that were created to improve upon the 2000 Census. These operations were intended to address the following types of coverage error: not counting someone who should have been counted; counting someone who should not have been counted; and counting someone who should have been counted, but at the wrong location. Coverage errors in the U.S. Census can cause groups of people to be underrepresented in government. Of particular concern are "differential undercounts", in which particular demographic groups are undercounted. Although the CFU and FV operations improved the accuracy of the 2010 Census, further study was recommended to address the question of differential undercounts.[10]

See Also

Sampling Error

References

  1. NOAA Fisheries (2019-02-21). "Survey Statistics Overview | NOAA Fisheries". www.fisheries.noaa.gov. Retrieved 2019-02-24.
  2. Dillman, Don A.; Smyth, Jolene D.; Christian, Leah Melani. Internet, phone, mail, and mixed-mode surveys: the tailored design method (4th ed.). Hoboken. ISBN 9781118921302. OCLC 878301194.
  3. Lynn, Peter (2009). Methodology of longitudinal surveys. Chichester, UK: John Wiley & Sons. ISBN 9780470743911. OCLC 317116422.
  4. Kreuter, Frauke (2013). Improving surveys with paradata: analytic uses of process information. Wiley & Sons. ISBN 9781118591451. OCLC 974751893.
  5. Total survey error in practice. Biemer, Paul P.; Leeuw, Edith Desirée de; Eckman, Stephanie; Edwards, Brad; Kreuter, Frauke; Lyberg, Lars. Hoboken, New Jersey. ISBN 9781119041689. OCLC 971891428.
  6. Elementary survey sampling. Scheaffer, Richard L. (7th ed., student ed.). Boston, MA: Brooks/Cole. 2012. ISBN 0840053614. OCLC 732960076.
  7. US Census Bureau. "Selection of Predictors to Model Coverage Errors". www.census.gov. Retrieved 2019-02-24.
  8. Elementary survey sampling. Scheaffer, Richard L. (7th ed., student ed.). Boston, MA: Brooks/Cole. 2012. ISBN 0840053614. OCLC 732960076.
  9. US Census Bureau. "Coverage Error Models for Census and Survey Data". www.census.gov. Retrieved 2019-02-24.
  10. United States Government Accountability Office (2010). 2010 census: follow-up should reduce coverage errors, but effects on demographic groups need to be determined: report to congressional requesters. U.S. Govt. Accountability Office. OCLC 721261877.