Sampling and Surveys
Textbook Section 4.1

Vocabulary

Population: The entire group of individuals we want information about.
Census: Collects data from every individual in the population.
Sample: A subset of individuals in the population from which we actually collect data.

Example: A furniture maker buys hardwood in large batches. The supplier is supposed to dry the wood before shipping. The furniture maker chooses five pieces of wood from each batch and tests their moisture content. If any piece exceeds 12% moisture content, the entire batch is sent back.

Population: All of the pieces of hardwood in a batch.
Sample: The five pieces selected for testing (testing every piece would be impractical, expensive, and time-consuming).

Idea of a Sample Survey

Statistics helps us draw conclusions about a whole population based on a sample. It is like buying a whole bowl of ice cream after tasting a bite.

The first step in planning a sample survey is to say exactly what population we want to describe.
The second step is to say exactly what we want to measure.

The final step is to decide how to choose a sample from the population so that it represents the entire population as closely as possible.

How to Sample Badly

Convenience Sampling: We want to know how long students spend doing homework each week, so we go to the library and ask the first 30 students we see.
Problem: This produces unrepresentative data, because people in the library probably spend more time doing homework than other students.

Voluntary Response Sample: Call-in, text-in, and internet polls rely on voluntary response.
Problem: Generally, the people who reply to these polls have very strong feelings on the topic.

Bias: The design of a study shows bias if it would consistently underestimate or consistently overestimate the value you want to know.

How to Sample Well: Simple Random Sampling

Random Sampling: Using a chance process to determine which members of a population are included in the sample.
Simple Random Sample (SRS): A sample of a given size chosen so that every member of the population, and every possible group of that size, has an equal chance of being chosen.
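To see how a convenience sample can mislead where a simple random sample does not, here is a small simulation of the homework example. Everything about the population is made up for illustration: the 200 "library" students, the 800 other students, and the ranges of weekly homework hours are assumptions, not data from the textbook.

```python
import random
from statistics import mean


def build_population(rng, n_library=200, n_other=800):
    """Hypothetical weekly homework hours; all numbers are invented."""
    # Assumption: students found in the library study more on average.
    library = [rng.uniform(8, 16) for _ in range(n_library)]
    other = [rng.uniform(2, 10) for _ in range(n_other)]
    return library, other


def compare_methods(seed=0, sample_size=30):
    rng = random.Random(seed)
    library, other = build_population(rng)
    everyone = library + other
    true_mean = mean(everyone)

    # Convenience sample: the first 30 students we meet in the library.
    convenience_mean = mean(library[:sample_size])

    # Simple random sample: chance alone decides who is included.
    srs_mean = mean(rng.sample(everyone, sample_size))
    return true_mean, convenience_mean, srs_mean


true_mean, convenience_mean, srs_mean = compare_methods()
print(f"population mean:         {true_mean:.2f} hours")
print(f"convenience sample mean: {convenience_mean:.2f} hours")
print(f"SRS mean:                {srs_mean:.2f} hours")
```

Under these assumptions the convenience sample overestimates the population mean every time it is taken, which is what bias means. The SRS estimate varies from run to run but is not pushed in one direction.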

HOW TO: Generally, we number each member of the population and use a random number generator to choose the individuals in our sample.
Tools: a random number table or a calculator. In either case, decide what to do with repeated numbers and with numbers that are out of range.

Other Random Sampling Methods

Sometimes populations are too large and/or complex for a Simple Random Sample to reliably produce a representative sample.

A Stratified Random Sample starts by classifying the population into groups of similar individuals (e.g., high school grades, election districts, income levels), called strata, and then chooses an SRS from each stratum.

Example: A British farmer grows sunflowers for making sunflower oil in a field laid out in 10 rows and 10 columns. Irrigation ditches run along the top and the bottom of the field. The farmer would like to estimate the number of healthy plants in the field. It would take too much time to count all the plants. How and why should she stratify her field before taking an SRS?

Other Sampling Methods

Both Simple Random Samples and Stratified Random Samples are hard to use when populations are large and/or spread out over a wide area. A helpful method in this case is Cluster Sampling.
Start by classifying the population into groups of individuals that are near each other (in the same box, neighborhood, classroom, etc.), called clusters.
Choose an SRS of clusters.
Then survey ALL individuals in each chosen cluster.

MAKE SURE you understand the difference between strata and clusters.

How do you do it?

The student council wants to conduct a survey during the first five minutes of an all-school assembly. They would like to announce the results at the end of the assembly. There are 800 students present at the assembly.
12th grade is seated in seats 1-200
11th grade is seated in seats 201-400
10th grade is seated in seats 401-600

9th grade is seated in seats 601-800

How would you conduct an SRS?
How would you do a stratified random sample?
How would you do a cluster sample?
Which is best in this case?

Inference for Sampling

The purpose of a sample is to give us information about a larger population. The process of drawing conclusions about a population based on a sample is called inference. Inference from biased samples (like convenience or voluntary response samples) would be misleading.
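Returning to the assembly exercise, the three sampling plans could be sketched as follows. This is only an illustration: the total sample size of 80, the 20 students per grade, and the idea that a cluster is a run of 20 adjacent seats (for example, one row) are assumptions, not part of the exercise.

```python
import random

SEATS = range(1, 801)

# Seat blocks by grade, as given in the exercise.
GRADE_BLOCKS = {
    12: range(1, 201),
    11: range(201, 401),
    10: range(401, 601),
    9: range(601, 801),
}


def srs(rng, n):
    """Simple random sample: n seats chosen from all 800."""
    return sorted(rng.sample(SEATS, n))


def stratified(rng, per_grade):
    """Stratified random sample: a separate SRS within each grade."""
    return {
        grade: sorted(rng.sample(block, per_grade))
        for grade, block in GRADE_BLOCKS.items()
    }


def cluster(rng, n_clusters, cluster_size=20):
    """Cluster sample: an SRS of seat runs, then every seat in each run.

    Assumption: a cluster is cluster_size adjacent seats, such as a row.
    """
    starts = list(range(1, 801, cluster_size))
    chosen = sorted(rng.sample(starts, n_clusters))
    return [list(range(start, start + cluster_size)) for start in chosen]


rng = random.Random(0)
srs_seats = srs(rng, 80)
strata_seats = stratified(rng, 20)
cluster_seats = cluster(rng, 4)

print("SRS:", srs_seats[:10], "...")
print("Stratified:", {g: s[:3] for g, s in strata_seats.items()})
print("Cluster:", [(c[0], c[-1]) for c in cluster_seats])
```

All three plans survey 80 students. The SRS and the stratified sample send surveyors to seats scattered across the hall, while the cluster sample needs only four runs of adjacent seats, which matters with a five-minute limit. Each plan lets chance decide who is surveyed, unlike the biased convenience and voluntary response methods.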

The first reason to use random sampling is to avoid this kind of bias, so that we can make inferences. It is important to remember, though, that no two samples will provide exactly the same information.

Because we use chance to select our samples, the laws of probability allow for trustworthy inference within a margin of error.

Larger random samples give more precise information about the population than smaller ones.

What can go wrong?

Even with a good sampling method, other problems can occur.

Undercoverage occurs when some members of the population cannot be chosen in a sample (lists of populations are seldom complete).

Lists of households, for example, miss homeless people, those in prison, etc.

Nonresponse occurs when an individual chosen for the sample can't be contacted or refuses to participate. Nonresponse to surveys often exceeds 50%.

Response Bias occurs when the people interviewed systematically give inaccurate answers.

The wording of questions is the most important influence on the answers to a sample survey. Even the order in which questions are asked matters.

NEVER trust the results of a survey until you've read the EXACT questions asked.