Data Set/Description  Owner  Last edited  Size  Views
US Counties and Presidential Voting Dataset
Sampling Unit county
3141 observations and 19 variables, maximum # NAs:2956
Name
county  County
state  State
msa  Metropolitan Statistical Area
pmsa  Primary Metropolitan Statistical Area
pop.density  1992 pop per 1990 miles^2
pop  1990 population
pop.change  Percent population change 1980-1992
age6574  Percent age 65-74, 1990
age75  Percent age >= 75, 1990
crime  serious crimes per 100,000 1991
college  Percent with bachelor's degree or higher of those age >= 25
income  median family income, 1989 dollars
farm  farm population, % of total, 1990
democrat  Percent votes cast for democratic president
republican  Percent votes cast for republican president
Perot  Percent votes cast for Ross Perot
white  Percent white, 1990
black  Percent black, 1990
turnout  1992 votes for president / 1990 pop x 100  craig_slinkman  Apr 12, 2011  755KB  2469 
Woodbury Sampling
76 student responses to ...
1) Are you a smoker?
2) Do you own an iPhone?
3) How much did you spend on books and supplies for your courses this semester?  georgew49  Aug 15, 2012  1.024B  2029 
Violent Crimes by State
http://www.census.gov/statab/ranks/rank21.html
State Rankings  Statistical Abstract of the United States
VIOLENT CRIMES PER 100,000 POPULATION, 2006
[When states share the same rank, the next lower rank is omitted. Because of rounded data, states may have identical values shown, but different ranks.]
Cautionary note about rankings
The ranks in some tables are based on estimates derived from a sample(s). Because of sampling and nonsampling errors associated with the estimates, the ranking of the estimates does not necessarily reflect the correct ranking of the unknown true values. Thus, caution should be used when making inferences or statements about the states' true values based on a ranking of the estimates. As an example, the estimated total (average, percent, ratio, etc.) for State A may be larger than the estimates for all other states. This does not necessarily mean that the true total (average, percent, ratio, etc.) for State A is larger than those for all other states. Such an inference typically depends on, among other factors, the size of the difference(s) between the estimates in question and the size of their associated standard errors.
In other tables, the ranks are based on a complete enumeration of the target population, or on complete administrative reporting from the population. In such cases, sampling is not used, and there is no sampling error component in the estimates. Even so, care should be taken when making inferences or statements based on the rankings. The table values may still exhibit nonsampling error originating from such sources as coverage problems (missing units or duplicates), nonresponse, misreporting, and others.
Last Revised: September 27, 2011 at 09:43:17 AM  phil_larson  Jan 16, 2013  881B  3469 
Annual Movie Data 2008 Random Sampling.txt
This data is a random sampling of movies that played in theaters in 2008. It includes movies released in previous years that earned money during 2008. For example, a movie released over Thanksgiving in 2007 will most likely earn money in 2007 and 2008.
Each box office year ends on the first Sunday of the following year. The next year starts the following day (Monday). For example, the "2004 box office year" ended on Sunday, January 2, 2005.
Inflation-adjusted figures are based on ticket sale estimates, and may not be precise due to rounding errors.  wikipeterson  Oct 7, 2009  8KB  512 
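The box office year rule described above can be sketched in a few lines of Python. This is only an illustration of the stated rule (the year ends on the first Sunday of the following year, and the next year starts on the Monday after); the function names are hypothetical and are not part of the data set.

```python
from datetime import date, timedelta


def box_office_year_end(year: int) -> date:
    """Return the first Sunday of the calendar year after `year`."""
    jan1 = date(year + 1, 1, 1)
    # date.weekday() numbers Monday as 0 and Sunday as 6.
    days_until_sunday = (6 - jan1.weekday()) % 7
    return jan1 + timedelta(days=days_until_sunday)


def box_office_year_start(year: int) -> date:
    """Return the Monday after the previous box office year ended."""
    return box_office_year_end(year - 1) + timedelta(days=1)


# The description's own example: the 2004 box office year
# ended on Sunday, January 2, 2005.
print(box_office_year_end(2004))
print(box_office_year_start(2008), box_office_year_end(2008))
```

Under this rule a movie's 2008 earnings cover every day from the printed start date through the printed end date, which is why a late-2007 release can appear in the 2008 sample.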
Arlington Gasoline Retailers Sampling Frame.xls
This is a sampling frame of all gasoline retailers in Arlington, Texas, collected in the Spring semester of 2010.
Note that you may need to drag the column lines in order to see the entire data fields.  craig_slinkman  Apr 8, 2010  22KB  491 
Annual Movie Data 2008 Random Sampling.txt
This chart ranks movies by the amount they earned during 2008.  wikipeterson  Oct 14, 2009  6KB  436 
Maria's Data Analysis
Maria's Classroom data to be used for the Sampling Variability Project  ninibb1  Jun 14, 2016  2KB  369 
UTA Cola
This data set consists of 10,000 observations. The variable of interest is the actual number of fluid ounces in a 16 ounce bottle of UTA Cola. The population mean is 16 and the standard deviation is 0.1.
This data is useful for demonstrating the concept of random sampling, sampling distributions, confidence intervals, and hypothesis tests.
 craig_slinkman  Mar 31, 2011  137KB  238 
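A minimal sketch of how a data set like this can demonstrate a sampling distribution. The actual file is not reproduced here, so the code simulates a stand-in population of 10,000 fill volumes; the normal shape, the sample size of 30, and the 2,000 repetitions are assumptions chosen for illustration, not properties stated by the data set.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Stand-in population (assumption: normally distributed fills,
# mean 16 oz, standard deviation 0.1 oz, 10,000 bottles).
population = [random.gauss(16.0, 0.1) for _ in range(10_000)]


def sample_means(pop, n, reps):
    """Draw `reps` simple random samples of size n and return their means."""
    return [statistics.mean(random.sample(pop, n)) for _ in range(reps)]


means = sample_means(population, n=30, reps=2000)

center = statistics.mean(means)   # should sit close to 16
spread = statistics.stdev(means)  # empirical standard error
theory = 0.1 / 30 ** 0.5          # sigma / sqrt(n)

print(f"mean of sample means: {center:.3f}")
print(f"sd of sample means:   {spread:.4f}")
print(f"sigma / sqrt(n):      {theory:.4f}")
```

The spread of the sample means should land near sigma / sqrt(n), which is the quantity used to build confidence intervals and test statistics for the mean.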
Dividing City into blocks
numbers shown represent house address numbers  niarah.brown0116  Jun 1, 2019  871B  43 
Sampling Senators 115th
Name, State, Affiliation, & Age of members of the 115th Congress  pmontegary  Mar 27, 2019  3KB  288 
Simple Random Sample of n=30
Using the Age column, take a Simple Random Sample (SRS) of n=30. Show the sample and explain how you took the sample.
Use StatCrunch sampling, Excel Analysis ToolPak sampling, or the Excel function RANDBETWEEN (using row numbers). Schemes such as taking the first 30 rows or every 5th row are NOT random.  lethamfrancis  Mar 31, 2014  631B  404 
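For readers working outside StatCrunch or Excel, the same idea (pick 30 row numbers at random, without replacement) can be sketched in Python. The `ages` list below is placeholder data standing in for the Age column, and `simple_random_sample` is a hypothetical helper name; neither comes from the data set itself.

```python
import random


def simple_random_sample(rows, n, seed=None):
    """Pick n distinct row positions at random; return (row number, value) pairs."""
    rng = random.Random(seed)
    # sample() draws without replacement, so every set of n rows
    # is equally likely and no row can be chosen twice.
    picked = rng.sample(range(len(rows)), n)
    return [(i + 1, rows[i]) for i in sorted(picked)]  # 1-based row numbers


# Placeholder Age column (assumption: 200 rows, ages 18 to 65).
filler = random.Random(0)
ages = [filler.randint(18, 65) for _ in range(200)]

sample = simple_random_sample(ages, n=30, seed=42)
for row_number, age in sample:
    print(row_number, age)
```

Recording the seed and the selected row numbers is one way to "explain how you took the sample", since anyone can rerun the draw and obtain the same 30 rows.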
BSTAT 3321 Final Averages
Used to demonstrate random sampling and the concept of sampling error.  craig_slinkman  Mar 19, 2011  270B  144 
Have COC Students Decided Their Major?
Data collected by method of convenience sampling.  aldusdean  Oct 22, 2017  289B  68 
Deal
This table lists the number of wins from playing the Let's Make a Deal applet 50 times with the strategy "stay with door 1 no matter what." These data were generated during Lab 2 (i.e., 1 entry per group) in STAT 215 at WVU and will be reused later in this course to illustrate sampling distribution concepts.  kjryan  Sep 6, 2017  110B  50 
Wolf River Pollution
Jaffe, Parker and Wilson (1982) have investigated the concentration of several hydrophobic organic substances (such as hexachlorobenzene, chlordane, heptachlor, aldrin, dieldrin, endrin) in the Wolf River in Tennessee. Measurements were taken downstream of an abandoned dump site that had previously been used by the pesticide industry to dispose of its waste products.
It was expected that these hydrophobic substances might have a nonhomogeneous vertical distribution in the river because of differences in density between these compounds and water and because of the adsorption of these compounds on sediments, which could lead to higher concentrations on the bottom. It is important to check this hypothesis because the standard procedure of sampling at six-tenths of the depth could miss the bulk of these pollutants if the distribution were not uniform.
Grab samples were taken with a La Motte-Vandorn water sampler of 1 litre capacity at various depths of the river. This sampler consists of a horizontal plexiglas tube of 7 centimetres diameter and a plunger on each side which shuts the sampler when the sampler is at the desired depth. Ten surface, 10 mid-depth and 10 bottom samples were collected, all within a relatively short period. Until they were analysed, the samples were stored in 1-quart mason jars at low temperature.
In the analysis of the samples, a 250-millilitre water sample was taken from each mason jar and was extracted with 1 millilitre of either hexanes or petroleum ether. A sample of the extract was then injected into a gas chromatograph and the output was compared against standards of known concentrations. The test procedure was repeated two more times, injecting different samples of the extract in the gas chromatograph. The average aldrin and hexachlorobenzene (HCB) concentrations (in nanograms per litre) in these 30 samples are given in the data.  jmanthey  Apr 17, 2014  569B  444 
