
Data sets shared by StatCrunch members
Showing 1 to 15 of 141 data sets matching distribution
Data Set/Description  Owner  Last edited  Size  Views
PENNYAGESn800.XLS
Year of 800 pennies from a local bank, sampled in 2011 (which is why frequency for 2011 is low).  anderson_instructor  Oct 29, 2018  4KB  1471 
RegisteredNursesSurvey.xlsx
For the survey that produced this data, see http://www.statcrunch.com/5.0/survey.php?surveyid=8178&code=YINVQ and the inputs of all teammates.
Towards the end, some validation was done: observations were deleted where working hours were less than one work day or were outliers relative to legally admissible work days. Finally, arbitrarily long chains that were unlikely to be encountered in draws of simulated data (M/F, Degrees, etc.) were discarded. A total of 12 observations were thus thrown out.
All credit goes to Team 3, the Instructor, our unnamed friends in the nursing profession who enthusiastically made a last-minute push for data through their extended social media groups, and the respondents who kindly took time for the survey.
Another thought concerns the distribution of hours worked.
Even if random, it "should be" "centered on" certain hours a day * number of days, with deviations from the centre penalised when picking a sample.
The value 38 appears many times, for example, without an explainable reason (we are talking about the distribution of work among a sample of nursing staff).
So do the "primes" 47, 37, 29.
This is not to argue that they "shouldn't occur", but there has to be some reason for their being so significant/vibrant.
At this stage we may conclude that most of the respondents may not have been in full-time nursing employment in the strict sense of the term. If 42, 48, 72, 60, 50 and 40 appeared more often, the data would show less variation and more regularity. Since we haven't tried stratification, we do not know "how often they should occur". We thus do not redraw observations.  ugoagwu  Jun 14, 2014  2KB  1094 
2006 US Household Income
This data was simulated from a lognormal distribution based on computations from a data summarization done by the Census Bureau.  craig_slinkman  Aug 21, 2010  35KB  1168 
AAPL Stock Prices 2/8/12 - 12/31/12
Apple stock prices from 2/8/12 to 12/31/12. Fairly normal distribution.  burnsbeth  Jun 10, 2014  11KB  15984 
AAPL Stock Prices 2/8/12 - 12/31/12
Apple stock prices from 2/8/12 to 12/31/12. Fairly normal distribution.  math1150  Aug 2, 2016  11KB  1968 
Students heights
Create a histogram for students’ heights. You may need to experiment with different bin widths (e.g. 3, 5). Comment on the shape of the distribution.  smcdaniel04  Jan 23, 2012  239B  1013 
Responses to How much should you spend on a wedding?
Respondents provided the amount they thought was reasonable to spend on a wedding, their gender, whether or not they have had a wedding, whether or not they were currently planning a wedding and their age. The data set is full of outliers with 60 of the 1424 respondents providing amounts over 100,000. Try using a Where expression of Amount <= 100000 to remove these extreme observations. The trimmed distribution still has a number of interesting features. Focus on these extremes by using a Where expression of Amount > 100000 and you will see that Males somewhat surprisingly make up the majority of this group.  scsurvey  May 21, 2014  29KB  1068 
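The two Where expressions above amount to splitting the rows on a threshold. A minimal sketch of the same idea in Python follows; the rows and the `where` helper are hypothetical stand-ins for illustration, not the actual survey responses or StatCrunch's own implementation.

```python
# Hypothetical rows standing in for the survey data (the real set has 1424 respondents).
rows = [
    {"Amount": 5000, "Gender": "Female"},
    {"Amount": 250000, "Gender": "Male"},
    {"Amount": 20000, "Gender": "Male"},
    {"Amount": 100000, "Gender": "Female"},
    {"Amount": 1000000, "Gender": "Male"},
]


def where(rows, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


# Amount <= 100000: the trimmed data without the extreme observations.
trimmed = where(rows, lambda r: r["Amount"] <= 100000)

# Amount > 100000: only the extreme observations.
extremes = where(rows, lambda r: r["Amount"] > 100000)

print(len(trimmed), "rows kept;", len(extremes), "extreme rows")
```

Note that the two conditions are complementary, so every row lands in exactly one of the two groups.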
Breast Cancer
Datafile Name: Breast Cancer
Datafile Subjects: Health , Medical
Story Names: Breast cancer
Reference: A.J. Lea. (1965). New Observations on Distribution of Neoplasms of Female Breast in Certain Countries. British Medical Journal, 1, 488-490.
Text Citation: Velleman, P. F. and Hoaglin, D. C. (1981). Applications, Basics, and Computing of Exploratory Data Analysis. Belmont, CA: Wadsworth, Inc., pp. 127-134.
Authorization: free use
Description: Data contains the mean annual temperature (in degrees F)
and Mortality Index for neoplasms of the female breast.
Data were taken from certain regions of Great Britain, Norway, and Sweden.
Number of cases: 16
Variable Names:
Mortality: Mortality index for neoplasms of the female breast
Temperature: Mean annual temperature (in degrees F)
In the early 1960s, data were collected from official statistics registers of Great Britain, Norway and Sweden on breast cancer mortality. Death rates for neoplasms of the breast were calculated for various age groups and for certain areas at the same latitude. Age-specific death rates were then calculated for each area and converted to a mortality index using 100 as the age-specific rate for all of England and Wales. The mean annual temperatures at various latitudes under study were obtained from the British Meteorological Office.  phil_larson  Dec 2, 2015  187B  2287 
Intro Stats GPA
Grade point average for 24 students in intro stats class.
Create a histogram for GPA (If not using StatCrunch, you may want to create a frequency distribution first.) Note: May need to try different bin widths (e.g. 0.5, 1). Is the data continuous or discrete? Comment on the shape of the distribution.  smcdaniel04  Jan 23, 2012  178B  1757 
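For readers not using StatCrunch, the frequency distribution mentioned above can be sketched in a few lines of Python. The GPA values below are hypothetical (the 24 actual values are not shown on this page), and the `frequency_distribution` helper is an illustrative name, not part of any library.

```python
import math
from collections import Counter


def frequency_distribution(values, bin_width, start=0.0):
    """Count values per bin [lower, lower + bin_width), keyed by the lower edge."""
    counts = Counter(math.floor((v - start) / bin_width) for v in values)
    return {start + k * bin_width: counts[k] for k in sorted(counts)}


# Hypothetical GPAs, for illustration only.
gpas = [2.1, 2.4, 2.6, 3.0, 3.9]

# Try both suggested bin widths and compare the resulting shapes.
for width in (0.5, 1.0):
    print(f"bin width {width}:", frequency_distribution(gpas, width))
```

A narrower bin width shows more detail but leaves fewer observations per bin, which matters with only 24 students.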
2006 Simulated Household Income.xlsx
This data file contains 5,000 simulated household incomes, assuming that household income is lognormally distributed. The mean and standard deviation of the logged data (base 10) were estimated from the U.S. Census Bureau’s report Table HINC06 entitled Income Distribution to $250,000 or More for Households: 2006. This can be found at the following web site: http://pubdb3.census.gov/macro/032007/hhinc/new06_000.htm.
This data was simulated because I could not find the original data on the web.
 craig_slinkman  Mar 24, 2010  40KB  590 
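A simulation of this kind can be sketched as follows: draw normal values on the base-10 log scale, then raise 10 to each draw. The log-scale mean and standard deviation below are hypothetical placeholders, since the values estimated from the Census table are not given in this description, and the function name is illustrative.

```python
import math
import random
import statistics

# Hypothetical log10-scale parameters; the description does not state the estimates used.
MU_LOG10 = 4.65
SIGMA_LOG10 = 0.40


def simulate_incomes(n, mu_log10, sigma_log10, seed=0):
    """Draw n lognormal incomes whose base-10 logs are normal(mu_log10, sigma_log10)."""
    rng = random.Random(seed)
    return [10 ** rng.gauss(mu_log10, sigma_log10) for _ in range(n)]


incomes = simulate_incomes(5000, MU_LOG10, SIGMA_LOG10)

# Taking logs again should recover roughly the parameters that went in.
logged = [math.log10(x) for x in incomes]
print("mean of log10(income):", round(statistics.mean(logged), 3))
print("sd of log10(income):  ", round(statistics.stdev(logged), 3))
```

Because the distribution is skewed right, the mean of the simulated incomes sits above their median.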
Number of songs on an iPod
Students in my class answered the question, "How many songs are on your iPod?" The distribution is skewed right. We generated 50 means from samples of size 30 that were randomly selected from this data.  lpyott  Mar 28, 2012  2KB  763 
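The resampling step described here (50 means, each from a random sample of size 30) can be sketched like this. The population is a hypothetical skewed-right stand-in generated from an exponential distribution, not the class's actual iPod counts, and sampling without replacement is an assumption.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical skewed-right population standing in for the song counts.
population = [round(rng.expovariate(1 / 800)) for _ in range(200)]


def sample_means(data, n_samples, sample_size, rng):
    """Return the mean of each of n_samples random samples drawn without replacement."""
    return [statistics.mean(rng.sample(data, sample_size)) for _ in range(n_samples)]


means = sample_means(population, n_samples=50, sample_size=30, rng=rng)

print("sd of population:  ", round(statistics.stdev(population), 1))
print("sd of sample means:", round(statistics.stdev(means), 1))
```

The sample means vary much less than the individual values, and their distribution is typically closer to symmetric than the skewed population.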
Intro Stats: Qualitative Data (1-4)
1. Create a pie chart and a bar chart for the number of males and females.
2. Construct a relative frequency distribution for students’ classification.
3. What percentage of students are Freshmen?
4. Construct a frequency and relative frequency bar graph for students’ classification. Compare the graphs of these two. What do you notice?
 smcdaniel04  Jan 23, 2012  937B  597 
Distribution of US Population
This dataset is based on the distribution of the US population. 1500 households are randomly selected, and the region each one comes from is tallied in the Observed Count column. The breakdown by region is then compared to the Expected Count column, which is based on the percentages for each region in the year 2000 (19.0% for Northeast, 22.9% for Midwest, 35.6% for South, and 22.5% for West). This data set comes from "Statistics: Informed Decisions Using Data" by Michael Sullivan.  statcrunchhelp  Sep 16, 2014  140B  530 
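The Expected Count column follows directly from the stated percentages and the 1500 households, and comparing observed with expected counts is commonly done with a chi-square goodness-of-fit statistic. The sketch below uses the percentages from the description; the observed counts are hypothetical, since the actual column is not reproduced on this page.

```python
N_HOUSEHOLDS = 1500

# Year-2000 regional percentages, as given in the description.
expected_pct = {"Northeast": 19.0, "Midwest": 22.9, "South": 35.6, "West": 22.5}

# Hypothetical observed counts (they sum to 1500); not the real data.
observed = {"Northeast": 274, "Midwest": 303, "South": 564, "West": 359}

# Expected count = sample size * regional share.
expected = {region: N_HOUSEHOLDS * pct / 100 for region, pct in expected_pct.items()}

# Chi-square statistic: sum of (observed - expected)^2 / expected over the regions.
chi_sq = sum((observed[r] - expected[r]) ** 2 / expected[r] for r in expected)

# Critical value for 4 categories (3 degrees of freedom) at the 5% level.
CRITICAL_05_DF3 = 7.815

for region in expected:
    print(f"{region:<10} observed={observed[region]:>4} expected={expected[region]:.1f}")
print("chi-square statistic:", round(chi_sq, 3))
print("exceeds 5% critical value:", chi_sq > CRITICAL_05_DF3)
```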
Bottles of Alcohol Solution (Fluid Ounce Quality Control)
Data were simulated and represent 25 bottles of an alcohol solution randomly selected from a manufacturer's distribution center. Each value represents the number of fluid ounces of alcohol in a randomly selected container. The manufacturer claims the amount of alcohol is normally distributed with a population mean of 12 fl oz and a standard deviation of 0.2 fl oz.  dlozimek  Mar 8, 2018  220B  468 
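One way to check the manufacturer's claim is a two-sided z-test for the mean, treating the claimed standard deviation as known. This is only one possible analysis, sketched under that assumption; the 25 values below are freshly simulated for illustration and are not the values in the data set.

```python
import math
import random
import statistics
from statistics import NormalDist

MU0 = 12.0    # claimed population mean (fl oz), from the description
SIGMA = 0.2   # claimed population standard deviation (fl oz), from the description
N = 25        # number of bottles, from the description

# Hypothetical sample, simulated here because the actual values are not shown.
rng = random.Random(42)
bottles = [rng.gauss(MU0, SIGMA) for _ in range(N)]


def z_test(sample, mu0, sigma):
    """Two-sided z-test for the mean with known sigma; returns (z, p_value)."""
    standard_error = sigma / math.sqrt(len(sample))
    z = (statistics.mean(sample) - mu0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p = z_test(bottles, MU0, SIGMA)
print(f"z = {z:.3f}, two-sided p-value = {p:.3f}")
```

With 25 bottles the standard error is 0.2 / 5 = 0.04 fl oz, so a sample mean would need to sit roughly 0.08 fl oz from 12 to look unusual at the 5% level.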
UTA Cola
This data set consists of 10,000 observations. The variable of interest is the actual number of fluid ounces in a 16-ounce bottle of UTA Cola. The population mean is 16 and the standard deviation is 0.1.
This data is useful for demonstrating random sampling, sampling distributions, confidence intervals, and hypothesis tests.
 craig_slinkman  Mar 31, 2011  137KB  236 
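As an illustration of the confidence-interval use mentioned above, the sketch below draws a random sample and builds a z-interval for the mean using the stated standard deviation. The 10,000 values are simulated here under an assumed normal shape (the description gives only the mean and standard deviation), and the sample size of 100 is an arbitrary choice.

```python
import math
import random
import statistics
from statistics import NormalDist

POP_MEAN = 16.0   # from the description
POP_SD = 0.1      # from the description

rng = random.Random(7)

# Simulated stand-in for the 10,000 observations; normality is an assumption.
population = [rng.gauss(POP_MEAN, POP_SD) for _ in range(10_000)]


def z_interval(sample, sigma, level=0.95):
    """Confidence interval for the mean when the population sd (sigma) is known."""
    z_star = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z_star * sigma / math.sqrt(len(sample))
    center = statistics.mean(sample)
    return center - half_width, center + half_width


sample = rng.sample(population, 100)
low, high = z_interval(sample, POP_SD)
print(f"95% interval for the mean: ({low:.3f}, {high:.3f})")
```

Repeating the last three lines many times shows the sampling idea: the interval moves from sample to sample, and about 95% of such intervals should cover 16.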

