
Data sets shared by StatCrunch members
Showing 1 to 15 of 141 data sets matching "distribution"
Data Set/Description | Owner | Last edited | Size | Views
Mint years of 800 pennies from a local bank, sampled in 2011 (which is why the frequency for 2011 is low).
Owner: anderson_instructor | Last edited: Oct 29, 2018 | Size: 4KB | Views: 1493
For the survey that produced it, see … and the inputs of all teammates. Toward the end, some validation was done: data were deleted where the hours worked were less than one work day, or where they were outliers relative to the legally admissible number of work days. Finally, arbitrarily long chains (M/F, degrees, etc.) that were less likely to be encountered in draws of simulated data were discarded. A total of 12 observations were thus thrown out. All credit goes to Team 3, the instructor, our unnamed friends in the nursing profession who enthusiastically did a last-minute push through their extended social media groups for data, and the respondents who kindly took time out for the survey.
Another thought concerns the distribution of hours worked. Even if random, it "should be" "centered on" a certain number of hours per day × number of days, with deviations from the center penalized when picking a sample. For example, the value 38 appears many times without an explainable reason (we are talking about the distribution of work in a sample of nursing staff), and so do the "primes" 47, 37, and 29. This is not to argue that they "shouldn't occur", but there has to be some reason for their being so prominent. At this stage, we may conclude that most of the respondents may not have been in full-time nursing employment in the strict sense of the term. If 42, 48, 72, 60, 50, and 40 appeared more often, the data would show less variation but more regularity. Since we haven't tried stratification, we do not know "how often they should occur", so we do not re-draw observations.
Owner: ugoagwu | Last edited: Jun 14, 2014 | Size: 2KB | Views: 1127
2006 US Household Income
This data was simulated from a lognormal distribution based on computations from a data summary produced by the Census Bureau.
Owner: craig_slinkman | Last edited: Aug 21, 2010 | Size: 35KB | Views: 1206
AAPL Stock Prices 2/8/12 - 12/31/12
Apple stock prices from 2/8/12 to 12/31/12. Fairly normal distribution.
Owner: burnsbeth | Last edited: Jun 10, 2014 | Size: 11KB | Views: 16896
AAPL Stock Prices 2/8/12 - 12/31/12
Apple stock prices from 2/8/12 to 12/31/12. Fairly normal distribution.
Owner: math1150 | Last edited: Aug 2, 2016 | Size: 11KB | Views: 2154
Students’ heights
Create a histogram for students’ heights. You may need to experiment with different bin widths (e.g., 3, 5). Comment on the shape of the distribution.
Owner: smcdaniel04 | Last edited: Jan 23, 2012 | Size: 239B | Views: 1042
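Experimenting with bin widths amounts to counting how many values fall in each interval of a chosen width. A minimal Python sketch, assuming bins of the form [lo, lo + width) whose first edge is the largest multiple of the width at or below the minimum; the `bin_counts` helper and the heights are hypothetical, not the shared data set.

```python
import math
from collections import Counter

def bin_counts(values, width):
    """Count values in consecutive bins [lo, lo + width), including empty bins."""
    # Assumption: the first bin starts at the largest multiple of `width` <= min(values).
    start = math.floor(min(values) / width) * width
    counts = Counter(math.floor((v - start) / width) for v in values)
    return {start + k * width: counts[k] for k in range(max(counts) + 1)}

# Hypothetical heights in inches, NOT the shared data set.
heights = [62, 64, 65, 66, 66, 67, 68, 68, 69, 70, 71, 72, 74]

for width in (3, 5):
    print(f"bin width {width}:")
    for lo, n in bin_counts(heights, width).items():
        print(f"  [{lo}, {lo + width}): {'*' * n}")
```

Running both widths on the same values is a quick way to see how much the apparent shape depends on the bin choice.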
Responses to "How much should you spend on a wedding?"
Respondents provided the amount they thought was reasonable to spend on a wedding, their gender, whether or not they have had a wedding, whether or not they were currently planning a wedding, and their age. The data set is full of outliers, with 60 of the 1424 respondents providing amounts over 100,000. Try using a Where expression of Amount <= 100000 to remove these extreme observations. The trimmed distribution still has a number of interesting features. Focus on the extremes by using a Where expression of Amount > 100000, and you will see that males, somewhat surprisingly, make up the majority of this group.
Owner: scsurvey | Last edited: May 21, 2014 | Size: 29KB | Views: 1086
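Outside StatCrunch, the two Where expressions above correspond to plain row filters. A minimal Python sketch, assuming each response is an (amount, gender) pair; the rows below are hypothetical, not the survey data, so the counts illustrate the mechanics only.

```python
from collections import Counter

# Hypothetical (amount, gender) responses, NOT the actual survey data.
responses = [
    (15000, "Female"), (25000, "Male"), (8000, "Female"), (30000, "Male"),
    (150000, "Male"), (250000, "Male"), (120000, "Female"), (1000000, "Male"),
]

# Analogue of the Where expression "Amount <= 100000": drop extreme amounts.
trimmed = [(amount, gender) for amount, gender in responses if amount <= 100000]

# Analogue of "Amount > 100000": keep only the extremes, then tally by gender.
extremes = [(amount, gender) for amount, gender in responses if amount > 100000]
by_gender = Counter(gender for _, gender in extremes)

print(len(trimmed), len(extremes))   # 4 4
print(by_gender.most_common())       # [('Male', 3), ('Female', 1)]
```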
Breast Cancer
Datafile Name: Breast Cancer
Datafile Subjects: Health, Medical
Story Names: Breast cancer
Reference: A.J. Lea. (1965). New Observations on Distribution of Neoplasms of Female Breast in Certain Countries. British Medical Journal, 1, 488-490.
Text Citation: Velleman, P. F. and Hoaglin, D. C. (1981). Applications, Basics, and Computing of Exploratory Data Analysis. Belmont, CA: Wadsworth, Inc., pp. 127-134.
Authorization: free use
Description: The data contain the mean annual temperature (in degrees F) and the Mortality Index for neoplasms of the female breast. Data were taken from certain regions of Great Britain, Norway, and Sweden.
Number of cases: 16
Variable Names:
Mortality: Mortality index for neoplasms of the female breast
Temperature: Mean annual temperature (in degrees F)
In the early 1960s, data were collected from official statistics registers of Great Britain, Norway and Sweden on breast cancer mortality. Death rates for neoplasms of the breast were calculated for various age groups and for certain areas at the same latitude. Age-specific death rates were then calculated for each area and converted to a mortality index using 100 as the age-specific rate for all of England and Wales. The mean annual temperatures at the various latitudes under study were obtained from the British Meteorological Office.
Owner: phil_larson | Last edited: Dec 2, 2015 | Size: 187B | Views: 2311
Intro Stats GPA
Grade point averages for 24 students in an intro stats class. Create a histogram for GPA. (If not using StatCrunch, you may want to create a frequency distribution first.) Note: You may need to try different bin widths (e.g., 0.5, 1). Is the data continuous or discrete? Comment on the shape of the distribution.
Owner: smcdaniel04 | Last edited: Jan 23, 2012 | Size: 178B | Views: 1790
2006 Simulated Household Income.xlsx
This data file contains 5,000 simulated household incomes, assuming that household income is lognormally distributed. The mean and standard deviation of the logged data (base 10) were estimated from the U.S. Census Bureau’s report Table HINC-06, entitled Income Distribution to $250,000 or More for Households: 2006. This can be found at the following web site: … This data was simulated because I could not find the original data on the web.
Owner: craig_slinkman | Last edited: Mar 24, 2010 | Size: 40KB | Views: 604
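The simulation described above can be sketched in Python, assuming it worked by drawing the base-10 logarithm of each income from a normal distribution and then exponentiating. The parameter values 4.7 and 0.35 and the `simulate_incomes` helper are hypothetical placeholders; the owner's estimates from Table HINC-06 are not given in the description.

```python
import math
import random
import statistics

def simulate_incomes(n, mean_log10, sd_log10, seed=None):
    """Draw n incomes whose base-10 logarithms are normally distributed."""
    rng = random.Random(seed)
    return [10 ** rng.gauss(mean_log10, sd_log10) for _ in range(n)]

# Hypothetical parameters, NOT the values estimated from Table HINC-06.
incomes = simulate_incomes(5000, mean_log10=4.7, sd_log10=0.35, seed=1)

# Taking log10 again should roughly recover the chosen parameters.
logs = [math.log10(x) for x in incomes]
print(round(statistics.mean(logs), 2), round(statistics.stdev(logs), 2))

# On the income scale the distribution is right-skewed, so the mean exceeds the median.
print(statistics.mean(incomes) > statistics.median(incomes))
```

Python's `random.lognormvariate` works with natural logarithms, so base-10 parameters would have to be multiplied by ln 10 before using it instead of `gauss`.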
Number of songs on an iPod
Students in my class answered the question, "How many songs are on your iPod?" The distribution is skewed right. We generated 50 means from samples of size 30 that were randomly selected from this data.
Owner: lpyott | Last edited: Mar 28, 2012 | Size: 2KB | Views: 768
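Generating many sample means from the same data is how an empirical sampling distribution is built. A minimal Python sketch, assuming each sample of 30 is drawn without replacement; the `sample_means` helper and the right-skewed song counts are hypothetical stand-ins for the class responses.

```python
import random
import statistics

def sample_means(data, n_samples=50, sample_size=30, seed=None):
    """Means of `n_samples` random samples of `sample_size` values drawn without replacement."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(data, sample_size)) for _ in range(n_samples)]

# Hypothetical right-skewed song counts, NOT the class responses.
gen = random.Random(0)
songs = [int(gen.expovariate(1 / 800)) for _ in range(120)]

means = sample_means(songs, seed=42)
print(f"data:  mean {statistics.mean(songs):.1f}, sd {statistics.stdev(songs):.1f}")
print(f"means: mean {statistics.mean(means):.1f}, sd {statistics.stdev(means):.1f}")
```

The 50 means should be far less spread out than the raw counts, and a histogram of them usually looks more symmetric than the right-skewed data.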
Intro Stats: Qualitative Data (1-4)
1. Create a pie chart and a bar chart for the number of males and females.
2. Construct a relative frequency distribution for students’ classification.
3. What percentage of students are Freshmen?
4. Construct a frequency bar graph and a relative frequency bar graph for students’ classification. Compare the two graphs. What do you notice?
Owner: smcdaniel04 | Last edited: Jan 23, 2012 | Size: 937B | Views: 608
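A relative frequency distribution divides each category's count by the total number of observations. A minimal Python sketch; the `relative_frequencies` helper and the class roster are hypothetical, not the shared data set.

```python
from collections import Counter

def relative_frequencies(labels):
    """Map each category to (count, proportion of all observations)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cat: (n, n / total) for cat, n in counts.items()}

# Hypothetical class roster, NOT the shared data set.
classification = ["Freshman"] * 8 + ["Sophomore"] * 6 + ["Junior"] * 4 + ["Senior"] * 2

table = relative_frequencies(classification)
for cat, (n, p) in table.items():
    print(f"{cat:<10} {n:>3} {p:6.1%}")
```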
Distribution of US Population
This dataset is based on the distribution of the US population. A total of 1500 households were randomly selected, and the region each household is from is recorded in the Observed Count column. The breakdown by region is then compared to the Expected Count column, which is based on the percentages for each region in the year 2000 (19.0% for Northeast, 22.9% for Midwest, 35.6% for South, and 22.5% for West). This data set comes from "Statistics: Informed Decisions Using Data" by Michael Sullivan.
Owner: statcrunchhelp | Last edited: Sep 16, 2014 | Size: 140B | Views: 530
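Each expected count is the sample size times the region's percentage, and one common way to compare observed with expected counts is the chi-square goodness-of-fit statistic (an assumption; the description does not name the test). A minimal Python sketch using the percentages listed above; the observed counts are hypothetical, not the Observed Count column.

```python
# Region percentages for 2000, as listed in the description.
PERCENT_2000 = {"Northeast": 19.0, "Midwest": 22.9, "South": 35.6, "West": 22.5}

def expected_counts(n, percents):
    """Expected count for each category = n * (percent / 100)."""
    return {region: n * pct / 100 for region, pct in percents.items()}

def chi_square_stat(observed, expected):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over all categories."""
    return sum((observed[k] - expected[k]) ** 2 / expected[k] for k in expected)

expected = expected_counts(1500, PERCENT_2000)

# Hypothetical observed counts (sum to 1500), NOT the Observed Count column.
observed = {"Northeast": 270, "Midwest": 350, "South": 545, "West": 335}
stat = chi_square_stat(observed, expected)
print(expected)
print(round(stat, 3))
```

With four regions, the statistic would be compared with a chi-square distribution with 4 - 1 = 3 degrees of freedom.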
Bottles of Alcohol Solution (Fluid Ounce Quality Control)
Data were simulated and represent 25 bottles of an alcohol solution randomly selected from a manufacturer's distribution center. Each value represents the number of fluid ounces of alcohol in a randomly selected container. The manufacturer claims the amount of alcohol is normally distributed with a population mean of 12 fl oz and a standard deviation of 0.2 fl oz.
Owner: dlozimek | Last edited: Mar 8, 2018 | Size: 220B | Views: 474
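Because the claimed population standard deviation is given, the claim about the mean can be checked with a one-sample z-test, whose standard error is 0.2 / √25 = 0.04 fl oz. A minimal Python sketch, assuming a two-sided test; the `z_test_mean` helper and the sample mean of 11.93 are hypothetical, not computed from the shared data.

```python
import math
from statistics import NormalDist

MU0, SIGMA, N = 12.0, 0.2, 25  # manufacturer's claim and sample size from the description

def z_test_mean(sample_mean, mu0=MU0, sigma=SIGMA, n=N):
    """Two-sided one-sample z-test for a mean when sigma is treated as known."""
    se = sigma / math.sqrt(n)            # 0.2 / 5 = 0.04 fl oz
    z = (sample_mean - mu0) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical sample mean, NOT computed from the shared data.
z, p = z_test_mean(11.93)
print(round(z, 2), round(p, 4))
```

For this hypothetical sample mean, z = (11.93 - 12) / 0.04 = -1.75.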
UTA Cola
This data set consists of 10,000 observations. The variable of interest is the actual number of fluid ounces in a 16 ounce bottle of UTA Cola. The population mean is 16 and the standard deviation is 0.1. This data is useful for demonstrating the concept of random sampling, sampling distributions, confidence intervals, and hypothesis tests.
Owner: craig_slinkman | Last edited: Mar 31, 2011 | Size: 137KB | Views: 238
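The random-sampling and confidence-interval demonstration can be sketched in Python, assuming a z-interval with the known σ = 0.1 and repeated samples of 40 bottles. The simulated population, the sample size of 40, and the `z_interval` helper are hypothetical stand-ins for the real 10,000 values.

```python
import math
import random
from statistics import NormalDist, mean

def z_interval(sample, sigma, confidence=0.95):
    """Confidence interval for the mean when the population sigma is known."""
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    half_width = z_star * sigma / math.sqrt(len(sample))
    m = mean(sample)
    return m - half_width, m + half_width

# Hypothetical stand-in for the 10,000 bottles (the real values are in the data set).
rng = random.Random(7)
population = [rng.gauss(16, 0.1) for _ in range(10_000)]
mu = mean(population)   # close to, but not exactly, 16 for the simulated stand-in

# Repeatedly sample 40 bottles and count how often the 95% interval captures mu.
hits = 0
for _ in range(1000):
    lo, hi = z_interval(rng.sample(population, 40), sigma=0.1)
    if lo <= mu <= hi:
        hits += 1
print(hits / 1000)   # should be near 0.95
```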

