Data sets shared by StatCrunch members
Showing 1 to 15 of 21 data sets matching variation
Data Set/Description Owner Last edited Size Views
RegisteredNursesSurvey.xlsx
For the survey that produced this data, see http://www.statcrunch.com/5.0/survey.php?surveyid=8178&code=YINVQ and the inputs of all teammates. Towards the end, some validation was done: data were deleted where working hours were less than a work day or were outliers relative to legally admissible work days. Finally, arbitrarily long chains that were less likely to be encountered in draws of simulated data (M/F, degrees, etc.) were discarded. A total of 12 observations were thus thrown out. All credit goes to Team 3, the instructor, our unnamed friends in the nursing profession who enthusiastically made a last-minute push for data through their extended social media groups, and the respondents who kindly took time out for the survey. Another thought concerns the distribution of hours worked. Even if random, it "should be" "centered on" certain hours a day times the number of days, with deviations from the centre penalised when picking a sample. The value 38, for example, appears many times without an explainable reason (we are talking about the distribution of work among a sample of nursing staff). So do the "primes" 47, 37, and 29. This is not to argue that they "shouldn't occur", but there has to be some reason for their being so prominent. At this stage we may conclude that most of the respondents may not have been in full-time nursing employment in the strict sense of the term. If 42, 48, 72, 60, 50, and 40 appeared more often, the data would show less variation and more regularity. Since we haven't tried stratification, we do not know "how often they should occur". We thus do not re-draw observations.
ugoagwu Jun 14, 2014 2KB 1127
Diet
This is a small data set used to illustrate how an inappropriate independent-means test fails where a paired test on the same data succeeds. The story is that we have before and after weights for 6 customers of a weight loss clinic. Visual inspection makes it clear that the clinic is effective (except in one questionable case). Students can discuss what sources there are for the variation found in the data set and relate them to the assumptions of the independent versus paired analysis models. Application of classical techniques will produce an extremely large p-value for the independent analysis and a significant p-value for the paired analysis. To illustrate the difference with simulation techniques, first do a randomization test for two means between the before and after groups. This will spectacularly fail to show a difference when in fact there is a clear one. Then use a bootstrap to examine the 6 differences, and it becomes clear that a zero difference is highly unlikely.
david.zeitler May 19, 2011 87B 1930
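The simulation steps described for the Diet data set can be sketched roughly as follows. This is a minimal illustration, not the data set owner's procedure: the weights below are hypothetical values made up for the example (they are not the contents of the Diet data set), and the helper names `randomization_p_value` and `bootstrap_mean_ci` are likewise assumptions.

```python
import random

# Hypothetical before/after weights (lb) for 6 customers.
# Illustrative values only, NOT the contents of the Diet data set.
before = [150, 180, 210, 240, 270, 300]
after = [145, 176, 204, 241, 262, 293]
diffs = [b - a for b, a in zip(before, after)]  # weight lost per customer

rng = random.Random(0)  # fixed seed so the sketch is repeatable


def mean(values):
    return sum(values) / len(values)


def randomization_p_value(x, y, reps=10_000):
    """Two-sided randomization test for a difference in two means,
    treating x and y as independent groups (the inappropriate model here)."""
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # reassign group labels at random
        shuffled = abs(mean(pooled[:len(x)]) - mean(pooled[len(x):]))
        if shuffled >= observed:
            hits += 1
    return hits / reps


def bootstrap_mean_ci(values, reps=10_000, level=0.95):
    """Percentile bootstrap interval for the mean of the paired differences."""
    means = sorted(mean(rng.choices(values, k=len(values))) for _ in range(reps))
    lower = means[int(reps * (1 - level) / 2)]
    upper = means[int(reps * (1 + level) / 2) - 1]
    return lower, upper


p_independent = randomization_p_value(before, after)
ci_low, ci_high = bootstrap_mean_ci(diffs)
print(f"randomization p-value (independent groups): {p_independent:.3f}")
print(f"95% bootstrap interval for mean difference: ({ci_low:.2f}, {ci_high:.2f})")
```

With these made-up numbers, the customer-to-customer spread in weight swamps the small before/after change, so the randomization test on independent groups gives a large p-value, while the bootstrap interval for the paired differences stays above zero.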
Regression: Cigarettes Lung Kidney Leukemia Bladder
Joseph F. Fraumeni, Jr., "Cigarette smoking and cancers of the urinary tract: Geographic variation in the United States," Journal of the National Cancer Institute (vol. 41, no. 5, November 1968), pp. 1205-1211; table from pp. 1206-1207. Oxford University Press. Units: cigarettes sold per capita, cancer deaths per 100,000.
phil_larson Sep 22, 2013 2KB 3663
smoking data.xls
Reference: J.F. Fraumeni, "Cigarette Smoking and Cancers of the Urinary Tract: Geographic Variations in the United States," Journal of the National Cancer Institute, 41, 1205-1211. Authorization: free use. Description: The data are per capita numbers of cigarettes smoked (sold) by 43 states and the District of Columbia in 1960 together with death rates per thousand population from various forms of cancer. Number of cases: 44. Variable Names: CIG = Number of cigarettes smoked (hds per capita); BLAD = Deaths per 100K population from bladder cancer; LUNG = Deaths per 100K population from lung cancer; KID = Deaths per 100K population from kidney cancer; LEUK = Deaths per 100K population from leukemia.
wikipeterson Dec 2, 2008 1KB 1778
cigarette data 1960 Data Only.xlsx
The data are per capita numbers of cigarettes smoked (sold) by 43 states and the District of Columbia in 1960 together with death rates per thousand population from various forms of cancer. Reference: J.F. Fraumeni, "Cigarette Smoking and Cancers of the Urinary Tract: Geographic Variations in the United States," Journal of the National Cancer Institute, 41, 1205-1211.
gorlinma Jan 20, 2016 1KB 603
Analysis of Mean and Variation - How far from School is your Home
To demonstrate the role of outliers in changing statistics
jricoiii Jan 26, 2017 750B 211
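As a rough illustration of how a single outlier can change summary statistics, the sketch below compares the mean, median, and standard deviation with and without one extreme value. The distances are hypothetical numbers chosen for the example, not values from this data set.

```python
import statistics

# Hypothetical home-to-school distances in miles (illustrative values only).
distances = [1, 2, 2, 3, 4]
with_outlier = distances + [60]  # one student who lives very far away

for label, data in (("without outlier", distances), ("with outlier", with_outlier)):
    print(
        label,
        "| mean =", statistics.mean(data),
        "| median =", statistics.median(data),
        "| stdev =", round(statistics.stdev(data), 2),
    )
```

In this made-up sample the mean jumps from 2.4 to 12 miles while the median only moves from 2 to 2.5, which is the kind of contrast the data set is meant to demonstrate.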
10 highest peaks in Rockies and Appalachians
Is there more variation in the heights of the highest peaks in the Rockies (newer) or in the Appalachians (older)? Use coefficient of variation to find out!
anderson_instructor Aug 18, 2018 174B 163
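The coefficient of variation is the sample standard deviation divided by the mean, which lets groups with very different typical heights be compared on one relative scale. A minimal sketch follows; the two lists are hypothetical heights in feet invented for the example, not the peaks in this data set, and the function name `coefficient_of_variation` is an assumption.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation expressed as a fraction of the mean."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical peak heights in feet (illustrative values only).
group_a = [14400, 14300, 14250, 14200, 14150]
group_b = [6700, 6650, 6600, 6500, 6300]

cv_a = coefficient_of_variation(group_a)
cv_b = coefficient_of_variation(group_b)
print(f"CV of group A: {cv_a:.2%}")
print(f"CV of group B: {cv_b:.2%}")
```

In this made-up example group B has the larger coefficient of variation even though its heights are much lower, because its spread is larger relative to its mean.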
Smoking and Cancer
The data are per capita numbers of cigarettes smoked (sold) by 43 states and the District of Columbia in 1960 together with death rates per thousand population from various forms of cancer. Number of cases: 44. Reference: J.F. Fraumeni, "Cigarette Smoking and Cancers of the Urinary Tract: Geographic Variations in the United States," Journal of the National Cancer Institute, 41, 1205-1211. [Outlier, Regression, Residuals, Transformation, Nonlinear regression, Dummy variable]
Variable Description
CIG Number of cigarettes smoked (hds per capita)
BLAD Deaths per 100K population from bladder cancer
LUNG Deaths per 100K population from lung cancer
KID Deaths per 100K population from kidney cancer
LEUK Deaths per 100K population from leukemia
ds-231%sc Aug 11, 2008 1KB 1664
Smoking and cancer
Per capita numbers of cigarettes smoked (sold) by 43 states and the District of Columbia in 1960 and death rates per thousand population from various forms of cancer. Reference: J.F. Fraumeni, Cigarette Smoking and Cancers of the Urinary Tract: Geographic Variations in the United States, Journal of the National Cancer Institute, 41, 1205-1211.
Column Description
STATE state abbreviation
CIG Number of cigarettes smoked (hds per capita)
BLAD Deaths per 100K population from bladder cancer
LUNG Deaths per 100K population from lung cancer
KID Deaths per 100K population from kidney cancer
LEUK Deaths per 100K population from leukemia
sampleuser May 25, 2007 1KB 608
Regression: Cigarettes Lung Kidney Leukemia Bladder
Joseph F. Fraumeni, Jr., "Cigarette smoking and cancers of the urinary tract: Geographic variation in the United States," Journal of the National Cancer Institute (vol. 41, no. 5, November 1968), pp. 1205-1211; table from pp. 1206-1207. Oxford University Press. Units: cigarettes sold per capita, cancer deaths per 100,000.
lauren.bartsch Apr 9, 2015 2KB 350
Project Data
This is my project data. There are three categories of normal hair color: Blonde, Brown, and Red. Any variation of a normal hair color with an "unnatural" hair color such as pink, green, or blue is shown with a / after the natural hair color. Example: Blonde/Pink, where Pink is the "unnatural" hair color.
kschaap22 Apr 29, 2010 528B 107
Smoking and cancer
Per capita numbers of cigarettes smoked (sold) by 43 states and the District of Columbia in 1960 and death rates per thousand population from various forms of cancer. Reference: J.F. Fraumeni, Cigarette Smoking and Cancers of the Urinary Tract: Geographic Variations in the United States, Journal of the National Cancer Institute, 41, 1205-1211.
Column Description
STATE state abbreviation
CIG Number of cigarettes smoked (hds per capita)
BLAD Deaths per 100K population from bladder cancer
LUNG Deaths per 100K population from lung cancer
KID Deaths per 100K population from kidney cancer
LEUK Deaths per 100K population from leukemia
mystatcourse Aug 10, 2008 2KB 180
Exercise 17.34 Trees
Data provided by Jason Hamilton, University of Illinois. The study is reported in E. H. DeLucia et al., "Net primary production of a forest ecosystem with experimental CO2 enhancement," Science, 284 (1999), pp. 1177-1179. Note that resampling methods cannot remove the variation due to random sampling of the original data. No method for inference can be trusted with n=3. In this study, each observation is very costly, so the small n is inevitable.
bbeard Aug 11, 2008 70B 131
smoking data-NE.xls
Reference: J.F. Fraumeni, "Cigarette Smoking and Cancers of the Urinary Tract: Geographic Variations in the United States," Journal of the National Cancer Institute, 41, 1205-1211. Authorization: free use. Description: The data are per capita numbers of cigarettes smoked (sold) by 43 states and the District of Columbia in 1960 together with death rates per thousand population from various forms of cancer. Number of cases: 44. Variable Names: CIG = Number of cigarettes smoked (hds per capita); BLAD = Deaths per 100K population from bladder cancer; LUNG = Deaths per 100K population from lung cancer; KID = Deaths per 100K population from kidney cancer; LEUK = Deaths per 100K population from leukemia.
emireles Oct 3, 2016 758B 61
Exercise 4.37 Drilling
From a graph in F. S. Hu et al., "Cyclic variation and solar forcing of Holocene climate in the Alaskan subarctic," Science, 301 (2003), pp. 1890-1893.
bbeard Aug 11, 2008 177B 37
