Data sets shared by StatCrunch members
Showing 1 to 15 of 74 data sets matching linear
Data Set/Description Owner Last edited Size Views
Mean Weights of Boys Ages 2 to 12
I'm using this for Modeling Linear Associations. It has a decent linear correlation coefficient. A linear regression produces the stats and scatter plot with a polynomial of order one trend line overlay which can be used to illustrate extrapolation/interpolation, error estimates, and model breakdown. For over/underestimates and error, interpolate mean weights for 3 and 5 year olds and compare with observed mean weights of 31.0 pounds and 40.5 pounds, respectively. For model breakdown, adjust the x-axis of the scatter plot to range between 0 and 20, with integer tick marks, and the y-axis to range between 0 and 200, with tick marks 0, 10, 20, ..., 200, and an extrapolation for mean weight at age 20 will suggest a weight somewhere near 135 lbs for a 20 year old male.
kcramer Oct 26, 2019 110B 543
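The interpolation and extrapolation exercise described above can be sketched in Python. This is only a minimal sketch, assuming NumPy is available; the function names `fit_line`, `predict`, and `residual` are hypothetical, and the ages and mean weights have to be supplied from the data set itself (none are reproduced here).

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x (a polynomial of order one)."""
    slope, intercept = np.polyfit(np.asarray(x, dtype=float),
                                  np.asarray(y, dtype=float), 1)
    return slope, intercept

def predict(slope, intercept, x_new):
    """Evaluate the fitted line at x_new (inside or outside the observed range)."""
    return intercept + slope * np.asarray(x_new, dtype=float)

def residual(observed, predicted):
    """Observed minus predicted: a positive value means the model underestimates."""
    return observed - predicted
```

With the fitted `slope` and `intercept`, `residual(31.0, predict(slope, intercept, 3))` gives the interpolation error for 3 year olds, and `predict(slope, intercept, 20)` gives the extrapolated value used to discuss model breakdown.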
StatCrunch Instruction Sheet Linear Corr and Reg Example - S. Lohse
This data set was included in a text book I was using at the time this example sheet was written.
slohse9395 Oct 7, 2019 179B 426
Mother and Daughter Heights.xls
This data set is Galton's Mother and Daughter data set as used in Sanford Weisberg's Applied Linear Regression, 3rd Edition.
craig_slinkman Apr 10, 2010 13KB 7301
Cigarette Consumption vs CHD Mortality
Now that cigarette smoking has been clearly tied to lung cancer, researchers are focusing on possible links to other diseases. The data below show annual rates of cigarette consumption and deaths from coronary heart disease for several nations. Some public health officials are urging that the US adopt a national goal of cutting cigarette consumption in half over the next decade. Examine these data and write a report. In your report you should:
1. Include appropriate graphs (e.g. scatterplot, residual plot) and statistics (e.g. mean and SD);
2. Describe the association between cigarette smoking and coronary heart disease;
3. Create a linear model;
4. Evaluate the strength and appropriateness of your model;
5. Interpret the slope and y-intercept of the line;
6. Use your model to estimate the potential benefits of reaching the national goal proposed for the US. That is, based on your linear model, if the US were to cut its cigarette consumption in half (from 3900 to 1950), what does the linear model predict would happen to the CHD rate?
7. You should use StatCrunch to generate nice looking graphs and output as needed. Be sure to size them appropriately. No need for an 8x10 scatterplot; make your graphs about 3x3. You should scale them in StatCrunch first, then copy and paste into Word.
smcdaniel04 Sep 29, 2011 267B 5788
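Item 6 reduces to a short calculation once the line has been fitted. Writing the fitted model as \hat{y} = b_0 + b_1 x (these symbols are assumed here and do not appear in the original description), with x the cigarette consumption and \hat{y} the predicted CHD rate:

```latex
\Delta\hat{y} = \hat{y}(1950) - \hat{y}(3900) = b_1\,(1950 - 3900) = -1950\,b_1
```

The intercept cancels, so the predicted change depends only on the fitted slope.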
Rebound Regression
Here is a data set that students in one group of our introductory course generated for this activity. Only three drops at each of eleven heights were made here, but this should provide an idea of the type of data that would be collected.
smcdanie%sc Jul 4, 2008 219B 902
Baseball data for correlation and regression
This table shows the total number of runs scored, at bats, hits, etc. for each of the 30 MLB teams for the 2009-2011 seasons.
Correlations and linear regression models can be calculated between the different numeric variables. A good exercise is to see which variables correlate most strongly with runs_scored.
As emphasized in the movie Moneyball, some of the classic metrics such as batting_avg are not as good as the newer metrics like OBP (on base percentage), SLG (slugging percentage), or OPS (on base plus slugging).
A guide to a few of the variables that may not be self-explanatory:
Runs_Scored: The total of all runs (points) the baseball team scored by the end of the season.
Batting_avg: This is equal to the number of hits divided by at_bats.
OBP: On Base Percentage. Similar to batting average, except that it takes into account walks and hit-by-pitch. Some players who don't have high batting averages manage to get walked quite frequently.
SLG: Slugging. This weights hits to first base as 1 point, hits to second base as 2 points, third as 3, home runs as 4, and divides the total by the number of at bats.
OPS: On Base Plus Slugging. This is just OBP added to the SLG numbers.
mileschen Apr 17, 2012 6KB 4590
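The variable guide above translates directly into arithmetic. The following is a minimal sketch in Python; the function and argument names are hypothetical and may not match the column names in the data set, and OBP is taken as an input because the description does not give its exact formula.

```python
def batting_avg(hits, at_bats):
    """Hits divided by at bats."""
    return hits / at_bats

def slugging(singles, doubles, triples, home_runs, at_bats):
    """Weight hits by bases reached (1, 2, 3, 4) and divide by at bats."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats

def ops(obp, slg):
    """On base plus slugging: OBP added to SLG."""
    return obp + slg
```

Computing these per team and correlating each with runs_scored is one way to carry out the comparison of classic and newer metrics suggested in the description.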
realestate
This example was used during a slide presentation on simple linear regression descriptive statistics in STAT 215 at WVU. This data table lists the selling price (in $1000), size (in 100 ft^2), and condition (from 1-10) of n=10 homes sold in 1986 in some market.
kjryan Nov 25, 2017 138B 662
7-1 Discussion LinearRegression_SampleTests2
7-1 Discussion added Max Temperature MAT 240, 17EW1, R1026 course
smii Oct 19, 2017 803B 361
chapter 9
This data set is Galton's Mother and Daughter data set as used in Sanford Weisberg's Applied Linear Regression, 3rd Edition.
katcrowe Apr 12, 2019 847B 83
7-1 Discussion LinearRegression_SampleTests SMM1
7-1 Discussion Linear Regression, level of humidity between mid-September and beginning October for years 2016 and 2017 for MAT 270, R1026, 17EW1 course.
smii Oct 19, 2017 543B 159
Singfat Chu diamond ring data
NAME: Diamond Ring Pricing Using Linear Regression
TYPE: Random sample
SIZE: 48 observations, 2 variables
DESCRIPTIVE ABSTRACT: This dataset contains the prices of ladies' diamond rings and the carat size of their diamond stones. The rings are made with gold of 20 carats purity and are each mounted with a single diamond stone.
SOURCE: The source of the data is a full page advertisement placed in the _Straits Times_ newspaper issue of February 29, 1992, by a Singapore-based retailer of diamond jewelry.
VARIABLE DESCRIPTIONS:
Columns
6 - 8 Size of diamond in carats (1 carat = .2 gram)
16 - 19 Price of ring in Singapore dollars
Values are aligned and delimited by blanks. There are no missing values.
STORY BEHIND THE DATA: Data presented in a newspaper advertisement suggest the use of simple linear regression to relate the prices of diamond rings to the weights of their diamond stones. The intercept of the resulting regression line is negative and significantly different from zero. This finding raises questions about an assumed pricing mechanism and motivates consideration of remedial actions.
PEDAGOGICAL NOTES: This dataset can be used to illustrate model-building in linear regression. A possibly counter-intuitive negative intercept may be avoided by using a multiplicative or exponential regression model. These regression models are intrinsically linear, and they are estimated using standard linear regression technology after a suitable transformation of the data. Additional information about these data can be found in the "Datasets and Stories" article "Diamond Ring Pricing Using Linear Regression" in the _Journal of Statistics Education_ (Chu 1996).
SUBMITTED BY:
Singfat Chu
Department of Decision Sciences
National University of Singapore
10 Kent Ridge Crescent
Singapore 119260
fbachucl@nus.sg
worths1 Oct 29, 2008 1KB 852
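The pedagogical notes mention fitting an exponential model with ordinary linear regression after transforming the data. A minimal sketch of that idea in Python follows, assuming NumPy is available; the function name `fit_exponential` and its argument names are hypothetical, and the carat sizes and prices must come from the data set itself.

```python
import numpy as np

def fit_exponential(carats, price):
    """Fit price = a * exp(b * carats) by regressing log(price) on carats.

    Taking logs gives log(price) = log(a) + b * carats, which is linear in
    carats, so an ordinary least-squares line recovers log(a) and b.
    """
    x = np.asarray(carats, dtype=float)
    log_y = np.log(np.asarray(price, dtype=float))
    b, log_a = np.polyfit(x, log_y, 1)
    return np.exp(log_a), b
```

A multiplicative (power) model, price = a * carats^b, can be handled the same way by regressing log(price) on log(carats) instead. In either case the fitted curve cannot produce the negative predicted prices that a negative intercept implies in the straight-line model.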
US Crime
These data are crime-related and demographic statistics for 47 US states in 1960. The data were collected from the FBI's Uniform Crime Report and other government agencies to determine how the variable crime rate depends on the other variables measured in the study. Number of cases: 47 Reference: Vandaele, W. (1978) Participation in illegitimate activities: Erlich revisited. In Deterrence and incapacitation, Blumstein, A., Cohen, J. and Nagin, D., eds., Washington, D.C.: National Academy of Sciences, 270-335. Methods: A Primer, New York: Chapman & Hall, 11. Also found in: Hand, D.J., et al. (1994) A Handbook of Small Data Sets, London: Chapman & Hall, 101-103. [Collinearity, Correlation, Causation, Lurking variable, Regression]
Variable Description
R Crime rate # of offenses reported to police per million population
Age The number of males of age 14-24 per 1000 population
S Indicator variable for Southern states (0 = No, 1 = Yes)
Ed Mean # of years of schooling x 10 for persons of age 25 or older
Ex0 1960 per capita expenditure on police by state and local government
Ex1 1959 per capita expenditure on police by state and local government
LF Labor force participation rate per 1000 civilian urban males age 14-24
M The number of males per 1000 females
N State population size in hundred thousands
NW The number of non-whites per 1000 population
U1 Unemployment rate of urban males per 1000 of age 14-24
U2 Unemployment rate of urban males per 1000 of age 35-39
W Median value of transferable goods and assets or family income in tens of $
X The number of families per 1000 earning below 1/2 the median income
ds-231%sc Aug 11, 2008 2KB 2385
Wages and Hours
The data are from a national sample of 6000 households with a male head earning less than $15,000 annually in 1966. The data were classified into 39 demographic groups for analysis. The study was undertaken in the context of proposals for a guaranteed annual wage (negative income tax). At issue was the response of labor supply (average hours) to increasing hourly wages. The study was undertaken to estimate this response from available data. [Regression, Outlier, Collinearity, Assumptions, regression]
Variable Description
HRS Average hours worked during the year
WAGE Average hourly wage ($)
ERSP Average yearly earnings of spouse ($)
ERNO Average yearly earnings of other family members ($)
NEIN Average yearly non-earned income
ASSET Average family asset holdings (Bank account, etc.) ($)
AGE Average age of respondent
DEP Average number of dependents
RACE Percent of white respondents
SCHOOL Average highest grade of school completed
ds-231%sc Aug 11, 2008 2KB 1660
Smoking and Cancer
The data are per capita numbers of cigarettes smoked (sold) by 43 states and the District of Columbia in 1960 together with death rates per thousand population from various forms of cancer. Number of cases: 44 Reference: J.F. Fraumeni, "Cigarette Smoking and Cancers of the Urinary Tract: Geographic Variations in the United States," Journal of the National Cancer Institute, 41, 1205-1211. [Outlier, Regression, Residuals, Transformation, Nonlinear regression, Dummy variable]
Variable Description
CIG Number of cigarettes smoked (hds per capita)
BLAD Deaths per 100K population from bladder cancer
LUNG Deaths per 100K population from lung cancer
KID Deaths per 100K population from kidney cancer
LEUK Deaths per 100K population from leukemia
ds-231%sc Aug 11, 2008 1KB 1688
Responses to Sleep Survey
Topic: Sleeping Habits
Course: STA 220 (statistics)
Semester: Fall 2013
Name: Tiffany Turner
Introduction: Sleep is a behavioral state that is a natural part of everybody's life. Humans spend about 1/3 of their lives asleep. People generally know little about the importance of sleep. Sleep is not just something to fill time when a person is inactive. Sleep is a required activity, not an option. Even though the precise functions of sleep remain a mystery, sleep is important for normal motor and cognitive function. We all recognize and feel the need to sleep. After sleeping, we recognize changes that have occurred, as we feel rested and more alert. Sleep actually appears to be required for survival.
Methodology: Data was collected through a survey in which individuals were asked about their sleeping habits and their age. The survey was given to people in my family and some of the people on StatCrunch who participated. I had 14 people participate in my survey. The data obtained was analyzed with the StatCrunch data analysis package available at www.statcrunch.com.
Analysis and Results: Both descriptive and inferential analyses were done at www.statcrunch.com.
A. Descriptive Data Analysis: A pie chart was used to describe the sample data since there were all different ages being used.
B. Test analysis (inferential statistics): Regression analysis was done using StatCrunch to identify the existence of a correlation between the participants' weekday and weekend sleeping habits. A similar analysis was also done for a possible correlation between age and hours slept. The result indicates almost zero correlation between an individual's age and sleeping habits.
Conclusion: The linear regression results obtained contradicted my initial belief that an individual's age will not increase the hours an individual will sleep.
The respondents could have provided inaccurate data since there is no way to verify the information obtained from the survey. Also, the 14 individuals from my family and HCTC who were surveyed may have been dominantly age observers who sleep more during weekends than weekdays. Given the above, it would not be accurate to conclude that there is no correlation between age and how much sleep one gets. Proper methods of data collection such as observation may be appropriate for this type of study.
tturner0090 Jan 28, 2013 572B 1616
