
Data sets shared by StatCrunch members
Showing 1 to 15 of 99 data sets matching model
Data Set/Description  Owner  Last edited  Size  Views
Car Details 2019 Models
This data set contains info on the 2019 models of widely sold cars. MSRP stands for Manufacturer's Suggested Retail Price, and MPG stands for Miles Per Gallon.
This data set was originally uploaded to StatCrunch by the parasami user.  statcrunch_featured  Nov 13, 2019  21KB  646 
Domestic Autos 2019 Models
Domestic Sample from featured data, "Car Details 2019 Models," gas engines only.  craigtisddale  Nov 28, 2019  5KB  99 
Foreign Autos 2019 Models
Foreign 2019 model sample from featured data, "Car Details 2019 Models" for gas engines only.  craigtisddale  Nov 28, 2019  14KB  70 
Mean Weights of Boys Ages 2 to 12
I'm using this for Modeling Linear Associations. It has a decent linear correlation coefficient. A linear regression produces the stats and a scatter plot with a first-order polynomial trend line overlay, which can be used to illustrate extrapolation/interpolation, error estimates, and model breakdown. For over/underestimates and error, interpolate mean weights for 3- and 5-year-olds and compare with the observed mean weights of 31.0 pounds and 40.5 pounds, respectively. For model breakdown, adjust the x-axis of the scatter plot to range between 0 and 20, with integer tick marks, and the y-axis to range between 0 and 200, with tick marks 0, 10, 20, ..., 200; an extrapolation for mean weight at age 20 will then suggest a weight somewhere near 135 lbs for a 20-year-old male.
 kcramer  Oct 26, 2019  110B  545 
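The interpolation, extrapolation, and error steps described above can be sketched in a few lines of Python. This is only an illustration: the function names are made up here, and the points are toy values lying exactly on a line, not the mean weights in this data set.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def predict(slope, intercept, x):
    """Value of the fitted line at x (interpolation or extrapolation)."""
    return intercept + slope * x

def residual(observed, predicted):
    """Observed minus predicted: positive means the model underestimates."""
    return observed - predicted

# Toy points only (NOT the data set): they lie exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)

inside = predict(slope, intercept, 2.5)   # interpolation, within the x range
outside = predict(slope, intercept, 20)   # extrapolation, far outside it
```

With the real data, comparing `predict(...)` at ages 3 and 5 against the observed means via `residual` shows whether the line over- or underestimates, and evaluating it at age 20 shows the model breaking down.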
Incident Data for Traffic Tickets
These data are from a survey of traffic violations; participants could report on up to 4 incidents. Had Ticket: 0 = No, 1 = Yes; Tickets = Number of tickets received in life; Warnings = Number of warnings (i.e., pulled over but no ticket) in life; Age at incident; Reason for incident; How far over the speed limit the citation was for; Time# and Time both indicate time of day of incident; Road indicates where incident occurred; Utah = Whether incident occurred in Utah (0 = No, 1 = Yes); as well as the make, model, and category of the car.  qtpie1480  Dec 2, 2010  18KB  3080 
TrafficTickets
These data are from a survey of traffic violations; participants could report on up to 4 incidents. Had Ticket: 0 = No, 1 = Yes; Tickets = Number of tickets received in life; Warnings = Number of warnings (i.e., pulled over but no ticket) in life; Age at incident; Reason for incident; How far over the speed limit the citation was for; Time# and Time both indicate time of day of incident; Road indicates where incident occurred; Utah = Whether incident occurred in Utah (0 = No, 1 = Yes); as well as the make, model, and category of the car.  cvoisei  Sep 7, 2017  26KB  2538 
cars
A number of variables (e.g., weight, cylinders, country of origin, gas mileage) for 38 models of car  butler  Aug 10, 2011  1KB  3113 
Cigarette Consumption vs CHD Mortality
Now that cigarette smoking has been clearly tied to lung cancer, researchers are focusing on possible links to other diseases. The data below show annual rates of cigarette consumption and deaths from coronary heart disease for several nations. Some public health officials are urging that the US adopt a national goal of cutting cigarette consumption in half over the next decade.
Examine these data and write a report. In your report you should:
1. Include appropriate graphs (e.g. scatterplot, residual plot) and statistics (e.g. mean and SD);
2. Describe the association between cigarette smoking and coronary heart disease;
3. Create a linear model;
4. Evaluate the strength and appropriateness of your model;
5. Interpret the slope and y-intercept of the line;
6. Use your model to estimate the potential benefits of reaching the national goal proposed for the US. That is, based on your linear model, if the US were to cut its cigarette consumption in half (from 3900 to 1950), what does the linear model predict would happen to the CHD rate?
7. You should use StatCrunch to generate nice-looking graphs and output as needed. Be sure to size them appropriately. No need for an 8x10 scatterplot; make your graphs about 3x3. You should scale them in StatCrunch first, then copy and paste into Word.
 smcdaniel04  Sep 29, 2011  267B  5791 
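For item 6, a linear model predicts that the response changes by the slope times the change in x, so the intercept drops out. A minimal sketch, assuming a hypothetical helper name and a placeholder slope of 1.0 (the fitted slope for this data set is not given here):

```python
def predicted_change(slope, x_old, x_new):
    """Change in the predicted response when x moves from x_old to x_new.

    For y_hat = intercept + slope * x the intercept cancels, leaving
    slope * (x_new - x_old).
    """
    return slope * (x_new - x_old)

# Placeholder slope, NOT the fitted value; substitute the slope from your model.
placeholder_slope = 1.0
change = predicted_change(placeholder_slope, 3900, 1950)
```

Whatever the fitted slope turns out to be, the predicted change in the CHD rate is that slope multiplied by -1950.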
Diet
This is a small data set used to illustrate the failure of an inappropriate use of an independent means test compared with a paired test on the same data.
The story is that we have before and after weights for 6 customers of a weight loss clinic. Visual observation makes it clear that the clinic is effective (except in one questionable case). Students can discuss what sources there are for the variation found in the data set and relate them to the assumptions of the independent versus paired analysis models. Application of classical techniques will produce an extremely large p-value for the independent analysis and a significant p-value for the paired analysis.
To illustrate the difference with simulation techniques, first do a randomization for two means between the before and after data groups. This will spectacularly fail to show a difference, when in fact there is a clear difference. Then use a bootstrap to examine the 6 differences and it is clear that a zero difference is highly unlikely.  david.zeitler  May 19, 2011  87B  1956 
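The contrast between the two analyses can be sketched with the standard library alone. The weights below are made up for six hypothetical customers and are not the values in this data set; the function names are also illustrative. The independent statistic uses the Welch (unequal-variance) form, which is one common choice.

```python
import random
from math import sqrt
from statistics import mean, stdev

def welch_t(a, b):
    """t statistic treating a and b as independent samples."""
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / se

def paired_t(a, b):
    """t statistic computed on the per-subject differences a[i] - b[i]."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

def bootstrap_mean_interval(values, reps=2000, seed=0):
    """Rough 95% percentile interval for the mean, by resampling with replacement."""
    rng = random.Random(seed)
    means = sorted(
        mean(rng.choices(values, k=len(values))) for _ in range(reps)
    )
    return means[int(0.025 * reps)], means[int(0.975 * reps) - 1]

# Made-up weights (NOT the StatCrunch data): large spread between people,
# small but consistent loss within each person.
before = [150, 180, 210, 240, 270, 300]
after = [145, 174, 206, 233, 265, 294]

t_independent = welch_t(before, after)   # swamped by between-person spread
t_paired = paired_t(before, after)       # uses only within-person change
low, high = bootstrap_mean_interval([b - a for b, a in zip(before, after)])
```

In this toy example the between-person variation dominates the independent statistic, while the paired statistic and the bootstrap interval for the differences depend only on the within-person changes.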
Attendance Vs. Grade
Compares percent of classes attended with final grade in the class.
If you use % missed as the independent variable, you end up with a regression model that allows for interpretation of the intercept and has a negative slope.  lbgreen  Jan 28, 2019  744B  1474 
Low Birth Weight Study
SOURCE: Hosmer and Lemeshow (2000) Applied Logistic Regression: Second Edition
Data were collected at Baystate Medical Center, Springfield, Massachusetts during 1986.
DESCRIPTIVE ABSTRACT:
The goal of this study was to identify risk factors associated with
giving birth to a low birth weight baby (weighing less than 2500 grams).
Data were collected on 189 women, 59 of which had low birth weight babies and 130 of which had normal birth weight babies. Four variables which were thought to be of importance were age, weight of the subject at her last menstrual period, race, and the number of physician visits during the first trimester of pregnancy.
LIST OF VARIABLES:
Columns   Variable                                                Abbreviation

2-4       Identification Code                                     ID
10        Low Birth Weight (0 = Birth Weight >= 2500g,            LOW
                            1 = Birth Weight < 2500g)
17-18     Age of the Mother in Years                              AGE
23-25     Weight in Pounds at the Last Menstrual Period           LWT
32        Race (1 = White, 2 = Black, 3 = Other)                  RACE
40        Smoking Status During Pregnancy (1 = Yes, 0 = No)       SMOKE
48        History of Premature Labor (0 = None, 1 = One, etc.)    PTL
55        History of Hypertension (1 = Yes, 0 = No)               HT
61        Presence of Uterine Irritability (1 = Yes, 0 = No)      UI
67        Number of Physician Visits During the First Trimester   FTV
              (0 = None, 1 = One, 2 = Two, etc.)
73-76     Birth Weight in Grams                                   BWT

PEDAGOGICAL NOTES:
These data have been used as an example of fitting a multiple
logistic regression model.
STORY BEHIND THE DATA:
Low birth weight is an outcome that has been of concern to physicians
for years. This is due to the fact that infant mortality rates and birth
defect rates are very high for low birth weight babies. A woman's behavior
during pregnancy (including diet, smoking habits, and receiving prenatal care)
can greatly alter the chances of carrying the baby to term and, consequently,
of delivering a baby of normal birth weight.
The variables identified in the code sheet given in the table have been
shown to be associated with low birth weight in the obstetrical literature. The
goal of the current study was to ascertain if these variables were important
in the population being served by the medical center where the data were
collected.
References:
1. Hosmer and Lemeshow, Applied Logistic Regression, Wiley, (1989).
 wikipeterson  Jul 23, 2012  6KB  7983 
Baseball data for correlation and regression
This table shows the total number of runs scored, at bats, hits, etc. for each of the 30 MLB teams for the 2009-2011 seasons.
Correlations and linear regression models can be calculated between the different numeric variables. A good exercise is to see which variables correlate most strongly with runs_scored.
As emphasized in the movie Moneyball, some of the classic metrics such as batting_avg are not as good as the newer metrics like OBP (on-base percentage), SLG (slugging percentage), or OPS (on-base plus slugging).
A guide to a few of the variables that may not be self-explanatory:
Runs_Scored: The total of all runs (points) the baseball team scored by the end of the season.
Batting_avg: The number of hits divided by at_bats.
OBP: On Base Percentage. Similar to batting average, except that it takes into account walks and hit-by-pitch. Some players who don't have high batting averages manage to get walked quite frequently.
SLG: Slugging. This weights hits to first base as 1 point, hits to second base as 2 points, third as 3, and home runs as 4, and divides the total by the number of at bats.
OPS: On Base Plus Slugging. This is just OBP added to SLG.  mileschen  Apr 17, 2012  6KB  4608 
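The metric definitions in the guide above translate directly into arithmetic. A small sketch, with hypothetical function names and made-up counts that do not come from the data set; OBP is passed in as a given value because its full formula is not spelled out in the description:

```python
def batting_avg(hits, at_bats):
    """Hits divided by at bats."""
    return hits / at_bats

def slugging(singles, doubles, triples, home_runs, at_bats):
    """Bases per at bat: 1 per single, 2 per double, 3 per triple, 4 per home run."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats

def ops(obp, slg):
    """On base plus slugging: OBP added to SLG."""
    return obp + slg

# Made-up counts for illustration only (NOT a row of the data set).
singles, doubles, triples, home_runs, at_bats = 100, 30, 5, 20, 500
hits = singles + doubles + triples + home_runs

avg = batting_avg(hits, at_bats)
slg = slugging(singles, doubles, triples, home_runs, at_bats)
total = ops(0.350, slg)  # 0.350 is a placeholder OBP
```

Because SLG gives extra-base hits more weight, two teams with the same batting average can have quite different SLG and OPS values.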
2017 Fuel Economy Data w Weight & Power  No Hybrids.xlsx
2017 model cars sold in U.S. Subset of original data from EPA Office of Transportation and Air Quality. Removed all trucks, SUVs, and hybrids. Also skipped duplicate vehicles (e.g., 4dr and 2dr of same model). Added vehicle weight, hp, torque, and number of passengers.  len.cabrera  Sep 2, 2017  90KB  879 
2017 Fuel Economy Data.xlsx
2017 model cars and light trucks sold in U.S. Original data from EPA Office of Transportation and Air Quality. Modified to remove incomplete fields, combine fields (codes and descriptions), and identify hybrid vehicles.  len.cabrera  Aug 19, 2017  227KB  577 
Arctic sea ice volume
units: Thousand cubic kilometers
The data were collected through a coupled ice-ocean model known as the Pan-Arctic Ice-Ocean Modeling and Assimilation System (PIOMAS), which was developed at the Polar Science Center, University of Washington. Variances in Arctic sea ice volume are calculated on a daily basis relative to the average over the years 1979-2013. The averages for the months of March and September were taken and recorded in my data set to compare the Arctic ice volume of these months over the period 1979-2013.
 madiyanagin  Nov 4, 2016  1KB  669 

