Data sets shared by StatCrunch members
Showing 1 to 15 of 99 data sets matching model
Data Set/Description Owner Last edited Size Views
Car Details 2019 Models
This data set contains information on the 2019 models of widely sold cars. MSRP stands for Manufacturer's Suggested Retail Price, and MPG stands for Miles Per Gallon.

This data set was originally uploaded to StatCrunch by the parasami user.
statcrunch_featured Nov 13, 2019 21KB 646
Domestic Autos 2019 Models
Domestic Sample from featured data, "Car Details 2019 Models," gas engines only.
craigtisddale Nov 28, 2019 5KB 99
Foreign Autos 2019 Models
Foreign 2019 model sample from featured data, "Car Details 2019 Models" for gas engines only.
craigtisddale Nov 28, 2019 14KB 70
Mean Weights of Boys Ages 2 to 12
I'm using this for Modeling Linear Associations. It has a decent linear correlation coefficient. A linear regression produces the stats and a scatter plot with a first-order polynomial trend line overlay, which can be used to illustrate extrapolation/interpolation, error estimates, and model breakdown. For over/underestimates and error, interpolate mean weights for 3- and 5-year-olds and compare with the observed mean weights of 31.0 pounds and 40.5 pounds, respectively. For model breakdown, adjust the x-axis of the scatter plot to range between 0 and 20, with integer tick marks, and the y-axis to range between 0 and 200, with tick marks 0, 10, 20, ..., 200. An extrapolation for mean weight at age 20 will then suggest a weight somewhere near 135 lbs for a 20-year-old male.
kcramer Oct 26, 2019 110B 545
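The workflow described for Mean Weights of Boys Ages 2 to 12 (fit a line, interpolate, compare with an observed value, then extrapolate) can be sketched in plain Python. This is only an illustration: the helper names `fit_line`, `predict`, and `residual` are assumptions, and the toy points are made up so the arithmetic is easy to check. They are not values from the data set or StatCrunch output.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Value of the fitted line at x (interpolation or extrapolation)."""
    return intercept + slope * x


def residual(observed, predicted):
    """Positive: the model underestimates. Negative: it overestimates."""
    return observed - predicted


# Toy points lying exactly on y = 1 + 2x (hypothetical, NOT the data set).
xs = [1, 2, 3, 4]
ys = [3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_line(xs, ys)
inside = predict(intercept, slope, 2.5)   # interpolation, within the x range
outside = predict(intercept, slope, 20)   # extrapolation, far outside it
```

With the real data, `predict` at ages 3 and 5 would be compared against the observed 31.0 and 40.5 pounds via `residual`, and `predict` at age 20 would show the model breakdown the description mentions.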
Incident Data for Traffic Tickets
These data are from a survey of traffic violations; participants could report on up to 4 incidents. Had Ticket: 0 = No, 1 = Yes; Tickets = Number of tickets received in life; Warnings = Number of warnings (i.e., pulled over but no ticket) in life; Age at incident; Reason for incident; How far over the speed limit the citation was for; Time# and Time both indicate time of day of incident; Road indicates where incident occurred; Utah = Whether incident occurred in Utah (0 = No, 1 = Yes); as well as the make, model, and category of the car.
qtpie1480 Dec 2, 2010 18KB 3080
TrafficTickets
These data are from a survey of traffic violations; participants could report on up to 4 incidents. Had Ticket: 0 = No, 1 = Yes; Tickets = Number of tickets received in life; Warnings = Number of warnings (i.e., pulled over but no ticket) in life; Age at incident; Reason for incident; How far over the speed limit the citation was for; Time# and Time both indicate time of day of incident; Road indicates where incident occurred; Utah = Whether incident occurred in Utah (0 = No, 1 = Yes); as well as the make, model, and category of the car.
cvoisei Sep 7, 2017 26KB 2538
cars
A number of variables (e.g., weight, cylinders, country of origin, gas mileage) for 38 car models.
butler Aug 10, 2011 1KB 3113
Cigarette Consumption vs CHD Mortality
Now that cigarette smoking has been clearly tied to lung cancer, researchers are focusing on possible links to other diseases. The data below show annual rates of cigarette consumption and deaths from coronary heart disease (CHD) for several nations. Some public health officials are urging that the US adopt a national goal of cutting cigarette consumption in half over the next decade. Examine these data and write a report. In your report you should:
1. Include appropriate graphs (e.g., scatterplot, residual plot) and statistics (e.g., mean and SD).
2. Describe the association between cigarette smoking and coronary heart disease.
3. Create a linear model.
4. Evaluate the strength and appropriateness of your model.
5. Interpret the slope and y-intercept of the line.
6. Use your model to estimate the potential benefits of reaching the national goal proposed for the US. That is, based on your linear model, if the US were to cut its cigarette consumption in half (from 3900 to 1950), what does the linear model predict would happen to the CHD rate?
7. Use StatCrunch to generate nice-looking graphs and output as needed. Be sure to size them appropriately: there is no need for an 8x10 scatterplot, so make your graphs about 3x3. Scale them in StatCrunch first, then copy and paste into Word.
smcdaniel04 Sep 29, 2011 267B 5791
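For item 6 of the Cigarette Consumption vs CHD Mortality assignment, the predicted change depends only on the fitted slope. Writing the fitted line as \hat{y} = b_0 + b_1 x (the symbols b_0 and b_1 are notation assumed here, not values from the data set), the intercept cancels:

```latex
\Delta\hat{y} = \hat{y}(1950) - \hat{y}(3900)
             = (b_0 + 1950\,b_1) - (b_0 + 3900\,b_1)
             = -1950\,b_1
```

If the fitted slope is positive, the model predicts the CHD rate falls by 1950 times the slope. This is a statement about the fitted line only. It does not by itself show that cutting consumption would cause the drop.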
Diet
This is a small data set used to illustrate how an inappropriate independent-means test fails compared with a paired test on the same data. The story is that we have before and after weights for 6 customers of a weight loss clinic. Visual inspection makes it clear that the clinic is effective (except in one questionable case). Students can discuss what sources there are for the variation found in the data set and relate them to the assumptions of the independent versus paired analysis models. Classical techniques will produce an extremely large p-value for the independent analysis and a significant p-value for the paired analysis. To illustrate the difference with simulation techniques, first do a randomization test for two means between the before and after groups. This will spectacularly fail to show a difference, when in fact there is a clear difference. Then use a bootstrap to examine the 6 differences, and it becomes clear that a zero difference is highly unlikely.
david.zeitler May 19, 2011 87B 1956
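The two simulation steps described for the Diet data set can be sketched with the Python standard library. Everything here is an assumption made for illustration: the weights are hypothetical (not the data set's values), the function names are invented, and the randomization test enumerates every possible split instead of drawing random shuffles, which gives the exact version of the same test for a sample this small.

```python
import random
from itertools import combinations
from statistics import mean

# Hypothetical before/after weights for 6 customers (NOT the data set's values).
before = [150, 180, 210, 240, 270, 300]
after = [145, 176, 204, 236, 264, 299]
diffs = [b - a for b, a in zip(before, after)]  # paired differences


def randomization_p_value(group_a, group_b):
    """Two-sided randomization test for a difference in means.

    Enumerates every way to split the pooled values into two groups of the
    original sizes and counts splits at least as extreme as the observed one.
    """
    pooled = group_a + group_b
    total = sum(pooled)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = count = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        extreme += diff >= observed - 1e-9  # tolerance for float rounding
        count += 1
    return extreme / count


def bootstrap_share_at_or_below_zero(values, n_resamples=2000, seed=0):
    """Share of bootstrap means of the paired differences that are <= 0."""
    rng = random.Random(seed)
    n = len(values)
    hits = 0
    for _ in range(n_resamples):
        resample = [rng.choice(values) for _ in range(n)]
        hits += mean(resample) <= 0
    return hits / n_resamples


p_independent = randomization_p_value(before, after)
share_nonpositive = bootstrap_share_at_or_below_zero(diffs)
```

In this toy example the customers differ from each other far more than each customer changes, so the independent comparison yields a large p-value, while every paired difference is positive and no bootstrap mean can reach zero. That is the contrast the description asks students to explain.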
Attendance Vs. Grade
Compares percent of classes attended with final grade in the class. If you use % missed as the independent variable, you end up with a regression model that allows for interpretation of the intercept and has a negative slope.
lbgreen Jan 28, 2019 744B 1474
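The effect of switching the Attendance Vs. Grade predictor from percent attended to percent missed is a linear change of variable. As a sketch, assume a fitted line of the form grade = a + b * attended with b > 0 (a and b are notation introduced here, not fitted values), and substitute attended = 100 - missed:

```latex
\widehat{\text{grade}} = a + b\,(100 - \text{missed})
                       = (a + 100\,b) - b\,\text{missed}
```

The slope changes sign, and the new intercept a + 100b is the predicted grade for a student who missed no classes, which is a meaningful quantity. The original intercept a is the predicted grade at 0% attendance, usually an extrapolation well outside the data.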
Low Birth Weight Study
SOURCE: Hosmer and Lemeshow (2000), Applied Logistic Regression, Second Edition. Data were collected at Baystate Medical Center, Springfield, Massachusetts during 1986.

DESCRIPTIVE ABSTRACT: The goal of this study was to identify risk factors associated with giving birth to a low birth weight baby (weighing less than 2500 grams). Data were collected on 189 women, 59 of whom had low birth weight babies and 130 of whom had normal birth weight babies. Four variables which were thought to be of importance were age, weight of the subject at her last menstrual period, race, and the number of physician visits during the first trimester of pregnancy.

LIST OF VARIABLES:

Columns  Variable                                                Abbreviation
-----------------------------------------------------------------------------
2-4      Identification Code                                     ID
10       Low Birth Weight (0 = Birth Weight >= 2500g,            LOW
         1 = Birth Weight < 2500g)
17-18    Age of the Mother in Years                              AGE
23-25    Weight in Pounds at the Last Menstrual Period           LWT
32       Race (1 = White, 2 = Black, 3 = Other)                  RACE
40       Smoking Status During Pregnancy (1 = Yes, 0 = No)       SMOKE
48       History of Premature Labor (0 = None, 1 = One, etc.)    PTL
55       History of Hypertension (1 = Yes, 0 = No)               HT
61       Presence of Uterine Irritability (1 = Yes, 0 = No)      UI
67       Number of Physician Visits During the First Trimester   FTV
         (0 = None, 1 = One, 2 = Two, etc.)
73-76    Birth Weight in Grams                                   BWT
-----------------------------------------------------------------------------

PEDAGOGICAL NOTES: These data have been used as an example of fitting a multiple logistic regression model.

STORY BEHIND THE DATA: Low birth weight is an outcome that has been of concern to physicians for years. This is due to the fact that infant mortality rates and birth defect rates are very high for low birth weight babies. A woman's behavior during pregnancy (including diet, smoking habits, and receiving prenatal care) can greatly alter the chances of carrying the baby to term and, consequently, of delivering a baby of normal birth weight. The variables identified in the code sheet given in the table have been shown to be associated with low birth weight in the obstetrical literature. The goal of the current study was to ascertain if these variables were important in the population being served by the medical center where the data were collected.

References: 1. Hosmer and Lemeshow, Applied Logistic Regression, Wiley, (1989).
wikipeterson Jul 23, 2012 6KB 7983
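The variable list for the Low Birth Weight Study gives character positions, which suggests the original raw file is fixed-width text. A minimal sketch of reading one such record follows, assuming the positions are 1-based and inclusive and that every field is an integer. The names `COLUMNS`, `parse_record`, and `format_record` and the sample record are hypothetical, and the data set as shared on StatCrunch may already be split into columns.

```python
# 1-based, inclusive column ranges copied from the variable list above.
COLUMNS = {
    "ID": (2, 4),
    "LOW": (10, 10),
    "AGE": (17, 18),
    "LWT": (23, 25),
    "RACE": (32, 32),
    "SMOKE": (40, 40),
    "PTL": (48, 48),
    "HT": (55, 55),
    "UI": (61, 61),
    "FTV": (67, 67),
    "BWT": (73, 76),
}


def parse_record(line):
    """Slice one fixed-width text record into a dict of integers."""
    return {
        name: int(line[start - 1:end])
        for name, (start, end) in COLUMNS.items()
    }


def format_record(values):
    """Inverse of parse_record: right-align each value in its columns."""
    chars = [" "] * 76
    for name, (start, end) in COLUMNS.items():
        width = end - start + 1
        chars[start - 1:end] = str(values[name]).rjust(width)
    return "".join(chars)


# Hypothetical record, made up for illustration (not a row from the study).
example = {"ID": 1, "LOW": 0, "AGE": 25, "LWT": 130, "RACE": 1, "SMOKE": 0,
           "PTL": 0, "HT": 0, "UI": 0, "FTV": 2, "BWT": 3000}
line = format_record(example)
parsed = parse_record(line)
```

Building the sample line with `format_record` avoids hand-counting spaces, and the round trip through `parse_record` checks that the two functions agree on the column positions.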
Baseball data for correlation and regression
This table shows the total number of runs scored, at bats, hits, etc. for each of the 30 MLB teams for the 2009-2011 seasons.

Correlations and linear regression models can be calculated between the different numeric variables. A good exercise is to see which variables correlate most strongly with runs_scored.

As emphasized in the movie Moneyball, some of the classic metrics such as batting_avg are not as good as the newer metrics like OBP (on base percentage), SLG (slugging percentage), or OPS (on base plus slugging).

A guide to a few of the variables that may not be self-explanatory:
Runs_Scored: The total of all runs (points) the baseball team scored by the end of the season.
Batting_avg: The number of hits divided by at_bats.
OBP: On Base Percentage. Similar to batting average, except that it takes into account walks and hit-by-pitch. Some players who don't have high batting averages manage to get walked quite frequently.
SLG: Slugging. This weights hits to first base as 1 point, hits to second base as 2 points, third as 3, and home runs as 4, and divides the total by the number of at bats.
OPS: On Base Plus Slugging. This is just OBP added to SLG.
mileschen Apr 17, 2012 6KB 4608
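The metric definitions in the baseball description translate directly into small functions. This is a sketch: the function and parameter names are invented here, and the season totals are hypothetical. The description only says OBP accounts for walks and hit-by-pitch, so the full formula below, including sacrifice flies in the denominator, is the standard MLB definition and not something stated in the data set.

```python
def batting_avg(hits, at_bats):
    """Hits divided by at bats."""
    return hits / at_bats


def obp(hits, walks, hit_by_pitch, at_bats, sac_flies=0):
    """On-base percentage (standard MLB definition, assumed here)."""
    return (hits + walks + hit_by_pitch) / (
        at_bats + walks + hit_by_pitch + sac_flies
    )


def slg(singles, doubles, triples, home_runs, at_bats):
    """Slugging: total bases (1, 2, 3, 4 per hit type) divided by at bats."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


def ops(obp_value, slg_value):
    """On base plus slugging: the two rates added together."""
    return obp_value + slg_value


# Hypothetical totals (NOT taken from the data set).
avg = batting_avg(hits=150, at_bats=500)
on_base = obp(hits=150, walks=50, hit_by_pitch=5, at_bats=500, sac_flies=5)
slugging = slg(singles=100, doubles=30, triples=5, home_runs=15, at_bats=500)
combined = ops(on_base, slugging)
```

Note that the 150 hits in the example equal the sum of singles, doubles, triples, and home runs, so the inputs to the three metrics are consistent with each other.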
2017 Fuel Economy Data w Weight & Power - No Hybrids.xlsx
2017 model cars sold in U.S. Subset of original data from EPA Office of Transportation and Air Quality. Removed all trucks, SUVs, and hybrids. Also skipped duplicate vehicles (e.g., 4-dr and 2-dr of same model). Added vehicle weight, hp, torque, and number of passengers.
len.cabrera Sep 2, 2017 90KB 879
2017 Fuel Economy Data.xlsx
2017 model cars and light trucks sold in U.S. Original data from EPA Office of Transportation and Air Quality. Modified to remove incomplete fields, combine fields (codes and descriptions), and identify hybrid vehicles.
len.cabrera Aug 19, 2017 227KB 577
Arctic sea ice volume
Units: thousand cubic kilometers. The data come from a coupled ice-ocean model known as the Pan-Arctic Ice-Ocean Modeling and Assimilation System (PIOMAS), which was developed at the Polar Science Center, University of Washington. Deviations in Arctic sea ice volume are calculated on a daily basis relative to the average over the years 1979-2013. The averages for the months of March and September were taken and recorded in my data set to compare the Arctic ice volume of these months over the period 1979-2013.
madiyanagin Nov 4, 2016 1KB 669
