Data sets shared by StatCrunch members
Showing 1 to 15 of 89 data sets matching regression
Low Birth Weight Study|
SOURCE: Hosmer and Lemeshow (2000) Applied Logistic Regression: Second Edition
Data were collected at Baystate
Medical Center, Springfield, Massachusetts during 1986.
The goal of this study was to identify risk factors associated with
giving birth to a low birth weight baby (weighing less than 2500 grams).
Data were collected on 189 women, 59 of whom had low birth weight babies and 130 of whom had normal birth weight babies. Four variables that were thought to be of importance were age, weight of the subject at her last menstrual period, race, and the number of physician visits during the first trimester of pregnancy.
LIST OF VARIABLES:
Columns  Variable                                               Abbreviation
2-4      Identification Code                                    ID
10       Low Birth Weight (0 = Birth Weight >= 2500g,           LOW
         1 = Birth Weight < 2500g)
17-18    Age of the Mother in Years                             AGE
23-25    Weight in Pounds at the Last Menstrual Period          LWT
32       Race (1 = White, 2 = Black, 3 = Other)                 RACE
40       Smoking Status During Pregnancy (1 = Yes, 0 = No)      SMOKE
48       History of Premature Labor (0 = None, 1 = One, etc.)   PTL
55       History of Hypertension (1 = Yes, 0 = No)              HT
61       Presence of Uterine Irritability (1 = Yes, 0 = No)     UI
67       Number of Physician Visits During the First Trimester  FTV
         (0 = None, 1 = One, 2 = Two, etc.)
73-76    Birth Weight in Grams                                  BWT
These data have been used as an example of fitting a multiple
logistic regression model.
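The column positions in the code sheet above describe a fixed-width text file. A minimal sketch of reading one such record follows, assuming the positions are 1-indexed and inclusive and that every field is an integer. The names `FIELDS`, `parse_record` and `format_record` are hypothetical helpers, and the sample record is made up for illustration rather than taken from the data.

```python
# Column positions (1-indexed, inclusive) copied from the code sheet.
FIELDS = {
    "ID": (2, 4),
    "LOW": (10, 10),
    "AGE": (17, 18),
    "LWT": (23, 25),
    "RACE": (32, 32),
    "SMOKE": (40, 40),
    "PTL": (48, 48),
    "HT": (55, 55),
    "UI": (61, 61),
    "FTV": (67, 67),
    "BWT": (73, 76),
}


def parse_record(line):
    """Slice one fixed-width record into a dict of integers."""
    record = {}
    for name, (start, end) in FIELDS.items():
        # Convert 1-indexed inclusive columns to a Python slice.
        record[name] = int(line[start - 1:end].strip())
    return record


def format_record(values):
    """Inverse of parse_record: right-justify each value in its columns."""
    chars = [" "] * 76
    for name, (start, end) in FIELDS.items():
        width = end - start + 1
        chars[start - 1:end] = str(values[name]).rjust(width)
    return "".join(chars)


# Made-up record, for illustration only (not a row from the study).
sample = {"ID": 1, "LOW": 1, "AGE": 25, "LWT": 120, "RACE": 1, "SMOKE": 1,
          "PTL": 0, "HT": 0, "UI": 0, "FTV": 2, "BWT": 2400}
line = format_record(sample)
parsed = parse_record(line)
print(parsed)
```

If the file was saved with tabs or with a different layout, the slices would need adjusting; the sketch only encodes the positions listed in the code sheet.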
STORY BEHIND THE DATA:
Low birth weight is an outcome that has been of concern to physicians
for years. This is due to the fact that infant mortality rates and birth
defect rates are very high for low birth weight babies. A woman's behavior
during pregnancy (including diet, smoking habits, and receiving prenatal care)
can greatly alter the chances of carrying the baby to term and, consequently,
of delivering a baby of normal birth weight.
The variables identified in the code sheet given in the table have been
shown to be associated with low birth weight in the obstetrical literature. The
goal of the current study was to ascertain if these variables were important
in the population being served by the medical center where the data were collected.
1. Hosmer and Lemeshow, Applied Logistic Regression, Wiley, (1989).
|yes||wikipeterson||Jul 23, 2012||6KB||504|
Cigarette Consumption vs CHD Mortality|
Now that cigarette smoking has been clearly tied to lung cancer, researchers are focusing on possible links to other diseases. The data below show annual rates of cigarette consumption and deaths from coronary heart disease for several nations. Some public health officials are urging that the US adopt a national goal of cutting cigarette consumption in half over the next decade.
Examine these data and write a report. In your report you should:
1. Include appropriate graphs (e.g. scatterplot, residual plot) and statistics (e.g. mean and SD);
2. Describe the association between cigarette smoking and coronary heart disease;
3. Create a linear model;
4. Evaluate the strength and appropriateness of your model;
5. Interpret the slope and y-intercept of the line;
6. Use your model to estimate the potential benefits of reaching the national goal proposed for the US. That is, based on your linear model, if the US were to cut its cigarette consumption in half (from 3900 to 1950), what does the linear model predict would happen to the CHD rate?
7. Use StatCrunch to generate clean graphs and output as needed, and size them appropriately. There is no need for an 8x10 scatterplot; make your graphs about 3x3. Scale them in StatCrunch first, then copy and paste them into Word.
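As a rough illustration of steps 3, 5 and 6, the sketch below fits a least-squares line and computes the predicted change in the CHD rate when consumption drops from 3900 to 1950. The data points are made-up placeholders, not the values in this data set, and the helper names `fit_line` and `predicted_change` are hypothetical. Substitute the actual columns to get real numbers.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predicted_change(slope, x_from, x_to):
    """Change in the predicted response when x moves from x_from to x_to."""
    return slope * (x_to - x_from)


# PLACEHOLDER data (not from the data set): consumption and CHD rate
# for four imaginary countries.
cigarettes = [1000, 2000, 3000, 4000]
chd_rate = [100, 150, 200, 250]

slope, intercept = fit_line(cigarettes, chd_rate)
change = predicted_change(slope, 3900, 1950)
print(f"slope={slope:.3f}, intercept={intercept:.1f}, predicted change={change:.1f}")
```

Because the model is linear, the predicted change depends only on the slope: halving consumption shifts the prediction by the slope times -1950, whatever the intercept is.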
|yes||smcdaniel04||Sep 29, 2011||267B||1148|
Anscombe's 4 data sets for regression. They are very different, yet have the same correlation and regression coefficients.
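The claim can be checked numerically. The sketch below uses the commonly published values of Anscombe's quartet (Anscombe, 1973), on the assumption that the shared file holds the same numbers. `summarize` is a hypothetical helper that returns the least-squares slope, intercept and Pearson correlation.

```python
from math import sqrt

# Commonly published quartet values; assumed (not verified) to match the shared file.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return (slope, intercept, r) for the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy / sqrt(sxx * syy)


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, (slope, intercept, r) in SUMMARIES.items():
    print(f"{name}: slope={slope:.2f} intercept={intercept:.2f} r={r:.2f}")
```

All four sets give a slope of about 0.50, an intercept of about 3.00 and a correlation of about 0.82, which is why plotting each set is the only way to see how different they are.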
|firstname.lastname@example.org||May 31, 2011||360B||215|
Mother and Daughter Heights.xls|
This data set is Galton's Mother and Daughter data set as used in Sanford Weisberg's Applied Linear Regression, 3rd Edition.
|yes||craig_slinkman||Apr 10, 2010||13KB||1123|
Seating Choice versus GPA (For 3 rows, with Text and Indicator Columns)|
This dataset contains hypothetical (I believe) data on GPA for students who sit in the front, middle, and back rows of a classroom, as well as a hypothetical gender variable. The data are shown using both text variables (e.g., "front" and "middle") and 0/1 indicator variables for the row and gender variables. This dataset is useful for demonstrating the different ways that StatCrunch can compare means based on two factors: (a) the text factor columns can be used in a two-way ANOVA; and (b) the 0/1 indicator columns can be used in multiple regression. (Because of StatCrunch's current limitation on equal cells, the 0/1 variables only use the first and middle rows.) Both procedures give the same p-value and the same conclusion (as long as the interaction term is centered), thus highlighting the similarity of statistical procedures and StatCrunch's flexibility.
|yes||bartonpoulson||Apr 08, 2010||1KB||787|
Seating Choice versus GPA (Stacked & Split Columns for Front & Back Rows)|
This dataset contains hypothetical (I believe) data on GPA for students who sit in the front and back rows of a classroom. The data are shown in several ways: (a) two separate columns (one for the front row GPA and another for the back row GPA); (b) stacked with one column to indicate front or back row and another column with the GPAs; and (c) the row column repeated as a 0/1 indicator variable.
This dataset is useful for comparing the different ways that StatCrunch can compare the means of two groups: (a) The two columns of scores (front and back) can be used in the 2-sample t-test or a one-way ANOVA; (b) the stacked text column (front/back) with a separate column for GPA can also be used for one-way ANOVA; and (c) the 0/1 indicator column and stacked GPAs can be used with correlation and regression. Every procedure gives the same p-value and same conclusion, thus highlighting the similarity of statistical procedures and StatCrunch's flexibility.
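The equivalence described above can be seen directly in the test statistics. The sketch below assumes the 2-sample t-test uses a pooled variance, which is the case in which it matches regression on a 0/1 indicator. The GPA values are hypothetical stand-ins, not the values in this data set, and `pooled_t` and `slope_t` are illustrative helper names.

```python
from math import sqrt


def pooled_t(group0, group1):
    """Two-sample t statistic (group1 minus group0) with pooled variance."""
    n0, n1 = len(group0), len(group1)
    m0, m1 = sum(group0) / n0, sum(group1) / n1
    ss0 = sum((y - m0) ** 2 for y in group0)
    ss1 = sum((y - m1) ** 2 for y in group1)
    pooled_var = (ss0 + ss1) / (n0 + n1 - 2)
    return (m1 - m0) / sqrt(pooled_var * (1 / n0 + 1 / n1))


def slope_t(xs, ys):
    """t statistic for the slope in a simple linear regression of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope / sqrt(sse / (n - 2) / sxx)


# Hypothetical GPAs in the two-column layout.
front = [3.6, 3.2, 3.8, 3.4]
back = [2.9, 3.1, 2.5, 3.3]

# Stacked layout: 0/1 indicator (1 = back row) plus one GPA column.
indicator = [0] * len(front) + [1] * len(back)
gpa = front + back

t_two_sample = pooled_t(front, back)
t_regression = slope_t(indicator, gpa)
f_anova = t_two_sample ** 2  # with two groups, the one-way ANOVA F equals t squared
print(t_two_sample, t_regression, f_anova)
```

Both t statistics have the same value and the same degrees of freedom (n - 2), so they lead to the same p-value. A t-test that does not pool the variances would generally differ slightly.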
|yes||bartonpoulson||Apr 08, 2010||465B||409|
Report on the Loss of the ‘Titanic’ (S.S.) (1990), British Board of Trade Inquiry Report (reprint), Gloucester, UK: Allan Sutton Publishing. Taken from the Journal of Statistics Education Archive, submitted by email@example.com. Dr. Craig Slinkman has recoded the data as self-explanatory nominal variables.
|yes||craig_slinkman||Mar 23, 2010||61KB||468|
Home Runs and Strike Outs for 2004 Boston Red Sox by Handedness|
These data show home runs and strike outs for the 12 players from the Boston Red Sox who had more than 200 at-bats in the 2004 season (the first year they won the World Series after the 86-year Curse of the Bambino). It also shows whether the players bat left-handed or as switch hitters, both of which are coded as 0/1 (No/Yes, respectively) indicator variables (also known as dummy variables), as well as a text L/R/LR variable. These data were used for a demonstration for bivariate and multiple regression.
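A minimal sketch of how a text handedness column can be turned into the two 0/1 indicator (dummy) variables described above. It assumes that "LR" marks a switch hitter, that right-handed batters are the reference category (both indicators 0), and that a switch hitter is not also counted as left-handed. The function name and the sample column are hypothetical.

```python
def handedness_indicators(code):
    """Map a text batting code to (bats_left, switch_hitter) 0/1 indicators."""
    code = code.strip().upper()
    if code not in {"L", "R", "LR"}:
        raise ValueError(f"unexpected handedness code: {code!r}")
    bats_left = 1 if code == "L" else 0
    switch_hitter = 1 if code == "LR" else 0
    return bats_left, switch_hitter


# Hypothetical text column, not the actual roster.
bats = ["R", "L", "LR", "R"]
indicators = [handedness_indicators(b) for b in bats]
print(indicators)
```

Three categories need only two indicators; adding a third column for right-handed batters would make the columns perfectly collinear with the intercept in a multiple regression.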
|yes||bartonpoulson||Nov 03, 2009||375B||391|
worked out answer
|yes||dlemay@sc||Mar 08, 2009||192B||75|
Ebay Regression||yes||statcrunch||Feb 25, 2009||2KB||48|
Here is a data set that students in one group of our introductory course generated for this activity. Only three drops at each of eleven heights were made here, but this should provide an idea of the type of data that would be collected.
|yes||smcdanie%sc||Jul 04, 2008||219B||118|
|Regression data sets||yes||marshcc||Mar 12, 2013||4KB||30|
|Chet : Understanding Regression Analysis||firstname.lastname@example.org||Feb 11, 2013||29B||14|
For use in TCC MTH 157 class for regression project.
|yes||krd29491||Feb 07, 2013||331B||199|
|Responses to Sleep Survey|
Topic: Sleeping Habits
Course: STA 220 (statistics)
Semester: Fall 2013
Name: Tiffany Turner
Sleep is a behavioral state that is a natural part of everybody's life. Humans spend about 1/3 of their lives asleep. People generally know little about the importance of sleep. Sleep is not just something to fill time when a person is inactive. Sleep is a required activity, not an option. Even though the precise functions of sleep remain a mystery, sleep is important for normal motor and cognitive function. We all recognize and feel the need to sleep. After sleeping, we recognize changes that have occurred, as we feel rested and more alert. Sleep actually appears to be required for survival.
Data were collected through a survey in which individuals were asked about their sleeping habits and their age. The survey was given to people in my family and to some StatCrunch members who chose to participate. Fourteen people participated in my survey. The data obtained were analyzed with the StatCrunch data analysis package available at www.statcrunch.com.
Analysis and Results:
Both descriptive and inferential analyses were done at www.statcrunch.com.
A. Descriptive Data Analysis: A pie chart was used to describe the sample data, since the respondents were of many different ages.
B. Test analysis (inferential statistics): Regression analysis was done using StatCrunch to identify the existence of a correlation between the participants' weekday and weekend sleeping habits. A similar analysis was also done for a possible correlation between age and hours slept. The results indicate almost zero correlation between an individual's age and sleeping habits.
The linear regression results obtained contradicted my initial belief that an individual's age will not increase the hours an individual will sleep. The respondents could have provided inaccurate data, since there is no way to verify the information obtained from the survey. Also, the 14 individuals from my family and HCTC who were surveyed may have been dominantly age observers who sleep more during weekends than weekdays. Given the above, it would not be accurate to conclude that there is no correlation between age and how much sleep one gets. Proper methods of data collection such as observation may be appropriate for this type of study.
|yes||tturner0090||Jan 28, 2013||572B||61|