
Data sets shared by StatCrunch members
Showing 1 to 15 of 663 data sets matching EXAM
Data Set/Description Owner Last edited Size Views
All MLB Salaries (1985-2015)
This data set contains all MLB player salaries from 1985 to 2015, including the team played for, the city, and a unique ID for each player. In total, it includes 25,575 salaries for 4,963 different baseball players.
The player ID is the first 5 letters from the last name, followed by the first two letters from the first name, followed by a number in case of duplicate names. For example, bondsba01 stands for Barry Bonds with "01" because he's the first with the "bondsba" name ID.
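The ID rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `make_player_id` is hypothetical, and the lowercasing, the removal of non-letter characters, and the two-digit zero padding are assumptions inferred from the single `bondsba01` example, not a documented specification.

```python
def make_player_id(first_name: str, last_name: str, sequence: int = 1) -> str:
    """Build an ID from the first 5 letters of the last name, the first 2
    letters of the first name, and a sequence number for duplicate names."""
    # Assumption: IDs are lowercase and ignore punctuation and spaces.
    last = "".join(ch for ch in last_name.lower() if ch.isalpha())[:5]
    first = "".join(ch for ch in first_name.lower() if ch.isalpha())[:2]
    # Assumption: the sequence number is zero-padded to two digits.
    return f"{last}{first}{sequence:02d}"


print(make_player_id("Barry", "Bonds"))  # bondsba01
```

Last names shorter than five letters simply contribute fewer characters in this sketch; how the real data set handles that case is not stated in the description.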
Owner: statcrunch_featured; Last edited: Jun 27, 2017; Size: 1MB; Views: 4380
Median sales price vs Median rent for housing in 50 cities
This data is obtained from Zillow and includes the median sales price and the median price to rent a home in 50 cities, as of July 2018, taken from https://www.zillow.com/research/local-market-reports/. This is an excellent data set for introducing correlation and regression: can we predict the median rent in a city based on the median price of homes sold in the city? It is also a good example for discussing the effect of outliers.
Owner: rosenthi; Last edited: Sep 11, 2018; Size: 1KB; Views: 496
Times World University Rankings (2011-2016)
This data comes from the annual Times Higher Education rankings of universities across the world. The webpage for the Times 2016 rankings is listed above in the source.
The formula for the 2016 rankings is as follows:
30% for Teaching Rating
7.5% for International Outlook Rating
30% for Research Rating
30% for Citations Rating
2.5% for Industry Income Rating.
The “Total Score” from 2016 can be recreated using this formula.
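As a rough sketch of that recreation, the weighted sum can be written as below. The column names follow the data set's column list; the function name `total_score` and the example ratings are hypothetical and are not taken from the data.

```python
from typing import Mapping

# Weights from the 2016 formula stated in the description (they sum to 100%).
WEIGHTS = {
    "Teaching_Rating": 0.30,
    "Inter_Outlook_Rating": 0.075,
    "Research_Rating": 0.30,
    "Citations_Rating": 0.30,
    "Industry_Income_Rating": 0.025,
}


def total_score(ratings: Mapping[str, float]) -> float:
    """Weighted sum of the five component ratings (each on a 0-100 scale)."""
    return sum(weight * ratings[name] for name, weight in WEIGHTS.items())


# Hypothetical university, for illustration only.
example = {
    "Teaching_Rating": 80.0,
    "Inter_Outlook_Rating": 80.0,
    "Research_Rating": 80.0,
    "Citations_Rating": 80.0,
    "Industry_Income_Rating": 80.0,
}
print(round(total_score(example), 1))  # 80.0, since every rating is 80
```

Scores recreated this way may differ slightly from the published Total_Score because of rounding in the component ratings.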

Column: Description
World_Rank: University rank for a given year.
University_Name: The name of the university.
Country: Location of the university.
Teaching_Rating: Rating on a 0-100 scale of the quality of teaching at the university. This rating is based on the institution’s reputation for teaching, its student/staff ratio, its ratio of PhDs to undergraduate degrees awarded, and its ratio of institutional income to academic staff.
Inter_Outlook_Rating: Rating on a 0-100 scale of the international makeup of a university. This rating is based on the international student percentage, the international staff percentage, and the percentage of research papers from the university that include at least one international author.
Research_Rating: Rating on a 0-100 scale of the quality of research at the university. This rating is based on the university’s reputation, its ratio of research income to academic staff, and its production of scholarly papers.
Citations_Rating: Rating on a 0-100 scale based on the normalized average number of citations per paper from the university (how often the university's research is cited by other papers).
Industry_Income_Rating: Rating on a 0-100 scale grading how much companies are willing to invest in the university’s research. The rating is calculated from the research income from businesses per academic staff member.
Total_Score: The final score used to determine the university ranking, based on Teaching_Rating, Inter_Outlook_Rating, Research_Rating, Citations_Rating, and Industry_Income_Rating.
Num_Students: Total number of students in a given year.
Student/Staff_Ratio: Number of students per academic staff member.
%_Inter_Students: Percentage of the student body who come from a foreign country.
%_Female_Students: Percentage of the student body that is female.
Year: Academic year in which the ranking was released. For example, 2016 denotes the 2015-2016 academic year.
Owner: statcrunchhelp; Last edited: Apr 5, 2016; Size: 254KB; Views: 3696
Baseball2013.xlsx
Stats from the major league baseball teams for 2013. The last column I added denotes AL for American League and NL for National League. One could possibly conduct a two-sample means test, for example, to find out whether the average runs for the two leagues are equal. Or there are of course lots of regressions one could run.
Owner: eykolo; Last edited: Nov 4, 2013; Size: 3KB; Views: 1986
All MLB Salaries (1985-2015)
This data set contains all MLB player salaries from 1985 to 2015, including the team played for, the city, and a unique ID for each player. In total, it includes 25,575 salaries for 4,963 different baseball players.
The player ID is the first 5 letters from the last name, followed by the first two letters from the first name, followed by a number in case of duplicate names. For example, bondsba01 stands for Barry Bonds with "01" because he's the first with the "bondsba" name ID.
Owner: statcrunchhelp; Last edited: Mar 15, 2016; Size: 1MB; Views: 1483
RegisteredNursesSurvey.xlsx
For the survey that produced this data, see http://www.statcrunch.com/5.0/survey.php?surveyid=8178&code=YINVQ; the data also reflect the inputs of all teammates. Towards the end, some validation was done: observations were deleted where working hours were less than a work day or were outliers relative to legally admissible work days. Finally, arbitrarily long chains that were less likely to be encountered in draws of simulated data (M/F, degrees, etc.) were discarded. A total of 12 observations were thus thrown out. All credit goes to Team 3, the instructor, our unnamed friends in the nursing profession who enthusiastically made a last-minute push for data through their extended social media groups, and the respondents who kindly took the time to complete the survey. Another thought concerns the distribution of hours worked. Even if random, it "should be" "centered on" certain hours a day times the number of days, with deviations from the centre penalised when picking a sample. The value 38 appears many times, for example, without an explainable reason (we are talking about the distribution of work among a sample of nursing staff). So do the "primes" 47, 37, and 29. This is not to argue that they "shouldn't occur", but there has to be some reason for their being so prominent. At this stage we may conclude that most of the respondents may not have been in full-time nursing employment in the strict sense of the term. Values such as 42, 48, 72, 60, 50, and 40 appearing more often would give us less variation and more regularity in the data. Since we haven't tried stratification, we do not know "how often they should occur". We therefore do not re-draw observations.
Owner: ugoagwu; Last edited: Jun 14, 2014; Size: 2KB; Views: 953
Body Image Data Set
A student survey was conducted at a major university. Data were collected from a random sample of 239 undergraduate students, and the information collected included physical characteristics (such as height, handedness, etc.), study habits, academic performance and attitudes, and social behaviors. In this exercise, we will focus on exploring relationships between some of those variables. Note that empty boxes signify that the observation is not available (this is known as a 'missing value').
Variables:
Gender - Male or Female
Height - Self-reported height (in inches)
GPA - Student's cumulative college GPA
HS_GPA - Student's high school GPA (senior year)
Seat - Typical classroom seat location (F = Front, M = Middle, B = Back)
WtFeel - Does the student feel that he/she is: Underweight, About Right, Overweight
Cheat - Would the student tell the instructor if he/she saw somebody cheating on an exam? (No or Yes)
Owner: stjohn314; Last edited: Apr 8, 2018; Size: 10KB; Views: 2311
Alcohol data from adults
My group and I designed a survey to find out, among adults who drink, why they drink, their age, their education level, and how many drinks they have per day. The data were gathered individually and put together in StatCrunch by one member of the group. This survey shows the number of drinking adults and what motivates them to drink. Our survey questions are below.
1. Do you drink alcohol? Circle one: Y N
2. What is your age? ____ years
3. What is your gender? Circle one: Male Female
4. Are you having an increasing number of: A. Financial problems B. Family problems C. Work problems D. Health problems E. Financial and family problems F. Financial, health, and family problems G. Family and work problems H. Financial, family, and work problems I. None of the above. Circle one.
5. How many drinks do you have a week? _____ drinks
6. Education: What is the highest degree or level of school you have completed? If currently enrolled, mark the previous grade or highest degree received. A. No schooling completed B. Nursery school to 8th grade C. 9th, 10th or 11th grade D. 12th grade, no diploma E. High school graduate - high school diploma or the equivalent (for example: GED) F. Some college credit, but less than 1 year G. 1 or more years of college, no degree H. Associate degree (for example: AA, AS) I. Bachelor's degree (for example: BA, AB, BS) J. Master's degree (for example: MA, MS, MEng, MEd, MSW, MBA) K. Professional degree (for example: MD, DDS, DVM, LLB, JD). Circle one.
----- Original Message -----
Sent on: Tuesday, May 22, 2012 11:46 PM
Hi. It looks good. Change: 2. What is your gender? Circle one: Male Female Other to 2. What is your gender? Circle one: Male Female Other Since I do not think you will get someone answering as Other. In #3, I forgot another option: 3. Are you having an increasing number of A. Financial problems B. family problems C. Work problems D. Financial and family problems E. financial and family problems F. Family and work problems G. Financial, Family, and work problems H. none of the above Circle one.
Owner: rosesege; Last edited: Jun 21, 2012; Size: 9KB; Views: 4631
AP Statistics Predictions 2013-16
GPA = Student's weighted GPA before beginning AP Statistics
PrevMath = The highest math course the student completed at our school prior to AP Stats
AP.Ave = The student's average score on the AP exams taken (if available)
MathGPA = Unweighted GPA of student's work in math courses
MT.MC = Student's number correct (out of 40) on the multiple choice section of their midterm (MT)
MT.Raw = Student's raw score (out of 100) on the multiple choice and free response sections of a previously released AP exam
Locus.Aug = Student's score (out of 100) on the LOCUS diagnostic test at the beginning of the school year
S1P = Student's first semester grade as a percentage
S1G = Student's first semester letter grade
S1F = Student's (scaled) first semester final exam grade (a.k.a. midterm test grade)
S2P = Student's second semester grade as a percentage
S2G = Student's second semester letter grade
Ch 1-4 = Student's raw test average on ch. 1-4
Ch 1-6 = Student's raw test average on ch. 1-6
Ch 1-8 = Student's raw test average on ch. 1-8
MT = Student's raw test average on the midterm
Ch 1-12 = Student's raw test average on ch. 1-12 (entire textbook)
Mock 1 = Student's raw score on first mock exam (mid-March)
Mock 2 = Student's raw score on second mock exam (late April)
Mock 1&2 = Student's average on two mock exams
MT&Mock1&2 = Student's average on midterm and two mock exams
MT.AP = Student's converted score (1-5) on midterm
Mocks.AP = Student's converted score (1-5) on average of two mock exams
MT&Mocks.AP = Student's converted score (1-5) on average of MT and two mock exams
ACTUAL = Student's actual performance on AP exam (blank means student opted out of taking exam)
MT.Resid = Actual score - midterm score
Mocks.Resid = Actual score - average mock exam score
MT&Mocks.Resid = Actual score - average midterm and mock exam score
Owner: je175; Last edited: Jul 5, 2016; Size: 9KB; Views: 1936
Housing Price Data
This is an example of the relationship between housing prices and the square footage of the house, the age of the house, and whether the house has a finished basement.
Owner: jpalmateer; Last edited: Nov 7, 2013; Size: 3KB; Views: 1673
Treatment Effects of a Drug on Cognitive Functioning in Children with Mental Retardation and ADHD
Research conducted by: Pearson et al.
Case study prepared by: David Lane and Emily Zitek
Overview: This study investigated the cognitive effects of stimulant medication in children with mental retardation and Attention-Deficit/Hyperactivity Disorder. This case study shows the data for the Delay of Gratification (DOG) task. Children were given various dosages of a drug, methylphenidate (MPH), and then completed this task as part of a larger battery of tests. The order of doses was counterbalanced so that each dose appeared equally often in each position. For example, six children received the lowest dose first, six received it second, etc. The children were on each dose for one week before testing. This task, adapted from the preschool delay task of the Gordon Diagnostic System (Gordon, 1983), measures the ability to suppress or delay impulsive behavioral responses. Children were told that a star would appear on the computer screen if they waited long enough to press a response key. If a child responded less than four seconds after their previous response, they did not earn a star, and the 4-second counter restarted. The DOG differentiates children with and without ADHD of normal intelligence (e.g., Mayes et al., 2001), and is sensitive to MPH treatment in these children (Hall & Kataria, 1992).
Questions to Answer: Does higher dosage lead to higher cognitive performance (measured by the number of correct responses to the DOG task)?
Design Issues: This is a repeated-measures design because each participant performed the task after each dosage.
Variable Description:
Placebo - Number of correct responses after taking a placebo
d15 - Number of correct responses after taking .15 mg/kg of the drug
d30 - Number of correct responses after taking .30 mg/kg of the drug
d60 - Number of correct responses after taking .60 mg/kg of the drug
Owner: kari.taylor; Last edited: Oct 22, 2014; Size: 434B; Views: 1387
Cigarette Consumption vs CHD Mortality
Now that cigarette smoking has been clearly tied to lung cancer, researchers are focusing on possible links to other diseases. The data below show annual rates of cigarette consumption and deaths from coronary heart disease for several nations. Some public health officials are urging that the US adopt a national goal of cutting cigarette consumption in half over the next decade. Examine these data and write a report. In your report you should:
1. Include appropriate graphs (e.g. scatterplot, residual plot) and statistics (e.g. mean and SD);
2. Describe the association between cigarette smoking and coronary heart disease;
3. Create a linear model;
4. Evaluate the strength and appropriateness of your model;
5. Interpret the slope and y-intercept of the line;
6. Use your model to estimate the potential benefits of reaching the national goal proposed for the US. That is, based on your linear model, if the US were to cut its cigarette consumption in half (from 3900 to 1950), what does the linear model predict would happen to the CHD rate?
7. Use StatCrunch to generate nice-looking graphs and output as needed. Be sure to size them appropriately. There is no need for an 8x10 scatterplot; make your graphs about 3x3. Scale them in StatCrunch first, then copy and paste them into Word.
Owner: smcdaniel04; Last edited: Sep 29, 2011; Size: 267B; Views: 5310
Diet
This is a small data set used to illustrate how an inappropriately applied independent means test fails compared with a paired test on the same data. The story is that we have before and after weights for 6 customers of a weight loss clinic. Visual inspection makes it clear that the clinic is effective (except in one questionable case). Students can discuss the sources of the variation found in the data set and relate them to the assumptions of the independent versus paired analysis models. Classical techniques will produce an extremely large p-value for the independent analysis and a significant p-value for the paired analysis. To illustrate the difference with simulation techniques, first do a randomization for two means between the before and after data groups. This will spectacularly fail to show a difference, when in fact there is a clear difference. Then use a bootstrap to examine the 6 differences; it is clear that a zero difference is highly unlikely.
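The contrast between the two classical analyses can be sketched with the standard library alone. The weights below are hypothetical values invented for illustration and are not the values in this data set; the sketch computes only the two t statistics (Welch's form for the independent case), not p-values, and it does not reproduce the randomization or bootstrap steps.

```python
import math
import statistics

# Hypothetical before/after weights for 6 customers (NOT the real data).
before = [150, 180, 210, 240, 270, 300]
after = [145, 176, 204, 235, 266, 294]


def independent_t(x, y):
    """Welch two-sample t statistic, treating x and y as unrelated groups."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se


def paired_t(x, y):
    """Paired t statistic, computed on the within-customer differences."""
    diffs = [a - b for a, b in zip(x, y)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se


print(f"independent t = {independent_t(before, after):.2f}")
print(f"paired t      = {paired_t(before, after):.2f}")
```

In this made-up example the customer-to-customer spread in weight swamps the small average loss in the independent analysis, while the paired analysis removes that spread by working with each customer's own difference.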
Owner: david.zeitler; Last edited: May 19, 2011; Size: 87B; Views: 1710
Body Image Data Set
A student survey was conducted at a major university. Data were collected from a random sample of 239 undergraduate students, and the information collected included physical characteristics (such as height, handedness, etc.), study habits, academic performance and attitudes, and social behaviors. In this exercise, we will focus on exploring relationships between some of those variables. Note that empty boxes signify that the observation is not available (this is known as a 'missing value').
Variables:
Gender - Male or Female
Height - Self-reported height (in inches)
GPA - Student's cumulative college GPA
HS_GPA - Student's high school GPA (senior year)
Seat - Typical classroom seat location (F = Front, M = Middle, B = Back)
WtFeel - Does the student feel that he/she is: Underweight, About Right, Overweight
Cheat - Would the student tell the instructor if he/she saw somebody cheating on an exam? (No or Yes)
Owner: 33225049_ecollege_sacmlp; Last edited: Jul 1, 2015; Size: 7KB; Views: 7304
Low Birth Weight Study
SOURCE: Hosmer and Lemeshow (2000) Applied Logistic Regression: Second Edition. Data were collected at Baystate Medical Center, Springfield, Massachusetts during 1986.
DESCRIPTIVE ABSTRACT: The goal of this study was to identify risk factors associated with giving birth to a low birth weight baby (weighing less than 2500 grams). Data were collected on 189 women, 59 of whom had low birth weight babies and 130 of whom had normal birth weight babies. Four variables which were thought to be of importance were age, weight of the subject at her last menstrual period, race, and the number of physician visits during the first trimester of pregnancy.
LIST OF VARIABLES:
Columns   Variable                                                  Abbreviation
-----------------------------------------------------------------------------
2-4       Identification Code                                       ID
10        Low Birth Weight (0 = Birth Weight >= 2500g,              LOW
          1 = Birth Weight < 2500g)
17-18     Age of the Mother in Years                                AGE
23-25     Weight in Pounds at the Last Menstrual Period             LWT
32        Race (1 = White, 2 = Black, 3 = Other)                    RACE
40        Smoking Status During Pregnancy (1 = Yes, 0 = No)         SMOKE
48        History of Premature Labor (0 = None, 1 = One, etc.)      PTL
55        History of Hypertension (1 = Yes, 0 = No)                 HT
61        Presence of Uterine Irritability (1 = Yes, 0 = No)        UI
67        Number of Physician Visits During the First Trimester     FTV
          (0 = None, 1 = One, 2 = Two, etc.)
73-76     Birth Weight in Grams                                     BWT
-----------------------------------------------------------------------------
PEDAGOGICAL NOTES: These data have been used as an example of fitting a multiple logistic regression model.
STORY BEHIND THE DATA: Low birth weight is an outcome that has been of concern to physicians for years. This is due to the fact that infant mortality rates and birth defect rates are very high for low birth weight babies. A woman's behavior during pregnancy (including diet, smoking habits, and receiving prenatal care) can greatly alter the chances of carrying the baby to term and, consequently, of delivering a baby of normal birth weight. The variables identified in the code sheet given in the table have been shown to be associated with low birth weight in the obstetrical literature. The goal of the current study was to ascertain if these variables were important in the population being served by the medical center where the data were collected.
References: 1. Hosmer and Lemeshow, Applied Logistic Regression, Wiley, (1989).
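The column positions in the code sheet above suggest a fixed-width text layout. A minimal sketch of reading one such record follows, assuming the positions are 1-based and inclusive and that every field holds an integer. The names `FIELDS`, `parse_record`, and `format_record`, and the sample values, are hypothetical and are not taken from the data set.

```python
# (abbreviation, first column, last column), 1-based and inclusive,
# copied from the code sheet in the description.
FIELDS = [
    ("ID", 2, 4), ("LOW", 10, 10), ("AGE", 17, 18), ("LWT", 23, 25),
    ("RACE", 32, 32), ("SMOKE", 40, 40), ("PTL", 48, 48), ("HT", 55, 55),
    ("UI", 61, 61), ("FTV", 67, 67), ("BWT", 73, 76),
]
RECORD_WIDTH = 76


def parse_record(line):
    """Slice one fixed-width line into a dict of integer fields."""
    padded = line.ljust(RECORD_WIDTH)
    return {name: int(padded[start - 1:end]) for name, start, end in FIELDS}


def format_record(values):
    """Inverse of parse_record: right-align each value in its columns."""
    buffer = [" "] * RECORD_WIDTH
    for name, start, end in FIELDS:
        buffer[start - 1:end] = list(str(values[name]).rjust(end - start + 1))
    return "".join(buffer)


# Hypothetical record, for illustration only.
sample = {"ID": 101, "LOW": 1, "AGE": 24, "LWT": 130, "RACE": 1, "SMOKE": 1,
          "PTL": 0, "HT": 0, "UI": 0, "FTV": 2, "BWT": 2400}
line = format_record(sample)
print(parse_record(line) == sample)  # True
```

Building the sample line with `format_record` avoids hand-counting spaces. The copy shared on StatCrunch may already be split into columns, in which case no parsing of this kind is needed.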
Owner: wikipeterson; Last edited: Jul 23, 2012; Size: 6KB; Views: 7199
