StatCrunch

Data sets shared by StatCrunch members
Showing 1 to 15 of 434 data sets matching "Random"
Data Set/Description | Owner | Last edited | Size | Views
Class Seating vs Grade
From the Body Image data set: "A student survey was conducted at a major university. Data were collected from a random sample of 239 undergraduate students." Variables: Gender (Male or Female); GPA (student's cumulative college GPA), which is then converted to a letter grade (4.33 = A+, 4.00 = A, 3.67 = A-, 3.33 = B+, 3.00 = B, 2.67 = B-, 2.33 = C+, 2.00 = C, 1.67 = C-); Seat (typical classroom seat location: Front or Back).
Owner: mallirhea86 | Last edited: Oct 26, 2018 | Size: 2KB | Views: 3611
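The description lists the grade points for each letter grade but does not say how a GPA that falls between two listed values is converted. A minimal Python sketch, assuming a GPA maps to the highest listed grade whose point value does not exceed it (so 3.5 becomes B+); the names `GRADE_SCALE` and `gpa_to_grade` are hypothetical.

```python
from typing import Optional

# Grade points listed in the data set description, highest first.
GRADE_SCALE = [
    (4.33, "A+"), (4.00, "A"), (3.67, "A-"),
    (3.33, "B+"), (3.00, "B"), (2.67, "B-"),
    (2.33, "C+"), (2.00, "C"), (1.67, "C-"),
]


def gpa_to_grade(gpa: float) -> Optional[str]:
    """Map a cumulative GPA to a letter grade.

    ASSUMPTION: GPAs between listed values are rounded down to the
    nearest listed grade; the description does not state the rule.
    GPAs below 1.67 fall outside the listed scale and return None.
    """
    for points, grade in GRADE_SCALE:
        if gpa >= points:
            return grade
    return None


print(gpa_to_grade(3.5))  # B+
```

A nearest-value rule would give a different answer for a GPA such as 3.6 (A- instead of B+), so it is worth checking which rule the data set actually used.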
North Carolina birth data
A random sample of 1000 births from the state of North Carolina. Plurality refers to the number of children associated with the birth. Gender: 1=Male, 2=Female. fage is the age of the father (years); mage is the age of the mother (years); visits is the number of prenatal medical visits; marital is 1=married, 2=unmarried; racemom is the race of the mother (0=Other Non-white, 1=White, 2=Black, 3=American Indian, 4=Chinese, 5=Japanese, 6=Hawaiian, 7=Filipino, 8=Other Asian or Pacific Islander); hispmom is whether the mother is of Hispanic origin (C=Cuban, M=Mexican, N=Non-Hispanic, O=Other and Unknown Hispanic, P=Puerto Rican, S=Central/South American, U=Not Classifiable); gained is weight gain during pregnancy (pounds); lowbw indicates whether birth weight is 2500 grams or lower; tpounds is birth weight in pounds; smoke indicates whether the mother admitted to smoking (0=no, 1=yes); mature indicates whether the mother is 35 or older (0=no, 1=yes); premie indicates whether the baby was born at 36 weeks or sooner (0=no, 1=yes).
Owner: jph422 | Last edited: Sep 8, 2008 | Size: 37KB | Views: 5196
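Three of the indicator variables (lowbw, mature, premie) are thresholds on other measurements, so they can be recomputed as a consistency check. A minimal Python sketch, assuming birth weight is taken from tpounds and the 2500-gram cutoff is converted with 1 lb = 453.59237 g. The gestation-length field `weeks` is hypothetical: the description defines premie but lists no gestation variable. The class and function names are also hypothetical.

```python
from dataclasses import dataclass

GRAMS_PER_POUND = 453.59237  # international avoirdupois pound


@dataclass
class Birth:
    mage: int        # age of mother (years)
    weeks: int       # HYPOTHETICAL: gestation length in weeks
    tpounds: float   # birth weight in pounds


def derived_indicators(b: Birth) -> dict:
    """Recompute the 0/1 indicators described for this data set."""
    return {
        # lowbw: birth weight of 2500 grams or lower
        "lowbw": int(b.tpounds * GRAMS_PER_POUND <= 2500),
        # mature: mother is 35 or older
        "mature": int(b.mage >= 35),
        # premie: born at 36 weeks or sooner
        "premie": int(b.weeks <= 36),
    }


print(derived_indicators(Birth(mage=36, weeks=35, tpounds=5.5)))
# {'lowbw': 1, 'mature': 1, 'premie': 1}
```

In pounds, the 2500 g cutoff is about 5.51 lb, so a 5.5 lb baby counts as low birth weight.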
1970 Draft Lottery Data
In 1970, Congress instituted a random selection process for the military draft. All 366 possible birth dates were placed in plastic capsules in a rotating drum and were selected one by one. The first date drawn from the drum received draft number one and eligible men born on that date were drafted first. In a truly random lottery there should be no relationship between the date and the draft number. However, this dataset suggests that men born later in the year were more likely to be drafted.
Owner: cdcummings12 | Last edited: Jun 1, 2010 | Size: 8KB | Views: 3259
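The statement that a truly random lottery shows "no relationship" between date and draft number can be made concrete by simulation: shuffle the 366 draft numbers uniformly at random many times and record the correlation between day of year and draft number each time. A minimal Python sketch; the function names are hypothetical, and the 1970 data are not included, so `observed_r` has to be computed from the shared data set.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def fair_lottery_correlations(reps=2000, n_days=366, seed=0):
    """Correlations between day of year and draft number under fair draws."""
    rng = random.Random(seed)
    days = list(range(1, n_days + 1))
    results = []
    for _ in range(reps):
        numbers = days[:]
        rng.shuffle(numbers)  # every ordering of draft numbers equally likely
        results.append(pearson(days, numbers))
    return results


def two_sided_p_value(observed_r, null_rs):
    """Share of simulated fair lotteries at least as extreme as observed_r."""
    return sum(abs(r) >= abs(observed_r) for r in null_rs) / len(null_rs)
```

Because both day numbers and draft numbers are permutations of 1 to 366, the Pearson correlation here equals the Spearman rank correlation. If men born later in the year tended to be drafted earlier, they received lower draft numbers, so the observed correlation would be negative.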
Happiness Data from GSS.xls
These data come from the 2008 General Social Survey. A subset of 190 respondents was selected at random from the full data set. Children = number of children. Education = highest year of education (e.g., 12 = high school; 16 = bachelor's degree). Happy: 1 = Not too happy, 2 = Pretty happy, 3 = Very happy. Health: 1 = Poor, 2 = Fair, 3 = Good, 4 = Excellent. Income: 1 = Under $1000; 2 = $1000-2999; 3 = $3000-3999; 4 = $4000-4999; 5 = $5000-5999; 6 = $6000-6999; 7 = $7000-7999; 8 = $8000-9999; 9 = $10000-12499; 10 = $12500-14999; 11 = $15000-17499; 12 = $17500-19999; 13 = $20000-22499; 14 = $22500-24999; 15 = $25000-29999; 16 = $30000-34999; 17 = $35000-39999; 18 = $40000-49999; 19 = $50000-59999; 20 = $60000-74999; 21 = $75000-89999; 22 = $90000-109999; 23 = $110000-129999; 24 = $130000-149999; 25 = $150000+. Married: 0 = No, 1 = Yes. Religious: 1 = Not religious, 2 = Slightly religious, 3 = Moderately religious, 4 = Very religious.
Owner: jacobgsimons | Last edited: Apr 20, 2010 | Size: 5KB | Views: 3982
North Carolina premature births
A random sample of 1000 births from the state of North Carolina. Plurality refers to the number of children associated with the birth. Gender: 1=Male, 2=Female. fage is the age of the father (years); mage is the age of the mother (years); visits is the number of prenatal medical visits; marital is 1=married, 2=unmarried; racemom is the race of the mother (0=Other Non-white, 1=White, 2=Black, 3=American Indian, 4=Chinese, 5=Japanese, 6=Hawaiian, 7=Filipino, 8=Other Asian or Pacific Islander); hispmom is whether the mother is of Hispanic origin (C=Cuban, M=Mexican, N=Non-Hispanic, O=Other and Unknown Hispanic, P=Puerto Rican, S=Central/South American, U=Not Classifiable); gained is weight gain during pregnancy (pounds); lowbw indicates whether birth weight is 2500 grams or lower; tpounds is birth weight in pounds; smoke indicates whether the mother admitted to smoking (0=no, 1=yes); mature indicates whether the mother is 35 or older (0=no, 1=yes); premie indicates whether the baby was born at 36 weeks or sooner (0=no, 1=yes).
Owner: statcrunchhelp | Last edited: Apr 10, 2014 | Size: 4KB | Views: 2151
BODYMEAS.XLS
A random sample of 100 observations from NHANES (which contains more observations). GENDER (1=Male, 2=Female), AGE (years), WEIGHTENG (pounds), HEIGHTENG (inches), SIXFOOT (0=No, 1=Yes to being 72 inches or taller), LEGENG (leg length, inches), WAISTENG (waist circumference, inches), THIGHENG (thigh circumference, inches), WAIST28 (0=No, 1=Yes to having a waist of 28 inches or smaller), HEIGHT65 (0=No, 1=Yes to being 65 inches tall or shorter), BMI30 (0=No, 1=Yes to having a Body Mass Index of 30 or higher), OVER200 (0=No, 1=Yes to weighing 200 pounds or more).
Owner: jph422 | Last edited: Sep 16, 2008 | Size: 4KB | Views: 3783
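BMI30 depends on body mass index, which in English units is 703 × weight (lb) / height (in)². A minimal Python sketch that recomputes the threshold indicators from WEIGHTENG, HEIGHTENG and WAISTENG, treating WEIGHTENG as weight in pounds (the OVER200 definition implies pounds). It assumes BMI30 was derived with this English-unit formula, which the description does not state; the function names are hypothetical.

```python
def bmi_english(weight_lb: float, height_in: float) -> float:
    """Body mass index from pounds and inches (703 x lb / in^2)."""
    return 703 * weight_lb / height_in ** 2


def body_indicators(weight_lb: float, height_in: float, waist_in: float) -> dict:
    """Recompute the 0/1 indicator columns described for BODYMEAS.XLS.

    ASSUMPTION: BMI30 uses the English-unit BMI formula above.
    """
    return {
        "SIXFOOT": int(height_in >= 72),   # 72 inches or taller
        "HEIGHT65": int(height_in <= 65),  # 65 inches or shorter
        "WAIST28": int(waist_in <= 28),    # waist 28 inches or smaller
        "BMI30": int(bmi_english(weight_lb, height_in) >= 30),
        "OVER200": int(weight_lb >= 200),  # 200 pounds or more
    }


print(round(bmi_english(200, 70), 1))  # 28.7
```

The example shows why the flags carry different information: a 200 lb, 70 in person has OVER200 = 1 but BMI30 = 0.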
RegisteredNursesSurvey.xlsx
For the survey that produced these data, see http://www.statcrunch.com/5.0/survey.php?surveyid=8178&code=YINVQ; the data also reflect input from all teammates. Towards the end, some validation was done: records were deleted where the hours worked were less than one work day, or where they were outliers relative to legally admissible work days. Finally, arbitrarily long chains that were less likely to be encountered in draws of simulated data (M/F, degrees, etc.) were discarded. In total, 12 observations were thrown out. All credit goes to Team 3; the instructor; our unnamed friends in the nursing profession, who enthusiastically made a last-minute push for data through their extended social media groups; and the respondents who kindly took the time to complete the survey. Another thought concerns the distribution of hours worked. Even if random, it "should be" "centered on" a certain number of hours a day times the number of days, with deviations from the centre penalised when picking a sample. The value 38, for example, appears many times without an explainable reason (we are talking about the distribution of work among a sample of nursing staff), as do the "primes" 47, 37 and 29. This is not to argue that they "shouldn't occur", but there has to be some reason for their being so prominent. At this stage we may conclude that most of the respondents may not have been in full-time nursing employment in the strict sense of the term. Values such as 42, 48, 72, 60, 50 and 40 appearing more often would give us less variation but more regularity in the data. Since we haven't tried stratification, we do not know "how often they should occur", so we do not re-draw observations.
Owner: ugoagwu | Last edited: Jun 14, 2014 | Size: 2KB | Views: 1050
AMSTAT Census at School
This is a random sample (n=250) from the AMSTAT Census at School classroom project.
Owner: squesen | Last edited: Jul 14, 2015 | Size: 20KB | Views: 1480
Cell Phone OLI
Variables: Math (Math SAT score); Verbal (Verbal SAT score); Credits (number of credits the student is registered for); Year (year in college: 1=Freshman, 2=Sophomore, 3=Junior, 4=Senior); Exer (time in minutes spent exercising in a typical day); Sleep (time in hours spent sleeping in a typical day); Veg (Are you a vegetarian? yes, no, some); Cell (Do you own a cell phone? yes, no). Cell Phones: College students at a large state university completed a survey about their academic and personal life. Questions ranged from "How many credits are you registered for this semester?" to "Would you define yourself as a vegetarian?" Four sections of an introductory statistics course were chosen at random from all the sections of introductory statistics courses offered at the university in the semester when the survey was conducted, and the 312 students who completed the survey were registered in one of the four chosen sections. In this exercise, we will use a subset of variables from the survey and use the collected data to answer three questions. Note that (1) these are real data, and (2) the symbol * in the worksheet means that this observation is not available (this is known as a "missing value").
Owner: corp_richard | Last edited: May 2, 2016 | Size: 8KB | Views: 1266
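Because `*` marks a missing value in this worksheet, numeric columns such as Sleep or Exer have to be cleaned before any arithmetic. A minimal Python sketch, assuming a column arrives as a list of strings; the function names and sample values are hypothetical.

```python
import statistics
from typing import List, Optional

MISSING = "*"  # marker the worksheet uses for a missing value


def parse_column(raw: List[str]) -> List[Optional[float]]:
    """Convert worksheet cells to floats, mapping "*" to None."""
    return [None if cell.strip() == MISSING else float(cell) for cell in raw]


def mean_ignoring_missing(values: List[Optional[float]]) -> float:
    """Mean of the observed values only (missing values are skipped)."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    return statistics.fmean(observed)


sleep = parse_column(["7", "*", "8.5", "6.5", "8"])  # hypothetical Sleep cells
print(mean_ignoring_missing(sleep))  # 7.5
```

Skipping missing values is only one choice; it means each variable can have a different number of usable observations, so sample sizes should be reported per variable.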
Effect of Smoke on infants
Data were collected through a random survey of mothers in Kentucky, conducted through a dance studio during November 2010 by Sabrina Lafferty and Karen Holland (ST 291 Fall 2010 candidates at HCTC) as a requirement for their semester project. They asked 57 mothers about the gestation period of their pregnancies, the birth weight and length of their newborns, and whether they smoked while pregnant.
Owner: statcrunchhelp | Last edited: Mar 6, 2014 | Size: 1KB | Views: 7184
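With birth weights recorded for mothers who did and did not smoke, one way to look at the question is a two-sample comparison of mean birth weight. A minimal Python sketch of Welch's t statistic (which does not assume equal group variances) with Welch-Satterthwaite degrees of freedom; the description does not say how the data were analyzed, and the function name is hypothetical.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    se2_a = statistics.variance(group_a) / na  # squared standard error, group A
    se2_b = statistics.variance(group_b) / nb  # squared standard error, group B
    t = (statistics.fmean(group_a) - statistics.fmean(group_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df
```

If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same Welch test and also returns a p-value. Because smoking was not randomly assigned, a difference in means describes an association rather than establishing an effect on its own.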
Asking prices for 4-bedroom homes in Bryan-College Station TX
Random sample of 30 four-bedroom homes listed for sale in the Bryan-College Station, Texas, area. For each home, the data set contains the list price in thousands of dollars (Price), square footage (Sqft), number of bathrooms (Baths) and location (Bryan, TX or College Station, TX).
Owner: statcrunchhelp | Last edited: Apr 4, 2014 | Size: 951B | Views: 4260
Comparing two drugs
Source: The Basic Practice of Statistics, instructor's edition, by David S. Moore, William Notz, Michael A. Fligner and R. Scott Linder (W.H. Freeman and Co., 2013), p. 462, Exercise 18.50, "Comparing two drugs." Makers of generic drugs must show that they do not differ significantly from the “reference” drugs that they imitate. One aspect in which drugs might differ is their extent of absorption in the blood. Table 18.6 gives data taken from 20 healthy nonsmoking male subjects for one pair of drugs. This is a matched pairs design. Numbers 1 to 20 were assigned at random to the subjects. Subjects 1 to 10 received the generic drug first, followed by the reference drug. Subjects 11 to 20 received the reference drug first, followed by the generic drug. In all cases, a washout period separated the two drugs so that the first had disappeared from the blood before the subject took the second. Randomizing the order prevents the order in which the drugs were administered from being confounded with the difference in absorption. Do the drugs differ significantly in the amount absorbed in the blood? Table 18.6: Absorption extent for two versions of a drug.
Owner: phil_larson | Last edited: Apr 9, 2013 | Size: 290B | Views: 1620
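In a matched pairs design the analysis works on each subject's difference (generic minus reference), which removes subject-to-subject variation in absorption. A minimal Python sketch of the paired t statistic, t = mean(d) / (s_d / sqrt(n)) with n - 1 degrees of freedom, together with the randomized ordering the exercise describes. The Table 18.6 values are not reproduced here, and the function names are hypothetical.

```python
import math
import random
import statistics


def paired_t(first, second):
    """Paired t statistic and degrees of freedom for matched pairs."""
    if len(first) != len(second):
        raise ValueError("matched pairs need equal-length samples")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    sd = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    t = statistics.fmean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


def assign_order(subjects, seed=None):
    """Randomly number the subjects; the lower half gets the generic drug first."""
    rng = random.Random(seed)
    numbers = list(range(1, len(subjects) + 1))
    rng.shuffle(numbers)
    half = len(subjects) // 2
    return {
        s: "generic first" if k <= half else "reference first"
        for s, k in zip(subjects, numbers)
    }
```

With 20 subjects the test has 19 degrees of freedom. If SciPy is available, `scipy.stats.ttest_rel(generic, reference)` performs the same paired test and reports a p-value.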
Diet
This is a small data set used to illustrate how an independent-means test, used inappropriately, fails where a paired test on the same data succeeds. The story is that we have before and after weights for 6 customers of a weight loss clinic. Visual inspection makes it clear that the clinic is effective (except in one questionable case). Students can discuss the sources of variation in the data set and relate them to the assumptions of the independent versus paired analysis models. Classical techniques will produce an extremely large p-value for the independent analysis and a significant p-value for the paired analysis. To illustrate the difference with simulation techniques, first run a randomization test for two means between the before and after groups. This will spectacularly fail to show a difference, when in fact there is a clear difference. Then use a bootstrap to examine the 6 differences; it is clear that a zero difference is highly unlikely.
Owner: david.zeitler | Last edited: May 19, 2011 | Size: 87B | Views: 1847
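The two simulation approaches described for the Diet data can be sketched in a few lines of Python. The weights below are hypothetical (not the data set's values), chosen only so that every customer but one loses a little weight while weights vary widely between customers; the function names are hypothetical too.

```python
import random
import statistics


def randomization_two_means(before, after, reps=5000, seed=0):
    """Two-sided p-value for a difference in group means, ignoring the pairing."""
    rng = random.Random(seed)
    n = len(before)
    observed = abs(statistics.fmean(before) - statistics.fmean(after))
    pooled = list(before) + list(after)
    hits = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # re-deal all weights into two groups at random
        diff = abs(statistics.fmean(pooled[:n]) - statistics.fmean(pooled[n:]))
        hits += diff >= observed
    return hits / reps


def bootstrap_mean_difference(before, after, reps=5000, seed=0):
    """Bootstrap distribution of the mean paired difference (before - after)."""
    rng = random.Random(seed)
    diffs = [b - a for b, a in zip(before, after)]
    return [statistics.fmean(rng.choices(diffs, k=len(diffs))) for _ in range(reps)]


before = [210, 185, 240, 160, 198, 225]  # hypothetical weights
after = [204, 180, 233, 156, 193, 226]   # hypothetical weights
print(randomization_two_means(before, after))  # large, well above 0.05
boot = bootstrap_mean_difference(before, after)
print(sum(m <= 0 for m in boot) / len(boot))   # share of resampled means <= 0, near 0
```

The randomization test shuffles weights across all customers, so the large between-customer spread swamps the small within-customer losses; the bootstrap resamples only the differences, which keeps the pairing.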
Telephone Holding Times
An airline has a toll-free phone number that it uses for reservations. Sometimes callers have to be placed on hold. The airline conducted a randomized experiment to determine whether there was a significant difference in how long a caller would remain on hold depending on what is playing on the call. The airline randomly selected one out of every 1000 calls to be placed on hold with either an advertisement of current promotions, Muzak (elevator music), or classical music playing. In total, 15 callers were sampled for this study. Each column gives the number of minutes that the sampled callers remained on the line before hanging up, for each type of recorded message. This data set comes from "Statistics: The Art and Science of Learning from Data" by Alan Agresti and Christine Franklin.
Owner: statcrunchhelp | Last edited: Sep 17, 2014 | Size: 85B | Views: 1677
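With three message types and a quantitative response (minutes on hold), a one-way ANOVA F test is one standard way to check for a difference in mean holding time; the description does not say which test the textbook uses. A minimal Python sketch of the F statistic; the function name is hypothetical. With 15 callers split across three groups, the degrees of freedom would be 2 and 12.

```python
import statistics


def one_way_anova_f(*groups):
    """One-way ANOVA F statistic with its (between, within) degrees of freedom."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = statistics.fmean([x for g in groups for x in g])
    group_means = [statistics.fmean(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of each observation around its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within
```

The p-value is the upper tail of the F distribution; if SciPy is available, `scipy.stats.f.sf(f, df_between, df_within)` gives it, and `scipy.stats.f_oneway` performs the whole test.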
nc2005birth300.xls
A random sample of 300 births from the state of North Carolina. Plurality refers to the number of children associated with the birth. Gender: 1=Male, 2=Female. fage is the age of the father (years); mage is the age of the mother (years); visits is the number of prenatal medical visits; marital is 1=married, 2=unmarried; racemom is the race of the mother (0=Other Non-white, 1=White, 2=Black, 3=American Indian, 4=Chinese, 5=Japanese, 6=Hawaiian, 7=Filipino, 8=Other Asian or Pacific Islander); hispmom is whether the mother is of Hispanic origin (C=Cuban, M=Mexican, N=Non-Hispanic, O=Other and Unknown Hispanic, P=Puerto Rican, S=Central/South American, U=Not Classifiable); gained is weight gain during pregnancy (pounds); lowbw indicates whether birth weight is 2500 grams or lower; tpounds is birth weight in pounds; smoke indicates whether the mother admitted to smoking (0=no, 1=yes); mature indicates whether the mother is 35 or older (0=no, 1=yes); premie indicates whether the baby was born at 36 weeks or sooner (0=no, 1=yes).
Owner: jph422 | Last edited: Nov 5, 2007 | Size: 11KB | Views: 896

