Public profile for ds-231%sc
Properties of 60 Standard Metropolitan Statistical Areas (a standard Census Bureau designation of the region around a city) in the United States, collected from a variety of sources. The data include information on the social and economic conditions in these areas, on their climate, and some indices of air pollution potentials. Number of cases: 60 Reference: U.S. Department of Labor Statistics [ Outlier , Transformation , Regression]
city City name
JanTemp Mean January temperature (degrees Fahrenheit)
JulyTemp Mean July temperature (degrees Fahrenheit)
RelHum Relative humidity
Rain Annual rainfall (inches)
Mortality Age-adjusted mortality
Education Median education
PopDensity Population density
%NonWhite Percentage of non whites
%WC Percentage of white collar workers
pop Population
pop/house Population per household
income Median income
HCPot HC pollution potential
NOxPot Nitrous Oxide pollution potential
SO2Pot Sulfur Dioxide pollution potential
NOx Nitrous Oxide
Aug 11, 2008
Smoking and Cancer
The data are per capita numbers of cigarettes smoked (sold) in 43 states and the District of Columbia in 1960 together with death rates per thousand population from various forms of cancer. Number of cases: 44 Reference: J.F. Fraumeni, "Cigarette Smoking and Cancers of the Urinary Tract: Geographic Variations in the United States," Journal of the National Cancer Institute, 41, 1205-1211. [Outlier , Regression , Residuals , Transformation , Nonlinear regression , Dummy variable]
CIG Number of cigarettes smoked (hundreds per capita)
BLAD Deaths per 100K population from bladder cancer
LUNG Deaths per 100K population from lung cancer
KID Deaths per 100K population from kidney cancer
LEUK Deaths per 100K population from leukemia
Aug 11, 2008
Glove Use Among Nurses
Data from an experiment to see how an educational program on the importance of using gloves affected the rate of glove use by a group of nurses in an inner-city pediatric hospital emergency department. Without their knowledge, the nurses were observed during vascular access procedures before and one, two, and five months after an educational program to see how often they wore gloves. Each procedure by a nurse was counted as a separate observation. Missing values are indicated by large dots. Number of cases: 23 Reference: Friedland, L., Joffe, M., Moore, D., et al. (1992), "Effect of Educational Program on Compliance With Glove Use in a Pediatric Emergency Department," American Journal of Diseases of Childhood, 146, 1355-1358. [Proportion]
Period Observation period (1 = before intervention, 2 = one month after intervention, 3 = two months after intervention, 4 = five months after intervention)
Observed Number of times the nurse was observed
Gloves Number of times the nurse used gloves
Experience Years of experience of nurse
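The [Proportion] tag suggests comparing the glove-use rate across observation periods. A minimal sketch of that calculation follows, assuming rows of (Period, Observed, Gloves) with None standing in for the missing values; the rows and the function name glove_rate_by_period are hypothetical and are not taken from the study.

```python
from collections import defaultdict

# Hypothetical rows (Period, Observed, Gloves); NOT values from the study.
# None stands in for a missing value (shown as a large dot in the data).
rows = [
    (1, 2, 1),
    (1, 7, 0),
    (2, 4, 4),
    (2, None, None),
    (3, 5, 3),
]

def glove_rate_by_period(rows):
    """Pool procedures within each period and return Gloves / Observed."""
    observed = defaultdict(int)
    gloves = defaultdict(int)
    for period, n_obs, n_gloves in rows:
        if n_obs is None or n_gloves is None:
            continue  # nurse has no usable record for this period
        observed[period] += n_obs
        gloves[period] += n_gloves
    # Skip periods with zero observed procedures to avoid dividing by zero.
    return {p: gloves[p] / observed[p] for p in observed if observed[p] > 0}

rates = glove_rate_by_period(rows)
for period in sorted(rates):
    print(period, round(rates[period], 3))
```

Pooling treats every procedure as one observation, which matches the description above; a per-nurse rate would be a different design choice.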
Aug 11, 2008
Improving Reading Ability
Results of an experiment to test whether directed reading activities in the classroom help elementary school students improve aspects of their reading ability. A treatment class of 21 third-grade students participated in these activities for eight weeks, and a control class of 23 third-graders followed the same curriculum without the activities. After the eight-week period, students in both classes took a Degree of Reading Power (DRP) test which measures the aspects of reading ability that the treatment is designed to improve. Number of cases: 44 Reference: Moore, David S., and George P. McCabe (1989). Introduction to the Practice of Statistics[Two sample t-test , Summary statistics]
Treatment Whether student participated in activities (treated) or not (control)
Response Score on Degree of Reading Power test
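The entry is tagged for a two-sample t-test comparing the treated and control classes. As a sketch only, the Welch (unequal-variance) form of the statistic can be computed with the standard library; the scores below are hypothetical, not the study data, and the choice of the Welch variant over a pooled test is an assumption.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, df) for a two-sample t-test without assuming equal variances."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical DRP scores; NOT the study data.
treated = [50, 60, 55, 65, 70]
control = [40, 50, 45, 55, 60]

t, df = welch_t(treated, control)
print(t, df)
```

The resulting t would then be compared against a t distribution with df degrees of freedom to obtain a p-value.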
Aug 11, 2008
Cancer Survival
Patients with advanced cancers of the stomach, bronchus, colon, ovary or breast were treated with ascorbate. The purpose of the study was to determine if the survival times differ with respect to the organ affected by the cancer. Number of cases: 64 Reference: Cameron, E. and Pauling, L. (1978) Supplemental ascorbate in the supportive treatment of cancer: re-evaluation of prolongation of survival times in terminal human cancer. Proceedings of the National Academy of Science USA, 75, 4538-4542. Also found in: Manly, B.F.J. (1986) Multivariate Statistical Methods: A Primer, New York: Chapman & Hall, 11. Also found in: Hand, D.J., et al. (1994) A Handbook of Small Data Sets, London: Chapman & Hall, 255. [ANOVA , Boxplot , Transformation]
Survival Survival time (in days?)
Organ Organ affected by the cancer
Aug 11, 2008
Birth rates
Births per 10,000 23-year-old women in the United States from 1917-1975. Number of cases: 59 Reference: P.K. Whelpton and A. A. Campbell, "Fertility Tables for Birth Charts of American Women," Vital Statistics Special Reports 51, no. 1. (Washington D.C.:Government Printing Office, 1960, years 1917-1975). National Center for Health Statistics, Vital Statistics of the United States Vol. 1, Natality (Washington D.C.:Government Printing Office, yearly, 1958-1975). [ Scatterplot , Time series]
Birthrate: Births per 10,000 23-year-old women in the US from 1917-1975
Year: The year
Aug 11, 2008
Predicting Retail Sales
These data are published monthly in the statistical section of the Survey of Current Business. Number of cases: 44 Reference: U.S. Department of Commerce, Survey of Current Business [Regression , Residuals , Time series]
TIME Quarter, from 1st quarter 1979 to 4th quarter 1989
WASA National income wage and salary disbursements ($ billions)
EMPL Employees on payrolls of non-agricultural establishments (thousands)
BLDG Building material dealer sales ($ millions)
AUTO Automotive dealer sales ($ millions)
FURN Furniture and home furnishings dealer sales ($ millions)
GMER General merchandise dealer sales ($ millions)
Aug 11, 2008
Educational Spending
Description: Average salary paid to teachers and expenditures per pupil on education in the 50 states and the District of Columbia. Number of cases: 51 Reference: Moore, David S., and George P. McCabe (1989). Introduction to the Practice of Statistics [ANCOVA , ANOVA , Scatterplot]
State State
Region Region
Pay Amount of pay in thousands
Spend Average amount spent per student in thousands
Aug 11, 2008
Percentage of Entering Class Graduating on Time. A large university reports the percentage of the entering freshman class graduating on time in each of 8 years from each of 6 separate colleges making up the university. The years cover a period of war protest and other upheavals that may have disrupted some students' education plans. Number of cases: 48 Reference: This data is distributed with the software package Data Desk®. Data Description, Inc. (1993). Data Desk®. Ithaca, NY: Data Description, Inc. [Methods: Scatterplot , Time series]
School Code for college
%_grad_on_time The percentage of entering class that graduated on time
Year Year of entering class
Aug 11, 2008
Reading Test Scores
Data from a study of the effect of three different methods of instruction on reading comprehension in children. Participants were given a reading comprehension test before and after receiving the instruction. Number of cases: 66 Reference: Moore, David S., and George P. McCabe (1989). Introduction to the Practice of Statistics[ANOVA]
Subject Subject number
Group Type of instruction that student received (Basal, DRTA, or Strat)
PRE1 Pretest score on first reading comprehension measure
PRE2 Pretest score on second reading comprehension measure
POST1 Posttest score on first reading comprehension measure
POST2 Posttest score on second reading comprehension measure
POST3 Posttest score on third reading comprehension measure
Aug 11, 2008
Heights of singers in the NY Choral Society in 1979. Self-report, to the nearest inch. Voice parts in order from highest pitch to lowest pitch are Soprano, Alto, Tenor, Bass. The first two are female voices and the last two are male voices. The original dataset included two divisions for each voice part. This dataset reports only soprano 1, alto 1, tenor 1, and bass 1 from the original dataset. Reference: Chambers, Cleveland, Kleiner, and Tukey. (1983). Graphical Methods for Data Analysis[Pooled t-test , ANOVA , Boxplot]
Soprano Heights of sopranos (in inches)
Alto Heights of altos (in inches)
Tenor Heights of tenors (in inches)
Bass Heights of basses (in inches)
Aug 11, 2008
Percent of a Standard 50-word list heard correctly in the presence of background noise. 24 subjects with normal hearing listened to standard audiology tapes of English words at low volume with a noisy background. They repeated the words and were scored correct or incorrect in their perception of the words. The order of list presentation was randomized. The word lists are standard audiology tools for assessing hearing. They are calibrated to be equally difficult to perceive. However, the original calibration was performed with normal-hearing subjects and no noise background. The experimenter wished to determine whether the lists were still equally difficult to understand in the presence of a noisy background. Number of cases: 96 Reference: Loven, Faith. (1981). A Study of the Interlist Equivalency of the CID W-22 Word List Presented in Quiet and in Noise. Unpublished MS Thesis, University of Iowa. [ ANOVA]
SubjectID Code for each subject - 24 of them
ListID Code for each list played
Hearing Score received on hearing test
Aug 11, 2008
Nursing Home Data
The data were collected by the Department of Health and Social Services of the State of New Mexico and cover 52 of the 60 licensed nursing facilities in New Mexico in 1988. Number of cases: 52 Reference: These data are part of the data analyzed in Howard L. Smith, Niell F. Piland, and Nancy Fisher, "A Comparison of Financial Performance, Organizational Characteristics, and Management Strategy Among Rural and Urban Nursing Facilities," Journal of Rural Health, Winter 1992, pp 27-40. [T-test , Outlier , Boxplot , Mann Whitney U test , Summary statistics]
BED number of beds in home
MCDAYS annual medical in-patient days (hundreds)
TDAYS annual total patient days (hundreds)
PCREV annual total patient care revenue ($hundreds)
NSAL annual nursing salaries ($hundreds)
FEXP annual facilities expenditures ($hundreds)
RURAL rural (1) and non-rural (0) homes
Aug 11, 2008
Voting for the President
Percent of the popular vote that was won by the Democratic presidential candidates in the 1980 and 1984 elections. Both candidates, Jimmy Carter in 1980 and Walter Mondale in 1984, were defeated by the Republican Ronald Reagan. Number of cases: 50 Reference: Moore, David S., and George P. McCabe (1989). Introduction to the Practice of Statistics[Dummy variable , Regression , Scatterplot]
State State
Dem1980 Percent of the presidential votes won by the Democratic candidate in 1980
Dem1984 Percent of the presidential votes won by the Democratic candidate in 1984
Aug 11, 2008
US Crime
These data are crime-related and demographic statistics for 47 US states in 1960. The data were collected from the FBI's Uniform Crime Report and other government agencies to determine how the variable crime rate depends on the other variables measured in the study. Number of cases: 47 Reference: Vandaele, W. (1978) Participation in illegitimate activities: Ehrlich revisited. In Deterrence and Incapacitation, Blumstein, A., Cohen, J. and Nagin, D., eds., Washington, D.C.: National Academy of Sciences, 270-335. Also found in: Hand, D.J., et al. (1994) A Handbook of Small Data Sets, London: Chapman & Hall, 101-103. [Collinearity , Correlation , Causation , Lurking variable , Regression]
R Crime rate: number of offenses reported to police per million population
Age The number of males of age 14-24 per 1000 population
S Indicator variable for Southern states (0 = No, 1 = Yes)
Ed Mean # of years of schooling x 10 for persons of age 25 or older
Ex0 1960 per capita expenditure on police by state and local government
Ex1 1959 per capita expenditure on police by state and local government
LF Labor force participation rate per 1000 civilian urban males age 14-24
M The number of males per 1000 females
N State population size in hundred thousands
NW The number of non-whites per 1000 population
U1 Unemployment rate of urban males per 1000 of age 14-24
U2 Unemployment rate of urban males per 1000 of age 35-39
W Median value of transferable goods and assets or family income in tens of $
X The number of families per 1000 earning below 1/2 the median income
Aug 11, 2008
Pulse Rate
The following data set represents the pulse rates for 24 randomly selected individuals.
Aug 11, 2008
Hospital Data
Aug 11, 2008
secretary data
Aug 11, 2008
Beers Data
Aug 11, 2008
Alcohol Consumption
Aug 11, 2008
CEO Golf and Stock Data
Data from New York Times (31 May 1998, Section 3, p 1) reporting correlation between CEO's golf handicaps and performance of their companies' stock.
Aug 11, 2008
Magazine Ads Readability
Thirty magazines were ranked by educational level of their readers. Three magazines were randomly selected from each of the first, second, and third groups of ten magazines. Six advertisements were randomly selected from each of the nine selected magazines. The magazines were: Group 1, highest educational level: 1. Scientific American, 2. Fortune, 3. The New Yorker. Group 2, medium educational level: 4. Sports Illustrated, 5. Newsweek, 6. People. Group 3, lowest educational level: 7. National Enquirer, 8. Grit, 9. True Confessions. For each advertisement, the data below were observed. Number of cases: 54 Reference: F.K. Shuptrine and D.D. McVicker, "Readability Levels of Magazine Ads," Journal of Advertising Research, 21:5 (October 1981), p 47. [ANOVA]
WDS number of words in advertisement copy
SEN number of sentences in advertising copy
3SYL number of 3+ syllable words in advertising copy
MAG magazine (1 through 9 as above)
GRP educational level (as above)
Aug 11, 2008
Wages and Hours
The data are from a national sample of 6000 households with a male head earning less than $15,000 annually in 1966. The data were classified into 39 demographic groups for analysis. The study was undertaken in the context of proposals for a guaranteed annual wage (negative income tax). At issue was the response of labor supply (average hours) to increasing hourly wages. The study was undertaken to estimate this response from available data. [Regression , Outlier , Collinearity , Assumptions, regression]
HRS Average hours worked during the year
WAGE Average hourly wage ($)
ERSP Average yearly earnings of spouse ($)
ERNO Average yearly earnings of other family members ($)
NEIN Average yearly non-earned income
ASSET Average family asset holdings (Bank account, etc.) ($)
AGE Average age of respondent
DEP Average number of dependents
RACE Percent of white respondents
SCHOOL Average highest grade of school completed
Aug 11, 2008
Ice Cream Consumption
Ice cream consumption was measured over 30 four-week periods from March 18, 1951 to July 11, 1953. The purpose of the study was to determine if ice cream consumption depends on the variables price, income, or temperature. The variables Lag-temp and Year have been added to the original data. Number of cases: 30 Reference: Koteswara Rao Kadiyala (1970) Testing for the independence of regression disturbances. Econometrica, 38, 97-117. Also found in: Hand, D.J., et al. (1994) A Handbook of Small Data Sets, London: Chapman & Hall, 214. [Regression , Time series ]
Date Time period (1-30) of the study (from 3/18/51 to 7/11/53)
IC Ice cream consumption in pints per capita
Price Price of ice cream per pint in dollars
Income Weekly family income in dollars
Temp Mean temperature in degrees F.
Lag-temp Temp variable lagged by one time period
Year Year within the study (0 = 1951, 1 = 1952, 2 = 1953)
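Lag-temp is described as Temp lagged by one time period. A minimal sketch of building such a column is shown here; the temperatures are hypothetical, and marking the first period as None is an assumption, since the description does not say how the original file fills it.

```python
def lag_by_one(values):
    """Shift a series by one period; the first period has no predecessor."""
    return [None] + list(values[:-1])

# Hypothetical mean temperatures in degrees F; NOT the study data.
temp = [41, 56, 63, 68]
lag_temp = lag_by_one(temp)

for period, (current, previous) in enumerate(zip(temp, lag_temp), start=1):
    print(period, current, previous)
```

In a regression of IC on Temp and Lag-temp, the first period would have to be dropped or filled from an outside source, because its lagged value is undefined.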
Aug 11, 2008
Agricultural Economics Studies
Price and consumption per capita of beef and pork annually from 1925 to 1941 together with other variables relevant to an economic analysis of price and/or consumption of beef and pork over the period. Number of cases: 17 Reference: F.B. Waugh, Graphic Analysis in Agricultural Economics, Agricultural Handbook No. 128, U.S. Department of Agriculture, 1957. [Regression , Multivariate regression , Time series]
PBE Price of beef (cents/lb)
CBE Consumption of beef per capita (lbs)
PPO Price of pork (cents/lb)
CPO Consumption of pork per capita (lbs)
PFO Retail food price index (1947-1949 = 100)
DINC Disposable income per capita index (1947-1949 = 100)
CFO Food consumption per capita index (1947-1949 = 100)
RDINC Index of real disposable income per capita (1947-1949 = 100)
RFP Retail food price index adjusted by the CPI (1947-1949 = 100)
Aug 11, 2008
Predicting Appliance Sales
The file gives unit shipments of dishwashers, disposers, refrigerators, and washers in the United States from 1960 to 1985. This and other data are published currently in the Department of Commerce's Survey of Current Business, and are summarized from time to time in their publication, Business Statistics. Also included in the file are durable goods expenditures and private residential investment in the United States. Number of cases: 26 Reference: Business Statistics, U.S. Department of Commerce [ Regression , Time series ]
YEAR: 1960 to 1985
DISH: Factory shipments (domestic) of dishwashers (thousands)
DISP: Factory shipments (domestic) of disposers (thousands)
FRIG: Factory shipments (domestic) of refrigerators (thousands)
WASH: Factory shipments (domestic) of washing machines (thousands)
DUR: Durable goods expenditures (billions of 1972 dollars)
RES: Private residential investment (billions of 1972 dollars)
Aug 11, 2008
Home Prices
The data are a random sample of records of resales of homes from Feb 15 to Apr 30, 1993 from the files maintained by the Albuquerque Board of Realtors. This type of data is collected by multiple listing agencies in many cities and is used by realtors as an information base. Number of cases: 117 Reference: Albuquerque Board of Realtors [Diagnostics , Dummy variable , Interaction , Regression]
PRICE Selling price ($hundreds)
SQFT Square feet of living space
AGE Age of home (years)
FEATS Number out of 11 features (dishwasher, refrigerator, microwave, disposer, washer, intercom, skylight(s), compactor, dryer, handicap fit, cable TV access)
NE Located in northeast sector of city (1) or not (0)
COR Corner location (1) or not (0)
TAX Annual taxes ($)
Aug 11, 2008
Healthy Breakfast
Data on several variables for different brands of cereal. A value of -1 for nutrients indicates a missing observation. Number of cases: 77 Reference: Data available at many grocery stores [Histogram , Scatterplot , Regression]
Name Name of cereal
mfr Manufacturer of cereal where A = American Home Food Products; G = General Mills; K = Kelloggs; N = Nabisco; P = Post; Q = Quaker Oats; R = Ralston Purina
type cold or hot
calories calories per serving
protein grams of protein
fat grams of fat
sodium milligrams of sodium
fiber grams of dietary fiber
carbo grams of complex carbohydrates
sugars grams of sugars
potass milligrams of potassium
vitamins vitamins and minerals - 0, 25, or 100, indicating the typical percentage of FDA recommended
shelf display shelf (1, 2, or 3, counting from the floor)
weight weight in ounces of one serving
cups number of cups in one serving
rating a rating of the cereals
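Because -1 marks a missing nutrient value, it should be converted before computing summaries; otherwise it drags averages down. A minimal sketch follows, assuming rows are held as dictionaries keyed by the variable names above. The rows, the column subset, and the helper names are hypothetical.

```python
# Illustrative subset of nutrient columns; extend as needed (assumption).
NUTRIENT_COLUMNS = ("carbo", "sugars", "potass")

def clean_row(row, columns=NUTRIENT_COLUMNS):
    """Return a copy of row with the -1 sentinel replaced by None."""
    return {
        key: (None if key in columns and value == -1 else value)
        for key, value in row.items()
    }

def mean_ignoring_missing(rows, column):
    """Average a column over the rows where it is not missing."""
    values = [r[column] for r in rows if r[column] is not None]
    return sum(values) / len(values) if values else None

# Hypothetical rows; NOT actual cereal values.
rows = [
    {"Name": "Cereal A", "sugars": 6, "potass": 280},
    {"Name": "Cereal B", "sugars": -1, "potass": 100},
    {"Name": "Cereal C", "sugars": 10, "potass": -1},
]

cleaned = [clean_row(r) for r in rows]
print(mean_ignoring_missing(cleaned, "sugars"))
print(mean_ignoring_missing(cleaned, "potass"))
```

Restricting the replacement to named nutrient columns avoids touching other variables where -1 might be a legitimate value.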
Aug 11, 2008
CEO Salaries
Small companies were defined as those with annual sales greater than $5 million and less than $350 million. Companies were ranked according to 5-year average return on investment. These data cover the first 60 ranked firms. Reference: Forbes, November 8, 1993, "America's Best Small Companies." [Outlier , Histogram , Mean , Median , Boxplot , Distribution]
Age: Age of chief executive officer
Sal: Salary of chief executive officer (including bonuses), $thousands
Aug 11, 2008
Hot dogs
Results of a laboratory analysis of calories and sodium content of major hot dog brands. Researchers for Consumer Reports analyzed three types of hot dog: beef, poultry, and meat (mostly pork and beef, but up to 15% poultry meat). [ ANOVA ]
Type Type of hot dog
Calories Calorie content
Sodium Sodium content

Brain Size and Intelligence
Willerman et al. (1991) collected a sample of 40 right-handed Anglo introductory psychology students at a large southwestern university. Subjects took four subtests (Vocabulary, Similarities, Block Design, and Picture Completion) of the Wechsler (1981) Adult Intelligence Scale-Revised. The researchers used Magnetic Resonance Imaging (MRI) to determine the brain size of the subjects. Information about gender and body size (height and weight) are also included. The researchers withheld the weights of two subjects and the height of one subject for reasons of confidentiality. Reference: Willerman, L., Schultz, R., Rutledge, J. N., and Bigler, E. (1991), "In Vivo Brain Size and Intelligence," Intelligence, 15, 223-228. [Correlation , Regression , Scatterplot]
Gender Male or Female
FSIQ Full Scale IQ scores based on the four Wechsler (1981) subtests
VIQ Verbal IQ scores based on the four Wechsler (1981) subtests
PIQ Performance IQ scores based on the four Wechsler (1981) subtests
Weight Body weight in pounds
Height Height in inches
MRI_Count Total pixel count from the 18 MRI scans


