J Pribe auto-mpg.xlsx
The data set covers 12 years of vehicles and contains 398 individual entries. For each popular consumer vehicle, the data describes miles per gallon (MPG), the number of engine cylinders, total engine size (displacement), engine horsepower, the vehicle weight, a measure of acceleration (0-60 MPH time), the model year of the vehicle (1970-1982), a coded identifier for the place of origin, and the make and model of the vehicle. MPG, number of cylinders, engine size, horsepower, weight, acceleration time, and model year are all numerical values. The vehicle origin, full name, make, and model are categorical. This data was chosen to meet the assignment requirements, and because cars are cool. *Origin data code: 1=USA, 2=Europe, 3=Japan. The "car name" variable was broken into additional make and model variables to ease analysis, a change from the original data set.
jpribe | Feb 16, 2019 | 32KB | 107
McKenna Morrissey: Depression and the Internet
This study was done to find out whether spending more time on the internet causes depression. This data set includes hours spent on the internet per week, depression before and after, gender, race, age, household income, and household size. (
mckenrm | Oct 24, 2018 | 9KB | 929
USDA Nutrition Data
This dataset has the nutritional values per serving size for a large variety of foods as calculated by the USDA.

US Department of Agriculture, Agricultural Research Service, Nutrient Data Laboratory. USDA National Nutrient Database for Standard Reference, Release 28. Version Current: September 2015. Internet:
statcrunchhelp | Jan 13, 2016 | 832KB | 1650
Nutrition Information for French Fries
Nutrition information for french fries at fast food restaurants. Information is for a large serving. *These restaurants only offer one serving size.
petkewic | Aug 25, 2009 | 1KB | 3137
Economics and Policy.xls
Growth RGDP: RGDP is real gross domestic product. The growth in real gross domestic product is a common measure of a country's economic health.
Unemployment Rate: The number of unemployed people divided by the number of people in the labor force. The labor force includes only those who have a job or who are seeking a job.
Employment Rate: The number of employed people divided by the working age population. The working age population includes all people from age 15 to 64, regardless of whether or not they are in the labor force.
Federal Revenue per GDP: The total amount of money the Federal government receives expressed as a fraction of the size of the economy (GDP).
Federal Spending per GDP: The total amount of money the Federal government spends expressed as a fraction of the size of the economy (GDP).
Federal Debt per GDP: The total Federal debt expressed as a fraction of the size of the economy (GDP). Here, Federal debt includes both public debt outstanding (money the Federal government has borrowed from people, companies, and foreign governments) and intergovernmental debt (money the Federal government has borrowed from the Social Security trust fund).
Top Federal Income Tax Rate: The Federal income tax rate paid by those in the highest tax bracket.
Recession: This variable is 1 if the country was in recession in the indicated year and 0 otherwise.
Democratic President: This variable is 1 if the President was a Democrat, 0 if the President was a Republican.
Seats in House Held by Democrats: The number of Democrats in the House of Representatives as a fraction of the total number of Representatives. Due to a small number of independents, the fraction of seats held by Republicans is approximately (but not exactly) one minus the fraction of seats held by Democrats.
Seats in Senate Held by Democrats: The number of Democrats in the Senate as a fraction of the total number of Senators. Due to a small number of independents, the fraction of seats held by Republicans is approximately (but not exactly) one minus the fraction of seats held by Democrats.
War: This variable is 1 if the country was at war, 0 otherwise.
adavies | Nov 2, 2010 | 8KB | 1405
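The unemployment rate and employment rate defined in the Economics and Policy entry above use different denominators, which is easy to overlook. The following is a minimal sketch of the two ratios. The function names and the counts are hypothetical and chosen only to keep the arithmetic simple; they are not taken from the data set, and the sketch assumes every employed or job-seeking person is of working age.

```python
def unemployment_rate(unemployed, labor_force):
    """Unemployed people divided by the labor force (employed plus job seekers)."""
    if labor_force <= 0:
        raise ValueError("labor_force must be positive")
    return unemployed / labor_force


def employment_rate(employed, working_age_population):
    """Employed people divided by everyone aged 15 to 64, in the labor force or not."""
    if working_age_population <= 0:
        raise ValueError("working_age_population must be positive")
    return employed / working_age_population


# Made-up counts in millions (hypothetical, not from the data set).
employed = 150.0
unemployed = 10.0
not_in_labor_force = 40.0  # working age, neither employed nor seeking a job

labor_force = employed + unemployed
working_age_population = labor_force + not_in_labor_force

u_rate = unemployment_rate(unemployed, labor_force)
e_rate = employment_rate(employed, working_age_population)

print(f"unemployment rate: {u_rate:.4f}")
print(f"employment rate:   {e_rate:.4f}")
```

Because the denominators differ, the two rates do not generally add up to one.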
Cigarette Consumption vs CHD Mortality
Now that cigarette smoking has been clearly tied to lung cancer, researchers are focusing on possible links to other diseases. The data below show annual rates of cigarette consumption and deaths from coronary heart disease for several nations. Some public health officials are urging that the US adopt a national goal of cutting cigarette consumption in half over the next decade. Examine these data and write a report. In your report you should:
1. Include appropriate graphs (e.g. scatterplot, residual plot) and statistics (e.g. mean and SD);
2. Describe the association between cigarette smoking and coronary heart disease;
3. Create a linear model;
4. Evaluate the strength and appropriateness of your model;
5. Interpret the slope and y-intercept of the line;
6. Use your model to estimate the potential benefits of reaching the national goal proposed for the US. That is, based on your linear model, if the US were to cut its cigarette consumption in half (from 3900 to 1950), what does the linear model predict would happen to the CHD rate?
7. Use StatCrunch to generate nice-looking graphs and output as needed. Be sure to size them appropriately. There is no need for an 8x10 scatterplot; make your graphs about 3x3. Scale them in StatCrunch first, then copy and paste into Word.
smcdaniel04 | Sep 29, 2011 | 267B | 5412
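Steps 3 to 6 of the Cigarette Consumption vs CHD Mortality assignment above can be sketched as a least-squares fit followed by two predictions. This is only an illustration: the four data points are placeholders invented for the example, not the values in the data set, and the helper names `fit_line` and `r_squared` are hypothetical. The assignment itself asks for StatCrunch output.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Fraction of the variation in y accounted for by the fitted line."""
    y_bar = mean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Placeholder values (hypothetical, NOT the real data): replace with the
# cigarette consumption and CHD mortality columns from the data set.
cigarettes = [1000, 2000, 3000, 4000]
chd = [100, 140, 190, 230]

slope, intercept = fit_line(cigarettes, chd)
r2 = r_squared(cigarettes, chd, slope, intercept)

# Predicted CHD rate at current US consumption and at half of it.
pred_now = intercept + slope * 3900
pred_half = intercept + slope * 1950
change = pred_half - pred_now

print(f"slope={slope:.4f}, intercept={intercept:.2f}, R^2={r2:.3f}")
print(f"predicted change in CHD rate: {change:.1f}")
```

The predicted change equals the slope times the change in consumption (here, a drop of 1950), so it describes what the fitted line implies and does not by itself establish cause and effect.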
Baseball data for correlation and regression
This table shows the total number of runs scored, at bats, hits, etc. for each of the 30 MLB teams for the 2009-2011 seasons.

Correlations and linear regression models can be calculated between the different numeric variables. A good exercise is to see which variables correlate most strongly with runs_scored.

As emphasized in the movie Moneyball, some of the classic metrics such as batting_avg are not as good as the newer metrics like OBP (on base percentage), SLG (slugging percentage), or OPS (on base plus slugging).

A guide to a few of the variables that may not be self-explanatory:
Runs_Scored: The total of all runs (points) the baseball team scored by the end of the season.
Batting_avg: The number of hits divided by at_bats.
OBP: On Base Percentage. Similar to batting average, except that it takes into account walks and hit-by-pitch. Some players who don't have high batting averages manage to get walked quite frequently.
SLG: Slugging. This weights hits to first base as 1 point, hits to second base as 2 points, third as 3, home runs as 4, and divides the total by the number of at bats.
OPS: On Base Plus Slugging. This is just OBP added to SLG.
mileschen | Apr 17, 2012 | 6KB | 3798
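The metric definitions in the baseball entry above translate directly into arithmetic. The sketch below uses made-up season totals, not values from the table, and hypothetical function names. The OBP formula shown is the commonly used definition, which also counts sacrifice flies in the denominator; the entry itself only says that OBP accounts for walks and hit-by-pitch.

```python
def batting_avg(hits, at_bats):
    """Hits divided by at bats."""
    return hits / at_bats


def slugging(singles, doubles, triples, home_runs, at_bats):
    """Total bases (1, 2, 3, 4 per hit type) divided by at bats."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


def on_base_pct(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """Commonly used OBP definition (an assumption; not spelled out in the entry)."""
    reached = hits + walks + hit_by_pitch
    chances = at_bats + walks + hit_by_pitch + sac_flies
    return reached / chances


def ops(obp, slg):
    """On base plus slugging: the two numbers simply added."""
    return obp + slg


# Made-up team season totals (hypothetical, not from the table).
at_bats = 5500
singles, doubles, triples, home_runs = 1000, 300, 30, 170
hits = singles + doubles + triples + home_runs
walks, hit_by_pitch, sac_flies = 500, 50, 50

avg = batting_avg(hits, at_bats)
slg = slugging(singles, doubles, triples, home_runs, at_bats)
obp = on_base_pct(hits, walks, hit_by_pitch, at_bats, sac_flies)
team_ops = ops(obp, slg)

print(f"AVG={avg:.3f} OBP={obp:.3f} SLG={slg:.3f} OPS={team_ops:.3f}")
```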
2011 DUI Arrests
Two variables are listed by state - population size and number of DUI arrests for 2011.
cecil_college | Jun 20, 2012 | 1KB | 3323
Violent Crimes by State
State Rankings -- Statistical Abstract of the United States. VIOLENT CRIMES PER 100,000 POPULATION -- 2006. [When states share the same rank, the next lower rank is omitted. Because of rounded data, states may have identical values shown, but different ranks.]
Cautionary note about rankings: The ranks in some tables are based on estimates derived from a sample(s). Because of sampling and nonsampling errors associated with the estimates, the ranking of the estimates does not necessarily reflect the correct ranking of the unknown true values. Thus, caution should be used when making inferences or statements about the states' true values based on a ranking of the estimates. As an example, the estimated total (average, percent, ratio, etc.) for State A may be larger than the estimates for all other states. This does not necessarily mean that the true total (average, percent, ratio, etc.) for State A is larger than those for all other states. Such an inference typically depends on, among other factors, the size of the difference(s) between the estimates in question and the size of their associated standard errors. In other tables, the ranks are based on a complete enumeration of the target population, or on complete administrative reporting from the population. In such cases, sampling is not used, and there is no sampling error component in the estimates. Even so, care should be taken when making inferences or statements based on the rankings. The table values may still exhibit nonsampling error originating from such sources as coverage problems (missing units or duplicates), nonresponse, misreporting, and others. Last Revised: September 27, 2011 at 09:43:17 AM
phil_larson | Jan 16, 2013 | 881B | 3258
Fairfax City Home Sales
This data set presents a random sample of homes sold in Fairfax City in 2017. This sample was provided by a local realtor using the Multiple Listing Service (MLS). The response variable is Price (which refers to the closing price of the home). Consider four explanatory variables:
The year the home was built (variable named “Year”).
The number of days the home has been listed (variable named “Days”).
The taxable living area in square feet (variable named “TLArea”).
Lot size in acres (variable named “Acres”).
nramezan | Oct 12, 2017 | 2KB | 969
YMS Table 3-3 Brain Size vrs IQ
Brain size as measured by magnetic resonance imaging (number of pixels in the computer image of the brain scan) versus intelligence as measured by the Wechsler Adult Intelligence Test. 20 volunteers of each gender.
lakestats | Sep 27, 2007 | 1KB | 722
Number of songs on an iPod
Students in my class answered the question, "How many songs are on your iPod?" The distribution is skewed right. We generated 50 means from samples of size 30 that were randomly selected from this data.
lpyott | Mar 28, 2012 | 2KB | 729
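The procedure described in the iPod entry above (50 means, each from a random sample of size 30) can be sketched as follows. The population here is a simulated right-skewed stand-in, not the class's actual responses, and its size of 200 is an arbitrary assumption. The sketch also assumes each sample is drawn without replacement, which the entry does not specify.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Simulated right-skewed "song counts" (hypothetical stand-in for the real data).
population = [round(rng.expovariate(1 / 500)) for _ in range(200)]


def sample_means(values, sample_size, n_samples, rng):
    """Mean of each of n_samples random samples drawn without replacement."""
    return [mean(rng.sample(values, sample_size)) for _ in range(n_samples)]


means = sample_means(population, sample_size=30, n_samples=50, rng=rng)

print(f"population: mean={mean(population):.1f}, sd={stdev(population):.1f}")
print(f"50 sample means: mean={mean(means):.1f}, sd={stdev(means):.1f}")
```

The sample means should cluster around the population mean with a much smaller spread than the individual values, which is the point of the exercise.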
Brain Size and Intelligence
My major is psych, and this is the only mildly interesting thing I found under that category.
fenrirsoma | Jun 7, 2010 | 1KB | 722
Ice cream-Julia Meave
Blue Bell carries many different ice cream flavors. In this data set I list the different flavors along with the nutritional information for the pint and half-gallon sizes.
juliameav | Nov 8, 2018 | 1KB | 41
500 Random Samples of sizes 5, 10, 25, 50, and 100 taken from the population of 300 penny ages (pennies.xls).
jph422 | Oct 4, 2007 | 13KB | 413

