StatCrunch logo (home)

Data sets shared by StatCrunch members
Showing 1 to 15 of 84 data sets matching scatter
Data Set/Description Owner Last edited Size Views
Movie Budgets and Box Office Earnings (Updated Spring 2018)
This data all comes from the following website the tracks the financial performance of movies:

The “Budget”, “Domestic Gross”, and “Worldwide Gross” columns each are in millions of dollars.

statcrunch_featuredOct 4, 2018270KB10678
Shark Attacks Worldwide
This data comes from It records data on all shark attacks in recorded history including attacks before 1800. Included is all known information on the shark attack including the date, location, information on the individual who was attacked, details on the injuries sustained by the victim, and the species of the shark
statcrunch_featuredJun 28, 20171MB9450
California Home Prices, 2009
This dataset is a collection of real estate listings from San Luis Obispo county, California, and some locations around it from 2009. The prices are their list price at the creation of this dataset. For more information about this data, go to the website source listed above.
statcrunch_featuredApr 3, 201746KB6818
US Workforce Participation
This data primarily comes from two sources: Federal Reserve Bank of St. Louis and the US Bureau of Labor Statistics .
YearThe calendar year for each value
Annual Average Workforce ParticipationDefined by the Bureau of Labor Statistics as "the percentage of the population [16 years and older] that is either employed or unemployed (that is, either working or actively seeking work). Note that 2015's Annual Average is calculated using the first 11 months."
Male Workforce Participation RateAnnual workforce participation rate for males.
Female Workforce Participation RateAnnual workforce participation rate for females.
Male Inactivity Rate Aged 25-54Defined as the proportion of the male population aged 25-54 that is not in the labour force. Common reasons for leaving labour force: college, retirement, stay at home, can't find work and no longer try.
Change in Rate (Male Inactivity Rate Aged 25-54)The change in the inactivity rate calculated as the current year minus the previous year.
Female Inactivity Rate Aged 25-54Defined as the proportion of the female population aged 25-54 that is not in the labour force.
Change in Rate (Female Inactivity Rate Aged 25-54)The change in the inactivity rate calculated as the current year minus the previous year.
Presidential ControlPolitical party of president.
Senate ControlPolitical party of the Senate majority
House ControlPolitical party of the House of Representatives majority.
Legislative Branch (House and Senate)Combined control of Senate and House of Representativs
statcrunch_featuredJun 27, 201710KB2372
All MLB Salaries (1985-2015)
This data has all MLB player salaries between 1985-2015 including the team played for, the city, and a unique ID for each player. Total this includes 25,575 salaries for 4,963 different baseball players.
The player ID is the first 5 letters from the last name, followed by the first two letters from the first name, followed by a number in case of duplicate names. For example, bondsba01 stands for Barry Bonds with "01" because he's the first with the "bondsba" name ID.
statcrunch_featuredJun 27, 20171MB4621
Roller Coasters Data
This dataset looks at some of the roller coasters across the US and various other countries.
NameName of roller coaster
ParkAmusement park for roller coaster
CityCity for amusement park
StateState abbreviation
CountryCountry of the roller coaster. US: United States, MX: Mexico, CR: Costa Rica, GT: Guatemala, CO: Columbia, VE: Venezuela, BR: Brazil, AR: Argentina, CL: Chile, EQ: Ecuador, PE: Peru, F: France, D: Germany
TypeS: Steel, W: Wood
ConstructorType of build for the roller coaster
HeightHeight in meters
SpeedSpeed in kilometers per hour (km/h)
LengthLength in meters
InversionsYes if there are inversions, no if not
DurationDuration of ride in seconds
GForceMax g-force
OpenedYear it opened
RegionGeographic region for the roller coaster
statcrunch_featuredApr 3, 201748KB5779
Cereal Brands
Data on several variable of different brands of cereal. Number of cases: 77 Variable Names: Name: Name of cereal mfr: Manufacturer of cereal where A = American Home Food Products; G = General Mills; K = Kelloggs; N = Nabisco; P = Post; Q = Quaker Oats; R = Ralston Purina type: cold or hot calories: calories per serving protein: grams of protein fat: grams of fat sodium: milligrams of sodium fiber: grams of dietary fiber carbo: grams of complex carbohydrates sugars: grams of sugars potass: milligrams of potassium vitamins: vitamins and minerals - 0, 25, or 100, indicating the typical percentage of FDA recommended shelf: display shelf (1, 2, or 3, counting from the floor) weight: weight in ounces of one serving cups: number of cups in one serving rating: a rating of the cereals
statcrunch_featuredApr 3, 20174KB6481
Body Temperature
Data taken from the Journal of Statistics Education online data archive. That archive in turn got the data from an article in the Journal of the American Medical Association. (Mackowiak, et al., "A Critical Appraisal of 98.6 Degrees F …", vol. 268, pp. 1578-80, 1992).
"Body Temp" is measured in degrees fahrenheit
"Heart rate" is the resting beats per minute
statcrunch_featuredJun 27, 20172KB12253
Wine and Crime
This data set contains the wine consumption (in gallons per capita) and the violent crime rate for the US from 1970 to 2004. A scatter plot of these two columns is very interesting.
websterwestFeb 29, 20081KB3226
Cigarette Consumption vs CHD Mortality
Now that cigarette smoking has been clearly tied to lung cancer, researchers are focusing on possible links to other diseases. The data below show annual rates of cigarette consumption and deaths from coronary heart disease for several nations. Some public health officials are urging that the US adopt a national goal of cutting cigarette consumption in half over the next decade. Examine these data and write a report. In your report you should: 1. Include appropriate graphs (e.g. scatterplot, residual plot) and statistics (e.g. mean and SD); 2. Describe the association between cigarette smoking and coronary heart disease; 3. Create a linear model; 4. Evaluate the strength and appropriateness of your model; 5. Interpret the slope and y-intercept of the line; 6. Use your model to estimate the potential benefits of reaching the national goal proposed for the US. That is, based on your linear model, if the US were to cut its cigarette consumption in half (from 3900 to 1950), what does the linear model predict would happen to the CHD rate. 7. You should use Statcrunch to generate nice looking graphs and output as needed. Be sure to size them appropriately. No need for a 8x10 scatterplot; Make your graphs about 3x3. You should scale them in Statcrunch first, then copy and paste into Word.
smcdaniel04Sep 29, 2011267B5425
Rebound Regression
Here is a data set that students in one group of our introductory course generated for this activity. Only three drops at each of eleven heights were made here, but this should provide an idea of the type of data that would be collected.
smcdanie%scJul 4, 2008219B830
Presidential Heights
Heights of presidents and their main opponent (in cm.).
mooresk1Sep 5, 2017346B701
US Population Estimates by Gender and Age (2010-2014)
Find a description at the following webpage: Census Data Description
statcrunchhelpMar 9, 201617KB415
Bike Sharing
This dataset contains the hourly and daily count of rental bikes between years 2011 and 2012 in Capital bikeshare system with the corresponding weather and seasonal information. - instant: record index - dteday : date - season : season (1:springer, 2:summer, 3:fall, 4:winter) - yr : year (0: 2011, 1:2012) - mnth : month ( 1 to 12) - hr : hour (0 to 23) - holiday : weather day is holiday or not (extracted from [Web Link]) - weekday : day of the week - workingday : if day is neither weekend nor holiday is 1, otherwise is 0. + weathersit : - 1: Clear, Few clouds, Partly cloudy, Partly cloudy - 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist - 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds - 4: Heavy Rain + Ice Pallets + Thunderstorm + Mist, Snow + Fog - temp : Normalized temperature in Celsius. The values are derived via (t-t_min)/(t_max-t_min), t_min=-8, t_max=+39 (only in hourly scale) - atemp: Normalized feeling temperature in Celsius. The values are derived via (t-t_min)/(t_max-t_min), t_min=-16, t_max=+50 (only in hourly scale) - hum: Normalized humidity. The values are divided to 100 (max) - windspeed: Normalized wind speed. The values are divided to 67 (max) - casual: count of casual users - registered: count of registered users - cnt: count of total rental bikes including both casual and registered
je175May 11, 20161MB268
6-1 Discussion Applications of Correlation_SMM
Scatter Plot and Correlation for 6-1 Discussion post for MAT 240.
smiiOct 16, 2017282B56

1 2 3 4 5 6   >

Always Learning