Data sets shared by StatCrunch members
Showing 1 to 15 of 87 data sets matching race
Data Set/Description Owner Last edited Size Views
Criminal Recidivism in Iowa: 2010-2014
Recidivism is defined as the "tendency of a convicted criminal to reoffend". This data set tracks former criminals from Iowa over a 3-year period after their release from prison to see whether they were convicted of a new crime during that time. The recidivism reporting year is the fiscal year (year ending June 30) marking the end of the three-year tracking period. Included are the following variables: Fiscal Year Released (the year the individual was released from prison) and the Race, Ethnicity, Sex, and Age of the individual when released. Also included are details about the original crime committed, along with whether that individual committed a new crime (Recidivism - Return to Prison) within the 3-year window.
Owner: statcrunch_featured | Last edited: Mar 21, 2018 | Size: 3MB | Views: 3644
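As a rough illustration of the fiscal-year bookkeeping described in the recidivism entry above, the sketch below maps a release date to a fiscal year ending June 30 and then to a reporting year. It assumes that fiscal years are labeled by the calendar year in which they end and that the reporting year is the release fiscal year plus three. Neither convention is confirmed by the description, and the function names are hypothetical.

```python
from datetime import date

TRACKING_YEARS = 3  # length of the tracking window stated in the description


def fiscal_year(d: date) -> int:
    """Fiscal year containing d, for fiscal years ending June 30.

    Assumption: a fiscal year is labeled by the calendar year in which it ends,
    so July 1, 2009 through June 30, 2010 is fiscal year 2010.
    """
    return d.year + 1 if d.month >= 7 else d.year


def reporting_year(release_fiscal_year: int) -> int:
    """Assumed reporting year: release fiscal year plus the tracking window."""
    return release_fiscal_year + TRACKING_YEARS


if __name__ == "__main__":
    released = date(2009, 7, 1)  # made-up release date, not from the data set
    fy = fiscal_year(released)
    print(fy, reporting_year(fy))
```

Checking a few rows of the Fiscal Year Released and reporting year columns against this rule would show whether the assumed offset holds in the actual data.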
New York City Leading Causes of Death (2007-2014)
This data set breaks down the leading causes of death in New York City from 2007 to 2014. Included is the number of deaths (Deaths) for each combination of Sex and Race Ethnicity. The Death Rate represents the rate within that Sex/Race Ethnicity category. Age Adjusted Death Rate adjusts the Death Rate by the ages of those who died.
Owner: statcrunch_featured | Last edited: Aug 1, 2018 | Size: 96KB | Views: 4206
Super Heroes
This data set originally came from the following website: https://www.kaggle.com/claudiodavi/superhero-set. It contains various physical characteristics for over 700 fictional comic book super heroes.
Owner: statcrunch_featured | Last edited: Aug 1, 2018 | Size: 47KB | Views: 6835
Fatal Encounters Updated September 2018
This data set was downloaded from Fatal Encounters, a non-profit organization that collects data on police-involved deaths. It has been truncated to include the subject's name, age at time of death, gender, race, location of death, cause of death, and year of death. It is not limited to people shot by police; it also includes instances of police who died during fatal encounters. Students using this data set should be reminded that the data are collected by volunteers who scour news articles for evidence of these fatal encounters, so it is not a complete population of fatal encounters, only a very large sample.
Owner: habarker | Last edited: Apr 8, 2019 | Size: 3MB | Views: 162
McKenna Morrissey: Depression and the Internet
This study was done to determine whether spending more time on the internet causes depression. The data set includes hours spent on the internet per week, depression before and after, gender, race, age, household income, and household size. (https://dasl.datadescription.com/datafile/depression-and-the-internet/?_sfm_cases=4+17504&sf_paged=6)
Owner: mckenrm | Last edited: Oct 24, 2018 | Size: 9KB | Views: 1124
All Texas Executions from 1982-2015
This data set records all executions in Texas from 1982-2015 and comes from the following website: Texas Executions. The data include a variety of information about each execution, including the person's last statement.
Owner: statcrunchhelp | Last edited: Jan 7, 2016 | Size: 242KB | Views: 2662
U.S. House Candidates Fund Raising (In-District vs Out-of-District)
This data set contains each candidate's name, party affiliation, state/district of race, total funds raised, funds raised in-district, funds raised out-of-district, funds raised with no district information, percentage of funds raised in-district and percentage of funds raised out-of-district.
Owner: websterwest | Last edited: Dec 2, 2014 | Size: 54KB | Views: 1441
North Carolina birth data
A random sample of 1000 births from the state of North Carolina. Plurality refers to the number of children associated with the birth. Gender 1=Male, 2=Female. fage is age of father (years), mage is age of mother (years), visits is number of pre-natal medical visits, marital is 1=married, 2=unmarried, racemom is Race of Mother (0=Other Non-white, 1=White, 2=Black, 3=American Indian, 4=Chinese, 5=Japanese, 6=Hawaiian, 7=Filipino, 8=Other Asian or Pacific Islander), hispmom is whether mother is of Hispanic origin (C=Cuban, M=Mexican, N=Non-Hispanic, O=Other and Unknown Hispanic, P=Puerto Rican, S=Central/South American, U=Not Classifiable), gained is weight gain during pregnancy (pounds), lowbw is if birth weight is 2500 grams or lower, tpounds is birthweight in pounds, smoke is 0=no, 1=yes for mother admitted to smoking, mature is 0=no, 1=yes for mother is 35 or older, premie is 0=no, 1=yes to being born 36 weeks or sooner.
Owner: jph422 | Last edited: Sep 8, 2008 | Size: 37KB | Views: 5194
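The coded columns in the North Carolina birth data entry above are easier to read once the codes are mapped back to labels. The sketch below shows one possible way to do that in Python. The code-to-label tables are copied from the description; the `decode_birth` helper and the sample row are hypothetical and are not taken from the data set.

```python
# Code-to-label tables copied from the data set description.
RACEMOM = {
    0: "Other Non-white", 1: "White", 2: "Black", 3: "American Indian",
    4: "Chinese", 5: "Japanese", 6: "Hawaiian", 7: "Filipino",
    8: "Other Asian or Pacific Islander",
}
HISPMOM = {
    "C": "Cuban", "M": "Mexican", "N": "Non-Hispanic",
    "O": "Other and Unknown Hispanic", "P": "Puerto Rican",
    "S": "Central/South American", "U": "Not Classifiable",
}
MARITAL = {1: "married", 2: "unmarried"}
NO_YES = {0: "no", 1: "yes"}  # used by smoke, mature, and premie


def decode_birth(row: dict) -> dict:
    """Return a copy of row with coded columns replaced by their labels.

    Hypothetical helper: columns missing from the row are left out.
    """
    lookups = {
        "racemom": RACEMOM, "hispmom": HISPMOM, "marital": MARITAL,
        "smoke": NO_YES, "mature": NO_YES, "premie": NO_YES,
    }
    decoded = dict(row)
    for column, table in lookups.items():
        if column in decoded:
            decoded[column] = table[decoded[column]]
    return decoded


if __name__ == "__main__":
    # Made-up row for illustration only.
    sample = {"mage": 29, "racemom": 1, "hispmom": "N", "marital": 1, "smoke": 0}
    print(decode_birth(sample))
```

Keeping the lookup tables separate from the decoding function means the same function could be reused for the other North Carolina birth samples in this listing, which share the same description.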
US Smoking
This data set presents three-year annual average estimates of smoking status by age, sex, state, and race/ethnicity beginning with the period 1997-1999. These estimates are presented as three-year annual averages to obtain stable estimates.
Owner: websterwest | Last edited: Jul 17, 2008 | Size: 5KB | Views: 5557
North Carolina premature births
A random sample of 1000 births from the state of North Carolina. Plurality refers to the number of children associated with the birth. Gender 1=Male, 2=Female. fage is age of father (years), mage is age of mother (years), visits is number of pre-natal medical visits, marital is 1=married, 2=unmarried, racemom is Race of Mother (0=Other Non-white, 1=White, 2=Black, 3=American Indian, 4=Chinese, 5=Japanese, 6=Hawaiian, 7=Filipino, 8=Other Asian or Pacific Islander), hispmom is whether mother is of Hispanic origin (C=Cuban, M=Mexican, N=Non-Hispanic, O=Other and Unknown Hispanic, P=Puerto Rican, S=Central/South American, U=Not Classifiable), gained is weight gain during pregnancy (pounds), lowbw is if birth weight is 2500 grams or lower, tpounds is birthweight in pounds, smoke is 0=no, 1=yes for mother admitted to smoking, mature is 0=no, 1=yes for mother is 35 or older, premie is 0=no, 1=yes to being born 36 weeks or sooner.
Owner: statcrunchhelp | Last edited: Apr 10, 2014 | Size: 4KB | Views: 2151
Percent of Adult Current Smokers by Sex and Race/Ethnicity, 1995-2010
The original data come from the U.S. Department of Health and Human Services. Cited from the source: Adults are defined as 18 years of age and older. The CDC defines a "Current Smoker" as an adult who has smoked at least 100 cigarettes (5 packs) in their lifetime and currently smokes either "Every Day" or "Some Days." BRFSS data methodology changed in 2011; therefore, data from 2011 onward are not comparable to data from 2010 and before.
Owner: statcrunchhelp | Last edited: Mar 9, 2016 | Size: 1KB | Views: 924
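The "Current Smoker" definition quoted in the entry above amounts to a two-part rule, which the sketch below writes out as a Python predicate. The thresholds and the "Every Day" / "Some Days" labels come from the description; the function name, its parameters, and the "Not at all" response used in the example are assumptions made for illustration.

```python
ADULT_AGE = 18                 # adults are 18 years of age and older
LIFETIME_THRESHOLD = 100       # at least 100 cigarettes (5 packs) in a lifetime
CURRENT_RESPONSES = {"Every Day", "Some Days"}


def is_current_smoker(age: int, lifetime_cigarettes: int, smokes_now: str) -> bool:
    """Apply the quoted definition to one survey respondent.

    Hypothetical signature: the real survey records may be structured differently.
    """
    return (
        age >= ADULT_AGE
        and lifetime_cigarettes >= LIFETIME_THRESHOLD
        and smokes_now in CURRENT_RESPONSES
    )


if __name__ == "__main__":
    # Made-up respondents for illustration only.
    print(is_current_smoker(34, 2500, "Some Days"))
    print(is_current_smoker(34, 2500, "Not at all"))
    print(is_current_smoker(34, 40, "Every Day"))
```

Both conditions must hold, so someone who smokes every day but has not yet reached 100 lifetime cigarettes would not be counted under this definition.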
Stop And Frisk Data For January 2012
This data set contains information on the 69,073 stops made under the Stop And Frisk policy of the New York City police department in January of 2012. For the detainee, the variables include Sex (0 - female, 1 - male), Race (1 - black, 2 - black Hispanic, 3 - white Hispanic, 4 - white, 5 - Asian/Pacific Islander, 6 - American Indian), Age, Height and Weight. Note that Age, Height and Weight may be subject to coding errors based on some of the more extreme values. Other variables in the data set are FriskOrSearch (0 - if the stop did not result in either frisk or search, 1 - otherwise), FoundSomething (0 - if the detainee was not found to have either contraband or weapons, 1 - otherwise) and ArrestMade (0 - if no arrest was made, 1 - otherwise).
Owner: websterwest | Last edited: Aug 27, 2013 | Size: 1MB | Views: 3574
nc2005birth300.xls
A random sample of 300 births from the state of North Carolina. Plurality refers to the number of children associated with the birth. Gender 1=Male, 2=Female. fage is age of father (years), mage is age of mother (years), visits is number of pre-natal medical visits, marital is 1=married, 2=unmarried, racemom is Race of Mother (0=Other Non-white, 1=White, 2=Black, 3=American Indian, 4=Chinese, 5=Japanese, 6=Hawaiian, 7=Filipino, 8=Other Asian or Pacific Islander), hispmom is whether mother is of Hispanic origin (C=Cuban, M=Mexican, N=Non-Hispanic, O=Other and Unknown Hispanic, P=Puerto Rican, S=Central/South American, U=Not Classifiable), gained is weight gain during pregnancy (pounds), lowbw is if birth weight is 2500 grams or lower, tpounds is birthweight in pounds, smoke is 0=no, 1=yes for mother admitted to smoking, mature is 0=no, 1=yes for mother is 35 or older, premie is 0=no, 1=yes to being born 36 weeks or sooner.
Owner: jph422 | Last edited: Nov 5, 2007 | Size: 11KB | Views: 896
Low Birth Weight Study
SOURCE: Hosmer and Lemeshow (2000) Applied Logistic Regression: Second Edition. Data were collected at Baystate Medical Center, Springfield, Massachusetts during 1986.

DESCRIPTIVE ABSTRACT: The goal of this study was to identify risk factors associated with giving birth to a low birth weight baby (weighing less than 2500 grams). Data were collected on 189 women, 59 of which had low birth weight babies and 130 of which had normal birth weight babies. Four variables which were thought to be of importance were age, weight of the subject at her last menstrual period, race, and the number of physician visits during the first trimester of pregnancy.

LIST OF VARIABLES:

Columns  Variable                                                Abbreviation
-----------------------------------------------------------------------------
2-4      Identification Code                                     ID
10       Low Birth Weight (0 = Birth Weight >= 2500g,            LOW
         1 = Birth Weight < 2500g)
17-18    Age of the Mother in Years                              AGE
23-25    Weight in Pounds at the Last Menstrual Period           LWT
32       Race (1 = White, 2 = Black, 3 = Other)                  RACE
40       Smoking Status During Pregnancy (1 = Yes, 0 = No)       SMOKE
48       History of Premature Labor (0 = None, 1 = One, etc.)    PTL
55       History of Hypertension (1 = Yes, 0 = No)               HT
61       Presence of Uterine Irritability (1 = Yes, 0 = No)      UI
67       Number of Physician Visits During the First Trimester   FTV
         (0 = None, 1 = One, 2 = Two, etc.)
73-76    Birth Weight in Grams                                   BWT
-----------------------------------------------------------------------------

PEDAGOGICAL NOTES: These data have been used as an example of fitting a multiple logistic regression model.

STORY BEHIND THE DATA: Low birth weight is an outcome that has been of concern to physicians for years. This is due to the fact that infant mortality rates and birth defect rates are very high for low birth weight babies. A woman's behavior during pregnancy (including diet, smoking habits, and receiving prenatal care) can greatly alter the chances of carrying the baby to term and, consequently, of delivering a baby of normal birth weight. The variables identified in the code sheet given in the table have been shown to be associated with low birth weight in the obstetrical literature. The goal of the current study was to ascertain if these variables were important in the population being served by the medical center where the data were collected.

References: 1. Hosmer and Lemeshow, Applied Logistic Regression, Wiley, (1989).
Owner: wikipeterson | Last edited: Jul 23, 2012 | Size: 6KB | Views: 7614
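The column positions in the Low Birth Weight Study code sheet above describe a fixed-width text layout. The sketch below shows one way such a record could be sliced into fields in Python, assuming the raw file really follows that layout with 1-indexed, inclusive column ranges and integer values; the copy hosted on StatCrunch may already be split into columns. The `parse_record` and `build_record` helpers and the sample values are hypothetical and are not taken from the data.

```python
# Column ranges (1-indexed, inclusive) copied from the code sheet.
FIELDS = {
    "ID": (2, 4), "LOW": (10, 10), "AGE": (17, 18), "LWT": (23, 25),
    "RACE": (32, 32), "SMOKE": (40, 40), "PTL": (48, 48), "HT": (55, 55),
    "UI": (61, 61), "FTV": (67, 67), "BWT": (73, 76),
}
RECORD_WIDTH = 76


def parse_record(line: str) -> dict[str, int]:
    """Slice one fixed-width line into integer fields."""
    padded = line.ljust(RECORD_WIDTH)
    return {
        name: int(padded[start - 1:end])
        for name, (start, end) in FIELDS.items()
    }


def build_record(values: dict[str, int]) -> str:
    """Hypothetical helper: right-align each value in its documented columns."""
    chars = [" "] * RECORD_WIDTH
    for name, (start, end) in FIELDS.items():
        width = end - start + 1
        text = str(values[name]).rjust(width)
        if len(text) != width:
            raise ValueError(f"{name} does not fit in {width} column(s)")
        chars[start - 1:end] = text
    return "".join(chars)


if __name__ == "__main__":
    # Made-up values for illustration only.
    sample = {"ID": 1, "LOW": 0, "AGE": 25, "LWT": 130, "RACE": 1, "SMOKE": 0,
              "PTL": 0, "HT": 0, "UI": 0, "FTV": 1, "BWT": 3000}
    line = build_record(sample)
    print(repr(line))
    print(parse_record(line))
```

Building the test line from the same column table that the parser uses keeps the example self-checking: a round trip through `build_record` and `parse_record` should return the original values.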
Tour de France GC Data, 1903-2018
List of all Tour de France general classification winners, including distance, winning time, average speed, winning margin, and more. Sources: 1903-2017 data on pp. 110-112 of http://netstorage.lequipe.fr/ASO/cycling_tdf/2018-historical-guide.pdf; 2018 data from https://www.letour.fr/en/history and https://www.letour.fr/en/rider/8/team-sky/geraint-thomas; 1903-2018 data on victory margin, stage wins, stages in lead, and other titles from https://en.wikipedia.org/wiki/List_of_Tour_de_France_general_classification_winners
Owner: len.cabrera | Last edited: Jan 10, 2019 | Size: 14KB | Views: 152