General Help

What is StatCrunch?

StatCrunch is a statistical data analysis package for the World Wide Web. It is written in the form of a Java applet. We think users will find it easy to use, and we hope they enjoy working with our package!

Who we are

StatCrunch was created and programmed by a team of programmers and statisticians led by Webster West. Dr. West is in the Department of Statistics at Texas A&M University. The package was created as an initial attempt to solve many of the problems that exist with the delivery and use of modern statistical software. Statisticians often develop procedures in languages such as Splus, SAS, Minitab, etc., which are very specific to statisticians. Students and other potential users may not have access to these packages, and therefore may not be able to use the procedures. By using Java and the World Wide Web, StatCrunch should reach the broadest possible audience of any statistical software of its kind.

Getting started

StatCrunch should run on any of the three major platforms (Mac, PC, Unix). Its only requirement is a Java-capable Web browser, which almost everyone on the Web now has. If you do not have a Java-enabled browser, you will probably see a gray box, which may or may not have a red x in it, after logging in to StatCrunch. A test to determine if a browser is Java-enabled is given below:

Java -

If the test above indicates that Java is not enabled, perform one of the following:

 
Explorer 5.x
Explorer 4.x
Communicator 4.x
Navigator 3.x

Using StatCrunch

The Data, Stat, Graphics and Help menus, located at the top of the StatCrunch frame, provide users with access to the analysis procedures of the software. The Help menu is linked to the StatCrunch help page. See the Data, Stat and Graphics help pages for a listing of these procedures and instructions on how to use them.

The dataset to be analyzed is displayed inside the data table located below the menu bar. StatCrunch offers a variety of methods for loading data. After loading data and selecting a menu item, a listing of the available procedures will appear in a new window. A dialog box will appear after selecting one of these procedures. In the dialog box will be a ? button which directly links the user to the relevant help information for that procedure. After making selections within the dialog box, the results of the procedure will appear in the window.


Saving and Printing Results

To copy, save or print StatCrunch results, you will first need to export the result to HTML. First, select the Export option under the Options menu of the result window. The graphics in the output are written as GIF files on the StatCrunch server, so this may take a few moments for results that contain a large number of graphics. With the latest StatCrunch interface, the results in HTML format will be displayed in the frame below the data table. In older versions of the interface, the results appear in pop up windows (which may be blocked if you have a pop up blocker turned on in your browser).

In either case, use the browser's File menu to print your results. In most browsers, you can also copy selected graphics and/or text to the clipboard by choosing the Copy option under the Edit menu of your browser or by right-clicking in the frame containing the HTML results.

It is important to remember that the graphics links in the HTML file point to the graphics stored on the StatCrunch server. The file names for graphics are encoded with random letters so that only individuals who have the exact file names will have access to them. Individual graphics may be saved by clicking on the graphic in the new window and then using the browser's File menu to save or print it. If graphics are downloaded to a local file system, the IMG tags in the HTML file must be edited to indicate the proper path to the graphics files on the local system.
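The IMG tag edit described above can be scripted. The following is a minimal Python sketch, assuming the exported HTML uses double-quoted src attributes and that the downloaded graphics keep their original file names; the function name localize_img_sources, the folder name, and the sample URL are hypothetical and are not part of StatCrunch.

```python
import re

def localize_img_sources(html, local_dir):
    """Point every IMG tag's src at a file of the same name inside local_dir."""
    def replace_src(match):
        prefix, url, suffix = match.group(1), match.group(2), match.group(3)
        file_name = url.rsplit("/", 1)[-1]  # keep only the file name
        return f"{prefix}{local_dir}/{file_name}{suffix}"

    # Assumption: src values are double-quoted; tag case is ignored.
    pattern = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', re.IGNORECASE)
    return pattern.sub(replace_src, html)

# Hypothetical fragment; real exported file names are random strings.
sample = '<IMG src="http://example.com/tmp/abcxyz.gif">'
print(localize_img_sources(sample, "graphics"))
```

A regular expression is enough for a simple exported page; for HTML that has been edited by hand, a proper HTML parser would be a safer choice.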

Including StatCrunch in a web page

Feel free to link to the StatCrunch site using the following syntax:

<A href="http://www.statcrunch.com/">StatCrunch</A>


Linking Data

Using the Link Generator Form, an HTML link can be created so that a specified data file on the web will be automatically loaded into StatCrunch when the link is clicked. Both text and Excel files can be linked.
  1. Simply specify the text of the link to be displayed on the web page (e.g., "My Data File").
  2. Specify the WWW address of the dataset to be loaded (e.g., http://www.myData.com/myData.txt).
  3. If the first line of the data file contains variable names, check the Use first line as variable names option.
  4. If the data file is a text file, specify the delimiter for the observations. The delimiter options are whitespace (any whitespace character such as a space or tab), tab, comma (for .csv files) and semicolon.
As an example, the Excel data file located at http://www.stat.sc.edu/~west/hotdog.xls can be accessed by clicking the following link: Hotdog Data

Using WHERE

When selecting data to be used with the various analysis procedures, a Where statement can be used to determine which rows from the data table will be included in the analysis. The Where statement provides an excellent way to isolate a subgroup within the data for analysis. The statement should be a valid boolean expression which evaluates to either a true or false value. The expression will be evaluated for each row in the data set, and only rows where the expression evaluates to true will be included in the analysis. See the section on expressions below for more information on constructing boolean expressions. Examples of Where statements, using the Hotdog Data, are given below.
Calories=190
includes rows where the Calories column is equal to 190
Calories>150
includes rows where the Calories column is greater than 150
Calories>=150
includes rows where the Calories column is greater than or equal to 150
Calories<>190
includes rows where the Calories column is not equal to 190
Calories!=190
includes rows where the Calories column is not equal to 190
LOG(Calories)>5
includes rows where the natural logarithm of the Calories column is greater than 5
Type=Meat
includes rows where the text in the Type column is Meat.
Type="Meat"
includes rows where the text in the Type column is Meat. Note that it is only necessary to use double quotes when the text string contains spaces.
Type<>Beef
includes rows where the text in the Type column is not Beef.
Sodium=386 AND Type=Meat
includes rows where the Sodium column is equal to 386 and the Type column is Meat
Sodium<=400 OR Type="Meat"
includes rows where the Sodium column is less than or equal to 400 or the Type column is Meat
(Sodium>=400 AND Sodium<=500) AND Type="Meat"
includes rows where the Sodium column is between 400 and 500 (inclusive) and the Type column is Meat
row=5
includes only the 5th row
row>=3 AND row<=10
includes rows 3 through 10
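
The row-by-row evaluation described above can be mimicked outside StatCrunch. The following is a minimal Python sketch, not StatCrunch's own implementation: the helper where and the sample rows are hypothetical (the values are illustrative and are not taken from the actual Hotdog Data), and the row number is assumed to start at 1, as the row=5 example suggests.

```python
import math

# Illustrative rows only; these are NOT values from the actual Hotdog Data.
rows = [
    {"Type": "Beef", "Calories": 190, "Sodium": 480},
    {"Type": "Meat", "Calories": 140, "Sodium": 386},
    {"Type": "Poultry", "Calories": 100, "Sodium": 510},
    {"Type": "Meat", "Calories": 175, "Sodium": 450},
]

def where(rows, predicate):
    """Return the rows for which predicate(row, row_number) is true.

    row_number starts at 1, mirroring the `row` keyword in the examples.
    """
    return [r for i, r in enumerate(rows, start=1) if predicate(r, i)]

# Calories>150
high_cal = where(rows, lambda r, i: r["Calories"] > 150)

# Sodium>=400 AND Sodium<=500 AND Type="Meat"
meat_mid_sodium = where(
    rows, lambda r, i: 400 <= r["Sodium"] <= 500 and r["Type"] == "Meat")

# LOG(Calories)>5 (natural logarithm)
log_cal = where(rows, lambda r, i: math.log(r["Calories"]) > 5)

# row>=2 AND row<=3
middle_rows = where(rows, lambda r, i: 2 <= i <= 3)

print(len(high_cal), len(meat_mid_sodium), len(log_cal), len(middle_rows))
```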

Expressions

Some StatCrunch procedures allow the user to input either a boolean (true/false) or mathematical expression. See the compute expression section to see examples of mathematical expressions. See the WHERE section for examples of boolean expressions used to control the data rows that are included in an analysis. Notes on using expressions:

Using GROUP BY

Most StatCrunch analysis procedures allow the user to group results based on a column in the data table. For example, to compute summary statistics of Calories grouped by Type using the Hotdog Data, select Calories under Select column(s) and Type as the "Group by" variable. This will return summary statistics for each distinct value of Type.
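
As a rough illustration of what grouping does, the following Python sketch splits a value column by the distinct values of a grouping column and summarizes each group separately. The function summarize_by_group and the sample rows are hypothetical (the values are not from the actual Hotdog Data), and the three statistics shown are only an assumed subset of what a summary might report.

```python
from collections import defaultdict
from statistics import mean, stdev

def summarize_by_group(rows, value_col, group_col):
    """Return n, mean and sample standard deviation for each distinct group."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[group_col]].append(r[value_col])
    return {
        g: {
            "n": len(v),
            "mean": mean(v),
            # The sample standard deviation needs at least two values.
            "std": stdev(v) if len(v) > 1 else None,
        }
        for g, v in groups.items()
    }

# Illustrative rows only; NOT values from the actual Hotdog Data.
rows = [
    {"Type": "Beef", "Calories": 190},
    {"Type": "Beef", "Calories": 150},
    {"Type": "Meat", "Calories": 140},
    {"Type": "Poultry", "Calories": 100},
    {"Type": "Poultry", "Calories": 120},
]

for group, stats in summarize_by_group(rows, "Calories", "Type").items():
    print(group, stats)
```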

Some of the graphics provide an option to view separate graphs for each group. This option is not selected by default. If this option is not chosen, then the plot will be color coded based on the grouping variable for easy reference.


Fonts

StatCrunch allows the user to specify the three separate fonts that are used to display the data table, text results (tabular results), and graphical results. To specify these fonts, use the Edit > Fonts menu option, and then specify the name/size for each of the fonts. Note that the available fonts may vary depending on the fonts available on the user's computer system. Also, note that the font specified for graphical results is the maximum font size that may be used. When constructing graphics, StatCrunch may shrink the font in order to fit the graphic nicely in a standard sized result window. The corresponding result window may be manually resized to increase the font size up to the maximum font specified for graphics.


Orderings

StatCrunch allows the user to create orderings that are used to determine the display order for certain types of tabular and graphical output. An ordering can be specified to help StatCrunch display output in a more natural way. As examples, StatCrunch provides predefined orderings for both the natural ordering of the days of week (Sunday,Monday,Tuesday,... and Sun,Mon,Tue,...) and the natural ordering of the months of the year (January,February,March,... and Jan,Feb,Mar,...). When StatCrunch produces output with a set of group labels that contains only the values defined in an ordering or some subset of them, the groups will be reordered in the output according to their relative position in the ordering sequence rather than using an alphanumeric ordering which is otherwise standard.

Orderings can be added, deleted and modified using the Edit > Orderings menu option. When this option is selected, a new dialog will appear with a listing of all the active orderings for the current StatCrunch session. To modify/remove a particular ordering, select the ordering from the list and then press the Edit/Delete button. A new ordering can be added by clicking the Add new ordering button. When modifying or adding an ordering, the distinct values should be entered one per line in the resulting text field. StatCrunch ignores case when comparing ordering values to group labels to determine whether or not to apply the ordering to specific output.
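
The matching rule described above (apply an ordering only when every group label appears in it, ignoring case) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than StatCrunch's implementation: the function apply_ordering is hypothetical, the first matching ordering is assumed to win, and Python's sorted() stands in for the default alphanumeric ordering.

```python
DAYS_SHORT = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]
MONTHS_SHORT = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def apply_ordering(labels, orderings):
    """Sort group labels by the first ordering that contains all of them.

    Comparison ignores case. If no ordering covers every label, fall back
    to plain sorted() order.
    """
    for ordering in orderings:
        position = {value.lower(): i for i, value in enumerate(ordering)}
        if all(label.lower() in position for label in labels):
            return sorted(labels, key=lambda label: position[label.lower()])
    return sorted(labels)

orderings = [DAYS_SHORT, MONTHS_SHORT]
print(apply_ordering(["wed", "Mon", "SUN"], orderings))  # subset of the days
print(apply_ordering(["Mar", "Jan", "Feb"], orderings))  # subset of the months
print(apply_ordering(["b", "Mon", "a"], orderings))      # no ordering applies
```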


Contact Us

For questions or comments, please submit a request via the tech support page.


Applets

StatCrunch now offers interactive applets that are designed to allow users to learn statistical concepts and explore data in a more interactive fashion. These components are available under the StatCrunch > Applets menu option. These applets have the look and feel of many standalone Java applets that have been used by statistical educators to illustrate statistical concepts for many years. While many educators have traditionally sampled applets from a variety of Web locations, an expanded list of StatCrunch applets in addition to existing data analysis capabilities will allow StatCrunch to become a one stop shop for many interested educators.

StatCrunch applets offer some exciting capabilities beyond those encountered in most Java applets for statistical education. First, the applets can be generated using data loaded in the StatCrunch software package, whereas most standalone applets have a preloaded data set that cannot easily be changed. This feature opens the door for developing interesting demonstrations in a variety of different customized data settings. Second, these components can be saved as standalone applets and shared with others. This allows instructors to develop applets for interesting, relevant data sets for their courses and easily share them with their students as well as other instructors. If an applet is shared with everyone as part of the saving process, the applet will also be viewable by those without StatCrunch accounts. When saving an applet, the current state of the applet is also saved. This allows students to also develop and save applets as part of course work to be evaluated by their instructor. The applets that are currently available are described below.

Histogram with sliders

The Histogram with sliders option under the StatCrunch > Applets menu allows users to explore the impact of changing the starting point and bin width parameters for a histogram. To construct the applet, the user is prompted to specify the column containing the data and an optional Where statement to control the specific rows of data that are included. In this setting, a Where statement is useful for focusing the analysis on a particular subgroup or for eliminating outliers.

Users may interactively change the starting point and bin width of the resulting histogram applet using the sliders provided. As an example of potential classroom usage, individual students might provide their own choices for these settings and save their results so that they can be compared with those of other students. The current settings for starting point and bin width are saved when the applet is saved using the Options > Export to My Results menu option on the applet window within StatCrunch.
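
To see why the two slider settings matter, the following Python sketch bins the same data with two different starting points. It is only an illustration: the function histogram_counts is hypothetical, the data values are made up, and the choice of left-closed bins [start + k*width, start + (k+1)*width) is an assumption, since the binning convention used by the applet is not stated here.

```python
import math

def histogram_counts(data, start, width):
    """Count values in bins [start + k*width, start + (k+1)*width).

    Returns a dict mapping each non-empty bin's left edge to its count.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    counts = {}
    for x in data:
        if x < start:
            continue  # values left of the starting point are left out here
        k = math.floor((x - start) / width)
        counts[k] = counts.get(k, 0) + 1
    return {start + k * width: counts[k] for k in sorted(counts)}

data = [1, 2, 2, 3, 7]  # made-up values
print(histogram_counts(data, start=0, width=2))  # {0: 1, 2: 3, 6: 1}
print(histogram_counts(data, start=1, width=2))  # {1: 3, 3: 1, 7: 1}
```

With the same bin width, moving the starting point from 0 to 1 shifts the tallest bar from the second bin to the first, which is the kind of change the sliders let users explore.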

For more info and an interactive example see:
http://www.statcrunch.com/5.0/viewreport.php?reportid=15261

Regression by eye

The Regression by eye option under the StatCrunch > Applets menu is designed to allow users to better understand the least squares principle underlying linear regression by allowing them to visually estimate the line of best fit and compare their results to the true regression line. The interface for constructing the applet is shown in Result 1 for the Chicken Sandwich data set. The user selects the columns containing the X and Y variables, and provides an optional Where statement to control the data from these columns used in the regression analysis.

The user controls the green line shown in the scatter plot of the resulting applet by dragging the green dots on each end of the line. The resulting intercept and slope of the green line are displayed on the bottom line of the table below the graph along with the line's sum of squared error (SSE), which is the sum of squared vertical deviations of the data points from the line. The line of best fit in regression is defined as the line that minimizes SSE. The table also displays these values for the true regression line in red. Additional points such as outliers can be added to the analysis by simply clicking at the desired location on the graphic. Individual points can be moved by clicking on them and dragging the mouse. Points can be removed from the analysis completely by dragging them to the trash. Clicking on the trash can will clear all of the points and provide a blank slate for the regression analysis. After doing their best to minimize SSE, the user can click the Show regression line button to plot the true regression line in red to be compared with the user's green line. Clicking the button again will hide the regression line. The two light gray lines on the graphic show the mean of the X and Y values. An experienced user can use these values to better estimate the true regression line. For additional practice, the user can click the Create new data button to go through the exercise again with a randomly generated data set. For classroom usage, students might compete to see who can best approximate the true regression line for a specific set of data.
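
The comparison the applet makes can be reproduced numerically. The following Python sketch computes SSE for a guessed line and for the least squares line using the standard closed-form formulas; the function names and the data points are hypothetical and are not taken from the Chicken Sandwich data set.

```python
def sse(xs, ys, intercept, slope):
    """Sum of squared vertical deviations of the points from the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

def least_squares(xs, ys):
    """Return (intercept, slope) of the line that minimizes SSE."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # the line passes through (x_bar, y_bar)
    return intercept, slope

# Made-up data points for illustration.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

guess_intercept, guess_slope = 0.0, 2.0   # a line estimated "by eye"
fit_intercept, fit_slope = least_squares(xs, ys)

print("guess SSE:", sse(xs, ys, guess_intercept, guess_slope))
print("fit   SSE:", sse(xs, ys, fit_intercept, fit_slope))
```

The least squares line always passes through the point given by the mean of X and the mean of Y, which is why the two light gray mean lines are a useful guide when positioning the green line.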

For more info and an interactive example see:
http://www.statcrunch.com/5.0/viewreport.php?reportid=15262

Randomization test for two means

The Randomization test for two means option under the StatCrunch > Applets menu is designed to illustrate the way the randomization test works for the case of comparing two means. In this case, the null hypothesis states that the two population means are the same. To simulate this situation, the test repeatedly randomizes the sample assignments for each of the data values. The differences between the means for the randomized data are then compared to the observed mean difference for the original data in order to determine statistical significance.

The interface for constructing the applet is very similar to that of the two sample t procedure within StatCrunch. The user selects the column containing the first sample along with an optional Where clause to specify the data rows to be included in the sample. The user then selects the column containing the second sample along with an optional Where clause to specify the data rows to be included in the sample. Note that for stacked data the first and second column will typically be the same. In this case, the user may also enter optional short labels for each sample to be used in the resulting applet.

The resulting applet offers the ability to randomize the data 1 time, 5 times or 1000 times. These options correspond pedagogically to the ideas of step, walk and run, respectively. Click the 1 time button to see a single randomization of the data. A new window will pop up which shows the actual shuffling of the sample labels, with the results then displayed in the data table within the applet. The difference between the means of the shuffled samples (Sample 1 - Sample 2) is computed below the data table. If the mean difference for the randomized data is more extreme in absolute value than the observed difference for the original data, it will be displayed in red; otherwise it is displayed in black. Click the 5 times button to see this process repeated five times at a slightly faster pace. A running tally of the results is provided in the Results panel in the right portion of the applet. A histogram of color-coded randomized mean differences is displayed. A table of the frequencies and relative frequencies for the randomized mean differences that are less than the negative absolute value of the observed mean difference or greater than the absolute value of the observed mean difference is also shown. The counts/proportions totaled across both regions are also provided in the bottom line of this table. After a large number of randomizations, the user can use the line of this table that matches the alternative hypothesis to estimate a P-value.

A large number of randomizations can be quickly generated by clicking the 1000 times button. In this case, the individual randomizations are not displayed, but the Results panel is quickly updated after each randomization is computed. A user may desire to click this button several times to generate an even larger number of total randomizations. The user may then go back and inspect the results by clicking on the individual randomization numbers displayed in the applet or by clicking on the histogram in the Results panel. The far left panel in the applet contains the results of each randomization ordered numerically with those providing more extreme than observed results shown in red. Clicking on a randomization number shows the corresponding randomized data in the data table and the resulting mean difference. A small triangle is also displayed at the proper location corresponding to this mean difference within the histogram. The Graph button located below the data table can be used to construct a simple dotplot of the selected randomization. A user can select randomizations within the histogram by clicking and dragging the mouse to draw a rectangle intersecting a specific bar or bars. The total number of randomizations with mean differences in the selected bar(s) is then displayed in a pop up window along with the option for selecting individual color-coded randomizations falling within the selected range. From a pedagogical perspective, students may use this applet to understand the idea of repeated randomization and may identify and explore randomizations that are extreme.

When saving the applet from within StatCrunch, the user is asked whether or not to save the current random number seed. This seed determines the randomizations that will be generated as described above. If the seed is saved, subsequent users will generate the exact same randomizations as the current user. If it is not saved, then subsequent users will each get different randomizations.
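
The procedure described in this section can be sketched in Python as follows. This is an illustration of the general randomization test, not the applet's own code: the function name, the returned fields, and the data values are hypothetical, and the strict inequalities follow the "less than" and "greater than" wording used for the results table above.

```python
import random

def randomization_test_means(sample1, sample2, n_randomizations=1000, seed=None):
    """Shuffle sample labels and tally mean differences beyond the observed one."""
    rng = random.Random(seed)  # a fixed seed reproduces the same randomizations
    n1 = len(sample1)
    observed = sum(sample1) / n1 - sum(sample2) / len(sample2)
    pooled = list(sample1) + list(sample2)
    below = above = 0
    for _ in range(n_randomizations):
        rng.shuffle(pooled)  # reassign the values to the two samples at random
        first, second = pooled[:n1], pooled[n1:]
        diff = sum(first) / len(first) - sum(second) / len(second)
        if diff < -abs(observed):
            below += 1
        elif diff > abs(observed):
            above += 1
    return {
        "observed": observed,
        "below": below,
        "above": above,
        "total": below + above,
        "proportion_total": (below + above) / n_randomizations,
    }

# Made-up samples for illustration.
a = [12, 15, 11, 18, 14]
b = [10, 9, 13, 8, 11]
print(randomization_test_means(a, b, n_randomizations=1000, seed=42))
```

Calling the function twice with the same seed gives identical tallies, while seed=None draws a fresh sequence each time, which parallels the choice offered when saving the applet.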

For more info and an interactive example see:
http://www.statcrunch.com/5.0/viewreport.php?reportid=15263

Randomization test for correlation

The Randomization test for correlation option under the StatCrunch > Applets menu is designed to illustrate the way the randomization test works when considering the correlation between two variables, X and Y. In this case, the null hypothesis states that the correlation is zero. To simulate this situation, the test repeatedly randomizes the pairings of X and Y values in the sample data. The sample correlations for the randomized data are then compared to the observed sample correlation for the original data in order to determine statistical significance.

The interface for constructing the applet is very similar to that of the linear regression procedure within StatCrunch. The user selects the columns containing the X and Y samples along with an optional Where clause to specify the data rows to be included in the analysis.

The resulting applet offers the ability to randomize the data 1 time, 5 times or 1000 times. These options correspond pedagogically to the ideas of step, walk and run, respectively. First click the 1 time button to see a single randomization of the data. A new window will pop up that shows the actual shuffling of the Y values, with the results then displayed in the data table within the applet. The sample correlation for the shuffled data is computed below the data table. If the sample correlation for the randomized data is more extreme in absolute value than the observed sample correlation for the original data, it will be displayed in red; otherwise it is displayed in black. Click the 5 times button to see this process repeated five times at a slightly faster pace. A running tally of the results is provided in the Results panel in the right portion of the applet. A histogram of color-coded correlations is displayed. A table of the frequencies and relative frequencies for the randomized sample correlations that are less than the negative absolute value of the observed sample correlation or greater than the absolute value of the observed sample correlation is shown. The counts/proportions totaled across both regions are also provided in the bottom line of this table. The user can use the appropriate line from this table to estimate a P-value for a given alternative based on a large number of randomizations.

A large number of randomizations can be quickly generated by clicking the 1000 times button. In this case, the individual randomizations are not displayed, but the Results panel is quickly updated after each randomization is computed. Click this button several times to generate an even larger number of total randomizations. After generating a large number of randomizations, the user may then go back and inspect the results by clicking on the individual randomization numbers displayed in the applet or by clicking on the histogram in the Results panel. The far left panel in the applet contains the results of each randomization ordered numerically with those providing more extreme than observed results shown in red. Clicking on a randomization number shows the corresponding randomized data in the data table and the resulting sample correlation. A small triangle is also displayed at the proper location corresponding to this sample correlation within the histogram. The Graph button located below the data table can be used to construct a simple scatter plot of the selected randomization. A user can select randomizations within the histogram by clicking and dragging the mouse to draw a rectangle intersecting a specific bar or bars. The total number of randomizations with sample correlations in the selected bar(s) is then displayed in a pop up window along with the option for selecting individual color-coded randomizations falling within the selected range. From a pedagogical perspective, students may use this applet to understand the idea of repeated randomization and may identify and explore randomizations that are extreme.

When saving the applet from within StatCrunch, the user is asked whether or not to save the current random number seed. This seed determines the randomizations that will be generated as described above. If the seed is saved, subsequent users will generate the exact same randomizations as the current user. If it is not saved, then subsequent users will each get different randomizations.
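
For the correlation case, the only change to the general procedure is that the Y values are shuffled against fixed X values. The following Python sketch illustrates this under stated assumptions: the function names, the returned fields, and the data values are hypothetical, and the sample correlation is computed with the usual Pearson formula.

```python
import math
import random

def correlation(xs, ys):
    """Pearson sample correlation of two equal-length sequences."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def randomization_test_correlation(xs, ys, n_randomizations=1000, seed=None):
    """Shuffle the Y values and tally correlations beyond the observed one."""
    rng = random.Random(seed)  # a fixed seed reproduces the same randomizations
    observed = correlation(xs, ys)
    shuffled = list(ys)
    below = above = 0
    for _ in range(n_randomizations):
        rng.shuffle(shuffled)  # break the X-Y pairing at random
        r = correlation(xs, shuffled)
        if r < -abs(observed):
            below += 1
        elif r > abs(observed):
            above += 1
    return {
        "observed": observed,
        "below": below,
        "above": above,
        "proportion_total": (below + above) / n_randomizations,
    }

# Made-up paired data for illustration.
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]
print(randomization_test_correlation(x, y, n_randomizations=1000, seed=7))
```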

For more info and an interactive example see:
http://www.statcrunch.com/5.0/viewreport.php?reportid=15264