StatCrunch should run on any of the three major platforms (Mac, PC, Unix). Its only requirement is a Java-capable Web browser, which almost everyone on the Web now has. If you do not have a Java-enabled browser, you will probably see a gray box, possibly containing a red x, after logging in to StatCrunch. A test to determine whether a browser is Java-enabled is given below:

**Java** -

If the test above indicates that Java is not enabled, follow whichever of the following sets of steps matches the menus in your browser:

- Go to "Tools | Internet Options..." from the main menu
- Change to the "Security" tab
- Click the "Custom Level..." button
- To enable: make sure a setting other than "Disable Java" is selected under "Java". If you're not sure which setting to choose, select "High safety"
- Restart the browser

- Go to "View | Internet Options..." from the main menu
- Change to the "Security" tab
- Select "Custom" and click the "Settings..." button
- To enable: make sure a setting other than "Disable Java" is selected under "Java". If you're not sure which setting to choose, select "High safety"
- Restart the browser

- Go to "Edit | Preferences..." from the main menu
- Select the "Advanced" panel
- To enable: make sure the "Enable Java" check box is checked
- Restart the browser

- Go to "Options | Network Preferences..." from the main menu
- Change to the "Languages" tab
- To enable: make sure the "Enable Java" check box is checked
- Restart the browser

The dataset to be analyzed is displayed in the data table located below the menu bar. StatCrunch offers a variety of methods for loading data. After you load data and select a menu item, a listing of the available procedures appears in a new window. Selecting one of these procedures opens a dialog box. The dialog box contains a **?** button that links directly to the relevant help information for that procedure. After you make selections within the dialog box, the results of the procedure appear in the window.

<A href="http://www.statcrunch.com/">StatCrunch</A>

- Simply specify the text of the link to be displayed on the web page (e.g., "My Data File").
- Specify the WWW address of the dataset to be loaded (e.g., http://www.myData.com/myData.txt).
- If the first line of the data file contains variable names, check the **Use first line as variable names** option.
- If the data file is a text file, specify the delimiter for the observations. The delimiter options are whitespace (any whitespace character such as a space or tab), tab, comma (for .csv files) and semicolon.

- **Calories=190** - includes rows where the Calories column is equal to 190
- **Calories>150** - includes rows where the Calories column is greater than 150
- **Calories>=150** - includes rows where the Calories column is greater than or equal to 150
- **Calories<>190** - includes rows where the Calories column is not equal to 190
- **Calories!=190** - includes rows where the Calories column is not equal to 190
- **LOG(Calories)>5** - includes rows where the natural logarithm of the Calories column is greater than 5
- **Type=Meat** - includes rows where the text in the Type column is Meat
- **Type="Meat"** - includes rows where the text in the Type column is Meat. **Note that it is only necessary to use double quotes when the text string contains spaces.**
- **Type<>Beef** - includes rows where the text in the Type column is not Beef
- **Sodium=386 AND Type=Meat** - includes rows where the Sodium column is equal to 386 and the Type column is Meat
- **Sodium<=400 OR Type="Meat"** - includes rows where the Sodium column is less than or equal to 400 or the Type column is Meat
- **(Sodium>=400 AND Sodium<=500) AND Type="Meat"** - includes rows where the Sodium column is between 400 and 500 (inclusive) and the Type column is Meat
- **row=5** - includes only the 5th row
- **row>=3 AND row<=10** - includes rows 3 through 10
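To make the row-selection logic of these expressions concrete, here is a minimal Python sketch. It is not StatCrunch code: the data rows are made up, and the `where` helper is a hypothetical stand-in that simply tests a condition against each row and reports the 1-based row ids that pass.

```python
# Hypothetical rows; the values are made up for illustration only.
rows = [
    {"Type": "Beef", "Calories": 186, "Sodium": 495},
    {"Type": "Meat", "Calories": 190, "Sodium": 386},
    {"Type": "Meat", "Calories": 135, "Sodium": 405},
    {"Type": "Poultry", "Calories": 152, "Sodium": 588},
]


def where(rows, predicate):
    """Return the 1-based ids of the rows for which predicate(row_id, row) is true."""
    return [i for i, row in enumerate(rows, start=1) if predicate(i, row)]


# Calories>150
calories_over_150 = where(rows, lambda i, r: r["Calories"] > 150)
# Sodium=386 AND Type=Meat
sodium_and_meat = where(rows, lambda i, r: r["Sodium"] == 386 and r["Type"] == "Meat")
# Sodium<=400 OR Type="Meat"
sodium_or_meat = where(rows, lambda i, r: r["Sodium"] <= 400 or r["Type"] == "Meat")
# row>=2 AND row<=3
rows_2_to_3 = where(rows, lambda i, r: 2 <= i <= 3)

print(calories_over_150)  # [1, 2, 4]
print(sodium_and_meat)    # [2]
print(sodium_or_meat)     # [2, 3]
print(rows_2_to_3)        # [2, 3]
```

Note how AND keeps only rows that pass both tests, while OR keeps rows that pass at least one of them.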

- Most expressions should contain references to the existing columns in the data table. If there is a column name that contains a space, references to the column need to be enclosed in double quotes (e.g., "Column One").
- **Row** may be used to refer to the row id column in the data table. This is a StatCrunch keyword, so any other columns given this name will not be properly referenced.
- **Row_Selected** may be used to obtain a vector of boolean values where an element in the vector is true if the corresponding row in the data table is selected and false otherwise. This reference is very handy for specifying the data rows to be included in a specific analysis when used in conjunction with a Where statement after selecting rows. This is a StatCrunch keyword, so any other columns given this name will not be properly referenced.
- Parentheses can be used to force the order of evaluation in both mathematical and boolean expressions.
- The syntax for StatCrunch expressions follows ANSI SQL syntax. The components that can be used in an expression are listed below.
**Comparison Operators**

The operators below are very useful when constructing boolean expressions for **Where** statements.

- **=** - tests for equality of numeric or text values
- **>** - tests if one numeric value is greater than another numeric value
- **<** - tests if one numeric value is less than another numeric value
- **>=** - tests if one numeric value is greater than or equal to another numeric value
- **<=** - tests if one numeric value is less than or equal to another numeric value
- **<>** - tests for nonequality of values
- **!=** - tests for nonequality of values
- **IS NULL** - tests for a null value (empty cell)
- **IS NOT NULL** - tests for a nonnull value

**Logical Operators**

The operators below are very useful when constructing boolean expressions for **Where** statements.

- **AND** - compares two boolean values, returns true if both are true, and false otherwise
- **OR** - compares two boolean values, returns true if either is true, and false otherwise

**Arithmetic Operators**

These operators return numeric results when used with numeric arguments and null values otherwise.

- **/** - divides two numeric values
- **\*** - multiplies two numeric values
- **+** - adds two numeric values
- **-** - subtracts two numeric values
- **\*\*** - exponentiates one numeric value by another
- **^** - same as **\*\*** above

**Comparison Functions**

The functions below produce boolean (true/false) outputs. You can specify column names as inputs, in which case the function will return a vector applying the function to each value in the input column.

- **between(x,y,z)** - returns true if x is between y and z (noninclusive) and false otherwise
- **ifelse(x,y,z)** - returns y if x is true and z otherwise
- **ifnull(x,y,z)** - returns y if x is null (empty) and z otherwise
- **isNaN(x)** - returns true if x is not a numeric value and false otherwise
- **isNull(x)** - returns true if x is null and false otherwise

**Mathematical Functions**

The functions below require numeric inputs and provide numeric outputs. StatCrunch attempts to coerce nonnumeric values into the correct input type. You can specify column names as inputs, in which case the function will return a vector applying the function to each value in the input column.

- **abs(x)** - absolute value
- **acos(x)** - arc cosine
- **asin(x)** - arc sine
- **atan(x)** - arc tangent
- **ceil(x)** - ceiling, round up
- **cos(x)** - cosine
- **dbeta(x,alpha,beta)** - beta density at the value x with shape alpha and scale beta
- **dbinom(x,n,p)** - binomial probability mass function at the value x with parameters n and p
- **dcauchy(x)** - Cauchy density at the value x with location 0 and scale 1
- **dchisq(x,df)** - chi-square density at the value x with degrees of freedom df
- **df(x,ndf,ddf)** - F density at the value x with numerator degrees of freedom ndf and denominator degrees of freedom ddf
- **dgamma(x,alpha)** - gamma density at the value x with shape alpha
- **dnorm(x,mu,sigma)** - normal density at the value x with mean mu and standard deviation sigma
- **dpois(x,lambda)** - Poisson probability mass function at the value x with mean lambda
- **dt(x,df)** - t density at the value x with degrees of freedom df
- **exp(x)** - exponential function (e raised to the power x)
- **floor(x)** - floor, round down
- **larger(x,y)** - returns the larger of x and y
- **lesser(x,y)** - returns the lesser of x and y
- **lngamma(x)** - natural logarithm of the gamma function
- **log(x)** - natural logarithm
- **log10(x)** - logarithm base 10
- **log2(x)** - logarithm base 2
- **logbeta(x,y)** - natural logarithm of the beta function
- **ln(x)** - natural logarithm (base e)
- **pbeta(x,alpha,beta)** - beta CDF at the value x with shape alpha and scale beta
- **pbinom(x,n,p)** - binomial CDF at the value x with parameters n and p
- **pcauchy(x)** - Cauchy CDF at the value x with location 0 and scale 1
- **pchisq(x,df)** - chi-square CDF at the value x with degrees of freedom df
- **pf(x,ndf,ddf)** - F CDF at the value x with numerator degrees of freedom ndf and denominator degrees of freedom ddf
- **pgamma(x,alpha)** - gamma CDF at the value x with shape alpha
- **pnorm(x,mu,sigma)** - normal CDF at the value x with mean mu and standard deviation sigma
- **ppois(x,lambda)** - Poisson CDF at the value x with mean lambda
- **pt(x,df)** - t CDF at the value x with degrees of freedom df
- **qbeta(x,alpha,beta)** - beta quantile at the value x (between 0 and 1) with shape alpha and scale beta
- **qbinom(x,n,p)** - binomial quantile at the value x (between 0 and 1) with parameters n and p
- **qcauchy(x)** - Cauchy quantile at the value x (between 0 and 1) with location 0 and scale 1
- **qchisq(x,df)** - chi-square quantile at the value x (between 0 and 1) with degrees of freedom df
- **qf(x,ndf,ddf)** - F quantile at the value x (between 0 and 1) with numerator degrees of freedom ndf and denominator degrees of freedom ddf
- **qgamma(x,alpha)** - gamma quantile at the value x (between 0 and 1) with shape alpha
- **qnorm(x,mu,sigma)** - normal quantile at the value x (between 0 and 1) with mean mu and standard deviation sigma
- **qpois(x,lambda)** - Poisson quantile at the value x (between 0 and 1) with mean lambda
- **qt(x,df)** - t quantile at the value x (between 0 and 1) with degrees of freedom df
- **rbeta(n,alpha,beta)** - beta sample of size n with shape alpha and scale beta
- **rbinom(n,size,p)** - binomial sample of size n with parameters size and p
- **rcauchy(n)** - Cauchy sample of size n with location 0 and scale 1
- **rchisq(n,df)** - chi-square sample of size n with degrees of freedom df
- **rf(n,ndf,ddf)** - F sample of size n with numerator degrees of freedom ndf and denominator degrees of freedom ddf
- **rgamma(n,alpha)** - gamma sample of size n with shape alpha
- **rnorm(n,mu,sigma)** - normal sample of size n with mean mu and standard deviation sigma
- **round(x)** - rounds to nearest integer
- **rpois(n,lambda)** - Poisson sample of size n with mean lambda
- **rt(n,df)** - t sample of size n with degrees of freedom df
- **sin(x)** - sine
- **sqrt(x)** - square root
- **tan(x)** - tangent

**String Functions**

The functions below require string (text) inputs. StatCrunch attempts to coerce nonstring values into the correct input type. You can specify column names as inputs, in which case the function will return a vector applying the function to each value in the input column.

- **contains(x,y)** - returns true if the string x contains the string y and false otherwise
- **endsWith(x,y)** - returns true if the string x ends with the string y and false otherwise
- **indexOf(x,y)** - returns the first numeric index in the string x where the string y occurs (starting at index of 0), returns -1 if x does not contain y
- **lastIndexOf(x,y)** - returns the last numeric index in the string x where the string y occurs (starting at index of 0), returns -1 if x does not contain y
- **length(x)** - returns the length of the string x
- **replace(x,y,z)** - replaces all occurrences of y with z in x
- **startsWith(x,y)** - returns true if the string x starts with the string y and false otherwise
- **substring(x,y)** - returns the substring of x beginning at y (starting at index of 0)

**Date and Time Functions**

The functions below require string (text) inputs, which StatCrunch will attempt to turn into valid dates and/or times. You can specify column names as inputs, in which case the function will return a vector applying the function to each value in the input column. The format argument required by these functions is discussed below.

- **getDate(x,format)** - returns the day of the month represented by this date. The value returned is between 1 and 31.
- **getDay(x,format)** - returns the day of the week represented by this date. The value returned is between 1 and 7, where 1 represents Sunday.
- **getHours(x,format)** - returns the hour represented by this date. The value returned is between 0 and 23, where 0 represents midnight.
- **getMinutes(x,format)** - returns the number of minutes past the hour represented by this date. The value returned is between 0 and 59.
- **getMonth(x,format)** - returns the month represented by this date. The value returned is between 1 and 12, with the value 1 representing January.
- **getSeconds(x,format)** - returns the number of seconds past the minute represented by this date. The value returned is between 0 and 60. The value 60 can only occur on those Java Virtual Machines that take leap seconds into account.
- **getYear(x,format)** - returns the year represented by this date.

The format argument to the above functions represents a text pattern of the dates/times to be parsed. The following pattern letters are defined (all other characters from `'A'` to `'Z'` and from `'a'` to `'z'` are reserved):

| Letter | Date or Time Component | Examples |
| --- | --- | --- |
| `G` | Era designator | `AD` |
| `y` | Year | `1996`; `96` |
| `M` | Month in year | `July`; `Jul`; `07` |
| `w` | Week in year | `27` |
| `W` | Week in month | `2` |
| `D` | Day in year | `189` |
| `d` | Day in month | `10` |
| `F` | Day of week in month | `2` |
| `E` | Day in week | `Tuesday`; `Tue` |
| `a` | Am/pm marker | `PM` |
| `H` | Hour in day (0-23) | `0` |
| `k` | Hour in day (1-24) | `24` |
| `K` | Hour in am/pm (0-11) | `0` |
| `h` | Hour in am/pm (1-12) | `12` |
| `m` | Minute in hour | `30` |
| `s` | Second in minute | `55` |
| `S` | Millisecond | `978` |
| `z` | Time zone | `Pacific Standard Time`; `PST`; `GMT-08:00` |
| `Z` | Time zone | `-0800` |

| Date/Time | Format |
| --- | --- |
| 12/31/2007 | "MM/dd/yyyy" |
| 12-31-2007 | "MM-dd-yyyy" |
| 2007/12/31 | "yyyy/MM/dd" |
| Dec 31, 2007 23:59:00 | "MMM dd, yyyy HH:mm:ss" |

**Column Functions**

The functions below take one or more column names as arguments and return numeric values or vectors. The function names are not case sensitive.

- **count(x)** - returns the number of values in the column
- **cor(x,y)** - returns the correlation between x and y
- **cov(x,y)** - returns the covariance between x and y
- **diff(x)** - returns a vector of consecutive differences, x[i] - x[i-1], of the column
- **max(x)** - returns the maximum of the column
- **mean(x)** - returns the mean of the column
- **median(x)** - returns the median of the column
- **min(x)** - returns the minimum of the column
- **percentile(x,p)** - returns the pth percentile of x
- **range(x)** - returns the range of the column
- **rep(x,y)** - returns each value in x repeated the corresponding number of times given in y (with all sequences stacked)
- **seq(x,y,z)** - returns the sequence of values from x to y by z (with all sequences stacked)
- **sort(x)** - returns the sorted values of the column
- **std(x)** - returns the standard deviation of the column
- **subset(x,y)** - returns a vector containing the values of the vector x where the corresponding values of the y vector are true (note that the values in the y vector are coerced to boolean values)
- **sum(x)** - returns the sum of the column
- **var(x)** - returns the variance of the column

**Row Functions**

The functions below require a comma-delimited list of column names (...) as an argument and, in most cases, return a vector with one value for each row.

- **pconcat(...)** - row-wise string concatenation
- **pmax(...)** - row-wise maximum
- **pmean(...)** - row-wise mean
- **pmin(...)** - row-wise minimum
- **pvar(...)** - row-wise variance
- **concat(...)** - stacks all values into a single column

Some of the graphics provide an option to view separate graphs for each group. This option is not selected by default. If this option is not chosen, then the plot will be color coded based on the grouping variable for easy reference.

Orderings can be added, deleted and modified using the **Edit > Orderings** menu option. When this option is selected,
a new dialog will appear with a listing of all the active orderings for the current StatCrunch session. To modify/remove a
particular ordering, select the ordering from the list and then press the **Edit**/**Delete** button. A new ordering can
be added by clicking the **Add new ordering** button. When modifying or adding an ordering, the distinct values should be
entered one per line in the resulting text field. StatCrunch ignores case when comparing ordering values to group labels
to determine whether or not to apply the ordering to specific output.

StatCrunch applets offer some exciting capabilities beyond those encountered in most Java applets for statistical education. First, the applets can be generated using data loaded in the StatCrunch software package, whereas most standalone applets have a preloaded data set that cannot be easily changed. This feature opens the door for developing interesting demonstrations in a variety of different customized data settings. Second, these components can be saved as standalone applets and shared with others. This allows instructors to develop applets for interesting, relevant data sets for their courses and easily share them with their students as well as other instructors. If an applet is shared with everyone as part of the saving process, the applet will also be viewable by those without StatCrunch accounts. When an applet is saved, its current state is saved as well. This allows students to develop and save applets as part of course work to be evaluated by their instructor. The applets that are currently available are described below.

Users may interactively change the starting point and bin width of the resulting histogram applet using the sliders provided. As an example of potential classroom usage, individual students might provide their own choices for these settings and save their results so that they can be compared with those of other students. The current settings for starting point and bin width are saved when the applet is saved using the Options > Export to My Results menu option on the applet window within StatCrunch.
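To see how the starting point and bin width shape a histogram, consider the following minimal Python sketch. It is not the applet's code: the data are made up, the `histogram_counts` helper is hypothetical, and the choices to close each bin on the left and to leave out values below the starting point are assumptions made for the illustration.

```python
import math
from collections import Counter


def histogram_counts(values, start, width):
    """Count how many values fall in each bin [start + k*width, start + (k+1)*width)."""
    if width <= 0:
        raise ValueError("bin width must be positive")
    counts = Counter()
    for v in values:
        if v < start:
            continue  # assumption: values below the starting point are left out
        counts[math.floor((v - start) / width)] += 1
    # Map each non-empty bin's (lower, upper) edges to its count.
    return {(start + k * width, start + (k + 1) * width): n
            for k, n in sorted(counts.items())}


data = [1, 2, 2, 3, 5, 8, 9]  # made-up values

wide_bins = histogram_counts(data, start=0, width=5)
print(wide_bins)    # {(0, 5): 4, (5, 10): 3}

narrow_bins = histogram_counts(data, start=1, width=3)
print(narrow_bins)  # {(1, 4): 4, (4, 7): 1, (7, 10): 2}
```

The same seven values produce two bars with one pair of settings and three bars with the other, which is the kind of difference students can compare when they save their own choices.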

For more info and an interactive example see:

http://www.statcrunch.com/5.0/viewreport.php?reportid=15261

The user controls the green line shown in the scatter plot of the resulting applet by dragging the green dots on each end of the line. The resulting intercept and slope of the green line are displayed on the bottom line of the table below the graph along with the line's sum of squared error (SSE), which is the sum of squared vertical deviations of the data points from the line. The line of best fit in regression is defined as the line that minimizes SSE. The table also displays these values for the true regression line in red. Additional points such as outliers can be added to the analysis by simply clicking at the desired location on the graphic. Individual points can be moved by clicking on them and dragging the mouse. Points can be removed from the analysis completely by dragging them to the trash. Clicking on the trash can will clear all of the points and provide a blank slate for the regression analysis. After doing their best to minimize SSE, the user can click the Show regression line button to plot the true regression line in red and compare it with the user's green line. Clicking the button again will hide the regression line. The two light gray lines on the graphic show the means of the X and Y values. An experienced user can use these values to better estimate the true regression line. For additional practice, the user can click the Create new data button to go through the exercise again with a randomly generated data set. For classroom usage, students might compete to see who can best approximate the true regression line for a specific set of data.
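The SSE comparison described above can be sketched in a few lines of Python. This is an illustration only, not the applet's implementation: the data points and the guessed line are made up, and the function names `sse` and `least_squares_line` are hypothetical.

```python
def sse(xs, ys, intercept, slope):
    """Sum of squared vertical deviations of the points from the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Return (intercept, slope) of the line that minimizes SSE."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # the fitted line passes through (x_bar, y_bar)
    return intercept, slope


# Made-up data points.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# A user's guess for the line, e.g. from dragging the end points.
guess_intercept, guess_slope = 2.0, 0.5
sse_guess = sse(xs, ys, guess_intercept, guess_slope)

best_intercept, best_slope = least_squares_line(xs, ys)
sse_best = sse(xs, ys, best_intercept, best_slope)

print(round(sse_guess, 4))                            # 3.75
print(round(best_intercept, 4), round(best_slope, 4)) # 2.2 0.6
print(round(sse_best, 4))                             # 2.4
```

Because the least-squares line always passes through the point of means, the mean lines on the plot give a useful anchor when estimating the line by eye.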

For more info and an interactive example see:

http://www.statcrunch.com/5.0/viewreport.php?reportid=15262

The interface for constructing the applet is very similar to that of the two sample t procedure within StatCrunch. The user selects the column containing the first sample along with an optional Where clause to specify the data rows to be included in the sample. The user then selects the column containing the second sample along with an optional Where clause to specify the data rows to be included in the sample. Note that for stacked data the first and second column will typically be the same. In this case, the user may also enter optional short labels for each sample to be used in the resulting applet.

The resulting applet offers the ability to randomize the data 1 time, 5 times or 1000 times. These options match pedagogically with the ideas of step, walk and run, respectively. Click the 1 time button to see a single randomization of the data. A new window will pop up which shows the actual shuffling of the sample labels, with the results then displayed in the data table within the applet. The difference between the means of the shuffled samples (Sample 1 - Sample 2) is computed below the data table. If the mean difference for the randomized data is more extreme in absolute value than the observed difference for the original data, it will be displayed in red; otherwise it is displayed in black. Click the 5 times button to see this process repeated five times at a slightly faster pace. A running tally of the results is provided in the Results panel in the right portion of the applet. A histogram of color-coded randomized mean differences is displayed. A table of the frequencies and relative frequencies for the randomized mean differences that are less than the negative absolute value of the observed mean difference or greater than the absolute value of the observed mean difference is also shown. The counts/proportions totaled across both regions are also provided in the bottom line of this table. The user can use the appropriate line from this table to estimate a P-value based on the proper alternative for a large number of randomizations.

A large number of randomizations can be quickly generated by clicking the 1000 times button. In this case, the individual randomizations are not displayed, but the Results panel is quickly updated after each randomization is computed. A user may desire to click this button several times to generate an even larger number of total randomizations. The user may then go back and inspect the results by clicking on the individual randomization numbers displayed in the applet or by clicking on the histogram in the Results panel. The far left panel in the applet contains the results of each randomization ordered numerically with those providing more extreme than observed results shown in red. Clicking on a randomization number shows the corresponding randomized data in the data table and the resulting mean difference. A small triangle is also displayed at the proper location corresponding to this mean difference within the histogram. The Graph button located below the data table can be used to construct a simple dotplot of the selected randomization. A user can select randomizations within the histogram by clicking and dragging the mouse to draw a rectangle intersecting a specific bar or bars. The total number of randomizations with mean differences in the selected bar(s) is then displayed in a pop up window along with the option for selecting individual color-coded randomizations falling within the selected range. From a pedagogical perspective, students may use this applet to understand the idea of repeated randomization and may identify and explore randomizations that are extreme.

When saving the applet from within StatCrunch, the user is asked whether or not to save the current random number seed. This seed determines the randomizations that will be generated as described above. If the seed is saved, subsequent users will generate the exact same randomizations as the current user. If it is not saved, then subsequent users will each get different randomizations.
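The randomization procedure and the role of the seed can be sketched as follows. This is a minimal Python illustration, not the applet's implementation: the samples are made up, the function name `randomization_test_means` is hypothetical, and Python's `random.Random` stands in for whatever random number generator StatCrunch actually uses.

```python
import random


def mean(values):
    return sum(values) / len(values)


def randomization_test_means(sample1, sample2, n_randomizations=1000, seed=None):
    """Shuffle the pooled values and tally mean differences more extreme than observed."""
    rng = random.Random(seed)  # a fixed seed reproduces the same shuffles
    observed = mean(sample1) - mean(sample2)
    pooled = list(sample1) + list(sample2)
    n1 = len(sample1)
    below = above = 0
    for _ in range(n_randomizations):
        rng.shuffle(pooled)  # equivalent to shuffling the sample labels
        diff = mean(pooled[:n1]) - mean(pooled[n1:])
        if diff < -abs(observed):
            below += 1
        elif diff > abs(observed):
            above += 1
    return {
        "observed": observed,
        "below": below / n_randomizations,  # proportion below -|observed|
        "above": above / n_randomizations,  # proportion above |observed|
        "total": (below + above) / n_randomizations,
    }


# Made-up samples.
sample1 = [12, 15, 11, 14]
sample2 = [9, 8, 10, 7]

first_run = randomization_test_means(sample1, sample2, seed=1)
second_run = randomization_test_means(sample1, sample2, seed=1)
print(first_run["observed"])    # 4.5
print(first_run == second_run)  # True
```

The three proportions correspond to the lines of the table in the Results panel: one tail, the other tail, and both combined. Calling the function with the same seed twice reproduces the same randomizations, while leaving `seed=None` gives each run its own.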

For more info and an interactive example see:

http://www.statcrunch.com/5.0/viewreport.php?reportid=15263

The interface for constructing the applet is very similar to that of the linear regression procedure within StatCrunch. The user selects the columns containing the X and Y samples along with an optional Where clause to specify the data rows to be included in the analysis.

The resulting applet offers the ability to randomize the data 1 time, 5 times or 1000 times. These options match pedagogically with the ideas of step, walk and run, respectively. First click the 1 time button to see a single randomization of the data. A new window will pop up that shows the actual shuffling of the Y values, with the results then displayed in the data table within the applet. The sample correlation for the shuffled data is computed below the data table. If the sample correlation for the randomized data is more extreme in absolute value than the observed sample correlation for the original data, it will be displayed in red; otherwise it is displayed in black. Click the 5 times button to see this process repeated five times at a slightly faster pace. A running tally of the results is provided in the Results panel in the right portion of the applet. A histogram of color-coded correlations is displayed. A table of the frequencies and relative frequencies for the randomized sample correlations that are less than the negative absolute value of the observed sample correlation or greater than the absolute value of the observed sample correlation is shown. The counts/proportions totaled across both regions are also provided in the bottom line of this table. The user can use the appropriate line from this table to estimate a P-value for a given alternative based on a large number of randomizations.

A large number of randomizations can be quickly generated by clicking the 1000 times button. In this case, the individual randomizations are not displayed, but the Results panel is quickly updated after each randomization is computed. Click this button several times to generate an even larger number of total randomizations. After generating a large number of randomizations, the user may then go back and inspect the results by clicking on the individual randomization numbers displayed in the applet or by clicking on the histogram in the Results panel. The far left panel in the applet contains the results of each randomization ordered numerically, with those providing more extreme than observed results shown in red. Clicking on a randomization number shows the corresponding randomized data in the data table and the resulting sample correlation. A small triangle is also displayed at the proper location corresponding to this sample correlation within the histogram. The Graph button located below the data table can be used to construct a simple scatter plot of the selected randomization. A user can select randomizations within the histogram by clicking and dragging the mouse to draw a rectangle intersecting a specific bar or bars. The total number of randomizations with sample correlations in the selected bar(s) is then displayed in a pop-up window along with the option for selecting individual color-coded randomizations falling within the selected range. From a pedagogical perspective, students may use this applet to understand the idea of repeated randomization and may identify and explore randomizations that are extreme.

When saving the applet from within StatCrunch, the user is asked whether or not to save the current random number seed. This seed determines the randomizations that will be generated as described above. If the seed is saved, subsequent users will generate the exact same randomizations as the current user. If it is not saved, then subsequent users will each get different randomizations.
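For the correlation version, the shuffling is applied to the Y values while the X values stay in place. The following Python sketch illustrates the idea under the same caveats as any simplified example: the data are made up, the function names are hypothetical, and Python's `random.Random` is only a stand-in for StatCrunch's own generator.

```python
import math
import random


def correlation(xs, ys):
    """Sample (Pearson) correlation between two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def randomization_test_correlation(xs, ys, n_randomizations=1000, seed=None):
    """Shuffle the Y values and tally correlations more extreme than observed."""
    rng = random.Random(seed)  # a fixed seed reproduces the same shuffles
    observed = correlation(xs, ys)
    shuffled = list(ys)
    below = above = 0
    for _ in range(n_randomizations):
        rng.shuffle(shuffled)  # break any link between X and Y
        r = correlation(xs, shuffled)
        if r < -abs(observed):
            below += 1
        elif r > abs(observed):
            above += 1
    return {
        "observed": observed,
        "below": below / n_randomizations,
        "above": above / n_randomizations,
        "total": (below + above) / n_randomizations,
    }


# Made-up data.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 6, 5]

result = randomization_test_correlation(xs, ys, seed=7)
repeat = randomization_test_correlation(xs, ys, seed=7)
print(round(result["observed"], 4))  # 0.8286
print(result == repeat)              # True
```

As with the mean-difference version, the "below", "above" and "total" proportions play the role of the lines in the Results table, and reusing a seed reproduces the same set of randomizations.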

For more info and an interactive example see:

http://www.statcrunch.com/5.0/viewreport.php?reportid=15264