Data Analysis

Use the Data Analysis tool to analyze univariate and bivariate data in graphical and numeric displays.

A Data Analysis data sheet is like a spreadsheet environment with statistics-related functionality.

To modify or edit a pre-loaded data set, choose Edit | Enable.

Choose a data set from the Data menu for quick access to pre-loaded data examples.

Help topics for Edit menu options including Cut, Copy, Paste, Fill Down, Set # of Digits, Column Name, and Column Formula are common across Spreadsheet and Data Analysis.

Note: Some functionality is available within Spreadsheet and not in Data Analysis (e.g., Sort, Delete and Insert rows/columns).

Note: Settings for graphs are available within the Options menus once a plot is shown.

Options for Graphical Display (see Graph menu) include those for univariate and bivariate data display, statistical plots, and frequency tables. Link to additional help contents and settings for each graph type:

Note: The Tools menu of each graph window gives options to Copy a graph image to the clipboard for pasting into another application (e.g., Text Editor). You can also choose Tool | Basic Statistics to view statistics for the plotted data.

Use a single column, or Control+Click to select more than one column of data for a "stacked" Histogram.

- Choose Options | Relative Frequencies to display relative frequencies (percent) above each histogram bar (the vertical axis will reflect this labeling scheme). Otherwise, the frequency (number of items) is displayed.
Note: If Options | Label Bars is not checked, individual histogram bars will not be labeled, yet the vertical axis labeling will match the appropriate selection.

- Other entries of the Histogram Options menu (Labels, Grid, Standard Deviation) change the default appearance of the plot. Each item is displayed when checked and hidden otherwise.
- Histogram settings can be modified by typing or dragging at the bottom of the plot window. Change "Min X" to set the minimum data value that will be included in the plot, then press Enter. Change "Bin Width" to set the width of all histogram bars.
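
As a rough illustration of how Min X and Bin Width determine the bars, the following Python sketch bins a small made-up data set and converts the counts to relative frequencies. It is not the tool's own code: the function names, the sample data, and the choice of left-closed bins are assumptions made for the example.

```python
from math import floor


def histogram_counts(data, min_x, bin_width):
    """Count values per bin; bin k covers [min_x + k*w, min_x + (k+1)*w).

    Left-closed bins are an assumption, not documented tool behavior.
    """
    counts = {}
    for value in data:
        if value < min_x:
            continue  # values below Min X are left out of the plot
        k = floor((value - min_x) / bin_width)
        counts[k] = counts.get(k, 0) + 1
    return counts


def relative_frequencies(counts):
    """Convert bin counts to percentages of all counted values."""
    total = sum(counts.values())
    return {k: 100 * c / total for k, c in counts.items()}


data = [2, 3, 3, 5, 7, 8, 8, 9, 12, 14]  # hypothetical column of data
counts = histogram_counts(data, min_x=0, bin_width=5)
percents = relative_frequencies(counts)
print(counts)    # frequency (number of items) per bin
print(percents)  # relative frequency (percent) per bin
```

Raising `min_x` drops the smaller values from the counts, and changing `bin_width` regroups the remaining values, which mirrors what dragging those settings does to the plot.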

Use a single column, or Control+Click to select more than one column of data for a "stacked" Box Plot.

- To change the default appearance of the box plot, select an entry of the Options menu.
- When Options | Outliers is checked, any outliers are displayed as distinct points.
- When Options | Horizontal is checked, the plot is oriented horizontally.
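
The help text does not state which rule the tool uses to flag outliers. One common convention is the 1.5 × IQR rule, sketched below in Python on made-up data. The rule, the quartile method, and all names are assumptions, so the result may differ from what the tool displays.

```python
import statistics


def box_plot_summary(data):
    """Five-number summary plus outliers under the 1.5 * IQR convention."""
    # "inclusive" is one of several quartile conventions (an assumption here).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "min": min(data),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(data),
        "outliers": outliers,
    }


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # hypothetical column of data
summary = box_plot_summary(data)
print(summary)
```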

Two columns must be selected to plot Scatterplots; one independent and one dependent column are required.

- Excluded : First select a data point (highlighted in blue), then choose to remove that data point from the plot (highlighted in red).
Note: If Options | Show Equation(s) is checked when points are excluded (or included), the regression equation(s) will adjust accordingly, with the most recent equation in blue and the others in red.

- Draw Moveable Model : Show or hide a moveable line to estimate the best-fit curve for the data. Click and drag the green squares to alter the position and shape of the model.
- Draw Regression Model :
Show (or hide) the regression model (of the chosen type) on the scatterplot.
- First choose one regression model type from the Models menu (deselect all others). Then choose Draw Regression Model.
- While one regression model type is drawn, you may add other types of models by selecting them from the Models menu.
- The most recently drawn regression model is colored in blue and all previously drawn regression models are in red.

Note: Regression models are not moveable, but they will be recalculated and redrawn if points are excluded (or re-included).

See also Moveable Model.

- Draw Residuals : Show (or hide) the residuals for a drawn moveable model and the most recently drawn regression model. At least one regression model or a moveable line must be drawn for residuals to be plotted. See also Statistics | Regression, Models | Analysis.

- Draw Squares : Show (or hide) the squares of the residuals for a drawn moveable model and the most recently drawn regression model. At least one regression model or a moveable line must be drawn for squares to be plotted. See also Error Thermometer.

- Draw Means : Show (or hide) horizontal and vertical lines representing the *x* and *y* means for the plotted data. Move the mouse cursor over the intersection of these red lines to display the ordered pair of the means.
Note: If a data point is excluded, the means will adjust accordingly.

- Choose Options | Show Equations to show (or hide) the equations of any drawn moveable or regression models on the scatterplot. The equations will be color-coded to match the model they represent.
- Select Options | Show Predicted Value(s) to show (or hide) a table of predicted values corresponding to all drawn regression and moveable models in a separate window. At least one regression model or a moveable line must be drawn for predicted values to be shown. Click on a table entry and use the arrow keys on your keyboard to move through the table of values; notice that the corresponding point and coordinates are highlighted in the scatterplot window.
Note: The range of x values used within the scatterplot window is used by default for the Predicted Values window. To change these settings, type numerical values into the Min x, Max x, and x Step boxes, pressing Enter after typing in each box.

- Choose Options | Error Thermometer to show (or hide) a depiction of the sum of the squared errors (SSE) for the drawn models to the right of the scatterplot. This option is useful when comparing the SSE of the moveable model with the SSE of the regression model. See also Draw Squares.

- Guess Correlation : Show (or hide) the guess correlation bar to the right of the scatterplot. Select a guess for *r* from one of the five options available, then click Show r to view the actual correlation value.
Note: Although you can calculate a correlation for any scatterplot, r measures only straight-line (linear) relationships.

- Plot Information : Show (or hide) scatterplot information. Options include those for modifying labels for *x* and *y*, viewing basic statistics for the plotted data, naming the plot (or title), and adjusting window settings. Additionally, you may check options to Show Correlation, Show SEE, and Show Confidence Bands.
- The Scatterplot | Models menu allows you to select (or deselect) the type of model that is drawn on a scatterplot. The chosen model applies to both Moveable and Regression models.
- Choose Models | Enter/Edit a Model to type your own model type in "*y* = ..." form. If such a model has already been drawn, choosing this menu option will allow you to edit the previously entered model.
Note: The program will not accept your model if you type "y=" in the Enter/Edit a Model window. Instead, enter the rest of the model without the "y=", then click OK.

- Analysis :
Analyze a plot of the independent variable and the plot of residuals in a
separate window. Within this Residual Plot(s) window:
- The Tool menu gives options to view Basic Statistics, Print, or Save the residual analysis.
- Choose an entry from the Options menu to change the view of the residual plot. The default view, Residuals vs. X, is selected initially; when other options are selected, they are added to the right of the original selection. Menu entries include: Residuals vs. X, Residuals vs. Y, Residuals vs. Order, Normal Probability Plot, and Regression Analysis.
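
The residuals, squares, Error Thermometer, and correlation described above can also be worked out by hand. The Python sketch below uses made-up data and hypothetical function names to compare the SSE of a hand-placed line (standing in for a moveable model) with the SSE of the least-squares line, and to compute *r*. It assumes a linear model and is only an illustration, not the tool's implementation.

```python
from math import sqrt


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(xs, ys, slope, intercept):
    """Observed y minus predicted y for each point."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


def sse(xs, ys, slope, intercept):
    """Sum of squared errors: total area of the residual squares."""
    return sum(e ** 2 for e in residuals(xs, ys, slope, intercept))


def correlation(xs, ys):
    """Pearson correlation coefficient r (linear association only)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]  # hypothetical independent column
ys = [2, 4, 5, 4, 5]  # hypothetical dependent column

slope, intercept = least_squares_line(xs, ys)
sse_regression = sse(xs, ys, slope, intercept)
sse_moveable = sse(xs, ys, 1.0, 1.0)  # a guessed line y = x + 1
r = correlation(xs, ys)
print(slope, intercept, sse_regression, sse_moveable, r)
```

No straight line can have a smaller SSE than the least-squares line, so a guessed line can at best match `sse_regression`.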

Two or more columns must be selected to plot a Matrix Plot.

At least one column of data must be selected to plot a Time Series graph (multi-column graphs will be color-coded).

At least one column of data must be selected to plot a Normal Plot graph (multi-column graphs will be color-coded).

Frequency Table | Histogram, Box Plot

There are two options for frequency tables: Histogram and Box Plot. To use these plot options, the values of the frequency table should be listed in one column, with their corresponding frequencies in another column.

Descriptive Statistics

Choose Statistics | Descriptive Statistics to view summary statistics for the columns of an active data sheet. The following calculations are given for each column:

n (number of values in the column), Mean (arithmetic average), Minimum (smallest value), Q1 (first quartile), Median (middle value of ordered list), Q3 (third quartile), Maximum (largest value), Sample Standard Deviation (a measure of spread), Sample Variance (the square of the standard deviation).
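
For reference, the same summary can be approximated with Python's standard `statistics` module. This is a minimal sketch on made-up data; the `describe` helper is hypothetical, and the quartile convention chosen here is an assumption, so Q1 and Q3 may differ slightly from the tool's values.

```python
import statistics


def describe(column):
    """Summary statistics for one column of numeric data."""
    # Quartile conventions vary; "inclusive" is an assumption.
    q1, median, q3 = statistics.quantiles(column, n=4, method="inclusive")
    return {
        "n": len(column),
        "mean": statistics.mean(column),
        "min": min(column),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(column),
        "sample_sd": statistics.stdev(column),           # divides by n - 1
        "sample_variance": statistics.variance(column),  # square of sample_sd
    }


summary = describe([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical column
for name, value in summary.items():
    print(name, value)
```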

To use the Chi-Square Test:

- First open or create your own bivariate categorical data set (e.g., Crying).
- Choose Statistics | Chi-Square Test to open a Chi-Square Analysis window to obtain the chi-squared statistic.
- Use the Tests menu and choose to Test for Independence, Test for Homogeneity, or determine a Goodness of Fit.
- Use the Options menu to Show Expected Counts, Show Expected Percents, or Show Critical Values.
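
To make the expected counts and the chi-squared statistic concrete, here is a minimal Python sketch for a test of independence on a two-way table. The observed counts are invented for illustration (they are not the Crying data set), and the function names are hypothetical.

```python
def expected_counts(observed):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


observed = [[20, 30], [30, 20]]  # hypothetical 2 x 2 table of counts
expected = expected_counts(observed)
statistic = chi_square_statistic(observed)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
print(expected, statistic, degrees_of_freedom)
```

The statistic is then compared with a critical value for the given degrees of freedom, which is what Show Critical Values reports in the tool.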

Create an approximate sampling distribution of possible differences in means, medians, or standard deviations of two treatments by rerandomizing, then identify extreme events.

See Statistics & Probability Custom Apps.
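
The idea of rerandomizing can be sketched in a few lines of Python. The example below repeatedly reshuffles made-up treatment and control values into two groups and records the difference in means, then reports how often a difference at least as extreme as the observed one occurs. The data, function names, trial count, and two-sided definition of "extreme" are all assumptions for illustration.

```python
import random
import statistics


def rerandomize_mean_differences(group_a, group_b, trials=1000, seed=0):
    """Approximate the distribution of mean differences under reshuffling."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    diffs = []
    for _ in range(trials):
        rng.shuffle(pooled)
        diffs.append(
            statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        )
    return diffs


def extreme_fraction(diffs, observed):
    """Fraction of reshuffled differences at least as extreme as observed."""
    return sum(abs(d) >= abs(observed) for d in diffs) / len(diffs)


treatment = [12, 15, 14, 16, 18]  # hypothetical responses
control = [10, 11, 9, 13, 12]     # hypothetical responses
observed = statistics.mean(treatment) - statistics.mean(control)
diffs = rerandomize_mean_differences(treatment, control)
fraction = extreme_fraction(diffs, observed)
print(observed, fraction)
```

Swapping `statistics.mean` for `statistics.median` or `statistics.stdev` gives the corresponding distributions for medians or standard deviations.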

Explore the shape, mean, and standard deviation of distributions of sample means, medians, or standard deviations for various sample sizes.

See Statistics & Probability Custom Apps.
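
A small simulation shows the effect of sample size on a sampling distribution. This Python sketch draws repeated samples from a made-up population and collects the sample means. The population, sample sizes, trial count, and function name are assumptions chosen for the example.

```python
import random
import statistics


def sampling_distribution(population, sample_size,
                          statistic=statistics.mean, trials=2000, seed=0):
    """Collect a statistic from repeated samples drawn without replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [
        statistic(rng.sample(population, sample_size))
        for _ in range(trials)
    ]


population = list(range(1, 101))  # hypothetical population: 1 to 100
small = sampling_distribution(population, sample_size=5)
large = sampling_distribution(population, sample_size=25)

# Both distributions center near the population mean (50.5),
# but larger samples give a narrower spread of sample means.
print(statistics.mean(small), statistics.stdev(small))
print(statistics.mean(large), statistics.stdev(large))
```

Passing `statistic=statistics.median` or `statistic=statistics.stdev` explores the distributions of sample medians or standard deviations in the same way.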

Within an open data set, choose a regression type (e.g., linear) from the
Statistics menu. Within the Choose Columns window, select
one column for each of the independent and dependent variables. Click OK to
confirm the choices and view the Regression Analysis Frame. Several options
are available within this frame:

- The Results tab lists all regression statistics for the chosen regression type (e.g., Linear) including: Sample Statistics, Coefficient Estimates, 95% Confidence Intervals, and Analysis of Variance.
- The Graph tab shows a scatterplot of the data along with the chosen regression model, shown in red. The regression equation is shown above the graph.
- The Residuals tab shows a plot of the residuals on the scatterplot of the data. A time series plot of the data is also shown within this tab.