Descriptive Facets
D01 Defining Variables
D010a Variables are valid and well defined.
D010b Variables have predictive validity and are well defined.
D019a Variables are not well defined.
D019b Variables are invalid.
Graphical Displays
D02 General Graphics: axes
D020a An axis can be broken when outliers are present, but an unbroken plot should also be shown.
D020b The range of the axes is a bit bigger than the range of the data.
D024 Axes must contain the origin.
D029 Points may be joined across a break in the axes (e.g., time series).
D03 General Graphics: placing points
D030 The value of the data point determines its position on the axis.
D039 The rank of a data point determines its position on the axis.
D04 Histograms: Generation
D040 A histogram is used to display a continuous data set with many observations.
D047 A histogram is used to display a continuous data set with few observations.
D048 A bar chart is used to display continuous data (i.e., each observation is represented as a bar with height equal to the value).
D049 Class widths are poorly chosen (unequal or too few).
D05 Histograms: Interpretation
D050 The area of each box indicates its relative frequency.
D059 The height (alone) of each box indicates its relative frequency.
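The contrast between D050 and D059 matters most when class widths are unequal. The following is a minimal sketch, assuming hypothetical class boundaries and counts that are not from the source. It scales each box so that its area, not its height, equals the relative frequency.

```python
# Hypothetical class boundaries and counts, chosen only for illustration.
edges = [0, 10, 20, 40, 80]
counts = [5, 10, 10, 5]

total = sum(counts)
widths = [hi - lo for lo, hi in zip(edges[:-1], edges[1:])]

# Relative frequency of each class.
rel_freq = [c / total for c in counts]

# Height on a density scale: relative frequency divided by class width,
# so that height * width (the area) recovers the relative frequency.
heights = [rf / w for rf, w in zip(rel_freq, widths)]
areas = [h * w for h, w in zip(heights, widths)]

for lo, hi, rf, h in zip(edges[:-1], edges[1:], rel_freq, heights):
    print(f"[{lo:>2}, {hi:>2})  rel. freq = {rf:.3f}  height = {h:.4f}")
```

In this sketch the second and third classes hold the same share of the data, yet the wider class is drawn half as tall. Reading height alone (D059) would suggest otherwise.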
D06 Pie Charts
Context: Counts or relative frequencies of several categories. The relative frequencies may not sum to 100% (i.e., some categories are missing).
D060 A pie chart (or segmented bar chart) is generated to emphasize parts of a whole which totals 100%. A category labeled "other" is added, if necessary.
D065 A pie chart is not generated because of a missing "other" category necessary to sum to 100%.
D069 A pie chart is generated and the missing "other" category is ignored.
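As a rough illustration of D060, the sketch below completes a set of relative frequencies with an "other" category before any pie chart is drawn. The category names, percentages, and the helper with_other are hypothetical and used only for this example.

```python
# Hypothetical relative frequencies (in percent) that do not sum to 100.
shares = {"A": 42.0, "B": 31.5, "C": 14.0}


def with_other(shares, total=100.0, tol=1e-9):
    """Return a copy of shares with an "other" category added if needed."""
    missing = total - sum(shares.values())
    if missing < -tol:
        raise ValueError("relative frequencies exceed the total")
    completed = dict(shares)
    if missing > tol:
        completed["other"] = missing
    return completed


completed = with_other(shares)
for name, pct in completed.items():
    print(f"{name:>5}: {pct:5.1f}%")
```

The completed dictionary sums to 100%, so each slice can be read as a part of the whole.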
D07 Bar Charts: single nominal variable
D070 X-axis = categories, Y-axis = counts or relative frequency.
D072 X-axis = counts or relative frequency, Y-axis = categories.
D079 The plot includes a smooth curve.
D08 Scatterplots
Context: Two continuous variables, small labeled data set.
D080a A scatterplot is generated and a straight-line summary is included, if appropriate.
D080b A scatterplot is generated and the points are connected if the plot is a time-series plot and the connection clarifies the graph.
D080c Some or all points are labeled, in addition to D080a or b.
D081 A scatterplot is generated with no other summaries, if appropriate.
D087 The continuous variables are plotted against the labels.
D088 Points are connected inappropriately (for all scatterplots).
D089 An inappropriate straight-line summary is included.
Numerical Summaries
D09 General
D090a Outliers and skewness are considered when choosing summary statistics.
D090b The middle and spread together are usually the best two numbers to describe the data.
D097 Two measures of the middle are usually the best two numbers to describe the data.
D098a No cases may be omitted when computing summary statistics.
D098b A single number can usually describe a data set.
D099 Random data cannot be summarized.
D10 Single number summaries
Note: An expert facet may exist depending on the context of the problem.
D105 The single number which best describes the data must be a measure of location.
D106 The single number which best describes the data is the mean.
D108 The single number which best describes the data must be a number in the data set.
D109 Max or min are usually good single number summaries best describing the data.
D11 Location
D110a Outliers are considered in choosing a summary measure of location. If present, a trimmed measure or median is used.
D110b Skewness is considered in choosing a summary measure of location. If present, a median is used.
D119 The average is always a good single number which best describes the location.
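The difference between D110a and D119 can be seen on a small data set with one outlier. This is a sketch under assumed data: the values are hypothetical, and trimmed_mean is an illustrative helper, not a function from the statistics module.

```python
from statistics import mean, median

# Hypothetical data; 95 plays the role of an outlier.
data = [12, 14, 15, 15, 16, 18, 95]


def trimmed_mean(values, k=1):
    """Drop the k smallest and k largest values, then average the rest."""
    ordered = sorted(values)
    return mean(ordered[k:len(ordered) - k])


print(f"mean         = {mean(data):.2f}")
print(f"median       = {median(data)}")
print(f"trimmed mean = {trimmed_mean(data):.2f}")
```

Here the mean is pulled well above every value except the outlier, while the median and the trimmed mean stay near the bulk of the data.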
D12 Spread
D120a Outliers are considered when choosing a summary of spread.
D120b Skewness is considered when choosing a summary of spread.
D120c The standard deviation is a measure of average distance from the mean.
D129 The standard deviation is always a good single number which best describes the spread (i.e., outliers and/or skewness are not considered).
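For D120c, it may help to compare the standard deviation with a literal average distance from the mean. The sketch below uses hypothetical data and the population form of the standard deviation (statistics.pstdev). The standard deviation is the root-mean-square distance from the mean, so it serves as a typical distance but generally differs from the mean absolute deviation.

```python
from statistics import mean, pstdev

# Hypothetical data, chosen so the arithmetic is easy to follow.
data = [2, 4, 4, 4, 5, 5, 7, 9]

m = mean(data)

# Literal average distance from the mean (mean absolute deviation).
mean_abs_dev = mean([abs(x - m) for x in data])

# Population standard deviation: square root of the mean squared distance.
sd = pstdev(data)

print(f"mean                    = {m}")
print(f"mean absolute deviation = {mean_abs_dev}")
print(f"standard deviation      = {sd}")
```

Squaring gives large deviations more weight, which is one reason the standard deviation is sensitive to outliers (compare D129).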
D13 Discrete Variables
D130a The modal category is typically a good single number summary.
D130b Outliers are considered.
D130c Skewness is considered.
D130d The average is a good single number summary, provided it can be computed, and D130b and c are present.
D137 The single number summary must be one of the categories.
D138 The average is used as a single number summary when outliers are present.
D139 The average is used as a single number summary when the categories are only partially defined.
D14 Single 2x2 table
Context: Consider the following 2×2 table.
    | Trt 1 | Trt 2 | Tot |
S   | a     | b     | m1  |
F   | c     | d     | m2  |
Tot | n1    | n2    |     |
D140a To compare Treatment 1 with Treatment 2, compare a/n1 with b/n2, a/c with b/d, or c/n1 and d/n2.
D140b When n1 = n2, to compare Treatments 1 and 2, compare a and b, or c and d.
D148 When n1 ≠ n2, to compare Treatments 1 and 2, compare a and b, or c and d.
D149 To compare Treatments 1 and 2, compare a/m1 with c/m2, or b/m1 with d/m2, or a/b with c/d.
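The comparisons in D140a can be sketched numerically. The cell counts below are hypothetical. The variable names follow the table (a, b successes; c, d failures; n1, n2 column totals), and the odds a/c and b/d assume that c and d are nonzero.

```python
# Hypothetical cell counts for the 2x2 table.
a, b = 30, 40  # successes under Treatment 1 and Treatment 2
c, d = 20, 60  # failures under Treatment 1 and Treatment 2

n1, n2 = a + c, b + d  # column totals

# D140a: compare success proportions within each treatment.
p1, p2 = a / n1, b / n2

# D140a: equivalently, compare the odds of success within each treatment.
odds1, odds2 = a / c, b / d

print(f"success proportion: Trt 1 = {p1:.2f}, Trt 2 = {p2:.2f}")
print(f"odds of success:    Trt 1 = {odds1:.2f}, Trt 2 = {odds2:.2f}")
print(f"raw successes:      Trt 1 = {a}, Trt 2 = {b}")
```

With these numbers n1 ≠ n2. Treatment 2 has more raw successes (the D148 comparison), yet Treatment 1 has the higher success proportion and the higher odds.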
D15 Discrete x Continuous
D150 A measure of location and spread for each category is a useful numerical summary for the data.
D155 A measure of location for each category plus an overall measure of location is a good summary of the data.
D156a A measure of location for each category plus the range of the averages for the categories is a good summary of the data.
D156b A measure of location for each category plus an overall measure of spread is a good summary of the data.
D157 A measure of location (alone) for each category is a good summary of the data.
D159 The categories are ignored in summarizing the data.
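A small sketch of D150 follows, using hypothetical labeled observations. It computes a measure of location (mean) and spread (sample standard deviation) for each category, and shows why location alone (D157) can hide a difference between categories.

```python
from statistics import mean, stdev

# Hypothetical (category, value) observations.
observations = [
    ("A", 10), ("A", 11), ("A", 12),
    ("B", 5), ("B", 11), ("B", 17),
]

# Group the continuous values by category.
groups = {}
for label, value in observations:
    groups.setdefault(label, []).append(value)

# Location and spread for each category.
summary = {label: (mean(vals), stdev(vals)) for label, vals in groups.items()}

for label, (loc, spread) in summary.items():
    print(f"{label}: mean = {loc}, sd = {spread:.1f}")
```

Both categories have the same mean here, so only the spread reveals that they differ.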
D16 Paired difference data: summaries
D160 The average or median difference is used to summarize the data.
D167 A summary of the increases and summary of the decreases are provided as a useful summary of the data.
D168 The total of the increases minus the total of the decreases is used to summarize the data.
D169 Paired data is reduced to the number of increases and the number of decreases (with no regard to the loss of information).
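To illustrate D160 against D169, the sketch below summarizes hypothetical before/after measurements two ways: by the average and median difference, and by the bare counts of increases and decreases.

```python
from statistics import mean, median

# Hypothetical paired measurements on the same five subjects.
before = [10, 12, 9, 14, 11]
after = [11, 13, 10, 15, 3]

# Paired differences (after minus before).
diffs = [y - x for x, y in zip(before, after)]

# D160: summarize the differences themselves.
mean_diff = mean(diffs)
median_diff = median(diffs)

# D169: reduce the data to counts of increases and decreases.
n_up = sum(1 for d in diffs if d > 0)
n_down = sum(1 for d in diffs if d < 0)

print(f"differences      = {diffs}")
print(f"mean difference  = {mean_diff}")
print(f"median difference = {median_diff}")
print(f"increases = {n_up}, decreases = {n_down}")
```

The counts report four increases against one decrease and say nothing about size. The differences show that the single decrease is large enough to make the average change negative.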
D17 Counts and Rates
Context: Counts (n1, n2) and population sizes (N1, N2, with N1 ≠ N2) are given.
D170 Rates are used to compare and describe population attributes.
D178 Counts are used to compare and describe population attributes.
D178 No comparison is deemed possible because N1 ≠ N2.
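A minimal sketch of D170, assuming hypothetical counts and population sizes: dividing each count by its population size puts the two populations on a common scale, so a comparison is possible even though N1 ≠ N2.

```python
# Hypothetical counts and population sizes.
n1, N1 = 50, 1000
n2, N2 = 80, 4000

# Rates put the counts on a common per-unit scale.
rate1 = n1 / N1
rate2 = n2 / N2

print(f"counts: n1 = {n1}, n2 = {n2}")
print(f"rates:  n1/N1 = {rate1:.3f}, n2/N2 = {rate2:.3f}")
```

In this example the second population has the larger count but the smaller rate, so comparing counts alone would reverse the conclusion.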