Descriptive Facets

D01 Defining Variables

D010a Variables are valid and well defined.

D010b Variables have predictive validity and are well defined.

D019a Variables are not well defined.

D019b Variables are invalid.

Graphical Displays

D02 General Graphics: axes

D020a An axis can be broken when outliers are present, but an unbroken plot should also be shown.

D020b The range of the axes is a bit bigger than the range of the data.

D024 Axes must contain the origin.

D029 Points may be joined across a break in the axes (e.g., time series).

D03 General Graphics: placing points

D030 The value of the data point determines its position on the axis.

D039 The rank of a data point determines its position on the axis.

D04 Histograms: Generation

D040 A histogram is used to display a continuous data set with many observations.

D047 A histogram is used to display a continuous data set with few observations.

D048 A bar chart is used to display continuous data (i.e., each observation is represented as a bar with height equal to its value).

D049 Class widths are poorly chosen (unequal or too few).

D05 Histograms: Interpretation

D050 The area of each box indicates its relative frequency.

D059 The height (alone) of each box indicates its relative frequency.
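The contrast between D050 and D059 can be illustrated with a minimal sketch, assuming a histogram with unequal class widths. The class boundaries, counts, and the function name `density_heights` are hypothetical and chosen only for this example.

```python
def density_heights(edges, counts):
    """Return bar heights chosen so that each bar's AREA equals its relative frequency."""
    total = sum(counts)
    heights = []
    for left, right, count in zip(edges[:-1], edges[1:], counts):
        width = right - left
        heights.append(count / total / width)
    return heights


# Hypothetical classes [0, 10), [10, 20), [20, 50) with made-up counts.
edges = [0, 10, 20, 50]
counts = [20, 20, 60]

heights = density_heights(edges, counts)

# Area = height * width recovers the relative frequency of each class.
areas = [h * (right - left) for h, left, right in zip(heights, edges[:-1], edges[1:])]

print("heights:", heights)
print("areas:  ", areas)
```

In this made-up data all three bars have essentially the same height, yet the widest class holds 60% of the observations. Reading height alone (D059) would suggest the classes are equally frequent, while reading area (D050) would not.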

D06 Pie Charts

Context: Counts or relative frequencies of several categories. The relative frequencies may not sum to 100% (i.e., some categories are missing).

D060 A pie chart (or segmented bar chart) is generated to emphasize parts of a whole which totals 100%. A category labeled "other" is added, if necessary.

D065 A pie chart is not generated because of a missing "other" category necessary to sum to 100%.

D069 A pie chart is generated and the missing "other" category is ignored.
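One possible way to carry out the D060 step of adding an "other" category is sketched below. The category names, percentages, and the helper `with_other` are assumptions made for illustration only.

```python
def with_other(percentages, total=100):
    """Return a copy of the percentages with an 'Other' slice added if they fall short of total."""
    listed = sum(percentages.values())
    if listed > total:
        raise ValueError("listed categories already exceed the total")
    completed = dict(percentages)
    if listed < total:
        completed["Other"] = total - listed
    return completed


# Hypothetical relative frequencies (in percent) that sum to only 90.
reported = {"A": 40, "B": 35, "C": 15}
slices = with_other(reported)
print(slices)
```

Ignoring the missing 10% (D069) would inflate every slice, since a pie chart always rescales its slices to fill the whole circle.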

D07 Bar Charts: single nominal variable

D070 X-axis = categories, Y-axis = counts or relative frequency.

D072 X-axis = counts or relative frequency, Y-axis = categories.

D079 The plot includes a smooth curve.

D08 Scatterplots

Context: Two continuous variables, small labeled data set.

D080a A scatterplot is generated and a straight-line summary is included, if appropriate.

D080b A scatterplot is generated and the points are connected if the plot is a time-series plot and the connection clarifies the graph.

D080c Some or all points are labeled, in addition to D080a or b.

D081 A scatterplot is generated with no other summaries, if appropriate.

D087 The continuous variables are plotted against the labels.

D088 Points are connected inappropriately (for all scatterplots).

D089 An inappropriate straight-line summary is included.

Numerical Summaries

D09 General

D090a Outliers or skewness is considered when choosing summary statistics.

D090b The middle and spread together are usually the best two numbers to describe the data.

D097 Two measures of the middle are usually the best two numbers to describe the data.

D098a No cases may be omitted when computing summary statistics.

D098b A single number can usually describe a data set.

D099 Random data cannot be summarized.

D10 Single number summaries

Note: An expert facet may exist depending on the context of the problem.

D105 The single number which best describes the data must be a measure of location.

D106 The single number which best describes the data is the mean.

D108 The single number which best describes the data must be a number in the data set.

D109 Max or min are usually good single number summaries best describing the data.

D11 Location

D110a Outliers are considered in choosing a summary measure of location. If present, a trimmed measure or median is used.

D110b Skewness is considered in choosing a summary measure of location. If present, a median is used.

D119 The average is always a good single number which best describes the location.
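To see why D110a prefers a trimmed measure or the median when outliers are present, consider the sketch below. The data values and the `trimmed_mean` helper are hypothetical; the helper drops a fixed number of observations from each end, which is only one of several trimming conventions.

```python
from statistics import mean, median


def trimmed_mean(values, k=1):
    """Mean after dropping the k smallest and k largest observations."""
    ordered = sorted(values)
    if 2 * k >= len(ordered):
        raise ValueError("k would trim away every observation")
    return mean(ordered[k:len(ordered) - k])


# Hypothetical data set: five similar values and one outlier (95).
data = [12, 13, 14, 15, 16, 95]

print("mean:        ", mean(data))          # 27.5
print("median:      ", median(data))        # 14.5
print("trimmed mean:", trimmed_mean(data))  # 14.5
```

The mean lands at 27.5, a value larger than five of the six observations, while the median and the trimmed mean both stay near the bulk of the data.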

D12 Spread

D120a Outliers are considered when choosing a summary of spread.

D120b Skewness is considered when choosing a summary of spread.

D120c The standard deviation is a measure of average distance from the mean.

D129 The standard deviation is always a good single number which best describes the spread (i.e., outliers and/or skewness are not considered).
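The reading of the standard deviation in D120c can be checked against the literal average distance from the mean. The sketch below uses made-up data and the population form of the standard deviation; the helper name `mean_abs_deviation` is an assumption for this example.

```python
from statistics import mean, pstdev


def mean_abs_deviation(values):
    """Literal average distance of the observations from their mean."""
    center = mean(values)
    return mean(abs(v - center) for v in values)


# Hypothetical data set with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

sd = pstdev(data)                # root-mean-square distance from the mean
mad = mean_abs_deviation(data)   # plain average distance from the mean

print("standard deviation:     ", sd)
print("mean absolute deviation:", mad)
```

For this data the standard deviation is 2 and the mean absolute deviation is 1.5. The two measure the same idea, but squaring gives large deviations more weight, which is one reason outliers and skewness matter when choosing a summary of spread (D120a, D120b).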

D13 Discrete Variables

D130a The modal category is typically a good single number summary.

D130b Outliers are considered.

D130c Skewness is considered.

D130d The average is a good single number summary, provided it can be computed, and D130b and c are present.

D137 The single number summary must be one of the categories.

D138 The average is used as a single number summary when outliers are present.

D139 The average is used as a single number summary when the categories are only partially defined.

D14 Single 2x2 table

Context: Consider the following 2×2 table.

        Trt 1   Trt 2   Tot
S       a       b       m1
F       c       d       m2
Tot     n1      n2

D140a To compare Treatment 1 with Treatment 2, compare a/n1 with b/n2, a/c with b/d, or c/n1 with d/n2.

D140b When n1=n2, to compare Treatment 1 and 2 compare a and b, or c and d.

D148 When n1 ≠ n2, to compare Treatment 1 and 2 compare a and b, or c and d.

D149 To compare Treatment 1 and 2, compare a/m1 with c/m2, or b/m1 with d/m2, or a/b with c/d.
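The difference between D140a and D148 can be made concrete with a small sketch. The cell counts below are hypothetical, and `success_proportions` is an illustrative helper that follows the a, b, c, d layout of the table in the Context.

```python
def success_proportions(a, b, c, d):
    """Return (a/n1, b/n2): the proportion of successes within each treatment column."""
    n1 = a + c  # column total for Treatment 1
    n2 = b + d  # column total for Treatment 2
    return a / n1, b / n2


# Hypothetical counts with unequal column totals: n1 = 100, n2 = 150.
a, b = 30, 45    # successes under Treatment 1 and Treatment 2
c, d = 70, 105   # failures under Treatment 1 and Treatment 2

p1, p2 = success_proportions(a, b, c, d)
print("raw success counts:", a, "vs", b)
print("success proportions:", p1, "vs", p2)
```

With these numbers the raw counts (30 versus 45) suggest that Treatment 2 does better, but the column proportions are both 0.3. Comparing a with b is only safe when n1 = n2, as D140b states.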

D15 Discrete x Continuous

D150 A measure of location and spread for each category is a useful numerical summary for the data.

D155 A measure of location for each category plus an overall measure of location is a good summary of the data.

D156a A measure of location for each category plus the range of the averages for the categories is a good summary of the data.

D156b A measure of location for each category plus an overall measure of spread is a good summary of the data.

D157 A measure of location (alone) for each category is a good summary of the data.

D159 The categories are ignored in summarizing the data.
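A minimal sketch of the D150 summary, a measure of location and spread for each category, is given below. The category labels and values are invented for the example, and the mean and sample standard deviation are only one possible choice of measures.

```python
from statistics import mean, stdev

# Hypothetical continuous measurements grouped by a discrete variable.
groups = {
    "A": [10, 12, 14],
    "B": [2, 12, 22],
}

# One (location, spread) pair per category.
summary = {
    label: {"mean": mean(values), "sd": stdev(values)}
    for label, values in groups.items()
}

for label, stats in summary.items():
    print(label, stats)
```

Both hypothetical categories have a mean of 12, so a measure of location alone (D157) would make them look identical, even though the spread in B is five times that in A.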

D16 Paired difference data: summaries

D160 The average or median difference is used to summarize the data.

D167 A summary of the increases and summary of the decreases are provided as a useful summary of the data.

D168 The total of the increases minus the total of the decreases is used to summarize the data.

D169 Paired data is reduced to the number of increases and the number of decreases (with no regard to the loss of information).
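The summaries in D160 and D169 can be compared on a small set of paired measurements. The before and after values below are hypothetical.

```python
from statistics import mean, median

# Hypothetical paired measurements on the same five subjects.
before = [10, 12, 9, 11, 13]
after = [12, 11, 12, 14, 13]

# D160: work with the within-pair differences.
differences = [y - x for x, y in zip(before, after)]
mean_diff = mean(differences)
median_diff = median(differences)

# D169: keep only the direction of each change, discarding its size.
increases = sum(1 for diff in differences if diff > 0)
decreases = sum(1 for diff in differences if diff < 0)

print("differences:", differences)
print("mean difference:", mean_diff, "median difference:", median_diff)
print("increases:", increases, "decreases:", decreases)
```

The counts of increases and decreases (3 and 1 here) say nothing about how large the changes were, which is the loss of information that D169 refers to.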

D17 Counts and Rates

Context: Counts (n1, n2) and population sizes (N1, N2, with N1 ≠ N2) are given.

D170 Rates are used to compare and describe population attributes.

D178 Counts are used to compare and describe population attributes.

D178 No comparison is deemed possible because N1 ≠ N2.
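A short sketch shows how rates make the two populations comparable even when their sizes differ. The counts, population sizes, and the per-1,000 scaling are assumptions chosen for illustration.

```python
def rate_per_thousand(count, population):
    """Events per 1,000 members of the population."""
    return 1000 * count / population


# Hypothetical counts (n1, n2) and unequal population sizes (N1, N2).
n1, N1 = 50, 10_000
n2, N2 = 80, 40_000

rate1 = rate_per_thousand(n1, N1)
rate2 = rate_per_thousand(n2, N2)

print("counts:", n1, "vs", n2)
print("rates per 1,000:", rate1, "vs", rate2)
```

Here the second population has the larger count but the smaller rate (2 versus 5 per 1,000), so a comparison based on counts alone would point the wrong way, and the unequal population sizes do not prevent a comparison once rates are used.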

 



Andrew Schaffner
University of Washington
Department of Statistics
andrew@stat.washington.edu