EDA vs. HYPOTHESIS TESTING
In contrast to traditional hypothesis testing, which is designed to
verify a priori hypotheses about relations between variables (e.g., "There is a positive correlation between the
AGE of a person and his/her RISK TAKING disposition"), Exploratory Data Analysis (EDA) is used to identify systematic relations between variables
when there are no (or incomplete) a priori expectations as to the nature of
those relations. In a typical exploratory data analysis process, many variables
are taken into account and compared, using a variety of techniques in the
search for systematic patterns.
COMPUTATIONAL EDA TECHNIQUES
Computational exploratory data analysis methods include both simple basic statistics and more
advanced multivariate exploratory techniques designed to identify
patterns in multivariate data sets.
Basic statistical exploratory methods. The basic
statistical exploratory methods include such techniques as examining distributions
of variables (e.g., to identify highly skewed or non-normal patterns, such as bimodal
distributions), reviewing large correlation matrices for coefficients that meet
certain thresholds (see example above), or examining multi-way frequency tables
(e.g., systematically reviewing, "slice by slice," combinations of
levels of control variables).
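The three basic methods just listed can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the procedure of any particular statistics package. The data values, the variable names (`age`, `risk`, `income`, `sex`, `smoker`, `region`), the 0.7 threshold, and all helper function names are assumptions made up for the example.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy data: one list per variable, one value per case.
data = {
    "age":    [23, 35, 41, 52, 29, 60, 47, 38],
    "risk":   [8.1, 6.0, 5.2, 3.9, 7.4, 2.5, 4.4, 5.8],
    "income": [20, 22, 25, 27, 21, 24, 23, 95],  # one extreme case
}

def mean(xs):
    return sum(xs) / len(xs)

def skewness(xs):
    """Population skewness: third central moment / (variance ** 1.5)."""
    m = mean(xs)
    m2 = mean([(x - m) ** 2 for x in xs])
    m3 = mean([(x - m) ** 3 for x in xs])
    return m3 / m2 ** 1.5

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def strong_correlations(table, threshold=0.7):
    """Return (var_a, var_b, r) for every pair of variables with |r| >= threshold."""
    hits = []
    for a, b in combinations(table, 2):
        r = pearson_r(table[a], table[b])
        if abs(r) >= threshold:
            hits.append((a, b, r))
    return hits

def frequency_slices(rows, row_var, col_var, control_var):
    """Two-way counts of (row_var, col_var), one 'slice' per level of control_var."""
    slices = {}
    for row in rows:
        level = row[control_var]
        slices.setdefault(level, Counter())[(row[row_var], row[col_var])] += 1
    return slices

# 1. Distribution shape: a large positive value flags a right-skewed variable.
skew = {name: skewness(values) for name, values in data.items()}

# 2. Correlation matrix scan: keep only the coefficients that pass the threshold.
pairs = strong_correlations(data, threshold=0.7)

# 3. Multi-way frequency table, reviewed slice by slice (hypothetical cases).
cases = [
    {"sex": "F", "smoker": "no",  "region": "north"},
    {"sex": "F", "smoker": "yes", "region": "north"},
    {"sex": "M", "smoker": "no",  "region": "north"},
    {"sex": "M", "smoker": "no",  "region": "south"},
    {"sex": "F", "smoker": "no",  "region": "south"},
    {"sex": "M", "smoker": "no",  "region": "south"},
]
slices = frequency_slices(cases, "sex", "smoker", control_var="region")
```

In this toy data set only the `age`/`risk` pair passes the threshold, and `income` stands out as strongly right-skewed because of its single extreme case. With real data, a data-frame library would normally replace these hand-written helpers.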
MULTIVARIATE EXPLORATORY TECHNIQUES
Multivariate exploratory techniques designed specifically
to identify patterns in multivariate (or univariate, such as sequences of
measurements) data sets include: Cluster Analysis, Factor Analysis,
Discriminant Function Analysis, Multidimensional Scaling, Log-linear Analysis,
Canonical Correlation, Stepwise Linear and Nonlinear (e.g., Logit) Regression,
Correspondence Analysis, Time Series Analysis, and Classification Trees.
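As one concrete instance from this list, cluster analysis can be sketched with a bare-bones k-means procedure. The text above does not name a specific clustering algorithm, so k-means is an assumption chosen for brevity. The data points, the function name `kmeans`, and the deterministic choice of starting centroids are likewise hypothetical; real tools typically randomize the start and repeat the run several times.

```python
import math

def kmeans(points, k, iterations=20):
    """Very small k-means: returns (centroids, clusters) for a list of tuples."""
    # Assumption for reproducibility: start from k points evenly spaced in the list.
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments have stabilized
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical two-variable data with two visibly separate groups.
points = [(1.0, 1.2), (0.8, 0.9), (1.3, 1.1), (8.0, 8.2), (7.7, 8.5), (8.3, 7.9)]
centroids, clusters = kmeans(points, k=2)
```

The grouping found this way is exploratory: the number of clusters `k` is supplied by the analyst, and different starting centroids can yield different solutions.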
GRAPHICAL (DATA VISUALIZATION) EDA TECHNIQUES
A large selection of powerful exploratory data analytic
techniques is also offered by graphical data visualization methods that can
identify relations, trends, and biases "hidden" in
unstructured data sets.
Brushing. Perhaps the most common, and historically the first widely used,
technique explicitly identified as graphical exploratory data analysis is
brushing, an interactive method that allows us to select specific data points
or subsets of data on screen and identify their (e.g., common) characteristics, or to
examine their effects on relations between relevant variables.
Those relations between variables can be visualized by
fitted functions (e.g., 2D lines or 3D surfaces) and their confidence
intervals. Thus, for example, we can examine changes in those functions by
interactively (temporarily) removing or adding specific subsets of data.
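Brushing is interactive, but its computational core (re-fitting a function with a subset selected or temporarily removed) can be sketched without a graphics system. The example below is an illustration under stated assumptions: the data, the field names `group`, `x`, and `y`, and the helpers `fit_line` and `brush` are all hypothetical, and a straight line stands in for whatever function the display would fit.

```python
import math

def fit_line(xs, ys):
    """Least-squares line through (xs, ys); returns slope, intercept, and Pearson r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": my - slope * mx, "r": sxy / math.sqrt(sxx * syy)}

def brush(cases, is_selected):
    """Re-fit the line for all cases, for the brushed subset, and with it removed."""
    selected = [c for c in cases if is_selected(c)]
    rest = [c for c in cases if not is_selected(c)]
    def fit(subset):
        return fit_line([c["x"] for c in subset], [c["y"] for c in subset])
    return {"all": fit(cases), "selected": fit(selected), "rest": fit(rest)}

# Hypothetical cases: three groups with different x-y relations.
cases = [
    {"group": "low",    "x": 2,  "y": 10}, {"group": "low",    "x": 4,  "y": 9},
    {"group": "low",    "x": 6,  "y": 11}, {"group": "low",    "x": 8,  "y": 10},
    {"group": "medium", "x": 10, "y": 20}, {"group": "medium", "x": 14, "y": 28},
    {"group": "medium", "x": 18, "y": 37}, {"group": "medium", "x": 22, "y": 44},
    {"group": "high",   "x": 12, "y": 60}, {"group": "high",   "x": 16, "y": 58},
    {"group": "high",   "x": 20, "y": 63}, {"group": "high",   "x": 24, "y": 59},
]

# "Brush" the medium group and compare the three fits.
result = brush(cases, lambda c: c["group"] == "medium")
```

Comparing `result["all"]`, `result["selected"]`, and `result["rest"]` shows how much the brushed subset contributes to the overall fitted line, which is the question the interactive display answers visually.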
For example, one of many applications of the brushing
technique is to select (i.e., highlight) in a matrix scatterplot all data
points that belong to a certain category (e.g., a "medium"
income level, see the highlighted subset in the fourth component graph of the
first row in the illustration at left) in order to examine how those specific
observations contribute to relations between other variables in the same data
set (e.g., the correlation between "debt" and
"assets" in the current example).
If the brushing facility supports features such as "animated
brushing" or "automatic function re-fitting," we can
define a dynamic brush that moves over the consecutive ranges of a
criterion variable (e.g., "income" measured on a continuous
scale or a discrete [3-level] scale as in the illustration above) and examine
the dynamics of the contribution of the criterion variable to the relations
between other relevant variables in the same data set.
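A dynamic brush of this kind can be approximated numerically by sliding a window over the criterion variable and re-fitting within each window. The sketch below assumes a continuous "income" variable and re-fits the slope of "assets" on "debt" in consecutive, non-overlapping income ranges. The generated data, the window width, and the function names `fit_slope` and `moving_brush` are assumptions for illustration only.

```python
def fit_slope(xs, ys):
    """Least-squares slope of ys on xs, or None if xs has no variation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return None
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx

def moving_brush(cases, criterion, x, y, width, step, min_cases=3):
    """Slide a window over `criterion` and re-fit the x-y slope in each position."""
    values = [c[criterion] for c in cases]
    start, stop = min(values), max(values)
    frames = []
    while start <= stop:
        window = [c for c in cases if start <= c[criterion] < start + width]
        slope = None
        if len(window) >= min_cases:  # too few cases: leave the frame unfitted
            slope = fit_slope([c[x] for c in window], [c[y] for c in window])
        frames.append({"range": (start, start + width), "n": len(window), "slope": slope})
        start += step
    return frames

# Hypothetical data in which the debt-assets slope grows with income.
cases = []
for i in range(30):
    income = 10 + 3 * i
    debt = 2 * (i % 5) + 1
    cases.append({"income": income, "debt": debt, "assets": debt * income / 20 + 5})

# step == width gives consecutive, non-overlapping ranges of the criterion variable.
frames = moving_brush(cases, "income", "debt", "assets", width=30, step=30)
slopes = [f["slope"] for f in frames]
```

Each entry in `frames` corresponds to one position of the brush. A `step` smaller than `width` would give overlapping windows and a smoother "animation" of how the fitted slope changes across income levels.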
Other graphical EDA techniques. Other graphical exploratory
analytic techniques include function fitting and plotting, data smoothing,
overlaying and merging of multiple displays, categorizing data,
splitting/merging subsets of data in graphs, aggregating data in graphs,
identifying and marking subsets of data that meet specific conditions, icon
plots, shading, plotting confidence intervals and confidence areas (e.g.,
ellipses), generating tessellations, spectral planes, integrated layered
compressions, and projected contours, data image reduction techniques,
interactive (and continuous) rotation with animated stratification
(cross-sections) of 3D displays, and selective highlighting of specific series
and blocks of data.
VERIFICATION OF RESULTS OF EDA
The exploration of data can serve only as the first stage
of data analysis, and its results should be treated as tentative at best until
they are confirmed, e.g., cross-validated, using a different data set (or an
independent subset).
If the result of the exploratory stage suggests a
particular model, then its validity can be verified by applying it to a new
data set and testing its fit (e.g., testing its predictive validity). Case
selection conditions can be used to quickly define subsets of data (e.g., for
estimation and verification), and for testing the robustness of results.
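The estimation/verification split described above can be sketched as follows. This is a minimal illustration, assuming the exploratory stage suggested a straight-line model. The data, the even/odd case selection condition, and the helpers `fit_line` and `r_squared` are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def r_squared(xs, ys, slope, intercept):
    """Share of variance in ys explained by an already fitted line."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical cases: y is roughly 2x + 1 plus a small deterministic disturbance.
cases = [(i, 2 * i + 1 + ((7 * i) % 5 - 2) * 0.5) for i in range(20)]

# Case selection conditions: even-numbered cases for estimation, odd for verification.
estimation = [c for i, c in enumerate(cases) if i % 2 == 0]
verification = [c for i, c in enumerate(cases) if i % 2 == 1]

# Fit on the estimation subset only ...
slope, intercept = fit_line([x for x, _ in estimation], [y for _, y in estimation])

# ... then test predictive validity on cases the model has never seen.
holdout_r2 = r_squared(
    [x for x, _ in verification], [y for _, y in verification], slope, intercept
)
```

A high `holdout_r2` supports the model suggested by the exploratory stage, while a sharp drop relative to the fit on the estimation subset would suggest that the pattern was specific to that subset.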
Reference source : documentation(dot)statsoft(dot)com
ALL OF EDA (Exploratory Data Analysis Process)
Reviewed by AIA on December 21, 2019