ALL OF EDA ( Exploratory Data Analysis Process )


EDA vs. HYPOTHESIS TESTING
As opposed to traditional hypothesis testing, which is designed to verify a priori hypotheses about relations between variables (e.g., "There is a positive correlation between the AGE of a person and his/her RISK TAKING disposition"), Exploratory Data Analysis (EDA) is used to identify systematic relations between variables when there are no (or incomplete) a priori expectations as to the nature of those relations. In a typical EDA process, many variables are taken into account and compared, using a variety of techniques, in the search for systematic patterns.
 
[Image: all-of-eda. Image source: towardsdatascience.com]

COMPUTATIONAL EDA TECHNIQUES
Computational Exploratory Data Analysis methods include both simple basic statistics and more advanced, dedicated multivariate exploratory techniques that identify patterns in multivariate data sets.
Basic statistical exploratory methods. The basic statistical exploratory methods include such techniques as examining distributions of variables (e.g., to identify highly skewed or non-normal patterns, such as bimodal ones), reviewing large correlation matrices for coefficients that meet certain thresholds, or examining multi-way frequency tables (e.g., "slice by slice," systematically reviewing combinations of levels of control variables).

MULTIVARIATE EXPLORATORY TECHNIQUES
Multivariate exploratory techniques designed specifically to identify patterns in multivariate (or univariate, such as sequences of measurements) data sets include: Cluster Analysis, Factor Analysis, Discriminant Function Analysis, Multidimensional Scaling, Log-linear Analysis, Canonical Correlation, Stepwise Linear and Nonlinear (e.g., Logit) Regression, Correspondence Analysis, Time Series Analysis, and Classification Trees.
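
To make one item from this list concrete, here is a minimal sketch of Cluster Analysis using plain k-means (Lloyd's algorithm). k-means is just one of many clustering methods and is chosen here only because it is short. The function name, the starting centers, and the points are assumptions made up for this example.

```python
from math import dist  # Python 3.8+
from statistics import mean


def kmeans(points, centers, iterations=10):
    """Plain k-means starting from caller-supplied centers."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: dist(p, centers[k]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = tuple(mean(axis) for axis in zip(*members))
    return labels, centers


# Hypothetical 2D observations forming two visibly separate groups.
points = [(1, 1), (1.2, 0.8), (0.9, 1.1), (8, 8), (8.2, 7.9), (7.8, 8.1)]
labels, centers = kmeans(points, centers=[(0, 0), (10, 10)])
print(labels)
print(centers)
```

The starting centers are passed in explicitly so the run is repeatable. Real tools usually pick them at random and repeat the fit several times.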

GRAPHICAL (DATA VISUALIZATION) EDA TECHNIQUES
A large selection of powerful exploratory data analytic techniques is also offered by graphical data visualization methods that can identify relations, trends, and biases "hidden" in unstructured data sets.

[Image: all-of-eda. Image source: boostlabs.com]

Brushing. Perhaps the most common and historically first widely used technique explicitly identified as graphical exploratory data analysis is brushing, an interactive method that lets us select specific data points or subsets of data on screen and identify their (e.g., common) characteristics, or examine their effects on relations between relevant variables.
Those relations between variables can be visualized by fitted functions (e.g., 2D lines or 3D surfaces) and their confidence intervals. We can then, for example, examine changes in those functions by interactively (temporarily) removing or adding specific subsets of data.

For example, one of many applications of the brushing technique is to select (i.e., highlight) in a matrix scatterplot all data points that belong to a certain category (e.g., a "medium" income level; see the highlighted subset in the fourth component graph of the first row in the illustration to the left) in order to examine how those specific observations contribute to relations between other variables in the same data set (e.g., the correlation between "debt" and "assets" in the current example).

If the brushing facility supports features such as "animated brushing" or "automatic function re-fitting," we can define a dynamic brush that moves over consecutive ranges of a criterion variable (e.g., "income" measured on a continuous scale or a discrete [3-level] scale as in the illustration above) and examine the dynamics of the contribution of the criterion variable to the relations between other relevant variables in the same data set.

Other graphical EDA techniques. Other graphical exploratory analytic techniques include function fitting and plotting, data smoothing, overlaying and merging of multiple displays, categorizing data, splitting/merging subsets of data in graphs, aggregating data in graphs, identifying and marking subsets of data that meet specific conditions, icon plots, shading, plotting confidence intervals and confidence areas (e.g., ellipses), generating tessellations, spectral planes, integrated layered compressions, and projected contours, data image reduction techniques, interactive (and continuous) rotation with animated stratification (cross-sections) of 3D displays, and selective highlighting of specific series and blocks of data.
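
As a small illustration of one item from this list, data smoothing, here is a centered moving average. It is only one of many possible smoothers, picked because it is short. The function name, window size, and series are assumptions made up for this example.

```python
def moving_average(values, window=3):
    """Centered moving average; the window shrinks at the edges so lengths match."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# Hypothetical series with a single spike in the middle.
series = [1, 2, 3, 10, 3, 2, 1]
print(moving_average(series, window=3))
```

Overlaying the smoothed series on the raw one in a plot makes the underlying trend easier to see without hiding the spike completely.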

VERIFICATION OF RESULTS OF EDA
The exploration of data can only serve as the first stage of data analysis, and its results can be treated as tentative at best as long as they are not confirmed, e.g., cross-validated, using a different data set (or an independent subset).
 
[Image: all-of-eda. Image source: firstdraftnews.com]
If the result of the exploratory stage suggests a particular model, then its validity can be verified by applying it to a new data set and testing its fit (e.g., testing its predictive validity). Case selection conditions can be used to quickly define subsets of data (e.g., for estimation and verification), and for testing the robustness of results.


Reference source : documentation(dot)statsoft(dot)com

Reviewed by AIA on December 21, 2019
