DATA PREPARATION (in Data Mining)
Data preparation and cleaning is an often neglected but
extremely important step in the data mining process. The old saying "garbage in, garbage out"
is particularly applicable to typical data mining projects, where large data
sets collected via some automatic method (e.g., via the Web) serve as the
input to the analyses. Often, the method by which the data were gathered was
not tightly controlled, and so the data may contain out-of-range values (e.g.,
Income: -100), impossible data combinations (e.g., Gender: Male, Pregnant:
Yes), and the like. Analyzing data that have not been carefully screened for
such problems can produce highly misleading results, particularly in
predictive data mining.
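The kind of screening described above can be sketched in a few lines of Python. This is only a minimal illustration: the records, the field names (`income`, `gender`, `pregnant`), and the two rules are assumptions based on the examples in the text, not a complete cleaning procedure.

```python
# Hypothetical records; field names and values are made up for illustration.
records = [
    {"id": 1, "gender": "Female", "pregnant": "Yes", "income": 52000},
    {"id": 2, "gender": "Male", "pregnant": "No", "income": -100},
    {"id": 3, "gender": "Male", "pregnant": "Yes", "income": 41000},
    {"id": 4, "gender": "Female", "pregnant": "No", "income": 38000},
]


def screen(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    # Out-of-range check: income cannot be negative.
    if record["income"] < 0:
        problems.append("income out of range")
    # Impossible-combination check from the example in the text.
    if record["gender"] == "Male" and record["pregnant"] == "Yes":
        problems.append("impossible combination: male and pregnant")
    return problems


# Map record id -> problems, keeping only records that have at least one.
flagged = {r["id"]: screen(r) for r in records if screen(r)}
# Records that passed every check and can go on to analysis.
clean = [r for r in records if not screen(r)]

print(flagged)
print([r["id"] for r in clean])
```

In a real project the flagged records would usually be reviewed or corrected rather than silently dropped, since the errors themselves may say something about how the data were collected.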
DATA REDUCTION (for Data Mining)
The term Data Reduction in the context of data mining is
usually applied to projects where the goal is to aggregate or amalgamate the
information contained in large datasets into smaller, more manageable information
nuggets. Data reduction methods can include simple tabulation, aggregation
(computing descriptive statistics), or more sophisticated techniques such as
clustering and principal components analysis.
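As a small illustration of the simplest of these methods, aggregation, the sketch below reduces a list of individual transactions to one row of descriptive statistics per region. The data and the choice of statistics are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical transaction-level data: (region, purchase amount).
transactions = [
    ("North", 20.0), ("North", 30.0), ("South", 10.0),
    ("South", 50.0), ("South", 30.0), ("West", 40.0),
]

# Group the raw amounts by region.
by_region = defaultdict(list)
for region, amount in transactions:
    by_region[region].append(amount)

# Reduce each group to a handful of descriptive statistics.
summary = {
    region: {
        "count": len(amounts),
        "mean": mean(amounts),
        "min": min(amounts),
        "max": max(amounts),
    }
    for region, amounts in by_region.items()
}

for region, stats in sorted(summary.items()):
    print(region, stats)
```

Six rows become three summary rows here; on a real dataset the same idea can turn millions of records into a table small enough to read.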
DEPLOYMENT
The concept of deployment in predictive data mining refers
to the application of a model for prediction or classification to new data.
After a satisfactory model or set of models has been identified (trained) for a
particular application, we usually want to deploy those models so that
predictions or predicted classifications can quickly be obtained for new data.
For example, a credit card company may want to deploy a trained model or set of
models (e.g., neural networks, meta-learners) to quickly identify transactions
that have a high probability of being fraudulent.
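A rough sketch of what deployment can look like in code: a previously trained model is stored in some serialized form, loaded once, and then applied to each new transaction. Everything specific here is an assumption for illustration. The model is a simple logistic scoring rule rather than a neural network, the weights are invented rather than fitted, and the feature names (`amount_thousands`, `foreign`) are hypothetical.

```python
import json
import math

# Stand-in for a model trained earlier and saved to disk.
# The weights below are invented for illustration, not fitted to data.
saved_model = json.dumps({
    "bias": -4.0,
    "weights": {"amount_thousands": 0.5, "foreign": 2.0},
    "threshold": 0.5,
})


def load_model(serialized):
    """Restore the model parameters from their serialized form."""
    return json.loads(serialized)


def fraud_probability(model, transaction):
    """Apply the logistic scoring rule to one new transaction."""
    z = model["bias"] + sum(
        weight * transaction[name] for name, weight in model["weights"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def flag(model, transaction):
    """Return True if the predicted probability reaches the threshold."""
    return fraud_probability(model, transaction) >= model["threshold"]


model = load_model(saved_model)
new_transactions = [
    {"amount_thousands": 0.2, "foreign": 0},  # small domestic purchase
    {"amount_thousands": 6.0, "foreign": 1},  # large foreign purchase
]
flags = [flag(model, t) for t in new_transactions]
print(flags)
```

The point of the separation is that training happens once, offline, while scoring is cheap enough to run on every incoming transaction.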
DRILL-DOWN ANALYSIS
In data mining, drill-down analysis denotes the interactive
exploration of data, in particular of large databases. The process of
drill-down analysis begins by considering some
simple break-downs of the data by a few variables of interest (e.g., gender,
geographic region, etc.).
Various statistics, tables, histograms, and other graphical
summaries can be computed for each group. Next, we may want to "drill down"
to expose and further analyze the data "underneath" one of the
categorizations; for example, we might want to further review the data for
males from the Midwest. Again, various statistical and graphical summaries can
be computed for those cases only, which might suggest further break-downs by
other variables (e.g., income, age, etc.). At the lowest ("bottom")
level are the raw data: for example, you may want to review the addresses of
male customers from one region and a certain income group, and to offer
those customers services of particular utility to that
group.
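The successive levels of a drill-down can be sketched as a series of ever narrower filters over the same table. The customer records and the `drill_down` helper below are hypothetical and only meant to show the pattern.

```python
from collections import Counter

# Hypothetical customer table.
customers = [
    {"name": "A", "gender": "Male", "region": "Midwest", "income": 40000},
    {"name": "B", "gender": "Male", "region": "Midwest", "income": 85000},
    {"name": "C", "gender": "Male", "region": "South", "income": 52000},
    {"name": "D", "gender": "Female", "region": "Midwest", "income": 61000},
    {"name": "E", "gender": "Female", "region": "West", "income": 47000},
]


def drill_down(rows, **filters):
    """Keep only the rows whose fields match every given filter value."""
    return [r for r in rows if all(r[k] == v for k, v in filters.items())]


# Level 1: a simple break-down by one variable of interest.
by_gender = Counter(r["gender"] for r in customers)

# Level 2: drill into one category and break it down by another variable.
males = drill_down(customers, gender="Male")
males_by_region = Counter(r["region"] for r in males)

# Level 3 (bottom): the raw records underneath one cell of the break-down.
midwest_males = drill_down(customers, gender="Male", region="Midwest")
names = [r["name"] for r in midwest_males]

print(dict(by_gender))
print(dict(males_by_region))
print(names)
```

Each level reuses the same filter function with one more condition, which mirrors how an analyst moves from summary tables down to individual cases.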
Reference source : documentation(dot)statsoft(dot)com
About Data | Preparation, Reduction | Deployment and Drill-Down Analysis
Reviewed by AIA on December 21, 2019