BAGGING (Voting, Averaging)
The concept of bagging (voting for classification,
averaging for regression-type problems with a continuous dependent variable of
interest) applies to the area of predictive data mining. It combines the
predicted classifications (predictions) from multiple models, or from the same
type of model fitted to different learning data.
It is also used to address the inherent instability of
results when complex models are applied to relatively small data sets. Suppose
your data mining task is to build a model for predictive classification, and
the data set from which to train the model (the learning data set, which contains
the observed classifications) is relatively small.
You could repeatedly sub-sample (with replacement) from the
data set and apply, for example, a tree classifier (e.g., C&RT or CHAID) to the successive
samples. In practice, very different trees will often be grown for the
different samples, illustrating the instability of models that is often evident with
small data sets. One method of deriving a single prediction (for new
observations) is to use all trees found in the different samples and to apply
some simple voting:
The final classification is the one most often predicted by
the different trees. Note that some
weighted combination of predictions (a weighted vote or weighted average) is also
possible and commonly used. A sophisticated (machine learning) algorithm for
generating the weights for weighted prediction or voting is the Boosting procedure.
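For illustration, here is a minimal sketch of bagging with simple voting in Python. It assumes scikit-learn's DecisionTreeClassifier as a stand-in for a C&RT-style tree and NumPy arrays with non-negative integer class labels; the function names are illustrative, not a library API.

# Minimal bagging sketch: bootstrap sub-samples plus simple (majority) voting.
# Assumes X, y are NumPy arrays and y holds non-negative integer class labels.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_models=25, random_state=0):
    rng = np.random.default_rng(random_state)
    n = len(X)
    models = []
    for _ in range(n_models):
        # Sub-sample the learning data with replacement (a bootstrap sample).
        idx = rng.integers(0, n, size=n)
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X_new):
    # Simple voting: the final class is the one predicted most often
    # across the trees grown on the different bootstrap samples.
    preds = np.stack([m.predict(X_new) for m in models])   # shape (n_models, n_obs)
    vote = lambda col: np.bincount(col).argmax()
    return np.apply_along_axis(vote, axis=0, arr=preds)

For regression-type problems, the same scheme applies with averaging of the predicted values in place of voting.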
BOOSTING
The concept of boosting applies to the area of predictive
data mining, to generate multiple models or classifiers (for prediction or
classification), and to derive weights to combine the predictions from those
models into a single prediction or predicted classification (see also Bagging).
A simple
algorithm for boosting works like this: Start by applying some method (e.g., a tree classifier
such as C&RT or CHAID) to the learning data, where each observation is
assigned an equal weight.
Compute the predicted classifications, and apply weights to
the observations in the learning sample that are inversely proportional to the
accuracy of the classification. In other words, assign greater weight to those
observations that were difficult to classify (where the misclassification rate
was high), and lower weights to those that were easy to classify (where the
misclassification rate was low).
In the context of C&RT, for example, different
misclassification costs (for the different classes) can be applied, inversely
proportional to the accuracy of prediction in each class. Then apply the
classifier again to the weighted data (or with the different misclassification
costs), and continue with the next iteration (application of the analysis
method for classification to the re-weighted data).
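The reweighting loop can be sketched as follows. This is a minimal AdaBoost-style version, assuming a two-class problem and a base learner that accepts per-observation weights (here scikit-learn's DecisionTreeClassifier via its sample_weight argument); the exponential update is one concrete realization of "weights inversely proportional to accuracy", and the function name boosting_fit is illustrative.

# Sketch of the iterative reweighting loop (AdaBoost-style, two-class case).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boosting_fit(X, y, n_rounds=10):
    n = len(X)
    w = np.full(n, 1.0 / n)                  # start with equal observation weights
    models, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        miss = stump.predict(X) != y         # observations that were misclassified
        err = np.clip(np.dot(w, miss), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1.0 - err) / err)   # this classifier's weight in the vote
        # Assign greater weight to hard-to-classify observations,
        # lower weight to those that were classified correctly.
        w *= np.exp(alpha * np.where(miss, 1.0, -1.0))
        w /= w.sum()
        models.append(stump)
        alphas.append(alpha)
    return models, alphas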
Boosting will generate a sequence of classifiers, where
each consecutive classifier in the sequence is an "expert" in classifying
observations that were not well classified by those preceding it. During
deployment (for prediction or classification of new cases), the predictions
from the different classifiers can then be combined (e.g., via voting, or some
weighted voting procedure) to derive a single best prediction or
classification.
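Continuing the sketch above, deployment can combine the sequence of classifiers by weighted voting, with each classifier's vote weighted by the alpha derived from its training error (one illustrative weighting scheme among several possible ones).

# Weighted voting over the sequence of classifiers produced by boosting_fit above.
def boosting_predict(models, alphas, X_new, classes):
    scores = {c: np.zeros(len(X_new)) for c in classes}
    for model, alpha in zip(models, alphas):
        pred = model.predict(X_new)
        for c in classes:
            scores[c] += alpha * (pred == c)        # weighted vote for class c
    stacked = np.stack([scores[c] for c in classes])
    return np.asarray(classes)[stacked.argmax(axis=0)]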
Note that boosting can also be applied to learning methods
that do not explicitly support weights or misclassification
costs. In that case, random sub-sampling can be applied to the learning data in
the successive steps of the iterative boosting procedure, where the probability
of selecting an observation into the subsample is inversely proportional to
the accuracy of the prediction for that observation in the previous iteration.
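A hedged sketch of this resampling variant is shown below: each round draws a subsample in which poorly classified observations are more likely to be selected, and the base learner is fit without any weights. The doubling factor for misclassified observations is an arbitrary illustrative choice, and the function name is hypothetical.

# Boosting by resampling, for learners that do not accept observation weights.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boosting_fit_by_resampling(X, y, n_rounds=10, random_state=0):
    rng = np.random.default_rng(random_state)
    n = len(X)
    p = np.full(n, 1.0 / n)          # selection probabilities, initially uniform
    models = []
    for _ in range(n_rounds):
        # Draw a subsample; poorly classified observations have higher probability.
        idx = rng.choice(n, size=n, replace=True, p=p)
        tree = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])
        miss = tree.predict(X) != y
        # Raise the selection probability of misclassified observations
        # (the factor 2 is an arbitrary illustrative choice).
        p = np.where(miss, p * 2.0, p)
        p /= p.sum()
        models.append(tree)
    return models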
Reference source: documentation.statsoft.com