Random Forest classification with receiver operating characteristic (ROC) curves, Principal Component Analysis (PCA), and t-SNE show the effect of over-sampling on classification and data visualization. For comparison, plots are shown for the original data and for the data after over-sampling. The plots are also included in the zipped downloadable file set.
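The over-sampling method used for these plots is not stated here. As a minimal sketch of the general idea, assuming plain random over-sampling (duplicating minority-class rows until the classes are balanced), the step could look like the following. The function name `random_oversample` and the toy data are hypothetical and are not taken from the downloadable file set.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class.

    Assumption: simple random over-sampling; the original analysis may
    have used a different method.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    # Start from a copy of the original data, then append duplicates.
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: six majority rows (label 0), two minority rows (label 1).
rows = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [2.0], [2.1]]
labels = [0, 0, 0, 0, 0, 0, 1, 1]

new_rows, new_labels = random_oversample(rows, labels)
print(Counter(new_labels))
```

After the call, both classes in the toy data have six rows. When a classifier is evaluated on held-out data, over-sampling is usually applied to the training split only, so that duplicated rows do not appear in both the training and test sets.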

Random Forest Classification

ROC before over-sampling


ROC after over-sampling
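For readers who want to see how an ROC curve is derived from classifier scores, here is a minimal sketch in plain Python. It assumes binary labels (1 for the positive class) and one score per sample, such as the positive-class probability from a Random Forest. The helper names `roc_points` and `auc`, and the toy labels and scores, are hypothetical and do not come from the analysis behind these plots.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists obtained by lowering the decision
    threshold step by step from the highest score to the lowest."""
    pos = sum(y_true)
    neg = len(y_true) - pos

    # Visit samples from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])

    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    prev = None
    for i in order:
        # Emit one curve point per distinct score value, so tied scores
        # are handled as a single threshold.
        if prev is not None and scores[i] != prev:
            fpr.append(fp / neg)
            tpr.append(tp / pos)
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        prev = scores[i]
    fpr.append(fp / neg)
    tpr.append(tp / pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
        for k in range(1, len(fpr))
    )


# Hypothetical toy labels and scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

fpr, tpr = roc_points(y_true, scores)
print(list(zip(fpr, tpr)))
print(auc(fpr, tpr))
```

Running the same computation on scores from a model trained on the original data and on scores from a model trained on the over-sampled data, both evaluated on the same held-out set, gives two curves that can be compared directly.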


Data Visualization

PCA before over-sampling


PCA after over-sampling


t-SNE before over-sampling


t-SNE after over-sampling