## Some R Packages for ROC Curves

In a recent post, I presented some of the theory underlying ROC curves, and outlined the history leading up to their present popularity for characterizing the performance of machine learning models. In this post, I describe how to search CRAN for packages to plot ROC curves, and highlight six useful packages.

Although I began with a few ideas about packages that I wanted to talk about, like ROCR and pROC, which I have found useful in the past, I decided to use Gábor Csárdi’s relatively new package pkgsearch to search through CRAN and see what’s out there. The package_search() function takes a text string as input and uses basic text mining techniques to search all of CRAN. The algorithm searches through package text fields, and produces a score for each package it finds that is weighted by the number of reverse dependencies and downloads.

After some trial and error, I settled on the following query, which returned a number of interesting ROC-related packages.

Then, I narrowed down the field to 46 packages by filtering out orphaned packages and packages with a score less than 190.

To complete the selection process, I did the hard work of browsing the documentation for the packages to pick out what I thought would be generally useful to most data scientists. The following plot uses Guangchuang Yu’s dlstats package to look at the download history for the six packages I selected to profile.

### ROCR – 2005

ROCR has been around for almost 14 years, and has been a rock-solid workhorse for drawing ROC curves. I particularly like the way the performance() function has you set up the calculation of the curve by entering the true positive rate, tpr, and false positive rate, fpr, parameters. Not only is this reassuringly transparent, it offers the flexibility to calculate nearly any performance measure for a binary classifier by entering the appropriate parameters. For example, to produce a precision-recall curve, you would enter prec and rec. Although there is no vignette, the package documentation is very good.

The following code sets up and plots the default ROCR ROC curve using a synthetic data set that comes with the package. I will use this same data set throughout this post.
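A minimal version of that code, using the synthetic ROCR.simple data shipped with the package (a sketch of the standard prediction()/performance() workflow):

```r
library(ROCR)

data(ROCR.simple)  # synthetic predictions and 0/1 labels bundled with ROCR
pred <- prediction(ROCR.simple$predictions, ROCR.simple$labels)
perf <- performance(pred, measure = "tpr", x.measure = "fpr")
plot(perf)
```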

### pROC – 2020

It is clear from the downloads curve that pROC is also popular with data scientists. I like that it is pretty easy to get confidence intervals for the Area Under the Curve, AUC , on the plot.

### PRROC – 2020

Although not nearly as popular as ROCR and pROC, PRROC seems to be making a bit of a comeback lately. The terminology for the inputs is a bit eclectic, but once you figure that out, the roc.curve() function plots a clean ROC curve with minimal fuss. PRROC is really set up to do precision-recall curves, as the vignette indicates.

### plotROC – 2020

plotROC is an excellent choice for drawing ROC curves with ggplot(). My guess is that it enjoys only limited popularity because the documentation uses medical terminology like “disease status” and “markers”. Nevertheless, the documentation, which includes both a vignette and a Shiny application, is very good.

The package offers a number of feature-rich ggplot() geoms that enable the production of elaborate plots. The following plot contains some styling, and includes Clopper and Pearson (1934) exact method confidence intervals.

### precrec – 2020

precrec is another library for plotting ROC and precision-recall curves.

Parameter options for the evalmod() function make it easy to produce basic plots of various model features.

### ROCit – 2020

ROCit is a new package for plotting ROC curves and other binary classification visualizations that rocketed onto the scene in January, and is climbing quickly in popularity. I would never have discovered it if I had automatically filtered my original search by downloads. The default plot includes the location of Youden’s J statistic.

Several other visualizations are possible. The following plot shows the cumulative densities of the positive and negative responses. The Kolmogorov-Smirnov (KS) statistic is the maximum distance between the two curves.

In this attempt to dig into CRAN and uncover some of the resources R contains for plotting ROC curves and other binary classifier visualizations, I have only scratched the surface. Moreover, I have deliberately ignored the many packages available for specialized applications, such as survivalROC for computing time-dependent ROC curves from censored survival data, and cvAUC, which contains functions for evaluating cross-validated AUC measures. Nevertheless, I hope that this little exercise will help you find what you are looking for.

## sklearn.metrics.roc_auc_score

Compute Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores.

Note: this implementation can be used with binary, multiclass and multilabel classification, but some restrictions apply (see Parameters).

Read more in the User Guide .

Parameters **y_true** array-like of shape (n_samples,) or (n_samples, n_classes)

True labels or binary label indicators. The binary and multiclass cases expect labels with shape (n_samples,) while the multilabel case expects binary label indicators with shape (n_samples, n_classes).

**y_score** array-like of shape (n_samples,) or (n_samples, n_classes)

Target scores. In the binary and multilabel cases, these can be either probability estimates or non-thresholded decision values (as returned by decision_function on some classifiers). In the multiclass case, these must be probability estimates which sum to 1. The binary case expects a shape (n_samples,), and the scores must be the scores of the class with the greater label. The multiclass and multilabel cases expect a shape (n_samples, n_classes). In the multiclass case, the order of the class scores must correspond to the order of labels , if provided, or else to the numerical or lexicographical order of the labels in y_true .

**average** {‘micro’, ‘macro’, ‘samples’, ‘weighted’} or None, default=’macro’

If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data. Note: multiclass ROC AUC currently only handles the ‘macro’ and ‘weighted’ averages.

‘micro’: Calculate metrics globally by considering each element of the label indicator matrix as a label.

‘macro’: Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.

‘weighted’: Calculate metrics for each label, and find their average, weighted by support (the number of true instances for each label).

‘samples’: Calculate metrics for each instance, and find their average.

Will be ignored when y_true is binary.

**sample_weight** array-like of shape (n_samples,), default=None

Sample weights.

**max_fpr** float > 0 and <= 1, default=None

If not None, the standardized partial AUC [2] over the range [0, max_fpr] is returned. For the multiclass case, max_fpr should be either None or 1.0, as partial AUC computation is not currently supported for multiclass.
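For example, restricting the computation to the FPR range [0, 0.5] on a made-up toy data set (a sketch, assuming scikit-learn and NumPy are installed):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# Standardized partial AUC over [0, 0.5]; the standardization rescales the
# result so that 0.5 still corresponds to a chance-level classifier.
pauc = roc_auc_score(y_true, y_score, max_fpr=0.5)
print(round(pauc, 3))  # 0.667
```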

**multi_class** <‘raise’, ‘ovr’, ‘ovo’>, default=’raise’

Multiclass only. Determines the type of configuration to use. The default value raises an error, so either ‘ovr’ or ‘ovo’ must be passed explicitly.

‘ovr’: Computes the AUC of each class against the rest [3] [4]. This treats the multiclass case in the same way as the multilabel case. Sensitive to class imbalance even when average == ‘macro’, because class imbalance affects the composition of each of the ‘rest’ groupings.

‘ovo’: Computes the average AUC of all possible pairwise combinations of classes [5]. Insensitive to class imbalance when average == ‘macro’.
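A small multiclass sketch with made-up probabilities (each row of y_prob sums to 1, as the multiclass case requires):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 1, 2, 2])
y_prob = np.array([
    [0.7, 0.2, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.2, 0.7],
    [0.3, 0.3, 0.4],
])

auc_ovr = roc_auc_score(y_true, y_prob, multi_class="ovr")  # one-vs-rest
auc_ovo = roc_auc_score(y_true, y_prob, multi_class="ovo")  # one-vs-one
print(auc_ovr, auc_ovo)  # both 1.0 here: every class is perfectly separated
```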

**labels** array-like of shape (n_classes,), default=None

Multiclass only. List of labels that index the classes in y_score . If None , the numerical or lexicographical order of the labels in y_true is used.

Returns **auc** float

Area under the ROC curve.

See also: average_precision_score (area under the precision-recall curve) and roc_curve (compute Receiver operating characteristic (ROC) curve).

References: Provost, F., Domingos, P. (2000). Well-trained PETs: Improving probability estimation trees (Section 6.2), CeDER Working Paper #IS-00-04, Stern School of Business, New York University.
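To make the binary case concrete, a minimal sketch with made-up scores (assuming scikit-learn and NumPy are installed):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Binary case: y_score holds scores for the class with the greater label (1).
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

auc = roc_auc_score(y_true, y_score)
print(auc)  # 0.75: 3 of the 4 (positive, negative) pairs are ranked correctly
```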

## Compare ROCs (RapidMiner Studio Core)

## Synopsis

## Description

The Compare ROCs operator is a nested operator i.e. it has a subprocess. The operators in the subprocess must produce a model. This operator calculates ROC curves for all these models. All the ROC curves are plotted together in the same plotter.

The comparison is based on the average values of a k-fold cross validation. Please study the documentation of the Cross Validation operator for more information about cross validation. Alternatively, this operator can use an internal split of the given data set into a test and a training set; in this case the operator behaves like the Split Validation operator. Please note that any former predicted label of the given ExampleSet will be removed during the application of this operator.

A ROC curve is a graphical plot of the sensitivity, or true positive rate, vs. the false positive rate (one minus the specificity, or true negative rate) for a binary classifier system as its discrimination threshold is varied. Equivalently, the ROC can be represented by plotting the fraction of true positives out of the positives (TPR = true positive rate) vs. the fraction of false positives out of the negatives (FPR = false positive rate).

ROC curves are calculated by first ordering the classified examples by confidence. Then all examples are considered in order of decreasing confidence, plotting the false positive rate on the x-axis and the true positive rate on the y-axis. Ties can be handled in three ways: optimistic, neutral, and pessimistic. With optimistic calculation, if several examples share the same confidence, the correctly classified examples are counted before the misclassified ones. With pessimistic calculation it is the other way round: misclassifications are counted before correct classifications. Neutral calculation is a mix of both: correct and false classifications are counted alternately. If no examples share the same confidence, or all examples with equal confidence are assigned to the same class, the optimistic, neutral and pessimistic ROC curves will be the same.
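The tie-handling rules above can be sketched in Python (a hypothetical roc_points helper, not RapidMiner code, assuming binary 0/1 labels and scores that are the confidence for the positive class):

```python
def roc_points(labels, scores, bias="neutral"):
    """ROC points swept in order of decreasing confidence.

    Within a group of tied scores: "optimistic" counts positives first,
    "pessimistic" counts negatives first, "neutral" alternates.
    """
    groups = {}
    for y, s in zip(labels, scores):
        groups.setdefault(s, []).append(y)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    for s in sorted(groups, reverse=True):  # decreasing confidence
        pos = groups[s].count(1)
        neg = len(groups[s]) - pos
        if bias == "optimistic":
            order = [1] * pos + [0] * neg
        elif bias == "pessimistic":
            order = [0] * neg + [1] * pos
        else:  # neutral: alternate correct and false classifications
            order = []
            while pos or neg:
                if pos:
                    order.append(1)
                    pos -= 1
                if neg:
                    order.append(0)
                    neg -= 1
        for y in order:
            if y:
                tp += 1
            else:
                fp += 1
            points.append((fp / n_neg, tp / n_pos))
    return points

# One positive and one negative tied at the same confidence:
print(roc_points([1, 0], [0.5, 0.5], bias="optimistic"))   # [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(roc_points([1, 0], [0.5, 0.5], bias="pessimistic"))  # [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
```

The optimistic curve hugs the upper-left corner while the pessimistic curve passes through the lower-right; any real curve with ties lies between them.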

## Input

- example set (IOObject)

This input port expects an ExampleSet with binominal label. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

## Output

- example set (IOObject)

The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

- rocComparison (ROC Comparison)

The ROC curves for all the models are delivered from this port. All the ROC curves are plotted together in the same plotter.

## Parameters

- number_of_folds This parameter specifies the number of folds to use for the cross validation evaluation. If this parameter is set to -1, this operator uses the split ratio instead and behaves like the Split Validation operator. Range: integer
- split_ratio This parameter specifies the relative size of the training set. It should be between 0 and 1, where 1 means that the entire ExampleSet will be used as the training set. Range: real
- sampling_type Several types of sampling can be used for building the subsets. Following options are available:
- Linear sampling: Linear sampling simply divides the ExampleSet into partitions without changing the order of the examples i.e. subsets with consecutive examples are created.
- Shuffled sampling: Shuffled sampling builds random subsets of the ExampleSet. Examples are chosen randomly for making subsets.
- Stratified sampling: Stratified sampling builds random subsets and ensures that the class distribution in the subsets is the same as in the whole ExampleSet. For example in the case of a binominal classification, Stratified sampling builds random subsets so that each subset contains roughly the same proportions of the two values of class labels.

Range: selection

- use_local_random_seed This parameter indicates if a *local random seed* should be used for randomizing examples of a subset. Using the same value of *local random seed* will produce the same subsets. Changing the value of this parameter changes the way examples are randomized, thus subsets will have a different set of examples. This parameter is only available if Shuffled or Stratified sampling is selected. It is not available for Linear sampling because it requires no randomization; examples are selected in sequence. Range: boolean
- local_random_seed This parameter specifies the *local random seed*. This parameter is only available if the *use local random seed* parameter is set to true. Range: integer
- use_example_weights This parameter indicates if example weights should be considered. If this parameter is not set to true then a weight of 1 is used for each example. Range: boolean
- roc_bias This parameter determines how tied confidences are handled when evaluating the ROC, i.e. whether correct predictions are counted first, last, or alternately. ROC curves are calculated by first ordering the classified examples by confidence; then all examples are considered in order of decreasing confidence to plot the false positive rate on the x-axis and the true positive rate on the y-axis. If no examples share the same confidence, or all examples with equal confidence are assigned to the same class, the optimistic, neutral and pessimistic ROC curves will be the same.
- optimistic: If several examples share the same confidence, the correctly classified examples are counted before the misclassified ones.
- pessimistic: With pessimistic calculation, wrong classifications are counted before correct classifications.
- neutral: Neutral calculation is a mix of the optimistic and pessimistic methods: correct and false classifications are counted alternately.

Range: selection

## Tutorial Processes

### Comparing different classifiers graphically by ROC curves

This process shows how several different classifiers can be graphically compared by means of multiple ROC curves. The ‘Ripley-Set’ data set is loaded using the Retrieve operator, and the Compare ROCs operator is applied to it. Have a look at the subprocess of the Compare ROCs operator. You can see that three different learners are applied: Naive Bayes, Rule Induction and Decision Tree. The resultant models are connected to the outputs of the subprocess. The Compare ROCs operator calculates ROC curves for all these models. All the ROC curves are plotted together in the same plotter, which can be seen in the Results Workspace.

© RapidMiner GmbH 2020. All rights reserved.