themis: Extra Steps for tidymodels + recipes
Working with unbalanced data sets? Remember that accuracy, alone, is not the best performance metric (especially when dealing with unbalanced data). Instead, place more importance on Cohen’s kappa coefficient, F1 harmonic mean, or focus on improving your model’s specificity or sensitivity, etc.
I’ve been transitioning a lot of my workflows to the tidymodels framework and I am super excited about the future of tidymodels (recipes + parsnip + dials + tune + workflow + more 😭✊🙌). If you’re using recipes often like me, a new library called {themis}, by Emil Hvitfeldt expands the {recipes} pre-processing steps for working with unbalanced data sets (it adds functionality for under- and hybrid-sampling techniques). I love me some smote, and now I can incorporate this sampling technique into my recipes with themis::step_smote()
!
# Installation
install.packages("themis")
# Example workflow
library(recipes)
library(modeldata)
library(themis)
data(okc)
sort(table(okc$Class, useNA = "always"))
#>
#> <NA> stem other
#> 0 9539 50316
<- recipe(Class ~ age + height, data = okc) %>%
ds_rec step_meanimpute(all_predictors()) %>%
step_smote(Class) %>%
prep()
sort(table(bake(ds_rec, new_data = NULL)$Class, useNA = "always"))
#>
#> <NA> stem other
#> 0 50316 50316
Source:
* themis
* themis on GitHub