Does anyone know any packages or software that will let you split data into training and testing sets AND stratify on more than one variable? #bioinformatics #machinelearning #rstats #Python #compbio
Replying to @SamanthaLWilson
It's been a while since I looked at it, but I am fairly certain that `caret` and/or `mlr3` already cover this for #rstats; it is not an uncommon task. Here is a tweet from just yesterday doing it for #rspatial data too:
New version of #rstats #rspatial package CAST allows visualizing whether training data for #MachineLearning have representative coverage of the prediction area and whether CV folds are appropriately chosen. Tutorial: hannameyer.github.io/CAST/ar… @MLdwig @edzerpebesma @carles_milagarc

Mar 18, 2022 · 3:56 PM UTC

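For #Python, one common workaround is to combine the stratification variables into a single composite label and stratify on that; scikit-learn's `train_test_split` accepts such a label through its `stratify` argument. The sketch below shows the same idea with only the standard library, so the mechanics are visible: rows are grouped by the tuple of their values for every stratification column, and each group is split separately. The function name `multi_stratified_split` and the example columns `sex` and `status` are hypothetical, chosen only for illustration. With many variables the joint strata can get very small, and here a stratum of a single row ends up entirely in the training set.

```python
import random
from collections import Counter, defaultdict


def multi_stratified_split(rows, strata_keys, test_frac=0.2, seed=0):
    """Hypothetical helper: split `rows` (a list of dicts) into train/test
    index lists, stratifying jointly on every column in `strata_keys`."""
    rng = random.Random(seed)

    # Group row indices by their composite stratum, e.g. ("F", "case").
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[k] for k in strata_keys)].append(i)

    train_idx, test_idx = [], []
    # Split each stratum on its own so the test set keeps roughly the
    # same joint distribution of the stratification variables.
    for idx in groups.values():
        idx = idx[:]  # copy before shuffling
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Usage with made-up data stratified on two variables at once.
combos = ([("F", "case")] * 10 + [("F", "control")] * 20
          + [("M", "case")] * 10 + [("M", "control")] * 20)
data = [{"id": i, "sex": s, "status": d} for i, (s, d) in enumerate(combos)]

train, test = multi_stratified_split(data, ["sex", "status"], test_frac=0.2)
test_strata = Counter((data[i]["sex"], data[i]["status"]) for i in test)
print(len(train), len(test), dict(test_strata))
```

Each of the four sex/status combinations contributes 20% of its rows to the test set, so the joint proportions are preserved rather than only the marginal proportions of one variable.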