I have a function that needs to merge a large user dataset (up to ~10M rows, with standard column names) with an internal dataset (~5k rows). The internal dataset is always the same.
We currently use data.table for this. Are there any faster options?
#rstats #rlang #DataScience
Also, we have Python users on our team and are more than happy to outsource this particular problem to any other language if it's faster. Speed is the end goal, and the merge is the slow step. I considered writing it in C++ to merge against a hardcoded file.
#python #julialang #rcpp #cpp
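A minimal sketch of the "hardcoded internal table" idea, in Python since the team has Python users. It assumes the join is on a single key column and that the internal table fits comfortably in memory (~5k rows); `INTERNAL_ROWS`, `build_lookup`, and `left_join` are hypothetical names for illustration, not part of any library. The lookup is built once and reused for every user dataset. Whether this beats `data.table` is not something the sketch shows; that would need profiling on real data.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Hypothetical stand-in for the fixed internal dataset (~5k rows in practice).
# Each row is (join_key, internal_value).
INTERNAL_ROWS = [("a", 1.0), ("b", 2.5), ("c", 4.0)]


def build_lookup(rows: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Build the key -> value hash table once, since the internal data never changes."""
    return {key: value for key, value in rows}


def left_join(
    user_rows: Iterable[Tuple[str, int]],
    lookup: Dict[str, float],
) -> Iterator[Tuple[str, int, Optional[float]]]:
    """Stream the user rows and attach the internal value (None when the key is absent)."""
    for key, payload in user_rows:
        yield key, payload, lookup.get(key)


# Build once at load time, then reuse for each incoming user dataset.
LOOKUP = build_lookup(INTERNAL_ROWS)
joined = list(left_join([("b", 10), ("z", 20)], LOOKUP))
```

Each user row costs one hash lookup, so the work grows linearly with the size of the user dataset and the small table is never re-sorted or re-indexed.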
Per the @Rdatatable benchmarks at h2oai.github.io/db-benchmark… (which include a 'join' task likely close to your merge problem), you are unlikely to beat `data.table` just by going to #Python or #Rcpp. Maybe profile a little and then discuss with the @Rdatatable team?
May 27, 2021 · 7:31 PM UTC


