R^4 #15: tidyverse and data.table, sitting side-by-side ... (part 1) Using a simple example to compare and contrast two approaches goo.gl/o92xJe #rstats h/t @gelliottmorris

Jan 21, 2018 · 11:44 PM UTC

I use both regularly, often together. A relevant question: what is more likely in the next 5 years, TV getting faster or DT code getting less esoteric and more intuitive?
Though it may cost you some $$$ github.com/Rdatatable/data.t… (I should borrow from that one for my matrixStats benchmark reports)
It's easy to read, if you know SQL. If not, you can crutch it with the tidyverse
Not really. Knowing SQL helps, but if you don't, there's no need to learn it to use data.table. I have found data.table easier to use, and the speed pushes it over the edge for me between the two.
Do you use sqldf?
Relatively fair comparison. I think package number is a weird metric. Having date and reading funcs in data.table is good for simplicity, but tidyverse has been splitting out packages/functions for valid reasons too. Speed is clear. f-read is a f-reak. Readability favors tidy.
The tidyverse function can be written with one line of code fewer than the data.table function. However, it is still not faster.
data.table is not that hard to read. I see no reason not to use dtplyr at least. data.table is fast! And I always teach data.table alongside the tidyverse.
Is the code buggy? In the comments you are aiming for 14 days, but (as I read it, and I may be wrong) you are getting 14 entries, and there are days on which multiple polls end: polls_2016 %>% group_by(end_date) %>% summarise(n_finish = n()) %>% filter(n_finish > 1) %>% arrange(-n_finish)
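To make the "14 days vs. 14 entries" point concrete, here is a minimal sketch on made-up data. It is not the code from the linked post: the `polls` table, its values, and the window of 3 (instead of 14) are all assumptions for illustration. It only shows that taking the last N rows and taking the last N days differ whenever several polls end on the same day.

```r
library(dplyr)

# Hypothetical toy data (not the real polls_2016): one row per poll,
# and several polls can share the same end_date.
polls <- tibble(
  end_date = as.Date("2016-11-07") - c(0, 0, 1, 2, 2, 2, 3),
  pct      = c(45, 46, 44, 47, 45, 46, 44)
)

# "Last 3 entries": sort newest first and keep the first 3 rows.
last_3_rows <- polls %>%
  arrange(desc(end_date)) %>%
  head(3)

# "Last 3 days": keep every poll that ended within the 3 most recent calendar days.
cutoff <- max(polls$end_date) - 2
last_3_days <- polls %>%
  filter(end_date >= cutoff)

n_distinct(last_3_rows$end_date)  # 2 distinct days, although 3 rows were kept
nrow(last_3_days)                 # 6 rows, covering 3 distinct days
```

With duplicate end dates, the row-based window covers fewer days than intended, which is the discrepancy the question above is asking about.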