I have a 12-million-row data.frame in R with 6 columns; 5 of the columns have lots of repeating strings. The exported CSV weighs ~3 GB, but object.size shows that it uses only 0.6 GB, or exactly 8 bytes for each cell. Does R use some in-memory columnar storage? #rstats
Replying to @mpiktas
Character strings are internally hashed into a global cache, so each distinct value is stored only once and repeated values cost just a pointer; that was an important change many (many!!) moons ago. We can all bow in the general direction of Iowa and thank @LukeTierney4 for this (and so many other internal improvements). #rstats

Jan 19, 2023 · 9:03 PM UTC
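The numbers in the question line up with this: 12 million rows × 6 columns = 72 million cells, and 72 million × 8 bytes = 576 MB ≈ 0.6 GB. This assumes 64-bit pointers for the string columns and that the sixth column holds doubles (also 8 bytes each); the original tweet does not say what that column is. The toy Python model below is only an illustration of the general idea (a global string cache, i.e. string interning). It is *not* R's actual C implementation, and `StringCache` and `make_column` are hypothetical names. Each column cell holds a fixed-size handle, and each distinct string is stored once, no matter how often it repeats.

```python
# Toy model of a global string cache (string interning).
# Illustration only: this is NOT how R implements it internally.


class StringCache:
    """Stores each distinct string once; cells hold integer handles."""

    def __init__(self) -> None:
        self._index: dict[str, int] = {}  # string -> handle (hash lookup)
        self._strings: list[str] = []     # handle -> string

    def intern(self, s: str) -> int:
        # Reuse the existing handle if this string was already seen,
        # otherwise store the string once and hand out a new handle.
        handle = self._index.get(s)
        if handle is None:
            handle = len(self._strings)
            self._strings.append(s)
            self._index[s] = handle
        return handle

    def lookup(self, handle: int) -> str:
        return self._strings[handle]

    def __len__(self) -> int:
        return len(self._strings)


def make_column(values, cache, handle_bytes=8):
    """Build a 'character column' of handles and estimate its size,
    assuming each cell is one fixed-size reference (8 bytes, like a
    64-bit pointer), independent of the string's length."""
    handles = [cache.intern(v) for v in values]
    return handles, len(handles) * handle_bytes


cache = StringCache()
rows = ["apple", "banana", "apple", "cherry", "banana", "apple"] * 2_000
column, cell_bytes = make_column(rows, cache)

print(len(column))  # 12000 cells
print(len(cache))   # 3 distinct strings actually stored
print(cell_bytes)   # 96000 bytes of references (8 per cell)
```

The cell cost stays at 8 bytes whether a string is 5 characters or 500, which is why the CSV (which writes every repeated string out in full) can be several times larger than the in-memory object.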

Actually, string hashing was added by @sfalcon, many moons ago as you say.