I have a 12-million-row data.frame in R with 6 columns. Five of the columns contain lots of repeated strings. The exported CSV weighs ~3 GB, but object.size shows that it uses only 0.6 GB in memory, or almost exactly 8 bytes per cell. Does R use some in-memory columnar storage? #rstats
Character strings are internally hashed into a global cache, so each distinct string is stored only once and every repeat costs just a pointer to it; that is where the savings on repeated values come from. It was an important change many (many!!) moons ago. We can all bow in the general direction of Iowa and thank @LukeTierney4 for this (and so many other internal improvements). #rstats
Jan 19, 2023 · 9:03 PM UTC
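A quick way to see the effect described in the reply is a minimal sketch in base R, assuming a 64-bit build (where a pointer is 8 bytes). The vector `x` and the string in it are made up for illustration; the text size is only a rough stand-in for what a CSV column would take.

```r
# One million copies of the same 19-character string (illustrative data).
n <- 1e6
x <- rep("a_repeated_category", n)

# In-memory size as reported by object.size, and the cost per element.
total_bytes <- as.numeric(object.size(x))
bytes_per_element <- total_bytes / n

# Rough size of the same column written out as text:
# the bytes of every string plus one newline per row.
text_bytes <- sum(nchar(x, type = "bytes")) + n

print(bytes_per_element)         # should come out close to 8 on a 64-bit build
print(text_bytes / total_bytes)  # how much larger the text form is
```

Each element of the character vector holds only a pointer to the single cached string, which is why the per-cell cost lands near 8 bytes no matter how long the repeated string is.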


