From these questions ("Random sample of rows from subset of an R dataframe" and "Random rows in dataframe in R"), I can easily see how to randomly sample n rows from a data frame, or n rows that come from a specific level of a factor within a data frame.
Here are some sample data:
df <- data.frame(matrix(rnorm(80), nrow = 40))
df$color <- rep(c("blue", "red", "yellow", "pink"), each = 10)
df[sample(nrow(df), 3), ]  # samples 3 random rows from df, without replacement
For example, to sample just 3 random rows from the 'pink' color using library(kimisc):
library(kimisc)
sample.rows(subset(df, color == "pink"), 3)
or by writing a custom function:
sample.df <- function(df, n) df[sample(nrow(df), n), , drop = FALSE]
sample.df(subset(df, color == "pink"), 3)
However, what I am trying to do is create a new data frame that contains 3 (or n) random rows from each level of the factor, i.e. the new data frame would have 12 rows (3 from blue, 3 from red, 3 from yellow, 3 from pink). It is obviously possible to run the code above once per color, create a new data frame for each, and then bind them together. However, I am looking for a simpler solution that still works when there are many, many levels to do this across.
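To make the brute-force version concrete, this is roughly what I mean by running it once per color and binding the results. It is only a sketch: the names blue.df, red.df, yellow.df, pink.df and newdf are placeholders I made up for illustration, and set.seed() is added purely so the example is reproducible.

```r
# Sample data and helper as defined earlier in the question
set.seed(1)  # assumption: added only for reproducibility
df <- data.frame(matrix(rnorm(80), nrow = 40))
df$color <- rep(c("blue", "red", "yellow", "pink"), each = 10)
sample.df <- function(df, n) df[sample(nrow(df), n), , drop = FALSE]

# One sampled data frame per color, written out by hand
blue.df   <- sample.df(subset(df, color == "blue"), 3)
red.df    <- sample.df(subset(df, color == "red"), 3)
yellow.df <- sample.df(subset(df, color == "yellow"), 3)
pink.df   <- sample.df(subset(df, color == "pink"), 3)

# Bind the pieces into the desired 12-row result
newdf <- rbind(blue.df, red.df, yellow.df, pink.df)
table(newdf$color)  # count of sampled rows per color
```

This gives the right shape, but every extra level needs another hand-written line, which is exactly what I would like to avoid.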