group_by() takes existing data and groups it by specific variables so that later operations are performed per group.
Example: Grouping by age and sex (male/female) is useful if we want to compare the scores of women of a certain age with those of men of the same age (or to compare ages between men and women).
Let's create a sample dataset to reflect this example (to avoid input errors, copy and paste this into your script):
library(dplyr)  # provides tibble(), group_by(), summarise(), mutate() and %>%

## Creation of an identification number to represent 50 individual persons
ID <- c(1:50)

## Creation of the Sex variable (25 men/25 women)
Sex <- rep(c("male", "female"), 25)  # rep means replicate

## Creating the Age variable (20-39 years old)
Age <- c(26, 25, 39, 37, 31, 34, 34, 30, 26, 33,
         39, 28, 26, 29, 33, 22, 35, 23, 26, 36,
         21, 20, 31, 21, 35, 39, 36, 22, 22, 25,
         27, 30, 26, 34, 38, 39, 30, 29, 26, 25,
         26, 36, 23, 21, 21, 39, 26, 26, 27, 21)

## Creating a dependent variable named Score
Score <- c(0.010, 0.418, 0.014, 0.090, 0.061, 0.328, 0.656, 0.002, 0.639, 0.173,
           0.076, 0.152, 0.467, 0.186, 0.520, 0.493, 0.388, 0.501, 0.800, 0.482,
           0.384, 0.046, 0.920, 0.865, 0.625, 0.035, 0.501, 0.851, 0.285, 0.752,
           0.686, 0.339, 0.710, 0.665, 0.214, 0.560, 0.287, 0.665, 0.630, 0.567,
           0.812, 0.637, 0.772, 0.905, 0.405, 0.363, 0.773, 0.410, 0.535, 0.449)

## Create a unified dataset that brings together all variables
data <- tibble(ID, Sex, Age, Score)
Let's say I want to calculate and compare the mean Score (and other measures) for males and females separately:
data %>%
  group_by(Sex) %>%
  summarise(m = mean(Score),  # calculate the mean
            s = sd(Score),    # calculate the standard deviation
            n = n()) %>%      # calculate the total number of observations
  ungroup()
## # A tibble: 2 x 4
##   Sex        m     s     n
##   <chr>  <dbl> <dbl> <int>
## 1 female 0.437 0.268    25
## 2 male   0.487 0.268    25
In the above code, we group by Sex, which means that the calculations made with our data treat men and women separately. After running the code, the console displays the mean Score (m), the standard deviation (s, from sd()) and the total number of participants (n, from n()) for women and for men (group_by(Sex)). That is, the mean Score is 0.437 for women and 0.487 for men.
Let's group by Age as well (the order in which the variables appear within group_by() does not affect the calculated values):
data %>%
  group_by(Sex, Age) %>%  # grouped by sex and age
  summarise(m = mean(Score),
            s = sd(Score),
            n = n()) %>%
  ungroup()
## # A tibble: 27 x 5
##    Sex      Age     m      s     n
##    <chr>  <dbl> <dbl>  <dbl> <int>
##  1 female    20 0.046 NA         1
##  2 female    21 0.740  0.253     3
##  3 female    22 0.672  0.253     2
##  4 female    23 0.501 NA         1
##  5 female    25 0.579  0.167     3
##  6 female    26 0.41  NA         1
##  7 female    28 0.152 NA         1
##  8 female    29 0.426  0.339     2
##  9 female    30 0.170  0.238     2
## 10 female    33 0.173 NA         1
## # ... with 17 more rows
There are now considerably more rows (27) in this output. When performing calculations, R now considers every combination of Sex and Age. For example, the average 25-year-old woman scored 0.579.
We also see that some standard deviation values are missing (NA). This is because calculating the standard deviation requires more than one observation, and those groups contain a single participant.
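The missing standard deviations can be reproduced directly in base R. The sketch below takes two of the female groups from the output above (ages 20 and 22) and applies sd() to each; the variable names are illustrative only.

```r
# Scores of two female groups taken from the sample data
one_obs <- c(0.046)          # female, age 20: a single observation
two_obs <- c(0.493, 0.851)   # female, age 22: two observations

sd_one <- sd(one_obs)  # undefined: the n - 1 denominator is zero
sd_two <- sd(two_obs)  # defined: there is more than one observation

print(sd_one)            # NA
print(round(sd_two, 3))
```

A group needs at least two observations before sd() can return a number.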
We could also use group_by() to add a new column based on the group.
data %>%
  group_by(Sex) %>%
  mutate(m = mean(Score)) %>%  # calculate the mean score by sex
  ungroup()
## # A tibble: 50 x 5
##       ID Sex      Age Score     m
##    <int> <chr>  <dbl> <dbl> <dbl>
##  1     1 male      26 0.01  0.487
##  2     2 female    25 0.418 0.437
##  3     3 male      39 0.014 0.487
##  4     4 female    37 0.09  0.437
##  5     5 male      31 0.061 0.487
##  6     6 female    34 0.328 0.437
##  7     7 male      34 0.656 0.487
##  8     8 female    30 0.002 0.437
##  9     9 male      26 0.639 0.487
## 10    10 female    33 0.173 0.437
## # ... with 40 more rows
Instead of collapsing all the rows into a summary value, mutate() adds a new column (m) containing the mean score for males and for females. The mean score in column m corresponds to the value in the Sex column (0.487 for men and 0.437 for women).
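To make the contrast between the two verbs concrete, here is a minimal sketch on a hypothetical four-row dataset (toy is an assumption for illustration, not the 50-person dataset above). It shows that summarise() returns one row per group, while mutate() keeps every original row.

```r
library(dplyr)

# Hypothetical toy data, used only for illustration
toy <- tibble(
  Sex   = c("male", "female", "male", "female"),
  Score = c(0.2, 0.4, 0.6, 0.8)
)

# summarise(): collapses each group into a single row
collapsed <- toy %>%
  group_by(Sex) %>%
  summarise(m = mean(Score)) %>%
  ungroup()

# mutate(): keeps all rows and repeats the group mean in column m
expanded <- toy %>%
  group_by(Sex) %>%
  mutate(m = mean(Score)) %>%
  ungroup()

nrow(collapsed)  # 2 rows, one per Sex
nrow(expanded)   # 4 rows, same as toy
```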
ungroup() should always be used after group_by(), once the calculations are done. If you forget to ungroup() your data, later data manipulation is likely to produce errors or unexpected results. Always ungroup() when you finish your calculations.
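If you are unsure whether a tibble is still grouped, dplyr can tell you. The sketch below uses is_grouped_df() and group_vars() on a hypothetical toy dataset (the object names toy, grouped and cleaned are assumptions for illustration).

```r
library(dplyr)

# Hypothetical toy data, used only for illustration
toy <- tibble(
  Sex = c("male", "female", "male", "female"),
  Age = c(20, 30, 40, 50)
)

grouped <- toy %>%
  group_by(Sex) %>%
  mutate(m = mean(Age))   # no ungroup(): the grouping is kept

cleaned <- grouped %>%
  ungroup()               # grouping removed

is_grouped_df(grouped)  # TRUE
group_vars(grouped)     # "Sex"
is_grouped_df(cleaned)  # FALSE
group_vars(cleaned)     # character(0)
```

Printing a grouped tibble also shows a "Groups:" line in its header, which is a quick visual check.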
Let's see an example of when it's important to ungroup:
## Example 1
data %>%
  group_by(Sex) %>%
  mutate(m = mean(Age)) %>%    # calculate the mean age of men and women
  mutate(x = mean(Score)) %>%  # calculate the mean score of men and women
  ungroup()                    # closing ungroup()
Compare this to code that includes ungroup() nested between the two mutate() functions:
## Example 2
data %>%
  group_by(Sex) %>%
  mutate(m = mean(Age)) %>%  # calculate the mean age of men and women
  ungroup() %>%              # nested ungroup()
  mutate(x = mean(Score))    # calculate the mean score of all participants
In the first example, m, which is the mean Age, is 29.2 if the participant is a man and 28.96 if she is a woman. x, which is the mean Score, is 0.487 for men and 0.437 for women. For both calculations, the data is grouped by Sex.
In the second example, m is still the mean Age for males separately from females, as in the first example. However, x equals a Score of 0.462 for every row/observation. This is because group_by(Sex) is removed via ungroup() after the first mutate() function. Here, x is the mean Score across all participants, regardless of Sex.
No method is right or wrong; it depends on what you are trying to achieve. When deciding where to place the ungroup() function, ask yourself: does it make sense to compute different values for each level of this Variable? If so, the group_by(Variable) function must be written before the calculation function (mutate/summarise).
Whenever you use group_by(), you must have a matching ungroup() somewhere. Even if you don't plan on doing additional calculations, it's a good habit to keep. Making sure that you ungroup() is especially important when creating objects!
## Creating/saving the object named "data1"
data1 <- data %>%
  group_by(Sex) %>%
  mutate(m = mean(Age))

## Using the data1 object after previously saving it (WITHOUT ungrouping)
data1 %>%
  mutate(x = mean(Score))
Every time you use data1, the saved object automatically carries group_by(Sex) as part of its definition, and further calculations will take this grouping variable into account.
Without ungroup(), it can get even more complex!
## Creating/saving the object named "data1"
data1 <- data %>%
  group_by(Sex) %>%
  mutate(m = mean(Age))

## Using the data1 object after previously saving it (WITHOUT ungrouping)
data1 %>%
  group_by(Age) %>%
  mutate(x = mean(Score)) %>%
  ungroup()
Now the second piece of code is grouped by Age. That is, x is the mean Score for each value of Age (by default, a new group_by() replaces the grouping by Sex that was stored in data1). Even if that's what you wanted, it's better to state the grouping explicitly each time. As your scripts get larger, it can be difficult to remember details about object definitions, especially whether an object carries a grouping variable. Keep your objects as simple as possible!
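The grouping that a saved object carries can be inspected with group_vars(). The sketch below uses a hypothetical toy dataset (toy, saved, regrouped and added are illustrative names) to show what a second group_by() does to an already grouped object. It assumes dplyr 1.0.0 or later, where the .add argument of group_by() is available.

```r
library(dplyr)

# Hypothetical toy data, used only for illustration
toy <- tibble(
  Sex   = c("male", "female", "male", "female"),
  Age   = c(20, 20, 30, 30),
  Score = c(0.1, 0.2, 0.3, 0.4)
)

# Saved WITHOUT ungroup(): the object stays grouped by Sex
saved <- toy %>%
  group_by(Sex) %>%
  mutate(m = mean(Age))

regrouped <- saved %>% group_by(Age)               # default: replaces Sex
added     <- saved %>% group_by(Age, .add = TRUE)  # keeps Sex, adds Age

group_vars(saved)      # "Sex"
group_vars(regrouped)  # "Age"
group_vars(added)      # "Sex" "Age"
```

Because the result depends on hidden state in the saved object, ungrouping at the end of every definition and regrouping explicitly is the easier pattern to reason about.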
This is the proper method for saving and using the data1 object:
data1 <- data %>%
  group_by(Sex) %>%
  mutate(m = mean(Age)) %>%
  ungroup()  # ungroup at the end of a definition!

data1 %>%
  group_by(Sex, Age) %>%  # group by the relevant variables here
  mutate(x = mean(Score)) %>%
  ungroup()
To conclude: ALWAYS UNGROUP AFTER GROUPING.