I am computing weighted means of subgroups using the groupby and transform approach. See below for an illustration.
My understanding is that the new column is named sourcevar1_sourcevar2_function because the weighted mean function does not return a single value or vector (explained here).
I need weighted means of several columns, so I am wondering: is there a way to set the column name within the transform call, or does this have to be done in a separate step?
Thanks for helping with this!
using DataFrames, Statistics, StatsBase # Statistics provides mean; StatsBase provides weights
df = DataFrame(Region = ["state1", "state1", "state1", "state2", "state2", "state2"], Income = [10, 7, 12, 10, 7, 12], Weight = [51, 20, 86, 75, 125, 16])
gdf = groupby(df, :Region)
df_reg_mean_unweighted = transform(gdf, :Income => mean => :Region_mean_income_unweighted) # Income unweighted
│ Row │ Region │ Income │ Weight │ Region_mean_income_unweighted │
│     │ String │ Int64  │ Int64  │ Float64                       │
├─────┼────────┼────────┼────────┼───────────────────────────────┤
│ 1   │ state1 │ 10     │ 51     │ 9.66667                       │
│ 2   │ state1 │ 7      │ 20     │ 9.66667                       │
│ 3   │ state1 │ 12     │ 86     │ 9.66667                       │
│ 4   │ state2 │ 10     │ 75     │ 9.66667                       │
│ 5   │ state2 │ 7      │ 125    │ 9.66667                       │
│ 6   │ state2 │ 12     │ 16     │ 9.66667                       │
df_reg_mean_weighted = transform(gdf, [:Income, :Weight] => (x, y) -> (mean(x, weights(y)))) # Income weighted
│ Row │ Region │ Income │ Weight │ Income_Weight_function │
│     │ String │ Int64  │ Int64  │ Float64                │
├─────┼────────┼────────┼────────┼────────────────────────┤
│ 1   │ state1 │ 10     │ 51     │ 10.7134                │
│ 2   │ state1 │ 7      │ 20     │ 10.7134                │
│ 3   │ state1 │ 12     │ 86     │ 10.7134                │
│ 4   │ state2 │ 10     │ 75     │ 8.41204                │
│ 5   │ state2 │ 7      │ 125    │ 8.41204                │
│ 6   │ state2 │ 12     │ 16     │ 8.41204                │
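For comparison, here is a minimal sketch of how the name might be set inside the transform call itself, assuming the source => function => target form used in the unweighted example also accepts an anonymous function once it is wrapped in parentheses. The helper wmean, the list value_cols, and the target names Region_mean_income_weighted and Income_wmean are hypothetical names chosen for illustration.

```julia
using DataFrames, Statistics, StatsBase

df = DataFrame(Region = ["state1", "state1", "state1", "state2", "state2", "state2"],
               Income = [10, 7, 12, 10, 7, 12],
               Weight = [51, 20, 86, 75, 125, 16])
gdf = groupby(df, :Region)

# The anonymous function is wrapped in parentheses so that the trailing
# `=> :Region_mean_income_weighted` is read as the target column name
# rather than as part of the function body.
df_named = transform(gdf,
    [:Income, :Weight] => ((x, w) -> mean(x, weights(w))) => :Region_mean_income_weighted)

# Hypothetical helper for reuse across several value columns.
wmean(x, w) = mean(x, weights(w))

# Hypothetical list of columns to average; only :Income exists in this example.
value_cols = [:Income]

# One source => function => target pair per column, splatted into transform.
df_multi = transform(gdf,
    [[c, :Weight] => wmean => Symbol(c, "_wmean") for c in value_cols]...)
```

If this works as assumed, extending value_cols with further column names should produce one weighted-mean column per entry without a separate renaming step.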