panda DF baru dari groupby
df = pd.DataFrame(old_df.groupby(['groupby_attribute'])['mean_attribute'].mean())
df = df.reset_index()
df
Cheerful Cheetah
df = pd.DataFrame(old_df.groupby(['groupby_attribute'])['mean_attribute'].mean())
df = df.reset_index()
df
data.groupby('amount', as_index=False).agg({"duration": "sum"})
When you use as_index=False , you indicate to groupby() that you don't want to set the column ID as the index (duh!). ... Using as_index=True allows you to apply a sum over axis=1 without specifying the names of the columns, then summing the value over axis 0.
pd.DataFrame( {'a':['A','A','B','B','B','C'], 'b':[1,2,5,5,4,6]})
df.groupby('a')['b'].apply(list)
Out:
a
A [1, 2]
B [5, 5, 4]
C [6]
Name: b, dtype: object
# Groups the DataFrame using the specified columns
df.groupBy().avg().collect()
# [Row(avg(age)=3.5)]
sorted(df.groupBy('name').agg({'age': 'mean'}).collect())
# [Row(name='Alice', avg(age)=2.0), Row(name='Bob', avg(age)=5.0)]
sorted(df.groupBy(df.name).avg().collect())
# [Row(name='Alice', avg(age)=2.0), Row(name='Bob', avg(age)=5.0)]
sorted(df.groupBy(['name', df.age]).count().collect())
# [Row(name='Alice', age=2, count=1), Row(name='Bob', age=5, count=1)]
>>> emp.groupby(['dept', 'gender']).agg({'salary':'mean'}).round(-3)