In [2]:
# Import pandas and Create a Sample DataFrame
import pandas as pd
data = {
'Region': ['East', 'East', 'West', 'West', 'South', 'South', 'East'],
'Salesperson': ['Alice', 'Bob', 'Alice', 'Bob', 'Alice', 'Bob', 'Alice'],
'Month': ['Jan', 'Jan', 'Feb', 'Feb', 'Jan', 'Feb', 'Feb'],
'Sales': [200, 120, 340, 150, 300, 250, 180]
}
df = pd.DataFrame(data)
print(df)
Region Salesperson Month Sales 0 East Alice Jan 200 1 East Bob Jan 120 2 West Alice Feb 340 3 West Bob Feb 150 4 South Alice Jan 300 5 South Bob Feb 250 6 East Alice Feb 180
Basic Pivot Table¶
In [3]:
# Total Sales by Salesperson and Month
pivot = pd.pivot_table(df, values='Sales', index='Salesperson', columns='Month', aggfunc='sum')
print(pivot)
Month Feb Jan Salesperson Alice 520 500 Bob 400 120
Multi-Level Pivot Table¶
In [4]:
pivot_multi = pd.pivot_table(
df,
values='Sales',
index=['Region', 'Salesperson'],
columns='Month',
aggfunc='sum',
fill_value=0
)
print(pivot_multi)
Month Feb Jan
Region Salesperson
East Alice 180 200
Bob 0 120
South Alice 0 300
Bob 250 0
West Alice 340 0
Bob 150 0
Reset index to flatten¶
In [7]:
flat = pivot_multi.reset_index()
print(flat)
Month Region Salesperson Feb Jan 0 East Alice 180 200 1 East Bob 0 120 2 South Alice 0 300 3 South Bob 250 0 4 West Alice 340 0 5 West Bob 150 0
In [17]:
flat.columns.name = None
print(flat)
Region Salesperson Feb Jan 0 East Alice 180 200 1 East Bob 0 120 2 South Alice 0 300 3 South Bob 250 0 4 West Alice 340 0 5 West Bob 150 0
Understanding Multi-Index¶
In [18]:
# All data for Region 'East'
print(pivot_multi.loc['East'])
print("\n\n")
# Specific person in a region
print(pivot_multi.loc[('East', 'Alice')])
Month Feb Jan Salesperson Alice 180 200 Bob 0 120 Month Feb 180 Jan 200 Name: (East, Alice), dtype: int64
Multiple Aggregations¶
In [24]:
multi_agg = pd.pivot_table(
df,
values='Sales',
index='Region',
columns='Month',
aggfunc=['sum', 'mean']
)
print(multi_agg)
sum mean Month Feb Jan Feb Jan Region East 180.0 320.0 180.0 160.0 South 250.0 300.0 250.0 300.0 West 490.0 NaN 245.0 NaN
Pivot Table vs GroupBy in pandas¶
Overview¶
Both pivot_table() and groupby() are used to analyze and summarize data in pandas. However, they differ in structure, flexibility, and typical use cases.
Comparison Table¶
| Feature | groupby() |
pivot_table() |
|---|---|---|
| Function | Groups data and applies aggregation | Summarizes data like Excel Pivot Tables |
| Shape of Output | Returns Series or DataFrame with grouped rows | Returns matrix-like table with optional columns |
| Aggregation | Use methods like .sum(), .mean() manually |
Use aggfunc argument directly |
| Indexing | Works on rows only | Works on both rows and columns |
| Multi-Aggregation | Use .agg() for multiple functions |
Use aggfunc with multiple functions |
| Missing Value Handling | No fill options by default | Has fill_value to replace NaNs |
Example: groupby()¶
import pandas as pd
df.groupby(['Region', 'Salesperson'])['Sales'].sum()
Returns a Series with a MultiIndex based on Region and Salesperson.
Example: pivot_table()¶
pd.pivot_table(
df,
values='Sales',
index='Region',
columns='Salesperson',
aggfunc='sum',
fill_value=0
)
Returns a 2D table with Regions as rows and Salespersons as columns.
Use Case Guidance¶
| Scenario | Use groupby() |
Use pivot_table() |
|---|---|---|
| Row-level grouping | Yes | No |
| Cross-tabulation or 2D summary tables | No | Yes |
| Full control with custom logic | Yes | No |
| Excel-style summaries | No | Yes |
Tips¶
- Use
groupby()for flexible and programmable aggregation logic. - Use
pivot_table()for clean, spreadsheet-style summaries. - Both share similar aggregation backends, but suit different use cases.