In [2]:
# Import pandas and Create a Sample DataFrame

import pandas as pd

data = {
    'Region': ['East', 'East', 'West', 'West', 'South', 'South', 'East'],
    'Salesperson': ['Alice', 'Bob', 'Alice', 'Bob', 'Alice', 'Bob', 'Alice'],
    'Month': ['Jan', 'Jan', 'Feb', 'Feb', 'Jan', 'Feb', 'Feb'],
    'Sales': [200, 120, 340, 150, 300, 250, 180]
}

df = pd.DataFrame(data)
print(df)
  Region Salesperson Month  Sales
0   East       Alice   Jan    200
1   East         Bob   Jan    120
2   West       Alice   Feb    340
3   West         Bob   Feb    150
4  South       Alice   Jan    300
5  South         Bob   Feb    250
6   East       Alice   Feb    180

Basic Pivot Table¶

In [3]:
# Total Sales by Salesperson and Month

pivot = pd.pivot_table(df, values='Sales', index='Salesperson', columns='Month', aggfunc='sum')
print(pivot)
Month        Feb  Jan
Salesperson          
Alice        520  500
Bob          400  120

Multi-Level Pivot Table¶

In [4]:
pivot_multi = pd.pivot_table(
    df, 
    values='Sales', 
    index=['Region', 'Salesperson'], 
    columns='Month', 
    aggfunc='sum', 
    fill_value=0
)
print(pivot_multi)
Month               Feb  Jan
Region Salesperson          
East   Alice        180  200
       Bob            0  120
South  Alice          0  300
       Bob          250    0
West   Alice        340    0
       Bob          150    0

Reset index to flatten¶

In [7]:
flat = pivot_multi.reset_index()
print(flat)
Month Region Salesperson  Feb  Jan
0       East       Alice  180  200
1       East         Bob    0  120
2      South       Alice    0  300
3      South         Bob  250    0
4       West       Alice  340    0
5       West         Bob  150    0
In [17]:
flat.columns.name = None
print(flat)
  Region Salesperson  Feb  Jan
0   East       Alice  180  200
1   East         Bob    0  120
2  South       Alice    0  300
3  South         Bob  250    0
4   West       Alice  340    0
5   West         Bob  150    0

Understanding Multi-Index¶

In [18]:
# All data for Region 'East'
print(pivot_multi.loc['East'])

print("\n\n")
# Specific person in a region
print(pivot_multi.loc[('East', 'Alice')])
Month        Feb  Jan
Salesperson          
Alice        180  200
Bob            0  120



Month
Feb    180
Jan    200
Name: (East, Alice), dtype: int64

Multiple Aggregations¶

In [24]:
multi_agg = pd.pivot_table(
    df,
    values='Sales',
    index='Region',
    columns='Month', 
    aggfunc=['sum', 'mean']
)
print(multi_agg)
          sum          mean       
Month     Feb    Jan    Feb    Jan
Region                            
East    180.0  320.0  180.0  160.0
South   250.0  300.0  250.0  300.0
West    490.0    NaN  245.0    NaN

Pivot Table vs GroupBy in pandas¶

Overview¶

Both pivot_table() and groupby() are used to analyze and summarize data in pandas. However, they differ in structure, flexibility, and typical use cases.

Comparison Table¶

Feature groupby() pivot_table()
Function Groups data and applies aggregation Summarizes data like Excel Pivot Tables
Shape of Output Returns Series or DataFrame with grouped rows Returns matrix-like table with optional columns
Aggregation Use methods like .sum(), .mean() manually Use aggfunc argument directly
Indexing Works on rows only Works on both rows and columns
Multi-Aggregation Use .agg() for multiple functions Use aggfunc with multiple functions
Missing Value Handling No fill options by default Has fill_value to replace NaNs

Example: groupby()¶

import pandas as pd

df.groupby(['Region', 'Salesperson'])['Sales'].sum()

Returns a Series with a MultiIndex based on Region and Salesperson.

Example: pivot_table()¶

pd.pivot_table(
    df, 
    values='Sales', 
    index='Region', 
    columns='Salesperson', 
    aggfunc='sum', 
    fill_value=0
)

Returns a 2D table with Regions as rows and Salespersons as columns.

Use Case Guidance¶

Scenario Use groupby() Use pivot_table()
Row-level grouping Yes No
Cross-tabulation or 2D summary tables No Yes
Full control with custom logic Yes No
Excel-style summaries No Yes

Tips¶

  • Use groupby() for flexible and programmable aggregation logic.
  • Use pivot_table() for clean, spreadsheet-style summaries.
  • Both share similar aggregation backends, but suit different use cases.