In Pandas, reshaping data involves changing the structure of a DataFrame without altering the data itself. Two common methods for reshaping are melt()
and pivot()
. They are complementary functions: while melt()
transforms wide data into long format, pivot()
does the reverse, transforming long data back into wide format.
1. melt()
Function
The melt()
function is used to convert wide-format data into long-format data. Long format is useful when you have multiple columns that you want to stack into one, typically when you’re preparing data for analysis or visualization.
Syntax of melt()
:
pd.melt(frame, id_vars=None, value_vars=None, var_name=None, value_name='value')
frame
: The DataFrame to melt.id_vars
: Column(s) to keep as identifier variables (they stay in their original form).value_vars
: Column(s) to melt (convert from wide to long).var_name
: Name of the new “variable” column (optional).value_name
: Name of the new “value” column (default is ‘value’).
Example:
Suppose you have the following DataFrame representing sales data across different years:
import pandas as pd
df = pd.DataFrame({
'Product': ['A', 'B', 'C'],
'2019': [200, 150, 100],
'2020': [220, 180, 130],
'2021': [250, 190, 160]
})
print(df)
Output:
Product 2019 2020 2021
0 A 200 220 250
1 B 150 180 190
2 C 100 130 160
You can use melt()
to convert this wide-format data into long-format:
df_melted = pd.melt(df, id_vars='Product', var_name='Year', value_name='Sales')
print(df_melted)
Output:
Product Year Sales
0 A 2019 200
1 B 2019 150
2 C 2019 100
3 A 2020 220
4 B 2020 180
5 C 2020 130
6 A 2021 250
7 B 2021 190
8 C 2021 160
Here, the data has been “melted” into long format, with Product
as the identifier variable (id_vars
), and the year columns (2019
, 2020
, 2021
) converted into rows under the Year
column. The sales values are now stored in the Sales
column.
2. pivot()
Function
The pivot()
function is the opposite of melt()
—it converts long-format data back into wide-format data, where one column’s values become the new columns.
Syntax of pivot()
:
DataFrame.pivot(index=None, columns=None, values=None)
index
: Column to use as the new index (the rows).
columns
: Column whose unique values will become the new columns.
values
: Column to populate the new table with data.
Example:
Let’s take the previously melted DataFrame df_melted
and pivot it back to the original wide format:
df_pivoted = df_melted.pivot(index='Product', columns='Year', values='Sales')
print(df_pivoted)
Output:
Year 2019 2020 2021
Product
A 200 220 250
B 150 180 190
C 100 130 160