15 de Enero del 2023 | Jhonatan Montilla
En esta publicación veremos el uso de las tablas pivote en Pandas para el manejo y análisis de estructuras de datos. En este artículo encontrarás la guía más completa para la manipulación de datos con esta práctica y muy útil herramienta. Para obtener el conjunto de datos "hotel_reservations.csv" de ejemplo puedes descargarlo desde nuestro repositorio haciendo clic aquí.
import pandas as pd
data = pd.read_csv('hotel_reservations.csv')
data.head()
Booking_ID | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | INN00001 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 224 | 2017 | 10 | 2 | Offline | 0 | 0 | 0 | 65.00 | 0 | Not_Canceled |
1 | INN00002 | 2 | 0 | 2 | 3 | Not Selected | 0 | Room_Type 1 | 5 | 2018 | 11 | 6 | Online | 0 | 0 | 0 | 106.68 | 1 | Not_Canceled |
2 | INN00003 | 1 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 2 | 28 | Online | 0 | 0 | 0 | 60.00 | 0 | Canceled |
3 | INN00004 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 211 | 2018 | 5 | 20 | Online | 0 | 0 | 0 | 100.00 | 0 | Canceled |
4 | INN00005 | 2 | 0 | 1 | 1 | Not Selected | 0 | Room_Type 1 | 48 | 2018 | 4 | 11 | Online | 0 | 0 | 0 | 94.50 | 0 | Canceled |
Convierte los datos de una variable en valores de índice sumando los datos del resto de variables
round(pd.pivot_table(data, index='market_segment_type'),2)
arrival_date | arrival_month | arrival_year | avg_price_per_room | lead_time | no_of_adults | no_of_children | no_of_previous_bookings_not_canceled | no_of_previous_cancellations | no_of_special_requests | no_of_week_nights | no_of_weekend_nights | repeated_guest | required_car_parking_space | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
market_segment_type | ||||||||||||||
Aviation | 15.36 | 7.12 | 2018.00 | 100.70 | 5.49 | 1.02 | 0.00 | 0.21 | 0.04 | 0.00 | 2.86 | 1.16 | 0.13 | 0.05 |
Complementary | 15.02 | 7.72 | 2017.64 | 3.14 | 12.04 | 1.48 | 0.13 | 2.48 | 0.21 | 0.88 | 1.24 | 0.33 | 0.32 | 0.08 |
Corporate | 15.70 | 7.10 | 2017.75 | 82.91 | 21.82 | 1.23 | 0.01 | 2.07 | 0.17 | 0.22 | 1.49 | 0.43 | 0.30 | 0.09 |
Offline | 15.40 | 7.57 | 2017.72 | 91.63 | 122.87 | 1.78 | 0.02 | 0.01 | 0.01 | 0.20 | 2.18 | 0.73 | 0.01 | 0.00 |
Online | 15.69 | 7.38 | 2017.87 | 112.26 | 75.33 | 1.94 | 0.15 | 0.01 | 0.01 | 0.84 | 2.29 | 0.89 | 0.00 | 0.04 |
Convierte los datos de una variable en columnas y las demás variables en valores de índice sumando sus datos en las nuevas columnas
round(pd.pivot_table(data, columns='market_segment_type'),2)
market_segment_type | Aviation | Complementary | Corporate | Offline | Online |
---|---|---|---|---|---|
arrival_date | 15.36 | 15.02 | 15.70 | 15.40 | 15.69 |
arrival_month | 7.12 | 7.72 | 7.10 | 7.57 | 7.38 |
arrival_year | 2018.00 | 2017.64 | 2017.75 | 2017.72 | 2017.87 |
avg_price_per_room | 100.70 | 3.14 | 82.91 | 91.63 | 112.26 |
lead_time | 5.49 | 12.04 | 21.82 | 122.87 | 75.33 |
no_of_adults | 1.02 | 1.48 | 1.23 | 1.78 | 1.94 |
no_of_children | 0.00 | 0.13 | 0.01 | 0.02 | 0.15 |
no_of_previous_bookings_not_canceled | 0.21 | 2.48 | 2.07 | 0.01 | 0.01 |
no_of_previous_cancellations | 0.04 | 0.21 | 0.17 | 0.01 | 0.01 |
no_of_special_requests | 0.00 | 0.88 | 0.22 | 0.20 | 0.84 |
no_of_week_nights | 2.86 | 1.24 | 1.49 | 2.18 | 2.29 |
no_of_weekend_nights | 1.16 | 0.33 | 0.43 | 0.73 | 0.89 |
repeated_guest | 0.13 | 0.32 | 0.30 | 0.01 | 0.00 |
required_car_parking_space | 0.05 | 0.08 | 0.09 | 0.00 | 0.04 |
Suma o totaliza los datos de una variable con índice de datos de otra variable
round(pd.pivot_table(data,
index='type_of_meal_plan',
values = 'avg_price_per_room').head(),2)
avg_price_per_room | |
---|---|
type_of_meal_plan | |
Meal Plan 1 | 103.59 |
Meal Plan 2 | 115.31 |
Meal Plan 3 | 41.20 |
Not Selected | 94.91 |
Cuenta la cantidad total de datos únicos de una variable a través de una función de agregación.
pd.pivot_table(data,
index='market_segment_type',
values = 'room_type_reserved',
aggfunc = 'nunique')
room_type_reserved | |
---|---|
market_segment_type | |
Aviation | 2 |
Complementary | 7 |
Corporate | 7 |
Offline | 7 |
Online | 7 |
Calcula la cantidad de total de datos únicos, el valor máximo y el promedio de una o más variables a través de la función de agregación correspondiente. Paso adicional el reinicio del índice.
df_mod = round(pd.pivot_table(data,
index = 'market_segment_type',
aggfunc = {'room_type_reserved': 'count',
'lead_time':'max',
'avg_price_per_room': 'mean'}),2)
df_mod = df_mod.reset_index()
df_mod
market_segment_type | avg_price_per_room | lead_time | room_type_reserved | |
---|---|---|---|---|
0 | Aviation | 100.70 | 23 | 125 |
1 | Complementary | 3.14 | 386 | 391 |
2 | Corporate | 82.91 | 297 | 2017 |
3 | Offline | 91.63 | 443 | 10528 |
4 | Online | 112.26 | 443 | 23214 |
Agrupa los datos de una variable en columnas calculando valores de otra variable a través de una función de agregación, suma, promedio, máximo y/o mínimo, reemplazando los valores vacíos con ceros.
round(pd.pivot_table(data,
index = 'market_segment_type',
columns = ['booking_status','arrival_year'],
values = 'avg_price_per_room',
aggfunc = 'mean',
fill_value=0).head(),2)
booking_status | Canceled | Not_Canceled | ||
---|---|---|---|---|
arrival_year | 2017 | 2018 | 2017 | 2018 |
market_segment_type | ||||
Aviation | 0.00 | 102.24 | 0.00 | 100.06 |
Complementary | 0.00 | 0.00 | 2.45 | 3.52 |
Corporate | 84.27 | 98.23 | 76.48 | 83.03 |
Offline | 97.67 | 101.35 | 86.85 | 88.12 |
Online | 91.06 | 116.12 | 99.37 | 113.18 |
Agrupa los datos de varias variables en columnas calculando valores de otra variable a través de una función de agregación, suma, promedio, máximo y/o mínimo, reemplazando los valores vacíos con ceros.
round(pd.pivot_table(data,
index = 'market_segment_type',
columns = 'booking_status',
values = ['no_of_weekend_nights','arrival_month'],
aggfunc = 'mean',
fill_value=0).head(),2)
arrival_month | no_of_weekend_nights | |||
---|---|---|---|---|
booking_status | Canceled | Not_Canceled | Canceled | Not_Canceled |
market_segment_type | ||||
Aviation | 6.62 | 7.33 | 2.19 | 0.73 |
Complementary | 0.00 | 7.72 | 0.00 | 0.33 |
Corporate | 6.63 | 7.16 | 0.39 | 0.43 |
Offline | 7.46 | 7.62 | 0.75 | 0.72 |
Online | 7.36 | 7.39 | 0.95 | 0.85 |
Agrupa el cáculo de valores suma, promedio, máximo y/o mínimo de una variable en columnas (función de agregación), reemplazando los valores vacíos con ceros.
round(pd.pivot_table(data,
index = 'market_segment_type',
columns = 'booking_status',
values = 'lead_time',
aggfunc = ['mean','max'],
fill_value=0).head(),2)
mean | max | |||
---|---|---|---|---|
booking_status | Canceled | Not_Canceled | Canceled | Not_Canceled |
market_segment_type | ||||
Aviation | 7.59 | 4.60 | 23 | 21 |
Complementary | 0.00 | 12.04 | 0 | 386 |
Corporate | 55.77 | 17.66 | 276 | 297 |
Offline | 209.71 | 85.75 | 443 | 386 |
Online | 115.73 | 52.11 | 443 | 351 |
Se realizan todas las agrupaciones anteriores en una misma tabla pivote.
df_mod = round(pd.pivot_table(data,
index = 'market_segment_type',
columns = ['booking_status','arrival_year'],
values = ['lead_time','no_of_weekend_nights'],
aggfunc = ['median','max'],
fill_value=0),2)
df_mod
median | max | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
lead_time | no_of_weekend_nights | lead_time | no_of_weekend_nights | |||||||||||||
booking_status | Canceled | Not_Canceled | Canceled | Not_Canceled | Canceled | Not_Canceled | Canceled | Not_Canceled | ||||||||
arrival_year | 2017 | 2018 | 2017 | 2018 | 2017 | 2018 | 2017 | 2018 | 2017 | 2018 | 2017 | 2018 | 2017 | 2018 | 2017 | 2018 |
market_segment_type | ||||||||||||||||
Aviation | 0 | 5 | 0 | 3 | 0 | 1 | 0 | 0.5 | 0 | 23 | 0 | 21 | 0 | 5 | 0 | 3 |
Complementary | 0 | 0 | 1 | 3 | 0 | 0 | 0 | 0.0 | 0 | 0 | 88 | 386 | 0 | 0 | 2 | 4 |
Corporate | 4 | 36 | 7 | 7 | 0 | 0 | 0 | 0.0 | 87 | 276 | 86 | 297 | 6 | 6 | 3 | 4 |
Offline | 117 | 239 | 59 | 67 | 0 | 1 | 0 | 1.0 | 273 | 443 | 327 | 386 | 2 | 5 | 2 | 4 |
Online | 96 | 100 | 16 | 41 | 1 | 1 | 1 | 1.0 | 296 | 443 | 327 | 351 | 5 | 7 | 4 | 6 |
Pudimos observar algunas de las configuraciones posibles en el uso de tablas pivote para el análisis de datos en nuestro conjunto de datos. Esperamos que esta publicación haya sido de ayuda para usted.