[Data Visualization 53] Visual Analysis of E-commerce Shipping Data

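🧑 About the author: formerly head of algorithms at a smart-city company, now a senior algorithm engineer at a logistics company serving the US market. Works in artificial intelligence, with expertise in Python data mining, visualization and machine learning; holds AI-related patents and has won several AI competitions. A featured CSDN creator in the AI category, offering AI technical consulting, project development and custom solutions. For inquiries, send a private message on the site or use the VX (WeChat) card at the bottom of any article (ID: xf982831907).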

1. Introduction

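As e-commerce keeps growing, logistics is the key link between merchants and consumers, and its efficiency and reliability directly shape both customer experience and a company's competitiveness. This report looks at shipping data from an international e-commerce company and uses visual analysis to explore the factors that affect on-time delivery, with the aim of giving the company data to support optimizing its logistics strategy and improving customer satisfaction.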

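2. Dataset Description

The dataset contains 10,999 observations and 12 features:

  • ID: unique customer identifier
  • Warehouse_block: warehouse block (A, B, C, D, E)
  • Mode_of_Shipment: shipping mode (ship, flight, road)
  • Customer_care_calls: number of customer care calls
  • Customer_rating: customer rating (1 to 5)
  • Cost_of_the_Product: product cost (USD)
  • Prior_purchases: number of prior purchases
  • Product_importance: product importance (low, medium, high)
  • Gender: customer gender
  • Discount_offered: discount offered on the product
  • Weight_in_gms: product weight (grams)
  • Reached.on.Time_Y.N: delivery status (0 = on time, 1 = delayed)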
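The feature list above can double as a quick schema check right after loading the file. The following is a minimal sketch, assuming the CSV uses exactly these column names; `check_schema`, `EXPECTED_COLUMNS` and `TARGET` are hypothetical names introduced here, not part of pandas or the original notebook.

```python
import pandas as pd

# Hypothetical helper: column names copied from the feature list above
EXPECTED_COLUMNS = [
    'ID', 'Warehouse_block', 'Mode_of_Shipment', 'Customer_care_calls',
    'Customer_rating', 'Cost_of_the_Product', 'Prior_purchases',
    'Product_importance', 'Gender', 'Discount_offered', 'Weight_in_gms',
    'Reached.on.Time_Y.N',
]
TARGET = 'Reached.on.Time_Y.N'


def check_schema(df: pd.DataFrame) -> list:
    """Return a list of problems; an empty list means the frame matches the description."""
    problems = []
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        problems.append(f'missing columns: {missing}')
    if TARGET in df.columns:
        # The target should contain only 0 (on time) and 1 (delayed);
        # tolist() turns NumPy scalars into plain Python values
        values = set(df[TARGET].dropna().unique().tolist())
        unexpected = sorted(values - {0, 1})
        if unexpected:
            problems.append(f'unexpected target values: {unexpected}')
    return problems
```

Calling `print(check_schema(df))` after `pd.read_csv` should print an empty list when the file matches the description.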

2.1 Tools

  • Python version: 3.9
  • Editor: Jupyter Notebook
  • Data processing: pandas, numpy
  • Visualization: matplotlib

2.2 Loading the Data

We use pandas to load the data and run some preliminary checks:

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load the data
df = pd.read_csv('ecommerce_delivery.csv')

# Dataset size (rows, columns)
print("Shape:", df.shape)

# Basic information; df.info() prints directly and returns None, so it is not wrapped in print()
print("\nBasic info:")
df.info()
print("\nDescriptive statistics:")
print(df.describe())

# Missing values per column
print("\nMissing values:")
print(df.isnull().sum())

# Number of fully duplicated rows
print("\nDuplicate rows:", df.duplicated().sum())
```
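Because `unique()` returns levels in order of first appearance, bar charts of `Product_importance` can come out in an arbitrary order. One option is to convert the column to an ordered categorical. This sketch assumes the levels are stored as the lowercase strings `low`, `medium` and `high`, which is worth confirming with `df['Product_importance'].unique()`; `add_ordered_importance` and `IMPORTANCE_ORDER` are hypothetical names.

```python
import pandas as pd

# Assumption: level spellings in the file; adjust if they differ
IMPORTANCE_ORDER = ['low', 'medium', 'high']


def add_ordered_importance(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy with Product_importance as an ordered categorical."""
    out = df.copy()
    out['Product_importance'] = pd.Categorical(
        out['Product_importance'], categories=IMPORTANCE_ORDER, ordered=True
    )
    return out
```

Values not listed in `IMPORTANCE_ORDER` become NaN, so checking `isna().sum()` on the converted column is a cheap way to confirm the assumed spelling.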

3. Univariate Analysis

3.1 Distribution of On-time Delivery

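```python
# value_counts() orders classes by frequency, not by label, so sort by label (0, 1)
# and build the slice labels from the index to keep them aligned with the data
status_counts = df['Reached.on.Time_Y.N'].value_counts().sort_index()
status_labels = status_counts.index.map({0: 'On time (0)', 1: 'Delayed (1)'})

plt.figure(figsize=(8, 8))
plt.pie(status_counts, labels=status_labels, autopct='%1.1f%%', startangle=90)
plt.title('Distribution of On-time vs. Delayed Deliveries', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()
```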

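The chart shows that about 59.7% of orders were delayed (label 1) and only 40.3% arrived on time (label 0). Because value_counts() sorts classes by frequency, the larger slice has to be matched to its label rather than assumed to be "on time".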
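Printing the class shares directly avoids reading them off the pie chart and makes the label mapping explicit. A minimal sketch; `delivery_shares` and `STATUS_NAMES` are hypothetical names, and the 0/1 meaning follows the feature list in section 2.

```python
import pandas as pd

# Hypothetical mapping based on the feature list: 0 = on time, 1 = delayed
STATUS_NAMES = {0: 'On time', 1: 'Delayed'}


def delivery_shares(status: pd.Series) -> pd.Series:
    """Share of each delivery status, in label order (0, then 1), with readable names."""
    shares = status.value_counts(normalize=True).sort_index()
    return shares.rename(index=STATUS_NAMES)
```

On the full data, `delivery_shares(df['Reached.on.Time_Y.N'])` should reproduce the two percentages quoted above.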

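3.2 Distribution of Shipping Modes

```python
# Shipping mode is categorical, so count each mode and draw bars instead of a histogram
mode_counts = df['Mode_of_Shipment'].value_counts()

plt.figure(figsize=(10, 6))
plt.bar(mode_counts.index, mode_counts.values, color='skyblue', edgecolor='black')
plt.title('Distribution of Shipping Modes', fontsize=14, fontweight='bold')
plt.xlabel('Mode of shipment', fontsize=12)
plt.ylabel('Number of orders', fontsize=12)
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
```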

Ship accounts for the largest share of orders, followed by road and flight (air).

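3.3 Distribution of Customer Ratings

```python
# Count orders per rating and keep the 1-5 order on the x-axis
# (sns.countplot has no `bins` argument, so plain matplotlib is used instead)
rating_counts = df['Customer_rating'].value_counts().sort_index()

plt.figure(figsize=(10, 6))
plt.bar(rating_counts.index, rating_counts.values, color='lightgreen', edgecolor='black')
plt.title('Distribution of Customer Ratings', fontsize=14, fontweight='bold')
plt.xlabel('Rating', fontsize=12)
plt.ylabel('Number of orders', fontsize=12)
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
```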

4. Multivariate Analysis

4.1 Shipping Mode vs. On-time Delivery

```python
plt.figure(figsize=(12, 6))
shipment_modes = df['Mode_of_Shipment'].unique()
# Count on-time (0) and delayed (1) orders for each shipping mode
on_time_counts = [df[(df['Mode_of_Shipment'] == mode) & (df['Reached.on.Time_Y.N'] == 0)].shape[0] for mode in shipment_modes]
delayed_counts = [df[(df['Mode_of_Shipment'] == mode) & (df['Reached.on.Time_Y.N'] == 1)].shape[0] for mode in shipment_modes]

# Stack the delayed counts on top of the on-time counts
plt.bar(shipment_modes, on_time_counts, label='On time', color='skyblue')
plt.bar(shipment_modes, delayed_counts, bottom=on_time_counts, label='Delayed', color='salmon')
plt.title('Shipping Mode vs. On-time Delivery', fontsize=14, fontweight='bold')
plt.xlabel('Mode of shipment', fontsize=12)
plt.ylabel('Number of orders', fontsize=12)
plt.legend()
plt.grid(linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
```
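Sections 4.1 to 4.9 repeat the same two list comprehensions for every column. `pd.crosstab` builds the whole count table in one call, and a small plotting function can then draw any of the stacked charts. A minimal sketch; `delivery_counts` and `plot_stacked` are hypothetical helpers, not the original author's code.

```python
import pandas as pd

TARGET = 'Reached.on.Time_Y.N'


def delivery_counts(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Hypothetical helper: order counts per level of `column`, with columns 0 (on time) and 1 (delayed)."""
    table = pd.crosstab(df[column], df[TARGET])
    # Guarantee both status columns exist even if a subset contains only one status
    return table.reindex(columns=[0, 1], fill_value=0)


def plot_stacked(ax, table: pd.DataFrame, title: str, xlabel: str,
                 on_color: str = 'skyblue', late_color: str = 'salmon') -> None:
    """Hypothetical helper: draw on-time counts with delayed counts stacked on top."""
    x = table.index.astype(str)
    on_time = table[0].to_numpy()
    delayed = table[1].to_numpy()
    ax.bar(x, on_time, label='On time', color=on_color)
    ax.bar(x, delayed, bottom=on_time, label='Delayed', color=late_color)
    ax.set_title(title, fontsize=14, fontweight='bold')
    ax.set_xlabel(xlabel, fontsize=12)
    ax.set_ylabel('Number of orders', fontsize=12)
    ax.legend()
    ax.grid(axis='y', linestyle='--', alpha=0.7)
```

For example, `fig, ax = plt.subplots(figsize=(12, 6))` followed by `plot_stacked(ax, delivery_counts(df, 'Customer_rating'), 'Customer Rating vs. On-time Delivery', 'Customer rating')` draws a chart like the next one. `crosstab` sorts the levels, so the bar order stays stable across runs, unlike `unique()`.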

4.2 Customer Rating vs. On-time Delivery

```python
plt.figure(figsize=(12, 6))
ratings = df['Customer_rating'].unique()
# Count on-time (0) and delayed (1) orders for each rating
on_time_counts = [df[(df['Customer_rating'] == rating) & (df['Reached.on.Time_Y.N'] == 0)].shape[0] for rating in ratings]
delayed_counts = [df[(df['Customer_rating'] == rating) & (df['Reached.on.Time_Y.N'] == 1)].shape[0] for rating in ratings]

plt.bar(ratings, on_time_counts, label='On time', color='lightgreen')
plt.bar(ratings, delayed_counts, bottom=on_time_counts, label='Delayed', color='orange')
plt.title('Customer Rating vs. On-time Delivery', fontsize=14, fontweight='bold')
plt.xlabel('Customer rating', fontsize=12)
plt.ylabel('Number of orders', fontsize=12)
plt.legend()
plt.grid(linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
```
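Stacked counts are hard to compare when groups differ a lot in size (for example, ship accounts for the largest share of orders in 4.1). Since label 1 means delayed, the mean of the target within each group is exactly the delay rate, which gives a size-independent comparison. A minimal sketch; `delay_rate` is a hypothetical helper.

```python
import pandas as pd

TARGET = 'Reached.on.Time_Y.N'


def delay_rate(df: pd.DataFrame, column: str) -> pd.Series:
    """Hypothetical helper: fraction of delayed orders (label 1) per level of `column`, highest first."""
    # The mean of a 0/1 column is the share of 1s, i.e. the delay rate
    return df.groupby(column)[TARGET].mean().sort_values(ascending=False)
```

`1 - delay_rate(df, 'Customer_rating')` gives the on-time rate per rating, which is the quantity the conclusions in section 6 describe.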

4.3 Product Importance vs. On-time Delivery

```python
plt.figure(figsize=(12, 6))
importance_levels = df['Product_importance'].unique()
# Count on-time (0) and delayed (1) orders for each importance level
on_time_counts = [df[(df['Product_importance'] == imp) & (df['Reached.on.Time_Y.N'] == 0)].shape[0] for imp in importance_levels]
delayed_counts = [df[(df['Product_importance'] == imp) & (df['Reached.on.Time_Y.N'] == 1)].shape[0] for imp in importance_levels]

plt.bar(importance_levels, on_time_counts, label='On time', color='palegoldenrod')
plt.bar(importance_levels, delayed_counts, bottom=on_time_counts, label='Delayed', color='hotpink')
plt.title('Product Importance vs. On-time Delivery', fontsize=14, fontweight='bold')
plt.xlabel('Product importance', fontsize=12)
plt.ylabel('Number of orders', fontsize=12)
plt.legend()
plt.grid(linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
```

4.4 Customer Care Calls vs. On-time Delivery

```python
plt.figure(figsize=(12, 6))
call_counts = df['Customer_care_calls'].unique()
# Count on-time (0) and delayed (1) orders for each number of care calls
on_time_counts = [df[(df['Customer_care_calls'] == calls) & (df['Reached.on.Time_Y.N'] == 0)].shape[0] for calls in call_counts]
delayed_counts = [df[(df['Customer_care_calls'] == calls) & (df['Reached.on.Time_Y.N'] == 1)].shape[0] for calls in call_counts]

plt.bar(call_counts, on_time_counts, label='On time', color='palegoldenrod')
plt.bar(call_counts, delayed_counts, bottom=on_time_counts, label='Delayed', color='hotpink')
plt.title('Customer Care Calls vs. On-time Delivery', fontsize=14, fontweight='bold')
plt.xlabel('Number of calls', fontsize=12)
plt.ylabel('Number of orders', fontsize=12)
plt.legend()
plt.grid(linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
```

4.5 Product Cost vs. On-time Delivery

```python
plt.figure(figsize=(12, 6))
cost_values = df['Cost_of_the_Product'].unique()
# Count on-time (0) and delayed (1) orders for each distinct product cost
on_time_counts = [df[(df['Cost_of_the_Product'] == cost) & (df['Reached.on.Time_Y.N'] == 0)].shape[0] for cost in cost_values]
delayed_counts = [df[(df['Cost_of_the_Product'] == cost) & (df['Reached.on.Time_Y.N'] == 1)].shape[0] for cost in cost_values]

plt.bar(cost_values, on_time_counts, label='On time', color='palegoldenrod')
plt.bar(cost_values, delayed_counts, bottom=on_time_counts, label='Delayed', color='hotpink')
plt.title('Product Cost vs. On-time Delivery', fontsize=14, fontweight='bold')
plt.xlabel('Product cost (USD)', fontsize=12)
plt.ylabel('Number of orders', fontsize=12)
plt.legend()
plt.grid(linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
```
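Product cost takes many distinct values, so one bar per value produces a dense, noisy chart. Grouping the values into equal-width bins with `pd.cut` and reporting the order count and delay rate per bin is easier to read. A minimal sketch; `binned_delay_rate` is a hypothetical helper and the default of 10 bins is an arbitrary choice.

```python
import pandas as pd

TARGET = 'Reached.on.Time_Y.N'


def binned_delay_rate(df: pd.DataFrame, column: str, bins: int = 10) -> pd.DataFrame:
    """Hypothetical helper: equal-width bins of a numeric column with order count and delay rate per bin."""
    binned = pd.cut(df[column], bins=bins)
    # observed=False keeps empty bins in the result (count 0, rate NaN)
    grouped = df.groupby(binned, observed=False)[TARGET]
    return pd.DataFrame({'orders': grouped.size(), 'delay_rate': grouped.mean()})
```

The same function applies to other numeric columns such as `Discount_offered`; the bin count is worth varying to check that any pattern is not an artifact of where the edges fall.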

4.6 Prior Purchases vs. On-time Delivery

```python
plt.figure(figsize=(12, 6))
purchase_counts = df['Prior_purchases'].unique()
# Count on-time (0) and delayed (1) orders for each number of prior purchases
on_time_counts = [df[(df['Prior_purchases'] == n) & (df['Reached.on.Time_Y.N'] == 0)].shape[0] for n in purchase_counts]
delayed_counts = [df[(df['Prior_purchases'] == n) & (df['Reached.on.Time_Y.N'] == 1)].shape[0] for n in purchase_counts]

plt.bar(purchase_counts, on_time_counts, label='On time', color='palegoldenrod')
plt.bar(purchase_counts, delayed_counts, bottom=on_time_counts, label='Delayed', color='hotpink')
plt.title('Prior Purchases vs. On-time Delivery', fontsize=14, fontweight='bold')
plt.xlabel('Number of prior purchases', fontsize=12)
plt.ylabel('Number of orders', fontsize=12)
plt.legend()
plt.grid(linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
```

4.7 Gender vs. On-time Delivery

```python
plt.figure(figsize=(10, 6))
genders = df['Gender'].unique()
# Count on-time (0) and delayed (1) orders for each gender
on_time_counts = [df[(df['Gender'] == gender) & (df['Reached.on.Time_Y.N'] == 0)].shape[0] for gender in genders]
delayed_counts = [df[(df['Gender'] == gender) & (df['Reached.on.Time_Y.N'] == 1)].shape[0] for gender in genders]

plt.bar(genders, on_time_counts, label='On time', color='lightblue')
plt.bar(genders, delayed_counts, bottom=on_time_counts, label='Delayed', color='peachpuff')
plt.title('Gender vs. On-time Delivery', fontsize=14, fontweight='bold')
plt.xlabel('Gender', fontsize=12)
plt.ylabel('Number of orders', fontsize=12)
plt.legend()
plt.grid(linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
```

4.8 Discount Offered vs. On-time Delivery

```python
plt.figure(figsize=(12, 6))
discounts = df['Discount_offered'].unique()
# Count on-time (0) and delayed (1) orders for each discount value
on_time_counts = [df[(df['Discount_offered'] == d) & (df['Reached.on.Time_Y.N'] == 0)].shape[0] for d in discounts]
delayed_counts = [df[(df['Discount_offered'] == d) & (df['Reached.on.Time_Y.N'] == 1)].shape[0] for d in discounts]

plt.bar(discounts, on_time_counts, label='On time', color='palegoldenrod')
plt.bar(discounts, delayed_counts, bottom=on_time_counts, label='Delayed', color='hotpink')
plt.title('Discount Offered vs. On-time Delivery', fontsize=14, fontweight='bold')
plt.xlabel('Discount offered', fontsize=12)
plt.ylabel('Number of orders', fontsize=12)
plt.legend()
plt.grid(linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
```
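With one bar per discount value, the chart above is hard to read. Comparing summary statistics of the discount for on-time and delayed orders is a more compact view. A minimal sketch; `compare_by_status` is a hypothetical helper.

```python
import pandas as pd

TARGET = 'Reached.on.Time_Y.N'


def compare_by_status(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Hypothetical helper: count, mean and median of a numeric column for on-time (0) vs. delayed (1) orders."""
    summary = df.groupby(TARGET)[column].agg(['count', 'mean', 'median'])
    return summary.rename(index={0: 'On time', 1: 'Delayed'})
```

`compare_by_status(df, 'Discount_offered')` returns a two-row table; the same call works for `Customer_care_calls`, `Cost_of_the_Product` or `Prior_purchases`.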

4.9 Warehouse Block vs. On-time Delivery

```python
plt.figure(figsize=(12, 6))
blocks = df['Warehouse_block'].unique()
# Count on-time (0) and delayed (1) orders for each warehouse block
on_time_counts = [df[(df['Warehouse_block'] == block) & (df['Reached.on.Time_Y.N'] == 0)].shape[0] for block in blocks]
delayed_counts = [df[(df['Warehouse_block'] == block) & (df['Reached.on.Time_Y.N'] == 1)].shape[0] for block in blocks]

plt.bar(blocks, on_time_counts, label='On time', color='lightcyan')
plt.bar(blocks, delayed_counts, bottom=on_time_counts, label='Delayed', color='lightpink')
plt.title('Warehouse Block vs. On-time Delivery', fontsize=14, fontweight='bold')
plt.xlabel('Warehouse block', fontsize=12)
plt.ylabel('Number of orders', fontsize=12)
plt.legend()
plt.grid(linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
```

5. Combined Multi-panel Analysis

```python
fig, axes = plt.subplots(2, 2, figsize=(18, 12))

# (column, axis label, on-time colour, delayed colour) for each of the four panels
panels = [
    ('Mode_of_Shipment', 'Mode of shipment', 'skyblue', 'salmon'),
    ('Customer_rating', 'Customer rating', 'lightgreen', 'orange'),
    ('Product_importance', 'Product importance', 'palegoldenrod', 'hotpink'),
    ('Gender', 'Gender', 'lightblue', 'peachpuff'),
]

for ax, (column, label, on_time_color, delayed_color) in zip(axes.flat, panels):
    levels = df[column].unique()
    # Count on-time (0) and delayed (1) orders for each level of the column
    on_time_counts = [df[(df[column] == level) & (df['Reached.on.Time_Y.N'] == 0)].shape[0] for level in levels]
    delayed_counts = [df[(df[column] == level) & (df['Reached.on.Time_Y.N'] == 1)].shape[0] for level in levels]
    ax.bar(levels, on_time_counts, label='On time', color=on_time_color)
    ax.bar(levels, delayed_counts, bottom=on_time_counts, label='Delayed', color=delayed_color)
    ax.set_title(f'{label} vs. On-time Delivery', fontsize=12, fontweight='bold')
    ax.set_xlabel(label, fontsize=10)
    ax.set_ylabel('Number of orders', fontsize=10)
    ax.legend()
    ax.grid(linestyle='--', alpha=0.7)

plt.tight_layout()
plt.show()
```

6. Summary

From the visual analysis of the e-commerce shipping data, we draw the following key conclusions:

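  1. Air shipping appears to have the highest on-time rate: flight shipments show a higher on-time share than ship and road, suggesting that the mode of transport affects delivery efficiency.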
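  2. Customer rating is positively associated with on-time delivery: customers with higher ratings are more likely to have received their orders on time, indicating a close link between service quality and customer satisfaction.
  3. High-importance products have the lowest on-time rate: products marked as high importance actually have the lowest on-time delivery rate, which suggests the logistics process for high-priority orders needs to be re-examined.
  4. Female customers fare slightly better: female customers have a slightly higher on-time rate than male customers, which may be related to purchasing behavior or product type.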
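  5. Seasonality could not be assessed: the dataset contains no date or month field, so analyzing the effect of seasonal factors on logistics efficiency would require additional time-stamped data.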
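These conclusions are read off stacked bar charts rather than tested statistically. A chi-square test of independence on the same contingency table is a quick check of whether a factor is associated with delivery status at all. A minimal sketch, assuming SciPy is installed (it is not in the tool list in section 2.1); `independence_test` is a hypothetical helper.

```python
import pandas as pd
from scipy.stats import chi2_contingency  # assumption: SciPy is available

TARGET = 'Reached.on.Time_Y.N'


def independence_test(df: pd.DataFrame, column: str) -> tuple:
    """Hypothetical helper: chi-square test of independence between `column` and delivery status.

    Returns (chi2 statistic, p-value, degrees of freedom).
    """
    observed = pd.crosstab(df[column], df[TARGET])
    chi2, p_value, dof, _expected = chi2_contingency(observed)
    return chi2, p_value, dof
```

For instance, `independence_test(df, 'Mode_of_Shipment')`. A small p-value only says that the delay rate differs across levels, not which level is better or why, and with about 11,000 orders even small differences can come out significant, so it is best read alongside the per-group rates.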

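These findings offer a useful reference for optimizing logistics strategy and improving customer experience. By choosing shipping modes more deliberately, prioritizing high-importance orders and, once time-stamped data are available, accounting for seasonal factors, the company could raise its on-time delivery rate and strengthen its market competitiveness.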
