E-Commerce Data Analysis with Pandas
Explore the UCI Online Retail Dataset.
Project Description
In this project, you will work with the UCI Online Retail Dataset, a real transactions dataset from a UK-based online store with over 500,000 rows. You will clean it, filter it, and compute your first business metric.
This project is closer to real work than most beginner datasets. The data is not clean, and some rows make no sense: you have to deal with that before doing any analysis.
Project Requirements
Download the UCI Online Retail Dataset (available on Kaggle or the UCI ML Repository)
Sample 10% of rows to keep things manageable
Clean the data: remove nulls, fix data types, and filter out returns and free items
Convert InvoiceDate to a proper datetime object
Create a Revenue column (Quantity × UnitPrice)
Find the top 10 countries by total revenue and plot the result
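The requirements above can be sketched roughly as follows. This is a minimal outline, not a reference solution. It assumes a CSV export of the dataset and the column names used in the standard UCI file (CustomerID, Description and Country are not named in this brief, so check them against your copy). The helper names, the encoding, and the choice to drop rows missing CustomerID or Description are illustrative assumptions.

```python
import pandas as pd


def load_sample(path: str, frac: float = 0.1, seed: int = 42) -> pd.DataFrame:
    """Read a CSV export of the dataset and keep a reproducible random sample."""
    # Assumption: the file is a CSV; the encoding may need adjusting for your copy.
    raw = pd.read_csv(path, encoding="ISO-8859-1")
    return raw.sample(frac=frac, random_state=seed)


def clean_transactions(raw: pd.DataFrame) -> pd.DataFrame:
    """Drop nulls, fix dtypes, remove returns and free items, add Revenue."""
    # Assumption: rows without a customer or a description are not usable here.
    df = raw.dropna(subset=["CustomerID", "Description"]).copy()

    # Fix data types: timestamps as datetime, customer IDs as integers.
    df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"])
    df["CustomerID"] = df["CustomerID"].astype(int)

    # Returns show up as negative quantities, free items as zero prices.
    df = df[(df["Quantity"] > 0) & (df["UnitPrice"] > 0)].copy()

    # Derived metric: revenue per transaction line.
    df["Revenue"] = df["Quantity"] * df["UnitPrice"]
    return df


def top_countries_by_revenue(df: pd.DataFrame, n: int = 10) -> pd.Series:
    """Total revenue per country, largest first."""
    totals = df.groupby("Country")["Revenue"].sum()
    return totals.sort_values(ascending=False).head(n)


def plot_top_countries(top: pd.Series) -> None:
    """Horizontal bar chart of the series returned above."""
    # Imported here so the cleaning code can run without Matplotlib installed.
    import matplotlib.pyplot as plt

    ax = top.sort_values().plot(kind="barh")
    ax.set_xlabel("Total revenue")
    ax.set_title("Top countries by revenue")
    plt.tight_layout()
    plt.show()
```

In a notebook, the steps would chain as clean_transactions(load_sample("online_retail.csv")), followed by top_countries_by_revenue and plot_top_countries. The file name here is hypothetical.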
Technologies to Use
Python
Pandas
Matplotlib / Seaborn
Jupyter Notebook
What You Will Learn
You will practice cleaning a large, realistic dataset and computing a derived metric. You will also understand why negative quantities and zero prices exist in real transaction data, and how to handle them without deleting useful rows.
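One possible way to keep such rows available while still excluding them from revenue is to flag them instead of dropping them. The sketch below assumes the Quantity and UnitPrice columns from the dataset; the flag column names and the function name are hypothetical, chosen only for illustration.

```python
import pandas as pd


def flag_special_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Mark returns and free items with boolean columns instead of deleting them."""
    out = df.copy()
    out["IsReturn"] = out["Quantity"] < 0      # negative quantity: goods sent back
    out["IsFreeItem"] = out["UnitPrice"] == 0  # zero price: giveaways, adjustments
    # Ordinary sales are the rows that are positive on both columns.
    out["IsSale"] = (out["Quantity"] > 0) & (out["UnitPrice"] > 0)
    return out
```

With the flags in place, revenue can be computed on the rows where IsSale is true, while the flagged rows remain in the table for questions such as what share of lines are returns (the mean of IsReturn).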
Want to See a Solution?
A full walkthrough of this project is available on Towards Data Science: "EDA in Public: Cleaning and Exploring Sales Data with Pandas"
