Data Visualization Cheat Sheet with Seaborn and Matplotlib | #14

Nov 18, 2020


Introduction

Exploratory Data Analysis (EDA) is an indispensable step in data mining. To understand aspects of a data set such as its distribution, its key patterns, or the inferences it supports, we need to visualize the data in different graphs. Fortunately, Python offers many libraries that make visualization more convenient than ever. Some of the most widely used today are Matplotlib, Seaborn, Plotly and Bokeh.

Since my job involves examining data from every angle, I have worked with many types of graphs. However, there are so many functions, and their syntax is so hard to remember, that I often forget it and have to look it up again or search the Internet for similar code. This has wasted a lot of my time, hence my motivation for writing this article. Hopefully, it can be of some help to anyone who, like me, has the memory of a goldfish.

Data Description

My dataset is a public grocery dataset from Kaggle, which you can easily download from the link below:

Groceries dataset: Dataset of 38765 rows for Market Basket Analysis (www.kaggle.com)

This grocery data consists of 3 columns, which are:

  • Member_number: ID numbers of customers
  • Date: date of purchase
  • itemDescription: item name

Now, let’s have a look at the data frame and its information:

Figure 1: Data frame

Figure 2: Data’s description
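A minimal sketch of how the frame in Figures 1 and 2 could be loaded and inspected. It assumes the Kaggle download is a CSV named `Groceries_dataset.csv` and that dates are written day first (e.g. `21-07-2015`); both are assumptions, so adjust them to your copy of the file. The helper `load_groceries` is hypothetical, and so are the three sample rows, which only exist so the snippet runs without the download.

```python
import io

import pandas as pd


def load_groceries(source) -> pd.DataFrame:
    """Read the groceries CSV and parse the Date column into datetimes.

    `source` may be a file path (the name "Groceries_dataset.csv" is an
    assumption) or any file-like object that pd.read_csv accepts.
    """
    df = pd.read_csv(source)
    # Assumption: dates are stored day-first, e.g. "21-07-2015".
    df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")
    return df


# Three hypothetical rows with the same columns as the Kaggle file.
sample_csv = io.StringIO(
    "Member_number,Date,itemDescription\n"
    "1,21-07-2015,whole milk\n"
    "2,05-01-2015,rolls/buns\n"
    "1,19-09-2014,yogurt\n"
)
df = load_groceries(sample_csv)

print(df.head())  # first rows of the frame (cf. Figure 1)
df.info()         # dtypes and non-null counts, one way to describe the data (cf. Figure 2)
```

With the real file, call `load_groceries("Groceries_dataset.csv")` instead of passing the sample.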

Import necessary packages

There are some packages that we should import first.

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Jupyter/IPython only: render figures inline in the notebook
%matplotlib inline

Visualize data

Line Chart

For this section, I will use a line graph to visualize the grocery store's sales over the two years 2014 and 2015.

First, I will transform the data frame a bit to get the items counted by month and year.

Figure 3: Items Counted by Month-Year
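A minimal sketch of the month-year transformation, assuming Date is already parsed to datetimes: each date is mapped to its calendar month with `dt.to_period("M")` and the rows in each month are counted. The helper name `items_by_month`, the output columns `month` and `count`, and the sample rows are all hypothetical.

```python
import pandas as pd


def items_by_month(df: pd.DataFrame) -> pd.DataFrame:
    """Count purchased items per calendar month (one row per month)."""
    monthly = (
        df.groupby(df["Date"].dt.to_period("M"))  # e.g. 2014-01, 2014-02, ...
        .size()                                    # number of rows (items) per month
        .rename_axis("month")
        .reset_index(name="count")
    )
    # Period -> string so the month labels plot cleanly on an axis.
    monthly["month"] = monthly["month"].astype(str)
    return monthly


# Hypothetical rows for illustration only.
df = pd.DataFrame({
    "Member_number": [1, 2, 1, 3, 2],
    "Date": pd.to_datetime(["2014-01-03", "2014-01-20", "2014-02-11",
                            "2015-01-07", "2015-01-30"]),
    "itemDescription": ["whole milk", "yogurt", "whole milk",
                        "rolls/buns", "whole milk"],
})
monthly = items_by_month(df)
print(monthly)
```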

After we have our data, let’s try to visualize it:

Figure 4: Line Chart of Items Counted by Month-Year

Bar Chart

A bar chart is used to show how values change over time or to compare figures or factors across objects. It usually has two axes: one for the objects or factors being analyzed, the other for their values.

For this dataset, I will use a bar chart to visualize the 10 best-selling categories in 2014 and 2015. You can display it as either a horizontal or a vertical bar chart. Let’s see how it looks.

Data Transformation

Figure 4: Items Counted by Categories
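One way to get the category counts, sketched with `value_counts()`, which counts each item name and sorts the result in descending order. The helper `top_categories` and the sample rows are hypothetical; only the itemDescription column matters here. With the real data, `top_categories(df)` returns the 10 best sellers.

```python
import pandas as pd


def top_categories(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Return the n most frequently purchased categories with their counts."""
    return (
        df["itemDescription"]
        .value_counts()              # counts per item, most frequent first
        .head(n)
        .rename_axis("itemDescription")
        .reset_index(name="count")   # same column names across pandas versions
    )


# Hypothetical rows for illustration only.
df = pd.DataFrame({
    "itemDescription": ["whole milk"] * 3 + ["yogurt"] * 2 + ["rolls/buns"],
})
top = top_categories(df, n=2)
print(top)
```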

Horizontal Bar Chart

Figure 5: Horizontal Bar Chart
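A sketch of a horizontal bar chart with `sns.barplot`: putting the numeric column on x and the category column on y makes Seaborn draw the bars horizontally. The frame layout (`itemDescription`, `count`) is an assumption, and the counts are made up for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up counts (illustration only, not the article's figures).
top10 = pd.DataFrame({
    "itemDescription": ["whole milk", "other vegetables", "rolls/buns"],
    "count": [300, 250, 200],
})

fig, ax = plt.subplots(figsize=(8, 5))
# Numeric column on x, category column on y -> horizontal bars.
sns.barplot(data=top10, x="count", y="itemDescription", ax=ax)
ax.set_title("Top categories sold in 2014 and 2015")
ax.set_xlabel("Number of items")
ax.set_ylabel("")
fig.tight_layout()
```

Horizontal bars leave room for long category names, which is one reason to prefer them here.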

If you prefer a vertical bar chart, try this:

Figure 6: Vertical Bar Chart
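The vertical version only swaps the axes: category on x, count on y. As before, the column names are assumptions and the numbers are made up.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up counts (illustration only, not the article's figures).
top10 = pd.DataFrame({
    "itemDescription": ["whole milk", "other vegetables", "rolls/buns"],
    "count": [300, 250, 200],
})

fig, ax = plt.subplots(figsize=(10, 5))
# Category column on x, numeric column on y -> vertical bars.
sns.barplot(data=top10, x="itemDescription", y="count", ax=ax)
ax.set_xlabel("")
ax.set_ylabel("Number of items")
ax.tick_params(axis="x", rotation=45)  # long names are easier to read rotated
fig.tight_layout()
```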

Bar Chart with Hue Value

If you want to compare each category’s sales by year, what would your visualization look like? You can add the year to the graph through the hue parameter, which gives each year its own bar color.

Figure 7: Bar Chart with Hue Value
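A minimal sketch of a bar chart with hue, assuming "10 best categories" means the top 10 over both years combined. The helper `counts_by_category_and_year` and the sample rows are hypothetical. Years are converted to strings so Seaborn treats the hue as categorical, one color per year.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def counts_by_category_and_year(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Count items per (category, year) for the n best-selling categories overall."""
    top = df["itemDescription"].value_counts().head(n).index
    subset = df[df["itemDescription"].isin(top)]
    return (
        subset.groupby([subset["itemDescription"],
                        subset["Date"].dt.year.rename("year")])
        .size()
        .reset_index(name="count")
    )


# Hypothetical rows for illustration only.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2014-03-01", "2014-05-02", "2015-06-03",
                            "2015-07-04", "2014-08-05", "2015-09-06"]),
    "itemDescription": ["whole milk", "whole milk", "whole milk",
                        "yogurt", "yogurt", "soda"],
})
by_year = counts_by_category_and_year(df, n=2)
# Strings instead of integers so hue is treated as categorical.
by_year["year"] = by_year["year"].astype(str)

fig, ax = plt.subplots(figsize=(10, 5))
sns.barplot(data=by_year, x="itemDescription", y="count", hue="year", ax=ax)
ax.tick_params(axis="x", rotation=45)
fig.tight_layout()
```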

Now, can you see it more clearly?

Histogram

Imagine that I want to find out how often customers buy whole milk, the best-selling category. I will use a histogram to obtain this information.

Figure 8: Frequency of customers buying whole milk in 2014 and 2015

Looking at the visualization, we can see that customers rarely buy this item more than twice, and many customers stop buying it after their first purchase.

Pie Chart

Actually, pie charts are quite poor at communicating the data. However, it does not hurt to learn this visualization technique.

For this data, I want to compare the sales of the top 10 categories with the rest in each of 2014 and 2015. Now, let’s transform our data to get this information visualized.
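One possible transformation, assuming the top 10 categories are taken over both years combined (the article does not say whether they are chosen per year). Each row is labeled "Top n" or "Others" and `pd.crosstab` counts the labels per year. The helper name and the sample rows are hypothetical.

```python
import pandas as pd


def top_vs_rest_by_year(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Item counts per year, split into the overall top-n categories and the rest."""
    top = df["itemDescription"].value_counts().head(n).index
    group = df["itemDescription"].isin(top).map({True: f"Top {n}", False: "Others"})
    return pd.crosstab(df["Date"].dt.year.rename("year"), group.rename("group"))


# Hypothetical rows for illustration only.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2014-01-02", "2014-02-03", "2014-03-04",
                            "2015-01-05", "2015-02-06", "2015-03-07"]),
    "itemDescription": ["whole milk", "yogurt", "soda",
                        "whole milk", "whole milk", "soda"],
})
table = top_vs_rest_by_year(df, n=2)
# Each year's counts as shares of that year's total.
shares = table.div(table.sum(axis=1), axis=0)
print(table)
print(shares)
```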

Our data is now ready. Let’s see the pies!

Figure 9: Pie Charts

The charts show that the top 10 categories made up a smaller share of purchases in 2015 than in 2014, a drop of 5.5%.
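A sketch of drawing one pie per year with Matplotlib's `Axes.pie`, assuming a year-by-group table of counts (rows are years, columns are "Top 10" and "Others"). The numbers below are made up and do not reproduce the article's figures.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up yearly counts (illustration only).
table = pd.DataFrame(
    {"Top 10": [60, 52], "Others": [40, 48]},
    index=pd.Index([2014, 2015], name="year"),
)

# squeeze=False keeps `axes` two-dimensional even if there is only one year.
fig, axes = plt.subplots(1, len(table), figsize=(10, 5), squeeze=False)
for ax, (year, row) in zip(axes[0], table.iterrows()):
    # autopct prints each wedge's percentage of that year's total.
    ax.pie(row.to_numpy(), labels=row.index.tolist(), autopct="%1.1f%%", startangle=90)
    ax.set_title(str(year))
fig.suptitle("Top 10 categories vs. the rest")
```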

Swarm Plot

Another way to review your data is a swarm plot. In a swarm plot, points are shifted along the categorical axis only, so that they do not overlap. This is helpful because it complements a box plot when you want to display all observations along with some representation of the underlying distribution.

As I want to see the number of items sold on each day of the week, I can use this type of chart to display the information. As usual, let’s first count the items sold and group them by category and day.
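A minimal sketch of that grouping, using `dt.day_name()` (English day names by default) to turn each date into its weekday. The helper name and the sample rows are hypothetical; 5 January 2015 was a Monday and 6 January 2015 a Tuesday.

```python
import pandas as pd


def counts_by_category_and_weekday(df: pd.DataFrame) -> pd.DataFrame:
    """Count items sold per (category, day of the week)."""
    return (
        df.groupby([df["itemDescription"],
                    df["Date"].dt.day_name().rename("weekday")])
        .size()
        .reset_index(name="count")
    )


# Hypothetical rows for illustration only.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2015-01-05", "2015-01-05", "2015-01-06", "2015-01-05"]),
    "itemDescription": ["whole milk", "whole milk", "whole milk", "yogurt"],
})
counts = counts_by_category_and_weekday(df)
print(counts)
```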

After we obtain the data, let’s see what the graph looks like.

Figure 10: Swarm Chart
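A sketch of the swarm plot call, assuming a frame with one row per (category, weekday) and its count. Each point is one category's count on that day, and `sns.swarmplot` spreads points sideways so none overlap. The counts are made up, and the explicit weekday order is my own choice so the days appear in calendar order.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up counts (illustration only, not the article's data).
counts = pd.DataFrame({
    "itemDescription": ["whole milk", "yogurt", "soda"] * 2,
    "weekday": ["Monday"] * 3 + ["Tuesday"] * 3,
    "count": [40, 25, 18, 38, 22, 20],
})
week = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
present = set(counts["weekday"])
order = [day for day in week if day in present]  # calendar order, only days in the data

fig, ax = plt.subplots(figsize=(10, 5))
sns.swarmplot(data=counts, x="weekday", y="count", order=order, ax=ax)
ax.set_ylabel("Items sold per category")
fig.tight_layout()
```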

Conclusion

In this article, I have shown you how to visualize your data with different types of charts. If you find it helpful, you can save it and review it anytime you want. It can save you tons of time down the road. :D

WRITTEN BY

Chi Nguyen

An introverted girl who craves learning and writing


Thanks to Linda Chen.
