Creating a Lollipop Plot with two groups in Python

Luke Beggs
5 min read · Mar 2, 2021

A lollipop plot is a cross between a scatter plot and a barplot. It is used to show the relationship between a numerical value and another variable. A lollipop plot of a single group is essentially a barplot with the data represented as a ‘lollipop’ rather than a bar. But when you add a second observation for each group, you can use it to visualise the difference or change in some data by plotting the gap between the two values, as opposed to having both values beside each other. This style is also often known as a Cleveland dot plot.

The full code can be found on GitHub.

A lollipop plot with two groups

At the time of writing, there are no Python packages or libraries that directly create a two-group lollipop plot. To create this plot, one needs to combine scatter plots with horizontal lines (or vertical lines, depending on the orientation of the plot).

As mentioned earlier, this version of a lollipop plot is useful when you have two different observations for each group. For example, take life expectancy in various countries, split by gender. The groups in this scenario would be the countries and the two observations would be male and female life expectancy. The plot would then show the value of each observation and visualise the difference between them.

As an example, I will plot the difference in points accumulated by Premier League teams in their home and away matches (as of 01/03/2021). In this case, the groups are the teams in the Premier League and the observations are their points totals for home and away matches. This is a very simple example with easily available data that still shows all the necessary features.

I will go through the code, explaining what each part does. The full code can be found on my GitHub.

This method uses matplotlib and seaborn for plotting the data, and pandas for reading in the data and storing it in a dataframe. These libraries need to be imported.

Importing the required libraries
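As a minimal sketch of this step, the imports could look like the following. The aliases are the conventional ones and are an assumption on my part, not something fixed by the method.

```python
import matplotlib.pyplot as plt  # drawing the lines and scatter points
import pandas as pd              # reading the .csv files into dataframes
import seaborn as sns            # styling options (whitegrid, gridlines)
```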

Once the relevant libraries have been imported, we can load the data. We use pandas to read saved .csv files into dataframes. There are two files, which contain each Premier League team's data for their home and away matches respectively.

Load the data from a local file into a dataframe

I have reversed the order of these dataframes as a preference for plotting later. In a football table, a ‘lower’ position number means a team is doing better, so plotting the data in table order puts the better-performing teams lower on the y-axis, which does not intuitively fit with one's perception of a league table.
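A rough sketch of loading and reversing the data is below. So that it runs without the real files, it reads from in-memory text instead of from disk. The column names (`Team`, `Points`), the team names and every number are invented placeholders, not the real league data.

```python
import io

import pandas as pd

# Placeholder stand-ins for the two saved .csv files. The column names
# and all values here are made up purely for illustration.
home_csv = io.StringIO("Team,Points\nTeam A,30\nTeam B,24\nTeam C,15\n")
away_csv = io.StringIO("Team,Points\nTeam A,26\nTeam B,27\nTeam C,9\n")

# With real files, pass the file path to pd.read_csv instead.
home = pd.read_csv(home_csv)
away = pd.read_csv(away_csv)

# Reverse the row order so the first-placed team becomes the last row,
# which places it at the top of the y-axis when plotted.
home = home.iloc[::-1].reset_index(drop=True)
away = away.iloc[::-1].reset_index(drop=True)
```

Resetting the index after reversing keeps the row labels running from 0 upwards, which avoids surprises if the two dataframes are later combined by index.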

We do not need to process the data in this instance so we can go directly into setting up a plot.

Changing styling options and setting up plot

The first line of this code sets the dimensions of the plot. Then we change the seaborn styling options; I chose the whitegrid style with dashed gridlines. After this, I create a variable ‘my_range’ holding a list of n numbers, where n is the number of observations. In this case n is 20, as there are 20 teams in the table.
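The setup described above could be sketched like this. The figure size and the choice to start ‘my_range’ at 1 are assumptions for illustration; only the whitegrid style, the dashed gridlines and the length of 20 come from the description.

```python
import matplotlib

# Non-interactive backend so the sketch runs without a display;
# remove this line to get a plot window.
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import seaborn as sns

n_teams = 20  # one y-position per team in the table

plt.figure(figsize=(12, 8))  # width and height in inches (assumed values)

# whitegrid style, with the gridlines overridden to be dashed
sns.set_style("whitegrid", {"grid.linestyle": "--"})

# y-positions 1..n, one per team
my_range = list(range(1, n_teams + 1))
```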

Now we can plot the data.

Plotting the data

This section of the code does all the plotting. It starts by turning off the axis frame, which is just a style preference. Then one set of horizontal lines and two scatter plots are drawn. The horizontal lines connect each team's two observations, one line per team, and their colour and line width are adjusted.

Each scatter plot represents a different observation. In this case, the first one shows the number of points achieved in home matches, with the second showing the number of points from away matches.

The scatter plots take several arguments: the x and y values to plot, the colour and size of the points, and a label for the legend. The ‘zorder’ value controls the drawing order: elements with a higher zorder are drawn on top. Setting it on the scatter plots ensures that the points are drawn over the connecting lines rather than under them, leading to a cleaner visualisation.
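A minimal sketch of the plotting step follows. The data, column names, colours and sizes are placeholders I picked for illustration, and `plt.box(False)` is just one way of turning off the axis frame.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove to get a plot window
import matplotlib.pyplot as plt
import pandas as pd

# Invented placeholder data: one row per team, two observations each.
df = pd.DataFrame({
    "Team": ["Team C", "Team B", "Team A"],
    "Home": [15, 24, 30],
    "Away": [9, 27, 26],
})
my_range = list(range(1, len(df) + 1))

plt.figure(figsize=(8, 4))
plt.box(False)  # turn off the axis frame

# One call draws a connecting line per team, from one observation to the other.
lines = plt.hlines(y=my_range, xmin=df["Home"], xmax=df["Away"],
                   colors="grey", linewidth=2)

# One scatter plot per observation; the higher zorder puts the points
# on top of the connecting lines.
home_pts = plt.scatter(df["Home"], my_range, color="tab:blue", s=80,
                       label="Home", zorder=3)
away_pts = plt.scatter(df["Away"], my_range, color="tab:red", s=80,
                       label="Away", zorder=3)
```

It does not matter which of the two values is larger: `hlines` simply draws the segment between `xmin` and `xmax` for each row.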

All that remains is cleaning up the plot.

Edit the plot to aid readability

These lines of code perform the following (in order):

  • Add a legend
  • Change the y-ticks to have the team names
  • Add a title
  • Add a label to the x-axis (not needed on the y-axis)
  • Change to a ‘tight’ layout
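The steps in this list can be sketched as follows. The plot is rebuilt from invented placeholder data so that the example runs on its own, and the title and label text are my own assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove to get a plot window
import matplotlib.pyplot as plt

# Invented placeholder data
teams = ["Team C", "Team B", "Team A"]
home = [15, 24, 30]
away = [9, 27, 26]
my_range = list(range(1, len(teams) + 1))

plt.figure(figsize=(8, 4))
plt.hlines(y=my_range, xmin=home, xmax=away, colors="grey")
plt.scatter(home, my_range, label="Home", zorder=3)
plt.scatter(away, my_range, label="Away", zorder=3)

plt.legend()                                  # add a legend from the labels
plt.yticks(my_range, teams)                   # team names as y-tick labels
plt.title("Home and away points (placeholder data)")
plt.xlabel("Points")                          # no y-axis label needed
plt.tight_layout()                            # trim the surrounding whitespace
```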

Now all that remains is to save this figure and show the plot.

Saving and displaying the plot

I chose to save this in the .svg format as I will attempt to further enhance the aesthetic of the plot in a vector editor (Inkscape). The current plot is the following image.
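Saving and showing the figure could look like the sketch below. The file name is hypothetical and the plotted points are placeholders standing in for the finished plot.

```python
import matplotlib

# Non-interactive backend so the sketch runs without a display;
# remove this line for plt.show() to open a window.
matplotlib.use("Agg")
import matplotlib.pyplot as plt

plt.figure(figsize=(8, 4))
plt.scatter([15, 24, 30], [1, 2, 3])  # placeholder for the finished plot

output_path = "lollipop_plot.svg"  # hypothetical file name
plt.savefig(output_path, format="svg")  # vector output, editable in Inkscape
plt.show()
```

Calling `savefig` before `show` matters in interactive use, because the figure may be discarded once its window is closed.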

Final output

As it is, this visualisation is fine. Unnecessary clutter, like the y-axis label or the axes themselves, has been removed and the plot clearly displays the data. In my opinion, though, it can be improved upon. With a vector graphics editor, certain elements can be changed in an attempt to create a better-looking plot.

I used Inkscape to change the font of the text to something I find more aesthetically pleasing. I increased the size of the title and added a brief description to tell the viewer what they are looking at (I did, however, forget to add the year or date the data is from). I also changed the background to a near-black colour and the text to white to really make the figure pop. The final element I adjusted was the legend, lining the text up with the colours and moving the entries closer together. The final output is a much cleaner and easier-to-read standalone visualisation (in my opinion, of course).

Final visualisation for the data.
