Data Exploration of historical Olympics dataset

In this notebook I use Python to run some data exploration techniques and share my view of the dataset.

This is a historical dataset on the modern Olympic Games, including all the Games from Athens 1896 to Rio 2016.

Note that the Winter and Summer Games were held in the same year up until 1992. After that, they were staggered so that the Winter Games occur on a four-year cycle starting with 1994, then Summer in 1996, then Winter in 1998, and so on.

I run my analysis primarily on the Summer Olympics.

The data were scraped in May 2018. The dataset can also be accessed from Kaggle.

Dataset Content

The file athlete_events.csv contains 271116 rows and 15 columns. Each row corresponds to an individual athlete competing in an individual Olympic event (athlete-events). The columns are the following:

Index of contents

Importing Dataset

Using the library Opendatasets I downloaded the dataset as follows

Let us read the data within the pandas dataframe

Let us check the size of the dataset
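As a minimal sketch of these two steps, the snippet below reads a CSV into a pandas dataframe and checks its shape. The rows are made up for illustration and the column names are an assumption based on the public Kaggle dataset; with the real file you would pass the path to athlete_events.csv instead of the in-memory sample.

```python
import io

import pandas as pd

# Made-up rows for illustration only; the real file is athlete_events.csv.
# Column names are assumed to match the public Kaggle dataset.
sample_csv = io.StringIO(
    "ID,Name,Sex,Age,Height,Weight,Team,NOC,Games,Year,Season,City,Sport,Event,Medal\n"
    "1,Athlete A,M,24,180,80,Team X,TMX,2016 Summer,2016,Summer,Rio de Janeiro,Swimming,Example Event 1,Gold\n"
    "2,Athlete B,F,,165,,Team Y,TMY,2016 Summer,2016,Summer,Rio de Janeiro,Judo,Example Event 2,\n"
    "3,Athlete C,M,31,,95,Team X,TMX,2014 Winter,2014,Winter,Sochi,Biathlon,Example Event 3,Silver\n"
)

# With the real data this would be pd.read_csv("athlete_events.csv")
olympics_df = pd.read_csv(sample_csv)

# shape is a (rows, columns) tuple
n_rows, n_cols = olympics_df.shape
print(f"{n_rows} rows, {n_cols} columns")
```

Empty fields in the CSV are read as NaN, which is why the cleaning step that follows is needed.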

Data Preparation & Cleaning

Age, Height and Weight are numerical columns, so I replace their missing (NaN) values with zero.
For Medal, I replace the NaN values with None.
I also convert the Age field to integer.
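The cleaning steps above can be sketched as follows. This is a minimal example on a made-up dataframe, assuming the column names Age, Height, Weight and Medal and assuming that "None" is stored as a string label. Keep in mind that filling with zero means later filters on Age, Height or Weight need to exclude zeros.

```python
import pandas as pd

# Made-up rows with missing values, for illustration only
df = pd.DataFrame({
    "Age": [24.0, float("nan"), 31.0],
    "Height": [180.0, 170.0, float("nan")],
    "Weight": [float("nan"), 60.0, 82.0],
    "Medal": ["Gold", None, None],
})

# Replace missing numerical values with zero
numeric_cols = ["Age", "Height", "Weight"]
df[numeric_cols] = df[numeric_cols].fillna(0)

# Replace missing medals with the string label "None"
df["Medal"] = df["Medal"].fillna("None")

# Age has no NaN left, so it can be converted to integer
df["Age"] = df["Age"].astype(int)

print(df)
```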

Exploratory Analysis and Visualization

Before we ask questions on the Olympic dataset, it would help to understand the participants’ demographics, i.e., country, age, gender etc. It’s essential to explore these variables to understand how representative the participants are of the worldwide sports community.

Top countries participating in Olympics

I use the seaborn library to build the visualization.

As the USA has historically won the highest number of medals, it makes sense that participation is highest from the US. Surprisingly, the Soviet Union is not present in the list of top 10 countries.
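The ranking behind such a chart can be sketched with value_counts. This is a minimal example on made-up rows, assuming participation is counted as the number of athlete-event rows per Team; counting unique athletes instead would give different numbers.

```python
import pandas as pd

# Made-up participation rows, for illustration only
df = pd.DataFrame({
    "Team": ["United States", "France", "United States",
             "Italy", "France", "United States"],
})

# value_counts sorts in descending order, so head(n) gives the top n teams
# (use head(10) on the real data for a top 10)
top_countries = df["Team"].value_counts().head(2)

print(top_countries)
```

The resulting index and values can then be passed to a bar chart, for example seaborn's barplot.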

Age Distribution

From the above distribution we observe that most participants are between 22–26 years old, which makes sense, as younger athletes are likely to perform better in physically demanding sports.
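One way to check such a claim numerically is to bin the ages and count each bin. The sketch below uses made-up ages and bin edges that I chose for illustration. It also drops ages equal to zero, on the assumption that zero marks a missing age after the cleaning step.

```python
import pandas as pd

# Made-up ages, for illustration only; 0 stands for a missing age
ages = pd.Series([0, 19, 22, 23, 24, 25, 26, 27, 30, 35], name="Age")

# Exclude the zero placeholder so it does not distort the distribution
known_ages = ages[ages > 0]

# Hypothetical bin edges; pd.cut builds right-inclusive intervals by default
bins = [10, 18, 22, 26, 30, 40]
age_groups = pd.cut(known_ages, bins=bins)

# Count athletes per age group, ordered by interval
counts = age_groups.value_counts().sort_index()
busiest_group = counts.idxmax()

print(counts)
print("Most common age group:", busiest_group)
```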

Gender Distribution

Male athletes seem to dominate in terms of participation. Let us check the female participation from 1900 to 2016.

Female Participation Over The Years

Although female participation is 27.5% overall, it has increased significantly over the years, as displayed above.
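A yearly female share like this can be sketched with a boolean column and groupby. The rows below are made up, and the share is computed over athlete-event rows, which is an assumption; a share over unique athletes would need deduplication first.

```python
import pandas as pd

# Made-up rows, for illustration only
df = pd.DataFrame({
    "Year": [1900, 1900, 1900, 1900, 2016, 2016, 2016, 2016],
    "Sex": ["M", "M", "M", "F", "M", "F", "F", "M"],
})

# The mean of a boolean column is the fraction of True values,
# so this gives the percentage of female entries per year
is_female = df["Sex"].eq("F")
female_share = (is_female.groupby(df["Year"]).mean() * 100).round(1)

print(female_share)
```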

Participants across seasons

Sportspersons participating across seasons

Why do the Winter Olympics have fewer participants? Let’s explore the sports and events across the Winter and Summer Olympics.

According to the above data, the Summer Olympics have 52 sports and 651 events, whereas the Winter Olympics have 17 sports and 119 events. This explains the higher number of participants in the Summer Olympics.
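Counts of distinct sports and events per season can be obtained with groupby and nunique. The sketch below uses made-up rows and assumes the column names Season, Sport and Event.

```python
import pandas as pd

# Made-up rows, for illustration only
df = pd.DataFrame({
    "Season": ["Summer", "Summer", "Summer", "Winter", "Winter"],
    "Sport": ["Swimming", "Swimming", "Judo", "Biathlon", "Biathlon"],
    "Event": [
        "Swimming Men's 100 metres Freestyle",
        "Swimming Men's 200 metres Freestyle",
        "Judo Men's Heavyweight",
        "Biathlon Men's 10 kilometres Sprint",
        "Biathlon Men's 10 kilometres Sprint",
    ],
})

# Number of distinct sports and events in each season
season_summary = df.groupby("Season")[["Sport", "Event"]].nunique()

print(season_summary)
```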

Asking and Answering Questions

We’ve already gained several insights about the participants involved in Olympics. Let’s ask some specific questions and try to answer them using data frame operations and visualizations.

Q: Which countries won the most Gold medals in the most recent Olympics?

The US seems to lead the Gold medal chart for the most recent Olympics, held in 2016. I am curious to know which sport fetched the most Gold medals.

It seems that Swimming brought the US the most Gold medals. Below is the visual representation of the same.
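The two steps above (Gold medals per country in 2016, then the sport breakdown for the leading country) can be sketched as below. The rows are made up and the counts are per athlete-event row, which is an assumption: every member of a winning team has their own row, so team events weigh more than individual ones unless duplicates are removed first.

```python
import pandas as pd

# Made-up rows, for illustration only
df = pd.DataFrame({
    "Year": [2016, 2016, 2016, 2016, 2012],
    "Team": ["United States", "United States", "United States",
             "Great Britain", "China"],
    "Sport": ["Swimming", "Swimming", "Athletics", "Rowing", "Diving"],
    "Medal": ["Gold", "Gold", "Gold", "Gold", "Gold"],
})

# Keep only Gold medal rows from the 2016 Games
gold_2016 = df[(df["Year"] == 2016) & (df["Medal"] == "Gold")]

# Gold medal rows per team, highest first
gold_by_team = gold_2016["Team"].value_counts()
top_team = gold_by_team.idxmax()

# Sport breakdown for the leading team
gold_by_sport = gold_2016.loc[gold_2016["Team"] == top_team, "Sport"].value_counts()

print(gold_by_team)
print(gold_by_sport)
```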

Highest medal distribution across top 5 sports

Q: Which countries won the most medals per year?
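A minimal sketch of how the leading country per year could be found, using made-up rows and hypothetical team names. It assumes that non-medal rows carry the string "None" in the Medal column, as in the cleaning step.

```python
import pandas as pd

# Made-up rows with hypothetical team names, for illustration only
df = pd.DataFrame({
    "Year": [2012, 2012, 2012, 2016, 2016, 2016, 2016],
    "Team": ["Team A", "Team A", "Team B", "Team A", "Team B", "Team B", "Team B"],
    "Medal": ["Gold", "Silver", "Gold", "None", "Bronze", "Gold", "Silver"],
})

# Keep medal-winning rows only
medals = df[df["Medal"] != "None"]

# Medal rows per (Year, Team)
counts = medals.groupby(["Year", "Team"]).size().reset_index(name="Medals")

# Sort by medal count, keep the first (largest) row per year, then order by year
top_per_year = (
    counts.sort_values("Medals", ascending=False)
    .drop_duplicates("Year")
    .sort_values("Year")
    .reset_index(drop=True)
)

print(top_per_year)
```

If two teams tie in a year, this keeps only one of them, so ties would need separate handling.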

The following are the insights we derive from the above visuals

Q: Top 10 individuals winning the maximum number of Olympic medals for their country
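A ranking of individual medal winners can be sketched by grouping medal rows by athlete and team. The names and rows below are made up, and "None" is assumed to mark rows without a medal.

```python
import pandas as pd

# Made-up rows with hypothetical athlete names, for illustration only
df = pd.DataFrame({
    "Name": ["Athlete P", "Athlete P", "Athlete P",
             "Athlete Q", "Athlete Q", "Athlete R"],
    "Team": ["Team X", "Team X", "Team X", "Team Y", "Team Y", "Team Z"],
    "Medal": ["Gold", "Gold", "Silver", "Gold", "Bronze", "None"],
})

# Keep medal-winning rows only
medals = df[df["Medal"] != "None"]

# Medals per athlete and team, highest first
# (use head(10) on the real data for a top 10)
top_athletes = (
    medals.groupby(["Name", "Team"])
    .size()
    .sort_values(ascending=False)
    .head(2)
)

print(top_athletes)
```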

Michael Fred Phelps

Q: Spread of Medal based on Age, Height and Weight

An interesting observation is that athletes shorter than 140 cm and younger than 20 are also winning medals at the Olympics. Let us see in which sports they won these medals.

It is clearly visible that shorter and younger athletes seem to have done well in Gymnastics.
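A filter like this can be sketched as below, on made-up rows. Because missing heights and ages were filled with zero during cleaning, the sketch also requires both values to be greater than zero; otherwise rows with unknown height or age would pass the "less than" checks.

```python
import pandas as pd

# Made-up rows, for illustration only; 0 stands for a missing value
df = pd.DataFrame({
    "Height": [135, 0, 150, 138, 136, 137, 139],
    "Age": [16, 15, 17, 18, 25, 15, 14],
    "Medal": ["Gold", "Silver", "Gold", "None", "Bronze", "Silver", "Bronze"],
    "Sport": ["Gymnastics", "Diving", "Swimming", "Gymnastics",
              "Gymnastics", "Gymnastics", "Diving"],
})

# Medalists with a known height under 140 and a known age under 20
mask = (
    (df["Medal"] != "None")
    & (df["Height"] > 0) & (df["Height"] < 140)
    & (df["Age"] > 0) & (df["Age"] < 20)
)

# Medal rows per sport for this group
sports_of_young_short_medalists = df.loc[mask, "Sport"].value_counts()

print(sports_of_young_short_medalists)
```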

Age and Weight

Let us have a look at the medalists with high weight and also the ones with low weight.

Height and Weight

Let us evaluate the athletes with high weight.

As I observe, most of these sports are Wrestling, Weightlifting and Judo.

Q: Women’s participation at the Olympics

As the trend shows, women’s participation has, on average, been increasing over the years.

Medals won by individuals aged over 50

Individuals above 50 have won medals in Equestrianism, Shooting, Sailing, Art Competitions and Archery. These sports seem to require more mental strength than physical strength.

Inferences and Conclusions

We’ve drawn many inferences from the dataset. Here’s a summary of a few of them:


Check out the following resources to learn more about the dataset and tools used in this notebook:

The entire notebook can be accessed here. I would like to thank Aakash N S and the @Jovian Community for providing all the necessary training through the Zero to Pandas course.

I really hope you guys learned something from this post. Feel free to 👏 if you liked the content. This keeps me motivated.

Thank you for reading the post. Happy Learning 😃

