In this notebook I use Python to run some data exploration techniques and share my view of the dataset.
This is a historical dataset on the modern Olympic Games, including all the Games from Athens 1896 to Rio 2016.
Note that the Winter and Summer Games were held in the same year up until 1992. After that, they were staggered so that the Winter Games occur on a four-year cycle starting with 1994, then Summer in 1996, then Winter in 1998, and so on.
I run my analysis primarily on the Summer Olympics.
The file athlete_events.csv contains 271116 rows and 15 columns. Each row corresponds to an individual athlete competing in an individual Olympic event (athlete-events). The columns are the following:
- ID — Unique number for each athlete
- Name — Athlete’s name
- Sex — M or F
- Age — Integer
- Height — In centimeters
- Weight — In kilograms
- Team — Team name
- NOC — National Olympic Committee 3-letter code
- Games — Year and season
- Year — Integer
- Season — Summer or Winter
- City — Host city
- Sport — Sport
- Event — Event
- Medal — Gold, Silver, Bronze, or NA.
Index of contents
- Importing Dataset
- Data Preparation & Cleaning
- Exploratory Analysis and Visualization
  - Top countries participating in Olympics
  - Age Distribution
  - Gender Distribution
  - Participants across seasons
- Asking and Answering Questions
  - Q1: Which countries won the most gold medals at the most recent Olympics?
  - Q2: Which countries won the most medals each year?
  - Q3: Top 10 individuals winning the most Olympic medals for their country
  - Q4: Spread of medals by Age, Height, and Weight
  - Q5: Women's participation at the Olympics
  - Medals won by individuals aged over 50
- Inferences and Conclusions
- References and Future Work
Using the opendatasets library, I downloaded the dataset as follows:
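The original download cell is not reproduced here, so the following is only a sketch of how it might look. The URL is the Kaggle link from the references at the end of this post; the function name `download_olympics_data` and the constant `DATASET_URL` are made up for illustration. Running it needs network access, and opendatasets will ask for Kaggle credentials.

```python
# URL taken from the references section of this post.
DATASET_URL = (
    "https://www.kaggle.com/heesoo37/"
    "120-years-of-olympic-history-athletes-and-results"
)


def download_olympics_data(data_dir="."):
    """Download the Kaggle dataset into data_dir (hypothetical helper)."""
    # Imported lazily so this file still loads where opendatasets is missing.
    import opendatasets as od

    od.download(DATASET_URL, data_dir=data_dir)


# Uncomment to actually download (needs network access and a Kaggle API key):
# download_olympics_data()
```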
Let us read the data into a pandas DataFrame.
Let us check the size of the dataset.
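As a rough sketch of these two steps, the snippet below reads a CSV with the same 15 columns and prints its shape. To keep it runnable without the download, it reads three made-up rows from a string; with the real file you would pass the path of athlete_events.csv to `pd.read_csv` instead. The name `olympics_df` is my own choice.

```python
import io

import pandas as pd

# Three made-up rows in the same 15-column layout as athlete_events.csv.
SAMPLE_CSV = """ID,Name,Sex,Age,Height,Weight,Team,NOC,Games,Year,Season,City,Sport,Event,Medal
1,Athlete A,M,24,180,80,Team X,XXX,1992 Summer,1992,Summer,Barcelona,Basketball,Basketball Men's Basketball,NA
2,Athlete B,F,21,,,Team Y,YYY,2016 Summer,2016,Summer,Rio de Janeiro,Swimming,Swimming Women's 100 metres Freestyle,Gold
3,Athlete C,M,,175,70,Team Z,ZZZ,1994 Winter,1994,Winter,Lillehammer,Biathlon,Biathlon Men's 10 kilometres Sprint,NA
"""

# pandas treats the literal "NA" as a missing value by default.
olympics_df = pd.read_csv(io.StringIO(SAMPLE_CSV))

n_rows, n_cols = olympics_df.shape
print(f"{n_rows} rows x {n_cols} columns")  # 3 rows x 15 columns
```

On the full file, the same `shape` call should report the 271116 rows and 15 columns mentioned earlier.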
Data Preparation & Cleaning
Age, Height, and Weight are numerical columns, so I replace their missing values with zero.
For Medal, I replace the NaN values with None.
I also convert the Age field to integer.
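The three cleaning steps could look roughly like this. The sample frame is made up, and I am assuming the Medal placeholder is the string "None"; the original cell may have done it differently.

```python
import numpy as np
import pandas as pd

# Made-up rows with the kinds of gaps found in the real data.
raw_df = pd.DataFrame(
    {
        "Age": [24.0, np.nan, 31.0],
        "Height": [180.0, 165.0, np.nan],
        "Weight": [np.nan, 58.0, 90.0],
        "Medal": ["Gold", np.nan, np.nan],
    }
)

clean_df = raw_df.copy()

# 1. Missing numeric values become zero.
numeric_cols = ["Age", "Height", "Weight"]
clean_df[numeric_cols] = clean_df[numeric_cols].fillna(0)

# 2. Missing medals become the string "None" (assumed placeholder).
clean_df["Medal"] = clean_df["Medal"].fillna("None")

# 3. Age can be cast to integer once it has no NaN left.
clean_df["Age"] = clean_df["Age"].astype(int)

print(clean_df)
```

One thing to keep in mind with this choice: the zeros are placeholders, so any later statistic on Age, Height, or Weight probably wants to filter them out first.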
Exploratory Analysis and Visualization
Before we ask questions on the Olympic dataset, it would help to understand the participants’ demographics, i.e., country, age, gender, etc. It’s essential to explore these variables to understand how representative the participants are of the worldwide sports community.
Top countries participating in Olympics
I use the seaborn library to build the visualization
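The numbers behind such a chart could be computed along these lines. This is a sketch on a made-up sample, and it counts athlete-event rows per `Team`; whether the original chart counted rows or unique athletes is not stated, so treat that as an assumption.

```python
import pandas as pd

# Made-up sample: one row per athlete-event, as in the real dataset.
df = pd.DataFrame(
    {
        "ID": [1, 1, 2, 3, 4, 5],
        "Team": [
            "United States",
            "United States",
            "United States",
            "France",
            "France",
            "Italy",
        ],
    }
)

# Count rows per team (sorted descending) and keep the ten largest.
top_teams = df["Team"].value_counts().head(10)
print(top_teams)
```

The resulting Series (team names as index, counts as values) can then be handed to a bar chart, for example seaborn's `barplot`.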
Since the USA has historically won the largest number of medals, it makes sense that its participation is also the highest. Surprisingly, the Soviet Union is not in the list of top 10 countries.
From the above distribution we observe that most participants are between 22 and 26 years old, which makes sense, as younger athletes are likely to perform better in physically demanding sports.
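One way to put numbers on the age distribution is to bin the ages and count each bin. The ages and bin edges below are made up for illustration, and the zero placeholders from the cleaning step are dropped first.

```python
import pandas as pd

# Made-up ages; 0 stands for a missing age after cleaning.
ages = pd.Series([0, 19, 22, 23, 24, 25, 26, 26, 30, 41], name="Age")

# Drop the zero placeholders before binning.
known_ages = ages[ages > 0]

# Right-closed bins: (17, 21], (21, 26], (26, 30], (30, 45]
bins = [17, 21, 26, 30, 45]
age_counts = pd.cut(known_ages, bins=bins).value_counts().sort_index()
print(age_counts)
```

In this toy sample the (21, 26] bin is the largest, which mirrors the 22 to 26 peak described above.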
Male athletes dominate in terms of participation. Let us check female participation from 1900 to 2016.
Although female participation is 27.5% overall, it has increased significantly over the years, as displayed above.
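The overall share and the per-year share can both be derived from a boolean "is female" column, since the mean of a boolean Series is the fraction of True values. A sketch on made-up rows:

```python
import pandas as pd

# Made-up sample: one row per athlete-event.
df = pd.DataFrame(
    {
        "Year": [1900, 1900, 1900, 1900, 2016, 2016, 2016, 2016],
        "Sex": ["M", "M", "M", "F", "M", "F", "F", "M"],
    }
)

is_female = df["Sex"] == "F"

# Share of rows belonging to female athletes, in percent.
overall_female_share = is_female.mean() * 100

# The same share, computed separately for each year.
female_share_by_year = is_female.groupby(df["Year"]).mean() * 100

print(overall_female_share)
print(female_share_by_year)
```

Note that this counts athlete-event rows, not unique athletes, so someone entered in many events weighs more.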
Participants across seasons
Why do the Winter Olympics have fewer participants? Let's explore the sports and events across the Winter and Summer Olympics.
As per the above data, there are 52 sports and 651 events in the Summer Olympics, whereas there are 17 sports and 119 events in the Winter Olympics. Hence the Summer Olympics have a higher number of participants.
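Counts like these can be obtained by grouping on `Season` and counting distinct values of `Sport` and `Event`. A small sketch with made-up rows (the event names are shortened for readability):

```python
import pandas as pd

# Made-up sample rows.
df = pd.DataFrame(
    {
        "Season": ["Summer", "Summer", "Summer", "Winter", "Winter"],
        "Sport": ["Swimming", "Swimming", "Judo", "Biathlon", "Biathlon"],
        "Event": [
            "100m Freestyle",
            "200m Freestyle",
            "Half-Lightweight",
            "10km Sprint",
            "10km Sprint",
        ],
    }
)

# Number of distinct sports and events per season.
season_summary = df.groupby("Season")[["Sport", "Event"]].nunique()
print(season_summary)
```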
Asking and Answering Questions
We’ve already gained several insights about the participants involved in Olympics. Let’s ask some specific questions and try to answer them using data frame operations and visualizations.
Q: Which countries won the most gold medals at the most recent Olympics?
The US leads the gold medal chart for the most recent Olympics in the dataset, held in 2016. I am curious to know which sport brought in the most gold medals.
It seems that Swimming brought the US the most gold medals. Below is the visual representation of the same.
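A sketch of how both steps might be computed: first gold medals per country for the latest year, then gold medals per sport for the leading country. The rows are made up and I group by `NOC`. Since every row is one athlete in one event, a team gold shows up once per team member, so these are medal rows, not official medal-table counts.

```python
import pandas as pd

# Made-up sample rows.
df = pd.DataFrame(
    {
        "Year": [2016, 2016, 2016, 2016, 2016, 2012],
        "NOC": ["USA", "USA", "USA", "GBR", "USA", "USA"],
        "Sport": [
            "Swimming",
            "Swimming",
            "Athletics",
            "Cycling",
            "Swimming",
            "Swimming",
        ],
        "Medal": ["Gold", "Gold", "Gold", "Gold", "Silver", "Gold"],
    }
)

# Gold-medal rows from the most recent year in the data.
latest_year = df["Year"].max()
gold_latest = df[(df["Year"] == latest_year) & (df["Medal"] == "Gold")]

# Step 1: gold-medal rows per country.
gold_by_noc = gold_latest["NOC"].value_counts()
top_noc = gold_by_noc.idxmax()

# Step 2: for the leading country, gold-medal rows per sport.
gold_by_sport = gold_latest.loc[gold_latest["NOC"] == top_noc, "Sport"].value_counts()

print(top_noc)
print(gold_by_sport)
```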
Q: Which countries won the most medals each year?
The following insights can be derived from the above visuals:
- The US has won the most medals in the greatest number of years.
- The Soviet Union has the highest medal count in a single Olympic Games.
- The variance in the winning medal count is highest for the Soviet Union.
- Germany, Sweden, Great Britain, and Greece have each won the most medals once.
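The table behind insights like these could be built by counting medal rows per year and team, then keeping the top team for each year. This is a sketch on made-up rows, with ties broken arbitrarily:

```python
import pandas as pd

# Made-up sample rows.
df = pd.DataFrame(
    {
        "Year": [1980, 1980, 1980, 1984, 1984, 1984],
        "Team": [
            "Soviet Union",
            "Soviet Union",
            "East Germany",
            "United States",
            "United States",
            "Romania",
        ],
        "Medal": ["Gold", "Silver", "Gold", "Gold", "Bronze", "None"],
    }
)

# Keep only rows that carry an actual medal.
medals = df[df["Medal"].isin(["Gold", "Silver", "Bronze"])]

# Medal rows per (year, team).
counts = medals.groupby(["Year", "Team"]).size().reset_index(name="Medals")

# Sort so the largest count per year comes first, then keep that row.
top_per_year = (
    counts.sort_values(["Year", "Medals"], ascending=[True, False])
    .drop_duplicates("Year")
    .set_index("Year")
)
print(top_per_year)

# How many years each team finished on top.
years_on_top = top_per_year["Team"].value_counts()
```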
Q: Top 10 individuals winning the most Olympic medals for their country
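A ranking like this can be sketched by counting medal rows per athlete. The names below are made up; grouping on `ID` together with `Name` and `NOC` is my assumption, chosen to keep two athletes with the same name apart.

```python
import pandas as pd

# Made-up sample rows.
df = pd.DataFrame(
    {
        "ID": [1, 1, 1, 2, 2, 3],
        "Name": ["Swimmer A"] * 3 + ["Gymnast B"] * 2 + ["Runner C"],
        "NOC": ["USA", "USA", "USA", "URS", "URS", "JAM"],
        "Medal": ["Gold", "Gold", "Silver", "Gold", "None", "Bronze"],
    }
)

# Keep only rows that carry an actual medal.
medals = df[df["Medal"].isin(["Gold", "Silver", "Bronze"])]

# Medal rows per athlete, largest first, top ten only.
top_athletes = (
    medals.groupby(["ID", "Name", "NOC"])
    .size()
    .sort_values(ascending=False)
    .head(10)
    .reset_index(name="Medals")
)
print(top_athletes)
```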
Q: Spread of medals by Age, Height, and Weight
An interesting observation is that athletes with a height below 140 cm and an age below 20 are also winning medals at the Olympics. Let us see in which sports they won these medals.
It is clearly visible that shorter and younger athletes have done well in Gymnastics.
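The filter described above (medalists under 140 cm and under 20 years) can be written with boolean masks. A sketch on made-up rows; zeros are excluded explicitly because they stand for missing values after cleaning.

```python
import pandas as pd

# Made-up sample rows.
df = pd.DataFrame(
    {
        "Age": [15, 16, 14, 0, 25, 17],
        "Height": [136, 138, 139, 0, 190, 135],
        "Sport": [
            "Gymnastics",
            "Gymnastics",
            "Diving",
            "Athletics",
            "Basketball",
            "Gymnastics",
        ],
        "Medal": ["Gold", "Silver", "Bronze", "Gold", "Gold", "None"],
    }
)

is_medalist = df["Medal"].isin(["Gold", "Silver", "Bronze"])
# Zero means "unknown" here, so it must not count as short or young.
is_short = (df["Height"] > 0) & (df["Height"] < 140)
is_young = (df["Age"] > 0) & (df["Age"] < 20)

# Sports in which short, young athletes won medals.
sports_of_short_young = df.loc[
    is_medalist & is_short & is_young, "Sport"
].value_counts()
print(sports_of_short_young)
```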
Age and Weight
Let us have a look at the medalists with the highest weight and also those with the lowest weight.
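For pulling out the extremes, pandas offers `nlargest` and `nsmallest`. The sketch below uses made-up rows and again drops the zero placeholders so that they do not show up as the lightest athletes.

```python
import pandas as pd

# Made-up sample rows; Weight 0.0 stands for a missing value.
df = pd.DataFrame(
    {
        "Name": ["A", "B", "C", "D", "E"],
        "Sport": ["Judo", "Gymnastics", "Wrestling", "Athletics", "Weightlifting"],
        "Weight": [170.0, 30.0, 160.0, 0.0, 150.0],
        "Medal": ["Gold", "Silver", "Bronze", "Gold", "None"],
    }
)

# Medalists with a known weight.
medalists = df[df["Medal"].isin(["Gold", "Silver", "Bronze"]) & (df["Weight"] > 0)]

heaviest = medalists.nlargest(2, "Weight")
lightest = medalists.nsmallest(1, "Weight")

print(heaviest[["Name", "Sport", "Weight"]])
print(lightest[["Name", "Sport", "Weight"]])
```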
Height and Weight
Let us look at the athletes with high weight.
As I observe, most such sports are Wrestling, Weightlifting, and Judo.
Q: Women's participation at the Olympics
As the trend shows, women's participation has, on average, been increasing over the years.
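One way to trace this trend is to count unique female athletes per year, so that an athlete entered in several events is counted only once. This is a sketch on made-up rows; the original chart may have counted rows instead.

```python
import pandas as pd

# Made-up sample: athlete 1 and athlete 4 each appear in two events.
df = pd.DataFrame(
    {
        "ID": [1, 1, 2, 3, 4, 4, 5, 6],
        "Year": [1900, 1900, 1900, 2016, 2016, 2016, 2016, 2016],
        "Sex": ["F", "F", "M", "F", "F", "F", "F", "M"],
    }
)

women = df[df["Sex"] == "F"]

# Distinct female athletes per year.
women_per_year = women.groupby("Year")["ID"].nunique()
print(women_per_year)
```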
Medals won by individuals aged over 50
Individuals above 50 have been winning medals in Equestrianism, Shooting, Sailing, Art Competitions, and Archery. These sports seem to require more mental strength than physical strength.
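A per-sport summary of medalists over 50 could be sketched as follows, on made-up rows. The output column names `medals` and `oldest` are my own.

```python
import pandas as pd

# Made-up sample rows.
df = pd.DataFrame(
    {
        "Age": [52, 61, 55, 23, 72, 51],
        "Sport": [
            "Equestrianism",
            "Shooting",
            "Equestrianism",
            "Swimming",
            "Art Competitions",
            "Sailing",
        ],
        "Medal": ["Gold", "Silver", "Bronze", "Gold", "None", "Gold"],
    }
)

# Medalists older than 50.
senior_medalists = df[
    df["Medal"].isin(["Gold", "Silver", "Bronze"]) & (df["Age"] > 50)
]

# Medal rows and oldest medalist per sport, most medals first.
senior_summary = (
    senior_medalists.groupby("Sport")["Age"]
    .agg(medals="count", oldest="max")
    .sort_values("medals", ascending=False)
)
print(senior_summary)
```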
Inferences and Conclusions
We’ve drawn many inferences from the analysis. Here’s a summary of a few of them:
- The US seems to dominate in terms of gold medals as well as overall participation in the Games.
- We observe athletes from the age of 12 till the age of 58 years winning medals.
- The Summer Olympics have a higher number of events and sports than the Winter Olympics.
- In the 120-year history of the Olympics, Michael Fred Phelps, II has won the most medals for his country, i.e. 28 medals.
- We see that women's participation has been on an upward trend across the years.
- Participants with high weight (e.g. > 150 kg) seem to have done well in Wrestling, Weightlifting, and Judo.
Check out the following resources to learn more about the dataset and tools used in this notebook:
- 120 Years of Olympic History: https://www.kaggle.com/heesoo37/120-years-of-olympic-history-athletes-and-results
- Pandas user guide: https://pandas.pydata.org/docs/user_guide/index.html
- Matplotlib user guide: https://matplotlib.org/3.3.1/users/index.html
- Seaborn user guide & tutorial: https://seaborn.pydata.org/tutorial.html
- opendatasets Python library: https://github.com/JovianML/opendatasets
I really hope you guys learned something from this post. Feel free to 👏 if you liked the content. This keeps me motivated.
Thank you for reading the post. Happy Learning 😃