The COVID-19 pandemic has had an impact on every aspect of human life, be it physical, psychological or social. Given its far-reaching impact on psychological health, I was curious to explore the effect that the pandemic had on the well-being of people at a global level.
Since 2012, the World Happiness Report has been released year after year providing a review of the state of happiness in the world. These reports are based on the Gallup World Poll data which assesses subjective well-being by asking a simple question worded in English as “Please imagine a ladder, with steps numbered from 0 at the bottom to 10 at the top. The top of the ladder represents the best possible life for you and the bottom of the ladder represents the worst possible life for you. On which step of the ladder would you say you personally feel you stand at this time?” This measure is also referred to as Cantril life ladder. The poll data for each country is the national average response to this question of life evaluations, year on year. The scores provided by the WHR for each year are an average of that and the previous two years for e.g., the score for 2022 is an average of the scores for 2020, 2021 and 2022 for that country.
In order to analyse the pandemic’s impact on world happiness or well-being, I compared the world happiness scores for each country from three years before the pandemic i.e., 2017 to 2019 to the three years during the pandemic i.e., 2020 to 2022. I aimed to address the following questions:
How have the World Happiness scores for every country fared from 2017 to 2022?
What were the changes in the World Happiness scores for every country and every world region from 2017-2019 to 2020-2022?
The data was taken from the World Happiness Reports from 2018 to 2023. Each year’s report, published on 20th March, reports the poll data gathered in the previous year thus providing the data for 2017 to 2022. Throughout the rest of this project, the years mentioned refer to the year in which the data was gathered and not when the report was published.
The links for accessing individual datasets are provided under the References section
.
Every dataset obtained from the World Happiness Report website had multiple columns representing many variables. For this project, I imported the complete dataset for each year and then selected the variables needed for my analysis - the country names, the regional indicators and the ladder or happiness scores.
Library and Version | Purpose |
---|---|
tidyverse_1.3.2 | for handling data |
readr_2.1.4 | for reading data files e.g., csv files |
readxl_1.4.1 | for reading data files e.g., excel files |
here_1.0.1 | for easy file and directory referencing |
ggplot2_3.4.2 | for data visualisations |
plotly_4.10.1 | for interactive data visualisations |
reshape2_1.4.4 | for transforming data between wide and long formats |
forcats_1.0.0 | for managing character vectors |
if(!require(tidyverse)){install.packages("tidyverse"); library(tidyverse)}
if(!require(readr)){install.packages("readr"); library(readr)}
if(!require(readxl)){install.packages("readxl"); library(readxl)}
if(!require(here)){install.packages("here"); library(here)}
if(!require(ggplot2)){install.packages("ggplot2"); library(ggplot2)}
if(!require(reshape2)){install.packages("reshape2"); library(reshape2)}
if(!require(plotly)){install.packages("plotly"); library(plotly)}
if(!require(forcats)){install.packages("forcats"); library(forcats)}
The variables (taken from the datasets or calculated by wrangling the data) used in this analysis are:
Variable name | Explanation |
---|---|
Country | Country name (130 countries) |
Regional indicator | World region that the country is part of, as per the World Happiness Report’s categorization (10 Regions) |
Year | Year in which data was collected (6 years) |
Happiness score/ Ladder score | Life evaluation score based on the Gallup World Poll data |
avg_2017_2019 | Average Happiness Score for 2017 to 2019 |
avg_2020_2022 | Average Happiness Score for 2020 to 2022 |
mean_diff | Difference of means for 2017-2019 and 2020-2022 |
diff_pos | mean_diff > 0 = TRUE, mean_diff < 0 = FALSE |
#import data file for 2017
<- read_excel(here("data", "whr2018_data.xls"), sheet = 2, range = "A1:K157")
data_2017_raw
#clean data file for 2017
<- data_2017_raw %>%
data_2017 select("Country",
"Happiness score") %>% #select the required columns
rename('2017'='Happiness score') #rename columns for ease of merging data frames
#rename countries in the 2017 data frame to match later years for ease of merging
$Country <- gsub("Northern Cyprus","North Cyprus",as.character(data_2017$Country))
data_2017$Country <- gsub("Hong Kong SAR, China", "Hong Kong S.A.R. of China",as.character(data_2017$Country))
data_2017$Country <- gsub("Trinidad & Tobago", "Trinidad and Tobago",as.character(data_2017$Country)) data_2017
#import data file for 2018
<- read_excel(here("data", "whr2019_data.xls"), sheet = 2, range = "A1:K157")
data_2018_raw
#clean data file for 2018
<- data_2018_raw %>%
data_2018 select("Country",
"Happiness score") %>% #select the required columns
rename('2018'='Happiness score') #rename column for ease of merging data frames
#import data file for 2019
<- read_excel(here("data", "whr2020_data.xls"))
data_2019_raw
#clean data file for 2019
<- data_2019_raw %>%
data_2019 select("Country name",
"Regional indicator",
"Ladder score") %>% #select the required columns
rename('Country'='Country name',
'2019'='Ladder score') #rename columns for ease of merging data frames
#import data file for 2020
<- read_excel(here("data", "whr2021_data.xls"))
data_2020_raw
#clean data file for 2020
<- data_2020_raw %>%
data_2020 select("Country name",
"Ladder score") %>% #select the required columns
rename('Country'='Country name',
'2020'='Ladder score') #rename columns for ease of merging data frames
#import data file for 2021
<- read_excel(here("data", "whr2022_data.xls"), range = "A1:L147")
data_2021_raw
#clean data file for 2021
<- data_2021_raw %>%
data_2021 select("Country",
"Happiness score") %>% #select the required columns
rename('2021'='Happiness score') #rename columns for ease of merging data frames
#remove '*' from countries (*indicated no information available for 2020, 2021; averages are based on the 2019)
$Country <- gsub("\\*","",as.character(data_2021$Country)) data_2021
#import data file for 2022
<- read_excel(here("data" , "whr2023_data.xls"))
data_2022_raw
#clean data file for 2022
<- data_2022_raw %>%
data_2022 select("Country name",
"Ladder score") %>% #select the required columns
rename('Country'='Country name',
'2022'='Ladder score')
#changing Turkey to Turkiye to maintain uniformity across all dataframes
# make a list of the names of the dataframes
<- c("data_2017", "data_2018", "data_2019", "data_2020", "data_2021", "data_2022")
df_names
# Loop through each dataframe
for (i in 1:length(df_names)) {
# Get the current dataframe
<- get(df_names[i])
df
# Replace 'Turkey' with 'Turkiye' in the 'Country' column
$Country[df$Country == "Turkey"] <- "Turkiye"
df
# Assign the updated dataframe back to the original variable
assign(df_names[i], df)
}
#merge data frames for all years
<- list(data_2017, data_2018, data_2019, data_2020, data_2021, data_2022)
df_list <- df_list %>% reduce(full_join, by='Country')
df
#arranging the columns in the correct order
<- df %>% select(`Country`, `Regional indicator`, `2017`, `2018`, `2019`, `2020`, `2021`, `2022`)
df
#view first 6 rows of the dataframe
head(df)
The first question analysed the World Happiness score data for every country for trends across the six years. A choropleth map was created to visualise this.
#convert wide data frame to long data frame
<- df %>% select("Country", "2017", "2018", "2019", "2020", "2021", "2022") %>%
df_map melt(id='Country')
#rename columns for easier understanding
<- rename(df_map, 'Year' = 'variable', 'Happiness' = 'value')
df_map
#organize data frame by country instead of year
<- df_map[order(df_map$Country), ]
df_map
#round off happiness score column at 3 decimal places
$Happiness <- round(df_map$Happiness,3)
df_map
#drop rows with NA in happiness score column
<- drop_na(df_map, Happiness)
df_map
#view dataset
head(df_map)
#creating a directory for saving all figures
if(!dir.exists(paste0(getwd(),"/figures")))dir.create(paste0(getwd(),"/figures"))
<- paste0(getwd(),"/figures") figures_dir
#fonts
<- list(family = "Times New Roman")
f <- list(family = "Times New Roman", size=20)
f1
#create map visualization of world happiness data using plotly library
#locationmode specifies locations to be used i.e. countries in the world in the case
#frame specifies the column to be used to create the slider
#add trace creates a layer on the map and plots the data from the data frame columns on the map
<- plot_geo(df_map, locationmode = 'country names', frame = ~Year) %>%
covidhappiness_map add_trace(
locations = ~Country,
z = ~Happiness,
color = ~Happiness,
colors = "PuRd",
text = ~paste("Country:", Country,
"<br>Happiness score:", Happiness),
hoverinfo = "text"
%>%
) layout(font = f,
title = list(text='Happiness across the world from 2017 to 2022<br><sup>The impact of COVID-19 pandemic on world happiness<br><sup>Source: World Happiness Reports 2018 - 2023',
x = 0.5, y = 0.9, xanchor = 'center', yanchor = 'top', font=f1),
margin = list(l = 50, r = 50, b = 100, t = 100, pad = 4)
%>%
) colorbar(title = "Happiness Score", limits = c(1,8))
#save the visualisation
::saveWidget(covidhappiness_map, file.path(figures_dir, "map.html"))
htmlwidgets
#print the visualisation
covidhappiness_map
#This visual is created through plotly to be interactive.
#Click on each year in the slider below the visual, to view the data for that year.
#Click on ‘Play’ to cycle through the years.
#Hover the mouse on each country to view its happiness score.
The happiness scores for most counties seemed to have remained in a similar range across the years. No significant changes became apparent. However, data was unavailable for multiple countries for the pandemic years (2020-2022).
In order to better analyse the changes in World Happiness Scores from 2017-2019 to 2020-2022, the difference of means was considered. A diverging bar plot was constructed with the data for each country to visualise the direction and magnitude of difference.
#create a copy of the base dataframe for this visualisation
<- df
avg_df
#rename columns for ease and change numeric names to characters e.g. '2017' to 'year2017'
<- c("Country", "Region", "year2017","year2018","year2019","year2020","year2021","year2022")
col_names colnames(avg_df) <- col_names
#adding columns for calculating the difference of means
<- avg_df %>%
avg_df #create column for average happiness score for 2017-2019 (pre-pandemic)
mutate(avg_2017_2019 = (year2017 + year2018 + year2019)/3) %>%
#create column for average happiness score for 2020-2022 (during pandemic)
mutate(avg_2020_2022 = (year2020 + year2021 + year2022)/3) %>%
#create column for difference of means of 2017-2019 and 2020-2022
mutate(mean_diff = avg_2020_2022-avg_2017_2019) %>%
#create a column in which positive mean diff is TRUE and negative mean diff is FALSE
mutate(diff_pos = mean_diff >= 0)
#round off mean_diff column at 3 decimal places
$mean_diff <- round(avg_df$mean_diff,3)
avg_df
#remove rows with NA in mean_diff column
<- drop_na(avg_df, mean_diff)
tidy_avg_df
#create column having country name followed by mean difference for labelling y axis of the visualisation
<- mutate(tidy_avg_df,
tidy_avg_df Country_Details =
paste(tidy_avg_df$Country,"(",tidy_avg_df$mean_diff,")"))
<- tidy_avg_df %>%
change ggplot(aes(x=mean_diff, y=reorder(Country_Details, mean_diff), fill = diff_pos)) +
geom_col(colour = "black") +
scale_fill_manual (values = c("red","green"), name = "", labels=c("Mean Decrease","Mean Increase")) +
labs (title = "Comparing Happiness in 2017-2019 and 2020-2022",
subtitle = "Changes from pre-COVID years to during COVID pandemic years",
caption = "Source: World Happiness Reports 2018 - 2023",
x = "Changes from 2017-2019 to 2020-2022",
y = NULL) +
xlim (-2,2) +
theme(text=element_text(family = "serif"),
panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank(),
plot.title = element_text(size = 16, face = "bold"),
axis.ticks.y = element_blank(),
legend.position = "top",
legend.justification = "right",
legend.key.width = unit(0.5, unit = "cm"),
legend.key.height = unit(0.5, unit = "cm"))
#saving the visualisation
ggsave(filename = "country_wise_change.png",
plot = change,
path = figures_dir,
width = 10, height = 20, units = "in")
#print the visualisation
change
Almost equal number of countries saw an increase and decrease in happiness scores across the world. Overall, barring a few exceptions (e.g., Lebanon = –1.745), the mean difference between the two sets of years was quite small for most countries.
Taking the analysis further, the changes in World Happiness scores from 2017-2019 to 2020-2022 were considered at the level of world regions. A boxplot was constructed to visualise the mean difference of countries as per each world region.
<- ggplot(tidy_avg_df, aes(x=Region, y=mean_diff, fill=Region))
region_wise_1 <-
region_wise_plot_1 +
region_wise_1 geom_boxplot() +
labs(title = "Region-wise changes in happiness",
subtitle = "Plotting the mean difference between 2017-2019 and 2020-2022",
caption = "Source: World Happiness Report 2018 to 2023") +
xlab("World Region")+
ylab("Mean Difference")+
ylim (-2,1) +
theme(legend.position="none",
axis.text.x=element_text(angle = 45,hjust=1, size=8),
text=element_text(family = "serif"),
plot.title = element_text(hjust = 0.5),
plot.subtitle = element_text(hjust = 0.5)) +
scale_fill_brewer(palette = "Set3")
#save the visualisation
ggsave(filename = "region_wise_change.png",
plot = region_wise_plot_1,
path = figures_dir,
width = 8, height = 7, units = "in")
#print the visualisation
region_wise_plot_1
The mean difference for South Asia showed a negative trend more than other regions, suggesting that more countries in this region experienced a decrease in happiness compared to other regions. Central and Eastern Europe as well as the Commonwealth of Independent States showed a stronger positive trend suggesting an increase in happiness. However, the median scores for all regions were very close to zero suggesting that even at a world region level there was no significant increase or decrease in world happiness.
The project analysed the World Happiness scores from 2017 to 2022, to consider how wellbeing was affected across the world in light of the COVID-19 pandemic. The data was analysed and visualised to look for trends and changes at a global, world region and country level. Overall, the data suggested that that psychological wellbeing levels held their ground despite the anxiety and stress created by the global crisis.
The findings of this project were quite counterintuitive. So, further I would analyse the explanatory factors within the World Happiness Reports (e.g., social support, perceptions of corruption) to see what factors played a role in maintaining the world’s resilience. In terms of visualisation, I would create a more interactive dashboard that allows the viewer to choose variables (countries, regions, years, explanatory factors) that would then be visualised.
2017 data is from -
Helliwell, J., Layard, R., & Sachs, J. (2018). World Happiness Report 2018, New York: Sustainable Development Solutions Network. https://worldhappiness.report/ed/2018/#appendices-and-data
2018 data is from -
Helliwell, J., Layard, R., & Sachs, J. (2019). World Happiness Report 2019, New York: Sustainable Development Solutions Network. https://worldhappiness.report/ed/2019/#appendices-and-data
2019 data is from -
Helliwell, John F., Richard Layard, Jeffrey Sachs, and Jan-Emmanuel De Neve, eds. 2020. World Happiness Report 2020. New York: Sustainable Development Solutions Network https://worldhappiness.report/ed/2020/#appendices-and-data
2020 data is from -
Helliwell, John F., Richard Layard, Jeffrey Sachs, and Jan-Emmanuel De Neve, eds. 2021. World Happiness Report 2021. New York: Sustainable Development Solutions Network. https://worldhappiness.report/ed/2021/#appendices-and-data
2021 data is from -
Helliwell, J. F., Layard, R., Sachs, J. D., De Neve, J.-E., Aknin, L. B., & Wang, S. (Eds.). (2022). World Happiness Report 2022. New York: Sustainable Development Solutions Network. https://worldhappiness.report/ed/2022/#appendices-and-data
2022 data is from -
Helliwell, J. F., Layard, R., Sachs, J. D., Aknin, L. B., De Neve, J.-E., & Wang, S. (Eds.). (2023). World Happiness Report 2023 (11th ed.). Sustainable Development Solutions Network. https://worldhappiness.report/ed/2023/#appendices-and-data