<div align="center">
<h1>
Python for Social Science Workshop - Challenge 1
</h1>
</div>
<br />
<div align="center">
<h3>
Jose J Alcocer
</h3>
</div>
<br />
<div align="center">
<h4>
April 11, 2023
</h4>
</div>

****

This mini challenge tests your ability to apply what you have learned to data handling and visualization. We will use a dataset that contains information about asylum applications in the EU from 2011 to 2022. The data were compiled from an interactive dashboard (link [here](https://anonyms.shinyapps.io/asylum/)) created by [D. Toshkov](https://www.dimiter.eu/) (2022). <br>

<br>

Let's start by importing the libraries we will use, then load and view the dataset.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv('Asylum applications in the EU.csv')

In [2]:
df

Unnamed: 0,id,Country,Year,Asylum applicants,Asylum applicants per capita,Asylum applicants per GDP,Asylum applicants (first instance),Asylum applicants (first instance) per capita,Asylum applicants (first instance) per GDP,Total positive decisions,Total positive decisions per capita,Total positive decisions per GDP,Geneva Convention status grants,Geneva Convention status grants per capita,Geneva Convention status grants per GDP
0,AT,Austria,2011,14455,173,47,14455,173,47,4085,49,13,2480,30,8
1,BE,Belgium,2011,32270,293,86,25585,233,68,5075,46,13,3810,35,10
2,BG,Bulgaria,2011,890,12,21,705,10,17,190,3,5,10,0,0
3,CH,Switzerland,2011,23880,303,46,19445,247,38,6445,82,12,3675,47,7
4,CY,Cyprus,2011,1770,211,89,1745,208,88,70,8,4,55,7,3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
373,PT,Portugal,2022,2120,20,9,1980,19,8,675,7,3,615,6,3
374,RO,Romania,2022,12355,65,43,12065,63,42,1025,5,4,490,3,2
375,SE,Sweden,2022,18640,178,33,14075,135,25,3390,32,6,2200,21,4
376,SI,Slovenia,2022,6785,322,115,6645,315,113,205,10,3,40,2,1


1. Let's start by summarizing our DataFrame so that we can see each country's total, average, and highest number of asylum applicants across all years in the dataset. Hint: the `.groupby()` method might help with this.

In [2]:
# Grouping DataFrame by country and calculating the sum, mean, and max of asylum applicants
df.groupby("Country")['Asylum applicants'].agg(['sum','mean','max'])

Country,sum,mean,max
Austria,422805,35233.75,108780
Belgium,314740,26228.333333,44760
Bulgaria,103605,8633.75,20390
Croatia,24550,2455.0,12870
Cyprus,80980,6748.333333,22190
Czechia,15715,1309.583333,1920
Denmark,76930,6410.833333,20970
Estonia,4260,355.0,2945
Finland,76525,6377.083333,32345
France,1168340,97361.666667,156570
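
As an optional variation on the aggregation above, pandas also supports named aggregation, which lets you choose the output column names directly. The sketch below uses a small made-up DataFrame (`toy`) rather than the workshop CSV, and the names `total`, `average`, and `highest` are illustrative choices, not part of the original solution.

```python
import pandas as pd

# Toy data standing in for the workshop CSV (values are made up for illustration)
toy = pd.DataFrame({
    "Country": ["A", "A", "B", "B"],
    "Year": [2011, 2012, 2011, 2012],
    "Asylum applicants": [100, 300, 50, 150],
})

# Named aggregation: each keyword becomes the name of an output column
summary = toy.groupby("Country")["Asylum applicants"].agg(
    total="sum", average="mean", highest="max"
)
print(summary)
```

Descriptive column names can make the summary table easier to read than the default `sum`, `mean`, and `max` labels.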


2. In the original DataFrame, create a new variable that calculates the ratio of positive asylum decisions to the total number of applicants for each country and year.

In [3]:
# Dividing positive decisions by total applicants, row by row (one row per country-year)
df['positive case by total applicants ratio'] = df['Total positive decisions'] / df['Asylum applicants']

df

Unnamed: 0,id,Country,Year,Asylum applicants,Asylum applicants per capita,Asylum applicants per GDP,Asylum applicants (first instance),Asylum applicants (first instance) per capita,Asylum applicants (first instance) per GDP,Total positive decisions,Total positive decisions per capita,Total positive decisions per GDP,Geneva Convention status grants,Geneva Convention status grants per capita,Geneva Convention status grants per GDP,positive case by total applicants ratio
0,AT,Austria,2011,14455,173,47,14455,173,47,4085,49,13,2480,30,8,0.282601
1,BE,Belgium,2011,32270,293,86,25585,233,68,5075,46,13,3810,35,10,0.157267
2,BG,Bulgaria,2011,890,12,21,705,10,17,190,3,5,10,0,0,0.213483
3,CH,Switzerland,2011,23880,303,46,19445,247,38,6445,82,12,3675,47,7,0.269891
4,CY,Cyprus,2011,1770,211,89,1745,208,88,70,8,4,55,7,3,0.039548
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
373,PT,Portugal,2022,2120,20,9,1980,19,8,675,7,3,615,6,3,0.318396
374,RO,Romania,2022,12355,65,43,12065,63,42,1025,5,4,490,3,2,0.082962
375,SE,Sweden,2022,18640,178,33,14075,135,25,3390,32,6,2200,21,4,0.181867
376,SI,Slovenia,2022,6785,322,115,6645,315,113,205,10,3,40,2,1,0.030214
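
One thing worth checking with any ratio column is what happens when the denominator is zero. The sketch below uses made-up numbers (not rows from the dataset) to show how pandas handles this case and one possible way to treat it, assuming you would rather have missing values than infinities in a plot.

```python
import numpy as np
import pandas as pd

# Made-up rows: a normal case, zero over zero, and a positive count over zero
toy = pd.DataFrame({
    "Total positive decisions": [20, 0, 5],
    "Asylum applicants": [100, 0, 0],
})

# pandas does not raise on division by zero: 0/0 gives NaN and 5/0 gives inf
ratio = toy["Total positive decisions"] / toy["Asylum applicants"]

# Replacing infinities with NaN so that both problem rows are treated as missing
ratio_clean = ratio.replace([np.inf, -np.inf], np.nan)
print(ratio_clean)
```

Whether missing values are the right choice depends on the analysis; the point is only that the division itself will not warn you.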


3. Create a copy of the original DataFrame so that you can subset it to include only four countries of your choice.

In [4]:
df2 = df.copy()

# Keeping only the rows whose country is one of the four chosen
df2 = df2[df2['Country'].isin(["France", "Germany", "Greece", "Sweden"])]

df2

Unnamed: 0,id,Country,Year,Asylum applicants,Asylum applicants per capita,Asylum applicants per GDP,Asylum applicants (first instance),Asylum applicants (first instance) per capita,Asylum applicants (first instance) per GDP,Total positive decisions,Total positive decisions per capita,Total positive decisions per GDP,Geneva Convention status grants,Geneva Convention status grants per capita,Geneva Convention status grants per GDP,positive case by total applicants ratio
6,DE,Germany,2011,53345,66,20,45740,57,17,9675,12,4,7100,9,3,0.181367
9,GR,Greece,2011,9310,84,46,9310,84,46,180,2,1,45,0,0,0.019334
12,FR,France,2011,57335,88,28,52140,80,25,4615,7,2,3340,5,2,0.080492
27,SE,Sweden,2011,29710,316,72,29690,315,72,8805,94,21,2335,25,6,0.296365
37,DE,Germany,2012,77650,97,28,64540,80,24,17140,21,6,8765,11,3,0.220734
40,GR,Greece,2012,9575,86,51,9575,86,51,95,1,1,30,0,0,0.009922
43,FR,France,2012,61455,94,29,54280,83,26,8645,13,4,7070,11,3,0.140672
58,SE,Sweden,2012,43945,463,102,43930,463,102,12400,131,29,3745,39,9,0.282171
68,DE,Germany,2013,126995,158,45,109580,136,39,20125,25,7,10915,14,4,0.158471
71,GR,Greece,2013,8225,75,46,7860,71,44,500,5,3,255,2,1,0.06079


4. Plot the newly created ratio variable for all four countries across all years. Hint(s): you do not need to use `datetime` for this; you might need to google 'seaborn hue argument options'.

In [7]:
# Setting theme to ticks only
sns.set_style('ticks')

# Creating the line plot (sns.lineplot returns a matplotlib Axes object)
ax = sns.lineplot(data=df2, x='Year', y='positive case by total applicants ratio',
             hue='Country')

# Removing x-axis margins
ax.margins(x=0)

# Adding title and labels
ax.set_title('Asylum Approvals in the EU', fontdict={'size': 18, 'weight': 'normal'})
ax.set_xlabel('Year', fontdict={'size': 12})
ax.set_ylabel('Ratio of Asylum Approvals to Applicants', fontdict={'size': 12})

# Manually overriding of x-axis ticks to display all years instead of only a few
plt.xticks(range(2011,2023,1))


ValueError: Could not interpret value `positive case by total applicants ratio` for parameter `y`
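
seaborn raises this `ValueError` when the string passed to `y` is not a column of the DataFrame given as `data`. One plausible cause, assuming the cells were run out of order (the cell numbers jump from 4 to 7), is that `df2` was copied before the ratio column was added in step 2. The sketch below shows one way to check for the column and recompute it if it is missing; `ensure_ratio` is a hypothetical helper written for illustration, and `stale` mimics a `df2` without the ratio column using the France 2011 row shown earlier.

```python
import pandas as pd

RATIO_COL = "positive case by total applicants ratio"

def ensure_ratio(frame):
    """Return a copy of frame that contains RATIO_COL, computing it if absent."""
    out = frame.copy()
    if RATIO_COL not in out.columns:
        out[RATIO_COL] = out["Total positive decisions"] / out["Asylum applicants"]
    return out

# A DataFrame that lacks the ratio column, as df2 would if it were copied too early
stale = pd.DataFrame({
    "Country": ["France"],
    "Year": [2011],
    "Asylum applicants": [57335],
    "Total positive decisions": [4615],
})

print(RATIO_COL in stale.columns)   # the check that explains the error
fixed = ensure_ratio(stale)
print(fixed[RATIO_COL])
```

In the notebook itself, re-running the cells from the top in order (so that step 2 runs before step 3) should have the same effect.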