Python for Social Science Workshop - Lesson 3


Jose J Alcocer


April 18, 2023


Univariate and Multivariate Regressions


This lesson covers applications of regression using multiple datasets. The first part gives a brief summary of the linear regression estimator as a refresher.



1.0 Summary of Linear Regression Estimator

What is Linear Regression?

Linear Regression is a statistical method used to capture linear relationships between two (or more) variables.

The model consists of:

  • A dependent or outcome variable (Y) that denotes what we want to explain;
  • An independent or explanatory variable(s) (X) that denotes what we think explains a change in our outcome variable

An example of this can be the effect of education (X) on income (Y).

Choosing the Best Line

We strive to find the best fitting line as it allows us to:

  • Make predictions using new explanatory observations
  • Describe important relationships between explanatory and outcome variables
  • Make causal effect estimates (so long as our explanatory variable is randomly assigned)

The equation that gets us the best fitting line is:

$$ Y_{i} = \beta_{0} + \beta_{1}X_{i} + \varepsilon_{i} $$


Where:

  • The $\beta_{0}$ is the intercept or constant term; this tells us what our Y is when our X = 0,
  • The $\beta_{1}$ is the slope coefficient; this tells us how much Y is affected by X, and
  • The $\varepsilon$ is the error term; this represents all that is unobserved in this equation

How Do We Get the Best Line?

With so many possible lines, how exactly do we select the best one?

We select the combination of intercept ($\beta_{0}$) and coefficient ($\beta_{1}$) values from our line equation ($Y_{i} = \beta_{0} + \beta_{1}X_{i} + \varepsilon_{i}$) that minimizes our sum of squared residuals (SSR).

The sum of squared residuals is none other than the sum of the squared differences between our actual outcomes ($Y_{i}$) and the predicted outcomes ($\hat{Y_{i}}$) generated by the candidate line being tested.

$$ SSR = \sum_{i=1}^{n}(Y_{i} - \hat{Y_{i}})^2 $$


One doesn't have to worry about how this formula is derived to obtain the ideal values, as statistical software does it for us; but if you are interested, this link shows how it is done. In addition, this link provides an excellent review/introduction to regressions for those who may need it.
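To make this concrete, here is a minimal sketch (with made-up numbers, not our dataset) that computes the SSR for two candidate lines; the line with the smaller SSR fits better, and OLS finds the single best intercept/slope pair among all possibilities.

import numpy as np

# Toy data (illustrative values only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

def ssr(b0, b1):
    """Sum of squared residuals for the candidate line y = b0 + b1*x."""
    y_hat = b0 + b1 * x               # predicted outcomes
    return np.sum((y - y_hat) ** 2)   # squared gaps, summed

print(ssr(0.0, 2.0))   # a good candidate line (SSR = 0.11)
print(ssr(0.5, 1.5))   # a worse candidate line (SSR = 7.41)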

Let's apply this estimator in Python, starting with a made-up example and then moving on to real-world datasets.

1.1 Setting Up the Environment

Let's load the library packages we will be using for this lesson. The package that allows us to run regressions in Python is the statsmodels library.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import statsmodels.formula.api as smf

Let's also import the first dataset we will be using.

In [2]:
# Importing the dataset
df = pd.read_csv('Salary_Data.csv')
df.head()
Out[2]:
   YearsExperience  Salary     Sex
0              1.1   39343  Female
1              1.3   46205    Male
2              1.5   37731    Male
3              2.0   43525  Female
4              2.2   39891  Female

2.0 Univariate Regression

The imported dataset reports each person's years of experience at a certain job, their salary, and their sex. Suppose we are interested in understanding how years of work experience affect one's salary. We would have to regress salary on years of experience. Before doing exactly that, let's begin by visualizing the relationship in our data.

We can visualize data by using the seaborn function sns.regplot and adding additional aesthetics.

In [3]:
# Set theme
sns.set_style('ticks')
plot = sns.regplot(x='YearsExperience', y='Salary', data=df, color='black')
# Adding title and labels
plot.set_title('Figure 1', fontdict={'size': 18, 'weight': 'normal'})
plot.set_xlabel('Years of Work Experience', fontdict={'size': 12})
plot.set_ylabel('Salary (Dollars)', fontdict={'size': 12})
[Figure 1: scatter plot of Salary (Dollars) against Years of Work Experience with a fitted regression line]

Our initial visualization suggests a linear relationship between years of work experience and salary. The data also appear roughly homoskedastic (the spread of observations around the line looks constant across values of X). Let's continue by fitting the model and running our regression.

In [4]:
# Running the regression
model = smf.ols('Salary ~ YearsExperience', data=df).fit()

# Presenting a table summary
print(model.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 Salary   R-squared:                       0.957
Model:                            OLS   Adj. R-squared:                  0.955
Method:                 Least Squares   F-statistic:                     622.5
Date:                Tue, 18 Apr 2023   Prob (F-statistic):           1.14e-20
Time:                        12:44:12   Log-Likelihood:                -301.44
No. Observations:                  30   AIC:                             606.9
Df Residuals:                      28   BIC:                             609.7
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
===================================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
Intercept        2.579e+04   2273.053     11.347      0.000    2.11e+04    3.04e+04
YearsExperience  9449.9623    378.755     24.950      0.000    8674.119    1.02e+04
==============================================================================
Omnibus:                        2.140   Durbin-Watson:                   1.648
Prob(Omnibus):                  0.343   Jarque-Bera (JB):                1.569
Skew:                           0.363   Prob(JB):                        0.456
Kurtosis:                       2.147   Cond. No.                         13.2
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

The model.summary() command presents us with a nicely formatted table displaying a lot of information. Let's dissect it.

2.1 Interpreting the Results

2.1.1 Top Section of Regression Results

The top section of the regression summary contains data on what was done and what was calculated in the regression model.

Starting with the left column of information, the summary tells us what our outcome variable of interest was (in this case it was the salary of each individual), followed by the model and method that was used to get the results (in this case, it was an ordinary least squares regression that obtained the intercept and coefficient results by minimizing the sum of squared residuals).

It then displays system information (the time and date that the regression was run) and ends with some information about the dataset used to calculate the results (the total number of observations was 30, and 'Df Residuals' is the number of observations minus the number of parameters being estimated).

The last entry in the left column displays what kind of covariance estimator the model used. Here we have 'nonrobust', the default, which does not account for any heteroskedasticity or autocorrelation present in the data. We will run a regression that accounts for heteroskedasticity later.

The right column of the top section continues to display data related to the regression model built.

Starting with the first two measures, the R-squared tells us how well the line we found fits the data. It gives the proportion of the variation in the outcome that is explained by our regression line. It uses the Sum of Squared Residuals (SSR) we calculated earlier to obtain this value: the SSR is divided by the Total Sum of Squares (TSS) and the result is subtracted from one, as the following equation denotes:

$$ R^2 = 1 - \frac{SSR}{TSS} $$


If you are interested in knowing how the TSS is calculated, this three-minute YouTube video explains it quite well.

Because our R-squared metric ranges from 0 to 1, a value closer to 1 indicates that our regression line explains more of the variation in the outcome. A value closer to 0 suggests that our regression explains little of the relationship between the variables.
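As a quick sanity check, statsmodels exposes the pieces of this formula directly on a fitted model. A sketch (assuming `model` is the fitted result from In [4] above):

ssr = model.ssr            # sum of squared residuals
tss = model.centered_tss   # total sum of squares
print(1 - ssr / tss)       # matches model.rsquared (about 0.957 here)
print(model.rsquared)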

The adjusted R-squared is a modified version of the R-squared that penalizes additional predictors in the model. Here, because we only have one independent variable, our adjusted R-squared differs little from the R-squared. If you are interested in re-familiarizing yourself with these two concepts, the following two links are good resources.

  • Towards Data Science article
  • Statology article

The F-statistic is the ratio of the model mean square to the error mean square. It essentially measures the variation between sample means relative to the variation found within the samples used in the regression. A large F-statistic suggests evidence of a difference between the group means (you can reject the null hypothesis of no effect).

The Prob (F-statistic) is none other than the p-value for the regression as a whole. Per conventional standards, if it is lower than 0.05, then we can reject the null hypothesis of no effect (meaning our results are statistically significant).
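Both of these quantities can be pulled straight from the fitted model object; a quick sketch using the same `model` from In [4]:

print(model.fvalue)    # F-statistic (about 622.5)
print(model.f_pvalue)  # Prob (F-statistic), i.e. its p-value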

We will ignore the last three measures (Log-Likelihood, AIC, and BIC), as we do not need them for our purposes.

2.1.2 Middle Section of Regression Results

The midsection of the regression summary is the most important, as this section displays our intercept and coefficient details. Let's interpret what we are seeing.

The first column displays all the variables that were inputted into the regression model. In this case, we only have our intercept and our years of experience variable.

The intercept is the value our regression equation produces when all independent variables are set to 0.

'YearsExperience' is our variable of interest and here we can see the beta coefficient's effect on salary. One would interpret this value as: an additional year of work experience is associated with an estimated increase of $9,450 dollars on one's salary, holding all else constant (that's a big jump).
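To see what the fitted line implies, here is a small sketch that turns these coefficients into a prediction: for someone with 5 years of experience, the model predicts roughly 25,792 + 9,450 x 5, or about $73,042.

# Predict salary at 5 years of experience using the fitted model
new_obs = pd.DataFrame({'YearsExperience': [5.0]})
print(model.predict(new_obs))   # about 73042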

The second column displays the standard errors (SEs) associated with the beta coefficients in our model. The standard error measures the sampling variation in our estimates: it indicates how far an estimate from our sample (n) is likely to fall from the unobserved true population (N) value. The higher the SE, the larger the variation from our sample to the true population pool.

How does this relate to our estimated effects? If the SE of a coefficient is close in size to the coefficient itself, the estimate is hard to distinguish from zero (a non-significant effect). If the coefficient is much larger than its SE, we can treat the effect as distinguishable from zero (a significant effect). Because our coefficient is large relative to its SE, we can assume that a distinguishable effect is present.

The third column displays the test statistic, which is the coefficient divided by the standard error.

To get our p-value by hand, we would take our test statistic and look it up in the t-distribution table using the residual degrees of freedom. Luckily, the fourth column does that for us and presents the p-value of the beta coefficient estimate. Under arbitrary but conventional norms of significance, if our p-value is less than 0.05, then we can deem our coefficient statistically significant.

The last column displays the 95% confidence interval for each beta coefficient: the range from roughly two standard errors below the estimate to roughly two standard errors above it.
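All of these middle-section numbers can be reproduced by hand from the fitted model. A sketch (scipy is an extra import not loaded above):

from scipy import stats

coef = model.params['YearsExperience']
se = model.bse['YearsExperience']

t_stat = coef / se                                    # test statistic (about 24.95)
p_val = 2 * stats.t.sf(abs(t_stat), model.df_resid)   # two-sided p-value
print(t_stat, p_val)

print(model.conf_int(alpha=0.05))                     # 95% confidence intervals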

2.1.3 Bottom Section of Regression Results

While the bottom section of the results contains multiple measures, the most important one to know is the Durbin-Watson statistic. This measure tells us about autocorrelation in the regression's residuals, or whether the errors are correlated with one another. The statistic ranges from 0 to 4; a value near 2 suggests little to no autocorrelation is present.
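statsmodels can compute this statistic from the residuals directly; a minimal sketch using the fit from In [4]:

from statsmodels.stats.stattools import durbin_watson

print(durbin_watson(model.resid))   # about 1.65, suggesting little autocorrelation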

3.0 Multivariate Regression

Suppose we want to introduce an additional variable into our regression: the sex of the individual. We would then regress salary on years of experience and sex. Before doing exactly that, let's begin by visualizing our data relationship using one continuous and one categorical variable.

While the previous example used sns.regplot() to plot our regression visualization, this time we will use sns.lmplot(), which allows us to add a categorical component. There are two ways to plot the regression lines, depending on how you want to convey the data.

Method A: Both Lines In Same Plot

In [5]:
# Set theme
sns.set_style('ticks')

# Plot itself; height/aspect control the figure size (roughly 8 x 6 inches)
plot = sns.lmplot(x='YearsExperience', y='Salary', data=df, hue='Sex',
                  markers=['o', 'v'], height=6, aspect=1.33)

# Plot title and axis labels
plt.title('Figure 2')
plt.ylabel("Salary")
plt.xlabel("Years of Work Experience")

# Adjust x limits (uncomment to use)
#plt.xlim(1,10)

# Move the legend from the outer margin into the plot area
plt.tight_layout()
[Figure 2: Salary against Years of Work Experience with separate fitted lines for Female and Male]

Method B: Two Different Plots

In [6]:
# Set theme
sns.set_style('ticks')

# Plot itself, with one panel per category; height/aspect control panel size
plot = sns.lmplot(x='YearsExperience', y='Salary', data=df, col='Sex',
                  height=6, aspect=1.33)

# Tighten the layout so titles and labels fit inside the figure
plt.tight_layout()
[Faceted plot: Salary against Years of Work Experience, one panel per Sex, each with its own fitted line]

Now, let's run the regression. There are multiple ways to run a regression with a categorical variable as an additional independent variable. We will cover a few of those ways.

Method 1: Plugging Variables In Directly

Here, our coefficient tells us the relationship between sex and salary. Python automatically chooses one of the categories as the reference (baseline) and reports the coefficient for the other category relative to it. Useful and easy, but not as flexible if you want more control over what actually gets captured.

In [7]:
# Running the regression
model = smf.ols('Salary ~ YearsExperience + Sex' , data=df).fit()

# Presenting a table summary
print(model.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 Salary   R-squared:                       0.958
Model:                            OLS   Adj. R-squared:                  0.955
Method:                 Least Squares   F-statistic:                     306.9
Date:                Tue, 18 Apr 2023   Prob (F-statistic):           2.71e-19
Time:                        12:44:12   Log-Likelihood:                -301.12
No. Observations:                  30   AIC:                             608.2
Df Residuals:                      27   BIC:                             612.5
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
===================================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
Intercept        2.656e+04   2505.089     10.604      0.000    2.14e+04    3.17e+04
Sex[T.Male]     -1625.9575   2138.556     -0.760      0.454   -6013.912    2761.997
YearsExperience  9467.9701    382.375     24.761      0.000    8683.401    1.03e+04
==============================================================================
Omnibus:                        1.685   Durbin-Watson:                   1.656
Prob(Omnibus):                  0.431   Jarque-Bera (JB):                1.198
Skew:                           0.232   Prob(JB):                        0.549
Kurtosis:                       2.138   Cond. No.                         16.0
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Method 2: Specifying A Category

An alternative way to let Python know you are working with a categorical variable is to use the C() operator within the regression formula. As we can see below, we are still letting Python choose the reference category for us; an aside after the results shows how to pick it explicitly.

In [8]:
# Running the regression
model = smf.ols('Salary ~ YearsExperience + C(Sex)' , data=df).fit()

# Presenting a table summary
print(model.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 Salary   R-squared:                       0.958
Model:                            OLS   Adj. R-squared:                  0.955
Method:                 Least Squares   F-statistic:                     306.9
Date:                Tue, 18 Apr 2023   Prob (F-statistic):           2.71e-19
Time:                        12:44:12   Log-Likelihood:                -301.12
No. Observations:                  30   AIC:                             608.2
Df Residuals:                      27   BIC:                             612.5
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
===================================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
Intercept        2.656e+04   2505.089     10.604      0.000    2.14e+04    3.17e+04
C(Sex)[T.Male]  -1625.9575   2138.556     -0.760      0.454   -6013.912    2761.997
YearsExperience  9467.9701    382.375     24.761      0.000    8683.401    1.03e+04
==============================================================================
Omnibus:                        1.685   Durbin-Watson:                   1.656
Prob(Omnibus):                  0.431   Jarque-Bera (JB):                1.198
Skew:                           0.232   Prob(JB):                        0.549
Kurtosis:                       2.138   Cond. No.                         16.0
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
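As an aside, C() also lets us pick the reference category ourselves via patsy's Treatment contrast. A sketch in which 'Male' becomes the omitted baseline, so the reported coefficient is for Female:

# Make 'Male' the reference category explicitly
model_ref = smf.ols('Salary ~ YearsExperience + C(Sex, Treatment(reference="Male"))',
                    data=df).fit()
print(model_ref.params)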

Method 3: Manipulating the DataFrame to Create our Own Set of Dummies

A third way to let Python know you are working with a categorical variable is to create a new dummy variable for 'Female'. There are two ways to do this: (A) the long way, and (B) the short way. Let's cover both.

3A: The Intuitive But Long Way

In [9]:
# Show DataFrame
df.head()
Out[9]:
   YearsExperience  Salary     Sex
0              1.1   39343  Female
1              1.3   46205    Male
2              1.5   37731    Male
3              2.0   43525  Female
4              2.2   39891  Female
In [10]:
# Create an empty list (avoid naming it 'list' -- that shadows the built-in)
female_dummy = []

# Use a for-loop and if-else statement
for x in df['Sex']:                # For every observation (row) in the variable, Sex
    if x == 'Female':              # If the observation (row) is equal to 'Female'
        female_dummy.append(1)     # Add a 1 for that observation
    else:
        female_dummy.append(0)     # If it doesn't say 'Female', add a 0

# Create a new variable and assign the list we created
df['Female'] = female_dummy

# Confirm that the variable is present
print(df)

# Running the regression
model = smf.ols('Salary ~ YearsExperience + Female' , data=df).fit()

# Presenting a table summary
print(model.summary())
    YearsExperience  Salary     Sex  Female
0               1.1   39343  Female       1
1               1.3   46205    Male       0
2               1.5   37731    Male       0
3               2.0   43525  Female       1
4               2.2   39891  Female       1
5               2.9   56642  Female       1
6               3.0   60150    Male       0
7               3.2   54445    Male       0
8               3.2   64445  Female       1
9               3.7   57189    Male       0
10              3.9   63218    Male       0
11              4.0   55794  Female       1
12              4.0   56957  Female       1
13              4.1   57081    Male       0
14              4.5   61111    Male       0
15              4.9   67938    Male       0
16              5.1   66029  Female       1
17              5.3   83088    Male       0
18              5.9   81363    Male       0
19              6.0   93940  Female       1
20              6.8   91738  Female       1
21              7.1   98273  Female       1
22              7.9  101302    Male       0
23              8.2  113812  Female       1
24              8.7  109431  Female       1
25              9.0  105582    Male       0
26              9.5  116969    Male       0
27              9.6  112635    Male       0
28             10.3  122391    Male       0
29             10.5  121872  Female       1
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 Salary   R-squared:                       0.958
Model:                            OLS   Adj. R-squared:                  0.955
Method:                 Least Squares   F-statistic:                     306.9
Date:                Tue, 18 Apr 2023   Prob (F-statistic):           2.71e-19
Time:                        12:44:12   Log-Likelihood:                -301.12
No. Observations:                  30   AIC:                             608.2
Df Residuals:                      27   BIC:                             612.5
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
===================================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
Intercept        2.494e+04   2551.242      9.775      0.000    1.97e+04    3.02e+04
YearsExperience  9467.9701    382.375     24.761      0.000    8683.401    1.03e+04
Female           1625.9575   2138.556      0.760      0.454   -2761.997    6013.912
==============================================================================
Omnibus:                        1.685   Durbin-Watson:                   1.656
Prob(Omnibus):                  0.431   Jarque-Bera (JB):                1.198
Skew:                           0.232   Prob(JB):                        0.549
Kurtosis:                       2.138   Cond. No.                         16.3
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

3B: The Easy Way

In [11]:
# Re-importing the dataset to reset it
df = pd.read_csv('Salary_Data.csv')

# Use pd.get_dummies to generate dummy variables out of the Sex variable
# You can pass the argument 'drop_first=True' to drop the first dummy variable
#   created
Sex = pd.get_dummies(df['Sex'], drop_first=False)

# Join the new columns onto the original DataFrame using 'df.join'
# Unlike concat and merge, join aligns on the index alone; this works here
#   because the dummy frame shares the original DataFrame's index
df = df.join(Sex)

# Running the regression
model = smf.ols('Salary ~ YearsExperience + Female' , data=df).fit()

# Presenting a table summary
print(model.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 Salary   R-squared:                       0.958
Model:                            OLS   Adj. R-squared:                  0.955
Method:                 Least Squares   F-statistic:                     306.9
Date:                Tue, 18 Apr 2023   Prob (F-statistic):           2.71e-19
Time:                        12:44:12   Log-Likelihood:                -301.12
No. Observations:                  30   AIC:                             608.2
Df Residuals:                      27   BIC:                             612.5
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
===================================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
Intercept        2.494e+04   2551.242      9.775      0.000    1.97e+04    3.02e+04
YearsExperience  9467.9701    382.375     24.761      0.000    8683.401    1.03e+04
Female           1625.9575   2138.556      0.760      0.454   -2761.997    6013.912
==============================================================================
Omnibus:                        1.685   Durbin-Watson:                   1.656
Prob(Omnibus):                  0.431   Jarque-Bera (JB):                1.198
Skew:                           0.232   Prob(JB):                        0.549
Kurtosis:                       2.138   Cond. No.                         16.3
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
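One note on why the regression includes only the Female dummy even though pd.get_dummies created both columns: Female and Male sum to one in every row, so including both alongside the intercept would be perfectly collinear (the 'dummy variable trap'). A quick check:

# Every person is either Female or Male, so the two dummies always sum to 1
print((df['Female'] + df['Male']).unique())   # -> [1]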

3.1 Multivariate Regression with Interaction Terms

Suppose we want to know how the effect of years of work experience on salary differs for women. Regressions hold other variables constant, so unless we tell Python to interact the two variables, we will not be able to observe their joint effect.

We can do this by using the '*' operator within the regression formula; in patsy formulas, 'A * B' expands to 'A + B + A:B', so we get both main effects plus their interaction.

In [12]:
# Running the regression
model = smf.ols('Salary ~ YearsExperience * Female' , data=df).fit()

# Presenting a table summary
print(model.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 Salary   R-squared:                       0.959
Model:                            OLS   Adj. R-squared:                  0.955
Method:                 Least Squares   F-statistic:                     205.3
Date:                Tue, 18 Apr 2023   Prob (F-statistic):           3.25e-18
Time:                        12:44:12   Log-Likelihood:                -300.53
No. Observations:                  30   AIC:                             609.1
Df Residuals:                      26   BIC:                             614.7
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
==========================================================================================
                             coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------------------
Intercept               2.687e+04   3170.710      8.473      0.000    2.03e+04    3.34e+04
YearsExperience         9115.6677    514.374     17.722      0.000    8058.357    1.02e+04
Female                 -2526.1464   4587.162     -0.551      0.587    -1.2e+04    6902.899
YearsExperience:Female   785.8048    768.207      1.023      0.316    -793.268    2364.878
==============================================================================
Omnibus:                        3.031   Durbin-Watson:                   1.703
Prob(Omnibus):                  0.220   Jarque-Bera (JB):                1.450
Skew:                           0.138   Prob(JB):                        0.484
Kurtosis:                       1.959   Cond. No.                         33.3
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
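To read the interaction, it helps to write out the implied slopes: for men (Female = 0) the slope on experience is just the YearsExperience coefficient, while for women it is that coefficient plus the interaction term. A sketch using the fit above:

slope_men = model.params['YearsExperience']
slope_women = slope_men + model.params['YearsExperience:Female']
print(slope_men)    # about 9115.67
print(slope_women)  # about 9901.47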

3.2 Multivariate Regression with Robust Standard Errors

In addition, we can tell Python that we want heteroskedasticity-robust standard errors by inserting the cov_type='HC1' argument in the .fit() function.

In [13]:
# Running the regression
model = smf.ols('Salary ~ YearsExperience * Female' , data=df).fit(cov_type='HC1')

# Presenting a table summary
print(model.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 Salary   R-squared:                       0.959
Model:                            OLS   Adj. R-squared:                  0.955
Method:                 Least Squares   F-statistic:                     321.1
Date:                Tue, 18 Apr 2023   Prob (F-statistic):           1.18e-20
Time:                        12:44:12   Log-Likelihood:                -300.53
No. Observations:                  30   AIC:                             609.1
Df Residuals:                      26   BIC:                             614.7
Df Model:                           3                                         
Covariance Type:                  HC1                                         
==========================================================================================
                             coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------------------
Intercept               2.687e+04   2832.504      9.485      0.000    2.13e+04    3.24e+04
YearsExperience         9115.6677    374.322     24.352      0.000    8382.010    9849.326
Female                 -2526.1464   4393.520     -0.575      0.565   -1.11e+04    6084.995
YearsExperience:Female   785.8048    682.491      1.151      0.250    -551.854    2123.464
==============================================================================
Omnibus:                        3.031   Durbin-Watson:                   1.703
Prob(Omnibus):                  0.220   Jarque-Bera (JB):                1.450
Skew:                           0.138   Prob(JB):                        0.484
Kurtosis:                       1.959   Cond. No.                         33.3
==============================================================================

Notes:
[1] Standard Errors are heteroscedasticity robust (HC1)
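HC1 is only one of the heteroskedasticity-robust estimators statsmodels supports (HC0 through HC3); HC3 is often recommended in small samples. A sketch:

# Same model, HC3 flavor of robust standard errors
model_hc3 = smf.ols('Salary ~ YearsExperience * Female', data=df).fit(cov_type='HC3')
print(model_hc3.bse)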

3.3 Running Multiple Models and Presenting Them in A Table

Suppose we want to compare different regression models and present them in a more organized table. There are two methods that allow us to do this.

Method 1: Using Statsmodels' summary_col

In [14]:
# Import summary_col package
from statsmodels.iolib.summary2 import summary_col
In [15]:
## Run three separate regression models
# Reg 1
model1 = smf.ols('Salary ~ YearsExperience' , data=df).fit()
# Reg 2
model2 = smf.ols('Salary ~ YearsExperience + Female' , data=df).fit()
# Reg 3
model3 = smf.ols('Salary ~ YearsExperience * Female' , data=df).fit()

dfoutput = summary_col([model1,model2,model3],stars=True, float_format='%0.2f',
                       regressor_order=model1.params.index.tolist())
print(dfoutput)

# Uncomment to produce the table as LaTeX; copy/paste the output to get the actual table
#dfoutput.as_latex()
==========================================================
                         Salary I   Salary II   Salary III
----------------------------------------------------------
Intercept              25792.20*** 24937.74*** 26866.59***
                       (2273.05)   (2551.24)   (3170.71)  
YearsExperience        9449.96***  9467.97***  9115.67*** 
                       (378.75)    (382.38)    (514.37)   
Female                             1625.96     -2526.15   
                                   (2138.56)   (4587.16)  
YearsExperience:Female                         785.80     
                                               (768.21)   
R-squared              0.96        0.96        0.96       
R-squared Adj.         0.96        0.95        0.95       
==========================================================
Standard errors in parentheses.
* p<.1, ** p<.05, ***p<.01

Method 2: Using the Stargazer Library (Originally an R Package, Now Adapted for Python)

The GitHub repo for this package can be found here.

In [16]:
## Install the package from PyPI if needed
# pip install stargazer

# Import library package to workspace
from stargazer.stargazer import Stargazer, LineLocation

# Use previous models to create the table
stargazer = Stargazer([model1, model2, model3])

# Specify significant digits
stargazer.significant_digits(2)

stargazer
Out[16]:
------------------------------------------------------------------
                            Dependent variable: Salary
                        ------------------------------------------
                            (1)           (2)           (3)
------------------------------------------------------------------
Female                                  1625.96       -2526.15
                                        (2138.56)     (4587.16)
Intercept               25792.20***     24937.74***   26866.59***
                        (2273.05)       (2551.24)     (3170.71)
YearsExperience         9449.96***      9467.97***    9115.67***
                        (378.75)        (382.38)      (514.37)
YearsExperience:Female                                785.80
                                                      (768.21)
------------------------------------------------------------------
Observations            30              30            30
R2                      0.96            0.96          0.96
Adjusted R2             0.96            0.95          0.95
Residual Std. Error     5788.32         5832.43       5827.43
                        (df=28)         (df=27)       (df=26)
F Statistic             622.51***       306.85***     205.27***
                        (df=1; 28)      (df=2; 27)    (df=3; 26)
------------------------------------------------------------------
Note: *p<0.1; **p<0.05; ***p<0.01

Here, we notice that our coefficients are a little wonky: stargazer places the covariates in alphabetical order rather than the order we specified. If we want them in a more traditional format, we can specify the order to the stargazer package.

In [17]:
stargazer.covariate_order(['Intercept', 'YearsExperience', 'Female', 'YearsExperience:Female'])
stargazer

# The table we see below is rendered HTML; to get the underlying HTML code, execute
# stargazer.render_html()
Out[17]:
------------------------------------------------------------------
                            Dependent variable: Salary
                        ------------------------------------------
                            (1)           (2)           (3)
------------------------------------------------------------------
Intercept               25792.20***     24937.74***   26866.59***
                        (2273.05)       (2551.24)     (3170.71)
YearsExperience         9449.96***      9467.97***    9115.67***
                        (378.75)        (382.38)      (514.37)
Female                                  1625.96       -2526.15
                                        (2138.56)     (4587.16)
YearsExperience:Female                                785.80
                                                      (768.21)
------------------------------------------------------------------
Observations            30              30            30
R2                      0.96            0.96          0.96
Adjusted R2             0.96            0.95          0.95
Residual Std. Error     5788.32         5832.43       5827.43
                        (df=28)         (df=27)       (df=26)
F Statistic             622.51***       306.85***     205.27***
                        (df=1; 28)      (df=2; 27)    (df=3; 26)
------------------------------------------------------------------
Note: *p<0.1; **p<0.05; ***p<0.01

Lastly, we can also render the table as LaTeX for easy exporting.

In [18]:
print(stargazer.render_latex())
\begin{table}[!htbp] \centering
\begin{tabular}{@{\extracolsep{5pt}}lccc}
\\[-1.8ex]\hline
\hline \\[-1.8ex]
& \multicolumn{3}{c}{\textit{Dependent variable:}} \
\cr \cline{3-4}
\\[-1.8ex] & (1) & (2) & (3) \\
\hline \\[-1.8ex]
 Intercept & 25792.20$^{***}$ & 24937.74$^{***}$ & 26866.59$^{***}$ \\
  & (2273.05) & (2551.24) & (3170.71) \\
 YearsExperience & 9449.96$^{***}$ & 9467.97$^{***}$ & 9115.67$^{***}$ \\
  & (378.75) & (382.38) & (514.37) \\
 Female & & 1625.96$^{}$ & -2526.15$^{}$ \\
  & & (2138.56) & (4587.16) \\
 YearsExperience:Female & & & 785.80$^{}$ \\
  & & & (768.21) \\
\hline \\[-1.8ex]
 Observations & 30 & 30 & 30 \\
 $R^2$ & 0.96 & 0.96 & 0.96 \\
 Adjusted $R^2$ & 0.96 & 0.95 & 0.95 \\
 Residual Std. Error & 5788.32(df = 28) & 5832.43(df = 27) & 5827.43(df = 26)  \\
 F Statistic & 622.51$^{***}$ (df = 1.0; 28.0) & 306.85$^{***}$ (df = 2.0; 27.0) & 205.27$^{***}$ (df = 3.0; 26.0) \\
\hline
\hline \\[-1.8ex]
\textit{Note:} & \multicolumn{3}{r}{$^{*}$p$<$0.1; $^{**}$p$<$0.05; $^{***}$p$<$0.01} \\
\end{tabular}
\end{table}

3.4 Multivariate Regression Using Real Data

Example 1

Let's use a real-world dataset. The following data come from Zhou and Shaver's (2021) article, 'Reexamining the Effect of Refugees on Civil Conflict: A Global Subnational Analysis', which seeks to understand (among other things) whether the presence of refugees causes more conflict within states. While their paper conducts several analyses, the one we will replicate is their regression of the number of violent events on refugee presence. The regression equation they use to estimate these effects is:

$$
\begin{aligned}
ViolentEvents_{i,t} = \beta_0 &+ \beta_1 RefugeePresence_{i,t} + \beta_2 RefugeePresenceInOtherProvinces_{i,t} \\
&+ \beta_3 LaggedViolentEvents_{i,t} + \beta_4 NeighboringViolentEvents_{i,t} \\
&+ \beta_5 LaggedPopulation_{i,t} + \beta_6 LaggedGDP_{i,t} + \beta_7 TerrainRuggedness_{i} \\
&+ \beta_8 ProvinceSize_{i} + \beta_9 BorderDistance_{i} + \beta_{10} CapitalDistance_{i} \\
&+ \beta_{11} IDPPresence_{i,t} + CountryFixedEffects + YearFixedEffects + \varepsilon_{i,t}
\end{aligned}
$$

Let's import the data and view it.

In [19]:
df = pd.read_csv('panel.full_GED_2020.csv')
df.head()
Out[19]:
GMI_ADMIN GMI_CNTRY CNTRY_NAME ADMIN_NAME Country Year attack attack_state_based attack_non_state attack_one_sided ... no_new_site_rs_1 no_new_site_rt_2 no_new_site_rc_2 no_new_site_rs_2 nearborder nearcapital TotalRoadLength RoadDensity TotalRoadLength_km year
0 AFG-BAM AFG Afghanistan Bamian Afghanistan 1990-01-01 0.0 0.0 0.0 0.0 ... 0 0 0 0 0.0 1.0 NaN NaN NaN 1990
1 AFG-BAM AFG Afghanistan Bamian Afghanistan 1991-01-01 0.0 0.0 0.0 0.0 ... 0 0 0 0 0.0 1.0 NaN NaN NaN 1991
2 AFG-BAM AFG Afghanistan Bamian Afghanistan 1992-01-01 0.0 0.0 0.0 0.0 ... 0 0 0 0 0.0 1.0 789914.4805 0.041157 789.91448 1992
3 AFG-BAM AFG Afghanistan Bamian Afghanistan 1993-01-01 0.0 0.0 0.0 0.0 ... 0 0 0 0 0.0 1.0 NaN NaN NaN 1993
4 AFG-BAM AFG Afghanistan Bamian Afghanistan 1994-01-01 0.0 0.0 0.0 0.0 ... 0 0 0 0 0.0 1.0 NaN NaN NaN 1994

5 rows × 541 columns

The replication instructions tell us what variables were used for this regression analysis. Let's create a list object of those names and subset them.

In [20]:
var_list = ['attack','rtb','rtb.other','attack_1','attack_neighbors_sum', 'log_pop_1',
            'gcp_ppp_1', 'STD', 'SQKM_ADMIN', 'log_bdist2', 'log_capdist', 'idpb',
            'Country', 'year']
df = df[var_list]
df.head()
Out[20]:
attack rtb rtb.other attack_1 attack_neighbors_sum log_pop_1 gcp_ppp_1 STD SQKM_ADMIN log_bdist2 log_capdist idpb Country year
0 0.0 0 0 NaN 2 NaN NaN 221.432 19192.551 5.458124 5.258359 0 Afghanistan 1990
1 0.0 0 1 0.0 2 12.590328 0.234417 221.432 19192.551 5.458124 5.258359 0 Afghanistan 1991
2 0.0 0 1 0.0 0 12.666320 0.234417 221.432 19192.551 5.458124 5.258359 0 Afghanistan 1992
3 0.0 0 0 0.0 1 12.736942 0.234417 221.432 19192.551 5.458124 5.258359 0 Afghanistan 1993
4 0.0 0 1 0.0 20 12.802905 0.234417 221.432 19192.551 5.458124 5.258359 0 Afghanistan 1994

Before we move on to the main analysis, let's discuss some useful commands.

In [21]:
# This gives us a set of summary statistics across all variables in the dataset
# You can focus on specific variables by indexing the df and then calling describe
# example: df['attack'].describe()
df.describe()
Out[21]:
attack rtb rtb.other attack_1 attack_neighbors_sum log_pop_1 gcp_ppp_1 STD SQKM_ADMIN log_bdist2 log_capdist idpb year
count 73215.000000 73527.000000 73527.000000 70679.000000 73527.000000 65819.000000 65324.000000 73527.000000 7.352700e+04 73179.000000 73179.000000 73527.000000 73527.000000
mean 2.247067 0.037184 0.247474 2.161689 6.220246 13.375119 19.890419 103.564727 5.203192e+04 4.664811 5.588554 0.003618 2004.002652
std 20.342200 0.189213 0.431547 19.792058 45.933583 1.678239 74.463542 86.645497 1.676880e+05 1.120037 1.084331 0.060039 8.365720
min 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.057000e+00 0.716082 1.019839 0.000000 1990.000000
25% 0.000000 0.000000 0.000000 0.000000 0.000000 12.477365 0.701478 25.171700 3.943059e+03 3.876278 4.891450 0.000000 1997.000000
50% 0.000000 0.000000 0.000000 0.000000 0.000000 13.450893 2.971663 84.933500 1.131859e+04 4.671915 5.645031 0.000000 2004.000000
75% 0.000000 0.000000 0.000000 0.000000 0.000000 14.376565 9.336956 162.612000 3.908533e+04 5.391149 6.280998 0.000000 2011.000000
max 1986.000000 1.000000 1.000000 1986.000000 3472.000000 19.282600 1973.801000 459.845000 3.528428e+06 7.772757 8.948518 1.000000 2018.000000

We know that the Country variable is categorical, so we can tabulate it to see which countries are in our dataset.

In [22]:
df['Country'].value_counts()
Out[22]:
Russia           2351
Thailand         2088
Turkey           1943
Vietnam          1537
United States    1508
                 ... 
Macedonia          26
South Sudan        24
Nauru              20
Timor Leste        17
Montenegro         13
Name: Country, Length: 185, dtype: int64

We can also run correlations on variables we are interested in and style the result. Here, we will run a correlation on a few numeric variables in the DataFrame.

In [23]:
# Storing the correlation in an object
corr = df[['attack','rtb','log_pop_1',
           'gcp_ppp_1']].corr(numeric_only=True) # keep numeric columns only

# Aesthetic mapping
corr.style.background_gradient(cmap='RdBu')
Out[23]:
  attack rtb log_pop_1 gcp_ppp_1
attack 1.000000 0.040611 0.078641 -0.010595
rtb 0.040611 1.000000 0.081201 -0.032658
log_pop_1 0.078641 0.081201 1.000000 0.396185
gcp_ppp_1 -0.010595 -0.032658 0.396185 1.000000

Now, back to the main analysis. Let's run the regression.

In [24]:
model = smf.ols('attack ~ rtb + rtb.other + attack_1 + attack_neighbors_sum + log_pop_1 + gcp_ppp_1 + STD + SQKM_ADMIN + log_bdist2 + log_capdist + idpb + C(Country) + C(year)', data=df).fit()

print(model.summary())
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File ~/Library/r-miniconda/lib/python3.10/site-packages/patsy/compat.py:36, in call_and_wrap_exc(msg, origin, f, *args, **kwargs)
     35 try:
---> 36     return f(*args, **kwargs)
     37 except Exception as e:

File ~/Library/r-miniconda/lib/python3.10/site-packages/patsy/eval.py:169, in EvalEnvironment.eval(self, expr, source_name, inner_namespace)
    168 code = compile(expr, source_name, "eval", self.flags, False)
--> 169 return eval(code, {}, VarLookupDict([inner_namespace]
    170                                     + self._namespaces))

File <string>:1

File ~/Library/r-miniconda/lib/python3.10/site-packages/pandas/core/generic.py:5902, in NDFrame.__getattr__(self, name)
   5901     return self[name]
-> 5902 return object.__getattribute__(self, name)

AttributeError: 'Series' object has no attribute 'other'

The above exception was the direct cause of the following exception:

PatsyError                                Traceback (most recent call last)
Cell In[24], line 1
----> 1 model = smf.ols('attack ~ rtb + rtb.other + attack_1 + attack_neighbors_sum + log_pop_1 + gcp_ppp_1 + STD + SQKM_ADMIN + log_bdist2 + log_capdist + idpb + C(Country) + C(year)', data=df).fit()
      3 print(model.summary())

File ~/Library/r-miniconda/lib/python3.10/site-packages/statsmodels/base/model.py:200, in Model.from_formula(cls, formula, data, subset, drop_cols, *args, **kwargs)
    197 if missing == 'none':  # with patsy it's drop or raise. let's raise.
    198     missing = 'raise'
--> 200 tmp = handle_formula_data(data, None, formula, depth=eval_env,
    201                           missing=missing)
    202 ((endog, exog), missing_idx, design_info) = tmp
    203 max_endog = cls._formula_max_endog

File ~/Library/r-miniconda/lib/python3.10/site-packages/statsmodels/formula/formulatools.py:63, in handle_formula_data(Y, X, formula, depth, missing)
     61 else:
     62     if data_util._is_using_pandas(Y, None):
---> 63         result = dmatrices(formula, Y, depth, return_type='dataframe',
     64                            NA_action=na_action)
     65     else:
     66         result = dmatrices(formula, Y, depth, return_type='dataframe',
     67                            NA_action=na_action)

File ~/Library/r-miniconda/lib/python3.10/site-packages/patsy/highlevel.py:309, in dmatrices(formula_like, data, eval_env, NA_action, return_type)
    299 """Construct two design matrices given a formula_like and data.
    300 
    301 This function is identical to :func:`dmatrix`, except that it requires
   (...)
    306 See :func:`dmatrix` for details.
    307 """
    308 eval_env = EvalEnvironment.capture(eval_env, reference=1)
--> 309 (lhs, rhs) = _do_highlevel_design(formula_like, data, eval_env,
    310                                   NA_action, return_type)
    311 if lhs.shape[1] == 0:
    312     raise PatsyError("model is missing required outcome variables")

File ~/Library/r-miniconda/lib/python3.10/site-packages/patsy/highlevel.py:164, in _do_highlevel_design(formula_like, data, eval_env, NA_action, return_type)
    162 def data_iter_maker():
    163     return iter([data])
--> 164 design_infos = _try_incr_builders(formula_like, data_iter_maker, eval_env,
    165                                   NA_action)
    166 if design_infos is not None:
    167     return build_design_matrices(design_infos, data,
    168                                  NA_action=NA_action,
    169                                  return_type=return_type)

File ~/Library/r-miniconda/lib/python3.10/site-packages/patsy/highlevel.py:66, in _try_incr_builders(formula_like, data_iter_maker, eval_env, NA_action)
     64 if isinstance(formula_like, ModelDesc):
     65     assert isinstance(eval_env, EvalEnvironment)
---> 66     return design_matrix_builders([formula_like.lhs_termlist,
     67                                    formula_like.rhs_termlist],
     68                                   data_iter_maker,
     69                                   eval_env,
     70                                   NA_action)
     71 else:
     72     return None

File ~/Library/r-miniconda/lib/python3.10/site-packages/patsy/build.py:693, in design_matrix_builders(termlists, data_iter_maker, eval_env, NA_action)
    689 factor_states = _factors_memorize(all_factors, data_iter_maker, eval_env)
    690 # Now all the factors have working eval methods, so we can evaluate them
    691 # on some data to find out what type of data they return.
    692 (num_column_counts,
--> 693  cat_levels_contrasts) = _examine_factor_types(all_factors,
    694                                                factor_states,
    695                                                data_iter_maker,
    696                                                NA_action)
    697 # Now we need the factor infos, which encapsulate the knowledge of
    698 # how to turn any given factor into a chunk of data:
    699 factor_infos = {}

File ~/Library/r-miniconda/lib/python3.10/site-packages/patsy/build.py:443, in _examine_factor_types(factors, factor_states, data_iter_maker, NA_action)
    441 for data in data_iter_maker():
    442     for factor in list(examine_needed):
--> 443         value = factor.eval(factor_states[factor], data)
    444         if factor in cat_sniffers or guess_categorical(value):
    445             if factor not in cat_sniffers:

File ~/Library/r-miniconda/lib/python3.10/site-packages/patsy/eval.py:568, in EvalFactor.eval(self, memorize_state, data)
    567 def eval(self, memorize_state, data):
--> 568     return self._eval(memorize_state["eval_code"],
    569                       memorize_state,
    570                       data)

File ~/Library/r-miniconda/lib/python3.10/site-packages/patsy/eval.py:551, in EvalFactor._eval(self, code, memorize_state, data)
    549 def _eval(self, code, memorize_state, data):
    550     inner_namespace = VarLookupDict([data, memorize_state["transforms"]])
--> 551     return call_and_wrap_exc("Error evaluating factor",
    552                              self,
    553                              memorize_state["eval_env"].eval,
    554                              code,
    555                              inner_namespace=inner_namespace)

File ~/Library/r-miniconda/lib/python3.10/site-packages/patsy/compat.py:43, in call_and_wrap_exc(msg, origin, f, *args, **kwargs)
     39     new_exc = PatsyError("%s: %s: %s"
     40                          % (msg, e.__class__.__name__, e),
     41                          origin)
     42     # Use 'exec' to hide this syntax from the Python 2 parser:
---> 43     exec("raise new_exc from e")
     44 else:
     45     # In python 2, we just let the original exception escape -- better
     46     # than destroying the traceback. But if it's a PatsyError, we can
     47     # at least set the origin properly.
     48     if isinstance(e, PatsyError):

File <string>:1

PatsyError: Error evaluating factor: AttributeError: 'Series' object has no attribute 'other'
    attack ~ rtb + rtb.other + attack_1 + attack_neighbors_sum + log_pop_1 + gcp_ppp_1 + STD + SQKM_ADMIN + log_bdist2 + log_capdist + idpb + C(Country) + C(year)
                   ^^^^^^^^^

What happened? In a patsy formula, a dot is read as Python attribute access, so 'rtb.other' is interpreted as 'the .other attribute of the Series rtb', which produces the AttributeError above. The simplest fix is to rename the variable and rerun the regression.
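As an aside, patsy's Q() can quote a column name that is not a valid Python identifier, so the original column could be used without renaming; a sketch:

# Quote the awkward column name inside the formula instead of renaming it
model_q = smf.ols('attack ~ rtb + Q("rtb.other") + attack_1 + attack_neighbors_sum '
                  '+ log_pop_1 + gcp_ppp_1 + STD + SQKM_ADMIN + log_bdist2 '
                  '+ log_capdist + idpb + C(Country) + C(year)', data=df).fit()

Renaming, as we do next, keeps later formulas more readable.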

In [25]:
# Rename variable name
df.rename(columns={'rtb.other':'rtb_other'}, inplace=True)
In [26]:
# Rerun regression
model = smf.ols('attack ~ rtb + rtb_other + attack_1 + attack_neighbors_sum + log_pop_1 + gcp_ppp_1 + STD + SQKM_ADMIN + log_bdist2 + log_capdist + idpb + C(Country) + C(year)', data=df).fit()

print(model.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 attack   R-squared:                       0.612
Model:                            OLS   Adj. R-squared:                  0.610
Method:                 Least Squares   F-statistic:                     481.4
Date:                Tue, 18 Apr 2023   Prob (F-statistic):               0.00
Time:                        12:44:33   Log-Likelihood:            -2.5339e+05
No. Observations:               65012   AIC:                         5.072e+05
Df Residuals:                   64799   BIC:                         5.091e+05
Df Model:                         212                                         
Covariance Type:            nonrobust                                         
================================================================================================================
                                                   coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------------------------------
Intercept                                        6.9926      1.033      6.769      0.000       4.968       9.017
C(Country)[T.Albania]                          -10.4937      0.767    -13.690      0.000     -11.996      -8.991
C(Country)[T.Algeria]                           -9.9632      0.578    -17.249      0.000     -11.095      -8.831
C(Country)[T.Andorra]                         4.048e-11   2.93e-11      1.381      0.167    -1.7e-11    9.79e-11
C(Country)[T.Angola]                           -10.0907      0.709    -14.238      0.000     -11.480      -8.702
C(Country)[T.Argentina]                        -10.5099      0.665    -15.802      0.000     -11.813      -9.206
C(Country)[T.Armenia]                          -11.1490      2.305     -4.838      0.000     -15.666      -6.632
C(Country)[T.Australia]                         -9.7005      1.056     -9.189      0.000     -11.770      -7.631
C(Country)[T.Austria]                          -10.8752      0.881    -12.347      0.000     -12.601      -9.149
C(Country)[T.Azerbaijan]                        -9.1036      1.675     -5.434      0.000     -12.387      -5.820
C(Country)[T.Bahrain]                           -9.8513      1.240     -7.942      0.000     -12.282      -7.420
C(Country)[T.Bangladesh]                       -11.6946      1.130    -10.348      0.000     -13.910      -9.480
C(Country)[T.Barbados]                          -9.9299      2.308     -4.302      0.000     -14.454      -5.406
C(Country)[T.Belarus]                          -10.5137      1.039    -10.119      0.000     -12.550      -8.477
C(Country)[T.Belgium]                          -10.5311      0.905    -11.642      0.000     -12.304      -8.758
C(Country)[T.Belize]                            -9.6549      1.057     -9.137      0.000     -11.726      -7.584
C(Country)[T.Benin]                            -10.8059      1.041    -10.385      0.000     -12.845      -8.766
C(Country)[T.Bhutan]                           -10.7179      0.806    -13.303      0.000     -12.297      -9.139
C(Country)[T.Bolivia]                          -10.4967      0.879    -11.941      0.000     -12.220      -8.774
C(Country)[T.Bosnia and Herzegovina]            10.4473      2.345      4.456      0.000       5.852      15.043
C(Country)[T.Botswana]                         -10.3172      0.870    -11.863      0.000     -12.022      -8.613
C(Country)[T.Bouvet Island]                  -1.692e-12   7.56e-11     -0.022      0.982    -1.5e-10    1.46e-10
C(Country)[T.Brazil]                           -10.1727      0.670    -15.172      0.000     -11.487      -8.858
C(Country)[T.Brunei]                           -10.5257      2.315     -4.547      0.000     -15.063      -5.988
C(Country)[T.Bulgaria]                         -10.7891      0.880    -12.262      0.000     -12.514      -9.065
C(Country)[T.Burkina Faso]                     -10.2120      0.630    -16.201      0.000     -11.447      -8.977
C(Country)[T.Burundi]                          -11.4974      0.889    -12.931      0.000     -13.240      -9.755
C(Country)[T.Cambodia]                         -10.1776      0.705    -14.436      0.000     -11.559      -8.796
C(Country)[T.Cameroon]                         -10.1522      0.851    -11.923      0.000     -11.821      -8.483
C(Country)[T.Canada]                            -9.9247      0.855    -11.602      0.000     -11.601      -8.248
C(Country)[T.Central African Republic]         -10.0012      0.751    -13.319      0.000     -11.473      -8.529
C(Country)[T.Chad]                             -11.1758      0.765    -14.602      0.000     -12.676      -9.676
C(Country)[T.Chile]                            -10.8279      0.772    -14.017      0.000     -12.342      -9.314
C(Country)[T.China]                            -11.8951      0.662    -17.962      0.000     -13.193     -10.597
C(Country)[T.Colombia]                          -9.2547      0.602    -15.379      0.000     -10.434      -8.075
C(Country)[T.Comoros]                           -9.9175      2.316     -4.282      0.000     -14.457      -5.378
C(Country)[T.Congo]                            -10.3818      0.887    -11.699      0.000     -12.121      -8.643
C(Country)[T.Congo, DRC]                        -6.9989      0.915     -7.653      0.000      -8.791      -5.206
C(Country)[T.Costa Rica]                       -10.5717      0.970    -10.904      0.000     -12.472      -8.671
C(Country)[T.Cote d'Ivoire]                    -10.1150      0.578    -17.488      0.000     -11.249      -8.981
C(Country)[T.Croatia]                           -6.4706      2.303     -2.809      0.005     -10.985      -1.956
C(Country)[T.Cuba]                             -10.2297      0.746    -13.716      0.000     -11.691      -8.768
C(Country)[T.Cyprus]                           -10.0311      1.040     -9.649      0.000     -12.069      -7.994
C(Country)[T.Czech Republic]                   -10.9331      1.003    -10.900      0.000     -12.899      -8.967
C(Country)[T.Czechoslovakia]                   -10.7273      2.717     -3.949      0.000     -16.052      -5.403
C(Country)[T.Denmark]                          -10.0883      0.788    -12.804      0.000     -11.633      -8.544
C(Country)[T.Djibouti]                         -10.9075      1.124     -9.705      0.000     -13.110      -8.705
C(Country)[T.Dominica]                       -3.164e-12    1.4e-10     -0.023      0.982   -2.77e-10    2.71e-10
C(Country)[T.Dominican Republic]               -10.1677      0.683    -14.897      0.000     -11.505      -8.830
C(Country)[T.Ecuador]                          -10.3548      0.677    -15.296      0.000     -11.682      -9.028
C(Country)[T.Egypt]                            -10.0910      0.658    -15.336      0.000     -11.381      -8.801
C(Country)[T.El Salvador]                      -10.6366      0.889    -11.960      0.000     -12.380      -8.893
C(Country)[T.Equatorial Guinea]                 -9.9420      1.120     -8.880      0.000     -12.136      -7.748
C(Country)[T.Eritrea]                          -11.1229      0.952    -11.679      0.000     -12.990      -9.256
C(Country)[T.Estonia]                          -10.4592      2.308     -4.532      0.000     -14.982      -5.936
C(Country)[T.Ethiopia]                         -10.9030      0.782    -13.947      0.000     -12.435      -9.371
C(Country)[T.Finland]                          -10.0756      0.807    -12.491      0.000     -11.657      -8.495
C(Country)[T.France]                           -10.5699      0.668    -15.813      0.000     -11.880      -9.260
C(Country)[T.French Guiana]                     -9.3243      1.679     -5.553      0.000     -12.616      -6.033
C(Country)[T.Gabon]                            -10.1867      0.902    -11.292      0.000     -11.955      -8.419
C(Country)[T.Georgia]                          -11.8831      1.375     -8.639      0.000     -14.579      -9.187
C(Country)[T.Germany]                          -10.6643      0.796    -13.402      0.000     -12.224      -9.105
C(Country)[T.Ghana]                            -10.9151      0.864    -12.635      0.000     -12.608      -9.222
C(Country)[T.Glorioso Islands]               -1.415e-12   6.24e-11     -0.023      0.982   -1.24e-10    1.21e-10
C(Country)[T.Greece]                           -10.6476      0.876    -12.156      0.000     -12.364      -8.931
C(Country)[T.Grenada]                           -9.8224      2.310     -4.252      0.000     -14.350      -5.294
C(Country)[T.Guatemala]                        -10.7083      0.700    -15.295      0.000     -12.081      -9.336
C(Country)[T.Guinea]                           -10.7869      0.636    -16.948      0.000     -12.034      -9.539
C(Country)[T.Guinea-Bissau]                     -9.9812      0.985    -10.130      0.000     -11.912      -8.050
C(Country)[T.Guyana]                            -9.6131      0.870    -11.052      0.000     -11.318      -7.908
C(Country)[T.Haiti]                            -10.6446      0.917    -11.610      0.000     -12.442      -8.848
C(Country)[T.Honduras]                         -10.4331      0.712    -14.658      0.000     -11.828      -9.038
C(Country)[T.Hungary]                          -10.4908      0.715    -14.667      0.000     -11.893      -9.089
C(Country)[T.Iceland]                           -9.1603      1.003     -9.135      0.000     -11.126      -7.195
C(Country)[T.India]                             -6.2476      0.630     -9.912      0.000      -7.483      -5.012
C(Country)[T.Indonesia]                        -10.0617      0.657    -15.316      0.000     -11.349      -8.774
C(Country)[T.Iran]                             -11.4329      0.679    -16.826      0.000     -12.765     -10.101
C(Country)[T.Iraq]                              -8.0762      0.716    -11.275      0.000      -9.480      -6.672
C(Country)[T.Ireland]                          -10.3339      1.383     -7.470      0.000     -13.045      -7.622
C(Country)[T.Israel]                            -7.5541      0.967     -7.814      0.000      -9.449      -5.659
C(Country)[T.Italy]                            -10.7665      0.677    -15.897      0.000     -12.094      -9.439
C(Country)[T.Jamaica]                           -9.9754      0.886    -11.261      0.000     -11.712      -8.239
C(Country)[T.Japan]                            -10.5082      0.922    -11.394      0.000     -12.316      -8.701
C(Country)[T.Jordan]                           -12.3579      1.026    -12.044      0.000     -14.369     -10.347
C(Country)[T.Juan De Nova Island]            -4.303e-13   1.88e-11     -0.023      0.982   -3.74e-11    3.65e-11
C(Country)[T.Kazakhstan]                       -10.3569      0.706    -14.677      0.000     -11.740      -8.974
C(Country)[T.Kenya]                            -10.4239      0.935    -11.148      0.000     -12.257      -8.591
C(Country)[T.Kuwait]                           -10.3159      1.234     -8.362      0.000     -12.734      -7.898
C(Country)[T.Kyrgyzstan]                       -11.0551      1.108     -9.977      0.000     -13.227      -8.883
C(Country)[T.Laos]                             -10.5817      0.725    -14.604      0.000     -12.002      -9.161
C(Country)[T.Latvia]                           -10.5426      2.308     -4.567      0.000     -15.067      -6.018
C(Country)[T.Lebanon]                          -10.7674      1.112     -9.687      0.000     -12.946      -8.589
C(Country)[T.Lesotho]                          -11.0968      0.974    -11.391      0.000     -13.006      -9.187
C(Country)[T.Liberia]                          -10.5896      0.793    -13.351      0.000     -12.144      -9.035
C(Country)[T.Libya]                             -9.5059      0.657    -14.471      0.000     -10.793      -8.218
C(Country)[T.Liechtenstein]                  -9.238e-13   3.95e-11     -0.023      0.981   -7.83e-11    7.65e-11
C(Country)[T.Lithuania]                        -10.7198      2.310     -4.641      0.000     -15.247      -6.193
C(Country)[T.Luxembourg]                       -10.2931      2.319     -4.439      0.000     -14.837      -5.749
C(Country)[T.Macedonia]                        -11.1880      2.387     -4.688      0.000     -15.866      -6.510
C(Country)[T.Madagascar]                       -10.3781      1.041     -9.971      0.000     -12.418      -8.338
C(Country)[T.Malawi]                           -11.4192      1.394     -8.191      0.000     -14.152      -8.687
C(Country)[T.Malaysia]                         -10.9470      0.772    -14.182      0.000     -12.460      -9.434
C(Country)[T.Maldives]                          -8.9294      0.758    -11.783      0.000     -10.415      -7.444
C(Country)[T.Mali]                              -8.6893      1.036     -8.384      0.000     -10.721      -6.658
C(Country)[T.Malta]                             -9.9947      2.309     -4.328      0.000     -14.521      -5.468
C(Country)[T.Mauritania]                       -10.0691      0.806    -12.491      0.000     -11.649      -8.489
C(Country)[T.Mexico]                            -9.5778      0.629    -15.236      0.000     -10.810      -8.346
C(Country)[T.Moldova]                          -10.4543      1.669     -6.265      0.000     -13.725      -7.183
C(Country)[T.Monaco]                         -1.848e-12   8.16e-11     -0.023      0.982   -1.62e-10    1.58e-10
C(Country)[T.Mongolia]                         -10.0094      0.708    -14.135      0.000     -11.397      -8.621
C(Country)[T.Montenegro]                       -11.0597      3.346     -3.306      0.001     -17.617      -4.502
C(Country)[T.Morocco]                          -10.7342      0.881    -12.188      0.000     -12.460      -9.008
C(Country)[T.Mozambique]                       -10.8886      0.855    -12.729      0.000     -12.565      -9.212
C(Country)[T.Myanmar]                           -9.6915      0.752    -12.895      0.000     -11.165      -8.218
C(Country)[T.Namibia]                           -9.9167      0.671    -14.772      0.000     -11.232      -8.601
C(Country)[T.Nauru]                           1.474e-12   6.34e-11      0.023      0.981   -1.23e-10    1.26e-10
C(Country)[T.Nepal]                             -7.8232      0.769    -10.178      0.000      -9.330      -6.317
C(Country)[T.Netherlands]                      -10.3422      0.815    -12.694      0.000     -11.939      -8.745
C(Country)[T.New Zealand]                       -9.7107      0.787    -12.335      0.000     -11.254      -8.168
C(Country)[T.Nicaragua]                        -10.3620      0.778    -13.317      0.000     -11.887      -8.837
C(Country)[T.Niger]                            -10.8769      0.980    -11.099      0.000     -12.798      -8.956
C(Country)[T.Nigeria]                           -9.4646      0.641    -14.760      0.000     -10.721      -8.208
C(Country)[T.North Korea]                      -10.7603      0.818    -13.154      0.000     -12.364      -9.157
C(Country)[T.Norway]                           -10.2392      0.700    -14.630      0.000     -11.611      -8.867
C(Country)[T.Oman]                             -10.2930      0.918    -11.215      0.000     -12.092      -8.494
C(Country)[T.Pakistan]                          -1.7774      1.042     -1.706      0.088      -3.819       0.264
C(Country)[T.Panama]                           -10.0690      0.849    -11.855      0.000     -11.734      -8.404
C(Country)[T.Papua New Guinea]                 -10.4955      0.699    -15.026      0.000     -11.865      -9.126
C(Country)[T.Paraguay]                         -10.0092      0.706    -14.186      0.000     -11.392      -8.626
C(Country)[T.Peru]                             -10.5480      0.640    -16.474      0.000     -11.803      -9.293
C(Country)[T.Philippines]                       -8.1711      0.768    -10.644      0.000      -9.676      -6.666
C(Country)[T.Poland]                           -10.3304      0.574    -18.009      0.000     -11.455      -9.206
C(Country)[T.Portugal]                         -10.3147      0.681    -15.141      0.000     -11.650      -8.979
C(Country)[T.Puerto Rico]                     5.353e-13   2.32e-11      0.023      0.982    -4.5e-11    4.61e-11
C(Country)[T.Qatar]                             -9.3391      1.008     -9.267      0.000     -11.314      -7.364
C(Country)[T.Romania]                          -10.5278      0.579    -18.175      0.000     -11.663      -9.392
C(Country)[T.Russia]                            -9.9070      0.548    -18.082      0.000     -10.981      -8.833
C(Country)[T.Rwanda]                           -10.5828      0.938    -11.284      0.000     -12.421      -8.745
C(Country)[T.San Marino]                     -1.417e-13   6.68e-12     -0.021      0.983   -1.32e-11    1.29e-11
C(Country)[T.Sao Tome and Principe]             -9.6305      1.672     -5.761      0.000     -12.907      -6.354
C(Country)[T.Saudi Arabia]                     -10.6553      0.763    -13.965      0.000     -12.151      -9.160
C(Country)[T.Senegal]                          -10.1515      0.859    -11.820      0.000     -11.835      -8.468
C(Country)[T.Sierra Leone]                      -6.7315      1.216     -5.538      0.000      -9.114      -4.349
C(Country)[T.Singapore]                        -10.7057      2.319     -4.617      0.000     -15.250      -6.161
C(Country)[T.Slovakia]                         -10.9679      1.431     -7.667      0.000     -13.772      -8.164
C(Country)[T.Slovenia]                         -11.6506      2.346     -4.965      0.000     -16.250      -7.051
C(Country)[T.Somalia]                           -6.9920      0.705     -9.915      0.000      -8.374      -5.610
C(Country)[T.South Africa]                      -9.6880      0.885    -10.951      0.000     -11.422      -7.954
C(Country)[T.South Korea]                      -10.6570      0.849    -12.554      0.000     -12.321      -8.993
C(Country)[T.South Sudan]                       -1.6418      2.522     -0.651      0.515      -6.585       3.302
C(Country)[T.Spain]                            -10.5423      0.688    -15.331      0.000     -11.890      -9.194
C(Country)[T.Sri Lanka]                         -6.8662      0.878     -7.819      0.000      -8.587      -5.145
C(Country)[T.St. Lucia]                        -10.1532      2.307     -4.402      0.000     -14.674      -5.632
C(Country)[T.St. Vincent and the Grenadines]    -9.9546      2.317     -4.296      0.000     -14.496      -5.413
C(Country)[T.Sudan]                             -7.6688      0.942     -8.141      0.000      -9.515      -5.822
C(Country)[T.Suriname]                          -9.2158      0.915    -10.067      0.000     -11.010      -7.421
C(Country)[T.Swaziland]                        -10.2924      1.230     -8.369      0.000     -12.703      -7.882
C(Country)[T.Sweden]                           -10.0701      0.660    -15.268      0.000     -11.363      -8.777
C(Country)[T.Switzerland]                      -10.7819      0.825    -13.065      0.000     -12.399      -9.164
C(Country)[T.Syria]                            -18.4160      2.798     -6.582      0.000     -23.900     -12.932
C(Country)[T.Taiwan]                           -10.4879      2.311     -4.539      0.000     -15.017      -5.959
C(Country)[T.Tajikistan]                       -10.4326      1.218     -8.564      0.000     -12.820      -8.045
C(Country)[T.Tanzania]                         -10.8448      0.663    -16.347      0.000     -12.145      -9.545
C(Country)[T.Thailand]                         -10.5052      0.552    -19.042      0.000     -11.587      -9.424
C(Country)[T.The Gambia]                       -10.5757      1.400     -7.554      0.000     -13.320      -7.832
C(Country)[T.Timor Leste]                      -10.6574      2.936     -3.630      0.000     -16.412      -4.903
C(Country)[T.Togo]                             -10.3568      0.811    -12.767      0.000     -11.947      -8.767
C(Country)[T.Trinidad and Tobago]              -10.5374      2.306     -4.569      0.000     -15.058      -6.017
C(Country)[T.Tunisia]                          -10.2820      0.664    -15.476      0.000     -11.584      -8.980
C(Country)[T.Turkey]                           -10.0720      0.521    -19.332      0.000     -11.093      -9.051
C(Country)[T.Turkmenistan]                     -10.8098      1.119     -9.661      0.000     -13.003      -8.617
C(Country)[T.USSR]                             -10.1749      1.400     -7.270      0.000     -12.918      -7.432
C(Country)[T.Uganda]                           -10.5963      0.866    -12.234      0.000     -12.294      -8.899
C(Country)[T.Ukraine]                          -10.0109      0.660    -15.164      0.000     -11.305      -8.717
C(Country)[T.United Arab Emirates]             -10.3401      1.032    -10.018      0.000     -12.363      -8.317
C(Country)[T.United Kingdom]                   -10.1519      1.144     -8.875      0.000     -12.394      -7.910
C(Country)[T.United States]                    -10.3425      0.609    -16.983      0.000     -11.536      -9.149
C(Country)[T.Uruguay]                           -9.8545      0.707    -13.939      0.000     -11.240      -8.469
C(Country)[T.Uzbekistan]                       -10.8402      0.778    -13.931      0.000     -12.365      -9.315
C(Country)[T.Venezuela]                        -10.0922      0.649    -15.559      0.000     -11.364      -8.821
C(Country)[T.Vietnam]                          -10.7552      0.552    -19.468      0.000     -11.838      -9.672
C(Country)[T.Yemen]                             -9.3241      0.726    -12.850      0.000     -10.746      -7.902
C(Country)[T.Yugoslavia]                        -7.3932      1.801     -4.105      0.000     -10.923      -3.863
C(Country)[T.Zambia]                           -10.9877      0.904    -12.159      0.000     -12.759      -9.216
C(Country)[T.Zimbabwe]                         -10.8388      0.923    -11.748      0.000     -12.647      -9.030
C(year)[T.1991]                                  0.0198      0.251      0.079      0.937      -0.473       0.512
C(year)[T.1992]                                  0.3692      0.246      1.500      0.134      -0.113       0.852
C(year)[T.1993]                                  0.3247      0.246      1.320      0.187      -0.158       0.807
C(year)[T.1994]                                  0.1986      0.246      0.808      0.419      -0.283       0.680
C(year)[T.1995]                                  0.0331      0.246      0.135      0.893      -0.449       0.515
C(year)[T.1996]                                 -0.2885      0.246     -1.173      0.241      -0.770       0.194
C(year)[T.1997]                                 -0.1242      0.246     -0.505      0.614      -0.606       0.358
C(year)[T.1998]                                  0.3067      0.246      1.247      0.213      -0.176       0.789
C(year)[T.1999]                                  0.1814      0.246      0.737      0.461      -0.301       0.664
C(year)[T.2000]                                  0.2017      0.246      0.820      0.412      -0.281       0.684
C(year)[T.2001]                                 -0.3461      0.246     -1.406      0.160      -0.829       0.136
C(year)[T.2002]                                  1.0385      0.246      4.218      0.000       0.556       1.521
C(year)[T.2003]                                 -0.4090      0.246     -1.660      0.097      -0.892       0.074
C(year)[T.2004]                                  0.8883      0.246      3.606      0.000       0.406       1.371
C(year)[T.2005]                                 -0.0438      0.246     -0.178      0.859      -0.527       0.439
C(year)[T.2006]                                  0.1819      0.246      0.738      0.461      -0.301       0.665
C(year)[T.2007]                                  0.3107      0.246      1.261      0.207      -0.172       0.794
C(year)[T.2008]                                  0.4659      0.247      1.889      0.059      -0.018       0.949
C(year)[T.2009]                                  0.4861      0.247      1.971      0.049       0.003       0.970
C(year)[T.2010]                                  0.1019      0.247      0.413      0.679      -0.382       0.585
C(year)[T.2011]                                  0.4702      0.247      1.904      0.057      -0.014       0.954
C(year)[T.2012]                                  0.2446      0.247      0.989      0.322      -0.240       0.729
C(year)[T.2013]                                 -0.0133      0.247     -0.054      0.957      -0.498       0.471
C(year)[T.2014]                                  0.8414      0.247      3.402      0.001       0.357       1.326
C(year)[T.2015]                                  0.3698      0.248      1.493      0.135      -0.116       0.855
C(year)[T.2016]                                  0.0180      0.248      0.072      0.942      -0.468       0.504
C(year)[T.2017]                                  0.7471      0.248      3.015      0.003       0.261       1.233
C(year)[T.2018]                                  0.4176      0.248      1.686      0.092      -0.068       0.903
rtb                                              0.1675      0.271      0.617      0.537      -0.364       0.699
rtb_other                                        0.3748      0.241      1.557      0.119      -0.097       0.847
attack_1                                         0.6728      0.003    251.298      0.000       0.668       0.678
attack_neighbors_sum                             0.0374      0.001     31.558      0.000       0.035       0.040
log_pop_1                                        0.2454      0.053      4.668      0.000       0.142       0.348
gcp_ppp_1                                       -0.0014      0.001     -1.679      0.093      -0.003       0.000
STD                                              0.0020      0.001      2.359      0.018       0.000       0.004
SQKM_ADMIN                                       -4e-07   3.47e-07     -1.151      0.250   -1.08e-06    2.81e-07
log_bdist2                                      -0.1588      0.071     -2.244      0.025      -0.298      -0.020
log_capdist                                      0.0855      0.070      1.218      0.223      -0.052       0.223
idpb                                             2.9011      0.784      3.701      0.000       1.365       4.438
==============================================================================
Omnibus:                   114979.490   Durbin-Watson:                   2.124
Prob(Omnibus):                  0.000   Jarque-Bera (JB):       5777146419.606
Skew:                          11.262   Prob(JB):                         0.00
Kurtosis:                    1463.205   Cond. No.                     2.19e+21
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 4.58e-28. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
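
Note [2] deserves a closer look before we interpret anything else. Several country dummies in the table above (for example Bouvet Island, Dominica, Glorioso Islands, Juan De Nova Island, Liechtenstein, Monaco, Nauru, Puerto Rico, and San Marino) have coefficients on the order of 1e-12 with p-values near 0.98; these are numerically zero. Levels like these contribute essentially no usable variation to the model, which is what makes the design matrix nearly singular and produces the enormous condition number (2.19e+21) and tiny smallest eigenvalue (4.58e-28) reported in the diagnostics.

A common first step is to check how many observations each country actually contributes and re-estimate after dropping (or pooling) the empty or near-empty levels. The cell below is a minimal sketch of that workflow: it assumes the estimation data live in a DataFrame called df2, and it uses 'outcome' as a placeholder for the dependent variable, whose name is not shown in the summary excerpt above. The regressors are taken verbatim from the summary table.

In [ ]:
import statsmodels.formula.api as smf  # also imported in section 1.1

# 1. How many observations does each country contribute?
# value_counts() sorts descending, so the rarest levels appear at the tail
counts = df2['Country'].value_counts()
print(counts.tail(10))

# 2. Keep only countries with enough observations to identify a dummy
# (the cutoff of 5 is arbitrary; adjust to your data)
keep = counts[counts >= 5].index
df2_trim = df2[df2['Country'].isin(keep)].copy()

# 3. Re-estimate on the trimmed sample
# ('outcome' is a placeholder for the dependent variable)
model_trim = smf.ols(
    'outcome ~ C(Country) + C(year) + rtb + rtb_other + attack_1 '
    '+ attack_neighbors_sum + log_pop_1 + gcp_ppp_1 + STD + SQKM_ADMIN '
    '+ log_bdist2 + log_capdist + idpb',
    data=df2_trim
).fit()
print(model_trim.summary())

Whether to drop these levels, pool them into a residual category, or keep them is a substantive modeling choice rather than a mechanical one; the point of the sketch is simply to locate the levels that are driving the singular design matrix so the warning in Note [2] can be diagnosed rather than ignored.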