{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Linear regression"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's approach the first ideas of linear regression through a practical example:\n",
"\n",
"- We create two variables, income and expected consumption"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Y X\n",
"0 55 80\n",
"1 60 80\n",
"2 65 80\n",
"3 70 80\n",
"4 75 80"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import random\n",
"import statsmodels.formula.api as smf\n",
"from statsmodels.graphics.regressionplots import abline_plot\n",
"\n",
"\n",
"df = pd.DataFrame({\n",
" 'Col1': [1,2,3],\n",
" 'Col2': [4,5,6]\n",
" })\n",
"\n",
"\n",
"familia = pd.DataFrame({'Y':[55,60,65,70,75,\n",
" 65,70,74,80,85,88,\n",
" 79,84,90,94,98,\n",
" 80,93,95,103,108,113,115,\n",
" 102,107,110,116,118,125,\n",
" 110,115,120,130,135,140,\n",
" 120,136,140,144,145,\n",
" 135,137,140,152,157,160,162,\n",
" 137,145,155,165,175,189,\n",
" 150,152,175,178,180,185,191\n",
" ],'X':[80,80,80,80,80,\n",
" 100,100,100,100,100,100,\n",
" 120,120,120,120,120,\n",
" 140,140,140,140,140,140,140,\n",
" 160,160,160,160,160,160,\n",
" 180,180,180,180,180,180,\n",
" 200,200,200,200,200,\n",
" 220,220,220,220,220,220,220,\n",
" 240,240,240,240,240,240,\n",
" 260,260,260,260,260,260,260\n",
" ]})\n",
"familia.head()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['Y', 'X'], dtype='object')"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ingresos = np.arange(80,261,20)\n",
"ingresos\n",
"consumoEsperado = [65,77,89,101,113,125,137,149,161,173]\n",
"consumoEsperado\n",
"\n",
"\n",
"familia.columns"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.figure()  # open a new figure\n",
"plt.plot(ingresos,consumoEsperado)\n",
"plt.scatter(familia['X'],familia['Y'])\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
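"What have we done? We modeled the conditional mean of consumption $Y$ given income $X_i$ as a linear function, so each observation equals that mean plus a disturbance $u_i$:\n",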
"\n",
"$$ E(Y|X_i) = f(X_i)$$\n",
"\n",
"$$E(Y|X_i) = \\beta_1+\\beta_2X_i$$\n",
"\n",
"$$ u_i = Y_i - E(Y|X_i) $$\n",
"\n",
"$$ Y_i = E(Y|X_i) + u_i$$\n",
"\n",
"\n",
"What does it mean for the model to be linear?\n",
"\n",
"The term linear regression will always mean a regression that is linear in the parameters; the $\\beta$ (that is, the parameters) are raised only to the first power. It may or may not be linear in the explanatory variables $X$.\n",
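"\n",
"As a hedged illustration of this distinction (the two models below are examples added here, not fitted anywhere in this notebook):\n",
"\n",
"$$\n",
"Y_i = \\beta_1 + \\beta_2 X_i^2 + u_i \\quad \\text{(linear in the parameters, nonlinear in } X\\text{)}\n",
"$$\n",
"\n",
"$$\n",
"Y_i = \\beta_1 + \\beta_2^2 X_i + u_i \\quad \\text{(nonlinear in the parameters)}\n",
"$$\n",
"\n",
"The first model still counts as a linear regression, because $X_i^2$ can be treated as just another regressor. The second does not fit the definition used here, because $\\beta_2$ is raised to the second power.\n",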
"\n",
"To show that linear regression is feasible in this case, we draw a sample from the population:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,\n",
" 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,\n",
" 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,\n",
" 51, 52, 53, 54, 55, 56, 57, 58, 59])"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nS = familia.shape\n",
"type(nS)\n",
"indice = np.arange(0,nS[0])\n",
"indice  # index array used to draw a sample"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[54, 59, 33, 25, 32, 39, 47, 51, 18, 3, 34, 12, 29, 7, 26, 5, 56, 50, 44, 13]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"random.seed(8519)\n",
"muestra = random.sample(list(indice),k = 20)  # convert the array to a list\n",
"muestra  # random.sample draws a sample of the given size without replacement"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"ingreso_muestra = familia.loc[muestra,'X']\n",
"consumo_muestra = familia.loc[muestra,'Y']"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" OLS Regression Results \n",
"==============================================================================\n",
"Dep. Variable: consumo_muestra R-squared: 0.909\n",
"Model: OLS Adj. R-squared: 0.904\n",
"Method: Least Squares F-statistic: 179.4\n",
"Date: Wed, 18 Sep 2024 Prob (F-statistic): 8.44e-11\n",
"Time: 05:32:16 Log-Likelihood: -76.677\n",
"No. Observations: 20 AIC: 157.4\n",
"Df Residuals: 18 BIC: 159.3\n",
"Df Model: 1 \n",
"Covariance Type: nonrobust \n",
"===================================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"-----------------------------------------------------------------------------------\n",
"Intercept 13.2650 8.817 1.504 0.150 -5.259 31.789\n",
"ingreso_muestra 0.6226 0.046 13.394 0.000 0.525 0.720\n",
"==============================================================================\n",
"Omnibus: 3.437 Durbin-Watson: 2.369\n",
"Prob(Omnibus): 0.179 Jarque-Bera (JB): 2.288\n",
"Skew: -0.828 Prob(JB): 0.319\n",
"Kurtosis: 2.997 Cond. No. 634.\n",
"==============================================================================\n",
"\n",
"Notes:\n",
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n"
]
}
],
"source": [
"df = pd.DataFrame(list(zip(consumo_muestra,ingreso_muestra)),columns = ['consumo_muestra','ingreso_muestra'])\n",
"ajuste_1 = smf.ols('consumo_muestra~ingreso_muestra',data =df).fit()\n",
"\n",
"print(ajuste_1.summary())"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Text(0, 0.5, 'Consumption (Y)')"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.figure()\n",
"plt.plot(df.ingreso_muestra,df.consumo_muestra,'o')\n",
"plt.plot(df.ingreso_muestra,ajuste_1.fittedvalues,'-',color='r')\n",
"plt.xlabel('Income (X)')\n",
"plt.ylabel('Consumption (Y)')\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Regression: step by step"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The population regression function is:\n",
"\n",
"$$\n",
"Y_i = \\beta_1 + \\beta_2X_i+u_i\n",
"$$\n",
"\n",
"Since it is not observable, we use its sample counterpart:\n",
"\n",
"$$\n",
"Y_i=\\hat{\\beta}_1+\\hat{\\beta}_2X_i+\\hat{u}_i\n",
"$$\n",
"\n",
"\n",
"$$\n",
"Y_i=\\hat{Y}_i+\\hat{u}_i\n",
"$$\n",
"\n",
"\n",
"$$\n",
"\\hat{u}_i = Y_i-\\hat{Y}_i\n",
"$$\n",
"\n",
"\n",
"$$\n",
"\\hat{u}_i = Y_i- \\hat{\\beta}_1-\\hat{\\beta}_2X_i\n",
"$$\n",
"\n",
"\n",
"This is why the sum of squared residuals is a function of the betas:\n",
"\n",
"\n",
"$$\n",
"\\sum\\hat{u}_i^2 =\\sum (Y_i- \\hat{\\beta}_1-\\hat{\\beta}_2X_i)^2\n",
"$$\n",
"\n",
"\n",
"$$\n",
"\\sum\\hat{u}_i^2 =f(\\hat{\\beta}_1,\\hat{\\beta}_2)\n",
"$$\n",
"\n",
"\n",
"Differentiating with respect to $\\hat{\\beta}_1$ and $\\hat{\\beta}_2$ and setting both derivatives to zero gives:\n",
"\n",
"$$\n",
" \\hat{\\beta}_2 = \\frac{S_{xy}}{S_{xx}}\n",
"$$\n",
"\n",
"$$\n",
" \\hat\\beta_1 = \\bar{Y} - \\hat\\beta_2\\bar{X}\n",
"$$\n",
"where\n",
"\n",
"$$\n",
"S_{xx} = \\sum_{i=1}^{n}X_i^2-n\\bar{X}^2\n",
"$$\n",
"\n",
"$$\n",
"S_{xy} = \\sum_{i=1}^{n}X_i Y_i-n\\bar{X}\\bar{Y}\n",
"$$\n",
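"\n",
"A sketch of the differentiation step, written with the same symbols as above (the intermediate equations are filled in here; they are the standard least-squares normal equations):\n",
"\n",
"$$\n",
"\\frac{\\partial \\sum\\hat{u}_i^2}{\\partial \\hat{\\beta}_1} = -2\\sum (Y_i- \\hat{\\beta}_1-\\hat{\\beta}_2X_i) = 0\n",
"$$\n",
"\n",
"$$\n",
"\\frac{\\partial \\sum\\hat{u}_i^2}{\\partial \\hat{\\beta}_2} = -2\\sum X_i(Y_i- \\hat{\\beta}_1-\\hat{\\beta}_2X_i) = 0\n",
"$$\n",
"\n",
"Rearranging gives the two normal equations:\n",
"\n",
"$$\n",
"\\sum Y_i = n\\hat{\\beta}_1+\\hat{\\beta}_2\\sum X_i\n",
"$$\n",
"\n",
"$$\n",
"\\sum X_iY_i = \\hat{\\beta}_1\\sum X_i+\\hat{\\beta}_2\\sum X_i^2\n",
"$$\n",
"\n",
"Dividing the first by $n$ yields $\\hat{\\beta}_1 = \\bar{Y}-\\hat{\\beta}_2\\bar{X}$. Substituting this into the second, with $\\sum X_i = n\\bar{X}$, gives\n",
"\n",
"$$\n",
"\\sum X_iY_i-n\\bar{X}\\bar{Y} = \\hat{\\beta}_2\\left(\\sum X_i^2-n\\bar{X}^2\\right)\n",
"$$\n",
"\n",
"which is $\\hat{\\beta}_2 = S_{xy}/S_{xx}$.\n",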
"\n",
"\n",
"We load `Tabla3_2.csv`, which contains:\n",
"\n",
"- consumption expenditure (Y) and \n",
"- income (X).\n"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Y X\n",
"0 70 80\n",
"1 65 100\n",
"2 90 120\n",
"3 95 140\n",
"4 110 160"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"consumo = pd.read_csv('https://raw.githubusercontent.com/vmoprojs/DataLectures/master/GA/Tabla3_2.csv',\n",
" sep = ';',decimal = '.')\n",
"\n",
"\n",
"consumo.head()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(24.454545454545467, 0.509090909090909)"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"media_x = np.mean(consumo['X'])\n",
"media_y = np.mean(consumo['Y'])\n",
"\n",
"\n",
"n = consumo.shape[0]\n",
"\n",
"sumcuad_x = np.sum(consumo['X']*consumo['X'])\n",
"sum_xy = np.sum(consumo['X']*consumo['Y'])\n",
"\n",
"beta_som = (sum_xy-n*media_x*media_y)/(sumcuad_x-n*(media_x**2))\n",
"alpha_som = media_y-beta_som*media_x\n",
"(alpha_som,beta_som)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We verify the result above with `statsmodels`:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" OLS Regression Results \n",
"==============================================================================\n",
"Dep. Variable: Y R-squared: 0.962\n",
"Model: OLS Adj. R-squared: 0.957\n",
"Method: Least Squares F-statistic: 202.9\n",
"Date: Wed, 18 Sep 2024 Prob (F-statistic): 5.75e-07\n",
"Time: 05:32:17 Log-Likelihood: -31.781\n",
"No. Observations: 10 AIC: 67.56\n",
"Df Residuals: 8 BIC: 68.17\n",
"Df Model: 1 \n",
"Covariance Type: nonrobust \n",
"==============================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"------------------------------------------------------------------------------\n",
"Intercept 24.4545 6.414 3.813 0.005 9.664 39.245\n",
"X 0.5091 0.036 14.243 0.000 0.427 0.592\n",
"==============================================================================\n",
"Omnibus: 1.060 Durbin-Watson: 2.680\n",
"Prob(Omnibus): 0.589 Jarque-Bera (JB): 0.777\n",
"Skew: -0.398 Prob(JB): 0.678\n",
"Kurtosis: 1.891 Cond. No. 561.\n",
"==============================================================================\n",
"\n",
"Notes:\n",
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/victormorales/opt/anaconda3/lib/python3.9/site-packages/scipy/stats/_stats_py.py:1971: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=10\n",
" k, _ = kurtosistest(a, axis)\n"
]
}
],
"source": [
"reg_1 = smf.ols('Y~X',data = consumo)\n",
"print(reg_1.fit().summary())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's see what our fitted values look like:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" X y_ajustado\n",
"0 80 65.181818\n",
"1 100 75.363636\n",
"2 120 85.545455\n",
"3 140 95.727273\n",
"4 160 105.909091\n",
"5 180 116.090909\n",
"6 200 126.272727\n",
"7 220 136.454545\n",
"8 240 146.636364\n",
"9 260 156.818182"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_ajustado = alpha_som+beta_som*consumo['X']\n",
"\n",
"dfAux = pd.DataFrame(list(zip(consumo['X'],y_ajustado)),\n",
" columns = ['X','y_ajustado'])\n",
"dfAux"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" X Y y_ajustado e\n",
"0 80 70 65.181818 4.818182\n",
"1 100 65 75.363636 -10.363636\n",
"2 120 90 85.545455 4.454545\n",
"3 140 95 95.727273 -0.727273\n",
"4 160 110 105.909091 4.090909\n",
"5 180 115 116.090909 -1.090909\n",
"6 200 120 126.272727 -6.272727\n",
"7 220 140 136.454545 3.545455\n",
"8 240 155 146.636364 8.363636\n",
"9 260 150 156.818182 -6.818182"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"e = consumo['Y']-y_ajustado\n",
"\n",
"\n",
"dfAux = pd.DataFrame(list(zip(consumo['X'],consumo['Y'],y_ajustado,e)),\n",
" columns = ['X','Y','y_ajustado','e'])\n",
"\n",
"dfAux"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[1.00000000e+00, 1.13838806e-15],\n",
" [1.13838806e-15, 1.00000000e+00]])"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
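"np.mean(e)  # residual mean: zero up to floating-point error (not displayed)\n",
"np.corrcoef(e,consumo['X'])  # residuals are uncorrelated with X"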
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.9620615604867568"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"SCT = np.sum((consumo['Y']-media_y)**2)  # total sum of squares\n",
"SCE = np.sum((y_ajustado-media_y)**2)  # explained sum of squares\n",
"SCR = np.sum(e**2)  # residual sum of squares\n",
"\n",
"R_2 = SCE/SCT\n",
"R_2\n"
]
},
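{
"cell_type": "markdown",
"metadata": {},
"source": [
"The ratio computed above rests on the usual sum-of-squares decomposition. A short sketch, assuming an OLS fit with an intercept and reading `SCT`, `SCE` and `SCR` as the total, explained and residual sums of squares:\n",
"\n",
"$$\n",
"\\underbrace{\\sum (Y_i-\\bar{Y})^2}_{SCT} = \\underbrace{\\sum (\\hat{Y}_i-\\bar{Y})^2}_{SCE} + \\underbrace{\\sum \\hat{u}_i^2}_{SCR}\n",
"$$\n",
"\n",
"$$\n",
"R^2 = \\frac{SCE}{SCT} = 1-\\frac{SCR}{SCT}\n",
"$$\n",
"\n",
"With the data above this gives $R^2 \\approx 0.962$, the same value reported as `R-squared` in the `statsmodels` summary."
]
},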
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" OLS Regression Results \n",
"==============================================================================\n",
"Dep. Variable: Y R-squared: 0.962\n",
"Model: OLS Adj. R-squared: 0.957\n",
"Method: Least Squares F-statistic: 202.9\n",
"Date: Wed, 18 Sep 2024 Prob (F-statistic): 5.75e-07\n",
"Time: 05:32:17 Log-Likelihood: -31.781\n",
"No. Observations: 10 AIC: 67.56\n",
"Df Residuals: 8 BIC: 68.17\n",
"Df Model: 1 \n",
"Covariance Type: nonrobust \n",
"==============================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"------------------------------------------------------------------------------\n",
"Intercept 24.4545 6.414 3.813 0.005 9.664 39.245\n",
"X 0.5091 0.036 14.243 0.000 0.427 0.592\n",
"==============================================================================\n",
"Omnibus: 1.060 Durbin-Watson: 2.680\n",
"Prob(Omnibus): 0.589 Jarque-Bera (JB): 0.777\n",
"Skew: -0.398 Prob(JB): 0.678\n",
"Kurtosis: 1.891 Cond. No. 561.\n",
"==============================================================================\n",
"\n",
"Notes:\n",
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/victormorales/opt/anaconda3/lib/python3.9/site-packages/scipy/stats/_stats_py.py:1971: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=10\n",
" k, _ = kurtosistest(a, axis)\n"
]
}
],
"source": [
"print(reg_1.fit().summary())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Another example"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$\n",
"H_0:\\beta_2=0\n",
"$$\n",
"$$\n",
"H_1:\\beta_2\\neq 0\n",
"$$"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" OLS Regression Results \n",
"==============================================================================\n",
"Dep. Variable: FOODEXP R-squared: 0.370\n",
"Model: OLS Adj. R-squared: 0.358\n",
"Method: Least Squares F-statistic: 31.10\n",
"Date: Wed, 18 Sep 2024 Prob (F-statistic): 8.45e-07\n",
"Time: 05:32:17 Log-Likelihood: -308.16\n",
"No. Observations: 55 AIC: 620.3\n",
"Df Residuals: 53 BIC: 624.3\n",
"Df Model: 1 \n",
"Covariance Type: nonrobust \n",
"==============================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"------------------------------------------------------------------------------\n",
"Intercept 94.2088 50.856 1.852 0.070 -7.796 196.214\n",
"TOTALEXP 0.4368 0.078 5.577 0.000 0.280 0.594\n",
"==============================================================================\n",
"Omnibus: 0.763 Durbin-Watson: 2.083\n",
"Prob(Omnibus): 0.683 Jarque-Bera (JB): 0.258\n",
"Skew: 0.120 Prob(JB): 0.879\n",
"Kurtosis: 3.234 Cond. No. 3.66e+03\n",
"==============================================================================\n",
"\n",
"Notes:\n",
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n",
"[2] The condition number is large, 3.66e+03. This might indicate that there are\n",
"strong multicollinearity or other numerical problems.\n"
]
}
],
"source": [
"uu = 'https://raw.githubusercontent.com/vmoprojs/DataLectures/master/GA/table2_8.csv'\n",
"\n",
"datos = pd.read_csv(uu,sep = ';')\n",
"datos.shape\n",
"datos.columns\n",
"m1 = smf.ols('FOODEXP~TOTALEXP',data = datos)\n",
"print(m1.fit().summary())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We regress food expenditure on total expenditure.\n",
"\n",
"Are the coefficients different from zero?"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"3.8888077047438685e-07"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import scipy.stats as st\n",
"\n",
"\n",
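"t_ho = 0  # value of beta_2 under the null hypothesis\n",
"t1 = (0.4368-t_ho)/ 0.078  # t statistic from the rounded coefficient and standard error above\n",
"(1-st.t.cdf(t1,df = 53))  # upper-tail (one-sided) p-value; double it for the two-sided H_1\n",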
"\n"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.04261898819196597"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"\n",
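"t_ho = 0.3  # test beta_2 = 0.3 instead of beta_2 = 0\n",
"t1 = (0.4368-t_ho)/ 0.078  # t statistic for the new null value\n",
"(1-st.t.cdf(np.abs(t1),df = 53))  # upper-tail (one-sided) p-value"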
]
},
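{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two cells above evaluate the usual $t$ statistic by hand. As a sketch, writing $\\beta_2^*$ for the value of the slope under $H_0$ and $\\operatorname{se}$ for the standard error (both symbols are introduced here for illustration):\n",
"\n",
"$$\n",
"t = \\frac{\\hat{\\beta}_2-\\beta_2^*}{\\operatorname{se}(\\hat{\\beta}_2)} \\sim t_{n-2} \\quad \\text{under } H_0\n",
"$$\n",
"\n",
"With the rounded values from the summary ($\\hat{\\beta}_2 = 0.4368$, $\\operatorname{se}(\\hat{\\beta}_2) = 0.078$, $n-2 = 53$):\n",
"\n",
"$$\n",
"t = \\frac{0.4368-0}{0.078} = 5.6, \\qquad t = \\frac{0.4368-0.3}{0.078} \\approx 1.754\n",
"$$\n",
"\n",
"The code returns the upper-tail probability $P(T_{53} > t)$; for a two-sided alternative such as $H_1:\\beta_2\\neq 0$ the p-value is $2\\,P(T_{53} > |t|)$. The first statistic differs slightly from the 5.577 in the summary because the inputs are rounded."
]
},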
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Interpreting the coefficients\n",
"\n",
"- The coefficient on the independent variable measures the rate of change (derivative = slope) of the model\n",
"- The usual reading is: *on average, a one-unit increase in the independent variable produces an increase/decrease of $\\beta_i$ units in the dependent variable*\n",
"- Interpret the previous regression."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Practice: Purchasing power parity"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" COUNTRY BMACLC BMAC$ EXCH PPP LOCALC\n",
"0 United States 2.54 2.54 -99999.00 -99999.00 -99999\n",
"1 Argentina 2.50 2.50 1.00 0.98 -40\n",
"2 Australia 3.00 1.52 1.98 1.18 -35\n",
"3 Brazil 3.60 1.64 2.19 1.42 -31\n",
"4 Britain 1.99 2.85 1.43 1.28 12"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"uu = \"https://raw.githubusercontent.com/vmoprojs/DataLectures/master/GA/Tabla5_9.csv\"\n",
"\n",
"datos = pd.read_csv(uu,sep = ';')\n",
"datos.head()"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" COUNTRY BMACLC BMAC$ EXCH PPP LOCALC\n",
"0 United States 2.54 2.54 NaN NaN NaN\n",
"1 Argentina 2.50 2.50 1.00 0.98 -40.0\n",
"2 Australia 3.00 1.52 1.98 1.18 -35.0\n",
"3 Brazil 3.60 1.64 2.19 1.42 -31.0\n",
"4 Britain 1.99 2.85 1.43 1.28 12.0"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"datos['EXCH'] = datos.EXCH.replace(to_replace= -99999, value=np.nan)\n",
"datos['PPP'] = datos.PPP.replace(to_replace = -99999, value = np.nan)\n",
"datos['LOCALC'] = datos.LOCALC.replace(to_replace= -99999, value = np.nan)\n",
"\n",
"datos.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We regress the exchange rate on purchasing power parity"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" OLS Regression Results \n",
"==============================================================================\n",
"Dep. Variable: EXCH R-squared: 0.987\n",
"Model: OLS Adj. R-squared: 0.986\n",
"Method: Least Squares F-statistic: 2066.\n",
"Date: Wed, 18 Sep 2024 Prob (F-statistic): 8.80e-28\n",
"Time: 05:32:17 Log-Likelihood: -205.44\n",
"No. Observations: 30 AIC: 414.9\n",
"Df Residuals: 28 BIC: 417.7\n",
"Df Model: 1 \n",
"Covariance Type: nonrobust \n",
"==============================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"------------------------------------------------------------------------------\n",
"Intercept -61.3889 44.987 -1.365 0.183 -153.541 30.763\n",
"PPP 1.8153 0.040 45.450 0.000 1.733 1.897\n",
"==============================================================================\n",
"Omnibus: 35.744 Durbin-Watson: 2.062\n",
"Prob(Omnibus): 0.000 Jarque-Bera (JB): 94.065\n",
"Skew: -2.585 Prob(JB): 3.75e-21\n",
"Kurtosis: 9.966 Cond. No. 1.18e+03\n",
"==============================================================================\n",
"\n",
"Notes:\n",
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n",
"[2] The condition number is large, 1.18e+03. This might indicate that there are\n",
"strong multicollinearity or other numerical problems.\n"
]
}
],
"source": [
"reg1 = smf.ols('EXCH~PPP',data = datos)\n",
"print(reg1.fit().summary())"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[]"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.figure()\n",
"plt.plot(np.arange(0,30),reg1.fit().resid,'-o')"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" OLS Regression Results \n",
"==============================================================================\n",
"Dep. Variable: np.log(EXCH) R-squared: 0.983\n",
"Model: OLS Adj. R-squared: 0.983\n",
"Method: Least Squares F-statistic: 1655.\n",
"Date: Wed, 18 Sep 2024 Prob (F-statistic): 1.87e-26\n",
"Time: 05:32:18 Log-Likelihood: -7.4056\n",
"No. Observations: 30 AIC: 18.81\n",
"Df Residuals: 28 BIC: 21.61\n",
"Df Model: 1 \n",
"Covariance Type: nonrobust \n",
"===============================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"-------------------------------------------------------------------------------\n",
"Intercept 0.3436 0.086 3.990 0.000 0.167 0.520\n",
"np.log(PPP) 1.0023 0.025 40.688 0.000 0.952 1.053\n",
"==============================================================================\n",
"Omnibus: 2.829 Durbin-Watson: 1.629\n",
"Prob(Omnibus): 0.243 Jarque-Bera (JB): 1.449\n",
"Skew: -0.179 Prob(JB): 0.485\n",
"Kurtosis: 1.985 Cond. No. 5.38\n",
"==============================================================================\n",
"\n",
"Notes:\n",
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n"
]
}
],
"source": [
"reg3 = smf.ols('np.log(EXCH)~np.log(PPP)',data = datos)\n",
"print(reg3.fit().summary())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Purchasing power parity (PPP) holds that one unit of currency should buy the same basket of goods in every country."
]
},
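{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to connect this statement to the log-log regression above, offered as a sketch rather than a formal test: in the model\n",
"\n",
"$$\n",
"\\ln EXCH_i = \\beta_1+\\beta_2\\ln PPP_i+u_i\n",
"$$\n",
"\n",
"the slope is an elasticity,\n",
"\n",
"$$\n",
"\\beta_2 = \\frac{d\\ln EXCH}{d\\ln PPP}\n",
"$$\n",
"\n",
"If the exchange rate matched PPP exactly, $EXCH_i = PPP_i$, then $\\ln EXCH_i = 0+1\\cdot\\ln PPP_i$, that is, $\\beta_1 = 0$ and $\\beta_2 = 1$. In the summary above the slope estimate is 1.0023 with a 95% interval of $[0.952, 1.053]$, which contains 1, while the intercept interval $[0.167, 0.520]$ does not contain 0."
]
},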
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Practice: Sleep"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"uu = \"https://raw.githubusercontent.com/vmoprojs/DataLectures/master/WO/sleep75.csv\"\n",
"\n",
"datos = pd.read_csv(uu,sep = \",\",header=None)\n",
"datos.columns\n",
"datos.columns = [\"age\",\"black\",\"case\",\"clerical\",\"construc\",\"educ\",\"earns74\",\"gdhlth\",\"inlf\", \"leis1\", \"leis2\", \"leis3\", \"smsa\", \"lhrwage\", \"lothinc\", \"male\", \"marr\", \"prot\", \"rlxall\", \"selfe\", \"sleep\", \"slpnaps\", \"south\", \"spsepay\", \"spwrk75\", \"totwrk\" , \"union\" , \"worknrm\" , \"workscnd\", \"exper\" , \"yngkid\",\"yrsmarr\", \"hrwage\", \"agesq\"]"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# totwrk: minutes worked per week\n",
"# sleep: minutes slept per week\n",
"plt.figure()\n",
"plt.scatter(datos['totwrk'],datos['sleep'])"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" OLS Regression Results \n",
"==============================================================================\n",
"Dep. Variable: sleep R-squared: 0.103\n",
"Model: OLS Adj. R-squared: 0.102\n",
"Method: Least Squares F-statistic: 81.09\n",
"Date: Wed, 18 Sep 2024 Prob (F-statistic): 1.99e-18\n",
"Time: 05:32:18 Log-Likelihood: -5267.1\n",
"No. Observations: 706 AIC: 1.054e+04\n",
"Df Residuals: 704 BIC: 1.055e+04\n",
"Df Model: 1 \n",
"Covariance Type: nonrobust \n",
"==============================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"------------------------------------------------------------------------------\n",
"Intercept 3586.3770 38.912 92.165 0.000 3509.979 3662.775\n",
"totwrk -0.1507 0.017 -9.005 0.000 -0.184 -0.118\n",
"==============================================================================\n",
"Omnibus: 68.651 Durbin-Watson: 1.955\n",
"Prob(Omnibus): 0.000 Jarque-Bera (JB): 192.044\n",
"Skew: -0.483 Prob(JB): 1.99e-42\n",
"Kurtosis: 5.365 Cond. No. 5.71e+03\n",
"==============================================================================\n",
"\n",
"Notes:\n",
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n",
"[2] The condition number is large, 5.71e+03. This might indicate that there are\n",
"strong multicollinearity or other numerical problems.\n"
]
}
],
"source": [
"dormir = smf.ols('sleep~totwrk',data = datos)\n",
"print(dormir.fit().summary())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To access elements of the fitted model:"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Intercept 38.912427\n",
"totwrk 0.016740\n",
"dtype: float64\n",
"Intercept 3586.376952\n",
"totwrk -0.150746\n",
"dtype: float64\n"
]
}
],
"source": [
"print(dormir.fit().bse)\n",
"print(dormir.fit().params)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A confidence interval for $\\beta_2$, followed by a look at the residuals:"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([-0.18422633, -0.11726532])"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"res = dormir.fit()  # fit once and reuse the result\n",
"res.params.iloc[1]+np.array([-2,2])*res.bse.iloc[1]  # approximate 95% interval: estimate +/- 2 standard errors"
]
},
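{
"cell_type": "markdown",
"metadata": {},
"source": [
"The interval above follows the usual formula, sketched here with $\\operatorname{se}$ denoting the standard error (notation introduced for illustration):\n",
"\n",
"$$\n",
"\\hat{\\beta}_2 \\pm t_{\\alpha/2,\\,n-2}\\,\\operatorname{se}(\\hat{\\beta}_2)\n",
"$$\n",
"\n",
"The code uses 2 as a round approximation to the 95% critical value, which is close to 1.96 with $n-2 = 704$ degrees of freedom. With the estimates printed earlier:\n",
"\n",
"$$\n",
"-0.150746 \\pm 2(0.016740) \\approx (-0.1842,\\ -0.1173)\n",
"$$\n",
"\n",
"This is slightly wider than the interval in the summary table, which uses the exact critical value."
]
},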
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.figure()\n",
"plt.hist(dormir.fit().resid,bins = 60,density = True);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Linear transformations"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" OLS Regression Results \n",
"==============================================================================\n",
"Dep. Variable: Cellphone R-squared: 0.626\n",
"Model: OLS Adj. R-squared: 0.615\n",
"Method: Least Squares F-statistic: 53.67\n",
"Date: Wed, 18 Sep 2024 Prob (F-statistic): 2.50e-08\n",
"Time: 05:32:19 Log-Likelihood: -148.94\n",
"No. Observations: 34 AIC: 301.9\n",
"Df Residuals: 32 BIC: 304.9\n",
"Df Model: 1 \n",
"Covariance Type: nonrobust \n",
"==============================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"------------------------------------------------------------------------------\n",
"Intercept 12.4795 6.109 2.043 0.049 0.037 24.922\n",
"Pcapincome 0.0023 0.000 7.326 0.000 0.002 0.003\n",
"==============================================================================\n",
"Omnibus: 1.398 Durbin-Watson: 2.381\n",
"Prob(Omnibus): 0.497 Jarque-Bera (JB): 0.531\n",
"Skew: 0.225 Prob(JB): 0.767\n",
"Kurtosis: 3.414 Cond. No. 3.46e+04\n",
"==============================================================================\n",
"\n",
"Notes:\n",
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n",
"[2] The condition number is large, 3.46e+04. This might indicate that there are\n",
"strong multicollinearity or other numerical problems.\n"
]
}
],
"source": [
"uu = \"https://raw.githubusercontent.com/vmoprojs/DataLectures/master/GA/Table%2031_3.csv\"\n",
"\n",
"datos = pd.read_csv(uu, sep =';')\n",
"\n",
"reg_1 = smf.ols('Cellphone~Pcapincome',data = datos)\n",
"print(reg_1.fit().summary())"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Text(0, 0.5, 'Cellphone')"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.figure()\n",
"plt.plot(datos.Pcapincome,datos.Cellphone,'o')\n",
"plt.plot(datos.Pcapincome,reg_1.fit().fittedvalues,'-',color='r')\n",
"plt.xlabel('Pcapincome')\n",
"plt.ylabel('Cellphone')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Reciprocal model"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>CM</th><th>FLR</th><th>PGNP</th><th>TFR</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>count</th><td>64.000000</td><td>64.000000</td><td>64.000000</td><td>64.000000</td></tr>\n",
"    <tr><th>mean</th><td>141.500000</td><td>51.187500</td><td>1401.250000</td><td>5.549688</td></tr>\n",
"    <tr><th>std</th><td>75.978067</td><td>26.007859</td><td>2725.695775</td><td>1.508993</td></tr>\n",
"    <tr><th>min</th><td>12.000000</td><td>9.000000</td><td>120.000000</td><td>1.690000</td></tr>\n",
"    <tr><th>25%</th><td>82.000000</td><td>29.000000</td><td>300.000000</td><td>4.607500</td></tr>\n",
"    <tr><th>50%</th><td>138.500000</td><td>48.000000</td><td>620.000000</td><td>6.040000</td></tr>\n",
"    <tr><th>75%</th><td>192.500000</td><td>77.250000</td><td>1317.500000</td><td>6.615000</td></tr>\n",
"    <tr><th>max</th><td>312.000000</td><td>95.000000</td><td>19830.000000</td><td>8.490000</td></tr>\n",
"  </tbody>\n",
"</table>"
],
"text/plain": [
" CM FLR PGNP TFR\n",
"count 64.000000 64.000000 64.000000 64.000000\n",
"mean 141.500000 51.187500 1401.250000 5.549688\n",
"std 75.978067 26.007859 2725.695775 1.508993\n",
"min 12.000000 9.000000 120.000000 1.690000\n",
"25% 82.000000 29.000000 300.000000 4.607500\n",
"50% 138.500000 48.000000 620.000000 6.040000\n",
"75% 192.500000 77.250000 1317.500000 6.615000\n",
"max 312.000000 95.000000 19830.000000 8.490000"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"uu = \"https://raw.githubusercontent.com/vmoprojs/DataLectures/master/GA/tabla_6_4.csv\"\n",
"datos = pd.read_csv(uu,sep = ';')\n",
"datos.describe()"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Text(0, 0.5, 'CM')"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.figure()\n",
"plt.plot(datos.PGNP,datos.CM,'o')\n",
"plt.xlabel('PGNP')\n",
"plt.ylabel('CM')"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" OLS Regression Results \n",
"==============================================================================\n",
"Dep. Variable: CM R-squared: 0.166\n",
"Model: OLS Adj. R-squared: 0.153\n",
"Method: Least Squares F-statistic: 12.36\n",
"Date: Wed, 18 Sep 2024 Prob (F-statistic): 0.000826\n",
"Time: 05:32:19 Log-Likelihood: -361.64\n",
"No. Observations: 64 AIC: 727.3\n",
"Df Residuals: 62 BIC: 731.6\n",
"Df Model: 1 \n",
"Covariance Type: nonrobust \n",
"==============================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"------------------------------------------------------------------------------\n",
"Intercept 157.4244 9.846 15.989 0.000 137.743 177.105\n",
"PGNP -0.0114 0.003 -3.516 0.001 -0.018 -0.005\n",
"==============================================================================\n",
"Omnibus: 3.321 Durbin-Watson: 1.931\n",
"Prob(Omnibus): 0.190 Jarque-Bera (JB): 2.545\n",
"Skew: 0.345 Prob(JB): 0.280\n",
"Kurtosis: 2.309 Cond. No. 3.43e+03\n",
"==============================================================================\n",
"\n",
"Notes:\n",
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n",
"[2] The condition number is large, 3.43e+03. This might indicate that there are\n",
"strong multicollinearity or other numerical problems.\n"
]
}
],
"source": [
"reg1 = smf.ols('CM~PGNP',data = datos)\n",
"print(reg1.fit().summary())"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" OLS Regression Results \n",
"==============================================================================\n",
"Dep. Variable: CM R-squared: 0.459\n",
"Model: OLS Adj. R-squared: 0.450\n",
"Method: Least Squares F-statistic: 52.61\n",
"Date: Wed, 18 Sep 2024 Prob (F-statistic): 7.82e-10\n",
"Time: 05:32:20 Log-Likelihood: -347.79\n",
"No. Observations: 64 AIC: 699.6\n",
"Df Residuals: 62 BIC: 703.9\n",
"Df Model: 1 \n",
"Covariance Type: nonrobust \n",
"==============================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"------------------------------------------------------------------------------\n",
"Intercept 81.7944 10.832 7.551 0.000 60.141 103.447\n",
"RepPGNP 2.727e+04 3759.999 7.254 0.000 1.98e+04 3.48e+04\n",
"==============================================================================\n",
"Omnibus: 0.147 Durbin-Watson: 1.959\n",
"Prob(Omnibus): 0.929 Jarque-Bera (JB): 0.334\n",
"Skew: 0.065 Prob(JB): 0.846\n",
"Kurtosis: 2.671 Cond. No. 534.\n",
"==============================================================================\n",
"\n",
"Notes:\n",
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n"
]
}
],
"source": [
"# Reciprocal of PGNP, used as the regressor in CM = b1 + b2*(1/PGNP)\n",
"datos['RepPGNP'] = 1/datos.PGNP\n",
"\n",
"reg2 = smf.ols('CM~RepPGNP',data = datos)\n",
"print(reg2.fit().summary())"
]
},
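{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sketch in standard notation (the symbols $\\beta_1$, $\\beta_2$ and $u_i$ are assumed here, following the $\\beta_2$ convention used earlier in the notebook), the reciprocal model fitted above is\n",
"\n",
"$$CM_i = \\beta_1 + \\beta_2\\,\\frac{1}{PGNP_i} + u_i$$\n",
"\n",
"Its slope with respect to $PGNP$ is not constant:\n",
"\n",
"$$\\frac{dCM}{dPGNP} = -\\frac{\\beta_2}{PGNP^2}$$\n",
"\n",
"With the reported estimates ($\\hat{\\beta}_1 \\approx 81.79$, $\\hat{\\beta}_2 \\approx 27270$), the fitted curve falls as $PGNP$ rises and flattens toward the asymptote $\\hat{\\beta}_1 \\approx 81.79$. At the sample median $PGNP = 620$, for example, the implied slope is about $-27270/620^2 \\approx -0.071$. Both regressions share the same dependent variable, so their $R^2$ values (0.166 for the linear fit, 0.459 for the reciprocal fit) can be compared directly."
]
},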
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Log-linear model"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" OLS Regression Results \n",
"==============================================================================\n",
"Dep. Variable: lsalary R-squared: 0.013\n",
"Model: OLS Adj. R-squared: 0.008\n",
"Method: Least Squares F-statistic: 2.334\n",
"Date: Wed, 18 Sep 2024 Prob (F-statistic): 0.128\n",
"Time: 05:32:20 Log-Likelihood: -160.84\n",
"No. Observations: 177 AIC: 325.7\n",
"Df Residuals: 175 BIC: 332.0\n",
"Df Model: 1 \n",
"Covariance Type: nonrobust \n",
"==============================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"------------------------------------------------------------------------------\n",
"Intercept 6.5055 0.068 95.682 0.000 6.371 6.640\n",
"ceoten 0.0097 0.006 1.528 0.128 -0.003 0.022\n",
"==============================================================================\n",
"Omnibus: 3.858 Durbin-Watson: 2.084\n",
"Prob(Omnibus): 0.145 Jarque-Bera (JB): 3.907\n",
"Skew: -0.189 Prob(JB): 0.142\n",
"Kurtosis: 3.622 Cond. No. 16.1\n",
"==============================================================================\n",
"\n",
"Notes:\n",
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n"
]
}
],
"source": [
"uu = \"https://raw.githubusercontent.com/vmoprojs/DataLectures/master/WO/ceosal2.csv\"\n",
"\n",
"datos = pd.read_csv(uu,header = None)\n",
"datos.columns = [\"salary\", \"age\", \"college\", \"grad\", \"comten\", \"ceoten\", \"sales\", \"profits\",\"mktval\", \"lsalary\", \"lsales\", \"lmktval\", \"comtensq\", \"ceotensq\", \"profmarg\"]\n",
"datos.head()\n",
"\n",
"\n",
"reg1 = smf.ols('lsalary~ceoten',data = datos)\n",
"print(reg1.fit().summary())"
]
},
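{
"cell_type": "markdown",
"metadata": {},
"source": [
"A brief sketch of how the slope of this log-lin model can be read, assuming `ceoten` is measured in years and using the same $\\beta$ notation as before (the symbols are illustrative, not taken from the code):\n",
"\n",
"$$\\ln(salary_i) = \\beta_1 + \\beta_2\\,ceoten_i + u_i$$\n",
"\n",
"$$\\%\\Delta salary \\approx 100\\,\\beta_2\\,\\Delta ceoten$$\n",
"\n",
"With $\\hat{\\beta}_2 = 0.0097$, one more year of tenure is associated with roughly $0.97\\%$ higher salary; the exact figure is $100\\,(e^{0.0097} - 1) \\approx 0.975\\%$. The estimate is not statistically significant at conventional levels ($p = 0.128$), and $R^2 = 0.013$ shows that tenure alone explains very little of the variation in log salary."
]
},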
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Regression through the origin"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" OLS Regression Results \n",
"=======================================================================================\n",
"Dep. Variable: Y R-squared (uncentered): 0.502\n",
"Model: OLS Adj. R-squared (uncentered): 0.500\n",
"Method: Least Squares F-statistic: 241.2\n",
"Date: Wed, 18 Sep 2024 Prob (F-statistic): 4.41e-38\n",
"Time: 05:32:21 Log-Likelihood: -751.30\n",
"No. Observations: 240 AIC: 1505.\n",
"Df Residuals: 239 BIC: 1508.\n",
"Df Model: 1 \n",
"Covariance Type: nonrobust \n",
"==============================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"------------------------------------------------------------------------------\n",
"X 1.1555 0.074 15.532 0.000 1.009 1.302\n",
"==============================================================================\n",
"Omnibus: 9.576 Durbin-Watson: 1.973\n",
"Prob(Omnibus): 0.008 Jarque-Bera (JB): 13.569\n",
"Skew: -0.268 Prob(JB): 0.00113\n",
"Kurtosis: 4.034 Cond. No. 1.00\n",
"==============================================================================\n",
"\n",
"Notes:\n",
"[1] R² is computed without centering (uncentered) since the model does not contain a constant.\n",
"[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n"
]
}
],
"source": [
"uu = \"https://raw.githubusercontent.com/vmoprojs/DataLectures/master/GA/Table%206_1.csv\"\n",
"datos = pd.read_csv(uu,sep = ';')\n",
"datos.head()\n",
"\n",
"\n",
"lmod1 = smf.ols('Y~ -1+X',data = datos)\n",
"print(lmod1.fit().summary())"
]
},
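{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, a sketch of the standard one-regressor formulas behind a regression through the origin (textbook notation, assumed here rather than taken from the code):\n",
"\n",
"$$Y_i = \\beta_2 X_i + u_i, \\qquad \\hat{\\beta}_2 = \\frac{\\sum X_i Y_i}{\\sum X_i^2}$$\n",
"\n",
"Without a constant, the summary reports the uncentered (raw) coefficient of determination, which uses raw sums of squares instead of deviations from the mean:\n",
"\n",
"$$\\text{raw } r^2 = \\frac{\\left(\\sum X_i Y_i\\right)^2}{\\sum X_i^2 \\, \\sum Y_i^2}$$\n",
"\n",
"This is why the value of 0.502 is not directly comparable with the conventional $R^2$ of a model that includes an intercept."
]
},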
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Multiple linear regression"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>price</th><th>assess</th><th>bdrms</th><th>lotsize</th><th>sqrft</th><th>colonial</th><th>lprice</th><th>lassess</th><th>llotsize</th><th>lsqrft</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>count</th><td>88.000000</td><td>88.000000</td><td>88.000000</td><td>88.000000</td><td>88.000000</td><td>88.000000</td><td>88.000000</td><td>88.000000</td><td>88.000000</td><td>88.000000</td></tr>\n",
"    <tr><th>mean</th><td>293.546034</td><td>315.736364</td><td>3.568182</td><td>9019.863636</td><td>2013.693182</td><td>0.693182</td><td>5.633180</td><td>5.717994</td><td>8.905105</td><td>7.572610</td></tr>\n",
"    <tr><th>std</th><td>102.713445</td><td>95.314437</td><td>0.841393</td><td>10174.150414</td><td>577.191583</td><td>0.463816</td><td>0.303573</td><td>0.262113</td><td>0.544060</td><td>0.258688</td></tr>\n",
"    <tr><th>min</th><td>111.000000</td><td>198.700000</td><td>2.000000</td><td>1000.000000</td><td>1171.000000</td><td>0.000000</td><td>4.709530</td><td>5.291796</td><td>6.907755</td><td>7.065613</td></tr>\n",
"    <tr><th>25%</th><td>230.000000</td><td>253.900000</td><td>3.000000</td><td>5732.750000</td><td>1660.500000</td><td>0.000000</td><td>5.438079</td><td>5.536940</td><td>8.653908</td><td>7.414873</td></tr>\n",
"    <tr><th>50%</th><td>265.500000</td><td>290.200000</td><td>3.000000</td><td>6430.000000</td><td>1845.000000</td><td>1.000000</td><td>5.581613</td><td>5.670566</td><td>8.768719</td><td>7.520231</td></tr>\n",
"    <tr><th>75%</th><td>326.250000</td><td>352.125000</td><td>4.000000</td><td>8583.250000</td><td>2227.000000</td><td>1.000000</td><td>5.787642</td><td>5.863982</td><td>9.057567</td><td>7.708266</td></tr>\n",
"    <tr><th>max</th><td>725.000000</td><td>708.600000</td><td>7.000000</td><td>92681.000000</td><td>3880.000000</td><td>1.000000</td><td>6.586172</td><td>6.563291</td><td>11.436920</td><td>8.263591</td></tr>\n",
"  </tbody>\n",
"</table>"
],
"text/plain": [
" price assess bdrms lotsize sqrft \\\n",
"count 88.000000 88.000000 88.000000 88.000000 88.000000 \n",
"mean 293.546034 315.736364 3.568182 9019.863636 2013.693182 \n",
"std 102.713445 95.314437 0.841393 10174.150414 577.191583 \n",
"min 111.000000 198.700000 2.000000 1000.000000 1171.000000 \n",
"25% 230.000000 253.900000 3.000000 5732.750000 1660.500000 \n",
"50% 265.500000 290.200000 3.000000 6430.000000 1845.000000 \n",
"75% 326.250000 352.125000 4.000000 8583.250000 2227.000000 \n",
"max 725.000000 708.600000 7.000000 92681.000000 3880.000000 \n",
"\n",
" colonial lprice lassess llotsize lsqrft \n",
"count 88.000000 88.000000 88.000000 88.000000 88.000000 \n",
"mean 0.693182 5.633180 5.717994 8.905105 7.572610 \n",
"std 0.463816 0.303573 0.262113 0.544060 0.258688 \n",
"min 0.000000 4.709530 5.291796 6.907755 7.065613 \n",
"25% 0.000000 5.438079 5.536940 8.653908 7.414873 \n",
"50% 1.000000 5.581613 5.670566 8.768719 7.520231 \n",
"75% 1.000000 5.787642 5.863982 9.057567 7.708266 \n",
"max 1.000000 6.586172 6.563291 11.436920 8.263591 "
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"uu = \"https://raw.githubusercontent.com/vmoprojs/DataLectures/master/WO/hprice1.csv\"\n",
"datos = pd.read_csv(uu,header=None)\n",
"datos.columns = [\"price\" , \"assess\" , \n",
" \"bdrms\" , \"lotsize\" ,\n",
" \"sqrft\" , \"colonial\",\n",
" \"lprice\" , \"lassess\" ,\n",
" \"llotsize\" , \"lsqrft\"]\n",
"\n",
"datos.describe()\n"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" OLS Regression Results \n",
"==============================================================================\n",
"Dep. Variable: lprice R-squared: 0.773\n",
"Model: OLS Adj. R-squared: 0.762\n",
"Method: Least Squares F-statistic: 70.58\n",
"Date: Wed, 18 Sep 2024 Prob (F-statistic): 6.45e-26\n",
"Time: 05:32:21 Log-Likelihood: 45.750\n",
"No. Observations: 88 AIC: -81.50\n",
"Df Residuals: 83 BIC: -69.11\n",
"Df Model: 4 \n",
"Covariance Type: nonrobust \n",
"==============================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"------------------------------------------------------------------------------\n",
"Intercept 0.2637 0.570 0.463 0.645 -0.869 1.397\n",
"lassess 1.0431 0.151 6.887 0.000 0.742 1.344\n",
"llotsize 0.0074 0.039 0.193 0.848 -0.069 0.084\n",
"lsqrft -0.1032 0.138 -0.746 0.458 -0.379 0.172\n",
"bdrms 0.0338 0.022 1.531 0.129 -0.010 0.078\n",
"==============================================================================\n",
"Omnibus: 14.527 Durbin-Watson: 2.048\n",
"Prob(Omnibus): 0.001 Jarque-Bera (JB): 56.436\n",
"Skew: 0.118 Prob(JB): 5.56e-13\n",
"Kurtosis: 6.916 Cond. No. 501.\n",
"==============================================================================\n",
"\n",
"Notes:\n",
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n"
]
}
],
"source": [
"# Log-log specification in assess, lotsize and sqrft; bdrms enters in levels\n",
"modelo1 = smf.ols('lprice~lassess+llotsize+lsqrft+bdrms',data = datos)\n",
"print(modelo1.fit().summary())"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" OLS Regression Results \n",
"==============================================================================\n",
"Dep. Variable: lprice R-squared: 0.643\n",
"Model: OLS Adj. R-squared: 0.630\n",
"Method: Least Squares F-statistic: 50.42\n",
"Date: Wed, 18 Sep 2024 Prob (F-statistic): 9.74e-19\n",
"Time: 05:32:21 Log-Likelihood: 25.861\n",
"No. Observations: 88 AIC: -43.72\n",
"Df Residuals: 84 BIC: -33.81\n",
"Df Model: 3 \n",
"Covariance Type: nonrobust \n",
"==============================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"------------------------------------------------------------------------------\n",
"Intercept -1.2970 0.651 -1.992 0.050 -2.592 -0.002\n",
"llotsize 0.1680 0.038 4.388 0.000 0.092 0.244\n",
"lsqrft 0.7002 0.093 7.540 0.000 0.516 0.885\n",
"bdrms 0.0370 0.028 1.342 0.183 -0.018 0.092\n",
"==============================================================================\n",
"Omnibus: 12.060 Durbin-Watson: 2.089\n",
"Prob(Omnibus): 0.002 Jarque-Bera (JB): 34.890\n",
"Skew: -0.188 Prob(JB): 2.65e-08\n",
"Kurtosis: 6.062 Cond. No. 410.\n",
"==============================================================================\n",
"\n",
"Notes:\n",
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n"
]
}
],
"source": [
"modelo2 = smf.ols('lprice~llotsize+lsqrft+bdrms',data = datos)\n",
"print(modelo2.fit().summary())"
]
},
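{
"cell_type": "markdown",
"metadata": {},
"source": [
"Written out with the coefficients reported above (rounded to three decimals), the fitted equation of `modelo2` is, as a sketch:\n",
"\n",
"$$\\widehat{\\ln(price)} = -1.297 + 0.168\\,\\ln(lotsize) + 0.700\\,\\ln(sqrft) + 0.037\\,bdrms$$\n",
"\n",
"Because both the dependent variable and the first two regressors are in logs, their coefficients can be read as elasticities: holding the other regressors fixed, a 1% larger `sqrft` is associated with about a 0.70% higher price, and a 1% larger `lotsize` with about a 0.17% higher price. `bdrms` enters in levels, so one additional bedroom corresponds to roughly $100 \\times 0.037 = 3.7\\%$ higher price, although that coefficient is not statistically significant ($p = 0.183$)."
]
},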
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" OLS Regression Results \n",
"==============================================================================\n",
"Dep. Variable: lprice R-squared: 0.215\n",
"Model: OLS Adj. R-squared: 0.206\n",
"Method: Least Squares F-statistic: 23.53\n",
"Date: Wed, 18 Sep 2024 Prob (F-statistic): 5.43e-06\n",
"Time: 05:32:21 Log-Likelihood: -8.8147\n",
"No. Observations: 88 AIC: 21.63\n",
"Df Residuals: 86 BIC: 26.58\n",
"Df Model: 1 \n",
"Covariance Type: nonrobust \n",
"==============================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"------------------------------------------------------------------------------\n",
"Intercept 5.0365 0.126 39.862 0.000 4.785 5.288\n",
"bdrms 0.1672 0.034 4.851 0.000 0.099 0.236\n",
"==============================================================================\n",
"Omnibus: 7.476 Durbin-Watson: 2.056\n",
"Prob(Omnibus): 0.024 Jarque-Bera (JB): 13.085\n",
"Skew: -0.182 Prob(JB): 0.00144\n",
"Kurtosis: 4.854 Cond. No. 17.2\n",
"==============================================================================\n",
"\n",
"Notes:\n",
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n"
]
}
],
"source": [
"modelo3 = smf.ols('lprice~bdrms',data = datos)\n",
"print(modelo3.fit().summary())"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import seaborn as sns\n",
"\n",
"sns.pairplot(datos.loc[:,['lprice','llotsize' , 'lsqrft' , 'bdrms']])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prediction"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>mean</th><th>mean_se</th><th>mean_ci_lower</th><th>mean_ci_upper</th><th>obs_ci_lower</th><th>obs_ci_upper</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>5.776577</td><td>0.029185</td><td>5.718541</td><td>5.834614</td><td>5.404916</td><td>6.148239</td></tr>\n",
"    <tr><th>1</th><td>5.707740</td><td>0.029306</td><td>5.649463</td><td>5.766018</td><td>5.336041</td><td>6.079440</td></tr>\n",
"    <tr><th>2</th><td>5.310543</td><td>0.033384</td><td>5.244156</td><td>5.376930</td><td>4.937486</td><td>5.683600</td></tr>\n",
"    <tr><th>3</th><td>5.326681</td><td>0.031818</td><td>5.263407</td><td>5.389955</td><td>4.954165</td><td>5.699197</td></tr>\n",
"    <tr><th>4</th><td>5.797220</td><td>0.031014</td><td>5.735544</td><td>5.858895</td><td>5.424972</td><td>6.169467</td></tr>\n",
"  </tbody>\n",
"</table>"
],
"text/plain": [
" mean mean_se mean_ci_lower mean_ci_upper obs_ci_lower \\\n",
"0 5.776577 0.029185 5.718541 5.834614 5.404916 \n",
"1 5.707740 0.029306 5.649463 5.766018 5.336041 \n",
"2 5.310543 0.033384 5.244156 5.376930 4.937486 \n",
"3 5.326681 0.031818 5.263407 5.389955 4.954165 \n",
"4 5.797220 0.031014 5.735544 5.858895 5.424972 \n",
"\n",
" obs_ci_upper \n",
"0 6.148239 \n",
"1 6.079440 \n",
"2 5.683600 \n",
"3 5.699197 \n",
"4 6.169467 "
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
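],
"source": [
"# New observation: lotsize = 2100, sqrft = 8000, 4 bedrooms, in the logs the model expects.\n",
"# Note: sqrft = 8000 is far above the sample maximum (3880), so this is an extrapolation.\n",
"datos_nuevos = pd.DataFrame({'llotsize':np.log(2100),'lsqrft':np.log(8000),'bdrms':4},index = [0])\n",
"\n",
"# In-sample fitted values, then their confidence and prediction intervals\n",
"pred_vals = modelo2.fit().predict()\n",
"modelo2.fit().get_prediction().summary_frame().head()"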
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>mean</th><th>mean_se</th><th>mean_ci_lower</th><th>mean_ci_upper</th><th>obs_ci_lower</th><th>obs_ci_upper</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>6.428811</td><td>0.147975</td><td>6.134546</td><td>6.723076</td><td>5.958326</td><td>6.899296</td></tr>\n",
"  </tbody>\n",
"</table>"
],
"text/plain": [
" mean mean_se mean_ci_lower mean_ci_upper obs_ci_lower \\\n",
"0 6.428811 0.147975 6.134546 6.723076 5.958326 \n",
"\n",
" obs_ci_upper \n",
"0 6.899296 "
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pred_vals = modelo2.fit().get_prediction(datos_nuevos)\n",
"pred_vals.summary_frame()"
]
},
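{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two pairs of bounds in the table above answer different questions. A sketch in textbook notation (the symbols $\\hat{y}_0$, $\\hat{\\sigma}$ and $t_{\\alpha/2,\\,n-k}$ are assumed here for illustration): `mean_ci_*` bounds the expected value of $\\ln(price)$ at the new point, while `obs_ci_*` bounds a single new observation and therefore also carries the error variance.\n",
"\n",
"$$\\hat{y}_0 \\pm t_{\\alpha/2,\\,n-k}\\,\\mathrm{se}(\\hat{y}_0) \\qquad \\text{(mean response)}$$\n",
"\n",
"$$\\hat{y}_0 \\pm t_{\\alpha/2,\\,n-k}\\,\\sqrt{\\mathrm{se}(\\hat{y}_0)^2 + \\hat{\\sigma}^2} \\qquad \\text{(new observation)}$$\n",
"\n",
"With 84 residual degrees of freedom, $t_{0.025,\\,84} \\approx 1.9886$, so $6.4288 \\pm 1.9886 \\times 0.1480$ reproduces the interval $[6.1345,\\,6.7231]$. The wider observation interval implies $\\hat{\\sigma} \\approx 0.185$. Since the dependent variable is $\\ln(price)$, the point prediction corresponds to $e^{6.4288} \\approx 619$ in the units of `price`; by Jensen's inequality this naive back-transformation tends to understate the conditional mean of `price`."
]
},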
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Multiple linear regression: education with inputs"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" OLS Regression Results \n",
"==============================================================================\n",
"Dep. Variable: colGPA R-squared: 0.050\n",
"Model: OLS Adj. R-squared: 0.043\n",
"Method: Least Squares F-statistic: 7.314\n",
"Date: Wed, 18 Sep 2024 Prob (F-statistic): 0.00770\n",
"Time: 05:32:25 Log-Likelihood: -56.641\n",
"No. Observations: 141 AIC: 117.3\n",
"Df Residuals: 139 BIC: 123.2\n",
"Df Model: 1 \n",
"Covariance Type: nonrobust \n",
"==============================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"------------------------------------------------------------------------------\n",
"Intercept 2.9894 0.040 75.678 0.000 2.911 3.068\n",
"PC 0.1695 0.063 2.704 0.008 0.046 0.293\n",
"==============================================================================\n",
"Omnibus: 2.136 Durbin-Watson: 1.941\n",
"Prob(Omnibus): 0.344 Jarque-Bera (JB): 1.852\n",
"Skew: 0.160 Prob(JB): 0.396\n",
"Kurtosis: 2.539 Cond. No. 2.45\n",
"==============================================================================\n",
"\n",
"Notes:\n",
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n"
]
}
],
"source": [
"uu = \"https://raw.githubusercontent.com/vmoprojs/DataLectures/master/WO/gpa1.csv\"\n",
"datosgpa = pd.read_csv(uu, header = None)\n",
"datosgpa.columns = [\"age\", \"soph\", \"junior\", \"senior\", \"senior5\", \"male\", \"campus\", \"business\", \"engineer\", \"colGPA\", \"hsGPA\", \"ACT\", \"job19\", \"job20\", \"drive\", \"bike\", \"walk\", \"voluntr\", \"PC\", \"greek\", \"car\", \"siblings\", \"bgfriend\", \"clubs\", \"skipped\", \"alcohol\", \"gradMI\", \"fathcoll\", \"mothcoll\"]\n",
"\n",
"reg4 = smf.ols('colGPA ~ PC', data = datosgpa)\n",
"print(reg4.fit().summary())"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" OLS Regression Results \n",
"==============================================================================\n",
"Dep. Variable: colGPA R-squared: 0.219\n",
"Model: OLS Adj. R-squared: 0.202\n",
"Method: Least Squares F-statistic: 12.83\n",
"Date: Wed, 18 Sep 2024 Prob (F-statistic): 1.93e-07\n",
"Time: 05:32:25 Log-Likelihood: -42.796\n",
"No. Observations: 141 AIC: 93.59\n",
"Df Residuals: 137 BIC: 105.4\n",
"Df Model: 3 \n",
"Covariance Type: nonrobust \n",
"==============================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"------------------------------------------------------------------------------\n",
"Intercept 1.2635 0.333 3.793 0.000 0.605 1.922\n",
"PC 0.1573 0.057 2.746 0.007 0.044 0.271\n",
"hsGPA 0.4472 0.094 4.776 0.000 0.262 0.632\n",
"ACT 0.0087 0.011 0.822 0.413 -0.012 0.029\n",
"==============================================================================\n",
"Omnibus: 2.770 Durbin-Watson: 1.870\n",
"Prob(Omnibus): 0.250 Jarque-Bera (JB): 1.863\n",
"Skew: 0.016 Prob(JB): 0.394\n",
"Kurtosis: 2.438 Cond. No. 298.\n",
"==============================================================================\n",
"\n",
"Notes:\n",
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n"
]
}
],
"source": [
"reg5 = smf.ols('colGPA ~ PC + hsGPA + ACT', data = datosgpa)\n",
"print(reg5.fit().summary())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"celltoolbar": "Raw Cell Format",
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.19"
}
},
"nbformat": 4,
"nbformat_minor": 4
}