BA Soln
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm

data = {
    "Revenue": [5, 6, 6.5, 7, 7.5, 8, 10, 10.8, 12, 13, 15.5, 15, 16, 17, 18, 18, 18.5, 21, 20, 22, 23, 7.1, 10.5, 15.8],
    "Promotional Expenses": [1, 1.8, 1.6, 1.7, 2, 2, 2.3, 2.8, 3.5, 3.3, 4.8, 5, 7, 8.1, 8, 10, 8, 12.7, 12, 15, 14.4, 1, 2.1, 4.75],
}
df = pd.DataFrame(data)

# Log transformation of both variables
df["Log_Revenue"] = np.log(df["Revenue"])
df["Log_Promotional_Expenses"] = np.log(df["Promotional Expenses"])

# Fit OLS: log(Revenue) on a constant and log(Promotional Expenses)
X = sm.add_constant(df["Log_Promotional_Expenses"])
y = df["Log_Revenue"]
model = sm.OLS(y, X).fit()
model_summary = model.summary()

# Residual plot: residuals against fitted values
plt.figure(figsize=(10, 6))
plt.scatter(model.fittedvalues, model.resid)
plt.xlabel('Fitted values')
plt.ylabel('Residuals')
residual_plot = plt.gcf()
model_summary, residual_plot
Result
[OLS regression summary (statsmodels Summary object), Df Model: 1. The body of the table was not preserved.]
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
To address the issues with the functional form, we applied a logarithmic transformation to both the dependent
variable (Revenue) and the independent variable (Promotional Expenses). This transformation can help in dealing
with non-linearity and heteroscedasticity.
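Written as an equation, the transformed (log-log) model looks roughly as follows. The symbols \beta_0, \beta_1 and \varepsilon_i are notation introduced here for illustration; they do not appear in the original solution.

```latex
\ln(\text{Revenue}_i) = \beta_0 + \beta_1 \,\ln(\text{PromotionalExpenses}_i) + \varepsilon_i
\qquad\Longleftrightarrow\qquad
\text{Revenue}_i = e^{\beta_0}\,\text{PromotionalExpenses}_i^{\,\beta_1}\,e^{\varepsilon_i}
```

In this form the slope \beta_1 is an elasticity: the approximate percentage change in Revenue for a 1% change in Promotional Expenses.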
Steps:
1. Data Transformation:
o Applied the natural logarithm to both Revenue and Promotional Expenses.
2. Model Fitting:
o Used the transformed variables to fit an Ordinary Least Squares (OLS) regression model.
o Obtained the regression summary to check the model's performance and diagnostic statistics.
Interpretation of Results:
R-squared and Adjusted R-squared: The model explains about 92.5% of the variance in the log-transformed
revenue, which indicates a strong fit.
Coefficients: The coefficient for log(Promotional Expenses) is 0.5235. Because both variables are in logs, this is an elasticity: a 1% increase in Promotional Expenses is associated with approximately a 0.52% increase in Revenue. The model has no other regressors, so there are no other factors being held constant.
P-values: Both the intercept and the independent variable are statistically significant (p < 0.05).
Diagnostic Tests: The Omnibus and Jarque-Bera tests check the normality of the residuals, and here they give no
evidence against it. The Durbin-Watson statistic gives no indication of severe autocorrelation. None of these
statistics tests for heteroscedasticity; that is assessed with the residual plot.
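The elasticity reading of the slope can be checked with a little arithmetic. The sketch below uses only the reported coefficient 0.5235; the helper name pct_change_in_revenue and the example percentages are illustrative choices, not part of the original solution.

```python
def pct_change_in_revenue(elasticity, pct_increase_in_promo):
    """Exact % change in Revenue implied by a log-log slope.

    In a log-log model, multiplying the regressor by r multiplies the
    fitted response by r ** elasticity.
    """
    ratio = 1.0 + pct_increase_in_promo / 100.0
    return (ratio ** elasticity - 1.0) * 100.0


beta1 = 0.5235  # slope reported in the regression summary

# Illustrative increases in Promotional Expenses (in percent)
for pct in (1, 10, 50):
    change = pct_change_in_revenue(beta1, pct)
    print(f"{pct}% more promotion -> about {change:.2f}% more revenue")
```

For small changes the result is close to slope times percentage, which is why "1% gives about 0.52%" is a fair summary. For larger changes the exact formula gives somewhat less than the linear approximation.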
Residual Plot:
The residual plot is shown below. It helps in assessing the assumption of homoscedasticity (constant variance of
residuals).
The plot shows no clear pattern, such as a funnel shape or a curve. This is consistent with constant residual
variance and suggests that the transformation has addressed the heteroscedasticity.
Conclusion:
By transforming the data using logarithms, we have improved the functional form of the model and addressed issues
related to heteroscedasticity. The transformed model provides a good fit for predicting revenue based on
promotional expenses.
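To use the fitted model for prediction, the log-scale forecast has to be converted back to the revenue scale. The sketch below is one minimal way to do that. It refits the same regression with np.polyfit, which for a single regressor gives the same least-squares estimates as OLS. The function predict_revenue and the spending levels in new_promo are hypothetical, chosen only for illustration.

```python
import numpy as np

# Data copied from the solution above
revenue = np.array([5, 6, 6.5, 7, 7.5, 8, 10, 10.8, 12, 13, 15.5, 15, 16, 17,
                    18, 18, 18.5, 21, 20, 22, 23, 7.1, 10.5, 15.8])
promo = np.array([1, 1.8, 1.6, 1.7, 2, 2, 2.3, 2.8, 3.5, 3.3, 4.8, 5, 7, 8.1,
                  8, 10, 8, 12.7, 12, 15, 14.4, 1, 2.1, 4.75])

# Least-squares line through (log promo, log revenue).
# For degree 1, np.polyfit returns [slope, intercept].
slope, intercept = np.polyfit(np.log(promo), np.log(revenue), 1)


def predict_revenue(promo_expense):
    """Naive back-transform: exp(intercept) * promo_expense ** slope."""
    promo_expense = np.asarray(promo_expense, dtype=float)
    return np.exp(intercept) * promo_expense ** slope


new_promo = np.array([2.0, 5.0, 10.0])  # hypothetical spending levels
print(predict_revenue(new_promo))
```

Exponentiating the log-scale prediction like this targets the median of Revenue rather than its mean when the log-scale errors are symmetric, so it tends to understate the expected revenue slightly.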