Research Assignment II 2
Before performing any statistical analysis, outliers should be removed from the data set. To identify them, we first drew boxplots of
current capital and initial capital to see roughly where the extreme values lie. We then computed standardized scores (Z-scores) in
SPSS and treated as outliers all values of current and initial capital whose Z-score is greater than 3 or less than -3.
Current_capital    Zcurrent_capital
$500,000.00         6.34925   (outlier)
$500,000.00         6.34925   (outlier)
$400,000.00         4.94666   (outlier)
$300,000.00         3.54408   (outlier)
$300,000.00         3.54408   (outlier)
$300,000.00         3.54408   (outlier)
$260,000.00         2.98305
$220,000.00         2.42202
...                 ...
$600.00            -0.65524
$500.00            -0.65664
Current capital outliers: the values of Birr 300,000.00 and above (Z-score greater than 3).
Initial_capital    Zinitial_capital
$120,000.00         5.85525   (outlier)
$120,000.00         5.85525   (outlier)
$110,000.00         5.32055   (outlier)
$100,000.00         4.78585   (outlier)
$98,000.00          4.67891   (outlier)
$75,000.00          3.44910   (outlier)
$70,000.00          3.18175   (outlier)
$60,000.00          2.64705
$60,000.00          2.64705
…                   …
$100.00            -0.55580
$100.00            -0.55580
$30.00             -0.55954
$20.00             -0.56008
Initial capital outliers: the values of Birr 70,000.00 and above (Z-score greater than 3).
We then filtered these outliers out of the data before carrying out the statistical analyses.
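The Z-score screening described above can be sketched in a few lines of Python. This is only an illustration, not the SPSS procedure itself: the function name split_outliers and the capital values are hypothetical, and we assume standardization with the sample standard deviation (n - 1 denominator).

```python
from statistics import mean, stdev


def split_outliers(values, threshold=3.0):
    """Split values into (kept, outliers) using |Z-score| > threshold."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation; assumes the values are not all equal
    kept, outliers = [], []
    for v in values:
        z = (v - m) / s
        if abs(z) > threshold:
            outliers.append(v)
        else:
            kept.append(v)
    return kept, outliers


# Hypothetical capital values in Birr, NOT the survey data.
capital = [20_000] * 24 + [500_000]
kept, outliers = split_outliers(capital)
print(len(kept), outliers)
```

Note that in very small samples no value can reach |Z| > 3, so the rule is only meaningful for reasonably large data sets such as ours.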
1. Find the first, second and third quartiles for current capital
Statistics
Current Capital
N             Valid     239
              Missing   0
Percentiles   25        8,500.0000
              50        20,000.0000
              75        50,000.0000
Recoding
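A minimal sketch of how current capital could be recoded into four quartile groups is given below. The helper names are hypothetical, and two points are assumptions on our part: that the (n + 1)p interpolation rule of Python's "exclusive" method matches the percentile definition used for the table above, and that values equal to a cut point fall in the lower group.

```python
from statistics import quantiles


def quartile_cuts(values):
    """Return [Q1, Q2, Q3] using the (n + 1)p interpolation rule."""
    return quantiles(values, n=4, method="exclusive")


def quartile_group(value, cuts):
    """Return the group number (1 to 4); upper bounds are inclusive (assumed)."""
    for group, cut in enumerate(cuts, start=1):
        if value <= cut:
            return group
    return len(cuts) + 1


# Hypothetical values, NOT the survey data.
capital = [1, 2, 3, 4, 5, 6, 7]
cuts = quartile_cuts(capital)
groups = [quartile_group(v, cuts) for v in capital]
print(cuts, groups)
```

With the quartiles reported above, quartile_group(v, [8500, 20000, 50000]) would assign each firm to one of the four capital groups.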
Crosstabs
Case Processing Summary
                                            Cases
                                            Valid            Missing          Total
                                            N     Percent    N     Percent    N     Percent
Gender of Owner * Current Capital grouped   239   100.0%     0     0.0%       239   100.0%
Gender of Owner * Current Capital grouped Cross tabulation
Chi-Square Tests
                               Value      df   Asymptotic Significance (2-sided)
Pearson Chi-Square             2.847(a)   3    .416
Likelihood Ratio               2.943      3    .400
Linear-by-Linear Association   2.739      1    .098
N of Valid Cases               239
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 10.44.
According to the Pearson chi-square test above, the p-value (Asymp. Sig., 2-sided) is 0.416, which is greater than 0.05. Thus we do
not reject Ho and conclude that there is no statistically significant association between gender of owner and current capital, i.e.
the data are consistent with the two variables being independent.
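To make the mechanics of the test explicit, the sketch below computes a Pearson chi-square statistic from a table of counts and converts a statistic into its upper-tail p-value. The function names are hypothetical and the 2 x 2 table is invented for illustration, because the actual crosstabulation counts are not reproduced in this report. Only the last line uses our own figures (the statistic 2.847 with 3 degrees of freedom).

```python
import math


def pearson_chi_square(observed):
    """Return (statistic, df, expected) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / N.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable, df >= 1."""
    half = x / 2
    if df % 2:
        k, tail = 1, math.erfc(math.sqrt(half))  # closed form for df = 1
    else:
        k, tail = 2, math.exp(-half)             # closed form for df = 2
    while k < df:
        # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
        tail += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return tail


# Hypothetical 2 x 2 table of counts, NOT the survey crosstab.
stat, df, expected = pearson_chi_square([[10, 20], [20, 10]])
print(round(stat, 3), df)

# Statistic and df as reported in the Chi-Square Tests table above.
print(round(chi2_sf(2.847, 3), 3))
```

The expected counts returned by pearson_chi_square are also what the table footnote refers to when it reports how many cells fall below 5.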
1 (b) Testing the association between the highest level of education of owner(s) and current capital
Crosstabs
Chi-Square Tests
                     Value      df   Asymptotic Significance (2-sided)
Pearson Chi-Square   5.601(a)   6    .469
a. 4 cells (33.3%) have expected count less than 5. The minimum expected count is 3.62.
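According to the Pearson chi-square test above, the p-value (Asymp. Sig., 2-sided) is 0.469, which is greater than 0.05. Thus we do
not reject Ho and conclude that there is no statistically significant association between the highest level of education of owner(s)
and current capital. Since 33.3% of the cells have an expected count below 5, the chi-square approximation should be read with caution.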
2. Test of significant difference in the highest level of education of owner(s) of MSEs across sub-sectors
Kruskal-Wallis Test
Ranks
                                     Sub-Sector            N     Mean Rank
Owner's Highest Level of Education   Metal and Wood Work   113   138.86
                                     Textile and Garment   84    121.50
                                     Urban Agriculture     42    66.25
                                     Total                 239
Test Statistics
                   Owner's Highest Level of Education
Kruskal-Wallis H   81.946
df                 2
Asymp. Sig.        .000
Exact Sig.         .000
Based on the Kruskal-Wallis results shown above, the exact p-value (Exact Sig.) is reported as 0.000, which is less than 1%. Thus, we
reject the null hypothesis that the population medians for the three sub-sectors are all equal, and conclude that the owner's
highest level of education differs significantly across the three sub-sectors.
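For readers who want to see how the H statistic is built, here is a small sketch of the Kruskal-Wallis computation with the usual correction for ties. It is not the SPSS implementation; the function names and the toy groups are hypothetical, and the education scores of the survey are not reproduced here.

```python
from itertools import chain


def midranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of ranks i + 1 .. j + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Return (tie-corrected H, df) for a list of groups of scores."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = midranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over groups of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are tied; H is undefined")
    return h / correction, len(groups) - 1


# Hypothetical ordinal scores for three groups, NOT the survey data.
h, df = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h, 3), df)
```

The tie correction matters for a variable such as level of education, where many owners share the same category.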
Comparison between sub-sectors of owners' highest level of education
In order to check which sub-sector has the highest level of education, we conducted three pairwise Mann-Whitney tests (Metal and Wood
Work versus Textile and Garment, Metal and Wood Work versus Urban Agriculture, and Textile and Garment versus Urban Agriculture)
and found the following results.
Mann-Whitney Test
Ranks
                                     Sub-Sector            N     Mean Rank   Sum of Ranks
Owner's Highest Level of Education   Metal and Wood Work   113   105.69      11943.00
                                     Textile and Garment   84    90.00       7560.00
                                     Total                 197
Owner's Highest Level of Education   Metal and Wood Work   113   90.17       10189.50
                                     Urban Agriculture     42    45.25       1900.50
                                     Total                 155
Owner's Highest Level of Education   Textile and Garment   84    74.00       6216.00
                                     Urban Agriculture     42    42.50       1785.00
                                     Total                 126
Test Statistics(a)
                         Metal and Wood Work          Metal and Wood Work        Textile and Garment
                         versus Textile and Garment   versus Urban Agriculture   versus Urban Agriculture
Mann-Whitney U           3990.000                     997.500                    882.000
Wilcoxon W               7560.000                     1900.500                   1785.000
Z                        -3.828                       -7.291                     -7.071
Asymp. Sig. (2-tailed)   .000                         .000                       .000
Exact Sig. (2-tailed)    .000                         .000                       .000
Exact Sig. (1-tailed)    .000                         .000                       .000
Point Probability        .000                         .000                       .000
a. Grouping Variable: Sub-Sector
Since we have three pairwise tests, the Bonferroni-corrected significance level is 0.05/3 = 0.0167 ≈ 0.017. The exact p-values of all
three pairwise tests are less than 0.017. Thus, we conclude that each pair of sub-sectors differs significantly. Judging by the mean
ranks in the table above, the Metal and Wood Work sub-sector has the highest level of education of owners, whereas Urban
Agriculture has the lowest level of education.
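The link between the rank sums and the U statistics reported above, together with the corrected significance level, can be checked with the short sketch below. The function names are hypothetical; the group sizes and rank sums are the ones in the Ranks table.

```python
def mann_whitney_u(n1, rank_sum1, n2, rank_sum2):
    """Return the smaller of the two U statistics, computed from rank sums."""
    u1 = rank_sum1 - n1 * (n1 + 1) / 2
    u2 = rank_sum2 - n2 * (n2 + 1) / 2
    if abs(u1 + u2 - n1 * n2) > 1e-9:  # U1 + U2 must equal n1 * n2
        raise ValueError("rank sums are inconsistent with the group sizes")
    return min(u1, u2)


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level when n_tests comparisons are made."""
    return alpha / n_tests


# Group sizes and rank sums taken from the Ranks table above.
pairs = {
    "Metal and Wood vs Textile and Garment": (113, 11943.0, 84, 7560.0),
    "Metal and Wood vs Urban Agriculture": (113, 10189.5, 42, 1900.5),
    "Textile and Garment vs Urban Agriculture": (84, 6216.0, 42, 1785.0),
}
for name, args in pairs.items():
    print(name, mann_whitney_u(*args))
print(round(bonferroni_alpha(0.05, len(pairs)), 4))
```

The Z values in the table additionally depend on a tie correction, which cannot be recomputed from the rank sums alone.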
3. Test of significant difference in the proportion of MSEs that are registered, that have a tax identification
number (TIN) and that have a business license (or certificate)
Descriptive Statistics
                               N     Mean   Std. Deviation   Minimum   Maximum
Business Registration          239   .83    .378             0         1
Business License/Certificate   239   .82    .388             0         1
TIN                            239   .72    .450             0         1
Cochran Test
Frequencies
                               Value
                               0     1
Business Registration          41    198
Business License/Certificate   44    195
TIN                            67    172
Test Statistics
N                   239
Cochran's Q         31.947(a)
df                  2
Asymp. Sig.         .000
Exact Sig.          .000
Point Probability   .000
a. 1 is treated as a success.
Since the p-value is 0.000, which is less than 5% (and even less than 1%), we reject Ho and conclude that the proportions of MSEs
that are registered, that have a tax identification number (TIN) and that have a business license (or certificate) differ
significantly at the one percent level of significance.
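As a rough check on the Cochran test, the sketch below computes Q in two ways: from case-level 0/1 data, and from summary counts only. Both function names are hypothetical. The second function relies on the identity that, for 0/1 scores, the sum of squared row totals equals the grand total plus twice the sum of the pairwise yes-yes counts. The "yes" totals (198, 195, 172) come from the Frequencies table, and the pairwise yes-yes counts (189, 168, 170) are read from the McNemar crosstabs reported further below.

```python
def cochran_q(rows):
    """Cochran's Q for case-level data: one tuple of 0/1 scores per case."""
    k = len(rows[0])
    col_totals = [sum(col) for col in zip(*rows)]
    row_totals = [sum(row) for row in rows]
    total = sum(col_totals)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - total ** 2)
    denominator = k * total - sum(r * r for r in row_totals)
    return numerator / denominator


def cochran_q_from_counts(col_totals, pair_yes_counts):
    """Cochran's Q from "yes" totals and ALL pairwise yes-yes counts."""
    k = len(col_totals)
    total = sum(col_totals)
    # For 0/1 scores: sum(R_i^2) = sum(C_j) + 2 * sum(pairwise yes-yes counts).
    sum_row_squares = total + 2 * sum(pair_yes_counts)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - total ** 2)
    return numerator / (k * total - sum_row_squares)


# Summary counts taken from the SPSS tables in this report.
q = cochran_q_from_counts([198, 195, 172], [189, 168, 170])
print(round(q, 3))

# Hypothetical case-level data, NOT the survey records.
toy = [(1, 1, 0), (1, 0, 0), (1, 1, 1), (0, 0, 0), (1, 0, 0)]
print(round(cochran_q(toy), 3))
```

The value obtained from the summary counts agrees with the Q reported by SPSS.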
Pairwise tests of significant difference in proportions
McNemar Test
Crosstabs
Business Registration & Business License/Certificate
                        Business License/Certificate
Business Registration   No    Yes
No                      35    6
Yes                     9     189
Business Registration & TIN
                        TIN
Business Registration   No    Yes
No                      37    4
Yes                     30    168
Business License/Certificate & TIN
                               TIN
Business License/Certificate   No    Yes
No                             42    2
Yes                            25    170
Test Statistics(a)
                        Business Registration &        Business Registration &   Business License/Certificate &
                        Business License/Certificate   TIN                       TIN
N                       239                            239                       239
Chi-Square(c)                                          18.382                    17.926
Asymp. Sig.                                            .000                      .000
Exact Sig. (2-tailed)   .607(b)                        .000                      .000
Exact Sig. (1-tailed)   .304                           .000                      .000
Point Probability       .153                           .000                      .000
a. McNemar Test
b. Binomial distribution used.
c. Continuity Corrected
The exact p-value for the Business Registration & Business License/Certificate pair is 0.607, which exceeds the corrected level of
0.017 (0.05/3), while those for the other two pairs are 0.000, which is less than 0.017. Thus, there is no statistically significant
difference between the proportion of MSEs that have Business Registration and the proportion that have a Business License/Certificate.
However, there are statistically significant differences in proportion for Business Registration versus TIN and for Business
License/Certificate versus TIN.
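The McNemar statistics depend only on the two discordant cells of each crosstab (the No/Yes and Yes/No counts). A minimal sketch, with hypothetical function names, is given below; the discordant counts used are 6 and 9 (Registration & License), 4 and 30 (Registration & TIN), and 2 and 25 (License & TIN), as read from the crosstabs above.

```python
from math import comb


def mcnemar_chi2(b, c):
    """Continuity-corrected McNemar chi-square from the discordant counts."""
    return (abs(b - c) - 1) ** 2 / (b + c)


def mcnemar_exact_p(b, c):
    """Two-sided exact p-value: smaller discordant count ~ Binomial(b + c, 0.5)."""
    n = b + c
    lower_tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * lower_tail)


# Discordant counts read from the three crosstabs in this report.
print(round(mcnemar_exact_p(6, 9), 3))   # Registration & License (binomial)
print(round(mcnemar_chi2(4, 30), 3))     # Registration & TIN
print(round(mcnemar_chi2(2, 25), 3))     # License & TIN
```

The binomial version is the one SPSS reports for the first pair, where the number of discordant cases is small.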
4. Pearson coefficient of correlation between current capital and each of Initial capital, Years since establishment,
Years of schooling of owner(s) and Number of paid workers.
The null and alternative hypotheses are:
Ho: ρ = 0
HA: ρ ≠ 0
Pearson Correlations
Correlations
                                               Current   Initial   Number of      Owner's     Years of
                                               Capital   Capital   paid Workers   Schooling   Establishment
Current Capital          Pearson Correlation   1         .483**    .294**         -.099       .017
                         Sig. (2-tailed)                 .000      .000           .126        .798
                         N                     239       239       239            239         239
Initial Capital          Pearson Correlation   .483**    1         .210**         -.038       -.160*
                         Sig. (2-tailed)       .000                .001           .560        .013
                         N                     239       239       239            239         239
Number of paid Workers   Pearson Correlation   .294**    .210**    1              .037        -.071
                         Sig. (2-tailed)       .000      .001                     .574        .273
                         N                     239       239       239            239         239
Owner's Schooling        Pearson Correlation   -.099     -.038     .037           1           -.014
                         Sig. (2-tailed)       .126      .560      .574                       .830
                         N                     239       239       239            239         239
Years of Establishment   Pearson Correlation   .017      -.160*    -.071          -.014       1
                         Sig. (2-tailed)       .798      .013      .273           .830
                         N                     239       239       239            239         239
**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).
Based on the result shown above, the coefficient of correlation between current capital and initial capital (r = 0.483) has a p-value
of 0.000, which is less than 1%. Thus, we reject the null hypothesis and conclude that there is a significant positive (or direct)
correlation between current capital and initial capital at the one percent level of significance. Similarly, the coefficient of
correlation between current capital and number of paid workers (r = 0.294) has a p-value of 0.000, which is less than 1%. Therefore,
we reject the null hypothesis and conclude that there is a significant positive (or direct) correlation between current capital and
number of paid workers at the one percent level of significance.
In contrast, the correlations of current capital with owners' schooling and with years of establishment have p-values of 0.126 and
0.798 respectively, both greater than 5%. We therefore do not reject Ho and conclude that there is no significant linear correlation
between current capital and owners' schooling, or between current capital and years of establishment, at the five percent level.
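The Pearson coefficient and the test of Ho can be sketched as follows. The function names and the toy data are hypothetical; only the last two lines use values from the Correlations table (r = 0.483 and r = -0.099 with N = 239). The t statistic has N - 2 degrees of freedom, and for 237 degrees of freedom the two-sided 5% critical value is roughly 1.97.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t(r, n):
    """t statistic for Ho: rho = 0, with n - 2 degrees of freedom (|r| < 1)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical data, NOT the survey data.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 9]))

# Coefficients taken from the Correlations table above.
print(round(correlation_t(0.483, 239), 2))
print(round(correlation_t(-0.099, 239), 2))
```

A |t| well above 1.97 corresponds to the starred coefficients, while a |t| below it corresponds to the non-significant ones.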
Spearman rank correlation coefficient between current capital and each of Initial capital, Years since establishment, Years
of schooling of owner(s) and Number of paid workers.
Nonparametric Correlations
Correlations
                                                                   Current   Initial   Number of      Owner's     Years of
                                                                   Capital   Capital   paid Workers   Schooling   Establishment
Spearman's rho   Current Capital          Correlation Coefficient   1.000     .581**    .237**         -.071       .128*
                 Number of paid Workers   Correlation Coefficient   .237**    .284**    1.000          .022        -.065
From the result of the Spearman test of correlation, the coefficient of correlation between current capital and initial capital
(rho = 0.581) has a p-value of 0.000, which is less than 1%. Accordingly, we reject the null hypothesis and conclude that there is a
significant positive (or direct) correlation between current capital and initial capital at the one percent level of significance.
Similarly, the coefficient of correlation between current capital and number of paid workers (rho = 0.237) has a p-value of 0.000,
which is less than 1%. Consequently, we reject the null hypothesis and conclude that there is a significant positive (or direct)
correlation between current capital and number of paid workers at the one percent level of significance. By the same token, the
coefficient of correlation between current capital and years of establishment (rho = 0.128) has a p-value of 0.047, which is less
than 5%. Thus, we reject the null hypothesis and conclude that there is a significant positive (or direct) correlation between
current capital and years of establishment at the five percent level of significance.
Conversely, the correlation between current capital and owners' schooling has a p-value of 0.271, which is greater than 5%. Hence,
we do not reject Ho and conclude that there is no significant rank correlation between current capital and owners' schooling at the
five percent level of significance.
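Spearman's coefficient is the Pearson coefficient computed on ranks, which is why it can pick up a monotonic but non-linear relationship. The sketch below illustrates this with hypothetical function names and toy data; it is not the SPSS routine.

```python
import math


def midranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of ranks i + 1 .. j + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the midranks."""
    rx, ry = midranks(x), midranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: y grows with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(spearman_rho(x, y))
```

This may help explain why the Spearman and Pearson results differ for years of establishment: the rank-based measure is less affected by skewed capital values.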
5. Estimation of a multiple linear regression model:
Dependent variable: - Current capital (CC)
Independent (explanatory) variables: - Initial Capital (IC), Business License (BL), Access to Loan from Micro Finance
Institutions (LOAN), Access to Land/Working Premises (LAND), Capacity Utilization Rate (CUR), Years since Establishment
(YEST) and Number of Paid Workers (NPW).
Regression
Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .538(a)   .290       .268                36,035.01070
a. Predictors: (Constant), Number of paid Workers, Loan from Microfinance, Years of Establishment, Utilization Rate, Land or
Premises, Initial Capital, Business License/Certificate
ANOVA(a)
Model            Sum of Squares     df    Mean Square       F        Sig.
1   Regression   122352612376.994   7     17478944625.285   13.461   .000(b)
    Residual     299958581030.177   231   1298521995.802
    Total        422311193407.172   238
a. Dependent Variable: Current Capital
b. Predictors: (Constant), Number of paid Workers, Loan from Microfinance, Years of Establishment, Utilization Rate, Land or
Premises, Initial Capital, Business License/Certificate
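The figures in the Model Summary follow directly from the ANOVA sums of squares. The sketch below, with a hypothetical function name, recomputes R Square, Adjusted R Square, the F statistic and the standard error of the estimate from the regression sum of squares (122352612376.994), the residual sum of squares (299958581030.177), N = 239 cases and k = 7 predictors.

```python
import math


def regression_summary(ss_regression, ss_residual, n, k):
    """Derive fit statistics from ANOVA sums of squares (n cases, k predictors)."""
    ss_total = ss_regression + ss_residual
    r_squared = ss_regression / ss_total
    adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
    ms_regression = ss_regression / k
    ms_residual = ss_residual / (n - k - 1)
    return {
        "r_squared": r_squared,
        "adj_r_squared": adj_r_squared,
        "f": ms_regression / ms_residual,
        "std_error": math.sqrt(ms_residual),
    }


# Sums of squares as reported in the ANOVA table of this report.
summary = regression_summary(122352612376.994, 299958581030.177, n=239, k=7)
for name, value in summary.items():
    print(name, round(value, 3))
```

R Square expressed as a percentage is the share of the variation in current capital that the seven predictors explain jointly.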
Coefficients(a)
                                   Unstandardized Coefficients   Standardized Coefficients
Model                              B            Std. Error       Beta                        t       Sig.
1   (Constant)                     6149.873     8325.912                                     .739    .461
    Initial Capital                1.758        .227             .448                        7.760   .000
    Business License/Certificate   -836.048     6271.401         -.008                       -.133   .894
    Loan from Microfinance         -4131.566    5555.572         -.042                       -.744   .458
    Land or Premises               5853.796     5118.838         .066                        1.144   .254
    Utilization Rate               87.723       100.321          .050                        .874    .383
    Years of Establishment         542.510      345.582          .091                        1.570   .118
    Number of paid Workers         5217.733     1501.456         .206                        3.475   .001
a. Dependent Variable: Current Capital
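To show how the unstandardized coefficients are used, the sketch below evaluates the fitted equation for one firm and lists the predictors whose Sig. value is below a chosen level. The dictionary keys, the function names and the example firm are hypothetical. We also assume that the yes/no predictors are coded 1 = yes and 0 = no, which the table itself does not state for loan and land.

```python
# B and Sig. values copied from the Coefficients table above.
COEFFICIENTS = {
    "initial_capital": 1.758,
    "business_license": -836.048,   # assumed coding: 1 = yes, 0 = no
    "loan": -4131.566,              # assumed coding: 1 = yes, 0 = no
    "land": 5853.796,               # assumed coding: 1 = yes, 0 = no
    "utilization_rate": 87.723,
    "years_established": 542.510,
    "paid_workers": 5217.733,
}
CONSTANT = 6149.873
P_VALUES = {
    "initial_capital": 0.000,
    "business_license": 0.894,
    "loan": 0.458,
    "land": 0.254,
    "utilization_rate": 0.383,
    "years_established": 0.118,
    "paid_workers": 0.001,
}


def predict_current_capital(firm):
    """Fitted current capital (Birr) for a dict of predictor values."""
    return CONSTANT + sum(COEFFICIENTS[name] * firm[name] for name in COEFFICIENTS)


def significant_predictors(alpha=0.05):
    """Names of predictors whose reported Sig. is below alpha."""
    return [name for name, p in P_VALUES.items() if p < alpha]


# Hypothetical firm, NOT a record from the survey.
firm = {
    "initial_capital": 10_000, "business_license": 1, "loan": 0, "land": 1,
    "utilization_rate": 50, "years_established": 5, "paid_workers": 2,
}
print(round(predict_current_capital(firm), 3))
print(significant_predictors())
```

Each coefficient is the estimated change in current capital for a one-unit increase in that predictor, holding the other predictors fixed.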
c) Percentage of the variation in current capital which is explained by the independent variables
d) The ANOVA result (the F-test). Express the necessary hypotheses symbolically.
e) Which of the explanatory variables are significant predictors of current capital? Briefly discuss each of them (level of significance,
direction of influence, interpretation of the estimated regression coefficients)