Stats Tools Presentation

The document discusses illegal immigration to the United States and provides statistics on illegal immigrants apprehended at US borders from 2000-2016. It introduces the topic, poses a problem statement on understanding demographics and behaviors to inform policy, and describes the dataset which contains information on state/territory, year, and numbers of immigrants apprehended.

Uploaded by

Predak

Stats Tools Presentation

Created by Vatsal Pandey
Register 2241162

Intro to domain
Illegal immigration to the United States
is the process of migrating into the
United States in violation of US
immigration laws. This can include
foreign nationals who have entered the
United States unlawfully, as well as
those who lawfully entered but then
remained after the expiration of their
visas, parole, TPS, etc. Illegal
immigration has been a matter of
intense debate in the United States
since the 1980s.
Problem Statement
How can we better understand the demographics,
socio-economic factors, and behaviors of illegal
immigrants in the United States, in order to develop
evidence-based policies and interventions to
manage this population and address the
challenges they pose to the country's economy,
society, and political landscape?
About the dataset
This report provides statistics for the number of
illegal immigrants arrested or apprehended by the
border patrol in each division (or sector) of the
United States borders with Canada, Mexico, and the
Caribbean islands; this data is a partial measure of
the flow of people illegally entering the United
States.
Dimensions of the dataset
The given dataset contains 5 unique columns of
information regarding the State, Region, Year, and
Year (Mexico only), and continues up to 2016.

SL NO   Column Name         Description
1       Border              Info about the border in geography.
2       Sector              Info about the sector.
3       State/Territory     Info about state or territory.
4       Year All            No of immigrants from all regions.
5       Year Mexican Only   No of immigrants from Mexico alone.
Program 7 Cross Tabulation
Definition
Descriptive statistics is a branch of statistics that
deals with summarizing and describing the
characteristics of a data set. It provides
quantitative measures that help to describe the
data in a meaningful way. The goal of descriptive
statistics is to provide a summary of the data that
can be easily understood and interpreted by
humans.

Interpretation
The describe function shows us the statistical data
of the dataset based on various statistical
measures. Here, it has calculated the total count,
mean, standard deviation, minimum and maximum
of each category, and its quartiles.

[Code and Result: images in the original slides]
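The slides do not show the code behind this summary; the describe function is presumably something like pandas' `DataFrame.describe`. As a minimal sketch of the same measures, the block below uses only the Python standard library. The `describe` helper and the `apprehensions` values are hypothetical and are not taken from the dataset.

```python
import statistics

# Hypothetical apprehension counts for one year column (not from the dataset).
apprehensions = [1200, 3400, 560, 7800, 2300, 150, 4100, 960]

def describe(values):
    """Return count, mean, sample std dev, min, quartiles, and max."""
    # "inclusive" interpolates linearly between the sorted data points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(values),
    }

summary = describe(apprehensions)
for name, value in summary.items():
    print(f"{name:>5}: {value:.2f}")
```

Each entry corresponds to one of the measures listed in the interpretation: count, mean, standard deviation, minimum, maximum, and the three quartiles.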
Program 8 Imp Of Correlation
Definition
• Correlation is a statistical measure that indicates
the extent to which two or more variables fluctuate
in relation to each other.

• A positive correlation indicates the extent to
which those variables increase or decrease in
parallel.

• A negative correlation indicates the extent to
which one variable increases as the other
decreases.

Interpretation
The correlation function calculates the correlation
of each column against every other column in the
dataset. We can observe that the years 2000 and
2005 have a high correlation in this dataset.

[Code and Result: images in the original slides]
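To make the positive and negative cases concrete, here is a small sketch of the Pearson correlation coefficient, which is assumed to be the measure the slides used (the original code is not shown). The `pearson` helper and all three data series are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-sector counts for two years (illustrative values only).
year_2000 = [100, 250, 400, 800, 1200]
year_2005 = [90, 230, 420, 760, 1100]
falling = [1200, 800, 400, 250, 100]  # falls while year_2000 rises

r_positive = pearson(year_2000, year_2005)
r_negative = pearson(year_2000, falling)
print(f"2000 vs 2005:    {r_positive:.3f}")  # near +1: the columns rise together
print(f"2000 vs falling: {r_negative:.3f}")  # below 0: one rises as the other falls
```

A coefficient near +1 or -1 indicates a strong linear relationship, while a value near 0 indicates little linear relationship.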

Program 9 Simple Regression
Definition
• Simple linear regression is a statistical technique
used to model the relationship between two
quantitative variables, where one variable (the
dependent variable) is predicted from the other
variable (the independent variable) using a linear
function.

• The goal of simple linear regression is to find the
line of best fit that minimizes the sum of the
squared differences between the predicted values
and the actual values of the dependent variable.

Interpretation
The simple regression function fits a regression of
one column against another in the dataset. We can
observe that the 2000 and 2005 columns show a
strong linear relationship in this dataset.

[Code and Result: images in the original slides]
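The line of best fit described above has a closed-form solution, sketched below with the standard library only. The `fit_line` and `r_squared` helpers and the data are hypothetical; the slides do not show which regression routine was actually used.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_residual = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_residual / ss_total

# Hypothetical per-sector counts (illustrative values only).
year_2000 = [100, 250, 400, 800, 1200]  # independent variable
year_2005 = [90, 230, 420, 760, 1100]   # dependent variable

slope, intercept = fit_line(year_2000, year_2005)
fit_quality = r_squared(year_2000, year_2005, slope, intercept)
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}, R^2 = {fit_quality:.4f}")
```

The slope and intercept define the fitted line, and an R^2 close to 1 is one way to express that the two columns are strongly linearly related.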

Program 10 Testing Hypothesis for Mean
Definition
• Testing of hypothesis for a single mean is a
statistical method used to determine whether a
sample mean is significantly different from a
hypothesized population mean. This type of
hypothesis testing involves comparing the sample
mean to the hypothesized population mean using a
t-test or a z-test, depending on whether the
population standard deviation is known or
unknown.

Interpretation
Sample mean of the dataset: 196425.96
Sample Std Dev: 449680.81
T value: 2.18
P value: 0.03

[Code and Result: images in the original slides]
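As a rough sketch of how a one-sample t statistic is computed when the population standard deviation is unknown, the block below uses a hypothetical sample and a hypothetical hypothesized mean, since the slides report neither the sample size nor the hypothesized value. The `one_sample_t` helper is illustrative; a library routine such as `scipy.stats.ttest_1samp` would also return the p-value.

```python
import math
import statistics

def one_sample_t(sample, hypothesized_mean):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    sample_std = statistics.stdev(sample)       # uses n - 1 in the denominator
    standard_error = sample_std / math.sqrt(n)
    t_value = (sample_mean - hypothesized_mean) / standard_error
    return t_value, n - 1

# Hypothetical sample and hypothesized mean (not the slide's data).
sample = [12, 15, 9, 14, 10]
t_value, dof = one_sample_t(sample, hypothesized_mean=10)

# Two-sided 5% critical value of Student's t for 4 degrees of freedom.
T_CRITICAL_DOF_4 = 2.776
reject_null = abs(t_value) > T_CRITICAL_DOF_4
print(f"t = {t_value:.3f}, dof = {dof}, reject H0: {reject_null}")
```

Comparing the absolute t value with the critical value is equivalent to comparing the two-sided p-value with 0.05.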
Program 11 Hypothesis for Comparison of Means
Definition
Hypothesis testing for comparison of means is a
statistical method used to determine if there is a
significant difference between the means of two or
more populations. In this type of hypothesis
testing, we compare the means of two or more
samples to see if they come from populations with
the same mean or different means.

Interpretation
Grp1 Mean: 6414
Grp2 Mean: 2230.4
T Value: 1.27
P Value: 0.27
With a p-value of 0.27, which is above the usual
0.05 threshold, we fail to reject the null
hypothesis: there is not enough evidence to
conclude that the means for the 2000 and 2005
years differ.

[Code and Result: images in the original slides]
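The comparison can be sketched as follows. The slides do not say which variant of the two-sample t-test was run, so this example assumes Welch's unequal-variance version; the `welch_t` and `decide` helpers and both groups are hypothetical. Only the p-value passed to `decide` at the end comes from the slide.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t-test."""
    # Squared standard error of each group mean.
    se2_a = statistics.variance(group_a) / len(group_a)
    se2_b = statistics.variance(group_b) / len(group_b)
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    t_value = mean_diff / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(group_a) - 1) + se2_b ** 2 / (len(group_b) - 1)
    )
    return t_value, dof

def decide(p_value, alpha=0.05):
    """Apply the usual decision rule to a p-value."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

# Hypothetical groups (illustrative values only).
group_a = [5, 7, 9, 11]
group_b = [4, 5, 6, 7]
t_value, dof = welch_t(group_a, group_b)
print(f"t = {t_value:.3f}, dof = {dof:.2f}")

# The p-value reported on the slide for the 2000 vs 2005 comparison.
print(decide(0.27))
```

A p-value at or above the chosen significance level means the observed difference in means could plausibly be due to chance.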
Program 12 Chi Square Test
Definition
• It is a statistical test used to determine whether
there is a significant association between two
categorical variables.

• It tests the null hypothesis that there is no
relationship between the variables, and the
alternative hypothesis that there is a significant
relationship.

• If the Chi-square statistic is large enough and the
associated p-value is small enough (typically less
than 0.05), then we reject the null hypothesis and
conclude that there is a significant relationship
between the variables.

• If the p-value is not small enough, then we fail to
reject the null hypothesis and conclude that there
is no significant relationship.

Interpretation
Based on the results of the chi-square test for
independence, with a chi-square value of 75 and a
p-value of approximately 0.38, we fail to reject the
null hypothesis and conclude that there is no
significant relationship between the 2000 year
column and Border Region.

[Code and Result: images in the original slides]
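The statistic itself compares observed counts with the counts expected if the two variables were independent. Below is a minimal sketch using a hypothetical 2x2 table; the `chi_square_statistic` helper and the counts are illustrative and are not the slide's data. A library routine such as `scipy.stats.chi2_contingency` would also return the p-value.

```python
def chi_square_statistic(observed):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (count - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof

# Hypothetical 2x2 table of counts (illustrative values only).
observed = [[20, 30],
            [30, 20]]
chi2, dof = chi_square_statistic(observed)

# 5% critical value of the chi-square distribution for 1 degree of freedom.
CHI2_CRITICAL_DOF_1 = 3.841
reject_null = chi2 > CHI2_CRITICAL_DOF_1
print(f"chi2 = {chi2:.3f}, dof = {dof}, reject H0: {reject_null}")
```

Whether a given statistic is large enough depends on the degrees of freedom, which is why a value as large as 75 can still correspond to a non-significant p-value for a table with many cells.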
