
Census 8

Uploaded by

afaq0456
Copyright
© © All Rights Reserved

Name

ID
1. Introduction
This project works with fictitious census data for a hypothetical town in order to address
fundamental issues of land use and resource management that affect the local authority.
Census data is a powerful tool for governments because it reveals qualitative and
quantitative details of the population, which are essential for identifying areas of growth
and for shaping future policy and infrastructure. The sample data resembles a census that
might have been taken in the historical past: details of household members, including
marital status, age, occupation and religion, were recorded.
In this project, we focus on cleaning and analysing the data to inform decisions on two
primary issues: what should be built on an unoccupied plot of land, and which area of
community welfare should receive investment. Concretely, the analysis answers the question
of what should be built: high-density housing, low-density housing, a train station, a
religious building, an emergency medical building, or some other infrastructure. It also
evaluates the merits of investing in employment and training, old age care, schooling, or
general infrastructure.
Therefore, by analysing the age distribution, employment rates, religious affiliations,
marital statuses, household occupancy, the number of university students, and birth and
death rates, this project offers evidence-based advice on further development and
investment in the town. The result is a researched plan that reflects the current and
future needs of the town's inhabitants in a balanced way.

2. Data cleaning
In this project, data cleaning was a critical step to ensure the accuracy and reliability of the
analysis. The dataset initially contained several missing values and inconsistencies, which
needed to be addressed before conducting any meaningful analysis. The data cleaning process
involved handling missing values across various columns, converting data types where
necessary, and standardizing certain entries to maintain consistency.
The dataset had 8,291 entries and 12 columns, with missing values spread across multiple
columns. Here's an overview of the initial data quality:
• The House Number, Street, First Name, Surname, Age, Gender, and Occupation
columns had 41 missing values each.
• The Relationship to Head of House column had 644 missing values.
• The Marital Status column had 2,142 missing values.
• The Infirmity column had 8,223 missing values, indicating that most entries were
missing for this attribute.
• The Religion column had 4,814 missing values.
To address these issues, the following data cleaning steps were implemented:
1. House Number: Missing values were filled with the median of the existing house
numbers, as this would provide a reasonable estimate for a numerical field.
2. Street: Missing street names were replaced with the mode (most frequent) street
name, on the assumption that the most common street is the likeliest match for a
missing entry.
3. First Name and Surname: Missing names were filled with the placeholder
'Unknown', acknowledging that these are critical identifiers but were not available for
some entries.
4. Age: The Age column, which was initially of type object, was converted to a numeric
type to facilitate analysis. Any non-numeric values were coerced into NaN (missing
values), which were then filled with the median age of the dataset to approximate the
age of individuals where it was missing.
5. Relationship to Head of House: Missing values in this categorical field were filled
with the mode, as it represents the most common relationship in the dataset.
6. Marital Status: Similarly, the mode was used to fill missing values in the marital
status field, ensuring consistency in demographic data.
7. Gender: Missing gender entries were filled with the mode, aligning with the most
common gender recorded.
8. Occupation: The mode (most common occupation) was used to fill the gaps in
occupational information, so that work-related analyses could run on complete data.
9. Infirmity: As stated earlier, most cells in this column were empty; all blank cells
were therefore coded as 'Unknown'. This preserves the infirmities that were recorded
while acknowledging that the information is missing for most entries.
10. Religion: The mode was used to fill missing values in the religion column, providing
a common religious affiliation in cases where the data was absent.
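
The ten imputation steps above can be sketched in pandas. This is a minimal illustration under stated assumptions, not the project's actual code: the helper name `impute_census` and the tiny sample frame are hypothetical, the column names are taken from the list above, and House Number is assumed to be numeric.

```python
import pandas as pd

NUMERIC = ["House Number", "Age"]
CATEGORICAL = ["Street", "Relationship to Head of House", "Marital Status",
               "Gender", "Occupation", "Religion"]
PLACEHOLDER = ["First Name", "Surname", "Infirmity"]

def impute_census(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing values as described in the steps above (hypothetical helper)."""
    out = df.copy()
    # Numeric fields: coerce non-numeric entries to NaN, then fill with the median.
    for col in [c for c in NUMERIC if c in out.columns]:
        out[col] = pd.to_numeric(out[col], errors="coerce")
        out[col] = out[col].fillna(out[col].median())
    # Categorical fields: fill with the mode (most frequent value).
    for col in [c for c in CATEGORICAL if c in out.columns]:
        out[col] = out[col].fillna(out[col].mode().iloc[0])
    # Identifiers and the sparse Infirmity field: explicit placeholder.
    for col in [c for c in PLACEHOLDER if c in out.columns]:
        out[col] = out[col].fillna("Unknown")
    return out

# Illustrative data only, not taken from the census file.
sample = pd.DataFrame({
    "House Number": [1, 3, None, 7],
    "Street": ["High St", None, "High St", "Mill Ln"],
    "First Name": ["Ann", None, "Raj", "Lee"],
    "Age": ["34", "x", None, "70"],
    "Infirmity": [None, None, "Deaf", None],
})
cleaned = impute_census(sample)
print(cleaned)
```

Mode imputation keeps every row usable, but it also adds every imputed row to the most common category, which is worth keeping in mind when reading the marital status and religion shares later in the report.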
To ensure the data was ready for analysis, further cleaning steps were carried out to check
for and handle invalid values in the Age and Gender columns. An additional Age Group
column was also created, with each group covering a range of 4 (for example, 35-39). These
steps are outlined below:
1. Handling Invalid Age Values:
o A check was performed to identify any Age values that were unrealistic,
specifically those less than 0 or greater than 120. Such values were replaced
with pd.NA (missing values) to indicate that the age data was not valid.
o After addressing the invalid ages, any missing values in the Age column were
filled with the median age of the dataset to maintain consistency in
demographic analysis.
2. Standardizing Gender Values:
o A gender mapping dictionary was created to standardize the various
representations of gender in the dataset. For example, 'f', 'female', and 'F' were
all mapped to 'Female', while 'm', 'male', and 'M' were mapped to 'Male'.
o The Gender column was then updated using this mapping to ensure
consistency across entries.
o Any remaining empty values in the Gender column were replaced with pd.NA,
and subsequently, all pd.NA values were filled with 'Unknown' to maintain
completeness in the data.
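
A minimal sketch of the two validation steps above, assuming columns named `Age` and `Gender`. The function name and sample values are hypothetical. Two simplifications are made here and are not confirmed by the report: invalid ages are set to NaN through `Series.where` rather than assigned `pd.NA`, and any gender value not covered by the mapping is also labelled 'Unknown'.

```python
import pandas as pd

# Lower-cased keys cover 'f', 'F', 'female', 'm', 'M', 'male' after normalisation.
GENDER_MAP = {"f": "Female", "female": "Female", "m": "Male", "male": "Male"}

def clean_age_gender(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    age = pd.to_numeric(out["Age"], errors="coerce")
    # Keep only ages in the range 0-120; everything else becomes missing.
    age = age.where(age.between(0, 120))
    # Fill missing ages with the median of the remaining valid ages.
    out["Age"] = age.fillna(age.median())
    # Normalise whitespace and case, map to standard labels, default to 'Unknown'.
    gender = out["Gender"].str.strip().str.lower()
    out["Gender"] = gender.map(GENDER_MAP).fillna("Unknown")
    return out

# Illustrative data only.
raw = pd.DataFrame({
    "Age": [25, -3, 130, None, 45],
    "Gender": ["f", "Male", " M ", "", None],
})
result = clean_age_gender(raw)
print(result)
```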

Figure no 1: Dataset Introduction

3. Population Demographics
The age group distribution shows a population concentrated around middle age. The largest
group is 35-39, comprising 8.91% of the total. The percentage gradually declines as age
increases, with the 65-69 age group representing 3.53% and the 70-74 age group 2.36%. The
distribution continues to shrink with advancing age, ending with the 100+ age group, which
accounts for just 0.21% of the total population. The decline is most pronounced after 65
years, as expected when mortality rises with age. Overall, the population clusters around
middle age, with a decreasing proportion in the elderly categories.
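
The age bands quoted above (35-39, 65-69, 70-74, 100+) suggest five-year groups with an open-ended top band. A hedged sketch of how such percentages could be computed with `pandas.cut` follows; the function name and the sample ages are illustrative assumptions, not the report's code or data.

```python
import pandas as pd

def age_group_percentages(ages: pd.Series) -> pd.Series:
    """Percentage of people in each five-year band, with a final 100+ band."""
    edges = list(range(0, 105, 5)) + [float("inf")]      # 0, 5, ..., 100, inf
    labels = [f"{lo}-{lo + 4}" for lo in range(0, 100, 5)] + ["100+"]
    # right=False makes each band include its lower edge: [35, 40) -> "35-39".
    groups = pd.cut(ages, bins=edges, labels=labels, right=False)
    return (groups.value_counts(normalize=True, sort=False) * 100).round(2)

# Illustrative ages only.
pct = age_group_percentages(pd.Series([2, 36, 37, 38, 67, 101, 36, 12]))
print(pct[pct > 0])
```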

Figure no 2: Population pyramid

3.1 Divorce and Marriage


The marital status distribution shows that most of the population is single, with 5,113
individuals in this category, far more than in any other. The next largest group is married
individuals, numbering 2,164. Divorced individuals number 735, while widowed individuals
account for 278.
Figure no 3: Divorce and Marriage

3.2 Distribution of occupations


The employment data shows the distribution of the top 10 occupation categories. The most
prevalent occupation is "Student", with 1,728 individuals representing 20.84% of the total,
indicating that a significant proportion of the population is in education. Retired
individuals make up 7.01% of the population. The "Unemployed" and "University Student"
categories contain 534 and 533 individuals respectively, accounting for 6.44% and 6.43% of
the population. The "Child" category, with 514 individuals, constitutes 6.20% of the
dataset, reflecting a notable segment of the population not yet participating in formal
employment. The remaining top occupations include "PhD Student" at 0.22%, and several roles
such as "Psychologist, sport and exercise," "Surveyor, quantity," "Insurance account
manager," "Musician," and "Art gallery manager," each representing 0.16% to 0.18% of the
population. Outside education, unemployment and childhood, no single occupation accounts
for more than a fraction of a percent, so employment is spread across many different roles.
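
The counts and percentages above are the kind of output a frequency table gives. A minimal sketch, assuming the occupations are held in a pandas Series; the helper name and sample values are hypothetical. The last lines recompute one figure quoted in the text from the reported totals.

```python
import pandas as pd

def top_categories(values: pd.Series, n: int = 10) -> pd.DataFrame:
    """Return the n most frequent categories with count and share of all rows."""
    counts = values.value_counts().head(n)
    percent = (counts / len(values) * 100).round(2)
    return pd.DataFrame({"count": counts, "percent": percent})

# Illustrative data only; the real Occupation column has 8,291 rows.
occupations = pd.Series(["Student", "Student", "Student", "Unemployed"])
top = top_categories(occupations, n=2)
print(top)

# Cross-check: 1,728 students out of 8,291 entries.
student_share = round(1728 / 8291 * 100, 2)
print(student_share)  # 20.84
```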
Figure no 4: Occupations

3.3 Birth and death rate


The birth rate is 11.82 per 1,000 people, indicating a moderate level of new births relative
to the population. The death rate is estimated at 11.70 per 1,000 people, based on the
number of individuals aged 85 and above, an approach informed by UK life expectancy figures
(Clark, 2018). The difference between the birth rate and the death rate is 0.12 per 1,000
people. This small difference indicates that the population is relatively stable, with
nearly equal rates of births and deaths.
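
Both rates are crude rates: events divided by population, times 1,000. The sketch below shows the arithmetic. The population of 8,291 is the entry count reported in the data cleaning section; the event counts of 98 and 97 are not stated in the report and are assumptions back-calculated from the published rates.

```python
def crude_rate_per_1000(events: int, population: int) -> float:
    """Events per 1,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return events / population * 1000

POPULATION = 8291   # entries in the dataset
BIRTHS = 98         # assumption: back-calculated from the reported 11.82
DEATHS = 97         # assumption: back-calculated from the reported 11.70

birth_rate = round(crude_rate_per_1000(BIRTHS, POPULATION), 2)
death_rate = round(crude_rate_per_1000(DEATHS, POPULATION), 2)
natural_change = round(birth_rate - death_rate, 2)
print(birth_rate, death_rate, natural_change)  # 11.82 11.7 0.12
```

With these assumed counts, the gap between the two rates corresponds to a single event, which is why the population is described as stable.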
Migration
Migratory statistics in the UK for 2023 indicate an overall increase in immigration, driven
largely by students and workers. From September 2021 to September 2022, immigration
reached a record high of approximately 1.1 million people. This is attributed to various
factors, including changes in visa policies and global events such as the war in Ukraine and
the aftermath of Brexit, which have altered migration patterns (GOV.UK, n.d.).

3.4 Household Occupancy Distribution


The distribution of household sizes in the dataset covers a wide range of occupancy levels.
The most common household sizes are 8 individuals (17 households) and 9 individuals
(11 households). Smaller households are also present, with 4 individuals in 7 households
and 5 individuals in 8 households. Larger households, though less frequent, are present,
with sizes ranging from 105 to 325 individuals. Some of these sizes occur only once; for
example, a single household contains 243 individuals. This wide range of household sizes
indicates varied living situations within the community.
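
Household sizes above 100 are hard to reconcile with single dwellings. One possible explanation, which the report does not confirm, is that people were grouped by house number alone, so that every "number 1" in town counted as one household. The sketch below, with hypothetical names and illustrative data, groups on street and house number together and compares the result with grouping on house number only.

```python
import pandas as pd

def household_size_distribution(df: pd.DataFrame) -> pd.Series:
    """Map household size -> number of households of that size."""
    # A household is identified by street AND house number together.
    sizes = df.groupby(["Street", "House Number"]).size()
    return sizes.value_counts().sort_index()

# Illustrative data only: three households on two streets.
people = pd.DataFrame({
    "Street": ["High St"] * 4 + ["Mill Ln"] * 2,
    "House Number": [1, 1, 1, 2, 1, 1],
})
dist = household_size_distribution(people)
print(dist)

# Grouping on house number alone merges 1 High St with 1 Mill Ln.
by_number_only = people.groupby("House Number").size()
print(by_number_only)
```

Rows whose street and house number were imputed with the mode and median would still collapse into a single artificial household, so those rows may need to be excluded before counting.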

Figure no 5: Household Occupancy Distribution


3.5 Religion and Infirmity
The distribution of religious affiliations within the town shows a predominance of
Christianity, with 80.06% of the population identifying as Christian. Other significant
groups include Catholic (10.46%) and Methodist (6.83%). Smaller groups include Muslim
(1.56%), Sikh (0.69%), and Jewish (0.25%). Other categories such as Baptist, Housekeeper,
Pagan, Quaker, and Private account for a very small percentage of the population;
"Housekeeper" and "Private" are likely data-entry errors or declined answers rather than
religions. Because the 4,814 missing values in this column were filled with the mode, the
share of the most common religion is likely overstated.

Figure no 6: Religion Distribution

3.6 Unemployment Rate Across Different Age Groups


The percentage distribution of unemployed individuals by age group and gender shows that
in the 15-19 age group, the unemployed are entirely female. In the 20-24 and 25-29 age
groups, females also dominate, with 64.29% and 72.22% respectively. However, in older age
groups such as 55-59 and 60-64, the proportion of unemployed males increases. By the 75-79
age group, males constitute 66.67% of the unemployed. In the oldest age groups, the
unemployed are again exclusively female. Overall, females make up the larger share of the
unemployed in younger age groups, and this pattern shifts towards males as age increases.
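
The figures above are shares within each age group (rows sum to 100%), not unemployment rates relative to the workforce. A minimal sketch of that calculation with `pandas.crosstab`, assuming columns named `Occupation`, `Age Group` and `Gender` and the occupation label "Unemployed"; the function name and sample rows are hypothetical.

```python
import pandas as pd

def unemployed_gender_share(df: pd.DataFrame) -> pd.DataFrame:
    """Percentage of unemployed people of each gender within each age group."""
    unemployed = df[df["Occupation"] == "Unemployed"]
    share = pd.crosstab(unemployed["Age Group"], unemployed["Gender"],
                        normalize="index")  # each row sums to 1
    return (share * 100).round(2)

# Illustrative data only.
people = pd.DataFrame({
    "Occupation": ["Unemployed", "Unemployed", "Unemployed", "Unemployed", "Nurse"],
    "Age Group": ["15-19", "20-24", "20-24", "20-24", "20-24"],
    "Gender": ["Female", "Female", "Female", "Male", "Male"],
})
share = unemployed_gender_share(people)
print(share)
```

Small groups can produce extreme shares: a band with a single unemployed person is reported as 100% of one gender, which may explain the "entirely female" bands in the text.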
Figure no 7: Unemployment Rate Across Different Age Groups

3.7 Commuters
The town has a total of 533 university students and 5,370 employed residents, and this
analysis treats all of them as commuters. There are no universities located within the
town, so every university student must travel to attend their educational institution, and
many workplaces are likewise assumed to lie outside the town. Professions that are
especially likely to involve commuting include roles in IT technical support, health
professions, and specialized technology fields. For instance, IT technical support staff,
health physicists or brewing technologists may need to work in specialized facilities or
offices that are situated outside the town. Similarly, multimedia programmers and other
specialized professionals might need to commute to access the specific resources or
workplaces pertinent to their jobs.

4. Task Analysis and Recommendations

(a) What should be built on an unoccupied plot of land that the local
government wishes to develop?
To determine the most suitable development for the unoccupied plot of land, we'll consider
the following options:

1. High-Density Housing
o Justification: This is suitable if the population is expanding and there is a
need to accommodate more residents.
o Analysis
▪ Population Growth: The birth rate (11.82 per 1,000) is slightly higher
than the death rate (11.70 per 1,000), indicating a stable population
with a slight increase.
▪ Household Occupancy: There are significant instances of high
household occupancy (e.g., households with 8-9 individuals).
Overcrowding is a concern, suggesting a need for more housing units
to alleviate pressure on existing homes.

2. Low-Density Housing
o Justification: This should be considered if the population is affluent and
there's a demand for larger family homes.
o Analysis
▪ Economic Status: The data doesn't explicitly mention affluence, but
with a high proportion of students and significant unemployment, the
indication is not of an affluent population.
▪ Marital Status: A large single population suggests less immediate
demand for large family homes.

3. Train Station
o Justification: If a significant number of commuters are present, a train station
can reduce road congestion.
o Analysis
▪ Commuter Data: With 533 university students and 5,370 employed
individuals commuting to nearby cities, the town has a substantial
commuter population.
▪ Existing Infrastructure: Currently, there is no train station to
facilitate these commutes, which likely places pressure on road
transport.

4. Religious Building
o Justification: Suitable if there is a high demand for religious services not met
by existing facilities.
o Analysis
▪ Religious Affiliation: The town predominantly identifies as Christian
(80.06%), with Catholics making up 10.46%. Other religious groups
have much smaller representations.
▪ Current Facilities: There is already one Catholic place of worship.
The data doesn't show significant demand for additional religious
facilities for other denominations or religions.

5. Emergency Medical Building
o Justification: Necessary if there's a high frequency of injuries or anticipated
medical needs (e.g., pregnancies).
o Analysis
▪ Population Demographics: The population has a substantial
middle-aged group (35-39), with a gradual decline in older age groups.
This suggests moderate healthcare needs but not exceptionally high.
▪ Infirmity Data: With most entries missing, it's challenging to gauge
the exact demand for emergency medical services.
Recommendation: Build a Train Station
• Justification: The significant number of commuters (both university students and
employed individuals) indicates a high demand for improved transportation
infrastructure. A train station would alleviate road congestion, provide a more
efficient commuting option, and potentially attract more residents who prefer easy
access to larger cities for work or study.

(b) Which one of the following options should be invested in?

1. Employment and Training
o Justification: If there's significant unemployment, re-training can provide
new skills.
o Analysis
▪ Unemployment Trends: The unemployment rate varies across age
groups, with young females (15-29) and older males (55-64) showing
higher unemployment rates. This indicates a need for targeted
employment and training programs.

2. Old Age Care
o Justification: Necessary if there's an anticipated increase in the retired
population.
o Analysis
▪ Age Distribution: The population is concentrated around middle age,
with a smaller proportion of elderly individuals. There is a moderate
future need for old age care, but it's not immediate.

3. Increase Spending for Schooling
o Justification: Needed if there's a growing population of school-aged children.
o Analysis
▪ Birth Rate: The stable population growth and the significant
proportion of children (6.20% of the population are classified as
children) indicate a steady need for schooling infrastructure.

4. General Infrastructure
o Justification: If the town is expanding, general services need more
investment.
o Analysis
▪ Population Growth and Household Occupancy: The population's
slight growth and high household occupancy levels suggest a need for
improved general infrastructure.
Recommendation: Invest in Employment and Training
• Justification: Addressing unemployment, particularly among young females and
older males, can have significant long-term benefits for the town's economy and
social stability. By providing re-training and skill development programs, the town
can reduce unemployment rates and prepare its workforce for more diverse
employment opportunities. This approach aligns with the identified trends and
addresses a critical need in the community.

Option/Investment Area           Demand/Need Analysis                        Final Decision
High-Density Housing             High demand, significant overcrowding       Not chosen
Low-Density Housing              Low demand                                  Not chosen
Train Station                    High demand                                 Chosen
Religious Building               Low demand                                  Not chosen
Emergency Medical Building       Moderate demand                             Not chosen
Investment Area
Employment and Training          High demand                                 Chosen
Old Age Care                     Moderate future need but not immediate      Not chosen
Increase Spending for Schooling  Steady need                                 Not chosen
General Infrastructure           Need for improved general infrastructure    Not chosen
                                 due to slight population growth and high
                                 household occupancy

5. Bibliography
Clark, D. (2018). Life expectancy in the UK by gender 2018 | Statista. [online] Statista.
Available at: https://www.statista.com/statistics/281671/life-expectancy-united-kingdom-uk-by-gender/.
GOV.UK. (n.d.). Key Findings, Statistical Digest of Rural England. [online] Available at:
https://www.gov.uk/government/statistics/key-findings-statistical-digest-of-rural-england/key-findings-statistical-digest-of-rural-england.
