
SRI VENKATESHWARA COLLEGE OF ENGINEERING AND TECHNOLOGY

(AUTONOMOUS)
R.V.S Nagar, Chittoor – 517 127. (A.P)
(Approved by AICTE, New Delhi, Affiliated to JNTUA, Anantapur)
(Accredited by NBA, New Delhi & NAAC A+, Bangalore)
(An ISO 9001:2000 Certified Institution)
2024-2025

INTERNSHIP REPORT
A report submitted in partial fulfilment of the requirements for the Award of Degree of

BACHELOR OF TECHNOLOGY
IN

COMPUTER SCIENCE & ENGINEERING


(DATA SCIENCE)
BY
PURRU BALAJI
Regd.No.21781A32B1
Under the supervision of
Mrs. Sandip Gavit

TECHNO HACKS
(Duration: 01/06/2024 to 31/08/2024)
SRI VENKATESHWARA COLLEGE OF ENGINEERING AND TECHNOLOGY
(AUTONOMOUS)
R.V.S Nagar, Chittoor – 517 127. (A.P)
(Approved by AICTE, New Delhi, Affiliated to JNTUA, Anantapur)
(Accredited by NBA, New Delhi & NAAC A+, Bangalore)
(An ISO 9001:2000 Certified Institution)
2023-2024

CERTIFICATE

This is to certify that the “Internship Report” submitted by PURRU
BALAJI (Regd. No.: 21781A32B1) is the work done by him and
submitted during the 2023-2024 academic year, in partial fulfilment of
the requirements for the award of the Degree of BACHELOR OF
TECHNOLOGY in COMPUTER SCIENCE & ENGINEERING
(DATA SCIENCE), at TECHNO HACKS.

Mr. R. RAJASHEKHAR                              Dr. MOSESDIAN
Internship Coordinator                          Head of the Department (DATA SCIENCE)
CERTIFICATE
ACKNOWLEDGEMENT

 A grateful thanks to Dr. R. VENKATASWAMY, Chairman of Sri Venkateshwara College of
Engineering & Technology (Autonomous), for providing education in this esteemed institution.
I wish to record my deep sense of gratitude and profound thanks to our beloved Vice Chairman,
Sri R. V. Srinivas, for his valuable support throughout the course.

 I express my sincere thanks to Dr. M. MOHAN BABU, our beloved Principal, for his
encouragement and suggestions during the course of study.

 With a deep sense of gratefulness, I acknowledge Dr. MOSESDIAN, Head of the Department,
Computer Science & Engineering (CSD), for giving us inspiring guidance in undertaking the internship.

 I express my sincere thanks to the internship coordinator, Mr. R. RAJASHEKHAR, for his keen
interest, stimulating guidance, and constant encouragement with our work at all stages, to bring
this report to fruition.

 I wish to convey my gratitude and sincere thanks to all members for their support and
cooperation rendered towards the successful submission of this report.

 Finally, I would like to express my sincere thanks to all teaching and non-teaching faculty
members, our parents, and friends, and all those who have supported us to complete the
internship successfully.

(NAME: PURRU BALAJI)


(ROLL NO.: 21781A32B1)
ABSTRACT
Title: "Navigating the Storm: A Comprehensive Data Analytics Approach for Analyzing the Impacts of
COVID-19"
The COVID-19 pandemic has disrupted global societies, economies, and healthcare systems, leaving an
indelible mark on our world. This data analytics project aims to provide a thorough analysis of the
multifaceted impacts of the COVID-19 pandemic. Leveraging advanced data analytics techniques, our study
will explore various dimensions, including public health, socio-economic factors, and global supply chains.

Public Health Assessment:


Utilize epidemiological data to assess the spread and severity of COVID-19.
Analyze the effectiveness of public health interventions and vaccination campaigns.
Identify patterns and trends in infection rates, mortality, and recovery rates across regions.

Socio-Economic Impact Analysis:


Examine the economic fallout of the pandemic on different sectors, including employment, GDP, and
business operations.
Investigate the disparities in the socio-economic impact on diverse demographic groups.
Evaluate the effectiveness of government stimulus packages and relief measures.

Education and Workforce Dynamics:


Investigate the transition to remote work and its impact on productivity and job satisfaction.
Assess the challenges faced by the education sector due to lockdowns and the shift to online learning.
Explore the long-term implications of these changes on the workforce and education systems.

Supply Chain Disruptions:


Analyze disruptions in global supply chains and identify vulnerable nodes.
Evaluate the impact on manufacturing, logistics, and the availability of essential goods.
Propose strategies for enhancing supply chain resilience in the face of future global crises.

Sentiment Analysis:
Conduct sentiment analysis on social media and news articles to gauge public perception and emotional
responses to the pandemic.
Investigate the role of misinformation and its impact on public behavior and compliance with health
guidelines.
ORGANIZATION PROFILE

ABOUT OUR ORGANIZATION


TechnoHacks EduTech: Bridging the Gap between IT Training and Industry Requirements
TechnoHacks EduTech, founded in 2022 and headquartered in Nashik, Maharashtra, India, is a beacon of education
for students and professionals aiming to thrive in the IT sector. Their mission revolves around providing
comprehensive training and mentorship to bridge the gap between theoretical knowledge and industry requirements.

Mission and Vision
TechnoHacks EduTech's core mission is to empower individuals through education, training, and mentorship. They
aim to equip students and working professionals with the necessary skills to excel in the competitive IT industry.
The organization envisions a future where education is accessible, practical, and aligned with industry demands,
creating a seamless transition from learning to employment.

Services and Programs
1. Training and Mentorship Programs
 TechnoHacks offers a range of training programs tailored to different skill levels. From beginners to
advanced learners, their curriculum covers various aspects of IT, including programming languages,
software development, and cybersecurity.
 Mentorship is a key component, providing personalized guidance to help learners navigate their
educational and career paths. Experienced mentors share insights, offer feedback, and support learners
in achieving their goals.
2. Internship and Placement Training
 Recognizing the importance of practical experience, TechnoHacks provides internship opportunities
to students. These internships allow learners to apply their knowledge in real-world scenarios, gaining
invaluable hands-on experience.
 Placement training is another critical service, preparing students for job interviews and helping them
secure positions in the IT industry. This training includes resume building, mock interviews, and
job search strategies.
3. Certification Courses
 To validate their skills, learners can enroll in certification courses offered by TechnoHacks. These
certifications are recognized in the industry, enhancing the employability of graduates.
4. Industry Partnerships
 Collaborating with industry leaders, TechnoHacks ensures that their training programs remain
up-to-date and aligned with current trends. These partnerships also open doors for learners, connecting
them with potential employers and industry networks.
5. Future Goals
 Looking ahead, TechnoHacks aims to expand its reach and impact. Plans include developing more
advanced courses, exploring international markets, and leveraging technology to enhance the learning
experience. Innovations such as AI-driven personalized learning and virtual labs are on the horizon,
promising an even more engaging and effective educational journey.
INDEX

CHAPTER 1: INTRODUCTION TO DATA SCIENCE

1.1 What is data science


1.2 What are the job roles/domains in data science and skills to learn
1.3 What is the range of salary in these domains

CHAPTER 2: EXCEL

2.1 Spreadsheets | Importing Data to Excel


2.2 Spreadsheet Functions to Organize Data | Filtering | Pivot Tables | Charts
2.3 Conditional Formatting | Data Validation

CHAPTER 3: SQL DATABASE

3.1 SQL Overview | Relational Database Concepts


3.2 SQL Data Grouping and Summarizing
3.3 SQL Clause, SQL Functions
3.4 SQL - Correlated and Uncorrelated Queries - Subqueries

CHAPTER 4: PYTHON PROGRAMMING

4.1 Collection Objects - 1 (Conditional Statements, Arrays, Strings)


4.2 Collection Objects - 2 (List, Tuple, Dictionary)
4.3 OOPs
4.4 Python Strings

CHAPTER 5: NUMPY

5.1 NumPy Array


5.2 Operations on Array
5.3 Indexing and Slicing

CHAPTER 6: TABLEAU

6.1 Tableau
6.2 Meta Data
6.3 Types of Tableau Chart
6.4 Visual Analytics

CHAPTER 7: POWER BI

7.1 Power BI
7.2 Interface, Data Connection
7.3 Data Transformation

CHAPTER 8: FINAL PROJECT

CHAPTER 9: CONCLUSION
Learning Objectives/Internship Objectives

• Internships are generally thought of as being reserved for college students looking to gain experience in a
particular field.

• However, a wide array of people can benefit from training internships in order to gain real-world
experience and develop their skills. An objective for this position should emphasize the skills you already
possess in the area and your interest in learning more.

• Internships are utilized in a number of different career fields, including architecture, engineering,
healthcare, economics, advertising, and many more.

• Some internships are used to allow individuals to perform scientific research, while others are specifically
designed to allow people to gain first-hand working experience.

• When applying for a training internship, make sure to highlight any special skills or talents that can set you
apart from the rest of the applicants, so that you have an improved chance of landing the position. An
internship is a great way to build your resume and develop skills that can be emphasized in applications for
future jobs.
WEEKLY OVERVIEW OF INTERNSHIP ACTIVITIES

1ST WEEK
DATE         DAY         NAME OF THE MODULE/TOPICS COMPLETED
17-04-2023   Monday      Orientation of the Program, Introduction of Data Science and Data Science Applications
18-04-2023   Tuesday     Spreadsheet Functions to organize Data, Filtering, Pivot Tables, and Charts
19-04-2023   Wednesday   Conditional Formatting, Data Validation
20-04-2023   Thursday    SQL Overview, Relational Database Concepts
21-04-2023   Friday      SQL Data Grouping and Summarizing
22-04-2023   Saturday    RAMZAN (Holiday)
23-04-2023   Sunday      Holiday

2ND WEEK

DATE DAY NAME OF THE MODULE/TOPICS COMPLETED


24-04-2023 Monday SQL Clause, SQL Functions, Live Doubt Session
25-04-2023 Tuesday Self-driven/ Assignment
26-04-2023 Wednesday Self-driven/ Assignment
27-04-2023 Thursday Self-driven/ Assignment
28-04-2023 Friday Self-driven/ Assignment
29-04-2023 Saturday Self-driven/ Assignment
30-04-2023 Sunday Case study Project Industry Expert Session Grand Test

3RD WEEK
DATE DAY NAME OF THE MODULE/TOPICS COMPLETED
01-05-2023 Monday Self-driven/ Assignment
02-05-2023 Tuesday Self-driven/ Assignment
03-05-2023 Wednesday Self-driven/ Assignment
04-05-2023 Thursday Self-driven/ Assignment
05-05-2023 Friday Self-driven/ Assignment
06-05-2023 Saturday Holiday
07-05-2023 Sunday Holiday
4TH WEEK
DATE DAY NAME OF THE MODULE/TOPICS COMPLETED
08-05-2023 Monday SQL-Correlated and Uncorrelated Queries - Sub - Queries
09-05-2023 Tuesday NumPy Array
10-05-2023 Wednesday Operations on Arrays
11-05-2023 Thursday Operations on Arrays
12-05-2023 Friday Indexing and slicing
13-05-2023 Saturday Holiday
14-05-2023 Sunday Holiday

5TH WEEK
DATE DAY NAME OF THE MODULE/TOPICS COMPLETED
15-05-2023 Monday Self-driven/ Assignment
16-05-2023 Tuesday Self-driven/ Assignment
17-05-2023 Wednesday Self-driven/ Assignment
18-05-2023 Thursday Self-driven/ Assignment
19-05-2023 Friday Self-driven/ Assignment
20-05-2023 Saturday Self-driven/ Assignment
21-05-2023 Sunday Case study Project Industry Expert Session Grand Test

6TH WEEK
DATE                       DAY                   NAME OF THE MODULE/TOPICS COMPLETED
22-05-2023 TO 21-06-2023   MONDAY TO WEDNESDAY   Preparations for Final Academic Examinations

7th WEEK

DATE DAY NAME OF THE MODULE/TOPICS COMPLETED


22-06-2023 Tuesday R Programming by German Professor
23-06-2023 Wednesday R Programming by German Professor
24-06-2023 Thursday Session – 1 Meta Data
25-06-2023 Friday Types of Tableau Chart
26-06-2023 Saturday Visual Analytics
27-06-2023 Sunday Self-driven/ Assignment
8TH WEEK
DATE DAY NAME OF THE MODULE/TOPICS COMPLETED
28-06-2023 Monday Session – 04 Power BI
29-06-2023 Tuesday Interface, Data Connection
30-06-2023 Wednesday Data Transformation
01-07-2023 Thursday Self-driven/ Assignment
02-07-2023 Friday Self-driven/ Assignment
03-07-2023 Saturday Self-driven/ Assignment
04-07-2023 Sunday Case study Project Industry Expert Session Grand Test

9TH WEEK
DATE                       DAY                   NAME OF THE MODULE/TOPICS COMPLETED
05-07-2023 TO 22-09-2023   Wednesday to Sunday   Advanced Python Programming by German Professor & Exam preparation

27-06-2023 TO 23-09-2023   Monday                FINAL EXAM
CHAPTER 1: Introduction to Data Science

1.1 What is data science: Data science is a multidisciplinary field that involves the use of scientific
methods, processes, algorithms, and systems to extract insights and knowledge from structured and
unstructured data. It combines expertise from various domains, including statistics, mathematics, computer
science, and domain-specific knowledge, to analyze and interpret complex data sets.

KEY COMPONENTS AND SKILLS IN DATA SCIENCE:

Data Collection
Data Cleaning and Preprocessing
Exploratory Data Analysis (EDA)
Feature Engineering
Machine Learning
Model Evaluation and Validation
Big Data and Distributed Computing
Deep Learning
Data Visualization
Ethics and Privacy
Domain Knowledge
Communication Skills
1.2 What are the job roles/domains in data science and skills to learn
JOB ROLES:

EXAMPLE: DATA ANALYST (DOMAIN/TOOLS): Data analysts play a critical role in
organizations by collecting, analysing, and interpreting data to provide insights and support
decision-making. To excel as a data analyst, you should possess a combination of technical, analytical, and
communication skills. Key skills typically include spreadsheets, SQL, a scripting language such as Python,
statistics, and data visualization tools such as Tableau or Power BI, which are the areas covered in the
following chapters.
1.3 What is the range of salary in these domains
CHAPTER 2: EXCEL
2.1 Spreadsheets | Importing Data to Excel:
Spreadsheet: Spreadsheets are a powerful tool in Excel for organizing and analyzing data. They provide
a way to store data in rows and columns, perform calculations and analysis, and create visual representations
of the data. In this section, we will cover some basic spreadsheet concepts in Excel that are used throughout
this chapter.

 Importing Data from Text/CSV Files:


Open Excel.
Click on "File" in the top left corner.
Select "Open" and browse to find your text or CSV file.
Select the file and click "Open".
 Importing Data from a Database:
Go to the "Data" tab in Excel.
Click on "Get Data" or "Get External Data," depending on your Excel version.
Choose the data source, such as SQL Server, MySQL, or Oracle.
 Importing Data from the Web:
Go to the "Data" tab in Excel.
Click on "Get Data" or "From Web," depending on your Excel version.
Enter the URL of the web page or specify a web query.
 Importing Data from Other Excel Files:
Open the Excel workbook where you want to import the data.
Click on a cell where you want to place the imported data.
Go to the "Data" tab.
Click on "Get Data" or "From Workbook," depending on your Excel version.

2.2 Spreadsheet Functions to Organize Data | Filtering | Pivot Tables | Charts


Organizing Data:
Sort: Arrange data in ascending or descending order based on selected criteria.
Filter: Apply the Filter function to display only the rows that meet specific criteria, hiding the rest of the data
temporarily.
Group: Group data to organize it hierarchically, making it easier to analyze and summarize.

Filtering Data:
AutoFilter: Enable AutoFilter to quickly filter data based on specific values in a column.
Advanced Filter: Use the Advanced Filter function to apply complex criteria for filtering data.

Pivot Tables:
Create Pivot Table: Use the Pivot Table function to create a summary of a large dataset, allowing you to
rearrange, summarize, and analyse data dynamically.
Pivot Charts: Once you create a Pivot Table, you can create a Pivot Chart based on that table to visualize the
summarized data.

Charts and Graphs:


Insert Chart: Use the Insert Chart function to create various types of charts such as bar charts, line charts, pie
charts, etc.
Customize Charts: Customize the appearance and formatting of charts to better represent data using the
Chart Tools in Excel.
Combo Chart: Create a combo chart to display multiple sets of data in one chart, with different types of data
represented on different axes.
Data Validation:
Use Data Validation to set restrictions on what type of data can be entered into a cell,
ensuring data consistency and accuracy.

Text Functions:
LEFT, RIGHT, MID: Use these functions to extract specific portions of text from cells.
CONCATENATE, CONCAT: Combine text from multiple cells into one cell using these functions.

Lookup and Reference Functions:


VLOOKUP, HLOOKUP: Use these functions to look up values in a table based on a key and return related
information.
INDEX, MATCH: Use these functions to find a value in a specified row or column and return a value in the
corresponding cell.

Math and Statistical Functions:


SUM, AVERAGE, MAX, MIN: Use these functions for basic mathematical and statistical calculations on a
range of data.
COUNT, COUNTA, COUNTIF: Use these functions to count cells that meet certain criteria.

Date and Time Functions:


TODAY, NOW: Use these functions to insert the current date and time into a cell.
DATE, TIME: Create date and time values using these functions.

IF Function:
IF: Use the IF function to perform conditional operations based on specified criteria.
Logical Functions:
AND, OR, NOT: Use these functions to perform logical operations.
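
For a programmatic view of this section, the sorting, filtering, pivot-table, and summary operations described
above can be sketched with pandas. The small sales table below is an invented example used only to illustrate the
equivalence; it is not data from the internship project.

# A pandas sketch of Sort, Filter, Pivot Tables, and basic summary functions (made-up sample data).
import pandas as pd

sales = pd.DataFrame({
    "Region":  ["East", "West", "East", "West", "East"],
    "Product": ["Pens", "Pens", "Books", "Books", "Pens"],
    "Amount":  [120, 90, 300, 250, 80],
})

# Sort: ascending or descending order, like Excel's Sort
sorted_sales = sales.sort_values("Amount", ascending=False)

# Filter: keep only the rows that meet a condition, like AutoFilter
east_only = sales[sales["Region"] == "East"]

# Pivot Table: sum of Amount by Region and Product
pivot = sales.pivot_table(index="Region", columns="Product", values="Amount", aggfunc="sum")

# SUM, AVERAGE, MAX, MIN, COUNT equivalents
summary = sales["Amount"].agg(["sum", "mean", "max", "min", "count"])

print(pivot)
print(summary)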

2.3 Conditional Formatting | Data Validation


Conditional Formatting:
Conditional formatting is a feature in Excel that allows you to apply specific formatting rules to cells based
on their content, helping you visually analyse and highlight important trends or patterns in your data. Here's
how to use conditional formatting:

Highlight Cell Rules:


 Select the range of cells you want to apply conditional formatting to.
 Go to the "Home" tab.
 Click on "Conditional Formatting" in the "Styles" group.
 Choose "Highlight Cells Rules" and select a rule like "Greater Than," "Less Than," "Between," etc.
 Set the conditions and formatting options. Excel will automatically apply the chosen formatting to
the cells based on the specified conditions.

Data Bars, Color Scales, Icon Sets: These options in the "Conditional Formatting" menu allow you
to visually represent the values in your cells using bars, color gradients, or icons, making it easier to interpret
the data.

New Rule: For more advanced or custom rules, you can select "New Rule" in the "Conditional
Formatting" menu. This allows you to define your own formatting rule using formulas.

Data Validation: Data validation is a feature that helps control what type of data can be entered into a
cell or range. It ensures data accuracy and consistency by restricting the input to specific criteria. Here's how
to use data validation:
Basic Data Validation:
 Select the cells or range where you want to apply data validation.
 Go to the "Data" tab and click on "Data Validation."
 Choose the type of validation rule (e.g., whole number, decimal, list, date, text length) and set the
criteria accordingly.
 Customize the error message and input message if needed.
List Data Validation:
 To create a drop-down list for cell input:
 Select the cells where you want the drop-down list.
 Go to the "Data" tab and click on "Data Validation."
 Choose "List" as the validation criteria and specify the source of the list (either a range of cells or a
comma-separated list).
Custom Data Validation:
For more complex validation rules, you can use a custom formula:
 Select the cells or range.
 Go to the "Data" tab and click on "Data Validation."
 Choose "Custom" and enter the custom formula based on your validation criteria.
CHAPTER 3: SQL DATABASE
3.1 SQL Overview | Relational Database Concepts: SQL (Structured Query Language)
is a domain-specific language used in programming to manage and manipulate relational databases. It
provides a standardized way to interact with databases, enabling users to perform tasks such as querying
data, inserting new records, updating existing records, and deleting records. SQL is crucial for working with
relational database management systems (RDBMS), such as MySQL, PostgreSQL, Microsoft SQL
Server, Oracle, and SQLite.

Relational Database Concepts:


Tables: In a relational database, data is organized into tables. Each table is a collection of related data
entries and consists of rows (records) and columns (fields).

Rows (Records): A row, also known as a record, represents a single entry or data point within a table.
Each row contains data for each column defined in the table.

Columns (Fields): Columns represent attributes or properties of the data stored in a table. Each column
has a specific data type (e.g., integer, text, date) that defines the type of data it can hold.

Primary Key: A primary key is a unique identifier for each row in a table. It ensures that each row can be
uniquely identified and accessed.

Foreign Key: A foreign key is a field in a table that refers to the primary key of another table. It
establishes a relationship between two tables, enabling the creation of links between data.

Indexes: Indexes are data structures that improve the speed of data retrieval by allowing quick access to
specific rows in a table based on the indexed columns.

Relationships: Relationships define the connections and associations between different tables in a
database. Common types of relationships include one-to-one, one-to-many, and many-to-many.

SQL Operations:
CRUD Operations:
 Create (INSERT): Adds new records into a table.
 Read (SELECT): Retrieves data from one or more tables based on specified criteria.
 Update (UPDATE): Modifies existing records in a table.
 Delete (DELETE): Removes records from a table.
Querying Data (SELECT Statement): The SELECT statement is used to query data from a table or
multiple tables. It can retrieve specific columns, apply filters, sort results, and aggregate data using
functions.

Filtering and Sorting: SQL allows you to filter data using the WHERE clause based on specified
conditions. You can also sort the results using the ORDER BY clause.

Joining Tables: SQL enables the combination of data from multiple tables using various types of joins,
such as INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN.
Aggregation Functions: SQL provides aggregation functions like SUM, AVG, COUNT, MAX, and
MIN to calculate summaries and statistics on groups of data.

Subqueries: Subqueries allow you to nest one query within another, enabling complex data retrieval and
analysis.
Data Definition Language (DDL): DDL includes SQL commands like CREATE, ALTER, and DROP
used to define and modify the structure of the database, tables, indexes, etc.

Data Manipulation Language (DML) DML comprises SQL commands like INSERT, UPDATE, and
DELETE used to manage and manipulate data within the tables.

Data Control Language (DCL): DCL includes SQL commands like GRANT and REVOKE used to
control access to data within the database.

Transaction Control Language (TCL): TCL includes SQL commands like COMMIT,
ROLLBACK, and SAVEPOINT used to manage transactions in the database.
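
The CRUD operations and language categories above can be tried end to end with Python's built-in sqlite3 module.
The employees table and its rows in the sketch below are invented purely for demonstration.

# A runnable CRUD sketch on an in-memory SQLite database (invented employees table).
import sqlite3

conn = sqlite3.connect(":memory:")   # temporary database that disappears when closed
cur = conn.cursor()

# DDL - CREATE
cur.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)")

# DML - INSERT (Create)
cur.executemany("INSERT INTO employees (name, dept, salary) VALUES (?, ?, ?)",
                [("Asha", "Sales", 42000), ("Ravi", "Sales", 51000), ("Meena", "HR", 39000)])

# Read - SELECT with WHERE filtering and ORDER BY sorting
cur.execute("SELECT name, salary FROM employees WHERE dept = ? ORDER BY salary DESC", ("Sales",))
print(cur.fetchall())                # [('Ravi', 51000.0), ('Asha', 42000.0)]

# Update and Delete
cur.execute("UPDATE employees SET salary = salary * 1.10 WHERE name = ?", ("Meena",))
cur.execute("DELETE FROM employees WHERE name = ?", ("Asha",))

conn.commit()                        # TCL - make the changes permanent
conn.close()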

3.2 SQL Data Grouping and Summarizing: SQL provides several functions and clauses for
grouping and summarizing data in a database. These features help aggregate data to provide meaningful
insights and analysis. Here are key SQL concepts and functions for data grouping and summarization:

GROUP BY Clause: The GROUP BY clause is used to group rows returned by the SELECT statement
into summary rows based on the values in one or more columns. It's often used with aggregate functions to
calculate summaries within each group.
Syntax:
SELECT column1, aggregate_function(column2)
FROM table
GROUP BY column1;
HAVING Clause: The HAVING clause is used to filter the results of a GROUP BY clause based on a
condition. It's similar to the WHERE clause but works with aggregated data.
Syntax:
SELECT column1, aggregate_function(column2)
FROM table
GROUP BY column1
HAVING condition;
GROUPING SETS:
GROUPING SETS allow you to specify multiple groupings of the result set, providing a way to
aggregate data at different levels in a single query.
GROUP BY GROUPING SETS ((column1), (column2));

ROLLUP: ROLLUP is an extension of the GROUP BY clause that generates subtotals and grand totals
for a result set. It produces a result set with a hierarchy of grouping sets.

CUBE:
CUBE is similar to ROLLUP but generates all possible subtotal combinations, providing a complete
summary of the data.
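
As a concrete illustration of GROUP BY and HAVING, the short sqlite3 sketch below groups an invented orders table
by region and keeps only the groups whose total exceeds a threshold; the table and the numbers are assumptions made
for the example.

# GROUP BY / HAVING demonstration on an invented orders table (SQLite).
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (region TEXT, amount REAL)")
cur.executemany("INSERT INTO orders VALUES (?, ?)",
                [("East", 120), ("East", 300), ("West", 90), ("West", 250), ("North", 40)])

# Total sales per region, keeping only regions whose total exceeds 100
cur.execute("""
    SELECT region, SUM(amount) AS total_sales
    FROM orders
    GROUP BY region
    HAVING SUM(amount) > 100
    ORDER BY total_sales DESC
""")
print(cur.fetchall())   # [('East', 420.0), ('West', 340.0)]
conn.close()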

SQL Clause, SQL Functions: SQL (Structured Query Language) is a powerful domain-specific
language used for managing, querying, and manipulating data in relational database management systems
(RDBMS). It consists of various clauses and functions that enable users to interact with databases
effectively. Here's an overview of key SQL clauses and functions:
3.3 SQL Clauses:
SELECT Clause: The SELECT clause is used to retrieve data from one or more database tables.
Syntax: SELECT column1, column2 FROM table_name;

FROM Clause: The FROM clause specifies the tables from which to retrieve the data.
Syntax: SELECT column1 FROM table_name;

WHERE Clause: The WHERE clause is used to filter records based on a specified condition.
Syntax: SELECT column1 FROM table_name WHERE condition;

GROUP BY Clause: The GROUP BY clause is used to group rows into summary rows based on the
values of one or more columns.
Syntax: SELECT column1, aggregate_function(column2) FROM table GROUP BY column1;

HAVING Clause: The HAVING clause is used in combination with the GROUP BY clause to filter
grouped records.
Syntax: SELECT column1, aggregate_function(column2) FROM table GROUP BY column1 HAVING
condition;

ORDER BY Clause: The ORDER BY clause is used to sort the result set in ascending or descending
order based on specified columns.
Syntax: SELECT column1 FROM table ORDER BY column1 ASC|DESC;

LIMIT Clause: The LIMIT clause is used to limit the number of rows in the result set.
Syntax: SELECT column1 FROM table LIMIT number_of_rows;

SQL Functions:
Aggregate Functions: Perform calculations across multiple rows and return a single result:
 SUM(): Calculates the sum of values.
 AVG(): Calculates the average of values.
 COUNT(): Counts the number of rows or non-null values.
 MAX(): Returns the maximum value.
 MIN(): Returns the minimum value.

String Functions: Manipulate and analyze text data:


 CONCAT(): Concatenates two or more strings.
 SUBSTRING(): Extracts a substring.
 UPPER(): Converts a string to uppercase.
 LOWER(): Converts a string to lowercase.
 LENGTH(): Returns the length of a string.

Date and Time Functions: Perform operations on date and time data types:
 NOW(): Returns the current date and time.
 DATE(): Extracts the date from a date-time value.
 DATEDIFF(): Calculates the difference between two dates.
 DATE_ADD(): Adds a specified time interval to a date.
Mathematical Functions: Perform mathematical calculations:
 ROUND(): Rounds a numeric value to a specified number of decimal places.
 ABS(): Returns the absolute value of a numeric value.
 SQRT(): Calculates the square root of a number.

Logical Functions: Evaluate logical expressions:


 IF(): Returns one value if a condition is true and another if false.
 CASE: A conditional expression used to perform if/then logic within a query.

3.4 SQL - Correlated and Uncorrelated Queries - Subqueries


In SQL, subqueries, whether correlated or uncorrelated, provide a way to nest one query inside another. This
allows for more complex and dynamic SQL queries. Let's explore the concepts of correlated and
uncorrelated subqueries:

Uncorrelated Subqueries: An uncorrelated subquery is a subquery that can be executed independently


of the outer/main query. It does not depend on the outer query for its values. Uncorrelated subqueries are
executed only once and provide a result set that is used by the main query.

Correlated Subqueries: A correlated subquery is a subquery that is executed once for each row
processed by the outer/main query. The subquery references columns from the outer query, and its results are
dependent on the current row being processed by the main query.

Usage and Considerations: Uncorrelated subqueries are generally faster and more efficient than
correlated subqueries. Correlated subqueries are necessary when you need to perform operations based on
each row of the outer query. Use uncorrelated subqueries when the subquery can be executed independently
for all rows.

Subqueries in Various Clauses:


Subqueries in WHERE clause: SELECT * FROM table1 WHERE column1 = (SELECT column2 FROM
table2 WHERE condition);
Subqueries in FROM clause (Derived Tables): SELECT t1.column1 FROM (SELECT * FROM table1
WHERE condition) AS t1;

Subqueries in SELECT clause:


SELECT (SELECT MAX(column1) FROM table1) AS max_value;

Example Correlated Subquery:


Suppose we want to find employees in a table "Employees" whose salary is greater than the average salary
of their department.

SELECT emp_name, salary


FROM Employees AS e1
WHERE salary > (SELECT AVG(salary) FROM Employees AS e2 WHERE e1.department =
e2.department);
In this example, the subquery (SELECT AVG(salary) FROM Employees AS e2 WHERE e1.department =
e2.department) is correlated because it references the department from the outer query (e1).
CHAPTER 4: PYTHON PROGRAMMING

4.1 Collection Objects - 1 (Conditional Statements, Arrays, Strings)


Let's explore collection objects in Python, specifically focusing on conditional statements, arrays (lists), and strings:
Conditional Statements:
if statement: The if statement allows you to execute a block of code only if a specified condition is true.
if condition:
    # code to execute if the condition is true

if-else statement:
The if-else statement allows you to execute one block of code if a condition is true and another block if the
condition is false.
if condition:
    # code to execute if the condition is true
else:
    # code to execute if the condition is false

if-elif-else statement:
The if-elif-else statement allows you to check multiple conditions and execute different blocks of code based
on which condition is true.
if condition1:
    # code to execute if condition1 is true
elif condition2:
    # code to execute if condition2 is true
else:
    # code to execute if all conditions are false

nested if statement:
You can have an if statement inside another if, elif, or else block. This is called nesting.
if condition1:
    if condition2:
        # code to execute if both condition1 and condition2 are true
    else:
        # code to execute if only condition1 is true
else:
    # code to execute if condition1 is false
Strings:
Declaration and Initialization:
my_string = "Hello, World!"

Accessing Elements:
print(my_string[0]) # Access the first character

Slicing and Substring:


substring = my_string[7:12] # Retrieves characters from index 7 to 11

String Methods:
print(my_string.upper()) # Convert the string to uppercase
print(my_string.lower()) # Convert the string to lowercase
print(my_string.split(',')) # Split the string based on a delimiter (comma in this case)
These are some common uses and operations related to conditional statements, arrays (lists), and strings in
Python. These collection objects play a significant role in organizing and manipulating data within Python
programs.

4.2 Collection Objects - 2 (List, Tuple, Dictionary)


Let's delve deeper into collection objects in Python, focusing on lists, tuples, and dictionaries:

Lists:
Declaration and Initialization:
my_list = [1, 2, 3, 'hello', True]

Accessing Elements:
print(my_list[0]) # Access the first element

Modifying Elements:
my_list[1] = 10 # Modify the second element

List Methods:
my_list.append(6) # Append an element to the end
my_list.extend([7, 8]) # Extend the list with elements from another list
my_list.pop(2) # Remove and return an element at the specified index

Tuples:
Declaration and Initialization:
my_tuple = (1, 2, 3, 'hello', True)

Accessing Elements:
print(my_tuple[0]) # Access the first element

Immutable Nature:
Tuples are immutable; elements cannot be modified once defined.
# This will cause an error
my_tuple[1] = 10

Dictionaries:
Declaration and Initialization:
my_dict = {'name': 'Alice', 'age': 30, 'city': 'New York'}

Accessing Elements:
print(my_dict['name']) # Access the value associated with the key 'name'

Modifying Elements:
my_dict['age'] = 31 # Modify the value associated with the key 'age'

Dictionary Methods:
print(my_dict.keys()) # Get all keys
print(my_dict.values()) # Get all values
print(my_dict.items()) # Get all key-value pairs
4.3 OOPS CONCEPTS IN PYTHON:
Object-Oriented Programming (OOP) is a programming paradigm that revolves around the concept of
"objects," which can encapsulate data (attributes) and behavior (methods). In OOP, software is structured in
a way that models real-world entities by creating classes that represent these entities. Here are key OOP
concepts in Python with brief definitions:

1. Class: A class is a blueprint or a template that defines the structure and behavior of objects. It serves as a
blueprint for creating instances (objects) that have specific attributes and methods.

2. Object: An object is a unique instance of a class, representing a specific entity or concept. Objects
encapsulate data (attributes) and behavior (methods) related to that entity.

3. Attributes: Attributes (or properties) are data associated with a class or object. These represent the
characteristics or features of the object.

4. Methods: Methods are functions defined within a class that represent the behavior or actions that an
object can perform. They operate on the object's attributes and provide a way to interact with the object.

5. Constructor (__init__ method): The __init__ method is a special method in a class that is called
when an object is instantiated. It is used to initialize object attributes with initial values.

6. Inheritance: Inheritance is a mechanism in which a new class (subclass) inherits attributes and
methods from an existing class (superclass). It promotes code reuse and extensibility.

7. Encapsulation: Encapsulation is the bundling of data (attributes) and methods (functions) that operate
on the data within a class. It helps in controlling access to the data and protecting it from unauthorized
access.

8. Polymorphism: Polymorphism allows objects to be treated as instances of their parent class, enabling
them to share a common interface. It promotes flexibility in code design and usage.

9. Abstraction: Abstraction involves hiding the complex implementation details of a class and exposing
only the essential features. It focuses on what an object does rather than how it achieves its functionality.

These OOP concepts provide a structured approach to software development, making code more modular,
maintainable, and scalable. Python is an object-oriented programming language that fully supports these
concepts, allowing developers to create robust and flexible applications.
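
A compact sketch that ties several of these concepts together is given below. The Employee and Manager classes are
invented for illustration and are not part of the internship project code.

# Illustrative sketch of class, constructor, encapsulation, inheritance, and polymorphism.
class Employee:
    def __init__(self, name, salary):       # constructor: initializes the object's attributes
        self.name = name
        self._salary = salary                # leading underscore marks it as internal (encapsulation)

    def describe(self):                      # method: behavior attached to the object
        return f"{self.name} earns {self._salary}"


class Manager(Employee):                     # inheritance: Manager reuses Employee's attributes and methods
    def __init__(self, name, salary, team_size):
        super().__init__(name, salary)
        self.team_size = team_size

    def describe(self):                      # polymorphism: overrides the parent's behavior
        return f"{self.name} manages {self.team_size} people"


for person in [Employee("Asha", 40000), Manager("Ravi", 60000, 5)]:
    print(person.describe())                 # the same call produces class-specific behavior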

4.4 PYTHON STRINGS: In Python, a string is a sequence of characters, represented using single (' '),
double (" "), or triple (''' ''' or """ """) quotes. Strings are immutable, meaning their content cannot be
modified after creation. Here are the key aspects and operations related to strings in Python:

Creating a String:
Using Single Quotes:
my_string = 'Hello, World!'
Using Double Quotes:
my_string = "Hello, World!"
Using Triple Quotes (for multiline strings):
my_string = '''This is a multiline string.'''
Accessing Characters in a String:
You can access individual characters in a string using indexing. Indexing starts at 0 for the first character.
print(my_string[0]) # Access the first character

Slicing and Substring:

Slicing allows you to extract a substring from a string by specifying a start and end index.
substring = my_string[7:12] # Retrieves characters from index 7 to 11
print(substring)

String Concatenation:
You can concatenate (combine) strings using the + operator.
str1 = "Hello"
str2 = "World"
concatenated_string = str1 + " " + str2
print(concatenated_string)
# Outputs: Hello World

String Length:

You can find the length of a string using the len() function.
length = len(my_string)
print(length)
# Outputs: 13 (including spaces and comma)

String Methods:
Python provides numerous built-in string methods to manipulate and modify strings. Here are some
commonly used methods:
upper(): Converts the string to uppercase.
lower(): Converts the string to lowercase.
strip(): Removes leading and trailing whitespace.
replace(old, new): Replaces occurrences of the old substring with the new substring.
split(separator): Splits the string into a list of substrings based on the specified separator.
my_string = " Hello, World! "
print(my_string.strip()) # Outputs: "Hello, World!"
my_string = "Hello, World!"
print(my_string.replace("Hello", "Hi")) # Outputs: "Hi, World!"
my_string = "apple,banana,cherry"
print(my_string.split(",")) # Outputs: ['apple', 'banana', 'cherry']
String Formatting:
String formatting allows you to insert values into a string.
Using f-Strings (Formatted String Literals):
name = "Alice"
age = 30
formatted_string = f"My name is {name} and I am {age} years old."
print(formatted_string)

Using the format() Method:


formatted_string = "My name is {} and I am {} years old.".format(name, age)
print(formatted_string)
CHAPTER 5: NUMPY

5.1 NumPy Array:


NumPy (Numerical Python) is a popular open-source library in Python that provides support for
multidimensional arrays and matrices, along with a collection of mathematical functions to operate on these
data structures. The fundamental data structure in NumPy is the ndarray (n-dimensional array).

1. Creating NumPy Arrays: NumPy arrays can be created using various methods such as:
From a Python List:
import numpy as np
my_list = [1, 2, 3]
my_array = np.array(my_list)

Using numpy.array():

import numpy as np
my_array = np.array([1, 2, 3])

Using numpy.arange() (similar to Python's range()):


import numpy as np
my_array = np.arange(0, 10, 2)

Using numpy.zeros():

import numpy as np
zeros_array = np.zeros(5) # Creates an array of 5 zeros

Using numpy.ones():

import numpy as np
ones_array = np.ones(3) # Creates an array of 3 ones

Using numpy.random:

import numpy as np
random_array = np.random.random(5) # Creates an array of 5 random numbers between 0 and 1

2. Array Attributes:
shape: Returns a tuple representing the dimensions of the array (rows, columns).
dtype: Returns the data type of the elements in the array (e.g., int32, float64).
ndim: Returns the number of dimensions of the array.

3. Array Indexing and Slicing:

NumPy arrays use zero-based indexing. You can access elements, rows, columns, and subsets of an array
using indexing and slicing.

4. Mathematical Operations on Arrays: NumPy provides a wide range of mathematical functions


to perform operations on arrays, such as addition, subtraction, multiplication, division, etc. These operations
are element-wise.
5. Array Reshaping and Flattening: You can reshape arrays using the reshape() method. Flattening
an array means converting a multidimensional array into a 1D array.

6. Array Concatenation and Splitting: NumPy allows concatenation of multiple arrays and splitting
a single array into multiple smaller arrays.

7. Broadcasting: Broadcasting is a powerful feature in NumPy that allows arrays with different shapes to
be combined in arithmetic operations.

8. Universal Functions (ufuncs): Universal functions are functions that operate element-wise on the
array. Examples include sin(), cos(), exp(), log(), etc.

9. Linear Algebra and Matrix Operations: NumPy provides functions to perform various linear
algebra and matrix operations, such as matrix multiplication, determinant, eigenvalues, etc.
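
A few of the features listed above (array attributes, reshaping, concatenation, broadcasting, and universal
functions) are shown together in the short sketch below; the arrays are arbitrary example values.

# Quick illustration of array attributes, reshaping, concatenation, broadcasting, and ufuncs.
import numpy as np

a = np.arange(6)                  # array([0, 1, 2, 3, 4, 5])
print(a.shape, a.dtype, a.ndim)   # (6,) int64 1  (exact dtype can vary by platform)

m = a.reshape(2, 3)               # reshape into 2 rows x 3 columns
flat = m.flatten()                # back to a 1D copy

stacked = np.concatenate([m, m], axis=0)   # concatenation along rows -> shape (4, 3)

# Broadcasting: the 1D row vector is applied to every row of the 2x3 matrix
row = np.array([10, 20, 30])
print(m + row)                    # [[10 21 32]
                                  #  [13 24 35]]

print(np.sqrt(m))                 # universal function applied element-wise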

5.2 Operations on Array

Operations on NumPy arrays encompass a wide range of functionalities for data manipulation, computation,
and analysis. Here's a comprehensive overview of various operations you can perform on NumPy arrays:

1. Mathematical Operations: NumPy allows for element-wise mathematical operations on arrays.

import numpy as np
arr = np.array([1, 2, 3])
arr_add = arr + 5 # Add 5 to each element
arr_sub = arr - 2 # Subtract 2 from each element
arr_mul = arr * 3 # Multiply each element by 3
arr_div = arr / 2 # Divide each element by 2
arr_exp = np.exp(arr) # Compute exponential for each element
# Logarithm (natural logarithm)
arr_log = np.log(arr) # Compute natural logarithm for each element
# Trigonometric functions
arr_sin = np.sin(arr) # Compute sine for each element
arr_cos = np.cos(arr) # Compute cosine for each element

2. Statistical Operations: NumPy provides functions for calculating various statistics from arrays.
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
mean_value = np.mean(arr)
median_value = np.median(arr)
std_dev = np.std(arr)
variance = np.var(arr)
sum_value = np.sum(arr)
product = np.prod(arr)
3. Aggregation Functions:
These functions perform operations on entire arrays or along a particular axis.
import numpy as np
arr = np.array([[1, 2], [3, 4]])
total_sum = np.sum(arr)
column_sum = np.sum(arr, axis=0)
row_sum = np.sum(arr, axis=1)
4. Array Comparison and Boolean Operations:

Performing comparisons and generating boolean arrays.

import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([2, 2, 2])
element_comparison = arr1 == arr2
logical_and = np.logical_and(arr1 > 1, arr2 == 2)
logical_or = np.logical_or(arr1 > 2, arr2 == 2)
logical_not = np.logical_not(arr1 > 1)

5. Reshaping and Flattening:

Changing the shape of an array and flattening it.

import numpy as np
arr = np.array([[1, 2], [3, 4]])
reshaped_arr = arr.reshape(4) # Convert to 1D array
flattened_arr = arr.flatten() # Flatten to 1D array

6. Sorting:

import numpy as np

arr = np.array([3, 1, 2])


sorted_arr = np.sort(arr)
reverse_sorted_arr = np.sort(arr)[::-1]

These are fundamental operations on NumPy arrays. Utilizing these operations allows for efficient and
powerful data manipulation and computation, making NumPy a cornerstone in scientific computing and data
analysis.

5.3 Indexing and Slicing Indexing and slicing in NumPy allow you to access and manipulate specific
elements or ranges of elements in an array. Here's a comprehensive guide to indexing and slicing in NumPy:

1. Indexing: In NumPy, indexing is the process of accessing individual elements in an array.


a. 1D Array: For a 1D array, indexing is similar to a Python list.
import numpy as np
arr = np.array([10, 20, 30, 40, 50])
print(arr[0])   # Output: 10
print(arr[-1])  # Output: 50

b. Multi-dimensional Array:

For a multi-dimensional array, indexing involves specifying indices for each dimension.

import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr[0, 1]) # Output: 2 (row 0, column 1)
2. Slicing: Slicing allows you to extract a portion of an array. The basic syntax is start:stop:step.

a. 1D Array:
import numpy as np
arr = np.array([10, 20, 30, 40, 50])
print(arr[1:4]) # Output: [20 30 40]
print(arr[::2]) # Output: [10 30 50]

b. Multi-dimensional Array: Slicing works similarly for multi-dimensional arrays, specifying slices
for each dimension.

import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(arr[:2, 1:]) # Output: [[2 3], [5 6]]

3. Boolean Indexing: Boolean indexing allows you to filter elements based on a condition.
import numpy as np
arr = np.array([10, 20, 30, 40, 50])
# Boolean condition
condition = arr > 30
# Applying the condition
filtered_arr = arr[condition]
print(filtered_arr) # Output: [40 50]

4. Integer Array Indexing: You can use integer arrays to extract specific elements.
import numpy as np
arr = np.array([10, 20, 30, 40, 50])
# Integer array for indexing
indices = np.array([1, 3])
# Accessing elements using the integer array
selected_elements = arr[indices]
print(selected_elements) # Output: [20 40]

5. Slicing with Assignment:


You can use slicing to modify elements in an array.
import numpy as np
arr = np.array([10, 20, 30, 40, 50])
arr[1:4] = 0
print(arr) # Output: [10 0 0 0 50]
CHAPTER 6: TABLEAU

6.1 Tableau: Tableau is a powerful and popular data visualization tool that allows you to create interactive
and shareable dashboards and reports. Here are some basic concepts and steps to get started with Tableau:

Installing and Setting Up Tableau: Installation: Download and install Tableau Desktop from the Tableau
website. You can choose a trial version or a licensed version based on your needs.

Connecting to Data:

Data Sources: Tableau can connect to various data sources like Excel, CSV, databases (SQL, MySQL, etc.),
cloud-based sources, and more.

Connecting to Data: Open Tableau and click on "Connect to Data." Choose the appropriate data source, and
follow the prompts to connect.

Data Preparation and Cleaning: Tableau allows for basic data preparation, cleaning, and transformation
within the tool itself. You can rename fields, create calculated fields, pivot data, etc.

Creating Visualizations: Dimensions and Measures: Tableau distinguishes between dimensions (categorical
data) and measures (quantitative data).

Creating a Visualization: Drag and drop dimensions and measures into the Rows and Columns shelves to
create a visualization.

Common Visualizations: Tableau offers various visualization options like bar charts, line charts, scatter
plots, maps, and more.

Building Dashboards: Dashboard Workspace: Combine multiple visualizations into a single dashboard.

Formatting: Customize the appearance of your dashboard by formatting colors, fonts, and layout.

Interactivity:
Filtering: Allow users to interact with your data by adding filters.
Actions: Create interactive actions between different sheets or dashboards.

Saving and Sharing:


Saving: Save your Tableau workbook (.twb) to keep your work.
Publishing: Publish your workbook to Tableau Server or Tableau Public for sharing.

Tableau Server and Tableau Public:


Tableau Server: It allows sharing and managing Tableau workbooks within an organization.
Tableau Public: A free service for sharing Tableau visualizations publicly.

Learning Resources:
Tableau Help and Documentation: The Tableau official website provides extensive documentation and
tutorials.
Online Courses: Many online platforms offer Tableau courses to help you master the tool.
Community Forums: Participate in Tableau community forums to get help and learn from others.
Advanced Features:
Parameters and Calculations: Use parameters and calculated fields for more complex analyses.
Advanced Visualizations: Explore advanced visualizations like dual-axis charts, trend lines, and forecasting.
Mapping: Utilize Tableau's mapping capabilities for geographic data visualization.
Tableau is a versatile tool with a lot of capabilities. Starting with the basics and gradually exploring its
features and functionalities will help you create powerful and insightful data visualizations.

6.2 Meta Data

In Tableau, metadata refers to the information about the data itself. It includes details about the structure,
properties, and characteristics of the data you're working with. Understanding metadata is crucial for
effective data analysis and visualization. Here's how metadata is handled in Tableau:

Viewing Metadata for Data Sources: When you connect to a data source in Tableau, you can view metadata
related to that data source. This includes information about tables, columns, data types, and other properties.

Data Pane: The Data pane in Tableau displays the metadata of the connected data source, including
dimensions (categorical data), measures (quantitative data), and other relevant information.

Field Metadata: Within Tableau, you can access metadata for each field (column) in your data source.

Field Properties: Right-click on a field in the Data pane and select "Describe" to view field properties. This
provides details like the data type, minimum and maximum values, and more.

Custom Field Names and Aliases: You can customize field names and aliases to make them more descriptive
and meaningful. This doesn't change the actual data but provides a clear representation.

Data Types and Roles: Tableau assigns data types and roles to each field based on the initial analysis of the
data source. However, you can manually override these assignments based on your understanding of the
data.

Data Type: You can change the data type assigned to a field. For example, you can change a numerical field
to a date field if required.

Role: Assign roles like dimension (discrete data) or measure (continuous data) to fields.

Calculated Fields and Metadata: When creating calculated fields in Tableau, metadata plays a role in
defining the properties of the new field.

Field Properties in Calculations: When creating a calculated field, Tableau allows you to specify the field's
properties, including data type and aggregation.

Metadata Grid: The Metadata Grid in Tableau allows you to view and modify field properties and aliases for
a data source.

Metadata Impact on Visualizations: Understanding the metadata is critical in building effective


visualizations. The type of data (dimension or measure) and its properties influence how Tableau visualizes
the data.

Chart Suggestions: Tableau offers chart suggestions based on the metadata. For example, it might suggest a
bar chart for a categorical variable and a line chart for a time series.
6.3 Types of Tableau Chart:

Tableau offers a wide range of charts and visualization options to help users represent their data in a
meaningful and insightful way. Here are some common types of Tableau charts:

Bar Chart: Bar charts represent data using rectangular bars, with the length of each bar proportional to the
value it represents. Bar charts are effective for comparing discrete categories.

Line Chart: Line charts display data points connected by lines, useful for showing trends or changes over a
continuous range, such as time.

Area Chart: Area charts are similar to line charts, but the area under the line is filled, making it useful for
comparing proportions over time.

Scatter Plot: Scatter plots represent individual data points with dots on a graph, making them useful for
showing relationships or correlations between two numerical variables.

Histogram: Histograms provide a visual representation of the distribution of a dataset, showing the
frequency of values within specific ranges (bins).

Pie Chart: Pie charts display data as a circular graph divided into slices, where each slice represents a
proportion of the whole.

Heat Map: Heat maps use color to represent data values in a matrix, making it easier to identify patterns and
variations.

Tree Map: Tree maps represent hierarchical data in a nested, rectangular layout. The size of each rectangle is
proportional to the data it represents.

Bubble Chart: Bubble charts display data points using bubbles, where the size of the bubble represents a
third numerical variable.

Gantt Chart: Gantt charts visualize project timelines, showing the start and end times of various tasks or
activities.

Box Plot (Box and Whisker Plot): Box plots display the distribution of data based on quartiles, helping to
identify outliers and distribution patterns.

Bullet Graph: Bullet graphs are used to display performance data, comparing a primary measure to a target
measure and additional measures.

Waterfall Chart: Waterfall charts show how an initial value is increased or decreased by a series of
intermediate values, often used for financial data analysis.

Packed Bubble Chart: Packed bubble charts are similar to bubble charts but with bubbles packed tightly to
visualize hierarchical data.

Dual-Axis Chart: Dual-axis charts combine two different chart types in a single chart, allowing for better
comparison of data.

Radar Chart: Radar charts display data in a circular pattern, useful for comparing multiple quantitative
variables.
Map Chart: Tableau offers different map charts, including symbol maps, filled maps, and heat maps, to
visualize data geographically.

6.4 Visual Analytics

Visual analytics in Tableau involves using Tableau's powerful features and tools to visually explore, analyse
and gain insights from data. It allows users to create interactive and insightful visualizations that help in
understanding complex data patterns, trends, and relationships. Here are the key aspects of visual analytics
in Tableau:

Drag-and-Drop Interface: Tableau offers an intuitive drag-and-drop interface, allowing users to easily
connect to data sources and drag dimensions and measures onto the canvas to create visualizations.

Quick Visualization Creation: Users can quickly create various types of visualizations like bar charts, line
charts, pie charts, scatter plots, and more by simply dragging and dropping data fields onto the canvas.

Interactive Dashboards: Users can create interactive dashboards by combining multiple visualizations onto a
single canvas. Interactivity allows users to filter and highlight specific data points dynamically.

Filters and Highlighting: Tableau provides options to filter data based on dimensions or measures, enabling
users to focus on specific subsets of data for analysis. Users can also highlight data points or groups.

Parameters and Calculated Fields: Tableau allows users to create parameters and calculated fields to perform
complex calculations and customize visualizations dynamically.

Data Blending and Joining: Users can blend or join data from different sources, allowing for a unified view
of disparate datasets and facilitating comprehensive analysis.

Annotations and Annotations Pane: Users can add annotations to visualizations to provide additional context
or explanations. The Annotations pane allows for easy management and customization of annotations.

Tableau Story Points: Tableau Story Points enable users to create a sequence of visualizations that tell a story
or present a narrative, providing a guided analytical experience.

Mapping and Geospatial Analysis: Tableau allows users to plot geographical data on maps, enabling
geospatial analysis and insights.

Integration with Advanced Analytics: Tableau integrates with advanced analytics platforms and tools,
allowing users to incorporate predictive analytics, machine learning models, and statistical analysis into their
visualizations.

Publishing and Sharing: Users can publish their visualizations and dashboards to Tableau Server or Tableau
Online, making them accessible to others for viewing and interaction.

Data Alerts and Subscriptions: Users can set up data alerts to receive notifications when specific conditions
in the data are met. Subscriptions allow scheduled delivery of dashboards via email.

Real-Time Data Analysis: Tableau supports real-time data analysis, allowing users to visualize and analyze
streaming data for timely decision-making.
CHAPTER 7: POWER BI

7.1 POWER BI

Power BI is a popular business intelligence tool developed by Microsoft that allows users to visualize and
share insights from their data. Here are the basic concepts and features of Power BI:

Power BI Desktop: Power BI Desktop is a free application that you install on your local machine. It's used to
create reports and visualizations from various data sources.

Data Sources and Connectors: Power BI can connect to a wide array of data sources, including databases
(SQL Server, MySQL, Oracle), Excel files, SharePoint lists, Salesforce, Google Analytics, and more. These
connections are facilitated through connectors.

Data Transformation and Modelling: Power BI Desktop allows you to clean, transform, and model your data
using Power Query and Power Pivot. You can shape your data, create relationships between tables, and
define measures.

Data Visualization: Power BI offers a wide range of visualization options such as bar charts, line charts, pie
charts, maps, tables, matrices, and custom visuals. Users can drag and drop data fields onto the canvas to
create interactive visualizations.

Reports and Pages: Reports in Power BI are collections of visuals that are displayed together on a page. You
can have multiple pages within a report to organize your visuals.

Dashboards: Dashboards in Power BI are a collection of visuals from a single report or multiple reports.
They provide a consolidated view of important metrics and KPIs.

Power Query (Get & Transform Data): Power Query is a powerful data connection and transformation tool
in Power BI. It allows you to shape and clean your data before loading it into Power BI.

Power Pivot (Data Modelling): Power Pivot is an in-memory data modeling engine. It enables users to
model large sets of data, create relationships, and define calculated columns and measures.

DAX (Data Analysis Expressions): DAX is a formula language used in Power BI to create calculated
columns and measures. It's similar to Excel functions but tailored for Power BI's tabular modelling.

Row-Level Security (RLS): RLS allows you to restrict access to rows of data based on the viewer's role or
identity, ensuring data security and privacy.

Q&A (Natural Language Processing): Power BI has a Q&A feature that allows users to ask questions about
their data using natural language and receive visualizations as answers.

Power BI Service: The Power BI service (PowerBI.com) is a cloud-based platform where you can publish,
share, and access Power BI reports and dashboards. It allows collaboration and real-time updates.

Gateway: Power BI Gateway allows for a secure connection between Power BI services and on-premises
data sources, enabling data refreshes and real-time dashboards.

Power BI Mobile: Power BI Mobile enables users to view and interact with their Power BI content on
mobile devices, making it accessible anytime, anywhere.
Power BI's user-friendly interface and powerful features make it a popular choice for data analysts, business
analysts, and decision-makers to derive valuable insights from their data and drive informed business
decisions.

7.2 Interface, Data Connection

Power BI Interface: Power BI provides an intuitive and user-friendly interface designed to streamline the
process of creating, analyzing, and visualizing data. Here are key components:

Ribbon: Similar to Microsoft Office applications, Power BI has a ribbon at the top providing access to
various tools and features.

Canvas: This is the central area where you create visualizations by dragging fields from the data pane.

Visualizations Pane: On the right side, you have the visualizations pane, where you can select and configure
the type of visualization you want to create.

Fields Pane: Also on the right, the fields pane displays the fields available from your data source. You can
drag and drop these fields to create visualizations.

Pages Tab: You can have multiple pages within a report, allowing you to organize your visuals effectively.

Filters Pane: Allows you to add filters to your report to interactively slice and dice your data.

Visualization Tools: Various visualization tools are available to enhance and customize your visuals, such as
formatting options, analytics, and more.

Modelling Tools: These tools enable data modelling operations such as creating relationships, defining
measures, and managing data categories.

Data Connection in Power BI:

Data Sources: Power BI can connect to a wide range of data sources including databases (SQL Server,
MySQL, Oracle), files (Excel, CSV), cloud-based sources (Azure SQL Database, Google Analytics), and
more.

Power Query (Get Data): Power Query is a tool used to connect, transform, and clean data from various
sources. It helps prepare the data for analysis.

Data Load: Once the data is transformed, you load it into Power BI for modeling and visualization. Power BI
Desktop keeps a model of the data in memory.

Data Modelling (Power Pivot): Power BI allows you to create relationships between tables, define
hierarchies, create calculated columns, and write DAX expressions to enhance the data model.

Data Refresh: After loading the data, you can configure refresh settings to keep your data up-to-date by
scheduling regular refreshes. This is crucial for live or frequently updated data.

DirectQuery: Power BI supports DirectQuery mode where it queries the underlying data source in real-time
instead of importing data. This is useful for large datasets.
Power BI Gateway: The Power BI Gateway allows for secure data refreshes for on-premises data sources
and live connections to data models in the Power BI service.

7.3 Data Transformation: Data transformation in Power BI involves cleaning, shaping, and organizing
data to make it suitable for analysis and visualization. Power BI provides a powerful tool called Power
Query to perform these transformation tasks.

Connecting to Data:

 Open Power BI Desktop.
 Click on the "Home" tab.
 Click "Get Data" to connect to a data source.

Using Power Query for Data Transformation:

After connecting to a data source, a Power Query Editor window will open.

a. Data Source Settings: Review and modify data source settings like server details, authentication, and
database selection.

b. Navigator: In the Navigator window, choose the specific data tables or views you want to work with, then click "Transform Data" to open them in the Power Query Editor (or "Load" to load them directly into the model).

Data Cleaning and Transformation Steps: In the Power Query Editor, you'll find various options for data
transformation and cleaning:

Removing Columns or Rows: Right-click on a column or row header and choose to remove.

Changing Data Types: Select a column, right-click, and choose "Change Type" to change data types.

Filtering Rows: Use filter options to remove unwanted rows based on conditions.

Adding or Removing Columns: Add custom columns using formulas, or remove unnecessary columns.

Splitting and Merging Columns: Split a column into multiple columns based on a delimiter, or merge multiple columns into one.

Grouping and Aggregating Data: Group rows to perform aggregations (sum, average, etc.) on grouped data.

Pivoting and Unpivoting Data: Change the structure of the data by pivoting columns or unpivoting rows.

Duplicating or Reference Data: Create a duplicate of a query or create a reference to the same data source.

Handling Null or Blank Values: Replace null or blank values with appropriate data.

Advanced Transformations with M Code (Power Query Language): For complex transformations, you can use the M language directly.

Applying Changes and Loading Data:

Once the necessary transformations are applied, click "Close & Apply" to load the transformed data into
Power BI.
Modifying Applied Steps:

You can review and modify the applied steps in the "Applied Steps" window.
Any changes here will be reflected in the loaded data.

Refreshing Data:

After loading the data into Power BI, you can refresh it to reflect any changes in the source data.

Data Transformation with DAX (Data Analysis Expressions): In the data model, you can further transform
data using DAX formulas, creating calculated columns and measures.

What is Power BI?

Power BI is a business analytics service provided by Microsoft that lets you visualize your data and share
insights. It converts data from different sources to build interactive dashboards and Business Intelligence
reports.

Why Power BI?

Power BI can access vast volumes of data from multiple sources. It allows you to view, analyze, and visualize vast quantities of data that cannot be opened in Excel. Some of the important data sources available for Power BI are Excel, CSV, XML, JSON, PDF, etc. Power BI uses powerful compression algorithms to import and cache the data within the .PBIX file.

Interactive UI/UX Features

Power BI makes things visually appealing. It has easy drag-and-drop functionality, with features that allow you to copy formatting across similar visualizations.

Exceptional Excel Integration

Power BI helps to gather, analyze, publish, and share Excel business data. Anyone familiar with Office 365
can easily connect Excel queries, data models, and reports to Power BI Dashboards.

Accelerate Big Data Preparation with Azure

Using Power BI with Azure allows you to analyze and share massive volumes of data. An Azure data lake can reduce the time it takes to get insights and increase collaboration between business analysts, data engineers, and data scientists.

Turn Insights into Action

Power BI allows you to gain insights from data and turn those insights into actions to make data-driven
business decisions.

Real-time Stream Analytics

Power BI enables you to perform real-time stream analytics. It helps you fetch data from multiple sensors and social media sources to get access to real-time analytics, so you are always ready to make business decisions.

Components of Power BI

Power Query

Power Query is the data transformation and mash-up engine. It enables you to discover, connect, combine, and refine data sources to meet your analysis needs. It can be downloaded as an add-in for Excel or used as part of Power BI Desktop.

Power Pivot

Power Pivot is a data modeling technique that lets you create data models, establish relationships, and create
calculations. It uses Data Analysis Expression (DAX) language to model simple and complex data.

Power View

Power View is a technology that is available in Excel, SharePoint, SQL Server, and Power BI. It lets you
create interactive charts, graphs, maps, and other visuals that bring your data to life. It can connect to data
sources and filter data for each data visualization element or the entire report.

Power Map

Microsoft's Power Map for Excel and Power BI is a 3-D data visualization tool that lets you plot more than a million rows of data visually on Bing Maps in 3-D format from an Excel table or Data Model in Excel. Power Map works with Bing Maps to get the best visualization based on latitude, longitude, or country, state, city, and street address information.

Power BI Desktop

Power BI Desktop is a development tool for Power Query, Power Pivot, and Power View. With Power BI
Desktop, you have everything under the same solution, and it is easier to develop BI and data analysis
experience.

Power Q&A

The Q&A feature in Power BI lets you explore your data in your own words. It is the fastest way to get an answer from your data using natural language. An example could be: "What were the total sales last year?" Once you've built your data model and deployed it to the Power BI service, you can ask questions and get answers quickly.

Sample Power BI reports (screenshots omitted)

CHAPTER 8: FINAL PROJECT

The COVID-19 pandemic, also known as the coronavirus pandemic, is an ongoing global pandemic of coronavirus disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). It was first identified in December 2019 in Wuhan, China.

The World Health Organization declared the outbreak a Public Health Emergency of International Concern
on 30 January 2020, and later a pandemic on 11 March 2020. As of 8 April 2021, more than 133 million
cases have been confirmed, with more than 2.89 million deaths attributed to COVID-19, making it one of the
deadliest pandemics in history.

Symptoms of COVID-19 are highly variable, ranging from none to life-threatening illness. The virus spreads quickly among people, and more continues to be discovered over time about how it spreads. The virus can cause a range of symptoms, from mild illness to pneumonia. Signs of the disease are fever, cough, sore throat, and headaches. In severe cases, difficulty in breathing and death can occur.

The COVID-19 virus spreads primarily through droplets of saliva or discharge from the nose when an
infected person coughs or sneezes, so it is essential that you also practice respiratory etiquette. The virus
spreads mainly through the air when people are near each other. It leaves an infected person as they breathe,
cough, sneeze, or speak and enters another person via their mouth, nose, or eyes.

It may also spread via contaminated surfaces. People remain contagious for up to two weeks and can spread
the virus even if they are asymptomatic.

Recommended preventive measures include social distancing, wearing face masks in public, ventilation and
air-filtering, hand washing, covering one's mouth when sneezing or coughing, disinfecting surfaces, and
monitoring and self-isolation for people exposed or symptomatic. Several vaccines have been developed and
widely distributed since December 2020. Current treatments focus on addressing symptoms, but work is
underway to develop therapeutic drugs that inhibit the virus.

The pandemic has resulted in significant global social and economic disruption, including the largest global
recession since the Great Depression. It has led to widespread supply shortages exacerbated by panic buying,
agricultural disruption and food shortages, and decreased emissions of pollutants and greenhouse gases. Numerous educational institutions and public areas have been partially or fully closed, and many
events have been cancelled or postponed. Misinformation has circulated through social media and mass
media. The pandemic has raised issues of racial and geographic discrimination, health equity, and the
balance between public health imperatives and individual rights. This pandemic is the defining global health
crisis of our time and the most significant challenge we have faced since World War Two.

Source of the Data: An electronic health record (EHR) is the systematized collection of patient and
population electronically stored health information in a digital format. These records can be shared across
different health care settings. Records are shared through network-connected, enterprise-wide information
systems or other information networks and exchanges.

EHRs may include a range of data, including demographics, medical history, medication and allergies,
immunization status, laboratory test results, radiology images, vital signs, personal statistics like age and
weight, and billing information.
Electronic Medical Record (EMR) software was specifically created to fully accommodate all aspects of clinical workflow, including storage, retrieval, and modification of digital patient records, plus prescription writing, clinical annotation, ordering laboratory and imaging tests, and viewing test results.

Electronic health record software aids interoperability for patient record sharing between physicians, hospitals, and pharmacies, and offers a mature EMR solution. An EMR helps with continuity of care by connecting all members of the care team throughout the healthcare cycle, which improves care quality.

When all members of a patient's care team (from primary care doctor to specialist and beyond) can communicate about the patient's health, hospital readmissions are reduced, leading to better value.

Utilizing a certified, interoperable electronic medical record system enables continuity of care, which provides practices with a means to thrive within a value-based care model and to receive reimbursement.

Data Cleaning and Methodology: Data cleaning is a critical step before loading data into any
decision support system or GIS for spatial analysis. In this project, we received the data from 3 different
Electronic-Medical-Record systems. We standardized the master data in our data file before loading the file
into Microsoft Power BI software for analysis.

For example, one system identifies the data as Medi-Cal while in another system, it is defined as Medicaid.
Similarly, discharge disposition, patient type, patient financial class, point of origin and all other data set
attributes are made consistent and streamlined across all the systems for spatial analysis.
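
As a rough illustration of this kind of standardization, the pandas sketch below maps inconsistent payer labels to a single master value before export. The file names, column names, and mapping values here are assumptions shown only to convey the idea, not the project's actual field names.

import pandas as pd

# Hypothetical extracts from the three EMR systems (file names are placeholders)
frames = [pd.read_csv(f) for f in ["emr_system_a.csv", "emr_system_b.csv", "emr_system_c.csv"]]
records = pd.concat(frames, ignore_index=True)

# Map inconsistent labels to one master value, e.g. "Medi-Cal" vs "Medicaid"
payer_map = {"Medi-Cal": "Medicaid", "MEDICAID": "Medicaid"}
records["financial_class"] = records["financial_class"].replace(payer_map)

# Trim stray whitespace and unify case for other attributes before loading into Power BI
for col in ["discharge_disposition", "patient_type", "point_of_origin"]:
    records[col] = records[col].str.strip().str.title()

records.to_csv("standardized_master.csv", index=False)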

COVID-19 Dashboard: We analyzed the above-mentioned data using ESRI ArcGIS, ESRI Insights, Business Analyst, Tableau, and Microsoft Power BI. Finally, we built the analysis and dashboard in Microsoft Power BI because Power BI can be integrated with ArcGIS for custom data visualization and spatial analysis.

The COVID-19 Dashboard provides a comprehensive overview of the key metrics of the global pandemic, their current developments, and detailed analyses at the Zip Code and City levels. Users can quickly get an overview, check the most essential Key Performance Indicators (KPIs), and filter by Year, Month, Patient Type, Point of Origin, and Primary Diagnosis. The dashboard provides valuable insights on cases (positive, recovered, and deaths) and in-depth mortality analysis by Zip Code, City, and selected timeframe. Generally, both cumulative numbers and new cases are provided.

The dashboard design provides users with an effective management summary of the most relevant KPIs as
well as detailed analyses on separate report pages. Using the COVID dashboard example, the dashboard
shows the strengths of spatial analysis with Power BI: an intuitive visualization which makes even complex
data sets more comprehensible.

Data Source for Power BI: Power BI supports importing or connecting to workbooks created in Excel 2007 and later. Our data are stored in Excel. Moreover, the tool can also connect to any RDBMS table, and that connection can be refreshed automatically.

Data Model: With the modeling feature, we can build custom calculations on the existing tables and these
columns can be directly presented into Power BI visualizations. This allows us to define new metrics and to
perform custom calculations for those metrics.
Covid-19 Impacts Analysis using Python

The outbreak of Covid-19 resulted in widespread restrictions, which in turn had significant impacts on the global economy. Almost all countries were impacted negatively by the rise in Covid-19 cases. This chapter takes you through the task of Covid-19 Impacts Analysis using Python.

Covid-19 Impacts Analysis (Case Study)

The first wave of Covid-19 impacted the global economy, as the world was never ready for the pandemic. It resulted in a rise in cases, deaths, unemployment, and poverty, leading to an economic slowdown. Here, you are required to analyze the spread of Covid-19 cases and all the impacts of Covid-19 on the economy. The dataset used for this task contains the following attributes:

 the country codes


 name of all the countries
 date of the record
 Human development index of all the countries
 Daily covid-19 cases
 Daily deaths due to covid-19
 stringency index of the countries
 the population of the countries
 GDP per capita of the countries

Covid-19 Impacts Analysis using Python

Let's start the task of Covid-19 impacts analysis by importing the necessary Python libraries and the dataset. The data we are using contains data on Covid-19 cases and their impact on GDP from December 31, 2019, to October 10, 2020.
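
The code for this step appeared as screenshots in the original report; a minimal sketch of the imports and data loading is shown below, assuming the two data files are named transformed_data.csv and raw_data.csv (the file names are assumptions).

import pandas as pd
import plotly.express as px

# Load the two data files used in this task (file names are assumed)
data = pd.read_csv("transformed_data.csv")
data2 = pd.read_csv("raw_data.csv")

print(data.head())
print(data2.head())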

Data Preparation

The dataset that we are using here contains two data files. One file contains raw data, and the other contains a transformed version. We have to use both datasets for this task, as each of them contains equally important information in different columns. So let's have a look at both datasets one by one.

After having initial impressions of both datasets, I found that we have to combine them by creating a new dataset. But before we create a new dataset, let's have a look at how many samples of each country are present in the data:
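
A minimal sketch of this check, continuing from the loading code above and assuming the transformed file has a COUNTRY column:

# Count how many rows (daily records) exist for each country
print(data["COUNTRY"].value_counts())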

So we don't have an equal number of samples of each country in the dataset. Let's have a look at the mode, median, mean, minimum, maximum, and count of these per-country sample counts:
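
Continuing from the sketch above, these summary values of the per-country sample counts can be computed as follows (under the same column-name assumption):

samples_per_country = data["COUNTRY"].value_counts()

print("Mode:  ", samples_per_country.mode()[0])
print("Median:", samples_per_country.median())
print("Mean:  ", samples_per_country.mean())
print("Min:   ", samples_per_country.min())
print("Max:   ", samples_per_country.max())
print("Count: ", samples_per_country.count())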

So 294 is the mode value. We will need to use it for dividing the sum of all the samples related to the human
development index, GDP per capita, and the population. Now let’s create a new dataset by combining the
necessary columns from both the datasets:
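
One possible way to build that combined dataset, continuing from the earlier sketches; column names such as CODE, HDI, STI, POP, Total Cases, and Total Deaths are assumptions about the two files, and per-country averages are taken by dividing sums by the mode of samples per country.

# Aggregate per-country values from both files
mode_samples = samples_per_country.mode()[0]

aggregated_data = pd.DataFrame({
    "Country Code": data.groupby("COUNTRY")["CODE"].first(),
    "HDI": data.groupby("COUNTRY")["HDI"].sum() / mode_samples,
    "Total Cases": data2.groupby("COUNTRY")["Total Cases"].sum(),
    "Total Deaths": data2.groupby("COUNTRY")["Total Deaths"].sum(),
    "Stringency Index": data.groupby("COUNTRY")["STI"].sum() / mode_samples,
    "Population": data2.groupby("COUNTRY")["Population"].sum() / mode_samples,
}).reset_index().rename(columns={"COUNTRY": "Country"})

print(aggregated_data.head())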

I have not included the GDP per capita column yet. I didn’t find the correct figures for GDP per capita in the
dataset. So it will be better to manually collect the data about the GDP per capita of the countries.

As we have so many countries in this data, it will not be easy to manually collect the data about the GDP per
capita of all the countries. So let’s select a subsample from this dataset. To create a subsample from this
dataset, I will be selecting the top 10 countries with the highest number of covid-19 cases. It will be a perfect
sample to study the economic impacts of covid-19. So let’s sort the data according to the total cases of
Covid-19:
Now here’s how we can select the top 10 countries with the highest number of cases:
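
A sketch of both steps, continuing from the aggregated_data frame built above:

# Sort the combined data by total Covid-19 cases (descending)
sorted_data = aggregated_data.sort_values(by="Total Cases", ascending=False)

# Keep only the top 10 countries with the highest number of cases
top10 = sorted_data.head(10).reset_index(drop=True)
print(top10)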

Now I will add two more columns (GDP per capita before Covid-19, GDP per capita during Covid-19) to
this dataset:
Note: The data about the GDP per capita is collected manually.
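
The GDP figures themselves were collected by hand; the sketch below only shows the mechanics of attaching the two new columns, with obviously fake placeholder values that must be replaced by the manually collected figures.

# Placeholder values only -- replace with the manually collected GDP per capita figures,
# listed in the same order as the countries in top10
gdp_before_covid = [0.0] * len(top10)
gdp_during_covid = [0.0] * len(top10)

top10["GDP Before Covid"] = gdp_before_covid
top10["GDP During Covid"] = gdp_during_covid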

Analyzing the Spread of Covid-19: Now let's start by analyzing the spread of Covid-19 in the countries with the highest number of Covid-19 cases. I will first have a look at the total cases in each of these countries:
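
A plotting sketch for this view, using plotly (imported in the loading code above) on the top10 frame:

# Bar chart of total Covid-19 cases for the top 10 countries
figure = px.bar(top10, x="Country", y="Total Cases",
                title="Countries with Highest Covid-19 Cases")
figure.show()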

Countries with Highest Covid Cases: We can see that the USA has a very high number of Covid-19 cases compared to Brazil and India in the second and third positions. Now let's have a look at the total number of deaths among the countries with the highest number of Covid-19 cases:
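
Again as a sketch on the top10 frame:

# Bar chart of total deaths for the same countries
figure = px.bar(top10, x="Country", y="Total Deaths",
                title="Countries with Highest Covid-19 Deaths")
figure.show()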

Countries with Highest Deaths: Just like the total number of Covid-19 cases, the USA leads in deaths, with Brazil and India in the second and third positions. One thing to notice here is that the death rate in India, Russia, and South Africa is comparatively low relative to the total number of cases. Now let's compare the total number of cases and total deaths in all these countries:
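
One way to sketch this comparison is a grouped bar chart over the top10 frame:

# Compare total cases and total deaths side by side
figure = px.bar(top10, x="Country", y=["Total Cases", "Total Deaths"],
                barmode="group", title="Total Cases vs Total Deaths")
figure.show()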

Now let's have a look at the percentage of total deaths and total cases among all the countries with the highest number of Covid-19 cases:
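
A sketch of this percentage view, again on the top10 frame:

# Share of total cases contributed by each country
cases_fig = px.pie(top10, values="Total Cases", names="Country",
                   title="Percentage of Total Cases")
cases_fig.show()

# Share of total deaths contributed by each country
deaths_fig = px.pie(top10, values="Total Deaths", names="Country",
                    title="Percentage of Total Deaths")
deaths_fig.show()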

Below is how you can calculate the death rate of Covid-19 cases:
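
As a sketch, the overall death rate across these countries is simply total deaths divided by total cases, expressed as a percentage:

death_rate = (top10["Total Deaths"].sum() / top10["Total Cases"].sum()) * 100
print("Death Rate =", death_rate)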

Another important column in this dataset is the stringency index. It is a composite measure of response
indicators, including school closures, workplace closures, and travel bans. It shows how strictly countries are
following these measures to control the spread of covid-19:
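
A sketch of this view, using the aggregated Stringency Index column built earlier:

# Stringency index of the top 10 countries
figure = px.bar(top10, x="Country", y="Stringency Index",
                title="Stringency Index during Covid-19")
figure.show()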

Here we can see that India is performing well in the stringency index during the outbreak of covid-19.

Analyzing Covid-19 Impacts on the Economy: Now let's move on to analyzing the impacts of Covid-19 on the economy. Here, GDP per capita is the primary factor for analyzing the economic slowdown caused by the outbreak of Covid-19. Let's have a look at the GDP per capita before the outbreak of Covid-19 among the countries with the highest number of Covid-19 cases:
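
A sketch of this view, using the placeholder GDP columns added earlier (replace the placeholders with the real, manually collected figures):

# GDP per capita before Covid-19 for the top 10 countries
figure = px.bar(top10, x="Country", y="GDP Before Covid",
                title="GDP Per Capita Before Covid-19")
figure.show()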

Now let's have a look at the GDP per capita during the rise in the cases of Covid-19, and compare the GDP per capita before and during Covid-19 to see the impact of Covid-19 on GDP per capita:
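
Continuing the sketch, the "during" view and the before/during comparison:

# GDP per capita during Covid-19
figure = px.bar(top10, x="Country", y="GDP During Covid",
                title="GDP Per Capita During Covid-19")
figure.show()

# Grouped comparison of GDP per capita before and during Covid-19
figure = px.bar(top10, x="Country", y=["GDP Before Covid", "GDP During Covid"],
                barmode="group", title="GDP Per Capita Before vs During Covid-19")
figure.show()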

You can see a drop in GDP per capita in all the countries with the highest number of Covid-19 cases. Another important economic factor is the Human Development Index, a composite statistical index of life expectancy, education, and per capita income indicators. Let's have a look at how these countries were spending on human development:
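
A final plotting sketch, using the aggregated HDI column from earlier:

# Human Development Index of the top 10 countries
figure = px.bar(top10, x="Country", y="HDI",
                title="Human Development Index during Covid-19")
figure.show()
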
CHAPTER 9: CONCLUSION

Drawing a conclusion about the impacts of COVID-19 involves considering various aspects, including public health, the global economy, societal behavior, and more. The situation is dynamic, and new information continues to emerge. Here is a general conclusion based on the information available at the time of this report.

The COVID-19 pandemic has left an indelible mark on the world, triggering widespread and multifaceted
impacts. From a public health perspective, the virus has caused significant morbidity and mortality,
prompting unprecedented global efforts to develop and distribute vaccines. Economically, the pandemic led
to disruptions across industries, with businesses facing closures, supply chain challenges, and economic
downturns. The shift towards remote work and digitalization accelerated, transforming the way people work
and communicate.

Societal impacts include changes in behavior, such as increased awareness of hygiene practices and the
importance of public health measures. The pandemic also highlighted existing social inequalities and
disparities, with vulnerable populations disproportionately affected.

Governments around the world responded with various measures, including lockdowns, travel restrictions,
and economic stimulus packages. International collaboration became crucial for addressing the global nature
of the crisis.

Looking forward, the ongoing vaccination efforts offer hope for a gradual return to normalcy, but challenges
such as vaccine distribution, emerging variants, and long-term economic recovery persist. The pandemic
underscored the need for robust public health systems, global cooperation, and preparedness for future health
crises.
