Building a Data Lake and Business Intelligence Solution for Higher Education
Prerequisites

Bring your Own Account
In order to actually do something with AWS services, you need an AWS account. Furthermore, you need an IAM user for this account that you can use to log into the AWS Management Console, so that you can provision and configure your resources.

AWS Identity and Access Management (IAM) enables you to manage access to AWS services and resources securely. Using IAM, you can create and manage AWS users and groups, and use permissions to allow and deny their access to AWS resources.

The regular case is that you have to bring your own AWS account, configured with an IAM user with administrator privileges. If possible, this should be a vanilla account. If you bring your own account, either personal or a company account, make sure you understand the implications and policies of provisioning resources into this account.

If you don't have an AWS account yet, and you want to repeat this workshop for yourself later, you can create a free account here or ask your company's Cloud Center of Excellence to create one for you.
1. Once you have an AWS account, create a new IAM user for this workshop with administrator access to the AWS account.
2. Enter the user details and select Access Type as Programmatic access and AWS Management Console access.
3. Attach the AdministratorAccess IAM policy. You can choose administrator access or fine-grained access depending on the AWS services used for this lab; for this lab, choose Administrator access → click Next → skip Tags → Next: Review.
4. Click Create user to create the new user.
When launching the workshop CloudFormation stack:

3. Choose false as the value for the parameter DMSCWRoleCreated. Check the box "I acknowledge that ...", then click Create stack to create the stack.
4. In case you aren't able to launch the quick-create stack, you can download the template file and then follow the steps to create the stack manually.
Workshop Steps

The AWS Data Lake and Visualization Workshop lab tasks cover the following exercises:

Ingest Data into Amazon S3 using AWS Lambda
Setup your Data Lake using AWS Lake Formation
Query your data using Amazon Athena
Visualize your data using Amazon QuickSight

The high-level architecture looks something like the following (architecture diagram).
1. Ingest Data into Amazon S3 using AWS Lambda

In this lab, you will use AWS Lambda to execute a pre-configured function that will copy the contents of a remote S3 bucket into your own S3 bucket. This simulates the action of batch-ingesting data into your S3 data lake.
1. In the AWS console, navigate to the Lambda console by clicking Services on the top-left of your screen, then clicking Compute and then Lambda.
2. In the AWS Lambda console, click Functions on the left panel. Search for SisDemoDataHandler and then click the
function shown in the results.
4. Wait for the message Execution result: succeeded in a green box indicating that objects have been successfully
copied into your S3 bucket.
To verify this, you can navigate to your S3 bucket to view the newly ingested files.
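The SisDemoDataHandler function is pre-built for you by the workshop stack. Purely as an illustration of what a batch-ingest copy function can look like, here is a minimal, hypothetical Lambda handler that copies every object from a source bucket into a destination bucket; the environment variable names and default bucket names are assumptions, not the workshop's actual configuration.

import os
import boto3

s3 = boto3.client("s3")

# Hypothetical environment variables; the workshop's pre-built function
# resolves its buckets differently.
SOURCE_BUCKET = os.environ.get("SOURCE_BUCKET", "example-remote-dataset")
DEST_BUCKET = os.environ.get("DEST_BUCKET", "example-data-lake-bucket")

def lambda_handler(event, context):
    """Copy every object from the source bucket into the destination bucket."""
    copied = 0
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=SOURCE_BUCKET):
        for obj in page.get("Contents", []):
            s3.copy_object(
                Bucket=DEST_BUCKET,
                Key=obj["Key"],
                CopySource={"Bucket": SOURCE_BUCKET, "Key": obj["Key"]},
            )
            copied += 1
    return {"status": "succeeded", "objects_copied": copied}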
2. Setup Data Lake

These labs cover the basic functionality of Lake Formation: how different components can be glued together to create a data lake on AWS, how to configure different security policies to provide access, and how to search across catalogs and collaborate.

Follow these steps one by one to complete the labs:
a. Data Lake Administration

The data lake administrator is an IAM user or IAM role that has the ability to grant any principal (including self) any permission on any Data Catalog resource. Designate a data lake administrator as the first user of the Data Catalog.

1. Open the AWS Lake Formation console at https://console.aws.amazon.com/lakeformation .
2. It will prompt you to assign a Lake Formation administrator, and the option Add myself is checked by default. Keep the option as it is and click the Get started button. This gives the TeamRole role administrative control over the data lake (a boto3 sketch of the equivalent API call follows).
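For reference, the same administrator assignment can be made programmatically through the Lake Formation PutDataLakeSettings API. This is a minimal boto3 sketch, assuming the placeholder ARN below is replaced with the TeamRole ARN from your own account; it is not required for the console-based steps above.

import boto3

lf = boto3.client("lakeformation", region_name="us-east-1")

# Placeholder ARN; substitute the TeamRole ARN from your own account.
admin_arn = "arn:aws:iam::123456789012:role/TeamRole"

# Preserve the existing settings and add the administrator.
settings = lf.get_data_lake_settings()["DataLakeSettings"]
settings["DataLakeAdmins"] = [{"DataLakePrincipalIdentifier": admin_arn}]
lf.put_data_lake_settings(DataLakeSettings=settings)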
b. Change Default Catalog Settings

Lake Formation starts with the "Use only IAM access control" settings enabled for compatibility with existing AWS Glue Data Catalog behavior. Follow these steps to disable those settings and enable fine-grained access control with Lake Formation permissions.

1. Open the AWS Lake Formation console at https://console.aws.amazon.com/lakeformation .
2. In the navigation pane, under Data catalog, choose Settings.
3. Clear both check boxes and choose Save.
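Clearing both check boxes is equivalent to removing the default grants to IAM_ALLOWED_PRINCIPALS on new databases and tables. A minimal boto3 sketch of that change, offered only as an optional scripted alternative to the console steps above, might look like this:

import boto3

lf = boto3.client("lakeformation", region_name="us-east-1")

settings = lf.get_data_lake_settings()["DataLakeSettings"]

# Equivalent of clearing both "Use only IAM access control" check boxes:
# stop granting ALL to IAM_ALLOWED_PRINCIPALS on new databases and tables.
settings["CreateDatabaseDefaultPermissions"] = []
settings["CreateTableDefaultPermissions"] = []

lf.put_data_lake_settings(DataLakeSettings=settings)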
c. Databases

1. Now, click on the Databases option on the left and then click on the Create database button.
2. Since we are planning to build the data lake for Student Performance data, name the database SISDemo.
3. In the Location box, select the S3 data lake path which was created through CloudFormation. You can also find the S3 path from the CloudFormation output tab.
4. Uncheck the option Use only IAM access control for new tables in this database.
5. Leave the rest of the options as default and click on the Create database button. (A boto3 sketch of an equivalent database creation follows.)
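If you prefer to script this step, a minimal boto3 sketch of creating an equivalent Glue database is shown below. The S3 location is a placeholder for the data lake path from your CloudFormation outputs, not the exact path the workshop uses.

import boto3

glue = boto3.client("glue", region_name="us-east-1")

glue.create_database(
    DatabaseInput={
        "Name": "sisdemo",
        # Placeholder: use the S3 data lake path from your CloudFormation outputs.
        "LocationUri": "s3://xxx-dmslabs3bucket-xxx/sisdemo/",
        # Equivalent of unchecking "Use only IAM access control for new tables".
        "CreateTableDefaultPermissions": [],
    }
)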
d. Data Lake Location

1. Now register an Amazon S3 bucket as your data lake storage. After the ingestion phase, data will be stored in this location.
2. In the navigation pane, choose Data lake locations, and then choose Register location.
3. Enter the path to the existing S3 bucket (created through CloudFormation) that contains the data you want available in your data lake.
4. For the IAM role, select the AWSServiceRoleForLakeFormationDataAccess role. Click Register location to save it. (A boto3 sketch of the equivalent registration follows.)
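For reference, the same registration can be made through the Lake Formation RegisterResource API. This is a minimal boto3 sketch with a placeholder bucket name, using the service-linked role selected above.

import boto3

lf = boto3.client("lakeformation", region_name="us-east-1")

# Placeholder bucket; use the data lake bucket created by CloudFormation.
lf.register_resource(
    ResourceArn="arn:aws:s3:::xxx-dmslabs3bucket-xxx",
    UseServiceLinkedRole=True,  # uses AWSServiceRoleForLakeFormationDataAccess
)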
e. Data Lake Permissions

One of the key features of AWS Lake Formation is its ability to secure access to data in your data lake. Lake Formation provides its own permissions model that augments the AWS Identity and Access Management (IAM) permissions model.

This exercise covers basic data access permission settings for different personas using Lake Formation. This is a critical step in defining fine-grained access to the Data Catalog tables and columns in your data lake.

To set table-based permissions for your role and for the backend processing role, navigate to the Lake Formation Data lake permissions section.

For the LakeFormationWorkflow role:

3. Under the Policy tags or catalog resources section, select the option Named data catalog resources.
4. Choose sisdemo for the database.
5. For this exercise, we are doing database-based permissions. So, under the Database permissions section, check Super.

For the WSParticipantRole role:

1. Click the Grant button. In the window that pops up, configure the following options:
2. Under the Principals section, select the WSParticipantRole role.
3. Under the Policy tags or catalog resources section, select the option Named data catalog resources.
4. Choose sisdemo for the database and select All tables from the Tables drop-down list.
5. For this exercise, we are doing table-based permissions. So, under the Table and column permissions section, check Super.
6. Click the Grant button. The resulting Data permissions page should now list both grants. (A boto3 sketch of the equivalent grants follows.)
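Purely for reference, the two grants above could also be made through the Lake Formation GrantPermissions API. This is a minimal boto3 sketch with placeholder role ARNs; in the console, Super corresponds to the ALL permission.

import boto3

lf = boto3.client("lakeformation", region_name="us-east-1")

# Placeholder ARNs; substitute the role ARNs from your own account.
workflow_role_arn = "arn:aws:iam::123456789012:role/xxx-LakeFormationWorkflowRole-xxx"
participant_role_arn = "arn:aws:iam::123456789012:role/WSParticipantRole"

# Database-level grant (Super == ALL) for the workflow role.
lf.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": workflow_role_arn},
    Resource={"Database": {"Name": "sisdemo"}},
    Permissions=["ALL"],
)

# Table-level grant on all tables in sisdemo for the participant role.
lf.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": participant_role_arn},
    Resource={"Table": {"DatabaseName": "sisdemo", "TableWildcard": {}}},
    Permissions=["ALL"],
)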
f. Catalog Data Lake

We will use an AWS Glue crawler to crawl the data in the S3 bucket and add the tables to the Glue Data Catalog, which can then be queried using Amazon Athena.

1. Navigate to the AWS Lake Formation console.
2. Select Crawlers under the Register and ingest section in the left navigation bar.
3. A new window will open for the AWS Glue service. Switch to the new console if such an option is given. Click Create crawler.
4. Enter SISDemo-crawler as the crawler name. Optionally, enter a description. Click Next.
5. Choose Not yet for the question Is your data already mapped to Glue tables?. Under Data sources, click Add a data source.
8. On the IAM role page, select the existing IAM role that looks like xxx-LakeFormationWorkflowRole-xxx. Click Next.
9. On the Set output and scheduling page, select sisdemo as the Target database under Output configuration. Under Crawler schedule, select On demand for the frequency. Click Next.
10. Review the summary page, noting the Include path and Database output, and click Create crawler. The crawler is now ready to run.
11. Click the Run crawler button.
12. The crawler will change status from Starting to Stopping; wait until the crawler comes back to the Ready state (the process will take a few minutes). You can see that it has created 12 tables.
13. In the AWS Glue navigation pane, click Databases > Tables. You can also click the SISDemo database to browse the tables.
14. We are all set to query our data using Amazon Athena! (A boto3 sketch of creating and running an equivalent crawler follows.)
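This is a minimal boto3 sketch of defining and starting a comparable crawler, assuming placeholder values for the role ARN and the S3 include path; the workshop itself uses the console steps above.

import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Placeholders: use the workflow role ARN and data lake path from your account.
crawler_role_arn = "arn:aws:iam::123456789012:role/xxx-LakeFormationWorkflowRole-xxx"
include_path = "s3://xxx-dmslabs3bucket-xxx/sisdemo/"

glue.create_crawler(
    Name="SISDemo-crawler",
    Role=crawler_role_arn,
    DatabaseName="sisdemo",
    Targets={"S3Targets": [{"Path": include_path}]},
)

# On-demand run; the status moves through RUNNING and STOPPING back to READY.
glue.start_crawler(Name="SISDemo-crawler")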
3. Query your data using Amazon Athena

We will use Amazon Athena to query the data and create a view.

As an Admin, you have access to all the tables. Let us make sure you can run queries on those tables from the Athena console.

1. From the AWS Management Console, search for the Amazon Athena service.
2. Make sure you are logged in as the Team role.
6. Click on Manage.
7. Provide the Athena query result path, pointing to the configured S3 bucket that looks like s3://xxx-s3bucketworkgroupa-xxx/athena-results/, and save it.
8. In the Query Editor, select SISDemo for the database. As you may have noticed, you can see all 12 tables.
9. Click the table named Student to inspect the fields. Note: the type for student_id is int; first_name and last_name should be string.
10. We will start with the following query, which joins courses to their related schools and departments. Copy the following SQL syntax into the New Query tab and click Run Query.
SELECT
c.course_id,
c.course_name,
c.course_level,
c.department_id,
d.department_code,
d.department_name,
c.school_id,
s.school_name
FROM
sisdemo.course c
JOIN sisdemo.department d
ON c.department_id = d.department_id
JOIN sisdemo.school s
ON c.school_id = s.school_id
LIMIT 10
Next, we will join the student, department, school, ed_level, and semester tables. Copy the following SQL syntax into the New Query tab and click Run Query.
SELECT
st.student_id,
st.first_name,
st.last_name,
st.gender,
st.birth_date,
st.email_address,
st.admitted,
st.enrolled,
st.parent_alum,
st.parent_highest_ed,
phe.ed_level_id,
phe.ed_level_code,
phe.ed_level_desc,
st.first_gen_hed_student,
st.high_school_gpa,
st.was_hs_athlete_ind,
st.home_state_name,
st.admit_type,
st.private_hs_indicator,
st.multiple_majors_indicator,
st.secondary_class_percentile,
st.department_id,
d.department_code,
d.department_name,
d.school_id,
sc.school_name,
st.admit_semester_id,
admitsem.start_date AS first_semester_start_date,
admitsem.end_date AS first_semester_end_date,
admitsem.term_name AS first_semester_term_name,
admitsem.semester_year AS first_semester_year,
admitsem.school_year_name AS first_semester_school_year_name,
st.first_year_gpa,
st.cumulative_gpa,
st.enroll_status,
st.planned_grad_semester_id,
gradsem.start_date AS final_semester_start_date,
gradsem.end_date AS final_semester_end_date,
gradsem.term_name AS final_semester_term_name,
gradsem.semester_year AS final_semester_year,
gradsem.school_year_name AS final_semester_school_year_name
FROM
"sisdemo"."student" st
JOIN "sisdemo"."department" d
ON st.department_id = d.department_id
JOIN "sisdemo"."school" sc
ON d.school_id = sc.school_id
JOIN "sisdemo"."ed_level" phe
ON st.parent_highest_ed = phe.ed_level_id
JOIN "sisdemo"."semester" admitsem
ON st.admit_semester_id = admitsem.semester_id
JOIN "sisdemo"."semester" gradsem
ON st.planned_grad_semester_id = gradsem.semester_id
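The next step refers to using this result as a view in QuickSight. One way to persist a query as an Athena view is to wrap the SELECT in CREATE OR REPLACE VIEW; the view name, the shortened SELECT, and the boto3 call below are illustrative placeholders, not part of the original instructions (in practice you would wrap the full join query above).

import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Hypothetical view name and shortened SELECT; use the full join query above in practice.
select_sql = 'SELECT student_id, first_name, last_name FROM "sisdemo"."student"'
create_view_sql = "CREATE OR REPLACE VIEW sisdemo.student_view AS " + select_sql

athena.start_query_execution(
    QueryString=create_view_sql,
    QueryExecutionContext={"Database": "sisdemo"},
    ResultConfiguration={
        # Placeholder result location; use the query result path configured earlier.
        "OutputLocation": "s3://xxx-s3bucketworkgroupa-xxx/athena-results/"
    },
)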
We are now ready to use this view for visualization in Amazon QuickSight.
4. Visualize Data

We will use Amazon QuickSight to visualize the data.

These labs cover the basic functionality of Amazon QuickSight. Follow these steps to complete this lab:
a. Set up QuickSight

1. In the AWS services console, search for QuickSight. If this is the first time you have used QuickSight, you are prompted to create an account.
2. Click Sign up for QuickSight.
3. For account type, choose the default Standard edition. Please do not choose Enterprise or Enterprise + Q.
4. Click Continue.
5. On the Create your QuickSight account page, choose the appropriate AWS Region based on where you are running this workshop. For the QuickSight account name, enter a unique name (e.g., quicksight-lab-<initials>-<randomstring>) and an email address.
6. On QuickSight access to AWS services, check the boxes to enable auto discovery, Amazon Athena, and Amazon S3. Select the S3 buckets.

Next, grant the QuickSight principal access to the data lake in Lake Formation:

a. Open the Lake Formation console, go to Data lake permissions, and click the Grant button.
d. Under the Policy tags or catalog resources section, select the option Named data catalog resources.
e. Choose sisdemo for the database and select All tables from the Tables drop-down list.
f. For this exercise, we are doing table-based permissions. So, under the Table and column permissions section, check Super (a boto3 sketch of this grant follows the list).
h. We are now ready to build our QuickSight visualization. Return to the QuickSight console.
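As with the earlier grants, this permission can also be made through the GrantPermissions API. The sketch below is illustrative only; the QuickSight user ARN is a hypothetical placeholder (QuickSight principals are granted by their user or group ARN, which you can copy from QuickSight user management).

import boto3

lf = boto3.client("lakeformation", region_name="us-east-1")

# Hypothetical QuickSight user ARN; copy the real one from QuickSight user management.
qs_user_arn = "arn:aws:quicksight:us-east-1:123456789012:user/default/quicksight-lab-user"

lf.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": qs_user_arn},
    Resource={"Table": {"DatabaseName": "sisdemo", "TableWildcard": {}}},
    Permissions=["ALL"],  # "Super" in the console
)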
b. Configure Datasource

1. On the top right corner, click New analysis.
8. To finish dataset creation, choose the option Import to SPICE for quicker analytics and click Edit/Preview data. If your SPICE has 0 bytes available, choose the second option, Directly query your data.
9. You will now be taken to the QuickSight dataset preparation interface, where you can start preparing your dataset. The SPICE dataset will take a few minutes to build, but you can continue to work on the underlying data.
10. Click Add calculated field and enter the following: Name: First_Gen_Status. Enter the following formula in the blank canvas:
c. Build Analysis and Dashboard

2. Click + Add > Add visual to add a visual to the canvas.
4. Select the first visual and choose the Visual type as the KPI icon, Value: student_id (Count), Target value: none, Trend group: none.
5. Select the second visual and choose the Visual type as the KPI icon, Value: student_id (Count), Target value: none, Trend group: gender.
6. Select the third visual and choose the Visual type as the KPI icon, Value: student_id (Count), Target value: none, Trend group: First_Gen_Status.
7. Click + Add > Add visual to add a visual to the canvas. Choose the Visual type as the Vertical bar chart icon, X axis: school_name, Value: none, Group/Color: First_Gen_Status.
13. Dismiss the Share dashboard with popup dialog. The dashboard Student Demographics now appears in the main section of the page.
Wrapping Up

In this workshop, you:

Learned about building secure data lakes with Lake Formation
Learned how to hydrate the data lake
Learned how to work within the data lake with Glue
Learned to query and visualize the data lake with Athena and QuickSight
Cleanup

To avoid unexpected charges to your account, make sure you clean up your account.

Clean up the AWS Glue, Amazon Athena, Amazon QuickSight, and Amazon S3 resources related to the demo SIS database:

1. Delete the QuickSight account.
2. Delete the AWS Glue crawler SISDemo-crawler.
3. Delete the AWS Glue database sisdemo.
4. Delete the xxx-dmslabs3bucket-xxx bucket.
5. Delete the data that Athena queries wrote by deleting the s3://xxx-s3bucketworkgroupa-xxx bucket.
6. To delete the CloudFormation stacks using the AWS Management Console, log in to the AWS Management Console and navigate to Services → Management & Governance → CloudFormation. Delete the stacks from latest to oldest. (A scripted cleanup sketch follows.)
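If you prefer to script part of the cleanup, a minimal boto3 sketch is shown below. The bucket and stack names are placeholders based on the patterns above, so substitute the actual names from your account, and note that the QuickSight account itself still has to be deleted from the QuickSight admin console.

import boto3

glue = boto3.client("glue", region_name="us-east-1")
s3 = boto3.resource("s3", region_name="us-east-1")
cfn = boto3.client("cloudformation", region_name="us-east-1")

# Delete the crawler and the Glue database created during the workshop.
glue.delete_crawler(Name="SISDemo-crawler")
glue.delete_database(Name="sisdemo")

# Empty and delete the workshop buckets (placeholder names; use yours).
for bucket_name in ["xxx-dmslabs3bucket-xxx", "xxx-s3bucketworkgroupa-xxx"]:
    bucket = s3.Bucket(bucket_name)
    bucket.objects.all().delete()
    bucket.delete()

# Delete the workshop CloudFormation stack (placeholder name).
cfn.delete_stack(StackName="sisdemo-workshop-stack")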