Next Pathway Hack Backpackers Problem Statement
Problem statement 1 (LINEAGE PROBLEM STATEMENT)
Introduction
The data industry is moving fast, and so is Next Pathway. Next Pathway needs experts like you to find
solutions to the data problems we are bringing to you today.
Acceptance criteria
Write a script in Python or SQL that accepts any SQL statement, traces its complete lineage, and
returns the originating table for each output column of the input query.
For example:
sql_script = """
SELECT
    a.uid, b.uname
FROM
"""
Sample expected output:
function get_originating_tables(sql_script):
    // parse the script, trace the lineage, and return the originating table name for each
    // output column in a dictionary or any other data structure
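The interface above can be sketched in Python. This is a minimal, regex-based illustration that assumes a single flat SELECT ... FROM ... [JOIN ...] statement; it does not handle subqueries, CTEs, UNIONs, or functions in the select list, all of which a complete lineage tracer would need (a real SQL parser such as sqlglot or sqlparse would be a sturdier base). The table names in the usage example and the "UNKNOWN"/"AMBIGUOUS" markers are hypothetical.

```python
import re

# Words that may follow a table name but are never an alias.
KEYWORDS = {"join", "inner", "left", "right", "full", "outer", "cross", "on",
            "using", "where", "group", "order", "having", "limit", "as"}


def get_originating_tables(sql_script: str) -> dict:
    """Map each output column of a flat SELECT to the table it comes from."""
    sql = " ".join(sql_script.split())  # collapse newlines and indentation
    match = re.search(r"\bselect\s+(.*?)\s+from\s+(.*)$", sql, re.IGNORECASE)
    if not match:
        return {}
    select_list, from_clause = match.groups()

    # Walk the FROM clause: a table name follows FROM, JOIN or a comma,
    # optionally followed by an alias (with or without AS).
    tokens = re.findall(r"[\w.]+|,", from_clause)
    alias_to_table = {}
    expect_table = True
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token == "," or token.lower() == "join":
            expect_table = True
        elif expect_table and token.lower() not in KEYWORDS:
            table = token
            j = i + 1
            if j < len(tokens) and tokens[j].lower() == "as":
                j += 1
            if j < len(tokens) and tokens[j] != "," and tokens[j].lower() not in KEYWORDS:
                alias_to_table[tokens[j]] = table  # alias -> real table
                i = j
            alias_to_table[table] = table  # the table name also qualifies columns
            expect_table = False
        i += 1

    tables = sorted(set(alias_to_table.values()))
    lineage = {}
    for item in select_list.split(","):
        expr = item.strip()
        # Output name is the explicit "AS name", else the bare column name.
        as_match = re.match(r"(.*?)\s+as\s+(\w+)$", expr, re.IGNORECASE)
        source, out_name = as_match.groups() if as_match else (expr, expr.split(".")[-1])
        if "." in source:
            qualifier = source.rsplit(".", 1)[0]
            lineage[out_name] = alias_to_table.get(qualifier, "UNKNOWN")
        else:
            # Unqualified column: only resolvable when exactly one table is in scope.
            lineage[out_name] = tables[0] if len(tables) == 1 else "AMBIGUOUS"
    return lineage


if __name__ == "__main__":
    query = """
    SELECT a.uid, b.uname AS user_name
    FROM users a
    JOIN user_names AS b ON a.uid = b.uid
    """
    print(get_originating_tables(query))
    # {'uid': 'users', 'user_name': 'user_names'}
```

Returning a plain dictionary keeps the output easy to compare against an evaluator's expected result; tracing through nested queries would mean applying the same resolution recursively, inner query first.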
Evaluation criteria
1. Participants need to push the code into a folder named “Lineage Problem Statement”.
2. Participants need to include instructions for running the program in the README.md file.
3. The evaluator will
a. Fetch the code to their local machine
b. Read the instructions to run it
c. Pass any SQL script to your program
d. Check the expected output
e. Evaluate the program and the approach taken
Judging criteria
1. Participants whose programs produce correct output will be called for the interview.
2. In some cases, participants whose approach is right but whose program does not run due to
technical issues may also be called for the interview. This is purely at the evaluator's discretion.
Problem statement 2 (FUNCTION CONVERSION PROBLEM STATEMENT)
Note: You do not necessarily need prior knowledge of ADF or DataStage. You need to understand what
each function does and write your own logic to convert it. Links to the functions, their syntax, and
explanations are provided for reference.
Problem:
Convert DataStage expressions and functions to ADF expressions and dynamic content.
Link for DataStage functions and expressions: DataStage functions and expressions
Using the DataStage link above, understand the functions you need to work on (listed below) and
create equivalent Python or JavaScript code that uses equivalent ADF functions and expressions
(see the example below).
Below is a sample of equivalent code for the DataStage function CurrentDate, which returns the
current date.
Scenario
ADF does not have a direct function that returns only the date value. Hence, in the code, utcNow()
is the ADF function that returns the datetime in the format 2018-04-15T13:00:00.0000000Z, and we
take the substring up to T to return the date.
There may be other, better ways to do this, which you can use in your code.
Explanation:
utcNow() is the ADF function that returns the current datetime, for example 2022-08-06T11:00:00.
utcNow('dd') returns the day part. For 2022-08-06T11:00:00, it returns 06.
utcNow('MM') returns the month part. For 2022-08-06T11:00:00, it returns 08.
utcNow('yyyy') returns the year part. For 2022-08-06T11:00:00, it returns 2022.
All these functions, with their syntax and explanations, are described in the ADF link provided.
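A minimal Python sketch of this idea, in which Python only returns the ADF dynamic-content string and the timestamp-to-date logic stays inside the ADF expression. The function names `current_date` and `current_date_via_format` are hypothetical; `utcNow(format)`, `substring(text, startIndex, length)`, and `indexOf(text, searchText)` are documented ADF functions, but treat the expressions as untested until they are checked in an ADF pipeline.

```python
def current_date() -> str:
    """Return an ADF dynamic-content expression equivalent to DataStage CurrentDate.

    The conversion happens inside the ADF expression; Python only returns it.
    """
    # utcNow() yields e.g. 2018-04-15T13:00:00.0000000Z; keep everything before 'T'.
    return "@substring(utcNow(), 0, indexOf(utcNow(), 'T'))"


def current_date_via_format() -> str:
    """Alternative: ask utcNow() for a date-only custom format directly."""
    return "@utcNow('yyyy-MM-dd')"


if __name__ == "__main__":
    print(current_date())
    print(current_date_via_format())
```

The second variant is one of the "other, better ways": it avoids string slicing by letting utcNow() format the value itself.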
Make sure the implementation uses ADF expressions only; Python/JavaScript is just a wrapper around
them. In other words, in the example above, the conversion from timestamp to date must be done with
an ADF expression, not with Python or JavaScript.
Link for “Expressions and functions in Azure Data Factory and Azure Synapse Analytics”:
https://docs.microsoft.com/en-us/azure/data-factory/control-flow-expression-language-functions
** To test the functions and avoid syntax errors, you can set up ADF locally. It is fine if you do
not have this setup; it is not a bottleneck for solving the problem, so keep writing the functions as
per your understanding.
If you have difficulty with the setup, please contact us on the support channel on Discord.
DataStage functions to work upon
Logical functions
BitAnd
BitCompress
Mathematical functions
Abs
Floor
Random
Number functions
AsDouble
MantissaFromDecimal
Raw functions
RawLength
String functions
AlNum
Change
Compare
Field
Str
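As an illustration of the wrapper pattern for three of the string functions above, the sketch below builds ADF expression strings in Python. It assumes only the default behaviour of each DataStage function (Change replacing every occurrence, Field selecting a single occurrence, Compare with left justification); all helper names are hypothetical, and out-of-range Field occurrences would raise an error in ADF rather than return an empty string as DataStage does. The use of `greater()` on strings follows the ADF expression documentation; verify it before relying on it. In a submission, each function would go in its own file.

```python
def adf_literal(text: str) -> str:
    """Quote a Python string as an ADF string literal (single quotes doubled)."""
    return "'" + text.replace("'", "''") + "'"


def change(expr: str, substring: str, replacement: str) -> str:
    """DataStage Change() default case: replace every occurrence of substring."""
    return f"@replace({expr}, {substring}, {replacement})"


def field(expr: str, delimiter: str, occurrence: int) -> str:
    """DataStage Field(): occurrence is 1-based, ADF array indexes are 0-based."""
    return f"@split({expr}, {delimiter})[{occurrence - 1}]"


def compare(left: str, right: str) -> str:
    """DataStage Compare() (left-justified): -1, 0 or 1."""
    return f"@if(equals({left}, {right}), 0, if(greater({left}, {right}), 1, -1))"


if __name__ == "__main__":
    # Arguments are ADF expression fragments: literals or e.g. pipeline parameters.
    print(field(adf_literal("a,b,c"), adf_literal(","), 2))
    print(change("pipeline().parameters.s", adf_literal("x"), adf_literal("y")))
    print(compare(adf_literal("a"), adf_literal("b")))
```

Passing ADF fragments rather than Python values keeps all evaluation inside ADF, which is what the acceptance criteria require.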
Acceptance criteria
Write equivalent Python or JavaScript code for each of the functions above in separate code files,
and push the code to your GitHub repository.
Make sure the implementation uses ADF expressions only; Python/JavaScript is just a wrapper around
them. For example, in the CurrentDate example, the conversion from timestamp to date must be done
with an ADF expression, not with Python or JavaScript.
Evaluation criteria
1. Participants need to push the code into a folder named “Function Conversion Problem
Statement”.
2. Participants need to create one folder for each type of function, for example
“Date_time_functions”, and push each function inside it as a separate file, for example
“currenttimestamp.py” or “currenttimestamp.js”.
3. Participants need to include instructions for running the program in the README.md file.
4. The evaluator will
a. Fetch the code to their local machine
b. Read the instructions to run it
c. Check the expected output
d. Evaluate the program and the approach taken
Judging criteria
1. Participants should keep pushing their functions as they complete them. We do not expect all
the functions to be completed.
2. Participants whose pushed scripts produce correct output will be called for the interview.
3. In some cases, participants whose approach is right but whose program does not run due to
technical issues or syntax errors in the ADF functions may also be called for the interview.
This is purely at the evaluator's discretion.
Problem statement 3 (XML PROBLEM STATEMENT)
Input
Wf_src_idw_cntry_multi_def_cd.xml
Download the XML file and understand the transformations applied to the data.
Acceptance criteria
Understand the source columns, the transformations applied to those columns, and the targets they
are loaded into.
Once you understand them, your task is to
1. Write a Python program that parses the XML and fetches the lineage of all downstream
transformations applied to a column, from source to target.
2. Start from the sources to fetch the columns, then trace through the XML document to get the
transformation logic applied to each column.
3. Provide the complete transformation lineage for these columns, in the format below.
Expected output:
source(tableA.column1) -> filter(col1) -> join(colA) -> target(tableB.column2)
source(tableB.column2) -> join(colAB) -> target(tableC.column3)
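A minimal Python sketch of steps 1 to 3, under a loud assumption: the brief does not name the tool that produced the XML, and this code assumes an Informatica PowerCenter-style export in which `CONNECTOR` elements link ports through `FROMINSTANCE`, `FROMFIELD`, `FROMINSTANCETYPE`, `TOINSTANCE`, `TOFIELD`, and `TOINSTANCETYPE` attributes. It only follows port-to-port connections, so derived ports whose logic lives in transformation expressions are not traced, and hop labels are the lowercased instance type rather than names like "join". The `SAMPLE` document is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumed instance types marking the two ends of a lineage path.
SOURCE_TYPE = "Source Definition"
TARGET_TYPE = "Target Definition"


def _label(kind: str, instance: str, field: str) -> str:
    """Render one hop, e.g. source(tableA.column1) or filter(column1)."""
    if kind == SOURCE_TYPE:
        return f"source({instance}.{field})"
    if kind == TARGET_TYPE:
        return f"target({instance}.{field})"
    return f"{kind.lower().replace(' ', '_')}({field})"


def column_lineage(xml_text: str) -> list:
    root = ET.fromstring(xml_text)
    edges = defaultdict(list)  # (instance, field) -> downstream (instance, field) ports
    kinds = {}                 # (instance, field) -> instance type
    for conn in root.iter("CONNECTOR"):
        src = (conn.get("FROMINSTANCE"), conn.get("FROMFIELD"))
        dst = (conn.get("TOINSTANCE"), conn.get("TOFIELD"))
        kinds[src] = conn.get("FROMINSTANCETYPE", "")
        kinds[dst] = conn.get("TOINSTANCETYPE", "")
        edges[src].append(dst)

    paths = []

    def walk(path):
        # Skip ports already on the path to avoid looping on cycles.
        downstream = [n for n in edges.get(path[-1], []) if n not in path]
        if not downstream:  # reached a target (or a dead end)
            paths.append(" -> ".join(_label(kinds[n], *n) for n in path))
        for node in downstream:
            walk(path + [node])

    for node, kind in kinds.items():
        if kind == SOURCE_TYPE:
            walk([node])
    return paths


SAMPLE = """<MAPPING NAME="m_demo">
  <CONNECTOR FROMINSTANCE="tableA" FROMFIELD="column1" FROMINSTANCETYPE="Source Definition"
             TOINSTANCE="FIL_demo" TOFIELD="column1" TOINSTANCETYPE="Filter"/>
  <CONNECTOR FROMINSTANCE="FIL_demo" FROMFIELD="column1" FROMINSTANCETYPE="Filter"
             TOINSTANCE="tableB" TOFIELD="column2" TOINSTANCETYPE="Target Definition"/>
</MAPPING>"""

if __name__ == "__main__":
    for line in column_lineage(SAMPLE):
        print(line)
    # source(tableA.column1) -> filter(column1) -> target(tableB.column2)
```

Treating each (instance, field) port as a graph node turns the lineage question into a depth-first walk from every source port to every reachable target port.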
Evaluation criteria
1. Participants need to push the code into a folder named “XML Problem Statement”.
2. Participants need to include instructions for running the program in the README.md file.
3. When run on the XML, the program should output the column lineage in the format shown
above.
Judging criteria
1. Participants whose programs produce correct output will be called for the interview.
2. In some cases, participants whose approach is right but whose program does not run due to
technical issues may also be called for the interview. This is purely at the evaluator's discretion.
GitHub Guidelines
Create Repository
1. Create your repository named “<Hackathon_name>_<Your_name>”.
2. Keep your repository public.
3. Add a README file to your repository
View the video on how to create a repo, make it public, and add a README.md file.
Provide us with the repository URL at the registration desk (the registration channel on Discord),
which opens at 8:30 AM.
Instructions for writing README.md
Follow the format below in your README.md file.
README.md content
# HackBackpackers_SunilKumar
Problem Statements
# Description
# Prerequisite
# How to run
Steps
1.
2.
# Any other points to mention
# Description
# Prerequisite
# How to run
Steps
1.
2.
# Description
# Prerequisite
# How to run
Steps
1.
2.
We expect you to submit your work on GitHub whether or not any solution is complete. No
evaluation is done at this checkpoint.