Case Study Proposal For Data Loads For Future

data loads

Uploaded by

Parthasarathi M
Copyright
© All Rights Reserved

Conceptual Questions

Create a proposal on how to handle new batches of input data, so that existing transactions created in
previous batches are not overridden. You don’t need to provide an implementation, just write down a
short proposal which can be discussed at the interview.

Introduction:

This is a proposal for handling new batches of input data so that transactions created in previous
batches are not overridden.

Project Scope:

The scope is limited to the data layer. Input data may arrive from upstream teams, and the resulting
transaction data may be consumed by downstream teams, reporting teams, and business users.
Several stakeholders are therefore involved in keeping the data pipelines and reporting layers running
smoothly and delivering value to the business.

Technical Specifications:

Existing transaction data is brought into the Databricks environment from the CONTRACTS and
CLAIMS input data.

New batches of input data can be handled in multiple ways. A few of them are listed below:

• For the very first load, a full load of the data can be performed.
• Subsequent incremental batch loads can be scheduled with a cloud data orchestration service
such as Azure Data Factory or AWS Glue.
• For near-real-time data capture, Azure Event Hubs or a publish-subscribe platform such as
Apache Kafka can be put in place.
• A MERGE (upsert) operation can be used so that historical records that are not updated by the
new batch remain untouched.
• A data lakehouse architecture could be followed to separate hot and cold data.
• Data formats and compression techniques can be evaluated as needed.
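
The full-load, incremental-load, and MERGE points above can be illustrated with a minimal sketch in plain Python. It is not a Databricks implementation: the in-memory dictionary stands in for the target table, and the `Transaction` fields (`transaction_id`, `amount`, `batch_id`) and the `merge_batch` helper are hypothetical names chosen only for illustration. The sketch assumes every transaction has a unique key and that "not overridden" means an existing key is never rewritten by a later batch. On Databricks, the closest counterpart would probably be a Delta Lake `MERGE INTO` statement that has only a `WHEN NOT MATCHED THEN INSERT` clause.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Transaction:
    transaction_id: str  # hypothetical unique key of a transaction
    amount: float
    batch_id: int        # batch that first created the row


def merge_batch(target: Dict[str, Transaction],
                batch: Iterable[Transaction]) -> Tuple[int, int]:
    """Insert-only merge: add rows with new keys, leave existing keys untouched.

    Returns (inserted, skipped) counts for the batch.
    """
    inserted = skipped = 0
    for row in batch:
        if row.transaction_id in target:
            # Matched: keep the historical row exactly as it is.
            skipped += 1
        else:
            # Not matched: the transaction is new, so insert it.
            target[row.transaction_id] = row
            inserted += 1
    return inserted, skipped


# The first batch behaves like a full load because the table is empty.
table: Dict[str, Transaction] = {}
first_batch = [Transaction("T1", 100.0, 1), Transaction("T2", 250.0, 1)]
first_result = merge_batch(table, first_batch)

# The second batch repeats T2 with a different amount and adds T3.
second_batch = [Transaction("T2", 999.0, 2), Transaction("T3", 75.0, 2)]
second_result = merge_batch(table, second_batch)
```

After both calls, `T2` still carries the amount and batch number from the first load, while `T3` has been added. If the business rule were to allow corrections to existing transactions, the matched branch would update the row instead of skipping it; which behaviour is wanted is a point for discussion.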

Evaluation Criteria:

Business users and management depend heavily on data for decision making. As the volume, variety,
and velocity of the data increase and its veracity becomes more critical, the data team carries a
greater responsibility to keep the data that multiple stakeholders rely on up to date and accurate.

To achieve that, data pipelines need to be created and modified according to the business and
technical needs of the project.
