Spring & Batch
October 2017
M. Mohamed
Copyright © 2017 Capgemini. All rights reserved
CSD | October 2017
Plan & Goals
Plan
1. A tour of Spring & Spring-Boot
2. What are a Batch and Spring-Batch?
3. How does it work?
4. Advanced notions and applications
5. A good example
Goals
1. Understanding how Spring-Batch works
2. Detecting use cases
3. Being able to write Batch jobs and go even further with ...
Hello "Spring Framework"…
Spring Framework (Core 1/3)
Open-source Java framework first released in 2003 by Rod Johnson, now maintained by Pivotal Software
A lightweight container (unlike EJB containers)
Mainly based on dependency injection and aspect-oriented programming (AOP)
Easy integration with other frameworks (Hibernate, JSF, Thymeleaf, ...)
Spring Framework (Boot 2/3)
Minimal configuration (auto-configured wherever possible)
Standalone runnable application, even in production
Embedded Tomcat (or Jetty), no WAR deployment needed
No XML files needed (context.xml, web.xml, …): configuration relies on @annotations
Spring Framework (Trend 3/3 @GoogleTrend)
What is a "Batch"?
Batch (What is it? 1/2)
Processing data in lots (batches)
Handles large volumes of data
Several operations are applied to each lot
Triggered automatically or manually
Batch (Example 2/2)
Example: import of daily orders
1. Read a lot of orders from a file
2. Check the orders
3. Save the lot of orders in the storage system
Repeat this cycle for each lot of orders until the last one
Triggered by cron every day at 5 am, before the staff arrive
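The read / check / save cycle can be sketched in plain Java, without any framework. This is only an illustration, not Spring Batch code: the class name, the "id;quantity" line format, the validity rule (a positive quantity) and the in-memory list standing in for the storage system are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DailyOrderImport {

    /** Splits the input into consecutive lots of at most lotSize elements. */
    static <T> List<List<T>> splitIntoLots(List<T> items, int lotSize) {
        List<List<T>> lots = new ArrayList<>();
        for (int start = 0; start < items.size(); start += lotSize) {
            int end = Math.min(start + lotSize, items.size());
            lots.add(new ArrayList<>(items.subList(start, end)));
        }
        return lots;
    }

    /** Hypothetical check: an order line "id;quantity" is valid when quantity > 0. */
    static boolean isValid(String orderLine) {
        String[] fields = orderLine.split(";");
        return fields.length == 2 && Integer.parseInt(fields[1].trim()) > 0;
    }

    /** Runs the read / check / save cycle lot by lot and returns what was "saved". */
    static List<String> importOrders(List<String> orderLines, int lotSize) {
        List<String> storage = new ArrayList<>(); // stands in for the storage system
        for (List<String> lot : splitIntoLots(orderLines, lotSize)) {
            List<String> checked = new ArrayList<>();
            for (String line : lot) {
                if (isValid(line)) {
                    checked.add(line);
                }
            }
            storage.addAll(checked); // one save per lot
        }
        return storage;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("A1;2", "A2;0", "A3;5", "A4;1", "A5;-3");
        System.out.println(importOrders(lines, 3));
    }
}
```

In a real batch the lot size decides how much work is lost and redone when a save fails, which is the trade-off a batch framework lets you tune.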
Spring-Batch (and ! 1/6)
Open-source framework for Batch Processing
Robust Batch applications (enterprise applications)
Reusable functions for data processing
Adds a standard approach to defining Batch Jobs
Spring-Batch (More 2/6)
Transaction management (Commit interval & Rollback)
Easy Batch processing (chunk-oriented processing)
Error Management, Recovery and Stopping Jobs
All in Spring…
Spring-Batch (Concept 3/6)
A Job is defined by a flow of one or more Steps
A Step is usually defined by a Reader, a Processor and a Writer
A Step can also be defined by a simple Tasklet
A Job Repository (JobRepository) stores execution metadata and a Job Launcher (JobLauncher) starts Jobs
Source: Spring.io
Spring-Batch (How does it work? 4/6)
The Reader reads one item at a time
The Processor processes one item at a time
Once a whole lot has been read and processed, the Writer writes it
Source: Spring.io
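The loop described on this slide can be sketched in plain Java. The `Reader`, `Processor` and `Writer` interfaces below are local stand-ins written for this example, not the Spring Batch types, and the upper-casing processor is arbitrary. The convention that the reader returns null at the end of the input mirrors the contract of Spring Batch's ItemReader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ChunkLoopSketch {

    // Local stand-ins for illustration only; these are not the Spring Batch interfaces.
    interface Reader<T> { T read(); }               // returns null when the input is exhausted
    interface Processor<I, O> { O process(I item); }
    interface Writer<T> { void write(List<T> items); }

    /** Reads and processes items one at a time, writes them lot by lot; returns the number of writes. */
    static <I, O> int run(Reader<I> reader, Processor<I, O> processor, Writer<O> writer, int chunkSize) {
        int writes = 0;
        List<O> chunk = new ArrayList<>();
        I item;
        while ((item = reader.read()) != null) {
            chunk.add(processor.process(item));
            if (chunk.size() == chunkSize) {
                writer.write(new ArrayList<>(chunk)); // a full lot is written in one call
                writes++;
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            writer.write(new ArrayList<>(chunk));     // last, possibly shorter, lot
            writes++;
        }
        return writes;
    }

    /** Runs the loop over the given names with an upper-casing processor and collects each written lot. */
    static List<List<String>> demo(List<String> names, int chunkSize) {
        Iterator<String> source = names.iterator();
        List<List<String>> written = new ArrayList<>();
        Reader<String> reader = () -> source.hasNext() ? source.next() : null;
        Processor<String, String> processor = String::toUpperCase;
        Writer<String> writer = written::add;
        run(reader, processor, writer, chunkSize);
        return written;
    }

    public static void main(String[] args) {
        System.out.println(demo(Arrays.asList("ann", "bob", "cid", "dan", "eve"), 2));
    }
}
```

In Spring Batch the lot size is the commit interval: each written lot is committed in its own transaction, so a failure rolls back only the lot in progress.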
Spring-Batch (Advanced 5/6)
Sequential, conditional or parallel Step flows
Split flows and multi-threading
Many listeners (StepExecutionListener, ChunkListener, ...)
Exception management
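A conditional flow boils down to choosing the next Step from the exit status of the previous one. The sketch below shows that idea in plain Java. It is not the Spring Batch flow API: the class, the step names and the "COMPLETED" / "FAILED" status strings used as map keys are assumptions chosen for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

class ConditionalFlowSketch {

    private final Map<String, Supplier<String>> steps = new HashMap<>();
    private final Map<String, String> transitions = new HashMap<>();

    /** Registers a step; its supplier returns an exit status such as "COMPLETED" or "FAILED". */
    ConditionalFlowSketch step(String name, Supplier<String> body) {
        steps.put(name, body);
        return this;
    }

    /** Declares that when step `from` ends with `status`, step `to` runs next. */
    ConditionalFlowSketch on(String from, String status, String to) {
        transitions.put(from + "/" + status, to);
        return this;
    }

    /** Runs from the given step until no transition matches; returns the names of the executed steps. */
    List<String> run(String first) {
        List<String> executed = new ArrayList<>();
        String current = first;
        while (current != null) {
            executed.add(current);
            String status = steps.get(current).get();
            current = transitions.get(current + "/" + status);
        }
        return executed;
    }

    /** Builds a small flow: import, then export on success or cleanup on failure. */
    static List<String> demo(boolean importSucceeds) {
        return new ConditionalFlowSketch()
            .step("import", () -> importSucceeds ? "COMPLETED" : "FAILED")
            .step("export", () -> "COMPLETED")
            .step("cleanup", () -> "COMPLETED")
            .on("import", "COMPLETED", "export")
            .on("import", "FAILED", "cleanup")
            .run("import");
    }

    public static void main(String[] args) {
        System.out.println(demo(true));
        System.out.println(demo(false));
    }
}
```

A sequential flow is the special case where every step has a single transition, whatever its status.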
Spring-Batch (Diagram 6/6)
Source: Cépria FR
A real-life example…
Example of use (Context 1/8)
Web application that can launch Jobs from its views
Tracks Job status
Allows running Jobs synchronously or asynchronously
Three Jobs: one for importing CSV files into the database, one for exporting
data to a JSON file, and one for importing and exporting large
amounts of data (200,000 lines) in less than 10 minutes
Example of use (Business request 2/8)
Import Job:
• Input: a CSV file containing employee information and annual gross salaries
• Output: processed information and calculated taxes, saved in the database
Export Job:
• Input: employee information and calculated taxes, read from the database
• Output: a JSON file containing all the data
+ A REST API to perform tax calculation and validation
Example of use (Architecture & technical choices 3/8)
Use of Spring-Boot with Spring-MVC and Spring-Batch
Integration of Thymeleaf as a templating engine (+ nekohtml)
MySQL driver, Mockito for tests and Jackson for JSON support
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
…
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-batch</artifactId>
…
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-jpa</artifactId>
…
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
…
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-thymeleaf</artifactId>
…
<groupId>net.sourceforge.nekohtml</groupId>
<artifactId>nekohtml</artifactId>
…
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
Example of use (Solution diagram 4/8)
Example of use (Programming 5/8)
Three interfaces are available to define a Reader, Processor and Writer
public interface ItemReader<T> {
    // Returns the next item, or null when the input is exhausted
    T read() throws Exception, UnexpectedInputException, ParseException;
}
public interface ItemWriter<T> {
    // Receives a whole lot of processed items at once
    void write(List<? extends T> items) throws Exception;
}
public interface ItemProcessor<I, O> {
    // Transforms one input item into one output item
    O process(I item) throws Exception;
}
An interface is available to define a Tasklet
public interface Tasklet {
    RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception;
}
An interface is available to define a Partitioner
public interface Partitioner {
    // Returns one ExecutionContext per partition, keyed by partition name
    Map<String, ExecutionContext> partition(int gridSize);
}
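To make the Partitioner contract more concrete, here is a plain-Java sketch of the computation such a class might perform when a table is exported page by page. It is an assumption-laden illustration rather than the project's code: plain maps stand in for ExecutionContext, the class name and the "partitionN" names are made up, and the "page" and "size" keys are chosen to resemble typical paging parameters.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PagePartitionSketch {

    /**
     * Splits totalRows into at most gridSize pages of equal size.
     * Each entry plays the role of one partition's execution context:
     * "page" is the page index and "size" the page size (the last page may return fewer rows).
     */
    static Map<String, Map<String, Integer>> partition(int totalRows, int gridSize) {
        Map<String, Map<String, Integer>> partitions = new LinkedHashMap<>();
        int pageSize = (totalRows + gridSize - 1) / gridSize; // ceiling division
        for (int page = 0; page * pageSize < totalRows; page++) {
            Map<String, Integer> context = new LinkedHashMap<>();
            context.put("page", page);
            context.put("size", pageSize);
            partitions.put("partition" + page, context);
        }
        return partitions;
    }

    public static void main(String[] args) {
        System.out.println(partition(25, 4));
    }
}
```

Each partition can then be handed to its own worker, which reads only the page described by its context.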
Example of use (Annotation job 6/8)
// jobBuilderFactory and stepBuilderFactory are assumed to be injected fields
// (JobBuilderFactory / StepBuilderFactory) of the enclosing configuration class.
@Bean
public ItemReader<Person> reader() {
    return new PersonReaderFromFile();
}
@Bean
public ImportPersonItemProcessor processor() {
    return new ImportPersonItemProcessor();
}
@Bean
public ItemWriter<Person> writer() {
    return new PersonWriterToDataBase();
}
@Bean
public Tasklet cleaner() {
    return new CleanDBTasklet();
}
@Bean
public Job importUserJob() {
    // Clean the database first, then import; listener and validator apply to the whole Job
    return jobBuilderFactory.get("importUserJob")
            .incrementer(new RunIdIncrementer())
            .flow(stepClean())
            .next(stepImport())
            .end()
            .listener(new ImportJobExecutionListener(reader()))
            .validator(new FileNameParameterValidator())
            .build();
}
@Bean
public Step stepImport() {
    // Chunk size 10: items are written and committed ten at a time
    return stepBuilderFactory.get("stepImport")
            .<Person, Person>chunk(10)
            .reader(reader())
            .processor(processor())
            .writer(writer())
            .build();
}
@Bean
public Step stepClean() {
    return stepBuilderFactory.get("stepClean").tasklet(cleaner()).build();
}
Example of use (XML Jobs 7/8)
<batch:job id="importUserJob">
    <batch:step id="stepClean" next="importStep">
        <batch:tasklet ref="cleanDBTasklet" />
    </batch:step>
    <batch:step id="importStep">
        <batch:tasklet>
            <batch:chunk reader="reader" writer="writer" processor="processor" commit-interval="10" />
        </batch:tasklet>
    </batch:step>
</batch:job>
<bean id="reader" class="com.capgemini.reader.PersonReaderFromFile" scope="step" />
<bean id="processor" class="com.capgemini.processor.ImportPersonItemProcessor" scope="step" />
<bean id="writer" class="com.capgemini.writer.PersonWriterToDataBase" scope="step" />
<bean id="cleanDBTasklet" class="com.capgemini.tasklet.CleanDBTasklet" />
<batch:job id="exportUserJob">
    <batch:step id="exportStep">
        <batch:tasklet>
            <batch:chunk reader="reader" writer="writer" processor="processor" commit-interval="10" />
        </batch:tasklet>
    </batch:step>
</batch:job>
<bean id="reader" class="com.capgemini.reader.PersonReaderFromDataBase" scope="step" />
<bean id="processor" class="com.capgemini.processor.ExportPersonItemProcessor" scope="step" />
<bean id="writer" class="com.capgemini.writer.PersonWriterToFile" scope="step" />
Example of use (Application 8/8)
Live demo…
Demo (Result and Conclusion 1/6)
Small or medium file import:
• Small files: 4 seconds on average in both synchronous and
asynchronous modes
• Medium-sized files (10,000 lines): 43 seconds of processing on average
across synchronous and asynchronous modes
Large file import: 13 minutes of processing on average
Export of 1,000 or 10,000 rows: 4 seconds of processing on average
for 1,000 rows, and 30 seconds on average for 10,000 rows
Export of 200,000 rows: 12 minutes of processing on average
=> Importing and exporting 200,000 lines takes longer than the 10-minute target
Demo (Solution 2/6)
To import and export 200,000 lines in less than 10 minutes, multi-threading
is one of the solutions
Multi-threading: several threads perform tasks in parallel
New problem: the FileReader and FileWriter are not thread-safe
Solution for the FileReader: split the input file so that each file is processed
by its own thread
Solution for the FileWriter: paginate the data from the database, export each page
to its own file, and concatenate the files at the end of the process
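The export half of this idea can be sketched with the JDK's `java.util.concurrent` package. This is a plain-Java illustration under stated assumptions, not the Spring Batch partitioning used in the project: in-memory lists stand in for the database pages and the per-page files, and the class name and the JSON-like row format are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelExportSketch {

    /** Exports one page to its own in-memory "file"; the input is only read, never modified. */
    static List<String> exportPage(List<String> rows, int page, int pageSize) {
        int from = Math.min(page * pageSize, rows.size());
        int to = Math.min(from + pageSize, rows.size());
        List<String> file = new ArrayList<>();
        for (String row : rows.subList(from, to)) {
            file.add("{\"name\":\"" + row + "\"}");
        }
        return file;
    }

    /** Exports all pages in parallel, then concatenates the page files in page order. */
    static List<String> exportAll(List<String> rows, int pageSize, int threads) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> pages = new ArrayList<>();
            for (int page = 0; page * pageSize < rows.size(); page++) {
                final int current = page;
                Callable<List<String>> task = () -> exportPage(rows, current, pageSize);
                pages.add(pool.submit(task));
            }
            List<String> output = new ArrayList<>();
            for (Future<List<String>> pageFile : pages) {
                output.addAll(pageFile.get()); // waits for that page, which keeps page order
            }
            return output;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("export interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a page export failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            rows.add("employee" + i);
        }
        System.out.println(exportAll(rows, 3, 4));
    }
}
```

Because every task writes only to its own list, no writer is shared between threads; the only synchronisation point is the final concatenation.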
Demo (Configuration 3/6)
<batch:job id="transformJob">
    <batch:step id="deleteDir" next="cleanDB">
        <batch:tasklet ref="fileDeletingTasklet" />
    </batch:step>
    <batch:step id="cleanDB" next="countThread">
        <batch:tasklet ref="cleanDBTasklet" />
    </batch:step>
    <batch:step id="countThread" next="split">
        <batch:tasklet ref="countThreadTasklet" />
    </batch:step>
    <batch:step id="split" next="partitionerMasterImporter">
        <batch:tasklet>
            <batch:chunk reader="largeCSVReader" writer="smallCSVWriter"
                commit-interval="#{jobExecutionContext['chunk.count']}" />
        </batch:tasklet>
    </batch:step>
    <batch:step id="partitionerMasterImporter" next="partitionerMasterExporter">
        <batch:partition step="importChunked" partitioner="filePartitioner">
            <batch:handler grid-size="10" task-executor="taskExecutor" />
        </batch:partition>
    </batch:step>
    <batch:step id="partitionerMasterExporter" next="concat">
        <batch:partition step="exportChunked" partitioner="dbPartitioner">
            <batch:handler grid-size="10" task-executor="taskExecutor" />
        </batch:partition>
    </batch:step>
    <batch:step id="concat">
        <batch:tasklet ref="concatFileTasklet" />
    </batch:step>
</batch:job>
Demo (Configuration 4/6)
<batch:step id="importChunked">
    <batch:tasklet>
        <batch:chunk reader="smallCSVFileReader" writer="dbWriter"
            processor="importProcessor" commit-interval="500">
        </batch:chunk>
    </batch:tasklet>
</batch:step>
<batch:step id="exportChunked">
    <batch:tasklet>
        <batch:chunk reader="dbReader" writer="jsonFileWriter" processor="exportProcessor"
            commit-interval="#{jobExecutionContext['chunk.count']}">
        </batch:chunk>
    </batch:tasklet>
</batch:step>
<bean id="jsonFileWriter" class="com.capgemini.writer.PersonWriterToFile" scope="step">
    <property name="outputPath" value="csv/chunked/paged-#{stepExecutionContext[page]}.json" />
</bean>
<bean id="dbReader" class="com.capgemini.reader.PersonReaderFromDataBase" scope="step">
    <property name="iPersonRepository" ref="IPersonRepository" />
    <property name="page" value="#{stepExecutionContext[page]}"/>
    <property name="size" value="#{stepExecutionContext[size]}"/>
</bean>
<bean id="countThreadTasklet" class="com.capgemini.tasklet.CountingTasklet" scope="step">
    <property name="input" value="file:csv/input/#{jobParameters[filename]}" />
</bean>
<bean id="cleanDBTasklet" class="com.capgemini.tasklet.CleanDBTasklet" />
<bean id="fileDeletingTasklet" class="com.capgemini.tasklet.FileDeletingTasklet">
    <property name="directory" value="file:csv/chunked/" />
</bean>
Demo (Configuration 5/6)
<bean id="concatFileTasklet" class="com.capgemini.tasklet.FileConcatTasklet">
    <property name="directory" value="file:csv/chunked/" />
    <property name="outputFilename" value="csv/output/export.json" />
</bean>
<bean id="filePartitioner" class="com.capgemini.partitioner.FilePartitioner">
    <property name="outputPath" value="csv/chunked/" />
</bean>
<bean id="dbPartitioner" class="com.capgemini.partitioner.DBPartitioner" scope="step">
    <property name="pageSize" value="#{jobExecutionContext['chunk.count']}" />
</bean>
<bean id="largeCSVReader" class="com.capgemini.reader.LineReaderFromFile" scope="step">
    <property name="inputPath" value="csv/input/#{jobParameters[filename]}" />
</bean>
<bean id="smallCSVWriter" class="com.capgemini.writer.LineWriterToFile" scope="step">
    <property name="outputPath" value="csv/chunked/"></property>
</bean>
<bean id="smallCSVFileReader" class="com.capgemini.reader.PersonReaderFromFile" scope="step">
    <constructor-arg value="csv/chunked/#{stepExecutionContext[file]}" />
</bean>
<bean id="importProcessor" class="com.capgemini.processor.ImportPersonItemProcessor" />
<bean id="exportProcessor" class="com.capgemini.processor.ExportPersonItemProcessor" />
<bean id="dbWriter" class="com.capgemini.writer.PersonWriterToDataBase">
    <property name="iPersonRepository" ref="IPersonRepository" />
</bean>
The test with multi-threading
Test (Result 1/3)
Test (Threads 2/3)
Before Job launch
During the execution of the Job
Test (Output 3/3)
Summary
Summary (Good + 1/2)
Defines a standard pattern for Batch jobs
Batch jobs are easy to embed in any kind of Spring application
Reliability, maintainability
Advanced functions such as "Multi-Threading"
Integrated batch testing
Error Tolerance and Recovery
Summary (Not good – 2/2)
The Spring Batch Admin project is no longer maintained: switching
to Spring Cloud Data Flow is mandatory
Difficult to run a single annotation-defined Job in a project packaged as a JAR
that bundles several Jobs
Version compatibility issues between Spring Batch and the H2 database, which is very
useful for testing Jobs
Useful links
Full source code of the project: https://gitlab.com/mmohamed/spring-batch
Documentation:
• http://projects.spring.io/spring-batch/#quick-start
• https://blog.octo.com/spring-batch-par-quel-bout-le-prendre
• https://blog.netapsys.fr/spring-batch-par-lexemple-2
• http://jeremy-jeanne.developpez.com/tutoriels/spring/spring-batch/
www.capgemini.com
The information contained in this presentation is proprietary.
© 2017 Capgemini. All rights reserved. Rightshore® is a trademark belonging to Capgemini.
About Capgemini
With more than 190,000 people, Capgemini is present in over 40
countries and celebrates its 50th Anniversary year in 2017. A
global leader in consulting, technology and outsourcing services,
the Group reported 2016 global revenues of EUR 12.5 billion.
Together with its clients, Capgemini creates and delivers
business, technology and digital solutions that fit their needs,
enabling them to achieve innovation and competitiveness. A
deeply multicultural organization, Capgemini has developed its
own way of working, Collaborative Business Experience™, and
draws on Rightshore®, its worldwide delivery model.
Learn more about us at www.capgemini.com

More Related Content

What's hot (20)

PDF
Spring Boot Revisited with KoFu and JaFu
VMware Tanzu
 
PDF
The Making of the Oracle R2DBC Driver and How to Take Your Code from Synchron...
VMware Tanzu
 
PDF
Nomad, l'orchestration made in Hashicorp - Bastien Cadiot
Paris Container Day
 
PPTX
Node.js Internals and V8 / Operating System Interaction
GlobalLogic Ukraine
 
ODP
Gradle: The Build System you have been waiting for!
Corneil du Plessis
 
PDF
Development with Vert.x: an event-driven application framework for the JVM
David Wu
 
PDF
猿でもわかる Helm
Tsuyoshi Miyake
 
PDF
C++からWebRTC (DataChannel)を利用する
祐司 伊藤
 
PDF
Security in a containerized world - Jessie Frazelle
Paris Container Day
 
PDF
An Introduction to Gradle for Java Developers
Kostas Saidis
 
PDF
WILD microSERVICES v2
Aleksandr Tarasov
 
PDF
Reactive Microservice And Spring5
Jay Lee
 
PDF
Vert.x introduction
GR8Conf
 
PDF
桃園市教育局Docker技術入門與實作
Philip Zheng
 
PDF
Spring boot wednesday
Vinay Prajapati
 
PDF
Dsl로 만나는 groovy
Seeyoung Chang
 
PDF
Griffon @ Svwjug
Andres Almiray
 
PPTX
Cloud hybridation leveraging on Docker 1.12
Ludovic Piot
 
KEY
Developing Mobile HTML5 Apps with Grails
GR8Conf
 
PPTX
C++ Coroutines
Sumant Tambe
 
Spring Boot Revisited with KoFu and JaFu
VMware Tanzu
 
The Making of the Oracle R2DBC Driver and How to Take Your Code from Synchron...
VMware Tanzu
 
Nomad, l'orchestration made in Hashicorp - Bastien Cadiot
Paris Container Day
 
Node.js Internals and V8 / Operating System Interaction
GlobalLogic Ukraine
 
Gradle: The Build System you have been waiting for!
Corneil du Plessis
 
Development with Vert.x: an event-driven application framework for the JVM
David Wu
 
猿でもわかる Helm
Tsuyoshi Miyake
 
C++からWebRTC (DataChannel)を利用する
祐司 伊藤
 
Security in a containerized world - Jessie Frazelle
Paris Container Day
 
An Introduction to Gradle for Java Developers
Kostas Saidis
 
WILD microSERVICES v2
Aleksandr Tarasov
 
Reactive Microservice And Spring5
Jay Lee
 
Vert.x introduction
GR8Conf
 
桃園市教育局Docker技術入門與實作
Philip Zheng
 
Spring boot wednesday
Vinay Prajapati
 
Dsl로 만나는 groovy
Seeyoung Chang
 
Griffon @ Svwjug
Andres Almiray
 
Cloud hybridation leveraging on Docker 1.12
Ludovic Piot
 
Developing Mobile HTML5 Apps with Grails
GR8Conf
 
C++ Coroutines
Sumant Tambe
 

Similar to Spring & SpringBatch EN (20)

PPTX
Spring batch
Deepak Kumar
 
PPTX
Spring batch for large enterprises operations
Ignasi González
 
PDF
Spring batch overivew
Chanyeong Choi
 
PDF
Gain Proficiency in Batch Processing with Spring Batch
Inexture Solutions
 
KEY
Spring Batch Behind the Scenes
Joshua Long
 
PPTX
Spring batch
Yukti Kaura
 
PDF
Java one 2015 [con3339]
Arshal Ameen
 
PPTX
Spring batch
Chandan Kumar Rana
 
PDF
Spring Batch Performance Tuning
Gunnar Hillert
 
PDF
Spring Batch in Code - simple DB to DB batch applicaiton
tomi vanek
 
PPT
Spring Batch Introduction
Tadaya Tsuyukubo
 
PPTX
Spring batch introduction
Alex Fernandez
 
PPTX
Spring batch showCase
taher abdo
 
PPTX
SBJUG - Building Beautiful Batch Jobs
stephenbhadran
 
PDF
Batch Applications for the Java Platform
Sivakumar Thyagarajan
 
PDF
Spring Day | Behind the Scenes at Spring Batch | Dave Syer
JAX London
 
PDF
Design & Develop Batch Applications in Java/JEE
Naresh Chintalcheru
 
DOCX
springn batch tutorial
Jadae
 
PDF
Intro to SpringBatch NoSQL 2021
Slobodan Lohja
 
PPTX
Batching and Java EE (jdk.io)
Ryan Cuprak
 
Spring batch
Deepak Kumar
 
Spring batch for large enterprises operations
Ignasi González
 
Spring batch overivew
Chanyeong Choi
 
Gain Proficiency in Batch Processing with Spring Batch
Inexture Solutions
 
Spring Batch Behind the Scenes
Joshua Long
 
Spring batch
Yukti Kaura
 
Java one 2015 [con3339]
Arshal Ameen
 
Spring batch
Chandan Kumar Rana
 
Spring Batch Performance Tuning
Gunnar Hillert
 
Spring Batch in Code - simple DB to DB batch applicaiton
tomi vanek
 
Spring Batch Introduction
Tadaya Tsuyukubo
 
Spring batch introduction
Alex Fernandez
 
Spring batch showCase
taher abdo
 
SBJUG - Building Beautiful Batch Jobs
stephenbhadran
 
Batch Applications for the Java Platform
Sivakumar Thyagarajan
 
Spring Day | Behind the Scenes at Spring Batch | Dave Syer
JAX London
 
Design & Develop Batch Applications in Java/JEE
Naresh Chintalcheru
 
springn batch tutorial
Jadae
 
Intro to SpringBatch NoSQL 2021
Slobodan Lohja
 
Batching and Java EE (jdk.io)
Ryan Cuprak
 
Ad

Recently uploaded (20)

PPTX
Diabetes diabetes diabetes diabetes jsnsmxndm
130SaniyaAbduNasir
 
DOCX
Engineering Geology Field Report to Malekhu .docx
justprashant567
 
PDF
Module - 4 Machine Learning -22ISE62.pdf
Dr. Shivashankar
 
PDF
bs-en-12390-3 testing hardened concrete.pdf
ADVANCEDCONSTRUCTION
 
PPTX
OCS353 DATA SCIENCE FUNDAMENTALS- Unit 1 Introduction to Data Science
A R SIVANESH M.E., (Ph.D)
 
PPTX
Introduction to File Transfer Protocol with commands in FTP
BeulahS2
 
PDF
Artificial Neural Network-Types,Perceptron,Problems
Sharmila Chidaravalli
 
PPTX
Numerical-Solutions-of-Ordinary-Differential-Equations.pptx
SAMUKTHAARM
 
PPTX
Engineering Quiz ShowEngineering Quiz Show
CalvinLabial
 
PDF
Bayesian Learning - Naive Bayes Algorithm
Sharmila Chidaravalli
 
PDF
William Stallings - Foundations of Modern Networking_ SDN, NFV, QoE, IoT, and...
lavanya896395
 
PDF
A Brief Introduction About Robert Paul Hardee
Robert Paul Hardee
 
PDF
NFPA 10 - Estandar para extintores de incendios portatiles (ed.22 ENG).pdf
Oscar Orozco
 
PPTX
Introduction to Internal Combustion Engines - Types, Working and Camparison.pptx
UtkarshPatil98
 
PPTX
Seminar Description: YOLO v1 (You Only Look Once).pptx
abhijithpramod20002
 
PDF
Artificial intelligence,WHAT IS AI ALL ABOUT AI....pdf
Himani271945
 
PPTX
darshai cross section and river section analysis
muk7971
 
PPTX
Engineering Quiz ShowEngineering Quiz Show
CalvinLabial
 
PDF
Authentication Devices in Fog-mobile Edge Computing Environments through a Wi...
ijujournal
 
PDF
this idjfk sgfdhgdhgdbhgbgrbdrwhrgbbhtgdt
WaleedAziz7
 
Diabetes diabetes diabetes diabetes jsnsmxndm
130SaniyaAbduNasir
 
Engineering Geology Field Report to Malekhu .docx
justprashant567
 
Module - 4 Machine Learning -22ISE62.pdf
Dr. Shivashankar
 
bs-en-12390-3 testing hardened concrete.pdf
ADVANCEDCONSTRUCTION
 
OCS353 DATA SCIENCE FUNDAMENTALS- Unit 1 Introduction to Data Science
A R SIVANESH M.E., (Ph.D)
 
Introduction to File Transfer Protocol with commands in FTP
BeulahS2
 
Artificial Neural Network-Types,Perceptron,Problems
Sharmila Chidaravalli
 
Numerical-Solutions-of-Ordinary-Differential-Equations.pptx
SAMUKTHAARM
 
Engineering Quiz ShowEngineering Quiz Show
CalvinLabial
 
Bayesian Learning - Naive Bayes Algorithm
Sharmila Chidaravalli
 
William Stallings - Foundations of Modern Networking_ SDN, NFV, QoE, IoT, and...
lavanya896395
 
A Brief Introduction About Robert Paul Hardee
Robert Paul Hardee
 
NFPA 10 - Estandar para extintores de incendios portatiles (ed.22 ENG).pdf
Oscar Orozco
 
Introduction to Internal Combustion Engines - Types, Working and Camparison.pptx
UtkarshPatil98
 
Seminar Description: YOLO v1 (You Only Look Once).pptx
abhijithpramod20002
 
Artificial intelligence,WHAT IS AI ALL ABOUT AI....pdf
Himani271945
 
darshai cross section and river section analysis
muk7971
 
Engineering Quiz ShowEngineering Quiz Show
CalvinLabial
 
Authentication Devices in Fog-mobile Edge Computing Environments through a Wi...
ijujournal
 
this idjfk sgfdhgdhgdbhgbgrbdrwhrgbbhtgdt
WaleedAziz7
 
Ad

Spring & SpringBatch EN

  • 1. Spring & Batch October 2017 M. Mohamed
  • 2. 2Copyright © 2017 Capgemini. All rights reserved CSD | October 2017 Plan & Goals Plan 1. A tour of Spring & Spring-Boot 2. What is a Batch and Spring-Batch ? 3. How does it work ? 4. Advanced notions and applications 5. A good example Goals 1. Understanding how Spring-Batch works 2. Detecting use cases 3. Be able to do Batch and go even further with ...
  • 3. Hello « Spring Framework »…
  • 4. 4Copyright © 2017 Capgemini. All rights reserved CSD | October 2017 Spring Framework (Core 1/3) Open-source Java framework created in 2003 by “Pivotal Software” A light container (unlike EJBs) Mainly based on injection of dependency and AOP Easy integration of other Framework (Hibernate, JSF, Thymeleaf, ...)
  • 5. 5Copyright © 2017 Capgemini. All rights reserved CSD | October 2017 Spring Framework (Boot 2/3) Little configuration (self configurable as long as possible) Runnable Application, autonomous even in production. Embedded Tomcat (+ Jetty) , no need War No need for XML files (context.xml, web.xml, …) @annotations
  • 6. 6Copyright © 2017 Capgemini. All rights reserved CSD | October 2017 Spring Framework (Trend 3/3 @GoogleTrend)
  • 7. What is a « Batch » ?
  • 8. 8Copyright © 2017 Capgemini. All rights reserved CSD | October 2017 Batch (What's this ? 1/2) Lot Data Processing Large data process Several operations ensue for each lot Automatic or manual triggering
  • 9. 9Copyright © 2017 Capgemini. All rights reserved CSD | October 2017 Batch (Example 2/2) Example : import of daily orders Reading a lot of commands from a file Checking orders Saving the lot of commands in storage system Repeat this cycle for each lot of orders until the last Triggered by CRON, everyday at 5 am, before the arrival of the collaborators
  • 10. 10Copyright © 2017 Capgemini. All rights reserved CSD | October 2017 Spring-Batch (and ! 1/6) Open-source framework for Batch Processing Robust Batch Application (Entreprise application) Reusable functions for data processing Adds a standard approach to defining Batch Jobs
  • 11. 11Copyright © 2017 Capgemini. All rights reserved CSD | October 2017 Spring-Batch (More 2/6) Transaction management (Commit interval & Rollback) Batch processing easily (Chuncked process) Error Management, Recovery and Stopping Jobs All in Spring…
  • 12. 12Copyright © 2017 Capgemini. All rights reserved CSD | October 2017 Spring-Batch (Concept 3/6) A Job is defined by a Step Flow (one or more) A Step usually defined as a Reader, Processor and Writer A Step can also define a simple Tasklet A Job Repository (JobRepository) and a Job Executor (JobLauncher) Origin : Spring.io
  • 13. 13Copyright © 2017 Capgemini. All rights reserved CSD | October 2017 Spring-Batch (How does it work ? 4/6) The Reader read one item at a time The Processor processe one item at a time When the lot is read and processed, the Writer writes it Origin : Spring.io
  • 14. 14Copyright © 2017 Capgemini. All rights reserved CSD | October 2017 Spring-Batch (Advanced 5/6) Flow of Sequential, Conditional or Parallel Step Split Flow and Multi-threading Many Listeners (StepExecutionListener, ChunkListener, ...) Exceptions management
  • 15. 15Copyright © 2017 Capgemini. All rights reserved CSD | October 2017 Spring-Batch (Diagramme 6/6) Origin : Cépria FR
  • 16. 16Copyright © 2017 Capgemini. All rights reserved CSD | October 2017 Example from life…
  • 17. 17Copyright © 2017 Capgemini. All rights reserved CSD | October 2017 Example of use (Context 1/8) Web application that can use Jobs from View Track Jobs Status Allow to play Jobs in Synchronous and Asynchronous Three Jobs, one for importing CSV files to Database, one for exporting data to a JSON file, and one Job for importing and exporting large amounts of data (200,000 lines) into less than 10 minutes
  • 18. 18Copyright © 2017 Capgemini. All rights reserved CSD | October 2017 Example of use (Business request 2/8) Import Job : • Input : a CSV file containing employee information and their annual gross salaries • Output : processed information and taxes calculated and saved in the database Export Job : • Input : employee information and taxes calculated and saved in the database • Output : A JSON file containing all data + A REST API to perform tax calculation and validation
  • 19. 19Copyright © 2017 Capgemini. All rights reserved CSD | October 2017 Example of use (Architecture & technical choices 3/8) Use of Spring-Boot with Spring-MVC and Srping-Batch Integration of Thymeleaf as a templating engine (+ nekohtml) MySQL Driver, Mokito for tests and Jackson for JSON support <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-web</artifactId> … <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-batch</artifactId> … <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-data-jpa</artifactId> … <groupId>com.fasterxml.jackson.core</groupId> <artifactId>jackson-databind</artifactId> … <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-thymeleaf</artifactId> … <groupId>net.sourceforge.nekohtml</groupId> <artifactId>nekohtml</artifactId> … <groupId>mysql</groupId> <artifactId>mysql-connector-java</artifactId>
  • 20. 20Copyright © 2017 Capgemini. All rights reserved CSD | October 2017 Example of use (Solution diagramme 4/8)
  • 21. 21Copyright © 2017 Capgemini. All rights reserved CSD | October 2017 Example of use (Programming 5/8) An interface is available to define a Tasklet public interface ItemReader<T> { T read() throws Exception, UnexpectedInputException , ParseException; } public interface ItemWriter<T> { void write(List<? extends T> items) throws Exception; } public interface ItemProcessor<I, O> { O process(I item) throws Exception; } Three interfaces are available to define a Reader, Processor and Writer public interface Tasklet { RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception } An interface is available to define a Partitioner public interface Partitioner { Map<String, ExecutionContext> partition(int gridSize); }
  • 22. 22Copyright © 2017 Capgemini. All rights reserved CSD | October 2017 Example of use (Annotation job 6/8) @Bean public ItemReader<Person> reader() { return new PersonReaderFromFile(); } @Bean public ImportPersonItemProcessor processor() { return new ImportPersonItemProcessor(); } @Bean public ItemWriter<Person> writer() { return new PersonWriterToDataBase(); } @Bean public Tasklet cleaner() { return new CleanDBTasklet (); } @Bean public Job importUserJob() { return jobBuilderFactory.get("importUserJob").incrementer(new RunIdIncrementer()).flow(stepClean()) .next(stepImport()).end().listener(new ImportJobExecutionListener(reader()) .validator(new FileNameParameterValidator()).build(); } @Bean public Step stepImport() { return stepBuilderFactory.get("stepImport").<Person, Person>chunk(10).reader(reader()).processor(processor()).writer(writer()).build(); } @Bean public Step stepClean() { return stepBuilderFactory.get("stepClean").tasklet(cleaner()).build(); }
  • 23. 23Copyright © 2017 Capgemini. All rights reserved CSD | October 2017 Example of use (XML Jobs 7/8) <batch:job id="importUserJob"> <batch:step id="stepClean" next="importStep"> <batch:tasklet ref="cleanDBTasklet" /> </batch:step> <batch:step id="importStep"> <batch:tasklet> <batch:chunk reader="reader" writer="writer" processor="processor" commit-interval="10" /> </batch:tasklet> </batch:step> </batch:job> <bean id="reader" class="com.capgemini.reader.PersonReaderFromFile" scope="step" /> <bean id="processor" class="com.capgemini.processor.ImportPersonItemProcessor" scope="step" /> <bean id="writer" class="com.capgemini.writer.PersonWriterToDataBase" scope="step" /> <bean id="cleanDBTasklet" class="com.capgemini.tasklet.CleanDBTasklet" /> <batch:job id=“exportUserJob"> <batch:step id=“exportStep"> <batch:tasklet> <batch:chunk reader="reader" writer="writer" processor="processor" commit-interval="10" /> </batch:tasklet> </batch:step> </batch:job> <bean id="reader" class="com.capgemini.reader.PersonReaderFromDataBase" scope="step" /> <bean id="processor" class="com.capgemini.processor.ExportPersonItemProcessor" scope="step" /> <bean id="writer" class="com.capgemini.writer.PersonWriterToFile" scope="step" />
  • 24. Example of use (Application 8/8)
  • 26. Demo (Result and Conclusion 1/6)
      Small or medium file import:
        • Small files: about 4 seconds on average, in both synchronous and asynchronous modes
        • Medium-sized files (10,000 lines): about 43 seconds of processing on average across synchronous and asynchronous modes
      Large file import: 13 minutes of processing on average
      Export of 1,000 or 10,000 rows: about 4 seconds on average for 1,000 rows, about 30 seconds on average for 10,000 rows
      Export of 200,000 rows: 12 minutes of processing on average
      => Both the import and the export of 200,000 lines take more than 10 minutes
  • 27. Demo (Solution 2/6)
      To import and export 200,000 lines in less than 10 minutes, multi-threading is one possible solution.
      Multi-threading: several threads carry out the work in parallel.
      New problem: the FileReader and the FileWriter are not thread-safe.
      Solution for the FileReader: split the input file into smaller files so that each file is processed by its own thread.
      Solution for the FileWriter: page through the data in the database, export each page to its own file, and concatenate the files at the end of the process.
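The split, process in parallel, then concatenate idea can be sketched in plain Java as follows. This is only an illustration under simplifying assumptions: the parts are in-memory lists instead of files, the per-line processing is a placeholder, and the names `SplitAndMergeSketch`, `split` and `processInParallel` are hypothetical. Each worker only touches its own part, so no reader or writer is shared between threads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: one part per worker thread, results merged in part order. */
final class SplitAndMergeSketch {

    /** Cuts the lines into consecutive parts of at most partSize lines. */
    static List<List<String>> split(List<String> lines, int partSize) {
        List<List<String>> parts = new ArrayList<>();
        for (int from = 0; from < lines.size(); from += partSize) {
            int to = Math.min(from + partSize, lines.size());
            parts.add(new ArrayList<>(lines.subList(from, to)));
        }
        return parts;
    }

    /** Processes every part on a worker thread, then concatenates the outputs in order. */
    static List<String> processInParallel(List<String> lines, int partSize, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (List<String> part : split(lines, partSize)) {
                Callable<List<String>> task = () -> {
                    List<String> out = new ArrayList<>();
                    for (String line : part) {
                        out.add(line.trim().toUpperCase()); // placeholder for the real processing
                    }
                    return out;
                };
                futures.add(pool.submit(task));
            }
            // Concatenation: futures are read in submission order, so line order is preserved
            List<String> merged = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                merged.addAll(future.get());
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a part", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a part failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(" a", "b ", "c", "d", "e");
        System.out.println(processInParallel(lines, 2, 4));
        // prints "[A, B, C, D, E]"
    }
}
```

Reading the futures in submission order plays the role of the final concatenation step: the parts may finish in any order, but the merged output keeps the original line order.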
  • 28. Demo (Configuration 3/6)

        <batch:job id="transformJob">
            <batch:step id="deleteDir" next="cleanDB">
                <batch:tasklet ref="fileDeletingTasklet" />
            </batch:step>
            <batch:step id="cleanDB" next="countThread">
                <batch:tasklet ref="cleanDBTasklet" />
            </batch:step>
            <batch:step id="countThread" next="split">
                <batch:tasklet ref="countThreadTasklet" />
            </batch:step>
            <batch:step id="split" next="partitionerMasterImporter">
                <batch:tasklet>
                    <batch:chunk reader="largeCSVReader" writer="smallCSVWriter"
                                 commit-interval="#{jobExecutionContext['chunk.count']}" />
                </batch:tasklet>
            </batch:step>
            <batch:step id="partitionerMasterImporter" next="partitionerMasterExporter">
                <batch:partition step="importChunked" partitioner="filePartitioner">
                    <batch:handler grid-size="10" task-executor="taskExecutor" />
                </batch:partition>
            </batch:step>
            <batch:step id="partitionerMasterExporter" next="concat">
                <batch:partition step="exportChunked" partitioner="dbPartitioner">
                    <batch:handler grid-size="10" task-executor="taskExecutor" />
                </batch:partition>
            </batch:step>
            <batch:step id="concat">
                <batch:tasklet ref="concatFileTasklet" />
            </batch:step>
        </batch:job>
  • 29. Demo (Configuration 4/6)

        <batch:step id="importChunked">
            <batch:tasklet>
                <batch:chunk reader="smallCSVFileReader" writer="dbWriter" processor="importProcessor"
                             commit-interval="500" />
            </batch:tasklet>
        </batch:step>
        <batch:step id="exportChunked">
            <batch:tasklet>
                <batch:chunk reader="dbReader" writer="jsonFileWriter" processor="exportProcessor"
                             commit-interval="#{jobExecutionContext['chunk.count']}" />
            </batch:tasklet>
        </batch:step>
        <bean id="jsonFileWriter" class="com.capgemini.writer.PersonWriterToFile" scope="step">
            <property name="outputPath" value="csv/chunked/paged-#{stepExecutionContext[page]}.json" />
        </bean>
        <bean id="dbReader" class="com.capgemini.reader.PersonReaderFromDataBase" scope="step">
            <property name="iPersonRepository" ref="IPersonRepository" />
            <property name="page" value="#{stepExecutionContext[page]}" />
            <property name="size" value="#{stepExecutionContext[size]}" />
        </bean>
        <bean id="countThreadTasklet" class="com.capgemini.tasklet.CountingTasklet" scope="step">
            <property name="input" value="file:csv/input/#{jobParameters[filename]}" />
        </bean>
        <bean id="cleanDBTasklet" class="com.capgemini.tasklet.CleanDBTasklet" />
        <bean id="fileDeletingTasklet" class="com.capgemini.tasklet.FileDeletingTasklet">
            <property name="directory" value="file:csv/chunked/" />
        </bean>
  • 30. Demo (Configuration 5/6)

        <bean id="concatFileTasklet" class="com.capgemini.tasklet.FileConcatTasklet">
            <property name="directory" value="file:csv/chunked/" />
            <property name="outputFilename" value="csv/output/export.json" />
        </bean>
        <bean id="filePartitioner" class="com.capgemini.partitioner.FilePartitioner">
            <property name="outputPath" value="csv/chunked/" />
        </bean>
        <bean id="dbPartitioner" class="com.capgemini.partitioner.DBPartitioner" scope="step">
            <property name="pageSize" value="#{jobExecutionContext['chunk.count']}" />
        </bean>
        <bean id="largeCSVReader" class="com.capgemini.reader.LineReaderFromFile" scope="step">
            <property name="inputPath" value="csv/input/#{jobParameters[filename]}" />
        </bean>
        <bean id="smallCSVWriter" class="com.capgemini.writer.LineWriterToFile" scope="step">
            <property name="outputPath" value="csv/chunked/" />
        </bean>
        <bean id="smallCSVFileReader" class="com.capgemini.reader.PersonReaderFromFile" scope="step">
            <constructor-arg value="csv/chunked/#{stepExecutionContext[file]}" />
        </bean>
        <bean id="importProcessor" class="com.capgemini.processor.ImportPersonItemProcessor" />
        <bean id="exportProcessor" class="com.capgemini.processor.ExportPersonItemProcessor" />
        <bean id="dbWriter" class="com.capgemini.writer.PersonWriterToDataBase">
            <property name="iPersonRepository" ref="IPersonRepository" />
        </bean>
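The source of `DBPartitioner` is not shown in the slides. As a rough idea of what a page-based partitioner has to compute, the sketch below builds one context per page with the `page` and `size` keys that the `dbReader` and `jsonFileWriter` beans read from the step execution context. It is an assumption-laden stand-in: it uses plain maps rather than Spring Batch's `Partitioner` and `ExecutionContext` types, and the class name, the `partition` signature, the `partitionN` key names and the row counts are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical page-based partitioning, using plain maps instead of Spring Batch types. */
final class PagePartitionSketch {

    /** Returns one entry per page; each value carries the "page" index and the page "size". */
    static Map<String, Map<String, Integer>> partition(long totalRows, int pageSize) {
        Map<String, Map<String, Integer>> partitions = new LinkedHashMap<>();
        int pages = (int) ((totalRows + pageSize - 1) / pageSize); // ceiling division
        for (int page = 0; page < pages; page++) {
            Map<String, Integer> context = new LinkedHashMap<>();
            context.put("page", page);
            context.put("size", pageSize);
            partitions.put("partition" + page, context);
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Assumed figures: 200,000 rows exported in pages of 20,000 rows
        Map<String, Map<String, Integer>> partitions = partition(200_000L, 20_000);
        System.out.println(partitions.size() + " partitions");
        // prints "10 partitions"
    }
}
```

Each partition would then be handled by one execution of the `exportChunked` step, which writes its own `paged-<page>.json` file before the final concatenation.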
  • 31. The test with multi-threading
  • 32. Test (Result 1/3)
  • 33. Test (Threads 2/3)
      Before the Job launch
      During the execution of the Job
  • 34. Test (Output 3/3)
  • 35. Summary
  • 36. Summary (Good + 1/2)
        • Defines a pattern for Batch
        • Embeds Batch easily in any kind of Spring application
        • Reliability, maintainability
        • Advanced functions such as "Multi-Threading"
        • Integrated batch testing
        • Error tolerance and recovery
  • 37. Summary (Not good – 2/2)
        • The Spring Batch Admin project is no longer maintained: switching to Spring Cloud Data Flow is mandatory
        • Running an annotation-defined job is difficult when the project is packaged as a JAR that bundles several Jobs
        • Version compatibility issues between Spring Batch and the H2 database, which is otherwise very useful for testing Jobs
  • 38. Useful links
  • 39. Useful links
      Full source code of the project: https://gitlab.com/mmohamed/spring-batch
      Documentation:
        • http://projects.spring.io/spring-batch/#quick-start
        • https://blog.octo.com/spring-batch-par-quel-bout-le-prendre
        • https://blog.netapsys.fr/spring-batch-par-lexemple-2
        • http://jeremy-jeanne.developpez.com/tutoriels/spring/spring-batch/
  • 40. www.capgemini.com
      The information contained in this presentation is proprietary. © 2017 Capgemini. All rights reserved. Rightshore® is a trademark belonging to Capgemini.
      About Capgemini
      With more than 190,000 people, Capgemini is present in over 40 countries and celebrates its 50th Anniversary year in 2017. A global leader in consulting, technology and outsourcing services, the Group reported 2016 global revenues of EUR 12.5 billion. Together with its clients, Capgemini creates and delivers business, technology and digital solutions that fit their needs, enabling them to achieve innovation and competitiveness. A deeply multicultural organization, Capgemini has developed its own way of working, Collaborative Business Experience™, and draws on Rightshore®, its worldwide delivery model.
      Learn more about us at www.capgemini.com