Spring Batch Bootcamp



• Your host: Josh Long
    SpringSource, a division of VMware




•   Code: github.com/joshlong/spring-batch-bootcamp
•   Deck: slideshare.net/joshlong/spring-batch-behind-the-scenes




1
A huge amount of this came from Dr. Dave Syer
    (he’s awesome)




2
About Josh Long (Spring Developer Advocate)




                                  @starbuxman
                         josh.long@springsource.com
3
Why we’re here...




4
Agenda

• Introduce Spring Batch
    – concepts
    – specifics
    – demos




5
Inside Spring Batch

• Architecture and Domain Overview
• Application concerns and Getting Started
• Chunk-Oriented Processing




6
Spring Batch: Layered Architecture

   Application                    Business Domain – record-level data (e.g. Trade)

        (publicly exposed Batch Execution Environment APIs)

   Batch Core                     Batch Domain – Job, Chunk, Step, Partition, Status
   Batch Execution Environment

   Infrastructure                 Repeat, Retry, Transaction, Input/Output

8
Spring Batch Dependencies

[Diagram: the Spring Batch modules (Samples, Application, Core, Execution,
Infrastructure) build on Spring Core and the Spring Framework; the legend
distinguishes Compile and Configuration dependencies.]

9
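Batch Domain Diagram

[Diagram: the Batch Operator uses JobParameters to identify and manage jobs,
and uses the JobLauncher, which starts, stops and executes jobs; JobParameters
are also used to construct jobs. A Job is the recipe for many (*) JobInstances,
each with many (*) StepInstances, stored in the Database. The Application
Developer configures the Job and its many (*) Steps.]
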
10
Job Configuration and Execution

   Job                   – the EndOfDay Job
     * JobInstance       – the EndOfDay Job for 2011/05/05
         * JobExecution  – the first attempt at the EndOfDay Job for 2011/05/05

   (* = one-to-many: a Job has many JobInstances, a JobInstance has many JobExecutions)

11
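To make the three levels concrete, here is a minimal in-memory sketch. It assumes, as Spring Batch does, that a JobInstance is identified by the job name plus its identifying JobParameters and that every launch creates a new JobExecution. `ToyJobRepository` and `launch` are hypothetical names, and the sketch leaves out the status checks of the real JobRepository (which, for example, refuses to rerun an instance that has already completed).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model of Job -> JobInstance -> JobExecution (not Spring Batch's JobRepository).
final class ToyJobRepository {

    // A JobInstance is identified by the job name plus its identifying parameters.
    static final class JobInstance {
        final String jobName;
        final Map<String, String> parameters;
        // One entry per launch attempt, i.e. per JobExecution.
        final List<String> executions = new ArrayList<>();

        JobInstance(String jobName, Map<String, String> parameters) {
            this.jobName = jobName;
            this.parameters = parameters;
        }
    }

    private final Map<String, JobInstance> instances = new HashMap<>();

    /** Records one launch (a new JobExecution) and returns the JobInstance it belongs to. */
    JobInstance launch(String jobName, Map<String, String> parameters) {
        // Sorting the parameters gives the same key regardless of insertion order.
        Map<String, String> sorted = new TreeMap<>(parameters);
        String key = jobName + sorted;
        JobInstance instance = instances.computeIfAbsent(key, k -> new JobInstance(jobName, sorted));
        instance.executions.add("attempt " + (instance.executions.size() + 1));
        return instance;
    }

    int instanceCount() {
        return instances.size();
    }
}
```

Launching EndOfDay twice for 2011/05/05 yields two executions of one instance, while a different date yields a second instance; this is why launch code often adds the current date as a parameter.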
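Job and Step

   Job            has many (*) Steps
   Job            has many (*) JobInstances
   JobInstance    has many (*) JobExecutions
   JobExecution   has many (*) StepExecutions
   Step Scope     spans a Step and its StepExecution
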
12
DEMO of Spring Batch Application




13
Inside Spring Batch

• Architecture and Domain Overview
• Application concerns and Getting Started
• Chunk-Oriented Processing




14
Application Concerns

•    Getting Started
•    Stateful or Stateless
•    Step Scope
•    Domain Specific Language
•    Resource Management
•    Alternative Approaches
•    Failure Modes




15
Getting Started

   Application Developer
     – implements an ItemReader (input), an ItemProcessor (optional) and an ItemWriter (output)
     – configures a Job, made of one or more (*) Steps

   StepExecutor concerns: RepeatOperations, ExceptionHandler

17
OK, So How Do I start?

• Find the .sql schema script for your database and run it against the database
     – the scripts live in the org.springframework.batch.core package inside spring-batch-core.jar




18
OK, So How Do I start?
@Inject JobLauncher jobLauncher;
@Inject @Qualifier("importData") Job job;

// second 0, minute 15, hours 9 through 17, Monday to Friday
@Scheduled(cron = "0 15 9-17 * * MON-FRI")
public void run15MinutesPastHourDuringBusinessDays() throws Exception {

    Resource samplesResource = new ClassPathResource("/sample/a.csv");
    String absFilePath = "file:///" + samplesResource.getFile().getAbsolutePath();

    // a fresh date makes every launch a new JobInstance
    JobParameters params = new JobParametersBuilder()
        .addString("input.file", absFilePath)
        .addDate("date", new Date())
        .toJobParameters();

    JobExecution jobExecution = jobLauncher.run(job, params);
    // re-read the status on each pass; caching it once would loop forever
    while (jobExecution.getStatus().isRunning()) {
        Thread.sleep(1000);
    }
    JobInstance jobInstance = jobExecution.getJobInstance();
}
19
OK, So How Do I start?

• Or... Deploy the Spring Batch Admin
     – good for operations types
     – good for auditing the batch jobs




20
DEMO of Spring Batch Admin




21
ItemReader


         public interface ItemReader<T> {

         	        T read() throws Exception,
                        UnexpectedInputException,
                        ParseException,
                        NonTransientResourceException;

         }

Returns null at end of dataset

                                      delegate Exception handling to framework




22
Database Cursor input

• Cursor is opened over all data that will be input for a given job
• Each row in the cursor is one ‘item’
• Each call to read() will advance the ResultSet by one row, and
  return one item that is equivalent to one row




23
Database Cursor Input

     Select * from FOO
     where id > 1 and id < 7

     ID   NAME   BAR
     1    foo1   bar1
     2    foo2   bar2    <- 1st read() returns FOO 2 (id=2, name=foo2, bar=bar2)
     3    foo3   bar3    <- 2nd read() returns FOO 3 (id=3, name=foo3, bar=bar3)
     4    foo4   bar4    <- 3rd read() returns FOO 4 (id=4, name=foo4, bar=bar4)
     5    foo5   bar5
     6    foo6   bar6
     7    foo7   bar7
     8    foo8   bar8

24
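The cursor behaviour can be mimicked without a database. This sketch assumes an in-memory list in place of a JDBC ResultSet: the "query" is started once, on the first read(); each call then advances past non-matching rows and returns exactly one matching row; and null signals the end of data, as the ItemReader contract requires. `Foo` and `InMemoryCursorReader` are hypothetical stand-ins, not `JdbcCursorItemReader`.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical row type matching the FOO table on the slides.
final class Foo {
    final int id;
    final String name;
    final String bar;

    Foo(int id, String name, String bar) {
        this.id = id;
        this.name = name;
        this.bar = bar;
    }
}

// Hypothetical reader: one "cursor" over all data, one row per read().
final class InMemoryCursorReader {
    private final List<Foo> table;
    private final Predicate<Foo> where;
    private Iterator<Foo> cursor;   // opened lazily, exactly once
    int queriesExecuted = 0;        // exposed so the single-query behaviour is visible

    InMemoryCursorReader(List<Foo> table, Predicate<Foo> where) {
        this.table = table;
        this.where = where;
    }

    /** Returns the next matching row, or null at end of data. */
    Foo read() {
        if (cursor == null) {       // "execute the query" on first use only
            queriesExecuted++;
            cursor = table.iterator();
        }
        while (cursor.hasNext()) {
            Foo row = cursor.next();
            if (where.test(row)) {
                return row;         // one item per call
            }
        }
        return null;                // end of data
    }
}
```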
Database Cursor input

     @Bean
     public JdbcCursorItemReader reader() {
         JdbcCursorItemReader reader = new JdbcCursorItemReader();
         reader.setDataSource(dataSource());
         reader.setVerifyCursorPosition(true);
         reader.setRowMapper(new PlayerSummaryMapper());
         reader.setSql("SELECT GAMES.player_id, GAMES.year_no, SUM(COMPLETES), " +
             "SUM(ATTEMPTS), SUM(PASSING_YARDS), SUM(PASSING_TD), " +
             "SUM(INTERCEPTIONS), SUM(RUSHES), SUM(RUSH_YARDS), " +
             "SUM(RECEPTIONS), SUM(RECEPTIONS_YARDS), SUM(TOTAL_TD) " +
             "from GAMES, PLAYERS where PLAYERS.player_id = " +
             "GAMES.player_id group by GAMES.player_id, GAMES.year_no");
         return reader;
     }

25
Xml input

• Xml files are separated into fragments based on a root
  element
• Each fragment is sent to Spring OXM for binding.
• One fragment is processed per call to read().
• Synchronized with the transaction to ensure any rollbacks
  won’t cause duplicate records.




26
Xml Input


     <!-- Fragment 1 -->
     <trade>
       <isin>XYZ0001</isin>
       <quantity>5</quantity>
       <price>11.39</price>
       <customer>Customer1</customer>
     </trade>
     <!-- Fragment 2 -->
     <trade>
       <isin>XYZ0002</isin>
       <quantity>2</quantity>
       <price>72.99</price>
       <customer>Customer2c</customer>
     </trade>
     <!-- Fragment 3 -->
     <trade>
       <isin>XYZ0003</isin>
       <quantity>9</quantity>
       <price>99.99</price>
       <customer>Customer3</customer>
     </trade>




27
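The fragment idea can be sketched with the JDK's own StAX API (`javax.xml.stream`) in place of Spring OXM: each read() walks forward to the next `<trade>` element and binds just that fragment to an object. The `Trade` class, the hand-written binding, and the `<records>` wrapper element (the three sibling trades need a root to be well-formed XML) are assumptions for illustration; a real job would hand each fragment to a marshaller such as JAXB2.

```java
import java.io.StringReader;
import java.math.BigDecimal;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical domain object for one <trade> fragment.
final class Trade {
    final String isin;
    final long quantity;
    final BigDecimal price;
    final String customer;

    Trade(String isin, long quantity, BigDecimal price, String customer) {
        this.isin = isin;
        this.quantity = quantity;
        this.price = price;
        this.customer = customer;
    }
}

// Hypothetical fragment reader: one <trade> fragment per read(), null at end of document.
final class TradeFragmentReader {
    private final XMLStreamReader xml;

    TradeFragmentReader(String document) throws XMLStreamException {
        this.xml = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(document));
    }

    Trade read() throws XMLStreamException {
        while (xml.hasNext()) {
            if (xml.next() == XMLStreamConstants.START_ELEMENT && xml.getLocalName().equals("trade")) {
                return bindFragment();
            }
        }
        return null;
    }

    // Assumes every child of <trade> is a simple text-only element.
    private Trade bindFragment() throws XMLStreamException {
        String isin = null;
        String customer = null;
        long quantity = 0;
        BigDecimal price = null;
        while (xml.hasNext()) {
            int event = xml.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = xml.getLocalName();
                String text = xml.getElementText().trim();
                switch (name) {
                    case "isin": isin = text; break;
                    case "quantity": quantity = Long.parseLong(text); break;
                    case "price": price = new BigDecimal(text); break;
                    case "customer": customer = text; break;
                    default: break; // ignore other simple elements
                }
            } else if (event == XMLStreamConstants.END_ELEMENT && xml.getLocalName().equals("trade")) {
                break; // end of this fragment
            }
        }
        return new Trade(isin, quantity, price, customer);
    }
}
```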
Xml Input


[Diagram: Fragments 1, 2 and 3 are handed to Spring OXM, which binds them
with JAXB2, Castor, XmlBeans, or any other binding framework supported by
Spring OXM.]



28
Flat File input

• How lines are read from a file is separated from how a line
  is parsed, which makes additional file formats easy to
  support.
• Parsing is abstracted away from users
• Familiar usage patterns: FieldSets can be worked with in
  similar ways as ResultSets
• Supports both Delimited and Fixed-Length file formats.




29
LineTokenizer

                 public interface LineTokenizer {
                     FieldSet tokenize(String line);
                 }

     A LineTokenizer turns each line into a FieldSet:

                 UK21341EAH45,978,98.34,customer1
                 UK21341EAH46,112,18.12,customer2
                 UK21341EAH47,245,12.78,customer2
                 UK21341EAH48,108,109.25,customer3
                 UK21341EAH49,854,123.39,customer4




30
FieldSetMapper

     public class TradeFieldSetMapper
     	 	 implements FieldSetMapper<Trade> {
     	
       public Trade mapFieldSet(FieldSet fieldSet)
         throws BindException {

             Trade trade = new Trade();
             trade.setIsin(fieldSet.readString(0));
             trade.setQuantity(fieldSet.readLong(1));
             trade.setPrice(fieldSet.readBigDecimal(2));
             trade.setCustomer(fieldSet.readString(3));
             return trade;
         }
     }

31
Column-name Access to FieldSet

@Bean
public DelimitedLineTokenizer tradeTokenizer() throws Exception {
	       DelimitedLineTokenizer dlt = new DelimitedLineTokenizer();
	       dlt.setDelimiter( DelimitedLineTokenizer.DELIMITER_COMMA);
	       dlt.setNames( "ISIN,Quantity,Price,Customer".split(","));
	       return dlt;
}


     Trade trade = new Trade();
     trade.setIsin(fieldSet.readString(0));
     trade.setQuantity(fieldSet.readLong(1));
     trade.setPrice(fieldSet.readBigDecimal(2));
     trade.setCustomer(fieldSet.readString(3));
     return trade;



32
Column-name Access to FieldSet

@Bean
public DelimitedLineTokenizer tradeTokenizer() throws Exception {
	       DelimitedLineTokenizer dlt = new DelimitedLineTokenizer();
	       dlt.setDelimiter( DelimitedLineTokenizer.DELIMITER_COMMA);
	       dlt.setNames( "ISIN,Quantity,Price,Customer".split(","));
	       return dlt;
}


     Trade trade = new Trade();
     trade.setIsin(fieldSet.readString("ISIN"));
     trade.setQuantity(fieldSet.readLong("Quantity"));
     trade.setPrice(fieldSet.readBigDecimal("Price"));
     trade.setCustomer(fieldSet.readString("Customer"));
     return trade;



32
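The tokenizer/mapper split can be sketched with the standard library alone, assuming a plain comma delimiter with no quoting. `SimpleFieldSet`, `TradeLineMapping` and this `Trade` class are hypothetical stand-ins for Spring Batch's `FieldSet`, `DelimitedLineTokenizer` and the mapper above; the point is that once column names are attached, the mapper no longer depends on field positions.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;

// Hypothetical minimal FieldSet: tokenized values plus column names.
final class SimpleFieldSet {
    private final String[] values;
    private final List<String> names;

    SimpleFieldSet(String[] values, String[] names) {
        this.values = values;
        this.names = Arrays.asList(names);
    }

    String readString(int index) {
        return values[index].trim();
    }

    String readString(String name) {
        int index = names.indexOf(name);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown column: " + name);
        }
        return readString(index);
    }

    long readLong(String name) {
        return Long.parseLong(readString(name));
    }

    BigDecimal readBigDecimal(String name) {
        return new BigDecimal(readString(name));
    }
}

// Hypothetical domain object.
final class Trade {
    String isin;
    long quantity;
    BigDecimal price;
    String customer;
}

final class TradeLineMapping {
    static final String[] NAMES = "ISIN,Quantity,Price,Customer".split(",");

    // Tokenizer step: split on commas (no quote handling in this sketch).
    static SimpleFieldSet tokenize(String line) {
        return new SimpleFieldSet(line.split(",", -1), NAMES);
    }

    // Mapper step: column-name access, as on the slide.
    static Trade mapFieldSet(SimpleFieldSet fieldSet) {
        Trade trade = new Trade();
        trade.isin = fieldSet.readString("ISIN");
        trade.quantity = fieldSet.readLong("Quantity");
        trade.price = fieldSet.readBigDecimal("Price");
        trade.customer = fieldSet.readString("Customer");
        return trade;
    }
}
```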
Flat File Parsing errors

• Bad data is very likely to be read in at some point; when
  it is, information about the error is crucial, such as:
     – Current line number in the file
     – Original line
     – Original exception (e.g. NumberFormatException)
• This allows detailed logs that can be processed ad hoc,
  or by a specific job for bad input data.




33
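Keeping that context can be sketched as a loop that catches the original exception and rethrows it wrapped with the line number and the raw line. `LineParseException` and `LineParser` are hypothetical names; in Spring Batch, `FlatFileParseException` plays this role and carries the input line and line number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical exception keeping the details needed to diagnose bad input.
final class LineParseException extends RuntimeException {
    final int lineNumber;
    final String line;

    LineParseException(int lineNumber, String line, Throwable cause) {
        super("Parsing error at line " + lineNumber + ": [" + line + "]", cause);
        this.lineNumber = lineNumber;
        this.line = line;
    }
}

final class LineParser {
    /** Parses every line; any failure is wrapped with its 1-based line number and original text. */
    static <T> List<T> parseAll(List<String> lines, Function<String, T> parser) {
        List<T> items = new ArrayList<>();
        int lineNumber = 0;
        for (String line : lines) {
            lineNumber++;
            try {
                items.add(parser.apply(line));
            } catch (RuntimeException e) {
                // keep the original exception as the cause
                throw new LineParseException(lineNumber, line, e);
            }
        }
        return items;
    }
}
```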
ItemProcessor



                public interface ItemProcessor<I, O> {
                  O process(I item) throws Exception;
                }




Delegate Exception handling to framework




34
Item processors

• optional
     – simple jobs may be constructed entirely with out-of-the-box
       readers and writers
• sit between input and output
• typical customization site for application developers
• good place to coerce data into the right format for output
• chain transformations using CompositeItemProcessor




35
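Chaining can be sketched in plain Java with a hypothetical `Processor` interface shaped like `ItemProcessor`: each delegate's output feeds the next. Following Spring Batch's convention that a processor returning null filters the item out, a null stops the chain early. This illustrates the idea only; it is not `CompositeItemProcessor` itself.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical interface with the same shape as ItemProcessor<I, O>.
interface Processor<I, O> {
    O process(I item) throws Exception;
}

// Hypothetical composite: runs delegates in order, passing each output to the next.
final class ChainedProcessor implements Processor<Object, Object> {
    private final List<Processor<Object, Object>> delegates;

    @SafeVarargs
    ChainedProcessor(Processor<Object, Object>... delegates) {
        this.delegates = Arrays.asList(delegates);
    }

    @Override
    public Object process(Object item) throws Exception {
        Object current = item;
        for (Processor<Object, Object> delegate : delegates) {
            if (current == null) {
                return null; // filtered out by an earlier delegate
            }
            current = delegate.process(current);
        }
        return current;
    }
}
```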
ItemWriter


        public interface ItemWriter<T> {

        	           void write(List<? extends T> items) throws Exception;

        }




expects a “chunk”

                                          delegate Exception handling to framework




36
Item Writers

• handles writing and serializing a row of data
• the input might be the output of a reader or a processor
• handles transactions if necessary and associated
  rollbacks




37
ItemWriters

@Value("#{systemProperties['user.home']}")
private String userHome;

@Bean @Scope("step")
public FlatFileItemWriter writer() {
    FlatFileItemWriter w = new FlatFileItemWriter();
    w.setName("fw1");
    // writes to ${user.home}/batches/results
    File out = new File(this.userHome, "/batches/results");
    Resource res = new FileSystemResource(out);
    w.setResource(res);
    // a LineAggregator is required; this one writes each item's toString()
    w.setLineAggregator(new PassThroughLineAggregator());
    return w;
}



38
ItemWriters
@Bean
public JpaItemWriter jpaWriter() {
	      JpaItemWriter writer = new JpaItemWriter();
	      writer.setEntityManagerFactory( entityManagerFactory() );
	      return writer;
}




39
Application Concerns

•    Getting Started
•    Stateful or Stateless
•    Step Scope
•    Domain Specific Language
•    Resource Management
•    Alternative Approaches
•    Failure Modes




40
Stateful or Stateless?

[Diagram: one Job with two JobInstances, both using the same BusinessItemWriter]

2 JobInstances can access the same ItemWriter concurrently


41
Stateful or Stateless: StepContext

[Diagram: one Job with two JobInstances; the Job has a reference to a step
scoped Proxy, which delegates to a separate BusinessItemWriter for each
JobInstance]

 Concrete ItemWriters are only created once per step execution, as needed

42
Introducing the Step scope

     // Step scoped so that the file stream can be flushed and closed
     // when the step ends
     @Scope("step")
     @Bean // make this bean injectable
     public FlatFileItemReader reader(
          @Value("#{jobParameters['input.file']}")
          Resource input) {
         FlatFileItemReader fr = new FlatFileItemReader();
         fr.setResource(input);
         fr.setLineMapper(lineMapper()); // inner beans inherit the enclosing scope by default
         fr.setSaveState(true);
         return fr;
     }

 Because it is step scoped, the bean has access to the StepContext values
 and can get at them through Spring EL (Spring 3.0 and later).


43
Step Scope Responsibilities

• Create beans for the duration of a step
• Respect Spring bean lifecycle metadata (e.g.
  InitializingBean at start of step, DisposableBean at end
  of step)
• Allows stateful components in a multithreaded
  environment




44
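A rough sketch of what a step scope does, assuming a hypothetical `ToyStepScope` that caches one instance per bean name for the current step execution. The cache lives in a `ThreadLocal` so that steps running on different threads never share a stateful instance, and a destroy callback runs when the step ends. Spring Batch's real scope is registered with the container and hands out proxies; none of that is modelled here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical step scope: one instance per bean name per step execution (per thread).
final class ToyStepScope {
    private final ThreadLocal<Map<String, Object>> beans = ThreadLocal.withInitial(HashMap::new);
    private final ThreadLocal<List<Runnable>> destroyCallbacks = ThreadLocal.withInitial(ArrayList::new);

    /** Returns this step's instance of the bean, creating it lazily on first use. */
    @SuppressWarnings("unchecked")
    <T> T get(String name, Supplier<T> factory, Runnable onDestroy) {
        Map<String, Object> current = beans.get();
        if (!current.containsKey(name)) {
            current.put(name, factory.get());
            destroyCallbacks.get().add(onDestroy);
        }
        return (T) current.get(name);
    }

    /** End of step: run destroy callbacks (e.g. close a file) and forget the instances. */
    void endStep() {
        for (Runnable callback : destroyCallbacks.get()) {
            callback.run();
        }
        beans.remove();
        destroyCallbacks.remove();
    }
}
```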
Application Concerns

•    Getting Started
•    Stateful or Stateless
•    Step Scope
•    Domain Specific Language
•    Resource Management
•    Alternative Approaches
•    Failure Modes




45
Domain Specific Language

• Keeping track of all the application concerns can be a
  large overhead on a project
• Need a DSL to simplify configuration of jobs
• DSL can hide details and be aware of specific things like
  well-known infrastructure that needs to be step scoped
• The Spring way of dealing with this is to use a custom
  namespace




46
XML Namespace Example
<job id="skipJob" incrementer="incrementer"
     xmlns="http://www.springframework.org/schema/batch">
  <step id="step1">
    <tasklet>
      <!-- chunk has input, output and a processor -->
      <chunk reader="fileItemReader" processor="tradeProcessor"
             writer="tradeWriter" commit-interval="3" skip-limit="10">
        <!-- lots of control over errors -->
        <skippable-exception-classes>
          <include class="....FlatFileParseException" />
          <include class="....WriteFailedException" />
        </skippable-exception-classes>
      </chunk>
    </tasklet>
    <!-- flow control from one step to another -->
    <next on="*" to="step2" />
    <next on="COMPLETED WITH SKIPS" to="errorPrint1" />
    <fail on="FAILED" exit-code="FAILED" />
  </step>

 ...

</job>
47
Application Concerns

•    Getting Started
•    Stateful or Stateless
•    Step Scope
•    Domain Specific Language
•    Resource Management
•    Alternative Approaches
•    Failure Modes




48
Resource Management Responsibilities

• Open resource (lazy initialisation)
• Close resource at end of step
• Close resource on failure
• Synchronize output file with transaction – rollback resets
  file pointer
• Synchronize cumulative state for restart
     – File pointer, cursor position, processed keys, etc.
     – Statistics: number of items processed, etc.
• Other special cases
     – Hibernate flush policy
     – JDBC batch update



49
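Synchronizing cumulative state for restart can be sketched with a reader that stores its position in a map standing in for the step's ExecutionContext. The open/update method names follow Spring Batch's ItemStream callbacks, but the class, the plain `Map`, and the `read.count` key are assumptions for illustration. The position is saved only at commit, so a restart resumes right after the last committed chunk.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical restartable reader in the spirit of ItemStream open/update.
final class RestartableListReader {
    private static final String COUNT_KEY = "read.count"; // assumed key name
    private final List<String> items;
    private Iterator<String> iterator;
    private int count;

    RestartableListReader(List<String> items) {
        this.items = items;
    }

    /** Opens the reader, skipping everything committed by a previous run. */
    void open(Map<String, Integer> executionContext) {
        count = executionContext.getOrDefault(COUNT_KEY, 0);
        iterator = items.listIterator(count);
    }

    /** Returns the next item, or null at end of data. */
    String read() {
        if (!iterator.hasNext()) {
            return null;
        }
        count++;
        return iterator.next();
    }

    /** Saves the position; meant to be called at each commit. */
    void update(Map<String, Integer> executionContext) {
        executionContext.put(COUNT_KEY, count);
    }
}
```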
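File Output Resource Synchronization

 Participants: Writer, TransactionSynchronizationManager,
 TransactionSynchronization, FileResource

 write(items)   – the Writer creates a TransactionSynchronization and
                  registers it (register(…)) with the
                  TransactionSynchronizationManager; mark() is called
                  on the FileResource
 write(items)
 write(items)
 Error!
 rollback()     – reset() is called on the FileResource, returning the
                  file pointer to the mark
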
  50
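The mark/reset sequence can be sketched with an in-memory buffer standing in for the output file: a hypothetical `TransactionalOutput` marks the current length when a transaction begins and truncates back to that mark on rollback, so a failed chunk leaves no partial lines behind. Spring Batch does this through a registered `TransactionSynchronization`; the class and method names here are assumptions.

```java
// Hypothetical stand-in for a transaction-aware output file.
final class TransactionalOutput {
    private final StringBuilder file = new StringBuilder(); // pretend this is the file on disk
    private int mark = -1;                                  // file pointer at transaction start

    /** Transaction begins: remember the current file pointer (mark()). */
    void begin() {
        mark = file.length();
    }

    void write(String line) {
        if (mark < 0) {
            throw new IllegalStateException("no active transaction");
        }
        file.append(line).append('\n');
    }

    /** Commit: keep everything written since the mark. */
    void commit() {
        mark = -1;
    }

    /** Rollback: move the file pointer back to the mark (reset()). */
    void rollback() {
        if (mark < 0) {
            throw new IllegalStateException("no active transaction");
        }
        file.setLength(mark);
        mark = -1;
    }

    String contents() {
        return file.toString();
    }
}
```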
Hibernate Flush Policy

• First problem
     – If we do not flush manually, transaction manager handles
       automatically
     – …but exception comes from inside transaction manager, so
       cannot be caught and analyzed naturally by StepExecutor
     – Solution: flush manually on chunk boundaries
• Second problem
     – Errors cannot be associated with an individual item
     – Two alternatives
        • Binary search through chunk looking for failure = O(logN)
        • Aggressively flush after each item = O(N)
• HibernateItemWriter



51
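The binary-search alternative can be sketched as follows, assuming that a trial write (flush, then roll back) of any sub-chunk fails exactly when it contains a bad item. `ChunkBisect` is a hypothetical name and the `Predicate` stands in for "this sub-chunk flushed cleanly". For a chunk of N items it needs about log2(N) trial writes after the first failure, instead of N when flushing after every item.

```java
import java.util.List;
import java.util.function.Predicate;

final class ChunkBisect {
    /**
     * Returns the index of the first bad item, or -1 if the whole chunk writes cleanly.
     * writeSucceeds stands in for "write and flush this sub-chunk, then roll it back".
     */
    static <T> int firstFailingIndex(List<T> chunk, Predicate<List<T>> writeSucceeds) {
        if (writeSucceeds.test(chunk)) {
            return -1;
        }
        // Invariant: the first bad item lies somewhere in [lo, hi].
        int lo = 0;
        int hi = chunk.size() - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (writeSucceeds.test(chunk.subList(0, mid + 1))) {
                lo = mid + 1; // prefix [0, mid] is clean
            } else {
                hi = mid;     // the failure is at or before mid
            }
        }
        return lo;
    }
}
```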
Application Concerns

•    Getting Started
•    Stateful or Stateless
•    Step Scope
•    Domain Specific Language
•    Resource Management
•    Alternative Approaches
•    Failure Modes




52
Tasklet: Alternative Approaches for Application Developer

• Sometimes Input/Output is not the way that a job is
  structured
• E.g. Stored Procedure does everything in one go, but we
  want to manage it as a Step in a Job
• E.g. repetitive process where legacy code prevents
  ItemReader/Processor being identified
• For these cases we provide Tasklet




53
Tasklet
 public interface Tasklet {

     RepeatStatus execute(StepContribution contribution,
                          ChunkContext chunkContext) throws Exception;
 }




Signal to framework about end of business process:

     RepeatStatus.CONTINUABLE              delegate Exception handling to framework
     RepeatStatus.FINISHED




54
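The CONTINUABLE/FINISHED contract can be sketched with a hypothetical enum and interface that only mirror the shape of `RepeatStatus` and `Tasklet` (the real `execute` also receives a `StepContribution` and a `ChunkContext`, omitted here). The framework keeps calling the tasklet until it reports FINISHED; in Spring Batch each call also runs in its own transaction, which this sketch does not model. The `maxIterations` guard is an added assumption.

```java
// Hypothetical mirror of RepeatStatus.
enum Status { CONTINUABLE, FINISHED }

// Hypothetical mirror of Tasklet without the StepContribution/ChunkContext arguments.
interface SimpleTasklet {
    Status execute() throws Exception;
}

final class TaskletRunner {
    /** Calls the tasklet until it reports FINISHED; returns the number of calls made. */
    static int run(SimpleTasklet tasklet, int maxIterations) throws Exception {
        for (int calls = 1; calls <= maxIterations; calls++) {
            if (tasklet.execute() == Status.FINISHED) {
                return calls;
            }
        }
        throw new IllegalStateException("tasklet did not finish within " + maxIterations + " calls");
    }
}
```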
Getting Started With a Tasklet

   Application Developer
     – implements a Tasklet
     – configures a Job, made of one or more (*) Steps

   StepExecutor concerns: RepeatOperations, ExceptionHandler


55
XML Namespace Example with Tasklet

<job id="loopJob" xmlns="http://www.springframework.org/schema/batch">
    <step id="step1">
        <!-- Tasklet Step has a Tasklet -->
        <tasklet ref="myCustomTasklet">
            <transaction-attributes propagation="REQUIRED"/>
        </tasklet>
    </step>
</job>




56
Application Concerns

•    Getting Started
•    Stateful or Stateless
•    Step Scope
•    Domain Specific Language
•    Resource Management
•    Alternative Approaches
•    Failure Modes




57
Failure Modes

• Bad input record (e.g. parse exception)
     – Response: mark item as skipped but exclude from Chunk
     – Alternative: abort Job when skipLimit reached
• Bad output record – business or data integrity exception
     – Response: mark item as skipped; retry chunk excluding bad item
     – Alternative: abort Job when skipLimit reached (after Chunk completes)
• Bad output record – deadlock loser exception
     – Response: retry Chunk including bad item
     – Alternative: abort Chunk when retry limit reached
• Bad Chunk – e.g. JDBC batch update fails
     – Response: retry Chunk but flush and commit after every item
     – Alternative: discard entire chunk; binary search for failed item
• Output resource failure – e.g. disk full, permanent network outage
     – Response: graceful abort; attempt to save batch data
     – Alternative: if the database is unavailable, meta data remains
       in “running” state!




58
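The first row of the list (skip bad input, abort once the skip limit is reached) can be sketched with a hypothetical `SkippingProcessor` that treats one exception type as skippable. In Spring Batch the equivalent is declared with `skip-limit` and `skippable-exception-classes`, as in the namespace example; the names and signature below are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical exception raised when too many items have been skipped.
final class SkipLimitExceeded extends RuntimeException {
    SkipLimitExceeded(int limit, Throwable cause) {
        super("Skip limit of " + limit + " exceeded", cause);
    }
}

final class SkippingProcessor {
    /** Processes items, skipping those that throw `skippable`; aborts on the skip after `skipLimit`. */
    static <I, O> List<O> processAll(List<I> items, Function<I, O> processor,
                                     Class<? extends RuntimeException> skippable, int skipLimit,
                                     List<I> skipped) {
        List<O> output = new ArrayList<>();
        for (I item : items) {
            try {
                output.add(processor.apply(item));
            } catch (RuntimeException e) {
                if (!skippable.isInstance(e)) {
                    throw e;                  // not skippable: fail immediately
                }
                if (skipped.size() >= skipLimit) {
                    throw new SkipLimitExceeded(skipLimit, e);
                }
                skipped.add(item);            // mark as skipped, exclude from output
            }
        }
        return output;
    }
}
```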
Inside Spring Batch

• Architecture and Domain Overview
• Application concerns and Getting Started
• Chunk-Oriented Processing




59
Chunk-Oriented Processing

• Input-output can be grouped together = Item-Oriented
  Processing (e.g. Tasklet)
• …or input can be aggregated into chunks first = Chunk-
  Oriented Processing (the chunk element of a tasklet)
• Chunk processing can be encapsulated, and independent
  decisions can be made about (e.g.) partial failure and retry
• Step = Chunk Oriented (default, and more common)
• Tasklet = Item Oriented
• Here we compare and contrast the two approaches




60
Item-Oriented Pseudo Code

REPEAT(while more input) {          // RepeatTemplate
     TX {                           // TransactionTemplate
            REPEAT(size=500) {      // RepeatTemplate
                  input;            // Business Logic
                  output;           // Business Logic
            }
     }
}


61
Item-Oriented Retry and Repeat



     REPEAT(while more input
              AND exception[not critical]) {
         TX {
              REPEAT(size=500) {
                   RETRY(exception=[deadlock loser]) {
                        input;
                   } PROCESS {
                        output;
                   } SKIP and RECOVER {
                        notify;
                   }
              }
         }
     }



62
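The RETRY(exception=[deadlock loser]) block can be sketched as a small helper, assuming a hypothetical `DeadlockLoserException` and a fixed total number of attempts. Spring Batch's `RetryTemplate` (moved to Spring Retry in later versions) provides this with pluggable retry and back-off policies, which this sketch does not reproduce.

```java
import java.util.concurrent.Callable;

// Hypothetical transient failure, standing in for a database deadlock-loser error.
final class DeadlockLoserException extends RuntimeException {
    DeadlockLoserException(String message) {
        super(message);
    }
}

final class SimpleRetry {
    /** Runs the callback, retrying only on DeadlockLoserException, up to maxAttempts in total. */
    static <T> T execute(Callable<T> callback, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        DeadlockLoserException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return callback.call();
            } catch (DeadlockLoserException e) {
                last = e; // transient: try again
            }
        }
        throw last; // retry limit reached: give up
    }
}
```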
Chunk-Oriented Pseudo Code



     REPEAT(while more input) {
       chunk = ACCUMULATE(size=500) {
         input;
       }
       RETRY {
         TX {
            for (item : chunk) { output; }
         }
       }
     }




63
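The chunk-oriented loop can be turned into a runnable sketch, assuming an `Iterator` as the reader, a `Consumer<List<T>>` as the writer, and a plain retry around the write; the transaction is only implied (a real step would begin and commit around each write). Because the chunk is buffered, a retry re-writes the same items without re-reading input. `ChunkLoop.run` is a hypothetical name, not Spring Batch's chunk-oriented tasklet.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

final class ChunkLoop {
    /** REPEAT(while more input) { ACCUMULATE(size) { input } RETRY { TX { output } } } */
    static <T> int run(Iterator<T> reader, Consumer<List<T>> writer, int chunkSize, int maxAttempts) {
        int chunks = 0;
        while (reader.hasNext()) {
            // ACCUMULATE: read up to chunkSize items, outside the retry
            List<T> chunk = new ArrayList<>(chunkSize);
            while (chunk.size() < chunkSize && reader.hasNext()) {
                chunk.add(reader.next());
            }
            // RETRY { TX { write the whole chunk } }
            for (int attempt = 1; ; attempt++) {
                try {
                    writer.accept(chunk);
                    break;
                } catch (RuntimeException e) {
                    if (attempt >= maxAttempts) {
                        throw e; // retry limit reached
                    }
                }
            }
            chunks++;
        }
        return chunks;
    }
}
```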
Summary

• Spring Batch manages the loose ends so you can sleep
• Easy to Use Along with Other Spring Services
• Things We Didn’t Talk About:
     – remote chunking
     – all the ItemReaders/Writers
• Spring Batch Admin Lets Operations Types Sleep Easier




64

Joshua Long
 
Extending spring
Joshua Long
 
The spring 32 update final
Joshua Long
 
Multi Client Development with Spring
Joshua Long
 
Integration and Batch Processing on Cloud Foundry
Joshua Long
 
using Spring and MongoDB on Cloud Foundry
Joshua Long
 
Spring in-the-cloud
Joshua Long
 
Multi Client Development with Spring
Joshua Long
 
The Cloud Foundry bootcamp talk from SpringOne On The Road - Europe
Joshua Long
 
A Walking Tour of (almost) all of Springdom
Joshua Long
 
Multi client Development with Spring
Joshua Long
 
Cloud Foundry Bootcamp
Joshua Long
 

Recently uploaded (20)

PPTX
Simplifica la seguridad en la nube y la detección de amenazas con FortiCNAPP
Cristian Garcia G.
 
PDF
FME as an Orchestration Tool with Principles From Data Gravity
Safe Software
 
PDF
Python Conference Singapore - 19 Jun 2025
ninefyi
 
PDF
Darley - FIRST Copenhagen Lightning Talk (2025-06-26) Epochalypse 2038 - Time...
treyka
 
PDF
Salesforce Summer '25 Release Frenchgathering.pptx.pdf
yosra Saidani
 
PDF
Redefining Work in the Age of AI - What to expect? How to prepare? Why it mat...
Malinda Kapuruge
 
PDF
From Chatbot to Destroyer of Endpoints - Can ChatGPT Automate EDR Bypasses (1...
Priyanka Aash
 
PDF
Open Source Milvus Vector Database v 2.6
Zilliz
 
PPTX
𝙳𝚘𝚠𝚗𝚕𝚘𝚊𝚍—Wondershare Filmora Crack 14.0.7 + Key Download 2025
sebastian aliya
 
PDF
“MPU+: A Transformative Solution for Next-Gen AI at the Edge,” a Presentation...
Edge AI and Vision Alliance
 
PDF
Quantum AI Discoveries: Fractal Patterns Consciousness and Cyclical Universes
Saikat Basu
 
PPTX
CapCut Pro Crack For PC Latest Version {Fully Unlocked} 2025
pcprocore
 
PDF
The Growing Value and Application of FME & GenAI
Safe Software
 
PPTX
Smarter Governance with AI: What Every Board Needs to Know
OnBoard
 
PPSX
Usergroup - OutSystems Architecture.ppsx
Kurt Vandevelde
 
PPTX
reInforce 2025 Lightning Talk - Scott Francis.pptx
ScottFrancis51
 
PDF
Hyderabad MuleSoft In-Person Meetup (June 21, 2025) Slides
Ravi Tamada
 
PDF
2025_06_18 - OpenMetadata Community Meeting.pdf
OpenMetadata
 
PPTX
Paycifi - Programmable Trust_Breakfast_PPTXT
FinTech Belgium
 
PDF
Cracking the Code - Unveiling Synergies Between Open Source Security and AI.pdf
Priyanka Aash
 
Simplifica la seguridad en la nube y la detección de amenazas con FortiCNAPP
Cristian Garcia G.
 
FME as an Orchestration Tool with Principles From Data Gravity
Safe Software
 
Python Conference Singapore - 19 Jun 2025
ninefyi
 
Darley - FIRST Copenhagen Lightning Talk (2025-06-26) Epochalypse 2038 - Time...
treyka
 
Salesforce Summer '25 Release Frenchgathering.pptx.pdf
yosra Saidani
 
Redefining Work in the Age of AI - What to expect? How to prepare? Why it mat...
Malinda Kapuruge
 
From Chatbot to Destroyer of Endpoints - Can ChatGPT Automate EDR Bypasses (1...
Priyanka Aash
 
Open Source Milvus Vector Database v 2.6
Zilliz
 
𝙳𝚘𝚠𝚗𝚕𝚘𝚊𝚍—Wondershare Filmora Crack 14.0.7 + Key Download 2025
sebastian aliya
 
“MPU+: A Transformative Solution for Next-Gen AI at the Edge,” a Presentation...
Edge AI and Vision Alliance
 
Quantum AI Discoveries: Fractal Patterns Consciousness and Cyclical Universes
Saikat Basu
 
CapCut Pro Crack For PC Latest Version {Fully Unlocked} 2025
pcprocore
 
The Growing Value and Application of FME & GenAI
Safe Software
 
Smarter Governance with AI: What Every Board Needs to Know
OnBoard
 
Usergroup - OutSystems Architecture.ppsx
Kurt Vandevelde
 
reInforce 2025 Lightning Talk - Scott Francis.pptx
ScottFrancis51
 
Hyderabad MuleSoft In-Person Meetup (June 21, 2025) Slides
Ravi Tamada
 
2025_06_18 - OpenMetadata Community Meeting.pdf
OpenMetadata
 
Paycifi - Programmable Trust_Breakfast_PPTXT
FinTech Belgium
 
Cracking the Code - Unveiling Synergies Between Open Source Security and AI.pdf
Priyanka Aash
 

Spring Batch Behind the Scenes

  • 1. Spring Batch Bootcamp • Your host: Josh Long, SpringSource (a division of VMware) • Code: github.com/joshlong/spring-batch-bootcamp • Deck: slideshare.net/joshlong/spring-batch-behind-the-scenes
  • 2. A huge amount of this came from Dr. Dave Syer (he’s awesome)
  • 3. About Josh Long (Spring Developer Advocate) • @starbuxman • josh.long@springsource.com
  • 5. Agenda • Introduce Spring Batch: concepts, specifics, demos
  • 6. Inside Spring Batch • Architecture and Domain Overview • Application Concerns and Getting Started • Chunk-Oriented Processing
  • 7. Inside Spring Batch • Architecture and Domain Overview • Application Concerns and Getting Started • Chunk-Oriented Processing
  • 8. Spring Batch: Layered Architecture • Application • Batch Core and Batch Execution Environment • Infrastructure
  • 9. Spring Batch: Layered Architecture
      – Application: Business Domain, i.e. record-level data (e.g. Trade)
      – Batch Core / Batch Execution Environment: Batch Domain, i.e. Job, Chunk, Step, Partition, Status
      – Infrastructure: Repeat, Retry, Transaction, Input/Output
  • 10. Spring Batch: Layered Architecture
      – Application: Business Domain, i.e. record-level data (e.g. Trade)
      – Publicly exposed Batch Execution Environment APIs
      – Batch Core / Batch Execution Environment: Batch Domain, i.e. Job, Chunk, Step, Partition, Status
      – Infrastructure: Repeat, Retry, Transaction, Input/Output
  • 11. Spring Batch Dependencies (diagram: Spring Batch Samples, Application, Core, Execution and Infrastructure on top of Spring Core and the Spring Framework; the legend distinguishes Compile and Configuration dependencies)
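  • 12. Batch Domain Diagram (diagram: the Batch Operator uses JobParameters to identify and manage jobs, and uses the JobLauncher to start and stop them; the JobLauncher uses JobParameters to construct jobs and executes the Job; a Job is the recipe for many JobInstances (*) and contains many Steps (*); each JobInstance has many StepInstances (*) and is stored in the Database; the Application Developer configures the Job)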
  • 13. Job Configuration and Execution
      – Job: "The EndOfDay Job"
      – JobInstance (many per Job): "The EndOfDay Job for 2011/05/05"
      – JobExecution (many per JobInstance): "The first attempt at EndOfDay Job for 2011/05/05"
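The Job / JobInstance / JobExecution split on this slide can be modeled in a few lines of plain Java. This is a hypothetical sketch, not the Spring Batch API. `JobRepositorySketch`, `InstanceSketch` and `launch` are invented names. The sketch assumes only what the slide shows: a JobInstance is identified by the job name plus its parameters (here just the business date), and every launch of that instance is recorded as a new JobExecution.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the batch domain (not Spring Batch classes).
final class InstanceSketch {
    final String jobName;
    final String date;                                  // the identifying parameter
    final List<String> executions = new ArrayList<>();  // one entry per attempt (JobExecution)

    InstanceSketch(String jobName, String date) {
        this.jobName = jobName;
        this.date = date;
    }
}

final class JobRepositorySketch {
    // key = job name + identifying parameters -> the single JobInstance for that key
    private final Map<String, InstanceSketch> instances = new HashMap<>();

    /** Records a new execution (attempt) and returns the instance it belongs to. */
    InstanceSketch launch(String jobName, String date) {
        InstanceSketch instance = instances.computeIfAbsent(
                jobName + "|" + date, key -> new InstanceSketch(jobName, date));
        instance.executions.add("attempt " + (instance.executions.size() + 1));
        return instance;
    }

    int instanceCount() {
        return instances.size();
    }
}
```

Suppose EndOfDay is launched twice for 2011/05/05, for example a restart after a failure. That yields one instance with two executions, while 2011/05/06 gets a new instance. The sketch ignores status. Spring Batch itself refuses to relaunch an instance that has already completed with the same parameters.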
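  • 14. Job and Step (diagram: a Job has many Steps (*) and many JobInstances (*); a JobInstance has many JobExecutions (*); a JobExecution has many StepExecutions (*); part of the diagram is labeled Step Scope)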
  • 15. DEMO of a Spring Batch Application
  • 16. Inside Spring Batch • Architecture and Domain Overview • Application Concerns and Getting Started • Chunk-Oriented Processing
  • 17. Application Concerns • Getting Started • Stateful or Stateless • Step Scope • Domain Specific Language • Resource Management • Alternative Approaches • Failure Modes
  • 18. Application Concerns • Getting Started • Stateful or Stateless • Step Scope • Domain Specific Language • Resource Management • Alternative Approaches • Failure Modes
  • 19. Getting Started (diagram: the Application Developer implements an ItemReader (input), an optional ItemProcessor and an ItemWriter (output), and configures a Job made of Steps (*); the framework concerns of each Step are the StepExecutor, RepeatOperations and ExceptionHandler)
  • 20. OK, So How Do I Start? • Find the schema .sql script for your database and run it against that database. The scripts live in the org.springframework.batch.core package inside spring-batch-core.jar
  • 21. OK, So How Do I Start?
        @Inject JobLauncher jobLauncher;

        @Inject @Qualifier("importData") Job job;

        // Spring cron fields: second minute hour day-of-month month day-of-week,
        // i.e. at hh:15:00 for hours 9 to 17, Monday to Friday
        @Scheduled(cron = "0 15 9-17 * * MON-FRI")
        public void run15MinutesPastHourDuringBusinessDays() throws Throwable {
            Resource samplesResource = new ClassPathResource("/sample/a.csv");
            String absFilePath = "file:///" + samplesResource.getFile().getAbsolutePath();
            JobParameters params = new JobParametersBuilder()
                    .addString("input.file", absFilePath)
                    .addDate("date", new Date())   // a new date makes each run a new JobInstance
                    .toJobParameters();
            JobExecution jobExecution = jobLauncher.run(job, params);
            // re-read the status on every pass; an asynchronous launcher updates it in the background
            while (jobExecution.getStatus().isRunning())
                Thread.sleep(1000);
            JobInstance jobInstance = jobExecution.getJobInstance();
        }
  • 22. OK, So How Do I Start? • Or deploy Spring Batch Admin – good for operations types – good for auditing the batch jobs
  • 23. DEMO of Spring Batch Admin
  • 24. ItemReader
        public interface ItemReader<T> {
            T read() throws Exception, UnexpectedInputException,
                    ParseException, NonTransientResourceException;
        }
      – returns null at the end of the dataset
      – delegates exception handling to the framework
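Here is a minimal sketch of the read() contract above. It assumes a hypothetical `SimpleItemReader` interface (not Spring Batch's `ItemReader`) and uses an in-memory list as the data source. The `drain` helper shows how a caller detects the end of input: it keeps calling read() until it gets null.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the slide's contract (not Spring Batch's ItemReader):
// read() returns one item per call and null once the data is exhausted.
interface SimpleItemReader<T> {
    T read() throws Exception;
}

// Reader over an in-memory list; each call hands out the next element.
final class ListReaderSketch<T> implements SimpleItemReader<T> {
    private final Iterator<T> items;

    ListReaderSketch(List<T> items) {
        this.items = items.iterator();
    }

    @Override
    public T read() {
        return items.hasNext() ? items.next() : null; // null = end of input
    }
}

final class ReaderDrain {
    /** Calls read() until it returns null, the way a step detects the end of input. */
    static <T> List<T> drain(SimpleItemReader<T> reader) throws Exception {
        List<T> out = new ArrayList<>();
        for (T item = reader.read(); item != null; item = reader.read()) {
            out.add(item);
        }
        return out;
    }
}
```

Because null means "no more data", a reader following this contract cannot hand out null as a real item.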
  • 25. Database Cursor Input • A cursor is opened over all the data that will be input for a given job • Each row in the cursor is one ‘item’ • Each call to read() advances the ResultSet by one row and returns one item corresponding to that row
  • 26. Database Cursor Input (table FOO)
        ID  NAME  BAR
        1   foo1  bar1
        2   foo2  bar2
        3   foo3  bar3
        4   foo4  bar4
        5   foo5  bar5
        6   foo6  bar6
        7   foo7  bar7
        8   foo8  bar8
  • 27. Database Cursor Input • Query over table FOO: Select * from FOO where id > 1 and id < 7 • The cursor is on row 2: read() returns FOO(id=2, name=foo2, bar=bar2)
  • 28. Database Cursor Input • Same query • The cursor advances to row 3: read() returns FOO(id=3, name=foo3, bar=bar3)
  • 29. Database Cursor Input • Same query • The cursor advances to row 4: read() returns FOO(id=4, name=foo4, bar=bar4)
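The cursor walk on slides 27 to 29 can be simulated without a database. This is a hypothetical in-memory sketch, not JdbcCursorItemReader. `FooRow` and `CursorReaderSketch` are invented names. The query `where id > 1 and id < 7` is modeled as a filter applied once when the cursor opens, and each read() advances exactly one row. A real JDBC cursor streams rows from the database instead of copying them up front.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical row type standing in for one row of table FOO.
final class FooRow {
    final int id;
    final String name;
    final String bar;

    FooRow(int id, String name, String bar) {
        this.id = id;
        this.name = name;
        this.bar = bar;
    }
}

// Simulated forward-only cursor: the "query" runs once when the cursor opens,
// then every read() moves the position forward by exactly one row.
final class CursorReaderSketch {
    private final List<FooRow> resultSet = new ArrayList<>();
    private int position = -1; // before the first row, like a fresh ResultSet

    CursorReaderSketch(List<FooRow> table, Predicate<FooRow> where) {
        for (FooRow row : table) {
            if (where.test(row)) {
                resultSet.add(row);
            }
        }
    }

    /** Advances one row and returns it, or null when the cursor is exhausted. */
    FooRow read() {
        position++;
        return position < resultSet.size() ? resultSet.get(position) : null;
    }

    /** The eight-row FOO table from slide 26. */
    static List<FooRow> fooTable() {
        List<FooRow> rows = new ArrayList<>();
        for (int i = 1; i <= 8; i++) {
            rows.add(new FooRow(i, "foo" + i, "bar" + i));
        }
        return rows;
    }
}
```

With the slide's predicate, successive reads return rows 2, 3, 4, 5 and 6, and then null.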
  • 30. Database Cursor Input
        @Bean
        public JdbcCursorItemReader reader() {
            JdbcCursorItemReader reader = new JdbcCursorItemReader();
            reader.setDataSource(dataSource());
            reader.setVerifyCursorPosition(true);
            reader.setRowMapper(new PlayerSummaryMapper());
            reader.setSql("SELECT GAMES.player_id, GAMES.year_no, SUM(COMPLETES), " +
                    "SUM(ATTEMPTS), SUM(PASSING_YARDS), SUM(PASSING_TD), " +
                    "SUM(INTERCEPTIONS), SUM(RUSHES), SUM(RUSH_YARDS), " +
                    "SUM(RECEPTIONS), SUM(RECEPTIONS_YARDS), SUM(TOTAL_TD) " +
                    "from GAMES, PLAYERS where PLAYERS.player_id = " +
                    "GAMES.player_id group by GAMES.player_id, GAMES.year_no");
            return reader;
        }
  • 31. XML Input • XML files are split into fragments based on a root element • Each fragment is sent to Spring OXM for binding • One fragment is processed per call to read() • Reading is synchronized with the transaction, so a rollback won’t cause duplicate records
  • 32. XML Input
        <!-- Fragment 1 -->
        <trade>
            <isin>XYZ0001</isin>
            <quantity>5</quantity>
            <price>11.39</price>
            <customer>Customer1</customer>
        </trade>
        <!-- Fragment 2 -->
        <trade>
            <isin>XYZ0002</isin>
            <quantity>2</quantity>
            <price>72.99</price>
            <customer>Customer2c</customer>
        </trade>
        <!-- Fragment 3 -->
        <trade>
            <isin>XYZ0003</isin>
            <quantity>9</quantity>
            <price>99.99</price>
            <customer>Customer3</customer>
        </trade>
  • 33. XML Input (diagram: Fragments 1, 2 and 3 are passed to Spring OXM, which binds them with JAXB2, Castor, XMLBeans or any other binding framework Spring OXM supports)
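The fragment-per-root-element idea can be sketched with the JDK's own StAX API (`javax.xml.stream`). This illustrates the idea under several assumptions and is not StaxEventItemReader:

- `TradeFragmentSketch` is a hypothetical name.
- The "binding" step is just a map of leaf element names to text, not a Spring OXM unmarshaller.
- All fragments are collected at once, rather than one per read().
- The sample `<trade>` elements sit inside one wrapper element, since a well-formed XML document needs a single root.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical splitter: every <rootElement> becomes one fragment, "bound" to a map of
// its leaf elements. Stands in for a StAX-based reader plus a Spring OXM unmarshaller.
final class TradeFragmentSketch {

    static List<Map<String, String>> readFragments(String xml, String rootElement)
            throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<Map<String, String>> fragments = new ArrayList<>();
        Map<String, String> current = null;
        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                if (r.getLocalName().equals(rootElement)) {
                    current = new LinkedHashMap<>();   // a new fragment begins
                } else if (current != null) {
                    // leaf element: getElementText() reads its text and stops on its end tag
                    current.put(r.getLocalName(), r.getElementText());
                }
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && r.getLocalName().equals(rootElement)) {
                fragments.add(current);                // fragment complete
                current = null;
            }
        }
        r.close();
        return fragments;
    }
}
```

Elements outside a fragment, such as the wrapper, are ignored. Elements inside a fragment are assumed to contain text only, because `getElementText()` fails on nested elements.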
  • 34. Flat File Input • How lines are read from a file is separated from how a line is parsed, which makes additional file formats easy to support • Parsing is abstracted away from users • Familiar usage patterns: FieldSets are used much like JDBC ResultSets • Supports both delimited and fixed-length file formats
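Separating "read a line" from "parse a line" makes new formats cheap. To illustrate, here is a plain-Java sketch with a hypothetical `LineTokenizerSketch` interface and two implementations, one delimited and one fixed-length. The column offsets (0-based start positions) are assumptions chosen for the sample trade line. Spring Batch's own tokenizers are configured differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tokenizer abstraction: turning a line into fields is independent of how
// lines are read, so supporting a new format only needs a new tokenizer.
interface LineTokenizerSketch {
    List<String> tokenize(String line);
}

final class Tokenizers {
    /** Delimited format: fields separated by one delimiter character (empty fields kept). */
    static LineTokenizerSketch delimited(char delimiter) {
        return line -> {
            List<String> fields = new ArrayList<>();
            int start = 0;
            for (int i = 0; i <= line.length(); i++) {
                if (i == line.length() || line.charAt(i) == delimiter) {
                    fields.add(line.substring(start, i));
                    start = i + 1;
                }
            }
            return fields;
        };
    }

    /** Fixed-length format: field k occupies columns [starts[k], starts[k + 1]). */
    static LineTokenizerSketch fixedLength(int... starts) {
        return line -> {
            List<String> fields = new ArrayList<>();
            for (int k = 0; k < starts.length; k++) {
                int end = k + 1 < starts.length ? starts[k + 1] : line.length();
                fields.add(line.substring(starts[k], end).trim());
            }
            return fields;
        };
    }
}
```

Either tokenizer turns its input into the same four fields, so whatever maps fields to a Trade does not care which file format was read.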
  • 35. LineTokenizer
        public interface LineTokenizer {
            FieldSet tokenize(String line);
        }
      – the LineTokenizer turns each input line into a FieldSet, e.g. for:
        UK21341EAH45,978,98.34,customer1
        UK21341EAH46,112,18.12,customer2
        UK21341EAH47,245,12.78,customer2
        UK21341EAH48,108,109.25,customer3
        UK21341EAH49,854,123.39,customer4
  • 36. FieldSetMapper
        public class TradeFieldSetMapper implements FieldSetMapper<Trade> {
            public Trade mapFieldSet(FieldSet fieldSet) throws BindException {
                Trade trade = new Trade();
                trade.setIsin(fieldSet.readString(0));
                trade.setQuantity(fieldSet.readLong(1));
                trade.setPrice(fieldSet.readBigDecimal(2));
                trade.setCustomer(fieldSet.readString(3));
                return trade;
            }
        }
  • 37. Column-name Access to FieldSet
        @Bean
        public DelimitedLineTokenizer tradeTokenizer() throws Exception {
            DelimitedLineTokenizer dlt = new DelimitedLineTokenizer();
            dlt.setDelimiter(DelimitedLineTokenizer.DELIMITER_COMMA);
            dlt.setNames("ISIN,Quantity,Price,Customer".split(","));
            return dlt;
        }
      – mapping by index, as on the previous slide:
        Trade trade = new Trade();
        trade.setIsin(fieldSet.readString(0));
        trade.setQuantity(fieldSet.readLong(1));
        trade.setPrice(fieldSet.readBigDecimal(2));
        trade.setCustomer(fieldSet.readString(3));
        return trade;
  • 38. Column-name Access to FieldSet (the tokenizer configuration above, on its own)
  • 39. Column-name Access to FieldSet (same tokenizer configuration)
      – mapping by column name:
        Trade trade = new Trade();
        trade.setIsin(fieldSet.readString("ISIN"));
        trade.setQuantity(fieldSet.readLong("Quantity"));
        trade.setPrice(fieldSet.readBigDecimal("Price"));
        trade.setCustomer(fieldSet.readString("Customer"));
        return trade;
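A simplified stand-in for a FieldSet makes the benefit of column names concrete. `FieldSetSketch` and `TradeSketch` are hypothetical classes, not Spring Batch's `FieldSet` or the deck's `Trade`. The point is that a mapper reading by name keeps working if the column order in the file changes, as long as the names are updated in the one place they are configured.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified FieldSet: values by position plus optional column names.
final class FieldSetSketch {
    private final List<String> names;
    private final String[] values;

    FieldSetSketch(String[] values, String... names) {
        this.values = values;
        this.names = Arrays.asList(names);
    }

    String readString(int index) {
        return values[index];
    }

    String readString(String name) {
        int index = names.indexOf(name);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown column: " + name);
        }
        return values[index];
    }

    long readLong(String name) {
        return Long.parseLong(readString(name).trim());
    }

    BigDecimal readBigDecimal(String name) {
        return new BigDecimal(readString(name).trim());
    }
}

// Hypothetical Trade with a mapper that reads columns by name instead of position.
final class TradeSketch {
    String isin;
    long quantity;
    BigDecimal price;
    String customer;

    static TradeSketch map(FieldSetSketch fs) {
        TradeSketch t = new TradeSketch();
        t.isin = fs.readString("ISIN");
        t.quantity = fs.readLong("Quantity");
        t.price = fs.readBigDecimal("Price");
        t.customer = fs.readString("Customer");
        return t;
    }
}
```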
  • 40. Flat File Parsing Errors • Bad data is very likely to be read at some point. When it is, information about the error is crucial, such as: – the current line number in the file – the original line – the original exception (e.g. NumberFormatException) • This allows detailed logs that can be processed ad hoc, or by a dedicated job for bad input data
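Here is a sketch of keeping the diagnostics this slide lists when a line fails to parse. `ParseFailure` and `QuantityParser` are hypothetical names. They are similar in spirit to Spring Batch's FlatFileParseException but are not its API. The parser assumes comma-separated lines with the quantity in the second column.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical exception carrying the slide's diagnostics:
// the line number, the original line, and the original cause.
final class ParseFailure extends RuntimeException {
    final int lineNumber;
    final String originalLine;

    ParseFailure(int lineNumber, String originalLine, Throwable cause) {
        super("Parsing error at line " + lineNumber + ": [" + originalLine + "]", cause);
        this.lineNumber = lineNumber;
        this.originalLine = originalLine;
    }
}

final class QuantityParser {
    /** Parses the second comma-separated column of every line as a long. */
    static List<Long> parseQuantities(List<String> lines) {
        List<Long> out = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            try {
                out.add(Long.parseLong(line.split(",")[1]));
            } catch (RuntimeException e) {              // e.g. NumberFormatException
                throw new ParseFailure(i + 1, line, e); // 1-based line numbers, like an editor
            }
        }
        return out;
    }
}
```

The exception message alone is enough for a log line, while the fields let a dedicated "bad input" job re-process exactly the lines that failed.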
  • 41. ItemProcessor
        public interface ItemProcessor<I, O> {
            O process(I item) throws Exception;
        }
      – delegates exception handling to the framework
  • 42. Item Processors • Optional: simple jobs may be built entirely from out-of-the-box readers and writers • Sit between input and output • The typical customization point for application developers • A good place to coerce data into the right format for output • Chain transformations using CompositeItemProcessor
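Chaining processors can be sketched as feeding each delegate's output into the next. `ProcessorSketch` and `CompositeProcessorSketch` are hypothetical stand-ins for ItemProcessor and CompositeItemProcessor. The sketch also assumes the Spring Batch convention that a processor returning null filters the item out, so the chain stops at the first null.

```java
import java.util.List;

// Hypothetical processor contract mirroring the slide: O process(I item).
// A null result means "filter this item out".
interface ProcessorSketch<I, O> {
    O process(I item) throws Exception;
}

// Runs delegates in order, passing each output to the next one; stops at the first null.
final class CompositeProcessorSketch implements ProcessorSketch<Object, Object> {
    private final List<ProcessorSketch<Object, Object>> delegates;

    CompositeProcessorSketch(List<ProcessorSketch<Object, Object>> delegates) {
        this.delegates = delegates;
    }

    @Override
    public Object process(Object item) throws Exception {
        Object current = item;
        for (ProcessorSketch<Object, Object> delegate : delegates) {
            if (current == null) {
                return null; // filtered by an earlier delegate
            }
            current = delegate.process(current);
        }
        return current;
    }
}
```

Typing the delegates as `Object` keeps the sketch short and lets each step change the item's type. A chain of "trim", "drop empty" and "upper-case" turns `"  abc "` into `"ABC"` and filters out a blank string entirely.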
  • 43. ItemWriter
        public interface ItemWriter<T> {
            void write(List<? extends T> items) throws Exception;
        }
      – expects a “chunk” (a list of items)
      – delegates exception handling to the framework
  • 44. Item Writers • Handle writing and serializing the data, one chunk of items at a time • The input might be the output of a reader or of a processor • Handle transactions, if necessary, and the associated rollbacks
  • 45. ItemWriters
        @Value("#{systemProperties['user.home']}")
        private String userHome;

        @Bean
        @Scope("step")
        public FlatFileItemWriter writer() {
            FlatFileItemWriter w = new FlatFileItemWriter();
            w.setName("fw1");
            // getAbsolutePath() returns a String, not a File
            String out = new File(this.userHome, "/batches/results").getAbsolutePath();
            Resource res = new FileSystemResource(out);
            w.setResource(res);
            // FlatFileItemWriter requires a LineAggregator; this one writes item.toString()
            w.setLineAggregator(new PassThroughLineAggregator());
            return w;
        }
  • 46. ItemWriters
        @Bean
        public JpaItemWriter jpaWriter() {
            JpaItemWriter writer = new JpaItemWriter();
            writer.setEntityManagerFactory(entityManagerFactory());
            return writer;
        }
  • 47. Application Concerns • Getting Started • Stateful or Stateless • Step Scope • Domain Specific Language • Resource Management • Alternative Approaches • Failure Modes
  • 48. Stateful or Stateless? (diagram: a Job with two JobInstances; both JobInstances can access the same BusinessItemWriter concurrently)
  • 49. Stateful or Stateless: StepContext (diagram: the Job holds a reference to a step-scoped proxy; concrete BusinessItemWriters are created only once per step execution, as needed, so each JobInstance works with its own writer)
  • 50. Introducing the Step Scope
      – A file writer needs to be step scoped so it can flush and close the output stream
        @Scope("step")
        @Bean // make this bean injectable
        public FlatFileItemReader reader(
                @Value("#{jobParameters['input.file']}") Resource input) {
            FlatFileItemReader fr = new FlatFileItemReader();
            fr.setResource(input);
            fr.setLineMapper(lineMapper());
            fr.setSaveState(true);
            return fr;
        }
      – Inner beans inherit the enclosing scope by default
      – Because it is step scoped, the bean has access to the StepContext values and can reach them through Spring EL (Spring 3.0 and later)
  • 51. Step Scope Responsibilities • Create beans for the duration of a step • Respect Spring bean lifecycle metadata (e.g. InitializingBean at the start of the step, DisposableBean at the end) • Allow stateful components in a multithreaded environment
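Here is a single-JVM sketch of what the step scope does. Beans are created lazily on first use within a step execution, reused for the rest of that execution, and disposed of at the end. Each step execution gets its own set of beans, so concurrent executions never share a stateful writer. `StepScopeSketch` and `CountingWriter` are hypothetical. `AutoCloseable` stands in for `DisposableBean`, and unlike the real scope there is no proxy.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical step scope: one bean map per step execution id.
final class StepScopeSketch {
    private final Map<String, Map<String, Object>> scopes = new ConcurrentHashMap<>();

    /** Returns the bean for this step execution, creating it on first use. */
    @SuppressWarnings("unchecked")
    <T> T get(String stepExecutionId, String beanName, Supplier<T> factory) {
        Map<String, Object> beans =
                scopes.computeIfAbsent(stepExecutionId, id -> new ConcurrentHashMap<>());
        return (T) beans.computeIfAbsent(beanName, name -> factory.get());
    }

    /** Ends the step: disposes of every bean created for it (e.g. flush and close a file). */
    void close(String stepExecutionId) throws Exception {
        Map<String, Object> beans = scopes.remove(stepExecutionId);
        if (beans == null) {
            return;
        }
        for (Object bean : beans.values()) {
            if (bean instanceof AutoCloseable) {
                ((AutoCloseable) bean).close();
            }
        }
    }
}

// Hypothetical stateful writer used to observe creation and disposal.
final class CountingWriter implements AutoCloseable {
    int written;
    boolean closed;

    void write(int count) {
        written += count;
    }

    @Override
    public void close() {
        closed = true;
    }
}
```

Two step executions asking for "writer" get two different instances. Closing one execution disposes only its own writer, and a later request within that execution id starts fresh.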
  • 52. Application Concerns • Getting Started • Stateful or Stateless • Step Scope • Domain Specific Language • Resource Management • Alternative Approaches • Failure Modes
  • 53. Domain Specific Language • Keeping track of all the application concerns can be a large overhead on a project • A DSL is needed to simplify job configuration • A DSL can hide details and be aware of specifics, such as well-known infrastructure that needs to be step scoped • The Spring way of doing this is a custom XML namespace
  • 54. XML Namespace Example
        <job id="skipJob" incrementer="incrementer"
             xmlns="http://www.springframework.org/schema/batch">
            <step id="step1">
                <tasklet>
                    <!-- chunk has input, output and a processor -->
                    <chunk reader="fileItemReader" processor="tradeProcessor"
                           writer="tradeWriter" commit-interval="3" skip-limit="10">
                        <!-- lots of control over errors -->
                        <skippable-exception-classes>
                            <include class="....FlatFileParseException" />
                            <include class="....WriteFailedException" />
                        </skippable-exception-classes>
                    </chunk>
                </tasklet>
                <!-- flow control from one step to another -->
                <next on="*" to="step2" />
                <next on="COMPLETED WITH SKIPS" to="errorPrint1" />
                <fail on="FAILED" exit-code="FAILED" />
            </step>
            ...
        </job>
  • 55. Application Concerns • Getting Started • Stateful or Stateless • Step Scope • Domain Specific Language • Resource Management • Alternative Approaches • Failure Modes
  • 56. Resource Management Responsibilities
      – Open resource (lazy initialisation)
      – Close resource at end of step
      – Close resource on failure
      – Synchronize output file with transaction: rollback resets the file pointer
      – Synchronize cumulative state for restart: file pointer, cursor position, processed keys, etc.; statistics such as the number of items processed
      – Other special cases: Hibernate flush policy, JDBC batch update
  • 57. File Output Resource Synchronization (sequence diagram with TransactionSynchronizationManager, Writer, TransactionSynchronization and FileResource: the first write(items) makes the Writer register(…) a TransactionSynchronization, which is created and calls mark() on the FileResource; further write(items) calls go to the FileResource)
  • 58. File Output Resource Synchronization (same sequence, then a write fails: Error!)
  • 59. File Output Resource Synchronization (same sequence; after the error, the rollback() callback triggers reset() on the FileResource, back to the marked position)
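The mark/reset sequence in these diagrams can be sketched without a transaction manager. `TransactionalOutputSketch` is hypothetical:

- A `StringBuilder` plays the output file.
- `mark()` runs when a chunk's transaction begins.
- `reset()` is what the rollback callback would call.

In the diagrams that callback is a TransactionSynchronization registered with the TransactionSynchronizationManager. In this sketch it is collapsed into a try/catch.

```java
import java.util.List;

// Hypothetical transactional file buffer: mark() remembers the committed length,
// reset() truncates back to it, so a failed chunk leaves no partial output behind
// (and nothing is duplicated when the chunk is retried).
final class TransactionalOutputSketch {
    private final StringBuilder file = new StringBuilder(); // stands in for the output file
    private int mark = 0;

    void mark() {                 // transaction begins
        mark = file.length();
    }

    void write(String line) {
        file.append(line).append('\n');
    }

    void reset() {                // rollback callback
        file.setLength(mark);
    }

    /** Writes one chunk; if any item fails, rolls the output back and rethrows. */
    void writeChunk(List<String> items) {
        mark();
        try {
            for (String item : items) {
                if (item.isEmpty()) {
                    throw new IllegalArgumentException("empty item"); // simulated write error
                }
                write(item);
            }
        } catch (RuntimeException e) {
            reset();
            throw e;
        }
    }

    String contents() {
        return file.toString();
    }
}
```

If the second chunk fails after writing one line, the file goes back to exactly what the first chunk committed.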
  • 60. Hibernate Flush Policy
      – First problem: if we do not flush manually, the transaction manager flushes automatically, but then the exception comes from inside the transaction manager, so the StepExecutor cannot naturally catch and analyze it. Solution: flush manually on chunk boundaries
      – Second problem: errors cannot be associated with an individual item. Two alternatives: binary search through the chunk for the failure = O(log N) flushes, or aggressively flush after each item = O(N) flushes
      – HibernateItemWriter
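The binary-search alternative can be sketched as bisection over the chunk. `BadItemSearch.findFirstBad` is a hypothetical helper. Its predicate stands in for "write and flush this sub-list in a fresh transaction" and is assumed to fail exactly when the sub-list contains a bad item. The sketch also assumes the whole chunk is already known to fail.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper: locate the first bad item of a failing chunk in about
// log2(chunk size) flush attempts instead of one attempt per item.
final class BadItemSearch {
    /** Precondition: the chunk is non-empty and flushing all of it fails. */
    static <T> int findFirstBad(List<T> chunk, Predicate<List<T>> flushSucceeds) {
        int lo = 0;
        int hi = chunk.size();          // invariant: the first bad item is in [lo, hi)
        while (hi - lo > 1) {
            int mid = (lo + hi) >>> 1;
            if (flushSucceeds.test(chunk.subList(lo, mid))) {
                lo = mid;               // left part is clean, so the bad item is to the right
            } else {
                hi = mid;               // left part already fails
            }
        }
        return lo;
    }
}
```

For a chunk of 8 items with the bad one at index 5, the search needs 3 flush attempts. Flushing after every item would need up to 8.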
  • 61. Application Concerns • Getting Started • Stateful or Stateless • Step Scope • Domain Specific Language • Resource Management • Alternative Approaches • Failure Modes
  • 62. Tasklet: Alternative Approaches for Application Developers • Sometimes input/output is not the way a job is structured • E.g. a stored procedure does everything in one go, but we want to manage it as a Step in a Job • E.g. a repetitive process where legacy code prevents an ItemReader/ItemProcessor from being identified • For these cases we provide Tasklet
  • 63. Tasklet
        public interface Tasklet {
            RepeatStatus execute(StepContribution contribution,
                                 ChunkContext chunkContext) throws Exception;
        }
      – signals the end of the business process to the framework: RepeatStatus.CONTINUABLE or RepeatStatus.FINISHED
      – delegates exception handling to the framework
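The CONTINUABLE / FINISHED protocol amounts to a loop in the framework. This sketch uses hypothetical `RepeatStatusSketch`, `TaskletSketch` and `TaskletRunner` types and drops the StepContribution and ChunkContext arguments. In Spring Batch's TaskletStep each call to execute() also runs inside a transaction, which the sketch does not model.

```java
// Hypothetical mirror of the slide's contract: the framework keeps calling execute()
// for as long as the tasklet answers CONTINUABLE.
enum RepeatStatusSketch { CONTINUABLE, FINISHED }

interface TaskletSketch {
    RepeatStatusSketch execute() throws Exception;
}

final class TaskletRunner {
    /** Runs the tasklet to completion and returns how many times execute() was called. */
    static int run(TaskletSketch tasklet) throws Exception {
        int calls = 0;
        RepeatStatusSketch status;
        do {
            status = tasklet.execute();
            calls++;
        } while (status == RepeatStatusSketch.CONTINUABLE);
        return calls;
    }
}
```

A stored-procedure tasklet would return FINISHED on its first call. A legacy loop that processes one batch per call returns CONTINUABLE until its work runs out.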
  • 64. Getting Started With a Tasklet (diagram: the Application Developer implements a Tasklet and configures a Job made of Steps (*); the framework concerns of each Step are the StepExecutor, RepeatOperations and ExceptionHandler)
  • 65. XML Namespace Example with Tasklet (a Tasklet Step has a Tasklet)
        <job id="loopJob" xmlns="http://www.springframework.org/schema/batch">
            <step id="step1">
                <tasklet ref="myCustomTasklet">
                    <transaction-attributes propagation="REQUIRED"/>
                </tasklet>
            </step>
        </job>
  • 66. Application Concerns • Getting Started • Stateful or Stateless • Step Scope • Domain Specific Language • Resource Management • Alternative Approaches • Failure Modes
  • 67. Failure Modes (event → response; alternative)
      – Bad input record (e.g. parse exception) → mark the item as skipped and exclude it from the chunk; alternative: abort the job when skipLimit is reached
      – Bad output record, business or data integrity exception → mark the item as skipped and retry the chunk excluding the bad item; alternative: abort the job when skipLimit is reached (after the chunk completes)
      – Bad output record, deadlock loser exception → retry the chunk including the bad item; alternative: abort the chunk when the retry limit is reached
      – Bad chunk, e.g. a JDBC batch update fails → retry the chunk, but flush and commit after every item; alternative: discard the entire chunk and binary-search for the failed item
      – Output resource failure, e.g. disk full or a permanent network outage → graceful abort, attempting to save batch data; note: if the database is unavailable, the metadata remains in the “running” state!
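The first row of the table can be sketched as below: skip bad input records until the skip limit is exceeded, then abort. `SkippingParser` and `SkipLimitExceeded` are hypothetical names. A record "fails to parse" when it is not an integer. The sketch assumes a limit of N permits N skips, with the next one aborting.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical abort signal for when too many records have been skipped.
final class SkipLimitExceeded extends RuntimeException {
    SkipLimitExceeded(int skips) {
        super("Skip limit exceeded after " + skips + " skips");
    }
}

final class SkippingParser {
    final List<String> skipped = new ArrayList<>();

    /** Parses each record; bad ones are skipped (excluded from the chunk) up to skipLimit. */
    List<Integer> parseAll(List<String> records, int skipLimit) {
        List<Integer> good = new ArrayList<>();
        for (String record : records) {
            try {
                good.add(Integer.parseInt(record));
            } catch (NumberFormatException e) {
                skipped.add(record);                  // mark as skipped
                if (skipped.size() > skipLimit) {
                    throw new SkipLimitExceeded(skipped.size()); // abort the job
                }
            }
        }
        return good;
    }
}
```

With a limit of 1, one bad record is skipped and the rest go through, while a second bad record aborts the run.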
  • 68. Inside Spring Batch • Architecture and Domain Overview • Application Concerns and Getting Started • Chunk-Oriented Processing
  • 69. Chunk-Oriented Processing • Input and output can be grouped together per item = item-oriented processing (e.g. Tasklet) • …or input can be aggregated into chunks first = chunk-oriented processing (the chunk element of a tasklet) • Chunk processing can be encapsulated, and independent decisions can be made about (e.g.) partial failure and retry • Step = chunk oriented (the default, and more common) • Tasklet = item oriented • Here we compare and contrast the two approaches
  • 70. Item-Oriented Pseudo Code
        REPEAT(while more input) {
            TX {
                REPEAT(size=500) {
                    input;
                    output;
                }
            }
        }
  • 71. Item-Oriented Pseudo Code (annotated)
        REPEAT(while more input) {        // RepeatTemplate
            TX {                          // TransactionTemplate
                REPEAT(size=500) {        // RepeatTemplate
                    input;                // business logic
                    output;               // business logic
                }
            }
        }
  • 72. Item-Oriented Retry and Repeat
        REPEAT(while more input AND exception[not critical]) {
            TX {
                REPEAT(size=500) {
                    RETRY(exception=[deadlock loser]) {
                        input;
                    } PROCESS {
                        output;
                    } SKIP and RECOVER {
                        notify;
                    }
                }
            }
        }
  • 73. Chunk-Oriented Pseudo Code
        REPEAT(while more input) {
            chunk = ACCUMULATE(size=500) {
                input;
            }
            RETRY {
                TX {
                    for (item : chunk) {
                        output;
                    }
                }
            }
        }
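The chunk-oriented pseudo code translates into a short loop. `ChunkLoopSketch.run` is a hypothetical helper:

- A `Supplier` plays the reader, with null meaning end of input.
- A `Consumer<List<T>>` plays the writer.
- `commitInterval` is the ACCUMULATE size.
- RETRY is a simple attempt counter.

Transaction boundaries appear only as comments, and the whole thing is far simpler than Spring Batch's fault-tolerant chunk processing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical chunk loop following slide 73: accumulate, then write the whole chunk,
// retrying the write up to maxAttempts times. Assumes the reader keeps returning null
// once it is exhausted.
final class ChunkLoopSketch {
    /** Returns the number of chunks written. */
    static <T> int run(Supplier<T> reader, Consumer<List<T>> writer,
                       int commitInterval, int maxAttempts) {
        int chunks = 0;
        while (true) {
            // chunk = ACCUMULATE(size = commitInterval) { input; }
            List<T> chunk = new ArrayList<>();
            T item;
            while (chunk.size() < commitInterval && (item = reader.get()) != null) {
                chunk.add(item);
            }
            if (chunk.isEmpty()) {
                return chunks; // no more input
            }
            // RETRY { TX { for (item : chunk) output; } }
            for (int attempt = 1; ; attempt++) {
                try {
                    // a real step would begin a transaction here
                    writer.accept(chunk);
                    // ...and commit it here
                    break;
                } catch (RuntimeException e) {
                    // ...or roll it back here, then retry the same chunk
                    if (attempt >= maxAttempts) {
                        throw e;
                    }
                }
            }
            chunks++;
        }
    }
}
```

The accumulate loop checks `chunk.size()` before calling the reader, so no extra item is read once a chunk is full. Seven items with a commit interval of 3 give chunks of 3, 3 and 1. A writer that fails once is retried with the same chunk.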
  • 74. Summary • Spring Batch manages the loose ends so you can sleep • Easy to use along with other Spring services • Things we didn’t talk about: – remote chunking – all the ItemReaders/Writers • Spring Batch Admin lets operations types sleep easier
