Talend Interview Questions 2022
1. Why use Talend over other ETL tools available in the market?
Faster: Talend automates the tasks and further maintains them for you.
Less Expense: Talend provides open source tools which can be downloaded free
of cost. Moreover, as the processes speed up, the developer rates are reduced
as well.
Future Proof: Talend comprises everything that you might need to meet
marketing requirements today as well as in the future.
Unified Platform: Talend meets all of our needs under a common foundation for
the products, based on the needs of the organization.
Huge Community: Being open source, it is backed by a huge community.
2. What Is Talend?
Talend Open Studio is an open source project that is based on Eclipse RCP.
It supports ETL-oriented implementations and is generally provided for
on-premises deployment. It acts as a code generator which produces data
transformation scripts and underlying programs in Java. It provides an
interactive and user-friendly GUI which lets you access the metadata
repository containing the definitions and configurations for each process
performed in Talend.
‘Project’ is the highest physical structure which bundles up and stores all
types of Business Models, Jobs, metadata, routines, context variables or any
other technical resources.
1. Row: The Row connection deals with the actual data flow. The following
are the types of Row connections supported by Talend:
   o Main
   o Lookup
   o Filter
   o Rejects
   o ErrorRejects
   o Output
   o Uniques/Duplicates
   o Multiple Input/Output
2. Iterate: The Iterate connection is used to perform a loop on files
contained in a directory, on rows contained in a file or on database
entries.
3. Trigger: The Trigger connection is used to create a dependency
between Jobs or Subjobs which are triggered one after the other
according to the trigger’s nature. Trigger connections fall into
two categories:
   o Subjob Triggers
      OnSubjobOK
      OnSubjobError
      Run if
   o Component Triggers
      OnComponentOK
      OnComponentError
      Run if
4. Link: The Link connection is used to transfer the table schema
information to the ELT mapper component.
8. Differentiate between ‘OnComponentOk’ and ‘OnSubjobOk’.
OnComponentOk OnSubjobOk
Talend provides a user-friendly GUI where you can simply drag and drop the
components to design a Job. When the Job is executed, Talend Studio
automatically translates it into a Java class at the backend. Each component
present in a Job is divided into three parts of Java code (begin, main and
end). This is why Talend Studio is called a code generator.
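As a rough illustration of the begin/main/end split, the hand-written sketch below mimics the shape of a tiny Job that reads rows and counts them. It is not actual Talend output; the class and variable names (GeneratedJobSketch, rows, nbLine) are assumptions chosen for readability.

```java
import java.util.List;

// Hand-written sketch; NOT actual Talend output. All names are hypothetical.
class GeneratedJobSketch {

    // Mimics one component whose code is split into begin, main and end parts.
    static int run(List<String> rows) {
        // begin: executed once, before the first row (initialise state)
        int nbLine = 0;
        StringBuilder out = new StringBuilder();

        // main: executed once per incoming row
        for (String row : rows) {
            out.append(row.trim()).append('\n');
            nbLine++;
        }

        // end: executed once, after the last row (flush output, report count)
        System.out.print(out);
        return nbLine;
    }

    public static void main(String[] args) {
        int count = run(List.of(" a ", "b"));
        System.out.println("rows processed: " + count);
    }
}
```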
Routines are reusable pieces of Java code. Using routines you can write
custom code in Java in order to optimize data processing, improve Job
capacity, and extend Talend Studio features.
o System routines: These are read-only routines which you can call
directly in any Job.
o User routines: These are custom routines which users write either
from scratch or by adapting existing ones.
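A user routine is essentially a Java class with static methods. The sketch below is a minimal example under that assumption; the class name StringCleaner and the method normalizeSpaces are hypothetical and not part of Talend's system routines.

```java
// Hypothetical user routine; the class and method names are illustrative only.
class StringCleaner {

    // Trims the value and collapses runs of whitespace into single spaces.
    // Returns an empty string for null input so callers need no null check.
    static String normalizeSpaces(String value) {
        if (value == null) {
            return "";
        }
        return value.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println(normalizeSpaces("  Talend   Open  Studio "));
    }
}
```

Inside a Job, such a method would typically be called from a component expression, for example StringCleaner.normalizeSpaces(row1.name), where row1.name is an assumed input column.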
12. Can you define schema at runtime in Talend?
Built-in:
1. Stored locally inside a Job
2. Can be used by the local Job only
3. Can be updated easily within a Job
Repository:
1. Stored centrally inside the Repository
2. Can be used globally by any Job within a project
3. Data is read-only within a Job
14. What are Context Variables and why are they used in Talend?
Context variables are user-defined parameters which are passed into a Job
at runtime. Their values may change as the Job is promoted from the
Development to the Test and Production environments. Context variables can
be defined in three ways:
Yes, you can do that by declaring a static variable within a routine. Then you
need to add the setter/getter methods for this variable in the routine itself.
Once done, this variable will be accessible from multiple Jobs.
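A minimal sketch of such a routine follows. The class name SharedState and the field batchId are hypothetical. Note the assumption that the Jobs run inside the same JVM: a static field is not shared between separate Java processes.

```java
// Hypothetical routine holding a value shared by Jobs running in the same JVM.
class SharedState {

    // Static, so every caller in the same JVM sees the same value.
    private static String batchId;

    // Setter: synchronized so concurrent Jobs see a consistent value.
    static synchronized void setBatchId(String value) {
        batchId = value;
    }

    // Getter: returns null until a value has been set.
    static synchronized String getBatchId() {
        return batchId;
    }

    public static void main(String[] args) {
        setBatchId("2022-01-31");          // e.g. set by one Job
        System.out.println(getBatchId());  // e.g. read by another Job
    }
}
```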
16. What is a Subjob and how can you pass data from a parent Job to a child
Job?
Outline View in Talend Open Studio is used to keep track of the return values
available in a component. This also includes the user-defined values
configured in a tSetGlobalVar component.
18. Explain the tMap component. List the different functions that you
can perform using it.
tMap is one of the core components which belongs to the ‘Processing’ family
in Talend. It is primarily used for mapping the input data to the output data.
tMap can perform the following functions:
tMap:
1. It is a powerful component which can handle complicated cases
2. Can accept multiple input links (one is main and the rest are lookups)
3. Can have more than one output link
4. Supports multiple types of join models like unique join, first join, and
all join
5. Supports inner join and left outer join
6. Can filter data using filter expressions
tJoin:
1. Can only handle basic Join cases
2. Can accept only two input links (main and lookup)
3. Can have only two output links (main and reject)
4. Supports only unique join
5. Supports only inner join
6. Cannot filter data using filter expressions
20. What is a scheduler?
A scheduler is software which selects processes from the queue and loads
them into memory for execution. Talend does not provide a built-in scheduler.
ETL stands for Extract, Transform and Load. It refers to a trio of processes
which are required to move the raw data from its source to a data warehouse,
a business intelligence system, or a big data platform.
o Extract: This step involves accessing the data from all the Storage
Systems like RDBMS, Excel files, XML files, flat files etc.
o Transform: In this step, entire data is analyzed and various functions
are applied on it to transform that into the required format.
o Load: In this step, the processed data, i.e. the extracted and
transformed data, is then loaded to a target data repository which
usually is the database, by utilizing minimal resources.
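The three steps can be sketched in plain Java as below. This is only an illustration: the in-memory lists stand in for a real source and target, and the record layout (id;name;amount) is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: in-memory lists stand in for real source and target systems.
class EtlSketch {

    // Extract: read raw records from the (assumed) source.
    static List<String> extract() {
        return List.of("1;alice;10.5", "2;bob;7", "3;carol;n/a");
    }

    // Transform: parse fields, upper-case the name, drop rows with a bad amount.
    static List<String> transform(List<String> raw) {
        List<String> out = new ArrayList<>();
        for (String line : raw) {
            String[] f = line.split(";");
            try {
                double amount = Double.parseDouble(f[2]);
                out.add(f[0] + "," + f[1].toUpperCase() + "," + amount);
            } catch (NumberFormatException e) {
                // Rejected row: the amount is not numeric, so it is skipped.
            }
        }
        return out;
    }

    // Load: append the transformed rows to the (assumed) target.
    static void load(List<String> rows, List<String> target) {
        target.addAll(rows);
    }

    public static void main(String[] args) {
        List<String> warehouse = new ArrayList<>();
        load(transform(extract()), warehouse);
        warehouse.forEach(System.out::println);
    }
}
```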
22. Differentiate between ETL and ELT.
ETL:
1. Data is first Extracted, then it is Transformed before it is Loaded into a
target system
2. With the increase in the size of data, processing slows down as the entire
ETL process needs to wait till Transformation is over
3. Easy to implement
4. Doesn’t provide Data Lake support
5. Supports relational data
ELT:
1. Data is first Extracted, then it is Loaded to the target systems where it
is further Transformed
2. Processing is not dependent on the size of the data
3. Needs deep knowledge of tools in order to implement
4. Provides Data Lake support
5. Supports unstructured data
23. Can we use ASCII or Binary Transfer mode in an SFTP connection?
No, the transfer modes can’t be used in SFTP connections. SFTP doesn’t
support any kind of transfer modes as it is an extension to SSH and assumes
an underlying secure channel.
In order to schedule a Job in Talend, you first need to export the Job as a
standalone program. Then, using your OS’s native scheduling tools (Windows
Task Scheduler, Linux cron, etc.), you can schedule your Jobs.
insert or update: In this action, first Talend tries to insert a record, but if a
record with a matching primary key already exists, then it updates that record.
update or insert: In this action, Talend first tries to update a record with a
matching primary key, but if there is none, then the record is inserted.
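The difference is only in which statement is attempted first. The sketch below models a table as a Map keyed by primary key and counts how many statements each strategy attempts; it is an analogy in plain Java under those assumptions, not Talend's implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Analogy only: a Map keyed by primary key stands in for a database table.
class UpsertSketch {

    // Tries INSERT first; falls back to UPDATE on a primary-key clash.
    // Returns the number of statements attempted (1 or 2).
    static int insertOrUpdate(Map<Integer, String> table, int key, String value) {
        if (!table.containsKey(key)) {
            table.put(key, value);   // insert succeeded
            return 1;
        }
        table.put(key, value);       // insert failed, so update instead
        return 2;
    }

    // Tries UPDATE first; falls back to INSERT when no row matches.
    // Returns the number of statements attempted (1 or 2).
    static int updateOrInsert(Map<Integer, String> table, int key, String value) {
        if (table.containsKey(key)) {
            table.put(key, value);   // update succeeded
            return 1;
        }
        table.put(key, value);       // nothing to update, so insert instead
        return 2;
    }

    public static void main(String[] args) {
        Map<Integer, String> table = new HashMap<>();
        table.put(1, "old");
        System.out.println(insertOrUpdate(table, 1, "new")); // existing key
        System.out.println(updateOrInsert(table, 1, "newer")); // existing key
    }
}
```

Both strategies leave the table in the same final state. A reasonable rule of thumb is to pick the one whose first attempt usually succeeds: insert or update when most incoming rows are new, update or insert when most already exist.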
The -Xms parameter is used to specify the initial heap size in Java, whereas
the -Xmx parameter is used to specify the maximum heap size in Java.
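To see the effect of these settings, a small program can print what the running JVM reports. The sketch below uses the standard Runtime API; the class name HeapInfo is hypothetical, and the reported figures only roughly correspond to the configured values.

```java
// Prints the heap figures reported by the running JVM.
class HeapInfo {

    // Upper bound the JVM will try to use; roughly reflects the -Xmx setting.
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // Heap currently reserved by the JVM; starts near the -Xms setting.
    static long currentHeapBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println("current heap (MB): " + currentHeapBytes() / mb);
        System.out.println("max heap (MB): " + maxHeapBytes() / mb);
    }
}
```

It could be run, for example, as java -Xms256m -Xmx1024m HeapInfo.java, where the two sizes are arbitrary example values.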
29. What is the use of Expression Editor in Talend?
From an Expression Editor, all the expressions like Input, Var or Output, and
constraint statements can be viewed and edited easily. Expression Editor
comes with a dedicated view for writing any function or transformation. The
necessary expressions which are needed for the data transformation can be
directly written in the Expression editor or you can also open
the Expression Builder dialog box where you can just write the data
transformation expressions.
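Because Talend generates Java, such expressions are ordinary Java expressions. The sketch below shows the kind of expression one might write for an output column; the Row class and the column names firstName and lastName are hypothetical stand-ins for the row structures Talend generates.

```java
// Sketch of a transformation expression; Row and its columns are hypothetical.
class ExpressionSketch {

    // Stand-in for an input row; Talend generates its own row classes.
    static class Row {
        String firstName;
        String lastName;

        Row(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    // The returned value is the kind of expression typed for an output column:
    // fall back to the first name when the last name is missing.
    static String fullName(Row row1) {
        return row1.lastName == null
                ? row1.firstName
                : row1.lastName.toUpperCase() + ", " + row1.firstName;
    }

    public static void main(String[] args) {
        System.out.println(fullName(new Row("Ada", "Lovelace")));
    }
}
```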
o For simple Jobs, one can rely on the exception throwing process of
Talend Open Studio, which is displayed in the Run View as a red stack
trace.
o Each Subjob and component has to return a code which drives any
additional processing. The Subjob Ok/Error and Component Ok/Error
links can be used to direct the error towards an error handling routine.
o The basic way of handling an error is to define an error handling
Subjob which should execute whenever an error occurs.
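In plain Java terms, routing to an error handling Subjob resembles a try/catch around the main work. The sketch below is only an analogy with hypothetical names; it is not how Talend wires OnSubjobOk and OnSubjobError internally.

```java
// Plain-Java analogy only; names are hypothetical and this is not Talend code.
class ErrorHandlingSketch {

    // Runs a "Subjob" and routes control the way OnSubjobOk / OnSubjobError would.
    static String runSubjob(Runnable subjob) {
        try {
            subjob.run();
            return "OK: continue with the next Subjob";
        } catch (RuntimeException e) {
            // Error handling Subjob: log, notify, clean up, and so on.
            return "ERROR handled: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(runSubjob(() -> { }));
        System.out.println(runSubjob(() -> {
            throw new IllegalStateException("input file missing");
        }));
    }
}
```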
31. Differentiate between the usage of tJava, tJavaRow, and tJavaFlex
components.
Functions: tJava / tJavaRow / tJavaFlex
1. Can be used to integrate custom Java code: Yes / Yes / Yes
2. Will be executed only once at the beginning of the Subjob: Yes / No / No
3. Needs input flow: No / Yes / No
4. Needs output flow: No / Only if output schema is defined / Only if output
schema is defined
5. Can be used as the first component of a Job: Yes / No / Yes
6. Can be used as a different Subjob: Yes / No / Yes
7. Allows Main Flow or Iterator Flow: Both / Only Main / Both
8. Has three parts of Java code: No / No / Yes
9. Can auto propagate data: No / No / Yes
32. How can you execute a Talend Job remotely?
You can execute a Talend Job remotely from the command line. All you need
to do is export the Job along with its dependencies and then access its
instruction files from the terminal.
33. Can you exclude headers and footers from the input files before loading
the data?
Yes, the headers and footers can be excluded easily before loading the data
from the input files.
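Talend's file input components typically expose Header and Footer row counts for this purpose. The underlying idea can be sketched in plain Java as follows; the class and method names are hypothetical.

```java
import java.util.List;

// Sketch of skipping header and footer rows; names are hypothetical.
class HeaderFooterSketch {

    // Returns the lines between the first `header` rows and the last `footer` rows.
    // Returns an empty list when the file is shorter than header + footer.
    static List<String> body(List<String> lines, int header, int footer) {
        int from = Math.min(header, lines.size());
        int to = Math.max(from, lines.size() - footer);
        return lines.subList(from, to);
    }

    public static void main(String[] args) {
        List<String> file = List.of("id;name", "1;alice", "2;bob", "total=2");
        body(file, 1, 1).forEach(System.out::println);
    }
}
```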
A ‘Heap Space Issue’ occurs when the JVM tries to add more data into the heap
space area than the space available. To resolve this issue, you need to
adjust the memory allocated to Talend Studio by modifying the relevant
Studio .ini configuration file according to your system and needs.
This component transforms and routes the data from single or multiple
sources to single or multiple destinations. It is an advanced component
designed for transforming and routing XML data flows, especially when
numerous XML data sources need to be processed.
Talend Open Studio for Big Data is a superset of Talend Open Studio for Data
Integration. It contains all the functionality provided by TOS for DI along
with some additional functionality, such as support for Big Data technologies.
That is, TOS for DI generates only Java code whereas TOS for BD generates
MapReduce code along with Java code.
In TOS for BD, the Big Data family is very large. A few of the most
used technologies are:
o Cassandra
o CouchDB
o Google Storage
o HBase
o HDFS
o Hive
o MapRDB
o MongoDB
o Pig
o Sqoop etc.
38. How can you run multiple Jobs in parallel within Talend?
1. Multithreading
2. tParallelize component
3. Automatic parallelization
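The multithreading idea can be illustrated with standard Java concurrency. The sketch below runs two tasks in parallel and starts a third only after both have finished; it uses plain java.util.concurrent with hypothetical task names and does not show how tParallelize is implemented.

```java
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-Java illustration; task names are hypothetical.
class ParallelSketch {

    // Runs "job1" and "job2" in parallel, then "job3" once both are done.
    static List<String> runAll() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<String> job1 = pool.submit(() -> "job1 done");
            Future<String> job2 = pool.submit(() -> "job2 done");

            // get() blocks until the corresponding task has finished.
            String r1 = job1.get();
            String r2 = job2.get();

            // Reached only after both parallel tasks have completed.
            String r3 = "job3 done";
            return List.of(r1, r2, r3);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("parallel execution failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        runAll().forEach(System.out::println);
    }
}
```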
39. What are the mandatory configurations needed in order to connect to HDFS?
In order to connect to HDFS you must provide the following details:
o Distribution
o NameNode URI
o User name
40. Which service is mandatory for coordinating transactions between
Talend Studio and HBase?
This component creates a Kafka topic which the other Kafka components can
use as well. It allows you to visually generate the command to create a topic
with various properties at topic-level.
Once the data is validated, this component helps in loading the original input
data to an output stream in just one single transaction. It sets up a connection
to the data source for the current transaction.
a. Repository
b. Run view
c. Designer Workspace
d. Palette [Ans]
46. In the component view, where can you change the name of a component
from?
a. Basic settings
b. Advanced settings
c. Documentation
d. View [Ans]
47. The HDFS components can only be used with Big Data batch or Big
Data streaming Jobs.
a. True
b. False [Ans]
48. An analysis on Hive table content can be executed in which perspective
of Talend Studio?
a. Profiling [Ans]
b. Integration
c. Big Data
d. Mediation
49. What does an asterisk next to the Job name signify in the design
workspace?
a. It is an active Job
b. The Job contains unsaved changes [Ans]
c. The job is currently running
d. The Job contains errors
50. Suppose you have designed a Big Data batch Job using the MapReduce
framework. Now you want to execute it on a cluster using MapReduce.
Which configurations are mandatory in the Hadoop Configuration tab
of the Run view?
a. Name Node [Ans]
b. Data Node
c. Resource Manager
d. Job Tracker [Ans]
51. How can you find the configuration error message for a component?
a. Right-click the component and select “Show Problems”
b. Hover over the error symbol within the Designer view [Ans]
c. Open the Errors view
d. Open the Jobs view
52. What is the process of joining two input columns in the tMap
configuration window?
a. Dragging a column from the main input table to a column in another
input table [Ans]
b. Right-clicking one column in the input table and selecting “Join”
c. Selecting two columns in two distinct input tables, right-clicking, and
selecting “Join”
d. Selecting two columns in two distinct input tables and dragging them to
the output table
53. To import a file from FTP, which of the following are the mandatory
components?
a. tFTPConnection, tFTPPut
b. tFTPConnection, tFTPFileList, tFTPGet
c. tFTPConnection, tFTPGet [Ans]
d. tFTPConnection, tFTPExists, tFTPGet
54. Suppose you have three Jobs, of which Jobs 1 and 2 are executed
in parallel. Job 3 executes only after Jobs 1 and 2 complete their
execution. Which of the following components can be used to set this
up?
a. tUnite
b. tPostJob [Ans]
c. tRunJob
d. tParallelize [Ans]
55. For a tFileInputDelimited component, what is the default field
separator parameter?
a. Semicolon [Ans]
b. Pipe
c. Comma
d. Colon
56. While saving the changes to a tMap configuration, sometimes Talend
asks you for confirmation to propagate changes. Why?
a. Because your changes affect the output schema and the source
component should have a matching schema
b. Because your changes affect the output schema and the target
component should have a matching schema [Ans]
c. Because your changes affect an input schema and the related source
component should have a matching schema
d. Because your changes have not been saved yet
57. In Talend, how do you add a Shape to a Business Model?
a. Click and place it from the palette
b. Drag it from the repository
c. Click in the quick access toolbar
d. Drag and drop it from the palette [Ans]
58. How do you create a row link between two components?
a. HTML [Ans]
b. TEXT
c. CSV
d. XML
60. We can directly change the generated code in Talend.
a. True
b. False [Ans]
61. What is the default date pattern in Talend Open Studio?
a. MM-DD-YY
b. DD-MM-YY [Ans]
c. DD-MM-YYYY
d. YY-MM-DD
62. MDM stands for
a. Meta Data Management
b. Mobile Device Management
c. Master Data Management [Ans]
d. Mock Data Management
63. In order to encapsulate and pass the collected log data to the output,
which components must be used along with tLogCatcher?
a. tWarn [Ans]
b. tDie [Ans]
c. tStatCatcher
d. tAssertCatcher
64. Which component do you need to use in order to read data line by line
from an input flow and store the data entries into iterative global
variables?
a. tIterateToFlow
b. tFileList
c. tFlowToIterate [Ans]
d. tLoop
65. tMemorizeRows belongs to which component family in Talend?
a. Misc [Ans]
b. Orchestration
c. Internet
d. File
66. _________ is a powerful input component which holds the ability to
replace a number of other components of the File family.
a. tFileInputLDIF
b. tFileInputRegex [Ans]
c. tFileInputExcel
d. tFileInputJSON
67. Which component do you need in order to prevent an unwanted commit
in a MySQL database?
a. tMysqlRollback [Ans]
b. tMysqlCommit
c. tMysqlLookupInput
d. tMysqlRow
68. A database connection defined in Repository can be reused by any Job
within the project.
a. True [Ans]
b. False
69. Using which component can you integrate personalized Pig code with
a Talend program?
a. tPigCross
b. tPigMap
c. tPigDistinct
d. tPigCode [Ans]
70. The tKafkaOutput component receives messages serialized into which data
type?
a. byte
b. byte[] [Ans]
c. String[]
d. Integer
71. To which two component families does the tHDFSProperties component
belong?
a. Big Data and Misc
b. Orchestration and Big Data
c. File and Big Data [Ans]
d. Big Data and Internet
72. This component is used to read data from cache memory for high-speed
data access:
a. tHashInput [Ans]
b. tFileInputLDIF
c. tHDFSInput
d. tFileInputXML
73. Using which component can you calculate the processing time of one or
more Subjobs in the main Job?
a. tFlowMeter
b. tChronometerStart [Ans]
c. tFlowMeterCatcher
d. tStatCatcher
74. The tUnite component belongs to which of the following two families?
a. File and Processing
b. Misc and Messaging
c. Orchestration and Messaging
d. Orchestration and Processing [Ans]
75. Using tJavaFlex, how many parts of Java code can you add in your Job?
a. One
b. Two
c. Three [Ans]
d. Four