Module-2 PPT-1
Hadoop
• Hadoop is an open-source software framework for storing data
and running applications on clusters of commodity hardware.
• Terabytes of data can be processed in just a few minutes.
• Hadoop enables distributed processing of large datasets across clusters of computers using a programming model called MapReduce.
Hadoop Distributed File System
Scalability
• Means the cluster can be scaled up (enhanced) by adding storage and processing units as requirements grow.
Self Manageability
• Means storage and processing resources are created, used, scheduled, and increased or reduced by the system itself.
Self Healing
• Means faults are detected and handled by the system itself.
• Keeps the system functioning and its resources available.
• The software detects and handles failures at the task level, and enables task execution to continue in the event of a communication failure.
Figure 2.1 Core components of Hadoop
•Hadoop Common: The common utilities that support the
other Hadoop modules.
•Hadoop Distributed File System (HDFS): A distributed file
system that provides high-throughput access to application
data.
•Hadoop YARN: A framework for job scheduling and cluster
resource management.
•Hadoop MapReduce: A YARN-based system for parallel
processing of large data sets.
Features of Hadoop That Make It Popular
1. Open Source
2. Highly Scalable Cluster
3. Fault Tolerance Is Available
4. High Availability Is Provided
5. Cost-Effective
6. Provides Flexibility
7. Easy to Use
8. Provides Faster Data Processing
Hadoop ecosystem:
•HDFS: Hadoop Distributed File System
•YARN: Yet Another Resource Negotiator
•MapReduce: Programming-based data processing
•Spark: In-memory data processing
•Pig, Hive: Query-based processing of data services
•HBase: NoSQL database
•Mahout, Spark MLlib: Machine-learning algorithm libraries
•Solr, Lucene: Searching and indexing
•ZooKeeper: Managing the cluster
•Oozie: Job Scheduling
Figure 2.2 Hadoop main components and
ecosystem components
End of Lesson 1 on
Hadoop
MODULE 2
Chapter 3: Hadoop Distributed File System Basics
▪ HDFS is designed for data streaming, where large amounts of data are read from disk in bulk.
▪ The HDFS block size is typically 64 MB to 128 MB.
• Master/Slave architecture
• Acknowledgement
• Reading Data
• Block Report
How a Client Writes Data to a DataNode
How a Client Reads Data from a DataNode
For Performance Reasons
• The mappings between data blocks and physical DataNodes are not kept in persistent storage on the NameNode (there is no local caching mechanism).
• The NameNode stores all metadata in memory.
• Block reports are sent every 10 heartbeats.
• In almost all Hadoop deployments there is a SecondaryNameNode (checkpoint node).
• It is not an active failover node and cannot replace the primary NameNode in case of its failure.
Secondary NameNode (SNN)
[checkpoint]
Thus the various important roles in HDFS are:
• HDFS uses a master/slave model designed for large file reading or streaming.
• The NameNode manages the metadata, while the DataNodes store and serve the data blocks.
• The SecondaryNameNode performs checkpoints of the NameNode's file system state but is not a failover node.
HDFS Block replication
• HDFS is a reliable system because it stores multiple copies of data.
• In Hadoop clusters containing more than eight DataNodes, the replication value is usually set to 3.
• In a Hadoop cluster with fewer DataNodes (but more than one), a replication factor of 2 is adequate.
• The HDFS default block size is often 64 MB. In a typical OS, the block size is 4 KB or 8 KB.
• The figure above provides an example of how a file is broken into blocks and replicated across the
cluster.
• In this case, a replication factor of 3 ensures that any one DataNode can fail and the replicated blocks will be available on other nodes and subsequently re-replicated on other DataNodes.
HDFS Safe Mode
⚫ When the NameNode starts, it enters a read-only safe mode in which blocks cannot be replicated or deleted.
⚫ Safe mode enables the NameNode to complete two important processes: rebuilding the previous file system state from the fsimage and edit-log files, and building the block-to-DataNode mapping from the incoming block reports.
HDFS Rack Awareness
• Rack awareness places block replicas on separate racks; in such a case, an entire rack failure will not cause data loss or stop HDFS from working.
• HDFS can be made rack-aware by using a user-derived script that enables the
master node to map the network topology of the cluster.
• A default Hadoop installation assumes all the nodes belong to the same rack.
NameNode High Availability
⚫ Hadoop 2 also provides NameNode Federation, which offers namespace scalability and better performance.
⚫ The HDFS BackupNode maintains an up-to-date copy of the file system namespace, both in memory and on disk.
⚫ A NameNode supports one BackupNode at a time.
HDFS snapshots
HDFS User Commands
⚫ The preferred way to interact with HDFS in Hadoop version 2 is through the hdfs command.
List Files in HDFS
❖ To list the files in the root HDFS directory, enter the following
command:
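For example:
$ hdfs dfs -ls /
Running hdfs dfs -ls with no path argument lists the current user's HDFS home directory instead.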
❖ To copy a file from your current local directory into HDFS, use the
following command. If a full path is not supplied, your home directory is
assumed. In this case, the file test is placed in the directory stuff that was
created previously.
• Syntax: $ hdfs dfs -put test stuff
• Output:
• Found 1 items
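A minimal sketch of the full sequence (using the directory stuff and the file test from this example):
$ hdfs dfs -mkdir stuff
$ hdfs dfs -put test stuff
$ hdfs dfs -ls stuff
The final -ls step produces the "Found 1 items" line above, confirming that test now resides under stuff in HDFS.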
❖ In this case, the file we copied into HDFS, test, will be copied back to the current local
directory with the name test-local.
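The corresponding command (standard hdfs dfs usage, with the names from this example):
$ hdfs dfs -get stuff/test test-local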
• Pig: A procedural language platform used to develop scripts for MapReduce operations.
• Sqoop: Used to import and export data between HDFS and an RDBMS.
• HBase: A distributed, column-oriented database built on top of the Hadoop file system.
• Hive: A platform used to develop SQL-type scripts for MapReduce operations.
• Flume: Used to handle streaming data on top of Hadoop.
• Oozie: Apache Oozie is a workflow scheduler for Hadoop.
Introduction to Pig
• Pig raises the level of abstraction for processing large datasets.
• It is a fundamental platform for analyzing large datasets, consisting of a high-level language for expressing data analysis programs.
• It is an open-source platform originally developed at Yahoo.
Usage of Apache Pig
▪ Apache Pig is a high-level language that enables programmers to write complex MapReduce transformations using a simple scripting language.
▪ Pig's simple SQL-like scripting language is called Pig Latin, and it appeals to developers already familiar with scripting languages and SQL.
▪ Pig Latin (the actual language) defines a set of transformations on a data set, such as aggregate, join, and sort.
▪ Pig is often used to extract, transform, and load (ETL) data and for quick research on raw data.
▪ Apache Pig has several usage modes. The first is a local mode, in which all processing is done on the local machine.
▪ The non-local (cluster) modes are MapReduce and Tez.
▪ These modes execute the job on the cluster using either the MapReduce engine or the optimized Tez engine.
Pig Example Walk-Through
▪ This walk-through builds working knowledge of Pig through hands-on experience creating Pig scripts to carry out essential data operations and tasks.
▪ In this simple example, Pig is used to extract user names from the
/etc/passwd file.
▪ The following example assumes the user is hdfs , but any valid user with access to HDFS can run the
example.
• To begin, first copy the passwd file to a working directory for local Pig operation:
$ cp /etc/passwd .
▪ Next, copy the data file into HDFS for Hadoop Map Reduce operation:
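For example, assuming the file was copied into the current working directory as shown above:
$ hdfs dfs -put passwd passwd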
▪ In local Pig operation, all processing is done on the local machine (Hadoop is not used). First, the interactive command line is started: $ pig -x local.
▪ You will also see a number of INFO messages. Next, enter the commands to load the passwd file, then grab the user name and dump it to the terminal.
▪ Pig commands must end with a semicolon (;).
▪ grunt> A = load 'passwd' using PigStorage(':');
▪ grunt> B = foreach A generate $0 as id;
▪ grunt> dump B;
▪ The processing will start and a list of user names will be printed to
the screen.
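The same commands can also be saved in a script file and run non-interactively; the file name id.pig below is an assumption for illustration:
/* id.pig: extract the user name field from passwd */
A = load 'passwd' using PigStorage(':');
B = foreach A generate $0 as id;
dump B;
Run it locally with $ pig -x local id.pig, or on the cluster (MapReduce mode) with $ pig id.pig.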
▪ Sqoop is used to:
• import data from a relational database management system (RDBMS) into the Hadoop Distributed File System (HDFS),
• transform the data in Hadoop (for example, with MapReduce), and
• export the data back into an RDBMS.
• Sqoop divides the input data set into splits, then uses individual map tasks to
push the splits to the database
Example: The following example shows the use of sqoop:
• Steps:
1. Download Sqoop.
2. Download and load sample MySQL data.
3. Add Sqoop user permissions for the local machine and cluster.
4. Import data from MySQL to HDFS.
5. Export data from HDFS to MySQL.
Step 1: Download Sqoop and Load Sample MySQL Database
• First install Sqoop, then download and load the sample MySQL world database, as sketched below.
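A minimal sketch, assuming a yum-based Hadoop distribution and MySQL's sample world database; the package names, download URL, and the sqoop user name/password are assumptions to adapt for your environment:
$ yum install sqoop sqoop-metastore
$ wget http://downloads.mysql.com/docs/world_innodb.sql.gz
$ gunzip world_innodb.sql.gz
$ mysql -u root -p
mysql> CREATE DATABASE world;
mysql> USE world;
mysql> SOURCE world_innodb.sql;
mysql> GRANT ALL PRIVILEGES ON world.* TO 'sqoop'@'localhost' IDENTIFIED BY 'sqoop';
mysql> quit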
Step 2: Add Sqoop User Permissions for the Local Machine and Cluster
Next, log in as sqoop to test the permissions:
$ mysql -u sqoop -p
mysql> USE world;
mysql> SHOW TABLES;
+-----------------+
| Tables_in_world |
+-----------------+
| City            |
| Country         |
| CountryLanguage |
+-----------------+
3 rows in set (0.01 sec)
mysql> quit
Step 3: Import Data Using Sqoop
• To import data, we need to make a directory in HDFS:
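For example (the directory name sqoop-mysql-import is an assumption for illustration):
$ hdfs dfs -mkdir sqoop-mysql-import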
• The following command imports the Country table into HDFS. The option --table signifies the table to import, --target-dir is the directory created previously, and -m 1 tells Sqoop to use one map task to import the data.
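A sketch of the import command; the host name, credentials, and target directory are assumptions to adapt for your cluster:
$ sqoop import --connect jdbc:mysql://limulus/world --username sqoop --password sqoop \
      --table Country -m 1 --target-dir /user/hdfs/sqoop-mysql-import/country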
• The file can be viewed using the hdfs dfs -cat command:
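For example (part-m-00000 is the standard name of a single map task's output file):
$ hdfs dfs -cat sqoop-mysql-import/country/part-m-00000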
Step 4: Export Data from HDFS to MySQL
• Sqoop can also be used to export data from HDFS. The first step is to create
tables for exported data.
• There are actually two tables needed for each exported table. The first table
holds the exported data (CityExport), and the second is used for staging the
exported data (CityExportStaging).
Enter the following MySQL commands to create these tables:
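A sketch of the table creation and the subsequent export; the column definitions mirror the sample City table, and the HDFS export directory is an assumption:
mysql> USE world;
mysql> CREATE TABLE CityExport (
    ID int(11) NOT NULL AUTO_INCREMENT,
    Name char(35) NOT NULL,
    CountryCode char(3) NOT NULL,
    District char(20) NOT NULL,
    Population int(11) NOT NULL,
    PRIMARY KEY (ID));
mysql> CREATE TABLE CityExportStaging LIKE CityExport;
The export itself can then be run with Sqoop's --staging-table option, for example:
$ sqoop export --connect jdbc:mysql://limulus/world --username sqoop --password sqoop \
      --table CityExport --staging-table CityExportStaging --clear-staging-table \
      -m 4 --export-dir /user/hdfs/sqoop-mysql-import/city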
▪ Apache Flume is an independent agent designed to collect, transport, and store data into
HDFS.
▪ Data transport involves a number of Flume agents that may traverse a series of machines and
locations.
▪ Flume is often used for log files, social media-generated data, email messages, and just about
any continuous data source.
▪ A Flume agent is composed of three components:
o Source: The source component receives data and sends it to a channel. It can send the
data to more than one channel.
o Channel: A channel is a data queue that forwards the source data to the sink
destination.
o Sink: The sink delivers data to destination such as HDFS, a local file, or another
Flume agent.
▪ A Flume agent must have all three of these components defined. A Flume agent can have several sources, channels, and sinks.
▪ A source can write to multiple channels, but a sink can take data from only a single channel.
▪ Data written to a channel remain in the channel until a sink removes the data.
▪ By default, the data in a channel are kept in memory but may be optionally stored on disk
to prevent data loss in the event of a network failure.
▪ As shown in the figure above, Flume agents may be placed in a pipeline, possibly to traverse several machines or domains.
▪ In this Flume pipeline, the sink from one agent is connected to the source of
another.
▪ The data transfer format normally used by Flume is called Apache Avro.
▪ Avro is a data serialization/deserialization system that uses a compact binary
format.
▪ The schema is sent as part of the data exchange and is defined using JSON.
▪ Avro also uses remote procedure calls (RPCs) to send data.
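A minimal single-agent configuration sketch; the agent name (a1), component names, and the netcat-to-HDFS choice are assumptions for illustration, using standard Flume properties-file syntax:
# One agent (a1) with a netcat source, a memory channel, and an HDFS sink
a1.sources = src1
a1.channels = ch1
a1.sinks = sink1
a1.sources.src1.type = netcat
a1.sources.src1.bind = localhost
a1.sources.src1.port = 44444
a1.channels.ch1.type = memory
a1.sinks.sink1.type = hdfs
a1.sinks.sink1.hdfs.path = /user/hdfs/flume-test
# Wire the source and sink to the channel
a1.sources.src1.channels = ch1
a1.sinks.sink1.channel = ch1
The agent can then be started with something like:
$ flume-ng agent -n a1 -f flume.conf -c conf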
Oozie Example Walk-Through
To run the Oozie MapReduce example job from the oozie-examples/apps/map-reduce directory, enter the following line:
$ oozie job -run -oozie http://limulus:11000/oozie -config job.properties
When Oozie accepts the job, a job ID will be printed:
job: 0000001-150424174853048-oozie-oozi-W
You will need to change the “limulus” host name to match the name of the node running your Oozie server.
The job ID can be used to track and control job progress.
Step 3: Run the Oozie Demo Application
• A more sophisticated example can be found in the demo directory (oozie-examples/apps/demo). This
workflow includes MapReduce, Pig, and file system tasks as well as fork, join, decision, action, start, stop,
kill, and end nodes.
• Move to the demo directory and edit the job.properties file as described previously. Entering the following
command runs the workflow (assuming the OOZIE_URL environment variable has been set):
• $ oozie job -run -config job.properties
• You can track the job using either the Oozie command-line interface or the Oozie web console. To start the
web console from within Ambari, click on the Oozie service, and then click on the Quick Links pull-down
menu and select Oozie Web UI. Alternatively, you can start the Oozie web UI by connecting to the Oozie
server directly. For example, the following command will bring up the Oozie UI (use your Oozie server host
name in place of “limulus”):
• $ firefox http://limulus:11000/oozie/
A Short Summary of Oozie Job Commands
The following summary lists some of the more commonly encountered Oozie
commands. See the latest documentation at http://oozie.apache.org for more
information. (Note that the examples here assume OOZIE_URL is defined.)
Run a workflow job (returns _OOZIE_JOB_ID_):
$ oozie job -run -config JOB_PROPERTIES
Submit a workflow job (returns _OOZIE_JOB_ID_ but does not start):
$ oozie job -submit -config JOB_PROPERTIES
Start a submitted job:
$ oozie job -start _OOZIE_JOB_ID_
Check a job’s status:
$ oozie job -info _OOZIE_JOB_ID_
Suspend a workflow:
$ oozie job -suspend _OOZIE_JOB_ID_
Resume a workflow:
$ oozie job -resume _OOZIE_JOB_ID_
Rerun a workflow:
$ oozie job -rerun _OOZIE_JOB_ID_ -config JOB_PROPERTIES
Kill a job:
$ oozie job -kill _OOZIE_JOB_ID_
View server logs:
$ oozie job -logs _OOZIE_JOB_ID_
Full logs are available at /var/log/oozie on the Oozie server.
HBase provides a shell for interactive use. To enter the shell, type the following as a user:
$ hbase shell
hbase(main):001:0>
To exit the shell, type exit. Various commands can be conveniently entered from the shell prompt. For instance, the status command provides the
system status:
hbase(main):001:0> status
4 servers, 0 dead, 1.0000 average load
Additional arguments can be added to the status command, including 'simple', 'summary', or 'detailed'. The single quotes are needed for proper
operation. For example, the following command will provide simple status information for the four HBase servers (actual server statistics have been
removed for clarity):
hbase(main):002:0> status 'simple'
4 live servers
n1:60020 1429912048329
n2:60020 1429912040653
limulus:60020 1429912041396
...
n0:60020 1429912042885
...
0 dead servers
Aggregate load: 0, regions: 4
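Tables can also be created and manipulated directly from the shell. A minimal sketch (the table and column-family names are assumptions for illustration):
hbase(main):003:0> create 'testtable', 'cf'
hbase(main):004:0> put 'testtable', 'row1', 'cf:greeting', 'hello'
hbase(main):005:0> get 'testtable', 'row1'
hbase(main):006:0> scan 'testtable'
hbase(main):007:0> disable 'testtable'
hbase(main):008:0> drop 'testtable'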
Apache HBase Web Interface
• Like many of the Hadoop ecosystem tools, HBase has a web interface. To start the HBase console,
shown in Figure 7.11, from within Ambari, click on the HBase service, and then click on the Quick
Links pull-down menu and select HBase Master UI. Alternatively, you can connect to the HBase master
directly to start the HBase web UI. For example, the following command will bring up the HBase UI
(use your HBase master server host name in place of “limulus”):
• $ firefox http://limulus:60010/master-status