CM 4.5 Enterprise Help Guide
Cloudera, Inc.
220 Portage Avenue
Palo Alto, CA 94306
[email protected]
US: 1-888-789-1488
Intl: 1-650-362-0488
www.cloudera.com
Important Notice
© 2010-2013 Cloudera, Inc. All rights reserved.
Cloudera, the Cloudera logo, Cloudera Impala, Impala, and any other product or service names or
slogans contained in this document, except as otherwise disclaimed, are trademarks of Cloudera and its
suppliers or licensors, and may not be copied, imitated or used, in whole or in part, without the prior
written permission of Cloudera or the applicable trademark holder.
Hadoop and the Hadoop elephant logo are trademarks of the Apache Software Foundation. All other
trademarks, registered trademarks, product names and company names or logos mentioned in this
document are the property of their respective owners. Reference to any products, services, processes or
other information, by trade name, trademark, manufacturer, supplier or otherwise does not constitute
or imply endorsement, sponsorship or recommendation thereof by us.
Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights
under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval
system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or
otherwise), or for any purpose, without the express written permission of Cloudera.
Cloudera may have patents, patent applications, trademarks, copyrights, or other intellectual property
rights covering subject matter in this document. Except as expressly provided in any written license
agreement from Cloudera, the furnishing of this document does not give you any license to these
patents, trademarks, copyrights, or other intellectual property.
The information in this document is subject to change without notice. Cloudera shall not be liable for
any damages resulting from technical errors or omissions which may be present in this document, or
from use of this document.
Version: 4.5
Date: March 31, 2013
Contents
ABOUT THIS GUIDE
OTHER CLOUDERA MANAGER GUIDES
INTRODUCING CLOUDERA MANAGER
CLOUDERA MANAGER ARCHITECTURE
What You Can Use Cloudera Manager to Do
OVERVIEW OF USING CLOUDERA MANAGER FOR CONFIGURING SERVICES
OVERVIEW OF USING CLOUDERA MANAGER FOR MONITORING SERVICES AND USER ACTIVITIES
BASICS OF USING CLOUDERA MANAGER
STARTING THE CLOUDERA MANAGER ADMIN CONSOLE
ABOUT THE CLOUDERA MANAGER ADMIN CONSOLE
Search Box
Running Commands Indicator
Configuration Validations Indicator
New Parcel Indicator
Support Menu
Help Menu
Logged-in User Menu
Administration
SELECTING A TIME RANGE
Current vs. Historical Data
ABOUT EVENTS AND ALERTS
Events
Alerts
ABOUT SERVICE, ROLE, AND HOST HEALTH
CLOUDERA MANAGER USER ACCOUNTS
Changing Your Password
Adding Cloudera Manager User Accounts
Deleting an Account
SERVICES MONITORING
MONITORING THE HEALTH AND STATUS OF SERVICES
Service Health and Status
DELETING HOSTS
USING THE HOST INSPECTOR
Running the Host Inspector
Viewing Past Host Inspector Results
DECOMMISSIONING A HOST
RE-RUNNING THE CLOUDERA MANAGER UPGRADE WIZARD
MANAGING PARCELS
Downloading a Parcel
Distributing a Parcel
Activating a Parcel
Deactivating a Parcel
Parcel Configuration Settings
WORKING WITH HOST TEMPLATES
Creating a Host Template
Applying a Host Template to a Host
RESOURCE MANAGEMENT
Resource Management via Control Groups (Cgroups)
Existing Resource Management Controls
Examples
ACTIVITY MONITORING
VIEWING ACTIVITIES
Selecting Columns to Show in the Activities List
Sorting the Activities List
Filtering the Activities List
Activity Charts
VIEWING THE JOBS IN A PIG, OOZIE, OR HIVE ACTIVITY
VIEWING A JOB'S TASK ATTEMPTS
Selecting Columns to Show in the Tasks List
Sorting the Tasks List
Filtering the Tasks List
VIEWING ACTIVITY DETAILS IN A REPORT FORMAT
COMPARING SIMILAR ACTIVITIES
Cloudera Manager 4.5.x Release Notes
Cloudera Manager Installation Guide
Configuring Hadoop Security with Cloudera Manager
Configuring TLS Security for Cloudera Manager
Configuring Ports for Cloudera Manager
Cloudera Manager provides many useful features for monitoring the health and performance of the
components of your cluster (hosts, service daemons) as well as the performance and resource demands
of the user jobs running on your cluster.
With Cloudera Manager, you can easily deploy and centrally operate a complete Hadoop stack. The
application automates the installation process, reducing deployment time from weeks to minutes; gives
you a cluster-wide, real-time view of the services running and the status of their hosts; provides a single,
central place to enact configuration changes across your cluster; and incorporates a full range of
reporting and diagnostic tools to help you optimize cluster performance and utilization.
Cloudera Manager provides full lifecycle management for Apache Hadoop.
Lets you install multiple clusters, with the choice of running CDH3 or CDH4 on a given cluster.
Gives you complete, end-to-end visibility and control over your Hadoop clusters from a single
interface.
Correlates jobs, activities, logs, system changes, configuration changes, service and host metrics
along a single timeline to simplify diagnosis.
Lets you set server roles, configure services and manage security across the cluster.
Maintains a complete record of configuration changes with the ability to roll back to previous
states.
Automatically deploys client configuration files for the services you have installed.
Supports HDFS High Availability using either Quorum-based storage (introduced with CDH 4.1)
for its shared directory, or an NFS-mounted shared edits directory.
Monitors dozens of service performance metrics and alerts you when you approach critical
thresholds.
Lets you gather, view and search Hadoop logs collected from across the cluster.
Creates and aggregates relevant Hadoop events pertaining to system health, log messages, user
services and activities and makes them available for alerting (by email) and searching.
Lets you drill down into individual workflows and jobs at the task attempt level to diagnose
performance issues.
Shows information pertaining to hosts in your cluster including status, resident memory, virtual
memory and roles.
Monitors the available space in log and other directories used by Cloudera Manager and CDH
components.
Provides operational reports on current and historical disk usage by user, group, and directory,
as well as MapReduce activity on the cluster by job or user.
Takes a snapshot of the cluster state and automatically sends it to Cloudera support to assist
with resolution.
You work primarily in the Cloudera Manager Admin Console in a web browser that is connected to the
Cloudera Manager Server, where you can manage the configuration settings, monitor the health of your
services, and monitor and track user activity on your cluster.
Tracks the Cloudera Manager data model, which is stored in the Cloudera Manager Server
database. The data model is a catalog of the available host machines in the cluster, and the
services, roles, and configurations assigned to each host.
Communicates with Agents to send configuration instructions and track Agents' heartbeats
Provides an Admin console for the operator to perform management and configuration tasks
Calculates and displays the health of the cluster and its components
Provides a comprehensive set of APIs for the various features supported in Cloudera Manager
Monitors the health of Hadoop daemons, and dozens of service performance metrics, and alerts
you when you approach critical thresholds.
Each Agent starts and stops Hadoop daemons on the local host machine and collects statistics (overall
and per-process memory usage and CPU usage, log tailing) for health calculations and status in the
Admin console.
Note
The Cloudera Manager Agent runs as root so that it can make sure the required directories are
created and that processes and files are owned by the appropriate user (for example, the hdfs
user and mapred user).
After First Run, you can use the Cloudera Manager Admin Console to:
Configure CDH while seeing suggested ranges of values for parameters and illegal values
highlighted; you can also configure override settings on specific hosts, and for specific role
instances.
Specify dependencies between services. Configuration changes for a service are propagated to
its dependent services.
Generate CDH configurations for clients to use to connect to the cluster, and deploy those
configurations automatically to clients.
Download, distribute and activate a new CDH version (CDH 4.1.2 or later) all from within
Cloudera Manager.
Use the Cloudera Manager API to export or import deployment settings to and from clusters.
Display metrics about your jobs, such as the number of currently running tasks and their CPU
and memory usage.
Display metrics about your Hadoop services, such as the average HDFS I/O latency and the
number of jobs running concurrently.
Display metrics about your cluster, such as the average CPU load across all your machines.
Get assistance with configuring Kerberos security (Cloudera Manager generates and installs the
host and service key tab files for you.)
Temporarily suppress alerting for individual roles, services, hosts, or even the entire cluster to
allow maintenance/troubleshooting without generating excessive alert traffic.
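The deployment export/import capability above is exposed through the Cloudera Manager REST API; CM 4.5 serves API version 4. A hedged sketch of constructing the export URL (the host name and admin credentials below are placeholders for your own deployment):

```shell
# Placeholders: substitute your own Cloudera Manager Server host and port.
CM_HOST=cm-server.example.com
CM_PORT=7180

# The full deployment description is served at /cm/deployment.
EXPORT_URL="http://${CM_HOST}:${CM_PORT}/api/v4/cm/deployment"
echo "$EXPORT_URL"

# Against a live server, the export could be saved with:
# curl -u admin:admin "$EXPORT_URL" > deployment.json
```

The saved JSON document can later be pushed back through the same API to restore a deployment's settings.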
Cloudera Manager also collapses several levels of CDH configuration abstraction into one. For example,
you can manage Java heap usage in the same place as Hadoop-specific parameters. Cloudera Manager is
internally secure, and you can configure the Admin Console and Agents to connect with the Server over
TLS.
Monitoring Hosts
Host monitoring provides a variety of status and metric information about the hosts in your
cluster: which hosts are up or down, current resident and virtual memory consumption for a host, what
role instances are running on a host, which hosts are assigned to different racks, and so on. You can look
at a summary view for all hosts in your cluster or drill down for extensive details about an individual
host, including charts that provide a visual overview of key metrics on your host.
Monitoring User Activities
Activity Monitoring lets you see who's running what activities on the cluster, both at the current time
and through views of historical activity, and provides many statistics - both in tabular displays and charts
about the resources used by individual jobs. You can compare the performance of similar jobs and
view the performance of individual task attempts across a job to help diagnose behavior or performance
problems.
Searching Logs and Events
Cloudera Manager provides access to logs and events in a variety of ways that take into account the
current context you are viewing. For example, when monitoring a service, you can easily click a single
link to view the log entries related to that specific service, through the same user interface. When
viewing information about a user activity, you can easily view the relevant log or event and alert entries
that occurred on the hosts used by the job while the job was running.
You can also search independently for log entries or events and alerts by time range, service, host,
keyword.
The Event Server aggregates relevant Hadoop events and makes them available for alerting and for
searching, giving you a view into the history of all relevant events that occur cluster-wide.
Receiving Alert Notifications
You can configure Cloudera Manager to generate Alerts from a variety of events. You can configure
thresholds for certain types of events, enable and disable them, and configure email delivery of Alerts
on critical events. You can also suppress alerts temporarily for individual roles, services, hosts, or even
the entire cluster to allow system maintenance/troubleshooting without generating excessive alert
traffic.
Operational Reports
Reports provide a historical view into disk utilization by user, user group, and directory. You can
manage your HDFS directories as well, including searching and setting quotas. You can also view cluster
job activity by user, group, or job ID. These reports are aggregated over selected time periods (Hourly,
Daily, Weekly, etc.) and can be exported as XLS or CSV files.
The following topics describe some of the basic features of the Cloudera Manager Admin Console. The
Admin Console supports the following browsers:
Internet Explorer 9
Google Chrome
Safari 5
1. In a web browser, go to http://<Server host>:<port>, where:
<Server host> is the name or IP address of the host machine where the Cloudera Manager
Server is installed.
<port> is the port configured for the Cloudera Manager Server. The default port is 7180.
2. Log into the Cloudera Manager Admin Console. The admin user credentials are:
Username: admin
Password: admin
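Putting the two pieces together, the Admin Console address has the shape below (the host name is a placeholder; 7180 is the default port):

```shell
# Placeholder host; 7180 is the default Cloudera Manager Server port.
SERVER_HOST=cm-server.example.com
PORT=7180
CONSOLE_URL="http://${SERVER_HOST}:${PORT}"
echo "$CONSOLE_URL"
```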
Note
For security, change the password for the default admin user account as soon as possible. You can
also add user accounts and selectively assign admin privileges to them as necessary. For
instructions, see Changing the Password for an Account.
View the status and other details of a Service instance, or the Role instances associated with the
service
View the commands that have been run for a service or a role
Decommission a role
You can also perform actions unique to a specific type of service, for example:
Hosts: The Host Management section describes the features under the Hosts tab. Under this tab you
can:
View the status and a variety of detail metrics about individual hosts
View all the processes running on a host, with status, access to logs and so on
Activities: The Activities tab lets you monitor and manage MapReduce jobs running on your clusters.
The Activity Monitoring section of this Guide describes these features in detail.
Logs: The Logs page presents log information for Hadoop services, and lets you search by service, role,
host, and/or search phrase, as well as log level (severity).
Events: The Events page lets you search for and display events and alerts that have occurred
within a time range you select, anywhere in your cluster. See Searching for Events and Alerts for more
information.
Charts: The Charts page lets you search for metrics of interest and display them as charts. You can also
create custom chart views that can act as a personalized dashboard for your cluster. See Charting Time-series Data for details.
Reports: The Reports tab lets you create reports about the usage of HDFS in your cluster, as well as
browse files and manage quotas for HDFS directories. See Viewing Reports for details.
Search Box
The Search box on the top navigation bar lets you search by Service, Role or Host name. You can enter a
partial name and it will search for all entities that match.
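The partial-match behavior works like a simple substring filter over entity names. A toy sketch (the entity names are invented, not from a real cluster):

```shell
# Invented entity names; the Search box matches any entity whose
# name contains the typed fragment.
match_entities() {
  printf '%s\n' hdfs1 mapreduce1 hbase1 zookeeper1 | grep -- "$1"
}
match_entities oo   # matches zookeeper1
```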
Running Commands Indicator
The Running Commands indicator, to the left of the Support menu, shows how many commands are
currently running in the clusters you are managing. Click it to see a list of running commands in your
cluster. See Viewing Running and Recent Commands for more information.
Configuration Validations Indicator
The Configuration Validations indicator, to the left of the Running Commands indicator, shows how
many actionable validation notifications (Validation Errors and Warnings) are pending for your
cluster. The color of the badge with the number in it indicates the severity of the notifications: red
indicates a Validation Error, orange indicates a Validation Warning, and blue is informational. Click
the indicator to open a dialog box where you can filter for and display these notifications. See
Configuration Validation Notifications for more information.
New Parcel Indicator
The New Parcel indicator, to the left of the Configuration Validations indicator, shows whether
a parcel for a newer version of your software is available for the clusters you are managing. Click the
indicator to see a list of parcels available for your cluster. See Managing Parcels for more information.
This is also where you can change your own password: the Change Password option opens the
Account Settings page where you can change your password.
Administration
Click the gear icon to display the Administration page. For details of the functions available from the
page, see Administration.
The background chart in the Time Range Selector bar shows the percentage of CPU utilization on all
machines in the cluster, updated at approximately one-minute intervals, depending on the total visible
time range. You can use this graph to identify periods of activity that may be of interest. Note that the
background chart appears even when the Time Range Selector handles are not available.
Current vs. Historical data
There are two ways to look at information about your cluster: its current status and health, or its
status and health at some point (or during some interval) in the past.
Specifying a Point in Time for "Snapshot" Data
Health and status information on pages such as the All Services page and Service status pages reflects
the state at a single point in time (a snapshot of the health and status). By default, this is the status and
health at the current time. However, by moving the Time Marker to an earlier point on the time
range graph, you see the status as it was at the selected point in the past.
When the Time Marker is set to the current time, it is blue. When it is set to a time in the
past, it is orange.
When the Time Marker is set to a past time, you can quickly switch back to view the current
status using the Current Time button.
When displayed data is from a single point in time (a snapshot), the panel or column displays
a small version of the Time Marker icon. This indicates that the data
corresponds to the time at the location of the Time Marker on the Time Range Selector. The
Status and Health panels in the Service status pages are examples of this.
Note: When you are looking at a point in the past, some functions may not be available. For
example, on a Service Status page, the Actions menu (where you can take actions like stopping,
starting, or restarting services or roles) is accessible only when you are looking at Current status.
Charts that appear on the individual Service Status and Host Status pages also show data over a time range.
For this type of display, there are several ways to select a time range of interest.
You can drag one (or both) edges of the highlighted area of the graph to expand or contract the
range for which data will be displayed. (See the handles noted in the illustration below.)
You can also grab and slide the highlighted area as a unit.
You can zoom the time line out or in to change the time scale and make it easier to drag the
time slider to an earlier or later time.
To look backwards from the present time, you can use preselected time chunks (the
past 30 minutes, the past hour, and several other intervals up to the past day). Select
among the options provided in the time selection widget.
To enter a specific start time and stop time, enter them into the fields provided.
When displayed data covers a time interval rather than a single point in time, an icon
representing the Time Range Selector appears in the header of the panel. This
indicates that the displayed data corresponds to the time range currently selected and
highlighted on the Time Range Selector. The Charts panel and the Logs and Events tabs shown
on the individual Service Status pages are examples of this.
When you are under the Activities tab with an individual activity selected, a Zoom to Duration
button is available. This lets you zoom the time selection to include just the time range that
corresponds to the duration of your selected activity.
Note: When you are looking at historical information, some functions will not be available. For
example, the Actions menu, where you can take actions like stopping, starting, or restarting
services or roles, is accessible only when you are looking at Current status.
Zoom Out lets you show a longer time period on the time range graph (with correspondingly
less granular segmentation).
Zoom In shows a shorter time period with more detailed interval segments.
Zooming does not change your selected time range. However, the ability to zoom the Time
Range Selector can make it easier to use the selector to highlight a time range.
2. Either enter a start time and an end time into the fields provided and click OK to put your choice
into effect, or select a time interval relative to the current time (such as the past 30 minutes or the
past 6 hours) using the buttons provided.
Health Check Events: These events indicate that certain health check activities have occurred, or
that health check results have met specific conditions (thresholds).
Log Events: These events are generated for certain types of log messages from HDFS,
MapReduce, or HBase services and roles. Log events are created when a log
entry matches a set of rules for identifying messages of interest. The default
set of rules is based on Cloudera's experience supporting Hadoop clusters.
You can configure additional log event rules if necessary.
Audit Events
Activity Events: These are events generated by the Activity Monitor; specifically, for jobs that
fail, or that run slowly (as determined by comparison with duration limits). To
monitor your workload for slow-running jobs, you need to set up
Activity Duration Rules.
Good: For a specific health check, the returned result is normal or within the acceptable
range. For a role or service, this means all health checks for that role or service are Good.
Concerning: For a specific health check, the returned result indicates a potential problem. Typically
this means the test result has gone above (or below) a configured Warning threshold.
For a role or service, this means that at least one health check is Concerning.
Bad: For a specific health check, the check failed, or the returned result indicates a serious
problem. Typically this means the test result has gone above (or below) a configured
Critical threshold. For a role or service, this means that at least one health check is Bad.
Unknown: For a role or service, its health is Unknown. This can occur for a number of reasons,
such as when the Service Monitor is not running, or connectivity to the agent doing the
health monitoring has been lost.
Disabled: Health checks have been disabled in the configuration for this service or role.
Historical data not available: Cloudera Manager does not support the collection of historical
information for this role or service. This is the case for services other than HDFS,
MapReduce, and HBase, such as ZooKeeper, Oozie, or Hue.
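The roll-up rule described above (a role or service takes the worst result among its individual health checks) can be sketched as follows; the function and status strings are illustrative, not Cloudera Manager internals:

```shell
# Overall health is the worst individual check result:
# Bad beats Concerning, which beats Good.
worst_health() {
  worst="Good"
  for h in "$@"; do
    if [ "$h" = "Bad" ]; then
      worst="Bad"
    elif [ "$h" = "Concerning" ] && [ "$worst" = "Good" ]; then
      worst="Concerning"
    fi
  done
  echo "$worst"
}
worst_health Good Concerning Good   # prints "Concerning"
```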
There are several types of health checks that can be performed, depending on the type of service or role
instance:
Simple pass/fail checks, such as whether a service or role started as expected, a DataNode is
connected to its NameNode, or a TaskTracker is (or is not) blacklisted. These checks result in
the health of that metric being either Good or Bad.
Metric-type checks, such as the number of file descriptors in use, the amount of disk space used
or free, how much time is spent in garbage collection, or how many pages were swapped to disk in
the previous 15 minutes.
The results of these checks can be compared to threshold values that determine
whether everything is OK (for example, plenty of disk space available), whether it is Concerning
(disk space getting low), or whether it is Bad (a critically low amount of disk space).
HDFS (NameNode) and HBase also run a health check known as the "canary" test: the service
periodically performs a set of simple create, write, read, and delete operations to determine
whether it is indeed functioning.
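For a metric-type check, the threshold comparison works roughly as below; the free-disk-space figures and threshold values are invented for illustration, not Cloudera Manager defaults:

```shell
# Classify free disk space (percent) against invented
# Concerning/Bad thresholds, as a metric-type check might.
check_disk() {
  free_pct=$1
  warn=20   # at or below this: Concerning
  crit=10   # at or below this: Bad
  if [ "$free_pct" -le "$crit" ]; then
    echo "Bad"
  elif [ "$free_pct" -le "$warn" ]; then
    echo "Concerning"
  else
    echo "Good"
  fi
}
check_disk 8    # prints "Bad"
check_disk 15   # prints "Concerning"
check_disk 60   # prints "Good"
```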
By default most health checks are enabled and (if appropriate) configured with reasonable thresholds.
You can modify threshold values by editing the Monitoring properties under the Configuration tab for the
service. You can also enable or disable individual or summary health checks, and in some cases specify
what should be included in the calculation of overall health for the service or role. See Configuring
Monitoring Settings for more information.
Administrator privileges: Allows the user to add, change, delete, and configure services or
administer user accounts. Even if you are using an external authentication mechanism for
user authentication, users with Administrator privileges can also log in to Cloudera
Manager using their local Cloudera Manager username and password. (This prevents the
system from locking everyone out if the external authentication settings get misconfigured.)
No Administrator privileges: User accounts that don't have Administrator privileges can view
services and monitoring information but they cannot add services or take any actions that affect
the state of the cluster.
When you are logged in to the Cloudera Manager Admin Console, the user name you are logged in as is
shown on the top navigation bar. For example, if you are logged in as admin, you will see this:
Important
As soon as possible after running the installation wizard and beginning to use Cloudera Manager,
you should use the following procedure to change the password for the default admin account, if
you have not already done so.
Services Monitoring
Cloudera Manager's Service Monitoring feature monitors dozens of service health and performance
metrics about the services and role instances running on your cluster. It presents health and
performance data in a variety of formats including interactive charts, monitors metrics against
configurable thresholds, generates events related to system and service health and critical log entries
and makes them available for searching and alerting, and maintains a complete record of service-related
actions and configuration changes.
Note
Impala and HBase monitoring are separately licensed features. To determine what license
capabilities you have, go to the License tab on the Administration page (see Administering
Licenses).
The following topics describe how to monitor the services and role instances installed on your cluster.
Monitor the health and status of the services running on your clusters.
Access the client configuration files generated by Cloudera Manager that enable Hadoop client
users to work with the HDFS, MapReduce, HBase, and YARN services you added. (Note that
these configuration files are normally deployed automatically when you install your cluster or
add a service).
Install an additional cluster. After initial installation, you can use the Add Cluster wizard to add
and configure an additional cluster. See Managing Multiple Clusters and Adding a Cluster for
more information on this topic.
You can also pull down a menu from an individual service name to go directly to one of the tabs for that
service: its Status, Instances, Commands, Configuration, Audits, or Charts tab.
Service Health and Status
To view the status of your services, click the Services tab and select All Services.
The Services page opens and displays an overview of the service instances currently installed on your
cluster.
For each service instance, this page shows:
The type and number of the roles that have been configured for that service instance.
Note: By default, the All Services page shows the current state of the services in your cluster. By
moving the Time Marker ( ), you can see what the status was at any point in the past. When you
are looking at the past, the Actions menus and most other commands are disabled, and Role
Counts information may not be accurate. Click the Current Time button ( ) to return to the current
time.
See Selecting the Time Range for details of how time range selection works in Cloudera Manager.
Add a Service
After initial installation, you can use the Add a Service wizard to add and configure (but not start) new
service instances. The Add a Service... command is found under the cluster Actions menu for the
cluster where you want to add the service.
The cluster Actions menu, and thus the Add a Service... command, is not available if you are viewing
status for a point in time in the past.
See Adding Services for more information on this topic.
View the URLs of the Client Configuration Files
To allow Hadoop client users to work with the HDFS, MapReduce, YARN, and HBase services you created,
Cloudera Manager generates client configuration files that contain the relevant settings from your
services. These files are deployed automatically by Cloudera Manager based on the services you have
installed, when you add a service, or when you add a Gateway role on a host.
You can download and distribute these client configuration files manually to the users of a service, if
necessary.
The Client Configuration URLs command on the cluster Actions menu opens a pop-up that displays links
to the client configuration zip files created for the services installed in your cluster. You can download
these zip files by clicking the link.
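If you script the manual distribution of these files, the download-and-unpack step might look like the following sketch. The URL shown in the usage comment is a placeholder, not a real endpoint; use the actual link displayed in the Client Configuration URLs pop-up:

```python
import io
import urllib.request
import zipfile

def download_client_config(url, dest_dir):
    """Download a client configuration zip and unpack it into dest_dir.

    Returns the list of file names contained in the archive.
    """
    with urllib.request.urlopen(url) as resp:
        data = resp.read()
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        zf.extractall(dest_dir)
        return zf.namelist()

# Usage (placeholder URL -- copy the real link from the pop-up):
# files = download_client_config(
#     "http://cm-host:7180/.../hdfs-clientconfig.zip", "/etc/hadoop/conf")
```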
The Client Configuration URLs button is not available if you are viewing status for a point in time
in the past.
See Deploying Client Configuration Files for more information on this topic.
View the Health and Status of a Service Instance or Role Instance
To see the status of a service instance:
From the Services tab, select the service instance you want to see.
This will open the Status page where you can view a variety of information about a service and its
performance. See Viewing Service Status for details.
To see the status of a role instance:
If there is just one instance of this role, this opens the Status tab for the role instance.
If there are multiple instances of a role, clicking the role link under Role Counts will open the Instances
tab for the service, showing instances of the role type you have selected. See Viewing Status for a Role
Instance for details.
If you are viewing a past point in time, the Role Count links will be greyed out, but still functional. Their
behavior will depend on whether historical data is available for the role instance.
Viewing the Maintenance Mode Status of a Cluster
Click the View Maintenance Mode Status button to view the status of your cluster in terms of
which components (service, roles or hosts) are in maintenance mode.
This pops up a dialog box that shows the components in your cluster that are in maintenance mode, and
indicates which are in effective maintenance mode as well as those that have been placed into
maintenance mode explicitly. (See Maintenance Mode for an explanation of explicit maintenance mode
and effective maintenance mode.)
From this dialog box you can select any of the components shown there, and remove them from
maintenance mode.
If individual services are in maintenance mode, you will see the maintenance mode icon next to the
Actions button for that service.
The View Maintenance Mode Status button is not available if you are viewing status for a point in time
in the past.
The Actions Menus
There are two Actions menus available on the All Services page: one for the cluster, and one for each
service.
Actions for a Cluster
There are multiple actions you can take at a cluster level:
Deploy the client configurations onto the appropriate nodes of the cluster, or view the client
configuration file URLs.
Stop, start, restart, or delete the service (the available actions depend on the current status of
the associated service; for example, you cannot Start a Started service).
These actions are covered in the Services Configuration section of this document:
Viewing Service Status
To see the status of a Service instance:
Pull down the menu from the Services tab and select the service instance you want to see. OR
Click either the Status or Health value associated with the instance.
For all service types there is a Status and Health Summary that shows, for each configured role, the
overall status and health of the role instance(s).
Note: Not all service types provide complete monitoring and Health information. Hue, Oozie, Hive,
and YARN (CDH4 only) only provide the basic Status and Health Summary.
Impala also provides only basic status and health information if you do not have a license for
Impala monitoring.
Each service that supports monitoring provides a set of monitoring properties where you can enable or
disable health tests and events, and modify the thresholds for certain health checks. For more
information, see Configuring Monitoring Settings.
The HDFS, MapReduce, HBase, ZooKeeper, and Flume NG services also provide much additional
information: a snapshot of service-specific metrics, Health Test results, and a set of charts that provide a
historical view of metrics of interest.
Impala also provides this information if you have a license installed that enables Impala monitoring.
The Actions Menu
The Actions menu is available from the Service Status page when you are viewing Current time status.
The commands function at the service level; for example, Restart selected from this page will restart
all the roles within this service.
Some services, such as HDFS, provide additional commands that are unique to that service.
Note: The Actions menu is only available when you are viewing Current status. The menu is
disabled if you are viewing a point in time in the past.
This page shows information for the time range currently selected on the Time Range Selector (which defaults to
the past 30 minutes). By default, the information shown on this page is for the current time. You can
view status for a past point in time simply by moving the time marker ( ) to a point in the past.
When you move the time marker to a point in the past (for Services/Roles that support health history),
the Health Status clearly indicates that it is referring to a past time. A Current Time button ( ) is present
whenever you are viewing past status, to enable you to quickly switch to view the current state of the
service. In addition, the Actions menu is disabled while you are viewing status in the past to ensure
that you cannot accidentally take an action based on outdated status information.
See Selecting a Time Range for more details.
Started
For a service, this indicates the service is running, but at least one of its roles is
running with a configuration that does not match the current configuration
settings in Cloudera Manager.
For a role, this indicates a configuration change has been made that requires a
restart, and that restart has not yet occurred.
Starting
For a service, this indicates the service is starting up, but at least one of its
roles has a configuration that does not match the current configuration
settings in Cloudera Manager. For a role, this indicates a configuration change
has been made that requires a restart, and that restart has not yet occurred.
Stopping
The service or role is stopping: a stop command has been issued, but the
service (and its roles) have not finished shutting down.
Stopped
History Unavailable
Cloudera Manager does not support historical information for this role or
service. This is the case for services such as ZooKeeper, Oozie, or Hue
(that is, services other than HDFS, MapReduce, and HBase).
N/A
The service or role is not started or stopped in the same way as a regular
service or role. Examples are the HDFS Balancer (which runs from the HDFS
Rebalance action) or Gateway roles. The Start and Stop commands are not
applicable to these instances.
Unknown
The overall Health Status for a service is a roll-up of the health check results for the service and all its
role instances.
The Health status can be:
Good: For a specific health check, the returned result is normal or within the acceptable
range. For a role or service, this means all health checks for that role or service are Good.
Concerning: For a specific health check, the returned result indicates a potential problem.
Typically this means the test result has gone above (or below) a configured Warning threshold.
For a role or service, this means that at least one health check is Concerning.
Bad: For a specific health check, the check failed, or the returned result indicates a serious
problem. Typically this means the test result has gone above (or below) a configured
Critical threshold. For a role or service, this means that at least one health check is Bad.
Unknown: For a role or service, its health is Unknown. This can occur for a number of reasons,
such as the Service Monitor not running, or loss of connectivity to the agent doing the
health monitoring.
Disabled: Health Checks have been disabled in the configuration for this service or role.
History Unavailable: Cloudera Manager does not support the collection of historical information
for this role or service. This is the case for services such as ZooKeeper, Oozie, or Hue
(that is, services other than HDFS, MapReduce, and HBase).
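The roll-up described above amounts to taking the worst individual result. A minimal sketch follows; the relative ordering of Unknown and Concerning is an assumption here (the guide only establishes Good < Concerning < Bad, and that a Bad check dominates):

```python
# Severity order used for the roll-up. The position of "Unknown" relative to
# "Concerning" is an assumption, not documented behavior.
SEVERITY = {"Good": 0, "Unknown": 1, "Concerning": 2, "Bad": 3}

def overall_health(check_results):
    """Roll up individual health check results into one status (sketch only).

    At least one Bad check makes the whole role/service Bad; otherwise at
    least one Concerning check makes it Concerning; all Good means Good.
    An empty result set means the monitor reported nothing: Unknown.
    """
    if not check_results:
        return "Unknown"
    return max(check_results, key=lambda r: SEVERITY[r])

print(overall_health(["Good", "Concerning", "Good"]))  # -> Concerning
```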
You can click either the Status or Health link for a role to drill down to see the details of the status and
health of the role instance(s). If there is a single instance of the role type, the link takes you directly to
the Role Instance page.
If there are multiple role instances (such as for DataNodes, TaskTrackers, RegionServers) a pop-up opens
to allow you to select the specific instances you want to see. Furthermore, this pop-up displays the
results for each health check that applies to this role type.
You can filter by an individual health check result. Click the result link; an X appears by the link (as
shown in the illustration above) and only the instance(s) with that specific health status will appear in
the instances list. (Note that in the example above, although the filter was to look at an "Unknown"
result, the Health status of the instance is "Bad". This is because there is at least one "Bad" health check
associated with that instance.)
Service Summary
Some services (specifically HDFS, MapReduce, HBase, Flume, and ZooKeeper) provide additional
statistics about their operation and performance.
These are shown in a Summary panel at the left side of the page. The contents of this panel depend on
the service; for example:
The HDFS Summary shows read and write latency statistics and disk space usage.
The MapReduce Summary shows statistics on slot usage, jobs and so on.
The HBase Summary shows statistics about get and put operations and other similar metrics.
The Flume summary provides a link to a page of Flume metric details. See Flume Metric Details.
The ZooKeeper Summary provides links to the ZooKeeper role instances (nodes) as well as Zxid
information if you have a ZooKeeper Quorum (multiple ZooKeeper servers).
Other services such as Hue, Oozie, Impala, and Cloudera Manager itself, do not provide a Service
Summary.
Move your cursor over an individual metric to pop up a definition.
Health Tests
The Health Tests panel appears for HDFS, MapReduce, HBase, Flume, Impala, ZooKeeper, and the
Cloudera Manager service. Other services such as Hue, Oozie, and YARN, do not provide a Health Test
panel.
The Health Tests panel shows health test results in an expandable/collapsible list, typically with the
specific metrics that the test returned. (You can Expand All or Collapse All from the links at the upper
right of the Health Tests panel.)
The color of the text (and the background color of the field) for a Health Test result indicates the
status of the results. The tests are sorted by their health status: Good, Concerning, Bad, or
Disabled. The list of entries for Good and Disabled health tests is collapsed by default;
however, Bad or Concerning results are shown expanded.
The text of a health test also acts as a link to further information about the test. Clicking the
text will pop up a window with further information, such as the meaning of the test and its
possible results, suggestions for actions you can take or how to make configuration changes
related to the test.
The help text for a health test also provides a link to the relevant monitoring configuration
section for the service. See Configuring Monitoring Settings for more information.
The small heatmap icon ( ) to the right of some of the tests takes you to a heatmap display that
lets you compare the values of the relevant test result metrics across the nodes of your cluster.
Charts
HDFS, MapReduce, HBase, ZooKeeper, Flume, and Cloudera Management Services all display charts of
some of the critical metrics related to their performance and health. Other services such as Hue, Oozie,
and Hive do not provide charts.
See Viewing Charts for Service, Role, or Host Instances for detailed information on the charts that are
presented, and the ability to search and display metrics of your choice.
Flume Metric Details
From the Flume Service Status page, click the Flume Metric Details link in the Flume Summary panel to
display details of the Flume agent roles.
On this page you can view a variety of metrics about the Channels, Sources and Sinks you have
configured for your various Flume agents. You can view both current and historical metrics on this page.
Note that the Flume configuration can be viewed under the Configuration tab, Agent category.
The Channels section shows the metrics for all the channel components in the Flume service. These
include metrics related to the channel capacity and throughput.
The Sinks section shows metrics for all the sink components in the Flume service. These include event
drain statistics as well as connection failure metrics.
The Sources section shows metrics for all the source components in the Flume service.
Note that this page maintains the same navigation bar as the Flume service status page, so you can go
directly to any of the other tabs (Instances, Commands, Configuration, or Audits). The Actions menu is
also available from this page.
Click the Configuration Validations indicator to display the Configuration Validations pop-up.
In the pop-up, the notifications at the highest severity level are shown, grouped by Service Name.
Click the message associated with a warning or error to be taken to the configuration property
for which the validation notification has been issued.
You can group the notifications in a number of ways, such as by role Configuration Group, Entity
type, host name, and so on; grouping by Service Name is the default. Pull down the arrow in
the Group By: field to select the group category.
Viewing Service Instance Details
For a selected service, the Instances tab shows all the role instances that have been instantiated for this
service.
To view the details of a service instance:
1. Click the Services tab on the top navigation bar and select All Services.
2. Click the service Name (or its Health Status) to go to the Status tab for that service.
3. Click the Instances tab on the Services navigation bar.
This shows all instances of all role types configured for the selected service.
You can also go directly to the Instances page to view instances of a specific role type by clicking one of
the links under the Role Counts column. This will show only instances of the role type you selected.
The Instances page displays the results of the configuration validation checks it performs for all the role
instances for this service.
Note: The information on this page is always the Current information for the selected service and
roles. This page does not support a historical view: thus, the Time Range Selector is not available.
Whether the role is currently in maintenance mode. If the role has been set into maintenance
mode explicitly, you will see one icon; if it is in effective maintenance mode due
to the service or its host having been set into maintenance mode, a different icon
is shown.
You can sort or filter the Instances list by criteria in any of the displayed columns.
To sort the Instances list:
1. Click the column header by which you want to sort.
A small arrow indicates whether the sort is in ascending or descending order.
To filter by Role, Status, Health, Decommissioned, or Maintenance Mode, select the value from
the drop-down search field at the top of the column.
To filter by Host or Rack, type the filter value in the search field.
From the Actions for Selected menu you can stop, start, restart, or delete a role, put a role into or
remove it from maintenance mode, and (for HDFS or HBase roles only) decommission or recommission a
role.
To take an action on one or more roles:
1. Check the checkbox next to the role instance(s) you want to act upon (or check the box to the
top of the list to select all role instances).
2. From the Actions for Selected menu, select the appropriate action. See Services Configuration
for details on these actions.
(Note that the Decommission action only applies to HDFS DataNodes, MapReduce TaskTrackers,
YARN NodeManagers, and HBase RegionServers.)
To add a role instance:
Select a Service instance to display the Status page for that service.
From the list of Roles, select one to display that role instance's Status page.
This page shows information for the time range currently selected on the Time Range Selector (which defaults to
the past 30 minutes). By default, the information shown on this page is for the current time. You can
view status for a past point in time simply by moving the time marker ( ) to a point in the past.
When you move the time marker to a point in the past (for Services/Roles that support health history),
the Health Status clearly indicates that it is referring to a past time. A Current Time button ( ) is present
whenever you are viewing past status, to enable you to quickly switch to view the current state of the
service. In addition, the Actions menu is disabled while you are viewing status in the past to ensure
that you cannot accidentally take an action based on outdated status information.
See Selecting a Time Range for more details.
Role Summary
The Role Summary provides basic information about the role instance, where it resides, and the health
of its host.
Click the Host name to view the Host Status Details page for that host.
Started
For a service, this indicates the service is running, but at least one of its roles is
running with a configuration that does not match the current configuration
settings in Cloudera Manager.
For a role, this indicates a configuration change has been made that requires a
restart, and that restart has not yet occurred.
Starting
For a service, this indicates the service is starting up, but at least one of its roles
has a configuration that does not match the current configuration settings in
Cloudera Manager. For a role, this indicates a configuration change has been
made that requires a restart, and that restart has not yet occurred.
Stopping
The service or role is stopping: a stop command has been issued, but the service
(and its roles) have not finished shutting down.
Stopped
History Unavailable
Cloudera Manager does not support historical information for this role or service.
This is the case for services such as ZooKeeper, Oozie, or Hue (that is, services
other than HDFS, MapReduce, and HBase).
N/A
The service or role is not started or stopped in the same way as a regular service
or role. Examples are the HDFS Balancer (which runs from the HDFS Rebalance
action) or Gateway roles. The Start and Stop commands are not applicable to
these instances.
Unknown
The overall Health Status for a role is a roll-up of the health check results for that role. The Health status
can be:
Good: For a specific health check, the returned result is normal or within the acceptable range.
For a role or service, this means all health checks for that role or service are Good.
Concerning: For a specific health check, the returned result indicates a potential problem. Typically
this means the test result has gone above (or below) a configured Warning threshold.
For a role or service, this means that at least one health check is Concerning.
Bad: For a specific health check, the check failed, or the returned result indicates a serious
problem. Typically this means the test result has gone above (or below) a configured
Critical threshold. For a role or service, this means that at least one health check is Bad.
Unknown: For a role or service, its health is Unknown. This can occur for a number of reasons, such
as the Service Monitor not running, or loss of connectivity to the agent doing the health
monitoring.
Disabled: Health Checks have been disabled in the configuration for this service or role.
History Unavailable: Cloudera Manager does not support the collection of historical information for
this role or service. This is the case for services such as ZooKeeper, Oozie, or Hue (that is,
services other than HDFS, MapReduce, and HBase).
Health Tests
The Health Tests panel is shown for roles that are related to HDFS, MapReduce, or HBase. Roles related
to other services such as Hue, ZooKeeper, Oozie, and Cloudera Manager itself, do not provide a Health
Tests panel. The Health Tests panel shows health test results in an expandable/collapsible list, typically
with the specific metrics that the test returned. (You can Expand All or Collapse All from the links at the
upper right of the Health Tests panel.)
The color of the text (and the background color of the field) for a Health Test result indicates the
status of the results. The tests are sorted by their health status: Good, Concerning, Bad, or
Disabled. The list of entries for Good and Disabled health tests is collapsed by default;
however, Bad or Concerning results are shown expanded.
The text of a health test also acts as a link to further information about the test. Clicking the
text will pop up a window with further information, such as the meaning of the test and its
possible results, suggestions for actions you can take or how to make configuration changes
related to the test.
The help text for a health test also provides a link to the relevant monitoring configuration
section for the service. See Configuring Monitoring Settings for more information.
The small heatmap icon ( ) to the right of some of the tests takes you to a heatmap display that
lets you compare the values of the relevant test result metrics across the nodes of your cluster.
Charts
Charts are shown for roles that are related to HDFS, MapReduce, HBase, ZooKeeper, Flume, and
Cloudera Management services. Roles related to other services such as Hue, Hive, Oozie, and YARN, do
not provide charts.
See Viewing Charts for Service, Role, or Host Instances for detailed information on the charts that are
presented, and the ability to search and display metrics of your choice.
The HDFS Instances Page with Federation and High Availability
If you have Federation or High Availability configured, the Instances page has a section at the top that
provides information about the configured Nameservices. This includes information about:
Links to the active and standby NameNodes and SecondaryNameNode (depending on whether
High Availability is enabled or not).
There is also an Actions menu for each Nameservice. From this menu you can:
Edit the list of mount points for the Nameservice (using the Edit... command)
From this page you can also add a NameService via the Add Nameservice button at the top of the page.
See Adding a NameService.
The Commands tab shows the commands that have been run for a service and its role instances (the
NameNode, Secondary NameNode, and DataNode instances), including even the command that initially
formatted HDFS on the NameNode.
This may be particularly useful if a service or role seems to be taking a long time to start up or shut
down, or if certain services or roles are not running or do not appear to have been started correctly. You
can view both the status and progress of currently running commands, as well as the status and results
of commands run in the past.
To view the commands that are running or have run for a Service or Role instance:
1. Click the Services tab on the top navigation bar.
2. Click the service Name to go to the Status tab for that service.
3. To view recent commands for a role, select the role instance name to go to its Status tab.
4. Click the Commands tab on the Services navigation bar.
Running Commands
The Running Commands area shows commands that are currently in progress.
If a command is running, the Command Details section at the top shows:
The command
A progress indicator
While the command is In Progress, an Abort Command button will be present so that you can abort the
command if necessary.
If the command generates subcommands, this is indicated; click the command link to display the
subcommands in a Child Commands section as they are started. Each child command also has an Abort
button that is present as long as the subcommand is in progress.
The Commands information status is updated automatically while the command is running.
Once the command has finished running (all its subcommands have finished), the status is updated, the
Abort buttons disappear, and the information appears as described below for other Recent Commands.
Recent Commands
Recent Commands shows commands that were run and finished within the search time range you've
specified.
If no commands were run during the selected time range, you can click the Try expanding the time
range selection link. Each time you click the link it doubles the time range selection. If you are in the
"current time" mode, the beginning time will move; if you are looking at a time range in the past, both
the beginning and ending times of the range are changed. You can also change the time range using the
preset time ranges at the right side of the page, the Time Range Selector, or the Custom Time Range
panel (see Selecting a Time Range). Each entry shows:
Commands are shown with the most recent ones at the top.
The icon associated with the status (which typically includes the time that the command finished) plus
the result message tells you whether the command succeeded or failed. If the command failed, it
indicates if it was one of the subcommands that actually failed.
In many cases, there may be multiple subcommands that result from the top level command.
Click a command in the Recent Commands list to display its command details, and its child commands
(subcommands), if there are any.
The Command Details section at the top shows information about the command; its start and
end times, its progress (status), and a link to its parent command. The information includes:
You can use the Parent link near the top of the page to return to the parent command's details.
Note: If the parent is First Run, this indicates that this command was run as part of the initial
startup of your cluster. Clicking on this link takes you to the command history for the startup of
your cluster.
If the command included multiple steps, a Command Progress section may appear showing the steps
within the command and whether they succeeded.
The Child Commands section lists any subcommands of the selected command. This section
includes:
A result message
Click the Command link to display further command details (and any subcommands) of this
command. You can continue to drill down through a tree of subcommands this way.
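Drilling down through a command's subcommands to locate the failure that actually occurred can be pictured as a simple tree walk. The dictionary shape below is hypothetical, purely for illustration:

```python
def find_failed_leaf(command):
    """Walk a command's subcommand tree to find the deepest failed command.

    `command` is a dict like {"name": ..., "success": bool, "children": [...]}.
    Returns the name of the deepest failing (sub)command, or None if the
    command succeeded.
    """
    if command.get("success", True):
        return None
    for child in command.get("children", []):
        failed = find_failed_leaf(child)
        if failed is not None:
            return failed
    # No failing child: this command itself is the deepest failure.
    return command["name"]

cmd = {"name": "Restart", "success": False, "children": [
    {"name": "Stop", "success": True, "children": []},
    {"name": "Start", "success": False, "children": [
        {"name": "Start DataNode", "success": False, "children": []}]}]}
print(find_failed_leaf(cmd))  # -> Start DataNode
```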
Click the link in the Context column to go to the Status page for the component (host, service or
role instance) to which this command was related.
Removing a Filter
To remove a filter from a filter specification:
1. Click the icon at the right of the filter.
The filter is removed and the audit log redisplays all events that match the remaining filters.
Modifying a Filter
To modify a filter:
1. Click the filter.
The filter expands into separate property, operator, and value fields.
2. Modify the value of one or more fields.
3. Click Search.
A filter containing the property, operation, and value is added to the list of filters at the left and
the audit log redisplays all events that match the modified set of filters.
The Audit Log Display
Audit log entries are ordered (within the time range you've selected) with the most recent at the top.
The Audits tab lets you see the actions that have been taken for a Service or Role instance, and what
user performed them. The audit history includes actions such as creating a role or service, making
configuration revisions for a role or service, and running commands.
To view the Audit history for a Service:
1. Click the Services tab on the top navigation bar, then choose the service you want to see.
2. Click the Audits tab on the Services navigation bar.
To view the Audit history for a Role:
1. Click the Services tab on the top navigation bar, then choose the service you want to see.
2. Click the Instances tab on the Services navigation bar to show the list of role instances.
3. Select the Role whose audit history you want to see.
4. Click the Audits tab on the navigation bar for the role.
The Audit History provides the following information:
By User: The user name of the user who performed the action.
The audit history does not track the progress or results of the commands it sees (such as starting or
stopping a service or creating a directory for a service); it just notes the command that was executed
and the user who executed it. If you want to view the progress or results of a command, look at
Recent Commands under the Commands tab.
If no actions were taken during the selected time range, you can click the Try expanding the time range
selection link. Each time you click the link it doubles the time range selection. If you are in the "current
time" mode, the beginning time will move; if you are looking at a time range in the past, both the
beginning and ending times of the range are changed. You can also change the time range using the
Time Range Selector or the Custom Time Range panel (see Selecting a Time Range).
Moving the mouse to a data point on the chart shows the details about that data point in a popup tooltip.
Click on a chart to expand it into a full-page view with a legend for the individual charted entities,
as well as more fine-grained axis divisions.
If there are multiple elements in the chart, you can check or uncheck the legend item to
hide or show that element on the chart.
When the mouse is over a chart, a down-arrow icon appears at the upper right. Click this to
display a menu where you can choose to edit the individual chart, or to remove the chart from
the Custom view.
When the mouse is over a chart, a Clone link appears at the bottom right of the chart. Click the
Clone link to duplicate the chart, make any modifications you want, and then save back to the
same page or to a different page.
Notice the "$HOSTID" portion of the query string: "$HOSTID" is a variable which will be resolved to a
specific value based on the page before the query is actually issued. In this case, "$HOSTID" will become
"server-1.my.company.com".
Context-sensitive variables are useful because they allow portable queries to be written. For example, the
query above may be used on the host status page or on any role status page to display the appropriate host's
swap rate. Variables cannot be used in queries that are part of global views since those views have no
service, role or host context.
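As a rough sketch of how this kind of substitution could work before a query is issued (the function name and the variable pattern below are assumptions for illustration, not Cloudera Manager's actual implementation):

```python
import re

def resolve_query(query, context):
    """Replace $VARIABLE tokens in a query string with values taken
    from the page context; unknown variables are left untouched."""
    def substitute(match):
        name = match.group(1)
        return context.get(name, match.group(0))
    return re.sub(r"\$([A-Z]+)", substitute, query)

# On a host status page, the context supplies the host's identity.
query = "select swap_rate where hostId = $HOSTID"
context = {"HOSTID": "server-1.my.company.com"}
print(resolve_query(query, context))
# select swap_rate where hostId = server-1.my.company.com
```

A global view has no service, role, or host context to draw from, which is why context-sensitive variables cannot be used in queries that are part of global views.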
Copying and Editing a Chart
Editing a copy of a chart is just like editing a chart in the current view, except that you can save it to
another existing chart view, back to the current Custom view, or to a new chart view that you create.
You cannot save a chart to a Status tab Default view. You can copy a chart from any existing chart view,
including the Status tab Default chart view, and save it to any chart view except one of the Status tab
Default views.
To make a copy of a chart:
1. Move the cursor over the chart and click the Edit a Copy link at the bottom right of the chart.
This opens the Chart Search page with the chart you selected already displayed.
See Modifying Your Chart below for details on how you can modify an existing chart.
2. To save your chart to an existing view:
a. Click the down-arrow at the right of the Save as View... button to display a list of the
existing chart views.
b. Select the view to which you want to add the chart.
3. To save to a new view:
a. Click the Save as View... button and enter a name for the new chart view.
b. Your new chart view should appear in the menu under the top-level Charts tab.
4. Click your browser back button to return to your original chart view.
Adding a New Chart to the Custom View
From the Custom view under the Status tab of a service, host, or role, you can add new charts to the
view.
To add a new chart:
1. Click the Add button at the bottom of the page. This takes you to the Add To View page, with
variables preset for the specific service, role, or host where you want to add the view.
2. Select a metric from the List of Metrics, type a metric name or description into the Basic text
field, or type a query into the Advanced field.
3. Click Search.
The charts that result from your query are displayed, and you can modify their chart type,
combine them using facets, change their size and so on.
4. To add the new chart back to your chart view, click Add.
Note: If the query you've chosen has resulted in multiple charts, all the charts are added to the
view as a set. Although the individual charts in this set can be copied, you can only edit the set as a
whole.
To change the chart type, click one of the chart types listed at the left: Line, Stack Area,
Bar, or Scatter.
Facets
A time-series plot for a service, role, or host may actually be a composite of multiple individual time
series. For charts shown under the Status tab for a service, role, or host, multiple time series may be
combined on a single chart. Facets let you choose how to group the different time series in a variety of
ways, based on the attributes of those time series. For example, for a host, the Load Average chart
shows you the time-series data for average load at one-, five-, and ten-minute intervals, by default all on
a single chart. Using facets, you can choose to display each time series as a separate chart. Depending
on the query, you can combine or separate time series based on attributes such as service, role type,
hostname, and so on.
Click on one of the facets to change the organization of the chart data. The number in
parentheses indicates how many charts will be displayed for that facet.
The X-axis is based on clock time, and by default shows the last one hour of data. You can change the
time range for your plot using the time range sets shown at the upper right of the window (right below
the Time Range Selector) or by expanding or shrinking the Time Range Selector.
Select a Service instance to display the Status page for that service.
From the list of Roles, select one to display that role instance's Status page.
This page shows the processes that run as part of this service role, with a variety of metrics about those
processes.
When you are set to the current time, you can link from this panel to the Web UI for the role, and
view the relevant configuration files and log files.
To see the location of a process' configuration files, and to view the Environment variable
settings, click the Show link under Configuration Files/Environment.
If the process provides a Web UI (as is the case for the NameNode, for example), click the link to
open the Web UI for that process.
To see the most recent log entries, click the Show Recent Logs link.
To see the full log, stderr, or stdout log files, click the appropriate links.
If you are viewing a point in time in the past, this panel will be visible but greyed out (the data will still
show the current values and won't reflect the time marker position). However, the links to the
configuration and log files will still work.
The legend at the right shows the meaning of the colors; in the example above, each color represents
the health of a given node.
The links in the drop-down menu let you view other related metrics.
Moving the cursor over a cell in the grid (as shown above) displays the name of the node, the role, the
rack assignment, and other information specific to the type of metric the map is displaying.
A second example shows metrics for a RegionServer. In this case, the color of each cell represents a
range of values (duration in milliseconds).
Moving the cursor over a cell displays the exact value for that cell, in addition to the node name and so
on.
For a service or role for which monitoring is provided, you can enable and disable selected
health checks and events, configure how those health checks factor into the overall health of
the service, and modify thresholds for the status of certain health checks. Cloudera Manager
supports this type of monitoring configuration for HDFS, MapReduce, HBase, ZooKeeper, and
Flume. It is also supported for Impala with an Impala monitoring license.
For individual hosts you can also disable or enable selected health checks, modify thresholds,
and enable or disable health alerts.
Each of the Cloudera Management Services has its own parameters that can be adjusted to control
how much data is retained by that service. For some monitoring functions, the amount of retained
data can grow very large, so it may become necessary to adjust the limits.
For the Cloudera Management Services, you can configure monitoring settings for the
monitoring roles themselves (enabling and disabling health checks on the monitoring processes)
as well as some general settings related to events and alerts (specifically for the
Event Server and Alert Publisher).
In addition, you can configure the basic functions of Cloudera Manager's Management Services through
the standard configuration settings for the various management roles. For example, the mail server and
related properties for the Alerts Publisher are set under the Default set of Alert Publisher configuration
properties.
This section covers the following topics:
Configuring Alerts
For general information about modifying configuration settings, see Changing Service Configurations.
Configuring Health Check Settings
The initial monitoring configuration is handled during the installation and configuration of your cluster,
and most monitoring parameters have default settings. However, you can set or modify these at any
time.
Note: If alerting is enabled for events, you will be able to search for and view alerts in the Events
tab, even if you do not have email notification configured.
Configuring Directory Monitoring
Cloudera Manager can perform threshold-based monitoring of free space in the various directories on
the hosts it monitors, such as log directories or checkpoint directories (for the Secondary
NameNode).
These thresholds can be set in one of two ways: as absolute thresholds (in MiB, GiB, and so on) or
as percentages of space. As with other threshold properties, you can set values that will trigger events
at both the Warning and Critical levels.
If you set both thresholds, the Absolute Threshold setting will be used.
These thresholds are set under the Monitoring section of the Configuration page for each service.
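As a sketch of the precedence rule just described, in which absolute thresholds win when both kinds are set (the function and parameter names below are assumptions for illustration, not Cloudera Manager internals):

```python
def free_space_status(free_bytes, capacity_bytes,
                      abs_warn=None, abs_crit=None,
                      pct_warn=None, pct_crit=None):
    """Evaluate free-space thresholds for a monitored directory.

    Illustrative sketch: if any absolute threshold is set, absolute
    thresholds are used and percentage thresholds are ignored.
    """
    if abs_warn is not None or abs_crit is not None:
        if abs_crit is not None and free_bytes < abs_crit:
            return "CRITICAL"
        if abs_warn is not None and free_bytes < abs_warn:
            return "WARNING"
        return "GOOD"
    pct_free = 100.0 * free_bytes / capacity_bytes
    if pct_crit is not None and pct_free < pct_crit:
        return "CRITICAL"
    if pct_warn is not None and pct_free < pct_warn:
        return "WARNING"
    return "GOOD"

GIB = 1024 ** 3
# 3 GiB free of 100 GiB: only 3% free, which would be Critical by the
# percentage rules, but the absolute thresholds take precedence.
print(free_space_status(3 * GIB, 100 * GIB,
                        abs_warn=5 * GIB, abs_crit=1 * GIB,
                        pct_warn=50, pct_crit=10))  # WARNING
```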
Configuring Activity Monitor Events
The Activity Monitor monitors the MapReduce jobs running on your cluster. This also includes the
higher-level activities, such as Pig, Hive, and Oozie workflows that eventually are run as MapReduce
tasks. Currently the Activity Monitor does not support MapReduce v2 (YARN).
You can monitor for slow-running jobs or jobs that fail, and alert on these events. To detect jobs that
are running too slowly, you must configure Activity Duration Rules that specify which jobs to monitor
and the duration limits for those jobs.
Activity Monitor-related events and alerts for MapReduce are configured via the Monitoring category
under the Configuration tab of the MapReduce services page.
To configure Activity Monitor settings for MapReduce:
1. Click the Services tab.
2. Select the MapReduce service instance.
3. Click the Configuration tab.
4. Click the Monitoring category at the bottom of the left-hand Category panel.
A "slow activity" alert occurs when a job exceeds the duration limit configured for it in an Activity
Duration Rule. Activity Duration Rules are not defined by default; you must configure these rules if you
want to see alerts for jobs that exceed the duration defined by these rules.
An Activity Duration Rule is a regular expression that matches an activity name (Job ID), combined
with a run-time limit that the job should not exceed. You can add as many rules as you like, one per
line, in the Activity Duration Rules property.
The format of each rule is '<regex>=<number>' where the <regex> is a regular expression to match
against the activity name, and <number> is the job duration limit, in minutes. When a new activity
starts, each <regex> expression is tested against the name of the activity for a match.
The list of rules is tested in order, and the first match found is used.
For example, if the rule set is:
foo=10
bar=20
any activity named "foo" would be marked slow if it ran for more than 10 minutes, and any activity
named "bar" would be marked slow if it ran for more than 20 minutes.
Since full Java regular expressions can be used (see
http://download.oracle.com/javase/tutorial/essential/regex/), a rule set such as:
foo.*=10
bar=20
would match any activity with a name that starts with foo (e.g. fool, food, foot) against the first rule.
If there is not a match for an activity, then that activity will not be monitored for job duration. However,
you can add a "catch-all" as the last rule which will always match any name:
foo.*=10
bar=20
baz=30
.*=60
In this case, any job that runs longer than 60 minutes will be marked slow and will generate an alert.
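The first-match-wins evaluation described above can be sketched in Python; `parse_rules` and `duration_limit` are illustrative names (not Cloudera Manager functions), and `re.fullmatch` is used to approximate Java's whole-name matching:

```python
import re

def parse_rules(text):
    """Parse '<regex>=<number>' rules, one per line, preserving order."""
    rules = []
    for line in text.strip().splitlines():
        pattern, _, minutes = line.rpartition("=")
        rules.append((re.compile(pattern), int(minutes)))
    return rules

def duration_limit(rules, activity_name):
    """Return the limit in minutes from the first matching rule, or None."""
    for pattern, minutes in rules:
        if pattern.fullmatch(activity_name):  # whole name must match
            return minutes
    return None

rules = parse_rules("""
foo.*=10
bar=20
baz=30
.*=60
""")
print(duration_limit(rules, "food"))   # 10  (first rule wins)
print(duration_limit(rules, "bar"))    # 20
print(duration_limit(rules, "other"))  # 60  (caught by the catch-all)
```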
Configuring Log Events
You can enable or disable the forwarding of selected log events to the Event Server. This is enabled by
default, and is a service-wide setting (Enable Log Event Capture) for each service for which monitoring is
provided. Alerting on log events is disabled by default.
To enable or disable log event capture:
1. Click the Services tab, and select the service instance you want to modify.
You can enable or disable event capture for CDH services or for the Cloudera management
services.
2. Pull down the Configuration tab and select Edit.
3. Click the Monitoring category at the bottom of the left-hand Category panel.
4. Under Service Wide > Events and Alerts, modify the Enable Log Event Capture setting.
You can also modify the rules that determine how log messages are turned into events.
Editing these rules is not recommended.
For each role, there are rules that govern how its log messages are turned into events by the custom
log4j appender for the role. These are defined in the Rules to Extract Events from Log Files property for
each HDFS, MapReduce and HBase role, and for ZooKeeper, Flume agent, and monitoring roles as well.
To configure which log messages become events:
1. Click the Services tab, and select the service instance you want to modify.
2. Pull down the Configuration tab and select Edit.
3. Click the Monitoring category at the bottom of the left-hand Category panel.
4. Select the Configuration Group for the Role for which you want to configure log events, or
search for "Rules to Extract Events from Log Files".
Note that for some roles there may be more than one configuration group, and you may need to
modify all of them. The easiest way to ensure that you have found all occurrences of the
property you need to modify is to search for the property by name; Cloudera Manager will show
all copies of the property that match the search filter.
5. Edit these rules as needed.
A number of useful rules are defined by default, based on Cloudera's experience supporting Hadoop
clusters. For example:
The rule {"rate": 10, "threshold":"FATAL"} means that log entries with severity FATAL
should be forwarded as events, up to 10 per minute.
The syntax for these rules is defined in the Description field for this property: basically, the syntax lets
you create rules that identify log messages based on log4j severity, message content matching, and/or
the exception type. These rules must result in valid JSON. You can also specify that the event should
generate an alert (by setting "alert":true in the rule).
Note that if you specify a content match, the entire content must match; if you want to match on a
partial string, you must provide wildcards as appropriate to allow matching the entire string.
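To make these matching semantics concrete, here is a rough Python sketch of evaluating a single rule against a log entry. The helper and field names are assumptions for illustration; real rules also support exception-type matching and enforce the per-minute rate limit, both omitted here:

```python
import json
import re

# log4j severity ordering, lowest to highest
SEVERITY = {"TRACE": 0, "DEBUG": 1, "INFO": 2, "WARN": 3,
            "ERROR": 4, "FATAL": 5}

def rule_matches(rule_json, severity, message):
    """Check whether a log entry satisfies a rule's threshold and
    content conditions (simplified sketch)."""
    rule = json.loads(rule_json)  # rules must be valid JSON
    if "threshold" in rule:
        if SEVERITY[severity] < SEVERITY[rule["threshold"]]:
            return False
    if "content" in rule:
        # The entire message must match; use wildcards for partial matches.
        if not re.fullmatch(rule["content"], message):
            return False
    return True

print(rule_matches('{"rate": 10, "threshold": "FATAL"}',
                   "FATAL", "Shutting down"))  # True
print(rule_matches('{"rate": 10, "threshold": "FATAL"}',
                   "ERROR", "Shutting down"))  # False
print(rule_matches('{"rate": 5, "content": ".*disk full.*"}',
                   "WARN", "dir /data: disk full"))  # True
```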
Editing these rules is not recommended. Cloudera Manager provides a default set of rules that should
be sufficient for most users.
Configuring Alerts
You can configure alerts to be delivered by email, or sent as SNMP traps. These configurations are set
under the Alert Publisher role of the Cloudera Manager management service. See Configuring Alert
Delivery.
Note that if you just want to add to or modify the list of alert recipient email addresses, you can do this
starting at the Alerts tab under the Administration page, accessed with the gear icon.
You can also send a test alert e-mail from the Alerts tab under the Administration page.
Enabling Health Checks for Cloudera Management Services
The Cloudera Manager management service provides health checks for its own roles.
You can enable or disable these health checks for each management service. (Role-based health checks
are enabled by default).
You can also set a variety of thresholds for specific roles such as thresholds for log directory free space.
Configuring Cloudera Management Services Database Limits
Each Cloudera Management Service maintains a database for retaining the data it monitors. These
databases (as well as the log files maintained by these services) can grow quite large. For example, the
Activity Monitor maintains data at the service level, the activity level (MapReduce jobs and aggregate
activities), and at the task attempt level. Limits on these data sets are configured when you install your
management services, but you can modify these parameters through the Configuration settings in the
Cloudera Manager Admin console, for each management service.
For example, the Event Server lets you set a total number of events you want to store. Host Monitor and
Service Monitor let you set data expiration thresholds (in hours), and Activity Monitor gives you "purge"
settings (also in hours) for the data it stores. There are also settings for the logs that these various
services create. You can throttle how big the logs are allowed to get and how many previous logs to
retain.
To change any of the data retention or log size settings:
1. From the Services tab, select the Cloudera Management Services service instance.
2. Pull down the Configuration tab and click Edit.
3. In the left-hand column, select the configuration group for the role whose configurations you
want to modify.
(Note that the management services are singleton roles so there will be only a Base
configuration group for the role.)
4. For some services, such as the Activity Monitor, Service Monitor, or Host Monitor, the purge or
expiration period properties are found in the top-level settings for the role.
Typically, Log file size settings will be under the Logs category under the role configuration
group.
Services Configuration
Cloudera Manager's Services Configuration features let you manage the deployment and configuration
of your cluster. You can add new services and roles if needed, gracefully start, stop and restart services
or roles, and decommission and delete roles or services if necessary. Further, you can modify the
configuration properties for services or for individual role instances, with an audit trail that allows you to
roll them back if necessary. You can also generate client configuration files, enabling you to easily
distribute them to the users of a service.
The following topics describe how to configure and use the services on your cluster.
Adding Services
Rolling Restart
Renaming a Service
Adding Services
After initial installation, you can use the Add a Service wizard to add and configure new service
instances. For example, you may want to add a service such as Oozie that you did not select in the
wizard during the initial installation.
If you have installed a CDH4 cluster, you can use Add a Service to add the YARN service to enable
MapReduce version 2 (MRv2). By default, the initial installation configures and enables only the original
MapReduce service (though you can use the Custom option in the wizard to include it). If you add both
versions of MapReduce, the original MapReduce service will be given a higher alternatives priority by
default, so that the MRv1 configuration will take priority. (You can change this by changing the values of
the Alternatives Priority in the MapReduce or YARN configuration settings.)
You can also use Add a Service to install Flume NG. Because Flume requires the addition of a
configuration file to specify the agent configuration, it must be added separately after the wizard has
finished.
As of Cloudera Manager 4.5, Cloudera Impala can be added using the initial installation wizard, and does
not need to be added separately.
Important
The current upstream MRv2 release is not considered stable at this time, and the current Impala
release is beta software. Therefore, these are not recommended for use in production at this time.
Adding a Service
1. Click the Services tab, then choose All Services.
2. From the Actions menu, select Add a Service.
A list of possible services is displayed. You can add one type of service at a time.
3. Follow the instructions in the Add Service wizard to add the service.
As you go through the wizard pages, Cloudera Manager will recommend assignments of service
roles to hosts based on the host properties and existing roles on the host; you can modify these
assignments if necessary. Cloudera Manager will also recommend configuration settings, such as
data directory paths and heap sizes. You can modify the settings as indicated before continuing.
If you click Skip in the wizard's configuration settings page, Cloudera Manager will
create the service and its roles without the configuration settings, and you will need to
configure the settings later in the Service > Configuration tab for the new service.
If the new service is not dependent on another service (for example, ZooKeeper),
and if you continued with the recommended configurations, the service is configured
and started automatically. If you skipped the configuration settings page, the new
service will not be configured or started automatically. You must configure the settings
for the new service in the Service > Configuration pages and then start it.
If you added a service that is dependent on another service (for example, HBase is
dependent on HDFS and ZooKeeper), the new service is not started automatically if the
dependent service has an outdated configuration.
If you added a service that is dependent on another service that was stopped at the
time you used the Add Service wizard, the wizard will start the dependent service for
you and perform any other steps required to prepare the cluster for the new service.
When you are ready, start the new service.
If you added a service that is dependent on another service that was already started at
the time you used the Add Service wizard, its configurations might be out of date if you
continued with the recommended configurations. Or, you may need to update its
configurations so that the new service works correctly. Restart the dependent service
before you start the new service.
Note
For information about the order in which to start services, see Starting, Stopping, and Restarting
Services.
You can verify the new service is started properly by navigating to Services > Status and checking the
health status for the new service. If the Health Status is Good, then the service is started properly.
Important
The current upstream MRv2 release is not yet stable, and should not be considered production-ready at this time. It is given by default a lower alternatives priority than MRv1.
1. Choose Create Root Directory from the Actions menu in the HBase > Status tab.
2. Click Create Root Directory again to confirm.
By default, the original MapReduce (MRv1) is set to a higher alternatives priority than YARN. Therefore,
even though you add and start the YARN service, MapReduce jobs will still be run under MRv1. If you
want to use YARN to run jobs, you will need to change its alternatives priority to be higher than MRv1.
To add the YARN service and configure it to have a higher priority than MRv1, do the following:
Adding Flume
The Flume NG service must be added separately from the wizard; the packages are installed by the
installation wizard, but the agents are not configured or started as part of First Run. As part of adding
Flume as a service, you should first configure your Flume agents before you start those role instances.
For details of how to modify configurations and use configuration overrides in Cloudera Manager, see
Changing Service Configurations.
For detailed information about Flume agent configuration, see the Flume User Guide. To install Flume
agents on your cluster:
1. Follow the initial steps (above) to select Flume as the service to be added.
2. Select the hosts on which you want Flume agents to be installed.
3. Click Continue and the Flume agents are installed on the nodes you've selected.
The Flume agents are not started automatically. You must first configure your agents appropriately
before you start them, following the instructions below.
A default Flume flow configuration is provided as an example in the configuration properties for the
Flume agents; you should replace this with your own configuration. The default configuration
provides configuration for a single agent.
A single configuration file can contain the configuration for multiple agents, since each configuration
property is prefixed by the agent name. You can then set the agents' names using role instance
configuration overrides to specify the configuration applicable to each agent. Note that different agent
role instances can have the same name; agent names do not have to be unique, and you can use this to
further simplify the configuration file. This is the recommended method for configuring Flume.
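For example, a single configuration file might configure two agents, with every property prefixed by the agent name. The agent names (tier1, tier2) and the specific sources and sinks below are hypothetical; see the Flume User Guide for the full property reference:

```properties
# Two agents in one file; each property is prefixed by the agent name,
# which is assigned to role instances via Agent Name overrides.
tier1.sources = src1
tier1.channels = ch1
tier1.sinks = sink1
tier1.sources.src1.type = netcat
tier1.sources.src1.bind = 0.0.0.0
tier1.sources.src1.port = 44444
tier1.sources.src1.channels = ch1
tier1.channels.ch1.type = memory
tier1.sinks.sink1.type = hdfs
tier1.sinks.sink1.hdfs.path = /flume/events
tier1.sinks.sink1.channel = ch1

tier2.sources = src1
tier2.channels = ch1
tier2.sinks = sink1
tier2.sources.src1.type = exec
tier2.sources.src1.command = tail -F /var/log/messages
tier2.sources.src1.channels = ch1
tier2.channels.ch1.type = memory
tier2.sinks.sink1.type = logger
tier2.sinks.sink1.channel = ch1
```

Each agent role instance reads only the properties whose prefix matches its own agent name, so agents sharing a name share a configuration.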
Flume NG can be installed on a cluster running either CDH3 or CDH4. However, monitoring of Flume is
only supported if your cluster is running CDH4.1 or later, or CDH3u5 (refresh 2) or later.
Note: If you are using Flume to write to HDFS or HBase sinks, you must have at least one HDFS or
HBase role instance on the Flume agent's host. If you do not want to run a daemon on the Flume
agent's host, you can just add a Flume Gateway role on the host.
1. Pull down the Flume service Configuration tab, select Edit, and then select the Agent (Base)
configuration group in the left-hand column.
2. To override the Agent Name for one or more instances, move your cursor over the value area of
the Agent Name property, and click Override Instances.
3. Select the agent (role) instances you want to override.
4. In the field labeled Change value of selected instances to: select "Other".
(You can use the "Inherited Value" setting to return to the service-level value.)
5. In the field that appears, type the agent name you want to use for the selected agents.
6. Click Apply to have your change take effect.
After you have completed your configuration changes, you can start the Flume service, which will start
all your Flume agents.
Note
If you need to modify your Flume configuration file after you have started the Flume service, you
can use the Update Config... command from the Actions menu on the Flume Service Status page to
update the configuration across Flume agents without having to shut down the Flume service.
The current Impala release is beta software and not recommended for use in production at
this time.
Cloudera Manager 4.5 supports Cloudera Impala beta version 0.6 running with CDH 4.2;
earlier beta versions are not supported.
You can install Cloudera Impala through the Cloudera Manager installation wizard, using either parcels
or packages, and have the service started as part of the First Run process. All configuration settings,
including the Hive metastore setup, are handled by Cloudera Manager as part of the installation wizard.
See Installation Path A - Automated Installation by Cloudera Manager for more information.
If you elect not to include the Impala service using the installation wizard, you can use the Add Service
wizard to perform the installation.
Impala depends on ZooKeeper, HDFS, HBase, and Hive. All these services must be present in order to
run the Impala service.
Simply follow the steps in the Add Service wizard. It will automatically configure and start the
dependent services and the Impala service.
6. The wizard finishes by performing any actions necessary to prepare the cluster for the new role
instances. For example, new DataNodes are added to the NameNode's
dfs_hosts_allow.txt file.
The new role instance is configured with the Base configuration group for its role type, even if
there are multiple configuration groups for the role type. If you want to use a different role
configuration group, you can go to the Group Management page under the Configuration tab
for the service, and move the role instance to a different configuration group. See Managing
Configuration Groups for more information.
7. The new role instances are not started automatically. You can start them on the service's
Instances page.
Adding ZooKeeper Roles
If you add ZooKeeper nodes to an existing ZooKeeper service, you must initialize the data directories for
the new nodes (role instances) before you restart the ZooKeeper service.
1. Add new ZooKeeper role instances as described in the steps above.
2. Go to the Instances tab for the ZooKeeper service. Your newly added roles should show their
status as Stopped.
3. From the Actions menu at the top of the page, select Initialize....
4. Confirm that you want to perform this action. Note that the dialog will inform you that the
action cannot be performed on your previously-existing ZooKeeper nodes.
5. When this action has completed, you can then restart the ZooKeeper service. This will start the
new nodes as well as update the configuration for the existing nodes.
When you start the ZooKeeper service after you have added new nodes, the original node will have the
datastore, but the datastores of the new nodes will be empty. Therefore, you must ensure that the
original node is included when the new quorum is started up. If the new nodes are able to form a
quorum without the original node being included, then the ensemble will have an empty datastore. You
can avoid this by starting up just the original node plus one of the new nodes and allowing those to form
a quorum, resulting in a quorum with the datastore from the original node. You can then add the other
new nodes.
Each role configuration group includes a set of configuration properties for that role type, as well as a list of
role instances associated with that configuration group. Cloudera Manager automatically creates a base
role configuration group for each role type.
Certain role types (specifically, those that allow multiple instances on multiple nodes, such as
DataNodes, TaskTrackers, and RegionServers) allow the creation of additional role configuration groups
that differ from the base configuration. Each role instance can be associated with only a single role
configuration group.
Note that when you run the installation or upgrade wizard, Cloudera Manager automatically creates the
appropriate base configurations for the roles it adds. It may also create additional configuration groups
for a given role type, if necessary. For example, if you have a DataNode role on the same host as the
NameNode, it may require a slightly different configuration than DataNode roles running on other hosts.
Therefore, Cloudera Manager will create a separate configuration group for the DataNode role that is
running on the NameNode host, and use the base DataNode configuration for the DataNode roles
running on other hosts.
You can modify the settings of the base role configuration group, or you can create new configuration
groups and associate role instances to whichever role configuration group is most appropriate. This
simplifies the management of role configurations when one group of role instances may require
different settings than another group of instances of the same role type (for example, due to
differences in the hardware the roles run on).
For information on creating a new Configuration Group, see Managing Configuration Groups.
Certain roles, such as the CDH3 NameNode and SecondaryNameNode, provide only a base configuration
group, as only one instance of the role can exist in the cluster. You cannot create additional
configuration groups for those roles.
You modify the configuration for any of the service's role configuration groups through the
Configuration tab for the service. You can also override the settings inherited from a role configuration
group for a given role instance, if necessary; see Overriding Configuration Settings.
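The inheritance just described can be sketched as a simple lookup: each role instance belongs to exactly one configuration group, and a per-instance override, when present, wins over the group value. The group names, property name, and values below are hypothetical.

```python
# Hypothetical sketch of effective-value resolution for a role instance:
# instance-level override first, otherwise its configuration group's value.
groups = {
    "DataNode (Base)": {"dfs.datanode.du.reserved": "0"},
    "HDFS-1-DATANODE-1": {"dfs.datanode.du.reserved": "10737418240"},
}
instance_group = {"dn-host1": "HDFS-1-DATANODE-1", "dn-host2": "DataNode (Base)"}
overrides = {("dn-host2", "dfs.datanode.du.reserved"): "1073741824"}

def effective(instance, prop):
    if (instance, prop) in overrides:      # instance override wins
        return overrides[(instance, prop)]
    return groups[instance_group[instance]][prop]  # else inherit from group

print(effective("dn-host1", "dfs.datanode.du.reserved"))  # group value
print(effective("dn-host2", "dfs.datanode.du.reserved"))  # override value
```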
Role configuration groups provide two types of properties: those that affect the configuration of the
service itself (the Default section in the left-hand panel) and those that affect monitoring of the service,
if applicable (the Monitoring section). Not all services have monitoring properties. For more
information about monitoring properties, see Configuring Monitoring Settings.
Important
If you change configuration settings in the Configuration tab after you have started the service or
instance, you may need to restart the service or instance. If you need to restart, a message is
displayed at the top of the Configuration tab. For more information, see Restarting Services and
Instances after Configuration Changes.
Changing the Configuration of a Service or Role
To change configuration settings for a service or role:
1. Click the Services tab and select the service you want to modify.
2. Pull down the Configuration tab at the top of the window and select Edit.
The left-hand panel organizes the configuration properties into categories; first those that
are Service-Wide, followed by role configuration groups for each role type within the service.
Each configuration group shows its own set of properties, organized by function. Advanced
properties are listed separately for each configuration group.
If you have created additional configuration groups they will appear in this panel and you can
modify them just as you can the base configuration group. For example, if during installation,
Cloudera Manager determined that a different set of configuration values is needed for the
DataNode colocated with the NameNode, you might see two categories in the Category panel:
DataNode (Base) and HDFS-1-DATANODE-1 (where HDFS-1-DATANODE-1 is the configuration
group Cloudera Manager created for the DataNode instance colocated with the NameNode
role).
3. Under the appropriate role configuration group, select the category for the properties you want
to change.
4. To search for a text string (such as "safety valve") in a property, value, or description, enter the
text string in the Search box at the top of the Category list.
5. Moving the cursor over the value cell highlights the cell; click anywhere in the highlighted area
to enable editing of the value. Then type the new value in the field provided (or check or
uncheck the box, as appropriate).
To facilitate entering some types of values, you can specify not only the value, but also
the units that apply to the value. For example, to enter a setting that specifies bytes per
second, you can choose to enter the value in bytes (B), KiB, MiB, or GiB, selected
from a drop-down menu that appears when you edit the value.
To remove the value you entered, click the Reset to the default value link.
If the property allows a list of values, click the Plus icon to the right of the edit field to
add an additional field. An example of this is the DataNode Data Directory property,
which can have a comma-delimited list of directories as its value.
To remove an item from such a list, click the Minus icon to the right of the field you
want to remove.
6. Click Save Changes to commit the changes. You can add a note that will be included with the
change in the Configuration History.
This will change the setting for the configuration group, and will apply to all role instances
associated with that configuration group.
View the list of role instances that have overridden the value specified in the Configuration
Group. Use the selections on the drop-down menu below the Value column header to view a list
of instances that use the inherited value, instances that use an override value, or all instances.
This view is especially useful for finding inconsistent settings in a cluster. You can also use the
Host and Rack text boxes to filter the list.
Change the override value for the role instances to the inherited value from the associated
Configuration Group. To do so, select the role instances you want to change, choose Inherited
Value from the drop-down menu next to Change value of selected instances to and click Apply.
Change the override value for the role instances to a different value. To do so, select the role
instances you want to change, choose Other from the drop-down menu next to Change value of
selected instances to. Enter the new value in the text box and then click Apply.
Using a Configuration Safety Valve
Found in the Advanced category (usually under a Role Configuration Group), a Safety Valve configuration
setting lets you insert an XML text string into a configuration file, such as hdfs-site.xml or
mapred-site.xml, owned by a service or role. It is intended for advanced use in case there is a
specific Hadoop configuration setting that you find is not exposed in Cloudera Manager; contact
Cloudera Support if you are required to use it.
For example, there are several safety valves for the NameNode role under the HDFS service
Configuration tab, NameNode (Base) configuration group, Advanced subcategory. There are a number
of Safety Valve properties that affect various configuration files; the Description field tells you into
which configuration file your additions will be placed. For example, one NameNode safety valve
property is called the NameNode Configuration Safety Valve for hdfs-site.xml; settings you enter here
will be inserted verbatim into the hdfs-site.xml file associated with the NameNode thus each
value you enter into that configuration safety valve must be a valid xml property definition, for example:
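A property definition of the kind the text describes might look like the following (the property name and value here are only illustrative; use whichever Hadoop property you actually need):

```xml
<property>
  <name>dfs.namenode.avoid.read.stale.datanode</name>
  <value>true</value>
</property>
```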
To see a list of safety valve settings that apply to a specific configuration file, you can enter the
configuration file name in the search field and filter for all safety valves that affect that file. For
example, searching for mapred-site.xml will show all the safety valve settings that have mapred-site.xml in their descriptions.
Another example of a safety valve is an environment safety valve, such as the HDFS Service
Environment Safety Valve found under the Service-Wide Advanced settings for HDFS. The
key/value pairs you specify in an environment safety valve for a service or role are inserted verbatim
into the role's environment. Service-wide safety valve values apply to all roles in the service; a safety
valve value for a role configuration group applies to all instances of the role associated with that
configuration group.
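A minimal sketch of how these environment entries combine for a single role (the variable names are hypothetical, and the assumption that role-group entries win on key collisions is ours, not documented behavior):

```python
# Sketch: a role's environment as the service-wide safety valve entries
# plus its role configuration group's entries. The assumption here is
# that role-group entries take precedence when the same key appears twice.
service_wide_env = {"HADOOP_HEAPSIZE": "1024"}
role_group_env = {"HADOOP_HEAPSIZE": "2048", "EXTRA_OPTS": "-verbose:gc"}

role_env = {**service_wide_env, **role_group_env}  # group entries win collisions
print(role_env)
```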
Restarting Services and Instances after Configuration Changes
If you change the configuration settings after you start a service or instance, you may need to restart the
service or instance to have the configuration settings become active. If you need to restart, a message is
displayed at the top of the Configuration tab when you save your changes.
Note: If you change configuration settings at the service level that affect a particular role only (such
as all DataNodes but not the NameNodes), you can restart only that role; you do not need to
restart the entire service. If you changed the configuration for a particular instance only (such as
one of four DataNodes), you may need to restart only that instance.
1. Click the Services tab and select the service where you want to create a new role configuration
group.
2. Pull down the Configuration tab at the top of the window and select Group Management.
3. Click Create New Group.
4. Provide a name for the group.
5. Select the type of role for the group.
You will only be able to select role types that allow multiple instances, and that exist in the Service
you have selected.
6. In the Copy From field, select the source of the basic configuration information for this
configuration group; you can use any existing Configuration group of the appropriate type, or
you can choose None and the configuration group will be set up with all generic default values.
Note: If you select None as the source, the default values are not the same as the values
Cloudera Manager sets in the base configuration group, as Cloudera Manager specifically
sets the appropriate configuration properties for the services and roles it installs. After you
create your group using None you must edit the configuration to set missing properties (for
example the TaskTracker Local Data Directory List property, which is not populated if you
select None) and clear other validation warnings and errors.
To rename the configuration group, click the Rename button, then enter the new name
in the pop-up window, and click Rename to complete the action.
You cannot rename a base group.
To delete the configuration group, click the Delete button, and confirm you want to
perform the delete.
You cannot delete any of the base groups, and you cannot delete a group if it has role
instances associated with it. If you want to delete a group you've created, you must
move any role instances to a different configuration group.
4. To move a role to a different configuration group:
a. Select the role instance(s) you want to move
b. Pull down the Actions for Selected menu and select Move...
c. In the pop-up that appears, select the group to which you want to move your selected
role instance, and click Move.
By default, or if you click Show All, a list of all revisions is shown. If you are viewing a
Service or Role Instance, all Service/Configuration Group related revisions are shown. If
you are viewing a Host or All Hosts, all Host/All Hosts related revisions are shown.
To list only the configuration revisions that were done in a particular time period, use
the Time Range Selector to select a time range. Then, click Show within the Selected
Time Range.
For a host instance, Revision Details shows just a message, date and time stamp, and the user.
4. Oozie
5. Hue
6. HBase
7. ZooKeeper
8. YARN
9. MapReduce
10. HDFS
Restarting a Service
It is sometimes necessary to restart a service, which is essentially a combination of stopping a service
and then starting it again. For example, if you change the hostname or port where the Cloudera
Manager Server is running, or you enable TLS security, you must restart the Cloudera Management
Services to update the URL to the Server.
Note
If you need to restart all services, you should stop them all first and then start them all again in the
order described above. It is not possible to restart all of the services in the correct order using the
Restart command.
To restart a service:
1. Choose All Services from the Services tab.
2. Choose Restart on the Actions menu for the service you want to restart. Click the Restart
button that appears in the next screen to confirm. When you see a Finished status, the service
has restarted.
Rolling Restart
Rolling restart allows you to conditionally restart the role instances of your HDFS, MapReduce, HBase,
ZooKeeper, and Flume services. Note that if the service is not running, Rolling Restart is not available.
You can do a rolling restart of each of these services individually.
If you have High Availability enabled, you can also perform a cluster-level rolling restart; a
cluster-level rolling restart is not available without High Availability.
Restarting an Individual Service
You can initiate a rolling restart from either the Service page for one of the eligible services, or from the
service's Instances page, where you can select individual roles to be restarted.
1. From the Services page (or the Services tab) select the service you want to restart.
2. From the service's Actions menu, select Rolling Restart...
OR
a. Go to the Instances tab.
b. Select the roles you want to restart.
c. Select Rolling Restart from the Actions for Selected menu.
3. In the pop-up dialog box, select the options you want:
You can choose to restart only roles whose configurations are stale.
You can choose to restart only roles that are running outdated software versions.
4. If you have a significant number of slave roles (DataNodes, TaskTrackers, RegionServers) you can
have those restarted in batches. You can configure:
How many roles should be included in a batch (the default is one, so individual roles will
be started one at a time).
How long should Cloudera Manager wait before starting the next batch.
The number of batch failures that will cause the entire rolling restart to fail (this is an
advanced feature).
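The batching options above can be sketched as follows (the parameter names are illustrative, not Cloudera Manager's actual option names, and a real rolling restart would also wait between batches):

```python
# Sketch of rolling-restart batching: restart roles in batches of a
# given size, and abort once more than the allowed number of batches
# has failed.
def rolling_restart(roles, batch_size=1, max_batch_failures=0, restart=None):
    """Return (roles restarted so far, final status)."""
    failures = 0
    restarted = []
    for i in range(0, len(roles), batch_size):
        batch = roles[i:i + batch_size]
        if all(restart(r) for r in batch):
            restarted.extend(batch)
        else:
            failures += 1
            if failures > max_batch_failures:
                return restarted, "failed"
        # a real implementation would sleep here before the next batch
    return restarted, "done"

# With the defaults, roles are restarted one at a time, and a single
# failed batch aborts the whole operation.
print(rolling_restart(["dn1", "dn2", "dn3"], restart=lambda r: True))
```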
Restarting ZooKeeper or Flume
For both ZooKeeper and Flume, the option to restart roles in batches is not available. They are always
restarted one by one.
Restarting a Cluster
Note: Rolling Restart for a cluster is available ONLY if you have High Availability enabled. In order
to avoid having your cluster go down during the restart, Cloudera Manager will force a failover to
the Standby NameNode while the critical roles are being restarted.
1. If you have not already done so, enable High Availability. See Configuring HDFS High Availability
for instructions. You do not need to enable Automatic Failover for rolling restart to work,
though you can enable it if you wish. Automatic Failover does not affect the rolling restart
operation.
2. From the Actions menu for the cluster you want to restart (found on the All Services page)
select Rolling Restart....
3. In the pop-up dialog box, select the services you want to restart.
Please review the caveats in the preceding sections for the services you elect to have restarted.
Note that the services that do not support rolling restart will simply be restarted, and will be
unavailable during their restart.
4. If you select an HDFS or HBase service, you can also configure the following:
How many roles should be included in a batch (the default is one, so individual roles will
be started one at a time).
How long should Cloudera Manager wait before starting the next batch.
The number of batch failures that will cause the entire rolling restart to fail (this is an
advanced feature).
A role instance may be running on a host that has become disconnected from the Cloudera Manager
Server host. In this case, the Cloudera Manager Server will be unable to connect to the Cloudera
Manager Agent on that disconnected host to start or stop the role instance, which will
prevent the corresponding service from starting or stopping. To work around this, you can abort the
command to start or stop the role instance on the disconnected host, and then you can start or stop the
service again.
To abort any pending command:
You can click the indicator that shows the number of commands that are currently running in your
cluster (if any). This indicator is positioned just to the left of the Support link at the right-hand side of
the navigation bar. Unlike the Commands tab for a role or service, this indicator includes all commands
running for all services or roles in the cluster. In the Running Commands window, click Abort to abort
the pending command. For more information, see Viewing Running and Recent Commands.
To abort a pending command for a service or role:
1. Navigate to the Service > Instances tab for the service where the role instance you want to stop
is located. For example, navigate to the HDFS Service > Instances tab if you want to abort a
pending command for a DataNode.
2. In the list of instances, click the link for role instance where the command is running (for
example, the instance that is located on the disconnected host).
3. Go to the Commands tab.
4. Find the command in the list of Running Commands and click Abort Command to abort the
running command.
Note: A Gateway is a role whose sole purpose is to designate a host that should receive a client
configuration for a specific service, when the host does not otherwise have any roles running on it.
Gateways are configured by going to the Instances tab for the service and using the Add command
to add Gateway roles as needed. You can configure Gateway roles for HDFS, MapReduce, and
HBase services (and for YARN in CDH4). See Adding Role Instances for more information on adding
Gateway roles.
Note that if you are installing on a system that happens to have pre-existing alternatives, then it is
possible another alternative may have higher priority and will continue to be used. The alternatives
priority of the Cloudera Manager client configuration is configurable under the Client section of the
Configuration tab for the appropriate service.
You can also distribute these client configuration files manually to the users of a service.
The main circumstance that may require a redeployment of the client configuration files is when you
have modified the configuration of your cluster. In this case you will typically see a message telling you
to redeploy your client configurations. The affected service(s) will also typically be shown as "Running
with outdated Configuration."
Redeploying the Client Configuration Files Manually
Although Cloudera Manager will deploy client configuration files automatically in many cases, if you
have modified the configurations for a service, you may need to redeploy those configuration files.
If your client configurations were deployed automatically, this command will attempt to redeploy them
as appropriate.
Note: If you are deploying client configurations on a node that has multiple services installed, some
of the same configuration files, though with different configurations, will be installed in the conf
directories for each service. Cloudera Manager uses the priority parameter in the
alternatives --install command to ensure that the correct configuration directory is made
active based on the combination of services on that node. The priority order (as of Cloudera
Manager 4.1.2) is MapReduce > YARN > HDFS.
The priority can be configured under the Client/Advanced section of the Configuration tab for the
appropriate service.
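The priority rule in the note above can be sketched as a small selection function. The priority numbers below are illustrative; only the ordering MapReduce > YARN > HDFS comes from the text, and the conf directory paths are hypothetical.

```python
# Sketch of how the alternatives priority decides which client
# configuration directory becomes active on a node running several
# services: the highest-priority service wins.
PRIORITY = {"mapreduce": 92, "yarn": 91, "hdfs": 90}  # illustrative values

def active_conf_dir(services_on_node):
    """Return the conf directory of the highest-priority service present."""
    winner = max(services_on_node, key=PRIORITY.get)
    return f"/etc/hadoop/conf.cloudera.{winner}"

print(active_conf_dir(["hdfs", "mapreduce"]))  # the MapReduce client config wins
```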
To deploy all the client configuration files to all nodes on your cluster:
1. Click the Services tab in the Cloudera Manager Admin Console.
2. From the cluster-level Actions menu at the top right of the page, select Deploy Client
Configuration...
3. If you are sure you want to proceed, click Deploy client configuration.
To deploy client configuration files for a specific service:
1. From the Services tab, click the service for which you want to deploy client configurations.
2. From the Actions menu at the top right of the service page, select Deploy Client
Configuration...
3. If you are sure you want to proceed, click Deploy client configuration.
Configuring HDFS High Availability
You can use Cloudera Manager to configure your CDH4 cluster for HDFS High Availability (HA). High
Availability is not supported for CDH3 clusters.
An HDFS HA cluster is configured with two NameNodes - an Active NameNode and a Standby
NameNode. Only one NameNode can be active at any point in time. HDFS High Availability depends on
maintaining a log of all namespace modifications in a location available to both NameNodes, so that in
the event of a failure the Standby NameNode has up-to-date information about the edits and location of
blocks in the cluster.
There are two implementations available for maintaining the copies of the edit logs:
Quorum-based Storage relies upon a set of JournalNodes, each of which maintains a local edits directory
that logs the modifications to the namespace metadata.
The other alternative is to use an NFS-mounted shared edits directory (typically a remote Filer) to which
both the Active and Standby NameNodes have read/write access.
Once you have enabled High Availability, you can enable Automatic Failover, which will automatically
failover to the Standby NameNode in case the Active NameNode fails.
You can also initiate a manual failover from Cloudera Manager.
See the CDH4 High Availability Guide for a more detailed introduction to High Availability with CDH4.
Important: Enabling or Disabling High Availability will shut down your HDFS service, and the
services that depend on it (MapReduce, YARN, and HBase). Therefore, you should not do this while
you have jobs running on your cluster. Further, once HDFS has been restored, the services that
depend upon it must be restarted, and the client configurations for HDFS must be redeployed.
Important: Enabling or Disabling High Availability will cause the previous monitoring history to
become unavailable.
You may enter only one directory for each JournalNode. The names/paths do not need
to be the same on every JournalNode.
The directories you specify should be empty, and must have the appropriate
permissions.
If the directories are not empty, Cloudera Manager will not delete the contents;
however, in that case the data should be in sync across the edits directories of the
JournalNodes and should have the same version data as the NameNodes.
6. You can choose whether the workflow will restart the dependent services and redeploy the
client configuration for HDFS. To do this manually rather than have it done as part of the
workflow, uncheck these extra options.
7. Click Continue.
Cloudera Manager proceeds to execute the set of commands that will stop the dependent
services, delete, create, and configure roles and directories as appropriate, and will restart the
dependent services and deploy the new client configuration if those options were selected.
8. There are some additional steps you must perform if you want to use Hive, Impala, or Hue in a
cluster with High Availability configured. Follow the Post Setup Steps described below.
Enabling High Availability using NFS Shared Edits Directory
After you have installed HDFS on your CDH4 cluster, the Enable High Availability workflow leads you
through adding a second (Standby) NameNode and configuring the shared edits directory.
The shared edits directory is what the Standby NameNode uses to stay up-to-date with all the file
system changes the Active NameNode makes. Note that you must have a shared directory already
configured to which both NameNode machines have read/write access. Typically, this is a remote filer
which supports NFS and is mounted on each of the NameNode machines. This directory must be
writable by the hdfs user, and must be empty before you run the Enable HA workflow.
You can enable High Availability from the Actions menu on the HDFS Service page in a CDH4 cluster, or
from the HDFS Service Instances tab.
1. From the Services tab, select your HDFS service.
2. Click the Instances tab.
3. Click Enable High Availability
(This button does not appear if this is a CDH3 version of the HDFS service.)
4. The next screen shows the hosts that are eligible to run a Standby NameNode.
a. Select Enable High Availability with NFS shared edits directory as the High Availability
Type.
b. Select the host where you want the Standby NameNode to be installed, and click
Continue.
The Standby NameNode cannot be on the same host as the Active NameNode, and the
host that is chosen should have the same hardware configuration (RAM, disk space,
number of cores, etc.) as the Active NameNode.
5. Confirm or enter the directories to be used as the name directories for the NameNode.
6. Enter the absolute path of the local directory, on each NameNode host, that is mounted to the
remote shared edits directory.
For example, hostA has /dfs/sharedA mounted to nfs:///exported/namenode, and hostB
has /dfs/sharedB mounted to the same NFS location. The user should enter /dfs/sharedA
for hostA and /dfs/sharedB for hostB. (/dfs/sharedA and /dfs/sharedB can be the same
paths).
You should only configure one shared edits directory. This directory must be mounted
read/write on both NameNode machines. This directory must be writable by the hdfs user, and
must be empty when you run the enable HA command.
7. You can choose whether the workflow will restart the dependent services and redeploy the
client configuration for HDFS. To do this manually rather than have it done as part of the
workflow, uncheck these extra options.
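As a concrete illustration of the mount arrangement described in step 6, the entry on hostA might look like the following (the filer hostname and export path are hypothetical; hostB would mount the same export at /dfs/sharedB):

```
# Hypothetical /etc/fstab entry on hostA for the shared edits directory
filer.example.com:/exported/namenode  /dfs/sharedA  nfs  rw,tcp,soft,intr  0 0
```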
Configure the HDFS Web Interface Role for Hue to be an HttpFS role. See Configuring Hue to
work with High Availability.
Upgrade the Hive Metastore to use High Availability. You must do this for each Hive service in
your cluster. See Upgrading the Hive Metastore for HDFS High Availability.
4. Under the HttpFS column, select a host where you want to install the HttpFS role and click
Continue.
5. After you are returned to the Instances page, select the new HttpFS role.
6. From the Actions for Selected menu, select Start (and confirm).
7. After the command has completed, go to the Services tab and select your Hue service.
8. From the Configuration tab, select Edit.
9. The HDFS Web Interface Role property will now show the httpfs role you just added. Select it
instead of the namenode role, and Save your changes. (The HDFS Web Interface Role property
is under the Service-Wide Configuration category.)
10. Restart the Hue service for the changes to take effect.
Note: If you started your services and re-deployed your client configurations after you enabled HA,
you should not need to do so again now. If you did not start them after enabling HA, you must do
so now, before you attempt to run any jobs on your cluster.
Note: You must disable Automatic Failover before you can disable High Availability.
To disable Automatic Failover
1. From the Services tab, select your HDFS service.
2. Click the Instances tab.
3. Click Disable Automatic Failover...
4. Confirm that you want to take this action.
Cloudera Manager will stop the NameNodes, remove the Failover Controllers, and restart the
NameNodes, transitioning one of them to be the Active NameNode.
Disabling High Availability
Note: If you have enabled Automatic Failover, you must disable it before you can disable High
Availability.
Fencing Methods
In order to ensure that only one NameNode is active at a time, a fencing method is required for the
shared directory. During a failover, the fencing method is responsible for ensuring that the previous
Active NameNode no longer has access to the shared edits directory, so that the new Active NameNode
can safely proceed writing to it.
For details of the fencing methods supplied with CDH4, and how fencing is configured, see the Fencing
Configuration section in the CDH4 High Availability Guide.
By default, Cloudera Manager configures HDFS to use a shell fencing method
(shell(./cloudera_manager_agent_fencer.py)) that takes advantage of the Cloudera Manager
agent. However, you can configure HDFS to use the sshfence method, or you can add your own shell
fencing scripts, instead of or in addition to the one Cloudera Manager provides.
The fencing parameters are found in the Service-Wide section of the Configuration tab for your HDFS
service.
Converting from NFS-mounted shared edits directory to Quorum-based Storage
Converting your High Availability configuration from using an NFS-mounted shared edits directory to
Quorum-based Storage involves disabling your current High Availability configuration, then enabling
High Availability using Quorum-based Storage.
1. Disable High Availability (see Disabling High Availability).
2. Although the Standby NameNode role is removed, its name directories are not deleted. Empty
these directories.
3. Enable High Availability with Quorum-based Storage (see Enabling High Availability with
Quorum-based Storage).
Important: Configuring a new Nameservice will shut down the services that depend upon HDFS.
Once the new Nameservice has been started, the services that depend upon HDFS must be
restarted, and the client configurations must be redeployed. (This can be done as part of the Add
Nameservice workflow, as an option.)
Adding a Nameservice
The instructions below for adding a Nameservice assume that a Nameservice is already set up. The first
Nameservice can be set up either by converting a simple HDFS service as described above (see
Converting a non-Federated HDFS Service to a Federated HDFS Service) or by enabling High Availability.
1. Click the Services tab and select your CDH4 HDFS service.
2. Click the Instances tab.
At the top of this page you should see the Federation and High Availability section.
Note: If this section does not appear, it means you do not have any Nameservices
configured. You must have one Nameservice already configured in order to add a second.
You can either enable High Availability, which will create a Nameservice, or you can convert
your existing HDFS service. See Converting a non-Federated HDFS Service to a Federated
HDFS Service for instructions.
Note that the mount points must be unique for this Nameservice; you cannot
specify any of the same mount points you have used for other Nameservices.
You can specify mount points that do not yet exist, and create the
corresponding directories in a later step in this procedure.
After you have brought up the new Nameservice, you will need to create, in the
new namespace, the directories that correspond with the mount points you
specified.
If an HBase service is set to depend on the federated HDFS service, make sure to
edit the mount points of the existing Nameservice to reference:
c. If you want to configure High Availability for this Nameservice, leave the Highly
Available checkbox checked.
d. Click Continue.
4. Select the hosts on which the new NameNode and SecondaryNameNodes will be created. (Note
that these must be hosts that are not already running other NameNode or SecondaryNameNode
instances, and their /dfs/nn and /dfs/snn directories should be empty if they exist.)
Click Continue.
5. Enter or confirm the directory property values (these will differ depending on whether you are
enabling High Availability for this Nameservice, or not).
6. Uncheck the Start Dependent Services checkbox if you need to create directories or move data
onto the new Nameservice. Leave this checked if you want the workflow to restart services and
redeploy the client configurations as the last steps in the workflow.
7. Click Continue.
When the process finishes successfully, click Finish.
You should now see your new Nameservice in the Federation and High Availability section in
the Instances tab of the HDFS service.
8. You must now create the directories you want under the new Nameservice. You need to do this
in the CLI.
a. To create a directory in the new namespace, use the command
hadoop fs -mkdir /nameservices/<nameservice name>/<directory>
where <nameservice name> is the new nameservice you just created, and
<directory> is the directory that corresponds to a mount point you specified.
b. If you need to move data from one Nameservice to another, use distcp or manual
export/import. dfs -cp and dfs -mv will not work.
c. Verify that the directories and data are where you expect them to be.
9. Restart the dependent services.
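The commands in step 8 can be sketched as follows; the nameservice and directory names here are hypothetical examples, and the hadoop calls are guarded so the sketch is safe to read through on a machine without a cluster:

```shell
# Hypothetical names for illustration; substitute your own
NAMESERVICE="nameservice2"
DIR="data"
TARGET="/nameservices/${NAMESERVICE}/${DIR}"

# 8a: create the directory in the new namespace (needs a running cluster,
# so the call is skipped where the hadoop CLI is absent)
{ command -v hadoop >/dev/null && hadoop fs -mkdir "$TARGET"; } || true

# 8b: move data between Nameservices with distcp; dfs -cp and dfs -mv will not work
{ command -v hadoop >/dev/null && \
  hadoop distcp "hdfs://nameservice1/$DIR" "hdfs://$NAMESERVICE/$DIR"; } || true

echo "$TARGET"
```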
Note: The monitoring configurations at the HDFS level apply to all NameServices. So if you have
two NameServices, it is not possible to disable a check on one but not the other. Likewise, it's not
possible to have different thresholds for events for the two NameServices.
Renaming a Service
A service is given a name upon installation, and that name is used as an identifier internally. However,
Cloudera Manager allows you to provide your own display name for a service, and that name will appear
in the Cloudera Manager User Interface instead of the original (internal) name.
To provide (or change) the display name of a service:
1. Pull down the Actions menu for the service, and select Rename...
2. Type the new name you want.
2. On the Properties tab, under the Performance category, set the following option:
Setting / Description
3. On the Properties tab, under the Threshold category, set the following option:
Setting: Set health status to Bad if the Agent heartbeats fail ____ time(s)
Description: If an Agent fails to send this number of expected consecutive heartbeats to the
Server, a Bad health status is assigned to that Agent.
Moving the NameNode to a Different Host
If necessary, you can move the NameNode role instance to a different host machine. For example, the
NameNode host may be having hardware problems and you need to move the NameNode to a properly
functioning host. Use the instructions in this section to move the NameNode. The host where you want
to move the NameNode must be managed by Cloudera Manager.
Adding a New Host
If you need to first add a new host in the cluster, follow the instructions in Adding a Host to the Cluster
and then return to this page and proceed to next section below. If you are moving the NameNode to a
host machine where you already installed CDH3 and the Cloudera Manager Agent, you can proceed
directly to the next section below.
Moving the NameNode Role Instance to a Different Host
To move the NameNode to a different host machine:
1. Click the Services tab and then stop all services. For instructions, see Stopping All Services.
2. Using the command line, make a backup copy of the dfs.name.dir directories on the existing
NameNode host. Make sure you backup the fsimage and edits files. They should be the same
across all of the directories specified by the dfs.name.dir property. You can view the setting
for this property in the HDFS Service > Configuration tab.
3. Using the command line, copy the files you backed up from dfs.name.dir directories on the
old NameNode host to the new host where you want to run the NameNode.
4. In Cloudera Manager, click the Services tab and navigate to the HDFS Service > Instances tab.
5. Select the check box next to the NameNode role instance and then click the Delete button. Click
Delete again to confirm.
6. In the Review configuration changes page that appears, click Skip.
7. On the same HDFS Service > Instances tab, click Add to add a NameNode role instance.
8. Select the new host where you want to run the NameNode and then click Continue.
9. Specify the location of the dfs.name.dir directories where you copied the data on the new
host, and then click Accept Changes.
10. Click the Services tab and then start all services. For instructions, see Starting All Services. After
the HDFS service has started, the Cloudera Manager Server will distribute the new configuration
files to the DataNodes which will then be configured with the new IP address of the NameNode.
11. Navigate to the HDFS Service > Status tab. The NameNode, Secondary NameNode, and
DataNode for the HDFS service should each show a process state of Started, and the overall
HDFS service should show a health status of Good.
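Steps 2 and 3 of the procedure above can be sketched as shell commands. All paths and hostnames here are hypothetical; check the actual dfs.name.dir value on the HDFS Service > Configuration tab first:

```shell
# Hypothetical values; substitute your dfs.name.dir and new host
NAME_DIR="/dfs/nn"
BACKUP_DIR="/tmp/namenode-backup"
NEW_HOST="nn2.example.com"

# Step 2: back up the metadata, preserving ownership and timestamps;
# the fsimage and edits files live under the dfs.name.dir directories
mkdir -p "$BACKUP_DIR"
{ [ -d "$NAME_DIR" ] && cp -a "$NAME_DIR/." "$BACKUP_DIR/"; } || true

# Step 3: copy to the new NameNode host (printed for review rather than run here)
COPY_CMD="scp -r $BACKUP_DIR/ root@$NEW_HOST:$NAME_DIR"
echo "$COPY_CMD"
```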
4. You are asked if you want to restart the cluster; click Rolling Restart to proceed with a Rolling
Restart. Click Restart to perform a normal restart.
Services that do not support Rolling Restart will undergo a normal restart, and will not be
available during the restart process.
o
If you do not want to restart immediately, click Close. You can restart the cluster from
the All Services page at a later time. Note, however, that the new version of CDH will
not take effect until you restart your cluster.
5. For a Rolling Restart, a pop-up allows you to choose which services you want to restart, and
presents caveats to be aware of for those services that can undergo a rolling restart. You can specify:
o How many roles should be included in a batch (the default is one, so individual
roles will be restarted one at a time).
o How long Cloudera Manager should wait before starting the next batch.
o The number of batch failures that will cause the entire rolling restart to fail (this
is an advanced feature).
Please see the Rolling Restart topic for more information about these choices.
Adding a Cluster
Cloudera Manager can manage multiple clusters. Furthermore, the clusters do not need to run the
same version of CDH; you can manage both CDH3 and CDH4 clusters with Cloudera Manager.
To add a cluster with new hosts:
1. From the Services tab, click Add Cluster...
This begins the Installation Wizard, just as if you were installing a cluster for the first time. (See
the Cloudera Manager Installation Guide for detailed instructions. )
2. To enable Cloudera Manager to automatically discover new hosts where you want to install
CDH, enter the cluster hostnames or IP addresses, and click Search.
Cloudera Manager lists the hosts you can use to configure a new cluster. Managed hosts that
already have services installed will not be selectable.
3. Click Install CDH on Selected Hosts to install the new cluster.
At this point the installation continues through the wizard the same as it did when you installed
your first cluster. You will be asked to select the version of CDH to install, which services you
want and so on, just as previously.
To add a cluster using currently managed hosts:
Alternatively, you may have hosts that are already "managed" but are not part of a cluster. You can have
managed hosts that are not part of a cluster when you have added hosts to Cloudera Manager either
through the Add Host wizard, or by manually installing the Cloudera Manager agent onto hosts where
you have not installed any other services. This will also be the case if you remove all services from a
host so that it no longer is part of a cluster.
1. From the Services tab, click Add Cluster...
2. To see a list of the currently managed hosts, click the View Currently Managed Hosts link under
the Search field.
3. To perform the installation, click Continue Using Only Currently Managed Hosts. Instead of
searching for hosts, this will attempt to install onto any hosts managed by Cloudera Manager
that are not already part of a cluster. It will proceed with the installation wizard as for a new
cluster installation.
Clicking the small arrow in front of the number of roles will list all the role instances running on
that host. The balloon annotation that appears when you move the cursor over a link indicates
the service instance to which the role belongs.
All columns in the table can be filtered either by typing text or by selecting a filter specification
from the drop-down list in the filter field.
You can change the data you see for the hosts in the host list using the View Columns menu:
o Current States shows you basic information (host name, IP, rack assignment, CDH
version, health status, when the last heartbeat occurred, and the Decommission
status).
o Physical Attributes shows you physical information about the host, such as the number
of cores, system load averages for the past 1, 5 and 15 minutes, disk usage, and physical
memory and swap space usage.
Under the Actions for Selected menu you can manage rack locality by assigning hosts to racks, delete
hosts, decommission and recommission hosts, start all roles on the host, and enter or exit maintenance
mode. You can also apply a Host Template to a host, if that host does not have any roles currently
running on it.
You can also Add New Hosts to Cluster, run the Host Inspector, and Re-run Host Upgrade Wizard from
this page.
You can change the data you see for the hosts in the host list using the View Columns menu:
Current States shows you basic information (Host name, IP, Rack assignment, CDH version,
Health status, Last heartbeat, Maintenance Mode status, and Decommission status.)
Physical Attributes shows you physical information about the host such as the number of cores,
system load averages for the past 1, 5 and 15 minutes, disk usage, physical memory and swap
space usage.
Configuration Tab
The Configuration tab lets you set properties related to parcels and to resource management, and also
monitoring properties for the hosts under management. Note that configuration settings you make
here will affect all your managed hosts. You can also configure properties for individual hosts from the
Host Details page (see Viewing Detailed Information about Hosts); those settings will override the
global properties set here.
To edit the Default configuration properties for a host:
1. From the Configuration tab, pull down the menu and select Edit.
Under Parcels, the Configuration tab lets you specify how parcels will interact with your
managed hosts. You can provide a "blacklist" of products that should not be distributed
to these hosts.
Blacklisting a product at the All Hosts level will prevent the product from being
distributed to any of the managed hosts in your cluster. You can also blacklist products
at the individual host level.
Under Thresholds you can configure the thresholds for monitoring the free space in the
Agent Log and Agent Process Directories for all your hosts. You can set these thresholds
as either or both a percentage and an absolute value (in bytes).
Under Other you can set health check thresholds for a variety of conditions related to
memory usage and other properties.
Here is where you can enable Alerting for health check events for all your managed
hosts.
Deleting Hosts
Decommissioning a Host
Managing Parcels
Resource Management
Health status of the host and last time the Cloudera Manager Agent sent a heart beat to the
Cloudera Manager Server
Number of cores
Memory usage
Charts showing a variety of metrics and health test results over time.
Status Tab
Processes Tab
Resources Tab
Commands Tab
Configuration Tab
Components Tab
Audits Tab
Charts Tab
Status Tab
This page is displayed when a Host is initially selected. This provides summary information about the
status of the selected host. Use this page to gain a general understanding of work being done by the
system, the configuration, and health status.
If this host has been decommissioned or is in maintenance mode, you will see the corresponding
icon(s).
To view details about the Host agent, click the link at the far right in the Details section.
Health Tests
Cloudera Manager monitors a variety of metrics that are used to indicate whether a host is functioning
as expected. The Health Tests panel shows health test results in an expandable/collapsible list, typically
with the specific metrics that the test returned. (You can Expand All or Collapse All from the links at the
upper right of the Health Tests panel).
The color of the text (and the background color of the field) for a Health Test result indicates the
status of the results. The tests are sorted by their health status: Good, Concerning, Bad, or
Disabled. The lists of entries for Good and Disabled health tests are collapsed by default;
however, Bad or Concerning results are shown expanded.
The text of a health test also acts as a link to further information about the test. Clicking the
text will pop up a window with further information, such as the meaning of the test and its
possible results, suggestions for actions you can take or how to make configuration changes
related to the test.
The help text for a health test also provides a link to the relevant monitoring configuration
section for the service. See Configuring Monitoring Settings for more information.
The small heatmap icon to the right of some of the tests takes you to a heatmap display that
lets you compare the values of the relevant test result metrics across the nodes of your cluster.
Health History
The Health History provides a record of state transitions of the Health Tests for the host.
Click the arrow symbol at the left to view the description of the health test state change.
Click the View link to open a new page that shows the state of the host at the time of the
transition. Note that in this view some of the status settings are greyed out, as they reflect a
time in the past, not the current status.
File Systems
The File systems panel provides information about disks, their mount points and usage. Use this
information to determine if additional disk space is required.
Roles
Use the Roles panel to see the role instances running on the selected host, as well as each instance's
status and health.
Host machines are configured with one or more role instances, each of which corresponds to a service.
The role indicates which daemon runs on the host. Some examples of roles include the NameNode,
Secondary NameNode, Balancer, JobTrackers, DataNodes, RegionServers and so on. Typically a host will
run multiple roles in support of the various Services running in the cluster.
Clicking the role name takes you to the role instance's status page. Using the triangle to the right of the
role name, you can directly access the tabs on the role page (such as the Processes, Commands,
Configuration, or Audits tabs) as well as the status page for the parent Service of the role.
You can delete a role from the host from the Instances tab of the Service page for the parent service of
the role. You can add a role to a host in the same way. See Adding Role Instances and Deleting Service
Instances and Role Instances.
Charts
Charts are shown for each host instance in your cluster.
See Viewing Charts for Service, Role, or Host Instances for detailed information on the charts that are
presented, and the ability to search and display metrics of your choice.
Heat Maps
Health heat maps let you compare the status or performance of the different hosts in your cluster.
From the Health Tests panel for the host, you can access heatmaps that show related metrics for all the
nodes in your cluster. These are accessed by clicking the small heatmap icon to the right of some of
the tests in the Health Tests panel for the host you are viewing.
See Viewing Heatmaps for Services and Roles for more information; heatmaps for hosts are very
similar to those for roles, and the explanation there applies to hosts as well.
Processes Tab
The processes page provides information about each of the processes that are currently running on this
host. Use this page to access management web UIs, check process status, and access log information.
The Processes tab includes a variety of categories of information.
Service - The name of the service. Clicking the service name takes you to the service status
page. Using the triangle to the right of the service name, you can directly access the tabs on the
role page (such as the Processes, Commands, Configuration, or Audits tabs).
Instance - The role instance on this host that is associated with the service. Clicking the role
name takes you to the role instance's status page. Using the triangle to the right of the role
name, you can directly access the tabs on the role page (such as the Processes, Commands,
Configuration, or Audits tabs) as well as the status page for the parent Service of the role.
Link - A link to the management interface for this role instance on this system. This is not
available in all cases.
Status - The current status for the process. Statuses include stopped, starting, running, and
paused.
Full log file - A link to the full log file for this host (a file external to Cloudera Manager).
Stderr - A link to the stderr log for this host (a file external to Cloudera Manager).
Stdout - A link to the stdout log for this host (a file external to Cloudera Manager).
Resources Tab
Under the Resources tab you can view the resources (CPU, memory, disk, and ports) used by every
service and role instance running on the selected host.
Each entry on this page lists:
The amount of the resource being consumed or the settings for the resource
Description
CPU
Memory
Disk
Ports
The port number being used by the service to establish network connections.
Configuration Tab
The Configuration tab for a host lets you set monitoring properties for the selected host. In addition, for
parcel upgrades, you can blacklist specific products, that is, specify products that should not be
distributed or activated on the host.
To modify the monitoring properties for the selected host:
1. Pull down the Configuration Tab and select View and Edit.
2. Click the Monitoring category.
o Under Thresholds you can configure the thresholds for monitoring the free space in the
Agent Log and Agent Process Directories for the selected host. You can set these thresholds
as either or both a percentage and an absolute value (in bytes).
o Under Other you can set health check thresholds for a variety of conditions related to
memory usage and other properties.
o Here is where you can enable Alerting for health check events for the selected host.
The monitoring settings you make on this page will override the global host monitoring settings from the
Configuration tab of the All Hosts page.
For more information, see Modifying Service Configurations.
Components Tab
The Components tab lists every component installed on this host. This may include components that
have been installed but have not been added as a service (such as YARN, Flume or Impala).
This includes the following information:
Version - The version of CDH from which each component came (CDH3 or CDH4).
Audits Tab
The Audits tab lets you filter for audit events related to this host. See Viewing the Audit History for
more information.
You can edit a copy of any chart on this page and save it to a custom chart view.
Deleting Hosts
There are two ways to remove a host from a cluster:
You can remove a host from a cluster, but leave it available to be added to a different cluster
managed by Cloudera Manager.
You can stop Cloudera Manager from managing a host and the Hadoop daemons on the host.
First, make sure there are no roles running on the Host; you can decommission the host to ensure all
roles are stopped.
To remove a host from a cluster but leave it available to Cloudera Manager, you must remove all CDH
roles from the host. If the host has Cloudera Manager management roles (such as the Events Server,
Activity Monitor and so on), those roles can remain.
To remove a host from Cloudera Manager management entirely, you must stop the Cloudera Manager
Agent from running on the host; if you don't stop the Agent, it will send heartbeats to the Cloudera
Manager Server and show up again in the list of hosts.
The inspector runs tests to gather information for functional areas including:
Networking
System time
HDFS settings
Component versions
Installing components
Upgrading components
Decommissioning a Host
Decommissioning a host lets you decommission all roles on a single host without having to go to each
service and decommission the roles individually. Once all roles on the host have been decommissioned
and stopped, the host can be removed from service.
Decommissioning applies only to HDFS DataNodes, MapReduce TaskTrackers, YARN NodeManagers,
and HBase RegionServers. If the host you select has other roles running on it, those roles will simply be
stopped.
Host decommissioning supports decommissioning multiple hosts in parallel.
To decommission one or more hosts:
1. Click the Hosts tab.
2. Select the host(s) you want to decommission.
3. From the Actions for Selected menu, click Decommission.
A confirmation pop-up informs you of the roles that will be decommissioned or stopped on the nodes
you have selected. To proceed with the decommissioning, click Confirm.
A Command Details window appears that will show each stop or decommission command as it is run,
service by service. You can click one of the decommission links to see the subcommands that are run for
decommissioning a given role. Depending on the role, the steps may include adding the node to an
"exclusions list" and refreshing the NameNode, JobTracker, or NodeManager, stopping the Balancer (if it
is running), and moving data blocks or regions. Roles that do not have specific decommission actions
are just stopped.
While decommissioning is in progress, the host is marked Decommissioning in the list under the Hosts
tab. Once all roles have been decommissioned or stopped, the host is marked Decommissioned.
1. From the Parcels tab, click the Distribute button for the parcel you want to distribute.
This will begin the process of distributing the parcel to the nodes in the cluster.
If you have a large number of nodes to which the parcels should be distributed, you can control how
many concurrent uploads Cloudera Manager will perform. You can configure this setting on the
Administration page, Properties tab, under the Parcels section.
You can delete a parcel that is ready to be distributed; click the triangle at the right end of the Distribute
button to access the Delete command. This will delete the downloaded parcel from the local parcel
repository.
Note that distributing parcels to the nodes in the cluster does not affect the current running services.
Activating a parcel
Parcels that have been distributed to the nodes in a cluster are ready to be activated.
1. From the Parcels tab, click the Activate button for the parcel you want to activate.
This will update Cloudera Manager to point to the new software, ready to be run the next time a
service is restarted.
2. To start using a new version of a component, go to the Services tab and restart your services.
Until you restart services, the current software will continue to run. This allows you to restart
your services at a time that is convenient based on your maintenance schedules or other
considerations.
Deactivating a parcel
You can deactivate an active parcel; this will update Cloudera Manager to point to the previous software
version, ready to be run the next time a service is restarted.
To use the previous version of the software, go to the Services tab and restart your services.
If you did your original installation from parcels, and there is only one version of your software
installed (i.e. no packages, and no additional parcels have been activated and started) then when
you attempt to restart after deactivating the current version, your roles will be stopped but will not
be able to restart.
The Remote Parcel Repository URLs is a list of remote repositories the Cloudera Manager should check
for parcels. Initially this points to the default repository at archive.cloudera.com
(http://archive.cloudera.com/cdh4/parcels/latest/) but you can add your own repository locations to
the list.
You can also:
Set the frequency with which Cloudera Manager will check for new parcels
Configure whether downloads and distribution of parcels should occur automatically whenever
new ones are detected.
If automatic downloading/distribution are not enabled (the default), you must go to the Parcels page to
initiate these actions.
For individual hosts (or all hosts) you can "blacklist" selected parcels; this will prevent those parcels from
being distributed to or activated upon those hosts.
To blacklist a parcel:
1. Go to the Configuration tab for a host (or for All Hosts) and click Edit.
2. Under the Parcels category, enter the parcel(s) you want to blacklist. Enter the name as it
appears on the Parcels page (for example, 4.1.2-1.cdh4.1.2.p0.30), and click Save.
If a parcel you blacklist has already been distributed to the host, it will be removed from that host. If it
is already running on the host, it will continue to run until the next restart, when it will not be restarted.
You can create and manage host templates under the Templates tab from the All Hosts page.
1. Click the Hosts tab on the main Cloudera Manager navigation bar.
2. Click the Templates tab on the All Hosts page.
Templates are not required; Cloudera Manager assigns roles and configuration groups to the hosts of
your cluster when you perform the initial cluster installation. However, if you want to add new hosts to
your cluster, a host template can make this much easier.
If there are existing host templates, they are listed on the page, along with links to each role
configuration group included in the template.
If you are managing multiple clusters, you must create separate host templates for each cluster, as the
templates specify role configurations specific to the roles in a single cluster. Existing host templates are
listed under the cluster to which they apply.
You can click a configuration group name to be taken to the Edit page for that configuration
group, where you can modify its settings.
From the Actions menu associated with the group you can Rename the template, or delete it.
For each role, select either the "none" option or the appropriate role configuration
group. There may be multiple configuration groups for a given role type; select the
one with the configuration that meets your needs.
Selecting the "none" option means no configuration group will be included in the
template for that role type.
Note that a host may have no roles on it if you have just added the host to your cluster, or if you
decommissioned a managed host and removed its existing roles.
Also note that the host must have the same version of CDH installed as is running on the cluster whose
host templates you are applying.
If a host belongs to a different cluster than the one for which you created the host template, you can
apply the host template if the "foreign" host either has no roles on it, or has only management roles on
it. When you apply the host template, the host will then become a member of the cluster whose host
template you applied. The following instructions assume you have already created the appropriate host
template.
1. Go to the All Hosts page, Status tab.
2. Select the host(s) to which you want to apply your host template.
3. From the Actions for Selected menu, select Apply Host Template.
4. In the pop-up window that appears, select the host template you want to apply.
5. Optionally you can have Cloudera Manager start the roles created per the host template; check
the box to enable this.
6. Click Confirm to initiate the action.
Resource Management
The 4.5 release of Cloudera Manager reinforces existing resource management techniques and
introduces several new ones. These are primarily intended to isolate compute frameworks from one
another. For example, MapReduce and Impala often work with the same data set and run side-by-side
on the same physical hardware. Without explicitly managing the cluster's resources, Impala queries may
affect MapReduce job SLAs, and vice versa.
Resource Management via Control Groups (Cgroups)
Cloudera Manager 4.5 introduces support for the Control Groups (cgroups) Linux kernel feature. With
cgroups, administrators can impose per-resource restrictions and limits on CDH processes.
Important
If you've upgraded from an older version of Cloudera Manager to Cloudera Manager 4.5, you must
restart every Cloudera Manager supervisor process before using cgroups-based resource
management. The easiest and safest way to do this is:
1. Stop all services, including the Cloudera Management Services.
2. On each cluster node, run as root:
$ service cloudera-scm-agent hard_restart
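On a large cluster, step 2 can be scripted. This sketch only prints the per-node commands for review rather than running them; the hostnames are hypothetical:

```shell
# Hypothetical node list; replace with your cluster's hostnames
NODES="node1.example.com node2.example.com node3.example.com"

for node in $NODES; do
  # Printed for review; run each line (or pipe to sh) once verified
  echo "ssh root@$node 'service cloudera-scm-agent hard_restart'"
done | tee /tmp/hard_restart_commands.txt
COUNT=$(wc -l < /tmp/hard_restart_commands.txt)
```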
Cgroups-based resource management can be enabled for all hosts, or on a per-host basis.
To enable cgroups for all hosts:
1. Click the Hosts tab.
2. Click on the Configuration tab, then select View and Edit.
3. Select the Resource Management category.
4. Check the box next to Enable Cgroup-based Resource Management.
To enable cgroups for individual hosts:
1. Click the Hosts tab.
2. Click the link for the host where you want to enable cgroups.
3. Click on the Configuration tab, then select View and Edit.
4. Select the Resource Management category.
5. Check the box next to Enable Cgroup-based Resource Management.
When cgroups-based resource management is enabled for a particular host, all roles on that host must
be restarted for the changes to take effect.
Using Resource management
After enabling cgroups, you can restrict and limit the resource consumption of roles (or role config
groups) on a per-resource basis. All of these parameters can be found in the Cloudera Manager
configuration UI, under the Resource Management category.
CPU Shares
The more CPU shares given to a role, the larger its share of the CPU when under contention. Until
processes on the host (including both roles managed by Cloudera Manager and other system processes)
are contending for all of the CPUs, this will have no effect. When there is contention, those processes
with higher CPU shares will be given more CPU time. The effect is linear: a process with 4 CPU shares will
be given roughly twice as much CPU time as a process with 2 CPU shares.
Updates to this parameter will be dynamically reflected in the running role.
I/O Weight
Much like CPU shares, the higher the I/O weight, the higher the priority given to I/O requests made by
the role when I/O is under contention (either by roles managed by Cloudera Manager or by other
system processes). Note that this only affects read requests; write requests remain unprioritized.
Updates to this parameter will be dynamically reflected in the running role.
Support for the CPU Shares and I/O Weight parameters varies by Linux distribution. If the
distribution lacks support for a given parameter, changes to it will have no effect.
The exact level of support can be found in the Cloudera Manager Agent's log file, shortly after the agent
has started. See Viewing the Cloudera Manager Server and Agent Logs to find the agent log. In the log
file, look for an entry like this:
Found cgroups capabilities: {'has_memory': True,
'default_memory_limit_in_bytes': 9223372036854775807,
'writable_cgroup_dot_procs': True, 'has_cpu': True, 'default_blkio_weight':
1000, 'default_cpu_shares': 1024, 'has_blkio': True}
The 'has_memory' and similar entries correspond directly to support for the CPU, I/O, and Memory
parameters.
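One way to locate that entry is a quick grep of the agent log. The path below is the conventional default agent log location (an assumption; confirm yours via Viewing the Cloudera Manager Server and Agent Logs), and the check is guarded so it is safe on a machine without the agent:

```shell
# Conventional default agent log path; confirm on your system
AGENT_LOG="/var/log/cloudera-scm-agent/cloudera-scm-agent.log"
PATTERN="Found cgroups capabilities"

# Guarded so this is harmless where the agent is not installed
if [ -f "$AGENT_LOG" ]; then
  grep -m1 "$PATTERN" "$AGENT_LOG" || echo "no cgroups capability entry found"
fi
```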
Further reading
http://www.kernel.org/doc/Documentation/cgroups/cgroups.txt
http://www.kernel.org/doc/Documentation/cgroups/blkio-controller.txt
http://www.kernel.org/doc/Documentation/cgroups/memory.txt
http://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide
MapReduce Child Java Maximum Heap Size (Gateway and Client Override).
MapReduce Map Task Maximum Heap Size (Gateway and Client Override).
MapReduce Reduce Task Maximum Heap Size (Gateway and Client Override).
MapReduce Map Task Maximum Virtual Memory (Gateway and Client Override).
MapReduce Reduce Task Maximum Virtual Memory (Gateway and Client Override).
Examples
Protecting production MapReduce jobs from Impala queries
Suppose you have MapReduce deployed in production and want to roll out Impala without clobbering
your production MapReduce jobs.
For simplicity, we will make the following assumptions:
1. The cluster is using homogeneous hardware.
2. Each slave node has 8 GB of RAM.
3. Each slave node is running a DataNode, a TaskTracker, and an Impala daemon.
4. Each role type is in a single role config group.
5. Cgroups-based resource management has been enabled on all hosts.
CPU:
1. Leave the DataNode and TaskTracker role config groups' CPU shares at 1024.
2. Set the Impala daemon role config group's CPU shares to 256.
Memory:
1. Set the Impala daemon role config group's memory limit to 1024 MB.
2. Leave the DataNode maximum Java heap size at 1 GB.
3. Leave the TaskTracker maximum Java heap size at 1 GB.
4. Set the MapReduce Child Java Maximum Heap Size for Gateway to 5 GB.
5. Leave cgroups hard memory limits alone. We'll rely on "cooperative" memory limits exclusively,
as they yield a nicer user experience than the cgroups-based hard memory limits.
I/O:
1. Leave the DataNode and TaskTracker role config groups' I/O weight at 500.
2. Set the Impala daemon role config group's I/O weight to 100.
When you're done with configuration, restart all services for these changes to take effect.
The results are:
1. When MapReduce jobs are running, all Impala queries together will consume up to a fifth of the
cluster's CPU resources.
2. Individual Impala daemons won't consume more than 1 GB of RAM. If this figure is exceeded,
new queries will be cancelled.
3. DataNodes and TaskTrackers can consume up to 1 GB of RAM each.
4. The remainder of each host's available RAM (6 GB) is reserved for MapReduce tasks.
5. When MapReduce jobs are running, read requests issued by Impala queries will receive a fifth of
the priority of either HDFS read requests or MapReduce read requests.
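The "one fifth" figures follow from the proportional semantics of cgroup CPU shares and blkio weights. A small sketch of the arithmetic (illustrative only; actual contention depends on which processes are busy at the same time):

```python
# Cgroup CPU shares are proportional: under contention, a group's slice
# is its share count divided by the sum of the contending shares.
def fraction(mine, *others):
    return mine / (mine + sum(others))

# Impala daemon (256 shares) contending with the TaskTracker cgroup
# (1024 shares, which also covers its MapReduce child tasks):
impala_cpu = fraction(256, 1024)   # 256 / 1280 = 0.2, i.e. one fifth

# Blkio weights work the same way for read priority:
# Impala I/O weight 100 versus the DataNode/TaskTracker weight of 500.
impala_io = 100 / 500              # also one fifth

print(impala_cpu, impala_io)
```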
Activity Monitoring
Cloudera Manager's activity monitoring capability monitors the Pig, Hive, Oozie, MapReduce and
streaming jobs that are running on your cluster. The Activity Monitor provides many statistics about the
performance of and resources used by those jobs. You can see which users are running jobs, both at the
current time and through views of historical activity. When the individual jobs are part of larger
workflows (via Oozie, Hive, or Pig), these jobs are aggregated into 'activities' that can be monitored as a
whole, as well as by the component MapReduce jobs.
If you are running multiple clusters, there will be a separate link under the Activities tab for each
cluster's MapReduce activities.
Currently, only MapReduce v1 jobs can be monitored with Cloudera Manager's Activity Monitor.
MapReduce v2 (YARN) jobs are not currently supported in the Activity Monitor.
The following topics describe how to view and monitor user activities that run on your cluster.
Viewing Activities
From the Activities tab you can view information about the activities (jobs and tasks) that have run in
your cluster during a selected time span.
The list of activities provides specific metrics about the activities that were submitted, were
running, or finished within the time frame you select.
You can select charts that show a variety of metrics of interest, either for the cluster as a whole
or for individual jobs.
You can use the Time Range Selector or the Custom Time Range panel to select the time interval over
which job and task information is displayed in the Activities list (see Selecting a Time Range for more
details).
You can select an activity and drill down to look at the jobs and tasks spawned by the activity.
Compare the selected activity to a set of other similar activities, to determine if the selected
activity showed anomalous behavior. For example, if a standard job suddenly runs much longer
than usual, this may indicate issues with your cluster.
Display the distribution of task attempts that made up a job, by amount of input or output data
or CPU usage compared to task duration. You can use this, for example, to determine if tasks
running on a certain host are performing slower than average.
Note:
Some Activity data is sampled at one-minute intervals. This means that if you run a very short job
that both starts and ends within the sampling interval, it may not be detected by the Activity
Monitor, and thus will not appear in the Activities list or charts.
Click the MapReduce service you want to see (if you are running multiple clusters, each cluster's
MapReduce service will have a separate entry under the Activities tab).
The columns in the Activities list show statistics about the performance of and resources used
by each activity. By default, only a subset of the possible metrics is displayed; you can modify
the columns that are displayed to add or remove columns.
The leftmost column holds a context menu button ( ). Click this button to display a
menu of commands relevant to the job shown in that row. The commands are:
Children
For a Pig, Hive or Oozie activity, this takes you to the Children tab of
the individual activity page. You can also go to this page by clicking
the activity ID in the activity list. This command only appears for Pig,
Hive or Oozie activities.
Tasks
For a MapReduce job, this takes you to the Tasks tab of the
individual job page. You can also go to this page by clicking the job
ID in the activity or activity children list. This command only appears
for a MapReduce job.
Details
Takes you to the Details tab where you can view the activity or job
statistics in report form.
Compare
Takes you to the Compare tab where you can see how the selected
activity compares to other similar activities in terms of a wide
variety of metrics.
Task Distribution
Takes you to the Task Distribution tab where you can view the
distribution of task attempts that made up this job, by amount of
data and task duration. This command is available for MapReduce
and Streaming jobs.
Kill Job
A pop-up asks for confirmation that you want to kill the job. This
command is available only for MapReduce and Streaming jobs.
The third column shows the status of the job, if the activity is a MapReduce job:
The job has been submitted.
The job has been started.
The job is assumed to have succeeded.
The job has finished successfully.
The job's final state is unknown.
The job has been suspended.
The job has failed.
The job has been killed.
Note: If the activity is a Pig, Hive, or Oozie activity, no overall status is shown for
the activity because the activity may be composed of individual MapReduce jobs.
Select the activity in the Activities list to view its children: the individual jobs that
make up the activity's workflow. Each child job shows its own status.
Hive job
Oozie job
Streaming job
From the drop-down menu, select the query you want to run. There are predefined queries to
search by job type (e.g. Pig jobs, MapReduce jobs, and so on) or for running, failed, or long-running activities.
To create a filter:
1. Click the down arrow next to the Search button.
2. Select a metric from the drop-down list in the first field; you can create a filter based on any of
the available metrics.
3. Once you select a metric, fill in the rest of the fields; your choices depend on the type of metric
you have selected.
Use the percent character % as a wildcard in a string; for example, "Id matches job%0001" will
look for any MapReduce job ID with suffix 0001.
4. To create a compound filter, click the plus icon at the end of the filter row to add another row. If
you combine filter criteria, all criteria must be true for an activity to match.
5. To remove a filter criteria from a compound filter, click the minus icon at the end of the filter
row. Removing the last row removes the filter.
6. To include any children of a Pig, Hive, or Oozie activity in your search results, check the Include
Child Activities checkbox. Otherwise, only the top-level activity will be included, even if one or
more child activities matched the filter criteria.
7. Click the Search button (which appears when you start creating the filter) to run the filter.
Note: The filter will be remembered across user sessions; i.e., if you log out, the filter will be
preserved and will still be active when you log back in. Newly submitted activities will appear in
the Activity List only if they match the filter criteria.
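As a rough sketch, the % wildcard can be read as a "match anything" segment, equivalent to .* in a regular expression (this translation is an assumption for illustration; the actual matching is performed by Cloudera Manager server-side):

```python
import re

def wildcard_to_regex(pattern):
    """Translate a filter string with '%' wildcards into an anchored
    regular expression (an assumed reading of the filter semantics)."""
    return "^" + ".*".join(re.escape(part) for part in pattern.split("%")) + "$"

# "Id matches job%0001" should match any MapReduce job ID with suffix 0001.
rx = re.compile(wildcard_to_regex("job%0001"))
print(bool(rx.match("job_201301010000_0001")))  # True: suffix is 0001
print(bool(rx.match("job_201301010000_0002")))  # False: different suffix
```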
Activity Charts
By default the charts show aggregated statistics about the performance of the cluster: Running Tasks,
CPU Usage and Memory Usage. There are additional charts you can enable from a pop-up panel. You
can also superimpose individual job statistics on any of the displayed charts.
Most charts display multiple metrics within the same chart. For example, the Tasks Running chart
shows two metrics: Cluster Running Maps and Cluster Running Reducers in the same chart. Each
metric appears in a different color.
To see the exact values at a given point in time, move the cursor over the chart; a movable
vertical line pinpoints a specific time, and a tooltip shows you the values at that point.
You can use the time range selector at the top of the page to zoom in; the chart display will
follow. To zoom out, use the Time Range Selector at the top of the page or click
the link below the chart.
Click the plus at the top right of the chart panel to open the chart selection panel.
Check or uncheck the boxes next to the charts you want to show or hide.
Check or uncheck the Cluster checkbox at the top of the Charts panel.
To remove a job's statistics from the chart, click the "x" next to the job ID in the top bar of the
chart.
Note: Chart selections are retained only for the current session.
Move the cursor over the divider between the Activities list and the charts, grab it and drag to
expand or contract the chart area compared to the Activities list.
Drag the divider all the way to the right to hide the charts, or all the way to the left to hide the
Activities list.
Click an individual job to view Task information and other information for that child.
See Viewing Activities for details of how the functions on this page work.
In addition, viewing a Pig, Hive or Oozie Activity provides the following tabs:
The Details tab shows Activity details in a report form. See Viewing Activity Details in a Report
Format for more information.
The Compare tab compares this activity to other similar activities. The main difference between
this and a comparison for a single MapReduce activity is that the comparison is done looking at
other activities of the same type (Pig, Hive, or Oozie), but does include the child jobs of the
activity. See Comparing Similar Activities for an explanation of that tab.
You can use the Zoom to Duration button to zoom the Time Range Selector to the exact time range
spanned by the activity whose tasks you are viewing.
Selecting Columns to Show in the Tasks List
In the Tasks list, you can display or hide any of the metrics that Cloudera Manager collects for task
attempts. By default, a subset of the possible metrics is displayed.
To show or hide statistics in the list:
1. Click the Select Columns to Display icon ( ).
A pop-up panel lets you turn on or off a variety of metrics that may be of interest.
2. Check or uncheck the columns you want to include or remove from the display. Note that as you
check or uncheck an item, its column immediately appears or disappears from the display.
3. Click the "x" in the upper right corner to close the panel.
Sorting the Tasks List
You can sort the list by any of the information displayed in the list:
1. Click the column header to initiate a sort.
2. Click the small arrow that appears next to the column header to reverse the sort direction.
Filtering the Tasks List
You can filter the list of activities based on values of any of the metrics that are available.
To create a filter:
1. Click the down arrow next to the Search button.
2. Select a metric from the drop-down list in the first field; you can create a filter based on any of
the available metrics.
3. Once you select a metric, fill in the rest of the fields; your choices depend on the type of metric
you have selected.
Use the percent character % as a wildcard in a string; for example, "Id matches job%0001" will
look for any MapReduce job ID with suffix 0001.
4. To create a compound filter, click the plus icon at the end of the filter row to add another row. If
you combine filter criteria, all criteria must be true for an activity to match.
5. To remove a filter criteria from a compound filter, click the minus icon at the end of the filter
row. Removing the last row removes the filter.
6. To include any children of a Pig, Hive, or Oozie activity in your search results, check the Include
Child Activities checkbox. Otherwise, only the top-level activity will be included, even if one or
more child activities matched the filter criteria.
7. Click the Search button (which appears when you start creating the filter) to run the filter.
Your filter will only persist for this user session; when you log out, your tasks list filter will be removed.
Viewing Activity Details in a Report Format
The Details tab for an activity shows the job or activity statistics in a report format.
To view activity details for an individual MapReduce job:
1. Select a MapReduce job from the Activities list, or
select a Pig, Hive, or Oozie activity and then select a MapReduce job from the Children tab.
2. Select the Details tab after the job page is displayed.
This displays information about the individual MapReduce job in a report format.
From this page you can also access the Job Details and Job Configuration pages on the JobTracker web
UI.
Click the Job Details link at the top of the report to be taken to the job details web page on the
JobTracker host.
Click the Job Configuration link to be taken to the job configuration web page on the JobTracker
host.
The first row in the comparison table displays a set of visual indicators of how the selected job
deviates from the mean of all the similar jobs (the combined Average values). This is displayed
for each statistic for which a comparison makes sense.
The diagram in the ID column shows the elements of the indicator, as follows:
The line at the midpoint of the bar represents the mean value of all similar jobs. The
colored portion of the bar indicates the degree of deviation of your selected job from
the mean. The top and bottom of the bar represent two standard deviations (plus or
minus) from the mean.
For a given metric, if the value for your selected job is within two standard deviations of
the mean, the colored portion of the bar is blue.
If a metric for your selected job is more than two standard deviations from the mean,
the colored portion of the bar is red.
The following rows show the actual values for other similar jobs. These are the sets of values
that were used to calculate the mean values shown in the Combined Averages row.
The most recent ten similar jobs are used to calculate the average job statistics, and these are
the jobs that are shown in the table.
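The blue/red indicator logic described above can be sketched as follows (the function name and the job durations are hypothetical; they only illustrate the two-standard-deviation rule):

```python
import statistics

def indicator_color(value, similar_values):
    """Blue if the selected job's metric is within two standard deviations
    of the mean of similar jobs, red otherwise (illustrative only)."""
    mean = statistics.mean(similar_values)
    sd = statistics.stdev(similar_values)
    return "blue" if abs(value - mean) <= 2 * sd else "red"

# Durations (seconds) of the ten most recent similar jobs:
similar = [100, 105, 98, 110, 102, 99, 101, 104, 97, 103]

print(indicator_color(102, similar))  # near the mean -> blue
print(indicator_color(180, similar))  # far beyond two std devs -> red
```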
The Task Distribution chart is useful for detecting tasks that are outliers in your job, either because of
skew, or because of faulty TaskTrackers. The chart can clearly show if some tasks deviate significantly
from the majority of task attempts.
Normally, the distribution of tasks will be fairly concentrated. If, for example, some Reducers receive
much more data than others, that will be represented by having two discrete sections of density on the
graph. That suggests that there may be a problem with the user code, or that there's skew in the
underlying data. Alternately, if the input sizes of various Map or Reduce tasks are the same, but the
time it takes to process them varies widely, it might mean that certain TaskTrackers are performing
more poorly than others.
You can click in a cell and see a list of the TaskTrackers that correspond to the tasks whose performance
falls within the cell.
The Y-axis can show Input or Output records or bytes for Map or Reduce tasks, or the amount of CPU
seconds for the user who ran the job, while the X-axis shows the task duration in seconds.
In the Select Axis: field you can chart the distribution of the following:
TaskTracker Nodes
To the right of the chart is a table that shows the TaskTracker hosts that processed the tasks in the
selected cell, along with the number of task attempts each host executed.
You can select a cell in the table to view the TaskTracker hosts that correspond to the tasks in the cell.
The area above the TaskTracker table shows the type of task and range of data volume (or User
CPUs) and duration times for the task attempts that fall within the cell.
The table itself shows the TaskTracker nodes that executed the tasks that are represented
within the cell, and the number of task attempts run on that node.
Clicking a TaskTracker host name takes you to the Role Status page for that TaskTracker instance.
Searching Logs
The Logs page presents log information for Hadoop services, filtered by service, role, host, and/or search
phrase, as well as log level (severity).
Logs are, by definition, historical, and are meaningful only in that context. So the Time Marker, used to
pinpoint status at a specific point in time, is not available on this page. The Current Time button ( ) is
also not available.
You can use the Time Range Selector or the Custom Date panel ( ) to set a specific start and end time,
or to choose a period of time back from the current time (selections range from the past 30 minutes to
the past day). See Selecting the Time Range for details of how time range selection works in Cloudera
Manager.
To perform a log search:
1. Click the Logs tab.
2. Modify any of the log search parameters as described below, if appropriate. This is optional; all
the settings have defaults.
3. Click the Search button.
The logs for each of the selected roles are searched. If any of the hosts cannot be searched, an error
message notifies you of the error and the host(s) on which it occurred.
The Log Search criteria include the following settings:
Services
This section presents a list of all the service instances and roles currently instantiated in your
cluster.
By default, all services and roles are selected to be included in your log search; the Services
checkbox lets you select or deselect all services and roles in one operation. You can expand each
service and limit the search to specific roles by selecting or deselecting individual roles.
Hosts
You can also specify which hosts should be included in the search. To simplify entry, as soon as
you start typing a host name, Cloudera Manager provides a list of hosts that match the partial
name. You can add multiple names, separated by commas. The default is to search all hosts.
Search Phrase
You can specify a string to match against the log message content. The search is case-insensitive,
and the string can be a regular expression, such that wildcards and other regex primitives are
supported.
Search Timeout
Specifies a time (in seconds) after which the search will time out. The default is 10 seconds.
Search Results
Search results are displayed in a list with the following columns:
Host: The host where this log entry appeared. Clicking this link will take you to the Host Status
page (see Viewing Detailed Information about Hosts).
Log Level: The log level (severity) associated with this log entry.
Time: The date and time this log entry was created.
Message: The message portion of the log entry. Clicking a message takes you to the Log Details
page, which presents a display of the full log, showing the selected message (highlighted) and
the 100 messages before and after it in the log.
If there are more results than can be shown on one page (per the Results per Page setting you selected),
Next and Prev buttons let you view additional results.
Log Details
Clicking the View Details link takes you to the Log Details page, which presents a portion of the full log,
showing the selected message (highlighted) and the 100 messages before and after it in the log.
The page shows you:
The full path and name of the log file you are viewing.
The offset in the file of the message you selected, as well as the current offset range (the range
of messages shown on the page).
The 100 messages before and after the one you selected.
View the log entries in either expanded or contracted form using the buttons to the left of the
date range at the top of the log.
Download the full log using the Download Full Log button at the top right of the page.
View log details for a different host or for a different role on the current host, by clicking the
Change... link next to the host or role at the top of the page. In either case this shows a pop-up
where you can select the role or host you want to see.
b. If the property is a numeric type, choose an operator in the operator drop-down list.
c. Type an event property value in the value text field.
Note that for some properties, where the list of values is finite and known, you can start
typing and then select from a list of potential matches.
For some properties you can include multiple values in the value field. For example, you
can create a filter like "SERVICE = HBASE1, HDFS1" which will find events from either
service.
d. Click Add Another to add additional filter components. A filter containing the property
and its value is added to the list of filters at the left. Multiple filters are combined using
AND (e.g. SERVICE = HBASE1 AND SEVERITY = CRITICAL).
e. Click Search.
The log displays all events that match the filter criteria.
Removing a Filter
To remove a filter from a filter specification:
1. Click the "x" at the right of the filter.
The filter is removed and the event log redisplays all events that match the remaining filters.
If there are no filters, the event log displays all events.
Modifying a Filter
To modify a filter:
1. Click the filter.
The filter expands into separate property, operator, and value fields.
2. Modify the value of one or more fields.
3. Click Search.
A filter containing the property, operation, and value is added to the list of filters at the left and
the event log redisplays all events that match the modified set of filters.
The Events Log Display
Event log entries are ordered (within the time range you've selected) with the most recent at the top.
Click the View link to go to the Logs page to view the log entry for the event.
Click the arrow at the right side of the event entry ( ) to display details of the entry.
In the detail display, clicking on the URL link also takes you to the log entry for the
event.
The username and password of the email user that will be logged into the mail server as
the "sender" of the alert emails.
A comma-separated list of email addresses that will be the recipients of alert emails.
The format of the email alert message. Select json if you need the message to be parsed
by a script or program.
4. Click the Save Changes button at the top of the page to save your settings.
5. Restart the Alert Publisher role to have these changes take effect.
Configuring SNMP
Before you enable SNMP traps, make sure you have configured your trap receiver (Network
Management System or SNMP server) with the Cloudera MIB.
Enter the DNS name or IP address of the Network Management System (SNMP server)
acting as the trap receiver in the SNMP NMS Hostname property.
Select the version of SNMP you are using: SNMPv2, SNMPv3 authentication with no
privacy (authNoPriv), or SNMPv3 with no authentication and no privacy
(noAuthNoPriv).
For SNMPv3, you must enter the SNMP Server Engine ID.
For SNMPv3 with authentication (authNoPriv) you must also enter the Security user
name, Authentication protocol, and protocol pass phrase.
You can also change other settings such as the port, retry, or timeout values.
Alert Settings
The Alerts tab (found on the Administration page) provides a summary of the settings for alerts in your
clusters.
Click the gear icon ( ) to display the Administration page, then click the Alerts tab.
Alert Type
The left column lets you select by alert type (Health, Log, or Activity) and within that by service instance.
In the case of Health alerts, you can look at alerts for Hosts as well. You can select an individual service
to see just the alert settings for that service.
In the Cloudera Manager Admin Console, pull down the Charts tab in the top navigation bar and
select Search.
Terminology
Entity: A Cloudera Manager component that has metrics associated with it, such as a service, role, role type, or host.
Metric: A property that can be measured to quantify the state of an entity or activity, such as the
number of open file descriptors, or CPU utilization percentage.
Time series: A list of (time, value) pairs that is associated with some (entity, metric) pair, e.g.,
(datanode-1, fd_open). In more complicated cases the time-series can represent operations on other
time-series, for example, (datanode-1, cpu_user) + (datanode-1, cpu_system).
Facet: A display grouping of the dataset, shown in separate charts. By default, when a query returns
multiple time series, they are displayed in individual charts. Facets allow you to display the time series
in separate charts, in a single chart, or grouped by various attributes of the set of time series.
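As a toy illustration of this terminology (the data structures below are illustrative only, not Cloudera Manager's actual data model):

```python
from collections import namedtuple

# A metric data point: a timestamp plus the metric's value at that time.
Point = namedtuple("Point", ["time", "value"])

# A time series is a list of (time, value) pairs keyed by an
# (entity, metric) pair, e.g. (datanode-1, fd_open).
series = {
    ("datanode-1", "cpu_user"):   [Point(0, 10.0), Point(60, 12.5)],
    ("datanode-1", "cpu_system"): [Point(0, 4.0),  Point(60, 3.5)],
}

# A derived series built from operations on other series, e.g.
# (datanode-1, cpu_user) + (datanode-1, cpu_system):
total = [Point(a.time, a.value + b.value)
         for a, b in zip(series[("datanode-1", "cpu_user")],
                         series[("datanode-1", "cpu_system")])]
print(total)  # the combined cpu_user + cpu_system series
```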
To change the chart-type, click one of the possible chart-types on the left: Line, Stack Area, and
Bar.
Saving a View
You can save the charts and their configurations (chart-type, dimension, and y-axis minimum and
maximum) as a view. To save the plots as a new view, click the Save as View button, enter a view name,
and then click Create. The new view will appear on the menu under the top-level Charts tab so that you
can select it later. See Creating a Custom View for more information.
Each tsquery returns one or more time-series. The (2) tsquery example shown above returns one time-series for each DataNode. A time-series is a stream of metric data points for a specific entity. Each
metric data point contains a timestamp and the value of that metric at that timestamp. See the section
Entities and Predicates below for details on how entities are modeled by Cloudera Manager.
Multiple tsqueries can be concatenated with semi-colons. The (3) example shown above can be written
as:
select jvm_heap_used_mb / 1024 where category=ROLE and hostname=myhost;
select jvm_heap_commited_mb / 1024 where category=ROLE and hostname=myhost
Tsquery tokens are case insensitive: Select, select or SeLeCt are accepted for SELECT. This applies
for all tsquery tokens. Tsquery attribute names and most attribute values are also case insensitive. The
displayName and entityName attributes are two whose values are case sensitive.
The metric expression can be replaced with a * (asterisk) as shown in the (1) example above. In that
case, all metrics that are applicable for selected entities, such as DATANODEs in the (1) example, will be
returned.
Filter expressions can be omitted as shown in the (4) example above. In that case, time-series for all
entities for which the metrics are appropriate will be returned. For this query you would see the
jvm_new_threads metric for NameNodes, DataNodes, TaskTrackers, and so on.
The query select * is invalid. For any other query, a maximum of 250 time-series will be returned.
This value can be configured in the SCM server settings.
Metric Expression
A metric expression is a comma-delimited list of one or more metric expression statements. A metric
expression statement is either the name of a metric collected by Cloudera Manager or a scalar value. For
example:
jvm_heap_used_mb, cpu_user, 5
See FAQ below to learn how to discover metrics collected by Cloudera Manager and use-cases for using
scalar values in metric expressions.
Metric Expression Operators
Metrics expressions support the following binary operators:
+ (plus)
- (minus)
* (multiplication)
/ (division)
dt: derivative with negative values. The change of the underlying metric expression, per second.
dt0: derivative where negative values are skipped (useful for dealing with counter resets). The
change of the underlying metric expression, per second.
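The dt and dt0 operators can be sketched as follows (illustrative only; dt0 is shown clamping negative rates to zero, though the actual operator may instead drop those points):

```python
# dt: per-second change of a metric between consecutive samples.
def dt(points):
    # points: list of (timestamp_seconds, value) pairs
    return [(t2, (v2 - v1) / (t2 - t1))
            for (t1, v1), (t2, v2) in zip(points, points[1:])]

# dt0: like dt, but negative deltas (e.g. counter resets) become zero.
def dt0(points):
    return [(t, max(rate, 0.0)) for t, rate in dt(points)]

# A counter that resets at t=120 produces a spurious negative rate:
counter = [(0, 100), (60, 400), (120, 50)]
print(dt(counter))   # the rate goes negative at the reset
print(dt0(counter))  # the negative rate is suppressed
```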
getHostFact
getHostFact(string factName, double defaultValue). Retrieves a fact about a host, for example:
select dt(total_cpu_user) / getHostFact(numCores, 2) where category=HOST
The example above will divide the results of dt(total_cpu_user) by the current number of cores for each
host. If the number of cores cannot be determined, the default value, 2, will be used.
The 'like' operator accepts only quoted values. The value can be any regular expression as specified in
the regular expression constructs in the Java Pattern class documentation.
Here are some of the time-series attributes and their possible values.
Time Series Attribute Name
roleType
category
serviceType
hostname: the host name
hostId
rackId
clusterId: the cluster ID. To specify a cluster by its name, use the filter 'where
category=CLUSTER and displayName="[the display name]"'
serviceName
device
partition
mountpoint
iface
componentName
The ROLE category applies to all role types (see roleType attribute above). The SERVICE category
applies to all service types (see serviceType attribute above). For example, to retrieve the committed
heap for all roles on host1 use
select jvm_committed_heap_mb where category=ROLE and hostname="host1"
FAQ
How do I compare all disk I/O for all the DataNodes that belong to a specific HDFS
service?
select bytes_read, bytes_written where roleType=datanode and
serviceName=hdfs1
Replace 'hdfs1' with the appropriate service name. You can then facet by "Metric" and compare all
DataNodes' bytes_read and bytes_written metrics at once. See the Charting Time-Series Data page for more
details about faceting.
You can then facet the results to be all in one chart. The scalar threshold '50' will also be rendered on
the chart. See the Charting Time-Series Data page for more details about faceting.
I get "The query hit the maximum results limit" warning. How do I work around the
limit?
There is a limit on the number of results that can be returned by a query. When a query results in more
time-series streams than the limit, a warning for "partial results" is issued. To circumvent the problem, try
to reduce the number of metrics you are trying to retrieve. You can also use the like operator to limit
the query to a subset of entities. For example, instead of
select service_time, await_time, await_read_time, await_write_time, 50
where category=disk
try a narrower query that adds a like filter on the host name (the pattern here is illustrative), such as:
select service_time, await_time, await_read_time, await_write_time, 50
where category=disk and hostname like "host1[0-9]"
The latter query will retrieve the disks for only ten hosts.
Metric Aggregation
Overview
It is often useful to see an aggregated view of the activity on a cluster. For example, one might want to
see the average number of bytes read per datanode, or the maximum number of
bytes read by any datanode. To make this easy, we pre-aggregate many of these metrics and allow users
to access them through our charts.
What We Aggregate
We aggregate metrics based on the category of the entity that generated them. The categories map to
components in the system such as hosts, disks, regionservers, and HDFS services.
Metrics are aggregated from their generating entity to larger entities they are a part of. For example,
metrics that are generated by disks, network interfaces, and filesystems are aggregated to their
respective hosts and clusters. Generally, this hierarchy is defined as follows:
Again, as in the weighted gauges case, sum aggregations are a special case. For counters they represent
the total number of times an event occurred and are NOT a rate. In this case we append the word "sum"
to the end of the name, just as we would for gauge metrics: "jvm_gc_count_regionserver_sum".
Viewing Reports
The Reports tab lets you create reports about the usage of HDFS in your cluster: data size and file
count by user, group, or directory. It also lets you report on the MapReduce activity in your cluster, by
user.
The Search files and Manage Directories button takes you to a File Browser where you can search files,
manage directories, and set quotas.
If you are managing multiple clusters, or have multiple NameServices configured (if High Availability
and/or Federation is configured) there will be separate reports for each cluster and NameService.
To view reports of HDFS usage or MapReduce activity:
Raw Bytes
The physical number of bytes (total disk space in HDFS) used by the files,
aggregated by user, group, or directory. This includes replication,
and so is Bytes multiplied by the number of replicas.
Note that Bytes and Raw Bytes are shown in IEC binary prefix notation (1 GiB = 2^30 bytes).
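As a quick worked example of the Bytes/Raw Bytes relationship described above (the file size and replication factor here are illustrative assumptions):

```shell
# A 1 GiB file (IEC binary prefix: 1 GiB = 2^30 bytes) stored with an
# HDFS replication factor of 3 consumes three times its logical size.
bytes=$((1024 * 1024 * 1024))    # logical size: 1 GiB
replicas=3                       # illustrative replication factor
raw_bytes=$((bytes * replicas))  # physical (raw) bytes on disk
echo "bytes=$bytes raw_bytes=$raw_bytes"
# prints: bytes=1073741824 raw_bytes=3221225472
```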
The directories shown in the Current Disk Usage by Directory report are the HDFS directories you have
set as watched directories. You can add or remove directories to or from the watch list from this report;
click the Search Files and Manage Directories button at the top right of the set of reports for the cluster
or NameService (see Search Files and Manage Directories).
The report data is also shown in chart format:
Move the cursor over the graph to highlight a specific period on the graph and see the actual
value (data size) for that period.
You can also move the cursor over the user, group, or directory name (in the graph legend) to
highlight the portion of the graph for that name.
You can right-click within the chart area to save the whole chart display as a single image (a
.PNG file) or as a PDF file. You can also print to the printer configured for your browser.
Click the report name (link) to produce the initial report. This generates a report that shows
Raw Bytes for the past month, aggregated daily.
Select the Start Date and End Date to define the time range of the report.
Select the Graph Metric you want to graph: bytes, raw bytes, or files and directories count.
In the Report Period field, select the period over which you want the metrics aggregated. The
default is Daily. This affects both the number of rows in the results table, and the granularity of
the data points on the graph.
As with the current reports, the report data is also presented in chart format, and you can use the cursor
to view the data shown on the charts, as well as save and print them.
For weekly or monthly reports, the Date indicates the date on which disk usage was measured.
The directories shown in the Historical Disk Usage by Directory report are the HDFS directories you
have set as watched directories (see Search Files and Manage Directories).
Downloading Reports as XLS or CSV files
Any report can be downloaded to your local system as an XLS file (Microsoft Excel 97-2003 worksheet)
or a CSV (Comma-Separated Values) text file.
To download a report:
From the main page of the Report tab, click the CSV or XLS link in the column to the right of
the report name, or
From any report page, click the Download CSV or Download XLS buttons.
Either of these opens the Open file dialog where you can open or save the file locally.
Activities Reports
The following report shows metrics on the job activity in your cluster.
MapReduce Usage by User
This produces a tabular report that you can use to view aggregate job activity per hour, day, week,
month, or year.
In the Report Period field, select the period over which you want the metrics aggregated.
Default is Daily.
For weekly reports, the Date indicates the year and week number (e.g. 2011-01 through 2011-52).
For monthly reports, the Date indicates the year and month by number (2011-01 through 2011-12).
The Activity data in these reports comes from the Activity Monitor; they can include all the data
currently in the Activity Monitor database. Note that Activity Monitor data will eventually expire, based
on the configuration you have set for Activity Monitor.
2. Browse the file system to find the directory for which you want to set quotas.
3. Click the Manage Quota button at the right of the row for the directory you want.
A Manage Quota pop-up appears, where you can set file count or disk space limits for the
directory you have selected.
4. When you have set the limits you want, click OK.
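If you prefer the command line, comparable limits can be set with the HDFS dfsadmin quota commands. This is a hedged sketch assuming a CDH4-style installation; the directory path and limit values are illustrative:

```
$ sudo -u hdfs hadoop dfsadmin -setQuota 100000 /user/reports     # limit on total file and directory count
$ sudo -u hdfs hadoop dfsadmin -setSpaceQuota 10g /user/reports   # limit on total disk space
$ sudo -u hdfs hadoop dfsadmin -clrQuota /user/reports            # clear the file-count limit again
```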
Searching within the File System
The Search Files and Manage Directories page lets you search the file system using predefined search
criteria. The default is Search by Name but you can also search for large files, directories with quotas,
watched directories, or custom search criteria which you can construct using criteria such as filename,
owner, file size and so on.
To search the file system:
1. From the Reports page, click Search Files and Manage Directories for the namespace you want
to search.
2. In the Search menu at the top right of the page, select the type of query you want to use.
Depending on what you select, you may be presented with different fields to fill in, or different
views of the file system.
For example, selecting Search by Name will show a field where you can enter the name.
Selecting Large Files provides fields where you specify the size to be used as the search criterion.
If you select Custom... and enter multiple criteria, all of them must be met for a file to be
considered a match.
3. Click the Search button.
If you search within a directory, only files within that directory will be found, so if you're browsing
/user and do a search, you might find /user/foo/file, but you will not find /bar/baz.
Watched Directories
Search Files and Manage Directories lets you designate the HDFS directories that you want to watch for
inclusion in the Directory-based usage reports (Current Disk Usage By Directory and Historical Disk
Usage By Directory).
To add or remove directories from the Directory-based usage reports:
1. From the main page of the Report tab, click the Search Files and Manage Directories button at
the upper right of the page or
From either of the Directory-based reports, click the Search Files and Manage Directories
button.
2. Click the Star icon at the left of the directories you want to include. The icon changes to its
activated form. You can navigate through the file system to find the directory you want to
add; you can include a directory at any level without needing to include its parent.
3. To remove a directory from the list, just deactivate the Star icon.
Administration
Click the gear icon to display the Administration page where you can configure settings that affect
how Cloudera Manager interacts with your cluster.
Properties Tab
On the Properties tab, you can set:
Security: Set TLS encryption settings to enable TLS encryption between the Cloudera Manager
Server, Agents, and clients. For configuration instructions, see Configuring TLS Security for
Cloudera Manager
You can also:
o Set the realm for Kerberos security and point to a custom keytab retrieval script. For
configuration instructions, see Configuring Hadoop Security with Cloudera Manager.
Ports and Addresses: Set ports for the Cloudera Manager Admin Console and Server. For
configuration instructions, see Configuring the Ports for the Admin Console and Agents.
Other: Enable Cloudera usage data collection. For configuration instructions, see Configuring
Anonymous Usage Data Collection.
You can also:
Set a custom header color and banner text for the Admin console.
Disable/enable the auto-search for the Events panel at the bottom of a page.
Enterprise Support: Enable access to online Help files from the Cloudera web site rather than
from locally-installed files (see Opening the Help Files from the Cloudera Web Site), and enable
automatic sending of diagnostic data to Cloudera when you trigger a data collection (see
Sending Diagnostic Data to Cloudera).
External Authentication: Specify the configuration to use LDAP, Active Directory, or an external
program for authentication. See Configuring External Authentication for instructions.
Import Tab
You can import previously exported Cloudera Manager configuration settings, to transfer the settings
to a new deployment. For instructions, see Importing Cloudera Manager Settings.
Alerts Tab
This tab provides a summary of the settings for alerts in your clusters. From this tab you can view the
alerts by alert type (Health, Log, or Activity alerts) and by service within those categories. You can also
see the email addresses configured as recipients for alerts, and send test messages. See Alert Settings
for more information.
Users Tab
Cloudera Manager user accounts allow users to log into the Cloudera Manager Admin Console. For
configuration instructions, see Cloudera Manager User Accounts.
Kerberos Tab
After enabling and configuring Hadoop security using Kerberos on your cluster, you can view and
regenerate the Kerberos principals for your cluster. If you make a global configuration change in your
cluster, such as changing the encryption type, you would use the Kerberos tab to regenerate the
principals for your cluster.
Important
Do not regenerate the principals for your cluster unless you have made a global configuration
change. Before regenerating, be sure to read the Configuring Hadoop Security with Cloudera
Manager to avoid making your existing host keytabs invalid.
License Tab
The License tab indicates the status of your license (for example, whether your license is currently valid)
and shows you the owner, the license key, and expiration date of the license. It also shows any Add-Ons
enabled by your license, such as Impala Monitoring.
You can enter a new license (for example, to enable additional add-ons) by browsing to the new license
file and uploading the license.
Language Tab
You can change the language of the Cloudera Manager Admin Console User Interface through the
language preference in your browser. Information on how to do this for the browsers supported by
Cloudera Manager is shown under the Language tab. You can also change the language for the
information provided with activity and health events, and for alert email messages.
To change the language of the activity and health event information and alert email messages, select the
language you want from the drop-down list on this page, then click Save Changes.
You can specify a single base Distinguished Name (DN) and then provide a "Distinguished Name
Pattern" to use to match a specific user in the LDAP directory.
Search filter options let you search for a particular user based on somewhat broader search
criteria. For example, Cloudera Manager users could be members of different groups or
organizational units (OUs), so a single pattern won't find all those users. Search filter options
also let you find all the groups to which a user belongs, to help determine if that user should
have login or admin access.
Note that if you select External Only, users who are administrators in the Cloudera Manager
database will still be able to log in with their database password. This is to prevent the system
from locking everyone out if the authentication settings get misconfigured, such as with a bad
LDAP URL.
5. Go to the section below for the type of authentication you want to configure, and follow the
steps to set the properties appropriately.
Configure User Authentication Using Active Directory
1. For External Authentication Type select Active Directory.
2. Provide the URL of the Active Directory server.
3. Provide the NT domain to authenticate against.
4. Optionally, provide a comma-separated list of LDAP group names in the LDAP User Groups
property. If this list is provided, only users who are members of one or more of the groups in the
list will be allowed to log into Cloudera Manager. If this property is left empty, all authenticated
LDAP users will be able to log into Cloudera Manager.
For example, if there is a group called
"CN=ClouderaManagerUsers,OU=Groups,DC=corp,DC=com", add the group name
ClouderaManagerUsers to the LDAP User Groups list to allow members of that group to log
in to Cloudera Manager. The group names are case-sensitive.
5. In the LDAP Administrator Groups property you can provide a list of groups whose members
should be given administrator access when they log in to Cloudera Manager. (Note that admin
users must also be members of at least one of the groups specified in the LDAP User Groups
property or they will not be allowed to log in.) If this is left empty, then no users are granted
administrator access automatically at login; administrator access must be granted
manually by another administrator.
Configure User Authentication Using an OpenLDAP-compatible Server
1. For External Authentication Type select LDAP.
2. Provide the URL of the LDAP server and (optionally) the base Distinguished Name (DN) (the
search base) as part of the URL, for example ldap://ldapserver.corp.com/dc=corp,dc=com.
3. If your server does NOT allow anonymous binding:
Provide the user DN and password to be used to bind to the directory. These are the LDAP Bind
User Distinguished Name and LDAP Bind Password properties. By default, Cloudera Manager
assumes anonymous binding.
4. To use a single "Distinguished Name Pattern," provide a pattern in the LDAP Distinguished
Name Pattern property.
Use {0} in the pattern to indicate where the username should go. For example, to search for a
distinguished name where the uid attribute is the username, you might provide a pattern
similar to uid={0},ou=People,dc=corp,dc=com. Cloudera Manager substitutes the name
provided at login into this pattern and performs a search for that specific user. So if a user
provides the username "foo" at the Cloudera Manager login page, Cloudera Manager will search
for the DN uid=foo,ou=People,dc=corp,dc=com.
Note that if you provided a base DN along with the URL, the pattern only needs to specify the
rest of the DN pattern. For example, if the URL you provide is ldap://ldapserver.corp.com/dc=corp,dc=com, and the pattern is uid={0},ou=People, then the
search DN will be uid=foo,ou=People,dc=corp,dc=com.
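The substitution described above can be sketched in a few lines of shell. The pattern and username are the examples from the text; the sed-based replacement is purely illustrative of what Cloudera Manager does internally:

```shell
# Build a search DN by substituting the login name into the
# LDAP Distinguished Name Pattern at the {0} placeholder.
pattern='uid={0},ou=People,dc=corp,dc=com'
username='foo'

# Replace the literal token {0} with the username.
dn=$(printf '%s\n' "$pattern" | sed "s/{0}/$username/")
echo "$dn"   # prints: uid=foo,ou=People,dc=corp,dc=com
```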
5. You can also search using User and/or Group search filters, using the LDAP User Search Base,
LDAP User Search Filter, LDAP Group Search Base and LDAP Group Search Filter settings.
These allow you to combine a base DN with a search filter to allow a greater range of search
targets.
For example, if you want to authenticate users who may be in one of multiple OUs, the search
filter mechanism will allow this. You can specify the User Search Base DN as dc=corp,dc=com
and the user search filter as uid={0}. Cloudera Manager will then search for the user
anywhere in the tree starting from the Base DN. Suppose you have two OUs, ou=Engineering
and ou=Operations; Cloudera Manager will find user "foo" if it exists in either of these OUs,
that is, as uid=foo,ou=Engineering,dc=corp,dc=com or
uid=foo,ou=Operations,dc=corp,dc=com.
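A search of this kind can be approximated from the command line with the standard ldapsearch client, using the server, base DN, and user name from the example above. The command is illustrative only; it is not something Cloudera Manager itself runs:

```
$ ldapsearch -x -H ldap://ldapserver.corp.com -b 'dc=corp,dc=com' '(uid=foo)' dn
```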
You can use a user search filter along with a DN pattern, so that the search filter provides a
fallback if the DN pattern search fails.
The Groups filters let you search to determine if a DN or user name is a member of a target
group. In this case, the filter you provide can be something like member={0} where {0} will be
replaced with the DN of the user you are authenticating. For a filter requiring the user name, {1}
may be used, as memberUid={1}. This will return a list of groups this user belongs to, which will
be compared to the list in the LDAP User Groups and LDAP Administrator Groups properties
(discussed previously in the section about Active Directory).
Configure User Authentication Using an External Program
You can configure Cloudera Manager to use an external authentication program of your own choosing.
Typically, this may be a custom script that interacts with a custom authentication service.
1. For External Authentication Type select External Program.
2. Provide a path to the external program in the External Authentication Program Path property.
Cloudera Manager will call the external program with the user name as the first command line
argument. The password is passed over stdin. Cloudera Manager assumes the program will return the
following exit codes:
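A minimal sketch of such a program, written here as a shell function so its pieces can be run together. The user name, password, and exit-code meanings are illustrative assumptions, not the codes Cloudera Manager actually expects:

```shell
# Sketch of an external authentication program. Cloudera Manager
# calls it with the username as the first command-line argument
# and supplies the password on stdin.
authenticate() {
  user="$1"
  IFS= read -r password          # password arrives on stdin

  # Replace this stub with a call to your real authentication
  # service (a database lookup, a web service, and so on).
  if [ "$user" = "alice" ] && [ "$password" = "s3cret" ]; then
    return 0                     # assumed: authenticated
  fi
  return 1                       # assumed: rejected
}

# Demonstration with a hypothetical user:
printf 's3cret\n' | authenticate alice && echo "alice accepted"
printf 'wrong\n'  | authenticate alice || echo "alice rejected"
```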
2. On the Properties tab, under the Ports and Addresses category, set the following options as
described below:
Specify the HTTP port to use to access the Server via the Admin Console.
Specify the HTTPS port to use to access the Server via the Admin Console.
2. On the Properties tab, under the Other category, set the Allow Usage Data Collection option to
enable or disable anonymous usage data collection.
3. Click Save Changes.
Note: As of Cloudera Manager 4.1, the import of configuration settings through the Cloudera
Manager Admin Console UI has been deprecated. If you have exported a configuration using the
Export tab in an older version of Cloudera Manager, you can still import it following the instructions
below. However, going forward, importing a deployment should be done using the Cloudera
Manager API. See the documentation for /cm/deployment for details.
Important
You must import the configuration settings on a clean cluster that does not have existing hosts or
services.
Important
When you first installed the Cloudera Manager Server, you set up a database to store the Cloudera
Manager service configuration information (see Installing and Configuring Databases). That
database also stores the Cloudera Manager license information. If the original database is lost (for
example, the database was deleted and you recreated a new one), you must first upload your
license on the Administration > License tab and restart the Cloudera Manager Server before
importing the configuration settings. If you don't upload your license first to store the license
information in the new database, the import will fail.
2. Delete all services on the Services tab by choosing Delete from the Actions menu next to each
service instance.
3. Delete all hosts on the Hosts tab by clicking the check box at the top of list of hosts, and then
click Delete.
4. Copy the configuration script file that you downloaded during export to the host with the new
Cloudera Manager server.
5. Click the gear icon
2. On the Properties tab, under the Enterprise Support category, enable the Open latest Help files
from the Cloudera website.
3. Click Save Changes.
Maintenance
There may be situations where you need to temporarily stop the Cloudera Manager server or Cloudera
Manager agents on selected nodes, for example, in order to perform maintenance on a host. The
following topics cover stopping or restarting the Cloudera Manager server or its agents.
Maintenance Mode
Maintenance mode allows you to suppress alerts for a host, service, role, or even the entire cluster. This
can be useful when you need to take actions in your cluster (make configuration changes and restart
various elements) and do not want/need to see the alerts that will be generated due to those actions.
Putting a component into maintenance mode does not prevent events from being logged; it only
suppresses the alerts that those events would otherwise generate. As a result, you can still see a history
of the events that were recorded due to your actions.
You can enable maintenance mode for a service, a role, a host, or the entire cluster.
You can view the status of Maintenance Mode in your cluster with the View Maintenance Mode Status
button on the All Services page. This button appears for each cluster, and separately for the Cloudera
Management Services.
When you enter maintenance mode on a component (cluster, service, or host) that has subordinate
components (for example, the roles for a service) the subordinate components are also put into
maintenance mode. These are considered to be in "effective" maintenance mode, as they have inherited
the setting from the higher-level component.
For example:
If you set the HBase service into maintenance mode, then its roles (HBase Master and all Region
Servers) are put into effective maintenance mode.
If you set a host into maintenance mode, then any roles running on that host are put into
effective maintenance mode.
Components that have been explicitly put into maintenance mode show a black-and-red maintenance
mode icon. Components that have entered effective maintenance mode as a result of inheritance from a
higher-level component show a similar icon that is grey and yellow instead of black and red.
To restart it:
$ sudo service cloudera-scm-server start
Note
If you are intending to perform an upgrade of Cloudera Manager, then you should stop the
management service (through the Admin Console) prior to stopping the server.
To stop the Agent itself, but leave the processes it manages running:
$ sudo service cloudera-scm-agent stop
If you want to stop or restart the Agents themselves and the services they manage, use one of the
following commands on every Agent host.
When an Agent is stopped using either of the stop or hard_stop commands, you cannot use either of
the restart or hard_restart commands to start it. You must use the following start command to
start a stopped agent regardless of how you stopped it:
$ sudo service cloudera-scm-agent start
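The Agent commands discussed in this section can be summarized as follows (run on each Agent host; the comments restate the behavior described above):

```
$ sudo service cloudera-scm-agent stop          # stop the Agent; managed processes keep running
$ sudo service cloudera-scm-agent hard_stop     # stop the Agent and the processes it manages
$ sudo service cloudera-scm-agent hard_restart  # restart the Agent and the processes it manages
$ sudo service cloudera-scm-agent start         # required after stop or hard_stop
```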
Problems, Possible Causes, and Solutions
Starting Services
Problem: After you click the Start button to start a service, the Finished status doesn't display.
Possible cause: The host machine is disconnected from the Server, as indicated by missing heartbeats
on the Hosts tab.
The Cloudera Manager Server and Agent logs. See Viewing the Cloudera Manager Server Log.
The Events tab lets you search for and display events and alerts that have occurred within a
selected time range filtered by service, hosts, and/or keywords. See Events and Alerts Filtering.
The Logs tab presents log information for Hadoop services, filtered by role, host, and/or
keywords as well as log level (severity). See Log Filtering.
Event and Log search features are also provided for individual user jobs, or for a specific service.
See the sections on Activity Monitoring and Service Monitoring.
To obtain help solving problems when using Cloudera Manager on your cluster, you can collect and send
diagnostic data to Cloudera Support.
You can have Cloudera Manager send diagnostic data to Cloudera automatically whenever a data
collection occurs, either on a regular schedule or when you specifically trigger a data collection. You
can also send a collected data set manually. Cloudera Manager is configured by default to collect data
weekly and to send it automatically. You can schedule the frequency of data collection on a daily,
weekly, or monthly schedule, or disable the scheduled collection of data entirely. Separately, you can
disable the automatic sending of data to Cloudera; see Disabling the Automatic Sending of Diagnostic
Data.
Note
Automatically sending diagnostic data requires the Cloudera Manager Server host to have Internet
access and to be configured for sending data automatically. If your Cloudera Manager server does not
have Internet access, you can manually send the diagnostic data as described below.
2. Under the Properties tab, Enterprise Support category, click in the field for the property Send
diagnostic Data to Cloudera Automatically and select the frequency you want.
3. You can change the day and time of day that the collection will be performed.
4. Click Save Changes
You can see the setting for the current data collection frequency under the Support menu in the main
navigation bar.
Collecting and Sending Diagnostic Data to Cloudera on Demand
By default, Cloudera Manager will automatically attempt to send your collected data to Cloudera when
you trigger a data collection. If you do not want data sent automatically, you must disable that feature
(see Disabling the Automatic Sending of Diagnostic Data).
To change the System Identifier, go to the Administration page (via the gear icon), under the
Properties tab, Other category.
Cloudera Manager pre-populates the End Time based on the setting of the Time Range
Selector. You should change this to be a few minutes after you observed the problem or
condition that you are trying to capture.
Note that the time range is based on the time zone of the host where the Cloudera Manager
server is running.
o If you have a support ticket open with Cloudera support, please include the support
ticket number in the field provided.
2. Under the Properties tab, Enterprise Support category, uncheck the box for Send diagnostic
Data to Cloudera Automatically.
3. Click Save Changes
Note
Automatically sending diagnostic data may fail sometimes and return an error message of "Could
not send data to Cloudera." To work around this issue, you can manually send the data to Cloudera
Support as described below.
Note
If you want to send your file manually but choose not to download the script, you can follow the
instructions documented on the Cloudera Customer Portal at Get Support - Uploading Files for
Cloudera Support.
Up to 1000 Cloudera Manager Audit Events: Configuration changes, add/remove of users, roles,
services, etc.
Data about the cluster structure which includes a list of all hosts, roles, and services along with
the configs that are set through Cloudera Manager. Where passwords are set in Cloudera
Manager, the passwords are not returned.
One day's worth of Cloudera Manager events: This includes critical errors Cloudera Manager
watches for and more.
Current health information for hosts, service, and roles. Includes results of health tests run by
Cloudera Manager.
Heartbeat information from each host, service, and role. These include status and some
information about memory, disk, and processor usage.
For each machine in the cluster, the result of running a number of system-level commands on
that machine.
Logs from each role on the cluster, as well as the CM server and agent logs.
Select: Choose the key datasets that are critical for your business operations.
Schedule: Create an appropriate schedule for data replication; trigger replication as quickly as
is appropriate for your business needs.
Monitor: Track progress of your replication jobs through a central console and easily identify
issues or files that failed to be transferred.
Alert: Issue alerts when a replication job fails or is aborted so that the problem can be diagnosed
expeditiously.
These capabilities work seamlessly across Hive and HDFS: replication can be set up on files or
directories in the case of HDFS, and on tables in the case of Hive, without any manual translation of
Hive datasets into HDFS datasets or vice versa. Hive Metastore information is also replicated, which
means that applications that depend upon the table definitions stored in Hive will work correctly on
both the replica side and the source side as table definitions are updated.
Built on top of a hardened version of distcp, the replication feature uses the scalability and availability
of MapReduce to parallelize the copying of files, using a specialized MapReduce job that diffs and
transfers only changed files from each Mapper to the replica side, efficiently and quickly.
Also available in the new version is the ability to do a Dry Run to verify configuration and understand
the cost of the overall operation before actually copying the entire dataset.
Since the functionality is implemented as an add-on to Cloudera Manager, all Cloudera BDR functionality
is available directly through the Cloudera Manager Admin Console.
Minimum Supported Version: CDH 4.0
License: BDR requires a separate license for each node on the destination side of your replication cluster.
The following sections are covered in this topic:
HDFS Replication
Hive Replication
If there are no existing peers, you will see only an Add Peer button in addition to a short
message.
If you have existing peers, they are listed here.
3. Click the Add Peer button.
4. In the Add Peer pop-up, provide a name, the URL (including the port) of the Cloudera Manager
Server that will act as the source for the data to be replicated, and the login credentials for that
server.
Note that use of SSL is recommended for Data Replication, and a warning is shown if
the URL uses http instead of https. However, you can ignore the warning and proceed if SSL is
not available.
5. Click the Add Peer button in the pop-up to create the peer relationship.
6. To test the connectivity between your current Cloudera Manager server and the remote server
select Test Connectivity from the Actions menu associated with the peer.
Note that the current Cloudera Manager system is also available as a replication source.
HDFS Replication
HDFS Replication enables you to copy (replicate) your HDFS data from a remote Peer Cloudera Manager
server to your local Cloudera Manager server (the server whose Admin console you are currently logged
into). You can add Peers through the Administration > Peers tab (see Designating Peer Clusters).
You can also use the Add Replication Source link on the HDFS Replication page to go to the Peers page.
Once you have a peer relationship set up with a Cloudera Manager server, you can configure replication
of your HDFS data.
1. From the Services tab, go to the CDH4 HDFS service where you want to host the replicated data.
The user that should run the MapReduce job. By default this is hdfs. If you want to run
the MapReduce job as a different user, you can enter that here. If you are using Kerberos, you
MUST provide a user name here, and it must be one with an ID greater than 1000.
Limits for the number of map slots and for bandwidth per mapper. The defaults are
unlimited.
Whether to abort the job on an error (default is not to do so). This means that files
copied up to that point will remain on the destination, but no additional files will be
copied.
Whether to remove deleted files from the target directory if they have been removed
on the source.
Whether to preserve the block size, replication count, and permissions as they exist on
the source file system, or to use the settings as configured on the target file system. The
default is to preserve these settings as on the source.
Note: If you leave the setting to preserve permissions, then you must be running as
a superuser. You can use the "Run as" option to ensure that is the case.
Whether to generate alerts for various state changes in the replication workflow.
From the Actions menu for the replication task you want to test and click Dry Run.
From the Actions menu for a replication task, in addition to Dry Run you can also:
If the replication fails, that will be indicated and the timestamp will appear in red text.
To view more information about completed replication runs, click anywhere in the replication
job entry row in the replication list. This displays sub-entries for each past replication run.
To view detailed information about a particular past run, click the entry for that replication run.
This opens another sub-entry that shows:
A result message
When viewing a sub-entry, you can dismiss the sub-entry by clicking anywhere in its parent
entry, or by clicking the return arrow icon at the top left of the sub-entry area.
Hive Replication
Hive Replication enables you to copy (backup) and keep in sync the Hive Metastore and data from
clusters managed by a remote peer or local Cloudera Manager server, and keep the copy on a cluster
managed by your local Cloudera Manager server (the server whose Admin console you are currently
logged into). You can add Peers through the Administration > Peers tab (see Designating Peer Clusters).
You can use the Add Peer link on the Replication page to go to the Peers page to add a new peer
Cloudera Manager server.
Once you have a peer relationship set up with a Cloudera Manager server, you can configure replication
of your Hive Metastore data.
1. From the Services tab, go to the CDH4 Hive service where you want to host the replicated data.
2. Click the Replication tab at the top of the page.
3. Select the Hive service to be the source of the replicated data. If the peer Cloudera Manager
Server has multiple CDH4 Hive services (for example, if it is managing multiple CDH4 clusters)
you will be able to select the service you want to use as the source.
Note that the local CDH4 Hive service (being managed by the Cloudera Manager server you are
logged into) is also available as a replication source.
If the peer whose Hive service you want is not listed, click the Add Peer link to go to the Peers
page to add a Cloudera Manager peer.
When you select a replication source, the Create Replication pop-up opens.
4. Leave Replicate All checked to replicate all the Hive metastore databases from the source.
To replicate only selected databases, uncheck this option and enter the Database name(s) and
tables you want to replicate.
5. Select the target destination. If only one Hive service managed by this Cloudera Manager is
available as a target, it is specified automatically. If more than one Hive service is
managed by this Cloudera Manager, you can select among them.
6. Select a schedule: You can have it run immediately, run once at a scheduled time in the future,
or at regularly scheduled intervals. If you select "Once" or "Recurring" you are presented with
fields that let you set the date and time and (if appropriate) the interval between runs.
7. Uncheck the Replicate HDFS Files checkbox to skip replicating the associated data files. If
you uncheck this, only the Hive metadata is replicated. The data files are replicated to a
default location; to specify a different location, enter the path in the Destination field
under the More Options section.
8. Use the More Options section to specify an export location, modify the parameters of the
MapReduce job that will perform the replication, and other options.
Here you will be able to select a MapReduce service (if there is more than one in your cluster)
and change the following parameters:
By default, Cloudera Manager exports the Hive Metadata to a default HDFS location
(/user/${user.name}/.cm/hive) and then imports from this HDFS file to the target
Hive Metastore. The default HDFS location for this export file can be overridden by
specifying a path in the Export Path field.
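The expansion of ${user.name} in that default export path can be sketched as follows. This is a hypothetical helper written for illustration, not part of Cloudera Manager; only the template string itself comes from the guide:

```python
# Sketch of how the default Hive metadata export location is resolved.
# The hive_export_path helper is hypothetical, not a Cloudera Manager API.
DEFAULT_EXPORT_TEMPLATE = "/user/${user.name}/.cm/hive"

def hive_export_path(user, export_path=None):
    """Return the HDFS location used for the exported Hive metadata.

    An explicit Export Path entry overrides the default, in which
    ${user.name} expands to the user running the replication job.
    """
    if export_path:
        return export_path
    return DEFAULT_EXPORT_TEMPLATE.replace("${user.name}", user)
```

For example, with the default run-as user hdfs and no Export Path set, the metadata lands in /user/hdfs/.cm/hive.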
The Force Overwrite option, if checked, forces overwriting data in the target metastore
if there are incompatible changes detected. For example, if the target metastore was
modified and a new partition was added to a table, this option would force deletion of
that partition, overwriting the table with the version found on the source.
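The Force Overwrite behavior can be modeled roughly as below. This is a simplified sketch of the semantics described above, not Cloudera Manager code; partitions are represented as plain lists of names:

```python
def reconcile_partitions(source_parts, target_parts, force_overwrite=False):
    """Model of Force Overwrite: partitions that exist only on the target
    (for example, added there after the last replication) are dropped when
    the option is set; otherwise the incompatible change stops the run.
    """
    extra = set(target_parts) - set(source_parts)
    if extra and not force_overwrite:
        raise RuntimeError("incompatible changes on target: %s" % sorted(extra))
    # With Force Overwrite, the target ends up matching the source exactly.
    return sorted(source_parts)
```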
By default, Cloudera Manager replicates Hive's HDFS data files to a default location (/).
To override the default, enter a path in the Destination field.
Select the MapReduce service to use for this replication (if there is more than one in
your cluster).
The user that should run the MapReduce job. By default this is hdfs. If you want to run
the MapReduce job as a different user, you can enter that user name here. If you are using
Kerberos, you MUST provide a user name here, and it must be one with an ID greater than 1000.
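The run-as rules above can be summarized in a short sketch. The resolve_run_as function is hypothetical, written only to restate the constraints from the guide (default user hdfs; with Kerberos, an explicit user whose ID is greater than 1000):

```python
def resolve_run_as(username=None, uid=None, kerberos_enabled=False):
    """Apply the run-as rules: hdfs by default; with Kerberos a user name
    is mandatory and its numeric ID must be greater than 1000.
    """
    if not kerberos_enabled:
        return username or "hdfs"
    if not username:
        raise ValueError("a run-as user name is required on Kerberos clusters")
    if uid is not None and uid <= 1000:
        raise ValueError("run-as user ID must be greater than 1000 under Kerberos")
    return username
```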
Limits for the number of map slots and for bandwidth per mapper. The defaults are
unlimited.
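Because the bandwidth limit applies per mapper, the two settings combine: the effective aggregate transfer cap is the product of the map-slot limit and the per-mapper bandwidth. A minimal sketch, with the function name and units (MB/s) as illustrative assumptions:

```python
def aggregate_bandwidth_cap(max_map_slots=None, mb_per_sec_per_mapper=None):
    """Both limits default to unlimited (None). When both are set, the
    aggregate cap is their product, since bandwidth is limited per mapper.
    """
    if max_map_slots is None or mb_per_sec_per_mapper is None:
        return None  # unlimited
    return max_map_slots * mb_per_sec_per_mapper
```

For example, 10 map slots at 50 MB/s per mapper caps the replication at 500 MB/s overall.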
Whether to abort the job on an error (default is not to do so). This means that files
copied up to that point will remain on the destination, but no additional files will be
copied.
Whether to remove deleted files from the target directory if they have been removed
on the source.
Whether to preserve the block size, replication count, and permissions as they exist on
the source file system, or to use the settings as configured on the target file system. The
default is to preserve these settings as on the source.
Note: If you elect to preserve permissions, you must run the replication as
a superuser. You can use the "Run as" option to ensure that is the case.
Whether to generate alerts for various state changes in the replication workflow.
From the Actions menu for the replication task you want to test, click Dry Run.
From the Actions menu for a replication task, in addition to Dry Run you can also:
If the replication fails, that failure is indicated and the timestamp appears in red text.
To view more information about completed replication runs, click anywhere in the replication
job entry row in the replication list. This displays sub-entries for each past replication run.
To view detailed information about a particular past run, click the entry for that replication run.
This opens another sub-entry that shows:
A result message
When viewing a sub-entry, you can dismiss the sub-entry by clicking anywhere in its parent
entry, or by clicking the return arrow icon at the top left of the sub-entry area.
Getting Support
Cloudera Support
Cloudera can help you install, configure, optimize, tune, and run Hadoop for large-scale data
processing and analysis. Cloudera supports Hadoop whether you run our distribution on servers in
your own data center, or on hosted infrastructure services such as Amazon EC2, Rackspace,
SoftLayer, or VMware's vCloud.
If you are a Cloudera customer, you can:
Community Support
Register for the Cloudera Manager Users group.
Register for the CDH Users group.
Report Issues
Cloudera tracks software and documentation bugs and enhancement requests for CDH on
issues.cloudera.org. Your input is appreciated, but before filing a request, please search the Cloudera
issue tracker for existing issues and send a message to the CDH users' list, [email protected], or
the CDH developers' list, [email protected].