Strata ASC Troubleshooting Playbook - Commit
COMMIT
© 2022 Palo Alto Networks, Inc. All rights reserved. Sharing restricted, for internal and Authorized Support Center use only.
Strata ASC Troubleshooting Playbook: Commit
Palo Alto Networks recommends running the latest version of PAN-OS. As a general
reminder, always check the release notes to see if the device is possibly experiencing an
issue that has already been resolved.
Contents
Triage
Troubleshooting
Symptom 1 - Phase 0 Failures
Symptom 2 – Phase 1 failure in a client daemon
Symptom 3 – Slow Commit
Symptom 4 – Panorama commit-all Does Not Reach FW, or Does Not Send Properly
Symptom 5 - Commit Fails on DP Due To Memory Allocation Failure
Symptom 6 - Commit Stuck on FW or the commit-all from Panorama Is Stuck Even Though Local FW commits Succeed
Issue Replication Tips
How To Troubleshoot Which Daemon Is Causing a Commit Failure
Scenario-1: Commit failed with error "Failed to modify obj for phase1 push: TIMEOUT"
Scenario-2: Commit fails on panorama after a downgrade from 10.1.4-h2 to 10.0.9
Scenario-3: Autocommit failure due to "Certificate <Cert-name>" failed to load: failed to parse key
Escalation
Palo Alto Networks TAC Escalation Template
Triage
When it comes to troubleshooting commit issues, the most important thing is to narrow
down the issue to the specific phase of the commit or system process.
Start by ensuring the configs referenced in the failure are valid. Once that is done,
additional debugs can be enabled on processes to help identify the root cause.
• Pre-commit processing
  o Generates commit candidate files
  o Plugin pre-commit, validation and plugin file generation
• Schema Verification
  o Validates the generated XML config to be committed against the schema
• Validity checks (XSL transform)
  o Does additional validations based on platforms, modes, etc. by running an XSL transform
• EDL checks
  o Additional validations for EDL configuration
• Phase 0
  o ID population from devsrvr
• Phase 1
  o Mgmtsrvr/Configd sends the respective transformed configuration to each client and plugin for validation
• Phase 2
  o Final phase of client commit, where the config is applied
• Post-commit operations
  o After every client reports success in phase2, the configuration takes effect to reflect the newly committed changes
Mgmtsrvr/configd are the main processes for the rest of the commit phases, so
collecting commit debugs for those processes is usually necessary to
troubleshoot commit issues.
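As a rough aid for narrowing a failure to one stage, the stage order listed above can be written down as an ordered enumeration. This is only a sketch: the names `CommitStage` and `stages_passed_before` are hypothetical and are not PAN-OS identifiers, and the ordering simply mirrors the list in this section.

```python
from enum import IntEnum


class CommitStage(IntEnum):
    """Commit stages in the order listed in the Triage section (names are hypothetical)."""
    PRE_COMMIT = 1
    SCHEMA_VERIFICATION = 2
    VALIDITY_CHECKS = 3
    EDL_CHECKS = 4
    PHASE_0 = 5
    PHASE_1 = 6
    PHASE_2 = 7
    POST_COMMIT = 8


def stages_passed_before(failed: CommitStage) -> list[CommitStage]:
    """Return the stages that come before the failing one in the listed order."""
    return [stage for stage in CommitStage if stage < failed]


if __name__ == "__main__":
    # A Phase 1 failure implies the earlier checks and Phase 0 were already passed.
    for stage in stages_passed_before(CommitStage.PHASE_1):
        print(stage.name)
```

Knowing which stages were already passed helps decide which logs are worth reading first, for example skipping schema checks when the failure is reported in Phase 1.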
Troubleshooting
Symptom 1 - Phase 0 Failures
Phase 0 failures are usually related to idmgr.
2020-12-18 06:40:31.426 +0000 client device reported error: Error: Error populating id for 'vsys1+TestURL' (4294967295)
(Module: device)
This can occur if base-id idmgr capacity is exceeded, or in case of idmgr corruption.
PAN-157730 and PAN-155807 fix an issue where ids are not released during the idmgr
reset, so even after the capacity gets back within the limits, commits would still fail.
After resetting idmgr, always issue “commit force”.
In case of HA, after resetting idmgr on the active device, also reset it on the
passive/secondary device, then issue commit force on both devices.
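When many such errors appear in a log, it can help to extract the affected object from each one. The sketch below parses the sample error shown in this section. It assumes that the quoted string follows a `<vsys>+<object name>` pattern, which is inferred from the single sample only, and it merely flags that the number in the sample equals the largest 32-bit unsigned value (2^32 - 1), without claiming what idmgr means by it. The function name `parse_id_error` is hypothetical.

```python
import re

# Pattern inferred from the one sample line in this section (an assumption).
ID_ERROR = re.compile(
    r"Error populating id for '(?P<scope>[^'+]+)\+(?P<name>[^']+)' \((?P<value>\d+)\)"
)
MAX_U32 = 2**32 - 1  # 4294967295, the value seen in the sample


def parse_id_error(line: str):
    """Return scope, object name and reported number, or None if the line does not match."""
    match = ID_ERROR.search(line)
    if match is None:
        return None
    value = int(match.group("value"))
    return {
        "scope": match.group("scope"),
        "name": match.group("name"),
        "value": value,
        "is_max_u32": value == MAX_U32,
    }


if __name__ == "__main__":
    sample = (
        "2020-12-18 06:40:31.426 +0000 client device reported error: "
        "Error: Error populating id for 'vsys1+TestURL' (4294967295)"
    )
    print(parse_id_error(sample))
```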
Symptom 2 – Phase 1 failure in a client daemon
To even start the commit, all the daemons must be registered (up and running):
Once the commit starts, the phase1 and phase2 progress can be checked using the same
command (show management-clients):
In the example above, the command "show management-clients" was executed
during the commit. It can be seen that P1 finished OK on all processes except the
device client (device-server), where it failed.
In ms.log, you can check the clients for which phase1 was successful, and for which it was
unsuccessful:
...
Once you determine the client (process) that is failing, check the logs (if needed enable
debugs for that process).
In our example it was device-server – devsrvr.log (the name of the zone used was
incorrect):
...
<<vsys1>>
<</vsys1>>
If management-server debug level is set to dump for the commit feature, the
transformed xml file will be created in the /tmp folder for every process participating in
the commit containing the xml config sent to that process:
The commit can fail during the phase 1 also if there is a communication error between
the devsrvr on the MP side and the DP side (pan_comm process), because devsrvr needs
to send the generated binary config to the pan_comm process on the DP:
In this case, check the netstat output to confirm that the connection between devsrvr
and pan_comm on port 2001 exists:
Also the status of the dataplane and the corresponding processes should be checked
using the command "show system software status". For both processes (devsrvr and
pan_comm) debug level can be increased (for the pan_comm the command "debug
dataplane process comm on debug" can be used), and core of the processes taken. After
that the processes can be restarted to try to re-establish the working connection
between them.
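The netstat check described above can be sketched as a small filter over saved netstat text. It assumes the usual Linux `netstat -an` column layout (protocol, queues, local address, foreign address, state). The sample text and its addresses are made up for illustration, and the function name `established_on_port` is hypothetical.

```python
def established_on_port(netstat_text: str, port: int = 2001) -> list[tuple[str, str]]:
    """Return (local, remote) address pairs of ESTABLISHED TCP connections using the port."""
    suffix = f":{port}"
    matches = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers, non-TCP rows, and rows without a state column.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if state == "ESTABLISHED" and (local.endswith(suffix) or remote.endswith(suffix)):
            matches.append((local, remote))
    return matches


# Hypothetical sample in the common Linux netstat layout (not taken from a real device).
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 192.0.2.1:2001          192.0.2.2:40112         ESTABLISHED
tcp        0      0 192.0.2.1:443           198.51.100.7:51544      ESTABLISHED
"""

if __name__ == "__main__":
    print(established_on_port(SAMPLE))
```

An empty result for port 2001 would be consistent with the communication problem described above, although it does not by itself show which side dropped the connection.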
Symptom 3 – Slow Commit
There can also be other reasons for this issue, so when troubleshooting a slow commit,
the main goal is to isolate the issue to the specific commit phase that contributes most
to the extended duration.
The exact duration of the commit can be checked from the output of “show jobs”
command or from the ms log. In the ms.log, there will be an entry when the commit job
starts and ends:
2021-06-02 16:23:34.435 -0700 Commit job 15 completed and system log generated
If management server is set to debug for the commit all feature, during the commit the
ms log should contain the exact duration in seconds for all phases and checks during the
process:
...
2021-06-02 10:07:43.877 -0700 Schema validation including uuid check for job 12 takes 0 seconds
...
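To find the slowest step quickly, the per-step duration lines can be ranked with a short script. The sketch below assumes that the other duration lines follow the same "<step> for job <N> takes <S> seconds" wording as the single sample above. Only the schema validation line comes from this playbook; the other two sample lines are hypothetical placeholders, as is the function name `slowest_steps`.

```python
import re

# Pattern inferred from the single sample line above (an assumption).
STEP_TIMING = re.compile(
    r"^(?P<stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ [+-]\d{4}) "
    r"(?P<step>.+?) for job (?P<job>\d+) takes (?P<seconds>\d+) seconds"
)


def slowest_steps(lines, job_id: int) -> list[tuple[str, int]]:
    """Return (step, seconds) pairs for one job, longest first."""
    timings = []
    for line in lines:
        match = STEP_TIMING.match(line.strip())
        if match and int(match.group("job")) == job_id:
            timings.append((match.group("step"), int(match.group("seconds"))))
    return sorted(timings, key=lambda item: item[1], reverse=True)


SAMPLE_LINES = [
    # From this playbook:
    "2021-06-02 10:07:43.877 -0700 Schema validation including uuid check for job 12 takes 0 seconds",
    # Hypothetical lines, present only to exercise the sort:
    "2021-06-02 10:07:50.000 -0700 Hypothetical step A for job 12 takes 7 seconds",
    "2021-06-02 10:07:52.000 -0700 Hypothetical step B for job 13 takes 40 seconds",
]

if __name__ == "__main__":
    for step, seconds in slowest_steps(SAMPLE_LINES, job_id=12):
        print(f"{seconds:>5}s  {step}")
```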
If a specific phase takes a long time, the client responsible can be identified either
with the command “show management-clients” during the commit, or in the ms log, where
for both phase1 and phase2 there is a log message for every client when that phase is
done (example: useridd):
Line 893: 2021-06-02 10:08:05.867 -0700 client useridd reported Phase 1 was SUCCESSFUL
Line 917: 2021-06-02 10:08:11.481 -0700 client useridd reported Phase 2 was SUCCESSFUL
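The two log lines above can be turned into a per-client timing with a few lines of Python. This sketch parses the "client <name> reported Phase <n> was <status>" messages exactly as shown and reports the gap between a client's Phase 1 and Phase 2 reports. That gap is only an approximation of the client's own work, since Phase 2 may not start immediately after the client finishes Phase 1. The function names are hypothetical.

```python
import re
from datetime import datetime

PHASE_REPORT = re.compile(
    r"(?P<stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{1,6} [+-]\d{4}) "
    r"client (?P<client>\S+) reported Phase (?P<phase>\d) was (?P<status>\w+)"
)


def phase_reports(lines) -> dict:
    """Map client -> {phase number: (timestamp, status)} from ms.log-style lines."""
    reports = {}
    for line in lines:
        match = PHASE_REPORT.search(line)
        if match is None:
            continue
        when = datetime.strptime(match.group("stamp"), "%Y-%m-%d %H:%M:%S.%f %z")
        client_phases = reports.setdefault(match.group("client"), {})
        client_phases[int(match.group("phase"))] = (when, match.group("status"))
    return reports


def seconds_between_phase_reports(reports: dict, client: str):
    """Seconds from the client's Phase 1 report to its Phase 2 report, or None."""
    phases = reports.get(client, {})
    if 1 not in phases or 2 not in phases:
        return None
    return (phases[2][0] - phases[1][0]).total_seconds()


SAMPLE_LINES = [
    "Line 893: 2021-06-02 10:08:05.867 -0700 client useridd reported Phase 1 was SUCCESSFUL",
    "Line 917: 2021-06-02 10:08:11.481 -0700 client useridd reported Phase 2 was SUCCESSFUL",
]

if __name__ == "__main__":
    parsed = phase_reports(SAMPLE_LINES)
    print(seconds_between_phase_reports(parsed, "useridd"))
```

Running the same parse over all clients and sorting by the gap gives a quick shortlist of daemons to look at first.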
Once the problematic client is isolated, the logs for that daemon can be checked (debugs
can also be taken) for more details, and its activity can be checked using “show system
resources follow”. If the daemon is very high on CPU or unresponsive, a core and multiple
traces can be taken for further troubleshooting.
Sometimes the commit can be slow due to the amount of time needed for the config to
be saved on the disk - debug ms log example:
In that case, the diskstat output in the mp-monitor can be checked for the commit time,
and the system disk performance can also be checked from root using the Linux dd
command.
From PAN-OS 10.0 on, the command "show system last-commit-info" can be used to get
the breakdown of the last commit including the duration per phase/process:
Symptom 4 – Panorama commit-all Does Not Reach FW, or Does Not Send Properly
That config is merged with FW’s candidate config and commit is done on the merged
configuration.
Once this data is pushed to the FW, FW will save it to the folders /opt/pancfg/mgmt/sp
and /opt/pancfg/mgmt/template (per vsys there will be a folder and there will also be file
for shared data).
Every folder contains two files: a pre-trans file (exactly what was pushed from Panorama)
and a transformed xml file, which is the one used for merging.
Once the push has been sent to the FW, the FW should start the commit job.
If the job itself is not queued (show jobs all), the first thing to check should be the
connectivity between FW and Panorama (system log, ms.log).
Panorama is sending the config files through the evtmgr to the FW, and there is a
possibility of hitting the limit of the message (approximately 150MB).
If the config pushed from the panorama does not contain some data (causing the
deletion of part of the merged config on the FW), the files panorama-config.xml and
template-config.xml should be checked and compared with the sp and template files on
the FW.
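One way to make this comparison less manual is to list the named entries that exist in the pushed file but not in the file saved on the FW. The sketch below assumes that both files are well-formed XML whose objects appear as `entry` elements with a `name` attribute, and that the two files share the same layout below their root element. The two inline documents are hypothetical samples, not real Panorama output, and the function names are hypothetical as well.

```python
import xml.etree.ElementTree as ET


def named_entry_paths(xml_text: str) -> set[str]:
    """Collect a path for every <entry name="..."> element, ignoring the root tag."""
    root = ET.fromstring(xml_text)
    found: set[str] = set()

    def walk(node, prefix: str) -> None:
        name = node.get("name")
        label = node.tag if name is None else f"{node.tag}[@name='{name}']"
        path = f"{prefix}/{label}"
        if node.tag == "entry" and name is not None:
            found.add(path)
        for child in node:
            walk(child, path)

    # Start below the root so that differing root tags do not hide matches.
    for child in root:
        walk(child, "")
    return found


def missing_from_firewall(pushed_xml: str, firewall_xml: str) -> list[str]:
    """Entries present in the pushed config but absent from the FW copy."""
    return sorted(named_entry_paths(pushed_xml) - named_entry_paths(firewall_xml))


# Hypothetical samples for illustration only.
PUSHED = (
    "<config><shared><address>"
    "<entry name='web1'/><entry name='web2'/>"
    "</address></shared></config>"
)
ON_FW = "<config><shared><address><entry name='web1'/></address></shared></config>"

if __name__ == "__main__":
    for path in missing_from_firewall(PUSHED, ON_FW):
        print(path)
```

The comparison only looks at entry names, so an entry that exists on both sides with different contents would not be reported.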
A very common problem is that, when the setting “Share Unused Address and Service
Objects with Devices” is disabled, not all objects that are used on the FW are
pushed.
If the references are not pushed, the workaround could be to rebuild the xml cache on
the Panorama side:
newsp.xml - generated by transposing the new sp node over the old one (go through
shared and each vsys and replace nodes with the ones from pushsp.xml)
mergesp.xml - merges the sp node with the candidate cfg (replace /config/panorama with
newsp.xml, apply rename-map, and invoke the sp-importer transform)
newtpl.xml - unlike the dg, there is no transpose for the template, so newtpl is either
the newly pushed template or the last pushed one, depending on the type of the push
mergetpl.xml - merges the template with the candidate config
Symptom 5 - Commit Fails on DP Due To Memory Allocation Failure
Pan_comm can fail the commit when no memory is available. Usually the pan_comm logs
will contain entries like:
The commit job will be failing at the device module with the reason
CONFIG_UPDATE_START.
The first thing to check is the output of the command, and check the memory utilization:
During the commit, check the “config allocator usage” and “policy cache allocator
usage”:
Example of the config allocator usage (pan_comm memory maintains two copies of the
config: the previous and the current running config):
If the usage is close to the limit, parts of the config need to be reduced in size
(check the platform support limit for the different config options). The following
table explains which feature belongs to which config memory area:
The second thing in the output of the aforementioned command to be checked is the
policy cache usage.
Check the usage of the EDLs in the policies, especially as a destination inside the security
policies.
Once downloaded, EDLs are saved on the disk, and they can be found in the TS file:
\opt\pancfg\mgmt\devices\localhost.localdomain
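Symptom 6 - Commit Stuck on FW or the commit-all from Panorama Is Stuck Even Though Local FW commits Succeed
If the commit on the FW seems to be stuck, the first objective is to find the offending
process.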
Once the offending process is determined, the debugs for that process can be taken,
together with the core and trace files of the process.
Also core and trace of the management-server (configd) process should be taken.
There can be a situation where, during the push from Panorama, the commit on the FW
side is successful (the job is OK), but the corresponding commit-all job on the Panorama
side does not finish.
This could be due to communication issues or a malformed response. The best way to
isolate the issue (whether it is on the FW or the Panorama side) is to set the
management-server to the debug level on both sides and check for the following logs:
Issue Replication Tips
Even though the issue may not be reproduced, this facilitates easier troubleshooting and
finding the root cause.
During the replication, use the same PAN-OS version and, if possible, the same model
of the FW/Panorama.
Ideally, the customer can give us the device state from the FW and the running-config
from Panorama.
The file \opt\pancfg\mgmt\audit\cfg-audit.xml,v can be taken from the TS file and used
on the reproduction setup when the customer's recent changes need to be loaded.
The EDL files from the TS can also be used on a lab web server to replicate EDL issues,
even though EDLs are not publicly reachable.
From PAN-OS 9.1 on, on the FW, the previous versions of the merged config are saved
(containing also the panorama config) – they can be seen from the CLI using the
command:
In the folder /opt/pancfg/mgmt/audit there is an audit file for merged config - last-
candidatecfg-audit.xml,v (present in the TS file).
This can be used in case we need to restore the config from some point in time (including
the panorama’s pushed configuration).
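How To Troubleshoot Which Daemon Is Causing a Commit Failure
Scenario-1: Commit failed with error "Failed to modify obj for phase1 push: TIMEOUT"
• As you can see, devsrvr has reported a failure. This can indicate an issue in the
device server process; the team that handles these commit-related issues is Layer-7.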
2022-02-10 12:40:57.837 -0500 client device reported error: Warning: No Valid DNS
Security License
<>
Warning: cannot find complete certificate chain for certificate FRTrusted
Warning: vsys2 decryption: forward decrypt untrust cert is not configured, forward
decrypt trust cert will be used instead.
<>
<>
Warning: cannot find complete certificate chain for certificate ipac_cer_ft
<>
Scenario-2: Commit fails on panorama after a downgrade from 10.1.4-h2 to 10.0.9
• As you can see in ms.log, the device server process reported that phase1 failed
• If we look further in devsrvr.log at the time of the commit failure:
Scenario-3: Autocommit failure due to "Certificate <Cert-name>" failed to load: failed to parse key
• The auto-commit fails with a reason indicating that the private key failed to load,
which is related to sslmgr
• ms.log
• devsrvr.log
• pan_comm.log
Escalation
If after completing the steps in this playbook and conducting your own analysis you are
not able to solve the issue for your customer, it may be necessary to engage Palo Alto
Networks Global Customer Support team for additional assistance. Filing a support case
with the correct team and a full set of actionable data will substantially accelerate
identification and resolution of issues. Always include tech-support files from affected
devices whenever possible, and note when specific information cannot be shared.
Palo Alto Networks TAC Escalation Template
Date/Time(s) of Issue
-----
Specify the exact time when the commit started and when it failed.
Expected/Observed Behavior
-----
commit should succeed / commit failing
Steps To Reproduce
-----
For commit failures, this is a mandatory field. It should contain copies of the problematic
configuration files whenever possible.