
How to Gather Data for Openshift OVN-Kubernetes

SOLUTION VERIFIED - Updated April 6 2022 at 9:22 AM - English
Environment
Red Hat Openshift Container Platform 4.4
Red Hat Openshift Container Platform 4.5
Red Hat Openshift Container Platform 4.6

Issue
- Potentially missing OVN logical flows or OVS OpenFlow rules
- Packets not reaching the outgoing host interface
- Intermittent traffic failures

Diagnostic Steps

Using must-gather
Note: When possible, use must-gather to collect networking information in a consistent, simplified way. To do this, you must run must-gather in an advanced fashion to add the network collection to the archive.

Raw
export NODES="node1.clusterName.example.com node2.clusterName.example.com"
oc adm must-gather -- "gather && gather_network_logs ${NODES}"

Note: If this collection fails or cannot be run due to an API server outage, please follow the remainder of this article to manually collect the information for debugging the cluster.

1. Identifying Which Hosts to Debug
2. Setting Debug Level Logs for OVS and OVN
   o Starting DEBUG logging
   o Stopping DEBUG logging
3. Gathering OVN and OVS data
4. OVN Tracing

Identifying Which Hosts to Debug


As the issue statement is very broad, we recommend gathering more data than is likely necessary.
That being said, gathering the logs from the masters can be useful, as they may reveal OVN clustering issues such as repeated disconnects, replication problems, or too many leader elections; for that reason, the scripts below all append the master nodes automatically to the list of nodes to gather data from.
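Before picking nodes, a quick hedged check for such clustering symptoms (the app=ovnkube-master label matches the selector used by the collection script later in this article; adjust it if your release labels the pods differently) is to scan the master pods' logs:

Raw
# Scan the master ovnkube pods for recent election/disconnect messages
for POD in $(oc -n openshift-ovn-kubernetes get pods -l app=ovnkube-master -o name); do
  echo "== ${POD} =="
  oc -n openshift-ovn-kubernetes logs ${POD} --all-containers --timestamps 2>/dev/null \
    | grep -iE 'election|disconnect' | tail -n 5
done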
As for which workers to list, check where your pods currently reside and use those hosts. For example, to debug the SDN layer of my application, I would run the following to see where each pod is hosted:

Raw
# oc get pods -n test-mydb-operator -o wide
NAME                                 READY   STATUS    RESTARTS   AGE   IP             NODE                 NOMINATED NODE   READINESS GATES
awesomedb-head-5b75dcf67d-llkv9      1/1     Running   30         34h   172.24.2.216   openshift-worker-1   <none>           <none>
awesomedb-bigdata-655f86bd75-qrk8z   3/3     Running   30         34h   172.24.3.217   openshift-worker-0   <none>           <none>
awesomedb-sort-6fbf8dc89b-nf2qk     2/2     Running   30         34h   172.24.2.267   openshift-worker-1   <none>           <none>

With the above example, my NODES to gather data from would be openshift-worker-1 and openshift-worker-0.
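If there are many pods, a quick shortcut (a sketch assuming the same illustrative test-mydb-operator namespace) is to derive the NODES list directly from the pods' .spec.nodeName fields:

Raw
# Build a unique, space-separated node list from the pods in a namespace
NODES=$(oc get pods -n test-mydb-operator -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' | sort -u | xargs)
echo "${NODES}"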

Setting Debug Level Logs for OVS and OVN


The following script sets DEBUG level logging for the openvswitch components. This is highly verbose, and the nodes have a short log roll-over interval, so debug entries age out quickly.
NOTE: Red Hat recommends setting openvswitch logging to DEBUG level only for troubleshooting, as it can have performance impacts.
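Before raising verbosity, it can help to record the current log levels so you know what to restore afterwards; a minimal sketch, assuming OVS_POD is set as in the script below:

Raw
# Record the current OVS log levels for later reference (OVS_POD as in the script below)
oc -n openshift-ovn-kubernetes exec -t ${OVS_POD} -- ovs-appctl vlog/list > ovs-vlog-levels-before.txt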

To make sure no log entries are lost, it is best to stream the logs to a central location. In the example script you can either log to a local file (local to where you are running the script) or skip local logging if you have an alternative destination.
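One hedged example of such an alternative destination (the host central.example.com and target path are illustrative placeholders) is streaming a pod's logs straight to a remote machine over ssh:

Raw
# Stream a pod's logs to a remote host instead of a local file (hostname and path are illustrative)
oc -n openshift-ovn-kubernetes logs ${OVS_POD} --timestamps -f \
  | ssh user@central.example.com 'cat >> /var/tmp/ovs-node.log'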
Note: Alter the NODES variable to match which hosts you wish to debug.

Starting DEBUG logging

Example debug logging script:

Raw
#!/bin/bash

# Specify the worker nodes that your application is seeing issues on
NODES="openshift-worker-0"

LOG_LOCAL=true
# This script uses "oc logs ... -f" and sends it to the background.
# You have to manually stop it when you're done:
# kill $(ps f | awk '$0 ~ /oc.*logs/ {print $1}')

OUTDIRa=ovn_outputs
OUTDIRb=ovs_outputs
mkdir -p ${OUTDIRa}
mkdir -p ${OUTDIRb}

# Add the master nodes to the list of NODES regardless, to help detect potential replication issues
for NODE in $(echo $NODES $(oc get nodes -l node-role.kubernetes.io/master -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}') | sort -u); do
  echo "Working on $NODE"
  OVN_POD=$(oc -n openshift-ovn-kubernetes get pod -l app=ovnkube-node,component=network -o jsonpath='{range .items[?(@.spec.nodeName=="'${NODE}'")]}{.metadata.name}{end}')
  OVS_POD=$(oc -n openshift-ovn-kubernetes get pod -l app=ovs-node,component=network -o jsonpath='{range .items[?(@.spec.nodeName=="'${NODE}'")]}{.metadata.name}{end}')

  echo "oc -n openshift-ovn-kubernetes exec -t ${OVS_POD} -- ovs-appctl vlog/set dbg"
  oc -n openshift-ovn-kubernetes exec -t ${OVS_POD} -- ovs-appctl vlog/set dbg

  echo "oc -n openshift-ovn-kubernetes exec -t ${OVN_POD} -c ovn-controller -- ovn-appctl -t ovn-controller vlog/set dbg"
  oc -n openshift-ovn-kubernetes exec -t ${OVN_POD} -c ovn-controller -- ovn-appctl -t ovn-controller vlog/set dbg

  echo "oc -n openshift-ovn-kubernetes exec -t ${OVN_POD} -c northd -- ovn-appctl -t ovn-northd vlog/set dbg"
  oc -n openshift-ovn-kubernetes exec -t ${OVN_POD} -c northd -- ovn-appctl -t ovn-northd vlog/set dbg

  if ! $LOG_LOCAL; then
    # If LOG_LOCAL is set to false the loop stops here and does not log locally
    continue
  fi

  echo "Sending to Background: oc -n openshift-ovn-kubernetes logs ${OVS_POD} --timestamps -f"
  (oc -n openshift-ovn-kubernetes logs ${OVS_POD} --timestamps -f > ${OUTDIRb}/ovs-node-${NODE}.log) &

  echo "Sending to Background: oc -n openshift-ovn-kubernetes logs ${OVN_POD} --all-containers --timestamps -f"
  (oc -n openshift-ovn-kubernetes logs ${OVN_POD} --all-containers --timestamps -f > ${OUTDIRa}/ovnkube-node-${NODE}.log) &
done
To stop the background log streams, run the following command:

Raw
kill $(ps f | awk '$0 ~ /oc.*logs/ {print $1}')

Note: this will kill any oc logs command currently running on the system, not only the ones started by the script above.
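If other oc logs sessions might be running on the same machine, a narrower variant (a small sketch of a change to the script above, not a separate tool) is to record the background PIDs as the streams are started and kill only those:

Raw
# In the script above, collect the PIDs of the background log streams...
LOG_PIDS=()
(oc -n openshift-ovn-kubernetes logs ${OVS_POD} --timestamps -f > ${OUTDIRb}/ovs-node-${NODE}.log) & LOG_PIDS+=($!)
(oc -n openshift-ovn-kubernetes logs ${OVN_POD} --all-containers --timestamps -f > ${OUTDIRa}/ovnkube-node-${NODE}.log) & LOG_PIDS+=($!)

# ...and later stop only those streams
kill "${LOG_PIDS[@]}"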

If the LOG_LOCAL variable was set to true, the pod logs will be in these two directories, ready to be analyzed or attached to a case:
- ./ovn_outputs/*.log
- ./ovs_outputs/*.log

Stopping DEBUG logging

To revert to the normal logging level, the script below can be used:

Raw
#!/bin/bash

# Specify the worker nodes that your application is seeing issues on
NODES="openshift-worker-0"

# Add the master nodes to the list of NODES regardless, to help detect potential replication issues
for NODE in $(echo $NODES $(oc get nodes -l node-role.kubernetes.io/master -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}') | sort -u); do
  OVN_POD=$(oc -n openshift-ovn-kubernetes get pod -l app=ovnkube-node,component=network -o jsonpath='{range .items[?(@.spec.nodeName=="'${NODE}'")]}{.metadata.name}{end}')
  OVS_POD=$(oc -n openshift-ovn-kubernetes get pod -l app=ovs-node,component=network -o jsonpath='{range .items[?(@.spec.nodeName=="'${NODE}'")]}{.metadata.name}{end}')

  echo "oc -n openshift-ovn-kubernetes exec -t ${OVS_POD} -- ovs-appctl vlog/set info"
  oc -n openshift-ovn-kubernetes exec -t ${OVS_POD} -- ovs-appctl vlog/set info

  echo "oc -n openshift-ovn-kubernetes exec -t ${OVN_POD} -c ovn-controller -- ovn-appctl -t ovn-controller vlog/set info"
  oc -n openshift-ovn-kubernetes exec -t ${OVN_POD} -c ovn-controller -- ovn-appctl -t ovn-controller vlog/set info

  echo "oc -n openshift-ovn-kubernetes exec -t ${OVN_POD} -c northd -- ovn-appctl -t ovn-northd vlog/set info"
  oc -n openshift-ovn-kubernetes exec -t ${OVN_POD} -c northd -- ovn-appctl -t ovn-northd vlog/set info
done
Gathering OVN and OVS data
For Openshift Container Platform releases 4.7.x, in addition to the script below, also run the following must-gather collection script:

Raw
oc adm must-gather --dest-dir="./network-ovn" -- /usr/bin/gather_network_logs

The script below grabs the dumps for both OVN and OVS; all that is required is to change the NODES variable in the script to the node names experiencing the issue (current example: "openshift-worker-0 openshift-worker-1").

Collection script:

Raw
#!/bin/bash

### Change NODES variable to the desired hosts
NODES="openshift-worker-0 openshift-worker-1"

OUTDIRa=ovn_outputs
OUTDIRb=ovs_outputs
mkdir -p ${OUTDIRa}
mkdir -p ${OUTDIRb}

OVN_NB_TABLES=(
"NB_Global"
"Logical_Switch"
"Logical_Switch_Port"
"Address_Set"
"Port_Group"
"Load_Balancer"
"ACL"
"Logical_Router"
"QoS"
"Meter"
"Meter_Band"
"Logical_Router_Port"
"Logical_Router_Static_Route"
"NAT"
"DHCP_Options"
"Connection"
"DNS"
"SSL"
"Gateway_Chassis"
)

PIDS=()
for NODE in ${NODES}; do
  echo ${NODE}
  ### OVN
  POD_OVNKUBE=$(2>/dev/null oc -n openshift-ovn-kubernetes get pod -l app=ovnkube-node,component=network -o jsonpath='{range .items[?(@.spec.nodeName=="'${NODE}'")]}{.metadata.name}{end}')
  NBDB=$(oc describe ds ovnkube-node -n openshift-ovn-kubernetes | awk '/nb-address/ {gsub(/"/, "", $2); print $2}')
  SBDB=$(oc describe ds ovnkube-node -n openshift-ovn-kubernetes | awk '/sb-address/ {gsub(/"/, "", $2); print $2}')
  ARGS="-p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt"

  echo "oc -n openshift-ovn-kubernetes exec -t -c ovn-controller ${POD_OVNKUBE} -- ovn-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound"
  sh -x &>${OUTDIRa}/${NODE}.${POD_OVNKUBE}.ovn-appctl_sbstatus <<< "timeout 30 oc -n openshift-ovn-kubernetes exec -t -c ovn-controller ${POD_OVNKUBE} -- ovn-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound" & PIDS+=($!)

  echo "oc -n openshift-ovn-kubernetes exec -t -c ovn-controller ${POD_OVNKUBE} -- ovn-sbctl --db ${SBDB} ${ARGS} show"
  sh -x &>${OUTDIRa}/${NODE}.${POD_OVNKUBE}.ovn-sbctl_show <<< "timeout 30 oc -n openshift-ovn-kubernetes exec -t -c ovn-controller ${POD_OVNKUBE} -- ovn-sbctl --db ${SBDB} ${ARGS} show" & PIDS+=($!)

  echo "oc -n openshift-ovn-kubernetes exec -t -c ovn-controller ${POD_OVNKUBE} -- ovn-sbctl --db ${SBDB} ${ARGS} lflow-list"
  sh -x &>${OUTDIRa}/${NODE}.${POD_OVNKUBE}.ovn-sbctl_lflow-list <<< "timeout 30 oc -n openshift-ovn-kubernetes exec -t -c ovn-controller ${POD_OVNKUBE} -- ovn-sbctl --db ${SBDB} ${ARGS} lflow-list" & PIDS+=($!)

  echo "oc -n openshift-ovn-kubernetes exec -t -c ovn-controller ${POD_OVNKUBE} -- ovn-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound"
  sh -x &>${OUTDIRa}/${NODE}.${POD_OVNKUBE}.ovn-appctl_nbstatus <<< "timeout 30 oc -n openshift-ovn-kubernetes exec -t -c ovn-controller ${POD_OVNKUBE} -- ovn-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound" & PIDS+=($!)

  echo "oc -n openshift-ovn-kubernetes exec -t -c ovn-controller ${POD_OVNKUBE} -- ovn-nbctl --db ${NBDB} ${ARGS} show"
  sh -x &>${OUTDIRa}/${NODE}.${POD_OVNKUBE}.ovn-nbctl_show <<< "timeout 30 oc -n openshift-ovn-kubernetes exec -t -c ovn-controller ${POD_OVNKUBE} -- ovn-nbctl --db ${NBDB} ${ARGS} show" & PIDS+=($!)

  for tbl in ${OVN_NB_TABLES[*]}; do
    echo "oc -n openshift-ovn-kubernetes exec -t -c ovn-controller ${POD_OVNKUBE} -- ovn-nbctl --db ${NBDB} ${ARGS} list ${tbl}"
    sh -x &>${OUTDIRa}/${NODE}.${POD_OVNKUBE}.ovn-nbctl_list_${tbl} <<< "timeout 30 oc -n openshift-ovn-kubernetes exec -t -c ovn-controller ${POD_OVNKUBE} -- ovn-nbctl --db ${NBDB} ${ARGS} list ${tbl}" & PIDS+=($!)
  done

  ### OVS
  POD_OVS_NODE=$(2>/dev/null oc -n openshift-ovn-kubernetes get pod -l app=ovs-node,component=network -o jsonpath='{range .items[?(@.spec.nodeName=="'${NODE}'")]}{.metadata.name}{end}')
  OVS_BRIDGES=$(oc -n openshift-ovn-kubernetes exec -t ${POD_OVS_NODE} -- ovs-vsctl list-br 2>/dev/null)
  for OVS_BRIDGE in ${OVS_BRIDGES}; do
    echo "oc -n openshift-ovn-kubernetes exec -t ${POD_OVS_NODE} -- ovs-ofctl dump-flows ${OVS_BRIDGE}"
    oc -n openshift-ovn-kubernetes exec -t ${POD_OVS_NODE} -- ovs-ofctl dump-flows ${OVS_BRIDGE} > ${OUTDIRb}/${NODE}.ovs-ofctl.dump-flows.${OVS_BRIDGE}

    echo "oc -n openshift-ovn-kubernetes exec -t ${POD_OVS_NODE} -- ovs-ofctl show ${OVS_BRIDGE}"
    oc -n openshift-ovn-kubernetes exec -t ${POD_OVS_NODE} -- ovs-ofctl show ${OVS_BRIDGE} > ${OUTDIRb}/${NODE}.ovs-ofctl.show.${OVS_BRIDGE}

    echo "oc -n openshift-ovn-kubernetes exec -t ${POD_OVS_NODE} -- ovs-appctl dpctl/dump-conntrack"
    oc -n openshift-ovn-kubernetes exec -t ${POD_OVS_NODE} -- ovs-appctl dpctl/dump-conntrack > ${OUTDIRb}/${NODE}.${POD_OVS_NODE}.ovs_appctl.dpctl.dump_conntrack
  done
  echo "oc -n openshift-ovn-kubernetes exec -t ${POD_OVS_NODE} -- ovs-dpctl -s show"
  oc -n openshift-ovn-kubernetes exec -t ${POD_OVS_NODE} -- ovs-dpctl -s show > ${OUTDIRb}/${NODE}.${POD_OVS_NODE}.ovs-dpctl.show
done

### Gathering raw ovn dbs
for OVN_KUBE_MASTER_POD in $(2>/dev/null oc -n openshift-ovn-kubernetes get pod -l app=ovnkube-master,component=network -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'); do
  echo "oc -n openshift-ovn-kubernetes exec -t -c nbdb ${OVN_KUBE_MASTER_POD} -- cat /etc/ovn/ovnnb_db.db"
  oc -n openshift-ovn-kubernetes exec -t -c nbdb ${OVN_KUBE_MASTER_POD} -- cat /etc/ovn/ovnnb_db.db > ${OUTDIRa}/${OVN_KUBE_MASTER_POD}.ovnnb_db.db

  echo "oc -n openshift-ovn-kubernetes exec -t -c nbdb ${OVN_KUBE_MASTER_POD} -- cat /etc/ovn/ovnsb_db.db"
  oc -n openshift-ovn-kubernetes exec -t -c nbdb ${OVN_KUBE_MASTER_POD} -- cat /etc/ovn/ovnsb_db.db > ${OUTDIRa}/${OVN_KUBE_MASTER_POD}.ovnsb_db.db
done

echo "Waiting for collection to complete"
wait "${PIDS[@]}"

echo "Compressing files and creating a tar archive"
tar -zcvf ovs_ovn_dumps.tar.gz ${OUTDIRa} ${OUTDIRb}

echo -e "\nPlease upload $(realpath ovs_ovn_dumps.tar.gz) to your case with RH"

Once it has run, it should create an ovs_ovn_dumps.tar.gz tarball; simply upload it to the case if you wish to have support analyze it.
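Before uploading, you can sanity-check that the archive contains the expected dumps:

Raw
# List the archive contents to confirm the ovn/ovs dumps were captured
tar -tzf ovs_ovn_dumps.tar.gz | head -n 20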

OVN Tracing
For information on how to trace OVN flows (using ovn-trace/ovnkube-trace), please check the following KCS:
- https://access.redhat.com/solutions/5887511
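As a hedged illustration only (the datapath name and every match field below are placeholders that must be replaced with values from your own environment, and the variables reuse those set in the collection script above), an ovn-trace invocation looks roughly like this:

Raw
# Trace a hypothetical microflow through the OVN logical pipeline.
# DATAPATH, inport, and all addresses below are illustrative placeholders.
oc -n openshift-ovn-kubernetes exec -t -c ovn-controller ${POD_OVNKUBE} -- \
  ovn-trace --db ${SBDB} ${ARGS} ${DATAPATH} \
  'inport=="my-pod-port" && eth.src==0a:58:ac:18:02:d8 && eth.dst==0a:58:ac:18:02:01 && ip4.src==172.24.2.216 && ip4.dst==172.24.3.217 && ip.ttl==64'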
