Fluentd logs full of backslashes and Kibana doesn't show k8s pod logs #2545
This log is a single line, right? If so, it seems several logs were merged into one.
No, the log is full of backslashes. There are single lines of actual log and then pages of backslashes, but I didn't want to copy all the meaningless backslashes, and when I searched for "error" there wasn't any.
Any progress on this issue? I seem to have just hit exactly the same problem. I use a slightly different setup, but otherwise substantially the same. Looking at the logs, fluentd appears to be repeatedly reprocessing the same information, objecting to the format, which generates a new, longer log entry that is then reprocessed... and around we go.
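The feedback loop described above also explains why the backslashes multiply: every time fluentd re-ingests its own JSON-escaped output, each quote and backslash gets escaped again. A minimal Python sketch of the effect (the sample message is hypothetical):

```python
import json

# Hypothetical log line containing quotes, as fluentd might emit in an error record.
line = 'pattern not matched: "some value"'

counts = []
for _ in range(4):
    # Re-ingesting the escaped record escapes it again on the next pass.
    line = json.dumps(line)
    counts.append(line.count("\\"))

print(counts)  # backslash count grows on every pass
```

Each pass roughly doubles the backslash count, which matches the "pages of backslashes" symptom.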
I have the same problem after following this tutorial, but using k3s as my Kubernetes deployment. If I strip the backslashes I can see something like:

But otherwise it's not even possible to see what is going on.

My fluentd.yaml is as follows:
Same issue. Does anyone have a solution for this?
Same issue \\\\
If your fluentd logs are growing in backslashes, then your fluentd container is parsing its own logs and recursively generating new logs. Consider creating a fluentd-config.yaml file that is set up to ignore fluentd's own container logs. Here is mine:

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: fluentd-config
  namespace: kube-logging
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
data:
  containers.input.conf: |-
    <source>
      @type tail
      @id in_tail_container_logs
      path /var/log/containers/*.log
      exclude_path ["/var/log/containers/fluentd*"]
      pos_file /var/log/fluentd-containers.log.pos
      tag kubernetes.*
      read_from_head true
      format /^.* (?<source>(stderr|stdout))\ F\ (?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$/
      time_format %d/%b/%Y:%H:%M:%S %z
    </source>
  output.conf: |-
    # Enriches records with Kubernetes metadata
    <filter kubernetes.**>
      type kubernetes_metadata
    </filter>
    <match **>
      type elasticsearch
      log_level info
      include_tag_key true
      host elasticsearch.kube-logging.svc.cluster.local
      port 9200
      logstash_format true
      # Set the chunk limits.
      buffer_chunk_limit 2M
      buffer_queue_limit 8
      flush_interval 5s
      # Never wait longer than 30 seconds between retries.
      max_retry_wait 30
      # Disable the limit on the number of retries (retry forever).
      disable_retry_limit
      # Use multiple threads for processing.
      num_threads 2
    </match>
```

Then you will want to update your fluentd DaemonSet. I have had success with the configuration below. Here's what that looks like:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-logging
  labels:
    app: fluentd
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      serviceAccount: fluentd
      serviceAccountName: fluentd
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: fluentd
        image: gcr.io/google-containers/fluentd-elasticsearch:v2.0.1
        env:
          - name: FLUENTD_SYSTEMD_CONF
            value: "disable"
          - name: FLUENTD_ARGS
            value: "--no-supervisor -q"
        resources:
          limits:
            memory: 512Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlogcontainers
          mountPath: /var/log/containers
          readOnly: true
        - name: config
          mountPath: /etc/fluent/config.d
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlogcontainers
        hostPath:
          path: /var/log/containers/
      - name: config
        configMap:
          name: fluentd-config
```

Best of luck!
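As a sanity check on the `format` regexp in the answer above, here is a small Python sketch (named groups converted to Python's `(?P<...>)` syntax; the sample log line is hypothetical):

```python
import re

# The ConfigMap's format regexp, translated to Python named-group syntax.
LOG_RE = re.compile(
    r'^.* (?P<source>stderr|stdout) F (?P<host>[^ ]*) [^ ]* (?P<user>[^ ]*) '
    r'\[(?P<time>[^\]]*)\] "(?P<method>\S+)(?: +(?P<path>[^ ]*) +\S*)?" '
    r'(?P<code>[^ ]*) (?P<size>[^ ]*)(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?$'
)

# A made-up CRI-prefixed access-log line of the shape this regexp expects.
sample = ('2019-08-08T10:00:00.000Z stdout F 10.244.0.1 - admin '
          '[08/Aug/2019:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 612 '
          '"-" "curl/7.58.0"')

m = LOG_RE.match(sample)
print(m.group("source"), m.group("method"), m.group("code"))
```

Lines that do not fit this shape (for example fluentd's own output) produce no match, which is exactly the "pattern not matched" path that feeds the recursion.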
Just added a new env var to fluentd-kubernetes-daemonset for this case:
I see 2 possible concurrent causes:

The pattern mismatch explains why Kibana doesn't show any error messages: the records are not being sent to your Elasticsearch service. Having a proper filter/parser would help with this.
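On the recursion side: fluentd tags its own internal events with the `fluent.*` prefix, so besides `exclude_path` on the tail source, a common way to break the feedback loop is to discard those events explicitly (a sketch, to be adapted to your config):

```
<match fluent.**>
  @type null
</match>
```

With this in place, fluentd's own warnings are dropped instead of being re-emitted and re-parsed.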
Is there a good way to ship fluentd's own logs, if possible?
I got this issue as well, because I was using containerd instead of Docker. I solved it by putting in the following configuration:
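The configuration itself was not captured in this thread, but the underlying cause is clear: containerd writes CRI-formatted log lines (`<timestamp> <stream> <P|F> <message>`) rather than Docker's JSON, so a Docker-oriented fluentd config fails to parse them. A minimal Python sketch of the CRI layout (the sample line is hypothetical):

```python
import re

# CRI log line layout: timestamp, stream, partial/full flag, then the message.
CRI_RE = re.compile(r"^(?P<time>\S+) (?P<stream>stdout|stderr) (?P<flag>[PF]) (?P<log>.*)$")

sample = "2019-08-08T10:00:00.123456789Z stdout F hello from my pod"
m = CRI_RE.match(sample)
print(m.group("stream"), m.group("flag"), m.group("log"))
```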
@micktg
For the latest images, using the cri parser is better than a regexp: https://github.com/fluent/fluentd-kubernetes-daemonset#use-cri-parser-for-containerdcri-o-logs
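Per the linked README, that means switching the tail source's parser rather than hand-writing a regexp. A sketch of the relevant source fragment (paths shown are the usual defaults; the `cri` parser type is provided by the fluent-plugin-parser-cri plugin bundled in recent daemonset images):

```
<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  read_from_head true
  <parse>
    @type cri
  </parse>
</source>
```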
I followed a DigitalOcean tutorial (https://www.digitalocean.com/community/tutorials/how-to-set-up-an-elasticsearch-fluentd-and-kibana-efk-logging-stack-on-kubernetes) to set up my EFK stack for Kubernetes and faced the same issue. The above answer by @micktg resolved it. I added the below to the environment variables of my fluentd YAML file, so now my environment variables look like this:
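The poster's exact variables were not captured in this thread. For reference, the fluentd-kubernetes-daemonset images let you swap the container tail parser via an environment variable; a sketch of what such an override typically looks like (the variable name and value here follow the daemonset README and should be checked against your image version):

```yaml
env:
  - name: FLUENT_CONTAINER_TAIL_PARSER_TYPE
    value: "cri"
```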
I found that the answers by @micktg and @varungupta19 solve the problem.
Thanks, @micktg and @varungupta19. Problem solved.
adding
Referenced in a commit:
+ prevent Fluentd from parsing its own logs and fix an issue with endless backslashes (fluent/fluentd#2545)
+ increase chunk limit size
+ add storage for systemd plugin configuration
+ add pos_file parameter for the tail sources
Change-Id: I7d6e54d2324e437c92e5e8197636bd6c54419167
Change-Id: I3cfb4e747a4cd23bc2d7051ef526fd58dc38aaf8 - [mariadb] Deploy exporter as sidecar Deploy exporter as a sidecar to provide correct mysql metrics. Co-Authored-By: Oleh Hryhorov <[email protected]> Change-Id: I25cfeaf7f95f772d2b3c07a6a91220d0154b4eea - [mariadb] Avoid using cluster endpoints Switch to namespaced based endpoints to remove requirement configure kubernetes internal cluster domain name which can't be get from kubernetes API. Change-Id: I8808153a83e3cec588765797d66d728bb6133a5c - [memcached] Allow to configure additional service parameters Use the following structure in values to define addtional service parameters: Values: network: memcached: service: type: loadBalancer loadBalancerIP: 1.1.1.1 Change-Id: I94c87e530d90f603949ccacbf0602273feec741a - [mariadb] Add mariadb controller support This patch adds mairadb controller that is responsible to mark one ready pod as mariadb_role: primary to forward all traffic to it. This will allow to drop nginx ingress controller which adds extra hops between client and server and uses heavy customized nginx templates. Change-Id: I3b29bc2029bfd39754516e73a09e4e14c52ccc99 - Add service params snippet Allows to add custom parameters to services, and ingress services from values as is. Co-Authored-By: Mykyta Karpin <[email protected]> Change-Id: I42b8d07126de2cf12ddc3a934d1fd4e3a2ee0051 - [etcd] Add cronjob with database compaction etcd database need to be periodically compacted and defrag This patch adds jobs to perform required maintenance actions automatically. 
Co-Authored-By: Oleh Hryhorov <[email protected]> Change-Id: I31b48bb198f7322c343c7d0171322759893e374f - [etcd] Switch etcd to staetefulset * Switch etcd to statefulset * Allow to use persistant volumes to store etcd data * Allow to deploy in clustered mode Change-Id: I2baf5bdd05c280067991bb8b7f00c887ffd95c20 - [rabbitmq] Use short rabbitmq node name The patch switches rabbitmq to use short node names, this will allow to do not care about internal domain name as it is can't be get from k8s API. Change-Id: I6d80bc4db4e497f7485fb5416818e0b61f821741 Related-Prod: PRODX-3456 - [rabbitmq] Set password for guest user rabbitmq Guest account is enabled by default and has access to all vhosts. Allow to change guest password during rabbitmq configuration. Change-Id: If23ab8d5587b13e628bce5bcb135a367324dca80 - [rabbitmq] Allow to bootstrap rabbitmq with initial config Prepare rabbitmq to be running in non clustered mode, in which it may be useful to bootstrap cluster with fresh data each time since we do not use durable queues in openstack that are stored on filesystem. The two new data strucutre in rabbitmq Values are added: users: auth: keystone_service: username: keystone password: password path: /keystone aux_conf: policies: - vhost: "keystone" name: "ha_ttl_keystone" definition: ha-mode: "all" ha-sync-mode: "automatic" message-ttl: 70000 priority: 0 apply-to: all pattern: '^(?!amq\.).*' Change-Id: Ia0dd1a8afe7b6e894bcbeafedf75131de0023df0 - [rabbitmq] Do not use hardcoded username in rabbitmq chown container Pick up UID from .Values.pod.security_context.server.pod.runAsUser as this is user that we are using to run service. Change-Id: Id4c53b0a882b027e320b08ed766cb473ab9ab535 - [rabbitmq] Update readiness/liveness commands Use lightweigh rabbitmqctl ping command to check readiness and liveness probe. check_port_connectivity - is not suatable for liveness as it does not check that instance of rabbitmq is actually running and we can authenticate. 
Change-Id: I6f157e9aef3450dba1ad7e0cb19491a41f700bbc - Decode url-encoded password for rabbit connection Resolve that access fails when the Rabbitmq password contains special characters by the changes below. https://pikachu.space/openstack/openstack-helm-infra/commit/6c5cc2fdf04d32fbf5fed2b90c6fdca60286d567 story: 2011222 task: 50999 Change-Id: I0cfc6e2228bc4b1327efb7da293849d6d1bbff19 - Run utils-defragOSDs.sh in ceph-osd-default container The Ceph defragosds cronjob script used to connect to OSD pods not explicitly specifying the ceph-osd-default container and eventually tried to run the defrag script in the log-runner container where the defrag script is mounted with 0644 permissions and shell fails to run it. Change-Id: I4ffc6653070dbbc6f0766b278acf0ebe2b4ae1e1 - Merge "Update deploy-env role" - Update deploy-env role - Use kubeadm configuration to not set taints on control plain nodes (instead of removing them after deployment). - Fix ssh client key permissions. - Update the Mariadb ingress test job so it is inherinted from the plain compute-kit test job. And also remote it from the check pipeline. Change-Id: I92c73606ed9b9161f39ea1971b3a7db7593982ff - [osh-selenium] Upgrade image to ubuntu_jammy + run tests in a read-only file system + change google-chrome data directory from ~/.config/google-chrome (which is immutable) to /tmp/google-chrome (writable), otherwise Chrome fails to launch + activate new headless mode as the old one will be soon removed https://developer.chrome.com/docs/chromium/new-headless Change-Id: I7d183b3f3d2fdc3086a5db5fa62473f777b9eb7a - Ingress-nginx controller upgrade for mariadb This PS bumps up ingress-nginx controller version to v1.11.2 in mariadb chart due to CVE vulnerability. nginx.tmpl from mariadb chart has been updated to match the latest 1.11.2 ingress-controller image. 
Change-Id: Ie2fd811f8123515f567afde62bbbb290d58dd1b2 - Merge "Add the ability to use custom Nagios plugins" - Add the ability to use custom Nagios plugins Change-Id: Ib309499140994448d7b3e0eef0c875c6edb3a2ac - Add retry logic to index creation script - Re-add the retry logic back to the index creation script. - Fixed small regex bug. - Also added function to lookup the id of a view, because the new views API requires an id to set the default view. - Set noglob to make sure the asterisks in the view names aren't expanded. Change-Id: Idfd56f09a739731f2ce3153b8fc284bb499a91d4 - Merge "[ceph] Remove dependencies on legacy provisioners" - [ceph] Remove dependencies on legacy provisioners The legacy RBD provisioner and the CephFS provisioner haven't been used in some time. This change removes them. Change-Id: I313774627fcbaed34445ebe803adf4861a0f3db5 - parse nova metadata in libvirt exporter Change-Id: Ib49968d919bda72caffd09d57a283587ae867fec - Merge "Updating script to use data views to support kibana 8.0 and beyond as some of api is now depreacated." - Updating script to use data views to support kibana 8.0 and beyond as some of api is now depreacated. Change-Id: I58d5c388cc0f6ba56c5fe646be352a0641e0661d - Upgrade env - K8s 1.30.3 - Helm 3.14.0 - Crictl 1.30.1 - Calico 3.27.4 - Cilium 1.16.0 - Ingress-nginx Helm chart 4.11.1 Change-Id: I3d5a3d855b0b4b0b66e42d94e1e9704f7f91f88b - Add 2024.1 overrides to some charts - Add 2024.1 overrides to those charts where there are overrides for previous releases. - Update some jobs to use 2024.1 overrides. - Update default images in grafana, postgresql, nagios, ceph-rgw, ceph-provisioners, kubernetes-node-problem-detector - Install tzdata package on K8s nodes. This is necessary for kubernetes-node-problem-detector chart which mounts /etc/localtime from hosts. 
Change-Id: I343995c422b8d35fa902d22abf8fdd4d0f6f7334 - Merge "Use predefined Helm repo in deployment scripts" - Update deploy-env role When generating keys and sharing them between nodes in a multinode env it is important that task which generates keys is finished before trying to use these keys on another node. The PR splits the Ansible block into two blocks and makes sure the playbook deploy-env is run with the linear strategy. Thus we can be sure that keys are first generated on all affected nodes and only then are used to setup tunnels and passwordless ssh. Change-Id: I9985855d7909aa5365876a24e2a806ab6be1dd7c - Use predefined Helm repo in deployment scripts Change-Id: Icd55637a8909cc261e6bde307e556476cacb1c1f - Merge "ovn: Use chart name in oci_image_registry secret" - Remove gateway node role With elasticsearch 8 gateway is no longer a valid node role https: //www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html#node-roles Change-Id: I4f522bc29b51645b6cfc16faaa3d250d7b18c51f - ovn: Use chart name in oci_image_registry secret The current values.yaml uses the service name to create separate secrets. However, helm-toolkit indexes into oci_image_registry using .Chart.Name and not $serviceName so the secrets are not used. Change-Id: I50f575f951c19ab728f9e40a73bc893e4f7356f2 - Add Flannel deployment to deploy-env role Change-Id: I72f3f29196ea1d433655c8862ac34718df18c7ea - Update kubernetes-entrypoint image Use quay.io/airshipit/kubernetes-entrypoint:latest-ubuntu_focal by default instead of 1.0.0 which is v1 formatted and not supported any more by docker. 
Change-Id: I6349a57494ed8b1e3c4b618f5bd82705bef42f7a - Align db scripts with sqlalchemy 2.0 Change-Id: I0b6c500e8257c333c16c15d7d338651ee5b2ca27 - [fluentd] Adjust configuration for v1.15 + prevent Fluentd from parsing its own logs and fix an issue with endless backslashes (https://github.com/fluent/fluentd/issues/2545) + increase chunk limit size + add storage for systemd plugin configuration + add pos_file parameter for the tail sources Change-Id: I7d6e54d2324e437c92e5e8197636bd6c54419167 - Test job for legacy OSH Ceph to Rook migration At the moment the recommended way of managing Ceph clusters is using Rook-Ceph operator. However some of the users still utilize legacy OSH Ceph* charts. Since Ceph is a critical part of the infrastructure we suggest a migration procedure and this PR is to test it. Change-Id: I837c8707b9fa45ff4350641920649188be1ce8da - Add Cilium deployment to deploy-env role Change-Id: I7cec2d3ff09ec3f85992162bbdb8c351660f7de8 - Merge "Couple tiny fixes for deploy-env role" - Couple tiny fixes for deploy-env role - typo in the setup of wireguard tunnel - wrong home directory when setup k8s client for root user Change-Id: Ia50f9f631b56538f72843112745525bc074e7948 - Setup passwordless ssh from primary to cluster nodes Here we add Ansible tasks to the deploy-env role to setup passwordless ssh from the primary node to K8s cluster nodes. This is necessary for some test scripts like for example Ceph migration script. 
Change-Id: I1cae1777d51635a19406ea054f4d83972e5fe43c - Update curator to 8.0.10 Update es curator to 8.0.10 and use appropriate config options for the es_client python module that has been incorporated in 8.0.9 https://github.com/elastic/curator/compare/v8.0.8...v8.0.9 https: //github.com/elastic/curator/blob/bd5dc942bbf173d5e456f1a3c5ca8bec1c0df2ac/docs/usage.rst#log-settings Change-Id: I88071162f5bc0716bfb098525ed2eacd48367d98 - Merge "Simplify ceph-adapter-rook" - Merge "Update deploy-env role to support root user" - Simplify ceph-adapter-rook - Do not deploy anything in the ceph namespace - Prepare admin key secret in the openstack namespace. Get admin key from the Ceph tools pod - Prepare Ceph client config with the mon_host taken from the rook-ceph-mon-endpoints configmap as recommended in the Rook documentation. Change-Id: Idd4134efab49de032a389283e611c4959a6cbf24 - Add value for rendering sidecar without feature Add option to deploy rendering sidecar without the k8s sidecar feature. Change-Id: I4b8052166bad8965df9daa6b28e320d9132150cd - Update deploy-env role to support root user Change-Id: I4126155eec03677cf29edfb47e80f54ab501705d - Add image rendering sidecar This PS is to add a sidecar for the grafana image renderer. Starting with Grafana v10 it will be necessary to use an image rendering plugin or remote renderer. https://grafana.com/docs/grafana/latest/setup-grafana/image-rendering/ Change-Id: I4ebdac84769a646fa8154f80aaa2692c9f89eeb8 - [openstack-exporter] Switch to jammy-based images Change-Id: I5326bb5231d3339d722ac67227e60bac592eb916 - Updating openvswitch to run as child process On containerd v1.7+ openvswitch restarts when containerd is restarted. To prevent this add tini and run OVS as a child process. 
Change-Id: I382dc2db12ca387b6d32304315bbee35d8e00562 - Use OSH helm plugin rabbitmq and memcached scripts Change-Id: Ia06ee7f159c6ed028ab75fcb5707ee6e42179d98 - Merge "Fix selenium test for additional compatibility." - Fix selenium test for additional compatibility. Change-Id: I2b5bd47d1a648813987ff10184d2468473454dfd - Bump K8s version to 1.29.5 Change-Id: I4a3c7a17f32b5452145e1677e3c5072875dc9111 - Merge "Escape special characters in password for DB connection" - Escape special characters in password for DB connection The passwords with special characters need to be URL encoded to be parsed correctly Change-Id: Ic7e0e55481d9ea5ce2621cf0d67e80b9ee43cde0 - Cleanup unused scripts Change-Id: I3bad13cc332fd439b3b56cfa5fc596255bc466f2 - Merge "Fix typo in the ovn chart" - Fix typo in the ovn chart Change-Id: Ib69c6af7b79578090e23ea574da0029cf3168e03 - Merge "Add configurable probes to rabbitmq" - Add configurable probes to rabbitmq Currently rabbitmq probes are hardcoded with no ability to customize via values. Signed-off-by: Ruslan Aliev <[email protected]> Change-Id: Ibbe84e68542296f3279c2e59986b9835fe301089 - [deploy-env] Add mirror to Docker configuration There are some docker_container tasks which pull docker images. This commit adds mirror configuration to daemon.json to prevent encountering issues related to the pull rate limit. + update tls job according to the changes in openstack-helm Depends-On: Ia58916e3dc5e0f50b476ece9bba31d8d656b3c44 Change-Id: Iac995500357336566cdbf9ddee0ae85b0b0347cd - [chromedriver] Loosen compatibility up with Chrome Chromedriver had strict version selection. This commit allows it to pick the closest patch version to google-chrome-stable Change-Id: I435985573f69ee4bb0f6009416452649f302c0fe - Add env variables to deploy from Helm repos These env variables will be defined in test jobs. By default we will deploy from local charts but some jobs will deploy from charts published on a HTTP server (local or public). 
- OSH_HELM_REPO - OSH_INFRA_HELM_REPO - DOWNLOAD_OVERRIDES Change-Id: Ic92b97eb5df4f7f8c4185c06654de4b4d890fbc6 - Remove ingress chart We have not been using it for a while since some time ago we switched to the upstream ingress-nginx. Change-Id: I2afe101cec2ddc562190812fc27bb3fad11469f1 - Install OSH Helm plugin Depends-On: I71ab6ad104beb491b5b15b7750e2fc0988db82bf Change-Id: I8f30fbdf94d76ef9fa2985a25c033df290995326 - [chromedriver] Change json api endpoint Choose a more reliable json file from the upstream to refer to. "Stable" versions of Chrome and Chromedriver became unsynchronized for some reason. Change-Id: I1688a867ea1987105e7a79c89ba7ea797819a12f - Merge "Clean up outdated deploy k8s scripts" - Update test jobs - Remove openstack-helm-infra-openstack-support* jobs. Instead of these jobs we run compute-kit, cinder and tls jobs defined in the openstack-helm repo. - Remove all experimental jobs since they are outdated and do not work. We will later add some of the test cases including apparmor, network policy, tenant Ceph and others. 
Change-Id: I8f3379c06b4595ed90de025d32c89de29614057d - Clean up outdated deploy k8s scripts Change-Id: I8481869a6547feae2ac057b65c8c4aecc2c1f505 - Enable job for DPDK Depends-On: I3ad5b63a0813761a23573166c5024e17d87f775d Change-Id: I4851767a79bc4571a0f38622fe309807b53a7504 - Merge "helm-toolkit: Enable custom secret annotations" - Merge "Add conf file for MongoDB" - Merge "make ovn db file path as configurable" - make ovn db file path as configurable Change-Id: I8b0f5c0bda2f1305e0460adc35e85b130f4cf9ff - Add conf file for MongoDB Change-Id: If6635557d4b0f65188da0d7450ad37630b811996 - helm-toolkit: Enable custom secret annotations Enable custom annotations for secrets [registry, tls] Change-Id: I811d5553f51ad2b26ea9d73db945c043ee2e7a10 - Merge "Update deploy-env role README.md" - Merge "Add 2023.2 Ubuntu Jammy overrides" - add custom job annotations snippet and use it Add the ability for charts that use helm-toolkit to allow the users to set custom annotations on jobs. Use the snippet in a generic way in the job templates provided by helm-toolkit. Change-Id: I5d60fe849e172c19d865b614c3c44ea618f92f20 Depends-On: I3991d6984563813d5a3a776eabd52e2e89933bd8 Signed-off-by: Doug Goldstein <[email protected]> - Update deploy-env role README.md Change-Id: Ia2ace3541be97577f1225d54417f6a287b7a8eb2 - Run more test jobs when helm-toolkit updated Specifically we would like at least the following deployments to be tested when helm-toolkit is updated - compute-kit - cinder - tls Change-Id: I3991d6984563813d5a3a776eabd52e2e89933bd8 - Merge "Add 2024.1 overrides" - Fix coredns resolver Forward requests for unknown names to 8.8.8.8 NOTE: Temporarily disable DPDK job which turned to be incompatible with this PR https://review.opendev.org/c/openstack/openstack-helm/+/914399 It wasn't tested with the DPDK job. 
Change-Id: I936fb1032a736f7b09ad50b749d37095cce4c392 - Add 2024.1 overrides Depends-On: Iadc9aec92b756de2ecfcb610e62c15bdbad4bb9e Change-Id: Icf98f9af863f60fa93ff70d2e8256810bed2b9f9 - Add 2023.2 Ubuntu Jammy overrides Change-Id: Ia23370d07faf1f8a1e05447459ce9872e8d4e875 - Rename dpdk job name to reflect Openstack version Change-Id: I9c04a60ae8b7fde35a8a970e3b74bcaad7bd564f - Merge "Add custom secret annotations helm-toolkit snippet" - Add custom secret annotations helm-toolkit snippet Change-Id: Ic61afcb78495b35ee42232b435f54344f0a0a057 - Bump RabbitMQ version 3.9.0 -> 3.13.0 Also - Update default Heat image to 2023.2 used for init and test jobs - Add overrides for - yoga-ubuntu_focal - zed-ubuntu_focal - zed-ubuntu_jammy - 2023.1-ubuntu_focal - 2023.1-ubuntu_jammy - 2023.2-ubuntu_jammy Change-Id: I516c655ea1937f9bd1d363ea86d35e05e3d54eed - Merge "Refactor deploy-env role" - Merge "Add custom pod annotations helm-toolkit snippet" - Refactor deploy-env role - Make it less mixed. Each task file deploys one feature. - Deploy Metallb - Deploy Openstack provider network gateway Change-Id: I41f0353b286f817cb562b3bd59992e4baa473568 - Merge "Bump containerd sandbox image from 3.6 to 3.9" - Merge "Update ovn controller init script" - Add custom pod annotations helm-toolkit snippet Change-Id: I898afae7945c03aec909e5edcd1c760c4d8ff9d6 - Update ovn controller init script - OVN init script must be able to attach an interface to the provider network bridge and migrate IP from the interface to the bridge exactly like Neutron OVS agent init script does it. - OVN init script sets gateway option to those OVN controller instances which are running on nodes with l3-agent=enabled label. 
Change-Id: I24345c1f85c1e75af6e804f09d35abf530ddd6b4 - Bump containerd sandbox image from 3.6 to 3.9 Fixes the following kubeadm warning: W0321 01:33:46.409134 14953 checks.go:835] detected that the sandbox image "registry.k8s.io/pause:3.6" of the container runtime is inconsistent with that used by kubeadm. It is recommended that using "registry.k8s.io/pause:3.9" as the CRI sandbox image. Change-Id: I8129a6e9ad3acdf314e2853851cd5274855e3209 - [rook-ceph] Add a script to migrate Ceph clusters to Rook This change adds a deployment script that can be used to migrate a Ceph cluster deployed with the legacy openstack-helm-infra Ceph charts to Rook. This process is disruptive. The Ceph cluster goes down and comes back up multiple times during the migration, but the end result is a Rook-deployed Ceph cluster with the original cluster FSID and all OSD data intact. Change-Id: Ied8ff94f25cd792a9be9f889bb6fdabc45a57f2e - Fix registry bootstrap values The quay.io/airshipit/kubernetes-entrypoint:v1.0.0 image format is deprecated and not supported any more by the docker registry. This is temporary fix to download the image from third party repo until we update the quay.io/airshipit/kubernetes-entrypoint:v1.0.0. The deprecation message is as follows: [DEPRECATION NOTICE] Docker Image Format v1 and Docker Image manifest version 2, schema 1 support is disabled by default and will be removed in an upcoming release. Suggest the author of quay.io/airshipit/kubernetes-entrypoint:v1.0.0 to upgrade the image to the OCI Format or Docker Image manifest v2, schema 2. More information at https://docs.docker.com/go/deprecated-image-specs/ The docker-registry container must start not earlier than docker-images PVC is bound. 
Change-Id: I6bff98aa7d0b23e13a17a038f3039b7956703d40 - Fixing rolebindings generation for init container This part has to use the same configuration as init container: see line 96 Change-Id: I06c1f3ad586863d4dcfab559d13a592fc576f857 - Merge "Update Ceph images to patched 18.2.2 and restore debian-reef repo" - Update Ceph images to patched 18.2.2 and restore debian-reef repo This change updates the Ceph images to 18.2.2 images patched with a fix for https://tracker.ceph.com/issues/63684. It also reverts the package repository in the deployment scripts to use the debian-reef directory on download.ceph.com instead of debian-18.2.1. The issue with the repo that prompted the previous change to debian-18.2.1 has been resolved and the more generic debian-reef directory may now be used again. Change-Id: I85be0cfa73f752019fc3689887dbfd36cec3f6b2 - Include values_overrides for OpenStack components Fixes issue where override files for OS charts were missing due to specifying the wrong project directory. Change-Id: I4af6715a33c7de43068ed76a8115c12a2c0969ed - Merge "bugfix: updated permissions of ceph user created to allow rbd profile" - [ceph-osd] Allow lvcreate to wipe existing LV metadata In some cases when OSD metadata disks are reused and redeployed, lvcreate can fail to create a DB or WAL volume because it overlaps an old, deleted volume on the same disk whose signature still exists at the offsets that trigger detection and abort the LV creation process when the user is asked whether or not to wipe to old signature. Adding a --yes argument to the lvcreate command automatically answers yes to the wipe question and allows lvcreate to wipe the old signature. 
Change-Id: I0d69bd920c8e62915853ecc3b22825fa98f7edf3 - Workaround for debian-reef folder issue This PS changes ceph repo to debian-18.2.1 from debian-reef due to some issues with debian-reef folder at https://download.ceph.com/ Change-Id: I31c501541b54d9253c334b56df975bddb13bbaeb - bugfix: updated permissions of ceph user created to allow rbd profile Change-Id: I9049e4312aa6cb92a832d5100ba1da995233c48e - [mariadb] Switch to ingress-less mariadb This PS switches mariadb to use primary service by default instead of ingress based deployment. The primary service that is getting created and automatically updated based on the leader election process in start.py entrypoint script. Mariadb primary service was introduced by this PS: https://review.opendev.org/c/openstack/openstack-helm-infra/+/905797 Change-Id: I4992276d0902d277a7a81f2730c22635b15794b0 - Merge "Remove unused nodesets" - Add compute-kit job with DPDK enabled + add role for enabling hugepages Change-Id: I89d3c09ea3bedcba6cb51178c8d1ac482a57af01 Depends-On: I2f9d954258451f64eb87d03affc079b71b00f7bd - Merge "[deploy-env] Docker env setup" - Merge "Remove some aio jobs" - [deploy-env] Docker env setup This PS adds connection reset for ansible session letting zuul user to use newly installed docker environment without sudo Change-Id: I37a2570f1dd58ec02338e07c32ec15eacbfaf4b6 - Remove calico chart Tigera provides tools for managing Calico deployments (helm chart, operator and even plain kubectl manifest). Also there are plenty of other networking solutions on the market and it looks like users can choose on their own the CNI implementation. There have not been many contributions to this chart for quite some time and we don't use this chart in any test jobs. In the deploy-env role we use the upstream Calico manifest. 
Change-Id: I6005e85946888c52e0d273c61d38f4787e43c20a - Remove unused nodesets Change-Id: Ifc5ea6a83729fc2313c209f683ef7476d6a14272 - Remove some aio jobs These two jobs openstack-helm-infra-aio-monitoring and openstack-helm-infra-aio-logging were only needed for backward compatibility. Depends-On: I9c3b8cd18178aa57ce44564490ef1b61f275ae29 Change-Id: I09d0e48128a3fd98fa9148b8e520df75d6e5be50 - Merge "Bump Calico version to v3.27.0" - Fix prevent trailing whitespace lint command Recently we added a jpg file to OSH documentation but the lint job didn't run due to the job configuration. But then for the next PR link job did run and failed due to trailing whitespace in the jpg file. Change-Id: I9abf8f93a4566411076190965f282375846dc5db - Bump Calico version to v3.27.0 Change-Id: I8daa54e70c66cec41733d6b9fd5c9dd4597ff9c1 - Merge "Use upstream ingress-nginx chart" - Use upstream ingress-nginx chart Change-Id: I90a1a1e27f0b821bbecfe493057eada81d4f9424 - Merge "Use containerized Openstack client" - Merge "[openvswitch] Add overrides values for dpdk" - Use containerized Openstack client Change-Id: I17c841b74bf92fc3ac375404b27fa2562603604f - [openvswitch] Add overrides values for dpdk Change-Id: I756f35f1251244bc76f87a18a1a2e51f13a8c010 - [ceph] Update Ceph images to Jammy and Reef 18.2.1 This change updates all Ceph images in openstack-helm-infra to ubuntu_jammy_18.2.1-1-20240130. Change-Id: I16d9897bc5f8ca410059a5f53cc637eb8033ba47 - [ceph-rook] Update Rook and increase ceph-mon memory limit This change updates Rook to the 1.13.3 release. It also increases the memory limit for ceph-mon pods deployed by Rook to prevent pod restarts due to liveness probe failures that sometimes result from probes causing ceph-mon pods to hit their memory limit. 
Change-Id: Ib7d28fd866a51cbc5ad0d7320ae2ef4a831276aa - Merge "[mariadb] Add mariadb-server-primary service" - [elasticsearch-exporter] Update to the latest v1.7.0 The current version of the exporter is outdated, switch to the upstream + rename --es.snapshots to --collector.snapshots (v1.7.0) and --es.cluster_settings to --collector.clustersettings (v1.6.0) Change-Id: I4b496d859a4764fbec3271817391667a53286acd - [mariadb] Add mariadb-server-primary service This PS adds mariadb-server-primary service that is getting created and automatically updated based on the leader election process in start.py entrypoint script. Change-Id: I1d8a8db0ce8102e5e23f7efdeedd139726ffff28 Signed-off-by: Sergiy Markin <[email protected]> - Merge "Change default ingress path type to prefix" - Change default ingress path type to prefix Due to CVE-2022-4886 the default pathType for an ingress should be …
Describe the bug
I set up an EFK stack for gathering logs from my Kubernetes pods based on this tutorial: https://mherman.org/blog/logging-in-kubernetes-with-elasticsearch-Kibana-fluentd/ on a MicroK8s single-node cluster. Everything is up and running, and I can connect Kibana to Elasticsearch and see the indexes, but in the Discover section of Kibana there are no logs related to my pods, only kubelet logs.
When I checked the Fluentd logs, I saw that they are full of backslashes:
There are many more backslashes, but I only copied this much to show what the log looks like.
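The runaway backslashes are the signature of a feedback loop: Fluentd tails its own container log file, JSON-escapes each line, writes the result back to its log, and then tails that output again. A small Python sketch (purely illustrative, not Fluentd code) shows how repeated JSON encoding multiplies escape characters on every pass:

```python
import json

# a log line that already contains escaped quotes, as container logs often do
line = 'error parsing "\\"msg\\""'

for i in range(4):
    # each re-ingestion pass JSON-encodes the previous output:
    # every quote gains a backslash, and every backslash doubles
    line = json.dumps(line)
    print(f"pass {i + 1}: {line.count(chr(92))} backslashes")
```

Each pass more than doubles the backslash count (2 → 8 → 22 → 52 → 114 here), which is why the log fills up with them so quickly once the loop starts.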
Your Environment
fluent/fluentd-kubernetes-daemonset:v1.4-debian-elasticsearch
and also v1.3,
but the results were the same.

Your Configuration
Based on the tutorial mentioned earlier, I am using two config files to set up Fluentd:
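The key change suggested in the comments above is to exclude Fluentd's own container logs from the tail source so it stops re-ingesting them. A sketch of the relevant `<source>` block (the `tag` and `read_from_head` settings are typical daemonset defaults, not taken from this thread; adjust for your setup):

```
<source>
  @type tail
  @id in_tail_container_logs
  path /var/log/containers/*.log
  # keep Fluentd from tailing (and re-escaping) its own log file
  exclude_path ["/var/log/containers/fluentd*"]
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  read_from_head true
</source>
```

With `exclude_path` in place, the feedback loop is broken and the backslash-filled entries stop accumulating.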