|
26 | 26 | - [Scenario 7. Risk of porting existing sidecars to the new mechanism naively](#scenario-7-risk-of-porting-existing-sidecars-to-the-new-mechanism-naively)
|
27 | 27 | - [Design Details](#design-details)
|
28 | 28 | - [Backward compatibility](#backward-compatibility)
|
| 29 | + - [kubectl changes](#kubectl-changes) |
| 30 | + - [Without sidecar containers support](#without-sidecar-containers-support) |
| 31 | + - [With sidecar container feature](#with-sidecar-container-feature) |
29 | 32 | - [Resources calculation for scheduling and pod admission](#resources-calculation-for-scheduling-and-pod-admission)
|
30 | 33 | - [Exposing Pod Resource requirements](#exposing-pod-resource-requirements)
|
31 | 34 | - [Goals of exposing the Pod.TotalResourcesRequested field](#goals-of-exposing-the-podtotalresourcesrequested-field)
|
32 | 35 | - [Implementation details](#implementation-details)
|
33 | 36 | - [Notes for implementation](#notes-for-implementation)
|
34 | 37 | - [Resources calculation and Pod QoS evaluation](#resources-calculation-and-pod-qos-evaluation)
|
| 38 | + - [Resource calculation and version skew](#resource-calculation-and-version-skew) |
35 | 39 | - [Topology and CPU managers](#topology-and-cpu-managers)
|
36 | 40 | - [Termination of containers](#termination-of-containers)
|
37 | 41 | - [Other](#other)
|
|
81 | 85 |
|
82 | 86 | Items marked with (R) are required *prior to targeting to a milestone / release*.
|
83 | 87 |
|
84 |
| -- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) |
85 |
| -- [ ] (R) KEP approvers have approved the KEP status as `implementable` |
86 |
| -- [ ] (R) Design details are appropriately documented |
87 |
| -- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors) |
| 88 | +- [X] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) |
| 89 | +- [X] (R) KEP approvers have approved the KEP status as `implementable` |
| 90 | +- [X] (R) Design details are appropriately documented |
| 91 | +- [X] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors) |
88 | 92 | - [ ] e2e Tests for all Beta API Operations (endpoints)
|
89 | 93 | - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
|
90 | 94 | - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
|
@@ -243,9 +247,8 @@ condition is represented with the field `Started` of `ContainerStatus` type. See
|
243 | 247 | the section ["Pod startup completed condition"](#pod-startup-completed-condition)
|
244 | 248 | for considerations on picking this signal.
|
245 | 249 |
|
246 |
| -As part of the KEP, init containers and regular containers will be split into |
247 |
| -two different types. The field `restartPolicy` will only be introduced on init |
248 |
| -containers. The only supported value proposed in this KEP is `Always`. No other |
| 250 | +The field `restartPolicy` will only be accepted on init |
| 251 | +containers as part of this KEP. The only supported value proposed in this KEP is `Always`. No other |
249 | 252 | values will be defined as part of this KEP. Moreover, the field will be
|
250 | 253 | nullable so the default value will be "no value".
|
251 | 254 |
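The validation rules in the paragraph above can be sketched in Go as follows. This is a minimal illustration, not the actual Kubernetes validation code: `ContainerKind`, its constants, and `validateRestartPolicy` are hypothetical names introduced here, and the nullable field is modeled as a `*string`.

```go
package main

import (
	"errors"
	"fmt"
)

// ContainerKind records where a container is declared in the Pod spec.
// Hypothetical type for illustration; not part of the Kubernetes API.
type ContainerKind int

const (
	RegularContainer ContainerKind = iota
	InitContainer
)

// validateRestartPolicy applies the rules described in the text:
//   - the field is nullable, so nil ("no value") is always valid;
//   - a set value is only accepted on init containers;
//   - the only supported value is "Always".
func validateRestartPolicy(kind ContainerKind, policy *string) error {
	if policy == nil {
		return nil
	}
	if kind != InitContainer {
		return errors.New("restartPolicy may only be set on init containers")
	}
	if *policy != "Always" {
		return fmt.Errorf("unsupported restartPolicy %q: only \"Always\" is supported", *policy)
	}
	return nil
}

func main() {
	always := "Always"
	fmt.Println(validateRestartPolicy(InitContainer, &always))    // accepted: a sidecar
	fmt.Println(validateRestartPolicy(InitContainer, nil))        // accepted: a plain init container
	fmt.Println(validateRestartPolicy(RegularContainer, &always)) // rejected
}
```

Modeling the field as a pointer keeps "no value" distinct from any concrete policy, which matches the nullable default described above.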
|
@@ -274,7 +277,7 @@ containers in the Pod. This intent to solve the issue
|
274 | 277 | https://github.com/kubernetes/kubernetes/issues/111356
|
275 | 278 |
|
276 | 279 | As part of this KEP we will also enable the following for sidecar containers (these will
|
277 |
| -not be enabled for other init containers): |
| 280 | +not be allowed for other init containers): |
278 | 281 | - `PostStart` and `PreStop` lifecycle handlers for sidecar containers
|
279 | 282 | - All probes (startup, readiness, liveness)
|
280 | 283 | - Readiness probes of sidecars will contribute to determine the whole Pod
|
@@ -525,6 +528,91 @@ Behaviors they can rely on:
|
525 | 528 |
|
526 | 529 | These potential incompatibilities will be documented.
|
527 | 530 |
|
| 531 | +### kubectl changes |
| 532 | + |
| 533 | +Today, `kubectl get pods` omits all init containers from its output once the Pod is Running. |
| 534 | +As part of this KEP, the output will be extended to include the status of sidecar containers. |
| 535 | +#### Without sidecar containers support |
| 536 | + |
| 537 | +For the Pod: |
| 538 | + |
| 539 | +``` |
| 540 | +initContainers: |
| 541 | + - name: init-config |
| 542 | +containers: |
| 543 | + - name: sidecar-1 |
| 544 | + - name: sidecar-2 |
| 545 | + - name: main |
| 546 | +``` |
| 547 | + |
| 548 | +Initialization (Waiting) |
| 549 | + |
| 550 | +``` |
| 551 | +NAME READY STATUS RESTARTS AGE |
| 552 | +test 0/3 Init:0/1 0 0s |
| 553 | +``` |
| 554 | +Running |
| 555 | + |
| 556 | +``` |
| 557 | +NAME READY STATUS RESTARTS AGE |
| 558 | +test 3/3 Running 0 35s |
| 559 | +``` |
| 560 | + |
| 561 | +#### With sidecar container feature |
| 562 | + |
| 563 | +For the Pod: |
| 564 | + |
| 565 | +``` |
| 566 | +initContainers: |
| 567 | + - name: init-config |
| 568 | + - name: sidecar-1 |
| 569 | + restartPolicy: Always |
| 570 | + - name: sidecar-2 |
| 571 | + restartPolicy: Always |
| 572 | +containers: |
| 573 | + - name: main |
| 574 | +``` |
| 575 | + |
| 576 | +What we have today: |
| 577 | + |
| 578 | +Initialization (Waiting) |
| 579 | + |
| 580 | +``` |
| 581 | +NAME READY STATUS RESTARTS AGE |
| 582 | +test 0/1 Init:0/3 0 0s |
| 583 | +NAME READY STATUS RESTARTS AGE |
| 584 | +test 0/1 Init:1/3 0 5s |
| 585 | +NAME READY STATUS RESTARTS AGE |
| 586 | +test 0/1 Init:2/3 0 10s |
| 587 | +``` |
| 588 | + |
| 589 | +Running |
| 590 | + |
| 591 | +``` |
| 592 | +NAME READY STATUS RESTARTS AGE |
| 593 | +test 1/1 Running 0 35s |
| 594 | +``` |
| 595 | + |
| 596 | +What will be returned as part of the KEP implementation: |
| 597 | + |
| 598 | +Initialization (Waiting) |
| 599 | + |
| 600 | +``` |
| 601 | +NAME READY STATUS RESTARTS AGE |
| 602 | +test 0/3 Init:0/3 0 0s |
| 603 | +NAME READY STATUS RESTARTS AGE |
| 604 | +test 0/3 Init:1/3 0 5s |
| 605 | +NAME READY STATUS RESTARTS AGE |
| 606 | +test 0/3 Init:2/3 0 10s |
| 607 | +``` |
| 608 | + |
| 609 | +Running |
| 610 | + |
| 611 | +``` |
| 612 | +NAME READY STATUS RESTARTS AGE |
| 613 | +test 3/3 Running 0 35s |
| 614 | +``` |
| 615 | + |
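The change to the READY column shown in the listings above can be sketched in Go. This is an illustrative model only, not the kubectl printer code: `containerStatus` and `readyColumn` are hypothetical names, and the sketch assumes a sidecar is an init container with `restartPolicy: Always`.

```go
package main

import "fmt"

// containerStatus is a simplified, hypothetical stand-in for a container's status.
type containerStatus struct {
	Name    string
	Ready   bool
	Sidecar bool // true for an init container with restartPolicy: Always
}

// readyColumn builds the READY cell ("ready/total").
// Regular containers always count; init containers count only if they are sidecars.
func readyColumn(initStatuses, statuses []containerStatus) string {
	ready, total := 0, len(statuses)
	for _, s := range statuses {
		if s.Ready {
			ready++
		}
	}
	for _, s := range initStatuses {
		if !s.Sidecar {
			continue // plain init containers stay out of the READY column
		}
		total++
		if s.Ready {
			ready++
		}
	}
	return fmt.Sprintf("%d/%d", ready, total)
}

func main() {
	inits := []containerStatus{
		{Name: "init-config"},
		{Name: "sidecar-1", Sidecar: true},
		{Name: "sidecar-2", Sidecar: true},
	}
	main := []containerStatus{{Name: "main"}}
	fmt.Println(readyColumn(inits, main)) // 0/3 during initialization

	inits[1].Ready, inits[2].Ready, main[0].Ready = true, true, true
	fmt.Println(readyColumn(inits, main)) // 3/3 once everything is ready
}
```

With the sidecar flag cleared on every init container, the same function reproduces the "what we have today" output of `0/1` and `1/1`.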
528 | 616 | ### Resources calculation for scheduling and pod admission
|
529 | 617 |
|
530 | 618 | When calculating whether Pod will fit the Node, resource limits and requests are
|
@@ -586,7 +674,7 @@ represent the actual resources in use. The KEP notes that:
|
586 | 674 | > `Status.ContainerStatuses[i].ResourcesAllocated` when considering available
|
587 | 675 | > space on a node.
|
588 | 676 |
|
589 |
| -We can introduce `ContainerUse` to represent this value: |
| 677 | +We will introduce `ContainerUse` to represent this value: |
590 | 678 |
|
591 | 679 | ```
|
592 | 680 | ContainerUse(i) = Max(Spec.Containers[i].Resources, Status.ContainerStatuses[i].ResourcesAllocated)
|
@@ -640,6 +728,11 @@ allow a pod to schedule.
|
640 | 728 | - Eliminate duplication of the pod resource requirement calculation within
|
641 | 729 | `kubelet` and `kube-scheduler`.
|
642 | 730 |
|
| 731 | +Note: in order to support the [Downgrade strategy](#downgrade-strategy), the scheduler |
| 732 | +will ignore the feature gate when calculating resources. This |
| 733 | +prevents overbooking of nodes, which could happen if the scheduler ignored sidecars when calculating resources |
| 734 | +and scheduled too many Pods on a Node that has the feature gate enabled. |
| 735 | + |
643 | 736 | #### Implementation details
|
644 | 737 |
|
645 | 738 | We propose making two changes to satisfy the two primary consumers of the
|
@@ -699,6 +792,24 @@ The logic in
|
699 | 792 | not likely will need changes, but needs to be tested with the sidecar
|
700 | 793 | containers.
|
701 | 794 |
|
| 795 | +#### Resource calculation and version skew |
| 796 | + |
| 797 | +If there is version skew between the scheduler and the kubelet, or if the |
| 798 | +scheduler and the kubelet have different values set for the `SidecarContainers` feature gate, |
| 799 | +the calculation of resources required for a Pod will differ between the scheduler |
| 800 | +and the kubelet when a sidecar container is created. |
| 801 | + |
| 802 | +When the scheduler "knows" about sidecars and the kubelet doesn't, issues |
| 803 | +are unlikely. The scheduler will calculate a resource usage for the Pod that |
| 804 | +is equal to or greater than what the kubelet requires to run the Pod, so there will be |
| 805 | +no overbooking. |
| 806 | + |
| 807 | +If the scheduler has the `SidecarContainers` feature gate disabled, a Pod that has a sidecar |
| 808 | +container will not be admitted because validation of the new field will fail. |
| 809 | + |
| 810 | +We will recommend in the documentation not to disable the feature gate on the scheduler |
| 811 | +while any Pods with sidecar containers are running. |
| 812 | + |
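The claim that a sidecar-aware scheduler never under-counts relative to a sidecar-unaware kubelet can be illustrated with a small Go sketch. Both formulas below are simplified assumptions for a single resource (CPU in millicores) and are not the exact calculation defined elsewhere in this KEP; `InitContainer`, `legacyRequest`, and `sidecarAwareRequest` are hypothetical names.

```go
package main

import "fmt"

// InitContainer is a simplified, hypothetical model with one resource request.
type InitContainer struct {
	Name     string
	CPUMilli int64
	Sidecar  bool // true when restartPolicy: Always is set
}

// legacyRequest models a component that does not know about sidecars:
// max(largest init container, sum of regular containers).
func legacyRequest(inits []InitContainer, regular []int64) int64 {
	var maxInit, sum int64
	for _, c := range inits {
		if c.CPUMilli > maxInit {
			maxInit = c.CPUMilli
		}
	}
	for _, r := range regular {
		sum += r
	}
	if maxInit > sum {
		return maxInit
	}
	return sum
}

// sidecarAwareRequest models a component that knows sidecars keep running:
// each init step is charged together with the sidecars started before it, and
// the steady state is the sum of all sidecars plus all regular containers.
func sidecarAwareRequest(inits []InitContainer, regular []int64) int64 {
	var sidecars, peak int64
	for _, c := range inits {
		use := c.CPUMilli + sidecars
		if c.Sidecar {
			sidecars += c.CPUMilli
			use = sidecars
		}
		if use > peak {
			peak = use
		}
	}
	total := sidecars
	for _, r := range regular {
		total += r
	}
	if peak > total {
		return peak
	}
	return total
}

func main() {
	inits := []InitContainer{
		{Name: "init-config", CPUMilli: 100},
		{Name: "sidecar-1", CPUMilli: 200, Sidecar: true},
		{Name: "sidecar-2", CPUMilli: 300, Sidecar: true},
	}
	regular := []int64{500} // main
	fmt.Println("sidecar-unaware:", legacyRequest(inits, regular))
	fmt.Println("sidecar-aware:  ", sidecarAwareRequest(inits, regular))
}
```

Under these assumptions the sidecar-aware result is always at least the legacy result, because it charges at least as much for every init step and for the steady state. That is the direction of skew in which no overbooking occurs.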
702 | 813 | ### Topology and CPU managers
|
703 | 814 |
|
704 | 815 | [NodeResourcesFit scheduler plugin](https://github.com/kubernetes/kubernetes/blob/release-1.26/pkg/scheduler/framework/plugins/noderesources/fit.go#L160-L176)
|
@@ -984,26 +1095,31 @@ First, there will be no effect on any workload that doesn't use a new field. Any
|
984 | 1095 | combination of feature gate enabled/disabled or version skew will work as usual
|
985 | 1096 | for that workload.
|
986 | 1097 |
|
987 |
| -So when the new functionality wasn't yet used, downgrade will not be affected. |
| 1098 | +When the new functionality has not yet been used, downgrade will not be affected. |
988 | 1099 |
|
989 | 1100 | Due to the new field added to `initContainers` to turn them into sidecars,
|
990 | 1101 | downgrading to the version without this feature will make all Pods using this
|
991 |
| -flag unscheduleable. New Pods will be rejected by the control plane and |
992 |
| -all kubelets. Pods that has already been created will not be rejected by control |
993 |
| -plane, but once reaching the kubelet, that has this feature disabled or which |
994 |
| -is old, kubelet will reject the Pod on unmarshalling. |
995 |
| - |
996 |
| -**Note**, we tested kubelet behavior. For the control plane we may need |
997 |
| -to implement a new logic to reject such Pods when feature gate got turned off. |
| 1102 | +flag unschedulable. New Pods will be rejected by the control plane. |
| 1103 | + |
| 1104 | +Pods that have already been created will be rejected neither by the control |
| 1105 | +plane nor by the kubelet. Both will treat the sidecar container as an init container. |
| 1106 | +This may render the Pod unusable, as it will be stuck in initialization forever |
| 1107 | +because sidecar containers never exit. We will document this behavior for the Alpha release. |
| 1108 | +When promoting the feature to Beta, we will revisit the situation. If we see this as |
| 1109 | +a major issue, we will need to wait for 3 releases so that the kubelet has the logic |
| 1110 | +to reject such Pods when the feature gate is disabled, keeping downgrade safe. |
| 1111 | + |
| 1112 | +**Note**: for the control plane and the kubelet, we will implement logic to reject Pods |
| 1113 | +with sidecar containers when the feature gate is turned off. |
998 | 1114 | See [Upgrade/downgrade testing](#upgradedowngrade-testing) section.
|
999 | 1115 |
|
1000 | 1116 | Workloads will have to be deleted and recreated with the old way of handling
|
1001 | 1117 | sidecars. Once there is no more Pods using sidecars, node can be downgraded
|
1002 | 1118 | without side effects.
|
1003 | 1119 |
|
1004 |
| -If downgrade hapenning from the version with the feature enabled to the previous |
| 1120 | +If the downgrade is from the version with the feature enabled to a previous |
1005 | 1121 | version that has this feature support, but feature gate is disabled, kubelet
|
1006 |
| -and/or control place will reject these Pods. |
| 1122 | +and control plane will reject these Pods. |
1007 | 1123 |
|
1008 | 1124 | **Note**, downgrade requires node drain. So we will not support scenarios when
|
1009 | 1125 | Pod already running on the node will need to be handled by the restarted
|
|