test: Fix flaky network test for service endpoint updates #131372
base: master
Conversation
The test 'Networking Granular Checks: Services should update endpoints: http' was failing intermittently due to a race condition between pod deletion and endpoint updates. This change adds an explicit wait for endpoint updates after pod deletion, improves error logging for easier debugging, sets an appropriate timeout for the endpoint update checks, and provides better status reporting during the test. Fixes kubernetes#131370
Please note that we're already in Test Freeze. Fast forwards are scheduled to happen every 6 hours, whereas the most recent run was: Fri Apr 18 13:34:58 UTC 2025.
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Hi @aryasoni98. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Startup probe failures during container startup are expected and should be logged as Normal events rather than Warning events. This is because these failures are part of the normal container startup process and do not indicate a problem that requires operator attention. This change modifies the prober to:
- Log startup probe failures as Normal events
- Keep liveness and readiness probe failures as Warning events
- Add unit tests to verify the event type behavior

Fixes kubernetes#131370

Signed-off-by: Arya Soni <[email protected]>
Keywords which can automatically close issues and at(@) or hashtag(#) mentions are not allowed in commit messages. The list of commits with invalid commit messages:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: aryasoni98. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
What type of PR is this?
/kind bug
/kind flake
/kind failing-test
What this PR does / why we need it:
This PR fixes a flaky test in the networking e2e test suite. The test was failing intermittently due to a race condition where it would check for endpoint updates too quickly after pod deletion, before the endpoint controller had time to update the endpoints.
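For illustration, here is a minimal sketch of the racy pattern being fixed; the function and variable names (racyEndpointCheck, svcName) are hypothetical and not taken from the actual test code.

```go
package e2e

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// racyEndpointCheck shows the flaky pattern: it reads the Endpoints object
// immediately after deleting a backing pod, so it can still observe the
// pre-deletion address list before the endpoint controller has caught up.
func racyEndpointCheck(ctx context.Context, c kubernetes.Interface, ns, podName, svcName string) (int, error) {
	if err := c.CoreV1().Pods(ns).Delete(ctx, podName, metav1.DeleteOptions{}); err != nil {
		return 0, err
	}
	// No wait here: the endpoint controller may not have processed the
	// deletion yet, so this count can be stale.
	ep, err := c.CoreV1().Endpoints(ns).Get(ctx, svcName, metav1.GetOptions{})
	if err != nil {
		return 0, err
	}
	count := 0
	for _, subset := range ep.Subsets {
		count += len(subset.Addresses)
	}
	return count, nil
}
```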
Changes made:
- Added an explicit wait for endpoint updates, using wait.PollWithContext after pod deletion
- Improved error logging to show both the current and expected endpoint lists
- Set an appropriate timeout for the endpoint update checks
- Added better status reporting during the test

Which issue(s) this PR fixes:
Fixes #131370
Special notes for your reviewer:
The main change is the addition of a wait condition after pod deletion. This ensures that we properly wait for the endpoint controller to update the endpoints before proceeding with the test. The wait condition checks every second for up to 30 seconds, which should be more than enough time for the endpoint controller to process the pod deletion and update the endpoints.
The error logging has also been improved to make it easier to diagnose any future issues. If the endpoints don't match the expected state, the logs will now show both the current and expected endpoint lists.
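As a rough sketch of the wait condition described above, assuming a hypothetical helper name (waitForEndpointCount) rather than the actual test code:

```go
package e2e

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForEndpointCount polls every second, for up to 30 seconds, until the
// named Endpoints object reports the expected number of ready addresses,
// logging both the current and expected counts while it waits.
func waitForEndpointCount(ctx context.Context, c kubernetes.Interface, ns, svcName string, want int) error {
	return wait.PollWithContext(ctx, 1*time.Second, 30*time.Second, func(ctx context.Context) (bool, error) {
		ep, err := c.CoreV1().Endpoints(ns).Get(ctx, svcName, metav1.GetOptions{})
		if err != nil {
			// Treat lookup errors as transient and keep polling.
			return false, nil
		}
		got := 0
		for _, subset := range ep.Subsets {
			got += len(subset.Addresses)
		}
		if got != want {
			fmt.Printf("endpoints %s/%s: got %d addresses, want %d; retrying\n", ns, svcName, got, want)
			return false, nil
		}
		return true, nil
	})
}
```

Returning false with a nil error on a mismatch or a transient lookup failure keeps the poll going until the 30-second timeout, which matches the check-every-second behavior described above.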
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: