
Memes

Memes application service serving meme.sh and www.meme.sh. Built from apps/memes/axum-memes (Rust backend + Askama templates) plus the apps/memes/astro-memes static frontend, packaged into the kbve/memes image. Deployment lives at apps/kube/memes/manifest/.

Public traffic enters via Cloudflare → CNAME to gateway.kbve.com (142.132.206.71, the Cilium gateway LB) → kbve-gateway https-memes listener (hostname meme.sh, cert memes-tls covers both meme.sh and www.meme.sh via SAN) → memes-route (spec.hostnames = [meme.sh, www.meme.sh]) → memes-service:4321.

The cert lives in memes/memes-tls and is consumed cross-namespace by kbve-gateway via memes-tls-from-kbve-gateway ReferenceGrant.

memes-cert is RSA 2048, issued by the letsencrypt-http ClusterIssuer (HTTP-01). The algorithm matches the cert already in memes-tls; do not change to ECDSA without first verifying that the ACME HTTP-01 challenge path on Cilium gateway is healthy (see runbook below) — otherwise cert-manager flags IncorrectCertificate and queues a reissue that can never complete.

Cilium gateway gotcha — never split apex + www across listeners with one cert


The original setup had two listeners (https-memes for meme.sh, https-memes-www for www.meme.sh), both pointing at the same memes-tls Secret. Cilium derives the Envoy filter chain SNI list from the cert SAN (not the listener hostname), so each listener emitted a filter chain with the same SNI array [meme.sh, www.meme.sh]. Envoy rejects a duplicate filter chain match, so requests fell through to the default 404 handler. The result was 73 days of meme.sh returning nginx-style 404s while kubectl get gateway reported Programmed=True / Accepted=True / ResolvedRefs=True and the memes-service cluster sat healthy with zero cx_total.

A second-order victim was ACME HTTP-01 — cm-acme-http-solver-* HTTPRoutes share the same listener, so the cert reissue loop also couldn’t complete (Ready=False, Reason=IncorrectCertificate).

Rule for this gateway: if one cert covers multiple SANs, use one listener with the apex hostname and let the HTTPRoute spec.hostnames enumerate every hostname the route serves. If you need per-host listeners, issue one cert per hostname so the SNI lists don’t collide.
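The rule can be checked before a manifest change ships. The following is a minimal sketch of such a check, not Cilium's actual logic: it models the behaviour described above, where each listener's SNI list comes from the SANs of the cert it references, and flags listener pairs whose lists overlap. The function name, the dict layout, and the per-host Secret names (memes-apex-tls, memes-www-tls) are hypothetical.

```python
from itertools import combinations

# SANs per TLS Secret. memes-tls is from this page; the two per-host
# Secrets are hypothetical names used only to illustrate the alternative.
CERT_SANS = {
    "memes-tls": ["meme.sh", "www.meme.sh"],
    "memes-apex-tls": ["meme.sh"],
    "memes-www-tls": ["www.meme.sh"],
}


def sni_collisions(listeners, cert_sans):
    """Return (listener_a, listener_b, shared_names) for every pair of
    listeners whose cert-derived SNI lists overlap.

    The listener hostname is deliberately ignored, mirroring the
    behaviour described above (SNI comes from the cert SAN).
    """
    collisions = []
    for a, b in combinations(listeners, 2):
        shared = sorted(set(cert_sans[a["cert"]]) & set(cert_sans[b["cert"]]))
        if shared:
            collisions.append((a["name"], b["name"], shared))
    return collisions


# The original (broken) layout: two listeners, one shared cert.
split = [
    {"name": "https-memes", "hostname": "meme.sh", "cert": "memes-tls"},
    {"name": "https-memes-www", "hostname": "www.meme.sh", "cert": "memes-tls"},
]
# The collapsed layout: one listener, one cert.
single = [{"name": "https-memes", "hostname": "meme.sh", "cert": "memes-tls"}]
# Per-host listeners with one cert per hostname.
per_host = [
    {"name": "https-memes", "hostname": "meme.sh", "cert": "memes-apex-tls"},
    {"name": "https-memes-www", "hostname": "www.meme.sh", "cert": "memes-www-tls"},
]

print(sni_collisions(split, CERT_SANS))
# [('https-memes', 'https-memes-www', ['meme.sh', 'www.meme.sh'])]
print(sni_collisions(single, CERT_SANS))    # []
print(sni_collisions(per_host, CERT_SANS))  # []
```

Only the split layout reports a collision, which matches the two safe options in the rule: one listener for a multi-SAN cert, or one cert per listener.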

Runbook for when meme.sh returns a 404 from origin:

  1. Confirm origin is the actual culprit (not Cloudflare): curl -s https://meme.sh/ | head. An nginx-flavored 404 body means the request reached origin and the gateway returned it.
  2. Compare against a sibling host on the same gateway, e.g. curl -sk -o /dev/null -w '%{http_code}\n' --resolve kbve.com:443:142.132.206.71 -H 'Host: kbve.com' https://kbve.com/. If that prints 200, the gateway is healthy and the breakage is memes-specific.
  3. Dump the Envoy filter chains and look for two with overlapping SNI arrays: kubectl exec -n kube-system <cilium-pod> -- cilium-dbg envoy admin config | jq '.. | objects | select(.filter_chains) | .filter_chains[] | select(any(.filter_chain_match.server_names[]?; contains("meme"))) | .filter_chain_match.server_names'. The any(...) wrapper makes jq print each matching chain exactly once (a bare select over server_names[] would print a chain once per matching name), so if you see the SAN list duplicated, two chains carry it and the listener split is the bug.
  4. Collapse to one listener (this PR’s fix), reapply via ArgoCD, and kubectl rollout restart -n kube-system ds/cilium-envoy to flush the stale filter chains.
  5. The stuck cm-acme-http-solver-* HTTPRoutes drain automatically once the cert can validate; if they don’t, kubectl delete httproute -l acme.cert-manager.io/http01-solver -n memes to force cert-manager to recreate fresh ones.
  6. Re-curl meme.sh — should return 200.
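The filter chain check in step 3 can also be run offline against a saved dump. This is a minimal sketch assuming the cilium-dbg envoy admin config output was redirected to a file; the script name check_sni.py and the file name dump.json are hypothetical. It walks the JSON the same way the jq expression does (any object with a filter_chains key) and counts how many chains list each matching server name.

```python
import json
import sys
from collections import Counter


def iter_filter_chains(node):
    """Yield every filter chain found anywhere in the parsed dump,
    mirroring the jq walk '.. | objects | select(.filter_chains)'."""
    if isinstance(node, dict):
        chains = node.get("filter_chains")
        if isinstance(chains, list):
            for chain in chains:
                if isinstance(chain, dict):
                    yield chain
        for value in node.values():
            yield from iter_filter_chains(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_filter_chains(item)


def overlapping_sni(dump, needle="meme"):
    """Return {server_name: chain_count} for every server name that
    contains `needle` and is listed by more than one filter chain."""
    counts = Counter()
    for chain in iter_filter_chains(dump):
        match = chain.get("filter_chain_match") or {}
        # set() so a name repeated inside one chain is counted once.
        for name in set(match.get("server_names") or []):
            if needle in name:
                counts[name] += 1
    return {name: n for name, n in counts.items() if n > 1}


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        overlaps = overlapping_sni(json.load(fh))
    for name, n in sorted(overlaps.items()):
        print(f"{name}: listed by {n} filter chains")
    # Non-zero exit status when an overlap is found, so this can gate a check.
    sys.exit(1 if overlaps else 0)
```

Usage under those assumptions: save the dump with kubectl exec -n kube-system <cilium-pod> -- cilium-dbg envoy admin config > dump.json, then run python3 check_sni.py dump.json. Treat a hit as a pointer to inspect the listeners rather than as proof, since the same listener may appear in more than one section of a config dump.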