At work we run a lot of Nginx pods, which we use to serve static files to our web apps. We specify memory limits for all the pods we run, and I have been using Datadog to get an estimate of what those numbers should be.
It has worked well so far, but with Nginx not so much. Nginx has a very low memory footprint and relies heavily on the page cache when serving static files. The page cache (a.k.a. disk cache) is how the operating system speeds up repeated access to the same files, by keeping recently read data in otherwise unused memory.
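As a quick illustration (not part of the original investigation), free shows how much of a machine's memory is currently being used for the page cache; it is the buff/cache column, counted separately from used:
> free -h
> grep Cached /proc/meminfo
The Cached line in /proc/meminfo gives roughly the same figure straight from the kernel.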
Datadog reports memory usage for pods, but only the RSS: it does not account for the disk cache. Kubernetes, however, does include disk cache usage when enforcing memory limits (source here).
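To see both numbers side by side, you can read the container's cgroup accounting directly. A minimal sketch, assuming cgroup v1 paths and with <nginx-pod> as a placeholder (on cgroup v2 the files are memory.current and memory.stat with anon/file lines instead):
> k exec <nginx-pod> -c nginx -- cat /sys/fs/cgroup/memory/memory.usage_in_bytes
> k exec <nginx-pod> -c nginx -- grep -w -e rss -e cache /sys/fs/cgroup/memory/memory.stat
The first value is what the memory limit is enforced against (RSS plus page cache); in the second, the rss line is roughly what Datadog reports and the cache line is the page cache.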
Increasing our memory limits from 20Mi to 500Mi made a huge difference, even though Datadog keeps reporting the same RSS memory usage.
From:
> k describe deployment deployment-name
...
  Containers:
   ...
   nginx:
    Limits:
      cpu:     10m
      memory:  20Mi
    Requests:
      cpu:     1m
      memory:  5Mi
...
To:
> k describe deployment deployment-name
...
  Containers:
   ...
   nginx:
    Limits:
      cpu:     100m
      memory:  500Mi
    Requests:
      cpu:     20m
      memory:  50Mi
...
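For reference, the same change can be made imperatively with kubectl set resources (the deployment and container names are the ones from the describe output above; in practice you would normally change the manifest and redeploy):
> k set resources deployment deployment-name -c nginx --requests=cpu=20m,memory=50Mi --limits=cpu=100m,memory=500Mi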
What Datadog keeps reporting:
What actually happens:
> curl -o /dev/null -s -w 'Total: %{time_total}s\n' https://support.oeg-kraken.energy/static/support/js/vendor/fontawesome-all.min.2ae534bdf847.js
Total: 5.764943s
That’s down from 25 seconds.
I am struggling to find a good metric in Datadog, but the one below looks like a good one to use as a reference. All the values for the last day look lower, which is a good sign.
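If you are also hunting for the right metric, the standard Datadog kubelet check exposes both a cache-inclusive figure (kubernetes.memory.usage) and plain RSS (kubernetes.memory.rss), so comparing queries along these lines should make the gap visible; the kube_deployment tag is an assumption about how your pods are tagged:
avg:kubernetes.memory.usage{kube_deployment:deployment-name} by {pod_name}
avg:kubernetes.memory.rss{kube_deployment:deployment-name} by {pod_name}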