Arrow of time

Resource reservations and local DOS

An interesting post on the stable@ mailing list by Matt Dillon illustrates why jail resource limits and similar measures are important on today's machines. It also illustrates one of the "pro" arguments for virtualization.

The post is very informative and I'm quoting it here:

    One of the problems with resource management in general is
    that it has traditionally been per-process, and due to the
    multiplicative effect (e.g. max-descriptors * limit-per-descriptor),
    per-process resources cannot be set such that any given user is
    prevented from DDOSing the system without making them so low that
    normal programs begin to fail for no good reason.
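
To make the multiplicative effect concrete, here is a minimal Python sketch. The process count and the 64 KiB of kernel buffer per descriptor are illustrative assumptions, not figures from the post or from any particular kernel; only the descriptor limit is read from the running system.

```python
import resource


def worst_case_bytes(max_procs, max_fds_per_proc, bytes_per_fd):
    """Upper bound on kernel memory one user could pin if every process
    opened every descriptor it is allowed and filled each buffer."""
    return max_procs * max_fds_per_proc * bytes_per_fd


def current_fd_limit():
    """Soft per-process descriptor limit of this process (Unix only)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


if __name__ == "__main__":
    # Hypothetical figures chosen only for illustration.
    procs, per_fd = 100, 64 * 1024
    fds = current_fd_limit()
    if fds == resource.RLIM_INFINITY:
        print("descriptor limit is unlimited; no finite bound")
    else:
        total = worst_case_bytes(procs, fds, per_fd)
        print(f"{procs} procs x {fds} fds x {per_fd} B = {total / 2**30:.2f} GiB")
```

With 100 processes, 1024 descriptors each and 64 KiB per descriptor, the bound is already 6.25 GiB. Lowering the per-process limits far enough to make that product harmless would also break ordinary programs, which is the point the quote makes.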

    Hence the advent of per-user and other more suitable resource
    limits, nominally set via sysctl. Even with these, however, it is
    virtually impossible to protect against a user DDOS. The kernel
    itself has resource limitations which are fairly easy to blow
    out... mbufs are usually the easiest to blow up, followed by pipe
    KVM memory. Filesystems can be blown up too by creating sparse
    files and mmap()ing them (thus circumventing normal overcommit
    limitations).

    Paging by itself, without running the system out of VM, can
    destroy a machine's performance and be just as effective a DDOS
    attack as resource starvation is.

    Virtual memory resources are similarly impacted. Overcommit
    limiting features have as many downsides as they have upsides.
    It's an endless argument but I've seen systems blow up with
    overcommit limits set even more readily than with no (overcommit)
    limits set. Theoretically overcommit limits make the system more
    manageable but in actual practice they only work when the
    application base is written with such limits in mind (and most
    are not). So for a general purpose unix environment putting
    limits on overcommit tends to create headaches. To be sure, in a
    turn-key environment overcommit serves a very important function.
    In a non-turn-key environment however it will likely create more
    problems than it will solve.

    The only way to realistically deal with the mess, if it is
    important to you, is to partition the system's real resources and
    run stuff inside their own virtualized kernels, each of which
    does its own independent resource management and whose I/O on the
    real system can be well-controlled as an aggregate.

    Alternatively, creating very large swap partitions works very
    well to mitigate the more common problems. Swap itself is
    changing its function. Swap is no longer just used for real
    memory overcommit (in fact, real memory overcommit is quite
    uncommon these days). It is now also used for things like tmpfs,
    temporary virtual disks, meta-data caching, and so forth. These
    days the minimum amount of swap I configure is 32G and as
    efficient swap storage gets more cost effective (e.g. SSDs),
    significantly more. 70G, 110G, etc.

    It becomes more a matter of being able to detect and act on the
    DDOS/resource issue BEFORE it gets to the point of killing
    important processes (definition: whatever is important for the
    functioning of that particular machine, user-run or root-run),
    and less a matter of hoping the system will do the right thing
    when the resource limit is actually reached. Having a lot of swap
    gives you more time to act.
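
The closing point, that more swap buys more time to react, can be sketched with a simple linear extrapolation. This is only an illustration: the function name, the sample format and the 1 GiB per minute growth rate are hypothetical, and collecting the samples from the running system is left out.

```python
def seconds_until_swap_full(samples, swap_total):
    """Estimate seconds until swap is exhausted by linear extrapolation.

    samples: list of (timestamp_seconds, swap_used_bytes), oldest first.
    Returns None with fewer than two samples, or when usage is flat
    or shrinking between the first and last sample.
    """
    if len(samples) < 2:
        return None
    (t0, used0), (t1, used1) = samples[0], samples[-1]
    elapsed = t1 - t0
    growth = used1 - used0
    if elapsed <= 0 or growth <= 0:
        return None
    remaining = max(swap_total - used1, 0)
    return remaining * elapsed / growth


GIB = 2**30
# Hypothetical runaway job eating 1 GiB of swap per minute.
samples = [(0, 1 * GIB), (60, 2 * GIB)]
for total in (4 * GIB, 32 * GIB):
    left = seconds_until_swap_full(samples, total)
    print(total // GIB, "GiB swap:", left, "s left")
```

Under these assumed numbers a machine with 4 GiB of swap has about two minutes before it runs out, while one with 32 GiB has about thirty, which is the difference between an alert nobody can act on and one that an administrator or a watchdog script can.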

