FOLLOWUP: Found the nastiest Enomalism bug yet!

So... even when you think you have it all figured out, sometimes you are wrong.

Enomalism seemed to live a lot longer once simplefirewall was removed, but it would still go down overnight while being DoS attacked by some automated JavaScript stuff. By the way, if you have not yet checked out the Selenium project, you really should. It is an awesome automated regression testing framework.

Anyways, I did a LOT of digging and wrote a test harness, in addition to my nose tests, that re-ran tight loops on the various low-level Xen services in my API, and found another bug. It turns out that when you retrieve (or try to retrieve) details for a non-running Xen domain enough times, sooner or later the xend socket library starts leaking sockets. I think. All I know for sure is that the call blocks on its thread forever, which leaves a socket used up. I also discovered that the library is not only non-reentrant, but also not safe for concurrent use!
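For anyone who wants to try the same kind of tight-loop test, here is a rough sketch of the idea, not my actual harness. It calls a function over and over, each time in a throwaway thread with a timeout, and reports the iteration at which a call never came back. The names `call_with_timeout` and `hammer` are made up for this example, and `fn` stands in for whatever low-level call you want to stress.

```python
import threading


def call_with_timeout(fn, args=(), timeout=5.0):
    """Run fn(*args) in a daemon thread and wait up to `timeout` seconds.

    Returns (finished, outcome), where outcome holds "value" or "error"
    if the call completed.
    """
    outcome = {}

    def worker():
        try:
            outcome["value"] = fn(*args)
        except Exception as exc:  # record the failure instead of losing it
            outcome["error"] = exc

    t = threading.Thread(target=worker)
    t.daemon = True  # a hung call must not keep the test process alive
    t.start()
    t.join(timeout)
    return (not t.is_alive()), outcome


def hammer(fn, args=(), iterations=1000, timeout=5.0):
    """Call fn in a tight loop.

    Returns the zero-based iteration at which a call hung, or None if
    every call returned within the timeout.
    """
    for i in range(iterations):
        finished, _ = call_with_timeout(fn, args, timeout)
        if not finished:
            return i
    return None
```

Something like `hammer(my_details_call, args=("some-stopped-domain",))` would then tell you roughly how many calls it takes before things lock up. Note that the hung thread is simply abandoned, so this is only good for finding the bug, not for living with it.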

Solution:
  • Thread locks on all cheesy API calls
  • Cache the states of all running machines, and only ever request the list of running ones, to avoid the dead machine bug
  • More regression testing :(
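The first two items can be sketched roughly like this. It is a minimal illustration, not the real Enomalism code: `SerializedXenClient`, `list_running`, and `get_details` are hypothetical names, and the two callables stand in for whatever actually talks to xend. One lock serializes every backend call, and detail lookups are answered from a cache that is only ever filled from the list of running machines, so a stopped machine never triggers a request.

```python
import threading


class SerializedXenClient(object):
    """Serialize backend calls and serve machine details from a cache."""

    def __init__(self, list_running, get_details):
        # Both callables are assumptions for this sketch:
        #   list_running() -> iterable of running machine names
        #   get_details(name) -> details for one running machine
        self._list_running = list_running
        self._get_details = get_details
        self._lock = threading.Lock()
        self._cache = {}

    def refresh(self):
        """Rebuild the cache; only running machines are ever queried."""
        with self._lock:  # one backend conversation at a time
            names = list(self._list_running())
            self._cache = dict((n, self._get_details(n)) for n in names)

    def details(self, name):
        """Return cached details, or None if the machine is not running."""
        with self._lock:
            return self._cache.get(name)
```

Call `refresh()` on a timer or after any start/stop operation, and use `details()` everywhere else. The trade-off is that the answers can be slightly stale, which seemed a fair price for not hanging.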