
Abstract
Network failures continue to plague datacenter operators, as their symptoms may not correlate directly with where or why they occur. We introduce 007, a lightweight, always-on diagnosis application that can find problematic links and also pinpoint problems for each TCP connection. 007 is completely contained within the end host. During its two-month deployment in a tier-1 datacenter, it detected every problem found by previously deployed monitoring tools while also finding the sources of other, previously undetected problems.