Fault Management

Fault Localization in Computer Networks


Our research investigates non-deterministic fault diagnosis in computer networks. We introduce a non-deterministic system model for multi-layer fault diagnosis that covers both availability and performance problems. We map the layered model into a belief network and investigate the application of Bayesian reasoning techniques to fault localization, using the belief network as the fault propagation model. Although exact Bayesian reasoning allows very accurate root cause determination, its exponential complexity makes it infeasible in real-life systems. We introduce adaptations of two Bayesian reasoning techniques and show through simulation that these approximate schemes perform fault localization with near-optimal accuracy in polynomial time. We also propose a novel algorithm for fault correlation that processes symptoms incrementally as they are received, and show through simulation that it is computationally fast, which lets it scale to much larger networks than other algorithms. The incremental algorithm can identify multiple simultaneous faults, incorporates both positive and negative information into the reasoning process, and is resilient to noise in the observed symptoms. In addition, we investigate a distributed algorithm that divides the computational effort and system knowledge among multiple hierarchically organized managers. Our more recent work examines active probing techniques applied to fault localization.
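As a rough illustration of why exact reasoning becomes infeasible, the sketch below scores every possible set of faults in a tiny two-layer belief network and returns the most probable one. It is a toy under stated assumptions, not the algorithms developed in this project: the fault and symptom names, the probabilities, the noisy-OR combination rule, and the LEAK noise term are all hypothetical choices made for the example. Symptoms marked True are positive observations; symptoms marked False are negative ones (expected but not seen).

```python
from itertools import combinations

# Hypothetical model: prior fault probabilities and causal strengths
# P(symptom | fault). All names and numbers are illustrative assumptions.
PRIOR = {"link_A": 0.01, "link_B": 0.02, "router_C": 0.005}
CAUSES = {
    "path_1_down": {"link_A": 0.9, "router_C": 0.8},
    "path_2_down": {"link_B": 0.9, "router_C": 0.8},
    "path_3_slow": {"link_A": 0.5, "link_B": 0.5},
}
LEAK = 0.001  # assumed chance a symptom appears with no modeled fault (noise)


def symptom_prob(symptom, faults):
    """Noisy-OR: the symptom is absent only if every active cause fails to trigger it."""
    p_absent = 1.0 - LEAK
    for fault, strength in CAUSES[symptom].items():
        if fault in faults:
            p_absent *= 1.0 - strength
    return 1.0 - p_absent


def joint(faults, observed):
    """Joint probability of a fault set and the observations.

    observed maps symptom -> True (seen) or False (known not to have occurred).
    """
    p = 1.0
    for fault, prior in PRIOR.items():
        p *= prior if fault in faults else 1.0 - prior
    for symptom, seen in observed.items():
        ps = symptom_prob(symptom, faults)
        p *= ps if seen else 1.0 - ps
    return p


def localize(observed):
    """Exhaustively score all 2^n fault sets and return the most probable one."""
    names = sorted(PRIOR)
    best, best_p = frozenset(), -1.0
    for k in range(len(names) + 1):
        for combo in combinations(names, k):
            p = joint(frozenset(combo), observed)
            if p > best_p:
                best, best_p = frozenset(combo), p
    return best


if __name__ == "__main__":
    observed = {"path_1_down": True, "path_2_down": True, "path_3_slow": False}
    print(sorted(localize(observed)))
```

With three candidate faults the search covers only 8 hypotheses, but the count doubles with every fault added to the model, which is the exponential cost that motivates approximate and incremental schemes.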

(Supported by the ARL Collaborative Technologies Alliances (CTA) Communications and Networks Consortium, sponsored by the Army Research Laboratory (ARL).)




Adarsh Sethi's Homepage