Fault Management

Fault Localization in Computer Networks

Our research investigates non-deterministic fault diagnosis in computer networks. We introduce a non-deterministic system model for multi-layer fault diagnosis that covers both availability and performance problems. We map this layered model into a belief network and apply Bayesian reasoning techniques to fault localization, using the belief network as the fault propagation model. Exact Bayesian reasoning allows very accurate root cause determination, but its exponential complexity makes it infeasible in real-life systems. We therefore introduce adaptations of two Bayesian reasoning techniques and show through simulation that these approximate schemes achieve almost optimally accurate fault localization in polynomial time. We also propose a novel algorithm for fault correlation that processes symptoms incrementally as they are received, and show through simulation that it is computationally fast, which makes it scalable to much larger networks than other algorithms. The incremental algorithm can identify multiple simultaneous faults, incorporates both positive and negative information into the reasoning process, and is resilient to noise in the observed symptoms. We also investigate a distributed algorithm that divides the computational effort and system knowledge among multiple hierarchically organized managers. Our most recent work explores active probing techniques for fault localization.
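To make the complexity argument concrete, the following minimal Python sketch performs exact fault localization on a toy fault-symptom model by enumerating every subset of faults. It is an illustration only, not the algorithms from the publications listed below: the fault and symptom names, the probabilities, the two-layer (bipartite) structure, and the noisy-OR combination rule are all assumptions chosen for the example.

```python
from itertools import combinations

# Hypothetical toy model (all names and numbers are illustrative assumptions).
# PRIOR[f] is the prior probability that fault f is present.
PRIOR = {"link_a": 0.01, "link_b": 0.02, "router_c": 0.005}

# CAUSES[s][f] is the probability that fault f, if present, triggers symptom s.
CAUSES = {
    "path_1_down": {"link_a": 0.9, "router_c": 0.8},
    "path_2_down": {"link_b": 0.9, "router_c": 0.8},
    "path_3_down": {"link_a": 0.9, "link_b": 0.9},
}


def symptom_probability(symptom, faults):
    """Noisy-OR: the symptom occurs unless every present cause fails to trigger it."""
    p_none = 1.0
    for fault, strength in CAUSES[symptom].items():
        if fault in faults:
            p_none *= 1.0 - strength
    return 1.0 - p_none


def joint_probability(faults, observed):
    """Joint probability of a fault hypothesis and the full symptom state.

    Observed symptoms count as positive evidence; symptoms that were not
    observed count as negative evidence.
    """
    p = 1.0
    for fault, prior in PRIOR.items():
        p *= prior if fault in faults else 1.0 - prior
    for symptom in CAUSES:
        p_s = symptom_probability(symptom, faults)
        p *= p_s if symptom in observed else 1.0 - p_s
    return p


def most_probable_explanation(observed):
    """Exhaustively score all 2^n fault subsets and return the best one."""
    names = sorted(PRIOR)
    best, best_p = frozenset(), -1.0
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            hypothesis = frozenset(subset)
            p = joint_probability(hypothesis, observed)
            if p > best_p:
                best, best_p = hypothesis, p
    return best, best_p


if __name__ == "__main__":
    hypothesis, score = most_probable_explanation({"path_1_down", "path_2_down"})
    print(sorted(hypothesis), score)
```

Because `most_probable_explanation` scores all 2^n subsets of n candidate faults, its running time doubles with every fault added to the model, which is the kind of exponential cost that motivates approximate and incremental schemes.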

(Supported by the ARL Collaborative Technologies Alliances (CTA) Communications and Networks Consortium sponsored by the Army Research Laboratory (ARL).)

Related Publications

M. Natu and A.S. Sethi, ``Active Probing Approach for Fault Localization in Computer Networks.'' Proc. End-to-End Monitoring Workshop, Vancouver, B.C., Canada (April 2006). PDF
M. Natu and A.S. Sethi, ``Adaptive Fault Localization for Mobile, Ad-Hoc Battlefield Networks.'' Proc. Milcom-2005, IEEE Military Communications Conference, Atlantic City, NJ (Oct. 2005). PDF
L. Kant, W. Chen, C-W. Lee, A.S. Sethi, M. Natu, L. Luo, and C-C. Shen, ``D-FLASH: Dynamic Fault Localization And Self-Healing for Battlefield Networks.'' Proc. ASC'04, the 24th Army Science Conference, Orlando, FL (Nov.-Dec. 2004). PDF
M. Steinder and A.S. Sethi, ``A Survey of Fault Localization Techniques in Computer Networks.'' Science of Computer Programming, Special Edition on Topics in System Administration Vol. 53, 2 (Nov. 2004), pp. 165-194. PDF
M. Steinder and A.S. Sethi, ``Non-deterministic Fault Localization in Communication Systems Using Belief Networks.'' IEEE/ACM Transactions on Networking Vol. 12, 5 (Oct. 2004), pp. 809-822. PDF
M. Steinder and A.S. Sethi, ``Probabilistic Fault Diagnosis in Communication Systems Through Incremental Hypothesis Updating.'' Computer Networks Vol. 45, 4 (July 2004), pp. 537-562. PDF
M. Steinder and A.S. Sethi, ``Multi-Domain Diagnosis of End-to-End Service Failures in Hierarchically Routed Networks.'' In NETWORKING 2004, Networking Technologies, Services, and Protocols; Performance of Computer and Communication Networks; Mobile and Wireless Communications (N. Mitrou, K. Kontovasilis, G.N. Rouskas, et al. (eds.)) Lecture Notes in Computer Science Vol. LNCS-3042, (2004), pp. 1036-1046, Heidelberg: Springer-Verlag. PDF
L. Kant, A. McAuley, R. Morera, A.S. Sethi, and M. Steinder, ``Fault Localization and Self-Healing with Dynamic Domain Configuration.'' Proc. Milcom-2003, IEEE Military Communications Conference, Boston, MA (Oct. 2003). PDF
S. Singh and A.S. Sethi, ``Fault Management for Home Networks.'' Proc. SAM'03, the 2003 International Conference on Security and Management, Las Vegas, NV (June 2003). PDF
M. Steinder and A.S. Sethi, ``Probabilistic Event-driven Fault Diagnosis Through Incremental Hypothesis Updating.'' In Integrated Network Management, VIII (G. Goldszmidt and J. Schonwalder (eds.)), pp. 635-648, Boston, MA: Kluwer Academic Publishers, 2003. PDF
M. Steinder and A.S. Sethi, ``Application of Bayesian Reasoning Techniques to Fault Localization in FCS Networks.'' Proc. CTA Annual Conference, College Park, MD (April 2003). PDF
L. Kant, A. McAuley, R. Morera, A.S. Sethi, and M. Steinder, ``Fault Localization and Self-Healing with Dynamic Domain Configuration.'' Proc. CTA Annual Conference, College Park, MD (April 2003). PDF
L. Kant, A.S. Sethi, and M. Steinder, ``Fault Localization and Self-Healing Mechanisms for FCS Networks.'' Proc. 23rd Army Science Conference, Orlando, FL (Dec. 2002). Received Best Paper Award. PDF
M. Steinder and A.S. Sethi, ``Distributed Fault Localization in Hierarchically Routed Networks.'' In Management Technologies for E-Commerce and E-Business Applications (M. Feridun, P. Kropf, and G. Babin (eds.)) Lecture Notes in Computer Science Vol. LNCS-2506, (2002), pp. 195-207, Berlin: Springer-Verlag. PDF
M. Steinder and A.S. Sethi, ``Increasing Robustness of Fault Localization Through Analysis of Lost, Spurious, and Positive Symptoms.'' Proc. Infocom-2002, 21st Annual Joint Conference of the IEEE Computer and Communications Societies, New York, NY (June 2002). PDF
M. Steinder and A.S. Sethi, ``End-to-end Service Failure Diagnosis Using Belief Networks.'' Proc. NOMS-2002, 8th International IFIP/IEEE Symposium on Network Operations and Management, Florence, Italy (April 2002), pp. 375-390. PDF
M. Steinder and A.S. Sethi, ``Non-deterministic Diagnosis of End-to-end Service Failures in a Multi-Layer Communication System,'' Proc. ICCCN-2001, Tenth International Conference on Computer Communications and Networks, Scottsdale, AZ (Oct. 2001), pp. 374-379. PDF
M. Steinder and A.S. Sethi, ``The Present and Future of Event Correlation: A Need for End-to-end Service Fault Localization,'' Proc. SCI-2001, 5th World Multiconference on Systemics, Cybernetics, and Informatics, Orlando, FL (July 2001), pp. 124-129. PDF

Technical Reports

M. Steinder and A.S. Sethi, ``Multi-Layer Fault Localization Using Probabilistic Inference in Bipartite Dependency Graphs,'' Technical Report No. 2001-02, Dept. of Computer and Information Sciences, University of Delaware, Newark, DE (Feb. 2001).
