Fault Management
Fault Localization in Computer Networks
Our research investigates non-deterministic fault diagnosis in computer networks.
We introduce a non-deterministic system model for multi-layer fault diagnosis,
which incorporates both availability and performance problems. We map the layered
model into a belief network and apply Bayesian reasoning techniques to fault
localization, using the belief network as the fault propagation model.
Although it allows very accurate root cause determination,
exact Bayesian reasoning is infeasible in real-life systems due to its
exponential complexity. We introduce adaptations of two Bayesian reasoning
techniques and show through simulation that these approximate schemes achieve
nearly optimal fault localization accuracy in polynomial time.
We also propose a novel incremental algorithm for fault correlation that
processes symptoms as they are received, and show through simulations that
this algorithm is computationally fast, which lets it scale to much larger
networks than other algorithms. The incremental algorithm can identify
multiple simultaneous faults, incorporates both positive and negative
information into the reasoning process, and is resilient to noise in the
observed symptoms. We also investigate a distributed algorithm that divides
the computational effort and system knowledge among multiple hierarchically
organized managers. Our recent work investigates active probing techniques
for fault localization.
(Supported by the
ARL Collaborative Technologies Alliances (CTA)
Communications and Networks Consortium,
sponsored by the Army Research Laboratory
(ARL).)
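
To make the complexity argument above concrete, the following is a minimal
sketch of exact fault localization in a small bipartite fault-symptom belief
network. It is an illustration only and not the model or algorithm from the
publications listed below: the noisy-OR symptom model, the fault and symptom
names, and all probabilities are assumptions chosen for the example. The
search enumerates every subset of faults, which is why this exact approach
grows exponentially with the number of possible faults.

```python
from itertools import product

# Hypothetical toy model (all names and numbers are illustrative assumptions).
# Prior probability that each component is faulty:
PRIORS = {"link_A": 0.01, "link_B": 0.02, "router_C": 0.005}

# CAUSES[symptom][fault] = probability that the fault alone triggers the symptom.
CAUSES = {
    "path_1_down": {"link_A": 0.9, "router_C": 0.8},
    "path_2_down": {"link_B": 0.9, "router_C": 0.8},
    "path_3_down": {"link_A": 0.7, "link_B": 0.7},
}


def symptom_prob(symptom, active_faults):
    """Noisy-OR: a symptom is absent only if every active cause fails to trigger it."""
    p_absent = 1.0
    for fault, strength in CAUSES[symptom].items():
        if fault in active_faults:
            p_absent *= 1.0 - strength
    return 1.0 - p_absent


def joint_prob(active_faults, observed):
    """Joint probability of a fault set and the symptom observations.

    `observed` maps a symptom to True (positive symptom: it occurred) or
    False (negative symptom: it is known not to have occurred). Symptoms
    missing from `observed` are treated as unknown and are marginalized out.
    """
    p = 1.0
    for fault, prior in PRIORS.items():
        p *= prior if fault in active_faults else 1.0 - prior
    for symptom, seen in observed.items():
        p_s = symptom_prob(symptom, active_faults)
        p *= p_s if seen else 1.0 - p_s
    return p


def most_probable_explanation(observed):
    """Exhaustively score all 2^n fault subsets and return the best one."""
    faults = list(PRIORS)
    best, best_p = frozenset(), -1.0
    for bits in product([False, True], repeat=len(faults)):
        active = frozenset(f for f, on in zip(faults, bits) if on)
        p = joint_prob(active, observed)
        if p > best_p:
            best, best_p = active, p
    return best, best_p


if __name__ == "__main__":
    observations = {"path_1_down": True, "path_2_down": True, "path_3_down": False}
    hypothesis, score = most_probable_explanation(observations)
    print(sorted(hypothesis), score)
```

In this toy setting, two failed paths plus one path known to be healthy point
to the shared router rather than to two simultaneous link faults, which shows
how negative information can change the diagnosis. With n candidate faults the
loop visits 2^n hypotheses, so a network with even a few dozen components is
already out of reach for this brute-force search.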
Related Publications
- M. Natu and A.S. Sethi, ``Active Probing Approach for Fault
Localization in Computer Networks.'' Proc. End-to-End
Monitoring Workshop, Vancouver, B.C., Canada (April 2006).
- M. Natu and A.S. Sethi, ``Adaptive Fault Localization for
Mobile, Ad-Hoc Battlefield Networks.''
Proc. Milcom-2005, IEEE Military Communications
Conference, Atlantic City, NJ (Oct. 2005).
- L. Kant, W. Chen, C-W. Lee, A.S. Sethi, M. Natu, L. Luo, and C-C. Shen,
``D-FLASH: Dynamic Fault Localization And Self-Healing for
Battlefield Networks.'' Proc. ASC'04, the 24th Army
Science Conference, Orlando, FL (Nov.-Dec. 2004).
- M. Steinder and A.S. Sethi, ``A Survey of Fault Localization
Techniques in Computer Networks.'' Science
of Computer Programming, Special Edition on Topics in
System Administration Vol. 53, 2 (Nov. 2004),
pp. 165-194.
- M. Steinder and A.S. Sethi, ``Non-deterministic Fault Localization
in Communication Systems Using Belief Networks.''
IEEE/ACM Transactions on Networking Vol. 12, 5
(Oct. 2004), pp. 809-822.
- M. Steinder and A.S. Sethi, ``Probabilistic Fault Diagnosis in
Communication Systems Through Incremental Hypothesis Updating.''
Computer Networks Vol. 45, 4 (July 2004), pp. 537-562.
- M. Steinder and A.S. Sethi, ``Multi-Domain Diagnosis of
End-to-End Service Failures in Hierarchically Routed
Networks.'' In NETWORKING 2004, Networking Technologies,
Services, and Protocols; Performance of Computer and
Communication Networks; Mobile and Wireless Communications
(N. Mitrou, K. Kontovasilis, G.N. Rouskas, et al. (eds.))
Lecture Notes in Computer Science Vol. LNCS-3042,
(2004), pp. 1036-1046, Heidelberg: Springer-Verlag.
- L. Kant, A. McAuley, R. Morera, A.S. Sethi, and M. Steinder,
``Fault Localization and Self-Healing with Dynamic Domain
Configuration.'' Proc. Milcom-2003, IEEE Military Communications
Conference, Boston, MA (Oct. 2003).
- S. Singh and A.S. Sethi, ``Fault Management for Home Networks.''
Proc. SAM'03, the 2003 International Conference on Security and
Management, Las Vegas, NV (June 2003).
- M. Steinder and A.S. Sethi, ``Probabilistic
Event-driven Fault Diagnosis
Through Incremental Hypothesis Updating.''
In Integrated Network Management VIII
(G. Goldszmidt and J. Schonwalder (eds.)), pp. 635-648,
Boston, MA: Kluwer Academic Publishers, 2003.
- M. Steinder and A.S. Sethi, ``Application of Bayesian Reasoning
Techniques to Fault Localization in FCS Networks.'' Proc.
CTA Annual Conference, College Park, MD
(April 2003).
- L. Kant, A. McAuley, R. Morera, A.S. Sethi, and M. Steinder,
``Fault Localization and Self-Healing with Dynamic Domain
Configuration.'' Proc. CTA Annual Conference, College
Park, MD (April 2003).
- L. Kant, A.S. Sethi, and M. Steinder, ``Fault Localization
and Self-Healing
Mechanisms for FCS Networks.'' Proc. 23rd Army Science Conference,
Orlando, FL (Dec. 2002). Received Best Paper Award.
- M. Steinder and A.S. Sethi, ``Distributed Fault Localization in
Hierarchically Routed Networks.'' In Management Technologies
for E-Commerce and E-Business Applications (M. Feridun, P. Kropf,
and G. Babin (eds.))
Lecture Notes in Computer Science Vol. LNCS-2506,
(2002), pp. 195-207, Berlin: Springer-Verlag.
- M. Steinder and A.S. Sethi, ``Increasing Robustness of Fault Localization
Through Analysis of Lost, Spurious, and Positive Symptoms.'' Proc.
Infocom-2002, 21st Annual Joint Conference of the IEEE Computer and
Communications Societies, New York, NY (June 2002).
- M. Steinder and A.S. Sethi, ``End-to-end Service Failure Diagnosis Using
Belief Networks.'' Proc. NOMS-2002, 8th International IFIP/IEEE
Symposium on Network Operations and Management, Florence, Italy (April 2002),
pp. 375-390.
- M. Steinder and A.S. Sethi, ``Non-deterministic Diagnosis of End-to-end
Service Failures in a Multi-Layer Communication System,'' Proc. ICCCN-2001,
Tenth International Conference on Computer Communications and Networks,
Scottsdale, AZ (Oct. 2001), pp. 374-379.
- M. Steinder and A.S. Sethi, ``The Present and Future of Event Correlation:
A Need for End-to-end Service Fault Localization,''
Proc. SCI-2001, 5th World Multiconference on Systemics, Cybernetics,
and Informatics, Orlando, FL (July 2001), pp. 124-129.
Technical Reports
- M. Steinder and A.S. Sethi, ``Multi-Layer Fault Localization Using
Probabilistic Inference in Bipartite Dependency Graphs,''
Technical Report No. 2001-02, Dept. of Computer
and Information Sciences, University of Delaware, Newark, DE
(Feb. 2001).