The Open Cybernetics & Systemics Journal

2011, 5 : 45-52
Published online 2011 June 15. DOI: 10.2174/1874110X01105010045
Publisher ID: TOCSJ-5-45

Operating System Level Trace Analysis for Automated Problem Identification

Gabriel N. Matni and Michel R. Dagenais
Department of Computer and Software Engineering, Ecole Polytechnique de Montreal, C.P. 6079, Station Downtown, Montreal, Quebec, H3C 3A7, Canada.

ABSTRACT

Performance bottlenecks, malicious activities, programming bugs and other kinds of problematic behavior can be accurately detected on production systems if the relevant events are monitored. This can be achieved through kernel-level tracing: every time a relevant event occurs, the information is either analyzed immediately or saved in a trace file for inspection during post-mortem analysis. Collecting the information from the kernel has a very low impact on the system, and the offline analysis is typically performed remotely, adding no overhead to the traced system.

This article presents an automata-based approach for analyzing traces generated by the kernel of an operating system. Some typical patterns of problematic behavior are identified and described using the State Machine Language. These patterns are fed into an offline analyzer which efficiently and simultaneously checks for their occurrences, even in traces of several gigabytes. The analyzer's running time grows linearly with the trace size. The remaining factors impacting its performance are also discussed. The main benefit of the proposed approach is the efficiency obtained when checking such extensive and detailed execution traces against a very large number of patterns of problematic behavior simultaneously.
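As a rough illustration of the idea of checking several automata-described patterns in a single pass over a trace, the following Python sketch may help. It is not the authors' analyzer and does not use the State Machine Language. The event names (`open`, `close`, `exit`), the `Event` fields, and both pattern classes are hypothetical, chosen only to make the example self-contained.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    """A simplified trace event (hypothetical layout, not a real trace format)."""
    timestamp: int
    pid: int
    name: str      # assumed event names: "open", "close", "exit"
    fd: int = -1


class FdLeakPattern:
    """Reports a match when a process exits while a descriptor is still open."""

    def __init__(self):
        self.open_fds = defaultdict(set)   # pid -> set of open descriptors
        self.matches = []

    def feed(self, ev: Event) -> None:
        if ev.name == "open":
            self.open_fds[ev.pid].add(ev.fd)
        elif ev.name == "close":
            self.open_fds[ev.pid].discard(ev.fd)
        elif ev.name == "exit":
            # Any descriptor still tracked for this pid is reported as leaked.
            for fd in sorted(self.open_fds.pop(ev.pid, ())):
                self.matches.append((ev.timestamp, ev.pid, fd))


class DoubleClosePattern:
    """Reports a match when a descriptor is closed twice without a reopen."""

    def __init__(self):
        self.state = {}                    # (pid, fd) -> "OPEN" or "CLOSED"
        self.matches = []

    def feed(self, ev: Event) -> None:
        key = (ev.pid, ev.fd)
        if ev.name == "open":
            self.state[key] = "OPEN"
        elif ev.name == "close":
            if self.state.get(key) == "CLOSED":
                self.matches.append((ev.timestamp, ev.pid, ev.fd))
            self.state[key] = "CLOSED"


def analyze(trace: Iterable[Event], patterns) -> dict:
    """Single pass over the trace; every event is offered to every pattern."""
    for ev in trace:
        for pattern in patterns:
            pattern.feed(ev)
    return {type(p).__name__: p.matches for p in patterns}


trace = [
    Event(1, 100, "open", 3),
    Event(2, 100, "open", 4),
    Event(3, 100, "close", 3),
    Event(4, 100, "close", 3),   # second close of fd 3
    Event(5, 100, "exit"),       # fd 4 is still open here
]
result = analyze(trace, [FdLeakPattern(), DoubleClosePattern()])
```

In this sketch each event is read once and handed to every pattern, so the work grows with the number of events multiplied by the number of patterns. This is one simple way to obtain behavior that is linear in the trace size for a fixed set of patterns.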