Resiliency Assessment of Wireless Sensor Networks: a Holistic Approach
Di Martino, Catello (2009) Resiliency Assessment of Wireless Sensor Networks: a Holistic Approach. [Doctoral thesis] (Unpublished)
Wireless Sensor Networks (WSNs) are emerging as a promising technology to foster the design and implementation of self-configuring, self-healing, and cost-effective monitoring infrastructures. In the last decade, they have been used in several pilot research applications, such as fire detection, object tracking [2, 3], security monitoring, supply chain monitoring, and stability monitoring of civil engineering structures, such as buildings, bridges, railroad tunnels, and dams [9, 10]. The commercial use of WSNs is expected to grow dramatically in the next few years. However, industries in the field of wired sensing and monitoring infrastructures are still questioning the adoption of WSNs in critical applications, despite being attracted by their features and by the possibility of reducing deployment and management costs by more than one order of magnitude. This gap between research achievements and industrial development is mainly due to the little trust that companies place in the reliability of WSNs. One cause of this distrust is the lack of work defining critical application requirements for WSNs, together with the absence of effective approaches that can be used at design time to assess non-functional properties such as WSN dependability. Indeed, dependability assessment plays a central role in raising the level of trust in WSNs for critical applications. WSNs are exposed to several faults due to the characteristics of the wireless medium, their limited energy budget, harsh environments, and cheap hardware. Even if digital signals are less prone to electromagnetic interference, packets may be lost or delivered with errors, sensors may freeze at wrong fixed values, and nodes may periodically reset due to malfunctions.
In these systems, data sensed by WSN nodes must be properly delivered to the sink node (i.e., the node responsible for data collection), in spite of “changes” introduced during WSN operation (e.g., a node failure). The situation is further exacerbated by the highly dynamic nature of WSNs and their proneness to transient failures and self-reconfigurations. Such complex behavior introduces several challenges for WSN developers. The design of WSNs is made harder by the lack of effective methods and approaches to master the intrinsic complexity of WSN assessment, especially when aiming to design WSNs able to maintain a persistent level of dependability while withstanding changes, i.e., able to perform with a given level of resiliency. As discussed in Chapter 2, past research efforts have been devoted to defining the concept of connection (or network) resiliency for computer networks and ad-hoc networks, i.e., the number of “changes”, in terms of node failures, that can be accommodated while preserving a specific degree of connectivity in the network. However, while these concepts still apply to WSNs, they are not enough to characterize the data-driven nature of WSNs. The service delivered by the WSN encompasses not only the connection but also the computation, i.e., even when sensor nodes are potentially connected (a path exists between the nodes and the sink node), data losses can still occur. To overcome this limit, this thesis defines the concept of data delivery resiliency and qualifies WSN resiliency as a non-functional property composed of both connection resiliency and data delivery resiliency. Data delivery resiliency is defined in this thesis as the number of changes, in terms of node failures, that the WSN can accommodate while preserving a packet delivery efficiency greater than a threshold. The concepts of connection resiliency and data delivery resiliency are not interrelated.
While the concept of connection resiliency relates to the WSN topology, i.e., the degree of path redundancy in the network, the concept of data delivery resiliency is related to i) the computational load on nodes, which may cause packet losses due to buffer overrun, ii) application requirements, e.g., at least a given amount of the produced measurements must be delivered to the sink node, iii) routing and MAC protocols, which affect data delivery and the packet error rate, and iv) radio interference and packet loss/corruption phenomena on the propagation medium. Hence, assessing the data delivery resiliency as well as the connection resiliency is a crucial task in designing dependable WSNs, since it could help to i) anticipate critical choices, e.g., concerning node placement, running software, routing and MAC protocols, ii) mitigate risks, e.g., by forecasting the time when the WSN will no longer be able to perform with a suitable level of resiliency, and iii) prevent financial loss, e.g., by providing a criterion to plan and schedule maintenance actions effectively. Resiliency assessment of WSNs is greatly complicated by the complexity of the potential changes that may take place at runtime. The workload affects the number of packets sent on the network. The path followed by packets depends on the routing algorithm, on the topology, and on the wireless propagation profile (packets can be lost). The energy profile is affected by the workload, by the number of forwarded packets, and by the battery technology. All the above factors affect the failure behavior, e.g., a node can fail due to battery exhaustion. A node can also fail independently, due to faults in the sensing hardware. In turn, the failure of a node may partition the network into two or more subsets, making a large set of nodes unavailable, i.e., isolated, since they are no longer able to deliver data to the sink.
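The difference between the two resiliency notions can be illustrated with a minimal Python sketch. It is not the thesis's model: the ring topology, the shortest-hop routing, the uniform and independent per-hop loss probability, the thresholds, and all function names are assumptions made only for illustration, and the count is evaluated along one given failure sequence.

```python
from collections import deque

def hops_to_sink(adj, sink, failed):
    """BFS from the sink over surviving nodes; returns {node: hop count}."""
    hops = {sink: 0}
    queue = deque([sink])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in failed and v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)
    return hops

def connectivity(adj, sink, failed):
    """Fraction of surviving sensor nodes that still have a path to the sink."""
    alive = [n for n in adj if n != sink and n not in failed]
    if not alive:
        return 0.0
    hops = hops_to_sink(adj, sink, failed)
    return sum(n in hops for n in alive) / len(alive)

def delivery_efficiency(adj, sink, failed, link_loss):
    """Expected fraction of packets from surviving nodes reaching the sink.

    Assumption: shortest-hop routing and independent loss on every hop.
    """
    alive = [n for n in adj if n != sink and n not in failed]
    if not alive:
        return 0.0
    hops = hops_to_sink(adj, sink, failed)
    delivered = sum((1 - link_loss) ** hops[n] for n in alive if n in hops)
    return delivered / len(alive)

def resiliency(metric, threshold, failure_order):
    """Number of failures accommodated while metric(failed) >= threshold."""
    failed = set()
    for count, node in enumerate(failure_order):
        failed.add(node)
        if metric(failed) < threshold:
            return count
    return len(failure_order)

# Hypothetical ring topology: S is the sink, A..D are sensor nodes.
adj = {"S": ["A", "D"], "A": ["S", "B"], "B": ["A", "C"],
       "C": ["B", "D"], "D": ["C", "S"]}
order = ["D", "B"]  # one assumed failure sequence

conn_res = resiliency(lambda f: connectivity(adj, "S", f), 1.0, order)
data_res = resiliency(lambda f: delivery_efficiency(adj, "S", f, 0.2), 0.7, order)
print(conn_res, data_res)  # 1 0
```

In this toy case the failure of D leaves every surviving node connected, so it is accommodated from the connection point of view, yet the longer paths push the expected delivery efficiency from 0.72 to about 0.65, below the assumed 0.7 threshold.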
Clearly, such a high degree of inter-dependence complicates the assessment task by dramatically increasing the number of variables and dynamics to encompass. Last but not least, resiliency assessment cannot neglect the features of the actual hardware/software platforms and the sensing hardware being used: different power consumptions and failure rates are indeed experienced when varying the underlying platforms, such as sensing hardware, radio chip, and node operating system. Resiliency assessment cannot do without models. State-of-the-art techniques for the assessment of non-functional properties, such as power consumption or dependability, are mostly based on behavioral simulators and analytical models, as discussed in depth in Chapter 2. WSN behavioral simulators, such as ns-2 or TOSSIM, are close to real WSNs. They typically belong to the final user's (e.g., the deployer's) domain of knowledge and make it possible to reproduce the expected behavior of single WSN nodes on the basis of the real application planned to execute. However, they are not designed to express and evaluate non-functional properties. Such an analysis requires evaluating statistical estimators, and hence it needs several simulation runs in order to achieve results with acceptable confidence. This in turn increases the time needed for the simulation by orders of magnitude, given the low-level (fine-grained) detail of these approaches. Analytical models, such as Petri nets and Markov chains, are the reference for resiliency assessment techniques. They have been successfully used for decades for the assessment of computer systems, including WSNs [17, 18].
However, the highly dynamic nature of WSNs requires the definition of detailed and complex models, which are difficult to develop and hardly reusable across different scenarios. For instance, if a modeling team invested in a fine-grained model of a WSN, taking into account software, routing issues, and hardware platforms, even a tiny change in the design parameters of the considered WSN, such as the software or the topology, would probably require a new modeling phase from scratch, incurring unaffordable design costs, whereas such aspects are easily reproduced in behavioral models. As a matter of fact, assessing WSN resiliency with a purely analytical approach requires strong simplifying assumptions that often lead to rather abstract results. To overcome the limitations of available approaches, this thesis proposes a novel and holistic approach for the resiliency assessment of WSNs. The key focus of the approach is the holistic resiliency assessment, i.e., a comprehensive assessment performed by taking into account all subsystems and inter-related factors contributing to the behavior of the WSN. Hence, an important step toward the resiliency assessment of a WSN is to evaluate i) how the node workload, hardware platforms, topology, and routing protocols affect the failure proneness of nodes and of the network, and, vice versa, ii) how node and network failures affect the nominal behavior of the WSN (e.g., how the failure of a node alters the behavior of the running workload or routing protocols). The failure of a single node may affect the behavior of the overall network in an unmanageable number of ways. Conversely, different user choices (e.g., the node workload and the routing algorithm) influence the nominal behavior as well as the failure behavior of every single node.
To master this complexity, the approach separates the assessment of the failure behavior from the evaluation of the nominal behavior by considering i) a set of parametric analytical failure models, and ii) a WSN behavioral model, respectively. Initially, the behavioral model is used to configure the WSN in terms of hardware platform, topology, routing and MAC protocols, and to study the nominal behavior of the software, including the OS, and the power consumption of the nodes. Evaluations performed with the behavioral model are used to gather values for the failure model parameters of the WSN under study, such as the packet forwarding rate of each node. Then the power of the analytical failure model is exploited to evaluate a set of metrics of interest, such as the resiliency. However, some parameters are dynamic over time, i.e., their values need to be dynamically updated during the assessment, driven by the failure model. As an example, consider a node X that stops working due to battery exhaustion. After this failure, the routing tree needs to be updated, and traffic patterns in the network change consequently. Different traffic patterns in turn cause different battery discharge rates across nodes, which finally affect the lifetime of individual nodes and, likely, of the WSN. A possible solution would be to stop the failure model at each change event and step back to the behavioral simulation in order to recompute network parameters consistently with the new working conditions; however, this would come at the price of unaffordable simulation costs. For this reason, the proposed approach delegates the effort of computing the variation of dynamic parameters to an additional component, here referred to as the External Engine, which orchestrates the evolution of the failure model. The External Engine can be regarded as a supervision entity encapsulating and managing aspects that are generally difficult to express at the level of abstraction of analytical models.
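The interplay described above, where a failure triggers re-routing and re-routing changes discharge rates, can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the External Engine of the thesis: the shortest-hop routing tree, the linear discharge law (`base_rate` plus `per_packet_cost` per packet sent or forwarded), the deterministic battery exhaustion, and all names are hypothetical.

```python
from collections import deque

def routing_tree(adj, sink, failed):
    """BFS shortest-hop tree toward the sink; returns {node: parent}."""
    parent = {sink: None}
    queue = deque([sink])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in failed and v not in parent:
                parent[v] = u
                queue.append(v)
    return parent

def discharge_rates(parent, sink, base_rate, per_packet_cost):
    """Assumed law: each connected node sends one packet per time unit
    and also forwards the packets of all its descendants in the tree."""
    load = {n: 1 for n in parent if n != sink}
    for n in list(load):
        p = parent[n]
        while p is not None and p != sink:
            load[p] += 1          # ancestor p forwards n's packet
            p = parent[p]
    return {n: base_rate + per_packet_cost * load[n] for n in load}

def simulate(adj, sink, energy, base_rate=1.0, per_packet_cost=1.0):
    """Returns the (time, node) battery-exhaustion events in order."""
    energy = dict(energy)
    failed, events, now = set(), [], 0.0
    while True:
        # Engine step: recompute the dynamic parameters after each change.
        parent = routing_tree(adj, sink, failed)
        rates = discharge_rates(parent, sink, base_rate, per_packet_cost)
        if not rates:             # no node can reach the sink any more
            break
        # Failure-model step: next battery exhaustion among connected nodes.
        node = min(rates, key=lambda n: energy[n] / rates[n])
        dt = energy[node] / rates[node]
        for n in rates:
            energy[n] = max(0.0, energy[n] - rates[n] * dt)
        now += dt
        failed.add(node)
        events.append((now, node))
    return events

# Hypothetical topology: B can reach the sink S through either A or C.
adj = {"S": ["A", "C"], "A": ["S", "B"], "C": ["S", "B"], "B": ["A", "C"]}
events = simulate(adj, "S", {"A": 12.0, "B": 12.0, "C": 12.0})
print(events)
```

In this toy run, A initially relays B's traffic and exhausts its battery first; B is then re-routed through C, whose discharge rate rises and whose lifetime shortens accordingly. Isolated nodes are simply ignored here, which is a simplification.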
Hence, the engine is essential to keep models simple, general, and reusable. The use of the External Engine decouples analytical models from “changes” management issues, simplifying the failure model, which can adapt transparently to each change as it occurs. Moreover, the assessment is more realistic since it encompasses all network- and application-related parameters that are likely to change during the WSN lifetime, without the need for strong simplifying assumptions. The proposed approach is also conceived to reduce the modeling effort of final users by automating the creation of failure models, of the metrics to be estimated, and of the experiments to be performed. To this end, information collected after the behavioral model simulation, concerning the adopted node and sensing platform, radio chip, batteries, workload, and topology, is exploited to specialize a specific set of templates from a Failure Model Template Library. Failure Model Templates are skeletons of failure models, described by means of XML files, which are produced once and for all by a domain expert. They are composed of i) a well-defined interface, ii) a part depending on the specific WSN, and iii) a fixed part. Well-defined interfaces are used to compose complex models by joining different sub-models together. The template parts depending on the specific WSN are the target of the automated failure model generation, since they need to be generated according to the considered WSN. For instance, after the behavioral simulation, a model is generated for each sensor node with as many output links to other node failure models as it has neighbors in the topology configured by the user. Generated models are then populated with parameters whose values reflect the WSN studied in the behavioral simulation, e.g., for each node, the generated output link models are populated with packet loss probabilities whose values have been evaluated during the behavioral simulation.
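As a rough illustration of how the WSN-dependent part of a template might be specialized, the Python sketch below generates one XML element per node with one output link per neighbor, each populated with a packet loss probability. The element and attribute names (`failureModel`, `fixed`, `outputLinks`, `link`, `packetLoss`), the platform label, and the sample values are hypothetical and do not reflect the actual schema of the Failure Model Template Library.

```python
import xml.etree.ElementTree as ET

def generate_node_model(node_id, neighbors, loss_prob, platform):
    """Specialize a (hypothetical) node failure-model template.

    neighbors: neighbor ids taken from the configured topology.
    loss_prob: {(src, dst): packet loss probability} as measured
               in the behavioral simulation (sample values below).
    """
    model = ET.Element("failureModel", {"node": node_id, "platform": platform})
    # Fixed part of the template: identical for every node.
    ET.SubElement(model, "fixed", {"template": "node_core"})
    # WSN-dependent part: one output link per neighbor.
    links = ET.SubElement(model, "outputLinks")
    for nb in neighbors:
        ET.SubElement(links, "link", {
            "to": nb,
            "packetLoss": f"{loss_prob[(node_id, nb)]:.3f}",
        })
    return model

# Sample inputs standing in for behavioral-simulation results.
topology = {"n1": ["n2", "n3"], "n2": ["n1"], "n3": ["n1"]}
loss_prob = {("n1", "n2"): 0.05, ("n1", "n3"): 0.12,
             ("n2", "n1"): 0.04, ("n3", "n1"): 0.10}

models = {n: generate_node_model(n, nbs, loss_prob, "example-platform")
          for n, nbs in topology.items()}
xml_text = ET.tostring(models["n1"], encoding="unicode")
print(xml_text)
```

Keeping the fixed part and the generated part in separate child elements mirrors the three-part template structure described above, so that only the links need to be regenerated when the topology changes.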
The generation phase ends by producing an XML description for each generated model, which is complemented by an XML description of the metrics and experiments of interest to perform (e.g., selected by the user, consistently with his/her interests). Finally, a parser translates the XML descriptors into a format compliant with the selected analytical model formalism, i.e., it specializes the produced XML for the modeling framework chosen for performing the experiments of interest. Relying on an automated modeling phase, the proposed approach allows final users (i.e., WSN developers) to work within their knowledge domain, without requiring specific modeling and/or programming skills. In other words, developers interact with artifacts that are related to their domain, such as behavioral simulators. Finally, interested industries may release failure model libraries upon the release of WSN hardware, following the same approach as for HDL libraries. In the context of this thesis, the Stochastic Activity Networks (SAN) formalism and the Mobius framework are adopted to develop and simulate WSN failure models, due to their flexibility and extensibility. The effectiveness of the approach is shown by means of a resiliency assessment campaign based on a set of hypothetical, real-world WSNs. As will be detailed later, the approach makes it possible to anticipate design choices by evaluating the resiliency under different failure conditions and scenarios, workload behaviors, and routing algorithms, using the same set of parametric SAN models. The approach can be adopted by a hypothetical user, who can exploit simulation results to fine-tune applications, for instance by selecting an appropriate routing algorithm and/or application workload that makes the WSN able to fulfill given requirements, e.g., in terms of resiliency.
The proposed approach may also help in the case of already deployed WSNs, for instance by forecasting the time when the WSN will exhibit degraded behavior by deviating from its specifications, helping to schedule maintenance actions in advance. This thesis is organized in 7 chapters as follows. Chapter 1 provides a brief overview of WSNs and their applications, stressing their requirements and the importance of resiliency in the considered scenarios. Chapter 2 analyzes the state of the art in the field of WSN simulation and dependability assessment. Chapter 3 provides the definition of both connection and data delivery resiliency. Chapter 4 focuses on the holistic approach, the objective of this thesis, and presents challenges and solutions for the orchestration of the behavioral and analytical simulation and for the automated failure model generation. Chapter 5 presents the behavioral models and the parameters needed to generate the failure model. Chapter 6 presents the failure models and the modeling approach followed. Chapter 7 finally provides a set of case studies aimed at showing the effectiveness of the approach for different WSN deployments.