IN CONCURRENCE WITH TEST TECHNOLOGY TECHNICAL COUNCIL (TTTC) OF THE IEEE COMPUTER SOCIETY

MINISTRY OF EDUCATION AND SCIENCE OF UKRAINE

KHARKOV NATIONAL UNIVERSITY OF RADIOELECTRONICS

ISSN 1563-0064

# RADIOELECTRONICS & INFORMATICS

**Scientific and Technical Journal** 

Founded in 1997

№ 1 (40), January – March 2008

Published 4 times a year

© Kharkov National University of Radioelectronics, 2008

Certificate of the State Registration KB № 12097-968 ПР 14.12.2006

# **International Editorial Board:**

Y. Zorian – USA; M. Karavay – Russia; R. Ubar – Estonia; S. Shoukourian – Armenia; D. Speranskiy – Russia; M. Renovell – France; A. Zakrevskiy – Byelorussia; R. Seinauskas – Lithuania; Z. Navabi – Iran; E. J. Aas – Norway; J. Abraham – USA; A. Ivanov – Canada; V. Kharchenko – Ukraine; O. Novak – Czech Republic; Z. Peng – Sweden; B. Bennetts – UK; P. Prinetto – Italy; V. Tarassenko – Ukraine; V. Yarmolik – Byelorussia; W. Kusmicz – Poland; E. Gramatova – Slovakia; H-J. Wunderlich – Germany; S. Demidenko – New Zealand; F. Vargas – Brazil; J-L. Huertas Diaz – Spain; M. Hristov – Bulgaria; W. Grabinsky – Switzerland; A. Barkalov – Poland, Ukraine

# **Local Editorial Board:**

Bondarenko M.F. – Ukraine; Bykh A.I. – Ukraine; Volotshuk Yu.N – Ukraine; Gorbenko I.D. – Ukraine; Gordienko Yu.E. – Ukraine; Dikarev V.A. – Ukraine; Krivoulya G.F. – Ukraine; Nerukh A.G. – Ukraine; Petrov E.G. – Ukraine; Presnyakov I.N. – Ukraine; Rutkas A.G. – Ukraine; Rudenko O.G. – Ukraine; Svir I.B. – Ukraine; Svich V.A. – Ukraine; Semenets V.V. – Ukraine; Slipchenko N.I. – Ukraine; Terzijan V.Ya. – Ukraine; Chumachenko S.V. – Ukraine; Hahanov V.I. – Ukraine; Yakovenko V.M. – Ukraine; Yakovlev S.V. – Ukraine

Address of journal edition: Ukraine, 61166, Kharkiv, Lenin Avenue, 14, KNURE, Design Automation Department, room 321, ph. (0572) 70-21-326, Dr. Hahanov V.I. E-mail: ri@kture.kharkov.ua; hahanov@kture.kharkov.ua

# CONTENTS

- Raimund Ubar, Gert Jervan, Artur Jutman, Jaan Raik, Peeter Ellervee, Margus Kruus. Research in Digital Design and Test at Tallinn University of Technology
- Øystein Gjermundnes, Einar J. Aas. Enhancing Path Delay Fault Coverage by Weighted Pseudorandom Test Generation
- Arkadij Zakrevskij. Programming Calculations in Many-Dimensional Boolean Space
- Ondřej Novák, Jiří Jeníček. Test Pattern Overlapping - a Promising Compression Method for Narrow Test Access Mechanism SOC Circuits
- Ivan E. Villalon-Turrubiates and Yuriy V. Shkvarko. Comparative Study of the Descriptive Experiment Design and Robust Fused Bayesian Regularization Techniques For High-Resolution Radar Imaging
- Oleksiy V. Klymenko. Theoretical Study of Diffusion and Adsorption Inside Nano- and Mesoporous Active Particles
- Natalia Shabaldina, Nina Yevtushenko. Solving Parallel Multi Component Automata Equations
- Dmitry Speranskiy. Experiments with the Linear Automata and Synthesis Test to Them
- Dmitrienko V.D., Leonov S.Yu., Gladkikh T.V. Research digital devices by means of modelling system on the basis of K-Value differential calculus
- Igor N. Presnjakov, Leonid I. Nefedov, Stanislaw A. Krivenko, and Alexander P. Stativka. Theory and Applications of Constrained Linear Predictive (LP) models
- A. Filipenko, O. Sychova. Monitoring of Photonic-Crystal Fibers Positioning in the Connection Process
- Ryabtsev V.G, Almadi M.K. Features of Decision Support's Program at Choice of Tests Optimized Sequence for Semiconductors Memory Diagnosing
- Vladimir Hahanov, Eugenia Litvinova, W. Gharibi. General Testing Models of SOC Hardware-Software Components
- Preparation of Papers for IEEE TRANSACTIONS and JOURNALS

# Research in Digital Design and Test at Tallinn University of Technology

Raimund Ubar, Gert Jervan, Artur Jutman, Jaan Raik, Peeter Ellervee, Margus Kruus

Abstract – An overview of recent research results at the Tallinn University of Technology in the field of digital design and test is presented. The main topics discussed in the paper are digital design, verification, emulation, dependability, fault simulation, and test generation. An experimental research environment is described which consists of prototype tools developed as a side-effect of our research activities. Together with a set of dedicated e-learning tools, this environment is also used for teaching the design and test of embedded systems.

#### 1. INTRODUCTION

INCREASING complexity of electronic systems has made testing and verification one of the most complicated and time-consuming problems in system design and production. The importance of design for testability is growing because the expenses of testing are becoming the major components of the design and manufacturing costs of new products. It is estimated that more than 70% of the design cycle for systems is spent on test and verification [1]. Nanometer technologies are introducing new challenges making test quality and dependability of systems a very fast moving target [2]. Enhancing productivity and quality of test related solutions is thus a key competitive aspect, both in terms of time-to-market and end-product quality.

In this paper an overview of the recent results in the field of digital test at Tallinn University of Technology (TUT) is presented. One of the most important research areas has been multi-level diagnostic modeling of digital systems by Decision Diagrams (DD) [3]. Using DDs, a hierarchical automated test program generator DECIDER was developed which outpaces similar known academic systems in the speed of test generation [4]. Commercial tools of this type are missing today. A special class of Binary DDs (BDD) called structurally synthesized BDDs (SSBDD) has been developed [3], which made it possible to implement an ultra-fast fault simulator for combinational circuits [5]. Based on SSBDDs, a defect-oriented test generator DOT was developed which is unique in its ability to prove the redundancy of physical defects in digital circuits [6].

Recent research results in the field of reconfigurable logic made it possible to create a hardware accelerator that replaces traditional software simulators and increases the speed of fault simulation in digital circuits about 200 times [7]. Most of our current research is concentrated on the topical problems of testing Networks-on-Chip (NoC) [8].

A set of prototype tools, developed as a side-effect of our research, together with a dedicated set of e-learning tools created within several EU projects, is now used at TUT for teaching design and test, design for testability, and fault tolerance. The tools complement lecture courses with hands-on training.

In the following, several of the most important research results obtained in recent years at TUT are presented. In Section 2 the results in design verification are presented. Section 3 describes simulation speed-up possibilities by hardware emulation, whereas Section 4 describes our research on dependability issues. Section 5 presents new software algorithmic possibilities to increase the speed of fault simulation, and in Section 6 a novel approach to defect-oriented test generation is presented. Finally, in Section 7 an overview is given of the prototype tool environment developed as a side-effect of our research.

#### 2. HIGH-LEVEL DD-BASED VERIFICATION

With the increase in size and complexity of modern ICs, it has become imperative to address critical verification issues in the design cycle. The process of verifying correctness of designs consumes between 60% and 80% of design effort [9]. For every designer the number of verification engineers may vary from 2 to 4 depending on the design complexity. Moreover, validation is so complex that, even though it consumes most of the computational resources and time, it is still the weakest link in the design process. Ensuring functional correctness is the most difficult part of designing a hardware system [10].



Fig. 1. APRICOT verification flow

We have developed a new framework for digital systems verification, called APRICOT (Assertions, PRopertIes, COde coverage and Test generation) [11]. It includes different tasks, such as assertion checking, code coverage analysis, simulation, test generation and property checking. APRICOT is easy to set up and use. The novelty of APRICOT lies in a system representation model called High-Level Decision Diagrams (HLDD). The framework has interfaces to commonly used design formats such as VHDL, SystemC, PSL and EDIF. Fig. 1 presents the general structure of the APRICOT framework.

Decision Diagrams have been used in verification for about two decades. Reduced Ordered BDDs [12], as canonical forms of Boolean functions, have their application in equivalence checking and in symbolic model checking. Additionally, a higher abstraction level DD representation, called Assignment Decision Diagrams (ADD) [13], has been successfully applied to both register-transfer level (RTL) verification and test.

In this paper we consider a different decision diagram representation, High-Level Decision Diagrams (HLDD), which, unlike ADDs, can be viewed as a generalization of BDDs. HLDDs can be used for representing different abstraction levels from RTL to behavioral. HLDDs have proven to be an efficient model for simulation and diagnosis since they provide for fast evaluation by graph traversal and for easy identification of cause-effect relationships [14].

#### 2.1. Code Coverage Analysis

Code coverage provides insight into how thoroughly the code of a design is exercised by a suite of simulations. Code coverage analysis is a well-defined, well-scalable procedure and, thus, applicable to large designs. The main limitation of code coverage metrics lies in the fact that they only measure the quality of the test case in stimulating the implementation and do not necessarily prove its correctness with respect to the specification.

We have shown how classical coverage metrics map to HLDD constructs [15]. Covering all nodes in the HLDD model corresponds to covering all statements in the respective HDL description. However, the opposite is not true: HLDD node coverage is more stringent than HDL statement coverage [15]. This is because in HLDDs a separate diagram is generated for each data variable. Such partitioning by variables adds extra context to statement coverage. Like statement coverage, branch coverage also has a very clear representation in HLDD simulation: the ratio of edges activated during HLDD simulation to all edges in the model constitutes the branch coverage.
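As a minimal sketch of how these two metrics can be computed, the Python fragment below models one HLDD as a dictionary and records which nodes and edges a set of stimuli activates. The diagram, the variable names (`reset`, `enable`, `cnt`) and the helper functions are hypothetical and serve only as an illustration; they are not taken from APRICOT.

```python
# Hypothetical HLDD for one variable: a nonterminal node tests a variable and
# follows the edge labelled with its value; terminal nodes have no edges.
HLDD = {
    "n0": {"var": "reset", "edges": {0: "n1", 1: "t_zero"}},
    "n1": {"var": "enable", "edges": {0: "t_hold", 1: "t_inc"}},
    "t_zero": {"var": "#0", "edges": {}},
    "t_hold": {"var": "cnt", "edges": {}},
    "t_inc": {"var": "cnt+1", "edges": {}},
}

def simulate(hldd, root, stimulus):
    """Traverse the diagram for one stimulus; return visited nodes and edges."""
    nodes, edges = [root], []
    node = root
    while hldd[node]["edges"]:
        value = stimulus[hldd[node]["var"]]
        edges.append((node, value))          # an edge is identified by (node, value)
        node = hldd[node]["edges"][value]
        nodes.append(node)
    return nodes, edges

def coverage(hldd, root, stimuli):
    """Return (node coverage, edge coverage) as fractions between 0 and 1."""
    all_nodes = set(hldd)
    all_edges = {(n, v) for n, spec in hldd.items() for v in spec["edges"]}
    seen_nodes, seen_edges = set(), set()
    for stimulus in stimuli:
        nodes, edges = simulate(hldd, root, stimulus)
        seen_nodes.update(nodes)
        seen_edges.update(edges)
    return len(seen_nodes) / len(all_nodes), len(seen_edges) / len(all_edges)

tests = [{"reset": 1, "enable": 0}, {"reset": 0, "enable": 1}]
node_cov, edge_cov = coverage(HLDD, "n0", tests)
print(f"node coverage {node_cov:.0%}, edge coverage {edge_cov:.0%}")
```

With these two stimuli the terminal node `t_hold` and the edge leading to it are never activated, so neither metric reaches 100%.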

#### 2.2. Assertion-Based Verification

Assertions have been found to be beneficial for solving a wide range of tasks in systems design, ranging from modelling and verification to manufacturing test. In this paper, we present an approach to checking PSL assertions using HLDDs. Property Specification Language (PSL) is a recently accepted IEEE standard language [16] that is commonly used to express assertions. Here, the assertions are translated into HLDD graphs and integrated into fast HLDD-based simulation. The structure of the HLDD design representation with the temporal extension proposed in [17] allows straightforward and lossless translation of PSL properties.



Fig. 2. PSL property reqack

An example PSL property reqack is shown in Figure 2, and a possible timing diagram is illustrated in Figure 3a. The property states that ack must become high in the cycle after req is high. Figure 3b shows a system behaviour that activates the reqack property but violates it. Figure 3c shows the case when the property is not activated.



Fig. 3. Timing diagrams for the property reqack

Let us consider an example PSL property P1.

`P1: assert always !ready and (a=b) -> next_e[1:3] ready`

Assertion P1 states that whenever 'ready' is low and 'a' is equal to 'b', 'ready' must become true within the next three cycles. The resulting HLDD graph describing this property is shown in Fig. 4.



Fig. 4. HLDD for property P1

Note that an HLDD representing an assertion always has exactly three terminal nodes labeled by constants:

- FAIL: the assertion has been simulated and does not hold;
- PASS: the assertion has been simulated and holds;
- CHECKING: the assertion has been simulated and it does not fail, nor does it pass non-vacuously.

Experiments have shown that HLDDs are an efficient model for performing assertion checking [17].
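To illustrate the three verdicts, the following Python sketch evaluates property P1 directly on a recorded trace. It is a plain trace checker written under our own assumptions, not the HLDD-based simulation of [17]: the function `check_p1`, the tuple layout of the trace and the sample values are hypothetical.

```python
def check_p1(trace, t):
    """Evaluate P1 for the evaluation attempt that starts at cycle t.

    trace is a list of (ready, a, b) tuples, one per clock cycle.
    """
    ready, a, b = trace[t]
    if ready or a != b:
        return "CHECKING"           # antecedent not activated: no non-vacuous pass
    window = trace[t + 1 : t + 4]   # cycles t+1 .. t+3, as in next_e[1:3]
    if any(r for r, _, _ in window):
        return "PASS"               # ready became true inside the window
    if len(window) < 3:
        return "CHECKING"           # trace ended before the window closed
    return "FAIL"                   # full window observed, ready never true

# Hypothetical trace: one (ready, a, b) tuple per clock cycle
trace = [(0, 1, 1), (0, 1, 0), (1, 0, 0), (0, 2, 2), (0, 3, 1), (0, 3, 1), (0, 3, 1)]
verdicts = [check_p1(trace, t) for t in range(len(trace))]
print(verdicts)
```

In this trace the attempt starting at cycle 0 passes, because 'ready' rises at cycle 2, while the attempt starting at cycle 3 fails, because 'ready' stays low during cycles 4 to 6.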

#### 2.3. Formal Verification

The formal methods to be included in the APRICOT framework are high-level Automated Test Pattern Generation (ATPG) [18] and formal property checking. The latter is under development and will be reduced to using the former as a model-checking engine.

#### 3. USING EMULATION FOR SIMULATION SPEED-UP

One of the main problems when designing modern on-chip systems is to make sure that the very first version of the chip is "alive". That is, (1) all essential hardware components are working, and (2) application software and drivers are ready when the chip arrives from the factory (see e.g. [19]). Software simulation is the simplest way to check the functionality of a system and is typically the first choice. Unfortunately, because of its slowness, simulation does not guarantee that the results are available when needed, and different approaches for accelerating the simulation process are under development. A possible solution is emulation (simulation in hardware) using reconfigurable logic such as FPGAs (see e.g. [20]). To model a hardware module, hardware description language (HDL) based simulation is the choice of most engineers. Since a model of the system may consist not only of HDL modules at various levels of abstraction but also of modules written in different HDLs, a complicated simulation environment is required to make sure that the system works as expected [21].

Fault simulation is another widely used procedure in the digital circuit design flow. Test generation, fault diagnosis, and test set compaction are examples of applications of fault-free and fault simulators. Accelerating fault simulation would therefore benefit all of these applications.

The maximum gain in performance could be achieved by moving all the required modules into hardware, i.e., emulating the test-bench in hardware. There exist several approaches that confirm the usefulness of replacing simulation with emulation (see, e.g., [19,20]). Difficulties arise when the test-bench is so complex that major modifications are needed to implement it in hardware, since test-benches are only models and are not meant to be implemented in hardware. To test this idea, an environment was created to evaluate the feasibility of replacing fault simulation with FPGA-based emulation [7].

The availability of large FPGAs makes it possible not merely to implement the circuit under test along with the fault models but also to include test vector generation and output response analysis circuits, which in this case correspond to the test-bench, in a single reconfigurable device. Here we relied on a well-known BIST solution: a Linear Feedback Shift Register (LFSR) is used both for input vector generation and for output correctness analysis. Automating the generation of the emulation environment was rather easy because of the modular structure of the hardware part. All commonly used modules are written in VHDL, which allows design units to be parameterized (see Fig. 5). The abstraction level of the VHDL modules corresponds to the register-transfer level, thus allowing the use of basically any FPGA mapping tool.
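The double role of the LFSR can be sketched in software as follows. This is a behavioural Python model under our own assumptions, not the VHDL modules of [7]: the 4-bit register, the feedback taps, the toy circuit under test and the injected stuck-at-0 fault are all hypothetical.

```python
def lfsr_states(seed, taps, width, count):
    """Yield `count` successive states of a right-shifting Fibonacci LFSR."""
    state = seed
    for _ in range(count):
        yield state
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        state = (state >> 1) | (feedback << (width - 1))

def signature(bits, taps, width):
    """Compress a serial output bit stream into a signature register."""
    sig = 0
    for bit in bits:
        feedback = bit
        for t in taps:
            feedback ^= (sig >> t) & 1
        sig = (sig >> 1) | (feedback << (width - 1))
    return sig

def circuit(vec, stuck_at_0=False):
    """Toy circuit under test: y = (a AND b) OR (c AND d).

    With stuck_at_0=True the internal line (a AND b) is forced to 0.
    """
    a, b, c, d = (vec >> 3) & 1, (vec >> 2) & 1, (vec >> 1) & 1, vec & 1
    ab = 0 if stuck_at_0 else a & b
    return ab | (c & d)

WIDTH, TAPS = 4, (0, 1)   # assumed taps; they give the full period of 15 states
patterns = list(lfsr_states(seed=0b1000, taps=TAPS, width=WIDTH, count=15))
good = signature((circuit(p) for p in patterns), TAPS, WIDTH)
faulty = signature((circuit(p, stuck_at_0=True) for p in patterns), TAPS, WIDTH)
print(f"good signature {good:04b}, faulty signature {faulty:04b}")
```

The generator visits every non-zero 4-bit vector once, and the fault is reported as detected when the two signatures differ. In the emulation setting both registers would run inside the FPGA next to the circuit under test, so that only the final signature has to be read out.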



Fig. 5. Fault emulation environment structure

The proposed approach allows a simulation speed-up of 40-500 times compared to software-based fault simulation [7]. It should be noted that when synthesis time is also taken into account, the speed-up is much smaller; the approach is therefore most beneficial in scenarios where the number of simulation runs is large, e.g., evaluation of generator/analyzer structures for BIST.

Based on experience with fault emulation, a more elaborate emulation environment is under development. The need for such an environment arises from the fact that the whole description of a system almost always contains modules described at different abstraction levels. Some of these parts are never meant to be implemented in hardware, e.g., test-benches and application software. As a first approximation, the following three levels can be distinguished:

- Register-transfer level that is synthesizable and therefore directly implementable on FPGA.
- Behavioral (functional) level that is synthesizable by high-level synthesis tools under certain circumstances.
- The rest, essentially software, which abstracts away hardware-related issues and is therefore compilable for the processor (core) used.



Fig. 6. Multi-module emulation environment

As a result, the needed complexity and performance of the simulator will be reduced. Fig. 6 depicts the structure of such an emulation environment, consisting of multiple parallel modules. Additional information about research challenges and potential solutions, especially those related to synchronization, can be found in [22,23].

#### 4. SYSTEM LEVEL DESIGN OF DEPENDABLE REAL-TIME SYSTEMS-ON-CHIP

Future on-chip systems will resemble computer networks more than traditional chips, and the Network-on-Chip (NoC) paradigm has been proposed for them. In addition, new integration methodologies have enabled new 3D architectures, where the dies are stacked into 3-dimensional structures, thus providing even higher density and complexity.

As technologies advance and semiconductor process dimensions shrink into the nanometer and subnanometer range, a high degree of sensitivity to defects begins to impact overall yield and quality [3]. It becomes very expensive to obtain perfectly operational hardware and the design processes have to be changed.

#### 4.1. Our NoC Platform

Our NoC platform is a scalable packet-switched communication platform for single-chip heterogeneous systems. Hard guarantees are provided by means of Time Division Multiple Access (TDMA).

The NoC topology is an m x n (2D) mesh with bi-directional links between the switches. Each switch is connected to 4 switches and to 1 resource. Every resource is connected to a switch via a resource network interface (RNI). The NoC platform uses a subset of the OSI Reference Model layers: physical, data link, network, transport and application layer. A resource operates on all 5 layers while switches operate on the 4 lower layers. We use wormhole switching with virtual channels and deterministic dimension-ordered (XY) routing.

We concentrate on hard real-time, data-dominated, event-triggered NoC systems. However, not all of the traffic must be real-time.
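Dimension-ordered (XY) routing can be sketched in a few lines of Python. The function name `xy_route` and the (x, y) coordinate convention for switches are assumptions made for this illustration; virtual channels and TDMA slots are not modelled.

```python
def xy_route(src, dst):
    """Return the switch coordinates visited from src to dst (both inclusive)."""
    (x, y), (dst_x, dst_y) = src, dst
    path = [(x, y)]
    while x != dst_x:                 # first travel along the X dimension
        x += 1 if dst_x > x else -1
        path.append((x, y))
    while y != dst_y:                 # then travel along the Y dimension
        y += 1 if dst_y > y else -1
        path.append((x, y))
    return path

route = xy_route((0, 0), (2, 1))
print(route)   # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Because the route depends only on the source and destination coordinates, the number of hops, and hence the routing part of the communication latency, is known at design time.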

#### 4.2. System Specification & Design

In our approach the application is specified in C code. From this input description we extract the Extended Conditional Task Graph (ECTG). In a NoC, the communication platform introduces communication latency, which depends not only on message size but also on resource mapping and needs to be taken into account. Fig. 7a depicts an example task graph which describes an extract of a GSM decoder.

The ECTG describes the application tasks, their dependencies and task parameters, for example Worst Case Execution Time (WCET), task size etc. Additionally, we need the system reliability requirements (how many faults need to be tolerated) and a description of the available hardware resources. All the information above and the ECTG itself are captured in an XML file. An example of the captured information for a part of the GSM decoder can be seen in Fig. 7b.

Once we have the refined task graph, system architecture and dependability description, we need to produce a schedule which meets the application deadlines and dependability requirements. During the whole process we also take possible task mappings into consideration. For example, data-intensive tasks could be mapped to the same resource or to nearby resources to compensate for network latency. Exact scheduling is known to be an NP-hard problem. Therefore, different heuristics are used for calculating near-optimal schedules in reasonable time. Fig. 7c depicts an example of scheduling the task graph of Fig. 7a on a multi-processor system.
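One simple heuristic of this kind is greedy list scheduling, sketched below in Python. It is an illustration only and not necessarily the heuristic used in our tools: the task names, WCET values, the assumption that resources sit in a row, and the delay model (`hop_delay` per hop between resources) are all hypothetical.

```python
def list_schedule(wcet, deps, n_res, hop_delay=1):
    """Greedy list scheduling with mapping.

    Each ready task is placed on the resource that gives the earliest finish
    time. Resources are assumed to sit in a row, so a message between
    resources r1 and r2 is delayed by hop_delay * |r1 - r2|.
    Returns {task: (resource, start, finish)}.
    """
    free = [0] * n_res                 # time at which each resource becomes idle
    placed = {}
    remaining = dict(wcet)
    while remaining:
        # pick the first task whose predecessors have all been scheduled
        task = next(t for t in remaining
                    if all(p in placed for p in deps.get(t, ())))
        best = None
        for r in range(n_res):
            ready = max((placed[p][2] + hop_delay * abs(placed[p][0] - r)
                         for p in deps.get(task, ())), default=0)
            start = max(ready, free[r])
            finish = start + remaining[task]
            if best is None or finish < best[2]:
                best = (r, start, finish)
        placed[task] = best
        free[best[0]] = best[2]
        del remaining[task]
    return placed

wcet = {"A": 2, "B": 3, "C": 2, "D": 1}               # hypothetical WCETs
deps = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}       # task -> predecessors
schedule = list_schedule(wcet, deps, n_res=2)
makespan = max(finish for _, _, finish in schedule.values())
print(schedule, makespan)
```

The resulting makespan can then be compared with the application deadline; a dependability-aware scheduler would additionally reserve slack for re-executing tasks hit by transient faults.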

#### 4.3. Dependability Analysis

Reliability improvement techniques have been extensively studied for various systems, but they either target bus-based embedded or macro distributed systems or cover only the lower layers of a NoC. Our objective is to extend those techniques to the system level, to provide design support at early stages of the design flow. The application should be able to tolerate transient or intermittent faults. Permanent faults can be handled by re-scheduling and re-mapping the application on a NoC. During the scheduling process we have on the one hand the list of tasks with their Worst Case Execution Times and on the other hand the dependability requirements.

This is a design area where no integrated system-level design methodology with dependability requirements exists. One of the objectives is also to extend the existing system-level design tasks to new design paradigms, such as NoC-based systems and 3D architectures.



Fig. 7. Scheduling and mapping

#### 5. ULTRA-FAST FAULT SIMULATION

Fault simulation is a well investigated research field, and many methods have been proposed during the last decades, such as parallel fault simulation, deductive fault analysis, and critical path tracing. The main problem of the very powerful critical path tracing method is the handling of reconvergent fan-outs. It can process all the faults in the circuit in a single simulation run; however, it is exact only in fan-out free circuits. A modified rule-based critical path tracing technique that is linear-time, exact, and complete was proposed in [24]. However, the rule-based strategy does not allow simultaneous parallel analysis of many patterns beyond the fan-out free regions.

In [25] we proposed a new concept of parallel critical path tracing throughout the whole circuit.

Unlike the known critical path tracing approaches, the method creates an ordered topological model for parallel fault backtracing. The model is based on the full Boolean differential, which allows parallel critical path fault tracing to be generalized beyond the reconvergent fan-out stems (see Fig. 8). The method is based on the following theorem [25].

**Theorem:** If a stuck-at fault is detected by a test pattern at the output *y* of a subcircuit (see Fig. 8) represented by a Boolean function $y = F(x_1, \dots, x_i, x_j, \dots, x_n)$, then the fault at the fan-out stem *x*, which converges in *y* at the inputs $x_1, \dots, x_i$, is also detected iff

$$\frac{\partial y}{\partial x} = y \oplus F((x_1 \oplus \frac{\partial x_1}{\partial x}), \dots, (x_i \oplus \frac{\partial x_i}{\partial x}), x_j, \dots, x_n) = 1 \quad (1)$$

Formula (1) yields a method for generalizing parallel exact critical path tracing beyond the fan-out free regions. All the calculations in (1) can be carried out in parallel because they are Boolean operations. Further details about the solution for nested reconvergencies can be found in [15].
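The pattern-parallel evaluation of (1) can be illustrated with Python integers used as bit vectors, one bit per test pattern. The small reconvergent circuit below (stem *x* with branches `x1 = x AND a` and `x2 = x OR b`, and the function `F`) is a hypothetical example chosen for this sketch; the result of (1) is cross-checked against explicitly flipping the stem and resimulating.

```python
N = 4                          # primary inputs: x (fan-out stem), a, b, c
PATTERNS = 1 << N              # all 16 input combinations, simulated at once
MASK = (1 << PATTERNS) - 1

def word(k):
    """Bit p of the returned word is the value of input k in pattern p."""
    return sum(((p >> k) & 1) << p for p in range(PATTERNS))

def inv(v):
    """Bitwise NOT restricted to the simulated patterns."""
    return ~v & MASK

def F(x1, x2, c):
    """Hypothetical subcircuit whose inputs x1 and x2 reconverge from stem x."""
    return (x1 & inv(x2)) | (c & x2)

def simulate(x, a, b, c):
    x1 = x & a                 # first branch of the fan-out stem
    x2 = x | b                 # second branch of the fan-out stem
    return x1, x2, F(x1, x2, c)

x, a, b, c = (word(k) for k in range(N))
x1, x2, y = simulate(x, a, b, c)

dx1_dx = a                     # d(x AND a)/dx = a
dx2_dx = inv(b)                # d(x OR b)/dx = NOT b
dy_dx = y ^ F(x1 ^ dx1_dx, x2 ^ dx2_dx, c)    # formula (1), all patterns at once

# Reference: complement the stem and resimulate the whole subcircuit
_, _, y_flipped = simulate(inv(x), a, b, c)
reference = y ^ y_flipped

print(f"stem fault propagates to y in {bin(dy_dx).count('1')} of {PATTERNS} patterns")
```

Each bit set in `dy_dx` marks a pattern for which a change at the stem *x* is visible at *y*; the comparison with `reference` confirms that the differential form gives the same answer as resimulation for this circuit.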



Fig. 8. Reconvergent FFR in a circuit

A topological pre-analysis is carried out to generate an efficient optimized model for backtracing of faults to minimize the repeated calculations because of the reconvergent fan-outs. The algorithm is equivalent to exact critical path tracing, while the backtracing is organized in parallel for groups of test patterns. To achieve high simulation speed, the network of macros rather than gates is used. To make it possible to rise from the lower gate level to the higher macro level, the macros are modeled by structurally synthesized BDDs. A special calculation method was developed to handle the SSBDDs in parallel for groups of test patterns [25].

The proposed exact parallel path tracing fault analysis is carried out in the following sessions:

- topological pre-analysis of the circuit to create a model for fault tracing along the critical paths;
- parallel simulation of a given set of test patterns to calculate the values of all variables of the circuit;
- parallel fault backtracing on the topological model created during the first session.

The topological pre-analysis that creates the model for fault backtracing is carried out only once and serves all the subsequent sessions of the procedure. It consists of the following procedures:

- creation of the Reconvergency Graph (RG) of the circuit,
- creation of the whole calculation model of the circuit.

Because of the parallelism, higher abstraction level modeling, and optimization of the topological model, the speed of fault simulation was considerably increased.

Table 1 presents the fault simulation results for circuits of the ISCAS'85 and ISCAS'89 families (column 1) and compares different fault simulators: exact critical path tracing [24] (column 2), two state-of-the-art commercial fault simulators from major CAD vendors (columns 3 and 4), and the proposed new method (column 5). The simulation times were measured for sets of 10,000 random patterns. The time needed for topology analysis is included and is negligible compared to the gain in speed over the previous best method.

TABLE 1. COMPARISON OF TOOLS FOR FAULT SIMULATION (fault simulation time, s)

| Circuit            | [24]  | C1     | C2    | New   |
|--------------------|-------|--------|-------|-------|
| c432               | 70    | 13.0   | 3.8   | 0.64  |
| c499               | 190   | 3.0    | 2.8   | 0.98  |
| c880               | 140   | 26.0   | 4.0   | 0.65  |
| c1355              | 640   | 44.0   | 9.0   | 1.33  |
| c1908              | 640   | 53.0   | 15.6  | 1.61  |
| c2670              | 560   | 104.0  | 11.0  | 1.99  |
| c3540              | 770   | 191.0  | 37.4  | 4.43  |
| c5315              | 1270  | 246.0  | 28.6  | 3.41  |
| c6288              | 4280  | 1159.0 | 139.2 | 46.39 |
| c7552              | 1480  | 378.0  | 40.5  | 5.44  |
| s4863_C            | N/A   | 353.0  | 30.0  | 5.10  |
| s5378_C            | N/A   | 170.0  | 15.9  | 4.17  |
| s6669_C            | N/A   | 416.0  | 40.8  | 7.94  |
| s9234_C            | N/A   | 248.0  | 26.7  | 6.72  |
| s13207_C           | N/A   | 332.0  | 27.2  | 10.18 |
| s15850_C           | N/A   | 470.0  | 57.8  | 13.75 |
| s35932_C           | N/A   | 1751.0 | 111.6 | 36.22 |
| s38417_C           | N/A   | 1351.0 | 157.0 | 39.05 |
| s38584_C           | N/A   | 1399.0 | 115.3 | 34.97 |
| Average speed gain | 258.9 | 41.1   | 5.8   | 1     |

Compared to the commercial tools C1 and C2, the average gain in speed is 41.1 and 5.8 times, respectively. All the experiments were run on a 366 MHz SUN Ultra60 server running the SunOS 5.8 operating system, except the experiments for the known exact critical path fault simulator, for which the data are taken from [24]. The experiments in [24] were run on a 2.8 GHz Pentium 4 computer with Windows XP.
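The "Average speed gain" row of Table 1 can be recomputed from the table itself. The short Python check below assumes that the row is the arithmetic mean of the per-circuit ratios between a tool's time and the time of the new method; this reading is our assumption, and it is checked here for the [24] and C1 columns only.

```python
from statistics import mean

# (time of the compared tool, time of the new method) in seconds, from Table 1
ref24 = [(70, 0.64), (190, 0.98), (140, 0.65), (640, 1.33), (640, 1.61),
         (560, 1.99), (770, 4.43), (1270, 3.41), (4280, 46.39), (1480, 5.44)]
c1 = [(13.0, 0.64), (3.0, 0.98), (26.0, 0.65), (44.0, 1.33), (53.0, 1.61),
      (104.0, 1.99), (191.0, 4.43), (246.0, 3.41), (1159.0, 46.39),
      (378.0, 5.44), (353.0, 5.10), (170.0, 4.17), (416.0, 7.94),
      (248.0, 6.72), (332.0, 10.18), (470.0, 13.75), (1751.0, 36.22),
      (1351.0, 39.05), (1399.0, 34.97)]

def average_gain(rows):
    """Arithmetic mean of the per-circuit ratios t_tool / t_new."""
    return mean([t_tool / t_new for t_tool, t_new in rows])

gain_24 = round(average_gain(ref24), 1)
gain_c1 = round(average_gain(c1), 1)
print(gain_24, gain_c1)   # 258.9 41.1
```

Only the ten ISCAS'85 circuits enter the average for [24], because no times are available from [24] for the ISCAS'89 circuits.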

#### 6. DEFECT-ORIENTED TEST GENERATION

The logical stuck-at fault (SAF) model has long been the prevalent technique for formally handling real physical defects in electronic systems. In today's systems, however, this model poses two difficulties: it is too complex because of the huge number of faults to be handled, and it is too inaccurate to represent the real physical defects that occur in today's nanoelectronic circuits. The paradox is that the two difficulties work against each other: representing the defects with less complex, higher-level fault models decreases the accuracy even further, and, vice versa, increasing the accuracy of defect modeling increases the complexity of the fault model. To break this deadlock, the two opposite trends, high-level modeling and defect orientation, should be combined into a hierarchical approach.

Another problem is that the know-how about defects quickly becomes obsolete. New semiconductor processes will introduce new failure mechanisms, defects, and fault effects. This makes defect-based testing difficult, and all the needed changes in defect modeling should be taken into account and introduced continuously into the database of test generation and fault simulation tools.

We have developed a new approach for hierarchical defect simulation based on defect preanalysis for the components included in the libraries, and on using the results of the preanalysis in higher-level fault modeling. The cornerstone of the new approach is the functional fault model as a method for mapping faults from one hierarchical level to another. Based on this approach, a hierarchical algorithm for defect-oriented deterministic test generation was developed and implemented [6].

A methodology was developed which makes it possible to find the types of defects that may occur in a real circuit, to determine their probabilities of occurrence, and to find the input test patterns (logical constraints) that activate and detect these defects. The set of constraints that allows all defects in a given component to be detected is called the functional fault model of the component.

According to this model, each library component is represented by a set of logical constraints needed for activating the defects in the component. Simulations for finding the constraints are carried out on the layout level of components. The set of logical constraints can be regarded as a method for mapping physical defects to the logic level. During higher (logic) level test generation and fault simulation the physical defects are modeled only by logical constraints without referring back to the layout details.

The proposed functional fault model allows to represent and handle arbitrary physical defects not only in the library components, but also the physical defects in the communication network of components by the same technique.

There is a class of physical defects which increase the number of states in the circuit. To activate these defects, sequences of patterns (sequential constraints) are needed. A simulation-based technique is not the best way to find sequential constraints, so an analytical approach was developed for this purpose. A physical defect in the component is modeled as a defect variable in a generic Boolean differential equation which includes both the correct and the faulty behavior of the component. Solutions of these equations give the logical constraints (or sequential constraints) for activating defects locally. In this way, for example, bridging faults that cause feedback and transform a combinational circuit into a sequential one can be modeled for test generation purposes.
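As an illustration of how such constraints arise, the sketch below compares the correct and the faulty function of a small component and collects the input patterns on which they differ. These are the patterns for which the Boolean derivative of the combined function with respect to the defect variable `d` equals 1. The component function, the wired-OR bridge behavior and all names are assumptions chosen for the example, not taken from the DOT tool, and only the combinational case is shown.

```python
from itertools import product


def good(a, b, c):
    """Fault-free behavior of a hypothetical library component."""
    return (a & b) | c


def faulty(a, b, c):
    """Assumed defect: a bridge between inputs a and b acting as wired-OR."""
    bridged = a | b
    return (bridged & bridged) | c


def combined(a, b, c, d):
    """Generic function covering both behaviors; d = 1 means the defect is present."""
    return ((1 - d) & good(a, b, c)) | (d & faulty(a, b, c))


def activation_constraints(n_inputs=3):
    """Input patterns on which the output depends on d (Boolean derivative = 1)."""
    return [p for p in product((0, 1), repeat=n_inputs)
            if combined(*p, 0) != combined(*p, 1)]


constraints = activation_constraints()
print(constraints)
```

For this assumed bridge, the defect is activated only when `c = 0` and the two bridged inputs carry opposite values. At the higher level such a pattern list would play the role of the component's logical constraints.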

A defect-oriented deterministic test generation tool (DOT) was developed [6]. The experimental data obtained with the tool for the ISCAS'85 benchmark circuits are presented in Table 2. Tests with 100% stuck-at fault coverage covered only about 75-82% of the physical defects (column 5 in Table 2). The main feature of the new tool is its ability to reach 100% defect testing efficiency (the percentage of non-redundant defects covered) for the given set of defects by proving the redundancy of the undetected defects, that is, by proving that these physical defects are redundant with respect to the logic behavior of the circuit.

| Circuit | All defects | Redundant defects, gates | Redundant defects, system | Defect coverage (%), 100% stuck-at fault ATPG | Efficiency (%), 100% stuck-at fault ATPG, gate-level redundancy proved | Efficiency (%), 100% stuck-at fault ATPG, system-level redundancy proved | Efficiency (%), DOT |
|---------|-------|------|-----|------|-------|--------|--------|
| 1       | 2     | 3    | 4   | 5    | 6     | 7      | 8      |
| c432    | 1519  | 226  | 0   | 78.6 | 99.05 | 99.05  | 100.00 |
| c880    | 3380  | 499  | 5   | 75.0 | 99.50 | 99.66  | 100.00 |
| c2670   | 6090  | 703  | 61  | 79.1 | 98.29 | 98.29  | 100.00 |
| c3540   | 7660  | 985  | 74  | 80.1 | 98.52 | 99.76  | 99.97  |
| c5315   | 14794 | 1546 | 260 | 82.4 | 97.73 | 99.93  | 100.00 |
| c6288   | 24433 | 4005 | 41  | 77.0 | 99.81 | 100.00 | 100.00 |

TABLE 2. EXPERIMENTAL DATA OF DEFECT-ORIENTED TEST GENERATION

Column 6 of Table 2 shows the defect testing efficiency after proving the redundancy of defects inside the library cells, and column 7 shows it after proving the redundancy for the whole set of defects. Column 8 shows the defect testing efficiency reached by the test generation tool DOT.
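The difference between defect coverage and defect testing efficiency can be made concrete with a small sketch. It assumes that coverage is taken over all defects and efficiency over the defects not proved redundant, as the definition in the text suggests; the function names and the counts are hypothetical and are not taken from Table 2.

```python
def defect_coverage(detected, all_defects):
    """Percentage of all modeled defects that the test set detects."""
    return 100.0 * detected / all_defects


def defect_testing_efficiency(detected, all_defects, proven_redundant):
    """Percentage of the defects not proved redundant that the test set detects."""
    return 100.0 * detected / (all_defects - proven_redundant)


# Hypothetical counts, for illustration only.
coverage = defect_coverage(detected=930, all_defects=1000)
efficiency = defect_testing_efficiency(detected=930, all_defects=1000,
                                       proven_redundant=60)
print(round(coverage, 2), round(efficiency, 2))
```

Every defect proved redundant shrinks the denominator, so efficiency rises toward 100% as redundancy proofs accumulate even when the test set itself is unchanged.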

#### 7. DESIGN AND TEST RESEARCH ENVIRONMENT

The experimental tools developed as a side effect of the research carried out at TUT over the past 5-6 years are organized as an experimental R&D environment for investigating a broad set of design and test problems (Fig. 9). The environment consists of the following parts:

- Synthesis tools (high-level and logic level synthesis).
- Test generation and fault simulation tools (hierarchical, logic and defect level test sequence generators).
- Converters (interfaces between tools).
- Other (university) tools linked to the environment.

Design information can be created in different ways: (1) as VHDL files to be processed by commercial or experimental high-level or logic synthesis systems, or (2) manually with schematic editors. The gate-level design is presented in the EDIF format. In university research practice, the ISCAS benchmark families, which have their own presentation format (ISCAS format), are widely used. To link the test generation and fault simulation tools with all the needed formats, several converters have been developed. EDIF netlists can be converted into the ISCAS'85 or ISCAS'89 formats. The technology library files necessary to support such conversion have been created for the research environment.

The Turbo Tester tools are based on SSBDDs and have EDIF-SSBDD converters that link them with commercial CAD systems. The hierarchical ATPG DECIDER uses two inputs: higher-level (RTL) descriptions in VHDL and gate-level descriptions in EDIF. For importing VHDL descriptions into DECIDER, which uses high-level DDs as input, a VHDL-DD converter is available.

As a set of examples, the following design flows can be exercised in this environment.

- Design and hierarchical ATPG. An RTL VHDL design is synthesized by a high-level synthesis tool, followed by logic-level synthesis of the high-level blocks. DD and SSBDD models are generated for these designs. Using the DDs and SSBDDs, the hierarchical ATPG DECIDER generates test sequences.
- Logic level ATPG. Using SSBDDs, the Turbo Tester ATPG generates logic-level test patterns targeted to detect logic-level stuck-at faults.
- Defect-oriented ATPG. Using SSBDDs and the defect library, the defect-oriented test generator DOT generates test patterns targeted to detect physical defects. The available defect libraries were created in cooperation with Warsaw University of Technology.
- University tools that traditionally use ISCAS benchmarks can be linked via the EDIF-ISCAS converter to commercial design tools.



Fig. 9. Hierarchical design and test research environment

The Turbo Tester tool set is an independent logic-level test research environment. It consists of a set of tools for solving different test-related tasks by different methods and algorithms:

- Test pattern generation by deterministic, random and genetic algorithms.
- Test optimization (test compaction).
- Fault simulation and fault grading for combinational and sequential circuits by different algorithms.
- Defect-oriented fault simulation and test generation.
- Multi-valued simulation for detecting hazards and analyzing dynamic behavior of circuits.
- BIST analysis and quality evaluation for different BIST architectures.

All the Turbo Tester tools operate on the SSBDD model. The tools run at the structural level, where two options are available: gate-level and macro-level modeling. In the latter case, the gate network is transformed into a macro network in which each macro represents a tree-like sub-network. Using the macro level helps to reduce the complexity of the model and to improve the performance of the tools. The fault model used in Turbo Tester is the traditional stuck-at model. However, the fault simulator and test generator can also be run in a defect-oriented mode, where defects in the library components can be taken into account. In this case, additional input information about defects is needed in the form of defect tables for the library components.
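The gate-to-macro transformation can be sketched as a partition of the netlist into tree-like regions. The rule used here (start a new macro at every primary output and at every gate whose output fans out to more than one gate) is an assumption about how the tree-like sub-networks are formed, and the function name is hypothetical. The netlist is the small ISCAS'85 circuit c17 with signal names prefixed by `i` (input) and `g` (gate).

```python
def macro_partition(gates, outputs):
    """Group gates into tree-like macros, cutting at fanout stems and outputs.

    gates: dict mapping gate name -> tuple of input signal names.
    outputs: set of gate names that are primary outputs.
    Returns a dict mapping each macro root -> set of gates in that macro.
    """
    fanout = {g: [] for g in gates}
    for gate, ins in gates.items():
        for signal in ins:
            if signal in fanout:          # skip primary inputs
                fanout[signal].append(gate)

    def root_of(gate):
        # Follow the single consumer until a stem or a primary output is reached.
        while gate not in outputs and len(fanout[gate]) == 1:
            gate = fanout[gate][0]
        return gate

    macros = {}
    for gate in gates:
        macros.setdefault(root_of(gate), set()).add(gate)
    return macros


C17 = {
    "g10": ("i1", "i3"), "g11": ("i3", "i6"),
    "g16": ("i2", "g11"), "g19": ("g11", "i7"),
    "g22": ("g10", "g16"), "g23": ("g16", "g19"),
}
macros = macro_partition(C17, outputs={"g22", "g23"})
print(macros)
```

Under this rule the six gates of c17 collapse into four macros, which gives a feel for how the macro level reduces the number of model components.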

A selection of the prototype tools described above, together with a set of separate tools (Java applets) developed specially for teaching purposes, is integrated into an e-learning environment that supports university courses by giving students the opportunity for hands-on training [26]. This environment consists of the following toolsets: (1) Turbo Tester, CAD software for digital test; (2) xTractor, CAD software for high-level synthesis; (3) DefSim, a HW/SW environment for experimental study of CMOS defects; (4) BIST Analyzer, a training system for learning self-testing issues of modern multi-core electronic systems; (5) Trainer 1149, a multi-functional SW system that provides a simulation, demonstration, and CAD environment for learning, research, and development related to the IEEE 1149.1 Boundary Scan (BS) standard; (6) applets for training and teaching logic synthesis and test at the gate and register transfer levels; (7) applets for FSM decomposition and synthesis; (8) a deterministic traffic generator for a NoC simulator; and (9) Test Time Calculator (a simple NoC simulator based on XY-routing).

The laboratory tasks developed for this environment are at the same time real research problems, which helps to foster critical thinking, problem-solving skills and creativity in students in a real research environment and atmosphere.

#### CONCLUSIONS

An overview was given of the recent research results at TUT in the field of design and test of dependable embedded systems. These results have been obtained thanks to broad international cooperation during the last decade within the framework of several EU projects such as SYTIC, VILAB, REASON, eVIKINGS II, and VERTIGO [27]. As a result of these projects, two new competence centres were established: the Estonian Research Centre for Dependable Computing and the Estonian Development Centre of Mission Critical Embedded Systems (ELIKO). ELIKO is based on contracts between 7 private SMEs in Estonia under the leadership of TUT. Both centres are working on the transfer of technology to local industry. Through ELIKO, very tight links have now been established between academia and industry in Estonia.

As a side effect of the research carried out during recent years, an experimental research environment has been developed to support both research and teaching in the future. The originality of the environment lies in the multi-functionality of the system (important for research and training), its low cost and its ease of use. Multi-functionality means that models at different abstraction levels can be easily synthesized (to analyze the influence of model complexity on the efficiency of methods), that different methods for the same task are implemented (to analyze the efficiency of different approaches), and that the fault models can be easily changed and updated (to analyze the adequacy and accuracy of testing). It makes it easy to set up and modify different experimental schemes and scenarios for investigating new ideas and methods. It also gives students working in this environment an excellent opportunity to understand the ideas, advantages and drawbacks of different methods under changing conditions. In traditional commercial design tools these purely research-oriented possibilities are missing.

#### ACKNOWLEDGMENT

The work has been supported by Estonian Science Foundation grants 5910, 6717, 6829, 7068, EC 6th FP IST project VERTIGO, Estonian IT Foundation (EITSA) and Enterprise Estonia.

#### REFERENCES

- [1] R.Klein, T.Piekarz. Accelerating Functional Simulation for Processor Based Designs. Mentor Graphics Corporation. White paper, 2005.
- [2] K.Roy, T.M.Mak, K.-T.T.Cheng. Test consideration for nanometerscale CMOS circuits. IEEE Design and Test of Computers, vol.23, no 2, pp.128-136, 2006.
- [3] R.Ubar. Test Synthesis with Alternative Graphs. IEEE Design and Test of Computers. Spring, 1996, pp.48-59.

- [4] J.Raik, R.Ubar. Fast Test Pattern Generation for Sequential Circuits Using Decision Diagram Representations. JETTA. Kluwer Acad Publishers, Vol. 16, No. 3, pp. 213-226, 2000.
- [5] R.Ubar, S.Devadze, J.Raik, A.Jutman. Fast Fault Simulation in Digital Circuits with Scan Path. 13th Asia and South Pacific Design Automation Conference – ASP-DAC 2008, Seoul, Korea, Jan. 21-24, 2008, pp. 667-672.
- [6] J.Raik, R.Ubar, J.Sudbrock, W.Kuzmicz, W.Pleskacz. DOT: New Deterministic Defect-Oriented ATPG Tool. Proc. of 10th IEEE European Test Symposium, May 22-25, 2005, Tallinn, pp.96-101.
- [7] P.Ellervee, J.Raik, R.Ubar, K.Tammemäe. FPGA-Based Fault Emulation of Synchronous Sequential Circuits. IEE Proc. on Computers & Digital Techniques. Vol.1, Issue 2, pp.70-76, March 2007.
- [8] J.Raik, R.Ubar, V.Govind. Test Configurations for Diagnosing Faulty Links in NoC Switches. 12th IEEE ETS 2007, Freiburg, Germany, May 20-24, 2007, pp.29-34.
- [9] International Technology Roadmap for Semiconductors 2006 report, [URL] www.itrs.net, 2006.
- [10] S.Tasiran, K.Keutzer, Coverage metrics for functional validation of hardware designs. Design&Test of Computers, IEEE, Vol 18, Issue 4, Jul-Aug. 2001, Pages 36-45.
- [11] URL: http://www.vertigo-project.eu
- [12] R.Bryant. Graph-based algorithms for boolean function manipulation. IEEE Trans. on Comp,C-35, 8:677-691, 1986
- [13] V.Chayakul, D.D.Gajski, L.Ramachandran, "High-Level Transformations for Minimizing Syntactic Variances", Proc. of ACM/IEEE DAC, pp. 413-418, June 1993.
- [14] R.Ubar, J.Raik, A.Morawiec, Back-tracing and Event-driven Techniques in High-level Simulation with Decision Diagrams. ISCAS 2000, Vol. 1, pp. 208-211.
- [15] K.Minakova, U.Reinsalu, A.Chepurov, J.Raik, M.Jenihhin, R.Ubar, P.Ellervee. High-Level DD Manipulations for Code Coverage Analysis, Baltic Electronics Conf., IEEE, 2008.
- [16] IEEE-Commission, "IEEE Standard for Property Specification Language (PSL)," 2005, IEEE Std 1850-2005.
- [17] M.Jenihhin, et al. Temporally Extended High-Level Decision Diagrams for PSL Assertions Simulation. Proc. of the 13th IEEE European Test Symposium, 2008.
- [18] J.Raik, R.Ubar, T.Viilukas, M.Jenihhin. Mixed Hierarchical-Functional Fault Models for Targeting Sequential Cores. Elsevier Journal of Systems Architecture.
- [19] A.Bigot et al. Deploying Hardware Platforms for SoC Validation: An Industrial Case Study. The International Conference on Field Programmable Logic and Applications (FPL'04), Antwerp, Belgium, pp. 64-73, Aug. 2004.
- [20] N.Genko et al. A Complete Network-On-Chip Emulation Framework. Design Automation & Test in Europe (DATE'05), Munich, Germany, pp. 246-251, March 2005.
- [21] K.Morris. Debug Dilemma. Simulate or Emulate? FPGA and Structured ASIC J., http://fpgajournal.com, Jan. 2005.
- [22] P.Ellervee, A.Arhipov, K.Tammemäe. Clock Manipul. for Heterogenous Emulation Environment. The 24th NORCHIP Conference, Linköping, Sweden, pp. 213-216, Nov. 2006.
- [23] P.Ellervee, U.Reinsalu, A.Arhipov. Translating Behavioral VHDL for Emulation. The 25th NORCHIP Conference, Aalborg, Denmark, Nov. 2007.
- [24] L.Wu, D.M.H.Walker. A Fast Algorithm for Critical Path Tracing in VLSI. Int. Symp. on Defect and Fault Tolerance in VLSI Systems, Oct. 2005, pp.178-186.
- [25] R.Ubar, S.Devadze, J.Raik, A.Jutman. Ultra Fast Parallel Fault Analysis on Structural BDDs. 12th IEEE ETS, Freiburg, Germany, May 20-24, 2007, pp.131-136.
- [26] http://ati.ttu.ee/projects/tools.html
- [27] http://ati.ttu.ee/index.php?page=800

# Enhancing Path Delay Fault Coverage by Weighted Pseudorandom Test Generation

Øystein Gjermundnes, Einar J. Aas

Abstract – The implementation of a system for analyzing circuits with respect to their path-delay fault testability is presented. It includes a path-delay fault simulator and an ATPG for path-delay faults, combined into a test tool. The test tool is used to evaluate the performance of several different test vector generators. The test generators exploit weighted pseudo-random stimuli generation, based on arithmetic BIST and SIC patterns. The main goal is to find efficient heuristics that improve path-delay fault detection efficiency in terms of test time. We show that weighted ABIST stimuli are productive for detecting the K-longest path-delay faults for most circuits. On average, we obtained a fault coverage of 92.6% for the 20,000 longest paths on ISCAS'85 circuits.

*Index Terms* – Built-in testing, Fault diagnosis, Automatic testing.

#### I. INTRODUCTION

Defect-oriented testing is gaining attention, and Path Delay Fault (PDF) testing is one of the more challenging problems to study [9]. The test method toolbox has expanded significantly over the last decade. Various trade-offs are made among test methodology, test quality (measured by various fault coverage metrics), design-for-test development costs, silicon overhead, and the cost of Automatic Test Equipment, including test application time.

For PDF testing, deterministic test pattern pairs or Built-In Self-Test (BIST) generated patterns may be exploited. We have chosen to explore the possible usage of BIST methods. This paper describes the implementation of a system for analyzing circuits with respect to their path-delay fault testability. The system includes a path-delay fault simulator and an Automatic Test Pattern Generator (ATPG) for path-delay faults, combined into a test tool. The test tool is used to evaluate the performance of different test vector generators that may be used in various BIST arrangements. The test generators exploit weighted pseudorandom stimuli generation, based on arithmetic BIST principles. We show that this is a viable BIST method for detecting the K-longest path-delay faults with satisfactory PDF coverage for many circuits, but not for all circuits. We employ the tool on ISCAS'85 circuits. Our focus is on the methodology, not on specific stimuli generators. We envision the use of compact software programs, such as the one published in [8], to be loaded into the system under test. An in-depth presentation of this test project is found in [5].

Manuscript received February 4, 2008.

The work was done while Gjermundnes was affiliated with NTNU.



Fig. 1. c17 with one of its paths highlighted

#### II. PATH DELAY FAULT SIMULATION MODEL

The path-delay fault model was proposed by Smith [9]. A definition of the path-delay fault model from [1] is:

The delay defect in the circuit is assumed to cause the cumulative delay of a combinational path to exceed some specified duration. The combinational path begins at a primary input or a clocked flip-flop, contains a connected chain of gates, and ends at a primary output or a clocked flip-flop. The specified time duration can be the duration of the clock period (or phase), or the vector period. The propagation delay is the time that a signal event (transition) takes to traverse the path. Both switching delays of devices and transport delays of interconnects on the path contribute to the propagation delay.

There are *two* path-delay faults associated with each physical path in the circuit: slow-to-rise and slow-to-fall. Fig. 1 shows one path. The path-delay fault model has the ability to detect distributed defects caused by statistical process variations. A test for a path-delay fault will also detect any spot defects along the path. The number of paths, and thus of path-delay faults, may be exponential in the number of gates in the circuit.

Øystein Gjermundnes is with ARM Norway, PBox N-2182, NO-7412 Trondheim, Norway, e-mail: oystein.gjermundnes@arm.com

Einar J. Aas is with the Norwegian University of Science and Technology – NTNU, NO-7491 Trondheim, Norway, e-mail: ejaas@iet.ntnu.no

The selection of proper *simulation algebra* (alphabet and logic rules) is crucial for any logic/fault simulator. Our simulator **PDFSim** uses the **6**-valued algebra developed by [9]. Several features to obtain an efficient simulator are presented in [5], see also [4]. Of course, a *two-pattern* test vector is needed for delay fault testing. We adopt SIC (Single Input Change) vectors, because it was shown in [11] that such vectors are more effective than Multiple Input Change vectors for robust and non-robust testing.

#### III. AUTOMATIC TEST PATTERN GENERATION

It is intractable to test all path-delay faults in a circuit. There are nearly $10^{20}$ paths in one of the ISCAS'85 circuits! One accepted strategy is to test a subset of all possible path-delay faults. The longest testable paths are of particular importance for high-quality delay testing. An algorithm for extracting the K-longest testable path-delay faults (K-LT-PDF) in a circuit has been developed and integrated with the fault simulator. The test generators employed will be evaluated against the fault lists containing the K-LT-PDF.
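Path counts of this size cannot be obtained by enumeration, but the number of physical paths can be computed with one memoized pass over the netlist. The sketch below does this for c17, the circuit of Fig. 1. It is only an illustration of the counting argument, not part of the authors' tool; the function name and the `i`/`g` signal naming are assumptions.

```python
from functools import lru_cache


def count_paths(gates, inputs, outputs):
    """Count structural paths from primary inputs to primary outputs.

    gates: dict mapping gate name -> tuple of input signal names.
    """
    fanout = {}
    for gate, ins in gates.items():
        for signal in ins:
            fanout.setdefault(signal, []).append(gate)

    @lru_cache(maxsize=None)
    def to_outputs(node):
        # Paths from this node to any primary output.
        here = 1 if node in outputs else 0
        return here + sum(to_outputs(nxt) for nxt in fanout.get(node, ()))

    return sum(to_outputs(pin) for pin in inputs)


C17 = {
    "g10": ("i1", "i3"), "g11": ("i3", "i6"),
    "g16": ("i2", "g11"), "g19": ("g11", "i7"),
    "g22": ("g10", "g16"), "g23": ("g16", "g19"),
}
paths = count_paths(C17, inputs=("i1", "i2", "i3", "i6", "i7"),
                    outputs=frozenset({"g22", "g23"}))
pdf_count = 2 * paths  # slow-to-rise and slow-to-fall fault per path
print(paths, pdf_count)
```

The count is linear in the netlist size, whereas the number it returns can grow exponentially with circuit depth, which is why only a subset such as the K longest paths is targeted.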

The earliest attempts at creating an ATPG that could extract the K-LT-PDF were very inefficient. ATPGs normally employed two separate phases: structural path extraction followed by test generation. Usually, many paths are untestable, and a structural path extractor would find and pass a large number of untestable paths to the test generator. Fortunately, by combining the structural path extractor and the test generator, it is possible to prune the search space significantly by sorting out untestable sets of paths at an early stage. This approach was originally used by Qiu and Walker [12]. We have introduced several improvements in terms of efficiency, including recursive learning [6] and FAN-like [3] justifications. Recursive learning is a method for extracting all logical dependencies between signals in a circuit and for performing *precise* implications for a given set of value assignments.

#### IV. BIST-BASED STIMULI GENERATORS

#### A. Basis Vectors

First, we wanted to investigate whether ABIST generators of a simple kind, namely accumulator based stimuli generators, would provide sufficient basis for pseudo-random patterns. In particular, the generator described in [8] was investigated:

$$A_i = A_{i-1} + C \pmod{2^n}, \quad A_0 = I, \quad i = 1, 2, 3, \ldots, V \qquad (1)$$

By carefully selecting the parameters C and I, one may exhaustively cover every subinterval of size r within the first $2^r$ test vectors. This generator may be implemented as a compact software program in a microcontroller. It will generate uniformly distributed values. Let us call these patterns UDB (Uniformly Distributed Basis) patterns.
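A minimal software sketch of the accumulator generator in (1) is given below. The function and parameter names are illustrative, and the values of C, I and n in the usage line are arbitrary small numbers rather than the parameters used in the experiments.

```python
def accumulator_patterns(c, seed, n, count):
    """Yield A_1 .. A_count with A_i = (A_{i-1} + C) mod 2^n and A_0 = seed."""
    mask = (1 << n) - 1
    a = seed & mask
    for _ in range(count):
        a = (a + c) & mask   # addition modulo 2^n
        yield a


patterns = list(accumulator_patterns(c=5, seed=3, n=4, count=16))
print(patterns[:4])
```

With an odd increment C the sequence visits all $2^n$ values before repeating, which is one simple way to see why the choice of C matters for exhaustive coverage.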

But are these generated values of adequate statistical quality? We compared the generator against a Mersenne Twister (MT) generator [7], which is considered an excellent benchmark for uniformly distributed pseudorandom numbers but is much more complex to implement in SW or HW. The simple ABIST generator given in (1) was not as efficient. However, by combining *three* generators of type (1) with proper weighting, we developed a better basis, called **GAU** (U for uniform). This generator yields considerably shorter test application times than a Mersenne-based generator.

The rationale behind the use of weighted test patterns is as follows. Consider Fig. 1. For a path to be sensitized from input to output, proper controlling values must be applied to the inputs not included in the path. We are looking for ABIST patterns that exhibit statistical properties conducive to fault detection. It is known that proper weighting of input values, i.e. a non-uniform distribution of ones and zeros, might enhance the efficiency of fault detection.

Thus, we devised various schemes for weighting the random patterns. These schemes employed the basic generator, with added features for weighting. Transitions on input pins were generated by so-called Single Input Change (SIC) vectors. From a basis vector, we toggle one bit at a time to obtain two-pattern test vectors. For an N-input circuit, 2N vectors are generated this way.
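The SIC construction can be sketched as follows: each input in turn is toggled in the basis vector, giving one two-pattern test per input. Vectors are represented as integers with bit k standing for input k; this encoding and the function name are assumptions made for the example.

```python
def sic_pairs(base, n_inputs):
    """Return the two-pattern tests (v1, v2) obtained by toggling one bit of base."""
    return [(base, base ^ (1 << k)) for k in range(n_inputs)]


pairs = sic_pairs(0b101, n_inputs=3)
for v1, v2 in pairs:
    print(format(v1, "03b"), "->", format(v2, "03b"))
```

In every pair exactly one input makes a transition while all other inputs hold a stable value, which is the property the counting scheme for GA3 relies on.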

First, we define the GA1 generator: use of the GAU generator, and SIC vectors. This yields a uniform generator, which we will compare potential weighting heuristics against.

#### B. GA2: stuck-at test set weights

Weights are based on a deterministic test set (obtained from a commercial ATPG) for stuck-at faults. For each input pin, we counted the relative number of ones and zeros, and used these numbers as weights. Don't cares were counted in both the one and the zero set. Basis vectors are generated from (1), with r=16, and three sets of (C, I) values.

The rationale is that these patterns have contributed to controlling values on the inputs for efficient stuck-at fault detection, and may be promising as candidates for path delay fault testing as well.
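The weight extraction described for GA2 can be sketched as a per-pin count over the test set. Test vectors are assumed to be strings over `0`, `1` and `X` (don't care), and a don't care is counted in both the one and the zero set, as stated above. The function name and the small test set are hypothetical.

```python
def input_weights(test_set):
    """Return, for each input pin, the weight of logic 1 derived from a test set."""
    n_inputs = len(test_set[0])
    weights = []
    for pin in range(n_inputs):
        ones = sum(vec[pin] in "1X" for vec in test_set)   # X counts as a one...
        zeros = sum(vec[pin] in "0X" for vec in test_set)  # ...and as a zero
        weights.append(ones / (ones + zeros))
    return weights


weights = input_weights(["1X0", "110", "0X1", "1X0"])
print(weights)
```

A pin that is mostly don't care ends up with a weight near 0.5, so only pins on which the deterministic tests agree pull the distribution away from uniform.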

#### C. GA3: counting based weights

Weights are generated based on fault coverage measurements. The circuit is first fed from a pseudorandom generator of type **GA1**. Two counters (**S0**Ctr, **S1**Ctr) are associated with each input. These counters store the number of *path-delay faults* detected when the input has a stable value (**S0** or **S1**). When a predetermined number of basis patterns (10M) has been applied, the weighting factors can be computed for every input according to:

$$p0 = S0Ctr/(S1Ctr + S0Ctr), \qquad (2)$$

$$p1 = S1Ctr/(S1Ctr + S0Ctr). \qquad (3)$$

Subsequently, we rerun the fault simulator with these weights. This yields the generator GA3. This heuristic is inspired by the fact that patterns with more weight on the HIGH value are productive for AND/NAND gate testing.

Notice that the counting is not activated before 100 basis patterns have been applied. This leaves out the easy-to-detect faults, which will be detected anyway.
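The counting scheme and equations (2) and (3) can be sketched as follows. The record format (basis pattern index, basis vector, toggled pin, number of newly detected faults) is an assumption about what a fault simulator would report for each SIC test, and the fallback to 0.5 for pins with no counts is also an assumption; neither is taken from PDFSim.

```python
def counting_weights(records, n_inputs, warmup=100):
    """Compute (p0, p1) per input from fault-detection counts.

    records: iterable of (basis_index, base_vector, toggled_pin, detected).
    Counting starts only once `warmup` basis patterns have been applied.
    """
    s0 = [0] * n_inputs
    s1 = [0] * n_inputs
    for basis_index, base, toggled_pin, detected in records:
        if basis_index < warmup:
            continue                      # skip the easy-to-detect faults
        for pin in range(n_inputs):
            if pin == toggled_pin:
                continue                  # this input is not stable
            if (base >> pin) & 1:
                s1[pin] += detected       # stable 1 (S1)
            else:
                s0[pin] += detected       # stable 0 (S0)
    weights = []
    for pin in range(n_inputs):
        total = s0[pin] + s1[pin]
        if total == 0:
            weights.append((0.5, 0.5))    # assumed fallback: no information
        else:
            weights.append((s0[pin] / total, s1[pin] / total))
    return weights


records = [(0, 0b11, 0, 5), (1, 0b10, 0, 3), (1, 0b10, 1, 1), (2, 0b01, 1, 3)]
weights = counting_weights(records, n_inputs=2, warmup=1)
print(weights)
```

The first record falls inside the warm-up window and is ignored; the remaining ones shift weight toward the stable value that accompanied the most detections on each pin.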

#### D. GA4 and GA5

Two less successful schemes were **GA4** and **GA5**. **GA4**: similar to **GA2**, but weights were computed with "reseeding". One output pin at a time was considered when recording fault detection of a test vector. The weight set was recomputed once for every output pin.

GA5 is similar to GA4, but the sequence of seeds was optimized somehow.

#### E. GA6: weights based on deterministic tests

Similar to GA2, except that the weights are generated from a deterministic test set for path-delay faults. First, a test set for the 20,000 longest paths of non-robust faults was generated. Then, for each pin, we computed the ratio of ones (zeros) that occurred in the complete test set. Don't cares were counted twice, both as 0 and as 1. These values were used as weights throughout the experiment, similar to GA3 above.

#### V. EXPERIMENTS

Armed with the tools and generators described above, several experimental runs were set up.

#### A. Benchmark circuit properties

Circuits from the ISCAS'85 benchmark suite were used in the experiments presented below. Some information about each circuit is provided in this section.

The number of inputs (I), outputs (O), gates (G), logical levels (L) and physical paths (P) for each circuit is shown in Table 1 (the last two columns are discussed under EX1 in Section V.B). The number of paths is much larger than the stuck-at fault set. Notice in particular the huge number of paths for benchmark c6288 (a 16x16 bit array multiplier).

The circuits c432 and c499 are omitted from most of the experiments because they contain XOR gates, which are not currently supported by the ATPG. Another circuit that is omitted from most experiments is c6288: the large number of paths in this circuit causes problems for both the simulator and the ATPG. The circuit c17 is discarded because of its simplicity. The rest of the benchmarks are used in all experiments.

TABLE 1. BENCHMARK PROPERTIES

| Circuit | I/O     | G/L      | P         | UB       | PF    |
|---------|---------|----------|-----------|----------|-------|
| c880    | 60/26   | 469/25   | 8642      | 16652    | 16652 |
| c1355   | 41/32   | 619/25   | 4173216   | 1110076  | 20000 |
| c1908   | 33/25   | 938/41   | 729057    | 355197   | 20000 |
| c2670   | 233/140 | 1566/33  | 679960    | 1306884  | 20000 |
| c3540   | 50/22   | 1741/48  | 28676671  | 12330969 | 20000 |
| c5315   | 178/123 | 2608/50  | 1341305   | 353300   | 20000 |
| c6288   | 32/32   | 2480/125 | $10^{20}$ | -        | -     |
| c7552   | 207/108 | 3827/44  | 726494    | 282752   | 20000 |

#### B. Experimental results

This section presents some statistical properties of the sequences generated by the different test generators described in Section IV. This information can be used as an aid in the interpretation of the results.

#### EX1: Find the K-longest testable paths

The objective of this experiment was to find the longest non-robust testable paths of each benchmark circuit, using the ATPG tool described in Section III. Given unlimited time and memory, the tool would list all testable faults in each circuit. Unfortunately, some of the circuits contain a huge number of testable path-delay faults, which would cause the data structure inside the ATPG tool to blow up. In order to keep the whole path store inside computer memory, the size of the path store was limited to a maximum of 1M. The ATPG was asked to find the 20,000 longest non-robust testable paths in each of the benchmarks. The number of such paths found (PF) for the different circuits is listed in the last column of Table 1, together with an upper bound (UB) [2] on the number of non-robust path-delay faults. Since all circuits except c880 contain more than 20,000 testable paths, the ATPG had no problem finding 20,000 testable paths. It is reassuring to notice that the upper bound for c880 from [2] coincides with the number of paths we found.

#### EX2: Determine the number of paths detected by unweighted pseudo-random stimuli

In this experiment test vectors were applied, and the number of detected path-delay faults and their length were logged. The test vectors were generated with an unweighted Mersenne Twister pseudo-random generator (GT1). The purpose was to obtain information about the number of paths of different lengths detected by a standard pseudorandom generator. Typical results are presented in Figure 2.



Fig. 2. No. of detected paths vs. path length for various no. of test vectors

The longer paths are not as easily detected as the shorter paths. This is to be expected; long paths need more constraints than shorter paths.

In fact, we found that for all circuits, a randomly selected physical path is longer than the average length of the paths detected within 4M test vectors. The average physical path was from 9.8% to 72% longer. Clearly, UDB patterns are not effective for detecting the longest PDFs.

#### EX3: Comparison of GA1 - GA5

In this experiment the performance of GA1 - GA5 was evaluated. The experiment used the fault simulator described in Section II. *10M* test patterns were simulated for each circuit and generator. Each simulation run was repeated 10 times with different seeds in order to cover statistical variations. Table 2 presents the average number of detected faults over 10 trials after *10M* applied test vectors.

TABLE 2. DETECTED FAULTS AFTER *10M* APPLIED TEST VECTORS

| Circuit | GA1     | GA2        | GA3         | GA4     | GA5     |
|---------|---------|------------|-------------|---------|---------|
| c880    | 8714    | 16194      | **16550**   | 16470   | 16473   |
| c1355   | 1050139 | 1085021    | **1110297** | 1110264 | 1110258 |
| c1908   | 269846  | 283665     | **349613**  | 349579  | 349568  |
| c2670   | 51739   | 85948      | **107711**  | 102734  | 104141  |
| c3540   | 588541  | 996001     | **1062718** | 1050384 | 1050579 |
| c5315   | 173526  | 309498     | **339396**  | 339122  | 339157  |
| c7552   | 146754  | **185983** | 185687      | 185264  | 185383  |
| Sum     | 2289259 | 2962310    | 3171972     | 3153817 | 3155559 |

The best result, i.e. the highest number of detected faults, is shown in bold in Table 2 for each circuit. The stimuli generator with the poorest performance is the unweighted pseudo-random generator GA1. This generator detected the fewest non-robust path-delay faults in all tests. Generator GA2, which is a weighted pseudo-random generator with weights based on a deterministic test set for stuck-at faults, is somewhat better than GA1. The best generators are GA3, GA4, GA5 and GA6.

The performances of GA3, GA4 and GA5 do not differ by much, but the results point in favor of GA3, which detects most path-delay faults for all but one benchmark. GA3 is a weighted pseudo-random generator with weights based on the counting scheme described in Section 4.

We performed the same experiments with the MT as the basic pseudorandom generator. To summarize, the results were in general only slightly better than for the ABIST generator. For example, the equivalent of GA3 exhibited a *total improvement* (summed over all circuits) of 0.12% more detected path delay faults.

# *EX4: Weighted pseudo-random patterns to find the K-longest testable path-delay faults*

The purpose of this experiment was to find out whether proper weighting of pseudorandom stimuli, based on K=20.000 deterministic test patterns for path-delay faults, would yield more efficient path delay tests than uniformly distributed patterns. The experiments were conducted as follows:

First, the K=20.000 longest testable path-delay faults were extracted for each circuit as described in EX1. For each detected path, the path number was stored in a file together with the corresponding path length and test vector. Weights for GA6 and GT6 were then extracted from each test set as described in Section 4. Note that the generators labeled GTi use Mersenne Twister random numbers, but with the same heuristics as the corresponding GAi (i=1-6).

Prior to each simulation run a fault list with the 20.000 longest testable path delay faults was uploaded to the simulator. **10M** single-input-change test patterns were then applied to each circuit for each generator. Each simulation run was repeated 10 times with different seeds in order to cover statistical variations. Six different generators were used: GA1, GA3, GA6, GT1, GT3 and GT6.

The three generators GA1, GA3 and GA6 use exactly the same underlying accumulator-based pseudo-random generator. GA3 and GA6 are weighted pseudo-random generators and are compared against GA1 (uniform weights). The three generators GT1, GT3 and GT6 use exactly the same underlying MT pseudo-random generator. GT3 and GT6 are weighted pseudo-random generators and are compared against GT1 (uniform weights).



Fig 3. Typical curves of fault detection vs. no. of test vectors applied

#### *Two measures were recorded:*

*Fault coverage* in relation to the size (K) of the fault list. (K=20000 for all circuits except c880 which contains only 16652 non-robust testable paths).

*Test time speedup*, defined as the ratio

$$R_{imp}(meth_x) = NTP(uniform)/NTP(meth_x), \qquad (4)$$

where NTP represents the number of test patterns. The name of the stimuli generator is used as the argument ($meth_x$).

 TABLE 3

 FAULT COVERAGE, FC, OF BEST METHOD AFTER *10M* APPLIED TEST VECTORS

| Circuit | $FC(GA_x)$  |
|---------|-------------|
| c880    | 99.3% (GA3) |
| c1355   | 100% (GA3)  |
| c1908   | 97.6% (GA3) |
| c2670   | 67.9% (GA6) |
| c3540   | 86.9% (GA6) |
| c5315   | 96.7% (GA6) |
| c7552   | 99.8% (GA3) |
| Average | 92.6%       |

During simulation the fault coverage, FC, was sampled at intervals until *10M* test patterns had been applied. Figure 3 shows two typical plots of fault detection vs. number of test vectors applied. The lowest curves represent unweighted stimuli, while the other curves are given for four different weighting schemes. The improvements are notable for GA3 and GA6.

Table 3 shows the fault coverage after *10M* test vectors. The numbers in the second column represent fault coverage achieved with the best generator of GA3 and GA6.

We observed that the GT methods are slightly better than the GA methods. Furthermore, 5 out of 7 circuits attain a fault coverage of 96.7% or more in Table 3. The two remaining circuits, c2670 and c3540, exhibit inferior fault coverage and need more test patterns or other methods of path-delay fault detection.

As mentioned, similar experiments were carried out with the MT as the basic pseudorandom generator, in order to check possible improvement when using a more authoritative pseudorandom generator. These generators are called GT1-GT6. The MT resulted only in slight improvements. The average fault coverage increased from 92.6% to 93.6%.

The standard deviation of the sample fault coverage after 10M applied test patterns over the 10 trials was also computed. It varied from 0% to 1.5%. Thus, the seed value does not influence the outcome much.

#### C. Test time speedup

One important goal in testing is the ability to obtain a desired test quality for less cost. In our case, test time, i.e. no. of test vectors to be applied for a given test quality, should be kept at a minimum. In order to measure the speedup of a weighted generator over that of a uniformly distributed pseudo-random generator, one can compare the number of test vectors needed in order to achieve the same fault coverage. The target coverage in our case was set to the fault coverage attained with the *unweighted generator* after application of *10M* stimuli. The improvement factors of the best-weighted generator over uniformly distributed stimuli, defined in (4), are listed in Table 4.

 TABLE 4

 TIME SPEEDUP OF BEST METHOD OVER UNIFORMLY DISTRIBUTED STIMULI

| Circuit | $R_{imp}(GA_x)$ | $R_{imp}(GT_x)$ |
|---------|-----------------|-----------------|
| c880    | 11.9            | 15.1            |
| c1355   | 1.5             | 2.7             |
| c1908   | 8.0             | 10.8            |
| c2670   | 10.7            | 14.3            |
| c3540   | 7.1             | 9.1             |
| c5315   | 4.7             | 7.0             |
| c7552   | 1.0             | 1.0             |
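The ratio in (4) can be evaluated directly from sampled coverage curves. The following is a minimal sketch under stated assumptions: the helper names `patterns_to_reach` and `speedup` are hypothetical, the sample curves are invented for illustration and are not measurements from these experiments, and NTP is taken as the first sampled pattern count at which the target coverage is reached.

```python
def patterns_to_reach(curve, target_fc):
    """Return the first sampled pattern count whose coverage reaches target_fc.

    curve is a list of (patterns_applied, fault_coverage) samples sorted by
    patterns_applied. Returns None if the target is never reached.
    """
    for n_patterns, fc in curve:
        if fc >= target_fc:
            return n_patterns
    return None


def speedup(uniform_curve, weighted_curve):
    """R_imp = NTP(uniform) / NTP(weighted), as in Eq. (4).

    The target coverage is the final coverage of the uniform generator.
    """
    target_fc = uniform_curve[-1][1]
    ntp_uniform = patterns_to_reach(uniform_curve, target_fc)
    ntp_weighted = patterns_to_reach(weighted_curve, target_fc)
    if ntp_weighted is None:
        return None  # the weighted generator never reached the target
    return ntp_uniform / ntp_weighted


# Invented sample curves (NOT data from the paper).
uniform = [(1_000_000, 0.40), (5_000_000, 0.55), (10_000_000, 0.60)]
weighted = [(1_000_000, 0.58), (2_000_000, 0.61), (10_000_000, 0.70)]

print(speedup(uniform, weighted))
```

With coarse sampling the ratio is only as precise as the sampling interval, so a finer sampling grid near the target coverage gives a more faithful estimate.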

We observe time speedups ranging from none (a factor of 1.0) to a factor of 11.9 (GA) or 15.1 (GT). However, it is unfortunately not possible to devise an a priori metric that predicts the speedup. Given the potential for substantial savings in test time, and thus in test cost, we recommend experimenting with GA3 and GA6 for a newly designed circuit and using this method whenever it proves beneficial.

#### 6. CONCLUSION

A system for analyzing circuits with respect to their path-delay fault testability has been presented. It includes a path-delay fault simulator and an ATPG for path-delay faults, combined into a test tool. This tool was used to evaluate the performance of different test vector generators for various BIST arrangements. The test generators exploit weighted pseudo-random stimuli generation based on arithmetic BIST principles. We found useful heuristics that improve path-delay fault testing efficiency in terms of test time. We showed that weighted ABIST stimuli are productive for detecting the K-longest path-delay faults for many circuits. On average, we obtained a fault coverage of 92.6% for the 20.000 longest paths on a subset of the ISCAS'85 circuits. We observed time speedups from none to a factor of 12 with the accumulator-based stimuli generator, making it well worth the effort of experimenting with such methods for potential high-quality path-delay fault testing. However, it should be noted that our methods do not always give significant improvements and are not generally applicable.

Future work would involve using the simulator and the ATPG to create better generators based upon knowledge about the structure of the circuit. We might also investigate the influence the number of longest paths will have on the test quality obtainable.

#### REFERENCES

- [1] M. L. Bushnell and V. D. Agrawal, *Essentials of electronic testing for digital, memory, and mixed signal VLSI circuits*, Kluwer Academic, New York, 2002.
- [2] K. T. Cheng and H. C. Chen, "Delay testing for non-robust untestable circuits", *Proc. of the International Test Conf.*, 1993, pp. 954–961.
- [3] H. Fujiwara and T. Shimono, "On the acceleration of test generation algorithms", *IEEE Trans. on Computers*, C-32(12), 1983, pp. 1137-1144.
- [4] O. Gjermundnes and E. J. Aas. "Efficient stimuli generators for detection of path delay faults", *Proc. of the 48th Midwest Symp. on Circuits and Systems*, 2005, pp. 1709–1712.
- [5] Ø. Gjermundnes, "Exploiting Arithmetic Built-In Self-Test Techniques for Path Delay Fault Testing". *Doctoral thesis*, 2006, Norwegian University of Science and Technology, ISBN 82-471-8257-2.
- [6] W. Kunz and D. K. Pradhan, "Recursive learning: a new implication technique for efficient solutions to CAD problems – test, verification, and optimisation", *IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems*, 13(9), 1994, pp. 1143-1158.
- [7] M. Matsumoto and T. Nishimura, "Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator", *ACM Transactions on Modeling and Computer Simulation*, 8(1), 1998, pp. 3–30.
- [8] J. Rajski and J. Tyszer, *Arithmetic built-in self-test for embedded systems*, Prentice Hall, Upper Saddle River, N.J., 1998.
- [9] G. L. Smith, "Model for delay faults based upon paths", *Proc. of the International Test Conf.*, 1985, pp. 342–349.
- [10] A. Ströle and H.-J. Wunderlich, "TESTCHIP: A chip for weighted random pattern generation, evaluation, and test control", *IEEE Journal of Solid State Circuits*, Vol. 26, No. 7, 1991, pp. 1056-1063.
- [11] A. Virazel et al., "Delay fault testing. Choosing between random sic and random mic test sequences", *Journal of Electronic Testing – Theory and Applications*, Vol. 17, No. 3/4, 2001, pp. 233-241.
- [12] W. Qiu and D. M. H. Walker, "An efficient algorithm for finding the K longest testable paths through each gate in a combinational circuit", *Proc. of the International Test Conf.*, Vol. 1, 2003, pp. 592–601.

# Programming Calculations in Many-Dimensional Boolean Space

Arkadij Zakrevskij

Abstract — The set of macro operations POBS over Boolean  $2^n$ -component vectors is offered, which essentially facilitates programming calculations in many-dimensional Boolean space. The application of that set is illustrated by examples of the analysis of partial Boolean functions on a monotony and presence of functional regularities, solving problems of sequential composition and decomposition. An important role is played by operations of interaction between adjacent units of the space.

*Index Terms* — Boolean space, programming combinatorial problems, efficient macro operations.

#### I. INTRODUCTION

The set composed of all $2^n$ Boolean *n*-component vectors is called the Boolean *n*-dimensional space. A neighborhood relation is defined on it: two vectors are called *adjacent* if they differ in exactly one component. This relation can be represented by a graph whose nodes correspond to the elements of the Boolean space and whose edges join nodes corresponding to adjacent elements. Such graphs are widely used in the educational literature to describe methods of Boolean function minimization and the solution of other logical design tasks. However, already at n>5 the graph image becomes too complicated and inconvenient for practical use, as Fig. 1 illustrates.

More acceptable from the programming point of view is the representation of the *n*-dimensional Boolean space as a Boolean $2^n$-vector, i.e. a vector with $2^n$ components corresponding to the elements of the space. These components are numbered starting with zero: the component with number *k* corresponds to the element of the Boolean space whose *n*-component Boolean vector is the binary code of the number *k*. By assigning values from the set $\{0, 1\}$ to the components of a $2^n$-vector, it is possible to specify any Boolean function of n variables. For example, the Boolean vector

 $\boldsymbol{f} = 10000000\ 00000000\ 00001000\ 00000001\ 10000000\ 00000000\ 00110000\ 00000001$

defines a Boolean function f of six variables $x_1$, $x_2$, $x_3$, $x_4$, $x_5$, $x_6$, taking the value 1 on the following sets of their values: 000000, 010100, 011111, 100000, 110010, 110011, 111111.

Arkadij Zakrevskij is with the United Institute of Informatics Problems of the National Academy of Sciences of Belarus, 220023, Minsk; e-mail: zakr@newman.bas-net.by.

For convenience of visual perception the vector f is divided into eight fragments corresponding to intervals of the Boolean space with internal variables  $x_4$ ,  $x_5$ ,  $x_6$ . These fragments represent coefficients of disjunctive Shannon decomposition of the function f by variables  $x_1$ ,  $x_2$ ,  $x_3$ .

Boolean $2^n$-vectors serve as the main objects of the transformations performed when solving various logical combinatorial tasks, which arise in the design of discrete devices and in the development of artificial intelligence systems [1]. To make programming such tasks more efficient, a basic set of macro operations over such vectors is offered in this paper. It is called POBS (Parallel Operations in Boolean Space).

As experience shows, the set POBS is rather efficient: on a modern PC it allows fast operation on long Boolean vectors representing arbitrary Boolean functions of many variables, up to 27 inclusive. A row presenting such a vector (containing $2^{27} = 134217728$ characters) would need a paper strip more than 250 kilometers long.

#### II. COMPONENT-WISE OPERATIONS OVER BOOLEAN VECTORS

The elementary operations of the set POBS are component-wise operations: the operation of inverting over one vector and arbitrary two-place Boolean operations over two vectors of the same size. We illustrate them by the following examples.

Designate as g and h the Boolean vectors representing Boolean functions g(x) and h(x), where  $x = (x_1, x_2, x_3, x_4, x_5)$ . Let

 $\boldsymbol{g} = 10001100\ 00100010\ 11100001\ 00101010$,

 $\boldsymbol{h} = 00110010\ 10001111\ 01100011\ 01100011$.

Then

 $\overline{\boldsymbol{g}} = 01110011\ 11011101\ 00011110\ 11010101$,

 $\boldsymbol{g} \lor \boldsymbol{h} = 10111110\ 10101111\ 11100011\ 01101011$,

 $\boldsymbol{g} \wedge \boldsymbol{h} = 00000000\ 00000010\ 01100001\ 00100010$,

 $\boldsymbol{g} \oplus \boldsymbol{h} = 10111110\ 10101101\ 10000010\ 01001001$, etc.
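The component-wise operations map directly onto machine-word logic. The sketch below is one possible illustration, assuming each Boolean vector is stored as a Python integer whose most significant bit is component 0; the helper names `parse` and `show` are hypothetical and not part of POBS.

```python
def parse(bits):
    """Parse a string such as '1000 1100' into (integer value, bit length)."""
    digits = bits.replace(" ", "")
    return int(digits, 2), len(digits)


def show(value, length, group=8):
    """Format an integer as a bit string in groups, leftmost component first."""
    digits = format(value, "0{}b".format(length))
    return " ".join(digits[i:i + group] for i in range(0, length, group))


g, size = parse("10001100 00100010 11100001 00101010")
h, _ = parse("00110010 10001111 01100011 01100011")
mask = (1 << size) - 1  # keeps the inversion within the 2**n components

not_g = ~g & mask    # inversion
g_or_h = g | h       # disjunction
g_and_h = g & h      # conjunction
g_xor_h = g ^ h      # sum modulo 2

for name, value in [("not g", not_g), ("g or h", g_or_h),
                    ("g and h", g_and_h), ("g xor h", g_xor_h)]:
    print(name, "=", show(value, size))
```

Each operation processes all 32 components at once, which is the source of the parallelism that the macro operations rely on.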

Manuscript received February 20, 2008.

This work was supported in part by the Belarusian Republican Fond of Fundamental Researches (Project Φ07MC-034).



Fig. 1. Six-dimensional cube

More complicated is the operation of *permutation of arguments* of a Boolean function, which is presented by a Boolean  $2^n$ -vector. It is defined by a permutation on the set of variable numbers and results in the appropriate permutation of components of the vector.

For example, as a result of permutation of numbers (4, 2, 1, 3), the variables of the set  $\mathbf{x} = (x_1, x_2, x_3, x_4)$  will rearrange in a new sequence  $(x_4, x_2, x_1, x_3)$ . It leads to the appropriate permutation of components of vector  $\mathbf{f}$ , representing the Boolean function  $f(\mathbf{x})$ . The component  $f_k$  is relocated in place with number i, if the binary code  $\mathbf{k}$  of number k represents the result of multiplying the permutation matrix  $\mathbf{P}$  by the binary code  $\mathbf{i}$  of number i. In other words, the vector  $\mathbf{k}$  is equal to the component-wise disjunction of columns of matrix  $\mathbf{P}$ , marked with ones in vector  $\mathbf{i}$ . It is illustrated below by the example of substitution of the component  $f_{13}$  by  $f_{14}$ :

$$\mathbf{P} \times \mathbf{i} = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \times \begin{pmatrix} 1 \\ 1 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 0 \end{pmatrix} = \mathbf{k}, \qquad i = 13, \quad k = 14.$$
Thus, permutation (4, 2, 1, 3) on the set of components of vector x results in the following permutation on the set of components of vector f:

$$(0, 8, 1, 9, 4, 12, 5, 13, 2, 10, 3, 11, 6, 14, 7, 15).$$

Let  $f = 0111 \ 1010 \ 0100 \ 1001$ . Then the considered permutation on the set of components of this vector results in its new value  $f^*=0100 \ 1110 \ 1110 \ 0001$ , corresponding to the new order  $(x_4, x_2, x_1, x_3)$  of arguments.
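The example can be checked with a short script. This is a sketch under stated assumptions: vectors are strings of '0'/'1' with component 0 leftmost and $x_1$ as the most significant index bit, the function names `new_index` and `permute_arguments` are hypothetical, and the listed permutation is read as the new place of each component of f. Under that reading both the list and the vector $f^*$ given above are reproduced.

```python
def new_index(k, order):
    """Place of component k after reordering the arguments.

    order = (4, 2, 1, 3) means the new argument list is (x4, x2, x1, x3).
    """
    n = len(order)
    old_bits = format(k, "0{}b".format(n))  # values of x1 ... xn
    new_bits = "".join(old_bits[var - 1] for var in order)
    return int(new_bits, 2)


def permute_arguments(f, order):
    """Return the vector of the same function over the reordered arguments."""
    f = f.replace(" ", "")
    out = [None] * len(f)
    for k, value in enumerate(f):
        out[new_index(k, order)] = value
    return "".join(out)


order = (4, 2, 1, 3)
places = [new_index(k, order) for k in range(16)]
f_star = permute_arguments("0111 1010 0100 1001", order)

print(places)
print(f_star)
```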

#### III. OPERATIONS OVER ADJACENT ELEMENTS OF BOOLEAN SPACE

The set POBS also contains operations of interaction between different components of one Boolean $2^n$-vector f, specifying a function $f(x) = f(x_1, x_2, ..., x_n)$. Most important are the *operations over adjacent elements* of the Boolean space. Boolean *n*-vectors are adjacent if they differ in exactly one component. When the Boolean space is represented as a many-dimensional Boolean cube, such vectors are represented by nodes joined by edges.

Such a structure makes it possible to perform logical calculations in parallel and thereby to accelerate them considerably. This idea is not new. A special supplement to the universal computer, called the L-machine, was developed in 1961-1962 at the Siberian Physical-Technical Institute; it substantially accelerated the process of solving logical design problems [2]. The basic idea consists in executing logic operations distributed in a Boolean space on a series of information fields that play the role of registers. The fields are structurally similar to ten-dimensional Boolean cubes and make it possible to represent Boolean functions of ten variables directly and to perform component-wise operations over them. One of the fields is called the main field and is used for transformations limited to one function. The circuit implementation of the main field provides simultaneous execution of any two-place Boolean operation within each of the 512 pairs of elements adjacent by some selected variable.

The same idea was put at the basis of the commutative computer offered by W.D. Hillis in 1978 and designed by Thinking Machines Corporation in 1985. This computer provides information exchange in a multiprocessor system whose components are directly connected with each other like the nodes of the *n*-dimensional Boolean cube. Transferring a portion of information between any two processors in such a computer takes no more than *n* clock ticks. Several commutative computers were created and used by researchers working in the field of artificial intelligence to solve logical inference problems [3].

When the number of arguments *n* exceeds 5, it is convenient to specify vector *f* by a Boolean matrix of size $2^5 \times 2^{n-5}$, representing its 32-element rows by words in the computer memory (which is adequate for the majority of modern computers). In this case any two elements of the space *M* adjacent by the variable $x_k$ belong to the same word if k < 6, and belong to different words otherwise, which should be taken into account in programming. Note that in the examples presented below it is more convenient for visual perception to use matrices of size $2^4 \times 2^{n-4}$.

We introduce the following elementary operations of conversion of a Boolean function  $f(\mathbf{x}) = f(x_1, x_2, ..., x_n)$  presented with vector f, by interaction of adjacent units:

f - k – assignment of value 0 to argument $x_k$,

f + k – assignment of value 1 to argument $x_k$.

These operations are illustrated by the following matrices, splitting the set of elements of vector f into two parts corresponding to different values of the selected variable  $x_k$  and called below conditionally *left* (marked bold font for  $x_4$ ) and *right*:



When the operation f - k is executed, both elements in each pair adjacent by the variable $x_k$ take the value from the left part; when the operation f + k is executed, they take the value from the right part. If n < 6 (in the given example if n < 5) this operation is implemented by means of an appropriate shift of columns in the matrix, otherwise by a shift of rows.

By way of generalization we introduce the following operations, in which the *n*-component Boolean vector u is used instead of the scalar k:

f - u – assignment of value 0 to all arguments  $x_k$ , which correspond to 1-components (having value 1)  $u_k$  of vector u,

f + u – assignment of value 1 to all arguments  $x_k$ , which correspond to 1-components  $u_k$  of vector u.

The first of these operations can be interpreted as obtaining the initial coefficient $f_0$ of the disjunctive Shannon decomposition of function f by all variables of the set u (in this case all these variables receive value 0), and the second as obtaining the final coefficient $f_1$ (when all these variables receive value 1).

For example, if n = 8 and u = 01100010, the operation f - u is equivalent to the composition ((f - 2) - 3) - 7, and the operation f + u is equivalent to the composition ((f + 2) + 3) + 7. Thus the same value is assigned to variables  $x_2$ ,  $x_3$  and  $x_7$ .
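The assignment operations can be sketched as follows. The code is an illustration only, assuming a vector is a string of $2^n$ characters '0'/'1' with component 0 leftmost and $x_1$ as the most significant index bit; the function names `assign` and `assign_vector` are hypothetical, and the 8-component test vector is invented for the example. The loop over components shows the semantics, whereas a word-oriented implementation would use shifts and masks.

```python
def assign(f, n, k, value):
    """Return f - k (value=0) or f + k (value=1).

    Both elements of each pair adjacent by x_k take the value of the
    element in which x_k equals `value`.
    """
    weight = 1 << (n - k)  # index weight of variable x_k
    return "".join(
        f[(i | weight) if value else (i & ~weight)] for i in range(len(f))
    )


def assign_vector(f, u, value):
    """Return f - u (value=0) or f + u (value=1) for an n-component vector u."""
    n = len(u)
    for k, bit in enumerate(u, start=1):
        if bit == "1":
            f = assign(f, n, k, value)
    return f


f = "01101011"  # invented function of three variables
print(assign(f, 3, 1, 0))         # f - 1
print(assign(f, 3, 1, 1))         # f + 1
print(assign(f, 3, 3, 0))         # f - 3
print(assign_vector(f, "011", 0)) # f - u with u = 011
```

After f - u the result no longer depends on the variables marked in u, so applying f + k for any marked k leaves it unchanged.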

Let us also introduce the operation of symmetrization S f \* k, in which both elements of each pair adjacent by variable $x_k$ take an identical value, obtained by applying the operator \*, selected from the set { $\lor, \land, \rightarrow, \oplus, \ldots$ }, to the initial values of the elements of the pair. This operation is also reduced to those considered above, as

$$Sf * k = (f - k) * (f + k).$$

For example,

| $f$              | $S f \oplus 1$   | $S f \lor 3$     | $S f \wedge 6$   |
|------------------|------------------|------------------|------------------|
| 0110110101011110 | 1010011100101111 | 0111111101111111 | 0000110000001100 |
| 0010010000010110 | 0110000111000101 | 0011011000110110 | 0000000000000000 |
| 1100101001110001 | 1010011100101111 | 1111101111111011 | 1100000000110000 |
| 0100010111010011 | 0110000111000101 | 1101011111010111 | 0000000011000011 |

In particular, the operation $S f \oplus k$ represents the well-known operation of differentiation of a Boolean function with respect to the variable $x_k$.

The operation of symmetrizing S f \* k also is generalized by usage of a vector u instead of a scalar k. It is represented by expression S f \* u and is equivalent to the sequence of operations  $S f * k_i$ , in which scalars  $k_i$  represent the numbers of 1-components of vector u. In this case operator \* is selected from the set  $\{\lor, \land, \oplus\}$ .

For example, if u = 010011, the operation  $S f \wedge u$  is equivalent to the expression

$$S(S(Sf \land 2) \land 5) \land 6.$$

It can be interpreted in such a way: all elements of the fragment of vector f, corresponding to conjunction  $\overline{x_1} \ \overline{x_3} \ \overline{x_4}$ , gain value 0, if even one of them was equal to 0.
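The reduction $S f * k = (f - k) * (f + k)$ translates into a few lines of code. The sketch below assumes vectors are lists of 0/1 values with $x_1$ as the most significant index bit; the names `assign`, `symmetrize` and `symmetrize_vector` are hypothetical, and the 8-component vector is an invented example rather than one taken from the text.

```python
from operator import and_, xor


def bits(text):
    """Convert '0110 1111' into a list of 0/1 integers."""
    return [int(c) for c in text if c in "01"]


def assign(f, n, k, value):
    """f - k (value=0) or f + k (value=1) on a list of 2**n bits."""
    weight = 1 << (n - k)
    return [f[(i | weight) if value else (i & ~weight)] for i in range(len(f))]


def symmetrize(f, n, k, op):
    """S f * k = (f - k) * (f + k) for a two-place operator op."""
    left = assign(f, n, k, 0)
    right = assign(f, n, k, 1)
    return [op(a, b) for a, b in zip(left, right)]


def symmetrize_vector(f, u, op):
    """S f * u: apply S f * k for every k marked by a one in u."""
    n = len(u)
    for k, bit in enumerate(u, start=1):
        if bit == "1":
            f = symmetrize(f, n, k, op)
    return f


f = bits("0110 1111")  # invented function of three variables
derivative_x1 = symmetrize(f, 3, 1, xor)        # S f (+) 1
all_ones_test = symmetrize_vector(f, "011", and_)  # S f /\ u, u = 011

print(derivative_x1)
print(all_ones_test)
```

In the second result the fragment with $x_1 = 0$ becomes all zeros because it contained a zero, while the fragment with $x_1 = 1$ keeps its ones.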

#### IV. OPERATIONS OF CONVERSION OF DIMENSION OF BOOLEAN VECTORS

Such operations allow to implement interaction between Boolean vectors of different dimension.

Consider two Boolean vectors:

*n*-vector **u** with *k* ones marking some *k* variables from the set  $\mathbf{x} = (x_1, x_2, ..., x_n)$ ,

 $2^k$ -vector **h**, specifying the Boolean function *h* of the marked variables.

We introduce into the set POBS the operation $h \times u$, which transfers the function h into the fragment of the Boolean space of n variables that corresponds to the conjunction of the inversions of the variables not marked in u. All elements of the remaining fragments take value 0.

Let, for example, n = 5, u = 01101 and h = 10010011.

Then a Boolean 2<sup>5</sup>-vector is created, in which the fragment corresponding to the conjunction $\overline{x_1} \ \overline{x_4}$ is selected (marked in bold):

**00**00**00**00 **00**00**00**00 00000000 00000000,

and the vector h is inscribed in it. As a result, the following vector is obtained:

*h*×*u* = 10000100 00001100 00000000 00000000.
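The transfer operation can be written as a short routine. This is a sketch only, assuming vectors are strings of '0'/'1' with $x_1$ as the most significant index bit; the function name `embed` is hypothetical.

```python
def embed(h, u):
    """h x u: place the 2**k-vector h into the fragment of the n-dimensional
    space where all variables not marked in u equal 0."""
    n = len(u)
    marked = [pos for pos, bit in enumerate(u) if bit == "1"]
    assert len(h) == 2 ** len(marked)
    out = ["0"] * (2 ** n)
    for j, value in enumerate(h):
        j_bits = format(j, "0{}b".format(len(marked)))
        i_bits = ["0"] * n  # unmarked variables stay at 0
        for pos, b in zip(marked, j_bits):
            i_bits[pos] = b
        out[int("".join(i_bits), 2)] = value
    return "".join(out)


result = embed("10010011", "01101")
print(" ".join(result[i:i + 8] for i in range(0, 32, 8)))
```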

The operation f : u is introduced by analogy: it carries the information back from the selected fragment of a $2^n$-vector to the vector **h**, which specifies the resulting Boolean function h of k variables. So, if u = 01010 and

 $f = 00110010 \ 11100000 \ 11100110 \ 00011101$ 

then the fragment is found, which corresponds to conjunction  $\overline{x_1} \ \overline{x_3} \ \overline{x_5}$ 

**0**0**1**10010 **1**1**1**00000 11100110 00011101

and the information contained in it is used for build-up of the required vector:

$$h = f : u = 0111.$$

If it is known that the function f represented by vector f depends only on the variables of the set u (the remaining variables are fictitious), then the operation f : u deletes the fictitious variables from the function, and the result is represented by vector h.
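The reverse transfer can be sketched in the same way. The code assumes vectors are strings of '0'/'1' with $x_1$ as the most significant index bit; the function name `extract` is hypothetical.

```python
def extract(f, u):
    """f : u: read the fragment of f in which all variables not marked in u
    equal 0, giving a 2**k-vector over the k marked variables."""
    f = f.replace(" ", "")
    n = len(u)
    marked = [pos for pos, bit in enumerate(u) if bit == "1"]
    out = []
    for j in range(2 ** len(marked)):
        j_bits = format(j, "0{}b".format(len(marked)))
        i_bits = ["0"] * n  # unmarked variables are fixed at 0
        for pos, b in zip(marked, j_bits):
            i_bits[pos] = b
        out.append(f[int("".join(i_bits), 2)])
    return "".join(out)


h = extract("00110010 11100000 11100110 00011101", "01010")
print(h)
```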

#### V. OPERATIONS OVER PARTIAL BOOLEAN FUNCTIONS

Let us now consider *partial* (not completely defined) Boolean functions, which are widely used in solving problems of logical design.

An arbitrary partial Boolean function f of n variables can be represented by a corresponding $2^n$-component ternary vector $f^-$. For programming it is more convenient to specify it by a pair of Boolean vectors $f^{1}$ and $f^{0}$, also $2^n$-component. Ones in the vector $f^{1}$ mark the components on which the function f takes value 1, and ones in the vector $f^{0}$ mark the components on which the function equals zero. In other words, the vectors $f^{1}$ and $f^{0}$ represent the characteristic sets $M^{1}$ and $M^{0}$ of the function f, respectively.

Let, for example,

$$\begin{aligned} f^{-} &= 1{-}001010\ 011{-}01{-}1, \\ f^{1} &= 10001010\ 01100101, \\ f^{0} &= 00110101\ 10001000. \end{aligned}$$

In this case the operation of assignment of value 0 to argument  $x_k$  will be represented by the couple of operations  $f^{1}-k$ ,  $f^{0}-k$ , and that of value 1 – by the couple  $f^{1}+k$ ,  $f^{0}+k$ .

Similarly to f, the Boolean vector u can also be ternary, for example when it represents some elementary conjunction. Then it should also be replaced by a pair of Boolean vectors $u^1$ and $u^0$, in this case *n*-component. For example, in the disjunctive decomposition of a partial Boolean function f by all variables of the united set $u^1 \cup u^0$, the coefficient at that conjunction is obtained by the series of operations

$$f^{1}-u^{0}, f^{0}-u^{0}, f^{1}+u^{1}, f^{0}+u^{1}$$
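Splitting a ternary vector into its two Boolean characteristic vectors is straightforward. The sketch below assumes the ternary vector is given as a string over '0', '1' and '-'; the function name `split_ternary` is hypothetical.

```python
def split_ternary(t):
    """Return (f1, f0): ones in f1 mark components equal to 1,
    ones in f0 mark components equal to 0; '-' is marked in neither."""
    t = t.replace(" ", "")
    f1 = "".join("1" if c == "1" else "0" for c in t)
    f0 = "".join("1" if c == "0" else "0" for c in t)
    return f1, f0


f1, f0 = split_ternary("1-001010 011-01-1")
print(f1)
print(f0)
```

The two vectors never have a one in the same position, and positions where both are zero are exactly the undefined components.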

#### VI. PROGRAMMING IN BASIS POBS

Let us show some examples of programming in the POBS basis to solve tasks of the theory of Boolean functions.

#### *A. Testing a partial Boolean function for monotonicity*

Consider a partial Boolean function  $f(\mathbf{x}) = f(x_1, x_2, ..., x_n)$, given by two sets of argument value collections: the set  $M^1$, on which it takes value 1, and the set  $M^0$, on which it takes value 0. Let us call it *monotone* or, in particular, *positive*, if no pair of collections  $\mathbf{p} \in M^1$ and  $\mathbf{q} \in M^0$ satisfies the condition  $\mathbf{q} > \mathbf{p}$ (vector  $\mathbf{q}$ is greater than vector  $\mathbf{p}$), i.e.  $q_i \ge p_i$ for every pair  $(p_i, q_i)$ of vector components and  $q_i > p_i$ for at least one pair.

A simple method of checking the function for monotonicity is an exhaustive search over all pairs (p, q), testing each of them against the condition q > p. However, as the number of variables n increases and the cardinalities of the sets  $M^1$ and  $M^0$ grow accordingly, this method becomes too time-consuming. The method offered below, which uses operations from the set POBS, is more efficient.

Let us specify the function f(x) by two Boolean  $2^n$-vectors:  $f^1$ and  $f^0$. The elements of set  $M^1$ are represented by ones in vector  $f^1$, and the elements of set  $M^0$ by ones in vector  $f^0$. Denote by  $M^*$ the set of those elements of the Boolean space that are greater than or equal to some element of set  $M^1$, and represent this set by vector  $f^*$.

**The affirmation 1.** The function f(x) is monotone if and only if  $f^* \wedge f^0 = \mathbf{0}$.

The vector  $f^*$ can be found with the help of the operations of the set POBS introduced above, in a sequence of *n* steps. First we obtain the vector  $f^2 = (f^1 - 1) \lor f^1$, representing the set  $M^1$ supplemented with the elements of the Boolean space adjacent "from above" by variable  $x_1$ to some elements of  $M^1$. Then the obtained set is expanded similarly by the next variable  $x_2$:  $f^3 = (f^2 - 2) \lor f^2$. After iterating this operation over all remaining variables we obtain the required vector  $f^{n+1} = f^*$.

Let, for example, n = 5,

$$f^{1} = 00010000 \ 00100000 \ 00000001 \ 00001010,$$
  
 $f^{0} = 11000010 \ 00000100 \ 10100000 \ 10000000.$ 

In this case the process of the sequential extension of set  $M^1$, resulting in the vector  $f^*$ that represents set  $M^*$, can be demonstrated by the following sequence of vectors obtained at successive steps:

$$\begin{aligned} f^2 &= 00010000\ 00100000\ 00010001\ 00101010, \\ f^3 &= 00010000\ 00110000\ 00010001\ 00111011, \\ f^4 &= 00010001\ 00110011\ 00010001\ 00111011, \\ f^5 &= 00010001\ 00110011\ 00010001\ 00111011, \\ f^6 &= 00010001\ 00110011\ 00010001\ 00111111 = f^*. \end{aligned}$$

The component-wise conjunction of the obtained vector with the vector

$$f^{0} = 11000010\ 00000100\ 10100000\ 10000000$$

is equal to zero  ($f^* \wedge f^0 = \mathbf{0}$); therefore, the considered Boolean function is monotone.
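The n-step procedure can be traced with a short script. This is a sketch under stated assumptions: vectors are lists of 0/1 values with $x_1$ as the most significant index bit, the step for variable $x_k$ is taken to be $(f - k) \lor f$, and the function names `assign0`, `upward_closure` and `is_monotone` are hypothetical.

```python
def bits(text):
    """Convert '0001 0000' into a list of 0/1 integers."""
    return [int(c) for c in text if c in "01"]


def assign0(f, n, k):
    """f - k: every element takes the value found where x_k = 0."""
    weight = 1 << (n - k)
    return [f[i & ~weight] for i in range(len(f))]


def upward_closure(f1, n):
    """Return the list of vectors obtained after the steps for x_1 ... x_n."""
    steps = []
    f = list(f1)
    for k in range(1, n + 1):
        f = [a | b for a, b in zip(assign0(f, n, k), f)]
        steps.append(f)
    return steps


def is_monotone(f1, f0, n):
    """Check that the upward closure of M^1 does not meet M^0."""
    f_star = upward_closure(f1, n)[-1]
    return not any(a & b for a, b in zip(f_star, f0))


f1 = bits("00010000 00100000 00000001 00001010")
f0 = bits("11000010 00000100 10100000 10000000")
steps = upward_closure(f1, 5)

print("".join(map(str, steps[-1])))
print(is_monotone(f1, f0, 5))
```

Each step touches every component once, so the whole check costs on the order of $n \cdot 2^n$ elementary operations instead of comparing every pair from $M^1$ and $M^0$.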

#### B. Search for functional regularities

An important role in modern information technologies is played by data mining procedures, i.e. the extraction of knowledge from data and the search for regularities that help find correct decisions in intelligent tasks [4]. A special but important case of regularities is considered below, namely functional regularities, which are often encountered in the natural sciences.

The following formal task was considered in [5]. Assume that a set of objects is given, each of which is characterized by some combination of n binary values (indicating whether the corresponding attributes are present or absent). The question is whether the value of some selected attribute can always be determined uniquely when the values of the remaining ones are known and, if so, how to determine it.

The initial information in this task can be represented by a collection R of elements of the *n*-dimensional Boolean space  $M = \{0, 1\}^n$  of attributes. These elements specify the known objects and can be regarded as the roots of some Boolean equation  $F(\mathbf{x}) = 1$, where  $\mathbf{x} = (x_1, x_2, ..., x_n)$.

This equation is called solvable with respect to some variable if this variable can be represented by a Boolean function of the remaining variables, defined on the set R [5]. We consider the task of detecting such variables in the equation F = 1 and finding the appropriate functions.

**The affirmation 2.** The necessary and sufficient condition of solvability of the equation F = 1 in regard to the variable  $x_i$  is the absence in the set *R* of couples of collections, adjacent by  $x_i$ .

*Proof by contradiction* (by the rule *modus tollens*): if there exists such a couple, the variable  $x_i$  receives in it different values on identical sets of values of remaining variables, which contradicts the definition of the functional relation.

Let us denote by  $f(\mathbf{x})$  the characteristic Boolean function of set R, where  $f(q_j) = 1$  if  $q_j \in R$  and  $f(q_j) = 0$  if  $q_j \notin R$. By  $f(x_i = 0)$  and  $f(x_i = 1)$  we denote the results of replacing the variable  $x_i$  in the function  $f(\mathbf{x})$  with the constant 0 or 1, respectively.

**The affirmation 3.** The equation F = 1 is solvable in regard to variable  $x_i$ , if and only if  $f(x_i = 0) \land f(x_i = 1) = 0$ .

This affirmation makes it possible to apply the vector operations f-i and f+i introduced above for checking the equation for solvability with respect to variable  $x_i$. Affirmation 3 can be reformulated in terms of these operations in the following way: the necessary and sufficient condition of solvability of the equation F = 1 with respect to variable  $x_i$  is the satisfaction of the relation

$$(f-i) \wedge (f+i) = \mathbf{0}$$
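The two solvability criteria can be illustrated by a short Python sketch. It does not use the vector operations of the paper; it works directly on a set R of n-tuples. The function names and the 0-based variable index `i` are assumptions made only for this illustration.

```python
from itertools import product

def solvable_by_adjacency(R, i):
    """Affirmation 2: R holds no two elements that differ only in coordinate i."""
    R = set(R)
    return all(q[:i] + (1 - q[i],) + q[i + 1:] not in R for q in R)

def solvable_by_cofactors(R, n, i):
    """Affirmation 3: f(x_i = 0) AND f(x_i = 1) is zero on the whole space."""
    R = set(R)
    for q in product((0, 1), repeat=n):
        q0 = q[:i] + (0,) + q[i + 1:]   # the point q with x_i replaced by 0
        q1 = q[:i] + (1,) + q[i + 1:]   # the point q with x_i replaced by 1
        if q0 in R and q1 in R:         # both cofactors equal 1 at q
            return False
    return True

# Hypothetical set R of three objects described by n = 3 signs.
R = {(0, 0, 1), (0, 1, 1), (1, 1, 0)}
results = [(solvable_by_adjacency(R, i), solvable_by_cofactors(R, 3, i))
           for i in range(3)]
```

For this R the equation is solvable in regard to the first and third variables but not the second, because (0, 0, 1) and (0, 1, 1) are adjacent by it; both criteria agree on every variable.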

If this condition is satisfied, the task arises of finding the appropriate Boolean function, which is in general partial, and of defining it optimally. The optimization can consist both in minimizing the number of arguments of the function and in simplifying its algebraic representation, for example in DNF.

Let's consider the first of these tasks. It is similar to the task of minimization of unconditional diagnostic test and can be solved by the same method. Some argument  $x_k$  can be defined as fictitious, if after its deleting the equation remains solvable in regard to the variable  $x_i$ . The operation of deleting the argument  $x_k$  can be presented as the extension of set R by this variable, i.e. as the following conversion of its characteristic function f

$$f := f(x_k = 0) \lor f(x_k = 1).$$

In terms of the vector operations introduced above it is defined as  $\mathbf{S}\mathbf{f} \vee k$ , whence follows

**The affirmation 4.** The argument  $x_k$  can be deleted from the set of arguments of the variable  $x_i$  defined as a function of remaining variables, if and only if

$$((\mathbf{S}\mathbf{f}\vee k)-i)\wedge((\mathbf{S}\mathbf{f}\vee k)+i)=\mathbf{0}.$$
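A minimal sketch of this test, again on a set R of n-tuples rather than on Boolean vectors, is given below. The helper names, the 0-based indices and the greedy deletion order are assumptions of the illustration; the greedy pass yields an irredundant set of arguments, which is not necessarily one of minimum size.

```python
def adjacent_free(R, i):
    """True if R holds no two elements that differ only in coordinate i."""
    return all(q[:i] + (1 - q[i],) + q[i + 1:] not in R for q in R)

def extend_by(R, k):
    """Extension of R by x_k, i.e. f := f(x_k = 0) OR f(x_k = 1)."""
    return R | {q[:k] + (1 - q[k],) + q[k + 1:] for q in R}

def is_fictitious(R, i, k):
    """Affirmation 4: x_k may be deleted from the arguments of x_i."""
    return adjacent_free(extend_by(set(R), k), i)

def remaining_arguments(R, n, i):
    """Greedily delete arguments while x_i stays a function of the rest."""
    R, kept = set(R), []
    for k in range(n):
        if k == i:
            continue
        extended = extend_by(R, k)
        if adjacent_free(extended, i):
            R = extended          # x_k is fictitious, keep the extended set
        else:
            kept.append(k)        # x_k is needed to determine x_i
    return kept

R = {(0, 0, 1), (0, 1, 1), (1, 1, 0)}
kept = remaining_arguments(R, 3, 2)
```

In this example the third variable is determined by the first one alone, so the second argument is deleted and `kept` contains only index 0.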

### C. Sequential composition of Boolean functions

Let's consider the following task. The set of arguments  $\mathbf{x}=(x_1, x_2, ..., x_n)$  is divided by the Boolean *n*-vectors u, w and v into three disjoint subsets u, w and v:  $\mathbf{x} = u \cup w \cup v$ . Two Boolean functions h(u, w) and g(x, w, v), represented by the corresponding Boolean vectors h and g, are also given. It is required to calculate their composition under the condition x = h(u, w) and to represent the obtained Boolean function  $f(\mathbf{x})$  by a  $2^n$-vector f.

Such a composition, called non-disjoint sequential two-block composition, is illustrated by the example in Fig. 2, where n = 6 and the sets  $u = (x_1, x_2)$ ,  $w = (x_3, x_4)$  and  $v = (x_5, x_6)$  are represented by the six-dimensional Boolean vectors u = 110000, w = 001100 and v = 000011.



Fig. 2. An example of non-disjoint sequential two-block composition

Let's assume, that the functions h(u, w) and g(x, w, v) are preset by corresponding vectors:

For convenience, the vector g is broken in two halves, specifying values of the function g(x, w, v) at values 0 and 1 of binary variable x.

We present the Boolean space of variables x = (u, w, v) as follows:

$$
\begin{array}{cccc}
0000 & 0000 & 0000 & 0000 \\
0000 & 0000 & 0000 & 0000 \\
0000 & 0000 & 0000 & 0000 \\
0000 & 0000 & 0000 & 0000
\end{array}
$$

The four rows correspond to the values of the variables  $u_1, u_2$ , the four-bit groups within a row to the values of  $w_1, w_2$ , and the bits within a group to the values of  $v_1, v_2$ .

Then we sequentially map onto this space the functions h,  $g_0$  and  $g_1$ , thus introducing additional fictitious variables from the sets v and u and representing the results by  $2^n$-vectors a, b and c:

$$\boldsymbol{h} \times (\boldsymbol{u}, \boldsymbol{w}) - \boldsymbol{v} = \boldsymbol{a}$$

$$
\begin{array}{cccc}
1000 & 1000 & 0000 & 1000 \\
0000 & 0000 & 1000 & 0000 \\
0000 & 1000 & 1000 & 0000 \\
1000 & 1000 & 0000 & 0000
\end{array}
\quad\longrightarrow\quad
\begin{array}{cccc}
1111 & 1111 & 0000 & 1111 \\
0000 & 0000 & 1111 & 0000 \\
0000 & 1111 & 1111 & 0000 \\
1111 & 1111 & 0000 & 0000
\end{array}
$$

$$\boldsymbol{g}_0 \times (\boldsymbol{w}, \boldsymbol{v}) - \boldsymbol{u} = \boldsymbol{b}$$

$$
\begin{array}{cccc}
0011 & 0100 & 1100 & 1001 \\
0000 & 0000 & 0000 & 0000 \\
0000 & 0000 & 0000 & 0000 \\
0000 & 0000 & 0000 & 0000
\end{array}
\quad\longrightarrow\quad
\begin{array}{cccc}
0011 & 0100 & 1100 & 1001 \\
0011 & 0100 & 1100 & 1001 \\
0011 & 0100 & 1100 & 1001 \\
0011 & 0100 & 1100 & 1001
\end{array}
$$

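$$\boldsymbol{g}_1 \times (\boldsymbol{w}, \boldsymbol{v}) - \boldsymbol{u} = \boldsymbol{c}$$

$$
\begin{array}{cccc}
1010 & 0101 & 1010 & 1011 \\
0000 & 0000 & 0000 & 0000 \\
0000 & 0000 & 0000 & 0000 \\
0000 & 0000 & 0000 & 0000
\end{array}
\quad\longrightarrow\quad
\begin{array}{cccc}
1010 & 0101 & 1010 & 1011 \\
1010 & 0101 & 1010 & 1011 \\
1010 & 0101 & 1010 & 1011 \\
1010 & 0101 & 1010 & 1011
\end{array}
$$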

Finally we obtain the vector f, representing the required composition of the functions h(u, w) and g(x, w, v):

$$
\boldsymbol{a}\boldsymbol{b} =
\begin{array}{cccc}
0011 & 0100 & 0000 & 1001 \\
0000 & 0000 & 1100 & 0000 \\
0000 & 0100 & 1100 & 0000 \\
0011 & 0100 & 0000 & 0000
\end{array}
$$

$$
\overline{\boldsymbol{a}}\boldsymbol{c} =
\begin{array}{cccc}
0000 & 0000 & 1010 & 0000 \\
1010 & 0101 & 0000 & 1011 \\
1010 & 0000 & 0000 & 1011 \\
0000 & 0000 & 1010 & 1011
\end{array}
$$

$$
\boldsymbol{f} = \boldsymbol{a}\boldsymbol{b} \vee \overline{\boldsymbol{a}}\boldsymbol{c} =
\begin{array}{cccc}
0011 & 0100 & 1010 & 1001 \\
1010 & 0101 & 1100 & 1011 \\
1010 & 0100 & 1100 & 1011 \\
0011 & 0100 & 1010 & 1011
\end{array}
$$
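The final step of the example can be checked with a few lines of Python. The sketch below stores each row of the 64-component vectors a, b and c of the example as a 16-bit integer and evaluates the formula f = ab ∨ āc bitwise; the row-wise storage and all function names are assumptions of this illustration, not the representation used by the author.

```python
WIDTH = 16  # one row of the example: four groups of four bits

A_ROWS = ["1111 1111 0000 1111", "0000 0000 1111 0000",
          "0000 1111 1111 0000", "1111 1111 0000 0000"]
B_ROWS = ["0011 0100 1100 1001"] * 4
C_ROWS = ["1010 0101 1010 1011"] * 4

def to_int(row):
    """Parse a row such as '0011 0100 1100 1001' into an integer."""
    return int(row.replace(" ", ""), 2)

def to_row(value):
    """Format an integer back into four-bit groups."""
    bits = f"{value:0{WIDTH}b}"
    return " ".join(bits[i:i + 4] for i in range(0, WIDTH, 4))

def select(a, b, c):
    """Component-wise a*b OR (NOT a)*c on WIDTH-bit words."""
    mask = (1 << WIDTH) - 1
    return (a & b) | (~a & mask & c)

f_rows = [to_row(select(to_int(a), to_int(b), to_int(c)))
          for a, b, c in zip(A_ROWS, B_ROWS, C_ROWS)]
```

The four strings in `f_rows` reproduce the rows of the vector f given above.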

### D. Testing a partial Boolean function on decomposability at a given partition on the set of arguments

Suppose that a partial Boolean function  $f(\mathbf{x})$  of n variables, represented by a ternary vector  $f^-$  is known. It is required to test it on decomposability at a given partition u/v of the set  $\mathbf{x}$ , i.e. to find out, whether there exist such functions h(u, w) and g(x, w, v) of smaller number of variables, that  $f(\mathbf{x}) = g(h(u, w), w, v)$ , where  $w = \mathbf{x} \setminus (u \cup v)$ .

At the positive answer to this question the logic circuit implementing function f(x) can be simplified (for example, at logical synthesis in the basis of units LUT (look up tables), implementing functions of restricted number of variables).

The necessary and sufficient condition of decomposability of a completely defined Boolean function f(x) at a partition u/v must be fulfilled for each coefficient  $f_i(u, v)$  of the disjunctive Shannon decomposition of the function f(x) by the variables of the set w and reads as follows: the coefficients of the analogous decomposition of  $f_i(u, v)$  by the variables of the set u should take no more than two different values.

The coefficients  $f_i(u, v)$  of the disjunctive Shannon decomposition of a partial Boolean function f(x) by the variables of set w are represented by fragments  $T_i$ , ternary matrices whose rows correspond to different values of vector u and whose columns correspond to different values of vector v. The corresponding components of the ternary vector  $f^-$  serve as elements of the fragments. The condition of decomposability of the function f(x) at the partition u/v can now be formulated as follows: for each coefficient  $f_i(u, v)$  there exists a completion of the corresponding matrix  $T_i$  (replacement of the values "-" by 0 or 1) under which its rows take no more than two different values.

It was shown in [6], that the check of this condition is reduced to finding out if the graph of orthogonality of rows of each matrix  $T_i$  is bichromatic. A heuristic algorithm was suggested there, which guarantees obtaining exact solutions under condition of connectivity of the considered graphs (this condition is usually fulfilled). The ternary vector  $f^-$  is represented in it by an appropriate couple of Boolean vectors  $f^1$  and  $f^0$ , and the operations over the neighbors are effectively used providing simultaneous testing of all  $2^{|w|}$ fragments  $T_i$ .

The algorithm tries to divide the set of rows in each fragment into two classes A and B of mutually compatible rows. A sequence of transformations is applied to the initial vectors  $f^1$  and  $f^0$ ; their results are represented by Boolean  $2^n$-vectors  $a^1$  and  $a^0$ .

The algorithm is iterative. The first iteration starts by building up the class A, including in it the first row of the fragment. This operation is reduced to a sequence of substitutions of value 0 for the variables from set u.

$$a^{0} := f^{0} - u$$

$$a^{1} := f^{1} - u$$

Then in each fragment the rows orthogonal to the first one are found and marked with 1 in the Boolean vector  $\boldsymbol{b}$ .

$$\boldsymbol{b} := \mathrm{S}\left(\boldsymbol{a}^{0}\boldsymbol{f}^{1} \vee \boldsymbol{a}^{1}\boldsymbol{f}^{0}\right) \vee \boldsymbol{v}$$

The obtained sets constitute classes *B* and are checked for compatibility:

$$a^{0} := S(f^{0}b) \vee u$$
$$a^{1} := S(f^{1}b) \vee u$$

If by that  $a^{0}a^{1} \neq 0$ , some of the considered sets appear incompatible, whence follows, that the graph of orthogonality of rows of the corresponding fragment is not bichromatic and, therefore, the function  $f(\mathbf{x})$  is not decomposable at the partition u/v.

On the other hand, if  $a^{0}a^{1} = 0$ , the next iteration is performed. The classes A are supplemented by the rows orthogonal to some rows of classes B and are checked for compatibility. Then the classes B can be extended similarly, and so on. The algorithm terminates after a sufficient number of iterations.
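For a single fragment, the underlying check (whether the orthogonality graph of the rows is bichromatic) can be sketched with an ordinary breadth-first two-colouring. This is not the parallel vector algorithm of [6]; it handles one ternary matrix at a time, and the row representation as strings over '0', '1', '-' and all function names are assumptions of the illustration.

```python
from collections import deque

def orthogonal(r, s):
    """Two ternary rows are orthogonal if some column holds 0 in one and 1 in the other."""
    return any({x, y} == {"0", "1"} for x, y in zip(r, s))

def merge(rows):
    """Join mutually compatible ternary rows into one row ('' for no rows)."""
    merged = []
    for column in zip(*rows):
        values = set(column) - {"-"}
        merged.append(values.pop() if values else "-")
    return "".join(merged)

def two_class_split(T):
    """Return the merged rows of classes A and B, or None if the
    orthogonality graph of the rows of T is not bichromatic."""
    colour = [None] * len(T)
    for start in range(len(T)):
        if colour[start] is not None:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in range(len(T)):
                if j == i or not orthogonal(T[i], T[j]):
                    continue
                if colour[j] is None:
                    colour[j] = 1 - colour[i]   # orthogonal rows get different classes
                    queue.append(j)
                elif colour[j] == colour[i]:
                    return None                 # odd cycle: a third class would be needed
    class_a = [row for row, c in zip(T, colour) if c == 0]
    class_b = [row for row, c in zip(T, colour) if c == 1]
    return merge(class_a), merge(class_b)

split = two_class_split(["0-1", "1-0", "-01", "10-"])
```

Rows of the same colour are never orthogonal, so they are pairwise compatible and can be merged into a single completed row; here the fragment reduces to the two row values 001 and 100.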

#### REFERENCES

- [1] Zakrevskij A.D. Computation in Boolean spaces. In "Logical structure of scientific knowledge". *Moscow: Nauka*, 1965, pp. 292-310 (in Russian).
- [2] Zakrevskiy A.D. Machine for the solution of logical problems of the type of the synthesis of relay circuits. – Relay systems and finite automata. *Transl. proceedings., Burrough Corp.*, 1964. pp. 544-557.
- [3] W. Daniel Hillis. Connection machine. *Scientific American*, June 1987, Vol. 256, No 6.
- [4] Data mining and knowledge discovery approaches based on rule induction techniques (E. Triantaphyllou and G. Felici, Eds.). – Massive Computing Series, Springer, Heidelberg, Germany, 2006.
- [5] Zakrevskij A.D. About solvability of Boolean equations. -Proceedings of NAS of Belarus, 2007, Vol. 51, No 5, pp. 44-46 (in Russian).
- [6] Arkadij Zakrevskij. A new heuristic algorithm for sequential twoblock decomposition of Boolean functions. – *Proceedings of 3<sup>rd</sup> IFAC Workshop on Discrete Event System Design DESDes'06.* September 26-28, 2006, Rydzyna, Poland. University of Zielona Gora, pp. 13-17.

# Test Pattern Overlapping - a Promising Compression Method for Narrow Test Access Mechanism SOC Circuits

Ondřej Novák, Jiří Jeníček

Abstract – This paper describes research results obtained in the field of test pattern compression and decompression. We describe the hardware test pattern decompression system DyRESPIN built into a System on Chip, which uses test patterns compressed by the compression algorithm called COMPAS. COMPAS reorders and compresses test patterns previously generated in an ATPG in such a way that they are well suited for decompression by the scan chains in the embedded tester cores. We report improvements that have recently been made to COMPAS. The COMPAS algorithm has to manipulate an enormous amount of data when compressing test sets of large circuits, and the CPU time grows rapidly with the growing number of test vectors. The CPU time problem was solved by initially encoding the test vectors as sparse vectors and by using a dynamic structure that stores the precalculated parameters of candidate vectors to be used in upcoming algorithm loops for overlapping with the actual scan chain content. This arrangement allows the algorithm to skip unnecessary computations. As a result of the improvements, the CPU time grows approximately linearly with the size of the tested circuit. DyRESPIN uses a built-in processor for test control, the embedded RAM for storing both the compressed test vectors and the partial reconfiguration bitstreams, and the FPGA part of the chip for implementing the wrapped cores. The highly compressed test vectors are transferred from the memory to those selected cores that are reconfigured into the embedded tester cores. The patterns are decompressed within the internal scan chains of the embedded tester cores and are simultaneously fed into the parallel scan chains of the cores under test with the help of the Test Access Mechanism (TAM) and standard wrappers. After the first cores under test have been tested, the TAM of the SoC is partially reconfigured with the help of the partial reconfiguration bitstreams stored in the RAM, and the so far untested cores are tested by those cores that start to serve as embedded testers.

*Index Terms* — Circuit testing, Testing, Memory management

#### I. INTRODUCTION

Deterministic testing saves testing time, and its on-chip hardware overhead is low. However, growing test sizes have been pushing test costs up because more powerful ATEs are needed, and if the test access mechanism (TAM) is narrow the test application time becomes critical, too. In order to minimize the data transfer through the TAM, *compacted* and *compressed* test sets are used. A *compact test set* is a test set created in the automatic test pattern generator (ATPG) by merging as many test patterns as possible. An original test pattern usually detects one or more possible circuit faults and contains several don't care bits. The original patterns are merged in such a way that the resulting patterns detect multiple faults and do not contain don't care bits while the fault coverage of the test set remains unchanged.

*Test data compression* is a non-intrusive method that can be used to compress the pre-computed test set to a much smaller test set, which is then stored in the ATE memory. An on-chip decoder is used to generate the original test set from the compressed one. Many contributions describing different decompression mechanisms have been published; let us mention [1], [3], [5], [7], [18], [27], [34]. It is not straightforward to compare the compression methods, because some authors demonstrate the efficiency on decompression of random resistant faults only, while other authors compress and decompress the whole ATPG deterministic test sequence. The usefulness of a compression algorithm and a decompressing automaton is influenced not only by the compression ratio but also by the complexity of the decompressing automaton and by the computational complexity of the algorithm for finding the compressed test sequence.

The increasing number of transistors results in increasing ATPG computation time and memory consumption. Many of the published test optimization techniques are dedicated to sequential test optimization. To handle time-consuming test generation, it is often necessary to parallelize the test generation process [35], [15], [36]. The concurrently generated ATPG output then has to be compressed effectively.

In this paper, we present results of our previous research done in the field of test pattern compression based on test pattern overlapping [16] and hardware decompression based on wrapper reconfiguration [26]. The results give us the possibility to construct a system that combines both of the mentioned methodologies.

Manuscript received February 6, 2008.

Ondřej Novák is with the Technical University Liberec, Hálkova 6, 461 17 Liberec I, Czech Republic, E-mail: ondrej.novak@tul.cz.

Jiří Jeníček is with the Technical University Liberec, Hálkova 6, 461 17 Liberec I, Czech Republic, E-mail: ondrej.novak@tul.cz.

Acknowledgment. The research was supported by the research grant IQS108040510 of the Czech Academy of Sciences.

#### II. COMPAS - TEST PATTERN COMPRESSION TOOL

The main idea is to maximally overlap those patterns that are serially shifted into the scan chain. This approach was first described in [6]. The method uses an algorithm for finding contiguous and consecutive scan chain vectors for the actual scan chain vector. These vectors are checked to see whether they match one or more of the remaining test patterns, which were previously generated and compacted with the help of some ATPG and which have not yet been employed in the scan chain sequence. A similar approach was used in [33]. The compacted test vectors were reordered by a heuristic algorithm to attain maximal overlapping. A disadvantage of the mentioned methods is that either they are computationally complicated and thus not usable for large circuits, or the amount of test data stored in an ATE is greater than with other compression methods. We present an algorithm that speeds up the computation by searching for the successors of a given starting pattern (usually the all-zero seed) and that improves the compression efficiency by fault simulation after every test pattern application. This algorithm uses test vectors with don't care bits instead of the compacted ATPG test set, which enables us to combine test pattern compaction and compression so that the result is well suited to decompression in a scan chain. The algorithm is implemented in the COMPAS (COmpressed test PAttern Sequencer) software tool. It speeds up and improves the algorithm [26] by taking into account possible future conflicts between overlapping patterns, by using more efficient pattern coding and by remembering information that could be useful in future algorithm loops. COMPAS is able to prepare test sequences for the most complex circuits in a short time. COMPAS can also be used for preparing test sequences of cores under test (CUT) that are designed according to the IEEE 1500 standard [23]. Test data can be effectively decompressed with the RESPIN test architecture [7]. This architecture reuses scan chains of different cores for updating the tested core scan chain content. The latest version of the algorithm used in [26] has been further enhanced to lower CPU time and memory consumption.

#### III. MEMORY CONSUMPTION IMPROVEMENT

Uncompressed test data generated by an ATPG are stored as a plain text file; each fault corresponds to a single vector in the form of a sequence of '0', '1' and 'X' characters standing for log. 0, log. 1 and an unspecified bit (hereafter DC bit, DCB or 'X').

This data organization allows many concurrent ATPG processes. Each ATPG can generate test vectors for a small group of faults or for a single fault. The outputs of all ATPGs are then merged into a single file.

The size of the file can be a problem, because data files are very large for larger circuits (e.g. b19 from the ITC99 benchmark set has more than 2.5 GB of uncompressed test data).

To make optimal decisions, the algorithm needs all test data loaded into computer memory at once, so a new method for storing the test data in memory had to be developed. Simply loading the file is not possible for large circuits.

### A. New data encoding

One more stage of compression has to be performed instead of simple loading of text file into memory. First stage compresses plain text data from a file and stores them in a memory. Second stage uses compressed data from a memory to do pattern overlapping compression.

Three different encodings of test vectors are used in the program. Data produced by an ATPG are stored as plain text: '0', '1' and 'X'. Each character is stored on a hard drive as an 8-bit char type. When loaded into computer memory, the large amount of uncompressed test vector data is encoded into one of two different forms, depending on which one consumes less memory.

The first encoding is a quite straightforward conversion of the eight bit character vector into two bits. By this encoding each 8 bit character is reduced to 2 bits, and 6 bits are saved (75% of memory).

The second encoding creates a so-called sparse vector, which means that only care bits and their positions are saved. A 32-bit integer is used as the basic data type; one bit is used for the actual value, and the remaining 31 bits are used to note the position. A scan chain with a maximal length of  $2^{31}$  can be encoded this way. DC bits are not stored at all. As four bytes instead of one are used to encode a single care bit, this encoding consumes less memory than the original vector only if the share of care bits is lower than 25%. Since the first encoding always compresses a vector to 25% of its original size, a vector must have less than 6.25% care bits to be encoded more effectively by the sparse vector than by the first approach. However, Tab. 1, which lists the care bit percentages of benchmark circuits, shows that this should not be a problem.

It is decided for each vector separately which method should be used, so it can be guaranteed that at least 75% of memory will be saved. For larger circuits, often more than 95% can be saved by a proper encoding.
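The per-vector choice between the two encodings can be sketched as follows. The sizes follow directly from the text (2 bits per character, one 32-bit word per care bit), which gives the 6.25% break-even point; the bit layout inside the 32-bit word (value in the lowest bit, position above it) and the function names are assumptions, since the text does not specify them.

```python
def cheaper_encoding(vector):
    """Return (name, size in bits) of the smaller in-memory encoding."""
    care_bits = sum(ch in "01" for ch in vector)
    dense_size = 2 * len(vector)     # first encoding: 2 bits per character
    sparse_size = 32 * care_bits     # second encoding: one word per care bit
    if sparse_size < dense_size:
        return "sparse", sparse_size
    return "dense", dense_size

def to_sparse(vector):
    """Assumed layout: position in the upper 31 bits, value in the lowest bit."""
    return [(pos << 1) | int(ch) for pos, ch in enumerate(vector) if ch in "01"]

def from_sparse(words, length):
    """Rebuild the '0'/'1'/'X' string; positions not listed are DC bits."""
    chars = ["X"] * length
    for word in words:
        chars[word >> 1] = str(word & 1)
    return "".join(chars)

choice = cheaper_encoding("X" * 31 + "1")   # 1 care bit in 32 positions
words = to_sparse("X1X0")
```

A vector with one care bit in 32 positions (about 3.1% care bits) is stored as a sparse vector in 32 bits instead of 64, whereas a short vector such as "01X1" stays in the 2-bit form.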

#### IV. RUNTIME IMPROVEMENT

### A. Algorithm description

At first, a Test Pattern List (TPL) together with the corresponding Undetected Fault List (UFL) is generated for the tested circuit. An ATPG tool that enables generating non-compacted test patterns has to be used. At least one three state test vector with bit values 0, 1 and X, where X means don't care value has to be generated for each considered fault. In this way we can distinguish, which pattern belongs to which fault.

TABLE 1: CARE BIT PERCENTAGE IN TEST DATA OF DIFFERENT BENCHMARK

| Circuit     | Gate count | Care bits[%] |
|-------------|------------|--------------|
| c17         | 6          | 56,36        |
| c432        | 160        | 43,67        |
| c499        | 202        | 82,32        |
| c880        | 383        | 17,78        |
| c1355       | 546        | 86,31        |
| c1908       | 880        | 55,16        |
| c2670       | 1193       | 7,92         |
| c3540       | 1669       | 25,32        |
| c5315       | 2307       | 7,39         |
| c6288       | 2416       | 76,24        |
| c7552       | 3512       | 13,1         |
| s27_comb    | 10         | 45,09        |
| s1196_comb  | 529        | 26,01        |
| s1238_comb  | 508        | 26,48        |
| s1494_comb  | 647        | 50,58        |
| s5378_comb  | 2779       | 4,05         |
| s9234_comb  | 5597       | 5,44         |
| s13207_comb | 7951       | 1,19         |
| s15850_comb | 9772       | 1,38         |
| s35932_comb | 16065      | 0,26         |
| s38417_comb | 22179      | 0,84         |
| s38584_comb | 19253      | 0,39         |

The main loop of the algorithm for finding the bits to be stored in the ATE memory is described in Fig. 1. Let us suppose (without loss of generality) that the SC is reset before testing, which means that the all-zero pattern is used as the first one (the algorithm allows starting with any known scan chain state). The fault coverage of this pattern is simulated and the detected faults are deleted from the UFL; test patterns corresponding to the detected faults are deleted from the TPL. Then the algorithm tries to compact the test set by overlapping the remaining patterns with the actual scan chain state. The algorithm determines whether log. 0 or log. 1 is better to use as the next leftmost chain bit. To do this, the algorithm finds, for every pattern, the position in which the actual chain bits maximally overlap the pattern and for which the actual bit to be introduced into the SC does not have a don't care value. After finding the position, the algorithm has to compute the usefulness U of the treated pattern. The pattern usefulness U is calculated according to the following formula:

$$U = t \cdot (\mathit{overlapped\_cares} + \mathit{shift}) + \mathit{global\_cares}$$

where *overlapped\_cares* is the number of the pattern care bits that overlap the SC; *shift* is the number of non-overlapped bits in the pattern; *global\_cares* is the total number of the pattern care bits; and t is an experimentally fixed parameter; we obtained good results when we set t = *number\_of\_primary\_inputs* / 2.
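The formula can be evaluated with a small Python helper. The sketch assumes that a pattern is written as a string over '0', '1' and 'X' whose first `shift` characters have not yet entered the scan chain while the remaining characters overlap its current content; this orientation and the function name are assumptions of the illustration.

```python
def usefulness(pattern, shift, t):
    """U = t * (overlapped_cares + shift) + global_cares for one pattern."""
    overlapped_cares = sum(ch in "01" for ch in pattern[shift:])
    global_cares = sum(ch in "01" for ch in pattern)
    return t * (overlapped_cares + shift) + global_cares

# A six-bit pattern with four bits still to be shifted in;
# t = 3 corresponds to an assumed circuit with six primary inputs.
u = usefulness("1XXX00", shift=4, t=3)
```

Here two care bits overlap the scan chain, four bits are not yet overlapped and the pattern holds three care bits in total, so U = 3 · (2 + 4) + 3 = 21.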

Fig. 1: Pattern overlapping algorithm

Then the algorithm compares the number of the most useful patterns with log. 1 on the actual position and the number of patterns with log. 0 on this position. If the number of ones is greater than the number of zeros, the input actual bit is fixed to log. 1, otherwise to log. 0. This way of setting the actual bit guarantees that a maximum number of the most useful patterns can be encoded. When searching for the most useful pattern, we check whether the examined pattern matches the bits that will have to be generated in future clock cycles because of some previously selected patterns. These bits are stored in a Future Array (FA) together with their effectiveness and pattern identification numbers. If some position of the FA is reserved for a logical value that clashes with the examined pattern bit value, we compare the usefulness of both patterns and the winner is used in future considerations. After bit selection, fault simulation is performed and the faults and patterns that correspond to the covered faults are removed from the lists. If there are no remaining faults in the Undetected Fault List, the algorithm is finished.
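The bit decision described above can be sketched as a vote among the patterns of highest usefulness. The sketch reads "the most useful patterns" as those that share the maximal value of U, ignores the Future Array, and returns 0 when no candidate exists; all three points, like the function name, are assumptions of this illustration.

```python
def choose_next_bit(candidates):
    """candidates: list of (bit, usefulness) pairs, one per pattern that has
    a care bit at the searched position. Returns the bit to shift in."""
    if not candidates:
        return 0                                   # assumed default
    best = max(u for _, u in candidates)
    ones = sum(1 for bit, u in candidates if u == best and bit == 1)
    zeros = sum(1 for bit, u in candidates if u == best and bit == 0)
    return 1 if ones > zeros else 0                # a tie is resolved to log. 0

bit = choose_next_bit([(1, 21), (1, 21), (0, 21), (0, 5)])
```

In the example two of the three most useful patterns require log. 1, so log. 1 is shifted into the scan chain.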

### B. Proposed optimization

The basic principle of the compression method remains the same as in the previous section, but several steps of the algorithm can be skipped if they cannot influence the solution.

One possible state of the compression algorithm is shown in Tab. 2. To make explanation easier, the basic version of algorithm [26] without bit prediction is used, as the principle of the optimization remains the same.

TABLE 2: EXAMPLE OF VALID ALGORITHM STATE

| Step       | 5 | 4 | 3 | 2 | 1 | 0 (searched bit) |   |   |   |   |   |   |
|------------|---|---|---|---|---|------------------|---|---|---|---|---|---|
| SC content |   |   |   |   |   | ?                | 0 | 0 | 1 | 1 | 1 | 0 |
| vector A   |   |   | 1 | X | X | X                | 0 | 0 |   |   |   |   |
| vector B   |   |   | 1 | X | X | 1                | 0 | 0 |   |   |   |   |
| vector C   |   |   |   |   | X | 1                | X | 0 | 1 | 1 |   |   |
| vector D   | 0 | X | 1 | 1 | 1 | 1                |   |   |   |   |   |   |

All remaining vectors are overlapped as much as possible with the current scan chain state during the search for the next compressed bit (marked with a question mark). At the given moment, three vectors are overlapping (vectors A, B and C) and one vector (D) could not be overlapped. It is not possible to store a DC bit in the compressed sequence, so only vectors B, C and D are useful. They all have '1' at the current position, so the value '1' is shifted into the scan chain and stored as the next bit of the compressed sequence. The fault simulation of the current scan chain state is performed after that, and the detected faults and their corresponding vectors are removed from the memory. Then the algorithm goes back to the beginning and tries to overlap all remaining vectors again.

It is important that vector A has only useless DC bits in step 1 and in the two following steps (2 and 3). Those bits cannot be contained in the solution; moreover, they will never collide with any other selected bit. Because of that, it is possible to omit the calculation of the possibility of vector overlapping for vector A in steps 1, 2 and 3. The vector A is not useful for calculations in steps 1 and 2; in step 3 it could be overlapped without collision. Vectors B and C have DC bits in the following steps, but during the calculation it is not certain whether their bit (value '1') will be chosen as the solution. Vector D has care bits in the next position, so its overlap has to be evaluated. If a vector has a DC bit in the actual position, it is only necessary to evaluate how many DC bits follow the actual position. This computation is done only once for each newly found sequence of DC bits in a vector. It is also faster than finding how much a vector is overlapped. Every time vector overlapping is evaluated, the program determines whether a series of DC bits follows and how long this DC bit sequence is. If there is at least one DC bit, the overlapping evaluation is omitted.

Vectors are stored in the dynamic structure shown in Fig. 2 according to the number of steps needed to reach a care bit. Only the vectors from entry '0' of the steps\_to\_care\_bit array are checked; the others cannot influence the solution. It is sufficient to store only the distance to the next care bit, so each vector is saved only once. The solution is chosen after evaluation of all vectors in entry '0', and the vectors are placed into the proper entries of the steps\_to\_care\_bit array according to the distance to the next care bit. After evaluation and relocation of all the vectors from entry '0', the whole array is shifted one position to the right and the algorithm is ready for the next loop.
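A minimal sketch of such a structure is given below. It keeps one list of vector identifiers per distance and drops entry '0' when the array is shifted; the class and method names are hypothetical, and the distances used in the example are read off Table 2 under the assumption that a vector placed with distance d becomes due d loops later.

```python
from collections import deque

class StepsToCareBit:
    """Vectors bucketed by the number of steps to their next care bit."""

    def __init__(self):
        self.entries = deque([[]])

    def place(self, vector_id, distance):
        """Store a vector in the entry matching its distance to a care bit."""
        while len(self.entries) <= distance:
            self.entries.append([])
        self.entries[distance].append(vector_id)

    def take_entry_zero(self):
        """Remove and return the vectors that must be evaluated in this loop."""
        due, self.entries[0] = self.entries[0], []
        return due

    def shift(self):
        """Advance by one loop: every remaining vector moves one entry closer."""
        self.entries.popleft()
        if not self.entries:
            self.entries.append([])

buckets = StepsToCareBit()
buckets.place("A", 3)              # three DC bits precede the care bit of A
buckets.place("B", 0)
buckets.place("C", 0)
first = buckets.take_entry_zero()  # only B and C are evaluated in this loop
buckets.place("B", 3)              # the next care bit of B is three steps away
history = []
for _ in range(3):
    buckets.shift()
    history.append(buckets.take_entry_zero())
```

In the two loops after the first one no vector is evaluated at all, and A and B become due together in the third, which is the saving the optimization aims at.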



Fig. 2: Dynamic structure for calculation omit decision

Amount of the DC bits in the uncompressed test data file of several circuits from ISCAS85 and ISCAS89 benchmark set is noted in Tab. 1. The data contain a lot of DC bits, and the percentage of DC bits grows with the circuit size. That is why it is possible to skip a lot of calculations.

#### V. TEST ACCESS MECHANISM (TAM)

A test session can be controlled by a tester or by a BIST controller. It could be advantageous to use an embedded processor instead of a specialized controller with a RAM. As the RAM size is limited, the test set has to be as small as possible. Further testing speed improvement could be obtained by minimizing the amount of data transferred between the processor and the tested cores. For this reason it is worthwhile to send the compressed data from the processor to decoders that are placed close to the tested cores and to let the decoders decode the patterns independently of the processor activity. This arrangement can speed up testing, as the clock frequency of the core flip-flops could be higher than the processor clock frequency and the processor can prepare the next data while the previous pattern is being decoded (Figure 3). Another problem arises when using cores with SCs that contain internal flip-flops: if we have to guarantee that test patterns are not corrupted by CUT responses and simultaneously that all test responses are captured, we have to scan in and scan out the whole test pattern after each system clock application. The RESPIN (Reusing Scan Chains for Test Decompression) test architecture [7] addresses both pattern decompression and the reduction of data traffic between the tester and the CUT.



Fig. 3. ETC and CUT in the RESPIN architecture

The RESPIN architecture temporarily divides the circuit into the core under test (CUT) and the embedded tester core (ETC). The data transfer mechanism between the tester and the ETC can be denoted as a narrow TAM, as the demanded transfer capacity is low. The TAM between the ETC and the CUT is wide, as the data transfer is done in parallel and at a higher clock frequency. The ETC chains are concatenated into a serial scan chain; a feedback tap connects the output of the last ETC chain with the first bit input through a multiplexer. According to the multiplexer control input, the ETC can either load a bit from the tester or shift the scan chain circularly. The parallel chains of the CUT are connected to the parallel ETC chain outputs. This test pattern updating mechanism guarantees that the patterns, which are shifted through the CUT SC during several test steps, are not mixed with the CUT responses. An additional multi-input MISR connected to the SC outputs can be exploited for capturing all the test responses. The conditions for effective testing are: the ETC has at least as many chains as the CUT; the CUT chains are not longer than the corresponding ETC chains; and the number of scan cells of the CUT and the total number of ETC scan cells incremented by one have no common divisor. If it is not possible to find an ETC core that fulfils the above-mentioned conditions, more than one core can be used for creating the ETC.
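The three conditions can be checked mechanically when a candidate ETC is selected. The sketch below describes each core by a list of its scan chain lengths and pairs CUT chains with ETC chains by list position; this pairing and the function name are assumptions of the illustration.

```python
from math import gcd

def etc_can_test(etc_chains, cut_chains):
    """Check the stated conditions for one ETC/CUT pair.

    etc_chains, cut_chains: scan chain lengths (numbers of scan cells).
    """
    if len(etc_chains) < len(cut_chains):
        return False                       # too few ETC chains
    if any(cut > etc for cut, etc in zip(cut_chains, etc_chains)):
        return False                       # a CUT chain is longer than its ETC chain
    # CUT cell count and ETC cell count + 1 must share no common divisor.
    return gcd(sum(cut_chains), sum(etc_chains) + 1) == 1

ok = etc_can_test([4, 4, 4], [3, 3])       # gcd(6, 13) = 1
```

With ETC chains of lengths 4, 4, 3 and CUT chains of lengths 4, 4 the check fails, since 8 and 11 + 1 = 12 share the divisor 4; in such a case another core, or a combination of cores, would have to serve as the ETC.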

# A. Reconfiguration

Recent FPGA circuits are dynamically reconfigurable at runtime. Such FPGAs can change the behavior of one part of the circuit while the rest of the circuit remains fully operational, without changes and without interruption. Generally, each memory-based FPGA can be reconfigured dynamically. The currently known dynamically reconfigurable devices use two techniques: "partial configuration" and "multiple-context configuration memory" [31].

Reconfiguring the TAM for SoC testing appears to be an efficient use of the partial reconfiguration capability of FPGAs. Because the Atmel FPGAs can efficiently perform fine-grained reconfiguration, we decided to use one for the implementation of a self-testable SoC (System on Chip) design. The diagnostic system uses the RESPIN architecture, which is based on the IEEE 1500 standard. Partial reconfiguration is used for the connections among the ETCs, the CUT and the feedback multiplexer.

The main advantage of the proposed solution is that all the reconfiguration bitstreams are stored inside the chip. The reconfiguration process can therefore be controlled by the embedded processor, and the only communication between the tested SoC and the external test supervisor consists of a request to execute the test and a check of the test results.

#### VI. EXPERIMENTAL RESULTS

Fig. 4 shows the COMPAS CPU time improvement over [26]. The new algorithm performs better for larger circuits, which corresponds to the amount of don't-care (DC) bits. The average speedup is 114% for all measured circuits and 181% for circuits with more than 10,000 gates.



Fig. 4: Speedup of the compression part of the algorithm

Tab. 3 shows the resulting numbers of stored bits for some well-known test pattern compression methods and for the proposed algorithm. The second column lists the test data volume for ATPG vectors that were only compacted [3]. The next column shows the number of stored bits for statistical coding of the test patterns from the previous column [1]. The following results correspond to a combination of statistical coding and LFSR reseeding [18]. The next columns summarize the results of compression with parallel/serial scan chains [27] and with frequency-directed codes [5]. The results for the Embedded Deterministic Test method are presented in the next column [18]. The column RESPIN++ shows the numbers of bits stored in the ATE for the RESPIN++ architecture given in [32]. The number of bits stored in memory is substantially lower for the proposed method than for the other pattern compression methods.

Note that most of the tabulated pattern compression methods do not use fault simulation after encoding a new test pattern (the method of [32] is the exception). These methods use compacted test sequences; the fault coverage was simulated in the ATPG during test pattern generation, in the process of pattern compaction. The number of fault simulations in these cases corresponds to the total number of non-compacted test patterns. In the case of COMPAS and RESPIN++, the ATPG patterns were generated without any simulation, and fault simulation is performed after a pattern is encoded. The number of fault simulations is equal to the length of the final compressed sequence. The lengths of the compressed sequences are the same as in previous work [26], because neither optimization changes the principle of the algorithm. The results should therefore be exactly the same, but thanks to the optimizations they are obtained faster and with a smaller memory footprint. This is true especially for larger circuits, because their test data generated by an ATPG contain a large amount of don't-care bits.
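The comparison can be made concrete with a small Python sketch that expresses the COMPAS data volume as a fraction of a baseline column. The numbers are copied from Table 3; the dictionary layout and the helper name `relative_volume` are assumptions made for illustration, and only four of the eight columns are included.

```python
# Stored bits copied from Table 3, in the order
# (MinTest, EDT, RESPIN++, COMPAS).
TABLE3 = {
    "s13207": (163100, 10585, 26004, 4024),
    "s15850": (58656, 9805, 32226, 7737),
    "s38417": (113152, 31458, 89132, 21280),
    "s38584": (161040, 18568, 63232, 6675),
}


def relative_volume(circuit, baseline_index=0):
    """COMPAS bits divided by the bits of the chosen baseline column."""
    row = TABLE3[circuit]
    return row[3] / row[baseline_index]


for name in TABLE3:
    vs_mintest = 100 * relative_volume(name, 0)
    vs_edt = 100 * relative_volume(name, 1)
    print(f"{name}: {vs_mintest:.1f}% of MinTest, {vs_edt:.1f}% of EDT")
```

Such a ratio compares stored bits only; it says nothing about the decompression hardware or the test application time of the individual methods.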

The experimental diagnostic system was built on the FPSLIC<sup>TM</sup> AT94K40AL circuit. It is a dynamically reconfigurable programmable SoC, which integrates Atmel SRAM, FPGA and an 8-bit AVR processor [11].

The FPSLIC circuit is connected to a PC through a JTAG interface. The user can program both main parts of the IC: the program for the AVR processor and/or the static content of the FPGA. Testing with the RESPIN architecture requires reconfiguring the circuit cores several times during the test. Each core in the SoC is surrounded by a wrapper [14]. The wrapper allows connecting the core with the defined surrounding cores either in the functional mode or in the test mode. The Test Access Mechanism (TAM) takes care of the on-chip test pattern transport. The TAM and the wrappers form the infrastructure for access to the individual cores and make it possible to test all cores. Whereas the core wrapper is defined by the IEEE 1500 standard, the design of the test access mechanism is excluded from this standard and is left to the SoC designer. Partial FPGA reconfiguration was used as an efficient way to form a TAM with low area demands for a SoC design with multiple embedded cores. The FPGA consists of a number of generic cells called LUTs. In our system a LUT is used for connecting the test core terminal with a LUT of the TAM. With this arrangement, two LUTs are needed to form one wire interconnection between a 1-bit core test input and output terminal in the FPGA.

The testing system uses an 8-bit AVR processor, a dynamically reconfigurable FPGA and an SRAM memory accessible both from the processor and from the FPGA. In the FPGA we programmed the wrapped cores, the MISR, the controller and a detached area of the TAM. The AVR processor was used for data processing, for exchanging data with the hardware controller and for partial reconfiguration of the TAM before the core test is started. Test patterns together with TAM configurations were stored in the embedded SRAM. The processor controls the test scheduling and communicates with the hardware controller. The RAM is used for storing the compressed test sequence. For each test pattern the processor gives the controller a command to run the test cycle independently of the processor. This arrangement enables the hardware controller and the processor to work concurrently and speeds up the test. The hardware controller drives the core wrappers and the TAM by the WSC signals. During the test cycle the AVR transports one test bit from the memory to the port tdi and informs the controller about the availability and suitability of the test data. At the end of the test session, the processor shifts data through the port tdo from the MISR, where the responses were accumulated, and compares the resulting signature with the reference one stored in the RAM (Figure 5). After the test of the first CUT is finished, the TAM is partially reconfigured, the next core is assigned as the CUT, and it is tested through a newly reconfigured ETC. As the granularity of the configurable blocks of the FPGA is relatively fine, only a small part of the configuration memory has to be replaced by new content (denoted in gray in Fig. 5).



Fig. 5: An example of TAM configuration (given by dotted lines). The TAM is reconfigured by reprogramming LUTs of the reconfigurable FPGA block
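The response compaction and signature comparison performed at the end of the test session can be sketched in Python as follows. This is only a behavioral model under stated assumptions: the register width, the feedback polynomial and the response words are hypothetical, and the real MISR is implemented in the FPGA fabric rather than in software.

```python
def misr_step(state, response, poly=0x1D, width=8):
    """One clock of a multiple-input signature register.

    state, response: integers holding `width` bits each.
    poly: feedback taps; 0x1D stands for x^8 + x^4 + x^3 + x^2 + 1,
    a hypothetical choice, not the polynomial used in the experiment.
    """
    mask = (1 << width) - 1
    feedback = (state >> (width - 1)) & 1
    state = (state << 1) & mask
    if feedback:
        state ^= poly
    # The parallel response bits are XORed into the shifted register.
    return state ^ (response & mask)


def signature(responses, poly=0x1D, width=8):
    """Compact a sequence of response words into one signature."""
    state = 0
    for r in responses:
        state = misr_step(state, r, poly, width)
    return state


good = [0x3A, 0x7F, 0x00, 0xC4]    # hypothetical fault-free responses
faulty = [0x3A, 0x7E, 0x00, 0xC4]  # one response bit flipped
reference = signature(good)         # plays the role of the stored signature
print(signature(good) == reference)
print(signature(faulty) == reference)
```

Because the register update is linear and invertible for a polynomial with a constant term, a single flipped response bit always changes the final signature in this model; multiple errors may alias.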

The ISCAS benchmark circuits (S298, S382, S444 and S1423) were used as cores in the experiment. The system with three S1423 cores designed in the SoC used 73% of the FPGA AT94K40 resources. Reconfiguration takes several thousand processor clock cycles; the exact number depends on the design to be reconfigured. In our case the reconfiguration time is less than 1 ms at a 4 MHz processor clock. The circuit has 36 Kbytes of available RAM (20 - 32 Kbytes for program and 4 - 16 Kbytes for data). The size of one reconfiguration bitstream used in the diagnostic system was 2 Kbytes. The more cores are used in the RESPIN architecture, the more reconfiguration bitstreams are needed for arranging the ETC-CUT structure. Nevertheless, the amount of RAM consumed was acceptable. If the RAM is insufficient, the bitstreams can be reloaded from a PC. The test time depends on the longest parallel chain and on the number of bits of the compressed test. In our case the test time is about 0.3 ms for the highest possible clock frequency of the FPGA (40 MHz).

TABLE 3

COMPARISON OF MEMORY REQUIREMENTS FOR DIFFERENT TEST PATTERN COMPRESSION TECHNIQUES (NUMBER OF STORED BITS)

| Circuit name | MinTest [3] | Stat. Coding [1] | LFSR Reseeding [10] | Illinois Scan [15] | FDR Codes [5] | EDT [17] | RESPIN++ [18] | COMPAS (proposed) |
|--------------|-------------|------------------|---------------------|--------------------|---------------|----------|---------------|-------------------|
| s13207       | 163,100     | 52,741           | 11,285              | 109,772            | 30,880        | 10,585   | 26,004        | 4,024             |
| s15850       | 58,656      | 49,163           | 12,438              | 32,758             | 26,000        | 9,805    | 32,226        | 7,737             |
| s38417       | 113,152     | 172,216          | 34,767              | 96,269             | 93,466        | 31,458   | 89,132        | 21,280            |
| s38584       | 161,040     | 128,046          | 29,397              | 96,056             | 77,812        | 18,568   | 63,232        | 6,675             |
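The timing and memory figures quoted for the experiment can be cross-checked with a few lines of Python. The helper names are assumptions; the second helper deliberately ignores that the data RAM also holds the compressed test sequence, so it gives an upper bound only.

```python
def clock_cycles(time_us, clock_mhz):
    """Number of clock cycles in `time_us` microseconds at `clock_mhz` MHz."""
    return time_us * clock_mhz


def max_bitstreams(data_ram_kbytes, bitstream_kbytes=2):
    """Upper bound on the 2 Kbyte bitstreams that fit in the data RAM."""
    return data_ram_kbytes // bitstream_kbytes


# Reconfiguration: less than 1 ms at a 4 MHz processor clock.
print(clock_cycles(1000, 4), "cycles is the bound on one reconfiguration")
# Test: about 0.3 ms at a 40 MHz FPGA clock.
print(clock_cycles(300, 40), "cycles for the test")
# Data RAM between 4 and 16 Kbytes.
print(max_bitstreams(4), "to", max_bitstreams(16), "bitstreams at most")
```

The bound of 4000 cycles agrees with the statement that reconfiguration takes several thousand processor clock cycles.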

#### VII. CONCLUSION

The COMPAS compression tool demonstrates that test pattern compression through pattern overlapping can be applied to relatively large circuits and that the resulting test data volume is kept very low. As input, COMPAS uses the original non-compacted ATPG test vectors with don't-care bits. The patterns are overlapped, and the resulting test sequence can be decompressed by the scan chain. The decompressed patterns are fault simulated to determine whether they cover any additional faults. This mechanism reduces the number of test patterns that have to be used for testing, since the interleaving patterns that appear in the scan chain between the original patterns cover the random-testable faults. These faults are usually tested by a random pattern sequence in mixed-mode testing algorithms; the proposed method avoids this random testing phase. We have solved the problem of long CPU time for computing the compressed test sequence by using the test bit usability evaluation multiple times during the search for the test sequence and by skipping pattern recalculation when groups of don't-care bits are present in the patterns. This was enabled by using a concatenated list of pattern pointers. The problem of extreme memory consumption has been solved by using two new data encodings during the compression. The new test data encoding effectively reduces the memory footprint of the COMPAS program to less than 25% of the original. The algorithm is also capable of compressing data generated by concurrently running ATPG processes.
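The idea of overlapping patterns that contain don't-care bits can be illustrated with a deliberately simplified Python sketch. It is a plain greedy procedure written for this illustration, not the COMPAS algorithm: it performs no fault simulation, keeps the given pattern order, and all function names are assumptions. Patterns are strings over '0', '1' and 'X' (don't care).

```python
def compatible(a, b):
    """Bit strings agree wherever neither of them has a don't-care 'X'."""
    return all(x == y or x == "X" or y == "X" for x, y in zip(a, b))


def overlap_patterns(patterns):
    """Greedily overlap equal-length patterns into one scan-in sequence."""
    n = len(patterns[0])
    seq = list(patterns[0])
    for p in patterns[1:]:
        # Try the smallest number of shift clocks first.
        for shift in range(1, n + 1):
            start = len(seq) - (n - shift)
            tail = seq[start:]
            if compatible(tail, p[: n - shift]):
                # Specify don't-care bits of the tail, then append the rest.
                seq[start:] = [y if x == "X" else x for x, y in zip(tail, p)]
                seq.extend(p[n - shift:])
                break
    return "".join(seq)


def decompress(seq, n):
    """All n-bit windows that pass through an n-bit scan chain."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


patterns = ["1X0X", "X0X1", "0110"]
sequence = overlap_patterns(patterns)
print(sequence, len(sequence), "bits instead of", 4 * len(patterns))
print(decompress(sequence, 4))
```

Each original pattern is compatible with one window of the resulting sequence; the remaining windows correspond to the interleaving patterns mentioned above, and any don't-care bits left in the sequence can be filled arbitrarily.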

The proposed method of compression and compaction of test patterns is very well suited for testing combinational circuits with Boundary Scan because it does not require any additional hardware for test pattern decompression. It can also be used for testing sequential cores with multiple scan chains by means of the RESPIN architecture. To use the compressed test sequence in a system with multiple scan chains, the sequence is reordered so that it is correctly decompressed within the RESPIN architecture. Following the IEEE 1500 standard [23], we do not require extra hardware except for one multiplexer and a feedback wire in every core. The sequence generated by COMPAS allows sequential cores to be tested in less time than is possible with the mixed-mode testing approaches [26].

We have verified that the proposed diagnostic system is applicable to a SoC. We placed the system together with simple functional cores on the AT94K FPSLIC circuit. The diagnostic system uses the dynamic partial reconfiguration feature of the embedded FPGA. This is advantageous because it saves FPGA resources that would otherwise be devoted to switching the TAM buses. For larger cores the system can be built on large Xilinx FPGA circuits with an embedded processor and a RAM block, where dynamic reconfiguration of the FPGA part can again save FPGA resources. We can conclude that the diagnostic system is well suited for a SoC architecture with a processor, a RAM block, an embedded FPGA and an ASIC. The memory requirements for storing the test data are lower than those of other comparable methods, and the test time is very low, too.

#### REFERENCES

- [1] Jas, A., Ghosh-Dastidar, J., Touba, N. A.: Scan Vector Compression/Decompression Using Statistical Coding. Proc. VTS 1999
- [2] Bayraktaroglu, I., and Orailoglu, A.: Decompression Hardware Determination for Test Volume and Time Reduction through Unified Test Pattern Compaction and Compression. Proc. of VTS 2003
- [3] Bernhart et al.: OPMISR: the foundation for compressed ATPG vectors. Proc. ITC, 2001, pp. 748-757
- [4] Brglez, F., Bryan, D., Kozminski, K.: Combinational Profiles of Sequential Benchmark Circuits. Proc. of. Int. Symp. on Circuits and Systems, 1989, pp. 1929-1934
- [5] Chandra, A., Chakrabarty, K.: Frequency-Directed Run-Length (FDR) Codes with Application to System-on-a-Chip Test Data Compression. Proc. VTS 2001, pp. 42-47
- [6] Daehn, W., Mucha, J.: Hardware Test Pattern Generation for Built-in Testing. Proc. of ITC, 1981, pp. 110-113
- [7] Dorsch, R., Wunderlich, H.-J.: Reusing Scan Chains for Test Pattern Decompression. Proc. IEEE ETW, 2001, pp. 24-32
- [8] Hellebrand, S., Liang, H.G. Wunderlich, H.J.: A mixed mode BIST scheme based on reseeding of folding counters. Proc. of ITC, 2000
- [9] http://direct.xilinx.com/bvdocs/userguides/ug070.pdf. [cit 9.5.2006]
- [10] http://iko.kes.vslib.cz, [cit 10.27.2006]
- [11] http://www.atmel.com/dyn/resources/prod\_documents/2818s.pdf [cit 20.5.2006]
- [12] http://www.cerc.utexas.edu/itc99-benchmarks/bench.html
- [13] http://www.cerc.utexas.edu/itc99-benchmarks/bench.html
- [14] IEEE Computer Society. IEEE Standard Testability Method for Embedded Core-based Integrated Circuits - IEEE Std 1500-2005. IEEE, New York, 2005.

- [15] Irion, A.; Kiefer, G.; Vranken, H.; Wunderlich, H.-J. : Circuit Partitioning for Efficient Logic BIST Synthesis, Proc. DATE, 2001, pp.88-93
- [16] Jenicek, J., Novak, O.: Test Pattern Compression Based on Pattern Overlapping. In *Design and Diagnostics of Electronic Circuits and Systems*. Los Alamitos: IEEE Computer Society, 2007, pp. 29-34
- [17] Koenemann, B.: LFSR coded test patterns for scan designs. Proc. Europ. Test Conf., Munich, Germany, 1991
- [18] Krishna, C.V., Touba, N.A.: Reducing Test Data Volume Using LFSR Reseeding with Seed Compression. Proc. of ITC 2002, pp. 321-330
- [19] Lee H. K., and Ha, D. S.: HOPE: An efficient parallel fault simulator. Proc of the IEEE Design Automation Conference, pp. 336-340, June 1992
- [20] Lee H. K., and Ha, D. S.: On the generation of test patterns for combinational circuits. Technical Report 12\_93, Department of Electrical Eng., Virginia Polytechnic Institute and State University
- [21] Li, L and Chakrabarty K.: Test Data Compression Using Dictionaries with Fixed-Length Indices. Proc. VTS, 2003
- [22] Li, L and Chakrabarty K.: Test Set Embedding for Deterministic BIST Using a Reconfigurable Interconnection Network. IEEE Trans. on Comp. Aided Design of IC, Vol. 23, No. 9, Sept 2004, pp.1289-1305
- [23] Marinissen, E. J., Zorian, Y., Kapur, R., Taylor, T., Whetsel, L.: Towards a Standard for Embedded Core Test: An Example. Proc. of ITC, pp. 616–627. IEEE, 1999.
- [24] Marinissen, E.J., Arendsen, R., Bos, G.: A Structured and Scalable Mechanism for Test Access to Embedded Reusable Cores. Proceedings IEEE, ITC, 1998
- [25] Novák, O., Nosek, J.: Test Pattern Decompression Using a Scan Chain, Proc. of IEEE International Symposium on Defects and Fault Tolerance in VLSI Systems 2001, pp. 110 – 115.
- [26] Novak, O., Pliva, Z., Jenicek, J., Mader, Z., Jarkovsky, M.: Self Testing SoC with Reduced Memory Requirements and Minimized Hardware Overhead. Defect and Fault Tolerance in VLSI Systems, 2006. Proc. of DFT'06. pp. 300 – 308

- [27] Pandey, A. R. Patel, H. J.: Reconfiguration Technique for Reducing Test Time and Test Data Volume in Illinois Scan Architecture Based Designs. Proc. IEEE VLSI Test Symp, 2002, pp. 9-15
- [28] Pandey, A. R. Patel, H. J.: Reconfiguration Technique for Reducing Test Time and Test Data Volume in Illinois Scan Architecture Based Designs. Proc. IEEE VLSI Test Symp, 2002, pp. 9-15
- [29] Rajski, J. et al.: Embedded Deterministic Test . IEEE Trans. on CAD, vol. 23, No. 5, May 2004, pp. 776-792
- [30] Rao, W., Orailoglu, A.: Virtual Compression through Test Vector Stitching for Scan Based Designs. DATE 2003
- [31] Scandaliaris, J., Moreno, J.M., Cabestany, J., Buttel, P., Rachet, A., Kadlec, J., Hermanek, A., de Saint Romain, D., Habay, G., Donati, A.: A General Design Flow for Dynamically Reconfigurable FPGAs (D\_FPGAs). http://www.reconf.org/Files/Publications/RAW03\_UPC.pdf [cit 22. 5. 2006]
- [32] Schafer, L., Dorsch, R., Wunderlich, H.-J.: RESPIN++ - Deterministic Embedded Test. Proc. European Test Workshop, 2002, pp. 37-42
- [33] Su, C., and Hwang, K.: A Serial Scan Test Vector Compression Methodology. Proc. ITC 1993, pp. 981-988
- [34] Wolf, F. G. and Papachristou C.: Multiscan-based Test Compression and Hardware Decompression Using LZ77. Proc. of ITC 2002, pp. 331-339
- [35] Wolf, J.M.; Kaufman, L.M.; Klenke, R.H; Pylor J.H.; Waxman, R.: An Analysis of Fault Partitioned Parallel Test Generation, IEEE Trans. On Computer-Aided Design of ICs and Systems, Vol. 15, 1996
- [36] Wu, D.M.; Lin, M.; Reddy, M.; Jaber, T.; Sabbavarapu, A.; Thatcher, L.: An Optimized DFT and Test Pattern Generation Strategy for an Intel High Performance Microprocessor, Proc. Int. Test. Conf., 2004, pp. 38-47

# COMPARATIVE STUDY OF THE DESCRIPTIVE EXPERIMENT DESIGN AND ROBUST FUSED BAYESIAN REGULARIZATION TECHNIQUES FOR HIGH-RESOLUTION RADAR IMAGING

Ivan E. Villalon-Turrubiates, Member, IEEE, and Yuriy V. Shkvarko, Senior Member, IEEE

Abstract-In this paper, we perform a comparative study of two recently proposed high-resolution radar imaging paradigms: the descriptive experiment design regularization (DEDR) and the fused Bayesian regularization (FBR) methods. The first one, the DEDR, employs aggregation of the descriptive regularization and worst-case statistical performance (WCSP) optimization approaches to enhanced radar/SAR imaging. The second one, the FBR, performs image reconstruction as a solution of the ill-conditioned inverse spatial spectrum pattern (SSP) estimation problem with model uncertainties via unifying the Bayesian minimum risk (MR) estimation strategy with the maximum entropy (ME) randomized a priori image model and other projection-type regularization constraints imposed on the solution. Although the DEDR and the FBR are inferred from different descriptive and statistical constrained optimization paradigms, we examine how these two methods lead to structurally similar techniques that may be further transformed into new, computationally more efficient robust adaptive imaging methods that enable one to derive efficient and consistent estimates of the SSP via unifying both the robust DEDR and FBR considerations. We present the results of an extended comparative simulation study of the family of image formation/enhancement algorithms that employ the proposed robustified FBR and DEDR methods for high-resolution reconstruction of the SSP virtually in real time. The computational complexity of the different methods is analyzed and reported together with the scene imaging protocols. The advantages of the well-designed SAR imaging experiments (that employ the FBR-based and DEDR-related robust estimators) over the cases of poorly designed experiments (that employ the conventional matched spatial filtering as well as the least squares techniques) are verified through the simulation study.

*Index Terms*—Bayesian estimation, maximum entropy, radar imaging, regularization, remote sensing, spatial spectrum pattern, sufficient statistics.

#### I. INTRODUCTION

THE goal of this study is to address and discuss a new computationally efficient approach to high-resolution radar/SAR imaging as an ill-conditioned inverse problem of estimating the spatial spectrum pattern (SSP) of the wavefield sources scattered from the probing surface (referred to as the radar/SAR image). The SSP estimation problem is a statistical ill-conditioned nonlinear inverse problem [6], [7]. Because of the stochastic nature and nonlinearity, no unique regular method exists for reconstructing the SSP from the finite-dimensional measurement data in an analytic closed form. Hence, the particular solution strategy to be developed and applied must unify the practical data observation method with some form of statistical regularization that incorporates the a priori model knowledge about the SSP to alleviate the problem ill-posedness. Classical imaging with radar or SAR applies the method called "matched spatial filtering" (MSF), which originates from the celebrated maximum likelihood (ML) estimation strategy [14]. In statistical terms [2], [6], [14], such a method implies application of the adjoint signal formation operator (SFO) to the recorded data, computation of the squared norm of the filter outputs and their averaging over the actually recorded samples (the so-called snapshots [10]) of the independent data observations. As analyzed in many works, e.g. [1] - [27], the MSF method does not exploit all the "degrees of freedom" of the inverse problem at hand and thus manifests low spatial resolution performance. The recent non-parametric approaches to high-resolution enhanced radar/SAR imaging treat the problem at hand as an ill-posed (ill-conditioned) nonlinear inverse problem with model uncertainties [6] - [8], [15], [16]. The principal idea is to employ different regularization paradigms, e.g. [6] - [8], to resolve the SSP estimation inverse problem with minimum risk (i.e. maximum spatial resolution balanced with noise suppression) subject to some non-trivial maximum entropy (ME) and other projection-type constraints imposed on the solution (i.e. incorporate the a priori model information with minimum subjective decision making).

Manuscript received March 4, 2008

I. E. Villalon-Turrubiates is with the Department of Computer Sciences at the University of Guadalajara Campus Valles, Ameca 46600 Jal. MEXICO (phone: +52-375-758-0148; e-mail: villalon@ieee.org).

Y. V. Shkvarko is with the CINVESTAV Unidad Guadalajara, Guadalajara 45015 Jal. MEXICO (e-mail: shkvarko@gdl.cinvestav.mx).

In this study, we provide an overview of the recently developed descriptive experiment design regularization (DEDR) and the fused Bayesian regularization (FBR) non-parametric paradigms for super-high-resolution radar/SAR image formation and enhancement/reconstruction. The first one, the DEDR, developed in [26], [27], employs aggregation of the descriptive regularization and worst-case statistical performance (WCSP) optimization approaches to enhanced radar/SAR imaging. The second one, the FBR, developed in [7], [8], [25], performs image reconstruction as a solution of the ill-conditioned inverse SSP estimation problem with model uncertainties via unifying the Bayesian minimum risk (MR) estimation strategy with the ME randomized a priori image model that incorporates the projection-type regularization constraints imposed on the solution. Although the DEDR and the FBR are inferred from different descriptive and statistical constrained optimization paradigms, we examine how these two methods lead to structurally similar techniques that may be further transformed into new, computationally more efficient robust adaptive imaging methods that enable one to derive efficient and consistent estimates of the SSP via unifying both the robust DEDR and FBR considerations. The principal innovative contribution of this study may be briefly summarized as follows:

• Unification of the family of the DEDR-related and FBR-related enhanced remote sensing (RS) imaging techniques via comparative analysis of their operational computational structures.

• Development of the robustified versions of the DEDR and FBR methods via alleviating the ill-posedness of the nonlinear adaptive operator inversions in the overall image reconstruction procedures.

• Design of efficient computational algorithms that perform robust adaptive spatial processing for enhanced RS image formation virtually in real time.

We also present the results of extended comparative simulation studies of the family of the robustified DEDR-related and FBR-related SSP estimation algorithms, using MATLAB as a simulation tool that provides efficiency and flexibility in performing all simulation experiments.
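The matched spatial filtering recipe recalled in this Introduction (apply the adjoint SFO to each recorded snapshot, take the squared modulus of the outputs, average over the snapshots) can be sketched in a few lines of Python. The sketch assumes a finite-dimensional setting in which the SFO is an M x K complex matrix stored as nested lists; the matrix, the snapshots and the function names are hypothetical and serve only to show the order of operations.

```python
def adjoint_apply(S, u):
    """Apply the adjoint (conjugate transpose) of the M x K matrix S to u."""
    M, K = len(S), len(S[0])
    return [sum(S[m][k].conjugate() * u[m] for m in range(M)) for k in range(K)]


def msf_image(S, snapshots):
    """Average the squared moduli of the matched-filter outputs."""
    K = len(S[0])
    image = [0.0] * K
    for u in snapshots:
        for k, v in enumerate(adjoint_apply(S, u)):
            image[k] += (v.real ** 2 + v.imag ** 2) / len(snapshots)
    return image


# Hypothetical 2 x 2 SFO matrix and two recorded snapshots.
S = [[1 + 0j, 0j], [0j, 1j]]
snapshots = [[1 + 1j, 2 + 0j], [1 - 1j, 0j]]
print(msf_image(S, snapshots))
```

No regularization or noise model enters this estimate, which is one way to see why the MSF does not exploit all the degrees of freedom of the inverse problem.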

#### II. PROBLEM MODEL AND EXPERIMENT DESIGN CONSIDERATIONS

Consider a remote sensing (RS) experiment performed with a coherent array imaging radar or SAR (radar/SAR), traditionally referred to as the radar imaging (RI) problem ([6] – [9]). The measurement sensor/SAR data wavefield $u(\mathbf{y}) = s(\mathbf{y}) + n(\mathbf{y})$, modeled as a superposition of the echo signals *s* and additive noise *n*, is assumed to be available for observations and recordings within the prescribed time-space observation domain $Y \ni \mathbf{y}$, where $\mathbf{y} = (t, \mathbf{p})^T$ defines the time-space points in the observation domain $Y = T \times P$.

### A. RS motivated problem model

The model of the observation wavefield u is specified by the linear stochastic equation of observation (EO) of operator form:

$$u = Se + n; \ e \in E; \quad u, n \in U; \ S: E \to U \tag{1}$$

on the Hilbert signal spaces E and U with the metric structures induced by the inner products,

$$[e_1, e_2]_{\mathrm{E}} = \int_{X} e_1(\mathbf{x}) e_2^*(\mathbf{x}) d\mathbf{x} \quad \text{and} \quad [u_1, u_2]_{\mathrm{U}} = \int_{Y} u_1(\mathbf{y}) u_2^*(\mathbf{y}) d\mathbf{y}, \tag{2}$$

respectively, where the asterisk stands for the complex conjugate. In (1), *S* is referred to as the regular signal formation operator (SFO). It defines the transform of the random scattered signals $e(\mathbf{x}) \in \mathrm{E}(X)$ distributed over the remotely sensed scene (probing surface) $X \ni \mathbf{x}$ into the echo signals $(Se(\mathbf{x}))(\mathbf{y}) \in \mathrm{U}(Y)$ over the time-space observation domain $Y = T \times P$; $t \in T$, $\mathbf{p} \in P$. In functional terms [6], [9], such a transform is referred to as the operator $S: \mathrm{E} \rightarrow \mathrm{U}$ that maps the scene signal space E (the space of the signals scattered from the remotely sensed scene) onto the observation data signal space U. The operator model (1) may be rewritten in the conventional integral form [6] as

$$u(\mathbf{y}) = \int_{X} S(\mathbf{y}, \mathbf{x}) e(\mathbf{x}) d\mathbf{x} + n(\mathbf{y}), \tag{3}$$

$$e(\mathbf{x}) = e(f; \boldsymbol{\rho}, \boldsymbol{\theta}) = \int_{T} e(t; \boldsymbol{\rho}, \boldsymbol{\theta}) \exp(-i2\pi f t) dt, \tag{4}$$

where the functional kernel $S(\mathbf{y}, \mathbf{x})$ of the SFO *S* given by (1) defines the signal wavefield formation model [9], [11]. Following the multi-scale array/SAR radar RS problem phenomenology [6], [9], we adopt here an incoherent model of the backscattered field $e(\mathbf{x})$ in the frequency-space observation domain $X = F \times R = F \times P \times \Theta$, i.e. over the slant range $\boldsymbol{\rho} \in P$ and azimuth angle $\boldsymbol{\theta} \in \Theta$ domains, respectively. When tackling RS spatial analysis problems, radar engineers typically work in the frequency-space domain, $\mathbf{x} = (f; \boldsymbol{\rho}, \boldsymbol{\theta})^T \in X = F \times P \times \Theta$ [6], [7], [9]. However, because of the one-to-one mapping, only the spatial cross range coordinates $\mathbf{r} = (\boldsymbol{\rho}, \boldsymbol{\theta})$ may be associated with $\mathbf{x} = \mathbf{r}$ as well [9], [11]. Such an interpretation is valid if one assumes the narrowband system model [9], [11], [12] and the incoherent nature of the backscattered field $e(\mathbf{x})$.

It is naturally inherent to the RS imaging experiments [7], [8], [11] to consider the phasor  $e(f,\mathbf{r})$  in (3) to be an independent random variable at each frequency f, and spatial coordinates  $\mathbf{r}$ ,  $\boldsymbol{\theta}$  with the zero mean value and  $\delta$ -form correlation function,  $R_e(f, f'; \mathbf{r}, \mathbf{r}') = \langle e(f; \mathbf{r})e^*(f', \mathbf{r}') \rangle = B(f, \mathbf{r})\delta(f - f')\delta(\mathbf{r} - \mathbf{r}')$  that enables one to introduce the following definition of the spatial spectrum pattern (SSP) of the wavefield sources distributed in the RS observation environment [9], [27]

$$B(\mathbf{r}) = Aver^{(2)}\{e(\mathbf{r})\} = \int_{F} \left\langle |e(f,\mathbf{r})|^{2} \right\rangle |H(f)|^{2} df; \quad \mathbf{r} \in R. \tag{5}$$

Here,  $\langle \cdot \rangle$  represents the ensemble averaging operator, while  $Aver^{(2)}$  is referred to as the second order (i.e. nonlinear) statistical averaging operator defined by (5). Also in (5), H(f) represents the given transfer function of the radar receive channels that we assume to be identical for all antenna array elements and impose the conventional normalization,  $|H(f)|^2 = 1$  for all frequencies  $f \in F$  in the radar receiver frequency integrating band F [9]. In the conventional radar imaging setting [9], [18], [21], the initial RS imaging problem is to form an estimate  $\hat{B}(\mathbf{x})$  of the SSP distribution  $B(\mathbf{r})$  over the remotely sensed scene  $R \ni \mathbf{r}$  by processing whatever values of measurements of the data field,  $u(\mathbf{y})$ ;  $\mathbf{y} \in Y$ , are available.
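The second-order averaging in (5) can be illustrated numerically. The Python sketch below assumes a single frequency bin with $|H(f)|^2 = 1$, a scene discretized into a few cells, and zero-mean complex Gaussian phasors; the Gaussian law, the cell values and the function names are assumptions made for the illustration, since the text only requires zero mean and $\delta$-form correlation.

```python
import random


def simulate_phasor(power, rng):
    """Zero-mean complex Gaussian sample with mean squared modulus `power`."""
    sigma = (power / 2) ** 0.5
    return complex(rng.gauss(0, sigma), rng.gauss(0, sigma))


def estimate_ssp(true_ssp, realizations, seed=0):
    """Average |e|^2 over independent realizations, cell by cell."""
    rng = random.Random(seed)
    sums = [0.0] * len(true_ssp)
    for _ in range(realizations):
        for k, b in enumerate(true_ssp):
            sums[k] += abs(simulate_phasor(b, rng)) ** 2
    return [s / realizations for s in sums]


true_ssp = [1.0, 4.0, 0.25]  # hypothetical SSP values B(r) in three cells
print(estimate_ssp(true_ssp, realizations=20000))
```

With many realizations the sample averages approach the assumed SSP values, which is the sense in which $B(\mathbf{r})$ is a second-order statistic of the random field $e$ rather than a function of any single realization.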

Next, following the RS data analysis methodology [1], [2], [20], [22] any particular physical signature of interest  $\hat{\Lambda}(\mathbf{x})$  could be extracted from the reconstructed RS image  $\hat{B}(\mathbf{x})$  applying the so-called deterministic *signature extraction operator*  $\Lambda$ . Hence, the particular RS signature (RSS) is mapped applying  $\Lambda$  to the reconstructed image, i.e.

$$\hat{\Lambda}(\mathbf{x}) = \Lambda(\hat{B}(\mathbf{x})). \tag{6}$$

Last, taking into account the RSS extraction model (6), we can now reformulate the RSS reconstruction problem as follows: map the reconstructed RSS of interest $\hat{\Lambda}(\mathbf{x}) = \Lambda(\hat{B}(\mathbf{x}))$ over the observation scene X via post-processing whatever values of the reconstructed scene image $\hat{B}(\mathbf{x})$; $\mathbf{x} \in X$, are available.

#### B. Numerical model of the problem

Viewing the imaging problem as an approximation problem leads one to the projection concept for a transformation of the continuous data field $u(\mathbf{y})$ to the $M \times 1$ vector $\mathbf{U} = (U_1, ..., U_M)^T$ of sampled spatial-temporal data recordings. The *M*-d observations in terms of projections [7], [8] can be expressed as

$$u_{(M)}(\mathbf{y}) = (P_{U(M)}u)(\mathbf{y}) = \sum_{m=1}^{M} U_m \phi_m(\mathbf{y}) \tag{7}$$

with coefficients $\{U_m = [u, h_m]_U\}$, where $P_{U(M)}$ represents a projector onto the *M*-d subspace

$$U_{(M)} = P_{U(M)}U = \text{Span}\{\phi_m(\mathbf{y})\} \tag{8}$$

uniquely defined by a set of the orthogonal functions $\{\phi_m(\mathbf{y}) = \|h_m(\mathbf{y})\|^{-2}h_m(\mathbf{y}); m = 1, ..., M\}$ that are related to $\{h_m(\mathbf{y})\}$ as a dual basis in $U_{(M)}$, i.e. $[h_m, \phi_n]_U = \delta_{mn} \; \forall m, n = 1, ..., M$. In the observation scene $X \ni \mathbf{x}$, the discretization of the scattering field $e(\mathbf{x})$ is traditionally performed over a $Q \times N$ rectangular grid, where Q defines the dimension of the grid over the horizontal (azimuth) coordinate $x_1$ and N defines the grid dimension over the orthogonal coordinate $x_2$ (the number of the range gates projected onto the scene). The discretized complex scattering function is represented by the coefficients [7], [8] $E_k = E_{(q,n)} = [e, g_k]_E = \int_X e(\mathbf{x})g_k(\mathbf{x})d\mathbf{x}$; $k = 1,..., K = Q \times N$, of its decomposition over the grid composed of identical shifted rectangular functions $\{g_k(\mathbf{x}) = g_{(q,n)}(\mathbf{x}) = 1$ if $\mathbf{x} \in \rho_{(q,n)}(\mathbf{x}) = \text{rect}_{(q,n)}(x_1, x_2)$ and $g_k(\mathbf{x}) = 0$ for other $\mathbf{x} \notin \rho_{(q,n)}(\mathbf{x})$, for all $q = 1, ..., Q$; $n = 1, ..., N$; $k = 1, ..., K = Q \times N\}$. Hence, the *K*-d approximation of the scattering field becomes

$$e_{(K)}(\mathbf{x}) = (P_{E(K)}e)(\mathbf{x}) = \sum_{k=1}^{K} E_k g_k(\mathbf{x}) \tag{9}$$

where  $P_{E(K)}$  represents a projector onto such K-d signal approximation subspace

$$\mathbf{E}_{(K)} = P_{\mathbf{E}(K)}\mathbf{E} = \operatorname{Span}\{g_k(\mathbf{x})\} \tag{10}$$

spanned by *K* orthogonal grid functions (pixels)  $\{g_k(\mathbf{x})\}$ . Using such approximations, it is possible to proceed from the operator form (4) to its conventional numerical (vector) form

$$\mathbf{U} = \mathbf{S} \, \mathbf{E} + \mathbf{N} \,, \tag{11}$$

where **U**, **N** and **E** define the vectors composed of the coefficients $U_m$, $N_m$ and $E_k$ of the finite-dimensional approximations of the fields u, n and e, respectively, and **S** is the matrix-form representation of the SFO with elements $\{S_{mk} = [Sg_k, h_m]_U = \int_{Y} (Sg_k(\mathbf{x}))(\mathbf{y})h_m^*(\mathbf{y})d\mathbf{y}; k = 1, ..., K; m = 1, ..., M\}$ [6]. Zero-mean Gaussian vectors **E**, **N** and **U** in (11) are characterized by the correlation matrices, $\mathbf{R}_E$, $\mathbf{R}_N$ and $\mathbf{R}_U = \mathbf{S}\mathbf{R}_E\mathbf{S}^+ + \mathbf{R}_N$, respectively, where superscript + defines the Hermitian conjugate when it stands with a matrix or a vector. Because of the incoherent nature of the scattering field $e(\mathbf{x})$, the vector **E** has a diagonal correlation matrix, $\mathbf{R}_E = \text{diag}(\mathbf{B}) = \mathbf{D}(\mathbf{B})$, in which the *K*×1 vector of the principal diagonal **B** is composed of elements $B_k = \langle E_k E_k^* \rangle$; k = 1, ..., K. This vector **B** is referred to as a vector-form representation of the SSP. Hence, using the definition (6), the *K*-d approximation of the desired RS signature estimate $\hat{\Lambda}_{(K)}(\mathbf{x})$ as a continuous function of $\mathbf{x} \in X$ over the probing scene *X* is now expressed as follows

$$\hat{\Lambda}_{(K)}(\mathbf{x}) = \operatorname{est}\{\Lambda \langle |e_{(K)}(\mathbf{x})|^2 \rangle\} = \sum_{k=1}^{K} \Lambda(\hat{B}_k) g_k(\mathbf{x}); \quad \mathbf{x} \in X. \tag{12}$$

Analyzing (12), one may deduce that in every particular measurement scenario (specified by the corresponding approximation spaces  $U_{(M)}$  and  $E_{(K)}$ ) one has to derive the estimate  $\hat{\mathbf{B}}$  of a vector-form approximation of the SSP that uniquely defines via (12) the approximated continuous pixel-format reconstructed map  $\hat{\Lambda}_{(K)}(\mathbf{x})$  of the desired RS signature distributed over the observed scene  $X \ni \mathbf{x}$ . Hence, the vector

$$\hat{\mathbf{\Lambda}} = \operatorname{vec}\left\{\Lambda(\hat{B}_k); k = 1, \dots, K\right\} \tag{13}$$

represents the numerical (i.e., vector-form) model of the reconstructed RS signature (RSS) in the conventional pixel format. Thus, the desired continuous-form RSS is uniquely
reconstructed from the estimate  $\hat{\mathbf{B}}$  of the SSP vector (pixel-formatted image) via (12).
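As a minimal sketch of the pixel-format model (13), the following Python fragment applies a signature extraction operator element-wise to an SSP estimate. The thresholding operator `threshold_signature`, its `level` parameter and the sample values are hypothetical illustrations that do not come from the text; any deterministic operator $\Lambda$ could be substituted.

```python
from typing import Callable, List


def extract_signature(b_hat: List[float],
                      signature_op: Callable[[float], float]) -> List[float]:
    """Vector-form RSS of (13): apply the operator to every pixel of B_hat."""
    return [signature_op(b_k) for b_k in b_hat]


def threshold_signature(b_k: float, level: float = 0.5) -> float:
    """Hypothetical signature operator: 1.0 where the SSP exceeds `level`."""
    return 1.0 if b_k > level else 0.0


# Hypothetical 4-pixel SSP estimate (illustrative values only).
b_hat = [0.1, 0.7, 0.4, 0.9]
signature = extract_signature(b_hat, threshold_signature)
```

Because the operator acts pixel by pixel, the signature map inherits the pixel grid of the SSP estimate, which is what (12) expresses in continuous form.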

### C. Experiment-design considerations

The experiment design (ED) aspects of the problem at hand imply the analysis of how to choose (finely adjust) the basis functions $\{g_k(\mathbf{x})\}$ that span the signal representation subspace $E_{(K)} = P_{E(K)}E = \text{Span}\{g_k\}$ for a given observation subspace $U_{(M)} = \text{Span}\{\phi_m\}$ [6], [8], [12]. Here, we formalize such ED considerations by imposing the metrics structure in the solution space [6], [8] defined by the inner product

$$\|\mathbf{B}\|_{\mathbf{B}(K)}^2 = [\mathbf{B}, \mathbf{M}\mathbf{B}] \tag{14}$$

where $\mathbf{B}_{(K)}$ represents the so-called correctness convex solution set [6], and **M** is referred to as the metrics inducing operator. Hence, the selection of **M** provides additional geometrical ED degrees of freedom of the problem model. In this study, we specify the model for **M** that corresponds to the numerical approximation of the Tikhonov stabilizer of the second order [6], [8]. Next, following [6], we incorporate the projection-type a priori information, in which case the SSP vector **B** satisfies the linear constraint equation

$$\mathbf{GB} = \mathbf{C}, \quad \text{i.e.} \quad \mathbf{G}^{-}\mathbf{GB} = \mathbf{B}_{P} \tag{15}$$

where $\mathbf{B}_P = \mathbf{G}^-\mathbf{C}$ and $\mathbf{G}^-$ is the Moore-Penrose pseudoinverse of a given projection constraint operator $\mathbf{G}: \mathbf{B}_{(K)} \to \mathbf{B}_{(Q)}$, and the constraint vector $\mathbf{C} \in \mathbf{B}_{(Q)}$ and the constraint subspace $\mathbf{B}_{(Q)}$ (Q < K) are assumed to be given [8]. In (15), the constraint operator $\mathbf{G}$ projects the portion of the unknown SSP onto the subspace where the SSP values are fixed by $\mathbf{C}$. In practice, such constraints may also specify the system calibration [15], [22].

### III. HIGH-RESOLUTION NONPARAMETRIC IMAGING

### A. DEDR method

In the descriptive statistical formalism, the desired SSP vector  $\hat{\mathbf{B}}$  is recognized to be the vector of the principal diagonal of an estimate of the correlation matrix  $\mathbf{R}_{\rm E}(\mathbf{B})$ , i.e.  $\hat{\mathbf{B}} = \{\hat{\mathbf{R}}_{\rm E}\}_{\rm diag}$ . Thus, one can seek to estimate  $\hat{\mathbf{B}} = \{\hat{\mathbf{R}}_{\rm E}\}_{\rm diag}$  given the data correlation matrix  $\mathbf{R}_{\rm U}$  pre-estimated via averaging  $J \ge 1$  independent sampled correlations [1], [24]

$$\hat{\mathbf{R}}_{\mathbf{U}} = \mathbf{Y} = \underset{j \in J}{\text{aver}} \{ \mathbf{U}_{(j)} \mathbf{U}^{+}_{(j)} \} = \frac{1}{J} \sum_{j=1}^{J} \mathbf{U}_{(j)} \mathbf{U}^{+}_{(j)} , \qquad (16)$$

and determining the solution operator (SO) F such that

$$\hat{\mathbf{B}} = \{ \hat{\mathbf{R}}_{\mathbf{E}} \}_{\text{diag}} = \{ \mathbf{F} \mathbf{Y} \mathbf{F}^{+} \}_{\text{diag}} . \tag{17}$$
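The estimate (17) with the sample correlation matrix (16) can be sketched in a few lines of Python. The sketch relies on the identity that the *k*-th diagonal element of $\mathbf{F}\mathbf{Y}\mathbf{F}^{+}$ equals the average over *j* of $|(\mathbf{F}\mathbf{U}_{(j)})_k|^2$, so $\mathbf{Y}$ never has to be formed explicitly. The function names and the toy operator and recordings are assumptions made for illustration only.

```python
from typing import List

Vector = List[complex]
Matrix = List[List[complex]]


def apply_operator(f: Matrix, u: Vector) -> Vector:
    """Matrix-vector product F U for a K x M operator and an M x 1 vector."""
    return [sum(f_km * u_m for f_km, u_m in zip(row, u)) for row in f]


def ssp_estimate(f: Matrix, recordings: List[Vector]) -> List[float]:
    """diag(F Y F^+) with Y = (1/J) sum_j U_j U_j^+, computed without forming Y.

    The k-th diagonal element equals the average over j of |(F U_j)_k|^2.
    """
    j_total = len(recordings)
    b_hat = [0.0] * len(f)
    for u in recordings:
        for k, value in enumerate(apply_operator(f, u)):
            b_hat[k] += abs(value) ** 2 / j_total
    return b_hat


# Hypothetical toy data: identity operator, J = 2 recordings of length M = 2.
f_identity = [[1 + 0j, 0j], [0j, 1 + 0j]]
recordings = [[1 + 1j, 2 + 0j], [1 - 1j, 0j]]
b_hat = ssp_estimate(f_identity, recordings)
```

With J = 1 the same code returns the squared modulus of the filtered single recording, which corresponds to the single-realization case.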

To optimize the search for the desired SO **F** we reformulate here the *DEDR* strategy [26], [27]

$$\mathbf{F} = \arg\min_{\mathbf{F}} \{ \mathcal{R}(\mathbf{F}) \} \text{ subject to } \langle \| \mathbf{\Delta} \|^2 \rangle_{p(\mathbf{\Delta})} \leq \delta \tag{18}$$

where the conditioning term represents the worst-case statistical performance (WCSP) regularization constraint imposed on the unknown disturbed component of the uncertain SFO matrix [26], $\tilde{\mathbf{S}} = \mathbf{S} + \mathbf{\Delta}$, in which $\mathbf{S}$ represents the regular SFO and $\mathbf{\Delta}$ represents the random SFO perturbation term; the DEDR "augmented risk" functional is defined as

$$\mathcal{R}(\mathbf{F}) = \operatorname{tr}\{\langle (\mathbf{F}\,\tilde{\mathbf{S}}\,-\mathbf{I})\mathbf{A}(\mathbf{F}\,\tilde{\mathbf{S}}\,-\mathbf{I})^{+}\rangle_{p(\boldsymbol{\Delta}\,)}\} + \alpha \operatorname{tr}\{\mathbf{F}\mathbf{R}_{\mathbf{N}}\mathbf{F}^{+}\}. \tag{19}$$

The DEDR strategy (18) implies the minimization of the weighted sum of the systematic and fluctuation errors (19) in the desired estimate (17), in which the unknown disturbances of the SFO $\mathbf{\Delta}$ are treated through the WCSP bounding constraint (18) imposed on the averaged squared norm of $\mathbf{\Delta}$. The selection (adjustment) of the regularization parameter $\alpha$ and the weight matrix $\mathbf{A}$ provides additional DEDR "degrees of freedom" that incorporate any descriptive properties of a solution if those are known a priori [26], [27]. We also incorporate two additional requirements into the DEDR strategy: (i) the SO must involve the adjoint SFO $\mathbf{S}^+$ (to satisfy the observability condition [26]); (ii) the resulting SO must admit a representation form that does not involve the inversion of $\mathbf{Y}$ (to be applicable to scenarios with a low-rank $\mathbf{Y}$, e.g. SAR imaging). These additional requirements distinguish the pursued DEDR approach from the conventional minimum risk strategies [9], [14], [24] and lead to the following reformulated conditional optimization problem [26], [27]

$$\mathbf{F} = \arg \min_{\mathbf{F}} \max_{\langle \|\mathbf{\Delta}\|^2 \rangle_{p(\mathbf{\Delta})} \le \delta} \{ \mathcal{R}(\mathbf{F}) \} . \tag{20}$$

To proceed with the derivation of the SO (20), in [26], [27] the following was performed: (i) decomposition of risk (19); (ii) evaluation of the maximum value  $\beta$  of the bounding constraint in (20) applying the Cauchy-Schwarz inequality. Doing this, we translate the original min-max problem (20) into the equivalent (under the specified constraints) aggregated optimization problem

$$\mathbf{F} = \arg\min_{\mathbf{F}} \{ \mathcal{R}_{DEDR}(\mathbf{F}) \} \tag{21}$$

with the aggregated DEDR risk functional,

$$\mathcal{R}_{DEDR}(\mathbf{F}) = \operatorname{tr}\{(\mathbf{FS} - \mathbf{I})\mathbf{A}(\mathbf{FS} - \mathbf{I})^{+}\} + \alpha \operatorname{tr}\{\mathbf{FR}_{\Sigma}\mathbf{F}^{+}\} \tag{22}$$

where

$$\mathbf{R}_{\Sigma} = \mathbf{R}_{\Sigma}(\beta) = (\mathbf{R}_{N} + \beta \mathbf{I}); \ \beta = \delta / \alpha \ge 0. \tag{23}$$

The solution of the minimization problem (21) was derived and detailed in [26], [27]; the resulting SO has the following representation

$$\mathbf{F}_{DEDR} = \mathbf{K}_{\mathbf{A},\alpha,\beta} \mathbf{S}^+ \mathbf{R}_{\Sigma}^{-1} \tag{24}$$

i.e., is a composition of the whitening filter, $\mathbf{R}_{\Sigma}^{-1}$, the matched spatial filter, $\mathbf{S}^{+}$, and the regularized reconstruction operator

$$\mathbf{K}_{\mathbf{A},\alpha,\beta} = (\mathbf{S}^{+} \mathbf{R}_{\Sigma}^{-1} \mathbf{S} + \alpha \mathbf{A}^{-1})^{-1}. \tag{25}$$

Note that the derived SO (24) involves  $S^+$  (i.e. satisfies the DED-observability constraints) and does not involve the inversion of **Y** (i.e. is applicable to reconstructive SAR imaging problems with only one recorded realization of the trajectory data signal available for further processing, J = 1).
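As a worked special case (an illustrative assumption, not a setting prescribed above), take white observation noise, $\mathbf{R}_N = N_0\mathbf{I}$, and the identity weight matrix, $\mathbf{A} = \mathbf{I}$. Then (23) gives $\mathbf{R}_{\Sigma} = (N_0 + \beta)\mathbf{I}$, and substituting into (24), (25) yields

$$\mathbf{F}_{DEDR} = \left(\frac{1}{N_0+\beta}\,\mathbf{S}^{+}\mathbf{S} + \alpha\mathbf{I}\right)^{-1}\frac{1}{N_0+\beta}\,\mathbf{S}^{+} = \left(\mathbf{S}^{+}\mathbf{S} + \alpha (N_0+\beta)\,\mathbf{I}\right)^{-1}\mathbf{S}^{+}.$$

In this case the SO is a regularized inverse of $\mathbf{S}$ in which both the noise intensity $N_0$ and the uncertainty factor $\beta$ enlarge the regularization term. Only a $K \times K$ matrix built from $\mathbf{S}$ is inverted, and $\mathbf{Y}$ does not appear.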

### B. FBR method

The robustified numerical version of the fused Bayesian-regularization (FBR) method for reconstruction of the power spatial spectrum pattern (SSP) of the wave field scattered from a remotely sensed scene (referred to as the desired RS image), given a finite set of array radar/SAR signal recordings, was originally developed in [7]. Since SSP estimation is in essence a nonlinear numerical inverse problem, the proposition in [7], [8] was to alleviate the ill-posedness of the problem by robustification of the Bayesian estimation strategy [14], [24], performing non-adaptive approximations of the reconstructive operators that incorporate the nontrivial metrics considerations for designing the proper solution space and different regularization constraints imposed on a solution.

The estimator that produces the high-resolution optimal (in the sense of the Bayesian minimum risk strategy) estimate $\hat{\mathbf{B}}$ of the SSP vector by processing the *M*-d data recordings **U** with the FBR estimation strategy, which also incorporates nontrivial a priori geometrical and projection-type model information, was developed in [7] and [8]. This optimal FBR estimate of the SSP is given by the nonlinear equation

$$\hat{\mathbf{B}} = \mathbf{B}_P + \mathbf{P}\mathbf{B}_0 + \mathbf{W}(\hat{\mathbf{B}})\{\mathbf{V}(\hat{\mathbf{B}}) - \mathbf{Z}(\hat{\mathbf{B}})\}. \tag{26}$$

In (26), the constraint $\mathbf{B}_P$ is specified by (15) and $\mathbf{B}_0$ represents the a priori SSP distribution, considered as a zero-step approximation to the desired SSP estimate $\hat{\mathbf{B}}$. The sufficient statistics (SS) vector $\mathbf{V}(\hat{\mathbf{B}}) = \{\mathbf{F}(\hat{\mathbf{B}})\mathbf{U}\mathbf{U}^{+}\mathbf{F}^{+}(\hat{\mathbf{B}})\}_{\text{diag}}$ (the vector composed of the principal diagonal of the embraced matrix) is formed by applying to the measured data vector $\mathbf{U}$ the solution-dependent SS formation operator [7]

$$\mathbf{F} = \mathbf{F}(\hat{\mathbf{B}}) = \mathbf{D}(\hat{\mathbf{B}})(\mathbf{I} + \mathbf{S}^{+}\mathbf{R}_{N}^{-1}\mathbf{S}\mathbf{D}(\hat{\mathbf{B}}))^{-1}\mathbf{S}^{+}\mathbf{R}_{N}^{-1}. \tag{27}$$

The SS shift vector in (26) is defined as $\mathbf{Z}(\hat{\mathbf{B}})$ [7], and the composite solution-dependent smoothing-projection window operator

$$\mathbf{W}(\hat{\mathbf{B}}) = \mathbf{P}_{\mathbf{W}} \mathbf{\Omega}(\hat{\mathbf{B}}) \tag{28}$$

is composed of the projector

$$\mathbf{P}_{\mathbf{W}} = (\mathbf{I} - \mathbf{G}^{-}\mathbf{G}) \tag{29}$$

and the solution dependent smoothing window

$$\mathbf{\Omega}(\hat{\mathbf{B}}) = [\operatorname{diag}(\{\mathbf{S}^{+}\mathbf{F}^{+}\mathbf{F}\mathbf{S}\}_{\operatorname{diag}}) + \hat{\alpha} \mathbf{D}^{2}(\hat{\mathbf{B}})\mathbf{M}(\hat{\mathbf{B}})]^{-1}, \quad (30)$$

in which the regularization parameter  $\hat{\alpha}$  is to be adaptively adjusted using the system calibration data [7], [8]. The resulting FBR-optimal estimate in the numerical (discrete pixel) format is given by

$$\hat{\mathbf{B}}_{FBR} = \mathbf{B}_{P} + \mathbf{P}\mathbf{B}_{0} + \mathbf{W}(\hat{\mathbf{B}}) \{ \mathbf{V}(\hat{\mathbf{B}}) - \mathbf{Z}(\hat{\mathbf{B}}) \}. \tag{31}$$

Because of the nonlinearity and complexity of the solution-dependent *K*-d operator inversions that must be performed to compute the SS $\mathbf{V}(\hat{\mathbf{B}})$, the window $\mathbf{W}(\hat{\mathbf{B}})$ and the SS shift $\mathbf{Z}(\hat{\mathbf{B}})$, the computational load of the optimal FBR estimator (26), (31) originally developed in [7], [8] is too high for it to serve as a practically realizable estimator of the SSP and RSS (i.e. a practical high-resolution RS radar imaging and signature mapping technique able to operate in real time).

### *C.* DEDR-related and FBR-related robust spatial filtering (RSF) techniques

The robustification scheme for real-time implementation of the DEDR estimator (17) and the FBR estimator (26), (31) enables one to reduce drastically the computational load of the image formation procedure without substantial degradation in the resolution and overall image performance. Here, we first propose the robustified versions of the DEDR estimator defined by (17) and the FBR estimator given by (26); the latter we refer to as the robust FBR reconstructive filtering (RFBR) method. This method is a direct generalization of the one proposed previously in [7] and [8], which we obtain here by roughly setting $\mathbf{P}_{\mathbf{W}} = \mathbf{I}$ and approximating both the SS formation operator $\mathbf{F}(\hat{\mathbf{B}})$ and the smoothing window $\mathbf{\Omega}(\hat{\mathbf{B}})$ in (26) through the rough approximation $\mathbf{D}(\hat{\mathbf{B}}) \approx \mathbf{D} = b_0 \mathbf{I}$, where $b_0$ represents the expected a priori image grey level [7], [8]. Hence, the robustified SS formation operator

$$\mathbf{F} = \mathbf{A}^{-1}(\rho)\mathbf{S}^{+} \quad \text{with} \quad \mathbf{A}(\rho) = \mathbf{S}^{+}\mathbf{S} + \rho^{-1}\mathbf{I} \tag{32}$$

becomes the regularized inverse of the SFO **S** with regularization parameter  $\rho^{-1}$ , the inverse of the signal-to-noise ratio (SNR)  $\rho = b_0/N_0$  for the adopted white observation noise model,  $\mathbf{R}_{\mathbf{N}} = N_0 \mathbf{I}$  with intensity  $N_0$ . In that case, the robust smoothing window

$$\mathbf{W} = \mathbf{\Omega} = (w_0 \mathbf{I} + \mathbf{M})^{-1} \tag{33}$$

is completely defined by the matrix **M** that induces the metrics structure in the solution space [6] with the scaling factor $w_0 = \text{tr}\{\mathbf{S}^+\mathbf{F}^+\mathbf{FS}\}/K$. Such a robustified **W** can be pre-computed a priori for a family of different admissible $\rho$, as proposed in the previous studies [7], [8]. Here, we employ the practical constraint of high-SNR operational conditions [22], $\rho \gg 1$, in which case one can also neglect the constant bias $\mathbf{Z} = Z_0 \mathbf{I}$ in (26) because it does not affect the pattern of the SSP estimate (it influences only the constant grey level in the resulting solution, but $Z_0 \ll \beta$ for $\rho \gg 1$). Following these practically motivated assumptions, the resulting RFBR estimator for the SSP becomes

$$\hat{\mathbf{B}}_{RFBR} = \mathbf{B}_0 + \mathbf{\Omega} \mathbf{V} , \qquad (34)$$

where $\mathbf{V} = \{\mathbf{FUU}^{+}\mathbf{F}^{+}\}_{\text{diag}}$ now represents the robust SS vector.
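The robustified SS formation operator (32) is a regularized inverse that can be computed once for a given $\rho$. The following pure-Python sketch forms $\mathbf{F} = (\mathbf{S}^{+}\mathbf{S} + \rho^{-1}\mathbf{I})^{-1}\mathbf{S}^{+}$ by solving a linear system instead of inverting explicitly. The Gauss-Jordan solver, the helper names and the identity SFO used as toy input are assumptions of this example; a practical implementation would more likely use a numerical linear algebra library.

```python
from typing import List

Matrix = List[List[complex]]


def hermitian(a: Matrix) -> Matrix:
    """Hermitian conjugate (the superscript + of the text)."""
    return [[a[i][j].conjugate() for i in range(len(a))]
            for j in range(len(a[0]))]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product A B."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) == 0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def rfbr_operator(s: Matrix, rho: float) -> Matrix:
    """F = (S^+ S + rho^{-1} I)^{-1} S^+ as in (32)."""
    s_h = hermitian(s)
    a_rho = matmul(s_h, s)
    for k in range(len(a_rho)):
        a_rho[k][k] += 1.0 / rho
    return solve(a_rho, s_h)


# Toy check with a hypothetical identity SFO and SNR rho = 4:
# F reduces to I / (1 + 1/rho).
s_toy = [[1 + 0j, 0j], [0j, 1 + 0j]]
f_rfbr = rfbr_operator(s_toy, rho=4.0)
```

Since (32) does not depend on the data or on the current estimate, the operator can be pre-computed for each admissible $\rho$, in the same way as the window (33).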

### D. Matched spatial filtering (MSF) algorithm

The simplest rough SSP and RSS estimators can be constructed as a further simplification of (34) if the trivial a priori model information ($\mathbf{P}_{\mathbf{W}} = \mathbf{I}$ and $\mathbf{B}_0 = b_0 \mathbf{I}$) is adopted and the SS formation operator **F** is roughly approximated by the adjoint SFO, i.e. the matched filter

$$\mathbf{F} \approx \gamma_0 \mathbf{S}^+ \tag{35}$$

where the normalizing constant  $\gamma_0$  provides balance of the operator norms  $\gamma_0^2 = \text{tr}^{-1} \{ \mathbf{S}^+ \mathbf{S} \mathbf{S}^+ \mathbf{S} \} \text{tr} \{ \mathbf{F} \mathbf{S} \mathbf{S}^+ \mathbf{F}^+ \}$  [6]. In that case, the estimator (34) is simplified to its rough matched spatial filter (MSF) version

$$\hat{\mathbf{B}}_{MSF} = \mathbf{\Omega} \mathbf{\Pi} \,, \tag{36}$$

where the rough SS, $\mathbf{\Pi} = \gamma_0^2 \{\mathbf{S}^+ \mathbf{U}\mathbf{U}^+\mathbf{S}\}_{\text{diag}}$, is now formed by applying the adjoint operator (i.e. the matched spatial filter) $\mathbf{S}^+$, and the windowing of the rough SS $\mathbf{\Pi}$ is performed by applying the smoothing filter $\mathbf{\Omega} = (w_0\mathbf{I} + \mathbf{M})^{-1}$ with nonnegative entries [7], [8]. Estimator (36) is referred to as the matched spatial filtering (MSF) algorithm for estimation of the SSP. Equation (36) is recognized to be a vector-form representation of the conventional kernel SSP estimation algorithm [9], [24], in which the SS is formed as the squared modulus of the outputs of the matched spatial filter applied to the recorded data signal (the trajectory signal in SAR terminology [12], [23]). Thus, in the framework of the FBR inference-based approach to RS imaging [6], the traditional MSF technique (36) can be viewed as a rough simplified version of the RFBR algorithm (34).

### E. Robust adaptive spatial filtering (RASF) algorithm

The RASF solution operator (SO) is a modification of (27) for the case of an arbitrary zero-mean noise with the correlation matrix $\mathbf{R}_N$, equal importance of the systematic and noise error measures, i.e. $\alpha = 1$, and the solution-dependent weight matrix $\mathbf{A} = \hat{\mathbf{D}}^{-1}$. In this case, the SO is recognized to be the robust adaptive spatial filter (RASF)

$$\mathbf{F}_{RASF} = (\mathbf{S}^+ \, \mathbf{R}_{\mathbf{N}}^{-1} \, \mathbf{S} + \hat{\mathbf{D}}^{-1})^{-1} \mathbf{S}^+ \, \mathbf{R}_{\mathbf{N}}^{-1} \ . \tag{37}$$
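One way to see the connection between (27) and (37) is the following matrix identity, written under the assumption (not stated explicitly above) that $\hat{\mathbf{D}} = \mathbf{D}(\hat{\mathbf{B}})$ is invertible, i.e. all pixel estimates are strictly positive. With the shorthand $\mathbf{G}_S = \mathbf{S}^{+}\mathbf{R}_{\mathbf{N}}^{-1}\mathbf{S}$, introduced only for this illustration,

$$\hat{\mathbf{D}}\,(\mathbf{I} + \mathbf{G}_S\hat{\mathbf{D}})^{-1} = \left[(\mathbf{I} + \mathbf{G}_S\hat{\mathbf{D}})\,\hat{\mathbf{D}}^{-1}\right]^{-1} = (\hat{\mathbf{D}}^{-1} + \mathbf{G}_S)^{-1},$$

so that multiplying both sides from the right by $\mathbf{S}^{+}\mathbf{R}_{\mathbf{N}}^{-1}$ turns the operator form of (27) into the form of (37).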

### IV. QUALITY METRICS

The traditional quantitative quality metric [1] for RS images is the so-called Improvement in the Output Signal to Noise Ratio (IOSNR), which provides the metrics for performance gains attained with different employed estimators in dB scale

$$IOSNR\,(\text{dB}) = 10 \cdot \log_{10} \left( \frac{\sum_{k=1}^{K} (\hat{b}_{k}^{(MSF)} - b_{k})^{2}}{\sum_{k=1}^{K} (\hat{b}_{k}^{(p)} - b_{k})^{2}} \right), \quad p = 1, ..., P, \tag{38}$$

where  $b_k$  represents the value of the *k*-th element (pixel) of the original SSP,  $\hat{b}_k^{(MSF)}$  represents the value of the *k*-th element (pixel) of the rough SSP estimate formed applying the matched spatial filtering (MSF) method, and  $\hat{b}_k^{(p)}$  represents the value of the *k*-th element (pixel) of the enhanced SSP estimate formed applying the *p*th enhanced imaging method (p = 1, ..., P), respectively.

The percentage IOSNR (PIOSNR) quality metric is a modification of the IOSNR metric [22]; it expresses the percentage of the gained reconstruction improvement specified as follows

$$PIOSNR\,(\%) = 100 \left( 1 - \frac{\sum_{k=1}^{K} (\hat{b}_{k}^{(p)} - b_{k})^{2}}{\sum_{k=1}^{K} (\hat{b}_{k}^{(MSF)} - b_{k})^{2}} \right), \quad p = 1, ..., P. \tag{39}$$

Finally, the total Mean Square Error (MSE) is a quality metric defined as [24]

$$MSE = \sum_{k=1}^{K} \left( \hat{b}_{k}^{(p)} - b_{k} \right)^{2}, \quad p = 1, ..., P. \tag{40}$$

The quality metrics specified by (38), (39) and (40) allow one to quantify the performance of the employed SSP and RSS reconstructive estimation methods (enumerated by p = 1, ..., P).
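The three metrics (38), (39) and (40) can be computed directly from the pixel vectors, as in the sketch below. The function names and the four-pixel toy vectors are hypothetical and serve only to illustrate the formulas. Note that (38) and (39) are undefined when either error sum is zero, a case this sketch does not guard against.

```python
import math
from typing import Sequence


def total_squared_error(estimate: Sequence[float],
                        truth: Sequence[float]) -> float:
    """Total squared error of (40): sum over pixels of (b_hat_k - b_k)^2."""
    return sum((e - t) ** 2 for e, t in zip(estimate, truth))


def iosnr_db(msf: Sequence[float], enhanced: Sequence[float],
             truth: Sequence[float]) -> float:
    """IOSNR of (38): MSF error over enhanced-method error, in dB."""
    return 10.0 * math.log10(total_squared_error(msf, truth)
                             / total_squared_error(enhanced, truth))


def piosnr_percent(msf: Sequence[float], enhanced: Sequence[float],
                   truth: Sequence[float]) -> float:
    """PIOSNR of (39): percentage reduction of the MSF error."""
    return 100.0 * (1.0 - total_squared_error(enhanced, truth)
                    / total_squared_error(msf, truth))


# Hypothetical 4-pixel example (illustrative values only).
truth = [1.0, 2.0, 3.0, 4.0]
msf = [2.0, 3.0, 4.0, 5.0]        # total squared error 4.0
enhanced = [1.5, 2.5, 3.0, 4.0]   # total squared error 0.5

mse = total_squared_error(enhanced, truth)
iosnr = iosnr_db(msf, enhanced, truth)
piosnr = piosnr_percent(msf, enhanced, truth)
```

In this toy case the enhanced estimate reduces the error by a factor of 8 relative to the MSF estimate, which gives an IOSNR of about 9.03 dB and a PIOSNR of 87.5 %.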

### V. SIMULATIONS

The first simulation experiment was performed for imaging of the test (artificially synthesized) scenes, applying the SAR with partially synthesized aperture as the RS imaging system [8]. The SFOs of all RS images were factorized along two axes in the image plane: the azimuth (horizontal axis, $x_1$) and the range (vertical axis, $x_2$). Following common practically motivated technical considerations [5], [12], [23], we modelled a triangular shape of the SAR range ambiguity function (AF) $\Psi_r(x_2)$ in the $x_2$ direction, and a $|sinc|^2$ shape of the side-looking SAR azimuth AF $\Psi_a(x_1)$ in the $x_1$ direction at the zero crossing level for the simulated SAR system with fractionally synthesized array [8], [23], [24].

The behavior and performance indices of the described estimators were examined for five RS system configurations applied to three test scenes as specified below.

In the first simulation scenario, the assigned values of the AF widths were: 5 pixels width for  $\Psi_r(x_2)$  and 10 pixels width for  $\Psi_a(x_1)$ . In the simulations reported in Fig. 1, we considered the case of white Gaussian observation noise with the SNR of 30 dB. Figure 1(a) shows the 512×512-pixel original synthesized test scene. Figure 1(b) reports the image formed implementing the MSF method. Figure 1(c) presents the reconstructed (enhanced) synthesized image formed using the RASF estimator. Figure 1(d) shows the reconstructed (enhanced) synthesized image formed using the DEDR estimator. Figure 1(e) presents the reconstructed (enhanced) synthesized image formed using the FBR estimator. Last, Figure 1(f) shows the reconstructed (enhanced) synthesized image formed using the RFBR estimator. The quantitative quality metrics of the IOSNR, PIOSNR and MSE gained with the employed enhanced imaging methods for the simulated fractional aperture synthesis scenarios with different levels of noise are reported in Table 1.

In the second simulation scenario, high-resolution real-world environmental images were used as test scenes [4]. The first tested scene is shown in Fig. 2(a) and the second tested scene is shown in Fig. 3(a). The simulation experiments were run with the following system-level specifications: 5 pixels width for $\Psi_r(x_2)$ and 20 pixels width for $\Psi_a(x_1)$, respectively. In the basic simulations, we considered the case of white Gaussian observation noise with the SNR of 30 dB. Figures 2(b) and 3(b) show the images formed via implementing the MSF method with the system parameters specified in the figure captions. Figures 2(c) and 3(c) present the reconstructed (enhanced) images formed using the RASF estimator. Figures 2(d) and 3(d) show the enhanced images reconstructed with the DEDR method. Figures 2(e) and 3(e) present the reconstructed (enhanced) images formed using the FBR estimator. Figures 2(f) and 3(f) show the enhanced images reconstructed with the RFBR method. The quantitative quality metrics of the IOSNR, PIOSNR and MSE gained with different tested enhanced imaging methods for the simulated fractional aperture synthesis scenarios with different levels of noise are reported in Tables 2 and 3, respectively.

In the third simulation scenario reported here, which was run with the second real-world SAR scene, the system-level specifications were as follows: 5 pixels width for $\Psi_r(x_2)$, 40 pixels width for $\Psi_a(x_1)$ for the first (1<sup>st</sup>) system and 50 pixels width for $\Psi_a(x_1)$ for the second (2<sup>nd</sup>) simulated fractional SAR imaging system, with an SNR of 30 dB. Figures 4(a) and 5(a) show the 512×512-pixel high-resolution original scene. Figures 4(b) and 5(b) present the images of the same scene formed implementing the MSF method. Figures 4(c) and 5(c) display the reconstructed (enhanced) scene images formed using the RASF estimator. Figures 4(d) and 5(d) show the enhanced images reconstructed with the DEDR method. Figures 4(e) and 5(e) display the enhanced scene images reconstructed using the FBR estimator. Last, Figures 4(f) and 5(f) present the reconstructed (enhanced) images formed using the RFBR technique. The *IOSNR*, *PIOSNR* and *MSE* quantitative quality metrics gained with different simulated enhanced imaging methods for different fractional SAR operational scenarios and different levels of noise are reported in Tables 4 and 5, respectively. The presented simulation protocols are indicative of improvements both in the qualitative and quantitative metrics gained with the proposed robust DEDR and FBR-related techniques in comparison with the conventional MSF and RSF algorithms.

### VI. COMPUTATIONAL COMPLEXITY

Real-time computing is traditionally referred to as the study of software systems that are subject to real-time operational constraints [1] (e.g., operational deadlines from an event to a system response) [19]. By contrast, a non-real-time system is one for which there is no deadline, even if fast response or high performance is desired or preferred [1], [19]. The needs of real-time software are often addressed in the context of real-time operating systems [1] and synchronous programming languages [2], which provide frameworks on which to build the real-time application software [2], [3].

A real-time RS data processing/imaging system is one whose performance can be considered (within a particular RS context) to be mission critical [3]. Real-time computations can be said to have failed if they are not completed before their deadline, where the deadline is relative to an RS event [19]. A real-time deadline must be met, regardless of system load [1].

For the previously described image enhancement and SSP/RSS mapping methodologies, it is reasonable to define the computational complexity by determining the number of computational operations needed to perform the particular employed algorithms [10]. Let K denote a matrix and I an inverse matrix, and let the superscript (n) represent the number of matrix multiplications and/or inversions required to complete the mathematical operations (e.g., $K^{(4)}$ represents a quadruple matrix multiplication, $I^{(2)}$ represents a double matrix inversion, etc.). For the particular employed simulation formats, K and I are 512×512 matrices. The number of operations needed to complete one reconstruction cycle for the tested and compared methods is reported in Table 6. With these results, one can analyze the processing time (in operation cycles) needed to perform each proposed/employed algorithm. Last, Table 7 reports the computational times required for completing the compared SSP/RSS reconstructive techniques with three different typical central processing unit (CPU) clock speeds: (i) with a personal computer (PC) running at 2.66 GHz with a single processor; (ii) with a workstation (WS) running at 3.80 GHz with a dual processor, and (iii) with dedicated hardware (DH) running at 300 MHz with a single processor.
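As a rough illustration of how an operation count translates into processing time on the three platforms, the sketch below assumes one arithmetic operation per clock cycle per processor, ideal scaling over two processors for the workstation, and a naive $n^3$ operation count for one $n \times n$ matrix multiplication. All three are simplifying assumptions of this example; the actual counts and times are those reported in Tables 6 and 7.

```python
def processing_time_s(operations: float, clock_hz: float,
                      processors: int = 1) -> float:
    """Idealized time: operations / (clock rate x number of processors)."""
    return operations / (clock_hz * processors)


n = 512                      # matrix size used in the simulations
ops_one_multiply = n ** 3    # assumed naive count for one n x n product

# (clock rate in Hz, number of processors) for the three platforms.
platforms = {
    "PC": (2.66e9, 1),
    "WS": (3.80e9, 2),
    "DH": (300e6, 1),
}

times = {name: processing_time_s(ops_one_multiply, hz, procs)
         for name, (hz, procs) in platforms.items()}
```

Under these assumptions the time scales linearly with the number of matrix multiplications and inversions, so an algorithm that avoids solution-dependent inversions gains on every platform by the same factor.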

The presented results of the comparative simulation analysis illustrate the behavior and overall imaging performance improvements gained with the proposed robust DEDR and FBR-related approaches compared with other previously developed methods [1], [4], [12], [20] in both the reconstruction quality metrics and computational complexity reduction. The advantages of the well-designed robust imaging experiments (that employ the RASF, DEDR, and RFBR methods) over the cases of more poorly designed experiments (that employ the conventional MSF and RSF algorithms) were investigated through an extensive simulation study and are reported here for different multi-grade test scenes.



a. Original artificially synthesized test scene.

b. Low-resolution scene image formed applying the MSF method.

c. Test scene reconstruction using the RASF estimator.

d. Test scene reconstruction using the DEDR estimator.

e. Test scene reconstruction using the FBR estimator.

f. Test scene reconstruction using the robust RFBR estimator.

Fig. 1. Simulation results of the synthesized test scene SSP reconstruction. Specifications of the simulation experiment are summarized in Table 1.



a. First real-world original high-resolution scene.

b. Low-resolution scene image formed applying the MSF method.

c. Scene reconstruction using the RASF estimator.

d. Scene reconstruction using the DEDR estimator.

e. Scene reconstruction using the FBR estimator.

f. Scene reconstruction using the robust RFBR estimator.

Fig. 2. Simulation results of the first real-world SAR scene imaging with SSP reconstruction performed with the 1<sup>st</sup> system. Specifications of the simulation experiment are summarized in Table 2.



a. First real-world original high-resolution scene.

b. Low-resolution scene image formed applying the MSF method.

c. Scene reconstruction using the RASF estimator.

d. Scene reconstruction using the DEDR estimator.

e. Scene reconstruction using the FBR estimator.

f. Scene reconstruction using the robust RFBR estimator.

Fig. 3. Simulation results of the first real-world SAR scene imaging with SSP reconstruction performed with the 2<sup>nd</sup> system. Specifications of the simulation experiment are summarized in Table 3.



a. Second real-world original high-resolution scene.

b. Low-resolution scene image formed applying the MSF method.

c. Scene reconstruction using the RASF estimator.

d. Scene reconstruction using the DEDR estimator.

e. Scene reconstruction using the FBR estimator.

f. Scene reconstruction using the robust RFBR estimator.

Fig. 4. Simulation results of the second real-world SAR scene imaging with SSP reconstruction performed with the 1<sup>st</sup> system. Specifications of the simulation experiment are summarized in Table 4.



a. Second real-world original high-resolution scene.

b. Low-resolution scene image formed applying the MSF method.

c. Scene reconstruction using the RASF estimator.

d. Scene reconstruction using the DEDR estimator.

e. Scene reconstruction using the FBR estimator.

f. Scene reconstruction using the robust RFBR estimator.

Fig. 5. Simulation results of the second real-world SAR scene imaging with SSP reconstruction performed with the 2<sup>nd</sup> system. Specifications of the simulation experiment are summarized in Table 5.

TABLE 1

COMPARATIVE TABLE OF THE QUALITY METRICS GAINED WITH DIFFERENT ESTIMATION METHODS FOR THREE LEVELS OF NOISE (SNR).

RESULTS ARE REPORTED FOR THE SYNTHESIZED TEST SCENE.

SYSTEM SPECIFICATIONS: RANGE TRIANGULAR SHAPE OF AF $\Psi_R(x_2) = 5$ PIXELS WIDTH; AZIMUTH $|SINC|^2$ SHAPE OF AF $\Psi_A(x_1) = 10$ PIXELS WIDTH.

|         | Method →   |       | RASF  |       |       | DEDR  |       |       | FBR   |       |       | RFBR  |       |
|---------|------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
|         | SNR [dB] → | 20    | 25    | 30    | 20    | 25    | 30    | 20    | 25    | 30    | 20    | 25    | 30    |
| Metrics | IOSNR [dB] | 15.65 | 20.84 | 25.23 | 13.54 | 18.85 | 23.45 | 10.26 | 14.76 | 17.37 | 11.16 | 15.53 | 18.36 |
|         | PIOSNR (%) | 72.34 | 78.16 | 77.06 | 76.74 | 84.77 | 79.69 | 92.82 | 92.75 | 95.54 | 91.73 | 91.43 | 94.33 |
|         | MSF        | 0.20  | 0.50  | 0.60  | 0.23  | 0.40  | 0.50  | 0.03  | 0.20  | 0.10  | 0.04  | 0.22  | 0.14  |

 TABLE 2

 COMPARATIVE TABLE OF THE QUALITY METRICS GAINED WITH DIFFERENT ESTIMATION METHODS FOR THREE LEVELS OF NOISE (SNR).

 RESULTS ARE REPORTED FOR THE 1<sup>ST</sup> SYSTEM APPLIED TO THE FIRST SAR SCENE.

SYSTEM SPECIFICATIONS: RANGE TRIANGULAR SHAPE OF AF $\Psi_R(x_2) = 5$ PIXELS WIDTH; AZIMUTH $|SINC|^2$ SHAPE OF AF $\Psi_A(x_1) = 20$ PIXELS WIDTH.

|         | Method →   |       | RASF  |       |       | DEDR  |       |       | FBR   |       |       | RFBR  |       |
|---------|------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
|         | SNR [dB] → | 15    | 20    | 25    | 15    | 20    | 25    | 15    | 20    | 25    | 15    | 20    | 25    |
| Metrics | IOSNR [dB] | 10.15 | 15.32 | 20.25 | 8.76  | 13.74 | 18.84 | 5.47  | 9.85  | 12.63 | 6.15  | 10.62 | 13.04 |
|         | PIOSNR (%) | 81.37 | 86.62 | 85.24 | 83.22 | 91.14 | 90.21 | 96.63 | 91.68 | 99.10 | 95.18 | 90.29 | 98.24 |
|         | MSF        | 0.16  | 0.46  | 0.57  | 0.18  | 0.37  | 0.46  | 0.02  | 0.24  | 0.24  | 0.03  | 0.29  | 0.34  |

TABLE 3

COMPARATIVE TABLE OF THE QUALITY METRICS GAINED WITH DIFFERENT ESTIMATION METHODS FOR THREE LEVELS OF NOISE (SNR).

RESULTS ARE REPORTED FOR THE 2<sup>ND</sup> SYSTEM APPLIED TO THE FIRST SAR SCENE.

SYSTEM SPECIFICATIONS: RANGE TRIANGULAR SHAPE OF AF $\Psi_R(x_2) = 5$ PIXELS WIDTH; AZIMUTH $|SINC|^2$ SHAPE OF AF $\Psi_A(x_1) = 30$ PIXELS WIDTH.

|         | Method →   |       | RASF  |       |       | DEDR  |       |       | FBR   |       |       | RFBR  |       |
|---------|------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
|         | SNR [dB] → | 15    | 20    | 25    | 15    | 20    | 25    | 15    | 20    | 25    | 15    | 20    | 25    |
| Metrics | IOSNR [dB] | 9.42  | 14.87 | 19.37 | 7.83  | 12.96 | 17.24 | 5.92  | 10.23 | 15.37 | 6.23  | 11.73 | 15.96 |
|         | PIOSNR (%) | 77.37 | 82.74 | 81.24 | 79.32 | 87.74 | 86.41 | 97.83 | 94.28 | 99.26 | 96.28 | 93.29 | 98.64 |
|         | MSF        | 0.30  | 0.60  | 0.70  | 0.33  | 0.50  | 0.60  | 0.13  | 0.30  | 0.20  | 0.14  | 0.32  | 0.24  |

 TABLE 4

 COMPARATIVE TABLE OF THE QUALITY METRICS GAINED WITH DIFFERENT ESTIMATION METHODS FOR THREE LEVELS OF NOISE (SNR).

 RESULTS ARE REPORTED FOR THE 1<sup>ST</sup> SYSTEM APPLIED TO THE SECOND SAR SCENE.

SYSTEM SPECIFICATIONS: RANGE TRIANGULAR SHAPE OF AF $\Psi_R(x_2) = 5$ PIXELS WIDTH; AZIMUTH $|SINC|^2$ SHAPE OF AF $\Psi_A(x_1) = 40$ PIXELS WIDTH.

|         | Method →   |       | RASF  |       |       | DEDR  |       |       | FBR   |       |       | RFBR  |       |
|---------|------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
|         | SNR [dB] → | 15    | 20    | 25    | 15    | 20    | 25    | 15    | 20    | 25    | 15    | 20    | 25    |
| Metrics | IOSNR [dB] | 12.42 | 17.82 | 22.75 | 9.42  | 14.72 | 19.64 | 5.23  | 10.22 | 15.33 | 6.24  | 11.25 | 16.45 |
|         | PIOSNR (%) | 65.77 | 70.84 | 69.96 | 67.28 | 75.44 | 74.43 | 90.33 | 87.38 | 93.70 | 89.18 | 86.39 | 92.54 |
|         | MSF        | 0.26  | 0.56  | 0.66  | 0.29  | 0.46  | 0.56  | 0.14  | 0.26  | 0.16  | 0.10  | 0.28  | 0.20  |

 TABLE 5

 COMPARATIVE TABLE OF THE QUALITY METRICS GAINED WITH DIFFERENT ESTIMATION METHODS FOR THREE LEVELS OF NOISE (SNR).

 RESULTS ARE REPORTED FOR THE  $2^{ND}$  SYSTEM APPLIED TO THE SECOND SAR SCENE.

SYSTEM SPECIFICATIONS: RANGE TRIANGULAR SHAPE OF AF $\Psi_R(x_2) = 5$ PIXELS WIDTH; AZIMUTH $|SINC|^2$ SHAPE OF AF $\Psi_A(x_1) = 50$ PIXELS WIDTH.

|         | Method →   |       | RASF  |       |       | DEDR  |       |       | FBR   |       |       | RFBR  |       |
|---------|------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
|         | SNR [dB] → | 15    | 20    | 25    | 15    | 20    | 25    | 15    | 20    | 25    | 15    | 20    | 25    |
| Metrics | IOSNR [dB] | 13.64 | 18.32 | 23.74 | 10.45 | 15.76 | 20.73 | 6.37  | 11.52 | 16.75 | 7.43  | 12.53 | 17.89 |
|         | PIOSNR (%) | 74.77 | 79.74 | 78.42 | 76.32 | 84.44 | 83.41 | 94.63 | 91.53 | 97.12 | 93.58 | 90.89 | 96.74 |
|         | MSF        | 0.25  | 0.55  | 0.65  | 0.28  | 0.45  | 0.55  | 0.08  | 0.25  | 0.15  | 0.09  | 0.27  | 0.19  |

 TABLE 6

 NUMBER OF OPERATIONS PER CYCLE FOR COMPUTATIONAL IMPLEMENTATION OF DIFFERENT ENHANCED IMAGING METHODS.

 Results are reported for each analyzed method.

| Method | Equation | Processing Algorithm |               | Operations per cycle |
|--------|----------|----------------------|---------------|----------------------|
| DEDR   | (24)     | $\mathbf{F}_{DEDR} = \mathbf{K}_{\mathbf{A},\alpha,\beta} \mathbf{S}^+ \mathbf{R}_{\Sigma}^{-1}$ | $\rightarrow$ | $K^{(2)} \cdot I$ |
| FBR    | (31)     | $\hat{\mathbf{B}}_{FBR} = \mathbf{B}_{P} + \mathbf{P}\mathbf{B}_{0} + \mathbf{W}(\hat{\mathbf{B}}) \{\mathbf{V}(\hat{\mathbf{B}}) - \mathbf{Z}(\hat{\mathbf{B}})\}$ | $\rightarrow$ | $K + K^{(2)} + K^{(4)} \cdot I$ |
| RFBR   | (34)     | $\hat{\mathbf{B}}_{RFBR} = \mathbf{B}_0 + \mathbf{\Omega} \mathbf{V}$ | $\rightarrow$ | $K + K^{(2)} \cdot I$ |
| MSF    | (36)     | $\hat{\mathbf{B}}_{MSF} = \mathbf{\Omega} \mathbf{\Pi}$ | $\rightarrow$ | $K \cdot I$ |
| RASF   | (37)     | $\mathbf{F}_{RASF} = (\mathbf{S}^{+} \mathbf{R}_{\mathbf{N}}^{-1} \mathbf{S} + \hat{\mathbf{D}}^{-1})^{-1} \mathbf{S}^{+} \mathbf{R}_{\mathbf{N}}^{-1}$ | $\rightarrow$ | $K \cdot I^{(2)}$ |

TABLE 7

COMPARATIVE TABLE OF THE REQUIRED PROCESSING TIME FOR THE COMPARED ENHANCED IMAGING METHODS.

THE RESULTS ARE REPORTED IN SECONDS.

NOTE: PROCESSING TIMES ARE CALCULATED CONSIDERING THAT ALL THE CPU CLOCK SPEED IS DEDICATED.

| Method | Operations per cycle            | Total operations      | PC Time [seconds]  | WS Time [seconds]  | DH Time [seconds]   |
|--------|---------------------------------|-----------------------|--------------------|--------------------|---------------------|
| DEDR   | $K^{(2)} \cdot I$               | $1.34 \times 10^{8}$  | 0.05               | 0.035              | 0.45                |
| FBR    | $K + K^{(2)} + K^{(4)} \cdot I$ | $3.48 \times 10^{13}$ | $1.30 \times 10^4$ | $9.15 \times 10^3$ | $11.60 \times 10^3$ |
| RFBR   | $K + K^{(2)} \cdot I$           | $6.92 \times 10^{10}$ | 26.15              | 18.21              | 230.66              |
| MSF    | $K \cdot I$                     | $1.34 \times 10^{8}$  | 0.05               | 0.035              | 0.45                |
| RASF   | $K \cdot I^{(2)}$               | $6.76 \times 10^{10}$ | 25.41              | 17.69              | 225.12              |
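The note to Table 7 implies that each processing time is the total operation count divided by a fixed processing rate. The following minimal sketch, which assumes exactly that relation (time = operations / rate), back-computes the rate implied by every table entry. The names `TABLE_7`, `implied_rate` and `implied_rates` are hypothetical helpers introduced only for illustration, and the last row is keyed as RASF, following the method name used in Table 6 for the $K \cdot I^{(2)}$ count.

```python
# Total operation counts and reported times (seconds) transcribed from Table 7.
# Platform labels PC, WS and DH are used exactly as they appear in the table.
TABLE_7 = {
    "DEDR": (1.34e8,  {"PC": 0.05,   "WS": 0.035,  "DH": 0.45}),
    "FBR":  (3.48e13, {"PC": 1.30e4, "WS": 9.15e3, "DH": 11.60e3}),
    "RFBR": (6.92e10, {"PC": 26.15,  "WS": 18.21,  "DH": 230.66}),
    "MSF":  (1.34e8,  {"PC": 0.05,   "WS": 0.035,  "DH": 0.45}),
    "RASF": (6.76e10, {"PC": 25.41,  "WS": 17.69,  "DH": 225.12}),
}


def implied_rate(total_ops, seconds):
    """Operations per second implied by one entry (assumes time = ops / rate)."""
    return total_ops / seconds


def implied_rates(table=TABLE_7):
    """Implied rate for every method and platform in the table."""
    return {
        method: {p: implied_rate(ops, t) for p, t in times.items()}
        for method, (ops, times) in table.items()
    }


if __name__ == "__main__":
    for method, rates in implied_rates().items():
        print(method, {p: f"{r:.2e}" for p, r in rates.items()})
```

If the assumed relation holds, the implied rates on one platform should be roughly equal across methods; an entry that deviates strongly from the others may deserve a second look against the original table.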

### VII. CONCLUDING REMARKS

We have performed a detailed comparative study of several robust numerical versions of two recently developed high-resolution adaptive radar/SAR imaging methodologies: the DEDR and FBR techniques. The study revealed a structural similarity between the robustified algorithms derived from both methodologies, in particular between the RASF (DEDR-related) and the RFBR (robust FBR-related) methods. The comparative analysis of the computational complexity of the imaging techniques based on the robust SSP and RSS estimators showed that the DEDR-related and FBR-related robust imaging algorithms achieve user-controlled real-time performance, because the RS deadline event is completed at each stage of the image reconstruction process, so that the system responds in a virtually "real" (i.e., user-required) time.

In RS applications related to real-world $512 \times 512$ pixel scene image enhancement/reconstruction scenarios, the computational complexity of enhanced RS imaging with the proposed RFBR algorithm was drastically decreased in comparison with the original FBR method, i.e., approximately $10^5$ times, and required 27 seconds of overall computational time. In the same manner, the computation time required by the DEDR-related robust RASF imaging algorithm was decreased even more drastically, approximately 10<sup>15</sup> times with respect to the adaptive (non-robust) original FBR method, and amounted to only approximately 0.50 seconds of overall computational time. In addition, the simulation protocols reported for the different tested scenarios verify in more detail the substantial efficiency of the high-resolution robust radar/SAR imaging techniques proposed here.

### REFERENCES

- [1] P.M. Mather, *Computer processing of remotely-sensed images*, John Wiley & Sons, U.S.A., 2004.
- [2] E. Schrödinger, *Science, theory and man*, Dover, U.S.A., 1957.
- [3] S. Greenfield, *The human brain: a guided tour*, Weidenfeld and Nicolson, U.K., 1997.
- [4] T. Freeman, Jet Propulsion Laboratory, Space Imaging, "What is imaging radar?", 2005, http://www.spaceimaging.com.
- [5] C. Olmsted, *Scientific SAR user's guide*, Alaska SAR Facility, U.S.A., 1993.
- [6] Y.V. Shkvarko, "Estimation of Wavefield Power Distribution in the Remotely Sensed Environment: Bayesian Maximum Entropy Approach", *IEEE Transactions on Signal Processing*, vol. 50, pp. 2333-2346, September 2002.
- [7] Y.V. Shkvarko, "Unifying Regularization and Bayesian Estimation Methods for Enhanced Imaging with Remotely Sensed Data. Part I – Theory", *IEEE Transactions on Geoscience and Remote Sensing*, vol. 42, pp. 923-931, March 2004.
- [8] Y.V. Shkvarko, "Unifying Regularization and Bayesian Estimation Methods for Enhanced Imaging with Remotely Sensed Data. Part II – Implementation and Performance Issues", *IEEE Transactions on Geoscience and Remote Sensing*, vol. 42, pp. 932-940, March 2004.
- [9] F.M. Henderson and A.V. Lewis, *Principles and application of imaging radar, manual of remote sensing*, John Wiley & Sons, U.S.A., 1998.
- [10] J.L. Starck, F. Murtagh and A. Bijaoui, *Image processing and data analysis, the multiscale approach*, Cambridge University Press, U.K., 1998.
- [11] B.R. Mahafza, *Radar systems analysis and design using MATLAB*, CRC Press, U.S.A., 2000.
- [12] A.W. Doerry, F.M. Dickey, L.A. Romero and J.M. DeLaurentis, "Difficulties in Superresolving SAR Images", *SPIE Proceedings*, vol. 4727, pp. 122-133, April 2002.
- [13] R.C. Puetter, "Information Language and Pixon-based Image Reconstruction", *SPIE Proceedings*, vol. 2827, pp. 12-31, 1996.
- [14] S. Haykin and A. Steinhardt, *Adaptive radar detection and estimation*, John Wiley & Sons, U.S.A., 1992.
- [15] D.C. Bell and R.M. Narayanan, "Theoretical Aspects of Radar Imaging using Stochastic Waveforms", *IEEE Transactions on Signal Processing*, vol. 49, pp. 349-400, February 2001.
- [16] Y.V. Shkvarko, "Theoretical Aspects of Array Radar Imaging via Fusing Experiment Design and Descriptive Regularization Techniques", *Proceedings of the* 2<sup>nd</sup> *IEEE Workshop on Sensor Array and Multichannel Signal Processing*, Washington U.S.A., 2002.
- [17] Space Imaging, 2006, http://www.spaceimaging.com.
- [18] R.O. Harger, *Synthetic aperture radar systems: theory and design*, Academic Press, U.S.A., 1970.
- [19] R. Bamler, "A Comparison of Range-Doppler and Wave-Number Domain SAR Focusing Algorithms", *IEEE Transactions on Geoscience and Remote Sensing*, vol. 30, pp. 706-713, June 1991.
- [20] S.E. Falkovich, V.I. Ponomaryov and Y.V. Shkvarko, *Optimal spatial-temporal signal processing for spread radio channels*, Radio and Communication Press, USSR, 1989.
- [21] G. Franceschetti and R. Lanari, *Synthetic aperture radar processing*, Boca Raton, FL: CRC, 1999.
- [22] S.A. Hovanessian, *Introduction to sensor systems*, Norwood, MA: Artech House, 1988.
- [23] L.G. Cutrona, "Synthetic Aperture Radar", in *Radar handbook*, McGraw-Hill, 1990.
- [24] D.R. Wehner, *High-resolution radar*, Artech House, 1994.
- [25] Yuriy Shkvarko and Ivan Villalon-Turrubiates, "Remote Sensing Imagery and Signature Fields Reconstruction via Aggregation of Robust Regularization with Neural Computing", in *Advanced Concepts for Intelligent Vision Systems*, J. Blanc-Talon, W. Philips, D. Popescu and P. Scheunders, Ed. Germany: Springer-Verlag, pp. 865-876, August 2007.
- [26] Yuriy V. Shkvarko, "Finite Array Observations-Adapted Regularization Unified with Descriptive Experiment Design Approach for High-Resolution Spatial Power Spectrum Estimation with Application to Radar/SAR Imaging", in *Proceedings of the 15<sup>th</sup> International Conference on Digital Signal Processing*, Cardiff U.K., July 2007.
- [27] Yuriy V. Shkvarko, Ivan E. Villalon-Turrubiates and Jose L. Leyva-Montiel, "Remote Sensing Signature Fields Reconstruction via Robust Regularization of Bayesian Minimum Risk Technique", in *Proceedings of the 2<sup>nd</sup> IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing*, Virgin Islands U.S.A., pp. 237-240, December 2007.



**Ivan E. Villalon-Turrubiates** (M'04) was born in Salamanca, Mexico, in 1976. He received the title of Mechanical Engineer in 2000 and the Master of Sciences degree in Electrical Engineering (Digital Signal Processing) in 2005, both from the University of Guanajuato in Salamanca, Mexico. He received the Doctor in Sciences degree in Electrical Engineering (Digital Signal and Image Processing) in 2007 from the Center for Advanced Research and Education (CINVESTAV) of the National Polytechnic Institute (IPN) of Mexico in Guadalajara.

Presently, he holds a full-time Professor/Researcher position with the Department of Computer Sciences at the University of Guadalajara, Campus Valles in Ameca, Mexico. He also collaborates with the Department of Physical Sciences at the TecMilenio University in Guadalajara, Mexico. His research activities are in applications of signal processing to remote sensing and imaging radar, with emphasis on digital applications of signal and image processing in bioengineering. He has published two papers in international journals, two chapters in scientific books, and 17 articles in international scientific conferences on these topics.



**Yuriy V. Shkvarko** (SM'05) received the Dip.Eng. (Hon.) degree in radio engineering, the Cand.Sci. degree (Ph.D. equivalent) in radio systems, and the Dr.Sci. degree in radio physics, radar, and navigation, all from the Kharkov Aviation Institute, Ukraine, in 1976, 1980, and 1990, respectively.

From 1976 to 1991, he was with the Scientific Research Department, Kharkov Aviation Institute, Kharkov, as a Research Fellow, Senior Fellow, and finally as a Chair of the Research Laboratory in information technologies for radar and navigation. From 1991 to 1999, he was a Full Professor with the Department of System Analysis and Control at the Ukrainian National Polytechnic Institute in Kharkov. From 1999 to 2001, he was a Visiting Professor with the Guanajuato University at Salamanca, Mexico. In 2001, he joined the CINVESTAV del IPN, Unidad Guadalajara, Mexico, as a Full Titular Professor. His research interests are in applications of signal processing to remote sensing, imaging radar, navigation, and communications, particularly in inverse problems, random fields estimation, adaptive spatial analysis, statistical sensor array and multichannel processing, and system fusion. He holds 12 patents and has published two books and some 130 papers in journals and conference records on these topics.

# Theoretical Study of Diffusion and Adsorption Inside Nano- and Mesoporous Active Particles

Oleksiy V. Klymenko

Abstract—Mass transport of a target species towards and within spherical mesoporous organosilica particles and its adsorption by active sites at pore walls is investigated through the numerical solution of the pertinent mathematical model. The presented theoretical results allow the optimization of mesoporous materials to increase their capacity towards target ions or molecules and their time-efficiency.

*Index Terms*— nanopore, adsorption, desorption, numerical simulation

### I. INTRODUCTION

The synthesis of micro- or even nanometric structures equipped with chemically active functional sites has gained significant attention in recent years owing to the unique ability of such materials to perform selective reactions on dilute species when appropriate active chemical ligands are bound to a rigid inorganic skeleton [1]-[8]. In particular, a comparatively easy sol-gel preparation route yields spherical microparticles consisting of bundles of nano- or mesopores that open through the particle surface and are lined with specifically chosen active ligands, so that target ions or molecules are selectively trapped inside the particles. Owing to the large surface area of the pore walls relative to their volume, such particles are able to selectively accumulate extremely high concentrations of a target species even if it is present in minute quantity in the solution containing the particles [4]-[5]. This property of mesoporous particles is of high interest for the selective filtration of liquids in applications ranging from everyday to industrial and environmental purposes.

Although a large number of works exist in this area, most of them rely on empirical results and classical approaches, even though the diffusion-reaction patterns developing inside nano- and mesoporous materials, as well as the transport conditions at the entrances to individual nanopores, significantly influence the time efficiency of such materials. Therefore, considering only the limiting thermodynamic capacity of nanoporous particles is not sufficient for the construction of highly efficient and selective materials for specific applications.

Manuscript received March 27, 2008.



Fig. 1. Schematic representation of a spherical mesoporous particle consisting of a dense bundle of nanopores; R<sub>part</sub> is the particle radius, R<sub>pore</sub> that of the nanopores, L the nanopore length,  $\omega$  the half-thickness of sol-gel wall and  $\delta$  the thickness of the steady state diffusion layer surrounding it into the solution. (a) Schematic view showing one nanopore placement inside the particle and target species entering it from the surrounding solution. (b) Axial cross section of one nanopore and "its" diffusion layer; note that the origin of the abscissa axis, x, is fixed onto the particle surface and it is directed inside the particle; ordinates are measured orthogonally from the nanopore axis.

The general laws governing the diffusion-reaction patterns created inside such nano- or mesoporous materials during their filling, and which control their cross-communication with the bulk solution (see Fig. 1), have been presented and discussed in a recent paper [9]. This general physicomathematical treatment delineated the complex situations which may arise and outlined the general kinetic laws obeyed in each kinetic range. Its main results are presented in Fig. 2 in the form of a zone diagram depending on the two main dimensionless parameters which primarily govern the complex kinetics of diffusion and reaction within nano- or mesopores [9]. However, despite its conceptual utility, the application of this general treatment to a particular experimental situation requires the consideration of many other parameters which affect the overall kinetic phenomena. This prevents the derivation of handy, workable general laws; nevertheless, the complete model may be solved numerically. In this work we present a full numerical approach to the physicochemical problem at hand.

O.V. Klymenko is with the Mathematical and Computer Modelling Laboratory, Kharkov National University of Radioelectronics, 14 Lenin Ave., Kharkov, 61166, Ukraine (phone: +38 057 702 09 69; fax: +38 057 702 10 13; e-mail: klymenko@kture.kharkov.ua).



Fig. 2. (a) Kinetic zone diagram illustrating the different behaviors experienced by the 2D-nanoporous system as a function of the main dimensionless parameters characterizing its dynamics: $\Xi_0 = (2\pi R_{pore}L)\Gamma_{site}/[(\pi R_{pore}^2L)C_0^b]$ is the ratio of the quantity of species storable by the nanopore wall sites ($\Gamma_{site}$ is the surface concentration of active sites) to that stored by the solution inside the nanopore at the initial solution concentration $C_0^b$; $\varepsilon = (L/R_{pore})^2$ is the squared ratio of the nanopore length to its radius; and $\lambda_0 = k_{ads}L^2C_0^b/D_{bulk}$ is the dimensionless adsorption rate constant (adapted from [9]; see below for definitions of parameters).

In practice, kinetic information regarding the processes occurring inside the nano- or mesoporous particles is hidden from direct observation. Thus one can access kinetic data only through measurements of the concentration of the target species in the solution bulk during a transient experiment involving vigorous solution stirring, which ensures that the bulk concentration may be regarded as time-dependent only. These experimental data may be represented as the ratio of the quantity of already extracted species to the maximum extraction capacity,

$$f(t) = \frac{Q^{tot}(t)}{Q_{\infty}^{tot}} = \frac{1 - c(t)}{1 - c_{\infty}},$$

where $c(t)$ is the normalized concentration of the target species at time t and $c_{\infty}$ is the normalized limiting bulk concentration of the target species at infinite time, when complete partition equilibrium is achieved.

Since experimentally one has information only about the behavior of f, we focus our analysis on this quantity and investigate its behavior as a function of the dimensionless time $\tau = D_{bulk}t/L^2$ (which is related to the real time t via the target species diffusion coefficient in the bulk solution, $D_{bulk}$, and the average nanopore length $L = R_{part}/3$ [9]) and of the other kinetic parameters that determine the rate of supply of the target species towards the nanopore openings, its diffusion inside the nanopores and, finally, the rate of its adsorption/desorption at the walls. It should be noted that an efficient system (mixture of mesoporous particles) ought to contain a sufficient number of active sites in order to achieve a high level of sequestration, and that the kinetics of the process depends on the relative amounts of target species and active sites. The time efficiency of the system is best analyzed by considering the characteristic time at which f reaches an indicative value, chosen here to be 0.75, which corresponds to the moment when the mesoporous material has extracted 75% of its capacity.
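As an illustration of how the extraction ratio f and the characteristic time at f = 0.75 might be obtained from a sampled bulk-concentration transient, consider the following minimal sketch. The function names `extraction_fraction` and `characteristic_time`, the linear interpolation between samples, and the synthetic exponential transient are assumptions made for this example only; they are not part of the model of [9].

```python
import numpy as np


def extraction_fraction(c, c_inf):
    """f = (1 - c) / (1 - c_inf) for the normalized bulk concentration c."""
    c = np.asarray(c, dtype=float)
    return (1.0 - c) / (1.0 - c_inf)


def characteristic_time(t, c, c_inf, level=0.75):
    """First time at which f reaches `level` (linear interpolation).

    Returns None if the level is never reached within the sampled window.
    """
    t = np.asarray(t, dtype=float)
    f = extraction_fraction(c, c_inf)
    above = np.nonzero(f >= level)[0]
    if above.size == 0:
        return None
    k = above[0]
    if k == 0:
        return float(t[0])
    # Interpolate between the two samples that bracket the crossing.
    w = (level - f[k - 1]) / (f[k] - f[k - 1])
    return float(t[k - 1] + w * (t[k] - t[k - 1]))


if __name__ == "__main__":
    # Synthetic transient relaxing towards c_inf = 0.2 (illustrative data only).
    t = np.linspace(0.0, 10.0, 201)
    c = 0.2 + 0.8 * np.exp(-t)
    print(characteristic_time(t, c, c_inf=0.2))
```

The same routine applies to the dimensionless time $\tau$ if the time axis is rescaled by $D_{bulk}/L^2$ beforehand.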

### II. THEORY

In this section, for the sake of convenience, we repeat the dimensionless formulation of the mass transport and adsorption problem [9] corresponding to sufficiently large nanopores, with diameters of at least several molecular sizes (2D formulation), which permits classical diffusion of the target species inside such nanopores. Fig. 1 shows the schematic representation of a single spherical particle immersed in a solution of the target species under hydrodynamic mixing conditions, exposing one nanotube within the particle core lined with active adsorption sites. Each particle is thus surrounded by a stagnant hydrodynamic diffusion layer, which may be defined as a superposition of the diffusion layers created by diffusion of the target species into each nanopore opening. Once inside the nanopore, the target species travels by classical diffusion if the nanopore is sufficiently wide and/or by site-hopping diffusion along the nanopore walls. Fig. 1 also shows the normalized coordinate system inside a single nanopore.

In this work we focus on the 2D model of diffusion inside the nanopores which implies that the latter must be at least several target molecular sizes in diameter. The following mathematical equations describe the diffusion of target species occurring in the nanopore core solution with the same diffusion rate as in the bulk solution, which is coupled with adsorption at nanopore walls and diffusion by hopping between the active sites at the wall surface with the relative diffusion rate  $\eta_{sh} = D_{site hopping}/D_{bulk}$  and the variations of the bulk solution concentration [9]:

$$\frac{\partial a}{\partial \tau} = \frac{\partial^2 a}{\partial y^2} + \varepsilon \left( \frac{\partial^2 a}{\partial \chi^2} + \frac{1}{\chi} \frac{\partial a}{\partial \chi} \right); \tag{1}$$

$$\frac{\partial \theta}{\partial \tau} = \eta_{\rm sh} \frac{\partial^2 \theta}{\partial y^2} + \lambda_0 [(1-\theta)a - \theta\kappa]; \qquad (2)$$

$$\frac{\mathrm{d}c}{\mathrm{d}\tau} = -\upsilon\varphi(c - \langle a \rangle_{y=0}) = -\upsilon\varphi\left(c - 2\int_{0}^{1}a_{y=0}\,\chi\,\mathrm{d}\chi\right), \tag{3}$$

where y = x/L and $\chi = \rho/R_{pore}$ are the normalized nanopore axial and radial coordinates, respectively, and $\varepsilon = (L/R_{pore})^2$.

The system (1)-(3) is associated with the following initial conditions (  $\tau = 0$  ):

$$c = 1; \tag{4a}$$

$$0 < y \le 1,\ 0 \le \chi \le 1: \quad a(y, \chi, 0) = 0,\ \theta(y, 0) = 0; \tag{4b}$$

$$y = 0,\ 0 \le \chi \le 1: \quad a(0, \chi, 0) = 1,\ \theta(0, 0) = 0, \tag{4c}$$

and boundary conditions (  $\tau > 0$  ):

$$0 \le y < 1,\ \chi = 0: \quad \left(\partial a / \partial \chi\right)_{\chi = 0} = 0; \tag{5a}$$

$$y = 1,\ 0 \le \chi \le 1: \quad \left(\partial a / \partial y\right)_{y=1} = 0; \tag{5b}$$

$$y = 1,\ \chi = 1: \quad \left(\partial \theta / \partial y\right)_{y=1} = 0; \tag{5c}$$

$$0 \le y < 1,\ \chi = 1: \quad \left(\frac{\partial a}{\partial \chi}\right)_{\chi=1} = -\frac{\lambda_0 \Xi_0}{2\varepsilon} \left[(1-\theta)a - \theta\kappa\right]; \tag{5d}$$

$$y = 0,\ 0 \le \chi \le 1: \quad \left(\frac{\partial a}{\partial y}\right)_{y=0} = -\varphi(c - a_{y=0}); \tag{5e}$$

$$y = 0,\ \chi = 1: \quad \left(\partial \theta / \partial y\right)_{y=0} = 0. \tag{5f}$$

Equations (1)-(3) describe the evolution of the three main quantities that reflect the system behavior: the normalized concentration of the target species inside the nanopore, $a = C/C_0^b$ (where C is the concentration and $C_0^b$ is the initial bulk concentration of the target species), the normalized bulk concentration of the target species, $c = C^b/C_0^b$, and the adsorption wall coverage, $\theta$, representing the fraction of occupied active chemical sites inside the nanotube. Adsorption kinetics are defined by the dimensionless adsorption rate constant $\lambda_0 = k_{ads} L^2 C_0^b / D_{bulk}$, where $k_{ads}$ is the real adsorption rate constant (in $M^{-1}s^{-1}$), and by the dimensionless pore storage parameter $\Xi_0 = (2\pi R_{pore}L)\Gamma_{site}/[(\pi R_{pore}^2L)C_0^b] = Q_{max}^{wall}/Q_{max}^{pore}$, representing the ratio of the target species amount storable by the nanopore walls to that contained in the nanopore core volume at the initial concentration, with $\Gamma_{site}$ being the surface concentration of active sites.

We should remark here that the rate of decrease of the bulk concentration of target species due to their adsorption at the nanopore walls depends on (i) the mass transport conditions at the entrance to a single nanopore (including the concentration change across the pore entrance) and (ii) the effective number of all nanopores in the particle mixture. The concentrations in the bulk solution and inside the pore are coupled through equations (3) and (5e). The effects of converging diffusion from the outer boundary of the hydrodynamic diffusion layer formed around a particle towards a single nanopore opening are taken into account by the parameter  $\varphi = (L/\delta)(1+\delta/R_{part})(1+\omega/R_{pore})^2$ , which describes the ensuing amplification of the target species flux, with  $R_{part}$  being the nanoporous particle radius. On the other hand, the parameter

$$\upsilon = \frac{4\pi R_{part}^3 N_{part}}{3V^b} \left(\frac{R_{pore}}{R_{pore} + \omega}\right)^2,$$

where  $N_{part}$  is the total number of nanoporous particles in the assembly and  $V^b$  is the bulk solution volume, gives the ratio between the total volume of the pores and the bulk solution volume, thus characterizing the particle mixture as a whole (see below).

It should be noted that differential equation (3) introduces a slight simplification of the real physical picture of diffusion at the nanopore entrance by considering the average concentration over the nanopore entrance,  $\langle a \rangle_{y=0}$. However, this simplification has a minor effect on the overall model, since any variations of the target species concentration along the nanopore radius caused by converging diffusion into the nanopore are smoothed out as soon as the diffusion front has penetrated into the nanopore by more than a few of its radii. Since in real systems the nanopore length greatly exceeds its radius, differential equation (3) is accurate for the whole duration of interest of the diffusion-adsorption process except its very beginning (which is too short to be observed experimentally).

The effect of site-hopping diffusion (described by the first term on the right-hand side of (2)) on the overall kinetics of target species sequestration is expected to be negligible, because the redistribution of adsorbed target ions through their exchange between active sites is significantly slower than diffusion in the bulk solution.

### III. ANALYSIS OF THE MODELS AND GENERAL THEORETICAL RESULTS

### A. Adsorption effectiveness

It is clear that high efficiency of inorganic particle "sponges" aimed at selective sequestration of desired ions may be achieved when adsorption of the target species is very strong compared to its desorption. Therefore, in the following analysis we assume that  $K_{des} = k_{des}/k_{ads}$, the desorption equilibrium constant introduced in [9], where  $k_{des}$  is the desorption rate constant in  $s^{-1}$, is negligibly small, so that the dimensionless parameter  $\kappa = K_{des}/C_0^b$  is also small enough not to affect the system behavior. Note that this simplification corresponds to an experimentally desirable situation and depends only on the reactivity of the active sites.

Next we wish to establish the conditions corresponding to high adsorbing capacity of mesoporous organosilica particle mixtures of  $N_{part}$  particles with respect to a given solution volume  $V^b$  containing the target species at the initial concentration  $C_0^b$ . It was established in [9] that the normalized limiting concentration at  $\tau \rightarrow \infty$  of target species in the bulk solution is given by the following expression:

$$c_{\infty} = \frac{1 - \kappa (1 + \upsilon) - \upsilon \Xi_0 + \sqrt{\left(1 - \kappa (1 + \upsilon) - \upsilon \Xi_0\right)^2 + 4\kappa (1 + \upsilon)}}{2(1 + \upsilon)}, \tag{6}$$

which takes into account the limiting adsorptive wall coverage that depends on adsorption-desorption kinetics. Under the assumption of a negligible desorption rate (viz.,  $\kappa \rightarrow 0$ ) the normalized limiting concentration of target species simplifies to:

$$c_{\infty} = \frac{1 - \upsilon \Xi_0 + \left| 1 - \upsilon \Xi_0 \right|}{2(1 + \upsilon)} \,. \tag{7}$$
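As a quick numerical check of how (6) reduces to (7), the following minimal Python sketch evaluates the limiting concentration directly from (6). The function name `limiting_concentration` and the sample values of $\upsilon$ and $\Xi_0$ are illustrative assumptions, not taken from the simulations reported here.

```python
import math


def limiting_concentration(upsilon, xi0, kappa=0.0):
    """Normalized limiting bulk concentration c_inf from Eq. (6).

    With kappa = 0 the square root equals |1 - upsilon*xi0|,
    so the expression coincides with Eq. (7).
    """
    b = 1.0 - kappa * (1.0 + upsilon) - upsilon * xi0
    root = math.sqrt(b * b + 4.0 * kappa * (1.0 + upsilon))
    return (b + root) / (2.0 * (1.0 + upsilon))


# Illustrative values only (hypothetical, chosen to straddle upsilon*xi0 = 1).
c_partial = limiting_concentration(1e-3, 500.0)    # upsilon*xi0 = 0.5
c_depleted = limiting_concentration(1e-3, 2000.0)  # upsilon*xi0 = 2
c_small_kappa = limiting_concentration(1e-3, 2000.0, kappa=1e-6)
```

For $\upsilon \Xi_0 < 1$ the result is $(1 - \upsilon \Xi_0)/(1 + \upsilon)$, while for $\upsilon \Xi_0 \ge 1$ and $\kappa = 0$ it vanishes.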

The limiting concentration depends exclusively on two dimensionless parameters:  $\upsilon$  and  $\Xi_0$ . The first parameter,  $\upsilon$ , reflects the ratio of the cumulative volume of all pores in all particles and the solution volume:

$$\upsilon = \frac{4\pi R_{\text{part}}^3 N_{\text{part}}}{3V^b} \left(\frac{R_{\text{pore}}}{R_{\text{pore}} + \omega}\right)^2 = \frac{V_{\text{pore}}^{\text{tot}}}{V^b}. \tag{8}$$

The second parameter,  $\Xi_0$, indicates the potential efficiency of adsorptive removal of target species from the solution and is equal to the ratio between the maximum adsorption capacity of the pore walls and the quantity of target species contained in the pore volume at the initial concentration  $C_0^b$  (see above).

It is clear that the highest efficiency of the system is achieved when  $c_{\infty}$  is sufficiently close to zero, i.e. when most of the target species may be extracted from the solution by the mesoporous particles. It is directly apparent from (7) that the limiting concentration  $c_{\infty}$  is identically zero when

$$\upsilon \ge \Xi_0^{-1},\tag{9}$$

which corresponds to complete depletion of the solution. On the other hand, this result reflects the fact that complete extraction of the target species from the solution is possible only when the number of active sites in all the particles present in the solution exceeds the number of target molecules or ions in the initial solution, since

$$\upsilon \Xi_{0} = \frac{N_{\text{part}} \left( \frac{4\pi R_{\text{part}}^{2}}{\pi (R_{\text{pore}} + \omega)^{2}} \right) (2\pi R_{\text{pore}} L) \Gamma_{\text{site}}}{C_{0}^{b} V^{b}} = \frac{Q_{\text{max}}^{\text{wall tot}}}{Q_{0}^{b}}, \tag{10}$$

which should be not less than unity for an efficient system.

It should be noted that, from the point of view of efficiency, one also should impose an upper limit on the parameter  $\upsilon$  in order to define the maximum ratio of the particles volume to that of the solution.

### B. Kinetics of target species sequestration

Equation (7) represents an important result concerning the *potential* (thermodynamic) efficiency of the system. However, the kinetics of the overall process are crucial, since the time required to reach the desired level of target species sequestration (and hence the *practical* efficiency) depends on the reactivity of the active sites lining the nanopore walls, their diffusional accessibility, the rate of supply of target species to the nanopore entrances and the rate of the bulk concentration decay. This is an important factor since under effective regimes of application the particles are confined inside a reactor traversed by the fluid containing the target ion to be removed by the particles. Therefore, what matters is the time required for a nanopore to fill up to a given fraction of its nominal capacity, which should be comparable with a reasonable residence time of the fluid to be remediated in the reactor. Hence, in the following we present data for the dimensionless times  $\tau_{75\%}$  necessary to reach 75% of the thermodynamic capacity reflected by (7), i.e. the times at which the quantity  $f(\tau) = (1 - c(\tau))/(1 - c_{\infty})$  reaches 0.75. In all the calculations quoted hereafter the value of the dimensionless desorption equilibrium constant  $\kappa$  was kept at  $10^{-6}$  to ensure that desorption effects are indeed negligible.
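The definition of  $\tau_{75\%}$  can be illustrated with a short Python sketch that locates the first time at which  $f(\tau)$  reaches a given fraction in a sampled curve  $c(\tau)$ . The helper name `time_to_fraction`, the linear interpolation between samples, and the exponential test curve are assumptions made purely for illustration; they are not the simulated data of this paper.

```python
import math


def time_to_fraction(taus, cs, c_inf, fraction=0.75):
    """Return the first time at which f = (1 - c)/(1 - c_inf) reaches `fraction`.

    `taus` and `cs` are equally long sequences sampling c(tau) in increasing
    time order; c_inf must be smaller than 1. Linear interpolation is used
    between the two samples that bracket the crossing. Returns None if the
    level is never reached within the sampled interval.
    """
    prev_tau, prev_f = None, None
    for tau, c in zip(taus, cs):
        f = (1.0 - c) / (1.0 - c_inf)
        if f >= fraction:
            if prev_tau is None:
                return tau
            weight = (fraction - prev_f) / (f - prev_f)
            return prev_tau + weight * (tau - prev_tau)
        prev_tau, prev_f = tau, f
    return None


# Hypothetical decay curve used only to exercise the helper:
# c(tau) = exp(-tau) with c_inf = 0, so f reaches 0.75 at tau = ln 4.
taus = [0.01 * n for n in range(1001)]
cs = [math.exp(-t) for t in taus]
tau_75 = time_to_fraction(taus, cs, c_inf=0.0)
```

In practice the sampled curve would come from the numerical solution of (1)-(5), with  $c_{\infty}$  taken from (6) or (7).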



Fig. 3. Dependence of  $\log \tau_{75\%}$  on  $\lambda_0$  and  $\Xi_0$ .

It is clear that the dimensionless mathematical model given by (1)-(5) depends on five independent dimensionless parameters, namely  $\lambda_0$,  $\Xi_0$,  $\varepsilon$,  $\varphi$  and  $\upsilon$  (see [9] and above for the definitions). First we investigate how the interplay between the adsorption kinetics ( $\lambda_0$ ) and the relative quantity of adsorption sites ( $\Xi_0$ ) determines the dimensionless time required to reach the acceptable level of solution sequestration (75% in our case). It is evident from Fig. 3 that  $\tau_{75\%}$  depends mostly on the parameter  $\Xi_0$ , and that the dependence on the kinetic parameter  $\lambda_0$  reveals itself only for sufficiently low values of  $\lambda_0$ , as predicted in [9] (see the schematic kinetic zone diagram in Fig. 2 and Fig. 4b of ref. [9]). When the adsorption rate  $\lambda_0$  is extremely fast the process becomes diffusion-limited, in which case the apparent pore filling rate depends on the parameter  $\Xi_0$ , which results in the sole dependence on this parameter at high  $\lambda_0$. When the adsorption kinetics are slow compared to diffusion, the diffusion-adsorption behavior inside the nanopores is more complex and is dictated by  $\Xi_0$. In the case of low  $\Xi_0$  the overall rate of adsorption is low, so that the nanotubes are filled before any effective reaction may start, which results in a virtually uniform growth of adsorption coverage in the whole nanopore. However, when there is a large excess of adsorption sites the propagation of target species into the pore slows down (similarly to the fast adsorption kinetics limit), but in this case due to the large adsorptive capacity of the walls, which leads to an almost identical temporal development of the concentration and coverage profiles.

The distribution of the quantity  $\tau_{75\%}$  presented in Fig. 3 must be considered together with the limiting solution sequestration, which is given by (7), as a function of  $\log(\upsilon \Xi_0)$  (note that the value  $\log \Xi_0 = 3$  approximately corresponds to  $\log(\upsilon \Xi_0) = 0$ , i.e. to the conditions under which the number of active sites equals the number of target species molecules or ions). As evident from Fig. 3, the time  $\tau_{75\%}$  first increases and then either decreases with increasing  $\Xi_0$  for low values of  $\lambda_0$  or reaches a plateau for larger values of  $\lambda_0$. This occurs for the following reasons. When  $\Xi_0$  is small the quantity of the target species that may be taken from the solution by the particles is also small, which implies that the sequestration level of 75% is quickly reached. When  $\Xi_0$  increases, the number of available sites also increases and more of the target species has to be (i) delivered to the active sites and (ii) bound by them. This leads to the increase of  $\tau_{75\%}$. However, a further increase of  $\Xi_0$  corresponds to a situation when the capacity of the pore walls becomes larger than the amount of target species in the solution (i.e. when  $\upsilon \Xi_0 > 1$ ) and the number of active sites is no longer a limiting factor in sequestration efficiency. Thus for low values of  $\lambda_0$  the dimensionless time  $\tau_{75\%}$  decreases due to the increasing abundance of active sites, while for large values of  $\lambda_0$  this quantity reaches a plateau due to the immediate binding of all target species entering the nanopores. Under the latter conditions the value of  $\tau_{75\%}$  is limited exclusively by the rate of transport of target species towards the nanopore openings (see below).

It is also worth noting that for practical applications one wishes to reach a high level of sequestration of the target species, which corresponds to the conditions when  $\upsilon \Xi_0 \ge 1$  (see above), i.e. when all target species ions or molecules may be extracted from the solution at infinite time  $(c_{\infty} = 0)$ , as follows from (7). Therefore only the corresponding parts of the plots in Fig. 3 are of practical interest.

Let us now investigate the dependence of  $\tau_{75\%}$  on the parameters  $\varepsilon$  and  $\varphi$ , which define the geometry of the system and the rate of supply of target species to the mesopore entrance. The ranges of these parameters are limited by the physical meaning of the real parameters composing these dimensionless quantities. Thus we shall not consider values of  $\varepsilon$  smaller than 100, which corresponds to the case when the average pore length is only 10 times its radius. For larger pore radii the model may introduce a significant bias due to the original geometrical assumptions. The value of  $\varphi = (L/\delta)(1 + \delta/R_{part})(1 + \omega/R_{pore})^2$  is also limited from below due to the fact that  $(1+\omega/R_{pore})^2 \ge 1$  and  $(L/\delta)(1+\delta/R_{part}) = (R_{part}/\delta+1)/3$. Hence the lower bound for  $\varphi$  is 1/3 (which corresponds to infinitely thin nanopore walls and a still bulk solution giving rise to an infinite diffusion layer).



Fig. 4.  $\log \tau_{75\%}$  as a function of parameters  $\varepsilon$  and  $\phi$  for constant product  $\upsilon \Xi_0 = 1$  and  $\lambda_0 = 0.1$ .

The dependence of  $\tau_{75\%}$  on  $\varepsilon$  and  $\varphi$  was simulated while keeping the product  $\upsilon \Xi_0$  equal to unity, so that the number of available adsorptive sites always equals the number of target species molecules or ions. In this case the resulting data reflect exclusively the effects of nanopore geometry (through the variation of  $\varepsilon$ , which may be interpreted as a variation of the nanopore radius  $R_{pore}$ ) and of the rate of supply of target species from the solution bulk to the nanopore entrance (represented by  $\varphi$ , which in turn depends on the nanotube packing parameter  $\omega$  as well as on the thickness of the diffusion layer  $\delta$  imposed by hydrodynamic mixing of the solution). However, in doing so one needs to ensure that the actual density of active sites at the pore walls remains the same for all values of  $\varepsilon$ , which is achieved by fixing the value of the dimensionless quantity  $\Gamma_{site}/(LC_0^b)$  and recalculating the values  $\Xi_0 = 2\sqrt{\varepsilon} \times \Gamma_{site}/(LC_0^b)$  and  $\upsilon = \Xi_0^{-1}$. Therefore changing  $\varepsilon$  changes the values of two other parameters. The simulation results obtained for the parameter values  $\lambda_0 = 0.1$,  $\Gamma_{site}/(LC_0^b) = 5.35$  are shown in Fig. 4.
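The recalculation of the dependent parameters described above can be sketched in a few lines of Python. The function name `dependent_parameters` is hypothetical; the default value 5.35 for  $\Gamma_{site}/(LC_0^b)$  is the one quoted in the text, and the sample values of  $\varepsilon$  are arbitrary.

```python
import math


def dependent_parameters(epsilon, gamma_ratio=5.35):
    """Return (xi0, upsilon) for a given epsilon at fixed Gamma_site/(L*C0b).

    xi0 = 2*sqrt(epsilon)*gamma_ratio keeps the surface density of active
    sites fixed, and upsilon = 1/xi0 keeps the product upsilon*xi0 at unity.
    """
    xi0 = 2.0 * math.sqrt(epsilon) * gamma_ratio
    upsilon = 1.0 / xi0
    return xi0, upsilon


# Arbitrary sample values of epsilon (not the grid used for Fig. 4).
table = {eps: dependent_parameters(eps) for eps in (100.0, 1.0e4, 1.0e6)}
```

Each entry of `table` then supplies the pair  $(\Xi_0, \upsilon)$  to be used together with the chosen  $\varepsilon$  in a simulation run.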

As expected, the values of  $\tau_{75\%}$  are lower for low values of  $\varepsilon$  (i.e., for larger pores, since this facilitates diffusional supply) and for high values of  $\varphi$ , which correspond e.g. to a thin diffusion layer around the particles and hence to a faster supply of target species to the nanopore entrances. It is also evident from Fig. 4 that when the diffusion layer is sufficiently thin (large  $\varphi$ ) the value of  $\tau_{75\%}$  becomes virtually independent of  $\varphi$ , which indicates that in this case the pore geometry is the main limiting factor determining the time-efficiency of the system. On the other hand, when the value of  $\varphi$  is not very high (i.e. less than ca. 0.75) the rate of diffusion inside the nanopore appears much faster than the rate of supply of target species towards the nanopore entrance. Under these circumstances the time  $\tau_{75\%}$  necessary to achieve 75% sequestration with respect to  $c_{\infty}$  is limited by the rates of target species supply and its adsorption at the pore walls, i.e. according to (5d) and (5e)

$$\tau_{75\%} \propto \frac{\lambda_0 \Xi_0 \varphi}{2\varepsilon}. \tag{11}$$

Bearing in mind that  $\Xi_0 = 2\sqrt{\varepsilon} \times \Gamma_{site} / (LC_0^b)$ , one can deduce that within the specified range of  $\varphi$  the time  $\tau_{75\%}$  is constant whenever

$$\log \varphi - \frac{\log \varepsilon}{2} = \text{const} . \tag{12}$$

This relationship is clearly observable in the lower part of Fig. 4 where the contour lines of  $\tau_{75\%}$  are virtually straight and have the slope of 1/2.
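The step from (11) to (12) can be spelled out as follows. This is only a restatement of the substitution described above, taking (11) as given and holding  $\lambda_0$  and  $\Gamma_{site}/(LC_0^b)$  fixed, as in the simulations of Fig. 4:

$$\frac{\lambda_0 \Xi_0 \varphi}{2\varepsilon} = \frac{\lambda_0 \varphi}{2\varepsilon} \cdot 2\sqrt{\varepsilon}\,\frac{\Gamma_{site}}{LC_0^b} = \lambda_0\,\frac{\Gamma_{site}}{LC_0^b} \cdot \frac{\varphi}{\sqrt{\varepsilon}},$$

$$\log \frac{\varphi}{\sqrt{\varepsilon}} = \log \varphi - \frac{\log \varepsilon}{2}.$$

Hence the right-hand side of (11) stays constant exactly when  $\log \varphi - (\log \varepsilon)/2$  is constant, i.e. along straight lines of slope 1/2 in the  $(\log \varepsilon, \log \varphi)$  plane.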

### IV. COMPUTATIONAL DETAILS

The computer program for the presented numerical simulations was written in Borland Delphi 7 Professional Edition and executed on a PC equipped with Pentium D processor at 2.8 GHz and 512 MB of RAM.

The model was discretized using the fully implicit finite difference scheme [10] on a uniform grid in both spatial directions and time. The typical grid size was  $N_y \times N_\chi \times N_\tau = 100 \times 20 \times 2000$ , which ensured numerical convergence of better than 1% for all considered parameter ranges.
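To illustrate what a fully implicit (backward Euler) finite difference step looks like, here is a minimal Python sketch for a deliberately simplified problem: one-dimensional diffusion  $\partial u/\partial \tau = \partial^2 u/\partial y^2$  on  $0 \le y \le 1$  with  $u = 1$  held at  $y = 0$  and zero flux at  $y = 1$ . It is not the Delphi program used for the results above and it omits the radial direction, adsorption and the bulk equation; the function names and the grid values are assumptions chosen for illustration.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal linear system (lower[0] and upper[-1] are unused)."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / m if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / m
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def implicit_step(u, dt, dy):
    """Advance u by one backward-Euler step of u_t = u_yy.

    Boundary conditions: u = 1 at y = 0 (Dirichlet) and zero flux at y = 1,
    the latter imposed through a mirrored ghost node.
    """
    n = len(u)
    r = dt / (dy * dy)
    lower = [-r] * n
    diag = [1.0 + 2.0 * r] * n
    upper = [-r] * n
    rhs = list(u)
    diag[0], upper[0], rhs[0] = 1.0, 0.0, 1.0   # fixed value at y = 0
    lower[-1] = -2.0 * r                          # zero flux at y = 1
    return thomas(lower, diag, upper, rhs)


# Illustrative grid (hypothetical values): 101 nodes, 500 steps of size 0.01.
n = 101
dy = 1.0 / (n - 1)
u = [1.0] + [0.0] * (n - 1)
for _ in range(500):
    u = implicit_step(u, dt=0.01, dy=dy)
```

Because every unknown at the new time level is solved for simultaneously, the step remains stable even though  $\Delta\tau/\Delta y^2 = 100$  here, which is the usual motivation for choosing an implicit scheme.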

### V. CONCLUSION

Numerical simulation has allowed the solution of the system of partial and ordinary differential equations comprising the mathematical model of selective sequestration of target species by spherical mesoporous organosilica particles. Thorough theoretical analysis and simulation of the case of sufficiently large nanopores (with radii of at least several molecular sizes) has yielded the dependences of characteristic time  $\tau_{75\%}$  necessary for reaching 75% sequestration on governing kinetic parameters. These results may be employed for the optimization of mesoporous materials for designing highly efficient systems to be applied for filtration or sequestration of desired ions or molecules from dilute solutions.

### ACKNOWLEDGMENT

The author wishes to express his gratitude to Professor Christian Amatore (ENS, Paris, France) and Dr Alain Walcarius (Nancy University, Nancy, France) for their help in conducting this research.

#### REFERENCES

- [1] H. Yang, G.A. Ozin, C.T. Kresge, "The role of defects in the formation of mesoporous silica fibers, films, and curved shapes," *Adv. Mater.*, vol. 10, pp. 883-887, Aug. 1998.
- [2] I. Oda, K. Hirata, S. Watanabe, Y. Shibata, T. Kajino, Y. Fukushima, S. Iwai, S. Itoh, "Function of membrane protein in silica nanopores: incorporation of photosynthetic light-harvesting protein LH2 into FSM," *J. Phys. Chem. B*, vol. 110, pp. 1114-1120, Jan. 2006.
- [3] P.V. Braun in *Nanocomposite Science and Technology*, P.M. Ajayan, L.S. Schadler, P.V. Braun, Eds. Weinheim: Wiley-VCH Verlag GmbH & Co. KGaA, 2003, pp. 155-214.
- [4] A. Walcarius, E. Sibottier, M. Etienne, J. Ghanbaja, "Electrochemically assisted self-assembly of mesoporous silica thin films," *Nature Materials*, vol. 6, pp. 602-608, Aug. 2007.
- [5] M. Etienne, A. Quach, D. Grosso, L. Nicole, C. Sanchez, A. Walcarius, "Molecular transport into mesostructured silica thin films: electrochemical monitoring and comparison between *p6m*, *P6*<sub>3</sub>/*mmc*, and *Pm3n* structures," *Chem. Mater.*, vol. 19, pp. 844-856, Feb. 2007.
- [6] H. Han, H. Frei, "Visible light absorption of binuclear TiOCo<sup>II</sup> charge-transfer unit assembled in mesoporous silica," *Microporous and Mesoporous Materials*, vol. 103, pp. 265-272, Jun. 2007.
- [7] G. Wang, B. Zhang, J.R. Wayment, J.M. Harris, H.S. White, "Electrostatic-gated transport in chemically modified glass nanopore electrodes," *J. Am. Chem. Soc.*, vol. 128, pp. 7679-7686, Jun. 2006.
- [8] S. Brandès, G. David, C. Suspène, R. J. P. Corriu, R. Guilard, "Exceptional affinity of nanostructured organic-inorganic hybrid materials towards dioxygen: Confinement effect of copper complexes," *Chem. Eur. J.*, vol. 13, pp. 3480-3490, Apr. 2007.
- [9] C. Amatore, "Theoretical trends of diffusion-reaction into tubular nano- and mesoporous structures: A general physicochemical and physicomathematical modeling," *Chem. Eur. J.*, to be published.
- [10] R.D. Richtmyer, K.W. Morton, *Difference Methods for Initial-Value Problems*, 2nd ed. New York: Wiley-Interscience, 1967.

**Oleksiy V. Klymenko** is a Senior Scientist of the Mathematical and Computer Modelling Laboratory of Kharkov National University of Radioelectronics (KNURE). He received the D. Phil. degree in computational electrochemistry from the University of Oxford, UK in 2004. In 2006 he received the Candidate of Physical and Mathematical Sciences degree (equivalent to PhD) from the A.N. Podgorny Institute for Mechanical Engineering Problems of the National Academy of Sciences of Ukraine, Kharkov, Ukraine. His research interests include mathematical physics, mathematical modelling in physical chemistry and biology, numerical methods, and the solution of inverse problems in the sciences.

## Solving Parallel Multi Component Automata Equations

Natalia Shabaldina, Nina Yevtushenko

Abstract — The problem of designing the unknown component of a system of interacting automata that combined with the known part of the system meets the specification, is well known. However, most publications are devoted to solving the problem for the proper composition of two automata. In this paper, we consider a parallel multi component automata equation and propose two methods for deriving a largest solution to this equation (if the equation is solvable). In particular, we show that the union of alphabets over all known components and of that of the specification is the largest alphabet of actions over which a solution for a solvable equation should exist, and show how a solution over an appropriate alphabet can be derived from such a largest solution.

Index Terms — Automata, Equations, Discrete event systems

#### I. INTRODUCTION

Many problems over discrete event systems can be reduced to solving a language inequality  $A \otimes X \subseteq S$  or a language equation  $A \otimes X = S$ , where  $X$  is a free variable and  $\otimes$  is the composition operator. For different applications, appropriate equations have been formulated and their solutions investigated by various researchers. Most papers in process algebra (see, for example, [1-5]) are devoted to solving equations over parallel composition, which allows arbitrary delay between communication events. In this paper, we consider a parallel multi component automata equation and propose two methods for deriving a largest solution to this equation (if the equation is solvable). In particular, we show that the union of alphabets over all known components and of that of the specification is the largest alphabet of actions over which a solution for a solvable equation should exist, and show how a solution over an appropriate alphabet can be derived from such a largest solution.

### II. PARALLEL EQUATIONS OVER AUTOMATA

### A. Parallel composition operator

Let  $A$  be an alphabet and let  $L$  be a language over  $A$. Given a non-empty subset  $A_1$  of the alphabet  $A$  and a sequence  $\alpha$  over  $A$ , the  $A_1$-*restriction* of  $\alpha$, denoted  $\alpha_{\Downarrow A_1}$, is the sequence obtained from  $\alpha$  by erasing symbols in  $A \setminus A_1$. The language  $L_{\Downarrow A_1} = \{\beta_{\Downarrow A_1} : \beta \in L\}$  is the *restriction* of language  $L$  onto the subset  $A_1$. For each word  $\beta \in L$  that has no symbols of alphabet  $A_1$ , the restriction of  $\beta$  is the empty word  $\varepsilon$. Now let language  $L$  be defined over alphabet  $A_1 \subseteq A$. The language  $L_{\Uparrow A} = \{\beta \in A^* : \beta_{\Downarrow A_1} \in L\}$  is the *expansion* of language  $L$  over the alphabet  $A$.
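The two operators can be illustrated on explicit words. In the following minimal Python sketch, words are tuples of symbols and a finite language is a set of such tuples; the helper names `restrict`, `restrict_language` and `in_expansion` are hypothetical. Since the expansion of a non-empty language is infinite whenever  $A \setminus A_1$  is non-empty, it is represented here only by a membership test.

```python
def restrict(word, sub_alphabet):
    """Erase every symbol of `word` that is not in `sub_alphabet`."""
    return tuple(a for a in word if a in sub_alphabet)


def restrict_language(language, sub_alphabet):
    """Restriction of a finite language: restrict each of its words."""
    return {restrict(w, sub_alphabet) for w in language}


def in_expansion(word, language, sub_alphabet):
    """Decide whether `word` belongs to the expansion of `language`.

    `language` is a finite language over `sub_alphabet`; a word over the
    larger alphabet is in the expansion iff its restriction is in `language`.
    """
    return restrict(word, sub_alphabet) in language


# Small illustration with made-up symbols.
L = {("x", "u", "o"), ("u", "v")}
L_restricted = restrict_language(L, {"x", "o"})
```

Here `L_restricted` contains the word `("x", "o")` and the empty word `()`, the latter coming from `("u", "v")`, which has no symbols of the sub-alphabet.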

In this paper, we consider equation solving over finite automata. A finite automaton (or simply an automaton throughout this paper) is a quintuple  $S = \langle S, A, \delta_S, s_0, F_S \rangle$, where  $S$  is a finite nonempty set of states with the initial state  $s_0$  and a subset  $F_S$  of *final* (or *accepting*) states,  $A$  is an alphabet of actions, and  $\delta_S \subseteq A \times S \times S$  is a transition relation. We say that there is a transition from a state  $s$  to a state  $s'$  labeled with an action  $a$  if and only if the triple  $(a, s, s')$  is in the transition relation  $\delta_S$. The automaton  $S$  is called *deterministic* if for each state  $s \in S$  and any action  $a \in A$  there exists at most one state  $s'$  such that  $(a, s, s') \in \delta_S$. If  $S$  is not deterministic, then it is called *nondeterministic*. As usual, the transition relation  $\delta_S$  of the automaton  $S$  can be extended to sequences over the alphabet  $A$. Given a state  $s$  of the automaton  $S$ , the set  $L_s(S) = \{\alpha \in A^* \mid \exists s' \in F_S\ ((\alpha, s, s') \in \delta_S)\}$  is called the language generated at the state  $s$. The language generated by the automaton  $S$  at the initial state is called the language generated or accepted by the automaton  $S$  and is denoted by  $L(S)$ , for short. Automata  $S$  and  $T$  are called *equivalent* ( $T \cong S$ ) if  $L(T) = L(S)$. Automaton  $T$  is a *reduction* of automaton  $S$  ( $T \leq S$ ) if  $L(T) \subseteq L(S)$. In the same way, the equivalence and the reduction relations are defined between two states of an automaton. Well-known results state that for each nondeterministic automaton there exists an equivalent deterministic automaton [6]. It is also well known how the union, intersection, complementation, restriction and expansion over deterministic automata can be derived [4-6].
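The definitions above translate almost literally into a small data structure. The following Python sketch is one possible encoding, with transitions stored as triples  $(a, s, s')$  as in the definition; the class name `Automaton`, its field names and the example automaton are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Automaton:
    states: frozenset
    alphabet: frozenset
    delta: frozenset      # set of triples (action, state, next_state)
    initial: object
    final: frozenset

    def is_deterministic(self):
        """At most one successor for every (action, state) pair."""
        seen = set()
        for a, s, _ in self.delta:
            if (a, s) in seen:
                return False
            seen.add((a, s))
        return True

    def accepts(self, word):
        """Membership of `word` in the language generated at the initial state."""
        current = {self.initial}
        for a in word:
            current = {t for (b, s, t) in self.delta if b == a and s in current}
        return bool(current & self.final)


# A made-up two-state automaton in which all states are accepting.
example = Automaton(
    states=frozenset({0, 1}),
    alphabet=frozenset({"x", "o"}),
    delta=frozenset({("x", 0, 1), ("o", 1, 0)}),
    initial=0,
    final=frozenset({0, 1}),
)
```

The `accepts` method tracks the set of states reachable under the word read so far, so it works for nondeterministic automata as well.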

Manuscript received March 17, 2008.

Natalia Shabaldina is with the Tomsk State University, 36 Lenin Str., Tomsk, 634050, Russia (e-mail: snv@kitidis.tsu.ru)

Nina Yevtushenko is with the Tomsk State University, 36 Lenin Str., Tomsk, 634050, Russia (e-mail: yevtushenko@elefot.tsu.ru).

Acknowledgments. The authors gratefully acknowledge the partial support of the RFBR-NSC Grant 06-08-89500.

Given  $k$  automata  $F_1, F_2, \dots, F_k$, let the automaton  $F_j$  accept the language  $L_j$,  $j = 1, 2, \dots, k$, over alphabet  $A_j$, let  $A = A_1 \cup A_2 \cup \dots \cup A_k$, and let  $E$  be a non-empty subset of  $A$. The *parallel* composition  $\diamond_E(F_1, F_2, \dots, F_k)$  is the automaton  $\left((F_1)_{\Uparrow A} \cap (F_2)_{\Uparrow A} \cap \dots \cap (F_k)_{\Uparrow A}\right)_{\Downarrow E}$.<sup>1</sup> The automaton  $\diamond_E(F_1, F_2, \dots, F_k)$  has the empty language if one component automaton has the empty language.

### B. Parallel language equations

In this section, we extend the notion of an automata equation to k automata, k > 2, and determine a largest alphabet over which the equation should be solvable.

### Extending the formula for a largest solution to a multi component automata equation

Given  $k$  automata  $F_1, F_2, \dots, F_{k-1}, F$, let the automaton  $F_j$  accept the language  $L_j$,  $j = 1, 2, \dots, k-1$, over alphabet  $A_j$, let the automaton  $F$  accept the language  $L$  over alphabet  $E$, let  $A = A_1 \cup A_2 \cup \dots \cup A_{k-1} \cup E$, and let  $R$  be a non-empty subset of  $A$. Consider an *automata inequality*  $\diamond_E(F_1, F_2, \dots, F_{k-1}, X) \leq F$  and an *automata equation*  $\diamond_E(F_1, F_2, \dots, F_{k-1}, X) \cong F$  with a free variable  $X$  that is an automaton over alphabet  $R$. An automaton  $F_R$  over the alphabet  $R$  is a *solution* to the inequality if  $\diamond_E(F_1, F_2, \dots, F_{k-1}, F_R) \leq F$. An automaton  $F_R$  is a *solution* to the equation if  $\diamond_E(F_1, F_2, \dots, F_{k-1}, F_R) \cong F$. A solution  $M_R$  over the alphabet  $R$  is a *largest* solution to the inequality (to the equation) if each solution over alphabet  $R$  is a reduction of  $M_R$.

Similarly to a parallel equation over two automata, a multi component parallel automata inequality, as well as a solvable parallel automata equation, always has a largest solution<sup>2</sup>.

**Theorem 1.** 1. Given  $k$  automata  $F_1, F_2, \dots, F_{k-1}, F$, let the automaton  $F_j$  accept the language  $L_j$,  $j = 1, 2, \dots, k-1$, over alphabet  $A_j$, let the automaton  $F$  accept the language  $L$  over alphabet  $E$, let  $A = A_1 \cup A_2 \cup \dots \cup A_{k-1} \cup E$, and let  $R$  be a non-empty subset of  $A$. A largest solution to the automata inequality  $\diamond_E(F_1, F_2, \dots, F_{k-1}, X) \leq F$  is the automaton  $M_R = \overline{\diamond_R(F_1, \dots, F_{k-1}, \overline{F})}$. 2. The automata equation  $\diamond_E(F_1, F_2, \dots, F_{k-1}, X) \cong F$  is solvable if and only if  $\diamond_E(F_1, F_2, \dots, F_{k-1}, M_R) \cong F$. If the equation is solvable then the automaton  $M_R$  is a largest solution to the equation.

### Determining a largest alphabet over which the equation is solvable

The formula of the previous section has exponential complexity. However, the complexity of checking whether there exists an alphabet such that the equation is solvable over this alphabet can be reduced. In this section, we propose an algorithm for deriving a largest solution over alphabet  $A$  without using the complementation and restriction operators, and state that if the equation has no solution over alphabet  $A$  then the equation is not solvable over any subset  $R$  of the alphabet  $A$.

A largest solution to the inequality  $\diamond_E(F_1, F_2, \dots, F_{k-1}, X) \leq F$  over alphabet  $A$  can be derived using the following procedure. Derive the automaton  $M_A$  by adding the designated accepting Don't Care state (DNC) to the set of states of  $(F_1)_{\Uparrow A} \cap \dots \cap (F_{k-1})_{\Uparrow A} \cap F_{\Uparrow A}$. Given a state  $f_1 \dots f_{k-1} f$  of the automaton  $(F_1)_{\Uparrow A} \cap \dots \cap (F_{k-1})_{\Uparrow A} \cap F_{\Uparrow A}$  and an action  $a \in A$, the automaton  $M_A$  has a transition  $(a, f_1 \dots f_{k-1} f, \mathrm{DNC})$  if there exists a component automaton  $F_j$  that has no transition under  $a$  from state  $f_j$. There is a loop at the DNC state for each action  $a \in A$.

**Theorem 2.** 1. Given an automata inequality  $\diamond_E(F_1, F_2, ..., F_{k-1}, x) \leq F$ , the automaton  $M_A$  is a largest solution to the inequality. 2. Given an automata equation  $\diamond_E(F_1, F_2, ..., F_{k-1}, x) \cong F$ , if the automata  $\diamond_E(F_1, F_2, ..., F_{k-1}, M_A)$  and F are equivalent then  $M_A$  is a largest solution to the equation. If the automata  $\diamond_E(F_1, F_2, ..., F_{k-1}, M_A)$  and F are not equivalent then the equation is not solvable over any alphabet  $R \subseteq A$ .

The following proposition indicates how a solution can be derived over a subset  $R \subset A$  (if such a solution exists).

**Proposition 3.** Given an automata inequality  $\diamond_E(F_1, F_2, \dots, F_{k-1}, X) \leq F$, a largest solution  $M_A$  to the inequality over alphabet  $A$  and a proper subset  $R \subset A$, an automaton  $M_R$  over alphabet  $R$  is a solution to the inequality over alphabet  $R$  if and only if the expansion  $(M_R)_{\Uparrow A}$  is a reduction of  $M_A$.

**Corollary.** Given an automata inequality  $\diamond_E(F_1, F_2, \dots, F_{k-1}, X) \leq F$, a largest solution  $M_A$  to the inequality over alphabet  $A$  and a proper subset  $R \subset A$, a largest solution to the inequality over alphabet  $R$  is a largest automaton  $M_R$  (w.r.t. the language) over alphabet  $R$  such that the expansion  $(M_R)_{\Uparrow A}$  is a reduction of  $M_A$.

Based on the above statements a largest solution  $M_R$  over alphabet R to the automata inequality  $\Diamond_E(F_1, F_2, ..., F_{k-1}, x) \leq F$  can be derived in the following steps.

**Step 1.** Derive a largest solution  $M_A$  to the inequality  $\Diamond_E(F_1, F_2, ..., F_{k-1}, x) \leq F$  over alphabet A. If a largest solution  $M_A$  is trivial (the language is empty) then a solution over each alphabet is trivial.

**Step 2.** Derive the largest complete submachine of  $M_A$  over alphabet  $A \setminus R$. If there is no complete submachine then the inequality has only a trivial solution over alphabet  $R$. Otherwise, let  $N$  be the set of states of this submachine. Delete each transition in  $M_A$  to every state that is not in the set  $N$. Denote the obtained automaton by  $P_A$.

<sup>&</sup>lt;sup>1</sup> This definition can be viewed as an extension of the parallel composition of two automata [4]

<sup>&</sup>lt;sup>2</sup> With respect to the reduction relation.

**Step 3.** Derive the set  $K_1$  of all states reachable in  $M_A$  from the initial state under actions of the alphabet  $A \setminus R$ . Assign  $K = \{K_1\}$ ; i = 1;  $M_R$  be a trivial automaton with the initial state  $K_1$  over the alphabet R.

**Step 4.** For each action  $a \in R$  do:

if the transition under *a* is defined at each state of the set  $K_i$  then derive the set  $D_i$  of all states where *a* takes  $M_A$  from states of the set  $K_i$  and the set  $L_i$  of all states reachable in  $M_A$  from the states of the set  $D_i$  under actions of the alphabet  $A \setminus R$ . Add a transition  $(K_i, a, L_i)$  to the automaton  $M_R$  and if  $L_i \notin K$  then add  $L_i$  to the set K.

**Step 5.** Increment *i* by 1. If $i \leq |K|$ then go to Step 4; otherwise END: the automaton $M_R$ is a largest solution to the inequality over alphabet *R*.
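Steps 3 to 5 resemble a subset construction in which the actions of $A \setminus R$ play the role of unobservable moves. The following Python sketch illustrates this under stated assumptions: the automaton is deterministic and given as a dictionary mapping `(state, action)` to the next state, all states are accepting (as in the example of Section III), and the pruning of Step 2 has already been applied. The names `closure` and `restrict` are hypothetical and do not come from the paper.

```python
from collections import deque

def closure(trans, states, hidden):
    """All states reachable from `states` using only hidden actions (A minus R)."""
    seen = set(states)
    queue = deque(seen)
    while queue:
        s = queue.popleft()
        for a in hidden:
            t = trans.get((s, a))
            if t is not None and t not in seen:
                seen.add(t)
                queue.append(t)
    return frozenset(seen)

def restrict(trans, initial, visible, hidden):
    """Sketch of Steps 3-5: returns the initial state K_1 and the transitions of M_R."""
    k1 = closure(trans, [initial], hidden)       # Step 3
    order = [k1]                                 # the set K, in discovery order
    m_r = {}                                     # (K_i, a) -> L_i
    i = 0
    while i < len(order):                        # Step 5: visit every set in K
        k_i = order[i]
        for a in visible:                        # Step 4
            # a transition is added only if `a` is defined at every state of K_i
            if all((s, a) in trans for s in k_i):
                d_i = [trans[(s, a)] for s in k_i]
                l_i = closure(trans, d_i, hidden)
                m_r[(k_i, a)] = l_i
                if l_i not in order:
                    order.append(l_i)
        i += 1
    return k1, m_r

# Hypothetical toy automaton (not the one of Figure 3): 'o' is hidden, 'x' and 'u' are visible.
toy = {(1, 'o'): 2, (1, 'x'): 3, (2, 'x'): 3, (3, 'x'): 3, (1, 'u'): 1}
k1, m_r = restrict(toy, 1, ['x', 'u'], ['o'])
print(sorted(k1), {(tuple(sorted(k)), a): sorted(v) for (k, a), v in m_r.items()})
```

In the toy automaton the action `u` is defined at state 1 but not at state 2, so no transition under `u` leaves the initial set, which mirrors the treatment of action *v* in the example of Section III.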

**Theorem 4.** 1. Given an automata inequality  $\diamond_E(F_1, F_2, ..., F_{k-1}, x) \leq F$ , a largest solution  $M_A$  to the inequality over alphabet A and a proper subset  $R \subset A$ , the automaton  $M_R$  obtained in Step 5 of the above algorithm is a largest solution to the inequality over alphabet R. 2. If  $\diamond_E(F_1, F_2, ..., F_{k-1}, M_R)$  is equivalent to F then  $M_R$  is a largest solution over alphabet R to the equation  $\diamond_E(F_1, F_2, ..., F_{k-1}, M_R) \cong F$ ; otherwise the equation has no solution over alphabet R.

We also recall that not every subset of the largest solution to a parallel language equation inherits the property of being a solution, and that the complete characterization of the set of solutions to a parallel language equation is still an open issue.

### III. EXAMPLE

In this section, we briefly consider a simple example. We solve the equation  $\diamond_E(F_1, X) \cong F$  for automata  $F_1$  and F in Figures 1 and 2 where all states are accepting states. A largest solution over  $A = \{x, o, u, v\}$  is shown in Figure 3.



Fig. 1. Automaton F



Fig. 2. Automaton  $F_1$ 



Fig. 3. Largest solution over  $A = \{x, o, u, v\}$  (after merging equivalent states)

We now derive a largest solution over  $R = \{x, u, v\} = A \setminus \{o\}$ . A transition at state 5 under *o* is undefined; thus, we delete state 5. The initial state of the obtained automaton is  $K_1 = \{1, 2, 3, 4\}$ . There is no transition from this state under *v*, as the transition under *v* is undefined at state 3. However, there is a transition ( $K_1$ , x,  $K_2$ ), where  $K_2 = \{2, 4\}$ . The resulting automaton is shown in Figure 4.



Fig. 4. A largest solution over alphabet  $R = \{x, u, v\}$ 

### IV. CONCLUSION

In this paper we have studied the problem of solving multi-component language equations over a parallel composition operator. In particular, the largest alphabet over which a solution exists is characterized. A number of restricted solutions that are considered for a parallel equation over two automata [5] can be defined in the same way for multi-component automata inequalities and equations. The complexity of the proposed method seems to be lower than that of the known method, since our method does not use automata complements.

### REFERENCES

[1] P. Merlin and G. Bochmann, "On the construction of submodule specifications and communication protocols", *ACM Transactions on Programming Language and Systems*, 5(1): 1–25, January 1983.

[2] H. Qin, P. Lewis, "Factorisation of Finite State Machines under strong and observational equivalences", *Formal Aspects of Computing*, 3:284– 307, Jul.–Sept. 1991.

[3] A. Petrenko, N. Yevtushenko, "Solving asynchronous equations", *Formal description techniques/Protocol specification, testing and verification.* Kluwer Academic Publishers, 1998, pp. 125–140.

[4] K. El-Fakih, N. Yevtushenko, S. Buffalov, G. v. Bochmann, "Progressive solutions to a parallel automata equation", *Theoretical Computer Science*, 362, 2006, pp. 17-32 (The preliminary version was published in *Lectures Notes in Computer Science*, 2003, vol. 2767, pp. 367 – 382).

[5] N. Yevtushenko, T. Villa, R. K. Brayton, A. Petrenko, A. Sangiovanni-Vincentelli, "Solving a parallel language equation", in *Proceedings of the ICCAD '01*, USA, 2001, pp. 103-110.

[6] J. E. Hopcroft and J. D. Ullman, *Introduction to automata theory, Languages, and Computation*, Addison-Wesley, 1979.

# Experiments with the Linear Automata and Synthesis Test to Them

Dmitry Speranskiy

*Abstract* - An overview of results in the theory of experiments with linear automata is given. This theory is a fundamental basis for devising methods of technical diagnosis of discrete systems.

Index Terms – automata, discrete system, technical diagnosis.

### I. INTRODUCTION

The finite-state machine (automaton) is a very simple mathematical model of processes and devices, yet it is convenient and widely used in informatics and engineering. The theory of finite-state machines is a fundamental part of modern informatics, and the theory of experiments with automata is directly connected to the reliability problem of discrete devices.

An automaton is considered as a system whose internal structure is unknown, but whose "external" behavior (the response of the automaton to an input sequence) can be observed.

Results of research in the theory of experiments with finite-state machines up to the 1960s were summed up by A. Gill in [1].

According to A. Gill, an experiment is the process of applying input sequences to an automaton, observing the resulting output sequences, and drawing conclusions from those observations.

One of the central questions in the theory of experiments is how to find an input sequence for an experiment. It is shown in [1] that for the most general model (the Mealy automaton) the methods for constructing such input sequences are very labor-consuming.

One possible way to reduce the complexity is to study a particular class of automata. This article presents results for linear automata (*LA*). The specific character of *LA* simplifies the construction of experiments and significantly decreases their length.

It is important to note that *LA* is an adequate model of many real processes and devices; e.g., devices for encoding and decoding information, signature analysis, and multiplication and division of binary polynomials can be described by *LA* models.

The author of this article has been involved in research on the theory of experiments with automata for a long time. Some of these results are briefly introduced here; others were published in [3].

### II. BASIC DEFINITIONS

Let us begin with the description of the *LA* model [2]. An *LA* is a system with finite numbers *l* and *m* of input and output poles, respectively. Input signals are applied to all inputs simultaneously at discrete time moments. It is assumed that input signals take values from the field $GF(p) = \{0,1,..., p-1\}$, where *p* is a prime number.

The *LA state* is the ordered set of states of the delay elements that are part of the *LA* structure. Let the number of such delays be *n*. The number *n* is called the *LA* dimension, and the state set of the *LA* is denoted by $S_n$.

Let's introduce the following notations:

$$\overline{u}(t) = [u_1(t), ..., u_l(t)]', \ \overline{y}(t) = [y_1(t), ..., y_m(t)]',$$
$$\overline{s}(t) = [s_1(t), ..., s_n(t)]'.$$

Here $\overline{u}(t)$, $\overline{y}(t)$, $\overline{s}(t)$ are the input, output and state vectors, respectively, and *t* is a discrete time moment.

The functioning of *LA* is given by a system of equations of state and output respectively:

$$\overline{s}(t+1) = A\overline{s}(t) + B\overline{u}(t), \qquad (1)$$

$$\overline{y}(t) = C\overline{s}(t) + D\overline{u}(t), \qquad (2)$$

where $A = [a_{i,j}]_{n \times n}$, $B = [b_{i,j}]_{n \times l}$, $C = [c_{i,j}]_{m \times n}$, $D = [d_{i,j}]_{m \times l}$ are called characteristic matrices. Every matrix consists of elements of $GF(p)$.

Using mathematical induction, one can prove that the final state and the output response to the input sequence $\overline{u}(0), \overline{u}(1), \dots, \overline{u}(k)$ of length k+1 can be calculated by the following formulas, where $\overline{s}(0)$ is the initial LA state:

$$\overline{s}(k+1) = A^{k+1}\overline{s}(0) + A^{k}B\overline{u}(0) + A^{k-1}B\overline{u}(1) + \dots + AB\overline{u}(k-1) + B\overline{u}(k), \qquad (3)$$

$$\overline{y}(k) = CA^{k}\overline{s}(0) + CA^{k-1}B\overline{u}(0) + CA^{k-2}B\overline{u}(1) + \dots + CB\overline{u}(k-1) + D\overline{u}(k). \qquad (4)$$
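As an informal check of formulas (3) and (4), the sketch below simulates equations (1) and (2) step by step over $GF(p)$ and compares the result with the closed forms. It is only an illustration: the helper names (`mat_vec`, `mat_mul`, `mat_pow`, `simulate`, `closed_form`) and the example matrices are assumptions, not part of the original text.

```python
def mat_vec(M, v, p):
    """Matrix-vector product over GF(p)."""
    return [sum(a * b for a, b in zip(row, v)) % p for row in M]

def mat_mul(X, Y, p):
    """Matrix product over GF(p)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) % p
             for j in range(len(Y[0]))] for i in range(len(X))]

def mat_pow(M, e, p):
    """M raised to the power e over GF(p); e = 0 gives the identity matrix."""
    n = len(M)
    R = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(e):
        R = mat_mul(R, M, p)
    return R

def vec_add(u, v, p):
    return [(a + b) % p for a, b in zip(u, v)]

def simulate(A, B, C, D, s0, inputs, p):
    """Apply (1) and (2) step by step; returns (s(k+1), y(k))."""
    s, y = s0, None
    for u in inputs:
        y = vec_add(mat_vec(C, s, p), mat_vec(D, u, p), p)   # equation (2)
        s = vec_add(mat_vec(A, s, p), mat_vec(B, u, p), p)   # equation (1)
    return s, y

def closed_form(A, B, C, D, s0, inputs, p):
    """Evaluate formulas (3) and (4) directly; returns (s(k+1), y(k))."""
    k = len(inputs) - 1
    s = mat_vec(mat_pow(A, k + 1, p), s0, p)
    y = mat_vec(mat_mul(C, mat_pow(A, k, p), p), s0, p)
    for j, u in enumerate(inputs):
        Bu = mat_vec(B, u, p)
        s = vec_add(s, mat_vec(mat_pow(A, k - j, p), Bu, p), p)
        if j < k:
            y = vec_add(y, mat_vec(mat_mul(C, mat_pow(A, k - 1 - j, p), p), Bu, p), p)
    y = vec_add(y, mat_vec(D, inputs[k], p), p)
    return s, y

# Hypothetical two-dimensional LA over GF(2) with one input and one output.
A, B, C, D = [[0, 1], [1, 1]], [[1], [0]], [[1, 0]], [[1]]
s0, inputs = [1, 0], [[1], [0], [1]]
print(simulate(A, B, C, D, s0, inputs, 2) == closed_form(A, B, C, D, s0, inputs, 2))
```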

Manuscript received February 4, 2008.

Dmitry Speranskiy is with Saratov State University, *E-mail*: <u>SperanskiyDV@info.sgu.ru</u>

Now we define the types of experiments used in our research. For compactness, we do so for the general model of the Mealy automaton.

A Mealy automaton is a five-tuple

$$A = (S, X, Y, \delta, \lambda)$$

where *S*, *X*, *Y* are the finite sets of states, input symbols and output symbols, respectively, and $\delta: S \times X \to S$ and $\lambda: S \times X \to Y$ are maps called the transition and output functions. Let $S = \{s_1, ..., s_n\}$ and $X = \{x_1, ..., x_l\}$.

**Definition 1.** The input sequence  $p = x_{i_1}, x_{i_2}, ..., x_{i_a}$  is called a synchronizing sequence (SS) if  $\forall s_{j_1}, s_{j_2} \in S \quad \delta(s_{j_1}, p) = \delta(s_{j_2}, p)$ .

**Definition 2.** The input sequence  $p = x_{i_1}, x_{i_2}, ..., x_{i_a}$  is called a homing sequence (*HS*) if  $\forall s_{j_1}, s_{j_2} \in S$  $\lambda(s_{j_1}, p) = \lambda(s_{j_2}, p) \rightarrow \delta(s_{j_1}, p) = \delta(s_{j_2}, p)$ .

It is obvious that an SS is a special case of an HS: applying an SS brings the automaton to a known final state, and there is no need to observe the automaton's response.

**Definition 3.** The input sequence $p = x_{i_1}, x_{i_2}, ..., x_{i_a}$ is called a diagnostic sequence (DS) if $\forall s_{j_1}, s_{j_2} \in S \quad \lambda(s_{j_1}, p) = \lambda(s_{j_2}, p) \to s_{j_1} = s_{j_2}$.

It is clear that every *DS* is also an *HS*, but the converse is false.
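Definitions 1 to 3 can be checked by brute force on a small Mealy automaton. The sketch below assumes that $\delta$ and $\lambda$ are stored as dictionaries keyed by `(state, input)` and extended to input words in the usual way; the function names and the two toy automata are hypothetical illustrations.

```python
from itertools import product

def run(delta, lam, s, word):
    """Extend delta and lambda to an input word: returns (final state, output word)."""
    out = []
    for x in word:
        out.append(lam[(s, x)])
        s = delta[(s, x)]
    return s, tuple(out)

def is_synchronizing(delta, lam, states, word):
    # Definition 1: all initial states lead to the same final state.
    return len({run(delta, lam, s, word)[0] for s in states}) == 1

def is_homing(delta, lam, states, word):
    # Definition 2: equal output words imply equal final states.
    results = [run(delta, lam, s, word) for s in states]
    return all(f1 == f2
               for (f1, o1), (f2, o2) in product(results, repeat=2) if o1 == o2)

def is_diagnostic(delta, lam, states, word):
    # Definition 3: equal output words imply equal initial states.
    outputs = [run(delta, lam, s, word)[1] for s in states]
    return len(set(outputs)) == len(outputs)

states = [0, 1]
# Toy automaton 1: the states are swapped and the output reveals the initial state.
delta1 = {(0, 'a'): 1, (1, 'a'): 0}
lam1 = {(0, 'a'): 0, (1, 'a'): 1}
# Toy automaton 2: both states move to state 0 and produce the same output.
delta2 = {(0, 'a'): 0, (1, 'a'): 0}
lam2 = {(0, 'a'): 0, (1, 'a'): 0}

for delta, lam in ((delta1, lam1), (delta2, lam2)):
    print(is_synchronizing(delta, lam, states, 'a'),
          is_homing(delta, lam, states, 'a'),
          is_diagnostic(delta, lam, states, 'a'))
```

For the first automaton the word `a` is a DS (hence an HS) but not an SS; for the second it is an SS (hence an HS) but not a DS, which illustrates the two remarks above.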

### III. CONDITIONS OF SS, HS AND DS EXISTENCE FOR LA

Note that conditions for the existence of SS, HS and DS were given in [1], but in terms of the complex construction of a successor tree, so verifying them is very labor-consuming.

The results given below show that for LA these conditions can be verified easily.

All statements in this article are given without proof. The proofs can be found in the articles [4-10] and the monograph [3] listed in the References.

**Theorem 1.** A necessary and sufficient condition that *LA A* has *SS* of length *k* is  $A^k = [0]$ .

Here [0] is a null-matrix.

**Theorem 2.** If there is a certain SS of length t for LA then every input sequence of the same length or more is also SS for this LA.

**Theorem 3.** A length of minimal *SS* for *LA* of dimension *n* is not more than *n*.

This theorem provides a simple rule for verifying *SS* existence: compute the successive powers $A^k$ of the matrix *A* ($k = 1, 2, \dots$) until $A^k = [0]$. If this happens for some $k \le n$ then an *SS* exists; if $A^n \neq [0]$ the process stops and no *SS* exists for this *LA*.
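The rule following from Theorems 1 and 3 can be sketched in a few lines of Python. The function name `min_ss_length` and the example matrices are assumptions made for illustration; arithmetic is carried out modulo the prime *p*.

```python
def mat_mul(X, Y, p):
    """Matrix product over GF(p)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) % p
             for j in range(len(Y[0]))] for i in range(len(X))]

def min_ss_length(A, p):
    """Smallest k <= n with A^k = [0] over GF(p), or None if no SS exists."""
    n = len(A)
    power = [[v % p for v in row] for row in A]      # holds A^k at step k
    for k in range(1, n + 1):
        if all(v == 0 for row in power for v in row):
            return k
        power = mat_mul(power, A, p)
    return None                                      # A^n != [0]: no SS (Theorem 3)

print(min_ss_length([[0, 1], [0, 0]], 2))   # nilpotent shift matrix
print(min_ss_length([[1, 0], [0, 1]], 2))   # identity matrix is never nilpotent
```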

**Theorem 4.** An HS of length k+1 for LA $\widetilde{A}$ exists if and only if

$$(\forall \overline{s} \in S_n \quad \bigvee_{d=0}^{k} CA^d \overline{s} \neq [0]) \lor A^{k+1} = [0].$$

Here $\bigvee_{d=0}^{k}$ denotes the disjunction of the (k+1) expressions that follow this sign.

**Theorem 5.** If  $LA \ \widetilde{A}$  with a nonsingular characteristic matrix *C* has at least one *HS* of length (k + 1) then all sequences of the same length and more are *HSs* for this *LA*.

This means that the *HS* construction problem reduces to finding a natural number *k* such that an *HS* of length *k* exists for the given *LA*. Note that the same problem for the Mealy automaton is not trivial and requires labor-consuming solution methods.

**Theorem 6.** A length of minimal *HS* for *LA* of dimension *n* is not more than *n*.

Now we shall consider DS.

**Theorem 7.** DS of the length t for LA of the dimension n exists if and only if the rank of matrix

$$K_{t} = \begin{bmatrix} C \\ CA \\ CA^{2} \\ \dots \\ CA^{t-1} \end{bmatrix}$$

is equal to *n*.
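The rank condition of Theorem 7 can be checked with Gaussian elimination over $GF(p)$. The following sketch is one possible way to do this; the function names are hypothetical, and the modular inverse uses the three-argument form of Python's built-in `pow` with exponent -1 (available from Python 3.8).

```python
def mat_mul(X, Y, p):
    """Matrix product over GF(p)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) % p
             for j in range(len(Y[0]))] for i in range(len(X))]

def rank_mod_p(M, p):
    """Rank of M over GF(p) (p prime) by Gauss-Jordan elimination."""
    M = [[v % p for v in row] for row in M]
    rows, cols, rank = len(M), len(M[0]), 0
    for c in range(cols):
        pivot = next((r for r in range(rank, rows) if M[r][c]), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        inv = pow(M[rank][c], -1, p)                 # modular inverse of the pivot
        M[rank] = [(v * inv) % p for v in M[rank]]
        for r in range(rows):
            if r != rank and M[r][c]:
                f = M[r][c]
                M[r] = [(a - f * b) % p for a, b in zip(M[r], M[rank])]
        rank += 1
    return rank

def diagnostic_matrix(C, A, t, p):
    """Stack C, CA, ..., CA^(t-1) as in Theorem 7."""
    rows, block = [], [[v % p for v in row] for row in C]
    for _ in range(t):
        rows.extend(block)
        block = mat_mul(block, A, p)
    return rows

def min_ds_length(C, A, p):
    """Smallest t <= n with rank K_t = n, or None if no such t exists."""
    n = len(A)
    for t in range(1, n + 1):
        if rank_mod_p(diagnostic_matrix(C, A, t, p), p) == n:
            return t
    return None

# Hypothetical LA over GF(2): one output, dimension 2.
print(min_ds_length([[1, 0]], [[0, 1], [1, 1]], 2))
```

The search stops at $t = n$ on the assumption that the bound stated in the Corollary to Theorem 9 applies.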

**Theorem 8.** If LA has at least one DS of length k then any input sequence of length k and more will be also DS for this LA.

It was proved in [1] that in the general case minimality of a Mealy automaton is a necessary but not sufficient condition for *DS* existence. The following statement holds for *LA*.

**Theorem 9.** If an *LA* is a minimal automaton then a *DS* for this *LA* exists.

**Corollary.** The length of a minimal *DS* for an *LA* of dimension *n* is not more than *n*.

Finding a *DS* for a given *LA* is called the diagnostic problem. It is known [1] that the ability to solve this problem depends on the set of admissible initial states and on the methods used. It was shown in [1] that the most powerful method for solving the diagnostic problem is multiple experiments; simple unconditional experiments are less helpful in this situation. The following statement shows that this is not the case for *LA*.

**Theorem 10.** The diagnostic problem is always solvable by a simple unconditional diagnostic experiment for any minimal *LA* and for any set of admissible initial states.

### IV. EXPERIMENTS WITH THE NON-STATIONARY LA

Now we consider the so-called non-stationary *LA* (*NLA*), which is described by the system of equations

$$\overline{s}(t+1) = A(t)\overline{s}(t) + B(t)\overline{u}(t),$$

$$\overline{y}(t) = C(t)\overline{s}(t) + D(t)\overline{u}(t).$$

The matrix dimensions in these equations are the same as in (1) and (2).

It is proved by analogy with LA that the final state and the output response of NLA after application of input sequence  $\overline{u}(0), \overline{u}(1), ..., \overline{u}(t)$  can be calculated by the following formulas:

$$\begin{aligned} \overline{s}(t+1) &= A(t)A(t-1)\dots A(0)\,\overline{s}(0) + \sum_{i=0}^{t-1} A(t)A(t-1)\dots A(i+1)B(i)\overline{u}(i) + B(t)\overline{u}(t), \\ \overline{y}(t) &= C(t)A(t-1)\dots A(0)\,\overline{s}(0) + \sum_{i=0}^{t-1} C(t)A(t-1)\dots A(i+1)B(i)\overline{u}(i) + D(t)\overline{u}(t). \end{aligned}$$

Here  $\overline{s}(0)$  is the initial state of *NLA*.

The statements listed below are proved in [3].

**Theorem 11.** The input sequence  $\overline{u}(0), \overline{u}(1), ..., \overline{u}(t)$  is SS for NLA  $\widetilde{A}$  if and only if

$$\forall \ \overline{s}_1(0), \overline{s}_2(0) \in Init(\widetilde{A}) \quad A(t)A(t-1)...A(0)[\overline{s}_1(0) - \overline{s}_2(0)] = [0].$$

Here $Init(\widetilde{A})$ is the set of admissible initial states of NLA $\widetilde{A}$.

**Corollary 1.** If  $Init(\widetilde{A}) = S_n$  then a necessary and sufficient condition of existence of SS of length (t+1) for NLA  $\widetilde{A}$  is the following one:

$$A(t)A(t-1)...A(0) = [0]$$

**Corollary 2.** If  $Init(\widetilde{A}) = S_n$  and there is a certain SS of length t for NLA then every input sequence of the length t or more is also a SS for this NLA.

It is easy to see that in the general case the length of *SS* has no upper bound.

Let us consider an *NLA* with the following characteristic matrices:

$$A(i) = E \ \text{ for } \ i = 0, 1, ..., t - 1, \qquad A(t) = [0],$$

where *E* is the identity matrix. It is clear from Theorem 11 that such an *NLA* has an *SS* of length (t+1), but no input sequence of length less than (t+1) is an *SS*. Since *t* is an arbitrary parameter, the statement above holds.

Now we consider a special class of *NLA*, the so-called periodical *NLA*. Every such automaton has periodical characteristic matrices. In other words, there is a positive integer $\lambda$ such that $A(t + \lambda) = A(t)$, $B(t + \lambda) = B(t)$, $C(t + \lambda) = C(t)$, $D(t + \lambda) = D(t)$.

Now we construct a stationary automaton $\widetilde{A}_{st}$ (based on the *NLA* $\widetilde{A}$) with the following transition function:

$$\overline{s}(t+1) = \widehat{A} \cdot \overline{s}(t) ,$$

where  $\hat{A} = A(\lambda - 1)A(\lambda - 2)...A(0)$ .

**Theorem 12.** There is an SS for periodical NLA $\widetilde{A}$ if and only if there is an SS for the stationary automaton $\widetilde{A}_{st}$.

**Theorem 13.** If  $\lambda$  is a period of characteristic matrix A(t) of NLA  $\widetilde{A}$  then the length of minimal SS is not more than  $\lambda n$ , where *n* is NLA dimension.
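A rough illustration of Theorems 12 and 13: the sketch below builds $\hat{A} = A(\lambda - 1)\dots A(0)$ from one period of matrices and looks for the smallest $j \le n$ with $\hat{A}^j = [0]$. Since the product of the first $\lambda j$ matrices $A(\cdot)$ then equals $\hat{A}^j = [0]$, Corollary 1 to Theorem 11 (with $Init(\widetilde{A}) = S_n$) gives an *SS* of length $\lambda j \le \lambda n$. The function names and the example are assumptions, and the returned value is an upper estimate, not necessarily the minimal length.

```python
def mat_mul(X, Y, p):
    """Matrix product over GF(p)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) % p
             for j in range(len(Y[0]))] for i in range(len(X))]

def stationary_matrix(period, p):
    """A_hat = A(lam-1) ... A(1) A(0) for period = [A(0), ..., A(lam-1)]."""
    n = len(period[0])
    prod = [[int(i == j) for j in range(n)] for i in range(n)]
    for A_t in period:
        prod = mat_mul(A_t, prod, p)     # later matrices multiply from the left
    return prod

def ss_length_bound(period, p):
    """lam * j for the smallest j <= n with A_hat^j = [0], or None if A_hat^n != [0]."""
    a_hat = stationary_matrix(period, p)
    n, lam = len(a_hat), len(period)
    power = a_hat
    for j in range(1, n + 1):
        if all(v == 0 for row in power for v in row):
            return lam * j
        power = mat_mul(power, a_hat, p)
    return None

# Hypothetical period-2 NLA over GF(2): neither matrix is nilpotent, but their product is zero.
period = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
print(stationary_matrix(period, 2), ss_length_bound(period, 2))
```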

**Theorem 14.** The input sequence $\overline{u}(0), \overline{u}(1), ..., \overline{u}(t)$ is an *HS* for periodical *NLA* $\widetilde{A}$ if and only if

$$\forall \ \overline{s}_1(0), \overline{s}_2(0) \in Init(\widetilde{A}) \quad \exists k \in [0, t] \quad (C(k)A(k-1)...A(0)[\overline{s}_1(0) - \overline{s}_2(0)] \neq [0]) \lor (A(t)A(t-1)...A(0)[\overline{s}_1(0) - \overline{s}_2(0)] = [0]).$$

**Corollary.** If there is a certain *HS* of length *t* for periodical *NLA* then every input sequence of the same length or more is also *HS* for this *NLA*.

By analogy with the stationary *LA* we introduce the so-called diagnostic matrix of an *NLA*:

$$K_{t} = \begin{bmatrix} C \\ CA(0) \\ CA(1)A(0) \\ \dots \\ CA(t-1)A(t-2)\dots A(0) \end{bmatrix}.$$

**Theorem 15.** The input sequence  $\overline{u}(0), \overline{u}(1), ..., \overline{u}(t)$  is *DS* for periodical *NLA* if and only if

rank  $K_t = n$ ,

where *n* is *NLA* dimension.

**Corollary.** If there is a certain DS of the length t for the periodical NLA then every input sequence of the same length or more is also DS for this NLA.

### V. LA TESTING

Faults can occur during the operation of digital devices (*DD*); therefore such devices should be checked. One way to detect faults is to apply a special input sequence (test) to the device. The response of a *DD* to the test must differ depending on its technical state (good or faulty). Consequently, the fault detection process is an experiment with the *DD*.

It is clear that the experimenter must know the initial state of the *DD* before applying the test. In some cases the initial state can be identified by the *SS*, *HS* and *DS* described in Section III. Previously known test construction methods imposed hard restrictions on the *DD* structure and on the information about its initial state.

The test construction methods suggested below do not require these restrictions and are less labor-consuming.

Now we shall define a problem which is investigated in this section.

An *LA* and an admissible faulty modification of it are given. It is required to construct an input sequence (test) that detects this fault. In other words, the responses of the good and the faulty *LA* to the test must differ regardless of their initial states.

Let us recall some definitions that will be used below.

We say that an *LA* has a finite memory of depth $\mu$ if for any time moment *t* the following equality holds:

$$\overline{y}(t) = f(\overline{u}(t), \overline{u}(t-1), ..., \overline{u}(t-\mu), \overline{y}(t-1), ..., \overline{y}(t-\mu))$$

It means that we can predict the *LA* response at any time moment *t* if we know the input sequence and the response of the *LA* at the previous $\mu$ time moments.

It is known [2] from LA theory that every LA has a finite memory of depth  $\mu$  where  $\mu \leq n$  (*n* is LA dimension).

We say that an *LA* is $\mu$-definite if its response at the discrete moment *t* depends only on the current input and the previous $\mu$ inputs:

$$\overline{y}(t) = f(\overline{u}(t), \overline{u}(t-1), \dots, \overline{u}(t-\mu))$$

Now we pass on to the description of the test construction method. Let $A_1$, $B_1$, $C_1$, $D_1$ be the characteristic matrices of the faulty *LA* $\widetilde{A}_1$.

Let us consider the case when the good *LA* and the faulty one are both $\mu$-definite, but with different values of the parameter $\mu$. Let $\mu = \max(\mu_1, \mu_2)$, where $\mu_1$ ($\mu_2$) is the depth of memory of *LA* $\widetilde{A}$ ($\widetilde{A}_1$). It is proved in [2] that a necessary and sufficient condition for $\mu$-definiteness of an *LA* is the equality $CA^{\mu} = [0]$. This implies that $CA^{k} = [0]$ and $C_1A_1^{k} = [0]$ for any $k \ge \mu$. Taking into account these equalities and (4), the responses of both *LA*s to the input sequence $\overline{u}(0), \overline{u}(1), ..., \overline{u}(\mu)$, regardless of their initial states, are

$$\overline{y}(\mu) = CA^{\mu-1}B\overline{u}(0) + CA^{\mu-2}B\overline{u}(1) + \dots + CB\overline{u}(\mu-1) + D\overline{u}(\mu),$$

$$\overline{y}_1(\mu) = C_1A_1^{\mu-1}B_1\overline{u}(0) + C_1A_1^{\mu-2}B_1\overline{u}(1) + \dots + C_1B_1\overline{u}(\mu-1) + D_1\overline{u}(\mu).$$

Subtracting one from the other, we get

$$\overline{y}(\mu) - \overline{y}_{1}(\mu) = [CA^{\mu-1}B - C_{1}A_{1}^{\mu-1}B_{1}]\overline{u}(0) + \dots + [D - D_{1}]\overline{u}(\mu). \qquad (5)$$

It is clear that the given fault is detected by input sequence  $\overline{u}(0), \overline{u}(1), ..., \overline{u}(\mu)$  (test) if  $\overline{y}(\mu) - \overline{y}_1(\mu) \neq [0]$ .

We interpret relation (5) as a system of linear algebraic equations (*SLAE*) in the unknown variables

$$u = [u_1(0), ..., u_l(0), ..., u_1(\mu), ..., u_l(\mu)]. \qquad (6)$$

Let *Q* be the matrix of system (5); then (5) may be written in the following form:

$$Qu = y, \qquad (7)$$

where *y* is an *m*-dimensional nonzero vector.

Let *T* be the set of all tests that detect the given fault. To find the set *T* directly, we would need to vary the right side of (7) and find the solutions of each corresponding system.

It is clear that the number of different nonzero vectors y is $p^{(\mu+1)m}$. Even for small values of $\mu$ and *m* this number is very large. Therefore we consider a more effective method.

Instead of (7) we consider the homogeneous system

$$Qu = [0]. \tag{8}$$

Let $U_0$ be the set of solutions of this system. If *U* is the set of all vectors of type (6), then obviously the set $U \setminus U_0$ is the set *T*, so finding all tests reduces to solving the single homogeneous system (8).
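The construction of the test set as the complement of the solution set of (8) can be sketched for very small examples as follows. The code assembles *Q* from the coefficient blocks of (5) and then enumerates all vectors of type (6) by brute force, keeping those with $Qu \neq [0]$. The function names and the toy pair of automata are assumptions; the enumeration is only practical for small $p$, $\mu$ and $l$.

```python
from itertools import product

def mat_mul(X, Y, p):
    """Matrix product over GF(p)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) % p
             for j in range(len(Y[0]))] for i in range(len(X))]

def mat_pow(M, e, p):
    """M raised to the power e over GF(p); e = 0 gives the identity matrix."""
    n = len(M)
    R = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(e):
        R = mat_mul(R, M, p)
    return R

def coefficient_blocks(A, B, C, D, mu, p):
    """Blocks C A^(mu-1) B, ..., C B, D multiplying u(0), ..., u(mu) in y(mu)."""
    blocks = [mat_mul(mat_mul(C, mat_pow(A, mu - 1 - j, p), p), B, p)
              for j in range(mu)]
    blocks.append([[v % p for v in row] for row in D])
    return blocks

def build_Q(good, faulty, mu, p):
    """Matrix of system (5): block-wise difference of good and faulty coefficients."""
    g = coefficient_blocks(*good, mu, p)
    f = coefficient_blocks(*faulty, mu, p)
    Q = [[] for _ in range(len(g[0]))]
    for gb, fb in zip(g, f):
        for r, (g_row, f_row) in enumerate(zip(gb, fb)):
            Q[r].extend((a - b) % p for a, b in zip(g_row, f_row))
    return Q

def all_tests(Q, p):
    """All vectors u of type (6) with Qu != [0], i.e. the set U minus U_0."""
    width = len(Q[0])
    return [u for u in product(range(p), repeat=width)
            if any(sum(q * x for q, x in zip(row, u)) % p for row in Q)]

# Hypothetical 1-definite LAs over GF(2); the fault changes D from [0] to [1].
good = ([[0]], [[1]], [[1]], [[0]])      # y(t) = u(t-1)
faulty = ([[0]], [[1]], [[1]], [[1]])    # y1(t) = u(t-1) + u(t)
Q = build_Q(good, faulty, 1, 2)
print(Q, all_tests(Q, 2))
```

In this toy case the fault is visible exactly when $u(1) = 1$, so the tests are the two vectors of type (6) whose last coordinate equals 1.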

Now we pass on to test construction for lock-in *LA*. We say that *LA* is a lock-in one if there is an *SS* for this *LA*.

**Theorem 16.** Any lock-in *LA* is also a $\mu$-definite *LA*.

It is clear that, by this theorem, the test construction method presented above can be applied to lock-in *LA*.

Now we describe a test construction method for arbitrary *LA*, not only for lock-in ones. This method is based on the fact that any *LA* has a finite memory.

Let the good (faulty) *LA* have memory depth $\mu_1$ ($\mu_2$) and let $\mu = \max(\mu_1, \mu_2)$. It is known from [2] that the output functions of the good *LA* $\widetilde{A}$ and the faulty *LA* $\widetilde{A}_1$ can always be presented in the following form:

$$\overline{y}(t) = V_0 \overline{u}(t) + V_1 \overline{u}(t-1) + \dots + V_\mu \overline{u}(t-\mu) + W_1 \overline{y}(t-1) + \dots + W_\mu \overline{y}(t-\mu),$$

$$\overline{y}_1(t) = V_0^1 \overline{u}(t) + V_1^1 \overline{u}(t-1) + \dots + V_\mu^1 \overline{u}(t-\mu) + W_1^1 \overline{y}(t-1) + \dots + W_\mu^1 \overline{y}(t-\mu), \quad (9)$$

where  $V_i(V_i^1)$ ,  $W_i(W_i^1)$  are matrices of corresponding dimension.

It is clear that if $\overline{u}(t-\mu), \overline{u}(t-\mu+1), \dots, \overline{u}(t)$ is a test of minimal length then $\overline{y}(t-j) = \overline{y}_1(t-j)$ for $j = 1, ..., \mu$, but $\overline{y}(t) \neq \overline{y}_1(t)$.

Subtracting the second equation of (9) from the first, we get

$$\overline{y}(t) - \overline{y}_{1}(t) = [V_{0} - V_{0}^{1}]\overline{u}(t) + \dots + [V_{\mu} - V_{\mu}^{1}]\overline{u}(t - \mu) + [W_{1} - W_{1}^{1}]\overline{y}(t - 1) + \dots + [W_{\mu} - W_{\mu}^{1}]\overline{y}(t - \mu). \qquad (10)$$

Equating (10) to some nonzero vector, we get an *SLAE* in the unknown variables, which are the coordinates of the vector

$$u = [u_1(t - \mu), ..., u_l(t - \mu), ..., u_1(t), ..., u_l(t)] .$$

Let *Q* be the matrix of the obtained *SLAE*; then this *SLAE* can be written as

$$Qu = y \,. \tag{11}$$


Note that the solutions of system (11) are found in much the same way as for system (7).

Thus, the construction of all test sets for arbitrary *LA*s, both $\mu$-definite and lock-in ones, can be reduced to the solution of one homogeneous system of equations.

### VI. CONCLUSION

The results presented above show that the specific character of *LA* significantly simplifies the theory of experiments for them. It makes it possible to decrease the upper bounds on the length of all types of experiments in comparison with the corresponding bounds known for the Mealy automaton. In addition, it allows the problem of experiment construction (in the general case a very complex and labor-consuming problem) to be reduced to the simpler problem of the existence of such experiments. The latter problem is solved by simple calculations: matrix multiplication, matrix exponentiation or computation of matrix ranks. In other words, the conditions for the existence of experiments can be verified easily.

It should be noted that after the experiment is finished, the state can be identified by solving an *SLAE*, for which well-known and well-developed mathematical methods exist.

Moreover, it is significant that the methods described above construct tests of length $\mu + 1$, where $\mu$ is the depth of the *LA* memory. Since, as is well known [2], $\mu \le n$, where *n* is the *LA* dimension, the methods shown in this article can provide very short tests.

### REFERENCES

- [1] Gill A. Introduction to the Theory of Finite-State Machines. N.Y.: McGraw-Hill, 1962.
- [2] Gill A. Linear Sequential Circuits. Analysis, Synthesis and Applications. N.Y.: McGraw-Hill, 1966.
- [3] Speranskiy D.V. Experiments with the linear and the bilinear finite-state machines. Saratov: Publish. House of the Saratov State University, 2004. (in Russian)
- [4] Speranskiy D.V. About testing of the linear automata // Avtomatika i telemekhanika. 2000. №5. P. 157-165. (in Russian)
- [5] Speranskiy D.V., Speranskiy I.D. Experiments with the linear discrete systems // Elektronnoe modelirovanie. 1999. №4. P. 64-73. (in Russian)
- [6] Speranskiy D.V. Generalized synchronization of the finite-state machines // Kibernetika i sistemny analiz. 1998. №3. P. 17-25. (in Russian)
- [7] Speranskiy D.V. Synthesis of the tests with minimal number of a signal drops for the linear automata // Avtomatika i vych. tekhnika. 2002. №4. P. 70-78. (in Russian)
- [8] Speranskiy D.V. Synchronization of the linear finite-state machines // Avtomatika i telemekhanika. 1996. №5. P. 141-149. (in Russian)
- [9] Speranskiy D.V. Homing and diagnostic sequences for the linear automata // Avtomatika i telemekhanika. 1997. №5. P. 133-141. (in Russian)
- [10] Speranskiy D.V., Speranskiy I.D. Resolution of the diagnostic experiments with the linear automata // Kibernetika i sistemny analiz. 2000. №3. P. 62-65. (in Russian)

# Research digital devices by means of modelling system on the basis of *K*-Value differential calculus

Dmitrienko V.D., Leonov S.Yu., Gladkikh T.V.

Abstract – This article describes the capabilities of a new simulation system based on the mathematical apparatus of K-Value differential calculus. The system is intended for the study of complex high-speed devices built with modern technologies.

*Index Terms* – CAD, elements models, equations with delay, fuzzy fronts, *K*-Value differential calculation, simulation, switching capacity.

### I. INTRODUCTION

Computer simulation of the behavior of digital devices is now an integral part of systems for their automated design and verification. In CAD systems for electronic devices, simulation is used for such tasks as determining the signals of the initialization circuits of digital devices and their subsequent logical operation, optimizing the parameters of synchronization circuits and the time characteristics of switching, analyzing signal races, determining signal propagation times and delays, and constructing tests and checking their completeness [1-3]. The complexity of modern electronic devices and their high operating frequencies therefore demand new methods for analyzing the correctness of their operation, using quantitative and qualitative characteristics of the possibility of failures.

The complexity of modern digital systems on a chip (System On Chip, SOC) and the need to increase the reliability of their simulation have made it necessary to use many-valued alphabets instead of the traditional binary one. The least labor-consuming is simulation with Eichelberger's three-valued alphabet [2], which uses three characters ("one", "zero" and "uncertainty" (X)). This alphabet captures the two steady states during simulation and represents everything else as uncertainty. An extension of three-valued simulation is five-valued simulation, which additionally uses characters that make it possible to distinguish smooth transitions from signal races.

The Fantauzzi alphabet [3] contains nine characters and makes it possible to distinguish static and dynamic risks of failure during simulation, since the following states are added to the three characters of Eichelberger's alphabet: smooth transition from "0" to "1" (E), smooth transition from "1" to "0" (H), static risk of failure at "0" (P), static risk of failure at "1" (V), dynamic risk of failure from "0" to "1" (F), and dynamic risk of failure from "1" to "0" (L). The introduction of four more characters, O (transition from uncertainty to "0"), I (transition from uncertainty to "1"), A (transition from "0" to uncertainty) and B (transition from "1" to uncertainty), has made it possible to distinguish phases of uncertainty and stability [4]. The application of many-valued logical operators makes it possible to distinguish risks of failure, races and competitions of signals, which is essential for combinational devices, where critical places (structural components) must be found in order to modify them and eliminate competitions [5].

At the same time, in known systems it is difficult to simulate a new class of processes and physical faults that are caused by the high operating frequencies of units and are linked to the substantially increased role of the timing parameters of signal propagation along the conductors of digital devices. Besides, such systems offer no possibility to analyze the influence of the quantitative parameters of the rise and fall of logic signals, which are determined by the switching power.

The purpose of this article is to describe a new automated system for the verification of complex high-speed devices built with modern technologies; the system is based on the mathematical apparatus of K-Value differential calculus [6, 7].

### II. FORMATION OF *K*-VALUE COMPUTER ELEMENTS MODELS

The developed system is based on the mathematical apparatus of *K*-Value differential calculus. This method develops the idea of Boolean differential calculus [1] but, in contrast to it, takes into account the dynamics of the switching fronts of logic signals.

Manuscript received February 9, 2008.

Dmitrienko V.D. is with the National Technical University "Kharkov Polytechnical Institute". (e-mail: sul@itechcraft.com).

Leonov S.Yu. is with the National Technical University "Kharkov Polytechnical Institute". (e-mail: sul@itechcraft.com).

Gladkikh T.V. is with the National Technical University "Kharkov Polytechnical Institute". (e-mail: sul@itechcraft.com).

The mathematical apparatus of the automated design system based on *K*-Value differential calculus relies on *K*-Value functions, i.e. functions taking values from the set of integers $\{0, 1, ..., K-1\}$ at the discrete moments of time $t_i = 0, 1, ..., N$. In this case logic signals are quantized in amplitude and take the corresponding integer *K*-Value numbers. These values are processed by means of *K*-Value derivatives, which can be introduced as follows:

$$\frac{dF^{\Delta}(t_{i})}{dt_{i}} = \frac{F(t_{i} + \Delta t) \langle - \rangle_{k} F(t_{i} - \Delta t)}{2\Delta t};$$
$$\frac{dF^{-}(t_{i})}{dt_{i}} = \frac{F(t_{i}) \langle - \rangle_{k} F(t_{i} - \Delta t)}{\Delta t};$$
$$\frac{dF^{+}(t_{i})}{dt_{i}} = \frac{F(t_{i} + \Delta t) \langle - \rangle_{k} F(t_{i})}{\Delta t},$$

where $\langle - \rangle_k$ is the operation of subtraction modulo *K*; $F(t_i)$ is a function taking values from the set $\{0, 1, ..., K-1\}$ at the discrete moments of time $t_i = 0, 1, ..., N$; $\Delta t = 1$ is the minimal increment of the independent variable $t_i$.

Of the definitions of the derivative of *K*-value functions introduced above, only the one that requires the values of the *K*-value function at the current and previous moments of time is used in practice. With the help of such a *K*-value differential operator any element of digital computing equipment can be described. The basic logic *K*-value operations on *K*-value operands are introduced according to Cayley tables [2]. In addition, the use of *K*-value derivatives makes it possible to perform calculations only at the moments when input signals switch, that is, at the moments of time when the *K*-value derivative is not equal to zero. This considerably reduces simulation time for devices of great complexity.
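The backward derivative and the event-driven evaluation it enables can be illustrated with a minimal Python sketch. The function names and the sample waveform are hypothetical and are not taken from the described system; $\Delta t = 1$ is assumed, as in the definitions above.

```python
def sub_mod_k(a, b, k):
    """Subtraction modulo K (the <->_k operation)."""
    return (a - b) % k


def backward_derivative(f, i, k):
    """Backward K-value derivative with dt = 1: F(t_i) <->_k F(t_i - 1)."""
    return sub_mod_k(f[i], f[i - 1], k)


def switching_moments(f, k):
    """Time indices at which the backward derivative is non-zero.

    Only at these moments does an element need to be re-evaluated.
    """
    return [i for i in range(1, len(f)) if backward_derivative(f, i, k) != 0]


# Hypothetical 7-valued waveform: rise 0 -> 6, hold, then fall 6 -> 0.
signal = [0, 0, 2, 4, 6, 6, 6, 3, 0]
print(switching_moments(signal, k=7))
```

In this sketch a falling front gives a non-zero residue modulo *K* rather than a negative number, so both fronts are detected by the same non-zero test.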

Any element of digital computers in the developed CAD system is described by a system of *K*-value differential equations with a delayed argument of the form:

$$\frac{dU_{\text{out}1}(t_i)}{dt_i} = f(U_{\text{out}1}(t_i - 1), U'_{\text{out}1}(t_i - D_1), U'_{\text{inp}11}(t_i - D_1), U'_{\text{inp}12}(t_i - D_1), \cdots, U'_{\text{inp}1N_1}(t_i - D_1), t_i), \ t_i \ge t_0;$$

$$\cdots$$

$$\frac{dU_{\text{out}M}(t_i)}{dt_i} = f(U_{\text{out}M}(t_i - 1), U'_{\text{out}M}(t_i - D_M), U'_{\text{inp}M1}(t_i - D_M), U'_{\text{inp}M2}(t_i - D_M), \cdots, U'_{\text{inp}MN_M}(t_i - D_M), t_i), \ t_i \ge t_0,$$

where $\frac{dU_{\text{out}\,j}(t_i)}{dt_i}$ is the value of the derivative of the unknown output signal $U_{\text{out}\,j}(t_i)$ of the *j*-th internal logic block at the moment of time $t_i$, $j = \overline{1, M}$; *M* is the number of outputs of the element; $U'_{\text{out}\,j}(t_i - D_j)$ and $U'_{\text{inp}\,jw}(t_i - D_j)$ are the modified values of the *j*-th output signal $U_{\text{out}\,j}(t_i - D_j)$ and of the *jw*-th input signal $U_{\text{inp}\,jw}(t_i - D_j)$ of the logic element at the moment of time $(t_i - D_j)$, $jw = \overline{11, MN_M}$, $j = \overline{1, M}$, $w = \overline{1, N_j}$; $D_j$ is the delay of the *j*-th logic block of the device, $j = \overline{1, M}$.

The presence of the delayed argument makes it possible to describe and simulate elements with feedback, for example elements with memory. The delay need not have a single unique value: for different internal logic units of an element it can take different values. These values, in turn, can vary from the minimal to the maximal delay of the element. This allows an element to be simulated with floating delays of its internal units.

The arguments of the *K*-value differential equations in the system are the values of the input, intermediate and output signals of an element, modified to take their switching power into account. This modification is carried out according to the expressions:

$$U'_{\text{out}\,j}(t_i - D_j) = \begin{cases} U_{\text{out}\,j}(t_i - D_j), & \text{if } \widetilde{E}_{\text{out}}^{j}(t_i - D_j) < \widetilde{E}_p; \\ K - 1, & \text{if } \widetilde{E}_{\text{out}}^{j}(t_i - D_j) \ge \widetilde{E}_p, \ U_{st\_B\,j}(t_i - D_j) = 0; \\ 0, & \text{if } \widetilde{E}_{\text{out}}^{j}(t_i - D_j) \ge \widetilde{E}_p, \ U_{st\_B\,j}(t_i - D_j) = K - 1, \end{cases} \quad j = \overline{1, M};$$

$$U'_{\text{inp}\,jw}(t_i - D_j) = \begin{cases} U_{\text{inp}\,jw}(t_i - D_j), & \text{if } \widetilde{E}_{w}^{j}(t_i - D_j) < \widetilde{E}_p; \\ K - 1, & \text{if } \widetilde{E}_{w}^{j}(t_i - D_j) \ge \widetilde{E}_p, \ U_{stw}^{j}(t_i - D_j) = 0; \\ 0, & \text{if } \widetilde{E}_{w}^{j}(t_i - D_j) \ge \widetilde{E}_p, \ U_{stw}^{j}(t_i - D_j) = K - 1, \end{cases} \quad w = \overline{1, N_j},$$

where $\widetilde{E}_{\text{out}}^{j}(t_i - D_j)$ and $\widetilde{E}_{w}^{j}(t_i - D_j)$ are the accumulated values of the switching power of the *j*-th output and the *jw*-th input signal at the moment of time $(t_i - D_j)$; $\widetilde{E}_p$ is the discrete analogue of the threshold switching power; $U_{st\_B\,j}(t_i - D_j)$ and $U_{stw}^{j}(t_i - D_j)$ are the values of the *j*-th output and the *jw*-th input signal fixed before the beginning of the transient process.

The signal powers accumulated by the moment of time $(t_i - D_j)$ are defined according to the following expressions:

$$\widetilde{E}_{\text{out}}^{j}(t_i - D_j) = \begin{cases} 0, & U_{\text{out}\,j}(t_i - D_j) \in \{0, K - 1\}, \ t_i > 0; \\ \sum\limits_{t_k = t_{s\_B\,j}}^{t_i - D_j} \left[ U_{\text{out}\,j}(t_k)(\widetilde{U}_{\max} - \widetilde{U}_{\min}) + \widetilde{U}_{\min} \times (K - 1) \right]^2, & U_{\text{out}\,j}(t_i - D_j) \notin \{0, K - 1\}, \ t_i > 0; \\ 0, & t_i = 0, \end{cases}$$

where $\widetilde{U}_{\min}$, $\widetilde{U}_{\max}$ are integer analogues of the voltage levels corresponding to logic one and logic zero; $t_{s\_B\,j}$ is the moment at which the transient of switching the *j*-th output signal from one steady state to another begins:

$$t_{s\_B\,j} = \begin{cases} (t_i - D_j), & U_{\text{out}\,j}(t_i - D_j) \in \{0, K-1\}, \ t_i > 0; \\ t_{s\_B\,j}, & U_{\text{out}\,j}(t_i - D_j) \notin \{0, K-1\}, \ t_i > 0; \\ -D_j, & t_i = 0. \end{cases}$$
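The accumulation of switching power and the resulting modification of a signal value can be sketched in Python as follows. This is a simplified illustration, not the system's implementation: the delay shift $D_j$ is omitted (sample `n` stands for $t_i - D_j$), and the function names, the sample waveform and the threshold value are hypothetical.

```python
def transient_start(u, n, k):
    """Index of the most recent sample at a steady level (0 or K-1), else 0."""
    start = n
    while start > 0 and u[start] not in (0, k - 1):
        start -= 1
    return start


def accumulated_power(u, n, k, u_min, u_max):
    """Accumulated switching power at sample n, following the sum above."""
    if n == 0 or u[n] in (0, k - 1):
        return 0
    start = transient_start(u, n, k)
    return sum((u[t] * (u_max - u_min) + u_min * (k - 1)) ** 2
               for t in range(start, n + 1))


def modified_value(u, n, k, u_min, u_max, e_p):
    """Signal value after power analysis with threshold e_p."""
    if accumulated_power(u, n, k, u_min, u_max) < e_p:
        return u[n]
    steady = u[transient_start(u, n, k)]  # level fixed before the transient
    if steady == 0:
        return k - 1
    if steady == k - 1:
        return 0
    return u[n]  # no steady level seen yet: leave the value unchanged


# Hypothetical slow rising front in the 7-valued alphabet.
u = [0, 1, 2, 3]
print([modified_value(u, n, k=7, u_min=0, u_max=1, e_p=5) for n in range(4)])
```

With these assumed parameters the signal is forced to the opposite steady level as soon as the accumulated power reaches the threshold, which is the behaviour the case expressions describe.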

The delay $D_j$ of the *j*-th internal logic unit of an element takes its value from the set $T_{zj}$ of possible delays of the unit. The set $T_{zj}$ depends on the typical, minimal and maximal delays of the element and can be specified by the function $f_T(t_{z\_opt}, t_{z\_min}, t_{z\_max}, Ctrl_T)$. The signal $Ctrl_T$ is a control signal that determines how this set is specified:

- if $Ctrl_T = 0$, then the set of delay values is reduced to the typical delay of the element ($T_{zj} = \{t_{z\_opt}\}$);

- if $Ctrl_T = 1$, then the set contains the typical, minimal and maximal delays ($T_{zj} = \{t_{z\_min}, t_{z\_opt}, t_{z\_max}\}$);

- if $Ctrl_T = 2$ or $Ctrl_T = 3$, then the set $T_{zj}$ contains the possible delay values determined by the distribution law of the random variable describing the deviation of the delay from its typical value. This random variable has a normal distribution whose parameters are determined by the minimal, maximal and typical delays of the logic unit;

- if $Ctrl_T = 4$, then the set $T_{zj}$ contains, in addition to the typical value, the delays that deviate from $t_{z\_opt}$ by $\tau$, the duration of the element's transient process of switching from one steady state to another ($T_{zj} = \{\text{Round}(t_{z\_opt} - \tau), t_{z\_opt}, \text{Round}(t_{z\_opt} + \tau)\}$).
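The four ways of forming the delay set listed above can be sketched as a single Python function. Several details are assumptions made only for illustration: the text does not say how $Ctrl_T = 2$ differs from $Ctrl_T = 3$, nor how the parameters of the normal law are derived, so this sketch treats both values identically, takes the mean as the typical delay and the standard deviation as one sixth of the delay range, and clips the samples to the allowed interval. The name `delay_set` and the parameters `samples` and `seed` are hypothetical, and Python's built-in `round` stands in for Round.

```python
import random


def delay_set(t_opt, t_min, t_max, ctrl_t, tau=0.0, samples=5, seed=0):
    """Set of admissible delays of one internal logic unit."""
    if ctrl_t == 0:
        return {t_opt}
    if ctrl_t == 1:
        return {t_min, t_opt, t_max}
    if ctrl_t in (2, 3):
        rng = random.Random(seed)
        sigma = (t_max - t_min) / 6  # assumption, not stated in the text
        values = {t_opt}
        for _ in range(samples):
            d = round(rng.gauss(t_opt, sigma))
            values.add(min(max(d, t_min), t_max))  # keep within [t_min, t_max]
        return values
    if ctrl_t == 4:
        return {round(t_opt - tau), t_opt, round(t_opt + tau)}
    raise ValueError("ctrl_t must be 0, 1, 2, 3 or 4")


print(delay_set(10, 8, 13, ctrl_t=1))
print(delay_set(10, 8, 13, ctrl_t=4, tau=2.4))
```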

In the developed computer-aided design system based on *K*-value differential calculus, the general model of an element of digital computers, which can contain *M* internal logic units, each having one output and $N_j$ inputs ($j = \overline{1, M}$), is given by the structure shown in fig. 1.

The core of this structure is block 1, which solves the system of *K*-value differential equations with a delayed argument. The structure of an element also includes block 4, which initializes the element by determining the sizes of the buffers and filling them initially. Before the signals taken from the buffer elements arrive at the block that solves the system of *K*-value differential equations (block 1), their values can be changed depending on the switching power of the elements. This analysis is carried out in block 5, which calculates the power of the input and output signals. The signals taken from the input and output buffer elements are applied to the input of this block, and their values, already adjusted by the power analysis, are taken from its output.

The signal *Ctrl* determines whether the initialization stage or the modelling stage is carried out. If the modelling stage is selected, one of two variants is chosen: calculation of the output signal values or updating of the buffer elements.

To support simulation with floating delays, the structure shown in fig. 1 includes block 7, which forms the sets of allowable delays of the logic units of an element. The required accuracy of the analysis is set by the signal $Ctrl_T$, according to which the set is formed.

Besides the blocks described above, the generalized structure (fig. 1) also contains block 6, which switches the input signals of the element and the output signals of the internal logic units. This block thus forms the sets of input signals for each internal unit.

Depending on how the designed device functions and on the requirements placed on the analysis of its serviceability, the system based on *K*-value differential calculus can use either the full model of an element or its particular variants. The choice among these models is determined by the selected simulation mode.



Fig. 1. Structure of general model of an element of digital computers on the basis of K-Value differential calculus

The developed system provides four basic simulation modes:

<u>Mode of modeling 1.</u> This mode describes the functioning of elements by *K*-value differential equations with delay. When the first mode is chosen, the power analysis block in the structure of an element shown in fig. 1 works as a transfer buffer, so the values of the signals arriving at its input are passed to block 1 unchanged. In addition, the sets $T_{zj}$ at the output of the block that forms the delays of the internal logic units of an element contain only typical values.

<u>Mode of modeling 2.</u> This mode describes the functioning of elements by *K*-value differential equations with delay, taking into account the switching power of the input and output signals. In this mode the power analysis block in the structure of an element is fully functional, which allows an element to be simulated with the switching power of its input and output signals taken into account. Only the typical values are used as the delays of elements.

<u>Mode of modeling 3.</u> This mode corresponds to simulation with "floating" delays when the functioning of elements is described by *K*-value differential equations with delay. In this case the element is represented by the full structure, in which, however, the power analysis block 5 does not function, i.e. the signals at the output of this block are identical to the signals at its input.

<u>Mode of modeling 4.</u> This mode is simulation with "floating" delays and with the switching power of the input and output signals taken into account, when the functioning of elements is described by *K*-value differential equations with delay. When the fourth mode is chosen, all blocks in the structure of an element are active, which allows a comprehensive analysis of the serviceability of the device using all the capabilities of the developed system.
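The four modes differ only in whether the power analysis block and the floating delays are active. A compact way to express this, given purely as an illustrative sketch with hypothetical names, is:

```python
def mode_config(mode):
    """Which optional parts of the element model are active in modes 1 to 4."""
    if mode not in (1, 2, 3, 4):
        raise ValueError("mode must be 1, 2, 3 or 4")
    return {
        # Block 5 modifies signal values only in modes 2 and 4;
        # otherwise it acts as a transfer buffer.
        "power_analysis": mode in (2, 4),
        # The delay sets hold more than the typical value only in modes 3 and 4.
        "floating_delays": mode in (3, 4),
    }


for m in (1, 2, 3, 4):
    print(m, mode_config(m))
```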

### III. RESEARCH OF FAILURES RISKS WITH USE OF MODELLING ON THE BASIS OF *K*-VALUE DIFFERENTIAL CALCULUS

At present, new and promising devices are developed using CMOS technology. However, the introduction of CMOS technology into production has given rise to a new class of physical faults that manifest themselves as changes in signal propagation time [8]. Correct operation of a digital device is then possible only when the signal propagation times along the conductors of the logic circuit lie within certain limits. When a signal's propagation time goes beyond these limits, a fault of the type "change of signal delay" is said to take place. Such devices can be simulated in a simulation system based on the *K*-value representation of signals and on the description of the operation of basic computer units by *K*-value differential equations.

The proposed simulation system makes it possible, if necessary, to represent the conductors of digital devices as long lines connecting signal transmitters and receivers by means of *K*-value differential or functional models [9]. The system also allows the operation of computers to be studied with the power of the switching processes taken into account, which makes it possible to determine the static noise immunity of the designed devices. At the same time, the proposed system can also model all the processes in digital devices listed earlier that are obtained with the known thirteen-value alphabets. As an example we consider the simulation of the operation of the device shown in fig. 2.



Fig. 2. The device with presence of failure risks

The main component of this device is a two-input logic element "And", whose state table in the 13-value alphabet is:

| & | 0 | 1 | X | E | H | P | V | F | L | O | I | A | B |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | X | E | H | P | V | F | L | O | I | A | B |
| X | 0 | X | X | A | O | P | X | A | O | O | X | A | X |
| E | 0 | E | A | E | P | P | F | F | P | P | F | A | A |
| H | 0 | H | O | P | H | P | P | P | L | O | O | P | L |
| P | 0 | P | P | P | P | P | P | P | P | P | P | P | P |
| V | 0 | V | X | F | P | P | V | F | L | O | I | A | B |
| F | 0 | F | A | F | P | P | F | F | P | P | F | A | A |
| L | 0 | L | O | P | L | P | L | P | L | O | O | P | L |
| O | 0 | O | O | P | O | P | O | P | O | O | O | P | O |
| I | 0 | I | X | F | O | P | I | F | O | O | I | A | X |
| A | 0 | A | A | A | P | P | A | A | P | P | A | A | A |
| B | 0 | B | X | A | L | P | B | A | L | O | X | A | B |
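The state table above can be encoded as a lookup structure, which makes it easy to check individual entries. The following Python sketch is an illustration only: the name `and13` is hypothetical, and the tenth symbol of the alphabet is read as the letter O (transition from uncertainty to "0"), with every result outside the zero row and zero column that prints as a zero-like glyph interpreted as O rather than as logic "0".

```python
SYMBOLS = "01XEHPVFLOIAB"

# One string per table row; the characters follow the column order in SYMBOLS.
ROWS = [
    "0000000000000",  # 0
    "01XEHPVFLOIAB",  # 1
    "0XXAOPXAOOXAX",  # X
    "0EAEPPFFPPFAA",  # E
    "0HOPHPPPLOOPL",  # H
    "0PPPPPPPPPPPP",  # P
    "0VXFPPVFLOIAB",  # V
    "0FAFPPFFPPFAA",  # F
    "0LOPLPLPLOOPL",  # L
    "0OOPOPOPOOOPO",  # O
    "0IXFOPIFOOIAX",  # I
    "0AAAPPAAPPAAA",  # A
    "0BXALPBALOXAB",  # B
]

AND13 = {
    (a, b): ROWS[i][j]
    for i, a in enumerate(SYMBOLS)
    for j, b in enumerate(SYMBOLS)
}


def and13(a, b):
    """Output symbol of the two-input "And" element in the 13-value alphabet."""
    return AND13[(a, b)]


print(and13("E", "O"))  # static risk of failure in "0"
print(and13("V", "E"))  # dynamic risk of failure from "0" to "1"
```

Under this reading the table is symmetric, as expected for a two-input "And" element.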

To begin with, consider the situation when signals E (smooth transition from "0" to "1") and O (transition from uncertainty to "0") are present at the inputs of the logic element "And". This corresponds to output signal P, which is at the intersection of the fifth line and the eleventh column of the table. Such an output signal is a static risk of failure in "0" (P). Fig. 3 shows the results of simulating the logic element "And" with these input signals (signals A and B respectively) in the system based on *K*-value differential calculus at K = 7.

In this case logic "0" in the simulation system coincides with zero, and logic "1" corresponds to the value "6", which are respectively the minimum and maximum values in the seven-element representation of signals.

At the output of the element, signal C is observed, which indeed corresponds to a static risk of failure in "0"; its amplitude is equal to "3" (half of the maximum value in the seven-element representation of logic signals).
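How a pulse of amplitude 3 can arise in a seven-level representation may be illustrated with a rough Python sketch. It rests on two assumptions that are not taken from the described system: the *K*-value "And" is modelled as the minimum of its operands (a common choice in many-valued logic, whereas the system defines its operations through tables), and the input waveforms are hypothetical samples of a smooth rise from 0 to 6 and of a transition from the uncertainty level 3 to 0.

```python
K = 7


def and_k(a, b):
    """Assumed K-value "And": the minimum of the two operand levels."""
    return min(a, b)


# Hypothetical sampled waveforms (levels 0..6).
a = [0, 1, 2, 3, 4, 5, 6]  # E: smooth transition from "0" to "1"
b = [3, 3, 3, 3, 2, 1, 0]  # O: transition from uncertainty to "0"

c = [and_k(x, y) for x, y in zip(a, b)]
print(c)       # output starts and ends at level 0
print(max(c))  # peak amplitude of the intermediate pulse
```

The output begins and ends at logic "0" but rises to an intermediate level in between, which is the shape of a static risk of failure in "0".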



Fig. 3. Static risk of failure in "0"

According to the table, the dynamic risk of failure from "0" to "1" (F) occurs at the output of the logic element "And" when signals V (static risk of failure in "1") and E (smooth transition from "0" to "1") are present at its inputs. This corresponds to the intersection of the eighth line and the fifth column of the table. Fig. 4 shows the simulation results for this case, where signals A and B correspond to the input signals V and E, and signal C shows the presence of a dynamic risk of failure at the transition from "0" to "1".



Fig. 4. Dynamic risk of failure from "0" in "1"

As a further example, consider the operation of a more complex device (fig. 3) in which faults of the type "signal propagation delay" occur. The representation of this device in the computer-aided design system based on *K*-value differential calculus is shown in fig. 5.

A transition of the logic signal from "1" to "0" is applied to inputs $x_1$ and $x_3$ of this device, a constant "1" level to inputs $x_2$ and $x_4$, and a logic zero level to input $x_5$. As a result, a "failure" of the output logic signal relative to the logic "1" level is observed at output $x_{11}$. This corresponds to a static risk of failure in "1" (fig. 6). In general, any unit of the circuit can be in any of the thirteen states.

The *K*-value simulation system also allows devices to be analysed using power analysis of the switching signals.

Fig. 7 shows the timing diagrams of the operation of the device (fig. 3) with the switching power of the input signals taken into account.



Fig. 5. Structure of the device in system of K-Value modeling



Fig. 6. Static risk of failure

Fig. 8 shows the timing diagrams of this device operating in the base mode (without analysis of the switching power of the logic signals) for the case when signals encoded as O (see the table) act on its two inputs $x_1$ and $x_3$, and logic "1", "1" and "0" are applied to inputs $x_2$, $x_4$ and $x_5$ respectively.

As can be seen from the figure, during the working time interval (from about 15 nanoseconds to 66 nanoseconds) outputs $x_{10}$ and $x_{11}$ remain at the level of logical uncertainty (K = 3), which is determined by the logic of operation of the individual logic units according to their truth tables [6]. Similar results are obtained with other methods and systems of many-valued simulation.

When the *K*-value modelling system is set to take the switching power of the input signals into account, this power is reflected in the modelling of the device; in particular, on outputs $x_6$ and $x_7$ the signals reach steady logic levels within the working time interval (fig. 6), unlike the level of uncertainty observed in the previous case (fig. 5).



Fig. 7. Time diagrams of the device functioning in base mode



Fig. 8. Time diagrams of functioning the device in view of power switching on entrance signals

Similarly, taking the switching power of the input logic signals into account leads to a deviation from the level K = 3 on all other intermediate circuits of the designed device. As a result, in the time interval of 27 – 30 nanoseconds an error condition is observed at output $x_{10}$ in the form of a "failure" of the output voltage from the level of uncertainty to the level of logic zero, and at output $x_{11}$ a "failure" of the output logic signal from the level of logic "1" to the level of uncertainty.

These results show that designing devices and studying their serviceability with the system based on *K*-value differential calculus yields a more exact quantitative and qualitative analysis of error situations in the designed devices.

### IV. CONCLUSION

The use of a modelling system based on *K*-value differential calculus yields fuller qualitative and quantitative characteristics of failures than other existing systems of multiple-valued modelling, which cannot represent a logic signal quantized in amplitude in the *K*-value alphabet. In addition, its use makes it possible to take into account, when studying such failures, the real steepness of the fronts of synchronization and data signals. All this opens up prospects for using the *K*-value modelling system in the design of complex computers.

### REFERENCES

[1] V.V. Solov'ev, Designing of digital schemes on the basis of programmed logic integrated schemes. M.: Goryachaja linija. 2001. 636 p.

[2] A. Jutman, At-Speed On-Chip Diagnosis of Board-Level Interconnect Faults // Proc. of 9<sup>th</sup> European Test Symposium (ETS'04). France. 2004. P. 2 – 7.

[3] C. Aktouf, A Complete Strategy for Testing an On-Chip Multiprocessor Architecture // IEEE Design & Test of Computers. 2002. P. 18 – 28.

[4] V.I. Hahanov, Designing of digital schemes on the basis of programmed logic integrated schemes. K.: IZMN, 1997. 308 p.

[5] N. Ahmed, M. Tehranipour, M. Nourani, Extending JTAG for Testing Signal Integrity in SoCs // International Conference on Design Automation and Test in Europe DATE'03. 2003. P. 218 – 223.

[6] T.V. Gladkih, Verification the dynamic parameters of electronic devices on the basis of *K*-Value modeling. The dissertation on competition of a degree of Cand.Tech.Sci / National Technical University "Kharkov Polytechnical Institute". Kharkov, 2007. 311 p.

[7] V.D. Dmitrienko, S.Yu. Leonov, T.V. Gladkikh, System of *K*-Value simulation for research switching processes in digital devices // Proceedings of IEEE East-West Design & Test Workshop (EWDTW'06). Sochi, 2006. P. 428 – 435.

[8] C.J. Lin, S.M. Reddy, On delay fault testing in logic circuits // IEEE Transactions on Computer-aided design. 1987. No 5. P. 694 - 704.

[9] Logic modelling and testing of digital devices / Yu.A. Skobtsov, V.Yu. Skobtsov. Donetsk: IPMM NAN Ukraine, DonNTU, 2005. 436 p.



**Dmitrienko Valery Dmitrievich**, Dr.Sci.Tech., professor of the faculty "Computer facilities and programming", National Technical University "Kharkov Polytechnical Institute". Defended his doctoral thesis in 1994. Area of scientific interests: modelling and optimization of technical systems.

**Leonov Sergey Yurievich**, Cand.Tech.Sci., senior lecturer of the faculty "Computer facilities and programming", National Technical University "Kharkov Polytechnical Institute". Defended his candidate's thesis in 1993. Area of scientific interests: mathematical modelling and computer electronics.

**Gladkikh Tatyana Valentinovna**, Cand.Tech.Sci., senior lecturer of the faculty "Computer science and intellectual property", National Technical University "Kharkov Polytechnical Institute". Defended her candidate's thesis in 2007. Area of scientific interests: technical diagnostics, neural systems, fuzzy logic.

# Theory and Applications of Constrained Linear Predictive (LP) Models

Igor N. Presnjakov, Leonid I. Nefedov, Stanislaw A. Krivenko, and Alexander P. Stativka

*Abstract* — The present paper relates generally to speech encoding and decoding in voice communication systems; more particularly, it relates to various techniques used with code-excited linear prediction coding to obtain high quality speech reproduction through a limited bit rate communication channel.

### Index Terms — AMR, Speech coding, CELP

### I. INTRODUCTION

SIGNAL modeling and parameter estimation play significant roles in communicating voice information with limited bandwidth constraints. To model basic speech sounds, speech signals are sampled as a discrete waveform to be digitally processed. In one type of signal coding technique called LPC (linear predictive coding), the signal value at any particular time index is modeled as a linear function of previous values. A subsequent signal is thus linearly predictable according to an earlier value. As a result, efficient signal representations can be determined by estimating and applying certain prediction parameters to represent the signal.
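The idea of linear prediction can be made concrete with a minimal Python sketch: each sample is estimated as a weighted sum of the preceding samples, and only the prediction parameters and the residual need to be represented. The function names, the predictor coefficients and the test signal are hypothetical and chosen only so that the prediction is exact.

```python
def lp_predict(history, coeffs):
    """Predict the next sample as sum_k coeffs[k-1] * s[n-k].

    coeffs[0] multiplies the most recent sample in history.
    """
    return sum(a * s for a, s in zip(coeffs, reversed(history)))


def lp_residual(signal, coeffs):
    """Prediction error for every sample that has enough history."""
    p = len(coeffs)
    return [signal[n] - lp_predict(signal[:n], coeffs)
            for n in range(p, len(signal))]


# A linear ramp obeys s[n] = 2*s[n-1] - s[n-2], so a second-order
# predictor with these coefficients has zero residual.
ramp = [1, 2, 3, 4, 5]
print(lp_predict(ramp[:4], [2, -1]))
print(lp_residual(ramp, [2, -1]))
```

For real speech the residual is not zero, and practical coders choose the coefficients frame by frame so as to keep it small.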

Applying LPC techniques, a conventional source encoder operates on speech signals to extract modelling and parameter information for communication to a conventional source decoder via a communication channel. Once received, the decoder attempts to reconstruct a counterpart signal for playback that sounds to a human ear like the original speech.

A certain amount of communication channel bandwidth is required to communicate the modelling and parameter information to the decoder. In embodiments, for example where the channel bandwidth is shared and real-time reconstruction is necessary, a reduction in the required bandwidth proves beneficial. However, using conventional modeling techniques, the quality requirements in the reproduced speech limit the reduction of such bandwidth below certain levels.

S. A. Krivenko is with the Department of Communications, Kharkiv National University of Radio Electronics, Kharkiv, 61166 Ukraine (phone: 057-702-1429; cell phone: 067-723-7551; e-mail: stanislas@ukr.net).

A. P. Stativka was with the Department of Communications, Kharkiv National University of Radio Electronics, Kharkiv, 61166 Ukraine. He is now with the Department of system performer, Telesystems of Ukraine, Kyiv, 04080 (e-mail: stativka@people.net.ua).

Speech encoding becomes increasingly difficult as transmission bit rates decrease. Particularly for noise encoding, perceptual quality diminishes significantly at lower bit rates. Straightforward code-excited linear prediction (CELP) is used in many speech codecs, and it can be a very effective method of encoding speech at relatively high transmission rates. However, even this method may fail to provide perceptually accurate signal reproduction at lower bit rates. One reason is that the pulse-like excitation for noise signals becomes sparser at these lower bit rates, as fewer bits are available for coding and transmission, thereby resulting in annoying distortion of the noise signal upon reproduction.

Many communication systems operate at bit rates that vary with any number of factors including total traffic on the communication system. For such variable rate communication systems, the inability to detect low bit rates and to handle the coding of noise at those lower bit rates in an effective manner often can result in perceptually inaccurate reproduction of the speech signal. This inaccurate reproduction could be avoided if a more effective method for encoding noise at those low bit rates were identified.

Additionally, the inability to determine the optimal encoding mode for a given noise signal at a given bit rate also results in an inefficient use of encoding resources. For a given speech signal having a particular noise component, the ability to selectively apply an optimal coding scheme at a given bit rate would provide more efficient use of an encoder processing circuit. Moreover, the ability to select the optimal encoding mode for the type of noise signal would further maximize the available encoding resources while providing a more perceptually accurate reproduction of the noise signal.

### II. DETAILED DESCRIPTION OF THE MODELS

Fig. 1a is a schematic block diagram of a speech communication system illustrating the use of source encoding and decoding in accordance with the present model.

Manuscript received February 29, 2008.

I. N. Presnjakov is with the Department of Communications, Kharkiv National University of Radio Electronics, Kharkiv, 61166 Ukraine (phone: 057-702-1429; fax: 057-702-1429; e-mail: tkvt@kture.kharkov.ua).

L. I. Nefedov is with the Department of Automation, Kharkiv National Auto Road University, Kharkiv, 61022 Ukraine (e-mail: nefedov@mail.ru).





Therein, a speech communication system 100 supports communication and reproduction of speech across a communication channel 103. Although it may comprise, for example, a wire, fiber or optical link, the communication channel 103 typically comprises, at least in part, a radio frequency link that often must support multiple, simultaneous speech exchanges requiring shared bandwidth resources, such as may be found with cellular telephony embodiments.

Although not shown, a storage device may be coupled to the communication channel 103 to temporarily store speech information for delayed reproduction or playback, e.g., to perform answering machine functionality, voiced email, etc. Likewise, the communication channel 103 might be replaced by such a storage device in a single device embodiment of the communication system 100 that, for example, merely records and stores speech for subsequent playback.

In particular, a microphone 111 produces a speech signal in real time. The microphone 111 delivers the speech signal to an A/D (analog to digital) converter 115. The A/D converter 115 converts the speech signal to a digital form then delivers the digitized speech signal to a speech encoder 117.

The speech encoder 117 encodes the digitized speech by using a selected one of a plurality of encoding modes. Each of the plurality of encoding modes utilizes particular techniques that attempt to optimize quality of resultant reproduced speech. While operating in any of the plurality of modes, the speech encoder 117 produces a series of modeling and parameter information (hereinafter "speech indices"), and delivers the speech indices to a channel encoder 119.

The channel encoder 119 coordinates with a channel decoder 131 to deliver the speech indices across the communication channel 103. The channel decoder 131 forwards the speech indices to a speech decoder 133. While operating in a mode that corresponds to that of the speech encoder 117, the speech decoder 133 attempts to recreate the original speech from the speech indices as accurately as possible at a speaker 137 via a D/A (digital to analog) converter 135.

The speech encoder 117 adaptively selects one of the plurality of operating modes based on the data rate restrictions through the communication channel 103. The communication channel 103 comprises a bandwidth allocation between the channel encoder 119 and the channel decoder 131. The allocation is established, for example, by telephone switching networks wherein many such channels are allocated and reallocated as need arises. In one such embodiment, either a 22.8 kbps (kilobits per second) channel bandwidth, i.e., a full rate channel, or an 11.4 kbps channel bandwidth, i.e., a half rate channel, may be allocated.

With the full rate channel bandwidth allocation, the speech encoder 117 may adaptively select an encoding mode that supports a bit rate of 11.0, 8.0, 6.65 or 5.8 kbps. The speech encoder 117 adaptively selects an 8.0, 6.65, 5.8 or 4.5 kbps encoding bit rate mode when only the half rate channel has been allocated. Of course, these encoding bit rates and the aforementioned channel allocations are only representative of the present embodiment. Other variations to meet the goals of alternate embodiments are contemplated.

With either the full or half rate allocation, the speech encoder 117 attempts to communicate using the highest encoding bit rate mode that the allocated channel will support. If the allocated channel is or becomes noisy or otherwise restrictive to the highest or higher encoding bit rates, the speech encoder 117 adapts by selecting a lower bit rate encoding mode.

Similarly, when the communication channel 103 becomes more favorable, the speech encoder 117 adapts by switching to a higher bit rate encoding mode.
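The adaptive choice of encoding mode described above can be sketched in Python as follows. The bit rates are those given in the text; the function name and the parameter `max_supported_kbps`, a stand-in for whatever estimate of channel conditions the encoder uses, are hypothetical.

```python
# Encoding bit rates (kbps) available for each channel allocation,
# listed from highest to lowest.
FULL_RATE_MODES = (11.0, 8.0, 6.65, 5.8)  # 22.8 kbps channel
HALF_RATE_MODES = (8.0, 6.65, 5.8, 4.5)   # 11.4 kbps channel


def select_bit_rate(channel, max_supported_kbps):
    """Pick the highest encoding rate the channel currently supports.

    channel is "full" or "half"; max_supported_kbps is an assumed
    estimate of the highest encoding rate the channel conditions allow.
    Falls back to the lowest mode if none fits.
    """
    if channel == "full":
        modes = FULL_RATE_MODES
    elif channel == "half":
        modes = HALF_RATE_MODES
    else:
        raise ValueError("channel must be 'full' or 'half'")
    for rate in modes:
        if rate <= max_supported_kbps:
            return rate
    return modes[-1]


print(select_bit_rate("full", 22.8))  # clean full rate channel
print(select_bit_rate("full", 6.0))   # noisy channel: lower mode
```

Calling the function again whenever the channel estimate changes reproduces the switching to lower or higher bit rate modes described in the text.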

With lower bit rate encoding, the speech encoder 117 incorporates various techniques to generate better low bit rate speech reproduction. Many of the techniques applied are based on characteristics of the speech itself. For example, with lower bit rate encoding, the speech encoder 117 classifies noise, unvoiced speech, and voiced speech so that an appropriate modeling scheme corresponding to a particular classification can be selected and implemented. Thus, the speech encoder 117 adaptively selects from among a plurality of modeling schemes those most suited for the current speech. The speech encoder 117 also applies various other techniques to optimize the modeling as set forth in more detail below.

Fig. 1b is a schematic block diagram illustrating several variations of an exemplary communication device employing the functionality of Fig. 1a.

A communication device 151 comprises both a speech encoder and decoder for simultaneous capture and reproduction of speech. Typically within a single housing, the communication device 151 might, for example, comprise a cellular telephone, portable telephone, computing system, etc. Alternatively, with some modification to include, for example, a memory element to store encoded speech information, the communication device 151 might comprise an answering machine, a recorder, voice mail system, etc.

A microphone 155 and an A/D converter 157 coordinate to deliver a digital voice signal to an encoding system 159. The encoding system 159 performs speech and channel encoding and delivers resultant speech information to the channel. The delivered speech information may be destined for another communication device (not shown) at a remote location.

As speech information is received, a decoding system 165 performs channel and speech decoding then coordinates with a D/A converter 167 and a speaker 169 to reproduce something that sounds like the originally captured speech.

The encoding system 159 comprises both a speech processing circuit 185 that performs speech encoding, and a channel processing circuit 187 that performs channel encoding. Similarly, the decoding system 165 comprises a speech processing circuit 189 that performs speech decoding, and a channel processing circuit 191 that performs channel decoding.

Although the speech processing circuit 185 and the channel processing circuit 187 are separately illustrated, they might be combined in part or in total into a single unit. For example, the speech processing circuit 185 and the channel processing circuit 187 might share a single DSP (digital signal processor) and/or other processing circuitry.

Similarly, the speech processing circuit 189 and the channel processing circuit 191 might be entirely separate or combined in part or in whole. Moreover, combinations in whole or in part might be applied to the speech processing circuits 185 and 189, the channel processing circuits 187 and 191, the processing circuits 185, 187, 189 and 191, or otherwise.

The encoding system 159 and the decoding system 165 both utilize a memory 161. The speech processing circuit 185 utilizes a fixed codebook 181 and an adaptive codebook 183 of a speech memory 177 in the source encoding process. The channel processing circuit 187 utilizes a channel memory 175 to perform channel encoding. Similarly, the speech processing circuit 189 utilizes the fixed codebook 181 and the adaptive codebook 183 in the source decoding process. The channel processing circuit 191 utilizes the channel memory 175 to perform channel decoding.

The speech memory 177 is shared as illustrated. Separate copies thereof can be assigned for the processing circuits 185 and 189. Likewise, separate channel memory can be allocated to both the processing circuits 187 and 191. The memory 161 also contains software utilized by the processing circuits 185, 187, 189 and 191 to perform the various functions required in the source and channel encoding and decoding processes.

### III. A MULTI-STEP ENCODING

Figs. 2-4 are functional block diagrams illustrating a multi-step encoding approach used by one embodiment of the speech encoder illustrated in Figs. 1a and 1b. In particular, Fig. 2 is a functional block diagram illustrating a first stage of operations performed by one embodiment of the speech encoder shown in Figs. 1a and 1b. The speech encoder, which comprises encoder processing circuitry, typically operates pursuant to software instruction carrying out the following functionality.

At a block 215, source encoder processing circuitry performs high pass filtering of a speech signal 211. The filter uses a cutoff frequency of around 80 Hz to remove, for example, 60 Hz power line noise and other lower frequency signals. After such filtering, the source encoder processing circuitry applies a perceptual weighting filter as represented by a block 219. The perceptual weighting filter operates to emphasize the valley areas of the filtered speech signal. If the encoder processing circuitry selects operation in a pitch preprocessing (PP) mode as indicated at a control block 245, a pitch preprocessing operation is performed on the weighted speech signal at a block 225. The pitch preprocessing operation involves warping the weighted speech signal to match interpolated pitch values that will be generated by the decoder processing circuitry. When pitch preprocessing is applied, the warped speech signal is designated a first target signal 229. If pitch preprocessing is not selected at the control block 245, the weighted speech signal passes through the block 225 without pitch preprocessing and is designated the first target signal 229.



Fig. 2. A first stage of operations performed by one embodiment of the speech encoder shown in Figs. 1a and 1b

As represented by a block 255, the encoder processing circuitry applies a process wherein a contribution from an adaptive codebook 257 is selected along with a corresponding gain 257 which minimize a first error signal 253. The first error signal 253 comprises the difference between the first target signal 229 and a weighted, synthesized contribution from the adaptive codebook 257.

At blocks 247, 249 and 251, the resultant excitation vector is applied after adaptive gain reduction to both a synthesis and a weighting filter to generate a modeled signal that best matches the first target signal 229. The encoder processing circuitry uses LPC (linear predictive coding) analysis, as indicated by a block 239, to generate filter parameters for the synthesis and weighting filters. The weighting filters 219 and 251 are equivalent in functionality.

Next, the encoder processing circuitry designates the first error signal 253 as a second target signal for matching using contributions from a fixed codebook 261. The encoder processing circuitry searches through at least one of a plurality of sub codebooks within the fixed codebook 261 in an attempt to select a most appropriate contribution while generally attempting to match the second target signal.

More specifically, the encoder processing circuitry selects an excitation vector, its corresponding sub codebook and gain based on a variety of factors. For example, the encoding bit rate, the degree of minimization, and characteristics of the speech itself as represented by a block 279 are considered by the encoder processing circuitry at control block 275. Although many other factors may be considered, exemplary characteristics include speech classification, noise level, sharpness, periodicity, etc. Thus, by considering other such factors, a first sub codebook with its best excitation vector may be selected rather than a second sub codebook's best excitation vector even though the second sub codebook's vector better minimizes the second target signal 265.
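
The two sequential searches of the first stage can be sketched as below. This is a simplified illustration under stated assumptions: the codebook entries are taken to be already passed through the synthesis and weighting filters, the selection uses error minimization only (the text notes that bit rate and speech characteristics also influence the choice), and the names `best_entry` and `two_step_search` are hypothetical. It relies on the standard least-squares result that, for a candidate c and target t, the optimal gain is ⟨t,c⟩/⟨c,c⟩.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def best_entry(target, codebook):
    """Return (index, gain, error_energy) of the entry that best matches target.

    For a candidate c the optimal gain is g = <t,c>/<c,c>, which leaves an
    error energy of <t,t> - <t,c>^2/<c,c>.
    """
    best = None
    for index, c in enumerate(codebook):
        energy = dot(c, c)
        if energy == 0.0:
            continue  # an all-zero entry cannot contribute
        corr = dot(target, c)
        error = dot(target, target) - corr * corr / energy
        if best is None or error < best[2]:
            best = (index, corr / energy, error)
    return best

def two_step_search(target, adaptive_cb, fixed_cb):
    """Adaptive codebook first, then fixed codebook on the remaining error."""
    ia, ga, _ = best_entry(target, adaptive_cb)
    second_target = [t - ga * c for t, c in zip(target, adaptive_cb[ia])]
    i_f, gf, err = best_entry(second_target, fixed_cb)
    return ia, ga, i_f, gf, err

# Toy data, chosen only for illustration.
target = [2.0, 1.0, 0.0, -1.0]
adaptive_cb = [[1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
fixed_cb = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
print(two_step_search(target, adaptive_cb, fixed_cb))
```

The second search operates on the residual of the first, which mirrors the designation of the first error signal as the second target signal.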

Fig. 3 is a functional block diagram depicting a second stage of operations performed by the embodiment of the speech encoder illustrated in Fig. 2.



Fig. 3. A functional block diagram depicting a second stage of operations performed by the embodiment of the speech encoder

In the second stage, the speech encoding circuitry simultaneously uses both the adaptive and the fixed codebook vectors found in the first stage of operations to minimize a third error signal 311.

The speech encoding circuitry searches for optimum gain values for the previously identified excitation vectors (in the first stage) from both the adaptive and fixed codebooks 257 and 261. As indicated by blocks 307 and 309, the speech encoding circuitry identifies the optimum gain by generating a synthesized and weighted signal, i.e., via blocks 301 and 303, that best matches the first target signal 229 (which minimizes the third error signal 311). Of course, if processing capabilities permit, the first and second stages could be combined wherein joint optimization of both gain and adaptive and fixed codebook vector selection could be used.
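
A joint search for the two gains, with both excitation vectors held fixed, amounts to a two-variable least-squares problem. The sketch below solves its normal equations directly. It is an illustration only: the vectors are assumed to be already synthesized and weighted, and the function name `joint_gains` and the toy data are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def joint_gains(target, va, vf):
    """Gains (ga, gf) minimizing |target - ga*va - gf*vf|^2.

    Solves the 2x2 normal equations
        [<va,va> <va,vf>] [ga]   [<t,va>]
        [<va,vf> <vf,vf>] [gf] = [<t,vf>]
    by Cramer's rule.
    """
    aa, ff, af = dot(va, va), dot(vf, vf), dot(va, vf)
    ta, tf = dot(target, va), dot(target, vf)
    det = aa * ff - af * af
    if det == 0.0:
        raise ValueError("excitation vectors are linearly dependent")
    return (ta * ff - tf * af) / det, (tf * aa - ta * af) / det

target = [2.0, 1.0, 0.0, -1.0]
va = [1.0, 1.0, 0.0, 0.0]   # adaptive codebook vector chosen in stage one
vf = [0.0, 1.0, 0.0, 1.0]   # fixed codebook vector chosen in stage one
ga, gf = joint_gains(target, va, vf)
residual = [t - ga * a - gf * f for t, a, f in zip(target, va, vf)]
print(ga, gf, residual)
```

In this toy case the jointly optimized gains reproduce the target exactly, whereas fitting the adaptive gain alone would give 1.5 instead of 2.0, because the two vectors are not orthogonal.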

Fig. 4 is a functional block diagram depicting a third stage of operations performed by the embodiment of the speech encoder illustrated in Figs. 2 and 3.

The encoder processing circuitry applies gain normalization, smoothing and quantization, as represented by blocks 401, 403 and 405, respectively, to the jointly optimized gains identified in the second stage of encoder processing. Again, the adaptive and fixed codebook vectors used are those identified in the first stage processing.

With normalization, smoothing and quantization functionally applied, the encoder processing circuitry has completed the modeling process. Therefore, the modeling parameters identified are communicated to the decoder. In particular, the encoder processing circuitry delivers an index to the selected adaptive codebook vector to the channel encoder via a multiplexor 419. Similarly, the encoder processing circuitry delivers the index to the selected fixed codebook vector, resultant gains, synthesis filter parameters, etc., to the multiplexor 419. The multiplexor 419 generates a bit stream 421 of such information for delivery to the channel encoder for communication to the channel and speech decoder of a receiving device [1].



Fig. 4. A functional block diagram depicting a third stage of operations performed by the embodiment of the speech encoder

In many applications it is important to measure how much distortion an operation exerts on the spectrum, that is, to measure how much the spectrum changes, for example, in quantization of parameters. Define the spectral error or difference  $V(\omega)$  as [2]

$$V(\omega) = 10 \log_{10} [A(\omega)] - 10 \log_{10} [\hat{A}(\omega)]. \qquad (1)$$

The simplest and most used of spectral distortion measures, log spectral distortion (SD), is defined as

$$d^{2} = (1/\pi) \int_{0}^{\pi} |V(\omega)|^{2} d\omega, \qquad (2)$$

where $A(\omega)$ is the original spectrum and $\hat{A}(\omega)$ the distorted spectrum [3].
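
For sampled spectra, the integral in (2) can be approximated by an average over equally spaced frequencies. The sketch below is a minimal illustration under that assumption. The function name `log_spectral_distortion` and the toy spectra are hypothetical, and the samples are assumed to be strictly positive values taken on the same uniform grid over [0, π].

```python
import math

def log_spectral_distortion(original, distorted):
    """Approximate d in (2) by the root mean square of V over the grid."""
    if len(original) != len(distorted):
        raise ValueError("spectra must be sampled on the same grid")
    # V(w_k) from (1): difference of the two spectra on a dB scale.
    v = [10.0 * math.log10(a) - 10.0 * math.log10(b)
         for a, b in zip(original, distorted)]
    return math.sqrt(sum(x * x for x in v) / len(v))

original = [1.0, 10.0, 100.0, 10.0]
distorted = [10.0, 10.0, 10.0, 10.0]
print(log_spectral_distortion(original, distorted))  # about 7.07 dB
print(log_spectral_distortion(original, original))   # identical spectra: 0.0
```

Here V takes the values -10, 0, 10 and 0 dB, so the mean square is 50 and the distortion is its square root.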

In this paper we describe the measured characteristics of constrained Linear Predictive (LP) models.

#### IV. LINEAR PREDICTIVE MODELS

#### A. Hardware

Fig. 5 is a block diagram of an embodiment illustrating functionality of a speech decoder having corresponding functionality to that illustrated in Figs. 2-4.



Fig. 5. A functional block diagram of the speech decoder corresponding to the speech encoder of Figs. 2-4

As with the speech encoder, the speech decoder, which comprises decoder processing circuitry, typically operates pursuant to software instruction carrying out the following functionality.

A demultiplexor 511 receives a bit stream 513 of speech modeling indices from an often remote encoder via a channel decoder. As previously discussed, the encoder selected each index value during the multi-stage encoding process described above in reference to Figs. 2-4. The decoder processing circuitry utilizes indices, for example, to select excitation vectors from an adaptive codebook 515 and a fixed codebook 519, set the adaptive and fixed codebook gains at a block 521, and set the parameters for a synthesis filter 531. With such parameters and vectors selected or set, the decoder processing circuitry generates a reproduced speech signal 539. In particular, the codebooks 515 and 519 generate excitation vectors identified by the indices from the demultiplexor 511. The decoder processing circuitry applies the indexed gains at the block 521 to the vectors, which are summed. At a block 527, the decoder processing circuitry modifies the gains to emphasize the contribution of the vector from the adaptive codebook 515. At a block 529, adaptive tilt compensation is applied to the combined vectors with a goal of flattening the excitation spectrum. The decoder processing circuitry performs synthesis filtering at the block 531 using the flattened excitation signal.

Finally, to generate the reproduced speech signal 539, post filtering is applied at a block 535, deemphasizing the valley areas of the reproduced speech signal 539 to reduce the effect of distortion. The hardware is implemented with Quartus II from Altera Corporation.
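
The core of the decoding path, forming the excitation from the two gain-scaled codebook vectors and passing it through an all-pole synthesis filter, can be sketched as follows. This is a minimal illustration, not the decoder described above: the gain modification, tilt compensation and post filtering steps are omitted, the function names are hypothetical, and the sign convention A(z) = 1 + Σ a_k z^-k is an assumption.

```python
def build_excitation(adaptive_vec, fixed_vec, gain_a, gain_f):
    """Sum of the gain-scaled adaptive and fixed codebook vectors."""
    return [gain_a * a + gain_f * f for a, f in zip(adaptive_vec, fixed_vec)]

def synthesis_filter(excitation, lp_coeffs):
    """All-pole filter 1/A(z) with A(z) = 1 + sum_k a_k z^-k.

    lp_coeffs holds a_1..a_p. The filter memory starts at zero.
    """
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lp_coeffs, start=1):
            if n - k >= 0:
                acc -= a * out[n - k]
        out.append(acc)
    return out

# First-order toy filter; the codec in the text uses a 10th order filter.
excitation = build_excitation([1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], 1.0, 0.5)
speech = synthesis_filter(excitation, [-0.5])
print(excitation)  # [1.0, 0.0, 0.5, 0.0]
print(speech)      # [1.0, 0.5, 0.75, 0.375]
```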

## B. Software

A sub design 551 receives a bit stream of linear predictive speech modeling indices $LPC_j$ from an often remote encoder via a channel decoder, and a second set $LPC_j^{*}$ via block 541. It uses the following modified nonlinear equation:

$$SA^{2} = \sum_{j} 10 \log_{10} \left| LPC_{j}^{*} / LPC_{j} \right|. \qquad (3)$$

The software model is implemented in MATLAB from MathWorks. The program text is an M-file (see Appendix).

## V. CELLULAR TELEPHONY

In the exemplary cellular telephony embodiment of the present invention, the A/D converter 115 (Fig. 1a) will generally involve analog to uniform digital PCM including: 1) an input level adjustment device; 2) an input anti-aliasing filter; 3) a sample-hold device sampling at 8 kHz; and 4) analog to uniform digital conversion to 13-bit representation.

Similarly, the D/A converter 135 will generally involve uniform digital PCM to analog including: 1) conversion from 13-bit/8 kHz uniform PCM to analog; 2) a hold device; 3) reconstruction filter including x/sin(x) correction; and 4) an output level adjustment device.

In terminal equipment, the A/D function may be achieved by direct conversion to 13-bit uniform PCM format, or by conversion to 8-bit/A-law companded format. For the D/A operation, the inverse operations take place.

The encoder 117 receives data samples with a resolution of 13 bits left justified in a 16-bit word. The three least significant bits are set to zero. The decoder 133 outputs data in the same format. Outside the speech codec, further processing can be applied to accommodate traffic data having a different representation. A specific embodiment of an AMR (adaptive multi-rate) codec with the operational functionality illustrated in Figs. 2-5 uses five source codecs with bit-rates 11.0, 8.0, 6.65, 5.8 and 4.55 kbps. The four highest source coding bit-rates are used in the full rate channel and the four lowest bit-rates in the half rate channel.
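
The sample format at the encoder input, 13 bits left justified in a 16-bit word with the three least significant bits set to zero, can be illustrated with a short sketch. The helper names `pack_sample` and `unpack_sample` are hypothetical, and two's complement representation of the 16-bit word is assumed.

```python
def pack_sample(sample_13bit):
    """Left-justify a signed 13-bit sample in a 16-bit word (3 LSBs zero)."""
    if not -4096 <= sample_13bit <= 4095:
        raise ValueError("sample out of 13-bit range")
    return (sample_13bit << 3) & 0xFFFF

def unpack_sample(word_16bit):
    """Recover the signed 13-bit sample from a left-justified 16-bit word."""
    value = word_16bit & 0xFFF8      # ignore the three least significant bits
    if value & 0x8000:               # sign bit set: convert from two's complement
        value -= 0x10000
    return value >> 3

print(hex(pack_sample(4095)))    # 0x7ff8
print(hex(pack_sample(-4096)))   # 0x8000
print(unpack_sample(0xFFF8))     # -1
```

The limits -4096 and +4095 are the saturation amplitudes quoted for the 13-bit test sequences.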

All five source codecs within the AMR codec are generally based on a code-excited linear predictive (CELP) coding model. A 10th order linear prediction (LP), or short-term, synthesis filter is used, e.g., at the blocks 249, 267, 301, 407 and 531 (of Figs. 2-5).

## VI. SIMULATION MODEL

Twenty-one encoder input sequences are provided by ETSI [4]. Note that for the input sequences TEST0.INP to TEST3.INP, the amplitude figures are given in 13-bit precision. The active speech levels are given in dBov.

TEST0.INP - Synthetic harmonic signal. The pitch delay varies slowly from 18 to 143.5 samples. The minimum and maximum amplitudes are -997 and +971.

TEST1.INP - Synthetic harmonic signal. The pitch delay varies slowly from 144 down to 18.5 samples. Amplitudes at saturation point -4096 and +4095.

TEST2.INP - Sinusoidal sweep varying from 150 Hz to 3400 Hz. Amplitudes  $\pm$  1250.

TEST3.INP - Sinusoidal sweep varying from 150 Hz to 3400 Hz. Amplitudes  $\pm$  4000.

TEST4.INP - Female speech, active speech level: -19.4 dBov, flat frequency response.

TEST5.INP - Male speech, active speech level: -18.7 dBov, flat frequency response.

TEST6.INP - Female speech, ambient noise, active speech level: -35.0 dBov, flat frequency response.

TEST7.INP - Female speech, ambient noise, active speech level: -25.0 dBov, flat frequency response.

TEST8.INP - Female speech, ambient noise, active speech level: -15.6 dBov, flat frequency response.

TEST9.INP - Female speech, car noise, active speech level: -35.5 dBov, flat frequency response.

TEST10.INP - Female speech, car noise, active speech level: -26.1 dBov, flat frequency response.

TEST11.INP - Female speech, car noise, active speech level: -15.8 dBov, flat frequency response.

TEST12.INP - Male speech, ambient noise, active speech level: -34.9 dBov, flat frequency response.

TEST13.INP - Male speech, ambient noise, active speech level: -24.8 dBov, flat frequency response.

TEST14.INP - Male speech, ambient noise, active speech level: -15.0 dBov, flat frequency response.

TEST15.INP - Male speech, babble noise, active speech level: -34.1 dBov, flat frequency response.

TEST16.INP - Male speech, babble noise, active speech level: -24.3 dBov, flat frequency response.

TEST17.INP - Male speech, babble noise, active speech level: -14.4 dBov, flat frequency response.

TEST18.INP - Female speech, ambient noise, active speech level: -26.0 dBov, modified IRS frequency response, with many zero frames.

TEST19.INP - Male speech, ambient noise, active speech level: -36.0 dBov, modified IRS frequency response, with many zero frames.

TEST20.INP - Sequence for exercising the LPC vector quantization codebooks and ROM tables of the codec.

The TEST0.INP and TEST1.INP sequences were designed to test the pitch lag of the GSM enhanced full rate speech encoder. In a correct implementation, the resulting speech encoder output parameters shall be identical to those specified in the TEST0.COD and TEST1.COD sequences, respectively.

The document [5] contains an electronic copy of the ANSI-C code for the Adaptive Multi-Rate codec. The ANSI-C code is necessary for a bit exact implementation of the Adaptive Multi Rate speech transcoder (TS 26.090 [6]).

## VII. SIMULATION RESULTS

To begin, we obtain the spectral distortion of the TEST4.INP for the Fast Fourier transform. Table I shows the normalized spectral distortion of Female speech, active speech level: -19.4 dBov, flat frequency response.

TABLE I TEST4 LAST

| BIT<br>RATE,<br>KBPS | QUANTITY | QUANTITY / 8,7797(1) |
|----------------------|----------|----------------------|
| MR122                | 2,8397   | 0,323439             |
| MR102                | 2,9657   | 0,337791             |
| MR795                | 3,5974   | 0,409741             |
| MR74                 | 3,3838   | 0,385412             |
| MR67                 | 3,7678   | 0,429149             |
| MR595                | 3,6410   | 0,414707             |
| MR515                | 4,4173   | 0,503127             |
| MR475                | 4,3649   | 0,497158             |

Second, we obtain the spectral distortion of the TEST5.INP for the Fast Fourier transform. Table II shows the normalized spectral distortion of male speech, active speech level: -18.7 dBov, flat frequency response.

TABLE II TEST5 LAST

| BIT<br>RATE,<br>KBPS | QUANTITY | QUANTITY / 8,7797(3) |
|----------------------|----------|----------------------|
| MR122                | 5,2161   | 0,594109             |
| MR102                | 5,4805   | 0,624224             |
| MR795                | 6,5916   | 0,750777             |
| MR74                 | 6,5855   | 0,750083             |
| MR67                 | 7,5220   | 0,856749             |
| MR595                | 7,7835   | 0,886534             |
| MR515                | 8,4710   | 0,964839             |
| MR475                | 8,7797   | 1                    |

Third, we obtain the new spectral distortion of the TEST4.INP for the Fast Fourier transform. Table III shows the normalized spectral distortion of Female speech, active speech level: -19.4 dBov, flat frequency response.

TABLE III TEST4 NEW

| BIT<br>RATE,<br>KBPS | QUANTITY    | QUANTITY / 2,4408(2)   |
|----------------------|-------------|------------------------|
| MR122                | 1,9257      | 0,788963               |
| MR102                | 1,8834      | 0,771632               |
| MR795                | 1,9427      | 0,795928               |
| MR74                 | 1,9843      | 0,812971               |
| MR67                 | 1,9390      | 0,794412               |
| MR595                | 1,9377      | 0,793879               |
| MR515                | 2,0095      | 0,823296               |
| MR475                | 2,0348      | 0,833661               |

Fourth, we obtain the new spectral distortion of the TEST5.INP for the Fast Fourier transform. Table IV shows the normalized spectral distortion of male speech, active speech level: -18.7 dBov, flat frequency response.

TABLE IV TEST5 NEW

| BIT<br>RATE,<br>KBPS | QUANTITY | QUANTITY / 2,4408(4) |
|-------|----------|----------------------|
| MR122 | 2,0334   | 0,833088             |
| MR102 | 2,0502   | 0,839971             |
| MR795 | 2,0986   | 0,859800             |
| MR74  | 2,0925   | 0,857301             |
| MR67  | 2,0648   | 0,845952             |
| MR595 | 2,0857   | 0,854515             |
| MR515 | 2,1277   | 0,871722             |
| MR475 | 2,1492   | 0,880531             |







TABLE V DELTA

| BIT<br>RATE,<br>KBPS | LAST DELTA(5) | NEW DELTA(6) |
|----------------------|---------------|--------------|
| MR122                | 0,27067       | 0,044125     |
| MR102                | 0,286433      | 0,068338     |
| MR795                | 0,341037      | 0,063873     |
| MR74                 | 0,364671      | 0,044330     |
| MR67                 | 0,427600      | 0,051540     |
| MR595                | 0,471827      | 0,060636     |
| MR515                | 0,461713      | 0,048427     |
| MR475                | 0,502842      | 0,046870     |
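
The columns of Table V are not defined in the text, but the tabulated values agree with the difference between the normalized distortions of the male (TEST5) and female (TEST4) sequences, that is, column (5) = (3) - (1) and column (6) = (4) - (2). The short check below recomputes them from the QUANTITY columns of Tables I-IV. This reading is an inference from the numbers, not a definition given by the authors, and the variable names are hypothetical.

```python
MODES = ["MR122", "MR102", "MR795", "MR74", "MR67", "MR595", "MR515", "MR475"]

# QUANTITY columns of Tables I-IV (decimal commas written as points).
test4_last = [2.8397, 2.9657, 3.5974, 3.3838, 3.7678, 3.6410, 4.4173, 4.3649]
test5_last = [5.2161, 5.4805, 6.5916, 6.5855, 7.5220, 7.7835, 8.4710, 8.7797]
test4_new = [1.9257, 1.8834, 1.9427, 1.9843, 1.9390, 1.9377, 2.0095, 2.0348]
test5_new = [2.0334, 2.0502, 2.0986, 2.0925, 2.0648, 2.0857, 2.1277, 2.1492]

def deltas(female, male, norm):
    """Difference of the normalized male and female distortion values."""
    return [m / norm - f / norm for f, m in zip(female, male)]

last_delta = deltas(test4_last, test5_last, 8.7797)
new_delta = deltas(test4_new, test5_new, 2.4408)

for mode, d_last, d_new in zip(MODES, last_delta, new_delta):
    print(f"{mode}: last {d_last:.6f}  new {d_new:.6f}")
```

Under this reading, every value in the NEW DELTA column is smaller than every value in the LAST DELTA column.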

Fig. 7 shows the accuracy of the new technique.



Fig. 7. Simulation results

#### VIII. CONCLUSION

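The proposed technique is more accurate than the previous one. In Table V, the delta values fall from 0.27-0.50 for the previous measure to 0.04-0.07 for the new measure across all eight modes.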

#### APPENDIX

    p = 10;                                  % LP order
    number_of_frame = 64800/320;             % 202.5, so the loop runs over frames 1..202
    max_shift = 5;
    kolichestvo_tochek_na_grafike = 9;       % number of points on the plot
    yv = wavread('Speech.wav', 64800);
    old_y = yv;
    y(:, 1) = wavread('Speech.wav', 64800);
    y(:, 2) = wavread('speech_475-1.wav', 64800);
    y(:, 3) = wavread('speech_475-2.wav', 64800);
    y(:, 4) = wavread('speech_475-3.wav', 64800);
    y(:, 5) = wavread('speech_475-4.wav', 64800);
    y(:, 6) = wavread('speech_475-5.wav', 64800);
    y(:, 7) = wavread('speech_475-6.wav', 64800);
    y(:, 8) = wavread('speech_475-7.wav', 64800);
    y(:, 9) = wavread('speech_475-8.wav', 64800);
    for index_i = 1:kolichestvo_tochek_na_grafike
        sa_super = 0;
        count = 0;
        for index_ii = 1:number_of_frame
            count_i = 0;
            for index_iii = 1:max_shift
                % 160-sample frame of the processed signal, shifted by index_iii - 1
                new = y((index_ii*160 - 160 + index_iii):(index_ii*160 + index_iii - 1), index_i);
                [a, g] = lpc(new, p);
                response_new = real(a);
                % matching 160-sample frame of the original signal
                old = old_y((index_ii*160 - 159):index_ii*160, 1);
                [aa, gg] = lpc(old, p);
                response_old = real(aa);
                meas_sa = 10 * log10(abs(response_new ./ response_old));
                f_sa = meas_sa * meas_sa';
                sa_ii = sqrt(f_sa/(p + 1));
                % keep the smallest value over all shifts
                if (index_iii == 1)
                    old_sa_iii = sa_ii;
                end
                if (old_sa_iii > sa_ii)
                    old_sa_iii = sa_ii;
                end
                count_i = count_i + 1;
            end
            sa_super = sa_super + old_sa_iii;
            count = count + 1;
        end
        sa(index_i, 1) = sqrt(sa_super/count);
        % log spectral distortion from the FFT power spectra
        ayv = fft(yv);
        ayvv = fft(y(:, index_i));
        eyv = ayv .* conj(ayv);
        eyvv = ayvv .* conj(ayvv);
        yyv = 10*log10(eyv);
        yyvv = 10*log10(eyvv);
        meas = minus(yyv, yyvv);
        f = meas' * meas;
        sd(index_i, 1) = sqrt(f/64800);
    end
    sd = sd                                  % display the result

#### ACKNOWLEDGMENT

One of the authors, S.A. Krivenko, wishes to thank S.S. Krivenko for critical reading of the manuscript and useful discussion.

#### REFERENCES

- [1] Jess Thyssen, "Low complexity random codebook structure," U.S. Patent 99/19135, August 24, 1999.
- [2] Tom Backstrom. Linear predictive modeling of speech -constraints and line spectrum pair decomposition. Dissertation for the degree of Doctor of Science in Technology, Finland, Helsinki University of Technology, 2004. – p.37.
- [3] A. H. Gray and J. D. Markel. Distance measures for speech processing. IEEE Trans. Acoust. Speech Signal Proc., ASSP-24:pp. 380–391, Oct. 1976.
- [4] Digital cellular telecommunications system (Phase 2+); Test sequences for the GSM Enhanced Full Rate (EFR) (3GPP TS 46.054 version 6.0.0 Release 6). Available: http://www.etsi.com
- [5] Digital cellular telecommunications system (Phase 2+); Universal Mobile Telecommunications System (UMTS); AMR speech Codec; C-source code (3GPP TS 26.073 version 6.0.0 Release 6). Available: http://www.etsi.com
- [6] Digital cellular telecommunications system (Phase 2+); Universal Mobile Telecommunications System (UMTS); AMR speech Codec; Transcoding Functions (3GPP TS 26.090 version 6.0.0 Release 6). Available: http://www.etsi.com

## Monitoring of Photonic-Crystal Fibers Positioning in the Connection Process

A. Filipenko, O. Sychova

*Abstract* – This paper proposes a method for monitoring PCF positioning during the connection process. It is based on the principle of matched filtering in the form of autoconvolution. Equations relating the coordinates of the PCF core axis to the maximum of the autoconvolution of the optical field intensity are obtained. It is shown that the displacement of the optical fiber relative to the base coordinates corresponds to half the coordinate of the autoconvolution maximum. The studies show that the proposed method has high noise immunity and much higher accuracy than the integral method.

Index Terms – Autoconvolution, connection, optical intensity distribution, photonic crystal fiber (PCF)

## I. INTRODUCTION

The range of applications of a new type of optical fiber, the photonic-crystal fiber (PCF), has recently been expanding in electronics. Unlike ordinary fibers, a PCF has air channels in its cladding. The optical properties of a PCF can be varied over a wide range depending on the geometry, the sizes of the channels, the spacing between them, and their relative arrangement (hexagonal or random). Therefore, even the slightest deformation of the PCF geometry strongly affects the optical characteristics of the fiber, and hence signal transmission.

In many cases, for example when installing functional electronic elements based on PCF, PCFs need to be connected to one another or to ordinary fibers. This inevitably leads to deformations of the PCF geometry and to various displacements. As is known, exceeding the allowable displacement leads to a considerable increase in optical insertion loss. One of the main problems in keeping insertion loss low is determining the spatial location of the positioned objects, in particular the PCF, relative to the base coordinates. The latter can be the core axes of the mating optical fibers or a base axis of the technological equipment.

This problem is solved by developing a special technique that keeps positioning errors within a few percent of the controlled value, which amounts to tenths of a micrometer. Most existing methods are based on acquiring and analyzing optical images of the positioned fibers and are intended for controlling the parameters of ordinary optical fibers [1-5].

The purpose of this research is to develop an automated precision method for monitoring PCF positioning during connection. The research addressed the mathematical justification of the principles of the technique, simulation on a personal computer, and experimental studies on technological equipment.

#### **II. THEORETICAL RESEARCHES**

As is well known, the fiber-optic connection process involves PCF positioning tasks in which the current position of the fiber is determined.

The characteristics that describe the connected PCFs include:

- the angle of mutual inclination of the core axes of the connected PCFs. In most modern positioning devices this quantity should be eliminated by placing the fibers in the V-grooves of the positioning elements. However, operating experience and research have shown that this is not always achieved;

- the longitudinal displacement of the PCF cores;

- the transverse displacement of the PCF cores.

It is reasonable to determine these positioning parameters by an optical television control method in which the PCF is probed transversely by a wide light beam of uniform illuminance. The scheme implementing this method is shown in Fig. 1.



Fig.1. Scheme of the control system of PCF positioning

Here use is made of the fact that, under transverse illumination, the fiber acts as a focusing cylindrical lens. It forms an optical field distribution in the photodetector plane from which the optical and geometrical characteristics of the fiber can be determined.

Manuscript received February 9, 2008.

A. Filipenko is with Kharkov National University of Radio Electronics; 14, Lenin Ave, Kharkov, 61166, Ukraine.

O. Sychova is with Kharkov National University of Radio Electronics 14, Lenin Ave, Kharkov, 61166, Ukraine.

The initial information to be analyzed is a matrix of brightness codes corresponding to the points of the image (Fig. 2, a). The resolution is determined by the size and number of the matrix elements and by the magnification of the objective.



Fig.2. Image of the optical field obtained by transverse probing of the PCF (a) and the corresponding 1D cross-section field intensity distribution (b)

The salient features of the acquired image are:

- the areas of highest optical intensity correspond to the free space surrounding the PCF. These signals considerably exceed the level of the dark areas;

- the darkest areas correspond to the regions of the PCF cladding that are free of air channels;

- when the PCF is displaced, the background zones above and below the fiber (as oriented in the figure) change in size.

The main difficulty in implementing this method is analyzing the measured information and drawing a conclusion about the state of the objects being connected.

Let us consider the solutions to these problems proposed in this work.

It is known from the structure of optical fibers that, in the absence of unacceptable defects, the field intensity distribution they create is symmetric about the core axis. The same holds for PCF. Fig. 2, *b* shows the 1D field intensity distribution corresponding to the image of the optical field obtained by transverse probing of the PCF, shown in Fig. 2, *a*.

From the figures it can be seen that the signal is symmetric (although it has insignificant distortions caused by defects and contamination of the fiber surfaces and optical elements) and is an even function about the axis that passes through the center of symmetry and, in the absence of displacement, coincides with the required optical axis of the core. Therefore, the problem of monitoring radial displacement reduces to determining the lateral shift of the center of symmetry of the signal. It is proposed to solve this problem using the principle of matched filtering in the form of autoconvolution [6–8].

The signal model in a section of the optical field intensity distribution can be written in the form

$$\xi(x) = I(x) + n(x), \qquad (1)$$

where $I(x) \approx E^2(x)$ is the intensity distribution function and $n(x)$ is additive noise with zero mean.

The matched filter is the optimum filter that minimizes the mean square error in extracting the useful component $I(x)$ from its mixture with noise $\xi(x)$.

The impulse response of the matched filter is the useful component reflected about the ordinate axis and shifted by $x_t$:

$$h(x) = I(x_t - x). \qquad (2)$$

The shift means that, to detect a signal of duration $x_t$, the response must be taken an interval $x_t$ after the signal begins. Matched filtering consists in passing the signal $I(x)$ through a filter with the impulse response $I(-x)$. The optimum filter does not depend on the amplitude of the signal, i.e. $\alpha \cdot I(-x)$ can be used instead of $I(-x)$.

Thus, up to a constant factor, the impulse response of the matched filter should be a reflected copy of the useful component, namely

$$h(x) = \alpha I(-x). \tag{3}$$

It is known that linear filtering in the spatial domain is equivalent to the mathematical operation of convolution

$$y(t) = \int_0^t x(\tau)h(t-\tau)d\tau. \qquad (4)$$

As noted, for the matched filter $h(t) = x(-t)$; therefore

$$y(t) = \int_{0}^{t} x(\tau) x(\tau - t) d\tau. \qquad (5)$$

The correlation function of $x(t)$ with itself can be written in the form

$$C(\tau) = \frac{1}{T} \int_{0}^{T} x(t) x(t-\tau) dt. \qquad (6)$$
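
The link between (5) and (6) can be checked numerically for a finite sequence: convolving a sequence with its reversed copy gives its autocorrelation at every lag. The sketch below is a discrete illustration only. It omits the 1/T normalization of (6), and the function names and sample values are hypothetical.

```python
def convolve(x, h):
    """Full discrete convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def autocorrelation(x, lag):
    """Unnormalized autocorrelation: sum over i of x[i] * x[i + |lag|]."""
    lag = abs(lag)
    return sum(x[i] * x[i + lag] for i in range(len(x) - lag))

x = [0.0, 1.0, 3.0, 2.0, 0.5]
h = x[::-1]                      # matched filter: reversed copy of the signal
y = convolve(x, h)
n = len(x)
# Output index k of the convolution corresponds to lag k - (n - 1).
acf = [autocorrelation(x, k - (n - 1)) for k in range(2 * n - 1)]
print(y == acf)                  # True
print(y[n - 1])                  # zero-lag value, the signal energy: 14.25
```

The largest output value occurs at zero lag, where it equals the energy of the sequence.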

Thus, matched filtering reduces to the convolution of $x(t)$ with $x(-t)$, that is, to calculating the autocorrelation function of $x(t)$. Using these results, we write the output signal of the matched filter as the convolution integral

$$s(z) = \xi(x) * h(x) = \int_{-D/2}^{D/2} \xi(x)h(z-x)dx, \quad (7)$$

where $D$ is the extent of the registration region.

Substituting (3) into (7) with $\alpha = 1$, at the point $z = 0$ we obtain

$$s(0) = \int_{-D/2}^{D/2} I^{2}(x) dx + R_{ni}(0) \approx R_{ii}(0), \qquad (8)$$

where the estimate of the cross-covariance function of noise and signal, $R_{ni}$, is close to zero owing to their statistical independence. Thus, the output signal of the matched filter corresponds to the autocovariance function of the useful component I(x) and reaches a maximum at the moment this component is exactly identified.
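The step from (7) to (8) can be written out explicitly. The short derivation below is a sketch that uses only the model (1) and the impulse response (3) with $\alpha = 1$; writing $R_{ni}(0)$ as an integral of the product n(x)I(x) is an assumption about how the cross-covariance estimate is defined.

$$\begin{aligned} s(0) &= \int_{-D/2}^{D/2} \xi(x)\,h(0-x)\,dx = \int_{-D/2}^{D/2} \bigl[I(x)+n(x)\bigr]\,I(x)\,dx \\ &= \int_{-D/2}^{D/2} I^{2}(x)\,dx + \int_{-D/2}^{D/2} n(x)\,I(x)\,dx = R_{ii}(0) + R_{ni}(0). \end{aligned}$$

Since h(x) = I(-x), the factor h(0 - x) equals I(x), which is why the signal term becomes the integral of $I^2(x)$.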

This analysis allows us to construct a computer algorithm that processes the measured optical field distribution by the autoconvolution method. Its operations are: recording a sequence of discrete signal values; forming a second sequence with the elements renumbered in reverse order; multiplying the elements of the two sequences pairwise; and summing the products for each value of the shift parameter z. The maximum of the resulting sum corresponds to the shift of the second sequence relative to the first at which they coincide by the criterion of minimum mean square error.
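The operations listed above can be sketched in a few lines. This is a minimal illustration in Python with NumPy, not the authors' program (the paper states that the modelling was done in MATLAB); the function name `autoconvolution_centre` and the Gaussian test profile are assumptions made for the example.

```python
import numpy as np

def autoconvolution_centre(signal):
    """Estimate the symmetry centre of a 1-D intensity profile.

    The profile is multiplied element-wise with its index-reversed copy
    and the products are summed for every relative shift z; this is the
    discrete autoconvolution. For a profile symmetric about sample c the
    maximum lies at z = 2c, so the centre estimate is z_max / 2.
    """
    xi = np.asarray(signal, dtype=float)
    s = np.convolve(xi, xi, mode="full")  # s[z] = sum_k xi[k] * xi[z - k]
    z_max = int(np.argmax(s))
    return z_max / 2.0, s

# Hypothetical test profile: a Gaussian centred on sample 30 of 101 samples.
x = np.arange(101)
profile = np.exp(-((x - 30) / 5.0) ** 2)
centre, s = autoconvolution_centre(profile)
print("estimated centre (samples):", centre)
```

Because the shift index z runs over sums of two sample indices, the estimate has a resolution of half a sample.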

#### III. MODELLING AND EXPERIMENTAL RESEARCHES OF PCF POSITIONING MONITORING

The capabilities of the positioning control method were investigated by computer modelling. The following steps were examined in the experiments:

- calculating the autoconvolution of the initial field intensity distribution at a set measurement error, and finding the coordinates of the displaced PCF centre from its maximum;

- determining the core centre coordinates $X_c$ and their errors, expressed through the centre of gravity of the image intensity function, and comparing them with the coordinates obtained from the autoconvolution of the initial field.

The efficiency and potential of the technique were checked by computer modelling, with a pseudorandom number generator imitating the measurement errors $\varepsilon$. The initial distribution of the electric field intensity I(x) was set as a realization of a statistical ensemble. The measurement error was set to 10% of the field amplitude at each point of the distribution. After the signal with 10% error was formed, the coordinate of the field distribution centre $X_c$ was calculated according to the equation defining the centre of gravity of the function

$$X_c = \int_{-D/2}^{D/2} xI(x)dx \Big/ \int_{-D/2}^{D/2} I(x)dx. \tag{9}$$
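For sampled data on a uniform grid, equation (9) becomes a ratio of two sums. The sketch below is illustrative only: the function name `gravity_centre`, the Gaussian profile and the linear brightness ramp are assumptions chosen to show how an asymmetric intensity pulls the centre of gravity away from the true axis.

```python
import numpy as np

def gravity_centre(x, intensity):
    """Discrete form of equation (9) on a uniform grid (dx cancels)."""
    x = np.asarray(x, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    return float(np.sum(x * intensity) / np.sum(intensity))

# Hypothetical symmetric profile centred on sample 30.
x = np.arange(101)
profile = np.exp(-((x - 30) / 5.0) ** 2)
xc_symmetric = gravity_centre(x, profile)

# The same profile under a hypothetical illumination ramp (brighter to the right).
skewed = profile * (1.0 + 0.01 * x)
xc_skewed = gravity_centre(x, skewed)

print("symmetric:", xc_symmetric, "skewed:", xc_skewed)
```

The symmetric profile returns its true centre, while the ramp shifts the estimate towards the brighter side.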

Next, the autoconvolution operation was applied to this signal $\xi(x)$ and the index of the maximum element of the resulting array was determined. An algorithm and a program implementing the method under study were developed for this research. The algorithm flowchart is shown in fig. 3. Modelling was carried out in MATLAB.



Fig. 3. Flowchart of the modelling algorithm

The proposed PCF positioning control method was investigated experimentally on research technological equipment. The experiments followed the same strategy as the computer modelling, using a measuring set-up that implements the near-field technique. The sensitivity and accuracy of the method were checked by means of standards displacing the images by a given value $\rho$, controlled by the reference small-displacement meter "Micron-02".

Experiments on automated Fujikura optical fiber splicing equipment were carried out to study the application of the autoconvolution method in a set-up implementing transverse fiber probing (fig. 4). The initial optical field distribution of the PCF under illumination transverse to the fiber axis (see fig. 2) and its autoconvolution are shown in fig. 5. Note that the autoconvolution maximum is located at twice the coordinate of the initial axis position.
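The doubling of the coordinate follows directly from the definition of convolution. As a sketch, assume the measured profile is an even function $I_0$ shifted by $\rho$, i.e. $I(x) = I_0(x - \rho)$, and that it lies entirely inside the registration window, so the integration limits can be extended to infinity (both are assumptions made for this illustration):

$$(I * I)(z) = \int_{-\infty}^{\infty} I_0(x - \rho)\, I_0(z - x - \rho)\, dx = \int_{-\infty}^{\infty} I_0(u)\, I_0\bigl((z - 2\rho) - u\bigr)\, du = (I_0 * I_0)(z - 2\rho).$$

Because $I_0$ is even, $I_0 * I_0$ coincides with its autocorrelation and peaks at zero argument, so the maximum of $(I * I)(z)$ occurs at $z = 2\rho$ and the axis displacement is half the position of the autoconvolution maximum.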

As already noted, PCFs have a structure that is symmetric about the axis. However, the optical fields they generate can be substantially asymmetric under the influence of various factors. These factors include nonuniformity of the light sources, defects of form and surface, contamination of the control objects and optical elements, and nonuniformity of CCD-matrix sensitivity and charge transfer.



Fig. 4. Scheme of the experimental set-up, based on the Fujikura optical fiber splicer, for studying the autoconvolution method with transverse PCF probing

In place of the standard machine vision system, a CCD camera, an analog-to-digital converter, an interface circuit and a personal computer were used. The experimental equipment also includes a microscope with a fiber clamping and moving module, and a light source (a semiconductor infrared laser). The standard micrometric positioners were used as moving devices in manual control mode. Movement of the fiber V-groove clamp was checked by the reference inductive displacement meter "Micron-02", which has a measurement error of 0.02 microns. A photonic-crystal fiber with a quartz core was used as the control object.

The technique consisted in creating a shift of the PCF image in the CCD-camera plane, recording the optical field intensity, converting it to digital form and processing it with the algorithm described above. The real displacement varied from 0 to 5 microns.

The measurement error of the core centre coordinates by the autoconvolution technique does not exceed one image element. The latter, as noted, is defined by the coordinate grid of the microscope and CCD-matrix system (for the experimental set-up used, at microscope magnification $300^{\times}$, it corresponds to 0.1 microns).



Fig. 5. 1D PCF optical intensity distribution and its autoconvolution

The research has shown that, in spite of the influences listed above, the proposed technique has high noise immunity and much higher accuracy than the integral method. The latter tends to displace the fiber axis coordinates towards regions of greater intensity, which inevitably leads to gross errors reaching several tens of image elements.

The accuracy results for the proposed method are shown in the table. The parameter «Displacement of initial field» gives the results of measurement by the reference device "Micron-02". The table shows that the error for real fields generated by the control objects does not exceed two image elements, which corresponds to less than 0.2 microns. When designing a measuring system, the number of discretization elements within the duration of the useful signal and the possible range of its displacement should be chosen with this value and the required positioning accuracy in mind.

Fig. 6 shows a 3D pattern of the image intensity distribution and its autoconvolution. As the image shows, determining the maximum corresponding to the fiber axis is straightforward.



Fig. 6. 3D pattern of the measured field distribution and its autoconvolution

| Parameter                                              | PCF   | PCF position at splicing |
|--------------------------------------------------------|-------|--------------------------|
| Displacement of initial field, µm                      | 12.6  | 10.25                    |
| Displacement of autoconvolution (in elements of image) | 251   | 202                      |
| Displacement of autoconvolution, µm                    | 25.1  | 20.2                     |
| Calculated PCF displacement, µm                        | 12.55 | 10.1                     |
| Absolute error, µm                                     | -0.05 | -0.15                    |
| Relative error, %                                      | -0.4  | -1.5                     |

The result of applying the method is two arrays: a matrix of the axis centre $X_c$ (a matrix in which the elements corresponding to the core axis are equal to «1» and the others to «0»), whose dimension $[i \times j]$ coincides with that of the initial matrix of image intensity codes, and a vector C[j], whose length is the number of sections and whose elements are the numbers of the lines i in which the autoconvolution maximum is observed:

$$C = [m(i, j)] = [5 \ 5 \ 5 \ 0 \ 4 \ 4 \ 4 \ 4]. \tag{11}$$
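The rows of the accuracy table are linked by simple arithmetic: the autoconvolution shift in image elements is converted to microns with the 0.1 µm element size stated for the set-up, halved to obtain the fiber displacement, and compared with the "Micron-02" reading. The sketch below reproduces that chain; the function name and the rounding for display are assumptions made for the illustration.

```python
PIXEL_UM = 0.1  # size of one image element in microns, as stated for the 300x set-up

def displacement_from_autoconvolution(shift_elements, reference_um, pixel_um=PIXEL_UM):
    """Return (calculated displacement, absolute error, relative error in %)."""
    shift_um = shift_elements * pixel_um      # autoconvolution shift in microns
    calculated = shift_um / 2.0               # the maximum sits at twice the displacement
    abs_err = calculated - reference_um
    rel_err = 100.0 * abs_err / reference_um
    return calculated, abs_err, rel_err

# Values taken from the two columns of the table.
for name, shift, ref in [("PCF", 251, 12.6), ("PCF position at splicing", 202, 10.25)]:
    calc, abs_err, rel_err = displacement_from_autoconvolution(shift, ref)
    print(f"{name}: {calc:.2f} um, error {abs_err:+.2f} um ({rel_err:+.1f} %)")
```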

Fig. 7 presents the result of applying the autoconvolution technique to the measured image.



Fig.7. PCF core line restored by an autoconvolution technique

## IV. CONCLUSION

In this paper, a method for controlling the positioning of photonic-crystal fibers was developed, based on analysis of the measured optical field intensity distribution and calculation of the autoconvolution of its discrete values. The capabilities of the method were investigated by computer modelling. It is shown that a clearly expressed and unambiguously defined autoconvolution maximum exists even in the presence of considerable measurement errors of the field amplitude. Therefore, the method has high noise immunity and much higher accuracy than, for example, the integral method, in which the fiber axis coordinates depend strongly on the form of the amplitude distribution, leading to gross errors reaching several tens of image elements.

#### REFERENCES

- [1] Filipenko A.I. Analysis method of optical fibers radiation // Radiotekhnika: All-Ukr.Sci. Interdep. Mag. 1997. №103. P.26–30.
- [2] Filipenko A.I., Malik B.A. System of the precision details control of fiber-optical information transfer systems components // Radiotekhnika: All-Ukr.Sci. Interdep. Mag. 1997. №103. P.31–34.
- [3] Nevludov I.Sh., Filipenko A.I. Technology of quality control of optical fibers positioning in optical connectors ferrules // Technology and designing in electronic equipment. 1997. №4. P.47-48.
- [4] Nevludov I.Sh., Filipenko A.I. The technological control of diameter mode fields of singlemode optical fibers // Technology and designing in electronic equipment. 1998. №1. P.22–24.
- [5] Filipenko A.I. Analysis method of radiation intensity and its use in manufacture of fiber-optical components // Radiotekhnika: All-Ukr.Sci. Interdep. Mag. 1999. № 110. P.130-133.
- [6] Filipenko A.I. Use of optical field distribution autoconvolution for position identification of optical fibers cores at their connection // Radiotekhnika: All-Ukr.Sci. Interdep. Mag. 2003. №132. P.109-114.
- [7] Filipenko A.I. Research of application of matching filtration for identification of optical fiber core position // High technologies in mechanical engineering: Mag of sci. works. 2004. №2(9). P.233-242.
- [8] Alexander Filipenko, Igor Nevludov. Core position identification of the optical fibers connection by an autoconvolution method // Proceedings of SPIE: Advanced optoelectronics and lasers, 2004.-Vol.5582, September.-P.269-277.



Alexander Filipenko received the engineer degree in radio apparatus design and manufacture from Kharkov Institute of Radio Electronics, Ukraine, in 1983, the Ph.D. degree in engineering from Kharkov Research Institute of Instrumentation Technologies, Ukraine, in 1995, and the D.Sc. degree in engineering from Kharkov Research Institute of Instrumentation Technologies, Ukraine, in 2005.

He is a Professor in the Technology and Automation of Electronic Apparatus Manufacture Department at the Kharkov National University of Radio Electronics, Ukraine. He is dean of the Electronic Apparatus Faculty at the Kharkov National University of Radio Electronics, Ukraine.

His research interests include design and manufacture of fiber optic components, signal processing, automation and control, and testing.

He is an academician of the Byelorussia, Russia and Ukraine Academy of Science of Applied Radio Electronics.



**Oksana Sychova** received the M.S. degree in electronics and telecommunication from Kharkov National University of Radio Electronics, Ukraine.

She is a postgraduate student in the Department of Technology and Automation of Electronic Apparatus Manufacture.

Her research interests include telecommunication systems and photonic-crystal optical fibers.

# Features of Decision Support's Program at Choice of Tests Optimized Sequence for Semiconductors Memory Diagnosing

Ryabtsev V.G, Almadi M.K.

Abstract – A method is proposed that reduces the labour intensity of calculations when choosing Pareto-optimal tests for diagnosing semiconductor storage devices. The method is implemented in the program Optimal-test, which reduces the duration of memory microcircuit testing without degrading their quality.

Index Terms - diagnosing, microcircuit, memory, test.

#### I. INTRODUCTION

When mass-produced storage devices are put into production, it is necessary to choose control-diagnostic equipment and a set of effective tests to include in the test program. Control-diagnostic equipment is acquired on the basis of an analysis of its technical descriptions and their conformity to the stated requirements. This first task is purely technical, and its solution does not require considerable effort. Considerable difficulties arise in choosing effective diagnostic tests and setting the test modes for semiconductor memory products [1-3].

To fulfil the production plan and deliver high-quality products, it is necessary to apply an optimized sequence of tests that achieves high diagnostic efficiency with limited production resources of time and equipment. To allow prompt adjustment of the test program, decision-support software needs to be developed.

To improve product quality, the duration of test diagnosing has to be increased, which raises the cost of the products and reduces the sales quantity, so the circle of cause-and-effect relations closes. Every firm decides independently what duration of test diagnosing is optimal for it and allows it to attain the required product quality.

Providing an economically acceptable level of diagnosing of memory microcircuits and memory modules is one of the tasks to be solved by managers of technical quality control departments in firms that design, manufacture and operate computing equipment intended for building information and control systems.

To reduce the labour intensity of this task, special methods, algorithms and programs need to be developed that use the wide capabilities of modern computing equipment to sort the data array describing the a priori properties of the applied tests.

Known data sorting algorithms process arrays of integer or real numbers and produce a new array ordered by ascending value [4]. However, after sorting, these algorithms do not allow one to identify which objects the ordered cells of the resulting array describe.

The purpose of this paper is to develop an algorithm and a method for choosing an optimized set of tests that combines high efficiency with an economically acceptable level of diagnosing of memory microcircuits and memory modules.

## II. MATHEMATICAL MODEL AND SOLVING OF RESEARCH TASK

When choosing an optimized sequence of tests for diagnosing memory microcircuits and modules, it must be taken into account that their diagnostic properties are fuzzily defined, and that some tests are offered without even a detailed algorithmic description. When the a priori information about the diagnostic properties of tests is fuzzy, it is expedient to apply the method of choosing Pareto-optimal tests, in which the test quality criteria are determined by experienced specialists in the diagnosing of semiconductor storage devices.

The most essential criterion for estimating test properties is the ability to detect the most common faults of memory cells, address decoders, read amplifiers, charge refresh circuits of the storage capacitors, and others.

These properties of tests are estimated by the probabilities of detecting faults of the given types, measured by real numbers in the range from 0 to 1, and are therefore easily compared.

The comparison vector $S_i^{\vartheta\mu}$ of the properties of tests $\vartheta$ and $\mu$ with respect to the probability of detecting a fault of type *i* is determined by the following expression:

$$\begin{split} S_i^{\vartheta\mu} &= 1, \text{ if } q_i^{\vartheta} > q_i^{\mu}; \\ S_i^{\vartheta\mu} &= 0, \text{ if } q_i^{\vartheta} = q_i^{\mu}; \\ S_i^{\vartheta\mu} &= -1, \text{ if } q_i^{\vartheta} < q_i^{\mu}, \end{split}$$

where $q_i^{\vartheta}$, $q_i^{\mu}$ are the probabilities of detecting faults of type *i* by tests $\vartheta$ and $\mu$ respectively.

Manuscript received January 21, 2008.

Ryabtsev V.G. is with the Cherkassy state technological university, Boulevard Shevchenko, 460, CHSTU, Cherkassy, Ukraine. 18006. Tel. (+380)-472-730271. E-mail: volodja18@ukr.net

Al Madi Mudar is with the Cherkassy state technological university, Boulevard Shevchenko, 460, CHSTU, Cherkassy, Ukraine. 18006. Tel. (+380)-472-730271. E-mail: mudarinfo@yahoo.com.

For every criterion, its mean value can be obtained by the formula

$$E_{\text{mid}}^{j} = \sum_{i=1}^{n} E_{ij} / n.$$

Then the normalized estimate $C_{ij}$ of the ability of the *i*-th test to detect a malfunction of type *j* can be obtained from the expression:

$$\begin{split} C_{ij} &= 1, \text{if } E_{ij} > E_{mid}^{j}; \\ C_{ij} &= 0, \text{if } E_{ij} = E_{mid}^{j}; \\ C_{ij} &= -1, \text{if } E_{ij} < E_{mid}^{j}. \end{split}$$
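The normalization above amounts to taking the sign of the deviation of each estimate from its column mean. A minimal sketch follows; the function name and the sample matrix are hypothetical, and the values are chosen to be exactly representable in binary so that the equality case is not affected by rounding.

```python
def normalised_estimates(E):
    """E[i][j] is the estimate of test i for criterion j.

    Returns C with C[i][j] in {-1, 0, 1}: the sign of E[i][j] minus the
    mean of column j. Exact equality of floats is fragile in general,
    so real data may need a tolerance.
    """
    n, m = len(E), len(E[0])
    means = [sum(E[i][j] for i in range(n)) / n for j in range(m)]

    def sign(d):
        return (d > 0) - (d < 0)

    return [[sign(E[i][j] - means[j]) for j in range(m)] for i in range(n)]

# Hypothetical estimates for three tests and two criteria.
E = [[0.75, 0.5],
     [0.50, 0.5],
     [0.25, 0.5]]
C = normalised_estimates(E)
print(C)
```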

Another important factor is the test execution time, which, depending on test complexity, can range from microseconds to hours and can even reach astronomical values. Naturally, tests whose duration exceeds what is economically feasible for production or operating conditions, $t_i > t_{max}$, must be excluded from the set under consideration.

For the remaining group, the test time parameter must be normalized, and for every test an estimate of its properties with respect to execution time is obtained according to the expression

$$S_t^j = \Bigl(\sum_{i=1}^n t_i / n\Bigr) \Big/ t_j,$$

where $S_t^j$ is the estimate of the properties of the *j*-th test with respect to execution time; n is the number of tests; $t_j$ is the execution time of the *j*-th test.
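The exclusion of over-long tests and the time normalization can be combined in one step. The sketch below is an illustration only: the function name, the threshold of 10 s and the choice of four tests are assumptions, while the times are the 4 Mb values from Table 2.

```python
def time_estimates(times, t_max):
    """Drop tests with t > t_max, then score the rest as mean(t) / t_j.

    Shorter tests get larger scores; a test of exactly average
    duration scores 1.
    """
    kept = {name: t for name, t in times.items() if t <= t_max}
    mean_t = sum(kept.values()) / len(kept)
    return {name: mean_t / t for name, t in kept.items()}

# Execution times in seconds for 4 Mb (Table 2); t_max is a hypothetical limit.
times_4mb = {"WalkRow": 86.067, "Hammer": 1.028, "March C": 0.21, "Scan": 0.084}
scores = time_estimates(times_4mb, t_max=10.0)
for name, s in scores.items():
    print(f"{name}: {s:.3f}")
```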

Estimates for m criteria and n tests are presented in Table 1.

TABLE 1 ESTIMATIONS FOR TESTS PROPERTIES

| Test name | Criterion 1 | Criterion 2 | Criterion 3 | … | Criterion m |
|-----------|-------------|-------------|-------------|---|-------------|
| Test 1    | $S_{11}$    | $S_{12}$    | $S_{13}$    | … | $S_{1m}$    |
| Test 2    | $S_{21}$    | $S_{22}$    | $S_{23}$    | … | $S_{2m}$    |
| …         | …           | …           | …           | … | …           |
| Test n    | $S_{n1}$    | $S_{n2}$    | $S_{n3}$    | … | $S_{nm}$    |

The generalized estimate $S_{sum}^i$ for every test can be obtained by the formula

$$S_{sum}^{i} = \sum_{j=1}^{m} S_{ij}.$$

Thus, the task of choosing the optimized test set reduces to ranking the tests, i.e. sorting them by their generalized estimates in ascending order.
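Summing each row of the estimate table and ordering the tests by that sum can be sketched as follows. The function name, test names and estimate values are hypothetical; the sort keeps each name attached to its sum, so the identity of the test is not lost after ordering.

```python
def rank_tests(names, S):
    """Return (name, generalized estimate) pairs in ascending order.

    S[i][j] is the estimate of test i for criterion j; the generalized
    estimate of test i is the sum of row i.
    """
    totals = [sum(row) for row in S]
    order = sorted(range(len(names)), key=lambda i: totals[i])
    return [(names[i], totals[i]) for i in order]

# Hypothetical estimates for three tests and three criteria.
names = ["Test A", "Test B", "Test C"]
S = [[1, 0, -1],
     [1, 1, 1],
     [-1, -1, 0]]
ranking = rank_tests(names, S)
print(ranking)  # the test with the largest generalized estimate comes last
```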

#### III. FEATURES OF THE PROGRAM OF THE OPTIMIZED TESTS SEQUENCE CHOICE

For the software implementation of the task, it is proposed to use user-defined structured data of the following kind:

An array of such structured data, containing the fault detection probabilities and the test names, is loaded into computer memory as input data.

To sort the data array, the "bubble" algorithm is proposed, in which records with "light" values of the key field float upwards like a bubble. To reduce the number of steps of the algorithm, a special flag that permits comparison of array cells is introduced; initially the flag is set to one.

When a comparison pass starts, the flag is set to zero, and the array cells are then compared in pairs (the first with the second, the second with the third, and so on) and ordered at the same time. If elements were exchanged, the flag is again set to one and a new sorting pass begins. If the flag was not set to one, there is nothing left to sort and the process stops.
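The flag-controlled bubble sort described above can be sketched as follows. The `Record` layout (a name and a key value) is a hypothetical stand-in, since the structure used by the Optimal-test program is not shown in the text; the sample scores are likewise invented.

```python
from dataclasses import dataclass

@dataclass
class Record:
    """Hypothetical record: a test name with its generalized estimate."""
    name: str
    score: float

def bubble_sort_with_flag(records):
    """Sort records by ascending score, stopping after a pass with no swaps."""
    items = list(records)
    swapped = True                      # the flag starts at one
    while swapped:
        swapped = False                 # reset to zero at the start of each pass
        for k in range(len(items) - 1):
            if items[k].score > items[k + 1].score:
                items[k], items[k + 1] = items[k + 1], items[k]
                swapped = True          # an exchange happened: another pass is needed
    return items

data = [Record("March C", 3.0), Record("Scan", -2.0), Record("Hammer", 1.0)]
ordered = bubble_sort_with_flag(data)
print([r.name for r in ordered])
```

Because whole records are exchanged rather than bare numbers, each sorted key stays attached to the test it describes.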

#### IV. PRACTICAL RESULTS OF RESEARCH

Diagnostic experiments were executed at two values of the power supply voltage: high (HVcc) and low (LVcc). At HVcc, faults were found in 1545 microcircuits; in 1343 of them the defects were detected by all tests, and in only 202 microcircuits were the defects detected by just one or a few tests [5].

The diagnostic experiments used tests whose execution times for memory microcircuits of different capacity are given in Table 2.

The probabilities of detecting faults in memory microcircuits by single tests and by sets consisting of combinations of two tests were calculated by the formulas:

$$\forall i = \overline{1, n}: \ q_i = \frac{m_i}{m_s}; \qquad \forall i, j = \overline{1, n}, \ i \neq j: \ q_{ij} = \frac{m_i \cup m_j}{m_s},$$

where $m_i$, $m_j$ are the numbers of microcircuits whose faults were detected by tests *i* and *j* respectively; $m_s$ is the total number of defective microcircuits.
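One way to compute these probabilities is to keep, for each test, the set of defective microcircuits it detected. The sketch below assumes that $m_i \cup m_j$ denotes the number of microcircuits detected by at least one of the two tests; that reading, the function name and the sample data are assumptions made for the illustration.

```python
def detection_probabilities(detected, m_s):
    """detected maps a test name to the set of defective chip ids it found.

    Returns (q, q_pair): single-test probabilities and probabilities for
    every unordered pair of distinct tests, both relative to m_s.
    """
    q = {name: len(chips) / m_s for name, chips in detected.items()}
    names = sorted(detected)
    q_pair = {}
    for idx, a in enumerate(names):
        for b in names[idx + 1:]:
            q_pair[(a, b)] = len(detected[a] | detected[b]) / m_s
    return q, q_pair

# Hypothetical data: five defective chips, two tests.
detected = {"T1": {1, 2, 3}, "T2": {3, 4}}
q, q_pair = detection_probabilities(detected, m_s=5)
print(q, q_pair)
```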

The execution time of the most widespread tests is given in Table 2. The execution time of different tests can be measured in seconds and can even reach a few days.

If no measures are taken to optimize the test set, the execution time of an arbitrary sequence of tests can become very large, and the cost of memory microcircuits will grow sharply.

TABLE 2 IMPLEMENTATION TIME OF EXAMINED TESTS (time in seconds at tc = 5 nsec for the given memory capacity)

| №  | Test name  | Formula | 1 Mb   | 4 Mb    | 16 Mb   | 64 Mb  | 256 Mb |
|----|------------|---------|--------|---------|---------|--------|--------|
| 1  | WalkRow    | 8n+2nC  | 10.779 | 86.067  | 687.866 | 5500   | 43990  |
| 2  | WalkColumn | 8n+2nR  | 10.779 | 86.067  | 687.866 | 5500   | 43990  |
| 3  | GalRow     | 6n+4nC  | 21.508 | 171.925 | 1375    | 11000  | 87970  |
| 4  | GalColumn  | 6n+4nR  | 21.506 | 171.925 | 1375    | 11000  | 87970  |
| 5  | Hammer     | 49n     | 0.257  | 1.028   | 4.11    | 16.442 | 65.767 |
| 6  | March SL   | 41n     | 0.215  | 0.86    | 3.439   | 13.757 | 55.029 |
| 7  | March RAW  | 26n     | 0.136  | 0.545   | 2.181   | 8.724  | 34.897 |
| 8  | March SS   | 22n     | 0.115  | 0.461   | 1.845   | 7.382  | 29.528 |
| 9  | March G    | 23n     | 0.121  | 0.482   | 1.929   | 7.718  | 30.87  |
| 10 | March SR   | 14n     | 0.073  | 0.294   | 1.174   | 4.698  | 18.79  |
| 11 | PMOVI      | 13n     | 0.068  | 0.273   | 1.091   | 4.362  | 17.448 |
| 12 | March C    | 10n     | 0.052  | 0.21    | 0.839   | 3.355  | 13.422 |
| 13 | MATS++     | 6n      | 0.031  | 0.126   | 0.503   | 2.013  | 8.053  |
| 14 | MATS+      | 5n      | 0.026  | 0.105   | 0.419   | 1.678  | 6.711  |
| 15 | Scan       | 4n      | 0.021  | 0.084   | 0.336   | 1.342  | 5.369  |
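The entries of Table 2 can be reproduced from the formulas in the "Formula" column. The sketch below rests on two assumptions that are not stated in the text but match the tabulated values: n is the number of memory cells with 1 Mb taken as 2^20 cells, and the array is square, so that R = C = sqrt(n). The function name is hypothetical.

```python
import math

T_C = 5e-9  # access cycle time in seconds (tc = 5 nsec)

def execution_time(coeff_n, coeff_nc, megabits, t_c=T_C):
    """Time of a test with complexity coeff_n*n + coeff_nc*n*C, in seconds.

    Assumes n = megabits * 2**20 cells and a square array, C = R = sqrt(n).
    """
    n = megabits * 2**20
    side = math.isqrt(n)
    return (coeff_n * n + coeff_nc * n * side) * t_c

print(round(execution_time(8, 2, 1), 3))    # WalkRow, 1 Mb
print(round(execution_time(10, 0, 4), 3))   # March C, 4 Mb
print(round(execution_time(49, 0, 256), 3)) # Hammer, 256 Mb
```

The linear tests scale with n only, while the Walk and Gal tests carry the extra factor of the row or column count, which is why their times grow so much faster with capacity.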

The Pareto-optimal sequence of tests obtained by the proposed algorithm at HVcc is shown in fig. 1.

Program output (OUT.TXT), Paret-optimal sequence:

| Position | Test number | Test name  |
|----------|-------------|------------|
| Best     | 12          | March_C    |
| Next     | 9           | March_G    |
| Next     | 6           | March_SL   |
| Next     | 8           | March_SS   |
| Next     | 7           | March_RAW  |
| Next     | 5           | Hammer     |
| Next     | 4           | GalColumn  |
| Next     | 3           | GalRow     |
| Next     | 11          | PMOVI      |
| Next     | 13          | MATS++     |
| Next     | 14          | MATS+      |
| Next     | 10          | March_SR   |
| Next     | 15          | Scan       |
| Next     | 2           | WalkColumn |
| Next     | 1           | WalkRow    |

Fig. 1. Pareto-optimal sequence of tests at HVcc

The experimental results establish that 6 tests from the optimized set detect all defects of the given microcircuits; therefore applying the other tests is inexpedient.

Graphs of the diagnosing duration of memory microcircuits with a capacity of 4 Mb and an access cycle time of 5 nsec at HVcc, for the arbitrary and the optimized sequences of tests, are shown in fig. 2.

The execution time of 6 tests from an arbitrary set is 345.675 sec, whereas for 6 tests from the optimized sequence it is only 3.586 sec.





Fig. 2. Duration of diagnosing for the different sets of tests: Row 1 - for the arbitrary sequence of tests; Row 2 - for the optimized sequence

The histogram of diagnosing duration for 6 tests from the different sets is shown in fig. 3.



Fig. 3. Duration of diagnosing for different sets of 6 tests: Column 1 – for the arbitrary sequence of tests; Column 2 – for the optimized sequence; Column 3 – time saved.

From this histogram it can be concluded that the time saved by applying the optimized set of tests is 342.089 sec.
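The figures for the optimized sequence at HVcc can be checked against Table 2. The sketch below sums the 4 Mb times of the first six tests of the HVcc sequence and subtracts the result from the 345.675 sec quoted for the arbitrary set; the composition of the arbitrary set is not given in the text, so that value is taken as stated. Variable names are assumptions.

```python
# 4 Mb execution times in seconds, from Table 2.
TIME_4MB = {
    "March_C": 0.21, "March_G": 0.482, "March_SL": 0.86,
    "March_SS": 0.461, "March_RAW": 0.545, "Hammer": 1.028,
}

# First six tests of the Pareto-optimal sequence at HVcc (fig. 1).
hvcc_first_six = ["March_C", "March_G", "March_SL", "March_SS", "March_RAW", "Hammer"]

optimised_time = sum(TIME_4MB[name] for name in hvcc_first_six)
arbitrary_time = 345.675            # value quoted in the text
time_saved = arbitrary_time - optimised_time

print(f"optimized: {optimised_time:.3f} s, saved: {time_saved:.3f} s")
```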

The Pareto-optimal sequence of tests at LVcc is shown in fig. 4.

Program output (OUT.TXT), Paret-optimal sequence:

| Position | Test number | Test name  |
|----------|-------------|------------|
| Best     | 12          | March_C    |
| Next     | 5           | Hammer     |
| Next     | 3           | GalRow     |
| Next     | 7           | March_RAW  |
| Next     | 4           | GalColumn  |
| Next     | 9           | March_G    |
| Next     | 11          | PMOVI      |
| Next     | 6           | March_SL   |
| Next     | 8           | March_SS   |
| Next     | 14          | MATS+      |
| Next     | 13          | MATS++     |
| Next     | 15          | Scan       |
| Next     | 10          | March_SR   |
| Next     | 2           | WalkColumn |
| Next     | 1           | WalkRow    |

Fig. 4. Pareto-optimal sequence of tests at LVcc

Graphs of the diagnosing duration of memory microcircuits with a capacity of 4 Mb and an access cycle time of 5 nsec at LVcc, for the arbitrary and the optimized sequences of tests, are shown in fig. 5.



The results show that different sets of tests must be applied for different power supply voltages in order to minimize the cost of test diagnosing.

For the two power supply voltages, HVcc and LVcc, taken together, the diagnosing time with 6 tests is 691.35 sec for the arbitrary sequence (see fig. 6) and only 349.701 sec for the optimized sequence. Thus, the time saved by using the Optimal-test program is about 50%.
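The combined figure can be reproduced by adding the 4 Mb times of the first six tests of the HVcc and LVcc sequences. This is a check under the assumption that the quoted 349.701 sec was obtained in exactly this way; the 691.35 sec for the arbitrary sequences is taken from the text, and the variable names are hypothetical.

```python
# 4 Mb execution times in seconds, from Table 2.
TIME_4MB = {
    "March_C": 0.21, "March_G": 0.482, "March_SL": 0.86, "March_SS": 0.461,
    "March_RAW": 0.545, "Hammer": 1.028, "GalRow": 171.925, "GalColumn": 171.925,
}

hvcc_six = ["March_C", "March_G", "March_SL", "March_SS", "March_RAW", "Hammer"]
lvcc_six = ["March_C", "Hammer", "GalRow", "March_RAW", "GalColumn", "March_G"]

hvcc_time = sum(TIME_4MB[t] for t in hvcc_six)
lvcc_time = sum(TIME_4MB[t] for t in lvcc_six)
optimised_total = hvcc_time + lvcc_time

arbitrary_total = 691.35            # value quoted in the text
saving_percent = 100.0 * (arbitrary_total - optimised_total) / arbitrary_total

print(f"LVcc: {lvcc_time:.3f} s, total: {optimised_total:.3f} s, saved: {saving_percent:.1f} %")
```

Nearly all of the optimized time comes from GalRow and GalColumn in the LVcc sequence, which is why the overall saving is close to one half rather than the roughly 99% obtained at HVcc alone.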



Fig. 6. First column – execution time of 6 tests from an unoptimized set; second – execution time of 6 tests from an optimized set; third – time saved

## V. CONCLUSION

The proposed method and program reduce the labour intensity of calculations when choosing Pareto-optimal tests for diagnosing semiconductor storage devices. Applying the program Optimal-test will reduce the cost of test diagnosing of storage devices and lower their prime cost.

#### REFERENCES

- [1] Van de Goor, A.J.: 'Testing semiconductor memories, theory and practice' (ComTex Publishing, Gouda, The Netherlands, 1998, 2nd edn.)
- [2] Melnikov A.V., Ryabtsev V.G. Control of computers memory's modules. – K.: "Kornichuk", 2001. – 172 p.
- [3] Ryabtsev V.G., Kudlaenko V.M, Movchan Y.U. Method of estimation diagnostic properties of the Test Family March. // Proceeding of East-West and Test International Workshop (EWDTW'04). Yalta, Alushta, Crimea, Ukraine, September 23-26, 2004. – Pp. 220-224.
- [4] Aho A.V., Hopkroft D.E., Ulman D.D. Data structure and algorithms. M.: Publishing House "Williams", 2000. 384 p.
- [5] Hamdioui S., Van de Goor Ad J., Reyes D. J., Rodgers M. Memory test experiment: industrial results and data. // IEE Proc.-Computer. Digit. Tech., Vol. 153, No. 1, January 2006. – Pp. 1-8.

**Ryabtsev Volodymyr,** doctor of engineering sciences, professor of the information technologies and systems faculty at Cherkassy state technological university. Scientific interests: diagnosing of semiconductor storage devices, instrumental tools for the synthesis of tests for verifying models of digital systems that contain built-in memory. Address: Boulevard Shevchenko, 460, CHSTU, Cherkassy, Ukraine. 18006. Tel. (+380)-472-730271. E-mail: volodja18@ukr.net

Al Madi Mudar, postgraduate student of the Cherkassy state technological university. Scientific interests: verification of memory microcircuit models. Address: Boulevard Shevchenko, 460, CHSTU, Cherkassy, Ukraine. 18006. Tel. (+380)-472-730271. E-mail: mudarinfo@yahoo.com.

## General Testing Models of SOC Hardware-Software Components

Vladimir Hahanov, IEEE Computer Society Golden Core Member, Eugenia Litvinova, IEEE Society member, W. Gharibi

Abstract — Innovative testable design technologies for hardware and software, oriented toward building graph models of SoC components for effective test development and SoC component verification, are considered. A novel approach to evaluating the testability of hardware and software represented in the form of a register transfer graph is proposed. Examples of building software graph models for subsequent testing and diagnosis are shown.

Index Terms — Infrastructure Intellectual Property, Register Transfer Graph, System-on-a-Chip, Testing.

#### I. HARDWARE-SOFTWARE TESTABILITY

ADAPTATION of the testing and verification methods of digital systems can bring large financial and time dividends when they are used for testable design and diagnosis of software. The following questions are of interest: 1. Classification of the key uses of SoC testable design technologies in software testing and verification problems. 2. A universal model of a hardware and software component in the form of a directed register transfer and control graph, on which the testable design, test synthesis and analysis problems can be solved. 3. Metrics for evaluating the testability (controllability and observability) of hardware and software using the graph register transfer and control model.

The silicon chip, which is the basis of computer and communicator development, has to be considered as the initiating kernel from which new testing and verification technologies appear in software and computer engineering. A chip is used as a test area for creating and testing new facilities and methods for component routing, placement, synthesis and analysis. Technological solutions that have stood the test of time in microelectronics are then captured and implemented in macroelectronics (computer systems and networks). Here are some artifacts relating to the continuity of technological innovation:

1. The Boundary Scan Standards [1] for the board and chip levels gave rise to the assertion technique for software testing and verification.
2. The testability analysis facilities [2] (controllability and observability) of digital structures can be adapted to the evaluation of software code in order to detect critical statements and then improve the software relative to the testability criteria.
3. The technologies [3] for analysing the coverage of given faults by test patterns have to be used for building the fault covering table of software in order to estimate test validity and to perform diagnosis.
4. The Thatte-Abraham [4] and Sharshunov [5] graph register transfer models have to be used for software testing, which structural-logical analysis reduces to a more technological form.
5. Partition of an automaton into control and operating parts [2] is used to reduce software verification effort on the basis of preliminary synthesis of the control and data transfer graphs.
6. The lifecycle curve for hardware [6] represents the time stages of yield change during creation, replication and maintenance of software.
7. Platform-based electronic system-level design [7], which uses existing chip sets and is GUI-based, is isomorphic to object-oriented programming technology based on created libraries. Applying the Electronic System Level technology in programming makes it possible to use finished software functional components from basic libraries to create new software. In this case the main design procedure is mapping, oriented toward covering the specification functions by existing components, so that new code makes up no more than 10% of a project.
8. The testbench notion [8], used for hardware testing and verification by means of HDL compilers, appears in software implemented at the level of the C++ language and higher.
9. Platform-based testbench synthesis [7] uses the existing test libraries (ALINT) for components, that is, standardized GUI-based F-IP SoC functionalities. It has to be used for software test generation on the basis of the libraries developed by the leading companies.
10. Standard F-IP solutions in the framework of I-IP [9] can be used for testing embedded software components, including the repair of faulty software modules.
11. Assurance of two-dimensionality in a structure of interconnected functional components (IP-cores) of developed software is based on the use of multicore architectures for technological parallelization of computational processes [10], which is quite urgent in the conditions of the technological revolution proposed by Intel.
12. Creation of an address space for SoC functionalities, which are realized as hardware or software, gives a digital system the remarkable self-repair feature by means of I-IP for hardware and software components. An instance of it is a robust multicore version of hardware, in which a faulty addressable component can be replaced by another (faultless) one during operation. Addressability has to be used when creating critical software, in which the availability of addressable diverse (multiversion) components gives a software system the opportunity to replace components when a fault appears.
13. The technological problem of offline on-chip self-testing, self-diagnosis and self-repair, with or without external facilities, which all leading companies are solving, is quite interesting. To solve the problem, modern wireless and Internet technologies of remote service are applied. A disadvantage of these technologies is the possibility of remote unauthorized access to a chip, which can result in unwanted destructive consequences and digital system failure. Nevertheless, a specific feature of a digital system-on-a-chip is the remarkable ability to remove faults remotely, owing to the connection of the chip with the outside world by means of the Internet or Wi-Fi, WiMAX and Bluetooth technologies realized on the chip. Remote correction of software errors is possible because SoC memory (which occupies up to 94% of the chip area) is used for software storage: when an error is detected, new faultless code can be saved to this memory. Remote correction of hardware errors is possible through the use of an Erasable Programmable Logic Device (EPLD), where a new faultless bit stream can be saved when a fault is detected; in effect, new hardware is thereby created by reprogramming the chip.

Manuscript received June 23, 2008.

Vladimir Hahanov is with the Kharkov National University of Radio Electronics, Lenin Ave, 14, Kharkov, 61166, Ukraine, phone/fax: 70-21-326; e-mail: hahanov@kture.kharkov.ua

Eugenia Litvinova is with the Kharkov National University of Radio Electronics, Lenin Ave, 14, Kharkov, 61166, Ukraine, phone/fax: 70-21-421; e-mail: KIU@kture.kharkov.ua

W. Gharibi is with the Kharkov National University of Radio Electronics, Lenin Ave, 14, Kharkov, 61166, Ukraine, phone/fax: 70-21-326.

Convergence and interpenetration of technologies result in isomorphic design, testing and verification methods for software and hardware complexes, which in essence is a natural process of assimilating progressive concepts. The most important characteristics of the product lifecycle (time-to-market and yield) become commensurable in time and production volume, and this fact favours the tendency above. The hardware lifecycle curve shown in Fig. 1 represents, to within isomorphism, the time stages of software: design, production ramp-up, fabrication improvement and maintenance.





In the context of the lifecycle there are two urgent problems: lifting the curve along the ordinate and compressing the curve along the time axis, which means time-to-market reduction. Yield rises at all stages: at design, because design errors are corrected; at production ramp-up, because code implemented in SoC memory is corrected; and at volume production, because service packs that correct errors are released and distributed by Internet or satellite.

The research aim is to show directions for developing effective testable design models and methods for software, in order to raise yield by adapting hardware design technologies and reducing software structures to the existing standards and patterns of testing and verification. The research problems are: 1) development of a software model for testable design and verification; 2) development of software testing and diagnosis technologies on the basis of register models of the operational and control parts of software.

#### II. SOC SOFTWARE TESTING TECHNOLOGIES

The IEEE 1500 SECT standard [1] has to be considered an effective component of the SoC Infrastructure Intellectual Property. Its main purpose is testing all F-IP functionalities and the galvanic connections between them. The next step in the evolution of the standard toward repairable chips is the development of I-IP components with SoC diagnosis and repair service functions; together with a testing module they are attractive to the market: $I = \{T, D, R\}$. The diagnosis and repair procedures are not regulated by the testable design standards because a universal solution of this problem for various types of computers is complex and ambiguous. For irregular or unique structures, solutions of all three problems are based on a priori redundancy, that is, diversification of the functionalities of the components that make up the SoC. Only in that case can one speak of on-chip repair of a failed element. For regular structures with inherent redundancy, such as multiprocessors and matrix processors, one solution variant is a controller structure that combines all the functions above by means of the Boundary Scan Standard:

$$\begin{split} &I = \{T, F, S, D, R\}, \ T = \{T^1, T^2, ..., T^i, ..., T^n\}; \\ &F = \{F^1, F^2, ..., F^i, ..., F^n\}; \\ &S = \{S^1, S^2, ..., S^i, ..., S^n\}; \ D = f(T, F, S) = F^D \in F; \\ &R = g(D, F) = (F^R \subseteq F) \& (F^R \cup F^D = F). \end{split}$$

Here the first three identifiers of the model are the tests for functionalities, the components that implement the functions, and the boundary scan register cells that identify the technical state of the functionalities. The other two are functions that implement SoC diagnosis and repair. The first function (D) defines the set of faulty components; it is computed from the output response vector S and a test that covers all functional faults, given in the form of a fault detection table (FDT). The second function (R) formulates the rules for reducing the component set: faulty elements are removed from addressing, and a new faultless subset of the F-IP SoC is formed for use according to its intended purpose.

The location of the test and of the functionality verification analyzer is not a problem. Unique components should be connected to the service I-IP components without dispersal over the chip area. In a regular matrix structure the tests for all cells are the same, so one test has to be used for all components, and a single test, diagnosis and repair analyzer has to be present in the structure.
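The roles of the functions D and R can be illustrated with a minimal C++ sketch. It assumes, purely for illustration, that each response bit $S^i$ directly flags the component $F^i$ as failed; the helper names `diagnose` and `repair` are hypothetical and do not come from the standard or from the paper.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper for D = f(T, F, S). Assumption of this sketch:
// response[i] == true means the boundary scan cell of component F^i
// reported a failed test T^i.
std::vector<std::size_t> diagnose(const std::vector<bool>& response) {
    std::vector<std::size_t> faulty;
    for (std::size_t i = 0; i < response.size(); ++i)
        if (response[i]) faulty.push_back(i);
    return faulty;
}

// Hypothetical helper for R = g(D, F). Faulty addresses are masked and the
// remaining addresses are returned, so that the usable set together with
// the faulty set still covers all n components.
std::vector<std::size_t> repair(std::size_t n,
                                const std::vector<std::size_t>& faulty) {
    std::vector<bool> masked(n, false);
    for (std::size_t i : faulty)
        if (i < n) masked[i] = true;   // ignore addresses outside F
    std::vector<std::size_t> usable;
    for (std::size_t i = 0; i < n; ++i)
        if (!masked[i]) usable.push_back(i);
    return usable;
}
```

For a four-component structure with the response (0, 1, 0, 1), `diagnose` returns the addresses 1 and 3, and `repair` leaves the addresses 0 and 2 in use.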

The question of reallocating computer resources after a faulty cell is detected is interesting. If there are additional spares for repair, the problem comes down to the optimal replacement of faulty memory cells by spare rows and columns. Otherwise there are other system repair models, which depend on the representation form of the multiprocessor. The linear, or one-dimensional, addressing form defines the correspondence between the n input variables of a decoder and the addressable components, which are connected by the relation $|A| = 2^n$. The matrix representation of a multiprocessor specifies two-dimensional component addressing, which is technologically oriented toward pipelining. In both cases a cell number is decoded from its address. So, to replace the address of a faulty component by that of a faultless one, it is necessary to modify the decoder structure. This problem is strictly technical, and its solution comes down to masking the addresses of faulty components. Another solution relies on the availability of spares in the processor structure. In that case the problem can be reduced to the replacement of one or several faulty processors by faultless elements from the spare. The optimal solution of this problem has been considered for the case when there are several faulty cells in a memory matrix. The problem becomes more complex if the functionality of the digital system has already been parallelized on an existing processor matrix $P = [P_{ij}]$, $|P| = m \times n$, which has faulty elements, and it is necessary to reallocate a set of faultless cells $|P^*| \le |P|$ to obtain a quasioptimal covering of the functional subproblems by a subset of faultless processors.

Development of a formal software model to which CAD and EDA technologies can be applied is quite urgent; such a model makes it possible to use formal methods of test synthesis, fault coverage evaluation and testability (controllability and observability) determination for subsequent modernization of the software structure. To solve this problem the automaton model can be used:

$$\begin{cases} M = (M^{OA}, M^{CA}), \; M^{OA} = \{\vec{X}^{O}, Y^{C}, Y^{O}, \vec{Z}^{O}\}, \\ M^{CA} = \{X^{C}, Y^{O}, Y^{C}, Z^{C}\}; \\ \vec{Z}^{O} = f^{O}(\vec{X}^{O}, Y^{C}); \; Y^{O} = g^{O}(\vec{X}^{O}, Y^{C}); \\ Z^{C} = f^{C}(X^{C}, Y^{O}); \; Y^{C} = g^{C}(X^{C}, Y^{O}). \end{cases}$$

Here $\vec{X}^O, \vec{Z}^O$ are vectors of register input and output variables; $Y^C, Y^O, Z^C$ are operation control (initialization) signals, notification signals, and monitoring signals of the control automaton respectively; $f^O, g^O$ ($f^C, g^C$) are functions that determine the relations between interface signals in the operational (control) automaton.

But the automaton model above $M = (M^{OA}, M^{CA})$ is not technological for a developer solving practical problems of testable design. A processor (software)-based modification of it is proposed; it consists of two graphs with directed ribs:

$$\begin{cases} M = (M^{OR}, M^{CG}), M^{OR} = \{R, I\}; \\ R = \{R_1, R_2, ..., R_i, ..., R_n\}, I = \{I_1, I_2, ..., I_j, ..., I_m\}; \\ R_i = f(R_k, I_j); M^{CG} = \{S, E\}; \\ S = \{S_1, S_2, ..., S_i, ..., S_p\}, E = \{E_1, E_2, ..., E_j, ..., E_q\}; \\ S_i = f(S_k, E_j). \end{cases}$$

Here $M^{OR}$ is the Sharshunov register transfer graph [5] with a set of points (vertices) R, which describes all memory components (registers, flip-flops, counters, memory, input and output buses) used in a program, and a set of ribs (edges), which are marked by instructions I and activate information transfer between points. The expression $R_i = f(R_k, I_j)$ defines the functional dependence between adjacent points $R_i \rightarrow R_k$, which are connected by means of the operation $I_j \in I$. The component $M^{CG}$ is the conceptual graph of a control automaton; it is defined on a set of points S connected by directed ribs E, which are marked by transition conditions. The expression $S_i = f(S_k, E_j)$ defines the functional dependence between adjacent points $S_i \rightarrow S_k$ of the control graph, which are connected to realize the jump $E_j \in E$.

Instances of register transfer and control graphs are shown in Fig. 2 and 3 respectively.





The advantages of graph models lie not only in the structural representation of the interaction of functionalities, but also in the applicability of testability analysis methods, because directed graph models have explicit information flow directions and explicit input and output points. Based on the experience of testability evaluation for digital systems, the following metrics for controllability and observability analysis of the graph structures above can be proposed:

$$\begin{split} & G = \{R, I\}; \; R = \{R_1, R_2, ..., R_n\}, \; I = \{I_1, I_2, ..., I_m\}; \\ & I_{ij} \in I_i \approx (R_p R_q); \\ & C(R_q) = \frac{1}{k} \times \sum_{i=1}^k \left[ \frac{1}{m} \times \left| \bigcup_j I_{ij} \in (R_p R_q) \right| \times C(R_p) \right]; \\ & O(R_p) = \frac{1}{k} \times \sum_{i=1}^k \left[ \frac{1}{m} \times \left| \bigcup_j I_{ij} \in (R_p R_q) \right| \times O(R_q) \right]; \\ & C(R_x) = 1; \; O(R_y) = 1. \end{split}$$

Here a software (hardware) module model is represented by the graph $G = \{R, I\}$, which consists of points (registers) and ribs (instructions). Every graph rib is marked by at least one operation $I_{ij} \in I_i \approx (R_p R_q)$; these operations form the command subset attached to the rib $(R_pR_q)$. The controllability criterion for the point $C(R_q)$ depends on the controllability of the previous point $C(R_p)$ and on the reduced additive power of the command set

$$\frac{1}{k} \times \sum_{i=1}^{k} \left[ \frac{1}{m} \times \left| \bigcup_{j} I_{ij} \in (R_{p}R_{q}) \right| \right] = \frac{1}{k} \times \sum_{i=1}^{k} \left[ \frac{1}{m} \times d \right] = \frac{1}{k} \times \sum_{i=1}^{k} \left[ \frac{d}{m} \right],$$

which activates the k ribs attached to the given point $R_q$. Here every rib contains d operations (m is the total number of commands), which initiate information transfer along $(R_pR_q)$. By analogy, the observability criterion $O(R_p)$ is formed from analysis of the successor points and the ribs outgoing from $R_p$. The advantage of the proposed models and criteria for controllability and observability evaluation is their universality, based on direct and inverse implication on a graph, as well as their invariance with respect to testability analysis and test synthesis for software and hardware components. The controllability $C(R_x) = 1$ of input points and the observability $O(R_y) = 1$ of output points are initialized with the value "1". As the analysis advances toward internal points, these values can only decrease.

Thus, the graph points represent the following components: input variables, output variables, register variables, ALU blocks and memory arrays, in the format in which they appear in the software (hardware) module. Ribs determine the set of operations (commands) that transfer (transform) information between points. The complete model of a device, represented by the register transfer and control graphs, covers all data transfer and control statements in a software (hardware) module, which is necessary for synthesizing a testable device. Test synthesis is then based on solving the problem of covering all paths and points of the register transfer and control graphs by testbench statements.

The integral evaluation of the point testability in a graph is calculated by formula:  $T(R_i) = C(R_i) \times O(R_i)$ .

The total graph testability for software (hardware) is computed by expression  $T_{total} = \frac{1}{n} \sum_{i=1}^{n} T(R_i)$ .

The computation of testability for the register transfer graph of Fig. 2 is given below. The controllability factors are:

$$\begin{split} C(X) &= 1; \\ C(R_1) &= C(X) \times 1 \times \frac{d}{m} = 1 \times 1 \times \frac{2}{8} = \frac{2}{8} = 0.25; \\ C(R_2) &= C(X) \times 1 \times \frac{d}{m} = 1 \times 1 \times \frac{2}{8} = \frac{2}{8} = 0.25; \\ C(R_3) &= C(X) \times 1 \times \frac{d}{m} = 1 \times 1 \times \frac{2}{8} = \frac{2}{8} = 0.25; \\ C(R_4) &= \left( \left[ C(R_1) \times \frac{d}{m} \right] + \left[ C(R_2) \times \frac{d}{m} \right] \right) \times \frac{1}{2} = \frac{1}{16} = 0.0625; \\ C(R_5) &= \left( \left[ C(R_2) \times \frac{d}{m} \right] + \left[ C(R_3) \times \frac{d}{m} \right] \right) \times \frac{1}{2} = \frac{1}{16} = 0.0625; \\ C(Y) &= \left( \left[ C(R_4) \times \frac{d}{m} \right] + \left[ C(R_5) \times \frac{d}{m} \right] \right) \times \frac{1}{2} = \frac{1}{64} = 0.015625. \end{split}$$

The point Y has minimal controllability. Observability computation:

$$\begin{split} O(Y) &= 1; \\ O(R_4) &= O(Y) \times \frac{2}{8} = 1 \times \frac{2}{8} = 0.25; \\ O(R_5) &= O(Y) \times \frac{2}{8} = 1 \times \frac{2}{8} = 0.25; \\ O(R_3) &= O(R_5) \times \frac{2}{8} = \frac{1}{4} \times \frac{2}{8} = 0.0625; \\ O(R_1) &= O(R_4) \times \frac{2}{8} = \frac{1}{4} \times \frac{2}{8} = 0.0625; \\ O(R_2) &= \left[ \left( O(R_4) \times \frac{2}{8} \right) + \left( O(R_5) \times \frac{2}{8} \right) \right] / 2 = \frac{1}{16} = 0.0625; \\ O(X) &= \left[ \left( O(R_1) \times \frac{2}{8} \right) + \left( O(R_2) \times \frac{2}{8} \right) + \left( O(R_3) \times \frac{2}{8} \right) \right] / 3 = \frac{1}{64} = 0.015625. \end{split}$$

The point X has minimal observability. Testability computation:

$$\begin{split} T(X) &= 1 \times 0.015625 = 0.015625; \\ T(R_1) &= 0.25 \times 0.0625 = 0.015625; \\ T(R_2) &= 0.25 \times 0.0625 = 0.015625; \\ T(R_3) &= 0.25 \times 0.0625 = 0.015625; \\ T(R_4) &= 0.0625 \times 0.25 = 0.015625; \\ T(R_5) &= 0.0625 \times 0.25 = 0.015625; \\ T(Y) &= 0.015625 \times 1 = 0.015625. \end{split}$$

The total circuit testability is $T_{total} = 0.015625$. The testability characteristics for the control automaton graph (Fig. 3) are calculated similarly. Determination of the graph controllability:

$$\begin{split} C(S_0) &= 1; \\ C(S_1) &= \left( \left[ C(S_0) \times \frac{d}{m} \right] + \left[ C(S_3) \times \frac{d}{m} \right] \right) \times \frac{1}{2} = \frac{11}{32} = 0.34375; \\ C(S_3) &= C(S_0) \times 1 \times \frac{d}{m} = 1 \times 1 \times \frac{3}{4} = \frac{3}{4} = 0.75; \\ C(S_5) &= C(S_3) \times 1 \times \frac{d}{m} = \frac{3}{4} \times 1 \times \frac{2}{4} = \frac{6}{16} = 0.375; \\ C(S_4) &= \left( \left[ C(S_1) \times \frac{d}{m} \right] + \left[ C(S_3) \times \frac{d}{m} \right] + \left[ C(S_5) \times \frac{d}{m} \right] \right) \times \frac{1}{3} = \frac{47}{384} = 0.1224; \\ C(S_2) &= \left( \left[ C(S_1) \times \frac{d}{m} \right] + \left[ C(S_4) \times \frac{d}{m} \right] \right) \times \frac{1}{2} = \frac{179}{1536} = 0.11654. \end{split}$$

The point  $S_2$  has minimal controllability. Observability computation:

$$\begin{split} O(S_2) &= 1; \\ O(S_4) &= O(S_2) \times 1 \times \frac{2}{4} = 1 \times 1 \times \frac{2}{4} = 0.5; \\ O(S_5) &= O(S_4) \times 1 \times \frac{1}{4} = \frac{1}{2} \times 1 \times \frac{1}{4} = \frac{1}{8} = 0.125; \\ O(S_3) &= \left[ \left( O(S_1) \times \frac{1}{4} \right) + \left( O(S_4) \times \frac{1}{4} \right) + \left( O(S_5) \times \frac{2}{4} \right) \right] / 3 = \frac{7}{96} = 0.07292; \\ O(S_1) &= \left[ \left( O(S_2) \times \frac{2}{4} \right) + \left( O(S_4) \times \frac{1}{4} \right) \right] / 2 = \frac{1}{8} = 0.125; \\ O(S_0) &= \left[ \left( O(S_1) \times \frac{2}{4} \right) + \left( O(S_3) \times \frac{3}{4} \right) \right] / 2 = \frac{15}{256} = 0.05859375. \end{split}$$

The point  $S_0$  has minimal observability. Testability computation:

$$T(S_0) = 1 \times 0.05859375 = 0.05859375;$$
  

$$T(S_1) = 0.34375 \times 0.125 = 0.04296875;$$
  

$$T(S_2) = 0.11654 \times 1 = 0.11654;$$
  

$$T(S_3) = 0.75 \times 0.07292 = 0.05469;$$
  

$$T(S_4) = 0.1224 \times 0.5 = 0.0612;$$
  

$$T(S_5) = 0.375 \times 0.125 = 0.046875$$

The point $S_1$ has the worst testability. The total circuit testability is $T_{total} = 0.063478$.

Thus, the proposed testability evaluation method has the following advantages: 1) high effectiveness and universality when used to evaluate the testability of register transfer and control graphs; 2) the possibility of detecting bottlenecks in software or hardware in order to modify the project structure; 3) choice of the best project by comparing the testability of alternative variants.

#### III. SOFTWARE DIAGNOSIS TECHNOLOGY

When developing large software, verifying the correctness of the statements of the project is an urgent problem. Complex software includes a great many branches, and verifying the software on every logical path is a rather complex problem. A method of searching for faulty statements (errors or faults) in software, based on representing the software algorithm as a graph structure for subsequent test generation and fault diagnosis, is considered below on an example. Suppose it is necessary to verify software that computes the following sum of functions:

$$S = f(x) + \omega(x),$$

$$f(x) = \begin{cases} x+3, & x < 2; \\ 2x-3, & 2 \le x < 12; \\ -3x+7, & x \ge 12; \end{cases}$$

$$\omega(x) = \begin{cases} \sin(x + \pi/3), & x < 2\pi/3; \\ \sin(\pi x) + 2, & x \ge 2\pi/3. \end{cases}$$

One possible solution of the problem in C++ is given in the following listing:

Listing 3.1.

    #include <iostream>
    #include <math.h>
    using namespace std;

    int main() {
        const double Pi=3.14159;
        double F, w, f, x;
        cin>>x;
        // f(x): three branches selected by the value of x
        if (x<2) f=x+3;
        else if ((x>=2) && (x<12)) f=2*x-3;
        else f=-3*x+7;
        // w(x): two branches selected by the value of x
        if (x<2./3.*Pi) w=sin(x+Pi/3);
        else w=sin(Pi*x)+2;
        F=f+w;
        cout<<F<<endl;
        return 0;
    }

Suppose an error occurs in a statement of the computational part of the software. Instead of the correct statement

    else w=sin(Pi*x)+2;

the following one is written:

    else w=sin(Pi*x) - 2;
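The effect of such a substitution on the program output can be observed by running the correct and the erroneous variants side by side. The following C++ sketch is only an illustration: the function names `computeF` and `detects` and the flag `injectFault` are hypothetical, and the comparison against a reference variant is an assumption of the sketch rather than part of the described method.

```cpp
#include <cmath>

// Computation of Listing 3.1 as a function; injectFault selects the
// erroneous statement "w=sin(Pi*x) - 2" instead of "w=sin(Pi*x)+2".
double computeF(double x, bool injectFault) {
    const double Pi = 3.14159;
    double f, w;
    if (x < 2) f = x + 3;
    else if (x < 12) f = 2 * x - 3;
    else f = -3 * x + 7;
    if (x < 2. / 3. * Pi) w = std::sin(x + Pi / 3);
    else w = injectFault ? std::sin(Pi * x) - 2 : std::sin(Pi * x) + 2;
    return f + w;
}

// True when the erroneous variant is observable at the output for input x.
bool detects(double x) {
    return std::fabs(computeF(x, false) - computeF(x, true)) > 1e-9;
}
```

An input such as x = 1 does not reach the erroneous statement, so both variants agree, whereas x = 3 reaches it and the two outputs differ by 4.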

It is necessary to detect the faulty statement in the program code using a testing technology based on the graph model of the code. Software diagnosis includes the four procedures below.

1. Construction of the register transfer graph.

Graph ribs are sets of code fragments or separate operations (Fig. 4); graph points are information monitoring points (registers, variables, memory), which are also used for forming assertions.



Fig. 4. Register transfer graph

The number of test points in the graph (registers, variables, memory) should be sufficient for diagnosis with the given resolution. Otherwise it is necessary to analyse the testability of the register transfer graph of the software and to determine the minimal additional number of observation lines for forming assertions that make it possible to detect faulty modules with the given diagnosis resolution. Every rib (see Fig. 4) is marked by a set of arithmetic operations: $\{1\}$ – summation; $\{2\}$ – multiplication; $\{3\}$ – subtraction; $\{4\}$ – division; $\{5\}$ – computation of the trigonometric sine. When there is a branch in a program, the number of ribs outgoing from a point is equal to the number of adjacent sinks formed by the branch statements in the respective part of the program.

Thus, for the code fragment of the example:

    if (x<2) f=x+3;
    else if ((x>=2) && (x<12)) f=2*x-3;
    else f=-3*x+7;

there are three ribs outgoing from the point X. The computational results of $I_1, I_2, I_3$, which depend on the variable X, are checked in the points $R_1, R_2, R_3$ respectively. When the operation $I_1$ is executed, the following branch is realized:

    if (x<2./3.*Pi) w=sin(x+Pi/3);
    else w=sin(Pi*x)+2;

Then the general summation operation is carried out for all transactions, regardless of which branch statements have been executed.

    F=f+w;

The summation operation is executed on various ribs (the objects $I_{6A}$, $I_{6B}$, $I_{6C}$, $I_{6D}$), but all of them correspond to the same part of the program code. So, faultless execution of the operation on one rib rules out a fault on the other three. At the next stages of software diagnosis these objects are merged into $I_6$. The result is checked in the final point Y.

Representing the software algorithm by a graph structure makes it possible to show all possible variants of software execution, as well as to simplify the next stage of software diagnosis and the forming of a minimal test.
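Listing all variants of execution amounts to enumerating the paths from the input point to the output point of the graph. A minimal C++ sketch of such an enumeration is given below. The topology is an assumption reconstructed from the text, since Fig. 4 is not reproduced here: the point names `R4` and `R5`, the type `Rib` and the functions `enumerate` and `allPaths` are hypothetical, and the merged summation rib $I_6$ is given an empty label so that the path names match the notation X14Y used in the text.

```cpp
#include <string>
#include <vector>

// A rib of the graph; `label` is the digit used in path names such as X14Y.
struct Rib { std::string from, to, label; };

// Depth-first enumeration of every path from `at` to `sink`.
// Assumes the graph is acyclic, otherwise the recursion would not terminate.
void enumerate(const std::vector<Rib>& ribs, const std::string& at,
               const std::string& sink, const std::string& labels,
               std::vector<std::string>& out) {
    if (at == sink) { out.push_back("X" + labels + "Y"); return; }
    for (const Rib& r : ribs)
        if (r.from == at) enumerate(ribs, r.to, sink, labels + r.label, out);
}

// Assumed topology: X feeds R1, R2, R3 through I1, I2, I3; R1 branches
// through I4 and I5 to the hypothetical points R4 and R5; every branch
// ends with the summation rib I6 into Y.
std::vector<std::string> allPaths() {
    const std::vector<Rib> ribs = {
        {"X", "R1", "1"}, {"X", "R2", "2"}, {"X", "R3", "3"},
        {"R1", "R4", "4"}, {"R1", "R5", "5"},
        {"R4", "Y", ""}, {"R5", "Y", ""}, {"R2", "Y", ""}, {"R3", "Y", ""}};
    std::vector<std::string> out;
    enumerate(ribs, "X", "Y", "", out);
    return out;
}
```

Under these assumptions `allPaths()` returns the four paths X14Y, X15Y, X2Y and X3Y.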

2. Test synthesis and analysis. The set of ribs is written in disjunctive normal form (DNF), where every term is a one-dimensional path from the input port to the output, and the subset of such paths covers all internal lines: $P = X14Y \lor X15Y \lor X2Y \lor X3Y$. Together, the one-dimensional paths represented in the DNF cover all possible transactions, that is, all graph points and ribs. Every rib is associated with an aggregate of code fragments or statements (activation instructions), written as a disjunction. For instance, the path X14Y activates execution of the operations on the ribs $I_1, I_4, I_{6A}$. The ribs $I_1$ and $I_{6A}$ have only one statement each, while the identifier $I_4$ corresponds to the consecutive execution of three statements. The test $P_1 = [(1)(1 \lor 4 \lor 5)(1)]$, which activates the path X14Y, checks the correctness of all these statements. Thus, the test of minimal covering of all graph points and ribs by the commands that activate graph ribs, and therefore data movement to the observation points, can be written:

$$P = [(1)(1 \lor 4 \lor 5)(1)] \lor [(1)(1 \lor 2 \lor 5)(1)] \lor [(2 \lor 3)(1)] \lor [(1 \lor 2)(1)].$$

Subsequent DNF transformation consists of removing the brackets to obtain the complete test, which makes it possible to check the transactions in the graph that cover all points and ribs in various combinations:

$$P = (111 \lor 141 \lor 151) \lor (111 \lor 121 \lor 151) \lor (21 \lor 31) \lor (11 \lor 21).$$

The obtained test is redundant; it is not always acceptable for large software because of the large number of test patterns. So the ability to create a test of minimal length with the given resolution is very important. Such a test is formed by solving the problem of covering all graph points and ribs and activating the sets of code fragments. During testing it is assumed that the hardware components used by the software are faultless.
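Removing the brackets of one path, for example turning $(1)(1 \lor 4 \lor 5)(1)$ into $111 \lor 141 \lor 151$, is a Cartesian product over the operation sets of the ribs along that path. A minimal C++ sketch follows; the function name `expand` and the encoding of operations as characters are assumptions made for illustration.

```cpp
#include <string>
#include <vector>

// Multiply out a product of disjunctions left to right. Each factor is the
// set of operation codes on one rib of the path; each resulting term picks
// exactly one operation per rib.
std::vector<std::string> expand(const std::vector<std::vector<char>>& factors) {
    std::vector<std::string> terms = {""};
    for (const auto& factor : factors) {
        std::vector<std::string> next;
        for (const std::string& t : terms)
            for (char op : factor) next.push_back(t + op);
        terms = next;
    }
    return terms;
}
```

For the factors {1}, {1, 4, 5}, {1} the sketch yields the terms 111, 141 and 151, and for {1, 2}, {1} it yields 11 and 21. The number of terms is the product of the factor sizes, which is why the complete test grows quickly for large software.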

3. Construction of the fault detection table. The fault detection table is oriented toward verification of the sets of code fragments on the ribs that form data activation paths to the observation points (graph points). The output response vector V is formed by comparing the experimental data of the tested software with the expected responses. If the result on an observed line fails, the respective coordinate of the vector V takes the value "1" for the test pattern under consideration. The fault detection table of code fragments for the complete test $P = X14Y \lor X15Y \lor X2Y \lor X3Y$, where test patterns are written in general form (as a set of one-dimensional paths), is shown below:

| $T_i / I_{jk}$ | $I_{11}$ | $I_{22}$ | $I_{23}$ | $I_{31}$ | $I_{32}$ | $I_{41}$ | $I_{44}$ | $I_{45}$ | $I_{51}$ | $I_{52}$ | $I_{55}$ | $I_{61}$ | V |
|--------------------|--------------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|---|
| X14Y               | 1            |     |     |     |     | 1   | 1   | 1   |     |     |     | 1   | 0 |
| X15Y               | 1            |     |     |     |     |     |     |     | 1   | 1   | 1   | 1   | 1 |
| X2Y                |              | 1   | 1   |     |     |     |     |     |     |     |     | 1   | 0 |
| X3Y                |              |     |     | 1   | 1   |     |     |     |     |     |     | 1   | 0 |
| Faults             | 3            |     |     |     |     |     |     |     | 1   | 1   | 1   |     |   |

The symbolic notation $I_{jk}$ means execution of the statement that lies on the rib $I_j$ and has index k. For instance, $I_{22}$ means execution of the statement sequence of the rib $I_2$ when the path X2Y is activated, specifically the multiplication operation, which corresponds to the following fragment of the source program code:

    else if ((x>=2) && (x<12)) f=2*x-3;

The diagnosis resolution of the test for the vector value V = (0100) is determined by three possible faults: $F = I_{51}I_{52}I_{55}$. The value "1" of the vector V means that the execution of the respective commands was activated when the second pattern was applied. The minimal set of DNF terms that distinguishes all single faults of the program fragments of a register transfer graph is the minimal diagnosis test. The following term set (here it coincides with the complete test) distinguishes the faults of all instructions determined in the DNF:

$$P = (111 \lor 141 \lor 151) \lor (111 \lor 121 \lor 151) \lor (21 \lor 31) \lor (11 \lor 21).$$

The test cannot be reduced, because removing any term leaves one or several fragments unactivated. Then the complete, extended fault detection table is built from the term set above. Every obtained test pattern is divided into parts (terms). The first test pattern $(111 \lor 141 \lor 151)$ consists of three terms: (111), (141) and (151). Each of them has its own row in the table. The columns list all possible executable operations, designated $I_{jk}$, where j is the rib identifier in the graph and k is the statement that transforms data on the j-th rib. For each term, the graph path to which it is applied is taken into account. For instance, the term (141) belongs to the first test pattern, which activates the path X14Y. The extended fault detection table is:

| T<sub>i</sub>∖I<sub>jk</sub> | I11 | I22 | I23 | I31 | I32 | I41 | I44 | I45 | I51 | I52 | I55 | I61 | V |
|------------------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|---|
| 111<sub>1</sub>  | 1   |     |     |     |     | 1   |     |     |     |     |     | 1   | 0 |
| 141              | 1   |     |     |     |     |     | 1   |     |     |     |     | 1   | 0 |
| 151<sub>1</sub>  | 1   |     |     |     |     |     |     | 1   |     |     |     | 1   | 0 |
| 111<sub>2</sub>  | 1   |     |     |     |     |     |     |     | 1   |     |     | 1   | 1 |
| 121              | 1   |     |     |     |     |     |     |     |     | 1   |     | 1   | 1 |
| 151<sub>2</sub>  | 1   |     |     |     |     |     |     |     |     |     | 1   | 1   | 1 |
| 21<sub>1</sub>   |     | 1   |     |     |     |     |     |     |     |     |     | 1   | 0 |
| 31               |     |     | 1   |     |     |     |     |     |     |     |     | 1   | 0 |
| 11               |     |     |     | 1   |     |     |     |     |     |     |     | 1   | 0 |
| 21<sub>2</sub>   |     |     |     |     | 1   |     |     |     |     |     |     | 1   | 0 |

Every digit of a term denotes execution of a statement on the respective graph rib. For example, the first digit "1" activates the statement $\{1\}$ of the rib I<sub>1</sub>, so "1" is put in the respective column. The column values of the extended fault detection table are carried over from the FDT of code fragments defined on the complete generalized test, but a coordinate value is written for every test term. The extended fault detection table shows the result of executing every test pattern and simplifies the fault detection procedure with the given resolution.
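The claim that no term can be removed from the test can be checked mechanically. The following Python sketch is an illustration and not part of the original method: it encodes the rows of the extended fault detection table as sets of activated statements and lists the terms whose removal would leave some statement unactivated. The row labels such as `111_1` are an ASCII rendering of the subscripted labels, and the names `TABLE`, `covered` and `essential_terms` are hypothetical.

```python
# Rows of the extended fault detection table:
# term label -> set of statements I_jk that the term activates.
TABLE = {
    "111_1": {"I11", "I41", "I61"},
    "141":   {"I11", "I44", "I61"},
    "151_1": {"I11", "I45", "I61"},
    "111_2": {"I11", "I51", "I61"},
    "121":   {"I11", "I52", "I61"},
    "151_2": {"I11", "I55", "I61"},
    "21_1":  {"I22", "I61"},
    "31":    {"I23", "I61"},
    "11":    {"I31", "I61"},
    "21_2":  {"I32", "I61"},
}


def covered(rows):
    """Return the union of statements activated by the given rows."""
    result = set()
    for statements in rows.values():
        result |= statements
    return result


def essential_terms(table):
    """Return the terms whose removal leaves some statement unactivated."""
    full = covered(table)
    essential = []
    for label in table:
        rest = {k: v for k, v in table.items() if k != label}
        if covered(rest) != full:
            essential.append(label)
    return essential


# Every term is essential when the two lists coincide.
print(essential_terms(TABLE) == list(TABLE))
```

Each row of the table activates one statement that no other row activates, which is why every term turns out to be essential for this example.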

4. Diagnosis. One disjunctive CNF clause is formed for every "1" in the output response vector V. Each clause is the row-wise logical OR of the faults that can distort the output functional signals. The CNF is then transformed to DNF by the laws of Boolean algebra:

$$\begin{split} F &= (I_{11} \lor I_{51} \lor I_{61})(I_{11} \lor I_{52} \lor I_{61})(I_{11} \lor I_{55} \lor I_{61}) = \\ &= I_{11} \lor I_{11}I_{55} \lor I_{11}I_{61} \lor I_{11}I_{52} \lor I_{11}I_{52}I_{55} \lor \\ &\lor I_{11}I_{52}I_{61} \lor I_{61}I_{11} \lor I_{11}I_{61}I_{51} \lor I_{11}I_{61} \lor \\ &\lor I_{51}I_{11} \lor I_{11}I_{51}I_{55} \lor I_{11}I_{51}I_{61} \lor I_{11}I_{51}I_{52} \lor \\ &\lor I_{51}I_{52}I_{55} \lor I_{51}I_{52}I_{61} \lor I_{51}I_{61}I_{11} \lor I_{55}I_{61} \lor \\ &\lor I_{51}I_{61} \lor I_{11}I_{61} \lor I_{11}I_{55}I_{61} \lor I_{11}I_{61} \lor I_{11}I_{52}I_{61} \lor \\ &\lor I_{52}I_{55}I_{61} \lor I_{52}I_{61} \lor I_{61}I_{11} \lor I_{61}I_{51} \lor I_{61}. \end{split}$$

To reduce the obtained set of possible faults, the following laws of Boolean algebra are used:

$$A \land A = A; \quad A \lor B = B \lor A; \quad (A \lor B)C = AC \lor BC; \quad (A \lor B) \lor C = A \lor (B \lor C);$$

$$A \lor A = A; \quad (A \land B) \lor A = A; \quad (A \lor B) \land A = A.$$

They yield the expression:

$$\begin{split} F &= I_{11} \lor I_{11}I_{55} \lor I_{11}I_{61} \lor I_{11}I_{52} \lor I_{11}I_{52}I_{55} \lor \\ &\lor I_{11}I_{52}I_{61} \lor I_{61}I_{11} \lor I_{11}I_{61}I_{51} \lor I_{11}I_{61} \lor \\ &\lor I_{51}I_{11} \lor I_{11}I_{51}I_{55} \lor I_{11}I_{51}I_{61} \lor I_{11}I_{51}I_{52} \lor \\ &\lor I_{51}I_{52}I_{55} \lor I_{51}I_{52}I_{61} \lor I_{51}I_{61}I_{11} \lor I_{55}I_{61} \lor \\ &\lor I_{51}I_{61} \lor I_{11}I_{61} \lor I_{11}I_{55}I_{61} \lor I_{11}I_{61} \lor I_{11}I_{52}I_{61} \lor \\ &\lor I_{52}I_{55}I_{61} \lor I_{52}I_{61} \lor I_{61}I_{11} \lor I_{61}I_{51} \lor I_{61} = \\ &= I_{11} \lor I_{51}I_{52}I_{55} \lor I_{61}. \end{split}$$
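The expansion and reduction above can be reproduced with a short Python sketch. It is an illustration under the assumption that each fault is represented by a plain string label; the function name `cnf_to_dnf` is hypothetical. Idempotence ($A \land A = A$) is obtained by storing each product as a set, and the absorption law $(A \land B) \lor A = A$ by discarding every product that strictly contains another one.

```python
from itertools import product


def cnf_to_dnf(clauses):
    """Expand a product of OR-clauses into a reduced sum of AND-terms."""
    # Pick one literal from every clause; a frozenset merges repeated literals.
    terms = {frozenset(choice) for choice in product(*clauses)}
    # Absorption: drop a term if another term is a proper subset of it.
    return {t for t in terms if not any(other < t for other in terms)}


# One clause per row of the extended table with V = 1.
clauses = [
    ("I11", "I51", "I61"),
    ("I11", "I52", "I61"),
    ("I11", "I55", "I61"),
]

dnf = cnf_to_dnf(clauses)
for term in sorted(dnf, key=lambda t: (len(t), sorted(t))):
    print(" & ".join(sorted(term)))
```

For the three clauses of the example, the reduced result consists of the terms $I_{11}$, $I_{61}$ and $I_{51}I_{52}I_{55}$, in agreement with the expression above.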

Then the elements $I_{jk}$ that are executed in test patterns other than those with $V_i = 1$ are removed from F. For this purpose, the set H of objects containing the operations that transform data correctly during program execution is formed:

$$\begin{split} H &= \{X14Y, X2Y, X3Y\} = \{(141) \lor (151) \lor (21_1) \lor (31) \lor (11) \lor (21_2)\} = \\ &= I_{11} \lor I_{22} \lor I_{23} \lor I_{31} \lor I_{32} \lor I_{44} \lor I_{45} \lor I_{61}. \end{split}$$

After the reduction, a single DNF term is obtained:

$$\begin{split} F' = F \setminus H &= (I_{11} \lor I_{51}I_{52}I_{55} \lor I_{61}) \setminus (I_{11} \lor I_{22} \lor I_{23} \lor I_{31} \lor I_{32} \lor I_{44} \lor I_{45} \lor I_{61}) = \\ &= I_{51}I_{52}I_{55}. \end{split}$$
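The difference $F \setminus H$ can be sketched in Python as follows. The sketch rests on one assumption made for this illustration: a DNF term is discarded as soon as it contains a statement that belongs to H. The names `PASSING_ROWS`, `verified_statements` and `remove_verified` are hypothetical, and the row labels are an ASCII rendering of the subscripted labels.

```python
# Rows of the extended table that form H, with the statements they activate.
PASSING_ROWS = {
    "141":   {"I11", "I44", "I61"},
    "151_1": {"I11", "I45", "I61"},
    "21_1":  {"I22", "I61"},
    "31":    {"I23", "I61"},
    "11":    {"I31", "I61"},
    "21_2":  {"I32", "I61"},
}

# Reduced DNF of possible faults: F = I11 v I51*I52*I55 v I61.
F = [
    frozenset({"I11"}),
    frozenset({"I51", "I52", "I55"}),
    frozenset({"I61"}),
]


def verified_statements(rows):
    """Collect every statement executed by the rows that form H."""
    verified = set()
    for statements in rows.values():
        verified |= statements
    return verified


def remove_verified(dnf_terms, verified):
    """Keep only the terms that share no statement with the verified set."""
    return [term for term in dnf_terms if not (term & verified)]


H = verified_statements(PASSING_ROWS)
F_prime = remove_verified(F, H)
print(sorted(H))
print([sorted(term) for term in F_prime])
```

Only the term $I_{51}I_{52}I_{55}$ survives, because $I_{11}$ and $I_{61}$ both belong to H.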

This means that the software fails when one of the statements $\{1,2,5\}$ on the rib I<sub>5</sub> is executed.

Indeed, the error is located in the linear program fragment that corresponds to the rib $I_5$, namely in $I_{51}$: subtraction is executed instead of addition.

A more exact diagnosis (down to a single statement) is possible with a greater number of test points, but this complicates diagnosis because longer tests are needed. The proposed method makes it possible to analyze software for errors in the code and helps to locate them. Testing and verification of software is a central problem in programming; solving it raises software quality and prevents unforeseen results of execution. The proposed method is based on representing the software algorithm by a graph structure in which the ribs are statement sequences or code fragments and the points are information monitoring points used for making assertions. Generating a minimal number of test patterns decreases the fault detection time, provided that the tests cover all possible transactions. The number of test points has to be minimal yet sufficient for diagnosis with the given resolution.

## IV. CONCLUSION

The innovative technologies of testable software and hardware design, based on effective test development and verification of digital system-on-a-chip components, are considered.

1. The general directions for applying testable design technologies for digital systems-on-chips to the problems of software testing and verification are shown.

2. A universal model of a software or hardware component in the form of a directed register transfer and control graph, on which the testable design, test synthesis and analysis problems can be solved, is presented.

3. Metrics for evaluating the testability (controllability and observability) of hardware and software whose models are represented by directed register transfer and control graphs are proposed.

4. A technology of software testing and diagnosis based on the synthesis of graph register transfer models is proposed.

5. The practical importance of the proposed methods and models lies in the strong interest of software companies in innovative solutions to the software testing and verification problems above.

#### REFERENCES

[1] Francisco DaSilva, Yervant Zorian, Lee Whetsel, Karim Arabi, Rohit Kapur, "Overview of the IEEE P1500 Standard", *ITC International Test Conference*, 2003, pp. 988–997.

[2] Abramovici M., Breuer M.A. and Friedman A.D., "Digital System Testing and Testable Design", *Computer Science Press*, 1998, 652 p.

[3] V.I. Hahanov, S.V. Chumachenko, W. Gharibi, E. Litvinova, "Algebra-logical method for SoC embedded memory repair", *Proceedings of the 15 International Conference «Mixed design of integrated circuits and systems»*, Poland, 2008, pp. 481-486.

[4] Thatte S.M., Abraham J.A, "Test generation for microprocessors", *IEEE Trans. Comput.*, 1980, C-29, No 6, pp. 429-441.

[5] Sharshunov S.G, "Construction of microprocessor tests. 1. The general model. Data processing check", *Automation and telemechanics*, 1985, №11, pp. 145-155.

[6] Zorian Yervant, "What is Infrastructure IP?", *IEEE Design & Test of Computers*, 2002, pp. 5-7.

[7] Douglas Densmore, Roberto Passerone, Alberto Sangiovanni-Vincentelli, "A Platform-Based taxonomy for ESL design", *Design & Test of Computers*, September-October, 2006, pp. 359-373.

[8] Bergeron, Janick, "Writing testbenches: functional verification of HDL models", *Boston: Kluwer Academic Publishers*, 2001, 354 p.

[9] Zorian Yervant, "Guest Editor's Introduction: Advances in Infrastructure IP", *IEEE Design and Test of Computers*, 2003, 49 p.

[10] Shameem Akhter, Jason Roberts, "Multi-Core Programming", Intel Press, 2006, 270 p.

**Vladimir Hahanov** – Dean of the Computer Engineering Faculty, Doctor of Science, Professor. IEEE Computer Society Golden Core Member.

1985 – Ph.D., "Digital systems models and testing methods for computer-aided design", Kharkov National University of Radio Electronics.

1996 – Dr. of Science, "Models and methods of digital microprocessors system for fault simulation and testing service", Kharkov National University of Radio Electronics.

2003 – till now – Dean of the Computer Engineering Faculty, Kharkov National University of Radio Electronics.

1997 – till now – Professor at Kharkov National University of Radio Electronics, Kharkov Military University, Kharkov Academy of Railway Transport, Kharkov Academy of Culture, Kharkov Aerospace University.

1988 – 1997 – Senior Lecturer.

V. Hahanov has 510 publications, 7 books, and 2 patents.

Scientific work:

- Creation of the computer-aided system for logic simulation, test generation, and fault diagnosis of digital devices;

- Systems and microprocessor-based structures;

- Two-framed cubic algebra, cubic form of the graph representation, cubic models of digital devices, deductive-parallel method of cubic fault simulation, topological deductive back-traced parallel fault simulation method, cubic method of test generation;

- Algebra-logic fault localization and memory repair methods of SoC functionality;

- Software tools (C++, Assembler, Fortran) for research;

- High-performance fault simulation and test generation development for complete digital systems and networks described by hierarchical models. The development is based on multi-core architectures;

- Design automation for testability with IEEE Boundary Scan standards and debugging tools for specialized microprocessor systems and digital systems processing;

- Certification and verification of the hardware and software components of computer systems and networks;

- Design automation for educational applications in the field of computer engineering;

- Digital signal processing and MPEG standards.

Honors:

1996 - "Best methodologist of University", Ukraine.

2000, 2001 - honors from "Best scientist of Kharkov region", Ukraine.

2003 - INTEL award of scientific projects competition.

2005 – The best professor of Ukraine.

2005 - Award from President of Ukraine.

2005 - IEEE Diploma for the IEEE conference organization.

2005 – IEEE Computer Society Golden Core Member.

2007 – IEEE Outstanding contribution Award.

Member of three specialized scientific boards for defense of theses for a Doctor's degree: D 64.052.02 – systems of design automation, D 64.807.02 – information technologies in control systems.

Leader of the scientific seminar "Design automation and diagnosis of computational devices, systems and networks".

Chairman of the international symposium "IEEE East-West. Design and Test".

Member of the organizing committees of 10 international conferences.

Member of the IEEE Computer Society since 2000.

Member of High Examination Board of Ministry of Education of Ukraine.

Scientific supervisor of "Design & Test" R&D Lab.

Chief Scientist of Aldec Inc., cooperation with Cadence, Microsoft, Intel.

E-mail: hahanov@kture.kharkov.ua

**Eugenia Litvinova** – Assistant Professor, Doctor of Philosophy. IEEE Society member.

1985 – Kharkov National University of Radioelectronics, speciality "Radioelectronic Designing and Production".

1996, Academic degree - candidate of technical science.

2001, Academic status - associate professor.

Senior Lecturer at Kharkov National University of Radio Electronics, Ukraine.

More than 30 scientific publications since 2000.

Computer Engineering Faculty, Kharkov National University of Radioelectronics, Ukraine, Lenin Ave. 14, Kharkov, Ukraine, 61166, phone: (057) 70-21-421, (057) 70-21-326. E-mail: kiu@kture.kharkov.ua

**W. Gharibi** – PhD Student of the Kharkov National University of Radio Electronics, Computer Engineering Faculty.

## Preparation of Papers for IEEE TRANSACTIONS and JOURNALS (May 2007)

First A. Author, Second B. Author, Jr., and Third C. Author, Member, IEEE

Abstract—These instructions give you guidelines for preparing papers for IEEE TRANSACTIONS and JOURNALS. Use this document as a template if you are using Microsoft Word 6.0 or later. Otherwise, use this document as an instruction set. The electronic file of your paper will be formatted further at IEEE. Define all symbols used in the abstract. Do not cite references in the abstract. Do not delete the blank line immediately above the abstract; it sets the footnote at the bottom of this column.

*Index Terms*—About four key words or phrases in alphabetical order, separated by commas. For a list of suggested keywords, send a blank e-mail to keywords@ieee.org or visit http://www.ieee.org/organizations/pubs/ani_prod/keywrd98.txt

#### I. INTRODUCTION

THIS document is a template for Microsoft *Word* versions 6.0 or later. If you are reading a paper or PDF version of this document, please download the electronic file, TRANS-JOUR.DOC, from the IEEE Web site at http://www.ieee.org/web/publications/authors/transjnl/index.html so you can use it to prepare your manuscript. If you would prefer to use LATEX, download IEEE's LATEX style and sample files from the same Web page. Use these LATEX files for formatting, but please follow the instructions in TRANS-JOUR.DOC or TRANS-JOUR.PDF.

If your paper is intended for a *conference*, please contact your conference editor concerning acceptable word processor formats for your particular conference.

F. A. Author is with the National Institute of Standards and Technology, Boulder, CO 80305 USA (corresponding author to provide phone: 303-555-5555; fax: 303-555-5555; e-mail: author@boulder.nist.gov).

S. B. Author, Jr., was with Rice University, Houston, TX 77005 USA. He is now with the Department of Physics, Colorado State University, Fort Collins, CO 80523 USA (e-mail: author@lamar.colostate.edu).

T. C. Author is with the Electrical Engineering Department, University of Colorado, Boulder, CO 80309 USA, on leave from the National Research Institute for Metals, Tsukuba, Japan (e-mail: author@nrim.go.jp).

When you open TRANS-JOUR.DOC, select "Page Layout" from the "View" menu in the menu bar (View | Page Layout), which allows you to see the footnotes. Then, type over sections of TRANS-JOUR.DOC or cut and paste from another document and use markup styles. The pull-down style menu is at the left of the Formatting Toolbar at the top of your *Word* window (for example, the style at this point in the document is "Text"). Highlight a section that you want to designate with a certain style, then select the appropriate name on the style menu. The style will adjust your fonts and line spacing. Do not change the font sizes or line spacing to squeeze more text into a limited number of pages. Use italics for emphasis; do not underline.

To insert images in *Word*, position the cursor at the insertion point and either use Insert | Picture | From File or copy the image to the Windows clipboard and then Edit | Paste Special | Picture (with "float over text" unchecked).

IEEE will do the final formatting of your paper. If your paper is intended for a conference, please observe the conference page limits.

#### II. PROCEDURE FOR PAPER SUBMISSION

#### A. Review Stage

Please check with your editor on whether to submit your manuscript as hard copy or electronically for review. If hard copy, submit photocopies such that only one column appears per page. This will give your referees plenty of room to write comments. Send the number of copies specified by your editor (typically four). If submitted electronically, find out if your editor prefers submissions on disk or as e-mail attachments.

If you want to submit your file with one column electronically, please do the following:

--First, click on the View menu and choose Print Layout.

--Second, place your cursor in the first paragraph. Go to the Format menu, choose Columns, choose one column Layout, and choose "apply to whole document" from the dropdown menu.

--Third, click and drag the right margin bar to just over 4 inches in width.

Manuscript received October 9, 2001. (Write the date on which you submitted your paper for review.) This work was supported in part by the U.S. Department of Commerce under Grant BS123456 (sponsor and financial support acknowledgment goes here). Paper titles should be written in uppercase and lowercase letters, not all uppercase. Avoid writing long formulas with subscripts in the title; short formulas that identify the elements are fine (e.g., "Nd–Fe–B"). Do not write "(Invited)" in the title. Full names of authors are preferred in the author field, but are not required. Put a space between authors' initials.

The graphics will stay in the "second" column, but you can drag them to the first column. Make the graphic wider to push out any text that may try to fill in next to the graphic.

## B. Final Stage

When you submit your final version (after your paper has been accepted), print it in two-column format, including figures and tables. You must also send your final manuscript on a disk, via e-mail, or through a Web manuscript submission system as directed by the society contact. You may use *Zip* or CD-ROM disks for large files, or compress files using *Compress, Pkzip, Stuffit*, or *Gzip*.

Also, send a sheet of paper or PDF with complete contact information for all authors. Include full mailing addresses, telephone numbers, fax numbers, and e-mail addresses. This information will be used to send each author a complimentary copy of the journal in which the paper appears. In addition, designate one author as the "corresponding author." This is the author to whom proofs of the paper will be sent. Proofs are sent to the corresponding author only.

## C. Figures

Format and save your graphic images using a suitable graphics processing program that will allow you to create the images as PostScript (PS), Encapsulated PostScript (EPS), or Tagged Image File Format (TIFF), size them, and adjust the resolution settings. If you created your source files in one of the following, you will be able to submit the graphics without converting to a PS, EPS, or TIFF file: Microsoft Word, Microsoft PowerPoint, Microsoft Excel, or Portable Document Format (PDF).

## D. Electronic Image Files (Optional)

Import your source files in one of the following: Microsoft Word, Microsoft PowerPoint, Microsoft Excel, or Portable Document Format (PDF); you will be able to submit the graphics without converting to PS, EPS, or TIFF files. Image quality is very important to how your graphics will reproduce. Even though we can accept graphics in many formats, we cannot improve your graphics if they are poor quality when we receive them. If your graphic looks low in quality on your printer or monitor, please keep in mind that we cannot improve the quality after submission.

If you are importing your graphics into this Word template, please use the following steps:

Under the option EDIT select PASTE SPECIAL. A dialog box will open, select paste picture, then click OK. Your figure should now be in the Word Document.

If you are preparing images in TIFF, EPS, or PS format, note the following. High-contrast line figures and tables should be prepared with 600 dpi resolution and saved with no compression, 1 bit per pixel (monochrome), with file names in the form of "fig3.tif" or "table1.tif."

Photographs and grayscale figures should be prepared with 300 dpi resolution and saved with no compression, 8 bits per pixel (grayscale).

## Sizing of Graphics

Most charts, graphs, and tables are one column wide (3 1/2 inches or 21 picas) or two columns wide (7 1/16 inches or 43 picas). We recommend that you avoid sizing figures less than one column wide, as extreme enlargements may distort your images and result in poor reproduction. Therefore, it is better if the image is slightly larger, as a minor reduction in size should not have an adverse effect on the quality of the image.
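As a rough illustration, which is not part of the IEEE instructions, the resolutions and column widths quoted in this section translate into minimum pixel widths as sketched below in Python. The names `COLUMN_WIDTHS_IN`, `RESOLUTION_DPI` and `min_pixel_width` are hypothetical, and rounding up to a whole pixel is an assumption of the sketch.

```python
import math
from fractions import Fraction

# Column widths in inches, as quoted in the text (3 1/2 and 7 1/16 inches).
COLUMN_WIDTHS_IN = {
    "one-column": Fraction(7, 2),
    "two-column": Fraction(113, 16),
}

# Resolutions in dots per inch, as quoted for TIFF, EPS, or PS images.
RESOLUTION_DPI = {
    "line art": 600,
    "grayscale": 300,
}


def min_pixel_width(layout, kind):
    """Return the pixel width needed to fill the layout at the given resolution."""
    return math.ceil(COLUMN_WIDTHS_IN[layout] * RESOLUTION_DPI[kind])


for layout in COLUMN_WIDTHS_IN:
    for kind in RESOLUTION_DPI:
        print(layout, kind, min_pixel_width(layout, kind))
```

Exact fractions are used for the widths so that the mixed-number inch values are not subject to floating-point rounding.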

## Size of Author Photographs

The final printed size of an author photograph is exactly 1 inch wide by 1 1/4 inches long (6 picas  $\times$  7 1/2 picas). Please ensure that the author photographs you submit are proportioned similarly. If the author's photograph does not appear at the end of the paper, then please size it so that it is proportional to the standard size of 1 9/16 inches wide by 2 inches long (9 1/2 picas  $\times$  12 picas). JPEG files are only accepted for author photos.

## *How to create a PostScript File*

First, download a PostScript printer driver from http://www.adobe.com/support/downloads/pdrvwin.htm (for Windows) or from http://www.adobe.com/support/downloads/pdrvmac.htm (for Macintosh) and install the "Generic PostScript Printer" definition. In Word, paste your figure into a new document. Print to a file using the PostScript printer driver. File names should be of the form "fig5.ps." Use Open Type fonts when creating your figures, if possible. A listing of the acceptable fonts is as follows: Open Type Fonts: Times Roman, Helvetica, Helvetica Narrow, Courier, Symbol, Palatino, Avant Garde, Bookman, Zapf Chancery, Zapf Dingbats, and New Century Schoolbook.

## Print Color Graphics Requirements

IEEE accepts color graphics in the following formats: EPS, PS, TIFF, Word, PowerPoint, Excel, and PDF. The resolution of a RGB color TIFF file should be 400 dpi.

When sending color graphics, please supply a high quality hard copy or PDF proof of each image. If we cannot achieve a satisfactory color match using the electronic version of your files, we will have your hard copy scanned. Any of the file types you provide will be converted to RGB color EPS files.



Fig. 1. Magnetization as a function of applied field. Note that "Fig." is abbreviated. There is a period after the figure number, followed by two spaces. It is good practice to explain the significance of the figure in the caption.

#### Web Color Graphics

IEEE accepts color graphics in the following formats: EPS, PS, TIFF, Word, PowerPoint, Excel, and PDF. The resolution of a RGB color TIFF file should be at least 400 dpi.

Your color graphic will be converted to grayscale if no separate grayscale file is provided. If a graphic is to appear in print as black and white, it should be saved and submitted as a black and white file. If a graphic is to appear in print or on IEEE Xplore in color, it should be submitted as RGB color.

## Graphics Checker Tool

The IEEE Graphics Checker Tool enables users to check graphic files. The tool will check journal article graphic files against a set of rules for compliance with IEEE requirements. These requirements are designed to ensure sufficient image quality so they will look acceptable in print. After receiving a graphic or a set of graphics, the tool will check the files against a set of rules. A report will then be e-mailed listing each graphic and whether it met or failed to meet the requirements. If the file fails, a description of why and instructions on how to correct the problem will be sent. The IEEE Graphics Checker Tool is available at http://graphicsqc.ieee.org/

For more information, contact the IEEE Graphics H-E-L-P Desk by e-mail at graphics@ieee.org. You will then receive an e-mail response and sometimes a request for a sample graphic for us to check.

#### E. Copyright Form

An IEEE copyright form should accompany your final submission. You can get a .pdf, .html, or .doc version at http://www.ieee.org/copyright. Authors are responsible for obtaining any security clearances.

TABLE I

UNITS FOR MAGNETIC PROPERTIES

| Symbol | Quantity | Conversion from Gaussian and CGS EMU to SI <sup>a</sup> |
|--------|----------|----------------------------------------------------------|
| Φ | magnetic flux | $1\ \text{Mx} \rightarrow 10^{-8}\ \text{Wb} = 10^{-8}\ \text{V} \cdot \text{s}$ |
| B | magnetic flux density, magnetic induction | $1\ \text{G} \rightarrow 10^{-4}\ \text{T} = 10^{-4}\ \text{Wb/m}^2$ |
| H | magnetic field strength | $1\ \text{Oe} \rightarrow 10^3/(4\pi)\ \text{A/m}$ |
| m | magnetic moment | $1\ \text{erg/G} = 1\ \text{emu} \rightarrow 10^{-3}\ \text{A} \cdot \text{m}^2 = 10^{-3}\ \text{J/T}$ |
| M | magnetization | $1\ \text{erg/(G} \cdot \text{cm}^3) = 1\ \text{emu/cm}^3 \rightarrow 10^3\ \text{A/m}$ |
| $4\pi M$ | magnetization | $1\ \text{G} \rightarrow 10^3/(4\pi)\ \text{A/m}$ |
| σ | specific magnetization | $1\ \text{erg/(G} \cdot \text{g)} = 1\ \text{emu/g} \rightarrow 1\ \text{A} \cdot \text{m}^2/\text{kg}$ |
| j | magnetic dipole moment | $1\ \text{erg/G} = 1\ \text{emu} \rightarrow 4\pi \times 10^{-10}\ \text{Wb} \cdot \text{m}$ |
| J | magnetic polarization | $1\ \text{erg/(G} \cdot \text{cm}^3) = 1\ \text{emu/cm}^3 \rightarrow 4\pi \times 10^{-4}\ \text{T}$ |
| χ, κ | susceptibility | $1 \rightarrow 4\pi$ |
| χ<sub>ρ</sub> | mass susceptibility | $1\ \text{cm}^3/\text{g} \rightarrow 4\pi \times 10^{-3}\ \text{m}^3/\text{kg}$ |
| μ | permeability | $1 \rightarrow 4\pi \times 10^{-7}\ \text{H/m} = 4\pi \times 10^{-7}\ \text{Wb/(A} \cdot \text{m)}$ |
| μ<sub>r</sub> | relative permeability | $\mu \rightarrow \mu_r$ |
| w, W | energy density | $1\ \text{erg/cm}^3 \rightarrow 10^{-1}\ \text{J/m}^3$ |
| N, D | demagnetizing factor | $1 \rightarrow 1/(4\pi)$ |

Vertical lines are optional in tables. Statements that serve as captions for the entire table do not need footnote letters.

<sup>a</sup>Gaussian units are the same as cgs emu for magnetostatics; Mx = maxwell, G = gauss, Oe = oersted; Wb = weber, V = volt, s = second, T = tesla, m = meter, A = ampere, J = joule, kg = kilogram, H = henry.

#### III. MATH

If you are using *Word*, use either the Microsoft Equation Editor or the *MathType* add-on (http://www.mathtype.com) for equations in your paper (Insert | Object | Create New | Microsoft Equation *or* MathType Equation). "Float over text" should *not* be selected.

#### IV. UNITS

Use either SI (MKS) or CGS as primary units. (SI units are strongly encouraged.) English units may be used as secondary units (in parentheses). **This applies to papers in data storage.** For example, write "15 Gb/cm<sup>2</sup> (100 Gb/in<sup>2</sup>)." An exception is when English units are used as identifiers in trade, such as "3½-in disk drive." Avoid combining SI and CGS units, such as current in amperes and magnetic field in oersteds. This often leads to confusion because equations do not balance dimensionally. If you must use mixed units, clearly state the units for each quantity in an equation.

The SI unit for magnetic field strength H is A/m. However, if you wish to use units of T, either refer to magnetic flux density B or magnetic field strength symbolized as  $\mu_0 H$ . Use the center dot to separate compound units, e.g., "A·m<sup>2</sup>."
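The conversions of Table I can be applied programmatically. The Python sketch below is an illustration only: it covers a handful of the table's rows, keys them by the CGS unit (so the gauss entry applies to flux density B, not to $4\pi M$), and uses the hypothetical names `CGS_TO_SI` and `to_si`.

```python
import math

# Multiplicative factors from Gaussian/CGS EMU units to SI, taken from Table I.
# Each entry maps a CGS unit to (factor, SI unit).
CGS_TO_SI = {
    "Mx": (1e-8, "Wb"),                   # magnetic flux
    "G": (1e-4, "T"),                     # magnetic flux density B
    "Oe": (1e3 / (4 * math.pi), "A/m"),   # magnetic field strength H
    "emu/cm^3": (1e3, "A/m"),             # magnetization M
    "erg/cm^3": (1e-1, "J/m^3"),          # energy density
}


def to_si(value, unit):
    """Convert a value in a CGS unit to SI; returns (value, SI unit)."""
    factor, si_unit = CGS_TO_SI[unit]
    return value * factor, si_unit


field, si_unit = to_si(1.0, "Oe")
print(f"1 Oe = {field:.4f} {si_unit}")
```

Keeping the SI unit next to the factor makes it harder to mix SI and CGS quantities in one equation, which is the confusion this section warns about.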

#### V. HELPFUL HINTS

## A. Figures and Tables

Because IEEE will do the final formatting of your paper, you do not need to position figures and tables at the top and bottom of each column. In fact, all figures, figure captions, and tables can be at the end of the paper. Large figures and tables may span both columns. Place figure captions below the figures; place table titles above the tables. If your figure has two parts, include the labels "(a)" and "(b)" as part of the artwork. Please verify that the figures and tables you mention in the text actually exist. **Please do not include captions as part of the figures. Do not put captions in "text boxes" linked to the figures.** Use the abbreviation "Fig." even at the beginning of a sentence. Do not abbreviate "Table." Tables are numbered with Roman numerals.

Color printing of figures is available, but is billed to the authors. Include a note with your final paper indicating that you request and will pay for color printing. Do not use color unless it is necessary for the proper interpretation of your figures. If you want reprints of your color article, the reprint order should be submitted promptly. There is an additional charge for color reprints. Please note that many IEEE journals now allow an author to publish color figures on Xplore and black and white figures in print. Contact your society representative for specific requirements.

Figure axis labels are often a source of confusion. Use words rather than symbols. As an example, write the quantity "Magnetization," or "Magnetization M," not just "M." Put units in parentheses. Do not label axes only with units. As in Fig. 1, for example, write "Magnetization (A/m)" or "Magnetization (A · m<sup>-1</sup>)," not just "A/m." Do not label axes with a ratio of quantities and units. For example, write "Temperature (K)," not "Temperature/K."

Multipliers can be especially confusing. Write "Magnetization (kA/m)" or "Magnetization  $(10^3 \text{ A/m})$ ." Do not write "Magnetization (A/m) × 1000" because the reader would not know whether the top axis label in Fig. 1 meant 16000 A/m or 0.016 A/m. Figure labels should be legible, approximately 8 to 12 point type.

## B. References

Number citations consecutively in square brackets [1]. The sentence punctuation follows the brackets [2]. Multiple references [2], [3] are each numbered with separate brackets [1]–[3]. When citing a section in a book, please give the relevant page numbers [2]. In sentences, refer simply to the reference number, as in [3]. Do not use "Ref. [3]" or "reference [3]" except at the beginning of a sentence: "Reference [3] shows ... ." Please do not use automatic endnotes in *Word*; rather, type the reference list at the end of the paper using the "References" style.

Number footnotes separately in superscripts (Insert | Footnote).<sup>1</sup> Place the actual footnote at the bottom of the column in which it is cited; do not put footnotes in the reference list (endnotes). Use letters for table footnotes (see Table I).

Please note that the references at the end of this document are in the preferred referencing style. Give all authors' names; do not use "*et al.*" unless there are six authors or more. Use a space after authors' initials. Papers that have not been published should be cited as "unpublished" [4]. Papers that have been accepted for publication, but not yet specified for an issue, should be cited as "to be published" [5]. Papers that have been submitted for publication should be cited as "submitted for publication" [6]. Please give affiliations and addresses for private communications [7].
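The author-list rule can be illustrated with a short sketch. The function name `format_authors` is hypothetical; the serial comma before "and" follows the reference list at the end of this document, and the italic markers around "et al." assume markdown output.

```python
def format_authors(authors):
    """Join author names; abbreviate with 'et al.' only for six or more."""
    if not authors:
        raise ValueError("at least one author is required")
    if len(authors) >= 6:
        return f"{authors[0]} *et al.*"
    if len(authors) == 1:
        return authors[0]
    if len(authors) == 2:
        return f"{authors[0]} and {authors[1]}"
    # Three to five authors: list all of them, with a serial comma.
    return ", ".join(authors[:-1]) + f", and {authors[-1]}"


print(format_authors(["Y. Yorozu", "M. Hirano", "K. Oka", "Y. Tagawa"]))
print(format_authors(["G. W. Juette", "L. E. Zeffanella"]))
```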

Capitalize only the first word in a paper title, except for proper nouns and element symbols. For papers published in translation journals, please give the English citation first, followed by the original foreign-language citation [8].

## C. Abbreviations and Acronyms

Define abbreviations and acronyms the first time they are used in the text, even after they have already been defined in the abstract. Abbreviations such as IEEE, SI, ac, and dc do not have to be defined. Abbreviations that incorporate periods should not have spaces: write "C.N.R.S.," not "C. N. R. S." Do not use abbreviations in the title unless they are unavoidable (for example, "IEEE" in the title of this article).

## D. Equations

Number equations consecutively with equation numbers in parentheses flush with the right margin, as in (1). First use the equation editor to create the equation. Then select the "Equation" markup style. Press the tab key and write the equation number in parentheses. To make your equations more compact, you may use the solidus ( / ), the exp function, or appropriate exponents. Use parentheses to avoid ambiguities in denominators. Punctuate equations when they are part of a sentence, as in

$$\int_{0}^{r_{2}} F(r,\varphi)\,dr\,d\varphi = [\sigma r_{2} / (2\mu_{0})] \cdot \int_{0}^{\infty} \exp(-\lambda |z_{j} - z_{i}|)\,\lambda^{-1} J_{1}(\lambda r_{2}) J_{0}(\lambda r_{i})\,d\lambda. \tag{1}$$
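As a small illustration of the compaction advice (the symbols $a$, $b$, $c$, $x$, and $s$ are arbitrary placeholders, not quantities defined in this guide), a built-up fraction can be set inline with the solidus, and a raised exponent can be replaced by the exp function. The parentheses are what keep the denominators unambiguous:

$$\begin{aligned}
\frac{a}{b + c} &\;\longrightarrow\; a/(b + c), \qquad \text{not } a/b + c, \\
e^{-x^{2}/(2 s^{2})} &\;\longrightarrow\; \exp[-x^{2}/(2 s^{2})].
\end{aligned}$$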

Be sure that the symbols in your equation have been defined before the equation appears or immediately following. Italicize symbols (*T* might refer to temperature, but T is the unit tesla). Refer to "(1)," not "Eq. (1)" or "equation (1)," except at the beginning of a sentence: "Equation (1) is ... ."

<sup>1</sup>It is recommended that footnotes be avoided (except for the unnumbered footnote with the receipt date on the first page). Instead, try to integrate the footnote information into the text.

## E. Other Recommendations

Use one space after periods and colons. Hyphenate complex modifiers: "zero-field-cooled magnetization." Avoid dangling participles, such as, "Using (1), the potential was calculated." [It is not clear who or what used (1).] Write instead, "The potential was calculated by using (1)," or "Using (1), we calculated the potential."

Use a zero before decimal points: "0.25," not ".25." Use "cm<sup>3</sup>," not "cc." Indicate sample dimensions as "0.1 cm × 0.2 cm," not "0.1 × 0.2 cm<sup>2</sup>." The abbreviation for "seconds" is "s," not "sec." Do not mix complete spellings and abbreviations of units: use "Wb/m<sup>2</sup>" or "webers per square meter," not "webers/m<sup>2</sup>." When expressing a range of values, write "7 to 9" or "7-9," not "7~9."
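Several of the unit and number rules above are mechanical enough to check automatically. The following is a minimal sketch with hypothetical names (`CHECKS`, `lint_units`). The regular expressions are heuristics that can produce false positives, and they cover only four of the rules.

```python
import re

# Each entry pairs a heuristic pattern with the guideline it flags.
CHECKS = [
    (re.compile(r"(?<![\d.])\.\d"), "use a zero before the decimal point"),
    (re.compile(r"\d\s*cc\b"), "use cm^3, not cc"),
    (re.compile(r"\d\s*sec\b"), "abbreviate seconds as s, not sec"),
    (re.compile(r"\d\s*~\s*\d"), "write a range as '7 to 9', not '7~9'"),
]


def lint_units(text):
    """Return the messages of all checks whose pattern occurs in text."""
    return [message for pattern, message in CHECKS if pattern.search(text)]


print(lint_units("The gap is .25 cm."))
print(lint_units("A 0.1 cm × 0.2 cm sample was held for 3 s."))
```

A sentence that follows the rules, such as the second one, yields an empty list.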

A parenthetical statement at the end of a sentence is punctuated outside of the closing parenthesis (like this). (A parenthetical sentence is punctuated within the parentheses.) In American English, periods and commas are within quotation marks, like "this period." Other punctuation is "outside"! Avoid contractions; for example, write "do not" instead of "don't." The serial comma is preferred: "A, B, and C" instead of "A, B and C."

If you wish, you may write in the first person singular or plural and use the active voice ("I observed that ..." or "We observed that ..." instead of "It was observed that ..."). Remember to check spelling. If your native language is not English, please get a native English-speaking colleague to carefully proofread your paper.

## VI. SOME COMMON MISTAKES

The word "data" is plural, not singular. The subscript for the permeability of vacuum $\mu_0$ is zero, not a lowercase letter "o." The term for residual magnetization is "remanence"; the adjective is "remanent"; do not write "remnance" or "remnant." Use the word "micrometer" instead of "micron." A graph within a graph is an "inset," not an "insert." The word "alternatively" is preferred to the word "alternately" (unless you really mean something that alternates). Use the word "whereas" instead of "while" (unless you are referring to simultaneous events). Do not use the word "essentially" to mean "approximately" or "effectively." Do not use the word "issue" as a euphemism for "problem." When compositions are not specified, separate chemical symbols by en-dashes; for example, "NiMn" indicates the intermetallic compound Ni<sub>0.5</sub>Mn<sub>0.5</sub> whereas "Ni–Mn" indicates an alloy of some composition Ni<sub>x</sub>Mn<sub>1-x</sub>.

Be aware of the different meanings of the homophones "affect" (usually a verb) and "effect" (usually a noun), "complement" and "compliment," "discreet" and "discrete," "principal" (e.g., "principal investigator") and "principle" (e.g., "principle of measurement"). Do not confuse "imply" and "infer."

Prefixes such as "non," "sub," "micro," "multi," and "ultra" are not independent words; they should be joined to the words they modify, usually without a hyphen. There is no period after the "et" in the Latin abbreviation "*et al.*" (it is also italicized). The abbreviation "i.e.," means "that is," and the abbreviation "e.g.," means "for example" (these abbreviations are not italicized).

An excellent style manual and source of information for science writers is [9]. A general IEEE style guide and an *Information for Authors* are both available at http://www.ieee.org/web/publications/authors/transjnl/index.html

## VII. EDITORIAL POLICY

Submission of a manuscript is not required for participation in a conference. Do not submit a reworked version of a paper you have submitted or published elsewhere. Do not publish "preliminary" data or results. The submitting author is responsible for obtaining agreement of all coauthors and any consent required from sponsors before submitting a paper. IEEE TRANSACTIONS and JOURNALS strongly discourage courtesy authorship. It is the obligation of the authors to cite relevant prior work.

The Transactions and Journals Department does not publish conference records or proceedings. The TRANSACTIONS does publish papers related to conferences that have been recommended for publication on the basis of peer review. As a matter of convenience and service to the technical community, these topical papers are collected and published in one issue of the TRANSACTIONS.

At least two reviews are required for every paper submitted. For conference-related papers, the decision to accept or reject a paper is made by the conference editors and publications committee; the recommendations of the referees are advisory only. Undecipherable English is a valid reason for rejection. Authors of rejected papers may revise and resubmit them to the TRANSACTIONS as regular papers, whereupon they will be reviewed by two new referees.

## VIII. PUBLICATION PRINCIPLES

The contents of IEEE TRANSACTIONS and JOURNALS are peer-reviewed and archival. The TRANSACTIONS publishes scholarly articles of archival value as well as tutorial expositions and critical reviews of classical subjects and topics of current interest.

Authors should consider the following points:

- 1) Technical papers submitted for publication must advance the state of knowledge and must cite relevant prior work.
- 2) The length of a submitted paper should be commensurate with the importance, or appropriate to the complexity, of the work. For example, an obvious extension of previously published work might not be appropriate for publication or might be adequately treated in just a few pages.
- 3) Authors must convince both peer reviewers and the editors of the scientific and technical merit of a paper; the standards of proof are higher when extraordinary or unexpected results are reported.
- 4) Because replication is required for scientific progress, papers submitted for publication must provide sufficient information to allow readers to perform similar experiments or calculations and use the reported results. Although not everything need be disclosed, a paper must contain new, useable, and fully described information. For example, a specimen's chemical composition need not be reported if the main purpose of a paper is to introduce a new measurement technique. Authors should expect to be challenged by reviewers if the results are not supported by adequate data and critical details.
- 5) Papers that describe ongoing work or announce the latest technical achievement, which are suitable for presentation at a professional conference, may not be appropriate for publication in a TRANSACTIONS or JOURNAL.

## IX. CONCLUSION

A conclusion section is not required. Although a conclusion may review the main points of the paper, do not replicate the abstract as the conclusion. A conclusion might elaborate on the importance of the work or suggest applications and extensions.

#### APPENDIX

Appendixes, if needed, appear before the acknowledgment.

#### ACKNOWLEDGMENT

The preferred spelling of the word "acknowledgment" in American English is without an "e" after the "g." Use the singular heading even if you have many acknowledgments. Avoid expressions such as "One of us (S.B.A.) would like to thank ... ." Instead, write "F. A. Author thanks ... ." **Sponsor and financial support acknowledgments are placed in the unnumbered footnote on the first page, not here.** 

#### REFERENCES

- [1] G. O. Young, "Synthetic structure of industrial plastics (Book style with paper title and editor)," in *Plastics*, 2nd ed. vol. 3, J. Peters, Ed. New York: McGraw-Hill, 1964, pp. 15–64.
- [2] W.-K. Chen, *Linear Networks and Systems* (Book style). Belmont, CA: Wadsworth, 1993, pp. 123–135.
- [3] H. Poor, *An Introduction to Signal Detection and Estimation*. New York: Springer-Verlag, 1985, ch. 4.
- [4] B. Smith, "An approach to graphs of linear forms (Unpublished work style)," unpublished.
- [5] E. H. Miller, "A note on reflector arrays (Periodical style—Accepted for publication)," *IEEE Trans. Antennas Propagat.*, to be published.
- [6] J. Wang, "Fundamentals of erbium-doped fiber amplifiers arrays (Periodical style—Submitted for publication)," *IEEE J. Quantum Electron.*, submitted for publication.
- [7] C. J. Kaufman, Rocky Mountain Research Lab., Boulder, CO, private communication, May 1995.
- [8] Y. Yorozu, M. Hirano, K. Oka, and Y. Tagawa, "Electron spectroscopy studies on magneto-optical media and plastic substrate interfaces (Translation Journals style)," *IEEE Transl. J. Magn. Jpn.*, vol. 2, Aug. 1987, pp. 740–741 [*Dig. 9th Annu. Conf. Magnetics* Japan, 1982, p. 301].
- [9] M. Young, *The Technical Writer's Handbook*. Mill Valley, CA: University Science, 1989.
- [10] J. U. Duncombe, "Infrared navigation—Part I: An assessment of feasibility (Periodical style)," *IEEE Trans. Electron Devices*, vol. ED-11, pp. 34–39, Jan. 1959.
- [11] S. Chen, B. Mulgrew, and P. M. Grant, "A clustering technique for digital communications channel equalization using radial basis function networks," *IEEE Trans. Neural Networks*, vol. 4, pp. 570–578, Jul. 1993.
- [12] R. W. Lucky, "Automatic equalization for digital communication," *Bell Syst. Tech. J.*, vol. 44, no. 4, pp. 547–588, Apr. 1965.
- [13] S. P. Bingulac, "On the compatibility of adaptive controllers (Published Conference Proceedings style)," in *Proc. 4th Annu. Allerton Conf. Circuits and Systems Theory*, New York, 1994, pp. 8–16.
- [14] G. R. Faulhaber, "Design of service systems with priority reservation," in *Conf. Rec. 1995 IEEE Int. Conf. Communications*, pp. 3–8.
- [15] W. D. Doyle, "Magnetization reversal in films with biaxial anisotropy," in 1987 Proc. INTERMAG Conf., pp. 2.2-1–2.2-6.
- [16] G. W. Juette and L. E. Zeffanella, "Radio noise currents in short sections on bundle conductors (Presented Conference Paper style)," presented at the IEEE Summer Power Meeting, Dallas, TX, Jun. 22–27, 1990, Paper 90 SM 690-0 PWRS.
- [17] J. G. Kreifeldt, "An analysis of surface-detected EMG as an amplitude-modulated noise," presented at the 1989 Int. Conf. Medicine and Biological Engineering, Chicago, IL.
- [18] J. Williams, "Narrow-band analyzer (Thesis or Dissertation style)," Ph.D. dissertation, Dept. Elect. Eng., Harvard Univ., Cambridge, MA, 1993.
- [19] N. Kawasaki, "Parametric study of thermal and chemical nonequilibrium nozzle flow," M.S. thesis, Dept. Electron. Eng., Osaka Univ., Osaka, Japan, 1993.
- [20] J. P. Wilkinson, "Nonlinear resonant circuit devices (Patent style)," U.S. Patent 3 624 12, July 16, 1990.
- [21] *IEEE Criteria for Class IE Electric Systems* (Standards style), IEEE Standard 308, 1969.
- [22] Letter Symbols for Quantities, ANSI Standard Y10.5-1968.
- [23] R. E. Haskell and C. T. Case, "Transient signal propagation in lossless isotropic plasmas (Report style)," USAF Cambridge Res. Lab., Cambridge, MA Rep. ARCRL-66-234 (II), 1994, vol. 2.
- [24] E. E. Reber, R. L. Michell, and C. J. Carter, "Oxygen absorption in the Earth's atmosphere," Aerospace Corp., Los Angeles, CA, Tech. Rep. TR-0200 (420-46)-3, Nov. 1988.
- [25] (Handbook style) Transmission Systems for Communications, 3rd ed., Western Electric Co., Winston-Salem, NC, 1985, pp. 44–60.
- [26] Motorola Semiconductor Data Manual, Motorola Semiconductor Products Inc., Phoenix, AZ, 1989.
- [27] (Basic Book/Monograph Online Sources) J. K. Author. (year, month, day). *Title* (edition) [Type of medium]. Volume (issue). Available: http://www.(URL)
- [28] J. Jones. (1991, May 10). Networks (2nd ed.) [Online]. Available: http://www.atm.com
- [29] (Journal Online Sources style) K. Author. (year, month). Title. Journal [Type of medium]. Volume(issue), paging if given. Available: http://www.(URL)
- [30] R. J. Vidmar. (1992, August). On the use of atmospheric plasmas as electromagnetic reflectors. *IEEE Trans. Plasma Sci.* [Online]. 21(3). pp. 876–880. Available: http://www.halcyon.com/pub/journals/21ps03-vidmar

**First A. Author** (M'76–SM'81–F'87) and the other authors may include biographies at the end of regular papers. Biographies are often not included in conference-related papers. This author became a Member (M) of IEEE in 1976, a Senior Member (SM) in 1981, and a Fellow (F) in 1987. The first paragraph may contain a place and/or date of birth (list place, then date). Next, the author's educational background is listed. The degrees should be listed with type of degree in what field, which institution, city, state, and country, and year degree was earned. The author's major field of study should be lower-cased.

The second paragraph uses the pronoun of the person (he or she) and not the author's last name. It lists military and work experience, including summer and fellowship jobs. Job titles are capitalized. The current job must have a location; previous positions may be listed without one. Information concerning previous publications may be included. Try not to list more than three books or published articles. The format for listing publishers of a book within the biography is: title of book (city, state: publisher name, year) similar to a reference. Current and previous research interests end the paragraph.

The third paragraph begins with the author's title and last name (e.g., Dr. Smith, Prof. Jones, Mr. Kajor, Ms. Hunter). List any memberships in professional societies other than the IEEE. Finally, list any awards and work for IEEE committees and publications. If a photograph is provided, the biography will be indented around it. The photograph is placed at the top left of the biography. Personal hobbies will be deleted from the biography.

Camera-ready copy was prepared at Kharkov National University of Radio Electronics. Approved for publication: 27.06.2008. Format 60×84 1/8. Relative printer's sheets: 11.86. Circulation: 300 copies. Published by SPD FL Andreev K.V., Lenin Ave. 14, Kharkov, 61166, Ukraine.

Recommended by the Academic Council of Kharkiv National University of Radio Electronics (protocol No. 10 of 27.06.2008). Signed for printing: 27.06.2008. Format 60×84 1/8. Conventional printed sheets: 11.86. Circulation: 300 copies. Price negotiable. Printed by PE Andreev K.V., 14 Lenin Ave., Kharkiv, 61166.