Provided for non-commercial research and education use. Not for reproduction, distribution or commercial use.



This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution and sharing with colleagues.

Other uses, including reproduction and distribution, or selling or licensing copies, or posting to personal, institutional or third party websites are prohibited.

In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information regarding Elsevier's archiving and manuscript policies are encouraged to visit:

http://www.elsevier.com/copyright

Discrete Applied Mathematics 160 (2012) 525-535

Contents lists available at SciVerse ScienceDirect





journal homepage: www.elsevier.com/locate/dam

# Using well-solvable quadratic assignment problems for VLSI interconnect applications

Ben Emanuel<sup>a</sup>, Shmuel Wimer<sup>b,\*</sup>, Gershon Wolansky<sup>a</sup>

<sup>a</sup> Technion, Israel Institute of Technology, Math. Dept., Israel

<sup>b</sup> Bar-Ilan University, Eng. School, Israel

# ARTICLE INFO

Article history: Received 28 September 2010 Received in revised form 11 October 2011 Accepted 17 November 2011 Available online 7 December 2011

Keywords: Quadratic assignment problem Traveling salesman problem VLSI interconnects Networks on chip 3D VLSI

# ABSTRACT

This paper presents several optimization problems occurring in VLSI interconnect, Networks on Chip (NoC) design and 3D VLSI integration, all possessing closed-form solutions obtained by well-solvable Quadratic Assignment Problems (QAP). The first type of problems deals with the optimal ordering of signals in a bus bundle such that the switching power, delay and noise interference are minimized. We extend a known solution of ordering the signals in a bus bundle to minimize the impact of the first order wire-to-wire parasitic capacitance occurring between adjacent wires into a model accounting for also secondary components of wire-to-wire parasitic capacitances. The second type of problems arises in the mapping of computation tasks into an array of processors sharing a common bus, such as those found in NoC. We show a QAP closed-form solution to the optimal mapping problem which simultaneously minimizes the switching power and the average delay of the bus. The third problem deals with the optimization of 3D VLSI, vertically stacking ordinary ICs. Some of the above problems involve k-salesmen Traveling Salesman Problem (TSP), where costs are evaluated for elements located at k-distance apart along the tour. We show a simple proof that these are well-solvable problems and obtain their solution. This is then generalized to well-solvable QAPs obtained by superposition of such TSPs. A simple proof shows that if k-distance TSPs are well-solvable, so is the QAP obtained by their sum, where the solution of 1-distance TSPs dominates all the others.

© 2011 Elsevier B.V. All rights reserved.

#### 1. Introduction

In the following, we present three types of problems occurring in VLSI circuit and system design, and their relation to Traveling Salesmen Problem (TSP) and Quadratic Assignment Problem (QAP). TSP and QAP are well known intractable problems, but there are special instances where TSP [9,4,6] and QAP [21,3,7,17] are well-solvable. In the following, we show that those VLSI problems are mapped into well-solvable TSPs and QAPs.

#### 1.1. Minimizing delay and dynamic power in VLSI interconnect bus bundle

Fig. 1a illustrates a commonly used *n*-wire bus bundle. There, logic gates called drivers drive signals propagating along interconnecting wires. These signals stimulate other logic circuits, called receivers, connected at the opposite end of the wires. The bus is shielded by wires connected to ground. Cross-coupling parasitic capacitance (cross-cap for short) which is a predominant cause of signal propagation delay, dynamic (switching) power consumption and crosstalk noise interference,

<sup>\*</sup> Corresponding author. Tel.: +972 3 5317208; fax: +972 3 7384051. *E-mail address*: wimers@macs.biu.ac.il (S. Wimer).

<sup>0166-218</sup>X/\$ – see front matter 0 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.dam.2011.11.017



Fig. 1a. Typical VLSI interconnect bus. Drivers are shown on the left side and receivers on the right side. The bus is shielded on its two sides. A parasitic cross-coupling capacitance is incurring between any adjacent signals.



Fig. 1b. Cross-section of a 7-signal shielded bus. All line-to-line cross-coupling capacitances are shown (1-distance). Few other 2-distance, 3-distance and 4-distance capacitances are also shown.

occurs between any two wires of the bus [1]. The primary component of the cross-cap is occurring between adjacent wires, where we say that the wires are at 1-distance of each other. Secondary cross-cap components exist in the bus as shown in Fig. 1b. Most interconnect optimization algorithms account for only 1-distance cross-cap, which is claimed to reach 90% of the total cross-cap. A secondary component of about 6% is due to wires at 2-distance and the rest are due to higher distances. The secondary cross-cap impact on power dissipation and the accuracy of power estimation were studied in [15]. The question of how to order the wires in the bus to yield best performance (dynamic power consumption, delay and noise immunity) has been studied for 1-distance cross-cap, and it was shown to be a well-solvable TSP [20]. It is shown in the sequel that accounting for secondary cross-cap components results in a well-solvable QAP, which generalizes the former result.

A well known VLSI optimization problem is the setting of wire widths and spaces in a bus bundle whose total width (the distance between the shields in Fig. 1a) is constrained [19]. The question of how to order the wires in the bus such the above optimization will yield best bus performance was discussed in several works [13,12,18,10,8,22], where at each paper focused on a single objective, an all considered only 1-distance cross-cap. We show subsequently that the consideration of secondary cross-cap does not change the optimal order of wires in the bus. Accounting for a *k*-distance component alone,  $1 \le k \le n - 1$ , implies a sort of TSP with *k* salesmen. Considering all distances simultaneously yields a special QAP, which is a sum of all *k* salesmen TSPs. We show in Section 3 that in the combined problem the 1-distance solution, corresponding to the ordinary TSP (one salesman), dominates all the others.

Bus modeling associates with every wire a parameter r whose meaning depends on the optimization problem of interest. For delays r is the resistance of the signal's driver. For dynamic power consumption and noise interference r represents signal's relative switching probability, called activity factor in VLSI jargon [11]. Given an arbitrary order of the wires in the bus, let n real nonnegative parameters  $r_1, \ldots, r_n$  be associated with the wires. It was shown in [13,12,18,10,8,22] that up to a multiplicative factor which is independent of problem's setting, once the widths and the spaces are set to minimize delay, dynamic power or noise interference, the objective function satisfies the following expression:

$$F(r_1, \dots, r_n) = \sqrt{r_1} + \sum_{i=1}^{n-1} \sqrt{r_i + r_{i+1}} + \sqrt{r_n}.$$
(1.1)

In [18] the minimization of noise interference between signals was considered. In [13] average delay of a signal was the minimization objective and in [12,10,8,22] dynamic power minimization was addressed.

Eq. (1.1) rouses the question of what is the order of wires among all the *n*! possible permutations which minimizes *F*. For convenience, expression (1.1) can be modified into a cyclic sum by adding two artificial parameters  $r_0 = r_{n+1} = 0$ , turning it into  $F(r_0, r_1, ..., r_n, r_{n+1}) = \sum_{i=0}^{n+1} \sqrt{r_i + r_{i+1}}$ , where  $n + 2 \stackrel{\triangle}{=} 0$ . To adhere the indexing commonly used in combinatorial

optimization literature and w.l.o.g, we will explore the order of wires which minimizes the expression

$$F(r_1, \dots, r_n) = \sum_{i=1}^n \sqrt{r_i + r_{i+1}},$$
(1.2)

and a more general functions, where square root is just a special case. In (1.2) successive parameters  $r_i$  and  $r_{i+1}$  correspond to wires residing at 1-distance of each other, and  $n + 1 \stackrel{\triangle}{=} 1$ .

Assume w.l.o.g that  $r_1 < r_2 < \cdots < r_n$ . Let  $\Pi$  denote the set of all permutations  $\pi : \{1, \ldots, n\} \rightarrow \{1, \ldots, n\}$ , and the sequence  $\langle i_1, \ldots, i_n \rangle$ , called also a *tour*, be obtained by  $\pi (j) = i_j$ . The works in [13,12,18,10,8,22] showed that the permutation yielding the sequence  $\langle 1, 3, \ldots, n, \ldots, 4, 2 \rangle$  is minimizing the expression:

$$F(\pi) = \sum_{j=1}^{n} \sqrt{r_{i_j} + r_{i_{j+1}}} \stackrel{\Delta}{=} \sum_{j=1}^{n} \sqrt{r_{\pi^{-1}(j)} + r_{\pi^{-1}(j+1)}}.$$
(1.3)

This permutation was studied thoroughly in combinatorial optimization and is called *Symmetric Pyramidal Tour Permutation* (SPTP) [9]. It was shown in [13] that 10%–15% of delay reduction can be achieved by SPTP order of wires compared to the order used for a real microprocessor designed in 65 nm process technology. Same amount of power reduction was reported in [12] for the same microprocessor design. Similar numbers were reported in [10,8,22].

SPTP is a well known solution of a special case of TSP [4,6] whose cost matrix satisfies the so called "four-point" conditions [6]. The work in [20] showed that all the above mentioned VLSI problems satisfy Supnick conditions for TSP [16], for which SPTP is indeed optimal. It also showed that SPTP is optimal for a more general form of (1.3), where  $F(\pi) = \sum_{j=1}^{n} f(r_{\pi^{-1}(j)}, r_{\pi^{-1}(j+1)})$ , is a symmetric real function defined for  $x \ge 0$  and  $y \ge 0$ , twice differentiable, and satisfying  $\partial^2 f(x, y) / \partial x \partial y < 0$ .

The above works explored the optimal wires order in the bus in the presence of only primary cross-cap. The question of whether accounting for secondary cross-cap components may change the SPTP optimal order is interesting. The following discussion takes the secondary cross-cap into account and maps the optimization problem into a special QAP as a motivation for the study in Sections 2 and 3. The main result there is that the addition of the approximated secondary cross-cap preserves the SPTP optimal solution. One can expect for slight changes in the optimization which is beyond the scope of this paper.

Given a wire located at position *j* in the bundle, we account for its cross-cap to wires positioned at j + k,  $1 \le k \le n - 1$ , where wire indices are numbered cyclically. Let  $\alpha_k \ge 0$ ,  $1 \le k \le n - 1$ ,  $\sum_{k=1}^{n-1} \alpha_k = 1$ , be the *k*-distance cross-cap coefficient, e.g.,  $\alpha_1 = 0.9$ ,  $\alpha_2 = 0.06$ , etc. The minimum power and delay when higher than 1-distance cross-cap components are considered cannot be expressed as a linear sum of expressions as those in (1.3). Still, simulations show that the subsequent weighted sum, where the coefficients are monotonic decreasing in wire distance, yields a fair approximation [15]. We are therefore interested in finding the permutation minimizing the expression

$$F(\pi) = \sum_{k=1}^{n-1} \alpha_k \sum_{j=1}^{n-k} \sqrt{r_{\pi^{-1}(j)} + r_{\pi^{-1}(j+k)}}.$$
(1.4)

The consideration of higher distances turns the problem of finding the optimal permutation into a special QAP case, for which SPTP closed-form solutions exist. A variety of engineering optimization problems yielding SPTP optimal QAP solution are presented in [21]. Papers [3,7] discuss well-solvable QAP and provide good references to this topic. Given a  $n \times n$  real cost matrix  $\mathbf{C} = (c_{ij})$  and a  $n \times n$  real distance matrix  $\mathbf{D} = (d_{ij})$ , QAP aims at finding a permutation  $\pi$  minimizing the expression

$$F(\pi) = \sum_{i=1}^{n} \sum_{j=1}^{n} c_{\pi^{-1}(i)\pi^{-1}(j)} d_{ij}.$$
(1.5)

Eq. (1.4) is mapped into (1.5) by defining the costs  $c_{ij} = \sqrt{r_i + r_j}$ ,  $1 \le i, j \le n, i \ne j$ , and  $c_{ii} = 0, 1 \le i \le n$ . Distances are defined by  $d_{ij} = \alpha_{j-i}$ ,  $1 \le i < j \le n$ , and  $d_{ij} = 0$  otherwise. Section 3 shows that when a QAP is obtained by a sum of *k*-distance well-solvable TSPs the optimal permutation is SPTP of 1-distance TSP, which dominates all the others.

# 1.2. Minimizing the dynamic power and latency in a bus shared by an array of processors

Fig. 2a illustrates architecture of an array of identical processors sharing a common segmented bus, through which each pair of processors can communicate. Such an array is used to execute in parallel several computation tasks where tasks are communicating with each other through the bus. The goal in allocating tasks to processors is to minimize both the power consumed on the bus and the average data transfer latency. Such problems arise lately in the area of Networks on Chips (NoC) and [2] provides many references to works done in the area. The results of those works have been obtained by experiments and simulations, without analysis of problem's combinatorial properties, something done below.

# Author's personal copy

B. Emanuel et al. / Discrete Applied Mathematics 160 (2012) 525-535



**Fig. 2**. (a) An array of processors sharing a common segmented bus. Two processors only can communicate through the bus simultaneously. (b) The internal structure of a switch which segments the bus. Internal switches are controlled by signals (not shown) such that a connection between two communicating processors is established, while the unnecessary parts of the bus are disconnected.

We assume that at any time only two processors can communicate through the bus. Since transferred data is switching between 0 and 1, the dynamic power consumed is dictated by the active portion of bus. The bus is therefore automatically configured to minimize the capacitive load and avoid switching of any unnecessary portion. This is achieved by switches connecting the segment between processors  $P_i$  and  $P_j$  when they are communicating, and disconnecting the segments extending from  $P_1$  to  $P_i$  and from  $P_j$  to  $P_n$ . A switch S is shown in Fig. 2(b), comprising three internal switches:  $S_P$  connects the bus and processor P, while  $S_L$  and  $S_R$  establish leftward and rightward connections, respectively. The switches are controlled by signals to yield the desired configuration.

A computation task is associated with a probability  $p_i$ , corresponding to the relative time along the entire run-time of an application it is active on the bus. The probability  $p_i$  is known in advance, e.g. from simulations. Since only a single pair of processors can communicate at any time, it follows that  $\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} p_i p_j = 1$ . We assume that the bus capacitive load incurring between  $P_i$  and  $P_j$  is proportional to their distance, given by  $\gamma | i - j |$ , where  $\gamma$  is some factor independent of the allocation. This holds since capacitance is proportional to area of wires, which is proportional to |i - j|, the length of wire connecting  $P_i$  with  $P_j$ .

Let  $\pi$  denote the order of tasks allocated to processors. The total dynamic power consumed along the entire run time of an application is given by:

$$F(\pi) = \gamma \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} p_{\pi^{-1}(i)} p_{\pi^{-1}(j)}(j-i).$$
(1.6)

Eq. (1.6) can be represented as QAP by defining the costs  $c_{ij} = p_i p_j$ ,  $1 \le i, j \le n, i \ne j$ , and  $c_{ii} = 0, 1 \le i \le n$ . Defining distances  $d_{ij} = j - i, 1 \le i < j \le n$ , and  $d_{ij} = 0$  otherwise, expression (1.6) turns into (1.5). Task allocation problem is reminiscent of the average access time minimization problem [17], and it is an Anti-Monge–Toeplitz well-solvable QAP solved by SPTP [3].

Consider now the minimization of average data transfer latency along the bus. The travel time of data between  $P_i$  and  $P_j$  is proportional to the *RC* delay of the bus segment connecting  $P_i$  with  $P_j$ , given by  $\delta(j-i)^2$ , where  $\delta$  is some factor independent of the allocation. Notice that minimizing the average latency is equivalent to minimizing its total, given by:

$$F(\pi) = \delta \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} p_{\pi^{-1}(i)} p_{\pi^{-1}(j)} (j-i)^2.$$
(1.7)

Eq. (1.7) can be represented as QAP by defining the costs same as in (1.6), and distances by  $d_{ij} = (j - i)^2$ ,  $1 \le i < j \le n$ , and  $d_{ij} = 0$  otherwise, turning (1.7) into (1.5). This problem is also an Anti-Monge–Toeplitz well-solvable QAP solved by SPTP [3].

Notice that a ring bus may share the same results as obtained for the linear bus in Fig. 2, provided that the ring connectivity topology is implemented in such a way that the distances between pair of connected processors is more or less the same for the entire ring. In that case there is no difference between the linear and ring bus with equal distances between processors. However, if the distances between processors are arbitrary, the problem for both buses turns to be ordinary intractable QAP.

#### 1.3. Optimizing 3D VLSI physical integration

The steady progress of VLSI CMOS technology, lasting for already five decades and known as Moore's law, which doubles the transistor count on silicon area every two years, will probably come to end within a decade or so due to technology limitations. On the other hand, the ever growing demand for computational power is driving the Integrated Circuit (IC) industry to explore and develop new integration technologies. 3D IC integration is a novel technology of growing importance, offering significant performance and functional benefits [14,5]. As shown in Fig. 3 3D ICs incorporate individual, vertically stacked ordinary planar ICs, where the interconnections between ICs are implemented by the so-called *Through-Silicon Via* (TSV). While today's 3D VLSI technology is capable of stacking up to eight ICs, it is expected that within a few years this number will grow into few dozens.

3D integration extends most of the performance affecting factors like wire length, the amount of data traffic, and few more, from planar into 3D models, for which the order of vertical stacking is critical. Moreover, cumulative and bulky

528



Fig. 3. (a) 3D VLSI comprising vertically stacked ordinary planar ICs. (b) Cross section of 3D IC incorporating 3 planar ICs. Inter-IC connections are implemented by Through-Silicon Via (TSV).

factors occurring in the individual planar ICs like the number of interconnects, the amount of switching occurring during IC operation, heat dissipation, peak current and the total capacitance of IC, have a primary impact on the 3D IC stacking order [14]. Those factors affect not only the individual IC, but also the ICs above and below. The interrelations between ICs and their impact on performance, reliability and cost of the entire 3D product lead to similar expressions as those mentioned in former examples, emphasizing the importance of TSP and QAP for those technologies and design methods.

Consider for instance the Through-Silicon Vias (TSVs), being the critical resource for 3D integration due to their limited number. Their size (pitch) is far larger compared to ordinary planar 2D vias used in the individual ICs, making their number very limited [14]. Since the number of inter-IC signals may reach tens of thousands, we would like to stack the ICs such that the total required number of TSVs is minimized. Considering *n* vertically stacked ICs, let  $p_{ij}$  be the number of signals connecting IC *i* with IC *j*,  $1 \le i, j \le n$ . Since an interconnect between IC *i* with IC *j* must pass through all the ICs in between, the number of TSVs required to implement those inter-IC connections is  $p_{ij} \times |i - j|$ . The total required number of TSVs is therefore

$$F(\pi) = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} p_{\pi^{-1}(i)\pi^{-1}(j)}(j-i).$$
(1.8)

Though (1.8) is an ordinary intractable QAP problem,  $p_{ij}$  may in many cases have a special form of sum or product matrix, resulting in a well-solvable QAP.

In the above we described several VLSI problems; all represented as QAP, where their costs imply types of Monge matrix and their distances imply types of Toeplitz matrix. All problems can therefore be represented as sums of weighted TSPs where costs are taken between *k*-distances elements along the tour. It is therefore worthwhile to further explore the relation between QAP and sums of TSP problems. In the rest of the paper we will return to the types of QAP as in (1.4) and prove that those can be treated as a superposition of well-solvable TSPs, a fact that enables a simple proof showing that the combined problem is also well-solvable QAP. First, a *k*-distance,  $1 \le k \le n - 1$ , TSP special case is discussed and a closed-form optimal permutation is shown, obtained by an arbitrary evenly interleave of *k* SPTPs of independent problems with n/kelements each. We then derive a closed-form optimal permutation for the special QAP obtained by summing *k*-distance TSPs,  $1 \le k \le n - 1$ . Though every *k*-distance TSP yields different optimal permutation, their sum is always optimally solved by SPTP of 1-distance problem (ordinary TSP). We will conclude with few directions for further research.

# 2. The minimization of k-distance special TSP

The following lemma was proved in [20] and is used in the subsequent discussion to derive solutions for the TSP and QAP special cases.

**Lemma 1.** Let f(x, y) be a symmetric real function defined for  $x \ge 0$  and  $y \ge 0$ , twice differentiable, satisfying  $\partial^2 f(x, y)/\partial x \partial y < 0$ . If a, b, c and d are real nonnegative numbers satisfying  $0 \le a < b < c < d$  then

$$f(a, b) + f(c, d) < f(a, c) + f(b, d) < f(a, d) + f(b, c).$$
(2.1)

In the following we will use the term k-distance SPTP to denote the optimal permutation when the cost is measured at k-distance along the tour. In this terminology the ordinary SPTP corresponds to 1-distance SPTP. In order to derive the permutation which minimizes the TSP cost taken at k-distance, we first consider the case of 2-distance.

**Lemma 2.** Let f satisfy the conditions of Lemma 1,  $0 \le r_1 < r_2 < \cdots < r_{n-1} < r_n$  be real numbers and let n be even. An optimal permutation  $\pi^* \in \Pi$  minimizing

$$F(\pi) = \sum_{j=1}^{n} f\left(r_{\pi^{-1}(j)}, r_{\pi^{-1}(j+2)}\right)$$
(2.2)



**Fig. 4a.** The set of elements is ordered in increasing order  $r_1 < r_2 < \cdots < r_{n-1} < r_n$  and its division into two subsets, each ordered in SPTP.



Fig. 4b. An optimal permutation is obtained by any even interleave of the two SPTPs obtained in Fig. 4a.



**Fig. 5a.** The blue and red elements belong to  $B_1$  and  $B_2$ , respectively.  $A_1$  and  $A_2$  are SPTP ordered. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

satisfies the following:

- 1. It divides  $\{r_1, r_2, \ldots, r_n\}$  into two equal size subsets  $B_1 = \{r_1, r_2, \ldots, r_{n/2}\}$  and  $B_2 = \{r_{n/2+1}, r_{n/2+2}, \ldots, r_n\}$  as shown in Fig. 4a.
- 2. The elements in each of  $B_1$  and  $B_2$  are ordered by 1-distance SPTP, as shown in Fig. 4a. We denote those permutations by  $\pi_1^*$  and  $\pi_2^*$ , respectively.

**Proof.** We will prove first that the elements must be partitioned into two independent, equal sizes subsets where each must be SPTP ordered, as any other than  $(B_1, B_2)$  partition necessarily results in higher TSP cost. Since *n* is even and the cost is calculated for elements at 2-distance of each other, it follows that any tour must divide the elements into two sets  $A_1$  and  $A_2$  satisfying  $|A_1| = |A_2| = n/2$ . Moreover, if  $A_1$  is ordered by a permutation  $\pi_1$  and  $A_2$  by  $\pi_2$ , any combined permutation  $\pi$  must be a result of even interleave of  $\pi_1$  and  $\pi_2$  (Fig. 4b). Let  $A_1 = \{r_{j_1}, r_{j_2}, \ldots, r_{j_n/2}\}$  and  $A_2 = \{r_{j_n/2+1}, r_{j_n/2+2}, \ldots, r_{j_n}\}$  be the sets of n/2 elements each. Since the elements of  $A_1$  are not interacting with those of  $A_2$  it follows that  $F(\pi) = F(\pi_1) + F(\pi_2)$ . Each of  $\pi_1$  and  $\pi_2$  must therefore be 1-distance SPTP as otherwise  $F(\pi)$  was not minimal; hence 2 follows.

Proving that  $A_1 = B_1$  (and hence  $A_2 = B_2$ ) is done by showing that the interaction of elements from  $B_1$  with those of  $B_2$ in other than  $(B_1, B_2)$  partition results in higher TSP cost. So assume in contrary that  $A_1 \neq B_1$  and therefore  $A_1 \cap B_2 \neq \emptyset$ . By 2  $A_1$  and  $A_2$  are ordered in SPTP. Since the elements of  $B_2$  are larger than those of  $B_1$ , the elements of  $B_2$  are evenly centered and the elements of  $B_1$  are evenly tailed in  $A_1$  and  $A_2$ , as shown in Fig. 5a. Since the cost is calculated cyclically,  $A_1$  and  $A_2$  can



**Fig. 5b.** The new permutations (up to cyclic shift which does not change cost) as obtained after swapping the elements of  $A_1 \cap B_2$  with  $A_2 \cap B_1$ .

be viewed as being consist of two contiguous sequences, one comprising element of  $B_1$  (blue colored) and one comprising element of  $B_2$  (red colored). Moreover, there exists  $|A_1 \cap B_2| = |A_2 \cap B_1|$ . We can therefore modify  $A_1$  and  $A_2$  into  $B_1$  and  $B_2$ , respectively, by exchanging  $A_1 \cap B_2$  with  $A_2 \cap B_1$ , while preserving the n/2 size of each set. We denote by  $\pi'_1$  and  $\pi'_2$ , respectively, the new permutations thus obtained.

In the following we will show that the resulting permutations  $\pi'_1$  and  $\pi'_2$  yield lower cost than  $\pi_1$  and  $\pi_2$  do. We denote the border elements of  $B_1$  and  $B_2$  in  $\pi_1$  by a, b, c and d, and in  $\pi_2 u, v, w$  and x, as shown in Fig. 5a. There exists  $\max\{a, u\} < \min\{b, v\}$  and  $\min\{c, w\} > \max\{d, x\}$ . The order obtained after the swap (up to cyclic shift which does not change cost) is shown in Fig. 5b. Recall that the cost between 1-distance elements is taken cyclically, hence the extreme elements are also at 1-distance of each other. Comparison of the costs of  $\pi$  and  $\pi'$  yields:

$$F(\pi) - F(\pi') = [F(\pi_1) + F(\pi_2)] - [F(\pi'_1) + F(\pi'_2)]$$
  
=  $[f(a, b) + f(c, d) + f(u, v) + f(w, x)] - [f(v, b) + f(c, w) + f(a, u) + f(x, d)]$   
=  $\underbrace{[f(a, b) + f(v, u)] - [f(a, u) + f(v, b)]}_{(1)} + \underbrace{[f(c, d) + f(w, x)] - [f(c, w) + f(x, d)]}_{(2)} > 0.$  (2.3)

Inequality (2.3) follows from expression (1) which is positive due to  $\max\{a, u\} < \min\{b, v\}$  so the left hand side of inequality (2.1) applies. Expression (2) is also positive since  $min\{c, w\} > max\{d, x\}$  and Lemma 1 applies again. Inequality (2.3) means that  $\pi$  is not optimal. This is the outcome of the assumption that  $A_1 \neq B_1$ , hence equality must exist and the lemma follows.  $\Box$ 

Few comments are in order. It follows from Lemma 2 that 2n equivalent permutations minimizing  $F(\pi)$  defined in (2.2) can be constructed. Those are obtained by the n/2 relative shifts of the evenly interleaved elements of  $B_1$  with respect to those of  $B_2$ , and mirroring (reverting)  $\pi_1^*$  and  $\pi_2^*$ . The partition into  $B_1$  and  $B_2$ , and their implied permutations are called by some papers sub-tours. In case of odd n there is no separation into sub-tours and the optimal permutation is uniquely defined to yield SPTP 2-distance cyclic traversal, thus yielding (1, n - 1, 3, n - 3, ..., n - 2, 2, n). As can evidently be seen, a 2-distance TSP cyclic traversal yields SPTP indeed.

We next generalize Lemma 2 for any k-distance cost TSP.

**Theorem 1.** Let f satisfy the conditions of Lemma 1,  $0 \le r_1 < r_2 < \cdots < r_{n-1} < r_n$  be real numbers and n, m and k positive integers satisfying n = mk. An optimal permutation  $\pi^* \in \Pi$  minimizing

$$F(\pi) = \sum_{j=1}^{n} f(r_{\pi^{-1}(j)}, r_{\pi^{-1}(j+k)})$$
(2.4)

satisfies the following:

1. It divides  $\{r_1, r_2, \ldots, r_n\}$  into k subsets  $B_i = \{r_{(i-1)m+1}, r_{(i-1)m+2}, \ldots, r_{im}\}, 1 \le i \le k$ . 2. The elements of each of  $B_i$  are ordered by 1-distance SPTP. We denote their corresponding permutations by  $\pi_i^*$ .

**Proof.** Since n = mk and the cost is calculated for elements at k-distance of each other, every k contiguous elements in the tour must belong to distinct sets and k sub-tours result in. It follows therefore that any tour must divide the elements into k sub-tours comprising sets  $A_i$ ,  $1 \le i \le k$ , satisfying  $|A_i| = m$ . Let  $A_i = \{r_{j_{(i-1)m+1}}, r_{j_{(i-1)m+2}}, \dots, r_{j_{im}}\}$  be a sets of m elements in a sub-tour ordered by  $\pi_i$ . Since the elements of  $A_i s$  are not interacting with each other it follows that  $F(\pi) = \sum_{i=1}^k F(\pi_i)$ . Each  $\pi_i$  must therefore be 1-distance SPTP as otherwise  $F(\pi)$  was not minimal; hence 2 follows.

To prove that  $A_i = B_i$ ,  $1 \le i \le k$ , assume in contrary that there is  $A_l \ne B_l$  for some  $1 \le l < k$  and let l be the smallest index where such inequality holds. By its very definition  $A_l \cap \bigcup_{j=1}^{l-1} B_j = \emptyset$ , and  $A_l \cap \bigcup_{j=l+1}^{k} B_j \ne \emptyset$ . It follows from 2 that  $A_l$ is ordered such that  $\pi_l$  is a 1-distance SPTP. The elements of  $A_l \cap \bigcup_{j=l+1}^k B_j$  are therefore evenly centered in  $\pi_l$ , while those of  $A_l \cap B_l$  are evenly tailed in  $\pi_l$ . This is illustrated in Fig. 6a, where the red elements belong to  $\bigcup_{j=l+1}^{k} B_j$  and the blue ones to  $B_l$ .



**Fig. 6a.**  $A_l$  is the first set not equal to  $B_l$ . The red elements belong to  $\bigcup_{j=l+1}^k B_j$  and are centered in  $A_l$  by  $\pi_l$ , which is SPTP. The blue elements belong to  $A_l \cap B_l$  and are tailed in  $A_l$  by  $\pi_l$ . The smallest element of  $B_l - A_l$  is centered in the valley of  $A_q$ . (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)



**Fig. 6b.** The new sets  $A'_l$  and  $A'_q$ , and the corresponding permutations  $\pi'_l$  and  $\pi'_q$  obtained from  $A_l$  and  $A_q$  by exchanging sequences.

Let  $r_{(l-1)m+r} = a$ , be the largest element in  $A_l \cap B_l$  as shown in Fig. 6a. Clearly,  $r_{(l-1)m+r+1} = v \in B_l - A_l$  is the smallest among the elements of  $\bigcup_{j=l+1}^k A_j$  and it must be an end element in some  $A_q$ ,  $l + 1 \le q \le k$  (or equivalently, the central element of the valley in a cyclic viewing of SPTP), as shown in Fig. 6a. Let *b* be the succeeding element of *a* in  $A_l$  and let *u* be the preceding element of *v* in  $A_q$ . By SPTP definition *u* resides in the opposite end of  $A_q$ , as shown in Fig. 6a, (or centered in the valley adjacently to *u*, if  $A_q$  is viewed cyclically). Notice that the largest element in  $A_q$ , positioned at its center, must belong to  $\bigcup_{j=l+1}^k B_j$ , hence it is red colored. The rest elements however can be any of  $\bigcup_{j=l}^k B_j$ , hence are colored by a mix of blue and red. In this order those elements of  $\bigcup_{j=l+1}^k B_j$  are centered, while those of  $B_l$  are tailed. There exists a < u and b > v. Consider an ordered pair of elements  $(r, t), r \in A_l, t \in A_q$ . There exists a nonempty sequence of pairs  $(r_i, t_i), 1 \le i \le s < m$ , starting at  $(r_1, t_1) = (b, v)$  and terminating at  $(r_s, t_s) = (c, w)$ , such that  $r_i > t_i$ , while  $r_{s+1} < t_{s+1}$ . Let  $(r_{s+1}, t_{s+1}) = (d, x)$ , namely, x > d.

We now modify  $A_l$  and  $A_q$  into  $A'_l$  and  $A'_q$ , respectively, by exchanging the above defined *s*-length sequences  $\langle b, \ldots, c \rangle \in A_l$ and  $\langle v, \ldots, w \rangle \in A_q$ , while preserving the *m*-size of  $A'_l$  and  $A'_q$ , as shown in Fig. 6b. We denote the new permutations thus obtained by  $\pi'_l$  and  $\pi'_q$ , respectively. Comparison of the costs of  $\pi$  and  $\pi'$  yields

$$F(\pi) - F(\pi') = [F(\pi_l) + F(\pi_q)] - [F(\pi_l') + F(\pi_q')]$$
  
=  $[f(a, b) + f(c, d) + f(u, v) + f(w, x)] - [f(a, v) + f(w, d) + f(u, b) + f(c, x)]$   
=  $\underbrace{[f(a, b) + f(v, u)] - [f(a, v) + f(u, b)]}_{(1)} + \underbrace{[f(c, d) + f(w, x)] - [f(c, x) + f(w, d)]}_{(2)} > 0.$  (2.5)

Inequality (2.5) follows from expression (1) which is positive due to max{a, v} < min{b, u}, so the left hand side of inequality (2.1) applies. Expression (2) is also positive since c > w, x > d and d > c, so x > d > c > w holds and the right hand side of inequality (2.1) applies. Inequality (2.5) means that  $\pi$  is not optimal, which is the outcome of the assumption that  $A_l \neq B_l$ . Hence equality must exist and the theorem follows.  $\Box$ 

Analogous comments as in Lemma 1 follow. If  $B_i$  is ordered by a permutation  $\pi_i^*$ , any combined permutation  $\pi^*$  must be the outcome of an even interleave of all  $\pi_i^*$ ,  $1 \le i \le k$ , yielding  $2^k m^{k-1}$  equivalent permutations minimizing  $F(\pi)$ defined in (2.4). Those are obtained by the m = n/k relative shifts of  $B_i$  with respect to the rest k - 1 subsets and

mirroring (reverting) the order of elements in any of  $\pi_i^*$ ,  $1 \le i \le k$ . In case that k is not a divisor of n there are no subtours and the k-distance TSP traversal results in a single cycle consuming all the n elements. The order of the elements must be such that SPTP results in. Consider for example the case n = 16 and k = 5. The arrangement yielding SPTP is  $\langle 1, 6, 12, 15, 9, 3, 4, 10, 16, 11, 5, 2, 8, 14, 13, 7 \rangle$ .

#### 3. The QAP as a sum of *j*-distance TSPs

In this section we consider a weighted sum of *j*-distance TSPs. Let  $\alpha_j \ge 0$ ,  $1 \le j \le k < n$ , be nonnegative real numbers and consider the cost

$$F(\pi) = \sum_{j=1}^{\kappa} \alpha_j \sum_{i=1}^{n} f\left(r_{\pi^{-1}(i)}, r_{\pi^{-1}(i+j)}\right),$$
(3.1)

which generalizes (1.4). The expression in (3.1) is obtained by QAP comprising a  $n \times n$  cost matrix  $\mathbf{C} = (c_{ij})$  defined by f, and a  $n \times n$  distance matrix  $\mathbf{D} = (d_{ij})$  defined by the weights  $\alpha_j$ . In this setting  $\mathbf{D}$  is a Toeplitz matrix satisfying  $d_{ij} = \alpha_{j-i}$  if  $i < j \le i + k$ ,  $d_{ij} = \alpha_{n-i+j}$  if  $j < i \le j + k$  and  $d_{ii} = 0$ . We will show subsequently that (3.1) implies a well-solvable QAP.

Theorem 1 showed that when the cost matrix **C** is derived from a function f(x, y) satisfying the conditions of Lemma 1, a *k*-distance SPTP solves the *k*-distance TSP. It is not intuitively obvious what permutation solves the special QAP given by a weighted sum of *j*-distance TSPs,  $1 \le j \le k$ . In the following we will show that the 1-distance TSP is dominating all the other *j*-distance TSPs, and the optimal solution for their sum is always that of 1-distance SPTP. As a first step we consider the special QAP obtained by summing *j*-distance TSPs for  $\alpha_j = 1$ ,  $1 \le j \le k$ , where the indices are considered cyclically

$$F(\pi) = \sum_{j=1}^{k} \sum_{i=1}^{n} f\left(r_{\pi^{-1}(i)}, r_{\pi^{-1}(i+j)}\right).$$
(3.2)

We will then discuss the case of  $\alpha_j \ge 0$ ,  $1 \le j < n$ . Though the case  $\alpha_j = 1$  can be derived as a special case of [3,7], its proof of being well-solvable by 1-distance SPTP is far simpler and more intuitive.

**Theorem 2.** Let f satisfy the conditions of Lemma 1. The special QAP given in (3.2) defined by summing j-distance TSP costs,  $1 \le j \le k$ , is well-solvable and its solution is 1-distance SPTP, the solution of an ordinary well-solvable TSP.

**Proof.** Let *k* be a fixed number. The proof follows by induction on *n*. For the basis we use n = k. Since the maximal distance between two elements in a problem having *k* elements is k - 1, the cost consists of the function evaluated for all possible pairs of elements. Since all the off-diagonal elements of **D** are equal to 1, all the permutations yield the same cost, and 1-distance SPTP in particular is optimal.

Assume by induction that 1-distance SPTP is minimizing the cost of *m*-size QAP,  $k \le m$ , given as a sum of *j*-distance TSPs,  $1 \le j \le k$ , and consider a problem of m + 1 elements. Assume w.l.o.g that  $r_1 < r_2 < \cdots < r_{m+1}$ . Let  $\pi$  be a permutation of  $\{1, \ldots, m+1\}$  resulting in the sequence  $\langle j_1, \ldots, j_{m+1} \rangle$ . The cost of  $\pi$  is given by:

$$F(\pi) = \sum_{j=1}^{k} \sum_{i=1}^{m+1} f\left(r_{\pi^{-1}(i)}, r_{\pi^{-1}(i+j)}\right).$$
(3.3)

Let  $\pi^*$  be 1-distance SPTP. We will show subsequently that  $\pi^*$  is a lower bound of (3.3). Let  $l = \pi(1)$ . The proof follows by decomposing (3.3) into two terms  $F(\pi) = G(\pi) + H(\pi)$  and then showing that  $\pi^*$  is a lower bound of both G and H.

The term  $G(\pi)$  is obtained by excluding  $r_1(\stackrel{\Delta}{=} r_{\pi^{-1}(l)} \stackrel{\Delta}{=} r_{j_l})$ , so  $\pi$  is restricted to  $\{2, \ldots, m+1\}$ . The term  $H(\pi)$  compensates  $G(\pi)$  to satisfy (3.3) equality. Since we consider distances from 1 to  $k, H(\pi)$  exactly involves the 2k + 1 elements  $r_{j_{l-k}}, r_{j_{l-k+1}}, \ldots, r_{j_{l-1}}, r_1, r_{j_{l+1}}, \ldots, r_{j_{l+k-1}}, r_{j_{l+k-1}}$ . We similarly decompose the cost of  $\pi^*$  into  $F(\pi^*) = G(\pi^*) + H(\pi^*)$ . The smallest element  $r_1$  is centered in the valley of  $\pi^*$ ; hence its exclusion leaves  $\pi^*$  being SPTP. Since the *m* elements  $\{r_2, \ldots, r_{m+1}\}$  are involved in both  $G(\pi)$  and  $G(\pi^*)$ , it follows from the induction hypothesis that  $G(\pi^*) \leq G(\pi)$ .

It remains to show that  $H(\pi^*) \leq H(\pi)$ . The term  $H(\pi)$  is defined in (3.4) below and consists of two expressions. The first one is the result of  $r_1$  and its k left and k right adjacent elements, accounting for the costs in  $F(\pi)$  which have been disappeared from  $G(\pi)$  due to the exclusion of  $r_1$ . The second expression is the result of the new costs appearing in  $G(\pi)$  which has not been existed originally in  $F(\pi)$ . The new costs involve all the pairs of elements enclosing  $r_1$  positioned originally by  $\pi$  at (k + 1)-distance, but turned to be at k-distance in  $G(\pi)$  due to  $r_1$  exclusion.

$$H(\pi) = \underbrace{\sum_{i=1}^{k} \left[ f(r_{\pi^{-1}(l-i)}, r_1) + f(r_1, r_{\pi^{-1}(l+i)}) \right]}_{(1)} - \underbrace{\sum_{i=k}^{1} f\left( r_{\pi^{-1}(l-i)}, r_{\pi^{-1}(l-i+k+1)} \right)}_{(2)}.$$
(3.4)

Since  $H(\pi)$  involves  $r_1$  and additional 2k unknown elements, we may consider  $H(\pi)$  as a function  $H(r_{\pi^{-1}(l-k)}, \ldots, r_{\pi^{-1}(l-1)}, r_1, r_{\pi^{-1}(l+1)}, \ldots, r_{\pi^{-1}(l+k)})$  of 2k variables and seek for those among  $\{r_2, \ldots, r_m, r_{m+1}\}$  minimizing H. Notice that

the selection of those 2k elements does not affect  $G(\pi)$  since its evaluation involves all the *m* elements  $\{r_2, \ldots, r_{m+1}\}$  and its minimization is therefore only a matter of  $\pi$ . We claim that the 2k elements must be selected to be the smallest ones of  $\{r_2, \ldots, r_{m+1}\}$ , which are  $\{r_2, \ldots, r_{2k+1}\}$ . This follows from *H* being monotonic increasing in each of its variables as shown in the following. Take any element  $r_{\pi^{-1}(l+r)}, -k \leq r \leq k, r \neq 0$ , and let it get any two values  $r'_{\pi^{-1}(l+r)} < r''_{\pi^{-1}(l+r)}$ . Let  $-k \leq q \leq k, q \neq 0$ , be such that  $r_{\pi^{-1}(l+q)}$  is the element positioned by  $\pi$  at distance k + 1 from  $r_{\pi^{-1}(l+r)}$ , namely |r - q| = k + 1. There exists

$$H\left(r_{\pi^{-1}(l-k)},\ldots,r_{\pi^{-1}(l+r)}'',\ldots,r_{\pi^{-1}(l+k)}\right) - H\left(r_{\pi^{-1}(l-k)},\ldots,r_{\pi^{-1}(l+r)}',\ldots,r_{\pi^{-1}(l+k)}\right)$$

$$= \left[f\left(r_{\pi^{-1}(l+r)}',r_{1}\right) - f\left(r_{\pi^{-1}(l+r)}',r_{1}\right)\right] - \left[f\left(r_{\pi^{-1}(l+r)}',r_{\pi^{-1}(l+q)}\right) - f\left(r_{\pi^{-1}(l+r)}',r_{\pi^{-1}(l+q)}\right)\right]$$

$$= \left[f\left(r_{\pi^{-1}(l+r)}',r_{\pi^{-1}(l+q)}\right) + f\left(r_{\pi^{-1}(l+r)}',r_{1}\right)\right] - \left[f\left(r_{\pi^{-1}(l+r)}',r_{\pi^{-1}(l+q)}\right) + f\left(r_{\pi^{-1}(l+r)}',r_{1}\right)\right] > 0. \tag{3.5}$$

This inequality follows since  $r_1$  is the smallest, thus one of the following possibilities must hold:  $r_1 < r'_{\pi^{-1}(l+r)} < r''_{\pi^{-1}(l+r)} < r'_{\pi^{-1}(l+r)} < r'_{\pi^{-1}(l+r)}$ , or  $r_1 < r_{\pi^{-1}(l+q)} < r''_{\pi^{-1}(l+r)} < r''_{\pi^{-1}(l+r)}$ . For any of those Lemma 1 ensures that expression (3.5) is positive, hence H is monotonic increasing in any of its variables. Consequently, the set of 2k elements minimizing H must indeed be  $\{r_2, \ldots, r_{2k+1}\}$ , the smallest among  $\{r_2, \ldots, r_{m+1}\}$ . These indeed are the elements dictated by applying  $\pi^*$  to F and the exclusion of  $r_1$ .

We show now that among all the orders of  $\{r_2, \ldots, r_{2k+1}\}$ , the sequence  $\langle r_{2k+1}, \ldots, r_3, r_2, r_4, \ldots, r_{2k} \rangle$  is minimizing H, which indeed conforms with  $\pi^*$  applied to F and the exclusion of  $r_1$ . Notice that in term (1) of (3.4) the element  $r_1$  interacts with all the others, while they do not interact with each other, and therefore the term is independent of their order. Minimizing requires therefore maximizing term (2) of (3.4). Notice that the function f in expression (2) is evaluated for elements mapped by  $\pi$  so they are at (k + 1)-distance of each other. It necessitates that  $r_2$ , the smallest element among all, will be evaluated together with  $r_{2k+1}$ , the largest element among all. If this was not the case, let  $r_2$  be evaluated with r' and  $r_{2k+1}$  be evaluated with r''. Since  $r_{2k+1} > \max\{r', r''\} > \min\{r', r''\} > r_2$  Lemma 1 ensures that  $f(r_2, r_{2k+1}) + f(r', r'') > f(r_2, r') + f(r_{2k+1}, r'')$ , so by swapping  $r_{2k+1}$  with r', term (2) could be made larger. Similarly, it is necessary that  $r_3$  be at (k + 1)-distance apart of  $r_{2k}$  so they are evaluated together by f. By induction,  $r_{2+j}$  must be at (k + 1)-distance apart of  $r_{2k+1-j}$  for all  $0 \le j \le k - 1$ . Though there are k! possible mappings of  $\{r_2, \ldots, r_{2k+1}\}$  to  $\langle r_{\pi^{-1}(l-k)}, \ldots, r_{\pi^{-1}(l+1)}, r_{\pi^{-1}(l+k)} \rangle$  satisfying the above (k + 1)-distance pairing requirements, the mapping  $\langle r_{2k+1}, \ldots, r_3, r_2, r_4, \ldots, r_{2k} \rangle$  is the only conforming with  $\pi^*$ , which have already been shown to minimize  $G(\pi)$ . To conclude,  $\pi^*$  maximizes the expression (2) in (3.3), hence minimizing  $H(\pi)$ , which completes the proof that  $F(\pi) = G(\pi) + H(\pi) \ge G(\pi^*) + H(\pi^*) = F(\pi^*)$ .  $\Box$ 

Consider now the case of  $\alpha_j \ge 0$ ,  $1 \le j \le k$ . Unlike [3] which required  $\alpha_j$  adhering a benevolent function defined on  $\{1, 2, ..., n\}$ , no such requirement is imposed in our case. On the other hand, the case of monotonic decreasing  $\alpha_j$  in (1.4) conforms to [7] which required **D** being bimonotone to yield well-solvable QAP. We can therefore conclude that the generalization of  $\alpha_j \ge 0$  leaves the 1-distance SPTP optimal.

#### 4. Conclusions and further research

In this paper we showed how several VLSI applications are mapped into well-solvable QAPs possessing SPTP optimal solution, and studied the relations between well-solvable TSPs and QAPs. As 3D VLSI and Network on Chip (NoC) are gaining momentum, the problems of 3D vertical stacking of planar ICs, and mapping computation tasks into the processors of NoC are becoming important.

For NoC we considered an array topology of processors. A mesh topology may be more suitable for IC implementation and it has been lately studied in [2]. The optimal mappings of tasks to processors have been obtained by experiments and simulations, without analysis of their combinatorial structure and properties. The authors are not aware of any metric for measuring distance between processors positioned in a mesh network, such that the resulting distance matrix yields well-solvable QAP. Finding such metric and setting the conditions under which a planar SPTP yields minimum cost of tasks allocation to processors seems to be a challenge. Also, the cost matrix used in this paper assumed independence of processors' activity probabilities. Models incorporating dependences on one hand, but still preserving well-solvability are also of interest.

The vertical ordering of planar ICs in 3D VLSI implied by various design factors is a critical problem not being studied yet. Consideration of electrical, reliability and manufacturability factors may yield other types of objective functions and other distance metrics that will motivate further exploration of well-solvable QAP.

### Acknowledgments

The authors are grateful for the useful comments made by the anonymous referees, which helped in correcting and improving the manuscript.

#### References

- [1] H.B. Bakoglu, Circuits, Interconnections and Packaging for VLSI, Addison-Wesley, 1990.
- [2] R. Beraha, I. Walter, I. Cidon, I. Kolodny, Leveraging application-level requirements in the design of a NoC for a 4G SoC a case study, Conference of Design Automation and Test in Europe (DATE), 2010.
- [3] R.E. Burkard, E. Cela, G. Rote, G.J. Woeginger, The quadratic assignment problem with monotone anti-Monge and symmetric Toeplitz matrix: easy and hard cases, Mathematical Programming 82 (1998) 125–158.
- [4] R.E. Burkard, V.G. Deineko, R. Van Dal, J.A.A. Van Der Veen, G.J. Woeginger, Well-solvable special cases of the traveling salesman problem: a survey, SIAM Review 40 (3) (1998) 496–546.
- [5] J.A. Burns, et al., BA wafer-scale 3-D circuit integration technology, IEEE Transaction on Electron Devices 53 (2006) 2507-2515.
- [6] V. Deineko, B. Klinz, G.J. Woeginger, Four point conditions and exponential neighborhoods for symmetric TSP, SODA'06, January 22–26, Miami, FL.
- [7] V.M. Demidenko, G. Finke, V.S. Gordon, Well solvable cases of the quadratic assignment problem with monotone and bimonotone matrices, Journal of Mathematical Modeling and Algorithms (2006) 167–197.
   [8] P. Gritzmann, M. Ritter, P. Zuber, Optimal wire ordering and spacing in low power semiconductor design, Mathematical Programming 121 (2) (2010)
- [8] P. Gritzmann, M. Ritter, P. Zuber, Optimal wire ordering and spacing in low power semiconductor design, Mathematical Programming 121 (2) (2010) 201–220.
- [9] E.L. Lawler, J.K. Lenstra, Rinnooy Kan, D.B. Shmoys, The Traveling Salesman Problem, Wiley, Chichester, 1985.
- [10] E. Macii, M. Poncino, S. Salerno, Combining wire swapping and spacing for low-power deep-submicron buses, in: Proceedings of the 13th ACM Great Lakes Symposium on VLSI, 2003, pp. 198–202.
- [11] N. Magen, A. Kolodny, U. Weiser, N. Shamir, Interconnect-power dissipation in a microprocessor, in: Proceedings of the 2004 International Workshop on System Level Interconnect Prediction (SLIP), ACM Press, 2004, 7–13.
- [12] K. Moiseev, A. Kolodny, S. Wimer, Timing-aware power-optimal ordering of signals, ACM Transactions on Design Automation of Electronic Systems 13 (4) (2008) Article No. 65.
- [13] K. Moiseev, S. Wimer, A. Kolodny, On optimal ordering of signals in parallel wire bundles, Integration, the VLSI Journal 41 (2) (2008) 253–268.
- [14] V.F. Pavlidis, E.G. Friedman, Interconnect-based design methodologies for three-dimensional integrated circuits, Proceedings of the IEEE 97 (1) (2009) 123-140.
- [15] K. Sundaresan, N.R. Mahapatra, Accurate energy dissipation and thermal modeling for nanometer-scale buses, in: Proceedings of the 11th Int'l Symposium on High-Performance Computer Architecture, 2005.
- [16] F. Supnick, Extreme Hamiltonian lines, Annals of Mathematics 66 (1957) 179-201.
- [17] B.B. Timofeev, A.T. Litvinov, On the extremal value of quadratic form, Kibernetika 4 (1969) 56–61. (in Russian).
- [18] A. Vittal, L.H. Chen, M. Marek-Sadowska, K.P. Wang, S. Yang, Crosstalk in VLSI interconnections, IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems 18 (12) (1999) 1817–1824.
- [19] S. Wimer, S. Michaely, K. Moiseev, A. Kolodny, Optimal bus sizing in migration of processor design, IEEE Transactions on Circuits and Systems-I 53 (5) (2006) 1089–1100.
- [20] S. Wimer, K. Moiseev, A. Kolodny, On VLSI interconnect optimization and linear ordering problem, Optimization and Engineering (2010) doi:10.1007/s11081-010-9128-9.
- [21] G.J. Woeginger, Computational problems without computation, Nieuwarchief 5 (4) (2003) 140-147.
- [22] P. Zuber, O. Bahlous, T. Ilnseher, M. Ritter, W. Stechele, Wire topology optimization for low power CMOS, IEEE Transactions on VLSI Systems 17 (1) (2009) 1–11.