\documentclass{mc}
%%%%% journal  info -- DO NOT CHANGE %%%%%%%%%
\setcounter{page}{497}
\renewcommand\thisnumber{x}
\renewcommand\thisyear {2014}
\renewcommand\thismonth{xxx}
\renewcommand\thisvolume{19}
\renewcommand\datereceived{October 16, 2013}
\renewcommand\dateaccepted{February 11, 2014}
%%%%%%%%   end journal info   %%%%%%%%%%%%%
\usepackage[displaymath,mathlines,pagewise]{lineno}
%Rad zaprimljen dana: 16.10.2013.

%Rad prihvacen dana: 30.09.2014

\usepackage{subfigure}


%%%%% author macros %%%%%%%%%
% place your own macros HERE
%%%%% end %%%%%%%%%

%%%%%%%%%    already loaded packages -- DO NOT RELOAD  %%%%%
% \usepackage{amssymb}
% \usepackage{amsmath,amsthm}
% \usepackage{latexsym}
% \usepackage{amsfonts}
% \usepackage{amsbsyt}
% \usepackage{graphicx}
%%%%%%    end   %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\usepackage{color}
%\usepackage[pdftex]{graphics}
%\usepackage{subfig}
%\usepackage[geometry]{ifsym}

% ALGORITHMS -----------------------------------------------------
%\usepackage[ruled,lined,linesnumbered]{algorithm2e}
%\definecolor{svjetlosiva}{rgb}{0.95,0.95,0.95}
%\SetAlTitleFnt{textsc}
%\SetAlCapFnt{\sc}

%\addtolength{\textwidth}{2cm}   % <--- update
%\addtolength{\hoffset}{-1cm} % <--- update
%\addtolength{\voffset}{-1.1cm}    % <--- update
%\addtolength{\textheight}{2.2cm}  % <--- update

%\newtheorem{proposition}{Proposition}
%\newtheorem{remark}{Remark}

\begin{document}
%\begin{linenumbers}
\markboth{N.\,Bosner}{Balancing three matrices in control theory}
\title{Balancing three matrices in control theory\thanks{This work is supported by the Croatian Ministry  of Science, Education and Sports grant 0372783--2750.}}
\author[N.\,Bosner]{Nela Bosner\affil{1}\comma\corrauth}
\address{\affilnum{1}\ Department of Mathematics, University of Zagreb, Bijeni\v{c}ka cesta 30, HR-10 000 Zagreb, Croatia}
\email{{\tt nela@math.hr} (N.\,Bosner)}

\begin{abstract}
Several problems from control theory are presented which are sensitive to badly scaled matrices. We were specially concerned with the algorithms involving three matrices, thus we extended the Ward's balancing algorithm for two matrices. Numerical experiments confirmed that balancing three matrices can produce an accurate frequency response matrix for descriptor linear systems, it can also improve the solution of the pole assignment problem via state feedback and the determination of the controllable part of the system.
\end{abstract}

\keywords{balancing three matrices, diagonal transformations, numerical stability, frequency response matrix, pole assignment problem, controllable part of the system}

\ams{65F35, 65Y04}

\maketitle

\section{Introduction}
When applied to matrices with a wide range in the magnitude of elements a numerically stable algorithm can nevertheless produce a result with a large error.
The standard attempt to reduce the magnitude range of elements of a matrix $A$ is to scale its rows and columns by multiplication with positive definite diagonal matrices. Such a technique is used prior to solving a linear system $Ax=b$ in \cite{curtisreid}. As emphasized in the introduction of \cite{linpack}, in the absence of any other information, a satisfactory scaling is one in which the absolute errors in the elements are all about the same size. This choice of a scaling strategy makes the condition number meaningful. In case when only rounding errors are introduced, the error in an element is proportional to its size and the scaling forces all elements of $A$ to be about the same size.
This process is called balancing. Furthermore, the balanced system can produce a more accurate result. The same problem is observed when solving the standard eigenvalue problem $Ax=\lambda x$, where inaccuracies in eigenvalues and eigenvectors are reduced by a diagonal similarity transformation introduced by Parlett and Reinsch in \cite{parlettreinsch}. This similarity transformation equilibrates the column and row norms, and reduces the norm of $A$. In the generalized eigenvalue problem $Ax=\lambda Bx$, balancing is enforced on both matrices $A$ and $B$, as proposed by Ward in \cite{ward}, and by Lemonnier and Van Dooren in \cite{lemoniervandooren}. It is important to emphasize here that in most cases balancing will  improve condition numbers, and the balanced problems will produce more accurate results than the original ones. Nevertheless, there are examples where balancing does not improve the condition of the problem, or even worse, it can produce more ill-conditioned problems as illustrated by Watkins \cite{watkins}.

On the other hand, some algorithms are almost invariant under scaling in respect of numerical stability. Such algorithms are, for example, Jacobi algorithms for solving symmetric eigenvalue and singular value problems, see \cite{demmelveselic}. In the case of a positive definite matrix $A$, the condition number for eigenvalues is bounded by $n\cdot \min_{D \text{ is diagonal}}\kappa (DAD)$, and the condition number for singular values is bounded by $\sqrt{n}\cdot \min_{D \text{ is diagonal}}\kappa (AD)$ (see van der Sluis \cite{vandersluis}), showing that the forward errors are invariant under appropriate scalings.

In this paper, we propose efficient algorithms for balancing three matrices and we study how the scaling issues affect computational tasks in computational control. Specially, we are interested in numerical problems related to descriptor systems, which involve three matrices of different dimensions. A descriptor system has the following form
\begin{align}
E\dot{x}(t)&=Ax(t)+Bu(t)\label{descrsyst}\\
y(t)&=Cx(t)+Du(t),\nonumber
\end{align}
where $A,E\in \mathbb{R}^{n\times n}$, $B\in \mathbb{R}^{n\times m}$, $C\in \mathbb{R}^{p\times n}$, $D\in \mathbb{R}^{p\times m}$, and $m\le n$ (usually, it is $m\ll n$). $E$ is often singular. As pointed out by Paige in \cite{paigenumalgcompcontr}, scaling represents a coordinate transformation. When $\tilde{x}=D_{2}^{-1}x$, $\tilde{u}=D_{3}^{-1}u$, and $\tilde{y}=D_{4}y$, for positive definite diagonal matrices $D_{1}$, $D_{2}$, $D_{3}$ and $D_{4}$, then the system
\begin{align*}
D_{1}ED_{2}\dot{\tilde{x}}(t)&=D_{1}AD_{2}\tilde{x}(t)+D_{1}BD_{3}\tilde{u}(t)\\
\tilde{y}(t)&=D_{4}CD_{2}\tilde{x}(t)+D_{4}DD_{3}u(t),
\end{align*}
is equivalent to (\ref{descrsyst}) in the sense of restricted state-space equivalence (i.e., they have the same transfer function). Paige distinguishes two reasons for scaling in control theory. The first one is to choose coordinate transformations and units (this corresponds to diagonal scalings) so that the mathematical problem accurately reflects the sensitivity of the physical problem. The second reason is to minimize the effect of rounding errors on the computed solution, and it is less important. Scaling for numerical stability must not alter physical sensitivity. As an example of scaling in control theory, Paige refers to measuring how far a system with $E=I$, where $I$ is the identity matrix, is from an uncontrollable one. His proposed measure is pessimistic if bounds on model uncertainties are dominated by the uncertainties in just a few elements. In this case, a good scaling is similar to the one for the generalized eigenvalue problem: scale so that the uncertainties in elements of $A$ and $B$ are all of the same order of magnitude. In order to determine controllability of system (\ref{descrsyst}) with a general matrix $E$, the scaling has to include three matrices $A$, $B$ and $E$.

In \cite{ttmh} and \cite{blockmhess}, an efficient algorithm for computing the frequency response matrix
$\mathcal{G}(\sigma )=C(\sigma E-A)^{-1}B+D$
of system (\ref{descrsyst}) is proposed for a large number of shifts $\sigma$. The first part of this algorithm comprises the $m$-Hessenberg--triangular--triangular reduction of matrices $A$, $B$ and $E$ performed by a sequence of orthogonal transformations, and an efficient version of this algorithm is introduced in \cite{ttmh}.
A similar reduction is used for solving the pole assignment problem in descriptor linear systems via state feedback. This problem is very sensitive to input data.

The algorithms of our interest, which are related to the descriptor systems, are based on orthogonal transformations.
Even orthogonal transformations applied to badly scaled matrices can result in adding two numbers with a large difference in the order of magnitude. In floating point arithmetic the influence of the number with a small order of magnitude can be completely lost in the sum. Although the orthogonal transformation always produces a result with a small norm-wise relative error, in case of badly scaled matrices they can severely increase the element-wise error (for QR factorization see \cite[Section 19.4]{higham}), or they can transform a nonsingular matrix into a singular one.
For example, in \cite{powellreid}, Powell and Reid  observed that for the least squares problems, in which the rows of the coefficient matrix vary widely in norm, Householder QR factorization has unsatisfactory backward stability properties. They showed that row and column pivoting give a desirable backward error, and in \cite{coxhigham}, Cox and Higham  proved that sorting the rows by a descending $\infty$-norm at the start gives the same result.

In order to avoid this sort of numerical instability in problems involving three matrices, it is advisable to balance all three matrices. Balancing is performed by diagonal transformations which reduce the difference in orders of magnitude of all elements in these matrices. We will illustrate the need for balancing with an example.
In \cite{paigenumalgcompcontr} (see also \cite{ttmh}), an algorithm that reveals controllability of system (\ref{descrsyst}) with nonsingular $E$ is proposed. The original system is transformed to an equivalent system with a suitable form, and for $m=1$ this form is equivalent to the $m$-Hessenberg--triangular--triangular form.
The algorithm for the $m$-Hessenberg--triangular--triangular reduction first reduces $E$ to the upper triangular form. Then it annihilates one by one element below the diagonal of $B$ with the Givens rotations applied from the left, while simultaneously maintaining the triangular form of $E$ by applying the Givens rotations from the right. When $B$ is done, the algorithm switches to $A$, annihilating elements below the $m$-th subdiagonal. Suppose that $E$ and $B$ are already upper triangular. Let the exact matrices $A$, $B$ and $E$  be defined as follows
$$A=\left[ \begin{array}{ccc}
    \frac{5}{4}  & -\frac{1}{2}  &  \phantom{-}2\\[0.5ex]
    1  &  \phantom{-}\frac{3}{4}  & -\frac{1}{3}\\[0.5ex]
    1  & -\frac{1}{4}  &  \phantom{-}\frac{1}{30}
    \end{array}\right],\qquad B=\left[ \begin{array}{l} \frac{3}{2}\\[0.5ex] 0\\[0.5ex] 0 \end{array}\right],\qquad E=\left[ \begin{array}{ccc} 1&1&1\\[0.5ex] 0&1&1\\[0.5ex] 0&0&\epsilon \end{array}\right],$$
where $\epsilon$ is a small number. This is a typical situation in practice, where the input data represent measured physical quantities. These quantities may be represented in different units, hence the element in matrices may have a wide range in magnitude. Nevertheless, $E$ is a nonsingular matrix. Now, we want to annihilate the element on position $(3,1)$ of $A$ by the Givens rotation $G_{l}$. We obtain
$$G_{l}=\left[ \begin{array}{ccc} 1&0&0\\[0.5ex] 0&\frac{1}{\sqrt{2}}&\frac{1}{\sqrt{2}}\\[0.9ex] 0&-\frac{1}{\sqrt{2}}&\frac{1}{\sqrt{2}} \end{array}\right],\quad A_{1}=G_{l}A=\left[ \begin{array}{lll} \frac{5}{4}  & -\frac{1}{2}  &  \phantom{-}2 \\[0.5ex] \sqrt{2} &\phantom{-}\frac{1}{2\sqrt{2}}  & -\frac{9}{30\sqrt{2}} \\[0.9ex] 0 &-\frac{1}{\sqrt{2}}  &  \phantom{-}\frac{11}{30\sqrt{2}} \end{array}\right],$$
$$E_{1}=G_{l}E=\left[ \begin{array}{ccc} 1&1&1\\[0.5ex] 0&\frac{1}{\sqrt{2}}&\frac{1+\epsilon}{\sqrt{2}}\\[0.9ex] 0&-\frac{1}{\sqrt{2}}&\frac{-1+\epsilon}{\sqrt{2}} \end{array}\right] .$$
When the above computation is performed in finite precision arithmetic with the roundoff error $u$ and when $\epsilon \le u/2$, then the computed matrix $\hat{E}_{1}$ and its elements are of the form
$${\setlength\arraycolsep{2pt} \begin{array}{rcl}\text{fl}\left( \frac{1}{\sqrt{2}}+\frac{\epsilon}{\sqrt{2}}\right) & =&\text{fl}\left(\frac{1}{\sqrt{2}}\right),\\[2ex] \text{fl}\left( -\frac{1}{\sqrt{2}}+\frac{\epsilon}{\sqrt{2}}\right) &=&-\text{fl}\left(\frac{1}{\sqrt{2}}\right),\end{array}}\quad \hat{E}_{1}=\left[ \begin{array}{rrr} 1&1\hspace*{1em}&1\hspace*{1em}\\[2pt] 0&\text{fl}\left(\frac{1}{\sqrt{2}}\right)&\text{fl}\left(\frac{1}{\sqrt{2}}\right)\\[4pt] 0&-\text{fl}\left(\frac{1}{\sqrt{2}}\right)&-\text{fl}\left(\frac{1}{\sqrt{2}}\right) \end{array}\right] ,$$
and $\hat{E}_{1}$ is singular.
In this example, matrix $E$ is badly scaled which caused adding one large and one small number, thus loosing nonsingularity of the orthogonally transformed matrix $E$. Actually, the condition number of $E_{1}$ increased to infinity. If our task was to solve the pole assignment problem, this result would reduce the number of finite poles that can be assigned.
On the other hand, if the balancing algorithm presented in the next section is applied to the matrices $A$, $B$, and $E$, it produces matrices $A_{bal}=D_{l}AD_{r}$, $B_{bal}=D_{l}B$, and $E_{bal}=D_{l}ED_{r}$, with $D_{l}=\text{diag}(1,1,10^{4})$ and $D_{r}=\text{diag}(0.1,1,100)$. Elements which determine the Givens rotation $G_{bal,1}$ that annihilates $A_{bal}(3,1)$ are $(0.1,1000)$, and the updated matrix $\hat{E}_{bal,1}=\text{fl}(G_{bal,1}E_{bal})$ has the form
$$\hat{E}_{bal,1}=\left[ \begin{array}{lll}
{\scriptstyle 1.000000000000000\cdot 10^{-1}}  &  {\scriptstyle \phantom{-}1.000000000000000\cdot 10^{0}}  &  {\scriptstyle \phantom{-}1.000000000000000\cdot 10^{2}}\\
                         0      &  {\scriptstyle \phantom{-}9.999999950000001\cdot 10^{-5}} &  {\scriptstyle \phantom{-}1.000000000551115\cdot 10^{-2}}\\
                         0      & {\scriptstyle -9.999999950000000\cdot 10^{-1}} & {\scriptstyle -9.999999950000000\cdot 10^{1}}
\end{array}\right] .$$
$\hat{E}_{bal,1}$ is now both exactly and numerically nonsingular.

It is important to emphasize here that balancing is always applied to the original input data before any other transformation. Suppose that the numerically singular matrix $E$ presented in the previous example is obtained from a singular matrix $E_{0}$ by some non exact numerical algorithm. Then the reasoning of the rest of the example does not hold for this situation, because balancing is not going to be applied to $E$. The element $E_{3,3}$ is equal to $\varepsilon$ instead of $0$, and represents the error introduced by floating point arithmetic. Balancing in this case would only increase the error by the factor of $10^{6}$. On the other hand, balancing performed on $E_{0}$ would result with a singular matrix $E_{bal,0}$, and the same numerical algorithm applied to $E_{bal,0}$ would again produce a numerically singular matrix.

Besides control theory, there are other examples involving three matrices of the same dimensions. Such an example is the quadratic eigenvalue problem $\lambda^{2}Ax+\lambda Ex+Bx=0$, for $A,B,E\in \mathbb{R}^{n\times n}$. A similar example comes from the structural dynamics engineering problem, where direct frequency analysis leads to the solution of the algebraic linear system $(\sigma^{2}A+\sigma B+C)x=b$ for several values of the frequency-related parameter $\sigma$ (see, for example, \cite{simoncini} and \cite{simonciniperotti}).

\subsection{Balancing two matrices}
There are two different approaches to balancing two $n\times n$ matrices $A=[a_{ij}]$ and $B=[b_{ij}]$. In \cite{lemoniervandooren} , Lemonnier and Van Dooren developed a balancing algorithm suitable for general eigenvalue computing. The main idea is to reduce the relative condition number of the computed eigenvalues by diagonal scaling $D_{l}AD_{r}$ and $D_{l}BD_{r}$. They showed that this is achieved when
$\| D_{l}AD_{r}e_{j}\|_{2}^{2}+\| D_{l}BD_{r}e_{j}\|_{2}^{2}=\| e_{i}^{T}D_{l}AD_{r}\|_{2}^{2}+\| e_{i}^{T}D_{l}BD_{r}\|_{2}^{2}=1$ for all $i$,$j$, and their method is based on a coordinate descent method, which can have slow convergence.
Ward \cite{ward} proposes a different balancing technique which applies to two matrices involved in the generalized eigenvalue problem, but this technique is applicable to other problems with two matrices as well. This technique produces two diagonal matrices $D_{l}$ and $D_{r}$ such that the range of elements in $D_{l}AD_{r}$ and $D_{l}BD_{r}$ is favorably small. The basic idea is forcing the exponents in exponential notation of all nonzero elements in $D_{l}AD_{r}$ and $D_{l}BD_{r}$ to be as close to zero as possible. The diagonal matrices are defined as $D_{l}=\text{diag}(10^{l_{1}},\ldots,10^{l_{n}})$ and $D_{r}=\text{diag}(10^{r_{1}},\ldots ,$ $10^{r_{n}})\in \mathbb{R}^{n\times n}$.
Besides 10, another radix can be used for exponential representation. For example, multiplication with powers of 2 introduces no rounding errors. The elements of the scaled matrices are of the form $(D_{l}AD_{r})_{ij}=10^{l_{i}+r_{j}}a_{ij}$ and $(D_{l}BD_{r})_{ij}=10^{l_{i}+r_{j}}b_{ij}$. The magnitude of an element is represented as the logarithm of its absolute value. In order to reduce the range of magnitude for elements,
the objective
$$\min_{l_{i},r_{j}}\left[ \sum_{\substack{i,j=1\\ a_{ij}\ne 0}}^{n}(l_{i}+r_{j}+\log_{10}|a_{ij}|)^{2}+\sum_{\substack{i,j=1\\ b_{ij}\ne 0}}^{n}(l_{i}+r_{j}+\log_{10}|b_{ij}|)^{2}\right] $$
is minimized using a generalized conjugate gradient method developed by Concus et al. \cite{genconjgrad}. This algorithm is implemented in the LAPACK \cite{lapack} routine \texttt{dggbal}.

\section{Balancing three matrices}
Our main goal is to indicate a need of balancing three matrices in particular problems, and we will illustrate its benefits in Section \ref{numtest}. Since we are going to apply  balancing to several different problems, and since the Lemonnier and Van Dooren's approach is specially tailored for the eigenvalue problem, the Ward's approach seems more suitable for our task. We will just mention here that in \cite{betcke} Betcke  generalized Lemonnier and Van Dooren's method to balancing an arbitrary number of matrices of the same dimensions involved in the polynomial eigenvalue problem.

In the Control and Systems Library SLICOT \cite{slicot}, there already exists the routine \texttt{TG01AD} which can balance three matrices, but it has several drawbacks. It is only a slight modification of the Ward's algorithm. The minimization function takes into account only the element with the largest magnitude in each row of the matrix $B$. On the other hand, the generalized conjugate gradient method in this routine uses the same preconditioner as in the case of two matrices. The preconditioner in  \texttt{dggbal} is a singular matrix, while the system of normal equations in case of three matrices can be nonsingular. Thus, the conjugate gradient method can stop before reaching the minimum of the minimization function. The second drawback of \texttt{TG01AD} is that it can result with unsatisfactory scaling of the matrix $B$, leaving it with a large norm. This is particulary inconvenient for algorithms that include rank revealing, such as the staircase reduction. Both drawbacks are illustrated by examples and presented in Section \ref{numtest}.

Besides stressing the benefits of the balancing algorithms, our intention is to offer a proper and better algorithm for balancing three matrices, which will correct both drawbacks of the routine \texttt{TG01AD} and provide more functionality.

We will extend Ward's approach to balancing three matrices which will change the minimization problem, but the minimization algorithm will remain the same. Our balancing algorithm will produce diagonal matrices $D_{l}$ and $D_{r}$, such that the range of magnitude orders of all elements in the matrices $D_{l}AD_{r}$, $D_{l}ED_{r}$, and $D_{l}B$ is optimally small. Computation of the frequency response matrix $\mathcal{G}(\sigma )$ is invariant under such diagonal transformations. We can optionally balance the matrix $B$ from the right introducing the third diagonal matrix $D_{B}$.

After introducing abbreviations $l=(l_{1},\ldots ,l_{n})$ and $r=(r_{1},\ldots ,r_{n})$, our problem of balancing matrices $A=[a_{ij}]$, $B=[b_{ij}]$, and $E=[e_{ij}]$ is equivalent to the minimization problem
\begin{equation}\label{minprobs}
\min_{l,r\in \mathbb{R}^{n}}\phi (l,r),
\end{equation}
{\footnotesize $$
\phi (l,r)=\sum_{i=1}^{n}\left[ \sum_{\substack{j=1\\ a_{ij}\ne 0}}^{n}(l_{i}+r_{j}+\log_{10}|a_{ij}|)^{2}+\sum_{\substack{j=1\\ e_{ij}\ne 0}}^{n}(l_{i}+r_{j}+\log_{10}|e_{ij}|)^{2}+\sum_{\substack{j=1\\ b_{ij}\ne 0}}^{m}(l_{i}+\log_{10}|b_{ij}|)^{2}\right] .\nonumber
$$}
To find a solution of the minimization problem (\ref{minprobs}),
we differentiate the function $\phi (l,r)$ and equalize its gradient with zero, as in \cite{ward}. After obtaining minimizing $l_{\min}$ and $r_{\min}$, if integers in $l$ and $r$ are required, they can be retrieved by the rounding operation.
Further, $\nabla \phi (l,r)=0$ results with a linear system $Lx=p$ which has the following form
\begin{equation}\label{linsyss}
L=\left[ \begin{array}{cc} F_{1}&G\\ G^{T}&F_{2} \end{array}\right] ,\quad p=\left[ \begin{array}{c} -c\\ -d \end{array}\right] ,\quad x=\left[ \begin{array}{c} l\\ r \end{array}\right] ,
\end{equation}
where
\begin{itemize}
\item $F_{1}\in \mathbb{R}^{n\times n}$ is a diagonal matrix $F_{1}=\text{diag}(n_{r_{1}},\ldots ,n_{r_{n}})$ and
$$n_{r_{i}}={\displaystyle \sum_{\substack{j=1\\ a_{ij}\ne 0}}^{n}1+\sum_{\substack{j=1\\ e_{ij}\ne 0}}^{n}1+\sum_{\substack{j=1\\ b_{ij}\ne 0}}^{m}1}$$
is the total number of nonzero elements in the $i$-th rows of $A$, $B$, and $E$,
\item $F_{2}\in \mathbb{R}^{n\times n}$ is a diagonal matrix $F_{2}=\text{diag}(n_{c_{1}},\ldots ,n_{c_{n}})$ and
$$n_{c_{j}}={\displaystyle \sum_{\substack{i=1\\ a_{ij}\ne 0}}^{n}1+\sum_{\substack{i=1\\ e_{ij}\ne 0}}^{n}1}$$
is the total number of nonzero elements in the $j$-th columns of $A$ and $E$,
\item $G=[g_{ij}]\in \mathbb{R}^{n\times n}$ is the sum of the incidence matrices of $A$ and $E$, i.e.,
$$g_{ij}=\left\{ \begin{array}{ll} 1,&\ \text{if }a_{ij}\ne 0\\ 0,&\ \text{if }a_{ij}=0 \end{array}\right\} +\left\{ \begin{array}{ll} 1,&\ \text{if }e_{ij}\ne 0\\ 0,&\ \text{if }e_{ij}=0 \end{array}\right\},$$
\item $c=[c_{i}]\in \mathbb{R}^{n}$ has elements
$$c_{i}={\displaystyle \sum_{\substack{j=1\\ a_{ij}\ne 0}}^{n}\log_{10}|a_{ij}|+\sum_{\substack{j=1\\ e_{ij}\ne 0}}^{n}\log_{10}|e_{ij}|+\sum_{\substack{j=1\\ b_{ij}\ne 0}}^{m}\log_{10}|b_{ij}|},$$
\item $d=[d_{j}]\in \mathbb{R}^{n}$ has elements
$$d_{j}={\displaystyle \sum_{\substack{i=1\\ a_{ij}\ne 0}}^{n}\log_{10}|a_{ij}|+\sum_{\substack{i=1\\ e_{ij}\ne 0}}^{n}\log_{10}|e_{ij}|}.$$
\end{itemize}
In the special case when the matrices $A$, $B$, and $E$ contain only nonzero elements, the system matrix in (\ref{linsyss}) reduces to
\begin{equation}\label{Ms}
M=\left[ \begin{array}{cc} (2n+m)I_{n}&\ \ 2e_{n}e_{n}^{T}\\ 2e_{n}e_{n}^{T}&2nI_{n} \end{array}\right] ,
\end{equation}
where $I_{n}\in \mathbb{R}^{n\times n}$ is the identity matrix and $e_{n}=[\begin{array}{ccc} 1&\ldots &1 \end{array}]^{T}\in \mathbb{R}^{n}$.

Minimization problem (\ref{minprobs}) is in fact a linear least squares problem, and (\ref{linsyss}) represent its system of normal equations. We know that the matrix of normal equations is positive semidefinite, and that the system is consistent (see \cite{golubvanloan}, \cite{bjorck}). The system matrix $L$ of system (\ref{linsyss}) is symmetric positive semidefinite or positive definite. The matrix $M$ defined by (\ref{Ms}) is symmetric positive definite, and its inverse is equal to
\begin{equation}\label{Minvs}
M^{-1}=\left[ \begin{array}{cc}
\frac{1}{2n+m}I_{n}+\frac{2}{(2n+m)m}e_{n}e_{n}^{T}&-\frac{1}{nm}e_{n}e_{n}^{T}\\[2ex]
-\frac{1}{nm}e_{n}e_{n}^{T}&\frac{1}{2n}I_{n}+\frac{1}{nm}e_{n}e_{n}^{T}
\end{array}\right] .
\end{equation}

Now we can conclude that there is a solution to the equation $\nabla \phi (l,r)=0$, and the Hessian $\text{Hess}(\phi) =2L$ is positive semidefinite, so this solution is indeed the global minimum of the function $\phi$. Since we are dealing here with a symmetric positive semidefinite consistent system, we can apply the conjugate gradient method for obtaining its solution (see \cite{hestenesstiefel}). The conjugate gradient method is an iterative method specially suited for positive (semi-) definite matrices with a structure (such as $L$), where the matrix--vector product is efficiently implemented. For our purpose most convenient is the usage of a preconditioned conjugate gradient method with $M$ as the preconditioner (see, for example, \cite{greenbaum}), presented in Algorithm \ref{alggenconjgrad}.
The solution of a system with the matrix $M$ can be found directly, since we know the explicit expression for $M^{-1}$, and the product $M^{-1}v$ with a vector $v$ requires only $O(n)$ operations. The convergence rate of this method depends on the number of different eigenvalues of $M^{-1}L$ (see \cite{greenbaum}). For $L=M$, the method converges in only 1 iteration.

\begin{algorithm}[Preconditioned conjugate gradient method for the system $Lx=p$.]\label{alggenconjgrad}
\hspace*{1em}\\[-7ex]
\begin{tabbing}
xxxxxxxxxx\=  \kill
\textbf{\scriptsize (1)} $x_{0}=0$;\\
\textbf{\scriptsize (2)} $r_{0}=p$;\\
\textbf{\scriptsize (3)} solve $Mz_{0}=r_{0}$;\\
\textbf{\scriptsize (4)} $s_{0}=z_{0}$;\\
\textbf{\scriptsize (5)} \textbf{for} $k=1,2,\ldots$\\
\textbf{\scriptsize (6)}\> $\alpha_{k-1}=\frac{z_{k-1}^{T}r_{k-1}}{s_{k-1}^{T}Ls_{k-1}}$;\\
\textbf{\scriptsize (7)}\> $x_{k}=x_{k-1}+\alpha_{k-1}s_{k-1}$;\\
\textbf{\scriptsize (8)}\> $r_{k}=r_{k-1}-\alpha_{k-1}Ls_{k-1}$;\\
\textbf{\scriptsize (9)}\> solve $Mz_{k}=r_{k}$;\\
\textbf{\scriptsize (10)}\> $\beta_{k}=\frac{z_{k}^{T}r_{k}}{z_{k-1}^{T}r_{k-1}}$;\\
\textbf{\scriptsize (11)}\> $s_{k}=z_{k}+\beta_{k}s_{k-1}$;\\
\textbf{\scriptsize (12)} \textbf{end}
\end{tabbing}
\end{algorithm}

\subsection{Increasing the weight of $B$}
Since the matrix $B$ has usually less elements than $A$ and $E$, its influence on the minimization process is weaker. This can give an unsatisfactory result for $B$, where resulting $A$ and $E$ are balanced  much better than $B$. This is not an issue when balancing involves matrices of the same dimensions as in \cite{ward} and \cite{lemoniervandooren}. Therefore, we introduce here the weighted minimization function. To increase its influence, we can multiply the part in $\phi$ involving the matrix $B$ with stronger weight. The matrices $A$ and $E$ have $n^{2}$ elements, and $B$ only $mn$, thus the logical choice for weight is $\frac{n}{m}$.
In this case only the definition of the matrix $F_{1}$ and the vector $c$ in (\ref{linsyss}) are changed to
\begin{itemize}
\item $F_{1}\in \mathbb{R}^{n\times n}$ is a diagonal matrix $F_{1}=\text{diag}(n_{r_{1}},\ldots ,n_{r_{n}})$ and
$$n_{r_{i}}={\displaystyle \sum_{\substack{j=1\\ a_{ij}\ne 0}}^{n}1+\sum_{\substack{j=1\\ e_{ij}\ne 0}}^{n}1+\frac{n}{m}\sum_{\substack{j=1\\ b_{ij}\ne 0}}^{m}1},$$
\item $c=[c_{i}]\in \mathbb{R}^{n}$ has elements
$$c_{i}={\displaystyle \sum_{\substack{j=1\\ a_{ij}\ne 0}}^{n}\log_{10}|a_{ij}|+\sum_{\substack{j=1\\ e_{ij}\ne 0}}^{n}\log_{10}|e_{ij}|+\frac{n}{m}\sum_{\substack{j=1\\ b_{ij}\ne 0}}^{m}\log_{10}|b_{ij}|}.$$
\end{itemize}
In case when the matrices $A$, $B$, and $E$ contain only nonzero elements, the system matrix in (\ref{linsyss}) reduces to
\begin{equation}\label{Mw}
M=\left[ \begin{array}{cc} 3nI_{n}&2e_{n}e_{n}^{T}\\ 2e_{n}e_{n}^{T}&2nI_{n} \end{array}\right] .
\end{equation}
The matrix $M$ is positive definite, and its inverse is equal to
\begin{equation}\label{Minvw}
M^{-1}=\left[ \begin{array}{cc}
\frac{1}{3n}I_{n}+\frac{2}{3n^{2}}e_{n}e_{n}^{T}&-\frac{1}{n^{2}}e_{n}e_{n}^{T}\\[2ex]
-\frac{1}{n^{2}}e_{n}e_{n}^{T}&\frac{1}{2n}I_{n}+\frac{1}{n^{2}}e_{n}e_{n}^{T}
\end{array}\right] .
\end{equation}

\subsection{Balancing $B$ from both sides}
This variant of the balancing algorithm produces diagonal matrices $D_{l}$, $D_{r}$ and $D_{B}$ such that the range of magnitude orders of all elements in the matrices $D_{l}AD_{r}$, $D_{l}ED_{r}$, and $D_{l}BD_{B}$ is optimally small.

Let us denote $D_{B}=\text{diag}(10^{q_{1}},\ldots ,10^{q_{m}})\in \mathbb{R}^{m\times m}$ and $q=(q_{1},\ldots ,q_{m})$. The minimization problem is a bit more complicated than before:
\begin{equation}\label{minprobr}
\min_{l,r,q}\phi (l,r,q),
\end{equation}
{\footnotesize $$
\phi (l,r,q)=\sum_{i=1}^{n}\left[ \sum_{\substack{j=1\\ a_{ij}\ne 0}}^{n}(l_{i}+r_{j}+\log_{10}|a_{ij}|)^{2}\! +\!\!\! \sum_{\substack{j=1\\ e_{ij}\ne 0}}^{n}(l_{i}+r_{j}+\log_{10}|e_{ij}|)^{2}\! +\!\!\! \sum_{\substack{j=1\\ b_{ij}\ne 0}}^{m}(l_{i}+q_{j}+\log_{10}|b_{ij}|)^{2}\right] .\nonumber
$$}
Equalizing $\nabla \phi (l,r,q) =0$ produces a linear system $Lx=p$ with the following form
\begin{equation}\label{linsysr}
\left[ \begin{array}{ccc} F_{1}&G&K\\ G^{T}&F_{2}&0\\ K^{T}&0&F_{3} \end{array}\right] \left[ \begin{array}{c} l\\ r\\ q \end{array}\right] =\left[ \begin{array}{c} -c\\ -d\\ -f \end{array}\right] ,
\end{equation}
where $F_{1}$, $F_{2}$, $G$, $c$ and $d$ have the same form as in the original version of balancing, and
\begin{itemize}
\item $F_{3}\in \mathbb{R}^{m\times m}$ is a diagonal matrix $F_{3}=\text{diag}(n_{cb_{1}},\ldots ,n_{cb_{m}})$ and
$$n_{cb_{j}}={\displaystyle \sum_{\substack{i=1\\ b_{ij}\ne 0}}^{n}1}$$
is the total number of nonzero elements in the $j$-th column of $B$,
\item $K=[k_{ij}]\in \mathbb{R}^{n\times m}$ is the incidence matrix of $B$, i.e.,
$$k_{ij}=\left\{ \begin{array}{ll} 1,&\ \text{if }b_{ij}\ne 0\\ 0,&\ \text{if }b_{ij}=0 \end{array}\right\},$$
\item $f=[f_{j}]\in \mathbb{R}^{m}$ has elements
$$f_{j}={\displaystyle \sum_{\substack{i=1\\ b_{ij}\ne 0}}^{n}\log_{10}|b_{ij}|}.$$
\end{itemize}
In the special case, when the matrices $A$, $B$, and $E$ contain only nonzero elements, the system matrix in (\ref{linsysr}) reduces to
\begin{equation}\label{Mr}
M=\left[ \begin{array}{ccc} (2n+m)I_{n}&2e_{n}e_{n}^{T}&e_{n}e_{m}^{T}\\ 2e_{n}e_{n}^{T}&2nI_{n}&0\\ e_{m}e_{n}^{T}&0&nI_{m} \end{array}\right] ,
\end{equation}
where $I_{n}\in \mathbb{R}^{n\times n}$ and $I_{m}\in \mathbb{R}^{m\times m}$ are the identity matrices, $e_{n}=[\begin{array}{ccc} 1&\ldots &1 \end{array}]^{T}\in \mathbb{R}^{n}$, and $e_{m}=[\begin{array}{ccc} 1&\ldots &1 \end{array}]^{T}\in \mathbb{R}^{m}$.

There is an alternative form of the minimization function $\phi$ in (\ref{minprobr}), where the part involving the matrix $B$ has stronger weight, as in the previous subsection.

\begin{proposition}\label{proppossemidefconssysr}
\begin{itemize}
\item[(i)] The matrix $M\in \mathbb{R}^{(2n+m)\times (2n+m)}$ defined in (\ref{Mr}) is symmetric positive semidefinite of rank $2n+m-1$, and its Moore--Penrose generalized inverse is
\begin{equation}\label{Minvr}
M^{\dag}\! =\! \left[ \begin{array}{ccc}
\frac{1}{2n+m}I_{n}-\frac{3}{2(2n+m)^{2}}e_{n}e_{n}^{T}&\frac{n-m}{2n(2n+m)^{2}}e_{n}e_{n}^{T}&\frac{3}{2(2n+m)^{2}}e_{n}e_{m}^{T}\\[2ex]
\frac{n-m}{2n(2n+m)^{2}}e_{n}e_{n}^{T}&\frac{1}{2n}I_{n}-\frac{3}{2(2n+m)^{2}}e_{n}e_{n}^{T}&\frac{-5n-m}{2n(2n+m)^{2}}e_{n}e_{m}^{T}\\[2ex]
\frac{3}{2(2n+m)^{2}}e_{m}e_{n}^{T}&\frac{-5n-m}{2n(2n+m)^{2}}e_{m}e_{n}^{T}&\frac{1}{n}I_{m}+\frac{-7n-2m}{2n(2n+m)^{2}}e_{m}e_{m}^{T}
\end{array}\right] .
\end{equation}
\item[(ii)] In case when $L\ne M$, the following holds: $\operatorname{Ker}(L)^{\perp}\subset \operatorname{Ker}(M)^{\perp}$.
\end{itemize}
\end{proposition}

\begin{proof}
\begin{itemize}
\item[(i)] A direct verification of the Moore--Penrose conditions proves the statement about $M^{\dag}$. It can be easily verified that the vector
$$u_{0}=\frac{1}{\sqrt{2n+m}}\left[ \begin{array}{r} e_{n}\\ -e_{n}\\ -e_{m} \end{array}\right]$$
is a unit eigenvector of $M$ corresponding to the eigenvalue $\lambda_{0}=0$. Further, the following equation holds
$$MM^{\dag}=I-u_{0}u_{0}^{T},$$
thus $\text{Ker}(M)=\text{span}\{ u_{0}\}$ and $\text{Im}(M)=\text{Ker}(M)^{\perp}$ since $M$ is symmetric.
\item[(ii)] Let us compute $Lu_{0}$:
$$Lu_{0}=\frac{1}{\sqrt{2n+m}}\left[ \begin{array}{ccc} F_{1}&G&K\\ G^{T}&F_{2}&0\\ K^{T}&0&F_{3} \end{array}\right] \left[ \begin{array}{r} e_{n}\\ -e_{n}\\ -e_{m} \end{array}\right] =\frac{1}{\sqrt{2n+m}}\left[ \begin{array}{c} F_{1}e_{n}-Ge_{n}-Ke_{m}\\ G^{T}e_{n}-F_{2}e_{n}\\ K^{T}e_{n}-F_{3}e_{m} \end{array}\right] ,$$
whose components are
\begin{align*}
(Lu_{0})(i)&=\frac{1}{\sqrt{2n+m}}\left( n_{r_{i}}-\left( \sum_{\substack{j=1\\ a_{ij}\ne 0}}^{n}1+\sum_{\substack{j=1\\ e_{ij}\ne 0}}^{n}1\right) -\left( \sum_{\substack{j=1\\ b_{ij}\ne 0}}^{m}1\right) \right) =0,\\
 &\quad i=1,\ldots ,n\\
(Lu_{0})(n+j)&=\frac{1}{\sqrt{2n+m}}\left( \sum_{\substack{i=1\\ a_{ij}\ne 0}}^{n}1+\sum_{\substack{i=1\\ e_{ij}\ne 0}}^{n}1 -n_{c_{j}}\right) =0,\quad j=1,\ldots ,n\\
(Lu_{0})(2n+j)&=\frac{1}{\sqrt{2n+m}}\left( \sum_{\substack{i=1\\ b_{ij}\ne 0}}^{n}1-n_{cb_{j}}\right) =0,\quad j=1,\ldots ,m
\end{align*}
Hence, $u_{0}\in \text{Ker}(L)$ and $\text{Ker}(M)\subset \text{Ker}(L)$. The statement follows immediately by taking orthogonal complements.
\end{itemize}
\hfill\end{proof}

Part (ii) of Proposition \ref{proppossemidefconssysr} is important for line \textbf{(9)} of Algorithm \ref{alggenconjgrad}. Since the system $Lx=p$ is consistent $r_{k}=p-Lx_{k}\in \text{Im}(L)=\text{Ker}(L)^{\perp}\subset \text{Ker}(M)^{\perp}=\text{Im}(M)$. Thus, the system $Mz_{k}=r_{k}$ is consistent and there exists its solution $z_{k}$. In the SLICOT routine \texttt{TG01AD} this might not be the case. The preconditioner used in this routine
$$M=2\left[ \begin{array}{cc} nI_{n}&e_{n}e_{n}^{T}\\ e_{n}e_{n}^{T}&nI_{n} \end{array}\right] $$
is the same as in \texttt{dggbal} (see \cite{ward}) and is singular. On the other hand, the system matrix $L$ is as in the original version of our algorithm for $m=1$ and can be nonsingular. In some special cases it can happen that $0\ne r_{k}\in \text{Ker}(M)$, producing $z_{k}=0$ and the algorithm will prematurely stop since $x_{k+1}=x_{k}$. Such an example is presented in Section \ref{numtest}.

In the cases of the quadratic eigenvalue problem $\lambda^{2}Ax+\lambda Ex+Bx=0$ and the algebraic linear system $(\sigma^{2}A+\sigma B+C)x=b$, all three matrices are of the same size $A,B,E\in \mathbb{R}^{n\times n}$, and they all have to be balanced from both sides with the same diagonal matrices: $D_{l}AD_{r}$, $D_{l}ED_{r}$ and $D_{l}BD_{r}$. In this case the balancing procedure reduces to the Ward's balancing algorithm described in \cite{ward}, except that the elements of $F_{1}$, $F_{2}$, $G$, $c$ and $d$ equally include elements of all three matrices, and the matrix $M$ is equal to the corresponding matrix in \texttt{dggbal} multiplied by factor of $3/2$.

\section{Numerical tests}\label{numtest}
The tests were executed on the Intel\textregistered{} Core\texttrademark{} Duo CPU under Ubuntu Linux 10.04 (lucid). They were programmed in Fortran,  compiled with Intel\textregistered{} Fortran Compiler 12.0.2, and all variables were in double precision or double complex precision. The balancing algorithms for three matrices are implemented in the routine \texttt{dg3bal}, whose code is available from the author. We denote by: S -- the original variant, W -- the weighted variant, R -- the variant where $B$ is balanced from both sides.

As a verification of functionality of the \texttt{dg3bal} routine, we performed a set of tests with random matrices. The matrices were generated by starting with well scaled random matrices, and then by scaling rows and columns of $A$ and $E$, as well as rows of $B$ with the same diagonal matrices $D_{lr}$. The variant S of the balancing algorithm produced all three balanced matrices with a narrow magnitude range, and balanced matrices $A$ and $E$ are very similar to the matrices produced by the LAPACK routine \texttt{dggbal}. In case when the columns of $B$ are badly scaled, only the variant R of \texttt{dg3bal} algorithm was successful in balancing the matrix $B$. In case when the rows of $B$ are scaled with a different diagonal matrix $D_{lB}$ whose diagonal elements $D_{lB}(i,i)=D_{lr}(i,i)^3$ have a wider range in magnitude than $D_{lr}$, the variant W is slightly better for $B$ than the other two variants. On the other hand, the balanced matrices $A$ and $E$ have more scattered elements in this case than the obtained balanced matrices for the variants S and R. Nevertheless, all three variants of our balancing algorithm balanced the matrix $B$ better than the diagonal matrix obtained from \texttt{dggbal}. If we define range as proportion between the largest and the smallest element in magnitude, then the logarithm of range of the original matrix $B$ in this case is reduced on  average to 68.86\% of its value by \texttt{dggbal}, to 57.80\% of its value by the variant S, and to 44.27\% of its value by the variant $W$.

\subsection{Example 1: \texttt{dg3bal} vs. \texttt{TG01AD}}

Here we show the superiority of our balancing routine \texttt{dg3bal} over the SLICOT routine \texttt{TG01AD}. In the first test rounds, we generated matrices $A$, $B$, and $E$ as in Example 2, but with one example for each choice of diagonal matrices. The only difference is that we changed the number of columns of $B$, taking $m=3,5,8$. The results presented in Figure \ref{fig_dg3bal_TG01AD_1b_B} show that \texttt{dg3bal} balances elements of $B$ better than \texttt{TG01AD}, specially for larger $m$ when $B$ has larger influence in the minimization function of \texttt{dg3bal}. Let us note here that \texttt{TG01AD} cannot balance the columns of $B$, thus we compare it only to the variant S of \texttt{dg3bal}. The variant R would produce a better result than \texttt{TG01AD} in case when $B$ has badly scaled columns.

The second test round comprises one set of matrices $A$, $B$, and $E$, where $n=1000$ and $m=10$. $A$ and $E$ are generated by scaling random matrices with the matrix $D_{lr}=\text{diag}(10,10^{2},10^{3},\ldots ,10^{10},10,10^{2},10^{3},\ldots ,10^{10})$ from both sides, and $B$ is generated by scaling a random matrix with $D_{lB}=\text{diag}(10^{4},10^{8},10^{12},\ldots ,10^{40},10^{4},\linebreak10^{8},$ $10^{12},\ldots ,10^{40})$ from the left. In this case, Figure \ref{fig_dg3bal_TG01AD_3bar_B} displays the maximal magnitude of elements in $B$, which maximally influences $\| B\|_{F}$.


\begin{figure}
\centering
\subfigure[{\em Magnitude ranges of elements for the original matrix and the balanced matrix $B$}]{
\includegraphics[scale=0.41]{fig_ranges_dg3bal_TG01AD_1b_B.eps}\label{fig_dg3bal_TG01AD_1b_B}
}
\subfigure[{\em Maximal magnitude of elements for the original matrix and the balanced matrix $B$}]{
\includegraphics[scale=0.41]{fig_maxmag_dg3bal_TG01AD_3bar_B.eps}\label{fig_dg3bal_TG01AD_3bar_B}
}
\caption{{\rm\em  \texttt{dg3bal} vs. \texttt{TG01AD}: reduction of the magnitude range and the maximal magnitude of elements in $B$} }
\end{figure}





The magnitude of the norm is extremely important for the rank revealing algorithms deployed in the staircase reduction routines, which are used to determine the controllable part of system (\ref{descrsyst}). These are the SLICOT routines \texttt{TG01HD} and \texttt{TG01HX}, and the new staircase reduction algorithm from \cite{ttmh}. The standard tolerance for rank determination is $n^{2}u\sqrt{\| A\|_{F}^{2}+\| B\|_{F}^{2}}$, where $u$ is the unit roundoff error. When the norms of $A$ and $B$ are large, the numerical rank is usually smaller than the exact rank. In extreme cases, it turns out to be equal to zero for nontrivial submatrices. A detailed illustration of sensitivity of the staircase reduction is given in Example 5. Figure \ref{fig_dg3bal_TG01AD_3bar_B} shows that the S and W variants of \texttt{dg3bal} are more successful in norm reduction of $B$ than \texttt{TG01AD}. The maximal magnitude of elements in $A$ and $E$ are of order: $1$ for \texttt{TG01AD}, $10$ for variant S of \texttt{dg3bal}, and $10^{6}$ for variant W of \texttt{dg3bal}.


The last test round in this example is concerned with the problem of the singular preconditioner in the routine \texttt{TG01AD}. Unfortunately, \texttt{TG01AD} has no option to use the conjugate gradient method without preconditioning, or to change the preconditioner itself. Let the matrices $A$, $E$, and $B$ be defined as follows:
$$A=\left[ \begin{array}{lll}
10^{-2}&0&10^{-4}\\
0&10^{-4}&10^{4}\\
10^{-2}&0&10^{-4} \end{array}\right] ,\quad E=\left[ \begin{array}{ccc} 1&0&1\\ 0&1&1\\ 1&0&1 \end{array}\right] ,\quad
B=\left[ \begin{array}{l} 10^{10}\\ 10^{4}\\ 10^{10} \end{array}\right] ,$$
and $\sqrt{\| A\|_{F}^{2}+\| B\|_{F}^{2}}=1.414214\cdot 10^{10}$.
In this case, \texttt{TG01AD} generates the same matrix $L$ and the vector $p$ as the variant S of \texttt{dg3bal}, but $M$ is the same as in \texttt{dggbal}:
$$L=\left[ \begin{array}{cccccc}
     5  &  0  &  0  &  2  &  0  &  2\\
     0  &  5  &  0  &  0  &  2  &  2\\ 0  &  0  &  5  &  2  &  0  &  2\\ 2  &  0  &  2  &  4  &  0  &  0\\
     0  &  2  &  0  &  0  &  2  &  0\\
     2  &  2  &  2  &  0  &  0  &  6 \end{array}\right] ,\quad p=\left[ \begin{array}{r}
     -4\\ -4\\ -4\\ 4\\ 4\\ 4 \end{array}\right] ,\quad
M=\left[ \begin{array}{cccccc}
     6  &  0  &  0  &  2  &  2  &  2\\
     0  &  6  &  0  &  2  &  2  &  2\\
     0  &  0  &  6  &  2  &  2  &  2\\
     2  &  2  &  2  &  6  &  0  &  0\\
     2  &  2  &  2  &  0  &  6  &  0\\
     2  &  2  &  2  &  0  &  0  &  6 \end{array}\right] .$$
It is easy to check that $Mp=M^{\dag}p=0$. The first step of the conjugate gradient method is to compute $z_{0}=M^{\dag}p$. Since $z_{0}=0$, it implies $s_{0}=0$, $\alpha_{0}=0$, and $x_{1}=x_{0}$ which will satisfy the stopping criterion and the routine stops with unchanged data. The problem arises due to $p\in \text{Ker}(M)$. On the other hand, the S variant of \texttt{dg3bal} produces
$$A_{b}=\left[ \begin{array}{lll}
10^{-1}&0&10^{-3}\\
0&10^{-2}&10^{5}\\
10^{-1}&0&10^{-3} \end{array}\right] ,\quad E_{b}=\left[ \begin{array}{ccc} 10&0&10\\ 0&100&10\\ 10&0&10 \end{array}\right] ,\quad
B_{b}=\left[ \begin{array}{l} 10^{2}\\ 10^{-4}\\ 10^{2} \end{array}\right] ,$$
where $\sqrt{\| A_{b}\|_{F}^{2}+\| B_{b}\|_{F}^{2}}=1.000001\cdot 10^{5}$. The original and the balanced matrices are now used as input to the routine \texttt{TG01HD}. For  system (\ref{descrsyst}) defined by the original matrices $A$, $B$, and $E$, the routine returns that the system has 1 finite controllable and 2 finite uncontrollable poles (poles are generalized eigenvalues of the pencil $A-\lambda E$). For the balanced system, it turns out to have 2 finite controllable and 1 infinite uncontrollable pole. Thus, since \texttt{TG01AD} stopped prematurely, it does not change the original matrices, and \texttt{TG01HD} gives wrong answer about the controllable poles. For the balanced system returned by \texttt{dg3bal}, \texttt{TG01HD} returns the correct answer. Since $E$ is obviously singular, there has to be an infinite pole.


\subsection{Example 2: sensitivity of frequency response computation to scaling}

In this example, we demonstrate how badly scaled matrices can produce an inaccurate frequency response matrix.
We started with the descriptor system ($A_{0}$,$B_{0}$,$C_{0}$, $D_{0}$,$E_{0}$), where $A_{0},E_{0}\in \mathbb{R}^{4\times 4}$, $B_{0}\in \mathbb{R}^{4\times 1}$,  $C_{0}\in \mathbb{R}^{1\times 4}$, $D_{0}=0$, and whose pole is placed near $0.4518i$.  Then, we produced badly scaled matrices $E=D_{l,0}E_{0}D_{r,0}$, $A=D_{l,0}A_{0}D_{r,0}$, $B=D_{l,0}B_{0}$, $C=C_{0}D_{r,0}$, and $D=D_{0}$, where $D_{l,0}$ and $D_{r,0}$ are badly scaled diagonal matrices.
We observed three different choices of these diagonal matrices.
For each choice of the diagonal matrices we computed frequency response matrices for the original and the balanced system, where the $m$-Hessenberg--triangular--triangular reduction of $A$, $B$, and $E$ from \cite{ttmh} was the first step in the algorithm, and
$\sigma$ ranged from $10^{-2}i$ up to $10^{2}i$.

\begin{figure}[h!]
\centering
%\subfigure[{\em Magnitude ranges of elements for the original matrix and the balanced matrix $B$}]{
\includegraphics[scale=0.429]{fig_dg3bal6b_1all_bw.eps}
%\subfigure[{\em Maximal magnitude of elements for the original matrix and the balanced matrix $B$}]{
\includegraphics[scale=0.429]{fig_dg3bal6b_2all_bw.eps}

\includegraphics[scale=0.429]{fig_dg3bal6b_3all_bw.eps}\caption{{\rm\em Magnitude of the computed frequency response matrices, for the original and the balanced system: $\mathbf{D_{l,0}=D_{r,0}=\text{diag}(10^{0},10^{3},10^{6},10^{9})}$,  $\mathbf{D_{l,0}=D_{r,0}=\text{diag}(10^{0},10^{4},10^{8},10^{12})}$, and $\mathbf{D_{l,0}=D_{r,0}=\text{diag}(10^{0},10^{6},10^{12},10^{18})}$}}\label{fig_dg3bal6}
\end{figure}



We compared the computed frequency response matrices of the original system and the balanced system obtained by the variant S of the balancing algorithm. The obtained results are illustrated in Figure \ref{fig_dg3bal6}.

As we can see, as the matrices $D_{l,0}$ and $D_{r,0}$ are gradually becoming worse scaled, the produced frequency response matrix is becoming more inaccurate. For the first choice we obtained 6--7 accurate leading digits when compared with the result for the balanced system, while for the second choice there were 3--4 accurate digits.
Specially, for the third choice, the magnitudes of the errors were of the same order as the magnitudes of the results or larger, producing a result with no accurate digit for the original system.

Let us emphasize here that SLICOT has only the routine \texttt{TB05AD} which computes the frequency response matrix  for a system with $E=I$. In that case, balancing can be applied via the LAPACK routine \texttt{dgebal}. \texttt{dgebal} balances only the matrix $A$ by a diagonal similarity transformation, since \texttt{TB05AD} reduces only the matrix $A$ to the Hessenberg form. In MATLAB, the frequency response matrix can be computed for a general descriptor system by the Hessenberg--triangular reduction of matrices $A$ and $E$, implemented in the routine \texttt{freqresp}. The MATLAB routine seems to balance only the matrices $A$ and $E$. The output of \texttt{freqresp} for the given example is indistinguishable from the result of our routine applied to the balanced system. Our algorithm based on the $m$-Hessenberg--triangular--triangular reduction of $A$, $B$ and $E$ is more efficient than the algorithm based only on the Hessenberg--triangular reduction, and this is the reason why we need a balancing algorithm for three matrices.

\subsection{Example 3: sensitivity of the staircase reduction}
As mentioned earlier, the staircase reduction is a tool used to determine the controllable part of system (\ref{descrsyst}), see, for example, \cite{paigenumalgcompcontr} and \cite{vargacompiredgenstspreal}. This problem is numerically very sensitive, since it relies on rank revealing algorithms. We started again with random matrices $A_{0},E_{0}\in \mathbb{R}^{15\times 15}$ and $B_{0}\in \mathbb{R}^{15\times 3}$, and generated three sets of badly scaled matrices $A_{i}=D_{i}A_{0}D_{i}$, $E_{i}=D_{i}E_{0}D_{i}$, $B_{i}=D_{i}B_{0}$, for $i=1,2,3$, such that
\begin{align*}
D_{1}&=\text{diag}(1,10^{2},1,1,10^{4},1,1,10^{6},1,1,10^{8},1,1,10^{9},1),\\
D_{2}&=\text{diag}(1,10^{3},1,1,10^{6},1,1,10^{9},1,1,10^{12},1,1,10^{14},1),\\
D_{3}&=\text{diag}(1,10^{3},1,1,10^{6},1,1,10^{9},1,1,10^{12},1,1,10^{16},1).
\end{align*}
We applied the SLICOT routine \texttt{TG01HD} to all four examples and obtained four different results presented in the following table.
\begin{table}[ht]
\begin{center}
\begin{tabular}{cccc}
\hline
{\small example} & {\small \# of contr. poles }&{\small \# of fin. uncontr. poles }& {\small\# of infin. uncontr. poles}\\
\hline
0 & 15 & 0 & 0 \\
1 & 3 & 9 & 1 \\
2 & 1 & 14 & 0 \\
3 & 0 & 15 & 0 \\
\hline
\end{tabular}
\end{center}
\caption{\rm\em The results of \texttt{TG01HD} applied to four examples}
\end{table}

The result for $A_{0}$, $E_{0}$, and $B_{0}$ is correct, while all the others are incorrect due to large norms of $A_{i}$ and $E_{i}$. As the norms of these two matrices grow, the dimension of the controllable part decreases. Specially, in case of the last example, \texttt{TG01HD} exited immediately without finding the controllable part.

The SLICOT routine \texttt{TG01HD} recommends balancing the system by the routine \texttt{TG01AD}. Our balancing routine returns the matrices $A_{0}$, $B_{0}$ and $E_{0}$ for all examples.

\subsection{Example 4: sensitivity of the pole assignment problem to scaling}
We are interested in another problem from control theory: the pole assignment problem for descriptor linear systems of form (\ref{descrsyst}) via state feedback. For details, see \cite{duan}. Let us define the closed-loop system
\begin{equation}\label{closed_loop_sys}
E\dot{x}=(A-BK)x(t)+Bv(t),
\end{equation}
where $v(t)$ is an external signal, and $K\in \mathbb{R}^{m\times n}$ is a feedback matrix. For the regular system (\ref{descrsyst}) with $n_{1}=\deg \det (A-\lambda E)$, and for the set of complex numbers $\Gamma =\{ \lambda_{1},\lambda_{2},\ldots ,\lambda_{n_{1}}\}$ closed under complex conjugation, the problem is to find a state feedback controller in the form $u(t)=Kx(t)+v(t)$ such that $\Gamma$ is the set of finite poles of the closed-loop system (\ref{closed_loop_sys}), or alternatively, the elements of $\Gamma$ are the eigenvalues of the pencil $(A-BK)-\lambda E$. Reliable methods for solving this problem are the so-called Hessenberg methods based on explicit or implicit QZ-like techniques. An explicit shift method for single input systems is proposed by Miminis and Paige \cite{miminispaige}, and the implicit version of the algorithm for ordinary linear time invariant systems with multiple inputs is proposed by Patel and Misra \cite{patelmisra}. We applied the QZ version of the Patel and Misra algorithm on a controllable descriptor system with a non-singular matrix $E$. This algorithm is based on a reduction similar to the $m$-Hessenberg--triangular--reduction, which reveals controllability of the system.

Our example of the descriptor system (\ref{descrsyst}) is defined for $A,E\in \mathbb{R}^{10\times 10}$ and $B\in \mathbb{R}^{10\times 3}$, where

{\setlength\arraycolsep{1pt}\footnotesize
$$A=\left[ \begin{array}{cccccccccc}
  {\scriptstyle 8.15\cdot 10^{-8}} & 0    &       0 & {\scriptstyle 7.06\cdot 10^{-3}} &   0     &       0 & {\scriptstyle 7.51\cdot 10^{-5}} & {\scriptstyle 8.41\cdot 10^{-8}}   &      0      &       {\scriptstyle 7.59\cdot 10^{-2}}\\
            0 & {\scriptstyle 9.71\cdot 10^{-8}} &  0     &       0 &  {\scriptstyle 3.82\cdot 10^{-1}}&  0 &{\scriptstyle 2.55\cdot 10^{2}}     &      0   &       {\scriptstyle 8.31\cdot 10^{-7}}     &       0\\
            0 & {\scriptstyle 9.57\cdot 10^{-5}} & {\scriptstyle 8.49\cdot 10^{2}}    &     0         &  0 &{\scriptstyle 6.55\cdot 10^{2}}     &      0   &            0      &         0    &     0\\
            0 &          0      &      0            &             0         &  0 &{\scriptstyle 1.63\cdot 10^{-1}}    &      0   &   {\scriptstyle 2.44\cdot 10^{-1}}   &   0 & {\scriptstyle 7.79\cdot 10^{6}}\\
            0     &      0     &      0 & {\scriptstyle 9.71\cdot 10^{3}} & {\scriptstyle 1.87\cdot 10^{-1}} & {\scriptstyle 1.19\cdot 10^{-1}}     &       0      &       0  & {\scriptstyle 9.17\cdot 10^{-7}}     &       0\\
            0     &      0     &      0     &      0   &{\scriptstyle 4.90\cdot 10^{-1}}     &       0     &       0      &       0 & {\scriptstyle 2.86\cdot 10^{-7}}     &       0\\
            0     &      0     &      0     &      0   &        0     &      0     &       0      &       0      &      0 & {\scriptstyle 5.69\cdot 10^{6}}\\
            0     &      0     &      0     &      0   &        0     &      0 & {\scriptstyle 1.39\cdot 10^{-2}}       &      0     &       0      &      0\\
            0     &      0     &      0     &      0   &        0     &      0     &       0 &  {\scriptstyle 6.16\cdot 10^{14}}     &       0      &      0\\
            0     &      0     &      0     &      0   &        0     &      0     &       0     &        0 & {\scriptstyle 5.68\cdot 10^{-7}} & {\scriptstyle 3.37\cdot 10^{6}}
\end{array}\right],  $$
}
{\setlength\arraycolsep{1pt}\footnotesize
$$B^{T}=\left[ \begin{array}{cccccccccc}
  {\scriptstyle 1.62\cdot 10^{1}}&           0&           0&           0&           0&           0&           0&           0&           0&           0\\
  {\scriptstyle 4.51\cdot 10^{1}}&           0&           0& {\scriptstyle 9.13\cdot 10^{-1}}&           0& {\scriptstyle 8.26\cdot 10^{5}}&           0&           0&           0&           0\\
            0& {\scriptstyle 9.62\cdot 10^{-1}}&           0&           0& {\scriptstyle 8.17\cdot 10^{-1}}&           0&           0&           0& {\scriptstyle 2.60\cdot 10^{13}}& {\scriptstyle 8.00\cdot 10^{-1}}
\end{array}\right] .$$
}
{\setlength\arraycolsep{1pt}\footnotesize
$$E=\left[ \begin{array}{cccccccccc}
  {\scriptstyle 4.31\cdot 10^{-8}} &          0& {\scriptstyle 4.17\cdot 10^{-8}}&           0& {\scriptstyle 2.35\cdot 10^{-8}}&           0&           0& {\scriptstyle 6.44\cdot 10^{-8}}& {\scriptstyle 2.08\cdot 10^{-14}}&           0\\
            0 &{\scriptstyle 6.22\cdot 10^{-8}}&           0& {\scriptstyle 3.90\cdot 10^{4}}&           0&           0&           0&           0&           0&           0\\
            0 &          0& {\scriptstyle 9.03\cdot 10^{2}}&           0&           0&           0& {\scriptstyle 4.87\cdot 10^{5}}&           0&           0& {\scriptstyle 4.30\cdot 10^{9}}\\
            0 &          0&           0& {\scriptstyle 4.04\cdot 10^{4}}&           0&           0&           0&           0&           0&           0\\
            0 &          0&           0&           0& {\scriptstyle 4.30\cdot 10^{-2}}& {\scriptstyle 6.87\cdot 10^{-1}}&           0& {\scriptstyle 3.51\cdot 10^{-1}}&           0&           0\\
            0 &          0&           0&           0&           0& {\scriptstyle 1.84\cdot 10^{-1}}&           0&           0&           0&           0\\
            0 &          0&           0&           0&           0&           0& {\scriptstyle 5.09\cdot 10^{2}}&           0&           0&           0\\
            0 &          0&           0&           0&           0&           0&           0& {\scriptstyle 5.50\cdot 10^{-5}}&           0&           0\\
            0 &          0&           0&           0&           0&           0&           0&           0& {\scriptstyle 2.28\cdot 10^{8}}&           0\\
            0 &          0&           0&           0&           0&           0&           0&           0&           0& {\scriptstyle 4.09\cdot 10^{6}}
\end{array}\right] ,$$
}

\noindent
The set of desired finite poles of the closed-loop system is taken to be $\Gamma=\{ -1\pm 2i,-8\pm 12i,-3,-7,-15,-25,-30,-100\}$. The pole assignment algorithm produced the feedback matrix

{\setlength\arraycolsep{1pt}\footnotesize
$$K=\left[ \begin{array}{cccccccccc} {\scriptstyle 1.30\cdot 10^{-8}}&           0&           0&           0&           0&           0&           0&           0&           0&           0\\
            0&{\scriptstyle -4.74\cdot 10^{-13}}& {\scriptstyle 1.07\cdot 10^{-15}}&{\scriptstyle -2.31\cdot 10^{-19}}& {\scriptstyle 7.53\cdot 10^{-7}}&{\scriptstyle -4.72\cdot 10^{-8}}&           0&           0&           0&           0\\
            0&{\scriptstyle -1.54\cdot 10^{-17}}&{\scriptstyle -4.41\cdot 10^{-10}}& {\scriptstyle 3.33\cdot 10^{-16}}& {\scriptstyle 1.84\cdot 10^{-11}}& {\scriptstyle 2.93\cdot 10^{-10}}&           0&           0& {\scriptstyle 6.13\cdot 10^{-5}}& {\scriptstyle 1.05\cdot 10^{-16}}
\end{array}\right] ,$$
}

\noindent
and the computed eigenvalues of the pencil $(A-BK)-\lambda E$ are $
 -3.0000,
  8.6823 \pm 12.732i,
 -6.7362,
 -0.13415,
  0.40521,
  0.94064,
 -2.4631,
  1.5121 \pm 1.7799i
$, see Figure \ref{fig_poles}.




\begin{figure}[h!]
\centering
%\subfigure[{\em Magnitude ranges of elements for the original matrix and the balanced matrix $B$}]{
\includegraphics[scale=0.429]{poles_bw.eps}
%\subfigure[{\em Maximal magnitude of elements for the original matrix and the balanced matrix $B$}]{
\includegraphics[scale=0.429]{poles_zoomin_bw.eps}
\caption{\rm\em Desired poles in $\Gamma$ and the computed poles}\label{fig_poles}
\end{figure}


\noindent
It has to be mentioned here that the matrices $A-BK$ and $E$ were balanced prior to eigenvalue computation, and as we can see, only one eigenvalue was assigned correctly. On the other hand, when we applied the balancing algorithm on the matrices $A$, $B$ and $E$, obtaining $D_{l}=\text{diag}(10^{3},10^{-2},10^{-6},10^{-2},10^{-2},10^{-3},$ $10^{-3},10^{1},10^{-16},10^{-2})$ and $D_{r}=\text{diag}(10^{4},10^{10},10^{4},10^{-2},10^{3},10^{3},10^{1},10^{3},10^{9},\linebreak10^{-4})$, and the balanced matrices $A_{b}=D_{l}AD_{r}$, $B_{b}=D_{l}B$, $E_{b}=D_{l}ED_{r}$, the pole assignment algorithm applied to the balanced matrices produced the feedback matrix

{\setlength\arraycolsep{1pt}\footnotesize
$$K_{b}\!\!=\!\!\!\left[ \begin{array}{cccccccccc}
  {\scriptstyle 1.30\cdot 10^{-4}}&           0&           0&           0&           0&           0&           0&           0&           0&           0\\
            0& {\scriptstyle 7.49\cdot 10^{-1}}&{\scriptstyle -\!6.66\cdot 10^{0}}&{\scriptstyle -\!1.00\cdot 10^{0}}&{\scriptstyle -\!1.28\cdot 10^{-1}}&{\scriptstyle -\!2.04\cdot 10^{0}}&           0&           0&           0&           0\\
            0&           0&           0&           0&           0&           0& {\scriptstyle 3.91\cdot 10^{8}}& {\scriptstyle 2.93\cdot 10^{8}}& {\scriptstyle 7.45\cdot 10^{8}}&{\scriptstyle -\!4.3341\cdot 10^{7}}
\end{array}\right]\!\! .$$
}

\noindent
In this case, all computed eigenvalues of the pencil $A_{b}-B_{b}K_{b}-\lambda E_{b}$ have at least 8 correct digits. We also observed that in our experiments, where the columns of $B$ were badly scaled, the system became numerically uncontrollable, reducing the number of eigenvalues that can be assigned. Thus, the variant R of the balancing algorithm is applicable and useful for this problem.

\subsection{Conclusion of the test results}
From the presented test results we can conclude the following. The variant S is most suitable for the control theory problems that include orthogonal transformations of the matrices $A$ and $E$ from both sides, and $B$ only from the left, such as frequency response computing, staircase reduction and the pole assignment problem. If this variant does not balance $B$ in a satisfactory way because the rows of $B$ are worse scaled than the rows of $A$ and $E$, then usage of the variant W is recommended. In addition, if $B$ has badly scaled columns and the algorithms involving rank revealing are to be applied, such as staircase reduction and pole assignment problem, then the variant R should be used.

At the end, we would like to illustrate sensitivity issues of a badly scaled system demonstrated in this section by an example coming from a real application. The descriptor system of the form (\ref{descrsyst}) is derived as a power system model, and this example is based on the Brazilian interconnection power systems (BIPS) model relating to a 1998 heavy load condition. Matrices of the system can be found as example \texttt{bips98\_606}, which is part of the Rommes group in UF Sparse Matrix Collection \cite{rommes}. The dimensions are: $n=7135$ and $m=p=4$. The matrix $A$ is very badly scaled, with the smallest element equal to $1.4341\cdot 10^{-17}$ and the largest equal to $10^{20}$ on several entries, causing a large condition number $\kappa_{2}(A)=3.1321\cdot 10^{26}$. The routine \texttt{dg3bal} balanced all three matrices successfully. On the other hand, since $A$ has a large norm, the real effect of the balancing is observed when trying to determine the controllable part of the original system. When applied to the original system, the routine \texttt{TG01HD} returned without finding the controllable part, while when applied to the balanced system, the routine returned the controllable part of dimension 616. Clearly, we would not be able to solve this problem for this particular example without balancing.

\section{Conclusion}
In this paper, three versions of an algorithm for balancing three matrices simultaneously are proposed. Balancing is performed via diagonal transformations and the goal is to reduce the range of the order of magnitude for all elements of the involved matrices. We illustrated its application with the reduction to the $m$-Hessenberg--triangular--triangular form of three matrices $A$, $B$ and $E$, which is used for efficient computation of the frequency response matrix $\mathcal{G}(\sigma )=C(\sigma E-A)^{-1}B+D$ in case of a descriptor system, with the pole assignment problem via state feedback, and with finding the controllable part of the system. The reduction algorithm can produce a very inaccurate result for badly scaled matrices. The basic variant balances rows and columns of $A$ and $E$, and only rows of $B$, since computing $\mathcal{G}(\sigma )$ is invariant under such transformations. The weighted variant of the balancing algorithm gives more weight to balancing of elements of $B$, since the basic algorithm can produce well balanced $A$ and $E$ and poorly balanced $B$. The third variant offers a possibility of balancing columns of $B$ as well. Numerical experiments confirmed that balancing matrices $A$, $B$ and $E$ before the $m$-Hessenberg--triangular--triangular reduction produces an accurate frequency response matrix, as well as accurate pole assignment via state feedback. In case when the controllable part of the system is sought, the answer might not be obtained when the system is badly scaled. Balancing is very important for this kind of problem.

\section*{Acknowledgments}
The author is indebted to Zlatko Drma\v{c} for reading the paper and giving many valuable comments.


\begin{thebibliography}{99}

\bibitem{lapack}
    {\sc E.\,Anderson, Z.\,Bai, C.\,Bischof, S.\,Blackford, J.\,Demmel, J.\,J. Dongarra,  J.\,Du Croz, A.\,Greenbaum, S.\,J.\,Hammarling, A.\,McKenney, D.\,C.\,Sorensen}, {\em LAPACK Users' Guide}, Third ed., SIAM, Philadelphia,  1999.

\bibitem{betcke}
    {\color{black} {\sc T.\,Betcke}, {\em Optimal scaling of generalized and polynomial eigenvalue problems}, SIAM J. Matrix Anal. Appl. {\bf 30}(2008), 1320---1338.}

\bibitem{bjorck}
    {\sc \AA.\,Bj\"{o}rck}, {\em Least squares methods}, in: {\em Handbook of Numerical Analysis, Volume I}, (P. G. Ciarlet et al., Eds.), North-Holland, Amsterdam, 1990, 465--652.

\bibitem{ttmh}
    {\sc N.\,Bosner}, {\em Efficient algorithm for simultaneous reduction to the $m$-Hessenberg--triangular--triangular form}, BIT, 2014, DOI: 10.1007/s10543-014-0516-y
    %Tech. rep., University of Zagreb, 2011.

\bibitem{blockmhess}
    {\sc N.\,Bosner, Z.\,Bujanovi\'{c}, Z.\,Drma\v{c}}, {\em Efficient generalized Hessenberg form and applications}, {ACM} Trans. Math. Softw. {\bf 39}(2013), 19:1--19:19.

\bibitem{genconjgrad}
    {\sc P.\,Concus, G.\,H.\,Golub, D.\,P.\,O'Leary}, {\em A generalized conjugate gradient method for the numerical solution of elliptic partial differential equations}, in: {\em Sparse Matrix Comput., Proc. Symp. Lemont 1975}, Academic Press, New York, 1976, 309--332.

\bibitem{coxhigham}
    {\sc A.\,J.\,Cox, N.\,J.\,Higham}, {\em Stability of Householder QR factorization for weighted least squares problems}, in: {\em Numerical Analysis 1997, Proceedings of the 17th Dundee Biennial Conference, Pitman Research Notes in Mathematics, vol. 380}, (D\,F.\,Griffiths, D.\,J.\,Higham, G.\,A.\,Watson, Eds.), Addison-Wesley, Longman, Harlow, Essex, UK, 1998, 57--73.

\bibitem{curtisreid}
    {\sc A.\,R.\,Curtis, J.\,K.\,Reid}, {\em On the automatic scaling of matrices for Gaussian elimination},  J. Inst. Math. Appl. {\bf 10}(1972), 118--124.

\bibitem{demmelveselic}
    {\sc J.\,Demmel, K.\,Veseli\'{c}}, {\em Jacobi's method is more accurate than QR}, SIAM J. Matrix Anal. Appl. {\bf 13}(1992), 1204--1245.

\bibitem{linpack}
    {\sc J.\,J.\,Dongarra, J.\,R.\,Bunch, C.\,B.\,Moler, G.\,W.\,Stewart}, {\em LINPACK User's Guide}, SIAM, Philadelphia, 1979.

\bibitem{duan}
    {\sc G.-R.\,Duan}, {\em Analysis and Design of Descriptor Linear Systems}, Springer, New York, 2010.

\bibitem{golubvanloan}
    {\sc G.\,H.\,Golub, C.\,F.\,van Loan}, {\em Matrix Computations}, Third ed., M. D. Johns Hopkins University Press, Baltimore, 1996.

\bibitem{greenbaum}
    {\sc A.\,Greenbaum}, {\em Iterative Methods for Solving Linear Systems}, SIAM, Philadelphia, 1997, Chapter 3.

\bibitem{hestenesstiefel}
    {\sc M.\,R.\,Hestenes, E.\,Stiefel}, {\em Methods of conjugate gradients for solving linear systems},  J. Res. Natl. Bur. Stand. {\bf 49}(1952), 409--436.

\bibitem{higham}
    { {\sc N.\,J.\,Higham}, {\em Accuracy and Stability of Numerical Algorithms}, Second ed., SIAM, Philadelphia,  2002.}

\bibitem{lemoniervandooren}
    {\sc D.\,Lemonnier, P.\,Van Dooren}, {\em Balancing regular matrix pencils},  SIAM J. Matrix Anal. Appl. {\bf 28}(2006), 253--263.

\bibitem{miminispaige}
    {\sc G.\,S.\,Miminis, C.\,C.\,Paige}, {\em An algorithm for pole assignment of time invariant linear systems}, Int. J. Control {\bf 35}(1982), 341--354.

\bibitem{paigenumalgcompcontr}
    {\sc C.\,C.\,Paige}, {\em Properties of numerical algorithms related to computing controllability}, IEEE Trans. Autom. Control {\bf 26}(1981), 130--138.

\bibitem{parlett}
    {\sc B.\,N.\,Parlett}, {\em The Symmetric Eigenvalue Problem}, Prentice--Hall Series in Computational Mathematics, Prentice--Hall, Inc., Englewood Cliffs, New Jersey, 1980.

\bibitem{parlettreinsch}
    {\sc B.\,N.\,Parlett, C.\,Reinsch}, {\em Balancing a matrix for calculation of eigenvalues and eigenvectors}, Numer. Math. {\bf 13}(1969), 293--304.

\bibitem{patelmisra}
    {\sc R.\,V.\,Patel, P.\,Misra}, {\em Numerical algorithm for eigenvalue assignment by state feedback}, Proc. IEEE {\bf 72}(1984), 1755--1764.

\bibitem{powellreid}
    {\sc M.\,J.\,D.\,Powell, J.\,K.\,Reid}, {\em On applying Householder transformations to linear least squares problems}, Tech. report T.P. 322, Mathematics Branch, Theoretical Physics Division, Atomic Energy Research Establishment, Harwell, UK, February 1968.

\bibitem{rommes}
    {\em Rommes group}, \texttt{http://www.cise.ufl.edu/research/sparse/matrices/Rommes/}

\bibitem{simoncini}
    {\sc V.\,Simoncini}, {\em Restarted full orthogonalization method for shifted linear systems}, BIT {\bf 43}(2003), 459--466.

\bibitem{simonciniperotti}
    {\sc V.\,Simoncini, F.\,Perotti}, {\em On the numerical solution of $(\lambda^{2}A+\lambda B+C)x=b$ and application to structural dynamics}, SIAM J. Scientific Comput. {\bf 23}(2002), 1876--1898.

\bibitem{slicot}
    {\em SLICOT}, \texttt{http://www.slicot.org}

\bibitem{vandersluis}
    {\sc A.\,van der Sluis}, {\em Condition numbers and equilibration of matrices}, Numer. Math. {\bf 14}(1969), 14--23.

\bibitem{vargacompiredgenstspreal}
    {\sc A.\,Varga}, {\em Computation of irreducible generalized state-space realizations}, Kybernetika {\bf 26}(1990), 89--106.

\bibitem{ward}
    {\sc R.\,C.\,Ward}, {\em Balancing the generalized eigenvalue problem}, SIAM J. Sci. Stat. Comput. {\bf 2}(1981), 141--152.

\bibitem{watkins}
    {\sc D. S. Watkins}, {\em A case where balancing is harmful},  Electron. Trans. Numer. Anal. {\bf 23}(2006), 1--4.

\end{thebibliography}
%\end{linenumbers}

\end{document} 