A new rank metric for convolutional codes

Let F[D] be the polynomial ring over a finite field F. Convolutional codes are submodules of F[D]^n that can be described by left prime polynomial matrices. In the last decade there has been great interest in convolutional codes equipped with a rank metric, called the sum rank metric, due to their wide range of applications in reliable linear network coding. However, this metric is suitable only for delay-free networks. In this work we continue this thread of research and introduce a new metric that overcomes this restriction and is therefore suitable for more general networks. We study this metric and provide characterizations of the distance properties in terms of the polynomial matrix representations of the convolutional code. Convolutional codes that are optimal with respect to this new metric are investigated and concrete constructions are presented. These codes are the analogs of Maximum Distance Profile convolutional codes in the context of network coding.
Moreover, we show that they can be built upon a class of superregular matrices, with entries in an extension field, that preserve their superregularity properties even after multiplication with some matrices with entries in the ground field.


Introduction
Within the area of coding theory, network coding has been a very active topic of research as it provides an effective tool to disseminate information (packets) over networks. Mathematically, we can consider the transmitted packets as columns of a matrix with entries in a finite field F_q, and the linear combinations performed in the nodes of the network correspond to column operations on this matrix. If no errors occur during the transmission over such a network, the F_q-column space of the transmitted matrix remains invariant. To achieve reliable communication over this channel, matrix codes are employed, forming the so-called rank metric codes (see [31]). Rank metric codes such as Gabidulin codes are known to be able to protect packets in such a scenario. We call these codes one-shot codes, as they use the (network) channel only once. However, coding can also be performed over multiple uses of the network (multi-shot) in a sequential fashion, as has been recently shown by several authors, see for instance [5,17,22,26,32].
The general idea of multi-shot network coding stems from the fact that creating dependencies among the sequence of transmitted codewords (subspaces) of different shots can improve the error-correction capabilities of the code. A first attempt to explain multi-shot network coding was presented in [26], where a type of concatenated n-shot codes (n ≥ 1) was proposed based on a multilevel code (see also [24]). Apart from concatenated codes, another very natural way to spread redundancy across codewords is by means of convolutional codes (see [13,21]). Adapting this class of codes to the context of networks gave rise to rank metric convolutional codes (see [5,17,18,32]). The work in [32] pioneered this direction by presenting the first class of rank metric convolutional codes together with a decoding algorithm able to deal with errors, erasures and deviations. However, the results were only valid for unit memory convolutional codes, and in [5,17,18] an interesting and more general class of rank metric convolutional codes was introduced to cope with network streaming applications. For a more general and theoretical framework for rank metric convolutional codes we refer to [22].
The first metric proposed in this context was the active column sum rank distance in [32], defined using the state trellis graph of the convolutional code. The j-th active column sum distance only considers sequences that are constructed by exiting the zero state of the trellis at time instant 0 and not re-entering it for 1 ≤ t ≤ j − 1. Thus, some sequences are not considered in the time interval [0, j], and therefore this metric is not sufficient to guarantee decoding within a time interval. Hence, a new distance, called the sum rank distance, was introduced as a generalization of the active column rank distance and of the rank distance used for one-shot network coding (see [18], [20] and [26]). This new distance has proven to be the proper notion for dealing with networks that are delay-free. In delay-free networks it is assumed that the natural delay in the transmission (due, for instance, to the delay of the nodes) is so small that it can be disregarded.
In this work, we continue this thread of research and consider convolutional codes tailor-made to handle networks with delays. We show that the previous metrics are not enough to characterize the error correction capability of the code in these networks (see Example 1) and consider a new (rank) metric, called the column rank distance, that solves this problem. This distance is the rank analog of the so-called column distance of Hamming convolutional codes, see [9,12,13]. It can also be seen as the standard rank distance of a block code, but the code considered here is nonlinear as it is derived from truncated convolutional codes, see the definition of the column block code in (7). Hence, previously known results on rank metric linear block codes cannot be straightforwardly applied in this case. We will show that the column rank distance characterizes the error correcting capability of the convolutional code within a time interval, in more general network channels (see Theorem 4). Moreover, such a characterization leads to an efficient algorithm to recover the lost packets (see Example 2). We also present concrete constructions of convolutional codes that are optimal with respect to the column rank distance, using a particular type of superregular matrices that preserve superregularity after some linear operations.
In Sect. 2, we present fundamental results on classical one-shot network coding, multi-shot network coding and convolutional codes. With this basic knowledge we introduce, in Sect. 3, the metrics used so far for multi-shot network coding and introduce a new rank metric, presenting also some of its characterizations (see Theorem 3) and properties (see Theorem 4). Section 4 is devoted to convolutional codes that are optimal with respect to this metric, called Maximum Rank Profile codes, and we provide a characterization of them in terms of the sliding matrix of the code (see Theorem 6). We also present novel results on the optimization of the recovery of packets when this class of optimal codes is employed (see Theorem 7). In Sect. 5 we discuss and solve the problem of the existence and construction of these codes. We prove that this problem can be reduced to the construction of certain classes of matrices called superregular matrices (see Theorem 8) and present a concrete class of such matrices. The problem of deriving superregular matrices to build convolutional codes has become an active area of research, see for instance [2,6,10], and the results presented in this section extend the known results on this topic.

Preliminaries
In order to state our results more precisely, we introduce in this section the necessary material and notation on the standard theory of (multi-shot) network coding and convolutional codes.
Let q be a prime power, let F_q denote the finite field with q elements, and let M > 1 be an integer. It is well known that there always exists a primitive element α of the extension field F_{q^M} such that F_{q^M} is isomorphic to F_q[α]. Moreover, F_{q^M} is isomorphic (as a vector space over F_q) to the vector space F_q^M. One then easily obtains the isomorphic description of matrices over the base field F_q as vectors over the extension field, i.e., F_q^{M×n} ≅ F_{q^M}^n.

From one-shot to multi-shot network coding

The network model: one-shot
Let v ∈ F_{q^M}^n, called the channel packet, represent the n packets of length M to be sent through the network. We shall follow the approach of [18] and consider a rank-deficiency channel for one shot given by x = vA, where x ∈ F_{q^M}^n represents the received packets and A ∈ F_q^{n×n} is the rank deficiency channel matrix. The channel matrix A corresponds to the overall linear transformations applied by the network over the base field F_q and it is known by the receiver (as the combinations are carried in the header bits of the packets). For perfect communication we have rank(A) = n, but failing and deactivated links may cause a rank deficient channel matrix. We call n − rank(A) the rank deficiency of the channel. In order to protect information in this setting, rank metric codes are used.
A rank metric code C is defined as a nonempty subset of F_q^{M×n} equipped with the distance measure d_rank(V, W) = rank(V − W), where V, W ∈ F_q^{M×n} (see [14]). The rank distance of a code C ⊂ F_q^{M×n} is defined as d_rank(C) = min{d_rank(V, W) : V, W ∈ C, V ≠ W}. Although rank metric codes live in F_q^{M×n}, they are usually constructed as linear block codes of length n over the extension field F_{q^M} (see [14]). Then, with some obvious abuse of notation, we will sometimes use d_rank(C) when C ⊂ F_{q^M}^n. Consider linear codes over F_{q^M} and use k for their dimension. A linear (n, k) rank metric code over F_{q^M} satisfies the following analog of the Singleton bound: d_rank(C) ≤ n − k + 1. A code that achieves this bound is called Maximum Rank Distance (MRD). Gabidulin codes, introduced in [8], are a well-known class of MRD codes, see also [7] and [25].
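To make the base-field/extension-field identification and the rank distance concrete, here is a small self-contained sketch over F_2 with extension degree M = 3. The symbol values and the bit-vector representation are illustrative assumptions, not part of the paper's constructions.

```python
# Toy sketch: a vector over F_{2^3} is identified with a 3 x n binary
# matrix (each symbol unfolds to its coordinate column w.r.t. a fixed
# basis), and the rank distance of two such vectors is the F_2-rank of
# their difference (XOR in characteristic 2).
M = 3  # extension degree

def unfold(v):
    """List of M-bit ints (symbols of F_{2^M}) -> M x n binary matrix."""
    return [[(sym >> i) & 1 for sym in v] for i in range(M)]

def gf2_rank(mat):
    """Rank over F_2; rows are packed into int bitmasks and eliminated."""
    rank, rows = 0, [int("".join(map(str, r)), 2) for r in mat]
    while rows:
        r = rows.pop()
        if r:
            rank += 1
            lead = 1 << (r.bit_length() - 1)   # pivot position
            rows = [x ^ r if x & lead else x for x in rows]
    return rank

v = [0b001, 0b010, 0b011]              # codeword symbols (toy values)
w = [0b001, 0b000, 0b000]              # received symbols (toy values)
diff = [a ^ b for a, b in zip(v, w)]   # subtraction in characteristic 2
print(gf2_rank(unfold(diff)))          # rank distance d_rank(v, w) -> 2
```

Note that the rank distance of the two vectors (here 2) can be strictly smaller than their Hamming distance (here 2 as well, but e.g. repeated error symbols only count once towards the rank).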

Multi-shot
In many situations the sender needs to transmit not one single vector v ∈ F_{q^M}^n of data but a sequence of vectors. Suppose that we have a stream of source packets u_0, u_1, . . . to be transmitted. Then, the idea of multi-shot network coding is that instead of encoding each u_t into v_t independently of the other u_i, one can generate each codeword v_t based on u_t and the previous u_i, i ≤ t, to improve the error correcting capabilities of the code.
The multi-shot setting can be described as follows: the transmitter receives at each time instant t a source packet u_t ∈ F_{q^M}^k (constituted by a set of k packets). A channel packet v_t ∈ F_{q^M}^n (constituted by a set of n packets) is constructed using not only u_t but also the previous source packets u_0, . . ., u_{t−1}. Then, the channel packet v_t is sent through the network at each shot (time instant) t. The receiver observes not only a combination of the n packets sent at instant t but also of previous packets sent at instants i < t. Hence, following the operator channel in (1), at each shot t the received packets x_t ∈ F_{q^M}^n are linear combinations of the packets of v_t and, if there is delay in the transmission, also combinations of the previous packets v_0, . . ., v_{t−1}. Hence, for any j ≥ 0, we have x_{[0,j]} = v_{[0,j]} A_{[0,j]}, where x_{[0,j]} = (x_0, . . ., x_j), v_{[0,j]} = (v_0, . . ., v_j), and A_{[0,j]} is a block lower triangular truncated channel matrix whose blocks A_i^ℓ are square (not necessarily nonsingular) matrices of order n, for i ≤ ℓ ≤ j. The rank deficiency during the time interval [0, j] is n(j + 1) − rank(A_{[0,j]}), and it measures the amount of information lost during this interval.
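The rank deficiency of a truncated channel matrix can be sketched with a toy two-shot example over F_2 (j = 1, n = 2). All block values below are hypothetical; the off-diagonal block plays the role of the delayed contribution of shot 0 to shot 1.

```python
# Toy sketch: a block lower triangular truncated channel matrix A_[0,1]
# over F_2 with n = 2 packets per shot.  One diagonal block is rank
# deficient (a failing link) and the off-diagonal block models delayed
# packets.  Rank deficiency of the window = n(j+1) - rank(A_[0,1]).
def gf2_rank(mat):
    """Rank over F_2 via Gaussian elimination on bitmask rows."""
    rank, rows = 0, [int("".join(map(str, r)), 2) for r in mat]
    while rows:
        r = rows.pop()
        if r:
            rank += 1
            lead = 1 << (r.bit_length() - 1)
            rows = [x ^ r if x & lead else x for x in rows]
    return rank

n, j = 2, 1
A_01 = [
    [1, 0, 0, 0],   # [ diagonal block, shot 0 |   0                ]
    [0, 1, 0, 0],
    [0, 1, 1, 1],   # [ delayed block          | diagonal block, shot 1 ]
    [0, 0, 0, 0],   #   ... whose second row is lost (failing link)
]
deficiency = n * (j + 1) - gf2_rank(A_01)
print(deficiency)   # information lost in the window [0, 1] -> 1
```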
So far, this channel model has not been proposed nor addressed in the literature in this generality; only the delay-free case has been considered, see [18] and references therein.
In the delay-free case only combinations of packets of v_t arrive at time instant t, and not of packets of v_i, i < t; therefore in this case the channel matrix A_{[0,j]} is block diagonal, i.e., all its off-diagonal blocks are zero. This assumption constitutes a restriction, as there may exist a time-varying delay in the end-to-end transmission due to link delays. As we shall see, the extension to the above more general setting is not straightforward. In the next section, we introduce a metric that suits this framework.
Note that, in practice, the receiver can compute the rank deficiency as each packet carries a label identifying the shot (or generation) to which it corresponds. The possibility of using multi-shot coding in the context of network coding was already observed in the seminal papers [14] and [31]. In this setting, convolutional codes are a natural way to protect and process information in a sequential manner.

Convolutional codes
As opposed to block codes, convolutional codes treat the information as a stream of data. If we introduce a variable D, usually called the delay operator, to indicate the time instant in which each piece of information arrived or each codeword was transmitted, then we can represent the transmitted sequence as v(D) = Σ_t v_t D^t. Similarly, we can represent the information vector to be encoded as u(D) and the encoder (typically implemented by means of a linear finite-state shift register) as G(D). Formally, we can define convolutional codes as follows (see [9]): a convolutional code C of rate k/n is a submodule of F_{q^M}[D]^n of rank k such that there exists a polynomial matrix G(D) ∈ F_{q^M}[D]^{k×n}, called a generator matrix, that is basic, i.e., there exists a polynomial matrix X(D) such that G(D)X(D) = I_k (that is, G(D) has a polynomial right inverse), with the property that C = {v(D) = u(D)G(D) : u(D) ∈ F_{q^M}[D]^k}. Two generator matrices G_1(D) and G_2(D) generate the same convolutional code if there exists a unimodular matrix U(D) such that G_2(D) = U(D)G_1(D). The degree δ of a convolutional code C is the maximum of the degrees of the determinants of the k × k sub-matrices of one, and hence any, generator matrix of C. The degree of C is the minimum size of a state space realization of C (the McMillan degree of the corresponding linear system, see [29]). For this reason, we feel that it is the single most important code parameter besides the transmission rate k/n. Note that a block code is a convolutional code with δ = 0. For more general classes of convolutional codes see [3,23]. In the sequel, we adopt the notation of McEliece [21, p. 1082] and call a convolutional code of rate k/n and degree δ an (n, k, δ)-convolutional code.
If we write v(D) = v_0 + v_1 D + · · · + v_ℓ D^ℓ and represent G(D) as a matrix polynomial G(D) = G_0 + G_1 D + · · · + G_ν D^ν, where G_ν ≠ 0 and G_i = 0 for i > ν, the truncated sliding generator matrix G_trunc(j) is defined, for any j ≥ 0, as the (j + 1)k × (j + 1)n block matrix whose block in position (s, t) is G_{t−s} if 0 ≤ t − s ≤ ν and zero otherwise (5). A truncated codeword can then be represented as v_{[0,j]} = (v_0, . . ., v_j) = (u_0, . . ., u_j) G_trunc(j). For each channel packet it holds that v_t = Σ_{i=0}^{t} u_{t−i} G_i. Define now the j-th column block code of C as C_j^c = {(v_0, . . ., v_j) : v(D) ∈ C and v_0 ≠ 0} (7). Note that this block code is nonlinear (over F_{q^M}) as the zero vector is not in C_j^c. In this work we will be primarily interested in the correction capabilities of this nonlinear column block code rather than in the linear (over F_{q^M}[D]) code C. The code C can be equivalently described using an (n − k) × n full rank polynomial parity-check matrix H(D) = H_0 + H_1 D + · · · + H_m D^m, defined by C = {v(D) ∈ F_{q^M}[D]^n : H(D)v(D) = 0}, and the associated truncated sliding parity-check matrix H_trunc(j), the block lower triangular matrix whose block in position (s, t), t ≤ s, is H_{s−t}, with H_j = 0 when j > m, j ∈ N.
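The block layout of the truncated sliding generator matrix can be assembled in a few lines. The coefficient matrices G_0, G_1 below are toy illustrative values, not the paper's construction; only the block structure matters.

```python
# Sketch: assemble G_trunc(j) from coefficients [G_0, ..., G_nu] so that
# (v_0, ..., v_j) = (u_0, ..., u_j) * G_trunc(j), i.e. v_t = sum_i u_{t-i} G_i.
def sliding_generator(G, j, k, n):
    """G = [G_0, ..., G_nu], each a k x n matrix given as a list of rows."""
    nu = len(G) - 1
    zero = [[0] * n for _ in range(k)]
    # block (s, t) is G_{t-s} when 0 <= t-s <= nu, otherwise the zero block
    blocks = [[G[t - s] if 0 <= t - s <= nu else zero for t in range(j + 1)]
              for s in range(j + 1)]
    # flatten the block structure into a k(j+1) x n(j+1) matrix
    return [sum((blocks[s][t][r] for t in range(j + 1)), [])
            for s in range(j + 1) for r in range(k)]

G0, G1 = [[1, 1]], [[0, 1]]      # k = 1, n = 2, memory nu = 1 (toy values)
Gt = sliding_generator([G0, G1], 2, 1, 2)
for row in Gt:
    print(row)
```

For these toy coefficients the result is the upper block triangular matrix with G_0 on the diagonal and G_1 on the first superdiagonal, matching the definition of G_trunc(2).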

Metrics for multi-shot network coding
The sum rank distance, the distance that has been widely considered for multi-shot network coding, was first introduced in [26] under the name of extended rank distance. We note that the sum rank distance has also been used in the context of block codes to reduce their decoding complexity, see for instance [20].
Let v = (v_0, . . ., v_t) and w = (w_0, . . ., w_t) be two (t + 1)-tuples of vectors in F_{q^M}^n. The sum rank distance (SRD) between them is SRD(v, w) = Σ_{i=0}^{t} rank(v_i − w_i), and, for 0 ≤ j ≤ t, the j-th Column Sum Rank Distance (j-th CSRD) of a convolutional code C is d_j^SR(C) = min{Σ_{i=0}^{j} rank(v_i) : v(D) ∈ C and v_0 ≠ 0}. In [32], concrete decoding algorithms for unit memory rank metric convolutional codes were presented using another distance, namely the active rank distance. However, in [18] it was shown that this metric fails to fully determine the error-correcting capabilities of rank metric convolutional codes with arbitrary memory, and the j-th CSRD needs to be considered.
Moreover, in [22, Theorem 4.1] a Singleton-type upper bound for the free sum rank distance (FSRD) of C was derived in a different setting. This can be adapted easily to our context to obtain: d_SR(C) ≤ (n − k)(⌊δ/k⌋ + 1) + δ + 1 (9). Note that this bound coincides with the so-called generalized Singleton bound of (Hamming) (n, k, δ)-convolutional codes (see [27] and [28]). Thus, the bound (9) could also be derived from the fact that the rank distance is upper bounded by the Hamming distance (see [8,19]).
Although some results were presented in [30], the problem of the existence and construction of rank metric convolutional codes whose free rank distance achieves the bound (9) remains open.
The following result was obtained in [18] and shows that the j-th CSRD can be used to characterize the error-correcting capabilities of multi-shot codes in the context of delay-free networks.
Theorem 1 [18, Theorem 2] Let A_{[0,j]} be the truncated block diagonal channel matrix. Then u_0 is recoverable at time instant j if n(j + 1) − rank(A_{[0,j]}) < d_j^SR(C).

The following example illustrates that, when the network has delays in the transmission of packets, i.e., when A*_{[0,j]} is not necessarily block diagonal, the j-th CSRD fails to characterize the rank deficiency correcting capability of C.
It is easy to see that d_1^SR(C) = 2 and that there exists a v(D) ∈ C attaining it. By Theorem 1 one can always recover u_0 in delay-free networks as long as the j-th CSRD of C is larger than the rank deficiency in the window [0, j], i.e., if n(j + 1) − rank(A_{[0,1]}) < d_j^SR(C). Thus, in this example one can recover u_0 if 6 − rank(A_{[0,1]}) ≤ 1 or, equivalently, if rank(A_{[0,1]}) ≥ 5. However, in the presence of delays in the network this does not necessarily hold. Take a channel matrix A_{[0,1]} with delays that has rank equal to 5 and yields A_{[0,1]} v_{[0,1]} = 0; then v_{[0,1]} is indistinguishable from the zero sequence and therefore cannot be corrected.
For the general case, i.e., for not necessarily delay-free networks, we introduce the j-th Column Rank Distance (j-th CRD) for convolutional codes as follows: d_j^CR(C) = min{rank(v_0, v_1, . . ., v_j) : v(D) ∈ C and v_0 ≠ 0}, where (v_0, . . ., v_j) is regarded as a matrix over F_q via the identification described in Sect. 2. Note that this metric is the rank metric of the block code C_j^c, but such a code is nonlinear and therefore existing results on linear rank metric codes cannot be directly applied to our case. Clearly, d_j^CR(C) ≤ d_j^SR(C) for any j ≥ 0. Therefore, we have the following Singleton bound for the column rank distance: d_j^CR(C) ≤ (n − k)(j + 1) + 1 (12). Next, we show that the j-th CRD characterizes the recovery capability within an interval over a channel that admits delays in the transmission. We then investigate how to build convolutional codes with designed j-th column rank distance. For the sake of clarity, we first revise the linear block case and the characterization of the rank distance in terms of the properties of the corresponding parity-check matrices.
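The inequality d_j^CR(C) ≤ d_j^SR(C) reflects the fact that the rank of a horizontally concatenated matrix never exceeds the sum of the ranks of its blocks. A toy check over F_2 (with hypothetical symbol matrices, viewed as 3 x 2 binary matrices per shot):

```python
# Sketch: for a truncated word (v_0, v_1), the sum rank weight adds the
# ranks shot by shot, while the column rank weight is the rank of the
# concatenation [V_0 | V_1].  The latter never exceeds the former, which
# is why d_CR <= d_SR.
def gf2_rank(mat):
    """Rank over F_2 via Gaussian elimination on bitmask rows."""
    rank, rows = 0, [int("".join(map(str, r)), 2) for r in mat]
    while rows:
        r = rows.pop()
        if r:
            rank += 1
            lead = 1 << (r.bit_length() - 1)
            rows = [x ^ r if x & lead else x for x in rows]
    return rank

V0 = [[1, 0], [0, 1], [0, 0]]          # shot 0: rank 2 (toy values)
V1 = [[1, 0], [0, 1], [0, 0]]          # shot 1: rank 2, same column space
sum_rank = gf2_rank(V0) + gf2_rank(V1)                     # sum rank weight
col_rank = gf2_rank([r0 + r1 for r0, r1 in zip(V0, V1)])   # column rank weight
assert col_rank <= sum_rank
print(sum_rank, col_rank)   # -> 4 2
```

Here the two shots repeat the same column space, so the column rank weight (2) is strictly smaller than the sum rank weight (4): exactly the gap between the two metrics.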
Lemma 2 Let C ⊂ F_{q^M}^n be a linear block code and let H ∈ F_{q^M}^{(n−k)×n} be a parity-check matrix of C. Then d_rank(C) = d if and only if both of the following conditions hold:

1. for every nonsingular matrix A ∈ F_q^{n×n}, every set of d − 1 columns of HA is linearly independent;
2. for every nonsingular matrix A ∈ F_q^{n×n}, there exist d linearly dependent columns of HA.

Proof The proof follows easily from [8, Theorem 1].
Next we derive a counterpart result in the context of convolutional codes. First, for any j ≥ 0, A_{[0,j]} will denote a nonsingular block lower triangular matrix with entries in the base field F_q, of the form (13).

Theorem 3 Let C be a convolutional code with truncated sliding parity-check matrix H_trunc(j). Then the following are equivalent:

1. d_j^CR(C) = d;
2. none of the first n columns of H_trunc(j)A_{[0,j]} is in the F_{q^M}-span of any other d − 2 columns of H_trunc(j)A_{[0,j]}, for all nonsingular block lower triangular A_{[0,j]} as in (13), and, moreover, one of the first n columns of H_trunc(j)A_{[0,j]} is in the F_{q^M}-span of other d − 1 columns of H_trunc(j)A_{[0,j]}, for some A_{[0,j]} as in (13).
Thus, H_trunc(j)A_{[0,j]} has d linearly dependent columns, indexed by a set S, and at least one element of S, say i_1, belongs to {1, . . ., n}. Now take w = (w_1, w_2, . . .). Since v_0 ≠ 0, at least one of the first n coordinates of v is nonzero. Take a nonsingular block lower triangular matrix A_{[0,j]} as in (13); then one of the first n columns of H_trunc(j)A_{[0,j]} is contained in the span of d − 2 other columns of H_trunc(j)A_{[0,j]}. By the first part of 2. we obtain a contradiction. The proof of (1. ⇒ 2.) follows the same reasoning.
We are now in a position to provide a sufficient condition to recover rank deficiencies within a given time interval.
Theorem 4 Let C = im G(D) be an (n, k, δ)-convolutional code and let A*_{[0,j]} be the truncated channel matrix, where j ≥ 0. Then u_0 is recoverable at time instant j if n(j + 1) − rank(A*_{[0,j]}) < d_j^CR(C) (14).

Proof As G(D) is basic, G(0) is full row rank and therefore, to recover u_0, it is enough to decode v_0. Obviously, if rank A*_{00} = n there is no rank deficiency at time instant zero and v_0 can be immediately recovered. Assume then that rank A*_{00} < n. The received vector x = (x_0, . . ., x_j) ∈ F_{q^M}^{n(j+1)} and A*_{[0,j]} are known, and we aim to find v = (v_0, v_1, . . ., v_j) such that x = vA*_{[0,j]} and H_trunc(j)(v_0, v_1, . . ., v_j) = 0. We will show that v_0 is uniquely determined.
By Gaussian elimination, there exists a nonsingular block lower triangular matrix A_{[0,j]} ∈ F_q^{n(j+1)×n(j+1)} such that the nonzero columns of A*_{[0,j]} A_{[0,j]} are linearly independent. Clearly, the number of nonzero columns equals rank(A*_{[0,j]}). First we find v in order to obtain v_0. From this equation, rank(A*_{[0,j]}) coordinates of v can be uniquely determined. Notice that this system of equations always admits a nonzero solution. The remaining coordinates must satisfy the system of linear equations given by the parity checks. The unknowns in v satisfy a system of linear equations By = c, where y represents the vector of t unknowns, with t = n(j + 1) − rank(A*_{[0,j]}), and c is a constant vector (notice that some of the k(j + 1) equations may be trivial). This system is always solvable since H_trunc(j)v = 0. Now, by (14), t < d_j^CR(C). Let y_i be one of the unknowns of v_0. Since t − 1 ≤ d_j^CR(C) − 2, the column of B corresponding to y_i is not a linear combination of the other t − 1 columns, by Theorem 3. Therefore y_i can be uniquely determined. Since this is true for any unknown of v_0, we can recover the whole vector v_0. Finally, as A_{[0,j]}^{-1} is also a nonsingular block lower triangular matrix, v_0 is recovered as well.

Corollary 1 Assume that we have been able to correctly decode up to an instant T_0 − 1 and let A*_{[T_0,j]} be the truncated channel matrix in the window [T_0, j]. Then u_{T_0} is recoverable at time instant j ≥ T_0 if n(j − T_0 + 1) − rank(A*_{[T_0,j]}) < d_{j−T_0}^CR(C) (15).
Corollary 1 illustrates how the notion of j-th CRD captures the error correcting capability of a rank metric convolutional code in a network with possible delays, within a limited time interval.

MRD convolutional codes: a matrix characterization
A natural follow-up question is to determine bounds on the j-th CRD for a given set of parameters (n, k, δ). An upper bound on these distances is presented next.
Corollary 2 Let C be an (n, k, δ)-convolutional code. Then, for any j ≥ 0, d_j^CR(C) ≤ (n − k)(j + 1) + 1 (16).

Proof Note that if A_{[0,j]} has full row rank then A_{ii} is nonsingular for i = 0, . . ., j, and therefore H_0 A_{ii} also has full row rank since H_0 is full row rank (as H(D) is basic). Thus, H_trunc(j)A_{[0,j]} has full row rank, which implies, in particular, that any of the columns of H_trunc(j)A_{[0,j]} is a linear combination of at most (n − k)(j + 1) columns of H_trunc(j)A_{[0,j]}, in particular one of the first n columns. Then the result follows from the first part of the proof (2. ⇒ 1.) of Theorem 3.
Note that the Singleton bound on classical rank metric codes still holds for nonlinear codes, and therefore Corollary 2 would also follow directly from this fact.
Since no column distance can achieve a value greater than the Singleton-type upper bound in (12), there must exist an integer L for which the bound (16) can be attained for all j ≤ L and is a strict upper bound for j > L. It is a matter of straightforward computation to verify that this value is (see [11] for more details) L = ⌊δ/k⌋ + ⌊δ/(n − k)⌋ (17). An (n, k, δ)-convolutional code C with every d_j^CR(C) maximal, i.e., d_j^CR(C) = (n − k)(j + 1) + 1 for each j ≤ L, is called a Maximum Rank Profile (MRP) code. The j-th column rank distances of MRP codes increase as rapidly as possible for as long as possible. Fast growth of the column distances is an important property for codes to be used with sequential decoding, since they have the potential to correct a maximal number of errors per time interval.
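For concreteness, the largest index L up to which the column rank distance can be maximal follows directly from the code parameters. A one-line sketch (the sample parameters are illustrative):

```python
# Sketch: L = floor(delta/k) + floor(delta/(n-k)), the last index at which
# an MRP code can attain d_j^CR = (n-k)(j+1)+1, as in (17).
def L_max(n, k, delta):
    return delta // k + delta // (n - k)

print(L_max(3, 1, 2))   # (3,1,2) code: L = 2 + 1 = 3
print(L_max(2, 1, 3))   # (2,1,3) code: L = 3 + 3 = 6
```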
Another interesting property of MRP codes is that Theorem 4 can be strengthened when these codes are used. Suppose that all the packets have been recovered up to an instant T_0 − 1. In order to recover u_{T_0} as soon as possible, the condition of Corollary 1 is checked sequentially at T_0, T_0 + 1, . . . up to an instant, say T_f, when the condition is finally satisfied. Then, by Corollary 1, u_{T_0} can be recovered. However, if MRP codes are employed in the transmission, not only u_{T_0} but the complete vector (u_{T_0}, . . ., u_{T_f}) can actually be recovered at time instant T_f. This will be properly stated and proved next.
First we show that, as happens with the column (Hamming) distance, the maximality of the j-th column rank distance implies maximality of the previous i-th column rank distances, i = 0, 1, . . ., j − 1.
Lemma 5 If d_j^CR(C) = (n − k)(j + 1) + 1 for some j, then d_i^CR(C) = (n − k)(i + 1) + 1 for all i ≤ j.

Proof It is proved by contradiction, and it is enough to prove it for i = j − 1. Assume that d_{j−1}^CR(C) ≤ j(n − k), that is, there exists an A_{[0,j−1]} as in (13) such that one of the first n columns of H_trunc(j − 1)A_{[0,j−1]} is in the span of some other (n − k)j − 1 columns. Note that for all A_{[0,j]}, the matrix H_trunc(j)A_{[0,j]} is block lower triangular with blocks H′_s = Σ_{t=0}^{s} H_{s−t} A_{j−s}^{t+j−s}. As A_{jj} is full row rank, so is H_0 A_{jj}, and therefore one of the first n columns of H_trunc(j)A_{[0,j]} is in the span of some other (n − k)(j + 1) − 1 columns, contradicting the maximality of d_j^CR(C).

A characterization of MRP codes in terms of the full size minors of G_trunc(j)A_{[0,j]} is given next.
At this point we observe that this, in particular, implies that c_{sk+1} > sn for s = 1, 2, . . ., ℓ. Extend the square matrix L by taking the matrix formed by the last k(j − ℓ) columns of G_trunc(j)A_{[0,i]}, which we denote by J, and build a k(j + 1) × k(j + 1) matrix. We call the indices of the selected columns c_1, . . ., c_{(j+1)k}. Clearly, the indices satisfy c_{sk+1} > sn for s = 1, 2, . . ., j, and the corresponding full size minor has determinant equal to zero, as L is singular by construction. This means that 2. does not hold, which concludes the proof. The last statement readily follows from Lemma 5.
We are now able to prove a stronger version of Theorem 4 and Corollary 1 when an MRP code is employed. In this case, when condition (14) is satisfied at time T_f, then we can recover the whole vector (u_{T_0}, . . ., u_{T_f}). This property is particularly useful when erased packets must be recovered within tight delay constraints, as happens in many streaming applications.
Theorem 7 Let C = im G(D) be an MRP code. Assume that we have been able to recover all packets up to an instant T_0 and that T_f is the first instant such that condition (15) is satisfied. Then the whole vector (u_{T_0}, . . ., u_{T_f}) can be recovered.
Proof Denote T = T_f − T_0 and let A_{[T_0,T_0+j]} be the truncated channel matrix, 0 ≤ j ≤ T. Trying to recover u_{T_0} unsuccessfully up to T_f − 1 using an MRP code means that condition (15) fails for all j < T_f and holds at j = T_f. By Corollary 1, u_{T_0} (and obviously also v_{T_0}) can be recovered. Now shift one time instant forward and consider the time window [T_0 + 1, T_f]; thus again Corollary 1 guarantees that u_{T_0+1} can be computed. The same procedure is applied sequentially to retrieve the remaining u_{T_0+j}, j = 2, 3, . . ., T.

Superregular matrices to build MRP codes
In this section we introduce a class of matrices that will be essential for the construction of convolutional codes that possess optimal rank distance properties: superregular matrices. We will first explain the relation between superregular matrices and MRP codes. We then present a concrete class of superregular matrices that can be used to produce MRP codes. This, in particular, proves that MRP codes exist and that the bounds given above are optimal for this new rank metric.

Block Toeplitz superregular matrices
Superregular matrices are important in coding theory as they can be used to construct codes with optimal Hamming distance. Roughly speaking, this is due to the fact that a full row rank superregular matrix has the following property: take any one of its rows, with Hamming weight, say, d. Then, any combination of this row with t other rows yields a vector of Hamming weight ≥ d − t. In this paper we will show that a particular class of superregular matrices can also be used to build convolutional codes with optimal rank distance. Next, we formally introduce the notion of a superregular matrix. Let F = [μ_{ij}] be a square matrix of order m over F_{q^M} and let S_m be the symmetric group of order m. Recall that the determinant of F is given by |F| = Σ_{σ ∈ S_m} sgn(σ) μ_{1σ(1)} · · · μ_{mσ(m)} (18), where the sign of the permutation σ, sgn(σ), is 1 (resp. −1) according as σ can be written as a product of an even (resp. odd) number of transpositions. A trivial term of the determinant is a term of (18), μ_{1σ(1)} · · · μ_{mσ(m)}, equal to zero. If F is a square submatrix of a matrix B with entries in F_{q^M}, and all the terms of the determinant of F are trivial, we say that |F| is a trivial minor of B. We say that B is superregular if all its nontrivial minors are different from zero. Notice that any full size minor of G_trunc(j)A_{[0,j]} formed by more than sk columns among the first sn columns, with s ≤ j, is trivially zero. Indeed, if all the entries of G_trunc(j) are nonzero, then it holds by Theorem 6 that the maximum j-th column rank distance is achieved if and only if all the nontrivial full size minors of G_trunc(j)A_{[0,j]} are nonzero. This leads immediately to a sufficient condition to obtain MRP codes.
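The notions of trivial terms, trivial minors and superregularity can be checked by brute force on toy matrices over a prime field F_p. The matrices below are illustrative examples, not the paper's constructions:

```python
# Sketch: a term of the Leibniz expansion is *trivial* when it contains a
# zero entry; a minor is trivial when all its terms are; a matrix is
# superregular when every nontrivial minor is nonzero.  Brute force over
# all square submatrices (fine only for toy sizes).
from itertools import combinations, permutations

def sgn(perm):
    inv = sum(perm[i] > perm[j]
              for i in range(len(perm)) for j in range(i + 1, len(perm)))
    return -1 if inv % 2 else 1

def is_superregular(A, p):
    """Superregularity test for A over F_p (entries given as 0..p-1)."""
    m, n = len(A), len(A[0])
    for size in range(1, min(m, n) + 1):
        for rows in combinations(range(m), size):
            for cols in combinations(range(n), size):
                det, trivial = 0, True
                for perm in permutations(range(size)):
                    term = 1
                    for i, j in enumerate(perm):
                        term *= A[rows[i]][cols[j]]
                    if term % p != 0:
                        trivial = False   # at least one nonzero term
                    det += sgn(perm) * term
                if not trivial and det % p == 0:
                    return False          # a nontrivial minor vanishes
    return True

print(is_superregular([[1, 1], [1, 2]], 5))   # True: all minors nonzero
print(is_superregular([[1, 1], [1, 1]], 5))   # False: nontrivial minor is 0
```

Note that a zero entry is a trivial 1 x 1 minor, so it does not by itself destroy superregularity; only vanishing nontrivial minors do.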
Theorem 8 Let C = im G(D) ⊆ F_{q^M}[D]^n be an (n, k, δ)-convolutional code with L as in (17) and with all the entries of the coefficients G_i of G(D) nonzero. If G_trunc(L)A_{[0,L]} is superregular for every A_{[0,L]} as in (13), with j = L, then C is MRP.
Proof If the entries of the coefficients G_i of G(D) are all nonzero, then the conditions in Theorem 6 on the column indices amount to considering the nontrivial full size minors of G_trunc(j)A_{[0,j]}. By Lemma 5 it is enough to consider j = L. Finally, the nontrivial full size minors of G_trunc(L)A_{[0,L]} are nonzero, as G_trunc(L)A_{[0,L]} is assumed to be superregular.
The main goal of this section is to build a G(D) that satisfies the conditions of Theorem 8. To this end we will first recall a known G(D) having the property that G_trunc(L) is superregular and then show that G_trunc(L)A_{[0,L]} remains superregular. This last proof is not straightforward and requires recalling and modifying several previous results.
For the purposes of this paper we propose below a type of superregular matrices. This class of matrices was first introduced in the context of (Hamming) convolutional codes with q = 2 in [1] and will bring about a new class of convolutional codes with optimal rank distances. Recall that the extension field F_{q^M} is isomorphic to the M-dimensional vector space F_q^M over the base field F_q. A basis α_1, α_2, . . ., α_M ∈ F_{q^M} of this vector space is said to be normal when there exists α ∈ F_{q^M} such that α_i = α^{[i]} = α^{q^i} for all 1 ≤ i ≤ M. If so, α is called a normal element of F_{q^M}, and there is always at least one element that is both normal and primitive (see [15]). Hence, every element ϕ of F_{q^M} can be written as a linear combination over F_q of the elements of the basis, i.e., ϕ = Σ_{i=1}^{M} ϕ_i α^{[i]}, with ϕ_i ∈ F_q (19). We shall make extensive use of linearized polynomials, which are polynomials whose monomial terms carry a Frobenius power. Hence, if ϕ_i ∈ F_q are the coordinates of the element ϕ of F_{q^M}, as in (19), then the linearized polynomial f_ϕ(x) = Σ_{i=1}^{M} ϕ_i x^{[i]} satisfies f_ϕ(α) = ϕ. The following result follows straightforwardly from the Freshman's Rule.
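The Freshman's Rule, (a + b)^q = a^q + b^q in characteristic p, is the fact that makes the maps x ↦ x^{[i]} linear over F_q and hence underlies linearized polynomials. It can be checked exhaustively in a toy field; the sketch below uses F_{2^3} with the (assumed, standard) irreducible modulus x^3 + x + 1.

```python
# Toy check of the Freshman's Rule (a + b)^q = a^q + b^q in F_{2^3}
# (q = 2).  Field elements are 3-bit ints; addition is XOR and
# multiplication reduces modulo x^3 + x + 1 (an illustrative choice).
def gf8_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0b1000:
            a ^= 0b1011          # reduce modulo x^3 + x + 1
        b >>= 1
    return r

def gf8_pow(a, e):
    r = 1
    for _ in range(e):
        r = gf8_mul(r, a)
    return r

q = 2
for a in range(8):
    for b in range(8):
        # addition in characteristic 2 is XOR
        assert gf8_pow(a ^ b, q) == gf8_pow(a, q) ^ gf8_pow(b, q)
print("Freshman's Rule holds in F_8")
```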

Lemma 9
Let u be a positive integer and let f(x) = Σ_{i=1}^{M} a_i x^{[i]} be a linearized polynomial over F_{q^M}; then f^{[u]}(x) = Σ_{i=1}^{M} a_i^{[u]} x^{[i+u]}.

Now, let α be a primitive and normal element of a finite field F_{q^M} with q^M elements and consider the matrices G_i defined in (20), where again we use the notation α^{[j]} = α^{q^j} to denote the j-th Frobenius power of α ∈ F_{q^M}. Note that the entries of (20) are the values of linearized monomials evaluated at α, but the determinant is not necessarily the value of a linearized polynomial evaluated at α. Nevertheless, the next result states that if M is sufficiently large then G_trunc(j) (defined in (5)) is a superregular matrix for all j ≤ ⌊δ/k⌋ + ⌊δ/(n − k)⌋.
Theorem 10 Let $L = \delta_k + \delta_{n-k}$, $n \in \mathbb{N}$, let $\alpha$ be a primitive and normal element of a finite field $\mathbb{F}_{q^M}$ of characteristic $p$, and consider $G^{trunc}(L) \in \mathbb{F}_{q^M}^{(L+1)k \times (L+1)n}$ as in (5), with submatrices $G_i$ as in (20). If $M \ge q^{n(L+2)-1}$ then the matrix $G^{trunc}(L)$ is superregular (over $\mathbb{F}_{q^M}$).
Proof By observing that a permutation of columns does not affect superregularity and using [1, Theorem 3.2], the theorem immediately follows for a prime $q$. The extension of that theorem obtained by R. Mahmood in [16] to any $q = p^r$, where $p$ is prime and $r$ is a positive integer, concludes the proof.
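The bound $M \ge q^{n(L+2)-1}$ is astronomically large, so for small matrices superregularity is usually verified by exhaustive computation (as the Conclusions note was done in [18] and [4]). The sketch below is a generic brute-force checker over $\mathbb{F}_{2^3}$: for a matrix with no zero entries, superregularity simply means every minor is nonzero. It is exercised on a Cauchy matrix, a classical superregular family chosen purely for illustration, not on the construction (20) itself, which requires a far larger field:

```python
from itertools import combinations

# Brute-force superregularity check over F_{2^3} = F_2[x]/(x^3 + x^2 + 1).
# For a matrix with no zero entries, superregular means that every
# square submatrix (every minor) is nonsingular.
MOD = 0b1101

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= MOD
    return r

def gf_inv(a):
    """Inverse of a nonzero element: a^6 = a^(-1), since |F_8^*| = 7."""
    r = 1
    for _ in range(6):
        r = gf_mul(r, a)
    return r

def gf_det(m):
    """Determinant over F_8 by cofactor expansion along the first row."""
    n = len(m)
    if n == 1:
        return m[0][0]
    d = 0
    for j in range(n):            # characteristic 2: signs are irrelevant
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        d ^= gf_mul(m[0][j], gf_det(minor))
    return d

def is_superregular(m):
    rows, cols = len(m), len(m[0])
    for s in range(1, min(rows, cols) + 1):
        for ri in combinations(range(rows), s):
            for ci in combinations(range(cols), s):
                if gf_det([[m[r][c] for c in ci] for r in ri]) == 0:
                    return False
    return True

# A Cauchy matrix C[i][j] = 1/(x_i + y_j), with the x_i distinct, the y_j
# distinct, and all sums x_i + y_j nonzero; every minor is again a Cauchy
# determinant, hence nonzero.
xs, ys = [1, 2, 3], [4, 5, 7]
C = [[gf_inv(x ^ y) for y in ys] for x in xs]
assert is_superregular(C)
print("3x3 Cauchy matrix over F_8 is superregular")
```

The checker is exponential in the matrix size, which is acceptable for the small instances one would test by hand.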
We are now going to obtain a result that extends this theorem, stating that the superregularity of the matrices $G^{trunc}(j)$ built from block matrices of the form (20) remains invariant under right multiplication by $A_{[0,j]}$.
In [18], the authors proved the superregularity of a matrix $F = T A_{[0,m]}$, where $T$ is built from block matrices $T_i$ of the form (20) and the $A_i$ are invertible matrices, for $0 \le i \le m$. To obtain their proof, they first showed the following properties of the block matrices $F_{ij}$ of $F$, whose entries $f_{rj}$, for each $0 \le r, j \le n-1$, are values of linearized polynomials evaluated at $\alpha$:

1. (a) $(f_{0j}, f_{1j}, \ldots, f_{(n-1)j})$ are linearly independent over $\mathbb{F}_q$;
   (b) if $s < s'$, the entries of row $s'$ of $F_{ij}$ are $q$-powers of the corresponding entries in row $s$, by the Freshman's Rule;
   (c) if $i < i'$, the entries of a row $s$ of $F_{i'j}$ are $q$-powers of the corresponding entries in row $s$ of $F_{ij}$, by the Freshman's Rule.
2. The $q$-degrees of the polynomials associated to the elements of $F_{ij}$ strictly increase downwards on any fixed column, for any $0 \le i \le j \le m$.
3. If $i < i'$ and $s \ge i'$, then the $q$-degree of the polynomial associated to any entry of row $r$ of $F_{i's}$ is smaller than the $q$-degree of the polynomial associated to any entry of row $r$ of $F_{i(s+i'-i)}$, for any $0 \le r \le n-1$ (notice that the block matrices $F_{i's}$ and $F_{i(s+i'-i)}$ are in the same block row of $F$).
4. If $i < i' \le j$, then the $q$-degree of the polynomial associated to any entry of column $s$ of $F_{i'j}$ is smaller than the $q$-degree of the polynomial associated to any entry of column $s$ of $F_{ij}$, for any $0 \le s \le n-1$.
5. If $D$ is a square submatrix of $F$, there exists an invertible matrix $M$ such that (a) the $q$-degrees of the polynomials associated with the entries of any row of $DM$ are strictly increasing, and (b) the $q$-degrees of the polynomials associated with the entries of any column of $DM$ are strictly increasing.
Next they used the proof of Theorem 3.2 of [1], where it is shown, for $q$ a prime number, that if a matrix $B$ satisfies properties 5(a) and 5(b) above then $B$ is invertible. This result is also valid for any $q = p^r$, with $p$ prime, as was shown in [16].
Therefore, $DM$ is invertible, and since $M$ is invertible, so is $D$. Therefore, $F$ is superregular.
Based on these properties the next theorem was derived in [18].
Theorem 11 ([18, Theorem 5]) For any $0 \le t \le m$, let $A_t \in \mathbb{F}_q^{n \times n}$ be non-singular matrices and let $T$ be the matrix built from the block matrices $T_i$, where $T_i$, for each $0 \le i \le m$, is given by (20). If $M > q^{n(m+2)-1}$ and $\alpha$ is a primitive and normal element of $\mathbb{F}_{q^M}$, then $F = T A_{[0,m]}$ is superregular.
In our case, instead of having a block diagonal matrix we have a block lower triangular matrix $A_{[0,L]}$, but the corresponding versions of all the above properties are still satisfied.
Theorem 12 Let $\mathcal{C}$ be an $(n, k, \delta)$-convolutional code with generator matrix $G(D) = \sum_{i=0}^{\nu} G_i D^i$, where each $G_i$ is a $k \times n$ matrix given by (20). Let $M \ge q^{n(L+2)-1}$ and let $\alpha$ be a primitive and normal element of $\mathbb{F}_{q^M}$. Furthermore, let $A_{[0,L]}$ be nonsingular as described in (13), with $j = L$. Then $S = G^{trunc}(L) A_{[0,L]}$ is a superregular matrix, and therefore $\mathcal{C}$ is MRP.

Proof Let $P$ be the permutation matrix corresponding to a suitable permutation $\sigma$. Then $S$ is superregular if and only if $PS$ is superregular. Define $F = PS$, with block matrices $F_{ij} = S_{(j-i)\,j}$. Each element of $F_{ij}$ is the value of a linearized polynomial evaluated at $\alpha$; moreover, if the first row of $F_{ij}$ is $(f_{ij0}, \ldots, f_{ij(n-1)})$ then, for $s \le n$, the $s$-th row is $(f_{ij0}^{[s]}, \ldots, f_{ij(n-1)}^{[s]})$. So we obtain properties similar to 1.(b) above. The corresponding property 1.(c) is not valid, though. In our case, if $i < i'$, the entries of a row $s$ of $F_{i'j}$ are not $q$-powers of the corresponding entries in row $s$ of $F_{ij}$: our $F_{ij}$, instead of being just the product of two matrices, are sums of products of matrices, so, for $i \ne i'$, the linearized polynomials from which we construct the entries of $F_{i'j}$ may be completely different from the ones we use to construct the entries of $F_{ij}$. That is also the reason why the exponents of the linearized polynomials used to obtain the entries of $F_{ij}$ do not depend on $i$, contrary to what happened above in (23). Hence, we have
$$\begin{bmatrix} f_{ij0}^{[1]} & f_{ij1}^{[1]} & \cdots & f_{ij(n-1)}^{[1]} \\ f_{ij0}^{[2]} & f_{ij1}^{[2]} & \cdots & f_{ij(n-1)}^{[2]} \\ \vdots & \vdots & & \vdots \end{bmatrix}$$
As $\alpha$ is normal over $\mathbb{F}_q$ and each $A_{ii}$ is invertible, for $1 \le i \le n$, it holds that $(f_{ij0}, f_{ij1}, \ldots, f_{ij(n-1)})$ are linearly independent over $\mathbb{F}_q$, which corresponds to property 1.(a) above. Clearly, the $q$-degrees of the polynomials associated to the elements of $F_{ij}$ strictly increase downwards on any fixed column, for any $0 \le i \le j \le L$, so we get property 2. It is also easy to see that the properties corresponding to properties 3, 4 and 5 above are also satisfied. Hence $F$ is superregular, and so is $S$. The last statement follows from Theorem 8.
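The key fact behind property 1.(a) is that right multiplication by an invertible matrix over the ground field maps an $\mathbb{F}_q$-basis to an $\mathbb{F}_q$-basis. The sketch below (toy field $\mathbb{F}_{2^3}$, our choice) verifies this exhaustively for the normal basis $(\alpha^{[1]}, \alpha^{[2]}, \alpha^{[3]})$ and all invertible $3 \times 3$ matrices over $\mathbb{F}_2$:

```python
from itertools import product

# Illustration of property 1.(a): if (g_1, ..., g_n) are F_2-linearly
# independent elements of F_{2^3} and A is an invertible matrix over the
# ground field F_2, then the entries of (g_1, ..., g_n)A stay independent.
MOD, M = 0b1101, 3

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= MOD
    return r

def frob(a, u=1):
    for _ in range(u):
        a = gf_mul(a, a)
    return a

def independent(vec):
    """F_2-linear independence: all 2^n XOR-combinations are distinct."""
    combos = set()
    for coeffs in product((0, 1), repeat=len(vec)):
        v = 0
        for c, g in zip(coeffs, vec):
            if c:
                v ^= g
        combos.add(v)
    return len(combos) == 2 ** len(vec)

def det2(A):
    """Determinant of a binary matrix over F_2 (cofactor expansion)."""
    if len(A) == 1:
        return A[0][0]
    d = 0
    for j in range(len(A)):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        d ^= A[0][j] & det2(minor)
    return d

alpha = 0b010                                     # normal element of F_8
g = [frob(alpha, i) for i in range(1, M + 1)]     # the normal basis
assert independent(g)

# Exhaustively run over all invertible 3x3 matrices over F_2.
checked = 0
for bits in product((0, 1), repeat=9):
    A = [list(bits[0:3]), list(bits[3:6]), list(bits[6:9])]
    if det2(A) == 0:
        continue
    gA = []
    for j in range(M):                # (gA)_j = sum_i g_i * A[i][j] over F_2
        v = 0
        for i in range(M):
            if A[i][j]:
                v ^= g[i]
        gA.append(v)
    assert independent(gA)
    checked += 1

assert checked == 168                 # |GL(3, F_2)| = 7 * 6 * 4
print("independence preserved for all 168 invertible A over F_2")
```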
The following example illustrates the properties mentioned above.

Conclusions
We have studied rank metric convolutional codes and proposed a novel metric suitable for networks with delays. We have fully characterized and constructed optimal rank convolutional codes with respect to this metric, thereby extending previous research in the area of multi-shot network coding. The results were established for any given rate and degree, with no restriction on the field size. Although Theorem 12 requires an enormous field size, this type of construction can be used to obtain superregular matrices over much smaller fields, but in that case the matrices have to be checked individually for superregularity. This approach was already explored in [18] and [4] and is one avenue of research we are interested in investigating. Another important issue that remains open is to provide not only sufficient (as given in Theorem 8) but also necessary conditions for a given convolutional code to be MRP in terms of superregular matrices. We conjecture that this is possible and that it would require the superregularity of a smaller matrix than $G^{trunc}(L)$ and $H^{trunc}(L)$, which would allow building MRP codes over much smaller fields. These issues require further research.
is a codeword. Since $\phi(w)$ has at most $d$ nonzero columns and $A_{[0,j]}$ is nonsingular, $\phi(wA_{[0,j]})$ has rank at most $d$. Thus $d_j^{CR}(\mathcal{C}) \le d$. We show that $d_j^{CR}(\mathcal{C}) \ge d$ by contradiction: assume $d_j^{CR}(\mathcal{C}) < d$. Then, there exists