One exciting but rarely discussed aspect of TensorFlow is that nearly all of its functions support complex-valued tensors out of the box. In addition, it provides an adaptation of numpy's `einsum`, which makes wiring up quantum tensor networks a breeze. This opens up the possibility of simulating and exploring optimization of quantum circuits using gradient descent.

In this post I'll demonstrate how to use `tf.einsum` to quickly convert quantum tensor networks into TensorFlow code. `tf.einsum` is the swiss army knife of tensors, with a lineage tracing back to Einstein. This one function produces every linear form, from scalar products, traces, diagonals, and matrix multiplications to batch operations and beyond, in a concise notation. It's so powerful and flexible that I find myself reaching for it even in the context of classical machine learning networks.

I’ll be using the notation described in Quantum Tensor Networks in a Nutshell. It’s a *great* read and covers more than what I’m presenting. I’ll reproduce the necessary bits here.

Let’s see how it works.

## Tensor Blocks

The fundamental objects of quantum tensor networks are, unsurprisingly, tensors. In the simplest case, we may consider an arbitrary rank-1 tensor, `psi = tf.random.normal((N,))`. Diagrammatically, `psi` is presented as a node with one edge, representing its single rank.

Although trivial, I'm presenting the equivalent `einsum` expression for reference. I'll provide the procedure for transforming the diagram into code below.
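As a minimal sketch (the variable names and dimension `N` are my own choices), the rank-1 node with its single free edge `'i'` translates to:

```python
import tensorflow as tf

N = 4
psi = tf.random.normal((N,))

# A single node with one free edge 'i': the expression returns psi unchanged.
same = tf.einsum('i->i', psi)
```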

Similarly, we can construct an arbitrary rank-2 tensor, a matrix, `A = tf.random.normal((N, N))`. `A` doesn't need to be square; any shaped tensor will work, but you can only connect tensors along equally sized ranks.

Here there are *two* edges, representing the two ranks. The first edge, `i`, is considered an *input* and the second edge, `j`, is the output.

We can continue generalizing to higher rank tensors. For instance, `T = tf.random.normal((N, M, N))` can be considered a rank-3 tensor with one input and two outputs. Which edges are inputs and which are outputs is largely arbitrary, but we need to keep track of inputs and outputs when transposing our operators.

## Tensor Transposes

Now that we have a few tensors to work with, we can consider the transpose of our tensors. Effectively, this means to swap the inputs and outputs of our tensor.

In the rank-1 case, transposition has little effect on the equivalent code. However, this is actually quite interesting, since it points to the ease of using `einsum`: we don't need to think of tensors and their duals as existing in different spaces.

In the rank-2 case, note that we are swapping the order of the input and output tensors.

Finally, in the rank-3 case, we again swap the positions of the input and output indices, noting well that we are not simply reversing the sequence order.
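A sketch of these three transposes in code (the index letters are my own choice; for the rank-3 case I take `i` as the input and `j`, `k` as the outputs):

```python
import tensorflow as tf

N, M = 3, 4
psi = tf.random.normal((N,))
A = tf.random.normal((N, M))
T = tf.random.normal((N, M, N))

# Rank-1: the transpose leaves the code essentially unchanged.
psi_T = tf.einsum('i->i', psi)

# Rank-2: the input and output swap places.
A_T = tf.einsum('ij->ji', A)

# Rank-3 with input i and outputs j, k: the input and output groups
# swap position; note this is not a simple reversal 'ijk->kji'.
T_T = tf.einsum('ijk->jki', T)
```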

## Connecting Tensors, Contracting Tensors

Now that we have the representation of individual tensor blocks and their transposes, we can do some non-trivial operations with `einsum`. You'll notice that throughout, there is no fumbling with transposes or dimension expansions.

#### General procedure for converting a tensor network into tf.einsum code:

- Label each edge with a unique letter index.
- For each node in the network, create a *term* string by listing the indices for the inputs followed by the outputs.
- Create the *left hand side* of the *equation* string by joining the *term* strings with commas. Order does not matter.
- Create the *right hand side* of the *equation* string by listing unterminated edges, with inputs followed by outputs. This may be empty if there are no unterminated edges.
- Create the *equation* string by concatenating the *left hand side* with the *right hand side*, separated by an `'->'` arrow. If the right hand side is empty, the arrow is not necessary.
- Finally, write `tf.einsum(equation, tensor1, tensor2, ...)` where the tensor arguments are listed in the same order as their *terms* in the *left hand side*.

Considering the rank-1 case again, the simplest contraction is the inner product.

The two psi nodes are each represented by their common edge, `'i'`, and there are no unterminated edges, so the equation is simply `'i,i'`.
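Applying the procedure, a sketch of this contraction:

```python
import tensorflow as tf

N = 4
psi = tf.random.normal((N,))

# Both nodes list the shared edge 'i'; there are no unterminated
# edges, so the right hand side (and the arrow) is omitted.
inner = tf.einsum('i,i', psi, psi)
```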

In the rank-2 case we can consider matrix-vector multiplication. Diagrammatically, and logically, this is like transforming the input tensor in a pipeline.

The matrix node, `A`, produces term `'ij'`, and has a common edge with the vector, `'i'`. The edge `'j'` is not terminated, yielding the equation `'ij,i->j'`. Try for yourself to generate the equation to multiply `psi` from the opposite side of `A`.
In the rank-2 case we can *also* consider matrix-matrix multiplication. This is like forming a pipeline of vector transforms.

In this case, there is a common edge between the first A's output and the following A's input, and two exposed edges ready to be connected to other tensors, yielding `'ij,jk->ik'`.
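A sketch of the two-matrix pipeline:

```python
import tensorflow as tf

N = 4
A = tf.random.normal((N, N))

# The shared edge 'j' is contracted; 'i' and 'k' remain exposed.
AA = tf.einsum('ij,jk->ik', A, A)
```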

In the case of an orthogonal matrix, O, we expect the product with its transpose to yield an identity.

Replacing a rank-2 tensor with its transpose effectively means we pair the inputs in both tensors, yielding `'ji,jk->ik'`. The equivalent operation *without* using `einsum` will start to become quite verbose, so I won't be including it in every case. You should try to work those out yourself.
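A sketch, using the Q factor of a QR decomposition as one convenient way to obtain an orthogonal matrix (that construction is my own choice):

```python
import tensorflow as tf

N = 4
# The Q factor of a QR decomposition is orthogonal.
O, _ = tf.linalg.qr(tf.random.normal((N, N)))

# Pairing the inputs of O with itself multiplies O by its transpose.
should_be_eye = tf.einsum('ji,jk->ik', O, O)
```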

Finally, we can contract more than two tensors, and of higher order:
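As one illustrative example (the particular network here is my own choice): contract `psi` into the input of the rank-3 tensor `T`, and a matrix `A` into one of `T`'s outputs:

```python
import tensorflow as tf

N, M = 3, 4
psi = tf.random.normal((N,))
T = tf.random.normal((N, M, N))   # input i, outputs j and k
A = tf.random.normal((N, N))      # input k, output l

# psi terminates T's input edge; A terminates T's second output edge.
result = tf.einsum('i,ijk,kl->jl', psi, T, A)
```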

## Tensor Products

Now, we turn to one of the distinguishing features of quantum circuits: groups of states are represented together by tensor product spaces. This is necessary to model entanglement. We can use `einsum` to build product states from lower order tensors.

In the simplest, case, we can build a rank-2 tensor from a pair of rank-1 tensors.

Note well, this is not a simple concatenation or stacking of tensors. If `psi` has shape `(N,)` and `phi` has shape `(M,)`, this new tensor `Psi` has shape `(N, M)`. In terms of memory, stacking tensors requires `O(N + M)` memory, whereas tensor products require `O(N * M)` memory. Interestingly, part of the reason that quantum computation is so powerful is that quantum memory can hold tensor products like this in `O(N + M)` space.
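A sketch of the product state:

```python
import tensorflow as tf

N, M = 3, 5
psi = tf.random.normal((N,))
phi = tf.random.normal((M,))

# No shared edges: every edge is unterminated, producing a rank-2 tensor.
Psi = tf.einsum('i,j->ij', psi, phi)
```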

We can raise rank-2 tensors to rank-4 tensors in a similar manner:

This produces a new operator, `B`, which applies `A` independently to products of rank-1 tensors.
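A sketch of this construction (the index ordering for `B` is my own choice: inputs `i, k`, then outputs `j, l`), along with a check that `B` acts on product states factor by factor:

```python
import tensorflow as tf

N = 3
A = tf.random.normal((N, N))
psi = tf.random.normal((N,))
phi = tf.random.normal((N,))

# Rank-4 product operator with inputs (i, k) and outputs (j, l).
B = tf.einsum('ij,kl->ikjl', A, A)

# Applying B to a product state...
Psi = tf.einsum('i,k->ik', psi, phi)
lhs = tf.einsum('ikjl,ik->jl', B, Psi)

# ...matches applying A to each factor independently.
rhs = tf.einsum('j,l->jl',
                tf.einsum('ij,i->j', A, psi),
                tf.einsum('kl,k->l', A, phi))
```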

## Copy Tensor, But Never Use It Explicitly

Next, we’ll look at our edges a little more carefully.

You might have noticed that a single edge is effectively an identity tensor, `tf.eye(N)`.

This could be constructed as:

```python
eye = tf.reduce_sum(
    [tf.einsum('i,j->ij', y, y)
     for y in [tf.one_hot(idx, depth=N) for idx in range(N)]], 0)
```

This identity tensor could be inserted into our `einsum` equations explicitly, but that would be unnecessary and computationally wasteful.

A copy tensor can be seen as a generalization of the identity tensor.

As such it can be constructed similarly to the identity tensor:

```python
copy = tf.reduce_sum(
    [tf.einsum('i,j,k->ijk', y, y, y)
     for y in [tf.one_hot(idx, depth=N) for idx in range(N)]], 0)
```

However, I’ve never explicitly constructed this tensor in practice. Generally you only need to label all its edges with the same index, and literally copy the tensor variable in code as necessary.

For instance, copying a single rank-1 tensor is the same thing as forming the tensor’s product with itself.
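To illustrate (a sketch; the particular contraction pattern is my own): routing `psi` through two legs of the explicit copy tensor gives the same result as labeling all edges with one index and literally repeating the variable:

```python
import tensorflow as tf

N = 3
psi = tf.random.normal((N,))

# The explicit copy tensor, constructed as above.
copy = tf.reduce_sum(
    [tf.einsum('i,j,k->ijk', y, y, y)
     for y in [tf.one_hot(idx, depth=N) for idx in range(N)]], 0)

# Contracting psi into two legs of the explicit copy tensor...
explicit = tf.einsum('ijk,i,j->k', copy, psi, psi)

# ...equals labeling every edge 'k' and repeating the variable.
implicit = tf.einsum('k,k->k', psi, psi)
```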

## General Linear Forms

`einsum` supports a broad array of linear forms, which appear as manifestly different functions in TensorFlow.

For instance matrix trace operations:

By using the copy tensor (implicitly or explicitly), you can select out the diagonal of a matrix:

`einsum` also supports batch operations via the copy tensor. A rank-2 tensor may be seen either as a product of two rank-1 tensors (as above) or as a batch of rank-1 tensors. Using the copy tensor, you can compute the batch of rank-1 dot products:
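Sketches of these three forms side by side (the variable names are mine):

```python
import tensorflow as tf

B, N = 2, 3
A = tf.random.normal((N, N))
X = tf.random.normal((B, N))   # a batch of B rank-1 tensors

# Trace: both edges of A share one index and nothing remains.
trace = tf.einsum('ii', A)      # same as tf.linalg.trace(A)

# Diagonal: the shared index survives on the right hand side.
diag = tf.einsum('ii->i', A)    # same as tf.linalg.diag_part(A)

# Batch of rank-1 dot products: 'b' acts as an implicit copy tensor.
dots = tf.einsum('bi,bi->b', X, X)
```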

It really is the swiss army knife of linear forms, and supports these operations in a concise language compared to classical examples. The graphical language can be converted seamlessly into code. Consider implementing the following with and without `einsum`. I'll let you work it out for yourself (hint: it's a trace of something).

## Hermitian Conjugates

For brevity, throughout this introduction, all tensors have been real valued. When working with quantum states and operators, however, you will certainly want to use complex valued tensors. There's nothing magical about `einsum` in this case with respect to complex conjugation. In addition to transposing your operators, you will frequently want to form Hermitian conjugates by transposing *and* taking complex conjugates.

For example, if we use complex tensors:

```python
psi = tf.complex(
    tf.random.normal((N,)),
    tf.random.normal((N,)))

# Create a Hermitian matrix from A
H = tf.complex(A + tf.transpose(A), A - tf.transpose(A))

# Create a unitary matrix from H
U = tf.linalg.expm(-tf.complex(0., 1.) * H)
```

Then the bra-ket of psi with itself is:

In order to be an admissible quantum state, this must be 1.

Similarly, if U is unitary, then the product of U with its Hermitian conjugate must be the identity.
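A sketch putting these checks together (the explicit normalization of `psi` is my addition, so that the bra-ket comes out to 1):

```python
import tensorflow as tf

N = 4
psi = tf.complex(tf.random.normal((N,)), tf.random.normal((N,)))
norm = tf.sqrt(tf.reduce_sum(tf.abs(psi) ** 2))
psi = psi / tf.cast(norm, psi.dtype)   # normalize so <psi|psi> = 1

A = tf.random.normal((N, N))
H = tf.complex(A + tf.transpose(A), A - tf.transpose(A))
U = tf.linalg.expm(-tf.complex(0., 1.) * H)

# Bra-ket <psi|psi>: conjugate one copy before contracting.
braket = tf.einsum('i,i', tf.math.conj(psi), psi)

# U times its Hermitian conjugate: transpose and conjugate one copy.
should_be_eye = tf.einsum('ij,kj->ik', U, tf.math.conj(U))
```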

## Conclusion

By now I hope you see how amazingly flexible `einsum` is, and how quickly you can convert a tensor network into code. Even when I'm not working with quantum networks, I find `einsum` easier to work with in many cases. In particular, I don't need to spend time thinking about whether my vector has a shape of `(N,)`, `(N,1)`, or `(1,N)`, nor search through documentation to remember exactly which submodule a function was placed in. I just jot down what I mean as an einsum equation and move on.

I hope you’ve enjoyed this introduction. I’d enjoy hearing notes or comments, just send me a line. In the near future I intend to follow this up with some examples of quantum circuits for sequence models and the interesting difficulties therein.