---
title: "Writing Custom Keras Layers"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Writing Custom Keras Layers} 
  %\VignetteEngine{knitr::rmarkdown} 
  %\VignetteEncoding{UTF-8}
type: docs
repo: https://github.com/rstudio/keras
menu:
  main:
    name: "Custom Layers"
    identifier: "keras-custom-layers"
    parent: "keras-advanced-top"
    weight: 10
aliases:
  - /keras/articles/about_keras_layers.html
  - /keras/articles/custom_layers.html
---

```{r setup, include = FALSE}
library(keras)
knitr::opts_chunk$set(comment = NA, eval = FALSE)
```


If the existing Keras layers don't meet your requirements you can create a custom layer. For simple, stateless custom operations, you are probably better off using `layer_lambda()` layers. But for any custom operation that has trainable weights, you should implement your own layer. 

The example below illustrates the skeleton of a Keras custom layer. 

<!-- The [mnist_antirectifier](https://tensorflow.rstudio.com/examples/mnist_antirectifier.html) example includes another demonstration of creating a custom layer. -->

## The Layer function

Layers encapsulate a state (weights) and some computation.
The main data structure you'll work with is the `Layer`. A layer encapsulates both a state (the layer's "weights") and a transformation from inputs to outputs (a "call", the layer's forward pass).

```{r}
library(tensorflow)
library(keras)

layer_linear <- Layer(
  classname = "Linear", 
  initialize = function(units, input_dim) {
    super()$`__init__`()
    w_init <- tf$random_normal_initializer()
    self$w <- tf$Variable(
      initial_value = w_init(shape = shape(input_dim, units),
                             dtype = tf$float32)
      )
    b_init <- tf$zeros_initializer()
    self$b <- tf$Variable(
      initial_value = b_init(shape = shape(units),
                             dtype = tf$float32)
    )
  },
  call = function(inputs, ...) {
    tf$matmul(inputs, self$w) + self$b
  }
)

x <- tf$ones(shape = list(2,2))
layer <- layer_linear(units = 4, input_dim = 2)
y <- layer(x)
y
```

Note that the weights w and b are automatically tracked by the layer upon being set as layer attributes.

```{r}
get_weights(layer)
```

Note you also have access to a quicker shortcut for adding weight to a layer: the `add_weight` method:

```{r}
layer_linear <- Layer(
  classname = "Linear", 
  initialize = function(units, input_dim) {
    super()$`__init__`()
    self$w <- self$add_weight(
      shape = shape(input_dim, units),
      initializer = "random_normal",
      trainable = TRUE
    )
    self$b <- self$add_weight(
      shape = shape(units),
      initializer = "zeros",
      trainable = TRUE
    )
  },
  call = function(inputs, ...) {
    tf$matmul(inputs, self$w) + self$b
  }
)
```


It's important to call **`super()$__init__()`** in the `initialize` method.

Note that tensor operations are executed using the Keras `backend()`. See the Keras Backend article for details on the various functions available from Keras backends.

Besides trainable weights, you can add non-trainable weights to a layer as well. Such weights are meant not to be taken into account during backpropagation, when you are training the layer.

Here's how to add and use a non-trainable weight:

```{r}
layer_compute_sum <- Layer(
  classname = "ComputeSum",
  initialize = function(input_dim) {
    super()$`__init__`()
    self$total <- tf$Variable(
      initial_value = tf$zeros(shape(input_dim)),
      trainable = FALSE
    )
  },
  call = function(inputs, ...) {
    self$total$assign_add(tf$reduce_sum(inputs, axis = 0L))
    self$total
  }
)

x <- tf$ones(shape(2,2))
mysum <- layer_compute_sum(input_dim = 2)
print(mysum(x))
print(mysum(x))
```

It's part of `layer$weights` but it gets categorized as a non-trainable weight:

```{r}
get_weights(mysum)
mysum$non_trainable_weights
```

## Best practice: deferring weight creation until the shape of the inputs is known

In Linear example above, our Linear layer took an input_dim argument that was used to compute the shape of the weights w and b in `initialize`:

```{r}
layer_linear <- Layer(
  classname = "Linear", 
  initialize = function(units, input_dim) {
    super()$`__init__`()
    self$w <- self$add_weight(
      shape = shape(input_dim, units),
      initializer = "random_normal",
      trainable = TRUE
    )
    self$b <- self$add_weight(
      shape = shape(units),
      initializer = "zeros",
      trainable = TRUE
    )
  },
  call = function(inputs, ...) {
    tf$matmul(inputs, self$w) + self$b
  }
)
```

In many cases, you may not know in advance the size of your inputs, and you would like to lazily create weights when that value becomes known, some time after instantiating the layer.

In the Keras API, we recommend creating layer weights in the `build(inputs_shape)` method of your layer. Like this:

```{r}
layer_linear <- Layer(
  classname = "Linear", 
  initialize = function(units) {
    super()$`__init__`()
    self$units <- units
  },
  build = function(input_shape) {
    self$w <- self$add_weight(
      shape = shape(input_shape[2], self$units),
      initializer = "random_normal",
      trainable = TRUE
    )
    self$b <- self$add_weight(
      shape = shape(self$units),
      initializer = "zeros",
      trainable = TRUE
    )
  },
  call = function(inputs, ...) {
    tf$matmul(inputs, self$w) + self$b
  }
)
```

The `call` method of your layer will automatically run build the first time it is called. You now have a layer that's lazy and easy to use:

```{r}
layer <- layer_linear(units = 32)
x <- tf$ones(shape = list(2,2))
layer(x)
```

## Layers are recursively composable

If you assign a Layer instance as attribute of another Layer, the outer layer will start tracking the weights of the inner layer.

We recommend creating such sublayers in the `initialize` method (since the sublayers will typically have a build method, they will be built when the outer layer gets built).

```{r}
# Let's assume we are reusing the Linear class
# with a `build` method that we defined above.
layer_mlp_block <- Layer(
  classname = "MLPBlock",
  initialize = function() {
    super()$`__init__`()
    self$linear_1 <- layer_linear(units = 32)
    self$linear_2 <- layer_linear(units = 32)
    self$linear_3 <- layer_linear(units = 1)
  },
  call = function(inputs, ...) {
    inputs %>% 
      self$linear_1() %>% 
      tf$nn$relu() %>% 
      self$linear_2() %>% 
      tf$nn$relu() %>% 
      self$linear_3()
  }
)

mlp <- layer_mlp_block()

y <- mlp(tf$ones(shape(3, 64)))  # The first call to the `mlp` will create the weights
length(mlp$weights)
length(mlp$trainable_weights)
```

## Layers recursively collect losses created during the forward pass

When writing the `call` method of a layer, you can create loss tensors that you will want to use later, when writing your training loop. This is doable by calling `self$add_loss(value)`:

```{r}
# A layer that creates an activity regularization loss
layer_activity_reg <- Layer(
  classname = "ActivityRegularizationLayer",
  initialize = function(rate = 1e-2) {
    super()$`__init__`()
    self$rate <- rate
  },
  call = function(inputs) {
    self$add_loss(self$rate * tf$reduce_sum(inputs))
    inputs
  }
)
```

These losses (including those created by any inner layer) can be retrieved via `layer$losses`. This property is reset at the start of every `call` to the top-level layer, so that `layer$losses` always contains the loss values created during the last forward pass.

```{r}
layer_outer <- Layer(
  classname = "OuterLayer",
  initialize = function() {
    super()$`__init__`()
    self$dense <- layer_dense(
      units = 32, 
      kernel_regularizer = regularizer_l2(1e-3)
    )
  },
  call = function(inputs) {
    self$dense(inputs)
  }
)

layer <- layer_outer()
x <- layer(tf$zeros(shape(1,1)))

# This is `1e-3 * sum(layer.dense.kernel ** 2)`,
# created by the `kernel_regularizer` above.
layer$losses
```

## You can optionally enable serialization on your layers

If you need your custom layers to be serializable as part of a Functional model, you can optionally implement a `get_config` method.

Note that the `initialize` method of the base Layer class takes some keyword arguments, in particular a `name` and a `dtype`. It's good practice to pass these arguments to the parent class in `initialize` and to include them in the layer config:

```{r}
layer_linear <- Layer(
  classname = "Linear", 
  initialize = function(units, ...) {
    super()$`__init__`(...)
    self$units <- units
  },
  build = function(input_shape) {
    self$w <- self$add_weight(
      shape = shape(input_shape[2], self$units),
      initializer = "random_normal",
      trainable = TRUE
    )
    self$b <- self$add_weight(
      shape = shape(self$units),
      initializer = "zeros",
      trainable = TRUE
    )
  },
  call = function(inputs, ...) {
    tf$matmul(inputs, self$w) + self$b
  },
  get_config = function() {
    list(
      units = self$units
    )
  }
)

layer <- layer_linear(units = 64)
config <- get_config(layer)
new_layer <- from_config(config)
```

If you need more flexibility when deserializing the layer from its config, you can also override the `from_config` class method. This is the base implementation of `from_config`:

```{python}
def from_config(cls, config):
  return cls(**config)
```

## Privileged training argument in the call method

Some layers, in particular the `layer_batch_normalization` and the `layer_dropout`, have different behaviors during training and inference. For such layers, it is standard practice to expose a training (boolean) argument in the call method.

By exposing this argument in call, you enable the built-in training and evaluation loops (e.g. fit) to correctly use the layer in training and inference.

```{r}
layer_custom_dropout <- Layer(
  classname =  "CustomDropout",
  initialize = function(rate, ...) {
    super()$`__init__`(...)
    self$rate <- rate
  },
  call = function(inputs, training = NULL) {
    if (!is.null(inputs) && training) {
      inputs <- tf$nn$dropout(inputs, rate = self$rate)
    }
    inputs
  }
)
```