---
title: "Frequently Asked Questions"
output:
  rmarkdown::html_vignette: default
vignette: >
  %\VignetteIndexEntry{Frequently Asked Questions}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
type: docs
repo: https://github.com/rstudio/keras
menu:
  main:
    name: "Frequently Asked Questions"
    identifier: "keras-faq"
    parent: "keras-getting-started-top"
    weight: 40
aliases:
  - /keras/articles/faq.html
---

```{r setup, include=FALSE}
library(keras)
knitr::opts_chunk$set(eval = FALSE)
```

## How should I cite Keras?

Please cite Keras in your publications if it helps your research. Here is an example BibTeX entry:

```
@misc{chollet2017kerasR,
  title={R Interface to Keras},
  author={Chollet, Fran\c{c}ois and Allaire, JJ and others},
  year={2017},
  publisher={GitHub},
  howpublished={\url{https://github.com/rstudio/keras}},
}
```

## How can I run Keras on a GPU?

Note that installation and configuration of the GPU-based backends can take considerably more time and effort. So if you are just getting started with Keras you may want to stick with the CPU version initially, then install the appropriate GPU version once your training becomes more computationally demanding.

Below are instructions for installing and enabling GPU support for the various supported backends.

### TensorFlow

If your system has an NVIDIA® GPU and you have the GPU version of TensorFlow installed, then your Keras code will automatically run on the GPU. Additional details on GPU installation can be found in the TensorFlow for R installation documentation.

### Theano

If you are running on the Theano backend, you can set the `THEANO_FLAGS` environment variable to indicate you'd like to execute tensor operations on the GPU. For example:

```r
Sys.setenv(KERAS_BACKEND = "theano")
Sys.setenv(THEANO_FLAGS = "device=gpu,floatX=float32")
library(keras)
```

The name 'gpu' might have to be changed depending on your device's identifier (e.g. `gpu0`, `gpu1`, etc).

### CNTK

If you have the GPU version of CNTK installed, then your Keras code will automatically run on the GPU. Additional information on installing the GPU version of CNTK can be found here: https://learn.microsoft.com/en-us/cognitive-toolkit/setup-linux-python

## How can I run a Keras model on multiple GPUs?

We recommend doing so using the **TensorFlow** backend. There are two ways to run a single model on multiple GPUs: **data parallelism** and **device parallelism**. In most cases, what you need is data parallelism.

### Data parallelism

Data parallelism consists of replicating the target model once on each device and using each replica to process a different fraction of the input data. Keras has a built-in utility, `multi_gpu_model()`, which can produce a data-parallel version of any model and achieves quasi-linear speedup on up to 8 GPUs.

For more information, see the documentation for [multi_gpu_model](https://tensorflow.rstudio.com/reference/keras/multi_gpu_model.html). Here is a quick example:

```{r, eval = FALSE}
# Replicates `model` on 8 GPUs.
# This assumes that your machine has 8 available GPUs.
parallel_model <- multi_gpu_model(model, gpus = 8)
parallel_model %>% compile(
  loss = "categorical_crossentropy",
  optimizer = "rmsprop"
)

# This `fit` call will be distributed on 8 GPUs.
# Since the batch size is 256, each GPU will process 32 samples.
parallel_model %>% fit(x, y, epochs = 20, batch_size = 256)
```

### Device parallelism

Device parallelism consists of running different parts of the same model on different devices. It works best for models that have a parallel architecture, e.g. a model with two branches.
This can be achieved by using TensorFlow device scopes. Here is a quick example:

```{r, eval = FALSE}
# Model where a shared LSTM is used to encode two different sequences in parallel
input_a <- layer_input(shape = c(140, 256))
input_b <- layer_input(shape = c(140, 256))

shared_lstm <- layer_lstm(units = 64)

library(tensorflow)

# Process the first sequence on one GPU
with(tf$device("/gpu:0"), {
  encoded_a <- shared_lstm(input_a)
})

# Process the next sequence on another GPU
with(tf$device("/gpu:1"), {
  encoded_b <- shared_lstm(input_b)
})

# Concatenate results on CPU
with(tf$device("/cpu:0"), {
  merged_vector <- layer_concatenate(list(encoded_a, encoded_b))
})
```

## What does "sample", "batch", "epoch" mean?

Below are some common definitions that are necessary to know and understand to correctly utilize Keras:

- **Sample**: one element of a dataset.
  - *Example:* one image is a **sample** in a convolutional network
  - *Example:* one audio file is a **sample** for a speech recognition model
- **Batch**: a set of *N* samples. The samples in a **batch** are processed independently, in parallel. If training, a batch results in only one update to the model.
  - A **batch** generally approximates the distribution of the input data better than a single input. The larger the batch, the better the approximation; however, it is also true that the batch will take longer to process and will still result in only one update. For inference (evaluate/predict), it is recommended to pick a batch size that is as large as you can afford without going out of memory (since larger batches will usually result in faster evaluation/prediction).
- **Epoch**: an arbitrary cutoff, generally defined as "one pass over the entire dataset", used to separate training into distinct phases, which is useful for logging and periodic evaluation.
  - When using `validation_data` or `validation_split` with the `fit` method of Keras models, evaluation will be run at the end of every **epoch**.
  - Within Keras, there is the ability to add [callbacks](training_callbacks.html) specifically designed to be run at the end of an **epoch**. Examples of these are learning rate changes and model checkpointing (saving).

## Why are Keras objects modified in place?

Unlike most R objects, Keras objects are "mutable". That means that when you modify an object you're modifying it "in place", and you don't need to assign the updated object back to the original name. For example, to add layers to a Keras model you might use this code:

```{r}
model %>%
  layer_dense(units = 32, activation = 'relu', input_shape = c(784)) %>%
  layer_dense(units = 10, activation = 'softmax')
```

Rather than this code:

```{r}
model <- model %>%
  layer_dense(units = 32, activation = 'relu', input_shape = c(784)) %>%
  layer_dense(units = 10, activation = 'softmax')
```

You need to be aware of this because it makes the Keras API a little different than most other pipelines you may have used, but it's necessary to match the data structures and behavior of the underlying Keras library.
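A quick way to convince yourself of this behavior (a minimal sketch; the layer sizes are arbitrary) is to check the layer count before and after piping a layer onto a model without reassigning it:

```{r}
library(keras)

model <- keras_model_sequential()
length(model$layers)   # 0 -- the model starts empty

# pipe a layer onto the model *without* assigning the result back
model %>% layer_dense(units = 32, input_shape = c(784))

length(model$layers)   # 1 -- the original object was modified in place
```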
## How can I save a Keras model?

### Saving/loading whole models (architecture + weights + optimizer state)

You can use `save_model_hdf5()` to save a Keras model into a single HDF5 file which will contain:

- the architecture of the model, allowing you to re-create the model
- the weights of the model
- the training configuration (loss, optimizer)
- the state of the optimizer, allowing you to resume training exactly where you left off.

You can then use `load_model_hdf5()` to reinstantiate your model. `load_model_hdf5()` will also take care of compiling the model using the saved training configuration (unless the model was never compiled in the first place).

Example:

```{r}
save_model_hdf5(model, 'my_model.h5')
model <- load_model_hdf5('my_model.h5')
```

### Saving/loading only a model's architecture

If you only need to save the **architecture of a model**, and not its weights or its training configuration, you can do:

```{r}
json_string <- model_to_json(model)
yaml_string <- model_to_yaml(model)
```

The generated JSON / YAML files are human-readable and can be manually edited if needed. You can then build a fresh model from this data:

```{r}
model <- model_from_json(json_string)
model <- model_from_yaml(yaml_string)
```

### Saving/loading only a model's weights

If you need to save the **weights of a model**, you can do so in HDF5 with the code below:

```{r}
save_model_weights_hdf5(model, 'my_model_weights.h5')
```

Assuming you have code for instantiating your model, you can then load the weights you saved into a model with the *same* architecture:

```{r}
model %>% load_model_weights_hdf5('my_model_weights.h5')
```

If you need to load weights into a *different* architecture (with some layers in common), for instance for fine-tuning or transfer-learning, you can load weights by *layer name*:

```{r}
model %>% load_model_weights_hdf5('my_model_weights.h5', by_name = TRUE)
```

For example:

```{r}
# assuming the original model looks like this:
#   model <- keras_model_sequential()
#   model %>%
#     layer_dense(units = 2, input_dim = 3, name = "dense_1") %>%
#     layer_dense(units = 3, name = "dense_2") %>%
#     ...
#   save_model_weights_hdf5(model, fname)

# new model
model <- keras_model_sequential()
model %>%
  layer_dense(units = 2, input_dim = 3, name = "dense_1") %>%  # will be loaded
  layer_dense(units = 10, name = "new_dense")                  # will not be loaded

# load weights from the first model; will only affect the first layer, dense_1
model %>% load_model_weights_hdf5(fname, by_name = TRUE)
```

## Why is the training loss much higher than the testing loss?

A Keras model has two modes: training and testing. Regularization mechanisms, such as Dropout and L1/L2 weight regularization, are turned off at testing time.

Besides, the training loss is the average of the losses over each batch of training data. Because your model is changing over time, the loss over the first batches of an epoch is generally higher than over the last batches. On the other hand, the testing loss for an epoch is computed using the model as it is at the end of the epoch, resulting in a lower loss.

## How can I obtain the output of an intermediate layer?

One simple way is to create a new `Model` that will output the layers that you are interested in:

```{r}
model <- ...  # create the original model

layer_name <- 'my_layer'
intermediate_layer_model <- keras_model(inputs = model$input,
                                        outputs = get_layer(model, layer_name)$output)
intermediate_output <- predict(intermediate_layer_model, data)
```

## How can I use Keras with datasets that don't fit in memory?

### Generator Functions

To provide training or evaluation data incrementally, you can write an R generator function that yields batches of training data, then pass the function to `fit_generator()` (or the related functions `evaluate_generator()` and `predict_generator()`).

The output of generator functions must be a list of one of these forms:

- (inputs, targets)
- (inputs, targets, sample_weights)

All arrays should contain the same number of samples.
The generator is expected to loop over its data indefinitely. For example, here's a simple generator function that yields randomly sampled batches of data:

```{r}
sampling_generator <- function(X_data, Y_data, batch_size) {
  function() {
    rows <- sample(1:nrow(X_data), batch_size, replace = TRUE)
    list(X_data[rows,], Y_data[rows,])
  }
}

model %>% fit_generator(sampling_generator(X_train, Y_train, batch_size = 128),
                        steps_per_epoch = nrow(X_train) / 128, epochs = 10)
```

The `steps_per_epoch` parameter indicates the number of steps (batches of samples) to yield from the generator before declaring one epoch finished and starting the next epoch. It should typically be equal to the number of unique samples in your dataset divided by the batch size.

### External Data Generators

The above example doesn't, however, address the use case of datasets that don't fit in memory. Typically to do that you'll write a generator that reads from another source (e.g. a sparse matrix or file(s) on disk) and maintains an offset into that data as it's called repeatedly. For example, imagine you have a set of text files in a directory you want to read from:

```{r}
data_files_generator <- function(dir) {

  files <- list.files(dir)
  next_file <- 0

  function() {

    # move to the next file (note the <<- assignment operator)
    next_file <<- next_file + 1

    # if we've exhausted all of the files then start again at the
    # beginning of the list (keras generators need to yield
    # data infinitely -- termination is controlled by the epochs
    # and steps_per_epoch arguments to fit_generator())
    if (next_file > length(files))
      next_file <<- 1

    # determine the file name
    file <- files[[next_file]]

    # process and return the data in the file. note that in a
    # real example you'd further subdivide the data within the
    # file into appropriately sized training batches. this
    # would make this function much more complicated so we
    # don't demonstrate it here
    file_to_training_data(file)
  }
}
```

The above function is an example of a stateful generator---the function maintains information across calls to keep track of which data to provide next. This is accomplished by defining shared state outside the generator function body and using the `<<-` operator to assign to it from within the generator.

### Image Generators

You can also use the `flow_images_from_directory()` and `flow_images_from_data()` functions along with `fit_generator()` for training on sets of images stored on disk (with optional image augmentation/normalization via `image_data_generator()`).

### Batch Functions

You can also do batch training using the `train_on_batch()` and `test_on_batch()` functions. These functions enable you to write a training loop that reads into memory only the data required for each batch.

## How can I interrupt training when the validation loss isn't decreasing anymore?

You can use an early stopping callback:

```{r}
early_stopping <- callback_early_stopping(monitor = 'val_loss', patience = 2)
model %>% fit(X, y, validation_split = 0.2, callbacks = c(early_stopping))
```

Find out more in the [callbacks documentation](training_callbacks.html).

## How is the validation split computed?

If you set the `validation_split` argument in `fit` to e.g. 0.1, then the validation data used will be the *last 10%* of the data. If you set it to 0.25, it will be the last 25% of the data, etc. Note that the data isn't shuffled before extracting the validation split, so the validation data is literally just the *last* x% of samples in the input you passed. The same validation set is used for all epochs (within the same call to `fit`).
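As a rough illustration (a minimal sketch; `x_train` and `y_train` are hypothetical), `validation_split = 0.2` corresponds to holding out the last 20% of rows, in the order they appear in the data:

```{r}
n <- nrow(x_train)
n_val <- floor(0.2 * n)

# these are the rows fit() would hold out with validation_split = 0.2
val_rows <- (n - n_val + 1):n
x_val <- x_train[val_rows, , drop = FALSE]
y_val <- y_train[val_rows]
```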
## Is the data shuffled during training?

Yes, if the `shuffle` argument in `fit` is set to `TRUE` (which is the default), the training data will be randomly shuffled at each epoch.

Validation data is never shuffled.

## How can I record the training / validation loss / accuracy at each epoch?

The `fit()` method returns a training history object whose `metrics` field contains lists of the successive values of the loss and of any other metrics recorded during training:

```{r}
hist <- model %>% fit(X, y, validation_split = 0.2)
hist$metrics
```

## How can I "freeze" Keras layers?

To "freeze" a layer means to exclude it from training, i.e. its weights will never be updated. This is useful in the context of fine-tuning a model, or using fixed embeddings for a text input.

You can pass a `trainable` argument (boolean) to a layer constructor to set a layer to be non-trainable:

```{r}
frozen_layer <- layer_dense(units = 32, trainable = FALSE)
```

Additionally, you can set the `trainable` property of a layer to `TRUE` or `FALSE` after instantiation. For this to take effect, you will need to call `compile()` on your model after modifying the `trainable` property. Here's an example:

```{r}
x <- layer_input(shape = c(32))
layer <- layer_dense(units = 32)
layer$trainable <- FALSE
y <- x %>% layer

frozen_model <- keras_model(x, y)
# in the model below, the weights of `layer` will not be updated during training
frozen_model %>% compile(optimizer = 'rmsprop', loss = 'mse')

layer$trainable <- TRUE
trainable_model <- keras_model(x, y)
# with this model the weights of the layer will be updated during training
# (which will also affect the above model since it uses the same layer instance)
trainable_model %>% compile(optimizer = 'rmsprop', loss = 'mse')

frozen_model %>% fit(data, labels)     # this does NOT update the weights of `layer`
trainable_model %>% fit(data, labels)  # this updates the weights of `layer`
```

Finally, you can freeze or unfreeze the weights for an entire model (or a range of layers within the model) using the `freeze_weights()` and `unfreeze_weights()` functions. For example:

```{r}
# instantiate a VGG16 model
conv_base <- application_vgg16(
  weights = "imagenet",
  include_top = FALSE,
  input_shape = c(150, 150, 3)
)

# freeze its weights
freeze_weights(conv_base)

# create a composite model that includes the base + more layers
model <- keras_model_sequential() %>%
  conv_base %>%
  layer_flatten() %>%
  layer_dense(units = 256, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")

# compile
model %>% compile(
  loss = "binary_crossentropy",
  optimizer = optimizer_rmsprop(lr = 2e-5),
  metrics = c("accuracy")
)

# unfreeze weights from "block5_conv1" on
unfreeze_weights(conv_base, from = "block5_conv1")

# compile again since we froze or unfroze layers
model %>% compile(
  loss = "binary_crossentropy",
  optimizer = optimizer_rmsprop(lr = 2e-5),
  metrics = c("accuracy")
)
```

## How can I use stateful RNNs?

Making an RNN stateful means that the states for the samples of each batch will be reused as initial states for the samples in the next batch.

When using stateful RNNs, it is therefore assumed that:

- all batches have the same number of samples
- if `X1` and `X2` are successive batches of samples, then `X2[[i]]` is the follow-up sequence to `X1[[i]]`, for every `i`

To use statefulness in RNNs, you need to (see the sketch after this list):

- explicitly specify the batch size you are using, by passing a `batch_size` argument to the first layer in your model, e.g. `batch_size = 32` for batches of 32 sequences of 10 timesteps with 16 features per timestep
- set `stateful = TRUE` in your RNN layer(s)
- specify `shuffle = FALSE` when calling `fit()`
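Here is a minimal sketch putting those three requirements together (it assumes hypothetical `x_train`/`y_train` arrays of shape `(num_samples, 10, 16)` and `(num_samples, 1)`, where `num_samples` is a multiple of the batch size):

```{r}
model <- keras_model_sequential() %>%
  layer_lstm(units = 32, input_shape = c(10, 16), batch_size = 32,
             stateful = TRUE) %>%
  layer_dense(units = 1)

model %>% compile(optimizer = "rmsprop", loss = "mse")

# states carry over from batch to batch, so the data must not be shuffled
model %>% fit(x_train, y_train, batch_size = 32, epochs = 5, shuffle = FALSE)

# clear the accumulated states before feeding unrelated sequences
model %>% reset_states()
```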
To reset the states accumulated in either a single layer or an entire model, use the `reset_states()` function.

Note that the methods `predict()`, `fit()`, `train_on_batch()`, `predict_classes()`, etc. will *all* update the states of the stateful layers in a model. This allows you to do not only stateful training, but also stateful prediction.

## How can I remove a layer from a Sequential model?

You can remove the last added layer in a Sequential model by calling `pop_layer()`:

```{r}
model <- keras_model_sequential()
model %>%
  layer_dense(units = 32, activation = 'relu', input_shape = c(784)) %>%
  layer_dense(units = 32, activation = 'relu') %>%
  layer_dense(units = 32, activation = 'relu')

length(model$layers)   # "3"

model %>% pop_layer()

length(model$layers)   # "2"
```

## How can I use pre-trained models in Keras?

Code and pre-trained weights are available for the following image classification models:

- [Xception](https://tensorflow.rstudio.com/reference/keras/application_xception.html)
- [VGG16](https://tensorflow.rstudio.com/reference/keras/application_vgg.html)
- [VGG19](https://tensorflow.rstudio.com/reference/keras/application_vgg.html)
- [ResNet50](https://tensorflow.rstudio.com/reference/keras/application_resnet.html)
- [InceptionV3](https://tensorflow.rstudio.com/reference/keras/application_inception_v3.html)
- [InceptionResNetV2](https://tensorflow.rstudio.com/reference/keras/application_inception_resnet_v2.html)
- [MobileNet](https://tensorflow.rstudio.com/reference/keras/application_mobilenet.html)
- [MobileNetV2](https://tensorflow.rstudio.com/reference/keras/application_mobilenet_v2.html)
- [DenseNet](https://tensorflow.rstudio.com/reference/keras/application_densenet.html)
- [NASNet](https://tensorflow.rstudio.com/reference/keras/application_nasnet.html)

For example:

```{r}
model <- application_vgg16(weights = 'imagenet', include_top = TRUE)
```

For a few simple usage examples, see [the documentation for the Applications module](applications.html). The VGG16 model is also the basis for the [Deep dream](https://tensorflow.rstudio.com/examples/deep_dream) Keras example script.
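As a quick, hedged sketch of end-to-end usage (the image path `"elephant.jpg"` is a placeholder), you can classify a single image by loading it at the network's expected input size, preprocessing it for ImageNet, and decoding the predictions:

```{r}
model <- application_vgg16(weights = "imagenet", include_top = TRUE)

# load and preprocess a single image (placeholder path)
img <- image_load("elephant.jpg", target_size = c(224, 224)) %>%
  image_to_array() %>%
  array_reshape(dim = c(1, 224, 224, 3)) %>%
  imagenet_preprocess_input()

# predict and decode the top 3 ImageNet classes
preds <- model %>% predict(img)
imagenet_decode_predictions(preds, top = 3)
```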
## How can I use other Keras backends?

By default the Keras Python and R packages use the TensorFlow backend. Other available backends include Theano and CNTK. To learn more about using alternate backends (e.g. Theano or CNTK), see the article on Keras backends.

## How can I use the PlaidML backend?

[PlaidML](https://github.com/plaidml/plaidml) is an open source portable deep learning engine that runs on most existing PC hardware with OpenCL-capable GPUs from NVIDIA, AMD, or Intel. PlaidML includes a Keras backend which you can use as described below.

First, build and install PlaidML as described on the [project website](https://github.com/plaidml/plaidml). You must be sure that PlaidML is correctly installed, set up, and working before proceeding further!

Then, to use Keras with the PlaidML backend you do the following:

```{r}
library(keras)
use_backend("plaidml")
```

This should automatically discover and use the Python environment where `plaidml` and `plaidml-keras` were installed. If this doesn't work as expected you can also force the selection of a particular Python environment. For example, if you installed PlaidML in a conda environment named "plaidml" you would do this:

```{r}
library(keras)
use_condaenv("plaidml")
use_backend("plaidml")
```

## How can I use Keras in another R package?

### Testing on CRAN

The main consideration in using Keras within another R package is to ensure that your package can be tested in an environment where Keras is not available (e.g. the CRAN test servers). To do this, arrange for your tests to be skipped when Keras isn't available using the `is_keras_available()` function.

For example, here's a [testthat](https://r-pkgs.org/testing-basics.html) utility function that can be used to skip a test when Keras isn't available:

```{r}
# testthat utility for skipping tests when Keras isn't available
skip_if_no_keras <- function(version = NULL) {
  if (!is_keras_available(version))
    skip("Required keras version not available for testing")
}

# use the function within a test
test_that("keras function works correctly", {
  skip_if_no_keras()
  # test code here
})
```

You can pass the `version` argument to check for a specific version of Keras.

### Keras Module

Another consideration is gaining access to the underlying Keras Python module. You might need to do this if you require lower-level access to Keras than is provided for by the Keras R package.

Since the Keras R package can bind to multiple different implementations of Keras (either the original Keras or the TensorFlow implementation of Keras), you should use the `keras::implementation()` function to obtain access to the correct Python module. You can use this function within the `.onLoad` function of a package to provide global access to the module within your package. For example:

```{r}
# Keras Python module
keras <- NULL

# Obtain a reference to the module from the keras R package
.onLoad <- function(libname, pkgname) {
  keras <<- keras::implementation()
}
```

### Custom Layers

If you create [custom layers in R](custom_layers.html) or import other Python packages which include custom Keras layers, be sure to wrap them using the `create_layer()` function so that they are composable using the magrittr pipe operator. See the documentation on [layer wrapper functions](custom_layers.html#layer-wrapper-function) for additional details.

## How can I obtain reproducible results using Keras during development?

During development of a model, it is sometimes useful to be able to obtain reproducible results from run to run in order to determine whether a change in performance is due to an actual model or data modification, or merely a result of a new random sample.

The `use_session_with_seed()` function establishes a common random seed for R, Python, NumPy, and TensorFlow. It furthermore disables hash randomization, GPU computations, and CPU parallelization, which can be additional sources of non-reproducibility.

To use the function, call it immediately after you load the keras package:

```{r}
library(keras)
use_session_with_seed(42)

# ...rest of code follows...
```

This function takes all measures known to promote reproducible results from Keras sessions; however, it's possible that various individual features or libraries used by the backend escape its effects. If you encounter non-reproducible results, please investigate the possible sources of the problem. The source code for `use_session_with_seed()` is here: https://github.com/rstudio/tensorflow/blob/main/R/seed.R. Contributions via pull request are very welcome!
Please note again that `use_session_with_seed()` disables GPU computations and CPU parallelization by default (as both can lead to non-deterministic computations), so it should generally not be used when model training time is paramount. You can re-enable GPU computations and/or CPU parallelism using the `disable_gpu` and `disable_parallel_cpu` arguments. For example:

```{r}
library(keras)
use_session_with_seed(42, disable_gpu = FALSE, disable_parallel_cpu = FALSE)
```

## Where is the Keras configuration file stored?

The default directory where all Keras data is stored is:

```bash
~/.keras/
```

In case Keras cannot create the above directory (e.g. due to permission issues), `/tmp/.keras/` is used as a backup.

The Keras configuration file is a JSON file stored at `$HOME/.keras/keras.json`. The default configuration file looks like this:

```
{
    "image_data_format": "channels_last",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow"
}
```

It contains the following fields:

- The image data format to be used as default by image processing layers and utilities (either `channels_last` or `channels_first`).
- The `epsilon` numerical fuzz factor to be used to prevent division by zero in some operations.
- The default float data type.
- The default backend (this will always be "tensorflow" in the R interface to Keras).

Likewise, cached dataset files, such as those downloaded with `get_file()`, are stored by default in `$HOME/.keras/datasets/`.
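As a small sketch of that caching behavior (the file name and URL below are placeholders), `get_file()` downloads a file the first time it is called and returns the path to the cached copy under `~/.keras/datasets/` on subsequent calls:

```{r}
path <- get_file(
  "mydata.csv",
  origin = "https://example.com/mydata.csv"
)

# the returned path points into the Keras cache, e.g. "~/.keras/datasets/mydata.csv"
path
```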