# Export a TensorFlow model to LiteRT

You can convert TensorFlow models to LiteRT models and optimize them
for on-device inference. For more information about LiteRT model
conversion, see [Model conversion overview](https://ai.google.dev/edge/litert/models/convert).

LiteRT model conversion supports the following numeric precisions:

- 32-bit floating-point precision
- 16-bit floating-point precision
- 8-bit integer (int8 or uint8) precision, through quantization

The following table lists the conversion methods that TensorFlow
provides for converting a TensorFlow model or a Keras model to the
LiteRT format:

TensorFlow model conversion methods

| Conversion method | Description |
| --- | --- |
| Python APIs | Converts, optimizes, and quantizes models to the LiteRT format |
| Command-line interface (CLI) tool | Converts models to the LiteRT format; suitable for basic model conversion only |

The Python APIs offer more flexibility than the CLI tool to convert,
optimize, and quantize models to suit your requirements.

## Convert models using Python APIs

The following table lists the Python APIs that TensorFlow provides to
convert a TensorFlow SavedModel or a Keras model to a LiteRT model:

TensorFlow Python APIs to convert models

| API | Description |
| --- | --- |
| `tf.lite.TFLiteConverter.from_saved_model()` (recommended) | Converts a TensorFlow SavedModel |
| `tf.lite.TFLiteConverter.from_keras_model()` | Converts a Keras model |

### Recommended: Convert a TensorFlow SavedModel using the Python API

The following example shows how to convert a TensorFlow model saved
in the SavedModel format to a LiteRT model:

```python
import tensorflow as tf

# Convert the model
saved_model_dir = "/path/to/tf/model/in/saved_model/format"
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
tflite_model = converter.convert()

# Save the model
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```

Note

The converted LiteRT model isn’t quantized, and its data is in 32-bit floating-point precision.
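To check a conversion, you can load the LiteRT model with an interpreter and run it on sample input. The following minimal sketch assumes a hypothetical toy `tf.Module` (`Doubler`, which multiplies its input by 2) saved to a temporary directory in place of your own SavedModel. It confirms that the input tensor stays `float32` and that the output matches the original computation. It prefers the `Interpreter` class from the `ai_edge_litert` package when that package is installed and otherwise falls back to `tf.lite.Interpreter`, which recent TensorFlow releases mark as deprecated.

```python
import tempfile

import numpy as np
import tensorflow as tf

try:
    # Standalone LiteRT runtime, if installed.
    from ai_edge_litert.interpreter import Interpreter
except ImportError:
    Interpreter = tf.lite.Interpreter


class Doubler(tf.Module):
    """Hypothetical toy model used only for illustration: y = 2 * x."""

    @tf.function(input_signature=[tf.TensorSpec(shape=[1, 4], dtype=tf.float32)])
    def __call__(self, x):
        return x * 2.0


def convert_and_run(x):
    """Export the toy model, convert it to LiteRT, and run one inference."""
    module = Doubler()
    saved_model_dir = tempfile.mkdtemp()
    # Export with an explicit serving signature so the converter can find it.
    tf.saved_model.save(
        module, saved_model_dir, signatures=module.__call__.get_concrete_function()
    )
    tflite_model = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir).convert()

    # Load the converted model from memory, feed the input, and read the output.
    interpreter = Interpreter(model_content=tflite_model)
    interpreter.allocate_tensors()
    input_details = interpreter.get_input_details()[0]
    output_details = interpreter.get_output_details()[0]
    interpreter.set_tensor(input_details["index"], x)
    interpreter.invoke()
    return input_details["dtype"], interpreter.get_tensor(output_details["index"])


x = np.array([[1.0, 2.0, 3.0, 4.0]], dtype=np.float32)
dtype, y = convert_and_run(x)
print("input dtype:", dtype, "output:", y)
```

For a real model, replace the toy module with your own SavedModel directory and compare the interpreter output with the output of the original TensorFlow model on the same input.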

### Convert a Keras model using the Python API

The following example shows how to convert a Keras model to a LiteRT model:

```python
import numpy as np
import tensorflow as tf

# Create a model using high-level tf.keras.* APIs
model = tf.keras.models.Sequential(
    [
        tf.keras.Input(shape=(1,)),
        tf.keras.layers.Dense(units=1),
        tf.keras.layers.Dense(units=16, activation="relu"),
        tf.keras.layers.Dense(units=1),
    ]
)

# Compile the model
model.compile(optimizer="sgd", loss="mean_squared_error")

# Train the model on a few (x, y) pairs
x_train = np.array([[-1.0], [0.0], [1.0]], dtype=np.float32)
y_train = np.array([[-3.0], [-1.0], [1.0]], dtype=np.float32)
model.fit(x=x_train, y=y_train, epochs=5)

# Convert the model to LiteRT
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Save the model
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```

Note

The converted LiteRT model isn’t quantized, and its data is in 32-bit floating-point precision.

### Quantize models

When you convert models to the LiteRT format with the Python APIs,
you can also quantize them. Quantization reduces the size and
computational requirements of a model by converting high-precision
values, such as 32-bit floating-point numbers, into lower-precision
formats, such as 8-bit integers.
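As a worked illustration, the LiteRT 8-bit quantization specification represents each real value $x$ with an integer $q$, a scale $s$, and a zero point $z$. The formulas for $s$ and $z$ below, which derive them from an observed range $[x_{\min}, x_{\max}]$, are a common asymmetric choice shown here as an assumption, not the converter's exact algorithm; the specification also uses symmetric quantization ($z = 0$) for weights.

$$
x \approx s\,(q - z), \qquad q = \operatorname{clamp}\!\left(\operatorname{round}\!\left(\frac{x}{s}\right) + z,\; q_{\min},\; q_{\max}\right)
$$

$$
s = \frac{x_{\max} - x_{\min}}{q_{\max} - q_{\min}}, \qquad z = q_{\min} - \operatorname{round}\!\left(\frac{x_{\min}}{s}\right)
$$

For example, with int8 ($q_{\min} = -128$, $q_{\max} = 127$) and a range of $[0, 5.1]$, $s = 5.1/255 = 0.02$ and $z = -128$. The value $x = 1.0$ maps to $q = 50 - 128 = -78$, which dequantizes back to $0.02 \times (-78 + 128) = 1.0$.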

Quantization in neural network models involves the following steps:

1. Quantize weights and biases: These values are fixed once the model
is trained, so you can quantize them without additional information.
Therefore, quantizing weights and biases is a static step.
2. Quantize activations: The range of each activation layer's output
depends on the input data during forward propagation. Therefore,
quantizing these layers requires a set of sample inputs, known as a
calibration or representative dataset, to determine the minimum and
maximum values of each activation.
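The two steps above can be sketched in plain Python. This is a simplified illustration, not the converter's implementation: the `weights`, the toy `relu_layer` function, and the `representative_inputs` are hypothetical, and the sketch uses per-tensor asymmetric int8 parameters for both tensors. Weight parameters come straight from the stored values, while activation parameters come from running sample inputs through the layer and recording the observed minimum and maximum.

```python
def affine_params(x_min, x_max, q_min=-128, q_max=127):
    """Return (scale, zero_point) mapping [x_min, x_max] onto [q_min, q_max]."""
    # Widen the range to include 0 so that 0.0 is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (q_max - q_min) or 1.0  # avoid a zero scale
    zero_point = q_min - round(x_min / scale)
    return scale, zero_point


def quantize(x, scale, zero_point, q_min=-128, q_max=127):
    """Map a float to an integer and clamp it to the int8 range."""
    return max(q_min, min(q_max, round(x / scale) + zero_point))


def dequantize(q, scale, zero_point):
    """Map an integer back to an approximate float."""
    return scale * (q - zero_point)


# Step 1 (static): weights are known after training, so their range is known.
weights = [0.5, -1.2, 0.8, 2.0]  # hypothetical trained weights
w_scale, w_zp = affine_params(min(weights), max(weights))
q_weights = [quantize(w, w_scale, w_zp) for w in weights]


# Step 2 (calibration): activation ranges depend on the input, so run samples.
def relu_layer(inputs):
    """Hypothetical layer: ReLU of a dot product with the float weights."""
    return max(0.0, sum(w * x for w, x in zip(weights, inputs)))


representative_inputs = [
    [1.0, 0.0, 0.5, 0.2],
    [0.3, 0.9, 0.1, 1.0],
    [0.0, 0.2, 0.7, 0.4],
]
activations = [relu_layer(x) for x in representative_inputs]
a_scale, a_zp = affine_params(min(activations), max(activations))

print("weights:", q_weights, "scale:", w_scale, "zero point:", w_zp)
print("activations: scale:", a_scale, "zero point:", a_zp)
```

Because each range is widened to include 0, both zero points stay within [-128, 127], and each weight's round-trip error is at most half of `w_scale`.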

To quantize a floating-point TensorFlow model into a quantized LiteRT
model, LiteRT provides post-training quantization techniques. For more
information, see
[Post-training quantization](https://ai.google.dev/edge/litert/models/post_training_quantization).

LiteRT supports the following types of post-training quantization:

- [Dynamic range quantization](https://docs.qualcomm.com/doc/80-80022-15B/topic/export-tf-model-litert.html#dynamic-range-quantization)
- [Full integer quantization](https://docs.qualcomm.com/doc/80-80022-15B/topic/export-tf-model-litert.html#full-integer-quantization)

#### Quantize models using dynamic range quantization

In dynamic range quantization, the converter statically quantizes only
the weights, from floating-point precision to 8-bit integer precision,
at conversion time. Activations remain in 32-bit floating-point
precision.

To further reduce inference latency, dynamic-range operators do the
following at run time:

- Quantize activations to 8-bit integer precision based on their
observed range
- Perform computations with 8-bit weights and activations
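The following plain-Python sketch illustrates these run-time steps for a single dot product. It is a simplified model, not LiteRT's kernel: it assumes symmetric int8 quantization (zero point 0) for both the weights and the activations, and the weight and input values are hypothetical.

```python
def symmetric_quantize(values, num_bits=8):
    """Quantize floats symmetrically to signed integers; return (ints, scale)."""
    q_max = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values) or 1.0  # avoid a zero scale
    scale = max_abs / q_max
    return [max(-q_max, min(q_max, round(v / scale))) for v in values], scale


# Done once, at conversion time: weights are stored as int8 plus a float scale.
float_weights = [0.25, -0.5, 0.75, 1.0]  # hypothetical trained weights
q_weights, w_scale = symmetric_quantize(float_weights)


def dynamic_range_dot(activations):
    """Quantize incoming float activations using their own range, at run time."""
    q_acts, a_scale = symmetric_quantize(activations)
    # Integer multiply-accumulate (a real kernel accumulates in int32).
    acc = sum(qw * qa for qw, qa in zip(q_weights, q_acts))
    # Rescale the integer result back to float for the next operator.
    return acc * w_scale * a_scale


x = [0.1, 0.2, -0.3, 0.4]
approx = dynamic_range_dot(x)
exact = sum(w * a for w, a in zip(float_weights, x))
print("int8 result:", approx, "float result:", exact)
```

The activation scale is recomputed for every input, which is why this technique needs no calibration data but adds a small run-time cost.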

Note

Dynamic range quantization quantizes only the weights ahead of time, so it doesn’t need calibration data.

The following script converts a TensorFlow SavedModel to a LiteRT
model and applies dynamic range quantization during conversion:

```python
import tensorflow as tf

# Path to your model in the SavedModel format
saved_model_dir = "/path/to/saved/model"

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]
# Optimize.DEFAULT without a representative dataset applies dynamic range quantization
converter.optimizations = [tf.lite.Optimize.DEFAULT]

tflite_model = converter.convert()
save_name = "quantized_model.tflite"

print("Saving dynamic range quantized LiteRT model")

with open(save_name, "wb") as f:
    f.write(tflite_model)
```

#### Quantize models using full integer quantization

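In full integer quantization, the converter quantizes both the weights
and the activations to integers. It runs a representative dataset
through the model to calibrate the ranges of the activation layers.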

The following script converts and quantizes a TensorFlow model to a
LiteRT model. It generates a full integer quantized model that’s more
suitable for fixed-point integer hardware, such as the Hexagon Tensor
Processor on Qualcomm development kits.

```python
import tensorflow as tf

# `dataset` is a placeholder for your own calibration samples. The keys of
# each yielded dictionary must match the input names of the model's signature.
def representative_dataset():
    for data in dataset:
        yield {"image": data.image, "bias": data.bias}

saved_model_dir = "/path/to/saved/model"

# Prepare the converter by loading the model in the SavedModel format.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Set the representative dataset used for quantization.
converter.representative_dataset = representative_dataset

# For full integer quantization, set target_spec.supported_ops to TFLITE_BUILTINS_INT8.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8  # or tf.uint8
converter.inference_output_type = tf.int8  # or tf.uint8

# Convert the model
tflite_quant_model = converter.convert()
save_name = "quantized_model_int8.tflite"

print("Saving full integer quantized LiteRT model")

with open(save_name, "wb") as f:
    f.write(tflite_quant_model)
```

Note

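Setting `converter.target_spec.supported_ops` to `[tf.lite.OpsSet.TFLITE_BUILTINS_INT8]` restricts the converted model to integer operations. If the model contains an operation that the converter can’t quantize, conversion fails with an error.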
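Because a fully integer-quantized model with `int8` inputs and outputs expects integer data, callers have to quantize input values with the input tensor's scale and zero point before inference and dequantize the output afterwards. The following sketch shows this end to end under stated assumptions: the hypothetical toy `TinyDense` module stands in for your SavedModel, and random NumPy samples stand in for a real representative dataset. For real models, calibrate with samples from your actual input distribution. The sketch prefers the `Interpreter` class from the `ai_edge_litert` package when it's installed and otherwise falls back to `tf.lite.Interpreter`.

```python
import tempfile

import numpy as np
import tensorflow as tf

try:
    from ai_edge_litert.interpreter import Interpreter
except ImportError:
    Interpreter = tf.lite.Interpreter


class TinyDense(tf.Module):
    """Hypothetical toy model used only for illustration: relu(x @ w + b)."""

    def __init__(self):
        super().__init__()
        self.w = tf.Variable(tf.random.normal([4, 2], seed=0))
        self.b = tf.Variable(tf.zeros([2]))

    @tf.function(input_signature=[tf.TensorSpec(shape=[1, 4], dtype=tf.float32)])
    def __call__(self, x):
        return tf.nn.relu(tf.matmul(x, self.w) + self.b)


def representative_dataset():
    # Placeholder calibration data: 100 random samples with the input's shape.
    rng = np.random.default_rng(0)
    for _ in range(100):
        yield [rng.random((1, 4), dtype=np.float32)]


module = TinyDense()
saved_model_dir = tempfile.mkdtemp()
tf.saved_model.save(
    module, saved_model_dir, signatures=module.__call__.get_concrete_function()
)

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_quant_model = converter.convert()

interpreter = Interpreter(model_content=tflite_quant_model)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# Quantize a float input: q = round(x / scale) + zero_point, clamped to int8.
x = np.array([[0.2, 0.4, 0.6, 0.8]], dtype=np.float32)
in_scale, in_zero_point = input_details["quantization"]
q_x = np.clip(np.round(x / in_scale) + in_zero_point, -128, 127).astype(np.int8)

interpreter.set_tensor(input_details["index"], q_x)
interpreter.invoke()

# Dequantize the int8 output: x = (q - zero_point) * scale.
q_y = interpreter.get_tensor(output_details["index"])
out_scale, out_zero_point = output_details["quantization"]
y = (q_y.astype(np.float32) - out_zero_point) * out_scale
print("int8 output:", q_y, "dequantized:", y, "float reference:", module(x).numpy())
```

Because both the weights and the output are quantized, `y` approximates the float reference rather than matching it exactly.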

## Convert models using the `tflite_convert` command

The TensorFlow pip package includes `tflite_convert`, the LiteRT
(formerly TensorFlow Lite) offline converter command-line tool. You can
use it for offline conversions with TensorFlow 2.x.

The `tflite_convert` command accepts the following flags. To list all
of them, run:

```shell
tflite_convert --help
```

- `--output_file`. Type: string. Full path of the output file.
- `--saved_model_dir`. Type: string. Full path to the SavedModel directory.
- `--keras_model_file`. Type: string. Full path to the Keras H5 model file.
- `--enable_v1_converter`. Type: bool. (default False) Enables the converter and flags used in TF 1.x instead of TF 2.x.

You must provide the `--output_file` flag and either the
`--saved_model_dir` or the `--keras_model_file` flag.

You can convert the following models using the `tflite_convert` command:

- [SavedModel](https://docs.qualcomm.com/doc/80-80022-15B/topic/export-tf-model-litert.html#convert-savedmodel)
- [Keras H5](https://docs.qualcomm.com/doc/80-80022-15B/topic/export-tf-model-litert.html#convert-keras-h5)

### Convert a SavedModel using the `tflite_convert` command

To convert a typical TensorFlow model in the SavedModel format using
the `tflite_convert` command, run the following command:

```shell
tflite_convert \
  --saved_model_dir=/tmp/mobilenet_saved_model \
  --output_file=/tmp/mobilenet.tflite
```

### Convert a Keras H5 model using the `tflite_convert` command

To convert a Keras H5 model using the `tflite_convert` command, run the
following command:

```shell
tflite_convert \
  --keras_model_file=/tmp/mobilenet_keras_model.h5 \
  --output_file=/tmp/mobilenet.tflite
```

Note

The `tflite_convert` command is suitable for basic conversion only. For
post-training integer quantization, use the Python APIs.

Last Published: May 14, 2026
