# Convert and quantize using Python APIs

Source: [https://docs.qualcomm.com/doc/80-70014-54/topic/convert-and-quantize-using-python-apis.html](https://docs.qualcomm.com/doc/80-70014-54/topic/convert-and-quantize-using-python-apis.html)

TensorFlow offers the following APIs to convert a TensorFlow SavedModel or a Keras model to a TensorFlow Lite model:

- `tf.lite.TFLiteConverter.from_saved_model()` (recommended): Converts a SavedModel
- `tf.lite.TFLiteConverter.from_keras_model()`: Converts a Keras model

## Convert a TensorFlow SavedModel (recommended)

The following example shows how to convert a TensorFlow model saved in the SavedModel format to a TensorFlow Lite model:

    import tensorflow as tf
    
    # Convert the model
    saved_model_dir = "/path/to/tf/model/in/saved_model/format"
    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    tflite_model = converter.convert()
    
    # Save the model
    with open('model.tflite', 'wb') as f:
        f.write(tflite_model)

Note: The converted TensorFlow Lite model is not quantized; its data remains in 32-bit floating-point precision.

## Convert a Keras model

The following example shows how to convert a Keras model to a TensorFlow Lite model:

    import tensorflow as tf
    
    # Create a model using high-level tf.keras.* APIs
    model = tf.keras.models.Sequential([
        tf.keras.layers.Dense(units=1, input_shape=[1]),
        tf.keras.layers.Dense(units=16, activation='relu'),
        tf.keras.layers.Dense(units=1)
    ])
    
    # Compile the model
    model.compile(optimizer='sgd', loss='mean_squared_error')
    
    # Train the model
    model.fit(x=[-1, 0, 1], y=[-3, -1, 1], epochs=5)
    
    # Convert the model to TFLite
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    tflite_model = converter.convert()
    
    # Save the model
    with open('model.tflite', 'wb') as f:
        f.write(tflite_model)

Note: The converted TensorFlow Lite model is not quantized; its data remains in 32-bit floating-point precision.

## Quantize models

Quantization in neural network models involves the following steps:

1. Quantize weights and biases: This step is static because weights and biases are already part of the trained model and can be quantized without additional information.
2. Quantize activation layers: The output range of an activation layer depends on the input (for example, an image) during forward propagation. Therefore, a set of sample inputs is required to identify the minimum and maximum of each range and quantize these layers. This set of sample inputs is called a calibration or representative data set.
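The two steps above can be illustrated with a minimal pure-Python sketch of affine (scale and zero-point) quantization to signed 8-bit integers. The helper names `affine_params`, `quantize`, and `dequantize`, and all sample values, are hypothetical and chosen only for illustration; the TensorFlow Lite converter performs the equivalent work internally and may differ in detail (for example, per-channel weight scales).

```python
def affine_params(min_val, max_val, qmin=-128, qmax=127):
    """Return (scale, zero_point) mapping [min_val, max_val] onto [qmin, qmax]."""
    # Widen the range to include 0.0 so that zero is exactly representable.
    min_val = min(min_val, 0.0)
    max_val = max(max_val, 0.0)
    if max_val == min_val:
        return 1.0, 0
    scale = (max_val - min_val) / (qmax - qmin)
    zero_point = int(round(qmin - min_val / scale))
    return scale, max(qmin, min(qmax, zero_point))


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to clamped integers: q = round(x / scale) + zero_point."""
    return [max(qmin, min(qmax, int(round(v / scale)) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Map integers back to approximate floats: x = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in q_values]


# Step 1: weights are known ahead of time, so their range is read directly.
weights = [-0.5, 0.0, 0.3, 1.0]
w_scale, w_zp = affine_params(min(weights), max(weights))
q_weights = quantize(weights, w_scale, w_zp)

# Step 2: activation ranges are only known after running sample inputs.
# These lists stand in for activations recorded on a representative data set.
calibration_activations = [[0.0, 2.0, 3.5], [0.1, 5.1, 1.0], [0.0, 4.0, 2.2]]
act_min = min(min(a) for a in calibration_activations)
act_max = max(max(a) for a in calibration_activations)
a_scale, a_zp = affine_params(act_min, act_max)

print(q_weights, w_scale, w_zp)
print(a_scale, a_zp)
```

The rounding error of each dequantized weight is bounded by half a quantization step (`w_scale / 2`), which is why a wider calibrated range costs precision.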

To convert a TensorFlow floating-point model to a quantized TensorFlow Lite model, TensorFlow Lite provides posttraining quantization techniques. For more information, see [Posttraining quantization](https://www.tensorflow.org/lite/performance/post_training_quantization).

## Posttraining quantization

TensorFlow Lite supports two types of posttraining quantization:

- Posttraining dynamic range quantization
- Posttraining full-integer quantization

## Posttraining dynamic range quantization

In posttraining dynamic range quantization, weights are quantized statically at conversion time from floating-point precision to fixed-point 8-bit integer precision. Activations remain in 32-bit floating-point precision in the converted model.

To reduce latencies during inference, dynamic-range operators do the following:

- Quantize activations based on their ranges to fixed-point 8-bit integer precision
- Perform a computation with 8‑bit weights and activations

Note: No additional calibration data is needed in this step because only weights are quantized.
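As a rough illustration of the behavior described above, the following pure-Python sketch emulates a dynamic-range dot product: weights are quantized once ahead of time, and activations are quantized at inference time from the range actually observed. The function names and sample values are hypothetical, and symmetric scaling is an assumption made for brevity; the real TensorFlow Lite kernels may use a different scheme.

```python
def quantize_symmetric(values):
    """Symmetric int8 quantization. Returns (integer values, scale)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q_values = [max(-127, min(127, round(v / scale))) for v in values]
    return q_values, scale


# Conversion time: weights are quantized once and stored as 8-bit integers.
weights = [0.6, -1.0, 0.25]
q_weights, w_scale = quantize_symmetric(weights)


def dynamic_range_dot(activations):
    """Emulate a dynamic-range operator for a single dot product."""
    # Inference time: the activation range is measured on the fly,
    # so no calibration data set is needed.
    q_acts, a_scale = quantize_symmetric(activations)
    # Integer multiply-accumulate on 8-bit operands.
    acc = sum(qa * qw for qa, qw in zip(q_acts, q_weights))
    # Rescale the integer accumulator back to floating point.
    return acc * a_scale * w_scale


activations = [1.5, 1.0, -4.0]
approx = dynamic_range_dot(activations)
exact = sum(a * w for a, w in zip(activations, weights))
print(approx, exact)
```

The output stays in floating point, which matches the statement that activations are not stored in quantized form between operators.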

The following script converts a TensorFlow model to a dynamic range quantized TensorFlow Lite model:

    import tensorflow as tf
    
    # Placeholder path to a model in SavedModel format.
    exp_model_path = "/path/to/saved/model"
    
    converter = tf.lite.TFLiteConverter.from_saved_model(exp_model_path)
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]
    # Optimize.DEFAULT without a representative dataset enables
    # dynamic range quantization. An empty list leaves the model in float32.
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    
    tflite_model = converter.convert()
    save_name = 'quantized_model.tflite'
    
    print('Saving dynamic range quantized TFLite model ...')
    
    with open(save_name, 'wb') as f:
        f.write(tflite_model)

## Posttraining full-integer quantization

In full-integer quantization, weights and activations are both quantized to 8-bit integers. A representative data set is used to calibrate the ranges of the activation layers within the model.

The following script converts a TensorFlow model to a full-integer quantized TensorFlow Lite model:

    import tensorflow as tf
    
    # `dataset` is assumed to be an iterable of calibration samples defined
    # elsewhere. The keys must match the input names of the model signature.
    def representative_dataset():
        for data in dataset:
            yield {
                "image": data.image,
                "bias": data.bias,
            }
    
    saved_model_dir = "/path/to/saved/model"
    
    # Prepare the converter by loading the model in SavedModel format.
    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    
    # Set the representative dataset used for quantization.
    converter.representative_dataset = representative_dataset
    
    # For full-integer quantization, set target_spec supported_ops to TFLITE_BUILTINS_INT8.
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8  # or tf.uint8
    converter.inference_output_type = tf.int8  # or tf.uint8
    
    # Convert the model
    tflite_quant_model = converter.convert()
    save_name = 'quantized_model_int8.tflite'
    
    print('Saving quantized TFLite model ...')
    
    with open(save_name, 'wb') as f:
        f.write(tflite_quant_model)

Note: `supported_ops` in the converter `target_spec` is set to `tf.lite.OpsSet.TFLITE_BUILTINS_INT8`.

This script generates a full-integer quantized model, which is better suited to fixed-point integer hardware such as the Hexagon Tensor Processor on the Qualcomm® RB3 Gen 2 Platform.
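Because the script above sets `inference_input_type` and `inference_output_type` to `tf.int8`, the caller is responsible for mapping floating-point inputs to 8-bit integers and for mapping integer outputs back to floating point. The following pure-Python sketch shows that arithmetic. The helper names and the scale and zero-point values are hypothetical; with a real model, these parameters can be read from the `"quantization"` entry of `tf.lite.Interpreter.get_input_details()` and `get_output_details()`.

```python
def quantize_input(values, scale, zero_point, dtype_min=-128, dtype_max=127):
    """Map float inputs to int8: q = round(x / scale) + zero_point, clamped."""
    return [
        max(dtype_min, min(dtype_max, round(v / scale) + zero_point))
        for v in values
    ]


def dequantize_output(q_values, scale, zero_point):
    """Map int8 outputs back to floats: x = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in q_values]


# Hypothetical input parameters for pixel values normalized to [0, 1].
input_scale, input_zero_point = 1.0 / 255.0, -128
pixels = [0.0, 0.2, 1.0]
q_pixels = quantize_input(pixels, input_scale, input_zero_point)

# Hypothetical output parameters for scores in the range [0, 1).
output_scale, output_zero_point = 1.0 / 256.0, -128
raw_scores = [-128, 0, 127]  # stand-in for int8 values returned by the model
scores = dequantize_output(raw_scores, output_scale, output_zero_point)

print(q_pixels)
print(scores)
```

Values outside the representable range are clamped, so inputs should be preprocessed the same way as the representative data set used during conversion.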

**Parent Topic:** [Convert a TensorFlow or Keras model to TensorFlow Lite format](https://docs.qualcomm.com/doc/80-70014-54/topic/convert-a-tensorflow-or-keras-model-to-tensorflow-lite-format.html)

Last Published: Jul 12, 2024
