HistogramObserver
class torch.ao.quantization.observer.HistogramObserver(bins=2048, upsample_rate=128, dtype=torch.quint8, qscheme=torch.per_tensor_affine, reduce_range=False, quant_min=None, quant_max=None, factory_kwargs=None, eps=1.1920928955078125e-07, is_dynamic=False, **kwargs)
This module records the running histogram of tensor values along with the min/max values. calculate_qparams will calculate scale and zero_point.
Parameters
bins (int) – Number of bins to use for the histogram
upsample_rate (int) – Factor by which the histograms are upsampled; this is used to interpolate histograms with varying ranges across observations
dtype (dtype) – dtype argument to the quantize node needed to implement the reference model spec
qscheme – Quantization scheme to be used
reduce_range – Reduces the range of the quantized data type by 1 bit
eps (Tensor) – Epsilon value for float32. Defaults to torch.finfo(torch.float32).eps.
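To make the effect of reduce_range concrete, the following is a minimal pure-Python sketch of how dropping one bit changes the integer range that quantized values can take. The helper name quant_range and its signed/bits arguments are hypothetical and are not part of the PyTorch API; the sketch assumes an 8-bit type, with the unsigned case standing in for torch.quint8 and the signed case for torch.qint8.

```python
def quant_range(signed: bool, bits: int = 8, reduce_range: bool = False):
    """Return (quant_min, quant_max) for an integer type of the given width.

    Hypothetical helper for illustration only; not a PyTorch function.
    """
    if reduce_range:
        bits -= 1  # give up one bit of range
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


# Unsigned 8-bit (analogous to torch.quint8)
print(quant_range(signed=False))                     # (0, 255)
print(quant_range(signed=False, reduce_range=True))  # (0, 127)

# Signed 8-bit (analogous to torch.qint8)
print(quant_range(signed=True))                      # (-128, 127)
print(quant_range(signed=True, reduce_range=True))   # (-64, 63)
```

With reduce_range enabled, only half as many integer levels are available, so the resulting scale is roughly twice as coarse for the same observed min/max.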
The scale and zero point are computed as follows:
- Create the histogram of the incoming inputs.
The histogram is computed continuously, and the ranges per bin change with every new tensor observed.
- Search the distribution in the histogram for optimal min/max values.
The search selects the min/max values that minimize the quantization error with respect to the floating point model.
- Compute the scale and zero point the same way as in the MinMaxObserver.
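The steps above can be sketched in plain Python under some simplifying assumptions. The function names (affine_qparams, quant_error, search_min_max) are hypothetical and not part of PyTorch. The search here is a coarse grid search over raw sample values, used as a stand-in for the observer's own histogram-based search, so it illustrates the idea rather than the actual implementation. The final step follows the usual per-tensor affine formula, with the range widened to include zero.

```python
EPS = 1.1920928955078125e-07  # float32 epsilon, matching the default eps


def affine_qparams(min_val, max_val, quant_min=0, quant_max=255, eps=EPS):
    """Per-tensor affine scale and zero point from a min/max pair."""
    # Widen the range to include zero so that 0.0 is exactly representable.
    min_neg = min(min_val, 0.0)
    max_pos = max(max_val, 0.0)
    scale = max((max_pos - min_neg) / float(quant_max - quant_min), eps)
    zero_point = quant_min - round(min_neg / scale)
    zero_point = max(quant_min, min(quant_max, zero_point))
    return scale, int(zero_point)


def quant_error(values, min_val, max_val, quant_min=0, quant_max=255):
    """Sum of squared errors after a quantize/dequantize round trip."""
    scale, zp = affine_qparams(min_val, max_val, quant_min, quant_max)
    err = 0.0
    for v in values:
        q = max(quant_min, min(quant_max, round(v / scale) + zp))
        err += (v - (q - zp) * scale) ** 2
    return err


def search_min_max(values, steps=20):
    """Grid-search a clipped (min, max) that lowers the round-trip error.

    Simplified stand-in: the real observer searches over its histogram.
    """
    lo, hi = min(values), max(values)
    best = (quant_error(values, lo, hi), lo, hi)
    for i in range(steps):
        for j in range(steps):
            # Each end may move inward by less than half the full range.
            cand_lo = lo + (hi - lo) * i / (2 * steps)
            cand_hi = hi - (hi - lo) * j / (2 * steps)
            err = quant_error(values, cand_lo, cand_hi)
            if err < best[0]:
                best = (err, cand_lo, cand_hi)
    return best[1], best[2]


# Many small values plus one outlier: clipping the outlier can pay off.
samples = [i / 1000.0 for i in range(1000)] + [50.0]
new_min, new_max = search_min_max(samples)
scale, zero_point = affine_qparams(new_min, new_max)
print(new_min, new_max, scale, zero_point)
```

Clipping the range trades a large error on a few outliers for a finer scale on the bulk of the values, which is why a searched min/max can give lower overall error than the raw min/max.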