
Introductory Guide to Deep Learning Compilers (Q4): Model Quantization - INT8 Quantization Experiment

最编程 2024-03-30 19:34:23
...

Quantization computation

Import the dependencies

from torch.autograd import Variable
import torch
import math

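1. Compute the scale factor

Compute the maximum number of binary bits needed to represent the input tensor; this value is then used to compute the scale factor:
scalefactor = 1 - \frac{max\_bits}{bits - 1}, \quad bits = 8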

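def compute_integral_part(input, overflow_rate):
    # Flatten the absolute values and sort them in descending order.
    abs_value = input.abs().view(-1)
    sorted_value = abs_value.sort(dim=0, descending=True)[0]
    # Skip the largest overflow_rate fraction of values; those are allowed to saturate.
    split_idx = int(overflow_rate * len(sorted_value))
    v = sorted_value[split_idx]
    if isinstance(v, Variable):
        v = float(v.data.cpu())
    # Number of integer bits needed to cover v; 1e-12 guards against log2(0).
    max_bits = math.ceil(math.log2(v+1e-12))
    return max_bits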
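
To make the role of overflow_rate concrete, here is a minimal sketch on a small hand-picked tensor. The helper name integral_bits and the sample values are assumptions made for this illustration; the helper only restates the arithmetic of compute_integral_part in compact form.

```python
import math
import torch

def integral_bits(x, overflow_rate):
    # Hypothetical compact restatement of compute_integral_part, for illustration only.
    sorted_abs = x.abs().view(-1).sort(descending=True)[0]
    v = float(sorted_abs[int(overflow_rate * len(sorted_abs))])
    return math.ceil(math.log2(v + 1e-12))

x = torch.tensor([0.3, -1.5, 6.0, -100.0])

# overflow_rate = 0.0: the threshold is the largest magnitude, 100.0.
bits_no_overflow = integral_bits(x, 0.0)
# overflow_rate = 0.25: the top quarter (one value) is skipped, so the threshold is 6.0.
bits_quarter = integral_bits(x, 0.25)

print(bits_no_overflow, bits_quarter)  # 7 3
```

A larger overflow_rate lets a few outliers saturate so that the remaining values get a finer step size.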

2. Apply linear quantization

Quantize the input tensor

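def linear_quantize(input, sf, bits):
    assert bits >= 1, bits
    if bits == 1:
        return torch.sign(input) - 1
    # Step size between adjacent quantization levels.
    delta = math.pow(2.0, -sf)
    # Signed integer range for this bit width: [-2^(bits-1), 2^(bits-1) - 1].
    bound = math.pow(2.0, bits-1)
    min_val = - bound
    max_val = bound - 1
    # Round to the nearest level (ties round up), clamp, then rescale by delta.
    rounded = torch.floor(input / delta + 0.5)
    clipped_value = torch.clamp(rounded, min_val, max_val) * delta
    return clipped_value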
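
The function above snaps every value to a multiple of delta = 2^(-sf) and clamps the integer code to the signed range of the given bit width. The sketch below only computes that grid for one setting; quant_grid is a hypothetical helper, and sf = -2 is an arbitrary value chosen for the example.

```python
import math

def quant_grid(sf, bits):
    # Hypothetical helper: step size and representable range implied by linear_quantize.
    delta = math.pow(2.0, -sf)
    bound = math.pow(2.0, bits - 1)
    return delta, -bound * delta, (bound - 1) * delta

delta, lo, hi = quant_grid(sf=-2, bits=8)
print(delta, lo, hi)  # 4.0 -512.0 508.0
```

A smaller sf widens the representable range at the cost of a coarser step; a larger sf does the opposite.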

3. Demo

input = (torch.rand((4,3))-0.5)*1024

input = tensor([[ -99.4158,  -69.1860,  430.0259],
                [ 506.4909,  262.8812,  504.7529],
                [-342.8090, -338.9429,  -12.9747],
                [ 267.9500, -192.5225,  449.5162]])

max_bits = compute_integral_part(input,0.0)

max_bits = 9

bits = 8
sf = 1 - max_bits/(bits-1)
output = linear_quantize(input,sf,bits)

output = tensor([[ -99.,  -69.,  127.],
                 [ 127.,  127.,  127.],
                 [-128., -128.,  -13.],
                 [ 127., -128.,  127.]])
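
As a sanity check, the sketch below reruns the demo on the tensor shown above. Under Python 3's true division, 1 - max_bits/(bits-1) evaluates to roughly -0.286, while the printed output matches what the quantizer returns for sf = 0 (step size 1), where nine of the twelve entries saturate at -128 or 127. The helper quantize is a hypothetical compact restatement of linear_quantize, and the alternative sf = bits - 1 - max_bits is an assumption of this sketch, not the article's formula; it is included only to show a setting where the whole input range fits.

```python
import math
import torch

def quantize(x, sf, bits):
    # Same arithmetic as linear_quantize for bits > 1 (hypothetical compact restatement).
    delta = math.pow(2.0, -sf)
    bound = math.pow(2.0, bits - 1)
    return torch.clamp(torch.floor(x / delta + 0.5), -bound, bound - 1) * delta

x = torch.tensor([[ -99.4158,  -69.1860,  430.0259],
                  [ 506.4909,  262.8812,  504.7529],
                  [-342.8090, -338.9429,  -12.9747],
                  [ 267.9500, -192.5225,  449.5162]])
printed = torch.tensor([[ -99.,  -69.,  127.],
                        [ 127.,  127.,  127.],
                        [-128., -128.,  -13.],
                        [ 127., -128.,  127.]])
bits, max_bits = 8, 9

sf_article = 1 - max_bits / (bits - 1)  # true division in Python 3
sf_alt = bits - 1 - max_bits            # assumption of this sketch, not the article's formula

out_article = quantize(x, sf_article, bits)
out_unit = quantize(x, 0.0, bits)       # step size 1
out_alt = quantize(x, sf_alt, bits)     # step size 4, range [-512, 508]

print(sf_article, torch.equal(out_article, printed))
print(torch.equal(out_unit, printed))
print(sf_alt, (out_alt - x).abs().max().item())
```

With the assumed alternative, no entry saturates and every value is reproduced to within half a step (2.0).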