Review plan:
1. Unboxing report; sharing the PC's network connection with the KV260 over Ethernet
2. Zynq's powerful sidekick: setting up PYNQ and debugging the FPGA logic with XVC (Xilinx Virtual Cable)
3. Hardware acceleration, part 1: accelerating FFT with the PL (Vivado)
4. Hardware acceleration, part 2: accelerating matrix multiplication with the PL (Vitis HLS)
5. Vitis AI: building the development environment and inspecting a model with the Inspector
6. Vitis AI: model calibration and quantization
7. Vitis AI: training a custom model with transfer learning
8. Vitis AI: compiling the custom model and deploying it to the KV260
Background
What is the difference between calibration and quantization?
Calibration and quantization are two key steps Vitis AI uses to optimize a neural network model. Calibration runs a set of representative data through the model to estimate the distribution of each layer's inputs and outputs, and from those statistics determines the quantization parameters, such as the bit width, scale factor, and zero point. Quantization then converts the model's floating-point operations into integer operations, which shrinks the model, cuts its compute cost, and speeds up inference. After calibration and quantization, the model can be converted into a DPU-deployable format such as an xmodel file.
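To make the scale-factor and zero-point idea concrete, here is a minimal sketch of asymmetric int8 quantization (for illustration only; this is not Vitis AI's internal implementation): the min/max observed on calibration data fix a scale and zero point, which then map floats into 8-bit integers and back.

import torch

def calibrate_qparams(x, num_bits=8):
    # "Calibration": derive scale and zero point from the
    # observed value range of a representative tensor
    qmin, qmax = 0, 2 ** num_bits - 1
    x_min, x_max = x.min().item(), x.max().item()
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = int(round(qmin - x_min / scale))
    return scale, zero_point

def quantize(x, scale, zero_point):
    # Float -> integer domain: scale, round, clamp
    return torch.clamp(torch.round(x / scale) + zero_point, 0, 255)

def dequantize(q, scale, zero_point):
    # Back to float; the round-trip difference is the quantization error
    return (q - zero_point) * scale

x = torch.randn(4, 4)                       # stands in for one layer's activations
scale, zp = calibrate_qparams(x)
x_hat = dequantize(quantize(x, scale, zp), scale, zp)
print((x - x_hat).abs().max())              # worst-case error is about scale / 2

The real quantizer chooses parameters per layer from the calibration statistics (and, for DPU targets, restricts scales to powers of two), but the principle is the same.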
What are calibration and quantization for?
In short: calibration supplies the value-range statistics needed to choose good quantization parameters without retraining, and quantization uses them to shrink the model (float32 weights stored as int8 take roughly a quarter of the space) and to replace floating-point arithmetic with integer arithmetic, which is what makes the model runnable on the KV260's DPU.
Running the calibration process
In Vitis AI, start Jupyter Lab, create a new notebook, and import the necessary Python packages:
import torch, torchvision, random
from pytorch_nndct.apis import Inspector, torch_quantizer
import torchvision.transforms as transforms
from tqdm import tqdm
Define a load_data function that loads a set of representative samples from the dataset for calibration:
def load_data(batch_size=128, subset_len=None):
    dataset = torchvision.datasets.ImageFolder(
        'dataset/image30/val',
        transforms.Compose([
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            transforms.Normalize(
                mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
        ]))
    if subset_len:
        assert subset_len <= len(dataset)
        dataset = torch.utils.data.Subset(
            dataset, random.sample(range(0, len(dataset)), subset_len))
    data_loader = torch.utils.data.DataLoader(
        dataset, batch_size=batch_size, shuffle=False)
    return data_loader
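One detail to be aware of: load_data draws the subset with random.sample, so every run calibrates on a different set of images. If you want calibration runs to be reproducible, seeding Python's RNG first (an optional tweak, not part of the original flow) pins the selection:

random.seed(42)  # optional: keeps the sampled calibration subset identical across runs
val_loader = load_data(batch_size=16, subset_len=200)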
Invoke the Vitis AI quantizer, feed it a number of calibration samples, and export the calibration configuration:
# Use the GPU if available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Load the pre-trained resnet18 model
model = torch.load('model/resnet18.pth', map_location=device)
# Define the batch size and a random sample input
batch_size = 16
input = torch.randn([batch_size, 3, 224, 224])
# Build the quantizer with torch_quantizer, passing the model and the sample input
quantizer = torch_quantizer('calib', model, (input), device=device)
quant_model = quantizer.quant_model
# Run a forward pass over 200 calibration samples; note the quantized
# wrapper is the module being exercised, so it is the one put in eval mode
quant_model.eval()
val_loader = load_data(batch_size=batch_size, subset_len=200)
for iteration, (images, labels) in tqdm(enumerate(val_loader), total=len(val_loader)):
    outputs = quant_model(images)
# Export the calibration configuration
quantizer.export_quant_config()
--- Execution output:
[VAIQ_NOTE]: Quant config file is empty, use default quant configuration
[VAIQ_NOTE]: Quantization calibration process start up...
[VAIQ_NOTE]: =>Quant Module is in 'cpu'.
[VAIQ_NOTE]: =>Parsing ResNet...
[VAIQ_NOTE]: Start to trace model...
[VAIQ_NOTE]: Finish tracing.
[VAIQ_NOTE]: Processing ops...
██████████████████████████████████████████████████| 71/71 [00:00<00:00, 709.83it/s, OpInfo: name = return_0, type = Return]
[VAIQ_NOTE]: =>Doing weights equalization...
[VAIQ_NOTE]: =>Quantizable module is generated.(quantize_result/ResNet.py)
[VAIQ_NOTE]: =>Get module with quantization.
100%|█████████████████| 13/13 [02:07<00:00, 9.85s/it]
[VAIQ_NOTE]: =>Exporting quant config.(quantize_result/quant_info.json)
This log captures the whole static-quantization calibration flow under the default quant configuration: parsing the model, tracing it, equalizing weights, generating the quantizable module, and exporting the quant config. A few things can be read directly from it:
- No quant config file was supplied, so the default quantization configuration was used.
- The quantized module ran on the CPU, and all 71 ops of the ResNet graph were processed.
- A quantizable rewrite of the model was written to quantize_result/ResNet.py.
- The calibration results were exported to quantize_result/quant_info.json.
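Since the next installment will focus on the calibration configuration, note that the exported quantize_result/quant_info.json is plain JSON and can already be previewed from the notebook (the exact keys vary between Vitis AI versions):

import json

# Path taken from the export log above
with open('quantize_result/quant_info.json') as f:
    quant_info = json.load(f)
# Short preview of the recorded calibration results
print(json.dumps(quant_info, indent=2)[:500])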
Evaluating the accuracy of the quantized model
Define a helper class and two functions:
class AverageMeter(object):
    # Tracks a metric's latest value and its running average
    def __init__(self, name, fmt=':f'):
        self.name = name
        self.fmt = fmt
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

    def __str__(self):
        fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})'
        return fmtstr.format(**self.__dict__)

def accuracy(output, target, topk=(1,)):
    # Top-k accuracy: how often the true label appears among the k highest logits
    with torch.no_grad():
        maxk = max(topk)
        batch_size = target.size(0)
        _, pred = output.topk(maxk, 1, True, True)
        pred = pred.t()
        correct = pred.eq(target.view(1, -1).expand_as(pred))
        res = []
        for k in topk:
            correct_k = correct[:k].flatten().float().sum(0, keepdim=True)
            res.append(correct_k.mul_(100.0 / batch_size))
        return res

def evaluate(model, val_loader, loss_fn):
    model.eval()
    model = model.to(device)
    top1 = AverageMeter('Acc@1', ':6.2f')
    top5 = AverageMeter('Acc@5', ':6.2f')
    total = 0
    Loss = 0
    for iteration, (images, labels) in tqdm(enumerate(val_loader), total=len(val_loader)):
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        loss = loss_fn(outputs, labels)
        Loss += loss.item()
        total += images.size(0)
        acc1, acc5 = accuracy(outputs, labels, topk=(1, 5))
        top1.update(acc1[0], images.size(0))
        top5.update(acc5[0], images.size(0))
    return top1.avg, top5.avg, Loss / total
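As a quick sanity check of accuracy() (toy tensors invented for illustration): with two samples where only the first top-1 prediction is correct but both true labels appear in the top 2, it should report 50% top-1 and 100% top-2 accuracy:

output = torch.tensor([[0.1, 0.6, 0.3],   # argmax is class 1; label 1 -> top-1 correct
                       [0.7, 0.1, 0.2]])  # argmax is class 0; label 2 only makes the top 2
target = torch.tensor([1, 2])
acc1, acc2 = accuracy(output, target, topk=(1, 2))
print(acc1.item(), acc2.item())  # 50.0 100.0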
Then measure the accuracy of the original model and of the calibrated model:
batch_size = 128
val_loader = load_data(batch_size=batch_size)
loss_fn = torch.nn.CrossEntropyLoss().to(device)
acc1_gen, acc5_gen, loss_gen = evaluate(model, val_loader, loss_fn)
print('Original model: top-1 / top-5 accuracy: %g / %g' % (acc1_gen, acc5_gen))
acc1_gen, acc5_gen, loss_gen = evaluate(quant_model, val_loader, loss_fn)
print('Quantized model: top-1 / top-5 accuracy: %g / %g' % (acc1_gen, acc5_gen))
--- Execution output:
100%|█████████████████████████| 13/13 [00:08<00:00, 1.48it/s]
Original model: top-1 / top-5 accuracy: 87 / 99
100%|█████████████████████████| 13/13 [01:54<00:00, 8.82s/it]
Quantized model: top-1 / top-5 accuracy: 88.5 / 98.5
Note: the quantized model's top-1 accuracy is actually higher than the original model's. One reason is that quantization is an approximate computation and can, in some cases, act like model distillation, stripping out noise picked up during training and thereby improving performance. Another is that the calibration step typically rescales weights to minimize quantization error, which can have a mild "fine-tuning" effect on the network.
Exporting the quantized model
# Rebuild the quantizer in 'test' mode with batch size 1
input = torch.randn([1, 3, 224, 224])
quantizer = torch_quantizer('test', model, (input), device=device)
quant_model = quantizer.quant_model
# One forward pass is needed before the model can be exported
val_loader = load_data(batch_size=1, subset_len=1)
image, label = next(iter(val_loader))
output = quant_model(image)
# Export the deployable xmodel plus an ONNX copy of the quantized model
quantizer.export_xmodel(deploy_check=False)
quantizer.export_onnx_model()
--- Execution output:
[VAIQ_NOTE]: Quant config file is empty, use default quant configuration
[VAIQ_NOTE]: Quantization test process start up...
[VAIQ_NOTE]: =>Quant Module is in 'cpu'.
[VAIQ_NOTE]: =>Parsing ResNet...
[VAIQ_NOTE]: Start to trace model...
[VAIQ_NOTE]: Finish tracing.
[VAIQ_NOTE]: Processing ops...
██████████████████████████████████████████████████| 71/71 [00:00<00:00, 492.26it/s, OpInfo: name = return_0, type = Return]
[VAIQ_NOTE]: =>Doing weights equalization...
[VAIQ_NOTE]: =>Quantizable module is generated.(quantize_result/ResNet.py)
[VAIQ_NOTE]: =>Get module with quantization.
[VAIQ_NOTE]: =>Converting to xmodel ...
[VAIQ_NOTE]: =>Successfully convert 'ResNet' to xmodel.(quantize_result/ResNet_int.xmodel)
The generated files, in brief: quantize_result/ResNet_int.xmodel is the quantized model in the DPU-deployable xmodel format mentioned earlier, which the Vitis AI compiler will consume in a later installment; export_onnx_model() additionally writes an ONNX copy of the quantized model for use outside the Vitis AI toolchain.
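Both exports can be sanity-checked on the host before touching the board. A minimal sketch using onnxruntime (assuming the ONNX file landed at quantize_result/ResNet_int.onnx; check the actual filename printed by export_onnx_model):

import numpy as np
import onnxruntime as ort

# Hypothetical path: adjust to whatever export_onnx_model() actually wrote
sess = ort.InferenceSession('quantize_result/ResNet_int.onnx')
input_name = sess.get_inputs()[0].name
dummy = np.random.randn(1, 3, 224, 224).astype(np.float32)
logits = sess.run(None, {input_name: dummy})[0]
print(logits.shape)  # expect (1, num_classes)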
Summary
Model calibration and quantization look simple: the whole flow takes only a small amount of code. In reality, though, the key to quantization is not driving the Vitis AI tools but understanding how the quantization actually works, together with the issues around model deployment. This article skips the most important piece, the contents of the calibration configuration, which I will analyze in the next installment. That configuration is passed to the quantizer through the quant_config_file argument:
config_file = "./int8_config.json"
quantizer = torch_quantizer(quant_mode=quant_mode,
                            module=model,
                            input_args=(input),
                            device=device,
                            quant_config_file=config_file)