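[Trial Review: WPG Yosun USB Dual-Camera Module with ON Semiconductor Image Sensor] Developing an Automatic Tracking System for Persons of Interest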


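`Thanks to the organizer for providing the dual-camera module for testing. This project uses the dual-camera module, a LattePanda Delta 432, and an NCS2 neural compute stick running OpenVINO to build a device that automatically tracks persons of interest. Deployed in a busy area or on a vehicle roof, it automatically searches for the faces of designated people (missing persons, for example). The system compares faces in the camera image against the target images in a target folder and shows an alarm when the similarity exceeds the configured threshold.
The OpenVINO environment was set up on the LattePanda Delta 432, followed by development and test runs.
1. Setting up the development environment:
1) Install Microsoft Visual Studio* with C++ 2019 with MSBuild (install the Community edition; select the ".NET desktop development", "Desktop development with C++", and "Universal Windows Platform development" workloads, and select MSBuild)
2) Install CMake 3.4 or higher, 64-bit
3) Install Python 3.5 - 3.7, 64-bit (not needed if Anaconda is already installed)
4) Set the environment variables
5) Install OpenVINO (choose the version with care; the newest is not necessarily the best, and newer versions can be explored when time allows. After installing 2020.3, the bundled OpenCV library appeared to be missing files, which was only resolved by installing OpenCV separately. Version 2020.2 has worked without problems so far.)
After these steps, running the test script demo_security_barrier_camera.bat -d CPU should normally work and display an image with a recognized car license plate, which indicates that the installation succeeded.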
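
Before installing OpenVINO it can help to confirm that the interpreter matches step 3). The following is only a minimal sketch: the helper name python_is_supported is hypothetical, and the 3.5 - 3.7 range is simply the one listed above, not something queried from OpenVINO.

```python
import struct
import sys


def python_is_supported(version=None, pointer_bits=None):
    """Return True for a 64-bit Python 3.5 - 3.7 (the range listed in step 3)."""
    if version is None:
        version = sys.version_info
    if pointer_bits is None:
        # Size of a C pointer in bits: 64 on a 64-bit interpreter.
        pointer_bits = struct.calcsize("P") * 8
    major_minor = tuple(version[:2])
    return (3, 5) <= major_minor <= (3, 7) and pointer_bits == 64


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "supported:", python_is_supported())
```

Run with the interpreter that will later run the demo, it prints whether that interpreter falls inside the listed range.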

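2. Usage notes:
1) Before running anything in cmd, right-click cmd and choose "Run as administrator"; otherwise some commands fail for lack of directory permissions.
2) When running commands, device names such as CPU, GPU, and MYRIAD must be upper case, not lower case. They are passed in as argument values, and Python is case-sensitive, so a lower-case value fails to match the allowed choices.
3) Some downloads are hosted overseas, and large files may fail to download. Switch to a mirror in China, such as the Tsinghua mirror, and download again.
4) OpenVINO requires at least a 6th-generation Core processor; CPUs that are too old will not work. The LattePanda Delta 432 used here has an Intel 8th Gen Celeron Processor N4100, which does not appear to be on the supported list, so we ended up running inference on the NCS2 neural compute stick.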
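
The upper-case requirement in note 2) follows from how argparse checks choices. Here is a minimal sketch that reuses the DEVICE_KINDS list and the -d_fd option from the main program; the helper names parse_device and is_accepted are hypothetical and exist only for this illustration.

```python
from argparse import ArgumentParser

DEVICE_KINDS = ['CPU', 'GPU', 'FPGA', 'MYRIAD', 'HETERO', 'HDDL']


def parse_device(argv):
    """Parse a -d_fd option the same way the demo declares it."""
    parser = ArgumentParser()
    parser.add_argument('-d_fd', default='CPU', choices=DEVICE_KINDS)
    return parser.parse_args(argv).d_fd


def is_accepted(device):
    """Return True if argparse accepts the device name as written."""
    try:
        parse_device(['-d_fd', device])
        return True
    except SystemExit:
        # argparse prints an "invalid choice" error and exits.
        return False
```

Here is_accepted('MYRIAD') is true while is_accepted('myriad') is false, because the comparison against choices is an exact string match. If lower-case input should be tolerated, one option would be to pass type=str.upper to add_argument, since argparse applies type before it checks choices.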
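3. Writing the program:
The program is adapted from the official OpenVINO sample to suit the project requirements.
1) Program files
face_detector.py  face detection
face_identifier.py  face comparison and identification
faces_database.py  handling of the reference images
ie_module.py  model handling
landmarks_detector.py  facial landmark detection
face_recognition_demo.py  main program
2) Models used: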
face-detection-adas-0001
face-reidentification-retail-0095
landmarks-regression-retail-0009
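3) Main program walkthrough:
The main program is face_recognition_demo.py. We set it to trigger an alarm when the similarity exceeds 70%. The alarm can take several forms, such as an on-screen window or an alert email, and can be rewritten as required: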
import logging as log
import os.path as osp
import sys
import time
from argparse import ArgumentParser

import cv2
import numpy as np

from openvino.inference_engine import IENetwork
from ie_module import InferenceContext
from landmarks_detector import LandmarksDetector
from face_detector import FaceDetector
from faces_database import FacesDatabase
from face_identifier import FaceIdentifier

DEVICE_KINDS = ['CPU', 'GPU', 'FPGA', 'MYRIAD', 'HETERO', 'HDDL']
MATCH_ALGO = ['HUNGARIAN', 'MIN_DIST']

def build_argparser():
    parser = ArgumentParser()

    general = parser.add_argument_group('General')
    general.add_argument('-i', '--input', metavar="PATH", default='0',
                         help="(optional) Path to the input video "
                         "('0' for the camera, default)")
    general.add_argument('-o', '--output', metavar="PATH", default="",
                         help="(optional) Path to save the output video to")
    general.add_argument('--no_show', action='store_true',
                         help="(optional) Do not display output")
    general.add_argument('-tl', '--timelapse', action='store_true',
                         help="(optional) Auto-pause after each frame")
    general.add_argument('-cw', '--crop_width', default=0, type=int,
                         help="(optional) Crop the input stream to this width "
                         "(default: no crop). Both -cw and -ch parameters "
                         "should be specified to use crop.")
    general.add_argument('-ch', '--crop_height', default=0, type=int,
                         help="(optional) Crop the input stream to this height "
                         "(default: no crop). Both -cw and -ch parameters "
                         "should be specified to use crop.")
    general.add_argument('--match_algo', default='HUNGARIAN', choices=MATCH_ALGO,
                         help="(optional)algorithm for face matching(default: %(default)s)")

    gallery = parser.add_argument_group('Faces database')
    gallery.add_argument('-fg', metavar="PATH", required=True,
                         help="Path to the face images directory")
    gallery.add_argument('--run_detector', action='store_true',
                         help="(optional) Use Face Detection model to find faces"
                         " on the face images, otherwise use full images.")

    models = parser.add_argument_group('Models')
    models.add_argument('-m_fd', metavar="PATH", default="", required=True,
                        help="Path to the Face Detection model XML file")
    models.add_argument('-m_lm', metavar="PATH", default="", required=True,
                        help="Path to the Facial Landmarks Regression model XML file")
    models.add_argument('-m_reid', metavar="PATH", default="", required=True,
                        help="Path to the Face Reidentification model XML file")
    models.add_argument('-fd_iw', '--fd_input_width', default=0, type=int,
                        help="(optional) specify the input width of detection model "
                        "(default: use default input width of model). Both -fd_iw and -fd_ih parameters "
                        "should be specified for reshape.")
    models.add_argument('-fd_ih', '--fd_input_height', default=0, type=int,
                        help="(optional) specify the input height of detection model "
                        "(default: use default input height of model). Both -fd_iw and -fd_ih parameters "
                        "should be specified for reshape.")

    infer = parser.add_argument_group('Inference options')
    infer.add_argument('-d_fd', default='CPU', choices=DEVICE_KINDS,
                       help="(optional) Target device for the "
                       "Face Detection model (default: %(default)s)")
    infer.add_argument('-d_lm', default='CPU', choices=DEVICE_KINDS,
                       help="(optional) Target device for the "
                       "Facial Landmarks Regression model (default: %(default)s)")
    infer.add_argument('-d_reid', default='CPU', choices=DEVICE_KINDS,
                       help="(optional) Target device for the "
                       "Face Reidentification model (default: %(default)s)")
    infer.add_argument('-l', '--cpu_lib', metavar="PATH", default="",
                       help="(optional) For MKLDNN (CPU)-targeted custom layers, if any. "
                       "Path to a shared library with custom layers implementations")
    infer.add_argument('-c', '--gpu_lib', metavar="PATH", default="",
                       help="(optional) For clDNN (GPU)-targeted custom layers, if any. "
                       "Path to the XML file with descriptions of the kernels")
    infer.add_argument('-v', '--verbose', action='store_true',
                       help="(optional) Be more verbose")
    infer.add_argument('-pc', '--perf_stats', action='store_true',
                       help="(optional) Output detailed per-layer performance stats")
    infer.add_argument('-t_fd', metavar='[0..1]', type=float, default=0.6,
                       help="(optional) Probability threshold for face detections"
                       "(default: %(default)s)")
    infer.add_argument('-t_id', metavar='[0..1]', type=float, default=0.3,
                       help="(optional) Cosine distance threshold between two vectors "
                       "for face identification (default: %(default)s)")
    infer.add_argument('-exp_r_fd', metavar='NUMBER', type=float, default=1.15,
                       help="(optional) Scaling ratio for bboxes passed to face recognition "
                       "(default: %(default)s)")
    infer.add_argument('--allow_grow', action='store_true',
                       help="(optional) Allow to grow faces gallery and to dump on disk. "
                       "Available only if --no_show option is off.")

    return parser

class FrameProcessor:
    QUEUE_SIZE = 16

    def __init__(self, args):
        used_devices = set([args.d_fd, args.d_lm, args.d_reid])
        self.context = InferenceContext(used_devices, args.cpu_lib, args.gpu_lib, args.perf_stats)
        context = self.context

        log.info("Loading models")
        face_detector_net = self.load_model(args.m_fd)

        assert (args.fd_input_height and args.fd_input_width) or \
               (args.fd_input_height == 0 and args.fd_input_width == 0), \
            "Both -fd_iw and -fd_ih parameters should be specified for reshape"

        if args.fd_input_height and args.fd_input_width:
            face_detector_net.reshape({"data": [1, 3, args.fd_input_height, args.fd_input_width]})
        landmarks_net = self.load_model(args.m_lm)
        face_reid_net = self.load_model(args.m_reid)

        self.face_detector = FaceDetector(face_detector_net,
                                          confidence_threshold=args.t_fd,
                                          roi_scale_factor=args.exp_r_fd)

        self.landmarks_detector = LandmarksDetector(landmarks_net)
        self.face_identifier = FaceIdentifier(face_reid_net,
                                              match_threshold=args.t_id,
                                              match_algo=args.match_algo)

        self.face_detector.deploy(args.d_fd, context)
        self.landmarks_detector.deploy(args.d_lm, context,
                                       queue_size=self.QUEUE_SIZE)
        self.face_identifier.deploy(args.d_reid, context,
                                    queue_size=self.QUEUE_SIZE)
        log.info("Models are loaded")

        log.info("Building faces database using images from '%s'" % (args.fg))
        self.faces_database = FacesDatabase(args.fg, self.face_identifier,
                                            self.landmarks_detector,
                                            self.face_detector if args.run_detector else None, args.no_show)
        self.face_identifier.set_faces_database(self.faces_database)
        log.info("Database is built, registered %s identities" %
                 (len(self.faces_database)))

        self.allow_grow = args.allow_grow and not args.no_show

    def load_model(self, model_path):
        model_path = osp.abspath(model_path)
        model_description_path = model_path
        # The weights file sits next to the XML file, with a .bin extension
        model_weights_path = osp.splitext(model_path)[0] + ".bin"
        log.info("Loading the model from '%s'" % (model_description_path))
        assert osp.isfile(model_description_path), \
            "Model description is not found at '%s'" % (model_description_path)
        assert osp.isfile(model_weights_path), \
            "Model weights are not found at '%s'" % (model_weights_path)
        model = IENetwork(model_description_path, model_weights_path)
        log.info("Model is loaded")
        return model

    def process(self, frame):
        assert len(frame.shape) == 3, \
            "Expected input frame in (H, W, C) format"
        assert frame.shape[2] in [3, 4], \
            "Expected BGR or BGRA input"

        orig_image = frame.copy()
        frame = frame.transpose((2, 0, 1))  # HWC to CHW
        frame = np.expand_dims(frame, axis=0)

        self.face_detector.clear()
        self.landmarks_detector.clear()
        self.face_identifier.clear()

        self.face_detector.start_async(frame)
        rois = self.face_detector.get_roi_proposals(frame)
        if self.QUEUE_SIZE < len(rois):
            log.warning("Too many faces for processing."
                        " Will be processed only %s of %s." %
                        (self.QUEUE_SIZE, len(rois)))
            rois = rois[:self.QUEUE_SIZE]
        self.landmarks_detector.start_async(frame, rois)
        landmarks = self.landmarks_detector.get_landmarks()

        self.face_identifier.start_async(frame, rois, landmarks)
        face_identities, unknowns = self.face_identifier.get_matches()
        if self.allow_grow and len(unknowns) > 0:
            for i in unknowns:
                # This check is preventing asking to save half-images in the boundary of images
                if rois[i].position[0] == 0.0 or rois[i].position[1] == 0.0 or \
                        (rois[i].position[0] + rois[i].size[0] > orig_image.shape[1]) or \
                        (rois[i].position[1] + rois[i].size[1] > orig_image.shape[0]):
                    continue
                crop = orig_image[int(rois[i].position[1]):int(rois[i].position[1] + rois[i].size[1]),
                                  int(rois[i].position[0]):int(rois[i].position[0] + rois[i].size[0])]
                # xy: saving unknown faces to the gallery is commented out in this project
                # name = self.faces_database.ask_to_save(crop)
                # if name:
                #     id = self.faces_database.dump_faces(crop, face_identities[i].descriptor, name)
                #     face_identities[i].id = id

        outputs = [rois, landmarks, face_identities]

        return outputs

    def get_performance_stats(self):
        stats = {
            'face_detector': self.face_detector.get_performance_stats(),
            'landmarks': self.landmarks_detector.get_performance_stats(),
            'face_identifier': self.face_identifier.get_performance_stats(),
        }
        return stats

class Visualizer:
    BREAK_KEY_LABELS = "q(Q) or Escape"
    BREAK_KEYS = {ord('q'), ord('Q'), 27}

    def __init__(self, args):
        self.frame_processor = FrameProcessor(args)
        self.display = not args.no_show
        self.print_perf_stats = args.perf_stats

        self.frame_time = 0
        self.frame_start_time = 0
        self.fps = 0
        self.frame_num = 0
        self.frame_count = -1

        self.input_crop = None
        if args.crop_width and args.crop_height:
            self.input_crop = np.array((args.crop_width, args.crop_height))

        self.frame_timeout = 0 if args.timelapse else 1

    def update_fps(self):
        now = time.time()
        self.frame_time = now - self.frame_start_time
        self.fps = 1.0 / self.frame_time
        self.frame_start_time = now

    def draw_text_with_background(self, frame, text, origin,
                                  font=cv2.FONT_HERSHEY_SIMPLEX, scale=1.0,
                                  color=(0, 0, 0), thickness=1, bgcolor=(255, 255, 255)):
        text_size, baseline = cv2.getTextSize(text, font, scale, thickness)
        cv2.rectangle(frame,
                      tuple((origin + (0, baseline)).astype(int)),
                      tuple((origin + (text_size[0], -text_size[1])).astype(int)),
                      bgcolor, cv2.FILLED)
        cv2.putText(frame, text,
                    tuple(origin.astype(int)),
                    font, scale, color, thickness)
        return text_size, baseline

    def draw_detection_roi(self, frame, roi, identity):
        label = self.frame_processor.face_identifier.get_identity_label(identity.id)

        # xy: decide from the confidence threshold whether to raise the alarm
        confidence = 100.0 * (1 - identity.distance)
        if confidence > 70:
            log.info('find missing people! NAME is %s %f' % (label, confidence))

        # Draw face ROI border
        cv2.rectangle(frame,
                      tuple(roi.position), tuple(roi.position + roi.size),
                      (0, 220, 0), 2)

        # Draw identity label
        text_scale = 0.5
        font = cv2.FONT_HERSHEY_SIMPLEX
        text_size = cv2.getTextSize("H1", font, text_scale, 1)
        line_height = np.array([0, text_size[0][1]])
        text = label
        if identity.id != FaceIdentifier.UNKNOWN_ID:
            text += ' %.2f%%' % (100.0 * (1 - identity.distance))
        self.draw_text_with_background(frame, text,
                                       roi.position - line_height * 0.5,
                                       font, scale=text_scale)

    def draw_detection_keypoints(self, frame, roi, landmarks):
        keypoints = [landmarks.left_eye,
                     landmarks.right_eye,
                     landmarks.nose_tip,
                     landmarks.left_lip_corner,
                     landmarks.right_lip_corner]

        for point in keypoints:
            center = roi.position + roi.size * point
            cv2.circle(frame, tuple(center.astype(int)), 2, (0, 255, 255), 2)

    def draw_detections(self, frame, detections):
        for roi, landmarks, identity in zip(*detections):
            self.draw_detection_roi(frame, roi, identity)
            self.draw_detection_keypoints(frame, roi, landmarks)

    def draw_status(self, frame, detections):
        origin = np.array([10, 10])
        color = (10, 160, 10)
        font = cv2.FONT_HERSHEY_SIMPLEX
        text_scale = 0.5
        text_size, _ = self.draw_text_with_background(frame,
                                                      "Frame time: %.3fs" % (self.frame_time),
                                                      origin, font, text_scale, color)
        self.draw_text_with_background(frame,
                                       "FPS: %.1f" % (self.fps),
                                       (origin + (0, text_size[1] * 1.5)), font, text_scale, color)

        log.debug('Frame: %s/%s, detections: %s, '
                  'frame time: %.3fs, fps: %.1f' %
                  (self.frame_num, self.frame_count, len(detections[-1]), self.frame_time, self.fps))

        if self.print_perf_stats:
            log.info('Performance stats:')
            log.info(self.frame_processor.get_performance_stats())

    def display_interactive_window(self, frame):
        color = (255, 255, 255)
        font = cv2.FONT_HERSHEY_SIMPLEX
        text_scale = 0.5
        text = "Press '%s' key to exit" % (self.BREAK_KEY_LABELS)
        thickness = 2
        text_size = cv2.getTextSize(text, font, text_scale, thickness)
        origin = np.array([frame.shape[-2] - text_size[0][0] - 10, 10])
        line_height = np.array([0, text_size[0][1]]) * 1.5
        cv2.putText(frame, text,
                    tuple(origin.astype(int)), font, text_scale, color, thickness)

        cv2.imshow('Looking for missing people!', frame)  # xy

    def should_stop_display(self):
        key = cv2.waitKey(self.frame_timeout) & 0xFF
        return key in self.BREAK_KEYS

    def process(self, input_stream, output_stream):
        self.input_stream = input_stream
        self.output_stream = output_stream

        while input_stream.isOpened():
            has_frame, frame = input_stream.read()
            if not has_frame:
                break

            if self.input_crop is not None:
                frame = Visualizer.center_crop(frame, self.input_crop)
            detections = self.frame_processor.process(frame)

            self.draw_detections(frame, detections)
            self.draw_status(frame, detections)

            if output_stream:
                output_stream.write(frame)
            if self.display:
                self.display_interactive_window(frame)
                if self.should_stop_display():
                    break

            self.update_fps()
            self.frame_num += 1

    @staticmethod
    def center_crop(frame, crop_size):
        fh, fw, fc = frame.shape
        crop_size[0] = min(fw, crop_size[0])
        crop_size[1] = min(fh, crop_size[1])
        return frame[(fh - crop_size[1]) // 2 : (fh + crop_size[1]) // 2,
                     (fw - crop_size[0]) // 2 : (fw + crop_size[0]) // 2,
                     :]

    def run(self, args):
        input_stream = Visualizer.open_input_stream(args.input)
        if input_stream is None or not input_stream.isOpened():
            log.error("Cannot open input stream: %s" % args.input)
            return  # nothing to process without an input stream
        fps = input_stream.get(cv2.CAP_PROP_FPS)
        frame_size = (int(input_stream.get(cv2.CAP_PROP_FRAME_WIDTH)),
                      int(input_stream.get(cv2.CAP_PROP_FRAME_HEIGHT)))
        self.frame_count = int(input_stream.get(cv2.CAP_PROP_FRAME_COUNT))
        if args.crop_width and args.crop_height:
            crop_size = (args.crop_width, args.crop_height)
            frame_size = tuple(np.minimum(frame_size, crop_size))
        log.info("Input stream info: %d x %d @ %.2f FPS" %
                 (frame_size[0], frame_size[1], fps))
        output_stream = Visualizer.open_output_stream(args.output, fps, frame_size)

        self.process(input_stream, output_stream)

        # Release resources
        if output_stream:
            output_stream.release()
        if input_stream:
            input_stream.release()

        cv2.destroyAllWindows()

    @staticmethod
    def open_input_stream(path):
        log.info("Reading input data from '%s'" % (path))
        stream = path
        try:
            # A numeric path such as '0' or '1' selects a camera by index
            stream = int(path)
        except ValueError:
            pass
        return cv2.VideoCapture(stream)

    @staticmethod
    def open_output_stream(path, fps, frame_size):
        output_stream = None
        if path != "":
            if not path.endswith('.avi'):
                log.warning("Output file extension is not 'avi'. "
                            "Some issues with output can occur, check logs.")
            log.info("Writing output to '%s'" % (path))
            output_stream = cv2.VideoWriter(path,
                                            cv2.VideoWriter.fourcc(*'MJPG'), fps, frame_size)
        return output_stream

def main():
    args = build_argparser().parse_args()

    log.basicConfig(format="[ %(levelname)s ] %(asctime)-15s %(message)s",
                    level=log.INFO if not args.verbose else log.DEBUG, stream=sys.stdout)

    log.debug(str(args))

    visualizer = Visualizer(args)
    visualizer.run(args)

if __name__ == '__main__':
    main()
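4) Writing the batch files:

The module has two cameras that could be used at the same time, but because the CPU is not supported by OpenVINO and there is only one NCS2 stick, only one camera is used at a time. Otherwise both cameras could run simultaneously, or be switched by time of day, with the regular camera used during the day and the infrared camera at night, so that the search runs continuously.
run01.bat batch file:
python face_recognition_demo.py ^
-m_fd .\models\face-detection-adas-0001.xml ^
-m_lm .\models\landmarks-regression-retail-0009.xml ^
-m_reid .\models\face-reidentification-retail-0095.xml ^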
-fg "classroom" ^
-i 0 ^
-t_fd 0.3 ^
-t_id 0.8 ^
-d_fd MYRIAD ^
-d_lm MYRIAD ^
-d_reid MYRIAD ^
--allow_grow

run02.bat batch file:
python face_recognition_demo.py ^
-m_fd .\models\face-detection-adas-0001.xml ^
-m_lm .\models\landmarks-regression-retail-0009.xml ^
-m_reid .\models\face-reidentification-retail-0095.xml ^
-fg "classroom" ^
-i 1 ^
-t_fd 0.3 ^
-t_id 0.8 ^
-d_fd MYRIAD ^
-d_lm MYRIAD ^
-d_reid MYRIAD ^
--allow_grow

Note: -i 0 selects the infrared camera, and -i 1 selects the regular RGB camera.
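
The day/night switching mentioned earlier was not implemented in this project, but it could be sketched roughly as follows. The camera indices come from the note above; the function name camera_index_for and the 06:00 to 18:00 daytime window are assumptions made purely for illustration.

```python
from datetime import datetime

IR_CAMERA_INDEX = 0   # -i 0: infrared camera
RGB_CAMERA_INDEX = 1  # -i 1: regular RGB camera


def camera_index_for(hour, day_start=6, day_end=18):
    """Pick the camera for a given hour (0-23).

    day_start and day_end are hypothetical defaults; tune them to the
    actual lighting conditions at the installation site.
    """
    if day_start <= hour < day_end:
        return RGB_CAMERA_INDEX
    return IR_CAMERA_INDEX


if __name__ == "__main__":
    # Print the value to pass to face_recognition_demo.py via -i
    print("-i", camera_index_for(datetime.now().hour))
```

A wrapper script could call this once at start-up, or periodically, and restart the demo with the matching -i value, so that the single NCS2 stick only ever serves one camera at a time.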

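4. Prepare images of the person to search for and place them in the classroom folder as reference images:
Here we use a classic photo of the physicist Einstein for the test. Note that the image in this folder is the reference image; at run time the camera is pointed at a different photo.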

5. Connect the hardware:

6. Run the test:

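When the similarity between a face in the video and the reference image exceeds the configured 70% threshold, the alarm message is triggered. Faces that are not found in the reference folder are labeled Unknown.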
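
The alarm rule can be illustrated in isolation. In the main program the similarity is computed as 100 * (1 - distance) and compared with 70. The sketch below assumes the distance is a cosine distance scaled into [0, 1] by halving it; that scaling and the function names are assumptions for illustration, not taken from the code shown in this post.

```python
import math

ALARM_CONFIDENCE = 70.0  # percent, the value compared against in draw_detection_roi


def scaled_cosine_distance(a, b):
    """Cosine distance between two vectors, halved so it lies in [0, 1] (assumed scaling)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 0.5 * (1.0 - dot / norm)


def confidence_percent(distance):
    """Same formula as the main program: 100 * (1 - distance)."""
    return 100.0 * (1.0 - distance)


def should_alarm(distance, threshold=ALARM_CONFIDENCE):
    """True when the similarity is above the alarm threshold."""
    return confidence_percent(distance) > threshold
```

Under these assumptions, identical vectors give distance 0 and 100% similarity, while orthogonal vectors give distance 0.5 and 50%, which stays below the alarm level. Note that the batch files pass -t_id 0.8, so a face can be labeled with a name at a distance of up to 0.8, while the alarm only fires when the distance is below 0.3.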

7. Demo video:
https://v.qq.com/x/page/d31556mq4a6.html

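8. Summary:
The dual-camera module handles AI recognition tasks well and can be applied in many practical scenarios. In particular, the combination of an infrared camera and an RGB camera can meet day-and-night monitoring requirements in certain environments. Built-in depth sensing would be a welcome future addition and would further broaden the range of applications.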

`