For years, researchers have been exploring ways to give machines the ability to see and recognize objects. This field, known as computer vision or CV, has a wide range of modern applications. From road-object detection in self-driving cars to facial and body-language recognition that can flag possible criminal activity, CV has many uses in today's world. Object detection is undeniably one of the coolest applications of computer vision. Today's CV tools make it easy to run object detection on images and even on live streaming video. In this article, we'll walk through a simple demo of real-time object detection using TensorFlow.
Setting Up a Simple Object Detector
Prerequisites:
TensorFlow >= 1.15.0
Install the latest version by running pip install tensorflow
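You can quickly confirm the installed version from Python:

import tensorflow as tf
print(tf.__version__)  # should print 1.15.0 or later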
And we're good to go!
Setting Up the Environment
Step 1. Download or clone the TensorFlow object detection code to your local machine
Execute the following command in the terminal: git clone https://github.com/tensorflow/models.git. If git is not installed on your machine, you can download the repository as a zip file instead.
Step 2. Install dependencies
The next step is to make sure we have all the libraries and modules needed to run the object detector on our machine.
Here is the list of libraries the project depends on. (Most of these come bundled with TensorFlow by default.)
· Cython
· contextlib2
· Pillow
· lxml
· matplotlib
If you find any module missing, just run pip install for it in your environment.
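For example, to install the whole list in one go (using the standard pip package names for the modules above):

pip install Cython contextlib2 pillow lxml matplotlib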
Step 3. Install the Protobuf compiler
Protobuf, or Protocol Buffers, is Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data. It lets us define how we want our data structured; once structured, the data can easily be written to and read from a variety of data streams, using a variety of languages.
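As a quick illustration of that round trip, here is a minimal sketch using the Timestamp message that ships with the protobuf Python package (not part of this project's code):

from google.protobuf.timestamp_pb2 import Timestamp

# Build a structured message and serialize it to a compact byte string.
ts = Timestamp()
ts.GetCurrentTime()
payload = ts.SerializeToString()

# Any program, in any supported language, can parse it back.
parsed = Timestamp()
parsed.ParseFromString(payload)
print(parsed.ToJsonString())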
It is also a dependency of this project. You can learn more about Protobufs here. Now, pick the release that matches your operating system and copy the download link.
Open a terminal or command prompt, change directory into the cloned repository, and execute the following commands.
cd models/research
wget -O protobuf.zip <paste the download link you copied>
unzip protobuf.zip
Note: Make sure you unzip the protobuf.zip file inside the models/research directory.
Step 4. Compile the protocol buffers
Execute the following command from the research/ directory to compile the protocol buffers (this assumes protoc was unzipped here in step 3; if protoc is already on your PATH, plain protoc works too):

./bin/protoc object_detection/protos/*.proto --python_out=.
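If the compilation succeeded, the generated Python modules should import cleanly from the research/ directory; string_int_label_map_pb2 is one of the modules protoc generates here:

# Run from the models/research directory.
from object_detection.protos import string_int_label_map_pb2
print(string_int_label_map_pb2.StringIntLabelMap())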
Implementing Object Detection in Python
Now that all the dependencies are installed, let's use Python to implement object detection.
In the downloaded repository, change directory to models/research/object_detection. There you will find an IPython notebook named object_detection_tutorial.ipynb. This file is a demo of object detection: when executed, it uses the specified ssd_mobilenet_v1_coco_2017_11_17 model to classify the two test images provided in the repository.
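To open and run it (assuming Jupyter is installed in your environment):

jupyter notebook object_detection_tutorial.ipynb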
Here is one of the test outputs: [image: detection output on one of the provided test images]
A few small changes let us detect objects from a live video stream instead. Make a new Jupyter notebook in the same folder and follow along with the code below.
In [1]:
import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile

from distutils.version import StrictVersion
from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image

# This is needed since the notebook is stored in the object_detection folder.
sys.path.append("..")
from utils import ops as utils_ops

if StrictVersion(tf.__version__) < StrictVersion('1.12.0'):
    raise ImportError('Please upgrade your TensorFlow installation to v1.12.*.')
In [2]:
# This is needed to display the images.
%matplotlib inline
In [3]:
# Object detection imports
# Here are the imports from the object detection module.
from utils import label_map_util
from utils import visualization_utils as vis_util
In [4]:
# Model preparation
# Any model exported using the `export_inference_graph.py` tool can be loaded
# here simply by changing `PATH_TO_FROZEN_GRAPH` to point to a new .pb file.
# By default we use an "SSD with Mobilenet" model here. See the TensorFlow
# detection model zoo for a list of other models.
MODEL_NAME = 'ssd_mobilenet_v1_coco_2017_11_17'
MODEL_FILE = MODEL_NAME + '.tar.gz'
DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'

# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_FROZEN_GRAPH = MODEL_NAME + '/frozen_inference_graph.pb'

# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('data', 'mscoco_label_map.pbtxt')
In [5]:
# Download Model
opener = urllib.request.URLopener()
opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)

tar_file = tarfile.open(MODEL_FILE)
for file in tar_file.getmembers():
    file_name = os.path.basename(file.name)
    if 'frozen_inference_graph.pb' in file_name:
        tar_file.extract(file, os.getcwd())
In [6]:
# Load a (frozen) Tensorflow model into memory.
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')
In [7]:
# Loading label map
# Label maps map indices to category names, so that when our convolution network predicts `5`,
# we know that this corresponds to `airplane`. Here we use internal utility functions,
# but anything that returns a dictionary mapping integers to appropriate string labels would be fine.
category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)
In [8]:
def run_inference_for_single_image(image, graph):
    with graph.as_default():
        with tf.Session() as sess:
            # Get handles to input and output tensors
            ops = tf.get_default_graph().get_operations()
            all_tensor_names = {output.name for op in ops for output in op.outputs}
            tensor_dict = {}
            for key in [
                    'num_detections', 'detection_boxes', 'detection_scores',
                    'detection_classes', 'detection_masks']:
                tensor_name = key + ':0'
                if tensor_name in all_tensor_names:
                    tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(tensor_name)
            if 'detection_masks' in tensor_dict:
                # The following processing is only for a single image.
                detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
                detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
                # Reframing is required to translate the masks from box coordinates
                # to image coordinates and fit the image size.
                real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
                detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])
                detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])
                detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
                    detection_masks, detection_boxes, image.shape[1], image.shape[2])
                detection_masks_reframed = tf.cast(
                    tf.greater(detection_masks_reframed, 0.5), tf.uint8)
                # Follow the convention by adding back the batch dimension.
                tensor_dict['detection_masks'] = tf.expand_dims(
                    detection_masks_reframed, 0)
            image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')
            # Run inference
            output_dict = sess.run(tensor_dict, feed_dict={image_tensor: image})
            # All outputs are float32 numpy arrays, so convert types as appropriate.
            output_dict['num_detections'] = int(output_dict['num_detections'][0])
            output_dict['detection_classes'] = output_dict[
                'detection_classes'][0].astype(np.int64)
            output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
            output_dict['detection_scores'] = output_dict['detection_scores'][0]
            if 'detection_masks' in output_dict:
                output_dict['detection_masks'] = output_dict['detection_masks'][0]
    return output_dict
In [9]:
import cv2

# Open the default webcam.
cam = cv2.VideoCapture(0)

rolling = True
while rolling:
    ret, image_np = cam.read()
    # The model expects a batch dimension: shape [1, height, width, 3].
    image_np_expanded = np.expand_dims(image_np, axis=0)
    # Actual detection.
    output_dict = run_inference_for_single_image(image_np_expanded, detection_graph)
    # Visualization of the results of a detection.
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        output_dict['detection_boxes'],
        output_dict['detection_classes'],
        output_dict['detection_scores'],
        category_index,
        instance_masks=output_dict.get('detection_masks'),
        use_normalized_coordinates=True,
        line_thickness=8)
    cv2.imshow('image', cv2.resize(image_np, (1000, 800)))
    if cv2.waitKey(25) & 0xFF == ord('q'):
        break

cv2.destroyAllWindows()
cam.release()
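Run the cell and a window should pop up showing your webcam feed, with bounding boxes and labels drawn over any detected objects. Press 'q' to close the window and stop the stream.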