Academic Journal
SHEL5K: An Extended Dataset and Benchmarking for Safety Helmet Detection
Title: | SHEL5K: An Extended Dataset and Benchmarking for Safety Helmet Detection |
---|---|
Authors: | Munkh-Erdene Otgonbold, Munkhjargal Gochoo, Fady Alnajjar, Luqman Ali, Tan-Hsu Tan, Jun-Wei Hsieh, Ping-Yang Chen |
Source: | Sensors, Vol 22, Iss 6, p 2315 (2022) |
Publisher Information: | MDPI AG, 2022. |
Publication Year: | 2022 |
Collection: | LCC:Chemical technology |
Subject Terms: | YOLOv3, YOLOv4, YOLOv5, YOLOR, safety helmet, SHEL5K, object detection, Chemical technology, TP1-1185 |
More Details: | Wearing a safety helmet is important in construction and manufacturing industrial activities to avoid unpleasant situations. This safety compliance can be ensured by developing an automatic helmet detection system using various computer vision and deep learning approaches. Developing a deep-learning-based helmet detection model usually requires an enormous amount of training data. However, there are very few public safety helmet datasets available in the literature, most of which are not entirely labeled, and the labeled ones contain fewer classes. This paper presents the Safety HELmet dataset with 5K images (SHEL5K), an enhanced version of the SHD dataset. The proposed dataset consists of six completely labeled classes (helmet, head, head with helmet, person with helmet, person without helmet, and face). The proposed dataset was tested on multiple state-of-the-art object detection models, i.e., YOLOv3 (YOLOv3, YOLOv3-tiny, and YOLOv3-SPP), YOLOv4 (YOLOv4 and YOLOv4-pacsp-x-mish), YOLOv5-P5 (YOLOv5s, YOLOv5m, and YOLOv5x), the Faster Region-based Convolutional Neural Network (Faster-RCNN) with the Inception V2 architecture, and YOLOR. The experimental results from the various models on the proposed dataset were compared and showed improvement in the mean Average Precision (mAP). The SHEL5K dataset has an advantage over other safety helmet datasets as it contains fewer images with better labels and more classes, making helmet detection more accurate. |
Document Type: | article |
File Description: | electronic resource |
Language: | English |
ISSN: | 1424-8220 |
Relation: | https://www.mdpi.com/1424-8220/22/6/2315; https://doaj.org/toc/1424-8220 |
DOI: | 10.3390/s22062315 |
Access URL: | https://doaj.org/article/a2daf3239215465bb91f503ea297b401 |
Accession Number: | edsdoj.2daf3239215465bb91f503ea297b401 |
Database: | Directory of Open Access Journals |
Full Text:

Keywords: YOLOv3; YOLOv4; YOLOv5; YOLOR; safety helmet; SHEL5K; object detection; benchmark dataset

1. Introduction

Workplace safety has become a focus for many production and work sites due to the consequences of an unsafe environment on the health and productivity of the workforce. According to statistics [1,3], the construction industry is at high risk for the injuries and deaths of workers. In 2005, the National Institute for Occupational Safety and Health (NIOSH) reported 1224 deaths of construction workers in one year, making it the most dangerous industry in the United States (U.S.) [1]. Moreover, the U.S. Bureau of Labor Statistics (BLS) estimated that 150,000 workers are injured every year at construction sites [1]. The Bureau also reported that one in five worker deaths in 2014 occurred in construction and that a total of 1061 construction workers died in 2019 [2]. As per the report of the Ministry of Employment and Labor (MEOL) in Korea, 964 and 971 workers died in workplace accidents in 2016 and 2017, respectively [4].
Among these fatalities, 485 occurred at construction sites, followed by 217 in the manufacturing and 154 in the service industry. Workers at most worksites and manual working environments are at high risk of injury because they do not follow safety measures or use Personal Protective Equipment (PPE). The carelessness of workers and non-compliance with PPE requirements have adverse effects and increase the threat of minor or major injuries. In 2012, the National Safety Council (NSC) reported more than 65,000 cases of head injuries and 1020 deaths at construction sites [5]. According to the American Journal of Industrial Medicine, a total of 2210 construction workers died because of a Traumatic Brain Injury (TBI) from 2003 to 2010 [6]. According to Headway, the brain injury association, only 3% of purchased PPE was for head protection, although head injuries account for more than 20% of total injuries [7].

These statistics delineate the prevalence of fatal and non-fatal injuries in the construction industry, and there is a dire need to reduce their rate. Creating a safe environment for workers is an arduous challenge for this sector globally. Adopting safety measures and providing construction workers with PPE can decrease accident rates. Despite the effectiveness of these strategies, it is not guaranteed that workers will be cautious and use the PPE. To avert these troubles, there is a need for automated ways of detecting and monitoring safety helmets. A deep-learning-based safety helmet detection system can be developed using a large amount of labeled data. However, there is a lack of datasets for building highly accurate deep learning models for workers' helmet detection. The few publicly available datasets for safety helmet detection are not entirely labeled, and the labeled ones contain fewer classes and incomplete labels. Therefore, the proposed work presents the Safety HELmet dataset with 5K images (SHEL5K), an enhanced version of the SHD dataset [8]. In the SHD dataset [8], many objects are not labeled, which is not sufficient to train an efficient helmet recognition model. The SHD dataset [8] was improved in the proposed work by labeling all three originally proposed classes and adding three more classes for training an efficient helmet detection model. The main aims of the proposed study were to: (1) complete the missing labels and (2) increase the number of classes from three to six (helmet, head with helmet, person with helmet, head, person without helmet, and face).
The proposed dataset was tested on various object detection models, i.e., YOLOv3 [9], YOLOv3-tiny [10], YOLOv3-SPP [11], YOLOv4 [12], YOLOv5-P5 [13], the Faster Region-based Convolutional Neural Network (Faster-RCNN) [14] with the Inception V2 architecture [15], and YOLOR [16]. The experimental results showed significant improvements in the mAP compared to the publicly available datasets. A comparative analysis was performed, and discussions are provided based on the results from the various models. The proposed system was also used to successfully perform real-time safety helmet detection in YouTube videos.

2. Related Work

In the literature, various efforts have been made by researchers to develop a vision-based system for the helmet detection task. Li et al. [17] proposed a Convolutional-Neural-Network (CNN)-based safety helmet detection method using a dataset of 3500 images collected by web crawling. The precision and recall of the system were recorded as 95% and 77%, respectively. Wang et al. [18] proposed a safety helmet detection model trained on a total of 10,000 images captured by 10 different surveillance cameras at construction sites. In the experiment's first phase, the authors employed the YOLOv3 architecture [9] and achieved an mAP0.5 of 42.5%. In the second phase, the authors improved the architecture of YOLOv3 [18] and achieved an mAP0.5 of 67.05%. Wang et al. [19] suggested a hardhat detection system based on a lightweight CNN using the Harvard database hardhat dataset [20]. The dataset contains 7064 annotated images covering three classes (helmet, head, and person); among these, the person class is not appropriately labeled. The network was trained on two classes (helmet and head) and achieved an average accuracy of 87.4% and 89.4% for head and helmet, respectively. Li et al. [21] trained an automatic safety helmet-wearing detection system using the INRIA person dataset [22] and pedestrian data collected from a power substation. The authors in [21] showed that the proposed method (Color Feature Discrimination (CFD) and the ViBe algorithm in combination with the C4 classifier) yielded better results than the HOG-feature and SVM-classifier method: the HOG feature with the SVM classifier achieved an accuracy of 89.2%, while the proposed method achieved 94.13%. Rubaiyat et al. [23] proposed an automated system for detecting helmets for construction safety.
The authors collected 1000 images from the Internet using a web crawler, consisting of 354 human images and 600 non-human images. The helmet class achieved an accuracy of 79.10%, while the without-helmet class achieved an accuracy of 84.34%. Similarly, Kamboj and Powar [24] proposed an efficient deep-learning-based safety helmet detection system for the industrial environment by acquiring data from various videos of an industrial facility. The videos were captured using cameras with a resolution of 1920 × 1080 px and a frame rate of 25 frames per second; the resulting dataset consisted of 5773 images with two classes (helmet and without helmet). An improved helmet detection method was proposed by Geng et al. [25] using an imbalanced dataset of 7581 images, mostly showing a person in a helmet against a complex background; a label confidence of 0.982 was achieved when testing on 689 images. Moreover, Long et al. [26] proposed a deep-learning-based detection of safety helmet wearing using 5229 images acquired from the Internet and various power plants (including power plants under construction). The proposed system was based on SSD and achieved an mAP0.5 of 78.3% on the test images, compared with 70.8% for the baseline SSD at an IoU of 0.5. The above studies [17,23,26] used custom data to test their methods; therefore, it is not fair to compare the work proposed in this paper with these methods.

2.1. Datasets for Safety Helmet Detection

In general, researchers develop helmet detection systems using custom data or publicly available datasets. Some of the publicly available datasets for safety helmet detection, i.e., [8,20,27], are summarized in Table 1, which gives a brief comparison of the dataset proposed in the current study with various publicly available datasets. Each dataset shown in Table 1 is explained in detail below.

2.1.1. Safety Helmet Detection Dataset

The Safety Helmet Detection (SHD) dataset [8] is a publicly available dataset on Kaggle containing 5000 labeled images and three classes (helmet—18,966, head—5785, and person—751). However, the dataset has many incompletely labeled objects. Figure 1b shows the dataset labels, in which the person class is not labeled.

2.1.2. Hardhat Dataset

The hardhat dataset [20] is a safety helmet dataset shared by Northeastern University consisting of 7063 labeled images. The dataset is divided into training and testing sets, which contain 5297 and 1766 images, respectively. The images cover three distinct classes with 27,249 labeled objects (helmet—19,852, head—6781, and person—616).
In the given dataset, the person class is not labeled properly, as shown in Figure 1c, and the number of images in each class is not distributed equally.

2.1.3. Hard Hat Workers Object Detection Dataset

The Hard Hat Workers (HHW) dataset [27] is an improved version of the hardhat dataset [20] and is publicly available on the Roboflow website. In the HHW dataset [27], the number of labels in each class is increased (helmet—26,506, head—8263, and person—998). Figure 1d shows a sample image of the HHW dataset [27] labels, in which it can be seen that the person class is not labeled.

2.1.4. Safety Helmet Wearing Dataset

The Safety Helmet Wearing (SHW) dataset [28] consists of 7581 images. The images contain 111,514 helmet-wearing (positive class) objects and 9044 not-wearing (negative class) objects. Some of the negative class objects were obtained from the SCUT-HEAD dataset [29]; several bugs of the original SCUT-HEAD dataset [29] were fixed so that the data can be loaded directly in the standard PASCAL VOC format. Most images in the dataset are helmet images, and there is a very small number of head images. Figure 1e shows a labeled sample image from the SHW dataset. Figure 1a shows a comparison between the public datasets' labels and the SHEL5K dataset's labels.

3. SHEL5K Dataset

In the proposed work, the number of labels and classes in the SHD dataset [8] was extended and completed. Figure 2 shows sample images of the SHD dataset [8]. The SHD dataset [8] contains 5000 images with a resolution of 416 × 416 and 25,501 labels, with complicated backgrounds and bounding box annotations in PASCAL VOC format for three classes, namely helmet, head, and person. The limitation of the SHD dataset [8] is that numerous objects are incompletely labeled. Figure 3a,b shows image samples with person and head not properly labeled. The main aims of the proposed study were to: (1) complete the missing labels and (2) increase the number of classes from three to six (helmet, head with helmet, person with helmet, head, person without helmet, and face).

To address the limitations associated with the SHD dataset, SHEL5K is proposed, which consists of 75,570 labels. The number of labels in the SHEL5K dataset was increased for each class, i.e., helmet—19,252, head—6120, head with helmet—16,048, person without helmet—5248, person with helmet—14,767, and face—14,135.
Figure 3 shows a comparison of the labels of the SHD dataset [8] (a,b) and the SHEL5K dataset (c,d), with the helmet in blue, the head in purple, the head with helmet in navy blue, the person with helmet in green, the person without helmet in red, and the face in yellow bounding boxes. Moreover, the graph in Figure 4 compares the SHD dataset [8] and the SHEL5K dataset in terms of the number of labels for each class, with the SHD dataset [8] and SHEL5K labels represented by blue and orange bars, respectively. From the graph, it can be seen that the person class is very poorly labeled in the SHD dataset. In the proposed work, the images were labeled using the LabelImg tool [30] with the following steps: (1) the default number of classes in the tool was changed to six for our dataset; (2) the image-opening and label-saving paths were specified; (3) objects corresponding to the classes were labeled, and an XML file was created.

The file contains the name of the image, the path to the image, the image size and depth, and the coordinates of the labeled objects.
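For reference, a minimal sketch of reading such a LabelImg-style PASCAL VOC annotation back in Python is shown below. The file name sample.xml is a placeholder, and the tag layout assumed here is the standard PASCAL VOC structure described above (filename, image size, and one object entry per bounding box), not a file shipped with the dataset.

```python
# Minimal sketch: parse a LabelImg-style PASCAL VOC XML annotation.
# "sample.xml" is a placeholder path used only for illustration.
import xml.etree.ElementTree as ET

def read_voc_annotation(xml_path):
    tree = ET.parse(xml_path)
    root = tree.getroot()
    width = int(root.find("size/width").text)
    height = int(root.find("size/height").text)
    boxes = []
    for obj in root.findall("object"):
        name = obj.find("name").text          # e.g., "helmet", "head", "face"
        bb = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (int(bb.find(t).text)
                                  for t in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((name, xmin, ymin, xmax, ymax))
    return width, height, boxes

if __name__ == "__main__":
    w, h, boxes = read_voc_annotation("sample.xml")
    print(f"image {w}x{h}, {len(boxes)} labeled objects")
```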
4. Results and Discussion

The proposed SHEL5K dataset was benchmarked using state-of-the-art object detection models such as YOLOv3 [9], YOLOv4 [12], YOLOv5-P5 [13], the Faster-RCNN [14] with Inception v2 [15], and YOLOR [16]. In particular, we employed different pretrained variations of the models, i.e., YOLOv3-tiny [10], YOLOv3 [9], YOLOv3-SPP [11], YOLOv3-SPP pretrained on the MS COCO dataset [31], YOLOv3-SPP pretrained on the ImageNet dataset [32], and the YOLOv5-P5 models (YOLOv5s, YOLOv5m, and YOLOv5x) [13]. These models were prepared using the COCO128 dataset, which contains the first 128 images of COCO train 2017 [31].

4.1. Evaluation Metrics

In the proposed work, the precision, recall, F1 score, and mAP were used as the evaluation metrics to perform a fair comparison between the experimental results of the models. The precision represents the probability of the predicted bounding boxes matching the actual ground truth boxes and is described in Equation (1) below:

$$\mathrm{Precision} = \frac{TP}{TP + FP} \quad (1)$$

where TP, TN, FP, and FN refer to True Positive, True Negative, False Positive, and False Negative, respectively. The recall represents the probability of ground truth objects being correctly detected, as depicted in Equation (2):

$$\mathrm{Recall} = \frac{TP}{TP + FN} \quad (2)$$

Moreover, the F1 score is the harmonic mean of the model's precision and recall, as shown in Equation (3):

$$\mathrm{F1\ score} = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \quad (3)$$

Additionally, the mean Average Precision (mAP) is obtained by comparing the detected bounding boxes to the ground truth bounding boxes; if the intersection-over-union of the two boxes is 50% or larger, the detection is considered a TP. The mathematical formula of the mAP is given in Equation (4) below:

$$mAP = \frac{1}{n}\sum_{k=1}^{n} AP_k \quad (4)$$

where $AP_k$ is the average precision of class k and n is the number of classes.
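As a quick illustration of Equations (1)-(4), the sketch below computes these metrics from per-class counts; the TP/FP/FN counts and AP values in the example are made-up placeholders, not results reported in the paper.

```python
# Sketch of Equations (1)-(4): precision, recall, F1 score, and mAP.
# All numeric values in the example below are illustrative placeholders.

def precision(tp, fp):
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1_score(p, r):
    return 2 * p * r / (p + r) if (p + r) else 0.0

def mean_average_precision(per_class_ap):
    # Equation (4): mAP is the mean of the per-class average precisions.
    return sum(per_class_ap) / len(per_class_ap)

if __name__ == "__main__":
    tp, fp, fn = 90, 10, 20                     # placeholder counts for one class
    p, r = precision(tp, fp), recall(tp, fn)
    print(p, r, f1_score(p, r))
    print(mean_average_precision([0.88, 0.91, 0.85, 0.86, 0.84, 0.72]))  # 6 classes
```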
4.2. Experimental Setup

Data preparation started with the conversion of the annotation files from the PASCAL VOC format to the YOLO format so that they could be given as input to the object detection models. The proposed dataset was randomly divided into training and testing sets; the training set contained 4000 (80%) images, while the testing set contained 1000 (20%) images. The criteria for evaluating the performance of the various models were the mAP0.5 (the mAP computed at an IoU threshold of 0.5) and the F1 score. During the experiments, the Intersection over Union (IoU) threshold was kept at 0.5. YOLOv3-SPP [11], YOLOv4 [12], YOLOv5-P5 [13], and YOLOR [16] were trained on the proposed dataset, as these models have the fastest inference times for real-time object detection compared to the majority of object detection models; they perform classification and bounding box regression in a single step. Empirically, it was found that the suitable number of epochs for training the YOLOv3 models (YOLOv3-tiny [10], YOLOv3 [9], and YOLOv3-SPP [11]) and the Faster-RCNN [14] with Inception v2 [15] was 1000, while for the other models, namely YOLOv4 [12], YOLOv5-P5 [13], and YOLOR [16], it was 500. The performance of these models was also compared with the Faster-RCNN [14] with Inception v2 [15] model, which is better at detecting small objects. The Faster-RCNN [14] with Inception V2 [15] model was trained for 250,000 steps with a learning rate of 0.0002 and a batch size of 16. The results of the Faster-RCNN [14] with Inception v2 [15] were measured using an open-source toolkit for the comparative analysis of object detection metrics [33], so that they are comparable to those of the YOLO models.
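To make this preparation step concrete, here is a minimal sketch of converting a PASCAL VOC box to the normalized YOLO format and drawing a random 80/20 train/test split. The class order, the seed, and the use of generic image IDs are illustrative assumptions, not the authors' actual scripts.

```python
# Sketch: PASCAL VOC box -> YOLO format, plus a random 80/20 split.
# The class list order and the random seed are assumptions for illustration.
import random

CLASSES = ["helmet", "head", "head with helmet",
           "person with helmet", "person without helmet", "face"]

def voc_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    # YOLO label lines store: class_id cx cy w h, all normalized to [0, 1].
    cx = (xmin + xmax) / 2.0 / img_w
    cy = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return cx, cy, w, h

def train_test_split(image_ids, train_ratio=0.8, seed=0):
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_ratio)
    return ids[:cut], ids[cut:]      # e.g., 4000 train / 1000 test for 5000 images

if __name__ == "__main__":
    print(voc_to_yolo(100, 120, 180, 220, 416, 416))
    train, test = train_test_split(range(5000))
    print(len(train), len(test))
```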
4.3. Three-Class Results

The SHD dataset [8] has three classes (helmet, head, and person), while the SHEL5K dataset has six classes (helmet, head with helmet, person with helmet, head, person without helmet, and face). In the current study, the two classes person with helmet and person without helmet were merged to perform a fair comparison between the two datasets, as together they correspond to the person class in the SHD dataset [8]. Figure 5 compares the SHD and SHEL5K dataset results on the same images; with the SHD dataset [8], the person class was not detected. Table 2 shows the comparison results of the YOLOv3-SPP [11] and YOLOv5x [13] models on the SHD dataset [8] and the SHEL5K dataset. For the sake of simplicity, only the results of these two models are presented, as they outperformed the remaining models. The YOLOv5x [13] model achieved an mAP0.5 of 0.8528 over the three classes, where the best and worst mAP0.5 values were 0.8774 and 0.8311 for the helmet and person classes, respectively; the trained model thus showed lower performance on the person class than on the helmet class. The YOLOv5x [13] model achieved better performance than YOLOv3-SPP [11], as shown in Table 2. The head class in the SHD dataset [8] achieved a high precision, recall, and F1 score, as it was properly labeled in comparison with the other classes. The helmet class did not perform as well because head with helmet and helmet were given the single label helmet in that dataset. Moreover, the results for the person class were low, as the labeling of the person class was incomplete in the SHD dataset [8].
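The paper does not describe how this class merge was implemented. As a minimal sketch, assuming YOLO-format label files and a hypothetical six-class index order, the remapping could look like this; the index mapping and the labels/ directory name are assumptions for illustration only.

```python
# Sketch: merge "person with helmet" and "person without helmet" into one
# "person" class in YOLO-format label files for the 3-class comparison.
# The assumed 6-class index order and directory layout are illustrative only.
from pathlib import Path

SIX_TO_THREE = {0: 0,   # helmet                 -> helmet
                1: 1,   # head                   -> head
                3: 2,   # person with helmet     -> person
                4: 2}   # person without helmet  -> person
# Indices 2 (head with helmet) and 5 (face) are dropped here; how the paper
# handled them in the 3-class comparison is not stated, so this is an assumption.

def merge_labels(src_dir="labels", dst_dir="labels_3class"):
    Path(dst_dir).mkdir(exist_ok=True)
    for txt in Path(src_dir).glob("*.txt"):
        lines = []
        for line in txt.read_text().splitlines():
            if not line.strip():
                continue
            cls, *box = line.split()
            if int(cls) in SIX_TO_THREE:
                lines.append(" ".join([str(SIX_TO_THREE[int(cls)])] + box))
        (Path(dst_dir) / txt.name).write_text("\n".join(lines))

if __name__ == "__main__":
    merge_labels()
```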
Figure 6 shows the confusion matrices for the YOLOv5x [13] model on the various publicly available datasets. The confusion matrices computed for the HHW dataset [27], the hardhat dataset [20], and the SHD dataset [8] for three classes (helmet, head, and person) are plotted in Figure 6a–c, respectively. The confusion matrices show very poor results for the person class, and the background FNs (the percentage of labeled objects that go unrecognized) were also high for all three datasets. The helmet and head classes performed very well, although the background FPs were recorded as high. Figure 6d shows the confusion matrix computed on the SHW dataset [28] test set for two classes (hat and person). The confusion matrix shows that the YOLOv5x [13] model performed well on the SHW dataset [28] compared to the other datasets. Overall, the performance on the other datasets was also promising, except for the person class, which was not detected by the model. Therefore, in the current study, the dataset was extended and labeled properly, and additional classes were added to obtain a more accurate detection of the person class in the SHEL5K dataset.

4.4. Six-Class Results

Table 3 and Table 4 show the comparison results of different variations of the YOLOv3-SPP [11] and YOLOv5-P5 [13] models trained on the SHEL5K dataset. For YOLOv3-SPP [11], three different models were evaluated on the SHEL5K dataset: one trained from scratch (not pretrained) and two pretrained on the ImageNet dataset [32] and the MS COCO dataset [31], respectively. The highest mAP0.5 of 0.5572 was achieved by the YOLOv3-SPP [11] model pretrained on the ImageNet dataset [32]. For YOLOv3-SPP [11], the highest mAP0.5 of 0.6459 was achieved for the head with helmet class, while the two worst mAP0.5 values of 0.007 and 0.0295 were reported for the face class when the model was trained from scratch and when it was pretrained on the MS COCO dataset [31], respectively. These models achieved an mAP0.5 of nearly zero for this class, which may be because the human faces are far away in most images and there is no face class in the COCO dataset [31]. For YOLOv5-P5 [13], Table 4 compares the results of three YOLOv5-P5 [13] variants on the SHEL5K dataset. The YOLOv5-P5 [13] family includes the YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x models; in the current study, the YOLOv5s, YOLOv5m, and YOLOv5x models pretrained on the COCO128 dataset [31] were selected. YOLOv5x [13] achieved an mAP0.5 of 0.8033, and the class with the highest mAP0.5 of 0.8565 was person with helmet.
The results for the face class were relatively poor, with an mAP0.5 of 0.7196. The mAP0.5 of YOLOv5-P5 [13] was better than that of YOLOv3-SPP [11]. The results of the YOLOv5x [13] model on three types of images captured at various distances (far, near, and medium) are shown in Figure 7.

Figure 8 shows the confusion matrix for the SHEL5K dataset. The results were relatively low compared to the other public datasets. This is also evident from Table 5, which compares the results of the YOLOv5x [13] model on the various datasets, including SHEL5K. The model trained on the SHEL5K dataset showed better results than on the other datasets except the SHW dataset [28]: the precision, recall, and F1 score achieved on the proposed dataset, recorded as 0.9188, 0.817, and 0.8644, respectively, were slightly lower than on the SHW dataset [28]. This is because the SHW dataset [28] contains only two classes, while the proposed dataset contains six classes. Moreover, during the labeling of the proposed dataset, an image containing only part of a helmet or face was still labeled with the helmet or face class, respectively. The Precision–Recall (PR) curve is also shown in Figure 8, which likewise depicts that the lowest mAP0.5 (0.72) was achieved by the face class, lower than the mAP0.5 (0.80) of all the other classes.

The results of the YOLOR model on the proposed SHEL5K dataset are summarized in Table 6. The YOLOR [16] model used in the proposed work was pretrained on the COCO dataset [31]. The model achieved an mAP0.5 of 0.8828, and the highest mAP0.5 of 0.911 was recorded for the head with helmet class. The result for the person without helmet class was relatively poor, with an mAP0.5 of 0.8498. The results of the model on sample images are depicted in Figure 9.

Figure 10 compares the visualization results of the best models trained on the SHW dataset [28] and the SHEL5K dataset on a test image. As can be seen in Figure 10a, the model trained on the SHW dataset [28] was not able to detect the helmet class when the helmet in the image was only half visible and the head of the worker was hidden. The results of the model trained on the SHEL5K dataset are shown in Figure 10b, which shows that the model detects the helmet class correctly, indicating that the labeling in the proposed dataset was performed well. The state-of-the-art models trained on the SHEL5K dataset in the current study still did not perform as well as desired; in the future, the proposed dataset will be given to new object detection models to achieve higher performance.

K-fold cross-validation was used to check whether the models were overfitting the proposed data. The proposed dataset was divided into 80% training (4000 images) and 20% testing (1000 images). The value of K was set to five, i.e., the data were split into five folds K1, K2, K3, K4, and K5. Table 7 shows the results of K-fold cross-validation on the SHEL5K dataset using the YOLOR [16] model. The results of all the folds were comparable, which shows that the model was not overfitting. The maximum mAP0.5 value of 0.8881 was achieved at fold K5, and the minimum mAP0.5 value of 0.861 at fold K4.
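As an illustration, a minimal sketch of such a 5-fold split over the 5000 image IDs is given below. The fold construction and shuffling seed are assumptions; the paper does not state which tooling was used for the split.

```python
# Sketch: split 5000 image IDs into 5 folds for K-fold cross-validation.
# Fold naming (K1..K5) follows the paper; the shuffling seed is an assumption.
import random

def make_folds(image_ids, k=5, seed=0):
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i::k] for i in range(k)]           # k roughly equal folds

def kfold_splits(image_ids, k=5, seed=0):
    folds = make_folds(image_ids, k, seed)
    for i in range(k):                             # fold i serves as the test set
        test = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        yield train, test

if __name__ == "__main__":
    for i, (train, test) in enumerate(kfold_splits(range(5000)), start=1):
        print(f"K{i}: {len(train)} train / {len(test)} test images")
```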
The results of all the state-of-the-art models trained on the SHEL5K dataset are summarized in Table 8. The performance of the YOLO models was compared with the Faster-RCNN with the Inception V2 architecture. YOLOv3-tiny [10], YOLOv3 [9], and YOLOv3-SPP [11] were pretrained on the ImageNet dataset [32], while YOLOv5s, YOLOv5m, and YOLOv5x [13] were pretrained on the COCO128 dataset [31]. Detection results of the best YOLOv5x [13] models trained on the SHEL5K dataset and the other publicly available datasets [8,20,27] are illustrated in Appendix A. The best mAP0.5 of 0.8828 was achieved by the YOLOR [16] model, with a precision, recall, and F1 score of 0.9322, 0.8066, and 0.8637, respectively. The lowest mAP0.5 of 0.3689 was achieved by the Faster-RCNN [14], with a precision, recall, and F1 score of 0.7808, 0.3862, and 0.5167, respectively; the Faster-RCNN model also had the highest inference time of 0.05 s. Among the YOLO models, YOLOv3-tiny [10] achieved the lowest mAP0.5 of 0.3779, with a precision, recall, and F1 score of 0.7695, 0.4225, and 0.5408, respectively. Table 8 also shows the training and testing times of all the models. The YOLOv3-tiny model had the lowest inference time of 0.006 s and fewer layers and parameters than the other YOLO models. The YOLOR model achieved the highest mAP0.5 of 0.8828 with an optimum inference time of 0.012 s.

5. Conclusions

The proposed work aimed to extend the number of classes and labels of the publicly available SHD dataset [8]. The SHD dataset [8] contains 5000 images with three object classes (helmet, head, and person); however, most of the images are incompletely labeled. Therefore, a new dataset named SHEL5K (publicly available at https://data.mendeley.com/datasets/9rcv8mm682/draft?a=28c11744-48e7-4810-955b-d76e853beae5 (accessed on 5 January 2022)) was proposed by adding three more classes and completely labeling all 5000 images of the SHD dataset [8].
The proposed dataset was benchmarked on various state-of-the-art object detection models, namely YOLOv3-tiny [10], YOLOv3 [9], YOLOv3-SPP [11], YOLOv4 [12], YOLOv5-P5 [13], the Faster-RCNN [14] with Inception v2 [15], and YOLOR [16]. The experimental results showed significant improvements in the mAP0.5 values of the compared models. From the experimental results of the models on the proposed SHEL5K dataset, it can be concluded that all the models showed promising performance in detecting all classes. It can also be concluded that the proposed dataset has an advantage over the SHD dataset [8] in terms of images and labeling. Moreover, models trained on the proposed dataset can be used for real-time safety helmet detection. In the future, we will improve the real-time recognition rate of safety helmet detection, focusing on misclassified cases.

Figures and Tables

Figure 1. Comparison of the public safety helmet datasets' labels and the SHEL5K dataset's labels: (a) SHEL5K dataset, (b) SHD dataset [8], (c) hardhat dataset [20], (d) HHW dataset [27], and (e) SHW dataset [28].

Figure 2. Sample images of the SHEL5K dataset.

Figure 3. (a,b) SHD dataset [8] labels; (c,d) SHEL5K dataset labels.

Figure 4. Bar graph comparison between the SHD dataset [8] and the SHEL5K dataset in terms of the number of labels for each class.

Figure 5. Comparison of the SHD and SHEL5K dataset results on the same images. (a–c) The results of the best SHD dataset [8] model.
(d–f) The results of the best SHEL5K dataset model.</p> <p>Graph: Figure 6 Confusion matrices of the YOLOv5x model on (a) the SHD dataset [[<reflink idref="bib8" id="ref184">8</reflink>]], (b) hardhat dataset [[<reflink idref="bib20" id="ref185">20</reflink>]], (c) HHW dataset [[<reflink idref="bib27" id="ref186">27</reflink>]], and (d) SHW dataset [[<reflink idref="bib28" id="ref187">28</reflink>]].</p> <p>Graph: Figure 7 The YOLOv5x [[<reflink idref="bib13" id="ref188">13</reflink>]] detected outputs are plotted with the original images.</p> <p>Graph: Figure 8 Confusion matrix and PR curve of the object detection model calculated on the SHEL5K dataset and the YOLOv5x [[<reflink idref="bib13" id="ref189">13</reflink>]] model.</p> <p>Graph: Figure 9 Result of the YOLOR [[<reflink idref="bib16" id="ref190">16</reflink>]] model experiments on the sample images.</p> <p>Graph: Figure 10 Comparison of the best-trained model results on the (a) SHW dataset [[<reflink idref="bib28" id="ref191">28</reflink>]] and the (b) SHEL5K dataset.</p> <p>Table 1 Comparison of public safety helmet datasets and the SHEL5K dataset.</p> <p> <ephtml> &lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th align="left" style="border-bottom:solid thin;border-top:solid thin"&gt;Datasets&lt;/th&gt;&lt;th align="center" style="border-bottom:solid thin;border-top:solid thin"&gt;Hardhat&amp;#160;[&lt;xref ref-type="bibr" rid="bibr20"&gt;20&lt;/xref&gt;]&lt;/th&gt;&lt;th align="center" style="border-bottom:solid thin;border-top:solid thin"&gt;HHW&amp;#160;[&lt;xref ref-type="bibr" rid="bibr27"&gt;27&lt;/xref&gt;]&lt;/th&gt;&lt;th align="center" style="border-bottom:solid thin;border-top:solid thin"&gt;SHD&amp;#160;[&lt;xref ref-type="bibr" rid="bibr8"&gt;8&lt;/xref&gt;]&lt;/th&gt;&lt;th align="center" style="border-bottom:solid thin;border-top:solid thin"&gt;SHW&amp;#160;[&lt;xref ref-type="bibr" rid="bibr28"&gt;28&lt;/xref&gt;]&lt;/th&gt;&lt;th align="center" style="border-bottom:solid thin;border-top:solid thin"&gt;SHEL5K&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;Total sample&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;7063&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;7041&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;5000&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;7581&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;5000&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;Class&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;3&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;3&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;3&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;2&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;6&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td colspan="6" align="center" valign="middle" style="border-bottom:solid thin"&gt;Number of labels in each class&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;Helmet&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;19,852&lt;/td&gt;&lt;td align="center" valign="middle" 
style="border-bottom:solid thin"&gt;26,506&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;18,966&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;-&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;19,252&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;Head&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;6781&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;8263&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;5785&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;-&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;6120&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;Person *&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;616&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;998&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;751&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;9044&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;-&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;Head and helmet&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;-&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;-&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;-&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;-&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;16,048&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;Person not helmet&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;-&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;-&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;-&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;-&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;5248&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;Person and helmet&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;-&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;-&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;-&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;-&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;14,767&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;Face&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;-&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;-&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;-&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;-&lt;/td&gt;&lt;td 
align="center" valign="middle" style="border-bottom:solid thin"&gt;14,135&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;Hat **&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;-&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;-&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;-&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;111,514&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;-&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;Total&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;27,249&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;35,767&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;25,502&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;120,558&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;75,570&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; </ephtml> * The <emph>person</emph> class of the SHD dataset is called <emph>head</emph>. ** The <emph>hat</emph> class of the SHD dataset is called <emph>helmet</emph>.</p> <p>Table 2 Comparison between two dataset results for 3 classes on the YOLOv3-SPP [[<reflink idref="bib11" id="ref192">11</reflink>]] and YOLOv5x [[<reflink idref="bib13" id="ref193">13</reflink>]] models.</p> <p> <ephtml> &lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th align="left" style="border-top:solid thin" /&gt;&lt;th colspan="8" align="center" style="border-top:solid thin;border-bottom:solid thin"&gt;YOLOv3-SPP&amp;#160;[&lt;xref ref-type="bibr" rid="bibr11"&gt;11&lt;/xref&gt;]&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;th align="left" style="border-bottom:solid thin" /&gt;&lt;th colspan="4" align="center" style="border-bottom:solid thin"&gt;SHD Dataset&amp;#160;[&lt;xref ref-type="bibr" rid="bibr8"&gt;8&lt;/xref&gt;]&lt;/th&gt;&lt;th colspan="4" align="center" style="border-bottom:solid thin"&gt;SHEL5K Dataset with 3 Classes&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;&lt;bold&gt;Class&lt;/bold&gt;&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;&lt;bold&gt;Precision&lt;/bold&gt;&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;&lt;bold&gt;Recall&lt;/bold&gt;&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;&lt;italic&gt;&lt;bold&gt;mAP&lt;/bold&gt;&lt;/italic&gt;&lt;bold&gt;0.5&lt;/bold&gt;&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;&lt;bold&gt;F1&lt;/bold&gt;&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;&lt;bold&gt;Precision&lt;/bold&gt;&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;&lt;bold&gt;Recall&lt;/bold&gt;&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;&lt;italic&gt;&lt;bold&gt;mAP&lt;/bold&gt;&lt;/italic&gt;&lt;bold&gt;0.5&lt;/bold&gt;&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;&lt;bold&gt;F1&lt;/bold&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle"&gt;Helmet&lt;/td&gt;&lt;td align="left" 
valign="middle"&gt;0.9578&lt;/td&gt;&lt;td align="left" valign="middle"&gt;0.4976&lt;/td&gt;&lt;td align="left" valign="middle"&gt;0.4869&lt;/td&gt;&lt;td align="left" valign="middle"&gt;0.6549&lt;/td&gt;&lt;td align="left" valign="middle"&gt;0.9222&lt;/td&gt;&lt;td align="left" valign="middle"&gt;0.7197&lt;/td&gt;&lt;td align="left" valign="middle"&gt;0.7028&lt;/td&gt;&lt;td align="left" valign="middle"&gt;0.8084&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle"&gt;Head&lt;/td&gt;&lt;td align="left" valign="middle"&gt;0.9154&lt;/td&gt;&lt;td align="left" valign="middle"&gt;0.302&lt;/td&gt;&lt;td align="left" valign="middle"&gt;0.2923&lt;/td&gt;&lt;td align="left" valign="middle"&gt;0.4542&lt;/td&gt;&lt;td align="left" valign="middle"&gt;0.9114&lt;/td&gt;&lt;td align="left" valign="middle"&gt;0.6642&lt;/td&gt;&lt;td align="left" valign="middle"&gt;0.6484&lt;/td&gt;&lt;td align="left" valign="middle"&gt;0.7684&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;Person&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;0&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;0&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;0&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;0&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;0.9092&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;0.6354&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;0.6148&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;0.748&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;Average&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;0.6244&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;0.2665&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;0.2597&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;0.3697&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;0.9143&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;0.6731&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;0.6553&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;0.775&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" /&gt;&lt;td colspan="8" align="center" valign="middle" style="border-bottom:solid thin"&gt;&lt;bold&gt;YOLOv5x&amp;#160;[13]&lt;/bold&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin" /&gt;&lt;td colspan="4" align="center" valign="middle" style="border-bottom:solid thin"&gt;&lt;bold&gt;SHD Dataset&amp;#160;[8]&lt;/bold&gt;&lt;/td&gt;&lt;td colspan="4" align="center" valign="middle" style="border-bottom:solid thin"&gt;&lt;bold&gt;SHEL5K Dataset with 3 Classes&lt;/bold&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;&lt;bold&gt;Class&lt;/bold&gt;&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;&lt;bold&gt;Precision&lt;/bold&gt;&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;&lt;bold&gt;Recall&lt;/bold&gt;&lt;/td&gt;&lt;td align="left" 
valign="middle" style="border-bottom:solid thin"&gt;&lt;italic&gt;&lt;bold&gt;mAP&lt;/bold&gt;&lt;/italic&gt;&lt;bold&gt;0.5&lt;/bold&gt;&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;&lt;bold&gt;F1&lt;/bold&gt;&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;&lt;bold&gt;Precision&lt;/bold&gt;&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;&lt;bold&gt;Recall&lt;/bold&gt;&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;&lt;italic&gt;&lt;bold&gt;mAP&lt;/bold&gt;&lt;/italic&gt;&lt;bold&gt;0.5&lt;/bold&gt;&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;&lt;bold&gt;F1&lt;/bold&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle"&gt;Helmet&lt;/td&gt;&lt;td align="left" valign="middle"&gt;0.9559&lt;/td&gt;&lt;td align="left" valign="middle"&gt;0.9162&lt;/td&gt;&lt;td align="left" valign="middle"&gt;0.9162&lt;/td&gt;&lt;td align="left" valign="middle"&gt;0.9356&lt;/td&gt;&lt;td align="left" valign="middle"&gt;0.9402&lt;/td&gt;&lt;td align="left" valign="middle"&gt;0.8858&lt;/td&gt;&lt;td align="left" valign="middle"&gt;0.8774&lt;/td&gt;&lt;td align="left" valign="middle"&gt;0.9122&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle"&gt;Head&lt;/td&gt;&lt;td align="left" valign="middle"&gt;0.909&lt;/td&gt;&lt;td align="left" valign="middle"&gt;0.879&lt;/td&gt;&lt;td align="left" valign="middle"&gt;0.8686&lt;/td&gt;&lt;td align="left" valign="middle"&gt;0.8938&lt;/td&gt;&lt;td align="left" valign="middle"&gt;0.9216&lt;/td&gt;&lt;td align="left" valign="middle"&gt;0.8562&lt;/td&gt;&lt;td align="left" valign="middle"&gt;0.8499&lt;/td&gt;&lt;td align="left" valign="middle"&gt;0.8877&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;Person&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;0.0345&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;0.0052&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;0.0003&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;0.009&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;0.9203&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;0.8409&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;0.8311&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;0.8788&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;Average&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;0.6331&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;0.6001&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;0.595&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;0.6128&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;0.9274&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;0.861&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;0.8528&lt;/td&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;0.8929&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; </ephtml> </p> <p>Table 3 Comparison results of different 
Table 3. Comparison results of different variations of the YOLOv3 models (a) trained from scratch, (b) pretrained on the ImageNet dataset [32], and (c) pretrained on the MS COCO dataset [31].

YOLOv3-SPP [11] (columns 2–5: trained from scratch; columns 6–9: pretrained on the ImageNet dataset [32]; columns 10–13: pretrained on the MS COCO dataset [31])

| Class | Precision | Recall | mAP0.5 | F1 | Precision | Recall | mAP0.5 | F1 | Precision | Recall | mAP0.5 | F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Helmet | 0.9253 | 0.3144 | 0.3053 | 0.4693 | 0.9373 | 0.6275 | 0.6105 | 0.7518 | 0.8277 | 0.2971 | 0.2602 | 0.4372 |
| Head with helmet | 0.9244 | 0.4035 | 0.3871 | 0.5618 | 0.9349 | 0.6668 | 0.6459 | 0.7784 | 0.7806 | 0.463 | 0.4043 | 0.5813 |
| Person with helmet | 0.7778 | 0.1442 | 0.12 | 0.2433 | 0.8746 | 0.6288 | 0.5924 | 0.7316 | 0.8622 | 0.4491 | 0.4076 | 0.5906 |
| Head | 0.8868 | 0.2295 | 0.2173 | 0.3646 | 0.9268 | 0.6184 | 0.5978 | 0.7418 | 0.8378 | 0.2775 | 0.2422 | 0.4169 |
| Person without helmet | 0.8241 | 0.1563 | 0.1339 | 0.2628 | 0.8784 | 0.4957 | 0.4729 | 0.6338 | 0.8389 | 0.3823 | 0.3528 | 0.5252 |
| Face | 0.4191 | 0.0556 | 0.0295 | 0.0982 | 0.7588 | 0.4715 | 0.4238 | 0.5816 | 0.3978 | 0.013 | 0.007 | 0.0252 |
| Average | 0.7929 | 0.2173 | 0.1988 | 0.3333 | 0.8851 | 0.5848 | 0.5572 | 0.7032 | 0.7575 | 0.3137 | 0.279 | 0.4294 |

Table 4. Comparison results of different variations of YOLOv5-P5 [13]: (a) YOLOv5s, (b) YOLOv5m, and (c) YOLOv5x.
thin"&gt;&lt;italic&gt;mAP&lt;/italic&gt;0.5&lt;/th&gt;&lt;th align="center" style="border-bottom:solid thin"&gt;F1&lt;/th&gt;&lt;th align="center" style="border-bottom:solid thin"&gt;Precision&lt;/th&gt;&lt;th align="center" style="border-bottom:solid thin"&gt;Recall&lt;/th&gt;&lt;th align="center" style="border-bottom:solid thin"&gt;&lt;italic&gt;mAP&lt;/italic&gt;0.5&lt;/th&gt;&lt;th align="center" style="border-bottom:solid thin"&gt;F1&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td align="left" valign="middle"&gt;Helmet&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.961&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.7825&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.872&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.8626&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.9632&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.7981&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.8795&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.8729&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.96&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.8205&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.8896&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.8848&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle"&gt;Head with helmet&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.9437&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.7973&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.8761&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.8608&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.9476&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.7946&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.8783&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.8641&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.9357&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.8247&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.8912&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.8767&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle"&gt;Person with helmet&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.9061&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.8385&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.8935&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.871&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.9131&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.8346&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.8922&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.8721&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.8953&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.8723&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.9089&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.8836&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle"&gt;Head&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.9341&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.8219&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.889&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.8744&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.9335&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.8252&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.8897&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.876&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.9344&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.8497&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.9025&lt;/td&gt;&lt;td align="center" 
valign="middle"&gt;0.89&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle"&gt;Person without helmet&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.8791&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.7583&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.8493&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.8142&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.8872&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.7602&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.8527&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.8188&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.8921&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.7924&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.8732&lt;/td&gt;&lt;td align="center" valign="middle"&gt;0.8393&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;Face&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8991&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.6514&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.7863&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.7558&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.9061&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.6982&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8122&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.7886&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.895&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.7427&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8301&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8117&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;Average&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.9207&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.774&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.861&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8397&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.9251&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.7851&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8687&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.84887&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.9188&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.817&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8826&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8644&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; </ephtml> </p> <p>Table 5 The result of the YOLOv5x [[<reflink idref="bib13" id="ref197">13</reflink>]] and YOLOR [[<reflink idref="bib16" id="ref198">16</reflink>]] models on the publicly available datasets and the proposed SHEL5K dataset.</p> <p> <ephtml> 
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th align="center" style="border-top:solid thin;border-bottom:solid thin" /&gt;&lt;th align="center" style="border-top:solid thin;border-bottom:solid thin" /&gt;&lt;th colspan="4" align="center" style="border-top:solid thin;border-bottom:solid thin"&gt;YOLOv5x&amp;#160;[&lt;xref ref-type="bibr" rid="bibr13"&gt;13&lt;/xref&gt;]&lt;/th&gt;&lt;th colspan="4" align="center" style="border-top:solid thin;border-bottom:solid thin"&gt;YOLOR&amp;#160;[&lt;xref ref-type="bibr" rid="bibr16"&gt;16&lt;/xref&gt;]&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;th align="left" style="border-bottom:solid thin"&gt;Datasets&lt;/th&gt;&lt;th align="center" style="border-bottom:solid thin"&gt;Class&lt;/th&gt;&lt;th align="center" style="border-bottom:solid thin"&gt;Precision&lt;/th&gt;&lt;th align="center" style="border-bottom:solid thin"&gt;Recall&lt;/th&gt;&lt;th align="center" style="border-bottom:solid thin"&gt;&lt;italic&gt;mAP&lt;/italic&gt;0.5&lt;/th&gt;&lt;th align="center" style="border-bottom:solid thin"&gt;F1&lt;/th&gt;&lt;th align="center" style="border-bottom:solid thin"&gt;Precision&lt;/th&gt;&lt;th align="center" style="border-bottom:solid thin"&gt;Recall&lt;/th&gt;&lt;th align="center" style="border-bottom:solid thin"&gt;&lt;italic&gt;mAP&lt;/italic&gt;0.5&lt;/th&gt;&lt;th align="center" style="border-bottom:solid thin"&gt;F1&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;SHW&amp;#160;[&lt;xref ref-type="bibr" rid="bibr28"&gt;28&lt;/xref&gt;]&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;2&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.9334&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.9297&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.9219&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.9294&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.9486&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8063&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.889&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8697&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;Hardhat&amp;#160;[&lt;xref ref-type="bibr" rid="bibr20"&gt;20&lt;/xref&gt;]&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;3&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.6715&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.6545&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.6389&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.6546&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.6367&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.6263&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.6407&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.6315&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;HHW&amp;#160;[&lt;xref ref-type="bibr" 
rid="bibr27"&gt;27&lt;/xref&gt;]&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;3&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.6355&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.6295&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.6214&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.6288&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.6289&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.6177&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.6344&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.6233&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;SHD&amp;#160;[&lt;xref ref-type="bibr" rid="bibr8"&gt;8&lt;/xref&gt;]&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;3&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.6331&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.6001&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.595&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.6128&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.6211&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.6341&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.6431&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.6276&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;SHEL5K&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;6&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.9187&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.817&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8826&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8644&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.9322&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8066&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8828&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8637&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; </ephtml> </p> <p>Table 6 The result of the YOLOR [[<reflink idref="bib16" id="ref199">16</reflink>]] model on the SHEL5K dataset.</p> <p> <ephtml> &lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th colspan="5" align="center" style="border-bottom:solid thin;border-top:solid thin"&gt;YOLOR&amp;#160;[&lt;xref ref-type="bibr" rid="bibr16"&gt;16&lt;/xref&gt;]&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;&lt;bold&gt;Class&lt;/bold&gt;&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;&lt;bold&gt;Precision&lt;/bold&gt;&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid 
thin"&gt;&lt;bold&gt;Recall&lt;/bold&gt;&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;&lt;italic&gt;&lt;bold&gt;mAP&lt;/bold&gt;&lt;/italic&gt;&lt;bold&gt;0.5&lt;/bold&gt;&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;&lt;bold&gt;F1&lt;/bold&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;Helmet&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.9658&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.7981&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8846&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.874&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;Head with helmet&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.9464&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8172&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8898&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.877&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;Person with helmet&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.9225&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8771&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.9204&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8992&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;Head&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.9461&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8464&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.9068&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8935&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;Person without helmet&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8859&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8019&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8767&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8418&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;Face&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.9264&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.6992&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8182&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.797&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;Average&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.9322&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8066&lt;/td&gt;&lt;td align="center" valign="middle" 
style="border-bottom:solid thin"&gt;0.8828&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8637&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; </ephtml> </p> <p>Table 7 The results of K-fold cross-validation on the SHEL5K dataset using the YOLOR model [[<reflink idref="bib16" id="ref200">16</reflink>]].</p> <p> <ephtml> &lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th align="center" style="border-top:solid thin" /&gt;&lt;th colspan="2" align="center" style="border-top:solid thin;border-bottom:solid thin"&gt;K1&lt;/th&gt;&lt;th colspan="2" align="center" style="border-top:solid thin;border-bottom:solid thin"&gt;K2&lt;/th&gt;&lt;th colspan="2" align="center" style="border-top:solid thin;border-bottom:solid thin"&gt;K3&lt;/th&gt;&lt;th colspan="2" align="center" style="border-top:solid thin;border-bottom:solid thin"&gt;K4&lt;/th&gt;&lt;th colspan="2" align="center" style="border-top:solid thin;border-bottom:solid thin"&gt;K5&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;th align="center" style="border-bottom:solid thin" /&gt;&lt;th align="center" style="border-bottom:solid thin"&gt;&lt;italic&gt;mAP&lt;/italic&gt;0.5&lt;/th&gt;&lt;th align="center" style="border-bottom:solid thin"&gt;F1&lt;/th&gt;&lt;th align="center" style="border-bottom:solid thin"&gt;&lt;italic&gt;mAP&lt;/italic&gt;0.5&lt;/th&gt;&lt;th align="center" style="border-bottom:solid thin"&gt;F1&lt;/th&gt;&lt;th align="center" style="border-bottom:solid thin"&gt;&lt;italic&gt;mAP&lt;/italic&gt;0.5&lt;/th&gt;&lt;th align="center" style="border-bottom:solid thin"&gt;F1&lt;/th&gt;&lt;th align="center" style="border-bottom:solid thin"&gt;&lt;italic&gt;mAP&lt;/italic&gt;0.5&lt;/th&gt;&lt;th align="center" style="border-bottom:solid thin"&gt;F1&lt;/th&gt;&lt;th align="center" style="border-bottom:solid thin"&gt;&lt;italic&gt;mAP&lt;/italic&gt;0.5&lt;/th&gt;&lt;th align="center" style="border-bottom:solid thin"&gt;F1&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;Helmet&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8846&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.874&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8813&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8704&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8878&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8787&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.881&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8702&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8896&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.878&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;Head with helmet&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8898&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.877&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8848&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8741&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid 
thin"&gt;0.8932&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8815&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8859&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8713&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8953&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.88&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;person with helmet&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.9204&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8992&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.9146&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8976&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.9213&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.9048&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.9319&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.9117&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.9226&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.9037&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;Head&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.9068&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8935&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.893&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8805&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8979&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.885&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.9068&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8921&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.9134&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.9003&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;person without helmet&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8767&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8418&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8731&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8433&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8867&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8547&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8749&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8412&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8832&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid 
thin"&gt;0.8584&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;face&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8182&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.797&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8213&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.7943&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.814&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.79&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8094&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.7795&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8244&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8008&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;Average&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8828&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8637&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.878&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8614&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8835&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8658&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8817&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.861&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8881&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8714&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; </ephtml> </p> <p>Table 8 Results of state-of-the-art models on the SHEL5K dataset.</p> <p> <ephtml> &lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th align="left" style="border-top:solid thin;border-bottom:solid thin"&gt;Models&lt;/th&gt;&lt;th align="center" style="border-top:solid thin;border-bottom:solid thin"&gt;Precision&lt;/th&gt;&lt;th align="center" style="border-top:solid thin;border-bottom:solid thin"&gt;Recall&lt;/th&gt;&lt;th align="center" style="border-top:solid thin;border-bottom:solid thin"&gt;&lt;italic&gt;mAP&lt;/italic&gt;0.5&lt;/th&gt;&lt;th align="center" style="border-top:solid thin;border-bottom:solid thin"&gt;F1&lt;/th&gt;&lt;th align="center" style="border-top:solid thin;border-bottom:solid thin"&gt;Training Time&lt;break /&gt;(hours)&lt;/th&gt;&lt;th align="center" style="border-top:solid thin;border-bottom:solid thin"&gt;Testing Time&lt;break /&gt;(s)&lt;/th&gt;&lt;th align="center" style="border-top:solid thin;border-bottom:solid thin"&gt;Parameters&lt;break /&gt;(Million)&lt;/th&gt;&lt;th align="center" style="border-top:solid thin;border-bottom:solid thin"&gt;Layers&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;Faster-RCNN&amp;#160;[&lt;xref ref-type="bibr" rid="bibr14"&gt;14&lt;/xref&gt;]&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.7808&lt;/td&gt;&lt;td align="center" valign="middle" 
style="border-bottom:solid thin"&gt;0.3862&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.3689&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.5167&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;55.6&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.084&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;13.3&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;48&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;YOLOv3-tiny&amp;#160;[&lt;xref ref-type="bibr" rid="bibr10"&gt;10&lt;/xref&gt;]&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.7695&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.4225&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.3779&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.5408&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;5.2&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.006&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;8.7&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;37&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;YOLOv3&amp;#160;[&lt;xref ref-type="bibr" rid="bibr9"&gt;9&lt;/xref&gt;]&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8509&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.4482&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.417&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.5848&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;24.6&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.011&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;61.6&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;222&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;YOLOv3-SPP&amp;#160;[&lt;xref ref-type="bibr" rid="bibr11"&gt;11&lt;/xref&gt;]&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8851&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.5848&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.5572&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.7032&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;24.6&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.012&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;62.6&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;225&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;YOLOv4&amp;#160;[&lt;xref ref-type="bibr" rid="bibr12"&gt;12&lt;/xref&gt;]&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid 
thin"&gt;0.925&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.7798&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.7693&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8449&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;11.2&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.014&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;63.9&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;488&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;YOLOv4&lt;sub&gt;pacsp-x-mish&lt;/sub&gt; [&lt;xref ref-type="bibr" rid="bibr12"&gt;12&lt;/xref&gt;]&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.9195&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8036&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.7915&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8567&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;14.5&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.014&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;63.9&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;488&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;YOLOv5s&amp;#160;[&lt;xref ref-type="bibr" rid="bibr13"&gt;13&lt;/xref&gt;]&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.9205&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.774&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.861&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8397&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.3&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.018&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;7.1&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;224&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;YOLOv5m&amp;#160;[&lt;xref ref-type="bibr" rid="bibr13"&gt;13&lt;/xref&gt;]&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.9251&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.7851&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8687&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8488&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;2.7&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.022&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;21.1&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;308&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;YOLOv5x&amp;#160;[&lt;xref ref-type="bibr" 
rid="bibr13"&gt;13&lt;/xref&gt;]&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.9188&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.817&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8826&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8644&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;6.3&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.032&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;87.2&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;476&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left" valign="middle" style="border-bottom:solid thin"&gt;YOLOR&amp;#160;[&lt;xref ref-type="bibr" rid="bibr16"&gt;16&lt;/xref&gt;]&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.9322&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8066&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8828&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.8637&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;9.8&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;0.012&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;36.9&lt;/td&gt;&lt;td align="center" valign="middle" style="border-bottom:solid thin"&gt;665&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; </ephtml> </p> <hd id="AN0156096791-17">Author Contributions</hd> <p>Software, M.-E.O.; Supervision, M.G.; Writing—original draft, M.-E.O. and M.G.; Writing—review &amp; editing, M.-E.O., M.G., F.A., L.A., T.-H.T., J.-W.H. and P.-Y.C. All authors have read and agreed to the published version of the manuscript.</p> <hd id="AN0156096791-18">Funding</hd> <p>This research was funded by United Arab Emirates University. (grant no. 
12R012).</p> <hd id="AN0156096791-19">Institutional Review Board Statement</hd> <p>Not available.</p> <hd id="AN0156096791-20">Informed Consent Statement</hd> <p>We used a publicly available dataset and do not need to be informed.</p> <hd id="AN0156096791-21">Data Availability Statement</hd> <p>The dataset is publicly available in Mendeley Data and can be found at: https://data.mendeley.com/datasets/9rcv8mm682/draft?a=28c11744-48e7-4810-955b-d76e853beae5 (accessed on 5 January 2022).</p> <hd id="AN0156096791-22">Conflicts of Interest</hd> <p>The authors declare no conflict of interest.</p> <hd id="AN0156096791-23">Acknowledgments</hd> <p>The authors would like to thank everyone who helped manually label the dataset and UAEU for providing the DGX-1 supercomputer.</p> <hd id="AN0156096791-24">Appendix A</hd> <p>Graph: Figure A1 Predictions of the best model trained on the SHEL5k dataset and the ground truth labels of the sample images.</p> <p>Graph: Figure A2 Predictions of the best model trained on the hardhat dataset [[<reflink idref="bib20" id="ref201">20</reflink>]] and the ground truth labels of the sample images.</p> <p>Graph: Figure A3 Predictions of the best model trained on the SHW dataset [[<reflink idref="bib28" id="ref202">28</reflink>]] and the ground truth labels of the sample images.</p> <p>Graph: Figure A4 Predictions of the best model trained on the SHD dataset [[<reflink idref="bib8" id="ref203">8</reflink>]] and the ground truth labels of the sample images.</p> <p>Graph: Figure A5 Predictions of the best model trained on the HHW dataset [[<reflink idref="bib27" id="ref204">27</reflink>]] and the ground truth labels of the sample images.</p> <ref id="AN0156096791-25"> <title> Footnotes </title> <blist> <bibl id="bib1" idref="ref1" type="bt">1</bibl> <bibtext> Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.</bibtext> </blist> </ref> <ref id="AN0156096791-26"> <title> References </title> <blist> <bibtext> Keller J.R. Construction Accident StatisticsAvailable online: https://<ulink href="http://www.2keller.com/library/construction-accident-statistics.cfm(accessed">www.2keller.com/library/construction-accident-statistics.cfm(accessed</ulink> on 5 January 2022)</bibtext> </blist> <blist> <bibl id="bib2" idref="ref5" type="bt">2</bibl> <bibtext> U.S. Bureau of Labor Statistics (BLS) National Census of Fatal Occupational Injuries in 2014September. 2015Available online: https://<ulink href="http://www.bls.gov/news.release/archives/cfoi%5f09172015.pdf(accessed">www.bls.gov/news.release/archives/cfoi%5f09172015.pdf(accessed</ulink> on 5 January 2022)</bibtext> </blist> <blist> <bibl id="bib3" idref="ref2" type="bt">3</bibl> <bibtext> U.S. Bureau of Labor Statistics (BLS) Census of Fatal Occupational Injuries Summary, 2019December. 2020Available online: https://<ulink href="http://www.bls.gov/news.release/cfoi.nr0.html(accessed">www.bls.gov/news.release/cfoi.nr0.html(accessed</ulink> on 5 January 2022)</bibtext> </blist> <blist> <bibl id="bib4" idref="ref6" type="bt">4</bibl> <bibtext> Jeon J.-h. 971 S. Korean Workers Died on the Job in 2018, 7 More than Previous YearMay. 
2019Available online: https://<ulink href="http://www.hani.co.kr/arti/english%5fedition/e%5fnational/892709.html(accessed">www.hani.co.kr/arti/english%5fedition/e%5fnational/892709.html(accessed</ulink> on 5 January 2022)</bibtext> </blist> <blist> <bibl id="bib5" idref="ref7" type="bt">5</bibl> <bibtext> HexArmor The Hard Truth about Safety Helmet Injuries and StatisticsJune. 2019Available online: https://<ulink href="http://www.hexarmor.com/posts/the-hard-truth-about-safety-helmet-injuries-and-statistics(accessed">www.hexarmor.com/posts/the-hard-truth-about-safety-helmet-injuries-and-statistics(accessed</ulink> on 5 January 2022)</bibtext> </blist> <blist> <bibl id="bib6" idref="ref8" type="bt">6</bibl> <bibtext> Konda S., Tiesman H.M., Reichard A.A. Fatal traumatic brain injuries in the construction industry, 2003–2010. Am. J. Ind. Med. 2016; 59: 212-220. 10.1002/ajim.22557</bibtext> </blist> <blist> <bibl id="bib7" idref="ref9" type="bt">7</bibl> <bibtext> Headway the Brain Injury Association Workplace hArd Hat Safety Survey ResultsAvailable online: https://<ulink href="http://www.headway.org.uk/media/8785/workplace-hard-hat-safety-survey-results.pdf(accessed">www.headway.org.uk/media/8785/workplace-hard-hat-safety-survey-results.pdf(accessed</ulink> on 5 January 2022)</bibtext> </blist> <blist> <bibl id="bib8" idref="ref10" type="bt">8</bibl> <bibtext> Larxel Safety Helmet DetectionAvailable online: https://<ulink href="http://www.kaggle.com/andrewmvd/hard-hat-detection(accessed">www.kaggle.com/andrewmvd/hard-hat-detection(accessed</ulink> on 5 January 2022)</bibtext> </blist> <blist> <bibl id="bib9" idref="ref15" type="bt">9</bibl> <bibtext> Redmon J., Farhadi A. YOLOv3: An Incremental Improvement. arXiv. 2018. 1804.02767</bibtext> </blist> <blist> <bibtext> Adarsh P., Rathi P., Kumar M. YOLO v3-Tiny: Object Detection and Recognition using one stage improved model. Proceedings of the 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS). Tamil Nadu, India. 6–7 March 2020. 10.1109/icaccs48705.2020.9074315</bibtext> </blist> <blist> <bibtext> Zhang X., Gao Y., Wang H., Wang Q. Improve YOLOv3 using dilated spatial pyramid module for multi-scale object detection. Int. J. Adv. Robot. Syst. 2020; 17. 10.1177/1729881420936062</bibtext> </blist> <blist> <bibtext> Bochkovskiy A., Wang C.-Y., Liao H.-Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv. 2020. 2004.10934</bibtext> </blist> <blist> <bibtext> Jocher G., Stoken A., Borovec J., Liu C., Hogan A. Ultralytics/yolov5: v4.0-nn.SiLU() activations, Weights &amp; Biases logging, PyTorch Hub integration (v4.0). Zenodo. 2021. 10.5281/zenodo.4418161</bibtext> </blist> <blist> <bibtext> Ren S., He K., Girshick R., Sun J. Faster R -CNN: Towards real-time object detection with region proposal networks. Proceedings of the Advances in Neural Information Processing Systems. Montreal, QC, Canada. 7–12 December 2015: 91-99</bibtext> </blist> <blist> <bibtext> Szegedy C., Vanhoucke V., Ioffe S., Shlens J., Wojna Z. Rethinking the Inception architecture for computer vision. arXiv. 2015. 1512.00567</bibtext> </blist> <blist> <bibtext> Wang C.-Y., Yeh I.-H., Liao H.-Y.M. You Only Learn One Representation: Unified Network for Multiple Tasks. arXiv. 2021. 2105.04206</bibtext> </blist> <blist> <bibtext> Li Y., Wei H., Han Z., Huang J., Wang W. Deep Learning-Based Safety Helmet Detection in Engineering Management Based on Convolutional Neural Networks. Adv. Civ. Eng. 2020; 2020: 9703560. 
18. Wang H., Hu Z., Guo Y., Yang Z., Zhou F., Xu P. A Real-Time Safety Helmet Wearing Detection Approach Based on CSYOLOv3. Appl. Sci. 2020; 10: 6732. 10.3390/app10196732.
19. Wang L., Xie L., Yang P., Deng Q., Du S., Xu L. Hardhat-Wearing Detection Based on a Lightweight Convolutional Neural Network with Multi-Scale Features and a Top-Down Module. Sensors. 2020; 20: 1868. 10.3390/s20071868. PMID: 32230961.
20. Xie L. Hardhat. Harvard Dataverse, 2019. Available online: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/7CBGOS (accessed on 5 January 2022).
21. Li K., Zhao X., Bian J., Tan M. Automatic Safety Helmet Wearing Detection. Proceedings of the 2017 IEEE 7th Annual International Conference on CYBER Technology in Automation, Control, and Intelligent Systems (CYBER), Honolulu, HI, USA, 31 July–4 August 2017. 10.1109/cyber.2017.8446080.
22. Dalal N. INRIA Person Dataset. Available online: http://pascal.inrialpes.fr/data/human/ (accessed on 5 January 2022).
23. Rubaiyat A.H.M., Toma T.T., Kalantari-Khandani M., Rahman S.A., Chen L., Ye Y., Pan C.S. Automatic Detection of Helmet Uses for Construction Safety. Proceedings of the 2016 IEEE/WIC/ACM International Conference on Web Intelligence Workshops (WIW), Omaha, NE, USA, 13–16 October 2016. 10.1109/wiw.2016.045.
24. Kamboj A., Powar N. Safety Helmet Detection in Industrial Environment using Deep Learning. Proceedings of the 9th International Conference on Information Technology Convergence and Services (ITCSE 2020), Zurich, Switzerland, 30–31 May 2020. 10.5121/csit.2020.100518.
25. Geng R., Ma Y., Huang W. An improved helmet detection method for YOLOv3 on an unbalanced dataset. arXiv 2020, arXiv:2011.04214.
26. Long X., Cui W., Zheng Z. Safety Helmet Wearing Detection Based On Deep Learning. Proceedings of the 2019 IEEE 3rd Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), Chengdu, China, 15–17 March 2019. 10.1109/itnec.2019.8729039.
27. Northeastern University–China. Hard Hat Workers Object Detection Dataset. Available online: https://public.roboflow.com/object-detection/hard-hat-workers (accessed on 5 January 2022).
28. Safety-Helmet-Wearing-Dataset. Available online: https://github.com/njvisionpower/Safety-Helmet-Wearing-Dataset (accessed on 5 January 2022).
29. Peng D., Sun Z., Chen Z., Cai Z., Xie L., Jin L. Detecting Heads using Feature Refine Net and Cascaded Multi-scale Architecture. Proceedings of the 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, China, 20–22 August 2018. 10.1109/icpr.2018.8545068.
30. Tzutalin. LabelImg. Git Code, 2015. Available online: https://github.com/tzutalin/labelImg (accessed on 5 January 2022).
31. Lin T.-Y., Maire M., Belongie S., Hays J., Perona P., Ramanan D., Dollár P., Zitnick C.L. Microsoft COCO: Common Objects in Context. arXiv 2014, arXiv:1405.0312.
32. Russakovsky O., Deng J., Su H., Krause J., Satheesh S., Ma S., Huang Z., Karpathy A., Khosla A., Bernstein M. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. 2015; 115: 211–252. 10.1007/s11263-015-0816-y.
33. Padilla R., Passos W.L., Dias T.L.B., Netto S.L., da Silva E.A.B. A Comparative Analysis of Object Detection Metrics with a Companion Open-Source Toolkit. Electronics. 2021; 10: 279. 10.3390/electronics10030279.