Neural Network based Object Detection Design for Crop Vandalism

Sai Siddartha Maram, Tanuj Vishnoi and Sachin Pandey

Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala, Punjab - 147004, India.
smaram [email protected], tvishnoi [email protected], spandey [email protected]

Abstract. Crop vandalism contributes to the global food crisis and to socio-economic decline. With the intensity of the problem reaching its pinnacle, this project aims to deploy a powerful neural network to develop intelligent and sustainable solutions. The neural network based solution is responsible for detecting objects and taking suitable triggering actions. In the present study, three trained cognitive services, TensorFlow, Microsoft Vision and Yolo, are used for object detection in real time. All of them are tested under different stress conditions such as distance and lighting. After a series of experiments under these conditions, Yolo was found to be the most apt choice among the cognitive services considered. The potential of the proposed module is not limited to a single domain; the developed solution can be customized to the requirements of other demanding verticals.

Keywords: Neural Networks, R-Convolutional Neural Networks, Real-Time Object Detection, Yolo, OpenCV, TensorFlow, Crop Vandalism

1 Introduction

For a country that claims to be an agricultural superpower, it is disappointing to note that India loses 15-30% of its crop to crop vandalism. Such figures are not acceptable in a country where consumption needs are high and food security is a top priority, and an effective solution is needed to reverse the trend. Increasing deforestation every year leads to a rise in the involvement of wildlife in human activity, and with this trend it is evident that farmers' ergonomics are being disturbed. Agriculture is the primary source of income in many countries, and it is therefore the need of the hour that Artificial Intelligence and Machine Learning make their way into this vertical and take part in improving annual yield. Neural networks have long been relied upon for effective solutions to real-time problems. Cost cannot be ignored in the developed solution, so the cost-to-profit ratio carries significant weight in the design. The project is keen on developing solutions that involve minimal or no complexity and are capable of adapting to different terrains and conditions. After a series of discussions with farmers in Gujarat, the research team was able to empathize with the farmers and planned appropriate action: an end-to-end solution for the same vertical that streams live farm visuals and detects the elements responsible for crop vandalism. Sristi [1] helps in understanding the intensity of the problem and the need to empathize with the issue.

2 Proposed Work

Humans, at the mere sight of an image, are capable of identifying the objects in it and developing all possible relations between them. The human detection system, when mimicked, is capable of generating new solutions in different domains. Current image recognition software uses classifiers and runs feature sets across the image to detect the objects in it, and this does lead to a good deal of accuracy. Some of the best ways to get started with object detection are the cognitive services Microsoft Vision, TensorFlow [2] and Yolo [3]. The proposed solution uses a Python based wrapper and OpenCV [4] to feed real-time data into the neural network and take intelligent actions from there on.


Fig. 1: Crop affected areas captured during the trek. (a) Crop vandalism affected farm (b) Crop eaten away by monkeys (c) Feces left by wild cattle in agricultural farms (d) Deep footmarks spoiling seed structures

2.1 Discussion on Cognitive Support Based on Stress Factors

By running the same set of images through the cognitive systems Microsoft Vision, TensorFlow and Yolo, a comparative analysis is carried out and certain conclusions are drawn from it. The factors contributing to the selection primarily include speed and accuracy under various stress factors, namely distance, light, network connectivity and cost. These stress conditions were chosen after a series of observations and discussions with farmers in areas where the problem is dominant.

Table 1: Performance comparison of object detection tools at moderate distance [5]

Index      | Original Image | TensorFlow    | Vision   | Yolo
Images     | (detection images shown in the paper)
Prediction | -              | elephant: 90% | cow: 95% | elephant: 75%, cow: 64%
Inference  | TensorFlow predicts an elephant with astonishing confidence despite there being no elephant in the image.


Table 2: Performance comparison of object detection tools at a far distance [6]

Index      | Original Image | TensorFlow                | Vision          | Yolo
Images     | (detection images shown in the paper)
Prediction | -              | elephant: 52%, horse: 59% | cattle/cow: 89% | horse: 25%, cow: 36%, 44%, 44%, 35%
Inference  | Irrespective of distance, Yolo beats TensorFlow in both accuracy and consistency.

Table 3: Performance comparison of object detection tools at optimal light [7]

Index      | Original Image | TensorFlow | Vision   | Yolo
Images     | (detection images shown in the paper)
Prediction | -              | cow: 91%   | cow: 78% | cow: 93%
Inference  | Under broad daylight all the cognitive services perform well.

Table 4: Performance comparison of object detection tools at dim light [8]

Index      | Original Image | TensorFlow   | Vision           | Yolo
Images     | (detection images shown in the paper)
Prediction | -              | Not Detected | cow/buffalo: 89% | sheep: 27%, cow: 47%
Inference  | With most of the attacks expected to happen at night, Vision and Yolo seem the most apt choices.

2.2 Selection of Most Suitable Cognitive Service

With the solution aimed at farmers of different income groups, it is important to keep it cost effective. Cognitive support is the backbone of this system, so the chosen service must come at minimal cost and satisfy the trade-off between cost and accuracy. Despite providing extremely high accuracy consistently, the cognitive support offered by Microsoft comes at a price, which makes it unsuitable for the module, while TensorFlow and Yolo are free to use. Apart from that, Microsoft's cognitive support requires a continuously active internet connection, which again comes at a cost and further adds to the amount required for buying and maintaining the product. For these reasons the module lets go of Microsoft cognitive support despite its amazing accuracy and consistent predictions. To perform a comparative analysis between TensorFlow and Yolo, we run these free cognitive services on the same video [10], trimmed to a duration of 20 seconds. It is important to note that the video strictly contains buffaloes/cows, so any other predicted object can be treated as a wrong prediction and is scored as 0% accurate on cow/buffalo. Fig 2(a) plots prediction number against accuracy for the first 100 predictions using TensorFlow; there are several instances where the accuracy drops sharply to zero, indicating that either no object has been detected or something other than a cow/buffalo has been detected. The different kinds of objects predicted despite not existing in the trimmed video are shown in Fig 2(c).

Fig. 2: Comparative analysis of TensorFlow and Yolo. (a) Accuracy chart on buffaloes/cows (TensorFlow) (b) Accuracy chart on buffaloes/cows (Yolo) (c) Categories detected by TensorFlow (d) Categories detected by Yolo

For the Yolo cognitive service, Fig 2(b) plots prediction number against accuracy for the first 100 predictions; the number of sharp drops is smaller than for TensorFlow and the detection remains consistent over a long period of time, even though the accuracy is slightly on the lower side. Fig 2(d) indicates all the different kinds of objects predicted despite their absence from the video. Comparing Fig 2(c) and Fig 2(d), TensorFlow predicts a larger number of objects that do not exist in the video. It is also evident that, in the trade-off between consistency and accuracy, Yolo performs better by providing the required consistency over a longer duration. Applying a convolutional neural network over a predicted set of boxes, instead of the whole image, wherever the confidence of finding an object is greater than a set threshold makes Yolo superior, and the system considers it the most apt cognitive service to fit in the module.
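To make the scoring concrete, the short sketch below shows one way the per-prediction accuracies and the spurious-category counts behind Fig 2 could be computed. It assumes the darkflow wrapper (which provides TFNet, used in Section 2.3); the configuration paths, the clip filename and the detection threshold are illustrative placeholders rather than values fixed by the paper.

```python
# Sketch of the Fig. 2 scoring: labels other than cow/buffalo count as 0% and
# are tallied as spurious categories. Paths and filenames are placeholders.
from collections import Counter

import cv2
from darkflow.net.build import TFNet

TARGET_LABELS = {"cow", "buffalo"}          # only these count as correct
options = {"model": "cfg/yolo.cfg",
           "load": "bin/yolo.weights",
           "threshold": 0.1}
tfnet = TFNet(options)

accuracies, spurious = [], Counter()
cap = cv2.VideoCapture("trimmed_clip.mp4")  # the 20 s test clip (placeholder)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    detections = tfnet.return_predict(frame)
    if not detections:
        accuracies.append(0.0)              # nothing detected in this frame -> 0%
    for det in detections:
        if det["label"] in TARGET_LABELS:
            accuracies.append(det["confidence"] * 100)
        else:                               # wrong object detected -> 0%
            accuracies.append(0.0)
            spurious[det["label"]] += 1
cap.release()

print("first 100 prediction accuracies:", accuracies[:100])
print("categories predicted despite being absent:", spurious)
```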

2.3 Algorithmic Analysis in Python Wrapper

Algorithm
Step 1: Load the required yolo.cfg and yolo.weights based on processing speed.
Step 2: Develop the TensorFlow graph and store it locally using TFNet.
Step 3: Capture video using OpenCV and break it into frames.
Step 4: While a frame exists, predict and perform suitable triggering actions.
Step 5: Once done with all frames, close the video capture.

Fig. 3: Algorithmic Design
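A minimal Python sketch of the steps in Fig. 3 is given below, assuming the darkflow wrapper that provides TFNet. The configuration paths, the label set and the trigger routine are placeholders, not values prescribed by the paper; the actual triggering actions are handled by the hardware described in Section 3.

```python
# Sketch of the algorithm in Fig. 3 using darkflow's TFNet and OpenCV.
import cv2
from darkflow.net.build import TFNet

# Steps 1-2: load yolo.cfg / yolo.weights and build the TensorFlow graph
options = {"model": "cfg/yolo.cfg", "load": "bin/yolo.weights", "threshold": 0.4}
tfnet = TFNet(options)

MENACE_LABELS = {"cow", "elephant", "horse", "sheep", "person"}  # example set

def trigger_action(detections):
    """Placeholder for the triggering actions of Section 3.1 (serial/SMS)."""
    print("menace detected:", [d["label"] for d in detections])

# Step 3: capture video with OpenCV and break it into frames
cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()          # Step 4: while a frame exists
    if not ok:
        break
    detections = [d for d in tfnet.return_predict(frame)
                  if d["label"] in MENACE_LABELS]
    if detections:
        trigger_action(detections)  # Step 4: perform suitable triggering action
cap.release()                       # Step 5: close the video capture
```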

3 Device Structure

The proposed device is, in simple terms, a real-time webcam that can take the form of any camera module. To keep the solution simple, a regular webcam or Pi camera module is preferred. The front view of the device is shown in Fig 4, which gives a clear indication of how easily the device can be moved and pinned at required positions and hot zones that are more prone to attack. If the situation to be addressed is more severe, a continuously moving robo-car, or more specifically a line follower capable of moving along a specifically designed path, can be deployed to monitor and keep watch. This model is shown in Fig 5; it is a mere extension of the initial design with a slight rise in price, which we expect the government and other corporates to take up as a social responsibility, something that in turn affects the work pattern of employees in a positive manner [9].

Fig. 4: Basic vector model of proposed device

The wheels shown in Fig 5 are fully customizable and can be terrain specific. Electronic components and other elements that show negative affinity towards water can be placed inside the box to prevent any contact with external weather conditions that could lead to damage.


Fig. 5: Proposed solution vector. (a) Top View (b) Side View (c) Front View

3.1 Electronic Components

The code is fed into a Raspberry Pi 3, which in turn is connected to a camera module or a suitable webcam. The Raspberry Pi 3 is further connected to an Arduino Uno, which is responsible for performing triggering actions such as imitating a predator's voice, sprinkling water at high pressure, or any other suitable action based on the wildlife distribution in the specific geographic area. Triggering actions can also be simple notifications to farmers using the WiFi module or, assuming smartphone penetration is low, a GSM module, which on detection of menace-causing elements sends an SMS to the farm owner to come out and take the required action. Fig 6 shows a simple circuit with the WiFi module, but it can freely be replaced with a GSM module or any suitable component to offer a customized solution.

Fig. 6: Arduino UNO connected with triggering devices (LED and WiFi module)
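As an illustration of the Raspberry Pi to Arduino hand-off, the sketch below uses the pyserial package to raise a trigger over USB serial when a detection is made. The port name and the one-byte protocol are assumptions made for this sketch, not details specified in the paper.

```python
# Illustrative Pi-side trigger: write one byte to the Arduino over serial.
import serial

arduino = serial.Serial("/dev/ttyACM0", 9600, timeout=1)  # assumed port name

def notify_arduino(label):
    # Writing a byte changes the serial monitor, which the Arduino loop of
    # Section 3.2 interprets as a request to trigger an action (predator
    # sound, water sprinkler, LED, or an SMS via the GSM module).
    arduino.write(b"1")
    print("trigger sent for:", label)

notify_arduino("cow")  # example usage after a detection
```

On the Arduino side, the change on the serial monitor is picked up by the loop described in Section 3.2, which then drives the connected triggering devices.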

3.2 Algorithmic Analysis in Arduino

Step 1: Keep reading the serial monitor.
Step 2: If there is a change on the serial monitor, go to Step 3; otherwise go to Step 1.
Step 3: Perform a suitable action based on the change triggered.

4 Results and Discussion

After taking into consideration the various factors discussed and first-hand opinions from potential users and industry experts, the final system converged into two versions, discussed below.
1. A stationary system (shown in Fig 7(a)) which, deployed in pairs, is capable of covering a wide distance and achieving excellent results, along with performing suitable triggering actions, as shown in Fig 7(b).
2. A programmed, swift, self-maneuverable rover capable of moving around, detecting elements and performing suitable triggering actions, as shown in Fig 7(c).

Fig. 7: Proposed prototype. (a) Stationary model which can be fixed in place (b) Pair system to monitor (c) Maneuverable model

With the intention of developing a customized solution for different verticals, it has to be taken into account that nearly 80 different kinds of objects are detected in real time with high accuracy. Based on terrain and demographic requirements, the cognitive service can be trained on more specific objects. For instance, if the device is to be used in counter-terrorism activities, the model could be trained on different kinds of guns, which would give soldiers prior information so that plans can be prepared accordingly. The fundamental principle the project runs on is that the camera acts as an eye and the neural network acts as the brain, giving it almost human-level precision in detecting objects and taking intelligent decisions based on them. To better understand the power of this detection model, consider going through https://goo.gl/wKunWE. The project has successfully met its goal of developing a smart and cost-effective solution for crop vandalism, along with applications in a number of other domains.

5 Conclusion

Addressing the issue of crop vandalism is at this moment an important task to be taken up by governments across the globe. With the project fabricated and tested on various terrains and under stress environments, the module displayed its potential, and governments can engage in mass production of the product to assist farmers in crop maintenance and protection from crop vandalism. For further enhancement of the product in terms of speed and accuracy, it should be trained specifically on the objects a given domain demands. When tested for human detection only, the project was able to achieve an outstanding speed of 5-6 frames per second despite low computational power. Apart from agriculture, the product finds broad application in verticals like defence, to monitor infiltrations where conditions do not allow manned patrolling, and in the medical sector, to assist specially gifted people and empower them using cognitive intelligence. Over time, the module looks to venture into developing assistants for security forces and smart glasses to assist the specially gifted.


References

1. Sristi. https://www.ss.sristi.org/single-post/2017/06/01/nilgai. Accessed on: 16 Feb 2018.
2. Martín Abadi et al. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
3. Joseph Redmon et al. You only look once: Unified, real-time object detection. CoRR, abs/1506.02640, 2015.
4. Ivan Culjak et al. A brief introduction to OpenCV. In MIPRO 2012, Proceedings of the 35th International Convention, pages 1725-1730. IEEE, 2012.
5. Flickr. https://www.flickr.com/photos/chadica/3225724720/in/photolist-9hutrh/. Accessed on: 18 Feb 2018.
6. Flickr. https://www.flickr.com/photos/chadica/3225453234/in/photolist-9uhrftg. Accessed on: 18 Feb 2018.
7. YouTube. https://www.youtube.com/watch?v=-wr6excs. Accessed on: 18 Feb 2018.
8. Wild4 African Photographic Safaris. https://www.wild4photographicsafaris.com/. Accessed on: 18 Feb 2018.
9. Zhu Liya and Yong Shaohong. The effect of corporate social responsibility on employees. In 6th International Conference on Information Management and Industrial Engineering, volume 1, pages 268-271. IEEE, 2013.