Deep Learning Implementation of Autonomous Driving using Ensemble-M in Simulated Environment

Making autonomous driving a safe, feasible and better alternative is one of the core problems the world faces today. The expanding horizon of AI and deep-learning applications has changed the perspective of the human mind. Tasks initially thought to be all but impossible are achievable today, and in a feasibly efficient way. Computer-vision tasks powered by highly tuned CNNs are outperforming humans in many fields. Early implementations of autonomous vehicles were achieved merely with raw image processing and hard-coded rule-based logic systems, with machine/deep learning used only as a secondary objective handler. With the autonomous driving method proposed by NVIDIA, the use of CNNs became more adequate, adaptable and applicable. In this paper, we propose an ensemble implementation of CNN-based regression models for autonomous driving. We take a simulator-generated driving-view image dataset along with a mapped file of steering angles in radians. After applying image pre-processing and augmentation, we use two CNN models along with their ensemble and compare their performance so as to minimize the risks of unsafe driving. We compare the NVIDIA-proposed CNN, MobileNet-V2 as a regression model, and Ensemble-M with respect to performance, MSE scores and compute time. In the result analysis, the MobileNet-V2 model performs better on densely featured roads and the NVIDIA model performs better on sparsely featured roads, whereas Ensemble-M normalizes the performance of both models and efficiently attains the least MSE score (0.0201), at the cost of the highest computation time.


Introduction
Deep learning is a sub-category of Artificial Intelligence which enables machines to achieve intelligence mathematically and take decisions. Self-driving is one of the classic applications of the AI superfield, where a small inconsistency in building an efficient solution may lead to severe loss of capital and even life. The self-driving application is not limited to transportation on roads to carry passengers from one node to another; it also has primary potential in the world shipping market, where it is implemented on goods-carrier trucks, rovers explicitly designed for particular information-extraction projects, automated micro-delivery units [26], IoT devices [10] and video-game bots. Applications of deep learning, initially thought to be constant in number, are now explored with different and innovative approaches to attain almost any task [31]. The key requirements for such learning are extensive computational power and primary storage. However, the problem of space and computational utilization has now been eased by the introduction of cloud computing, GPUs (Graphics Processing Units) and even TPUs (Tensor Processing Units) [32]. A vehicle's autonomy has always been a core problem in the sector. Experimental implementation of such a project in a real-world scenario is not easy: legal issues, various test cases and situations have to be monitored closely, which may or may not end in a fatal accident causing both life and capital hazard.
Therefore, for the implementation of such a project, a simulated environment is a fair and dominant alternative.
Due to the boom in the video-game industry, R&D in the simulator sector has increased [2].
There are several environment engines available that can either be utilized directly or be transformed and utilized as per our objective. CARLA, Unity, TORCS (The Open Racing Car Simulator), AirSim and Udacity (based on Unity) are a few of the open-source environments/simulators which can be used to test and implement autonomous vehicles. The scope of this paper is to implement and test the results of various autonomous-vehicle models using an open-source simulator.
Earlier attempts to implement autonomous driving used hard-coded logic, with deep/machine learning as a secondary objective handler. Moreover, reinforcement learning as an option for autonomy uses too many computational resources, which is simply not feasible [13]. But in 2017, the NVIDIA paper changed the conventional approach by using a single CNN regression model to attain self-driving [1]. The advantage of a CNN-based model is that high-level features are learned automatically, without explicit instructions to the system. The only limitations of such a deep-learning system are the size of the dataset used and the efficient utilization of computational resources.
The remainder of this paper is organized as follows. Section II discusses various researchers' views on implementing self-driving via various techniques. The dataset used, the model formulation and the metrics used are discussed in Section III. The evaluation of results, in terms of training and testing with regression metrics for the models used, is discussed in Section IV. Finally, this work is concluded in Section V along with its future scope.

Related Works
In [1], the authors proposed a study in which they empirically demonstrated the ability of CNNs to learn the entire driving task without manual decomposition and without marked lanes, semantic abstraction, path planning and control (i.e., an end-to-end steering control system). The study successfully demonstrated the CNN's ability to operate a car in diverse weather conditions including sunny, cloudy and rainy. In [2], the author surveyed simulators, including simulators for self-driving, implemented the FODS (First Order Driving Simulator), and tested its rendering performance against simulators including Udacity, TORCS, Driving Interactions and OpenAI Gym (CarRacing-v0). The study showed FODS outperforming all the 3-D and 2-D simulators based on the mean and standard deviation of rendering steps (x1000) per minute, achieving a standard deviation of 0.0372 and a mean of 3.48. In [3], the authors used 7.25 hours of 20 Hz driving video, along with a mapped dataset of steering angle, speed and GPS data, to implement a CNN-based lane-keeping system for self-driving cars that automatically produces proper steering angles from captured frames.
In [4], the authors implemented JacintoNet, a 13-layer deep CNN, on a low-power TDA2x SoC IoT device to achieve end-to-end steering-angle prediction for self-driving; they also compared the trained network's performance against a VGG-16 network. In [5], the authors explained in detail the implementation, working and application of a peer-to-peer client-server architecture and its use in real-world scenarios. In [6], the authors used two models, an Attention Branch Network and a CNN-based regression model for steering and throttle, for visually explaining end-to-end learning for self-driving. They also combined the attention map with an FCLN that generates captions for specific regions. In [7], the authors used the TORCS simulator to implement a deep reinforcement learning framework in TensorFlow to control the car and read sensor values. They also used the additional Python library SCR (Simulated Car Racing) to access driving-parameter controls; the system took 1.5 million states to converge to the desired output. In [8], the authors implemented 3 different CNN end-to-end steering control models and compared their performance with the NVIDIA model [1] with respect to convergence of loss and lane-keep time. The study found Model 3 more robust than NVIDIA's proposed CNN architecture, with an average lane-keeping time of 617.3 seconds versus NVIDIA's 453.7 seconds. In [9], the authors upgraded a pre-existing Unity-based self-driving simulator, CAIAS, using Blender and SolidWorks; added features like an RGB camera and a lidar sensor; and simulated a new environment with weather conditions including rain, snow, fog and autumn, and lanes including muddy, forest and regular. They implemented NVIDIA's CNN model [1] and scored behavior learning on parameters like obstacle avoidance and handling of bumpy roads.
In [10], the authors experimented with a combination of a VGG-19 classification model and NVIDIA's end-to-end steering control model [1] to detect traffic signals while simultaneously predicting the steering angle, attaining an accuracy of 86% for signals. In [11], the authors developed a low-cost IoT platform, DeepPicar, and trained it using NVIDIA's end-to-end steering control regression model [1]. They compared processing time with respect to the number of cores, and compared the performance of various microcomputer platforms on parameters like read/write speeds, average execution time and cost. In [12], the authors used MPC and NMPC controllers backed by an RNN and a feed-forward model to achieve the self-driving task, and showcased the use of two computational modules to efficiently calculate the steering reference trajectory and RMS values, respectively. In [13], the authors showcased a CNN-based approach to classify essential traffic signals and assigned them weights to be used in a reinforcement-learning environment with the Q-learning algorithm. In [14], the authors studied various parameters of reinforcement learning, presented empirical results demonstrating the advantages and disadvantages of transferring SFs between MDPs that differ only in reward function, and concluded that transferring an SF representation between tasks gives a significant boost in learning speed.
In [15], the authors implemented autonomous driving using NVIDIA's end-to-end steering control model [1] in the Udacity CarND simulator; the study achieved an accuracy of 78.5% on the validation set. In [16], the authors experimented with end-to-end autonomous driving in the TORCS racing simulator with data from 6 different tracks, used a pre-trained CNN model to predict the desired steering-wheel angle from a continuous video stream, and compared performance between human and end-to-end controller based on mean squared error. In [17], the authors developed a deep-learning network built from a CNN, an LSTM and fully connected layers to implement an end-to-end steering control system for cooperative self-driving; they also argued for using V2V communication instead of GPS sensor data to make the system more resistant to location-based errors. In [18], the authors used the WPI and self-generated datasets to train a heuristic ROI-detection algorithm with Rttld, implemented with RCNN, YOLOv2, YOLOv2-tiny, YOLOv3, YOLOv3-tiny, SSD and Faster RCNN models for traffic-light detection. In [19], the author used Deepest LSTM-TinyPilotNet to implement an end-to-end steering control system for an autonomous vehicle, calculated the error with respect to the center of the road, and achieved a root-mean-square value of 0.0912 radians. In [20], the authors used a DRIVE PX2 computer to implement a CNN facilitating an end-to-end steering control system for a self-driving car, compared results based on steering-wheel-angle deviation before and after training, and reported the number of lane departures in a real-time scenario. In [21], the authors experimented with 3D structure-based regression methods instead of end-to-end learning for steering prediction, in order to predict failure cases for pose-regression techniques. They used PoseNet, MapNet and DenseVLAD for pose regression and concluded that MapNet outperforms DenseVLAD on the smaller LOOP dataset but performs poorly on the larger full scene. In [23], the authors used a CNN combined with sampling-based predictive control to predict a cost-map projection in the image plane and a top-down view of the cost map, and compared the results with GPS sensor output.

Method And Materials
The dataset and the methodology used are explained in the subsequent sections.

Simulator
We used the Unity-based Udacity open-source simulator [2] as the base environment for implementing our self-driving car. This simulator makes the task easier due to its client-server architecture [5] and an easy-to-use API to collect and transmit data in and out of the simulator. The simulator has two modes: training and autonomous. In training mode, we start a recording session that triggers the collection of frames from 3 different cameras (left, center, right); based on time, it also records and maps the output frames with parameters like speed, throttle, steering angle and brake. The autonomous mode uses the client-server architecture to communicate and transmit data via an API pipeline; once trained, the model can consume the real-time data stream from this mode and transmit back its results to update the current parameters and state of the autonomous vehicle. Figure 1 shows the simulator used for implementing self-driving.

Dataset
The dataset used for this work has been collected from the Unity-based Udacity self-driving simulator [2]. Our primary focus is to extract only the essential features for our deep-learning network. Images from the left, center and right cameras, along with their mapping to the steering angle, are enough to implement our regression model, as other parameters can be adjusted using logic programming and basic physics principles. One of the core problems with driving-data collection is that it is never balanced. From the simulator we collected 12000 frames, including the images and mappings from 3 different cameras. Due to the circular driving path of the simulator, the distribution we obtain is either left- or right-skewed rather than normal. A normal distribution can be equated as

f(x) = (1 / (σ√(2π))) e^(−(x − μ)² / (2σ²))    (1)

Moreover, because the steering angle is inactive most of the time, the data is further imbalanced and concentrated near the center, as seen in Fig. 2.
The solution to this problem can either be to improve the data-collection method in the simulator by developing environments that provide equally diverted routes and lanes, or to augment (horizontally flip) all the collected images, negate the corresponding steering angles and add them to our dataset file, which gives us a more normally distributed curve. However, to limit the outlier at the center, i.e., 0 radians, caused by the frequent inactivity of the steering wheel, we created a special function based on a count or bin value of the steering-wheel rotation to remove the outlier images concentrated at the center. Figure 3 shows the corrected, normally distributed dataset.
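The binning-based balancing step described above can be sketched as follows. This is a minimal illustration, not the paper's exact function: the bin count and per-bin cap (`num_bins`, `max_per_bin`) are assumed values chosen for the example.

```python
import numpy as np

def balance_dataset(angles, max_per_bin=400, num_bins=25, seed=0):
    """Cap the number of samples per steering-angle bin.

    Returns sorted indices of the frames to keep, so the same mask can be
    applied to the image paths. Angles are assumed to lie in [-1, 1].
    """
    rng = np.random.default_rng(seed)
    angles = np.asarray(angles, dtype=float)
    bins = np.linspace(-1.0, 1.0, num_bins + 1)
    # Map each angle to a bin index in [0, num_bins - 1].
    bin_ids = np.clip(np.digitize(angles, bins) - 1, 0, num_bins - 1)
    keep = []
    for b in range(num_bins):
        idx = np.flatnonzero(bin_ids == b)
        if len(idx) > max_per_bin:
            # Randomly subsample the over-represented bin (e.g. 0 radians).
            idx = rng.choice(idx, size=max_per_bin, replace=False)
        keep.extend(idx.tolist())
    return np.sort(np.array(keep, dtype=int))
```

Applied to the raw distribution of Fig. 2, this caps the large spike near 0 radians while leaving the sparsely populated bins untouched.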

Augmentation and Pre-processing
To increase the variety of input, the images were cropped to focus on the essential features and were randomly augmented with 50% probability, including translation, zooming and HSV adjustment. Images were converted from RGB into YUV format, which separates luminance from chrominance and has become the standard in implementations of autonomous driving. Figure 4 shows an image before and after pre-processing.
Images were also blurred using a Gaussian blur, whose kernel is given by Eq. (2):

G(x, y) = (1 / (2πσ²)) e^(−(x² + y²) / (2σ²))    (2)
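The pre-processing steps above can be sketched in numpy. This is an illustrative sketch, not the paper's exact pipeline: the kernel size, sigma and the BT.601 YUV coefficients are assumptions (the paper does not state which YUV variant it used).

```python
import numpy as np

def gaussian_kernel(size=3, sigma=1.0):
    """2-D Gaussian kernel from Eq. (2), normalized so it sums to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)
    return k / k.sum()

def rgb_to_yuv(img):
    """Per-pixel RGB -> YUV conversion (BT.601 coefficients), img in [0, 1]."""
    m = np.array([[ 0.299,    0.587,    0.114],
                  [-0.14713, -0.28886,  0.436],
                  [ 0.615,   -0.51499, -0.10001]])
    return img @ m.T

def flip_with_angle(img, angle):
    """Horizontal flip for augmentation; the steering angle must be negated."""
    return img[:, ::-1, :], -angle
```

Normalizing the kernel after evaluating Eq. (2) keeps the blurred image's brightness unchanged regardless of the discretization.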

Model Formulation
The end-to-end steering control mechanism works in the manner depicted in Fig. 5. The frames captured from the respective cameras (left, center and right), along with their mapped steering angles in radians lying between −1 and 1, are horizontally stacked into normalized form and fed into the deep-learning network according to its pre-processing requirements. The CNN computes the result, backpropagates the error based on the mean-squared-error metric, and updates the weights and biases of the network for optimal results and better applicability.
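The inference side of this loop can be sketched as below. `model_predict` and `preprocess` are hypothetical placeholders for the trained network and the pipeline from the previous section; the clipping to [−1, 1] reflects the simulator's steering range stated above.

```python
import numpy as np

def drive_step(frame, model_predict, preprocess):
    """One step of the end-to-end control loop of Fig. 5: pre-process the
    incoming camera frame, predict a steering angle, and clip it to the
    simulator's valid range of [-1, 1] radians."""
    x = preprocess(frame)
    angle = float(model_predict(x[None, ...])[0])  # batch of one frame
    return float(np.clip(angle, -1.0, 1.0))
```

In the autonomous mode described earlier, this step would run once per frame received over the simulator's API pipeline, with the clipped angle transmitted back.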
We use two different network architectures to implement the end-to-end steering control system for an autonomous driving vehicle, including NVIDIA's [1], as shown in Fig. 6. The first 5 layers are convolutional layers, followed by a dropout layer and then a flatten layer.
The network ends with 3 dense layers and an output layer with ELU activation, equated as (3):

ELU(x) = x,            x > 0
ELU(x) = α(eˣ − 1),    x ≤ 0    (3)

For α = 1, the ELU output is bounded below by −1, which suits the steering range of the application. The network consists of a total of 252,219 trainable parameters. The applicability and adaptability of the MobileNet-V2 architecture is due to the bottleneck structure shown in Fig. 7. Thanks to its bottlenecks, which reduce the number of feature channels, the network still works efficiently despite the reduced channel count. The architecture shown in Fig. 8 has a depth of 88 layers and around 3,538,984 trainable parameters. This classification network was converted into a regression-based model, as shown in Fig. 7, adapting it for the self-driving task by adding 4 additional dense layers at the end with ELU activation, trained with the Adagrad optimizer due to its lack of momentum and its larger updates for infrequently changing features.
Training was conducted in batches of 3000, each with the various augmentations and pre-processing mentioned earlier. The dataset consists of around 24000 normally distributed, balanced frames. The results and trainability of the different networks are discussed in the next section. Collecting the dataset from a simulated environment and deploying the model's results on actual terrain faces many ambiguous difficulties. With differences in obstacle density, including crowd variations, signal variations, and lane variation and narrowing, one particular model cannot be universal. Therefore, an ensemble of models with deep and shallow layers is necessary to normalize the performance output in city and out-of-city environments. Deep layers observe more features and train better in densely obstructed environments, whereas a shallow-layered model observes fewer but more important, heavily weighted features essential for sparsely obstructed environments. We use the mean-squared-error loss to train our models, equated as (4):

MSE = (1/N) Σᵢ (yᵢ − ŷᵢ)²    (4)

It is used to optimize the parameters of our models; we intend to decrease the loss with successive epochs. We used the Adam and Adagrad optimizers, respectively, with a learning rate of 0.001 for training our models.
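The loss of Eq. (4) over a batch of predicted steering angles can be sketched as:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean-squared-error loss from Eq. (4): the average of the squared
    differences between actual and predicted steering angles (radians)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))
```

Because the errors are squared, large steering deviations, which are exactly the unsafe ones, dominate the loss, which is desirable for this application.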

Results And Discussion
In the result analysis, the steering angles produced by each model are compared against the actual results. The NVIDIA, MobileNet-V2 and Ensemble-M results are analyzed based on mean-squared-error metrics and average computation time per frame. The results were then contrasted to determine the best model. Although the models' MSE scores are fairly good, we recommend using diverse but balanced data from other simulators or real-time driving views to enhance performance. Due to the simulator's limited ability to generate varied data, the models were trained on only 24000 augmented frames collected from a looped path. The performance and training of each model are discussed below.

Nvidia Model
The NVIDIA model, due to its limited depth and high tuning, performs exceptionally well for most of the end-to-end steering-angle predictions in the simulator. But for some features the predictions start deviating from the actual result. Figure 8 shows the training and validation loss (MSE) during the training phase. The convolutional filter visualization of the NVIDIA model can be seen in Fig. 9.

MobileNet-V2 Model
A pretrained classification model converted into a regression model by adding a few additional supporting dense layers shows a decent convergence rate. MobileNet-V2 performs decently well for the self-driving car application. Figure 10 shows the convergence of the loss (MSE) over the training and validation datasets during the training phase, and Fig. 11 shows the MobileNet-V2 model.

Ensemble-M
To make an end-to-end steering control system universally adaptable: a deeper model with more trainable parameters performs better in terrain containing more features, whereas a shallower model with fewer trainable parameters performs better in an environment with less frequent changes and fewer features. The Ensemble-M approach shown in Fig. 11 normalizes the deviation (error) to a middle ground by averaging the predicted regression results, making the final result converge closer to the actual result. Therefore, the ensemble can be used effectively to solve this problem, equated as (5):

Ŷ = (1/n) Σᵢ₌₁ⁿ Yᵢ    (5)

where Yᵢ is the output of the respective model and n is the total number of models used.
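The averaging of Eq. (5) over the member models' per-frame predictions can be sketched as:

```python
import numpy as np

def ensemble_m(predictions):
    """Ensemble-M from Eq. (5): average the steering-angle predictions of
    the member models, frame by frame. `predictions` is a list of per-model
    arrays, each of shape (n_frames,)."""
    stacked = np.stack([np.asarray(p, dtype=float) for p in predictions])
    return stacked.mean(axis=0)
```

For example, on a frame where one model over-steers and the other under-steers relative to the ground truth, the mean lands between the two, which is the normalizing effect described above.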
Table 1 shows the MSE score, average computation time per frame and depth of each model. Ensemble-M performs better than either individual model and attained an MSE score of 0.0201. However, the computation time per frame for the NVIDIA model is the lowest, with an average of 0.5087 seconds, verifying the performance-time trade-off in self-driving. Figure 12 shows the visualization of our models' performance over the testing dataset against the actual regression results. For some parts of the predicted regression, the NVIDIA model's results lie near the actual result, and for other parts, the MobileNet-V2 results lie near the actual results. A shallow model trains faster and is adaptable for most of the results; still, at some complex points with a greater number of features, its predictions do not meet the requirements. Our deeper model solves this problem by feeding the input through deeper layers for prediction. Ensemble techniques thus play a significant role in bounding the predicted output: Ensemble-M normalizes the deviation, making autonomous driving safer.

Conclusion And Future Scope
This study covered the Ensemble-M approach with only 2 models. Future work involves verifying Ensemble-M with more than two models to achieve the lowest MSE score, making end-to-end steering control systems more feasible than they are today. There is a plethora of techniques for implementing full-fledged autonomous driving; deep reinforcement learning, logic programming and deep learning (road segmentation, end-to-end) are a few of the methods to achieve the task. Irrespective of the technique used, there will always be a time-space trade-off. A deeper network tuned with the correct parameters may lead to a better MSE score, but it may also lead to a more time-consuming process. On the other hand, a shallow network may not have as good an MSE score but performs comparably enough for the application. Since autonomous driving involves the risk of life, this trade-off has to be balanced on some middle ground. Since this project was implemented in a simulated environment, there is always the possibility of missing features in the simulator, which may have affected the performance and decision-making of the model. Therefore, the development of a versatile simulator with custom feature adjustment is always a possibility. Moreover, the implementation of such a system in a real-life environment will also require enhanced security for the data transmission facilitating the client-server architecture. Work on system versatility is also required, since the system's adaptability changes from country to country based on local traffic rules and variations.

Declarations
Approval: No agency or research centre is involved in this research.

Figure 4: Image before and after pre-processing, showing the conversion from the RGB color scheme to the YUV color scheme.
Figure 5: End-to-End Steering Control Mechanism.
Figure 6: Network architecture of the NVIDIA end-to-end steering control CNN model [1].

Figures

Figure 2: Raw dataset distribution.
Figure 3: Balanced dataset distribution.