omputer Vision Guided Deep Learning Network & Machine Learning Techniques to build Fully-Functional Autonomous Vehicles.
“If you recognize that self-driving cars are going to prevent accidents, AI will help to reduce one of the leading causes of death in the world.” — Mark Zuckerberg
If ride-on-demand services such as Uber & Ola have made a revolution in the idea of conveyance, self-driving vehicles are going to be the next renaissance shaking up the whole transportation industry. This new idea is on its way to become a multi trillion-dollar business — bigger than Amazon and Walmart combined. According to the World Economic Forum, this big leap in the auto industry will deliver $3.1 trillion annually by reducing number of crashes, need for emergency services, saving man-hours, cost of car ownership & also indirect savings from shorter commutes and less carbon emissions.
On top of that, there are endless design possibilities, once you eliminate the need for a steering wheel and a driver. Cars could be compact and egg-shaped or box-like mobile homes that can offer much more than mobility. Drastically different car design means drastically different roadways. The road itself could suggest a faster, safer route and smart devices — embedded in vehicles and infrastructure — would communicate on the fly to accommodate traffic.
Notable contestants in the self-driving car race are:
- Ford, GM etc.
Some versions of Tesla’s Autopilot and Intel’s Mobileye uses the highly credible visual, camera-based approach to self-driving tech, as opposed to laser-radar (LIDAR) technology.
- can work in environments, LIDAR can’t. Eg: bad weather
- is much cheaper than LIDAR
- less obtrusive to vehicle design.
But to make it work, visual technology demands massive computing power, to crunch all the visual data. This is where AI algorithms kicks in. In this project, I have used the following AI sub-domains to solve individual functions of a car, to build a fully functional one.
- Steering Angle: Used Deep Learning
- Acceleration and Brake: Computer Vision & Deep Learning
- Gear and Clutch: Used Machine Learning
Self Driving System in Action. All Controls are driven by ML/ DL Prediction.
The source code of this project can be found here
Download the Indian Driving video data from my Kaggle source.
There are couple of publicly available driving data sets, but each of them has some or other shortcomings, as below:
a) Sully Chen’s Driving Data (25 mins)
- Contains only Steering Angle as parameter
- Need more parameters, to build a fully functional car
b) Comma.ai Research Data (7 hours)
- Contains Steering, Acceleration & Brake. No ‘Gear’ data. Image quality less
- Most of their driving data is along lane-separated highways where curves are rare and traffic is less, as opposed to Indian roads.
c) Baidu’s Appollo Open Platform (Huge)
- Contains all the complex LIDAR and 3D data
- Massive database: unreasonably huge to process in PC
All the available open data sets are recorded in western setting, where lanes are wide and signs & dividers are well-defined. But for AI algorithms to learn, Indian roads are an order harder than western roads, as there isn’t enough data on Indian road conditions. Thus, algorithms trained on western data sets may not perform well in Indian conditions.
None of the above data sets were found suitable to build a fully functional car in a personal computer. Besides, I was curious to see how AI algorithms would work in Indian settings. Hence, I decided to record own video. Firstly, I mounted an iPhone outside my car, on the windshield glass and drove along the narrow and curvy roads of India. To record video, I tried to use Nexar AI dashcam app but my attempt failed, as it auto-detects and stores only collision frames. Then, I used the inbuilt video recorder in iPhone and drove for around 1–1.5 hours, along multiple circuits.
But the recorded video was not enough to train a self-drive system. To build a fully functional car, all the basic inputs a car-driver gives while driving viz. steering angle, acceleration, brake, clutch and gear has to be predicted. Hence, training data also need to have all these parameters.
Car Console Interface
To record steering angle while driving, we can use the on-board diagnostics system of the car, to find steering angle, every millisecond. Similarly we can hack around the mechanics and electronics of the car, to get other parameters such as acceleration, brake, clutch and gear etc. As a hack to generate training data, without tampering the OBD of car, I coded a car control interface in Python to record scan codes from keyboard. This interface was used to simulate driving, just as you play video game using keyboard.
Both the recorded video and console interface was run side-by-side, so that I can use arrow keys to simulate driving as seen in the recorded video. Seven key controls were used to compute parameters during driving simulation:
Up and Down Arrow : Acceleration & BrakeLeft and Right Arrow : Direction / Steering Angle‘a’ : Shift Gear Up‘z’ : Shift Gear Down‘g’ : Shift Gear Neutral
The interface program writes the steering angle, acceleration, brake and gear values to an output file, for every frame in the video. If the video is recorded at 30 fps then the parameter values have to be written every 1000/30 = 33 ms, in a loop. To get this accuracy is a bit challenging, as it depends on the execution time of the ‘while’ loop, that writes the output. You may have to write every 30–33 ms to properly sync with the video, recorded at 30 fps. Consider the ‘Write Frequency’ as a hyper-parameter, since it depends on the machine speed. You can tune write frequency by doing 1-minute driving simulation and check how far off it is from 60x30 = 1800 frames. In the code below, you can tune the ‘writeFrequency’ parameter at nano-second accuracy.
create_train_params.py hosted with by GitHub - Car Console Interface in Python
The self-recorded driving videos required some pre-processing steps before it could be fed to the network. Following softwares were used, in sequence:
a) Lossless Cut — To cut out relevant video without loss.
b) Anytime Video Converter — To resize video to reduce data.
c) Video to JPG Converter — To extract frames from video.
Indian Driving Data Set with extracted video frames and corresponding driving control parameters which are generated using car console interface are uploaded to Kaggle. The extracted frames were fed to Convolutional Neural Network along with corresponding parameter values (generated by the simulator) for the purpose of training.
If you want to use an open data set instead, you have to write a program to parse the given file format. The below parser code is to read .h5 files, along with parameter values from comma.ai dataset. The above pre-processing steps such as ‘frame resize’ can be implemented in the below parser code itself.
parser.py hosted with by GitHub - Parser code for .h5 files in comma.ai
By now, we have training data set containing frame images & corresponding parameter values viz. steering angle, acceleration, brake and gear.
To predict steering angle from images, we can either use Deep Learning (CNN) or Computer Vision techniques. The feasibility of both methods are analyzed in coming up sections. But even a human, cannot predict brake or acceleration from individual frames. Hence, such a frame-fed CNN model is not enough to predict acceleration or brake parameter, as we miss out a significant information, i.e. the “sequence” of frames.
To address the ‘sequence’ issue, we can either use
- Complex CNN-RNN Architecture (or)
- Computer Vision techniques for assistance.
We have chosen CV techniques to quantify the ‘sequence’ information, as it can be readily computed from the recorded video. Such a quantified value is used to assist acceleration and brake prediction. Gear can be predicted based on all remaining parameters, i.e. steering, acceleration, brake & sequence information. Hence, Machine Learning techniques are used. Clutch can be easily calculated based on gear shift.
a) Steering Angle Prediction
To predict steering angle, there is ample information for CNN network to learn, i.e. the angle of lane lines, the curvature of the road, the sign boards, the turning angle of other vehicles and pedestrians, the curvature of mid-road divider etc. Besides, there are computer vision techniques to find the angle of lane lines on the road, which in turn help decide steering angle. Below we introduce both the methods to analyze a better suit.
# Using Hough Transformation
To estimate the required angle of steer, Hough Transformation can be applied on extracted frames to find the coefficient & theta of ‘lane lines’ using OpenCV (Courtesy: Github project)
The sequence of Image Processing Steps are as shown below:
Hough Transformation can help estimate lines from point spread in images.
Disconnected Lane Lines after edge detection can also be detected by analyzing the butterfly width of the accumulator array as shown below:
The intersection of sine waves in Hough space would give the parameters of the line, to be estimated. The angle of the estimated lane line would help determine the steering angle, for every frame. In the visualization below, three point spread clouds corresponding to 3 lines generate 3 sine waves with 3 intersections in the Hough Space, on the right.
But, the above Computer Vision techniques are not suitable to build our autonomous car, as we want to self-drive on Indian roads, where such a consistent information like lane lines or dividers may not be present. Hence, we use sole Deep Learning to predict Steering Angle.
# Using Deep Learning Model
We need to predict steering angle for each individual frame. We can take one set of steering angles, and corresponding frame images to feed a CNN model and attempt to minimize MSE training loss. A model, thus trained in multiple iterations, and epochs, can be used to predict the steering angle, even in uneven Indian settings. Towards this purpose, NVidia’s end-to-end CNN Architecture for self driving cars is used. 
Model Interpretability: It is non-trivial to understand what is going on inside deep learning models. For CNN models, we can visualize the hidden layers to understand the features picked up by individual filters.
visualiize_cnn.py hosted with by GitHub - CNN Visualization Code
The input image frame and hidden layer visualization output is shown below:
CNN Hidden Layer Visualization for different filters
Three driving control parameters viz. Steering Angle, Acceleration and Brake, are trained with different models and are concurrently executed at run time to drive the car. The code to train one of the control parameters is as below.
train.py hosted with by GitHub - To Train the ‘Acceleration’ control variable. Similar training is done for Steer & Brake.
b) Acceleration & Brake Prediction
As acceleration and brake inherently depends upon the sequence of frames in the video, we take assistance of computer vision to predict both parameters.
Optical Flow represents apparent motion of objects and edges in a scene caused by relative motion between an observer and a scene. In the recorded video, we can analyze the optical flow of the strong corners, across previous frames to estimate future frame flow.
Let there be ‘k’ strong corners in the (n-1)th & (n-2)th frame, where ’n’ is the present frame. Lets say corner, c has moved to c’ from one frame to next. The optical flow can be quantified by summing up the distance shift of all the corner points across previous frames.
But, it has been found that the sum would squarely increase, proportional to the number of corner points detected. Moreover, the sum would sharply increase along the curves because the pixels in the frames would shift from left to right or right to left.
Based on the above logic, the below equation has been empirically worked out to quantify acceleration from optical flow.
Below python program analyze recorded video & compute the optical flow parameter, ‘gamma’ across each adjacent frame in video and output to file.
compute_optical_flow1.py hosted with by GitHub
As the computed optical parameter values are unbounded, apply sigmoid function or hyperbolic tangent function to squash the values. Since tanh bounds all positive values between 0 &1 as against 0.5 to 1 for sigmoid, tanh becomes our natural choice. The amount of squashing can be adjusted by ‘alpha’ parameter as visualized below.
Based on the spread of ‘gamma’ values, we can decide the ‘alpha’ value to match the distribution of recorded acceleration values and then multiply by maximum acceleration value to scale from 0 to 1. In this case, we computed alpha = 0.18 to avoid any gamma value in the asymptotic end of hyperbolic tangent curve.
Here, alpha is denoted by delta symbol
Combine both the values to predict the acceleration and brake values.
While the car is running, optical flow values from dash cam video is computed and above computation is done on the fly, to predict acceleration & brake. The implementation of above formulae are in Concurrent Execution section below.
To compute “DL Prediction” in the above equation, a 3-Convolution Layer model inspired by the famous LeNet architecture is used. Thus, for steer NVidia’s end-to-end CNN Architecture and for Accel and Brake LeNet-inspired architecture is coupled with optical flow parameter feature.
The prototyping with different models are done in Keras as below.
lenet_accel.py hosted with by GitHub -LeNet-inspired 3-Convolution Layer Architecture in Keras
c) Gear & Clutch Prediction
Gear doesn't depend upon individual frame images but the combination of all parameters i.e. steering angle, acceleration, brake & also sequence information. Hence, we can use a machine learning algorithm to predict the dependent variable ‘gear’ using the above parameters, as input features.
We know tree based ML algorithms perform better when features are less, as in this case. After trying out multiple algos like Logistic Regression, Naive Bayes, GBDT etc, RandomForest is found to give maximum accuracy. Instead of relying on the gear prediction from an individual frame, the autonomous car would do gear shift only if gear prediction of last ’n’ frames are the same. Clutch can be trivially computed while doing gear shift.
predict_gear.py hosted with by GitHub - RandomForest Algorithm to predict gear of last ’n’ frames.
Concurrent Model Execution
The models for steering, acceleration, brake and gear are simultaneously executed in different sessions with corresponding computation graphs, to predict all driving control values for each frame.
run_dataset.py hosted with by GitHub - Concurrent Execution of all models at Run-time
The car below is being self-driven based on the dash-cam video input. It works well with new tracks, unseen while training the network.
The source code of this project can be found here
To imagine our world with all the pros of self driven cars seems no less than a sci-fi movie in which unmanned ferrying objects(UFOs) are meandering our roads. For general public, it may be ‘Alice in wonderland’.
But today’s wonderland can turn into a breathtaking reality in future and then human driving cars would be a tale, told by grannies to their grandchildren. But for this to realize, it has to pass all the hurdles i.e ethical, technological and logistical, to become a real world utopia. For instance, infrastructure needs a complete overhaul and technology should handle all extreme cases, especially in rough Indian roads, to preempt a Uber-like incident.
“People change very slowly, technology change very quickly”, said Jeremy P Birnholtz, Professor, Northwestern University. How efficiently humans and self driven cars adopt to each other should be better left to the test of time.
People are so bad at driving cars that computers don’t have to be that good to be much better ~ Marc Andreesen