Leveraging direct curvature control for autonomous driving with deep learning.

Preface:

The use of deep learning for autonomous driving is a widely researched topic and has shown some interesting results [4]. However, such research has usually focused on the internal architecture of the neural network and less on the impact of its output format. My (remote) work at the University of Washington, Seattle, between April and July 2020, under the guidance of Dr. Christoforos Mavrogiannis and Matthew Schmittle, revolved around an approach in which the neural network predicts a trajectory instead of directly predicting steering angles. The trajectory can then be followed by a classical controller. A modified version of the Donkey Simulator was used to generate training data and to validate the approach in simulation.

This work began in April 2020, and the results were collected by the end of July 2020. The work was a byproduct of my creating a deep learning guide for the MuSHR project, so the intent was not to write a research paper. We were nevertheless considering converting it into one until I came across the work in [2], which investigated the same (or a highly similar) problem statement and was published on arXiv on 6th May 2020. Thus, I decided to write down my experience in the form of a blog post instead. If you squint hard enough, it could be considered a peer review.

Motivation

Driving, like most mobile-robotics control tasks, is fundamentally temporal in nature. This is apparent from the convergence of classical control towards techniques that predict a sequence of actions rather than a single instantaneous action (e.g., MPC and MPPI controllers). However, most approaches to behavior cloning ignore this and train the agent to predict the action only for the current instant. There have been attempts to incorporate the temporal nature of the task into the agent by decoupling perception, planning, and control [5]. However, decoupling to this extent increases the overall processing requirements, which increases either the processing time or the cost of the hardware. Our intent was instead to use the neural network to perform perception and planning together, reducing the overall computation, and to perform the control with classical techniques in order to deal with non-linearity and explainability issues.

By teaching the neural network to predict a trajectory instead of an instantaneous output, we essentially teach the network to generate a sequence of desired future states. It was initially expected that if the supervisor's trajectory was smooth, the predicted trajectory would also be smooth. However, predicting discrete (x,y) coordinates does not necessarily produce perfectly smooth outputs and could require further post-processing (the approach of predicting discrete states has been used in [6, 7, 8]). For this reason, it made sense to predict the parameters of a parametric curve instead. Bezier curves were chosen because the coefficients of the curve can be obtained simply from the (x,y) coordinates of the control points. This suits the regression-based learning approach of the network, in contrast to recursively defined curves like B-Splines [2].
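To illustrate why control points make convenient regression targets, here is a minimal sketch of evaluating a cubic Bezier curve directly from its four control points using the Bernstein form. The function name and the sample control points are made up for illustration and are not taken from the project code.

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at t in [0, 1] (Bernstein form)."""
    u = 1.0 - t
    # Bernstein basis weights for the four control points
    b0, b1, b2, b3 = u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3
    x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
    y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
    return x, y


# Hypothetical control points in the car's frame (x forward, y left)
ctrl = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5), (3.0, 1.5)]

# Sample the curve at 11 evenly spaced parameter values
path = [cubic_bezier(*ctrl, t=i / 10) for i in range(11)]
```

The curve always starts at the first control point and ends at the last one, so a network that regresses the control points fully determines a smooth path without any recursion.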

Approach:

We tried two approaches for predicting a trajectory with the neural network. The first was to predict the trajectory as a region within which the car could drive (a top-down view), similar to how a cost map is predicted in [5]; this trajectory would then be followed by a Stanley controller. The second was to predict Bezier coefficients that represent the trajectory, which would then be followed using direct curvature control.

Direct trajectory prediction:

Prediction:

In the first approach, the label for each image is the car’s trajectory (in its own reference frame) from its current position to its position one second in the future plotted onto the image plane (by warping the trajectory). The neural network used here is the one shown below. It was taken from this project.

Control:

A Stanley controller was used to follow the predicted trajectory.
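For readers unfamiliar with it, the standard Stanley control law can be sketched as below. This is a generic textbook form, not the project's implementation; the gain, steering limit, softening constant, and sign convention (positive values mean "steer left") are all assumptions.

```python
import math


def stanley_steering(heading_error, cross_track_error, speed,
                     gain=1.0, max_steer=math.radians(25.0), softening=1e-3):
    """Generic Stanley law: heading term plus a speed-scaled cross-track term.

    heading_error: path heading minus vehicle heading (rad)
    cross_track_error: lateral offset of the path from the front axle (m),
        positive when the path lies to the left (assumed convention)
    speed: forward speed (m/s)
    """
    # The softening constant keeps the cross-track term defined at zero speed
    steer = heading_error + math.atan2(gain * cross_track_error, speed + softening)
    # Clamp to the (hypothetical) physical steering limit
    return max(-max_steer, min(max_steer, steer))


# Hypothetical state: path is 0.2 m to the left and rotated 0.05 rad
command = stanley_steering(heading_error=0.05, cross_track_error=0.2, speed=2.0)
```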

While the results for this approach are also included in the comparison, it is not representative of the approach taken in [5], and the comparison is not like-for-like:

The structure of the neural network used here is significantly different from that used for the other two approaches.
The controller used here for following the predicted trajectory is different from the one used in the parametric curve approach.

In [5], this kind of prediction is used to generate a cost map, which an MPPI controller then uses to generate the trajectory. However, the computation-time results are still relevant.

Indirect trajectory prediction

Prediction:

In the second approach, a convolutional neural network was used to predict the parameters of a third-order (cubic) Bezier curve that represents the trajectory of the agent relative to its current pose. The neural network used here is as shown below and was taken from this project. The same network was used for the image-to-steering model as well; the only difference between the two was the number of outputs in the final layer.

The first and fourth points of the Bezier curve would simply be the car's current pose and the car's pose one second in the future (in the car's own reference frame), scaled down by the speed of the car, which was held constant during dataset generation. The construction of the Bezier curves for dataset generation was the same as the approach used in my autonomous driving project. Note that in [2], a Bezier curve is fit to the real trajectory, which avoids such approximations.
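One way to build such a curve from two poses is sketched below. The post does not spell out how the two inner control points are placed, so the choice here (each inner point lies along the heading at its end of the curve, at one third of the chord length) is purely an assumption for illustration, as are the function names.

```python
import math


def to_car_frame(pose, ref):
    """Express a world-frame pose (x, y, yaw) relative to the reference pose."""
    dx, dy = pose[0] - ref[0], pose[1] - ref[1]
    c, s = math.cos(ref[2]), math.sin(ref[2])
    return (c * dx + s * dy, -s * dx + c * dy, pose[2] - ref[2])


def bezier_from_poses(now, future, speed=1.0, arm_fraction=1.0 / 3.0):
    """Control points of a cubic Bezier from the current pose to a future pose.

    The end point is divided by the (constant) speed, as described in the text.
    The placement of p1 and p2 is an assumed heuristic, not the author's method.
    """
    x, y, yaw = to_car_frame(future, now)
    x, y = x / speed, y / speed
    arm = arm_fraction * math.hypot(x, y)
    p0 = (0.0, 0.0)                     # car's current position
    p1 = (arm, 0.0)                     # along the current heading (x axis)
    p2 = (x - arm * math.cos(yaw), y - arm * math.sin(yaw))  # behind the end pose
    p3 = (x, y)                         # position one second in the future
    return p0, p1, p2, p3


# Hypothetical poses (x, y, yaw) in the world frame
ctrl = bezier_from_poses(now=(2.0, 1.0, 0.0), future=(4.9, 1.6, 0.4), speed=3.0)
```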

Control:

Instead of using a PID/Stanley controller to follow the trajectory, we decided to take advantage of the fact that the curvature of a Bezier curve can be calculated analytically at any point. Thus, the steering angle is found directly from the curvature of the Bezier curve at the point corresponding to the controller's next time step. This was inspired by my work with direct curvature control in my autonomous driving project.
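The analytic curvature follows from the first and second derivatives of the curve: kappa = (x'y'' - y'x'') / (x'^2 + y'^2)^(3/2). A minimal sketch is given below. The mapping from curvature to steering angle uses the kinematic bicycle model, which is an assumption here (the post does not state the exact mapping), and the wheelbase value is hypothetical.

```python
import math


def bezier_curvature(p0, p1, p2, p3, t):
    """Signed curvature of a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    # First derivative B'(t)
    dx = 3 * u * u * (p1[0] - p0[0]) + 6 * u * t * (p2[0] - p1[0]) + 3 * t * t * (p3[0] - p2[0])
    dy = 3 * u * u * (p1[1] - p0[1]) + 6 * u * t * (p2[1] - p1[1]) + 3 * t * t * (p3[1] - p2[1])
    # Second derivative B''(t)
    ddx = 6 * u * (p2[0] - 2 * p1[0] + p0[0]) + 6 * t * (p3[0] - 2 * p2[0] + p1[0])
    ddy = 6 * u * (p2[1] - 2 * p1[1] + p0[1]) + 6 * t * (p3[1] - 2 * p2[1] + p1[1])
    speed_sq = dx * dx + dy * dy
    if speed_sq == 0.0:
        return 0.0  # degenerate point with no defined tangent
    return (dx * ddy - dy * ddx) / speed_sq ** 1.5


def steering_from_curvature(curvature, wheelbase=0.3):
    """Kinematic bicycle model (assumed): steer = atan(wheelbase * curvature)."""
    return math.atan(wheelbase * curvature)


# Hypothetical control points in metres, car frame (x forward, y left)
ctrl = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5), (3.0, 1.5)]
kappa = bezier_curvature(*ctrl, t=0.0)
steer = steering_from_curvature(kappa)
```

Note that the curvature is expressed in the units of the control points, so if the curve has been scaled (for example by the speed), the curvature has to be rescaled accordingly before it is converted to a steering angle.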

Proposed metric for comparison:

We compare the performance of the different approaches by comparing how smoothly they drive in simulation. For this purpose, the trend of the absolute rate of change of yaw rate is used as the metric for evaluating an approach. A lower overall rate of change of yaw rate indicates smoother driving. The comparison also includes the performance of a human driver to serve as a baseline. A fixed track was used for evaluating all the approaches.
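As a rough sketch, the metric can be computed from logged yaw-rate samples with a finite difference. The function names and the sample logs below are hypothetical; the 1/15 s interval corresponds to the 15 fps recording rate mentioned later in the post.

```python
def abs_yaw_rate_change(yaw_rates, dt):
    """Absolute finite-difference derivative of yaw rate between samples (rad/s^2)."""
    return [abs(b - a) / dt for a, b in zip(yaw_rates, yaw_rates[1:])]


def mean(values):
    return sum(values) / len(values)


# Hypothetical yaw-rate logs in rad/s, sampled at 15 Hz
dt = 1.0 / 15.0
smooth_run = [0.00, 0.02, 0.04, 0.06]
jerky_run = [0.00, 0.10, -0.05, 0.12]

smooth_score = mean(abs_yaw_rate_change(smooth_run, dt))
jerky_score = mean(abs_yaw_rate_change(jerky_run, dt))
# The smoother run yields the lower score
```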

Implementation:

The experiment was done by using a modified version of the Donkey simulator. The simulator was modified to provide greyscale images from 3 cameras, one looking straight ahead and the other two looking 15 degrees off-center towards the left and right. The left and right images are treated as if they were from the center camera and thus the ground truth for these images is also “rotated” 15 degrees left and right respectively. This was done for two reasons:

To ensure that the data remains balanced (an equal number of cases where the car is going left, right, and straight ahead).
To mitigate the problem of “covariate shift”; in simple terms, a network trained on a dataset in which the car is always at the center of the lane and facing along the lane will have a hard time dealing with situations where it has drifted off to one side or is moving out of the lane.
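The label "rotation" for the side cameras can be sketched as a plain 2D rotation of the trajectory points. The frame convention (x forward, y left, counter-clockwise positive) and the resulting signs are assumptions: under them, a camera yawed 15 degrees to the left sees the same path rotated 15 degrees clockwise.

```python
import math

CAMERA_OFFSET_DEG = 15.0  # side cameras look 15 degrees off-center


def rotate_points(points, angle_deg):
    """Rotate 2D points counter-clockwise about the origin by angle_deg."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def label_for_camera(points, camera):
    """Trajectory label as seen from the given camera (assumed sign convention)."""
    offset = {"left": -CAMERA_OFFSET_DEG, "center": 0.0, "right": CAMERA_OFFSET_DEG}[camera]
    return rotate_points(points, offset)


# Hypothetical straight-ahead trajectory in the car's frame
straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
left_label = label_for_camera(straight, "left")    # appears to veer right
right_label = label_for_camera(straight, "right")  # appears to veer left
```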

The collected data included the images, steering, speed, and pose data of the vehicle.

For the image to steering approach, the label was the steering angle scaled between -1 and 1, albeit split into 2 variables (left and right).
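One plausible encoding of such a two-variable label is sketched below. The post does not specify which sign corresponds to which side or how the two outputs are recombined, so both are assumptions for illustration.

```python
def split_steering(steer):
    """Split a steering value in [-1, 1] into non-negative (left, right) parts.

    Assumed convention: negative values steer left, positive values steer right.
    """
    steer = max(-1.0, min(1.0, steer))
    left = max(0.0, -steer)
    right = max(0.0, steer)
    return left, right


def merge_steering(left, right):
    """Recombine the two network outputs into a single steering command."""
    return right - left


label = split_steering(-0.25)       # (0.25, 0.0)
command = merge_steering(*label)    # -0.25
```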

For the image to the Bezier curve approach, a Bezier curve was constructed between the car’s current pose and its pose one second into the future.

For both of the above approaches, in order to have a fair comparison, the same network architecture was used up to the final layer. In the final layer, the image-to-steering network has 2 outputs while the image-to-Bezier network has 5 outputs.

The following figure shows how the sub-components of the system were connected:

The code for the implementation is available here. The file model_runner.py runs the agents, while controller.py is for manual control. Both send their outputs to the intermediary program CAR.py, which can switch between manual and autonomous control and pushes the commands to the client program, which in turn pushes them to the simulator.

Results:

Computation time comparison

(running on NVIDIA GTX 1050 GPU for inference and i7 7th gen 3.6 GHz for post-processing):

Direct trajectory prediction (image to image): 16 ms

Image to steering: 5 ms

Indirect trajectory prediction (image to Bezier): 5 ms

The first graph compares the envelopes of the rate of change of yaw rate for each approach. The envelope is calculated by splitting the data into windows (the window size was 1/30th of the total number of points, as the data was recorded at 15 fps) and assigning the RMS value of each window to every point within it.
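The envelope calculation described above can be sketched as a block-wise RMS. The function name and the handling of a final, shorter window are assumptions; the default of 30 windows follows the 1/30th window size mentioned in the text.

```python
import math


def rms_envelope(values, num_windows=30):
    """Replace each value with the RMS of the fixed-size window it falls in."""
    n = len(values)
    size = max(1, n // num_windows)  # window length in samples
    envelope = []
    for start in range(0, n, size):
        window = values[start:start + size]
        rms = math.sqrt(sum(v * v for v in window) / len(window))
        # Every point in the window gets the same RMS value
        envelope.extend([rms] * len(window))
    return envelope


# Hypothetical signal with two windows of two samples each
env = rms_envelope([3.0, -3.0, 4.0, -4.0], num_windows=2)
```

The output has the same length as the input, so it can be plotted against the same time axis as the raw signal.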

The second graph shows a statistical comparison of the data in the first graph. From it, it can be concluded that the image-to-Bezier approach, combined with an appropriate controller, results in smoother driving. The results for direct trajectory prediction are less than ideal because the trajectory is simply taken as the average center-line of the predicted region. Better behavior might be obtained by using either an MPC or an MPPI controller, but this was not pursued in this work. A comparison between an improved version that uses MPPI and the Bezier approach may be interesting, provided that the neural networks used are similar.

Conclusion:

The results of this experiment appear to concur with those found in [2]: performance is indeed improved in comparison to direct steering angle prediction. Thus, if you squint hard enough, you could consider this a peer review of [2]. I feel that the combination of parametric curves with neural networks is quite interesting, not just for driving but for robotics in general. For example, one could train a very thin neural network to predict control profiles as Bezier curve parameters using an MPC or MPPI as a supervisor, thus getting the performance of an MPC/MPPI while speeding up the computation by leveraging the GPU.

References:

[1] Felipe Codevilla, Eder Santana, Antonio M. Lopez, & Adrien Gaidon. (2019). Exploring the Limitations of Behavior Cloning for Autonomous Driving. The IEEE International Conference on Computer Vision (ICCV), pp. 9329-9338.

[2] Trent Weiss, & Madhur Behl. (2020). DeepRacing: Parameterized Trajectories for Autonomous Racing.

[3] P. de Haan, D. Jayaraman, & S. Levine. (2018). Causal Confusion in Imitation Learning. Neural Information Processing Systems Imitation Learning and its Challenges in Robotics Workshop (NeurIPS ILR).

[4] Mariusz Bojarski et al., NVIDIA Corporation. (2016). End to End Learning for Self-Driving Cars.

[5] Paul Drews, Grady Williams, Brian Goldfain, Evangelos A. Theodorou, & James M. Rehg. (2017). Aggressive Deep Driving: Model Predictive Control with a CNN Cost Model.

[6] S. H. Park, B. Kim, C. M. Kang, C. C. Chung and J. W. Choi, "Sequence-to-Sequence Prediction of Vehicle Trajectory via LSTM Encoder-Decoder Architecture," 2018 IEEE Intelligent Vehicles Symposium (IV), Changshu, 2018, pp. 1672-1678, doi: 10.1109/IVS.2018.8500658.

[7] B. Kim, C. M. Kang, J. Kim, S. H. Lee, C. C. Chung and J. W. Choi, "Probabilistic vehicle trajectory prediction over occupancy grid map via recurrent neural network," 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), Yokohama, 2017, pp. 399-404, doi: 10.1109/ITSC.2017.8317943.

[8] Y. Sun, W. Zuo and M. Liu, "See the Future: A Semantic Segmentation Network Predicting Ego-Vehicle Trajectory With a Single Monocular Camera," in IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 3066-3073, April 2020, doi: 10.1109/LRA.2020.2975414.