Big data and AI are joined at the hip: the best AI applications require massive amounts of constantly updated training data to build state-of-the-art models. AI has always been one of the most exciting applications of big data. Project Hydrogen is a major Apache Spark initiative to bring the best AI and big data solutions together. It introduced barrier execution mode in the Spark 2.4.0 release to support distributed model training, and it is exploring optimized data exchange to accelerate distributed model inference. In this talk, we will explain why barrier execution mode is needed, how it works, and how to use it to integrate distributed deep learning (DL) training with Spark. We will demonstrate HorovodRunner, the first Spark+AI integration powered by Project Hydrogen. It is based on Uber's Horovod framework and on Databricks Runtime 5.0 for Machine Learning. We will also share our experience and performance tips for combining Spark's Pandas UDFs with AI frameworks to scale complex model inference workloads.