1. The document discusses using PySpark and Pandas UDFs to perform machine learning at scale for genomic data. It describes a genomics use case called GloWGR that uses this approach.
2. Three key problems are identified with existing tools: genomic data is growing too quickly; bioinformaticians are unfamiliar with Scala; and ML algorithms are difficult to write in Spark SQL. The solutions proposed are to use Spark, provide a Python client, and write algorithms in Python linked to Spark.
3. GloWGR is presented as a novel whole genome regression and association study algorithm built with PySpark. It uses Pandas UDFs to parallelize the REGENIE method and perform tasks like dimensionality