This document discusses virtualizing Hadoop clusters on VMware vSphere. It describes how Hadoop uses MapReduce to process large datasets in parallel across a cluster. Virtualizing Hadoop offers simpler operations, high availability, and elastic scaling. The document outlines the challenges of running Hadoop and how virtualization addresses them. It gives examples of deploying Hadoop clusters with Serengeti and configuring different distributions. Performance results show little overhead from virtualization and a benefit from using local storage. Joint engineering with Hortonworks adds high availability for the Hadoop master daemons using vSphere features.