Hive is a data warehouse infrastructure tool that allows users to query and analyze large datasets stored in Hadoop. It uses a SQL-like language called HiveQL to process structured data stored in HDFS. Hive stores metadata about the schema in a database and processes data into HDFS. It provides a familiar interface for querying large datasets using SQL-like queries and scales easily to large datasets.