This document describes the design of a pipelined processing unit for a DSP FFT processor. The unit includes fused floating-point units, such as a dot-product unit and an add-subtract unit, that perform FFT butterfly operations more efficiently. The dot-product unit performs two multiplications and an addition or subtraction in one cycle, which reduces latency and area compared with discrete implementations. The add-subtract unit computes the sum and the difference of two operands in parallel. These fused units are used to implement a radix-2 FFT butterfly that is 20% faster and 30% smaller than a conventional design. The processing unit can perform 26 different floating-point and logical operations needed for FFT processing. Simulation results show the performance benefits of the fused units and radix-2
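The following is a minimal behavioral sketch of how the two fused units map onto a radix-2 butterfly, X = A + W·B and Y = A − W·B. It is not a hardware description, and the function names (`fused_dot`, `fused_add_sub`, `butterfly`) are hypothetical. It assumes the fused dot-product unit computes a·b ± c·d with a single rounding at the end, which this sketch models by computing the exact result with `fractions.Fraction` and rounding once to IEEE double precision (the actual unit's precision is not stated here). The real part of W·B uses the subtract form and the imaginary part uses the add form. The add-subtract unit then produces both butterfly outputs for each component.

```python
from fractions import Fraction


def fused_dot(a: float, b: float, c: float, d: float, subtract: bool = False) -> float:
    """Hypothetical model of a fused dot-product unit.

    Returns a*b + c*d (or a*b - c*d when subtract=True), computed exactly
    and rounded only once to double precision.
    """
    prod1 = Fraction(a) * Fraction(b)   # exact product, no rounding
    prod2 = Fraction(c) * Fraction(d)   # exact product, no rounding
    exact = prod1 - prod2 if subtract else prod1 + prod2
    return float(exact)                 # single round-to-nearest step


def fused_add_sub(a: float, b: float) -> tuple[float, float]:
    """Hypothetical model of a fused add-subtract unit: returns (a + b, a - b).

    In hardware the two results can share exponent comparison and operand
    alignment; here each result is simply one correctly rounded float op.
    """
    return a + b, a - b


def butterfly(a, b, w):
    """Radix-2 butterfly on (re, im) tuples: X = A + W*B, Y = A - W*B."""
    ar, ai = a
    br, bi = b
    wr, wi = w
    t_re = fused_dot(br, wr, bi, wi, subtract=True)  # Re(W*B) = br*wr - bi*wi
    t_im = fused_dot(br, wi, bi, wr)                 # Im(W*B) = br*wi + bi*wr
    xr, yr = fused_add_sub(ar, t_re)
    xi, yi = fused_add_sub(ai, t_im)
    return (xr, xi), (yr, yi)


if __name__ == "__main__":
    # W = -j (the twiddle factor W_4^1), so W*B = (3+4j)(-j) = 4 - 3j.
    print(butterfly((1.0, 2.0), (3.0, 4.0), (0.0, -1.0)))

    # Single rounding vs. discrete rounding: a*b - 1 is exactly -2**-60 here.
    a, b = 1.0 + 2.0**-30, 1.0 - 2.0**-30
    print(fused_dot(a, b, 1.0, 1.0, subtract=True))  # exact result survives
    print(a * b - 1.0 * 1.0)                          # a*b rounds to 1.0 first
```

The second example shows why a single final rounding matters: in the discrete version the product a·b rounds to 1.0 before the subtraction, so the small difference is lost entirely, while the fused model keeps it.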