This document describes a technique for automatically partitioning a sequential packet processing application into coordinated parallel subtasks that can be mapped efficiently onto the pipelined architectures of network processors. The technique balances the workload across pipeline stages while minimizing the data transmitted between them. It was implemented in an auto-partitioning C compiler for Intel network processors. Experiments showed speedups of more than 4x for the IPv4 and IP forwarding benchmarks on a 9-stage pipeline, relative to the non-partitioned code.
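The two goals named above, balanced stages and little inter-stage traffic, can be illustrated with a deliberately simplified model. The sketch below is not the compiler's algorithm, which operates on whole programs and is not reproduced here. It assumes the application is already a straight-line sequence of operations, each with a hypothetical cycle cost (`work`), plus a hypothetical number of bytes that would have to be sent downstream if a stage boundary were placed after that operation (`live_bytes`). A dynamic program then splits the sequence into a fixed number of contiguous, non-empty stages. Its objective is lexicographic: it first minimizes the slowest stage, because that stage bounds pipeline throughput, and then breaks ties by minimizing the total bytes transmitted. The function name `partition_pipeline` and both cost inputs are illustrative assumptions.

```python
from functools import lru_cache
from typing import List, Optional, Tuple


def partition_pipeline(work: List[int], live_bytes: List[int],
                       stages: int) -> Tuple[int, int, List[int]]:
    """Split a straight-line operation sequence into `stages` contiguous,
    non-empty pipeline stages (simplified, hypothetical cost model).

    work[i]       -- assumed cycle cost of operation i
    live_bytes[i] -- assumed bytes sent downstream if a stage boundary
                     is placed right after operation i
    Returns (bottleneck, total_transfer, cut_points). The objective is
    lexicographic: minimize the slowest stage first, then total transfer.
    """
    n = len(work)
    if len(live_bytes) != n:
        raise ValueError("work and live_bytes must have the same length")
    if not 1 <= stages <= n:
        raise ValueError("need 1 <= stages <= number of operations")

    # prefix[i] = total work of operations 0..i-1, for O(1) stage sums.
    prefix = [0]
    for w in work:
        prefix.append(prefix[-1] + w)

    @lru_cache(maxsize=None)
    def best(start: int, k: int) -> Tuple[int, int, Tuple[int, ...]]:
        # Best split of operations start..n-1 into k stages.
        if k == 1:
            return (prefix[n] - prefix[start], 0, ())
        result: Optional[Tuple[int, int, Tuple[int, ...]]] = None
        # The current stage covers operations start..end-1; leave at
        # least one operation for each of the remaining k-1 stages.
        for end in range(start + 1, n - k + 2):
            stage_work = prefix[end] - prefix[start]
            rest_max, rest_xfer, rest_cuts = best(end, k - 1)
            cand = (max(stage_work, rest_max),
                    rest_xfer + live_bytes[end - 1],
                    (end,) + rest_cuts)
            if result is None or cand[:2] < result[:2]:
                result = cand
        assert result is not None
        return result

    bottleneck, transfer, cuts = best(0, stages)
    return bottleneck, transfer, list(cuts)


if __name__ == "__main__":
    work = [4, 2, 3, 5, 1, 3]
    live = [8, 4, 16, 4, 12, 0]
    print(partition_pipeline(work, live, 3))  # (8, 8, [2, 4])
```

With these made-up costs, the three stages cover operations 0 to 1, 2 to 3, and 4 to 5, each with 6, 8, and 4 cycles of work. Ignoring transfer overhead, the ideal throughput speedup over running everything in one stage is the total work divided by the bottleneck, here 18 / 8 = 2.25. Treating transfer only as a tie-breaker keeps the sketch short. A more realistic model would charge the transmission cost to the stages themselves, since sending and receiving live data consumes cycles on a network processor.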