This report describes the simulation and benchmarking steps taken to predict the parallel performance of an application using Dimemas and cache-level simulations. Using Dimemas [3], the time
behaviour of the NAS [1] integer sort was simulated for the architecture of the Barcelona supercomputer MareNostrum [4]. Performance was evaluated as a function of the architecture's latency, bandwidth,
connectivity and CPU speed. For the cache-level simulations, Intel's Pin tool was used to benchmark a simple parallel application as a function of the cache and cluster sizes.