|Abstract||Profiling is of great assistance in understanding and optimizing an application’s behavior. Today’s profiling techniques help developers focus on the pieces of code that incur the highest penalties according to a given performance metric. In this paper we describe a profiling tool we have developed by extending the Valgrind framework and one of its tools, Callgrind. Our extended profiling tool provides new object-differentiated profiling capabilities that help software developers and hardware designers (1) understand access patterns, (2) identify unexpected access patterns, and (3) determine whether a particular memory object consistently exhibits a troublesome access pattern. We use this tool to assist in partitioning large data objects so that smaller portions of them can be placed in the small, fast memory subsystems of heterogeneous memory systems, such as scratchpad memories. We showcase the potential benefits of this technique by means of the XSBench miniapplication from the CESAR codesign project. The benefits include the ability to identify the optimal portion of data to place in a small scratchpad memory, leading to a performance improvement of more than 19% over nonassisted partitioning approaches in our proposed scratchpad-equipped compute node.