The document proposes a new method for evaluating word vectors learned for multisense words. It constructs a dataset from the concept hierarchies of WordNet and BabelNet, which are used to identify a set of related words for each sense of a multisense word. It also proposes an evaluation metric that computes, for each sense, the precision of a word vector's nearest neighbors against that sense's related words, and then aggregates the per-sense scores. Both the dataset and the metric are designed for models that learn multiple vectors per word. Experiments indicate that the dataset evaluates sense vectors more appropriately than existing benchmarks such as SimLex-999. The method thus offers a way to examine specifically how well a model handles the multiple meanings of a word.
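
The metric described above can be illustrated with a minimal sketch. The summary does not give the exact formulation, so several details here are assumptions rather than the authors' method: cosine similarity for the nearest-neighbor search, a fixed neighborhood size `k`, scoring each sense by its best-matching learned vector, and an unweighted mean as the aggregation. All function names and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest_neighbors(query_vec, vocab_vectors, k, exclude=()):
    """Return the k vocabulary words most similar to query_vec."""
    scored = [
        (cosine(query_vec, vec), word)
        for word, vec in vocab_vectors.items()
        if word not in exclude
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [word for _, word in scored[:k]]


def sense_precision(neighbors, related_words):
    """Fraction of retrieved neighbors that are related words of one sense."""
    if not neighbors:
        return 0.0
    hits = sum(1 for word in neighbors if word in related_words)
    return hits / len(neighbors)


def evaluate_word(word, learned_vectors, related_by_sense, vocab_vectors, k=2):
    """Score one multisense word.

    Assumption: each gold sense is credited with the best precision achieved
    by any of the word's learned vectors, and the per-sense scores are
    averaged. The original work may match and aggregate differently.
    """
    per_sense = []
    for related_words in related_by_sense.values():
        best = 0.0
        for vec in learned_vectors:
            neighbors = nearest_neighbors(vec, vocab_vectors, k, exclude={word})
            best = max(best, sense_precision(neighbors, related_words))
        per_sense.append(best)
    return sum(per_sense) / len(per_sense)


# Toy data (made up for illustration, not taken from the dataset).
vocab = {
    "money": [1.0, 0.0],
    "loan": [0.9, 0.1],
    "river": [0.0, 1.0],
    "shore": [0.1, 0.9],
}
related = {
    "bank (finance)": {"money", "loan"},
    "bank (geography)": {"river", "shore"},
}
multi_sense_model = [[1.0, 0.05], [0.05, 1.0]]  # two vectors for "bank"
single_sense_model = [[1.0, 0.05]]              # one vector for "bank"

print(evaluate_word("bank", multi_sense_model, related, vocab))   # 1.0
print(evaluate_word("bank", single_sense_model, related, vocab))  # 0.5
```

In this toy setting the single-vector model is penalized because its one vector retrieves neighbors for only one of the two senses, which is the behavior a sense-aware benchmark is meant to expose.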