This document covers vectorization: the process of converting raw data such as text into numerical feature vectors that machine learning algorithms can consume. It describes the vector space model for text, in which each unique word is mapped to an index in a vector and the value at that index is the word's count in the document. It explains common text vectorization strategies, including bag-of-words, TF-IDF, and kernel hashing, and gives an overview of general vectorization techniques for nominal, ordinal, interval, and ratio attributes, along with feature engineering methods and the Canova tool.
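
As a rough illustration of the vector space model summarized above, the following minimal sketch builds a word-to-index vocabulary and produces count vectors, then reweights them with one common TF-IDF variant (raw count times log(N / document frequency)). The function names, the whitespace tokenizer, and the two sample sentences are assumptions made for this example; they are not taken from the document or from Canova.

```python
import math
from collections import Counter


def build_vocabulary(documents):
    """Map each unique token to a vector index, in order of first appearance."""
    vocab = {}
    for doc in documents:
        for token in doc.lower().split():  # naive whitespace tokenizer (assumption)
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def bag_of_words(doc, vocab):
    """Return a count vector: position vocab[w] holds how often w occurs in doc."""
    counts = Counter(doc.lower().split())
    vector = [0] * len(vocab)
    for token, count in counts.items():
        if token in vocab:  # tokens outside the vocabulary are ignored
            vector[vocab[token]] = count
    return vector


def tf_idf(count_vectors):
    """Reweight count vectors with tf * log(N / df), one of several TF-IDF variants."""
    n_docs = len(count_vectors)
    n_terms = len(count_vectors[0])
    # Document frequency: number of documents in which each term appears.
    df = [sum(1 for vec in count_vectors if vec[i] > 0) for i in range(n_terms)]
    return [
        [vec[i] * math.log(n_docs / df[i]) if df[i] else 0.0 for i in range(n_terms)]
        for vec in count_vectors
    ]


# Hypothetical sample corpus, used only for illustration.
docs = ["the cat sat on the mat", "the dog sat"]
vocab = build_vocabulary(docs)
count_vectors = [bag_of_words(d, vocab) for d in docs]
weighted_vectors = tf_idf(count_vectors)
```

With this sample corpus, "the" occupies index 0 and the first document's vector holds a 2 there. Because "the" and "sat" occur in both documents, their TF-IDF weight under this variant is zero, which shows how the reweighting discounts terms that do not help distinguish documents.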