Bag of Words

A basic semantic representation of text, extracted from page contents, that records how often each word occurs and ignores word order.

The bag of words model is a basic semantic representation extracted from page contents. It treats text as an unordered collection of words and uses the frequency of each word as a feature, so word order and grammar are discarded. Although simple, this representation can be processed by machine learning models, such as those used to calculate an information gain score. Along with embeddings and histograms, a bag of words serves as an input feature that informs the algorithm about the textual content of a document.
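
As an illustration, the idea of counting words without regard to order can be sketched in a few lines of Python. This is a minimal sketch, not a description of any particular system: the function names `bag_of_words` and `to_vector` are hypothetical, and the tokenization rule (lowercasing, then keeping runs of letters, digits, and apostrophes) is an assumption chosen for brevity.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Return a Counter mapping each lowercased word to its frequency.

    Tokenization here is an assumption: lowercase the text and keep
    runs of ASCII letters, digits, and apostrophes.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)


def to_vector(bag, vocabulary):
    """Turn a bag into a fixed-length list of counts, one per vocabulary word."""
    return [bag[word] for word in vocabulary]


doc_a = "The cat sat on the mat."
doc_b = "On the mat the cat sat."

bag_a = bag_of_words(doc_a)
bag_b = bag_of_words(doc_b)

# A shared, sorted vocabulary gives every document the same feature order.
vocabulary = sorted(set(bag_a) | set(bag_b))
vec_a = to_vector(bag_a, vocabulary)

print(bag_a == bag_b)  # word order differs, but the bags are identical
print(vocabulary)
print(vec_a)
```

The two example sentences contain the same words in a different order, so they produce the same bag. That is the defining trade-off of the representation: it is cheap to compute and easy to feed to a model as a count vector, but it cannot distinguish texts that differ only in word order.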