Semantic representation is a broad term encompassing the various ways that the meaning and context of a document or piece of text are quantified into a numerical format for computational processing. When machine learning models calculate an Information Gain score, they rely on several data points, one of which is the semantic representation of the data extracted from relevant pages.
There are several types of semantic representation referenced in the sources. The most advanced are embeddings (also called semantic feature vectors), which capture the contextual meaning of a document. Simpler representations also exist, such as a bag of words (a frequency count of terms that ignores word order) or a histogram generated from words and phrases. Extracted entity data is itself a form of salient information that can be fed to ML models as input.
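To make the distinction concrete, here is a minimal Python sketch contrasting a bag of words with a dense feature vector. The `toy_embedding` helper is my own illustrative stand-in: it simply hashes term counts into a fixed-size vector, whereas a production system would use a trained neural encoder to produce true embeddings that capture context.

```python
import hashlib
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-alphanumeric characters.
    return [t for t in re.split(r"\W+", text.lower()) if t]


def bag_of_words(text: str) -> Counter:
    # Simple representation: frequency count of terms, ignoring word order.
    return Counter(tokenize(text))


def toy_embedding(text: str, dims: int = 64) -> list[float]:
    # Stand-in for a learned semantic feature vector: term counts are hashed
    # into a fixed-size dense vector and normalized. A real system would use
    # a neural encoder trained to capture contextual meaning.
    vec = [0.0] * dims
    for term, count in bag_of_words(text).items():
        idx = int(hashlib.md5(term.encode()).hexdigest(), 16) % dims
        vec[idx] += count
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]


doc = "Semantic representations turn the meaning of text into numbers."
print(bag_of_words(doc).most_common(3))   # top term frequencies
print(toy_embedding(doc)[:8])             # first few dimensions of the vector
```

The bag of words keeps only counts, so two documents that use different wording for the same idea look unrelated; a learned embedding is what closes that gap.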
The selection and processing of these representations are vital steps in ensuring the accuracy and relevance of subsequent ML operations. For instance, representing each document as a semantic feature vector allows a neural network to compare new documents against the documents a user has already viewed and estimate how much genuinely new information each one adds, thereby generating a personalized information gain score.
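The sources describe the goal (scoring novelty against what a user has already viewed) rather than an exact formula, so the sketch below uses one plausible heuristic of my own: novelty as one minus the maximum cosine similarity between a candidate document's vector and the vectors of previously viewed documents. The vectors here are made-up toy values standing in for real embeddings.

```python
def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two vectors; 0.0 if either is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0


def novelty_score(candidate: list[float], viewed: list[list[float]]) -> float:
    # Assumed heuristic: a document is novel to the extent that it is NOT
    # similar to anything the user has already seen.
    if not viewed:
        return 1.0
    return 1.0 - max(cosine(candidate, v) for v in viewed)


# Toy feature vectors standing in for real document embeddings.
viewed_docs = {
    "intro-to-embeddings": [0.90, 0.10, 0.05, 0.40],
    "bag-of-words-basics": [0.10, 0.80, 0.30, 0.05],
}
candidate_docs = {
    "embeddings-rehash": [0.85, 0.15, 0.05, 0.35],        # overlaps heavily with viewed content
    "entity-extraction-guide": [0.05, 0.10, 0.90, 0.20],  # covers mostly new ground
}

viewed_vecs = list(viewed_docs.values())
for name, vec in sorted(
    candidate_docs.items(),
    key=lambda item: novelty_score(item[1], viewed_vecs),
    reverse=True,
):
    print(f"{name}: novelty {novelty_score(vec, viewed_vecs):.2f}")
```

A candidate that closely matches something already read scores near zero, while one that covers new ground scores near one, which is exactly the kind of signal an information gain model needs to rank results for a specific user.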
