Bag of Words (BoW)
The Bag of Words model is a fundamental method for text representation. It converts each document into a fixed-length vector of word counts, one entry per vocabulary word, disregarding grammar and word order.
- Vocabulary Creation: A vocabulary of all unique words in the text corpus is created, with each word assigned a fixed position.
- Frequency Vector: Each document is represented as a vector whose i-th entry is the number of times the i-th vocabulary word appears in that document.
- Example:
  - Corpus: ["I love data science", "data science is amazing"]
  - Vocabulary: ["I", "love", "data", "science", "is", "amazing"]
  - Frequency Vectors:
    - Document 1: [1, 1, 1, 1, 0, 0]
    - Document 2: [0, 0, 1, 1, 1, 1]
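
The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes words are separated by whitespace, keeps case as written, and orders the vocabulary by first appearance so that it matches the example. The function names `build_vocabulary` and `vectorize` are made up for this sketch.

```python
from collections import Counter


def build_vocabulary(corpus):
    """Collect the unique words of the corpus in order of first appearance."""
    vocabulary = []
    seen = set()
    for document in corpus:
        # Assumption: plain whitespace tokenization, no lowercasing or punctuation handling.
        for word in document.split():
            if word not in seen:
                seen.add(word)
                vocabulary.append(word)
    return vocabulary


def vectorize(document, vocabulary):
    """Return the count of each vocabulary word in the document."""
    counts = Counter(document.split())
    # Counter returns 0 for words that do not occur in the document.
    return [counts[word] for word in vocabulary]


corpus = ["I love data science", "data science is amazing"]
vocabulary = build_vocabulary(corpus)
vectors = [vectorize(document, vocabulary) for document in corpus]

print(vocabulary)  # ['I', 'love', 'data', 'science', 'is', 'amazing']
print(vectors)     # [[1, 1, 1, 1, 0, 0], [0, 0, 1, 1, 1, 1]]
```

Library implementations often make different tokenization choices. For example, scikit-learn's `CountVectorizer` lowercases text and ignores single-character tokens by default, so it would drop "I" from this vocabulary.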