My Blog.

Bag of Words (BoW)

The Bag of Words model is a fundamental method for text representation. It converts each document into a vector of word counts, disregarding grammar and word order.

  • Vocabulary Creation:
    • A vocabulary of all unique words in the text corpus is created.
  • Frequency Vector:
    • Each document is represented as a vector with one entry per vocabulary word, giving the number of times that word occurs in the document.
  • Example:
    • Corpus: ["I love data science", "data science is amazing"]
    • Vocabulary (in order of first appearance): ["I", "love", "data", "science", "is", "amazing"]
    • Frequency Vectors:
      • Document 1: [1, 1, 1, 1, 0, 0]
      • Document 2: [0, 0, 1, 1, 1, 1]
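
The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: it assumes words are separated by whitespace, treats words as case-sensitive, and orders the vocabulary by first appearance so that it matches the example. The function names build_vocabulary and vectorize are made up for this sketch.

```python
from collections import Counter


def build_vocabulary(corpus):
    """Collect the unique words of the corpus in order of first appearance."""
    vocabulary = []
    seen = set()
    for document in corpus:
        for word in document.split():  # assumption: whitespace tokenization
            if word not in seen:
                seen.add(word)
                vocabulary.append(word)
    return vocabulary


def vectorize(document, vocabulary):
    """Count how often each vocabulary word occurs in the document."""
    counts = Counter(document.split())
    # Counter returns 0 for words that do not occur in the document.
    return [counts[word] for word in vocabulary]


corpus = ["I love data science", "data science is amazing"]
vocabulary = build_vocabulary(corpus)
vectors = [vectorize(document, vocabulary) for document in corpus]

print(vocabulary)  # ['I', 'love', 'data', 'science', 'is', 'amazing']
print(vectors)     # [[1, 1, 1, 1, 0, 0], [0, 0, 1, 1, 1, 1]]
```

In practice the text is usually lowercased and stripped of punctuation before counting. Library vectorizers often apply such steps by default and may order the vocabulary differently, so their output will not necessarily match the vectors shown here.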