
TF-IDF

TF-IDF (term frequency-inverse document frequency) is a statistical measure used to evaluate how important a word is to a document within a collection of documents (a corpus).

  • Term Frequency (TF):

    • Measures how frequently a term appears in a document.
    • Formula: \( \text{TF}(t, d) = \frac{\text{Frequency of term } t \text{ in document } d}{\text{Total terms in document } d} \)
  • Inverse Document Frequency (IDF):

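    • Measures how rare, and therefore how informative, a term is across the entire corpus: terms that appear in many documents receive a low IDF.
    • Formula: \( \text{IDF}(t) = \log \left( \frac{\text{Total number of documents}}{\text{Number of documents containing term } t} \right) \)
    • The logarithm base varies between sources (natural log and base 10 are both common). The formula is undefined for a term that appears in no document, so implementations often apply smoothing, such as adding 1 to the document counts.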
  • TF-IDF Score:

    • Combines TF and IDF to score how important term t is to document d: the score is high when t appears often in d but in few documents across the corpus.
    • Formula: \( \text{TF-IDF}(t, d) = \text{TF}(t, d) \times \text{IDF}(t) \)
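The three formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes documents are already tokenized into lowercase word lists and uses the natural logarithm for IDF. The toy `corpus` and the helper names `tf`, `idf`, and `tf_idf` are hypothetical and do not come from any library.

```python
import math

# Hypothetical toy corpus: each document is pre-tokenized into lowercase words.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs are pets".split(),
]


def tf(term: str, doc: list[str]) -> float:
    """TF(t, d): occurrences of `term` divided by the total terms in `doc`."""
    return doc.count(term) / len(doc)


def idf(term: str, docs: list[list[str]]) -> float:
    """IDF(t): log of (total documents / documents containing `term`).

    Uses the natural log (an assumption; the base varies between sources).
    """
    n_containing = sum(1 for doc in docs if term in doc)
    if n_containing == 0:
        # The plain formula divides by zero here; smoothing variants avoid this.
        raise ValueError(f"term {term!r} does not occur in the corpus")
    return math.log(len(docs) / n_containing)


def tf_idf(term: str, doc: list[str], docs: list[list[str]]) -> float:
    """TF-IDF(t, d): the product of TF(t, d) and IDF(t)."""
    return tf(term, doc) * idf(term, docs)


if __name__ == "__main__":
    for term in ["the", "cat", "sat"]:
        print(term, round(tf_idf(term, corpus[0], corpus), 4))
    # Prints:
    # the 0.1352
    # cat 0.1831
    # sat 0.0676
```

Even though "the" appears twice in the first document, it also appears in two of the three documents, so its IDF is low and "cat" (which appears only in the first document) ends up with the higher score. Libraries such as scikit-learn's `TfidfVectorizer` use smoothed IDF and vector normalization by default, so their numbers will differ from this plain formula.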
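As a worked example with hypothetical numbers (natural log assumed): take a corpus of 4 documents and a document \(d\) of 100 terms in which "the" occurs 5 times and "graph" occurs 3 times. "the" appears in all 4 documents, while "graph" appears only in \(d\).

```latex
\begin{aligned}
\text{TF-IDF}(\text{the}, d)   &= \tfrac{5}{100} \times \log\tfrac{4}{4} = 0.05 \times 0 = 0 \\
\text{TF-IDF}(\text{graph}, d) &= \tfrac{3}{100} \times \log\tfrac{4}{1} \approx 0.03 \times 1.386 \approx 0.0416
\end{aligned}
```

A term that occurs in every document gets an IDF of zero, so its TF-IDF is zero no matter how often it appears in \(d\).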