TF-IDF
TF-IDF is a statistical measure used to evaluate the importance of a word in a document relative to a corpus.
Term Frequency (TF):
- Measures how frequently a term appears in a document.
- Formula: \( \text{TF}(t, d) = \frac{\text{Frequency of term } t \text{ in document } d}{\text{Total terms in document } d} \)
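
The TF formula above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `term_frequency`, the whitespace tokenization, and the choice to return 0 for an empty document are all assumptions made for the example.

```python
from collections import Counter


def term_frequency(term, document_tokens):
    """TF(t, d): occurrences of `term` divided by the total number of tokens in the document."""
    if not document_tokens:
        # Assumption: an empty document yields TF = 0 instead of a division error.
        return 0.0
    counts = Counter(document_tokens)
    return counts[term] / len(document_tokens)


# Hypothetical example document, tokenized by splitting on whitespace.
doc = "the cat sat on the mat".split()
print(term_frequency("the", doc))  # "the" occurs 2 times out of 6 tokens
print(term_frequency("dog", doc))  # an absent term has frequency 0
```

Dividing by the document length keeps long documents from scoring higher simply because they contain more words.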
Inverse Document Frequency (IDF):
- Measures how rare, and therefore how informative, a term is across the entire corpus; terms that appear in many documents receive a lower weight.
- Formula: \( \text{IDF}(t) = \log \left( \frac{\text{Total number of documents}}{\text{Number of documents containing term } t} \right) \)
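
A possible sketch of the IDF formula follows. The formula does not fix the base of the logarithm, so the natural log used here is an assumption, as are the function name `inverse_document_frequency`, the toy corpus, and the decision to raise an error when no document contains the term (the ratio is undefined in that case).

```python
import math


def inverse_document_frequency(term, corpus):
    """IDF(t) = log(N / df), where N is the number of documents and df the number containing `term`."""
    n_docs = len(corpus)
    doc_freq = sum(1 for doc in corpus if term in doc)
    if doc_freq == 0:
        # Assumption: the unsmoothed formula is undefined here, so fail loudly.
        raise ValueError(f"term {term!r} does not appear in any document")
    return math.log(n_docs / doc_freq)  # natural log (assumed base)


# Hypothetical corpus: each document is a list of whitespace-separated tokens.
corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
]
print(inverse_document_frequency("the", corpus))  # in every document, so log(3/3) = 0.0
print(inverse_document_frequency("mat", corpus))  # in one document, so log(3/1)
```

A term found in every document gets an IDF of 0, which is how the measure discounts very common words.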
TF-IDF Score:
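- Combines TF and IDF into a single weight for a term in a specific document: the score is high when the term is frequent in that document but rare across the corpus.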
- Formula: \( \text{TF-IDF}(t, d) = \text{TF}(t, d) \times \text{IDF}(t) \)
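
Putting the two pieces together, the score could be computed roughly as follows. The function name `tf_idf`, the toy corpus, the natural logarithm, and the choice to return 0 for a term found in no document are assumptions for illustration; libraries typically apply smoothing and normalization variants that this sketch omits.

```python
import math
from collections import Counter


def tf_idf(term, document_tokens, corpus):
    """TF-IDF(t, d) = TF(t, d) * IDF(t), using the unsmoothed definitions."""
    if not document_tokens:
        return 0.0  # assumption: empty documents score 0
    tf = Counter(document_tokens)[term] / len(document_tokens)
    doc_freq = sum(1 for doc in corpus if term in doc)
    if doc_freq == 0:
        return 0.0  # assumption: a term found in no document scores 0
    idf = math.log(len(corpus) / doc_freq)  # natural log (assumed base)
    return tf * idf


# Hypothetical corpus: each document is a list of whitespace-separated tokens.
corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
]

# Score a few terms against the first document.
for term in ("the", "cat", "mat"):
    print(term, tf_idf(term, corpus[0], corpus))
```

In this toy corpus "the" scores 0 because it appears in every document, while "mat" scores highest of the three because it occurs only in the first document.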