Naive Bayes
Best for: Simple text classification
How it works
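$$\hat{y}=\arg\max_y\ P(y)\prod_i P(x_i\mid y)$$

For text, Multinomial Naive Bayes treats the words of a document of class $y$ as draws from a class-specific multinomial distribution over the vocabulary, with smoothed word probabilities
$$P(w_i\mid y)=\frac{N_{iy}+\alpha}{N_y+\alpha V},$$
where $N_{iy}$ is the number of times word $w_i$ occurs in training documents of class $y$, $N_y=\sum_i N_{iy}$ is the total word count for class $y$, $V$ is the vocabulary size, and $\alpha>0$ is the smoothing parameter ($\alpha=1$ is Laplace smoothing). Prediction combines the prior $P(y)$ with these likelihoods under the conditional-independence assumption, as in the equation above. It is computed in log-space to avoid floating-point underflow, with $x_i$ the count of word $w_i$ in the document (the multinomial coefficient is dropped because it does not depend on $y$):
$$\hat{y}=\arg\max_y\Big[\log P(y)+\sum_i x_i\log P(w_i\mid y)\Big].$$
Although the independence assumption rarely holds for real text, the model is fast to train (a single counting pass), robust on small datasets, and a long-standing baseline for spam filtering and sentiment analysis.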
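The sketch below is a minimal from-scratch illustration of the smoothed estimate $P(w_i\mid y)$ and the log-space prediction rule. It assumes documents are already tokenized into lists of strings. The function names `train_multinomial_nb` and `predict` are hypothetical and do not come from any library. Words never seen in training are simply ignored at prediction time, which is an assumption of this sketch.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log P(y) and smoothed log P(w|y) from tokenized documents (hypothetical helper)."""
    vocab = {w for doc in docs for w in doc}
    V = len(vocab)
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # word_counts[y][w] plays the role of N_{iy}
    for doc, y in zip(docs, labels):
        word_counts[y].update(doc)
    log_prior = {y: math.log(c / len(labels)) for y, c in class_counts.items()}
    log_lik = {}
    for y in class_counts:
        N_y = sum(word_counts[y].values())  # total word count for class y
        denom = N_y + alpha * V
        log_lik[y] = {w: math.log((word_counts[y][w] + alpha) / denom) for w in vocab}
    return log_prior, log_lik, vocab

def predict(doc, log_prior, log_lik, vocab):
    """Return argmax_y [log P(y) + sum_i x_i log P(w_i|y)]; out-of-vocabulary words are skipped."""
    counts = Counter(w for w in doc if w in vocab)  # x_i: count of each known word
    scores = {
        y: log_prior[y] + sum(x * log_lik[y][w] for w, x in counts.items())
        for y in log_prior
    }
    return max(scores, key=scores.get)

# Toy spam/ham example (illustrative data, not from any real corpus).
train_docs = [
    "win cash now".split(),
    "cheap cash offer".split(),
    "meeting at noon".split(),
    "lunch meeting tomorrow".split(),
]
train_labels = ["spam", "spam", "ham", "ham"]
log_prior, log_lik, vocab = train_multinomial_nb(train_docs, train_labels)
print(predict("cash offer now".split(), log_prior, log_lik, vocab))   # spam
print(predict("meeting tomorrow".split(), log_prior, log_lik, vocab))  # ham
```

Summing log-probabilities instead of multiplying raw probabilities is what keeps long documents from underflowing to zero. For real use, scikit-learn's `MultinomialNB` (with its `alpha` smoothing parameter) implements the same model on word-count matrices.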
Common fields
Spam · sentiment · document labels