Chapter 09

NLP & Text Mining

Extract meaning, structure, and signals from natural-language text.

The methods below range from modern Transformer-based LLMs and embedding-based retrieval to fast classical baselines such as TF-IDF and Naive Bayes, which remain useful for cheap, explainable classification.

  • Use LLMs/Transformers when output quality matters most; they suit most modern NLP tasks.
  • Use TF-IDF plus Logistic Regression/Linear SVM for cheap, fast, explainable text classification.
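To make the second recommendation concrete, here is a minimal sketch of a TF-IDF plus logistic-regression classifier written in standard-library Python. Everything in it is an illustrative assumption rather than a reference implementation: the helper names (`tokenize`, `fit_idf`, `tfidf`, `train_logreg`, `predict_proba`), the whitespace tokenizer, the smoothed IDF variant, the learning rate, and the six-message toy corpus are all made up for the example. In practice a library such as scikit-learn (`TfidfVectorizer` with `LogisticRegression` or `LinearSVC`) would normally do this work.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenizer; an assumption made for this sketch.
    return text.lower().split()

def fit_idf(docs):
    # Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

def tfidf(text, idf):
    # Sparse, L2-normalised TF-IDF vector; terms unseen in training are dropped.
    tf = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def train_logreg(vectors, labels, epochs=200, lr=0.5):
    # Plain stochastic gradient descent on the logistic loss, no regularisation.
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            z = bias + sum(weights.get(t, 0.0) * v for t, v in vec.items())
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y
            for t, v in vec.items():
                weights[t] = weights.get(t, 0.0) - lr * err * v
            bias -= lr * err
    return weights, bias

def predict_proba(text, idf, weights, bias):
    # Probability of the positive class (here: spam).
    vec = tfidf(text, idf)
    z = bias + sum(weights.get(t, 0.0) * v for t, v in vec.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical toy corpus: 1 = spam, 0 = not spam.
train_texts = [
    "win a free prize now",
    "free money claim your prize",
    "cheap pills free offer",
    "meeting agenda for monday",
    "please review the project report",
    "lunch on monday with the team",
]
train_labels = [1, 1, 1, 0, 0, 0]

idf = fit_idf(train_texts)
weights, bias = train_logreg([tfidf(t, idf) for t in train_texts], train_labels)

# Explainability: one signed weight per term, so the strongest cues can be listed.
spam_cues = sorted(weights, key=weights.get, reverse=True)[:3]
ham_cues = sorted(weights, key=weights.get)[:3]
```

The "explainable" part is the `weights` dictionary: each term carries one signed weight, so a prediction can be traced back to the words that pushed it toward or away from the positive class. Calling `predict_proba("claim your free prize", idf, weights, bias)` returns the model's spam probability for a new message.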
#  Algorithm                    Best for                       Common fields
-  ---------------------------  -----------------------------  ----------------------------------------------------
1  Transformer Models / LLMs    Most modern NLP tasks          Chatbots, search, summarization, translation, coding
2  Embeddings + Vector Search   Semantic search and retrieval  RAG, document search, recommendations
3  TF-IDF + Linear Models       Fast classical NLP baseline    Spam, legal search, support-ticket routing
4  Naive Bayes                  Simple text classification     Spam, sentiment, document labels
5  Topic Modeling: LDA, NMF     Discovering themes in text     Research, customer feedback, news analysis
6  CRF / HMM                    Sequence labeling, older NLP   Named entity recognition, POS tagging
7  Word2Vec / GloVe / FastText  Static word embeddings         Legacy NLP, semantic similarity
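
As an illustration of the "Embeddings + Vector Search" entry, the sketch below ranks documents by cosine similarity between a query vector and stored document vectors. It is a toy under stated assumptions: the three-dimensional vectors, the document ids, and the helper names `cosine` and `top_k` are invented for the example. A real system would obtain its embeddings from a trained model and would usually query an approximate nearest-neighbour index instead of scanning every vector.

```python
import math

def cosine(a, b):
    # Cosine similarity of two equal-length vectors; 0.0 if either is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec, index, k=2):
    # Brute-force search: score every stored vector, return the k best (id, score) pairs.
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:k]]

# Hypothetical hand-written "embeddings"; real ones have hundreds of dimensions.
index = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "password-reset": [0.0, 0.1, 0.9],
}

# Stand-in for the embedding of a question such as "how do I get my money back".
query = [0.8, 0.2, 0.1]
results = top_k(query, index, k=2)
```

Here `results` lists `refund-policy` first and `shipping-times` second. In a RAG setup, the top-ranked documents would then be passed to the LLM as context.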