Latent Dirichlet Allocation
Latent Dirichlet Allocation (LDA) is a topic model that expresses the topics of each document in a corpus as a probability distribution. It is an unsupervised learning algorithm: it needs no manually annotated training set, only a collection of documents and a specified number of topics K. For each topic, it also yields a set of words that describe that topic.
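To make the inputs and outputs concrete, here is a minimal sketch that fits LDA to a tiny corpus with a collapsed Gibbs sampler. Gibbs sampling is a commonly used inference method swapped in here for brevity; it is not the variational procedure of the original paper. The function name fit_lda_gibbs, the toy documents, and the hyperparameter values alpha and beta are all assumptions made for illustration.

```python
import random

def fit_lda_gibbs(docs, K, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative, not optimized)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    n_dk = [[0] * K for _ in docs]      # topic counts per document
    n_kw = [[0] * V for _ in range(K)]  # word counts per topic
    n_k = [0] * K                       # total token count per topic

    def adjust(d, k, v, delta):
        n_dk[d][k] += delta
        n_kw[k][v] += delta
        n_k[k] += delta

    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([rng.randrange(K) for _ in doc])
        for w, k in zip(doc, z[d]):
            adjust(d, k, word_id[w], +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                adjust(d, z[d][i], v, -1)  # remove the token's current topic
                # Unnormalized conditional probability of each topic.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                z[d][i] = rng.choices(range(K), weights=weights)[0]
                adjust(d, z[d][i], v, +1)

    # Smoothed per-document topic distributions.
    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    # The three most frequent words assigned to each topic.
    top_words = [
        [vocab[v] for v in sorted(range(V), key=lambda v: -n_kw[k][v])[:3]]
        for k in range(K)
    ]
    return theta, top_words

# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    "cat dog pet cat dog".split(),
    "dog pet cat pet".split(),
    "stock market trade stock".split(),
    "market trade stock price".split(),
]
K = 2
theta, top_words = fit_lda_gibbs(docs, K)
for d, dist in enumerate(theta):
    print(d, [round(p, 2) for p in dist])
print(top_words)
```

The only inputs are the documents and K. Each row of theta is one document's distribution over the K topics, and top_words lists a few words describing each topic. The topic indices themselves carry no meaning and may swap between runs with different seeds.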
LDA was first proposed by David M. Blei, Andrew Y. Ng, and Michael I. Jordan in 2003. It is used in text mining tasks such as topic identification, text classification, and text similarity calculation.
LDA is a typical bag-of-words model: a document is treated as a collection of words, with no ordering among them. A document can contain multiple topics, and each word in the document is generated by one of those topics.
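The generative view can be sketched in a few lines: draw a topic mixture for the document, then for each word position draw a topic from that mixture and a word from that topic. The vocabulary, the two hand-written topic-word distributions, and the value of alpha below are assumptions chosen for illustration. In the full model the topic-word distributions are themselves drawn from a Dirichlet prior rather than fixed by hand.

```python
import random
from collections import Counter

def sample_dirichlet(rng, concentration):
    """Draw from a Dirichlet distribution by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in concentration]
    total = sum(draws)
    return [x / total for x in draws]

def generate_document(rng, topic_word, vocab, alpha, length):
    """Generate one document following the LDA generative story."""
    K = len(topic_word)
    theta = sample_dirichlet(rng, [alpha] * K)  # document's topic mixture
    words = []
    for _ in range(length):
        k = rng.choices(range(K), weights=theta)[0]          # pick a topic
        w = rng.choices(vocab, weights=topic_word[k])[0]     # pick a word from it
        words.append(w)
    return theta, words

rng = random.Random(0)
# Hypothetical vocabulary and two hand-written topics (rows sum to 1).
vocab = ["cat", "dog", "pet", "stock", "market", "trade"]
topic_word = [
    [0.4, 0.4, 0.2, 0.0, 0.0, 0.0],  # an "animals" topic
    [0.0, 0.0, 0.0, 0.4, 0.3, 0.3],  # a "finance" topic
]
theta, words = generate_document(rng, topic_word, vocab, alpha=0.5, length=12)
print([round(p, 2) for p in theta])
print(words)

# Bag-of-words: shuffling the words leaves the representation unchanged.
shuffled = words[:]
rng.shuffle(shuffled)
same_bag = Counter(words) == Counter(shuffled)
print(same_bag)
```

Because the model only ever sees word counts, the original and shuffled documents are indistinguishable to it, which is what "no ordering among words" means in practice.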