A bigram model is a language model that estimates the probability of a sequence of words by predicting each word from the single word that immediately precedes it. In other words, it models how likely a word “b” is to occur right after a word “a”.
Example:
Consider the following sentence:
- “I love machine learning.”
To build a bigram model from this sentence, we break it down into bigrams (pairs of consecutive words), as the short sketch after this list also shows:
- (“I”, “love”)
- (“love”, “machine”)
- (“machine”, “learning”)
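For instance, with simple whitespace tokenization and lowercasing (a simplifying assumption; real tokenizers handle punctuation more carefully), the bigrams can be extracted like this:

```python
sentence = "I love machine learning."
tokens = sentence.lower().rstrip(".").split()   # ['i', 'love', 'machine', 'learning']

# Pair each token with the token that follows it.
bigrams = list(zip(tokens, tokens[1:]))
print(bigrams)  # [('i', 'love'), ('love', 'machine'), ('machine', 'learning')]
```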
How the Bigram Model Works:
- Training:
- Suppose you have a corpus of text on which you want to train a bigram model. Training consists of counting how often each bigram appears in the corpus.
- For instance, in a large corpus, you might find that “I love” appears 100 times, “love machine” appears 50 times, and so on.
- Probability Calculation:
- The bigram model calculates the probability of a word given the previous word.
- For example, the probability of “love” given “I” would be:
- P(love | I) = Count(“I love”) / Count(“I”)
- If “I love” appears 100 times and “I” appears 200 times in the corpus, then:
- P(love | I) = 100 / 200 = 0.5
- Sentence Generation:
- To generate a new sentence, the model starts with an initial word and repeatedly uses the bigram probabilities to pick each next word (see the sketch below).
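Putting the three steps together, here is a minimal sketch in Python. The function names (`train_bigram_model`, `bigram_probability`, `generate`), the naive tokenizer, and the greedy choice of the most frequent next word are illustrative assumptions, not a fixed API:

```python
from collections import Counter

def tokenize(sentence):
    # Naive tokenization: lowercase and strip a trailing period.
    return sentence.lower().rstrip(".").split()

def train_bigram_model(sentences):
    # Training: count unigrams and adjacent word pairs in the corpus.
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_probability(unigrams, bigrams, prev_word, word):
    # P(word | prev_word) = Count(prev_word word) / Count(prev_word)
    if unigrams[prev_word] == 0:
        return 0.0
    return bigrams[(prev_word, word)] / unigrams[prev_word]

def generate(bigrams, start_word, max_words=10):
    # Sentence generation: greedily pick the most frequent follower
    # of the current word until no continuation exists.
    words = [start_word]
    for _ in range(max_words - 1):
        followers = {w2: c for (w1, w2), c in bigrams.items() if w1 == words[-1]}
        if not followers:
            break
        words.append(max(followers, key=followers.get))
    return " ".join(words)
```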
Practical Example:
Given a small corpus:
- “I love machine learning.”
- “I love coding.”
- “Coding is fun.”
Bigrams and their counts:
- (“I”, “love”): 2
- (“love”, “machine”): 1
- (“machine”, “learning”): 1
- (“love”, “coding”): 1
- (“coding”, “is”): 1
- (“is”, “fun”): 1
Using these counts, you can calculate the probability of each bigram, which the model then uses to predict or generate text; the sketch below reproduces these numbers.
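A self-contained sketch of this practical example (again assuming naive whitespace tokenization and lowercasing):

```python
from collections import Counter

corpus = ["I love machine learning.", "I love coding.", "Coding is fun."]

unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    tokens = sentence.lower().rstrip(".").split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

print(bigrams[("i", "love")])       # 2
print(bigrams[("love", "coding")])  # 1

# P(love | i) = Count(i love) / Count(i) = 2 / 2 = 1.0
print(bigrams[("i", "love")] / unigrams["i"])           # 1.0
# P(machine | love) = Count(love machine) / Count(love) = 1 / 2 = 0.5
print(bigrams[("love", "machine")] / unigrams["love"])  # 0.5
```

Note that because this toy corpus is so small, “love” always follows “i”, giving P(love | i) = 1.0; on a realistic corpus these probabilities would be far less extreme.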