Understanding Word2Vec: A Dive into Vectorized Word Representations
In the realm of Natural Language Processing (NLP), representing words as vectors has become a cornerstone for many applications. One powerful technique for this purpose is Word2Vec, which maps each word to a dense vector, commonly on the order of 100–300 dimensions. The primary motivation behind using Word2Vec lies in its ability to preserve semantic relationships between words, accommodate new words added to the vocabulary, and improve results in a wide array of deep learning applications.
Why Word2Vec?
1. Relationship Preservation
Word2Vec excels at capturing semantic relationships between words. For instance, in the sentences “The kid said he would be a footballer” and “The child said he would be a footballer,” Word2Vec recognizes that “kid” and “child” appear in similar contexts and represents them with vectors that point in similar directions.
2. Dynamic Vocabulary
It accommodates the addition of new words to the vocabulary. This adaptability is crucial in real-world applications where language is constantly evolving.
3. Improved Results
Compared with sparse representations such as one-hot encodings or bag-of-words counts, Word2Vec’s dense vectors typically yield better performance in downstream deep learning tasks, making it a valuable tool for natural language understanding.
How Relationships are Formed
Word2Vec builds relationships by ensuring that words with similar contexts have similar embeddings. Two key algorithms, Continuous Bag of Words (CBOW) and Skip Gram, are employed for this purpose.
Continuous Bag of Words (CBOW)
Working of CBOW:
Let’s walk through the Continuous Bag of Words (CBOW) model using the example sentence “Liverpool FC are a football team.”
CBOW Example:
Encoding Words:
Encode each word in the sentence as a one-hot vector. The example sentence has six distinct words, so each one-hot vector has six dimensions (a short code sketch follows the list).
Example:
- “Liverpool”: [1, 0, 0, 0, 0, 0]
- “FC”: [0, 1, 0, 0, 0, 0]
- “are”: [0, 0, 1, 0, 0, 0]
- “a”: [0, 0, 0, 1, 0, 0]
- “football”: [0, 0, 0, 0, 1, 0]
- “team”: [0, 0, 0, 0, 0, 1]
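To make the encoding concrete, here is a minimal Python sketch of this step (the variable and function names are just for illustration):

```python
import numpy as np

# Build the vocabulary from the example sentence, keeping first-appearance order
# so that "Liverpool" gets index 0, "FC" index 1, and so on.
tokens = "Liverpool FC are a football team".split()
vocab = list(dict.fromkeys(tokens))
word_to_index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Return a one-hot vector of length len(vocab) for the given word."""
    vec = np.zeros(len(vocab))
    vec[word_to_index[word]] = 1.0
    return vec

print(one_hot("FC"))  # [0. 1. 0. 0. 0. 0.]
```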
Selecting Window Size:
Choose a window size for iterating over the sentence. Let’s consider a window size of 2 for this example, meaning up to two context words on each side of the center word; the windows this produces are sketched in the code below.
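The sliding window itself is easy to express in code. The helper below (a hypothetical name, used only for this walkthrough) lists the (context words, center word) pairs produced by a window size of 2:

```python
def cbow_pairs(tokens, window=2):
    """Yield (context_words, center_word) pairs for a sliding window."""
    for i, center in enumerate(tokens):
        # take up to `window` words on each side of the center word
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        yield context, center

for context, center in cbow_pairs("Liverpool FC are a football team".split()):
    print(context, "->", center)
# ['FC', 'are'] -> Liverpool
# ['Liverpool', 'are', 'a'] -> FC
# ['Liverpool', 'FC', 'a', 'football'] -> are
# ...
```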
Neural Network Prediction:
For each word in the sentence, use the surrounding context words to predict the target word.
Example: For the center word “FC,” use the context words “Liverpool,” “are,” and “a” (the words within two positions of it) to predict it.
Input: [1, 0, 0, 0, 0, 0] + [0, 0, 1, 0, 0, 0] + [0, 0, 0, 1, 0, 0] --> Output: [0, 1, 0, 0, 0, 0] (Predict "FC")
The network’s output is then compared with the one-hot vector of the actual target word.
Weight Update:
Update the weights in the neural network using backpropagation and the error between the predicted and actual target word.
Sliding Window:
Slide the window so that the next word (e.g., “are”) becomes the center word, and repeat the process.
Repeat Iterations:
Continue iterating over the entire sentence, updating weights, and generating vectors for each word in the vocabulary.
In summary, CBOW predicts the target word (center word) based on the surrounding context words. The model learns to associate the context with the target word, updating its weights to improve predictions. This process is repeated iteratively, resulting in vector representations for each word in the vocabulary.
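Putting the pieces together, here is a toy CBOW training loop in NumPy. It is a minimal sketch under simplifying assumptions: a plain softmax over the whole vocabulary (real Word2Vec implementations speed this up with negative sampling or hierarchical softmax), a tiny embedding size, and illustrative names and hyperparameters.

```python
import numpy as np

tokens = "Liverpool FC are a football team".split()
vocab = list(dict.fromkeys(tokens))
word_to_index = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 8                          # vocabulary size, embedding dimension

rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, D))     # input weights = the word embeddings
W_out = rng.normal(scale=0.1, size=(D, V))    # output weights used for prediction

def softmax(scores):
    e = np.exp(scores - scores.max())
    return e / e.sum()

def cbow_pairs(tokens, window=2):
    for i, center in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        yield [word_to_index[w] for w in context], word_to_index[center]

lr = 0.05
for epoch in range(100):
    for context_ids, center_id in cbow_pairs(tokens):
        h = W_in[context_ids].mean(axis=0)        # average the context embeddings
        probs = softmax(h @ W_out)                # predicted distribution over the vocabulary
        grad = probs.copy()
        grad[center_id] -= 1.0                    # gradient of the cross-entropy loss w.r.t. the scores
        dh = W_out @ grad                         # gradient w.r.t. the hidden layer
        W_out -= lr * np.outer(h, grad)           # backpropagate into the output weights
        W_in[context_ids] -= lr * dh / len(context_ids)  # ...and into each context embedding

print(W_in[word_to_index["FC"]])                  # the learned 8-dimensional vector for "FC"
```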
Skip-Gram
In contrast to CBOW, Skip-Gram predicts context words from a given target. It follows a similar process of window iteration, neural network prediction, and weight updates.
Working of Skip-Gram:
Let’s delve into the Skip-Gram model using the example “Liverpool FC are a football team.”
Skip-Gram Example:
Encoding Words:
Start by encoding each word as a one-hot vector. Each word in the sentence gets a unique six-dimensional representation, exactly as in the CBOW example.
Example:
- “Liverpool”: [1, 0, 0, 0, 0, 0]
- “FC”: [0, 1, 0, 0, 0, 0]
- “are”: [0, 0, 1, 0, 0, 0]
- “a”: [0, 0, 0, 1, 0, 0]
- “football”: [0, 0, 0, 0, 1, 0]
- “team”: [0, 0, 0, 0, 0, 1]
Selecting Window Size:
Similar to CBOW, choose a window size for iterating over the sentence. Again, let’s use a window size of 2, meaning up to two context words on each side of the target word.
Neural Network Prediction:
For each word in the sentence, use the selected word as the target and predict the surrounding context words.
Example:
For the target word “FC,” predict the context words “Liverpool,” “are,” and “a.”
Input: [0, 1, 0, 0, 0, 0] --> Output: [1, 0, 0, 0, 0, 0] (Predict "Liverpool")
Input: [0, 1, 0, 0, 0, 0] --> Output: [0, 0, 1, 0, 0, 0] (Predict "are")
Input: [0, 1, 0, 0, 0, 0] --> Output: [0, 0, 0, 1, 0, 0] (Predict "a")
The network’s outputs are then compared with the one-hot vectors of the actual context words.
Weight Update:
Update the weights in the neural network using backpropagation and the error between the predicted and actual context words.
Sliding Window:
Slide the window so that the next word (e.g., “are”) becomes the target word, and repeat the process.
Repeat Iterations:
Continue iterating over the entire sentence, updating weights, and generating vectors for each word in the vocabulary.
In essence, Skip-Gram, unlike CBOW, predicts context words given a target word. It aims to learn the relationships between words by training the model to predict the context based on a given center word. This process is repeated throughout the training data, refining the vector representations for each word in the vocabulary.
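The key structural difference from CBOW is the set of training pairs: each (target, context) pair becomes its own prediction. A small sketch of that pair generation (again with illustrative names) is shown below; the training step then mirrors the CBOW sketch above, except that the hidden layer is simply the target word’s embedding and one prediction is made per context word.

```python
def skipgram_pairs(tokens, window=2):
    """Yield (target_word, context_word) pairs: one pair per context word."""
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        for ctx in context:
            yield target, ctx

for target, ctx in skipgram_pairs("Liverpool FC are a football team".split()):
    print(target, "->", ctx)
# Liverpool -> FC
# Liverpool -> are
# FC -> Liverpool
# FC -> are
# FC -> a
# ...
```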
Code:
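In practice, both variants are readily available in libraries such as gensim. The snippet below is a minimal sketch assuming gensim 4.x; the toy corpus and parameter values are placeholders, and a real application would train on a much larger collection of tokenized sentences.

```python
from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens.
sentences = [
    "Liverpool FC are a football team".split(),
    "The kid said he would be a footballer".split(),
    "The child said he would be a footballer".split(),
]

# sg=0 selects CBOW, sg=1 selects Skip-Gram.
model = Word2Vec(sentences, vector_size=100, window=2, min_count=1, sg=0, epochs=50)

print(model.wv["kid"])                        # the learned 100-dimensional vector for "kid"
print(model.wv.similarity("kid", "child"))    # cosine similarity (meaningful only with real data)

# The vocabulary can later be extended with new sentences and further training,
# which is what supports the "dynamic vocabulary" point made earlier:
new_sentences = [["Everton", "are", "another", "football", "team"]]
model.build_vocab(new_sentences, update=True)
model.train(new_sentences, total_examples=len(new_sentences), epochs=model.epochs)
```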
In conclusion, Word2Vec, through algorithms like CBOW and Skip-Gram, offers a robust approach to represent words in a vectorized form, capturing intricate semantic relationships. By understanding its working principles and optimizing parameters, practitioners can harness the power of Word2Vec for various natural language processing tasks.