A quick dive into Language Models.
- Yeshwanth G
- Jun 29, 2023
- 3 min read
Updated: Jul 11, 2023

Aim: To understand the intuition behind, and implement, NLP modelling techniques for understanding human language.
What is NLP?
So, Natural Language Processing (NLP) is a subdomain of Artificial Intelligence where machines learn the natural languages that humans speak, which facilitates interaction between computers and humans.
NLP encompasses a wide range of tasks, like:
Text Classification: Categorizing and assigning labels to text documents based on their content, such as sentiment analysis, topic classification, and spam detection.
Named Entity Recognition (NER): Identifying and extracting named entities from text, such as names of people, organizations, locations, and dates.
Sentiment Analysis: Determining the sentiment or opinion expressed in a piece of text, such as positive, negative, or neutral.
Machine Translation: Translating text from one language to another automatically, such as Google Translate.
Question Answering: Building systems that can answer questions posed in natural language, such as chatbots or question-answering systems.
Text Summarization: Generating concise summaries of long texts, such as news articles or research papers.
Speech Recognition: Converting spoken language into written text, enabling voice assistants and voice-controlled systems.
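To make one of these tasks concrete, here is a minimal sentiment-analysis sketch using the Hugging Face transformers pipeline. This is just an off-the-shelf library call to show what the task looks like in practice, not the model we are going to build; it assumes the transformers package is installed and will download a small default model on first run.

```python
# Minimal sentiment-analysis sketch using an off-the-shelf pipeline.
# Assumes the `transformers` package is installed; a default pretrained
# model is downloaded automatically the first time this runs.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier("I absolutely loved this blog!"))
# Output looks roughly like: [{'label': 'POSITIVE', 'score': 0.99...}]
```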
To give y'all an intuition as to where the model we are going to build stands, look at this Venn diagram:

The model that we are going to build lies in the intersection between NLP and DL.
We will be using a neural net to learn the statistical properties of the language.
By statistical properties of language, I mean:
1. Word Co-occurrence: Certain words tend to co-occur more often than would be expected by chance. This property, known as word co-occurrence, forms the basis for various language modelling techniques.
For example:
Consider these facts about me:
"I love to eat chicky."
"chicky is my favourite food dawg."
"I order chicky every sunday."
So notice, in these sentences:
The word "chicky" co-occurs with "to" and "eat", "is" and "my", and "order" and "every", considering a context window of 2.
Considering multiple instances across different sentences, a food item more often than not co-occurs with words like "eat" and "order", and with time frames like "every Friday", etc. This forms the main basis for contextual learning of the semantic relationships between words.
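To make the co-occurrence idea concrete, here is a tiny sketch in plain Python that counts, for the three sentences above, which words fall inside a context window of 2 around each word. The lowercasing and whitespace tokenisation are simplifying assumptions, not a real preprocessing pipeline.

```python
# Toy word co-occurrence counter with a symmetric context window of 2.
from collections import Counter

sentences = [
    "I love to eat chicky",
    "chicky is my favourite food dawg",
    "I order chicky every sunday",
]

window = 2
co_counts = Counter()

for sentence in sentences:
    tokens = sentence.lower().split()
    for i, word in enumerate(tokens):
        # Count every word within `window` positions of the current word.
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                co_counts[(word, tokens[j])] += 1

# Words observed in the context of "chicky".
print(sorted(pair[1] for pair in co_counts if pair[0] == "chicky"))
```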
2. Words with similar meanings tend to appear in similar contexts. This property forms the foundation for distributional semantics, where word meanings are derived from the distributional patterns of words in a text corpus.
For example:
Consider the words “fast” and “faster”.
i."Yesh can run so fast".
ii."Yesh can run faster than gonda".
Notice that the context we are talking about here is running.
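A rough way to see this numerically: collect the context words around "fast" and "faster" from the two sentences above and measure how much the two context sets overlap. On two toy sentences the overlap is tiny, but over a large corpus this shared-context signal is exactly what distributional methods exploit. This is only a sketch of the idea, not a real distributional model.

```python
# Sketch: words with similar meanings should share context words.
def context_set(sentences, target, window=2):
    """Collect the words seen within `window` positions of `target`."""
    contexts = set()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            if word == target:
                for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                    if j != i:
                        contexts.add(tokens[j])
    return contexts

sents = ["Yesh can run so fast", "Yesh can run faster than gonda"]
fast_ctx = context_set(sents, "fast")
faster_ctx = context_set(sents, "faster")

# Jaccard overlap: shared context words over all context words seen.
print(fast_ctx & faster_ctx, len(fast_ctx & faster_ctx) / len(fast_ctx | faster_ctx))
```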
3. Zipf's law states that the frequency of occurrence of a word in a corpus is inversely proportional to its rank in the frequency table. In a dataset containing 333,333 unique English words, the word counts follow this pattern, with the rank of "the" being 1, "of" being 2, and so on. The lower the rank, the more frequent the word and the less information it conveys, so we remove these words during the data preprocessing stage, because it is the rarer, more distinctive words that we need to learn.
Link for the source: https://www.kaggle.com/datasets/rtatman/english-word-frequency/code?resource=download
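If you want to sanity-check Zipf's law on that dataset yourself, here is a hedged sketch with pandas. I am assuming the Kaggle download is a CSV named unigram_freq.csv with a word column and a count column; adjust the file and column names if your copy differs.

```python
# Rough check of Zipf's law: rank * count should stay in the same ballpark
# for the top-ranked words if frequency is inversely proportional to rank.
# Assumption: the Kaggle file is unigram_freq.csv with columns word, count.
import pandas as pd

df = pd.read_csv("unigram_freq.csv").sort_values("count", ascending=False)
df["rank"] = range(1, len(df) + 1)
df["rank_x_count"] = df["rank"] * df["count"]

print(df[["word", "rank", "count", "rank_x_count"]].head(10))
```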
So, it is through these means that our model learns word associations, gains contextual knowledge, and understands the semantic relationships between words.
There are two ways of modelling in NLP:
1) The classic model, where we use robust mechanisms (based on conditional probability) with certain assumptions; for instance, we assume that the entire vocabulary of a language, say English, is known, which isn't the case in reality. It also requires some feature engineering to identify patterns, leading to limited pattern identification. One popular model is the N-gram model, and we shall go through it properly later on (a tiny sketch follows right after this list).
2) The deep learning model, where we use a neural net to "learn" the language, as in forming associations between words and understanding context. It is a little more complex compared to classic models, but performance-wise these are better, thanks to their ability to learn on their own, identify far more patterns, and work from raw input with no manual feature extraction needed. One such model is called word2vec.
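To make the contrast concrete, here are two tiny, hedged sketches on the toy "chicky" corpus from earlier. First, the classic route: a bigram model that estimates P(next word | current word) from raw counts. A real N-gram model would also need smoothing for unseen pairs; this toy version simply assigns them probability 0.

```python
# Toy bigram (2-gram) model: P(next | current) estimated from raw counts.
from collections import Counter, defaultdict

corpus = [
    "i love to eat chicky",
    "chicky is my favourite food dawg",
    "i order chicky every sunday",
]

bigram_counts = defaultdict(Counter)
for sentence in corpus:
    tokens = sentence.split()
    for current, nxt in zip(tokens, tokens[1:]):
        bigram_counts[current][nxt] += 1

def prob(nxt, current):
    """P(nxt | current) from counts; 0.0 if `current` was never seen."""
    total = sum(bigram_counts[current].values())
    return bigram_counts[current][nxt] / total if total else 0.0

print(prob("chicky", "eat"))    # 1.0: "eat" is always followed by "chicky" here
print(prob("chicky", "order"))  # 1.0: same for "order"
```

And the deep learning route, using gensim's existing word2vec implementation just to show the shape of the API. This is a library call, not the from-scratch model we will build later, and on a corpus this small the similarity numbers are not meaningful; it also assumes the gensim package is installed.

```python
# Sketch: training a skip-gram word2vec model with gensim on the toy corpus.
from gensim.models import Word2Vec

# `corpus` is the same toy list defined in the bigram sketch above.
tokenised = [sentence.split() for sentence in corpus]
model = Word2Vec(tokenised, vector_size=50, window=2, min_count=1, sg=1, epochs=200)

# Nearest neighbours of "chicky" in the learned vector space.
print(model.wv.most_similar("chicky", topn=3))
```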
The model we choose depends on factors like data availability and performance requirements. Classic models, which are simple and transparent, tend to hold up well when we only have a small dataset to train on, while deep learning models pull ahead as the training dataset grows larger.
By now, I hope y'all got a quick insight into what exactly we are getting ourselves into. In the following blogs, we will take a deeper dive into one classic model and one deep learning model. Till then, stay tuned!!