Posts by Tags

ALBERT

Transformers

15 minute read

Published:

Transformers: This post contains my notes, collected over the years, on different Transformer models. These notes are still quite rough and unedited (more like my cheat sheets), but I thought I would share them anyway. Please let me know if you have any comments or if you find any mistakes. Images used in this blog post, unless otherwise mentioned, are taken from the papers on each model.

AMBERT

Transformers

15 minute read

Published:

Transformers: This post contains my notes, collected over the years, on different Transformer models. These notes are still quite rough and unedited (more like my cheat sheets), but I thought I would share them anyway. Please let me know if you have any comments or if you find any mistakes. Images used in this blog post, unless otherwise mentioned, are taken from the papers on each model.

AdapterHub

Transformers 2

5 minute read

Published:

This blog post is the continuation of my previous blog post, Transformers. In the previous post, I explained the original Transformer paper, BERT, GPT, XLNet, RoBERTa, ALBERT, BART, and AMBERT. In this post, I will explain MARGE, ConveRT, Generalization through Memorization, AdapterHub, and T5. Images and content used in this blog post, unless otherwise mentioned, are taken from the papers on each model.

BERT

Transformers

15 minute read

Published:

Transformers: This post contains my notes, collected over the years, on different Transformer models. These notes are still quite rough and unedited (more like my cheat sheets), but I thought I would share them anyway. Please let me know if you have any comments or if you find any mistakes. Images used in this blog post, unless otherwise mentioned, are taken from the papers on each model.

Masked Language Modeling + Fine Tuning for Text Classification with BERT

less than 1 minute read

Published:

My Colab notebook on Masked Language Modeling (MLM) + Fine Tuning for Text Classification with BERT. In this notebook, you can see how to train a BERT model on your own data for the MLM task and then fine-tune it for text classification. This includes how to encode the data, mask the tokens (similar to here), and train a model from scratch (or continue from a pretrained model). You can then load this model and fine-tune it on your labeled data for classification, as in the sketch below.
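
As a rough illustration of that workflow (not the notebook itself), here is a minimal sketch using the Hugging Face transformers library: first run MLM on raw domain text, save the checkpoint, then load it for classification. The `texts` list, the `mlm_out` output directory, and the two-label setup are placeholders, not values from the notebook.

```python
# Sketch: domain-adaptive MLM training, then fine-tuning for classification.
import torch
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          AutoModelForSequenceClassification,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
texts = ["unlabeled text from your domain ...", "more raw text ..."]  # placeholder data

class TextDataset(torch.utils.data.Dataset):
    """Tokenized, unlabeled text; the collator adds the [MASK] targets."""
    def __init__(self, texts):
        self.enc = tokenizer(texts, truncation=True, padding="max_length",
                             max_length=128, return_tensors="pt")
    def __len__(self):
        return self.enc["input_ids"].size(0)
    def __getitem__(self, i):
        return {k: v[i] for k, v in self.enc.items()}

# 1) Masked language modeling on your own (unlabeled) text.
mlm_model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)
Trainer(model=mlm_model,
        args=TrainingArguments("mlm_out", num_train_epochs=1,
                               per_device_train_batch_size=8),
        train_dataset=TextDataset(texts),
        data_collator=collator).train()
mlm_model.save_pretrained("mlm_out")
tokenizer.save_pretrained("mlm_out")

# 2) Load the domain-adapted encoder and fine-tune it on labeled data
#    with a fresh classification head (two classes assumed here).
clf = AutoModelForSequenceClassification.from_pretrained("mlm_out", num_labels=2)
```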

BERTSCORE

Text Summarization

9 minute read

Published:

Automatic summarization is the process of shortening a set of data computationally, to create a subset (a summary) that represents the most important or relevant information within the original content. Text summarization finds the most informative sentences in a document.

CRF

Conditional Random Field

2 minute read

Published:

In this post, I briefly explain what Conditional Random Fields (CRFs) are and how they can be used for sequence labeling (see the sketch below). A CRF is a discriminative model best suited for tasks in which contextual information, or the state of the neighbors, affects the current prediction. CRFs are widely used in named entity recognition, part-of-speech tagging, gene prediction, noise reduction, and object detection problems.
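
To make the sequence-labeling use case concrete, here is a minimal sketch with the sklearn-crfsuite package (my choice for illustration; the post does not prescribe a library). Each token gets hand-crafted features of itself and its neighbors, which is exactly the contextual information a CRF exploits; the toy sentence and tags are placeholders.

```python
# Sketch: linear-chain CRF for token-level sequence labeling (e.g., NER).
import sklearn_crfsuite

def token_features(sent, i):
    word = sent[i]
    feats = {"word.lower": word.lower(),
             "word.istitle": word.istitle(),
             "word.isdigit": word.isdigit()}
    # Neighboring tokens: the "state of the neighbors" the post refers to.
    feats["prev.lower"] = sent[i - 1].lower() if i > 0 else "<BOS>"
    feats["next.lower"] = sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>"
    return feats

# Toy training data (placeholder): one tokenized sentence with NER-style tags.
sents = [["Alice", "lives", "in", "Paris"]]
tags = [["B-PER", "O", "O", "B-LOC"]]

X = [[token_features(s, i) for i in range(len(s))] for s in sents]
y = tags

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
crf.fit(X, y)
print(crf.predict(X))  # predicted label sequence per sentence
```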

Conditional Random Field

Conditional Random Field

2 minute read

Published:

In this post, I briefly explain what Conditional Random Fields (CRFs) are and how they can be used for sequence labeling. A CRF is a discriminative model best suited for tasks in which contextual information, or the state of the neighbors, affects the current prediction. CRFs are widely used in named entity recognition, part-of-speech tagging, gene prediction, noise reduction, and object detection problems.

ConveRT

Transformers 2

5 minute read

Published:

This blog post is the continuation of my previous blog post, Transformers. In the previous post, I explained the original Transformer paper, BERT, GPT, XLNet, RoBERTa, ALBERT, BART, and AMBERT. In this post, I will explain MARGE, ConveRT, Generalization through Memorization, AdapterHub, and T5. Images and content used in this blog post, unless otherwise mentioned, are taken from the papers on each model.

DSR

Text Summarization

9 minute read

Published:

Automatic summarization is the process of shortening a set of data computationally, to create a subset (a summary) that represents the most important or relevant information within the original content. Text summarization finds the most informative sentences in a document.

DeepLearning

ICLR 2021

8 minute read

Published:

ICLR 2021

Digital Experimentation

GPT

Transformers

15 minute read

Published:

Transformers: This post contains my notes, collected over the years, on different Transformer models. These notes are still quite rough and unedited (more like my cheat sheets), but I thought I would share them anyway. Please let me know if you have any comments or if you find any mistakes. Images used in this blog post, unless otherwise mentioned, are taken from the papers on each model.

Generalization through Memorization

Transformers 2

5 minute read

Published:

This blog post is the continuation of my previous blog post, Transformers. In the previous post, I explained the original Transformer paper, BERT, GPT, XLNet, RoBERTa, ALBERT, BART, and AMBERT. In this post, I will explain MARGE, ConveRT, Generalization through Memorization, AdapterHub, and T5. Images and content used in this blog post, unless otherwise mentioned, are taken from the papers on each model.

ICLR

ICLR 2021

8 minute read

Published:

ICLR 2021

Knowledge Distillation

Knowledge Distillation

3 minute read

Published:

In this post, I will discuss what knowledge distillation is (also referred to as Student-Teacher Learning), the intuition behind it, and why it works! A minimal sketch of the standard distillation loss follows below.
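
As a concrete reference point, here is a minimal sketch of the standard distillation loss in the Hinton et al. style: the student matches the teacher's temperature-softened outputs while also fitting the hard labels. The temperature `T`, the mixing weight `alpha`, and the random logits are placeholders for illustration, not values from the post.

```python
# Sketch: soft-target (KL) + hard-target (cross-entropy) distillation loss.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: KL divergence between the temperature-softened teacher
    # and student distributions, scaled by T^2 to keep gradients comparable.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    # Hard targets: ordinary cross-entropy with the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Example usage with random logits for a 3-class problem.
student = torch.randn(4, 3)
teacher = torch.randn(4, 3)
labels = torch.tensor([0, 2, 1, 0])
print(distillation_loss(student, teacher, labels))
```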

MARGE

Transformers 2

5 minute read

Published:

This blog post is the continuation of my previous blog post, Transformers. In the previous post, I explained the original Transformer paper, BERT, GPT, XLNet, RoBERTa, ALBERT, BART, and AMBERT. In this post, I will explain MARGE, ConveRT, Generalization through Memorization, AdapterHub, and T5. Images and content used in this blog post, unless otherwise mentioned, are taken from the papers on each model.

METEOR

Text Summarization

9 minute read

Published:

Automatic summarization is the process of shortening a set of data computationally, to create a subset (a summary) that represents the most important or relevant information within the original content. Text summarization finds the most informative sentences in a document.

Machine Learning

Knowledge Distillation

3 minute read

Published:

In this post, I will discuss what knowledge distillation is (also referred to as Student-Teacher Learning), the intuition behind it, and why it works!

NAACL 2018, Summary of talks

less than 1 minute read

Published:

There was so much happening at NAACL: so many interesting works on all sorts of (old and new) NLP problems. Many papers focused on how to generalize models beyond the conditions seen during training. In addition, there was a workshop on “New Forms of Generalization in Deep Learning and Natural Language Processing”, in which Yejin Choi pointed out that natural language understanding (NLU) does not generalize to natural language generation (NLG). Another focus of the conference and workshops was dialogue systems and chatbots; many talks focused on using a knowledge graph in chatbots to enable deeper conversations without staying on a single topic for the whole conversation.

MachineLearning

ICLR 2021

8 minute read

Published:

ICLR 2021

Masked Language Modeling

Masked Language Modeling + Fine Tuning for Text Classification with BERT

less than 1 minute read

Published:

My Colab notebook on Masked Language Modeling (MLM) + Fine Tuning for Text Classification with BERT. In this notebook, you can see how to train a BERT model on your own data for the MLM task and then fine-tune it for text classification. This includes how to encode the data, mask the tokens (similar to here), and train a model from scratch (or continue from a pretrained model). You can then load this model and fine-tune it on your labeled data for classification.

NAACL

NAACL 2018, Summary of talks

less than 1 minute read

Published:

There was so much happening at NAACL: so many interesting works on all sorts of (old and new) NLP problems. Many papers focused on how to generalize models beyond the conditions seen during training. In addition, there was a workshop on “New Forms of Generalization in Deep Learning and Natural Language Processing”, in which Yejin Choi pointed out that natural language understanding (NLU) does not generalize to natural language generation (NLG). Another focus of the conference and workshops was dialogue systems and chatbots; many talks focused on using a knowledge graph in chatbots to enable deeper conversations without staying on a single topic for the whole conversation.

NLP

Edge Probing

3 minute read

Published:

In the past couple of years, Transformers have achieved state-of-the-art results on a variety of natural language tasks. To better understand Transformers and what they learn in practice, researchers have done layer-wise analyses of a Transformer’s hidden states to see what the model learns at each layer. A wave of recent work has started to “probe” state-of-the-art Transformers, inspecting the structure of the network to assess whether there are localizable regions associated with distinct types of linguistic decisions, both syntactic and semantic. Researchers examine the hidden states between encoder layers directly and feed those hidden states into a linear layer + softmax to predict what kind of information is encoded in each hidden state, as in the sketch below.
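
Here is a minimal sketch of that probing setup, assuming the Hugging Face transformers library: freeze a pretrained encoder, take the hidden states of one layer, and train only a linear classifier (+ softmax) on top to predict a per-token linguistic label. The layer index, label count, and example sentence are placeholders.

```python
# Sketch: linear probe over the hidden states of one encoder layer.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
encoder.eval()  # the encoder stays frozen; only the probe would be trained

num_labels = 17   # e.g. a universal POS tag set (assumption)
probe_layer = 6   # which encoder layer to inspect (assumption)
probe = torch.nn.Linear(encoder.config.hidden_size, num_labels)

tokens = tokenizer("Transformers encode rich structure", return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**tokens).hidden_states[probe_layer]  # (1, seq_len, hidden)

logits = probe(hidden)                 # linear layer over each token's state
probs = torch.softmax(logits, dim=-1)  # softmax over the label set
print(probs.shape)                     # (1, seq_len, num_labels)
```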

Transformers 2

5 minute read

Published:

This blog post is the continuation of my previous blog post, Transformers. In the previous post, I explained the original Transformer paper, BERT, GPT, XLNet, RoBERTa, ALBERT, BART, and AMBERT. In this post, I will explain MARGE, ConveRT, Generalization through Memorization, AdapterHub, and T5. Images and content used in this blog post, unless otherwise mentioned, are taken from the papers on each model.

Text Summarization

9 minute read

Published:

Automatic summarization is the process of shortening a set of data computationally, to create a subset (a summary) that represents the most important or relevant information within the original content. Text summarization finds the most informative sentences in a document.

NLP Papers

2 minute read

Published:

These are the most important Transformer papers (in my opinion) that anyone working with Transformers should know. There is also a nice summary, Efficient Transformers: A Survey, by folks at Google that I highly recommend.

Transformers

15 minute read

Published:

Transformers: This post contains my notes, collected over the years, on different Transformer models. These notes are still quite rough and unedited (more like my cheat sheets), but I thought I would share them anyway. Please let me know if you have any comments or if you find any mistakes. Images used in this blog post, unless otherwise mentioned, are taken from the papers on each model.

Conditional Random Field

2 minute read

Published:

In this post, I briefly explain what Conditional Random Fields (CRFs) are and how they can be used for sequence labeling. A CRF is a discriminative model best suited for tasks in which contextual information, or the state of the neighbors, affects the current prediction. CRFs are widely used in named entity recognition, part-of-speech tagging, gene prediction, noise reduction, and object detection problems.

Masked Language Modeling + Fine Tuning for Text Classification with BERT

less than 1 minute read

Published:

My Colab notebook on Masked Language Modeling (MLM) + Fine Tuning for Text Classification with BERT. In this notebook, you can see how to train a BERT model on your own data for the MLM task and then fine-tune it for text classification. This includes how to encode the data, mask the tokens (similar to here), and train a model from scratch (or continue from a pretrained model). You can then load this model and fine-tune it on your labeled data for classification.

Natural Language Processing

NAACL 2018, Summary of talks

less than 1 minute read

Published:

There was so much happening at NAACL: so many interesting works on all sorts of (old and new) NLP problems. Many papers focused on how to generalize models beyond the conditions seen during training. In addition, there was a workshop on “New Forms of Generalization in Deep Learning and Natural Language Processing”, in which Yejin Choi pointed out that natural language understanding (NLU) does not generalize to natural language generation (NLG). Another focus of the conference and workshops was dialogue systems and chatbots; many talks focused on using a knowledge graph in chatbots to enable deeper conversations without staying on a single topic for the whole conversation.

Query Understanding

ROUGE

Text Summarization

9 minute read

Published:

Automatic summarization is the process of shortening a set of data computationally, to create a subset (a summary) that represents the most important or relevant information within the original content. Text summarization finds the most informative sentences in a document.

RepresentationLearning

ICLR 2021

8 minute read

Published:

ICLR 2021

RoBERTa

Transformers

15 minute read

Published:

Transformers: This post contains my notes, collected over the years, on different Transformer models. These notes are still quite rough and unedited (more like my cheat sheets), but I thought I would share them anyway. Please let me know if you have any comments or if you find any mistakes. Images used in this blog post, unless otherwise mentioned, are taken from the papers on each model.

Sequence Labeling

Conditional Random Field

2 minute read

Published:

In this post, I briefly explain what Conditional Random Fields (CRFs) are and how they can be used for sequence labeling. A CRF is a discriminative model best suited for tasks in which contextual information, or the state of the neighbors, affects the current prediction. CRFs are widely used in named entity recognition, part-of-speech tagging, gene prediction, noise reduction, and object detection problems.

Summarization

T5

Transformers 2

5 minute read

Published:

This blog post is the continuation of my previous blog post, Transformers. In the previous post, I explained the original Transformer paper, BERT, GPT, XLNet, RoBERTa, ALBERT, BART, and AMBERT. In this post, I will explain MARGE, ConveRT, Generalization through Memorization, AdapterHub, and T5. Images and content used in this blog post, unless otherwise mentioned, are taken from the papers on each model.

Text Classification

Masked Language Modeling + Fine Tuning for Text Classification with BERT

less than 1 minute read

Published:

My Colab notebook on Masked Language Modeling (MLM) + Fine Tuning for Text Classification with BERT. In this notebook, you can see how to train a BERT model on your own data for the MLM task and then fine-tune it for text classification. This includes how to encode the data, mask the tokens (similar to here), and train a model from scratch (or continue from a pretrained model). You can then load this model and fine-tune it on your labeled data for classification.

Text Summarization

Text Summarization

9 minute read

Published:

Automatic summarization is the process of shortening a set of data computationally, to create a subset (a summary) that represents the most important or relevant information within the original content. Text summarization finds the most informative sentences in a document.
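
To make "finding the most informative sentences" concrete, here is a minimal extractive-summarization sketch: score each sentence by its mean TF-IDF weight (a simplification of my own, not the post's method) and keep the top-k sentences in their original order. The toy document and the scoring choice are placeholders.

```python
# Sketch: a tiny TF-IDF-based extractive summarizer.
from sklearn.feature_extraction.text import TfidfVectorizer

def extractive_summary(sentences, k=2):
    tfidf = TfidfVectorizer().fit_transform(sentences)  # one row per sentence
    scores = tfidf.mean(axis=1).A1                       # average weight per sentence
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]           # keep original order

doc = ["Automatic summarization shortens a document computationally.",
       "It keeps the most important or relevant information.",
       "The weather was nice that day."]
print(extractive_summary(doc, k=2))
```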

Transformers

Edge Probing

3 minute read

Published:

In the past couple of years, Transformers have achieved state-of-the-art results on a variety of natural language tasks. To better understand Transformers and what they learn in practice, researchers have done layer-wise analyses of a Transformer’s hidden states to see what the model learns at each layer. A wave of recent work has started to “probe” state-of-the-art Transformers, inspecting the structure of the network to assess whether there are localizable regions associated with distinct types of linguistic decisions, both syntactic and semantic. Researchers examine the hidden states between encoder layers directly and feed those hidden states into a linear layer + softmax to predict what kind of information is encoded in each hidden state.

Transformers 2

5 minute read

Published:

This blog post is the continuation of my previous blog post, Transformers. In the previous post, I explained the original Transformer paper, BERT, GPT, XLNet, RoBERTa, ALBERT, BART, and AMBERT. In this post, I will explain MARGE, ConveRT, Generalization through Memorization, AdapterHub, and T5. Images and content used in this blog post, unless otherwise mentioned, are taken from the papers on each model.

NLP Papers

2 minute read

Published:

These are the most important Transformer papers (in my opinion) that anyone working with Transformers should know. There is also a nice summary, Efficient Transformers: A Survey, by folks at Google that I highly recommend.

Transformers

15 minute read

Published:

Transformers: This post contains my notes, collected over the years, on different Transformer models. These notes are still quite rough and unedited (more like my cheat sheets), but I thought I would share them anyway. Please let me know if you have any comments or if you find any mistakes. Images used in this blog post, unless otherwise mentioned, are taken from the papers on each model.

Masked Language Modeling + Fine Tuning for Text Classification with BERT

less than 1 minute read

Published:

My Colab notebook on Masked Language Modeling (MLM) + Fine Tuning for Text Classification with BERT. In this notebook, you can see how to train a BERT model on your own data for the MLM task and then fine-tune it for text classification. This includes how to encode the data, mask the tokens (similar to here), and train a model from scratch (or continue from a pretrained model). You can then load this model and fine-tune it on your labeled data for classification.

XLNet

Transformers

15 minute read

Published:

Transformers: This post contains my notes, collected over the years, on different Transformer models. These notes are still quite rough and unedited (more like my cheat sheets), but I thought I would share them anyway. Please let me know if you have any comments or if you find any mistakes. Images used in this blog post, unless otherwise mentioned, are taken from the papers on each model.