I came across some questions/articles related to BERT embeddings: "Why do BERT embeddings work so well for machine learning?", "Do pre-trained embeddings as ELMo and BERT work well on noisy data like tweets?", "Demystifying BERT: The Groundbreaking NLP Framework", "Understanding BERT: Is it a Game Changer in NLP", and many more…