What are embeddings?

Representing natural language mathematically means finding ways to encode words or sentences as numbers, such that the representations of words or sentences with similar meanings sit closer together than those with different meanings. These days, thanks to neural networks and machine learning, we can build increasingly sophisticated methods for producing these numerical representations, which helps us with tasks like search, clustering, and classification.

I have used a simple, lightweight model (DistilBERT), which can run on an everyday CPU, to generate embeddings and show how sentences with similar meanings end up closer together in embedding space.
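A minimal sketch of generating sentence embeddings with DistilBERT via the Hugging Face `transformers` library. The model name, the example sentences, and the mean-pooling step are my assumptions for illustration, not necessarily the exact setup used here:

```python
# Sketch: sentence embeddings from DistilBERT via Hugging Face transformers.
# Assumes the `transformers` and `torch` packages are installed; the model
# name and pooling strategy below are illustrative choices.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased")

sentences = [
    "The cat sat on the mat.",
    "A kitten rested on the rug.",
    "Stock prices fell sharply today.",
]

# Tokenize all sentences in one padded batch.
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the token embeddings (masking out padding) to get one
# fixed-size vector per sentence.
mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)

# Cosine similarity: the two cat sentences should score higher with each
# other than either does with the unrelated finance sentence.
sim = torch.nn.functional.cosine_similarity
print(sim(embeddings[0:1], embeddings[1:2]).item())  # similar pair
print(sim(embeddings[0:1], embeddings[2:3]).item())  # dissimilar pair
```

Mean pooling is one common way to collapse per-token vectors into a single sentence vector; using the `[CLS]` token or a dedicated sentence-embedding model are alternatives.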

The embeddings generated from the set of sentences.

As you’ll observe, points representing similar ideas sit closer together. Also worth noting: the image you generate might not look exactly the same every time, even for the same set of embeddings.

A note for the mathematically curious: the plot I’m creating at the end is just a projection of the embeddings into 2D space, done using UMAP. Although robust, this method makes some assumptions that might or might not always be met; if you’re curious, there’s a post on UMAP’s website explaining how it really works: UMAP.