Curated Resource

Text Embedding — What, Why and How?

ai, llm, embedding

Curated: 26/04/2023 from medium.com/@yu-joshua/text-embedding-what-why-and-how-13227e983ba7

My notes

Intro to "Embedding ... convert complex, high-dimensional data e.g. text, image etc. into lower-dimensional representations while preserving essential relationships and structure. Embedding is the knowledge for AI as it is produced, understood and used by ... AI systems."

After introducing "some well-known text embeddings" (from Word2Vec to GPT-x), the author sets out Why:

for search, content recommendation & document clustering: find "relevant documents quickly in large databases, as [embeddings] represent the semantic meaning of the text."
save space, as embeddings are "more compact than storing raw text"
faster computation and better performance
"Transfer learning. Pre-trained text embeddings can be used as a starting point for training domain-specific models... [to] save time and ... resources... the new model benefits from the knowledge gained during the pre-training phase."
"work with multiple languages, as they can capture semantic similarities even across different languages."
"Interoperability... a common representation for different text sources and formats... easier to combine and process data from various sources."
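
To make the search use case in the list above concrete, here is a minimal sketch of ranking documents by cosine similarity between embedding vectors. The three-dimensional vectors, the document titles and the helper names (`cosine_similarity`, `rank_documents`) are invented for illustration; real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings, hand-written for illustration only.
documents = {
    "How to train a neural network": [0.9, 0.1, 0.0],
    "Best pasta recipes": [0.0, 0.2, 0.9],
    "Intro to deep learning": [0.8, 0.3, 0.1],
}

# Hypothetical embedding of a query about machine learning.
query_embedding = [0.85, 0.2, 0.05]


def rank_documents(query, docs):
    """Return (score, title) pairs, most similar first."""
    scored = [(cosine_similarity(query, vec), title) for title, vec in docs.items()]
    return sorted(scored, reverse=True)


ranked = rank_documents(query_embedding, documents)
for score, title in ranked:
    print(f"{score:.3f}  {title}")
```

With these toy vectors the two machine-learning titles score far higher than the pasta one, which is the behaviour the quoted "semantic meaning" claim relies on.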

Which model to use to generate embeddings? "If you don’t bother to train your own language model... [just] call some API ... without worrying how it is done behind the screen, OpenAI GPT-3 Embedding API... text-embedding-ada-002 model... has the best performance" - it is usually near the top. It costs "0.04 cent for every 1000 tokens... ~4000 characters".
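
As a rough worked example of the quoted pricing, the sketch below estimates the cost of embedding a corpus. It assumes the figures quoted above still hold (0.04 cent per 1000 tokens, about 4 characters per token); actual prices and tokenisation vary, and `estimate_cost_usd` is a hypothetical helper, not part of any API.

```python
# Figures taken from the quote above; they may be outdated.
PRICE_CENTS_PER_1K_TOKENS = 0.04
CHARS_PER_TOKEN = 4  # from "1000 tokens... ~4000 characters"


def estimate_cost_usd(num_chars):
    """Approximate embedding cost in US dollars for a text of num_chars characters."""
    tokens = num_chars / CHARS_PER_TOKEN
    cents = tokens / 1000 * PRICE_CENTS_PER_1K_TOKENS
    return cents / 100


# One million characters is roughly 250,000 tokens.
corpus_cost = estimate_cost_usd(1_000_000)
print(f"Estimated cost: ${corpus_cost:.2f}")
```

Under these assumptions a million characters of text costs on the order of ten cents to embed.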

Read the Full Post

The above notes were curated from the full post medium.com/@yu-joshua/text-embedding-what-why-and-how-13227e983ba7.
