Curated Resource

Text Embedding — What, Why and How?

My notes

The post opens with a definition: "Embedding ... convert complex, high-dimensional data e.g. text, image etc. into lower-dimensional representations while preserving essential relationships and structure. Embedding is the knowledge for AI as it is produced, understood and used by ... AI systems."
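To make the "high-dimensional to lower-dimensional" idea concrete, here is a minimal sketch, assuming a Word2Vec-style lookup table: a word starts as a one-hot vector with one slot per vocabulary entry, and multiplying it by an embedding matrix yields a short dense vector. The vocabulary, the dimension `d = 3` and the random matrix `E` are hypothetical stand-ins for what a trained model would learn.

```python
import numpy as np

# Hypothetical toy vocabulary; a real model has tens of thousands of entries.
vocab = ["cat", "kitten", "car", "truck", "banana"]
V = len(vocab)  # high-dimensional side: one slot per vocabulary word
d = 3           # low-dimensional embedding size (assumed for illustration)

# Embedding table E (V x d). Random here; a trained model learns these values
# so that related words end up close together in the d-dimensional space.
rng = np.random.default_rng(0)
E = rng.normal(size=(V, d))

def one_hot(word: str) -> np.ndarray:
    """Sparse V-dimensional representation of a single word."""
    x = np.zeros(V)
    x[vocab.index(word)] = 1.0
    return x

def embed(word: str) -> np.ndarray:
    """Dense d-dimensional representation: the one-hot vector times E."""
    return one_hot(word) @ E

v = embed("kitten")
print(one_hot("kitten").shape, v.shape)
# Multiplying by a one-hot vector just selects a row of E, so a lookup is cheap.
print(np.allclose(v, E[vocab.index("kitten")]))
```

The matrix product is shown only to make the dimensionality reduction explicit; real implementations index the row directly instead of building the one-hot vector.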

After introducing "some well-known text embeddings" (from Word2Vec to GPT-x), the author sets out why they are useful:

  • for search, content recommendation & document clustering: find "relevant documents quickly in large databases, as [embeddings] represent the semantic meaning of the text."
  • save space, as embeddings are "more compact than storing raw text"
  • faster computation and better performance
  • "Transfer learning. Pre-trained text embeddings can be used as a starting point for training domain-specific models... [to] save time and ... resources... the new model benefits from the knowledge gained during the pre-training phase."
  • "work with multiple languages, as they can capture semantic similarities even across different languages."
  • "Interoperability... a common representation for different text sources and formats... easier to combine and process data from various sources."
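The search use case in the first bullet boils down to comparing a query's embedding with document embeddings and returning the closest ones. A minimal sketch follows, assuming cosine similarity as the distance measure (a common choice, not one the post prescribes); the titles and the 4-dimensional vectors are made up for illustration, whereas a real model would produce vectors with hundreds of dimensions.

```python
import numpy as np

def cosine_similarity(query: np.ndarray, docs: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and each row of docs."""
    q = query / np.linalg.norm(query)
    D = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    return D @ q

def search(query_vec: np.ndarray, doc_vecs: np.ndarray, titles: list, k: int = 2):
    """Return the k (title, score) pairs most similar to the query, best first."""
    scores = cosine_similarity(query_vec, doc_vecs)
    top = np.argsort(-scores)[:k]  # indices sorted by descending score
    return [(titles[i], float(scores[i])) for i in top]

# Hypothetical embeddings; in practice these would come from an embedding model.
titles = ["Caring for a new kitten", "Choosing a family car", "Feline nutrition basics"]
doc_vecs = np.array([
    [0.9, 0.1, 0.0, 0.2],
    [0.0, 0.8, 0.6, 0.1],
    [0.8, 0.0, 0.1, 0.4],
])
query_vec = np.array([0.85, 0.05, 0.05, 0.3])  # stands in for "how do I feed my cat?"

for title, score in search(query_vec, doc_vecs, titles):
    print(f"{score:.3f}  {title}")
```

With a large collection, the brute-force comparison above is usually replaced by a vector index, but the ranking idea stays the same.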

Which model should you use to generate embeddings? "If you don’t bother to train your own language model... [just] call some API ... without worrying how it is done behind the screen, OpenAI GPT-3 Embedding API... text-embedding-ada-002 model... has the best performance", or at least it is usually near the top. It costs "0.04 cent for every 1000 tokens... ~4000 characters".
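The quoted pricing lends itself to a quick back-of-the-envelope estimate before embedding a whole corpus. This sketch simply encodes the post's figures (0.04 cent, i.e. $0.0004, per 1000 tokens, and roughly 4 characters per token); the corpus size is hypothetical, and a real token count would come from the model's tokenizer rather than a character ratio.

```python
# Figures as quoted in the post for text-embedding-ada-002:
# 0.04 cent per 1000 tokens, i.e. $0.0004 per 1000 tokens.
PRICE_USD_PER_1K_TOKENS = 0.0004
CHARS_PER_TOKEN = 4  # rule of thumb from the post: 1000 tokens ~ 4000 characters

def estimate_tokens(num_chars: int) -> int:
    """Rough token count from a character count (rounded up)."""
    return -(-num_chars // CHARS_PER_TOKEN)

def estimate_cost_usd(num_chars: int) -> float:
    """Approximate cost in US dollars to embed num_chars characters of text."""
    return estimate_tokens(num_chars) / 1000 * PRICE_USD_PER_1K_TOKENS

# Hypothetical corpus: 10,000 documents averaging 2,000 characters each.
corpus_chars = 10_000 * 2_000
print(estimate_tokens(corpus_chars))              # 5000000
print(f"${estimate_cost_usd(corpus_chars):.2f}")  # $2.00
```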


The above notes were curated from the full post

