Intro to "Embedding ... convert complex, high-dimensional data e.g. text, image etc. into lower-dimensional representations while preserving essential relationships and structure. Embedding is the knowledge for AI as it is produced, understood and used by ... AI systems."
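To make "preserving essential relationships" concrete, here is a minimal sketch using cosine similarity, a common way to compare embedding vectors. The three-dimensional vectors and the `toy_embeddings` name are invented for illustration; real embeddings have hundreds or thousands of dimensions and come from a trained model.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings (assumption: values are made up, not model output).
toy_embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

sim_cat_dog = cosine_similarity(toy_embeddings["cat"], toy_embeddings["dog"])
sim_cat_car = cosine_similarity(toy_embeddings["cat"], toy_embeddings["car"])

# Related items should score higher than unrelated ones.
print(f"cat vs dog: {sim_cat_dog:.3f}")
print(f"cat vs car: {sim_cat_car:.3f}")
```

With these made-up numbers, "cat" and "dog" point in nearly the same direction while "cat" and "car" do not, which is the kind of structure an embedding is meant to preserve.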
After introducing "some well-known text embeddings" (from Word2Vec to GPT-x), the author sets out Why:
Which model to use to generate embeddings? "If you don’t bother to train your own language model... [just] call some API ... without worrying how it is done behind the screen, OpenAI GPT-3 Embedding API... text-embedding-ada-002 model... has the best performance" (it is usually near the top). It costs "0.04 cent for every 1000 tokens... ~4000 characters", i.e. $0.0004 per 1,000 tokens.
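The quoted price makes for easy back-of-the-envelope budgeting. The sketch below only does the arithmetic: it takes the quoted figures at face value (0.04 cent per 1,000 tokens, roughly 4 characters per token) and calls no API. The constant and function names are hypothetical, and both the price and the characters-per-token ratio are rough figures from the quote that may since have changed.

```python
# Assumptions taken from the quoted figures, not from any current price list.
PRICE_USD_PER_1K_TOKENS = 0.0004  # "0.04 cent for every 1000 tokens"
CHARS_PER_TOKEN = 4               # "1000 tokens... ~4000 characters"


def estimate_tokens(num_chars):
    """Roughly estimate the token count from a character count."""
    return num_chars / CHARS_PER_TOKEN


def estimate_cost_usd(num_chars):
    """Roughly estimate the embedding cost in US dollars for a given text size."""
    return estimate_tokens(num_chars) / 1000 * PRICE_USD_PER_1K_TOKENS


# Example: a corpus of one million characters.
corpus_chars = 1_000_000
print(f"~{estimate_tokens(corpus_chars):,.0f} tokens, "
      f"~${estimate_cost_usd(corpus_chars):.2f}")
```

Under these assumptions, a million characters is about 250,000 tokens and costs on the order of ten cents to embed.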