Curated Resource

Things we learned about LLMs in 2024


my notes

Simon Willison's "review of things we figured out about [LLMs] in the past twelve months, plus my attempt at identifying key themes and pivotal moments" has 19 major points:

  • "GPT-4 barrier was comprehensively broken": the year saw 18 organizations produce "models on the Chatbot Arena Leaderboard that rank higher than the original GPT-4 from March 2023 ... 70 models in total", with key themes including "increased context lengths ... [which] dramatically increase the scope of problems that can be solved", particularly coding problems
  • "Some of those GPT-4 models run on my [powerful but 2yr old] laptop... [although] they don’t leave much room for anything else ... Meta's Llama 3.2 models deserve a special mention... not GPT-4 class, but ... punch massively above their weight"
  • "LLM prices crashed, thanks to competition and increased efficiency", with the latter "really important ... cf environmental impact"
  • "Multimodal vision is common, audio and video are starting to emerge"
  • "Voice and live camera mode are science fiction come to life": GPT-4o ("omni") introduced this, and now with "ChatGPT voice mode ... [you can] share your camera feed with the model and talk about what you can see in real time."
  • "Prompt driven app generation is a commodity already... Claude Artifacts can write you an on-demand interactive application and then let you use it directly inside the Claude interface." Other teams followed: GitHub Spark, Mistral's Canvas...
  • "Universal access to the best models lasted for just a few short months": unlikely to return
  • "“Agents” still haven’t really happened yet": maybe because the term has over 200 different meanings - SW's skepticism is "based... on the challenge of gullibility. LLMs believe anything you tell them. Any system that attempts to make meaningful decisions on your behalf will run into the same roadblock: how good is a travel agent, or a digital assistant, or even a research tool if it can’t distinguish truth from fiction?... the most popular idea of “agents” [may be] dependent on AGI"
  • "Evals really matter... writing good automated evals for LLM-powered systems is... [essential] to build useful applications on top of these models... [with] a strong eval suite you can adopt new models faster, iterate better and build more reliable and useful products"
  • Apple Intelligence is bad, Apple’s MLX library is excellent
  • "The rise of inference-scaling “reasoning” models... exemplified by OpenAI’s o1 models ... think about ... [them as] an extension of the chain-of-thought prompting trick ... where you get a model to talk out loud about a problem it’s solving... o1 bakes it into the model itself ... take on harder problems by spending more compute on inference" rather than during training.
  • "Was the best currently available LLM trained in China for less than $6m? ... DeepSeek v3 is a huge 685B parameter model... significantly bigger than the largest of Meta’s Llama series ... by far the highest ranking openly licensed model", and trained using 11x fewer GPU-hours than Llama 3.1 405B - "some very effective training optimizations!"
  • "The environmental impact got better... [and] much, much worse:" increased efficiency cut energy usage, but "Google, Meta, Microsoft and Amazon are all spending billions of dollars rolling out new datacenters... Is this infrastructure necessary?" or will we end up with "financial crashes ... useful infrastructure and ... bankruptcies and environmental damage"
  • "The year of slop ... AI-generated content that is both unrequested and unreviewed."
  • "Synthetic training data works great:" model collapse is "clearly not happening ... AI labs increasingly train on synthetic content ... The days of just grabbing a full scrape of the web and indiscriminately dumping it into a training run are long gone."
  • "LLMs somehow got even harder to use... chainsaws disguised as kitchen knives... look deceptively simple to use ... [but] you need a huge depth of both understanding and experience to make the most of them and avoid their many pitfalls... Most users are thrown in at the deep end... [and so] develop wildly inaccurate mental models of how these things work and what they are capable of... [while] a lot of better informed people have sworn off LLMs entirely... [the] key skill ... is learning to work with tech that is both inherently unreliable and incredibly powerful at the same time."
  • "Knowledge is incredibly unevenly distributed... between [those] who actively follow this stuff and the 99% of the population who do not"
  • "LLMs need better criticism... There are plenty of reasons to dislike this technology ... The hype has been deafening ... enormous quantities of snake oil and misinformation ... A lot of very bad decisions are being made based on that ... [but] There is genuine value ... getting to that value is unintuitive and needs guidance."
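The evals point above can be sketched as a minimal example: a small assertion-based suite where each case pairs a prompt with a checker function, so swapping in a new model only means replacing one function. `call_model` here is a hypothetical stub standing in for whatever LLM API you actually use; real suites would have far more cases and fuzzier scoring.

```python
# Minimal eval-suite sketch. call_model is a hypothetical stub standing in
# for a real LLM API call; replace it with your client of choice.

def call_model(prompt: str) -> str:
    canned = {
        "What is 2 + 2?": "4",
        "Capital of France?": "The capital of France is Paris.",
    }
    return canned.get(prompt, "")

# Each case: (prompt, checker that returns True if the output is acceptable).
EVAL_CASES = [
    ("What is 2 + 2?", lambda out: "4" in out),
    ("Capital of France?", lambda out: "paris" in out.lower()),
]

def run_evals() -> float:
    """Return the fraction of eval cases the model passes."""
    passed = sum(check(call_model(prompt)) for prompt, check in EVAL_CASES)
    return passed / len(EVAL_CASES)

if __name__ == "__main__":
    print(f"pass rate: {run_evals():.0%}")
```

Because the suite is decoupled from any one model, adopting a new model is just a re-run of `run_evals()` - which is exactly the "adopt new models faster, iterate better" payoff the post describes.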
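The "11x fewer GPU-hours" claim in the DeepSeek point is simple arithmetic over the figures the two labs reported (roughly 2.79M H800 GPU-hours for DeepSeek v3 versus roughly 30.84M H100 GPU-hours for Llama 3.1 405B - take these as the reported numbers, not independently verified here):

```python
# Back-of-envelope check of the ~11x training-efficiency claim,
# using the GPU-hour figures reported for each model.
deepseek_v3_gpu_hours = 2_788_000    # H800 GPU-hours (DeepSeek v3 report)
llama_405b_gpu_hours = 30_840_000    # H100 GPU-hours (Llama 3.1 405B)

ratio = llama_405b_gpu_hours / deepseek_v3_gpu_hours
print(f"Llama 3.1 405B used about {ratio:.1f}x more GPU-hours")
```

The ratio comes out at roughly 11, consistent with the post's framing; note the two figures are on different GPU generations, so it is a rough efficiency comparison, not an exact one.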

Read the Full Post

The above notes were curated from the full post simonwillison.net/2024/Dec/31/llms-in-2024/.

