Curated Resource

BLOOM 176B — how to run a real LARGE language model in your own cloud?

My notes

How to set up BLOOM on your own cloud.

BLOOM stands for "BigScience Large Open-science Open-access Multilingual Language Model":

  • free transformer-based language model created by 1000 researchers
  • trained on about 1.6 TB of pre-processed multilingual text
  • the largest BLOOM model has 176B parameters, roughly GPT-3 scale
  • smaller models available: 7b, 3b, 1b7
  • needs about 360 GB of RAM, but "Microsoft has provided a downsampled variant with INT8 weights (from original FLOAT16 weights) that runs on the DeepSpeed Inference engine and uses tensor parallelism... tensors are split into 8 shards. So ... absolute model size is reduced and ... split and parallelized and can thus be distributed over 8 GPUs."
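
The memory figures in these notes follow from simple arithmetic on parameter count and bytes per weight. A rough sketch, assuming only the weights are counted (activations and runtime overhead are ignored, which is probably why the quoted figures are slightly higher than the raw numbers):

```python
# Back-of-the-envelope weight memory for BLOOM 176B.
# Assumption: only the weights are counted; activations and runtime
# overhead are ignored, so real requirements are somewhat higher.

PARAMS = 176e9                                # parameters in the largest BLOOM model
BYTES_PER_WEIGHT = {"float16": 2, "int8": 1}  # storage size per weight
NUM_SHARDS = 8                                # tensor-parallel shards, one per GPU


def weight_size_gb(params: float, dtype: str) -> float:
    """Size of the weights in GB (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_WEIGHT[dtype] / 1e9


fp16_gb = weight_size_gb(PARAMS, "float16")
int8_gb = weight_size_gb(PARAMS, "int8")
per_gpu_gb = int8_gb / NUM_SHARDS

print(f"FP16 weights: {fp16_gb:.0f} GB")
print(f"INT8 weights: {int8_gb:.0f} GB")
print(f"INT8 per GPU over {NUM_SHARDS} shards: {per_gpu_gb:.0f} GB")
```

This gives about 352 GB in FP16, 176 GB in INT8, and 22 GB per GPU once the INT8 weights are split into 8 shards, which is in line with the ~360 GB and 180 GB figures quoted in the notes.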

The author then provides instructions for hosting on AWS, as "it provides a SageMaker setup for a Deep Learning container capable of initializing the model... [but] get the instances through support, you can’t do it by self-configuring... hosted model can be loaded from the Microsoft repository on Hugging Face into an S3 ... [it's] 180 GB... costs about $32 per hour when running... you can start it in about 18 min, shutting down and freeing the resources takes seconds... We put a custom API gateway and lambda function in the interface on top of the SageMaker endpoint that allows users to connect externally with an API key".
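
The post's own gateway and Lambda code is not reproduced in these notes, so the following is only a minimal sketch of what such a Lambda function might look like. The endpoint name, the `prompt` field, and the request payload shape are all hypothetical; the payload your container expects may differ. It assumes the API key is checked by API Gateway itself before the function runs, so the handler does not validate it.

```python
import json

ENDPOINT_NAME = "bloom-176b-int8"  # hypothetical endpoint name


def build_payload(prompt, max_new_tokens=50):
    """Assumed request shape; check what your serving container expects."""
    return json.dumps(
        {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    )


def lambda_handler(event, context, client=None):
    """Forward an API Gateway proxy request to the SageMaker endpoint."""
    if client is None:
        import boto3  # imported lazily so the module loads without AWS deps

        client = boto3.client("sagemaker-runtime")

    # API Gateway proxy events carry the request body as a JSON string.
    body = json.loads(event.get("body") or "{}")
    prompt = body.get("prompt")
    if not prompt:
        return {"statusCode": 400, "body": json.dumps({"error": "missing 'prompt'"})}

    response = client.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=build_payload(prompt),
    )
    # response["Body"] is a streaming object; read() returns bytes.
    result = response["Body"].read().decode("utf-8")
    return {"statusCode": 200, "body": result}
```

The optional `client` argument is there only so the handler can be exercised locally with a stand-in object; Lambda itself calls the function with `event` and `context` and the real `boto3` client is created on demand.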

Read the Full Post

The above notes were curated from the full post
