Affordable Training of Large Language Models

Recent developments in large language models (LLMs) have caught the attention of the public. LLMs such as OpenAI's GPT-4 and Google's Bard can generate remarkably realistic, coherent text from a user's input and have the potential to become general-purpose tools used throughout society, e.g. for customer service, summarising text, answering questions, writing contracts, or translating between languages.

However, LLMs are prohibitively expensive to train. GPT-3 (which is significantly smaller than its successor, GPT-4) has an estimated training time of 355 GPU-years and an estimated training cost of $4.6M [1]. Only large, wealthy institutions can train these models, and they thereby control how the models are trained and who gets access to them. This is undemocratic.

Very recent work, however, provides hope. In [2] the authors explore the promising idea of “cramming”: training an LLM on a single GPU in a day. In [3] the authors use synthetic data to train “small” language models that can produce consistent stories at little cost. However, a huge gap in quality remains between these models and their expensive counterparts.

In this PhD, the student will investigate affordable LLM training, i.e. with limited compute and/or data, inspired by [2,3]. Avenues of research could include (i) generating training data that facilitates fast training, e.g. through dataset distillation [4]; (ii) exploring neural architecture search to develop models that are “aware” of being resource-constrained while being trained; (iii) developing novel cost-effective training algorithms; and (iv) leveraging and tuning open-source LLMs.
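To give a flavour of avenue (i): dataset distillation learns a tiny synthetic dataset such that a model trained on it performs well on the real data. The sketch below is a toy illustration only, not part of the advertised project; it distils a 500-point linear-regression dataset into 5 learned synthetic points by meta-gradient descent through a one-step inner training loop (all hyperparameters are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)

# "Real" dataset: N points from a linear model y = X w* + noise.
N, d, M = 500, 10, 5                  # M synthetic points, far fewer than N
w_star = rng.normal(size=d)
X = rng.normal(size=(N, d))
y = X @ w_star + 0.1 * rng.normal(size=N)

# Synthetic dataset to be learned, initialised randomly.
Xs = rng.normal(size=(M, d))
ys = rng.normal(size=M)

eta = 0.5     # inner-loop learning rate (one GD step from w = 0)
alpha = 0.05  # outer-loop learning rate for the synthetic data

def inner_model(Xs, ys):
    # One gradient step on the synthetic MSE starting from w = 0
    # gives w1 = (eta / M) * Xs^T ys.
    return (eta / M) * Xs.T @ ys

def meta_loss(Xs, ys):
    # How well does the synthetically-trained model fit the real data?
    w1 = inner_model(Xs, ys)
    r = X @ w1 - y
    return (r @ r) / N

initial = meta_loss(Xs, ys)
for _ in range(2000):
    w1 = inner_model(Xs, ys)
    g = (2.0 / N) * X.T @ (X @ w1 - y)   # dL/dw1 on the real data
    # Chain rule through w1 = (eta / M) * Xs^T ys:
    grad_ys = (eta / M) * Xs @ g
    grad_Xs = (eta / M) * np.outer(ys, g)
    ys -= alpha * grad_ys
    Xs -= alpha * grad_Xs

final = meta_loss(Xs, ys)
print(initial, final)  # the distilled 5 points now train a far better model
```

After distillation, a model trained on just the 5 synthetic points fits the 500 real points almost as well as direct training would, which is the core promise of the approach in [4]: cheap training on a small, carefully constructed dataset.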

The successful student will have opportunities for collaboration within and outside Edinburgh’s School of Engineering e.g. with colleagues in the Institute for Digital Communications, The Bayesian and Neural Systems Group, and Edinburgh NLP.

 

Further Information: 

The University of Edinburgh is committed to equality of opportunity for all its staff and students, and promotes a culture of inclusivity. Please see details here: https://www.ed.ac.uk/equality-diversity

https://elliotjcrowley.github.io/

https://www.bayeswatch.com

 

References

[1] https://lambdalabs.com/blog/demystifying-gpt-3

[2] https://arxiv.org/abs/2212.14034

[3] https://arxiv.org/abs/2305.07759

[4] https://arxiv.org/abs/1811.10959

Closing Date: 

Friday, September 1, 2023

Principal Supervisor: 

Assistant Supervisor: 

Eligibility: 

Minimum entry qualification - an Honours degree at 2:1 or above (or International equivalent) in a relevant science or engineering discipline, possibly supported by an MSc degree. Further information on English language requirements for EU/Overseas applicants.

What we’re looking for

An enthusiastic, creative student with a solid academic record and strong programming skills. AI is an exciting, fast-moving field so be prepared to think on your feet (and at times laterally)!

Essential Criteria:

  • A 2:1 or higher degree in a numerate discipline
  • Excellent communication skills
  • Programming skills

Desirable Criteria:

  • Knowledge of machine learning
  • An understanding of large language models
  • Experience programming in Python

Funding: 

Tuition fees + stipend are available for applicants who qualify as Home applicants (International students can apply, but the funding only covers the Home fee rate)

Further information and other funding options.

Informal Enquiries: