A Practical Guide to Gaining Value From LLMs

1. Credit for this example is due to X user Dean Buono (@deanbuono); credit for subsequent examples in this section is due to Colin Fraser (@colin_fraser).

2. L. Berglund, M. Tong, M. Kaufmann, et al., “The Reversal Curse: LLMs Trained on ‘A Is B’ Fail to Learn ‘B Is A,’” arXiv, submitted Sept. 21, 2023, https://arxiv.org.

3. E. Mollick, “Google’s Gemini Advanced: Tasting Notes and Implications,” One Useful Thing, Feb. 8, 2024, www.oneusefulthing.org.

4. “Retrieval-Augmented Generation,” Wikipedia, accessed Oct. 22, 2024, https://en.wikipedia.org.

5. P. Béchard and O.M. Ayala, “Reducing Hallucination in Structured Outputs via Retrieval-Augmented Generation,” arXiv, submitted April 12, 2024, https://arxiv.org.

6. “Industrial-Strength LLM,” The Batch, Aug. 30, 2023, www.deeplearning.ai.

7. X. Xu, M. Li, C. Tao, et al., “A Survey on Knowledge Distillation of Large Language Models,” arXiv, submitted Feb. 20, 2024, https://arxiv.org.

8. S. Mukherjee, A. Mitra, G. Jawahar, et al., “Orca: Progressive Learning From Complex Explanation Traces of GPT-4,” arXiv, submitted June 5, 2023, https://arxiv.org.

9. E. Brynjolfsson, T. Mitchell, and D. Rock, “What Can Machines Learn, and What Does It Mean for Occupations and the Economy?” AEA Papers and Proceedings 108 (May 2018): 43-47.

10. E. Yan, B. Bischof, C. Frye, et al., “What We Learned From a Year of Building With LLMs (Part 1),” O’Reilly, May 28, 2024, www.oreilly.com.

11. J. Wei, X. Wang, D. Schuurmans, et al., “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models,” arXiv, submitted Jan. 28, 2022, https://arxiv.org.

12. Wei et al., “Chain-of-Thought Prompting.”

13. H.W. Chung, L. Hou, S. Longpre, et al., “Scaling Instruction-Finetuned Language Models,” arXiv, revised Dec. 6, 2022, https://arxiv.org.

14. S. Ranger, “Most Developers Will Soon Use an AI Pair Programmer — but the Benefits Aren’t Black and White,” ITPro, April 16, 2024, www.itpro.com.

15. H. Husain, “Your AI Product Needs Evals,” Hamel Husain (blog), https://hamel.dev; E. Yan, “Task-Specific LLM Evals That Do & Don’t Work,” Eugene Yan (blog), https://eugeneyan.com; and L. Zheng, W.-L. Chiang, Y. Sheng, et al., “Judging LLM-as-a-Judge With MT-Bench and Chatbot Arena,” arXiv, submitted June 9, 2023, https://arxiv.org.

“The MIT Sloan Management Review is a research-based magazine and digital platform for business executives published at the MIT Sloan School of Management.”
