How Cohere uses Ray along with JAX and TPUv4 to train Large Language Models
Dec 2022
Blog post I helped write at Anyscale. Cohere's FAX system enables distributed training of Large Language Models using JAX's pjit for tensor parallelism, pods of TPUv4 accelerators with high-speed interconnect, and Ray's powerful yet simple worker orchestration.
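A minimal sketch (not Cohere's FAX code) of the pjit idea: a weight matrix is sharded across a named device mesh, and the jitted matmul runs as a single SPMD program. pjit has since been folded into `jax.jit`, so the sketch uses that entry point; the mesh axis name and array shapes are illustrative. On a TPUv4 pod the mesh would span many chips, but this runs on whatever devices are available.

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# One mesh axis named "model" over all available devices
# (illustrative name; on TPUv4 this would span a pod of chips).
mesh = Mesh(np.array(jax.devices()), axis_names=("model",))

# Shard the weight's output dimension across the "model" axis;
# keep the activations replicated.
w = jax.device_put(jnp.ones((8, 2)), NamedSharding(mesh, P(None, "model")))
x = jax.device_put(jnp.ones((4, 8)), NamedSharding(mesh, P(None, None)))

@jax.jit
def matmul(x, w):
    # Compiled once as a single sharded SPMD computation;
    # XLA inserts the needed cross-device communication.
    return x @ w

y = matmul(x, w)
print(y.shape)  # (4, 2)
```

In the blog's setting, Ray would launch one such SPMD worker per TPU host and handle scheduling and fault tolerance around them.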