Changelog

Kubernetes Icon blog.openai.com

Scaling Kubernetes to 2,500 Nodes

logged by @adamstac 2018-01-21T05:59:02.786613Z permalink #kubernetes #data-science

Are you really pushing Kubernetes? No? OpenAI is...

We’ve been running Kubernetes for deep learning research for over two years. While our largest-scale workloads manage bare cloud VMs directly, Kubernetes provides a fast iteration cycle, reasonable scalability, and a lack of boilerplate which makes it ideal for most of our experiments. We now operate several Kubernetes clusters (some in the cloud and some on physical hardware), the largest of which we’ve pushed to over 2,500 nodes. This cluster runs in Azure on a combination of D15v2 and NC24 VMs.

0:00 / 0:00