As Kubernetes adoption continues to grow, organizations find themselves facing new scaling and operational challenges. Whether you’re deploying your first Kubernetes environment or scaling existing containerized workloads, understanding the nuances of configuration, sizing, and security is crucial. As cloud-native applications evolve, even seasoned practitioners face unforeseen challenges: resource shortages, security vulnerabilities, and the complexity of managing multi-cloud environments.
In this session, we’ll discuss strategies aimed at enhancing the resilience and scalability of Kubernetes deployments. Our discussion will include practical implementations that have helped avoid common pitfalls and outages, ensuring that your infrastructure is production-ready.
Moreover, the session will explore the transition to multi-cloud strategies, as demonstrated by our case study with HourOne, which rapidly scaled its operations across multiple locations and cloud providers to address GPU availability challenges critical for GenAI applications. We will discuss strategic use of Google Kubernetes Engine (GKE) and the integration of new resources to optimize performance, enhance system stability, and ensure durability against availability issues.
Attendees will gain insights into making informed decisions on cloud resource utilization, navigating the complexities of multi-cloud environments, and continuously adapting to the dynamic requirements of cloud-native ecosystems.
Speakers
Lev Andelman
Gil Zellner
Venue
Google Cloud HQ
98 Alon Yigal TEL AVIV-JAFFA, Central District, 6789141 Israel
Lev Andelman
Avoiding A Kubernetes Outage
You are probably running a dev environment, or even some production workloads. You probably think you’ve got it covered. Many organizations do, until they get their first Kubernetes outage. As Mike Tyson says, “everybody has a plan until they get punched in the mouth.”
An outage can have many causes: incorrect sizing, misconfigured autoscaling, missing CPU or memory headroom, poor security, insufficient observability, and more. Growing organizations move from outage to outage, improving their Kubernetes environments without knowing what will hit them next. Even for organizations experienced in the cloud-native ecosystem, workloads change, traffic and requirements change, and new security attacks are introduced. So environment review, adjustment, and redesign is an ongoing process.
Based on our work with dozens of TeraSky customers, we developed a methodology to identify, design, and deploy resilient, production-ready Kubernetes environments. In this session I would like to share a couple of concepts from our methodology, along with their practical implementation, that might save you from some of your next outages.
This session is a good fit for companies planning their first Kubernetes deployment and for companies that already run a significant number of workloads in a containerized environment.
Lev Andelman, CTO, TeraSky
Gil Zellner
How to safely go multi cloud doing GenAI video on K8s (GKE) on GCP GPUs
Discover how we navigated GPU scarcity and regional capacity limits, enhancing our pipeline performance and system stability. In this session, we’ll share actionable insights on leveraging Google Cloud Platform resources efficiently, providing a roadmap for CTOs and tech leaders aiming to implement robust multi-cloud strategies in high-demand, AI-centric environments.
Gil Zellner, Chronicle SOAR Lab Team lead, Google
9:30 AM – 10:00 AM
Networking Breakfast & Speaker Intros
10:00 AM – 10:45 AM
Avoiding A Kubernetes Outage
Launching Kubernetes in 2023 is not a very complicated task. You are probably planning to do it or have already done it. You are probably running a dev environment, or even some production workloads. You probably think you’ve got it covered. Many organizations do, until they get their first Kubernetes outage. As Mike Tyson says, “everybody has a plan until they get punched in the mouth.”
10:45 AM – 11:30 AM
How to safely go multi cloud doing GenAI video on K8s (GKE) on GCP GPUs
A case study of HourOne’s strategic evolution from a single-region AWS setup to a resilient multi-cloud, multi-region framework using Google Kubernetes Engine and advanced GPU resources. Discover how we navigated GPU scarcity and regional capacity limits, enhancing our pipeline performance and system stability. In this session, we’ll share actionable insights on leveraging Google Cloud Platform resources efficiently, providing a roadmap for CTOs and tech leaders aiming to implement robust multi-cloud strategies in high-demand, AI-centric environments.
11:30 AM – 12:30 PM
Closing Comments & Buffet Lunch