
Foundation Model Training
From Infrastructure to Evaluation, Debugging & Optimization

Virtual Summit, April 30th
Join us for Foundation Model Training: From Infrastructure to Evaluation, Debugging & Optimization
This exclusive technical summit brings together leading researchers and practitioners to tackle real-world challenges in foundation model training.
The event bridges theory and practice, giving AI researchers and practitioners who train foundation models a rare opportunity to exchange battle-tested approaches to infrastructure scaling, debugging model internals, evaluation, and optimization.
Focus Areas
1) Infrastructure Debugging & Monitoring
- Diagnosing performance bottlenecks in multi-GPU / multi-node setups
- Instrumenting pipelines for deep observability (profiling GPU utilization, data flow, etc.)
- Correlating infrastructure metrics with model states (loss, gradients) in real time
- Failure detection and recovery strategies in distributed or HPC environments
2) Model Internals & Debugging
- Techniques for analyzing attention and activation patterns (layer-by-layer visualizations)
- Identifying and fixing gradient issues (vanishing, exploding, partial inactivity)
- Debugging architectural or layer-level bottlenecks
- Leveraging interpretability to guide early-phase debugging (during pre-training)
3) Evaluation
- Designing targeted test sets and adversarial evaluations for foundation models
- Error analysis frameworks to uncover overlooked failures or biases
- Establishing benchmarks for generalization, robustness, and emergent capabilities
- Integrating evaluation signals back into hyperparameter tuning and model iteration
4) Pre-Training Optimization
- Hyperparameter optimization at foundation-model scale (e.g., population-based training)
- Data pipeline throughput (streaming, multi-threaded I/O, sharding)
- Memory-saving strategies for large context windows (activation checkpointing, gradient sharding)
- Accelerating convergence (curriculum learning, dynamic batching, advanced scheduling)
Join Our Community
Our goal is to build an open, inclusive community where ML practitioners can share projects, best practices, and case studies. Join our open group, meet our community, and share your work with practitioners from around the world.
Join us here and learn more: