"A Predictive Law for On-Policy Self-Distillation From World Feedback" SUMMARY coming soon :) ICML 2026 paper