Chapter 4. Task 1: Causal Offline-to-Online Learning (Generalized Policy Learning)

"Off-policy methods can significantly improve sample efficiency, since they allow an agent to learn from observed trajectories generated by different behavior policies, without directly deploying target policies in the underlying environment."
- Zhang & Bareinboim (2025)

4.1 Problem Definition: Systematically Combining L1 (Observational) and L2 (Interventional) Data

4.1.1 Where CRL Task 1 Fits

CRL's ..