{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Ch4 配套 notebook：MACE 最小可跑示範（mock-first）\n",
    "\n",
    "**這份 notebook 教的是「訓練一個 MLIP 會經歷哪些步驟」，不是教你調 MACE 的參數。**\n",
    "\n",
    "設計原則：\n",
    "- **預設 mock 模式**：只用 `numpy` + `matplotlib`，任何電腦都能跑，不需要安裝 MACE。\n",
    "- 我們用一個**玩具勢能（Morse）當作「假 DFT」**產生 energy / forces，然後親手走一遍 MLIP 的訓練流程：\n",
    "  描述符 → train/val/test 切分 → 學習曲線 → 外推測試 → 「force = 能量的梯度」。\n",
    "- 最後一節是 **Colab 真跑開關**：如果你在 Colab 且想用真正的 MACE foundation model，照著做即可。\n",
    "\n",
    "> ⚠️ 這裡的能量數字沒有物理單位上的意義，是教學示意。真實流程的資料來自你的 VASP 計算（見最後一節與 `docs/chapters/ch02-dft-to-mlip.md`）。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 0. 環境偵測\n",
    "\n",
    "這個 cell 會偵測有沒有裝 `mace-torch`。沒有就自動進 mock 模式——**不會報錯中斷**。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "try:\n",
    "    import mace  # noqa: F401\n",
    "    HAS_MACE = True\n",
    "except Exception:\n",
    "    HAS_MACE = False\n",
    "\n",
    "rng = np.random.default_rng(20260611)\n",
    "print('MACE 已安裝？', HAS_MACE)\n",
    "print('→ 模式：', '可選擇真跑（見最後一節）' if HAS_MACE else 'MOCK（純 numpy，照樣學完整流程）')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. 假 DFT：用 Morse 勢能生成 (結構, 能量, 力)\n",
    "\n",
    "真實世界裡，這一步是「跑 VASP，收集 OUTCAR 的 energy 和 forces」。\n",
    "這裡我們用一條 5 原子的鏈，原子間以 Morse 勢能交互作用，當作我們的「真理（ground truth）」。\n",
    "\n",
    "Morse 勢能：$V(r) = D_e\\,(1 - e^{-a(r-r_0)})^2$，力是它對距離的解析微分。\n",
    "\n",
    "**重點**：一個 N 原子構型只有 **1 個能量標籤**，卻有 **3N 個力標籤**——力把勢能面的坡度也教給模型。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "De, a, r0 = 1.0, 1.7, 1.1   # Morse 參數（示意）\n",
    "N_ATOMS = 5\n",
    "\n",
    "def morse_energy_forces(pos):\n",
    "    \"\"\"pos: (N,3) 座標 -> (總能量, 每原子力 (N,3))\"\"\"\n",
    "    E = 0.0\n",
    "    F = np.zeros_like(pos)\n",
    "    for i in range(len(pos)):\n",
    "        for j in range(i+1, len(pos)):\n",
    "            d = pos[i] - pos[j]\n",
    "            r = np.linalg.norm(d)\n",
    "            ex = np.exp(-a*(r - r0))\n",
    "            E += De*(1 - ex)**2\n",
    "            # dV/dr = 2*De*a*ex*(1-ex)\n",
    "            dVdr = 2*De*a*ex*(1 - ex)\n",
    "            fij = -dVdr * (d/r)        # 作用在 i 上的力\n",
    "            F[i] += fij\n",
    "            F[j] -= fij\n",
    "    return E, F\n",
    "\n",
    "def make_config(scale=1.0, jitter=0.08):\n",
    "    \"\"\"生成一個沿 x 軸排列、帶隨機擾動的鏈狀構型；scale 控制整體鍵長（壓縮/拉伸）\"\"\"\n",
    "    base = np.zeros((N_ATOMS, 3))\n",
    "    base[:,0] = np.arange(N_ATOMS) * r0 * scale\n",
    "    base += rng.normal(0, jitter, size=base.shape)\n",
    "    return base\n",
    "\n",
    "# 生成資料集：scale 在 0.9~1.15 之間（接近平衡 ~ 略拉伸）\n",
    "n_data = 400\n",
    "scales = rng.uniform(0.9, 1.15, n_data)\n",
    "configs = [make_config(s) for s in scales]\n",
    "energies = np.array([morse_energy_forces(c)[0] for c in configs])\n",
    "print('資料筆數：', n_data)\n",
    "print('能量範圍：', energies.min().round(3), '~', energies.max().round(3))\n",
    "E0, F0 = morse_energy_forces(configs[0])\n",
    "print('第 0 個構型：能量 = %.3f（1 個標籤），力的形狀 = %s（3N=%d 個標籤）'%(E0, F0.shape, F0.size))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. 描述符（descriptor）：把結構變成固定長度向量\n",
    "\n",
    "神經網路吃固定長度向量，但原子座標的長度、順序、座標系都會變。\n",
    "真實的 MACE 用**等變圖神經網路**自動學描述符；這裡我們手刻一個最簡版的\n",
    "**對稱函數**：把所有原子對距離用一組高斯基底展開、加總。\n",
    "\n",
    "它天生滿足「平移/旋轉/原子交換不變」——這正是 Ch3、Ch4 講的 invariance。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "centers = np.linspace(0.8, 4.0, 16)   # 16 個高斯中心 (Å)\n",
    "width = 0.25\n",
    "\n",
    "def descriptor(pos):\n",
    "    rs = []\n",
    "    for i in range(len(pos)):\n",
    "        for j in range(i+1, len(pos)):\n",
    "            rs.append(np.linalg.norm(pos[i]-pos[j]))\n",
    "    rs = np.array(rs)\n",
    "    # 每個高斯中心：sum_pairs exp(-((r-c)/w)^2)\n",
    "    return np.exp(-((rs[:,None]-centers[None,:])/width)**2).sum(axis=0)\n",
    "\n",
    "X = np.array([descriptor(c) for c in configs])\n",
    "y = energies.copy()\n",
    "print('描述符矩陣 X 形狀：', X.shape, '（每個構型 ->', X.shape[1], '維向量）')\n",
    "print('驗證旋轉不變性：把構型旋轉 37°，描述符應幾乎不變')\n",
    "th = np.deg2rad(37); R = np.array([[np.cos(th),-np.sin(th),0],[np.sin(th),np.cos(th),0],[0,0,1]])\n",
    "d_before = descriptor(configs[0]); d_after = descriptor(configs[0] @ R.T)\n",
    "print('  最大差異 = %.2e（≈0 代表不變）' % np.abs(d_before-d_after).max())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. train / validation / test 切分\n",
    "\n",
    "- **train**：用來擬合模型\n",
    "- **validation**：調超參數時看它（這份簡單範例用來選 ridge 的正則化強度）\n",
    "- **test**：最終誠實成績，**只能用一次**\n",
    "\n",
    "紀律：validation 被你「看過」就不再公正；test 一旦拿來調參就污染了。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "idx = rng.permutation(n_data)\n",
    "n_tr, n_va = 240, 80     # 其餘 80 當 test\n",
    "tr, va, te = idx[:n_tr], idx[n_tr:n_tr+n_va], idx[n_tr+n_va:]\n",
    "print('train/val/test =', len(tr), len(va), len(te))\n",
    "\n",
    "def fit_ridge(Xtr, ytr, lam):\n",
    "    # 加一個常數項（對應 atomic reference energy E0）\n",
    "    A = np.hstack([Xtr, np.ones((len(Xtr),1))])\n",
    "    n_feat = A.shape[1]\n",
    "    w = np.linalg.solve(A.T@A + lam*np.eye(n_feat), A.T@ytr)\n",
    "    return w\n",
    "\n",
    "def predict(w, Xq):\n",
    "    A = np.hstack([Xq, np.ones((len(Xq),1))])\n",
    "    return A @ w\n",
    "\n",
    "def rmse(p, t):\n",
    "    return float(np.sqrt(np.mean((p-t)**2)))\n",
    "\n",
    "# 用 validation 選正則化強度 lambda\n",
    "lams = [1e-6, 1e-4, 1e-2, 1e-1, 1.0]\n",
    "val_err = []\n",
    "for lam in lams:\n",
    "    w = fit_ridge(X[tr], y[tr], lam)\n",
    "    val_err.append(rmse(predict(w, X[va]), y[va]))\n",
    "best = lams[int(np.argmin(val_err))]\n",
    "for lam,e in zip(lams,val_err):\n",
    "    print('  lambda=%.0e  val RMSE=%.4f %s' % (lam, e, '  <- 最佳' if lam==best else ''))\n",
    "print('選定 lambda =', best)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Parity plot：預測 vs 真值\n",
    "\n",
    "MLIP 論文最常見的圖。點越貼近對角線越好。**只看 test 的點**才算數。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "w = fit_ridge(X[tr], y[tr], best)\np_te = predict(w, X[te])\nprint('test RMSE = %.4f' % rmse(p_te, y[te]))\n\n# 註：圖的軸標籤用英文，避免 Colab/本機缺中文字型時顯示成方框；中文解說在 markdown 與 print。\nplt.figure(figsize=(4.6,4.4))\nlim = [y.min()-0.2, y.max()+0.2]\nplt.plot(lim, lim, 'k--', lw=1, label='ideal (y=x)')\nplt.scatter(y[te], p_te, s=22, alpha=0.7, label='test configs')\nplt.xlabel('true energy (arb.)'); plt.ylabel('predicted')\nplt.title('Parity plot (test set)'); plt.legend(); plt.tight_layout(); plt.show()"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. 學習曲線：要準備多少 DFT 資料？\n",
    "\n",
    "把訓練集大小逐步加大，看 test RMSE 怎麼降。\n",
    "在 log-log 圖上常是一條近似直線（誤差 ∝ N^(−k)）——這條經驗律幫你估「資料加倍能換到多少精度」。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "sizes = [10, 20, 40, 80, 160, 240]\ncurve = []\nfor m in sizes:\n    w_m = fit_ridge(X[tr[:m]], y[tr[:m]], best)\n    curve.append(rmse(predict(w_m, X[te]), y[te]))\n\nplt.figure(figsize=(5,3.8))\nplt.loglog(sizes, curve, 'o-')\nplt.xlabel('training set size N'); plt.ylabel('test RMSE')\nplt.title('Learning curve (log-log)'); plt.grid(True, which='both', alpha=0.3)\nplt.tight_layout(); plt.show()\nfor m,e in zip(sizes,curve):\n    print('  N=%3d  test RMSE=%.4f' % (m,e))"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. 外推風險：MLIP 最重要的安全課\n",
    "\n",
    "現在做一件「作弊」的事：**只用壓縮的構型訓練（scale 0.9~1.0），拿去測拉伸的構型（scale 1.2~1.4）**。\n",
    "這模擬真實研究裡「訓練集沒涵蓋的化學環境」——例如只訓練 bulk、拿去算表面。\n",
    "\n",
    "你會看到分布內很準、分布外**自信地錯**。這就是 Ch2/Ch5 講的 extrapolation 風險與 uMLIP 的 softening。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "# 分布內：壓縮構型\nin_scales = rng.uniform(0.9, 1.0, 200)\nin_cfg = [make_config(s) for s in in_scales]\nXin = np.array([descriptor(c) for c in in_cfg])\nyin = np.array([morse_energy_forces(c)[0] for c in in_cfg])\n\n# 分布外：拉伸構型（訓練時從沒看過）\nout_scales = rng.uniform(1.2, 1.4, 200)\nout_cfg = [make_config(s) for s in out_scales]\nXout = np.array([descriptor(c) for c in out_cfg])\nyout = np.array([morse_energy_forces(c)[0] for c in out_cfg])\n\nw_in = fit_ridge(Xin[:150], yin[:150], best)\nerr_in  = rmse(predict(w_in, Xin[150:]), yin[150:])\nerr_out = rmse(predict(w_in, Xout), yout)\nprint('分布內 (interpolation) test RMSE = %.4f' % err_in)\nprint('分布外 (extrapolation) test RMSE = %.4f  <-- 通常大得多' % err_out)\nprint('外推誤差是內插的 %.1f 倍' % (err_out/max(err_in,1e-9)))\n\nfig,ax = plt.subplots(1,2,figsize=(8,3.6))\npanels = [(Xin[150:],yin[150:],'in-distribution (trained range)'),\n          (Xout,yout,'out-of-distribution (stretched)')]\nfor a_,(Xq,yq,ttl) in zip(ax, panels):\n    pq = predict(w_in, Xq)\n    lim=[min(yq.min(),pq.min())-0.2, max(yq.max(),pq.max())+0.2]\n    a_.plot(lim,lim,'k--',lw=1); a_.scatter(yq,pq,s=16,alpha=0.6)\n    a_.set_title(ttl,fontsize=10); a_.set_xlabel('true'); a_.set_ylabel('predicted')\nplt.tight_layout(); plt.show()"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7. 「force = −能量的梯度」\n",
    "\n",
    "MLIP 的力不是另外訓練的輸出，而是能量對座標自動微分得到的——這保證了能量守恆。\n",
    "這裡我們用數值微分驗證：移動一個原子，$-\\dfrac{\\Delta E}{\\Delta x}$ 應該等於解析力。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cfg = make_config(1.0)\n",
    "E_analytic, F_analytic = morse_energy_forces(cfg)\n",
    "\n",
    "h = 1e-5\n",
    "F_numeric = np.zeros_like(cfg)\n",
    "for i in range(len(cfg)):\n",
    "    for d in range(3):\n",
    "        cp = cfg.copy(); cp[i,d]+=h; Ep,_ = morse_energy_forces(cp)\n",
    "        cm = cfg.copy(); cm[i,d]-=h; Em,_ = morse_energy_forces(cm)\n",
    "        F_numeric[i,d] = -(Ep-Em)/(2*h)   # F = -dE/dx\n",
    "\n",
    "print('解析力與數值梯度的最大差異 = %.2e  (≈0 代表 F=-∇E 成立)' % np.abs(F_analytic-F_numeric).max())\n",
    "print('\\n原子 0 的力：解析 =', F_analytic[0].round(4), ' 數值 =', F_numeric[0].round(4))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 8. （Colab 真跑開關）用真正的 MACE foundation model\n",
    "\n",
    "上面全程沒碰 MACE，但你已經走完一個 MLIP 的完整心智模型。\n",
    "如果你在 **Google Colab** 且想看真正的 MACE：把下面 cell 解除註解執行。\n",
    "\n",
    "```python\n",
    "# !pip install mace-torch ase\n",
    "# from mace.calculators import mace_mp\n",
    "# from ase import build\n",
    "# atoms = build.molecule('H2O')\n",
    "# atoms.calc = mace_mp(model='small', device='cpu', default_dtype='float32')  # 第一次會下載權重\n",
    "# print('H2O 能量 (eV):', atoms.get_potential_energy())\n",
    "# print('力 (eV/Å):\\n', atoms.get_forces())\n",
    "```\n",
    "\n",
    "**坑提醒（見 `docs/repo-analysis/mace.md`）**：\n",
    "- 套件名是 `mace-torch`，不是 `MACE`。\n",
    "- 先照 pytorch.org 裝對 CUDA 版的 torch（torch 2.4.1 整版不支援）。\n",
    "- 訓練/微調用 `mace_run_train`；舊教學文的 `--swa` 已改名 `--stage_two`。\n",
    "- foundation model 的 license 分裂：MP-0/MPA-0 是 MIT，OMAT/OFF/MH 系列是 ASL（僅學術）。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if HAS_MACE:\n",
    "    print('偵測到 MACE。把上面 markdown 裡的程式碼貼到這裡即可真跑。')\n",
    "else:\n",
    "    print('目前是 MOCK 模式——你已經學完 MLIP 的完整流程，這正是這份 notebook 的目的。')\n",
    "    print('要真跑 MACE，請在 Colab 執行上一節的安裝指令。')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 我學完這份 notebook 應該能說出什麼\n",
    "\n",
    "1. MLIP 的訓練標籤是 DFT 的能量與力；**力提供 3N 倍密度的訊號**。\n",
    "2. 描述符要滿足平移/旋轉/原子交換不變（我們手刻的對稱函數驗證過旋轉不變）。\n",
    "3. train/val/test 各自的角色，以及 **test 只能用一次** 的紀律。\n",
    "4. 學習曲線怎麼讀，能估「資料加倍換到多少精度」。\n",
    "5. **外推風險**：分布外模型會自信地錯——這是把 bulk 模型用到表面時的核心危險。\n",
    "6. 力是能量的梯度（$F=-\\nabla E$），不是另外訓練的輸出。\n",
    "\n",
    "下一步：`docs/chapters/ch04-mace.md` 的 Level 3，與官方 4 份 Colab tutorial。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.x"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}