搜索复习

搜索复习，但是 Rust。

在描述各类搜索算法之前，我们先给出一个通用的问题定义：

trait Problem {
    type State: State + Trans<Self::Action>;
    type Action: Clone;
    fn init_state(&self) -> Self::State;
    fn actions_of(&self, s: &Self::State) -> Vec<Self::Action>;
}

A* 算法

A* 算法的框架其实就是 Dijkstra 算法求解最短路。在这个算法中，问题的每个状态都有一个估价，
我们希望求出满足条件的代价最小的状态。为此我们定义搜索树上的结点为

trait Node<P: Problem>: Sized + Copy {
    fn create(s: P::State) -> Self;
    fn state(&self) -> &P::State;
    fn extend(&self, a: P::Action) -> Self;
}

接下来我们定义维护开结点的数据结构

trait FrontierSet<P: Problem, N: Node<P>> {
    fn new() -> Self;
    fn remove_with_state(&self, s: &P::State) -> Option<N>;
    fn empty(&self) -> bool;
    fn push(&mut self, node: N);
    fn pop(&mut self) -> N;
}

然后就可以描述 A* 算法为：

fn a_star_search<P: Problem, N: Node<P> + Cost, Q: FrontierSet<P, N>>(p: P) -> Option<N> {
    let mut frontier = Q::new();
    frontier.push(Node::create(p.init_state()));
    let mut explored: HashSet<P::State> = HashSet::new();
    loop {
        if frontier.empty() {
            break None;
        }
        let node = frontier.pop();
        if node.state().is_terminal() {
            break Some(node);
        }
        explored.insert(node.state().clone());
        for action in p.actions_of(node.state()) {
            let child = node.extend(action);
            if !explored.contains(child.state()) {
                frontier.push(child)
            } else {
                if let Some(frontier_node) = frontier.remove_with_state(child.state()) {
                    if child.cost() < frontier_node.cost() {
                        frontier.push(child)
                    } else {
                        frontier.push(frontier_node)
                    }
                }
            }
        }
    }
}

UCS 只根据之前花费的代价 $g(n)$ 搜索，贪婪搜索只根据当前状态到达目标的估价 $h(n)$ 搜索。
而 A* 算法则同时考虑这两者，也就是说 A* 算法中一个状态的估价为 $f(n) = g(n) + h(n)$ 。
此时的 $f(n)$ 被称为启发式函数。

只要保证对未来的估计 $h(n) \le$ 未来的真实代价，就能证明 A* 会找到最优解（可采纳）。

爬山算法

朴素的爬山算法会选择走到当前相邻状态中估价最高的状态，然后继续爬山。这里给出一些辅助的 trait：

trait ArgMax<T> {
    fn arg_max(&self, f: impl Fn(&T) -> f64) -> Option<&T>;
}

以及对应的实现

impl<T> ArgMax<T> for Vec<T> {
    fn arg_max(&self, f: impl Fn(&T) -> f64) -> Option<&T> {
        self.iter()
            .max_by(|a, b| f(*a).partial_cmp(&f(*b)).unwrap())
    }
}

这样我们就可以描述爬山算法了：

fn hill_climbing<P: Problem, N: Node<P> + Cost>(p: P) -> N {
    let mut cur: N = Node::create(p.init_state());
    loop {
        let neighbor = p
            .actions_of(cur.state())
            .into_iter()
            .map(|a| cur.extend(a))
            .collect::<Vec<N>>()
            .arg_max(Cost::cost)
            .cloned();
        if let Some(neighbor) = neighbor {
            if neighbor.cost() <= cur.cost() {
                break;
            }
            cur = neighbor
        }
    }
    cur
}

但可以明显发现上述算法不是很完备。为此衍生了很多变种：

随机平移：遇到平台时不直接退出，而是随机跳到一个邻居上（有一个次数上限）。
随机爬山：随机跳到更好的邻居，不要求最好。
第一选择爬山：跳到第一个更好的邻居，减少计算量。
随机重启：搜索到一定程度没有进展后，随机一个初始位置重新开始。

模拟退火

模拟退火可以看作是爬山算法的一个激进的变种：它不保证跳到比当前更优的状态，而是以一定概率跳到更劣的状态：

fn simulated_annealing<P: Problem, N: Node<P> + Cost>(p: P, temperature: impl Fn(i32) -> f64) -> N {
    let mut rng = thread_rng();
    let mut cur: N = Node::create(p.init_state());
    for t in 1.. {
        let te = temperature(t);
        if te <= 1e-9 {
            break;
        }
        let nex = p
            .actions_of(cur.state())
            .random_pick(&mut rng)
            .map(|a| cur.extend(a.clone()));
        if nex.is_none() {
            break;
        }
        let nex = nex.unwrap();
        let delta_e = nex.cost() - cur.cost();
        if delta_e > 0. || rng.gen_bool((delta_e / te).exp()) {
            cur = nex
        }
    }
    cur
}

约束满足问题与最小冲突算法。

上述搜索算法对问题的建模是以状态-操作的模式进行的，这里我们考虑另一种形式的建模：Constraint Satisfaction Problem。

我们有 $n$ 个变量 $x_1, \ldots, x_n$ 和 $m$ 个约束 $c_1, \ldots, c_m$ ，每个变量有自己的值域。我们要给变量一组赋值使得 $m$ 个约束均被满足。

最小冲突算法可以求解约束满足问题（或近似解）。基本思路很简单：对于存在不满足约束（冲突）的变量，我们将其更新为其值域中冲突个数（估价）最小的那个值。如此反复。

Minimax 搜索

Minimax 搜索常用于解决双人零和博弈问题。

直观理解是，我们对每个局面有一个估价（1 为先手胜 -1 为先手负）。我们总是希望走到估价最小的后继局面（这代表让对手难以获胜）

#[doc = " 计算当前局面的估价"]
fn minimax_evaluate<P: Problem, N: Node<P> + Cost>(p: &P, node: N) -> f64 {
    let acts = p.actions_of(node.state());
    if acts.is_empty() {
        return node.cost();
    }
    let mut r = -1.;
    for a in acts {
        let nex = node.extend(a);
        let c = -minimax_evaluate(p, nex);
        r = f64::max(r, c);
    }
    r
}

Alpha-Beta 剪枝

Minimax 搜索中，先取最大值后取最小值的时候，可以发现当内层局面的估价小于当前外层最大值时，将不会对结果产生更新，
因此可以直接返回。先取最小值后取最大值同理。

进一步思考可以发现，这其实就是在搜索的过程中给了一组上下界的限制。

严谨一点，考虑 minimax 搜索的数学形式：

$\begin{aligned} E(u) &= \max_{u \to v} \{ -E(v) \} \\ &= \max_{u \to v} \{ - \max_{v \to w} \{ -E(w) \} \} \end{aligned}$

假设对 $E(u)$ 的上下界是 $\alpha_u < E(u) < \beta_u$ 。那么有

$\alpha_u < \max_{u \to v} \{ -E(v) \} < \beta_u$

所以只保留上下界之间的部分即可。于是 $\alpha_u < -E(v) < \beta_u$ 。
此外在枚举 $v'$ 时我们设之前求出的最大值为 $r$
那么我们当然要求 $r < -E(v')$ ，不然这个 $E(v')$ 求出来就不会对答案有更新，
即 $E(v') < -r$ 。因此我们有

$-\beta_u < E(v') < \min(-r, -\alpha_u)$

fn alpha_beta_minimax<P: Problem, N: Node<P> + Cost>(p: &P, node: N, alpha: f64, beta: f64) -> f64 {
    let acts = p.actions_of(node.state());
    if acts.is_empty() {
        return node.cost();
    }
    let mut r = -1.;
    for a in acts {
        let nex = node.extend(a);
        let c = -alpha_beta_minimax(p, nex, -beta, f64::min(-r, -alpha));
        if c >= beta || c <= alpha {
            return r;
        }
        r = f64::max(r, c);
    }
    r
}

Monte Carlo 树搜索

蒙特卡洛树搜索算法利用随机采样与环境交互，利用探索-利用平衡进行学习。我们先来看一下它的代码框架：

fn search<P: Problem, N: MctsNode<P>>(p: &P, root: N, c: f64, lim: i32) -> N {
    for _ in 1..lim {
        let v = tree_select(root, c);
        let s = default_policy(p, v.state().clone());
        back_up(v, s)
    }
    best_child(root, 0.).unwrap()
}

这里我们定义了一个新的结点类型：

trait MctsNode<P>
where
    P: Problem,
    Self: super::Node<P>,
{
    fn expand(&self) -> Option<Self>;
    fn childs(&self) -> Vec<Self>;
    fn parent(&self) -> Option<Self>;
    fn visits(&self) -> f64;
    fn win_rate(&self) -> f64;
    fn update(&self, s: &P::State);
}

MCTS 首先按照 exploration-exploitation tradeoff 的权重公式：

$\arg\max_{v \to v'} \frac{Q(v')}{N(v')} + c \sqrt{ \frac{2 \ln N(v)}{ N(v)} \frac{N(v)}{N(v')} }$

从根节点开始，从上至下选择值得探索的状态：

fn tree_select<P: Problem, N: MctsNode<P>>(root: N, c: f64) -> N {
    let mut v: N = root;
    while !v.state().is_terminal() {
        if let Some(u) = v.expand() {
            return u;
        }
        v = best_child(v, c).unwrap();
    }
    v
}

fn best_child<P: Problem, N: MctsNode<P>>(u: N, c: f64) -> Option<N> {
    let h = |v: &N| v.win_rate() + c * (2. * u.visits().ln() / v.visits()).sqrt();
    u.childs().arg_max(h).cloned()
}

然后从这个状态出发，进行随机模拟（默认策略）完成一局游戏：

fn default_policy<P: Problem>(p: &P, s: P::State) -> P::State {
    let mut s = s;
    let mut rng = thread_rng();
    while !s.is_terminal() {
        let a = p.actions_of(&s).random_pick(&mut rng).unwrap().clone();
        s = s.trans(a)
    }
    s
}

然后将结果向上传播，更新路径上的结点信息：

fn back_up<P: Problem, N: MctsNode<P>>(v: N, s: P::State) {
    let mut maybe_v = Some(v);
    while let Some(v) = maybe_v {
        v.update(&s);
        maybe_v = v.parent();
    }
}

如此重复。最后搜素到一定程度后，通过 best_child 选择最终确定的决策。此时我们将探索参数置为 0，表示完全依赖我们搜索得到的信息。

文章目录