KDTree-Based Analysis of Point Spacing

This article describes how to compute nearest-neighbor distances with a KDTree.


The nearest neighbor distance (NND) is a metric that reflects how clustered signal points are, and it is commonly used in the literature¹ ².
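The source does not give a formula, so the following is an assumed but common way to write it: for points $\mathbf{p}_1, \dots, \mathbf{p}_N$, the NND of point $i$ is the Euclidean distance to the closest other point, and the $K$-neighbor variant averages the distances to the $K$ closest other points:

$$
d_i = \min_{j \neq i} \lVert \mathbf{p}_i - \mathbf{p}_j \rVert_2,
\qquad
\bar{d}_i^{(K)} = \frac{1}{K} \sum_{m=1}^{K} \lVert \mathbf{p}_i - \mathbf{p}_{j_m(i)} \rVert_2
$$

Here $j_1(i), \dots, j_K(i)$ are the indices of the $K$ points nearest to $\mathbf{p}_i$, excluding $i$ itself, so $\bar{d}_i^{(1)} = d_i$. $K$ plays the role of the `n_neighbors` argument in the code of this article.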

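A KDTree (k-d tree) is a data structure for efficiently organizing points in k-dimensional space. It has the shape of a binary search tree and performs well when the number of points n is much larger than 2^k, where k is the number of dimensions. Spatial distance analysis of imaging data is usually two- or three-dimensional, so a KDTree is a very cost-effective choice. In Python, SciPy already provides a ready-made implementation, scipy.spatial.KDTree, which is very convenient to use.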
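To see what the tree replaces, the sketch below (using hypothetical random points, not data from the original) computes every point's NND twice: by brute force with `scipy.spatial.distance.cdist`, which builds the full N x N distance matrix, and with `scipy.spatial.KDTree.query`. It also shows that the first hit returned for each point is the point itself, which is why the query asks for one more neighbor than actually needed.

```python
import numpy as np
from scipy.spatial import KDTree
from scipy.spatial.distance import cdist

# Hypothetical example data: 500 random 3D points (fixed seed for reproducibility)
rng = np.random.default_rng(0)
pts = rng.uniform(0, 10, size=(500, 3))

# Brute force: the full N x N distance matrix grows as N^2 in time and memory
dmat = cdist(pts, pts)
np.fill_diagonal(dmat, np.inf)  # ignore each point's zero distance to itself
nnd_brute = dmat.min(axis=1)

# KDTree: build once, then query; k=2 because the first hit is the point itself
tree = KDTree(pts)
dist, idx = tree.query(pts, k=2)
nnd_tree = dist[:, 1]

print(np.allclose(nnd_brute, nnd_tree))  # True
# Column 0 of the result indexes each query point itself
print(np.all(idx[:, 0] == np.arange(len(pts))))
```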

The following code computes the NND of 3D particle centroids:

```python
import numpy as np
import pandas as pd
from scipy.spatial import KDTree


def get_nearest_distance(points, n_neighbors=1):
    '''Compute nearest-neighbor distances.

    points: DataFrame that must contain the coordinate columns x, y and z
    n_neighbors: number of nearest neighbors to include in the statistics (>= 1)
    '''
    data = points.copy()
    locs = data.filter(items=['x', 'y', 'z'])
    tree = KDTree(locs)
    pids = np.array(locs.index)  # row labels, used to report neighbor IDs
    pidns = []
    dists_mean = []
    dists_sd = []
    for idx in range(len(locs)):
        ploc = locs.iloc[idx]
        # Ask for 1 + n_neighbors points: the closest hit is the point itself (distance 0)
        distances, inds = tree.query(ploc, k=1 + n_neighbors)
        dists_mean.append(np.mean(distances[1:]))
        dists_sd.append(np.std(distances[1:]))
        # inds are positional; map the nearest neighbor back to its DataFrame label
        pidns.append(pids[inds[1]])
    data['Nearest Neighbor ID'] = pidns
    data[f'Nearest {n_neighbors} Neighbors Distance'] = dists_mean
    # Spread of the distances to the n_neighbors nearest points (0 when n_neighbors == 1)
    data[f'Nearest {n_neighbors} Neighbors Distance SD'] = dists_sd
    return data
```

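This code builds a KDTree from the coordinate columns of the input data and then iterates over every centroid, querying the points closest to it. Because the closest point returned for any centroid is the centroid itself (at distance 0), the query requests 1 + n_neighbors points and skips the first one. The resulting distances, together with the ID of the nearest neighbor, are written as new columns into a copy of the input table. If more than one neighbor is requested, the mean distance over those neighbors is reported.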
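The per-row loop is easy to read but slow for large tables. `KDTree.query` also accepts a whole (N, d) array and returns (N, k) arrays, so the loop can be dropped. The function below is a hypothetical vectorized variant (its name and the `cols` parameter are assumptions, not part of the original code); `cols` makes it usable for 2D data as well, e.g. `cols=('x', 'y')`.

```python
import numpy as np
import pandas as pd
from scipy.spatial import KDTree


def nearest_distance_vectorized(points, n_neighbors=1, cols=('x', 'y', 'z')):
    """Hypothetical vectorized variant of get_nearest_distance.

    Queries all points in one call instead of looping over rows.
    n_neighbors must be >= 1.
    """
    data = points.copy()
    locs = data.loc[:, list(cols)].to_numpy(dtype=float)
    tree = KDTree(locs)
    # k = 1 + n_neighbors: column 0 of the result is each point itself (distance 0)
    dists, inds = tree.query(locs, k=1 + n_neighbors)
    neigh = dists[:, 1:]  # drop the self-match
    # inds are positional; translate the nearest neighbor to its index label
    data['Nearest Neighbor ID'] = data.index.to_numpy()[inds[:, 1]]
    data[f'Nearest {n_neighbors} Neighbors Distance'] = neigh.mean(axis=1)
    data[f'Nearest {n_neighbors} Neighbors Distance SD'] = neigh.std(axis=1)
    return data


# Usage on three hypothetical points along the x axis
pts = pd.DataFrame({'x': [0.0, 1.0, 5.0], 'y': [0.0] * 3, 'z': [0.0] * 3},
                   index=[10, 11, 12])
out = nearest_distance_vectorized(pts, n_neighbors=1)
print(out['Nearest Neighbor ID'].tolist())           # [11, 10, 11]
print(out['Nearest 1 Neighbors Distance'].tolist())  # [1.0, 1.0, 4.0]
```

In SciPy 1.6 and later, `KDTree.query` also takes a `workers` argument (e.g. `workers=-1`) that can parallelize large queries.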

A concrete example:

```python
data = pd.read_csv("surface_pipeline_collection.csv", index_col=0)

data2 = get_nearest_distance(data, n_neighbors=2)
```
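Before running on real measurements, a synthetic check with a known answer can be reassuring. The sketch below uses hypothetical data, not the original CSV: it builds a DataFrame with `x`, `y`, `z` columns on a regular 5 x 5 x 5 grid, where every NND must equal the grid spacing, and compares it with the same number of uniformly random points in the same volume, whose NND values spread out.

```python
import numpy as np
import pandas as pd
from scipy.spatial import KDTree

# Hypothetical test data: a regular grid, so every nearest neighbor is one spacing away
spacing = 2.0
g = np.arange(5) * spacing
xx, yy, zz = np.meshgrid(g, g, g, indexing='ij')
grid = pd.DataFrame({'x': xx.ravel(), 'y': yy.ravel(), 'z': zz.ravel()})

coords = grid[['x', 'y', 'z']].to_numpy()
dists, _ = KDTree(coords).query(coords, k=2)
nnd = dists[:, 1]  # skip the self-match in column 0
print(nnd.min(), nnd.max())  # both equal to the grid spacing

# Random points in the same volume: NND now varies from point to point
rng = np.random.default_rng(1)
rand = rng.uniform(0, g.max(), size=(len(grid), 3))
rand_nnd = KDTree(rand).query(rand, k=2)[0][:, 1]
print(rand_nnd.mean(), rand_nnd.std())
```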


Footnotes

1. Quantitative single-protein imaging reveals molecular complex formation of integrin, talin, and kindlin during cell adhesion. Nature Communications, 2021. https://doi.org/10.1038/s41467-021-21142-2
2. Whole-cell imaging of plasma membrane receptors by 3D lattice light-sheet dSTORM. Nature Communications, 2020. https://doi.org/10.1038/s41467-020-14731-0