This post describes how to use a KDTree to compute nearest neighbor distances between points.
The nearest neighbor distance (Nearest Neighbor Distance, NND) is a metric that reflects how strongly signal points cluster, and it appears frequently in the literature [1][2].
A KDTree is a data structure that handles k-dimensional spatial information efficiently. It has the shape of a binary search tree, and when the number of points n is much larger than the dimensionality k, a k-D tree is very time-efficient. Spatial distance analysis on imaging data is usually two- or three-dimensional, so a KDTree is a good fit. In Python, scipy already provides a ready-made KDTree class (scipy.spatial.KDTree), which is very convenient to use.
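As a quick illustration of the scipy interface, here is a minimal sketch using randomly generated points (the array size and query point are arbitrary, just for demonstration):

    import numpy as np
    from scipy.spatial import KDTree

    # build a tree on 100 random 3D points (hypothetical data)
    points = np.random.rand(100, 3)
    tree = KDTree(points)
    # query the two closest points to the first point:
    # the point itself (distance 0) and its nearest neighbor
    dist, idx = tree.query(points[0], k=2)
    print(dist[1], idx[1])  # distance to, and index of, the nearest neighbor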
The code for analyzing the NND of 3D particle centroids is as follows:
from scipy.spatial import KDTree
import pandas as pd
import numpy as np

def get_nearest_distance(points, n_neighbors=1):
    '''Compute the nearest neighbor distances.
    points: a DataFrame that must contain the coordinate columns x, y, z
    n_neighbors: number of nearest neighbors to include in the statistics
    '''
    data = points.copy()
    locs = data.filter(items=['x', 'y', 'z'])
    tree = KDTree(locs)            # build the tree from the centroid coordinates
    pids = np.array(locs.index)
    pidns = []
    dists_mean = []
    dists_sd = []
    for idx in range(len(locs)):
        ploc = locs.iloc[idx]
        # query k = 1 + n_neighbors points; the first hit is the point itself (distance 0)
        distances, inds = tree.query(ploc, k=1 + n_neighbors)
        dist_mean = np.mean(distances[1:])   # mean distance to the n_neighbors nearest points
        dist_sd = np.std(distances[1:])      # spread of those distances
        dists_mean.append(dist_mean)
        dists_sd.append(dist_sd)
        ind = inds[1]                        # index of the single nearest neighbor
        pidn = pids[ind]
        pidns.append(pidn)
    data['Nearest Neighbor ID'] = pidns
    data[f'Nearest {n_neighbors} Neighbors Distance'] = dists_mean
    data[f'Nearest {n_neighbors} Neighbors Distance SD'] = dists_sd
    return data
In this code, a KDTree is first built from the coordinate columns of the input data. Each centroid is then visited in turn to search for the points closest to it, and the resulting distances are collected into a new table. If the requested number of neighbors is greater than one, the mean distance over those neighbors is computed.
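The loop above queries the tree once per centroid. If performance matters, scipy's KDTree.query also accepts the whole coordinate array at once, so the same columns can be filled without an explicit Python loop. The following is only a sketch of that alternative, producing the same output columns as the function above:

    import numpy as np
    from scipy.spatial import KDTree

    def get_nearest_distance_vectorized(points, n_neighbors=1):
        '''Same output columns as get_nearest_distance, but queries all centroids in one call.'''
        data = points.copy()
        locs = data.filter(items=['x', 'y', 'z'])
        tree = KDTree(locs)
        # each row of distances/inds starts with the point itself (distance 0)
        distances, inds = tree.query(locs, k=1 + n_neighbors)
        data['Nearest Neighbor ID'] = np.array(locs.index)[inds[:, 1]]
        data[f'Nearest {n_neighbors} Neighbors Distance'] = distances[:, 1:].mean(axis=1)
        data[f'Nearest {n_neighbors} Neighbors Distance SD'] = distances[:, 1:].std(axis=1)
        return data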
A concrete usage example is as follows:
data = pd.read_csv("surface_pipeline_collection.csv", index_col=0)
data2 = get_nearest_distance(data, n_neighbors=2)
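After that, the new columns can be inspected or saved. For example (the column name follows from the n_neighbors=2 call above; the output file name is only an illustration):

    # summary statistics of the mean distance to the two nearest neighbors
    print(data2['Nearest 2 Neighbors Distance'].describe())
    # save the table with the new columns (hypothetical output file name)
    data2.to_csv("surface_pipeline_collection_nnd.csv")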
1. Quantitative single-protein imaging reveals molecular complex formation of integrin, talin, and kindlin during cell adhesion. Nature Communications, 2021. https://doi.org/10.1038/s41467-021-21142-2
2. Whole-cell imaging of plasma membrane receptors by 3D lattice light-sheet dSTORM. Nature Communications, 2020. https://doi.org/10.1038/s41467-020-14731-0