xarray Guide: Indexing and Selecting Data - Nearest Neighbor Lookups

This article is a translation of part of the "Indexing and selecting data" page of the official xarray documentation.

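It reuses the sample data from the earlier article "xarray Guide: Indexing and Selecting Data - Positional and Name-Based Indexing", which is not introduced again here.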

This article shows how to perform nearest-neighbor lookups with xarray.

Introduction

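The label-based selection methods sel(), reindex(), and reindex_like() all support the method and tolerance keyword arguments. The method parameter enables nearest-neighbor (inexact) lookups using the pad, backfill, or nearest methods: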

Construct a one-dimensional array:

import xarray as xr

da = xr.DataArray(
    [1, 2, 3],
    [("x", [0, 1, 2])]
)
da
<xarray.DataArray (x: 3)>
array([1, 2, 3])
Coordinates:
  * x        (x) int64 0 1 2

Select the points nearest to the given coordinate values:

da.sel(x=[1.1, 1.9], method="nearest")
<xarray.DataArray (x: 2)>
array([2, 3])
Coordinates:
  * x        (x) int64 1 2

Use backfill to select the next valid value:

da.sel(x=0.1, method="backfill")
<xarray.DataArray ()>
array(2)
Coordinates:
    x        int64 1
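
To make the three method values concrete, the following plain-Python sketch spells out the lookup rules for a sorted, increasing index. It is only an illustration of the semantics described above, not xarray's implementation. The function name inexact_lookup is hypothetical, and tie-breaking for nearest is not meant to match xarray.

```python
from bisect import bisect_left, bisect_right


def inexact_lookup(index, target, method, tolerance=None):
    """Return the position in a sorted, increasing index matched to target.

    Returns None when no label qualifies (hypothetical helper, for illustration).
    """
    if method == "pad":
        # Last label that is <= target.
        pos = bisect_right(index, target) - 1
    elif method == "backfill":
        # First label that is >= target.
        pos = bisect_left(index, target)
    elif method == "nearest":
        right = bisect_left(index, target)
        candidates = [p for p in (right - 1, right) if 0 <= p < len(index)]
        if not candidates:
            return None
        # Ties go to the left neighbour in this sketch.
        pos = min(candidates, key=lambda p: abs(index[p] - target))
    else:
        raise ValueError(f"unknown method: {method!r}")

    if pos < 0 or pos >= len(index):
        return None
    if tolerance is not None and abs(index[pos] - target) > tolerance:
        return None
    return pos


labels = [0, 1, 2]
print(inexact_lookup(labels, 1.1, "nearest"))
print(inexact_lookup(labels, 0.1, "backfill"))
print(inexact_lookup(labels, 0.5, "pad"))
print(inexact_lookup(labels, 1.5, "nearest", tolerance=0.2))
```

The positions returned for 1.1 (nearest) and 0.1 (backfill) correspond to the labels selected in the two sel() examples above.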

Reindex the array onto the given coordinates:

da.reindex(
    x=[0.5, 1, 1.5, 2, 2.5],
    method="pad",
)
<xarray.DataArray (x: 5)>
array([1, 2, 2, 3, 3])
Coordinates:
  * x        (x) float64 0.5 1.0 1.5 2.0 2.5
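
reindex_like() accepts the same method argument, taking the target labels from another object instead of an explicit list. A minimal sketch, assuming a made-up target array named other whose values are irrelevant and only whose x coordinate is used:

```python
import xarray as xr

da = xr.DataArray([1, 2, 3], [("x", [0, 1, 2])])

# Hypothetical target array; only its "x" coordinate matters here.
other = xr.DataArray([0.0, 0.0, 0.0], [("x", [0.2, 0.9, 1.6])])

# For each label in other.x, take the value at the nearest label in da.x.
aligned = da.reindex_like(other, method="nearest")
print(aligned.values)
print(aligned.x.values)
```

The result carries the coordinate of other, while the values come from da.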

tolerance limits the maximum distance for a valid match when using inexact lookups:

da.reindex(
    x=[1.1, 1.5],
    method="nearest",
    tolerance=0.2,
)
<xarray.DataArray (x: 2)>
array([ 2., nan])
Coordinates:
  * x        (x) float64 1.1 1.5
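
sel() takes tolerance as well, but it cannot fill a missing match with NaN the way reindex() does. The sketch below assumes that a label outside the tolerance surfaces as a KeyError, which is the behaviour I would expect from xarray but which the text above does not state.

```python
import xarray as xr

da = xr.DataArray([1, 2, 3], [("x", [0, 1, 2])])

# Within tolerance: 1.1 is 0.1 away from the label 1.
hit = da.sel(x=1.1, method="nearest", tolerance=0.2)
print(int(hit))

# Outside tolerance: 1.5 is 0.5 away from both of its neighbours.
try:
    da.sel(x=1.5, method="nearest", tolerance=0.2)
    missed = False
except KeyError as err:  # assumed exception type, see the note above
    missed = True
    print("no match within tolerance:", err)
```

If missing matches should become NaN rather than an error, reindex() as shown above is the better fit.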

The method parameter is not yet supported if any of the arguments to .sel() is a slice object:

# raises an exception
da.sel(x=slice(1, 3), method="nearest")
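
The failure can be observed without aborting a script by catching the error. The exact exception class is not stated here, so this sketch deliberately catches the broad Exception and prints whatever type is raised.

```python
import xarray as xr

da = xr.DataArray([1, 2, 3], [("x", [0, 1, 2])])

try:
    da.sel(x=slice(1, 3), method="nearest")
    raised = None
except Exception as err:  # broad on purpose: the class is not named in the text
    raised = err
    print(type(err).__name__, err)
```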

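However, you do not need method for inexact slicing. As long as the index labels are monotonically increasing, slicing already returns all values within the range (inclusive):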

da.sel(x=slice(0.9, 3.1))
<xarray.DataArray (x: 2)>
array([2, 3])
Coordinates:
  * x        (x) int64 1 2

Index axes with monotonically decreasing labels also work, as long as the slice or .loc arguments are also decreasing:

reversed_da = da[::-1]
reversed_da.loc[3.1:0.9]
<xarray.DataArray (x: 2)>
array([3, 2])
Coordinates:
  * x        (x) int64 2 1
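
If writing the bounds in decreasing order is inconvenient, one alternative is to sort by the coordinate first. The sketch below compares both options on the same small array. Using sortby() here is my suggestion, not something the original text prescribes.

```python
import xarray as xr

da = xr.DataArray([1, 2, 3], [("x", [0, 1, 2])])
reversed_da = da[::-1]  # x is now 2, 1, 0

# Option 1: give the bounds in the same (decreasing) order as the labels.
a = reversed_da.sel(x=slice(3.1, 0.9))

# Option 2: sort by the coordinate, then slice with increasing bounds.
b = reversed_da.sortby("x").sel(x=slice(0.9, 3.1))

print(a.values, b.values)
```

Both select the labels 1 and 2; only the order of the result differs.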

To interpolate along coordinates rather than look up the nearest neighbor, use interp() and interp_like().

In Practice

t850.sel(latitude=[80, 90], method="nearest")
<xarray.DataArray 't' (latitude: 2, longitude: 1440)>
array([[263.32236, 263.37234, 263.43234, ..., 263.20233, 263.25235, 263.28235],
       [249.19234, 249.16234, 249.16234, ..., 249.15234, 249.19234, 249.14235]],
      dtype=float32)
Coordinates:
    time           datetime64[ns] ...
    step           timedelta64[ns] ...
    isobaricInhPa  int64 ...
  * latitude       (latitude) float64 80.12 89.88
  * longitude      (longitude) float64 0.0 0.25 0.5 0.75 ... 359.2 359.5 359.8
    valid_time     datetime64[ns] ...
Attributes:
    GRIB_paramId:                             130
    ...skip...
    standard_name:                            air_temperature

t850.sel(latitude=50, method="backfill")
<xarray.DataArray 't' (longitude: 1440)>
array([275.05234, 275.04236, 275.18234, ..., 275.81235, 275.46234, 275.19235],
      dtype=float32)
Coordinates:
    time           datetime64[ns] ...
    step           timedelta64[ns] ...
    isobaricInhPa  int64 ...
    latitude       float64 49.88
  * longitude      (longitude) float64 0.0 0.25 0.5 0.75 ... 359.2 359.5 359.8
    valid_time     datetime64[ns] ...
Attributes:
    GRIB_paramId:                             130
    ...skip...
    standard_name:                            air_temperature

t850.reindex(latitude=[10, 20, 30, 40, 50], method="pad")
<xarray.DataArray 't' (latitude: 5, longitude: 1440)>
array([[295.08234, 295.09235, 295.07236, ..., 295.05234, 295.03235, 295.04236],
       [288.99234, 289.02234, 289.05234, ..., 288.86234, 288.91235, 288.95233],
       [280.86234, 281.12234, 281.23233, ..., 279.51233, 279.93234, 280.38235],
       [276.71234, 276.92233, 276.90234, ..., 276.45233, 276.51233, 276.61234],
       [274.84235, 274.85236, 274.94235, ..., 275.43234, 275.14233, 274.94235]],
      dtype=float32)
Coordinates:
  * latitude       (latitude) int64 10 20 30 40 50
    time           datetime64[ns] 2020-03-18
    step           timedelta64[ns] 4 days 09:00:00
    isobaricInhPa  int64 850
  * longitude      (longitude) float64 0.0 0.25 0.5 0.75 ... 359.2 359.5 359.8
    valid_time     datetime64[ns] 2020-03-22T09:00:00
Attributes:
    GRIB_paramId:                             130
    ...skip...
    standard_name:                            air_temperature

t850.reindex(
    latitude=[10, 10.2, 10.5], 
    method="nearest",
    tolerance=0.1
)
<xarray.DataArray 't' (latitude: 3, longitude: 1440)>
array([[      nan,       nan,       nan, ...,       nan,       nan,
              nan],
       [295.08234, 295.09235, 295.07236, ..., 295.05234, 295.03235,
        295.04236],
       [      nan,       nan,       nan, ...,       nan,       nan,
              nan]], dtype=float32)
Coordinates:
  * latitude       (latitude) float64 10.0 10.2 10.5
    time           datetime64[ns] 2020-03-18
    step           timedelta64[ns] 4 days 09:00:00
    isobaricInhPa  int64 850
  * longitude      (longitude) float64 0.0 0.25 0.5 0.75 ... 359.2 359.5 359.8
    valid_time     datetime64[ns] 2020-03-22T09:00:00
Attributes:
    GRIB_paramId:                             130
    ...skip...
    standard_name:                            air_temperature

The latitude coordinate is already sorted in descending order, so the slice bounds are given from large to small.

t850.sel(latitude=slice(30, 20))
<xarray.DataArray 't' (latitude: 40, longitude: 1440)>
array([[281.30234, 281.41235, 281.63235, ..., 280.18234, 280.52234, 280.92233],
       [281.55234, 281.72235, 282.12234, ..., 280.71234, 280.96234, 281.31235],
       [281.73233, 282.06235, 282.68234, ..., 281.19235, 281.37234, 281.56235],
       ...,
       [288.52234, 288.58234, 288.62234, ..., 288.38235, 288.42233, 288.47235],
       [288.74234, 288.78235, 288.81235, ..., 288.60236, 288.65234, 288.70233],
       [288.99234, 289.02234, 289.05234, ..., 288.86234, 288.91235, 288.95233]],
      dtype=float32)
Coordinates:
    time           datetime64[ns] ...
    step           timedelta64[ns] ...
    isobaricInhPa  int64 ...
  * latitude       (latitude) float64 29.88 29.62 29.38 ... 20.62 20.38 20.12
  * longitude      (longitude) float64 0.0 0.25 0.5 0.75 ... 359.2 359.5 359.8
    valid_time     datetime64[ns] ...
Attributes:
    GRIB_paramId:                             130
    ...skip...
    standard_name:                            air_temperature

Retrieve the temperature at the grid point nearest to the Beijing basic station (54511).

t850.sel(
    latitude=39.48, 
    longitude=116.28, 
    method="nearest",
)
<xarray.DataArray 't' ()>
array(281.94235, dtype=float32)
Coordinates:
    time           datetime64[ns] ...
    step           timedelta64[ns] ...
    isobaricInhPa  int64 ...
    latitude       float64 39.38
    longitude      float64 116.2
    valid_time     datetime64[ns] ...
Attributes:
    GRIB_paramId:                             130
    ...skip...
    standard_name:                            air_temperature
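
Because the t850 data is not included in this article, the same lookup can be tried on a synthetic grid. The sketch below assumes a 0.25-degree grid with latitude running from north to south and placeholder zeros instead of real temperatures. The names field, station_lat, and station_lon are hypothetical. It also reports how far the matched grid point lies from the requested location.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for t850: a 0.25-degree regional grid whose latitude
# decreases along the axis, filled with placeholder values (not real data).
lat = np.arange(49.875, 30.0, -0.25)
lon = np.arange(100.0, 130.0, 0.25)
field = xr.DataArray(
    np.zeros((lat.size, lon.size), dtype="float32"),
    coords=[("latitude", lat), ("longitude", lon)],
    name="t",
)

station_lat, station_lon = 39.48, 116.28
point = field.sel(latitude=station_lat, longitude=station_lon, method="nearest")

# Offset between the matched grid point and the requested location, in degrees.
dlat = abs(float(point.latitude) - station_lat)
dlon = abs(float(point.longitude) - station_lon)
print(float(point.latitude), float(point.longitude), dlat, dlon)
```

Checking the offsets is a cheap way to confirm that the nearest grid point is close enough for the intended use. Passing tolerance would enforce such a limit directly.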

References

http://xarray.pydata.org/en/stable/indexing.html#nearest-neighbor-lookups