xarray Guide: Indexing and Selecting Data - Dataset Indexing


This article translates part of the official xarray documentation page "Indexing and selecting data".

First, import the libraries used below.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import xarray as xr

xr.set_options(display_style="text")

The isel and sel methods can also index every variable in a dataset at once, returning a new dataset:

da = xr.DataArray(
    np.random.rand(4, 3),
    [
        ("time", pd.date_range("2000-01-01", periods=4)),
        ("space", ["IA", "IL", "IN"]),
    ],
)
ds = da.to_dataset(name="foo")
ds
<xarray.Dataset>
Dimensions:  (space: 3, time: 4)
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04
  * space    (space) <U2 'IA' 'IL' 'IN'
Data variables:
    foo      (time, space) float64 0.5041 0.2959 0.1441 ... 0.4777 0.2955 0.2749
ds.isel(space=[0], time=[0])
<xarray.Dataset>
Dimensions:  (space: 1, time: 1)
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01
  * space    (space) <U2 'IA'
Data variables:
    foo      (time, space) float64 0.5041
ds.sel(time="2000-01-01")
<xarray.Dataset>
Dimensions:  (space: 3)
Coordinates:
    time     datetime64[ns] 2000-01-01
  * space    (space) <U2 'IA' 'IL' 'IN'
Data variables:
    foo      (space) float64 0.5041 0.2959 0.1441
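To make the "all variables at once" behaviour concrete, the sketch below uses a hypothetical dataset with two variables, foo and bar, that share the (time, space) dimensions and hold deterministic values instead of random ones. It also illustrates the difference visible in the two outputs above: a list indexer such as [0] keeps the dimension with length 1, while a scalar label drops the dimension and leaves time as a scalar coordinate.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical two-variable dataset sharing the (time, space) dimensions
time = pd.date_range("2000-01-01", periods=4)
space = ["IA", "IL", "IN"]
ds = xr.Dataset(
    {
        "foo": (("time", "space"), np.arange(12.0).reshape(4, 3)),
        "bar": (("time", "space"), np.arange(12.0).reshape(4, 3) * 10),
    },
    coords={"time": time, "space": space},
)

# A list indexer keeps each dimension (now of length 1) in every variable
kept = ds.isel(space=[0], time=[0])
print(kept["foo"].shape, kept["bar"].shape)  # (1, 1) (1, 1)

# A scalar label drops the dimension; "time" becomes a scalar coordinate
dropped = ds.sel(time="2000-01-01")
print(dropped["foo"].dims, dropped["bar"].dims)  # ('space',) ('space',)
print(dropped["bar"].values)  # [ 0. 10. 20.]
```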

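Positional indexing on a dataset is not supported, because the order of dimensions in a dataset is somewhat ambiguous (it can differ between the arrays the dataset contains). You can, however, still use [] and .loc if you pass a dict keyed by dimension name: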

ds[dict(space=[0], time=[0])]
<xarray.Dataset>
Dimensions:  (space: 1, time: 1)
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01
  * space    (space) <U2 'IA'
Data variables:
    foo      (time, space) float64 0.5041
ds.loc[dict(time="2000-01-01")]
<xarray.Dataset>
Dimensions:  (space: 3)
Coordinates:
    time     datetime64[ns] 2000-01-01
  * space    (space) <U2 'IA' 'IL' 'IN'
Data variables:
    foo      (space) float64 0.5041 0.2959 0.1441
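One way to check these equivalences is identical(): dict-style [] indexing on a Dataset gives the same result as isel (positional), and .loc[dict(...)] gives the same result as sel (label-based). This sketch rebuilds ds with deterministic values, an assumption made only so the printed numbers are reproducible.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Same layout as ds above, but with deterministic values
ds = xr.DataArray(
    np.arange(12.0).reshape(4, 3),
    [
        ("time", pd.date_range("2000-01-01", periods=4)),
        ("space", ["IA", "IL", "IN"]),
    ],
).to_dataset(name="foo")

# Dict-style [] indexing is positional, like isel
by_position = ds[dict(space=[0], time=[0])]
assert by_position.identical(ds.isel(space=[0], time=[0]))

# .loc with a dict is label-based, like sel
by_label = ds.loc[dict(time="2000-01-01")]
assert by_label.identical(ds.sel(time="2000-01-01"))

print(by_position["foo"].values.item())  # 0.0
print(by_label["foo"].values)  # [0. 1. 2.]
```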

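Using indexing to assign values to a subset of a dataset (e.g. ds[dict(space=0)] = 1) is not yet supported. This describes the xarray version the original documentation was written for; newer releases allow this kind of assignment to all data variables at once.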
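For xarray versions that reject ds[dict(space=0)] = 1, one workaround (a sketch, not an official recipe) is to assign variable by variable: copy each DataArray, use dict-keyed assignment on the copy (which DataArray supports), and store it back in the dataset. The dataset contents, the extra variable bar, and the value 1.0 are hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr

ds = xr.DataArray(
    np.zeros((4, 3)),
    [
        ("time", pd.date_range("2000-01-01", periods=4)),
        ("space", ["IA", "IL", "IN"]),
    ],
).to_dataset(name="foo")
ds["bar"] = ds["foo"] + 5.0  # a second, hypothetical data variable

# Emulate ds[dict(space=0)] = 1 one data variable at a time
for name in list(ds.data_vars):
    updated = ds[name].copy()  # deep copy of the variable
    updated[dict(space=0)] = 1.0  # positional, dict-keyed assignment on a DataArray
    ds[name] = updated  # put the modified variable back

print(ds["foo"].isel(time=0).values)  # [1. 0. 0.]
print(ds["bar"].isel(time=0).values)  # [1. 5. 5.]
```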

Practice

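Build a dataset containing three fields from a GRAPES GFS forecast: temperature (t) and the wind components (u, v) on the 850 hPa isobaric level.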

from nwpc_data.grib.cfgrib import load_field_from_file
from nwpc_data.data_finder import find_local_file

# All three fields come from the same GRIB2 file:
# run started 2020-03-18 00 UTC, 105 h forecast
file_path = find_local_file(
    "grapes_gfs_gmf/grib2/orig",
    start_time="2020031800",
    forecast_time="105h",
)

# Load temperature (t) and wind components (u, v) at 850 hPa
fields = {
    name: load_field_from_file(
        file_path=file_path,
        parameter=name,
        level_type="isobaricInhPa",
        level=850,
    )
    for name in ("t", "u", "v")
}

# Combine the three DataArrays into one Dataset
weather_ds = xr.Dataset(fields)
weather_ds
<xarray.Dataset>
Dimensions:        (latitude: 720, longitude: 1440)
Coordinates:
    time           datetime64[ns] 2020-03-18
    step           timedelta64[ns] 4 days 09:00:00
    isobaricInhPa  int64 850
  * latitude       (latitude) float64 89.88 89.62 89.38 ... -89.38 -89.62 -89.88
  * longitude      (longitude) float64 0.0 0.25 0.5 0.75 ... 359.2 359.5 359.8
    valid_time     datetime64[ns] 2020-03-22T09:00:00
Data variables:
    t              (latitude, longitude) float32 ...
    u              (latitude, longitude) float32 ...
    v              (latitude, longitude) float32 ...
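The nwpc_data loaders and the local GRIB2 file will not be available to every reader. As a hypothetical stand-in, the sketch below builds a Dataset with the same shape and coordinate layout as weather_ds: a 0.25° grid of 720 × 1440 points with latitude descending from 89.875. The exact coordinate formulas are inferred from the rounded values printed above, and the fields are random float32 numbers, not real forecast data, so the indexing calls that follow can be tried anywhere.

```python
import numpy as np
import xarray as xr

# Hypothetical 0.25-degree grid: latitude descending from 89.875, longitude from 0.0
latitude = 89.875 - 0.25 * np.arange(720)
longitude = 0.25 * np.arange(1440)

rng = np.random.default_rng(0)
shape = (latitude.size, longitude.size)
weather_ds = xr.Dataset(
    {
        # Random float32 fields; these are NOT the real GRAPES GFS values
        name: (("latitude", "longitude"), rng.random(shape, dtype=np.float32))
        for name in ("t", "u", "v")
    },
    coords={
        "latitude": latitude,
        "longitude": longitude,
        "isobaricInhPa": 850,  # scalar coordinate, mirroring the GRIB-derived one
    },
)
print(dict(weather_ds.sizes))  # {'latitude': 720, 'longitude': 1440}
```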
weather_ds.isel(latitude=[50], longitude=[100])
<xarray.Dataset>
Dimensions:        (latitude: 1, longitude: 1)
Coordinates:
    time           datetime64[ns] 2020-03-18
    step           timedelta64[ns] 4 days 09:00:00
    isobaricInhPa  int64 850
  * latitude       (latitude) float64 77.38
  * longitude      (longitude) float64 25.0
    valid_time     datetime64[ns] 2020-03-22T09:00:00
Data variables:
    t              (latitude, longitude) float32 ...
    u              (latitude, longitude) float32 ...
    v              (latitude, longitude) float32 ...
weather_ds.sel(longitude=25.0)
<xarray.Dataset>
Dimensions:        (latitude: 720)
Coordinates:
    time           datetime64[ns] 2020-03-18
    step           timedelta64[ns] 4 days 09:00:00
    isobaricInhPa  int64 850
  * latitude       (latitude) float64 89.88 89.62 89.38 ... -89.38 -89.62 -89.88
    longitude      float64 25.0
    valid_time     datetime64[ns] 2020-03-22T09:00:00
Data variables:
    t              (latitude) float32 ...
    u              (latitude) float32 ...
    v              (latitude) float32 ...
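The isel and sel calls above select the same column: on this grid, position 100 along longitude carries the label 25.0, and position 50 along latitude carries 77.375 (displayed as 77.38). The sketch below checks that correspondence on a hypothetical grid with the same coordinates, assuming the 0.25° spacing inferred from the printed output. It also shows that a label lying between grid points, such as 77.4, needs method="nearest".

```python
import numpy as np
import xarray as xr

# Hypothetical 0.25-degree grid matching the coordinates printed above
grid = xr.Dataset(
    coords={
        "latitude": 89.875 - 0.25 * np.arange(720),
        "longitude": 0.25 * np.arange(1440),
    }
)
grid["t"] = (("latitude", "longitude"), np.zeros((720, 1440), dtype=np.float32))

# Positions 50 and 100 correspond to these labels
print(grid["latitude"].values[50], grid["longitude"].values[100])  # 77.375 25.0

# Selecting the label 25.0 gives the same result as selecting position 100
assert grid.sel(longitude=25.0).identical(grid.isel(longitude=100))

# A label between grid points is matched to the closest one with method="nearest"
nearest = grid.sel(latitude=77.4, method="nearest")
print(float(nearest["latitude"]))  # 77.375
```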
weather_ds[dict(latitude=[50], longitude=[100])]
<xarray.Dataset>
Dimensions:        (latitude: 1, longitude: 1)
Coordinates:
    time           datetime64[ns] 2020-03-18
    step           timedelta64[ns] 4 days 09:00:00
    isobaricInhPa  int64 850
  * latitude       (latitude) float64 77.38
  * longitude      (longitude) float64 25.0
    valid_time     datetime64[ns] 2020-03-22T09:00:00
Data variables:
    t              (latitude, longitude) float32 ...
    u              (latitude, longitude) float32 ...
    v              (latitude, longitude) float32 ...
weather_ds.loc[dict(longitude=25.0)]
<xarray.Dataset>
Dimensions:        (latitude: 720)
Coordinates:
    time           datetime64[ns] 2020-03-18
    step           timedelta64[ns] 4 days 09:00:00
    isobaricInhPa  int64 850
  * latitude       (latitude) float64 89.88 89.62 89.38 ... -89.38 -89.62 -89.88
    longitude      float64 25.0
    valid_time     datetime64[ns] 2020-03-22T09:00:00
Data variables:
    t              (latitude) float32 ...
    u              (latitude) float32 ...
    v              (latitude) float32 ...
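Because latitude in this dataset runs from north to south, a label slice passed to sel or .loc has to follow that descending order too. The sketch below demonstrates this on a hypothetical grid with the same 0.25° coordinates (the field is all zeros, since only the coordinates matter here); the region bounds are arbitrary illustrative values.

```python
import numpy as np
import xarray as xr

# Hypothetical 0.25-degree grid with descending latitude
grid = xr.Dataset(
    coords={
        "latitude": 89.875 - 0.25 * np.arange(720),
        "longitude": 0.25 * np.arange(1440),
    }
)
grid["t"] = (("latitude", "longitude"), np.zeros((720, 1440), dtype=np.float32))

# Descending index: give the slice from high to low latitude
box = grid.loc[dict(latitude=slice(40, 30), longitude=slice(100, 120))]
print(dict(box.sizes))  # {'latitude': 40, 'longitude': 81}

# An ascending slice on a descending index matches nothing
empty = grid.sel(latitude=slice(30, 40))
print(empty.sizes["latitude"])  # 0
```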

References

http://xarray.pydata.org/en/stable/indexing.html#dataset-indexing