xarray guide: Indexing and selecting data - Dataset indexing
This article is a translation of part of the "Indexing and selecting data" page of the official xarray documentation.
First, import the libraries we need.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import xarray as xr
xr.set_options(display_style="text")
We can also use the indexing methods isel and sel to index all variables in a dataset simultaneously; each call returns a new dataset:
da = xr.DataArray(
    np.random.rand(4, 3),
    [
        ("time", pd.date_range("2000-01-01", periods=4)),
        ("space", ["IA", "IL", "IN"]),
    ],
)
ds = da.to_dataset(name="foo")
ds
<xarray.Dataset>
Dimensions: (space: 3, time: 4)
Coordinates:
* time (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04
* space (space) <U2 'IA' 'IL' 'IN'
Data variables:
foo (time, space) float64 0.5041 0.2959 0.1441 ... 0.4777 0.2955 0.2749
ds.isel(space=[0], time=[0])
<xarray.Dataset>
Dimensions: (space: 1, time: 1)
Coordinates:
* time (time) datetime64[ns] 2000-01-01
* space (space) <U2 'IA'
Data variables:
foo (time, space) float64 0.5041
ds.sel(time="2000-01-01")
<xarray.Dataset>
Dimensions: (space: 3)
Coordinates:
time datetime64[ns] 2000-01-01
* space (space) <U2 'IA' 'IL' 'IN'
Data variables:
foo (space) float64 0.5041 0.2959 0.1441
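Positional indexing on a dataset is not supported, because the order of dimensions in a dataset is ambiguous (it can differ between the arrays it contains). You can, however, index by dimension name using the usual bracket syntax with a dictionary: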
ds[dict(space=[0], time=[0])]
<xarray.Dataset>
Dimensions: (space: 1, time: 1)
Coordinates:
* time (time) datetime64[ns] 2000-01-01
* space (space) <U2 'IA'
Data variables:
foo (time, space) float64 0.5041
ds.loc[dict(time="2000-01-01")]
<xarray.Dataset>
Dimensions: (space: 3)
Coordinates:
time datetime64[ns] 2000-01-01
* space (space) <U2 'IA' 'IL' 'IN'
Data variables:
foo (space) float64 0.5041 0.2959 0.1441
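The four calls above come in two equivalent pairs: ds[dict(...)] selects by position like isel, and ds.loc[dict(...)] selects by label like sel. The following is a minimal sketch of that equivalence, not part of the original documentation. It assumes a small dataset with a second, hypothetical variable bar whose dimensions are stored in the opposite order, to show why dimension names are used instead of positions.

```python
import numpy as np
import pandas as pd
import xarray as xr

rng = np.random.default_rng(0)

# Two variables that share dimensions but store them in a different order.
# "bar" is a hypothetical variable added only for this illustration.
ds = xr.Dataset(
    {
        "foo": (("time", "space"), rng.random((4, 3))),
        "bar": (("space", "time"), rng.random((3, 4))),
    },
    coords={
        "time": pd.date_range("2000-01-01", periods=4),
        "space": ["IA", "IL", "IN"],
    },
)

# Integer-based selection: method form and dictionary form.
by_position = ds.isel(space=[0], time=[0])
by_position_dict = ds[dict(space=[0], time=[0])]

# Label-based selection: method form and dictionary form.
by_label = ds.sel(time="2000-01-01")
by_label_dict = ds.loc[dict(time="2000-01-01")]

print(by_position.identical(by_position_dict))
print(by_label.identical(by_label_dict))

# Each variable keeps its own dimension order after indexing.
print(by_position["foo"].dims, by_position["bar"].dims)
```

Because every dimension is addressed by name, the same call applies to foo and bar even though their axes are arranged differently.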
Assigning values to a subset of a dataset with indexing (for example, ds[dict(space=0)] = 1) is not yet supported.
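Where that restriction applies, one common workaround is to assign into an individual variable of the dataset, since DataArray objects do accept indexed assignment. The sketch below is an illustration under that assumption, not a statement from the original page. It avoids chaining .isel() or .sel() before the assignment, because those return new objects and the write would not reach the dataset.

```python
import numpy as np
import pandas as pd
import xarray as xr

ds = xr.Dataset(
    {"foo": (("time", "space"), np.zeros((4, 3)))},
    coords={
        "time": pd.date_range("2000-01-01", periods=4),
        "space": ["IA", "IL", "IN"],
    },
)

# Positional assignment on a single variable, using a dimension name.
ds["foo"][dict(space=0)] = 1.0

# Label-based assignment on a single variable through .loc.
ds["foo"].loc[dict(space="IL")] = 2.0

print(ds["foo"].values)
```

To apply the same change to several variables, the assignment can be repeated in a loop over ds.data_vars.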
Hands-on example
Build a dataset containing three fields: t, u and v at 850 hPa.
from nwpc_data.grib.cfgrib import load_field_from_file
from nwpc_data.data_finder import find_local_file
t850 = load_field_from_file(
    file_path=find_local_file(
        "grapes_gfs_gmf/grib2/orig",
        start_time="2020031800",
        forecast_time="105h"
    ),
    parameter="t",
    level_type="isobaricInhPa",
    level=850,
)
u850 = load_field_from_file(
    file_path=find_local_file(
        "grapes_gfs_gmf/grib2/orig",
        start_time="2020031800",
        forecast_time="105h"
    ),
    parameter="u",
    level_type="isobaricInhPa",
    level=850,
)
v850 = load_field_from_file(
    file_path=find_local_file(
        "grapes_gfs_gmf/grib2/orig",
        start_time="2020031800",
        forecast_time="105h"
    ),
    parameter="v",
    level_type="isobaricInhPa",
    level=850,
)
weather_ds = xr.Dataset(
    {
        "t": t850,
        "u": u850,
        "v": v850,
    }
)
weather_ds
<xarray.Dataset>
Dimensions: (latitude: 720, longitude: 1440)
Coordinates:
time datetime64[ns] 2020-03-18
step timedelta64[ns] 4 days 09:00:00
isobaricInhPa int64 850
* latitude (latitude) float64 89.88 89.62 89.38 ... -89.38 -89.62 -89.88
* longitude (longitude) float64 0.0 0.25 0.5 0.75 ... 359.2 359.5 359.8
valid_time datetime64[ns] 2020-03-22T09:00:00
Data variables:
t (latitude, longitude) float32 ...
u (latitude, longitude) float32 ...
v (latitude, longitude) float32 ...
weather_ds.isel(latitude=[50], longitude=[100])
<xarray.Dataset>
Dimensions: (latitude: 1, longitude: 1)
Coordinates:
time datetime64[ns] 2020-03-18
step timedelta64[ns] 4 days 09:00:00
isobaricInhPa int64 850
* latitude (latitude) float64 77.38
* longitude (longitude) float64 25.0
valid_time datetime64[ns] 2020-03-22T09:00:00
Data variables:
t (latitude, longitude) float32 ...
u (latitude, longitude) float32 ...
v (latitude, longitude) float32 ...
weather_ds.sel(longitude=25.0)
<xarray.Dataset>
Dimensions: (latitude: 720)
Coordinates:
time datetime64[ns] 2020-03-18
step timedelta64[ns] 4 days 09:00:00
isobaricInhPa int64 850
* latitude (latitude) float64 89.88 89.62 89.38 ... -89.38 -89.62 -89.88
longitude float64 25.0
valid_time datetime64[ns] 2020-03-22T09:00:00
Data variables:
t (latitude) float32 ...
u (latitude) float32 ...
v (latitude) float32 ...
weather_ds[dict(latitude=[50], longitude=[100])]
<xarray.Dataset>
Dimensions: (latitude: 1, longitude: 1)
Coordinates:
time datetime64[ns] 2020-03-18
step timedelta64[ns] 4 days 09:00:00
isobaricInhPa int64 850
* latitude (latitude) float64 77.38
* longitude (longitude) float64 25.0
valid_time datetime64[ns] 2020-03-22T09:00:00
Data variables:
t (latitude, longitude) float32 ...
u (latitude, longitude) float32 ...
v (latitude, longitude) float32 ...
weather_ds.loc[dict(longitude=25.0)]
<xarray.Dataset>
Dimensions: (latitude: 720)
Coordinates:
time datetime64[ns] 2020-03-18
step timedelta64[ns] 4 days 09:00:00
isobaricInhPa int64 850
* latitude (latitude) float64 89.88 89.62 89.38 ... -89.38 -89.62 -89.88
longitude float64 25.0
valid_time datetime64[ns] 2020-03-22T09:00:00
Data variables:
t (latitude) float32 ...
u (latitude) float32 ...
v (latitude) float32 ...
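The nwpc_data package and the GRIB2 files used above may not be available to every reader. As a rough stand-in, the sketch below builds a synthetic dataset on a grid that is assumed to match the coordinates printed above (0.25 degree spacing, latitude starting at 89.875, longitude starting at 0.0). The name grid_ds and the random field values are hypothetical. Only the coordinate lookups are meant to mirror the real example.

```python
import numpy as np
import xarray as xr

# Assumed 0.25-degree global grid, inferred from the printed coordinates.
latitude = 89.875 - 0.25 * np.arange(720)
longitude = 0.25 * np.arange(1440)

rng = np.random.default_rng(0)

# Random float32 values stand in for the real t, u and v fields.
grid_ds = xr.Dataset(
    {
        name: (
            ("latitude", "longitude"),
            rng.random((720, 1440), dtype=np.float32),
        )
        for name in ("t", "u", "v")
    },
    coords={"latitude": latitude, "longitude": longitude},
)

# Integer-based selection keeps both dimensions with length 1.
point = grid_ds.isel(latitude=[50], longitude=[100])
print(point["latitude"].values, point["longitude"].values)

# Label-based selection with a scalar drops the longitude dimension.
column = grid_ds.sel(longitude=25.0)
print(dict(column.sizes))
```

Index 50 along latitude corresponds to 89.875 - 50 * 0.25 = 77.375, which the real output displays rounded as 77.38, and index 100 along longitude corresponds to 25.0.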
References
http://xarray.pydata.org/en/stable/indexing.html#dataset-indexing