Enable more ergonomic seasonal grouping and resampling #10198

Open
6 tasks done
dcherian opened this issue Apr 2, 2025 · 2 comments
@dcherian
Contributor

dcherian commented Apr 2, 2025

TLDR

I propose merging #9524 (docs), after review, to provide this new API:

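import xarray as xr
from xarray.groupers import SeasonGrouper, SeasonResampler

ds = xr.tutorial.open_dataset("air_temperature")  # any Dataset with a datetime "time" coordinate
ds.groupby(time=SeasonGrouper(["DJFM", "MAMJ", "JJAS", "SOND"])).mean()  # custom, overlapping seasons
ds.resample(time=SeasonResampler(["DJF", "MAM", "JJA", "SON"])).mean()  # one value per season per year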

Is your feature request related to a problem?

Current Status

Xarray supports a very simple form of seasonal grouping: groupby("time.season"), which has a fixed definition of seasons (DJF, MAM, JJA, SON) and doesn't enforce proper ordering of the output. The seasons are sorted as strings, giving DJF, JJA, MAM, SON :/
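For reference, here is a plain-Python sketch of the fixed month-to-season mapping behind "time.season", showing why sorting the labels as strings scrambles them and how ordering against an explicit season list restores calendar order. The helper names are hypothetical; in Xarray itself a common workaround is to reindex the grouped result along the season dimension, e.g. .reindex(season=["DJF", "MAM", "JJA", "SON"]).

```python
# Fixed mapping used by "time.season": month number (1-12) -> season label
MONTH_TO_SEASON = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA",
    9: "SON", 10: "SON", 11: "SON",
}
CALENDAR_ORDER = ["DJF", "MAM", "JJA", "SON"]


def group_labels_sorted_as_strings(months):
    """Unique season labels sorted lexically, as in the current groupby output."""
    return sorted({MONTH_TO_SEASON[m] for m in months})


def group_labels_in_calendar_order(months):
    """Unique season labels ordered by an explicit season list instead."""
    present = {MONTH_TO_SEASON[m] for m in months}
    return [s for s in CALENDAR_ORDER if s in present]


months = list(range(1, 13))
print(group_labels_sorted_as_strings(months))  # ['DJF', 'JJA', 'MAM', 'SON']
print(group_labels_in_calendar_order(months))  # ['DJF', 'MAM', 'JJA', 'SON']
```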

We support somewhat more complex resampling using pandas syntax, for example .resample(time="QS-JAN"), but I think this is limited to seasons that are 3 months long.
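To see why quarterly frequencies only give 3-month seasons, here is a sketch that lists the month blocks produced by a quarter-start frequency anchored on a given month (pandas aliases of the form "QS-<MON>", such as "QS-DEC" or "QS-JAN"). The function quarter_blocks is a hypothetical illustration, not a pandas API.

```python
MONTH_LETTERS = "JFMAMJJASOND"  # first letter of each month, January first
ANCHORS = {"JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
           "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12}


def quarter_blocks(freq):
    """Return the four 3-month blocks (as letter strings) for a 'QS-<MON>' frequency."""
    prefix, _, anchor = freq.upper().partition("-")
    if prefix != "QS":
        raise ValueError(f"only quarter-start frequencies are handled: {freq!r}")
    start = ANCHORS[anchor] - 1  # zero-based index of the first month of the first block
    blocks = []
    for q in range(4):
        # Each block is exactly three consecutive months, wrapping past December.
        months = [(start + 3 * q + k) % 12 for k in range(3)]
        blocks.append("".join(MONTH_LETTERS[m] for m in months))
    return blocks


print(quarter_blocks("QS-DEC"))  # ['DJF', 'MAM', 'JJA', 'SON']
print(quarter_blocks("QS-JAN"))  # ['JFM', 'AMJ', 'JAS', 'OND']
```

Whatever the anchor, every block has three months, so seasons like DJFM, or a mix of 2- and 4-month seasons, cannot be expressed with a quarterly frequency.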

User Requests

A quick scan of issues, discussions, and Stack Overflow shows that our users want more control over how seasons are specified.

  • Don't include "incomplete" seasons in the output.
  • Allow custom season definitions (e.g. seasons of varying length, or overlapping seasons).
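The first request can be sketched in plain Python: label each sample with its season and the year in which that season starts (so December 2000 and January and February 2001 form one DJF block), then keep only blocks in which every month of the season has at least one sample. The (year, month) input format and the helper names are assumptions for illustration, not the proposed implementation; seasons are assumed not to overlap.

```python
from collections import defaultdict

CYCLIC = "JFMAMJJASOND" * 2  # month initials, doubled so a season can wrap past December


def season_months(season):
    """Month numbers (1-12) of a season string such as 'DJF', in season order."""
    start = CYCLIC.find(season.upper())
    if start < 0:
        raise ValueError(f"unrecognised season: {season!r}")
    return [(start + k) % 12 + 1 for k in range(len(season))]


def complete_seasons(samples, seasons):
    """Return (start_year, season) blocks that contain every month of their season.

    `samples` is an iterable of (year, month) pairs; seasons must not overlap.
    """
    month_info = {}
    for season in seasons:
        months = season_months(season)
        for m in months:
            # Months after the wrap (Jan, Feb in DJF) belong to a block that began the previous year.
            month_info[m] = (season, m < months[0])

    blocks = defaultdict(set)
    for year, month in samples:
        if month not in month_info:
            continue  # month is not part of any requested season
        season, wrapped = month_info[month]
        blocks[(year - 1 if wrapped else year, season)].add(month)

    complete = [key for key, seen in blocks.items() if len(seen) == len(key[1])]
    # Sort chronologically by the first month of each block, not alphabetically.
    return sorted(complete, key=lambda key: (key[0], season_months(key[1])[0]))


samples = [(year, month) for year in (2000, 2001) for month in range(1, 13)]
print(complete_seasons(samples, ["DJF", "MAM", "JJA", "SON"]))
# [(2000, 'MAM'), (2000, 'JJA'), (2000, 'SON'), (2000, 'DJF'), (2001, 'MAM'), (2001, 'JJA'), (2001, 'SON')]
```

The partial DJF blocks at both ends (January and February 2000, and December 2001 on its own) are dropped.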

Here is a list of user requests:

Describe the solution you'd like

The problem of custom seasons is simply one of converting each timestamp to the proper integer code for its season. Our relatively new Grouper objects provide exactly this extension point.
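A rough sketch of that conversion, assuming non-overlapping seasons written as month-initial strings: locating each season in a doubled "JFMAMJJASOND" string recovers its months and resolves ambiguous letters such as J, which gives a 12-entry month-to-code lookup table. Each timestamp's code is then one lookup on its month. Uncovered months get -1, following the pandas.factorize convention for missing values. The helper names are hypothetical and not part of the Grouper API.

```python
CYCLIC = "JFMAMJJASOND" * 2  # month initials, doubled so a season can wrap past December


def month_lookup(seasons):
    """Return a 12-entry list: lookup[month - 1] is the season code, or -1 if uncovered."""
    lookup = [-1] * 12
    for code, season in enumerate(seasons):
        start = CYCLIC.find(season.upper())
        if start < 0:
            raise ValueError(f"unrecognised season: {season!r}")
        for k in range(len(season)):
            idx = (start + k) % 12
            if lookup[idx] != -1:
                # One code per timestamp cannot represent overlapping seasons.
                raise ValueError(f"month {idx + 1} is in more than one season")
            lookup[idx] = code
    return lookup


def season_codes(months, seasons):
    """Integer season code for each month number (1-12) in `months`."""
    lookup = month_lookup(seasons)
    return [lookup[m - 1] for m in months]


print(season_codes([12, 1, 4, 6, 8, 11], ["DJFM", "AM", "JJ", "ASON"]))  # [0, 0, 1, 2, 3, 3]
```

Because the table is built once, assigning codes to a long time vector is a single pass over its months.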

I have implemented this in #9524 (docs). The code isn't pretty and probably doesn't scale well for very long time vectors, but I focused on correctness and tests.

Describe alternatives you've considered

This could live outside Xarray, but it is such a common request from our user base that it seems worth including.

@trexfeathers

> This could live outside Xarray, but it is such a common request from our user base that it seems worth including.

Just in case this is useful to anyone: it's not as flexible as the proposal, but it's available right now.

from pathlib import Path
from tempfile import TemporaryDirectory

from iris.coord_categorisation import add_season
from ncdata.iris_xarray import cubes_to_xarray, cubes_from_xarray
from ncdata.threadlock_sharing import enable_lockshare
import requests
import xarray as xr


# Make Iris and Xarray share one netCDF thread lock so their lazy reads don't conflict.
enable_lockshare(iris=True, xarray=True)


with TemporaryDirectory() as tmpdirname:
    url = "https://github.com/pydata/xarray-data/raw/refs/heads/master/air_temperature.nc"
    response = requests.get(url)
    response.raise_for_status()  # fail early on a bad download

    file_path = Path(tmpdirname) / "air_temperature.nc"
    file_path.write_bytes(response.content)

    # Load eagerly and close the file, since the temporary directory is deleted on exit.
    with xr.open_dataset(file_path) as opened:
        dataset = opened.load()

    (cube,) = cubes_from_xarray(dataset)
    # NOTE: this can't support overlapping seasons.
    add_season(cube, "time", name="season", seasons=["DJFM", "AM", "JJ", "ASON"])

    season_dataset = cubes_to_xarray(cube)
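The NOTE above points at the underlying limitation: a single season label (or integer code) per timestamp cannot put one month into two seasons. One way around it, sketched here with hypothetical helper names and plain month numbers as input, is to represent each group as the list of sample positions it selects, so a sample can appear in several groups.

```python
CYCLIC = "JFMAMJJASOND" * 2  # month initials, doubled so a season can wrap past December


def season_months(season):
    """Set of month numbers (1-12) covered by a season string such as 'DJFM'."""
    start = CYCLIC.find(season.upper())
    if start < 0:
        raise ValueError(f"unrecognised season: {season!r}")
    return {(start + k) % 12 + 1 for k in range(len(season))}


def overlapping_groups(months, seasons):
    """Map each season to the positions of the samples whose month falls in it.

    Unlike one-code-per-sample grouping, a position may appear under several seasons.
    """
    groups = {}
    for season in seasons:
        wanted = season_months(season)
        groups[season] = [i for i, m in enumerate(months) if m in wanted]
    return groups


months = list(range(1, 13))  # one sample per calendar month, January first
groups = overlapping_groups(months, ["DJFM", "MAMJ", "JJAS", "SOND"])
print(groups["DJFM"])  # [0, 1, 2, 11]
print(groups["MAMJ"])  # [2, 3, 4, 5]
```

Reducing then means applying the aggregation to each index list separately; here March (position 2) contributes to both DJFM and MAMJ.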

@dcherian
Contributor Author

dcherian commented Apr 8, 2025

Nice, thanks @trexfeathers!


Since there are 5 👍, I will open #9524 for review and bring it up for discussion at our regular meeting tomorrow.
