API: bump MAXDIMS/MAXARGS to 64 introduce NPY_AXIS_RAVEL #25149


Merged: 10 commits into numpy:main on Nov 28, 2023

Conversation

seberg (Member) commented Nov 14, 2023

Bumping these two has three main caveats:

  1. We cannot break ABI for the iterator macros, which means the
     iterators now need to use 32 via their own macro.
  2. We used axis=MAXDIMS to mean axis=None; introduce NPY_AXIS_RAVEL
     to replace this. Outside NumPy it will be a runtime constant.
  3. The old-style iterators cannot deal with high dimensions, meaning
     that some functions that use them just won't work.
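For context, a minimal Python-side sketch of the axis=None convention behind caveat 2 (the behavior the new sentinel encodes in the C-API):

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

# axis=None means "operate on the raveled array"; in the C-API this
# used to be encoded as axis == NPY_MAXDIMS and is replaced by the
# new sentinel in this PR.
assert a.argmax(axis=None) == 5                          # index into a.ravel()
assert np.concatenate([a, a], axis=None).shape == (12,)  # ravel-and-join
```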

Draft, as it is based on the mapiter removal. The above points still need documenting, but this should be good for an initial review (the doc changes here may be a bit scattered). I don't really expect the code to change, though.

There are two plausible follow-ups that I would prefer not to include here:

  1. I made a start on using alloca() or similar for some allocations (or malloc for big ones). We don't have to go all the way there, but at least the stack arrays in the ufunc setup are worth shrinking. (I definitely want to do that.)
  2. It might be nice to address some of the places that remain limited. Examples are choose and arr.flat, which I added tests for here, as well as np.random, since it relies on the old-style iterator a lot.
    • I don't have a real solution here; we could duplicate the old-style iterators to give you more dimensions (if you rely on NumPy >= 2, since backporting isn't plausible).

I had mentioned before that we could just delete NPY_MAXDIMS as a public symbol rather than bumping it. After seeing that it is used in a few places in SciPy, etc., I decided it is probably best to defer that to NumPy 3; I doubt we are ready to force that churn on downstream.

Addresses gh-5744, although some limited paths remain, of course.

@seberg seberg marked this pull request as draft November 14, 2023 21:42
seberg (Member, Author) commented Nov 15, 2023

OK, I think this is now ready (but it relies on the other PR). I had not done it here, but I would be happy to follow up with:

  • Removing NPY_MAXARGS. The main useful "public" use seems to be at the end of the iterator structs (and it is OK to change it there). (Could also do this here, but I'm not sure it helps.)
  • I checked, and Cython does not define arr.flat to return a C iterator, so I think we can make arr.flat return a new object that supports more dimensions.
    This may not be a super quick thing, unfortunately, unless we mostly copy-paste the old one. The reason is that I am not sure the indexing code can be redone very easily (even though that would be nice anyway).
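As a rough illustration of what arr.flat provides today (it is backed by the old C-level flat iterator, which is why it shares the old dimension limit):

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

# arr.flat is a 1-D iterator over the array in C order; it also
# supports indexing and assignment through the raveled view.
assert list(a.flat) == [0, 1, 2, 3, 4, 5]
assert a.flat[4] == 4
a.flat[0] = 10              # writes back into the original array
assert a[0, 0] == 10
```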

mattip (Member) commented Nov 15, 2023

> the other PR

Link?

seberg (Member, Author) commented Nov 15, 2023

Oops, thought it was in the first post, just opened: #25138

@seberg seberg marked this pull request as ready for review November 15, 2023 14:45
"(Deprecated NumPy 1.23)") < 0) {
return NPY_FAIL;
}
}
seberg (Member, Author):

This is very niche, but, mentioned it in the release notes.

@@ -692,7 +692,7 @@ PyArray_ConcatenateInto(PyObject *op,
Py_DECREF(item);
}

-    if (axis >= NPY_MAXDIMS) {
+    if (axis == NPY_RAVEL_AXIS) {
seberg (Member, Author):

This is also a subtle oddity, mentioned in the release note (any huge axis was interpreted as raveling).
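A hedged sketch of the quirk described above; the actual outcome depends on the NumPy version, so both outcomes are accepted here:

```python
import numpy as np

a = np.ones((2, 2))

# Historically, np.concatenate treated any axis >= NPY_MAXDIMS (32) the
# same as axis=None, i.e. ravel-and-join.  With this change, only the
# explicit sentinel matches, so an out-of-range axis raises instead.
try:
    out = np.concatenate([a, a], axis=32)
    behavior = "raveled" if out.shape == (8,) else "other"  # old NumPy
except Exception:
    behavior = "error"                                      # NumPy >= 2
assert behavior in ("raveled", "error")
```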

"this function only supports up to 32 dimensions but "
"the array has %d.", PyArray_NDIM(mp));
return -1;
}
seberg (Member, Author):

We don't use this internally; we might remove the function, but I'm not sure yet. In principle, anyone running into it should catch the error and not leak anything (normal array cleanup doesn't have this problem).

@@ -182,9 +182,8 @@ cdef extern from "numpy/arrayobject.h":
NPY_ARRAY_UPDATE_ALL

cdef enum:
NPY_MAXDIMS

npy_intp NPY_MAX_ELSIZE
seberg (Member, Author):

Maybe a bit random, but NPY_MAX_ELSIZE doesn't exist (it sure did at some point).

Reviewer (Member):

Does this removal need a release note? Just to say that if someone was using and relying on it, it wasn't doing what they thought it did.

seberg (Member, Author):

🤷 I doubt anyone will notice, but sure, added.

@seberg seberg changed the title API: NumPy NPY_MAXDIMS and NPY_MAXARGS to 64 introduce NPY_AXIS_RAVEL API: bump NPY_MAXDIMS and NPY_MAXARGS to 64 introduce NPY_AXIS_RAVEL Nov 16, 2023
@seberg seberg changed the title API: bump NPY_MAXDIMS and NPY_MAXARGS to 64 introduce NPY_AXIS_RAVEL API: bump MAXDIMS/MAXARGS to 64 introduce NPY_AXIS_RAVEL Nov 16, 2023
Memoryview actually also only supports 65 dimensions currently,
so skip the test when the memoryview creation fails.
If it were to work, the test should kick in and work fine.
seberg (Member, Author) commented Nov 22, 2023

@ngoldbaum maybe you have some time to review this?

ngoldbaum (Member) left a review:

A few language suggestions and some questions. Overall looks fine though.

used in stack allocation, where the increase should be safe.
However, we do encourage generally to remove any use of ``NPY_MAXDIMS`` and
``NPY_MAXARGS`` to eventually allow removing the constraint completely.
In some cases, ``NPY_MAXDIMS`` was passed (and returned) to mean ``axis=None``
ngoldbaum (Member):

Can you make "In some cases" a little more specific? Does this only impact things that interact with numpy via the C API, or could you trigger this behavior from the python API too?

seberg (Member, Author):

The Python-side places are already mentioned in the other note, but I made it a bit more concrete.

-----------------------------------------------
In some cases ``axis=32`` or for concatenate any large value
was the same as ``axis=None``.
Except for ``concatenate`` this was deprecate.
ngoldbaum (Member):

Suggested change:
-    Except for ``concatenate`` this was deprecate.
+    Except for ``concatenate`` this was deprecated in NumPy 1.23.

Was the deprecation noisy for other functions? Just wondering if we might want to merely deprecate this usage for concatenate since that was missed in 2022.

seberg (Member, Author):

It gave a deprecation warning. I briefly considered that, but I think this is ridiculously niche. IIRC, we were close to not doing the deprecation warning in the other path either.

We are talking about np.concatenate(..., axis=32) or higher, where the axis value cannot refer to any actual axis.
This is undocumented, and unless you do it accidentally (and it happens to do what you want?!), nobody should have had reason to even realize it works (for the last ~15 years at least), I would say.

``NPY_MAXDIMS`` was also used to signal ``axis=None`` in the C-API, including
the ``PyArray_AxisConverter``.
If you run into this problem, you will see ``-2147483648``
(the minimum integer value) or ``64`` being used as an invalid axis.
ngoldbaum (Member):

It might help to include the exact text of the error they'd see, so if they google the error message they'll find the migration guide

is the largest number of dimensions for any array.
``ndarraytypes.h`` points to this data member.
Although most operations may be limited in dimensionality, we do not
advertise a maximum dimension. Anyone explicitly relying on one
ngoldbaum (Member):

Is this a remnant of an earlier version that completely removed NPY_MAXDIMS from the public API? Maybe rephrase this to say NPY_MAXDIMS is there but not to rely on it because it might change or be removed in the future.

seberg (Member, Author):

It wasn't a remnant, but rephrased a bit.

ngoldbaum (Member) left a review:

I'm on vacation but will merge this next week if no one else does it before me. This is mostly very straightforward and it's hard to know the impact of the ABI break or possible breakage of questionable C code using the macro without making the change.

ngoldbaum (Member) commented:

I'm going to merge this now and immediately trigger a new nightly wheel upload. Thanks @seberg!

If you are reading this because of new CI breakage, make sure any nightly builds or tests you're running against NumPy dev (the future 2.0 release) use a stack that is also built against NumPy dev, with build isolation disabled.

@ngoldbaum ngoldbaum merged commit a5b0831 into numpy:main Nov 28, 2023
@seberg seberg deleted the maxdims branch November 29, 2023 08:17