NumPy: its Goals, Stakeholders, Use and Future

NumPy, short for Numerical Python, is a Python library that provides functionality for scientific computing. Its predecessor was an array library called Numeric. As Numeric became more popular and users required more flexibility and speed, the Space Telescope Science Institute created numarray as a replacement. The two libraries coexisted for a while and divided the community, but in 2005 Travis Oliphant reunited them, separated everything from SciPy and named the new library NumPy. NumPy 1.0 followed in 2006. The library has never been part of Python’s standard library and is installed as a separate package1.

Even though its website describes NumPy as “(…) the fundamental package for scientific computing with Python”2, its core strengths are multidimensional arrays and the tools to work with them. It has some functionality for scientific computing, but SciPy is the go-to library for that.

The library is currently open-sourced on GitHub and is being maintained by a large community of developers. Below, you can read more about NumPy’s users, capabilities, stakeholders, context and roadmap.

NOTE: The formatting in this essay has been optimised for the online version.

Mental Model of the User

The book Lean Architecture by Bjørnvig and Coplien states that the mental model of the user consists of the roles and actors that interact in the system3. In the same book, the authors split the view on architecture into two parts: what the system is versus what the system does. When looking at it from the latter perspective, use cases can effectively elicit these end-user world models. This section will list the major use cases of NumPy to capture the mental model of the user.

The main use case for NumPy is offering a way to initialize N-dimensional array structures in Python and apply computations to them. Python is a beginner-friendly language, and its libraries are expected to ‘just work’ after importing them. End users bring this expectation to Python libraries in general, as the xkcd comic below illustrates:

[Image: xkcd 353]

A second use case is fast computation on large data. Large datasets mean large matrices on which computations need to be done, so NumPy must be able to perform complex mathematical computations quickly on large data structures. Users get this speed while writing nothing but Python code.
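As a minimal sketch of what this looks like in practice, the snippet below computes the same squares once with a plain Python list comprehension and once with a single NumPy expression. The variable names and the array size are illustrative assumptions, not taken from NumPy’s documentation.

```python
import numpy as np

n = 100_000
numbers = list(range(n))

# Plain Python: the interpreter handles one element per iteration.
squares_loop = [x * x for x in numbers]

# NumPy: one expression over the whole array; the loop runs in compiled code.
# int64 is requested explicitly so the largest square fits on every platform.
arr = np.arange(n, dtype=np.int64)
squares_vec = arr * arr
```

Both versions produce the same values. Wrapping each one in the standard library’s `timeit` module is a simple way to compare their running times on your own machine.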

The end-user model thus consists of two main expectations: the user must be able to simply import the NumPy library and start using it immediately and intuitively; additionally the library is expected to be fast, even on large data structures.

Key Capabilities and Properties

The capabilities of NumPy all serve the main goal of the library, namely being the fundamental tool for scientific computing within Python.

According to its own website, NumPy aims to provide2:

“A powerful N-dimensional array object called a NumPy array, a selection of derived objects and functionality for a range of fast operations on these data structures. These operations include mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations.”
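A few of the operation categories named in this quote can be sketched on one small array. The array contents and variable names below are made up for illustration, and the selection is not exhaustive (I/O and Fourier transforms are left out).

```python
import numpy as np

a = np.array([[3, 1, 2],
              [6, 5, 4]])

doubled = a * 2                # mathematical: element-wise multiplication
mask = a > 2                   # logical: boolean array of the same shape
flat = a.reshape(6)            # shape manipulation: 2x3 becomes length 6
ordered = np.sort(a, axis=1)   # sorting: each row sorted independently
selected = a[mask]             # selecting: elements where the mask is True
col_means = a.mean(axis=0)     # basic statistics: mean of each column
gram = a @ a.T                 # basic linear algebra: matrix product
```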

Unlike a regular Python list, a NumPy array has a fixed size once it is created and holds elements of a single data type. NumPy also contains tools for generating random numbers. In short, NumPy allows for fast computations when using array-like data structures.
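The differences from a Python list, and the random number tools, can be illustrated with a short sketch. It assumes NumPy 1.17 or newer for `np.random.default_rng`; the seed and the dice-roll scenario are arbitrary choices for the example.

```python
import numpy as np

# A list may mix types; an array converts everything to one common dtype.
mixed = np.array([1, 2.5, 3])
common_dtype = mixed.dtype          # float64: the integers became floats

# An array's size is fixed: "appending" builds a new array instead.
ints = np.array([1, 2, 3])
grown = np.append(ints, 4)          # `ints` itself still has 3 elements

# Random numbers: a seeded generator gives reproducible draws.
rng = np.random.default_rng(42)
rolls = rng.integers(1, 7, size=10)     # ten dice rolls, values 1 to 6

rng_again = np.random.default_rng(42)
rolls_again = rng_again.integers(1, 7, size=10)   # same seed, same rolls
```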

As mentioned before, NumPy makes programs more efficient through faster computations and smarter memory management. This performance is partly due to its core being implemented in C, which runs faster than interpreted Python code4.
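One aspect of this memory management is that basic slicing returns a view on the existing data rather than a copy. The following is a small sketch of that behaviour; the array and variable names are illustrative.

```python
import numpy as np

data = np.arange(10, dtype=np.int64)

# Basic slicing returns a view: no element data is copied.
view = data[2:6]
view[0] = -1                  # writes through to `data` as well

# An explicit copy owns its own memory.
copied = data[2:6].copy()
copied[1] = 99                # `data` is unaffected

view_shares = np.shares_memory(data, view)      # True
copy_shares = np.shares_memory(data, copied)    # False
total_bytes = data.nbytes                       # 10 elements * 8 bytes = 80
```

Views avoid copying large arrays, but they also mean that a write through a slice changes the original, so an explicit `.copy()` is the safer choice when the original must stay intact.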

Stakeholder Analysis

The stakeholders of NumPy can be divided into several categories: developers, main sponsor, institutional partners, funders, and end users. These categories will be further described below.

Developers

The NumPy repository is part of the NumPy organisation on GitHub. This group consists of 23 members5 and has administrative control over the NumPy repository. As of early March 2020, there are 877 contributors to the NumPy repository6. The most active maintainers in the time range January 2019 - March 2020 can be found below. Probably not by coincidence, they also took part in one of our pull requests.

The most active maintainers between January 2019 and March 2020, ranked by number of commits7:

[Image: contributors in the 2019-2020 time range]

Main sponsor

NumPy’s main sponsor is NumFOCUS, a 501(c)(3) nonprofit charity in the United States. This status comes with a number of restrictions and regulations that ensure the organisation stays focused on charitable purposes8, in this case: “Open Code for Better Science”9. NumFOCUS provides NumPy with fiscal, legal, and administrative support to help ensure the health and sustainability of the project.

Institutional partners

NumPy’s institutional partners are organisations that support the project by employing NumPy contributors whose official duties include contributing to the project. Current institutional partners include Quansight (headed by Travis Oliphant) and the University of California, Berkeley.

Funders

NumPy receives direct funding from the following sources: the Gordon and Betty Moore Foundation, the Alfred P. Sloan Foundation and Tidelift. We were interested in the extent to which these funders are involved in NumPy’s decision-making. Neither the Moore Foundation nor the Alfred P. Sloan Foundation takes part in the decision-making of the projects it funds10 11. It is unclear to what extent Tidelift is involved in NumPy, as it has published no statement on the matter and appears to be more commercially oriented. We contacted Tidelift about its involvement in NumPy, but unfortunately received only an automated reply.

End users

The end users of NumPy are also stakeholders in the sense that they, too, benefit from NumPy functioning and being managed properly. Some people use NumPy’s computational power in their personal projects, but there are also widely used frameworks that base their functionality on NumPy. These include TensorFlow, PyTorch and scikit-learn, among many others. On GitHub alone, NumPy is already used by over 300k projects12.

Context

NumPy is used, and will continue to be used, from within Python source code. As described earlier, the library provides functionality for working with arrays in Python. There is no GUI and one will probably never be created, as it would not match the environment in which NumPy is used. Below, an example of some NumPy code can be found13:

>>> import numpy as np
>>> a = np.arange(15).reshape(3, 5)
>>> a
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
>>> a.shape
(3, 5)
>>> a.ndim
2
>>> a.dtype.name
'int64'
>>> a.itemsize
8
>>> a.size
15
>>> type(a)
<class 'numpy.ndarray'>
>>> b = np.array([6, 7, 8])
>>> b
array([6, 7, 8])
>>> type(b)
<class 'numpy.ndarray'>

Roadmap

NumPy hosts an elaborate roadmap on its website, as well as a template for proposing new plans. These plans are called NumPy Enhancement Proposals (NEPs), and a list of previous and current NEPs is available at https://numpy.org/neps/.

A solid example of such a NEP is NEP 14, the plan for dropping Python 2.7 support14. Its final stage was reached on January 1st, 2020, when the NumPy community dropped support for Python 2 entirely. The proposal dates back to November 2017, when dropping Python 2.7 was first discussed on the NumPy mailing list15.

However, NEPs do not cover NumPy’s roadmap entirely, so we summarise the roadmap below16:

  • Interoperability
    • The NumPy developers want to make it easier for other libraries to interoperate with NumPy. They want to provide better interoperability protocols and better array subclass handling, among other things.
  • Extensibility
    • The NumPy developers want to simplify NumPy’s internal datatypes (‘dtypes’) to make NumPy easier to extend. For instance, they want to simplify the creation of custom dtypes and add new ‘string’ dtypes for dealing with textual data.
  • Performance
    • The NumPy developers want to improve NumPy’s performance through, for instance, SIMD instructions and optimisations within functions.
  • Website and documentation
    • The website needs to be rewritten completely and the documentation is of ‘varying quality’17.
  • Random number generation policy & rewrite
    • At the time of writing, the developers are close to completing a new random number generation framework.
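To make the interoperability item above more concrete, here is a minimal sketch of one existing protocol, the `__array__` method, which lets NumPy functions accept objects from other libraries. The `Temperatures` class is hypothetical and not part of NumPy; the `copy` parameter is included on the assumption that newer NumPy versions may pass it, and this simple sketch ignores it.

```python
import numpy as np


class Temperatures:
    """Hypothetical container (not part of NumPy) holding readings in Celsius."""

    def __init__(self, readings):
        self._readings = list(readings)

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this hook when it needs an ndarray built from the object.
        # This sketch always builds a fresh array and ignores `copy`.
        return np.array(self._readings, dtype=dtype)


temps = Temperatures([18.5, 21.0, 19.5])

as_array = np.asarray(temps)   # conversion goes through __array__
mean_temp = np.mean(temps)     # NumPy functions accept the object directly
```

The roadmap’s interoperability work goes beyond this simple hook, for example with protocols that let other libraries override how NumPy functions behave on their own array types.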

Conclusion

Many developers use NumPy and are familiar with the library itself. However, they most likely haven’t looked at NumPy from a more historical and contextual perspective, as this is not strictly needed to use the library. In this essay we hope to have given NumPy users or anyone else interested in the NumPy project insight into these overlooked aspects of one of the most iconic Python packages to date.

  1. History of SciPy, retrieved on 2020-03-02. https://scipy.github.io/old-wiki/pages/History_of_SciPy.html 

  2. NumPy homepage, retrieved on 2020-02-20. https://numpy.org/

  3. Coplien, J. O., & Bjørnvig, G. (2011). Lean architecture: for agile software development. John Wiley & Sons. 

  4. Rossant, C. (2018). Chapter 4.5 Understanding the internals of NumPy to avoid unnecessary array copying. In IPython Cookbook (Second Edition). Retrieved from https://ipython-books.github.io/45-understanding-the-internals-of-numpy-to-avoid-unnecessary-array-copying/ 

  5. NumPy Organization members on GitHub, retrieved on 2020-03-05. https://github.com/orgs/numpy/people 

  6. Contributors to NumPy, retrieved on 2020-03-05. https://github.com/numpy/numpy/graphs/contributors 

  7. Contributors to NumPy from January 2019 to March 2020, retrieved on 2020-03-05. https://github.com/numpy/numpy/graphs/contributors?from=2019-01-01&to=2020-03-05&type=c 

  8. What is a 501(c)(3), retrieved on 2020-03-05. https://www.501c3.org/what-is-a-501c3/

  9. NumFOCUS website, retrieved on 2020-03-05. https://numfocus.org/

  10. Moore Foundation, Founder’s Intent, retrieved on 2020-03-02. https://www.moore.org/about/founders-intent 

  11. Alfred P. Sloan Foundation, Governance and Policies, retrieved on 2020-03-05. https://sloan.org/about/documents#tab-governance-and-policies

  12. GitHub repositories depending on NumPy, retrieved on 2020-03-02. https://github.com/numpy/numpy/network/dependents 

  13. NumPy code example, retrieved on 2020-03-05. https://docs.scipy.org/doc/numpy/user/quickstart.html#an-example

  14. NumPy’s NEP 14, retrieved on 2020-03-02. https://numpy.org/neps/nep-0014-dropping-python2.7-proposal.html 

  15. Initial discussion of dropping support for Python 2.7, retrieved on 2020-03-02. https://mail.python.org/pipermail/numpy-discussion/2017-November/077419.html 

  16. NumPy’s roadmap, retrieved on 2020-03-02. https://numpy.org/neps/roadmap.html 

  17. Section about website and documentation of NumPy’s roadmap, retrieved on 2020-03-05. https://numpy.org/neps/roadmap.html#website-and-documentation 

NumPy
Authors
Robbert Koning
Pravesh Moelchand
Erwin van Thiel
Jim Verheijde