Numerical Python

David Ascher
Paul F. Dubois
Konrad Hinsen
Jim Hugunin
Travis Oliphant

March 15, 2001

Lawrence Livermore National Laboratory, Livermore, CA 94566

UCRL-MA-128569

Legal Notice

 

Please see file Legal.html in the source distribution.

This open source project has been contributed to by many people, including personnel of the Lawrence Livermore National Laboratory. The following notice covers those contributions including this manual.

 

Copyright (c) 1999. The Regents of the University of California. All rights reserved.

Permission to use, copy, modify, and distribute this software for any purpose without fee is hereby granted, provided that this entire notice is included in all copies of any software which is or includes a copy or modification of this software and in all copies of the supporting documentation for such software.

This work was produced at the University of California, Lawrence Livermore National Laboratory under contract no. W-7405-ENG-48 between the U.S. Department of Energy and The Regents of the University of California for the operation of UC LLNL.

This software was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor the University of California nor any of their employees, makes any warranty, express or implied, or assumes any liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately-owned rights. Reference herein to any specific commercial products, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or the University of California. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or the University of California, and shall not be used for advertising or product endorsement purposes.

 

 

Table Of Contents

Numerical Python 1

Introduction 2

Where to get information and code 3

Acknowledgments 4

Installing NumPy 5

Testing the Python installation 5

Testing the Numeric Python Extension Installation 5

Installing NumPy 5

Installing on Windows 6

Installing on Unix 6

At the SourceForge... 6

The Numeric Discussion List 6

Bugs and Patches 6

CVS Repository 6

FTP Site 6

Pyfort 6

The NumTut package 7

Testing the NumTut package 7

Possible reasons for failure: 7

Win32 7

Unix 8

High-Level Overview 9

Array Objects 9

Universal Functions 10

Convenience Functions 10

Array Basics 12

Basics 12

Creating arrays from scratch 12

array() and typecodes 12

Multidimensional Arrays 14

resize 16

Creating arrays with values specified `on-the-fly' 17

zeros() and ones() 17

arrayrange() 17

Creating an array from a function: fromfunction() 19

identity() 20

Coercion and Casting 20

Automatic Coercions and Binary Operations 20

Deliberate up-casting: The asarray function 21

The typecode value table 21

Consequences of silent upcasting 22

Saving space 22

Deliberate casts (potentially down): the astype method 22

Operating on Arrays 23

Simple operations 23

Getting and Setting array values 24

Slicing Arrays 25

Ufuncs 27

What are Ufuncs? 27

Ufuncs can operate on any Python sequence 28

Ufuncs can take output arguments 28

Ufuncs have special methods 28

The reduce ufunc method 28

The accumulate ufunc method 29

The outer ufunc method 29

The reduceat ufunc method 29

Ufuncs always return new arrays 30

Which are the Ufuncs? 30

Unary Mathematical Ufuncs (take only one argument) 30

Binary Mathematical Ufuncs 30

Logical Ufuncs 30

Ufunc shorthands 31

Pseudo Indices 32

Array Functions 34

take(a, indices, axis=0) 34

put (a, indices, values) 35

putmask (a, mask, values) 36

transpose(a, axes=None) 36

repeat(a, repeats, axis=0) 36

choose(a, (b0, ..., bn)) 36

ravel(a) 37

nonzero(a) 37

where(condition, x, y) 37

compress(condition, a, axis=0) 37

diagonal(a, k=0) 37

trace(a, k=0) 38

searchsorted(a, values) 38

sort(a, axis=-1) 38

argsort(a, axis=-1) 39

argmax(a, axis=-1), argmin(a, axis=-1) 39

fromstring(string, typecode) 39

dot(m1, m2) 40

matrixmultiply(m1, m2) 40

clip(m, m_min, m_max) 40

indices(shape, typecode=None) 40

swapaxes(a, axis1, axis2) 41

concatenate((a0, a1, ... , an), axis=0) 41

innerproduct(a, b) 42

outerproduct(a,b) 42

array_repr() 42

array_str() 42

resize(a, new_shape) 42

diagonal(a, offset=0, axis1=-2, axis2=-1) 42

repeat (a, counts, axis=0) 42

convolve (a, v, mode=0) 43

cross_correlate (a, v, mode=0) 43

where (condition, x, y) 43

identity(n) 43

sum(a, index=0) 44

cumsum(a, index=0) 44

product(a, index=0) 44

cumproduct(a, index=0) 44

alltrue(a, index=0) 44

sometrue(a, index=0) 44

Array Methods 45

itemsize() 45

iscontiguous() 45

typecode() 45

byteswapped() 45

tostring() 45

tolist() 46

Array Attributes 47

flat 47

real and imaginary 47

Special Topics 49

Subclassing 49

Code Organization 49

Numeric.py and friends 49

UserArray.py 49

Matrix.py 49

Precision.py 49

ArrayPrinter.py 49

Mlab.py 49

bartlett(M) 49

blackman(M) 50

corrcoef(x, y=None) 50

cov(m,y=None) 50

cumprod(m) 50

cumsum(m) 50

diag(v, k=0) 50

diff(x, n=1) 50

eig(m) 50

eye(N, M=N, k=0, typecode=None) 50

fliplr(m) 50

flipud(m) 50

hamming(M) 50

hanning(M) 50

kaiser(M, beta) 50

max(m) 50

mean(m) 50

median(m) 50

min(m) 51

msort(m) 51

prod(m) 51

ptp(m) 51

rand(d1, ..., dn) 51

rot90(m,k=1) 51

sinc(x) 51

squeeze(a) 51

std(m) 51

sum(m) 51

svd(m) 51

trapz(y,x=None) 51

tri(N, M=N, k=0, typecode=None) 51

tril(m,k=0) 51

triu(m,k=0) 51

The multiarray object 52

Typecodes 52

Indexing in and out, slicing 53

Ellipses 54

NewAxis 54

Set-indexing and Broadcasting 54

Axis specifications 55

Textual representations of arrays 55

Comparisons 57

Pickling and Unpickling -- storing arrays on disk 57

Dealing with floating point exceptions 57

Writing a C extension to NumPy 58

Introduction 58

Preparing an extension module for NumPy arrays 58

Accessing NumPy arrays from C 58

Types and Internal Structure 58

Element data types 59

Contiguous arrays 60

Zero-dimensional arrays 60

A simple example 60

Accepting input data from any sequence type 61

Creating NumPy arrays 62

Returning arrays from C functions 62

A less simple example 62

C API Reference 64

ArrayObject C Structure and API 64

Structures 64

The ArrayObject API 65

Notes 68

UfuncObject C Structure and API 68

C Structure 68

UfuncObject C API 70

Glossary 73

Optional Packages 75

License and disclaimer for packages MA and RNG 76

FFT Reference 77

Python Interface 77

fft(data, n=None, axis=-1) 77

inverse_fft(data, n=None, axis=-1) 77

real_fft(data, n=None, axis=-1) 77

inverse_real_fft(data, n=None, axis=-1) 78

fft2d(data, s=None, axes=(-2,-1)) 78

real_fft2d(data, s=None, axes=(-2,-1)) 78

C API 78

Compilation Notes 79

LinearAlgebra Reference 80

Python Interface 80

solve_linear_equations(a, b) 80

inverse(a) 80

eigenvalues(a) 80

eigenvectors(a) 81

singular_value_decomposition(a, full_matrices=0) 81

generalized_inverse(a, rcond=1e-10) 81

determinant(a) 81

linear_least_squares(a, b, rcond=e-10) 81

Compilation Notes 81

RandomArray Reference 82

Python Interface 82

seed(x=0, y=0) 82

get_seed() 82

random(shape= ReturnFloat ) 82

uniform(minimum, maximum, shape=ReturnFloat) 82

randint(minimum, maximum, shape=ReturnFloat) 82

permutation(n) 82

Floating point random arrays 83

standard_normal (shape=ReturnFloat) 83

normal (mean, stddev, shape=ReturnFloat) 83

multivariate_normal (mean, covariance) or
multivariate_normal (mean, covariance, leadingAxesShape) 83

exponential (mean, shape=ReturnFloat) 83

beta (a, b, shape=ReturnFloat) 83

gamma (a, r, shape=ReturnFloat) 84

chi_square (df, shape=ReturnFloat) 84

noncentral_chi_square (df, nonc, shape=ReturnFloat) 84

F (dfn, dfd, shape=ReturnFloat) 84

noncentral_F (dfn, dfd, nconc, shape=ReturnFloat) 84

Integer random arrays 84

binomial (trials, prob, shape=ReturnInt) 84

negative_binomial (trials, prob, shape=ReturnInt) 84

poisson (mean, shape=ReturnInt) 84

multinomial (trials, probs) or multinomial (trials, probs, leadingAxesShape) 84

Examples 85

Independent Random Streams 87

Background 87

Usage 87

Module RNG 87

Generator objects 87

Module ranf 88

Examples 88

Masked Arrays 89

What is a masked array? 89

Installing and using MA 89

Class MaskedArray 89

Attributes of masked arrays 90

Methods on masked arrays. 90

Constructing masked arrays 92

Working with Masks 93

Copying or not? 94

Behaviors 94

Indexing and Slicing 94

Indexing that produces a scalar result 95

Assignment to elements and slices 95

Module MA: Attributes 95

Module MA: Functions 95

Unary functions 95

Binary functions 95

Comparison operators 96

Logical operators 96

Special array operators 96

Controlling the size of the string representations 98

Helper classes 98

MAError 98

The constant masked 98

Example of masked behavior 98

Class masked_unary_function 99

Class masked_binary_function 99

ActiveAttributes 99

Examples of Using MA 100

Data with a given value representing missing data 100

Filling in the missing data 100

Numerical operations 100

Seeing the mask 100

Filling it your way 101

Ignoring extreme values 101

Averaging an entire multidimensional array 101

Index 103

Part I: Numerical Python

Numerical Python ("Numpy") adds a fast multidimensional array facility to Python. This part contains all you need to know about "Numpy" arrays and the functions that operate upon them.

1. Introduction

This chapter introduces the Numeric Python extension and outlines the rest of the document.

The Numeric Python extensions (NumPy henceforth) is a set of extensions to the Python programming language which allows Python programmers to efficiently manipulate large sets of objects organized in grid-like fashion. These sets of objects are called arrays, and they can have any number of dimensions: one dimensional arrays are similar to standard Python sequences, two-dimensional arrays are similar to matrices from linear algebra. Note that one-dimensional arrays are also different from any other Python sequence, and that two-dimensional matrices are also different from the matrices of linear algebra, in ways which we will mention later in this text.

Why are these extensions needed? The core reason is a very prosaic one, and that is that manipulating a set of a million numbers in Python with the standard data structures such as lists, tuples or classes is much too slow and uses too much space. Anything which we can do in NumPy we can do in standard Python - we just may not be alive to see the program finish. A more subtle reason for these extensions however is that the kinds of operations that programmers typically want to do on arrays, while sometimes very complex, can often be decomposed into a set of fairly standard operations. This decomposition has been developed similarly in many array languages. In some ways, NumPy is simply the application of this experience to the Python language - thus many of the operations described in NumPy work the way they do because experience has shown that way to be a good one, in a variety of contexts. The languages which were used to guide the development of NumPy include the infamous APL family of languages, Basis, MATLAB, FORTRAN, S and S+, and others. This heritage will be obvious to users of NumPy who already have experience with these other languages. This tutorial, however, does not assume any such background, and all that is expected of the reader is a reasonable working knowledge of the standard Python language.

This document is the "official" documentation for NumPy. It is both a tutorial and the most authoritative source of information about NumPy with the exception of the source code. The tutorial material will walk you through a set of manipulations of simple, small, arrays of numbers, as well as image files. This choice was made because:

All users of NumPy, whether interested in image processing or not, are encouraged to follow the tutorial with a working NumPy installation at their side, testing the examples, and, more importantly, transferring the understanding gained by working on images to their specific domain. The best way to learn is by doing - the aim of this tutorial is to guide you along this "doing."

Here is what the rest of this part contains:

Installing NumPy provides information on testing Python, NumPy, and compiling and installing NumPy if necessary.

The NumTut package provides information on testing and installing the NumTut package, which allows easy visualization of arrays.

High-Level Overview gives a high-level overview of the components of the NumPy system as a whole.

Array Basics provides a detailed step-by-step introduction to the most important aspect of NumPy, the multidimensional array objects.

Ufuncs provides information on universal functions, the mathematical functions which operate on arrays and other sequences elementwise.

Pseudo Indices covers syntax for some special indexing operators.

Array Functions is a catalog of each of the utility functions which allow easy algorithmic processing of arrays.

Array Methods discusses the methods of array objects.

Array Attributes presents the attributes of array objects.

Special Topics is a collection of special topics, from the organization of the codebase to the mechanisms for customizing printing.

Writing a C extension to NumPy is an tutorial on how to write a C extension which uses NumPy arrays.

C API Reference is a reference for the C API to NumPy objects (both PyArrayObjects and UFuncObjects).

Glossary is a glossary of terms.

Reference material for the optional packages distributed with Numeric Python are described in the next part, Optional Packages.

Where to get information and code

Numerical Python and its documentation are available at SourceForge. The main web site is:

http://numpy.sourceforge.net

Downloads, bug reports, and patch facility, and releases are at the main project page, reachable from the above site or directly at: http://sourceforge.net/projects/numpy

The Python web site is www.python.org

Many packages are available from third parties that use Numeric to interface to a variety of mathematical and statistical software.

Acknowledgments

Numerical Python is the outgrowth of a long collaborative design process carried out by the Matrix SIG of the Python Software Activity (PSA). Jim Hugunin, while a graduate student at MIT, wrote most of the code and initial documentation. When Jim joined CNRI and began working on JPython, he didn't have the time to maintain Numerical Python so Paul Dubois at LLNL agreed to become the maintainer of Numerical Python. David Ascher, working as a consultant to LLNL, wrote most of this document, incorporating contributions from Konrad Hinsen and Travis Oliphant, both of whom are major contributors to Numerical Python.

Since the source was moved to SourceForge, the Numeric user community has become a significant part of the process. Numerical Python illustrates the power of the open source software concept.

Please send comments and corrections to this manual to paul@pfdubois.com, or to Paul F. Dubois, L-264, Lawrence Livermore National Laboratory, Livermore, CA 94566, U.S.A.

2. Installing NumPy

This chapter explains how to install and test NumPy, from either the source distribution or from the binary distribution.

Before we start with the actual tutorial, we will describe the steps needed for you to be able to follow along the examples step by step. These steps including installing Python, the NumPy extensions, and some tools and sample files used in the examples of this tutorial.

Testing the Python installation

The first step is to install Python if you haven't already. Python is available from the Python website's download directory at http://www.python.org/download . Click on the link corresponding to your platform, and follow the instructions described there. When installed, starting Python by typing python at the shell or double-clicking on the Python interpreter should give a prompt such as:

Python 1.5.1 (#0, Apr 13 1998, 20:22:04) [MSC 32 bit (Intel)] on win32

Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam

>>>

If you have problems getting this part to work, consider contacting a local support person or emailing python-help@python.org for help. If neither solution works, consider posting on the comp.lang.python newsgroup (details on the newsgroup/mailing list are available at http://www.python.org/psa/MailingLists.html#clp ).

Testing the Numeric Python Extension Installation

The standard Python distribution does not come as of this writing with the Numeric Python extensions installed, but your system administrator may have installed them already. To find out if your Python interpreter has NumPy installed, type import Numeric at the Python prompt. You'll see one of two behaviors (throughout this document, bold Courier New font indicates user input, and standard Courier New font indicates output):

>>> import Numeric

Traceback (innermost last):

File "<stdin>", line 1, in ?

ImportError: No module named Numeric

>>>

indicating that you don't have NumPy installed, or:

>>> import Numeric

>>>

indicating that you do. If you do, go on to the next step. If you don't, you have to get the NumPy extensions.

Installing NumPy

The release facility at SourceForge is accessed through the project page, http://sourceforge.net/projects/numpy. Click on the "numpy" releases and you will be presented with a list of the available files. The files whose names end in ".tar.gz" are source code releases. The others are "prebuilt" for a given platform. It is possible to get the latest sources directly from our CVS repository using the facilities described at SourceForge. Note that while every effort is made to ensure that the repository is always "good", direct use of the repository is subject to more errors than using a standard release.

Installing on Windows

On Windows, we currently have .zip files that should be unzipped into the top of your Python distribution; there is no "Setup" to run. If you wish to build from source on Windows, the Unix procedure described below can be used, running python in a command-line tool.

In general, there may not be a prebuilt version of a particular kind available in every minor release. If you need a prebuilt version, choose the most recent version available.

Installing on Unix

The source distribution should be uncompressed and unpacked using the the tar program:

csh> tar xfz Numeric-n.m.tar.gz

Follow the instructions in the top-level directory for compilation and installation. Note that there are options you must consider before beginning. Installation is usually as simple as:

python setup_all.py install

However, please (please!) see the README itself for the latest details.

 

Just like all Python modules and packages, the Numeric module can be invoked using either the import Numeric form, or the from Numeric import ... form. Because most of the functions we'll talk about are in the Numeric module, in this document, all of the code samples will assume that they have been preceded by a statement:

from Numeric import *

 
At the SourceForge...

The SourceForge facility is at http://sourceforge.net/projects/numpy. Look on SourceForge also for various Numeric-based packages supplied by individuals.

The Numeric Discussion List

You can subscribe to a discussion list about Numeric python using the project page at SourceForge. The list is a good place to ask questions and get help. Send mail to numpy-discussion@lists.sourceforge.net.

Bugs and Patches

Bug tracking and patch-management facilities is provided on the SourceForge project page.

CVS Repository

You can get the latest and greatest (albeit less tested and trustworthy) version of Numeric directly from our CVS repository.

FTP Site

The FTP Site contains this documentation in several formats, plus maybe some other goodies we have lying around.

Pyfort

One tool for connecting Fortran to Numeric and Python is Pyfort, sourceforge.net/projects/pyfortran.

3. The NumTut package

This chapter leads the user through the installation and testing of the NumTut package, which should have been distributed with this document.

Testing the NumTut package

This tutorial assumes that the NumTut package has been installed. This package contains a few sample images and utility functions for displaying arrays and the like. To find out if NumTut has been installed, do:

>>> from NumTut import *

>>> view(greece)

 

If a picture of a greek street shows up on your screen, you're all set, and you can go to the next chapter.

Possible reasons for failure:

>>> import NumTut

Traceback (innermost last):

File "<stdin>", line 1, in ?

ImportError: No module named NumTut

This message indicates that you do not have the NumTut package installed in your PythonPath. NumTut is distributed along with the Python source in the Demo subdirectory. Copy the NumTut subdirectory somewhere into your Python path, or just execute python from the Demo directory.

On Win32, the NumTut directory can be placed in the main directory of your Python installation. On Unix, it can be placed in the site-packages directory of your installation.

Win32

>>> import NumTut

Traceback (innermost last):

[...]

ConfigurationError: view needs Tkinter on Win32, and either threads or the IDLE editor"

or:

ConfigurationError: view needs either threads or the IDLE editor to be enabled.

On Win32 (Windows 95, 98, NT), the Tk toolkit is needed to view the images. Additionally, either the Python interpreter needs to be compiled with thread support (which is true in the standard win32 distribution) or you need to call the NumTut program from the IDLE interactive development environment.

If you do not wish to modify your Python installation to match these requirements, you can simply ignore the references to the demonstrations which use the view() command later in this document. Using NumPy does not require image display tools, they just make some array operations easier to understand.

Unix

On Unix machines, NumTut will work best with a Python interpreter with Tk support (not true in the default configuration), with the Tkinter GUI framework available and optionally with the tkImaging add-on (part of the Python Imaging Library). If this is not the case, it will try to use an external viewer which is able to read PPM files. The default viewer is 'xv', a common image viewer available from ftp://ftp.cis.upenn.edu/pub/xv. If xv is not installed, you will get an error message similar to:

>>> import NumTut

Traceback (innermost last):

[...]

ConfigurationError: PPM image viewer 'xv' not found

You can configure NumTut to use a different image viewer, by typing e.g.:

>>> import NumTut

>>> NumTut.view.PPMVIEWER = 'ppmviewer'

>>> from NumTut import *

>>> view(greece)

If you do not have a PPM image viewer, you can simply ignore the references to the demonstrations which use the view() command later in this document. Using NumPy does not require image display tools, they just make some array operations easier to understand.

4. High-Level Overview

In this chapter, a high-level overview of the extensions is provided, giving the reader the definitions of the key components of the system. This section defines the concepts used by the remaining sections.

Numeric Python consists of a set of modules:

This module defines two new object types, and a set of functions which manipulate these objects, as well as convert between them and other Python types. The objects are the new array object (technically called multiarray objects), and universal functions (technically ufunc objects).

Array Objects

The array objects are generally homogeneous collections of potentially large numbers of numbers. All numbers in a multiarray are the same kind (i.e. number representation, such as double-precision floating point). Array objects must be full (no empty cells are allowed), and their size is immutable. The specific numbers within them can change throughout the life of the array.

Note: In some applications arrays of numbers may contain entries representing invalid or missing values. An optional package "MA" is available to represent such arrays. Attempting to do so by using NaN as a value may lead to disappointment or lack of portability.

Mathematical operations on arrays return new arrays containing the results of these operations performed elementwise on the arguments of the operation.

The size of an array is the total number of elements therein (it can be 0 or more). It does not change throughout the life of the array.

The shape of an array is the number of dimensions of the array and its extent in each of these dimensions (it can be 0, 1 or more). It can change throughout the life of the array. In Python terms, the shape of an array is a tuple of integers, one integer for each dimension that represents the extent in that dimension.

The rank of an array is the number of dimensions along which it is defined. It can change throughout the life of the array. Thus, the rank is the length of the shape.

The typecode of an array is a single character description of the kind of element it contains (number format, character or Python reference). It determines the itemsize of the array.

The itemsize of an array is the number of 8-bit bytes used to store a single element in the array. The total memory used by an array tends to its size times its itemsize, as the size goes to infinity (there is a fixed overhead per array, as well as a fixed overhead per dimension).

To put this in more familiar mathematicial language: A vector is a rank-1 array (it has only one dimension along which it can be indexed). A matrix as used in linear algebra is a rank-2 array (it has two dimensions along which it can be indexed). There are also rank-0 arrays, which can hold single scalars -- they have no dimension along which they can be indexed, but they contain a single number.

Here is an example of Python code using the array objects (bold text refers to user input, non-bold text to computer output):

>>> vector1 = array((1,2,3,4,5))

>>> print vector1

[1 2 3 4 5]

>>> matrix1 = array(([0,1],[1,3]))

>>> print matrix1

[[0 1]

[1 3]]

>>> print vector1.shape, matrix1.shape

(5,) (2,2)

>>> print vector1 + vector1

[ 2 4 6 8 10]]

>>> print matrix1 * matrix1

[[0 1] # note that this is not the matrix

[1 9]] # multiplication of linear algebra

If this example does not work for you because it complains of an unknown name "array", you forgot to begin your session with

from Numeric import *

See Just like all Python modules and packages, the Numeric module can be invoked using either the import Numeric form, or the from Numeric import ... form. Because most of the functions we'll talk about are in the Numeric module, in this document, all of the code samples will assume that they have been preceded by a statement: from Numeric import *.

Universal Functions

Universal functions (ufuncs) are functions which operate on arrays and other sequences. Most ufuncs perform mathematical operations on their arguments, also elementwise.

Here is an example of Python code using the ufunc objects:

>>> print sin([pi/2., pi/4., pi/6.])

[ 1. , 0.70710678, 0.5 ]

>>> print greater([1,2,4,5], [5,4,3,2])

[0 0 1 1]

>>> print add([1,2,4,5], [5,4,3,2])

[6 6 7 7]

>>> print add.reduce([1,2,4,5])

12 # 1 + 2 + 3 + 4 + 5

Ufuncs are covered in detail in Ufuncs.

Convenience Functions

The Numeric module provides, in addition to the functions which are needed to create the objects above, a set of powerful functions to manipulate arrays, select subsets of arrays based on the contents of other arrays, and other array-processing operations.

>>> data = arange(10) # convenient homolog of builtin range()

>>> print data

[0 1 2 3 4 5 6 7 8 9]

>>> print where(greater(data, 5), -1, data)

[ 0 1 2 3 4 5 -1 -1 -1 -1] # selection facility

>>> data = resize(array((0,1)), (9, 9))

>>> print data

[[0 1 0 1 0 1 0 1 0]

[1 0 1 0 1 0 1 0 1]

[0 1 0 1 0 1 0 1 0]

[1 0 1 0 1 0 1 0 1]

[0 1 0 1 0 1 0 1 0]

[1 0 1 0 1 0 1 0 1]

[0 1 0 1 0 1 0 1 0]

[1 0 1 0 1 0 1 0 1]

[0 1 0 1 0 1 0 1 0]]

All of the functions which operate on NumPy arrays are described in Array Functions.

5. Array Basics

This chapter introduces some of the basic functions which will be used throughout the text.

Basics

Before we explore the world of image manipulation as a case-study in array manipulation, we should first define a few terms which we'll use over and over again. Discussions of arrays and matrices and vectors can get confusing due to disagreements on the nomenclature. Here is a brief definition of the terms used in this tutorial, and more or less consistently in the error messages of NumPy.

The python objects under discussion are formally called "multiarray" objects, but informally we'll just call them "array" objects or just "arrays." These are different from the array objects defined in the standard Python array module (which is an older module designed for processing one-dimensional data such as sound files).

These array objects hold their data in a homogeneous block of elements, i.e. their elements all have the same C type (such as a 64-bit floating-point number). This is quite different from most Python container objects, which can contain heterogeneous collections. (You can, however, have an array of Python objects, as discussed later).

Any given array object has a rank, which is the number of "dimensions" or "axes" it has. For example, a point in 3D space [1, 2, 1] is an array of rank 1 - it has one dimension. That dimension has a length of 3.

As another example, the array

1.0 0.0 0.0

0.0 1.0 2.0

is an array of rank 2 (it is 2-dimensional). The first dimension has a length of 2, the second dimension has a length of 3. Because the word "dimension" has many different meanings to different folks, in general the word "axis" will be used instead. Axes are numbered just like Python list indices: they start at 0, and can also be counted from the end, so that axis -1 is the last axis of an array, axis -2 is the penultimate axis, etc.

There are two important and potentially unintuitive behaviors of NumPy arrays which take some getting used to. The first is that by default, operations on arrays are performed element-wise. This means that when adding two arrays, the resulting array has as elements the pairwise sums of the two operand arrays. This is true for all operations, including multiplication. Thus, array multiplication using the * operator will default to element-wise multiplication, not matrix multiplication as used in linear algebra. Many people will want to use arrays as linear algebra-type matrices (including their rank-1 versions, vectors). For those users, the Matrix class provides a more intuitive interface. We defer discussion of the Matrix class until later.

The second behavior which will catch many users by surprise is that functions which return arrays which are simply different views at the same data will in fact share their data. This will be discussed at length when we have more concrete examples of what exactly this means.

Now that all of these definitions and warnings are laid out, let's see what we can do with these arrays.

Creating arrays from scratch
array() and typecodes

There are many ways to create arrays. The most basic one is the use of the array() function:

>>> a = array([1.2, 3.5, -1])

to make sure this worked, do:

>>> print a

[ 1.2 3.5 -1. ]

The array(numbers, typecode=None, savespace=0) function takes three arguments - the first one is the values, which have to be in a Python sequence object (such as a list or a tuple). The optional second argument is the typecode of the elements. If it is omitted, as in the example above, Python tries to find the one type which can represent all the elements. The third is discussed in Saving space.

Since the elements we gave our example were two floats and one integer, it chose `float' as the type of the resulting array. If one specifies the typecode, one can specify unequivocally the type of the elements - this is especially useful when, for example, one wants to make sure that an array contains floats even though in some cases all of its elements are integers:

>>> x,y,z = 1,2,3

>>> a = array([x,y,z]) # integers are enough for 1, 2 and 3

>>> print a

[1 2 3]

>>> a = array([x,y,z], Float) # not the default type

>>> print a

[ 1. 2. 3.]

 

Pop Quiz: What will be the type of an array defined as follows:

>>> mystery = array([1, 2.0, -3j])

 

Hint: -3j is an imaginary number.

Answer: try it out!

A very common mistake is to call array with a set of numbers as arguments, as in array(1,2,3,4,5) . This doesn't produce the expected result as soon as at least two numbers are used, because the first argument to array() must be the entire data for the array -- thus, in most cases, a sequence of numbers. The correct way to write the preceding invocation is most likely array((1,2,3,4,5)) .

 

Possible values for the second argument to the array creator function (and indeed to any function which accepts a so-called typecode for arrays) are:

  1. One type corresponding to single ASCII characters: Character .
  2. One unsigned numeric type: UnsignedInt8 , used to store numbers between 0 and 255.
  3. Many signed numeric types:
  4. Signed integer choices: Int , Int0 , Int8 , Int16 , Int32 , and on some platforms, Int64 and Int128 (their ranges depend on their size).
  5. Floating point choices: Float , Float0 , Float8 , Float16 , Float32 , Float64 , and on some platforms, Float128 .
  6. Complex number choices: Complex , Complex0 , Complex8 , Complex16 , Complex32 , Complex64 , Complex128 .

The meaning of these is as follows:

  • The versions without any numbers ( Int , Float , Complex ) correspond to the int , float and complex datatypes in Python. They are thus long integers and double-precision floating point numbers, with a complex number corresponding to two double-precision floats.
  • The versions with a number following correspond to whatever words are available on the specific platform you are using which have at least that many bits in them. Thus, Int0 corresponds to the smallest integer word size available, Int8 corresponds to the smallest integer word size available which has at least 8 bits, etc. The word sizes for the complex numbers refer to the total number of bits used by both the real and imaginary parts (in other words, the data portion of an array of N Complex128 elements uses up the same amount of memory as the data portions of two arrays of typecode Float64 with 2N elements).
  • One non-numeric type, PyObject . Arrays of typecode PyObject are arrays of Python references, and as such their data area can contain references to any kind of Python objects.

The last typecode deserves a little comment. Indeed, it seems to indicate that arrays can be filled with any Python objects. This appears to violate the notion that arrays are homogeneous. In fact, the typecode PyObject does allow heterogeneous arrays. However, if you plan to do numerical computation, you're much better off with a homogeneous array with a potentially "large" type than with a heterogeneous array. This is because a heterogeneous array stores references to objects, which incurs a memory cost, and because the speed of computation is much slower with arrays of PyObject 's than with uniform number arrays. Why does it exist, then?

A very useful features of arrays is the ability to slice them, dice them, select and choose from them, etc. This feature is so nice that sometimes one wants to do the same operations with, e.g., arrays of class instances. In such cases, computation speed is not as important as convenience. Also, if the array is filled with objects which are instances of classes which define the appropriate methods, then NumPy will let you do math with those objects. For example, if one creates an object class which has an __add__ method, then arrays (created with the PyObject typecode) of instances of such a class can be added together.

Multidimensional Arrays

The following example shows one way of creating multidimensional arrays:

>>> ma = array([[1,2,3],[4,5,6]])

>>> print ma

[[1 2 3]

[4 5 6]]

The first argument to array() in the code above is a single list containing two lists, each containing three elements. If we wanted floats instead, we could specify, as discussed in the previous section, the optional typecode we wished:

>>> ma_floats = array([[1,2,3],[4,5,6]], Float)

>>> print ma_floats

[[ 1. 2. 3.]

[ 4. 5. 6.]]

This array allows us to introduce the notion of `shape'. The shape of an array is the set of numbers which define its dimensions. The shape of the array ma defined above is 2 by 3. More precisely, all arrays have a shape attribute which is a tuple of integers. So, in this case:

>>> print ma.shape

(2, 3)

Using the earlier definitions, this is a shape of rank 2, where the first axis has length 2, and the seond axis has length 3. The rank of an array A is always equal to len(A.shape) .

Note that shape is an attribute of array objects. It is the first of several which we will see throughout this tutorial. If you're not used to object-oriented programming, you can think of attributes as "features" or "qualities" of individual arrays. The relation between an array and its shape is similar to the relation between a person and their hair color. In Python, it's called an object/attribute relation.

What if one wants to change the dimensions of an array? For now, let us consider changing the shape of an array without making it "grow." Say, for example, we want to make the 2x3 array defined above ( ma ) an array of rank 1:

>>> flattened_ma = reshape(ma, (6,))

>>> print flattened_ma

[1 2 3 4 5 6]

One can change the shape of arrays to any shape as long as the product of all the lengths of all the axes is kept constant (in other words, as long as the number of elements in the array doesn't change):

>>> a = array([1,2,3,4,5,6,7,8])

[1 2 3 4 5 6 7 8]

>>> print a

>>> b = reshape(a, (2,4)) # 2*4 == 8

[[1 2 3 4]

[5 6 7 8]]

>>> print b

>>> c = reshape(b, (4,2) # 4*2 == 8

>>> print c

[[1 2]

[3 4]

[5 6]

[7 8]]

Notice that we used a new function, reshape() . It, like array() , is a function defined in the Numeric module. It expects an array as its first argument, and a shape as its second argument. The shape has to be a sequence of integers (a list or a tuple). Keep in mind that a tuple with a single element needs a comma at the end; the right shape tuple for a rank-1 array with 5 elements is (5,) , not (5) .

One nice feature of shape tuples is that one entry in the shape tuple is allowed to be -1 . The -1 will be automatically replaced by whatever number is needed to build a shape which does not change the size of the array. Thus:

>>> a = reshape(array(range(25)), (5,-1))

>>> print a, a.shape

[[ 0 1 2 3 4]

[ 5 6 7 8 9]

[10 11 12 13 14]

[15 16 17 18 19]

[20 21 22 23 24]] (5, 5)

The shape of an array is a modifiable attribute of the array. You can therefore change the shape of an array simply by assigning a new shape to it:

>>> a = array([1,2,3,4,5,6,7,8,9,10])

>>> a.shape

(10,)

>>> a.shape = (2,5)

>>> print a

[[ 1 2 3 4 5]

[ 6 7 8 9 10]]

>>> a.shape = (10,1) # second axis has length 1

>>> print a

[[ 1]

[ 2]

[ 3]

[ 4]

[ 5]

[ 6]

[ 7]

[ 8]

[ 9]

[10]]

>>> a.shape = (5,-1) # note the -1 trick described above

>>> print a

[[ 1 2]

[ 3 4]

[ 5 6]

[ 7 8]

[ 9 10]]

As in the rest of Python, violating rules (such as the one about which shapes are allowed) results in exceptions:

>>> a.shape = (6,-1)

Traceback (innermost last):

File "<stdin>", line 1, in ?

ValueError: total size of new array must be unchanged

 

The default printing routine provided by the Numeric module prints arrays as follows:

  1. The last axis is always printed left to right
  2. The next-to-last axis is printed top to bottom

The remaining axes are printed top to bottom with increasing numbers of separators.

This explains why rank-1 arrays are printed from left to right, rank-2 arrays have the first dimension going down the screen and the second dimension going from left to right, etc.

 

If you want to change the shape of an array so that it has more elements than it started with (i.e. grow it), then you have many options: One solution is to use the concat() method discussed later. An alternative is to use the array() creator function with existing arrays as arguments:

>>> print a

[0 1 2 3 4 5 6 6 7]

>>> b = array([a,a])

>>> print b

[[1 2 3 4 5 6 7 8]

[1 2 3 4 5 6 7 8]]

>>> print b.shape

(2, 8)

resize

A final possibility is the resize() function, which takes a "base" array as its first argument and the desired shape as the second argument. Unlike reshape() , the shape argument to resize() can corresponds to a smaller or larger shape than the input array. Smaller shapes will result in arrays with the data at the "beginning" of the input array, and larger shapes result in arrays with data containing as many replications of the input array as are needed to fill the shape. For example, starting with a simple array

>>> base = array([0,1])

one can quickly build a large array with replicated data:

>>> big = resize(base, (9,9))

>>> print big

[[0 1 0 1 0 1 0 1 0]

[1 0 1 0 1 0 1 0 1]

[0 1 0 1 0 1 0 1 0]

[1 0 1 0 1 0 1 0 1]

[0 1 0 1 0 1 0 1 0]

[1 0 1 0 1 0 1 0 1]

[0 1 0 1 0 1 0 1 0]

[1 0 1 0 1 0 1 0 1]

[0 1 0 1 0 1 0 1 0]]

and if you imported the view function from the NumTut package, you can do:

>>> view(resize(base, (100,100)))

# grey grid of horizontal lines is shown

>>> view(resize(base, (101,101)))

# grey grid of alternating black and white pixels is shown

 

Sections denoted "For Advanced Users" will be used to indicate aspects of the functions which may not be needed for a first introduction at NumPy, but which should be mentioned for the sake of completeness.

The array constructor takes a mandatory data argument, an optional typecode, and optional savespace argument, and an optional copy argument. If the data argument is a sequence, then array creates a new object of type multiarray, and fills the array with the elements of the data object. The shape of the array is determined by the size and nesting arrangement of the elements of data.

If data is not a sequence, then the array returned is an array of shape () (the empty tuple), of typecode 'O' , containing a single element, which is data .

 
Creating arrays with values specified `on-the-fly'
zeros() and ones()

Often, one needs to manipulate arrays filled with numbers which aren't available beforehand. The Numeric module provides a few functions which create arrays from scratch:

zeros() and ones() simply create arrays of a given shape filled with zeros and ones respectively:

>>> z = zeros((3,3))

>>> print z

[[0 0 0]

[0 0 0]

[0 0 0]]

>>> o = ones([2,3])

>>> print o

[[1 1 1]

[1 1 1]]

Note that the first argument is a shape - it needs to be a list or a tuple of integers. Also note that the default type for the returned arrays is Int , which you can feel free to override using something like:

>>> o = ones((2,3), Float)

>>> print o

[[ 1. 1. 1.]

[ 1. 1. 1.]]

arrayrange()

The arrayrange() function is similar to the range() function in Python, except that it returns an array as opposed to a list.

>>> r = arrayrange(10)

>>> print r

[0 1 2 3 4 5 6 7 8 9]

Combining the arrayrange() with the reshape() function, we can get:

>>> big = reshape(arrayrange(100),(10,10))

>>> print big
[[ 0 1 2 3 4 5 6 7 8 9]

[10 11 12 13 14 15 16 17 18 19]

[20 21 22 23 24 25 26 27 28 29]

[30 31 32 33 34 35 36 37 38 39]

[40 41 42 43 44 45 46 47 48 49]

[50 51 52 53 54 55 56 57 58 59]

[60 61 62 63 64 65 66 67 68 69]

[70 71 72 73 74 75 76 77 78 79]

[80 81 82 83 84 85 86 87 88 89]

[90 91 92 93 94 95 96 97 98 99]]

>>> view(reshape(arrayrange(10000),(100,100)))

# array of increasing lightness from top down (slowly) and from left to

# right (faster) is shown

arange() is a shorthand for arrayrange() .

One can set the start, stop and step arguments, which allows for more varied ranges:

>>> print arrayrange(10,-10,-2)

[10 8 6 4 2 0 -2 -4 -6 -8]

An important feature of arrayrange is that it can be used with non-integer starting points and strides:

>>> print arrayrange(5.0)

[ 0. 1. 2. 3. 4.]

>>> print arrayrange(0, 1, .2)

[ 0. 0.2 0.4 0.6 0.8]

If you want to create an array with just one value, repeated over and over, you can use the * operator applied to lists

>>> a = array([[3]*5]*5)

>>> print a

[[3 3 3 3 3]

[3 3 3 3 3]

[3 3 3 3 3]

[3 3 3 3 3]

[3 3 3 3 3]]

but that is relatively slow, since the duplication is done on Python lists. A quicker way would be to start with 0's and add 3:

>>> a = zeros([5,5]) + 3

>>> print a

[[3 3 3 3 3]

[3 3 3 3 3]

[3 3 3 3 3]

[3 3 3 3 3]

[3 3 3 3 3]]

The optional typecode argument can force the typecode of the resulting array, which is otherwise the "highest" of the starting and stopping arguments. The starting argument defaults to 0 if not specified. Note that if a typecode is specified which is "lower" than that which arrayrange would normally use, the array is the result of a precision-losing cast (a round-down, as that used in the astype method for arrays.)

Creating an array from a function: fromfunction()

Finally, one may want to create an array with contents which are the result of a function evaluation. This is done using the fromfunction() function, which takes two arguments, a shape and a callable object (usually a function). For example:

>>> def dist(x,y):

... return (x-5)**2+(y-5)**2 # distance from point (5,5) squared

...

>>> m = fromfunction(dist, (10,10))

>>> print m

[[50 41 34 29 26 25 26 29 34 41]

[41 32 25 20 17 16 17 20 25 32]

[34 25 18 13 10 9 10 13 18 25]

[29 20 13 8 5 4 5 8 13 20]

[26 17 10 5 2 1 2 5 10 17]

[25 16 9 4 1 0 1 4 9 16]

[26 17 10 5 2 1 2 5 10 17]

[29 20 13 8 5 4 5 8 13 20]

[34 25 18 13 10 9 10 13 18 25]

[41 32 25 20 17 16 17 20 25 32]]

>>> view(fromfunction(dist, (100,100))

# shows image which is dark in topleft corner, and lighter away from it.

>>> m = fromfunction(lambda i,j,k: 100*(i+1)+10*(j+1)+(k+1), (4,2,3))

>>> print m

[[[111 112 113]

[121 122 123]]

[[211 212 213]

[221 222 223]]

[[311 312 313]

[321 322 323]]

[[411 412 413]

[421 422 423]]]

By examining the above examples, one can see that fromfunction() creates an array of the shape specified by its second argument, and with the contents corresponding to the value of the function argument (the first argument) evaluated at the indices of the array. Thus the value of m[3,4] in the first example above is the value of dist when x=3 and y=4 . Similarly for the lambda function in the second example, but with a rank-3 array.

The implementation of fromfunction consists of:

def fromfunction(function, dimensions):

return apply(function, tuple(indices(dimensions)))

which means that the function function is called for each element in the sequence indices(dimensions). As described in the definition of indices, this consists of arrays of indices which will be of rank one less than that specified by dimensions. This means that the function argument must accept the same number of arguments as there are dimensions in dimensions, and that each argument will be an array of the same shape as that specified by dimensions. Furthermore, the array which is passed as the first argument corresponds to the indices of each element in the resulting array along the first axis, that which is passed as the second argument corresponds to the indices of each element in the resulting array along the second axis, etc. A consequence of this is that the function which is used with fromfunction will work as expected only if it performs a separable computation on its arguments, and expects its arguments to be indices along each axis. Thus, no logical operation on the arguments can be performed, or any non-shape preserving operation. The first example below satisfies these requirements, hence works (the x and y arrays both get 10x10 arrays as input corresponding to the values of the indices along the two dimensions), while the second array attemps to do a comparison test on an array of indices, which fails.

>>> def buggy(test):

... if test > 4: return 1

... else: return 0

...

>>> print fromfunction(buggy, (10,))

Traceback (innermost last):

File "<stdin>", line 1, in ?

File "C:\PYTHON\LIB\Numeric.py", line 157, in fromfunction

return apply(function, tuple(indices(dimensions)))

File "<stdin>", line 2, in buggy

TypeError: Comparison of multiarray objects is not implemented.

identity()

The simplest array constructor is the identity(n) function, which takes a single integer argument and returns a square identity array of that size of integers:

>>> print identity(5)

[[1 0 0 0 0]

[0 1 0 0 0]

[0 0 1 0 0]

[0 0 0 1 0]

[0 0 0 0 1]]

>>> view(identity(100))

# shows black square with a single white diagonal

Coercion and Casting

We've mentioned the typecodes of arrays, and how to create arrays with the right typecode, but we haven't covered what happens when arrays with different typecodes interact.

Automatic Coercions and Binary Operations

The rules followed by NumPy when performing binary operations on arrays mirror those used by Python in general. Operations between numeric and non-numeric types are not allowed (e.g. an array of characters can't be added to an array of numbers), and operations between mixed number types (e.g. floats and integers, floats and omplex numbers, or in the case of NumPy, operations between any two arrays with different numeric typecodes) first perform a coercion of the 'smaller' numeric type to the type of the `larger' numeric type. Finally, when scalars and arrays are operated on together, the scalar is converted to a rank-0 array first. Thus, adding a "small" integer to a "large" floating point array is equivalent to first casting the integer "up" to the typecode of the array:

>>> arange(0, 1.0, .1) + 12

array([ 12. , 12.1, 12.2, 12.3, 12.4, 12.5, 12.6, 12.7, 12.8, 12.9])

The automatic coercions are described in Figure 1. Avoiding upcasting is discussed in Saving space.

 

Up-casts are indicated with arrows. Down-casts are allowed by the astype() method, but may result in loss of information.
Deliberate up-casting: The asarray function

One more array constructor is the asarray() function. It is used if you want to have an array of a specific typecode and you don't know what typecode array you have (for example, in a generic function which can operate on all kinds of arrays, but needs them to be converted to complex arrays). If the array it gets as an argument is of the right typecode, it will get sent back unchanged. If the array is not of the right typecode, each element of the new array will be the result of the coercion to the new type of the old elements. asarray() will refuse to operate if there might be loss of information -- in other words, asarray() only casts 'up'.

asarray is also used when you have a function that operates on arrays, but you want to allow people to call it with an arbitrary python sequence object. This gives your function a behavior similar to that of most of the builtin functions that operate on arrays.

The typecode value table

The typecodes identifiers ( Float0 , etc.) have as values single-character strings. The mapping between typecode and character strings is machine dependent. An example of the correspondences between typecode characters and the typecode identifiers for 32-bit architectures are shown in Table 3-X.

Typecode character/identifier table on a Pentium computer

Character

# of bytes

# of bits

Identifiers

D

16

128

Complex, Complex64

F

8

64

Complex0, Complex16, Complex32, Complex8

d

8

64

Float, Float64

f

4

32

Float0, Float16, Float32, Float8

l

4

32

Int

1

1

8

Int0, Int8

s

2

16

Int16

i

4

32

Int32

Consequences of silent upcasting

When dealing with very large arrays of floats and if precision is not important (or arrays of small integers), then it may be worthwhile to cast the arrays to "small" typecodes, such as Int8 , Int16 or Float32 . As the standard Python integers and floats correspond to the typecodes Int32 and Float64 , using them in apparently "innocent" ways will result in up-casting, which may null the benefit of the use of small typecode arrays. For example:

>>> mylargearray.typecode()

'f' # a.k.a. Float32 on a Pentium

>>> mylargearray.itemsize()

4

>>> mylargearray = mylargearray + 1 # 1 is an Int64 on a Pentium

>>> mylargearray.typecode() # see Fig. 1 for explanation.

'd'

>>> mylargearray.itemsize()

8

Note that the sizes returned by the itemsize() method are expressed in bytes.

Saving space

Numeric arrays can be created using an optional, keyworded argument to the constructor, savespace. If savespace is set to 1, Numeric will attempt to avoid the silent upcasting behavior. The status of an array can be queried with the spacesaver() method. If x.spacesaver() is true, x has its space-saving flag set. The flag can be set with the savespace method: x.savespace(1) to set it, x.savespace(0) to clear it.

Deliberate casts (potentially down): the astype method

You may also force NumPy to cast any number array to another number array. For example, to take an array of any numeric type (IntX or FloatX or ComplexX or UnsignedInt8) and convert it to a 64-bit float, one can do:

>>> floatarray = otherarray.astype(Float64)

The typecode can be any of the number typecodes, "larger" or "smaller". If it is larger, this is a cast-up, as if asarray() had been used. If it is smaller, the standard casting rules of the underlying language (C) are used, which means that truncation or loss of precision can occur:

>>> print x

[ 0. 0.4 0.8 1.2 1.6]

>>> x.astype(Int)

array([0, 0, 0, 1, 1])

If the typecode used with astype() is the original array's typecode, then a copy of the original array is returned.

Operating on Arrays
Simple operations

If you have a keen eye, you have noticed that some of the previous examples did something new. It added a number to an array. Indeed, most Python operations applicable to numbers are directly applicable to arrays:

>>> print a

[1 2 3]

>>> print a * 3

[3 6 9]

>>> print a + 3

[4 5 6]

Note that the mathematical operators behave differently depending on the types of their operands. When one of the operands is an array and the other is a number, the number is added to all the elements of the array and the resulting array is returned. This is called broadcasting . This also occurs for unary mathematical operations such as sin and the negative sign

>>> print sin(a)

[ 0.84147098 0.90929743 0.14112001]

>>> print -a

[-1 -2 -3]

When both elements are arrays with the same shape, then a new array is created, where each element is the sum of the corresponding elements in the original arrays:

>>> print a + a

[2 4 6]

If the operands of operations such as addition are arrays which have the same rank but different non-integer dimensions, then an exception is generated:

>>> print a

[1 2 3]

>>> b = array([4,5,6,7]) # note this has four elements

>>> print a + b

Traceback (innermost last):

File ``<stdin>``, line 1, in ?

ArrayError: frames are not aligned