8. Python cho lập trình tính toán khoa học

“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.” – Donald Knuth

8.1. Tổng quan

Python là một ngôn ngữ cực kỳ thông dụng dành cho khoa học tính toán bởi vì một số ưu thế sau:

  • sự linh hoạt và dễ tiếp cận,

  • có sẵn một số lượng lớn thư viện tính toán với chất lượng cao,

  • ngôn ngữ lập trình và các thư viện hầu như đều là mã nguồn mỡ,

  • Anaconda là một bản phân phối phổ biến của Python, giúp đơn giản hoá việc cài đặt và quản lý các thư viện,

  • Python nhận được sự quan tâm ngày càng lớn khi được dùng trong học máy và trí tuệ nhân tạo.

Trong bài giảng này, chúng tôi đưa ra một giới thiệu tổng quan về lập trình Python dành cho khoa học tính toán để giúp trả lời những câu hỏi sau:

  • Điểm mạnh và điểm yếu của Python trong lập trình tính toán khoa học là gì?

  • Thành phần chính của Python dành cho lập trình tính toán khoa học là gì?

  • How is the situation changing over time?

Ngoài những gì có sẵn trong Anaconda, chúng ta cần cài đặt thêm thư viện quantecon bằng lệnh sau:

!conda install -y quantecon
Collecting package metadata (current_repodata.json): - 
\ 
| 
/ 
- 
\ 
| 
/ 
- 
\ 
| 
/ 
- 
\ 
done
Solving environment: / 
- 
\ 
| 
/ 
- 
\ 
| 
/ 
- 
\ 
| 
/ 
- 
\ 
| 
/ 
- 
\ 
| 
/ 
done
# All requested packages already installed.

8.2. Thư viện khoa học

Chúng ta hãy xem xét sơ lượt các thư viện khoa học của Python, bắt đầu bằng việc trả lời câu hỏi tại sao chúng lại cần thiết.

8.2.1. Vai trò của thư viện tính toán khoa học

Một lí do rõ ràng cho việc sử dụng những thư viện dạng này là vì chúng cung cấp sẵn hàm để giải quyết các tác vụ chúng ta cần.

Ví dụ như sẽ dễ dàng hơn nhiều nếu chúng ta sử dụng hàm tìm căn bậc hai được cung cấp sẵn thay vì tìm cách tự cài đặt nó từ đầu.

(Đối với những thuật toán thông thường, hiệu quả của thuật toán được tối đa hóa nếu cộng đồng có thể đóng góp trong những thành phần chính của thuật toán, viết bởi chuyên gia và tinh chỉnh bởi người dùng là cách tốt nhất để có một thư viện vừa nhanh vừa mạnh mẽ.)

Nhưng đó không phải là lí do duy nhất khiến chúng ta sử dụng thư viện tính toán khoa học của Python.

Một lí do khác là vì ngôn ngữ Python nguyên thuỷ mặc dù linh hoạt và trong sáng nhưng lại không nhanh.

Vì thế ta cần những thư viện được thiết kế để tăng tốc thời gian thực thi mã nguồn Python.

Như chúng ta có thể thấy ở dưới, những thư viện Python hiện nay có khả năng làm điều đó cực kỳ tốt.

8.2.2. Hệ sinh thái lập trình tính toán khoa học của Python

Trong khía cạnh của sự phổ biến, “big four” những thư viện tính toán khoa học của Python trên thế giới chính là

  • NumPy

  • SciPy

  • Matplotlib

  • Pandas

Với chúng ta, có một thư viện (mới gần đây) cũng sẽ được xem là thiết yếu cho tính toán số học:

  • Numba

Over the next few lectures we’ll see how to use these libraries.

But first, let’s quickly review how they fit together.

  • NumPy forms the foundations by providing a basic array data type (think of vectors and matrices) and functions for acting on these arrays (e.g., matrix multiplication).

  • SciPy builds on NumPy by adding the kinds of numerical methods that are routinely used in science (interpolation, optimization, root finding, etc.).

  • Matplotlib is used to generate figures, with a focus on plotting data stored in NumPy arrays.

  • Pandas provides types and functions for empirical work (e.g., manipulating data).

  • Numba accelerates execution via JIT compilation — we’ll learn about this soon.

8.3. The Need for Speed

Now let’s discuss execution speed.

Higher-level languages like Python are optimized for humans.

This means that the programmer can leave many details to the runtime environment

  • specifying variable types

  • memory allocation/deallocation, etc.

The upside is that, compared to low-level languages, Python is typically faster to write, less error-prone and easier to debug.

The downside is that Python is harder to optimize — that is, turn into fast machine code — than languages like C or Fortran.

Indeed, the standard implementation of Python (called CPython) cannot match the speed of compiled languages such as C or Fortran.

Does that mean that we should just switch to C or Fortran for everything?

The answer is: No, no and one hundred times no!

(This is what you should say to the senior professor insisting that the model needs to be rewritten in Fortran or C++.)

There are two reasons why:

First, for any given program, relatively few lines are ever going to be time-critical.

Hence it is far more efficient to write most of our code in a high productivity language like Python.

Second, even for those lines of code that are time-critical, we can now achieve the same speed as C or Fortran using Python’s scientific libraries.

8.3.1. Where are the Bottlenecks?

Before we learn how to do this, let’s try to understand why plain vanilla Python is slower than C or Fortran.

This will, in turn, help us figure out how to speed things up.

8.3.1.1. Dynamic Typing

Consider this Python operation

a, b = 10, 10
a + b
20

Even for this simple operation, the Python interpreter has a fair bit of work to do.

For example, in the statement a + b, the interpreter has to know which operation to invoke.

If a and b are strings, then a + b requires string concatenation

a, b = 'foo', 'bar'
a + b
'foobar'

If a and b are lists, then a + b requires list concatenation

a, b = ['foo'], ['bar']
a + b
['foo', 'bar']

(We say that the operator + is overloaded — its action depends on the type of the objects on which it acts)

As a result, Python must check the type of the objects and then call the correct operation.

This involves substantial overheads.

8.3.1.2. Static Types

Compiled languages avoid these overheads with explicit, static types.

For example, consider the following C code, which sums the integers from 1 to 10

#include <stdio.h>

int main(void) {
    int i;
    int sum = 0;
    for (i = 1; i <= 10; i++) {
        sum = sum + i;
    }
    printf("sum = %d\n", sum);
    return 0;
}

The variables i and sum are explicitly declared to be integers.

Hence, the meaning of addition here is completely unambiguous.

8.3.2. Data Access

Another drag on speed for high-level languages is data access.

To illustrate, let’s consider the problem of summing some data — say, a collection of integers.

8.3.2.1. Summing with Compiled Code

In C or Fortran, these integers would typically be stored in an array, which is a simple data structure for storing homogeneous data.

Such an array is stored in a single contiguous block of memory

  • In modern computers, memory addresses are allocated to each byte (one byte = 8 bits).

  • For example, a 64 bit integer is stored in 8 bytes of memory.

  • An array of \(n\) such integers occupies \(8n\) consecutive memory slots.

Moreover, the compiler is made aware of the data type by the programmer.

  • In this case 64 bit integers

Hence, each successive data point can be accessed by shifting forward in memory space by a known and fixed amount.

  • In this case 8 bytes

8.3.2.2. Summing in Pure Python

Python tries to replicate these ideas to some degree.

For example, in the standard Python implementation (CPython), list elements are placed in memory locations that are in a sense contiguous.

However, these list elements are more like pointers to data rather than actual data.

Hence, there is still overhead involved in accessing the data values themselves.

This is a considerable drag on speed.

In fact, it’s generally true that memory traffic is a major culprit when it comes to slow execution.

Let’s look at some ways around these problems.

8.4. Vectorization

There is a clever method called vectorization that can be used to speed up high level languages in numerical applications.

The key idea is to send array processing operations in batch to pre-compiled and efficient native machine code.

The machine code itself is typically compiled from carefully optimized C or Fortran.

For example, when working in a high level language, the operation of inverting a large matrix can be subcontracted to efficient machine code that is pre-compiled for this purpose and supplied to users as part of a package.

This clever idea dates back to MATLAB, which uses vectorization extensively.

Vectorization can greatly accelerate many numerical computations (but not all, as we shall see).

Let’s see how vectorization works in Python, using NumPy.

8.4.1. Operations on Arrays

First, let’s run some imports

import random
import numpy as np
import quantecon as qe

Next let’s try some non-vectorized code, which uses a native Python loop to generate, square and then sum a large number of random variables:

n = 1_000_000
%%time

y = 0      # Will accumulate and store sum
for i in range(n):
    x = random.uniform(0, 1)
    y += x**2
CPU times: user 529 ms, sys: 61 µs, total: 529 ms
Wall time: 529 ms

The following vectorized code achieves the same thing.

%%time

x = np.random.uniform(0, 1, n)
y = np.sum(x**2)
CPU times: user 7.06 ms, sys: 11.8 ms, total: 18.8 ms
Wall time: 18.6 ms

As you can see, the second code block runs much faster. Why?

The second code block breaks the loop down into three basic operations

  1. draw n uniforms

  2. square them

  3. sum them

These are sent as batch operators to optimized machine code.

Apart from minor overheads associated with sending data back and forth, the result is C or Fortran-like speed.

When we run batch operations on arrays like this, we say that the code is vectorized.

Vectorized code is typically fast and efficient.

It is also surprisingly flexible, in the sense that many operations can be vectorized.

The next section illustrates this point.

8.4.2. Universal Functions

Many functions provided by NumPy are so-called universal functions — also called ufuncs.

This means that they

  • map scalars into scalars, as expected

  • map arrays into arrays, acting element-wise

For example, np.cos is a ufunc:

np.cos(1.0)
0.5403023058681398
np.cos(np.linspace(0, 1, 3))
array([1.        , 0.87758256, 0.54030231])

By exploiting ufuncs, many operations can be vectorized.

For example, consider the problem of maximizing a function \(f\) of two variables \((x,y)\) over the square \([-a, a] \times [-a, a]\).

For \(f\) and \(a\) let’s choose

\[ f(x,y) = \frac{\cos(x^2 + y^2)}{1 + x^2 + y^2} \quad \text{and} \quad a = 3 \]

Here’s a plot of \(f\)

import matplotlib.pyplot as plt
%matplotlib inline
from mpl_toolkits.mplot3d.axes3d import Axes3D
from matplotlib import cm

def f(x, y):
    return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

xgrid = np.linspace(-3, 3, 50)
ygrid = xgrid
x, y = np.meshgrid(xgrid, ygrid)

fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(x,
                y,
                f(x, y),
                rstride=2, cstride=2,
                cmap=cm.jet,
                alpha=0.7,
                linewidth=0.25)
ax.set_zlim(-0.5, 1.0)
ax.set_xlabel('$x$', fontsize=14)
ax.set_ylabel('$y$', fontsize=14)
plt.show()
_images/need_for_speed_19_0.png

To maximize it, we’re going to use a naive grid search:

  1. Evaluate \(f\) for all \((x,y)\) in a grid on the square.

  2. Return the maximum of observed values.

The grid will be

grid = np.linspace(-3, 3, 1000)

Here’s a non-vectorized version that uses Python loops.

%%time

m = -np.inf

for x in grid:
    for y in grid:
        z = f(x, y)
        if z > m:
            m = z
CPU times: user 4.83 s, sys: 7.88 ms, total: 4.84 s
Wall time: 4.77 s

And here’s a vectorized version

%%time

x, y = np.meshgrid(grid, grid)
np.max(f(x, y))
CPU times: user 30.9 ms, sys: 16 ms, total: 46.9 ms
Wall time: 46.7 ms
0.9999819641085747

In the vectorized version, all the looping takes place in compiled code.

As you can see, the second version is much faster.

(We’ll make it even faster again later on, using more scientific programming tricks.)

8.5. Beyond Vectorization

At its best, vectorization yields fast, simple code.

However, it’s not without disadvantages.

One issue is that it can be highly memory-intensive.

For example, the vectorized maximization routine above is far more memory intensive than the non-vectorized version that preceded it.

This is because vectorization tends to create many intermediate arrays before producing the final calculation.

Another issue is that not all algorithms can be vectorized.

In these kinds of settings, we need to go back to loops.

Fortunately, there are alternative ways to speed up Python loops that work in almost any setting.

For example, in the last few years, a new Python library called Numba has appeared that solves the main problems with vectorization listed above.

It does so through something called just in time (JIT) compilation, which can generate extremely fast and efficient code.

We’ll learn how to use Numba soon.