Python Programming for Economics and Finance
16. More Language Features¶
16.1. Overview¶
With this last lecture, our advice is to skip it on first pass, unless you have a burning desire to read it.
It’s here
as a reference, so we can link back to it when required, and
for those who have worked through a number of applications, and now want to learn more about the Python language
A variety of topics are treated in the lecture, including generators, exceptions and descriptors.
16.2. Iterables and Iterators¶
We’ve already said something about iterating in Python.
Now let’s look more closely at how it all works, focusing on Python’s implementation of the for loop.
16.2.1. Iterators¶
Iterators are a uniform interface to stepping through elements in a collection.
Here we’ll talk about using iterators—later we’ll learn how to build our own.
Formally, an iterator is an object with a __next__ method.
For example, file objects are iterators.
To see this, let’s have another look at the US cities data, which is written to the present working directory in the following cell
%%file us_cities.txt
new york: 8244910
los angeles: 3819702
chicago: 2707120
houston: 2145146
philadelphia: 1536471
phoenix: 1469471
san antonio: 1359758
san diego: 1326179
dallas: 1223229
Writing us_cities.txt
f = open('us_cities.txt')
f.__next__()
'new york: 8244910\n'
f.__next__()
'los angeles: 3819702\n'
We see that file objects do indeed have a __next__ method, and that calling this method returns the next line in the file.
The __next__ method can also be accessed via the built-in function next(), which directly calls this method
next(f)
'chicago: 2707120\n'
The objects returned by enumerate() are also iterators
e = enumerate(['foo', 'bar'])
next(e)
(0, 'foo')
next(e)
(1, 'bar')
as are the reader objects from the csv module.
Let’s create a small csv file that contains data from the NIKKEI index
%%file test_table.csv
Date,Open,High,Low,Close,Volume,Adj Close
2009-05-21,9280.35,9286.35,9189.92,9264.15,133200,9264.15
2009-05-20,9372.72,9399.40,9311.61,9344.64,143200,9344.64
2009-05-19,9172.56,9326.75,9166.97,9290.29,167000,9290.29
2009-05-18,9167.05,9167.82,8997.74,9038.69,147800,9038.69
2009-05-15,9150.21,9272.08,9140.90,9265.02,172000,9265.02
2009-05-14,9212.30,9223.77,9052.41,9093.73,169400,9093.73
2009-05-13,9305.79,9379.47,9278.89,9340.49,176000,9340.49
2009-05-12,9358.25,9389.61,9298.61,9298.61,188400,9298.61
2009-05-11,9460.72,9503.91,9342.75,9451.98,230800,9451.98
2009-05-08,9351.40,9464.43,9349.57,9432.83,220200,9432.83
Writing test_table.csv
from csv import reader
f = open('test_table.csv', 'r')
nikkei_data = reader(f)
next(nikkei_data)
['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']
next(nikkei_data)
['2009-05-21', '9280.35', '9286.35', '9189.92', '9264.15', '133200', '9264.15']
16.2.2. Iterators in For Loops¶
All iterators can be placed to the right of the in keyword in for loop statements.
In fact this is how the for loop works: If we write
for x in iterator:
    <code block>
then the interpreter
calls iterator.__next__() and binds x to the result
executes the code block
repeats until a StopIteration error occurs
So now you know how this magical looking syntax works
f = open('somefile.txt', 'r')
for line in f:
    pass  # do something
The interpreter just keeps
calling f.__next__() and binding line to the result
executing the body of the loop
This continues until a StopIteration error occurs.
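The steps above can be written out by hand. The following is only a sketch of the protocol, not a description of how the interpreter is implemented internally; the names iterator and collected are made up for the illustration.

```python
# Hand-written equivalent of:
#     for x in iterator:
#         collected.append(x)
iterator = enumerate(['foo', 'bar'])   # enumerate objects are iterators
collected = []

while True:
    try:
        x = iterator.__next__()        # bind x to the next element
    except StopIteration:              # raised once nothing is left
        break
    collected.append(x)                # the "code block" of the loop

print(collected)
```

A real for loop hides the try/except bookkeeping, which is why the syntax looks so compact.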
16.2.3. Iterables¶
You already know that we can put a Python list to the right of in in a for loop
for i in ['spam', 'eggs']:
    print(i)
spam
eggs
So does that mean that a list is an iterator?
The answer is no
x = ['foo', 'bar']
type(x)
list
next(x)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-12-92de4e9f6b1e> in <module>
----> 1 next(x)
TypeError: 'list' object is not an iterator
So why can we iterate over a list in a for loop?
The reason is that a list is iterable (as opposed to an iterator).
Formally, an object is iterable if it can be converted to an iterator using the built-in function iter().
Lists are one such object
x = ['foo', 'bar']
type(x)
list
y = iter(x)
type(y)
list_iterator
next(y)
'foo'
next(y)
'bar'
next(y)
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
<ipython-input-17-81b9d2f0f16a> in <module>
----> 1 next(y)
StopIteration:
Many other objects are iterable, such as dictionaries and tuples.
Of course, not all objects are iterable
iter(42)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-18-ef50b48e4398> in <module>
----> 1 iter(42)
TypeError: 'int' object is not iterable
To conclude our discussion of for loops
for loops work on either iterators or iterables.
In the second case, the iterable is converted into an iterator before the loop starts.
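One way to see the difference between the two cases is to call iter() more than once. The sketch below uses made-up names (first, second) and relies only on the behaviour of lists and list iterators.

```python
x = ['foo', 'bar']            # a list is iterable but not an iterator
first = iter(x)
second = iter(x)

# Each call to iter() on a list builds a new, independent iterator
independent = first is not second

# Calling iter() on an iterator hands back that same iterator
same_object = iter(first) is first

next(first)                   # advance only the first iterator
values = (next(first), next(second))

print(independent, same_object, values)
```

Because an iterator returns itself from iter(), a for loop can treat iterators and iterables uniformly: it always calls iter() first and then steps through the result.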
16.2.4. Iterators and built-ins¶
Some built-in functions that act on sequences also work with iterables
max(), min(), sum(), all(), any()
For example
x = [10, -10]
max(x)
10
y = iter(x)
type(y)
list_iterator
max(y)
10
One thing to remember about iterators is that they are depleted by use
x = [10, -10]
y = iter(x)
max(y)
10
max(y)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-23-062424e6ec08> in <module>
----> 1 max(y)
ValueError: max() arg is an empty sequence
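A short sketch of two ways to cope with depletion: ask for a default value when the iterator might be empty, or build a fresh iterator from the underlying list. The variable names are chosen for the illustration only.

```python
x = [10, -10]
y = iter(x)

largest = max(y)                  # consumes every element of y
leftover = list(y)                # nothing remains in y
fallback = max(y, default=None)   # default avoids the ValueError
fresh = max(iter(x))              # a new iterator starts from the beginning

print(largest, leftover, fallback, fresh)
```

The list x itself is never depleted; only the iterator objects created from it are.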
16.3. Names and Name Resolution¶
16.3.1. Variable Names in Python¶
Consider the Python statement
x = 42
We now know that when this statement is executed, Python creates an object of type int in your computer’s memory, containing
the value 42
some associated attributes
But what is x itself?
In Python, x is called a name, and the statement x = 42 binds the name x to the integer object we have just discussed.
Under the hood, this process of binding names to objects is implemented as a dictionary—more about this in a moment.
There is no problem binding two or more names to the one object, regardless of what that object is
def f(string):      # Create a function called f
    print(string)   # that prints any string it's passed
g = f
id(g) == id(f)
True
g('test')
test
In the first step, a function object is created, and the name f is bound to it.
After binding the name g to the same object, we can use it anywhere we would use f.
What happens when the number of names bound to an object goes to zero?
Here’s an example of this situation, where the name x is first bound to one object and then rebound to another
x = 'foo'
id(x)
139985574198576
x = 'bar' # No names bound to the first object
What happens here is that the first object is garbage collected.
In other words, the memory slot that stores that object is deallocated and made available for reuse.
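The following sketch watches an object disappear once no names are bound to it. It uses the standard library weakref module, which lets us observe an object without keeping it alive. The class Box is hypothetical, and the exact moment of deallocation is an implementation detail (CPython frees the object as soon as the last reference goes away; the explicit gc.collect() call is there for other implementations).

```python
import gc
import weakref

class Box:                      # hypothetical placeholder class
    pass

x = Box()
y = x                           # two names bound to one object
ref = weakref.ref(x)            # observes the object without owning it

del x                           # one name removed, y still refers to it
still_alive = ref() is not None

del y                           # no names left
gc.collect()                    # encourage collection on any implementation
freed = ref() is None

print(still_alive, freed)
```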
16.3.2. Namespaces¶
Recall from the preceding discussion that the statement
x = 42
binds the name x to the integer object on the right-hand side.
We also mentioned that this process of binding x to the correct object is implemented as a dictionary.
This dictionary is called a namespace.
Definition: A namespace is a symbol table that maps names to objects in memory.
Python uses multiple namespaces, creating them on the fly as necessary.
For example, every time we import a module, Python creates a namespace for that module.
To see this in action, suppose we write a script math2.py with a single line
%%file math2.py
pi = 'foobar'
Writing math2.py
Now we start the Python interpreter and import it
import math2
Next let’s import the math module from the standard library
import math
Both of these modules have an attribute called pi
math.pi
3.141592653589793
math2.pi
'foobar'
These two different bindings of pi exist in different namespaces, each one implemented as a dictionary.
We can look at the dictionary directly, using module_name.__dict__
import math
math.__dict__.items()
dict_items([('__name__', 'math'), ('__doc__', 'This module provides access to the mathematical functions\ndefined by the C standard.'), ('__package__', ''), ('__loader__', <_frozen_importlib_external.ExtensionFileLoader object at 0x7f50eea771f0>), ('__spec__', ModuleSpec(name='math', loader=<_frozen_importlib_external.ExtensionFileLoader object at 0x7f50eea771f0>, origin='/usr/share/miniconda3/envs/pyecon/lib/python3.8/lib-dynload/math.cpython-38-x86_64-linux-gnu.so')), ('acos', <built-in function acos>), ('acosh', <built-in function acosh>), ('asin', <built-in function asin>), ('asinh', <built-in function asinh>), ('atan', <built-in function atan>), ('atan2', <built-in function atan2>), ('atanh', <built-in function atanh>), ('ceil', <built-in function ceil>), ('copysign', <built-in function copysign>), ('cos', <built-in function cos>), ('cosh', <built-in function cosh>), ('degrees', <built-in function degrees>), ('dist', <built-in function dist>), ('erf', <built-in function erf>), ('erfc', <built-in function erfc>), ('exp', <built-in function exp>), ('expm1', <built-in function expm1>), ('fabs', <built-in function fabs>), ('factorial', <built-in function factorial>), ('floor', <built-in function floor>), ('fmod', <built-in function fmod>), ('frexp', <built-in function frexp>), ('fsum', <built-in function fsum>), ('gamma', <built-in function gamma>), ('gcd', <built-in function gcd>), ('hypot', <built-in function hypot>), ('isclose', <built-in function isclose>), ('isfinite', <built-in function isfinite>), ('isinf', <built-in function isinf>), ('isnan', <built-in function isnan>), ('isqrt', <built-in function isqrt>), ('ldexp', <built-in function ldexp>), ('lgamma', <built-in function lgamma>), ('log', <built-in function log>), ('log1p', <built-in function log1p>), ('log10', <built-in function log10>), ('log2', <built-in function log2>), ('modf', <built-in function modf>), ('pow', <built-in function pow>), ('radians', <built-in function radians>), ('remainder', 
<built-in function remainder>), ('sin', <built-in function sin>), ('sinh', <built-in function sinh>), ('sqrt', <built-in function sqrt>), ('tan', <built-in function tan>), ('tanh', <built-in function tanh>), ('trunc', <built-in function trunc>), ('prod', <built-in function prod>), ('perm', <built-in function perm>), ('comb', <built-in function comb>), ('pi', 3.141592653589793), ('e', 2.718281828459045), ('tau', 6.283185307179586), ('inf', inf), ('nan', nan), ('__file__', '/usr/share/miniconda3/envs/pyecon/lib/python3.8/lib-dynload/math.cpython-38-x86_64-linux-gnu.so')])
import math2
math2.__dict__.items()
dict_items([('__name__', 'math2'), ('__doc__', None), ('__package__', ''), ('__loader__', <_frozen_importlib_external.SourceFileLoader object at 0x7f50e9a02c70>), ('__spec__', ModuleSpec(name='math2', loader=<_frozen_importlib_external.SourceFileLoader object at 0x7f50e9a02c70>, origin='/home/runner/work/python-stats-vi/python-stats-vi/lectures/math2.py')), ('__file__', '/home/runner/work/python-stats-vi/python-stats-vi/lectures/math2.py'), ('__cached__', '/home/runner/work/python-stats-vi/python-stats-vi/lectures/__pycache__/math2.cpython-38.pyc'), ('__builtins__', {'__name__': 'builtins', '__doc__': "Built-in functions, exceptions, and other objects.\n\nNoteworthy: None is the `nil' object; Ellipsis represents `...' in slices.", '__package__': '', '__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__spec__': ModuleSpec(name='builtins', loader=<class '_frozen_importlib.BuiltinImporter'>), '__build_class__': <built-in function __build_class__>, '__import__': <built-in function __import__>, 'abs': <built-in function abs>, 'all': <built-in function all>, 'any': <built-in function any>, 'ascii': <built-in function ascii>, 'bin': <built-in function bin>, 'breakpoint': <built-in function breakpoint>, 'callable': <built-in function callable>, 'chr': <built-in function chr>, 'compile': <built-in function compile>, 'delattr': <built-in function delattr>, 'dir': <built-in function dir>, 'divmod': <built-in function divmod>, 'eval': <built-in function eval>, 'exec': <built-in function exec>, 'format': <built-in function format>, 'getattr': <built-in function getattr>, 'globals': <built-in function globals>, 'hasattr': <built-in function hasattr>, 'hash': <built-in function hash>, 'hex': <built-in function hex>, 'id': <built-in function id>, 'input': <bound method Kernel.raw_input of <ipykernel.ipkernel.IPythonKernel object at 0x7f50ecaadfa0>>, 'isinstance': <built-in function isinstance>, 'issubclass': <built-in function issubclass>, 'iter': <built-in function 
iter>, 'len': <built-in function len>, 'locals': <built-in function locals>, 'max': <built-in function max>, 'min': <built-in function min>, 'next': <built-in function next>, 'oct': <built-in function oct>, 'ord': <built-in function ord>, 'pow': <built-in function pow>, 'print': <built-in function print>, 'repr': <built-in function repr>, 'round': <built-in function round>, 'setattr': <built-in function setattr>, 'sorted': <built-in function sorted>, 'sum': <built-in function sum>, 'vars': <built-in function vars>, 'None': None, 'Ellipsis': Ellipsis, 'NotImplemented': NotImplemented, 'False': False, 'True': True, 'bool': <class 'bool'>, 'memoryview': <class 'memoryview'>, 'bytearray': <class 'bytearray'>, 'bytes': <class 'bytes'>, 'classmethod': <class 'classmethod'>, 'complex': <class 'complex'>, 'dict': <class 'dict'>, 'enumerate': <class 'enumerate'>, 'filter': <class 'filter'>, 'float': <class 'float'>, 'frozenset': <class 'frozenset'>, 'property': <class 'property'>, 'int': <class 'int'>, 'list': <class 'list'>, 'map': <class 'map'>, 'object': <class 'object'>, 'range': <class 'range'>, 'reversed': <class 'reversed'>, 'set': <class 'set'>, 'slice': <class 'slice'>, 'staticmethod': <class 'staticmethod'>, 'str': <class 'str'>, 'super': <class 'super'>, 'tuple': <class 'tuple'>, 'type': <class 'type'>, 'zip': <class 'zip'>, '__debug__': True, 'BaseException': <class 'BaseException'>, 'Exception': <class 'Exception'>, 'TypeError': <class 'TypeError'>, 'StopAsyncIteration': <class 'StopAsyncIteration'>, 'StopIteration': <class 'StopIteration'>, 'GeneratorExit': <class 'GeneratorExit'>, 'SystemExit': <class 'SystemExit'>, 'KeyboardInterrupt': <class 'KeyboardInterrupt'>, 'ImportError': <class 'ImportError'>, 'ModuleNotFoundError': <class 'ModuleNotFoundError'>, 'OSError': <class 'OSError'>, 'EnvironmentError': <class 'OSError'>, 'IOError': <class 'OSError'>, 'EOFError': <class 'EOFError'>, 'RuntimeError': <class 'RuntimeError'>, 'RecursionError': <class 
'RecursionError'>, 'NotImplementedError': <class 'NotImplementedError'>, 'NameError': <class 'NameError'>, 'UnboundLocalError': <class 'UnboundLocalError'>, 'AttributeError': <class 'AttributeError'>, 'SyntaxError': <class 'SyntaxError'>, 'IndentationError': <class 'IndentationError'>, 'TabError': <class 'TabError'>, 'LookupError': <class 'LookupError'>, 'IndexError': <class 'IndexError'>, 'KeyError': <class 'KeyError'>, 'ValueError': <class 'ValueError'>, 'UnicodeError': <class 'UnicodeError'>, 'UnicodeEncodeError': <class 'UnicodeEncodeError'>, 'UnicodeDecodeError': <class 'UnicodeDecodeError'>, 'UnicodeTranslateError': <class 'UnicodeTranslateError'>, 'AssertionError': <class 'AssertionError'>, 'ArithmeticError': <class 'ArithmeticError'>, 'FloatingPointError': <class 'FloatingPointError'>, 'OverflowError': <class 'OverflowError'>, 'ZeroDivisionError': <class 'ZeroDivisionError'>, 'SystemError': <class 'SystemError'>, 'ReferenceError': <class 'ReferenceError'>, 'MemoryError': <class 'MemoryError'>, 'BufferError': <class 'BufferError'>, 'Warning': <class 'Warning'>, 'UserWarning': <class 'UserWarning'>, 'DeprecationWarning': <class 'DeprecationWarning'>, 'PendingDeprecationWarning': <class 'PendingDeprecationWarning'>, 'SyntaxWarning': <class 'SyntaxWarning'>, 'RuntimeWarning': <class 'RuntimeWarning'>, 'FutureWarning': <class 'FutureWarning'>, 'ImportWarning': <class 'ImportWarning'>, 'UnicodeWarning': <class 'UnicodeWarning'>, 'BytesWarning': <class 'BytesWarning'>, 'ResourceWarning': <class 'ResourceWarning'>, 'ConnectionError': <class 'ConnectionError'>, 'BlockingIOError': <class 'BlockingIOError'>, 'BrokenPipeError': <class 'BrokenPipeError'>, 'ChildProcessError': <class 'ChildProcessError'>, 'ConnectionAbortedError': <class 'ConnectionAbortedError'>, 'ConnectionRefusedError': <class 'ConnectionRefusedError'>, 'ConnectionResetError': <class 'ConnectionResetError'>, 'FileExistsError': <class 'FileExistsError'>, 'FileNotFoundError': <class 
'FileNotFoundError'>, 'IsADirectoryError': <class 'IsADirectoryError'>, 'NotADirectoryError': <class 'NotADirectoryError'>, 'InterruptedError': <class 'InterruptedError'>, 'PermissionError': <class 'PermissionError'>, 'ProcessLookupError': <class 'ProcessLookupError'>, 'TimeoutError': <class 'TimeoutError'>, 'open': <built-in function open>, 'copyright': Copyright (c) 2001-2020 Python Software Foundation.
All Rights Reserved.
Copyright (c) 2000 BeOpen.com.
All Rights Reserved.
Copyright (c) 1995-2001 Corporation for National Research Initiatives.
All Rights Reserved.
Copyright (c) 1991-1995 Stichting Mathematisch Centrum, Amsterdam.
All Rights Reserved., 'credits': Thanks to CWI, CNRI, BeOpen.com, Zope Corporation and a cast of thousands
for supporting Python development. See www.python.org for more information., 'license': Type license() to see the full license text, 'help': Type help() for interactive help, or help(object) for help about object., '__IPYTHON__': True, 'display': <function display at 0x7f50ee1bc3a0>, 'get_ipython': <bound method InteractiveShell.get_ipython of <ipykernel.zmqshell.ZMQInteractiveShell object at 0x7f50ecac4cd0>>}), ('pi', 'foobar')])
As you know, we access elements of the namespace using the dotted attribute notation
math.pi
3.141592653589793
In fact this is entirely equivalent to math.__dict__['pi']
math.__dict__['pi'] == math.pi
True
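To make the link between attributes and the namespace dictionary concrete, here is a small sketch that builds an empty module object in memory with types.ModuleType. The module name demo and its attributes are invented for the example.

```python
import types

demo = types.ModuleType('demo')    # hypothetical, empty module object

demo.pi = 'foobar'                 # dotted assignment ...
in_dict = demo.__dict__['pi']      # ... writes into the namespace dictionary

demo.__dict__['e'] = 2.718         # writing to the dictionary ...
via_dot = demo.e                   # ... is visible through dotted access

print(in_dict, via_dot)
```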
16.3.3. Viewing Namespaces¶
As we saw above, the math namespace can be printed by typing math.__dict__.
Another way to see its contents is to type vars(math)
vars(math).items()
dict_items([('__name__', 'math'), ('__doc__', 'This module provides access to the mathematical functions\ndefined by the C standard.'), ('__package__', ''), ('__loader__', <_frozen_importlib_external.ExtensionFileLoader object at 0x7f50eea771f0>), ('__spec__', ModuleSpec(name='math', loader=<_frozen_importlib_external.ExtensionFileLoader object at 0x7f50eea771f0>, origin='/usr/share/miniconda3/envs/pyecon/lib/python3.8/lib-dynload/math.cpython-38-x86_64-linux-gnu.so')), ('acos', <built-in function acos>), ('acosh', <built-in function acosh>), ('asin', <built-in function asin>), ('asinh', <built-in function asinh>), ('atan', <built-in function atan>), ('atan2', <built-in function atan2>), ('atanh', <built-in function atanh>), ('ceil', <built-in function ceil>), ('copysign', <built-in function copysign>), ('cos', <built-in function cos>), ('cosh', <built-in function cosh>), ('degrees', <built-in function degrees>), ('dist', <built-in function dist>), ('erf', <built-in function erf>), ('erfc', <built-in function erfc>), ('exp', <built-in function exp>), ('expm1', <built-in function expm1>), ('fabs', <built-in function fabs>), ('factorial', <built-in function factorial>), ('floor', <built-in function floor>), ('fmod', <built-in function fmod>), ('frexp', <built-in function frexp>), ('fsum', <built-in function fsum>), ('gamma', <built-in function gamma>), ('gcd', <built-in function gcd>), ('hypot', <built-in function hypot>), ('isclose', <built-in function isclose>), ('isfinite', <built-in function isfinite>), ('isinf', <built-in function isinf>), ('isnan', <built-in function isnan>), ('isqrt', <built-in function isqrt>), ('ldexp', <built-in function ldexp>), ('lgamma', <built-in function lgamma>), ('log', <built-in function log>), ('log1p', <built-in function log1p>), ('log10', <built-in function log10>), ('log2', <built-in function log2>), ('modf', <built-in function modf>), ('pow', <built-in function pow>), ('radians', <built-in function radians>), ('remainder', 
<built-in function remainder>), ('sin', <built-in function sin>), ('sinh', <built-in function sinh>), ('sqrt', <built-in function sqrt>), ('tan', <built-in function tan>), ('tanh', <built-in function tanh>), ('trunc', <built-in function trunc>), ('prod', <built-in function prod>), ('perm', <built-in function perm>), ('comb', <built-in function comb>), ('pi', 3.141592653589793), ('e', 2.718281828459045), ('tau', 6.283185307179586), ('inf', inf), ('nan', nan), ('__file__', '/usr/share/miniconda3/envs/pyecon/lib/python3.8/lib-dynload/math.cpython-38-x86_64-linux-gnu.so')])
If you just want to see the names, you can type
dir(math)[0:10]
['__doc__',
'__file__',
'__loader__',
'__name__',
'__package__',
'__spec__',
'acos',
'acosh',
'asin',
'asinh']
Notice the special names __doc__ and __name__.
These are initialized in the namespace when any module is imported
__doc__ is the doc string of the module
__name__ is the name of the module
print(math.__doc__)
This module provides access to the mathematical functions
defined by the C standard.
math.__name__
'math'
16.3.4. Interactive Sessions¶
In Python, all code executed by the interpreter runs in some module.
What about commands typed at the prompt?
These are also regarded as being executed within a module, in this case a module called __main__.
To check this, we can look at the current module name via the value of __name__ given at the prompt
print(__name__)
__main__
When we run a script using IPython’s run command, the contents of the file are executed as part of __main__ too.
To see this, let’s create a file mod.py that prints its own __name__ attribute
%%file mod.py
print(__name__)
Writing mod.py
Now let’s look at two different ways of running it in IPython
import mod # Standard import
mod
%run mod.py # Run interactively
__main__
In the second case, the code is executed as part of __main__, so __name__ is equal to __main__.
To see the contents of the namespace of __main__ we use vars() rather than vars(__main__).
If you do this in IPython, you will see a whole lot of variables that IPython needs, and has initialized when you started up your session.
If you prefer to see only the variables you have initialized, use whos
x = 2
y = 3
import numpy as np
%whos
Variable      Type                          Data/Info
-----------------------------------------------------
e             enumerate                     <enumerate object at 0x7f50e9192e40>
f             function                      <function f at 0x7f50e99da820>
g             function                      <function f at 0x7f50e99da820>
i             str                           eggs
math          module                        <module 'math' from '/usr<...>-38-x86_64-linux-gnu.so'>
math2         module                        <module 'math2' from '/ho<...>ts-vi/lectures/math2.py'>
mod           module                        <module 'mod' from '/home<...>tats-vi/lectures/mod.py'>
nikkei_data   reader                        <_csv.reader object at 0x7f50e9a13ac0>
np            module                        <module 'numpy' from '/us<...>kages/numpy/__init__.py'>
reader        builtin_function_or_method    <built-in function reader>
x             int                           2
y             int                           3
16.3.5. The Global Namespace¶
Python documentation often makes reference to the “global namespace”.
The global namespace is the namespace of the module currently being executed.
For example, suppose that we start the interpreter and begin making assignments.
We are now working in the module __main__, and hence the namespace for __main__ is the global namespace.
Next, we import a module called amodule
import amodule
At this point, the interpreter creates a namespace for the module amodule and starts executing commands in the module.
While this occurs, the namespace amodule.__dict__ is the global namespace.
Once execution of the module finishes, the interpreter returns to the module from where the import statement was made.
In this case it’s __main__, so the namespace of __main__ again becomes the global namespace.
16.3.6. Local Namespaces¶
Important fact: When we call a function, the interpreter creates a local namespace for that function, and registers the variables in that namespace.
The reason for this will be explained in just a moment.
Variables in the local namespace are called local variables.
After the function returns, the namespace is deallocated and lost.
While the function is executing, we can view the contents of the local namespace with locals().
For example, consider
def f(x):
    a = 2
    print(locals())
    return a * x
Now let’s call the function
f(1)
{'x': 1, 'a': 2}
2
You can see the local namespace of f before it is destroyed.
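A brief sketch of two consequences: every call gets its own fresh local namespace, and local names are not visible once the function has returned. The local name scratch is invented for this example, and locals().copy() is used to keep a snapshot of the namespace after the call ends.

```python
def f(x):
    scratch = 2
    return locals().copy()     # snapshot of the local namespace

first = f(1)                   # namespace created for this call only
second = f(5)                  # a separate namespace for the second call

try:
    scratch                    # the local name never reached module level
    leaked = True
except NameError:
    leaked = False

print(first, second, leaked)
```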
16.3.7. The __builtins__ Namespace¶
We have been using various built-in functions, such as max(), dir(), str(), list(), len(), range(), type(), etc.
How does access to these names work?
These definitions are stored in a module called builtins (named __builtin__ in Python 2).
They have their own namespace called __builtins__.
dir()[0:10]
['In', 'Out', '_', '_11', '_13', '_14', '_15', '_16', '_19', '_2']
dir(__builtins__)[0:10]
['ArithmeticError',
'AssertionError',
'AttributeError',
'BaseException',
'BlockingIOError',
'BrokenPipeError',
'BufferError',
'BytesWarning',
'ChildProcessError',
'ConnectionAbortedError']
We can access elements of the namespace as follows
__builtins__.max
<function max>
But __builtins__ is special, because we can always access its contents directly as well
max
<function max>
__builtins__.max == max
True
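The same namespace can be reached in a more portable way by importing the standard library module builtins. The sketch below assumes nothing beyond that module; note that the value of __builtins__ itself is a CPython implementation detail and may be either the module or its dictionary depending on where the code runs.

```python
import builtins

same_function = builtins.max is max     # both names refer to one object
has_len = 'len' in vars(builtins)       # the namespace is an ordinary dict

print(same_function, has_len)
```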
The next section explains how this works …
16.3.8. Name Resolution¶
Namespaces are great because they help us organize variable names.
(Type import this at the prompt and look at the last item that’s printed)
However, we do need to understand how the Python interpreter works with multiple namespaces.
At any point of execution, there are in fact at least two namespaces that can be accessed directly.
(“Accessed directly” means without using a dot, as in pi rather than math.pi)
These namespaces are
The global namespace (of the module being executed)
The builtin namespace
If the interpreter is executing a function, then the directly accessible namespaces are
The local namespace of the function
The global namespace (of the module being executed)
The builtin namespace
Sometimes functions are defined within other functions, like so
def f():
    a = 2
    def g():
        b = 4
        print(a * b)
    g()
Here f is the enclosing function for g, and each function gets its own namespace.
Now we can give the rule for how namespace resolution works:
The order in which the interpreter searches for names is
the local namespace (if it exists)
the hierarchy of enclosing namespaces (if they exist)
the global namespace
the builtin namespace
If the name is not in any of these namespaces, the interpreter raises a NameError.
This is called the LEGB rule (local, enclosing, global, builtin).
Here’s an example that helps to illustrate.
Consider a script test.py that looks as follows
%%file test.py
def g(x):
    a = 1
    x = x + a
    return x

a = 0
y = g(10)
print("a = ", a, "y = ", y)
Writing test.py
What happens when we run this script?
%run test.py
a = 0 y = 11
x
2
First,
The global namespace {} is created.
The function object is created, and g is bound to it within the global namespace.
The name a is bound to 0, again in the global namespace.
Next g is called via y = g(10), leading to the following sequence of actions
The local namespace for the function is created.
Local names x and a are bound, so that the local namespace becomes {'x': 10, 'a': 1}.
Statement x = x + a uses the local a and local x to compute x + a, and binds local name x to the result.
This value is returned, and y is bound to it in the global namespace.
Local x and a are discarded (and the local namespace is deallocated).
Note that the global a was not affected by the local a.
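The LEGB search order can also be seen in one compact sketch, where the same name label is defined at several levels. All function and variable names here are invented for the illustration.

```python
label = 'global'

def outer():
    label = 'enclosing'

    def inner_local():
        label = 'local'
        return label           # found in the local namespace

    def inner_enclosing():
        return label           # not local, so found in outer's namespace

    return inner_local(), inner_enclosing()

def uses_global():
    return label               # no local or enclosing binding: global

def uses_builtin():
    return len('abc')          # len is found only in the builtin namespace

results = outer() + (uses_global(), uses_builtin())
print(results)
```

Each lookup stops at the first namespace that contains the name, which is why the inner binding of label hides the outer ones.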
16.3.9. Mutable Versus Immutable Parameters¶
This is a good time to say a little more about mutable vs immutable objects.
Consider the code segment
def f(x):
    x = x + 1
    return x

x = 1
print(f(x), x)
2 1
We now understand what will happen here: The code prints 2 as the value of f(x) and 1 as the value of x.
First f and x are registered in the global namespace.
The call f(x) creates a local namespace and adds x to it, bound to 1.
Next, this local x is rebound to the new integer object 2, and this value is returned.
None of this affects the global x.
However, it’s a different story when we use a mutable data type such as a list
def f(x):
    x[0] = x[0] + 1
    return x

x = [1]
print(f(x), x)
[2] [2]
This prints [2] as the value of f(x) and same for x.
Here’s what happens
f is registered as a function in the global namespace
x bound to [1] in the global namespace
The call f(x)
    Creates a local namespace
    Adds x to local namespace, bound to [1] (the same list object that the global x refers to)
    The list [1] is modified to [2]
    Returns the list [2]
    The local namespace is deallocated, and local x is lost
Global x has been modified
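If modifying the caller's list is not what we want, one common option is to copy the argument inside the function. The sketch below contrasts the two behaviours; the function names are made up, and list(x) makes a shallow copy only.

```python
def increment_in_place(x):
    x[0] = x[0] + 1        # modifies the list the caller passed in
    return x

def increment_copy(x):
    x = list(x)            # rebind local x to a new list (shallow copy)
    x[0] = x[0] + 1        # the caller's list is untouched
    return x

a = [1]
b = increment_in_place(a)

c = [1]
d = increment_copy(c)

print(a, b, c, d)
```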
16.4. Handling Errors¶
Sometimes it’s possible to anticipate errors as we’re writing code.
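For example, the unbiased sample variance of sample \(y_1, \ldots, y_n\) is defined as
\[ s^2_n := \frac{1}{n-1} \sum_{i=1}^n (y_i - \bar y)^2, \qquad \bar y = \text{ sample mean} \]
This can be calculated in NumPy using np.var (pass ddof=1 to get the \(n-1\) denominator, since the default ddof=0 divides by \(n\)).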
But if you were writing a function to handle such a calculation, you might anticipate a divide-by-zero error when the sample size is one.
One possible action is to do nothing — the program will just crash, and spit out an error message.
But sometimes it’s worth writing your code in a way that anticipates and deals with runtime errors that you think might arise.
Why?
Because the debugging information provided by the interpreter is often less useful than the information on possible errors you have in your head when writing code.
Because errors causing execution to stop are frustrating if you’re in the middle of a large computation.
Because it reduces confidence in your code on the part of your users (if you are writing for others).
16.4.1. Assertions¶
A relatively easy way to handle checks is with the assert keyword.
For example, pretend for a moment that the np.var function doesn’t exist and we need to write our own
def var(y):
    n = len(y)
    assert n > 1, 'Sample size must be greater than one.'
    return np.sum((y - y.mean())**2) / float(n-1)
If we run this with an array of length one, the program will terminate and print our error message
var([1])
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-62-8419b6ab38ec> in <module>
----> 1 var([1])
<ipython-input-61-e6ffb16a7098> in var(y)
1 def var(y):
2 n = len(y)
----> 3 assert n > 1, 'Sample size must be greater than one.'
4 return np.sum((y - y.mean())**2) / float(n-1)
AssertionError: Sample size must be greater than one.
The advantage is that we can
fail early, as soon as we know there will be a problem
supply specific information on why a program is failing
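One caveat is that assert statements are skipped when Python is run with the -O flag, so checks on user input are often written with an explicit raise instead. The following is a sketch of that alternative in pure Python; the name sample_var and the data are invented, and the result is compared against statistics.variance from the standard library, which also uses the n - 1 denominator.

```python
import math
import statistics

def sample_var(y):
    """Unbiased sample variance of a sequence of numbers."""
    n = len(y)
    if n < 2:
        # Fail early with a specific message, even under python -O
        raise ValueError('Sample size must be greater than one.')
    y_bar = sum(y) / n
    return sum((y_i - y_bar) ** 2 for y_i in y) / (n - 1)

data = [2.0, 4.0, 4.0, 6.0]       # made-up sample with mean 4
result = sample_var(data)         # squared deviations sum to 8, so 8 / 3

print(result, math.isclose(result, statistics.variance(data)))
```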
16.4.2. Handling Errors During Runtime¶
The approach used above is a bit limited, because it always leads to termination.
Sometimes we can handle errors more gracefully, by treating special cases.
Let’s look at how this is done.
16.4.2.1. Exceptions¶
Here’s an example of a common error type
def f:
File "<ipython-input-63-262a7e387ba5>", line 1
def f:
^
SyntaxError: invalid syntax
Since illegal syntax cannot be executed, a syntax error terminates execution of the program.
Here’s a different kind of error, unrelated to syntax
1 / 0
---------------------------------------------------------------------------
ZeroDivisionError Traceback (most recent call last)
<ipython-input-64-bc757c3fda29> in <module>
----> 1 1 / 0
ZeroDivisionError: division by zero
Here’s another
x1 = y1
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-65-a7b8d65e9e45> in <module>
----> 1 x1 = y1
NameError: name 'y1' is not defined
And another
'foo' + 6
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-66-216809d6e6fe> in <module>
----> 1 'foo' + 6
TypeError: can only concatenate str (not "int") to str
And another
X = []
x = X[0]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-67-082a18d7a0aa> in <module>
1 X = []
----> 2 x = X[0]
IndexError: list index out of range
On each occasion, the interpreter informs us of the error type: NameError, TypeError, IndexError, ZeroDivisionError, etc.
In Python, these errors are called exceptions.
16.4.2.2. Catching Exceptions¶
We can catch and deal with exceptions using try – except blocks.
Here’s a simple example
def f(x):
    try:
        return 1.0 / x
    except ZeroDivisionError:
        print('Error: division by zero. Returned None')
    return None
When we call f we get the following output
f(2)
0.5
f(0)
Error: division by zero. Returned None
f(0.0)
Error: division by zero. Returned None
The error is caught and execution of the program is not terminated.
Note that other error types are not caught.
If we are worried the user might pass in a string, we can catch that error too
def f(x):
    try:
        return 1.0 / x
    except ZeroDivisionError:
        print('Error: Division by zero. Returned None')
    except TypeError:
        print('Error: Unsupported operation. Returned None')
    return None
Here’s what happens
f(2)
0.5
f(0)
Error: Division by zero. Returned None
f('foo')
Error: Unsupported operation. Returned None
If we feel lazy we can catch these errors together
def f(x):
    try:
        return 1.0 / x
    except (TypeError, ZeroDivisionError):
        print('Error: Unsupported operation. Returned None')
    return None
Here’s what happens
f(2)
0.5
f(0)
Error: Unsupported operation. Returned None
f('foo')
Error: Unsupported operation. Returned None
If we feel extra lazy we can catch all error types as follows
def f(x):
    try:
        return 1.0 / x
    except:
        print('Error. Returned None')
    return None
In general it’s better to be specific.
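To see why, here is a minimal sketch contrasting the two styles. The function names careful and careless, and the deliberately misspelled variable, are hypothetical and not part of the lecture's code.

```python
def careful(x):
    """Catch only the errors we expect, and report which one occurred."""
    try:
        return 1.0 / x
    except (ZeroDivisionError, TypeError) as err:
        # err is the exception instance; type(err).__name__ is its class name
        print(f'Error ({type(err).__name__}): {err}. Returned None')
        return None


def careless(x):
    """A bare except also swallows bugs that have nothing to do with x."""
    try:
        return 1.0 / x_misspelled   # NameError: this name does not exist
    except:
        return None


print(careful(4))      # a valid input
print(careful('foo'))  # reports a TypeError and returns None
print(careless(4))     # the NameError is silently hidden
```

Because careless returns None even for a valid input, the typo goes unnoticed, which is the kind of problem a specific except clause would bring to the surface.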
16.5. Decorators and Descriptors¶
Let’s look at some special syntax elements that are routinely used by Python developers.
You might not need the following concepts immediately, but you will see them in other people’s code.
Hence you need to understand them at some stage of your Python education.
16.5.1. Decorators¶
Decorators are a bit of syntactic sugar that, while easily avoided, have turned out to be popular.
It’s very easy to say what decorators do.
On the other hand it takes a bit of effort to explain why you might use them.
16.5.1.1. An Example¶
Suppose we are working on a program that looks something like this
import numpy as np
def f(x):
    return np.log(np.log(x))

def g(x):
    return np.sqrt(42 * x)
# Program continues with various calculations using f and g
Now suppose there’s a problem: occasionally negative numbers get fed to f and g in the calculations that follow.
If you try it, you’ll see that when these functions are called with negative numbers they return a NumPy object called nan.
This stands for “not a number” (and indicates that you are trying to evaluate a mathematical function at a point where it is not defined).
Perhaps this isn’t what we want, because it causes other problems that are hard to pick up later on.
Suppose that instead we want the program to terminate whenever this happens, with a sensible error message.
This change is easy enough to implement
import numpy as np
def f(x):
    assert x >= 0, "Argument must be nonnegative"
    return np.log(np.log(x))

def g(x):
    assert x >= 0, "Argument must be nonnegative"
    return np.sqrt(42 * x)
# Program continues with various calculations using f and g
Notice however that there is some repetition here, in the form of two identical lines of code.
Repetition makes our code longer and harder to maintain, and hence is something we try hard to avoid.
Here it’s not a big deal, but imagine now that instead of just f and g, we have 20 such functions that we need to modify in exactly the same way.
This means we need to repeat the test logic (i.e., the assert line testing nonnegativity) 20 times.
The situation is still worse if the test logic is longer and more complicated.
In this kind of scenario the following approach would be neater
import numpy as np
def check_nonneg(func):
    def safe_function(x):
        assert x >= 0, "Argument must be nonnegative"
        return func(x)
    return safe_function

def f(x):
    return np.log(np.log(x))

def g(x):
    return np.sqrt(42 * x)

f = check_nonneg(f)
g = check_nonneg(g)
# Program continues with various calculations using f and g
This looks complicated so let’s work through it slowly.
To unravel the logic, consider what happens when we say f = check_nonneg(f).
This calls the function check_nonneg with parameter func set equal to f.
Now check_nonneg creates a new function called safe_function that verifies x as nonnegative and then calls func on it (which is the same as f).
Finally, the global name f is then set equal to safe_function.
Now the behavior of f is as we desire, and the same is true of g.
At the same time, the test logic is written only once.
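The steps above can be checked directly. The sketch below is self-contained and, as an assumption made only to avoid the NumPy dependency, uses math.sqrt in place of np.sqrt. The name original_g is ours.

```python
import math

def check_nonneg(func):
    def safe_function(x):
        assert x >= 0, "Argument must be nonnegative"
        return func(x)
    return safe_function

def g(x):
    return math.sqrt(42 * x)

original_g = g            # keep a reference to the unwrapped function
g = check_nonneg(g)       # rebind the global name g to safe_function

print(g.__name__)              # the name g now refers to safe_function
print(g(2) == original_g(2))   # valid inputs are passed through unchanged

try:
    g(-1)
except AssertionError as err:
    print('AssertionError:', err)
```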
16.5.1.2. Enter Decorators¶
The last version of our code is still not ideal.
For example, if someone is reading our code and wants to know how f works, they will be looking for the function definition, which is
def f(x):
    return np.log(np.log(x))
They may well miss the line f = check_nonneg(f).
For this and other reasons, decorators were introduced to Python.
With decorators, we can replace the lines
def f(x):
    return np.log(np.log(x))

def g(x):
    return np.sqrt(42 * x)

f = check_nonneg(f)
g = check_nonneg(g)
with
@check_nonneg
def f(x):
    return np.log(np.log(x))

@check_nonneg
def g(x):
    return np.sqrt(42 * x)
These two pieces of code do exactly the same thing.
If they do the same thing, do we really need decorator syntax?
Well, notice that the decorators sit right on top of the function definitions.
Hence anyone looking at the definition of the function will see them and be aware that the function is modified.
In the opinion of many people, this makes the decorator syntax a significant improvement to the language.
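Here is a minimal sketch of the decorator form in use. The functools.wraps line is an optional refinement that is not in the code above: it copies the original function's name and docstring onto the wrapper. As an assumption made to keep the sketch dependency-free, math.sqrt stands in for np.sqrt.

```python
import functools
import math

def check_nonneg(func):
    @functools.wraps(func)          # optional: preserve func's name and docstring
    def safe_function(x):
        assert x >= 0, "Argument must be nonnegative"
        return func(x)
    return safe_function

@check_nonneg                       # same effect as g = check_nonneg(g)
def g(x):
    """Square root of 42 times x."""
    return math.sqrt(42 * x)

print(g.__name__)    # 'g' rather than 'safe_function', thanks to functools.wraps
print(g(0))          # 0.0
```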
16.5.2. Descriptors¶
Descriptors solve a common problem regarding management of variables.
To understand the issue, consider a Car class that simulates a car.
Suppose that this class defines the variables miles and kms, which give the distance traveled in miles and kilometers respectively.
A highly simplified version of the class might look as follows
class Car:
    def __init__(self, miles=1000):
        self.miles = miles
        self.kms = miles * 1.61

    # Some other functionality, details omitted
One potential problem we might have here is that a user alters one of these variables but not the other
car = Car()
car.miles
1000
car.kms
1610.0
car.miles = 6000
car.kms
1610.0
In the last two lines we see that miles and kms are out of sync.
What we really want is some mechanism whereby each time a user sets one of these variables, the other is automatically updated.
16.5.2.1. A Solution¶
In Python, this issue is solved using descriptors.
A descriptor is just a Python object that implements certain methods.
These methods are triggered when the object is accessed through dotted attribute notation.
The best way to understand this is to see it in action.
Consider this alternative version of the Car class
class Car:
    def __init__(self, miles=1000):
        self._miles = miles
        self._kms = miles * 1.61

    def set_miles(self, value):
        self._miles = value
        self._kms = value * 1.61

    def set_kms(self, value):
        self._kms = value
        self._miles = value / 1.61

    def get_miles(self):
        return self._miles

    def get_kms(self):
        return self._kms

    miles = property(get_miles, set_miles)
    kms = property(get_kms, set_kms)
First let’s check that we get the desired behavior
car = Car()
car.miles
1000
car.miles = 6000
car.kms
9660.0
Yep, that’s what we want: car.kms is automatically updated.
16.5.2.2. How it Works¶
The names _miles and _kms are arbitrary names we are using to store the values of the variables.
The objects miles and kms are properties, a common kind of descriptor.
The methods get_miles, set_miles, get_kms and set_kms define what happens when you get (i.e. access) or set (bind) these variables.
These are the so-called “getter” and “setter” methods.
The builtin Python function property takes getter and setter methods and creates a property.
For example, after car is created as an instance of Car, the attribute car.miles is managed by the property Car.miles.
Because of this, when we set its value via car.miles = 6000 the setter method is triggered, in this case set_miles.
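To make the link between properties and descriptors concrete, here is a rough sketch of a hand-written descriptor that does what the miles property does. The class name MilesDescriptor is hypothetical; __get__ and __set__ are the standard descriptor protocol methods. To keep the sketch short, kms is read-only here.

```python
class MilesDescriptor:
    """Hypothetical descriptor that keeps _miles and _kms in sync."""

    def __get__(self, instance, owner=None):
        if instance is None:        # accessed on the class, e.g. Car.miles
            return self
        return instance._miles

    def __set__(self, instance, value):
        instance._miles = value
        instance._kms = value * 1.61


class Car:
    miles = MilesDescriptor()       # lives on the class, not on instances

    def __init__(self, miles=1000):
        self.miles = miles          # routed through MilesDescriptor.__set__

    @property
    def kms(self):                  # read-only, to keep the sketch short
        return self._kms


car = Car()
car.miles = 6000                    # triggers __set__, which also updates _kms
print(car.miles, car.kms)
```

The builtin property does essentially this bookkeeping for us, which is why it is usually the more convenient choice.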
16.5.2.3. Decorators and Properties¶
These days it’s very common to see the property function used via a decorator.
Here’s another version of our Car class that works as before but now uses decorators to set up the properties
class Car:
    def __init__(self, miles=1000):
        self._miles = miles
        self._kms = miles * 1.61

    @property
    def miles(self):
        return self._miles

    @property
    def kms(self):
        return self._kms

    @miles.setter
    def miles(self, value):
        self._miles = value
        self._kms = value * 1.61

    @kms.setter
    def kms(self, value):
        self._kms = value
        self._miles = value / 1.61
We won’t go through all the details here.
For further information you can refer to the descriptor documentation.
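One detail worth knowing when using the decorator form is that a property defined without a setter is read-only. Here is a minimal sketch, using a hypothetical Trip class rather than the Car class above:

```python
class Trip:
    def __init__(self, miles):
        self._miles = miles

    @property
    def kms(self):
        # Computed on access, so it can never be out of sync with _miles
        return self._miles * 1.61


trip = Trip(100)
print(trip.kms)

try:
    trip.kms = 500          # no setter was defined for kms
except AttributeError as err:
    print('AttributeError:', err)
```

Computing kms from a single stored value is another way to avoid the synchronization problem, at the cost of not being able to set kms directly.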
16.6. Generators¶
A generator is a kind of iterator (i.e., it works with a next function).
We will study two ways to build generators: generator expressions and generator functions.
16.6.1. Generator Expressions¶
The easiest way to build generators is using generator expressions.
The syntax is just like that of a list comprehension, but with round brackets.
Here is the list comprehension:
singular = ('dog', 'cat', 'bird')
type(singular)
tuple
plural = [string + 's' for string in singular]
plural
['dogs', 'cats', 'birds']
type(plural)
list
And here is the generator expression
singular = ('dog', 'cat', 'bird')
plural = (string + 's' for string in singular)
type(plural)
generator
next(plural)
'dogs'
next(plural)
'cats'
next(plural)
'birds'
Since sum() can be called on iterators, we can do this
sum((x * x for x in range(10)))
285
The function sum() calls next() to get the items and adds successive terms.
In fact, we can omit the outer brackets in this case
sum(x * x for x in range(10))
285
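One practical difference from lists is that a generator can be consumed only once. A short sketch (the variable names are ours):

```python
squares = (x * x for x in range(10))

first_total = sum(squares)     # consumes every item
second_total = sum(squares)    # the generator is now exhausted

print(first_total)    # 285
print(second_total)   # 0
```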
16.6.2. Generator Functions¶
The most flexible way to create generator objects is to use generator functions.
Let’s look at some examples.
16.6.2.1. Example 1¶
Here’s a very simple example of a generator function
def f():
    yield 'start'
    yield 'middle'
    yield 'end'
It looks like a function, but uses a keyword yield that we haven’t met before.
Let’s see how it works after running this code
type(f)
function
gen = f()
gen
<generator object f at 0x7f50c8b96890>
next(gen)
'start'
next(gen)
'middle'
next(gen)
'end'
next(gen)
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
<ipython-input-110-6e72e47198db> in <module>
----> 1 next(gen)
StopIteration:
The generator function f() is used to create generator objects (in this case gen).
Generators are iterators, because they support a __next__ method.
The first call to next(gen)
executes code in the body of f() until it meets a yield statement, and
returns that value to the caller of next(gen).
The second call to next(gen) starts executing from the next line
def f():
    yield 'start'
    yield 'middle'  # This line!
    yield 'end'
and continues until the next yield statement.
At that point it returns the value following yield to the caller of next(gen), and so on.
When the code block ends, the generator throws a StopIteration error.
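The pause-and-resume behavior described above can be made visible by adding print calls to the generator body. This is a sketch with a hypothetical function traced; note that a for loop handles StopIteration for us.

```python
def traced():
    print('  body: running up to the first yield')
    yield 'start'
    print('  body: resumed, running up to the second yield')
    yield 'end'
    print('  body: resumed, reaching the end of the function')


gen = traced()            # nothing in the body has run yet
print(next(gen))          # runs the body up to the first yield
print(next(gen))          # resumes just after the first yield

# A for loop calls next() repeatedly and stops cleanly on StopIteration
collected = []
for item in traced():
    collected.append(item)
print(collected)
```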
16.6.2.2. Example 2¶
Our next example receives an argument x from the caller
def g(x):
    while x < 100:
        yield x
        x = x * x
Let’s see how it works
g
<function __main__.g(x)>
gen = g(2)
type(gen)
generator
next(gen)
2
next(gen)
4
next(gen)
16
next(gen)
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
<ipython-input-118-6e72e47198db> in <module>
----> 1 next(gen)
StopIteration:
The call gen = g(2) binds gen to a generator.
Inside the generator, the name x is bound to 2.
When we call next(gen), the body of g() executes until the line yield x, and the value of x is returned.
Note that the value of x is retained inside the generator.
When we call next(gen) again, execution continues from where it left off
def g(x):
    while x < 100:
        yield x
        x = x * x  # execution continues from here
When x < 100 fails, the generator throws a StopIteration error.
Incidentally, the loop inside the generator can be infinite
def g(x):
    while True:
        yield x
        x = x * x
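Since such a generator never raises StopIteration, the caller decides when to stop. A minimal sketch using itertools.islice from the standard library to take the first five values (the name first_five is ours):

```python
from itertools import islice

def g(x):
    while True:
        yield x
        x = x * x

first_five = list(islice(g(2), 5))
print(first_five)    # [2, 4, 16, 256, 65536]
```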
16.6.3. Advantages of Iterators¶
What’s the advantage of using an iterator here?
Suppose we want to sample a binomial(n,0.5).
One way to do it is as follows
import random
n = 10000000
draws = [random.uniform(0, 1) < 0.5 for i in range(n)]
sum(draws)
4999588
But we are creating a huge list here, draws, with one entry per draw (in Python 3, range(n) itself is lazy and takes almost no memory).
This uses lots of memory and is very slow.
If we make n even bigger, memory use grows in proportion, and on many machines the following becomes painfully slow or fails outright
n = 100000000
draws = [random.uniform(0, 1) < 0.5 for i in range(n)]
We can avoid these problems using iterators.
Here is the generator function
def f(n):
    i = 1
    while i <= n:
        yield random.uniform(0, 1) < 0.5
        i += 1
Now let’s do the sum
n = 10000000
draws = f(n)
draws
<generator object f at 0x7f50d3cdb5f0>
sum(draws)
5000081
In summary, iterables
avoid the need to create big lists/tuples, and
provide a uniform interface to iteration that can be used transparently in for loops
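The same computation can also be written with a generator expression. This is a sketch, using a smaller n than above and a fixed seed (both choices are ours, made so that the example runs quickly and reproducibly):

```python
import random

random.seed(1234)    # arbitrary seed, only for reproducibility
n = 100_000

# No list is built: draws are produced and consumed one at a time
successes = sum(random.uniform(0, 1) < 0.5 for _ in range(n))
print(successes / n)   # should be close to 0.5
```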
16.7. Recursive Function Calls¶
This is not something that you will use every day, but it is still useful — you should learn it at some stage.
Basically, a recursive function is a function that calls itself.
For example, consider the problem of computing \(x_t\) for some \(t\) when
\[ x_{t+1} = 2 x_t, \quad x_0 = 1 \]
Obviously the answer is \(2^t\).
We can compute this easily enough with a loop
def x_loop(t):
    x = 1
    for i in range(t):
        x = 2 * x
    return x
We can also use a recursive solution, as follows
def x(t):
    if t == 0:
        return 1
    else:
        return 2 * x(t-1)
What happens here is that each successive call uses its own frame in the stack
a frame is where the local variables of a given function call are held
the stack is memory used to process function calls
it behaves as a First In Last Out (FILO) queue
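One practical consequence of each call having its own frame is that very deep recursion exhausts the stack. CPython guards against this with a recursion limit, and raises RecursionError when it is exceeded. A sketch follows; the depth 100_000 is an arbitrary value chosen to be well above the usual default limit.

```python
import sys

def x(t):
    if t == 0:
        return 1
    else:
        return 2 * x(t-1)

print(sys.getrecursionlimit())   # typically 1000 by default

print(x(10))                     # 10 nested frames: fine

try:
    x(100_000)                   # far more frames than the default limit allows
except RecursionError as err:
    print('RecursionError:', err)
```

The loop-based x_loop has no such restriction, since it uses a single frame regardless of t.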
This example is somewhat contrived, since the first (iterative) solution would usually be preferred to the recursive solution.
We’ll meet less contrived applications of recursion later on.
16.8. Exercises¶
16.8.1. Exercise 1¶
The Fibonacci numbers are defined by
\[ x_{t+1} = x_t + x_{t-1}, \quad x_0 = 0, \; x_1 = 1 \]
The first few numbers in the sequence are \(0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55\).
Write a function to recursively compute the \(t\)-th Fibonacci number for any \(t\).
16.8.2. Exercise 2¶
Complete the following code, and test it using the csv file test_table.csv created earlier in this lecture, which we assume is in your current working directory
def column_iterator(target_file, column_number):
    """A generator function for CSV files.
    When called with a file name target_file (string) and column number
    column_number (integer), the generator function returns a generator
    that steps through the elements of column column_number in file
    target_file.
    """
    # put your code here

dates = column_iterator('test_table.csv', 1)
for date in dates:
    print(date)
16.8.3. Exercise 3¶
Suppose we have a text file numbers.txt
containing the following lines
prices
3
8
7
21
Using try – except, write a program to read in the contents of the file and sum the numbers, ignoring lines without numbers.
16.9. Solutions¶
16.9.1. Exercise 1¶
Here’s the standard solution
def x(t):
    if t == 0:
        return 0
    if t == 1:
        return 1
    else:
        return x(t-1) + x(t-2)
Let’s test it
print([x(i) for i in range(10)])
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
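This solution recomputes the same values many times, so its running time grows exponentially in t. One common remedy, sketched here with functools.lru_cache from the standard library (the name fib is ours), is to cache results that have already been computed:

```python
from functools import lru_cache

@lru_cache(maxsize=None)     # remember every value already computed
def fib(t):
    if t == 0:
        return 0
    if t == 1:
        return 1
    return fib(t-1) + fib(t-2)

print([fib(i) for i in range(10)])
print(fib(50))
```

Note that lru_cache is applied with the decorator syntax discussed earlier in this lecture.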
16.9.2. Exercise 2¶
One solution is as follows
def column_iterator(target_file, column_number):
    """A generator function for CSV files.
    When called with a file name target_file (string) and column number
    column_number (integer), the generator function returns a generator
    which steps through the elements of column column_number in file
    target_file.
    """
    # The with block closes the file when the generator finishes or is discarded
    with open(target_file, 'r') as f:
        for line in f:
            yield line.split(',')[column_number - 1]

dates = column_iterator('test_table.csv', 1)
i = 1
for date in dates:
    print(date)
    if i == 10:
        break
    i += 1
Date
2009-05-21
2009-05-20
2009-05-19
2009-05-18
2009-05-15
2009-05-14
2009-05-13
2009-05-12
2009-05-11
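An alternative sketch uses the csv module's reader objects, which handle the splitting (including the trailing newline in the last column) for us. To stay self-contained it first writes a two-line file; the file name sample_table.csv is hypothetical, and its contents are the header and first data row of the table shown earlier in the lecture.

```python
import csv

# Write a two-line sample file (header plus the first data row shown earlier)
with open('sample_table.csv', 'w', newline='') as f:
    f.write('Date,Open,High,Low,Close,Volume,Adj Close\n')
    f.write('2009-05-21,9280.35,9286.35,9189.92,9264.15,133200,9264.15\n')

def column_iterator(target_file, column_number):
    """Yield the entries of column column_number (counting from 1)."""
    with open(target_file, newline='') as f:
        for row in csv.reader(f):
            yield row[column_number - 1]

for date in column_iterator('sample_table.csv', 1):
    print(date)
```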
16.9.3. Exercise 3¶
Let’s save the data first
%%file numbers.txt
prices
3
8
7
21
Writing numbers.txt
f = open('numbers.txt')

total = 0.0
for line in f:
    try:
        total += float(line)
    except ValueError:
        pass

f.close()

print(total)
39.0