Introduction to Python for Econometrics, Statistics and Data Analysis

Background

This guide is tailored for beginners in statistical computing who want to build essential skills for conducting original research with Python. It serves as a resource for students, researchers, and practitioners seeking a flexible platform for econometrics, statistics, and general numerical analysis, including numeric solutions to economic models and model simulations.

Python is a versatile programming language that can address a wide variety of problems, particularly in econometrics, statistics, and numerical analysis. Recent advancements have expanded its capabilities, making it competitive with specialized languages like R, MATLAB, and Julia when equipped with the appropriate add-ons. If you are weighing Python against other programming languages, several factors should be taken into account.

You might want to consider R if:

• You want to apply statistical methods. R's statistics library is unparalleled, and R leads in the development of new statistical algorithms, so you are likely to find the latest procedures in R.

• Performance is of secondary importance.

You might want to consider MATLAB if:

• Commercial support, and a clean channel to report issues, is important.

• Documentation and organization of modules is more important than raw routine availability.

• Performance is more important than the range of available packages. MATLAB has advanced optimizations, such as Just-in-Time (JIT) compilation of loops, that many competing packages lack.

You might want to consider Julia if:

(Note: According to various rankings, Python is one of the most popular programming languages. The TIOBE index ranks Python as the 8th most popular language, one LangPop ranking places it 5th or 6th, and LangPop.com lists it as 6th.)

• Performance in an interactive based language is your most important concern.

• You don’t mind learning enough Python to interface with Python packages. The Julia ecosystem is in its infancy and a bridge to Python is used to provide important missing features.

• You like living on the bleeding edge, and aren’t worried about code breaking across new versions of Julia.

• You like to do most things yourself.

Having read the reasons to choose another package, you may wonder why you should consider Python.

• You need a language that can act as an end-to-end solution, so that everything from accessing web-based services and managing databases to data processing and statistical computation can be done in one language. Python can also be used to develop server-side applications, dynamic websites, and desktop applications with graphical user interfaces, as well as mobile apps for iOS and Android.

• Data handling and manipulation – especially cleaning and reformatting – is an important concern. Python is substantially more capable at data set construction than either R or MATLAB.

• Performance is a concern, but not at the top of the list.

• Free is an important consideration – Python can be freely deployed, even to 100s of servers in a compute cluster or in the cloud (e.g., Amazon Web Services or Azure).

• Knowledge of Python, as a general purpose language, is complementary to R/MATLAB/Julia/Ox/GAUSS/Stata.

Conventions

These notes will follow two conventions.

1. Code blocks will be used throughout.

# Comments appear in a different color

# Common functions and classes are highlighted in a
# different color. Note that these are not reserved,
# and can be used although best practice would be
# to avoid them if possible
array matrix xrange list True False None

# Long lines are indented
some_text = 'This is a very, very, very, very, very, very, very, very, very, very, very, very long line.'

Reserved keywords in Python include "def", "return", "if", "else", "for", "while", and "import", which are used for defining functions, controlling flow, and loading modules. Other keywords such as "global" and "lambda" relate to variable scope and anonymous functions, while "try" and "except" handle errors. Reserved keywords are highlighted in code blocks and cannot be used as variable names.

Note on performance: Python performance can be made to approach that of C through various techniques, including Numba for pure Python code, Cython as a C/Python hybrid language, or directly integrating C code. Recent developments have also narrowed the performance gap between Python and other Just-in-Time compiled languages such as MATLAB.

2. When a code block contains the symbol ">>>", the command is being run in an interactive IPython session. The output typically follows the console command directly and is not preceded by a command indicator.

When a code block lacks the console session indicator, the code is meant to run in a stand-alone Python file. For example, such a snippet might import the print function from `__future__`, use NumPy to create an array with the values [1, 2, 3, 4], compute the sum of the array, and print both the array and its sum.
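The stand-alone program described above can be sketched as follows. This is a minimal illustration that assumes NumPy is installed; the variable names `x` and `y` are illustrative choices.

```python
# On Python 2.7, add this as the very first line of the file:
#     from __future__ import print_function
import numpy as np

x = np.array([1, 2, 3, 4])  # create a 1-dimensional array
y = np.sum(x)               # sum of the array elements

print(x)
print(y)
```

Saved as a .py file, this runs with the Python interpreter rather than in an interactive session.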

Important Components of the Python Scientific Stack

Python 2.7.6 (or later, but in the Python 2.7.x family) is required. This provides the core Python interpreter.

NumPy provides a set of array and matrix data types which are essential for statistics, econometrics and data analysis.

SciPy offers a large collection of routines for data analysis, including a wide range of random number generators, linear algebra functions, and optimization tools. It is built on NumPy and extends its capabilities for scientific computing.

IPython provides an interactive Python environment which enhances productivity when developing code or performing interactive data analysis.

Matplotlib is a library for creating 2D plots in Python, with some support for 3D visualizations. Seaborn improves the default appearance of Matplotlib plots, allowing users to generate visually appealing graphics with minimal additional code.

pandas provides high-performance data structures.

Several modules, such as Cython and Numba, enhance performance in Python programming. Cython allows developers to write functions in a Python-derived creole that can be compiled into native C code extensions. Numba uses just-in-time compilation to convert a subset of Python code into native code using the Low-Level Virtual Machine (LLVM).

Setup

The recommended way to install the Python scientific stack is to use Anaconda by Continuum Analytics. For those unable to install Anaconda, Appendix 1 provides a detailed guide for a more complex installation process, including direct installation of Python and the necessary modules. The appendix also covers virtual environments, which are regarded as best practice in Python development.

Anaconda, offered for free by Continuum Analytics, is a comprehensive scientific stack for Python that includes the core interpreter, standard libraries, and essential modules for data analysis. It offers performance-enhancing modules for linear algebra on Intel processors via the Math Kernel Library (MKL), available at no cost for academic users and for a nominal fee for others. Continuum Analytics also provides high-performance modules for handling large data files and for GPU acceleration for a modest charge. Installation is straightforward on Windows, Linux, and OS X, and users can update to the latest version using the conda update commands.

To install Anaconda on Windows, download the installer and run it, leaving the installation directory at the default (C:\Anaconda). After completing the setup, open a command prompt and execute the commands `cd ANACONDA\Scripts`, `conda update conda`, `conda update anaconda`, and `conda install mkl` to update Anaconda and install the Intel Math Kernel Library for faster linear algebra. Note that a license for MKL is free for academic use but may incur a fee otherwise; if a license cannot be obtained, skip this step. You can use `conda install` later to add more packages as needed. From the same `ANACONDA\Scripts` directory, run `pip install pylint html5lib seaborn` to install extra packages not included in Anaconda. If you choose a different installation directory, ensure that the path does not contain non-ASCII characters or spaces.

The recommended settings for installing Anaconda on Windows are:

• Install for all users - This requires admin privileges. If you lack these privileges, choose the "Just for me" option, but be careful about installing to a path that includes non-ASCII characters, as this can cause problems.

• Add Anaconda to the System PATH - This is important to ensure that Anaconda commands can be run from the command prompt.

• Register Anaconda as the system Python - If Anaconda is the only Python installed, then select this option.

To run Python programs, the Anaconda installation must be on the system PATH. If it is not, add the ANACONDA and ANACONDA\Scripts directories using the command: set PATH=ANACONDA;ANACONDA\Scripts;%PATH%.

To install Anaconda on Linux, run the command `bash Anaconda-x.y.z-Linux-ISA.sh`, where `x.y.z` corresponds to the specific version and `ISA` is typically `x86_64`. For OS X, you can choose between a GUI installer in pkg format or a bash installer, which follows the same installation process as Linux. It is highly recommended to prepend the Anaconda binary directory to your PATH, which can be done temporarily by executing `export PATH=/home/python/anaconda/bin:$PATH` in your terminal session.

To make this change permanent on Linux, add the `export PATH=...` line to the hidden .bashrc file in the home directory (~/). On OS X, the same line can be added to the .bash_profile file, also found in the home directory (~/).

After completing the installation of Anaconda, navigate to the installation directory (typically ~/anaconda) and execute the commands `conda update conda`, `conda update anaconda`, and `conda install mkl` to ensure Anaconda is current and to install the Intel Math Kernel Library, which improves performance. Note that this library requires a license, which is free for academic users and available at a low cost for others; if a license cannot be obtained, you may skip this step. You can use `conda install` to add other packages later. Finally, run `pip install pylint html5lib seaborn` to install additional packages not included with Anaconda.

For OS X and Linux users, these commands assume that ANACONDA/bin is on the system path. If it is not, navigate to the bin folder of the Anaconda installation using `cd ANACONDA` followed by `cd bin`. All subsequent commands must then be prefixed with `./`, as in `./conda update conda`.

Using Python

Python can be programmed interactively using IPython or by directly executing Python scripts (text files that end with the extension .py) using the Python interpreter.

Interactive programming offers significant advantages when learning a language. The standard Python interactive console is quite basic and lacks features like tab completion. IPython, particularly the QtConsole version, provides a far more productive environment with numerous useful features.

Tab completion displays a list of matching functions, packages, and variables after you type one or more characters and press the Tab key. If the list is long, pressing Tab again allows navigation through the options using the arrow keys.

"Magic" functions in IPython simplify tasks such as navigating the local file system (using %cd ~/directory/) and running Python programs (using %run program.py). Entering %magic in an IPython session produces a detailed description of the available functions, while %lsmagic produces a concise list of magic commands. Some of the most useful magic functions are:

– edit filename - launch an editor to edit filename

– ls or ls pattern - list the contents of a directory

– run filename - run the Python file filename

– timeit - time the execution of a piece of code or function
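Outside an interactive session, timing similar to the timeit magic can be approximated with the standard library `timeit` module. The sketch below is only illustrative; the function `sum_of_squares` and the number of runs are hypothetical choices made for this example.

```python
import timeit


def sum_of_squares(n):
    """Sum of the squares of 0, ..., n-1 (hypothetical example function)."""
    return sum(i * i for i in range(n))


# Call the function n_runs times and report the average time per call
n_runs = 1000
total = timeit.timeit(lambda: sum_of_squares(1000), number=n_runs)
per_call = total / n_runs
print("%.3e seconds per call" % per_call)
```

The reported time varies by machine and run, so it is most useful for comparing alternative implementations on the same system.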

When using the QtConsole, calling a function displays the top of its help documentation. For example, entering a function name followed by an opening parenthesis, such as mean(, shows the top 20 lines of its help text.

The QtConsole can display figures inline, which produces a clean, self-contained workspace. This feature can be enabled by using the --pylab=inline switch at startup or by setting the configuration option c.IPKernelApp.pylab="inline".

• The special variable _ contains the last result in the console, and so the most recent result can be saved to a new variable using the syntax x = _.

• Support for profiles, which provide further customization of sessions.

IPython supports profiles, which allow alternative environments to be configured at launch, affecting both the appearance of the session and the packages loaded into it. Profiles are configured through a set of files located in %USERPROFILE%\.ipython\ on Windows and ~/.config/ipython/ on OS X or Linux. This directory initially contains a mostly empty folder named profile_default. To create a new profile, open a terminal or command prompt and execute the command `ipython profile create econometrics`.

This will create a directory named profile_econometrics and populate it with 4 files:

The `ipython_config.py` file contains general settings applicable to all IPython sessions, while `ipython_nbconvert_config.py` contains settings used by the Notebook converter. `ipython_notebook_config.py` holds settings specific to IPython Notebook browser sessions, and `ipython_qtconsole_config.py` holds settings exclusive to QtConsole sessions.

The two key configuration files are `ipython_config.py` and `ipython_qtconsole_config.py`. Opening them in a text editor reveals a wide range of options, all of which are commented out with a # symbol. Most options include brief comments explaining their purpose and possible values; the online IPython documentation provides a fuller description.

The settings in `ipython_config.py` affect all sessions using the specified profile, whether in the terminal, QtConsole, or Notebook. A key setting is c.InteractiveShellApp.exec_lines, which automatically executes commands when an IPython session opens, making it ideal for importing frequently used packages. c.InteractiveShellApp.pylab loads pylab directly, in the same way as the command line option --pylab=backend, while c.InteractiveShellApp.matplotlib loads only matplotlib without the additional pylab components.

The settings in `ipython_qtconsole_config.py` apply only to QtConsole sessions and primarily influence the console's appearance. Key settings include the font size, `c.IPythonWidget.font_size`, and the font family, `c.IPythonWidget.font_family`.

The pylab setting for QtConsole sessions, `c.IPKernelApp.pylab`, can be set to "inline", which is equivalent to using the command line option --pylab=inline when launching IPython with QtConsole. This setting overrides the general pylab configuration for QtConsole only, so that, for example, terminal-based IPython sessions can use "qt4" while QtConsole sessions use "inline".

The final setting, c.ZMQInteractiveShell.colors, is identical to the command-line switch --colors and can be set to "linux" to produce a console with a dark background and light characters.

When executing code in IPython or in stand-alone Python programs, it is recommended to include two imports: `from __future__ import print_function` and `from __future__ import division`. These enable the Python 3 versions of the print function and of division. To set this up, open the `ipython_config.py` file located in the `profile_econometrics` directory and set

c.InteractiveShellApp.exec_lines=["from __future__ import print_function, division",
                                  "import os",
                                  "os.chdir('c:\\dir\\to\\start\\in')"]

and

c.InteractiveShellApp.pylab="qt4"

This code does two things. First, it imports two “future” features (which are standard in Python 3.x+), the print function and division, which are useful for numerical programming.

In Python 2.7, print is a statement used without parentheses, as in print 'string to print', whereas in Python 3.x it is a function call that requires parentheses: print('string to print'). I prefer the latter approach, as it eases the transition to Python 3.x and is consistent with other function calls in the language.

In Python 2.7, dividing two integers produces an integer, so fractional results are discarded (e.g., 9/5 equals 1). In Python 3.x, dividing two integers always produces a float (e.g., 9/5 equals 1.8). Floor division remains available in Python 3.x using the // operator, so 9//5 equals 1 and 11//5 equals 2.
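The division behavior described above can be checked with a short sketch. It is written for Python 3; on Python 2.7 the same results require the `__future__` import noted in the first comment. The variable names are illustrative.

```python
# On Python 2.7, add this as the very first line of the file:
#     from __future__ import print_function, division

true_div = 9 / 5      # true division: 1.8
floor_div = 9 // 5    # floor division: 1
neg_floor = -9 // 5   # floor division rounds toward negative infinity: -2

print(true_div, floor_div, neg_floor)
```

Note that // floors the result rather than truncating toward zero, which only matters when one of the operands is negative.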

Second, pylab will be loaded by default using the qt4 backend.

Changing settings in ipython_qtconsole_config.py is optional, although I recommend setting c.IPythonWidget.font_size together with

c.IPythonWidget.font_family="Bitstream Vera Sans Mono"
c.IPKernelApp.pylab="inline"
c.ZMQInteractiveShell.colors="linux"

These commands assume that the Bitstream Vera fonts have been locally installed, which are available from http://ftp.gnome.org/pub/GNOME/sources/ttf-bitstream-vera/1.10/.

IPython can be started by running `ipython --profile=econometrics` in the terminal. Starting IPython using the QtConsole is virtually identical: `ipython qtconsole --profile=econometrics`.

A single line launcher on OS X or Linux can be constructed using `bash -c "ipython qtconsole --profile=econometrics"`.

This single line launcher can be saved as filename.command, where filename is a meaningful name (e.g. IPython-Terminal), to create a launcher on OS X by entering the command `chmod 755 /FULL/PATH/TO/filename.command`.

Exercises

2. Test the installation using the code in section 1.5.7.

3. Configure IPython using the start-up script in section 1.5.3.

4. Customize IPython QtConsole using a font or color scheme. More customizations can be found by running ipython -h.

5. Explore tab completion in IPython by typing "a" followed by the Tab key to view a list of functions starting with "a" that are available in pylab. Then type "i" and press Tab to generate an extensive list that may exceed the screen length. Press ESC to exit the pager.

6. Launch IPython Notebook and run code in the testing section.

7. Open Spyder and explore its features.

Python is sensitive to whitespace, including spaces and tabs, so proper indentation is crucial. Configuration files like ipython_config.py are plain Python files and are subject to the same rules. To avoid errors, ensure there is no whitespace preceding configuration lines such as c.InteractiveShellApp.exec_lines.
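The effect of stray leading whitespace can be illustrated without IPython by compiling a configuration line with and without indentation. This is only a sketch: the helper `check_config` is a hypothetical name, and the lines are compiled but never executed, so the `c` object does not need to exist.

```python
config_ok = 'c.InteractiveShellApp.exec_lines = ["import os"]\n'
config_bad = '  c.InteractiveShellApp.exec_lines = ["import os"]\n'  # two leading spaces


def check_config(source):
    """Return None if the source compiles, otherwise the error message."""
    try:
        compile(source, "<config>", "exec")
        return None
    except IndentationError as err:
        return str(err)


print(check_config(config_ok))   # no error for the unindented line
print(check_config(config_bad))  # message describing the unexpected indent
```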

Python does not generally work when directories have spaces.

Python 2.7 struggles with paths that include unicode characters, which particularly affects IPython's ability to locate configuration files in user directories. To work around this, set the HOME variable to a path containing only ASCII characters before starting IPython. This can be done by creating a directory, setting the HOME variable to that directory, activating the Anaconda environment, and creating an IPython profile for econometrics.

The line `set HOME=c:\anaconda\ipython_config` can point to any path with directories containing only ASCII characters, and can also be added to any batch file to achieve the same effect.

Installing Anaconda to the root of the partition

Avoid running the Anaconda installer as the root user, since it may then install to /anaconda instead of ~/anaconda. Using Anaconda installed as root is technically feasible, but best practice is to install it in the user's home directory. If Anaconda was installed to /anaconda, use /anaconda in place of ~/anaconda in any of the instructions.

Unable to create profile for IPython

Non-ASCII characters in the username can cause problems for IPython, as it searches for configuration files in the $HOME/.ipython directory, typically located at /Users/username/.ipython. To work around this, create a new directory with an ASCII-only path and set an environment variable that points to it by running `mkdir /tmp/ipython_config` and `export IPYTHONDIR=/tmp/ipython_config`. Then activate the Anaconda environment with `source ~/anaconda/bin/activate econometrics`, create a new IPython profile using `ipython profile create econometrics`, and start IPython with `ipython --profile=econometrics`.

These commands create the profile directory in /tmp/ipython_config; any other path containing only ASCII characters can be used instead. To make the change permanent, edit ~/.bash_profile to include the line export IPYTHONDIR=/tmp/ipython_config, after which the export command no longer needs to be entered in each session.

~/.bash_profile is hidden and may not exist, so nano ~/.bash_profile can be used to create and edit this file.
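Whether a given path is affected can be checked with a few lines of Python before changing any environment variables. The function `is_ascii_path` below is a hypothetical helper written for this illustration, not part of IPython or Anaconda.

```python
import os


def is_ascii_path(path):
    """Return True if every character in the path is plain ASCII."""
    return all(ord(ch) < 128 for ch in path)


# Check the current user's home directory
home = os.path.expanduser("~")
print(home, is_ascii_path(home))
```

If the check prints False for the home directory, an ASCII-only directory such as /tmp/ipython_config is a reasonable alternative location.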

A complete listing of register_python.py is included in this appendix.

# Script to register Python 2.0 or later for use with win32all
# and other extensions that require Python registry settings
#
# Adapted by Ned Batchelder from a script
# written by Joakim Law for Secret Labs AB/PythonWare
#
# http://www.pythonware.com/products/works/articles/regpy20.htm

import sys

from _winreg import *

# tweak as necessary
version = sys.version[:3]
installpath = sys.prefix

regpath = "SOFTWARE\\Python\\Pythoncore\\%s\\" % (version)
installkey = "InstallPath"
pythonkey = "PythonPath"
pythonpath = "%s;%s\\Lib\\;%s\\DLLs\\" % (
    installpath, installpath, installpath
)

def RegisterPy():
    # Open the registry key, creating it if it does not exist
    try:
        reg = OpenKey(HKEY_LOCAL_MACHINE, regpath)
    except EnvironmentError:
        try:
            reg = CreateKey(HKEY_LOCAL_MACHINE, regpath)
        except Exception, e:
            print "*** Unable to register: %s" % e
            return

    SetValue(reg, installkey, REG_SZ, installpath)
    SetValue(reg, pythonkey, REG_SZ, pythonpath)
    CloseKey(reg)
    print " - Python %s at %s is now registered!" % (version, installpath)

if __name__ == "__main__":
    RegisterPy()

The recommended approach to installing the Python scientific stack is to use Continuum Analytics' Anaconda. This appendix also describes alternative installation methods that use virtual environments, which are regarded as best practice for Python usage.

1.C.1 Using Virtual Environments with Anaconda

To install Anaconda on Windows, download the installer and run it; the default installation directory is typically C:\Anaconda. After completing the setup, open a command prompt and execute the following commands to update Anaconda and create a virtual environment named "econometrics": `conda update conda`, `conda update anaconda`, and `conda create -n econometrics ipython-qtconsole ipython-notebook scikit-learn matplotlib numpy pandas scipy spyder statsmodels`. Using a virtual environment is best practice, since the components in the environment remain unchanged when Anaconda is updated, which prevents errors caused by backward incompatible changes. After creating the environment, you can add packages using `conda install -n econometrics` followed by the package names. Installing the Intel Math Kernel Library can significantly improve performance on Intel systems, although a license is required. For academic users this license is free, so it is worth obtaining. Alternatively, you can install all available packages with the command `conda create -n econometrics anaconda`.

To use the econometrics environment, first activate it by executing ANACONDA\Scripts\activate.bat econometrics in the command prompt. The prompt will then show [econometrics] to indicate that the virtual environment is active. After activation, run the command pip install pylint html5lib seaborn to install additional packages that are not directly available through Anaconda.
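Once the environment is active, a quick check from Python can confirm which interpreter is running and whether the expected packages can be found. This sketch assumes Python 3.4 or later for `importlib.util.find_spec`; the helper `missing_packages` and the package list are illustrative choices.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the names that the active interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The path should point inside the active environment's directory
print(sys.executable)

required = ["numpy", "scipy", "pandas", "matplotlib", "statsmodels"]
print(missing_packages(required))
```

An empty list indicates that every package in the list is visible to the interpreter; any names printed can be installed with `conda install -n econometrics`.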

To install Anaconda on Linux, run the command `bash Anaconda-x.y.z-Linux-ISA.sh`, where x.y.z corresponds to the version and ISA is typically x86_64. For OS X, the installer is available in both GUI (pkg format) and bash formats, and the bash installation procedure mirrors that of Linux. Once the installation is complete, navigate to the bin folder of the Anaconda installation directory (default is ~/anaconda) by executing `cd ANACONDA` followed by `cd bin`, and then run

./conda create -n econometrics ipython-qtconsole ipython-notebook matplotlib numpy pandas scikit-learn scipy spyder statsmodels

./conda install -n econometrics cython distribute lxml nose numba numexpr openpyxl pep8 pip psutil pyflakes pytables rope sphinx xlrd xlwt

These commands set up a virtual environment named "econometrics" with the required packages. The `conda create` command establishes the environment, while `conda install` adds further packages. Running `conda update conda` and `conda update anaconda` beforehand ensures that Anaconda is up to date. The Intel Math Kernel Library can be installed with `conda install -n econometrics mkl` for better performance, but note that it requires a license, which is free for academic users and low-cost for others; if a license cannot be obtained, this step can be skipped. You can use `conda install` later to add more packages as needed. To activate the environment, execute `source ANACONDA/bin/activate econometrics`, followed by `pip install pylint html5lib seaborn` to install packages not included in Anaconda.

Anaconda streamlines the installation of the scientific Python stack, but there are instances where Anaconda cannot be installed. For these cases, more complex installation instructions for both Windows and Linux are provided.

The list of Windows binary packages required for these notes, along with the version and Windows installation file, includes:

The list covers the essential Python packages and their versions for a Python 2.7.5 environment. Key installations include Setuptools 2.2.0, Pip 1.5.4, and Virtualenv 1.11.1, which are needed for package management and virtual environment creation. Also included are Jinja2 2.7.2 for templating, Tornado 3.2.0 for web applications, and PyCairo 1.10.0 for graphics. For scientific computing, NumPy 1.8.0, SciPy 0.13.3, and Matplotlib 1.3.1 are included, along with pandas 0.13.0 for data manipulation. The list also contains IPython 1.2.0 for interactive computing, scikit-learn 0.14.1 for machine learning, and statsmodels 0.5.0 for statistical modeling. Finally, it covers PyTables 3.1.0 for managing hierarchical datasets and lxml 3.3.1 for XML processing.

These remaining packages are optional and are only discussed in the final chapters related to performance.

Package      Version   File name
Cython       0.20.1    Cython-0.20.1.win-amd64-py2.7
LLVMPy       0.12.3    llvmpy-0.12.3.win-amd64-py2.7
LLVMMath     0.1.2     llvmmath-0.1.2.win-amd64-py2.7
Numba        0.12.1    numba-0.12.1.win-amd64-py2.7

pandas (Optional)
Bottleneck   0.8.0     Bottleneck-0.8.0.win-amd64-py2.7
NumExpr      2.3.1     numexpr-2.3.1.win-amd64-py2.7

To get started, install Python, setuptools, pip, and virtualenv. Once these packages are set up, open an elevated command prompt (with administrator privileges) and initialize the virtual environment by executing `cd C:\Dropbox` followed by `virtualenv econometrics`.

I use Dropbox to store my virtual environments and have named this one "econometrics". The virtual environment can be placed anywhere, but paths with spaces should be avoided. In this guide, VIRTUALENV represents the directory of the virtual environment (e.g., C:\Dropbox\econometrics). After setting up the virtual environment, activate it by running `cd VIRTUALENV\Scripts` followed by `activate.bat`, and install the necessary packages using the command `pip install beautifulsoup4 html5lib meta nose openpyxl patsy pep8 pyflakes pygments pylint pyparsing pyreadline python-dateutil pytz==2013d rope seaborn sphinx spyder wsgiref xlrd xlwt`. To ensure the virtual environment is recognized as the default Python environment, execute the script `register_python.py` from the website. Once the correct Python version is registered, install the remaining packages in order, including any optional ones. Finally, execute the command `xcopy c:\Python27\tcl VIRTUALENV\tcl /S /E /I` to complete the setup.

Python 2.7 vs 3

Python 2.7 is the final version of the Python 2.x line – all future development work will focus on Python 3.

It may seem strange to learn an “old” language. The reasons for using 2.7 are:

• Python 2.7 offers a wider selection of modules than Python 3, particularly among niche or less commonly used modules. The core Python modules are available for both versions, but some specialized modules are available only for Python 2.7 or have not been thoroughly tested in Python 3. Many of these modules are expected to be adapted for Python 3 eventually, but they are not yet fully functional.

• The language changes that matter for numerical computing are minimal, and these notes are written to reduce the changes needed for compatibility with Python 3 and beyond, ideally to none.

• Configuring and installing 2.7 is easier.

• Anaconda defaults to 2.7 and the selection of packages available for Python 3 is limited.

Learning Python 3 has some advantages:

• No need to update in the future.

• Some improved out-of-box behavior for numerical applications.

Intel Math Kernel Library and AMD Core Math Library

Intel's MKL and AMD's ACML offer optimized linear algebra routines that significantly outperform standard linear algebra libraries. These libraries are multithreaded by default, allowing linear algebra operations to use all available processors on your system. However, most standard NumPy builds do not incorporate these optimizations, so it is important to use a Python distribution that includes an optimized linear algebra library, particularly for tasks like computing inverses or eigenvalues of large matrices. There are three main ways to obtain a NumPy build that uses Intel MKL.

• Use Anaconda on any platform and secure a license for MKL (free for academic use, otherwise $29 at the time of writing).

• Use the pre-built NumPy binaries made available by Christoph Gohlke for Windows.

• Follow instructions for building NumPy on Linux with MKL, which is free on Linux.

Users with AMD processors must build NumPy from scratch, as there are no pre-built NumPy libraries that use AMD's ACML. Alternatively, using an Intel system avoids this step.
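To see which linear algebra library a particular NumPy build uses, and to get a rough sense of its speed, a short check such as the following can be run. This is a sketch that assumes NumPy is installed; the matrix size, the seed, and the variable names are arbitrary illustrative choices, and the timing will differ across machines.

```python
import time

import numpy as np

# Print build information, including the BLAS/LAPACK libraries NumPy was built against
np.show_config()

n = 500  # illustrative matrix size
rng = np.random.RandomState(0)
# Adding n on the diagonal keeps the test matrix well-conditioned
a = rng.standard_normal((n, n)) + n * np.eye(n)

start = time.time()
a_inv = np.linalg.inv(a)
elapsed = time.time() - start
print("Inverse of a %d x %d matrix took %.4f seconds" % (n, n, elapsed))
```

Running the same script under two different NumPy builds gives a simple comparison of their linear algebra performance.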

Other Variants

Some other variants of the recommended version of Python are worth mentioning.

Enthought Canopy serves as a viable alternative to Anaconda, compatible with Windows, Linux, and OS X This software is frequently updated and offers a basic version free of charge, while the full version is also accessible at no cost for academic users Built on MKL, Canopy ensures rapid performance in matrix algebra computations.

IronPython is a version of Python that runs on the Common Language Runtime (CLR) of the Windows .NET framework, which makes it an option for numerical computing, particularly for those already skilled in C# or needing to interact with .NET components. While core modules such as NumPy and SciPy are accessible, some libraries, such as matplotlib for plotting, are not supported, which is a significant limitation.

Jython is a version of Python that runs on the Java Runtime Environment (JRE). It lacks support for NumPy, which significantly restricts its usefulness for numerical tasks. Despite this limitation, a key benefit of Jython is the ability to run mostly unmodified Python code on the JVM and to access Java libraries.

PyPy is an implementation of Python that uses Just-in-Time compilation to speed up code execution, particularly for the loops that are common in numerical computing, achieving speeds 2 to 500 times faster than standard Python. However, as of now, the core library NumPy is only partially implemented, so PyPy is not ready for immediate use. Future plans aim to complete this implementation, which could make PyPy a preferred choice for numerical computing in Python.

2.A Relevant Differences between Python 2.7 and 3

Most differences between Python 2.7 and 3 do not affect their use in econometrics, statistics, and numerical analysis, so the two can be used interchangeably under three common assumptions. The configuration instructions for IPython provided in the previous chapter will yield the expected results during interactive sessions. However, these differences do matter in stand-alone Python programs.

The `print` function is used to display text in the console while running programs. In Python 2.7, `print` is a keyword and behaves differently from Python 3, where it is a standard function. In Python 2.7, the syntax is `print 'String to Print'`, whereas in Python 3, it is `print('String to Print')`, in line with ordinary function calls. To use the Python 3 version of `print` in Python 2.7, include `from __future__ import print_function` at the beginning of your file. I prefer the Python 3 syntax and recommend including this statement in all programs for consistency.
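As a minimal sketch of the function form, the following assumes a Python 3 interpreter; the `buffer` and `captured` names are only illustrative. Writing to an in-memory buffer makes the exact output easy to inspect.

```python
import io

# In a Python 2.7 file, the first statement would be:
#     from __future__ import print_function
# Under Python 3 the function form is already the default.

buffer = io.StringIO()

# print is an ordinary function call with keyword arguments.
print('String to Print', file=buffer)
print('x =', 1.5, file=buffer)               # arguments are joined by a space
print('a', 'b', 'c', sep=', ', file=buffer)  # sep changes the separator

captured = buffer.getvalue()
print(captured, end='')                      # end='' avoids an extra newline
```

Because `print` is a function, options such as `sep`, `end`, and `file` are passed like any other keyword argument.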

Python 3 changes how integer division is handled compared to Python 2.7. In Python 2.7, dividing two integers produces an integer and discards any fractional part, so 9/5 equals 1. Python 3 instead returns a floating-point number, so 9/5 is 1.8. This automatic conversion helps prevent rare errors when working with numerical data. In Python 2.7, the Python 3 behavior can be obtained by adding `from __future__ import division` at the beginning of the program, a practice I expect will become standard.
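The two kinds of division can be compared directly. This sketch assumes Python 3, where `/` is true division and `//` is floor division; the variable names are illustrative.

```python
# True division always returns a float in Python 3.
true_quotient = 9 / 5

# Floor division reproduces the Python 2.7 result for two integers.
floor_quotient = 9 // 5

# Floor division rounds toward negative infinity, which differs from
# truncation toward zero when the operands have opposite signs.
negative_floor = -9 // 5
negative_truncated = int(-9 / 5)

print(true_quotient, floor_quotient, negative_floor, negative_truncated)
```

Using `//` explicitly whenever an integer result is intended keeps the code correct under both versions.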

Generating a sequence of numbers is a common requirement when iterating over data. In Python 2.7, the recommended approach is to use the built-in function `xrange`, which in Python 3 has been renamed to `range`. When moving code from Python 2.7 to Python 3, replace `xrange` with `range`.
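One way to write a loop that runs under either version is sketched below. The name `range_compat` is hypothetical and is not part of Python; it simply points at whichever lazy range function exists.

```python
# Pick the lazy range function available in the running interpreter.
try:
    range_compat = xrange  # Python 2.7
except NameError:
    range_compat = range   # Python 3, where xrange no longer exists

# Sum the integers 0, 1, 2, 3, 4.
total = 0
for i in range_compat(5):
    total += i

# Start, stop (exclusive) and step can all be given.
indices = list(range_compat(0, 10, 3))

print(total, indices)
```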

Unicode is a universal standard for text encoding that ensures consistency across platforms. Initially, the computer alphabet was restricted to just 128 characters, which is far too few to represent the characters found in all languages.

Unicode allows for a character space of up to 2^31 characters, depending on the encoding used. Python 2.7 treats characters as single bytes and requires special syntax for unicode strings, whereas Python 3 treats all strings as unicode. This difference rarely affects numeric code, except when reading or writing data. For developers working in languages that frequently use characters outside the standard 128-character set, adding `from __future__ import unicode_literals` improves compatibility with Python 3.
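The distinction between characters and bytes matters mostly when data are read or written. A small sketch, assuming Python 3 and the UTF-8 encoding (the choice of encoding is an assumption, not something the text prescribes):

```python
# A string with one character outside the 128-character ASCII set.
text = 'caf\u00e9'

# Encoding converts the unicode string to bytes, as needed when writing files.
encoded = text.encode('utf-8')

# The string has 4 characters, but the accented character needs 2 bytes.
n_characters = len(text)
n_bytes = len(encoded)

# Decoding with the same encoding recovers the original string.
decoded = encoded.decode('utf-8')

print(n_characters, n_bytes, decoded == text)
```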

Before using Python for data analysis or Monte Carlo simulations, it is necessary to understand the core Python data types. Unlike specialized languages such as MATLAB, Python takes a general-purpose approach to data handling.

Domain-specific languages such as R are designed primarily for numerical work, with a default data type chosen to suit statistical analysis, whereas Python is a general-purpose programming language that is also well suited to data analysis, econometrics, and statistics. In MATLAB, the fundamental numeric type is an array of double-precision floating-point values, while Python's basic numeric data type is a 1-dimensional scalar, which can be either an integer or a double-precision floating point, depending on how the number is formatted when it is input.

Variable Names

Variable names in Python can contain numbers, letters (both uppercase and lowercase), and underscores (_). They must start with a letter or an underscore and are case-sensitive. Reserved words, such as import and for, cannot be used as variable names. For instance, a valid variable assignment is x = 1.0.

Assignments such as _x = 1.0 and x_ = 1.0 are legal and create distinct variables. However, names that start or end with an underscore, although legal, are usually avoided because these forms carry conventional meanings. Names that break the rules above, for example by starting with a digit or by using a reserved word, are illegal.
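The naming rules can be checked programmatically. The helper `is_legal_name` below is hypothetical and written for illustration; it assumes Python 3, where `str.isidentifier` and the standard `keyword` module are available (Python 3 also accepts many non-ASCII letters, which goes beyond the rules stated above).

```python
import keyword


def is_legal_name(name):
    """Return True if name could be used as a variable name."""
    # isidentifier checks the character rules; iskeyword rejects reserved words.
    return name.isidentifier() and not keyword.iskeyword(name)


candidates = ['x', 'X', '_x', 'x_', 'x1', '1X', 'X-1', 'for', 'import']
legality = {name: is_legal_name(name) for name in candidates}

for name in candidates:
    print(name, legality[name])
```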

In Python, a single leading underscore, as in _some_internal_value, signals that a variable is intended for internal use within a module or class, although it remains accessible to calling code. In contrast, double leading underscores, as in __some_private_value, mark class-private variables: inside a class, Python mangles such names so that they cannot be reached from outside under their original name. Trailing underscores are used to avoid conflicts with reserved Python keywords, as in class_ or lambda_. Names with double leading and trailing underscores are reserved for "magic" variables (e.g., __init__), and should only be used to access specific features.
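The underscore conventions can be seen in a small class. The `Account` class and its attribute names are hypothetical, chosen only to illustrate the behavior.

```python
class Account:
    def __init__(self):
        self._internal = 1   # convention only: still accessible from outside
        self.__private = 2   # stored under the mangled name _Account__private

    def private_value(self):
        # Inside the class the original spelling works.
        return self.__private


acct = Account()

internal_value = acct._internal                # works, but discouraged
has_plain_name = hasattr(acct, '__private')    # False: the name was mangled
mangled_value = acct._Account__private         # the mangled name is reachable

# A trailing underscore avoids a clash with the reserved word "class".
class_ = 'equity'

print(internal_value, has_plain_name, mangled_value, acct.private_value())
```

The mangled name remains reachable, so double leading underscores hide a variable from casual access rather than enforce strict privacy.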

Multiple variables can be assigned on the same line using commas: x, y, z = 1, 3.1415, 'a'

Core Native Data Types

In Python, numbers can be integers, floats, or complex numbers. Integers can be either 32-bit or 64-bit, depending on how the Python interpreter was compiled for the operating system, while floats are always 64-bit and correspond to doubles in C/C++. Long integers differ in that they have no fixed size, so they can represent values larger than the maximum standard integer. This chapter covers only the data types relevant to numerical analysis, econometrics, and statistics, and omits types such as byte, bytearray, and memoryview.

The float is the most important data type for numerical analysis, but not all non-complex numeric types are floats. To input a float, a period (dot) must be included in the expression. The function type() can be used to determine the data type of a variable.

The assignment `x = 1` creates an integer variable, whereas `x = 1.0` creates a float variable. It is important to include ".0" when a float is expected, as relying on integers can lead to unexpected results.
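A short check with type() makes the distinction concrete. The variable names are illustrative, and the sketch assumes Python 3.

```python
x = 1      # no decimal point: integer
y = 1.0    # decimal point: float

x_is_int = type(x) is int
y_is_float = type(y) is float

# Mixing an integer and a float in arithmetic produces a float.
z = x + y
z_is_float = type(z) is float

# float() converts an integer explicitly.
converted = float(x)

print(type(x), type(y), type(z), converted)
```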

Complex numbers are also important for numerical analysis. Complex numbers are created in Python using j or the function complex().

Note that a + bj is the same as complex(a, b), while complex(a) is the same as a + 0j.
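These equivalences can be verified directly. In the sketch below the values of a and b are arbitrary.

```python
a, b = 2.0, 3.0

z1 = a + b * 1j       # the j suffix marks the imaginary unit
z2 = complex(a, b)    # same number built with the function
z3 = complex(a)       # imaginary part defaults to zero

same_number = z1 == z2
zero_imaginary = z3 == a + 0j

# Real and imaginary parts are stored as floats.
real_part, imag_part = z1.real, z1.imag
conjugate = z1.conjugate()

print(z1, z3, real_part, imag_part, conjugate)
```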

Floats approximate numbers with decimal portions, while the integer data type provides an exact representation of whole numbers. The limitation of integers is that they cannot represent non-integer values, which restricts their usefulness in many numerical tasks.

Basic integers can be input by omitting the decimal point or by using the int() function. The int() function can also convert a float to an integer by rounding towards zero.
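Rounding towards zero differs from both rounding to nearest and flooring, which the following sketch compares (Python 3 assumed; the names are illustrative).

```python
import math

# int() drops the fractional part, moving towards zero.
positive_truncated = int(1.9)
negative_truncated = int(-1.9)

# For comparison: nearest integer and floor.
nearest = round(1.9)
floored = math.floor(-1.9)

print(positive_truncated, negative_truncated, nearest, floored)
```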

In Python 2.7, standard integers range from -2^31 to 2^31 - 1 on a 32-bit interpreter (the range is wider on a 64-bit interpreter), but the language also supports long integers, which have no effective range limit. Long integers can be created using the syntax `x = 1L` or by calling `long()`. Python also automatically converts integers that exceed the standard range into long integers.

>>> x = y ** 64 # ** denotes exponentiation, y^64 in TeX

The trailing L after the number indicates that it is a long integer, rather than a standard integer.
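For comparison, the sketch below shows the same calculation under Python 3, where the separate long type, the L suffix and long() no longer exist and int has unlimited range. The value y = 2 is an assumption made for illustration.

```python
import sys

y = 2            # assumed value, chosen for illustration
x = y ** 64      # exceeds the range of a 64-bit machine integer

# In Python 3 the result is an ordinary int with no trailing L.
x_is_int = type(x) is int

# sys.maxsize is the largest machine-sized index; x is larger.
exceeds_machine_size = x > sys.maxsize

print(x, x_is_int, exceeds_machine_size)
```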

The Boolean data type represents true and false values using the reserved keywords True and False. Boolean variables are important for controlling program flow and are usually generated by logical operations, though they can also be input directly.

Non-zero, non-empty values generally evaluate to true when evaluated by bool(). Zero or empty values such as bool(0), bool(0.0), bool(0.0j), bool(None), bool('') and bool([]) are all false.
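The truth values listed above can be confirmed with a short sketch; the lists `falsy` and `truthy` are illustrative names.

```python
# Zero or empty values evaluate to False.
falsy = [0, 0.0, 0.0j, None, '', []]
falsy_results = [bool(value) for value in falsy]

# Non-zero, non-empty values evaluate to True.
# Note that [0] is a non-empty list, so it is true even though it holds a zero.
truthy = [1, -0.5, 1j, 'a', [0]]
truthy_results = [bool(value) for value in truthy]

# Booleans usually come from logical operations.
comparison = 3 > 2

print(falsy_results, truthy_results, comparison)
```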

Strings are not usually important for numerical analysis, but they are frequently encountered when working with data files, especially when importing data or formatting output for readability. In Python, strings are delimited using either single quotes ('') or double quotes (""), but not a combination of the two (for example, do not open a string with ' and close it with ").

String manipulation is further discussed in Chapter 21.

Substrings within a string can be accessed using slicing. Slicing uses square brackets to specify character indices, where the first index is 0 and the last is n-1 for a string of length n. The most commonly used slices are s[i], which returns the character in position i, s[:i], which returns the leading characters from position 0 to i-1, and s[i:], which returns the trailing characters from position i to n-1. Slicing also supports negative indices, which count from the end of the string.

Slice      Behavior
s[:]       Entire string
s[i]       Character i
s[i:]      Characters i, ..., n-1
s[:i]      Characters 0, ..., i-1
s[i:j]     Characters i, ..., j-1
s[-i]      Character n-i
s[-i:]     Characters n-i, ..., n-1
s[:-i]     Characters 0, ..., n-i-1
s[-j:-i]   Characters n-j, ..., n-i-1, where -j < -i

>>> text = 'Python strings are sliceable.'

IndexError: string index out of range
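The slice rules for strings can be illustrated with the example string used above. This is a minimal sketch assuming Python 3; apart from `text`, the variable names are illustrative.

```python
text = 'Python strings are sliceable.'
n = len(text)

first_character = text[0]      # position 0
leading = text[:6]             # characters 0, ..., 5
middle = text[7:14]            # characters 7, ..., 13
trailing = text[-10:]          # characters n-10, ..., n-1
last_character = text[-1]      # character n-1

# A single index outside 0, ..., n-1 raises an IndexError.
try:
    text[n]
except IndexError as error:
    error_message = str(error)

print(first_character, leading, middle, trailing, last_character)
print(error_message)
```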

Lists are a built-in data type which require other data types to be useful. A list is a collection of other objects.

Lists can store values of any type, including floats, integers, complex numbers, strings, and even other lists. They are useful for organizing collections of data, for example representing a vector as a list of floats, although NumPy arrays and matrices are usually better suited to numerical computation. Lists support slicing to access one or more elements. To create a basic list, enclose the values in square brackets [] and separate them with commas.

# 2-dimensional list (list of lists)

These examples show that lists can be regular, nested and can contain any mix of data types including other lists.
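A minimal sketch of the kinds of lists described here; all names and values are illustrative.

```python
# Empty list and a simple list of integers.
empty = []
x = [1, 2, 3, 4]

# A vector represented as a list of floats.
vector = [1.0, 2.0, 3.0]

# 2-dimensional list (list of lists).
matrix = [[1, 2, 3, 4], [5, 6, 7, 8]]

# Inner lists do not need to have the same length.
jagged = [[1, 2], [3, 4, 5]]

# A list can mix data types, including other lists.
mixed = [1, 1.0, 1 + 0j, 'one', None, True, [1, 2]]
mixed_types = [type(value).__name__ for value in mixed]

print(len(empty), len(matrix), len(jagged[1]), mixed_types)
```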

Lists can be sliced in the same way as strings, although slicing lists is more involved because lists can be multi-dimensional. Basic list slices, such as x[:], x[1:], x[:1], and x[-3:], behave exactly as they do for strings. To fix notation, consider a one-dimensional list x with n elements. Python uses 0-based indexing, so the elements are x_0, x_1, ..., x_{n-1}.

Slice        Behavior
x[:]         Returns all of x
x[i]         Returns x_i
x[i:]        Returns x_i, ..., x_{n-1}
x[:i]        Returns x_0, ..., x_{i-1}
x[i:j]       Returns x_i, x_{i+1}, ..., x_{j-1}
x[i:j:m]     Returns x_i, x_{i+m}, ..., x_{i+m*floor((j-i-1)/m)}, for m > 0
x[-i]        Returns x_{n-i}, except when i = 0 (x[-0] is x[0])
x[-i:]       Returns x_{n-i}, ..., x_{n-1}
x[:-i]       Returns x_0, ..., x_{n-i-1}
x[-j:-i]     Returns x_{n-j}, ..., x_{n-i-1}
x[-j:-i:m]   Returns x_{n-j}, x_{n-j+m}, ..., x_{n-j+m*floor((j-i-1)/m)}, for m > 0

The default list slice uses a unit stride, that is, a step size of one. A different stride can be requested with a third parameter in the slice notation, x[i:j:m], where i is the starting index, j is the ending index (exclusive), and m is the stride. For instance, x[::2] selects every second element of the list and is equivalent to x[0:n:2], where n is the length of the list. A negative stride accesses elements in reverse order; for example, x[::-1] reverses the list. This is not the same as x[0:n:-1], which is empty, because with a negative stride the omitted bounds default to the end and the start of the list respectively. Various examples of accessing elements from one-dimensional lists illustrate these concepts.

IndexError: list index out of range
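The slice and stride rules can be checked on a small list. This is a sketch assuming Python 3; the list of the integers 0 to 9 is chosen so that each value equals its index.

```python
x = list(range(10))      # [0, 1, ..., 9], so each value equals its index
n = len(x)

every_second = x[::2]    # same as x[0:n:2]
strided = x[1:8:3]       # x_1, x_4, x_7
last_three = x[-3:]      # x_{n-3}, ..., x_{n-1}
all_but_two = x[:-2]     # x_0, ..., x_{n-3}
reversed_x = x[::-1]     # reverse order
empty_slice = x[0:n:-1]  # empty: cannot step backwards from 0 to n

# Slices that run past the end are cut short rather than raising an error.
clipped = x[5:100]

# A single index past the end does raise an error.
try:
    x[n]
except IndexError as error:
    error_message = str(error)

print(every_second, strided, last_three, reversed_x, empty_slice, error_message)
```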

Lists can be multidimensional, and slicing can be applied directly in higher dimensions. For example, consider the 2-dimensional list x = [[1,2,3,4], [5,6,7,8]]. With a single index, x[0] returns the first inner list and x[1] returns the second inner list. Since the list returned by x[0] is itself sliceable, it can be indexed or sliced again using x[0][0] or x[0][1:4].
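The indexing steps just described can be written out as follows. The final line, which extracts a column with a list comprehension, is an addition for illustration and is not taken from the text.

```python
x = [[1, 2, 3, 4], [5, 6, 7, 8]]

first_row = x[0]            # first inner list
second_row = x[1]           # second inner list
first_element = x[0][0]     # index the inner list again
row_slice = x[0][1:4]       # slice the inner list
last_of_second = x[1][-1]   # negative indices work on inner lists too

# Lists have no direct column slicing; one option is a list comprehension.
second_column = [row[1] for row in x]

print(first_row, second_row, first_element, row_slice, second_column)
```

The lack of direct column access is one reason NumPy arrays are usually preferred for matrices.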

A number of functions are available for manipulating lists. The most useful are described below.

The `append(value)` method adds a value to the end of a list, and the function `len(x)` returns the number of elements in the list. The `extend(list)` method appends all values from another list. To remove items, `pop(index)` deletes the value at the given index and returns it, whereas `remove(value)` removes the first occurrence of a value. The `count(value)` method counts how many times a value appears in the list, and the `del` statement deletes the elements in a slice of the list.

Elements can also be deleted from lists using the keyword del in combination with a slice.
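The following sketch walks through these operations in sequence on one small list, with the state of the list noted after each step. The starting values are arbitrary.

```python
x = [0, 1, 2, 3]

x.append(4)            # x is now [0, 1, 2, 3, 4]
x.extend([5, 6])       # x is now [0, 1, 2, 3, 4, 5, 6]

popped = x.pop(1)      # removes and returns the value at index 1
                       # x is now [0, 2, 3, 4, 5, 6]

x.remove(3)            # removes the first occurrence of the value 3
                       # x is now [0, 2, 4, 5, 6]

count_of_two = x.count(2)
length_before_del = len(x)

del x[:2]              # deletes the elements in the slice x[:2]

print(popped, count_of_two, length_before_del, x)
```

Note that `pop` takes an index while `remove` takes a value, which is a common source of confusion.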

A tuple is similar to a list in that it can hold multiple pieces of data, often of mixed types. It differs in that it is immutable: once created, the elements of a tuple cannot be changed, added, or removed. Tuples are defined using parentheses (()) instead of square brackets ([]), and they can be sliced just like lists. Lists can be converted to tuples using the `tuple()` function, and tuples can be converted to lists using the `list()` function.
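Immutability applies to the tuple's own elements, not to the contents of mutable objects stored inside it. A minimal sketch, with illustrative names and values:

```python
t = (1, 2.0, 'a')

# Assigning to an element of a tuple raises a TypeError.
try:
    t[0] = 5
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Tuples can be sliced like lists; the result is a new tuple.
tail = t[1:]

# A tuple holding lists: the tuple cannot be altered, but each list can.
nested = ([1, 2], [3, 4])
nested[0][1] = -10

# Convert to a list to make changes, then back to a tuple.
as_list = list(t)
as_list[0] = 5
changed = tuple(as_list)

print(assignment_failed, tail, nested, changed)
```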

>>> x # Contents can change, elements cannot

Title	Introduction to Python for Econometrics, Statistics and Data Analysis
Author	Kevin Sheppard
Institution	University of Oxford
Field	Econometrics
Type	book
Year of publication	2014
City	Oxford

Format
Pages	405