Commit 31bdbbb4 authored by Michael Wimmer's avatar Michael Wimmer
Browse files

Simplified the numpy/scipy examples

parent 5c510a3d
# year hare lynx carrot
1900 30e3 4e3 48300
1901 47.2e3 6.1e3 48200
1902 70.2e3 9.8e3 41500
1903 77.4e3 35.2e3 38200
1904 36.3e3 59.4e3 40600
1905 20.6e3 41.7e3 39800
1906 18.1e3 19e3 38600
1907 21.4e3 13e3 42300
1908 22e3 8.3e3 44500
1909 25.4e3 9.1e3 42100
1910 27.1e3 7.4e3 46000
1911 40.3e3 8e3 46800
1912 57e3 12.3e3 43800
1913 76.6e3 19.5e3 40900
1914 52.3e3 45.7e3 39400
1915 19.5e3 51.1e3 39000
1916 11.2e3 29.7e3 36700
1917 7.6e3 15.8e3 41800
1918 14.6e3 9.7e3 43300
1919 16.2e3 10.1e3 41300
1920 24.7e3 8.6e3 47300
......@@ -11,7 +11,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"At the start of the week we had an introduction to the Python programming language and learned some basic concepts such as:\n",
"At the start of the day we had an introduction to the Python programming language and learned some basic concepts such as:\n",
"\n",
"+ assigning and using **variables** to store values\n",
"+ basic **types** (numbers, strings, lists, dictionaries) for representing data\n",
......@@ -1221,74 +1221,6 @@
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Advanced topics"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Interactive plots with [Holoviews](http://holoviews.org/)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"While Matplotlib provides reasonable plotting, it is not particularly well-adapted to interactive exploration of data sets.\n",
"\n",
"The Holoviews library provides more features for interactively exploring your data.\n",
"\n",
"First we import holoviews and tell it that we want it to display its output in the browser:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import holoviews as hv\n",
"hv.notebook_extension('matplotlib')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can plot our `sinc` function from before, and also slice the data along an axis, \n",
"with the position of the axis controlled by a slider widget:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x = np.linspace(-20, 20, 100)\n",
"y = np.linspace(-20, 20, 100)\n",
"\n",
"z = sinc(x.reshape(100, 1), y.reshape(1, 100))\n",
"\n",
"\n",
"sinc_image = hv.Image(z, bounds=(x[0], y[0], x[-1], y[-1]))\n",
"\n",
"hv.HoloMap({y: sinc_image * hv.HLine(y=y) + sinc_image.sample(y=y)\n",
" for y in np.arange(-19, 20)}, kdims=['Y']).collate().cols(2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
......@@ -1304,7 +1236,9 @@
"\n",
"[Scipy](http://scipy.org) is the core library that provides these more advanced capabilites. In addition there is an ecosystem of so-called \"[SciKits](http://scikits.appspot.com/)\" that build on Scipy to provide functionality adapted to more specific problem domains.\n",
"\n",
"Here, we will look at some of the key functionality that core Scipy provides."
"Here, we will look at some of the key functionality that core Scipy provides.\n",
"\n",
"***Do not feel compelled to study all those examples in detail! They are just provided to give you an overview of what scipy can do - you may find this helpful for example in your final project***"
]
},
{
......@@ -1892,495 +1826,6 @@
"+ sometimes you may come across [funny answers](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) too..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Scenario"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Your predecessor has left you some experimental data, in an excel spreadsheet. You want to plot and analyze the data in ways that are not provided by Excel; what do you do?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Solution"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"+ Copy and paste the data from the excel worksheet to a text file.\n",
"+ Import the data using `numpy.loadtxt`\n",
"+ Analyze and plot\n",
"\n",
"This may seem contrived, but **often the simplest solution is the best**."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Scenario"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You have data in an Excel spreadsheet \"`current_voltage_data.xlsx`\", but it is routinely updated by colleagues. In addition, your supervisor\n",
"wants to have not only the plots you produce, but also the data after you have run your analysis on it,\n",
"and she wants it in an Excel spreadsheet."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Solution"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"+ Google the string: \"`data analysis excel python`\"\n",
"+ First hit is a video tutorial that mentions \"pandas\"\n",
"+ Google \"pandas python excel\"\n",
"+ First hit is documentation from the Pandas website about a function \"`read_excel`\"; seems promising!\n",
"+ Install \"pandas\" with \"`pip install pandas`\"\n",
"+ Try to use `\"read_excel\"` (it will fail complaining about `xlrd`\n",
"+ Install `xlrd` with pip\n",
"+ Import data\n",
"+ Google \"scipy linear regression\"\n",
"+ copy snippet from website\n",
"+ google for \"pandas data to numpy array\"\n",
"+ calculate linear regression\n",
"+ plot\n",
"+ google for how to output data to excel\n",
"+ try to use \"to_excel\" (it will fail complaining about `openpyxl`)\n",
"+ Install `openpyxl` with pip\n",
"+ save to Excel format"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas\n",
"import scipy.stats as stats\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"\n",
"# read in and clean up data\n",
"data_frame = pandas.read_excel('current_voltage_data.xlsx')\n",
"cleaned_data = data_frame.dropna() # remove undefined data entries \n",
"cleaned_data_matrix = cleaned_data.as_matrix() # get data as numpy array\n",
"\n",
"# perform linear regression\n",
"linregress = stats.linregress(cleaned_data_matrix)\n",
"\n",
"# plot the data\n",
"x, y = cleaned_data_matrix.transpose()\n",
"plt.scatter(x, y)\n",
"plt.plot(x, linregress.slope * x + linregress.intercept, linewidth=2, color='r')\n",
"\n",
"cleaned_data.to_excel('output.xlsx', index=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Extended Exercises"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"These exercises require you to use functionality from several of the libraries introduced above"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Explore!\n",
"\n",
"At the start of the notebook there were just a few examples of scientific Python packages for treating different problem domains.\n",
"\n",
"Find a Python package that you think could be useful in your research. Try installing it and testing out some of the usage examples"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Animal Statistics"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The file in `populations.txt` describes the populations of hares and lynxes (and carrots) in nothern Canada over the course of 20 years.\n",
"\n",
"#### Tasks\n",
"\n",
"+ load and plot the data in `populations.txt`. Don't forget to label your axes and produce a legend!\n",
"+ Print the mean and standard deviation of the population for each species\n",
"+ Print which species has the largest population for each year (**hint**: `np.argmax`)\n",
"+ Print for which years the populations of any of the species above 50000\n",
"+ For each species, print the 2 years when their population was the lowest\n",
"+ Plot the change in the hare population and the number of lynxes over time\n",
"+ Calculate the correlation coefficient between the change in the hare population and the number of lynxes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data = np.loadtxt('populations.txt')\n",
"\n",
"years, hares, lynxes, carrots = data.T # trick: columns to variables\n",
"populations = data.T[1:]\n",
"\n",
"species = ('Hare', 'Lynx', 'Carrot')\n",
"\n",
"plt.axes([0.2, 0.1, 0.5, 0.8]) \n",
"plt.plot(years, hares, years, lynxes, years, carrots)\n",
"plt.xlabel('year')\n",
"plt.ylabel('population')\n",
"plt.legend(('Hare', 'Lynx', 'Carrot'), loc=(1.05, 0.5)) "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"means = np.mean(populations, axis=1)\n",
"std_devs = np.std(populations, axis=1)\n",
"for organism, mean, std in zip(species, means, std_devs):\n",
" print(organism, '-- population mean:', mean, ' -- population std dev:', std)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"max_idxs = np.argmax(populations, axis=0)\n",
"for year, i in zip(years, max_idxs):\n",
" print('year', int(year), 'had mostly: ', species[i])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"which_species, which_years = np.argwhere(populations > 50000).T\n",
"print('the following years had > 50000 of an organism:', *years[which_years])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.axes([0.2, 0.1, 0.5, 0.8]) \n",
"plt.plot(years[1:], np.diff(hares), years, lynxes)\n",
"plt.xlabel('year')\n",
"plt.ylabel('population')\n",
"plt.legend(('Change in Hares', 'Lynx', 'Carrot'), loc=(1.05, 0.5)) "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"least_populous_idxs = np.argsort(populations, axis=1)[:, :2]\n",
"\n",
"for organism, least_populous_years in zip(species, years[least_populous_idxs]):\n",
" print(organism, 'was least populous in years: ', *least_populous_years)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print('The correlation coefficient between the lynx population and the change in the Hare population is',\n",
" np.corrcoef(np.diff(hares), lynxes[1:])[1, 0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## [Mandelbrot set](https://en.wikipedia.org/wiki/Mandelbrot_set)\n",
"\n",
"The Mandelbrot set is the set of complex numbers $c$ for which the function $f_c ( z ) = z^2 + c$ does not diverge when iterated from $z = 0$.\n",
"\n",
"In this exercise we will sample and vizualize the Mandelbrot set using Numpy and Matplotlib.\n",
"\n",
"#### Tasks\n",
"+ write a function `mandelbrot` that takes a complex value $c$, evaluates $f_c$ 100 times starting from $0$, and returns the\n",
" absolute value of the result\n",
"+ evaluate your function on a grid of complex numbers (**hint**: use numpy operations and writing `c = x + 1j * y`\n",
" where `x` and `y` are real)\n",
"+ plot the result using `pyplot.imshow`\n",
"+ improve your `mandelbrot` function so as to remove the `RuntimeWarning` that you get when evaluating it at points that \n",
" are not in the Mandelbrot set (what is happening here?)\n",
" \n",
"#### Harder tasks\n",
"+ make a visualisation that allows interactive zooming\n",
"+ make the visualisation re-evaluate the points in the mandelbrot set when\n",
" the user zooms (so that more structure is revealed by zooming)\n",
"+ do you think you can speed up your solution?\n",
"\n",
"+ use [Cython](http://cython.org/) to rewrite your `mandelbrot` function. Take advantage of multiple CPU cores when\n",
" evaluating the function by using [prange](http://cython.readthedocs.io/en/latest/src/userguide/parallelism.html).\n",
" You can use the [Cython cell magic](http://cython.readthedocs.io/en/latest/src/quickstart/build.html) to auto-compile\n",
" your code from within the notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def mandelbrot(c):\n",
" z = 0\n",
" for i in range(100):\n",
" z = z**2 + c\n",
" return np.abs(z)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x = np.linspace(-2, 1, 500)\n",
"y = np.linspace(-1, 1, 500)\n",
"M = x[None, :] + 1j * y[:, None]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"R = mandelbrot(M)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.imshow(R)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%load_ext Cython"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%cython --compile-args=-fopenmp --link-args=-fopenmp --annotate\n",
"\n",
"import numpy as np\n",
"from cython.parallel cimport prange\n",
"from cython cimport boundscheck\n",
"\n",
"cdef double mandelbrot_inner(complex c) nogil:\n",
" cdef complex z = 0\n",
" for i in range(100):\n",
" z = z * z + c\n",
" return z.real * z.real + z.imag * z.imag\n",
"\n",
"@boundscheck(False)\n",
"def mandelbrot_cython(points):\n",
" cdef complex[:] linear_points = points.reshape(-1)\n",
" cdef long i, size = linear_points.shape[0]\n",
" cdef double[:] result = np.empty((linear_points.shape[0],), dtype=float)\n",
" for i in prange(size, nogil=True):\n",
" result[i] = mandelbrot_inner(linear_points[i])\n",
" return np.asarray(result).reshape(points.shape) "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%timeit mandelbrot(M)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%timeit mandelbrot_cython(M)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## [Lorenz attractor](https://en.wikipedia.org/wiki/Lorenz_system)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Lorenz system is a system of ordinary differential equations that displays chaotic solutions for certain parameter values.\n",
"\n",
"The system is defined by\n",
"\n",
"$$\n",
"\\begin{align}\n",
"\\frac{dx}{dt} &= \\sigma(y - x)\\\\\n",
"\\frac{dy}{dt} &= x(\\rho - z) - y\\\\\n",
"\\frac{dz}{dt} &= xy - \\beta z\n",
"\\end{align}\n",
"$$\n",
"\n",
"where $(x, y, z)$ are the coordinates, and $\\beta$, $\\sigma$, and $\\rho$\n",
"are parameters.\n",
"\n",
"In this exercise we will numerically solve the Lorenz system, visualize the solution with Matplotlib, and explore the parameter space\n",
"\n",
"#### Tasks\n",
"+ write a function, `lorenz`, that takes a 3-vector of coordinates $(x, y, z)$ and the current time, and\n",
" returns a 3-vector $(\\dot{x}, \\dot{y}, \\dot{z})$ (use the values $\\beta=8/3$, $\\sigma=10$, and $\\rho=28$\n",
" for now)\n",
"+ use `scipy.integrate.odeint` with your `lorenz` function to solve the lorenz system\n",
"+ plot the trajectory projected onto the $x-y$, $y-z$ and $x-z$ planes, respectively\n",
"+ this system shows sensitivity to initial conditions; estimate the [Lyapunov exponent](https://en.wikipedia.org/wiki/Lyapunov_exponent) of the system.\n",
"\n",
"#### Harder Tasks\n",
"+ create a 3D visualization instead of the 3 projected visualizations\n",
"+ estimate the positions of the \"attractor\" points"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import scipy.integrate as spi\n",
"beta = 8/3\n",
"sigma = 10\n",
"rho = 10\n",
"\n",
"def lorenz(state, t):\n",
" x, y, z = state\n",
" deriv = np.empty((3,))\n",
" deriv[0] = sigma * (y - x)\n",
" deriv[1] = x * (rho - z) - y\n",
" deriv[2] = x * y - beta * z\n",
" return deriv\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"times = np.linspace(0, 50, 10000)\n",
"start = np.ones((3,))\n",
"result = spi.odeint(lorenz, start, times)\n",
"x, y, z = result.transpose()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.plot(x, z)"
]
},
{
"cell_type": "markdown",
"metadata": {},
......@@ -2414,7 +1859,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
"version": "3.6.5"
}
},
"nbformat": 4,
......