Commit 5c510a3d by Michael Wimmer

### reduce number of exercises

parent 2ed3c571
 ... ... @@ -4,7 +4,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ "# List of projects for day 1\n", "# List of possible exercises for basic python\n", "\n", "## 1. Basic exercises\n", "\n", ... ... @@ -131,61 +131,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Programming for biologists\n", "\n", "#### Level: Beginner\n", "\n", "Here are two basic problems from a python programming course expecially [aimed at biologists](http://www.programmingforbiologists.org)\n", "\n", "### Body mass of dinosaurs\n", "\n", "The length of an organism is typically strongly correlated with it's body mass. This is useful because it allows us to estimate the mass of an organism even if we only know its length. This relationship generally takes the form:\n", "\n", "$$\\text{Mass} = a * \\text{Length}^b$$\n", "\n", "Where the parameters $a$ and $b$ vary among groups, mass is given in kg, and length in m. This allometric approach is regularly used to estimate the mass of dinosaurs since we cannot weigh something that is only preserved as bones.\n", "\n", "Different values of $a$ and $b$ are for example ([Seebacher 2001](http://www.jstor.org/stable/4524171\n", "))\n", "\n", "[Therapoda](https://en.wikipedia.org/wiki/Theropoda) (e.g. T-rex): $a=0.73$, $b=3.63$\n", "[Sauropoda](https://en.wikipedia.org/wiki/Sauropoda) (e.g. Brachiosaurus): $a = 214.44$, $b = 1.46$\n", "\n", "get_mass_from_length() that estimates the mass of an organism in kg based on it's length in meters by taking length, a, and b as parameters. To be clear we want to pass the function all 3 values that it needs to estimate a mass as parameters. This makes it much easier to reuse for all species.\n", "\n", "Use this function to compute the mass of [T-rex Trix](https://en.wikipedia.org/wiki/Trix_(dinosaur)) on display in Naturalis, Leiden, which is 13 meters long. 
\n", "Compare to the [Camarasaurus](https://www.naturalis.nl/nl/kennis/collectie/topstukken/camarasaurusskelet/) they have there, too.\n", "\n", "### DNA vs RNA\n", "\n", "Write a function, dna_or_rna(sequence), that determines if a sequence of base pairs is DNA, RNA, or if it is not possible to tell given the sequence provided. Since all the function will know about the material is the sequence the only way to tell the difference between DNA and RNA is that RNA has the base Uracil (u) instead of the base Thymine (t). Have the function return one of three outputs: 'DNA', 'RNA', or 'UNKNOWN'. Use the function and a for loop to print the type of the sequences in the following list.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sequences = ['ttgaatgccttacaactgatcattacacaggcggcatgaagcaaaaatatactgtgaaccaatgcaggcg',\n", " 'gauuauuccccacaaagggagugggauuaggagcugcaucauuuacaagagcagaauguuucaaaugcau',\n", " 'gaaagcaagaaaaggcaggcgaggaagggaagaagggggggaaacc',\n", " 'guuuccuacaguauuugaugagaaugagaguuuacuccuggaagauaauauuagaauguuuacaacugcaccugaucagguggauaaggaagaugaagacu',\n", " 'gauaaggaagaugaagacuuucaggaaucuaauaaaaugcacuccaugaauggauucauguaugggaaucagccggguc']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Optional: For a little extra challenge make your function work with both upper and lower case letters, or even strings with mixed capitalization" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Project Euler\n", "## 2. Project Euler\n", "\n", "#### Level: Beginner - as complicated as you want\n", "\n", ... ... @@ -219,174 +165,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Cryptography \n", "\n", "### Caesar's cipher - trial and error\n", "\n", "#### Level: Beginner\n", "\n", "The following text is encrypted by shifting all letters in the alphabet by a fixed amount\n", "(e.g. a shift by 2 would give A -> C, B -> D, ..., Z -> B). Decrypt it by trying out all possible shifts! 
(Just copy the assignment below to use the text in your python program)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "encrypted_text1 = \"\"\"\n", "RW. XMJWQTHP MTQRJX, BMT BFX ZXZFQQD AJWD QFYJ NS YMJ RTWSNSLX, XFAJ ZUTS YMTXJ\n", "STY NSKWJVZJSY THHFXNTSX BMJS MJ BFX ZU FQQ SNLMY, BFX XJFYJI FY YMJ GWJFPKFXY YFGQJ.\n", "N XYTTI ZUTS YMJ MJFWYM-WZL FSI UNHPJI ZU YMJ XYNHP BMNHM TZW ANXNYTW MFI QJKY GJMNSI\n", "MNR YMJ SNLMY GJKTWJ. NY BFX F KNSJ, YMNHP UNJHJ TK BTTI, GZQGTZX-MJFIJI, TK YMJ XTWY\n", "BMNHM NX PSTBS FX F \"UJSFSL QFBDJW.\" OZXY ZSIJW YMJ MJFI BFX F GWTFI XNQAJW GFSI SJFWQD\n", "FS NSHM FHWTXX. \"YT OFRJX RTWYNRJW, R.W.H.X., KWTR MNX KWNJSIX TK YMJ H.H.M.,\" BFX\n", "JSLWFAJI ZUTS NY, BNYM YMJ IFYJ \"1884.\" NY BFX OZXY XZHM F XYNHP FX YMJ TQI-KFXMNTSJI\n", "KFRNQD UWFHYNYNTSJW ZXJI YT HFWWD—INLSNKNJI, XTQNI, FSI WJFXXZWNSL. \n", "\n", "\"BJQQ, BFYXTS, BMFY IT DTZ RFPJ TK NY?\" \n", "\"\"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To this end, write a *function* that takes an encrypted text as input as well as a shift, and that then\n", "prints the decrypted string.\n", "\n", "If you would like to have some hints, click here.\n", "\n", "
\n", "Remember, you can loop over all letters in a string using for c in encrypted_text1:. Also, you can\n", "check if a letter is between \"A\" to \"Z\" by \"A\" <= c and c <= \"Z\". One way of doing the subsitution is\n", "to generate a dictionary with letters as keys and as values, and then use this to do the substitution in\n", "the for-loop.\n", "
\n", "\n", "A typical mistake you can find here.\n", "\n", "
\n", "There's a subtle logical mistake that is often done here: you might be compelled to first replace all A's \n", "in the string with another letter. Say the shift is 1, and that letter is B. Then you will have in your string\n", "two types of B's: decrypted (from replacing A) and still encrypted (the B from the original encrypted string). If you now replace both type of B's in the next step, you run into problems ...\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Caesar's cipher - frequency analysis\n", "\n", "#### Level: Intermediate\n", "\n", "The Caesar's cypher can be broken by *frequency analysis*: Letters occur in a text with a certain probability. For example, a probability distribution for the English langiage can be found at [wikipedia](https://en.wikipedia.org/wiki/Frequency_analysis).\n", "\n", "Make a probability analysis of the text above. Plot the histogram of the letters using print statements, that looks like \n", "\n", " A ****\n", " B ********\n", " C **\n", " \n", "etc. From that, determine the shift of the Caeser's cypher and decrypt it in one go.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A step aside: Frequency analysis of English texts\n", "\n", "#### Level: Intermediate\n", "\n", "In the previous exercise, you were asked to break a cypher using the frequency of letters as given in [wikipedia](https://en.wikipedia.org/wiki/Frequency_analysis). \n", "\n", "You can create such a histogram of frequency of letters yourself. For this, download the text of \"Romeo and Juliet\" from the file romeo_and_juliet.txt (that was downloade from [project Gutenberg](https://www.gutenberg.org/ebooks/1112.txt.utf-8)), and count the occurences\n", "of all letters A-Z (treating upper case A-Z and lower case a-z the same). Do you find the same result as in the wikipedia article?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Substitution cipher\n", "\n", "#### Level: Intermediate - Advanced\n", "\n", "Now let's look at a more complicated cipher where we replace (uniquely) every letter by some other letter. This is not just described by a simple shift as before, but now we need to find a different mapping for every letter!\n", "\n", "It is a bit difficult to do this by frequency analysis of single letters (although you might identify certain letters). 
It is easier to do this by doing a frequency analysis of [bigrams](https://en.wikipedia.org/wiki/Bigram#Bigram_frequency_in_the_English_language), i.e. two letter combinations.\n", "\n", "Write a code to find out the probability of single letters and of the 10 most frequent bigrams. Use this as input to decypher the following text:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "encrypted_text2 = \"\"\"\n", " \"IEGG ZE, DMISTW, DBMI FT RTN ZMOE TQ TNY LPSPITY'S SIPHO? SPWHE DE BMLE XEEW ST\n", "NWQTYINWMIE MS IT ZPSS BPZ MWF BMLE WT WTIPTW TQ BPS EYYMWF, IBPS MHHPFEWIMG STNLEWPY\n", "XEHTZES TQ PZJTYIMWHE. GEI ZE BEMY RTN YEHTWSIYNHI IBE ZMW XR MW EUMZPWMIPTW TQ PI.\"\n", " \"P IBPWO,\" SMPF P, QTGGTDPWV MS QMY MS P HTNGF IBE ZEIBTFS TQ ZR HTZJMWPTW, \"IBMI FY.\n", "ZTYIPZEY PS M SNHHESSQNG, EGFEYGR ZEFPHMG ZMW, DEGG-ESIEEZEF SPWHE IBTSE DBT OWTD BPZ\n", "VPLE BPZ IBPS ZMYO TQ IBEPY MJJYEHPMIPTW.\"\n", " \"VTTF!\" SMPF BTGZES. \"EUHEGGEWI!\"\n", " \"P IBPWO MGST IBMI IBE JYTXMXPGPIR PS PW QMLTNY TQ BPS XEPWV M HTNWIYR JYMHIPIPTWEY\n", "DBT FTES M VYEMI FEMG TQ BPS LPSPIPWV TW QTTI.\"\n", " \"DBR ST?\"\n", " \"XEHMNSE IBPS SIPHO, IBTNVB TYPVPWMGGR M LEYR BMWFSTZE TWE BMS XEEW ST OWTHOEF MXTNI\n", "IBMI P HMW BMYFGR PZMVPWE M ITDW JYMHIPIPTWEY HMYYRPWV PI. IBE IBPHO-PYTW QEYYNGE PS\n", "DTYW FTDW, ST PI PS ELPFEWI IBMI BE BMS FTWE M VYEMI MZTNWI TQ DMGOPWV DPIB PI.\"\n", " \"JEYQEHIGR STNWF!\" SMPF BTGZES.\n", " \"MWF IBEW MVMPW, IBEYE PS IBE 'QYPEWFS TQ IBE H.H.B.' P SBTNGF VNESS IBMI IT XE IBE\n", "STZEIBPWV BNWI, IBE GTHMG BNWI IT DBTSE ZEZXEYS BE BMS JTSSPXGR VPLEW STZE SNYVPHMG\n", "MSSPSIMWHE, MWF DBPHB BMS ZMFE BPZ M SZMGG JYESEWIMIPTW PW YEINYW.\"\n", " \"YEMGGR, DMISTW, RTN EUHEG RTNYSEGQ,\" SMPF BTGZES, JNSBPWV XMHO BPS HBMPY MWF GPVBIPWV\n", "M HPVMYEIIE. \"PWIEYESIPWV, IBTNVB EGEZEWIMYR,\" SMPF BE MS BE YEINYWEF IT BPS QMLTNYPIE\n", "HTYWEY TQ IBE SEIIEE. 
\"IBEYE MYE HEYIMPWGR TWE TY IDT PWFPHMIPTWS NJTW IBE SIPHO.\n", "PI VPLES NS IBE XMSPS QTY SELEYMG FEFNHIPTWS.\"\n", " \"BMS MWRIBPWV ESHMJEF ZE?\" P MSOEF DPIB STZE SEGQ-PZJTYIMWHE. \"P IYNSI IBMI IBEYE PS\n", "WTIBPWV TQ HTWSEANEWHE DBPHB P BMLE TLEYGTTOEF?\"\n", " \"P MZ MQYMPF, ZR FEMY DMISTW, IBMI ZTSI TQ RTNY HTWHGNSPTWS DEYE EYYTWETNS.\"\n", "\"\"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hint: Don't try to do everything in one go. As you have identified certain letters, substitute those in the text, and then guess other letters. Decrypt the text thus step by step." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Babynames\n", "\n", "#### Level: Intermediate\n", "\n", "The file babynames.txt contains data on all names given to male children in the Netherlands in 2015 (from [SVB](https://www.svb.nl/int/nl/kindernamen/artikelen/top20/jongens/index.jsp). I apologize for only giving boy's names - for some reason the SVB didn't give a file with all girl's names)\n", "\n", "Read the file and do some analysis of the data:\n", "\n", "- Find the most common name. How many percent of Dutch male children were named like this in all of 2015?\n", "- Find the shortest/longest name\n", "- Find the name with the most special characters (\"special\" = not A-Z)\n", "- Find the top 20 names, and give the percentage how often children were named like this. What is the total percentage of the top 20 names, i.e. how often are top 20 names given?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. Find the zero of a function\n", "## 3. Find the zero of a function\n", "\n", "#### Level: Intermediate to Advanced\n", "\n", ... ... @@ -427,7 +206,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" "version": "3.6.5" } }, "nbformat": 4, ... ...