{"id":23,"date":"2019-08-15T09:47:36","date_gmt":"2019-08-15T07:47:36","guid":{"rendered":"https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/?post_type=chapter&#038;p=23"},"modified":"2024-01-28T12:19:01","modified_gmt":"2024-01-28T11:19:01","slug":"introduction-to-r","status":"publish","type":"chapter","link":"https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/chapter\/introduction-to-r\/","title":{"raw":"Introduction to R","rendered":"Introduction to R"},"content":{"raw":"<p style=\"text-align: justify\">In this chapter, we will take a step back and discuss several defining features of R. If you are not yet familiar with any programming language, you will learn the conventional terminology and get to know practical and common concepts of programming.<\/p>\r\n&nbsp;\r\n<p style=\"text-align: justify\"><strong>Why R?<\/strong><\/p>\r\n<a href=\"https:\/\/uzh.mediaspace.cast.switch.ch\/media\/0_2zy3cc4p\">https:\/\/uzh.mediaspace.cast.switch.ch\/media\/0_2zy3cc4p<\/a>\r\n\r\n<span>[iframe width=\"640\" height=\"360\" src=\"https:\/\/tube.switch.ch\/embed\/f5548367<\/span><span>\" frameborder=\"0\" webkitallowfullscreen mozallowfullscreen allowfullscreen]<\/span>\r\n<p style=\"text-align: justify\">In the last chapter, we saw how R can be used to get a handle on large data sets, and if you got this far, there will be no need to recapitulate how statistics can augment linguistic research. If, however, you already know another programming language you might wonder why you should learn R. Well, there are several reasons:<\/p>\r\n\r\n<ul style=\"text-align: justify\">\r\n \t<li>It\u2019s very close to statistics<\/li>\r\n \t<li>It\u2019s powerful, as you will see later on<\/li>\r\n \t<li>It makes minimal use of procedural style and has a functional flavor<\/li>\r\n \t<li>There are many libraries for specific tasks<\/li>\r\n \t<li>The code is often less hackish than e.g. Perl<\/li>\r\n \t<li>It has less syntactic overhead than e.g. 
Perl<\/li>\r\n \t<li>It is less verbose than e.g. Java<\/li>\r\n \t<li>It is conceptually easier than Prolog<\/li>\r\n<\/ul>\r\n<p style=\"text-align: justify\">And, of course, different languages are useful for different purposes. Don\u2019t use R instead of, but in addition to the languages you already know. There are, for example, APIs from Perl, Prolog and Python to R which allow you to integrate the different languages with each other.<\/p>\r\n<p style=\"text-align: justify\">Unlike other statistics programs, R is a free, open source language, and has a very active community committed to programming extensions to R. These extensions are known as libraries, and usually if you have a clear idea of what you want to do in R but can't find a predefined function to do it, you will find that someone has programmed a library which serves your need. We are grateful to guyjantic on Imgur for visualizing the allure of R over other programs:<\/p>\r\n\r\n\r\n[caption id=\"attachment_166\" align=\"aligncenter\" width=\"960\"]<img src=\"https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/1.-Why-R.png\" alt=\"\" class=\"size-full wp-image-166\" width=\"960\" height=\"817\" \/> Figure 2.1: R in comparison (https:\/\/imgur.com\/gallery\/CCr9jE9).[\/caption]\r\n\r\n&nbsp;\r\n<p style=\"text-align: justify\"><strong>Introduction to R<\/strong><\/p>\r\nThis section builds on Gries (2008), chapter 2.\r\n\r\n<a href=\"https:\/\/uzh.mediaspace.cast.switch.ch\/media\/0_3eal1bck\">https:\/\/uzh.mediaspace.cast.switch.ch\/media\/0_3eal1bck<\/a>\r\n<p style=\"text-align: justify\"><span>[iframe width=\"640\" height=\"360\" src=\"https:\/\/tube.switch.ch\/embed\/7ac34dc7<\/span><span>\" frameborder=\"0\" webkitallowfullscreen mozallowfullscreen allowfullscreen]<\/span><\/p>\r\n<p style=\"text-align: justify\">In the last chapter, you already got a taste of some simple commands. 
Here, we briefly recapitulate some of what we discussed in the previous chapter and introduce several more foundational concepts.<\/p>\r\n<p style=\"text-align: justify\">For those of you new to programming, it is useful to think of programming as writing sets of calculations and\/or commands. These commands have to be entered at the prompt in the console of R. Remember that the prompt is the <code>&gt;<\/code>.<\/p>\r\n<p style=\"text-align: justify\">At the most basic level, we can use R as a calculator. In those cases, we just enter our calculations and get the result:<\/p>\r\n\r\n<pre id=\"rstudio_console_output\" class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">17\/2\r\n<\/span><span class=\"GNKRCKGCGSB\">[1] 8.5<\/span><\/span><\/code><\/pre>\r\n<p style=\"text-align: justify\">As we have seen, we can also define variables. Variables can take different forms and structures. In addition to numeric data, we often use character strings in linguistics. Character strings are identifiable by the quotation marks around them, as we will see below. 
Both strings and numerals can come as single values, data frames (which we saw in the last chapter) or vectors, which we discuss at length below.<\/p>\r\nHere you see an example of a variable which contains a string and of one which contains a numeric vector:\r\n<pre id=\"rstudio_console_output\" class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">a&lt;- \"Hello World\"\r\n<\/span><\/span><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">a\r\n<\/span><span class=\"GNKRCKGCGSB\">[1] \"Hello World\"\r\n<\/span><\/span><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">b=c(1,3,4)\r\n<\/span><\/span><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">b\r\n<\/span><span class=\"GNKRCKGCGSB\">[1] 1 3 4<\/span><\/span><\/code><\/pre>\r\n<p style=\"text-align: justify\">Note how the strings are identifiable by the quotation marks and the numerals by the absence of quotation marks.<\/p>\r\n<p style=\"text-align: justify\">So far, we have only shown examples where there is one command per line. However, it is also possible to enter sequences of commands. To do this, we enter several commands separated by a semicolon:<\/p>\r\n\r\n<pre id=\"rstudio_console_output\" class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">a &lt;- 2+3 ; a &lt;- a+a ; a\r\n<\/span><span class=\"GNKRCKGCGSB\">[1] 10<\/span><\/span><\/code><\/pre>\r\n<p style=\"text-align: justify\">In RStudio, we can also use the script to write and execute a sequence of commands. When working in the script window of RStudio, you can write each command on a new line. 
Then you select the lines you want to run and execute them by pressing \"Ctrl+Enter\" or clicking \"Run\" in the top right of the window.<\/p>\r\n\r\n\r\n[caption id=\"attachment_170\" align=\"aligncenter\" width=\"1919\"]<img src=\"https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/2.-Sequence-RStudio.jpg\" alt=\"\" class=\"size-full wp-image-170\" width=\"1919\" height=\"1030\" \/> Figure 2.2: A sequence of commands in the RStudio script[\/caption]\r\n<p style=\"text-align: justify\">As we mentioned above, R contains a lot of predefined functions beyond simple mathematical ones. Unless you use R only as a calculator, you will use functions which take an argument. This means that the function is performed on an object, such as one of our variables. Let's begin by looking at a simple mathematical function which takes an argument: the square root. We can use the predefined function <code>sqrt()<\/code> to calculate the square root of a numeric value. Let's overwrite our variable <em>b<\/em> to become the square root of 5 and then print the result:<\/p>\r\n\r\n<pre id=\"rstudio_console_output\" class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">b &lt;- sqrt(5); print(b)\r\n<\/span><span class=\"GNKRCKGCGSB\">[1] 2.236068<\/span><\/span><\/code><\/pre>\r\n<p style=\"text-align: justify\">A function we will use a lot is <code>plot()<\/code>. Let's assign a numeric vector to the new variable <em>v<\/em> and plot the result. 
The picture below contains the two bottom windows of RStudio, showing the console with the command and the resulting plot.<\/p>\r\n\r\n\r\n[caption id=\"attachment_171\" align=\"aligncenter\" width=\"1903\"]<img src=\"https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/3.-Plotting-vector.jpg\" alt=\"\" class=\"size-full wp-image-171\" width=\"1903\" height=\"518\" \/> Figure 2.3: Plotting vector <em>v<\/em> in RStudio[\/caption]\r\n<p style=\"text-align: justify\">Another function which takes an argument is <code>sample()<\/code>. In fact, <code>sample()<\/code> takes two arguments. Test your intuition of how this function works in the exercise below.<\/p>\r\n<p style=\"text-align: justify\">[h5p id=\"3\"]<\/p>\r\n<p style=\"text-align: justify\">In the last chapter, we already briefly touched on ranges. This is a fairly simple concept, but it can be very useful if you want to generate a bunch of numbers quickly (as sample data, for instance). The operator which enables us to create a range is the <code>:<\/code>. In the example below, we use this to generate a variable containing a numerical vector ranging from 0 through 10:<\/p>\r\n\r\n<pre id=\"rstudio_console_output\" class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">MyRange &lt;- c(0:10) ; MyRange\r\n<\/span><span class=\"GNKRCKGCGSB\"> [1]  0  1  2  3  4  5  6  7  8  9 10<\/span><\/span><\/code><\/pre>\r\n<p style=\"text-align: justify\">A final basic concept is that of indexes, which we also touched on in the last chapter. Indexes refer to particular positions in vectors and can be accessed using square brackets. 
For instance, to access the fifth position in the <em>MyRange<\/em> variable, we can use the command:<\/p>\r\n\r\n<pre id=\"rstudio_console_output\" class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">MyRange[5]\r\n<\/span><span class=\"GNKRCKGCGSB\">[1] 4<\/span><\/span><\/code><\/pre>\r\n<p style=\"text-align: justify\">Because R starts counting positions at 1 and <em>MyRange<\/em> begins with 0, the fifth position holds the value 4.<\/p>\r\n<p style=\"text-align: justify\">As we discussed at some length in the last chapter, we can also use indexes to access specific cells in data frames. The data frame examples, of course, had two dimensions (rows and columns), while the vector, being a simple sequence of values, only has one. If you try to incorporate a second dimension in <em>MyRange<\/em> by inserting a comma in the brackets, you get an error message:<\/p>\r\n\r\n<pre id=\"rstudio_console_output\" class=\"GNKRCKGCGSB\"><span><code><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">MyRange[5,]\r\n<\/span><\/code><span class=\"GNKRCKGCASB ace_constant\"><code>Error in MyRange[5, ] : incorrect number of dimensions<\/code> <\/span><\/span><\/pre>\r\n&nbsp;\r\n\r\n<strong>Sampling and looping<\/strong>\r\n<p style=\"text-align: justify\">So far, so good, or so we hope. As you might expect, things get more complicated fairly quickly. 
For instance, we can formulate conditional commands in R, meaning that the output depends on an element which is determined by a step in a sequence of commands, rather than by us.<\/p>\r\n<a href=\"https:\/\/uzh.mediaspace.cast.switch.ch\/media\/0_vijio01q\">https:\/\/uzh.mediaspace.cast.switch.ch\/media\/0_vijio01q<\/a>\r\n<p style=\"text-align: justify\"><span>[iframe width=\"640\" height=\"360\" src=\"https:\/\/tube.switch.ch\/embed\/54b95209<\/span><span>\" frameborder=\"0\" webkitallowfullscreen mozallowfullscreen allowfullscreen]<\/span><\/p>\r\nTake, for example, the sequence of commands below:\r\n<pre id=\"rstudio_console_output\" class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">randomthingy = sample(c(1,2,3),1) ; if (randomthingy &gt; 1) {print (\"You are lucky\")} else {print(\"I am sorry\")}\r\n<\/span><span class=\"GNKRCKGCGSB\">[1] \"You are lucky\"<\/span><\/span><\/code><\/pre>\r\n<p style=\"text-align: justify\">You are now, of course, familiar with the <code>sample()<\/code> function, so you will immediately see that the first command in this sequence assigns the value 1, 2 or 3 to the variable <em>randomthingy<\/em>. The second part of the sequence is a little more complicated: it is a logical test. The argument in brackets behind the <code>if<\/code>, <code>(randomthingy &gt; 1)<\/code>, tests whether the number we sampled before is greater than one. If this is the case, the output is \"You are lucky\". If, however, the number we sampled is equal to one, the output is \"I am sorry\".<\/p>\r\n<p style=\"text-align: justify\">You can see that the conditional output is put in quotation marks, because we want the output to be character strings, and in curly brackets, { }. The code enclosed in a pair of curly brackets is referred to as a block. 
In this example, the blocks allow us to conditionally define the output.<\/p>\r\n<p style=\"text-align: justify\">Blocks are also used in another important feature of programming: loops. Loops allow us to run multiple iterations of a command. Before we discuss what this means in detail, take a look at this example:<\/p>\r\n\r\n<pre id=\"rstudio_console_output\" class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">for (i in c(1:3)) {myS = sample(c(1,2,3),i) ; print(myS)}\r\n<\/span><span class=\"GNKRCKGCGSB\">[1] 2\r\n[1] 2 1\r\n[1] 1 2 3<\/span><\/span><\/code><\/pre>\r\n<p style=\"text-align: justify\">What is going on here? This is perhaps easiest to grasp by looking at what is in the block. You can see that it is a sequence of familiar commands: in the first, we draw a sample from a vector and assign it to the variable <em>myS<\/em>, and in the second we tell R to display the variable.<\/p>\r\n<p style=\"text-align: justify\">The new element here is the second argument in the <code>sample()<\/code> function, <code>i<\/code>. What is <code>i<\/code>? If you look at the <code>for()<\/code> command in front of the block, you get the answer. Here, we tell R to iterate through the vector ranging from 1 through 3. This means that in the first iteration, <code>i<\/code> takes the value 1. Then, the commands in the block are executed. In the second iteration, <code>i<\/code> takes the value 2. Again, the commands in the block are executed. In the third and final iteration, <code>i<\/code> takes the value 3. Again, the commands in the block are executed.<\/p>\r\n<p style=\"text-align: justify\">If it was slightly repetitive to read the explanation, you have understood why loops are a useful concept. Imagine you want to run a loop through not three, but, say, 10'000 iterations. 
With this small but powerful line of code, you can sometimes save yourself hours of tedious manual labor.<\/p>\r\n&nbsp;\r\n\r\n<strong>Default assumptions<\/strong>\r\n\r\n<a href=\"https:\/\/uzh.mediaspace.cast.switch.ch\/media\/0_oje11hip\">https:\/\/uzh.mediaspace.cast.switch.ch\/media\/0_oje11hip<\/a>\r\n<p style=\"text-align: justify\"><span>[iframe width=\"640\" height=\"360\" src=\"https:\/\/tube.switch.ch\/embed\/89735fe5<\/span><span>\" frameborder=\"0\" webkitallowfullscreen mozallowfullscreen allowfullscreen]<\/span><\/p>\r\n<p style=\"text-align: justify\">Often it is not entirely clear what a command is supposed to do, and there are many default assumptions being made by R in the background. Run the following loop, which is identical to the one above except that it runs not through three but through eight iterations:<\/p>\r\n\r\n<pre id=\"rstudio_console_output\" class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">for (i in c(1:8)) {myS = sample(c(1,2,3),i) ; print(myS)}\r\n<\/span><span class=\"GNKRCKGCGSB\">[1] 2\r\n[1] 1 2\r\n[1] 1 2 3\r\n<\/span><span class=\"GNKRCKGCASB ace_constant\">Error in sample.int(length(x), size, replace, prob) : \r\n  cannot take a sample larger than the population when 'replace = FALSE'<\/span><\/span><\/code><\/pre>\r\n<p style=\"text-align: justify\">The error message R prints shows that we are working with the default assumption under which the <code>sample()<\/code> function samples without replacement. In other words, there are only three values which can be drawn, and once all three numbers are drawn R prints the error message because it cannot draw the fourth value which would be required in the fourth iteration. 
In order to avoid this error, we can augment our sample command with a further argument, where we define \u201creplace = TRUE\u201d:<\/p>\r\n\r\n<pre id=\"rstudio_console_output\" class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">for (i in c(1:8)) {myS = sample(c(1,2,3),i, replace=TRUE) ; print(myS)}\r\n<\/span><span class=\"GNKRCKGCGSB\">[1] 3\r\n[1] 2 2\r\n[1] 3 2 2\r\n[1] 1 2 3 2\r\n[1] 2 1 1 1 2\r\n[1] 1 2 2 2 3 3\r\n[1] 2 3 1 2 3 3 3\r\n[1] 2 2 2 3 2 1 3 1<\/span><\/span><\/code><\/pre>\r\n<p style=\"text-align: justify\">With this additional argument, we allow for sampling with replacement, which means that R can draw each value multiple times. Now the loop runs without error messages.<\/p>\r\n<p style=\"text-align: justify\">It is not always obvious which functions take which arguments, let alone what the default assumptions of these arguments are. In those cases, we can look up the default arguments using R\u2019s inbuilt documentation using one of the two help commands: <code>help(sample)<\/code>\u00a0or <code>?sample<\/code>. If you are using the R base, a new window will open with the documentation file, and if you are using RStudio, the documentation will open in the bottom right window.<\/p>\r\n\r\n\r\n[caption id=\"attachment_176\" align=\"aligncenter\" width=\"1348\"]<img src=\"https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/Help-in-RStudio.jpg\" alt=\"\" class=\"size-full wp-image-176\" width=\"1348\" height=\"515\" \/> Figure 2.4: Screenshot of the documentation to sample() in RStudio[\/caption]\r\n<p style=\"text-align: justify\">In the beginning, the information in the documentation may be more confusing than helpful because of the style it is presented in. Most help pages feature several examples at the end of the page, and it is worth scrolling down to look at those if you get stuck. 
In time, you will become used to the information in the help pages and understand it readily.<\/p>\r\n&nbsp;\r\n\r\n<strong>Nestedness and assistance<\/strong>\r\n\r\n<a href=\"https:\/\/uzh.mediaspace.cast.switch.ch\/media\/0_ycqp6cqk\">https:\/\/uzh.mediaspace.cast.switch.ch\/media\/0_ycqp6cqk<\/a>\r\n<p style=\"text-align: justify\"><span>[iframe width=\"640\" height=\"360\" src=\"https:\/\/tube.switch.ch\/embed\/14f1cbb8<\/span><span>\" frameborder=\"0\" webkitallowfullscreen mozallowfullscreen allowfullscreen]<\/span><\/p>\r\n<p style=\"text-align: justify\">A further concept we want to introduce is nestedness. Nestedness describes situations where commands contain other commands, leading to a layering of commands. Take a look at the following example:<\/p>\r\n\r\n<pre id=\"rstudio_console_output\" class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">sort(sample(c(1,2,3),5, replace=T))\r\n<\/span><span class=\"GNKRCKGCGSB\">[1] 1 1 2 3 3\r\n<\/span><\/span><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">sort(sample(c(1,2,3),5, replace=T))\r\n<\/span><span class=\"GNKRCKGCGSB\">[1] 1 1 2 2 3<\/span><\/span><\/code><\/pre>\r\n<p style=\"text-align: justify\">First, let's focus on the familiar elements. In brackets, we see a sample function which draws 5 observations with replacement from a vector ranging from 1 through 3. We discussed similar examples above, and there we always received an unsorted sample. Here, however, the numbers are sorted in ascending order. You are correct in thinking that this is because of the <code>sort()<\/code> function. What may be less evident is that we are dealing with nested commands here. 
You can clearly see this if you perform the steps separately:<\/p>\r\n\r\n<pre id=\"rstudio_console_output\" class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">chaos &lt;- sample(c(1,2,3),5, replace=T)\r\n<\/span><\/span><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">chaos\r\n<\/span><span class=\"GNKRCKGCGSB\">[1] 3 1 3 3 3\r\n<\/span><\/span><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">order &lt;- sort(chaos)\r\n<\/span><\/span><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">order\r\n<\/span><span class=\"GNKRCKGCGSB\">[1] 1 3 3 3 3<\/span><\/span><\/code><\/pre>\r\n<p style=\"text-align: justify\">With an example like this, nestedness is fairly easy to see and to work with. However, consider this more complex set of nested elements:<\/p>\r\n\r\n<pre id=\"rstudio_console_output\" class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">L &lt;- sample(c(1:10),1); if (L == 1) {print (\"O dear ...\")} else { if (L &gt; 8) {print(\"you lucky bastard\")}}<\/span><\/span><\/code><\/pre>\r\n<p style=\"text-align: justify\">[h5p id=\"5\"]<\/p>\r\n<p style=\"text-align: justify\">It is with examples like this that you will start to get a feeling for the importance of keeping a good overview over your brackets. 
In base R, you are responsible for getting the brackets right yourself, while RStudio offers a bit of assistance on that front by highlighting the matching bracket in grey when your text cursor touches a bracket:<\/p>\r\n\r\n\r\n[caption id=\"attachment_178\" align=\"aligncenter\" width=\"204\"]<img src=\"https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/5.-RStudio-Cursor-aid.jpg\" alt=\"\" class=\"size-full wp-image-178\" width=\"204\" height=\"28\" \/> Figure 2.5: RStudio's grey bracket assistance[\/caption]\r\n<p style=\"text-align: justify\">A user-friendly editing function which is included both in RStudio and base R is the history. You can scroll through the history with the \"arrow-up\" key, which saves you the trouble of copying and pasting if you want to run a command multiple times without (or with few) adjustments. If you scroll back too far, you can use the \"arrow-down\" key to get to where you want to go. The history allows you to retrieve any command you entered during the current session.<\/p>\r\n<a href=\"https:\/\/uzh.mediaspace.cast.switch.ch\/media\/0_6hlfnata\">https:\/\/uzh.mediaspace.cast.switch.ch\/media\/0_6hlfnata<\/a>\r\n\r\n<span>[iframe width=\"640\" height=\"360\" src=\"https:\/\/tube.switch.ch\/embed\/bd179a3d<\/span><span>\" frameborder=\"0\" webkitallowfullscreen mozallowfullscreen allowfullscreen]<\/span>\r\n<p style=\"text-align: justify\">Another user-friendly editing function included in R is the auto-fill. If you type the letters \u201csor\u201d into the console and hit the TAB key, you will see a small window open which contains all possible completions of \u201csor\u201d, from <code>sort<\/code> to <code>sortedXyData<\/code>. 
Use the up- and down-arrow keys to navigate through your options and hit TAB again on the one you want.<\/p>\r\n\r\n\r\n[caption id=\"attachment_179\" align=\"aligncenter\" width=\"844\"]<img src=\"https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/6.-RStudio-autofill.jpg\" alt=\"\" class=\"size-full wp-image-179\" width=\"844\" height=\"181\" \/> Figure 2.6: The auto-fill function in RStudio[\/caption]\r\n\r\nArmed with these concepts, you should be prepared for most of the things we discuss from here on.\r\n\r\n&nbsp;\r\n<p style=\"text-align: justify\"><strong>Mistakes and the importance of practice<\/strong><\/p>\r\n<p style=\"text-align: justify\">In the beginning, you will make a lot of mistakes. This does not mean anything is wrong with you. From beginners to experienced programmers, everyone makes mistakes and runs into error messages from time to time. In fact, the more time you spend programming, the more error messages you will see. In a sense, learning a programming language forces you to confront your mistakes more than other solitary practices, since computers are extremely nit-picking and take everything you enter exactly as it is, regardless of whether it makes sense or not. To help you avoid more error messages than are necessary, we compiled a list of several frequent mistakes and let you sort out the correct from the incorrect commands.<\/p>\r\n<p style=\"text-align: justify\">[h5p id=\"7\"]<\/p>\r\n<a href=\"https:\/\/uzh.mediaspace.cast.switch.ch\/media\/0_c7ry0ht7\">https:\/\/uzh.mediaspace.cast.switch.ch\/media\/0_c7ry0ht7<\/a>\r\n<p style=\"text-align: justify\"><span>[iframe width=\"640\" height=\"360\" src=\"https:\/\/tube.switch.ch\/embed\/5bf28343<\/span><span>\" frameborder=\"0\" webkitallowfullscreen mozallowfullscreen allowfullscreen]<\/span><\/p>\r\n<p style=\"text-align: justify\">You will spend a lot of time browsing manuals, debugging and googling. 
As the saying goes, there\u2019s no learning like learning the hard way. While this is perhaps a negative way of looking at things, it has a decidedly positive flipside: you are in constant interaction with the computer, which means it is easy to make progress. You get error messages, and you get output, and the more you practice, the more often the output will be what you want it to be. At some point you will find solutions to errors more quickly and, eventually, the error messages will become rarer (although if you stop getting any error messages, it probably means that you have stopped programming).<\/p>\r\n<p style=\"text-align: justify\">What is more, you are not alone in this. The online community has answered a lot of questions, and often it is enough to just google the error message to find a solution. For instance, if you search Google for \u201cR random number generate\u201d you will find many different ways of doing just that. We should also add, though, that it becomes easier to find solutions online if you are familiar with the R terminology, so we would encourage you to think, for example, about \u201cdata frames\u201d rather than \u201ctables\u201d when working with R. We cannot practice for you, but we will make sure that you pick up the correct terminology from reading this book.<\/p>\r\n<a href=\"https:\/\/uzh.mediaspace.cast.switch.ch\/media\/0_r1y7v2d8\">https:\/\/uzh.mediaspace.cast.switch.ch\/media\/0_r1y7v2d8<\/a>\r\n<p style=\"text-align: justify\"><span>[iframe width=\"640\" height=\"360\" src=\"https:\/\/tube.switch.ch\/embed\/9c7af654<\/span><span>\" frameborder=\"0\" webkitallowfullscreen mozallowfullscreen allowfullscreen]<\/span><\/p>\r\n<p style=\"text-align: justify\">After our litany on practicing, we want to give one more piece of advice before beginning with statistics proper. Some of the most frustrating challenges beginners encounter occur when importing files into R. 
We already discussed one way of doing that in the previous chapter, where you imported a tab-separated table with headers into R. In addition to tab-separated files, there are two more structures which are worth knowing. Open these two raw text files in new tabs:<\/p>\r\n<a href=\"https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/vector1.txt\">vector1.txt<\/a> and <a href=\"https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/vector2.txt\">vector2.txt<\/a>\r\n<p style=\"text-align: justify\">First of all, since they are one-dimensional we can identify both of these files as vectors. Then, we see that the data points in the first vector are separated by new lines, and those in the second are separated by spaces. When importing a file it is very helpful to be aware of how the file is structured.<\/p>\r\n<p style=\"text-align: justify\">For instance, to import <em>vector1.txt<\/em> through a file selection window, much like with the <code>file.choose()<\/code> command that we saw in the previous chapter, we can use a simple piece of code. Note that <code>choose.files()<\/code> is only available on Windows; on other systems, use <code>file.choose()<\/code> in its place:<\/p>\r\n\r\n<pre><code>&gt; v1 &lt;- scan(file=choose.files())<\/code><\/pre>\r\n<p style=\"text-align: justify\">Again, a window will open and we can open <em>vector1.txt<\/em> from wherever we saved it. However, if you try to use the same code to open <em>vector2.txt<\/em> you get an error message:<\/p>\r\n\r\n<pre id=\"rstudio_console_output\" class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; v2<\/span><span class=\"GNKRCKGCMRB ace_keyword\"> &lt;- scan(file=choose.files())\r\n<\/span><span class=\"GNKRCKGCASB ace_constant\">Error in scan(file = choose.files()) : \r\n  scan() expected 'a real', got 'anton'<\/span><\/span><\/code><\/pre>\r\n<p style=\"text-align: justify\">The error message occurs for two reasons. Firstly, <em>vector2.txt <\/em>contains character strings, not numbers. 
Secondly, the data points are separated, as we said, by spaces. We have to augment the <code>scan()<\/code> command with two arguments:<\/p>\r\n\r\n<pre id=\"rstudio_console_output\" class=\"GNKRCKGCGSB\"><span><code><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">v2 &lt;- scan(file=choose.files(),sep=\" \",what = \"char\"); v2\r\n<\/span><span class=\"GNKRCKGCASB ace_constant\">Read 5 items\r\n<\/span><\/code><span class=\"GNKRCKGCGSB\"><code>[1] \"anton\"  \"berta\"  \"caesar\" \"dora\"   \"emil\" <\/code> <\/span><\/span><\/pre>\r\n<p style=\"text-align: justify\">These arguments, which are also called flags, specify the separator and the data type. If you don't specify the data type, you cannot open <em>vector2.txt<\/em>; stating the separator explicitly documents how the file is structured.<\/p>\r\n<p style=\"text-align: justify\">When working on a script in R over multiple sessions, it becomes tedious to manually select the file you work with each time. In order to target the correct file automatically, adapt the file path and enter the absolute file location:<\/p>\r\n\r\n<pre id=\"rstudio_console_output\" class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">v2 &lt;- scan(file=\"C:\/Users\/Max\/Documents\/R\/stats for linguistics\/chapter 2\/vector2.txt\", what=\"char\", sep=\" \")\r\n<\/span><span class=\"GNKRCKGCASB ace_constant\">Read 5 items<\/span><\/span><\/code><\/pre>\r\n<p style=\"text-align: justify\">This also works with the <code>read.table()<\/code> function which we used in the previous chapter:<\/p>\r\n\r\n<pre id=\"rstudio_console_output\" class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">verbs &lt;- read.table(file=\"C:\/Users\/Max\/Documents\/R\/stats for linguistics\/chapter 1\/verbs.txt\", header=TRUE, sep=\"\\t\")<\/span><\/span><\/code><\/pre>\r\n<p style=\"text-align: justify\">So far, we have only 
discussed how to get data into R. But, of course, sometimes we want to edit the data in R and then save the results. For this we can use the <code>cat()<\/code> function.<\/p>\r\n<p style=\"text-align: justify\">Imagine, for instance, that you want to create a list of possible names for your unborn child. So far, you have compiled a list of alphabetically sorted names: the <em>v2<\/em> you imported above. You feel that five names just don't cut it; you want at least seven. So you add the next two candidates to the list:<\/p>\r\n\r\n<pre id=\"rstudio_console_output\" class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">v2 &lt;- append(v2, c(\"friedrich\", \"gudrun\"))<\/span><\/span><\/code><\/pre>\r\n<p style=\"text-align: justify\">As a next step, you want to save the new variable with seven names so you can print the list and discuss the new additions with your spouse. To do so, you need the <code>cat()<\/code> function and the arguments it takes: the variable you want to save, the file path and the separator type:<\/p>\r\n\r\n<pre id=\"rstudio_console_output\" class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">cat(v2, file=\"C:\/Users\/Max\/Documents\/R\/stats for linguistics\/chapter 2\/vector2_more_names.txt\", sep=\"\\n\")<\/span><\/span><\/code><\/pre>\r\n<p style=\"text-align: justify\">Here, we decided to save the longer list as a new file, with line breaks separating the names. 
If you want to overwrite an existing file, you can also pass <code>file.choose()<\/code> as the <code>file<\/code> argument:<\/p>\r\n\r\n<pre id=\"rstudio_console_output\" class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">cat(v2, file=file.choose())<\/span><\/span><\/code><\/pre>\r\n&nbsp;\r\n\r\n<strong>Vectors<\/strong>\r\n<p style=\"text-align: justify\">Finally, since we are already working with them, let us add some words about vectors. We have discussed how vectors can be imported, lengthened and saved. With <em>v1<\/em> and <em>v2<\/em> we have seen that we can store both numeric and character data in vectors. In the example above, we played around with <em>v2<\/em>, so now let's take a look at what we can do with the numeric vector, <em>v1<\/em>.<\/p>\r\nAs we did above, we can add more information to the vector:\r\n<pre id=\"rstudio_console_output\" class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">numbers &lt;- append(v1,c(8,9,11)) ; numbers\r\n<\/span><span class=\"GNKRCKGCGSB\">[1]  1  2  3  4  5  8  9 11<\/span><\/span><\/code><\/pre>\r\nThen we can calculate the mean:\r\n<pre id=\"rstudio_console_output\" class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">mean(numbers)\r\n<\/span><span class=\"GNKRCKGCGSB\">[1] 5.375<\/span><\/span><\/code><\/pre>\r\nOr run it through a loop:\r\n<pre id=\"rstudio_console_output\" class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">for (n in c(1:10)) {numbers = append((mean(numbers)+length(numbers)),numbers)} ; numbers\r\n<\/span><span class=\"GNKRCKGCGSB\"> [1] 30.65330 28.71213 26.77463 24.84129 22.91272 20.98965 19.07298 17.16389 15.26389 13.37500  1.00000\r\n[12]  2.00000  3.00000  4.00000  5.00000  8.00000  9.00000 
11.00000<\/span><\/span><\/code><\/pre>\r\n<p style=\"text-align: justify\">There are many ways to manipulate numeric vectors, and we will discuss them more in subsequent chapters.<\/p>\r\n<p style=\"text-align: justify\">What happens if we encounter a vector which contains both numbers and words? We can try that out by generating one:<\/p>\r\n\r\n<pre id=\"rstudio_console_output\" class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">a &lt;- c(1,2,3, \"hi!\")<\/span><\/span><\/code><\/pre>\r\nFirst, let's check whether this is even a vector:\r\n<pre id=\"rstudio_console_output\" class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">is.vector(a)\r\n<\/span><span class=\"GNKRCKGCGSB\">[1] TRUE<\/span><\/span><\/code><\/pre>\r\nWe can also check out whether <em>a<\/em> has as many positions as we think it does:\r\n<pre id=\"rstudio_console_output\" class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">length(a)\r\n<\/span><span class=\"GNKRCKGCGSB\">[1] 4<\/span><\/span><\/code><\/pre>\r\nThen, we can use the <code>str()<\/code> function to view the structure of this object <em>a<\/em>:\r\n<pre id=\"rstudio_console_output\" class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">str(a)\r\n<\/span><span class=\"GNKRCKGCGSB\"> chr [1:4] \"1\" \"2\" \"3\" \"hi!\"<\/span><\/span><\/code><\/pre>\r\n<p style=\"text-align: justify\">Here, we see that even the numbers in this vector are characters. 
This makes sense since R cannot coerce an arbitrary character string such as \"hi!\" into a number:<\/p>\r\n\r\n<pre id=\"rstudio_console_output\" class=\"GNKRCKGCGSB\"><span><code><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">as.integer(a)\r\n<\/span><span class=\"GNKRCKGCGSB\">[1]  1  2  3 NA\r\n<\/span><span class=\"GNKRCKGCASB ace_constant\">Warning message:\r\n<\/span><\/code><span class=\"GNKRCKGCASB ace_constant\"><code>NAs introduced by coercion<\/code> <\/span><\/span><\/pre>\r\n<p style=\"text-align: justify\">And since numbers can be converted into characters, this is what R does.<\/p>\r\nUse your understanding of how vectors work to build a simple random name generator:\r\n\r\n[h5p id=\"35\"]\r\n<p style=\"text-align: justify\">In the next couple of chapters, we will discuss the fundamentals of statistics, which means we will be working mostly with numeric data. Find out what this means in detail by continuing to the next chapter, <a href=\"https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/chapter\/basics-of-descriptive-statistics\/\">Basics of Descriptive Statistics<\/a>.<\/p>\r\n&nbsp;\r\n<p style=\"text-align: justify\">If you are interested in acquainting yourself with further useful functions in R before continuing, we recommend reading the documentation of and playing around with the following functions:<\/p>\r\n\r\n<ul>\r\n \t<li>as.character<\/li>\r\n \t<li>as.integer<\/li>\r\n \t<li>attach<\/li>\r\n \t<li>plot<\/li>\r\n \t<li>hist<\/li>\r\n \t<li>barplot<\/li>\r\n \t<li>pie<\/li>\r\n<\/ul>\r\n<p style=\"text-align: justify\">For further reading and more exercises we recommend reading chapter 2 in Gries (2008) and visiting his website at <a href=\"http:\/\/www.stgries.info\/research\/sflwr\/sflwr.html\">http:\/\/www.stgries.info\/research\/sflwr\/sflwr.html<\/a>.<\/p>\r\n&nbsp;\r\n<p style=\"text-align: justify\"><strong>Reference:<\/strong><\/p>\r\nGries, Stefan Th. 2008. 
<em>Statistik f\u00fcr Sprachwissenschaftler<\/em>. G\u00f6ttingen: Vandenhoeck &amp; Ruprecht.","rendered":"<p style=\"text-align: justify\">In this chapter, we will take a step back and discuss several defining features of R. If you are not yet familiar with any programming language, you will learn the conventional terminology and get to know practical and common concepts of programming.<\/p>\n<p>&nbsp;<\/p>\n<p style=\"text-align: justify\"><strong>Why R?<\/strong><\/p>\n<p><a href=\"https:\/\/uzh.mediaspace.cast.switch.ch\/media\/0_2zy3cc4p\">https:\/\/uzh.mediaspace.cast.switch.ch\/media\/0_2zy3cc4p<\/a><\/p>\n<p><span><br \/>\n<!-- iframe plugin v.6.0 wordpress.org\/plugins\/iframe\/ --><br \/>\n<iframe loading=\"lazy\" width=\"640\" height=\"360\" src=\"https:\/\/tube.switch.ch\/embed\/f5548367\/spanspan\" frameborder=\"0\" 0=\"webkitallowfullscreen\" 1=\"mozallowfullscreen\" 2=\"allowfullscreen\" scrolling=\"yes\" class=\"iframe-class\"><\/iframe><br \/>\n<\/span><\/p>\n<p style=\"text-align: justify\">In the last chapter, we saw how R can be used to get a handle on large data sets, and if you got this far, there will be no need to recapitulate how statistics can augment linguistic research. If, however, you already know another programming language you might wonder why you should learn R. Well, there are several reasons:<\/p>\n<ul style=\"text-align: justify\">\n<li>It\u2019s very close to statistics<\/li>\n<li>It\u2019s powerful, as you will see later on<\/li>\n<li>It makes minimal use of procedural style and has a functional flavor<\/li>\n<li>There are many libraries for specific tasks<\/li>\n<li>The code is often less hackish than e.g. Perl<\/li>\n<li>It has less syntactic overhead than e.g. Perl<\/li>\n<li>It is less verbose than e.g. Java<\/li>\n<li>It is conceptually easier than Prolog<\/li>\n<\/ul>\n<p style=\"text-align: justify\">And, of course, different languages are useful for different purposes. 
Don\u2019t use R instead of, but in addition to the languages you already know. There are, for example, APIs from Perl, Prolog and Python to R which allow you to integrate the different languages with each other.<\/p>\n<p style=\"text-align: justify\">Unlike other statistics programs, R is a free, open source language, and has a very active community committed to programming extensions to R. These extensions are known as libraries, and usually if you have a clear idea of what you want to do in R but can&#8217;t find a predefined function to do it, you will find that someone has programmed a library which serves your need. We are grateful to guyjantic on Imgur for visualizing the allure of R over other programs:<\/p>\n<figure id=\"attachment_166\" aria-describedby=\"caption-attachment-166\" style=\"width: 960px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/1.-Why-R.png\" alt=\"\" class=\"size-full wp-image-166\" width=\"960\" height=\"817\" srcset=\"https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/1.-Why-R.png 960w, https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/1.-Why-R-300x255.png 300w, https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/1.-Why-R-768x654.png 768w, https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/1.-Why-R-65x55.png 65w, https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/1.-Why-R-225x191.png 225w, https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/1.-Why-R-350x298.png 350w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/><figcaption id=\"caption-attachment-166\" class=\"wp-caption-text\">Figure 2.1: R in comparison 
(https:\/\/imgur.com\/gallery\/CCr9jE9).<\/figcaption><\/figure>\n<p>&nbsp;<\/p>\n<p style=\"text-align: justify\"><strong>Introduction to R<\/strong><\/p>\n<p>This section builds on Gries (2008), chapter 2.<\/p>\n<p><a href=\"https:\/\/uzh.mediaspace.cast.switch.ch\/media\/0_3eal1bck\">https:\/\/uzh.mediaspace.cast.switch.ch\/media\/0_3eal1bck<\/a><\/p>\n<p style=\"text-align: justify\"><span><br \/>\n<!-- iframe plugin v.6.0 wordpress.org\/plugins\/iframe\/ --><br \/>\n<iframe loading=\"lazy\" width=\"640\" height=\"360\" src=\"https:\/\/tube.switch.ch\/embed\/7ac34dc7\/spanspan\" frameborder=\"0\" 0=\"webkitallowfullscreen\" 1=\"mozallowfullscreen\" 2=\"allowfullscreen\" scrolling=\"yes\" class=\"iframe-class\"><\/iframe><br \/>\n<\/span><\/p>\n<p style=\"text-align: justify\">In the last chapter, you already got a taste of some simple commands. Here, we briefly recapitulate some of what we discussed in the previous chapter and introduce several more foundational concepts.<\/p>\n<p style=\"text-align: justify\">For those of you new to programming, it is useful to think of programming as writing sets of calculations and\/or commands. These commands have to be entered at the prompt in the console of R. Remember that the prompt is the <code>&gt;<\/code>.<\/p>\n<p style=\"text-align: justify\">At the most basic level, we can use R as a calculator. In those cases, we just enter our calculations and get the result:<\/p>\n<pre id=\"rstudio_console_output\" class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">17\/2\r\n<\/span><span class=\"GNKRCKGCGSB\">[1] 8.5<\/span><\/span><\/code><\/pre>\n<p style=\"text-align: justify\">As we have seen, we can also define variables. Variables can take different forms and structures. In addition to numeric data, we often use character strings in linguistics. Character strings are identifiable by the quotation marks around them, as we will see below. 
Both strings and numerals can come as single values, data frames (which we saw in the last chapter) or vectors, which we discuss at length below.<\/p>\n<p>Here you see an example of a variable which contains a string and of one which contains a numeric vector:<\/p>\n<pre class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">a&lt;- \"Hello World\"\r\n<\/span><\/span><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">a\r\n<\/span><span class=\"GNKRCKGCGSB\">[1] \"Hello World\"\r\n<\/span><\/span><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">b=c(1,3,4)\r\n<\/span><\/span><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">b\r\n<\/span><span class=\"GNKRCKGCGSB\">[1] 1 3 4<\/span><\/span><\/code><\/pre>\n<p style=\"text-align: justify\">Note how the strings are identifiable by the quotation marks and the numerals by the absence of quotation marks.<\/p>\n<p style=\"text-align: justify\">So far, we have only shown examples where there is one command per line. However, it is also possible to enter sequences of commands. To do this, we enter several commands separated by a semicolon:<\/p>\n<pre class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">a &lt;- 2+3 ; a &lt;- a+a ; a\r\n<\/span><span class=\"GNKRCKGCGSB\">[1] 10<\/span><\/span><\/code><\/pre>\n<p style=\"text-align: justify\">In RStudio, we can also use the script to write and execute a sequence of commands. When working in the script window of RStudio, you can write each command on a new line. 
Then you select the lines you want to run and execute them by pressing &#8220;Ctrl+Enter&#8221; or clicking &#8220;Run&#8221; in the top right of the window.<\/p>\n<figure id=\"attachment_170\" aria-describedby=\"caption-attachment-170\" style=\"width: 1919px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/2.-Sequence-RStudio.jpg\" alt=\"\" class=\"size-full wp-image-170\" width=\"1919\" height=\"1030\" srcset=\"https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/2.-Sequence-RStudio.jpg 1919w, https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/2.-Sequence-RStudio-300x161.jpg 300w, https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/2.-Sequence-RStudio-768x412.jpg 768w, https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/2.-Sequence-RStudio-1024x550.jpg 1024w, https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/2.-Sequence-RStudio-65x35.jpg 65w, https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/2.-Sequence-RStudio-225x121.jpg 225w, https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/2.-Sequence-RStudio-350x188.jpg 350w\" sizes=\"auto, (max-width: 1919px) 100vw, 1919px\" \/><figcaption id=\"caption-attachment-170\" class=\"wp-caption-text\">Figure 2.2: A sequence of commands in the RStudio script<\/figcaption><\/figure>\n<p style=\"text-align: justify\">As we mentioned above, R contains a lot of predefined functions beyond simple mathematical ones. Unless you use R only as a calculator, you will use functions which take an argument. This means that the function is performed on an object, like one of our variables. 
Let&#8217;s begin by looking at a simple mathematical function which takes an argument: the square root. We can use the predefined function <code>sqrt()<\/code> to calculate the square root of a numeric value. Let&#8217;s overwrite our variable <em>b<\/em> to become the square root of 5 and then print the result:<\/p>\n<pre class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">b &lt;- sqrt(5); print(b)\r\n<\/span><span class=\"GNKRCKGCGSB\">[1] 2.236068<\/span><\/span><\/code><\/pre>\n<p style=\"text-align: justify\">A function we will use a lot is <code>plot()<\/code>. Let&#8217;s assign a numeric vector to the new variable <em>v<\/em> and plot the result. The picture below contains the two bottom windows of RStudio, showing the console with the command and the resulting plot.<\/p>\n<figure id=\"attachment_171\" aria-describedby=\"caption-attachment-171\" style=\"width: 1903px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/3.-Plotting-vector.jpg\" alt=\"\" class=\"size-full wp-image-171\" width=\"1903\" height=\"518\" srcset=\"https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/3.-Plotting-vector.jpg 1903w, https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/3.-Plotting-vector-300x82.jpg 300w, https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/3.-Plotting-vector-768x209.jpg 768w, https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/3.-Plotting-vector-1024x279.jpg 1024w, https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/3.-Plotting-vector-65x18.jpg 65w, 
https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/3.-Plotting-vector-225x61.jpg 225w, https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/3.-Plotting-vector-350x95.jpg 350w\" sizes=\"auto, (max-width: 1903px) 100vw, 1903px\" \/><figcaption id=\"caption-attachment-171\" class=\"wp-caption-text\">Figure 2.3: Plotting vector <em>v<\/em> in RStudio<\/figcaption><\/figure>\n<p style=\"text-align: justify\">Another function which takes an argument is <code>sample()<\/code>. In fact, <code>sample()<\/code> takes two arguments. Test your intuition of how this function works in the exercise below.<\/p>\n<p style=\"text-align: justify\">\n<div id=\"h5p-3\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-3\" class=\"h5p-iframe\" data-content-id=\"3\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"2.1 Arguments of sample()\"><\/iframe><\/div>\n<\/div>\n<p style=\"text-align: justify\">In the last chapter, we already briefly touched on ranges. This is a fairly simple concept, but it can be very useful if you want to generate a bunch of numbers quickly (as sample data, for instance). The operator which enables us to create a range is the <code>:<\/code>. In the example below, we use this to generate a variable containing a numerical vector ranging from 0 through 10:<\/p>\n<pre class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">MyRange &lt;- c(0:10) ; MyRange\r\n<\/span><span class=\"GNKRCKGCGSB\"> [1]  0  1  2  3  4  5  6  7  8  9 10<\/span><\/span><\/code><\/pre>\n<p style=\"text-align: justify\">A final basic concept is that of indexes, which we also touched on in the last chapter. Indexes refer to particular positions in vectors and can be accessed using square brackets. 
For instance, to access the fifth position in the <em>MyRange<\/em> variable, we can use the command:<\/p>\n<pre class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">MyRange[5]\r\n<\/span><span class=\"GNKRCKGCGSB\">[1] 4<\/span><\/span><\/code><\/pre>\n<p style=\"text-align: justify\">As we discussed at some length in the last chapter, we can also use indexes to access specific cells in data frames. The data frame examples, of course, had two dimensions (rows and columns), while the vector, being a simple sequence of values, only has one. If you try to incorporate a second dimension in <em>MyRange<\/em> by inserting a comma in the brackets, you get an error message:<\/p>\n<pre class=\"GNKRCKGCGSB\"><span><code><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">MyRange[5,]\r\n<\/span><\/code><span class=\"GNKRCKGCASB ace_constant\"><code>Error in MyRange[5, ] : incorrect number of dimensions<\/code> <\/span><\/span><\/pre>\n<p>&nbsp;<\/p>\n<p><strong>Sampling and looping<\/strong><\/p>\n<p style=\"text-align: justify\">So far, so good, or so we hope. As you might expect, things get more complicated fairly quickly. 
For instance, we can formulate conditional commands in R, meaning that the output depends on an element which is determined by a step in a sequence of commands, rather than by us.<\/p>\n<p><a href=\"https:\/\/uzh.mediaspace.cast.switch.ch\/media\/0_vijio01q\">https:\/\/uzh.mediaspace.cast.switch.ch\/media\/0_vijio01q<\/a><\/p>\n<p style=\"text-align: justify\"><span><br \/>\n<!-- iframe plugin v.6.0 wordpress.org\/plugins\/iframe\/ --><br \/>\n<iframe loading=\"lazy\" width=\"640\" height=\"360\" src=\"https:\/\/tube.switch.ch\/embed\/54b95209\/spanspan\" frameborder=\"0\" 0=\"webkitallowfullscreen\" 1=\"mozallowfullscreen\" 2=\"allowfullscreen\" scrolling=\"yes\" class=\"iframe-class\"><\/iframe><br \/>\n<\/span><\/p>\n<p>Take, for example, the sequence of commands below:<\/p>\n<pre class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">randomthingy = sample(c(1,2,3),1) ; if (randomthingy &gt; 1) {print (\"You are lucky\")} else {print(\"I am sorry\")}\r\n<\/span><span class=\"GNKRCKGCGSB\">[1] \"You are lucky\"<\/span><\/span><\/code><\/pre>\n<p style=\"text-align: justify\">You are now, of course, familiar with the <code>sample()<\/code> function, so you will immediately see that the first command in this sequence assigns the value 1, 2 or 3 to the variable randomthingy. The second part of the sequence is a little more complicated: it is a logical test. The condition in brackets after the <code>if<\/code>, <code>(randomthingy &gt; 1)<\/code>, tests whether the number we sampled before is greater than one. If this is the case, the output is &#8220;You are lucky&#8221;. If, however, the number we sampled is equal to one, the output is &#8220;I am sorry&#8221;.<\/p>\n<p style=\"text-align: justify\">You can see that the conditional output is put in quotation marks, because we want the output to be character strings, and in curly brackets, { }. The code enclosed in curly brackets is referred to as a block. 
In this example, the blocks allow us to conditionally define the output.<\/p>\n<p style=\"text-align: justify\">Blocks are also used in another important feature of programming: loops. Loops allow us to run multiple iterations of a command. Before we discuss what this means in detail, take a look at this example:<\/p>\n<pre class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">for (i in c(1:3)) {myS = sample(c(1,2,3),i) ; print(myS)}\r\n<\/span><span class=\"GNKRCKGCGSB\">[1] 2\r\n[1] 2 1\r\n[1] 1 2 3<\/span><\/span><\/code><\/pre>\n<p style=\"text-align: justify\">What is going on here? This is perhaps easiest to grasp by looking at what is in the block. You can see that it is a sequence of familiar commands: in the first, we draw a sample from a vector and assign it to the variable <em>myS<\/em>, and in the second we tell R to display the variable.<\/p>\n<p style=\"text-align: justify\">The new element here is the second argument in the <code>sample()<\/code> function, <code>i<\/code>. What is <code>i<\/code>? If you look at the <code>for()<\/code> command in front of the block, you get the answer. Here, we tell R to iterate through the vector ranging from 1 through 3. This means that in the first iteration, <code>i<\/code> takes the value 1. Then, the commands in the block are executed. In the second iteration, <code>i<\/code> takes the value 2. Again, the commands in the block are executed. In the third and final iteration, <code>i<\/code> takes the value 3. Again, the commands in the block are executed.<\/p>\n<p style=\"text-align: justify\">If it was slightly repetitive to read the explanation, you have understood why loops are a useful concept. Imagine you want to run a loop through not three, but, say, 10&#8217;000 iterations. 
With this small but powerful line of code, you can sometimes save yourself hours of tedious manual labor.<\/p>\n<p>&nbsp;<\/p>\n<p><strong>Default assumptions<\/strong><\/p>\n<p><a href=\"https:\/\/uzh.mediaspace.cast.switch.ch\/media\/0_oje11hip\">https:\/\/uzh.mediaspace.cast.switch.ch\/media\/0_oje11hip<\/a><\/p>\n<p style=\"text-align: justify\"><span><br \/>\n<!-- iframe plugin v.6.0 wordpress.org\/plugins\/iframe\/ --><br \/>\n<iframe loading=\"lazy\" width=\"640\" height=\"360\" src=\"https:\/\/tube.switch.ch\/embed\/89735fe5\/spanspan\" frameborder=\"0\" 0=\"webkitallowfullscreen\" 1=\"mozallowfullscreen\" 2=\"allowfullscreen\" scrolling=\"yes\" class=\"iframe-class\"><\/iframe><br \/>\n<\/span><\/p>\n<p style=\"text-align: justify\">Often it is not entirely clear what a command is supposed to do, and there are many default assumptions being made by R in the background. Run the following loop, which is identical to the one above except that it runs not through three but through eight iterations:<\/p>\n<pre class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">for (i in c(1:8)) {myS = sample(c(1,2,3),i) ; print(myS)}\r\n<\/span><span class=\"GNKRCKGCGSB\">[1] 2\r\n[1] 1 2\r\n[1] 1 2 3\r\n<\/span><span class=\"GNKRCKGCASB ace_constant\">Error in sample.int(length(x), size, replace, prob) : \r\n  cannot take a sample larger than the population when 'replace = FALSE'<\/span><\/span><\/code><\/pre>\n<p style=\"text-align: justify\">The error message R prints shows that we are working with the default assumption under which the <code>sample()<\/code> function samples without replacement. In other words, there are only three values which can be drawn, and once all three numbers are drawn R prints the error message because it cannot draw the fourth value which would be required in the fourth iteration. 
In order to avoid this error, we can augment our sample command with a further argument, where we define \u201creplace = TRUE\u201d:<\/p>\n<pre class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">for (i in c(1:8)) {myS = sample(c(1,2,3),i, replace=TRUE) ; print(myS)}\r\n<\/span><span class=\"GNKRCKGCGSB\">[1] 3\r\n[1] 2 2\r\n[1] 3 2 2\r\n[1] 1 2 3 2\r\n[1] 2 1 1 1 2\r\n[1] 1 2 2 2 3 3\r\n[1] 2 3 1 2 3 3 3\r\n[1] 2 2 2 3 2 1 3 1<\/span><\/span><\/code><\/pre>\n<p style=\"text-align: justify\">With this additional argument, we allow for sampling with replacement, which means that R can draw each value multiple times. Now the loop runs without error messages.<\/p>\n<p style=\"text-align: justify\">It is not always obvious which functions take which arguments, let alone what the default assumptions of these arguments are. In those cases, we can look up the default arguments using R\u2019s inbuilt documentation using one of the two help commands: <code>help(sample)<\/code>\u00a0or <code>?sample<\/code>. 
If you are using base R, a new window will open with the documentation file, and if you are using RStudio, the documentation will open in the bottom right window.<\/p>\n<figure id=\"attachment_176\" aria-describedby=\"caption-attachment-176\" style=\"width: 1348px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/Help-in-RStudio.jpg\" alt=\"\" class=\"size-full wp-image-176\" width=\"1348\" height=\"515\" srcset=\"https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/Help-in-RStudio.jpg 1348w, https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/Help-in-RStudio-300x115.jpg 300w, https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/Help-in-RStudio-768x293.jpg 768w, https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/Help-in-RStudio-1024x391.jpg 1024w, https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/Help-in-RStudio-65x25.jpg 65w, https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/Help-in-RStudio-225x86.jpg 225w, https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/Help-in-RStudio-350x134.jpg 350w\" sizes=\"auto, (max-width: 1348px) 100vw, 1348px\" \/><figcaption id=\"caption-attachment-176\" class=\"wp-caption-text\">Figure 2.4: Screenshot of the documentation to sample() in RStudio<\/figcaption><\/figure>\n<p style=\"text-align: justify\">In the beginning, the information in the documentation may be more confusing than helpful because of the style it is presented in. Most help pages feature several examples at the end of the page, and it is worth scrolling down to look at those if you get stuck. 
In time, you will become used to the information in the help pages and understand it readily.<\/p>\n<p>&nbsp;<\/p>\n<p><strong>Nestedness and assistance<\/strong><\/p>\n<p><a href=\"https:\/\/uzh.mediaspace.cast.switch.ch\/media\/0_ycqp6cqk\">https:\/\/uzh.mediaspace.cast.switch.ch\/media\/0_ycqp6cqk<\/a><\/p>\n<p style=\"text-align: justify\"><span><br \/>\n<!-- iframe plugin v.6.0 wordpress.org\/plugins\/iframe\/ --><br \/>\n<iframe loading=\"lazy\" width=\"640\" height=\"360\" src=\"https:\/\/tube.switch.ch\/embed\/14f1cbb8\/spanspan\" frameborder=\"0\" 0=\"webkitallowfullscreen\" 1=\"mozallowfullscreen\" 2=\"allowfullscreen\" scrolling=\"yes\" class=\"iframe-class\"><\/iframe><br \/>\n<\/span><\/p>\n<p style=\"text-align: justify\">A further concept we want to introduce is nestedness. Nestedness describes situations where commands contain other commands, leading to a layering of commands. Take a look at the following example:<\/p>\n<pre class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">sort(sample(c(1,2,3),5, replace=T))\r\n<\/span><span class=\"GNKRCKGCGSB\">[1] 1 1 2 3 3\r\n<\/span><\/span><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">sort(sample(c(1,2,3),5, replace=T))\r\n<\/span><span class=\"GNKRCKGCGSB\">[1] 1 1 2 2 3<\/span><\/span><\/code><\/pre>\n<p style=\"text-align: justify\">First, let&#8217;s focus on the familiar elements. In brackets, we see a sample function which draws 5 observations with replacement from a vector ranging from 1 through 3. We discussed similar examples above, and there we always received an unsorted sample. Here, however, the numbers are sorted in ascending order. You are correct in thinking that this is because of the <code>sort()<\/code> function. What may be less evident is that we are dealing with nested commands here. 
You can clearly see this if you perform the steps separately:<\/p>\n<pre class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">chaos &lt;- sample(c(1,2,3),5, replace=T)\r\n<\/span><\/span><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">chaos\r\n<\/span><span class=\"GNKRCKGCGSB\">[1] 3 1 3 3 3\r\n<\/span><\/span><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">order &lt;- sort(chaos)\r\n<\/span><\/span><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">order\r\n<\/span><span class=\"GNKRCKGCGSB\">[1] 1 3 3 3 3<\/span><\/span><\/code><\/pre>\n<p style=\"text-align: justify\">With an example like this, nestedness is fairly easy to see and to work with. However, consider this more complex set of nested elements:<\/p>\n<pre class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">L &lt;- sample(c(1:10),1); if (L == 1) {print (\"O dear ...\")} else { if (L &gt; 8) {print(\"you lucky bastard\")}}<\/span><\/span><\/code><\/pre>\n<p style=\"text-align: justify\">\n<div id=\"h5p-5\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-5\" class=\"h5p-iframe\" data-content-id=\"5\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"2.2 Nestedness (correct)\"><\/iframe><\/div>\n<\/div>\n<p style=\"text-align: justify\">It is with examples like this that you will start to get a feeling for the importance of keeping a good overview over your brackets. 
In base R, you are responsible for getting the brackets right yourself, while RStudio offers a bit of assistance on that front by coloring the matching bracket grey when your text cursor touches a bracket:<\/p>\n<figure id=\"attachment_178\" aria-describedby=\"caption-attachment-178\" style=\"width: 204px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/5.-RStudio-Cursor-aid.jpg\" alt=\"\" class=\"size-full wp-image-178\" width=\"204\" height=\"28\" srcset=\"https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/5.-RStudio-Cursor-aid.jpg 204w, https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/5.-RStudio-Cursor-aid-65x9.jpg 65w\" sizes=\"auto, (max-width: 204px) 100vw, 204px\" \/><figcaption id=\"caption-attachment-178\" class=\"wp-caption-text\">Figure 2.5: RStudio&#8217;s grey bracket assistance<\/figcaption><\/figure>\n<p style=\"text-align: justify\">A user-friendly editing function which is included both in RStudio and base R is the history. You can scroll through the history with the &#8220;arrow-up&#8221; key, which saves you the trouble of copying and pasting if you want to run a command multiple times without (or with few) adjustments. If you scroll back too far, you can use the &#8220;arrow-down&#8221; key to get to where you want to go. 
The history allows you to retrieve any command you entered during the current session.<\/p>\n<p><a href=\"https:\/\/uzh.mediaspace.cast.switch.ch\/media\/0_6hlfnata\">https:\/\/uzh.mediaspace.cast.switch.ch\/media\/0_6hlfnata<\/a><\/p>\n<p><span><br \/>\n<!-- iframe plugin v.6.0 wordpress.org\/plugins\/iframe\/ --><br \/>\n<iframe loading=\"lazy\" width=\"640\" height=\"360\" src=\"https:\/\/tube.switch.ch\/embed\/bd179a3d\/spanspan\" frameborder=\"0\" 0=\"webkitallowfullscreen\" 1=\"mozallowfullscreen\" 2=\"allowfullscreen\" scrolling=\"yes\" class=\"iframe-class\"><\/iframe><br \/>\n<\/span><\/p>\n<p style=\"text-align: justify\">Another user-friendly editing function included in R is the auto-fill. If you type the letters \u201csor\u201d into the console and hit the TAB key, you will see a small window open which contains all possible completions of \u201csor\u201d, from <code>sort<\/code> to <code>sortedXyData<\/code>. Use the up- and down-arrow keys to navigate through your options and hit TAB again on the one you want.<\/p>\n<figure id=\"attachment_179\" aria-describedby=\"caption-attachment-179\" style=\"width: 844px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/6.-RStudio-autofill.jpg\" alt=\"\" class=\"size-full wp-image-179\" width=\"844\" height=\"181\" srcset=\"https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/6.-RStudio-autofill.jpg 844w, https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/6.-RStudio-autofill-300x64.jpg 300w, https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/6.-RStudio-autofill-768x165.jpg 768w, https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/6.-RStudio-autofill-65x14.jpg 65w, 
https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/6.-RStudio-autofill-225x48.jpg 225w, https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/6.-RStudio-autofill-350x75.jpg 350w\" sizes=\"auto, (max-width: 844px) 100vw, 844px\" \/><figcaption id=\"caption-attachment-179\" class=\"wp-caption-text\">Figure 2.6: The auto-fill function in RStudio<\/figcaption><\/figure>\n<p>Armed with these concepts, you should be prepared for most of the things we discuss from here on.<\/p>\n<p>&nbsp;<\/p>\n<p style=\"text-align: justify\"><strong>Mistakes and the importance of practice<\/strong><\/p>\n<p style=\"text-align: justify\">In the beginning, you will make a lot of mistakes. This does not mean anything is wrong with you. From beginners to experienced programmers, everyone makes mistakes and runs into error messages from time to time. In fact, the more time you spend programming, the more error messages you will see. In a sense, learning a programming language forces you to confront your mistakes more than other solitary practices, since computers are extremely nit-picking and take everything you enter exactly as it is, regardless of whether it makes sense or not. 
To help you avoid more error messages than are necessary, we have compiled a list of several frequent mistakes and let you sort the correct commands from the incorrect ones.<\/p>\n<p style=\"text-align: justify\">\n<div id=\"h5p-7\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-7\" class=\"h5p-iframe\" data-content-id=\"7\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"2.3 Drag and drop the pieces of code into the appropriate category\"><\/iframe><\/div>\n<\/div>\n<p><a href=\"https:\/\/uzh.mediaspace.cast.switch.ch\/media\/0_c7ry0ht7\">https:\/\/uzh.mediaspace.cast.switch.ch\/media\/0_c7ry0ht7<\/a><\/p>\n<p style=\"text-align: justify\"><span><br \/>\n<!-- iframe plugin v.6.0 wordpress.org\/plugins\/iframe\/ --><br \/>\n<iframe loading=\"lazy\" width=\"640\" height=\"360\" src=\"https:\/\/tube.switch.ch\/embed\/5bf28343\/spanspan\" frameborder=\"0\" 0=\"webkitallowfullscreen\" 1=\"mozallowfullscreen\" 2=\"allowfullscreen\" scrolling=\"yes\" class=\"iframe-class\"><\/iframe><br \/>\n<\/span><\/p>\n<p style=\"text-align: justify\">You will spend a lot of time browsing manuals, debugging and googling. As the saying goes, there\u2019s no learning like learning the hard way. While this is perhaps a negative way of looking at things, it has a decidedly positive flipside: you are in constant interaction with the computer, which means it is easy to make progress. You get error messages, and you get output. The more you practice, the more often the output will be what you want it to be. At some point you will find solutions to errors more quickly and, eventually, the error messages will become rarer (although if you stop getting any error messages, it probably means that you have stopped programming).<\/p>\n<p style=\"text-align: justify\">What is more, you are not alone in this. The online community has answered a lot of questions, and often it is enough to just google the error message to find a solution. 
For instance, if you search Google for \u201cR random number generate\u201d you will find many different ways of doing just that. We should also add, though, that it becomes easier to find solutions online if you are familiar with the R terminology, so we would encourage you to think, for example, about \u201cdata frames\u201d rather than \u201ctables\u201d when working with R. We cannot practice for you, but we will make sure that you pick up the correct terminology from reading this book.<\/p>\n<p><a href=\"https:\/\/uzh.mediaspace.cast.switch.ch\/media\/0_r1y7v2d8\">https:\/\/uzh.mediaspace.cast.switch.ch\/media\/0_r1y7v2d8<\/a><\/p>\n<p style=\"text-align: justify\"><span><br \/>\n<!-- iframe plugin v.6.0 wordpress.org\/plugins\/iframe\/ --><br \/>\n<iframe loading=\"lazy\" width=\"640\" height=\"360\" src=\"https:\/\/tube.switch.ch\/embed\/9c7af654\/spanspan\" frameborder=\"0\" 0=\"webkitallowfullscreen\" 1=\"mozallowfullscreen\" 2=\"allowfullscreen\" scrolling=\"yes\" class=\"iframe-class\"><\/iframe><br \/>\n<\/span><\/p>\n<p style=\"text-align: justify\">After our litany on practicing, we want to give one more piece of advice before beginning with statistics proper. Some of the difficulties beginners encounter, and indeed some of the most frustrating challenges, occur when importing files into R. We already discussed one way of doing that in the previous chapter, where you imported a tab-separated table with headers into R. In addition to tab-separated files, there are two more structures which are worth knowing. 
Open these two raw text files in new tabs:<\/p>\n<p><a href=\"https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/vector1.txt\">vector1.txt<\/a> and <a href=\"https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-content\/uploads\/sites\/23\/2019\/08\/vector2.txt\">vector2.txt<\/a><\/p>\n<p style=\"text-align: justify\">First of all, since they are one-dimensional we can identify both of these files as vectors. Then, we see that the data points in the first vector are separated by new lines, and those in the second are separated by a whitespace. When importing a file it is very helpful to be aware of how the file is structured.<\/p>\n<p style=\"text-align: justify\">For instance, if we import <em>vector1.txt<\/em> using a file dialog like the <code>file.choose()<\/code> command that we saw in the previous chapter (here we use <code>choose.files()<\/code>, its Windows-only counterpart), we can use a simple piece of code:<\/p>\n<pre><code>&gt; v1 &lt;- scan(file=choose.files())<\/code><\/pre>\n<p style=\"text-align: justify\">Again, a window will open and we can open <em>vector1.txt<\/em> from wherever we saved it. However, if you try to use the same code to open <em>vector2.txt<\/em> you get an error message:<\/p>\n<pre class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; v2<\/span><span class=\"GNKRCKGCMRB ace_keyword\"> &lt;- scan(file=choose.files())\r\n<\/span><span class=\"GNKRCKGCASB ace_constant\">Error in scan(file = choose.files()) : \r\n  scan() expected 'a real', got 'anton'<\/span><\/span><\/code><\/pre>\n<p style=\"text-align: justify\">The error message occurs because <em>vector2.txt<\/em> contains character strings, whereas <code>scan()<\/code> expects numbers by default. In addition, the data points are separated, as we said, by a whitespace, which we can state explicitly. 
We therefore augment the <code>scan()<\/code> command with two arguments:<\/p>\n<pre class=\"GNKRCKGCGSB\"><span><code><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">v2 &lt;- scan(file=choose.files(),sep=\" \",what = \"char\"); v2\r\n<\/span><span class=\"GNKRCKGCASB ace_constant\">Read 5 items\r\n<\/span><\/code><span class=\"GNKRCKGCGSB\"><code>[1] \"anton\"  \"berta\"  \"caesar\" \"dora\"   \"emil\" <\/code> <\/span><\/span><\/pre>\n<p style=\"text-align: justify\">These arguments, which are also called flags, specify the separator and the data type. The data type is not optional here: if you don&#8217;t set <code>what<\/code>, you cannot open <em>vector2.txt<\/em>.<\/p>\n<p style=\"text-align: justify\">When working on a script in R over multiple sessions, it becomes tedious to manually select the file you work with each time. In order to target the correct file automatically, adapt the file path and enter the absolute file location:<\/p>\n<pre class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">v2 &lt;- scan(file=\"C:\/Users\/Max\/Documents\/R\/stats for linguistics\/chapter 2\/vector2.txt\", what=\"char\", sep=\" \")\r\n<\/span><span class=\"GNKRCKGCASB ace_constant\">Read 5 items<\/span><\/span><\/code><\/pre>\n<p style=\"text-align: justify\">This also works with the <code>read.table()<\/code> function which we used in the previous chapter:<\/p>\n<pre class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">verbs &lt;- read.table(file=\"C:\/Users\/Max\/Documents\/R\/stats for linguistics\/chapter 1\/verbs.txt\", header=TRUE, sep=\"\\t\")<\/span><\/span><\/code><\/pre>\n<p style=\"text-align: justify\">So far, we have only discussed how to get data into R. But, of course, sometimes we want to edit the data in R and then save the results. 
For this we can use the <code>cat()<\/code> function.<\/p>\n<p style=\"text-align: justify\">Imagine, for instance, that you want to create a list of possible names for your unborn child. So far, you have compiled a list of alphabetically sorted names: the <em>v2<\/em> you imported above. You feel that five names just don&#8217;t cut it; you want at least seven. So you add the two next candidates to the list:<\/p>\n<pre class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">v2 &lt;- append(v2, c(\"friedrich\", \"gudrun\"))<\/span><\/span><\/code><\/pre>\n<p style=\"text-align: justify\">In a next step, you want to save the new variable with seven names so you can print the list and discuss the new additions with your spouse. To do so, you need the <code>cat()<\/code> function and the arguments it takes: the variable you want to save, the file path and the separator type:<\/p>\n<pre class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">cat(v2, file=\"C:\/Users\/Max\/Documents\/R\/stats for linguistics\/chapter 2\/vector2_more_names.txt\", sep=\"\\n\")<\/span><\/span><\/code><\/pre>\n<p style=\"text-align: justify\">Here, we decided to save the longer list as a new file, with line breaks separating the names. If you want to overwrite an existing file, you can also use the <code>file.choose()<\/code> function, passing it to the <code>file<\/code> argument by name:<\/p>\n<pre class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">cat(v2, file=file.choose())<\/span><\/span><\/code><\/pre>\n<p>&nbsp;<\/p>\n<p><strong>Vectors<\/strong><\/p>\n<p style=\"text-align: justify\">Finally, since we are already working with them, let us add some words about vectors. We have discussed how vectors can be imported, lengthened and saved. 
With <em>v1<\/em> and <em>v2<\/em> we have seen that we can store both numeric and character data in vectors. In the example above, we played around with <em>v2<\/em>, so now let&#8217;s take a look at what we can do with the numeric vector, <em>v1<\/em>.<\/p>\n<p>Like we did above, we can add more information to the vector:<\/p>\n<pre class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">numbers &lt;- append(v1,c(8,9,11)) ; numbers\r\n<\/span><span class=\"GNKRCKGCGSB\">[1]  1  2  3  4  5  8  9 11<\/span><\/span><\/code><\/pre>\n<p>Then we can calculate the mean:<\/p>\n<pre class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">mean(numbers)\r\n<\/span><span class=\"GNKRCKGCGSB\">[1] 5.375<\/span><\/span><\/code><\/pre>\n<p>Or run it through a loop:<\/p>\n<pre class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">for (n in c(1:10)) {numbers = append((mean(numbers)+length(numbers)),numbers)} ; numbers\r\n<\/span><span class=\"GNKRCKGCGSB\"> [1] 30.65330 28.71213 26.77463 24.84129 22.91272 20.98965 19.07298 17.16389 15.26389 13.37500  1.00000\r\n[12]  2.00000  3.00000  4.00000  5.00000  8.00000  9.00000 11.00000<\/span><\/span><\/code><\/pre>\n<p style=\"text-align: justify\">There are many ways to manipulate numeric vectors, and we will discuss them more in subsequent chapters.<\/p>\n<p style=\"text-align: justify\">What happens if we encounter a vector which contains both numbers and words? 
We can try that out by generating one:<\/p>\n<pre class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">a &lt;- c(1,2,3, \"hi!\")<\/span><\/span><\/code><\/pre>\n<p>First, let&#8217;s check whether this is even a vector:<\/p>\n<pre class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">is.vector(a)\r\n<\/span><span class=\"GNKRCKGCGSB\">[1] TRUE<\/span><\/span><\/code><\/pre>\n<p>We can also check out whether <em>a<\/em> has as many positions as we think it does:<\/p>\n<pre class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">length(a)\r\n<\/span><span class=\"GNKRCKGCGSB\">[1] 4<\/span><\/span><\/code><\/pre>\n<p>Then, we can use the <code>str()<\/code> function to view the structure of this object <em>a<\/em>:<\/p>\n<pre class=\"GNKRCKGCGSB\"><code><span><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">str(a)\r\n<\/span><span class=\"GNKRCKGCGSB\"> chr [1:4] \"1\" \"2\" \"3\" \"hi!\"<\/span><\/span><\/code><\/pre>\n<p style=\"text-align: justify\">Here, we see that even the numbers in this vector are characters. 
This makes sense since R cannot coerce an arbitrary character string such as &#8220;hi!&#8221; into a number:<\/p>\n<pre class=\"GNKRCKGCGSB\"><span><code><span class=\"GNKRCKGCMSB ace_keyword\">&gt; <\/span><span class=\"GNKRCKGCMRB ace_keyword\">as.integer(a)\r\n<\/span><span class=\"GNKRCKGCGSB\">[1]  1  2  3 NA\r\n<\/span><span class=\"GNKRCKGCASB ace_constant\">Warning message:\r\n<\/span><\/code><span class=\"GNKRCKGCASB ace_constant\"><code>NAs introduced by coercion<\/code> <\/span><\/span><\/pre>\n<p style=\"text-align: justify\">And since numbers can be converted into characters, this is what R does.<\/p>\n<p>Use your understanding of how vectors work to build a simple random name generator:<\/p>\n<div id=\"h5p-35\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-35\" class=\"h5p-iframe\" data-content-id=\"35\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"2.3 Random name generator\"><\/iframe><\/div>\n<\/div>\n<p style=\"text-align: justify\">In the next couple of chapters, we will discuss the fundamentals of statistics, which means we will be working mostly with numeric data. 
Find out what this means in detail by continuing to the next chapter, <a href=\"https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/chapter\/basics-of-descriptive-statistics\/\">Basics of Descriptive Statistics<\/a>.<\/p>\n<p>&nbsp;<\/p>\n<p style=\"text-align: justify\">If you are interested in acquainting yourself with further useful functions in R before continuing, we recommend reading the documentation of the following functions and playing around with them:<\/p>\n<ul>\n<li>as.character<\/li>\n<li>as.integer<\/li>\n<li>attach<\/li>\n<li>plot<\/li>\n<li>hist<\/li>\n<li>barplot<\/li>\n<li>pie<\/li>\n<\/ul>\n<p style=\"text-align: justify\">For further reading and more exercises we recommend reading chapter 2 in Gries (2008) and visiting his website at <a href=\"http:\/\/www.stgries.info\/research\/sflwr\/sflwr.html\">http:\/\/www.stgries.info\/research\/sflwr\/sflwr.html<\/a>.<\/p>\n<p>&nbsp;<\/p>\n<p style=\"text-align: justify\"><strong>Reference:<\/strong><\/p>\n<p>Gries, Stefan Th. 2008. <em>Statistik f\u00fcr Sprachwissenschaftler<\/em>. 
G\u00f6ttingen: Vandenhoeck &amp; Ruprecht.<\/p>\n","protected":false},"author":29,"menu_order":2,"template":"","meta":{"pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-23","chapter","type-chapter","status-publish","hentry"],"part":3,"_links":{"self":[{"href":"https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-json\/pressbooks\/v2\/chapters\/23","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-json\/wp\/v2\/users\/29"}],"version-history":[{"count":27,"href":"https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-json\/pressbooks\/v2\/chapters\/23\/revisions"}],"predecessor-version":[{"id":631,"href":"https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-json\/pressbooks\/v2\/chapters\/23\/revisions\/631"}],"part":[{"href":"https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-json\/pressbooks\/v2\/parts\/3"}],"metadata":[{"href":"https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-json\/pressbooks\/v2\/chapters\/23\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-json\/wp\/v2\/media?parent=23"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-json\/pressbooks\/v2\/chapter-type?post=23"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-json\/wp\/v2\/contributor?post=23"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/dlf.uzh.ch\/openbooks\/statisticsforlinguists\/wp-json\/wp\/v2\/license?post=23"}],"curies":[{"name":
"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}