Content from Working With Variables


Last updated on 2023-12-04

Estimated time: 50 minutes

Overview

Questions

  • How can I store values and do simple calculations with them?
  • What types of operations can I do?

Objectives

  • Navigate among important sections of the MATLAB environment.
  • Assign values to variables.
  • Identify what type of data is stored in a variable.
  • Create simple arrays.
  • Be able to explore the values of saved variables.
  • Learn how to delete variables and keep things tidy.

Introduction to the MATLAB GUI


Before we can start programming, we need to know a little about the MATLAB interface. Using the default setup, the MATLAB desktop contains several important sections:

  • In the Command Window we can execute commands. Commands are typed after the prompt >> and are executed immediately after pressing Enter.
  • Alternatively, we can open the Editor, write our code and run it all at once. The advantage of this is that we can save our code and run it again in the same way at a later stage.
  • The Workspace contains all the variables which we have loaded into memory.
  • The Current Folder window shows files in the current directory. We can change the current folder using this window.
  • Search Documentation on the top right of your screen lets you search for functions. Suggestions for functions that will do what you want to do will pop up. Clicking on them will open the documentation. Another way to access the documentation is via the help command — we will return to this later.

Working with variables


In this lesson we will learn how to manipulate the inflammation dataset with MATLAB. But before we discuss how to deal with many data points, we will demonstrate how to store a single value on the computer.

We can create a new variable by assigning a value to it using =

MATLAB

>> x = 55

OUTPUT

x =
    55

Notice that MATLAB responded by printing an output confirming that the variable has the desired value, and also that the variable appeared in the workspace.

A variable is just a name for a piece of data or value. Variable names must begin with a letter, and are case sensitive. They can also contain numbers or underscores. Examples of valid variable names are weight, size3, patient_name or alive_on_day_3.
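As a quick check of these rules, the built-in isvarname function reports whether a piece of text would be accepted as a variable name. The sketch below uses illustrative names only; the last two deliberately break the rules.

```matlab
% Check candidate names against MATLAB's naming rules.
% The names here are illustrative examples only.
valid_1 = isvarname('patient_name');   % letters and underscores: valid
valid_2 = isvarname('size3');          % a digit after the first letter: valid
invalid_1 = isvarname('3size');        % starts with a digit: not valid
invalid_2 = isvarname('patient-name'); % contains a hyphen: not valid

disp([valid_1, valid_2, invalid_1, invalid_2])
```

The first two results are logical 1 and the last two are logical 0.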

The reason we work with variables is so that we can reuse them, or save them for later use. We can also do operations with these variables. For example, we can do a simple sum:

MATLAB

>> y = 10
>> x + y

OUTPUT

y =
    10
ans =
    65

Note that the answer was saved in a new variable called ans. This variable is temporary, and will be overwritten with any new operation we do. For example, if we now subtract y from x we get:

MATLAB

>> x - y

OUTPUT

ans =
    45

The result of the sum is now gone forever. We can assign the result of an operation to a new variable, for example:

MATLAB

>> z = x * y

OUTPUT

z =
    550

This created a new variable z. If you look at the workspace, you can see that the value of z is 550.

We can even use a variable in an operation, and save the value in the same variable. For example:

MATLAB

>> y = y/5

OUTPUT

y =
    2

Here you can see that the expression to the right of the = sign is evaluated first, and the result is then assigned to the variable specified to the left of the = sign.

We can use multiple variables in a single operation, for example:

MATLAB

>> z = z - y^3 + 5*x

OUTPUT

z =
    817

where we used the caret symbol ^ to take the third power of y.

Logical operations

In programming, there is another type of operation that becomes very important: comparison. We can compare two numbers (or variables) to see which one is smaller, for example

MATLAB

>> mass = 20
>> age = 2.5
>> frac = mass/age
>> c1 = frac < 10

OUTPUT

mass =
    20
age =
    2.5000
frac =
     8

c1 =
  logical
   1

Something interesting just happened with the variable c1. If I ask you whether frac (8) is smaller than 10, you would say “yes”. MATLAB answered with a logical 1. If I ask you whether frac is greater than 10, you would say “no”. MATLAB answers with a logical 0.

MATLAB

>> c2 = frac > 10

OUTPUT

c2 =
  logical
   0

There are only two options (yes or no, true or false, 0 or 1), and so it is “cheaper” for the computer to save space only for those two options.

The “type” of this data is not the same as the “type” of data that represents a number. It comes from a logical comparison, and so MATLAB identifies it as such.

You can also see that in the workspace these variables have a tick next to them, instead of the squares we had seen. There are actually other symbols that appear there, relating to the different types of information we can save in variables (see the Data types section below if you want to know more).

Data types

We mentioned above that we can get other symbols in the workspace which relate to the types of information we can save.

We know we can save numbers, and logical values, but we can also save letters or strings, for example. Numbers are by default saved as type double, which means they can store very big or very small numbers. Letters are type char, and words or sentences are strings. Logical values (or booleans) are values that mean true or false, and are represented with zero or one. They are usually the result of comparing things.

MATLAB

>> weight = 64.5
>> size3 = 'L'
>> patient_name = "Jane Doe"
>> alive_on_day_3 = true

OUTPUT

weight =
   64.5000
size3 =
    'L'
patient_name =
    "Jane Doe"
alive_on_day_3 =
  logical
   1

Notice the single quotes for character variables, in contrast with the double quotes for strings.

If you look at the workspace, you’ll notice that the icon next to each variable is different, and if you hover over it, it will tell you the type of variable it is.

You can also check the “class” of the variable with the class function:

MATLAB

>> class(patient_name)

OUTPUT

ans =
    'string'
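Besides class, MATLAB has a family of functions that answer a yes/no question about the type of a variable. The following is a minimal sketch that recreates the example variables from above so that it can be run on its own.

```matlab
% Illustrative values, mirroring the ones used above.
weight = 64.5;
size3 = 'L';
patient_name = "Jane Doe";
alive_on_day_3 = true;

% Each is* function returns a logical 1 (true) or 0 (false).
weight_is_number = isnumeric(weight);
size_is_char = ischar(size3);
name_is_string = isstring(patient_name);
alive_is_logical = islogical(alive_on_day_3);

% isa checks a variable against a class name given as text.
weight_is_double = isa(weight, 'double');
name_is_char = ischar(patient_name);   % false: a string is not a char
```

These checks are handy later on, when a piece of code should only continue if its data has the expected type.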

We can also check if two variables (or even operations) are the same:

MATLAB

>> c3 = frac == mass/age

OUTPUT

c3 =
  logical
   1

We can also combine comparisons. For example, we can check whether frac is smaller than 10 and the age is greater than 5:

MATLAB

>> c4 = frac < 10 && age > 5

OUTPUT

c4 =
  logical
   0

In this case, both conditions need to be met for the result to be “yes” (1).

If we want a “yes” as long as at least one of the conditions is met, we would ask if frac is smaller than 10 or the age is greater than 5:

MATLAB

>> c5 = frac < 10 || age > 5

OUTPUT

c5 =
  logical
   1
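A common use of combined comparisons is checking whether a value lies inside a range. The sketch below uses a made-up measurement and made-up limits, purely for illustration.

```matlab
% Hypothetical measurement and limits, for illustration only.
temperature = 37.2;
lower_limit = 36.0;
upper_limit = 37.5;

% Both conditions must hold for the value to be inside the range.
in_range = temperature > lower_limit && temperature < upper_limit;

% At least one condition holds when the value is outside the range.
out_of_range = temperature < lower_limit || temperature > upper_limit;

disp(in_range)
disp(out_of_range)
```

Note that && and || expect a single logical value on each side, which is what we get when comparing single numbers as we do here.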

Negating conditions and including the limits

We often ask questions or characterise things in the negative. “We did not start late today.”, “I was not going faster than the speed limit, officer!”, and “I didn’t shoot no deputy” are just some examples.

Naturally, we may want to do so in programming too. In MATLAB the negative is represented with ~. For example, we can check if the speed is indeed not faster than the limit with ~(speed > 70), which MATLAB reads as “not speed greater than 70”.

Can you express these questions in MATLAB code?

  • Is 1 + 2 + 3 + 4 not smaller than 10?
  • Is 5 to the power of 3 different from 125?
  • Is x + y greater or equal to x/y?
  • Is x + y not greater or equal to x/y?

We can ask the first two questions in the positive, enclose each one in brackets, and then negate it:

  • ~(1 + 2 + 3 + 4 < 10)
  • ~(5^3 == 125)

Asking if two things are different is so common that MATLAB has a special symbol for it: ~=. So we could have asked the second question with:

  • 5^3 ~= 125

We can ask if x+y is greater or equal to x/y with:

  • x+y > x/y || x+y == x/y

There is again a shortcut for this: MATLAB understands >= as “greater than or equal to” and, of course, <= as “smaller than or equal to”. So the same condition could be written as:

  • x+y >= x/y

Asking if x + y is not greater or equal to x/y is the same question as above, but negated. Remembering to add the brackets, we get:

  • ~(x+y > x/y || x+y == x/y)
  • or ~(x+y >= x/y)
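To convince ourselves that a negated comparison and its “opposite” comparison give the same answer, we can compute both and compare the results. This is a small sketch with illustrative values for x and y; any pair of numbers would do.

```matlab
% Illustrative values; any pair of numbers would do.
x = 55;
y = 2;

% Negating "greater or equal" gives the same answer as "smaller than".
not_ge = ~(x + y >= x/y);
lt = x + y < x/y;

% Negating "equal" gives the same answer as "different from".
not_eq = ~(5^3 == 125);
ne = 5^3 ~= 125;

disp(not_ge == lt)
disp(not_eq == ne)
```

Both lines print a logical 1, meaning that the two ways of asking each question agree.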

Arrays

You may notice that all of the variable types start with a 1x1. This is because MATLAB thinks in terms of groups of variables called arrays, or matrices.

We can create an array using square brackets and separating each value with a comma:

MATLAB

>> A = [1, 2, 3]

OUTPUT

A =
     1     2     3

If you now hover over the data type icon, you’ll find that it shows 1x3. This means that the array A has 1 row and 3 columns.

We can create matrices using semi-colons to separate rows:

MATLAB

>> B = [1, 2; 3, 4; 5, 6]

OUTPUT

B =
     1     2
     3     4
     5     6

You’ll notice that B has three rows and two columns, which explains the 3x2 we get from the workspace.

We can also create arrays of other types of data. For example, we could create an array of names:

MATLAB

>> Names = ["John", "Abigail", "Bertrand", "Lucile"]

OUTPUT

Names =
  1×4 string array
    "John"    "Abigail"    "Bertrand"    "Lucile"

We can use logical values too:

MATLAB

>> C = [true; false; false; true]

OUTPUT

C =
  4×1 logical array
   1
   0
   0
   1

Something to bear in mind, however, is that all values in an array must be of the same type.
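One consequence worth knowing: if we put values of different types inside the same square brackets, MATLAB does not keep them as they are, but converts them to a common type. A minimal sketch with made-up values:

```matlab
% Mixing types inside square brackets converts everything to one type.
mixed_logical = [1, true];        % the logical becomes a double
mixed_string = ["John", 42];      % the number becomes the string "42"

disp(class(mixed_logical))        % double
disp(class(mixed_string))         % string
disp(mixed_string(2) == "42")     % 1, the number is now text
```

So the array always ends up with a single type, even when that is not what we typed.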

We mentioned before that MATLAB is actually more used to working with arrays than individual variables. Well, if it is so used to working with arrays, can we do operations with them?

The answer is yes! In fact, this is what makes MATLAB a particularly interesting programming language.

We can, for example, check the whole matrix B and look for values greater than, say, 3.

MATLAB

>> B > 3

OUTPUT

ans =
  3×2 logical array
   0   0
   0   1
   1   1

MATLAB then compared each element of B and asked “is this element greater than 3?”. The result is another array, of the same size and dimensions as B, with the answers.

We can also do sums, multiplications, and pretty much anything we want with an array, but we need to be careful with what we do.
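One example of where care is needed: between two matrices, * performs matrix multiplication, whereas .* multiplies the matrices element by element. The sketch below uses a small made-up matrix P to show the difference.

```matlab
% A small made-up matrix, for illustration only.
P = [1, 2; 3, 4];

added = P + 10;        % adds 10 to every element
doubled = P * 2;       % multiplies every element by the scalar 2
elementwise = P .* P;  % multiplies matching elements: [1, 4; 9, 16]
matrix_prod = P * P;   % matrix multiplication: [7, 10; 15, 22]

disp(elementwise)
disp(matrix_prod)
```

The same distinction applies to division (./) and powers (.^).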

Despite this being so interesting and incredibly powerful, this course will focus more on basic programming concepts, and so we won’t use this feature very much. However, it is very important that you keep it in mind, and that you do ask questions about it during the break if you are interested.

Suppressing the output

In general, the output can be a bit redundant (or even annoying!), and it can make the code slower, so it is considered good form to suppress it. To suppress it, we add a semi-colon at the end of the line:

MATLAB

>> x = 33;

At first glance nothing appears to have happened, but the workspace shows the new value was assigned.

Printing a variable’s value

If we really want to print the variable, then we can simply type its name and hit Enter,

MATLAB

>> patient_name

OUTPUT


patient_name =

    "Jane Doe"

or using the disp function.

Functions are pre-defined algorithms (chunks of code), that can be used multiple times. They usually take some “inputs” inside brackets, and either have an effect on something or output something.

The disp function, in particular, takes just one input – the variable that you want to print – and what it does is to print the variable in a nice way. For the variable patient_name, we would use it like this:

MATLAB

>> disp(patient_name)

OUTPUT

Jane Doe

Note how the output is a bit different from what we got when we just typed the variable name. There is less indentation and there are fewer empty lines.
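Since disp takes a single input, a common trick is to build one piece of text first and then display it. The sketch below assumes we are working with strings (double quotes), for which + joins text together; the values are the illustrative ones used earlier.

```matlab
% Illustrative values, matching the variables used earlier in the lesson.
patient_name = "Jane Doe";
weight = 64.5;

% Adding text to a string joins them; numbers are converted to text.
message = "Patient " + patient_name + " weighs " + weight + " kg";
disp(message)
```

This prints Patient Jane Doe weighs 64.5 kg on a single line.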

Keeping things tidy

We have declared a few variables now, and we might not be using all of them. If we want to delete a variable we can do so by typing clear and the name of the variable, e.g.:

MATLAB

>> clear alive_on_day_3

You might be able to see it disappear from the workspace. If you now try to use alive_on_day_3, MATLAB will give an error.

We can also delete all of our variables with the command clear, without any variable names following it. Be careful though, there’s no way back!
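If we are unsure whether a variable is still around, one option is the exist function, which returns 1 when asked about a name that is currently a variable and 0 otherwise. A small sketch, using a throwaway variable created only for this purpose:

```matlab
% A throwaway variable, created only for this illustration.
temporary_value = 42;
before_clear = exist('temporary_value', 'var');  % 1: the variable exists

clear temporary_value
after_clear = exist('temporary_value', 'var');   % 0: the variable is gone

disp([before_clear, after_clear])
```

Naming the variable after clear, as done here, removes only that variable and leaves the rest of the workspace untouched.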

Another thing you might want to clear every once in a while is the output pane. To do that, we use the command clc.

MATLAB

>> clc

Again, be careful using this command, there is no way back!

Key Points

  • Variables store data for future use. Their names must start with a letter, and can have underscores and numbers.
  • We can add, subtract, multiply and divide numbers, and raise them to a power.
  • We can also compare variables with <, >, ==, >=, <=, ~=, and use ~ to negate the result.
  • Combine logical operations with && (and) and || (or).
  • MATLAB stores data in arrays. The data in an array has to be of the same type.
  • You can suppress output with ;, and print a variable with disp.
  • Use clear to delete variables, and clc to clear the console.

Content from Arrays


Last updated on 2024-03-22

Estimated time: 40 minutes

Overview

Questions

  • How can I access the information in an array?

Objectives

  • Learn how to create multidimensional arrays
  • Select individual values and subsections of an array.

Initializing an Array


We just talked about how MATLAB thinks in arrays, and declared some very simple arrays using square brackets. In some cases, we will want to create space to save data, but not save anything just yet. One way of doing so is with zeros. The function zeros takes the dimensions of our array as arguments, and populates it with zeros. For example,

MATLAB

>> Z = zeros(3,5)

OUTPUT

Z =
     0     0     0     0     0
     0     0     0     0     0
     0     0     0     0     0

creates a matrix of 3 rows and 5 columns, filled with zeros. If we pass only one dimension, MATLAB assumes we want a square matrix, so

MATLAB

>> Z = zeros(3)

OUTPUT

Z =
     0     0     0
     0     0     0
     0     0     0

yields a 3×3 array. If we want a single row and 5 columns, we need to remember that MATLAB reads rows×columns, so

MATLAB

>> Z = zeros(1,5)

OUTPUT

Z =
     0     0     0     0     0

The way the zeros function works is shared with many other functions that create arrays.

For example, the ones function is nearly identical, but the arrays are filled with ones, and the rand function assigns uniformly distributed random numbers between zero and 1 to each space in the array.

MATLAB

>> R = rand(8);
>> O = ones(10,10);

Callout

Note: This is when suppressing the output becomes more important. You can more comfortably explore the variables R and O by double clicking them in the workspace.

The ones function can actually help us initialize a matrix to any value, because we can multiply a matrix by a constant and it will multiply each element. So for example,

MATLAB

>> Fives = ones(3,6)*5;

produces a 3×6 matrix full of fives.
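Multiplying ones by a constant is only one of several ways to get a constant matrix. As a sketch, here are two alternatives alongside it, with a check that all three give the same result; the variable names are made up for this example.

```matlab
% Three ways to build a 3x6 matrix in which every element is 5.
fives_a = ones(3, 6) * 5;    % scale a matrix of ones
fives_b = zeros(3, 6) + 5;   % add 5 to every element of a matrix of zeros
fives_c = repmat(5, 3, 6);   % repeat the value 5 in a 3x6 layout

% isequal returns logical 1 when all inputs have the same size and values.
disp(isequal(fives_a, fives_b, fives_c))
```

Which one to use is mostly a matter of taste; the version with ones is the one used in this lesson.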

The magic function works in a similar way, but you can only declare square matrices with it. The magic thing about them is that the sum of the elements on each row or column is the same number.

MATLAB

>> M = magic(4)

OUTPUT

M =
    16     2     3    13
     5    11    10     8
     9     7     6    12
     4    14    15     1

In this case, each row or column adds up to 34. But how could I tell in a bigger matrix? How can I select some of the elements of the array and sum them, for example?
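For whole rows and columns, one option is the sum function, which this lesson has not introduced yet, so take the following as a preview rather than something you need to know now. Called on a matrix, sum adds up each column; with a second argument of 2 it adds up each row instead.

```matlab
M = magic(4);

column_sums = sum(M);      % 1x4 row vector, one total per column
row_sums = sum(M, 2);      % 4x1 column vector, one total per row

disp(column_sums)
disp(row_sums)

% all(...) is true only when every comparison in the array is true.
same_total = all(column_sums == 34) && all(row_sums == 34);
disp(same_total)
```

Summing only some of the elements requires selecting them first, which is what array indexing is for.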

Array indexing


Array indexing is the method by which we can select one or more elements of an array. A solid understanding of array indexing is essential to working with arrays. Let’s start with selecting one element.

First, we will create an 8×8 “magic” matrix:

MATLAB

>> M = magic(8)

OUTPUT

M =

   64    2    3   61   60    6    7   57
    9   55   54   12   13   51   50   16
   17   47   46   20   21   43   42   24
   40   26   27   37   36   30   31   33
   32   34   35   29   28   38   39   25
   41   23   22   44   45   19   18   48
   49   15   14   52   53   11   10   56
    8   58   59    5    4   62   63    1

We want to access a single value from the matrix:

[Figure: Accessing a single value]

To do that, we must provide its index in parentheses. In a 2D array, this means the row and column of the element separated by a comma, that is, as (row, column). This index goes after the name of our array. In our case, this is:

MATLAB

>> M(5, 6)

OUTPUT

ans =
    38

So the index (5, 6) selects the element on the fifth row and sixth column of M.

Callout

Note: MATLAB starts counting indices at 1, not 0! Many other programming languages start counting indices at 0, so be careful!
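To see what happens when we forget this, the sketch below asks for row 0 of a magic matrix. It wraps the attempt in try/catch, a construct not covered in this lesson, so that the error message is displayed instead of stopping the code.

```matlab
M = magic(8);

first_element = M(1, 1);   % the first row and first column: 64

% Asking for row 0 is an error, which try/catch lets us inspect safely.
try
    value = M(0, 1);
    zero_index_worked = true;
catch index_error
    zero_index_worked = false;
    disp(index_error.message)
end
```

The message that MATLAB prints explains that indices must be positive integers or logical values.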

An index like the one we used selects a single element of an array, but we can also select a group of elements if instead of a number we give arrays of indices. For example, if we want to select this submatrix:

[Figure: Accessing a submatrix]

we want rows 4, 5 and 6, and columns 5, 6 and 7, that is, the arrays [4,5,6] for rows, and [5,6,7] for columns:

MATLAB

>> M([4,5,6],[5,6,7])

OUTPUT

ans =
   36   30   31
   28   38   39
   45   19   18

The : operator

In MATLAB, the symbol : (colon) is used to specify a range. The range is specified as start:end. For example, if we type 1:6 it generates an array of consecutive numbers from 1 to 6:

MATLAB

>> 1:6

OUTPUT

ans =
   1     2     3     4     5     6

We can also specify an increment other than one. To specify the increment, we write the range as start:increment:end. For example, if we type 1:3:15 it generates an array starting with 1, then 1+3, then 1+2*3, and so on, until it reaches 15 (or as close as it can get to 15 without going past it):

MATLAB

>> 1:3:15

OUTPUT

ans =
   1     4     7    10    13

The array stopped at 13 because 13+3=16, which is over 15.
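The increment does not have to be a positive whole number. As a short sketch, here are a range that counts down, a range with a fractional step, and a range whose start is already past its end; the variable names are made up for this example.

```matlab
countdown = 10:-2:1;     % negative increment: 10 8 6 4 2
quarters = 0:0.25:1;     % fractional increment: 0 0.25 0.5 0.75 1
empty_range = 5:1;       % start is past end with increment 1: empty array

disp(countdown)
disp(quarters)
disp(size(empty_range))  % 1 0
```

The last case is worth remembering: MATLAB does not complain, it simply returns an array with no elements.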

The rows and columns we just selected could have been specified as ranges. So if we want the rows from 4 to 6 and columns from 5 to 7, we can specify the ranges as 4:6 and 5:7. On top of being a much quicker and neater way to get the rows and columns, MATLAB knows that the range will produce an array, so we do not even need the square brackets anymore. So the command above becomes:

MATLAB

>> M(4:6, 5:7)

OUTPUT

ans =
   36   30   31
   28   38   39
   45   19   18

Checkerboard

Select the elements highlighted on the image:

[Figure: Accessing strided rows and columns]

We need to select every third row, starting from row 1, and every other column, starting from column 2. To do that, we define the appropriate ranges with increments of 3 and 2 respectively:

MATLAB

>> M(1:3:8, 2:2:8)

OUTPUT

ans =
    2   61    6   57
   26   37   30   33
   15   52   11   56

Selecting whole rows or columns

If we want a whole row, for example:

[Figure: Accessing a row]

we could in principle pick the 5th row and for the columns use the range 1:8.

MATLAB

>> M(5, 1:8)

OUTPUT

ans =
   32   34   35   29   28   38   39   25

However, we need to know that there are 8 columns, which is not very robust.

The keyword end

When indexing the elements of an array, the keyword end can be used to get the last index available.

For example, M(2, end) returns the last element of the second row:

MATLAB

>> M(2, end)

OUTPUT

ans =
   16

We can also use it in combination with the : operator. For example, M(5:end, 3) returns the elements of column 3 from row 5 until the end:

MATLAB

>> M(5:end,3)

OUTPUT

ans =
   35
   22
   14
   59

We can then use the keyword end instead of the 8 to get the whole row with 1:end.

MATLAB

>> M(5, 1:end)

OUTPUT

ans =
   32   34   35   29   28   38   39   25

This is much better: it now works for any size of matrix, and we don’t need to know the size.
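Because end stands for a number, we can also do arithmetic with it, for example to count backwards from the last row or column. A short sketch using the same 8×8 magic matrix:

```matlab
M = magic(8);

last_element = M(end, end);          % last row, last column: 1
second_to_last = M(end-1, end);      % row 7, last column: 56
last_two_rows = M(end-1:end, 1:2);   % rows 7 and 8, columns 1 and 2

disp(last_two_rows)
```

Again, none of these commands needs to know how big the matrix is.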

Using : as an index

Getting a whole row or column is such a common operation that MATLAB has a shortcut: using : alone is equivalent to 1:end!

For example, we can get the whole fifth row with:

MATLAB

>> M(5, :)

OUTPUT

ans =
   32   34   35   29   28   38   39   25

As you can see, the : operator is quite important when accessing arrays!

We can use it to select multiple rows,

[Figure: Accessing multiple rows]

MATLAB

>> M(1:4, :)

OUTPUT

ans =
   64    2    3   61   60    6    7   57
    9   55   54   12   13   51   50   16
   17   47   46   20   21   43   42   24
   40   26   27   37   36   30   31   33

or multiple columns:

[Figure: Accessing multiple columns]

MATLAB

>> M(:, 6:end)

OUTPUT

ans =
    6    7   57
   51   50   16
   43   42   24
   30   31   33
   38   39   25
   19   18   48
   11   10   56
   62   63    1

or even the whole matrix. Try for example:

MATLAB

>> N = M(:)

and you’ll see that it returns all the elements of M. The result, however, is a column vector, not a matrix. We can make sure that the result of M(:) has 8x8=64 elements by using the function size, which returns the dimensions of the array given as an input:

MATLAB

>> size(N)

OUTPUT

ans =
   64    1

So it has 64 rows and 1 column. Effectively, then, M(:) ‘flattens’ the array into a column vector. The order of the elements in the resulting vector comes from appending each column of the original array in turn. This is the result of something called linear indexing, which is a way of accessing elements of an array by a single index.
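To make the column-by-column ordering concrete, the sketch below looks up the same element in three ways: in the flattened vector, by row and column, and with a single index into the matrix itself. It also uses sub2ind, a built-in function that converts a row and column into the matching linear index.

```matlab
M = magic(8);
N = M(:);

% Element 10 of the flattened vector is the 2nd element of column 2,
% because the first 8 elements come from column 1.
from_vector = N(10);
from_matrix = M(2, 2);
linear_direct = M(10);              % a single index into the matrix itself

% sub2ind converts (row, column) into the matching linear index.
linear_index = sub2ind(size(M), 2, 2);

disp([from_vector, from_matrix, linear_direct, linear_index])
```

The first three values are all 55, and the linear index of row 2, column 2 is 10.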

Master indexing

Select the elements highlighted on the image without using the numbers 5 or 8, and using end only once:

[Figure: Accessing strided columns]

We need to start with row 2, and subsequently select every third row:

MATLAB

>> M(2:3:end, :)

OUTPUT

ans =
    9   55   54   12   13   51   50   16
   32   34   35   29   28   38   39   25
    8   58   59    5    4   62   63    1

Slicing character arrays


A subsection of an array is called a slice. We can take slices of character arrays as well:

MATLAB

>> element = 'oxygen';
>> disp("first three characters: " + element(1:3))
>> disp("last three characters: " + element(4:6))

OUTPUT

first three characters: oxy
last three characters: gen

And we can use all the tricks we have learned to select the data we want. For example, to select every other character we can use the colon operator with an increment of 2:

MATLAB

>> element(1:2:end)

OUTPUT

ans =
    'oye'

We can also use the colon operator to access all the elements of the array, but you’ll notice that the only difference between evaluating element and element(:) is that the former is a row vector, and the latter a column vector.
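The tricks with end and negative increments work on character arrays too. As a small sketch, we can reverse a word, or take its last three characters without counting the letters:

```matlab
element = 'oxygen';

reversed = element(end:-1:1);     % 'negyxo'
last_three = element(end-2:end);  % 'gen', without counting the letters

disp(reversed)
disp(last_three)
```

Because both commands rely on end, they work for a word of any length.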

Key Points

  • Some functions to initialize matrices include zeros, ones, and rand. They all produce a square matrix if only one argument is given, but you can specify the dimensions you want separated by a comma, as in zeros(rows,columns).
  • To select data points we use round brackets and provide the row and column indices of the elements we want. They can be just numbers or arrays of numbers, e.g. M(5,[3,4,5]).
  • Use the colon operator : to generate ordered arrays as start:end or start:increment:end.
  • Use the keyword end to obtain the index of the final element.
  • The colon operator by itself : selects all the elements.

Content from Loading data


Last updated on 2023-12-08

Estimated time: 40 minutes

Overview

Questions

  • How can I load data to an array?

Objectives

  • Read data from a CSV file to be able to work with it in MATLAB.
  • Familiarize ourselves with our sample data.

Loading data to an array


Reading data from files and writing data to them are essential tasks in scientific computing, and something that we’d rather not spend a lot of time thinking about. Fortunately, MATLAB comes with a number of high-level tools to do these things efficiently, sparing us the grisly detail.

Before we get started, however, let’s make sure we have the directories to help organise this project.

Tip: Good Enough Practices for Scientific Computing

Good Enough Practices for Scientific Computing is a paper written by researchers involved with the Carpentries, which covers basic workflow skills for research computing. It recommends the following for project organization:

  1. Put each project in its own directory, which is named after the project.
  2. Put text documents associated with the project in the doc directory.
  3. Put raw data and metadata in the data directory, and files generated during clean-up and analysis in a results directory.
  4. Put source code for the project in the src directory, and programs brought in from elsewhere or compiled locally in the bin directory.
  5. Name all files to reflect their content or function.

We already have data, results and src directories in our matlab-novice-inflammation project directory, so we are ready to continue.

A final step is to set the current folder in MATLAB to our project folder. Use the Current Folder window in the MATLAB GUI to browse to your project folder (the one now containing the ‘data’, ‘results’ and ‘src’ directories).

To verify the current directory in MATLAB we can run pwd (print working directory).

MATLAB

>> pwd

OUTPUT

.../Desktop/matlab-novice-inflammation

A second check we can do is to run the ls (list) command in the Command Window to list the contents of the working directory — we should get the following output:

MATLAB

>> ls

OUTPUT

data  results  src

We are now set to load our data. As a reminder, our data is structured like this:

[Figure: Information saved in each data file.]

But it is stored without the headers, as comma-separated values. Each line in the file corresponds to a row, and the value for each column is separated from its neighbours by a comma. The first few rows of our first file, data/base/inflammation-01.csv, look like this:

0,0.065,0.169,0.271,0.332,0.359,0.354,0.333,0.304,0.268,0.234,0.204,0.179,0.141,0.133,0.115,0.083,0.076,0.065,0.065,0.047,0.04,0.041,0.028,0.02,0.028,0.012,0.02,0.011,0.015,0.009,0.01,0.01,0.007,0.007,0.001,0.008,-0,0.006,0.004
0,0.034,0.114,0.2,0.272,0.321,0.328,0.32,0.314,0.287,0.246,0.215,0.207,0.171,0.146,0.131,0.107,0.1,0.088,0.065,0.061,0.052,0.04,0.042,0.04,0.03,0.031,0.031,0.016,0.019,0.02,0.017,0.019,0.006,0.009,0.01,0.01,0.005,0.001,0.011
0,0.081,0.216,0.277,0.273,0.356,0.38,0.349,0.315,0.23,0.235,0.198,0.106,0.198,0.084,0.171,0.126,0.14,0.086,0.01,0.06,0.081,0.022,0.035,0.01,0.086,-0,0.102,0.032,0.07,0.017,0.136,0.022,-0,0.031,0.054,-0,-0,0.05,0.001

There is a very tempting button that says “Import Data” in the toolbar. If you click on it, you can find the file, and it will take you through a GUI wizard to import the data. However, this is much more complicated than what we need, and it is not very helpful for loading multiple files (as we will in later episodes). Instead, let’s try to do it in the Command Window.

We can search the documentation to try to learn how to read our matrix of data. Type read matrix into the documentation toolbar. MATLAB suggests using readmatrix. If we have a closer look at the documentation, MATLAB also tells us which inputs and output this function has.

For the readmatrix function we need to provide a single argument: the path to the file we want to read data from. Since our data is in the ‘data’ folder, the path will begin with “data/”. We’ll also need to specify the subfolder (we will start by using “base/”), followed by the name of the file:

MATLAB

>> patient_data = readmatrix('data/base/inflammation-01.csv');

This loads the data and assigns it to a variable, patient_data. This is a good example of when to use a semi-colon to suppress output — try re-running the command without the semi-colon to find out why. You should see a wall of numbers printed, which is the data from the file.
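If you do not have the lesson data to hand, the following sketch shows the same idea on a tiny made-up file. It assumes a MATLAB release that provides writematrix alongside readmatrix; the data values and the temporary file name are hypothetical and exist only for this example.

```matlab
% Build a small made-up table: 2 hypothetical patients, 3 days.
example_data = [0, 0.065, 0.169; 0, 0.034, 0.114];

% tempname returns a unique path in the temporary folder; the file
% name is hypothetical and only used for this illustration.
example_file = [tempname, '.csv'];
writematrix(example_data, example_file);

% Read the file back in, exactly as we did for the inflammation data.
loaded_data = readmatrix(example_file);
disp(size(loaded_data))

delete(example_file);   % remove the temporary file again
```

The loaded array has the same 2 rows and 3 columns that we wrote to the file.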

We can see in the workspace that the variable has 60 rows and 40 columns. If you can’t see the workspace, you can check this with size, as we did before:

MATLAB

>> size(patient_data)

OUTPUT

ans =
    60    40

You might also recognise the icon in the workspace telling you that the variable is of type double. If you don’t, you can use the class function to find out what type of data lives inside an array:

MATLAB

>> class(patient_data)

OUTPUT

ans =
    'double'

Again, this just means that you can store very small or very large numbers, called double precision floating-point numbers.

Initial exploration


We know that in our data each row represents a patient and each column a different day.

One patient at a time

We know how to access sections of our data, so let’s look at a single patient first. To look at a single patient’s data, we have to get all the columns for a given row, with:

MATLAB

>> patient_5 = patient_data(5,:)

OUTPUT

patient_5 =
  Columns 1 through 14
         0    0.0370    0.1330    0.2280    0.3060    0.3410    0.3410    0.3480    0.3160    0.2750    0.2540    0.2250    0.1870    0.1630
  Columns 15 through 28
    0.1440    0.1190    0.1070    0.0880    0.0720    0.0600    0.0510    0.0510    0.0390    0.0330    0.0240    0.0280    0.0170    0.0200
  Columns 29 through 40
    0.0160    0.0200    0.0190    0.0180    0.0070    0.0160    0.0220    0.0180    0.0150    0.0050    0.0100    0.0100

Looking at these 40 numbers tells us very little, so we might want to look at the mean instead, for example.

MATLAB

>> mean_p5=mean(patient_5)

OUTPUT

mean_p5 =
    0.1046

We can also compute other statistics, like the maximum, minimum and standard deviation.

MATLAB

>> max_p5 = max(patient_5)
>> min_p5 = min(patient_5)
>> std_p5 = std(patient_5)

OUTPUT

max_p5 =
    0.3480
min_p5 =
     0
std_p5 =
    0.1142
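The max and min functions can also tell us where the extreme value occurs, by asking for a second output in square brackets. The sketch below uses a short made-up vector standing in for a patient's readings, so the numbers are not from the inflammation data.

```matlab
% Made-up values standing in for one patient's daily readings.
readings = [0, 0.037, 0.133, 0.348, 0.316, 0.163];

% With two outputs, max returns the largest value and its position.
[peak_value, peak_day] = max(readings);

disp(peak_value)
disp(peak_day)
```

Here the largest value is 0.348, and it sits in position 4 of the vector.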

All data points at once

Can you think of a way to get the mean of the whole data? What about the max, min and std?

We already know that the colon operator as an index returns all the elements, so patient_data(:) will return a vector with all the data points. To compute the mean, we then use:

MATLAB

>> global_mean = mean(patient_data(:))

OUTPUT

global_mean =
    0.1053

This works for max, min and std too:

MATLAB

>> global_max = max(patient_data(:))
>> global_min = min(patient_data(:))
>> global_std = std(patient_data(:))

OUTPUT

global_max =
    0.4530
global_min =
     0
global_std =
    0.1118

Now that we have the global statistics, we can check how patient 5 compares with them:

MATLAB

>> mean_p5 > global_mean
>> max_p5 == global_max
>> min_p5 == global_min
>> std_p5 < global_std

OUTPUT

ans =
  logical
   0
ans =
  logical
   0
ans =
  logical
   1
ans =
  logical
   0

So we know that patient 5 did not suffer more inflammation than average, that they are not the patient who got the most inflamed, that they had the global minimum inflammation at some point (0), and that the standard deviation of their inflammation is not below the global standard deviation.

Food for thought

How would you find the patient who got the highest inflammation?

Would you be happy to do it if you had 1000 patients?

One day at a time

We could also have looked not at a single patient, but at a single day. The approach would be very similar, but instead of selecting all the columns in a row, we want to select all the rows for a given column:

MATLAB

>> day_9 = patient_data(:,9);

The result is now not a row of 40 elements, but a column with 60 items. However, MATLAB is smart enough to figure out what to do with queries just like the ones we made before.

MATLAB

>> mean_d9 = mean(day_9)
>> max_d9 = max(day_9)
>> min_d9 = min(day_9)
>> std_d9 = std(day_9)

OUTPUT

mean_d9 =
    0.3116
max_d9 =
    0.3780
min_d9 =
    0.2290
std_d9 =
    0.0186

We could now check how day 9 compares to the global values:

MATLAB

>> mean_d9 > global_mean
>> max_d9 == global_max
>> min_d9 == global_min
>> std_d9 < global_std

OUTPUT

ans =
  logical
   1
ans =
  logical
   0
ans =
  logical
   0
ans =
  logical
   1

So we know that on day 9 there was significant inflammation, but that it is not the day with the highest inflammation; also, that every patient was at least a bit inflamed on that day, and that the standard deviation of inflammation on this day is below the standard deviation of the whole dataset (so the data points are closer to each other).

Food for thought

How would you find which days had an inflammation value above the global mean?

Would you be happy to do it if you had 1000 days’ worth of data?

Whole array analysis

The analysis we’ve done until now would be very tedious to repeat for each patient or day. Luckily, we’ve learnt that MATLAB is used to thinking in terms of arrays. Surely it must be possible to get the mean of each patient or each day in one go. It is definitely tempting to simply call the mean on the array, so let’s try it:

MATLAB

>> x=mean(patient_data);

We’ve suppressed the output, but the workspace (or the size function) tells us that the result is a 1×40 array. MATLAB assumed that we want column averages, and indeed that is something we might want.
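To see this default on something small enough to check by hand, here is a minimal sketch using a made-up 2×3 matrix. The variable names small and col_means are ours for illustration and are not part of the inflammation dataset.

```matlab
% A made-up 2x3 matrix: 2 rows ("patients"), 3 columns ("days")
small = [1 2 3; 5 6 7];

% With a matrix input, mean works down each column
col_means = mean(small)   % one value per column: 3 4 5

% size confirms the result is a row vector with one entry per column
size(col_means)           % 1 3
```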

The other statistics behave in the same way, so we can more appropriately label our variables as:

MATLAB

>> per_day_mean = mean(patient_data);
>> per_day_max = max(patient_data);
>> per_day_min = min(patient_data);
>> per_day_std = std(patient_data);

You’ll notice that each of the above variables is a 1×40 array.

Now that we have the information for each day in an array, we can take advantage of MATLAB’s capacity to do array operations. For example, we can find out which days had an inflammation above the global average:

MATLAB

>> per_day_mean > global_mean
ans =
  1×40 logical array
  Columns 1 through 20
   0   0   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   0   0   0
  Columns 21 through 40
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

We could count along the array to work out which days these are, but let’s take a shortcut and use the find function:

MATLAB

>> find(ans)
ans =
     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17

So it seems that days 3 to 17 were the critical days.
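The comparison and find steps can be tried on a small vector where the answer is easy to verify. This is only a sketch: values and threshold are made-up names and numbers, and the last line uses the logical array directly as an index, which is an extra step that goes slightly beyond what we did above.

```matlab
% Made-up data, small enough to check by eye
values = [0.2 0.7 0.9 0.1];
threshold = 0.5;

above = values > threshold    % logical array: 0 1 1 0
find(above)                   % positions of the ones: 2 3
values(above)                 % the values at those positions: 0.7 0.9
```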

But what if we want the analysis per patient, instead of per day?

Let’s look at the documentation for mean, either through the documentation browser or using the help command:

MATLAB

>> help mean

OUTPUT

mean   Average or mean value.
    S = mean(X) is the mean value of the elements in X if X is a vector. 
    For matrices, S is a row vector containing the mean value of each 
    column. 
    For N-D arrays, S is the mean value of the elements along the first 
    array dimension whose size does not equal 1.
 
    mean(X,DIM) takes the mean along the dimension DIM of X.
 
    S = mean(...,TYPE) specifies the type in which the mean is performed, 
    and the type of S. Available options are:
 
    'double'    -  S has class double for any input X
    'native'    -  S has the same class as X
    'default'   -  If X is floating point, that is double or single,
                   S has the same class as X. If X is not floating point, 
                   S has class double.
 
    S = mean(...,NANFLAG) specifies how NaN (Not-A-Number) values are 
    treated. The default is 'includenan':
 
    'includenan' - the mean of a vector containing NaN values is also NaN.
    'omitnan'    - the mean of a vector containing NaN values is the mean 
                   of all its non-NaN elements. If all elements are NaN,
                   the result is NaN.
 
    Example:
        X = [1 2 3; 3 3 6; 4 6 8; 4 7 7]
        mean(X,1)
        mean(X,2)
 
    Class support for input X:
       float: double, single
       integer: uint8, int8, uint16, int16, uint32,
                int32, uint64, int64
 
    See also median, std, min, max, var, cov, mode.

The first paragraph explains why it worked for a single day or patient. The input we used was a vector, so it took the mean.

The second paragraph explains why we got per-day means when we used the whole data as input. Our array is 2D, and the first dimension runs down the rows, so mean averaged over the rows (the patients) and returned one value per column (day).

The third paragraph is the key to what we want to do now. A second argument DIM can be used to specify the dimension along which to take the mean. If we want patient averages, we want to average across the columns (the days) of each row, that is, along dimension 2.

MATLAB

>> per_patient_mean = mean(patient_data,2);

As expected, the result is a 60×1 vector, with the mean for each patient.

Unfortunately, max, min and std do not behave quite in the same way. If you explore their documentation, you’ll see that we need to add another argument, so that the commands become:

MATLAB

>> per_patient_max = max(patient_data,[],2);
>> per_patient_min = min(patient_data,[],2);
>> per_patient_std = std(patient_data,[],2);

All of the above return a 60×1 vector.
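The empty brackets act as a placeholder for the second positional argument, which max and min use for a second array to compare against, so the dimension has to go third. A small sketch with a made-up matrix (the names small, row_max, row_min and compared are ours) shows the difference:

```matlab
% Made-up 2x3 matrix, small enough to check by hand
small = [1 4 2; 9 3 5];

row_max = max(small, [], 2)   % largest value in each row: [4; 9]
row_min = min(small, [], 2)   % smallest value in each row: [1; 3]

% Without the placeholder, the 2 is treated as a value to compare against,
% so every element is compared with 2 and the result is still 2x3
compared = max(small, 2)      % [2 4 2; 9 3 5]
```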

Most inflamed patients

Can you find the patients that got the highest inflammation?

Using the power MATLAB has to compare arrays, we can check which patients have a max equal to the global_max. If we wrap this check in the find function, we get the row numbers:

MATLAB

>> find(per_patient_max == global_max)
ans =
     31

So patient 31 has the maximum inflammation level.


We can only do this because we had already calculated per_patient_max. However, there is another way of doing this. Just as we used find to locate the patients that had a maximum inflammation value equal to the global maximum, we can find the value from the whole data set:

MATLAB

>> find(patient_data == global_max)
ans =
     391

However, this resulted in a rather odd number. This number represents the linear index of the global maximum. Linear indices result from counting through the elements in the first column, then continuing the count in the second column, and so on. Luckily, there is a function to convert this linear index into row and column numbers: ind2sub. We need to provide the size of our array and the linear index, i.e. ind2sub([60,40],391). We also need to provide space for both outputs (the row and column numbers), so we call it as [r,c]=ind2sub([60,40],391). Alternatively, we can get the size and index inside the call:

MATLAB

>> [r,c]=ind2sub(size(patient_data),find(patient_data == global_max))
r =
    31
c =
     7
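Linear indexing is easier to follow on a tiny array. The sketch below uses a made-up 3×2 matrix (small, r, c, r2 and c2 are illustrative names). It also shows that find can return row and column numbers directly when we ask for two outputs, which may be a convenient alternative to ind2sub.

```matlab
% Made-up 3x2 matrix; linear indices count down column 1, then column 2
small = [10 40; 20 50; 30 60];

small(5)                          % fifth element in column order: 50

% Convert linear index 5 back into row and column numbers
[r, c] = ind2sub(size(small), 5)  % r = 2, c = 2

% find can also return rows and columns directly when given two outputs
[r2, c2] = find(small == 50)      % r2 = 2, c2 = 2
```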

We can gain some insight by exploring the data like we have so far, but we all know that an image speaks more than a thousand numbers, so we’ll learn to make some plots.

Key Points

  • Use readmatrix to read tabular CSV data into a program.
  • Use mean, min, max, and std on vectors to get the mean, minimum, maximum and standard deviation.
  • Use mean(array,DIM) to specify the dimension of your array in which to compute the mean.
  • For min, max, and std, the arguments need to be (array,[],DIM) instead.

Content from Plotting data


Last updated on 2024-03-22

Estimated time: 60 minutes

Overview

Questions

  • How can I visualize my data?

Objectives

  • Display simple graphs with appropriate titles and labels.
  • Get familiar with the plot function.
  • Learn how to plot multiple lines at the same time.
  • Learn how to show images side by side.
  • Get familiar with the heatmap and imagesc functions.

Plotting


The mathematician Richard Hamming once said, “The purpose of computing is insight, not numbers,” and the best way to develop insight is often to visualise data. Visualisation deserves an entire lecture (or course) of its own, but we can explore a few features of MATLAB here.

We will start by exploring the function plot. The most common usage is to provide two vectors, like plot(X,Y). Let’s start by plotting the average inflammation across patients over time. For the Y vector we can provide per_day_mean, and for the X vector we want to use the number of the day in the trial, which we can generate as a range with:

MATLAB

>> day_of_trial = 1:40;

Then our plot can be generated with:

MATLAB

>> plot(day_of_trial, per_day_mean)

Callout

Note: If we only provide a vector as an argument, it plots a data point for each value on the y axis and uses the index of each element as the x axis. For our patient data the indices coincide with the day of the study, so plot(per_day_mean) generates the same plot. In most cases, however, using the indices on the x axis is not desirable.
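As a hypothetical illustration of when the indices would mislead, suppose readings had been taken only every other day. The numbers and the names days and readings below are made up for this sketch.

```matlab
% Hypothetical example: five made-up readings taken on days 2, 4, 6, 8 and 10
days = 2:2:10;
readings = [0.1 0.4 0.9 0.5 0.2];

% Index on the x axis (1 to 5), which misrepresents when readings were taken
plot(readings)

% Actual day on the x axis (2 to 10), drawn in a separate figure window
figure
plot(days, readings)
xlabel("Day")
ylabel("Reading")
```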

Callout

Note: We do not even need to have the vector saved as a variable. We would obtain the same plot with the command plot(1:40, mean(patient_data, 1)), or plot(mean(patient_data, 1)).

As it is, the image is not very informative. We need to give the figure a title and label the axes using xlabel and ylabel, so that other people can understand what it shows (including us, if we return to this plot 6 months from now).

MATLAB

>> title("Daily average inflammation")
>> xlabel("Day of trial")
>> ylabel("Inflammation")

Figure: Average inflammation

That’s much better! Now the plot actually communicates something. As we expected, this figure tells us way more than the numbers we had seen in the previous section.

Let’s have a look at two other statistics: the maximum and minimum inflammation per day across all patients.

MATLAB

>> plot(day_of_trial, per_day_max)
>> title("Maximum inflammation per day")
>> ylabel("Inflammation")
>> xlabel("Day of trial")

Figure: Maximum inflammation

Scripts

We often have to repeat a series of commands to achieve what we want, like with these plots. To be able to reuse our commands with more ease, we use scripts.

A more in-depth exploration of scripts will be covered in the next episode. For now, we’ll just start by clicking new->script, using ctrl+N, or typing edit in the Command Window.

Any of the above should open a new “Editor” window. Save the file inside the src folder, as single_plot.m.

Alternatively, if you run

MATLAB

>> edit src/single_plot.m

it creates the file with the correct path and name for you.

Note: Make sure to add the src folder to your path, so that MATLAB knows where to find the script. To do that, right click on the src directory, go to “Add to Path” and to “Selected Folders”. Alternatively, run:

MATLAB

>> addpath("src")

Try copying and pasting the plot commands for the max inflammation into the script and clicking on the “Run” button!

Because we now have a script, it should be much easier to change the plot to the minimum inflammation:

MATLAB

>> day_of_trial = 1:40;
>> plot(day_of_trial, per_day_min)
>> title("Minimum inflammation per day")
>> ylabel("Inflammation")
>> xlabel("Day of trial")

Figure: Minimum inflammation

These two are much noisier than the mean, as we’d expect.

Multiple lines in a plot


It is often the case that we want to plot more than one line in a single figure. In MATLAB we can “hold” a figure and keep plotting on the same axes. For example, we might want to contrast the mean values across patients with the inflammation of a single patient.

Let’s reuse the code we have in the script, but save it as a new script called “multiline_plot.m”. You can do that using the dropdown menu on the save button, or by running this command in the Command Window:

MATLAB

>> copyfile("src/single_plot.m","src/multiline_plot.m")

and then open the new file with edit src/multiline_plot.m, as before.

If we are displaying more than one line, it is important to add a legend. We can specify the legend names by adding ,DisplayName="legend name here" inside the plot function. We then need to activate the legend by running legend. So, to plot the mean values we first do:

MATLAB

>> day_of_trial = 1:40;
>> plot(day_of_trial, per_day_mean, DisplayName="Mean")
>> legend
>> title("Daily average inflammation")
>> xlabel("Day of trial")
>> ylabel("Inflammation")

Figure: Average inflammation with legend

Then, we can use the instruction hold on to add a plot for patient_5.

MATLAB

>> hold on
>> plot(day_of_trial, patient_5, DisplayName="Patient 5")
>> hold off

Figure: Average inflammation and Patient 5

So this patient seems fairly average.

Remember to tell MATLAB you are done by adding hold off when you have finished adding lines to the figure!

Patients 3 & 4

Try to plot the mean across all patients and the inflammation data for patients 3 and 4 together.

The first part for the mean remains unchanged:

MATLAB

>> day_of_trial = 1:40;
>> plot(day_of_trial, per_day_mean, DisplayName="Mean")
>> legend
>> title("Daily average inflammation")
>> xlabel("Day of trial")
>> ylabel("Inflammation")

Now we need to get the specific data for each patient. We can get the data for patients 3 and 4 as we did in the previous episode i.e. patient_data(3,:). Now we can either save that data in a variable, or we use it directly in the plot instruction, like this:

MATLAB

>> hold on
>> plot(day_of_trial, patient_data(3,:), DisplayName="Patient 3")
>> plot(day_of_trial, patient_data(4,:), DisplayName="Patient 4")
>> hold off

The result looks like this:

Figure: Average inflammation and Patients 3 & 4

Patient 4 also seems quite average, but patient 3’s measurements are quite noisy!

Multiple plots in a figure


Note: tiledlayout, introduced in R2019b, is the newer alternative to the subplot command, which remains available.

It is often convenient to show different plots side by side. The tiledlayout(m,n) command allows us to do just that. The two parameters define a grid of m rows and n columns in which our plots will be placed. To be able to plot something on each of the tiles, we use the nexttile command.

Let’s start a new script for this topic:

MATLAB

>> edit src/tiled_plot.m

We can show the average daily min and max plots together with:

MATLAB

>> day_of_trial = 1:40;
>> tiledlayout(1, 2)
>> nexttile
>> plot(day_of_trial, per_day_max)
>> title("Max")
>> xlabel("Day of trial")
>> ylabel("Inflammation")
>> nexttile
>> plot(day_of_trial, per_day_min)
>> title("Min")
>> xlabel("Day of trial")
>> ylabel("Inflammation")

Figure: Max Min tiledplot

We can also specify titles and labels for the whole tiled layout if we assign the tiled layout to a variable and pass it as a first argument to title, xlabel or ylabel, for example:

MATLAB

>> day_of_trial = 1:40;
>> tlo=tiledlayout(1, 2);
>> title(tlo,"Per day data")
>> xlabel(tlo,"Day of trial")
>> ylabel(tlo,"Inflammation")
>> nexttile
>> plot(day_of_trial, per_day_max)
>> title("Max")
>> nexttile
>> plot(day_of_trial, per_day_min)
>> title("Min")

Figure: Max Min tiledplot with shared labels

Where is the nexttile?

You can specify which tile you want to plot next by specifying the number as an argument to nexttile like so:

MATLAB

>> tiledlayout(3,5)
>> nexttile(3)

Note that, as opposed to numerical arrays, the tile numbering goes along each row first, and then moves on to the next row.
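The difference in ordering can be laid out explicitly. The sketch below is only illustrative: it uses reshape and the transpose operator ('), which we have not covered, to print both numbering schemes for a 2×3 grid, and the names array_order and tile_order are ours.

```matlab
% Linear indices of a 2x3 numeric array run down the columns
array_order = reshape(1:6, 2, 3)    % [1 3 5; 2 4 6]

% Tile numbers in tiledlayout(2,3) run along the rows instead
tile_order = reshape(1:6, 3, 2)'    % [1 2 3; 4 5 6]

% So in a 2x3 layout, tile 4 is the first tile of the second row
tiledlayout(2, 3)
nexttile(4)
title("Tile 4")
```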

Resizing tiles

You can also choose a different size for a plot by occupying many tiles in one go. You do that by specifying the number of rows and columns you want to use in an array ([rows,columns]), like this:

MATLAB

>> nexttile([3,1])

And you can specify the starting tile at the same time, like this:

MATLAB

>> nexttile(8,[2,3])

Note that using a starting tile that overlaps another plot will erase those axes. For example, try:

MATLAB

>> nexttile(1,[2,2])

Clearing a figure

If you now try to plot something like the mean, as we had done before, you will notice that the plot is assigned to the second plot space in the tiled layout.

To clear the tiled layout, you can use the instruction

MATLAB

>> clf

which stands for “clear figure”.

Heatmaps


If we wanted to look at all our data at the same time we would need three dimensions: one for the patients, one for the days, and another one for the inflammation values. One option is to use a heatmap, that is, to use the colour of each point to represent the inflammation values.

In MATLAB, at least two methods can do this for us. The heatmap function takes a table or a matrix as input and produces a heatmap:

MATLAB

>> heatmap(patient_data)
>> title("Inflammation")
>> xlabel("Day of trial")
>> ylabel("Patient number")

Figure: Heat map

We gain something by visualizing the whole dataset at once; for example, we can see that some patients (3, 15, 25, 31, 36 and 60) have very noisy data. However, it is harder to distinguish the details of the inflammatory response.

Similarly, the imagesc function represents the matrix as a color image.

MATLAB

>> imagesc(patient_data)
>> title("Inflammation")
>> xlabel("Day of trial")
>> ylabel("Patient number")

Figure: imagesc Heat map

Every value in the matrix is mapped to a color. Blue regions in this heat map are low values, while yellow shows high values.

Both functions provide very similar information, and can be tweaked to your liking. The imagesc function is usually only used for purely numerical arrays, whereas heatmap can process tables (that can have strings or categories in them). In our case, which one you use is a matter of taste.

Key Points

  • Use plot(vector) to visualize data in the y axis with an index number in the x axis.
  • Use plot(X,Y) to specify values in both axes.
  • Document your plots with title("My title"), xlabel("My horizontal label") and ylabel("My vertical label").
  • Use hold on and hold off to plot multiple lines at the same time.
  • Use legend and add ,DisplayName="legend name here" inside the plot function to add a legend.
  • Use tiledlayout(m,n) to create a grid of m x n plots, and use nexttile to change the position of the next plot.
  • Choose the location and size of the tile by passing arguments to nexttile as nexttile(position,[m,n]).
  • Use heatmap or imagesc to plot a whole matrix with values coded as color hues.

Content from Writing MATLAB Scripts


Last updated on 2023-12-08

Estimated time: 35 minutes

Overview

Questions

  • How can I save and re-use my programs?

Objectives

  • Write and save MATLAB scripts.
  • Save MATLAB plots to disk.
  • Document our scripts for future reference.

In the previous episode we started talking about scripts. A MATLAB script is just a text file with a .m extension, and we found that they let us save and run several commands in one go.

In this episode we will revisit the scripts in a bit more depth, and will recap some of the concepts we’ve learned so far.

We’ve written commands to load data from a .csv file, compute statistics from the data and plot the data in some figures. Let’s put those commands in a script called patient_analysis.m, which we’ll save in the src directory in our current folder, matlab-novice-inflammation.

To create a new script we can click the “New script” button on the top left, or use the command:

MATLAB

>> edit src/patient_analysis.m

MATLAB will create a file called patient_analysis.m in the src folder. It is important that we let MATLAB know that we want it to look for files in this folder. To do this, right click on the folder icon in the file browser and select “Add to Path”.

The MATLAB path

MATLAB knows about files in the current directory, but if we want to run a script saved in a different location, we need to make sure that this file is visible to MATLAB. We do this by adding directories to the MATLAB path. The path is a list of directories MATLAB will search through to locate files.

To add a directory to the MATLAB path, we go to the Home tab, click on Set Path, and then on Add with Subfolders.... We navigate to the directory and add it to the path to tell MATLAB where to look for our files. When you refer to a file (either code or data), MATLAB will search all the directories in the path to find it. Alternatively, for data files, we can provide the relative or absolute file path.
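The same can be done from the Command Window. The sketch below assumes that the current folder contains a src folder and that src/patient_analysis.m has already been created. Note that a folder added with addpath stays on the path only for the current MATLAB session.

```matlab
% Add the src folder (relative to the current folder) to the search path
addpath("src")

% which reports the full location of a script or function MATLAB can find;
% this assumes src/patient_analysis.m already exists
which patient_analysis

% path with no arguments lists every folder currently on the search path
path
```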

We can now type the contents of the script:

MATLAB

% Load patient data
patient_data = readmatrix("data/base/inflammation-01.csv");

% Compute global statistics
g_mean = mean(patient_data(:));
g_max = max(patient_data(:));
g_min = min(patient_data(:));

% Compute patient statistics
p_mean = mean(patient_data(5,:));
p_max = max(patient_data(5,:));
p_min = min(patient_data(5,:));

% Compare patient vs global
disp("Patient 5:")
disp("High mean?")
disp(p_mean > g_mean)
disp("Highest max?")
disp(p_max == g_max)
disp("Lowest min?")
disp(p_min == g_min)

Comments

You might have noticed that we described what we want our code to do in lines starting with the percent sign: %. This is another plus of writing scripts: you can comment your code to make it easier to understand when you come back to it after a while.

Now, before running this script, let’s clear our workspace so that we can see what is happening.

MATLAB

>> clear
>> clc

If you now run the script by clicking “Run” on the graphical user interface, pressing F5 on the keyboard, or typing the script’s name patient_analysis on the command line (the file name without the extension), you’ll see a bunch of variables appear in the workspace and this output:

MATLAB

>> patient_analysis

OUTPUT

Patient 5:
High mean?
   0
Highest max?
   0
Lowest min?
   1

Remember, we suppressed most outputs with ;, so the only lines printed are the ones with disp.

As you can see, the script ran every line of code in order, and created every variable we asked for. Having the code in the script makes it much easier to follow what we are doing, and also to make changes. For example, if we now want to look at patient 8, all we need to do is change the number in lines 10, 11 and 12. We can actually do a bit better, and replace that number with a variable patient_number.

This variable needs to exist before it is used, so let’s insert it before computing the patient statistics, like so:

MATLAB

% Load patient data
patient_data = readmatrix("data/base/inflammation-01.csv");

% Compute global statistics
g_mean = mean(patient_data(:));
g_max = max(patient_data(:));
g_min = min(patient_data(:));

% Compute patient statistics
patient_number = 8;
p_mean = mean(patient_data(patient_number,:));
p_max = max(patient_data(patient_number,:));
p_min = min(patient_data(patient_number,:));

% Compare patient vs global
disp("Patient:")
disp(patient_number)
disp("High mean?")
disp(p_mean > g_mean)
disp("Highest max?")
disp(p_max == g_max)
disp("Lowest min?")
disp(p_min == g_min)

Note that we also changed the disp commands to show the right patient number.

Getting the results for any patient is now as simple as changing the value of patient_number.

For the case of patient 8, we get:

MATLAB

>> patient_analysis

OUTPUT

Patient:
   8
High mean?
   1
Highest max?
   0
Lowest min?
   1

Help text

A comment can appear on any line, but be aware that the first line or block of comments in a script or function is used by MATLAB as the help text. When we use the help command, MATLAB returns the help text. The first help text line (known as the H1 line) typically includes the name of the program, and a brief description. The help command works in just the same way for our own programs as for built-in MATLAB functions. You should write help text for all of your own scripts and functions.

Let’s write an H1 line at the top of our script:

MATLAB

% PATIENT_ANALYSIS   Computes mean, max and min of a patient and compares to global statistics.

We can then get help for our script by running

MATLAB

>> help patient_analysis

OUTPUT

  patient_analysis   Computes mean, max and min of a patient and compares to global statistics.

Script for plotting


You should already have a script from the previous lesson that plots the mean, max and min using a tiled layout. We will replicate that script, but add comments to make it easier to understand.

Create a new script in the src directory called plot_daily_average.m:

MATLAB

>> edit src/plot_daily_average.m

In the script, let’s recap what we need to do:

MATLAB

% PLOT_DAILY_AVERAGE   Plots daily average, max and min inflammation across patients.

% Load patient data
patient_data = readmatrix("data/base/inflammation-01.csv");

fig = figure;

% Define tiled layout and labels
tlo = tiledlayout(1,3);
xlabel(tlo,"Day of trial")
ylabel(tlo,"Inflammation")

% Plot average inflammation per day
nexttile
plot(mean(patient_data, 1))
title("Average")

% Plot max inflammation per day
nexttile
plot(max(patient_data, [], 1))
title("Max")

% Plot min inflammation per day
nexttile
plot(min(patient_data, [], 1))
title("Min")

Note that we are explicitly creating a new figure window using the figure command.

Try this on the command line:

MATLAB

>> figure

MATLAB’s plotting commands only create a new figure window if one doesn’t already exist: the default behaviour is to reuse the current figure window as we saw in the previous episode. Explicitly creating a new figure window in the script avoids any unexpected results from plotting on top of existing figures.

Now let’s run the script:

MATLAB

>> plot_daily_average

You should see the figure appear.

Try running plot_daily_average again without closing the first figure to see that it does not plot on top of the previous figure. A second figure is created. If you look carefully, at the top it is labelled as “Figure 2”.

It is worth mentioning that it is possible to close all the currently open figures with close all.

Saving figures

We can ask MATLAB to save the image too using the saveas command. In order to maintain an organised project we’ll save the images in the results directory:

MATLAB

% Save plot in "results" folder as png image:
saveas(fig,"results/daily_average_01.png")

Getting the current figure

In the script we saved our figure as a variable fig. This is very useful because we can pass it as a reference, for example, for the saveas function. If we had not done that, we would need to pass the “current figure”. You can get the current figure with gcf, like so:

MATLAB

% Save plot in "results" folder as png image:
saveas(gcf,"results/daily_average_01.png")

You can also use gcf to test you are on the right figure, for example with

MATLAB

gcf == fig

Hiding figures

When saving plots to disk, it’s sometimes useful to turn off their visibility as MATLAB plots them. For example, we might not want to view (or spend time closing) the figures in MATLAB, and not displaying the figures could make the script run faster.

Let’s add a couple of lines of code to do this.

We can ask MATLAB to create an empty figure window without displaying it by setting its Visible property to 'off'. We can do this by passing the option as an argument when creating the figure: figure(Visible='off').

When we do this, we have to be careful to manually “close” the figure after we are done plotting on it, the same as we would “close” an actual figure window if it were open. We can do so with the command close.

Adding these two lines, our finished script looks like this:

MATLAB

% PLOT_DAILY_AVERAGE   Saves plot of daily average, max and min inflammation across patients.

% Load patient data
patient_data = readmatrix("data/base/inflammation-01.csv");

fig = figure(Visible='off');

% Define tiled layout and labels
tlo = tiledlayout(1,3);
xlabel(tlo,"Day of trial")
ylabel(tlo,"Inflammation")

% Plot average inflammation per day
nexttile
plot(mean(patient_data, 1))
title("Average")

% Plot max inflammation per day
nexttile
plot(max(patient_data, [], 1))
title("Max")

% Plot min inflammation per day
nexttile
plot(min(patient_data, [], 1))
title("Min")

% Save plot in "results" folder as png image:
saveas(fig,"results/daily_average_01.png")

close(fig)

The scripts we’ve written make regenerating plots easier, and looking at individual patient’s data much simpler, but we still need to open the script, change the patient number, save, and run. In contrast, when we have used functions we can provide arguments, which are then used to do something. So, can we create our own functions?

Key Points

  • Save MATLAB code in files with a .m suffix.
  • The set of commands in a script get executed by calling the script by its name, and all variables are saved to the workspace. Be careful, this potentially replaces variables.
  • Comment your code to make it easier to understand using % at the start of a line.
  • The first line of any script or function (known as the H1 line) should be a comment. It typically includes the name of the program, and a brief description.
  • You can use help script_name to get the information in the H1 line.
  • Create new figures with figure, or new ‘invisible’ figures with figure(Visible='off'). Remember to close them with close(), or close all.
  • Save figures with saveas(fig,"results/my_plot_name.png"), where fig is the figure you want to save, and can be replaced with gcf if you want to save the current figure.

Content from Making Choices


Last updated on 2023-12-04

Estimated time: 40 minutes

Overview

Questions

  • How can programs make choices depending on variable values?

Objectives

  • Introduce conditional statements.
  • Test for equality within a conditional statement.
  • Combine conditional tests using AND and OR.
  • Construct a conditional statement using if, elseif, and else.

In the last lesson we began experimenting with scripts, allowing us to re-use code for analysing data and plotting figures over and over again. To make our scripts even more useful, it would be nice if they did different things in different situations - either depending on the data they’re given or on different options that we specify. We want a way for our scripts to “make choices”.

The tool that MATLAB gives us for doing this is called a conditional statement. We will use conditional statements together with the logical operations we encountered back in lesson 01. The simplest conditional statement starts with an if and concludes with an end, like this:

MATLAB

num = 127;
disp('before conditional...')

if num > 100
    disp('The number is greater than 100')
end

disp('...after conditional')

OUTPUT

before conditional...
The number is greater than 100
...after conditional

Now try changing the value of num to, say, 53:

OUTPUT

before conditional...
...after conditional

MATLAB skipped the code inside the conditional statement because the logical operation returned false.

The choice making is not quite complete yet. We have managed to “do” or “not do” something, but we have not managed to choose between two actions. For that, we need to introduce the keyword else in the conditional statement, like this:

MATLAB

num = 53;
disp('before conditional...')

if num > 100
    disp('The number is greater than 100')
else
    disp('The number is not greater than 100')
end

disp('...after conditional')

OUTPUT

before conditional...
The number is not greater than 100
...after conditional

If the logical operation that follows the if is true, the body of the if statement (i.e., the lines between if and else) is executed. If the logical operation returns false, the body of the else statement (i.e., the lines between else and end) is executed instead. Only one of these statement bodies is ever executed, never both.

We can also “nest” a conditional statement inside another conditional statement.

MATLAB

num = 53;

disp('before conditional...')
if num > 100
    disp('The number is greater than 100')
else
    disp('The number is not greater than 100')
    if num > 50
        disp('But it is greater than 50...')
    end
end

disp('...after conditional')

OUTPUT

before conditional...
The number is not greater than 100
But it is greater than 50...
...after conditional

This “nesting” can be quite useful, so MATLAB has a special keyword for it. We can chain several tests together using elseif. This makes it simple to write a script that gives the sign of a number:

MATLAB

%CONDITIONAL_DEMO   Demo script to illustrate use of conditionals

num = 53;

if num > 0
    disp('num is positive')
elseif num == 0
    disp('num is zero')
else
    disp('num is negative')
end

Recall that we use a double equals sign == to test for equality rather than a single equals sign (which assigns a value to a variable).

During a conditional statement, if one of the conditions is true, this marks the end of the test: no subsequent conditions will be tested and execution jumps to the end of the conditional.

Let’s demonstrate this by adding another condition which is true.

MATLAB

% Demo script to illustrate use of conditionals
num = 53;

if num > 0
    disp('num is positive')
elseif num == 0
    disp('num is zero')
elseif num > 50
    % This block will never be executed
    disp('num is greater than 50')
else
    disp('num is negative')
end

We can also combine logical operations, using && (and) and || (or), as we did before:

MATLAB

if ((1 > 0) && (-1 > 0))
    disp('both parts are true')
else
    disp('At least one part is not true')
end

OUTPUT

At least one part is not true

MATLAB

if (1 < 0) || (3 < 4)
    disp('At least one part is true')
end

OUTPUT

At least one part is true

True and False Statements

The operations we tested above evaluate to a logical value: true or false. However these numerical comparison tests aren’t the only values which are true or false in MATLAB. For example, 1 is considered true and 0 is considered false. In fact, any value can be used in a conditional statement.

Run the code below in order to discover which values are considered true and which are considered false.

MATLAB

if ''
    disp('empty string is true')
else
    disp('empty string is false')
end

if 'foo'
    disp('non empty string is true')
else
    disp('non empty string is false')
end

if []
    disp('empty array is true')
else
    disp('empty array is false')
end

if [22.5, 1.0]
    disp('non empty array is true')
else
    disp('non empty array is false')
end

if [0, 0]
    disp('array of zeros is true')
else
    disp('array of zeros is false')
end

if true
    disp('true is true')
else
    disp('true is false')
end
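
When a condition involves a whole array, it can be clearer to state the intended test explicitly than to rely on how MATLAB reduces the array to a single true or false. A minimal sketch, using a made-up array called readings and the built-in functions isempty, all and any:

```matlab
% Made-up data, used only for illustration
readings = [3, 0, 7];

if isempty(readings)
    summary = 'no readings';
elseif all(readings > 0)
    % all is true only when every element passes the test
    summary = 'every reading is positive';
elseif any(readings > 0)
    % any is true when at least one element passes the test
    summary = 'some readings are positive';
else
    summary = 'no reading is positive';
end

disp(summary)
```

Written this way, the reader does not need to remember the implicit rules to see which case is being tested.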

Close Enough

Write a script called near that performs a test on two variables, and displays 1 when the first variable is within 10% of the other and 0 otherwise. Compare your implementation with your partner’s: do you get the same answer for all possible pairs of numbers?

MATLAB

%NEAR   Display 1 if variable a is within 10% of variable b
%       and display 0 otherwise
a = 1.1;
b = 1.2;

if a/b >= 0.9 && a/b <= 1.1
    disp(1)
else
    disp(0)
end
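
The solution above is one of several reasonable readings of “within 10%”. As a sketch of another, assuming that the tolerance is measured relative to b, we can compare the absolute difference directly. The variable name is_near is our own choice, not part of the exercise.

```matlab
a = 1.1;
b = 1.2;

% Assumption: "within 10%" means the difference is at most 10% of |b|
is_near = abs(a - b) <= 0.1 * abs(b);

disp(is_near)
```

Because this version avoids the division, it does not produce Inf or NaN when b is 0, which is one of the cases where two implementations can disagree.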

Scripts with choices


In the last lesson, we wrote a script that saved several plots to disk. It would be nice if our script could be more flexible. Could we modify it so that it either saved the plots to disk or displayed them on screen? Could we do this in such a way as to make it easy to change between the two behaviours? This is something that conditional statements allow us to do.

We introduce a variable save_plots that we can set to either true or false and modify our script so that when save_plots = true the plots are saved to disk, and when save_plots = false the plots are displayed on the screen.

MATLAB

% PLOT_DAILY_AVERAGE_OPTION   Plots daily average, max and min inflammation across patients. If save_plots is set to 
% true, the figures are saved to disk. If save_plots is set to false, the figures are displayed on the screen.

% Load patient data
patient_data = readmatrix('data/base/inflammation-01.csv');

save_plots=true;

if save_plots == true
    figure(visible='off')
else
    figure
end

% Define tiled layout and labels
tlo = tiledlayout(1,3);
xlabel(tlo,'Day of trial')
ylabel(tlo,'Inflammation')

% Plot average inflammation per day
nexttile
plot(mean(patient_data, 1))
title('Average')

% Plot max inflammation per day
nexttile
plot(max(patient_data, [], 1))
title('Max')

% Plot min inflammation per day
nexttile
plot(min(patient_data, [], 1))
title('Min')

if save_plots == true 
    % Save plot in 'results' folder as png image:
    saveas(gcf,'results/daily_average_01.png')

    close()

end

Save the script in a file named plot_daily_average_option.m and investigate what setting the variable save_plots to true and false does.

Changing behaviour based on patient data

We’d like to improve our patient_analysis script from the previous lesson, specifically its output. Currently the script displays 0 or 1 to indicate whether or not the patient has a high mean, has a maximum equal to the highest in the dataset, and has a minimum equal to the lowest in the dataset. Instead, we’d like the script to print a line of descriptive text for each of the following that applies:

  1. The mean inflammation for the patient is higher than the global mean.
  2. The maximum inflammation for the patient is the same as the global maximum.
  3. The minimum inflammation for the patient is the same as the global minimum.
  4. If none of the above is the case, then the script should print a line informing us that the patient’s mean, maximum and minimum inflammation are not remarkable.

Using the patient_analysis script from the previous lesson as a starting point (shown below for reference), can you use conditional statements to make a script that does this?

MATLAB

% Load patient data
patient_data = readmatrix('data/base/inflammation-01.csv');

% Compute global statistics
g_mean = mean(patient_data(:));
g_max = max(patient_data(:));
g_min = min(patient_data(:));

patient_number = 8;

% Compute patient statistics
p_mean = mean(patient_data(patient_number,:));
p_max = max(patient_data(patient_number,:));
p_min = min(patient_data(patient_number,:));

% Compare patient vs global
disp('Patient:')
disp(patient_number)
disp('High mean?')
disp(p_mean > g_mean)
disp('Highest max?')
disp(p_max == g_max)
disp('Lowest min?')
disp(p_min == g_min)

There are several different ways to do this, so compare your finished script with your neighbour’s and see if you did it the same way.

MATLAB

% Load patient data
patient_data = readmatrix('data/base/inflammation-01.csv');

% Compute global statistics
g_mean = mean(patient_data(:));
g_max = max(patient_data(:));
g_min = min(patient_data(:));

patient_number = 8;

% Compute patient statistics
p_mean = mean(patient_data(patient_number,:));
p_max = max(patient_data(patient_number,:));
p_min = min(patient_data(patient_number,:));

% Compare patient vs global
disp('Patient:')
disp(patient_number)

printed_something = false;

if p_mean > g_mean
    disp('Patient''s mean inflammation is higher than the global mean inflammation.')
    printed_something = true;
end

if p_max == g_max
    disp('Patient''s maximum inflammation is the same as the global maximum.')
    printed_something = true;
end

if p_min == g_min
    disp('Patient''s minimum inflammation is the same as the global minimum.')
    printed_something = true;
end

if printed_something == false
    disp('Patient''s mean, maximum and minimum inflammation are not of interest.')
end
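
As one of the several possible variations, the three tests can be collected in a single logical array and checked with any, which removes the need for the printed_something flag. This is only a sketch: the small matrix below is made-up data standing in for the inflammation file, and is_remarkable is a name we introduce here.

```matlab
% Made-up 3x4 data set standing in for inflammation-01.csv
patient_data = [0 1 2 1; 0 2 4 2; 1 1 1 1];
patient_number = 3;

% Readings for the patient of interest
row = patient_data(patient_number, :);

% One logical flag per test, collected in a single array
is_remarkable = [mean(row) > mean(patient_data(:)), ...
                 max(row) == max(patient_data(:)), ...
                 min(row) == min(patient_data(:))];

if any(is_remarkable)
    disp('At least one statistic stands out for this patient.')
else
    disp('Patient''s mean, maximum and minimum inflammation are not of interest.')
end
```

With this made-up data, none of the three tests is true for patient 3, so the final message is displayed.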

Key Points

  • Use conditional statements to make choices based on values in your program.
  • A conditional statement block starts with an if and finishes with end. It can also include an else.
  • Use elseif to chain several tests in a single conditional statement.
  • Use && (and), || (or) to combine logical operations.
  • At most one of the statement bodies is executed.

Content from Creating Functions


Last updated on 2023-12-04

Estimated time: 65 minutes

Overview

Questions

  • How can I teach MATLAB to do new things?
  • How can I make programs I write more reliable and re-usable?

Objectives

  • Learn how to write a function.
  • Define a function that takes arguments.
  • Compare and contrast MATLAB function files with MATLAB scripts.
  • Recognise why we should divide programs into small, single-purpose functions.

Writing functions from scratch


It has come to our attention that the data about inflammation that we’ve been analysing contains some systematic errors. The measurements were made using the incorrect scale, with inflammation recorded in Arbitrary Inflammation Units (AIU) rather than the scientific standard International Inflammation Units (IIU). Luckily there is a handy formula which can be used for converting measurements in AIU to IIU, but it involves some hard-to-remember constants:

MATLAB

% Constants first, so that the lines can be run in order
A = 0.275;
B = 5.634;
inflammation_IIU = (inflammation_AIU + B)*A

There are twelve files worth of data to be converted from AIU to IIU: is there a way we can do this quickly and conveniently? If we have to re-enter the conversion formula multiple times, the chance of us getting the constants wrong is high. Thankfully there is a convenient way to teach MATLAB how to do new things, like converting units from AIU to IIU. We can do this by writing a function.

We have already used some predefined MATLAB functions which we can pass arguments to. How can we define our own?

A MATLAB function must be saved in a text file with a .m extension. The name of the file must be the same as the name of the function defined in the file.

The first line of our function is called the function definition and must include the special function keyword to let MATLAB know that we are defining a function. Anything following the function definition line is called the body of the function. The keyword end marks the end of the function body. The function only knows about code that comes between the function definition line and the end keyword. It will not have access to variables from outside this block of code apart from those that are passed in as arguments or input parameters. The rest of our code won’t have access to any variables from inside this block, apart from those that are passed out as output parameters.

A function can have multiple input and output parameters as required, but doesn’t have to have any. The general form of a function is shown in the pseudo-code below:

MATLAB

function [out1, out2] = function_name(in1, in2)
    % FUNCTION_NAME   Function description
    %    Can add more text for the function help
    %    An example is always useful!

    % This section below is called the body of the function
    out1 = calculation using in1 and in2;
    out2 = another calculation;
end

Just as we saw with scripts, functions must be visible to MATLAB, i.e., a file containing a function has to be placed in a directory that MATLAB knows about. Following the same logic we used with scripts, we will put our source code files in the src folder.

Let’s put this into practice to create a function that will teach MATLAB to use our AIU to IIU conversion formula. Create a file called inflammation_AIU_to_IIU.m in the src folder, enter the following function definition, and save the file:

MATLAB

function inflammation_in_IIU = inflammation_AIU_to_IIU(inflammation_in_AIU)
   % INFLAMMATION_AIU_TO_IIU  Convert inflammation measured in AIU to inflammation measured in IIU.

   A = 0.275;
   B = 5.634;

   inflammation_in_IIU = (inflammation_in_AIU + B)*A;

end

We can now call our function as we would any other function in MATLAB:

MATLAB

>> inflammation_AIU_to_IIU(0.5)

OUTPUT

ans = 1.6869

We got the number we expected, and at first glance it seems like it is almost the same as a script. However, if you look at the variables in the workspace, you’ll probably notice one big difference. Although a variable called inflammation_in_IIU was defined in the function, it does not exist in our workspace.

Let’s have a look using the debugger to see what is happening.

When we pass a value, like 0.5, to the function, it is assigned to the variable inflammation_in_AIU so that it can be used in the body of the function. To return a value from the function, we must assign that value to the variable inflammation_in_IIU from our function definition line. Whatever value inflammation_in_IIU has when the end keyword in the function definition is reached will be the value returned.

Outside the function, the variables inflammation_in_AIU, inflammation_in_IIU, A, and B aren’t accessible; they are only used in the function body.

This is one of the major differences between scripts and functions: a script can be thought of as automating the command line, with full access to all variables in the base workspace, whereas a function has its own separate workspace.

To be able to access variables from your workspace inside a function, you have to pass them in as inputs. To be able to save variables from the function to your workspace, the function needs to return them as outputs.

As with any operation, if we want to save the result, we need to assign the result to a variable, for example:

MATLAB

>> val_in_IIU = inflammation_AIU_to_IIU(0.5)

OUTPUT

val_in_IIU = 1.6869

And we can see val_in_IIU saved in our workspace.

Writing your own conversion function

We’d like a function that reverses the conversion of AIU to IIU. Re-arrange the conversion formula and write a function called inflammation_IIU_to_AIU that converts inflammation measured in IIU to inflammation measured in AIU.

Remember to save your function definition in a file with the required name, start the file with the function definition line, followed by the function body, ending with the end keyword.

MATLAB

function inflammation_in_AIU = inflammation_IIU_to_AIU(inflammation_in_IIU)
   % INFLAMMATION_IIU_TO_AIU   Convert inflammation measured in IIU to inflammation measured in AIU.

   A = 0.275;
   B = 5.634;

   inflammation_in_AIU = inflammation_in_IIU/A - B;

end

Functions that work on arrays

One of the benefits of writing functions in MATLAB is that often they will also be able to operate on an array of numerical variables for free.

This will work when each operation in the function can be applied to an array too. In our example, we are adding a number and multiplying by another, both of which work on arrays.

This will make converting the inflammation data in our files using the function we’ve just written very quick. Give it a go!
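
To illustrate, here is a minimal sketch that converts a small matrix in a single call. So that the example is self-contained, it uses an anonymous function aiu_to_iiu as a stand-in for inflammation_AIU_to_IIU, and readings_AIU is made-up data rather than values from the inflammation files.

```matlab
A = 0.275;
B = 5.634;

% Anonymous-function stand-in for inflammation_AIU_to_IIU (same formula)
aiu_to_iiu = @(inflammation_in_AIU) (inflammation_in_AIU + B) * A;

% Made-up 2x3 block of readings in AIU
readings_AIU = [0 1 2; 3 4 5];

% Adding a scalar and multiplying by a scalar both act on every element
readings_IIU = aiu_to_iiu(readings_AIU);

disp(size(readings_IIU))
disp(readings_IIU)
```

The result has the same size as the input, so a whole data file loaded with readmatrix could be passed to the function in the same way.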

Transforming scripts into functions


In the patient_analysis script we created, we can choose which patient to analyse by modifying the variable patient_number. If we want information about patient 13, we need to open patient_analysis.m, go to line 9, modify the variable, save and then run patient_analysis. This is a lot of steps for such a simple request.

Can we use what we’ve learned about writing functions to transform (or refactor) our script into a function, increasing its usefulness in the process?

We already have a .m file called patient_analysis, so let’s begin by defining a function with that name.

Open the patient_analysis.m file, if you don’t already have it open. Instead of line 9, where patient_number is set, we want to provide that variable as an input. So let’s remove that line, and right at the top of our script we’ll add the function definition telling MATLAB what our function is called and what inputs it needs. The function will take the variable patient_number as input and since we removed the line that assigned a value to that variable, the input will decide which patient is analysed.

MATLAB

function patient_analysis(patient_number)
    % PATIENT_ANALYSIS   Computes mean, max and min of a patient and compares to global statistics.
    %    Takes the patient number as an input, and prints the relevant information to console.
    %    Sample usage:
    %       patient_analysis(5)

    % Load patient data
    patient_data = readmatrix('data/base/inflammation-01.csv');

    % Compute global statistics
    g_mean = mean(patient_data(:));
    g_max = max(patient_data(:));
    g_min = min(patient_data(:));

    % Compute patient statistics
    p_mean = mean(patient_data(patient_number,:));
    p_max = max(patient_data(patient_number,:));
    p_min = min(patient_data(patient_number,:));

    % Compare patient vs global
    disp('Patient:')
    disp(patient_number)
    disp('High mean?')
    disp(p_mean > g_mean)
    disp('Highest max?')
    disp(p_max == g_max)
    disp('Lowest min?')
    disp(p_min == g_min)

end

Congratulations! You’ve now created a MATLAB function from a MATLAB script!

You may have noticed that the code inside the function is indented. MATLAB does not need this, but it makes it much more readable!

Let’s clear our workspace and run our function in the command line:

MATLAB

>> clear
>> clc
>> patient_analysis(13)

OUTPUT

Patient:
    13
High mean?
   0
Highest max?
   0
Lowest min?
   1

So now we can get the patient analysis of whichever patient we want, and we do not need to modify patient_analysis.m anymore. However, you may have noticed that we have no variables in our workspace. Remember, inside the function, the variables patient_data, g_mean, g_max, g_min, p_mean, p_max, and p_min are created, but then they are deleted when the function ends. If we want to save them, we need to pass them as outputs.

Let’s say, for example, that we want to save the mean of each patient. In our patient_analysis.m we already compute the value and save it in p_mean, but we need to tell MATLAB that we want the function to return it.

To do that we modify the function definition like this:

MATLAB

function p_mean = patient_analysis(patient_number)

It is important that the variable name is the same as the one used inside the function.

If we now run our function in the command line, we get:

MATLAB

p13 = patient_analysis(13)

OUTPUT

Patient:
    13
High mean?
   0
Highest max?
   0
Lowest min?
   1

p13 =
    0.1049

We could return more outputs if we want. For example, let’s return the min and max as well. To do that, we need to specify all the outputs in square brackets, as an array. So we need to replace the function definition with:

MATLAB

function [p_mean,p_max,p_min] = patient_analysis(patient_number)

To call our function now we need to provide space for all 3 outputs, so in the command line, we run it as:

MATLAB

[p13_mean,p13_max,p13_min] = patient_analysis(13)

OUTPUT

Patient:
    13
High mean?
   0
Highest max?
   0
Lowest min?
   1
p13_mean =
    0.1049
p13_max =
    0.3450
p13_min =
     0

Callout

Note: If you do not provide space for all the outputs, MATLAB assumes you are only interested in the first one. If you provide no output variable at all, ans stores the mean.
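
The same rule applies to built-in functions with several outputs. A short sketch using max, which returns the largest value and its position; daily_values is made-up data, and the tilde (~) placeholder skips an output we do not need:

```matlab
% Made-up data, used only for illustration
daily_values = [2, 9, 4, 9, 1];

% Ask for both outputs: the largest value and where it first occurs
[largest, position] = max(daily_values);

% Ask for one output: only the first is returned
largest_only = max(daily_values);

% Skip the first output with ~ and keep only the second
[~, position_only] = max(daily_values);

disp([largest, position])
```

The order of the outputs is fixed by the function definition, so the names on the left-hand side are matched by position, not by name.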

Plotting daily average of different data files

Look back at the plot_daily_average script. The data and resulting image file names are hard-coded in the script. We actually have 12 datafiles. Turn the script into a function that lets you generate the plots for any of the files.

The function should operate on a single data file, and should have two parameters: data_file and plot_name. When called, the function should create the three graphs, and save the plot as plot_name.

You should mostly be reusing code from the plot_daily_average script.

MATLAB

function plot_daily_average(data_file,plot_name)
    %PLOT_DAILY_AVERAGE   Plots daily average, max and min inflammation across patients.
    %   The function plots the data in data_file and saves the figure as plot_name
    %   Example usage:
    %       plot_daily_average('data/base/inflammation-03.csv','results/plot3.png')

    % Load patient data
    patient_data = readmatrix(data_file);

    figure(visible='off')

    % Define tiled layout and labels
    tlo = tiledlayout(1,3);
    xlabel(tlo,'Day of trial')
    ylabel(tlo,'Inflammation')

    % Plot average inflammation per day
    nexttile
    plot(mean(patient_data, 1))
    title('Average')

    % Plot max inflammation per day
    nexttile
    plot(max(patient_data, [], 1))
    title('Max')

    % Plot min inflammation per day
    nexttile
    plot(min(patient_data, [], 1))
    title('Min')

    % Save plot in 'results' folder as png image:
    saveas(gcf,plot_name)

    close()
end

Plotting patient vs mean

Create a function called patient_vs_mean that generates a plot like this one:

[Figure: Plotting patient vs mean]

The function should have the following inputs:

  • per_day_mean - A 1D array with the average inflammation per day already loaded (you’ll have to load the data and compute per_day_mean before calling the function).

  • patient_data - A 1D array with the data for the patient of interest only.

  • patient_reference - A string that will be used to identify the patient on the plot, and also as a file name (you should add the extension png in your function).

When called, the function should create and save the plot as patient_reference.png in the results folder.

Look back at the previous lessons if you need to!

MATLAB

function patient_vs_mean(per_day_mean,patient_data,patient_reference)
    % PATIENT_VS_MEAN   Plots the global mean and patient inflammation on top of each other.
    %   per_day_mean should be a vector with the global mean.
    %   patient_data should be a vector with only the patient data.
    %   patient_reference will be used to identify the patient on the plot.
    %
    %   Sample usage:
    %       patient_data = readmatrix('data/base/inflammation-01.csv');
    %       per_day_mean = mean(patient_data);
    %       patient_vs_mean(per_day_mean,patient_data(5,:),"Patient 5")

    figure(visible='off')

    %Plot per_day_mean
    plot(per_day_mean,DisplayName="Mean")
    legend
    title('Daily average inflammation')
    xlabel('Day of trial')
    ylabel('Inflammation')

    %Overlap patient data
    hold on
    plot(patient_data,DisplayName=patient_reference)
    hold off

    % Save plot
    saveas(gcf,"results/"+patient_reference+".png")

    close()

end

Key Points

  • A MATLAB function must be saved in a text file with a .m extension. The name of the file must be the same as the name of the function defined in the file.
  • Define functions using the function keyword to start the definition, and close the definition with the keyword end.
  • Functions have an independent workspace. Access variables from your workspace inside a function by passing them as inputs. Access variables from the function by returning them as outputs.
  • The header of a function with inputs and outputs has the form:

function [output_1,output_2,...] = function_name(input_1,input_2,...)

  • Break programs up into short, single-purpose functions with meaningful names.

Content from Repeating With Loops


Last updated on 2023-12-04

Estimated time: 50 minutes

Overview

Questions

  • How can I repeat the same operations on multiple values?

Objectives

  • Explain what a for loop does.
  • Correctly write for loops that repeat simple commands.
  • Trace changes to a loop variable as the loop runs.
  • Use a for loop to process multiple files.

Recall that we have twelve datasets in total. We’re going to need a better way to analyse them all than typing out commands for each one, because we’ll find ourselves writing a lot of duplicated code. Code that is repeated in two or more places will eventually be wrong in at least one as our project develops over time. Also, if we make changes in the way we analyse our datasets, we have to introduce that change in every copy of our code. To avoid all of this repetition, we have to teach MATLAB to repeat our commands, and to do that, we have to learn how to write loops.

We’ll start with an example. Suppose we want to print each character in the word “lead” on a line of its own. One way is to use four disp statements:

MATLAB

%LOOP_DEMO   Demo script to explain loops

word = 'lead';

disp(word(1))
disp(word(2))
disp(word(3))
disp(word(4))

OUTPUT

l
e
a
d

But this is a bad approach for two reasons:

  1. It doesn’t scale: if we want to print the characters in a string that’s hundreds of letters long, we’d be better off typing them in.

  2. It’s fragile: if we change word to a longer string, it only prints part of the data, and if we change it to a shorter one, it produces an error, because we’re asking for characters that don’t exist.

MATLAB

%LOOP_DEMO   Demo script to explain loops

word = 'tin';

disp(word(1))
disp(word(2))
disp(word(3))
disp(word(4))

OUTPUT

error: A(I): index out of bounds; value 4 out of bound 3

There’s a better approach:

MATLAB

%LOOP_DEMO   Demo script to explain loops

word = 'lead';

for letter = 1:4
    disp(word(letter))
end

OUTPUT

l
e
a
d

This improved version uses a for loop to repeat an operation — in this case, printing to the screen — once for each element in an array.

The general form of a for loop is:

MATLAB

for variable = collection
    % Do things with variable
end

The for loop executes the commands in the loop body once for every value in the array collection. The variable that takes each of these values in turn is called the loop variable, and we can call it whatever we like. In our example, we gave it the name letter.

We have to terminate the loop body with the end keyword, and we can have as many commands as we like in the loop body. But we have to remember that they will all be repeated as many times as there are values in collection.
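
The collection does not have to be a range of indices such as 1:4. As a small sketch with made-up numbers, the loop variable below takes each value of a row vector in turn, and squares is a name we chose for the result:

```matlab
% Start with an empty array and grow it inside the loop
squares = [];

% The loop variable takes the values 3, 1 and 4 in turn
for value = [3, 1, 4]
    squares(end + 1) = value^2;
end

disp(squares)
```

For a row vector, the loop visits the elements from left to right, so squares ends up in the same order as the collection.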

Our for loop has made our code more scalable, and less fragile. There’s still one little thing about it that should bother us. For our loop to deal appropriately with shorter or longer words, we have to change the first line of our loop by hand:

MATLAB

%LOOP_DEMO   Demo script to explain loops

word = 'tin';

for letter = 1:3
    disp(word(letter))
end

OUTPUT

t
i
n

Although this works, it’s not the best way to write our loop:

  • We might update word and forget to modify the loop to reflect that change.

  • We could make a mistake while counting the number of letters in word.

Fortunately, MATLAB provides us with a convenient function to write a better loop:

MATLAB

%LOOP_DEMO   Demo script to explain loops

word = 'aluminum';

for letter = 1:length(word)
    disp(word(letter))
end

OUTPUT

a
l
u
m
i
n
u
m

This is much more robust code, as it can deal with words of arbitrary length. Loops are not only for working with strings; they allow us to do repetitive calculations regardless of data type. Here’s another loop that calculates the sum of all even numbers between 1 and 10:

MATLAB

%LOOP_DEMO   Demo script to explain loops

total = 0;
for even_number = 2 : 2 : 10
    total = total + even_number;
end

disp('The sum of all even numbers between 1 and 10 is:')
disp(total)

It’s worth tracing the execution of this little program step by step.

The debugger

We can use the MATLAB debugger to trace the execution of a program.

The first step is to set a break point by clicking just to the right of a line number on the - symbol. A red circle will appear — this is the break point, and when we run the script, MATLAB will pause execution at that line.

A green arrow appears, pointing to the next line to be run. To continue running the program one line at a time, we use the step button.

We can then inspect variables in the workspace or by hovering the cursor over where they appear in the code, or get MATLAB to evaluate expressions in the command window (notice the prompt changes to K>>).

This process is useful to check your understanding of a program, in order to correct mistakes.

This process is illustrated below:

[Figure: debugger-demo]

Since we want to sum only even numbers, the loop index even_number starts at 2 and increases by 2 with every iteration. When we enter the loop, total is zero - the value assigned to it beforehand. The first time through, the loop body adds the value of the first even number (2) to the old value of total (0), and updates total to refer to that new value. On the next loop iteration, even_number is 4 and the initial value of total is 2, so the new value assigned to total is 6. After even_number reaches the final value (10), total is 30; since this is the end of the range for even_number the loop finishes and the disp statements give us the final answer.
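
If the debugger is not to hand, a similar trace can be sketched by printing the loop variable and the running total on each pass with fprintf. This is only an illustration of the tracing described above, not a replacement for the debugger:

```matlab
total = 0;

for even_number = 2 : 2 : 10
    total = total + even_number;
    % Print the state at the end of each pass through the loop body
    fprintf('even_number = %2d, total = %2d\n', even_number, total);
end

% Cross-check the loop against the built-in sum function
disp(total == sum(2 : 2 : 10))
```

Each printed line corresponds to one iteration, so the last line shows the final values of even_number and total.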

Note that a loop variable is just a variable that’s being used to record progress in a loop. It still exists after the loop is over, and we can re-use variables previously defined as loop variables as well:

MATLAB

>> disp(even_number)

OUTPUT

10

Performing Exponentiation

MATLAB uses the caret (^) to perform exponentiation:

MATLAB

>> disp(5^3)

OUTPUT

125

You can also use a loop to perform exponentiation. Remember that b^x is just b multiplied by itself x times: b*b*...*b, with x factors.

Let a variable b be the base of the number and x the exponent. Write a loop to compute b^x. Check your result for b = 4 and x = 5.

MATLAB

% Loop to perform exponentiation
b = 4;    % base
x = 5;    % exponent

result=1;
for i = 1:x
    result = result * b;
end

disp([num2str(b), '^', num2str(x), ' = ', num2str(result)])

Incrementing with Loops

Write a loop that spells the word “aluminum,” adding one letter at a time:

OUTPUT

a
al
alu
alum
alumi
alumin
aluminu
aluminum

MATLAB

% spell a string adding one letter at a time using a loop

word = 'aluminum';

for letter = 1:length(word)
    disp(word(1:letter))
end

Looping in Reverse

In MATLAB, the colon operator (:) accepts a stride or skip argument between the start and stop:

MATLAB

>> disp(1:3:11)

OUTPUT

1 4 7 10

MATLAB

>> disp(11:-3:1)

OUTPUT

11 8 5 2

Using this, write a loop to print the letters of “aluminum” in reverse order, one letter per line.

OUTPUT

m
u
n
i
m
u
l
a

MATLAB

% Spell a string in reverse using a loop

word = 'aluminum';

for letter = length(word):-1:1
    disp(word(letter))
end

Analyzing patient data from multiple files


We now have almost everything we need to process multiple data files using a loop and the plotting code in our plot_daily_average function from the last lesson.

We will need to generate a list of data files to process, and then we can use a loop to repeat the analysis for each file.

We can use the dir command to return a structure array containing the names of the files in the data directory. Each element in this structure array is a structure, containing information about a single file in the form of named fields.

MATLAB

>> files = dir('data/base/inflammation-*.csv')

OUTPUT

files =
  12×1 struct array with fields:
    name
    folder
    date
    bytes
    isdir
    datenum

To access the name field of the first file, we can use the following syntax:

MATLAB

>> filename = files(1).name;
>> disp(filename)

OUTPUT

inflammation-01.csv

To get the modification date of the third file, we can do:

MATLAB

>> mod_date = files(3).date;
>> disp(mod_date)

OUTPUT

06-Nov-2023 14:34:15
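
Field access also works across the whole structure array. As a sketch, the expression {files.name} gathers the name field of every element into a cell array. To keep the example self-contained, files is built by hand here as a stand-in for the output of dir, with only a name field:

```matlab
% Hand-built stand-in for the structure array returned by dir
files = struct('name', {'inflammation-01.csv', 'inflammation-02.csv'});

% Collect the name field of every element into a cell array
names = {files.name};

disp(numel(files))
disp(names{2})
```

Curly braces are used to read a single entry from the cell array, as in names{2}.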

A good first step towards processing multiple files is to write a loop which prints the name of each of our files. Let’s write this in a script plot_all.m which we will then develop further:

MATLAB

%PLOT_ALL	Developing code to automate inflammation analysis

files = dir('data/base/inflammation-*.csv');

for i = 1:length(files)
	file_name = files(i).name;
	disp(file_name)
end

MATLAB

>> plot_all

OUTPUT

inflammation-01.csv
inflammation-02.csv
inflammation-03.csv
inflammation-04.csv
inflammation-05.csv
inflammation-06.csv
inflammation-07.csv
inflammation-08.csv
inflammation-09.csv
inflammation-10.csv
inflammation-11.csv
inflammation-12.csv

Another task is to generate the file names for the figures we’re going to save. Let’s name the output file after the data file used to generate the figure. So for the data set inflammation-01.csv we will call the figure inflammation-01.png. We can use the replace command for this purpose.

The syntax for the replace command is like this:

MATLAB

NEWSTR = replace(STR, OLD, NEW)

So for example if we have the string big_shark and want to get the string little_shark, we can execute the following command:

MATLAB

>> new_string = replace('big_shark', 'big', 'little');
>> disp(new_string)

OUTPUT

little_shark
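
Since replace substitutes every occurrence of the old text, an alternative sketch is to split the file name with fileparts and attach the new extension ourselves. The names base_name and img_name are illustrative:

```matlab
file_name = 'inflammation-01.csv';

% fileparts returns the folder, the base name and the extension;
% the ~ placeholders skip the outputs we do not need
[~, base_name, ~] = fileparts(file_name);

% Attach the new extension to the base name
img_name = [base_name, '.png'];

disp(img_name)
```

This version only ever changes the extension, even if the text .csv happened to appear elsewhere in the name.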

GNU Octave

In Octave, the replace function doesn’t exist, but the strrep function is a direct replacement. The above example becomes

MATLAB

>> new_string = strrep('big_shark', 'big', 'little')
new_string = little_shark

Recall that we’re saving our figures to the results directory. The best way to generate a path to a file in MATLAB is by using the fullfile command. This generates a file path with the correct separators for the platform you’re using (i.e. forward slash for Linux and macOS, and backslash for Windows). This makes your code more portable which is great for collaboration.
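
As a brief sketch, fullfile joins its arguments with the separator returned by filesep, so the same line of code produces a valid path on each platform. The file name is the one used in this lesson; expected_path is a name introduced only for the comparison:

```matlab
% Build the path with fullfile
img_path = fullfile('results', 'inflammation-01.png');

% Build the same path by hand using the platform's separator
expected_path = ['results', filesep, 'inflammation-01.png'];

disp(img_path)
disp(strcmp(img_path, expected_path))
```

Writing the separator by hand, as in 'results/inflammation-01.png', ties the code to one convention, which is what fullfile lets us avoid.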

Putting these concepts together, we can now generate the paths for the data files, and the image files we want to save:

MATLAB

%PLOT_ALL	Developing code to automate inflammation analysis

files = dir('data/base/inflammation-*.csv');

for i = 1:length(files)
    file_name = files(i).name;

    % Generate string for image name
    img_name = replace(file_name, '.csv', '.png');

    % Generate path to data file and image file
    file_name = fullfile('data', 'base', file_name);
    img_name = fullfile('results',img_name);

    disp(file_name)
    disp(img_name)
end

OUTPUT

data/base/inflammation-01.csv
results/inflammation-01.png
data/base/inflammation-02.csv
results/inflammation-02.png
data/base/inflammation-03.csv
results/inflammation-03.png
data/base/inflammation-04.csv
results/inflammation-04.png
data/base/inflammation-05.csv
results/inflammation-05.png
data/base/inflammation-06.csv
results/inflammation-06.png
data/base/inflammation-07.csv
results/inflammation-07.png
data/base/inflammation-08.csv
results/inflammation-08.png
data/base/inflammation-09.csv
results/inflammation-09.png
data/base/inflammation-10.csv
results/inflammation-10.png
data/base/inflammation-11.csv
results/inflammation-11.png
data/base/inflammation-12.csv
results/inflammation-12.png

We’re now ready to modify plot_all.m to actually process multiple data files:

MATLAB

%PLOT_ALL   Plot statistics for all data files.
%           Save plots of statistics to disk.

files = dir('data/base/inflammation-*.csv');

% Process each file in turn
for i = 1:length(files)
    file_name = files(i).name;

    % Generate strings for image names:
    img_name  = replace(file_name, '.csv', '.png');

    % Generate path to data file and image file
    file_name = fullfile('data', 'base', file_name);
    img_name  = fullfile('results', img_name);

    plot_daily_average(file_name, img_name);

end

We run the modified script using its name in the Command Window:

MATLAB

>> plot_all

The first three figures output to the results directory are as shown below:

inflammation-01.png
inflammation-02.png
inflammation-03.png

We’ve now automated the generation of these figures for all the data stored in our data folder. With minor modifications, this script could be re-used to check all our future data files.

Investigating patients with a high mean

We’re particularly interested in patients who have a mean inflammation higher than the global mean.

Write a script called plot_high_mean_patients that reads in the file inflammation-01.csv and compares each patient’s mean inflammation to the global mean. If a patient’s mean inflammation is greater than the global mean, use the function patient_vs_mean to save a plot of their inflammation to disk for later analysis. Use both for loops and conditional statements to do this.

Using what you’ve learned about dealing with multiple files, turn this script into a function that takes the filename of a data file as input and run it on all of the inflammation data files.

MATLAB

% PLOT_HIGH_MEAN_PATIENTS   Saves plots of patients with mean inflammation higher than the global mean inflammation.

patient_data = readmatrix('data/base/inflammation-01.csv');

per_day_mean = mean(patient_data);
global_mean =  mean(patient_data(:));

number_of_patients = size(patient_data,1);

for patient_id = 1:number_of_patients

    patient_mean = mean(patient_data(patient_id,:));

    if(patient_mean > global_mean)
        patient_reference = "Patient " + string(patient_id);
        patient_vs_mean(per_day_mean, patient_data(patient_id,:), patient_reference);
    end

end
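Before running the script on the real data, the selection logic can be checked on a small matrix where the answer is easy to work out by hand. The sketch below uses made-up values (not taken from the inflammation dataset) and records the matching patient numbers instead of calling patient_vs_mean; the variable high_mean_ids is introduced here purely for illustration.

```matlab
% Made-up stand-in data: 3 patients (rows) by 4 days (columns).
% These values are illustrative only, not from the inflammation dataset.
patient_data = [1 2 3 4;
                0 0 1 1;
                5 5 5 5];

global_mean = mean(patient_data(:));        % mean over every value: 32/12
number_of_patients = size(patient_data, 1);

high_mean_ids = [];                         % patients above the global mean

for patient_id = 1:number_of_patients
    patient_mean = mean(patient_data(patient_id, :));

    if patient_mean > global_mean
        % In the real script, this is where patient_vs_mean would be called.
        high_mean_ids(end+1) = patient_id;
    end
end

disp(high_mean_ids)
```

Here the global mean is about 2.67 and the patient means are 2.5, 0.5 and 5, so only patient 3 is selected.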

Key Points

  • Use for to create a loop that repeats one or more operations.