Four ways of seeing matrix multiplication

The act of multiplying two matrices is common and trivial (though occasionally error prone!). However, most learn only a single way of viewing the process. The following will explore the most common way of doing matrix multiplication, followed by three more uncommon (but equally valid) ways.

The first way.

We’re going to use two toy matrices throughout our exercises here, matrices A and B:

A = \begin{bmatrix}  2&3 \\  4&5  \end{bmatrix}  B = \begin{bmatrix}  1&3 \\  7&9  \end{bmatrix}

The most common way of viewing the product of two matrices is one where each individual element or coordinate of the result matrix is the dot product of a row vector and column vector from the first and second matrices, respectively. Therefore the value of


is 23 because,

\begin{bmatrix}  2\\  3  \end{bmatrix}^T  \begin{bmatrix}  1\\  7  \end{bmatrix} = 23

The process of dotting the vector combinations continue until the final result is produced:

\begin{bmatrix}  23&33\\  39&57  \end{bmatrix}

Graphically what we’ve done looks something like this:


We’ve taken the red vector, dotted it with the blue vector, and produced the green dot (the dot product).

The second and third way

Linear algebra is big on the notion of linear combinations and so to see the product of matrices as linear combinations is reasonable. Let’s view a column in the product as a linear combination of the columns of the multiplier. The scalers for each column in the first matrix come from the vectors of the multiplicand. We can therefore see the first column of the product of our toy matrices as the following linear combination:

1*\begin{bmatrix}  2\\  4  \end{bmatrix} + 7 *  \begin{bmatrix}  3\\  5  \end{bmatrix} =  \begin{bmatrix}  23\\  39  \end{bmatrix}

and the second column as:

3*\begin{bmatrix}  2\\  4  \end{bmatrix} + 9 *  \begin{bmatrix}  3\\  5  \end{bmatrix} =  \begin{bmatrix}  33\\  57  \end{bmatrix}

We can again represent the operation graphically like this:


Where we multiply the red columns by the blue column to produce the green column.

Alternatively we can look at a row in the result as the linear combination of rows from the multiplicand (the “third way”).

So we take row elements from row 1 of the multiplier as scalers for the row vectors of the multiplicand:

2*\begin{bmatrix}  1\\  3  \end{bmatrix} + 3 *  \begin{bmatrix}  7\\  9  \end{bmatrix} =  \begin{bmatrix}  23\\  33  \end{bmatrix}

And the same for the second row:

4*\begin{bmatrix}  1\\  3  \end{bmatrix} + 5 *  \begin{bmatrix}  7\\  9  \end{bmatrix} =  \begin{bmatrix}  39\\  57  \end{bmatrix}

And finally we can see the operation graphically:


Where we are multiplying the blue rows by the red row producing the green one.

The fourth way

The fourth and final way is a divergence from what we’ve seen previously.

Recall that multiplying two vectors can result in a vector or a full matrix depending upon the shape and order of the vectors being multiplied. What we did in the preceding two “ways” was to multiply vectors in such a manner as to collapse the product into a vector. But we can multiply differently to produce a matrix for each operation. This will require a summation of the resultant matrices to produce our final product. Let’s turn to our example matrices.

Multiply the first column of the first matrix by the second row of the second matrix to produce a matrix:

\begin{bmatrix}  2\\  4  \end{bmatrix} *  \begin{bmatrix}  1&3  \end{bmatrix} =   \begin{bmatrix}  2&6\\  4&12  \end{bmatrix}

Then multiply the second column of the first matrix by the second row of the second matrix:

\begin{bmatrix}  3\\  5  \end{bmatrix} *  \begin{bmatrix}  7&9  \end{bmatrix} =   \begin{bmatrix}  21&27\\  35&45  \end{bmatrix}

We then sum the products:

\begin{bmatrix}  2&6\\  4&12  \end{bmatrix} + \begin{bmatrix}  21&27\\  35&45  \end{bmatrix} =  \begin{bmatrix}  23&33\\  39&57  \end{bmatrix}

And if we compare that with our previous results we see it checks out.

An intuitive understanding of optical flow

It seems in every field there are vast array of tools and tricks used by people who really don’t understand them. This is okay in some situations, but often one finds taking the time to delve a bit deeper offers benefits not otherwise found by “just using it” (including the simple joy of figuring out how things work, of course). In the field of computer vision one such tool is called optical flow, and it is a method for computing vector fields describing motion (magnitude as well as direction) in video streams. Often when you see explanations of optical flow they’re purely mathematical or, worse, simply showing what functions to call in whatever computer vision library you happen to be using (such as OpenCV); neither of which offer great insight into why it works. The point of this post is to walk the reader through the math (gently!) and then to describe why it works the way it works.

The basics

I mentioned optical flow computes vectors signifying motion or flow within a sequence of video frames. The motion could be a dog walking across the field of view, a car driving by and so on. By “flow” I’m referring to the pixels of varying intensity and/or color flowing across the image due to said motion. Take this animated GIF:


The first thing you’ll see is the silver car lurching forward in the two image sequence. It can be said the pixels representing the silver car flow “forward” (or at an angle towards the viewer); it is the task of optical flow to accurately represent this flow in a form we can use.

“In a form we can use”, in many instances, means in a form that can be leveraged to predict where the object in motion will likely end up in the next frame. Optical flow does this by providing us with a vector for each pixel or super-pixel. I simply annotated the car image to make my point below:


Here you can see vectors assigned to pixels or pixel groups signifying the direction of motion of the object being analyzed (which in this instance is the silver car). Optical flow, if implemented correctly and used under the right conditions, provides this for us.


Optical flow bases itself on a number of assertions, the most important of which is the brightness consistency restraint. The restraint essentially claims that as a pixel moves from point 1 to point 2 in the scene it will maintain the same brightness intensity (in gray-scale images). This is crucial to a lot of the building blocks to optical flow. Mathematically this can be expressed as:

I(x,y,t) = I(x+\Delta x,y+\Delta y,t + \Delta t)

Where I() as a function returns the intensity of the pixel given parameters x, y and t (x-coordinate, y-coordinate and time-coordinate respectively). This essentially means as the point undergoes \Delta x and \Delta y over \Delta t, the position but not the intensity changes.

Typically the mathematical statements of optical flow then leverage a Taylor series approximation of I(x+\Delta x,y+\Delta y,t + \Delta t) so as to express it as a sum of first order partial derivatives (with the higher order terms dropped out).

I(x+\Delta x,y+\Delta y,t + \Delta t) = I(x,y,t) + \frac{\partial I}{\partial x}\Delta x + \frac{\partial I}{\partial y}\Delta y + \frac{\partial I}{\partial t}\Delta t + H.O.T

Doing this affords us a realization: that  \frac{\partial I}{\partial x}\Delta x + \frac{\partial I}{\partial y}\Delta y + \frac{\partial I}{\partial t}\Delta t is equal to zero (recall the pixel intensity does not change, according to optical flow). This gives us a nifty restraint we can place on pixels: that at a pixel, the change in the image’s intensity with respect to x (the gradient in the x direction) plus the image’s intensity with respect to y (the gradient in the y direction) plus the change in that pixel’s intensity over the time it took to move effectively sums to zero. This is a deep realization with optical flow, so let’s devote a bit of time to understanding what it means.

Understanding the restraint

Let’s state it again:

\frac{\partial I}{\partial x}\Delta x + \frac{\partial I}{\partial y}\Delta y + \frac{\partial I}{\partial t}\Delta t = 0

And now let’s think a bit about what this means by examining how a human notices motion. Take the following image with a vertical gradient:


Imagine we were to stretch this gradient ten miles to the left and ten miles to the right, so that we had a long, thin rectangle filled with this particular pattern. Now imagine we moved it, slowly, to the left. Because one cannot see any unique color changes in the rectangle as it slides in this direction (in particular, perpendicular to the gradient), it would be very difficult for a human to note the rectangle was moving at all (recall that we do not see the ends of the rectangle).


Correspondingly, if we were to stretch the gradient vertically ten miles up and ten miles down and began to move it downwards, over time we’d eventually see the rectangle growing darker as we encountered the darker and darker portions of the gradient before our eyes.

This tells us something important about human vision: if something is moving but it’s color and brightness doesn’t change, we have no way of knowing it’s moving — our eyes play a trick on us! But if there is a change in color and intensity in front of us, we can extrapolate something quite interesting: if we never divert our eyes from one point in front of us, know the rate at which the object darkened over a certain unit distance, and keep track of how long it takes the single point we’re observing to darken, we can determine the velocity of the moving object (because the change in brightness equates to distance traveled). And this is precisely what optical flow is doing.

Looping back

Before we get too far ahead of ourselves, let’s return to the formula we worked out previously, namely:

\frac{\partial I}{\partial x}\Delta x + \frac{\partial I}{\partial y}\Delta y + \frac{\partial I}{\partial t}\Delta t = 0

Let’s fiddle with the statement a bit. By subtracting the partial derivative with respect to time from both sides, we come up with the following equality:

\frac{\partial I}{\partial x}\Delta x + \frac{\partial I}{\partial y}\Delta y = -\frac{\partial I}{\partial t}\Delta t

We can phrase this relationship in words as the following: the direction of the intensity gradient with respect to the point <x,y> in the x direction is inversely proportional to the change in intensity of <x,y> with respect to time in the x direction (and likewise in the y direction). This is simply a fancier way of expressing what we already did above.

To round out the equation, let’s then divide both sides by \Delta t because resulting in:

\frac{\partial I}{\partial x}\frac{\Delta x}{\Delta t} + \frac{\partial I}{\partial y}\frac{\Delta y}{\Delta t} = -\frac{\partial I}{\partial t}

The change in x over time means the velocity in the x-direction; the change in y over time means the velocity in the y-direction and so we can finish our statement as such:

\frac{\partial I}{\partial x}V_x + \frac{\partial I}{\partial y}V_y = -\frac{\partial I}{\partial t}

We now have an equality that can be used to determine the velocity of a pixel in the x-direction and y-direction, give that we can compute the change in the image intensity with respect to the x-direction, the y-direction and the “t-direction” (time). However, we also immediately see we have one equation with two unknowns, meaning it cannot be solved as-is. There are many ways in which we can solve this particular problem (called the aperture problem); one of the most famous was one developed by Lucas and Kanade, known as the Lucas-Kanade method.

Lucas-Kanade Method

Lucas-Kanade works by restraining our problem to a window of pixels as opposed to looking at a single pixel. This means we can develop a system of linear equations which can therefore be used to approximate (using a least-squares approximation) the average velocity of the pixels within the window. The linear system is constructed in this manner.

Let I_x = \frac{\partial I}{\partial x} and I_y = \frac{\partial I}{\partial y} and I_t = \frac{\partial I}{\partial y} so that our restraint above can then be stated as I_xV_x + I_yV_y = -I_t

We can then break out things in Ax = b form:

A = \begin{bmatrix}  I_{x1} & I_{y1}\\  I_{x2} & I_{y1} \\  ... & ... \\  I_{xn} & I_{yn}  \end{bmatrix}    x = \begin{bmatrix}  V_x \\  Y_x  \end{bmatrix}    b = \begin{bmatrix}  -I_{t1}\\  -I_{t2}\\  ...\\  -I_{tn}\\  \end{bmatrix}

And solve for x using standard least squares approximation.

x = (A^TA)^{-1}A^Tb

If you made it this far, congrats. Here’s some Kenny Loggins.

Making your own weather forecast Part 2

Now we need to download some data to process.

One thing that tripped me up initially (being relatively naive about how weather models worked) was the simple fact the WRF system requires the output of other models to run a prediction. Why? WRF is what they call a mesoscale or local forecasting model, meaning it doesn’t try to understand what the whole world is doing, it simply focuses on one little area (such as a state or region). Therefore it requires other global models (such as the GFS) telling it what’s going on outside its focused area (what weather patterns are going into the forecast area, for example). So… WRF takes as input a global model and does a far more detailed analysis on the local level to give a forecast better than what the GFS alone could do (in theory).*

So let’s download GFS data for our model run. What I do is simply head on over to NOAA and find a range of GFS products (products = output) I’d like to process here. A ton of data is stored here, and you’ll see there are multiple folders with the same date. It’s really kind of a dumping grounds of output data and can be quite daunting to get your bearings. So let’s slowly walk through it.

At first you’ll see a range of folders, these are the output of GFS runs done at different times during the day, so in theory the latest one is going to be the most accurate. For me — doing this on August 22nd of 2014, the following folders appear waaaay at the bottom of the page:


You see there’s a GFS model run for the 22nd of August at 00-hours, 06-hours and at 12-hours and 18-hours. Let’s jump into the gfs.2014082218 folder (18-hours) since that’d be the best available forecast we have. Doing so puts me at this URL:

Now, there’s a TON of data in here. It’s all various output from global model runs, but what we’re looking for is data packaged into a particular data format (grib2). All of the files have a special name format (so you can do regular expression matches to get the exact data you need), but for our purposes let’s simply do a search for files containing the character sequence “pgrbf“.

Doing this search results in files being highlighted as such:


We’re interested in the 16 megabyte files ending in *.grib2. So let’s go ahead and download the files from time 00 to time 09 (the ones we see in the snippet above).

To do so on Linux, I typically move into the directory containing all my other directories (such as WPS and WRFV3) and create a new one called Data. From within this directory I then execute wget to download everything, so something like this:


…and so on until all the files I’m interested in are downloaded (typically I write a script to do this, but that’s up to you). When done, the directory should look like this:



And now the magic begins.

  1. Move into the WPS directory and execute the ./link_grib.csh script. What this will do is generate symbolic links to the grib2 data files within the WPS directory. So we execute ./link_grib.csh ../DATA/
  2. Listing the directory contents now reveals a sequence of files named GRIBFILE.*** where *** is a unique moniker suffix. These files are symbolic links to your recently downloaded GRIB2 data files.
  3. Now we’re going to unpackage (or “ungrib”) these datafiles. To do so, we’re going to use a tool called ungrib.exe. Ungrib.exe uses a configuration file to know the fields to unpack from the GFS GRIB2 files, so we’ll create a link to that configuration file by typing ln -s ungrib/Variable_Tables/Vtable.GFS Vtable.
  4. One last step before running ungrib.exe: we need to now update the namelist.wps configuration file so ungrib knows what range of data to unpack. Unfortunately it’s not immediately obvious as to what dates are encoded within the GFS files, so you’ll need to use a utility to inspect the files and return the dates they harbor called wgrib2. Instructions for compilation is here. For me this was painless (I also copied the binary to /usr/local/bin).To find the start time, execute this command against the GFS forecast file at 00 and 09 hours:start_end_dates…then take the output dates and update the namelist.wps file within the WPS directory. The contents of the file should look like this (given the output we get above):
    max_dom = 1,
    start_date = '2014-08-22_18:00:00',
    end_date = '2014-08-23_03:00:00',
    interval_seconds = 10800
  5. Now launch ./ungrib.exe. When all is done you should have a series of output files beginning with “FILE:”. These are the unpackaged GRIB2 files in an intermediate format for use by WPS.
  6. We now need to launch geogrid.exe, which is used to generate an understanding of the local terrain for the weather model run. To do so we need to, again, update our namelist.wps configuration file with the geospatial coordinates of the region by placing the following immediately after the share section from the previous step (be sure to update geog_data_path to point to the directory within which you unpacked the geog data from the previous post). NOTE: For Lincoln, NE (my hometown), you can simply copy/paste what I put here, otherwise update with your own coordinates (the lat/lon values should be the only ones you need to change).
    e_we = 75,
    e_sn = 70,
    dx = 30000,
    dy = 30000,
    map_proj = 'lambert',
    ref_lat = 40.8106,
    ref_lon = -96.6803,
    truelat1 = 40.8106,
    truelat2 = 40.8106,
    stand_lon = -96.6803,
    geog_data_path = '/home/brush/Downloads/weather/geog/'
  7. Now run geogrid.exe.  As it’s executing you’ll see it process a number of geographical datapoints in preparation of the full run.
  8. metgrid.exe is then ran. This bridges the gap between the GFS data and geographical data by horizontally interpolating the former over the later, so basically it connects the two together. This requires another addition to namelist.wps, albeit a shorter and simpler one:
    fg_name = 'FILE'
    io_form_metgrid = 2,

    Now execute metgrid.exe.

  9. We are now done with the WPS directory, meaning we have prepped all of the data and are now ready for the actual model run. To do this we will update one more configuration file and execute two applications located within the WRFV3/test/em_real directory. So change into that directory and open up namelist.input. Within this file you’ll immediately see a bunch of variables, but luckily we only have to change a few. Update the following fields under the “&time_control” section:
    run_days = 0,
    run_hours = 9,
    run_minutes = 0,
    run_seconds = 0,
    start_year = 2014,
    start_month = 08,
    start_day = 22,
    start_hour = 18,
    start_minute = 00,
    start_second = 00,
    end_year = 2014,
    end_month = 08,
    end_day = 23,
    end_hour = 03,
    end_minute = 00,
    end_second = 00,

    Now update the following fields under the “&domains” section…

    time_step = 180,
    time_step_fract_num = 0,
    time_step_fract_den = 1,
    max_dom = 1,
    e_we = 75,
    e_sn = 70,
    e_vert = 35,
    p_top_requested = 5000,
    dx = 30000, 10000, 3333.33,
    dy = 30000, 10000, 3333.33,
  10. To launch the two apps responsible for actually generating the model we need to install mpirun (this is part of MPI, a system that enables distributed computing), and you can do so with your favorite package manager (I installed openmpi).
  11. Copy all of the files generated by metgrid (starting with “met_em”) into your current directory.
  12. Run  mpirun -np 1 real.exe
  13. Then run mpirun -np 8 wrf.exe ..sit back and wait….
  14. When done you should have a file starting with “wrfout”; this is the model output! You’re done! You can then work with this file using any number of WRF post processing utilities and languages.

* It’s worth noting WRF is capable of doing a global analysis by way of “nesting forecasts”, but it’s not very well supported and not recommended for accurate forecasts (most people I’ve talked to consume GFS or other global model output).

Making your own weather forecast Part 1

It’s totally possible given the WRF model (which stands for the Weather Research and Forecast). Here I’m going to give you a step-by-step guide to downloading the 3.6.1 version of the software, installing it, and then also downloading the needed data to make your own custom forecast. Since I live in Nebraska, I’m going to set it up to do a model run for Lincoln (the capital), but you can easily make the necessary changes to do the run for whatever location you’d like.

The Prerequisites: 

First, a few things need to be in order. This stuff is meant for Linux. You can apparently run it on Windows using Cygwin, but it appears to be happiest when installed and ran under a Linux distribution. Mine is Ubuntu 14.04, 64-bit. Your mileage may vary depending on which flavor of Linux you use, but for any Debian-based systems these instructions should go smoothly. It’s worth noting I’m by no means an expert in Linux and I was able to get this thing going just fine; so hopefully this guide will help for even novices.

You’re going to need some disk space, not a lot, but you will need a few Gig to spare. That’s because the data you’ll be downloading to get this thing to run isn’t insignificant, and neither is the data it generates.

One final thing to bear in mind is this system, as is common for many “systems” running on Linux it seems, is made up of multiple programs, all working in concert with one another. Typically the output from one is taken and fed into another, and so the chain continues. It can be slightly daunting at first to know who is talking to whom, so I recommend glancing around this page and reading up on the 30,000-foot view (I won’t talk about it here since it’s fairly well described in the link provided). The reason I bring this up is because there is no single source code package, you’ll need to download and compile and setup multiple programs (such is life); yet they all fall under the title of the “Model System” as I’m describing it.

The Setup: 

One final thing before we get started — I wrote this guide alongside doing another install (for a second time on a different Linux box). The order isn’t necessarily important as it’s just how I did it…if you notice any irregularities and would like to mix it up a bit, it’s likely you’ll be able to do so.

  1. Download the source for the latest version of the WPS program here.
  2. Download the source for the latest version of the WRF program here.
  3. Download the geographical input data for the model system here. A couple things to note about this step — first, what you’re downloading here is terrain data, etc. that allows the model to properly build a view of the environment around which it’s generating its predictions. Second, it’s really not important to know what this data contains (unless you really want to). And finally, they don’t make it easy to know what, exactly, you are to download (as there are different versions of this for different versions of the software). If the page to which I link to above hasn’t changed much, the link to click for the data is titled “Download Complete Dataset”, and it’s actually the header for a table column.
  4. Now untar/gzip the files you’ve downloaded. I downloaded everything into a single directory to make management easier. Once all is unpackaged, there are three folders for me named geog, WPS and WRFV3 (hint, to stay in sync with this tutorial, you should rename and folders appropriately).
  5. Next we download some third-party packages needed for building the WPS/WRF programs. Start by downloading HDF5 (a library for manipulating data formats used by WRF) from here. Once again, untar/gzip the file. Move into the directory and configure the install by typing ./configure –prefix=/usr/local. Once this is done, type make install to install the package. NOTE: Make sure you’re running sudo.
  6. There’s also a third party package called NetCDF also used for processing specialized data formats used by WRF. There appears to be a version for both the C and Fortran compilers, for completeness I downloaded and installed both (though I have a suspicion only the Fortran version is needed). To download the C and Fortran versions go here. Unpackage them and for each ./configure –prefix=/usr/local.  And, yet again for each, make install.
  7. Before proceeding you’ll have to set an environment variable that’ll be used by the compilation scripts to locate netcdf. If you’re using bash as your shell, type export NCARG_ROOT=/usr/local. 
  8. NOTE: If you don’t have gfortran –the GNU Fotran compiler — installed, you should do so now (on Ubuntu, you can use the aptitude or apt-get package manager to do so). NOTE: The following compile scripts are written for CSH, not bash, so you’ll have to make sure you have csh installed (again, use apt-get, aptitude or your package manager of choice to get it).
  9. Now cd into the WRFV3 directory and type ./configure. When executing the script for me I am prompted twice, once for the location to the /include directories for NetCDF and again for the /lib directories to NetCDF. Type them in accordingly. Next you’ll be prompted for the build chain to use when compiling the WRFV3 package – I chose option #32 (x86_64 Linux, gfortran compiler with gcc   (serial)). One final prompt comes your way, asking for nesting options (I chose option #1, or basic). Now you’ll want to actually do the compilation, and since we’ll be running real world simulations, we’ll tell the build system to set things up accordingly by typing ./compile em_real. And away it goes….
  10. Now to install some more third-party packages required for compiling the WPS package. Go here and search for the phrase “external libraries”, you’ll see three: Jasper, PNG and zlib. Download each of them, ./configure –prefix=/usr/local and make install each accordingly (seriously, this was a pretty easy setup for me).
  11. Next cd into the WPS directory and type ./configure. Like with the WRFV3 script, this will prompt you for the location of the netcdf include and library directories, which you will specify just the same. It will then prompt you for the compilers (both Fortran and C) to use for building WPS. On my system (see above) I chose option #1 (Linux x86_64, gfortran    (serial)). Once you select your option, it’ll prompt you to type “compile” to start the compilation process and you should do that now.
  12. If all compiled well, you’re done!

In my next post I’ll describe how to download data and actually stitch all of these programs together.