Chain Rule

Learning Objectives

  • State the chain rules for one or two independent variables.
  • Use tree diagrams as an aid to understanding the chain rule for several independent and intermediate variables.

Chain Rules for One or Two Independent Variables

Recall that the chain rule for the derivative of a composite of two functions can be written in the form

[latex]\large{\frac{d}{dx}(f(g(x)))=f'(g(x))g'(x)}.[/latex]

In this equation, both [latex]f(x)[/latex] and [latex]g(x)[/latex] are functions of one variable. Now suppose that [latex]f[/latex] is a function of two variables and [latex]g[/latex] is a function of one variable. Or perhaps they are both functions of two variables, or even more. How would we calculate the derivative in these cases? The following theorem gives us the answer for the case of one independent variable.

Theorem: Chain Rule for One Independent Variable


Suppose that [latex]x=g(t)[/latex] and [latex]y=h(t)[/latex] are differentiable functions of [latex]t[/latex] and [latex]z=f(x, y)[/latex] is a differentiable function of [latex]x[/latex] and [latex]y[/latex]. Then [latex]z=f(x(t), y(t))[/latex] is a differentiable function of [latex]t[/latex] and

[latex]\LARGE{\frac{dz}{dt}=\frac{\partial{z}}{\partial{x}}\cdot\frac{dx}{dt}+\frac{\partial{z}}{\partial{y}}\cdot\frac{dy}{dt}},[/latex]

where the ordinary derivatives are evaluated at [latex]t[/latex] and the partial derivatives are evaluated at [latex](x, y)=(x(t), y(t))[/latex].
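One way to sanity-check the theorem numerically is to approximate every derivative with central differences and compare the right-hand side of the formula with a direct derivative of the composite [latex]z(t)[/latex]. The sketch below assumes that approach. The helper names (`derivative`, `partial`, `dz_dt_chain_rule`, `dz_dt_direct`) and the sample functions are hypothetical and chosen only for illustration.

```python
import math

def derivative(g, t, h=1e-6):
    """Central-difference estimate of g'(t)."""
    return (g(t + h) - g(t - h)) / (2 * h)

def partial(f, point, i, h=1e-6):
    """Central-difference estimate of the i-th partial derivative of f at point."""
    up, down = list(point), list(point)
    up[i] += h
    down[i] -= h
    return (f(*up) - f(*down)) / (2 * h)

def dz_dt_chain_rule(f, x, y, t):
    """Right-hand side of the theorem: partials at (x(t), y(t)), ordinary derivatives at t."""
    point = (x(t), y(t))
    return partial(f, point, 0) * derivative(x, t) + partial(f, point, 1) * derivative(y, t)

def dz_dt_direct(f, x, y, t):
    """Differentiate the composite z(t) = f(x(t), y(t)) as a function of t alone."""
    return derivative(lambda s: f(x(s), y(s)), t)

# Hypothetical sample functions, chosen only for illustration.
sample_f = lambda x, y: x * y**2 + math.exp(x)
sample_x = lambda t: t**2
sample_y = lambda t: math.sin(t)

t = 0.7
print(dz_dt_chain_rule(sample_f, sample_x, sample_y, t))
print(dz_dt_direct(sample_f, sample_x, sample_y, t))
```

The two printed values should agree to roughly the accuracy of the finite differences. For these sample functions the exact answer is [latex](y^2+e^x)(2t)+(2xy)\cos t[/latex].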

Proof

The proof of this theorem uses the definition of differentiability of a function of two variables. Suppose that [latex]f[/latex] is differentiable at the point [latex]P(x_0, y_0)[/latex], where [latex]x_0=g(t_0)[/latex] and [latex]y_0=h(t_0)[/latex] for a fixed value of [latex]t_0[/latex]. We wish to prove that [latex]z=f(x(t), y(t))[/latex] is differentiable at [latex]t=t_0[/latex] and that the Chain Rule for One Independent Variable holds at that point as well.

Since [latex]f[/latex] is differentiable at [latex]P[/latex], we know that

[latex]\large{z(t)=f(x,y)=f(x_0,y_0)+f_x(x_0,y_0)(x-x_0)+f_y(x_0,y_0)(y-y_0)+E(x,y)},[/latex]

where [latex]\displaystyle{\lim_{(x,y)\to(x_0,y_0)}}\frac{E(x,y)}{\sqrt{(x-x_0)^2+(y-y_0)^2}}=0[/latex]. We then subtract [latex]z_0=f(x_0,y_0)[/latex] from both sides of this equation:

[latex]\hspace{3cm}\large{\begin{alignat}{2} z(t)-z(t_0) &= f(x(t),y(t))-f(x(t_0),y(t_0)) \\ &= f_x(x_0,y_0)(x(t)-x(t_0))+f_y(x_0,y_0)(y(t)-y(t_0))+E(x(t),y(t)). \\ \end{alignat}}[/latex]

Next, we divide both sides by [latex]t-t_0[/latex]:

[latex]\large{\frac{z(t)-z(t_0)}{t-t_0}=f_x(x_0,y_0)\left(\frac{x(t)-x(t_0)}{t-t_0}\right)+f_y(x_0,y_0)\left(\frac{y(t)-y(t_0)}{t-t_0}\right)+\frac{E(x(t),y(t))}{t-t_0}}.[/latex]

Then we take the limit as [latex]t[/latex] approaches [latex]t_0[/latex]:

[latex]\large{\displaystyle\lim_{t\to t_0}\frac{z(t)-z(t_0)}{t-t_0} = f_x(x_0,y_0)\displaystyle\lim_{t\to t_0}\left(\frac{x(t)-x(t_0)}{t-t_0}\right)+f_y(x_0,y_0)\displaystyle\lim_{t\to t_0}\left(\frac{y(t)-y(t_0)}{t-t_0}\right)+ \displaystyle\lim_{t\to t_0}\frac{E(x(t),y(t))}{t-t_0}}.[/latex]

The left-hand side of this equation is equal to [latex]dz/dt[/latex], which leads to

[latex]\large{\frac{dz}{dt}=f_x(x_0,y_0)\frac{dx}{dt}+f_y(x_0,y_0)\frac{dy}{dt}+\displaystyle\lim_{t\to t_0}\frac{E(x(t),y(t))}{t-t_0}}.[/latex]

The last term can be rewritten as

[latex]\hspace{3cm}\large{\begin{alignat}{2} \displaystyle\lim_{t\to t_0}\frac{E(x(t),y(t))}{t-t_0} &= \displaystyle\lim_{t\to t_0}\left(\frac{E(x,y)}{\sqrt{(x-x_0)^2+(y-y_0)^2}}\frac{\sqrt{(x-x_0)^2+(y-y_0)^2}}{t-t_0}\right) \\ &= \displaystyle\lim_{t\to t_0}\left(\frac{E(x,y)}{\sqrt{(x-x_0)^2+(y-y_0)^2}}\right)\displaystyle\lim_{t\to t_0}\left(\frac{\sqrt{(x-x_0)^2+(y-y_0)^2}}{t-t_0}\right). \\ \end{alignat}}[/latex]

As [latex]t[/latex] approaches [latex]t_0[/latex], [latex](x(t), y(t))[/latex] approaches [latex](x(t_0), y(t_0))=(x_0, y_0)[/latex], because differentiable functions are continuous. Therefore, we can rewrite the last product as

[latex]\displaystyle\lim_{(x,y)\to (x_0,y_0)}\left(\frac{E(x,y)}{\sqrt{(x-x_0)^2+(y-y_0)^2}}\right)\displaystyle\lim_{(x,y)\to (x_0,y_0)}\left(\frac{\sqrt{(x-x_0)^2+(y-y_0)^2}}{t-t_0}\right).[/latex]

Since the first limit is equal to zero, we need only show that the second limit is finite:

[latex]\hspace{3cm}\begin{alignat}{2} \displaystyle\lim_{(x,y)\to (x_0,y_0)}\left(\frac{\sqrt{(x-x_0)^2+(y-y_0)^2}}{t-t_0}\right) &= \displaystyle\lim_{(x,y)\to (x_0,y_0)}\left(\sqrt{\frac{(x-x_0)^2+(y-y_0)^2}{(t-t_0)^2}}\right) \\ &= \displaystyle\lim_{(x,y)\to (x_0,y_0)}\left(\sqrt{\left(\frac{x-x_0}{t-t_0}\right)^2+\left(\frac{y-y_0}{t-t_0}\right)^2}\right) \\ &= \sqrt{\left(\displaystyle\lim_{(x,y)\to (x_0,y_0)}\left(\frac{x-x_0}{t-t_0}\right)\right)^2+\left( \displaystyle\lim_{(x,y)\to (x_0,y_0)}\left(\frac{y-y_0}{t-t_0}\right)\right)^2}. \end{alignat}[/latex]

Since [latex]x(t)[/latex] and [latex]y(t)[/latex] are both differentiable functions of [latex]t[/latex], both limits inside the last radical exist. Therefore, this value is finite. This proves the chain rule at [latex]t=t_0[/latex]; the rest of the theorem follows from the assumption that all functions are differentiable over their entire domains.

[latex]_\blacksquare[/latex]
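The two limits in the proof can also be watched numerically. The sketch below is only an illustration under stated assumptions: it takes [latex]f(x,y)=4x^2+3y^2[/latex] and the curve [latex]x=\sin t, y=\cos t[/latex] from part (a) of the example below, and an arbitrarily chosen [latex]t_0=0.4[/latex]. For this [latex]f[/latex], [latex]E(x,y)=4(x-x_0)^2+3(y-y_0)^2[/latex] exactly. As [latex]t[/latex] approaches [latex]t_0[/latex], the ratio [latex]E/\sqrt{(x-x_0)^2+(y-y_0)^2}[/latex] shrinks toward zero, while [latex]\sqrt{(x-x_0)^2+(y-y_0)^2}/|t-t_0|[/latex] stays near [latex]\sqrt{x'(t_0)^2+y'(t_0)^2}=1[/latex]. The helper names `E` and `proof_factors` are hypothetical.

```python
import math

def f(x, y):
    return 4 * x**2 + 3 * y**2

def E(x, y, x0, y0):
    """Error of the tangent-plane approximation to f at (x0, y0)."""
    fx, fy = 8 * x0, 6 * y0   # exact partial derivatives of f at (x0, y0)
    return f(x, y) - (f(x0, y0) + fx * (x - x0) + fy * (y - y0))

def proof_factors(t, t0=0.4):
    """Return the two factors from the proof, evaluated along x = sin t, y = cos t."""
    x0, y0 = math.sin(t0), math.cos(t0)
    x, y = math.sin(t), math.cos(t)
    dist = math.hypot(x - x0, y - y0)
    return E(x, y, x0, y0) / dist, dist / abs(t - t0)

for k in range(1, 6):
    step = 10.0 ** (-k)
    first, second = proof_factors(0.4 + step)
    print(f"t - t0 = {step:.0e}:  E/dist = {first:.2e},  dist/|t - t0| = {second:.6f}")
```

The first column decreases roughly in proportion to the step, while the second stays close to 1. Their product, which is the last term in the proof up to sign, therefore tends to zero.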

Closer examination of the Chain Rule for One Independent Variable reveals an interesting pattern. The first term in the equation is [latex]\frac{\partial f}{\partial x}\cdot\frac{dx}{dt}[/latex] and the second term is [latex]\frac{\partial f}{\partial y}\cdot\frac{dy}{dt}[/latex]. Recall that when multiplying fractions, cancelation can be used. If we treat these derivatives as fractions, then each product “simplifies” to something resembling [latex]\partial f/dt[/latex]. The variables [latex]x[/latex] and [latex]y[/latex] that disappear in this simplification are often called intermediate variables: they are independent variables of the function [latex]f[/latex], but they are themselves functions of the variable [latex]t[/latex]. Two terms appear on the right-hand side of the formula, one for each of the two variables of [latex]f[/latex]. This pattern works with functions of more than two variables as well, as we see later in this section.

Example: Using the chain rule

Calculate [latex]dz/dt[/latex] for each of the following functions:

a. [latex]z=f(x,y)=4x^2+3y^2, x=x(t)=\sin{t},y=y(t)=\cos{t}[/latex]

b. [latex]z=f(x,y)=\sqrt{x^2-y^2},x=x(t)=e^{2t},y=y(t)=e^{-t}[/latex]
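Applying the theorem to part (a) gives [latex]\frac{dz}{dt}=(8x)(\cos t)+(6y)(-\sin t)=8\sin t\cos t-6\cos t\sin t=2\sin t\cos t[/latex]. For part (b), [latex]\frac{\partial z}{\partial x}=\frac{x}{\sqrt{x^2-y^2}}[/latex] and [latex]\frac{\partial z}{\partial y}=\frac{-y}{\sqrt{x^2-y^2}}[/latex], so [latex]\frac{dz}{dt}=\frac{2xe^{2t}+ye^{-t}}{\sqrt{x^2-y^2}}=\frac{2e^{4t}+e^{-2t}}{\sqrt{e^{4t}-e^{-2t}}}[/latex], which is defined for [latex]t>0[/latex]. As a rough check, the Python sketch below evaluates each chain-rule sum at a couple of sample values of [latex]t[/latex]. It then compares each sum with the closed form and with a central-difference derivative of [latex]z(t)[/latex]. The sample points and step size are arbitrary choices.

```python
import math

# Part (a): z = 4x^2 + 3y^2, x = sin t, y = cos t
def dz_dt_a(t):
    x, y = math.sin(t), math.cos(t)
    dzdx, dzdy = 8 * x, 6 * y                  # partial derivatives of z
    dxdt, dydt = math.cos(t), -math.sin(t)     # ordinary derivatives of x and y
    return dzdx * dxdt + dzdy * dydt           # simplifies to 2 sin t cos t

# Part (b): z = sqrt(x^2 - y^2), x = e^(2t), y = e^(-t); real only where x > y, i.e. t > 0
def dz_dt_b(t):
    x, y = math.exp(2 * t), math.exp(-t)
    r = math.sqrt(x**2 - y**2)
    dzdx, dzdy = x / r, -y / r
    dxdt, dydt = 2 * math.exp(2 * t), -math.exp(-t)
    return dzdx * dxdt + dzdy * dydt

def central_difference(z, t, h=1e-6):
    return (z(t + h) - z(t - h)) / (2 * h)

# The composites written directly in t, for comparison.
z_a = lambda t: 4 * math.sin(t)**2 + 3 * math.cos(t)**2
z_b = lambda t: math.sqrt(math.exp(4 * t) - math.exp(-2 * t))

for t in (0.3, 1.1):
    print(t, dz_dt_a(t), 2 * math.sin(t) * math.cos(t), central_difference(z_a, t))
    closed_b = (2 * math.exp(4 * t) + math.exp(-2 * t)) / math.sqrt(math.exp(4 * t) - math.exp(-2 * t))
    print(t, dz_dt_b(t), closed_b, central_difference(z_b, t))
```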

Try it

Calculate [latex]dz/dt[/latex] given the following functions. Express the final answer in terms of [latex]t[/latex].

[latex]z=f(x, y)=x^{2}-3xy+2y^{2}, x=x(t)=3\sin 2t, y=y(t)=4\cos 2t[/latex]

It is often useful to create a visual representation of the Chain Rule for One Independent Variable. This representation is called a tree diagram, and it provides a way to remember the formula (Figure 1). The diagram can be expanded for functions of more than one variable, as we shall see shortly.

A diagram that starts with z = f(x, y). Along the first branch, it is written ∂z/∂x, then x = x(t), then dx/dt, then t, and finally it says ∂z/∂x dx/dt. Along the other branch, it is written ∂z/∂y, then y = y(t), then dy/dt, then t, and finally it says ∂z/∂y dy/dt.

Figure 1. Tree diagram for the case [latex]\small{\dfrac{dz}{dt}=\dfrac{\partial z}{\partial x}\cdot\dfrac{dx}{dt}+\dfrac{\partial z}{\partial y}\cdot\dfrac{dy}{dt}}[/latex].

In this diagram, the leftmost corner corresponds to [latex]z=f(x, y)[/latex]. Since [latex]f[/latex] has two independent variables, there are two lines coming from this corner. The upper branch corresponds to the variable [latex]x[/latex] and the lower branch corresponds to the variable [latex]y[/latex]. Since each of these variables depends on the single variable [latex]t[/latex], one branch comes from [latex]x[/latex] and one branch comes from [latex]y[/latex]. Last, each of the branches on the far right has a label that represents the path traveled to reach that branch. The top branch is reached by following the [latex]x[/latex] branch, then the [latex]t[/latex] branch; therefore, it is labeled [latex](\partial z/\partial x)\times(dx/dt)[/latex]. The bottom branch is similar: first the [latex]y[/latex] branch, then the [latex]t[/latex] branch. This branch is labeled [latex](\partial z/\partial y)\times(dy/dt)[/latex]. To get the formula for [latex]dz/dt[/latex], add all the terms that appear on the rightmost side of the diagram. This gives us the Chain Rule for One Independent Variable.
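The recipe of multiplying the labels along each path and then adding the paths can be written as a short recursive walk of the tree. In this sketch, each node lists its children together with the derivative written on the connecting edge. The dictionary layout and the names `path_terms` and `chain_rule_formula` are hypothetical, chosen only for illustration.

```python
def path_terms(tree, node, leaf, path=()):
    """Collect the edge labels along every path from `node` down to a branch ending at `leaf`."""
    if node == leaf:
        return [path]
    terms = []
    for child, label in tree.get(node, []):
        terms.extend(path_terms(tree, child, leaf, path + (label,)))
    return terms

def chain_rule_formula(tree, root, leaf):
    """Multiply the labels along each path, then add the paths."""
    products = ["·".join(p) for p in path_terms(tree, root, leaf)]
    return " + ".join(products)

# Figure 1: z = f(x, y), x = x(t), y = y(t)
figure_1 = {
    "z": [("x", "∂z/∂x"), ("y", "∂z/∂y")],
    "x": [("t", "dx/dt")],
    "y": [("t", "dy/dt")],
}
print("dz/dt =", chain_rule_formula(figure_1, "z", "t"))
# dz/dt = ∂z/∂x·dx/dt + ∂z/∂y·dy/dt
```

Because the walk keeps only the paths that end at the requested variable, the same function also handles trees with several leaves. An example is the two-variable diagram in Figure 2, where the paths ending at [latex]u[/latex] and at [latex]v[/latex] are summed separately.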

In the Chain Rule for Two Independent Variables, [latex]z=f(x, y)[/latex] is a function of [latex]x[/latex] and [latex]y[/latex], and both [latex]x=g(u, v)[/latex] and [latex]y=h(u, v)[/latex] are functions of the independent variables [latex]u[/latex] and [latex]v[/latex].

Theorem: Chain Rule for Two Independent Variables


Suppose [latex]x=g(u, v)[/latex] and [latex]y=h(u, v)[/latex] are differentiable functions of [latex]u[/latex] and [latex]v[/latex], and [latex]z=f(x, y)[/latex] is a differentiable function of [latex]x[/latex] and [latex]y[/latex]. Then, [latex]z=f(g(u, v), h(u, v))[/latex] is a differentiable function of [latex]u[/latex] and [latex]v[/latex], and

[latex]\large{\frac{\partial z}{\partial u}=\frac{\partial z}{\partial x}\frac{\partial x}{\partial u}+\frac{\partial z}{\partial y}\frac{\partial y}{\partial u}}[/latex]

and

[latex]\large{\frac{\partial z}{\partial v}=\frac{\partial z}{\partial x}\frac{\partial x}{\partial v}+\frac{\partial z}{\partial y}\frac{\partial y}{\partial v}}[/latex]
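The same kind of numerical check works for two independent variables. The sketch below evaluates both formulas with central-difference derivatives and compares them with partial derivatives of the composite [latex]z(u, v)[/latex] taken directly. The helper names and the sample functions [latex]f(x,y)=x^2y+\cos y[/latex], [latex]x=uv[/latex], [latex]y=u+2v[/latex] are hypothetical, chosen only for illustration.

```python
import math

def partial(func, point, i, h=1e-6):
    """Central-difference estimate of the i-th partial derivative of func at point."""
    up, down = list(point), list(point)
    up[i] += h
    down[i] -= h
    return (func(*up) - func(*down)) / (2 * h)

def chain_rule_uv(f, g, h_fn, u, v):
    """Return (dz/du, dz/dv) for z = f(g(u, v), h(u, v)) using the two formulas of the theorem."""
    x, y = g(u, v), h_fn(u, v)
    z_x, z_y = partial(f, (x, y), 0), partial(f, (x, y), 1)
    x_u, x_v = partial(g, (u, v), 0), partial(g, (u, v), 1)
    y_u, y_v = partial(h_fn, (u, v), 0), partial(h_fn, (u, v), 1)
    return z_x * x_u + z_y * y_u, z_x * x_v + z_y * y_v

# Hypothetical sample functions, chosen only for illustration.
f_xy = lambda x, y: x**2 * y + math.cos(y)
g_uv = lambda u, v: u * v
h_uv = lambda u, v: u + 2 * v
composite = lambda u, v: f_xy(g_uv(u, v), h_uv(u, v))

point = (0.5, -1.2)
print(chain_rule_uv(f_xy, g_uv, h_uv, *point))
print(partial(composite, point, 0), partial(composite, point, 1))
```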

We can draw a tree diagram for each of these formulas as well.
A diagram that starts with z = f(x, y). Along the first branch, it is written ∂z/∂x, then x = g(u, v), at which point it breaks into another two branches: the first subbranch says ∂x/∂u, then u, and finally it says ∂z/∂x ∂x/∂u; the second subbranch says ∂x/∂v, then v, and finally it says ∂z/∂x ∂x/∂v. Along the other branch, it is written ∂z/∂y, then y = h(u, v), at which point it breaks into another two branches: the first subbranch says ∂y/∂u, then u, and finally it says ∂z/∂y ∂y/∂u; the second subbranch says ∂y/∂v, then v, and finally it says ∂z/∂y ∂y/∂v.

Figure 2. Tree diagram for [latex]\small{\frac{\partial z}{\partial u}=\frac{\partial z}{\partial x}\frac{\partial x}{\partial u}+\frac{\partial z}{\partial y}\frac{\partial y}{\partial u}}[/latex] and [latex]\small{\frac{\partial z}{\partial v}=\frac{\partial z}{\partial x}\frac{\partial x}{\partial v}+\frac{\partial z}{\partial y}\frac{\partial y}{\partial v}}.[/latex]

To derive the formula for [latex]\partial z/\partial u[/latex], start from the left side of the diagram, then follow only the branches that end with [latex]u[/latex] and add the terms that appear at the end of those branches. For the formula for [latex]\partial z/\partial v[/latex], follow only the branches that end with [latex]v[/latex] and add the terms that appear at the end of those branches.

There is an important difference between these two chain rule theorems. In the Chain Rule for One Independent Variable, the left-hand side of the formula for the derivative is not a partial derivative, but in the Chain Rule for Two Independent Variables it is. The reason is that, in the Chain Rule for One Independent Variable, [latex]z[/latex] is ultimately a function of [latex]t[/latex] alone, whereas in the Chain Rule for Two Independent Variables, [latex]z[/latex] is a function of both [latex]u[/latex] and [latex]v[/latex].

Example: Using the chain rule for two variables

Calculate [latex]\partial z/\partial u[/latex] and [latex]\partial z/\partial v[/latex] using the following functions:

[latex]z=f(x,y)=3x^2-2xy+y^2,x=x(u,v)=3u+2v,y=y(u,v)=4u-v[/latex]
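Here [latex]\partial z/\partial x=6x-2y[/latex], [latex]\partial z/\partial y=-2x+2y[/latex], [latex]\partial x/\partial u=3[/latex], [latex]\partial x/\partial v=2[/latex], [latex]\partial y/\partial u=4[/latex], and [latex]\partial y/\partial v=-1[/latex]. The theorem therefore gives [latex]\frac{\partial z}{\partial u}=3(6x-2y)+4(-2x+2y)=10x+2y=38u+18v[/latex] and [latex]\frac{\partial z}{\partial v}=2(6x-2y)-(-2x+2y)=14x-6y=18u+34v[/latex]. As a cross-check, substituting first gives [latex]z=19u^2+18uv+17v^2[/latex], whose partial derivatives are the same. The small Python sketch below performs that comparison at a few sample points; the function names are hypothetical.

```python
def dz_du_dv(u, v):
    """Chain rule for z = 3x^2 - 2xy + y^2 with x = 3u + 2v, y = 4u - v."""
    x, y = 3 * u + 2 * v, 4 * u - v
    z_x, z_y = 6 * x - 2 * y, -2 * x + 2 * y   # partials of z with respect to x and y
    x_u, x_v = 3, 2                             # partials of x = 3u + 2v
    y_u, y_v = 4, -1                            # partials of y = 4u - v
    return z_x * x_u + z_y * y_u, z_x * x_v + z_y * y_v

def z_direct(u, v):
    """The composite written directly in u and v."""
    x, y = 3 * u + 2 * v, 4 * u - v
    return 3 * x**2 - 2 * x * y + y**2

for u, v in [(1, 0), (0, 1), (2, -3)]:
    print((u, v), dz_du_dv(u, v), (38 * u + 18 * v, 18 * u + 34 * v))
```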

Try it

Calculate [latex]\partial z/\partial u[/latex] and [latex]\partial z/\partial v[/latex] given the following functions:

[latex]z=f(x,y)=\frac{2x-y}{x+3y},x(u,v)=e^{2u}\cos 3v, y(u,v)=e^{2u}\sin 3v[/latex]


The Generalized Chain Rule

Now that we’ve seen how to extend the original chain rule to functions of two variables, it is natural to ask: Can we extend the rule to more than two variables? The answer is yes, as the generalized chain rule states.

Theorem: Generalized Chain Rule


Let [latex]w=f(x_1,x_2,\ldots,x_m)[/latex] be a differentiable function of [latex]m[/latex] independent variables, and for each [latex]i\in\{1,\ldots,m\}[/latex], let [latex]x_i=x_i(t_1,t_2,\ldots,t_n)[/latex] be a differentiable function of [latex]n[/latex] independent variables. Then

[latex]\large{\frac{\partial w}{\partial t_j}=\frac{\partial w}{\partial x_1}\frac{\partial x_1}{\partial t_j}+\frac{\partial w}{\partial x_2}\frac{\partial x_2}{\partial t_j}+\cdots+\frac{\partial w}{\partial x_m}\frac{\partial x_m}{\partial t_j}}[/latex]

for any [latex]j\in\{1,2,\ldots,n\}[/latex].
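In matrix terms, the formula says that the row of partial derivatives [latex]\partial w/\partial t_j[/latex] equals the row of partial derivatives [latex]\partial w/\partial x_i[/latex] multiplied by the [latex]m\times n[/latex] matrix of entries [latex]\partial x_i/\partial t_j[/latex] (the Jacobian matrix of the intermediate variables). Below is a minimal plain-Python sketch, assuming the partial derivatives have already been evaluated at the point of interest. The function name `generalized_chain_rule` is hypothetical.

```python
def generalized_chain_rule(grad_w, jacobian):
    """
    grad_w:   list of m values, grad_w[i] = ∂w/∂x_i at the current point
    jacobian: m-by-n nested list, jacobian[i][j] = ∂x_i/∂t_j
    Returns [∂w/∂t_1, ..., ∂w/∂t_n], where ∂w/∂t_j = sum over i of ∂w/∂x_i * ∂x_i/∂t_j.
    """
    m = len(grad_w)
    if len(jacobian) != m:
        raise ValueError("need one Jacobian row per intermediate variable x_i")
    n = len(jacobian[0])
    return [sum(grad_w[i] * jacobian[i][j] for i in range(m)) for j in range(n)]

# Two-variable example from this section at (u, v) = (1, 0): x = 3, y = 4,
# so ∂z/∂x = 6x - 2y = 10 and ∂z/∂y = -2x + 2y = 2.
print(generalized_chain_rule([10, 2], [[3, 2], [4, -1]]))  # [38, 18]
```

The printed values match [latex]38u+18v[/latex] and [latex]18u+34v[/latex] at [latex](u,v)=(1,0)[/latex]. The same function accepts any number [latex]m[/latex] of intermediate variables and [latex]n[/latex] of independent variables.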

In the next example we calculate the derivative of a function of three independent variables in which each of the three variables is dependent on two other variables.

Example: Using the generalized chain rule

Calculate [latex]\partial w/\partial u[/latex] and [latex]\partial w/\partial v[/latex] using the following functions:

[latex]\hspace{9cm}\begin{align} w&=f(x,y,z)=3x^2-2xy+4z^2 \\ x&=x(u,v)=e^u\sin v \\ y&=y(u,v)=e^u\cos v \\ z&=z(u,v)=e^u. \end{align}[/latex]
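The relevant partial derivatives are [latex]\partial w/\partial x=6x-2y[/latex], [latex]\partial w/\partial y=-2x[/latex], and [latex]\partial w/\partial z=8z[/latex]. With [latex]y=e^u\cos v[/latex], the generalized chain rule gives [latex]\frac{\partial w}{\partial u}=6e^{2u}\sin^2 v-4e^{2u}\sin v\cos v+8e^{2u}[/latex] and [latex]\frac{\partial w}{\partial v}=6e^{2u}\sin v\cos v-2e^{2u}\cos^2 v+2e^{2u}\sin^2 v[/latex]. The Python sketch below evaluates the three-term sums directly and compares them with these closed forms. The function names are hypothetical, and the sample point used in the printout is arbitrary.

```python
import math

def w_partials(u, v):
    """Generalized chain rule for w = 3x^2 - 2xy + 4z^2, x = e^u sin v, y = e^u cos v, z = e^u."""
    x, y, z = math.exp(u) * math.sin(v), math.exp(u) * math.cos(v), math.exp(u)
    w_x, w_y, w_z = 6 * x - 2 * y, -2 * x, 8 * z
    x_u, y_u, z_u = x, y, z      # d/du of e^u sin v, e^u cos v, e^u returns the same expressions
    x_v, y_v, z_v = math.exp(u) * math.cos(v), -math.exp(u) * math.sin(v), 0.0
    w_u = w_x * x_u + w_y * y_u + w_z * z_u
    w_v = w_x * x_v + w_y * y_v + w_z * z_v
    return w_u, w_v

def closed_form(u, v):
    """The simplified answers stated above."""
    e2u, s, c = math.exp(2 * u), math.sin(v), math.cos(v)
    return (e2u * (6 * s**2 - 4 * s * c + 8),
            e2u * (6 * s * c - 2 * c**2 + 2 * s**2))

def w_direct(u, v):
    """w written directly in u and v, for finite-difference comparison."""
    x, y, z = math.exp(u) * math.sin(v), math.exp(u) * math.cos(v), math.exp(u)
    return 3 * x**2 - 2 * x * y + 4 * z**2

print(w_partials(0.2, 1.0))
print(closed_form(0.2, 1.0))
```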

Try it

Calculate [latex]\partial w/\partial u[/latex] and [latex]\partial w/\partial v[/latex] given the following functions:

[latex]\hspace{8cm} \begin{align} w&=f(x,y,z)=\frac{x+2y-4z}{2x-y+3z} \\ x&=x(u,v) = e^{2u}\cos 3v \\ y&=y(u,v) = e^{2u}\sin 3v \\ z&=z(u,v) = e^{2u}. \end{align}[/latex]


Example: Drawing a tree diagram

Create a tree diagram for the case when

[latex]\large{w=f(x,y,z),x=x(t,u,v),y=y(t,u,v),z=z(t,u,v)}[/latex]

and write out the formulas for the three partial derivatives of [latex]w[/latex].
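In the tree for this example, [latex]w[/latex] sits at the root with three branches, one for each of [latex]x[/latex], [latex]y[/latex], and [latex]z[/latex]. Each of those splits into three branches ending at [latex]t[/latex], [latex]u[/latex], and [latex]v[/latex], for nine branches in all. Adding the branches that end at [latex]t[/latex] gives [latex]\frac{\partial w}{\partial t}=\frac{\partial w}{\partial x}\frac{\partial x}{\partial t}+\frac{\partial w}{\partial y}\frac{\partial y}{\partial t}+\frac{\partial w}{\partial z}\frac{\partial z}{\partial t}[/latex]. The formulas for [latex]\partial w/\partial u[/latex] and [latex]\partial w/\partial v[/latex] follow the same pattern. The short Python sketch below generates all three formulas as text. The function name is hypothetical, and the sketch assumes every intermediate variable depends on every independent variable.

```python
def chain_rule_formulas(dependent, intermediates, independents):
    """For each independent variable s, add one product per branch dependent -> x -> s."""
    formulas = {}
    for s in independents:
        terms = [f"(∂{dependent}/∂{x})(∂{x}/∂{s})" for x in intermediates]
        formulas[s] = " + ".join(terms)
    return formulas

for s, rhs in chain_rule_formulas("w", ["x", "y", "z"], ["t", "u", "v"]).items():
    print(f"∂w/∂{s} = {rhs}")
```

Passing `["x", "y"]` as the intermediates produces the two-term formulas asked for in the Try It below.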

Try it

Create a tree diagram for the case when

[latex]\large{w=f(x,y),x=x(t,u,v),y=y(t,u,v)}[/latex]

and write out the formulas for the three partial derivatives of [latex]w[/latex].