# The Meaning of Relativity/Lecture 3

Albert Einstein

LECTURE III

THE GENERAL THEORY OF RELATIVITY

All of the previous considerations have been based upon the assumption that all inertial systems are equivalent for the description of physical phenomena, but that they are preferred, for the formulation of the laws of nature, to spaces of reference in a different state of motion. We can think of no cause for this preference for definite states of motion to all others, according to our previous considerations, either in the perceptible bodies or in the concept of motion; on the contrary, it must be regarded as an independent property of the space-time continuum. The principle of inertia, in particular, seems to compel us to ascribe physically objective properties to the space-time continuum. Just as it was necessary from the Newtonian standpoint to make both the statements, tempus est absolutum, spatium est absolutum (time is absolute, space is absolute), so from the standpoint of the special theory of relativity we must say, continuum spatii et temporis est absolutum (the continuum of space and time is absolute). In this latter statement absolutum means not only "physically real," but also "independent in its physical properties, having a physical effect, but not itself influenced by physical conditions."

As long as the principle of inertia is regarded as the keystone of physics, this standpoint is certainly the only one which is justified. But there are two serious criticisms of the ordinary conception. In the first place, it is contrary to the mode of thinking in science to conceive of a thing (the space-time continuum) which acts itself, but which cannot be acted upon. This is the reason why E. Mach was led to make the attempt to eliminate space as an active cause in the system of mechanics. According to him, a material particle does not move in unaccelerated motion relatively to space, but relatively to the centre of all the other masses in the universe; in this way the series of causes of mechanical phenomena was closed, in contrast to the mechanics of Newton and Galileo. In order to develop this idea within the limits of the modern theory of action through a medium, the properties of the space-time continuum which determine inertia must be regarded as field properties of space, analogous to the electromagnetic field. The concepts of classical mechanics afford no way of expressing this. For this reason Mach's attempt at a solution failed for the time being. We shall come back to this point of view later. In the second place, classical mechanics indicates a limitation which directly demands an extension of the principle of relativity to spaces of reference which are not in uniform motion relatively to each other. The ratio of the masses of two bodies is defined in mechanics in two ways which differ from each other fundamentally; in the first place, as the reciprocal ratio of the accelerations which the same motional force imparts to them (inert mass), and in the second place, as the ratio of the forces which act upon them in the same gravitational field (gravitational mass). 
The equality of these two masses, so differently defined, is a fact which is confirmed by experiments of very high accuracy (experiments of Eötvös), and classical mechanics offers no explanation for this equality. It is, however, clear that science is fully justified in assigning such a numerical equality only after this numerical equality is reduced to an equality of the real nature of the two concepts.

That this object may actually be attained by an extension of the principle of relativity, follows from the following consideration. A little reflection will show that the theorem of the equality of the inert and the gravitational mass is equivalent to the theorem that the acceleration imparted to a body by a gravitational field is independent of the nature of the body. For Newton's equation of motion in a gravitational field, written out in full, is

 ${\displaystyle {\text{(Inert mass)}}\cdot {\text{(Acceleration)}}={\text{(Intensity of the gravitational field)}}\cdot {\text{(Gravitational mass)}}.}$

It is only when there is numerical equality between the inert and gravitational mass that the acceleration is independent of the nature of the body. Let now ${\displaystyle K}$ be an inertial system. Masses which are sufficiently far from each other and from other bodies are then, with respect to ${\displaystyle K}$, free from acceleration. We shall also refer these masses to a system of co-ordinates ${\displaystyle K'}$, uniformly accelerated with respect to ${\displaystyle K}$. Relatively to ${\displaystyle K'}$ all the masses have equal and parallel accelerations; with respect to ${\displaystyle K'}$ they behave just as if a gravitational field were present and ${\displaystyle K'}$ were unaccelerated. Overlooking for the present the question as to the "cause" of such a gravitational field, which will occupy us later, there is nothing to prevent our conceiving this gravitational field as real, that is, the conception that ${\displaystyle K'}$ is "at rest" and a gravitational field is present we may consider as equivalent to the conception that only ${\displaystyle K}$ is an "allowable" system of co-ordinates and no gravitational field is present. The assumption of the complete physical equivalence of the systems of co-ordinates, ${\displaystyle K}$ and ${\displaystyle K'}$, we call the "principle of equivalence;" this principle is evidently intimately connected with the theorem of the equality between the inert and the gravitational mass, and signifies an extension of the principle of relativity to co-ordinate systems which are in non-uniform motion relatively to each other. In fact, through this conception we arrive at the unity of the nature of inertia and gravitation. For according to our way of looking at it, the same masses may appear to be either under the action of inertia alone (with respect to ${\displaystyle K}$) or under the combined action of inertia and gravitation (with respect to ${\displaystyle K'}$).
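The kinematic core of this argument can be checked with numbers. The following Python sketch uses a Newtonian simplification and ignores relativistic kinematics. It assumes that ${\displaystyle K'}$ accelerates along the x-axis, so that x' = x - at²/2. The function names are illustrative and do not come from the text.

Every free particle, whatever its initial position and velocity, shows the same acceleration -a in ${\displaystyle K'}$. If we read that acceleration as the effect of a gravitational field through Newton's equation above, the field reproduces it for every body only when the inert and gravitational masses are equal.

```python
def position_in_K_prime(x0, v0, t, a):
    """Free particle referred to K', which accelerates along x with constant a.

    In the inertial system K the particle moves uniformly, x = x0 + v0*t;
    Newtonian kinematics gives x' = x - a*t**2/2 in K'."""
    return x0 + v0 * t - 0.5 * a * t**2


def observed_acceleration(x0, v0, a, t=1.0, h=1e-3):
    """Acceleration seen from K', by a central second difference in time."""
    f = lambda s: position_in_K_prime(x0, v0, s, a)
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / h**2


def acceleration_from_field(inert_mass, grav_mass, field):
    """Solve (inert mass)(acceleration) = (field intensity)(gravitational mass)."""
    return field * grav_mass / inert_mass


a = 9.81  # assumed acceleration of K' relative to K, in m/s^2

# Every free particle, whatever its initial state, accelerates by -a in K'.
for x0, v0 in [(0.0, 0.0), (5.0, 2.0), (-3.0, -1.0)]:
    print(x0, v0, round(observed_acceleration(x0, v0, a), 6))

# Reading K' as "at rest" in a field of intensity -a reproduces this for
# every body only if inert and gravitational mass are numerically equal.
print(acceleration_from_field(70.0, 70.0, -a))  # equal masses
print(acceleration_from_field(70.0, 35.0, -a))  # hypothetical unequal masses
```

The finite-difference step is only there to measure the acceleration. The conclusion does not depend on it, because the position in ${\displaystyle K'}$ is exactly quadratic in time.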
The possibility of explaining the numerical equality of inertia and gravitation by the unity of their nature gives to the general theory of relativity, according to my conviction, such a superiority over the conceptions of classical mechanics, that all the difficulties encountered in development must be considered as small in comparison.

What justifies us in dispensing with the preference for inertial systems over all other co-ordinate systems, a preference that seems so securely established by experiment based upon the principle of inertia? The weakness of the principle of inertia lies in this, that it involves an argument in a circle: a mass moves without acceleration if it is sufficiently far from other bodies; we know that it is sufficiently far from other bodies only by the fact that it moves without acceleration. Are there, in general, any inertial systems for very extended portions of the space-time continuum, or, indeed, for the whole universe? We may look upon the principle of inertia as established, to a high degree of approximation, for the space of our planetary system, provided that we neglect the perturbations due to the sun and planets. Stated more exactly, there are finite regions, where, with respect to a suitably chosen space of reference, material particles move freely without acceleration, and in which the laws of the special theory of relativity, which have been developed above, hold with remarkable accuracy. Such regions we shall call "Galilean regions." We shall proceed from the consideration of such regions as a special case of known properties.

The principle of equivalence demands that in dealing with Galilean regions we may equally well make use of non-inertial systems, that is, such co-ordinate systems as, relatively to inertial systems, are not free from acceleration and rotation. If, further, we are going to do away completely with the difficult question as to the objective reason for the preference of certain systems of co-ordinates, then we must allow the use of arbitrarily moving systems of co-ordinates. As soon as we make this attempt seriously we come into conflict with that physical interpretation of space and time to which we were led by the special theory of relativity. For let ${\displaystyle K'}$ be a system of co-ordinates whose ${\displaystyle z'}$-axis coincides with the ${\displaystyle z}$-axis of ${\displaystyle K}$, and which rotates about the latter axis with constant angular velocity. Are the configurations of rigid bodies, at rest relatively to ${\displaystyle K'}$, in accordance with the laws of Euclidean geometry? Since ${\displaystyle K'}$ is not an inertial system, we do not know directly the laws of configuration of rigid bodies with respect to ${\displaystyle K'}$, nor the laws of nature, in general. But we do know these laws with respect to the inertial system ${\displaystyle K}$, and we can therefore estimate them with respect to ${\displaystyle K'}$. Imagine a circle drawn about the origin in the ${\displaystyle x'y'}$ plane of ${\displaystyle K'}$, and a diameter of this circle. Imagine, further, that we are given a large number of rigid rods, all equal to each other. We suppose these laid in series along the periphery and the diameter of the circle, at rest relatively to ${\displaystyle K'}$. If ${\displaystyle U}$ is the number of these rods along the periphery, ${\displaystyle D}$ the number along the diameter, then, if ${\displaystyle K'}$ does not rotate relatively to ${\displaystyle K}$, we shall have

 ${\displaystyle {\frac {U}{D}}=\pi .}$

But if ${\displaystyle K'}$ rotates we get a different result. Suppose that at a definite time ${\displaystyle t}$ of ${\displaystyle K}$ we determine the ends of all the rods. With respect to ${\displaystyle K}$ all the rods upon the periphery experience the Lorentz contraction, but the rods upon the diameter do not experience this contraction (along their lengths!).[1] It therefore follows that

 ${\displaystyle {\frac {U}{D}}>\pi .}$

It therefore follows that the laws of configuration of rigid bodies with respect to ${\displaystyle K'}$ do not agree with the laws of configuration of rigid bodies that are in accordance with Euclidean geometry. If, further, we place two similar clocks (rotating with ${\displaystyle K'}$), one upon the periphery, and the other at the centre of the circle, then, judged from ${\displaystyle K}$, the clock on the periphery will go slower than the clock at the centre. The same thing must take place, judged from ${\displaystyle K'}$, if we define time with respect to ${\displaystyle K'}$ in a not wholly unnatural way, that is, in such a way that the laws with respect to ${\displaystyle K'}$ do not depend explicitly upon the time. Space and time, therefore, cannot be defined with respect to ${\displaystyle K'}$ as they were in the special theory of relativity with respect to inertial systems. But, according to the principle of equivalence, ${\displaystyle K'}$ is also to be considered as a system at rest, with respect to which there is a gravitational field (field of centrifugal force, and force of Coriolis). We therefore arrive at the result: the gravitational field influences and even determines the metrical laws of the space-time continuum. If the laws of configuration of ideal rigid bodies are to be expressed geometrically, then in the presence of a gravitational field the geometry is not Euclidean.
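Both effects can be put into numbers. The following is a minimal sketch under three assumptions: the rods have equal rest length, the periphery speed v = ωr stays below c, and the rod counts ${\displaystyle U}$ and ${\displaystyle D}$ are treated as continuous quantities.

Judged from ${\displaystyle K}$, periphery rods are shortened by the factor ${\displaystyle {\sqrt {1-\omega ^{2}r^{2}/c^{2}}}}$, and diameter rods are not. This gives ${\displaystyle U/D=\pi /{\sqrt {1-\omega ^{2}r^{2}/c^{2}}}}$. The periphery clock runs slow by the same factor.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def contraction_factor(speed):
    """sqrt(1 - v^2/c^2): Lorentz contraction of lengths and slowing of clocks."""
    return math.sqrt(1.0 - (speed / C) ** 2)


def rod_ratio(radius, omega, rod_length=1.0):
    """U/D for rods at rest in K', counted at one time t of K.

    Periphery rods move along their length with v = omega*radius and are
    contracted; diameter rods move across their length and are not. Rod
    counts are treated as continuous numbers here."""
    U = 2.0 * math.pi * radius / (rod_length * contraction_factor(omega * radius))
    D = 2.0 * radius / rod_length
    return U / D


def periphery_clock_rate(radius, omega):
    """Rate of the periphery clock relative to the centre clock, judged from K."""
    return contraction_factor(omega * radius)


radius = 1.0  # metres (illustrative)
for speed_fraction in (0.0, 0.1, 0.6):
    omega = speed_fraction * C / radius  # rad/s, chosen so that v = speed_fraction * c
    print(speed_fraction, rod_ratio(radius, omega) / math.pi,
          periphery_clock_rate(radius, omega))
```

For a periphery speed of 0.6c, the ratio ${\displaystyle U/D}$ comes out as 1.25π, and the periphery clock runs at 0.8 of the rate of the centre clock.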

The case that we have been considering is analogous to that which is presented in the two-dimensional treatment of surfaces. In the latter case, also, it is impossible to introduce co-ordinates on a surface (e.g. the surface of an ellipsoid) which have a simple metrical significance, while on a plane the Cartesian co-ordinates, ${\displaystyle x_{1},x_{2}}$, signify directly lengths measured by a unit measuring rod. Gauss overcame this difficulty, in his theory of surfaces, by introducing curvilinear co-ordinates which, apart from satisfying conditions of continuity, were wholly arbitrary, and afterwards these co-ordinates were related to the metrical properties of the surface. In an analogous way we shall introduce in the general theory of relativity arbitrary co-ordinates, ${\displaystyle x_{1},x_{2},x_{3},x_{4}}$, which shall number uniquely the space-time points, so that neighbouring events are associated with neighbouring values of the co-ordinates; otherwise, the choice of co-ordinates is arbitrary. We shall be true to the principle of relativity in its broadest sense if we give such a form to the laws that they are valid in every such four-dimensional system of co-ordinates, that is, if the equations expressing the laws are co-variant with respect to arbitrary transformations.

The most important point of contact between Gauss's theory of surfaces and the general theory of relativity lies in the metrical properties upon which the concepts of both theories, in the main, are based. In the case of the theory of surfaces, Gauss's argument is as follows. Plane geometry may be based upon the concept of the distance ${\displaystyle ds}$, between two indefinitely near points. The concept of this distance is physically significant because the distance can be measured directly by means of a rigid measuring rod. By a suitable choice of Cartesian co-ordinates this distance may be expressed by the formula ${\displaystyle ds^{2}=dx_{1}^{2}+dx_{2}^{2}}$. We may base upon this quantity the concepts of the straight line as the geodesic (${\displaystyle \delta \int ds=0}$), the interval, the circle, and the angle, upon which the Euclidean plane geometry is built. A geometry may be developed upon another continuously curved surface, if we observe that an infinitesimally small portion of the surface may be regarded as plane, to within relatively infinitesimal quantities. There are Cartesian co-ordinates, ${\displaystyle X_{1},X_{2}}$, upon such a small portion of the surface, and the distance between two points, measured by a measuring rod, is given by

 ${\displaystyle ds^{2}=dX_{1}^{2}+dX_{2}^{2}.}$

If we introduce arbitrary curvilinear co-ordinates, ${\displaystyle x_{1},x_{2}}$, on the surface, then ${\displaystyle dX_{1},dX_{2}}$ may be expressed linearly in terms of ${\displaystyle dx_{1},dx_{2}}$. Then everywhere upon the surface we have

 ${\displaystyle ds^{2}=g_{11}dx_{1}^{2}+2g_{12}dx_{1}dx_{2}+g_{22}dx_{2}^{2}}$

where ${\displaystyle g_{11},g_{12},g_{22}}$ are determined by the nature of the surface and the choice of co-ordinates; if these quantities are known, then it is also known how networks of rigid rods may be laid upon the surface. In other words, the geometry of surfaces may be based upon this expression for ${\displaystyle ds^{2}}$ exactly as plane geometry is based upon the corresponding expression.
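The following sketch shows how ${\displaystyle g_{11},g_{12},g_{22}}$ depend on both the surface and the choice of co-ordinates. It uses an assumed example surface: a sphere of radius R embedded in Euclidean space, described by the polar angle and the azimuth. The embedding supplies the local Cartesian measure.

The metric is obtained by finite differences of the embedding. A curve is then measured using nothing but the quadratic form for ${\displaystyle ds^{2}}$. The helper names are hypothetical.

```python
import math


def sphere(x1, x2, R=2.0):
    """Assumed example surface: sphere of radius R in Euclidean 3-space,
    x1 = polar angle, x2 = azimuth (arbitrary curvilinear co-ordinates)."""
    return (R * math.sin(x1) * math.cos(x2),
            R * math.sin(x1) * math.sin(x2),
            R * math.cos(x1))


def metric(surface, x1, x2, h=1e-5):
    """Return (g11, g12, g22) at (x1, x2) from central differences of the embedding."""
    def tangent(d1, d2):
        plus = surface(x1 + d1, x2 + d2)
        minus = surface(x1 - d1, x2 - d2)
        return [(p - m) / (2.0 * h) for p, m in zip(plus, minus)]

    e1, e2 = tangent(h, 0.0), tangent(0.0, h)
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    return dot(e1, e1), dot(e1, e2), dot(e2, e2)


def curve_length(surface, curve, n=1000):
    """Length of t -> (x1, x2), 0 <= t <= 1, summing ds from the quadratic form."""
    total = 0.0
    for k in range(n):
        (a1, a2), (b1, b2) = curve(k / n), curve((k + 1) / n)
        g11, g12, g22 = metric(surface, 0.5 * (a1 + b1), 0.5 * (a2 + b2))
        d1, d2 = b1 - a1, b2 - a2
        total += math.sqrt(g11 * d1 * d1 + 2.0 * g12 * d1 * d2 + g22 * d2 * d2)
    return total


print(metric(sphere, math.pi / 3, 0.4))  # close to (R^2, 0, R^2 sin^2 x1) = (4, 0, 3)
latitude = lambda t: (math.pi / 3, 2.0 * math.pi * t)
print(curve_length(sphere, latitude))    # close to 2*pi*R*sin(pi/3)
```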

There are analogous relations in the four-dimensional space-time continuum of physics. In the immediate neighbourhood of an observer, falling freely in a gravitational field, there exists no gravitational field. We can therefore always regard an infinitesimally small region of the space-time continuum as Galilean. For such an infinitely small region there will be an inertial system (with the space co-ordinates, ${\displaystyle X_{1},X_{2},X_{3}}$, and the time co-ordinate ${\displaystyle X_{4}}$) relatively to which we are to regard the laws of the special theory of relativity as valid. The quantity which is directly measurable by our unit measuring rods and clocks,

 ${\displaystyle dX_{1}^{2}+dX_{2}^{2}+dX_{3}^{2}-dX_{4}^{2}}$

or its negative,

 ${\displaystyle ds^{2}=-dX_{1}^{2}-dX_{2}^{2}-dX_{3}^{2}+dX_{4}^{2}}$ (54)

is therefore a uniquely determinate invariant for two neighbouring events (points in the four-dimensional continuum), provided that we use measuring rods that are equal to each other when brought together and superimposed, and clocks whose rates are the same when they are brought together. In this the physical assumption is essential that the relative lengths of two measuring rods and the relative rates of two clocks are independent, in principle, of their previous history. But this assumption is certainly warranted by experience; if it did not hold there could be no sharp spectral lines; for the single atoms of the same element certainly do not have the same history, and it would be absurd to suppose any relative difference in the structure of the single atoms due to their previous history if the mass and frequencies of the single atoms of the same element were always the same.

Space-time regions of finite extent are, in general, not Galilean, so that a gravitational field cannot be done away with by any choice of co-ordinates in a finite region. There is, therefore, no choice of co-ordinates for which the metrical relations of the special theory of relativity hold in a finite region. But the invariant ${\displaystyle ds}$ always exists for two neighbouring points (events) of the continuum. This invariant ${\displaystyle ds}$ may be expressed in arbitrary co-ordinates. If one observes that the local ${\displaystyle dX_{\nu }}$ may be expressed linearly in terms of the co-ordinate differentials ${\displaystyle dx_{\nu }}$, ${\displaystyle ds^{2}}$ may be expressed in the form

 ${\displaystyle ds^{2}=g_{\mu \nu }dx_{\mu }dx_{\nu }}$ (55)

The functions ${\displaystyle g_{\mu \nu }}$ describe, with respect to the arbitrarily chosen system of co-ordinates, the metrical relations of the space-time continuum and also the gravitational field. As in the special theory of relativity, we have to discriminate between time-like and space-like line elements in the four-dimensional continuum; owing to the change of sign introduced, time-like line elements have a real, space-like line elements an imaginary ${\displaystyle ds}$. The time-like ${\displaystyle ds}$ can be measured directly by a suitably chosen clock.
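Equation (55) can be written out in a few lines of code. In the following sketch, a function evaluates ${\displaystyle g_{\mu \nu }dx_{\mu }dx_{\nu }}$ for any 4×4 array of metric components and classifies the element by the sign of ${\displaystyle ds^{2}}$. The metric used is the Galilean form (54) in units with c = 1. That choice of units, and the function names, are assumptions made only for the example.

```python
import math


def interval_squared(g, dx):
    """ds^2 = g_{mu nu} dx_mu dx_nu, summed over both indices as in (55)."""
    n = len(dx)
    return sum(g[m][v] * dx[m] * dx[v] for m in range(n) for v in range(n))


# Galilean co-ordinates of (54): X_1..X_3 spatial, X_4 time, units with c = 1
# (the choice of units is an assumption of this sketch).
GALILEAN = [[-1, 0, 0, 0],
            [0, -1, 0, 0],
            [0, 0, -1, 0],
            [0, 0, 0, 1]]


def classify(ds2):
    """Time-like elements have real ds, space-like ones imaginary ds."""
    if ds2 > 0:
        return "time-like"
    if ds2 < 0:
        return "space-like"
    return "light-like"


def clock_reading(g, dx):
    """Proper time registered by a clock carried along a time-like element."""
    ds2 = interval_squared(g, dx)
    if ds2 <= 0:
        raise ValueError("only time-like elements are measured by a clock")
    return math.sqrt(ds2)


for dx in ([0.6, 0, 0, 1.0], [2.0, 0, 0, 1.0], [1, 0, 0, 1]):
    print(dx, classify(interval_squared(GALILEAN, dx)))
print(clock_reading(GALILEAN, [0.6, 0, 0, 1.0]))  # approximately 0.8
```

The same function accepts any other symmetric array of ${\displaystyle g_{\mu \nu }}$. Only the Galilean values are tied to the local inertial system.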

According to what has been said, it is evident that the formulation of the general theory of relativity assumes a generalization of the theory of invariants and the theory of tensors; the question is raised as to the form of the equations which are co-variant with respect to arbitrary point transformations. The generalized calculus of tensors was developed by mathematicians long before the theory of relativity. Riemann first extended Gauss's train of thought to continua of any number of dimensions; with prophetic vision he saw the physical meaning of this generalization of Euclid's geometry. Then followed the development of the theory in the form of the calculus of tensors, particularly by Ricci and Levi-Civita. This is the place for a brief presentation of the most important mathematical concepts and operations of this calculus of tensors.

We designate four quantities, which are defined as functions of the ${\displaystyle x_{\nu }}$ with respect to every system of co-ordinates, as components, ${\displaystyle A^{\nu }}$, of a contra-variant vector, if they transform in a change of co-ordinates as the co-ordinate differentials ${\displaystyle dx_{\nu }}$. We therefore have

 ${\displaystyle {A^{\mu }}'={\frac {\partial x_{\mu }'}{\partial x_{\nu }}}A^{\nu }}$ (56)

Besides these contra-variant vectors, there are also co-variant vectors. If ${\displaystyle B_{\nu }}$ are the components of a co-variant vector, these vectors are transformed according to the rule

 ${\displaystyle B_{\mu }'={\frac {\partial x_{\nu }}{\partial x_{\mu }'}}B_{\nu }.}$ (57)

The definition of a co-variant vector is chosen in such a way that a co-variant vector and a contra-variant vector together form a scalar according to the scheme,

 ${\displaystyle \phi =B_{\nu }A^{\nu }{\text{ (summed over the }}\nu {\text{)}}.}$
Accordingly,
 ${\displaystyle B_{\mu }'{A^{\mu }}'={\frac {\partial x_{\alpha }}{\partial x_{\mu }'}}{\frac {\partial x_{\mu }'}{\partial x_{\beta }}}B_{\alpha }A^{\beta }=B_{\alpha }A^{\alpha }.}$

In particular, the derivatives ${\displaystyle {\frac {\partial \phi }{\partial x_{\alpha }}}}$ of a scalar ${\displaystyle \phi }$ are components of a co-variant vector, which, with the co-ordinate differentials, form the scalar ${\displaystyle {\frac {\partial \phi }{\partial x_{\alpha }}}dx_{\alpha }}$; we see from this example how natural is the definition of the co-variant vectors.
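The two transformation laws (56) and (57) can be checked numerically. The following sketch assumes a change from Cartesian co-ordinates (x, y) to plane polar co-ordinates (r, θ), and an arbitrarily chosen scalar φ = x² + 3y.

Its gradient is transformed co-variantly and a displacement contra-variantly. The sketch compares the transformed gradient with the gradient computed directly in polar co-ordinates, and checks that the sum of products of the two vectors is the same in both systems.

```python
import math


def d_polar_d_cartesian(x, y):
    """J[mu][nu] = d x'_mu / d x_nu, with x = (x, y) and x' = (r, theta)."""
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return [[x / r, y / r],
            [-y / r2, x / r2]]


def d_cartesian_d_polar(x, y):
    """K[nu][mu] = d x_nu / d x'_mu, evaluated at the same point."""
    r, theta = math.hypot(x, y), math.atan2(y, x)
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta), r * math.cos(theta)]]


def to_polar_contravariant(A, x, y):
    """A'^mu = (d x'_mu / d x_nu) A^nu, eq. (56)."""
    J = d_polar_d_cartesian(x, y)
    return [sum(J[mu][nu] * A[nu] for nu in range(2)) for mu in range(2)]


def to_polar_covariant(B, x, y):
    """B'_mu = (d x_nu / d x'_mu) B_nu, eq. (57)."""
    K = d_cartesian_d_polar(x, y)
    return [sum(K[nu][mu] * B[nu] for nu in range(2)) for mu in range(2)]


# Hypothetical scalar phi = x^2 + 3y and its gradient in both systems.
def grad_phi_cartesian(x, y):
    return [2.0 * x, 3.0]


def grad_phi_polar(r, theta):
    # phi = r^2 cos^2(theta) + 3 r sin(theta), differentiated directly
    return [2.0 * r * math.cos(theta) ** 2 + 3.0 * math.sin(theta),
            -2.0 * r * r * math.sin(theta) * math.cos(theta) + 3.0 * r * math.cos(theta)]


x, y = 1.0, 2.0
A = [0.3, -0.7]                # a displacement dx_nu (contra-variant)
B = grad_phi_cartesian(x, y)   # d phi / d x_alpha (co-variant)
A_p, B_p = to_polar_contravariant(A, x, y), to_polar_covariant(B, x, y)
print(B_p, grad_phi_polar(math.hypot(x, y), math.atan2(y, x)))                 # agree
print(sum(b * a for b, a in zip(B, A)), sum(b * a for b, a in zip(B_p, A_p)))  # same scalar
```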

There are here, also, tensors of any rank, which may have co-variant or contra-variant character with respect to each index; as with vectors, the character is designated by the position of the index. For example, ${\displaystyle A_{\mu }^{\nu }}$ denotes a tensor of the second rank, which is co-variant with respect to the index ${\displaystyle \mu }$, and contra-variant with respect to the index ${\displaystyle \nu }$. The tensor character indicates that the equation of transformation is

 ${\displaystyle {A_{\mu }^{\nu }}'={\frac {\partial x_{\alpha }}{\partial x_{\mu }'}}{\frac {\partial x_{\nu }'}{\partial x_{\beta }}}A_{\alpha }^{\beta }.}$ (58)

Tensors may be formed by the addition and subtraction of tensors of equal rank and like character, as in the theory of invariants of orthogonal linear substitutions, for example,

 ${\displaystyle A_{\mu }^{\nu }+B_{\mu }^{\nu }=C_{\mu }^{\nu }.}$ (59)

The proof of the tensor character of ${\displaystyle C_{\mu }^{\nu }}$ depends upon (58).

Tensors may be formed by multiplication, keeping the character of the indices, just as in the theory of invariants of linear orthogonal transformations, for example,

 ${\displaystyle A_{\mu }^{\nu }B_{\sigma \tau }=C_{\mu \sigma \tau }^{\nu }.}$ (60)
The proof follows directly from the rule of transformation.

Tensors may be formed by contraction with respect to two indices of different character, for example,

 ${\displaystyle A_{\mu \sigma \tau }^{\mu }=B_{\sigma \tau }}$ (61)

The tensor character of ${\displaystyle A_{\mu \sigma \tau }^{\mu }}$ determines the tensor character of ${\displaystyle B_{\sigma \tau }}$. Proof—

 ${\displaystyle {A_{\mu \sigma \tau }^{\mu }}'={\frac {\partial x_{\alpha }}{\partial x_{\mu }'}}{\frac {\partial x_{\mu }'}{\partial x_{\beta }}}{\frac {\partial x_{s}}{\partial x_{\sigma }'}}{\frac {\partial x_{t}}{\partial x_{\tau }'}}A_{\alpha st}^{\beta }={\frac {\partial x_{s}}{\partial x_{\sigma }'}}{\frac {\partial x_{t}}{\partial x_{\tau }'}}A_{\alpha st}^{\alpha }.}$
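The following sketch treats the simplest case: a mixed tensor of the second rank contracted to a scalar. It compares two kinds of contraction under an assumed non-linear transformation, x'₁ = x₁ + x₂², x'₂ = x₂, evaluated at one point.

Contracting an upper index with a lower one gives the same number in both systems. Summing the diagonal of a purely co-variant tensor does not. The function names are illustrative.

```python
def transform_mixed(A, J, Jinv):
    """A'_mu^nu = (dx_alpha/dx'_mu)(dx'_nu/dx_beta) A_alpha^beta, eq. (58).

    A[mu][nu] holds A_mu^nu; J[m][n] = dx'_m/dx_n and Jinv[m][n] = dx_m/dx'_n."""
    n = len(A)
    return [[sum(Jinv[a][mu] * J[nu][b] * A[a][b] for a in range(n) for b in range(n))
             for nu in range(n)] for mu in range(n)]


def transform_covariant(C, Jinv):
    """C'_{mu nu} = (dx_alpha/dx'_mu)(dx_beta/dx'_nu) C_{alpha beta}."""
    n = len(C)
    return [[sum(Jinv[a][mu] * Jinv[b][nu] * C[a][b] for a in range(n) for b in range(n))
             for nu in range(n)] for mu in range(n)]


def contract(M):
    """Sum of the diagonal components M[mu][mu]."""
    return sum(M[i][i] for i in range(len(M)))


# Assumed transformation x'_1 = x_1 + x_2**2, x'_2 = x_2, evaluated where x_2 = 0.5.
x2 = 0.5
J = [[1.0, 2.0 * x2], [0.0, 1.0]]
Jinv = [[1.0, -2.0 * x2], [0.0, 1.0]]
M = [[1.0, 2.0], [3.0, 4.0]]  # the same numbers, read as mixed or as co-variant components

print(contract(M), contract(transform_mixed(M, J, Jinv)))  # 5.0 5.0: a scalar
print(contract(M), contract(transform_covariant(M, Jinv)))  # 5.0 1.0: not a scalar
```

The second result shows why (61) pairs indices of different character. Only then do the factors ${\displaystyle \partial x_{\alpha }/\partial x_{\mu }'}$ and ${\displaystyle \partial x_{\mu }'/\partial x_{\beta }}$ cancel.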

The properties of symmetry and skew-symmetry of a tensor with respect to two indices of like character have the same significance as in the theory of invariants.

With this, everything essential has been said with regard to the algebraic properties of tensors.

The Fundamental Tensor. It follows from the invariance of ${\displaystyle ds^{2}}$ for an arbitrary choice of the ${\displaystyle dx_{\nu }}$, in connexion with the condition of symmetry consistent with (55), that the ${\displaystyle g_{\mu \nu }}$ are components of a symmetrical co-variant tensor (Fundamental Tensor). Let us form the determinant, ${\displaystyle g}$, of the ${\displaystyle g_{\mu \nu }}$, and also the minors, divided by ${\displaystyle g}$, corresponding to the single ${\displaystyle g_{\mu \nu }}$. These minors, divided by ${\displaystyle g}$, will be denoted by ${\displaystyle g^{\mu \nu }}$, and their tensor character is not yet known. Then we have

 ${\displaystyle g_{\mu \alpha }g^{\mu \beta }=\delta _{\alpha }^{\beta }={\begin{cases}1&{\text{if }}\alpha =\beta ,\\0&{\text{if }}\alpha \neq \beta .\end{cases}}}$ (62)

If we form the infinitely small quantities (co-variant vectors)

 ${\displaystyle d\xi _{\mu }=g_{\mu \alpha }dx_{\alpha }}$ (63)
multiply by ${\displaystyle g^{\mu \beta }}$ and sum over the ${\displaystyle \mu }$, we obtain, by the use of (62),
 ${\displaystyle dx_{\beta }=g^{\beta \mu }d\xi _{\mu }.}$ (64)

Since the ratios of the ${\displaystyle d\xi _{\mu }}$ are arbitrary, and the ${\displaystyle dx_{\beta }}$ as well as the ${\displaystyle d\xi _{\mu }}$ are components of vectors, it follows that the ${\displaystyle g^{\mu \nu }}$ are the components of a contra-variant tensor [2] (contra-variant fundamental tensor). The tensor character of ${\displaystyle \delta _{\alpha }^{\beta }}$ (mixed fundamental tensor) accordingly follows, by (62). By means of the fundamental tensor, instead of tensors with co-variant index character, we can introduce tensors with contra-variant index character, and conversely. For example,

 ${\displaystyle {\begin{aligned}A^{\mu }&=g^{\mu \alpha }A_{\alpha }\\A_{\mu }&=g_{\mu \alpha }A^{\alpha }\\T_{\mu }^{\sigma }&=g^{\sigma \nu }T_{\mu \nu }.\end{aligned}}}$
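The following sketch follows this construction step by step. It builds ${\displaystyle g^{\mu \nu }}$ exactly as described, from signed minors (cofactors) of the ${\displaystyle g_{\mu \nu }}$ divided by ${\displaystyle g}$. It then checks (62), and lowers and raises the index of a vector.

The example metric is an assumption chosen for illustration. It is the Galilean form (54) rewritten in co-ordinates (r, φ, z, t) that rotate with constant angular velocity ω about the z-axis, in units with c = 1. The function names are illustrative.

```python
def minor(M, i, j):
    """M with row i and column j removed."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]


def det(M):
    """Determinant by expansion along the first row (adequate for 4 x 4)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j)) for j in range(len(M)))


def contravariant_metric(g):
    """g^{mu nu}: signed minor of g_{mu nu} divided by g = det(g_{mu nu}).
    No transposition is needed because g_{mu nu} is symmetric."""
    G = det(g)
    n = len(g)
    return [[(-1) ** (m + v) * det(minor(g, m, v)) / G for v in range(n)] for m in range(n)]


def rotating_metric(r, omega):
    """Assumed example: the Galilean form (54) re-expressed in co-ordinates
    (r, phi, z, t) rotating with angular velocity omega, units with c = 1."""
    w = omega * r * r
    return [[-1.0, 0.0, 0.0, 0.0],
            [0.0, -r * r, 0.0, -w],
            [0.0, 0.0, -1.0, 0.0],
            [0.0, -w, 0.0, 1.0 - omega * omega * r * r]]


g = rotating_metric(r=2.0, omega=0.1)
g_up = contravariant_metric(g)

# Check (62): g_{mu alpha} g^{mu beta} = delta_alpha^beta.
n = len(g)
mixed = [[sum(g[m][a] * g_up[m][b] for m in range(n)) for b in range(n)] for a in range(n)]
print([[round(v, 12) for v in row] for row in mixed])

# Lower and raise an index: A_mu = g_{mu alpha} A^alpha, then A^mu = g^{mu alpha} A_alpha.
A_up = [0.1, 0.2, 0.0, 1.0]
A_down = [sum(g[m][a] * A_up[a] for a in range(n)) for m in range(n)]
print(A_down, [sum(g_up[m][a] * A_down[a] for a in range(n)) for m in range(n)])
```

In these rotating co-ordinates the determinant is ${\displaystyle g=-r^{2}}$, so it is negative. This is consistent with the signature of (54).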

Volume Invariants. The volume element

 ${\displaystyle dx_{1}\,dx_{2}\,dx_{3}\,dx_{4}=dx}$

is not an invariant. For by Jacobi's theorem,

 ${\displaystyle dx'=\left|{\frac {\partial x_{\mu }'}{\partial x_{\nu }}}\right|dx.}$ (65)
But we can complement ${\displaystyle dx}$ so that it becomes an invariant. If we form the determinant of the quantities
 ${\displaystyle g_{\mu \nu }'={\frac {\partial x_{\alpha }}{\partial x_{\mu }'}}{\frac {\partial x_{\beta }}{\partial x_{\nu }'}}g_{\alpha \beta }}$

we obtain, by a double application of the theorem of multiplication of determinants,

 ${\displaystyle g'=\left|g_{\mu \nu }'\right|=\left|{\frac {\partial x_{\nu }}{\partial x_{\mu }'}}\right|^{2}\cdot \left|g_{\mu \nu }\right|=\left|{\frac {\partial x_{\mu }'}{\partial x_{\nu }}}\right|^{-2}g.}$

We therefore get the invariant,

 ${\displaystyle {\sqrt {g'}}dx'={\sqrt {g}}dx.}$
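The invariant ${\displaystyle {\sqrt {g}}dx}$ can be illustrated in two dimensions, where ${\displaystyle g}$ is positive. The following sketch assumes the Euclidean plane with Cartesian ${\displaystyle g_{\mu \nu }=\delta _{\mu \nu }}$ and changes to polar co-ordinates (r, θ).

It forms ${\displaystyle g_{\mu \nu }'}$ by the transformation law above and confirms ${\displaystyle g'=\left|\partial x_{\nu }/\partial x_{\mu }'\right|^{2}g}$. It then integrates ${\displaystyle {\sqrt {g'}}\,dr\,d\theta }$ over a disk, which recovers the area πR². The function names are illustrative.

```python
import math


def d_cartesian_d_polar(r, theta):
    """K[alpha][mu] = d x_alpha / d x'_mu for x = (x, y), x' = (r, theta)."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta), r * math.cos(theta)]]


def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def polar_metric(r, theta):
    """g'_{mu nu} = (dx_alpha/dx'_mu)(dx_beta/dx'_nu) g_{alpha beta}, with g = identity."""
    K = d_cartesian_d_polar(r, theta)
    return [[sum(K[a][m] * K[a][n] for a in range(2)) for n in range(2)] for m in range(2)]


def disk_area(R, n=200):
    """Sum of sqrt(g') dx' over 0 <= r <= R, 0 <= theta < 2*pi (midpoint rule)."""
    dr, dtheta = R / n, 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            g_prime = det2(polar_metric((i + 0.5) * dr, (j + 0.5) * dtheta))
            total += math.sqrt(g_prime) * dr * dtheta
    return total


r, theta = 1.5, 0.7
print(det2(polar_metric(r, theta)), det2(d_cartesian_d_polar(r, theta)) ** 2)  # g' = |dx/dx'|^2 g
print(disk_area(1.0), math.pi)  # sqrt(g') dr dtheta reproduces the Euclidean area
```

If we summed the bare co-ordinate element ${\displaystyle dr\,d\theta }$ over the same disk, we would get 2πR. That is not an area at all, which is the point of (65).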

Formation of Tensors by Differentiation. Although the algebraic operations of tensor formation have proved to be as simple as in the special case of invariance with respect to linear orthogonal transformations, nevertheless in the general case, the invariant differential operations are, unfortunately, considerably more complicated. The reason for this is as follows. If ${\displaystyle A^{\mu }}$ is a contra-variant vector, the coefficients of its transformation, ${\displaystyle {\frac {\partial x_{\mu }'}{\partial x_{\nu }}}}$, are independent of position only if the transformation is a linear one. For then the vector components, ${\displaystyle A^{\mu }+{\frac {\partial A^{\mu }}{\partial x_{\alpha }}}dx_{\alpha }}$, at a neighbouring point transform in the same way as the ${\displaystyle A^{\mu }}$, from which follows the vector character of the vector differentials, and the tensor character of ${\displaystyle {\frac {\partial A^{\mu }}{\partial x_{\alpha }}}}$. But if the ${\displaystyle {\frac {\partial x_{\mu }'}{\partial x_{\nu }}}}$ are variable this is no longer true.

That there are, nevertheless, in the general case, invariant differential operations for tensors, is recognized most satisfactorily in the following way, introduced by Levi-Civita and Weyl. Let ${\displaystyle A^{\mu }}$ be a contra-variant vector whose components are given with respect to the co-ordinate system of the ${\displaystyle x_{\nu }}$. Let ${\displaystyle P_{1}}$ and ${\displaystyle P_{2}}$ be two infinitesimally near points of the continuum. For the infinitesimal region surrounding the point ${\displaystyle P_{1}}$, there is, according to our way of considering the matter, a co-ordinate system of the ${\displaystyle X_{\nu }}$ (with imaginary ${\displaystyle X_{\nu }}$-co-ordinates) for which the continuum is Euclidean. Let ${\displaystyle A_{(1)}^{\mu }}$ be the co-ordinates of the vector at the point ${\displaystyle P_{1}}$. Imagine a vector drawn at the point ${\displaystyle P_{2}}$, using the local system of the ${\displaystyle X_{\nu }}$, with the same co-ordinates (parallel vector through ${\displaystyle P_{2}}$); then this parallel vector is uniquely determined by the vector at ${\displaystyle P_{1}}$ and the displacement. We designate this operation, whose uniqueness will appear in the sequel, the parallel displacement of the vector ${\displaystyle A^{\mu }}$ from ${\displaystyle P_{1}}$ to the infinitesimally near point ${\displaystyle P_{2}}$. If we form the vector difference of the vector ${\displaystyle (A^{\mu })}$ at the point ${\displaystyle P_{2}}$ and the vector obtained by parallel displacement from ${\displaystyle P_{1}}$ to ${\displaystyle P_{2}}$, we get a vector which may be regarded as the differential of the vector ${\displaystyle (A^{\mu })}$ for the given displacement ${\displaystyle (dx_{\nu })}$.

This vector displacement can naturally also be considered with respect to the co-ordinate system of the ${\displaystyle x_{\nu }}$. If ${\displaystyle A^{\nu }}$ are the co-ordinates of the vector at ${\displaystyle P_{1}}$, ${\displaystyle A^{\nu }+\delta A^{\nu }}$ the co-ordinates of the vector displaced to ${\displaystyle P_{2}}$ along the interval ${\displaystyle (dx_{\nu })}$, then the ${\displaystyle \delta A^{\nu }}$ do not vanish in this case. We know of these quantities, which do not have a vector character, that they must depend linearly and homogeneously upon the ${\displaystyle dx_{\nu }}$ and the ${\displaystyle A^{\nu }}$. We therefore put

 ${\displaystyle \delta A^{\nu }=-\Gamma _{\alpha \beta }^{\nu }A^{\alpha }dx_{\beta }.}$ (67)

In addition, we can state that the ${\displaystyle \Gamma _{\alpha \beta }^{\nu }}$ must be symmetrical with respect to the indices ${\displaystyle \alpha }$ and ${\displaystyle \beta }$. For we can assume from a representation by the aid of a Euclidean system of local co-ordinates that the same parallelogram will be described by the displacement of an element ${\displaystyle d^{(1)}x_{\nu }}$ along a second element ${\displaystyle d^{(2)}x_{\nu }}$ as by a displacement of ${\displaystyle d^{(2)}x_{\nu }}$ along ${\displaystyle d^{(1)}x_{\nu }}$. We must therefore have

 ${\displaystyle {\begin{aligned}&d^{(2)}x_{\nu }+\left(d^{(1)}x_{\nu }-\Gamma _{\alpha \beta }^{\nu }d^{(1)}x_{\alpha }d^{(2)}x_{\beta }\right)\\&\qquad =d^{(1)}x_{\nu }+\left(d^{(2)}x_{\nu }-\Gamma _{\alpha \beta }^{\nu }d^{(2)}x_{\alpha }d^{(1)}x_{\beta }\right).\end{aligned}}}$

The statement made above follows from this, after interchanging the indices of summation, ${\displaystyle \alpha }$ and ${\displaystyle \beta }$, on the right-hand side.

Since the quantities ${\displaystyle g_{\mu \nu }}$ determine all the metrical properties of the continuum, they must also determine the ${\displaystyle \Gamma _{\alpha \beta }^{\nu }}$. If we consider the invariant of the vector ${\displaystyle A^{\nu }}$, that is, the square of its magnitude,

 ${\displaystyle g_{\mu \nu }A^{\mu }A^{\nu }}$

which is an invariant, this cannot change in a parallel displacement. We therefore have

 ${\displaystyle 0=\delta (g_{\mu \nu }A^{\mu }A^{\nu })={\frac {\partial g_{\mu \nu }}{\partial x_{\alpha }}}A^{\mu }A^{\nu }dx_{\alpha }+g_{\mu \nu }A^{\mu }\delta A^{\nu }+g_{\mu \nu }A^{\nu }\delta A^{\mu }}$

or, by (67),

 ${\displaystyle \left({\frac {\partial g_{\mu \nu }}{\partial x_{\alpha }}}-g_{\mu \beta }\Gamma _{\nu \alpha }^{\beta }-g_{\nu \beta }\Gamma _{\mu \alpha }^{\beta }\right)A^{\mu }A^{\nu }dx_{\alpha }=0.}$

Owing to the symmetry of the expression in the brackets with respect to the indices ${\displaystyle \mu }$ and ${\displaystyle \nu }$, this equation can be valid for an arbitrary choice of the vectors ${\displaystyle (A^{\mu })}$ and ${\displaystyle dx_{\nu }}$ only when the expression in the brackets vanishes for all combinations of the indices. By a cyclic interchange of the indices ${\displaystyle \mu ,\nu ,\alpha }$, we obtain thus altogether three equations, from which we obtain, on taking into account the symmetrical property of the ${\displaystyle \Gamma _{\mu \nu }^{\alpha }}$,

 ${\displaystyle {\begin{bmatrix}\mu \nu \\\alpha \end{bmatrix}}=g_{\alpha \beta }\Gamma _{\mu \nu }^{\beta }}$ (68)

in which, following Christoffel, the abbreviation has been used,

 ${\displaystyle {\begin{bmatrix}\mu \nu \\\alpha \end{bmatrix}}={\frac {1}{2}}\left({\frac {\partial g_{\mu \alpha }}{\partial x_{\nu }}}+{\frac {\partial g_{\nu \alpha }}{\partial x_{\mu }}}-{\frac {\partial g_{\mu \nu }}{\partial x_{\alpha }}}\right).}$ (69)

If we multiply (68) by ${\displaystyle g^{\alpha \sigma }}$ and sum over the ${\displaystyle \alpha }$, we obtain

 ${\displaystyle \Gamma _{\mu \nu }^{\sigma }={\frac {1}{2}}g^{\sigma \alpha }\left({\frac {\delta g_{\mu \alpha }}{\delta x_{\nu }}}+{\frac {\delta g_{\nu \alpha }}{\delta x_{\mu }}}-{\frac {\delta g_{\mu \nu }}{\delta x_{\alpha }}}\right)={\begin{Bmatrix}\mu \nu \\\sigma \end{Bmatrix}}}$ (70)

in which ${\displaystyle {\begin{Bmatrix}\mu \nu \\\sigma \end{Bmatrix}}}$ is the Christoffel symbol of the second kind. Thus the quantities ${\displaystyle \Gamma }$ are deduced from the ${\displaystyle g_{\mu \nu }}$. Equations (67) and (70) are the foundation for the following discussion.
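As a check on (69) and (70), the following Python sketch evaluates the ${\displaystyle \Gamma }$ numerically for an assumed example, the Euclidean plane in polar co-ordinates ${\displaystyle x_{1}=r,x_{2}=\theta }$ with ${\displaystyle ds^{2}=dr^{2}+r^{2}d\theta ^{2}}$. The function names are illustrative only, the derivatives of the ${\displaystyle g_{\mu \nu }}$ are approximated by central differences, and the matrix inverse is written out for two dimensions only.

```python
# A minimal sketch of equations (69) and (70): Christoffel symbols from the g_{mu nu}.
# Assumed example metric: the Euclidean plane in polar co-ordinates
# (x_1, x_2) = (r, theta), ds^2 = dr^2 + r^2 dtheta^2 (index 0 is r, index 1 is theta).


def metric(x):
    r = x[0]
    return [[1.0, 0.0], [0.0, r * r]]


def inverse_2x2(g):
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]


def metric_derivative(x, a, h=1e-6):
    """Central-difference approximation of d g_{mn} / d x_a."""
    xp, xm = list(x), list(x)
    xp[a] += h
    xm[a] -= h
    gp, gm = metric(xp), metric(xm)
    return [[(gp[m][n] - gm[m][n]) / (2 * h) for n in range(2)] for m in range(2)]


def christoffel(x):
    """Return G with G[s][m][n] = Gamma^s_{mn} at the point x."""
    ginv = inverse_2x2(metric(x))
    dg = [metric_derivative(x, a) for a in range(2)]  # dg[a][m][n] = d g_{mn} / d x_a
    G = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
    for s in range(2):
        for m in range(2):
            for n in range(2):
                for a in range(2):
                    # Christoffel bracket [mn, a] of equation (69)
                    bracket = 0.5 * (dg[n][m][a] + dg[m][n][a] - dg[a][m][n])
                    G[s][m][n] += ginv[s][a] * bracket  # raise the index as in (70)
    return G


G = christoffel([2.0, 0.3])
print(round(G[0][1][1], 6))  # Gamma^r_{theta theta} = -r, here -2.0
print(round(G[1][0][1], 6))  # Gamma^theta_{r theta} = 1/r, here 0.5
```

For this metric one expects ${\displaystyle \Gamma _{22}^{1}=-r}$ and ${\displaystyle \Gamma _{12}^{2}=\Gamma _{21}^{2}=1/r}$, with all other components vanishing.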

Co-variant Differentiation of Tensors. If ${\displaystyle (A^{\mu }+\delta A^{\mu })}$ is the vector resulting from an infinitesimal parallel displacement from ${\displaystyle P_{1}}$ to ${\displaystyle P_{2}}$, and ${\displaystyle (A^{\mu }+dA^{\mu })}$ the vector ${\displaystyle A^{\mu }}$ at the point ${\displaystyle P_{2}}$, then the difference of these two,

 ${\displaystyle dA^{\mu }-\delta A^{\mu }=\left({\frac {\delta A^{\mu }}{\delta x_{\sigma }}}+\Gamma _{\sigma \alpha }^{\mu }A^{\alpha }\right)dx_{\sigma }}$
is also a vector. Since this is the case for an arbitrary choice of the ${\displaystyle dx_{\sigma }}$, it follows that
 ${\displaystyle A_{;\sigma }^{\mu }={\frac {\delta A^{\mu }}{\delta x_{\sigma }}}+\Gamma _{\sigma \alpha }^{\mu }A^{\alpha }}$ (71)

is a tensor, which we designate as the co-variant derivative of the tensor of the first rank (vector). Contracting this tensor, we obtain the divergence of the contra-variant tensor ${\displaystyle A^{\mu }}$. In this we must observe that according to (70),

 ${\displaystyle \Gamma _{\mu \sigma }^{\sigma }={\frac {1}{2}}g^{\sigma \alpha }{\frac {\delta g_{\sigma \alpha }}{\delta x_{\mu }}}={\frac {1}{\sqrt {g}}}{\frac {\delta {\sqrt {g}}}{\delta x_{\mu }}}.}$ (72)

If we put, further,

 ${\displaystyle A^{\mu }{\sqrt {g}}={\mathfrak {A}}^{\mu }}$ (73)

a quantity designated by Weyl as the contra-variant tensor density [3] of the first rank, it follows that,

 ${\displaystyle {\mathfrak {A}}={\frac {\delta {\mathfrak {A}}^{\mu }}{\delta x_{\mu }}}}$ (74)

is a scalar density.
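As an illustration (an assumed example, not part of the original argument), take the Euclidean plane in polar co-ordinates ${\displaystyle x_{1}=r}$, ${\displaystyle x_{2}=\theta }$, for which ${\displaystyle g_{11}=1}$, ${\displaystyle g_{22}=r^{2}}$, ${\displaystyle g_{12}=0}$, and hence ${\displaystyle {\sqrt {g}}=r}$. The only non-vanishing ${\displaystyle \Gamma }$ given by (70) are ${\displaystyle \Gamma _{22}^{1}=-r}$ and ${\displaystyle \Gamma _{12}^{2}=\Gamma _{21}^{2}=1/r}$, so that (72) is confirmed directly,

 ${\displaystyle \Gamma _{1\sigma }^{\sigma }=\Gamma _{11}^{1}+\Gamma _{12}^{2}={\frac {1}{r}}={\frac {1}{\sqrt {g}}}{\frac {\delta {\sqrt {g}}}{\delta r}},\qquad \Gamma _{2\sigma }^{\sigma }=0={\frac {1}{\sqrt {g}}}{\frac {\delta {\sqrt {g}}}{\delta \theta }},}$

and (73), (74) reproduce the familiar divergence of a vector in polar co-ordinates,

 ${\displaystyle {\frac {\mathfrak {A}}{\sqrt {g}}}={\frac {1}{r}}{\frac {\delta (rA^{1})}{\delta r}}+{\frac {\delta A^{2}}{\delta \theta }}.}$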

We get the law of parallel displacement for the co-variant vector ${\displaystyle B_{\mu }}$ by stipulating that the parallel displacement shall be effected in such a way that the scalar

 ${\displaystyle \phi =A^{\mu }B_{\mu }}$

remains unchanged, and that therefore

 ${\displaystyle A^{\mu }\delta B_{\mu }+B_{\mu }\delta A^{\mu }}$
vanishes for every value assigned to ${\displaystyle (A^{\mu })}$. We therefore get
 ${\displaystyle \delta B_{\mu }=\Gamma _{\mu \sigma }^{\alpha }B_{\alpha }dx_{\sigma }}$ (75)

From this we arrive at the co-variant derivative of the co-variant vector by the same process as that which led to (71),

 ${\displaystyle B_{\mu ;\sigma }={\frac {\delta B_{\mu }}{\delta x_{\sigma }}}-\Gamma _{\mu \sigma }^{\alpha }B_{\alpha }.}$ (76)

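By interchanging the indices ${\displaystyle \mu }$ and ${\displaystyle \sigma }$ and subtracting, we get the skew-symmetrical tensor (the terms in ${\displaystyle \Gamma }$ cancel, since ${\displaystyle \Gamma _{\mu \sigma }^{\alpha }}$ is symmetrical in ${\displaystyle \mu }$ and ${\displaystyle \sigma }$),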

 ${\displaystyle \phi _{\mu \sigma }={\frac {\delta B_{\mu }}{\delta x_{\sigma }}}-{\frac {\delta B_{\sigma }}{\delta x_{\mu }}}.}$ (77)
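The two displacement laws (67) and (75) can be checked against each other numerically. The Python sketch below works under assumed choices (the plane in polar co-ordinates, the circle ${\displaystyle r=2}$ as the path, a fourth-order Runge-Kutta step, and hypothetical helper names). It carries ${\displaystyle A^{\mu }}$ by (67) and ${\displaystyle B_{\mu }}$ by (75) and compares the scalar ${\displaystyle A^{\mu }B_{\mu }}$ at the two ends of the path.

```python
# Sketch: carry A^mu by (67) and B_mu by (75) along the circle r = R0 of the Euclidean
# plane in polar co-ordinates (index 0 is r, index 1 is theta), then compare A^mu B_mu.
# The plane, the circle and all names here are assumptions made for illustration.
R0 = 2.0


def gamma(s, m, n, r):
    """Gamma^s_{mn} for ds^2 = dr^2 + r^2 dtheta^2, taken from (70)."""
    if s == 0 and m == 1 and n == 1:
        return -r
    if s == 1 and {m, n} == {0, 1}:
        return 1.0 / r
    return 0.0


def rates(state):
    """d/dtheta of (A^0, A^1, B_0, B_1); along the circle only dx_1 = dtheta is non-zero."""
    A, B = state[:2], state[2:]
    dA = [-sum(gamma(mu, a, 1, R0) * A[a] for a in range(2)) for mu in range(2)]  # (67)
    dB = [sum(gamma(a, mu, 1, R0) * B[a] for a in range(2)) for mu in range(2)]   # (75)
    return dA + dB


def rk4(state, total_angle, steps=1000):
    h = total_angle / steps
    for _ in range(steps):
        k1 = rates(state)
        k2 = rates([v + 0.5 * h * k for v, k in zip(state, k1)])
        k3 = rates([v + 0.5 * h * k for v, k in zip(state, k2)])
        k4 = rates([v + h * k for v, k in zip(state, k3)])
        state = [v + h / 6 * (p + 2 * q + 2 * u + w)
                 for v, p, q, u, w in zip(state, k1, k2, k3, k4)]
    return state


def phi(state):
    """The scalar A^mu B_mu."""
    return state[0] * state[2] + state[1] * state[3]


start = [1.0, 0.25, 3.0, -1.0]  # A^r, A^theta, B_r, B_theta
end = rk4(start, total_angle=1.3)
print(end)                    # the individual components have changed
print(phi(start), phi(end))   # the scalar agrees to rounding error
```

The components of both vectors change along the path, while the product is preserved, which is exactly the stipulation from which (75) was derived.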

For the co-variant differentiation of tensors of the second and higher ranks we may use the process by which (75) was deduced. Let, for example, ${\displaystyle (A_{\sigma \tau })}$ be a co-variant tensor of the second rank. Then ${\displaystyle A_{\sigma \tau }E^{\sigma }F^{\tau }}$ is a scalar, if ${\displaystyle E}$ and ${\displaystyle F}$ are vectors. This expression must not be changed by the ${\displaystyle \delta }$-displacement; expressing this by a formula, we get, using (67),

 ${\displaystyle \delta A_{\sigma \tau }=\left(\Gamma _{\sigma \rho }^{\alpha }A_{\alpha \tau }+\Gamma _{\tau \rho }^{\alpha }A_{\sigma \alpha }\right)dx_{\rho },}$

whence we get the desired co-variant derivative,

 ${\displaystyle A_{\sigma \tau ;\rho }={\frac {\delta A_{\sigma \tau }}{\delta x_{\rho }}}-\Gamma _{\sigma \rho }^{\alpha }A_{\alpha \tau }-\Gamma _{\tau \rho }^{\alpha }A_{\sigma \alpha }.}$ (78)

In order that the general law of co-variant differentiation of tensors may be clearly seen, we shall write down two co-variant derivatives deduced in an analogous way:

 ${\displaystyle A_{\sigma ;\rho }^{\tau }={\frac {\delta A_{\sigma }^{\tau }}{\delta x_{\rho }}}-\Gamma _{\sigma \rho }^{\alpha }A_{\alpha }^{\tau }+\Gamma _{\alpha \rho }^{\tau }A_{\sigma }^{\alpha }.}$ (79)
 ${\displaystyle A_{;\rho }^{\sigma \tau }={\frac {\delta A^{\sigma \tau }}{\delta x_{\rho }}}+\Gamma _{\alpha \rho }^{\sigma }A^{\alpha \tau }+\Gamma _{\alpha \rho }^{\tau }A^{\sigma \alpha }.}$ (80)
The general law of formation now becomes evident. From these formulæ we shall deduce some others which are of interest for the physical applications of the theory.

In case ${\displaystyle A_{\sigma \tau }}$ is skew-symmetrical, we obtain the tensor

 ${\displaystyle A_{\sigma \tau \rho }={\frac {\delta A_{\sigma \tau }}{\delta x_{\rho }}}+{\frac {\delta A_{\tau \rho }}{\delta x_{\sigma }}}+{\frac {\delta A_{\rho \sigma }}{\delta x_{\tau }}}}$ (81)

which is skew-symmetrical in all pairs of indices; it follows from (78) by cyclic interchange of the indices and addition, the terms in ${\displaystyle \Gamma }$ cancelling in pairs.

If, in (78), we replace ${\displaystyle A_{\sigma \tau }}$ by the fundamental tensor, ${\displaystyle g_{\sigma \tau }}$, then the right-hand side vanishes identically; an analogous statement holds for (80) with respect to ${\displaystyle g^{\sigma \tau }}$; that is, the co-variant derivatives of the fundamental tensor vanish. That this must be so we see directly in the local system of co-ordinates.
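The vanishing of the co-variant derivatives of the fundamental tensor can also be confirmed numerically. This Python sketch assumes, for illustration, a sphere of radius 1.5 with co-ordinates ${\displaystyle (\theta ,\varphi )}$, ${\displaystyle ds^{2}=a^{2}d\theta ^{2}+a^{2}\sin ^{2}\theta \,d\varphi ^{2}}$. It inserts ${\displaystyle g_{\sigma \tau }}$ into (78) and ${\displaystyle g^{\sigma \tau }}$ into (80), using the Christoffel symbols of this metric obtained from (70), and reports the largest residual; the helper names are hypothetical and the derivatives of the metric are central differences.

```python
import math

# Sketch: insert g_{st} into (78) and g^{st} into (80) for an assumed example,
# a sphere of radius RADIUS with co-ordinates (theta, phi); index 0 is theta, 1 is phi.
RADIUS = 1.5


def g_lower(x):
    th = x[0]
    return [[RADIUS ** 2, 0.0], [0.0, (RADIUS * math.sin(th)) ** 2]]


def g_upper(x):
    th = x[0]
    return [[1 / RADIUS ** 2, 0.0], [0.0, 1 / (RADIUS * math.sin(th)) ** 2]]


def gamma(x):
    """G[s][m][n] = Gamma^s_{mn} for this metric, as given by (70)."""
    th = x[0]
    G = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
    G[0][1][1] = -math.sin(th) * math.cos(th)               # Gamma^theta_{phi phi}
    G[1][0][1] = G[1][1][0] = math.cos(th) / math.sin(th)   # Gamma^phi_{theta phi}
    return G


def partial(f, x, rho, h=1e-6):
    """Central difference d f(x)_{st} / d x_rho for a 2x2 matrix-valued f."""
    xp, xm = list(x), list(x)
    xp[rho] += h
    xm[rho] -= h
    fp, fm = f(xp), f(xm)
    return [[(fp[s][t] - fm[s][t]) / (2 * h) for t in range(2)] for s in range(2)]


def max_residuals(x):
    """Largest |g_{st;rho}| from (78) and largest |g^{st}_{;rho}| from (80)."""
    G, gl, gu = gamma(x), g_lower(x), g_upper(x)
    worst_lower = worst_upper = 0.0
    for rho in range(2):
        dgl, dgu = partial(g_lower, x, rho), partial(g_upper, x, rho)
        for s in range(2):
            for t in range(2):
                low = dgl[s][t] - sum(G[a][s][rho] * gl[a][t] + G[a][t][rho] * gl[s][a]
                                      for a in range(2))
                up = dgu[s][t] + sum(G[s][a][rho] * gu[a][t] + G[t][a][rho] * gu[s][a]
                                     for a in range(2))
                worst_lower = max(worst_lower, abs(low))
                worst_upper = max(worst_upper, abs(up))
    return worst_lower, worst_upper


print(max_residuals([0.7, 0.2]))  # both entries vanish up to finite-difference error
```

Since the ${\displaystyle \Gamma }$ used here were themselves obtained from the ${\displaystyle g_{\mu \nu }}$ by (70), the check is one of internal consistency rather than an independent proof.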

In case ${\displaystyle A^{\sigma \tau }}$ is skew-symmetrical, we obtain from (80), by contraction with respect to ${\displaystyle \tau }$ and ${\displaystyle \rho }$,

 ${\displaystyle {\mathfrak {A}}^{\sigma }={\frac {\delta {\mathfrak {A}}^{\sigma \tau }}{\delta x_{\tau }}}}$ (82)

In the general case, from (79) and (80), by contraction with respect to ${\displaystyle \tau }$ and ${\displaystyle \rho }$, we obtain the equations,

 ${\displaystyle {\mathfrak {A}}_{\sigma }={\frac {\delta {\mathfrak {A}}_{\sigma }^{\alpha }}{\delta x_{\alpha }}}-\Gamma _{\sigma \beta }^{\alpha }{\mathfrak {A}}_{\alpha }^{\beta }.}$ (83)
 ${\displaystyle {\mathfrak {A}}^{\sigma }={\frac {\delta {\mathfrak {A}}^{\sigma \alpha }}{\delta x_{\alpha }}}+\Gamma _{\alpha \beta }^{\sigma }{\mathfrak {A}}^{\alpha \beta }.}$ (84)

The Riemann Tensor. If we are given a curve extending from the point ${\displaystyle P}$ to the point ${\displaystyle G}$ of the continuum, then a vector ${\displaystyle A^{\mu }}$ given at ${\displaystyle P}$ may, by a parallel displacement, be moved along the curve to ${\displaystyle G}$. If the continuum is Euclidean (more generally, if by a suitable choice of co-ordinates the ${\displaystyle g_{\mu \nu }}$ are constants), then the vector obtained at ${\displaystyle G}$ as a result of this displacement does not depend upon the choice of the curve joining ${\displaystyle P}$ and ${\displaystyle G}$. Otherwise, the result depends upon the path of the displacement. In this case, therefore, a vector suffers a change ${\displaystyle \Delta A^{\mu }}$ (in its direction, not its magnitude) when it is carried from a point ${\displaystyle P}$ of a closed curve, along the curve, and back to ${\displaystyle P}$. We shall now calculate this vector change:

 ${\displaystyle \Delta A^{\mu }=\oint \delta A^{\mu }.}$

As in Stokes' theorem for the line integral of a vector around a closed curve, this problem may be reduced to the integration around a closed curve with infinitely small linear dimensions; we shall limit ourselves to this case.

We have, first, by (67),

 ${\displaystyle \Delta A^{\mu }=-\oint \Gamma _{\alpha \beta }^{\mu }A^{\alpha }dx_{\beta }.}$

In this, ${\displaystyle \Gamma _{\alpha \beta }^{\mu }}$ is the value of this quantity at the variable point ${\displaystyle G}$ of the path of integration. If we put

 ${\displaystyle \xi ^{\mu }=(x_{\mu })_{G}-(x_{\mu })_{P}}$

and denote the value of ${\displaystyle \Gamma _{\alpha \beta }^{\mu }}$ at ${\displaystyle P}$ by ${\displaystyle {\overline {\Gamma _{\alpha \beta }^{\mu }}}}$, then we have, with sufficient accuracy,

 ${\displaystyle \Gamma _{\alpha \beta }^{\mu }={\overline {\Gamma _{\alpha \beta }^{\mu }}}+{\frac {\delta {\overline {\Gamma _{\alpha \beta }^{\mu }}}}{\delta x_{\nu }}}\xi ^{\nu }.}$

Let, further, ${\displaystyle A^{\alpha }}$ be the value obtained from ${\displaystyle {\overline {A^{\alpha }}}}$ by a parallel displacement along the curve from ${\displaystyle P}$ to ${\displaystyle G}$. It may now easily be proved by means of (67) that ${\displaystyle A^{\mu }-{\overline {A^{\mu }}}}$ is infinitely small of the first order, while, for a curve of infinitely small dimensions of the first order, ${\displaystyle \Delta A^{\mu }}$ is infinitely small of the second order. Therefore there is an error of only the second order if we put

 ${\displaystyle A^{\alpha }={\overline {A^{\alpha }}}-{\overline {\Gamma _{\sigma \tau }^{\alpha }}}\;{\overline {A^{\sigma }}}\,\xi ^{\tau }.}$

If we introduce these values of ${\displaystyle \Gamma _{\alpha \beta }^{\mu }}$ and ${\displaystyle A^{\alpha }}$ into the integral, we obtain, neglecting all small quantities of higher than the second order,

 ${\displaystyle \Delta A^{\mu }=-\left({\frac {\delta \Gamma _{\sigma \beta }^{\mu }}{\delta x_{\alpha }}}-\Gamma _{\rho \beta }^{\mu }\Gamma _{\sigma \alpha }^{\rho }\right)A^{\sigma }\oint \xi ^{\alpha }d\xi ^{\beta }.}$ (85)

The quantity removed from under the sign of integration refers to the point ${\displaystyle P}$. Subtracting ${\displaystyle {\frac {1}{2}}d(\xi ^{\alpha }\xi ^{\beta })}$ from the integrand, we obtain

 ${\displaystyle {\frac {1}{2}}\oint (\xi ^{\alpha }d\xi ^{\beta }-\xi ^{\beta }d\xi ^{\alpha }).}$

This skew-symmetrical tensor of the second rank, ${\displaystyle f^{\alpha \beta }}$, characterizes the surface element bounded by the curve in magnitude and position. If the expression in the brackets in (85) were skew-symmetrical with respect to the indices ${\displaystyle \alpha }$ and ${\displaystyle \beta }$, we could conclude its tensor character from (85). We can accomplish this by interchanging the summation indices ${\displaystyle \alpha }$ and ${\displaystyle \beta }$ in (85) and adding the resulting equation to (85). We obtain

 ${\displaystyle 2\Delta A^{\mu }=-R_{\sigma \alpha \beta }^{\mu }A^{\sigma }f^{\alpha \beta }}$ (86)

in which

 ${\displaystyle R_{\sigma \alpha \beta }^{\mu }=-{\frac {\delta \Gamma _{\sigma \alpha }^{\mu }}{\delta x_{\beta }}}+{\frac {\delta \Gamma _{\sigma \beta }^{\mu }}{\delta x_{\alpha }}}+\Gamma _{\rho \alpha }^{\mu }\Gamma _{\sigma \beta }^{\rho }-\Gamma _{\rho \beta }^{\mu }\Gamma _{\sigma \alpha }^{\rho }.}$ (87)

The tensor character of ${\displaystyle R_{\sigma \alpha \beta }^{\mu }}$ follows from (86); this is the Riemann curvature tensor of the fourth rank, whose properties of symmetry we do not need to go into. Its vanishing is a sufficient condition (disregarding the reality of the chosen co-ordinates) that the continuum is Euclidean.
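The dependence on the path, and the fact that only the direction changes, can be seen in a simple case. The Python sketch below rests on assumptions made for illustration: a sphere of unit radius, the closed curve ${\displaystyle \theta =\theta _{0}=\pi /3}$ (a circle of latitude), fourth-order Runge-Kutta integration, and hypothetical helper names. It carries a vector once around the curve by (67). The curve is not infinitely small, so (86) is not applied directly; the vector should come back with its direction reversed and its length unchanged. A rotation by ${\displaystyle \pi }$ agrees with the known result that, on a surface, the rotation equals the integral of the curvature (here ${\displaystyle 1}$) over the enclosed area ${\displaystyle 2\pi (1-\cos \theta _{0})=\pi }$.

```python
import math

# Sketch: carry A^mu once around the closed curve theta = THETA0 on a unit sphere using
#   dA^mu = -Gamma^mu_{alpha beta} A^alpha dx_beta        (67)
# Index 0 is theta, index 1 is phi. Sphere, curve and names are assumed for illustration.
THETA0 = math.pi / 3


def dA_dphi(A):
    """Right-hand side of (67) along the curve, where only dx_phi differs from zero."""
    s, c = math.sin(THETA0), math.cos(THETA0)
    # Non-vanishing symbols: Gamma^theta_{phi phi} = -s*c, Gamma^phi_{theta phi} = c/s
    return [s * c * A[1],       # -Gamma^theta_{phi phi} A^phi
            -(c / s) * A[0]]    # -Gamma^phi_{theta phi} A^theta


def transport_around_loop(A, steps=4000):
    h = 2 * math.pi / steps
    for _ in range(steps):
        k1 = dA_dphi(A)
        k2 = dA_dphi([a + 0.5 * h * k for a, k in zip(A, k1)])
        k3 = dA_dphi([a + 0.5 * h * k for a, k in zip(A, k2)])
        k4 = dA_dphi([a + h * k for a, k in zip(A, k3)])
        A = [a + h / 6 * (p + 2 * q + 2 * r + w) for a, p, q, r, w in zip(A, k1, k2, k3, k4)]
    return A


def length_squared(A):
    """g_{mu nu} A^mu A^nu on the unit sphere at theta = THETA0."""
    return A[0] ** 2 + (math.sin(THETA0) * A[1]) ** 2


start = [1.0, 0.0]
end = transport_around_loop(start)
print(end)                                    # close to [-1, 0]: direction reversed
print(length_squared(start), length_squared(end))  # magnitude unchanged
```

In the Euclidean plane the same loop would return the vector unchanged, in accordance with the vanishing of ${\displaystyle R_{\sigma \alpha \beta }^{\mu }}$ there.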

By contraction of the Riemann tensor with respect to the indices ${\displaystyle \mu }$, ${\displaystyle \beta }$, we obtain the symmetrical tensor of the second rank,

 ${\displaystyle R_{\mu \nu }=-{\frac {\delta \Gamma _{\mu \nu }^{\alpha }}{\delta x_{\alpha }}}+\Gamma _{\mu \beta }^{\alpha }\Gamma _{\nu \alpha }^{\beta }+{\frac {\delta \Gamma _{\mu \alpha }^{\alpha }}{\delta x_{\nu }}}-\Gamma _{\mu \nu }^{\alpha }\Gamma _{\alpha \beta }^{\beta }.}$ (88)

The last two terms vanish if the system of co-ordinates is so chosen that ${\displaystyle g}$ = constant, since by (72) every ${\displaystyle \Gamma _{\mu \alpha }^{\alpha }}$ is then zero. From ${\displaystyle R_{\mu \nu }}$ we can form the scalar,

 ${\displaystyle R=g^{\mu \nu }R_{\mu \nu }.}$ (89)
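For a concrete case, the Python sketch below evaluates (87), (88) and (89) for an assumed example, a sphere of radius ${\displaystyle a=2}$ in co-ordinates ${\displaystyle (\theta ,\varphi )}$, with the Christoffel symbols entered from (70) and their derivatives taken by central differences; the helper names are hypothetical. One should find ${\displaystyle R_{\varphi \theta \varphi }^{\theta }=\sin ^{2}\theta }$, ${\displaystyle R_{\theta \theta }=-1}$, ${\displaystyle R_{\varphi \varphi }=-\sin ^{2}\theta }$ and ${\displaystyle R=-2/a^{2}}$. The negative sign of ${\displaystyle R}$ comes from the contraction over ${\displaystyle \mu }$ and ${\displaystyle \beta }$ in (88); contracting the first and third indices of (87) instead gives the opposite sign, ${\displaystyle +2/a^{2}}$.

```python
import math

# Sketch: Riemann tensor (87), its contraction (88) and the scalar (89) for an assumed
# example, a sphere of radius RADIUS with co-ordinates (theta, phi).
# Index 0 is theta, index 1 is phi; derivatives of Gamma use central differences.
RADIUS = 2.0
DIM = 2


def gamma(x):
    """G[s][m][n] = Gamma^s_{mn} of the sphere, from (70)."""
    th = x[0]
    G = [[[0.0] * DIM for _ in range(DIM)] for _ in range(DIM)]
    G[0][1][1] = -math.sin(th) * math.cos(th)               # Gamma^theta_{phi phi}
    G[1][0][1] = G[1][1][0] = math.cos(th) / math.sin(th)   # Gamma^phi_{theta phi}
    return G


def d_gamma(x, nu, h=1e-5):
    """dGamma^s_{mn} / dx_nu by central differences."""
    xp, xm = list(x), list(x)
    xp[nu] += h
    xm[nu] -= h
    Gp, Gm = gamma(xp), gamma(xm)
    return [[[(Gp[s][m][n] - Gm[s][m][n]) / (2 * h) for n in range(DIM)]
             for m in range(DIM)] for s in range(DIM)]


def riemann(x):
    """R[mu][s][a][b] = R^mu_{s a b} as defined in (87)."""
    G = gamma(x)
    dG = [d_gamma(x, nu) for nu in range(DIM)]  # dG[nu][s][m][n]
    R = [[[[0.0] * DIM for _ in range(DIM)] for _ in range(DIM)] for _ in range(DIM)]
    for mu in range(DIM):
        for s in range(DIM):
            for a in range(DIM):
                for b in range(DIM):
                    value = -dG[b][mu][s][a] + dG[a][mu][s][b]
                    value += sum(G[mu][r][a] * G[r][s][b] - G[mu][r][b] * G[r][s][a]
                                 for r in range(DIM))
                    R[mu][s][a][b] = value
    return R


def ricci_and_scalar(x):
    R = riemann(x)
    # (88): contract the upper index mu with the last lower index beta
    ric = [[sum(R[b][m][n][b] for b in range(DIM)) for n in range(DIM)] for m in range(DIM)]
    th = x[0]
    g_inv = [[1 / RADIUS ** 2, 0.0], [0.0, 1 / (RADIUS * math.sin(th)) ** 2]]
    scalar = sum(g_inv[m][n] * ric[m][n] for m in range(DIM) for n in range(DIM))  # (89)
    return ric, scalar


x = [0.9, 0.4]
R = riemann(x)
ric, scalar = ricci_and_scalar(x)
print(R[0][1][0][1], math.sin(x[0]) ** 2)  # R^theta_{phi theta phi} against sin^2(theta)
print(ric[0][0], scalar)                    # compare with -1 and -2/a^2
```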

Straightest (Geodetic) Lines. A line may be constructed in such a way that its successive elements arise from each other by parallel displacements. This is the natural generalization of the straight line of the Euclidean geometry. For such a line, we have

 ${\displaystyle \delta \left({\frac {dx_{\mu }}{ds}}\right)=-\Gamma _{\alpha \beta }^{\mu }{\frac {dx_{\alpha }}{ds}}dx_{\beta }.}$

The left-hand side is to be replaced by ${\displaystyle {\frac {d^{2}x_{\mu }}{ds^{2}}}ds}$,[4] so that we have

 ${\displaystyle {\frac {d^{2}x_{\mu }}{ds^{2}}}=-\Gamma _{\alpha \beta }^{\mu }{\frac {dx_{\alpha }}{ds}}{\frac {dx_{\beta }}{ds}}.}$ (90)

We get the same line if we find the line which gives a stationary value to the integral

 ${\displaystyle \int ds{\text{ or }}\int {\sqrt {g_{\mu \nu }dx_{\mu }dx_{\nu }}}}$

between two points (geodetic line).
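The geodetic-line equation (90) can be integrated numerically. In this Python sketch the continuum is assumed to be the Euclidean plane in polar co-ordinates, so that the straightest line must be an ordinary straight line; the starting data, the Runge-Kutta step and the helper names are assumptions for illustration. Starting at the Cartesian point (1, 0) with unit speed in the direction of increasing y, the integrated line should reach (1, 1) at s = 1.

```python
import math

# Sketch: integrate (90), d^2 x_mu/ds^2 = -Gamma^mu_{alpha beta} (dx_alpha/ds)(dx_beta/ds),
# for the Euclidean plane in polar co-ordinates (r, theta). Example data are assumptions.


def acceleration(r, dr, dtheta):
    """Right-hand side of (90) with Gamma^r_{theta theta} = -r, Gamma^theta_{r theta} = 1/r."""
    d2r = r * dtheta ** 2                 # -Gamma^r_{theta theta} (dtheta/ds)^2
    d2theta = -2.0 * dr * dtheta / r      # -2 Gamma^theta_{r theta} (dr/ds)(dtheta/ds)
    return d2r, d2theta


def derivs(state):
    r, theta, dr, dtheta = state
    d2r, d2theta = acceleration(r, dr, dtheta)
    return [dr, dtheta, d2r, d2theta]


def integrate(state, s_total, steps=2000):
    h = s_total / steps
    for _ in range(steps):
        k1 = derivs(state)
        k2 = derivs([v + 0.5 * h * k for v, k in zip(state, k1)])
        k3 = derivs([v + 0.5 * h * k for v, k in zip(state, k2)])
        k4 = derivs([v + h * k for v, k in zip(state, k3)])
        state = [v + h / 6 * (p + 2 * q + 2 * u + w)
                 for v, p, q, u, w in zip(state, k1, k2, k3, k4)]
    return state


# Start at the Cartesian point (1, 0) moving with unit speed in the +y direction:
# r = 1, theta = 0, dr/ds = 0, dtheta/ds = 1.
r, theta, _, _ = integrate([1.0, 0.0, 0.0, 1.0], s_total=1.0)
x, y = r * math.cos(theta), r * math.sin(theta)
print(x, y)  # close to (1, 1): the line x = 1 traversed with unit speed
```

Although ${\displaystyle r}$ and ${\displaystyle \theta }$ both vary non-linearly along the path, the Cartesian image is the straight line ${\displaystyle x=1}$, as the interpretation of (90) as the straightest line requires.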

1. These considerations assume that the behaviour of rods and clocks depends only upon velocities, and not upon accelerations, or, at least, that the influence of acceleration does not counteract that of velocity.
2. If we multiply (64) by ${\displaystyle {\frac {\delta x_{\alpha }'}{\delta x_{\beta }}}}$, sum over the ${\displaystyle \beta }$, and replace the ${\displaystyle d\xi ^{\mu }}$ by a transformation to the accented system, we obtain
 ${\displaystyle dx_{\alpha }'={\frac {\delta x_{\sigma }'}{\delta x_{\mu }}}{\frac {\delta x_{\alpha }'}{\delta x_{\beta }}}g^{\mu \beta }d\xi _{\sigma }'.}$

The statement made above follows from this, since, by (64), we must also have ${\displaystyle dx_{\alpha }'={g^{\sigma \alpha }}'d\xi _{\sigma }'}$, and both equations must hold for every choice of the ${\displaystyle d\xi _{\sigma }'}$.

3. This expression is justified, in that ${\displaystyle A^{\mu }{\sqrt {g}}dx={\mathfrak {A}}^{\mu }dx}$ has a tensor character. Every tensor, when multiplied by ${\displaystyle {\sqrt {g}}}$, changes into a tensor density. We employ capital Gothic letters for tensor densities.
4. The direction vector at a neighbouring point of the curve results, by a parallel displacement along the line element ${\displaystyle (dx_{\beta })}$, from the direction vector at each point considered.