Einstein based his theory of special relativity on two fundamental postulates. First, all physical laws are the same in all inertial frames of reference, regardless of their relative state of motion; and second, the speed of light in free space has the same value in all inertial frames of reference, again regardless of the relative velocity of the frames. The Lorentz transformation is ultimately a direct consequence of this second postulate.
== The second postulate ==
Assume the second postulate of special relativity, stating the constancy of the speed of light independently of the reference frame, and consider a collection of reference systems moving with respect to each other with constant velocity, i.e. inertial systems, each endowed with its own set of Cartesian coordinates labeling the points, i.e. events, of spacetime. To express the invariance of the speed of light in mathematical form, fix two events in spacetime, to be recorded in each reference frame. Let the first event be the emission of a light signal and the second event be its absorption. Pick any reference frame in the collection. In its coordinates, the first event will be assigned coordinates x_1, y_1, z_1, ct_1, and the second x_2, y_2, z_2, ct_2. The spatial distance between emission and absorption is \sqrt{(x_2-x_1)^2+(y_2-y_1)^2+(z_2-z_1)^2}, but this is also the distance c(t_2-t_1) traveled by the signal. One may therefore set up the equation c^2(t_2 - t_1)^2 - (x_2 - x_1)^2 - (y_2 - y_1)^2 - (z_2 - z_1)^2 = 0. Every other coordinate system will record, in its own coordinates, the same equation. This is the immediate mathematical consequence of the invariance of the speed of light. The quantity on the left is called the spacetime interval. For events separated by light signals, the interval is the same (zero) in all reference frames, and it is therefore called invariant.
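This vanishing of the interval for light signals can be illustrated numerically. The following Python sketch is not part of the original derivation: it assumes units with c = 1, a boost along the x-axis, and hypothetical helper names `boost_x` and `interval`. It checks that an emission/absorption pair separated by a light signal has zero interval both in the original frame and in a boosted one.

```python
import math

def boost_x(v, event):
    """Lorentz boost along the x-axis with speed v (units where c = 1)."""
    t, x, y, z = event
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return (gamma * (t - v * x), gamma * (x - v * t), y, z)

def interval(e1, e2):
    """Spacetime interval c^2(t2-t1)^2 - (x2-x1)^2 - (y2-y1)^2 - (z2-z1)^2."""
    dt, dx, dy, dz = (b - a for a, b in zip(e1, e2))
    return dt * dt - dx * dx - dy * dy - dz * dz

# Emission at the origin; absorption after the light covers a spatial
# distance of 5 (3 along x, 4 along y), hence at time t = 5 when c = 1.
emission = (0.0, 0.0, 0.0, 0.0)    # (t, x, y, z)
absorption = (5.0, 3.0, 4.0, 0.0)

s2 = interval(emission, absorption)            # lightlike: zero interval
s2_boosted = interval(boost_x(0.6, emission), boost_x(0.6, absorption))
```

Both `s2` and `s2_boosted` come out (numerically) zero, matching the equation above in each frame.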
== Invariance of interval ==
For the Lorentz transformation to have the physical significance realized by nature, it is crucial that the interval is an invariant quantity for any two events, not just for those separated by light signals. To establish this, one considers an infinitesimal interval, ds^2 = c^2 dt^2 - dx^2 - dy^2 - dz^2, as recorded in a system K. Let K' be another system assigning the interval ds'^2 to the same two infinitesimally separated events. If ds^2 = 0, then the interval is also zero in any other system (second postulate), and since ds^2 and ds'^2 are infinitesimals of the same order, they must be proportional to each other: ds^2 = a\, ds'^2. On what may a depend? It may not depend on the positions of the two events in spacetime, because that would violate the postulated homogeneity of spacetime. It might depend on the relative velocity \textbf{V} between K and K', but only on the speed, not on the direction, because dependence on the direction would violate the isotropy of space. Now bring in two further systems K_1 and K_2, with V_1 and V_2 their speeds relative to K and V_{12} the speed of K_2 relative to K_1: ds^2 = a(V_1)ds_1^2, \quad ds^2 = a(V_2)ds_2^2, \quad ds_1^2 = a(V_{12})ds_2^2. From these it follows that \frac{a(V_2)}{a(V_1)} = a(V_{12}). The right-hand side depends on V_{12}, which is determined not only by the speeds V_1 and V_2 but also by the angle between the vectors \textbf{V}_1 and \textbf{V}_2; the left-hand side, however, does not depend on this angle. The only way the equation can hold is if the function a(V) is a constant, and by the same equation that constant is unity. Thus, ds^2 = ds'^2 for all systems K'. Since this holds for all infinitesimal intervals, it holds for all finite intervals as well. Most, if not all, derivations of the Lorentz transformations take this result for granted and use only the constancy of the speed of light (invariance for light-like separated events); it is this result that ensures that the Lorentz transformation is the correct transformation.
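As a numerical illustration (not part of the original text), the invariance ds^2 = ds'^2 can be spot-checked for arbitrary event pairs, not only lightlike ones. The sketch below assumes units with c = 1 and a boost along the x-axis; `boost_x`, `interval`, and the sampling ranges are illustrative choices, not anything prescribed by the derivation.

```python
import math
import random

def boost_x(v, event):
    """Lorentz boost along the x-axis with speed v, in units where c = 1."""
    t, x, y, z = event
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return (gamma * (t - v * x), gamma * (x - v * t), y, z)

def interval(e1, e2):
    """s^2 = dt^2 - dx^2 - dy^2 - dz^2 between two events (c = 1)."""
    dt, dx, dy, dz = (b - a for a, b in zip(e1, e2))
    return dt * dt - dx * dx - dy * dy - dz * dz

# Random event pairs: timelike, spacelike and (nearly) lightlike separations
# should all give the same s^2 in the boosted frame.
random.seed(0)
max_dev = 0.0
for _ in range(1000):
    e1 = tuple(random.uniform(-10.0, 10.0) for _ in range(4))
    e2 = tuple(random.uniform(-10.0, 10.0) for _ in range(4))
    v = random.uniform(-0.9, 0.9)
    dev = abs(interval(e1, e2) - interval(boost_x(v, e1), boost_x(v, e2)))
    max_dev = max(max_dev, dev)
```

Up to floating-point rounding, `max_dev` stays at zero, which is exactly the claim that the interval is invariant for any two events.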
== Rigorous statement and proof of proportionality of ds^2 and ds'^2 ==
Theorem: Let n, p \geq 1 be integers, d := n+p, and V a vector space over \Reals of dimension d. Let h be an indefinite inner product on V with signature type (n,p). Suppose g is a symmetric bilinear form on V such that the null set of the quadratic form associated with h is contained in that of g (i.e. suppose that for every v \in V, if h(v,v) = 0 then g(v,v) = 0). Then there exists a constant C\in\Reals such that g = Ch. Furthermore, if we assume n\neq p and that g also has signature type (n,p), then C>0. {{Hidden| title =
Remarks. | content = • In the section above, the term "infinitesimal" in relation to ds^2 actually refers (pointwise) to a quadratic form on a four-dimensional real vector space (namely the tangent space at a point of the spacetime manifold). The argument above follows Landau and Lifshitz almost verbatim; there, the proportionality of ds^2 and ds'^2 is merely stated as an "obvious" fact, and the statement is neither formulated in a mathematically precise fashion nor proven. It is a non-obvious mathematical fact which needs to be justified; fortunately, the proof is relatively simple and amounts to basic algebraic observations and manipulations. • The above assumptions on h mean the following: h:V\times V\to\Reals is a bilinear form which is symmetric and non-degenerate, such that there exists an ordered basis \{v_1,\dots, v_n,v_{n+1},\dots, v_d\} of V for which h(v_a,v_b) = \begin{cases} -1 & \text{if } a = b, \text{where } a,b \in \{1,\dots, n\}\\ 1 & \text{if } a = b, \text{where } a,b \in \{n+1,\dots, d\}\\ 0&\text{ otherwise} \end{cases} An equivalent way of saying this is that h has the matrix representation \begin{pmatrix} -I_n & 0 \\0 & I_p\end{pmatrix} relative to the ordered basis \{v_1,\dots, v_d\}. • If we consider the special case n = 1, p = 3, then we are dealing with Lorentzian signature in four dimensions, which is what relativity is based on (one could also adopt the opposite convention with an overall minus sign, but this clearly does not affect the truth of the theorem). In this case, if we assume that g and h have quadratic forms with the same null set (in physics terminology, that g and h give rise to the same light cone), then the theorem tells us that there is a constant C>0 such that g = Ch. Modulo some differences in notation, this is precisely what was used in the section above. }} {{Hidden| title =
Proof of Theorem (index notation) | content = For convenience, let us agree in this proof that Greek indices like \alpha,\beta range over \{1,\dots, n\} while Latin indices like i,j range over \{n+1,\dots, d\}. Also, we shall use the Einstein summation convention throughout. Fix a basis \{v_1,\dots, v_d\} of V relative to which h has the matrix representation [h]= \begin{pmatrix} -I_n&0\\ 0&I_p \end{pmatrix}. Also, for each x=(x^1,\dots, x^n)\in \Reals^n and y=(y^{n+1},\dots, y^{n+p})\in\Reals^p having unit Euclidean norm, consider the vector w=x^{\alpha}v_{\alpha}+y^i v_i\in V. Then, by bilinearity, we have h(w,w)=-\lVert x\rVert^2+\lVert y\rVert^2=-1+1=0, hence by our assumption we have g(w,w)=0 as well. Using bilinearity and symmetry of g, this is equivalent to g_{\alpha\beta}x^{\alpha}x^{\beta} + 2g_{\alpha i}x^{\alpha}y^i + g_{ij} y^i y^j = 0. Since this is true for all x,y of unit norm, we can replace y with -y to get g_{\alpha\beta}x^{\alpha}x^{\beta} - 2g_{\alpha i}x^{\alpha}y^i + g_{ij} y^i y^j = 0. Subtracting these two equations and dividing by 4 shows that for all x,y of unit norm, g_{\alpha i}x^{\alpha}y^i = 0. So, by choosing x = e_{\alpha} \in \Reals^n and y = e_i \in \Reals^p (i.e. with 1 in the specified index and 0 elsewhere), we see that g_{\alpha i}=0. As a result, our first equation simplifies to g_{\alpha\beta}x^{\alpha}x^{\beta}=-g_{ij}y^iy^j. This is once again true for all x\in\Reals^n and y\in\Reals^p of unit norm. As a result, all the off-diagonal terms vanish; in more detail, suppose \alpha,\beta\in\{1,\dots, n\} are distinct indices and consider x_{\pm}=\frac{1}{\sqrt{2}}(e_{\alpha}\pm e_{\beta}). Since the right-hand side of the equation does not depend on x, substituting x_+ and x_- yields g_{\alpha\beta}=-g_{\alpha\beta}, hence g_{\alpha\beta}=0. By an almost identical argument, we deduce that if i,j\in\{n+1,\dots, n+p\} are distinct indices then g_{ij}=0. Finally, by successively letting x range over e_1,\dots, e_n\in\Reals^n and then letting y range over e_1,\dots, e_p\in\Reals^p, we see that -g_{11}=\dots = -g_{nn}=g_{n+1,n+1}=\dots = g_{n+p,n+p}; in other words, g has the matrix representation [g]=-g_{11}\cdot \begin{pmatrix} -I_n& 0\\ 0 & I_p \end{pmatrix}, which is equivalent to saying g=-g_{11}\cdot h. So, the constant of proportionality claimed in the theorem is C=-g_{11}.
Finally, if we assume that g,h both have signature types (n,p) and n\neq p, then C := -g_{11}>0. (We cannot have C = 0, because that would mean g = 0, which is impossible since having signature type (n,p) means g is a non-zero bilinear form. Also, if C < 0, then g has n positive diagonal entries and p negative diagonal entries, i.e. it is of signature type (p,n)\neq (n,p) since we assumed n\neq p, so this is also not possible. This leaves C > 0 as the only option.) This completes the proof of the theorem. }} {{Hidden| title =
Proof of Theorem (conceptual) | content = Fix a basis \{v_1,\dots, v_d\} of V relative to which h has the matrix representation [h]= \begin{pmatrix} -I_n&0\\ 0&I_p \end{pmatrix}. The point is that the vector space V can be decomposed into subspaces V^- (the span of the first n basis vectors) and V^+ (the span of the other p basis vectors) such that each vector in V can be written uniquely as v + w for v \in V^- and w \in V^+; moreover h(v,v) \leq 0, h(w,w) \geq 0 and h(v,w) = 0. So, by bilinearity, h(v+w,v+w) = h(v,v) + h(w,w). Since the first summand on the right is non-positive and the second is non-negative, for any v \in V^- and nonzero w \in V^+ we can find a scalar \alpha such that h(v + \alpha w, v + \alpha w) = 0. From now on, always consider v \in V^- and w \in V^+. By bilinearity, \begin{align} g(v+w,v+w) &= g(v,v) + g(w,w) + 2g(v,w) \\ g(v-w,v-w) &= g(v,v) + g(w,w) - 2g(v,w) \end{align} If h(v+w,v+w) = 0, then also h(v-w, v-w) = 0, and the same is true for g (since the null set of h is contained in that of g). In that case, subtracting the two expressions above (and dividing by 4) yields g(v, w) = 0. As above, for each v \in V^- and nonzero w \in V^+, there is a scalar \alpha such that h(v + \alpha w, v + \alpha w) = 0, so g(v, \alpha w) = 0, which by bilinearity means g(v,w) = 0. Now consider nonzero v, v' \in V^- such that h(v, v) = h(v', v'). We can find w \in V^+ such that 0 = h(v + w, v + w) = h(v,v) + h(w,w) = h(v' + w, v' + w). By the expressions above, g(v,v) = -g(w,w) = g(v', v'). Analogously, for w, w' \in V^+, one can show that if h(w,w)=h(w',w'), then also g(w,w)=g(w',w'). Combined with the scaling property g(\lambda u, \lambda u) = \lambda^2 g(u,u), this shows that there are constants C^- and C^+ with g(v,v) = C^- h(v,v) for all v \in V^- and g(w,w) = C^+ h(w,w) for all w \in V^+; and the relation g(v,v) = -g(w,w) whenever h(v,v) = -h(w,w) \neq 0 forces C^- = C^+ =: C. For a general u = v + w, we then have g(u,u) = g(v,v) + g(w,w) = C\big(h(v,v) + h(w,w)\big) = Ch(u,u), and since the quadratic forms of g and Ch agree, polarization gives g = Ch.
Finally, if we assume that g,h both have signature types (n,p) and n\neq p, then C > 0. (We cannot have C = 0, because that would mean g = 0, which is impossible since having signature type (n,p) means g is a non-zero bilinear form. Also, if C < 0, then g has n positive diagonal entries and p negative diagonal entries, i.e. it is of signature type (p,n)\neq (n,p) since we assumed n\neq p, so this is also not possible. This leaves C > 0 as the only option.) This completes the proof of the theorem. }} {{Hidden| title =
Proof of Theorem (conceptual and broken down) | content = By Sylvester's law of inertia, we can fix a basis \{v_1,\dots, v_d\} of V relative to which h has the matrix representation [h]= \begin{pmatrix} -I_n&0\\ 0&I_p \end{pmatrix}. The point is that the vector space V can be decomposed into subspaces V^- (the span of the first n basis vectors) and V^+ (the span of the other p basis vectors) such that each vector in V can be written uniquely as v + w for v \in V^- and w \in V^+; moreover h(v,v) \leq 0, h(w,w) \geq 0 and h(v,w) = 0. We will write h(u) for h(u,u) from now on.
Lemma: There exists a constant C\in\Reals such that for any v \in V^- and w \in V^+, writing u = v+w: (a) g(v,w) = 0; (b) g(u) = Ch(u). {{Hidden| title =
Proof of Lemma | content = Assume v and w are nonzero; the case where one of them vanishes follows immediately from the result for nonzero vectors.
1. Let a = \sqrt{-h(v)} and b = \sqrt{h(w)}; both are positive.
By bilinearity:
2. h(bv+aw) = h(bv)+h(aw) = b^2h(v)+a^2h(w) = 0
3. h(bv-aw) = h(bv)+h(-aw) = b^2h(v)+a^2h(w) = 0
Since the null set of h is contained in that of g:
4. 0 = g(bv+aw) = b^2g(v)+2ba\,g(v,w)+a^2g(w)
5. 0 = g(bv-aw) = b^2g(v)-2ba\,g(v,w)+a^2g(w)
Subtracting and adding 4 and 5 (up to overall factors) gives 6 and 7, respectively:
6. g(bv,aw) = 0
7. b^2g(v)+a^2g(w) = 0
By 6, g(v,w) = 0, proving (a). By 7 and 2, b^2g(v) = -a^2g(w) and b^2h(v) = -a^2h(w),
8. so \frac{g(v)}{h(v)} = \frac{g(w)}{h(w)}.
Keeping v fixed and varying w, we see that this ratio does not depend on w; similarly, it does not depend on v. Call this ratio C. Then g(u) = g(v) + g(w) = Ch(v) + Ch(w) = Ch(u), where the first equation follows from (a). This proves (b). \quad\square }} For all u,u'\in V, we have g(u,u') = \frac{g(u+u')-g(u-u')}{4} = \frac{Ch(u+u')-Ch(u-u')}{4} = Ch(u,u'), where the first and last equations follow from bilinearity, and the middle equation follows from part (b) of the Lemma. So g = Ch. \quad \square }}

== Standard configuration ==