
PROBABILITY

Volume 18 · 1860 Edition

The doctrine of probability is an extensive and very important branch of mathematical science, the object of which is to reduce to calculation the reasons which we have for believing or expecting any contingent event, or for assenting to any conclusion which is not necessarily true. When it is considered that the whole edifice of human science, with the exception of a few self-evident truths, such as the axioms of geometry, is nothing more than an assemblage of propositions which can only be pronounced to be more or less probable, the importance of a calculus which enables us to appreciate exactly the degree of probability existing in each case, will be readily understood.

Our reasons for judging an event to be probable or improbable, are derived from two distinct sources; first, an *a priori* knowledge of the causes or circumstances which determine its occurrence; and, secondly, when the causes are unknown, experience of what has already happened in the same circumstances, or in circumstances apparently similar. Suppose, for example, a hundred white balls to be placed in an urn along with fifty black balls, and that a person, blindfold, proceeds to draw a ball, there is to us, who are acquainted with the contents of the urn, a determinate probability that the ball which is drawn will be white. The balls being supposed to be all in precisely the same circumstances with respect to facility of drawing, we assume that there is the same chance of drawing any one ball as of drawing any other; and, consequently, since there are two white balls for each black ball, and therefore two chances of drawing one of the first colour for each chance of drawing one of the second, we conclude the event which consists in the drawing of a white ball to be twice as probable as the opposite event, or the drawing of a black ball. In this case our knowledge of the contents of the urn enables us to judge of the probable result of the drawing. Suppose, however, that antecedently to the drawing, we were entirely ignorant of the contents of the urn, but that after a great number of trials have been made, (the ball drawn being always replaced in the urn after each trial, in order that the circumstances may be the same in all the trials) it has been observed that a white ball has been drawn twice as often as a black ball, we presume that the urn contains twice as many white as black balls, and consequently affords twice as many chances of drawing a white ball as of drawing a black; and this presumption becomes stronger in proportion to the number of instances included in the observation. 
In this case experience makes up for the want of *a priori* knowledge, and affords a measure of the probability of the result of a future trial.

It is only in a comparatively small number of cases that all the possible ways in which an event may happen are known *a priori*, and in which, consequently, the ratio of the number of chances favouring the event to the whole number of existing chances is determinate. In fact, most of the questions of this class to which the calculus can be applied, are connected with lotteries and games of hazard. The results obtained from the analysis of such questions cannot be considered as being of any great value in themselves, but they frequently throw light on subjects of far higher importance which present analogous combinations. It is true that the mathematical theory comes in aid of moral considerations, and demonstrates the ruinous tendency of gambling even when the conditions of the play are equal, mathematically speaking; but, unfortunately, those who indulge a passion for this vice are seldom capable of appreciating the force of such arguments. The principal advantage which has resulted from the application of analysis to games of chance is the extension and improvement of the calculus to which it has led.

The calculation of the probabilities of events, the chances of which are not known *a priori*, but inferred from experience, is founded on the presumed constancy of the laws of nature, in obedience to which events depending on constant though unknown causes, are always reproduced in the same order when considered in large numbers. Among the various phenomena of the physical and moral world, nothing is more remarkable than the constancy which is observed to prevail in the recurrence of events of the same kind. The ratio of male to female births furnishes a noted instance. If we consider only a small number of births, nothing can be more uncertain than the result; but taking a very large number, as those of a whole kingdom in the course of a year, the proportion of males to females is found to be almost invariable, and nearly as 21 to 20. The mean duration of human life affords another familiar example. Notwithstanding the proverbial uncertainty of life, the differences of constitutions, and the various accidents to which mankind are exposed, the average duration of the lives of a large number of individuals living in the same country is always found to be very nearly the same, insomuch that pecuniary risks depending on it, if undertaken in sufficiently large numbers, are among the least uncertain of all commercial speculations. A similar constancy is remarked in the results of statistical inquiries of every kind. The number of crimes of the same species committed in a year, the ratio of the number of acquittals to the number of trials, the number of conflagrations, of ships lost in a particular trade, of letters which pass through the post-office, of patients admitted into the public hospitals; in every case the numbers in a given time are observed to fluctuate between very narrow limits, and to approach nearer and nearer, as the observation is more extended, to fixed mean values.

This constant approximation to fixed ratios, which is proved by all experience, in the recurrence of events of the same kind, enables us to apply the calculus of probabilities to many of the most interesting questions connected with our social and political institutions; and to determine the average result of a series of coming events with as much precision as if their chances were determinate, and known *a priori*, like that of obtaining a given point with the throw of a die. Whatever be the nature of the phenomenon under consideration, whether it belong to the physical or moral order of things, the calculus is equally applicable when the requisite data have been determined from experience.

The foundations of the mathematical theory of probabilities were laid by Pascal and Fermat about the middle of the 17th century. Among some other questions relating to chances, the following was proposed to Pascal. "Two persons sit down to play on the condition that the one who first gains three games shall be the winner of the stakes. The first having gained two games, and the second one, they agree to leave off and divide the stakes in proportion to their respective probabilities of winning; what share is each entitled to take?" Pascal solved the question, but by a method which was applicable only to the particular case. Fermat, to whom it was communicated by Pascal, employed the direct and general method of combinations, and gave a solution which could be applied to the case of any number of players. His reasoning, however, did not at first appear to Pascal to be satisfactory, and a correspondence on the subject took place between these two illustrious geometers, which is preserved in their respective works, and throws some light on the history of mathematics in that age.

About the same period Huygens composed his tract *De Ratiociniis in Ludo Aleae*, which was first published in the *Exercitationes Geometricae* of Schooten in 1658. This was the first systematic treatise which appeared on the doctrine of chances. It contained an analysis of the various questions which had been solved by Pascal and Fermat, and at the end five new questions were proposed, the solutions of which, simple as they may now appear, were then attended with considerable difficulty. The analysis of two of them was in fact given for the first time by Montmort, half a century after their publication. Huygens's tract was translated into English and published in 1692, with some additional remarks relative to the advantage of the banker in the game of Pharaoh, in an *Essay on the Laws of Chance*, edited and supposed to have been written by Motte, then Secretary of the Royal Society.

James Bernoulli appears to have been the first who perceived that the theory of probability may be applied to much more important purposes than to regulate the stakes and expectations of gamblers, and that the phenomena, both of the moral and physical world, anomalous and irregular as they appear when viewed in detail, exhibit, when considered in large numbers, a constancy of succession which renders their occurrence capable of being submitted to numerical estimation. The *Ars Conjectandi*, published in 1713, seven years after the death of the author, contains a number of interesting questions relative to combinations and infinite series; but the most remarkable result which it contains is a theorem respecting the indefinite repetition of events, which may be said to form the basis of all the higher applications of the theory. It consists in this, that if a series of trials be instituted respecting an event which must either happen or fail in each trial, the probability becomes greater and greater, as the number of trials is increased, that the ratio of the number of times it happens, to the whole number of trials, will be equal to its *a priori* probability in a single trial; and that the number of trials may be made so great as to give a probability, approaching as nearly to certainty as we please, that the difference between the ratio of its occurrences to the number of trials, and the fraction which measures its *a priori* probability, will be less than any assigned quantity. Bernoulli informs us, that the solution of this important theorem had engaged his attention during a period of twenty years.
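Bernoulli's theorem may be illustrated by a modern computational sketch. The single-trial probability 0.6 and the seed below are arbitrary choices for the illustration, not quantities from the text; the point shown is only that the ratio of occurrences to trials settles toward the *a priori* probability as the trials multiply.

```python
import random

def observed_ratio(p, n, seed=1860):
    """Ratio of the number of occurrences of an event of single-trial
    chance p to the whole number n of trials."""
    rng = random.Random(seed)
    return sum(rng.random() < p for _ in range(n)) / n

# As the number of trials grows, the ratio approaches p = 0.6.
for n in (100, 10_000, 1_000_000):
    print(n, observed_ratio(0.6, n))
```

With a million trials the observed ratio rarely strays from 0.6 by more than a few parts in a thousand, which is the practical content of the theorem.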

In the interval between the death of Bernoulli and the appearance of the *Ars Conjectandi*, Montmort published his *Essai d'Analyse sur les Jeux de Hazard*. The first edition was in 1708; the second, which is considerably extended, and enriched by several letters of John and Nicolas Bernoulli, appeared in 1713. The work possesses considerable merit; but being chiefly confined to the examination of the conditions of games of chance, many of which are now forgotten, it has lost much of its original interest.

About the same time, De Moivre began to turn his attention to the subject of probability, and his labours, which were continued during a long life, contributed greatly to the advancement of the general theory, as well as the extension of some of its most interesting applications. De Moivre's first publication on the subject was a Latin memoir *De Mensura Sortis*, in the *Transactions of the Royal Society* for 1711. His *Essay on the Doctrine of Chances* first appeared in 1718; a second edition in 1738; but the third and most valuable, including also his *Treatise on Annuities on Lives*, is dated 1756. This work contains a great variety of questions relating to chances, solved with much clearness and elegance; but it is chiefly remarkable for the theory of recurring series, there given for the first time, which is of important use in investigations of this kind, and is in fact equivalent to the methods employed in the modern calculus for the integration of equations of finite differences having constant co-efficients. Of the particular results obtained by De Moivre, one of the most important in reference to theory, is an extension of the theorem of James Bernoulli, above mentioned. It follows from Bernoulli's theorem, that if we have a given probability that the ratio of the number of occurrences of an event to the whole number of trials, will approach to the *a priori* probability of the event within certain given limits, those limits will become narrower and narrower, as the number of trials is multiplied; but in order to complete the theorem, it is necessary to assign the numerical value of the probability that in a large number of future trials, the number of occurrences will fall within assigned limits. 
For this purpose we must find the product of the natural numbers 1, 2, 3, 4, &c., up to the number of trials; an operation which, if attempted by direct multiplication, becomes very laborious, even when the number of trials is inconsiderable, and when the number is great, as 10,000 for example, is altogether beyond the reach of human industry. A formula was however discovered by Stirling, by means of which an approximate value of the product is found by the summation of a few of the first terms of a series which converges the more rapidly as the number of trials is greater. With the aid of this formula, De Moivre was enabled to assign the probability in question, and thus give a practical value to the theorem of Bernoulli.
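Stirling's formula, in its leading term $\sqrt{2\pi n}\,(n/e)^n$, may be sketched and compared with the exact product of the natural numbers as follows; the relative error of the leading term is of the order $1/(12n)$, and so diminishes as $n$ increases:

```python
import math

def stirling(n):
    """Leading term of Stirling's approximation to n! = 1*2*3*...*n."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

# The relative error shrinks as n grows.
for n in (10, 100):
    exact = math.factorial(n)
    print(n, abs(stirling(n) - exact) / exact)
```

At $n = 10$ the approximation is already within about one part in a hundred and twenty of the true product; at $n = 100$, within about one part in twelve hundred.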

The objects and important applications of the theory of probabilities having been made known by the works now mentioned, the subject has ever since been regarded as one of the most curious and interesting branches of mathematical speculation, and accordingly has received more or less attention from almost every mathematician of eminence. A great variety of questions connected with it, and especially relating to lotteries, are interspersed in the volumes of the *Paris* and *Berlin Memoirs*, (particularly the latter,) by John and Nicolas Bernoulli, Euler, Lambert, Beguelin, and others. D'Alembert has likewise treated of the theory in several of the volumes of his *Opuscules*; and it is not a little remarkable, that in some instances its first principles should have been misunderstood by so ingenious and profound a writer. In the St. Petersburg Memoirs, (vol. v.) there is an interesting paper by Daniel Bernoulli on the relative values of the expectations of individuals who engage in play, or stake sums on contingent benefits, when regard is had to the difference of their fortunes; a consideration which, in many cases, it is necessary to take into account; for it is obvious, that the value of a sum of money to an individual, depends not merely on its absolute amount, but also on his previous wealth. On this principle Bernoulli has founded a theory of *moral expectation*, which admits of numerous and important applications to the ordinary affairs of life. The *Transactions of the Royal Society* for the years 1763 and 1764, contain two papers by the Rev. Mr. Bayes, with additions to the latter by Dr. Price, which deserve to be noticed, inasmuch as the principles on which the probability of an event is determined, when the event depends on causes of which the existence and influence are only presumed from experience, are there for the first time correctly laid down. 
The question proposed and solved by Bayes was this: a series of experiments having been made relative to an event, to determine the presumption there is, that the fraction which measures its probability falls within given limits.
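Bayes's question may be sketched numerically. Assuming, as Bayes did, that all values of the unknown probability are equally likely beforehand, the presumption that it falls between given limits is the ratio of two integrals of $p^k(1-p)^{n-k}$, where $k$ is the number of occurrences in $n$ trials. The quadrature below is a rough modern illustration, not a method from the text:

```python
def chance_within_limits(k, n, lo, hi, steps=200_000):
    """Presumption (uniform prior, as Bayes supposed) that the fraction
    measuring an event's probability lies between lo and hi, the event
    having happened k times in n trials."""
    def integral(a, b):
        # midpoint rule for the integral of p^k (1-p)^(n-k) over [a, b]
        h = (b - a) / steps
        return h * sum((a + (i + 0.5) * h) ** k
                       * (1 - (a + (i + 0.5) * h)) ** (n - k)
                       for i in range(steps))
    return integral(lo, hi) / integral(0.0, 1.0)
```

For a single trial in which the event has happened once, for example, the presumption that its chance exceeds one half comes out three fourths.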

One of the earliest applications of the theory of probability was to determine, from observations of mortality, the average duration of human life, and the value of pecuniary interests depending on its continuance or failure. This particular application appears to have been first thought of, or at least attempted to be carried into practical effect, in Holland, by Hudde and the celebrated pensionary De Witt; but the first tables of mortality, with the corresponding values of annuities on single lives, were constructed by our illustrious countryman Dr. Halley, and published in the *Philosophical Transactions* for 1693. For the history of this branch of the subject, we refer to the two articles, Annuities and Mortality, in this work. We may remark, however, that although the English writers, who have expressly treated of it, have almost without exception confined themselves to the explanation of the methods of computing annuity tables, and of determining from them the values of sums depending on life contingencies, the aid which this branch of economy derives from the general theory of probabilities, is by no means confined to the consideration of such elementary questions. The number of observations necessary to inspire confidence in the tables, the extent to which risks may be safely undertaken, the comparative weights of different sets of observations, and the probable limits of departure from the average results of previous observations in a given number of future instances, are all questions of the utmost importance, which come within the scope of the calculus, and cannot, in fact, be justly appreciated by any other means.

---

1 *Oeuvres de Blaise Pascal*, tome iv. Paris, 1819; *Opera Petri de Fermat*, Toulouse, 1679.

The application of the theory of probability to the subject of jurisprudence, and the verdicts of juries and decisions of tribunals, has been discussed by the Marquis Condorcet in various articles in the Encyclopédie Méthodique; but more especially in his Essai sur l'application de l'Analyse à la Probabilité des Decisions rendues à la Pluralité des Voix, Paris 1785; a work of great ingenuity, and abounding with interesting remarks on subjects of the highest importance to humanity. James Bernoulli, it appears, had intended to treat jurisprudence as a branch of probability in the Ars Conjectandi, but his premature death prevented that work from being completed. There is a memoir on the subject by his nephew Nicolas, in the Leipzig Acts for 1711. The most important questions to be determined, are the number of jurors of which a jury ought to consist, and the majority which should be required to agree in a verdict in order to afford, on the one hand, the greatest probability that an accused person will not be wrongly condemned; and, on the other, to give to society the greatest security that its interests will not be compromised, by allowing too great facilities for the guilty to escape. This important subject has been treated more profoundly, and with numerical elements derived from much better data than existed in the time of Condorcet, in a recent work by Poisson, to which we shall presently allude.

Another of the moral subjects to which the theory of probability has been applied, and connected with the preceding, is the appreciation of the evidence of testimony. In matters of this kind, it is easy to see that the calculus must be founded almost entirely upon hypothetical data. The veracity of a witness can scarcely be made the subject of direct experiment; and by reason of the complicated circumstances with which the facts forming the subject of testimony are usually accompanied, and the numberless ways in which mankind are influenced by their passions, credulity, or ignorance, it is perhaps equally impossible to deduce an average value from the comparison of a great number of statements which have been ascertained to be true or false. Numerical results can therefore only be obtained by having recourse to hypotheses, and consequently must be considered as only probable approximations. The knowledge, however, which is thus obtained of the various combinations of the quantities concerned, affords important aid in guiding our judgments in complicated cases, and when we have to decide upon conflicting testimony. Approximations deduced from a train of accurate and systematic reasoning, are always to be preferred to the most specious arguments drawn from any other source.

The analysis of probability has been applied with signal advantage in many researches of Natural Philosophy, but especially in appreciating the mean errors of observations. Owing to the imperfections of sense and of instruments, physical magnitudes are only susceptible of being measured within certain limits of accuracy; and where the last degree of precision is indispensable, as in practical astronomy, it is only by means of a very great number of measures, compared with one another, and combined according to the methods which this calculus points out, that we can obtain the nearest approximation to the true values which the observations are capable of giving. The mean errors of observations were treated as questions of probability by Lagrange in the Turin Memoirs for 1773; but it is to Laplace that the theory owes its principal extension and most important results. The method of combining numerous equations of condition now universally followed, known as the method of minimum squares, and which Laplace has demonstrated to be that which leaves the least probable amount of error in the final equations, was made known by Legendre in an Appendix to his Nouvelles Méthodes pour la Détermination des Orbes des Comètes, published in 1806. A similar method, however, or rather the same, (for they are identical in principle,) had been discovered by Gauss, and employed by him for several years before the work of Legendre made its appearance.
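For the simplest case, a straight line fitted to a series of observations, the method of minimum squares may be sketched as follows; the observed values below are invented for illustration only:

```python
def minimum_squares(xs, ys):
    """Fit y = a + b*x so that the sum of the squared errors of the
    observations is the least possible."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

# Observations lying nearly on y = 1 + 2x, with small invented errors.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]
a, b = minimum_squares(xs, ys)
```

The fitted coefficients fall close to the values 1 and 2 from which the observations were constructed, the residual errors being balanced in the sense Legendre and Gauss proposed.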

Laplace's great work, the Théorie Analytique des Probabilités, first published in 1812, is one of the most remarkable productions that has ever appeared in abstract science. The principles of the calculus, as well as the peculiar methods of analysis which it requires, and the most interesting and difficult questions which it presents, are here discussed in a far more general manner than had been attempted by any former writer on the subject; and it may be said, accordingly, to have placed the theory under an entirely new aspect. It is much to be regretted that so little pains have been taken by the illustrious author to render the work intelligible to the generality of mathematical readers. Consisting for the greater part of separate memoirs presented at different times to the Academy of Sciences, arranged without regard to symmetry or order, it abounds with repetitions which only serve to embarrass the student; while the deficiency of explanation combined with the subtlety of the analysis, and the inherent intricacy of the subject, render it often a painfully difficult task to seize the force of the demonstrations. Notwithstanding these defects, however, it forms one of the most splendid creations of mathematical genius; and is alike admirable, whether we regard the extension which has been given to the calculus, or the results which have been arrived at, or the tone of lofty philosophy in which subjects bearing on some of the most important concerns of mankind are treated.

Next to the Théorie Analytique of Laplace, the most important work which has hitherto appeared on the subject of probability is the recent one of Poisson, entitled Recherches sur la Probabilité des Jugements, (Paris 1837.) Although it might be inferred from the title that this work relates only to a single though very interesting application of the theory, the greater part of it is devoted to the development and demonstration of the general principles, and the discussion of the principal questions which present themselves in the different applications; and it is only in the last of the five books of which it consists that the special subject to which the title refers is taken into consideration. In applying the theory to the decisions of tribunals, Condorcet and Laplace had been unable to obtain positive results from the want of authentic data; but the recent publication by the French government of the Comptes Généraux de l'Administration de la Justice Criminelle, in France, having furnished an immense collection of facts from which the requisite data could be obtained, Poisson was led to consider the subject anew, and the results of his investigations, which are of singular interest, are given in the work now mentioned. Poisson had already given a theory of the mean errors of observations in the Additions to the Connaissance des Tems for 1827 and 1832.

---

1 Theoria Motus Corporum Coelestium, p. 221.

It is in these two works of Laplace and Poisson that the higher and more abstruse parts of the theory of probabilities must be studied. A very clear exposition of the principles, accompanied with many interesting remarks on the uses and applications of the theory, is given by Lacroix in his valuable little work, Traité Élémentaire du Calcul des Probabilités, Paris 1822.

Since the time of De Moivre, the English treatises on the general theory of probability have neither been numerous, nor, with one or two exceptions, very important. Simpson's Laws of Chance (1740) contains a considerable number of examples, in the solution of which the author displays his usual acuteness and originality; but as they belong entirely to that class in which the chances are known a priori, they give no idea of the most interesting applications of the theory. Dodson's Mathematical Repository contains a large selection of the same kind. The Essay in the Library of Useful Knowledge, by Mr. Lubbock, gives a more comprehensive and philosophical, though an elementary view of the subject; but by far the most valuable work in the language is the Treatise in the Encyclopaedia Metropolitana, by Professor De Morgan, 1837. In this very able production, Mr. De Morgan has treated the subject in its utmost generality, and embodied, within a moderate compass, the substance of the great work of Laplace.

Within the limits to which the present article must be confined, it would be hopeless to attempt giving a complete view of a branch of science which embraces so many complicated and intricate subjects of research, and which requires the aid of some of the most abstruse and recondite theories of the modern mathematics. In the higher applications of the theory, the analysis of many of the questions which arise, in order to be made intelligible, would require an extent of development and a parade of mathematical formulæ altogether incompatible with the plan and scope of this work. All that we can propose to ourselves, therefore, is to explain as briefly as may appear consistent with perspicuity, the general principles of the theory, and to give an outline of the manner in which these are applied to some of the more important questions which have been investigated by Laplace and Poisson. The examples will be selected with a view to show the nature of the principal results of the mathematical theory, as well as the peculiar methods of analysis which are of most general application.

**Sect. I. General Principles of the Theory of Probability.**

1. The term probable, in its popular acceptation, is used in reference to any unknown or future event, to denote that in our judgment the event is more likely to be true than not, or more likely to happen than not to happen. Without attempting to make an accurate enumeration of the various circumstances which are favourable or unfavourable to its occurrence, or to balance their respective influences, we suppose there is a preponderance on one side, and accordingly pronounce it to be probable that the event has occurred, or will occur, or the contrary.

2. If we can see no reason why an event is more likely to happen than not to happen, we say it is a chance whether the event will happen or not; or if it may happen in more ways than one, and we have no reason for supposing it will happen in any one of these ways rather than in another, we say it is a chance whether it will happen in any assigned way or in any other. Suppose, for example, an unknown number of balls of different colours to be placed in an urn, from which a ball is about to be extracted by a person blindfolded. Here we have no reason for supposing that the ball about to be drawn will be of one colour rather than another, that it will be white rather than black, or red; and accordingly say it is a chance whether the ball drawn will be of a particular colour, or of a different one. In this instance, then, the term chance denotes, simply, the absence of a known cause. If, however, we are made acquainted with the number of balls in the urn, and the number there are of each of the different colours, the term is used in a definite sense. For instance, suppose the urn to contain ten balls, of which nine are white, and the remaining one black, we say there are nine chances in favour of drawing a white ball, and one chance only in favour of drawing the black ball. Chance, in this sense, denotes a way of happening, or a particular case or combination that may arise out of a number of other possible cases or combinations; and an event becomes probable or improbable according as the number of chances in its favour is greater or less than the number against it. Chance and presumption are also frequently used synonymously with probability.

3. The mathematical probability of any event is the ratio of the number of ways in which that event may happen to the whole number of ways in which it may either happen or fail. Thus, recurring to the previous example, the event, namely, the drawing of a ball from an urn containing 9 white balls and 1 black, may happen in 10 different ways, inasmuch as any one of the 10 balls may be drawn; but in one only of those ways will the event be a black ball; and therefore the probability of drawing the black ball is $\frac{1}{10}$. In like manner, as there are 9 different ways in which a white ball may be drawn, or 9 chances of drawing a white ball, and ten chances in all, the probability of drawing a white ball at the first trial is $\frac{9}{10}$. It follows immediately from this definition, that the probability of drawing a ball of either colour will remain the same, however the number of balls in the urn may be increased, provided those of each colour are increased in the same proportion. For instance, suppose the number of white balls to be 45, and the number of black balls to be 5; the number of chances in favour of drawing a black ball is 5, while there are 50 chances in all, consequently the probability of a black ball being drawn is $\frac{5}{50} = \frac{1}{10}$. In the same manner, the probability of drawing a white ball is $\frac{45}{50} = \frac{9}{10}$; the same as before. Generally, let $E$ and $F$ be two contrary events, that is to say, such that the one or the other of them must necessarily happen, and both cannot happen together; and let $a$ be the number of chances or combinations which produce the event $E$, and $b$ be the number of combinations which produce the event $F$, or cause the failure of $E$; then the probability that $E$ will happen is $\frac{a}{a+b}$; and the probability that $F$ will happen, or that $E$ will not happen is $\frac{b}{a+b}$. In future, the term probability will be used only to signify mathematical probability.
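The definition, and the remark that the probabilities are unaltered when the balls of each colour are increased in the same proportion, admit of a trifling computational check; the sketch below uses exact fractions to avoid any rounding:

```python
from fractions import Fraction

def probability(favourable, total):
    """Ratio of the chances favouring an event to all the chances."""
    return Fraction(favourable, total)

# An urn of 9 white balls and 1 black.
p_black = probability(1, 10)
p_white = probability(9, 10)

# Multiplying both colours alike (45 white, 5 black) changes nothing,
# since Fraction reduces 5/50 and 45/50 to the same lowest terms.
assert probability(5, 50) == p_black
assert probability(45, 50) == p_white
```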

4. It is to be carefully remarked, that the different chances or combinations which form the elements of probability are supposed to be perfectly equal. If this equality does not hold, and there is any circumstance respecting the event under consideration which renders one combination or set of combinations more likely to occur than another, the different combinations must be multiplied by numbers proportional to their respective facilities, after which the units in each multiplier may be regarded as so many distinct chances, from which the probability of the event will be found by the above formula. This is equivalent to saying that a combination or chance which is twice as likely to happen as another, must be regarded as two equal and similar combinations in comparison of that other; a proposition which is sufficiently obvious.

5. It follows from the above definition, that the probability of any contingent event is measured by a fraction less than unity, and may have any value between 0 and 1. It follows, also, that the sum of the two fractions which measure the probabilities of two contrary events is equal to unit, which is the measure of certainty, inasmuch as either the one or the other necessarily occurs. Thus, in the last example, the probability of the event $E$ is $\frac{a}{a+b}$, and that of the contrary event $F$ is $\frac{b}{a+b}$; and $\frac{a}{a+b} + \frac{b}{a+b} = 1$. Hence if $p$ denote the probability of any event $E$, and $q$ the probability of the contrary event $F$, we have $q=1-p$. This consequence of the definition is of great importance in the calculation of probabilities.

6. We have here supposed the result of a trial to be necessarily one or other of two events $E$ and $F$; but it is easy to imagine the trial to be of such a kind that it may give rise to any one of a number of events $E$, $F$, $G$, $H$, &c., each having a given number of chances in its favour. This case is represented by supposing an urn to contain balls of as many different colours or sorts as there are different events. Let the urn be conceived to contain $a$ balls of the sort which produces the event $E$, $b$ of the sort which produces $F$, $c$ of the sort which produces $G$, and so on; and let $a + b + c + d$, &c. = $k$, so that $k$ is the whole number of balls in the urn. The probabilities of the different events $E$, $F$, $G$, $H$, &c. are then, respectively, by the definition,

$$\frac{a}{k}, \frac{b}{k}, \frac{c}{k}, \frac{d}{k}, \text{&c.}$$

the sum of which = 1. In fact, if a ball be drawn at all, it must be of one or other of the different sorts contained in the urn; and consequently the sum of all the probabilities amounts to unit or certainty.

7. When an event is compounded of two or more simple events independent of each other, the probability of the compound event is equal to the product of the probabilities of the several simple events of which it is compounded. Let us imagine two urns, $A$ and $B$, of which $A$ contains $a$ white balls and $b$ black, and $B$ contains $a'$ white and $b'$ black. Make $a + b = c$, and $a' + b' = c'$, and let the compound event whose probability is to be determined be the drawing of a white ball from both urns. Now, as each of the $c$ balls in $A$ may be drawn with any one of the $c'$ balls in $B$, the whole number of ways in which the balls in $A$ may be differently combined by pairs with the balls in $B$, or the whole number of possible cases is $cc'$. But the number of cases favourable to the compound event is evidently the number of different ways in which a white ball may be drawn from $A$ with a white ball from $B$, and therefore equal to $aa'$. Hence by the definition (4), the probability that a white ball will be drawn from both urns is $\frac{aa'}{cc'}$. Now, if $p$ denote the probability of drawing a white ball from $A$, and $p'$ that of drawing a white ball from $B$, we have by the definition $p = \frac{a}{c}$, and $p' = \frac{a'}{c'}$; whence $\frac{aa'}{cc'} = pp'$.

In general, let $p$ denote the probability of an event $E$, $p'$ that of another event $E'$, $p''$ that of a third $E''$, and so on; then the probability of the concourse of the events $E$, $E'$, $E''$, &c., or the probability that they will all happen, is $p \times p' \times p'' \times \text{&c.}$; that is to say, the probability of an event compounded of any number of simple and independent events, is the product of the respective probabilities of the several simple events.

The probabilities that the several simple events $E$, $E'$, $E''$, &c., will not all happen, or that some of them will happen and others fail, are easily determined in the same manner; it will be sufficient to indicate their several expressions. Suppose there are only three simple events, of which the probabilities are respectively $p$, $p'$, and $p''$; and let $q = 1 - p$, $q' = 1 - p'$, $q'' = 1 - p''$. The product $pq'q''$ expresses the probability of the compound event which consists in $E$ happening and $E'$ and $E''$ both failing; $qp'q''$ is the probability that $E'$ will happen, and that $E$ and $E''$ will both fail; $pp'p''$ is the probability that they will all three happen; $1 - pp'p''$ the probability that they will not all three happen, or that one of them at least will fail; $qq'q''$ is the probability that they will all fail; and $1 - qq'q''$ is the probability that they will not all fail, or that one at least of them will happen.
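These expressions may be verified by a short modern computation, enumerating the eight possible outcomes of three independent events; the particular values chosen below for $p$, $p'$, $p''$ are hypothetical, any probabilities serving equally well.

```python
from itertools import product

# Hypothetical probabilities of the three independent events E, E', E''.
p, p1, p2 = 0.5, 0.3, 0.2
q, q1, q2 = 1 - p, 1 - p1, 1 - p2   # probabilities of the contrary events

# Enumerate the 8 outcomes: each event either happens (True) or fails (False).
outcomes = {}
for e, e1, e2 in product([True, False], repeat=3):
    pr = (p if e else q) * (p1 if e1 else q1) * (p2 if e2 else q2)
    outcomes[(e, e1, e2)] = pr

# E happens while E' and E'' both fail:  p q' q''
assert abs(outcomes[(True, False, False)] - p * q1 * q2) < 1e-12

# One of them at least fails:  1 - p p' p''
at_least_one_fails = sum(pr for k, pr in outcomes.items() if not all(k))
assert abs(at_least_one_fails - (1 - p * p1 * p2)) < 1e-12

# One of them at least happens:  1 - q q' q''
at_least_one_happens = sum(pr for k, pr in outcomes.items() if any(k))
assert abs(at_least_one_happens - (1 - q * q1 * q2)) < 1e-12
```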

8. As an example of the application of this rule, suppose it were required to assign the probability of throwing aces, at one throw, with two common dice. As a common die has six symmetrical faces, there are in respect of each die six ways equally possible, in which the simple event may happen. The probability therefore of throwing ace with one die is $\frac{1}{6}$, that is, $p = \frac{1}{6}$. In respect of the second die, we have also $p' = \frac{1}{6}$; hence the probability of the compound event, or that aces will be thrown is $pp' = \frac{1}{6} \times \frac{1}{6} = \frac{1}{36}$. The probability that aces will not be thrown at any assigned trial is therefore $(1 - pp') = \frac{35}{36}$; and the odds against throwing aces at any given trial are 35 to 1.
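The dice computation may be confirmed by enumerating the thirty-six equally possible throws directly, as in this brief sketch:

```python
from fractions import Fraction
from itertools import product

# Enumerate the 36 equally possible throws of two common dice.
throws = list(product(range(1, 7), repeat=2))
aces = [t for t in throws if t == (1, 1)]

p_aces = Fraction(len(aces), len(throws))
assert p_aces == Fraction(1, 36)
assert 1 - p_aces == Fraction(35, 36)   # odds of 35 to 1 against aces
```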

Again, suppose two numbers, each consisting of 7 digits, to be taken at random, (for instance from a table of logarithms), and let it be proposed to assign the probability that the subtraction of the one from the other will be performed without its being necessary, in any case, to increase the upper figure. Here, as each digit may have any one of the ten values from 0 to 9 both inclusive, and as each of those values in the upper line may be combined with any one of them in the lower line, there are 100 different combinations or equally possible cases for each partial subtraction. Now, if the upper figure be 0, there is only one of those cases favourable to the event, or which will admit of the subtraction being performed, namely, when the figure below is also 0. If the upper figure be 1, there are two cases favourable, namely, those in which the under figure is 0 or 1. If the upper figure be 2, there are three favourable cases, namely, when the under figure is 0, 1, or 2. Proceeding in this way through all the digits, the whole number of favourable cases is found to be

$$1+2+3+4+5+6+7+8+9+10=55.$$

Hence, for each partial subtraction there are 55 favourable cases out of 100 possible cases; therefore (4) the probability that any one of the figures in the upper line is not less than the corresponding figure in the under line is $\frac{55}{100}$, and we have $p = p' = p'' = \text{&c.} = \frac{55}{100}$ for the probability of each of the seven simple events or partial subtractions, whence, by (7), the probability of the compound event is

$$p \times p' \times p'' \times \text{&c.} = \left(\frac{55}{100}\right)^7 = 0.0152243,$$

which is less than $\frac{1}{65}$, and greater than $\frac{1}{66}$.
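The arithmetic of this example is easily recomputed exactly; the sketch below takes the 55 favourable cases out of 100 for a single digit, as counted above, and raises the ratio to the seventh power for the seven independent positions.

```python
from fractions import Fraction

# Probability that one partial subtraction requires no borrowing:
# 55 favourable cases out of 100 possible, as counted in the text.
p_single = Fraction(55, 100)

# Seven independent digit positions in the two 7-digit numbers.
p_all = p_single ** 7
assert abs(float(p_all) - 0.0152243) < 1e-6
assert Fraction(1, 66) < p_all < Fraction(1, 65)
```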

9. When an event may happen in several different ways, each independent of the others, the probability of the event is the sum of all the partial probabilities taken in respect of each of the different ways.

Suppose there are $n$ different urns $A_1$, $A_2$, $A_3$, ..., $A_n$, each containing balls of two colours, white and black, and let the whole number of balls in each urn respectively, be $e_1$, $e_2$, $e_3$, ..., $e_n$, and the number of white balls in each be $a_1$, $a_2$, $a_3$, ..., $a_n$, and let the event $E$ be the extraction of a white ball in drawing a ball from any urn at random. In this case there are $n$ different ways, all equally probable, in which the event may happen, for it may be drawn with equal facility from any one of the urns. The probability that the ball will be drawn from any given urn, \( A_1 \), is therefore \( \frac{1}{n} \); and if it be drawn from this urn, the probability of its being white is \( \frac{a_1}{e_1} \); therefore, by (7), the probability of a white ball being drawn from \( A_1 \) is \( \frac{1}{n} \cdot \frac{a_1}{e_1} \). In like manner the probability of a white ball being drawn from \( A_2 \) is shown to be \( \frac{1}{n} \cdot \frac{a_2}{e_2} \); from \( A_3 \) to be \( \frac{1}{n} \cdot \frac{a_3}{e_3} \), and so on. Denoting therefore by \( p \) the whole probability of the event \( E \), the proposition affirms that

\[ p = \frac{1}{n} \left( \frac{a_1}{e_1} + \frac{a_2}{e_2} + \frac{a_3}{e_3} + \cdots + \frac{a_n}{e_n} \right). \]

To prove this, let the fractions \( \frac{a_1}{e_1}, \frac{a_2}{e_2}, \ldots \) be reduced to a common denominator \( y \), and suppose the equivalent fractions to be

\[ \frac{\alpha_1}{y}, \frac{\alpha_2}{y}, \frac{\alpha_3}{y}, \ldots, \frac{\alpha_n}{y}. \]

We may now conceive the urns \( A_1, A_2, A_3, \ldots, A_n \) to be replaced by others, each containing the same number, \( y \), of balls, and of which the first contains \( \alpha_1 \) white balls, the second \( \alpha_2 \), and so on; and it is evident that the chance of a white ball being drawn from this new system of urns will be precisely the same as it was for a white ball being drawn from the first system. Now the probability of drawing a white ball from the new system will not be altered by placing the whole of the \( ny \) balls in a single urn, for they may still be conceived as arranged in groups, disposed in any manner whatever, each group containing the same number of balls, and the same proportion of white to black as were in the separate urns; and as each group contains the same number of balls, the chance of laying the hand on any one group is the same as that of laying it on any other. The probability of drawing a white ball from the single urn, is therefore the same as for drawing it from the group of separate urns which contain each the same number of balls. But the probability of drawing it from the single urn is the ratio of the number of white balls contained in the urn to the number of both colours, therefore (this probability being \( p \)) we have

\[ p = \frac{1}{ny} \left( \alpha_1 + \alpha_2 + \alpha_3 + \cdots + \alpha_n \right); \]

whence, substituting for \( \frac{\alpha_1}{y}, \frac{\alpha_2}{y}, \ldots \), their respective values, \( \frac{a_1}{e_1}, \frac{a_2}{e_2}, \ldots \), we have

\[ p = \frac{1}{n} \left( \frac{a_1}{e_1} + \frac{a_2}{e_2} + \frac{a_3}{e_3} + \cdots + \frac{a_n}{e_n} \right). \]

As a particular case suppose three urns \( A, B, C \) to be placed together, of which \( A \) contains 2 white balls and 1 black; \( B \) 3 white balls and 2 black, and \( C \) 4 white and 3 black, and let it be required to determine the probability \( p \) of a white ball being drawn from the group by a person who is ignorant of the contents of the different urns. As there is no reason for selecting one urn in preference to another, the probability that he will put his hand into the urn \( A \) is \( \frac{1}{3} \); and if he draw from this urn the probability that a white ball will be drawn is \( \frac{2}{3} \), there being 2 cases favourable to that event, and 3 cases in all. The probability of both events is therefore \( \frac{1}{3} \times \frac{2}{3} = \frac{2}{9} \). In like manner, the probability of the ball being drawn from \( B \) is \( \frac{1}{3} \); and if drawn from \( B \) the probability of its being white is \( \frac{3}{5} \); therefore, the probability of this compound event is \( \frac{1}{3} \times \frac{3}{5} = \frac{1}{5} \). Lastly, the probability of the ball being drawn from \( C \) is \( \frac{1}{3} \); and if drawn from \( C \) the probability of its being white is \( \frac{4}{7} \); therefore, the probability of this compound event is \( \frac{1}{3} \times \frac{4}{7} = \frac{4}{21} \). Hence,

by the proposition now demonstrated, the complete probability of the event \( E \) is

\[ p = \frac{2}{9} + \frac{1}{5} + \frac{4}{21} = \frac{193}{315}. \]

If all the balls had been placed in a single urn, the probability of drawing a white ball would have been \( \frac{3}{5} \), for there are 3 + 5 + 7 = 15 balls in all, of which 2 + 3 + 4 = 9 are white. But \( \frac{3}{5} = \frac{189}{315} \); a fraction which differs sensibly from \( \frac{193}{315} \), the measure of the probability of the same event when the balls are distributed in the manner above supposed amongst the different urns. The distinction between the two cases is important.
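The distinction between the two cases admits of an exact modern check; the sketch below works with the three urns of the example, first drawing an urn at random and then a ball from it, and then mixing all the balls in one urn.

```python
from fractions import Fraction

# The three urns of the example, as (white, black) counts.
urns = [(2, 1), (3, 2), (4, 3)]

# Urn chosen at random, then a ball from it: the mean of the per-urn chances.
p_grouped = sum(Fraction(w, w + b) for w, b in urns) / len(urns)
assert p_grouped == Fraction(193, 315)

# All balls mixed in a single urn: ratio of white balls to all balls.
whites = sum(w for w, b in urns)
total = sum(w + b for w, b in urns)
p_single = Fraction(whites, total)
assert p_single == Fraction(3, 5)

# The two probabilities differ sensibly, as the text remarks.
assert p_grouped != p_single
```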

10. The rule laid down in (7) for finding the probability of a compound event applies alike whether the simple events are determined simultaneously or in succession. In fact, when the simple events are entirely independent of each other, the chances which determine the compound event are not influenced in any way by the intervention of time. Suppose, for example, the compound event to be the throwing of a certain number of points with a given number \( m \) of dice; the chances for and against the event are obviously the same whether the \( m \) dice are thrown at once, or a single die is thrown \( m \) times successively. But as the determination of the probability of a compound event is in general facilitated by supposing the simple events to be decided one after the other, it will be convenient to view the subject in this light in explaining the method of forming the different combinations of the chances by which the probabilities of compound events are determined.

**Sect. II. Of the Probability of Events Depending on a Repetition of Trials, or Compounded of Any Number of Simple Events, the Chances in Respect of Which Are Known a Priori, and Constant.**

11. Suppose an urn to contain \( a + b \) balls, \( a \) white and \( b \) black, and let a ball be successively drawn, and replaced in the urn after each drawing, in order that the chances in favour of drawing a ball of either colour may be the same in every trial, and let it be required to find the respective probabilities of the different possible results of any number of drawings.

Let us first suppose the number of trials to be two. The event may happen in any of these four different ways: first white, second white; first white, second black; first black, second white; first black, second black. Assuming \( W \) to represent the simple event which consists in the drawing of a white ball, and \( B \) that of a black ball, and supposing the order of the arrangement of the two letters to correspond with the order of succession of the simple events, the four possible cases or combinations will be represented thus:

- WW, WB, BW, BB.

Now let the probability of drawing a white ball in any trial be \( p \), and that of drawing a black ball be \( q \) (whence, \( p = \frac{a}{a+b}, q = \frac{b}{a+b} \)) the probabilities of the four possible compound events are by (7) respectively as under:

- probability of WW = \( p \times p = p^2 \)
- probability of WB = \( p \times q = pq \)
- probability of BW = \( q \times p = pq \)
- probability of BB = \( q \times q = q^2 \)

If we disregard the order of succession, and consider the two arrangements WB and BW, which are equally probable, as forming the same compound event, namely, a ball of both colours in the two trials, the probability of this event, by (9), becomes \( 2pq \). The sum of the probabilities of all the possible arrangements is therefore

\[ p^2 + 2pq + q^2 = (p + q)^2; \]

whence it appears that the probabilities of the different arrangements in two trials are respectively the terms of the development of the binomial, \( (p+q)^2 \).

Let us next suppose the number of trials to be three. The different arrangements that may be formed of the simple events in three trials, with the probability of each respectively, are as follows:

- WWW, probability of which \(=ppp=p^3\)
- WWB, \(=ppq=p^2q\)
- WBW, \(=pqp=p^2q\)
- BWW, \(=qpp=p^2q\)
- WBB, \(=pqq=pq^2\)
- BWB, \(=qpq=pq^2\)
- BBW, \(=qqp=pq^2\)
- BBB, \(=qqq=q^3\)

It thus appears that the probability of obtaining two events of one kind, and one of the other, is the same in whatever order they succeed each other, and, in fact, is independent of the order. Disregarding, then, the order of succession, and considering the combination of two white balls with one black, in whatever order they may be arranged, as the same compound event, the probability of its occurrence in any order whatever, being the sum of its probabilities in each particular order (9), is \(3p^2q\). In like manner, regarding the combination of two black balls with one white, in any order of arrangement, as the same compound event, its probability is \(3pq^2\). The compound event resulting from three trials must then happen in one of four different ways, namely, 3 white balls; 2 white, combined with 1 black, in any order; 2 black, combined with one white, in any order; or, lastly, 3 black; and the sum of the probabilities of these different cases is

\[p^3 + 3p^2q + 3pq^2 + q^3 = (p+q)^3.\]

Hence the probabilities of all the different possible combinations in three trials are respectively given by the development of the binomial \((p+q)^3\).
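The grouping of the eight arrangements into the four terms of the binomial may be exhibited by a short enumeration; the value chosen for \(p\) below is hypothetical.

```python
from fractions import Fraction
from itertools import product

p = Fraction(2, 3)   # hypothetical chance of a white ball in one trial
q = 1 - p

# Enumerate the 8 arrangements of W and B over three trials, and group
# their probabilities by the number of white balls drawn.
totals = {k: Fraction(0) for k in range(4)}
for seq in product("WB", repeat=3):
    pr = Fraction(1)
    for s in seq:
        pr *= p if s == "W" else q
    totals[seq.count("W")] += pr

# The grouped probabilities are the successive terms of (p + q)^3.
assert totals[3] == p**3
assert totals[2] == 3 * p**2 * q
assert totals[1] == 3 * p * q**2
assert totals[0] == q**3
assert sum(totals.values()) == 1
```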

12. In general, let \(p\) denote the probability of any simple event E, then the probability of E happening twice in two trials is \(p^2\), of happening thrice in 3 trials \(p^3\), and of happening \(m\) times in \(m\) successive trials, \(p^m\). In like manner, the probability of the contrary event F being \(q\) \((p+q=1)\), the probability of F happening \(n\) times in \(n\) successive trials is \(q^n\). Hence (7) the probability of E happening \(m\) times, and then F happening \(n\) times in succession, in \(m+n\) trials, is \(p^m q^n\). But the probability of these events happening in any assigned order is the same as that of their happening in any other assigned order; therefore \(p^m q^n\) is the measure of the probability that E will occur \(m\) times, and F will occur \(n\) times in a determinate order. Now, let \(m+n=h\), and let U be the number of different ways in which \(m\) events E, and \(n\) events F, can be combined in \(h\) trials, and P be the probability of any one of these combinations whatever, or the probability of E occurring \(m\) times, and F occurring \(n\) times in \(h\) trials, without regard to the order in which they succeed each other, we have then

\[P = Up^m q^n.\]

In order to determine the value of U, we may suppose the events in question to be so many different things represented by the letters A, B, C, D, E, &c., of which there are \(m\) of one kind, and \(n\) of another, and make \(m+n=h\); then by the algebraic theory of combinations, we have

\[U = \frac{1 \cdot 2 \cdot 3 \cdots \cdots \cdots h}{1 \cdot 2 \cdot 3 \cdots \cdots \cdots m \times 1 \cdot 2 \cdot 3 \cdots \cdots \cdots n}.\]

This value of U is symmetrical in respect of \(m\) and \(n\), and may be otherwise written in either of the two following forms,

\[U = \frac{h(h-1)(h-2) \cdots (h-m+1)}{1 \cdot 2 \cdot 3 \cdots m},\] \[U = \frac{h(h-1)(h-2) \cdots (h-n+1)}{1 \cdot 2 \cdot 3 \cdots n},\]

which show that the probability \(P\), or the product \(Up^m q^n\) is the \((m+1)\)th term of the development of the binomial \((p+q)^h\) arranged according to the increasing powers of \(p\), or the \((n+1)\)th term of the same development arranged according to the increasing powers of \(q\). Hence we conclude that when \(p\) and \(q\) remain constant, the probabilities of all the different compound events which can be formed by the combination of the simple events E and F in \(h\) trials, are expressed by the different terms of the formula \((p+q)^h\) expanded by the binomial theorem.

The whole number of possible cases is evidently \(h+1\), for in \(h\) experiments, E may occur \(h\) times, \(h-1\) times, \(h-2\) times, \(\ldots\), \(h-h\) times; this last being the case in which the contrary event F occurs in all the trials. The different cases are unequally probable, both by reason of the greater or smaller number of combinations by which they may be produced, and which in reference to each case is represented by U, and by reason of the inequality between \(p\) and \(q\). It will be shewn afterwards, that when \(p=q\), and \(h\) is an even number, the most probable case is that in which the occurrences of E and F are equal; and if \(h\) is an odd number, the two most probable cases are those in which the difference in the number of occurrences of E and the number of occurrences of F is unity.
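The remark about the most probable cases may be illustrated by tabulating the binomial terms; the sketch below takes \(p=q=\frac{1}{2}\) with hypothetical values of \(h\), one even and one odd.

```python
from fractions import Fraction
from math import comb

def term(h, m, p):
    """Probability of m occurrences of E in h trials: the general binomial term."""
    q = 1 - p
    return comb(h, m) * p**m * q**(h - m)

p = Fraction(1, 2)

# h even: the single most probable case is the equal division of E and F.
h = 10
probs = [term(h, m, p) for m in range(h + 1)]
assert max(range(h + 1), key=lambda m: probs[m]) == h // 2

# h odd: two equally most probable cases, the counts of E and F differing by unity.
h = 9
probs = [term(h, m, p) for m in range(h + 1)]
top = max(probs)
assert [m for m in range(h + 1) if probs[m] == top] == [4, 5]
```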

13. In order to place the proposition now demonstrated in a clearer light, let us consider separately the different terms of the development of \((p+q)^h\), namely,

\[p^h + hp^{h-1}q + \frac{h(h-1)}{1 \cdot 2} p^{h-2}q^2 + \cdots + \frac{h(h-1)(h-2) \cdots (h-n+1)}{1 \cdot 2 \cdot 3 \cdots n} p^{h-n}q^n + \cdots + q^h.\]

The first term \(p^h\) expresses the probability that the event E will happen in every one of the \(h\) trials. The second term \(hp^{h-1}q\) expresses the probability that E will occur \(h-1\) times, and F once, without distinction of order; that is to say, F may happen at the first or last or any intermediate trial. If a determinate succession is proposed, for example, that of \(h-1\) times the event E in succession, and F in the next trial, the probability of the event in the assigned order is found by suppressing the coefficient \(h\), and is consequently \(p^{h-1}q\).

The third term \(\frac{h(h-1)}{1 \cdot 2} p^{h-2}q^2\) expresses the probability that the result of \(h\) trials will be \(h-2\) times the event E, and twice the event F, without distinction of order. If a particular order be assigned, it is necessary to suppress the coefficient, and the probability of the simple events occurring in that particular order is \(p^{h-2}q^2\).

The general term \(\frac{h(h-1)(h-2) \cdots (h-n+1)}{1 \cdot 2 \cdot 3 \cdots n} p^{h-n}q^n\) expresses the probability that the result of \(h\) trials will be \((h-n)\) times the event E, and \(n\) times the event F in any order. The probability of \((h-n)\) times E and \(n\) times F in an assigned order is \(p^{h-n}q^n\).

14. If we suppose the event E to be such that the chances in favour of its happening or failing are equal, that is, if \(p=q=\frac{1}{2}\), the different terms of the binomial \((p+q)^h\), on suppressing the coefficients, become all equal; so that, a particular order being assigned in each of the possible cases or combinations, all the cases become equally probable. Thus, suppose a shilling to be tossed 100 times in succession, the probability of head turning up in every trial is \(\left(\frac{1}{2}\right)^{100}\). The probability of 50 heads and 50 tails in any assigned order is \(\left(\frac{1}{2}\right)^{50} \times \left(\frac{1}{2}\right)^{50} = \left(\frac{1}{2}\right)^{100}\); and if \(m+n=100\), the probability of \(m\) heads and \(n\) tails in an assigned order is also \(\left(\frac{1}{2}\right)^{m} \times \left(\frac{1}{2}\right)^{n} = \left(\frac{1}{2}\right)^{m+n} = \left(\frac{1}{2}\right)^{100}\). Hence the probability of any compound event formed by two simple contrary events of equal probability succeeding each other in an assigned order, is independent of the proportion in which the simple events occur, and depends only on the number of trials. Before the trials, it is equally probable that head will be turned up in succession 100 times, as that the result of 100 trials will be 50 heads and 50 tails in a given order of succession, or any proportion of heads to tails in an order arbitrarily chosen. This consideration is frequently lost sight of in reasoning about those events of the natural world which are termed extraordinary and miraculous. If in tossing a shilling 100 times into the air, the number of heads turned up is found nearly equal to the number of tails, the event excites no surprise; something like it was expected.
On the contrary, if the difference between the number of heads and the number of tails is considerable, the event is termed extraordinary; and if head turned up in every trial without exception, we should scarcely be persuaded that such an event was entirely the result of chance, and independent of a special cause. Nevertheless, the *a priori* probability that every trial will give head, is precisely the same as the probability of throwing any given number of heads and tails in an assigned order of succession. It will, however, be proved afterwards, that if such an event as throwing head 100 times in succession were actually observed, the probability of a special cause having intervened, would approach very nearly to certainty.
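The equality of the probabilities of fully assigned sequences may be stated as a one-line computation; the sketch below compares the sequence of 100 heads with any one assigned sequence of 50 heads and 50 tails.

```python
from fractions import Fraction

# With h tosses of a fair coin, any fully assigned sequence of heads and
# tails has probability (1/2)^h, whatever the proportion of heads.
h = 100
p = Fraction(1, 2)

all_heads = p**h                       # H H H ... H, 100 times
fifty_fifty_in_order = p**50 * p**50   # any one assigned 50-50 sequence
assert all_heads == fifty_fifty_in_order == p**h
```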

15. Hitherto we have supposed the compound event to be formed by the combination of two simple events only, E and F, one of which necessarily excludes the other. Let us now suppose there are any number of simple events, E₁, E₂, E₃, &c., of which the respective probabilities are p₁, p₂, p₃, &c., and such that one or other of them necessarily happens in each trial, so that p₁ + p₂ + p₃ + &c. = 1, and determine the probability of any assigned combination of them in a given number of trials. This case may be represented by supposing an urn to contain a number of balls of as many different colours as there are distinct events; the event Eᵢ will be the drawing of a ball of the colour i, and its probability pᵢ will be the fraction whose numerator is equal to the number of balls of the colour i, and denominator the whole number of balls in the urn. Now the probability of the event E₁ happening m times in succession is p₁ᵐ (by (12)); that of E₂ happening n times in succession is p₂ⁿ; that of E₃ happening r times in succession is p₃ʳ; and so on. Therefore (7) the probability of the compound event which is formed by the occurrence of m times E₁, n times E₂, r times E₃, and so on, these events succeeding each other in order, is the product p₁ᵐ p₂ⁿ p₃ʳ, &c. But the probability of the simple events succeeding each other in any particular order is the same as that of their succeeding in any other assigned order (12); consequently, if U' denote the number of different ways in which m events E₁, n events E₂, r events E₃, &c. can be combined, or succeed each other, and P' be the probability of the compound event in any order whatever, we have,

\[ P' = U' p_1^m p_2^n p_3^r \, \text{&c.} \]

Assuming h = m + n + r + &c., we have also by the theory of combinations,

\[ U' = \frac{1 \cdot 2 \cdot 3 \cdots h}{(1 \cdot 2 \cdot 3 \cdots m)(1 \cdot 2 \cdot 3 \cdots n)(1 \cdot 2 \cdot 3 \cdots r) \, \text{&c.}} \]

the factor U' being the coefficient of the term which has for its multiplier p₁ᵐ p₂ⁿ p₃ʳ, &c. in the expansion of the multinomial (p₁ + p₂ + p₃ + &c.)ʰ, whence

\[ P' = \frac{1 \cdot 2 \cdot 3 \cdots h}{(1 \cdot 2 \cdot 3 \cdots m)(1 \cdot 2 \cdot 3 \cdots n)(1 \cdot 2 \cdot 3 \cdots r) \, \text{&c.}} \, p_1^m p_2^n p_3^r \, \text{&c.} \]

We shall now proceed to give some examples of the applications of the preceding formula.
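The multinomial formula for P' may first be set down as a small function; the three colour-probabilities chosen below are hypothetical, and the check confirms that the probabilities of all possible compositions of four trials amount to unit.

```python
from fractions import Fraction
from math import factorial

def multinomial_prob(counts, probs):
    """P' = (h! / (m! n! r! ...)) * p1^m p2^n p3^r ..., for exhaustive events."""
    h = sum(counts)
    u = factorial(h)
    pr = Fraction(1)
    for c, p in zip(counts, probs):
        u //= factorial(c)
        pr *= Fraction(p) ** c
    return u * pr

# A hypothetical urn with three colours in proportions 1/2, 1/3, 1/6.
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

# The probabilities of every composition (m, n, r) with m+n+r = h sum to 1.
h = 4
total = sum(multinomial_prob((m, n, h - m - n), probs)
            for m in range(h + 1) for n in range(h + 1 - m))
assert total == 1
```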

16. Let it be proposed to assign the probability P, of throwing ace once, and not oftener, in four successive throws of the same die.—Simpson, p. 15.

Here, the chance of throwing ace in a single trial being \( \frac{1}{6} \), we have \( p = \frac{1}{6} \), and consequently \( q = \frac{5}{6} \), and also \( h = 4 \).

Now the compound event being the occurrence of the simple event E, whose probability is p, once, and of the contrary event F three times, the probability of the compound event is that term of the development of \((p+q)^4\) which is multiplied by \(pq^3\). If, therefore, in the formula,

\[ P = \frac{1 \cdot 2 \cdot 3 \cdots h}{(1 \cdot 2 \cdot 3 \cdots m)(1 \cdot 2 \cdot 3 \cdots n)} \, p^m q^n \]

we make \( p = \frac{1}{6}, q = \frac{5}{6}, h = 4, m = 1, n = 3 \), we shall have

\[ P = \frac{1 \cdot 2 \cdot 3 \cdot 4}{1 \times (1 \cdot 2 \cdot 3)} \times \frac{1}{6} \times \left( \frac{5}{6} \right)^3 = \frac{125}{324}, \]

which is the probability required, and the same as that of throwing one ace, and not more than one, at a single throw with 4 dice.

The probability of the contrary event, that is to say, the probability of either not throwing an ace at all, or of throwing more aces than one is \(1 - \frac{125}{324} = \frac{199}{324}\); and therefore the odds against throwing one ace and no more in 4 throws of a common die are 199 to 125, or 8 to 5 very nearly.
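Simpson's example may be confirmed with the binomial term directly, as in this short check:

```python
from fractions import Fraction
from math import comb

p = Fraction(1, 6)   # chance of ace on a single throw
q = 1 - p
h = 4                # four successive throws

# Ace exactly once in four throws: the binomial term multiplied by p q^3.
P = comb(h, 1) * p * q**3
assert P == Fraction(125, 324)

# The contrary event, and the odds of very nearly 8 to 5 against.
assert 1 - P == Fraction(199, 324)
```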

17. If in this example it had been proposed to assign the probability of throwing ace once at least, instead of once and not more, it would have been necessary to have included those cases in which the ace occurs twice, or three times, or in each of the four trials. The binomial \((p+q)^4\) gives

\[ p^4 + 4p^3q + 6p^2q^2 + 4pq^3 + q^4 \]

the first term of which expresses the probability of throwing ace four times in succession; the second that of throwing ace three times, and another number once; the third that of throwing ace twice, and a different face twice; the fourth that of throwing ace once, and a different face three times; and the fifth that of throwing a different face in each of the four trials. But as every one of these compound events, excepting the last, satisfies the condition of ace being thrown once at least, the whole probability of that event must be the sum of the probabilities of the different events by which it may be produced (9) and is consequently

\[ \left( \frac{1}{6} \right)^4 + 4 \left( \frac{1}{6} \right)^3 \left( \frac{5}{6} \right) + 6 \left( \frac{1}{6} \right)^2 \left( \frac{5}{6} \right)^2 + 4 \left( \frac{1}{6} \right) \left( \frac{5}{6} \right)^3 = \frac{671}{1296}. \]

In general, the sum of the first \(n+1\) terms of \((p+q)^h\) expresses the probability of obtaining not less than \(h-n\) events, the probability of each of which is \(p\), or not more than \(n\) contrary events, the probability of each of which is \(q\).

Since \(p+q=1\), the sum of all the terms of the series produced by the expansion of \((p+q)^h\) is equal to unit, and therefore the sum of any number of the terms is equal to unit diminished by the sum of the remaining terms. This consideration frequently gives the means of abridging the calculations. Thus, in the preceding example, instead of expanding the binomial \((\frac{1}{6} + \frac{5}{6})^4\) in order to find the probabilities of throwing 4 aces, 3 aces, 2 aces, and 1 ace only, in a series of 4 trials, we might have sought the probability of not throwing ace at all. The probability of not throwing ace in a single trial is \(q = \frac{5}{6}\), and therefore (7) that of not throwing it in 4 trials is \(\left( \frac{5}{6} \right)^4 = \frac{625}{1296}\). Hence the probability of the contrary event, namely, that ace will be thrown once or oftener, is \(1 - \frac{625}{1296} = \frac{671}{1296}\); the same as before.
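Both routes to the result, the direct summation and the abridged computation by the contrary event, agree exactly, as this sketch confirms:

```python
from fractions import Fraction
from math import comb

p, q, h = Fraction(1, 6), Fraction(5, 6), 4

# Direct route: sum the terms of (p+q)^4 in which ace occurs once or oftener.
direct = sum(comb(h, k) * p**k * q**(h - k) for k in range(1, h + 1))

# Abridged route: unit diminished by the probability of no ace at all.
complement = 1 - q**h

assert direct == complement == Fraction(671, 1296)
```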

18. Let a shilling be tossed; what is the probability that more than 3 heads will turn up in the first 10 trials? In this case, \(p = \frac{1}{2}, q = \frac{1}{2}, h = 10\); therefore \((p+q)^{10} = (\frac{1}{2} + \frac{1}{2})^{10} = (\frac{1}{2})^{10}(1+1)^{10}\). Now the last term of this development expresses the probability that head will not turn up in any one of the ten trials; the last but one, the probability that it will turn up once; the last but two, the probability that it will turn up twice; and the last but three, the probability that it will turn up three times; therefore the four last terms include all the different ways in which the ten trials give not more than three heads; and their sum consequently expresses the probability that not more than 3 heads will be thrown. Now the last four (or first four) terms of the expansion of \((1+1)^{10}\) are

\[ 1, \quad 10, \quad \frac{10 \cdot 9}{1 \cdot 2} = 45, \quad \frac{10 \cdot 9 \cdot 8}{1 \cdot 2 \cdot 3} = 120, \]

and their sum is 176, which multiplied by \((\frac{1}{2})^{10} = \frac{1}{1024}\), gives \(\frac{176}{1024} = \frac{11}{64}\), for the probability that not more than 3 heads will turn up; whence the probability of the contrary event, or that more than 3 heads will be thrown, is \(1 - \frac{11}{64} = \frac{53}{64}\); and the odds in favour of throwing heads more than three times in 10 trials are 53 to 11.
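The shilling example may be recomputed in a few lines with the binomial coefficients:

```python
from fractions import Fraction
from math import comb

h = 10
half = Fraction(1, 2)

# Probability of 0, 1, 2, or 3 heads in ten tosses of a fair coin.
not_more_than_3 = sum(comb(h, k) for k in range(4)) * half**h
assert not_more_than_3 == Fraction(176, 1024) == Fraction(11, 64)

# More than 3 heads: the contrary event, giving odds of 53 to 11.
assert 1 - not_more_than_3 == Fraction(53, 64)
```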

19. A and B engage in play; the probability of A's winning a game is \(p\), and the probability of B's winning a game is \(q\); required the probability \(P\) of A's winning \(m\) games before B wins \(n\) games, the play being supposed to terminate when either of those events has occurred.

It is evident that the question must be decided at the latest, by the \((m+n-1)\)th game; for supposing \(m+n-2\) games have been played, there is only one combination according to which the match can remain undecided, namely, that in which A has won \(m-1\), and B \(n-1\) games; and in this case the next game necessarily decides the match.

Suppose \(m+x\) games have been played. The probability that of these games \(m\) have been won by A, and \(x\) by B, is represented by the term of the binomial \((p+q)^{m+x}\) in which the factor \(p^m q^x\) occurs (13); which term is

\[ \frac{1 \cdot 2 \cdot 3 \cdots \cdots m+x}{1 \cdot 2 \cdot 3 \cdots \cdots m \times 1 \cdot 2 \cdot 3 \cdots \cdots x} p^m q^x. \]

But A cannot win \(m\) games out of \(m+x\) exactly unless he wins the last game, for otherwise he must have won \(m\) games out of \(m+x-1\), if not out of a smaller number. In order therefore that A may win \(m\) games out of \(m+x\) exactly, it is necessary in the first place that he wins \(m-1\) out of \(m+x-1\) in any order, and then that he wins also the next game. Now the probability of his winning \(m-1\) games out of \(m+x-1\) in any order (13) is

\[ \frac{1 \cdot 2 \cdot 3 \cdots \cdots m+x-1}{1 \cdot 2 \cdot 3 \cdots \cdots m-1 \times 1 \cdot 2 \cdot 3 \cdots \cdots x} p^{m-1} q^x; \]

and the probability of his winning the following game is \(p\), whence the probability of both events is (7)

\[ \frac{1 \cdot 2 \cdot 3 \cdots \cdots m+x-1}{1 \cdot 2 \cdot 3 \cdots \cdots m-1 \times 1 \cdot 2 \cdot 3 \cdots \cdots x} p^m q^x, \]

which, therefore, expresses the probability of A's winning \(m\) games out of \(m+x\) exactly.

If we suppose \(x=0\), this formula becomes \(p^m\), which is the probability of A's winning \(m\) games in succession. If \(x=1\), it becomes \(mp^m q\), the probability that A wins \(m\) games out of \(m+1\). If \(x=2\), it becomes \(\frac{m(m+1)}{1 \cdot 2} p^m q^2\), the probability that A wins \(m\) games out of \(m+2\). If \(x=3\), it becomes \(\frac{m(m+1)(m+2)}{1 \cdot 2 \cdot 3} p^m q^3\), the probability that A wins \(m\) games out of \(m+3\); and so on. Continuing this process till we arrive at the term multiplied by \(p^m q^x\), the sum of the probabilities of all the different compound events is

\[ p^m \left\{ 1 + mq + \frac{m(m+1)}{1 \cdot 2} q^2 + \cdots + \frac{m(m+1) \cdots (m+x-1)}{1 \cdot 2 \cdots x} q^x \right\}, \]

which expresses the probability of A's winning \(m\) games out of a number not greater than \(m+x\).

Now it has been shewn, that the match is necessarily decided by \((m+n-1)\) games; consequently the solution of the question is obtained by substituting \(n-1\) for \(x\) in the last formula, which will then express the probability of A's winning \(m\) games in any order, out of a number not greater than \(m+n-1\). On making this substitution, we obtain

\[ P = p^m \left\{ 1 + mq + \frac{m(m+1)}{1 \cdot 2} q^2 + \cdots + \frac{m(m+1) \cdots (m+n-2)}{1 \cdot 2 \cdots (n-1)} q^{n-1} \right\}. \]

The probability \(Q\) that the match will be decided in favour of B, or that B will win \(n\) games out of a number not greater than \(m+n-1\), is found by changing \(m\) into \(n\), and \(p\) into \(q\), and is therefore

\[ Q = q^n \left\{ 1 + np + \frac{n(n+1)}{1 \cdot 2} p^2 + \cdots + \frac{n(n+1) \cdots (n+m-2)}{1 \cdot 2 \cdots (m-1)} p^{m-1} \right\}. \]

As an example, let us suppose \(p = \frac{2}{3}, q = \frac{1}{3}, m = 4,\) and \(n = 2\). The probability of A's winning the match, or the value of \(P\), becomes

\[ \left( \frac{2}{3} \right)^4 \left\{ 1 + 4 \cdot \frac{1}{3} \right\} = \frac{112}{243}; \]

and the probability of B's winning the match, or the value of \(Q\),

\[ \left( \frac{1}{3} \right)^2 \left\{ 1 + 2 \cdot \frac{2}{3} + 3 \cdot \left(\frac{2}{3}\right)^2 + 4 \cdot \left(\frac{2}{3}\right)^3 \right\} = \frac{131}{243}. \]

In this example the skill of A is supposed to be twice as great as that of B, and the number of games that must be won by him in order to gain the match is also twice as great as the number required to be won by B in order that B may gain; one might therefore suppose, that when they begin to play the chances in favour of each are equal. But the result shows that the chances in favour of A are fewer than those in favour of B in the proportion of 112 to 131; whence it appears that it would be unsafe to wager that a player who has two chances in his favour while his adversary has only one, will gain four games before his adversary shall have gained two.

If A and B, engaged in play, agree to leave off before the match is decided, it is evident that the stakes ought to be shared between them in proportion to their respective probabilities of winning; consequently the share of each is found from the above expressions for \(P\) and \(Q\). This was one of the questions proposed by the Chevalier de Méré to the celebrated Pascal, to which allusion has already been made.
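The series for \(P\) may be computed exactly; the following sketch (in Python, the function name being ours) reproduces the example of the last article, observing that the coefficient of \(q^x\) in the series is \(\frac{m(m+1)\cdots(m+x-1)}{1\cdot2\cdots x}\), a binomial coefficient.

```python
from fractions import Fraction
from math import comb

def duration_of_play(p, m, n):
    # P = p^m { 1 + mq + ... }, the series of art. 19; the coefficient
    # of q^x is m(m+1)...(m+x-1)/x!, i.e. comb(m+x-1, x).
    q = 1 - p
    return p**m * sum(comb(m + x - 1, x) * q**x for x in range(n))

P = duration_of_play(Fraction(2, 3), 4, 2)   # A must win 4 games
Q = duration_of_play(Fraction(1, 3), 2, 4)   # B must win 2 games
print(P, Q, P + Q)   # 112/243 131/243 1
```

That \(P+Q=1\) confirms that the match is necessarily decided within \(m+n-1\) games.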

20. An urn contains \(n+1\) balls, marked with the numbers 0, 1, 2, 3, ..., \(n\); a ball is successively drawn and replaced in the urn, so that the chance of drawing any given number remains the same in each trial; what is the probability that in \(h\) trials the sum of the numbers drawn will be equal to \(s\)?

The solution of this problem depends on the number of ways in which the number \(s\) can be formed by the addition of \(h\) different numbers, each of which may have any value from 0 to \(n\). If we suppose the numbers marked on the balls to be indexes of a certain quantity \(x\), and develop the expression \((x^0 + x^1 + x^2 + \cdots + x^n)^h\), the coefficient of any term of the development will indicate the number of different ways in which the balls may be drawn, so that the sum of the numbers drawn in \(h\) trials shall be equal to the sum of the indexes of \(x\) in that term. If, therefore, we denote by \(N\) the coefficient of that term of the development in which the sum of the indexes is \(s\), then \(N\) will be the number of cases favourable to the event. But the whole number of possible cases is \((n+1)^h\); therefore the probability of the event is \(N/(n+1)^h\).

On account of the particular form of the polynomial in question, the value of \(N\) is found without difficulty.

Because \(x^0 + x^1 + x^2 + \cdots + x^n = \frac{1-x^{n+1}}{1-x}\), therefore

\[ (x^0 + x^1 + x^2 + \cdots + x^n)^h = (1-x^{n+1})^h (1-x)^{-h}. \]

Now, expressing these two factors in series, we have \((1-x^{n+1})^h = 1-hx^{n+1} + \frac{h(h-1)}{1 \cdot 2} x^{2(n+1)} - \frac{h(h-1)(h-2)}{1 \cdot 2 \cdot 3} x^{3(n+1)} + \cdots\)

\[ (1-x)^{-h} = 1+hx + \frac{h(h+1)}{1 \cdot 2} x^2 + \frac{h(h+1)(h+2)}{1 \cdot 2 \cdot 3} x^3 + \cdots \]

and the coefficients of the several terms of the product of these two series in which the sum of the indexes is \( s \) will be found as follows:

1. Multiply the first term of the first series by that term of the second series of which the argument is \( x^s \); the coefficient of the product will be \( \frac{h(h+1)(h+2)\ldots(h+s-1)}{1\cdot2\cdot3\ldots s} \).

2. Multiply the second term of the first series by that term of the second series which has for its argument \( x^{s-n-1} \); the coefficient of the product will be \( -h \times \frac{h(h+1)(h+2)\ldots(h+s-n-2)}{1\cdot2\cdot3\ldots (s-n-1)} \).

3. Multiply the third term of the first series by that term of the second series which has for its argument \( x^{s-2n-2} \); the coefficient of the product will be \( \frac{h(h-1)}{1\cdot2} \times \frac{h(h+1)(h+2)\ldots(h+s-2n-3)}{1\cdot2\cdot3\ldots (s-2n-2)} \).

4. Proceed in the same manner with the fourth term of the first series, and so on with the others, advancing at each new multiplication one term to the right in the first series, and \( n+1 \) terms to the left in the second series, until a term is reached in the first series, the exponent of \( x \) in which is equal to, or greater than \( s \). The sum of the several products thus obtained will be the value of \( N \). We have therefore

\[ N = \frac{h(h+1)(h+2)\ldots(h+s-1)}{1\cdot2\cdot3\ldots s} \]

\[ - \frac{h}{1} \times \frac{h(h+1)(h+2)\ldots(h+s-n-2)}{1\cdot2\cdot3\ldots (s-n-1)} \]

\[ + \frac{h(h-1)}{1\cdot2} \times \frac{h(h+1)(h+2)\ldots(h+s-2n-3)}{1\cdot2\cdot3\ldots (s-2n-2)} \]

\[ \text{etc.} \]

The series now found for \( N \) may be changed into another, having a more elegant form, by reducing all the terms to others having the common denominator \( 1\cdot2\cdot3\ldots h-1 \). This will be accomplished by leaving out of the numerator and denominator of the first term all the numbers after \( h-1 \) to \( s \) (including \( s \)), when \( s \) is greater than \( h-1 \), or by inserting the numbers between \( s \) and \( h-1 \) (the last included), when \( s<h-1 \); by leaving out of the numerator and denominator of the second term all the numbers from \( h-1 \) to \( s-n \), or by inserting those numbers; and so on with the other terms. If we then make the common denominator \( 1\cdot2\cdot3\ldots h-1=k \), we shall have

\[ N = \frac{(s+1)(s+2)(s+3)\ldots(s+h-1)}{k} \]

\[ - \frac{h}{1} \times \frac{(s-n)(s-n+1)\ldots(s-n+h-2)}{k} \]

\[ + \frac{h(h-1)}{1\cdot2} \times \frac{(s-2n-1)(s-2n)\ldots(s-2n+h-3)}{k} \]

\[ \text{etc.} \]

to be continued till the last factor of one of the terms becomes 0 or negative. If we also make

\[ s+h-1=f \]

\[ s-n+h-2=f-(n+1)=f' \]

\[ s-2n+h-3=f-2(n+1)=f'' \]

\[ s-3n+h-4=f-3(n+1)=f''' \]

\[ \text{etc.} \]

and write the factors in each of the terms in the reverse order, the above value of \( N \) will become

\[ N = \frac{f(f-1)(f-2)\ldots(f-h+2)}{k} \]

\[ - \frac{h}{1} \times \frac{f'(f'-1)(f'-2)\ldots(f'-h+2)}{k} \]

\[ + \frac{h(h-1)}{1\cdot2} \times \frac{f''(f''-1)(f''-2)\ldots(f''-h+2)}{k} \]

\[ - \frac{h(h-1)(h-2)}{1\cdot2\cdot3} \times \frac{f'''(f'''-1)(f'''-2)\ldots(f'''-h+2)}{k} \]

\[ \text{etc.} \]

21. As an example of the application of this formula, let it be required to assign the probability of throwing the point 16 with 4 common dice. (Simpson, p.53.)

A die having no face marked 0, it is necessary, in order to adapt the formula to this case, to suppose the number of points on each face to be diminished by unit, which is equivalent to substituting \( s-h \) for \( s \). The numbers are then 0, 1, 2, 3, 4, 5, and we have \( n=5 \), \( h=4 \), and \( s=16-4=12 \). Hence

\[ f=s+h-1=15 \]

\[ f'=f-(n+1)=9 \]

\[ f''=f'-(n+1)=3 \]

\[ f'''=f''-(n+1)=-3, \]

and \( k=1\cdot2\cdot3 \). Substituting these values in the formula, we find

\[ N = \frac{15\cdot14\cdot13}{1\cdot2\cdot3} - 4 \times \frac{9\cdot8\cdot7}{1\cdot2\cdot3} + \frac{4\cdot3}{1\cdot2} \times \frac{3\cdot2\cdot1}{1\cdot2\cdot3} \]

\[ = 455 - 336 + 6, \]

or \( N=125 \). Now the probability of the event is \( N/(n+1)^h \); and in the present case \((n+1)^h=6^4=1296\); consequently the probability required, namely that of throwing the point 16 with 4 dice, is \( \frac{125}{1296} \).
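In modern notation the series for \(N\) is an alternating sum of binomial coefficients, each \(f\)-product of \(h-1\) factors divided by \(k\) being a combination; the following sketch (in Python, names ours) evaluates it and reproduces \(N=125\).

```python
from math import comb

def chances_for_point(h, n, s):
    # N = sum_j (-1)^j C(h,j) C(s - j(n+1) + h - 1, h - 1):
    # the series of art. 20, with f^(j) = s - j(n+1) + h - 1 and
    # f(f-1)...(f-h+2)/k = C(f, h-1).
    N = 0
    for j in range(h + 1):
        f_j = s - j * (n + 1) + h - 1
        if f_j < h - 1:      # last factor 0 or negative: series stops
            break
        N += (-1)**j * comb(h, j) * comb(f_j, h - 1)
    return N

# Art. 21: the point 16 with 4 common dice; s = 16 - 4 = 12, n = 5.
print(chances_for_point(4, 5, 12), "out of", 6**4)   # 125 out of 1296
```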

22. In the numerical solution of questions of this sort, it sometimes happens that the labour may be abridged by computing the probability of throwing a different point from that which is proposed, but which has the same number of chances in its favour. For example, let it be proposed to determine the probability that in throwing 10 dice the sum of the points will be 50. In this case, the smallest number of points that can possibly be thrown is 10, and the greatest 60; and the chances in favour of throwing 10 and of throwing 60 are obviously equal. The probability of throwing any given number of points above 10 is also evidently the same as that of throwing the number which is as much under 60; and consequently the probability of throwing 50 is the same as the probability of throwing 20, these numbers being at equal distances from the extremes. Now to find the probability of throwing 20 with 10 dice, or, which is the same, the probability that in 10 successive drawings from an urn containing 6 balls, marked with the numbers 0, 1, 2, 3, 4, 5, the sum of the numbers drawn will be 10, we have \( h=10 \), \( n=5 \), \( s=10 \); whence \( f=19 \), \( f'=13 \), \( f''=7 \), and \( k=1\cdot2\cdot3\ldots9 \). Substituting these numbers in the series for \( N \), and observing that, since \( f''-h+2=-1 \), the product in the third term contains the factor 0, so that the series reduces to its first two terms, we have

\[ N = \frac{19\cdot18\cdot17\cdot16\cdot15\cdot14\cdot13\cdot12\cdot11}{1\cdot2\cdot3\cdot4\cdot5\cdot6\cdot7\cdot8\cdot9} - 10 \times \frac{13\cdot12\cdot11\cdot10\cdot9\cdot8\cdot7\cdot6\cdot5}{1\cdot2\cdot3\cdot4\cdot5\cdot6\cdot7\cdot8\cdot9} \]

\[ = 92378 - 7150, \]

and consequently \( N=85228 \). Dividing this by \((n+1)^h=6^{10}=60466176 \), the probability of throwing 20, or of throwing 50, with 10 dice is found \( \frac{85228}{60466176} \), or between \( \frac{1}{710} \) and \( \frac{1}{709} \).
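The same number may be checked by expanding the polynomial \((x^0+x^1+\cdots+x^5)^{10}\) directly, as suggested in art. 20; a sketch (in Python, names ours):

```python
def sum_counts(h, n):
    # Coefficients of (x^0 + x^1 + ... + x^n)^h, built up by
    # successive convolution; coeffs[s] is the number of ways
    # the h drawings can sum to s.
    coeffs = [1]
    for _ in range(h):
        new = [0] * (len(coeffs) + n)
        for i, a in enumerate(coeffs):
            for j in range(n + 1):
                new[i + j] += a
        coeffs = new
    return coeffs

counts = sum_counts(10, 5)
print(counts[10])               # 85228 ways of throwing 20 with 10 dice
print(counts[10] == counts[40]) # True: throwing 50 has the same chances
```

The equality of the coefficients at equal distances from the extremes exhibits the symmetry argued in the text.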

23. The probability that the whole number of points drawn in \( h \) trials will not exceed \( s \) is found by substituting for \( s \) the different values 0, 1, 2, ... in the series for \( N \), and taking the sum of the results. This labour, however, may be avoided by means of a known property of the figurate numbers.

24. In how many trials may one undertake, on an equality of chance, to throw ace at least once with a single die?

Let \(q\) be the probability that an event E fails in a single trial, and \(u\) the probability that E will happen once at least in \(h\) trials. The probability that E fails in every one of the \(h\) trials is \(q^h\) (7); hence \(q^h = 1-u\), and \(h = \frac{\log(1-u)}{\log q}\). Let \(1 - u = \frac{\beta}{\gamma}\) and \(q = \frac{b}{c}\); we shall then have \(\log(1-u) = \log \beta - \log \gamma\), and \(\log q = \log b - \log c\); hence

\[ h = \frac{\log \gamma - \log \beta}{\log c - \log b}. \]

Substituting in this general formula the particular numbers given in the question, namely \(b = 5\), \(c = 6\); and supposing \(u = \frac{1}{2}\), and consequently \(\beta = 1\), \(\gamma = 2\), we have \(h = \frac{\log 2}{\log 6 - \log 5}\);

whence, by computing from the logarithmic tables, \(h = 3.8\).

From this it follows, that in four trials the probability of throwing ace once at least, is greater than the probability of not throwing it at all.

If the question had been to determine in how many throws with two dice one may undertake, on an equality of chance, to throw aces at least once, we should have had \(p = \frac{1}{36}\), \(q = \frac{35}{36}\), and consequently \(b = 35\), and \(c = 36\). Substituting these numbers in the general formula, and observing that in this case also \(\beta = 1\), \(\gamma = 2\), we get \(h = \frac{\log 2}{\log 36 - \log 35} = 24.6\).

The probability of not throwing aces once is therefore greater than the opposite probability or that of throwing aces once or oftener, when the number of throws is 24, but less when the number is 25.
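The two logarithmic computations may be checked thus (a sketch in Python; the function name is ours):

```python
from math import log

def trials_for_even_wager(b, c):
    # h = log 2 / (log c - log b): the formula of art. 24 with
    # beta = 1, gamma = 2 (an even wager) and q = b/c.
    return log(2) / (log(c) - log(b))

print(round(trials_for_even_wager(5, 6), 1))    # 3.8  (one die)
print(round(trials_for_even_wager(35, 36), 1))  # 24.6 (two dice)
```

The first value lying between 3 and 4, and the second between 24 and 25, gives at once the conclusions stated in the text.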

These two questions are celebrated in the early history of the theory of Probability, from the circumstance that the Chevalier de Méré, by whom they were proposed to Pascal, declared the two results above stated to be inconsistent with each other, and thence took occasion to question the accuracy of the theory of combinations by means of which they had been obtained. He reasoned thus: Since the probability of throwing ace with one die is \(\frac{1}{6}\), and that of throwing aces with two dice \(\frac{1}{36}\); therefore, if there be a given probability in favour of throwing ace in four throws with one die, there must likewise be the same probability of throwing aces with two dice in \(6 \times 4 = 24\) throws; in other words, the chances in favour of an event E in a single trial, being six times more numerous than those in favour of F, there will be as many chances in favour of F in six trials as there are in favour of E in one. The error consists in supposing that the number of trials must increase or diminish exactly in the inverse ratio of the probability of obtaining the proposed point.

25. The general question may be enunciated as follows:

Let \(p\) = the probability that an event E will happen in a single trial, \(q\) = the probability that it will fail; how many trials are required to give a probability \(u\) that E will happen \(k\) times at least?

Let \(x\) = the number required. Taking the sum of all the terms of the development of \((p + q)^x\) in which the exponent of \(p\) is less than \(k\), we shall have the probability that the event does not happen \(k\) times in \(x\) trials. This sum must consequently be made equal to \(1 - u\); therefore, beginning with the last term, and writing the terms in the reverse order, we have the equation

\[ q^x + x q^{x-1} p + \frac{x(x-1)}{1 \cdot 2} q^{x-2} p^2 + \cdots + \frac{x(x-1) \cdots (x-k+2)}{1 \cdot 2 \cdots (k-1)} q^{x-k+1} p^{k-1} = 1 - u. \]

Let \(p = e q\), and this equation becomes

\[ q^x \left\{ 1 + xe + \frac{x(x-1)}{1 \cdot 2} e^2 + \cdots + \frac{x(x-1) \cdots (x-k+2)}{1 \cdot 2 \cdots (k-1)} e^{k-1} \right\} = 1 - u, \]

from which the value of \(x\) may be found by the ordinary methods of converging series.

If \(p = q = \frac{1}{2}\), then \(e = 1\); and if we also suppose \(u = \frac{1}{2}\), and consequently \(1 - u = \frac{1}{2}\), the equation will become

---

1 Pascal, *Œuvres*, tom. iv. p. 867; Lacroix, *Élémentaire*, p. 36.

\[1 + x + \frac{x(x-1)}{1 \cdot 2} + \frac{x(x-1)(x-2)}{1 \cdot 2 \cdot 3} + \cdots + \frac{x(x-1) \cdots (x-k+2)}{1 \cdot 2 \cdots (k-1)} = \frac{1}{2}(1+1)^x.\]

But the first side of this equation is the expansion of \((1+1)^x\) continued to \(k\) terms; therefore, since the sum of the first \(k\) terms is equal to one-half of the whole series, and the terms of the first half of the series are the same as those of the last, it follows that the whole number of terms must be \(2k\). But the whole number of terms in the expansion of \((1+1)^x\) is \(x+1\); therefore \(2k=x+1\), and \(x=2k-1\). Suppose \(k=10\), then \(x=19\); hence in tossing a shilling it is an even bet that head will turn up 10 times at least in 19 throws.
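That \(x = 2k-1\) makes the wager exactly even can be checked numerically (a Python sketch, names ours):

```python
from fractions import Fraction
from math import comb

def prob_k_or_more_heads(k, x):
    # Sum of the upper terms of (1/2 + 1/2)^x: at least k heads in x tosses.
    return Fraction(sum(comb(x, j) for j in range(k, x + 1)), 2**x)

# Art. 25: with x = 2k - 1 throws, the chance is exactly one-half.
for k in (1, 2, 5, 10):
    assert prob_k_or_more_heads(k, 2 * k - 1) == Fraction(1, 2)
print(prob_k_or_more_heads(10, 19))   # 1/2
```

The equality follows from the symmetry of the binomial coefficients, exactly as argued in the text.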

**Sect. III. Of the Probability of Events Depending on a Repetition of Trials, or Compounded of Any Number of Simple Events, the Chances in Respect of Which Are Known A Priori, and Vary in the Different Trials.**

26. Let us suppose the trials to consist in drawing balls from an urn containing \(a\) white balls, and \(b\) black balls, and that when a ball is extracted it is not returned to the urn. Make \(a+b=c\), and let the extraction of a white ball be the event \(W\), and that of a black ball the event \(B\). At the first trial the probability of \(W\) is \(\frac{a}{c}\) (4), and that of \(B\), \(\frac{b}{c}\). But at the second trial, the number of balls in the urn is diminished by 1; and the probability of drawing a white ball at the second trial is therefore not the same as it was in the first, but is influenced by the event which has already taken place. If \(W\) happened at the first trial, the number of white balls remaining in the urn is then \(a-1\); the number of black is \(b\), and the number of both colours \(c-1\). The probability of \(W\) at the next trial is therefore \(\frac{a-1}{c-1}\), and that of \(B\) is \(\frac{b}{c-1}\). In like manner, if \(B\) happened at the first trial, the probability of \(W\) at the second is \(\frac{a}{c-1}\), and that of \(B\) is \(\frac{b-1}{c-1}\). Hence (7) the different combinations which can arise from two trials are the following:

| WW | WB | BW | BB |
|----|----|----|----|
| \(\frac{a(a-1)}{c(c-1)}\) | \(\frac{ab}{c(c-1)}\) | \(\frac{ba}{c(c-1)}\) | \(\frac{b(b-1)}{c(c-1)}\) |

Now if we neglect the order of succession in the two cases in which \(W\) and \(B\) are combined, the probability of the compound event which consists of the extraction of a ball of each colour in the two trials, is \(\frac{2ab}{c(c-1)}\), and the probabilities of the three possible combinations are respectively:

| WW | WB or BW | BB |
|----|----------|----|
| \(\frac{a(a-1)}{c(c-1)}\) | \(\frac{2ab}{c(c-1)}\) | \(\frac{b(b-1)}{c(c-1)}\) |

Comparing these with the probabilities of the same combinations when the chances are constant, or the ball is returned to the urn after each drawing, namely

\[\frac{a^2}{c^2}, \frac{2ab}{c^2}, \frac{b^2}{c^2},\]

the analogy of the two cases is obvious.

After two balls have been drawn, the whole number remaining in the urn is \(c-2\); but the number of each colour depends on the two events that have already occurred. If two white balls have been drawn, the probability of drawing a white ball at the next trial will be \(\frac{a-2}{c-2}\); but we have just seen that the probability of \(WW\) is \(\frac{a(a-1)}{c(c-1)}\); therefore

(7) the probability of \(WWW\) is \(\frac{a(a-1)(a-2)}{c(c-1)(c-2)}\). The probability of drawing a black ball after two white have been drawn is \(\frac{b}{c-2}\) (for there are now \(c-2\) balls in the urn, of which \(b\) are black); therefore the probability of WWB is \(\frac{a(a-1)b}{c(c-1)(c-2)}\).

On forming in this manner all the different possible combinations which can result from three trials, we find

| WWW | WWB | WBW | BWW | WBB | BWB | BBW | BBB |
|-----|-----|-----|-----|-----|-----|-----|-----|
| \(\frac{a(a-1)(a-2)}{c(c-1)(c-2)}\) | \(\frac{a(a-1)b}{c(c-1)(c-2)}\) | \(\frac{a(a-1)b}{c(c-1)(c-2)}\) | \(\frac{a(a-1)b}{c(c-1)(c-2)}\) | \(\frac{ab(b-1)}{c(c-1)(c-2)}\) | \(\frac{ab(b-1)}{c(c-1)(c-2)}\) | \(\frac{ab(b-1)}{c(c-1)(c-2)}\) | \(\frac{b(b-1)(b-2)}{c(c-1)(c-2)}\) |

If we disregard the order of succession in those combinations into which \(W\) and \(B\) both enter, and consider the occurrence of \(W\) twice and \(B\) once as the same compound event; and also the occurrence of \(W\) once and \(B\) twice as the same compound event, in whatever order they occur, the probability of the former will be \(\frac{3a(a-1)b}{c(c-1)(c-2)}\), and of the latter \(\frac{3ab(b-1)}{c(c-1)(c-2)}\).

27. In general, if \(m+n\) balls have been drawn, of which \(m\) have been found to be white and \(n\) black, the number of white balls in the urn will now be \(a-m\), the number of black \(b-n\), and the whole number of both colours \(c-m-n\). Hence the probability of drawing a white ball in the next trial will be \(\frac{a-m}{c-m-n}\); and that of drawing a black ball \(\frac{b-n}{c-m-n}\). Now if in these two fractions we substitute successively for \(m\) and \(n\) all the different numbers from 0 to \(m-1\) and \(n-1\) respectively, the product of the \(m+n\) numbers thus obtained will (7) be the probability of drawing \(m\) white balls and \(n\) black in an assigned order, in \(m+n\) trials. Let this probability be denoted by \(K\), and we shall have:

\[K = \frac{a(a-1)(a-2)\cdots(a-m+1) \times b(b-1)(b-2)\cdots(b-n+1)}{c(c-1)(c-2)\cdots(c-m-n+1)},\]

whatever the given order may be. Hence, if we denote by \(P\) the probability of \(m\) white balls and \(n\) black being drawn in any order whatever, in \(h = m+n\) trials, we shall have \(P=UK\), where, as in (12), \(U=\frac{1 \cdot 2 \cdot 3 \cdots \cdots h}{1 \cdot 2 \cdot 3 \cdots \cdots m \times 1 \cdot 2 \cdot 3 \cdots \cdots n}\), the co-efficient of that term of the binomial \((p+q)^h\) which has for its argument \(p^m q^n\); this co-efficient expressing all the different arrangements which can be formed of \(m\) things of one kind, and \(n\) things of another.

28. When the urn is supposed to contain balls of more than two different colours, the probability of any proposed number of each colour being drawn in a given number of trials is found with the same facility. Suppose it to contain \(a_1\) of the first colour, \(a_2\) of the second, \(a_3\) of the third, and so on; and let \(a_1+a_2+a_3+\cdots=c\); then the probability that in \( h = m + n + r + \ldots \) trials, there will be drawn \( m \) of the first colour, \( n \) of the second, \( r \) of the third, &c., is

\[ U' \times \frac{a_1(a_1-1)(a_1-2)\ldots(a_1-m+1) \times a_2(a_2-1)(a_2-2)\ldots(a_2-n+1) \times a_3(a_3-1)(a_3-2)\ldots(a_3-r+1) \times \ldots}{c(c-1)(c-2)\ldots(c-h+1)} \]

where, as in (15),

\[ U' = \frac{1 \cdot 2 \cdot 3 \ldots h}{(1 \cdot 2 \cdot 3 \ldots m)(1 \cdot 2 \cdot 3 \ldots n)(1 \cdot 2 \cdot 3 \ldots r) \ldots} \]

29. The following examples will show the use of the preceding formulas.

Suppose a bag to contain 16 balls, of which 8 are white and 8 black, what is the probability that in drawing 8 balls from the bag the whole of them will be white?

Applying the formula (27) to the solution of this question, we have \( a = 8, b = 8, c = 16, m = 8, n = 0 \), and as the probability required is that of drawing white balls only, \( b \) cannot enter into any of the factors of the numerator; hence

\[ K = \frac{8 \cdot 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{16 \cdot 15 \cdot 14 \cdot 13 \cdot 12 \cdot 11 \cdot 10 \cdot 9} = \frac{1}{12870} \]

and since \( m = h, U = 1 \), the probability sought, is therefore

\[ \frac{1}{12870} \]

Let there be a heap of 20 cards, wherein are 7 diamonds, 6 hearts, 4 spades, and 3 clubs; required the probability that in drawing 8 of them at a venture there shall come out 3 diamonds and 2 hearts? (Simpson, p. 21.)

The probability required in this case being that of drawing 3 diamonds, 2 hearts, and 3 other cards which are neither diamonds nor hearts, the spades and clubs may be considered as forming one parcel, containing 7 cards. We have then in the formula (28) \( a_1 = 7, a_2 = 6, a_3 = 7, c = 20; m = 3, n = 2, r = 3, h = 8 \); therefore

\[ U' = \frac{1 \cdot 2 \cdot 3 \cdot 4 \cdot 5 \cdot 6 \cdot 7 \cdot 8}{1 \cdot 2 \cdot 3 \cdot 1 \cdot 2 \cdot 1 \cdot 2 \cdot 3} = 560, \]

and the probability required becomes,

\[ 560 \times \frac{7 \cdot 6 \cdot 5 \times 6 \cdot 5 \times 7 \cdot 6 \cdot 5}{20 \cdot 19 \cdot 18 \cdot 17 \cdot 16 \cdot 15 \cdot 14 \cdot 13} = \frac{1225}{8398}. \]

The odds against the event are therefore 7173 to 1225, or nearly 6 to 1.

Let 4 cards be drawn from a pack of 52; what is the probability of drawing one of each sort?

In this case we have \( a = 13, a_1 = 13, a_2 = 13, a_3 = 13, c = 52 \); also \( m = 1, n = 1, r = 1, s = 1, h = 4 \), whence \( U = \frac{1 \cdot 2 \cdot 3 \cdot 4}{1 \cdot 1 \cdot 1 \cdot 1} = 24 \), and the probability required becomes, on substituting these numbers in the formula (28),

\[ 24 \times \frac{13 \cdot 13 \cdot 13 \cdot 13}{52 \cdot 51 \cdot 50 \cdot 49} = \frac{2197}{20825} = \frac{1}{9} \text{ nearly}. \]

The odds against this event are nearly 8 to 1.
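Both card questions are instances of formula (28); in modern notation the products of decreasing factors gather into combinations, and the probability becomes a ratio of numbers of combinations. A sketch (in Python, names ours):

```python
from fractions import Fraction
from math import comb

def drawing_prob(parcels, drawn):
    # Probability of drawing exactly drawn[i] cards from the i-th
    # parcel of parcels[i] cards, without replacement; equivalent
    # to formula (28) with the factors gathered into combinations.
    c, h = sum(parcels), sum(drawn)
    favourable = 1
    for a, m in zip(parcels, drawn):
        favourable *= comb(a, m)
    return Fraction(favourable, comb(c, h))

# 20-card heap: 7 diamonds, 6 hearts, 7 others; draw 3, 2, 3.
print(drawing_prob((7, 6, 7), (3, 2, 3)))            # 1225/8398
# One card of each suit from a pack of 52.
print(drawing_prob((13, 13, 13, 13), (1, 1, 1, 1)))  # 2197/20825
```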

30. The following question, proposed by Huygens, and solved by De Moivre and Bernoulli (Ars Conjectandi, p. 59), belongs to the class of problems now under consideration.

An urn contains 12 balls, of which 4 are white and 8 black. Three gamblers A, B, and C agree that the first who, blindfolded, shall draw a white ball shall be the winner of the stakes. They also agree that A shall draw first, B second, C third, A fourth, and so on; and the balls drawn are not replaced in the urn. It is proposed to find their respective probabilities of winning.

Here the play terminates as soon as a white ball is drawn, and it must therefore terminate with the 9th trial, if not sooner, inasmuch as, after 8 black balls have been drawn, the urn will contain only white balls, and the probability of drawing a white ball at the next trial will become certainty. The question will therefore be solved, if we determine the probabilities of the play ending with the 1st, 2nd, 3rd, 4th, &c., games respectively, and take the sum of the probabilities of its ending with the 1st, 4th, and 7th, for the probability of A's winning; the sum of the probabilities of its ending with the 2nd, 5th, and 8th, for the probability of B's winning; and the sum of the probabilities of its ending with the 3rd, 6th, and 9th, for the probability of C's winning.

For the sake of rendering the solution more general, let \( a \) be the number of white balls in the urn, \( b \) the number of black, and let \( a + b = c \). The probability of drawing a white ball at the first trial, or of the play ending with the first trial, is then \( \frac{a}{c} \).

The probability of the play ending with the second trial is compounded of the probability of a black ball being drawn at the first trial, and a white at the second; and the probability of both events (25) is \( \frac{b}{c} \cdot \frac{a}{c-1} \).

The probability of the play ending with the third trial is compounded of three separate probabilities, namely, that a black ball will be drawn at the first trial; that a black ball will be drawn at the second; that a white ball will be drawn at the third; and the probability of the concourse of these events (25) is \( \frac{b(b-1)a}{c(c-1)(c-2)} \).

In general, the probability of a black ball being drawn at each of the first \( x - 1 \) trials, and a white ball at the \( x \)th, is \( \frac{b(b-1)(b-2)\ldots(b-x+2)\,a}{c(c-1)(c-2)\ldots(c-x+1)} \). Substituting for \( a, b, \) and \( c \) in this formula the numbers proposed by Huygens, we obtain, in respect of the 1st, 4th, and 7th trials, or the probability in favour of A,

\[ \frac{4}{12} + \frac{8\cdot7\cdot6\cdot4}{12\cdot11\cdot10\cdot9} + \frac{8\cdot7\cdot6\cdot5\cdot4\cdot3\cdot4}{12\cdot11\cdot10\cdot9\cdot8\cdot7\cdot6} = \frac{77}{165}; \]

in respect of the 2nd, 5th, and 8th trials, or the probability in favour of B,

\[ \frac{8\cdot4}{12\cdot11} + \frac{8\cdot7\cdot6\cdot5\cdot4}{12\cdot11\cdot10\cdot9\cdot8} + \frac{8\cdot7\cdot6\cdot5\cdot4\cdot3\cdot2\cdot4}{12\cdot11\cdot10\cdot9\cdot8\cdot7\cdot6\cdot5} = \frac{53}{165}; \]

and in respect of the 3rd, 6th, and 9th trials, or the probability in favour of C,

\[ \frac{8\cdot7\cdot4}{12\cdot11\cdot10} + \frac{8\cdot7\cdot6\cdot5\cdot4\cdot4}{12\cdot11\cdot10\cdot9\cdot8\cdot7} + \frac{8\cdot7\cdot6\cdot5\cdot4\cdot3\cdot2\cdot1\cdot4}{12\cdot11\cdot10\cdot9\cdot8\cdot7\cdot6\cdot5\cdot4} = \frac{35}{165}. \]

The chances in favour of A, B, and C, are therefore proportional to the numbers 77, 53, 35, respectively.

If the condition of the play had been that the ball was to be returned to the urn after each trial, the chances in favour of the three gamblers would have been easily found by the formula (12) to be respectively as the numbers 9, 6, and 4.
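Huygens's question may be computed exactly by following the play drawing by drawing (a sketch in Python, names ours):

```python
from fractions import Fraction

def shares(a, b, players=3):
    # Chance that the play ends at the x-th drawing (the first white
    # ball), credited to the players in rotation; the balls drawn
    # are not replaced in the urn.
    c = a + b
    result = [Fraction(0)] * players
    all_black_so_far = Fraction(1)   # chance the first x-1 draws were black
    for x in range(1, b + 2):        # the play ends by the (b+1)-th draw
        result[(x - 1) % players] += all_black_so_far * Fraction(a, c - x + 1)
        all_black_so_far *= Fraction(b - x + 1, c - x + 1)
    return result

s = shares(4, 8)
assert s == [Fraction(77, 165), Fraction(53, 165), Fraction(35, 165)]
print(s)
```

The three shares sum to unity, and stand in the proportion 77 : 53 : 35 stated in the text.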

**SECT. IV. OF MATHEMATICAL AND MORAL EXPECTATION.**

31. In the theory of probability, the term expectation is used to denote the product found by multiplying the value of a casual benefit into the probability of the event on which it is contingent taking place. But the value of a benefit may be estimated either with respect to its absolute amount, or to the amount of relative advantage it affords the individual who receives it. This consideration has led to a distinction between mathematical and moral expectation. When we place the circumstances of the individual entirely out of consideration, and have regard merely to the abstract or absolute value of the benefit, the product of its amount by the probability of obtaining it is the mathematical expectation of the individual; but when a relative value is assigned to the benefit, the product of this relative value by the probability of obtaining it is called the moral expectation, because it is estimated by certain moral considerations respecting the circumstances or fortune of the individual in whose favour the expectation exists, on the principle that a sum of money which may be relatively of very little importance to a man in possession of a large fortune may be of great importance to another who is less favourably circumstanced. We shall first consider the mathematical expectation.

32. Suppose A and B to engage in play; let \( p \) be the probability of A's winning a game, \( q \) the probability of B's winning it, and \( s \) a sum of money staked on the issue of the game. By the definition, the mathematical expectation of A is \( ps \), and that of B is \( qs \). Now if we suppose these expectations to be purchased by A and B, the sums they ought respectively to pay for them, or in other words to stake on the issue of the game, must be proportional to their respective expectations, in order that they may play on equal terms. Let therefore \( a \) be the sum staked by A, and \( b \) the sum staked by B, we have then \( ps : qs :: a : b \), and consequently \( pb = qa \). Now suppose \( a + b = s \), or that the sum played for is the amount of the stakes; then, since \( b \) is the sum A expects to gain, and \( p \) is the probability of his gaining it, \( pb \) is the mathematical value of A's expectation of gain. In like manner \( qa \) is the mathematical value of B's expectation of gain. Hence it follows, that when the sum staked by each is proportional to his probability of winning, the mathematical expectations of the two players are equal; so that after the stakes have been placed, and before the event is decided, they might exchange places without advantage or disadvantage to either. It follows likewise, that since the sum which the one must gain is just that which the other must lose, the product \( qa \), which is B's expectation of gain, may be regarded as A's expectation of loss; or (if taken with a negative sign) as part of A's whole expectation, which then becomes \( pb - qa \). But \( pb - qa = 0 \); whence the condition of A before the event is decided is not altered by the circumstance of his having staked on the issue of the play.
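The equality of the two expectations may be illustrated numerically (a sketch in Python; the particular sum played for is an illustrative value of our own choosing):

```python
from fractions import Fraction

# Art. 32: stakes proportional to the chances of winning.
p = Fraction(2, 3)          # A's chance of winning the game
q = 1 - p                   # B's chance
s = Fraction(30)            # sum played for (an illustrative value)
a, b = p * s, q * s         # a + b = s, and ps : qs :: a : b

print(p * b - q * a)        # A's whole expectation of gain: 0
print(a + b == s)           # True
```

Whatever value of \(s\) be taken, the expectation \(pb - qa\) is nought so long as the stakes are in the ratio \(p : q\).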

33. This conclusion at first sight appears paradoxical; for it is certain, that after the stakes are placed, A must either gain the sum \( b \) or lose \( a \), and therefore his fortune will of necessity either be increased by the gain of his adversary's stake, or diminished by the loss of his own. The explanation depends on theorems which will afterwards be demonstrated relative to the repetition of trials, from which it results, that though in a single trial the player must either lose or gain, yet on multiplying sufficiently the number of games, a probability will at length be obtained, approaching as nearly to certainty as we please, that the sum gained or lost in the long run will not exceed a certain given fraction (which may be as small as we please) of the whole sum staked, provided the play is undertaken on terms of mathematical equality. But this indefinite repetition of the hazard is practically impossible; and innumerable cases may easily be imagined, in which an individual will be guided by other considerations than the mere mathematical value of the expectation in undertaking or declining a risk. A person of moderate fortune would scarcely be persuaded to risk L500 for the expectation of gaining L5, though the chances might be 100 to 1 in favour of the event which would produce that sum; but numbers would be found willing enough to pay L5 for the expectation of gaining L500, the chances being 100 to 1 against them. In both cases, however, the expectation would be purchased at its real abstract value. According to the formula of mathematical expectation, the man whose sole fortune consists of a lottery ticket which has an equal chance of turning up a prize of L20,000 or a blank, is in as advantageous a position as he who is in possession of L10,000; yet no man of ordinary prudence, if offered his choice of the two states, would hesitate as to which he ought to give the preference.
Common sense will prevent a man from risking a sum, the loss of which would be attended with great privations, even when, mathematically speaking, the chances are considerably in his favour. It is also obvious that two individuals whose fortunes are very unequal cannot engage in play with the same advantage, although the chances in favour of each, in respect of a single game, are precisely the same. The one who has a large fortune can repeat the hazard so often as to obtain a probability almost equal to certainty that his loss will not amount to any given sum; whereas the other, who cannot continue the play in case of loss, runs the risk of being ruined. It is thus evident, that in a multitude of cases the abstract theory of probability is not alone sufficient to give the value of an expectation, and that in dealing with contingent events, an individual must be guided to a certain extent by considerations of relative advantage.

34. Various hypotheses have been imagined for the purpose of reducing such relative or moral considerations to accurate calculation; but that which appears the most natural, and applicable to the greatest number of cases, consists in supposing the relative value of any infinitely small sum to be directly proportional to its absolute value, and inversely as the fortune of the individual who has an expectation of receiving it. This principle was first proposed by Daniel Bernoulli in the Petersburg Commentaries (vol.v.), and is there applied by him to the solution of a number of questions of great practical interest.

Let \( x \) be the absolute value of the capital, or, as it is denominated by Laplace, the physical fortune, of an individual; then, according to the hypothesis of Bernoulli, the moral advantage which he derives from an infinitely small increment of fortune \( dx \), is measured by the expression \( \frac{c\,dx}{x} \), \( c \) being a constant to be determined by the nature of the question. Now, if we suppose the physical fortune to arise from the accumulation of the elements \( dx \), and denote by \( y \) the relative or moral value of the fortune, of which the absolute or physical value is \( x \), we shall have

\[ y = \int \frac{c\,dx}{x} = c \log x + \text{constant}. \]

To determine the constant, we may suppose \( y = 0 \) when \( x \) has a given value \( = a \); this gives \( 0 = c \log a + \text{constant} \), whence \( y = c (\log x - \log a) \), or \( y = c \log \frac{x}{a} \); and it is to be observed, that the value of \( x \) can never become zero or negative, for as Bernoulli has remarked, it is only the person who is dying of hunger that can be said to possess absolutely nothing. In every other circumstance the mere possession of existence may be accounted a moral advantage, to which, however, it would be absurd to attempt to assign a numerical value.

35. From the above formula, it is easy to deduce a numerical expression for the value of a moral expectation. Let \( a \) be the original fortune of the individual, and \( \alpha, \beta, \gamma, \ldots \) sums to be received on the occurrence of certain contingent events, E, F, G, &c. This being supposed, if the event E happens, the absolute fortune of the individual becomes \( a + \alpha \), and its relative value, therefore, according to the formula, is \( c \log \frac{a + \alpha}{a} \). If F happens, his absolute fortune becomes \( a + \beta \), to which the corresponding relative value is \( c \log \frac{a + \beta}{a} \); and so on. Now, let the probabilities of the events E, F, G, &c. be respectively \( p, q, r, \ldots \) (assuming \( p + q + r + \ldots = 1 \), so that one or other of the events will necessarily happen), and let Y represent the relative fortune of the individual arising from his expectation, then, since the value of a benefit in expectation is equal to the amount of the benefit multiplied by the probability of obtaining it; we have

\[ Y = c \left( p \log \frac{a+\alpha}{a} + q \log \frac{a+\beta}{a} + r \log \frac{a+\gamma}{a} + \ldots \right). \]

Let also \( X \) denote the absolute value of \( Y \); then, by the formula, we have \( Y = c \log \frac{X}{a} \). On comparing these two values of \( Y \), we get

\[ \log \frac{X}{a} = p \log \frac{a+\alpha}{a} + q \log \frac{a+\beta}{a} + r \log \frac{a+\gamma}{a} + \ldots \]

and on passing to numbers,

\[ \frac{X}{a} = \frac{(a+\alpha)^p (a+\beta)^q (a+\gamma)^r \ldots}{a^p\, a^q\, a^r \ldots}, \]

therefore, since \( p + q + r + \ldots = 1 \),

\[ X = (a+\alpha)^p (a+\beta)^q (a+\gamma)^r \ldots \]

In this expression \( X \) denotes the absolute value of the original fortune and of the expectation added together; if, therefore, we deduct \( a \) from \( X \), the difference will be the value of the expectation, or the sum which, if it were to be received certainly, would procure the individual the same relative advantage as his expectation.
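The formula of art. 35 is, in modern terms, a probability-weighted geometric mean, and is easily computed; a minimal sketch (the function names are mine, not the article's):

```python
import math

def moral_fortune(a, outcomes):
    """X = product of (a + alpha_i)^(p_i) over the contingent sums alpha_i
    with probabilities p_i (summing to 1), as in art. 35."""
    return math.exp(sum(p * math.log(a + alpha) for p, alpha in outcomes))

def moral_value(a, outcomes):
    """The sum which, received with certainty, would procure the same
    relative advantage as the expectation: X - a."""
    return moral_fortune(a, outcomes) - a

# Art. 36 in miniature: when the contingent sums are small compared with a,
# the moral value approaches the mathematical expectation (here 5 crowns).
print(round(moral_value(10_000, [(0.5, 10), (0.5, 0)]), 3))   # 4.999
```

The slight shortfall below 5 crowns is the second-order quantity neglected in art. 36.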

36. If the sums \( \alpha, \beta, \gamma, \ldots \) are supposed to be very small in comparison of \( a \), so that quantities of the order \( \left( \frac{\alpha}{a} \right)^2 \) may be neglected, the preceding equation becomes

\[ X = a^{p+q+r+\ldots} \left( 1 + \frac{p\alpha + q\beta + r\gamma + \ldots}{a} \right), \]

whence, since \( p + q + r + \ldots = 1 \),

\[ X = a + p\alpha + q\beta + r\gamma + \ldots \]

Deducting from this the original fortune \( a \), the remainder \( p\alpha + q\beta + r\gamma + \ldots \) is the value of the expectation, or the sum equivalent to the moral advantage. But the value of the mathematical expectation of the benefits \( \alpha, \beta, \gamma, \ldots \) of which the probabilities are respectively \( p, q, r, \ldots \) is also \( p\alpha + q\beta + r\gamma + \ldots \) (31); therefore, when the contingent benefits are very small in comparison of the original fortune, the moral advantage and the mathematical expectation are sensibly the same.

37. From the formula \( X = (a+\alpha)^p (a+\beta)^q (a+\gamma)^r \ldots \) Bernoulli deduces the consequence that gambling or betting is attended with a moral disadvantage, even when the chances of gain or loss, mathematically speaking, are perfectly equal. To show this, he proposes the following question. A, whose fortune is 100 crowns, bets 50 crowns with B, on the issue of an event of which the probability is \( \frac{1}{2} \); on these terms: if the event happens, A is to receive from B 50 crowns; if it fails, he is to pay B 50 crowns; what is the relative value of A's fortune? In this case, we have \( a = 100, \alpha = 50, \beta = -50 \); also \( p = \frac{1}{2}, q = \frac{1}{2} \); and the formula (35) becomes

\[ X = (100+50)^{\frac{1}{2}} \times (100-50)^{\frac{1}{2}}, \]

whence \( X = \sqrt{150 \times 50} = 87 \); and, consequently, the condition of A is worse by 13 crowns than it was before he hazarded the bet. The moral disadvantage is therefore equivalent to this sum, though the terms of the play, according to the mathematical theory, are equal.
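The figure quoted in art. 37 can be verified in one line (a check, not part of the original text):

```python
import math

# Art. 37: A's fortune is 100 crowns, and he makes an even bet of 50 crowns.
X = math.sqrt((100 + 50) * (100 - 50))
print(round(X, 1))   # 86.6, i.e. about 13 crowns below his original 100
```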

38. The conclusion arrived at in this particular case is easily shown to be universally true. Let \( a \) be the capital of the player, \( p \) his probability of winning, \( q \) his probability of losing, and \( s \) the sum at stake. In order that he may play on terms of mathematical equality, the part of the stakes contributed by himself, or the sum which he can lose, must be \( ps \) (32), and the part contributed by his adversary, or that which he may gain, must be \( qs \). The equation in (35) therefore becomes

\[ X = (a+qs)^p \times (a-ps)^q, \]

and if it can be shewn that this value of \( X \) is less than \( a \), it will follow that his condition is rendered worse in consequence of having staked on the game. Now, dividing by \( a \), and taking the logarithm of both sides of the equation, we get \( \log \frac{X}{a} = p \log \left(1 + \frac{qs}{a}\right) + q \log \left(1 - \frac{ps}{a}\right) \), the differential of which (making \( s \) variable) is

\[ d \log \frac{X}{a} = \frac{pq\,ds}{a} \left( \frac{1}{1 + \frac{qs}{a}} - \frac{1}{1 - \frac{ps}{a}} \right). \]

But the second side of this equation is evidently negative; therefore \( \log \frac{X}{a} \) diminishes as \( s \) increases, and, since it vanishes when \( s = 0 \), it is negative for every positive value of \( s \); consequently \( X \) must be less than \( a \). In all cases, therefore, the bet, if on even terms, produces a moral disadvantage.

39. Another consequence deduced by Bernoulli from this theory of moral expectation, is, that when property of any kind is exposed to a risk or hazard, it is more advantageous to expose it in parts to several risks independent of each other, than to expose the whole at once to a single risk, although the probability of loss be in both cases precisely the same. To prove this, he takes the following example. A merchant has a capital of L4000, besides goods of the value of L8000, which must be transported by sea. The probability of the loss of a vessel in the voyage being \( \frac{1}{10} \), let it be proposed to find the value of the moral expectation of the merchant in the case of the goods being embarked in a single vessel, and also in the case of one half being embarked in one vessel and the other half in another. Supposing the merchandise embarked in one ship, the absolute fortune of the merchant will be increased to L12,000 in the event of the safe arrival of the ship, and will be reduced to L4000 in the event of its being lost. The probability of the first of these events is \( \frac{9}{10} \), and of the second \( \frac{1}{10} \); therefore his absolute fortune becomes, in virtue of his expectation,

\[ X = (12,000)^{\frac{9}{10}} \times (4000)^{\frac{1}{10}}, \]

whence \( X = 10751 \). Deducting his other capital, L4000, there remains L6751 for the value of the moral expectation in respect of the venture.

Let us next suppose the merchandise embarked in equal parts in two ships. In this case there are three compound events to be considered, 1st, Both vessels may arrive in safety; the probability of which is \( \frac{9}{10} \times \frac{9}{10} = \frac{81}{100} \); 2d, One may arrive in safety and the other be lost; the probability of which, as it may happen in two ways, (11) is \( 2 \times \frac{9}{10} \times \frac{1}{10} = \frac{18}{100} \); 3d, Both may be lost; the probability of which is \( \frac{1}{10} \times \frac{1}{10} = \frac{1}{100} \). If the first of these events happen, the capital of the merchant will become L4000 + L8000 = L12,000; if the second happen it will be L4000 + L4000 = L8000; and if the third happen it will be only L4000. With these numbers the formula becomes

\[ X = (12{,}000)^{\frac{81}{100}} \times (8000)^{\frac{18}{100}} \times (4000)^{\frac{1}{100}}, \]

whence \( X = 11033 \). Deducting his other capital, which was exposed to no risk, there remains L7033 for the value of the moral expectation. This sum exceeds the former by L282; and it is easily found by following the same process of reasoning, that in proportion as the risk is divided among a greater number of ships, the moral expectation is increased, and approaches its limit, which is the value of the mathematical expectation, or \( \frac{9}{10} \) of L8000 = L7200.
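The two moral fortunes of art. 39 can be recomputed directly; a sketch (the helper function is mine):

```python
import math

def moral_fortune(cases):
    """X = product of F_i^(p_i) over the possible final fortunes F_i,
    with probabilities p_i summing to 1 (art. 35)."""
    return math.exp(sum(p * math.log(f) for p, f in cases))

# Art. 39: capital L4000 safe on shore, L8000 of goods at sea; p(loss) = 1/10.
one_ship  = moral_fortune([(0.9, 12_000), (0.1, 4_000)])
two_ships = moral_fortune([(0.81, 12_000), (0.18, 8_000), (0.01, 4_000)])
print(round(one_ship), round(two_ships))   # near 10751 and 11033
print(round(two_ships - one_ship))         # about 282: the gain from dividing the risk
```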

40. The theory of moral expectation enables us likewise to assign the circumstances in which it is advantageous, or otherwise, to insure property against particular hazards.

There are three principal questions to be considered in reference to this subject: 1. The amount of premium the insured may pay without disadvantage; 2. The ratio of his fortune to the value of the sum exposed to risk, in order that it may be advantageous to insure at a given premium; and 3. The capital which the insurer or underwriter ought to possess, in order that he may insure a given risk with probable advantage to himself, and safety to the insured.

Let \( s \) be the value of a cargo which a merchant embarks in a ship, \( p \) the probability of the safe arrival of the vessel, \( q \) that of its loss, and \( a \) his capital independently of \( s \). The mathematical value of the premium for insurance is \( qs \); for, if we denote the premium by \( y \), then \( y \) is the sum the insurer will gain if the vessel reaches its destination in safety, and \( s-y \) is the sum he will lose if it does not; and by the theorem for the mathematical expectation \( py = q(s-y) \), whence, since \( p+q=1 \), \( y=qs \).

If, therefore, the merchant insures the cargo, his absolute fortune becomes \( a+s-qs = a+ps \); and if he does not insure, it is the value of \( X \) in the equation \( X=(a+s)^p a^q \). Hence it will be advantageous or otherwise to insure according as \( a+ps \) is greater or less than \( (a+s)^p a^q \). Now the logarithm of the first of these expressions, or \( \log(a+ps) \), is equivalent to the integral \( \int \frac{p\,ds}{a+ps} \); and the logarithm of the second, or \( p \log(a+s)+q \log a \), is equivalent to \( \int \frac{p\,ds}{a+s} \); but since \( p \) is a proper fraction, \( a+ps \) is less than \( a+s \), and therefore the first integral is greater than the second. Consequently \( a+ps \) is, in general, greater than \( (a+s)^p a^q \), and the insurance is attended with advantage. Let us now assume \( x=a+ps-(a+s)^p a^q \), and \( x \) will be the sum the merchant could afford to pay the insurer above the mathematical value of the risk without moral disadvantage. If he pays less than \( qs+x \), his relative fortune is increased by insuring; and if he pays more he is a loser. In practice the premium may be considered as less than \( qs+x \), but greater than \( qs \); so that while the insured pays more than the mathematical value of the risk, he gains a moral advantage by the transaction.

To solve the second question, let \( e \) be the premium demanded for insuring the amount \( s \); then, the other capital of the merchant being \( a \), his fortune after being insured is \( a+s-e \); while if he takes the risk on himself, its value becomes \( (a+s)^p a^q \). If, therefore, the value of \( a \) be determined from the equation \( a+s-e=(a+s)^p a^q \), we shall have the amount of capital he ought to possess in order that it may be morally a matter of indifference to him whether he insures or not. As an example, let the value of the merchandise, or \( s \), be L10,000, \( e=L800 \), and \( p=\frac{19}{20} \). The equation then becomes

\[ a+9200=(a+10000)^{\frac{19}{20}}\, a^{\frac{1}{20}}, \]

whence \( a \) is found by approximation \( =5043 \). It follows, therefore, that unless his other capital amounts to L5043, it would be disadvantageous to neglect insuring, although the premium demanded exceed the mathematical value of the risk (which is \( \frac{1}{20} \times L10,000 = L500 \)) by L300.

The third question, the amount of capital the underwriter ought to possess, is determined precisely in the same way. Let \( b \) be his capital. After accepting the risk of the sum \( s \) for the premium \( e \), his capital will become \( b+e \) in the case of the vessel arriving in safety, and \( b+e-s \) in the case of its being lost. The formula of the moral expectation therefore becomes \( X=(b+e)^p (b+e-s)^q \); and in order that there may be neither advantage nor disadvantage in undertaking the risk, this value of \( X \) must be equal to his original capital, \( b \). Supposing, therefore, \( s, e, p, q \), to have the same significations as above, the equation from which \( b \) is to be determined is \( b=(b+800)^{\frac{19}{20}}(b-9200)^{\frac{1}{20}} \), whence \( b=14243 \).

Unless, therefore, the capital of the insurer amounts to L14,243, there would be a moral disadvantage in undertaking the risk of insuring a cargo worth L10,000 for a premium of L800; and it is easy to see, that if a smaller premium was demanded, the capital ought to be still greater. On making \( e=600 \), (which still exceeds the mathematical value of the risk), the value of \( b \) becomes L29,878. Hence it follows, that a company possessing a large capital may not only with safety engage in speculations which might prove ruinous to another whose resources are more limited, but even derive from them a sure profit.
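The two equations of art. 40 are transcendental and were solved "by approximation"; a bisection sketch (solver and names are mine) reproduces the quoted values:

```python
def bisect(f, lo, hi, tol=1e-7):
    """Plain bisection, assuming f changes sign between lo and hi."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0

p, q, s, e = 19 / 20, 1 / 20, 10_000.0, 800.0

# Second question: capital a at which insuring is a matter of indifference,
#   a + s - e = (a + s)^p * a^q.
a_indiff = bisect(lambda a: a + s - e - (a + s) ** p * a ** q, 1.0, 100_000.0)

# Third question: underwriter's capital b, from b = (b + e)^p * (b + e - s)^q.
b_indiff = bisect(lambda b: (b + e) ** p * (b + e - s) ** q - b,
                  s - e + 1.0, 100_000.0)

print(round(a_indiff), round(b_indiff))   # close to 5043 and 14243
```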

41. The theory of moral expectation which we have now been considering had its origin in a problem proposed by Nicolas Bernoulli to Montmort, which, from its having been discussed at great length by Daniel Bernoulli in the Petersburg Memoirs, has been usually called the Petersburg problem. It is this: A and B play at heads and tails. A agrees to pay B 2 crowns if head turn up at the first throw; 4 crowns if it turn up at the second, and not before; 8 if it turn up at the third, and not before; and, in general, \( 2^n \) crowns if it turn up at the \( n \)th throw, and not before: required the value of B's expectation? Here the probability of head turning up at the first throw is \( \frac{1}{2} \); the probability of its turning up at the second, and not at the first, is \( \frac{1}{2} \times \frac{1}{2} = \frac{1}{4} \); the probability of its not turning up either at the first or second, and of its turning up at the third, \( \frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} = \frac{1}{8} \), and so on. Hence the probabilities of B receiving 2, 4, 8, 16,...\( 2^n \) crowns are respectively \( \frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{16},...,\frac{1}{2^n} \), consequently (31)

the mathematical value of B's expectation is

\[ \frac{1}{2} \times 2 + \frac{1}{4} \times 4 + \frac{1}{8} \times 8 + \frac{1}{16} \times 16 + ... + \frac{1}{2^n} \times 2^n \text{ crowns}. \]

Now, as no limit can be assigned to \( n \), inasmuch as it is possible that head may not turn up till after a very great, or any assignable number, of throws, this series, of which each term is unity, may go on for ever, and consequently the value of B's expectation becomes infinite. Yet it is obvious that no one would pay any considerable sum for the expectation. This disagreement between the dictates of common sense and the results of the mathematical theory, appeared to Montmort to involve a great paradox; although the question differs in this respect from no other question of chances in which the contingent benefit is very great, and the probability of receiving it very small. If the play could be repeated an infinite number of times, B might undertake to pay without disadvantage any sum, however large, for his expectation. A result, however, more in accordance with ordinary notions, is obtained from the principle of Bernoulli. Let \( a \) be the amount of B's fortune before the play begins, \( x \) the value of his expectation, or the sum he pays A in consideration of the agreement, and make \( z=a-x \). If head turn up at the first throw, B's fortune becomes \( z+2 \); if at the second, and not before, \( z+2^2 \); if at the third, and not before, \( z+2^3 \); and so on. But the probabilities of these events being respectively \( \frac{1}{2}, \frac{1}{4}, \frac{1}{8},...,\frac{1}{2^n} \), the formula for the moral expectation becomes (35)

\[ X=(z+2)^{\frac{1}{2}}(z+2^2)^{\frac{1}{4}}(z+2^3)^{\frac{1}{8}}...(z+2^n)^{\frac{1}{2^n}}. \]

Now the sum which B ought to pay will be determined by making the value of his moral expectation, after the bet, and before the play begins, equal to his previous fortune; we have therefore \( a=X \), that is,

\[ a=(z+2)^{\frac{1}{2}}(z+2^2)^{\frac{1}{4}}(z+2^3)^{\frac{1}{8}}...(z+2^n)^{\frac{1}{2^n}}. \]

---

* See the Commentarii Acad. Petropolitanae, tom. v.; Laplace, Théorie des Prob. p. 432; Lacroix, Traité Élémentaire, p. 132. The general term of this series being \((z+2^n)^{\frac{1}{2^n}} = 2^{\frac{n}{2^n}}\left(1+\frac{z}{2^n}\right)^{\frac{1}{2^n}}\), the equation may be put under the form

\[a = 2^{\frac{1}{2}} \cdot 2^{\frac{2}{4}} \cdot 2^{\frac{3}{8}} \cdots \times \left(1+\frac{z}{2}\right)^{\frac{1}{2}} \left(1+\frac{z}{4}\right)^{\frac{1}{4}} \left(1+\frac{z}{8}\right)^{\frac{1}{8}} \ldots\]

and since the logarithm of the first factor of this expression is \(\left(\frac{1}{2} + \frac{2}{4} + \frac{3}{8} + \frac{4}{16} + \ldots\right) \log 2 = \frac{1}{2} \left(1 - \frac{1}{2}\right)^{-2} \log 2 = 2 \log 2\), we have

\[\log a = 2 \log 2 + \frac{1}{2} \log \left(1 + \frac{z}{2}\right) + \frac{1}{4} \log \left(1 + \frac{z}{4}\right) + \frac{1}{8} \log \left(1 + \frac{z}{8}\right) + \ldots\]

from which a value of \(z\) may be found by trial and error for any given value of \(a\). Suppose \(z=100\); on computing the first 10 terms of the series there results \(a=107.89\), whence (since \(x=a-z\)) \(x=7.89\); that is to say, if B possessed only 100 crowns before beginning the play, it would be morally disadvantageous for him to risk 8 crowns for the expectation, although its mathematical value be infinitely great. If we suppose \(z=1000\), the sum of 11 terms gives \(a=1011\), nearly; so that if B possessed a fortune of 1011 crowns, the value of the moral expectation would, to him, be about 11 crowns.
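The trial-and-error solution of the footnote is easily mechanised; a sketch (truncating the infinite product, whose later factors are insensibly near 1; the function name is mine):

```python
import math

def fortune_before_play(z, terms=40):
    """a = (z+2)^(1/2) * (z+4)^(1/4) * (z+8)^(1/8) * ...: B's fortune
    before the play, of which x = a - z is the moral value of the
    Petersburg expectation."""
    return math.exp(sum(math.log(z + 2.0 ** n) / 2.0 ** n
                        for n in range(1, terms + 1)))

for z in (100, 1000):
    a = fortune_before_play(z)
    print(z, round(a - z, 2))   # moral values near 7.89 and 11 crowns
```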

It is scarcely necessary to remark, that the results deduced from the principle of Bernoulli are of a character widely different from those which are calculated according to the mathematical expectation. The latter gives the precise value of a contingent benefit, without any assumption or hypothesis respecting the personal circumstances of the individual who may gain or lose it; whereas the considerations of relative advantage, of which it is the object of Bernoulli's theory to take account, are entirely arbitrary, and by their very nature incapable of being made the subject of accurate computation. It is evidently impossible to have regard to, or appreciate, all the circumstances which may render the same sum of money a more important benefit to one man than to another; and consequently every rule that can be given for the purpose must be liable to numerous exceptions. The principle, however, is thus far valuable, that it gives in the most common cases a plausible and judicious estimate of the value of things which are not susceptible of exact appreciation; and it has the advantage of being readily submitted to analysis. A different principle, proposed by the celebrated naturalist Buffon, consists in making the value itself of a casual benefit, instead of its infinitely small elements, inversely proportional to the fortune of the expectant; but as this hypothesis has seldom been adopted, it is unnecessary to discuss it in this place.

**Sect. V.—Of the Probability of Future Events Deduced from Experience.**

42. In the preceding part of this article it has been assumed, in every case, that the number of chances favourable and unfavourable to the occurrence of a contingent event is known *a priori*, and consequently, that the probability of the event, or the ratio of the number of favourable cases to the whole number of cases possible, can be absolutely determined. But in numerous applications of the theory of probabilities, and these, generally speaking, by far the most important, the ratio of the chances in favour of an event to those which oppose it is altogether unknown; and we can form no idea of the probability of the event excepting from a comparison of the number of instances in which it has been observed to happen, with the whole number of instances in which it has been observed to happen and fail.

In order to assign the probability of a contingent event in such cases, it is necessary to consider all the different causes or combinations of circumstances by which the event could possibly be produced, and to determine its probabilities successively on the hypotheses that each of these causes exists to the exclusion of all the others. The comparative facilities which these hypotheses give to the occurrence of the event which has actually arrived, will then enable us to determine the relative probabilities of the different hypotheses, and consequently their absolute probabilities, since their sum is necessarily equal to unity; and when the probabilities of the different hypotheses, and of the occurrence of the event on each hypothesis, have been determined, the probability of the event occurring in a future trial will be found by the methods already explained.

43. Taking a simple case, let us suppose an urn to contain 4 counters, which are either white or black; that the number of each colour is unknown, but in four successive drawings (the counter drawn being replaced in the urn after each trial) a white counter has been drawn three times, and a black once; and let it be proposed to assign the probability of drawing a counter of either colour at the next trial.

In the present case three hypotheses may be formed relative to the number of white and black counters in the urn. 1st. The urn may contain 3 white counters and 1 black; 2d. It may contain 2 white and 2 black; 3d. It may contain 1 white and 3 black; for a counter of each colour having been drawn, the other two possible cases, namely, that they are all white or all black, are excluded by the observation. Now, let \(p_1, p_2, p_3\) be the probabilities respectively of drawing a white counter on each hypothesis, and \(q_1, q_2, q_3\) the probabilities of drawing a black. Supposing the first hypothesis to be true, or that the compound event which has been observed was produced by the cause indicated by that hypothesis, we have \(p_1=\frac{3}{4}, q_1=\frac{1}{4}\); and the probability of the observed event, or that 3 white counters and 1 black would be drawn, (12) is \(4p_1^3q_1=\frac{27}{64}\). The second hypothesis gives \(p_2=\frac{1}{2}, q_2=\frac{1}{2}\), whence \(4p_2^3q_2=\frac{16}{64}\). The third hypothesis gives \(p_3=\frac{1}{4}, q_3=\frac{3}{4}\), whence \(4p_3^3q_3=\frac{3}{64}\). The probabilities of the observed compound event, on each of the three hypotheses, are therefore, respectively, \(\frac{27}{64}, \frac{16}{64}, \frac{3}{64}\); and the question now arises, how are the probabilities of the different hypotheses to be estimated? As we have no data, *a priori*, for determining this question, we must assume the probabilities of the different hypotheses to be respectively proportional to the probabilities they severally give of the observed compound event; in other words, we must assume the probability of any hypothesis to be greater or less according as it affords a greater or smaller number of combinations favourable to the event which has been observed to take place.
Thus, if C and C₁ be two independent causes from which an observed event E may be supposed to arise, and C furnishes 20 different combinations out of a given number, favourable to the occurrence of E, while C₁ furnishes only 10 such combinations out of the same number, we naturally infer that the probability of the cause C having operated to produce E, is twice as great as the probability that the event was produced by the operation of the cause C₁.

Applying this principle to the present example, the probabilities of the three hypotheses are respectively proportional to the three fractions \(\frac{27}{64}, \frac{16}{64}, \frac{3}{64}\), or to the numbers 27, 16, 3; and as no other hypotheses are admissible, the sum of their probabilities must be unity; therefore, making \(w_1\) the probability of the first hypothesis, \(w_2\) that of the second, and \(w_3\) that of the third, we have

\[w_1 = \frac{27}{46}, \quad w_2 = \frac{16}{46}, \quad w_3 = \frac{3}{46}.\]

44. Having found the probabilities of the different hypotheses, that of drawing a white counter at the next trial is obtained without difficulty; for according to what was shown in (9), the probability of this simple event must be equal to the sum of its probabilities relative to the different hypotheses, each multiplied into the probability of the hypothesis itself. Now it has been seen that, on the first hypothesis, the probability of drawing a white ball is \( \frac{3}{4} \); on the second \( \frac{2}{4} \); and on the third \( \frac{1}{4} \); and that the probabilities of the hypotheses are respectively \( \frac{27}{46}, \frac{16}{46}, \frac{3}{46} \); therefore the probability of a white counter being drawn at the next trial is

\[ \frac{3}{4} \times \frac{27}{46} + \frac{2}{4} \times \frac{16}{46} + \frac{1}{4} \times \frac{3}{46} = \frac{116}{184}. \]

In like manner, the probability of a black counter being drawn at the next trial is

\[ \frac{1}{4} \times \frac{27}{46} + \frac{2}{4} \times \frac{16}{46} + \frac{3}{4} \times \frac{3}{46} = \frac{68}{184}, \]

and the sum of these two fractions is unity, as it ought to be, since the counter drawn must necessarily be white or black.
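The whole computation of arts. 43-44 can be carried out exactly; a sketch in modern form, with names of my own choosing:

```python
from fractions import Fraction

# Urn of 4 counters; observed: 3 white and 1 black in 4 draws with replacement.
hyps = [Fraction(w, 4) for w in (3, 2, 1)]        # P(white) under each hypothesis
likelihoods = [4 * p**3 * (1 - p) for p in hyps]  # 27/64, 16/64, 3/64
total = sum(likelihoods)                          # 46/64
posteriors = [L / total for L in likelihoods]     # 27/46, 16/46 (= 8/23), 3/46

# Probability of a white counter at the next trial (art. 44).
p_white_next = sum(w * p for w, p in zip(posteriors, hyps))
print(p_white_next)   # 29/46, i.e. 116/184
```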

45. The reasoning which has been employed in this particular case is of general application. Let E be an observed event, simple or compound, of which the particular cause is unknown, but which may be ascribed to any one of the n causes, C₁, C₂, C₃, ..., Cₙ, which, before the event has happened, are all equally probable, and such that the operation of any one of them excludes that of the others, so that the event E is produced by one of them alone, and not by the joint agency of several of them. Let the probabilities of the observed event E on the hypothesis that it has proceeded from each of those causes be respectively P₁, P₂, ..., Pₙ; so that if the cause, for instance, C₁ were the true one, the probability of the event E, previous to the observation, would be P₁; and let the probabilities (as determined by the event) of the existence of the different causes be respectively π₁, π₂, ..., πₙ. From the principle laid down in the preceding paragraph, namely, that the probabilities of the different causes or hypotheses are proportional to the probabilities they respectively give of the observed event, we have

\[ \pi₁ : \pi₂ : ... : \piₙ :: P₁ : P₂ : ... : Pₙ, \]

whence making \( P₁ + P₂ + ... + Pₙ = \Sigma P \), and observing that \( \pi₁ + \pi₂ + ... + \piₙ = 1 \) (since it is assumed that there are no other causes than those specified from which the event could arise), we have

\[ \pi_1 = \frac{P_1}{\Sigma P}, \quad \pi_2 = \frac{P_2}{\Sigma P}, \; \ldots, \; \pi_n = \frac{P_n}{\Sigma P}, \]

whence it appears that the probability of each hypothesis respecting the cause of the observed event is found by dividing the probability of the event on the supposition that that particular cause alone existed, by the sum of its probabilities in respect of all the causes. Let us now assume the probabilities of a future event E' (which may be the same with E or different, but depending on the same causes) in respect of the several hypotheses, to be, p₁, p₂, ..., pₙ; so that if the particular cause C₁ be the true one, the probability of E' is p₁; and let Π be the probability of E' in respect of all the causes, then by (9), Π will be equal to the sum of the probabilities p₁, p₂, ..., pₙ relative to the different hypotheses, each multiplied by the probability of the hypothesis; that is to say we shall have

\[ \Pi = p_1 \pi_1 + p_2 \pi_2 + \cdots + p_n \pi_n; \]

or \( Π = \Sigma pᵢπᵢ \), the symbol \( \Sigma \) indicating the sum of all the different values of p and π in respect of the different causes C₁, C₂, ..., Cₙ.
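The computation prescribed by these formulae may be sketched directly. The following Python fragment forms the probability of each cause from the probabilities those causes give to the observed event, and thence the probability of a future event; the two-cause numbers are hypothetical, chosen only for illustration:

```python
from fractions import Fraction

def posterior(P):
    """pi_i = P_i / sum(P): probability of each cause, the causes being
    all equally probable a priori (the rule of art. 45)."""
    total = sum(P)
    return [p / total for p in P]

def predict(p_future, pi):
    """Pi = sum p_i * pi_i: probability of a future event E' depending
    on the same causes."""
    return sum(p * w for p, w in zip(p_future, pi))

# Hypothetical illustration: two equally likely urns; the first gives a
# white ball with probability 3/4, the second with probability 1/4.
# A white ball is drawn (event E); required the chance of white next (E').
P = [Fraction(3, 4), Fraction(1, 4)]
pi = posterior(P)       # probabilities of the two causes after the draw
Pi = predict(P, pi)     # probability of a white ball at the next draw
```

Exact rational arithmetic (`Fraction`) is used so that the results agree term for term with the formulae of the text.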

46. It may be worth while to remark that the word cause is not here used in its ordinary acceptation to denote the combination of circumstances, physical or moral, of which the event is a necessary consequence. In the sense in which we have used the term, the cause C is that which gives rise to the determinate probability P that the event E will happen; but so long as this probability falls short of certainty, its existence also implies that of another probability, \( 1-P \), that the contrary event F will happen. If we make P = 1, the existence of the cause C would necessarily involve the occurrence of E; and it is in this particular sense that the word cause is ordinarily used. In the theory of probabilities the causes of events are considered only in reference to the number of chances they afford for the occurrence of those events which they may possibly, but do not necessarily, produce.

47. The following example may serve to illustrate the method of applying the preceding formulae. An urn contains n balls, which are known to be either white or black. A ball is drawn at random and found to be white; required the probability of drawing a white ball at the next trial?

In this case, the number of hypotheses that may be made respecting the contents of the urn, is n; for we may suppose that it contained one, or two, or any number of white balls from 1 to n, and each of these cases may be considered as a distinct cause of the observed event E. Let these causes or hypotheses be C₁, C₂, C₃, ..., Cₙ, and let us suppose the true cause to be \( C_i \), or that the urn contained \( i \) white balls. On this hypothesis the probability of the observed event E is \( \frac{i}{n} \), whence \( P_i = \frac{i}{n} \); and therefore, making \( i \) successively equal to 1, 2, 3, ..., n, we have \( \Sigma P_i = \frac{1}{n}(1+2+3+\cdots+n) \).

But the sum of this arithmetical series is \( \frac{n(n+1)}{2} \), therefore \( \Sigma Pᵢ = \frac{1}{2}(n+1) \), and consequently,

\[ \pi_i = \frac{2i}{n(n+1)}, \]

which is the probability of the assumption that the event proceeded from the cause \( C_i \), or that the urn contained \( i \) white balls. If we suppose \( i=n \) we have \( \pi_n = \frac{2}{n+1} \) for the probability that all the balls are white; and if we also suppose \( n=3 \), this becomes \( \frac{1}{2} \); whence if an urn contain 3 balls which must be either black or white, and a white ball be drawn at the first trial, it is an even wager, after the trial, that all the balls are white.
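The result just stated may be verified by a short computation; the figures below are simply those of the example in the text (an urn of n balls, one white ball drawn):

```python
from fractions import Fraction

def hypothesis_probs(n):
    """pi_i = 2i / (n(n+1)): probability, after one white ball has been
    drawn, of the hypothesis that the urn of n balls contains i white
    balls, each i = 1..n being equally probable beforehand (art. 47)."""
    P = [Fraction(i, n) for i in range(1, n + 1)]   # P_i = i/n
    total = sum(P)                                   # = (n+1)/2
    return [p / total for p in P]

pi = hypothesis_probs(3)
# pi[-1] is the chance that all 3 balls are white: 2/(n+1) = 1/2,
# the even wager of the text
```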

48. Having found, from the observed event E, the probabilities of the different hypotheses, we have now to determine the probability Π of the event E' (the drawing of a white ball) at the next trial. Here two cases present themselves; according as the ball is replaced in the urn, or is not; or in general, according as the law of the chances remains constant during the series of trials or varies.

1st. Let us suppose that the ball has been replaced in the urn. In this case the probability of the event E', on the hypothesis that the urn contains \( i \) white balls, is \( \frac{i}{n} \); that is to say \( p_i = \frac{i}{n} \). But the probability \( \pi_i \) of this hypothesis, as found above, is \( \frac{2i}{n(n+1)} \); therefore \( p_i \pi_i = \frac{2i^2}{n^2(n+1)} \), whence the general formula (45) \( \Pi = \Sigma p_i \pi_i \) becomes \( \Pi = \Sigma \frac{2i^2}{n^2(n+1)} = \frac{2}{n^2(n+1)} \Sigma i^2 \). Now \( \Sigma i^2 = \Sigma i(i+1) - \Sigma i \). But by the property of the figurate numbers referred to in (23), the sum of the series of numbers obtained by giving \( i \) every value from \( i=1 \) to \( i=n \) in the formula \( \frac{i(i+1)}{1 \cdot 2} \) is expressed by \( \frac{n(n+1)(n+2)}{1 \cdot 2 \cdot 3} \); therefore \( \Sigma i(i+1) = \frac{n(n+1)(n+2)}{3} \). We have also, as above,

\[ \Sigma i = 1 + 2 + 3 + \ldots + n = \frac{n(n+1)}{2}, \]

consequently,

\[ \Sigma i^2 = \frac{n(n+1)(n+2)}{3} - \frac{n(n+1)}{2} = \frac{n(n+1)(2n+1)}{6}, \]

and therefore

\[ \Pi = \frac{2}{n^2(n+1)} \times \frac{n(n+1)(2n+1)}{6} = \frac{2n+1}{3n}. \]

2d. Suppose the ball which has been extracted is not replaced in the urn. In this case, on the hypothesis that the urn at first contained \( i \) white balls, the probability of drawing a white ball at the next trial is \( \frac{i-1}{n-1} \); that is, \( p_i = \frac{i-1}{n-1} \);

and the probability of the hypothesis is the same as in the former case, or \( \pi_i = \frac{2i}{n(n+1)} \); therefore \( p_i \pi_i = \frac{2i(i-1)}{(n-1)n(n+1)} \),

and consequently \( \Pi = \Sigma p_i \pi_i = \frac{2}{(n-1)n(n+1)} \Sigma i(i-1) \).

Now the value of \( \Sigma i(i-1) \) will evidently be found by writing \( n-1 \) for \( n \) in the above expression for \( \Sigma i(i+1) \); whence

\[ \Sigma i(i-1) = \frac{(n-1)n(n+1)}{3}, \]

and, therefore, in this case

\[ \Pi = \frac{2}{(n-1)n(n+1)} \times \frac{(n-1)n(n+1)}{3} = \frac{2}{3}. \]

When \( n \) is a very large number, the ratio of \( 2n+1 \) to \( 3n \), the value of \( \Pi \) in the former case, does not sensibly differ from \( \frac{2}{3} \), and therefore in both cases \( \Pi = \frac{2}{3} \). Hence it follows, that if an event, depending on unknown causes, can happen only in one of two ways, and it has been observed to happen once, the odds are two to one in favour of its happening in the same way at the next occurrence.
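Both results admit of direct verification by summing over the \( n \) hypotheses, as the following Python sketch shows; it reproduces the values \( \frac{2n+1}{3n} \) and \( \frac{2}{3} \) of the text:

```python
from fractions import Fraction

def next_white(n, replaced):
    """Probability of a white ball at the second draw, after one white
    draw from an urn of n balls, summed over the hypotheses of
    arts. 47-48: Pi = sum p_i * pi_i."""
    total = Fraction(0)
    for i in range(1, n + 1):
        pi = Fraction(2 * i, n * (n + 1))   # probability of hypothesis C_i
        # with replacement p_i = i/n; without, p_i = (i-1)/(n-1)
        p = Fraction(i, n) if replaced else Fraction(i - 1, n - 1)
        total += p * pi
    return total

# replaced: (2n+1)/(3n); not replaced: exactly 2/3 for every n >= 2
```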

49. The expression for \( \pi \) in (45) was determined on the supposition that, previously to the experiments being made, we are entirely ignorant of the relative numbers of the two sorts of balls in the urn, and have no reason to suppose one hypothesis more probable than another. If, however, we happen to know, previously to the experiment, that the different causes \( C_1, C_2, C_3, \ldots \) have not all the same number of chances in their favour, or that the probabilities of the different hypotheses have different relative values, it becomes necessary to introduce those relative values, in consequence of which \( \pi_1, \pi_2, \ldots \) will receive a modification. Let us conceive a number of urns, each containing balls of two colours, black and white, to be distributed in \( n \) groups, \( A_1, A_2, A_3, \ldots, A_n \), in such a manner that the ratio of the number of white balls to the number of black balls is the same in respect of each urn belonging to the same group, and consequently that the probability of drawing a ball of either colour is the same from whichever urn in the group it may happen to be drawn, but different in respect of the different groups; and let the probabilities of drawing a white ball from each of the different groups be respectively \( P_1, P_2, P_3, \ldots, P_n \). Now, let us suppose there are \( a_1 \) urns in the group \( A_1 \), \( a_2 \) in the group \( A_2 \), and so on, and let \( s \) be the whole number of urns, so that \( s = a_1 + a_2 + \ldots + a_n \); then, if we make \( \frac{a_1}{s} = \lambda_1, \frac{a_2}{s} = \lambda_2 \), and so on, \( \lambda_1 \) will be the *a priori* probability that a ball drawn from any urn at random will be drawn from the group \( A_1 \); \( \lambda_2 \) the probability it will be drawn from the group \( A_2 \); and, in general, \( \lambda_i \) the probability it will be drawn from the group \( A_i \).
This being premised, suppose a trial to be made, and that the event \( E \) is a white ball; the probability \( \pi_i \) of the hypothesis that the ball was drawn from the group \( A_i \) is found as follows. The *a priori* probability of the ball being drawn from the group \( A_i \) is \( \lambda_i \); and if the ball is actually drawn from that group, the probability of its being white is \( P_i \); therefore the probability of both events is \( \lambda_i P_i \); and consequently (45), \( \pi_i = \frac{\lambda_i P_i}{\sum \lambda_i P_i} \), the symbol of summation \( \Sigma \) extending to all the values of \( i \) from \( i = 1 \) to \( i = n \).

50. In the applications of the theory to physical or moral events, the different groups of urns here imagined may be regarded as so many independent causes \( C_1, C_2, C_3, \ldots \), by any one of which the event \( E \) might have been produced; \( \pi_i \) is the probability that the event was produced by the particular cause \( C_i \); \( P_i \) is the probability that the cause \( C_i \), if it had alone existed, would have produced the observed event \( E \); and \( \lambda_i \) is the probability, previously to the experiment, that \( C_i \) would be the efficient cause. The formula \( \pi_i = \frac{\lambda_i P_i}{\sum \lambda_i P_i} \), therefore, shows that the probability of any one of the possible causes \( C_i \) of an observed event is equal to the product of the probability \( P_i \) of the event taking place if that cause acted alone, multiplied into the probability \( \lambda_i \) that the cause \( C_i \) is the true one, and divided by the sum \( \sum \lambda_i P_i \) of all the similar products formed relatively to each of the causes from which the event can be supposed to arise.
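The rule of this article may be computed thus; the two groups of urns in the example are a hypothetical illustration of our own choosing:

```python
from fractions import Fraction

def cause_probs(lam, P):
    """Probability of each cause C_i of an observed event (arts. 49-50):
    lam_i * P_i / sum(lam_j * P_j), lam_i being the a priori probability
    of the cause and P_i the chance it gives the event."""
    products = [l * p for l, p in zip(lam, P)]
    total = sum(products)
    return [t / total for t in products]

# Hypothetical groups of urns: 1 urn in 4 is all white (P = 1),
# the other 3 are half white (P = 1/2).
lam = [Fraction(1, 4), Fraction(3, 4)]
P = [Fraction(1), Fraction(1, 2)]
# a white ball is drawn; the chance it came from the all-white group
# is (1/4) / (1/4 + 3/8) = 2/5
```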

51. The formula now obtained can only be used when the number of hypotheses is finite; but in the applications of the theory it most frequently happens that an infinite number of hypotheses may be made respecting the causes of an observed event, as would be the case in the above example if the number of balls in the urn had been unknown. In such cases, in order to find the values of \( \pi \) and \( \Pi \), it becomes necessary to transform the sums \( \Sigma \) into definite integrals, which is accomplished by means of the theorem

\[ \Sigma \, u \, dx = \int u \, dx, \]

where \( u \) is a function of \( x \). Suppose a ball to have been drawn a great number of times in succession from an urn (the number of balls in which is unknown), and replaced in the urn after each drawing; and that the result has been a white ball \( m \) times and a black ball \( n \) times; the probable constitution of the urn, and thence the probability of drawing a white ball at a future trial, will be found as follows. Assume the hypothesis that the ratio of the number of white balls to the whole number in the urn is \( x : 1 \), and let \( \pi \) be the probability of the hypothesis. On this hypothesis the probability of drawing a white ball in any trial is \( x \), and that of drawing a black ball \( 1-x \), and consequently, the probability of drawing \( m \) white and \( n \) black in \( m+n \) trials is \( Ux^m(1-x)^n \) by (12). We have therefore for the probability of the observed compound event

\[ P = Ux^m(1-x)^n; \]

whence in consequence of the above formula for transforming a sum into a definite integral

\[ \Sigma P = \int Ux^m(1-x)^n dx \quad (U \text{ being independent of } x) \]

therefore

\[ \pi = \frac{P}{\Sigma P} = \frac{x^m(1-x)^n}{\int x^m(1-x)^n dx}. \]

The value of the integral in the denominator of this fraction is obtained by the usual method of integrating by parts. Since

\[ d \left( \frac{x^{m+1}(1-x)^n}{m+1} \right) = x^m(1-x)^n dx - \frac{n}{m+1} x^{m+1}(1-x)^{n-1} dx, \]

therefore

\[ \int x^m(1-x)^n dx = \frac{x^{m+1}(1-x)^n}{m+1} + \frac{n}{m+1} \int x^{m+1}(1-x)^{n-1} dx. \]

In like manner we get

\[ \int x^{m+1}(1-x)^{n-1} dx = \frac{x^{m+2}(1-x)^{n-1}}{m+2} + \frac{n-1}{m+2} \int x^{m+2}(1-x)^{n-2} dx. \]

Continuing this operation \( n \) times, or till the exponent of \( (1-x) \) becomes \( n-n=0 \), the last integral will be

\[ \int x^{m+n} dx = \frac{x^{m+n+1}}{m+n+1}; \]

therefore, collecting the several terms into one sum, we have

\[ \int x^m (1-x)^n dx = \frac{x^{m+1}(1-x)^n}{m+1} + \frac{n \, x^{m+2}(1-x)^{n-1}}{(m+1)(m+2)} + \cdots + \frac{n(n-1)(n-2) \cdots 2 \cdot 1 \; x^{m+n+1}}{(m+1)(m+2) \cdots (m+n+1)}. \]

When \( x = 0 \), all the terms of this series vanish, and when \( x = 1 \) they all vanish excepting the last; therefore between the limits \( x = 0 \) and \( x = 1 \), the value of the integral is the last term of the series when \( x \) in that term \( = 1 \); that is to say,

\[ \int_0^1 x^m (1-x)^n dx = \frac{n(n-1)(n-2) \cdots 2 \cdot 1}{(m+1)(m+2) \cdots (m+n+1)}. \]

For the sake of brevity, let the symbol \([x]\) be adopted to represent the continued product \(1 \cdot 2 \cdot 3 \cdots x\) of the natural numbers from 1 to \( x \), whence by analogy \([x+y]\) will represent the continued product of the same series from 1 to the number denoted by \( x+y \). Multiplying, then, the numerator and denominator of the above expression by \(1 \cdot 2 \cdot 3 \cdots m=[m]\), we get

\[ \int_0^1 x^m (1-x)^n dx = \frac{[m][n]}{[m+n+1]}; \]

whence the probability of the hypothesis, in consequence of the equation above found, becomes

\[ \pi = \frac{[m+n+1]}{[m][n]} x^m (1-x)^n. \]

From this value of \( \pi \) we are enabled to deduce that of \( \Pi \), the probability of drawing a white ball at the next trial. By (45), \( \Pi = \Sigma p \pi \). Now, since by hypothesis the number of white balls in the urn is to the whole number of both colours in the ratio of \( x \) to 1, the probability of drawing a white ball is \( x \); consequently \( p = x \), and therefore \( \Pi = \Sigma \pi x = \int_0^1 \pi x \, dx = \frac{[m+n+1]}{[m][n]} \int_0^1 x^{m+1}(1-x)^n dx \).

But the value of \( \int_0^1 x^{m+1}(1-x)^n dx \) will evidently be obtained by substituting \( m+1 \) for \( m \) in the expression found for \( \int_0^1 x^m (1-x)^n dx \). This substitution gives

\[ \int_0^1 x^{m+1}(1-x)^n dx = \frac{[m+1][n]}{[m+n+2]}; \]

whence, observing that \( [m+1] \div [m] = m+1 \), and \( [m+n+2] \div [m+n+1] = m+n+2 \), we have

\[ \Pi = \frac{m+1}{m+n+2}. \]

The probability of the contrary event, or of drawing a black ball, is \( 1-\Pi = \frac{n+1}{m+n+2} \). As the numbers \( m \) and \( n \) become larger, these two fractions approach nearer and nearer to their limits \( \frac{m}{m+n} \) and \( \frac{n}{m+n} \), which are the *a priori* probabilities \( p \) and \( q \) of the respective events when the number of white balls in the urn is to that of the black balls as \( m \) to \( n \).
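The result \( \frac{m+1}{m+n+2} \), since known as the rule of succession, can be computed exactly from the factorial expression of the text; a brief Python sketch:

```python
from fractions import Fraction
from math import factorial

def rule_of_succession(m, n):
    """Pi = (m+1)/(m+n+2): chance of a white ball at the next trial after
    m white and n black draws (art. 51), computed from the integral values
    [m+1][n]/[m+n+2] and [m][n]/[m+n+1], [k] denoting 1*2*...*k."""
    num = factorial(m + 1) * factorial(n) * factorial(m + n + 1)
    den = factorial(m) * factorial(n) * factorial(m + n + 2)
    return Fraction(num, den)

# e.g. one white draw and none black gives (1+1)/(1+0+2) = 2/3,
# agreeing with the result of art. 48
```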

52. The probability of drawing \( m' \) white balls and \( n' \) black balls in \( m'+n' \) future trials is found in a similar manner, and the problem may be thus stated. \( E \) and \( F \) are two contrary events, depending on constant but unknown causes; and it has been observed, that in \( m+n=h \) successive instances the event \( E \) has occurred \( m \) times and \( F \) \( n \) times, required the probability that in \( m'+n'=h' \) future instances, \( E \) will occur \( m' \) times and \( F \) \( n' \) times.

Assume, as in the last case, the facility of the occurrence of \( E \) to that of \( F \) to be in the ratio of \( x \) to \( 1-x \); we have then, as before, for the probability of the hypothesis, \( \pi = \frac{[h+1]}{[m][n]} x^m (1-x)^n \). Now on this hypothesis the probability of \( E \) in the next instance is \( x \), and that of \( F \) is \( 1-x \); whence, the probability of \( m' \) times \( E \) and \( n' \) times \( F \) in the next \( h' \) trials being denoted by \( p \), we have by (12)

\[ p = U' x^{m'} (1-x)^{n'}, \quad \text{where} \quad U' = \frac{1 \cdot 2 \cdot 3 \cdots h'}{(1 \cdot 2 \cdot 3 \cdots m')(1 \cdot 2 \cdot 3 \cdots n')} = \frac{[h']}{[m'][n']}. \]

We have therefore \( p\pi = U' \frac{[h+1]}{[m][n]} x^{m+m'} (1-x)^{n+n'} \) for the probability of the compound event on this hypothesis. To find its probability \( \Pi \) on the infinite number of hypotheses formed by supposing \( x \) to increase by infinitely small increments from \( x=0 \) to \( x=1 \), we have

\[ \Pi = \Sigma p \pi = \int_0^1 p \pi \, dx. \]

On substituting for \( p\pi \) the value just found, we get \( \Pi = U' \frac{[h+1]}{[m][n]} \int_0^1 x^{m+m'}(1-x)^{n+n'} dx \); and it is manifest that the value of this integral will be obtained by substituting \( m+m' \) for \( m \), and \( n+n' \) for \( n \), in the value of \( \int_0^1 x^m (1-x)^n dx \) found above. This substitution gives

\[ \int_0^1 x^{m+m'}(1-x)^{n+n'} dx = \frac{[m+m'][n+n']}{[h+h'+1]} \]

whence we conclude

\[ \Pi = U' \frac{[m+m'][n+n'][h+1]}{[m][n][h+h'+1]}. \]
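This general formula is readily evaluated with exact arithmetic; the following sketch computes it and, as a check, verifies that with \( m'=1, n'=0 \) it reduces to the rule of succession \( \frac{m+1}{m+n+2} \):

```python
from fractions import Fraction
from math import comb, factorial

def prob_future(m, n, m1, n1):
    """Pi of art. 52: the chance of m1 further white and n1 further black
    draws after m white and n black have been observed;
    Pi = U' [m+m1][n+n1][h+1] / ([m][n][h+h1+1]), with h = m+n,
    h1 = m1+n1, and U' = [h1]/([m1][n1]) the binomial coefficient."""
    h, h1 = m + n, m1 + n1
    U1 = comb(h1, m1)
    num = factorial(m + m1) * factorial(n + n1) * factorial(h + 1)
    den = factorial(m) * factorial(n) * factorial(h + h1 + 1)
    return U1 * Fraction(num, den)
```

The probabilities of all possible divisions of the \( h' \) future trials between the two events sum to unity, which gives a second check on the formula.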

The most probable hypothesis will be found by making the value of \( \pi \) a maximum, or its differential coefficient equal to zero. Differentiating the equation \( \pi = \frac{[h+1]}{[m][n]} x^m (1-x)^n \),

and making \( \frac{d\pi}{dx} = 0 \), we get \( m(1-x)=nx \), whence \( x = \frac{m}{m+n} \). The most probable supposition, therefore, respecting the contents of the urn is, that the two sorts of balls are in the same proportion as has been shown by the previous drawings. We shall have further occasion for these formulae when we come to consider the cases in which \( m \) and \( n \) are large numbers.

Sect. VI.—Of Benefits Depending on the Probable Duration of Human Life.

53. In applying the principles of the theory of probability to the determination of the values of benefits depending on life, the fundamental element which it is necessary to determine from observation is the probability that an individual at every given age within the observed limits of the duration of life, will live over a given portion of time, for instance one year; for when this has been determined for each year of age, the probability that an individual, or any number of individuals, will live over any assigned number of years, is easily deduced. Thus, if the probabilities that an individual \( A \), whose age is \( y \), will live over 1, 2, 3,...\( x \) years, be denoted respectively by \( p_1, p_2, p_3, \ldots, p_x \), and if \( q_1, q_2, q_3, \ldots, q_x \) denote the same probabilities in respect of an individual whose age is \( y+1 \) years; \( r_1, r_2, r_3, \ldots, r_x \), the same in respect of an individual whose age is \( y+2 \) years, and so on; then, since the probability \( p_2 \) which \( A \) has of living over 2 years is obviously compounded of the probability \( p_1 \) of his living over 1 year, and of the probability \( q_1 \) that, having attained the age \( y+1 \), he will live another year, we have, by (7), \( p_2 = p_1 q_1 \). Again, the probability \( p_3 \) that A will live over three years, being compounded of the probability \( p_2 \) that he will live over two years, and of the probability \( r_1 \) that, having attained the age \( y + 2 \) years, he will survive another year, we have \( p_3 = p_2 r_1 = p_1 q_1 r_1 \). In like manner \( p_4 = p_3 s_1 \), and so on; so that the probabilities \( p_2, p_3, p_4, \ldots \) are successively derived from \( p_1, q_1, r_1, s_1, \ldots \), which are supposed to be the data of observation.

If a large number \( n \) of individuals, all born in the same year, were selected, and if it were observed that the number of them remaining alive at the end of the first year is \( n_1 \), at the end of the second year \( n_2 \), at the end of the third \( n_3 \), and so on, then the probabilities \( p_1, p_2, p_3, \ldots \) would be given directly by the observation, being respectively equal to the quotients \( \frac{n_1}{n}, \frac{n_2}{n}, \frac{n_3}{n}, \ldots \). But the most accurate observations of mortality are furnished by the experience of the annuity and assurance offices, where they are not made on an isolated number, diminishing, and consequently giving a less valuable result every year, but on a comparison of the numbers which, in a series of years, enter upon and survive each year of life. This observation gives \( p_1, q_1, r_1, s_1, \ldots \), whence \( p_2, p_3, p_4, \ldots \) are found, as above, for every year of life.

54. The values of annuities on lives, and of reversionary sums to be paid on the failure of lives, are found by combining the probabilities \( p_1, p_2, p_3, \ldots \) with the rate of interest of money. Let \( r \) be the rate of interest, that is to say, the interest of L1 for a year, and \( v \) the present value of L1 to be received at the end of a year, we shall then have \( v = 1/(1+r) \). Now an annuity, payable yearly, is always understood in this sense, that the first payment becomes due at the end of a year after the annuity is created. Suppose then the annuity to be L1, the present value of the first payment, if it were to be received certainly, is \( v \); but the receipt of this sum is contingent on the annuitant being alive at the end of the year, the probability of which we suppose to be \( p_1 \); therefore (7) the present value of L1 subject to the contingency, is \( vp_1 \). In like manner, the present value of L1 to be received certainly at the end of \( x \) years is \( v^x \); but the annuity will only be received at the end of the \( x \)th year if the annuitant be then living, the probability of which is \( p_x \); therefore the present value of that particular payment is \( v^x p_x \). Hence if \( A \) denote the present value of the annuity, or the sum in hand which is equivalent to all the future payments, we shall have \( A = \sum v^x p_x \); the sum including all values of \( x \) from \( x = 1 \) to the value of \( x \) for which \( p_x = 0 \). If the annuity be \( a \) pounds, its value is obviously \( a \sum v^x p_x = aA \).
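The summation \( A = \sum v^x p_x \) translates directly into a few lines of Python; the survival probabilities in the example are hypothetical figures for illustration only, not drawn from any mortality table:

```python
def annuity_value(p, r):
    """A = sum v^x p_x (art. 54): present value of an annuity of 1 on a
    life whose probabilities of surviving 1, 2, 3, ... years are
    p[0], p[1], p[2], ...; r is the annual rate of interest."""
    v = 1 / (1 + r)
    return sum(v ** (x + 1) * px for x, px in enumerate(p))

# Hypothetical survival probabilities for a short-lived status:
p = [0.9, 0.7, 0.4, 0.1, 0.0]
A = annuity_value(p, 0.04)
```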

55. The series denoted by \( \sum v^x p_x \) may be divided into two parts, in the first of which \( x \) is taken from 1 to \( n \), and in the second from \( n+1 \) to the number for which \( p_x \) vanishes. The first gives the value of the temporary annuity on the given life for \( n \) years, and the second the value of the deferred annuity, that is to say, of the annuity to commence \( n \) years hence if the individual shall be then living, and to continue during the remainder of his life. Let \( A \) be the value of the annuity on the life of a person now aged \( y \) years for the whole of life, \( A^{(n)} \) the value of a temporary annuity on the same life for \( n \) years, and \( A^{(d)} \) the value of an annuity deferred \( n \) years on the same life; we have then \( A = A^{(n)} + A^{(d)} \).

To find \( A^{(d)} \), let \( A_n \) be the value of an annuity on a life aged \( y+n \) years. If the person now aged \( y \) years lives over \( n \) years, the value of an annuity on the remainder of his life will then be \( A_n \). The present value of this sum, if it were to be received certainly, is \( v^n A_n \), and the probability of receiving it is \( p_n \); therefore its value is \( v^n p_n A_n \). Hence

\[ A^{(d)} = v^n p_n A_n, \quad \text{and} \quad A^{(n)} = A - v^n p_n A_n; \]

so that the values of temporary and deferred annuities are readily computed from tables of \( A \) and \( p \) for all the different ages.

56. The equation \( A = A^{(n)} + A^{(d)} \) gives a formula by which the values of \( A \) are readily deduced from one another. Let \( n = 1 \); we have then \( A = A^{(1)} + v p_1 A_1 \). But \( A^{(1)} \), the value of an annuity for one year, is merely the value of the first payment to be received in the event of the given life surviving one year. Its value is therefore \( v p_1 \); and we have consequently \( A = v p_1 + v p_1 A_1 \), or \( A = v p_1 (1 + A_1) \).

This formula, which gives the value of an annuity at any age in terms of the next higher age, and greatly facilitates the computation of the annuity tables, is due to Euler.
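Euler's formula lends itself to a backward recursion over the ages, which is how such tables are in fact constructed; a sketch, with hypothetical one-year survival probabilities:

```python
def annuities_by_recursion(q, r):
    """A_y = v q_y (1 + A_{y+1}), Euler's formula of art. 56: build the
    whole table of annuity values from the one-year survival
    probabilities q[y] for ages y = 0, 1, 2, ..., working backward
    from the oldest age."""
    v = 1 / (1 + r)
    A = [0.0] * (len(q) + 1)     # the annuity is worth 0 past the last age
    for y in range(len(q) - 1, -1, -1):
        A[y] = v * q[y] * (1 + A[y + 1])
    return A[:len(q)]

# Hypothetical one-year survival probabilities for three ages:
table = annuities_by_recursion([0.9, 0.8, 0.0], 0.05)
```

A direct summation \( \sum v^x p_x \) at the youngest age gives the same value, which serves as a check on the recursion.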

57. The value of an annuity on the joint lives of any number of individuals, that is, to continue only while they are all living, is calculated precisely in the same manner as the annuity on a single life. Let there be any number of individuals, A, B, C, D, &c., and let the probabilities of each living over one year be respectively \( p_1, q_1, r_1, s_1, \ldots \), and let \( P_1 \) be the probability that they will all live over one year; then

\[ P_1 = p_1 \times q_1 \times r_1 \times s_1, \ldots \]

\[ P_2 = p_2 \times q_2 \times r_2 \times s_2, \ldots \]

and the value of an annuity of L1 on the joint lives is \( \sum v^x P_x \), from \( x = 1 \) to \( x = \) the number which renders any one of the probabilities \( p_x, q_x, r_x, s_x \) &c. nothing.

58. The value of an annuity on the survivor of any number of given lives, that is, to continue so long as any one of them exists, is thus found. The probability that A will be alive at the end of the \( x \)th year being \( p_x \), the probability that he will not be alive at the end of that time is \( 1-p_x \). The probability that all the lives will be extinct at the end of the \( x \)th year is therefore

\[ (1-p_x)(1-q_x)(1-r_x)(1-s_x), \ldots \]

and the probability that they will not all be extinct, or that at least one of them will be in being, is

\[ 1-(1-p_x)(1-q_x)(1-r_x)(1-s_x), \ldots \]

which becomes by multiplication

\[ p_x + q_x + r_x + s_x + \cdots - (p_x q_x + p_x r_x + q_x r_x + \cdots) + (p_x q_x r_x + \cdots) - \cdots \]

Multiplying each of the terms by \( v^x \), and taking the sums of the respective products from \( x = 1 \), and observing that \( \sum v^x p_x q_x \) is the value of the annuity on the joint lives of A and B, \( \sum v^x p_x q_x r_x \) that on the joint lives of A, B, and C, and so on, we have this rule—

The value of an annuity on the survivor of any number of lives is equal to the sum of the annuities on each of the lives, minus the sum of the annuities on each pair of joint lives, plus the sum of the annuities on the joint lives taken by threes, and so on. When there are only two lives, the value of the annuity on the life of the survivor becomes

\[ \sum v^x p_x + \sum v^x q_x - \sum v^x p_x q_x. \]
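For two lives this rule is easily computed; the following Python sketch values the survivorship annuity from the survival probabilities of the two lives:

```python
def survivor_annuity(p, q, r):
    """Annuity of 1 payable while at least one of two lives survives
    (art. 58): sum v^x (p_x + q_x - p_x q_x), i.e. the two single-life
    annuities minus the joint-life annuity."""
    v = 1 / (1 + r)
    return sum(v ** (x + 1) * (px + qx - px * qx)
               for x, (px, qx) in enumerate(zip(p, q)))
```

If one of the lives is certain to fail at once (all its probabilities zero), the value reduces to the single-life annuity on the other, as it evidently ought.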

59. Let \( V \) denote the value of an assurance on the life of A, or the present worth of L1 to be received at the end of the year in which A shall die. In respect of any year, the \( x \)th, after the present, the probability of A dying in the course of that year is \( p_{x-1} - p_x \). For let \( u \) be the probability that a life \( x-1 \) years older than A will live over one year, then \( 1-u \) is the probability of a life of that age not living over one year; therefore \( p_{x-1} \) being the probability of A living over \( x-1 \) years, \( p_{x-1}(1-u) \) is the chance of his living over \( x-1 \) years, and dying in the following year (7). But \( p_{x-1}(1-u) = p_{x-1} - p_{x-1}u \); and by (53), \( p_{x-1}u = p_x \); therefore \( p_{x-1} - p_x \) is the chance that A will survive \( x-1 \) years and not survive \( x \) years. Now \( v^x \) is the value of L1 to be received certainly at the end of the \( x \)th year; therefore in respect of the \( x \)th year the value of the expectation is \( v^x (p_{x-1} - p_x) \); whence we have for the value of the assurance

---

1 For further details on this subject, see Mortality, vol. xv. p. 637.

\[ V = \sum v^x (p_{x-1} - p_x), \]

from \( x = 1 \) to the value of \( x \) which makes \( p_x = 0 \). Now, if we observe that \( p_0 = 1 \), we have \( \sum v^x p_{x-1} = v(1 + \sum v^x p_x) \); hence denoting \( \sum v^x p_x \) by \( A \), (\( A \) being, as in (54), the value of the annuity on the given life), we have

\[ V = v(1 + A) - A; \quad \text{or} \quad V = v - (1 - v)A. \]
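The identity \( V = v - (1-v)A \) can be checked numerically against the direct summation \( \sum v^x (p_{x-1} - p_x) \); a Python sketch, in which the survival probabilities used for the check are hypothetical:

```python
def assurance_from_annuity(A, r):
    """V = v - (1 - v) A (art. 59): value of 1 payable at the end of the
    year of death, deduced from the annuity value A on the same life."""
    v = 1 / (1 + r)
    return v - (1 - v) * A

def assurance_direct(p, r):
    """V = sum v^x (p_{x-1} - p_x), with p_0 = 1; p must end in 0,
    i.e. extend to the limit of life."""
    v = 1 / (1 + r)
    p = [1.0] + list(p)
    return sum(v ** x * (p[x - 1] - p[x]) for x in range(1, len(p)))
```

The two computations agree whenever the list of survival probabilities runs out to the age at which \( p_x = 0 \), as the derivation in the text requires.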

60. The values of assurances on joint lives, (that is, to be paid at the end of the year in which any one of the lives shall fail), or on the survivor of any number of joint lives, are calculated from the corresponding annuities by means of the same formula. Thus, let \( A' \) be the value of an annuity of \( L_1 \) on any number of joint lives, and \( V' \) the value of the assurance of \( L_1 \) on the same joint lives, then \( V' = v - (1 - v)A' \).

If \( A'' \) be the annuity, and \( V'' \) the assurance on the life of the survivor of any number of given lives, we have still

\[ V'' = v - (1 - v)A''. \]

61. Assurances on lives are usually paid not in single payments, but by equal yearly payments, the first being made at the time the contract is entered into, and the succeeding ones at the end of each future year during the life of the assured. The present value of the sum which the assured contracts to pay is therefore equal to the first payment added to the value of an annuity of the same amount on his life; and if the assurance is made on terms of mathematical equality, this sum must be precisely equal to the value of the assurance in a single payment. Therefore, if \( y \) denote the amount of the yearly payment, we have the equation

\[ y(1 + A) = V; \quad \text{whence} \quad y = \frac{V}{1 + A}. \]

62. The value of a temporary assurance for \( n \) years, that is, of an assurance to be paid only in the event of the individual dying before the end of \( n \) years is thus found. Let \( V \) be the present value of \( L_1 \), to be paid on the death of a person now aged \( y \) years, and \( V_n \) the present value of \( L_1 \), to be paid on the death of a person now aged \( y + n \) years.

At the end of \( n \) years from the present time, the value of \( L_1 \) assured on the life of a person now aged \( y \) years will be \( V_n \) if he be then living. But the present value of \( L_1 \) to be received certainly at the end of \( n \) years is \( v^n \); and the probability that the life will continue \( n \) years is \( p_n \); therefore the present value of \( V_n \), subject to the contingency of the life continuing \( n \) years, is \( v^n p_n V_n \). If, therefore, we subtract this from \( V \), we shall have the value of the temporary assurance in a single payment, namely \( V - v^n p_n V_n \).

The equivalent annual premium is found by observing, that as the first payment is made immediately, and \( n \) payments are to be made in all, the value of all the premiums after the first is that of a temporary annuity of the same amount for \( n - 1 \) years. Denoting therefore the annual premium by \( u \), and the value of a temporary annuity for \( n - 1 \) years by \( A^{(n)} \), the value of all the premiums is \( u + uA^{(n)} = u(1 + A^{(n)}) \); and we have consequently

\[ u(1 + A^{(n)}) = V - v^n p_n V_n, \]

whence

\[ u = \frac{V - v^n p_n V_n}{1 + A^{(n)}}. \]

63. The following question is of frequent occurrence. Required the present value of a sum of money to be received at the end of the year in which \( A \) dies, provided he die while \( B \) is living.

Let the sum be \( L_1 \), \( W \) its present value, \( p_x \) the probability of \( A \) living over \( x \) years, and \( q_x \) the probability of \( B \) living over \( x \) years. The chance of receiving the sum at the end of any given year, the \( x \)th, depends on two contingencies: 1. \( A \) may die in the course of that year, and \( B \) live over it; 2. \( A \) and \( B \) may both die in that year, \( A \) dying first. The probability of \( A \) dying in the \( x \)th year has been shewn (59) to be \( p_{x-1} - p_x \); whence (7) the probability of the first contingency is \( (p_{x-1} - p_x) q_x \); and for so short a period as one year, it may be considered an even chance whether \( A \) or \( B \) will die first, whatever be the difference of their ages; therefore the probability in respect of the second contingency is \( \frac{1}{2}(p_{x-1} - p_x)(q_{x-1} - q_x) \). Hence the whole probability of the sum being received at the end of the \( x \)th year, is

\[ (p_{x-1} - p_x) q_x + \tfrac{1}{2}(p_{x-1} - p_x)(q_{x-1} - q_x), \]

which being developed, and multiplied by \( v^x \), becomes

\[ \tfrac{1}{2} v^x (p_{x-1} q_{x-1} - p_x q_x) + \tfrac{1}{2} v^x (p_{x-1} q_x - p_x q_{x-1}), \]

and the sum of all the values of this expression from \( x = 1 \), gives the value of \( W \).

It has been already shewn (59) that \( \sum v^x (p_{x-1} - p_x) = v - (1 - v)A \), where \( A \) is the annuity on the life of \( A \). In like manner, if we denote by \( AB \) the value of an annuity on the joint lives of \( A \) and \( B \), we shall have \( \sum v^x (p_{x-1} q_{x-1} - p_x q_x) = v - (1 - v)AB \), which is the value of an assurance to be paid on the death of the first dying. Assume \( p'_x \) such that \( p'_x = p'_1 p_{x-1} \); then \( p'_x \) is evidently the probability that an individual \( A' \), one year younger than \( A \), will live over \( x \) years (53), and \( \sum v^x p_{x-1} q_x = \frac{1}{p'_1} \sum v^x p'_x q_x = \frac{1}{p'_1} A'B \); denoting by \( A'B \) the value of an annuity on the joint lives of \( A' \) and \( B \). Again, let \( q'_x = q'_1 q_{x-1} \); then \( q'_x \) is the probability that \( B' \), who is one year younger than \( B \), will live over \( x \) years, and \( \sum v^x p_x q_{x-1} = \frac{1}{q'_1} \sum v^x p_x q'_x = \frac{1}{q'_1} AB' \); denoting by \( AB' \) the value of an annuity on the joint lives of \( A \) and \( B' \). Collecting the different terms, we have therefore

\[ W = \frac{1}{2} \left[ v - (1 - v)AB + \frac{1}{p'} A'B - \frac{1}{q'} AB' \right], \]

whence \( W \) is easily computed from tables of annuities on joint lives.

If \( A \) and \( B \) are both of the same age, the two last terms destroy each other, and \( W \) is equal to half the value of an assurance of £1 to be paid on the failure of the joint lives, as it evidently ought to be, since there is in this case the same chance of \( A \) dying before \( B \) as of \( B \) dying before \( A \).

The formula gives the value of the assurance in a single payment; the equivalent yearly payment is \( W \) divided by \( 1 + AB \), for the contract ceases on the failure of the joint lives by the death of either.
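The development above admits of a numerical check. The sketch below sums the yearly probabilities directly and compares the result with the developed form; the survival curves and rate of interest are purely illustrative assumptions, not taken from the text.

```python
# Check that the direct sum over the two contingencies equals the developed
# form used in the text. p[x] is the (assumed) chance A lives over x years,
# q[x] the same for B; v is the present value of 1 due in a year.

def assurance_direct(p, q, v):
    """Sum v^x times the yearly probability that A dies in year x before B."""
    total = 0.0
    for x in range(1, len(p)):
        die_a = p[x-1] - p[x]                              # A dies in year x
        total += v**x * (die_a * q[x]                      # B lives over it
                         + 0.5 * die_a * (q[x-1] - q[x]))  # both die, A first
    return total

def assurance_developed(p, q, v):
    """The same sum after development into the two difference terms."""
    total = 0.0
    for x in range(1, len(p)):
        total += 0.5 * v**x * ((p[x-1]*q[x-1] - p[x]*q[x])
                               + (p[x-1]*q[x] - p[x]*q[x-1]))
    return total

n_years = 60
p = [0.97**x for x in range(n_years)]   # illustrative survival curve for A
q = [0.95**x for x in range(n_years)]   # illustrative survival curve for B
v = 1 / 1.04                            # illustrative 4 per cent interest

w_direct = assurance_direct(p, q, v)
w_developed = assurance_developed(p, q, v)
```

The two sums agree term by term, since \( (p_{x-1}-p_x)q_x + \frac{1}{2}(p_{x-1}-p_x)(q_{x-1}-q_x) = \frac{1}{2}(p_{x-1}q_{x-1}-p_xq_x) + \frac{1}{2}(p_{x-1}q_x - p_xq_{x-1}) \).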

It would be easy to extend the formula to the case of an assurance to be paid on the contingency of the failure of any number of lives during the continuance of any number of other lives, or of an assurance to continue only during a stated time; but as it is not our purpose to give solutions of the various problems of this kind which may occur in practice, but merely to shew the manner in which the general principles of the theory are applied to them, we shall not pursue the subject farther, but refer the reader to the article Annuities, and to the standard works of Baily and Milne, in which it is treated in detail.

**Sect. VII. Of the Application of the Theory of Probability to Testimony, and to the Decisions of Juries and Tribunals.**

64. The case of a witness making an assertion may be represented by an urn containing balls of two colours, the ratio of the number of one colour to that of the other being unknown, but presumed from the result of a number of experiments, which consist in drawing a ball at random, and replacing it in the urn after each trial. A true assertion being represented by a ball of one colour, and a false one by a ball of the other, it follows from the theorem in (51), that if a witness has made $m + n$ assertions, of which $m$ are true and $n$ false, the probability of a future assertion being true is $\frac{m+1}{m+n+2}$, and that of its being false $\frac{n+1}{m+n+2}$. Let the first of these fractions be represented by $v$, and the second by $w$, then $v$ is the measure of the veracity of the individual, or the probability of his speaking the truth, and $w$ the opposite probability, since $v+w=1$. In general, the existing data are insufficient to enable us to determine the numerical values of $v$ and $w$ in this manner; and therefore in applying the formula to particular cases, we must assign arbitrary values to these quantities, founded on previous knowledge of the moral character of the individual, or on some notions, more or less sanctioned by experience, of the relative number of true and false statements made by men in general, placed in similar circumstances.

65. Having assumed $v$ and $w$, let us suppose a witness to testify that an event has taken place, the a priori probability of which is $p$, and let it be proposed to determine the probability of the event after the testimony. In this case the event observed ($E$) is the assertion of the witness, and two hypotheses only can be made respecting its cause; 1st, that the event testified really took place; 2d, that it did not. On the first hypothesis the witness has spoken the truth, the probability of which is $v$; and an event has occurred of which the probability is $p$; therefore (7) the probability ($P_1$) of the coincidence is $vp$. On the second hypothesis, the witness has testified falsely, the probability of which is $w$; and the event attested did not happen, the probability of which is $q$; therefore the probability ($P_2$) of the coincidence is $wq$. Hence, by the formula (47), $w_1 = \frac{P_1}{P_1+P_2}$, the probability ($w_1$) of the first hypothesis becomes $\frac{vp}{vp+wq}$, and the probability ($w_2$) of the second $\frac{wq}{vp+wq}$. The sum of these two probabilities is unity, a condition which ought evidently to be fulfilled, since no other hypothesis can be made, and consequently one or other of the two must be true. It is to be observed, that these values of $w_1$ and $w_2$ are the respective probabilities, after the testimony has been given, that the event attested took place, and that it did not.

Since $w_1 = \frac{vp}{vp+wq}$, we have $w_1 - p = \frac{vp - p(vp+wq)}{vp+wq} = \frac{p\{v(1-p)-wq\}}{vp+wq} = \frac{p(v-w)q}{vp+wq}$; but $v-w = v-(1-v) = 2v-1$, therefore $w_1 - p = \frac{pq(2v-1)}{vp+wq}$. This fraction being positive or negative, according as $2v-1$ is positive or negative, or as $v$ is greater or less than $\frac{1}{2}$, it follows that if $v > \frac{1}{2}$, then $w_1 > p$; that is to say, the probability of the event after the testimony is greater than its a priori probability when the veracity of the witness is greater than $\frac{1}{2}$. On the contrary, if the veracity of the witness is less than $\frac{1}{2}$, the effect of the testimony is to render the probability of the event less than its a priori probability.
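The single-witness formula and the sign of $w_1 - p$ may be verified with a short computation; the numerical values below are merely illustrative.

```python
# Probability of the event after a single testimony: v is the witness's
# veracity, p the a priori probability of the event (illustrative values).

def after_testimony(v, p):
    w, q = 1 - v, 1 - p       # w: chance of a false assertion; q: event fails
    return v * p / (v * p + w * q)

# With v > 1/2 the testimony raises the probability; with v < 1/2 it lowers it.
raised = after_testimony(0.7, 0.3)    # exceeds the prior 0.3
lowered = after_testimony(0.4, 0.3)   # falls below the prior 0.3
```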

66. If the event asserted by the witness be of such a nature that its occurrence is a priori extremely improbable, so that $p$ is a very small fraction, and $q$ consequently approaches nearly to unity, although at the same time the veracity of the witness be great, and measured by a fraction approaching to unity, the value of $w_1$ becomes nearly equal to $\frac{p}{w}$, (for on this supposition $vp+wq$ is nearly equal to $w$). But it is obvious, that however great the improbability of a witness giving false testimony may be supposed, the improbability of a physical event may be any number of times greater; in other words, however small a value may be given to $w$, the value of $p$ may still be any number of times smaller; so that notwithstanding the veracity of the witness, the probability of the event after the testimony, namely $w_1 = \frac{p}{w}$ nearly, may be less than any assignable quantity. On this principle mankind do not easily give credence to a witness asserting a very extraordinary or improbable event. The odds against the occurrence of the event may be so great, that the testimony of no single witness, however respectable his character, would suffice to induce belief.

67. In the case of the character of a witness being altogether unknown, we may suppose $v$ to have all possible values within certain limits, and take for $w_1$ the mean of its values, found by integrating $w_1\,dv$ between those limits. Since $w_1 = \frac{vp}{vp+wq}$, we have $\int w_1\,dv = \int \frac{vp\,dv}{vp+wq}$, which, on substituting $1-v$ and $1-p$ for $w$ and $q$ respectively, becomes $\int \frac{vp\,dv}{1-p+(2p-1)v}$, the integral of which is

$$p \left\{ \frac{v}{2p-1} - \frac{1-p}{(2p-1)^2} \log \left( 1-p+(2p-1)v \right) \right\} + C,$$

$C$ being an arbitrary constant, the value of which is determined by the assumed limits. If $v$ be supposed to vary between the limits $v=0$ and $v=1$, then

$$\int_0^1 w_1\,dv = \frac{p}{2p-1} \left( 1 - \frac{1-p}{2p-1} \log \frac{p}{1-p} \right);$$

and if we assume $p=\frac{3}{4}$, this becomes $\frac{3}{2}\left(1-\frac{1}{2}\log 3\right)$, which, since the logarithm is the Napierian logarithm, and Nap. log. $3 = 1.0986$, is $\frac{3}{2} \times 0.4507 = 0.676$, less than $\frac{3}{4}$. Whence we see, that on this hypothesis the probability of the event is diminished in consequence of the testimony.
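The mean value just found can be confirmed by a direct numerical quadrature of $\int_0^1 \frac{vp\,dv}{vp+wq}$, taking $p = \frac{3}{4}$, for which the general expression gives $\frac{3}{2}\left(1-\frac{1}{2}\log 3\right) \approx 0.676$. The step count below is an arbitrary choice.

```python
import math

# Midpoint-rule quadrature of the mean of w1 = v p / (v p + (1-v)(1-p))
# as v ranges over (0, 1); the number of steps is arbitrary.

def mean_posterior(p, steps=100000):
    q = 1 - p
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        v = (i + 0.5) * h
        total += v * p / (v * p + (1 - v) * q) * h
    return total

closed_form = 1.5 * (1 - 0.5 * math.log(3))   # about 0.676, less than p = 3/4
```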

68. The credit due to the testimony of a witness depends not merely on his good faith, but also on the probability that he is not himself deceived with respect to the event he asserts. The chances of a witness being deceived through credulity or ignorance are much more numerous in general than the chances of intentional fraud; and this must be the case more particularly when the event is of such a nature that it may happen in various ways which may be mistaken one for another: as for instance, in the case of a lottery ticket being drawn, and the witness asserting that it bears a particular number, which might with equal probability be any other number on the wheel. The following question will illustrate the method of applying the calculus when a distinction is made between these sources of error.

An urn contains $s$ balls, of which $a_1$ are marked $A_1$, $a_2$ marked $A_2$, ..., $a_n$ marked $A_n$. A ball having been drawn at random, a witness of the drawing affirms that the ball drawn is marked $A_m$; required the probability of the testimony being true.

Here we have $s = a_1 + a_2 + a_3 + \cdots + a_n$ ($n$ being the number of the different indices or sorts of balls); so that if we make $p_1 = a_1/s$, $p_2 = a_2/s$, ..., $p_n = a_n/s$, then $p_1$ is the a priori probability that the ball drawn is of the class marked $A_1$, $p_2$ the probability that it belongs to the class whose index is $A_2$, and so on. It is evident that $n$ different hypotheses may be made respecting the index of the ball which has been drawn, for it may belong to any one of the different classes $A_1, A_2, \ldots, A_n$. Let the probabilities of these hypotheses be respectively $w_1, w_2, \ldots, w_n$ (that is, in respect of any particular index $i$, $w_i$ is the probability after the assertion that the ball drawn is marked $A_i$); and let the probabilities of the assertion on each of these hypotheses be respectively $P_1, P_2, \ldots, P_n$ (that is, if the ball drawn be marked $A_i$, then $P_i$ is the probability the witness will assert it to be marked $A_m$). Lastly, let $v$ be the veracity of the witness, and $u$ the probability that he has not been deceived.

(1.) Let us first consider the hypothesis that the ball drawn is marked $A_m$, and consequently that the assertion is true. In order to find $P_m$, the probability of the assertion being made, there are four cases to be considered. 1st, We may suppose the witness is not deceived himself ($u$), and that he speaks the truth ($v$). The probability of the assertion in this case is $uv$. 2d, The witness knows the truth, but intends to deceive, or testifies falsely. In this case the probability of the assertion being made, on the hypothesis under consideration, is $0$. 3d, The witness has been deceived himself, but intends to speak the truth. In this case also the probability of the assertion being made is $0$. 4th, The witness has been deceived himself, and intends to deceive. In this case the assertion might be made; and to find the probability of its being made we have to consider, that since the witness has been deceived, he must have supposed some other index than $A_m$ to have been drawn; and since he intends to deceive, he must assert some other index to be drawn than that which he supposes to be drawn. Setting aside, therefore, the index which he supposes to have been drawn, there remain $n-1$ others, any one of which he is as likely to name as any other. The probability, therefore, of his naming $A_m$ when he intends to deceive is $\frac{1}{n-1}$. Hence the probability of the assertion in this case is compounded of the probabilities of three simple events, as follows: 1. Probability the witness is deceived $= 1-u$; 2. Probability he intends to deceive $= 1-v$; 3. Probability he names $A_m$ $= \frac{1}{n-1}$. The probability of the assertion is therefore in this case $= \frac{(1-u)(1-v)}{n-1}$. Adding this to the probability found in the first case, we have $P_m$, the whole probability of the assertion being made on the hypothesis that the index of the ball drawn was $A_m$, namely

$$P_{m} = uv + \frac{(1-u)(1-v)}{n-1}.$$

(2.) Let us now consider one of the remaining hypotheses, and suppose that the ball actually drawn was marked $A_r$, and not $A_m$ as attested by the witness. As before, there are four possible cases for consideration. 1st, The witness knows the fact, and speaks the truth. In this case the assertion could not be made, or its probability is $0$. 2d, The witness knows the fact, and intends to deceive. In this case the probability of his asserting $A_m$ to be drawn is compounded of the probability that he is not deceived ($u$), the probability that he testifies falsely ($1-v$), and the probability that, knowing the index $A_r$ to be drawn, he selects $A_m$ from among the $n-1$ which remain after rejecting $A_r$ $\left(\frac{1}{n-1}\right)$. The probability of the assertion being made in this case is therefore $\frac{u(1-v)}{n-1}$. 3d, The witness is deceived, and intends to speak the truth. By reasoning as in the last case, it is easy to see that the probability of the assertion being made in this case is $\frac{(1-u)v}{n-1}$. 4th, The witness is deceived, and intends to deceive. The probability of the assertion being made in this case will be found by considering, that as the witness is himself deceived, he must suppose some particular index to be drawn different from $A_r$ (which is drawn by hypothesis), for instance $A_j$, the probability of which is $\frac{1}{n-1}$; and intending to deceive, he must fix on some index different from $A_j$, which he supposes to be drawn; and he announces $A_m$, the probability of which selection is also $\frac{1}{n-1}$. The probability, therefore, that the witness supposes $A_j$ to be drawn, and announces $A_m$, is $\frac{1}{(n-1)^2}$. But it is evident, that whatever can be affirmed with respect to the particular index $A_j$, may be affirmed with equal truth of every one of the other indexes, excepting $A_r$, which is actually drawn (since by hypothesis the witness is deceived), and $A_m$, which he announces (since by hypothesis he lies). There are therefore $n-2$ different ways in which he may at the same time be deceived, and intend to deceive, and announce $A_m$; consequently the probability of this announcement in one or other of these ways is $\frac{n-2}{(n-1)^2}$. Multiplying this into the probability of his being deceived $(1-u)$, and the probability of his giving false testimony $(1-v)$, the probability of the assertion in this case becomes $\frac{(1-u)(1-v)(n-2)}{(n-1)^2}$. Hence the whole probability of the assertion, in all the cases included in the hypothesis that the ball actually drawn was marked $A_r$, is

$$P_{r} = \frac{u(1-v)}{n-1} + \frac{(1-u)v}{n-1} + \frac{(1-u)(1-v)(n-2)}{(n-1)^2}.$$

As this expression will evidently be the probability of the assertion on any other of the $n-1$ hypotheses that the ball actually drawn was marked with an index different from $A_m$, the sum of the probabilities of the assertion on all these hypotheses is $\Sigma P_i$, where $i$ is successively each of the numbers $1, 2, 3, \ldots, n$, excepting $m$.

We have now to find $w_m$, the probability of the first hypothesis. Since the hypotheses, in the present question, are not all equally probable a priori, we must have recourse to the formula (49), $w_m = \frac{p_m P_m}{\Sigma p_i P_i}$, the summation in the denominator extending to all the hypotheses; and consequently in the present case we have

$$w_{m} = \frac{p_{m}P_{m}}{p_{m}P_{m} + \Sigma p_{i}P_{i}},$$

the sign of summation $\Sigma$ including every value of $i$ from $i=1$ to $i=n$, excepting $i=m$. Now the value of $P_i$ being the same, namely $P_r$, in respect of each of the hypotheses which suppose the assertion untrue, $\Sigma p_i P_i = P_r \Sigma p_i$; and the sum of all the values of $p_i$ from $i=1$ to $i=n$ being $1$, on excluding $p_m = a_m/s$, we have $\Sigma p_i = (s-a_m)/s$. Substituting this, together with the values of $P_m$ and $P_r$ as above found, and making $u' = 1-u$, $v' = 1-v$, the formula becomes, after the proper reduction,

$$w_{m} = \frac{a_{m}\left\{(n-1)uv + u'v'\right\}}{a_{m}\left\{(n-1)uv + u'v'\right\} + (s-a_{m})\left\{uv' + u'v + \dfrac{n-2}{n-1}\,u'v'\right\}},$$

which is the probability of the hypothesis that a ball marked $A_m$ was drawn, or that the testimony is true.

When there are no two balls in the urn having the same index, the numbers $a_{1}, a_{2}, a_{3}, \ldots$ become each $=1$, and $s=n$. In this case the formula gives

$$w_{m} = \frac{(n-1)uv + u'v'}{(n-1)uv + u'v' + (n-1)(uv' + u'v) + (n-2)u'v'},$$

which, on observing that $uv + uv' + u'v + u'v' = 1$, becomes by reduction

$$w_{m} = uv + \frac{(1-u)(1-v)}{n-1}.$$

This is the probability of the truth of the testimony of a witness, who affirms that the number $m$ is drawn from an urn which contains $n$ balls, numbered $1, 2, \ldots, n$. It is obvious, that when $u$ and $v$ are fractions approaching to unity, and $n$ is a considerable number, the second term becomes very small, and may be neglected. The probability then becomes simply $w_{m}=uv$.
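As a numerical check of these reductions, the general formula may be compared with the special case of distinct balls; the values of $u$, $v$, $n$ below are illustrative only.

```python
# w_general: the formula of (68); w_distinct: the case a_m = 1, s = n.
# u: probability the witness is not deceived; v: his veracity.

def w_general(a_m, s, n, u, v):
    up, vp = 1 - u, 1 - v
    num = a_m * ((n - 1) * u * v + up * vp)
    den = num + (s - a_m) * (u * vp + up * v + (n - 2) / (n - 1) * up * vp)
    return num / den

def w_distinct(n, u, v):
    return u * v + (1 - u) * (1 - v) / (n - 1)

# With a_m = 1 and s = n the general formula collapses to the special case:
agree = abs(w_general(1, 10, 10, 0.9, 0.8) - w_distinct(10, 0.9, 0.8))
# For a considerable n the second term is negligible and w is nearly u*v:
near_uv = abs(w_distinct(1000, 0.95, 0.95) - 0.95 * 0.95)
```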

69. We now proceed to consider the probability of an event attested by several witnesses; and first let us suppose the witnesses to agree in their testimony. The measures of the veracity of the several witnesses being respectively $v_{1}, v_{2}, v_{3}, \ldots$, and the a priori probability of the event being $p$, we have by (65) for its probability after the testimony of the first witness,

$$w_{1} = \frac{v_{1}p}{v_{1}p+(1-v_{1})(1-p)}.$$

In order to find the probability of the event after the second witness gives his testimony, we may suppose the a priori probability to be changed from $p$ to $w_1$ by the testimony of the first witness, and the same formula gives

\[ w_2 = \frac{v_2 w_1}{v_2 w_1 + (1-v_2)(1-w_1)} = \frac{v_1 v_2 p}{v_1 v_2 p + (1-v_1)(1-v_2)(1-p)}. \]

Let a third witness now come forward, and give testimony in favour of the same event. Its probability after his testimony will become in like manner

\[ w_3 = \frac{v_3 w_2}{v_3 w_2 + (1-v_3)(1-w_2)} = \frac{v_1 v_2 v_3 p}{v_1 v_2 v_3 p + (1-v_1)(1-v_2)(1-v_3)(1-p)}. \]

In general, let \( w_x \) be the probability of an event after it has been attested by \( x \) witnesses, and let \( v_x \) be the veracity of the last witness, then \( w_{x-1} \) being the probability of the event after \( x-1 \) eyewitnesses have each testified in its favour, we have

\[ w_x = \frac{v_1 v_2 \ldots v_x p}{v_1 v_2 \ldots v_x p + (1-v_1)(1-v_2) \ldots (1-v_x)(1-p)}. \]

If we suppose the witnesses all equally credible, or that \( v_1 = v_2 = \cdots = v_x = v \), this becomes

\[ w_x = \frac{v^x p}{v^x p + (1-v)^x(1-p)} = \frac{1}{1 + \left(\dfrac{1-v}{v}\right)^x \dfrac{1-p}{p}}. \]

Now, if \( v = \frac{1}{2} \), then \( (1-v) = \frac{1}{2} \), and \( w_x = p \); whence it appears that the probability of an event is not increased by the testimony of any number of witnesses, when the veracity of each is only \( \frac{1}{2} \); but when \( v \) is greater than \( \frac{1}{2} \), the event becomes more probable as the number of witnesses is greater, and when \( v \) is a considerable fraction, its probability increases very rapidly with the number of witnesses.

70. When the values of \( v \) and \( p \) are given, that of \( x \) in the last formula may be found so as to render \( w_x \) of any given value. Hence we may find the number of witnesses required to make it an even wager whether an event exceedingly improbable, and in favour of which they give unanimous testimony, has happened or not. For example, let the odds against the event be a million million to one, that is, let \( p = \frac{1}{1,000,000,000,001} = \frac{1}{10^{12}+1} \), and let \( v \), the veracity of each witness, be \( \frac{9}{10} \). In order that \( w_x \) may equal \( \frac{1}{2} \), we must have \( \left(\frac{1-v}{v}\right)^x \frac{1-p}{p} = 1 \). Now \( \frac{1-v}{v} = \frac{1}{9} \) and \( \frac{1-p}{p} = 10^{12} \), therefore \( \left(\frac{1}{9}\right)^x \times 10^{12} = 1 \); whence \( x \log \frac{1}{9} = \log \frac{1}{10^{12}} \), or \( x \log 9 = 12 \), and therefore \( x = \frac{12}{\log 9} = 12.6 \)

nearly, so that 13 independent witnesses would suffice to render it more probable that the event really took place than that it did not.
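The computation of (70) is easily repeated in a few lines, with the figures of the text (odds of $10^{12}$ to 1 against the event, veracity $\frac{9}{10}$ for each witness).

```python
import math

def posterior(x, v, p):
    """Probability of the event after x concurring witnesses of veracity v."""
    ratio = ((1 - v) / v) ** x * (1 - p) / p
    return 1 / (1 + ratio)

def witnesses_needed(odds_against, v):
    """Smallest x with ((1-v)/v)^x * odds_against <= 1."""
    return math.ceil(math.log(odds_against) / math.log(v / (1 - v)))

p = 1 / (10**12 + 1)
x = witnesses_needed(10**12, 0.9)   # 12.57..., rounded up to 13 witnesses
```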

This example is given by Mr. Babbage, (Ninth Bridgewater Treatise, Note E), with a view to show the fallacy of Hume's celebrated argument respecting miracles. What the example proves is simply this, that if we suppose an urn to contain a million million of white balls, and only one black ball, and that on a ball being drawn at random from the urn, thirteen eyewitnesses of the drawing, each of whom makes only one false statement in ten, without collusion, and independently of each other, affirm to A, who was not present at the drawing, that the ball drawn was black, then A would have rather a stronger reason for believing than for disbelieving the testimony. But it is sufficiently obvious, that the event attested in this case, though exceedingly improbable a priori, cannot be regarded as in any way miraculous. On the contrary, the black ball might be drawn with the same facility, and was a priori as likely to be drawn, as any other specified ball in the urn. Let it be granted that an event is within the range of fortuitous occurrence, and that there exists a single chance in its favour out of any number of millions of chances, it may then happen in any one trial; nay, a number of trials may be assigned, such that its non-occurrence would be many times more improbable than the contrary.

71. Let us next consider the case of a number of witnesses contradicting each other. If the first witness announces an event of which the probability is \( p \), then the probability, after the testimony, of its having happened is \( w_1 \), and the probability that it has not happened is \( 1-w_1 \). Suppose a second witness now to appear, and testify that the event has not happened, and let the probability, after this second testimony, that the event has not happened be denoted by \( w_2 \); then \( 1-w_1 \) being the probability, before his testimony was given, that what he asserts is true, and \( v_2 \) being the measure of his veracity, we have, as in (69), \( w_2 = \frac{v_2(1-w_1)}{v_2(1-w_1) + (1-v_2)w_1} \); hence, since

\[ w_1 = \frac{v_1 p}{v_1 p + (1-v_1)(1-p)}, \]

there results

\[ w_2 = \frac{v_2(1-v_1)(1-p)}{v_2(1-v_1)(1-p) + v_1(1-v_2)p}, \]

for the measure of the probability that the event has not happened. The probability that it has happened is therefore \( 1-w_2 \); and accordingly, if \( w_2 \) be less than \( \frac{1}{2} \), there is a stronger reason for believing that the event happened than that it did not. The method of forming the expression for the probability of the event, after it has been attested or denied by a third witness, or any number of successive witnesses, is obvious.

If we suppose the values of \( v_1 \) and \( v_2 \) to be equal, the expression becomes \( w_2 = 1-p \), which is the a priori probability that the event did not happen. It is obvious that this must be the case, inasmuch as two contradictory testimonies of equal weight neutralize each other. In general, the probability of an event which is affirmed by \( m \) witnesses, and denied by \( n \) witnesses, all equally credible, is the same as that of an event which is affirmed by \( m-n \) witnesses who agree in their testimony.
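The neutralizing effect of contradictory testimonies can be verified directly; the veracities and prior below are illustrative.

```python
# Probability the event did NOT happen, after the first witness (veracity v1)
# affirms it and the second (veracity v2) denies it, p being the prior.

def prob_not_happened(v1, v2, p):
    num = v2 * (1 - v1) * (1 - p)
    return num / (num + v1 * (1 - v2) * p)

# Equal veracities neutralize each other: the result is simply 1 - p.
neutral = prob_not_happened(0.8, 0.8, 0.3)
```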

72. When a relation has been transmitted through a series of narrators, of whom the first only has a direct knowledge of the event, and each of the others derives his knowledge from the relation of the preceding, the probability of the event is diminished by every succeeding relation. In order to obtain a general expression for the probability of a traditional testimony, we may take the event considered in (68), namely the extraction of a ball marked \( A_m \) from an urn containing \( s \) balls, of which \( a_1 \) are marked \( A_1 \), \( a_2 \) marked \( A_2 \), ..., \( a_n \) marked \( A_n \), there being in all \( n \) different indices. Now suppose the relation to have passed through a chain of narrators, \( T_1, T_2, T_3, \ldots, T_x \), \( x \) in number, of whom the first only was an eyewitness of the event, each of the others receiving his knowledge of it from the one preceding him, and communicating it in his turn to the succeeding; the question is to determine the probability that a ball marked \( A_m \) was drawn, after this event has been narrated by \( T_x \), the last witness of the series.

73. In order to apply the general formula of (68) to this case, it is necessary to remark that the event observed is the attestation of $T_x$ of his having been informed by $T_{x-1}$ that the ball drawn from the urn, the drawing of which was seen by $T_1$, was marked $A_m$. There are $n$ different hypotheses respecting the index of the ball actually drawn, but it is only necessary to consider two of them, namely, the hypothesis that the ball actually drawn was marked $A_m$, and any one of the other hypotheses which consist in supposing that a ball with a different index from $A_m$ was drawn, for example $A_r$. Let the probability of the attestation, on the hypothesis that the index of the ball drawn was $A_m$, be denoted by $y_x$, and its probability on the hypothesis that the index was $A_r$ by $y'_x$, ($y_x$ and $y'_x$ corresponding to $P_m$ and $P_r$ in (68), which express the same probabilities in respect of the eyewitness $T_1$); then by (68), the probability of the hypothesis that $A_m$ was drawn is

$$w_m = \frac{p_m y_x}{p_m y_x + \Sigma p_r y'_x}.$$

But since $y'_x$ is the same for all the hypotheses that the index drawn was different from $A_m$, $\Sigma p_r y'_x = y'_x \Sigma p_r$; and by (68) $\Sigma p_r = (s - a_m)/s$, and $p_m = a_m/s$; therefore

$$w_m = \frac{a_m y_x}{a_m y_x + (s - a_m) y'_x}.$$

We have now to find $y_x$ and $y'_x$ in terms of $x$. Let $v_1, v_2, v_3, \ldots$ be the respective probabilities of $T_1, T_2, T_3, \ldots$ speaking the truth; then the probability of $T_x$ speaking the truth is $v_x$, and the probability that he does not, $1-v_x$, whether because he is dishonest, and intends to deceive, or because he has mistaken the statement of the preceding witness. Now there are two ways in which it may happen that $A_m$ is announced by $T_x$. First, if he speaks the truth, and has been informed by the preceding narrator $T_{x-1}$ that $A_m$ was the index drawn; secondly, if he lies, and has been informed by $T_{x-1}$ that a different index from $A_m$ was drawn. Assuming $y_{x-1}$ to have the same signification with respect to $T_{x-1}$ that $y_x$ has been assumed to have with respect to $T_x$ (that is to say, the probability of the assertion being made by $T_{x-1}$ on the hypothesis that the ball actually drawn was marked $A_m$), the probability of the first of these combinations is $v_x y_{x-1}$. With respect to the second case, it is to be observed, that if $T_x$ announces a different index from that which has been announced to him by $T_{x-1}$, the chance of his announcing $A_m$ out of the $n-1$ indexes different from that announced by $T_{x-1}$ is $\frac{1}{n-1}$; and on multiplying this by the probability $1-v_x$ that the testimony of $T_x$ is false, and by the probability $1-y_{x-1}$ that $T_{x-1}$ has announced a different index from $A_m$, we have, for the probability of the second combination, $\frac{(1-v_x)(1-y_{x-1})}{n-1}$. The whole probability of $T_x$ testifying that $A_m$ was drawn is therefore, on the first hypothesis, given by the equation

$$y_x = v_x y_{x-1} + \frac{(1-v_x)(1-y_{x-1})}{n-1}.$$

This is an equation of finite differences of the first order, the complete integral of which is

$$y_x = \frac{1}{n} + C\,\frac{(nv_x - 1)(nv_{x-1} - 1) \cdots (nv_2 - 1)}{(n-1)^{x-1}}.$$

In order to determine the arbitrary constant $C$, it is to be observed, that, since $y_1, y_2, \ldots$, as well as $v_1, v_2, \ldots$, apply to the narrators $T_1, T_2, \ldots$ respectively, if we suppose $x=1$, the resulting value of the integral will be the probability that $A_m$ was announced by the eyewitness $T_1$, on the hypothesis that $A_m$ was actually drawn; that is to say, the quantity denoted by $P_m$ in (68). The equation then becomes $P_m = C + \frac{1}{n}$, whence $C = \frac{nP_m - 1}{n}$. If, therefore, we make

$$X = \frac{(nv_x - 1)(nv_{x-1} - 1) \cdots (nv_2 - 1)}{(n-1)^{x-1}},$$

we obtain, on the first hypothesis, for any value of $x$,

$$y_x = \frac{1 + (nP_m - 1)X}{n}.$$

In the same manner we find the probability $y'_x$ of the testimony given by $T_x$ on the hypothesis that the ball actually drawn was marked $A_r$. The probability of the assertion being made by the eyewitness, being on this hypothesis $P_r$, we have

$$y'_x = \frac{1 + (nP_r - 1)X}{n}.$$

Substituting these values of $y_x$ and $y'_x$ in the expression above found for $w_m$, we obtain, for the probability of the event observed by $T_1$ and narrated by $T_x$, the narration having passed from one to another in the manner supposed,

$$w_m = \frac{a_m \left\{1 + (nP_m - 1)X\right\}}{a_m \left\{1 + (nP_m - 1)X\right\} + (s - a_m)\left\{1 + (nP_r - 1)X\right\}}.$$

74. Since $\frac{nv_x - 1}{n-1} = v_x - \frac{1-v_x}{n-1}$, and since $v_x$ is always less than unity, and $n$ always greater than unity, each of the factors of the product represented by $X$, whether positive or negative, is numerically less than unity; whence the value of $X$ becomes smaller and smaller as $x$ increases. Suppose $x$ infinite, then $X=0$, and $w_m = a_m/s$, which is the a priori probability of the event. Hence we see that the probability of an event transmitted through a series of traditionary evidence becomes weaker at every step, and ultimately equal to the simple probability of the event, independent of any testimony.
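The recurrence of (73), its integral, and the decay just described can all be checked numerically. For simplicity the chain below assumes one common veracity $v$ for the narrators after the eyewitness, an illustrative choice.

```python
# y_iter: iterate y_x = v*y_{x-1} + (1-v)(1-y_{x-1})/(n-1), starting from
# y_1 = P_m (the eyewitness's probability of announcing the true index).
# y_closed: the complete integral {1 + (n P_m - 1) X} / n, with
# X = ((n v - 1)/(n - 1))^(x-1) when all veracities equal v.

def y_iter(P_m, v, n, x):
    y = P_m
    for _ in range(x - 1):
        y = v * y + (1 - v) * (1 - y) / (n - 1)
    return y

def y_closed(P_m, v, n, x):
    X = ((n * v - 1) / (n - 1)) ** (x - 1)
    return (1 + (n * P_m - 1) * X) / n

match = abs(y_iter(0.9, 0.8, 10, 7) - y_closed(0.9, 0.8, 10, 7))
# After a long chain the probability sinks to the a priori value 1/n:
faded = abs(y_closed(0.9, 0.8, 10, 300) - 1 / 10)
```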

75. When the urn is supposed to contain only $n$ balls, each having a different index, the expression for $w_m$ is greatly simplified; for, in this case, $a_m = 1$, $s = n$; therefore, (since $P_m + (n-1)P_r = 1$) the denominator becomes $n$, and we have consequently $w_m = \frac{1 + (nP_m - 1)X}{n}$, which coincides with the value of $y_x$ found above, that is to say, with the probability of the event being testified by $T_x$ on the hypothesis that it actually happened. Laplace, in solving this particular case of the problem, (p. 456) assumes that the probabilities here denoted by $y_x$ and $w_m$ are identical. They are, however, as is evident from the above analysis, quite distinct in their nature, and their values are only equal in the particular case in which $s-a_m$ is to $a_m$ in the ratio of $n-1$ to $1$. (Poisson, p. 112.)

76. The question of determining the probability that the verdict of a jury is correct, is precisely analogous to that of finding the probability of an event attested by one or more witnesses. Let us first take the case of a single juror, and assume $u$ the probability that the juror gives a correct verdict, (that is, correct in respect of the facts), and $p$ the probability that the accused is guilty before being put on his trial. Suppose the verdict guilty to be returned; two hypotheses may be made respecting the cause of the verdict, first, that the accused is guilty; secondly, that he is innocent. On the first hypothesis, the accused will be condemned if the juror gives a right verdict, the probability of which is $u$. On the second hypothesis, the accused will be condemned if the juror gives a wrong verdict, the probability of which is $1-u$. But the a priori probabilities of these causes (the guilt or innocence of the accused) being respectively $p$ and $1-p$, we have by (49)

$$w_1 = \frac{up}{up + (1-u)(1-p)}, \quad w_2 = \frac{(1-u)(1-p)}{up + (1-u)(1-p)},$$

$w_1$ being the probability of the first hypothesis, or the probability that the accused is guilty after the verdict has been given, and $w_2$ the probability resulting from the verdict that the accused is innocent.

77. Suppose the verdict not guilty to be given, and let $w'$ and $w''$ be the probabilities after the verdict of the two hypotheses. On the first hypothesis, namely, that the accused is guilty, this verdict will be given if the juror gives a wrong verdict, of which the probability is \( 1-u \); and on the second hypothesis, the verdict will be given if the juror gives a right verdict, of which the probability is \( u \); and the probabilities of these hypotheses before the verdict being respectively \( p \) and \( 1-p \), as before, we have

\[ w' = \frac{(1-u)p}{(1-u)p + u(1-p)}, \quad w'' = \frac{u(1-p)}{(1-u)p + u(1-p)}. \]

From the above value of \( w_1 \), we obtain \( w_1 - p = \frac{p(1-p)(2u-1)}{up+(1-u)(1-p)} \); a fraction which is positive or negative according as \( u \) is greater or less than \( \frac{1}{2} \). Hence it appears that the guilt of the accused is only rendered more probable by the verdict guilty being pronounced, when the probability that the juror gives a correct verdict is greater than \( \frac{1}{2} \).

In like manner it is shown that \( w' \) (the presumption of the guilt of the accused after a verdict of acquittal) is greater than \( p \) when \( u \) is less than \( \frac{1}{2} \).

78. The a priori probability of the condemnation of the accused before he is put on his trial is \( up + (1-u)(1-p) \); for there are two ways in which this condemnation may take place; first, if the accused be guilty, and the juror give a correct verdict, the probability of which concurrence is \( up \); and, secondly, if the accused be innocent, and the juror give a wrong verdict, the probability of which is \( (1-u)(1-p) \). Therefore, making \( c \) the probability of a verdict of condemnation, we have \( c = up + (1-u)(1-p) \); and for a verdict of acquittal, \( 1-c = (1-u)p + u(1-p) \).

79. Let us next suppose that after the verdict of the first juror has been pronounced, the accused is put on his trial before a second juror; let \( u' \) be the probability that the second juror gives a correct verdict, and \( c' \) the probability that the accused will be pronounced guilty by him. After the verdict guilty has been pronounced by the first juror, the probability of the guilt of the accused is \( w_1 \), and it is evident that \( c' \) will be found by substituting \( u' \) for \( u \), and \( w_1 \) for \( p \), in the above value of \( c \), whence \( c' = u' w_1 + (1-u')(1-w_1) \). The probability of a verdict of condemnation by both jurors is \( cc' \); therefore, observing that \( cw_1 = up \) and \( c(1-w_1) = (1-u)(1-p) \), we have for this probability

\[ cc' = uu'p + (1-u)(1-u')(1-p). \]

The probability of the guilt of the accused after a verdict of acquittal has been pronounced by the first juror being \( w' \), the probability of a verdict of acquittal being given by the second juror is \( (1-u')w' + u'(1-w') \); therefore, observing that \( (1-c)w' = (1-u)p \) and \( (1-c)(1-w') = u(1-p) \), we have for the probability of a verdict of acquittal by both jurors

\[ (1-c)\left\{(1-u')w' + u'(1-w')\right\} = (1-u)(1-u')p + uu'(1-p). \]

Adding the probability of a verdict of condemnation by both jurors to that of acquittal by both, we have \( uu' + (1-u)(1-u') \) for the probability of both giving the same verdict. This result is independent of \( p \), and is evidently true a priori, inasmuch as there are two ways in which the same verdict may be given, namely, when both jurors are right, and when both are wrong.

The probability of acquittal by the second juror, after a verdict of guilty by the first, is \( 1-c' = (1-u')w_1 + u'(1-w_1) \); multiplying by \( c \), and substituting for \( cw_1 \) and \( c(1-w_1) \) their values, we have for the probability of a verdict of guilty by the first, and not guilty by the second,

\[ c(1-c') = u(1-u')p + (1-u)u'(1-p). \]

In like manner, if the accused has been acquitted by the first juror, the presumption of his guilt becomes \( w' \), and the probability of a verdict of guilty by the second is \( u'w' + (1-u')(1-w') \); therefore the probability of a verdict of not guilty by the first, and of guilty by the second, is

\[ (1-c)\left\{u'w' + (1-u')(1-w')\right\} = (1-u)u'p + u(1-u')(1-p). \]

The sum of these two expressions gives for the probability of a discordant verdict \( u(1-u') + (1-u)u' \).
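That the probability of agreement is independent of \( p \) may be confirmed by direct enumeration of the four cases (each juror right or wrong); the following sketch, in modern notation and no part of the original, does so with exact fractions.

```python
from fractions import Fraction
from itertools import product

def agreement_probability(u1, u2, p):
    """Chance that two jurors return the same verdict, summed over
    the guilt of the accused and the correctness of each juror."""
    total = Fraction(0)
    for guilty in (True, False):
        weight = p if guilty else 1 - p
        for r1, r2 in product((True, False), repeat=2):
            chance = weight * (u1 if r1 else 1 - u1) * (u2 if r2 else 1 - u2)
            if r1 == r2:            # same verdict: both right or both wrong
                total += chance
    return total

u1, u2 = Fraction(3, 4), Fraction(2, 3)
# The result u1*u2 + (1-u1)*(1-u2) is the same for every value of p.
```

With \( u = \frac{3}{4} \) and \( u' = \frac{2}{3} \) the agreement probability is \( \frac{1}{2} + \frac{1}{12} = \frac{7}{12} \), whatever the prior \( p \).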

80. If we now suppose \( u' = u \), and make \( 1-u = w \), the probability that the two jurors will agree in their verdict, whether they are both right or both wrong, is \( u^2 + w^2 \); and the probability of a discordant verdict is \( 2uw \). The sum of the two is \( u^2 + 2uw + w^2 = (u+w)^2 = 1 \); and the probabilities of the different cases are respectively given by the terms of the development of the binomial \( (u+w)^2 \).

By pursuing this reasoning, it is easy to see that if there be any number \( h \) whatever of jurors, or voters on any question which admits only of simple affirmation or negation, all being supposed to possess the same integrity and knowledge, so that there is the same probability \( u \) of a correct decision in respect of each, the probabilities of the different cases are found by the development of the binomial \( (u+w)^h \). The probability of a correct verdict being pronounced unanimously is \( u^h \); of an erroneous one being pronounced unanimously is \( w^h \); and the probability that a correct verdict will be given by \( m \) of the jurors, and an erroneous one by \( n \), is \( Uu^m w^n \), where \( U = \frac{1 \cdot 2 \cdot 3 \cdots h}{1 \cdot 2 \cdot 3 \cdots m \times 1 \cdot 2 \cdot 3 \cdots n} \).

81. The probability that the accused will be pronounced guilty by \( m \) jurors, and acquitted by \( n \), on the supposition that the value of \( u \) is the same for each juror, is thus found. There are two ways in which this event may take place; 1st, if the accused be guilty (the probability of which is \( p \)), and \( m \) jurors decide correctly, and \( n \) wrongly (the probability of which is \( Uu^m w^n \)); the probability of the condemnation taking place in this way is therefore \( Uu^m w^n p \). 2d, If the accused be innocent (the probability of which is \( q \)), and \( n \) jurors decide rightly, and \( m \) wrongly (the probability of which is \( Uu^n w^m \)); the probability of the event taking place in this way is therefore \( Uu^n w^m q \). Let \( G \) therefore denote the whole probability of the verdict, and we have

\[ G = U(u^m w^n p + u^n w^m q). \]

Hence the probability that the accused will be condemned unanimously by a jury consisting of \( h \) jurors is \( u^h p + w^h q \); and the probability that he will be unanimously acquitted, \( u^h q + w^h p \).

82. Suppose the accused to have been pronounced guilty by \( m \) jurors, and not guilty by \( n \) jurors, the probability of the verdict of the majority being correct is found from the formula in (49). Two hypotheses may be made: 1st, the accused is guilty; 2d, he is innocent. The probability \( P_1 \) of the observed event (the condemnation by \( m \), and acquittal by \( n \) jurors) on the first hypothesis is \( Uu^m w^n \), and on the second \( Uu^n w^m \); the a priori probabilities of the two hypotheses (or the probabilities denoted by \( \lambda_1 \) and \( \lambda_2 \) in (49)) being \( p \) and \( q \); therefore if \( w_1 \) denote the probability of the verdict being correct, that is, the probability of the first hypothesis after the verdict has been pronounced, and \( w_2 \) the probability of its being wrong, we shall have (49)

\[ w_1 = \frac{u^m w^n p}{u^m w^n p + u^n w^m q}, \quad w_2 = \frac{u^n w^m q}{u^m w^n p + u^n w^m q}. \]

If the verdict has been pronounced unanimously, then \( m = h \) and \( n = 0 \), and the formulae become

\[ w_1 = \frac{u^h p}{u^h p + w^h q}, \quad w_2 = \frac{w^h q}{u^h p + w^h q}. \]

If \( p = q = \frac{1}{2} \), we have then

\[ w_1 = \frac{u^m w^n}{u^m w^n + u^n w^m} = \frac{u^{m-n}}{u^{m-n} + w^{m-n}}. \]

But this is the probability of a verdict being correct which has been pronounced unanimously by \( m-n \) jurors; whence it follows that the probability of a decision rendered by a given majority being correct, is the same as that of a decision rendered unanimously by a jury equal in number to the difference between the majority and minority, and is therefore independent of the total number of jurors. This, however, is only true on the supposition that the value of \( u \) is known a priori; for if \( u \) be not absolutely known, the weight of the verdict depends on the ratio of the majority to the whole number of jurors. This is in accordance with common notions, for it will readily be admitted that a verdict given unanimously by a jury of 10 will be entitled to much more weight than one pronounced by a jury consisting of a large number, as 100, in which 55 are of one opinion, and 45 of the opposite. In this case, the opinion of the minority throws great doubt on the correctness of the verdict. It is to be observed, however, that the probability of a verdict being given by a small majority becomes less and less as the number of jurors is increased.
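The remarkable property that only the difference between majority and minority matters (when \( u \) is known and \( p = q = \frac{1}{2} \)) is easily checked by computation; the sketch below is a modern illustration, not part of the original text.

```python
from fractions import Fraction

def majority_correct(u, m, n):
    """Probability (art. 82) that a verdict carried by m votes against n
    is correct, when p = q = 1/2 and u is known a priori."""
    w = 1 - u
    return u**m * w**n / (u**m * w**n + u**n * w**m)

u = Fraction(3, 4)
# Only the difference m - n matters: 8 to 4, 7 to 3, 5 to 1, and 4 to 0
# all give the probability of a unanimous jury of four.
```

Algebraically, dividing numerator and denominator by \( u^n w^n \) reduces the expression to \( \frac{u^{m-n}}{u^{m-n}+w^{m-n}} \), which involves only \( m-n \).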

83. When the number who dissent from the opinion of the majority is unknown, and we merely know that the majority exceeds the minority by at least \( j \) jurors, the probability of the verdict being correct is found as follows. Suppose the verdict to be guilty. On the hypothesis that it is correct, the probability of the accused being found guilty by \( h-x \) jurors, and not guilty by \( x \) jurors, is, by the formula in (80),

\[ U u^{h-x} w^{x}. \]

Now, if we give \( x \) successively all the values \( 0, 1, 2, \ldots, n \), where \( n = \frac{h-j}{2} \), and assume \( U_0 \) to denote the value of \( U \) when \( x = 0 \), \( U_1 \) its value when \( x = 1 \), and so on; and also make \( W \) the probability of the accused being pronounced guilty by \( h-n \) jurors at least, we shall have

\[ W = U_0 u^h + U_1 u^{h-1} w + U_2 u^{h-2} w^2 + \cdots + U_n u^{h-n} w^n. \]

In like manner, if \( W' \) denotes the probability of a verdict guilty by \( h - n \) jurors at least, on the hypothesis that the accused is not guilty, we shall have,

\[ W' = U_0 w^h + U_1 w^{h-1} u + U_2 w^{h-2} u^2 + \cdots + U_n w^{h-n} u^n; \]

whence, \( p \) and \( q \) being as above the a priori probabilities of the two hypotheses, the probability that the verdict guilty is correct, when pronounced by \( h-n \) jurors at least, becomes

\[ \frac{Wp}{Wp + W'q}. \]

84. It is evident that no application can be made of these formulas without assigning arbitrary values to \( u \) and \( p \); unless, indeed, we have data for determining their mean values from experience. With respect to \( p \), we may assume, for the sake of shewing the general consequences of the formulae, its value to be \( \frac{1}{2} \); for it cannot well be supposed less than \( \frac{1}{2} \), or that a person brought before a jury is more likely to be innocent than guilty; and if it much exceeds \( \frac{1}{2} \) and approaches to unity, a verdict of guilty may be expected from any jury, however constituted. When a mean value of \( u \) cannot be determined from experience, the only way of obtaining numerical results, is to suppose \( u \) to have all possible values within given limits, and to integrate the equations between those limits. As it seems unreasonable to suppose that a juror is more likely to give a wrong verdict than a right one, we may assume that \( u \) cannot be less than \( \frac{1}{2} \).

Suppose, then, that \( u \) increases by infinitely small increments from \( u = \frac{1}{2} \) to \( u = 1 \), and let it be proposed to determine the probability that a decision is correct when the accused has been pronounced guilty by \( m \) jurors, and not guilty by \( n \). Here an infinite number of hypotheses may be made respecting the value of \( u \), and we must therefore have recourse to the formula in (51). Let \( u = x \) be one of those hypotheses, \( P_x \) the probability on that hypothesis of the event observed (that is, of the accused being pronounced guilty by \( m \), and not guilty by \( n \) jurors), \( w_x \) the probability of the assumed hypothesis, and \( H \) the mean probability of the correctness of the verdict from all the hypotheses. By the formula in (81) we have

\[ P_x = U\left[ x^{m}(1-x)^{n} p + x^{n}(1-x)^{m}(1-p) \right]; \]

and as all the hypotheses are supposed equally probable, we have (45) \( w_{x} = P_{x} \div \Sigma P_{x} \). But between the proposed limits \( \Sigma P_{x} = Up\int_{\frac{1}{2}}^{1} x^{m}(1-x)^{n}dx + U(1-p)\int_{\frac{1}{2}}^{1} x^{n}(1-x)^{m}dx \); if, therefore, we make \( p = \frac{1}{2} \), we shall have, by reason of \( \int_{\frac{1}{2}}^{1} x^{n}(1-x)^{m}dx = \int_{0}^{\frac{1}{2}} x^{m}(1-x)^{n}dx \), \( \Sigma P_{x} = \frac{U}{2}\int_{0}^{1} x^{m}(1-x)^{n}dx \), and therefore

\[ w_{x} = \frac{\left\{x^{m}(1-x)^{n} + x^{n}(1-x)^{m}\right\}dx}{\int_{0}^{1} x^{m}(1-x)^{n}dx} \]

for the probability of the hypothesis. But (82) the probability on this hypothesis of the accused being guilty, is

\[ \frac{x^{m}(1-x)^{n}}{x^{m}(1-x)^{n} + x^{n}(1-x)^{m}}; \]

multiplying this by the probability of the hypothesis, \( w_{x} \), we obtain for the probability of the verdict being correct \( x^{m}(1-x)^{n}dx \div \int_{0}^{1} x^{m}(1-x)^{n}dx \); and, therefore, for the probability of the verdict being correct on all the hypotheses from \( x = \frac{1}{2} \) to \( x = 1 \),

\[ H = \frac{\int_{\frac{1}{2}}^{1} x^{m}(1-x)^{n}dx}{\int_{0}^{1} x^{m}(1-x)^{n}dx}. \]

Hence the probability that a verdict given by a majority \( m \) out of \( m + n = h \) jurors is wrong, is

\[ 1 - H = \frac{\int_{0}^{\frac{1}{2}} x^{m}(1-x)^{n}dx}{\int_{0}^{1} x^{m}(1-x)^{n}dx}; \]

which, on effecting the integrations by the formula in (51) becomes after reduction

\[ 1 - H = \frac{1}{2^{h+1}} \left[ 1 + \frac{h+1}{1} + \frac{(h+1)h}{1 \cdot 2} + \frac{(h+1)h(h-1)}{1 \cdot 2 \cdot 3} + \cdots + \frac{(h+1)h(h-1) \cdots (h-n+2)}{1 \cdot 2 \cdot 3 \cdots n} \right]. \]

Assuming \( h \) (the number of jurors) = 12, and making \( n \) successively 0, 1, 2, 3, 4, 5, the series gives

\[ \begin{array}{cccccc} \frac{1}{8192} & \frac{14}{8192} & \frac{92}{8192} & \frac{378}{8192} & \frac{1093}{8192} & \frac{2380}{8192} \end{array} \]

for the respective probabilities of the error of a verdict when pronounced unanimously by 12 jurors, by a majority of 11 to 1, of 10 to 2, of 9 to 3, of 8 to 4, and of 7 to 5. In the last case the probability of the error is nearly \( \frac{3}{10} \).
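The terms of the series just given are the binomial coefficients \( \binom{h+1}{k} \), so the whole table may be reproduced by a few lines of computation; the following sketch is a modern verification, not part of the original article.

```python
from fractions import Fraction
from math import comb

def wrong_verdict_probability(h, n):
    """1 - H of art. 84: the chance that a verdict carried by h - n votes
    to n is wrong, u being uniform between 1/2 and 1 and p = 1/2.
    The bracketed series equals C(h+1,0) + C(h+1,1) + ... + C(h+1,n)."""
    return Fraction(sum(comb(h + 1, k) for k in range(n + 1)), 2**(h + 1))

# The table for a jury of twelve, n = 0, 1, ..., 5:
table = [wrong_verdict_probability(12, n) for n in range(6)]
```

The computation returns exactly the fractions of the text, \( \frac{1}{8192}, \frac{14}{8192}, \frac{92}{8192}, \frac{378}{8192}, \frac{1093}{8192}, \frac{2380}{8192} \).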

85. From these results it appears that the chance of a verdict being wrong which has been pronounced unanimously by twelve jurors is very small; but it is to be remarked, that they have been deduced on the supposition that the unanimity proceeds from agreement in the same opinion, and that the jurors are unbiased by each other. In this country, where unanimity is compelled by law, the mean probability of a correct verdict can scarcely be considered as greater than that of a verdict pronounced by a simple majority; for, though in most cases the verdict may be supposed to represent the opinion of a larger majority than seven, it may happen, not unfrequently, that a smaller number than five, possessing greater energy or perseverance, may persuade the others into a surrender of their judgment. In fact, unless the presumption of the guilt of the accused be very great, it would scarcely be possible, without concert, to procure an unanimous verdict in any case. It is also to be observed, that the assumption of all values of \( u \) from \( \frac{1}{2} \) to 1 being equally probable, may lead to results widely different from the truth. The mean value of \( u \), which depends on the general intelligence of the class of persons from amongst whom the lists of jurors are made up, can only be rightly determined from data furnished by experience. One of the elements, however, which require to be known for this purpose, is the number of jurors who concur in and dissent from, the verdict. 
The forced unanimity of the law renders it impossible to obtain this element from the records of the English courts; but in France and Belgium, where the majority and minority are known and recorded, the same obstacle does not exist, and the "Comptes Généraux de l'Administration de la Justice Criminelle," published by the French Government, have enabled Poisson to deduce mean values of \( u \) and \( p \) for that country, and consequently to obtain the necessary data for one of the most interesting applications of the theory of Probabilities. The general results were as follows: During the six years from 1825 to 1830 inclusive, the system of criminal legislation in France underwent no change; the jury consisted of 12, and only a simple majority was required to concur, though when it happened that the majority was the least possible, the Court had power to override the verdict.

On comparing, according to the rules of the theory, the verdicts given in the cases tried before the criminal courts during those six years, it was found that for the whole of France, the probability (u) of a juror giving a correct verdict was a little greater than \( \frac{3}{4} \) with respect to crimes against the person, and nearly equal to \( \frac{3}{4} \) with respect to crimes against property; without distinction of the species of crime, it was found to be a very little below \( \frac{3}{4} \). The other element, the probability (p) of the guilt of the accused before the trial, was found not much to exceed \( \frac{1}{2} \) (being between 0.53 and 0.54) with respect to crimes against the person, while it a little exceeded \( \frac{3}{4} \) in respect of crimes against property. Without distinction of crime, its value was very nearly 0.64.

86. On substituting these values of \( u \) and \( p \) (namely, \( u = \frac{3}{4} \), \( p = 0.64 \), whence \( w = \frac{1}{4} \), \( q = 0.36 \)) in the formula in (81), and making \( m = 7 \), \( n = 5 \), and consequently

\[ U = \frac{1 \cdot 2 \cdot 3 \cdots 12}{1 \cdot 2 \cdot 3 \cdots 7 \times 1 \cdot 2 \cdot 3 \cdot 4 \cdot 5} = 792, \]

we have

\[ G = \frac{792}{4^{12}} \left( 3^7 \times 0.64 + 3^5 \times 0.36 \right) = 0.07 \text{ nearly}. \]

Hence it may be expected, that in a hundred trials it will happen only seven times that the accused will be pronounced guilty by the smallest possible majority. If \( m = 12 \) and \( n = 0 \), we shall have \( u^{12}p + w^{12}q = 0.0203 \) nearly, for the probability of an unanimous verdict of guilty, and \( u^{12}q + w^{12}p = 0.0114 \) for the probability of an unanimous verdict of not guilty.

Making the same substitutions in the formula in (82), we have for the probability of a verdict guilty being correct, from which 5 jurors out of 12 dissent, \( w_1 = \frac{16}{17} \); and \( w_2 = \frac{1}{17} \) for the probability of its being wrong.

Substituting the same values in the series represented by \( W \) and \( W' \) in (83), and supposing \( x \) to have all values from \( x = 0 \) to \( x = 5 \), there results \( W = \frac{3^7}{4^{12}} \times 7254 \), \( W' = \frac{1}{4^{12}} \times 239122 \); whence

\[ \frac{Wp}{Wp + W'q} = \frac{118}{119} \]

nearly. This is the probability that a verdict guilty, pronounced by a majority of seven at least, is correct. The probability of the same verdict being wrong is therefore \( \frac{1}{119} \); so that out of 119 verdicts, respecting which we know nothing else than that seven at least of the jury concurred in finding the accused guilty, we may expect one to be wrong, or that one person out of 119 so condemned will be innocent.
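The whole of Poisson's numerical application may be re-computed from the formulae of (81) and (83); the following sketch, a modern verification and no part of the original, reproduces the figures cited above.

```python
from fractions import Fraction
from math import comb

u, p = Fraction(3, 4), Fraction(64, 100)   # Poisson's mean values for France
w, q = 1 - u, 1 - p
h = 12

# Art. 86: condemnation by the least possible majority, 7 to 5.
G = comb(h, 7) * (u**7 * w**5 * p + u**5 * w**7 * q)

# Art. 83: condemnation by at least 7 of the 12 jurors.
W = sum(comb(h, x) * u**(h - x) * w**x for x in range(6))
W2 = sum(comb(h, x) * w**(h - x) * u**x for x in range(6))
posterior = W * p / (W * p + W2 * q)
```

The sums give \( W = \frac{3^7}{4^{12}} \times 7254 \) and \( W' = \frac{239122}{4^{12}} \) exactly, \( G \approx 0.07 \), and a posterior very near \( \frac{118}{119} \).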

**Sect. VIII. Of the Solution of Questions Involving Large Numbers.**

87. The probabilities of the different compound events which can result from the combination of any number of simple events, E₁, E₂, E₃, &c. being (13) measured respectively by the several terms of the development of the multinomial \( (p+q+r+\ldots)^n \), the most probable of those compound events will be that which corresponds to the term having the greatest numerical value. Let us consider the case of two simple contrary events E and F, the probabilities of which are respectively \( p \) and \( q \), and suppose the number of trials to be 4. Neglecting the order of occurrence, the different combinations, with their respective probabilities, are the following:

EEEE, EEEF, EEFF, EFFF, FFFF,

\[ p^4, 4p^3q, 6p^2q^2, 4pq^3, q^4. \]

Now it is evident that the numerical values of these probabilities depend on the ratio of \( p \) to \( q \), as well as on the coefficient by which they are multiplied, and that values may be given to \( p \) and \( q \), such that any one of the terms may be made the greatest or the least in the series. If we suppose \( p = q \), and consequently \( p = \frac{1}{2}, q = \frac{1}{2} \), (since \( p + q = 1 \)) the probabilities of the different cases become respectively

\[ \frac{1}{16}, \frac{1}{4}, \frac{3}{8}, \frac{1}{4}, \frac{1}{16}; \]

whence it appears, that the most probable combination is that which corresponds to \( 6p^2q^2 \), or in which each of the simple events occurs twice, the probability of this combination being \( \frac{3}{8} \), while that of either of the simple events occurring four times in succession is only \( \frac{1}{16} \).

When the number of trials is 5, the probabilities of the several cases are respectively

\[ p^5, 5p^4q, 10p^3q^2, 10p^2q^3, 5pq^4, q^5, \]

which, when \( p = q \), become

\[ \frac{1}{32}, \frac{5}{32}, \frac{10}{32}, \frac{10}{32}, \frac{5}{32}, \frac{1}{32}, \]

so that there are two different combinations equally probable, namely, that in which E occurs three times and F twice, and that in which E occurs twice and F three times; and of the six possible combinations these two are the most probable, having in their favour a number of chances twice as great as the two cases in which one of the events occurs only once, and the other four times, and ten times greater than the two cases in which either of the simple events occurs in each of the five trials.
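Both tables of probabilities are simply the terms of the binomial development, and may be generated at once; the routine below is a modern illustration, not part of the original article.

```python
from fractions import Fraction
from math import comb

def outcome_probabilities(h, p):
    """The h+1 terms of (p+q)^h: the chance that E occurs h-k times
    and F occurs k times, for k = 0, 1, ..., h."""
    q = 1 - p
    return [comb(h, k) * p**(h - k) * q**k for k in range(h + 1)]

half = Fraction(1, 2)
four = outcome_probabilities(4, half)   # 1/16, 1/4, 3/8, 1/4, 1/16
five = outcome_probabilities(5, half)   # 1/32, 5/32, 10/32, 10/32, 5/32, 1/32
```

For \( h = 4 \) the middle term \( \frac{3}{8} \) is the single greatest; for \( h = 5 \) the two middle terms \( \frac{10}{32} \) are equal and greatest, as the text describes.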

From these two instances it may be inferred in general, that when h is an even number, the most probable compound event is that of which the probability is represented by the middle term of the development of \( (p+q)^h \); and that when h is an odd number, there are two compound events equally probable, and more probable than any other, namely, those corresponding to the terms which occupy the middle of the series, supposing in both cases \( p = q \). This supposition gives \( (p+q)^h = \left(\frac{1}{2} + \frac{1}{2}\right)^h \); therefore in the case in which h is an even number, the general expression for the greatest term is

\[ \frac{h(h-1)(h-2)\cdots\left(\frac{h}{2}+1\right)}{1 \cdot 2 \cdot 3 \cdots \frac{h}{2}} \left( \frac{1}{2} \right)^h; \]

and when h is odd, the general expression for either of the two equal terms, which are greater than any of the other terms, is

\[ \frac{h(h-1)(h-2)\cdots\frac{h+3}{2}}{1 \cdot 2 \cdot 3 \cdots \frac{h-1}{2}} \left( \frac{1}{2} \right)^h. \]

88. When \( p \) and \( q \) are unequal, the greatest term of the expansion of \( (p+q)^h \) will not occupy the middle of the series, but its place may be found by comparing two consecutive terms. Let \( h = m + n \). The general term of the series then becomes

\[ \frac{1 \cdot 2 \cdot 3 \ldots \cdot h}{1 \cdot 2 \cdot 3 \ldots \cdot m \cdot 1 \cdot 2 \cdot 3 \ldots \cdot n} p^m q^n; \]

and the term immediately preceding is

\[ \frac{1 \cdot 2 \cdot 3 \cdots h}{1 \cdot 2 \cdot 3 \cdots (m+1) \times 1 \cdot 2 \cdot 3 \cdots (n-1)} p^{m+1} q^{n-1}. \]

Dividing the first of these by the second, we get for the quotient \( \frac{(m+1)q}{np} \), which, therefore, is the ratio of two consecutive terms taken at any part of the series. If this ratio be greater than 1, the term which has been taken as the dividend is greater than the preceding one which has been taken as the divisor; and it is evident that the terms must go on increasing, from the beginning of the series, so long as the ratio in question is greater than 1. But if the ratio be less than 1, the preceding term is greater than the succeeding, and the terms will become less and less as they are nearer the end of the series. Let \( \frac{(m+1)q}{np} = 1 \); then, since \( p + q = 1 \), and \( m + n = h \), we have \( n = (h+1)q \), and consequently the ratio of any term to the next preceding is greater or less than 1 according as \( n \) is less or greater than \( (h+1)q \). Now \( n \) is necessarily a whole number; therefore if \( (h+1)q \) be a whole number, take \( n = (h+1)q \), and the two terms of the series given by the expansion of \( (p+q)^h \), in which the exponents of \( q \) are \( n-1 \) and \( n \), will be equal to each other, and each greater than any other term of the series. But if \( (h+1)q \) be not a whole number, let \( (h+1)q-x \) be the nearest whole number less than \( (h+1)q \), and make \( n=(h+1)q-x \); then the greatest term of the development will be that in which the exponent of \( q \) is \( n \).

Since \( n=(h+1)q-x \), we have \( q=\frac{n+x}{h+1} \), whence

\[ p = 1 - \frac{n+x}{h+1} = \frac{m+1-x}{h+1}, \]

and therefore \( q : p = n+x : m+1-x \).

Now \(x\) is by hypothesis less than 1, therefore if \(m\) and \(n\) are large numbers, we have, very nearly, \(q:p=n:m\); or, since \(m+n=h\), \(m=hp\), \(n=hq\). It follows, therefore, that the greatest term of the development of the binomial \((p+q)^h\) is that in which the exponents of \(p\) and \(q\) are to each other in the ratio of \(p\) to \(q\), or more nearly in that ratio than any other two numbers whose sum is \(h\). In other words, the most probable combination of two simple events, \(E\) and \(F\), in any number of trials, is that in which the number of occurrences of \(E\) is to the number of occurrences of \(F\) in the ratio of their respective probabilities.
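The conclusion that the greatest term falls at \( m = hp \) (or the nearest whole number) may be tested by direct search over all the terms; the following sketch is a modern illustration, not part of the original.

```python
from fractions import Fraction
from math import comb

def most_probable_count(h, p):
    """Index m of the greatest term of (p+q)^h, found by comparing
    every term C(h,m) p^m q^(h-m) directly."""
    q = 1 - p
    terms = [comb(h, m) * p**m * q**(h - m) for m in range(h + 1)]
    return terms.index(max(terms))

# With h = 100 and p = 3/5, the greatest term is at m = hp = 60.
```

The search agrees with the rule of the text: the exponent of \( p \) in the greatest term is \( hp \) whenever that product is a whole number.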

89. In the same manner it may be shewn, that when there are more than two simple events, of which one must occur in every trial, the most probable result of any number \( h \) of trials is that combination in which the number of repetitions of each simple event is in proportion to its probability in a single trial. Thus, the probabilities of the simple events being respectively \( p, q, r, \ldots \), the most probable compound event is that whose probability is expressed by that term of the expansion of \( (p+q+r+\cdots)^h \) which has for its argument \( p^{hp} q^{hq} r^{hr} \cdots \).

90. Having determined the form of the greatest term of the series, we have next to find a method of approximating to its numerical value; for its coefficient containing the product of the natural numbers from 1 to \(h\) inclusive, its direct calculation becomes impracticable even when \(h\) is only a moderately large number. The theorem which gives the approximate value of this product is known by the name of Stirling's Theorem, having been discovered by that mathematician. As its investigation is a matter of pure analysis, we shall not stop to give it here, but refer the reader to the Treatise on Differences and Series, by Sir John Herschel, in the translation of Lacroix's Elementary Treatise on the Differential and Integral Calculus, p. 658. The theorem is as follows: Let \(x\) be any number, then

\[ 1 \cdot 2 \cdot 3 \cdots x = x^x e^{-x} \sqrt{2\pi x} \left(1 + \frac{1}{12x} + \frac{1}{288x^2} + \ldots\right), \]

where \(e\) is the number of which the Napierian logarithm is unity, or 2.71828, and \(\pi\) the ratio of the circumference of a circle to its diameter, or 3.14159.

When \(x\) is a large number, the term divided by \(12x\) becomes very small, and the series within the brackets may be considered as equal to unity. In this case, then, the formula becomes

\[ 1 \cdot 2 \cdot 3 \cdots x = x^x e^{-x} \sqrt{2\pi x}, \]

which gives a sufficient approximation in most cases. If, for example, \(x=1000\), the result will be within a 12000th part of the truth.
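The stated accuracy of Stirling's theorem is readily confirmed by working with logarithms (so as to avoid numbers too large for direct computation); the sketch below, a modern verification and no part of the original, compares the approximation with the exact log-factorial.

```python
from math import exp, lgamma, log, pi

def log_stirling(x):
    """Logarithm of the first approximation 1*2*3*...*x = x^x e^(-x) sqrt(2 pi x)."""
    return x * log(x) - x + 0.5 * log(2 * pi * x)

# lgamma(x + 1) is the exact logarithm of x!; at x = 1000 the relative
# error of the approximation is about 1/12000, as the text states.
rel_err = 1 - exp(log_stirling(1000) - lgamma(1001))
```

The error agrees with the first neglected term \( \frac{1}{12x} \) of the series: about 0.8 per cent at \( x = 10 \), and one part in 12000 at \( x = 1000 \).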

Now, let \(E\) and \(F\) be two events of such a nature that the one or the other must happen in every trial; let \(p\) and \(q\) be their respective probabilities, and \(P\) the probability that in \(m+n=h\) trials, \(E\) will happen \(m\) times and \(F\) \(n\) times; then by (12) we have

\[ P = \frac{1 \cdot 2 \cdot 3 \cdots h}{1 \cdot 2 \cdot 3 \cdots m \cdot 1 \cdot 2 \cdot 3 \cdots n} p^m q^n. \]

When \(m\), \(n\), and \(h\) are large numbers, the value of this coefficient may be computed from the above formula, which gives

\[ 1 \cdot 2 \cdot 3 \cdots h = h^h e^{-h} \sqrt{2\pi h}, \] \[ 1 \cdot 2 \cdot 3 \cdots m = m^m e^{-m} \sqrt{2\pi m}, \] \[ 1 \cdot 2 \cdot 3 \cdots n = n^n e^{-n} \sqrt{2\pi n}, \]

whence

\[ P = \frac{h^h e^{-h} \sqrt{2\pi h}}{m^m e^{-m} \sqrt{2\pi m} \times n^n e^{-n} \sqrt{2\pi n}}\, p^m q^n = \left(\frac{hp}{m}\right)^m \left(\frac{hq}{n}\right)^n \sqrt{\frac{h}{2\pi mn}}. \]

This expression represents any term of the series \((p+q)^h\). The greatest term, which corresponds to the most probable result, is (88) that in which \(m\) and \(n\) are to each other in the ratio of \(p\) to \(q\), or when \(m=hp\) and \(n=hq\). Let the greatest term therefore be denoted by \(P_0\); that is to say, let \(P_0\) be the chance of the most probable result of \(h\) trials, and we shall have

\[ P_0 = \sqrt{\frac{h}{2\pi mn}}, \quad \text{or} \quad P_0 = \sqrt{\frac{1}{2\pi hpq}}. \]

This last formula shows that the absolute probability of that combination which has the greatest number of chances in its favour becomes less and less as the number of trials is increased; for the fraction \( \frac{1}{h} \), to the square root of which the probability is proportional, diminishes as \( h \) is increased.

91. As an example, suppose a shilling to be tossed 100 times in succession. In this case \(p=q=\frac{1}{2}\), \(hp=50\), \(hq=50\), and the most probable result of the trials is 50 times head and 50 times tail. We have then \(h=100\), \(m=n=50\), and \( \sqrt{\frac{h}{2\pi mn}} = \frac{1}{\sqrt{50\pi}} \) for the measure of the probability that the event will happen in this way exactly. On calculation, this is found \(=0.07979\); whence it appears, that although 50 heads and 50 tails is a more probable result of 100 trials than any other combination which can be named, its absolute probability is measured by a very small fraction. The probability of the contrary event, or that there will not be thrown 50 heads and 50 tails exactly, is \(1-0.07979=0.92021\), so that the odds against the event are about 92 to 8, or 23 to 2. Had the number of trials been 1000, the probability of 500 times head and 500 times tail exactly, though more likely to occur than any other combination, would have been found \( \frac{1}{\sqrt{500\pi}} \); that is to say, \(\sqrt{10}\) times, or rather more than 3 times, less than in the former case. In general, when the chances in favour of the simple events are equal, the probability of the combination which is more likely to happen than any other is inversely proportional to the square root of the number of trials.
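The figure 0.07979 may be compared with the exact chance \( \binom{100}{50} / 2^{100} \); the short computation below, a modern check and no part of the original, shows how close the approximation of (90) comes.

```python
from math import comb, pi, sqrt

# Chance of exactly 50 heads in 100 tosses of a fair coin.
exact = comb(100, 50) / 2**100
approx = 1 / sqrt(50 * pi)      # the value 0.07979 computed in the text
```

The exact value is about 0.0796, so the approximation errs by roughly one part in four hundred, consistent with the \( \frac{1}{12x} \) correction of Stirling's series.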

92. The formulæ in (90) enable us also to determine the ratio of the greatest term of the development of \((p+q)^h\) to any other term of the series, and consequently the relation of the probabilities of the different compound events. Let \(m:n::p:q\), whence \(m=hp\) and \(n=hq\), and let \(P_x\) denote the probability that in \(h\) trials the event \(E\) will occur \((m-x)\) times, and the event \(F\) \((n+x)\) times, the probabilities of the simple events \(E\) and \(F\) being respectively \(p\) and \(q\). By (13) we have

\[ P_x = \frac{1 \cdot 2 \cdot 3 \cdots h}{1 \cdot 2 \cdot 3 \cdots (m-x) \times 1 \cdot 2 \cdot 3 \cdots (n+x)}\, p^{m-x} q^{n+x}, \]

which by (90) becomes

\[ P_x = \frac{h^h e^{-h} \sqrt{2\pi h}}{(m-x)^{m-x} e^{-(m-x)} \sqrt{2\pi (m-x)} \times (n+x)^{n+x} e^{-(n+x)} \sqrt{2\pi (n+x)}}\, p^{m-x} q^{n+x}; \]

whence, substituting \( \frac{m}{h} \) for \( p \), and \( \frac{n}{h} \) for \( q \), and leaving out the factors common to the numerator and denominator, we find,

\[ P_x = \sqrt{\frac{h}{2\pi}}\; m^{m-x}\, n^{n+x}\, (m-x)^{-m+x-\frac{1}{2}}\, (n+x)^{-n-x-\frac{1}{2}}. \]

Now \( \log(m-x)^{-m+x-\frac{1}{2}} = \left(-m+x-\frac{1}{2}\right)\log(m-x) \); and

\[ \log(m-x) = \log m - \frac{x}{m} - \frac{x^2}{2m^2} - \text{&c.}; \]

whence, neglecting terms divided by \( m^2, m^3, \) &c., \( m \) being supposed to be a large number in comparison with \( x \),

\[ \log(m-x)^{-m+x-\frac{1}{2}} = \left(-m+x-\frac{1}{2}\right)\log m + x - \frac{x^2}{2m}; \]

therefore, on passing to numbers,

\[ (m-x)^{-m+x-\frac{1}{2}} = m^{-m+x-\frac{1}{2}}\, e^{x}\, e^{-\frac{x^2}{2m}}. \]

In like manner, by changing \( m \) into \( n \), and \( x \) into \( -x \), we get

\[ (n+x)^{-n-x-\frac{1}{2}} = n^{-n-x-\frac{1}{2}}\, e^{-x}\, e^{-\frac{x^2}{2n}}. \]

Multiplying the first of these two expressions by \( m^{m-x} \), and the second by \( n^{n+x} \), and taking their product, the factors \( e^{x} \) and \( e^{-x} \) destroy each other, and we have

\[ m^{m-x}(m-x)^{-m+x-\frac{1}{2}} \times n^{n+x}(n+x)^{-n-x-\frac{1}{2}} = \frac{1}{\sqrt{mn}}\; e^{-\frac{x^2}{2m} - \frac{x^2}{2n}}; \]

whence, substituting this value in that of \( P_x \), and observing that \( \frac{x^2}{2m} + \frac{x^2}{2n} = \frac{hx^2}{2mn} \),

\[ P_x = \sqrt{\frac{h}{2\pi mn}}\; e^{-\frac{hx^2}{2mn}}. \]

The term of the series \((p+q)^h\) which corresponds to this value of \( P_x \) is that which is \( x \) places to the right of the greatest term; and it has been shown, (90), that the greatest term has for its expression \( \sqrt{\frac{h}{2\pi mn}} \); therefore, the greatest term being denoted by \( P \), and the term which comes after it \( x \) places by \( P_x \), we have

\[ P_x = P\, e^{-\frac{hx^2}{2mn}}; \]

that is to say, the probability that the event \( E \) will happen \( m \) times and fail \( n \) times in \( m+n \) trials, is to the probability of its happening \((m-x)\) times and failing \((n+x)\) times, in the ratio of 1 to \( e^{-\frac{hx^2}{2mn}} \).

Since the numbers \( m \) and \( n \) enter symmetrically into the exponential \( e^{-\frac{hx^2}{2mn}} \), it is evident that the result would have been the same if, instead of seeking the ratio of the greatest term to that which succeeds it by \( x \) places, we had sought the ratio of the greatest term to that which precedes it by \( x \) places. Hence if the most probable result of \( m+n \) trials be that \( E \) will happen \( m \) times and fail \( n \) times, the probability that it will happen \( m-x \) times and fail \( n+x \) times is the same as the probability that it will happen \( m+x \) times and fail \( n-x \) times.

The following example will suffice to shew the application of the formula: A die is thrown 6000 times, required the probability that the number of aces turned up will be exactly 960?

Here \( p \), the chance of throwing ace, is \( \frac{1}{6} \), \( q=\frac{5}{6} \), and \( h=6000 \); whence \( m=hp=1000 \), and \( n=hq=5000 \). We have first to find \( P \), the chance of the most probable result, or of 1000 aces. By (90), \( P = \sqrt{\frac{h}{2\pi mn}} \); whence, substituting the above values, \( P = \sqrt{\frac{3}{5000 \times 3.14159}} \).

On performing the operation indicated by the logarithmic tables, we get \( \log P = 8.14050-10 \), whence \( P = 0.0138 \).

The calculation of \( e^{-\frac{hx^2}{2mn}} \) is as follows. Assume \( t^2 = \frac{hx^2}{2mn} \); we have \( x = 1000 - 960 = 40 \), and

\[ \log x^2 = 2\log 40 = 3.20412, \quad \log h = \log 6000 = 3.77815, \]

whence \( \log hx^2 = 6.98227 \); also \( \log 2mn = \log 10{,}000{,}000 = 7 \); therefore \( \log t^2 = 6.98227 - 7 \), and \( t^2 = 0.96 \). Again, \( \log e^{-t^2} = -t^2 \log e \), and \( \log e = 0.43429 \); therefore

\[ \log e^{-t^2} = -0.96 \times 0.43429 = -0.41692 = 9.58308-10; \]

add \( \log P = 8.14050-10 \), and there results

\[ \log P_x = 7.72358-10; \]

therefore \( P_x = 0.0053 \), which is the chance of 960 aces exactly. The odds against this event are therefore 9947 to 53, or nearly 188 to 1.
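The two values found in this article, 0.0138 for the most probable result and 0.0053 for 960 aces exactly, may be checked by a short computation in a modern notation (an illustrative Python sketch only; the exact binomial value is computed in logarithms, since \( (1/6)^{960} \) underflows ordinary floating-point arithmetic):

```python
import math

h, p, q = 6000, 1/6, 5/6
m, n = h * p, h * q       # 1000 and 5000
x = 40                    # 1000 - 960

P = math.sqrt(h / (2 * math.pi * m * n))      # greatest term, ~0.0138
Px = P * math.exp(-h * x * x / (2 * m * n))   # term 40 places away, ~0.0053
print(P, Px)

# exact binomial chance of 960 aces, via log-gamma; close to Px
logPx = (math.lgamma(h + 1) - math.lgamma(961) - math.lgamma(5041)
         + 960 * math.log(p) + 5040 * math.log(q))
print(math.exp(logPx))
```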

93. When \( h, m, \) and \( n \) are large numbers, and \( x \) is small, the exponential \( e^{-\frac{hx^2}{2mn}} \) differs little from unity, and it decreases slowly as \( x \) increases, so long as \( x \) is small in comparison of \( m \) and \( n \). Suppose \( m=n \) and \( x=\sqrt{m} \); the exponent then becomes equal to unity, and the exponential is

\[ e^{-1} = \frac{1}{e} = \frac{1}{2.7182818}; \]

so that if we assume \( m=100 \), the 10th term before or after the greatest would still exceed the third part of the greatest. But when \( x \) becomes greater than \( \sqrt{m} \) or \( \sqrt{n} \), the exponential, and consequently also the terms which are multiplied by it, begin to diminish with great rapidity, and the diminution is more rapid as \( x \) increases. If \( m=n=100 \), and \( x=50 \), then the exponent \( \frac{hx^2}{2mn}=25 \), so that \( e^{-\frac{hx^2}{2mn}} = e^{-25} \), a quantity which is altogether insensible. We may therefore conclude generally that when \( h \) is a large number, the principal terms of the development of \((p+q)^h\) are those which are near the greatest term, and that \( h \) may be taken so large that the terms towards the beginning or end of the series may at length become smaller than any assignable quantity.

94. From the proposition which has now been demonstrated it follows, that although the probability of that particular compound event which has the greatest number of chances in its favour is very small when the number of trials is great, yet, on account of the rapid diminution of the terms towards the beginning and end of the series, the sum of a comparatively small number of terms taken on both sides of the greatest may be very much greater than all the remaining terms of the series; and, consequently, there will be a very great probability that the compound event will be represented by one or other of those terms. This consideration leads us to one of the most important questions in the theory, namely, to determine the probability that in a large number of trials, \( h \), an event \( E \), which must either happen or fail in each trial, and of which the chance of happening in any trial is \( p \), will happen not seldomer than \( hp-l \) times, and not oftener than \( hp+l \) times; or, making \( hp=m \), \( hq=n \), to determine the probability that the number of occurrences of \( E \) will be included within the limits \( m \pm l \).

Let \( x \) be any number between 0 and \( l \). Then (92) the probability that \( E \) will occur \((m-x)\) times and fail \((n+x)\) times is \( P_x = P e^{-\frac{hx^2}{2mn}} \), where \( P = \sqrt{\frac{h}{2\pi mn}} \). Now if in this expression we make \( x \) successively equal to each of the numbers \( 0, 1, 2, \ldots, l \), we shall have the respective probabilities of \( E \) happening \( m, m-1, m-2, \ldots, m-l \) times in \( h \) trials; and the sum of these probabilities will be the probability that \( E \) happens not oftener than \( m \) times, and not seldomer than \( m-l \) times. The same suppositions with respect to \( x \) will give the probabilities of \( E \) happening \( m, m+1, m+2, \ldots, m+l \) times, the sum of which will be the probability that \( E \) happens not seldomer than \( m \) times, and not oftener than \( m+l \) times. Adding, therefore, these two sums, and deducting \( P \), the probability which corresponds to \( x=0 \), on account of its being included in each sum, and therefore having been counted twice, the result will be the sum of the terms of the binomial \((p+q)^h\) comprised between, and including, the two terms of which one has for a factor \( p^{m+l} \), and the other \( p^{m-l} \), and will therefore express the probability that the number of occurrences of \( E \) will fall within the limits \( m \pm l \). Let this probability be denoted by \( R \), and let \( SP_x \) represent the sum of all the values of \( P_x \) obtained by substituting successively \( 0, 1, 2, 3, \ldots, l \) for \( x \); we then have \( R=2SP_x-P \), whence, writing for \( P_x \) and \( P \) their values,

\[ R=2S\sqrt{\frac{h}{2\pi mn}}\; e^{-\frac{hx^2}{2mn}} - \sqrt{\frac{h}{2\pi mn}}. \]

95. In order to find an approximate value of this expression we must have recourse to a formula first given by Euler for converting sums of the kind denoted by \( S \) into definite integrals (for which see Lacroix, Traité du Calcul Différentiel et Intégral, tom. iii. p. 136, or Herschel's Treatise on Differences, p. 513). Assuming \( u \) to denote a function of \( x \), the formula is as follows:

\[ Su=\int u\, dx + \frac{1}{2} u + \frac{1}{12} \frac{du}{dx} + \text{etc.} + \text{constant}. \]

On making \( u=P_x=P e^{-\frac{hx^2}{2mn}} \), we find

\[ \frac{du}{dx} = -\frac{Phx}{mn}\, e^{-\frac{hx^2}{2mn}}; \]

therefore, if we suppose \( x \) to be not greater than \( \sqrt{m} \) or \( \sqrt{n} \), this differential coefficient is of the order \( \frac{1}{h} \), (as may be easily shown by substituting \( hp \) for \( m \), and \( hq \) for \( n \)), and may be rejected, since \( h \) is supposed to be a very large number. The above equation therefore becomes

\[ SP_x = P \int e^{-\frac{hx^2}{2mn}}\, dx + \frac{1}{2} P e^{-\frac{hx^2}{2mn}} + \text{constant}; \]

and on supposing \( x=0 \), the sum reduces to its first term \( P \), so that \( P=\frac{1}{2}P+\text{constant} \); therefore the constant is equal to \( \frac{1}{2}P \), and we have

\[ SP_x = P \int e^{-\frac{hx^2}{2mn}}\, dx + \frac{1}{2} P e^{-\frac{hx^2}{2mn}} + \frac{1}{2} P. \]

Assume \( t=x\sqrt{\frac{h}{2mn}} \), whence \( dt=dx\sqrt{\frac{h}{2mn}} \); substitute these in the above equation, and it becomes, by reason of \( P=\sqrt{\frac{h}{2\pi mn}} \),

\[ SP_x = \frac{1}{\sqrt{\pi}} \int e^{-t^2} dt + \frac{1}{2} P e^{-t^2} + \frac{1}{2} P, \]

whence, from the equation \( R=2SP_x-P \), we obtain

\[ R=\frac{2}{\sqrt{\pi}} \int e^{-t^2} dt + P e^{-t^2}. \]

The integral in this expression must be taken between the limits \( t=0 \) and \( t=\tau \), where \( \tau \) denotes the value of \( t \) corresponding to \( x=l \). When \( x=0 \) we have also \( t=0 \), and when \( x=l \), then \( t=l\sqrt{\frac{h}{2mn}} \); therefore, making \( \tau=l\sqrt{\frac{h}{2mn}} \), and consequently \( l=\tau\sqrt{\frac{2mn}{h}} \), there results

\[ R=\frac{2}{\sqrt{\pi}} \int_0^\tau e^{-t^2} dt + P e^{-\tau^2} \]

for the probability that the number of occurrences of E will fall within the limits \( m \pm \tau\sqrt{\frac{2mn}{h}} \); or, replacing \( m \) and \( n \) by \( hp \) and \( hq \), the probability that the number of occurrences of E will fall within the limits \( hp \pm \tau\sqrt{2hpq} \), or be equal to one of those limits.

This expression for \( R \) has been found by neglecting quantities of the order of smallness \( \frac{1}{h} \); consequently the greater the value of \( h \) the more nearly it approaches to accuracy, but it is only rigorously true when \( h \) is infinite.

96. The integral \( \int e^{-t^2} dt \) is computed as follows. Developing the exponential \( e^{-t^2} \) in a series of the ascending powers of \( t \), and integrating the successive terms between the limits \( t=0 \) and \( t=\tau \), we find

\[ \int_0^\tau e^{-t^2} dt = \tau - \frac{\tau^3}{1 \cdot 3} + \frac{\tau^5}{1 \cdot 2 \cdot 5} - \frac{\tau^7}{1 \cdot 2 \cdot 3 \cdot 7} + \text{etc.}, \]

a series which converges rapidly when \( \tau \) is less than unity. In the contrary case, however, or when \( \tau \) is greater than unity, the series is divergent, and it is necessary to proceed by a different method. Let the factor \( e^{-t^2} \) be multiplied and divided by \( t \); we have then

\[ \int e^{-t^2} dt = \int t e^{-t^2}\, \frac{dt}{t}, \]

and on integrating by parts

\[ \int e^{-t^2} dt = -\frac{e^{-t^2}}{2t} - \frac{1}{2} \int \frac{e^{-t^2}}{t^2}\, dt. \]

Repeating the same process on the last integral, and so on with the last after each succeeding integration, the following series is obtained,

\[ \int e^{-t^2} dt = -\frac{e^{-t^2}}{2t} \left\{ 1 - \frac{1}{2t^2} + \frac{1 \cdot 3}{(2t^2)^2} - \frac{1 \cdot 3 \cdot 5}{(2t^2)^3} + \text{etc.} \right\}. \]

When \( t=\infty \) the right hand side of this equation becomes \( 0 \); whence between the limits \( t=\tau \) and \( t=\infty \), we have

\[ \int_\tau^\infty e^{-t^2} dt = \frac{e^{-\tau^2}}{2\tau} \left\{ 1 - \frac{1}{2\tau^2} + \frac{1 \cdot 3}{(2\tau^2)^2} - \frac{1 \cdot 3 \cdot 5}{(2\tau^2)^3} + \text{etc.} \right\}, \]

a series of which the first terms diminish very rapidly when \( \tau \) is greater than unity. Now the value of a definite integral between 0 and infinity is obviously equal to the sum of its two parts, of which the first is taken between 0 and \( \tau \), and the second between \( \tau \) and infinity; that is to say,

\[ \int_0^\infty e^{-t^2} dt = \int_0^\tau e^{-t^2} dt + \int_\tau^\infty e^{-t^2} dt. \]

But \( \int_0^\infty e^{-t^2} dt \) is well known to have for its expression \( \frac{1}{2} \sqrt{\pi} \); therefore

\[ \int_0^\tau e^{-t^2} dt = \frac{1}{2} \sqrt{\pi} - \int_\tau^\infty e^{-t^2} dt, \]

so that the integral may be computed from either of the above series, according as \( \tau \) is less or greater than 1.
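Both series lend themselves readily to computation; the following sketch in a modern notation (Python, illustrative only) evaluates the ascending series for a small argument and the descending series for a large one, and compares them with the closed forms furnished by the modern error function, \(\int_0^\tau e^{-t^2}dt = \frac{1}{2}\sqrt{\pi}\,\mathrm{erf}(\tau)\):

```python
import math

def integral_small(tau, terms=30):
    # tau - tau^3/(1*3) + tau^5/(1*2*5) - tau^7/(1*2*3*7) + ...
    s = 0.0
    for k in range(terms):
        s += (-1)**k * tau**(2*k + 1) / (math.factorial(k) * (2*k + 1))
    return s

def integral_large(tau, terms=4):
    # e^{-tau^2}/(2 tau) * {1 - 1/(2 tau^2) + 1*3/(2 tau^2)^2 - ...};
    # the series ultimately diverges, so only the first few terms are taken
    s, term = 0.0, 1.0
    for k in range(terms):
        s += term
        term *= -(2*k + 1) / (2 * tau**2)
    return math.exp(-tau**2) / (2 * tau) * s

print(integral_small(1.0))                      # integral from 0 to 1
print(math.sqrt(math.pi) / 2 * math.erf(1.0))   # closed form, the same
print(integral_large(2.0))                      # integral from 2 to infinity
print(math.sqrt(math.pi) / 2 * math.erfc(2.0))  # closed form, nearly the same
```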

The integral \( \int e^{-t^2} dt \) is of great importance in the higher mathematics. It occurs in the investigation of the path of a ray of light through the atmosphere, and of the law of the diffusion of heat in the interior of solid bodies, as well as in the determination of the degree of reliance that may be placed on the results of astronomical observations, and generally in most of the more difficult and important applications of the theory of probabilities. A table of its values from \( \tau=0 \) to \( \tau=3 \), for intervals each \( =0.1 \), was given by Kramp, at the end of his Analyse des Réfractions Astronomiques, Strasbourg, 1799. In the Berliner Astronomisches Jahrbuch for 1834, there is also a table of its values from \( \tau=0 \) to \( \tau=2 \) (for the same intervals) multiplied by \( \frac{2}{\sqrt{\pi}} \), with their first and second differences for the purpose of facilitating interpolation. This last table, which appears to have been derived from that of Kramp, and which is immediately applicable in the calculation of the probability \( R \), we have extended to \( \tau=3 \), and given at the end of the present article. As the function which is thus tabulated will occur frequently in what follows, we shall in future, for convenience in printing, denote it by $\Theta$; that is to say, we shall assume

$$\Theta = \frac{2}{\sqrt{\pi}} \int_0^\tau e^{-t^2} dt = 1 - \frac{2}{\sqrt{\pi}} \int_\tau^\infty e^{-t^2} dt,$$

the two forms being equivalent in consequence of the above equation.

97. Some very important conclusions follow immediately from the formula in (95). The quantity $R$ denotes the probability that in a very great number of trials $h$, the event $E$, of which the a priori probability in any trial is $p$, will occur not seldomer than $hp-l$ times, and not oftener than $hp+l$ times; that is, that the number of its occurrences will be included within the limits $hp \pm l$, or at least be equal to one of those limits. Hence $R$ also denotes the probability that the ratio of the occurrences of $E$ to the whole number of trials will be included within the limits $p \pm \frac{l}{h}$. We have assumed $\tau=l\sqrt{\frac{h}{2mn}}$; but $m=hp$ and $n=hq$; therefore $\tau=\frac{l}{\sqrt{2hpq}}$, whence $l=\tau\sqrt{2hpq}$, and consequently $\frac{l}{h}=\tau\sqrt{\frac{2pq}{h}}$. Now, if we suppose $\tau$ to be constant, so that the probability expressed by $R$ may remain the same, then, $p$ and $q$ being given, $l$ is proportional to the square root of $h$, and consequently the greater the number of trials the smaller will $l$ be in proportion to that number. Thus, if the number of trials be 1000, and we have a given probability $R$ that the number of occurrences of $E$ will not differ more than 10 from the number which is the most probable of all (that is, from 1000 $p$), then if we take 100,000 trials, we shall have the same probability $R$ that the number of occurrences of $E$ will not differ more than $10\times\sqrt{100}=100$ from the most probable number. But a difference of 10 in 1000 is 1/100th of the whole, whilst a difference of 100 in 100,000 is only 1/1000th of the whole; and thus the ratio of $l$ to $h$ becomes smaller and smaller, or the ratio of the occurrences of $E$ to the whole number of trials approaches nearer and nearer to $p$, as the number of trials is increased; and the experiments may be repeated until the difference between $p$ and that ratio, in respect of a given probability $R$ which may be as great as we please, shall be less than any assignable quantity.

If, on the other hand, we suppose $\frac{l}{h}$ to be constant, then $\tau$ is proportional to the square root of the number of trials. But as $\tau$ increases, $\Theta$, and consequently $R$, approaches nearer and nearer to unity, (and it may be seen, by referring to the table, that it is only necessary to have $\tau=3$ in order to have $\Theta=0.9999779$); whence the number of trials $h$ may always be increased until we obtain a probability approaching as nearly to certainty as we please, that the number of occurrences of $E$ will be comprised within the given limits $hp \pm l$; or, which is the same thing, that the ratio of the number of occurrences of $E$ to the whole number of trials shall not differ from $p$, the probability of $E$ in a single trial, more than a given quantity $\frac{l}{h}$, which may be less than any assigned fraction. This is the celebrated theorem which was demonstrated by James Bernoulli in the Ars Conjectandi.
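The scaling here described, the limit $l$ growing only as the square root of the number of trials, may be exhibited numerically; the following Python sketch is illustrative only, and employs the table value $\tau=0.4769$ corresponding to $\Theta=\frac{1}{2}$:

```python
import math

p = q = 0.5
tau = 0.4769  # table value for Theta = 1/2

l_1000 = tau * math.sqrt(2 * 1000 * p * q)      # even-odds half-width, 1000 trials
l_100000 = tau * math.sqrt(2 * 100000 * p * q)  # the same, 100000 trials

print(l_1000, l_100000)                  # l grows tenfold for a hundredfold of trials
print(l_1000 / 1000, l_100000 / 100000)  # while the ratio l/h shrinks tenfold
```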

98. The application of the preceding results to numerical examples, is rendered extremely easy by means of the table of the values of $\Theta$. From the formula in (95) we have the probability

$$R=\Theta+P e^{-\tau^2},$$

that the occurrences of $E$ in $h$ trials will fall within the limits $hp \pm l$, the relation between $l$ and $\tau$ being given by the equation $l=\tau\sqrt{2hpq}$. If, therefore, we suppose $l$ to be given, $\tau$ becomes known, and the corresponding value of $\Theta$ is found from the table; and, conversely, if $\Theta$ be assumed, $\tau$ is given by the table, whence the corresponding limits $l$ are deduced. With respect to the quantity $P e^{-\tau^2}$, we may observe that it denotes the probability that the number of occurrences of the event $E$ will be $hp+l$ or $hp-l$ precisely (92), and is therefore always a very small fraction when $h$ is a large number (90). It may be regarded as a correction of $\Theta$, which in most cases might be omitted without sensibly affecting the result; but when $h$ is not very large, or $l$ a small number, it becomes necessary to take it into account. In such cases its value may be computed directly as in the example in (92); but this labour may be avoided by increasing $\tau$, so as to include it within the limits of the integral $\Theta$. Thus, let $R$ be the probability that the number of occurrences of $E$ will be included within the limits $hp \pm l$, and $R'$ the probability of the limits being $hp \pm (l+1)$, and let $\Theta$ and $\Theta'$ be respectively the corresponding values of the integral. We have then, denoting by $P_l$ the value of $P_x$ in (92) when $x=l$, the two equations

$$R=\Theta+P_l, \qquad R'=\Theta'+P_{l+1},$$

and the difference $R'-R$ of these two probabilities is obviously the double of the probability that the result of the trials will be either $(hp+l+1)$ times $E$, or $(hp-l-1)$ times $E$, exactly. But the chance of either of these events being $P_{l+1}$, we have therefore $R'-R=2P_{l+1}$. Now, when $h$ is large, $P_l$ and $P_{l+1}$ are very small, and very nearly equal to each other, (their difference is in fact of the order of quantities omitted); hence $R'-R=\Theta'-\Theta$ nearly, and also $2P_{l+1}=2P_l$ nearly, and consequently $\Theta'-\Theta=2P_l$, or $P_l=\frac{1}{2}(\Theta'-\Theta)$. Substituting this value of $P_l$ in the equation $R=\Theta+P_l$, we get $R=\frac{1}{2}(\Theta+\Theta')$; so that if we take from the table the values of $\Theta$ and $\Theta'$ corresponding to $l$ and $l+1$, half their sum will give $R$. But as the interval between $\Theta'$ and $\Theta$ in the table is always small, half their sum will not differ sensibly from the value of the integral corresponding to $l+\frac{1}{2}$; whence this value is equal to $R$, and we have the following rule for determining the limits corresponding to a given probability, or vice versa:

When the limits are assumed, find $\tau$ from the equation $l+\frac{1}{2}=\tau\sqrt{2hpq}$; then the value of $\Theta$ in the table corresponding to $\tau$ is the probability that in $h$ trials the number of occurrences of the event $E$, the chance of which in a single trial is $p$, will lie within the limits $hp \pm l$, both inclusive. Conversely, when $\Theta$ is assumed, find the corresponding value of $\tau$ in the table, by means of which the limit $l$ will be given by the equation $l+\frac{1}{2}=\tau\sqrt{2hpq}$. It is obvious, that if the limit $l$ and the probability $\Theta$ be both assumed, then $h$ may be determined from the same equation.

99. We will now give some examples of the application of the preceding formulae.

Suppose $p=q=\frac{1}{2}$, and $h=200$, and let it be proposed to assign the limits within which there is a probability $\frac{1}{2}$ that the number of occurrences of $E$ will fall. In this case the equation $l+\frac{1}{2}=\tau\sqrt{2hpq}$ becomes $l+\frac{1}{2}=\tau\sqrt{100}=10\tau$. Now, it is easily found from the table that for $\Theta=\frac{1}{2}$ we have $\tau=0.4769$, whence $l+\frac{1}{2}=4.769$, and $l=4.269$. On tossing a shilling 200 times, it is therefore more than an even wager that head will turn up not seldomer than 95 times, and not oftener than 105 times.
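The rule of (98) may be written as a small procedure in a modern notation; since the Python standard library affords the error function but not its inverse, the table look-up is here imitated by bisection (an illustrative sketch only):

```python
import math

def tau_for_theta(theta):
    # invert Theta = erf(tau) by bisection, in imitation of the table
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if math.erf(mid) < theta:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

h, p, q = 200, 0.5, 0.5
tau = tau_for_theta(0.5)                  # ~0.4769
l = tau * math.sqrt(2 * h * p * q) - 0.5  # the rule l + 1/2 = tau*sqrt(2hpq)
print(tau, l)                             # ~0.4769, ~4.27

# exact chance that heads falls between 95 and 105 inclusive: above 1/2
prob = sum(math.comb(200, k) for k in range(95, 106)) / 2**200
print(prob)
```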

Suppose $p=q=\frac{1}{2}$, $h=3600$, and let it be proposed to assign the probability that the number of occurrences of $E$ will not exceed the limits $1800\pm30$. In this case the equation $l+\frac{1}{2}=\tau\sqrt{2hpq}$ becomes $30.5=\tau\sqrt{1800}=30\tau\sqrt{2}$, whence $\tau=30.5\div30\sqrt{2}=0.7189$; and the table gives $\Theta=0.6907$ nearly. Hence in tossing a shilling 3600 times, the odds are about 29 to 13 that head will not turn up oftener than $1800+30=1830$ times, nor seldomer than $1800-30=1770$ times. Neglecting the second term of R (95), and taking simply $l=\tau\sqrt{2hpq}$, the table gives $\Theta=0.6827$, which is the solution given by De Moivre, p. 245.
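This example admits of an exact check, the binomial coefficients for $h=3600$ being still within reach of direct computation (a Python sketch, illustrative only):

```python
import math

h, p, q, l = 3600, 0.5, 0.5, 30
tau = (l + 0.5) / math.sqrt(2 * h * p * q)       # 30.5 / sqrt(1800)
theta = math.erf(tau)                            # ~0.6907
theta0 = math.erf(l / math.sqrt(2 * h * p * q))  # uncorrected: De Moivre's 0.6827
print(tau, theta, theta0)

# exact chance of heads between 1770 and 1830 inclusive
prob = sum(math.comb(3600, k) for k in range(1770, 1831)) / 2**3600
print(prob)  # close to theta
```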

Suppose \( p=\frac{1}{6} \), \( q=\frac{5}{6} \), and let it be proposed to determine how many trials must be made in order that it may be an even wager that the number of aces will not differ more than 10 from the most probable number. For \( \Theta=\frac{1}{2} \) we have \( \tau=0.4769 \); therefore the equation \( l+\frac{1}{2}=\tau\sqrt{2hpq} \) becomes

\[ 10.5 = 0.4769\sqrt{\frac{5h}{18}}, \quad \text{whence} \quad h = \frac{18}{5}\left(\frac{10.5}{0.4769}\right)^2. \]

On computing this formula \( h \) is found \( =1745 \), say 1746, the sixth part of which is 291; and it follows that if a die be thrown 1746 times it is an even wager that the number of aces will fall between \( 291\pm10 \), that is, between 281 and 301, or be equal to one of those numbers.
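The determination of $h$ from an assumed limit and probability is a mere inversion of the same equation, which may be verified thus (Python, illustrative only):

```python
import math

p, q = 1/6, 5/6
tau = 0.4769   # table value for Theta = 1/2
l = 10

# l + 1/2 = tau*sqrt(2hpq)  =>  h = (l + 1/2)^2 / (2 p q tau^2)
h = (l + 0.5)**2 / (2 * p * q * tau**2)
print(h)                # ~1745, which the article takes as 1746
print(round(1746 / 6))  # 291, the most probable number of aces
```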

In (92) we found the probability to be 0.0053 that in 6000 throws of a die the number of aces will be exactly 960. Let it now be proposed to assign the probability \( \Theta \) that in 6000 throws the number of aces will lie between 960 and 1040, that is, within the limits \( 1000\pm40 \). Here \( h=6000 \), \( p=\frac{1}{6} \), \( q=\frac{5}{6} \), and \( l=40 \); the equation of the limits therefore becomes

\[ 40.5 = \tau\sqrt{\frac{10000}{6}}, \quad \text{whence} \quad \tau = 0.9920, \]

corresponding to which the table gives \( \Theta = 0.8394 \).

The following question is discussed by Nicolas Bernoulli in the Appendix to Montmort's Analyse des Jeux de Hazard, and is noticed by De Moivre and Laplace. From the observations of the births of both sexes in London during 82 years (from 1629 to 1710), it was found that the average number of children annually born in London was about 14,000, and the ratio of the number of males to that of females nearly as 18 to 17; the average number of male births being 7200, and of female births 6800. In the year in which the greatest difference from this ratio took place, the actual numbers were 7037 males and 6963 females, so that the difference from the average amounted to 163.

Assuming, then, the comparative facility of male and female births to be as 18 to 17, required the probability that out of 14,000 children born, the number of males shall not be greater than 7363, nor less than 7037.

This question is evidently equivalent to the following:

Let 14,000 dice, each having 35 faces, 18 white and 17 black, be thrown; what is the probability that the number of white faces turned up will be comprised within the limits \( 7200\pm163 \)? We have therefore \( h=14000 \), \( p=\frac{18}{35} \), \( q=\frac{17}{35} \), \( l=163 \), and the formula \( l+\frac{1}{2}=\tau\sqrt{2hpq} \) becomes

\[ 163.5 = \frac{\tau}{35}\sqrt{2\times14000\times18\times17}, \quad \text{whence} \quad \tau = 1.955. \]

The corresponding value of \( \Theta \) is found from the table \( = 0.9943, \)

which is the probability that the number of white faces shall not be greater than 7363, nor less than 7037. The odds in favour of the event are therefore 9943 to 57, or about 174 to 1.
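The computation for Bernoulli's question reduces to a single application of the rule of (98); the modern error function reproduces the table value 0.9943 (a Python sketch, illustrative only):

```python
import math

h = 14000
p, q = 18/35, 17/35
l = 163

tau = (l + 0.5) / math.sqrt(2 * h * p * q)   # ~1.955
theta = math.erf(tau)                        # ~0.9943
print(tau, theta)
```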

100. We now proceed to consider the case in which the probabilities of the simple events are not known a priori, but inferred from the results of experience. It was shewn in (52) that the probability \( P \) of an event happening \( m' \) times, and failing \( n' \) times, in \( h' \) trials, \( (h'=m'+n') \), when it has been observed to happen \( m \) times, and fail \( n \) times, in \( h \) previous trials, is expressed by this equation,

\[ P = \frac{[h']\,[h+1]\,[m+m']\,[n+n']}{[m']\,[n']\,[m]\,[n]\,[h+h'+1]}, \]

in which \([x]\) is written, for brevity, for the product \(1\cdot2\cdot3\cdots x\).

Now, when \( m, n, m', n' \) are large numbers, an approximate value of \( P \), more accurate in proportion as those numbers become larger, is obtained from Stirling's theorem (90), which for any number \( x \) gives \( [x] = x^x e^{-x} \sqrt{2 \pi x} \).

Applying the theorem, therefore, to the several factors within the brackets in the above equation (the exponential factors destroying one another), and assuming

\[ K = \sqrt{\frac{h'(h+1)(m+m')(n+n')}{2\pi\, m\, n\, m'\, n'\,(h+h'+1)}}, \]

we obtain

\[ P = K\,\frac{(m+m')^{m+m'}\,(n+n')^{n+n'}\,(h+1)^{h+1}\,h'^{h'}}{m^m\,n^n\,m'^{m'}\,n'^{n'}\,(h+h'+1)^{h+h'+1}}. \]

Let \( m'=\theta m \), \( n'=\theta n \), and consequently \( h'=\theta h \); then taking \( \frac{h^{h+1}}{(h+h')^{h+h'+1}} \) for \( \frac{(h+1)^{h+1}}{(h+h'+1)^{h+h'+1}} \) (which may be done without sensible error, since \( h \) is by supposition a large number), the powers of \( \theta \), of \( h \), of \( m \), and of \( n \) are found to destroy one another, and there remains only

\[ P = \frac{K}{1+\theta}. \]

Making the same substitutions in the expression denoted by \( K \), we get, after reduction, \( K=\sqrt{\frac{h(1+\theta)}{2\pi\theta mn}} \); whence

\[ P = \frac{K}{1+\theta} = \frac{1}{\sqrt{1+\theta}}\sqrt{\frac{h}{2\pi\theta mn}} = \frac{1}{\sqrt{1+\theta}}\sqrt{\frac{h'}{2\pi m'n'}}, \]

in consequence of \( h'=\theta h \), \( m'=\theta m \), \( n'=\theta n \).

The value of \( P \) now found is the probability that in a future series of \( h' \) trials the ratio of the occurrences of \( E \) to those of \( F \) will be the same as in the preceding trials, which are supposed to have been very numerous. If the chances of \( E \) and \( F \) had been given a priori, equal to \( \frac{m}{h} \) and \( \frac{n}{h} \) respectively, the probability of \( m' \) times \( E \), and \( n' \) times \( F \), in \( m'+n' \) future trials would have been, by (12),

\[ P' = \frac{[h']}{[m']\,[n']}\left(\frac{m}{h}\right)^{m'}\left(\frac{n}{h}\right)^{n'}; \]

and since \( m':n'::m:n \), this is the greatest term of the corresponding binomial, of which the value, by (90), is \( \sqrt{\frac{h'}{2\pi m'n'}} \). Hence the relation between the probability \( P' \) of that combination of simple events which has the greatest number of chances in its favour, when the chances of the simple events are known a priori, and the probability \( P \) of the same combination when the chances of the simple events are only presumed from previous trials, is expressed by this equation,

\[ P = \frac{P'}{\sqrt{1+\theta}}. \]

101. When \( h' \) is very small in comparison of \( h \), \( \theta \) becomes a very small fraction, and may be neglected, and we have then \( P=P' \) nearly. But when \( h' \) is a number comparable with \( h \), \( P \) is sensibly less than \( P' \), and it diminishes continually as \( \theta \) increases. The reason of this is obvious. If the contents of the urn are not known a priori, however numerous the trials may have been, there is only a presumption that the chance of drawing a white ball in a single trial is measured by \( \frac{m}{h} \); whereas, in the case of the ratio of the balls being previously known, the measure of the probability is certain. As an instance of the manner in which the probability of an assigned series of future events diminishes, when the probabilities of the simple events are inferred from experience, let us suppose \( h'=h \), whence \( \theta=1 \), and consequently \( P=\frac{P'}{\sqrt{2}}=0.7071\,P' \). Now it was shewn in (91) that if a ball be drawn at random 100 times from an urn which contains an equal number of black and white balls, the probability \( P' \) that the result will be 50 white balls, and 50 black, precisely, is 0.07979. It follows, therefore, that if the contents of the urn be unknown, and we can only judge of the relative numbers of the two sorts of balls it contains from having observed that in 100 trials there have been drawn 50 white balls and 50 black, the probability \( P \) of that combination in 100 future trials becomes \( 0.07979 \times 0.7071 = 0.05642 \).
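The relation \( P=P'\div\sqrt{1+\theta} \) may be tried against the exact formula of (100); the following Python sketch (illustrative only, the factorials being computed through the log-gamma function) takes the case \( m=n=m'=n'=50 \), for which \( \theta=1 \):

```python
import math

def predictive(m, n, m2, n2):
    # exact P = [h'][h+1][m+m'][n+n'] / ([m'][n'][m][n][h+h'+1]),  [x] = 1*2*3...x
    lg = math.lgamma
    h, h2 = m + n, m2 + n2
    return math.exp(lg(h2 + 1) + lg(h + 2) + lg(m + m2 + 1) + lg(n + n2 + 1)
                    - lg(m2 + 1) - lg(n2 + 1) - lg(m + 1) - lg(n + 1)
                    - lg(h + h2 + 2))

P_prior = math.comb(100, 50) / 2**100   # chances known a priori, ~0.0796
P_post = predictive(50, 50, 50, 50)     # chances only presumed from 100 trials
print(P_post, P_prior / math.sqrt(2))   # nearly equal, as the theory asserts
```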

102. The result obtained in (100) enables us to determine the probability that the number of occurrences of \( E \) in \( h' \) future trials will not differ in excess or defect from the most probable number, by more than a certain given number \( l \). It has been shewn (95) that in the case of the probabilities \( p \) and \( q \) of the simple events being given a priori, if we determine \( \tau \) from the equation \( l=\tau\sqrt{2hpq} \),

the formula

\[ R = \Theta + \sqrt{\frac{h}{2\pi mn}}\, e^{-\tau^2} \]

gives the probability \( R \) that the number of occurrences of \( E \) will be comprised within the limits \( hp \pm \tau\sqrt{2hpq} \); or, dividing by \( h \), the probability that the ratio of the occurrences of \( E \) to the whole number of trials will be comprised within the limits \( p \pm \tau\sqrt{\frac{2pq}{h}} \). Conversely, when \( p \) and \( q \) are not known, but the event \( E \) has been observed to happen \( m \) times in \( h \) trials, then

\[ R = \Theta + \sqrt{\frac{h}{2\pi mn}}\, e^{-\tau^2} \]

gives the probability \( R \) that \( p \) is comprised within the limits

\[ \frac{m}{h} \pm \frac{\tau}{h} \sqrt{\frac{2mn}{h}}. \]

These limits approach more nearly to each other as \( h \) increases; and when \( h \) is a large number, the ratios \( m/h \), \( n/h \) may be assumed, without sensible error, as the chances of \( E \) and \( F \) in computing the probable result of a future series of \( h' \) trials, provided, however, that \( h' \) (though absolutely a large number) be small relatively to \( h \). When this condition is not fulfilled, the assumption of \( m/h \) and \( n/h \) as the a priori chances of \( E \) and \( F \), might lead to considerable error; but an approximation to the limits corresponding to a given value of \( R \) may be obtained from the following considerations:

Suppose a large number \( h \) of events to have been observed, and that the result of the observation gave \( m \) times \( E \) and \( n \) times \( F \). Let a new series of \( h' \) trials be made, and suppose that in this new series \( p \) is the real chance of \( E \) and \( q \) of \( F \); we have then a given probability \( R \) that the number of occurrences of \( E \) will fall within the limits \( h'p \pm \tau\sqrt{2h'pq} \). Now, for \( p \) and \( q \) substitute the ratios observed in the first set of experiments, namely, \( \frac{m}{h} \) and \( \frac{n}{h} \), and the limits corresponding to \( R \) become

\[ \frac{mh'}{h} \pm \frac{\tau}{h} \sqrt{2h'mn} \]

which, therefore, are the true limits on the hypothesis that the chance of \( E \) in a single trial is \( \frac{m}{h} \). But as this chance is not certain, but only presumed, the limits require to be extended in order that \( R \) may preserve the same value. Confining our attention to \( \Theta \), the first term of the expression for \( R \) (the second may be disregarded in the present approximation), let \( h' = m' + n' \) and \( m':n'::m:n \); then \( \Theta \) is the sum of the terms of the binomial \( (p+q)^{h'} \) from that in which the exponent of \( p \) is \( m' + l \) to that in which the exponent is \( m' - l \). Now, when \( p \) and \( q \) are given a priori, the chance of \( m' \) times \( E \) and \( n' \) times \( F \) in \( h' \) trials is \( P' \); and when \( p \) and \( q \) are only presumed from the results of previous trials, the chance of the same combination is \( P \); and (100) \( P \) is less than \( P' \), in the ratio of 1 to \( \sqrt{1+\theta} \).

In like manner, the chance of each of the other combinations of \( E \) and \( F \) included in the integral \( \Theta \) will be less in the case of \( p \) and \( q \) presumed, than in the case of \( p \) and \( q \) given, in the same ratio of 1 to \( \sqrt{1+\theta} \). But it has been seen (93) that when \( h' \) is a large number, the terms of the development of \( (p+q)^{h'} \) which are nearest the greatest term diminish at first very slowly; and, further, only a small number of terms on each side of the greatest are required to be taken, since \( l \) is small relatively to \( m' \) and \( n' \) (95); we may therefore, without sensible error, assume \( \Theta \) to be proportional to the number of terms included in the summation, or that the value of \( \Theta \) will not be changed if we include in the summation a number of terms greater in proportion as the value of each individual term is less. Hence it follows that the limits must be increased in the ratio of \( \sqrt{1+\theta} \) to 1, and the value of \( \Theta \) corresponding to \( \tau \) will give the probability that the number of events \( E \), in \( h' \) future trials, will be included between

\[ \frac{mh'}{h} \pm \frac{\tau}{h} \sqrt{2h'mn(1+\theta)} \]
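The effect of the enlarging factor \( \sqrt{1+\theta} \) may be verified by simulation. In the following Python sketch (a modern illustration, not part of the original text; we take \( \theta = h'/h \), here equal to 1, and use the identity \( \Theta = \mathrm{erf}(\tau) \)), the widened limits cover the future count about \( \mathrm{erf}(1) \approx 0.843 \) of the time, while the unwidened limits of the direct rule fall noticeably short:

```python
import math
import random

random.seed(7)

def binomial(n, p):
    # crude Bernoulli-sum draw; adequate for a check of this size
    return sum(1 for _ in range(n) if random.random() < p)

h = hp = 2000            # observed and future series of equal length, so theta = 1
p_true, tau = 0.5, 1.0
theta = hp / h
reps = 500
hits_wide = hits_narrow = 0
for _ in range(reps):
    m = binomial(h, p_true)                       # observed occurrences of E
    n = h - m
    centre = m * hp / h                           # m h'/h, the presumed centre
    half = (tau / h) * math.sqrt(2 * hp * m * n)  # limits without the factor
    future = binomial(hp, p_true)                 # occurrences in the new series
    hits_wide += abs(future - centre) <= half * math.sqrt(1 + theta)
    hits_narrow += abs(future - centre) <= half

print(hits_wide / reps, hits_narrow / reps)  # first near erf(1) = 0.843; second smaller
```

The shortfall of the unwidened limits is exactly the error of treating the presumed chance \( m/h \) as if it were given a priori.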

103. The following question may be proposed as an example of the application of the last formula. Out of a given number \( h \) of individuals taken at the age \( A \), it has been observed that \( m \) are alive at the age \( A+a \); required the probability that out of \( h' \) other individuals taken at the same age \( A \) the number who survive at the age \( A+a \) will be included between \( m' \pm l \), the ratio of \( m' \) to \( h' \) being the same as that of \( m \) to \( h \).

To solve this question, we have to find \( \tau \) from the equation \( l = \frac{\tau}{h} \sqrt{2h'mn(1+\theta)} \); and the corresponding value of \( \Theta \) in the table will give the required probability.

From the table given in the article Mortality, vol. xv. p. 555, it appears that out of 5642 individuals taken at the age 30, the number surviving at the age 50, according to the Carlisle Table, is 4397. Taking those numbers as an example, we have \( h=5642, m=4397, n=1245 \); and assuming also \( h'=5642 \), whence \( \theta=1 \) and \( \sqrt{1+\theta}=\sqrt{2} \), the equation of the limits becomes \( l=\tau \times 62.30 \). Let it be proposed to determine \( l \) from the condition \( \Theta=\frac{1}{2} \). In this case the table gives \( \tau=0.4769 \), and we have consequently \( l=29.7 \), or 30 nearly. Hence it appears, that if it has been observed that of 5642 individuals taken at the age of 30, 1245 die before reaching the age of 50, it is an even wager that out of 5642 other individuals also taken at the age of 30, and subjected to the same chances of mortality, the number who die before reaching the age of 50 will lie between \( 1245-30 \) and \( 1245+30 \), that is, between 1215 and 1275.
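The arithmetic of this example is easily retraced in Python (our sketch; the printed table of \( \Theta \) is replaced by a bisection on the identity \( \Theta = \mathrm{erf}(\tau) \)):

```python
import math

def tau_for(theta_prob):
    # invert Theta = erf(tau) by bisection, in place of the printed table
    lo, hi = 0.0, 3.0
    while hi - lo > 1e-12:
        mid = (lo + hi) / 2
        if math.erf(mid) < theta_prob:
            lo = mid
        else:
            hi = mid
    return lo

h, m, n = 5642, 4397, 1245   # Carlisle Table: of 5642 aged 30, 4397 survive to 50
hp = 5642                    # a future series of the same size, so theta = 1
theta = hp / h
tau = tau_for(0.5)           # even wager: Theta = 1/2
l = (tau / h) * math.sqrt(2 * hp * m * n * (1 + theta))
print(round(tau, 4), round(l))   # tau = 0.4769, l = 30 nearly: limits 1245 ± 30
```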

104. The following experiment recorded by Buffon, in his Arithmetique Morale, affords an example of the application of the preceding formulae to the determination of the probable existence of a physical cause from the results of a large number of observations. A piece of money was tossed 4040 times successively, and the result was head 2048 times, and tail 1992 times. Supposing the piece to have been perfectly symmetrical, the most probable result would have been the same number of heads and tails. Let it now be proposed to assign the probability afforded by the experiment that the piece was not symmetrical, and that its form or physical structure was such as to render head an event, a priori, more probable than tail.

In this case \( h=4040, m=2048, n=1992 \); and by (102) we have the probability \( R \) (or \( \Theta \), neglecting the correction) that \( p \), the unknown chance of head, is comprised between the limits \( \frac{m}{h} \pm \frac{\tau}{h} \sqrt{\frac{2mn}{h}} \). Now \( \frac{m}{h} = \frac{2048}{4040} = 0.50693 \), and \( \frac{1}{h}\sqrt{\frac{2mn}{h}} = 0.011124 \); therefore if we assume \( \tau \times 0.011124 = 0.00693 \), we shall have the probability \( \Theta \) that \( p \) is comprised between the limits \( 0.50693 \pm 0.00693 \), that is, between two limits of which the least is 0.5, or one-half. This assumption gives \( \tau = 0.00693 \div 0.011124 = 0.623 \); and the corresponding value of \( \Theta \) is found from the table \( =0.62170 \). Now if \( p \) lie between the above limits, its value is evidently greater than \( \frac{1}{2} \); but the probability of its lying between those limits is not the whole probability that \( p \) is greater than \( \frac{1}{2} \); for there is a chance of its exceeding the greatest limit, in which case also its value will be greater than \( \frac{1}{2} \). The probability that \( p \) is not comprised between the assumed limits is \( 1 - 0.6217 = 0.3783 \); and if it is not comprised between these limits, there is an equal chance of its being greater than the greatest limit, or less than the least; the probability of its exceeding the greatest limit is consequently \( \frac{1}{2} \times 0.3783 = 0.18915 \). Hence the whole probability that \( p \) is greater than \( \frac{1}{2} \), or that the chance of head is greater than that of tail, is \( 0.6217 + 0.18915 = 0.81085 \); and the odds are therefore 81 to 19, or rather more than 4 to 1, that the piece was not perfectly symmetrical.¹

---

¹ This inference, though admitted by both Laplace and Poisson, is not strictly correct. In a paper published in the Transactions of the Cambridge Philosophical Society (vol. vi. part iii.), Mr. De Morgan has shown by a direct analysis that, in the case of \( p \) and \( q \) not being known a priori, but made equal to the observed ratios \( m/h \), \( n/h \), the correction to the presumption of the true value of \( p \) lying within the limits stated in the text is not that inferred by those geometers, but a smaller one, which, being divided by \( h \), is of the order of quantities that have been rejected in the approximations. It is right to state that the method of simplifying the calculation of \( R \) in the direct case, by taking the integral \( \Theta \) between limits corresponding to \( l \pm \frac{1}{2} \) instead of \( l \), is noticed, for the first time so far as we are aware, by Mr. De Morgan in the same paper.
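Buffon's numbers can be checked in a few lines of Python (our sketch, again using \( \Theta = \mathrm{erf}(\tau) \) in place of the table):

```python
import math

h, m, n = 4040, 2048, 1992         # Buffon: 2048 heads, 1992 tails
ratio = m / h                      # 0.50693, the presumed chance of head
spread = math.sqrt(2 * m * n / h**3)   # 0.011124, scale of the limits for p
tau = (ratio - 0.5) / spread       # place the lower limit exactly at one-half
theta = math.erf(tau)              # chance that p lies within the limits
whole = theta + (1 - theta) / 2    # add the chance of exceeding the upper limit
print(round(tau, 3), round(whole, 4))   # 0.623 and about 0.8109: odds above 4 to 1
```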

105. The formulae which have been demonstrated in the present section are immediately applicable to the determination of the probable limits of the gain or loss which may arise from undertaking a great number of risks with a given expectation in respect of each. The following question has important practical applications. A is interested in a great number of similar enterprises, in each of which E or F must necessarily happen. When E happens he receives the sum \( a \), and when F happens he pays the sum \( b \); required the probability that his gain or loss shall be comprised within given limits?

Let \( p \) be the chance of the event \( E \), \( q \) that of \( F \), and \( h \) the number of enterprises. Suppose \( E \) happens \( m \) times, and \( F \) \( n \) times; the sum to be received will be \( ma \), and the sum to be paid will be \( nb \), and therefore his gain will be \( ma - nb \). Let \( m = hp \), \( n = hq \); then \( m \) times \( E \) and \( n \) times \( F \) is the most probable result, and in this case the gain \( ma - nb \) becomes \( h(pa - qb) \). Find \( \tau \) from \( l = \tau\sqrt{2hpq} \); then (98) \( \Theta \) is the probability that the number of occurrences of \( E \) will lie between the limits \( hp - l \) and \( hp + l \). But if \( E \) happens \( hp - l \) times, and consequently \( F \) \( hq + l \) times, the corresponding benefit is \( (hp - l)a - (hq + l)b = h(pa - qb) - l(a + b) \); and if \( E \) happens \( hp + l \) times, and \( F \) \( hq - l \) times, the benefit is \( h(pa - qb) + l(a + b) \); whence \( \Theta \) is the probability that his gain, that is, the difference between what he receives and what he pays, will be included within the limits \( h(pa - qb) \pm l(a + b) \), both inclusive.
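The adequacy of \( \Theta \) as the chance of the gain lying within these limits can be tested against the exact binomial sum. A Python sketch (ours; the particular values of \( p \), \( a \), \( b \), and \( h \) are illustrative only):

```python
import math

p, q = 0.6, 0.4       # chances of E and F at each risk
a, b = 2.0, 1.0       # sum received when E happens, sum paid when F happens
h = 10000             # number of risks
tau = 1.0
l = tau * math.sqrt(2 * h * p * q)        # half-width in number of events E

def pmf(m):
    # exact binomial probability of m events E, computed in log space
    logp = (math.lgamma(h + 1) - math.lgamma(m + 1) - math.lgamma(h - m + 1)
            + m * math.log(p) + (h - m) * math.log(q))
    return math.exp(logp)

# the gain m*a - (h-m)*b lies within h(pa - qb) ± l(a + b)
# exactly when m lies within hp ± l
m_lo, m_hi = math.ceil(h * p - l), math.floor(h * p + l)
exact = sum(pmf(m) for m in range(m_lo, m_hi + 1))
print(round(math.erf(tau), 4), round(exact, 4))   # Theta = 0.8427; exact is close
```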

106. The following conclusions follow immediately from this solution.

(1). If \( pa \) be greater than \( qb \), so that \( A \) has a mathematical advantage (however small) in each risk, the risk may be repeated a sufficient number of times, or \( h \) may be taken a sufficiently high number, to give a probability as nearly equal to certainty as we please, that \( A \)'s gain shall exceed any given sum, however great.

(2). Let there be two players \( A \) and \( B \), whose chances of gaining a game are respectively \( p \) and \( q \), and let \( b \) be the sum staked upon each game by \( A \), and \( a \) the sum staked by \( B \); then \( pa \) is the mathematical expectation of \( A \) in respect of a single game, and \( qb \) that of \( B \); and if \( pa \) be greater than \( qb \) (however small the difference), the game may be repeated so often as to give rise to a probability approaching as nearly to certainty as we please, that \( A \)'s gain shall become equal to the whole of \( B \)'s capital, and, consequently, that \( B \) will be ruined.

(3). If the mathematical expectations of the two players be equal, then \( pa - qb = 0 \); and the most probable individual result of a large number of games is that the gains and losses on either side shall be the same. But if \( l \) be supposed constant, then \( \tau \) is inversely proportional to \( \sqrt{h} \), and consequently the game may be repeated until \( \Theta \), the probability that the gain or loss shall be comprised within the given limits \( \pm l(a + b) \), shall become as small as we please. Hence \( 1 - \Theta \), the probability that the gain or loss shall exceed those limits, may be rendered as great as we please; and it follows that although the play may be on terms of perfect equality, it may be continued until a probability shall be obtained, approaching as nearly to certainty as we please, that one of the two players shall be ruined.

(4). The number of games which must be played, in order to afford a given amount of probability that one of the parties shall lose the whole of his fortune, depends on the magnitude of the stakes \( a \) and \( b \); but whether the stakes be large or small, the final result is the same: when the stakes are small, a greater number of games must be played.

107. As an example of this class of problems, we may take the following question: \( A \) and \( B \) engage in play with equal chances of winning, and stake five sovereigns on each game; how many games must they undertake to play in order that it may be two to one that one of them shall lose at least 100 sovereigns?

Here \( p = \frac{1}{2}, q = \frac{1}{2}, a = 5, b = 5 \), and \( l(a + b) = 100 \), whence \( l = 10 \). Taking the limits at \( l + \frac{1}{2} \) instead of \( l \), the equation \( l + \frac{1}{2} = \tau\sqrt{2hpq} \) therefore becomes \( 10.5 = \tau\sqrt{h/2} \), whence \( h = 2(10.5/\tau)^2 \). Now, the odds being 2 to 1 against the gain or loss not exceeding 100, the probability \( \Theta \) of the limits not being exceeded is \( \frac{1}{3} = 0.3333 \), corresponding to which the table gives by interpolation \( \tau = 0.30458 \); substituting which in the above equation we find \( h = 2376.8 \); so that if 2377 games are played, the odds are 2 to 1 that one of the players shall have gained at least 10 games more than half that number, and, consequently, that the other shall have gained at least 10 less than half, or that one of them shall have gained at least 20 games more than the other, and consequently have gained at least 100 sovereigns.
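The number 2376.8 can be recovered directly (our Python sketch; the table interpolation is replaced by a bisection on \( \Theta = \mathrm{erf}(\tau) \), and the half-unit correction \( l + \frac{1}{2} \) is retained):

```python
import math

def tau_for(theta_prob):
    # solve erf(tau) = theta_prob by bisection, in place of the table
    lo, hi = 0.0, 3.0
    while hi - lo > 1e-12:
        mid = (lo + hi) / 2
        if math.erf(mid) < theta_prob:
            lo = mid
        else:
            hi = mid
    return lo

p = q = 0.5
stake = 5
l = 100 / (2 * stake)         # a lead of 10 games either way means 100 sovereigns
tau = tau_for(1 / 3)          # odds of 2 to 1 against staying within the limits
h = (l + 0.5) ** 2 / (2 * p * q * tau ** 2)
print(round(tau, 4), round(h))   # tau = 0.3046 nearly, and about 2377 games
```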

It is to be carefully observed that this question supposes the account between \( A \) and \( B \) not to be balanced until 2377 games have been played. If the condition of the play had been that it should cease as soon as \( A \) or \( B \) should have lost 100 sovereigns, the question would have been of an entirely different kind, and a much smaller number of games would have given the same probability of an equal loss.

108. The question just alluded to belongs to a class of problems connected with the Duration of Play, of extreme difficulty, and which have given rise to some of the most abstruse and refined researches in the modern analysis. In order to give an idea of the subject, we may take the following question, which has been frequently considered.

\( A \) and \( B \), whose chances of winning a game are respectively \( p \) and \( q \), play on these terms: \( A \) has \( m \) counters, and \( B \) has \( n \) counters; when \( A \) loses a game he gives a counter to \( B \), and when \( B \) loses a game he gives a counter to \( A \), and the play is to cease when one of them has lost all his counters. What is the probability that the play, which may go on for ever, shall be finished before more than \( h \) games have been played?

To take a simple case, suppose each to have three counters, and let the probability be required that the play shall be concluded with or before the ninth game. As the play cannot end with less than three games, let the binomial \( (p + q)^3 \) be developed, and the terms

\[ p^3 + 3p^2q + 3pq^2 + q^3 \]

give the respective probabilities of all the cases which can arise in three games. The first term is the probability of \( A \) gaining all the three games, the last term is the probability of \( B \) gaining them, and the sum of the remaining two terms is the probability that neither will win all the games, or the chance that a fourth will be played. Now, if the fourth game be played, \( p \) is \( A \)'s chance of winning it, and \( q \) \( B \)'s chance; but these chances will only exist in respect of the fourth game, provided the play be not concluded with the previous one, the probability of which is \( 3p^2q + 3pq^2 \).

Multiplying, therefore, \( 3p^2q + 3pq^2 \) by \( p + q \), the product

\[ 3p^3q + 6p^2q^2 + 3pq^3 \]

gives the respective probabilities of the different ways in which the four games may be gained by \( A \) and \( B \), excepting the two ways in which the play would have terminated with the third game. But the play cannot end in any of these ways; for, taking the first term for example, if \( B \) gain a counter, the play cannot terminate until \( A \) gain back that counter, and three others besides, so that five games at least must be played. In fact, it is obvious that there is no way of gaining an odd number of counters in an even number of games, or vice versa. The last product therefore expresses the chance of the 5th game being played; and by reason of \( p + q = 1 \) it is equal to \( 3p^2q + 3pq^2 \), the chance of the 4th being played, as it obviously ought to be, since the play cannot terminate with the 4th.

Again, if the 5th game be played, \( p \) is A's chance of gaining it, and \( q \) B's chance of gaining it; multiplying therefore the last product by \( p + q \), the different terms of the result, namely,

\[ 3p^4q + 9p^3q^2 + 9p^2q^3 + 3pq^4 \]

give the respective probabilities of all the cases which can arise by the 5th game. The first term is the probability of A gaining 4 games and B gaining 1, and the last term is the probability of B gaining 4 and A gaining 1. These terms therefore are the probabilities of the play ending in favour of A and B respectively with the 5th game, and the sum of the other two terms is the probability that the play will not terminate with the 5th game, or the chance of another game being played.

By pursuing the same reasoning it will be evident that on rejecting the two extreme terms of the above product, and multiplying the remainder by \( p + q \), there will result the probabilities of the different ways in which six games may be played without the one player gaining all the counters of the other. But as the play cannot terminate with the 6th game, multiply again by \( p + q \), and the result

\[ 9p^5q^2 + 27p^4q^3 + 27p^3q^4 + 9p^2q^5 \]

will indicate the probability of the different cases that can arise out of the 7th game. Rejecting the two extreme terms, which give the respective probabilities of the play being concluded in favour of A or B, and multiplying the remaining two first by \( p + q \) to obtain the different probabilities in respect of the 8th game, and again by \( p + q \), as the play cannot terminate with the 8th, we have the product

\[ 27p^6q^3 + 81p^5q^4 + 81p^4q^5 + 27p^3q^6, \]

of which the first and last terms give the respective chances of A and B winning at the 9th game, and the sum of the other two terms the probability that the play will not be concluded by the 9th.

If we now collect the terms which have been set aside in the successive products, and denote by \( a \) and \( b \) the respective probabilities of A and B gaining at the 9th game, or sooner, we shall have

\[ a = p^3 + 3p^4q + 9p^5q^2 + 27p^6q^3, \]

\[ b = q^3 + 3q^4p + 9q^5p^2 + 27q^6p^3, \]

where the law of the series is evident.

It is easy to see that this process may be applied whatever be the number of counters which A and B have at the commencement, and whatever be the number, \( h \), of games to which the play is limited. The general rule is as follows: of the two numbers \( m \) and \( n \), let \( m \) be that which is not less than the other. Raise \( p + q \) to the power \( n \), and reject the first term (which gives the chance of A winning \( n \) games in succession), and also the last if \( m = n \). Multiply the remainder \((h - n)\) times in succession by \((p + q)\), rejecting at each multiplication the first or last term of the product when it gives a combination which would terminate the play in favour of A or B; the sum of the terms rejected from the left-hand side of the different products gives the probability in favour of A, and the sum of the terms rejected from the right-hand side the probability in favour of B. As the coefficients of the successive products are obviously formed by adding the coefficient of the corresponding term in the preceding product to that of the term immediately before it, the products may be written down at once without the trouble of multiplication; but it is evident that when \( m, n, \) and \( h \) are large numbers, it would be quite impracticable to sum the series formed of the rejected terms by the ordinary methods.
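In modern language the rejection process is an absorbing random walk, and is easily mechanised. The following Python sketch (ours) applies the rule just stated for any \( m \), \( n \), and \( h \); with \( m = n = 3 \), \( h = 9 \), and \( p = q = \frac{1}{2} \) it reproduces the series \( a \) and \( b \) found above, each equal to 175/512:

```python
def duration_of_play(m, n, h, p):
    # A holds m counters, B holds n; A wins each game with chance p.
    # Returns the chances that A or B gains all the counters within h games,
    # by the coefficient-rejection process of art. 108.
    q = 1.0 - p
    total = m + n
    probs = [0.0] * (total + 1)   # probs[k]: A holds k counters, play unfinished
    probs[m] = 1.0
    win_a = win_b = 0.0
    for _ in range(h):
        nxt = [0.0] * (total + 1)
        for k in range(1, total):
            nxt[k + 1] += p * probs[k]
            nxt[k - 1] += q * probs[k]
        win_a += nxt[total]       # rejected left-hand term: B is ruined
        win_b += nxt[0]           # rejected right-hand term: A is ruined
        nxt[0] = nxt[total] = 0.0
        probs = nxt
    return win_a, win_b

pa, pb = duration_of_play(3, 3, 9, 0.5)
print(pa, pb, pa + pb)   # 175/512 each; together 175/256 = 0.68359375
```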

From the manner in which the series are derived, they are called recurring series;¹ a general theory of which was first given by Demoivre in his Doctrine of Chances, and forms the most remarkable portion of that work.

109. The general problem is reduced to an equation of finite differences as follows: Let \( y_{x,t} \) represent A's expectation when \( x \) games have been played, and he has still \( t \) counters to win, or B has \( t \) counters in his hand. If A gain the next game the value of his expectation will become \( y_{x+1,t-1} \), and the chance of his gaining it is \( p \); therefore his expectation in respect of that event is \( py_{x+1,t-1} \). On the other hand, if A loses the next game his expectation will become \( y_{x+1,t+1} \), and the chance of losing it is \( q \); therefore his expectation in respect of that event is \( qy_{x+1,t+1} \).

Hence, according to the principles laid down in (32),

\[ y_{x,t} = py_{x+1,t-1} + qy_{x+1,t+1}, \]

a linear equation of finite differences with two independent variables. It is therefore on the integration of an equation of this kind that the problem of the duration of play ultimately depends; but the subject is of much too complicated a nature to admit of its being satisfactorily explained in this place. We must therefore content ourselves with referring the reader to the treatise on generating functions, which forms the first part of the Théorie Analytique of Laplace.

110. In the preceding section we have considered a class of questions which apply to events depending on constant causes, and supposed to be of such a nature that they necessarily happen or fail in each experiment, and have given formulae by which approximate results can be obtained when the numbers involved are so large that they cannot be conveniently treated, or cannot be treated at all, by the ordinary methods of calculation. We come now to a more difficult problem, namely, to investigate the probable result of a large number of observations which have reference not to the simple occurrence or failure of a certain event, but to the magnitude of a thing, susceptible, within certain limits, of a very great or an infinite number of different values, equally or unequally probable, the chance of any particular value being also supposed to vary in each experiment. On account of its immediate application to the determination of the most probable values of astronomical and physical elements from the results of observation, this is, perhaps, in reference to practical utility, the most important question in the theory.

111. Let A represent a thing of any sort (as a line, or an angle, or a function of any quantity) which may have every possible value within given limits, or which may be constant in itself, but of such a nature that its real magnitude can only be observed within certain limits of accuracy; and suppose a great number of observations to be made. The object is, in the first place, to assign the probability that the sum of the observed values shall fall within given limits, supposing the chances of the different values of \( A \) to be known a priori; and, in the second place, when the law of the chances is unknown, to determine from the observations themselves the most probable mean value of \( A \), and also the limits within which there is a given amount of probability that the difference between such mean value, and the true but unknown value of \( A \), shall be contained.

---

¹ Lagrange, in vol. i. of the Memoirs of the Society of Turin, was the first who showed that the investigation of the general term of a recurring series depends on the integration of a linear equation of finite differences. In vols. vi. and vii. of the Mémoires présentés à l'Académie des Sciences de Paris, Laplace proposed a general method for the summation of recurring series by the integration of such equations, and in the latter volume gives a number of examples of their use in the more complicated questions in the theory of chances, amongst which is the problem enunciated in (108). The subject was afterwards resumed by Lagrange in the volume of the Berlin Memoirs for 1775, where he has given a more direct method than that of Laplace for the integration of the class of equations in question, and also applied it to the solution of the principal problems proposed in the works of Montmort and Demoivre. A general solution of the problem in the text is given by Ampère in a tract entitled Considérations sur la Théorie Mathématique du Jeu (Lyons, 1802).

112. Let \( a\epsilon \) and \( b\epsilon \) be the limits of the possible values of A, \( x \) a value of A between those limits, and \( P \) the probability that the sum of the values of A given by \( h \) observations will be \( s \) exactly. Assume the possible values of A to be equidistant, and multiples of the constant \( \epsilon \), and make

\[ s = \sigma\epsilon, \quad x = i\epsilon, \]

where \( a \), \( b \), and \( \sigma \) are whole numbers (which may be positive or negative), \( \sigma \) lying between \( ha \) and \( hb \); and \( i \) is also a whole number proportional to \( x \), varying between the limits \( i = a \) and \( i = b \), and which, therefore, may be positive or negative, or zero. If the different values of A are supposed to be equally probable, the chance of obtaining any given one of them, as \( x \), in a single trial, is unity divided by the number of possible values, or \( 1/(b-a+1) \); and if we assume an indeterminate quantity \( w \), then (20) the number of combinations which give the sum of the \( h \) values of A equal to \( \sigma\epsilon \) is the coefficient of that term of the multinomial

\[ (w^a + w^{a+1} + w^{a+2} + \cdots + w^b)^h, \]

(or of the development of \( (\Sigma w^i)^h \) from \( i = a \) to \( i = b \)) in which the exponent of \( w \) is \( \sigma \); and consequently the probability \( P \) that the sum of the values of \( A \) will be \( s \) exactly is that coefficient divided by \( (b-a+1)^h \).
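In modern terms, raising the multinomial to the power \( h \) and collecting the coefficient of \( w^\sigma \) is a repeated convolution. A Python sketch (ours), checked on the familiar case of two dice, where the values run from \( a = 1 \) to \( b = 6 \):

```python
def sum_distribution(h, a, b):
    # chances of each possible sum of h equally likely values a, a+1, ..., b:
    # the coefficients of (w^a + ... + w^b)^h divided by (b - a + 1)^h
    vals = b - a + 1
    dist = {0: 1.0}
    for _ in range(h):
        nxt = {}
        for s, pr in dist.items():
            for i in range(a, b + 1):
                nxt[s + i] = nxt.get(s + i, 0.0) + pr / vals
        dist = nxt
    return dist

dice = sum_distribution(2, 1, 6)
print(dice[7])   # six of the 36 equally likely combinations give the sum 7
```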

113. If the chances of the different values of A are unequal, and also vary in each trial, let \( p_i \) be the probability of the observed value of A being \( x \) (that is, \( i\epsilon \)) in the first trial, \( p'_i \) the probability of its being \( x \) in the second, \( p''_i \) that of its being \( x \) in the third, and so on. Now when \( h = 1 \), or when there is only a single trial, then \( \sigma = i \), and we have \( P = p_i \). If \( h = 2 \), then, assuming \( i \) to be the number answering to the value of A in the first trial, and \( i' \) that answering to its value in the second, \( i \) and \( i' \) being any two numbers between \( a \) and \( b \), the two observations may give the sum of the two values equal to \( \sigma\epsilon \) in as many different ways as it is possible to satisfy the equation \( i + i' = \sigma \); and consequently, according to the theory of combinations, \( P \) is the coefficient of that term of the product (arranged according to the powers of \( w \)) of the two series represented by \( \Sigma p_i w^i \) and \( \Sigma p'_i w^i \), in which the exponent of \( w \) is equal to \( \sigma \). In like manner, if \( h = 3 \), then the sum of the observed values of A may be equal to \( \sigma\epsilon \) in as many different ways as the equation \( i + i' + i'' = \sigma \) admits of different solutions, and consequently \( P \) is the coefficient of the term of the development of the product \( \Sigma p_i w^i \cdot \Sigma p'_i w^i \cdot \Sigma p''_i w^i \), in which the exponent of \( w \) is equal to \( \sigma \). Generally, when the number of observations is \( h \), the probability \( P \) of the sum of the observed values of A being \( \sigma\epsilon \) exactly, is the coefficient of \( w^\sigma \) in the development of the product

\[ \Sigma p_i w^i \cdot \Sigma p'_i w^i \cdot \Sigma p''_i w^i \cdots \Sigma p^{(h-1)}_i w^i, \]

the sums \( \Sigma \) including all values of \( i \) from \( i = a \) to \( i = b \).

Assume \( w = e^{\theta\sqrt{-1}} \) (\( e \) being the base of the Napierian logarithms), and let the above product be denoted by \( X \). We shall then have

\[ X = \Sigma p_i e^{i\theta\sqrt{-1}} \cdot \Sigma p'_i e^{i\theta\sqrt{-1}} \cdot \Sigma p''_i e^{i\theta\sqrt{-1}} \cdots \Sigma p^{(h-1)}_i e^{i\theta\sqrt{-1}}. \]

Now since \( P \) is the coefficient of the term of the development of this product which contains the factor \( e^{\sigma\theta\sqrt{-1}} \), if we conceive the development effected we shall have

\[ X = Pe^{\sigma\theta\sqrt{-1}} + P'e^{\sigma'\theta\sqrt{-1}} + P''e^{\sigma''\theta\sqrt{-1}} + \cdots, \]

a series in which all the terms are of the same form. Multiplying both sides of the equation by \( e^{-\sigma\theta\sqrt{-1}} \), we get

\[ Xe^{-\sigma\theta\sqrt{-1}} = P + P'e^{(\sigma'-\sigma)\theta\sqrt{-1}} + P''e^{(\sigma''-\sigma)\theta\sqrt{-1}} + \cdots. \]

Now by a well-known theorem in trigonometry, (Algebra, art. 269),

\[ e^{(\sigma'-\sigma)\theta\sqrt{-1}} = \cos(\sigma' - \sigma)\theta + \sqrt{-1}\sin(\sigma' - \sigma)\theta; \]

substituting therefore this value, and multiplying by \( d\theta \), the equation becomes

\[ Xe^{-\sigma\theta\sqrt{-1}}d\theta = Pd\theta + P'\{\cos(\sigma'-\sigma)\theta + \sqrt{-1}\sin(\sigma'-\sigma)\theta\}d\theta + \cdots. \]

The factor which multiplies \( P' \) in this equation will evidently become zero when integrated from \( \theta = -\pi \) to \( \theta = \pi \) (\( \pi \) being the semicircumference of the circle whose radius is 1), the positive and negative elements of the integral being equal, and consequently destroying each other. The same thing also takes place with respect to the following terms, which are all of the same form. Integrating therefore between those limits, and observing that \( \int_{-\pi}^{\pi} d\theta = 2\pi \), we find

\[ P = \frac{1}{2\pi} \int_{-\pi}^{\pi} Xe^{-\sigma\theta\sqrt{-1}} d\theta. \]
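This inversion may be verified numerically: because the exponents are whole numbers, the integral is computed exactly (to rounding) by a uniform sum of points round the circle, in modern terms a discrete Fourier inversion. A Python sketch (ours), again taking two dice, so that the chance of the sum 7 should come out 1/6:

```python
import cmath

h, a, b = 2, 1, 6                    # two dice with values 1..6
probs = {i: 1 / 6 for i in range(a, b + 1)}
sigma = 7                            # the sum whose chance P is required
N = 64                               # more points than possible sums: sum is exact

total = 0.0
for k in range(N):
    theta = 2 * cmath.pi * k / N
    X = 1.0
    for _ in range(h):               # product of the h generating functions
        X *= sum(p * cmath.exp(1j * i * theta) for i, p in probs.items())
    total += (X * cmath.exp(-1j * sigma * theta)).real
P = total / N                        # stands in for (1/2pi) * the integral
print(round(P, 6))                   # 1/6 = 0.166667 nearly
```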

114. This value of \( P \) denotes the infinitely small chance that the sum of the values of \( A \) in \( h \) trials will be \( s \) exactly. Let \( \mu \) and \( \nu \) be two integer numbers between \( ha \) and \( hb \), and let \( Q \) denote the probability that \( s \) will be comprised between the two limits \( \mu \) and \( \nu \) (these limits being included between \( ha \) and \( hb \)); then \( Q \) will be found by substituting successively \( \mu, \mu + 1, \mu + 2, \ldots, \nu \) for \( \sigma \) in the above value of \( P \), and taking the sum of all the resulting terms. This substitution gives the following series multiplied by \( X \) under the sign of integration:

\[ e^{-\mu\theta\sqrt{-1}} + e^{-(\mu+1)\theta\sqrt{-1}} + e^{-(\mu+2)\theta\sqrt{-1}} + \cdots + e^{-\nu\theta\sqrt{-1}}. \]

On multiplying the series now found by

\[ e^{\frac{1}{2}\theta\sqrt{-1}} - e^{-\frac{1}{2}\theta\sqrt{-1}} = 2\sqrt{-1}\sin{\textstyle\frac{1}{2}}\theta, \]

all the terms of the product, excepting the first and the last, destroy each other, and the sum of the terms becomes simply

\[ e^{-(\mu-\frac{1}{2})\theta\sqrt{-1}} - e^{-(\nu+\frac{1}{2})\theta\sqrt{-1}}; \]

therefore on making the substitution, and performing the multiplication now indicated, and dividing by \( 2\sqrt{-1}\sin\frac{1}{2}\theta \), we obtain for the value of \( Q \) the equation

\[ Q = \frac{1}{2\pi} \int_{-\pi}^{\pi} X\,\frac{e^{-(\mu-\frac{1}{2})\theta\sqrt{-1}} - e^{-(\nu+\frac{1}{2})\theta\sqrt{-1}}}{2\sqrt{-1}\sin\frac{1}{2}\theta}\,d\theta. \]

115. In order to simplify the expression for \( Q \), let the number of possible values of A within the given limits be conceived to be infinite, in which case the constant \( \epsilon \) becomes infinitely small, and therefore, since the limits are finite, \( \mu \) and \( \nu \) infinitely great. Let the following substitutions also be made:

\[ \mu = \psi - \delta, \quad \nu = \psi + \delta, \quad \theta = \epsilon z, \]

\( \delta \) being positive in order that \( \nu \) may be greater than \( \mu \), agreeably to what has already been assumed. On substituting these expressions in the above equation, the limits of the new variable \( z \) will be \( \pm\infty \); for \( \epsilon \) having been supposed infinitely small, \( z \) must become infinitely great when \( \theta = \pm\pi \). Now since \( \mu \) and \( \nu \) are infinitely great, \( \mu - \frac{1}{2} \) and \( \nu + \frac{1}{2} \) become sensibly \( \mu \) and \( \nu \), whence we have

\[ e^{-(\mu-\frac{1}{2})\theta\sqrt{-1}} - e^{-(\nu+\frac{1}{2})\theta\sqrt{-1}} = e^{-\psi\theta\sqrt{-1}}\left(e^{\delta\theta\sqrt{-1}} - e^{-\delta\theta\sqrt{-1}}\right) = 2\sqrt{-1}\,e^{-\psi\theta\sqrt{-1}}\sin\delta\theta. \]

Again, by reason of \( \theta = \epsilon z \), we have \( d\theta = \epsilon\,dz \); and \( \theta \) being infinitely small, \( \frac{1}{2}\theta \) may be taken for \( \sin\frac{1}{2}\theta \), whence \( \frac{d\theta}{\sin\frac{1}{2}\theta} = \frac{2d\theta}{\theta} = \frac{2dz}{z} \). By means of these transformations the expression for \( Q \) becomes

\[ Q = \frac{1}{\pi} \int_{-\infty}^{+\infty} Xe^{-\psi\epsilon z\sqrt{-1}}\,\frac{\sin\delta\epsilon z}{z}\,dz, \]

and denotes the probability that the sum of the \( h \) values of A will lie between the limits \( (\psi - \delta)\epsilon \) and \( (\psi + \delta)\epsilon \).

116. It is now necessary to assign a value to the product denoted by \( X \). Since the number of possible values of A within the given limits has been supposed infinite, the chance of obtaining any given one of them, as \( x \), in a single trial, is infinitely small. Assuming this chance to be a function of \( x \), and to vary in the different trials, let it be represented by \( \phi_n x \) in respect of the \( n \)th trial. In order to preserve continuity in the values of A, this must be understood as signifying that \( \phi_n x\,dx \) is the infinitely small chance that the value of A given by the \( n \)th observation will lie between \( x \) and \( x + dx \). The function \( \phi_n x \), therefore, represents the law of the facility of the different values of A in the \( n \)th trial. It is positive for all values of \( x \) between the limits of the possible values of A, and vanishes for all values of \( x \) beyond those limits; and it is important to remark, that whatever number \( n \) may be, the integral \( \int \phi_n x\,dx \), taken between the limits, is always equal to unity; for since every observation gives a value of A between the limits, the sum of all the probabilities in respect of each observation must be unity or certainty. From this assumption, then, we have \( p_i = \phi_1 x\,dx \), \( p'_i = \phi_2 x\,dx \), \( p''_i = \phi_3 x\,dx \), &c.;

whence the sums \( \Sigma p \, e^{x z \sqrt{-1}} \) (113) are changed into definite integrals; and therefore, since \( \phi = \epsilon z \) and \( x = i\epsilon \), consequently \( i\phi = xz \), we obtain for the value of \( X \),

\[ X = \int e^{xz\sqrt{-1}} \phi_1 x \, dx \times \int e^{xz\sqrt{-1}} \phi_2 x \, dx \times \cdots \times \int e^{xz\sqrt{-1}} \phi_h x \, dx, \]

the limits of each integral being \( x = a \) and \( x = b \).

By reason of \( e^{xz\sqrt{-1}} = \cos zx + \sqrt{-1} \sin zx \), each of these integrals may be expressed in terms of the cosine and sine of \( zx \). The \( n \)th, for instance, becomes \( \int \phi_n x \cos zx \, dx + \sqrt{-1} \int \phi_n x \sin zx \, dx \). Now since \( \int \phi_n x \, dx = 1 \) (from \( x = a \) to \( x = b \)), and \( \phi_n x \) can have only positive values, each of these integrals is less than 1; whence we may assume

\[ \int \phi_n x \cos zx \, dx = R_n \cos r_n; \quad \int \phi_n x \sin zx \, dx = R_n \sin r_n; \]

\( R_n \) being a positive quantity, and \( r_n \) an angle having always a real value. This gives

\[ \int e^{xz\sqrt{-1}} \phi_n x \, dx = R_n (\cos r_n + \sqrt{-1} \sin r_n) = R_n e^{r_n \sqrt{-1}}; \]

whence, substituting successively for \( n \) the numbers 1, 2, 3, ... \( h \), and for the sake of brevity making

\[ Y = R_1 \times R_2 \times R_3 \times \cdots \times R_h, \quad y = r_1 + r_2 + r_3 + \cdots + r_h, \]

we get \( X = Y e^{y \sqrt{-1}} \); and the expression for \( Q \) becomes

\[ Q = \frac{1}{\pi} \int_{-\infty}^{+\infty} Y e^{(y - \psi z) \sqrt{-1}} \sin \delta z \cdot \frac{dz}{z}. \]
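In modern language the integral \( \int e^{xz\sqrt{-1}} \phi_n x \, dx \) is the characteristic function of the law of facility, and \( R_n \) and \( r_n \) are its modulus and argument. The following short sketch is an illustration added here, not part of the original text; the uniform law on (0, 1) is an assumption made purely for the example.

```python
# Illustrative sketch: modulus R and argument r of  ∫ e^{izx} phi(x) dx
# for an assumed uniform law of facility phi(x) = 1/(b-a) on (a, b).
import numpy as np

a, b = 0.0, 1.0
x = np.linspace(a, b, 200_001)
phi = np.full_like(x, 1.0 / (b - a))
dx = x[1] - x[0]

def R_and_r(z):
    # trapezoidal rule for the cosine and sine integrals of art. 116
    fc = phi * np.cos(z * x)
    fs = phi * np.sin(z * x)
    c = np.sum((fc[:-1] + fc[1:]) / 2) * dx   # R cos r
    s = np.sum((fs[:-1] + fs[1:]) / 2) * dx   # R sin r
    return np.hypot(c, s), np.arctan2(s, c)

R0, r0 = R_and_r(0.0)   # R = 1 when z = 0 (art. 118)
R1, r1 = R_and_r(1.0)   # R < 1 for any other real value of z
print(round(R0, 6), R1 < 1.0)
```

The two facts printed, \( R_n = 1 \) at \( z = 0 \) and \( R_n < 1 \) elsewhere, are exactly the properties relied upon in art. 118 below.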

117. The integral in this last expression is equivalent to two others, namely

\[ \int Y \cos(y - \psi z) \sin \delta z \cdot \frac{dz}{z} + \sqrt{-1} \int Y \sin(y - \psi z) \sin \delta z \cdot \frac{dz}{z}. \]

Now, on attending to the nature of the quantities represented by \( Y \) and \( y \), it will be manifest that according as \( z \) is positive or negative, \( r_n \), and consequently \( y \), is positive or negative, while \( Y \) is positive in all cases, since \( R_n \) is always positive. Hence \( \cos(y - \psi z) \), as well as \( \sin \delta z \div z \), has the same value for \( -z \) as for \( +z \); and the elements of the first of the above integrals having thus the same value and the same sign for equal values of \( z \) with contrary signs, the value of the integral from \( -\infty \) to \( +\infty \) is double of its value from 0 to \( +\infty \). On the other hand, since \( y \) and \( z \) have both the same sign, \( \sin(y - \psi z) \) is positive or negative according as \( z \) is positive or negative, and the elements of the integral into which it enters, being equal for \( -z \) and \( +z \), but having contrary signs, destroy each other, and the integral from \( -\infty \) to \( +\infty \) vanishes. The expression for \( Q \) is therefore transformed into

\[ Q = \frac{2}{\pi} \int_{0}^{+\infty} Y \cos(y - \psi z) \sin \delta z \cdot \frac{dz}{z}. \]

118. The formula now found cannot in general be integrated by any of the known methods; but in the present case the quantities denoted by \( Y \) and \( y \) are such that an approximate value of \( Q \) may be obtained, which will be always more nearly equal to the true value as \( h \), the number of observations, is increased. On adding the squares of the two quantities represented by \( R_n \cos r_n \) and \( R_n \sin r_n \), we get

\[ R_n^2 = \left( \int \phi_n x \cos zx \, dx \right)^2 + \left( \int \phi_n x \sin zx \, dx \right)^2. \]

If \( z = 0 \), this becomes \( R_n^2 = \left( \int \phi_n x \, dx \right)^2 \), whence by (116), \( R_n = 1 \).

When \( z \) has any real value different from 0, it may be shown that \( R_n \) is less than 1; for let \( x' \) be a second variable which, like \( x \), can only vary from \( a \) to \( b \); we have obviously,

\[ \int \phi_n x' \cos zx' \, dx' = \int \phi_n x \cos zx \, dx, \quad \text{and} \quad \int \phi_n x' \sin zx' \, dx' = \int \phi_n x \sin zx \, dx, \]

and the above equation may be put under this form,

\[ R_n^2 = \int \phi_n x \cos zx \, dx \cdot \int \phi_n x' \cos zx' \, dx' + \int \phi_n x \sin zx \, dx \cdot \int \phi_n x' \sin zx' \, dx', \]

whence

\[ R_n^2 = \iint \phi_n x \cdot \phi_n x' \cos z(x - x') \, dx \, dx'. \]

Now, excepting the case in which \( z = 0 \), this double integral is always less than \( \int \phi_n x \, dx \cdot \int \phi_n x' \, dx' \), or less than \( \left( \int \phi_n x \, dx \right)^2 \), and consequently \( R_n \) is less than unity. Since, then, it has been shown that \( R_n \) is equal to unity when \( z = 0 \), and less than unity for all other values of \( z \), and since \( Y \) is the product of \( h \) such quantities, it follows that \( Y \) must diminish with great rapidity when \( z \) differs sensibly from 0, and even for very small values of \( z \) becomes insensible when \( h \) is a large number.

We may therefore assume \( Y = e^{-\theta^2} \), \( \theta \) being a quantity which vanishes with \( z \); an expression which is equal to unity when \( \theta = 0 \), diminishes rapidly as \( \theta \) is increased, and becomes zero when \( \theta \) is infinite.

119. For the sake of abridging, let us assume

\[ k_n = \int x \phi_n x \, dx, \quad k'_n = \int x^2 \phi_n x \, dx, \quad k''_n = \int x^3 \phi_n x \, dx, \quad \text{&c.} \]

(the integrals in respect of \( x \) being always from \( x = a \) to \( x = b \)). From known formulae we have

\[ \cos zx = 1 - \frac{z^2 x^2}{2} + \frac{z^4 x^4}{2 \cdot 3 \cdot 4} - \cdots; \quad \sin zx = zx - \frac{z^3 x^3}{2 \cdot 3} + \frac{z^5 x^5}{2 \cdot 3 \cdot 4 \cdot 5} - \cdots; \]

substituting these series for \( \cos zx \) and \( \sin zx \) in the integrals \( \int \phi_n x \cos zx \, dx \) and \( \int \phi_n x \sin zx \, dx \), and writing \( k_n, k'_n, k''_n \), &c., for the integrals they have now been assumed to represent, then, from (116) we have

\[ R_n \cos r_n = 1 - \frac{z^2}{2} k'_n + \frac{z^4}{2 \cdot 3 \cdot 4} k'''_n - \cdots; \]

\[ R_n \sin r_n = z k_n - \frac{z^3}{2 \cdot 3} k''_n + \cdots; \]

and it will be seen presently that all the terms involving higher powers of \( z \) than the cube may be neglected. Adding together the squares of these two equations, we get

\[ R_n^2 = 1 - z^2 (k'_n - k_n^2) + \cdots; \]

whence, extracting the square root,

\[ R_n = 1 - \frac{z^2}{2} (k'_n - k_n^2) + z^4 f_n - \cdots; \]

\( f_n \) being independent of \( z \). On dividing the second of the two preceding equations by the first, there results \( \tan r_n = z k_n - \frac{z^3}{6} (k''_n - 3 k_n k'_n) + \cdots \); whence, by reason of \( r_n = \tan r_n - \frac{1}{3} \tan^3 r_n + \cdots \), \( r_n = z k_n - \frac{z^3}{6} (k''_n - 3 k_n k'_n + 2 k_n^3) + \cdots \).

If, therefore, we make

\[ c_n = \frac{1}{2}(k'_n - k_n^2), \quad g_n = \frac{1}{3}(k''_n - 3 k_n k'_n + 2 k_n^3), \]

the values of \( R_n \) and \( r_n \) become respectively

\[ R_n = 1 - z^2 c_n + z^4 f_n - \cdots \quad \text{and} \quad r_n = z k_n - \frac{1}{2} z^3 g_n + \cdots. \]

Now, by hypothesis (116), \( Y = R_1 \times R_2 \times \cdots \times R_h \); therefore \( \log Y = \Sigma \log R_n = \Sigma \log(1 - z^2 c_n + z^4 f_n - \cdots) = -\Sigma \left[ z^2 c_n - z^4 (f_n - \frac{1}{2} c_n^2) + \cdots \right] \) (by reason of the formula \( \log R_n = (R_n - 1) - \frac{1}{2}(R_n - 1)^2 + \cdots \)).

But we have also assumed

\[ (118) \quad Y = e^{-\theta^2}; \quad \text{hence} \quad \log Y = -\theta^2, \quad \text{and consequently} \quad \theta^2 = \Sigma \left[ z^2 c_n - z^4 (f_n - \tfrac{1}{2} c_n^2) + \cdots \right]. \]

In like manner, since \( y = r_1 + r_2 + \cdots + r_h = \Sigma r_n \), therefore \( y = z \Sigma k_n - \frac{1}{2} z^3 \Sigma g_n + \cdots \).

Now, the sums \( \Sigma \) include all the values of \( c_n, k_n, g_n \) from \( n = 1 \) to \( n = h \); let the mean values of those quantities, therefore, be denoted by \( c, k, g \), that is to say, let

\[ \Sigma c_n = hc, \quad \Sigma k_n = hk, \quad \Sigma g_n = hg, \]

and make also \( hf' = \Sigma (f_n - \frac{1}{2} c_n^2) \); and we have \( \theta^2 = z^2 hc - z^4 hf' + \cdots \).

By reverting the series the value of \( z \) is found in terms of \( \theta \); namely, \( z = \dfrac{\theta}{\sqrt{(hc)}} + \dfrac{f' \theta^3}{2 h c^2 \sqrt{(hc)}} + \cdots \).

But the second term of this series is divided by \( h\sqrt{h} \), while the first is divided only by \( \sqrt{h} \); \( h \) being by supposition a large number, the second term is therefore very small in comparison of the first, and may be neglected as insensible. All the succeeding terms of the series are divided by still higher powers of \( h \), and may therefore be rejected a fortiori. Confining the approximation, therefore, to terms of the order \( 1 \div \sqrt{h} \), and rejecting all those into which \( h \) or its higher powers enters as a divisor, we have \( z = \dfrac{\theta}{\sqrt{(hc)}} \), and likewise \( dz = \dfrac{d\theta}{\sqrt{(hc)}} \).

From what precedes we have also \( y = zhk - \frac{1}{2} z^3 hg + \cdots \); therefore, on substituting for \( z \) its value just found in terms of \( \theta \), \( y = \dfrac{hk\theta}{\sqrt{(hc)}} - \dfrac{g \theta^3}{2c \sqrt{(hc)}} + \cdots \).

Consequently \( y - \psi z = \dfrac{(hk - \psi)\theta}{\sqrt{(hc)}} - \dfrac{g \theta^3}{2c \sqrt{(hc)}} + \cdots \).

In order to deduce from this an expression for \( \cos(y - \psi z) \), let \( u \) and \( v \) denote any two arcs; then by trigonometry, \( \cos(u - v) = \cos u \cos v + \sin u \sin v \). Suppose \( v \) to be small, and let its cosine and sine be developed in series and substituted in this equation; it will become

\[ \cos(u - v) = \cos u + v \sin u - \frac{v^2}{2} \cos u + \cdots \]

whence, making \( u = \dfrac{(hk - \psi)\theta}{\sqrt{(hc)}} \), \( v = \dfrac{g \theta^3}{2c \sqrt{(hc)}} \), and rejecting as before terms of the order \( 1 \div h \), we have

\[ \cos(y - \psi z) = \cos \left\{ \frac{(hk - \psi)\theta}{\sqrt{(hc)}} \right\} + \frac{g \theta^3}{2c \sqrt{(hc)}} \sin \left\{ \frac{(hk - \psi)\theta}{\sqrt{(hc)}} \right\}. \]

If we now substitute the values of \( Y, z, dz \), and \( \cos(y - \psi z) \) found in the last three paragraphs in the value of \( Q \) (117), we obtain the following expression, in which the largest terms omitted are of the order \( 1 \div h \), and which therefore is the more accurate as \( h \) is a greater number, viz.

\[ Q = \frac{2}{\pi} \int_{0}^{\infty} e^{-\theta^2} \cos \left\{ \frac{(hk - \psi)\theta}{\sqrt{(hc)}} \right\} \sin \frac{\delta\theta}{\sqrt{(hc)}} \cdot \frac{d\theta}{\theta} \]

\[ + \frac{g}{\pi c \sqrt{(hc)}} \int_{0}^{\infty} e^{-\theta^2} \, \theta^2 \sin \left\{ \frac{(hk - \psi)\theta}{\sqrt{(hc)}} \right\} \sin \frac{\delta\theta}{\sqrt{(hc)}} \, d\theta. \]

120. As no restriction has yet been made with respect to the value of \( \psi \), excepting that it is a mean between \( \mu\epsilon \) and \( \nu\epsilon \), and therefore included between \( ha \) and \( hb \) (115), let us now assume \( \psi = hk \). This gives \( \cos \left\{ \dfrac{(hk - \psi)\theta}{\sqrt{(hc)}} \right\} = 1 \), and \( \sin \left\{ \dfrac{(hk - \psi)\theta}{\sqrt{(hc)}} \right\} = 0 \); and the equation becomes

\[ Q = \frac{2}{\pi} \int_{0}^{\infty} e^{-\theta^2} \sin \frac{\delta\theta}{\sqrt{(hc)}} \cdot \frac{d\theta}{\theta}, \]

which is the probability that the sum of the observed values of \( A \) will fall between \( hk - \delta \) and \( hk + \delta \).

121. The last step in this investigation is to reduce the integral now found to a known form, which may be accomplished as follows: Let \( u \) be a new variable; then by means of the trigonometrical formula \( \cos u\theta = \frac{1}{2}e^{u\theta\sqrt{-1}} + \frac{1}{2}e^{-u\theta\sqrt{-1}} \),

\[ \int e^{-\theta^2} \cos u\theta \, d\theta = \frac{1}{2} \int e^{-\theta^2} e^{u\theta\sqrt{-1}} \, d\theta + \frac{1}{2} \int e^{-\theta^2} e^{-u\theta\sqrt{-1}} \, d\theta. \]

But \( -\theta^2 + u\theta \sqrt{-1} = -\frac{1}{4}u^2 - (\theta - \frac{1}{2}u\sqrt{-1})^2 \); assume, therefore, \( v = \theta - \frac{1}{2}u\sqrt{-1} \) (whence \( dv = d\theta \)), then

\[ \frac{1}{2} \int e^{-\theta^2} e^{u\theta\sqrt{-1}} \, d\theta = \frac{1}{2} e^{-\frac{1}{4}u^2} \int e^{-v^2} \, dv. \]

When \( \theta = 0 \), then \( v = -\frac{1}{2}u\sqrt{-1} \), and when \( \theta \) is infinite, \( v \) is infinite; therefore, if the integral in respect of \( \theta \) be taken from \( \theta = 0 \) to \( \theta = \infty \), the integral in respect of \( v \) must be taken from \( v = -\frac{1}{2}u\sqrt{-1} \) to \( v = \infty \).

In like manner, for the second integral, assume \( v = \theta + \frac{1}{2}u\sqrt{-1} \); we shall then have \( \frac{1}{2} \int e^{-\theta^2} e^{-u\theta\sqrt{-1}} \, d\theta = \frac{1}{2} e^{-\frac{1}{4}u^2} \int e^{-v^2} \, dv \); the limits in this case being from \( v = +\frac{1}{2}u\sqrt{-1} \) to \( v = \infty \). Hence

\[ \int e^{-\theta^2} \cos u\theta \, d\theta = \frac{1}{2} e^{-\frac{1}{4}u^2} \left( \int_{-\frac{1}{2}u\sqrt{-1}}^{\infty} e^{-v^2} \, dv + \int_{+\frac{1}{2}u\sqrt{-1}}^{\infty} e^{-v^2} \, dv \right). \]

But the sum of the two integrals on the right-hand side of this equation, the first being taken from \( v = -\frac{1}{2}u\sqrt{-1} \) to infinity, and the second from \( v = +\frac{1}{2}u\sqrt{-1} \) to infinity, is obviously the double of \( \int e^{-v^2} \, dv \) from \( v = 0 \) to \( v = \infty \), or (96) equal to \( \sqrt{\pi} \); and we have therefore

\[ \int_{0}^{\infty} e^{-\theta^2} \cos u\theta \, d\theta = \frac{1}{2} \sqrt{\pi} \, e^{-\frac{1}{4}u^2}. \]
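This definite integral may be verified numerically; the following sketch (added for illustration, the value u = 1.7 being an arbitrary assumption) compares a trapezoidal approximation with the closed form \( \frac{1}{2}\sqrt{\pi}\,e^{-\frac{1}{4}u^2} \).

```python
# Numerical check of  ∫_0^∞ e^{-t^2} cos(ut) dt = (√π/2) e^{-u^2/4}
import numpy as np

u = 1.7                                 # arbitrary assumed value
t = np.linspace(0.0, 12.0, 1_200_001)   # e^{-t^2} is insensible beyond t = 12
f = np.exp(-t**2) * np.cos(u * t)
integral = np.sum((f[:-1] + f[1:]) / 2) * (t[1] - t[0])  # trapezoidal rule
closed = 0.5 * np.sqrt(np.pi) * np.exp(-u**2 / 4)
print(abs(integral - closed) < 1e-9)    # → True
```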

Let both sides of this equation be multiplied by \( du \), and integrated from \( u = 0 \) to \( u = \delta \div \sqrt{(hc)} \); then, observing that \( \int \cos u\theta \, du = \sin u\theta \div \theta \), we shall have

\[ \int_{0}^{\infty} e^{-\theta^2} \sin \frac{\delta\theta}{\sqrt{(hc)}} \cdot \frac{d\theta}{\theta} = \frac{1}{2} \sqrt{\pi} \int_{0}^{\delta \div \sqrt{(hc)}} e^{-\frac{1}{4}u^2} \, du. \]

Comparing this equation with that in (120), we find

\[ Q = \frac{1}{\sqrt{\pi}} \int_{0}^{\delta \div \sqrt{(hc)}} e^{-\frac{1}{4}u^2} \, du. \]

Now, let \( u = 2t \), and let \( r \) be what \( t \) becomes when \( u = \delta \div \sqrt{(hc)} \); then \( \frac{1}{4}u^2 = t^2 \), \( du = 2dt \),

\[ \frac{\delta}{\sqrt{(hc)}} = 2r, \quad \text{or} \quad \delta = 2r\sqrt{(hc)}, \]

and we have finally,

\[ Q = \frac{2}{\sqrt{\pi}} \int_{0}^{r} e^{-t^2} \, dt, \quad \text{or,} \quad Q = 1 - \frac{2}{\sqrt{\pi}} \int_{r}^{\infty} e^{-t^2} \, dt, \]

for the probability that \( s \), the sum of the observed values of \( A \), will be comprised between the limits \( \psi - \delta \) and \( \psi + \delta \), that is, between \( hk - 2r\sqrt{(hc)} \) and \( hk + 2r\sqrt{(hc)} \); or, that the arithmetical mean of all the observations, namely \( s \div h \), will lie between \( k - 2r\sqrt{(c \div h)} \) and \( k + 2r\sqrt{(c \div h)} \).

122. The expression now found for \( Q \) is that which in (96) was denoted by \( \Theta \), and of which the table gives the values corresponding to the different values of \( r \). The general result of the investigation is, therefore, that whatever be the nature of the functions \( \phi_n x \) which represent the laws of facility of the different values of \( A \), if a large number of observations be made, the sum of the values of \( A \), divided by the number of observations, approaches continually to a certain special quantity \( k \) (which is the true mean value of \( A \)) as the number of observations is increased; and that by multiplying the number of observations, a probability \( \Theta \) may always be obtained, approaching as nearly to certainty as we please, that the difference between the arithmetical mean or average of the observations and the true mean value of \( A \) will be comprised within limits which may be made as small as we please.
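This result admits of a Monte-Carlo illustration (an addition in modern terms; the uniform law on (0, 1), h = 600, and r = 1 are assumptions made for the example): the proportion of trials in which the average falls within \( k \pm 2r\sqrt{(c \div h)} \) should approach the tabulated \( \Theta \), which in modern notation is the error function erf(r).

```python
# Monte-Carlo illustration of arts. 121-122 for an assumed uniform law on (0,1)
import math, random

random.seed(1)
a, b, h, r = 0.0, 1.0, 600, 1.0
k = (a + b) / 2                    # true mean value of A (art. 127)
c = (b - a) ** 2 / 24              # c = (k' - k^2)/2 for the uniform law
lim = 2 * r * math.sqrt(c / h)     # half-width of the limits about k

trials = 5000
hits = sum(
    abs(sum(random.uniform(a, b) for _ in range(h)) / h - k) < lim
    for _ in range(trials)
)
print(round(hits / trials, 2), round(math.erf(r), 2))  # the two nearly agree
```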

The analysis employed in the preceding articles (113 to 121), for the purpose of establishing this very important result, belongs to Poisson, and is given in nearly the same form in the *Recherches sur la Probabilité des Jugements*, chap. iv., and in the Additions to the *Connaissance des Tems* for 1832. We have preferred it to the method followed by Laplace in the *Théorie Analytique*, as being somewhat simpler and also more general.

123. In order that the limits \( 2r\sqrt{(hc)} \) may be real, it is necessary that the special quantity \( c \) be positive, a condition which has hitherto been assumed. Now, since \( hc = \Sigma c_n \), it is obvious that \( c \) will be positive if \( c_n = \frac{1}{2}(k'_n - k_n^2) \) be positive. On writing for \( k'_n \) and \( k_n \) their values (119), we have

\[ 2c_n = \int x^2 \phi_n x \, dx - \left( \int x \phi_n x \, dx \right)^2, \]

the limits of the integrals being always from \( x = a \) to \( x = b \).

But it is evident that no change will be made in the values of these definite integrals (the limits continuing the same) by substituting in them another variable, as \( x' \), which, like \( x \), represents the possible values of \( A \). We have therefore \( \int x \phi_n x \, dx = \int x' \phi_n x' \, dx' \); and since in all cases \( \int \phi_n x' \, dx' = 1 \), the above equation may be otherwise written

\[ 2c_n = \int x^2 \phi_n x \, dx \cdot \int \phi_n x' \, dx' - \int x \phi_n x \, dx \cdot \int x' \phi_n x' \, dx', \]

whence \( 2c_n = \iint \phi_n x \cdot \phi_n x' \, (x^2 - x x') \, dx \, dx' \),

or, interchanging \( x \) and \( x' \), \( 2c_n = \iint \phi_n x \cdot \phi_n x' \, (x'^2 - x x') \, dx \, dx' \).

Adding together the two last equations, there results

\[ 4c_n = \iint \phi_n x \cdot \phi_n x' \, (x - x')^2 \, dx \, dx', \]

a quantity which is necessarily positive, and can never be zero so long as \( x \) can have different values.

124. The special quantity \( k \), to which the average of the values of \( A \) continually approaches, is connected with the centre of gravity of the area of a curve by the following relation. Let \( x \) and \( y \) be the co-ordinates of a curve, of which the equation is \( y = \phi_n x \); then the element of the area is \( \phi_n x \, dx \).

But (116) \( \phi_n x \, dx \) is the infinitely small probability that the value of \( A \) in the \( n \)th observation will lie between \( x \) and \( x + dx \); therefore the element of the area of the curve represents this probability, and the curve itself represents the law of the probability of the different values of \( A \) in respect of the \( n \)th trial. In like manner, the curve whose co-ordinates are \( x \) and \( (1 \div h) \Sigma \phi_n x \) represents the law of the mean probability of \( A \) in respect of the whole series of observations. Now, if \( x_1 \) be the absciss of the centre of gravity of any curve whose co-ordinates are \( x \) and \( y \), the well-known formula of mechanics gives \( x_1 = \int xy \, dx \div \int y \, dx \); therefore, applying this formula to the curve of the mean probability, whose whole area \( (1 \div h) \Sigma \int \phi_n x \, dx \), from \( x = a \) to \( x = b \), is 1, the absciss of the centre of gravity is \( x_1 = (1 \div h) \Sigma \int x \phi_n x \, dx \).

But this is the quantity denoted by \( k \) (119); hence the special quantity, to which the average of a large number of observations indefinitely approaches, is the absciss of the centre of gravity of the area of the curve which represents the law of the mean chances of \( A \).

125. It has been assumed in the foregoing analysis that \( A \) is susceptible of an infinite number of values, increasing continuously from \( a \) to \( b \). The results, however, are easily adapted to those cases in which the number of possible values of \( A \) is finite. Suppose \( A \) to be a thing susceptible of only \( \lambda \) different values, represented by \( a_1, a_2, \ldots, a_\lambda \), and let the chances of these values, which may be different in the different trials, be respectively \( \gamma_1, \gamma_2, \ldots, \gamma_\lambda \), in respect of the \( n \)th trial. Now, suppose \( \phi_n x \) to be a discontinuous function, which vanishes for all values of \( x \) of which the difference from one or other of the above values of \( A \) exceeds an infinitely small quantity \( \epsilon \); then the whole integral \( \int \phi_n x \, dx \) from \( x = a \) to \( x = b \) will be made up of a series of \( \lambda \) partial integrals \( \int \phi_n x \, dx \) taken between the limits \( a_i - \epsilon \) and \( a_i + \epsilon \), the sum of which will be unity, since one or other of the values of \( A \) must necessarily be given by the trial. But the integral \( \int \phi_n x \, dx \) between the limits \( a_i - \epsilon \) and \( a_i + \epsilon \) is the expression of the chance that the value of \( A \) given in the \( n \)th trial will be \( a_i \); whence for those limits \( \int \phi_n x \, dx = \gamma_i \).

Now the difference \( x - a_i \) is infinitely small, since it cannot exceed \( \epsilon \); we may therefore substitute \( a_i \) for \( x \), and \( a_i^2 \) for \( x^2 \), under the sign of integration, when the limits are \( a_i - \epsilon \) and \( a_i + \epsilon \); so that for those limits we have \( \int x \phi_n x \, dx = a_i \int \phi_n x \, dx = \gamma_i a_i \).

On writing for \( i \) all the different numbers \( 1, 2, 3, \ldots, \lambda \), and observing that the \( \lambda \) partial integrals thus formed make up the whole integral \( \int x \phi_n x \, dx \) from \( x = a \) to \( x = b \), and that therefore their sum is \( k_n \) (119), we have, in respect of the \( n \)th trial,

\[ k_n = \gamma_1 a_1 + \gamma_2 a_2 + \gamma_3 a_3 + \cdots + \gamma_\lambda a_\lambda. \]

In like manner, for \( k'_n = \int x^2 \phi_n x \, dx \) (from \( a \) to \( b \)), we have

\[ k'_n = \gamma_1 a_1^2 + \gamma_2 a_2^2 + \gamma_3 a_3^2 + \cdots + \gamma_\lambda a_\lambda^2; \]

so that the two special quantities \( k \) and \( k' \), the mean values of \( k_n \) and \( k'_n \), become

\[ k = (1 \div h) \Sigma (\gamma_1 a_1 + \gamma_2 a_2 + \gamma_3 a_3 + \cdots + \gamma_\lambda a_\lambda), \]

\[ k' = (1 \div h) \Sigma (\gamma_1 a_1^2 + \gamma_2 a_2^2 + \gamma_3 a_3^2 + \cdots + \gamma_\lambda a_\lambda^2), \]

the sums extending to all the \( h \) values of \( n \), or to all the trials, the chances denoted by \( \gamma_1, \gamma_2, \ldots, \gamma_\lambda \) being supposed to vary in the different trials.

126. When the chances of the different values of \( A \) are equal and constant, then \( \gamma_i = 1 \div \lambda \), and the above values of \( k \) and \( k' \) become

\[ k = (1 \div \lambda)(a_1 + a_2 + a_3 + \cdots + a_\lambda), \]

\[ k' = (1 \div \lambda)(a_1^2 + a_2^2 + a_3^2 + \cdots + a_\lambda^2), \]

so that \( k \) is the arithmetical mean of the possible values of \( A \), and \( k' \) the mean of the squares of those values. On this hypothesis, therefore, \( k \) and \( k' \) may be computed *a priori*, and consequently the limits determined within which there is a given probability \( \Theta \) that the average of \( h \) observations will fall, the limits being \( k \pm 2r\sqrt{(c \div h)} \), where \( c = \frac{1}{2}(k' - k^2) \).

When the chances of the different values of \( A \) are unequal, but constant in the different trials, then \( k = k_n \) and \( k' = k'_n \), and we have

\[ k = \gamma_1 a_1 + \gamma_2 a_2 + \gamma_3 a_3 + \cdots + \gamma_\lambda a_\lambda, \]

\[ k' = \gamma_1 a_1^2 + \gamma_2 a_2^2 + \gamma_3 a_3^2 + \cdots + \gamma_\lambda a_\lambda^2. \]

In this case the special quantity \( k \) to which the average of the observed values continually approaches, is the sum of the possible values, each multiplied into its respective probability; and \( k' \) is the sum of the products of the squares of those values into their respective probabilities.
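These two sums are, in modern terms, the expectation of the possible values and the expectation of their squares. A small sketch (added here; the values and chances are assumed purely for illustration) computes \( k \), \( k' \), and \( c = \frac{1}{2}(k' - k^2) \) exactly:

```python
# Exact computation of k, k', and c for an assumed discrete law (art. 126)
from fractions import Fraction as F

values  = [1, 2, 3, 4]                          # assumed possible values of A
chances = [F(1, 2), F(1, 4), F(1, 8), F(1, 8)]  # assumed chances, summing to 1

k  = sum(g * a for g, a in zip(chances, values))       # Σ γ_i a_i
k2 = sum(g * a * a for g, a in zip(chances, values))   # Σ γ_i a_i^2
c  = (k2 - k * k) / 2
print(k, k2, c)   # → 15/8 37/8 71/128
```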

127. Resuming the consideration of the general formula in (121), we shall now give an example of its application when the function which represents the law of facility of the different values of \( A \) is supposed to be known *a priori*.

Of all the hypotheses which may be made respecting the law of facility, the simplest is that which supposes the chances of all the possible values of the thing observed to be equal, and to remain constant during the series of trials. This supposes \( \phi_x = \text{constant}; \) whence \( \int \phi_x dx \) between the limits \( x = a \) and \( x = b \), becomes \( (b-a) \phi_x \). But between those limits we have also \( \int \phi_x dx = 1 \); therefore \( \phi_x = \frac{1}{b-a} \).

From this value of \( \phi_x \) it is easy to deduce the special quantities \( k \) and \( k' \). On the present hypothesis \( k = k_n \) and \( k' = k'_n \); therefore, the limits of the integrals being \( x = a \) and \( x = b \), we have

\[ k = \int x \phi_x \, dx = \int \frac{x \, dx}{b-a} = \frac{b^2 - a^2}{2(b-a)} = \frac{1}{2}(b+a). \]

In like manner \( k' = \int x^2 \phi_x \, dx \) becomes \( \int \dfrac{x^2 \, dx}{b-a} = \dfrac{b^3 - a^3}{3(b-a)} = \frac{1}{3}(b^2 + ba + a^2) \); whence \( c = \frac{1}{2}(k' - k^2) = \frac{1}{2} \left\{ \frac{1}{3}(b^2 + ba + a^2) - \frac{1}{4}(b+a)^2 \right\} = \dfrac{(b-a)^2}{24} \). Hence by (121) we have the probability \( \Theta \) that the average value of \( A \) given by \( h \) observations, or the sum of the values of \( A \) divided by their number, will lie between the limits

\[ \frac{1}{2}(b+a) \pm 2r \sqrt{\frac{(b-a)^2}{24h}}, \quad \text{that is,} \quad \frac{1}{2}(b+a) \pm \frac{r(b-a)}{\sqrt{6h}}. \]
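The reductions of this article may be checked exactly; in the following sketch (added, the limits a = 1 and b = 7 being assumed merely for the arithmetic) the identity \( \frac{1}{2}(k' - k^2) = (b-a)^2 \div 24 \) is verified:

```python
# Exact check of the uniform case of art. 127
from fractions import Fraction as F

a, b = F(1), F(7)                  # assumed limits of the possible values
k  = (a + b) / 2                   # mean value
k2 = (b**2 + a*b + a**2) / 3       # mean of the squares
c  = (k2 - k**2) / 2
print(k, k2, c, c == (b - a)**2 / 24)   # → 4 19 3/2 True
```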

128. This formula may be applied to the following question. Of the comets which have been observed since the year 240 of our era, the parabolic elements of 138 have been computed, and the mean inclination of their orbits to the ecliptic is found to be 48° 45′. Now, supposing every possible inclination of an orbit to be equally probable, let the probability be demanded that the mean inclination of 138 orbits will not differ from 45° (the mean of the possible inclinations) more than 5° in excess or defect.

In this case the limits of the possible values of the phenomenon are 0 and 90°. We have therefore \( a = 0 \), \( b = 90° \), \( h = 138 \), and the above limits of the error of the average become \( 45° \pm r \times 90° \div \sqrt{(6 \times 138)} \). In order that the limits may not exceed 5°, we have to determine \( r \) from the equation \( r \times 90° \div \sqrt{(6 \times 138)} = 5° \), which gives \( r = \frac{1}{3}\sqrt{23} \); whence \( r = 1.6 \) very nearly. The tabular value of \( \Theta \) corresponding to \( r = 1.6 \) is .97635, or nearly \( \frac{41}{42} \); and the odds are therefore 41 to 1 that, on the supposition of all inclinations being equally probable, the mean inclination of 138 comets would fall between \( 45° \pm 5° \), that is, between 40° and 50°. The mean of the inclinations actually computed falls within those limits (being 48° 45′); there is therefore a very great probability that, whatever may be the nature of the unknown causes which determine the positions of the cometary orbits, it is not such as to render different inclinations unequally probable.
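The arithmetic of this example may be retraced in a short sketch (added here; it merely repeats the numbers of the text, the tabulated \( \Theta \) being the error function erf in modern notation):

```python
# Re-computation of the comet example: a = 0, b = 90 deg, h = 138
import math

width, h, excess = 90.0, 138, 5.0
r = excess * math.sqrt(6 * h) / width    # r such that r*90/sqrt(6h) = 5 deg
theta = math.erf(r)                      # tabulated Θ of art. 96
odds = theta / (1 - theta)
print(round(r, 2), round(theta, 4), round(odds))
```

The printed values reproduce r ≈ 1.6, Θ ≈ .976, and odds of about 41 to 1.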

If the question had been to assign the limits within which it is as probable that the mean of the inclinations will fall as not, we should have had \( \Theta = \frac{1}{2} \), and consequently (from the table) \( r = .476936 \); and the limits would have been \( 45° \pm .476936 \times 90° \div \sqrt{(6 \times 138)} \), which is found on calculation to be \( 45° \pm 1° 30′ \). On the supposition, therefore, that all inclinations are equally probable, it is one to one that the mean of 138 inclinations will fall between 43° 30′ and 46° 30′, or at least not exceed these limits.

129. On the same hypothesis of an equal probability of all possible values, if we suppose the mean value of \( A \) to be 0, we have then \( a = -b \), and \( \phi x \) becomes \( 1 \div 2b \), whence the limits corresponding to a given value of \( \Theta \) (127) become \( 0 \pm 2rb \div \sqrt{(6h)} \). Let \( \Theta = \frac{1}{2} \), whence \( r = .476936 \), and suppose \( h = 600 \). With these values the limits become \( 0 \pm .016b \) nearly; that is to say, it is an even wager that the average of 600 observations will not differ from the true mean value of \( A \) more than sixteen thousandth parts of \( b \), which is the greatest possible difference.

130. As a second hypothesis, suppose the chance of a given value of \( A \) to decrease uniformly as its magnitude increases from 0 to \( \pm a \); then \( \phi x \) will be found as follows: Let \( B \) be the chance of the value 0; we have then, by the hypothesis, \( \phi x = B(a - x) \div a \), whence \( \int \phi x \, dx = B(ax - \frac{1}{2}x^2) \div a \), which, from \( x = 0 \) to \( x = +a \), becomes \( \frac{1}{2}Ba \). But \( \int \phi x \, dx \) from \( x = -a \) to \( x = +a \) is 1 (errors beyond those limits being supposed impossible); therefore, from \( x = 0 \) to \( x = +a \), \( \int \phi x \, dx = \frac{1}{2} \), and consequently \( \frac{1}{2}Ba = \frac{1}{2} \), or \( B = 1 \div a \). Hence \( \phi x = (a - x) \div a^2 \), from which the value of \( c \) is easily deduced, that of \( k \) being 0, as in the former case.
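A numerical sketch (added; a = 2 is an assumed value) verifies that this triangular law has the whole probability 1 and the mean value k = 0, and that its mean square is a² ÷ 6, whence c = ½(k′ − k²) = a² ÷ 12:

```python
# Check of the triangular law phi(x) = (a - |x|)/a^2 on (-a, +a)  (art. 130)
import numpy as np

a = 2.0                                   # assumed limit of the errors
x = np.linspace(-a, a, 400_001)
phi = (a - np.abs(x)) / a**2
dx = x[1] - x[0]

def integral(f):
    return np.sum((f[:-1] + f[1:]) / 2) * dx   # trapezoidal rule

total = integral(phi)          # whole probability, = 1
k  = integral(x * phi)         # mean value, = 0
k2 = integral(x**2 * phi)      # mean square, = a^2/6, whence c = a^2/12
print(round(total, 6), abs(k) < 1e-9, round(6 * k2, 4))
```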

131. Although the function \( \phi x \) which represents the law of facility of the different values of \( A \) is in general unknown, its form may be assigned if we assume that it is subject to certain conditions, which, from the nature of the thing, must be very nearly, if not absolutely true, in most practical cases: 1st, That the chance of an error diminishes as the magnitude of the error increases, and for errors beyond a certain limit vanishes altogether; and, 2d, that positive and negative errors, of equal magnitude, are equally probable. The last condition is equivalent to the assumption that the average of the observed values is the true mean value. For simplification, we suppose the chance of an error of a given magnitude to remain constant in all the trials.

132. Let \( x, x', x'' \), &c., be a series of values of \( A \), the sum of which is \( s \), and the number \( h \), and make \( m = s \div h \); then \( m \) is the arithmetical mean or average, which by hypothesis is the true value of the phenomenon \( A \). Let \( x - m = \Delta \), \( x' - m = \Delta' \), \( x'' - m = \Delta'' \), &c., so that \( \Delta, \Delta', \Delta'' \), &c., are the errors of \( x, x', x'' \), &c. Now, the most probable single error is 0; and the probability of obtaining an error of a given magnitude \( \Delta \) in any observation is obviously the same as that of obtaining the corresponding value \( x = m + \Delta \); let it be denoted by \( \phi \Delta \), so that \( \phi \Delta \) is the probability of a single error being exactly \( \Delta \). In like manner, the probability of an error \( \Delta' \) is \( \phi \Delta' \); and if we take \( P \) to denote the probability of a given system of errors, \( \Delta, \Delta', \Delta'' \), &c., then, the errors being supposed independent of each other, we have

\[ P = \phi \Delta \cdot \phi \Delta' \cdot \phi \Delta'' \cdot \text{&c.} \]

Let this system be assumed to be the most probable result of the observations; then \( P \) is a maximum, and its differential co-efficient with respect to \( m \) is zero. Taking the logarithms of both sides of the equation, differentiating with respect to \( m \), making \( d \log \phi \Delta \div d\Delta = \phi' \Delta \), and observing that each of the differentials \( d\Delta \div dm \) is \( -1 \), we obtain

\[ 0 = \phi' \Delta + \phi' \Delta' + \phi' \Delta'' + \text{&c.}, \]

an equation which may be otherwise written

\[ 0 = \frac{\phi' \Delta}{\Delta} \Delta + \frac{\phi' \Delta'}{\Delta'} \Delta' + \frac{\phi' \Delta''}{\Delta''} \Delta'' + \text{&c.} \]

This is the conditional equation of the most probable system of errors. But the hypothesis of the average being the true value furnishes this other equation,

\[ 0 = (x - m) + (x' - m) + (x'' - m) + \text{&c.}, \]

or, which is the same, \( 0 = \Delta + \Delta' + \Delta'' + \text{&c.} \); and on comparing this with the above conditional equation, it is evident that they can only be both true simultaneously, whatever the errors may be, on the supposition of \( \frac{\phi' \Delta}{\Delta} = \frac{\phi' \Delta'}{\Delta'} = \frac{\phi' \Delta''}{\Delta''} = \text{&c.} \) Hence it follows that \( \phi' \Delta \div \Delta \) is independent of any particular value of \( \Delta \), or is equal to a constant, which we shall call \( K \). We have then

\[ \frac{\phi' \Delta}{\Delta} = \frac{d \log \phi \Delta}{\Delta \, d\Delta} = K. \]

The integral of this expression is \( \log \phi \Delta = \frac{1}{2}K\Delta^2 + \text{const.} \), which, making the last constant \( = \log H \), and passing to numbers, gives \( \phi \Delta = He^{\frac{1}{2}K\Delta^2} \). It now only remains to determine the two constants \( H \) and \( K \). With respect to \( K \), as we suppose the most probable value of \( \Delta \) to be 0, and that \( \phi \Delta \) diminishes as \( \Delta \) increases, it is obvious that \( K \) must be negative. Assume \( \frac{1}{2}K = -\gamma \), and the formula becomes \( \phi \Delta = He^{-\gamma \Delta^2} \). For the determination of \( H \) we have the equation \( \int \phi \Delta \, d\Delta = 1 \), the limits of the integral being \( -a' \) and \( +a' \), where \( a' = \frac{1}{2}(b-a) \), \( a \) and \( b \) being the limiting values of \( x \). But it is to be observed, that as all values of \( \Delta \) exceeding the limits \( \pm a' \) are supposed to be impossible, or at least to be so improbable that it is unnecessary to take account of them, the value of the integral \( \int \phi \Delta \, d\Delta \) from \( \Delta = -a' \) to \( \Delta = +a' \) will not be sensibly altered by extending the limits to \( -\infty \) and \( +\infty \). We have therefore

\[ \int_{-\infty}^{+\infty} \phi \Delta d\Delta = 1. \]

Let \( t = \Delta\sqrt{\gamma} \), then \( d\Delta = dt \div \sqrt{\gamma} \), and \( \phi \Delta = He^{-\gamma \Delta^2} = He^{-t^2} \); on substituting which in the last equation, and observing that from \( t = -\infty \) to \( t = +\infty \) we have \( \int e^{-t^2} dt = \sqrt{\pi} \) (96), we find \( (H \div \sqrt{\gamma}) \sqrt{\pi} = 1 \), and \( H = \sqrt{(\gamma \div \pi)} \). Whence, finally, \( \phi \Delta = \sqrt{(\gamma \div \pi)}\, e^{-\gamma \Delta^2} \).
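In modern terms the result just obtained is the normal law of error. Its normalisation may be checked numerically; the following sketch (Python, with an arbitrary value of \( \gamma \)) integrates \( \phi\Delta \) by the trapezoidal rule over a range wide enough that the neglected tails are insensible:

```python
import math

def phi(delta, gamma):
    # The law of facility found above: phi(Delta) = sqrt(gamma/pi) * exp(-gamma * Delta^2)
    return math.sqrt(gamma / math.pi) * math.exp(-gamma * delta * delta)

def integrate(f, lo, hi, n=200000):
    # plain trapezoidal rule over [lo, hi]
    h = (hi - lo) / n
    s = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        s += f(lo + i * h)
    return s * h

gamma = 2.0
total = integrate(lambda d: phi(d, gamma), -10.0, 10.0)
print(round(total, 6))  # the integral over all Delta is 1; the tails beyond +-10 are insensible
```

The value of \( \gamma \) here is arbitrary: whatever its value, the area under the curve is unity.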

133. The general properties of the function now found may be illustrated by means of a curve line. Let a curve be described of which \(\phi\Delta\) is the ordinate corresponding to the absciss \(\Delta\). Let \(AB\) be its axis, and \(MN\) its greatest ordinate. Suppose the origin to be placed at \(M\); draw \(PQ\), an ordinate at any point \(P\), and \(pq\) indefinitely near to \(PQ\), and make \(MB = \alpha\), \(MA = -\alpha\), \(MP = \Delta\), and \(PQ = \phi\Delta\); then, as was shown in (124), \(PQqp\) represents the element of the area \(\phi\Delta\, d\Delta\), that is, the chance of an error lying between \(\Delta\) and \(\Delta + d\Delta\), or of an error greater than \(MP\) but less than \(Mp\). Now, if \(\phi\Delta = \sqrt{(\gamma \div \pi)}e^{-\gamma\Delta^2}\), the function will not be changed by changing \(\Delta\) into \(-\Delta\); therefore \(\phi\Delta = \phi(-\Delta)\), and the curve is symmetrical on both sides of \(MN\), as it obviously ought to be according to the hypothesis; for on making \(MP' = MP\) on the other side of \(M\), then, positive and negative errors of equal magnitude being equally probable, we must have \(P'Q' = PQ\). Again, since \(e^{-\gamma\Delta^2}\) diminishes rapidly as \(\Delta\) increases, the curve at a short distance from \(MN\) must approach very near to its axis \(AB\); but as the function only vanishes when \(\Delta\) is infinite, the curve will not meet the axis at any finite distance from \(MN\). This curve, therefore, can only represent approximately the law of facility, inasmuch as it is supposed that errors beyond a certain limit are impossible; but on account of the rapid diminution of the ordinate at a short distance from \(MN\), the chance of an error exceeding even a small value of \(\Delta\), as \(MB\), becomes insensible. Hence the limits of the integrals in respect of \(\Delta\) may be extended from \(\pm\alpha\) to \(\pm\infty\) without sensibly altering their values.

134. It is now necessary to find the special quantities \(k\), \(k'\), and \(c\). Substituting \(\Delta\) for \(x\), and observing that, as the law of the chances is here supposed to remain the same in all the trials, the formulæ in (119) become \(k = \int \Delta\,\phi\Delta\, d\Delta\), \(k' = \int \Delta^2\,\phi\Delta\, d\Delta\). Hence, on making \(\phi\Delta = \sqrt{(\gamma \div \pi)}e^{-\gamma\Delta^2}\), we have

\[ k = \sqrt{(\gamma \div \pi)} \int \Delta e^{-\gamma\Delta^2}\,d\Delta = -\frac{1}{2\gamma}\sqrt{(\gamma \div \pi)}\, e^{-\gamma\Delta^2}. \]

When \(\Delta\) becomes infinite, this becomes 0, therefore from \(\Delta = -\infty\) to \(\Delta = +\infty\), \(k = 0\). This is an obvious consequence of the symmetry of the curve, for the centre of gravity is necessarily in the straight line \(MN\).

With respect to \(k'\) we may proceed thus. We have

\[ k' = \int \Delta^2\, \phi\Delta\, d\Delta = \sqrt{(\gamma \div \pi)} \int \Delta^2 e^{-\gamma\Delta^2}\,d\Delta. \]

But from the principles of the differential calculus,

\[ \int \Delta^2 e^{-\gamma\Delta^2}\,d\Delta = -\frac{\Delta}{2\gamma} e^{-\gamma\Delta^2} + \frac{1}{2\gamma} \int e^{-\gamma\Delta^2}\,d\Delta. \]

Now, from \(\Delta = -\infty\) to \(\Delta = +\infty\), the term of this equation which is not under the sign of integration vanishes, and

\[ \int e^{-\gamma\Delta^2}\,d\Delta = \sqrt{(\pi \div \gamma)} \ (\text{from (96), on substituting } t^2 \text{ for } \gamma\Delta^2); \text{ therefore } \int \Delta^2 e^{-\gamma\Delta^2}\,d\Delta = \frac{1}{2\gamma}\sqrt{(\pi \div \gamma)}; \text{ and consequently } k' = 1 \div 2\gamma. \]

In (119) we assumed \( c = \frac{1}{2}(k' - k^2) \); therefore in the present case \( c = \frac{1}{2}k' \), whence \( c = 1 \div 4\gamma \), or \( \gamma = 1 \div 4c \).
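The values \( k = 0 \), \( k' = 1 \div 2\gamma \), and \( c = 1 \div 4\gamma \) may likewise be verified numerically; a sketch, with an arbitrary value of \( \gamma \):

```python
import math

gamma = 1.5

def phi(d):
    # the law of facility with modulus gamma
    return math.sqrt(gamma / math.pi) * math.exp(-gamma * d * d)

def moment(p, lo=-12.0, hi=12.0, n=100000):
    # trapezoidal estimate of the p-th moment  integral of Delta^p * phi(Delta) dDelta
    h = (hi - lo) / n
    s = 0.5 * ((lo ** p) * phi(lo) + (hi ** p) * phi(hi))
    for i in range(1, n):
        d = lo + i * h
        s += (d ** p) * phi(d)
    return s * h

k = moment(1)    # first moment: 0 by symmetry
k2 = moment(2)   # second moment k' = 1/(2*gamma)
c = 0.5 * k2     # c = k'/2 = 1/(4*gamma)
print(abs(k) < 1e-9, abs(k2 - 1 / (2 * gamma)) < 1e-6, abs(c - 1 / (4 * gamma)) < 1e-6)
```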

135. The expressions which have now been found for the function which represents the probability of an error, and the limits corresponding to an assigned degree of probability, are given in terms of the indeterminate constant \(\gamma\) (or \(c\)), which depends on the nature of the observation, and therefore, where instruments are requisite, on the goodness of the instrument and the skill of the observer. This constant is called by Laplace the modulus of the law of facility.

It cannot, in general, be assigned a priori; but if we assume that positive and negative departures from the mean are alike probable, which is the most plausible hypothesis the nature of the thing admits of, an approximation to its value, in respect of observations of a given kind, may be deduced with great probability from the results of a large series of observations of the same kind already made. We now proceed to give the analysis by which this is accomplished, following the method of Poisson. The approximation is carried only to quantities of the order \( 1 \div \sqrt{h} \); terms having \( h \) for a divisor are neglected on account of their smallness, \( h \) being supposed a large number.

136. In the expression for \(Q\) in (119), suppose the law of facility to be the same in all the trials, so that \( k' = k'' = \text{&c.} = k \), \( c' = c'' = \text{&c.} = c \), and write also \( b \) for \( \frac{1}{2}s \); the equation then becomes

\[ Q = \frac{2}{\pi} \int_0^{\infty} e^{-hcz^2} \cos(hkz - bz) \sin bz \cdot \frac{dz}{z} \]

\[ + \frac{2g}{\pi \sqrt{(hc)}} \int_0^{\infty} e^{-hcz^2} \sin(hkz - bz) \sin bz \cdot z^2\, dz, \]

and \(Q\) is the probability that \(s\), the sum of the values of \(A\) given by all the observations, will lie between 0 and \(2b\). If, therefore, we suppose \(b\) to be variable, the differential of this expression taken with respect to \(b\) will express the infinitely small chance of the sum of the values being \(2b\) exactly. Differentiating, and observing that if \(u\) and \(v\) denote any two arcs, the trigonometrical formulæ give

\[ \sin(u - v) \sin v + \cos(u - v) \cos v = \cos(2v - u), \]

\[ \sin(u - v) \cos v - \cos(u - v) \sin v = -\sin(2v - u), \]

we shall find

\[ \frac{dQ}{db} = \frac{2}{\pi} \int_0^{\infty} e^{-hcz^2} \cos(2bz - hkz)\, dz \]

\[ - \frac{2g}{\pi \sqrt{(hc)}} \int_0^{\infty} e^{-hcz^2} \sin(2bz - hkz)\, z^3\, dz. \]

Let \(t\) be a variable quantity, and assume \(2b = hk + 2t\sqrt{(hc)}\), whence \(db = dt\sqrt{(hc)}\); and let the corresponding value of \(\frac{dQ}{db}\,db\) be denoted by \(q\,dt\); we shall have, on substituting these values, and replacing \(z\) by \(z \div \sqrt{(hc)}\), \( q\,dt = \frac{2\,dt}{\pi} \int_0^{\infty} e^{-z^2} \cos 2tz \, dz - \frac{2g\,dt}{\pi \sqrt{(hc)}} \int_0^{\infty} e^{-z^2} \sin 2tz \cdot z^3\, dz. \)

The two integrals in this equation are found from the formula in (121). Writing \(2t\) for \(u\), that formula gives

\[ \int_0^{\infty} e^{-z^2} \cos 2tz \, dz = \frac{\sqrt{\pi}}{2} e^{-t^2}; \]

and if this last equation be differentiated in respect of \(t\), three times in succession, the result will be

\[ \int_0^{\infty} e^{-z^2} \sin 2tz \cdot z^3\, dz = \frac{\sqrt{\pi}}{4} (3t - 2t^3)e^{-t^2}; \]

whence, if we make \( V = \frac{g}{2c\sqrt{(hc)}} (3t - 2t^3) \), we shall have

\[ q\,dt = (1 \div \sqrt{\pi})(1 - V)e^{-t^2}\,dt, \]

where \(V\) is a quantity containing only uneven powers of \(t\), and of the order \(1 \div \sqrt{h}\), so that when multiplied by another of the same order, the product will be of the order \(1 \div h\), and will therefore be rejected in the present approximation. This value of \(q\,dt\) is the probability that \(s\) will be precisely \(2b\), or \(hk + 2t\sqrt{(hc)}\); or it is the infinitely small probability of the equation

\[ s = hk + 2t\sqrt{(hc)}. \]

137. In order to apply this result to the determination of the probable limits in terms of observations actually made, it is necessary to remark that the analysis by means of which it has been obtained is grounded on the very general supposition that the thing to be measured may be any function whatever of the quantity observed; for the infinitely small chance of a particular value of the function is evidently the same as that of the corresponding value of the quantity, and is consequently \( \phi x\, dx \). Let \( X \) therefore be a function of \( x \), and let \( K, C, T \) be what \( k, c, t \) become when \( X \) is substituted for \( x \); the above equation then becomes

\[ \Sigma X = hK + 2T\sqrt{(hC)}, \]

the symbol \( \Sigma \) including all the \( h \) values of \( X \); and the probability of this equation is an expression of the same form as that which is represented by \( q\,dt \).

138. Hitherto no restriction has been made with respect to \( \phi x \); we now introduce the hypothesis that positive and negative departures from the mean of equal magnitude are equally probable, and consequently that the curve representing the law of facility is symmetrical, but shall suppose the chances of a particular value, or a particular error, to vary in the different trials. Let the origin be transferred to the centre of gravity, the absciss of which is \( k \), and let \( x - k = \Delta \), \( x' - k = \Delta' \), &c. We have then, by (132), \( \phi(-\Delta) = \phi\Delta \), and

\[ \int \Delta\, \phi\Delta\, d\Delta = 0, \quad \int \phi\Delta\, d\Delta = 1, \]

the integration in respect of \( \Delta \) being from \( -\infty \) to \( +\infty \). The special quantities \( k \) and \( k' \) then become \( k = 0 \), \( k' = \int \Delta^2 \phi\Delta\, d\Delta = 2c \); hence, applying the equation of (137) to the function \( X = (x - k)^2 \), we find

\[ c = \frac{1}{2h}\Sigma (\lambda_a - k)^2 - t'U', \]

(1)

(\( U' \) being a quantity of the order \( 1 \div \sqrt{h} \)); and the probability of this equation is

\[ q'dt' = (1 \div \sqrt{\pi})(1 - V')e^{-t'^2}\,dt', \]

where \( V' \) is a function containing only uneven powers of \( t' \), and of the order \( 1 \div \sqrt{h} \).

139. In the equation of (137) suppose \( X = x = \lambda_a \), and let \( t'' \) and \( c'' \) be the corresponding values of \( T \) and \( C \); then, since on this supposition \( K = k \), the equation becomes \( \Sigma\lambda_a = hk + 2t''\sqrt{(hc'')} \), whence

\[ k = \frac{1}{h}\Sigma \lambda_a - t''U'', \]

(2)

(\( U'' \) being of the order \( 1 \div \sqrt{h} \)); and the probability of this equation is

\[ q''dt'' = (1 \div \sqrt{\pi})(1 - V'')e^{-t''^2}\,dt'', \]

where \( V'' \), like \( V' \) and \( V \), contains only uneven powers of \( t'' \), and is of the order \( 1 \div \sqrt{h} \).

140. The two equations (1) and (2) may be regarded as two distinct events, having the respective probabilities now assigned to them; therefore the probability of their being true simultaneously is the product of their respective probabilities, and is accordingly (neglecting the product \( V'V'' \), which is a quantity divided by \( h \)),

\[ q'q''\,dt'dt'' = (1 \div \pi)(1 - V' - V'')e^{-t'^2}e^{-t''^2}\,dt'dt''. \]

Let the value of \( k \) given by equation (2) be substituted in (1), and the expression now given will accordingly be the probability of the resulting equation, namely,

\[ c = \frac{1}{2h}\Sigma\left(\lambda_a - \frac{1}{h}\Sigma\lambda_a + t''U''\right)^2 - t'U'. \]

Let \( m = \frac{1}{h}\Sigma \lambda_a \); then \( m \) is the average or arithmetical mean of the observed values, and \( \lambda_a - m \) the reputed error of the observation. The last equation will then become \( c = \frac{1}{2h}\Sigma (\lambda_a - m + t''U'')^2 - t'U' \); or, rejecting \( (t''U'')^2 \), which is of the order \( 1 \div h \),

\[ c = \frac{1}{2h}\Sigma\{(\lambda_a - m)^2 + 2(\lambda_a - m)t''U''\} - t'U'. \]

For the sake of abridging let us also assume

\[ \mu = \frac{1}{h}\Sigma (\lambda_a - m)^2, \quad v = \frac{1}{h}\Sigma (\lambda_a - m)t'', \]

so that \( \mu \) is the mean of the squares of the errors, or mean square of the errors, and the equation becomes

\[ c = \tfrac{1}{2}\mu + vU'' - t'U', \]

(3)

the probability of which is \( q'q''dt'dt'' \).

141. Now, by (121), we have the probability \( \Theta \) that \( \frac{1}{h}\Sigma\lambda_a \), the arithmetical mean of the observed values of \( A \), will fall within the limits \( k - 2r\sqrt{(c \div h)} \) and \( k + 2r\sqrt{(c \div h)} \). Substituting in those limits the above value of \( c \), and observing that \( \sqrt{\{\tfrac{1}{2}\mu + vU'' - t'U'\}} \) is \( \sqrt{(\tfrac{1}{2}\mu)} + \frac{vU'' - t'U'}{2\sqrt{(\frac{1}{2}\mu)}} + \text{&c.} \), and that \( U' \) and \( U'' \), being of the order \( 1 \div \sqrt{h} \), when divided again by \( \sqrt{h} \) are to be rejected, the limits become

\[ k - r\sqrt{(2\mu \div h)}, \quad k + r\sqrt{(2\mu \div h)}, \]

and the probability of these being the true limits is \( \Theta \) multiplied into the probability of the equation \( c = \frac{1}{2}\mu + vU'' - t'U' \); and is therefore (140)

\[ (1 \div \pi)\,\Theta\,(1 - V' - V'')e^{-t'^2}e^{-t''^2}\,dt'dt''. \]

142. The expression now obtained is the infinitely small probability of the limits \( k \mp r\sqrt{(2\mu \div h)} \) of the average \( m \), in respect of the particular values of \( t' \) and \( t'' \) for which we have deduced the equation (3). But for every value of \( s \) between its limits there will be an equation corresponding to (3); therefore, in order to have the whole probability of those limits, the integral of the expression must be found for all values of \( t' \) and \( t'' \). From the nature of the expressions \( e^{-t'^2} \) and \( e^{-t''^2} \), as well as the consideration that errors beyond a certain magnitude, though possible, are wholly improbable, it is evident that the integration may be extended without sensible error from \( -\infty \) to \( +\infty \); and since the functions \( V' \) and \( V'' \) contain only uneven powers of \( t' \) and \( t'' \), the terms into which they enter disappear in the integrations between those limits (See Lacroix, Calcul Différentiel et Integral, tom. iii. p. 506). Now, from \( t' = -\infty \) to \( t' = +\infty \) we have (96) \( \int e^{-t'^2}dt' = \sqrt{\pi} \), and \( \int e^{-t''^2}dt'' = \sqrt{\pi} \); therefore

\[ \frac{1}{\pi}\,\Theta \iint (1 - V' - V'')e^{-t'^2}e^{-t''^2}\,dt'dt'' = \Theta. \]

The result of the preceding analysis is therefore, that on the hypothesis of positive and negative errors of equal magnitude being equally probable, and on rejecting terms divided by \( h \) (the number of the observations may be always so great as to render such terms insensible), we may substitute \( \frac{1}{2}\mu \) for \( c \) in the limits of the error to be apprehended, without sensibly altering the probability; and consequently there is the probability \( \Theta = \frac{2}{\sqrt{\pi}}\int_0^r e^{-t^2}dt \) that the true mean value \( k \) of the phenomenon \( A \) will lie between the limits \( m - r\sqrt{(2\mu \div h)} \) and \( m + r\sqrt{(2\mu \div h)} \), which contain only quantities given by observation.

On this hypothesis we have also (138) \( c = \frac{1}{2}\int \Delta^2 \phi\Delta\, d\Delta \); therefore \( \mu = \int \Delta^2 \phi\Delta\, d\Delta \); that is to say, the mean of the squares of the actual errors may be taken for the sum of the squares of the possible errors multiplied by their respective probabilities. It is important to remark that, as the observations become more numerous, the quantity \( \mu \), the mean of the squares of the errors, converges more and more to a constant quantity, and finally becomes independent of the number of observations.

143. The limits now found may be otherwise expressed. By hypothesis, \( m = \frac{1}{h}\Sigma\lambda_a = \) the arithmetical mean of the observed values, and \( \mu = \frac{1}{h}\Sigma(\lambda_a - m)^2 = \) the mean of the squares of the reputed errors.

Now \( (\lambda_a - m)^2 = \lambda_a^2 - 2\lambda_a m + m^2 \); and \( \frac{1}{h}\Sigma\, 2\lambda_a m = 2m \cdot \frac{1}{h}\Sigma\lambda_a = 2m^2 \); therefore \( \mu = \frac{1}{h}\Sigma\lambda_a^2 - m^2 \), that is to say, the mean of the squares of the observations minus the square of the mean. Hence the limits, corresponding to a given probability \( \Theta \), of the difference between the average of all the observations and the true value, are expressed by either of these formulæ:

\[ \pm r\sqrt{\left\{ \frac{2 \times \text{mean square of errors}}{h} \right\}}, \]

\[ \pm r\sqrt{\left\{ \frac{2 \times \{\text{mean of squares of obs.} - (\text{mean of obs.})^2\}}{h} \right\}}, \]

\( h \) being the number of observations, and the relation between \( \Theta \) and \( r \) being given by the table. Generally speaking, the first of these formulæ is the more convenient for calculation.
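As an illustration of the rule of §143, the limits may be computed from a set of hypothetical observed values; the observations and the value of \( r \) below are merely illustrative, and both formulæ for \( \mu \) must of course agree:

```python
import math

# Hypothetical observed values lambda_a of some quantity A
obs = [10.2, 9.8, 10.1, 9.9, 10.4, 9.6, 10.0, 10.3, 9.7, 10.0]
h = len(obs)
m = sum(obs) / h                              # arithmetical mean of the observations
mu = sum((x - m) ** 2 for x in obs) / h       # mean square of the reputed errors (first formula)
mu_alt = sum(x * x for x in obs) / h - m * m  # mean of squares of obs. minus square of the mean
r = 0.476936                                  # the value of r for which Theta = 1/2
half_width = r * math.sqrt(2 * mu / h)        # limit of error: r * sqrt(2*mu/h)
print(abs(mu - mu_alt) < 1e-9)
print(m - half_width, m + half_width)
```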

144. Let \( l \) be the limit of the error to be feared in taking the average of the observations as the true result; then \( l = r\sqrt{(2\mu \div h)} \), and \( r = l\sqrt{(h \div 2\mu)} \). Now when \( r \) is constant, that is, for a given probability \( \Theta \), the determination will be more exact in proportion as \( l \) is a smaller number, and the precision will therefore be proportional to \( \sqrt{(h \div 2\mu)} \).

Hence \( \sqrt{(h \div 2\mu)} \) is called by Gauss the measure of the precision of the determination. Suppose two series of observations to have been made for the determination of an element; the comparative accuracy of the results will depend on two things, the number of observations in each series, and the amount of the squares of the errors in each. If the number of observations is the same in both series, the precision of each result will be inversely as the square root of the sum of the squares of the errors, and the presumption of accuracy is in favour of that result with respect to which the sum of the squares of the errors is less than in the other. On the other hand, if the mean square of the errors is the same in both series, then the observations are alike good in both, and the relative values of the two results are directly as the square roots of the numbers of observations in each series. Hence, in order that one determination may be twice as good as another, it must be founded on four times the number of equally good observations. These considerations are very important in comparing tables of mean values of whatever kind, for example, of the probabilities of life at the different ages, and in estimating risks which depend upon them.

145. Astronomers employ the terms, weight, probable error, and mean error, of a result, to denote certain functions of \( \mu \), the mean square of the errors. The square of the quantity which measures the precision of the result, is called the weight of the determination. Denoting the weight by \( w \), we have therefore

\[ w = h \div 2\mu = h^2 \div 2\Sigma(\lambda_a - m)^2, \]

or the weight is equal to the square of the number of observations divided by twice the sum of the squares of the errors. Substituting this in the expression of the limits, we have \( l = r \div \sqrt{w} \), and \( r = l\sqrt{w} \); that is to say, for a given probability \( \Theta \), the limits of the error to be apprehended in taking the average as the true result are reciprocally proportional to the square root of the weight. When observations of different kinds, or results deduced from observation, are compared with each other, their relative weights (supposing the number of observations the same) are inversely as \( \mu \), and are expressed numerically by taking the weight of a certain series of observations as the unit of weight.

146. The probable error of the determination is that which corresponds to the probability \( \Theta = \frac{1}{2} \). For \( \Theta = \frac{1}{2} \) we have \( r = .476936 \); whence \( r\sqrt{2} = .674489 \), and the formula \( l = r\sqrt{(2\mu \div h)} \) becomes \( l = .674489\sqrt{(\mu \div h)} \); whence the probable error \( = .674489\sqrt{(\mu \div h)} \).
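The constants of this article are easily reproduced: \( \Theta = \frac{2}{\sqrt{\pi}}\int_0^r e^{-t^2}dt \) is what is now called the error function, so \( r \) is the root of \( \operatorname{erf}(r) = \frac{1}{2} \). A one-line check in modern terms:

```python
import math

# Theta = (2/sqrt(pi)) * integral_0^r exp(-t^2) dt = erf(r); for Theta = 1/2, r ~= .476936
r = 0.476936
print(round(math.erf(r), 6))        # ~ 0.5
print(round(r * math.sqrt(2), 6))   # ~ 0.674489, the probable-error factor
```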

147. The mean error of the result of a large number of observations may be deduced from the general formula in (136) as follows. That formula gives \( q\,dt = (1 \div \sqrt{\pi})(1 - V)e^{-t^2}dt \) for the probability that the sum of the observed values will be \( hk + 2t\sqrt{(hc)} \) exactly. Dividing the sum by \( h \), \( q\,dt \) is also the probability that the average value given by all the observations will be exactly \( k + 2t\sqrt{(c \div h)} \). Now, on the hypothesis that positive and negative departures from the mean are equally probable, and supposing the origin of the co-ordinates to be transferred to the centre of gravity of the curve of probability, we have \( k = 0 \), and \( q\,dt = (1 \div \sqrt{\pi})(1 - V)e^{-t^2}dt \) is the infinitely small chance of the average error being \( 2t\sqrt{(c \div h)} \) exactly. Multiplying therefore this error into the chance of its taking place, and integrating the product from \( t = 0 \) to \( t = \infty \), we shall have the mean error, or mean risk of all the possible average errors affected with the positive sign. Now, observing that \( V \) represents a quantity divided by \( \sqrt{h} \), and therefore when multiplied by \( 2t\sqrt{(c \div h)} \) becomes of the order \( 1 \div h \), and may consequently be rejected, the product of the average error \( 2t\sqrt{(c \div h)} \) into its probability is \( \frac{2\sqrt{(c \div h)}}{\sqrt{\pi}}\, t e^{-t^2}dt \); and since \( \int t e^{-t^2}dt = -\frac{1}{2}e^{-t^2} \), which from \( t = 0 \) to \( t = \infty \) becomes simply \( \frac{1}{2} \), the integral of the above product from \( t = 0 \) to \( t = \infty \) is \( \sqrt{(c \div \pi h)} \). Substituting for \( c \) its value (142) \( = \frac{1}{2}\mu \), this result becomes \( \sqrt{(\mu \div 2\pi h)} \); whence, on computing \( \sqrt{(1 \div 2\pi)} \), we obtain

\[ \text{mean error of the result} = .398942\sqrt{(\mu \div h)}. \]

This is the mean error or mean risk in respect of positive errors alone, or on the supposition that negative errors are not taken into account. But as positive and negative errors are equally likely, the mean error in respect of negative errors is the same quantity; whence the mean error in respect of errors of both kinds is \( .797884\sqrt{(\mu \div h)} \). This is usually called the average error. The mean error differs from the probable error in this respect, that it depends on the magnitude of individual errors, as well as on the proportion in which errors of different magnitudes occur; the probable error is independent of the magnitude.
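The factor .398942 is \( 1 \div \sqrt{2\pi} \); a short check (the article's .797884 is the same number doubled, truncated in its last figure):

```python
import math

mean_error_factor = 1 / math.sqrt(2 * math.pi)   # .398942..., positive errors alone
print(round(mean_error_factor, 6))               # 0.398942
print(2 * mean_error_factor)                     # doubled, for errors of both kinds
```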

148. When the quantity \( \mu \) (the mean square of the errors) has been found from a series of observations, the precision, weight, probable error, and mean error, of a coming observation of the same kind are found by supposing \( h = 1 \) in the above expressions, and are respectively

| Precision | \( = \sqrt{(1 \div 2\mu)} \) |
|-----------|------------------------------|
| Weight | \( = 1 \div 2\mu \) |
| Probable error | \( = .674489\sqrt{\mu} \) |
| Mean error | \( = .398942\sqrt{\mu} \) |

149. The preceding formulae give the limits of the error to be feared in determining the value of a quantity from a series of observations, when the thing to be determined is that on which the observations are immediately made. We have now to apply the formulae to the cases in which the quantity sought is not observed itself, but is a function of several others, which are separately determined by observation. The following problem is important:

Let \( u \) be a given function of a number of unknown quantities \( x, x', x'' \), &c.; it is required to assign the limits of the probable error in the determination of \( u \), and the weight of the result, when values of \( x, x', x'' \), found from observations independent of each other, and respectively affected with the probable errors \( .674489\sqrt{\mu} \), \( .674489\sqrt{\mu'} \), \( .674489\sqrt{\mu''} \), &c., are adopted instead of the true but unknown values of those quantities.

Let \( u = f(x, x', x'', \ldots) \) be the given function; \( \lambda, \lambda', \lambda'' \), &c., observed values of \( x, x', x'' \), &c.; and make \( \lambda - x = e \), \( \lambda' - x' = e' \), \( \lambda'' - x'' = e'' \), &c., so that \( e, e', e'' \), &c., are the errors of observation, supposed to be so small that their squares may be rejected. Make \( \frac{du}{dx} = a, \frac{du}{dx'} = a', \frac{du}{dx''} = a'' \), &c.; then \( a, a', a'' \) are given quantities; and on substituting \( x + e, x' + e', x'' + e'' \) for \( x, x', x'' \), respectively, in the equation \( u = f(x, x', x'', \&c.) \), and supposing \( u \) to become \( u + E \) when the substitutions are made, so that \( E \) is the corresponding error of \( u \), we have, on expanding by Taylor's theorem,

\[ E = ae + a'e' + a''e'' + \&c. \]

in respect of a single observation of each of the quantities. Taking the square of both sides of the equation, we have

\[ E^2 = a^2e^2 + a'^2e'^2 + a''^2e''^2 + \&c. + 2aa'ee' + 2aa''ee'' + \&c. \]

Now since positive and negative errors are supposed equally probable, the sums of the products \( ee', ee'', e'e'' \), &c., or their mean values, become each \( = 0 \); therefore

\[ \Sigma E^2 = a^2\Sigma e^2 + a'^2\Sigma e'^2 + a''^2\Sigma e''^2 + \&c. \]

Taking the mean value of each of these sums, and observing that \( \mu \), the mean value of \( e^2 \), is independent of the number of observations (142), and assuming \( M \) to be the mean value of \( E^2 \), we get

\[ M = a^2\mu + a'^2\mu' + a''^2\mu'' + \&c. \]

This equation contains the solution of the problem, for all the functions of the error are given in terms of \( M \). The probable error is \( .674489\sqrt{M} \).
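The rule \( M = a^2\mu + a'^2\mu' + \&c. \) may be illustrated on a hypothetical function, say \( u = x \cdot x' \), whose differential coefficients are \( a = x' \) and \( a' = x \); the adopted values and mean squares below are arbitrary:

```python
import math

# Hypothetical function u = x * x2 (a product of two observed quantities)
x, x2 = 3.0, 5.0          # adopted (observed) values
mu, mu2 = 0.04, 0.09      # mean squares of error of each quantity

# partial differential coefficients a = du/dx, a' = du/dx2
a = x2
a2 = x

M = a * a * mu + a2 * a2 * mu2          # mean square of the error of u
probable_error = 0.674489 * math.sqrt(M)
print(M, probable_error)
```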

150. Let \( W \) be the weight of the determination, and \( w, w', w'' \), &c., the weights corresponding to \( \mu, \mu', \mu'' \), &c.; then, by the definition of weight, \( w \) is reciprocally proportional to \( \mu \), and \( W \) to \( M \); and we have by substitution,

\[ \frac{1}{W} = \frac{a^2}{w} + \frac{a'^2}{w'} + \frac{a''^2}{w''} + \&c. \]

If the weights are supposed all equal, this becomes

\[ W = \frac{w}{a^2 + a'^2 + a''^2 + \&c.}. \]

Suppose the errors \( e, e', e'' \), &c., to be respectively multiplied by numbers proportional to the square roots of the weights, (which is equivalent to supposing all the observations to have the same degree of precision measured by \( \sqrt{\mu w} \)), then the value of \( M \) becomes

\[ M = a^2\mu w + a'^2\mu' w' + a''^2\mu'' w'' + \&c. \]

But \( w \) being reciprocally as \( \mu \), we have \( \mu w = \mu' w' = \mu'' w'' \), &c., \( = 1 \), therefore

\[ W = \frac{1}{a^2 + a'^2 + a''^2 + \&c.}. \]
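That \( W \) is the reciprocal of \( M \) in the unit for which \( \mu w = 1 \) can be checked directly; a sketch with arbitrary coefficients and mean squares:

```python
# Weights reciprocal to the mean squares of error: w = 1/mu (the unit chosen so that mu*w = 1)
a_coeffs = [2.0, 1.0, 3.0]
mus = [0.5, 0.25, 0.1]
ws = [1.0 / m for m in mus]

# General rule of (150): 1/W = a^2/w + a'^2/w' + ...
inv_W = sum(a * a / w for a, w in zip(a_coeffs, ws))
W = 1.0 / inv_W

# Check against M = a^2*mu + a'^2*mu' + ... : the weight is the reciprocal of M
M = sum(a * a * m for a, m in zip(a_coeffs, mus))
print(abs(W - 1.0 / M) < 1e-12)
```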

**SECT. X. OF THE METHOD OF LEAST SQUARES.**

151. In the determination of astronomical and physical elements from the data of observation, the thing which is actually observed is for the most part not the element which is sought to be determined, but a known function of that element. Thus, if \( V \) be a given function of \( X \) determined by the equation \( V = F(X) \), the quantity observed may be a value of \( V \), whilst the element sought to be determined is \( X \). If the observation could give the value of \( V \) with absolute accuracy, then \( X \) would also be absolutely known; but as all observations are affected with certain errors of greater or less amount, owing to the imperfections of instruments or of sense, or the ever varying circumstances under which they are made, an exact value of \( X \) cannot be found from any single observation; and in order to obtain the utmost precision, it is necessary to employ a great number of observations, repeated under every variety of circumstance by which the result can be supposed to be affected.

152. The observed quantity \( V \), instead of being a function of a single element \( X \), may be a function of several elements \( X, Y, Z, \&c.; \) for example, \( V \) may be the position of a planet, in which case it is a function of the six elements of the orbit, for the determination of which the observation is made. Each observation gives rise to an equation of this form, \( V = F(X, Y, Z, \&c.) \); therefore when the number of equations is just equal to the number of unknown quantities, the problem is determinate; and supposing \( F \) to be an algebraic function, the values of \( X, Y, Z, \&c. \) may be found by the ordinary methods of elimination. If the number of equations is less than the number of unknown quantities, the problem is indeterminate; but if greater, it may be said to be more than determinate, inasmuch as the equations may be combined in an infinite number of ways, each distinct combination giving a different value of the elements. It therefore becomes a question of the utmost importance to the perfection of the sciences of observation, to assign the particular combination which gives the most advantageous results, or values of \( X, Y, Z, \&c. \) affected with the smallest probable errors.

153. As approximate values of the elements are in all cases either already known, or can be easily found, the object of accumulating observations is the correction of the approximate values. Let \( V \) be the true value of the thing observed, \( V_s \) an approximate value, however found, \( X \) the true value of the element sought, \( X_s \) an approximate value, corresponding to \( V_s \), so that we have the two equations \( V = F(X) \), \( V_s = F(X_s) \); also, let the observed value of \( V \) in any observation be \( L \), and make

\[ V - L = v, \quad V_s - L = l; \]

then \( v \) is the true but unknown error of the observation, and \( l \) its reputed error, that is to say, the difference between the computed value of the function and the result of the observation. Now if we assume \( x \) to represent the true correction of the approximate element, so that \( X = X_s + x \), on substituting \( X_s + x \) for \( X \) in the function \( F \), we get \( V = F(X_s + x) \); whence, expanding the function by Taylor's theorem, and rejecting terms multiplied by \( x^2 \) and higher powers of \( x \), because \( x \) is a very small quantity

\[ V = V_s + \frac{dV_s}{dX} \cdot x. \]

Let us now denote the differential coefficient \( \frac{dV_s}{dX} \), which is a known quantity, by \( a \); then, observing that \( V - V_s = v - l \), the equation becomes \( v = l + ax \); that is to say, the true error of the observation is a linear function of the correction of the element.

154. In like manner, when there are several elements, \( X, Y, Z, \&c.; \) on making \( \frac{dV_s}{dX} = a, \frac{dV_s}{dY} = b, \frac{dV_s}{dZ} = c, \&c. \) a single observation furnishes the equation

\[ v = l + ax + by + cz + \&c., \]

and a series of observations, whose errors are respectively \( v, v', v'' \), &c., gives a system of linear equations equal in number to the number of observations; namely,

\[ \begin{align*} v &= l + ax + by + cz + \&c. \\ v' &= l' + a'x + b'y + c'z + \&c. \\ v'' &= l'' + a''x + b''y + c''z + \&c. \\ &\vdots \end{align*} \]

(1)

and the object is to give such values to \( x, y, z \), &c., that the errors \( v, v', v'' \), &c., in respect of the whole of the observations, shall be the least possible. The equations being supposed independent of each other, if their number is just equal to that of the unknown quantities, the errors \( v, v', v'' \), &c., can be made all zero; but if, as is usually the case, there are more equations than unknown quantities, it is impossible by any means whatever to annihilate the whole of them, and therefore all that can be accomplished is to find the system of values of \( x, y, z \), &c., which most nearly, and with the greatest probability, satisfies the whole of the equations. If the observations are not all equally good, the equations are supposed to be each multiplied by a number proportional to the square root of the presumed weight of the observation on which it depends, in order that they may all have the same degree of precision.

155. As the question is to find the most probable values of \(x, y, z, \&c.\), the first thing necessary is to express each of these elements in terms of the observations. Suppose \(k, k', k'', \&c.\) to be a system of indeterminate quantities, independent of \(x, y, z, \&c.\), and let the first of the above conditional equations be multiplied by \(k\), the second by \(k'\), the third by \(k''\), and so on; then adding the products, if \(k, k', k'', \&c.\) be determined so as to make the coefficient of \(x\) equal to unity, and those of \(y, z, \&c.\) each equal to 0; that is to say, so as to satisfy the equations

\[ ka + k'a' + k''a'' + \&c. = 1 \\ kb + k'b' + k''b'' + \&c. = 0 \\ kc + k'c' + k''c'' + \&c. = 0 \tag{2} \]

we shall then have \(x = K + kv + k'v' + k''v'' + \&c.\), where \(K\) is a quantity independent of \(v, v', v'', \&c.\) Hence \(x\) is found \(= K\), with an error \(= kv + k'v' + k''v'' + \&c.\); and the weight of the determination, by the formula in (150), is

\[ \frac{1}{k^2 + k'^2 + k''^2 + \&c.} \]

The weight of the determination is consequently greater in proportion as \(k^2 + k'^2 + k''^2 + \&c.\) is smaller; and hence of all the possible systems of indeterminate coefficients \(k, k', k'', \&c.\) which satisfy the equations (2), the system which gives the most probable value of \(x\), or the most advantageous result, is that for which \(k^2 + k'^2 + k''^2 + \&c.\) is an absolute minimum.

156. We have now to find, in terms of known quantities, values of the indeterminate coefficients \(k, k', k'', \&c.\) which satisfy the condition of the minimum. For the sake of abridging, let us denote the aggregate of the products \(a^2 + a'^2 + a''^2 + \&c.\) by \(S(aa)\), that of \(ab + a'b' + a''b'' + \&c.\) by \(S(ab)\), and so on, and also assume

\[ \xi = av + a'v' + a''v'' + \&c. \\ \eta = bv + b'v' + b''v'' + \&c. \\ \zeta = cv + c'v' + c''v'' + \&c. \tag{3} \]

On substituting in these equations the values of \(v, v', v'', \&c.\) given by the equations (1), there results

\[ \xi = S(al) + xS(aa) + yS(ab) + zS(ac) + \&c. \\ \eta = S(bl) + xS(ab) + yS(bb) + zS(bc) + \&c. \\ \zeta = S(cl) + xS(ac) + yS(bc) + zS(cc) + \&c. \tag{4} \]

a system of equations equal in number to the number of elements \(x, y, z, \&c.\) and from which, consequently, those elements would be determined absolutely if the observations were perfectly exact, that is, if the errors \(v, v', v'', \&c.\) were individually zero, and consequently \(\xi, \eta, \zeta, \&c.\) were each zero. On eliminating \(y, z, \&c.\) from the last system, the value of \(x\) is given in terms of \(\xi, \eta, \zeta, \&c.\) and known quantities by a linear equation of the following form:

\[ x = A + f\xi + g\eta + h\zeta + \&c. \tag{5} \]

where \(f, g, h, \&c.\) are co-efficients independent of \(x, y, z, \&c.\) and also of \(\xi, \eta, \zeta, \&c.\).
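The passage from the observation equations (1) to the system (4) amounts, in modern terms, to forming the normal equations. The Python sketch below is a modern illustration only; the coefficients and absolute terms, for two elements and three observations of equal weight, are invented. Setting \(\xi = \eta = 0\) and solving gives the corrections:

```python
# Invented coefficients a_i, b_i and absolute terms l_i of equations (1).
a = [1.0, 2.0, -1.0]
b = [1.0, -1.0, 3.0]
l = [0.5, -0.3, 0.8]

S = lambda u, w: sum(ui * wi for ui, wi in zip(u, w))

# Equations (4) with xi = eta = 0 become the normal equations:
#   S(aa) x + S(ab) y = -S(al)
#   S(ab) x + S(bb) y = -S(bl)
Saa, Sab, Sbb = S(a, a), S(a, b), S(b, b)
Sal, Sbl = S(a, l), S(b, l)

# Solve the two-by-two system by Cramer's rule.
det = Saa * Sbb - Sab * Sab
x = (-Sal * Sbb + Sbl * Sab) / det
y = (-Sbl * Saa + Sal * Sab) / det
print(x, y)
```

With these values the residual errors \(v_i = l_i + a_i x + b_i y\) satisfy \(\xi = \eta = 0\) exactly, as the system (4) requires.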

If we now substitute in equation (5) the values \(\xi, \eta, \zeta, \&c.\) given by equations (3), and also assume

\[ \alpha = fa + gb + hc + \&c. \\ \alpha' = fa' + gb' + hc' + \&c. \\ \alpha'' = fa'' + gb'' + hc'' + \&c. \tag{6} \]

we shall have by addition

\[ x = A + \alpha v + \alpha' v' + \alpha'' v'' + \&c. \]

whence it appears that \(\alpha, \alpha', \alpha'', \&c.\) are a system of multipliers by which \(y, z, \&c.\) are eliminated from equations (1); they must therefore satisfy the equations (2), whence

\[ \alpha a + \alpha' a' + \alpha'' a'' + \&c. = 1 \\ \alpha b + \alpha' b' + \alpha'' b'' + \&c. = 0 \\ \alpha c + \alpha' c' + \alpha'' c'' + \&c. = 0 \tag{7} \]

Subtracting these from the equations (2) we obtain

\[ 0 = (k-\alpha)a + (k'-\alpha')a' + (k''-\alpha'')a'' + \&c. \\ 0 = (k-\alpha)b + (k'-\alpha')b' + (k''-\alpha'')b'' + \&c. \\ 0 = (k-\alpha)c + (k'-\alpha')c' + (k''-\alpha'')c'' + \&c. \]

on multiplying which respectively by \(f, g, h,\) and adding the products, we get by reason of the equations (6),

\[ 0 = (k-\alpha)\alpha + (k'-\alpha')\alpha' + (k''-\alpha'')\alpha'' + \&c. \]

This equation may be put under the form

\[ k^2 + k'^2 + k''^2 + \&c. = \alpha^2 + \alpha'^2 + \alpha''^2 + \&c. + (k-\alpha)^2 + (k'-\alpha')^2 + (k''-\alpha'')^2 + \&c. \]

from which it is evident that \(k^2 + k'^2 + k''^2 + \&c.\) will be a minimum when \(k = \alpha,\ k' = \alpha',\ k'' = \alpha'', \&c.\) Hence it follows that the most probable value of \(x\) which can be deduced from the equations (1) is \(x = A\); and by (150) the weight of the determination is \(1 \div (\alpha^2 + \alpha'^2 + \alpha''^2 + \&c.) = 1 \div S(\alpha\alpha).\)

This quantity \(S(\alpha\alpha)\) is equal to the co-efficient of \(\xi\) in the equation (5); for on multiplying the first of equations (7) by \(f\), the second by \(g\), and the third by \(h\), and adding the products, we obtain by reason of equations (6),

\[ \alpha^2 + \alpha'^2 + \alpha''^2 + \&c. = f. \]
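This identity is easy to verify numerically. The Python sketch below is a modern check with invented coefficients for two elements and three observations; it forms the multipliers \(\alpha_i = fa_i + gb_i\) of (6) and confirms equations (7) and the relation \(S(\alpha\alpha) = f\):

```python
# Invented coefficients of the observation equations (two elements).
a = [1.0, 2.0, -1.0]
b = [1.0, -1.0, 3.0]

S = lambda u, w: sum(ui * wi for ui, wi in zip(u, w))
Saa, Sab, Sbb = S(a, a), S(a, b), S(b, b)
det = Saa * Sbb - Sab * Sab

f = Sbb / det    # coefficient of xi in x = A + f*xi + g*eta
g = -Sab / det   # coefficient of eta

# The multipliers alpha_i = f*a_i + g*b_i of equations (6).
alpha = [f * ai + g * bi for ai, bi in zip(a, b)]

print(S(alpha, a))         # 1.0, the first of equations (7)
print(S(alpha, b))         # 0.0, the second
print(S(alpha, alpha), f)  # equal: the weight of x is 1/f
```

The last line exhibits the conclusion of the text: the sum of the squares of the multipliers equals \(f\), so the weight of the determination of \(x\) is \(1/f\).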

157. The method explained in the two last paragraphs of determining the most advantageous combination of a system of linear equations, of the form of those in (154), is given by Gauss in his *Theoria Combinationis Observationum Erroribus Minimis Obnoxiae* (Göttingen, 1823). The practical rule to which it leads is as follows: Having given a near value \(V_s\) of a function of several elements, \(X, Y, Z, \&c.\), and also a series \(L, L', L'', \&c.\) of observed values of \(V\), make \((V_s-L)\sqrt{w} = l\), \((V_s-L')\sqrt{w'} = l'\), \((V_s-L'')\sqrt{w''} = l''\), \&c., and form the equations in (1). From these the equations (4) are easily deduced; and from these, again, by elimination, are found the values of \(x, y, z, \&c.\), the corrections of the approximate elements \(X, Y, Z, \&c.\), in equations of the form (5), which, for the sake of symmetry, may be thus written:

\[ x = A + (aa)\xi + (ab)\eta + (ac)\zeta + \&c. \\ y = B + (ba)\xi + (bb)\eta + (bc)\zeta + \&c. \\ z = C + (ca)\xi + (cb)\eta + (cc)\zeta + \&c. \\ \]

then the most probable values of \(x, y, z, \&c.\) are respectively \(A, B, C, \&c.\); the weights of the determinations are respectively

\[ \frac{1}{(aa)}, \frac{1}{(bb)}, \frac{1}{(cc)}, \&c.; \]

and the probable errors of the several determinations are \( \rho\sqrt{(aa)}, \rho\sqrt{(bb)}, \rho\sqrt{(cc)}, \&c., \) where \(\rho = 0.476936.\)
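The practical rule of this paragraph admits a compact modern restatement. The routine below is a minimal Python sketch, no part of the original article; it assumes the conditional equations have already been scaled by the square roots of their weights, forms the normal equations, and attaches weights and probable errors to the determinations:

```python
RHO = 0.476936  # the probable-error coefficient of (157)

def solve(M, r):
    """Solve M u = r by Gaussian elimination with partial pivoting."""
    n = len(r)
    A = [row[:] + [ri] for row, ri in zip(M, r)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        for k in range(col + 1, n):
            m = A[k][col] / A[col][col]
            for j in range(col, n + 1):
                A[k][j] -= m * A[col][j]
    u = [0.0] * n
    for i in range(n - 1, -1, -1):
        u[i] = (A[i][n] - sum(A[i][j] * u[j] for j in range(i + 1, n))) / A[i][i]
    return u

def least_squares(coeffs, l):
    """Most probable corrections, weights, and probable errors.

    coeffs: one row of coefficients per observation equation;
    l: the absolute terms of the equations (1)."""
    n = len(coeffs[0])
    # Normal matrix N[i][j] = S(i j); right-hand side -S(i l), as in (4).
    N = [[sum(row[i] * row[j] for row in coeffs) for j in range(n)]
         for i in range(n)]
    r = [-sum(row[i] * li for row, li in zip(coeffs, l)) for i in range(n)]
    corrections = solve(N, r)
    # Weight of the i-th determination: reciprocal of the i-th diagonal
    # entry of the inverse of the normal matrix.
    weights = []
    for i in range(n):
        e = [1.0 if j == i else 0.0 for j in range(n)]
        weights.append(1.0 / solve(N, e)[i])
    errors = [RHO / w ** 0.5 for w in weights]
    return corrections, weights, errors

# e.g. two equally good direct observations of one element, v = l + x,
# with l = -1 and l' = -3: the correction is 2, with weight 2.
print(least_squares([[1.0], [1.0]], [-1.0, -3.0]))
```

The helper `least_squares` and its inputs are illustrative names, not anything defined in the article itself.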

158. The values of \(x, y, z, \&c.\) now deduced are obtained immediately, by supposing the sum of the squares of the errors of observation to be a minimum. Thus, forming the squares of the equations (1), and making \(Q = v^2 + v'^2 + v''^2 + \&c.,\) the differentiation of \(Q\) in respect of each of the variables \(x, y, z, \&c.\) produces the quantities denoted in (156), by \(\xi, \eta, \zeta, \&c.\) that is to say, it gives

\[ \frac{dQ}{dx} = 2\xi, \frac{dQ}{dy} = 2\eta, \frac{dQ}{dz} = 2\zeta, \&c. \\ \]

therefore if \(Q\) be a minimum, \(\xi, \eta, \zeta\) become severally zero, and the equations (4) give by elimination, \(x = A,\) \(y = B,\) \(z = C,\) where \(A, B,\) and \(C\) denote the same quantities as above. Now from equation (5) the general value of \(x\) is

\[ x = A + (aa)\xi + (ab)\eta + (ac)\zeta, \]

and the most probable value being \( x = A \), it follows that the most probable values of the corrections \( x, y, z \) are found by making the differential coefficients of \( Q \) equal to zero, that is, by making \( v^2 + v'^2 + v''^2 + \&c. \) an absolute minimum. Hence this method of combining equations of condition is called the method of least squares; and it follows from the preceding analysis, that it gives the most probable values of the corrections, or the most advantageous results.
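The equivalence asserted in (158), that differentiating \(Q\) with respect to \(x\) reproduces \(2\xi\), can be checked by a finite-difference sketch in Python (the coefficients and absolute terms are invented for illustration):

```python
# Invented observation equations v_i = l_i + a_i*x + b_i*y.
a = [1.0, 2.0, -1.0]
b = [1.0, -1.0, 3.0]
l = [0.5, -0.3, 0.8]

def Q(x, y):
    """Sum of the squares of the errors, Q = v^2 + v'^2 + v''^2."""
    return sum((li + ai * x + bi * y) ** 2 for ai, bi, li in zip(a, b, l))

def xi(x, y):
    """The quantity xi of (156): sum of a_i * v_i."""
    return sum(ai * (li + ai * x + bi * y) for ai, bi, li in zip(a, b, l))

# Central difference of Q with respect to x at an arbitrary point.
x0, y0, h = 0.2, -0.1, 1e-6
dQdx = (Q(x0 + h, y0) - Q(x0 - h, y0)) / (2 * h)
print(dQdx, 2 * xi(x0, y0))   # the two agree: dQ/dx = 2*xi
```

Since \(Q\) is a quadratic function of the corrections, the central difference is exact up to rounding.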

159. As an example, let us suppose there is only one unknown element \( X \), of which \( X_0 \) is known to be an approximate value, that \( L, L', L'', \&c. \) are observed values, the weights of which are respectively proportional to \( w, w', w'' \), \&c., and that it is required to determine the most probable value of \( X \) from the observations, and also the weight of the determination. Make \( X_0 - L = l \), \( X_0 - L' = l' \), \&c., and let \( x \) be the correction of \( X_0 \), so that \( X = X_0 - x \). On substituting this in \( (X - L)\sqrt{w} = v \), we have \( (X_0 - x - L)\sqrt{w} = v \), or \( v = l\sqrt{w} - x\sqrt{w} \). Each observation gives a similar equation, and the equations (1) in (154) consequently become

\[ v = l\sqrt{w} - x\sqrt{w} \]

\[ v' = l'\sqrt{w'} - x\sqrt{w'} \]

\[ v'' = l''\sqrt{w''} - x\sqrt{w''} \]

etc.

therefore, multiplying each by the coefficient of its own \( x \), namely \( -\sqrt{w}, -\sqrt{w'}, \&c., \) and adding the products, we have \( \xi = -S(lw) + xS(w) \); whence, making \( \xi = 0 \), the most probable value of \( x \) is

\[ x = \frac{lw + l'w' + l''w'' + \ldots}{w + w' + w'' + \ldots} \]

and the weight of the determination is proportional to \( w + w' + w'' + \ldots \).

Since \( X = X_0 - x \), and \( X_0 - l = L \), \( X_0 - l' = L' \), \&c., we have also

\[ X = \frac{Lw + L'w' + L''w'' + \ldots}{w + w' + w'' + \ldots} \]

whence this proposition: If a series of values of an element are found from observations which have not all the same degree of precision, the most probable value of the element is found by multiplying each observation by a number proportional to its weight, and dividing the sum of the products by the sum of the weights; and the comparative weight of the result is the sum of all the weights.

If the weights be all equal, and the number of the observations be \( n \), then \( X = (L + L' + L'' + \ldots)/n \); that is to say, the average of a series of equally good observations gives the most probable value. The average may, therefore, be considered as a particular case of the method of least squares.
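The rule of (159) reduces, in code, to a weighted mean. A minimal Python sketch, with invented observed values and weights:

```python
# Invented observed values of one element, with relative weights.
L = [10.2, 10.5, 10.3]
w = [1.0, 4.0, 2.0]

# Most probable value: the weighted mean; its weight: the sum of the weights.
X = sum(Li * wi for Li, wi in zip(L, w)) / sum(w)
weight = sum(w)
print(X, weight)
```

With equal weights the same expression collapses to the ordinary average, as the paragraph above remarks.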

160. To illustrate the method of proceeding when there are several elements to be corrected from the observations, we shall take the following numerical example from Gauss (Theoria Motus). Suppose there are three elements, and that three observations, of equal weight, have given the equations \( x - y + 2z = 3, 3x + 2y - 5z = 5, 4x + y + 4z = 21 \); and that a fourth observation, of which the relative weight is one-fourth, or its precision one-half, of that of the others has given \( -2x + 6y + 6z = 28 \). The first step is to reduce this last equation to the same standard of weight with the others, for which purpose it must be multiplied by \( \frac{1}{2} \); it then becomes \( -x + 3y + 3z = 14 \). Now, as \( x, y, \) and \( z \) cannot be determined so as to satisfy four independent equations, we suppose each observation, or equation, to be affected with an error \( e_i \) and accordingly obtain the following system of equations, corresponding to equations (1), viz.:

\[ e_1 = -3 + x - y + 2z \]

\[ e_2 = -5 + 3x + 2y - 5z \]

\[ e_3 = -21 + 4x + y + 4z \]

\[ e_4 = -14 - x + 3y + 3z, \]

from which the most probable values of \( x, y, \) and \( z \) are to be deduced. Let each equation be multiplied by the coefficient of its own \( x \), taken with its proper sign, namely, the first by 1, the second by 3, the third by 4, and the fourth by \( -1 \); the results added together give the value of \( \xi \), namely, \( \xi = -88 + 27x + 6y \). In like manner, let the first be multiplied by \( -1 \), the second by 2, the third by 1, and the fourth by 3, the sum of the products will give \( \eta \). Lastly, let the equations be multiplied respectively by the coefficients of \( z \), and the sum of the products made equal to \( \zeta \); we have then the following equations corresponding to the equations (4)

\[ \xi = -88 + 27x + 6y + 0 \]

\[ \eta = -70 + 6x + 15y + z \]

\[ \zeta = -107 + 0 + y + 54z. \]

From these we get by elimination

\[ 19899x = 49154 + 809\xi - 324\eta + 6\zeta \]

\[ 737y = 2617 - 12\xi + 54\eta - \zeta \]

\[ 39798z = 76242 + 12\xi - 54\eta + 738\zeta, \]

whence (157) \( A, B, C \), the most probable values of \( x, y, z \), are respectively

\[ A = \frac{49154}{19899} = 2.470, \quad B = \frac{2617}{737} = 3.551, \quad C = \frac{76242}{39798} = 1.916, \]

and the relative weights \( w, w', w'' \), are respectively

\[ w = \frac{19899}{809} = 24.60, \quad w' = \frac{737}{54} = 13.65, \quad w'' = \frac{39798}{738} = 53.93, \]

whence the probable errors \( (0.476936 \div \sqrt{w}) \) are respectively

\[ 0.096, \quad 0.129, \quad 0.065. \]
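The figures of this example are readily verified with exact rational arithmetic. The Python check below, a modern addition, solves the normal equations stated above by Cramer's rule and recomputes the weights from the diagonal of the inverse of the normal matrix:

```python
from fractions import Fraction as F

# Normal equations of the example:
#   27x + 6y = 88,  6x + 15y + z = 70,  y + 54z = 107.
N = [[F(27), F(6), F(0)],
     [F(6), F(15), F(1)],
     [F(0), F(1), F(54)]]
r = [F(88), F(70), F(107)]

det = F(19899)  # determinant: 27*(15*54 - 1) - 6*(6*54)

def cof(i, j):
    """Signed cofactor of the normal matrix."""
    rows = [k for k in range(3) if k != i]
    cols = [k for k in range(3) if k != j]
    m = (N[rows[0]][cols[0]] * N[rows[1]][cols[1]]
         - N[rows[0]][cols[1]] * N[rows[1]][cols[0]])
    return (-1) ** (i + j) * m

# Cramer's rule for the solutions; inverse diagonal for the weights.
x = sum(cof(k, 0) * r[k] for k in range(3)) / det
y = sum(cof(k, 1) * r[k] for k in range(3)) / det
z = sum(cof(k, 2) * r[k] for k in range(3)) / det
weights = [det / cof(0, 0), det / cof(1, 1), det / cof(2, 2)]

print(x, y, z)  # 49154/19899 2617/737 12707/6633
print([float(w) for w in weights])  # about 24.60, 13.65, 53.93
```

The fractions agree with the values of \(A, B, C\) above (76242/39798 reduces to 12707/6633), and the weights with \(19899/809\), \(737/54\), and \(39798/738\).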

The method of least squares, to which modern astronomy is indebted for much of its precision, was first proposed by Legendre, in his *Nouvelles Méthodes pour la Détermination des Orbites des Comètes* (Paris, 1806), merely as a means of avoiding the inconvenience and uncertainty arising from the want of a uniform and determinate method of combining numerous equations of condition, and without reference to the theory of probability. The same method, however, had previously been discovered by Gauss, and a demonstration of it, deduced from the general theory of chances, was given by him in his *Theoria Motus* (1809). It may be shown in various ways, that this method of combination gives values of the unknown quantities affected with the smallest probable errors; but it is to be observed, that all the demonstrations are subordinate to the hypothesis, that positive and negative errors of equal magnitude are equally probable, or that the average of a large number of results gives the most probable value, and consequently that the function which represents the probability of an error has the form assigned to it in (132).

The limits of this article will not permit us to enter into further details respecting the applications of the method of least squares. On the general theory of the probable errors of results deduced from observation, and the most advantageous methods of combining equations of condition, the reader may consult the *Théorie Analytique des Probabilités* of Laplace; the *Theoria Motus* of Gauss; the *Theoria Combinationis Observationum*, and the *Supplementum Theoriae Combinationis*, &c. (Göttingen, 1828), of the same author; the *Recherches sur la Probabilité des Jugements*, with the two Memoirs of Poisson in the *Connaissance des Temps* for 1827 and 1832; and three masterly papers, by Mr Ivory, in the *Philosophical Magazine* for 1825. In the volumes of the *Berliner Astronomisches Jahrbuch* for 1833, 1834, and 1835, M. Encke has treated the subject at great length, and given a number of formulae calculated to facilitate the labours of the computer. We may also refer, in conclusion, to a remarkable disquisition on the theory of probable errors, by the celebrated astronomer Bessel, forming Nos. 358 and 359 of Schumacher's *Astronomische Nachrichten*, Altona, October 1838.

Table of the Values of the Integral \(\Theta = \frac{2}{\sqrt{\pi}} \int_{0}^{t} e^{-t^2}\, dt\), for intervals each \(= 0.1\), from \(t = 0\) to \(t = 3\).

| \(t\) | \(\Theta\) | \(t\) | \(\Theta\) | \(t\) | \(\Theta\) |
|---|---|---|---|---|---|
| 0.0 | 0.00000 | 1.1 | 0.88021 | 2.1 | 0.99702 |
| 0.1 | 0.11246 | 1.2 | 0.91031 | 2.2 | 0.99814 |
| 0.2 | 0.22270 | 1.3 | 0.93401 | 2.3 | 0.99886 |
| 0.3 | 0.32863 | 1.4 | 0.95229 | 2.4 | 0.99931 |
| 0.4 | 0.42839 | 1.5 | 0.96611 | 2.5 | 0.99959 |
| 0.5 | 0.52050 | 1.6 | 0.97635 | 2.6 | 0.99976 |
| 0.6 | 0.60386 | 1.7 | 0.98379 | 2.7 | 0.99987 |
| 0.7 | 0.67780 | 1.8 | 0.98909 | 2.8 | 0.99992 |
| 0.8 | 0.74210 | 1.9 | 0.99279 | 2.9 | 0.99996 |
| 0.9 | 0.79691 | 2.0 | 0.99532 | 3.0 | 0.99998 |
| 1.0 | 0.84270 | | | | |
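The tabulated integral \(\Theta\) is what is now called the error function, for which modern libraries supply a routine. A brief Python check of a few values, including the relation \(\Theta(0.476936) = \tfrac{1}{2}\), which is the origin of the probable-error coefficient \(\rho\) of (157):

```python
import math

# Theta(t) = (2/sqrt(pi)) * integral from 0 to t of exp(-u^2) du = math.erf(t).
for t in (0.1, 0.5, 1.0, 2.0, 3.0):
    print(t, round(math.erf(t), 5))

# The probable-error coefficient rho = 0.476936 is the value of t at which
# Theta reaches one half.
print(math.erf(0.476936))
```

The loop reproduces the corresponding entries of the table, and the final line prints a value differing from 0.5 only in the seventh decimal place.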
