The most distinctive characteristic of the behavioral approach is its emphasis on reproducible knowledge. This approach does not belittle or ignore knowledge and evidence of a more intuitive or subjective sort, but it does recognize the very real limits of such knowledge. Without insights and suspicions as to certain historical patterns, there would be no place to begin, no hypotheses to test, and no theoretical models to formulate. But in recognizing the impermanence and contestability of subjective knowledge, the behavioral approach seeks methods that might avoid some of those liabilities. These methods are of several types and are best understood in connection with the types of knowledge sought.
Historical knowledge may be distinguished by two very different sets of criteria. The first are essentially theoretical and substantive in nature: Are we indeed getting at the relevant combination of variables in our search for explanation? The second are epistemological: Assuming that we are on a promising substantive and theoretical path, what is the quality of knowledge that we think has been acquired or that we hope to acquire? Leaving the matter of the relevance of our knowledge aside for the moment, we can focus on the qualitative dimensions of our knowledge. One possible way of evaluating the quality of historical knowledge is to first reduce it to its component assertions or propositions, translate these (if need be) into clear and operational language, and then ascertain where each such proposition or cluster of propositions falls along each of three dimensions.
The first, or accuracy, dimension reflects the degree of confidence that the relevant scholarly community can have in the assertion at a given point in time; this confidence level is basically a function of the empirical or logical evidence in support of the proposition, but may vary appreciably both across time and among different scholars and schools of thought at any particular moment. The second qualitative dimension reflects the generality of the proposition, ranging from a single fact assertion (of any degree of accuracy) to an assertion embracing a great many phenomena of a given class. Third is the existential-correlational-explanatory dimension: Is the assertion essentially descriptive of some existential regularity, is it correlational, or is it largely explanatory? With these three dimensions, an epistemological profile of any proposition or set of propositions can be constructed and a given body of knowledge can be classified and compared with another, or with itself over time.
For many the objective is to move as rapidly as possible on all three dimensions. We seek propositions in which the most competent, skeptical, and rigorous scholars can have a high degree of confidence, although these propositions may have originally been put forth on the basis of almost no empirical evidence at all. They will be propositions that are highly "causal" in form, although they may have been built up from, and upon, a number of propositions that come close to being purely descriptive. And they will be general rather than particular, although the generalizations must ultimately be based on the observation of many particular cases. As to the accuracy dimension, a proposition that seems nearly incontrovertible for decades may be overturned in a day, one that is thought of as preposterous may be vindicated by a single study or a brilliant insight, and those that have stood at the "pinnacle of uncertainty" (that is, a subjective probability of 0.5) may slowly or quickly be confirmed or disconfirmed. Moreover, a statement may enjoy a good, bad, or mixed reputation not because of its inherent truthfulness or accuracy, but merely because it is not in operational language and is therefore not testable.
Shifting from the degree-of-confidence dimension to that of generality, the assertion (of whose accuracy we are extremely confident) that World War I began on 29 July 1914 is less general than the assertion that more European wars of the past century began in the months of April and October than in others, and this in turn is less general than the assertion (which may or may not be true) that all wars since the Treaty of Utrecht have begun in the spring or autumn. Theory (defined here as a coherent and consistent body of interrelated propositions of fairly high confidence levels) must be fairly general, and no useful theory of any set of historical events can be built upon, or concerned only with, a single case. As Quincy Wright reminds us: "A case history, if composed without a theory indicating which of the factors surrounding a conflict are relevant and what is their relative importance, cannot be compared with other case histories, and cannot, therefore, contribute to valid and useful generalizations. On the other hand, a theory, if not applied to actual data, remains unconvincing." (In the same article, he also noted, "Comparison would be facilitated if quantifications, even though crude, are made whenever possible.")
Existential Knowledge and Data-Generating Methods
When we leave the accuracy and the generality dimensions and turn to the third proposed dimension along which a piece or body of knowledge may be described, we run into greater conceptual difficulty. A useful set of distinctions is that among existential, correlational, and explanatory types of knowledge. Existential knowledge is essentially a data set, or string of highly comparable facts. If, for example, we are told that one army had 1,248 men killed or missing in a given battle and that the enemy had "also suffered heavily," we would have something less than data. Similarly, statements that the United States has had two separate alliances with France since 1815, running a total of forty-seven years, and that American alliances with England and Russia have been nearly the same in number and longevity as those with France, would also be something less than data. That is, data provide the basis for comparison and generalization across two or more cases, situations, nations, and so on, and permit the generation of existential knowledge.
Of course, existential knowledge would not be very useful to the diplomatic historian if restricted only to phenomena that are readily quantified. Most of the interesting phenomena of history are of the so-called qualitative, not quantitative, variety, and it is usually assumed that the world's events and conditions are naturally and ineluctably divided into those two categories. Yet many phenomena that are thought to be "qualitative in nature" at a given time turn out to be readily quantifiable at a later date. In the physical world, examples might range from the difference between yellow and orange to the amount of moisture in the air; these were originally believed to be qualitative concepts. In the biological world, one thinks of metabolic rate or genetic predispositions. Likewise, in the world of social phenomena a good many allegedly qualitative phenomena turn out to be quite quantitative. Some illustrations might be the "linguistic similarity" of two nations, the extent to which nations gain or lose diplomatic "importance" after war, the changing "cohesion" of work groups, or the national "product" of given economies.
It is one thing to think of a way to measure or quantify a phenomenon that has been considered nonquantifiable and quite another thing to demonstrate that the measurement is a valid one. That is, we may apply the same measuring procedure to the same phenomenon over and over, and always get the same score; that demonstrates that our measure is a reliable one. But there is no way to demonstrate that it is a valid one—that it really gets at the phenomenon we claim to be measuring. The closest we come to validation of a measure (also known as an index or indicator) is a consensus among specialists that it taps what it claims to be tapping, and that consensus will rest upon (a) the "face validity" or reasonableness of the claim; (b) the extent to which it correlates with a widely accepted alternative indicator of the same phenomenon; and (c) the extent to which it predicts some measurable outcome variable that it is—according to an accepted theoretical argument—supposed to predict.
Quantification, however, may take a second, and more familiar, form. That is, in addition to assigning a numerical value to each observation of a given phenomenon, one can quantify by merely (a) assigning each such case or observation to a given nominal or ordinal category, and then (b) counting the number of observations that fall into each such category. The nominal category pertains to a set of criteria that are used to classify events and conditions; an ordinal category refers to the criteria used to rank them. To illustrate, generalizing about the American propensity to form alliances might require distinguishing among defense, neutrality, and entente commitments. Once the coding rules have been formulated and written down in explicit language (with examples), a person with limited specific knowledge could go through the texts and contexts of all American alliances and assign each to one of those three categories.
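The two-step procedure just described (assign each observation to a nominal category, then count the observations in each category) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the alliance labels and their category assignments are hypothetical placeholders, not historical codings, and the helper name `tally` is introduced here for convenience.

```python
from collections import Counter

# The three nominal categories named in the text.
CATEGORIES = ("defense", "neutrality", "entente")

# Hypothetical output of a coder who has applied written coding rules.
# Labels and assignments are placeholders, not real alliance data.
coded_alliances = {
    "alliance_01": "defense",
    "alliance_02": "entente",
    "alliance_03": "defense",
    "alliance_04": "neutrality",
    "alliance_05": "defense",
}


def tally(codings, categories=CATEGORIES):
    """Count observations per category, rejecting codes outside the scheme."""
    unknown = {code for code in codings.values() if code not in categories}
    if unknown:
        raise ValueError(f"codes outside the coding scheme: {sorted(unknown)}")
    counts = Counter(codings.values())
    # Report every category, including any with zero observations.
    return {category: counts.get(category, 0) for category in categories}


frequencies = tally(coded_alliances)
print(frequencies)
```

Rejecting any code that falls outside the written scheme is one simple way to keep the procedure operational: a second coder using the same rules can only produce results that are directly comparable with the first.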
The same could be done, for example, if one wanted to order a wide variety of foreign policy moves and countermoves, in the context of comparing the effects of different strategies upon the propensity of diplomatic conflicts to escalate toward war. The judgments of a panel of experts could be used to ascertain which types of action seem to be high, medium, or low on a conflict-exacerbating dimension. The earlier distinction between the reliability and validity of measures is quite appropriate here. There might be almost perfect agreement among experts that economic boycotts are higher on such a dimension than ultimata, since the latter are merely threats to act. But if one examined a set of diplomatic confrontations and found that those in which boycotts were used seldom ended in war, whereas those characterized by ultimata often did end in war, one might be inclined to challenge the validity of the ordinal measure.
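The validity check described above can be sketched as a comparison between the experts' ordinal scores and the observed share of confrontations that ended in war. Everything in the sketch is assumed for illustration: the confrontation records are invented, the scores in `expert_score` stand in for a hypothetical panel's judgment, and the consistency rule (a higher score should not go with a lower war rate) is one simple choice among several.

```python
from collections import defaultdict

# Hypothetical records: (tactic used, whether the confrontation ended in war).
confrontations = [
    ("boycott", False), ("boycott", False), ("boycott", True), ("boycott", False),
    ("ultimatum", True), ("ultimatum", True), ("ultimatum", False), ("ultimatum", True),
]

# Hypothetical panel scores: higher means judged more conflict-exacerbating.
expert_score = {"ultimatum": 1, "boycott": 2}


def war_rates(records):
    """Share of confrontations ending in war, computed per tactic."""
    totals = defaultdict(int)
    wars = defaultdict(int)
    for tactic, ended_in_war in records:
        totals[tactic] += 1
        wars[tactic] += int(ended_in_war)
    return {tactic: wars[tactic] / totals[tactic] for tactic in totals}


rates = war_rates(confrontations)

# Order tactics from lowest to highest expert score, then check that the
# observed war rate never falls as the score rises.
by_score = sorted(expert_score, key=expert_score.get)
consistent = all(rates[a] <= rates[b] for a, b in zip(by_score, by_score[1:]))

print(rates, consistent)
```

With these invented records the experts agree perfectly (the measure is reliable), yet the tactic they scored higher is followed by war less often, which is the kind of result that would lead one to question the measure's validity.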
So much, then, for existential knowledge. Whether merely acquired in ready-made form from governmental or commercial statistics, or generated by data-making procedures that are highly operational and reproducible, propositions of an existential nature are the bedrock upon which we can build correlational and explanatory knowledge.
Correlational Knowledge and Data Analysis Methods
Although many diplomatic historians will be quite content to go no further than the acquisition of existential knowledge, there will be others who will want not only to generalize, but also to formulate and test explanations. To do so, it is necessary to begin assembling two or more data sets and to see how strongly one correlates with the other(s). Correlation or covariation may take several forms and may be calculated in several ways, depending on whether the data sets are in nominal, ordinal, or interval (that is, cardinal number) form.
In general, a correlational proposition is one that shows the extent of coincidence or covariation between two (or more) sets of numbers. If these sets of numbers are viewed as the varying or fluctuating magnitudes of each variable, the correlation between them is a reflection of the extent to which the quantitative configuration of one variable can be ascertained when the configuration of the other is known. Or, in statistical parlance, the coefficient of correlation, which usually ranges from +1.00 to –1.00, indicates how accurately one can predict the magnitudes of all the observations in one data set once one knows the magnitudes in the other set of observations. Even though the measured events or conditions occurred in the past, we still speak of "prediction," since we know only that those phenomena occurred, but do not know the strength of association until the correlation coefficient has been computed.
Another way to put it is that the correlation between two sets of data is a measure of their similarity, whether they are based on pairs of simultaneous observations or ones in which variable Y was always measured at a fixed interval of time after each observation or measurement of variable X. If they rise and fall together over time or across a number of cases, they are similar, and the correlation between them will be close to +1.00; but if Y rises every time X drops, or vice versa, they are dissimilar, giving a negative correlation of close to –1.00. Finally, if there is neither a strong similarity nor dissimilarity, but randomness, the correlation coefficient will approach zero. There are many different measures or indices of correlation, usually named after the scholar who developed and applied them, but two of them can serve as good examples. Although any correlation coefficient can be calculated with pencil and paper or a calculator, the most efficient method is the computer, which can be programmed so that it can automatically receive two or more sets of data along with instructions as to which correlation formula to use, and almost instantaneously produce coefficient scores. Looking, then, at the very simple "rank order" correlation, we note that it is used to calculate the similarity or association between two sets of ranked data. It is particularly appropriate when we can ascertain only the orderings, from high to low or top to bottom, of two data sets and cannot ascertain with much confidence the distances or intervals between those rank positions. The rank order statistic is also especially appropriate for checking the validity of two separate measures or indicators and ascertaining whether they "get at" the same phenomena.
To illustrate, if we suspect that a fairly good index of a nation's power is simply the absolute amount of money it allocates to military preparedness—regardless of its population, wealth, or industrial capability—we might investigate how strongly that index correlates with an alternative measure. And, since power is itself a vague and elusive concept, we might decide to derive the second measure by having the nations ranked by a panel of diplomatic historians. When these two listings—one based on a single, simple index and the other based on the fallible human judgments of scholarly specialists—are brought together, we then compute the rank order correlation between them. The results of any such computation can in principle, as noted earlier, range from +1.00 to –1.00, with 0.00 representing the midpoint. If there is absolutely no pattern of association between the two rankings, we say there is no correlation, and the figure would indeed be zero. Further, if each nation has exactly the same rank position in both columns, the rank order correlation between the two variables is +1.00, and if the orders are completely reversed (with the nation at the top of one column found at the bottom of the other, and so on), it would be –1.00. None of these three extreme cases is likely to occur in the real world, of course, and on a pair of variables such as these, a rank order correlation of approximately +0.80 is pretty much what we would expect to find when the computation has been done.
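The rank order computation described above can be sketched with the standard Spearman formula, rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the difference between an item's two rank positions. The text does not name a specific formula, so the choice of Spearman's coefficient is an assumption; the formula also assumes no tied ranks, and the two five-nation rankings are hypothetical.

```python
def rank_order_correlation(rank_x, rank_y):
    """Spearman's rho for two rankings of the same items (assumes no ties)."""
    if len(rank_x) != len(rank_y) or len(rank_x) < 2:
        raise ValueError("need two rankings of equal length, at least 2 items")
    n = len(rank_x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical rank positions of five nations on each measure.
spending_rank = [1, 2, 3, 4, 5]   # ranked by military expenditure
panel_rank = [2, 1, 3, 5, 4]      # ranked by a panel of historians

rho_example = rank_order_correlation(spending_rank, panel_rank)
rho_identical = rank_order_correlation(spending_rank, spending_rank)
rho_reversed = rank_order_correlation(spending_rank, spending_rank[::-1])

print(rho_example, rho_identical, rho_reversed)
```

Identical rankings give +1.00 and fully reversed rankings give -1.00, as the text states. The invented example, in which two adjacent pairs of nations swap places, falls between those limits.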
The above example illustrates how a rank order correlation might be used to estimate the similarity between two different rankings. While a high positive correlation would increase confidence in the validity of military expenditure levels as a measure of power, we assumed no particular theoretical or causal connection between the two data sets. Now, however, suppose that we believed (that is, suspected, but did not know with very much confidence) that the war-proneness of a nation was somehow or other a consequence of its level of industrialization. If we only know how many wars a nation has been involved in during a given number of decades, we have a rather crude indicator of its war-proneness. Such a number does not discriminate between long and short wars; wars that led to a great many or very few fatalities; and wars that engaged all of its forces or only a small fraction. Thus, we would be quite reluctant to say that a nation that fought in eight wars is four times more war-prone than one that experienced only two military conflicts in a given period. We would even be reluctant to say that the difference between two nations that participated in six and four wars respectively is the same as that between those nations that fought in seven and five wars. In sum, we might be justified in treating such a measure of war-proneness as, at best, ordinal in nature.
Suppose, further, that our measure of industrialization is almost as crude, based, for example, on the single factor of iron and steel production. Even though we might have quite accurate figures on such production, we realize that it is a rather incomplete index, underestimating some moderately powerful nations that have little coal or ore and therefore tend to import much of their iron and steel. In such a case, we would again be wise to ignore the size of the differences between the nations and settle for only a rank order listing. Depending on the magnitude of the resulting coefficient of correlation between these two rank orderings, we could make a number of different inferences about the relationship between industrialization and war-proneness. Suppose now that we were working with much better indices than those used in the two illustrations above, and that we could measure our variables with considerably greater confidence. That is, we now have a basis for believing that our indicators or measures are not only valid (and that has no bearing on the statistical tests that can be applied to a variable) but reliable and quite precise. If one variable were the amount of money spent for the operation of IGOs (intergovernmental organizations) in the international system each half-decade, and the other were the number of nation-months of war that ended in each previous half-decade, and such interval scale data appeared to be very accurate, we could employ a more sensitive type of statistical test, such as Pearson's product moment correlation.
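The difference between treating the same observations as ranks and as interval scale magnitudes can be sketched as follows. The two series are invented, and the helper names `pearson` and `ranks` are introduced here for illustration; applying the product moment formula to rank positions yields the rank order coefficient when there are no ties.

```python
from math import sqrt


def pearson(x, y):
    """Product moment correlation between two equal-length numeric series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(spread_x * spread_y)


def ranks(values):
    """Rank positions with 1 for the largest value (assumes no ties)."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


# Hypothetical interval scale scores for five observations.
x = [10, 20, 30, 40, 50]
y = [1, 2, 3, 4, 50]   # same ordering as x, but very uneven intervals

r_rank = pearson(ranks(x), ranks(y))   # uses only the orderings
r_interval = pearson(x, y)             # also uses the size of the gaps

print(round(r_rank, 2), round(r_interval, 2))
```

Because the two series have the same ordering, the rank-based coefficient is 1.0, while the product moment coefficient is noticeably lower because it registers the uneven spacing of `y`. That spacing is information the rank version discards.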
The reason that a product moment type of correlational analysis is more sensitive is that its computation does not, because it need not, ignore the magnitude of the differences between the rank positions on a given pair of listings. Whereas rank order data merely tell us that the nation (or year, or case, or observation) at one position is so many positions or notches above or below another, interval scale data tell us how much higher or lower it is on a particular yardstick. The magnitude of those interrank distances carries a lot of useful information, and when the data are of such quality as to give us that additional information, it is foolish to "throw it away." Thus, when the measures of the variables permit, we generally use a product moment rather than a rank order correlation. As we might expect, certain conditions regarding the normality of the distributions, independence of the observations, randomness of the sample, and so on, must be met before we can use this more sensitive measure of statistical association.
Once we have computed the rank order or product moment correlation coefficient between any two sets of measures, several inferences about the relationship between the variables become possible, providing that one additional requirement is met. If the correlation score is close to zero, we can, for the moment at least, assume that there is little or no association between the variables and tentatively conclude that (a) one measure is not a particularly good index of the other (when validation of a measure is our objective), or (b) that one variable exercises very little impact on the other (when a correlational proposition is our objective). If, however, the correlation coefficient is about 0.50 or higher, either positive (+) or negative (–), we would want to go on and ask whether the above-mentioned requirement has been met.
That requirement is that the correlation be high enough to have had a very low probability of occurring by chance alone. That is ascertained by computing (or looking up in a standard text) the statistical significance of the correlation. When we have very few pairs of observations (or cases) in our analysis, even a correlation as high as 0.90 can occur by sheer chance. And when we have a great many cases, even a figure as low as 0.30 can be statistically significant. To illustrate with what is known as the Z-test, statisticians have computed that a product moment correlation would have to be as high as 0.65 if the association between 12 sets of observations were to be thought of as having only a 1 percent probability of being mere coincidence. Conversely, if there were as many as 120 cases, they calculate that a correlation as low as 0.22 would also have only a 1 percent probability of being mere coincidence. In statistical parlance, we say that for a given number of cases, a given correlation score is "significant at the 1 percent (or 2 percent or 5 percent) level."
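The dependence of significance on the number of cases can be sketched with one common version of the Z-test, based on Fisher's transformation of the correlation coefficient. The text does not say which variant or which tail convention lies behind its figures, so both the Fisher approximation and the one-tailed threshold used here are assumptions, and `critical_r` is a name introduced for illustration.

```python
from math import sqrt, tanh
from statistics import NormalDist


def critical_r(n, alpha=0.01):
    """Smallest positive correlation significant at level alpha (one-tailed).

    Uses Fisher's z approximation: atanh(r) * sqrt(n - 3) is treated as a
    standard normal deviate when the true correlation is zero.
    """
    if n <= 3:
        raise ValueError("the approximation needs more than 3 observations")
    z = NormalDist().inv_cdf(1 - alpha)   # normal deviate cutting off alpha
    return tanh(z / sqrt(n - 3))


for cases in (12, 120):
    print(cases, round(critical_r(cases), 2))
```

Under these assumptions the threshold comes out at about 0.65 for 12 cases and about 0.21 for 120 cases, close to the figures cited above. A two-tailed test or an exact t-based table would give somewhat higher thresholds.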
Once we have ascertained that the strength of a given correlation, as well as its statistical significance, is sufficiently high (and the evaluation of "sufficiently" is a complex matter, still debated by statisticians and scientists), we can then go on to make a number of inferences about the predictive or the explanatory association between the variables being examined. The nature of those inferences and the justification for them is explored in the next section. Suffice it to say here that when two variables are strongly correlated, and one of them precedes the other in time, we have a typical form of correlational knowledge but are not yet able to say very much of an explanatory nature.
Explanatory Knowledge and Causal Inference
It should now be quite clear that operational classification and enumeration, combined with statistical analysis of the resulting data sets, can eventually produce a large body of correlational knowledge. Further, it should be evident that correlational knowledge can indeed provide a rather satisfactory basis for foreign policy prediction, despite the limitations noted above. But the major limitation lies in the difference between predictions based on correlations from the past, and predictions based on theories. Without a fairly good theory (which, it will be recalled, is more than either a hunch or a model), our predictions will often be vulnerable on two counts.
First, there is the problem that has often intrigued the philosopher of science and delighted the traditional humanist. If the decision makers of nation A have a fair idea what predictions are being made about them by the officials of nation B, they can often confound B by selecting a move or a strategy other than the one they think is expected. A good theory, however, has built into it just such contingencies, and can often cope with the "we think that they think that we think, etc." problem. Second, a good theory increases our ability to predict in cases that have no exact (or even approximate) parallel in history. That is, it permits us to first build up—via the inductive mode—a general set of propositions on the basis of the specific cases for which we do have historical evidence, and then to deduce—from the theory based on those general propositions—back down to specific cases for which we do not have historical evidence.
If theories are, then, quite important in the study of foreign policy, how do we go about building, verifying, improving, and consolidating them? To some extent, the answer depends on one's definition of a theory, and the word has, unfortunately, disparate meanings. To the layman, a theory is often nothing more than a hunch or an idea. Worse yet, some define theory as anything other than what is real or pragmatic or observable; hence the expression that such and such may be true "in theory, but not in practice." The problem here is that—and this is the second type of definition—a number of scientists also imply that same distinction by urging that a theory need not be true or accurate, as long as it is useful. To be sure, many theories do turn out to be useful (in the sense that they describe and predict reality) even though they are built upon assumptions that are not true. One example is in the field of economics, where some very useful theories rest on the assumption that most individuals act on the basis of purely materialistic, cost-versus-benefit calculations. We are fairly certain that a great many decisions are made on the basis of all sorts of noneconomic and nonrational considerations, but, somehow or other, the market or the firm nevertheless tends to behave as if individual shoppers, investors, and so on do make such calculations. The important point here is that the theory itself is not out of line with reality, but that the assumptions on which it rests may be untrue without weakening the predictive power of the theory.
This leads to the need for distinguishing between theories that are adequate for predictive purposes and those of a more comprehensive nature that seek to not only predict, but to explain. While the dividing line between them is by no means sharp and clear, we can nevertheless make a rough distinction between those theories that are supposed to tell us what happens, or will happen under certain conditions, and those that tell us why it happens. Even in economics, it is recognized that the predictive power of its major theories can be improved, and their explanatory adequacy markedly enhanced, by looking into and rectifying the psychological or other assumptions on which they rest.
Thus, even though short-run needs may be served by theories that are merely predictive, the concern here is with theories that are capable of explaining why certain regularities (and deviations therefrom) are indeed found in human affairs. To repeat the definition suggested earlier, a theory is a logically consistent body of existential and correlational propositions, most of which are in operational and testable form, and many of which have been tested and confirmed. This definition requires that all of the propositional components in the theory be, in principle at least, testable; further, if the theory is to explain why things occur as they do, the propositions underlying it must also be true. Given these stringent requirements, small wonder that there is so little in the way of explanatory theory in the social sciences.