Author: Anatoly Detwyler; reposted from the WeChat public account DH数字人文
Commentary
Anatoly Detwyler / University of Wisconsin-Madison
———————————–
Humanities scholarship is increasingly aided by forms of computational analysis, from natural language processing and text mining to network and geographical mapping. What unites computational techniques is their basis in data collection, manipulation, and analysis. To study data is to search out patterns and anomalies: what stands out might reveal new things—or confirm received knowledge—about historical relationships or aesthetic trends. Good analysis begins with good data: the distribution of a corpus or database will shape the kinds of questions one can ask, and the quality of the answers received. Such a statement comes as no surprise to anyone with a basic training in statistics, where sampling is an integral part of the science. This includes the social sciences, which for a long time have predominantly focused on the conceits of quantitative analysis (Marx, Freud, and Weber are now more likely to be read in history and literature courses!).
But when it comes to the humanities, the new importance given to data—especially literary and historical data—has become part of a vital and ongoing conversation about the direction of the field. Is quantitative, data-driven analysis fundamentally different from our older practices of focused, careful reading, and the reconstruction of history using evidence? If data analysis and forms of close reading are to be reconciled, how should it be done?
As high-stakes as they are, these questions are open-ended. They have no correct or final answer, and anyone who claims otherwise risks ignoring either the real gains or the real threats that data poses to humanistic knowledge production. One’s response to such questions serves as a proxy for one’s attitude toward the field of “digital humanities” that is currently spreading across university departments and centers inside and outside of China. As data and its many processes (from datafication and digitization to quantitative, scaled analysis) become more prevalent, we are faced with the task of retracing their older manifestations and uses in our field. (Of course, it is also important to examine the role of data in the development of modern existence and knowledge more generally.) When, where, and how did early forms of data analysis and pattern recognition become integral, or even simply possible, as tools of humanistic inquiry? Answering this question reveals forgotten or repressed historical episodes in which older generations of scholars leveraged data analysis to read, think about, and ultimately produce new forms of cultural and historical knowledge. A prehistory of the digital humanities would thus give us important context for considering the situation of the digital humanities today, including not just this new field’s myriad attractions but also its limitations, disappointments, and failures, and the forms of resistance and skepticism that they engender. Perhaps such an inquiry would even revise the broader self-identity of our field, challenging longstanding assumptions about the divisions between the humanities and the quantitative disciplines.
The concept of a “prehistory” is itself problematic. The word calls attention to a gap between two eras, where the latter period anachronistically marks the former through the absence of some defining thing or quality. Binding two eras together implies more than a thematic connection: it asserts continuity and even causality. Prehistory thus both affirms conventional historical periodization and challenges it by expanding an event horizon, pointing to an earlier origin. The digital humanities, though still in its relative infancy, is already very diffuse and famously difficult to define precisely; it is difficult enough to write a coherent history of the field, never mind excavating a prehistory! If one merely focuses on a fairly concrete marker such as the name “digital humanities” (or less influential but more evocative terms such as “distant reading” or “cultural analytics”) and simply tracks its appearance in scholarly discourse, the result is little more than a limited form of discourse analysis. It doesn’t tell us the process by which members of a field achieved their self-awareness. And the fatal flaw of over-focusing on words is that historical actors often lack the terms to describe their conditions of existence (one need only think of oxygen, which humans had been breathing long before it was identified and named). In order to push beyond the epistemological boundaries of a name, one runs up against questions of ontology. What, really, is the historical object that is the “digital humanities”? Does its “digital” apply only to digital computers? If so, then the historical horizon wouldn’t extend back beyond the 1950s: consider, for example, the early collaborations between the Italian Jesuit priest Roberto Busa and the computer company IBM, which resulted in a concordance of St. Thomas Aquinas’s work recorded entirely on punch cards, or the 1964 conference organized by IBM on “humanities computing” and literary data processing, an event that inaugurated a fecund period of experimentation and dialog, including the foundation of the journal Computers and the Humanities in 1966. A fuller understanding of the digital humanities before its period of consolidation in the late 1970s and 1980s must await further research.
Such projects already illustrate how scholars had been experimenting with computers to explore questions related to text, authorship, and language for decades before the humanities became “digital.” But these examples don’t come near to exhausting the possible prehistory of the digital humanities. If one takes a more expansive approach, defining digital humanities by method and technique rather than by the role of the computer, then many vital connections spanning the computing era and the preceding century can be identified. Indeed, forms of counting and numeracy have been part of humanistic knowledge production since the early 1800s, the period when the humanities took shape as the discipline we recognize today. Examples range from nineteenth-century German philologists counting the poetic meter of classical Greek poetry to the experiments of the physicist Thomas Mendenhall, who measured words by their number of letters in order to determine the authorship of certain Shakespearean works. Like the history of the early computing age, this picture is still being filled in. Within this early history, one of the most interesting cases is the role of quantitative analysis and statistical reasoning in the formation and institutionalization of the humanities at Tsinghua University in the 1920s.
Patterns: Distance as Knowing
Insofar as this episode has a beginning, it is a lecture that the famous intellectual Liang Qichao gave at Southeast University (Dongnan daxue) in November of 1922. The speech was transcribed and published in the widely read Morning News Supplement (Chenbao fukan), giving us a fascinating, if overlooked, record of an invention. In this speech, Liang introduced a new method that he was developing, titled “historical statistics” (Lishi tongjixue).
As its name suggests, the method applies the principles of statistics to historical data in order to identify historical trends and patterns. Liang’s inspiration was the rise and fall of China’s population across dynasties. This interest in historical population was not new; it had for several decades drawn the close attention of late Qing intellectuals interested in social reform. Indeed, population was at the center of a growing interest in biopower, including the ideas of Thomas Malthus and eugenics, and Liang himself had published an article on population in 1903 in the pages of his journal New Citizen Journal (Xinmin congbao). In this earlier piece, he looked to recent history to explain and criticize the unreliability of the Qing government’s figures and the state’s poor management of the population. Twenty years later, however, Liang reversed the relationship between statistics and historiography. Instead of using history to explain a popular statistic (that of China’s population as 400,000,000) and its implications for China’s domestic and international situation, Liang now sought to put statistics in the service of writing history. (This doesn’t represent a retreat from politics or contemporary significance, but rather Liang’s interest in scholarly rigor, a turn that is reflected in the ambitious works of scholarship that he produced in this final stage of his career.) The new method, simply put, aimed to collect and appraise all the small details and facts that even a very careful scholar might otherwise ignore when reading historical accounts. As Liang memorably puts it:
“……欲知历史真相,决不能单看台面上几个大人物几桩大事件便算完结;最要的是看出全个社会的活动变化。全个社会的活动变化,要集积起来比较一番才能看见。往往有很小的事,平常人绝不注意者,一旦把他同类全搜集起来,分别部居一研究,便可以发现出极新奇的现象而且发明出极有价值的原则……统计学的作用,是要“观其大较”。换句话说:是专要看各种事物的平均状况, 拉匀了算总账。” […To know the true face of history, it is not enough to look only at a few great figures and great events on center stage. What matters most is to discern the activities and changes of society as a whole, which become visible only when the records are accumulated and compared. There are often very small matters to which ordinary people pay no attention at all; yet once all instances of the same kind are collected, sorted into categories, and studied, one can discover very novel phenomena and derive very valuable principles… The purpose of statistics is to “observe macrotrends” (Guan qi da jiao). In other words, it is to look at the average condition of all kinds of things, evening everything out and reckoning the total account.]
As we look back a century, Liang Qichao continues to astound us with the breadth of his interests and his intellectual creativity. “Historical statistics” is a quintessentially modern moment, representative of a proliferating interest in the Republican era in new scientific approaches to history and literature, and its sociological orientation represents the “passion for facts” that is characteristic of the period.[1] But Liang’s method stands out in particular as a precursor to later experiments with quantitative analysis by the Annales School and cliometrics. It is even possible to see it as a forerunner of the digital humanities, in particular the “distant reading” of Franco Moretti, with its focus on “units that are much smaller or much larger than the text”: devices, themes, and tropes on the one hand, genres and systems on the other.[2] Indeed, when Jiang Wentao and I established our academic column in the Shandong Journal of Social Sciences (Shandong shehui kexue) three years ago, we decided to adopt Liang’s eloquent phrase “Observing Macrotrends” (Guan qi da jiao) as a title to honor this parallel.
But while “historical statistics” opens a prehistory of the digital humanities, it also casts China’s much older historiographical tradition in a new light. Specifically, Liang does not merely credit his invention to the introduction of western statistical science to China. Instead, he positions it as a synthesis between modern western science and the work of Qing Evidential Learning (Kaozheng xue) scholars such as Gu Donggao, whose magisterial study List of Big Events in the Spring and Autumn Period (Chunqiu dashi biao) disintegrates the classic Spring and Autumn Annals (Chunqiu) into a series of charts (biao) organizing names, events, and places into neat registers. The influence of Gu’s work on Liang demonstrates the contingency and slipperiness of locating “prehistory” itself. Even if we look at List of Big Events in the Spring and Autumn Period as a kind of origin point, we must acknowledge that its main technology, the chart, itself has a prehistory that goes back much further than the Qing. Indeed, as a way of grouping information and structuring it so that it is accessible and easy to compare between points within a set, the chart resembles an early form of database, or data frame. What is new in Liang’s method is the use of numbers to symbolically manipulate the data (though the use of numbers for the representation of life and events is itself ancient, dating back far before written history; in English, “digit” refers both to a number and to a finger, denoting the latter’s use as a numerical index. Humans, and culture, have always been digital). Historical statistics (Lishi tongjixue) is thus a kind of originary event, but one that builds upon older practices and technologies.
In the context of 1920s China, Liang Qichao was more interested in looking forward to the future of scholarship. He envisioned a large-scale project of twenty-four general charts (tong biao) that would supplement China’s twenty-four general histories (tong shi). Though he would pass away before initiating this ambitious program, his method enjoyed broad purchase amongst his contemporaries, inspiring in the following decade a number of studies on the geographical distribution of historical figures (here we can draw another parallel with contemporary scholarship: Harvard’s China Biographical Database Project, which showcases the great wealth of prosopographical information in China’s textual record). But nowhere is Liang’s influence more evident than in the work of the classicist scholar Wei Juxian (卫聚贤, 1899-1989), who worked directly to develop historical statistics into a universal method that anyone could use. The case of Wei is of particular interest to the prehistory of digital humanities because of his interest in moving beyond historical sociology into the realm of textual analysis.
Using an Abacus to Do History
In the mid-1920s, Liang Qichao, along with other prominent historians such as Chen Yinke (陈寅恪) and Wang Guowei (王国维), formed a core group of instructors within Tsinghua University’s Institute of National Learning (Guoxue yanjiu yuan 国学研究院), a short-lived but dynamic institution that played a key role in the development of China’s modern scholarly episteme. This period, much like our own today, when academic institutions and scholarly disciplines are in great flux, reminds us that some of the most interesting ideas resulted from serious attempts to reconcile traditional and modern epistemologies and methods. It is true that, retroactively, the discipline of National Learning has come to be considered a conservative field of scholarship, thanks in part to its nativism and its opposition to (or at least difference from) the cosmopolitanism of May Fourth intellectualism. But it is worth recalling that many of those associated with National Learning explicitly sought to investigate China’s national history using new, modern tools. One of the best-known advocates of this project was Hu Shi (胡适), who, along with Fu Sinian and Gu Jiegang, called for the “re-organization” (重新整理) of traditional historiography by bringing it into closer alignment with the tenets of natural science. As Hu famously exhorted, scholars should be “bold in proposing hypotheses, and minute in seeking out evidence” (大胆的假设,小心的求证). Within this move to scientize scholarly method, however, only Liang’s historical method used statistical science, basing arguments on the calculation of average states rather than the identification of logical inconsistencies. Even within the Institute of National Learning, historical statistics was relatively marginalized: it does not appear to have been taught to students, and Liang’s colleagues did not adopt it for their projects.
The exception to this was the work of one student at the Institute, Wei Juxian. Wei entered the Institute of National Learning with a somewhat untraditional educational background, having studied accounting in business school before switching to history. He would later recount how he was frequently teased by his classmates at Tsinghua, who, upon seeing him performing his research with an abacus in hand, derided him as an unintellectual “merchant.” But Wei’s interest in accounting and the tabulation of data made the empiricism of statistics especially attractive to him. During his time at Tsinghua, he worked hard to expand Liang’s method into a fuller set of tools, publishing a series of texts explaining how to use statistics in the study of the past, and showcasing the results of such work.
To get a clear sense of Wei’s agenda, one need only flip through a key article that appeared in 1929, an abbreviated version of Wei’s primary research project titled “The Method of Applied Statistics in the Reorganization of National Studies” (Yingyong tongji de fangfa zhengli guoxue 应用统计的方法整理国学). That the article was published in Eastern Miscellany (Dongfang zazhi 东方杂志), one of the most widely circulating popular journals of its day, suggests that Wei and the journal’s editors envisioned a broad appeal for historical statistics; and the article helped cement Wei’s reputation as the leading advocate of the method. In Wei’s hands, the application of “historical statistics” expanded into “statistical historiography” (Tongji lishixue 统计历史学), where any text could become a kind of population of individual words or characters, all of which could in turn be counted and analyzed. The article firmly anchors the value of the statistical method in the rhetorical and visual appeal of data visualizations, including over a dozen finely crafted and beautiful pie charts, graphs, and tables that compare the language and content of the Spring and Autumn Annals and the Zuozhuan commentary.
Several years later, following a series of lectures at Chizhi Academy (Chizhi xueyuan 持志学院) in Shanghai, Wei expanded on his article by publishing a textbook in 1934 titled simply Historical Statistics, which aimed at operationalizing the method more fully. From articulating a definition of “data” and how to extract it from a text, to how to calculate it and express it visually in order to infer historical facts, Wei’s book stands as the most comprehensive explanation and demonstration of historical statistics. Here the historian is reimagined as a social scientist who surveys the past. Reviewing various observational methods that produce data, such as direct surveying or sampling, Wei proposes a new category, that of “indexing” (索隐, also glossed as 引得), which refers to the process of extracting data directly from historical documents. Relying on his own experiences at Tsinghua, Wei describes the materiality and mental labor of indexing in detail, for example instructing the reader to avoid reading and notating a text at the same time because, he warns, the brain can’t do both at once; or instructing the reader to use a pen or a colored pencil to mark place names, events, characters, or passages, and then collect them onto index cards (for which he provides a handy template). On its own, this discussion is a fascinating and original account of datafication. But it is folded into a larger description of statistical method that charts analysis across three stages, each of which makes the textual data increasingly abstract or processed: one begins with a statistical genealogy (tongji pu 统计谱) that processes the text, then turns it into a statistical chart (tongji biao 统计表) by expressing the data with numbers that can be statistically analyzed, and, finally, summarizes the results of the analysis with a statistical graph (tongji tu 统计图), making them easy to understand and visually appealing. Armed with this protocol, anyone could perform statistical historiography.
What is particularly striking is how aptly this process describes digital humanities today. Any introductory course in China on the subject could assign this text for its first week.
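To make the parallel concrete, the three-stage protocol described above (genealogy, chart, graph) can be loosely sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reconstruction of Wei’s actual procedure: the function names, the toy passages, and the list of terms are all hypothetical, and plain substring matching stands in for the careful manual marking and index cards that Wei describes.

```python
from collections import Counter


def build_index(passages, terms):
    """Stage 1 (cf. tongji pu): record every place a term of interest occurs.

    Each entry plays the role of one index card: (term, passage id, offset).
    Matching is naive substring search, an assumption made for brevity.
    """
    index = []
    for passage_id, text in passages:
        for term in terms:
            start = text.find(term)
            while start != -1:
                index.append((term, passage_id, start))
                start = text.find(term, start + len(term))
    return index


def tabulate(index):
    """Stage 2 (cf. tongji biao): reduce the index cards to counts per term."""
    return Counter(term for term, _, _ in index)


def render_graph(table, width=20):
    """Stage 3 (cf. tongji tu): draw the counts as a plain-text bar chart."""
    if not table:
        return ""
    peak = max(table.values())
    lines = []
    for term, count in table.most_common():
        bar = "#" * max(1, round(width * count / peak))
        lines.append(f"{term:<8}{bar} {count}")
    return "\n".join(lines)


# Hypothetical toy passages, invented purely for illustration.
PASSAGES = [
    ("p1", "the duke met the envoy; the envoy returned"),
    ("p2", "the duke attacked; the army returned"),
]
TERMS = ["duke", "envoy", "army"]

if __name__ == "__main__":
    cards = build_index(PASSAGES, TERMS)
    print(render_graph(tabulate(cards)))
```

Each stage consumes only the output of the one before it, which mirrors the increasing abstraction of the protocol: located passages become numbers, and numbers become a picture.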
But the explanation and illustration of this historiographical method take up only the first third of the work. The rest of the text also deserves mention in our critique of prehistories, for it is devoted to the subject of “The History of Statistics in China” (中国统计学史). Here, Wei shows an acute awareness of the need to couch his method in nativist terms and national history. As he writes in the introduction:
中国人的保守观念传统思想非常的大,以为统计学乃是外来的,中国的国学用不着用外人的方法去研究。殊不知统计学是中国的土产,中国的古人曾屡为用;现在将中国土产的图谱学略为改造为统计学,使之研究中国的国学,当较前人的成绩为佳。故作此中国统计学史一文,以为呼醒! [The conservative outlook and traditional thinking of the Chinese are very strong: they take statistics to be a foreign import and think that China’s National Learning need not be studied with foreigners’ methods. Little do they know that statistics is a native product of China, one that the ancients used time and again. If we now slightly adapt China’s native learning of charts and genealogies into statistics and use it to study National Learning, the results should surpass those of our predecessors. I have therefore written this essay on the history of Chinese statistics as a wake-up call!]
This isn’t simply a cynical strategy to legitimate his method amongst his peers: Wei Juxian is earnest in his attempt to prove that various sorts of statistical practices occurred earlier in China than in Europe. But more interesting than the tendentious claim that statistics “originated” in China is Wei’s wide-ranging discussion of various sorts of information management in Chinese history. The result is an unprecedented history of research on data practices in China that, from our vantage point today, constitutes a prehistory to the prehistory of digital humanities, a kind of endless layering that challenges the notion that contemporary digital analysis is unique or without precedent.
Conclusion: a flower that bloomed before its season and therefore left no seed
Like Liang, Wei Juxian did not pursue the development of historical statistics beyond his initial engagement with it. What’s worth paying attention to are not the results of these scholars’ experiments, but rather the methods that they proposed to employ. Collectively these two scholars have left us with a fascinating episode that not only suggests possibilities for a more systematic history of information management and data analysis in modern Chinese scholarship, but also offers a vital point of comparison with which to examine our present moment. Here we have pointed to some of the more obvious similarities between historical statistics and digital humanities insofar as they both desire to marry together empirical or quantitative methods with a field of knowledge that is traditionally more interpretive.
But the differences are equally important. While it seems that the only thing missing is the automation of labor (the tabulation, data extraction, and mathematical analysis) made possible by computers and modern interfaces, the latter technologies do create a marked difference in the scale and complexity of analysis. Historical statistics really could be conducted on an abacus. In contrast, a computationally heavy and iterative process such as the fairly established technique of topic modeling (which allows one to analyze millions of documents at once and identify groups within the set by shared themes or “topics”) represents a major leap in the capacity to identify patterns in data. Techniques like topic modeling are in fact so sophisticated that they pose a kind of black box, sealing the process of analysis from the human operator, thereby making the computer and its algorithm an active partner rather than simply a passive instrument. Another key difference is attitudinal. Liang and Wei were both positivists, and embraced quantitative knowledge as a kind of certainty. Put differently, historical statistics represents a distinctly modernist passion for systems, rationalization, efficiency, and progress. If its promoters found it flawed, it was only because the technologies available in their day were inadequate to their vision. (And this is true despite this method’s connections to earlier modes of scholarship: given their interest in empirical knowledge and the authentication of texts, it is easy to imagine that scholars of evidential learning would have embraced Liang’s positivist attitude.) By contrast, the best examples of digital humanities scholarship are open and reflexive about the limitations of their results, in terms of project design as well as statistical significance and confidence.
Together, the differences and likenesses can illuminate the historical specificity of the digital humanities today. To a degree, this episode suggests that the digital humanities is not exactly derivative or an entirely recent import. To be sure, I do not mean to imply that digital humanities somehow originated in China, either; as I have emphasized, all origin stories are problematic. Instead, this episode is one small piece within a larger, global puzzle that is episodic and largely discontinuous. But we don’t need to draw a direct lineage between historical statistics and the digital humanities in order to gain inspiration from it. The Japanese scholar Kojin Karatani memorably described the work of the author Natsume Soseki as “a flower that bloomed before its season and therefore left no seed” (一朵忽然之间绽开的花). We could say the same thing about the Tsinghua scholars of the 1920s. But this flower’s season has now arrived. As the digital humanities grows today, let us recognize ourselves as inheritors of this earlier spirit of exploration and openness.
Notes:
[1] Tong Lam, A Passion for Facts: Social Surveys and the Construction of the Chinese Nation-State, 1900-1949 (Berkeley: University of California Press, 2011).
[2] Franco Moretti, “Conjectures on World Literature,” New Left Review 1 (January-February 2000). Online at: http://newleftreview.org/II/1/franco-moretti-conjectures-on-world-literature (accessed 4/15/2017).
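Editor | Jiang Wentao (姜文涛)
Originally published in Digital Humanities (《数字人文》), 2020, no. 1. Please contact us for permission before reprinting.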