In the previous chapter, I investigated the issue of the powers of individual entities—powerful particulars or “singularities”—to document themselves, in part, through their expressions, and so, within modern rights theories, to have claims to agency rights. As I have discussed, in the Western tradition we see a general historical trajectory from a “strong documentarity” of self-evident powerful particulars to the representation of entities by their empirical traces. In the empirical sciences, technologies, techniques, and inscriptions must construct affordances that allow for the expression of a studied entity’s dispositions. Drawing the line between science, as affordances that allow dispositions to be expressed, and engineering, which allows only certain dispositions to be expressed toward systematic use, is not always easy in the technosciences. This is particularly true in the social sciences, including information science, for the dispositions of a self are socially and culturally constructed and anticipated.
This chapter looks at the problem of evidence from what I call a “post-documentation” perspective. This perspective stresses the principles enumerated below as representative of the current state of documentarity as a shift to evidence attained more on the horizon of empiricism and sense, rather than categories and reference. Post-documentation technologies can result in computer-mediated judgments. Algorithms are not just the documentary “techniques” (Briet, 1951) of yesterday,1 but they are real-time technical mediations of information and communication, and with these, judgment. (Where previously in traditional documentation practice, documentary techniques and technologies were more explicit and external for judgment, now they are more implicit and folded within judgment.) Increasingly, Rimbaud’s modernist edict of “Je est un autre” (“I is an other”) becomes manifest as rapidly composed and changing judgments in real time, not only by human judgment, but also by computer-aided judgment and by computer-produced judgment. The future will increasingly be composed by computational judgments.
Post-documentation could be said to have the following principles:
Striving to represent particulars qua their powers, affects, and trajectories (“sense”).
Particularity should be represented through modes of expression proper to the individual and group types of the particular entity.
Modes of expression are emergent out of historical situations and situational contexts for individual particulars, which help afford their expressions and the meaning of those expressions for others. Human expressions are constantly mediated and afforded by sociocultural and technological inscriptions, though they can have some choices among which to be inscribed by, indexed through, and which to deploy in expressions. (Principle of historicity and freedom.)
Digital “information infrastructure” refers to sociocultural-computational techniques and technologies (techne) that result in meaningful or “informational” expressions (documents) that can be joined together in communicative exchanges and streams of information.
Reference for meaning and truth-value should be borne by sense, rather than solely by a priori class categories or formal parameters throughout the processing of information.
A priori categories, such as those that result from classification structures, can be heuristics for investigating entities, but they are only that.
Information is the product of communicative or affective relations. “Being” is a relation and a product of evolution. Being becomes evident and is recognized as evidence, but it can have evolved otherwise and could be recognized as otherwise from different perspectives and scales of evolution and analysis.
Reference is a product of the theories, methods, and technologies used to understand entities in relation to their powers. The subjects and objects of reference should be understood by their senses of expression, as seen within a “critical” epistemic perspective.
Science and scholarship are primarily sets of activities and social functions, not the product of such activities in statements, documents, or their institutions (e.g., books, libraries). These latter are expressions of science and scholarship in certain moments of their production and institutions.
“Document” still remains an important concept when analyzing post-documentation technologies, but in regard to these technologies, documents can be seen as their products as much as their inputs.
The central point that I will develop in this chapter is that of a contrast between documentary representation as a product of a priori categories of identity and difference (i.e., the products of traditional documentation technologies) and documentary representation as more a product of a posteriori or empirical sense (i.e., the products of “post-documentation” technologies); that is, post-documentation technologies are less technologies of direct reference and more technologies of sense (from which reference is then taken).
Though we have seen in this book the tension between concepts of documentarity as being reference and sense driven in the genres of “philosophy” and modern “literature,” respectively, I will suggest, in this chapter, that these genres have been bridged to some degree by post-documentation technologies today. Any understanding of information mediation must begin with the social-psychological pole that we can call “ideology,” on the one hand, and the technical pole that mediates this into the delivery of a “need,” on the other (Day, 2014). Online, we essentially live in a world of cybernetic systems, where technologies follow our expressions and position us in documentary space through data minute by minute, in small time scales (e.g., GPS location) or larger time scales (for most of us, in non-extraordinary times of our lives, Google search rankings). Like the servomechanisms in gunnery apparatuses during the Second World War, which exemplified the theory of cybernetics, information systems position us in order to maximize our information search “hits.” Information mediation for the human social (and even nonhuman, but human-managed) world spins around this dialectical axis of data being positioned for us through us being positioned by the data. The purpose of information technologies is to produce reference out of sense traces of data, which sometimes means creating single referential documents, or more often now means creating sensual vectors for agent identities through collecting recursive inclusions from the past, indexing the present in socio-documentary matrixes, and creating predictive networks to be shaped and verified by future data. Realist genres, whether textual or visual (like game playing), further lead to both high precision and recall by limiting the domains of informational sense and reference. 
We should realize that when we talk of the empiricism of post-documentation technologies, we are also involving paradigms for collecting and processing data and for making judgments about such. Like empiricism in the sciences, there are, of course, preliminary ontologies and other paradigms involved and realist expectations of what are pragmatic and valuable uses for information.
An historical account of this renewed empiricism in information science could start with the recent shift in the philosophy of language behind contemporary information retrieval. David C. Blair’s works during the 1980s up through his large volume on the importance of Ludwig Wittgenstein’s philosophy of language for information retrieval and information science (Blair, 2006) theoretically foreshadowed and accompanied a seismic documentary shift (at least on the Internet), from category-based indexing and searching to link-analysis, social-network, and machine learning systems. The shift from traditional documentary indexing by means of subject headings and other category or class descriptors to graph analysis was signaled in the search engine world by the triumph of Google’s PageRank algorithm in Google Search over Yahoo!’s earlier directory structure.
Underlying both the theoretical and practical shift in information retrieval is the epistemological perspective that meaning and value in language are products of its use, not products of categories of the mind. Mental categories are themselves viewed as products of language and its use in the world, in relation to language and nonlinguistic materials and events. Documentary categories and other modes of traditional reference are seen as products of cultural forms in social use. In contrast, older information retrieval, emerging out of traditional documentation technologies and techniques, forced social use to follow very professionally constructed and controlled cultural forms, such as controlled vocabulary or subject headings.
Traditional documentation was based on category application to objects, and in bibliography to the content of works. Users had to consult authoritative subject headings or other descriptors, which named entities by means of controlled vocabularies that construct identity through structures of difference and identity (e.g., x is a “dog” because x is not a “cat,” etc., in Library of Congress Subject Headings). In contrast, post-documentation technologies, such as social-network analyses and machine learning algorithms, give value and meaning based on statistical calculations of the use and the relations of data, including, of course, language use.
Returning to our historical account in order to provide more detail, let us recall that both Yahoo!’s directory structure and Google PageRank attempted to address the need to increase relevancy in searching massive datasets. Except in cases of searching unique names (e.g., the dinner party guests in the old Monty Python’s Flying Circus skit whose names are “A Sniveling Little Rat-Faced Git,” and his wife “Dreary Fat Boring Old Git”), information retrieval precision and recall generally have an inverse relationship to one another: more precision results in lower recall, and higher recall results in less precision.
Freely searching a database of indexed websites the size of the Internet can result in massive recall for a search term, which then leaves us with the problem of low precision or relevancy. Without further algorithmic adjustments, more recall of a keyword term on the Internet generally brings about less precision. For example, in the historical beginnings of the graphical user interface Internet during the mid-1990s, a search on “World War II” in AltaVista—initially, a more or less “pure” keyword search engine—brought back for the user all sorts of results mixed together without meaningful ranking for a user’s query: World War II documents, souvenirs, personal remembrances, and so on.2
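The inverse relationship between precision and recall described above can be sketched with a toy calculation. The document sets here are invented for illustration: a broad query retrieves more of the relevant documents (higher recall) but admits noise (lower precision), while a narrow query does the reverse.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for a set of retrieved documents."""
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 1.0
    return precision, recall

# Hypothetical documents actually about "World War II":
relevant = {"d1", "d2", "d3", "d4"}

narrow = {"d1", "d2"}                         # narrow query: few results, all relevant
broad = {"d1", "d2", "d3", "d5", "d6", "d7"}  # broad query: more hits, more noise

print(precision_recall(narrow, relevant))  # (1.0, 0.5): high precision, low recall
print(precision_recall(broad, relevant))   # (0.5, 0.75): lower precision, higher recall
```

The trade-off is structural: widening the net to capture more of the relevant documents inevitably scoops up irrelevant ones as well, which is exactly the problem a keyword engine faces on an Internet-sized collection.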
What to do to improve this? In the case of the search engine developers at the time, the answer seemed to be to look at what librarians and other documentalists did in the past when they had to organize information.
So, Yahoo! and some other companies at the time employed “ontologists” through either full human means or computer-assisted means to create categories or “directory structures” for searching (e.g., “Art>Painting>Europe>19th century>Impressionism”). However, one of the important limits of such a traditional documentary approach is that of determining the meaning of documents through names that represent the essential “aboutness” of a document. Such naming occurs through professionally managed controlled vocabulary, and so it requires consistency in naming by catalogers or ontologists, and on the other side of indexing, namely search, it requires that users know the preferred name for an indexed entity.
Google’s PageRank algorithm took a different approach, in part utilizing another library tool, that of citation indexing, whose origins go back to legal indexes in the nineteenth century and the important work in the second half of the twentieth century by Eugene Garfield in bibliometrics and his creation of the Science Citation Index (Rieder, 2012). Distinguishing it from systems such as Yahoo!’s directory structure, Google Search crawls and indexes the web and then increases relevance in search through a link-analysis system called PageRank. (There are other means for increasing relevance in Google Search, but link analysis was the distinguishing means from earlier systems like Yahoo!’s directory structure.) Link analysis is an adaptation of citation-analysis systems (which are still used in scholarly communication) for the indexing and ranking of documents on the Internet (and so, a tool for searching), which attempts to increase precision or relevance by social means.3 Relevance is increased through algorithmic calculations of what others believe are the most important documents for subjects. Seen from the philosophy of language, the success of PageRank (and hence Google Search as well) vindicated a Wittgensteinian philosophy of language, at least when applied to a socially broad database such as the Internet. It also established the principle that meaning, value, and categories, too, are established through relationships between documents and documents, people and documents, and people and people. Graph data structures and algorithms are thus an important tool in mapping meaning by the social use of language.
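The intuition behind link analysis can be sketched in a few lines. What follows is a simplified power-iteration version of the PageRank idea, not Google’s deployed algorithm, and the link structure is hypothetical; it shows only the core principle that a page is important when important pages link to it.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by link structure: rank flows along links, so a page
    accumulates importance from the pages that point to it."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = rank[page] / len(outlinks)
                for target in outlinks:
                    new[target] += damping * share
            else:  # dangling page: distribute its rank evenly to all pages
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank

# Hypothetical web: three pages link to "hub", which links back to "a".
web = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["a"]}
ranks = pagerank(web)
print(max(ranks, key=ranks.get))  # "hub": the most linked-to page ranks highest
```

No subject heading is consulted anywhere: the ranking emerges entirely from the “social” fact of who links to whom, which is the Wittgensteinian point at stake in the passage above.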
However, this same approach also has a downside, particularly when used in extremely large public document systems such as the Internet. The downside is that social sense doesn’t necessarily lead to knowledge or better knowledge. More information is not necessarily more or better knowledge. “Knowledge,” at least in the institutional sense of this term during modernity, has referred to information that has passed through institutional authority for verification and assurance as to its likely truth-value. Such means require methods of evaluation, peer review, consensus among experts, and at least potentially, challenges to the factuality or truth of claims. (Even in nonscientific institutions, such as journalism, fact-checking should take place.) Google Search indexes, retrieves, and ranks documents that are “information” in many senses of the term—opinions and rumors, as well as institutionally vetted knowledge documents. Seen from the perspective of modernist knowledge institutions (scientific, bibliographic, and so forth) Google Search is, generally, an information indexing and retrieval system, not a knowledge indexing and retrieval system. (In contrast, for example, Science Citation Index, as a scholarly communication index that indexes peer-reviewed articles in approved journals, is a “knowledge” indexing and retrieval system in the institutional sense of the term.)
Earlier in this book we discussed Paul Otlet’s theoretical understanding of documentation, based on his belief that books and other documentary materials contain factual representations of the world and that classifications are knowledge organization systems. If traditional documentation is an epistemology and a set of technical tools based on creating and applying exclusive classes or categories of aboutness to entities, then what about the phenomenological senses of those entities? What happens to the documentary paradigm if the “aboutness” of an entity is determined more by its own trajectories, relationships with other entities, and the material-semiotic affordances for expression and agency, rather than by categories applied to it? In other words, what happens when sense recomposes the categories?
Controlled vocabularies of documentation metadata (e.g., subject headings and thesaurus terms) have restrictive levels of sense, created by means of rigid totalities of structures for imposing identities and differences among terms. Controlled vocabulary controls variance in the senses of words, so as to produce reference. An entity is either a cat or a dog in the Library of Congress Subject Headings (LCSH). There are no “dogcats” as subject concepts in LCSH. Controlled vocabulary is a sort of firm linguistic structuralism.4
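This rigidity can be illustrated schematically. The headings and cross-references below are invented, but the behavior mirrors how a controlled vocabulary constructs reference: every variant term is funneled to exactly one authorized heading, and a hybrid sense (“dogcats”) is simply unrepresentable within the structure.

```python
AUTHORIZED = {"Dogs", "Cats"}                 # exclusive subject classes

SEE_REFERENCES = {"puppies": "Dogs",          # variant senses funneled to one heading
                  "canines": "Dogs",
                  "kittens": "Cats",
                  "felines": "Cats"}

def authorize(term):
    """Resolve a searcher's word to the single authorized heading, or fail."""
    if term in AUTHORIZED:
        return term
    if term.lower() in SEE_REFERENCES:
        return SEE_REFERENCES[term.lower()]
    # Hybrid or unlisted senses have no place in the structure at all:
    raise KeyError(f"'{term}' is not an authorized subject heading")

print(authorize("puppies"))   # "Dogs"
```

The structuralism is visible in the data itself: identity (“Dogs”) is produced by difference (not “Cats”), and any sense that falls between the classes is excluded rather than represented.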
In order to remedy this restriction of sense in the construction of reference within older documentation technologies, while still providing precision or search relevance, newer information technologies try to amplify the sensibility of indexed terms through incorporating the historical, social, and geographical uses of these terms, both during indexing and in the process of searching. Examining the proximity of signs to one another, their semantic relationships, the social networks and grammars of their use, perhaps location coordinates or time values, and the searches of others and the user’s previous searches, newer documentary technologies use social sense to find a probabilistically better balance between search recall and precision than was available either through older documentation technologies or through pure keyword indexing.
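A toy reranking sketch may make this blending concrete. All weights, documents, and signals here are invented; the point is only that documents with identical keyword relevance can be ordered differently once social signals (link authority) and personal “sense” signals (the user’s search history) enter the score.

```python
doc_topic = {"d1": "history", "d2": "souvenirs", "d3": "memoirs"}

def rerank(results, link_scores, user_history):
    """Order keyword matches by a blend of keyword, social, and personal signals."""
    def score(doc):
        keyword = results[doc]                        # raw term-match score
        authority = link_scores.get(doc, 0.0)         # what others link to
        personal = 1.0 if doc_topic[doc] in user_history else 0.0
        return 0.4 * keyword + 0.4 * authority + 0.2 * personal
    return sorted(results, key=score, reverse=True)

results = {"d1": 0.9, "d2": 0.9, "d3": 0.8}           # near-identical keyword relevance
link_scores = {"d1": 0.7, "d2": 0.1, "d3": 0.4}       # social signal
history = {"history", "memoirs"}                      # personal signal

print(rerank(results, link_scores, history))  # ['d1', 'd3', 'd2']
```

On keyword relevance alone, d1 and d2 are tied; the social and personal signals break the tie and even lift d3 above d2, which is the “probabilistically better balance” the passage describes.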
From the perspective of philosophy of language, in his book Becoming-Social in a Networked Age, Neal Thomas (2018) has deeply explored sense and reference creation through “new media.” Thomas analyzes three forms of graph relations and their philosophies of language operating in these media today: knowledge graphs (e.g., linked data and the semantic web), social graphs (e.g., Facebook algorithms), and predictive-analytic graphs (e.g., machine learning), all constituting “post-documentary” (what I’m calling here, “post-documentation”) technologies (Thomas, 2018). Graph relations create reference by means of sense through logical and algorithmic calculations of semantic, social, and recursive data inputs.
For Thomas (2018), the key to understanding post-documentation technologies is understanding how they locate and eventually package data into meaningful referential identities and categories through different techniques of rationality (e.g., analytical reasoning, social communication, and predictive computing). In the case of personal computing and mobile devices, input increasingly includes the actions and past actions of system users for calculating likely information and likely relationships between documentary items or between data. Increasingly in the development of these technologies, the post–World War II perspective of cybernetics in information system design becomes more apparent: the “user” of a system is used by the system in order to create meaning for the user and for others (Thomas, 2012; Day, 2014). The user is a data point in a communicative feedback system that rationalizes the whole, sometimes “on the fly,” and sometimes further in machine processing. In short, the user is a function within a documentary system that isn’t just present for the user, but rather indexes the user based on past occurrences of data and, increasingly in online systems, predicted futures of data. Post-documentation technologies maintain the perspective, inherited from traditional documentation systems, that the user’s needs are constituted by the documents available, but now they make the user a function within iterative feedback or dialectical loops that are increasingly indexed to both users’ and documents’ past, present, and future sensibilities via relationships, weighted judgments, and predictive learning using varieties of data types (language, geographical position, friends, time, etc.).
With post-documentation technologies, the technical structures of traditional documentary technologies disappear into the background, particularly in real-time use, buried in the “black box” of learning algorithms, and the social appears as data—as a given. “Social” and “technical,” “users” and “documents” are dialectical poles within an algorithmic phenomenology of information and need. Social and technical parameters and functions, as well as concepts of users and documents, disappear into the “facts” of information and experiential knowledge on the Internet. What are hidden from view are not only the technical, but also the ideological and social parameters of search, including, as Thomas (2018) shows, the philosophical assumptions working in computational approaches.
Returning to a central theme and the vocabulary in my first book, The Modern Invention of Information: Discourse, History, and Power, we must always remember that “information” on the Internet, or in whatever textual form, is “dragged” from experience into what used to be called “the virtual,” and back again. Texts, as virtual spaces, are potential spaces for meaning. These are “informed” not simply by other texts, but by means of experience. Search engines index and rank the real into the virtual, and these virtual spaces are actualized and made real by means of experience.
“Structured data,” in classification structures, and to some extent with linked data, still maintain an epistemic distance between the semantic or “virtual” space of the text and that of experience. It is with link-analysis and other social systems that we have begun to see the greater folding of lived experience into information systems and the mutual overlapping of formerly more distinct information, communication, and media spheres.
As with older documentation tools, such as classification structures, the danger of “the information age” is that of its seeming disappearance at the moment of its triumph; that is, the triumph of the mediation of reality by “the virtual,” the internalization of what was a distant and explicit documentary tool into a necessary affordance for not only knowing, but also perceiving and actively judging, in everyday life. This is when all of reality is mediated by “virtual” (or to use this term in another of its senses, by potential) relations. When the text becomes real. That we no longer speak of digital information as “virtual” (and with this, as potential), and we no longer speak of “the information age,” but now live it as the real and logically possible, instead of as simply potential, is, I think, indicative of the evolution of the information age into being time itself for us.
We earlier looked at the characters in Madame Bovary as examples of such an event in a much earlier age of documentation. And we saw Derrida’s reference to the problem of the fable, or the “as-if” structure of texts, when they are made real as informational media. My earlier books, and now this one, have been tracing this trajectory of the realization of the modern sense of information in our present time as ordinary lived experience, and its long evolution from the beginnings of Western metaphysics until now.
The notions of “information” as belonging to the domains of documentation and fact, and of “literature” as belonging to the domains of fiction and the imagination, are in part being blurred by information being detached from institutions of knowledge production and their techniques and methods of verifying information. The line has never been solid, of course, since “literature,” as well as art, as we have seen, is made up of all sorts of performative activities using techniques and methods that are then, at least in personal experience, “verified.” Also, human beings constantly rely upon noninstitutional information, treated as knowledge, to get through their daily lives. Personal knowledge and experience are, of course, very important in our lives.
However, despite these issues, there remains the problem of the blurring of models of literary or aesthetic reality meant as analogical road maps for personal experience with institutionally mediated knowledge. Modern institutional knowledge arose in order to address problems with speculative, theological, and purely personal, experiential knowledge. In our present day, this blurring (to the detriment of institutional knowledge) has to do with political conditions of resentment, distrust, and skepticism toward “official” institutions, and also with the availability of, and constant daily mediation of our lives by, noninstitutional information of many types.
Institutionally mediated and produced knowledge is always going to cost—in the time needed to produce, understand, finance, and preserve it. Libraries, laboratories, and universities are all parts of knowledge construction and circulation systems, though they do, of course, handle information of other types. They can be costly to run. Information that everyone already more or less knows, of course, is likely to be cheaper in all these aspects.
Because communication now occurs in published forms in social media as short documentary assertions (e.g., Twitter tweets and the like), we sometimes assume that the rhetorical and institutional knowledge construction procedures and modes of circulation that have existed for documents exist for these documentary fragments, as well. We may treat these fragments or “memes” as the conclusions to documentary enthymemes. We can be fooled by the evidential trust that we have learned to put into published texts. And when these fragments are then algorithmically used to generate more fragments or generate networks of fragments or fuller documents of fragments, what we then may encounter are networks and constellations of assumptions, hysteria, and innuendo, where the agreement of other “speakers” lends credence to the truth of the assertions being made.
As I write this, the fait accompli of the merging of fiction and documentation is being widely discussed: Phenomena such as “fake news” and political memes have had tremendous political effects. And while false information in the modern media is nothing new, what is new today is the speed and vast participation in live-time feedback systems that are merging communication and documentary ecologies into lived “information.”
Today there is a mixing of documentation, communication, and “media” ecologies, all with claims of being “information.” What is lacking is an inquiry into how each is “informative” and whether their informative elements are knowledge. And if they are knowledge, what are the manners and criteria by which their claims are validated?
A “post-truth” era shouldn’t be the same as an era of falsity, but rather, it should be an era of the true. It should be an era of understanding different types of truth claims and their rhetorical and social forms, including identifying flatly false assertions.
As I discussed in The Modern Invention of Information, Walter Benjamin’s works of the 1920s and 1930s comment upon a dynamic of new media forms (cinema and radio) of “information” in Weimar and Nazi Germany similar to what has occurred with the Internet during the past thirty years. Then (both in Germany and elsewhere, such as in the US) and now, trust in mainstream institutions and liberal state political parties eroded due to economic distress and political corruption, and along with this the fragile foundations of the modern democratic state, lying in a trust in modern institutions and bourgeois cultural habits, dissolved.5 In the light of the appearance of “new media,” together with distrust of social, cultural, and political institutions, modernist knowledge institutions were cast aside and “the facts” of information were stated through new information technologies. New media do not automatically lead to the erosion of modernist knowledge institutions and the resurgence of prejudice, but they open up the space for new kinds or sources of information at lower economic and transactional costs. As Benjamin observed in the case of film (Benjamin, 1968b), at first this can be quite liberating for people who are trying to gain information, as the technological and institutional hindrances seem to be, at least at the point of searching, removed. And new media may focus (literally, in the case of film) upon events that were little depicted earlier, making available and magnifying that which escaped older media and knowledge institutions. But then this information space often becomes remediated through the return of old media and old social prejudices within that space.
The problem of the circulation of the documentary fragment in media and its reconstitution as a document by ideology and social prejudice well illustrates the increasingly “sensual” nature of documentary “aboutness” or reference, the shift to social senses, such as taste, for information, and yet the carryover of the aura of documentary knowledge. We expect our published information to be, if not true, then at least potentially truthful, or at least potentially arguable, and there to be evidence. Social sense, however, is not limited to this. Much of our use of the Internet is governed by taste—by likes and dislikes.
I am not a computer scientist, so my comments here will be very brief, but this book would be incomplete if I didn’t at least touch on machine learning and its construction of judgments through neural networks and deep learning techniques and technologies. It seems to me that such technologies are at the forefront of creating categories and representations as computer-mediated judgments, and so also, they suggest and generate prescribed actions that might arise from these, for both humans and machines.
Computer-mediated judgments take the form of, first, support systems for human judgments and, second, direct computer judgments. Examples of support systems are AI systems that support human activities such as identifying suspected persons in crimes. Direct computer judgments are those that result in direct action following computation or chains of computational systems, such as future automated weapons systems that could identify and kill suspects, or fully autonomous cars. Computer-mediated judgments take place throughout the Kantian categories of judgments: those weighing data and deriving conclusions (theoretical or knowledge judgments), decision support systems that enable human and machine action (practical or moral judgments), and even the automated construction of novels and other works of art (as products of judgments of taste). Increasingly, processing results will be integrated into other processing systems, bridging theoretical, practical, and aesthetic judgments.
Directly or indirectly, already and increasingly in the future, beings will be shaped by such computational judgments, as computers continue to shape us as both active subjects and as the objects of knowledge, action, and taste. Our children no longer see themselves as the autonomous subjects that we saw ourselves as even thirty years ago, and increasingly we come to view our own being as evolutionary residues of biomes and microbiomes, reaching from the microscopic to the planetary. With computer-mediated judgments we move from information technologies to knowledge technologies to judgment technologies, making manifest the historical role of inscriptionality and its material and conceptual tools as technologies of faculties of mind and action. Increasingly, and to a degree larger than now, computational technologies will not only be supplements for organizing and recommending judgments for us to make, but will directly make and enact judgments, with us as inputs in their systems. And increasingly, we will feel comfortable with these judgments in ways that may not be true now. Technology, like custom, tends to seduce us all.
Departing from a traditional AI paradigm of symbol manipulations, where machines are programmed a priori to capture all the details of any given situation, “machine learning” derives values based on iteratively weighted input data. Much depends on the initial parameters for learning and for supervised learning, of course, but once a training set has been learned, then this learning can be extended to new instances, modifying the algorithmic weighting mechanisms themselves. These new instances allow machine learning to be both predictive and generative of new knowledge. Technologies that involve weighting are inherently judgmental, even if only initially. As Ethem Alpaydin (2016) writes,
Every learning algorithm makes a set of assumptions about the data to find a unique model, and this set of assumptions is called the inductive bias of the learning algorithm.
This ability of generalization is the basic power of machine learning: it allows going beyond the training instances. Of course, there is no guarantee that a machine learning model generalizes correctly—it depends on how suitable the model is, and how well the model parameters are optimized—but if it does generalize well, we have a model that is much more than the data. A student who can solve only the exercises that the teacher previously solved in class has not fully mastered the subject: we want them to acquire a sufficiently general understanding from those examples so that they can also solve new questions about the same topic. (p. 42)
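Alpaydin’s point about generalization can be illustrated with a deliberately tiny learner. The data, labels, and the nearest-centroid method here are invented for illustration: parameters are fitted to training examples and then applied to an instance the learner has never seen, which is the “going beyond the training instances” that the quotation describes.

```python
def train(examples):
    """Learn one centroid (mean value) per class from labeled (value, label) pairs."""
    sums, counts = {}, {}
    for value, label in examples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, value):
    """Classify an unseen value by its nearest learned centroid."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))

training = [(1.0, "short"), (2.0, "short"), (8.0, "long"), (9.0, "long")]
model = train(training)      # learned parameters: {"short": 1.5, "long": 8.5}

print(predict(model, 3.0))   # "short": an instance absent from the training set
```

The inductive bias is plainly visible here: the model assumes each class clusters around a single mean. If that assumption fits the data, the model generalizes well; if not, no amount of training data rescues it, which is exactly the caveat in the quoted passage.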
Machine learning must not just generalize, but it also needs to be able to adjust to deviant cases. It must be able to learn about particulars by their particular actions, not just by statistical averages. But through concentrating upon particulars in their histories, machine learning can also be susceptible to not only “catastrophic forgetting” (or “catastrophic inference”) (French, 1999), but also “catastrophic remembering” (Sharkey & Sharkey, 1995) of cases, giving undue weight to earlier learned events.6
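Catastrophic forgetting can be illustrated with a minimal online learner (the data and learning rule are invented): a single weight updated only on recent cases drifts away from, and effectively overwrites, what it learned from earlier cases.

```python
def online_update(weight, example, lr=0.5):
    """Move the single learned weight toward each new example."""
    return weight + lr * (example - weight)

weight = 0.0
for x in [1.0, 1.0, 1.0, 1.0]:   # first task: examples cluster near 1.0
    weight = online_update(weight, x)
first_task_weight = weight        # 0.9375: the early cases are well represented

for x in [5.0, 5.0, 5.0, 5.0]:   # second task: new cases cluster near 5.0
    weight = online_update(weight, x)

print(round(weight, 2))           # 4.75: the earlier learning is largely overwritten
```

The mirror-image failure, catastrophic remembering, would correspond to a learning rate so small that the weight never leaves the neighborhood of the first task, giving undue weight to what was learned first; real systems must balance between the two.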
The extension of artificial intelligence to sense in the lived world constitutes an important step in the attempt to model the particular as particular. Seeing particulars in terms of emergent dispositional powers opens up paths for better understanding cell expressions in cancer and other natural phenomena, as dispositional expressions in environments of affordances and suppression, as well as social phenomena. However, to reemphasize Alpaydin’s point above, what one ends up with still depends upon the empirical situation chosen, the inductive bias, the mediating parameters and other functions imposed and utilized, and lastly, the ends for which the processing occurs. In the social sciences, there is also the issue that the object of study and the method of studying it are shaped by the initial ontology chosen (e.g., in psychology, whether one considers the brain or culture as “mind”). The attempt to model particulars as particulars in their temporal expressions retains the problem not only of inscription but also of representation: in the choice and application of data, in the processing of data, in the understanding of results, and in the social use of those results. Strong documentarity can come in the back door of weaker documentarity.