How Should We Think About Big Data?

Harold Sjursen

Professor Emeritus at NYU Tandon School of Engineering

Big Data has rapidly become a subject of interest and controversy, but how should we approach and understand it? Harold Sjursen proposes a broad philosophical perspective that contextualizes it in light of a redefinition of the human condition.

Big Data is such a large and interesting topic, calling for a theory of everything. How can we begin to approach it, and why is it important? Despite its au courant focus on the new knowledge embedded in and now being released from Big Data, the questions being posed are perennial themes of philosophy appearing in new guise. The prisoners in Plato’s Cave Allegory were likewise called upon to rethink the human condition, based upon the unveiling of new knowledge previously sequestered behind the veil of false appearances. By mining the depths of Big Data, its proponents argue, we will see through false constructs and understand in what sense we, too, have been prisoners, and will subsequently redefine the human condition and be better able to place ourselves on the road to liberation.

Let’s start with a story:

It’s Manhattan in the 1960s, and everything seems up for grabs. Two priests, who were boyhood friends growing up in Brooklyn in the 1930s, keep up their friendship by meeting weekly for lunch. One is a Jesuit (cerebral, intellectual, intense), the other a Franciscan (compassionate, relaxed, living to realize Pax et Bonum). Their friendship is nurtured by a guilty question: Is it acceptable to smoke and pray at the same time? They meet at a small Italian restaurant just south of Greenwich Village and, over eggplant parmigiana, discuss pressing issues. Inevitably, their conflict over the propriety of smoking and praying arrives as the topic for theological reflection. The discussion follows the canonical methods of disinterested scholarship, hermeneutics and apologetics, eudemonistic ethics, and enlightened psychology. Their prodigious collective memory consults the Bible, the Church Fathers, Augustine, Aquinas. They invoke positivistic accounts of language, the later Heidegger’s non-objectifying thinking, and the principles of Carl Rogers’ client-centred therapy, but still the solution eludes them. Their schedules are full, and they finally agree to pursue the question further the next time they meet.

A week later, they return to the same restaurant, and upon arrival each notes a look of self-satisfaction on the face of the other. “Father J, you’re looking rather pleased with yourself today,” says Father F. The Jesuit replies in kind, noting the Franciscan’s delight bordering on smugness. “Well, I have solved our puzzle,” the Franciscan says. “The answer is No!” His Jesuit companion, taken aback, retorts, “But that can’t be. We discussed it thoroughly, and the answer is undoubtedly Yes.” After a few moments of silent puzzlement, Father J finally asks: “What question did you pose?” Without hesitation, Father F confidently replies, “Exactly the question we puzzled over: Is it all right to smoke while praying?” The Jesuit then allows that he thinks he understands the contradiction. “Ah, in our conversations we discussed praying while smoking.” For if, while smoking, one witnesses an act exemplifying the grace of God and responds sincerely with a spontaneous prayer, that is of course acceptable and proper; but if one is in the midst of fulfilling the priestly duty of administering the holy sacraments, then smoking would be an abomination! It’s all in how you frame the question.

But how do we frame the question, and indeed, given the resources of Big Data, what are the questions? The theme of this inaugural issue of HAS Magazine connects Big Data, creativity, and the human condition. Big Data as a concept within the engineering discipline of informatics was described at the beginning of the 21st century. Its famous definition, advanced by Doug Laney, an analyst at Gartner, concisely identifies the potential challenges before us: “Big Data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.”¹

Like the priests in the story, we believe that there are definitive answers to the existential questions of how we should live our lives, if only we could know and understand the sources. But unlike our hapless pair, we do not encounter Big Data as a set of canonical texts with established, if disputed, methods of interpretation. On the contrary, Big Data (like dark matter, some have said) is normally invisible to us: it is wildly heterogeneous, dynamic, and in perpetual flux. Yet we believe that if only we can find the key to this treasure trove, the abundance of insight unlocked will allow us to pursue the good. Today this kind of techno-optimism may be somewhat muted, but we still hope that with the right computational heuristics we will be able to mine the data and organize it so as to yield key information, permitting the best decisions and ultimately the solution of our most vexing and threatening problems.

These aspirations allow for a variety of creative approaches. Just as there are many ways to search for pebbles on the beach, and as many ways to use or play with those collected, so our imagination is given a full range of opportunity when facing the expansive universe of Big Data. Will such creative efforts express insight, and will they lead us to understand the existential dilemmas of the good and of how to live well? More to the point, perhaps, will they initiate or advance a rethinking of the human condition?

The proposition that the conjunction of Big Data, creativity, and thinking offers a possible way to understand the human condition radically reframes the enduring questions behind Socrates’ central admonition to “know thyself.” Socrates was surely suggesting a moral imperative, something we ought to do for the sake of living the good and just life. But what knowing oneself really means, and how one goes about doing so, are persistent and open questions. The very idea that the use of Big Data can facilitate a better awareness or understanding of the human condition is both novel and, from a traditional philosophical point of view, against the stream.

In the philosophical tradition, the relationship of thought, knowledge, and understanding to action, or praxis, has been much discussed without strong consensus. One can find arguments both for their mutual distinctiveness and for the contrary notion that on some level they are the same. Common sense suggests, as reflected in Doug Laney’s formulation, that thinking precedes action, and that the effectiveness and quality of action are enhanced in rough proportion to the accuracy, detail, and correctness of the thinking. Thus it is assumed that thinking prepares the way for action, and that the better informed the thinking, the more likely successful action will follow. But are the collection and analysis of Big Data a mode of thinking, such that the commonsense belief that they can improve action is warranted?

The results of Big Data mining can hardly be likened to the standard body of scientific evidence, let alone to the contemplation of personal experience. Our awareness of Big Data is almost hypothetical. Of course, in ordinary experience we are also frequently removed from crucial evidence that remains invisible to us unless mediated by technology such as the microscope, and in this sense Big Data superficially resembles much scientific information. But scientific evidence of this sort, produced through laboratory experimentation or fieldwork, is normally an enlargement of something of which we have immediate awareness, a clue to or symptom of an underlying complexity. The case of Big Data is different: what is purportedly disclosed comes as a surprise because we had no evidence suggesting it, only theoretical conjectures. For this reason it can be compared to dark matter, which we know about primarily by inference. Dark matter is posited as necessary for the universe to hold together, but what we know of it is hardly more than that. So the matter of Big Data may very well influence our lives in significant ways of which we are unaware, and knowledge of it might change our understanding of the human condition. This may be the premise motivating data mining.

But Big Data is more than a matter of practicality. It has inspired creative appropriation by artists like my friend and colleague, Luke DuBois. DuBois is an academically trained musician (in performance and composition) and a visual artist who is completely at home in the world of digital media. Truly an artist, he nonetheless thinks of himself (as do many other contemporary artists I know) as a kind of engineer, on the understanding that engineering is what artists actually do. One of his most interesting recent projects, A More Perfect Union, has been reported with great enthusiasm in the business press, probably because it deploys attitudes towards Big Data that seem to resonate with Doug Laney’s famous definition.²

DuBois’ approach is both ironic and challenging. He encourages us to think about what reality is: not an abstract, cosmological account of reality, but the reality of our day-to-day, lived experience. He does this by mining a database, namely, the words that members of online dating services use to describe themselves. DuBois describes the project as follows:

“A More Perfect Union is a large-scale artwork based on online dating and the United States Census. In progress since 2008, the work attempts to create an alternative census based not on the socio-economic fact but on socio-cultural identity.

In the summer of 2010 I joined 21 different online dating services and “spidered” their contents, downloading 19 million profiles of single Americans. These profiles were sorted by zip code and analyzed for significant words. A series of national, state and city maps (43 in all) show this data in various ways. Most notably, a set of prints shows a road atlas of the United States, with the city names replaced by the word used by more people in that city than anywhere else in the country. This lexicon of American romance, as it were, consists of more than 200,000 unique words, and gives an imperfect, but extremely interesting perspective on how Americans describe themselves in a forum where the objective is love.”³

In this project, large heterogeneous data sets are culled and juxtaposed, bringing an aspect of ordinary life into new and surprising focus. The subject, how one presents oneself when seeking romance, addresses something of our understanding of the human condition, suggesting how we understand basic human characteristics such as erotic desire and the need for companionship. Importantly, however, it also indicates that, in this context, we do not know ourselves, and might not recognize ourselves, without the kind of analysis this project performs.

As it was reported in the Financial Times, “What [people like DuBois] are doing is trying to convey the secret life of data in a way that is elegant and exciting… we have gone from a very literal view of data to a very emotional view.”⁴

This project would seem to satisfy the elements of the proposition that through creativity, Big Data can help us to redefine and thus better understand the human condition. But is that what is actually being done? Are enumerated and correlated records of large amounts of human behaviour (statements or actions) indicative of what makes humanity what it is? Does this enhance our insight and lead to better decision-making? Pragmatically, perhaps. If knowledge of the most successful terminology for finding a romantic partner will lead to my greater success in finding such a partner, then in that sense it can guide me to making a better decision. This seems doubtful, but even if it is the case, it does not afford anything like a better understanding of the human condition. And if this is how we make decisions, are we following our inward light, are we in possession of any genuine insight, or are we merely performing a calculative process that is possibly devoid of any understanding whatsoever?

I cited Socrates’ admonition to know thyself as conveying a moral dimension, but self-knowledge is often elusive. Socrates’ injunction is more than a moral admonition—it’s an epistemic challenge as well. How does one know oneself? Our introspective self-examinations may lead us to reinforce beliefs that obscure genuine self-understanding. Are summations of the data of our lives any more auspicious a path to self-understanding?

Another of DuBois’ projects engages this question. He explains the project, titled Self-Portrait, 1993-2014, in this way:

“The term quantified selfie was, to my knowledge, coined by Maureen O’Connor in 2013. Writing in New York Magazine (Heartbreak and the Quantified Selfie, 12/2/13), O’Connor discusses the Tumblr blog of journalist Lam Thuy Vo and the work of designer Nick Felton in the framework of a larger cultural trend in which the narcissism of social media and the ubiquity of Big Data collide in a new form of self-portraiture. These data portraits often co-opt, parodically or otherwise, the visual semantics of post-Tufte infographics for the purposes of generating content for the Millennialist online sharing.

The self-portrait I created consists of a force-directed graph of my email since September, 1993. In layman’s terms, imagine a “big bang” of a universe of personal and professional e-mail sent and received for 20 years; the different people in this universe have different mass and gravity, causing galaxies of attraction to form; those in constant dialogue with one another, or whose language is more familiar, or loving, have stronger bonds of attraction. The five or so primary e-mail addresses I’ve used over the years appear in the centre of this star map, with the several thousand people I’ve corresponded to surrounding them in clusters of sentiment and carbon-copy.”⁵

Portraits both reveal and conceal something of the human condition. That is, they open our eyes to perhaps unnoticed dimensions of self-presentation while simultaneously protecting or reinforcing the subject’s position in the world. The official portrait of a university president, for example, is intended to show how an individual embodies the spirit of the institution, both preserving its legacy and leading it forward to master the new challenges of the future. That is to say, portraits create a person, institution, or event while asserting its natural compatibility and salutary relationship with the human condition writ large. The veracity of a portrayal is a function of its selectivity, and this is no less true of portrayals built from previously unnoticed factoids uncovered by data mining.

So how seriously should we take efforts to reframe the world according to the results of Big Data disclosures? DuBois’ ironic redescription of commonplace beliefs is playful, and a reminder that what we see is sometimes little more than what we want to see. Our understanding of the human condition, no less than our seeing of the world around us, is an intentional act, formed and guided by tradition and necessity. The humorous question, “Is it OK to smoke and pray at the same time?” illustrates this aspect of our understanding of the human condition. Big Data indeed provides a platform for creatively redefining the human condition, but is it a disclosure of truths hidden deep within the collective human psyche or, on the contrary, an arbitrary collection of things and events that we enlist as evidence in support of our contingent desires?

Consider the three components of Doug Laney’s definition of Big Data: (1) high-volume, high-velocity and high-variety information assets, (2) that demand cost-effective, innovative forms of information processing (3) for enhanced insight and decision making. We notice that the source (1) is not accessible to ordinary observation or comprehension. It is too vast, changes too quickly, and is too diverse for that. Because the source is normally invisible, these characteristics may evoke a sense of awe when we first become aware of them. Next it is asserted (2) that this awesome source makes demands of us, viz., we are to know it through innovative information processing. Conventional modes of information processing will not do. And finally, (3) those who inquire in the proper way will be rewarded. The implicit message is anti-democratic: the reward is not for everyone, not even for most people, but only for a select few. The philosophers or high priests of Big Data can access this source and, at their discretion, mediate the enhanced insight they possess for the benefit of the many.

This doctrine has been put forward before; politics and religion both offer examples. We have mentioned the Platonic version found in the Republic. The gnostic paradigm⁶ suggests another, perhaps more insidious, version. According to the Gnostics of late antiquity, the truth is concealed, and humanity is generally imprisoned in a body surrounded by veils of ignorance. A secret message is conveyed to a select few, providing the salvific key to break out of this constraining environment and on to understanding and liberation. Is it too great a stretch to think of Big Data in these terms, as an unapproachable deity that can provide the secret message that will lift the veil of ignorance and bring humanity to a brighter future? Are artists like Luke DuBois or analysts like Doug Laney the purveyors of such a secret message?⁷

If we believe Aristotle, the human condition is one not of certainty but of wonder. The question of purpose, of the purpose of action, and the belief that there must be purpose, that things make sense, support the conviction that with enhanced insight, beneficial decisions are possible and progress can be made. Behind the idea of progress is the assumption of fixity, a stability against which motion towards a goal is possible. On this view, the human condition is largely a quest for understanding.

This belief in progress, together with the quest for certainty, fomented the crisis of modernity from Descartes to Kant. For Descartes, the discovery that what was evident to ordinary observation, and had been validated by metaphysics since Aristotle, was false called for a wholesale and radical reassessment of all knowledge. His method was to disbelieve, or at least to doubt, everything one had been taught and had taken experience to confirm. Descartes called this discovery our new knowledge, a precarious formulation that ultimately required the severing of mind from body, and the declaration that God is no deceiver, to legitimate it. The faith that Descartes’ God required was faith in the enhanced insight afforded by modern mathematics (of which Descartes was a prominent founder). Descartes’ assertion that mathematical rationality could both succinctly summarize the true nature of the physical world and demarcate the limits of human insight was eventually capped, and partially refuted, by Kant’s famous declaration that “I had to deny knowledge in order to make room for faith.” Kant also observed: “The schematicism by which our understanding deals with the phenomenal world… is a skill so deeply hidden in the human soul that we shall hardly guess the secret trick that Nature here employs.”⁸

Kant acknowledges, much like the advocates of Big Data theory, that the source of our knowledge (the noumena) is beyond our grasp, and that what appears to us (phenomena) is due to the structure of human reason itself. The ways of nature are beyond our ken while still determinative of our well-being. Conformity to duty becomes the key ethical principle, the guide for our actions, and the basis of our hope.

The promise of Big Data is the claim that, through the data-mining technology of information science, we can penetrate Kant’s noumena or, in other words, escape the limitations of pure reason. The new knowledge disclosed is (or will be) salvific in that it promises to put us on the road to progress. In this way, it is possible to transcend the limits and constraints on the human condition as Kant understood them. This approach to Big Data is inherently gnostic: it is predicated on the communication of secret knowledge (from a demythologized deity) conveyed by a messenger to an elect few. The messenger of this secret knowledge is technology, aided for the present by human under-labourers. The salvific promise entails the subordination of human action to data-mining technology. Indeed, given the presupposed complexity of the fields of Big Data, it must be the case that successful data mining can ultimately be accomplished only by computing devices managed by artificial intelligence. Clearly, such an eventuality would redefine the human condition, the nature of human action, and the existential meaning of being human.

An alternative way of conceiving the human condition, one that preserves the integrity of human action, has been suggested by Hannah Arendt. Let us approach her theory from the standpoint of thinking. Descartes’ famous designation of a human being as a thinking thing (res cogitans) of course raises the questions of just what thinking is, why it is the defining characteristic of humanity, and why humans choose to think. Kant was critical of what he called Denker vom Gewerbe (professional thinkers), because he held thinking to be the natural disposition of all humanity. Yet when referring to the highest interests of humanity (for Kant, God, Freedom, and Immortality), he opposes those he mocks as the Luftbaumeister (builders of castles in the air) of reason, people who would try to establish the truth about these matters through arguments removed from all common experience and understanding. For Arendt, the problem is precisely how to see thinking in terms of common experience and understanding. Mental activity that is disconnected from such understanding (as the calculative heuristics of mining Big Data would indeed be) cannot lead to action and to our determination of ourselves as agents of the human prospect.

In her aptly titled book, The Human Condition, Arendt draws several useful distinctions: between the public and private realms; between the vita activa (active life) and the vita contemplativa (contemplative life); and among the three types of activity within the vita activa, namely labour, work, and action. Unlike the philosophical tradition, Arendt does not regard the contemplative life as superior to the life of action. Action is not dependent upon the formative influence of thought, and the goal of action need not be to change understanding; Arendt is not simply inverting Marx’s 11th thesis. Whereas Marx argues that humans are animal laborans, that is, defined by the necessity of labour, Arendt asks: what if automation (AI technology) frees us from this necessity, so that we no longer need to labour merely to survive? Work, in her scheme, is different: whereas labour is what one does simply to survive, work has other goals and produces durable objects. Action, the third category, includes what we ordinarily call action as well as speech; it is the way humans present themselves to each other, and it is distinctly human. Being human implies the ability to act. It is through action that the human world is created and maintained, and through action that human community is sustained. But this is due to difference, not conformity to an unchanging essence: the human condition is contingent, beginning anew with each birth, and hence a matter of ever-changing possibility. “Human plurality, the basic condition of both action and speech, has the twofold character of equality and distinction. If men were not equal, they could neither understand each other…”⁹

Arendt supplants Cartesian mind-body dualism with more subtle distinctions in which human action is neither predetermined nor the emulation of an ideal type. Moreover, with her famous emphasis on natality, she underlines the fact that each birth establishes a new beginning, with new possibilities and hope. A Hegelian view of history is thereby ruled out. Like Kierkegaard, Arendt sees new individuals as the foundation of the human condition. These individuals are, to be sure, thinkers, but thinkers in the midst of lived experience, contributing to the common realm of possibility by working through diverse opinions.

The 24th World Congress of Philosophy was held in Beijing in August 2018. The theme of the Congress was Learning to Be Human. The Congress represented all branches of philosophy and vigorously pursued the general theme from multiple perspectives. Big Data, however, was not a prominent concern among the participants. The idea of learning to be human stands out in an age when many think post-humanity is in its incipient stages, or already upon us. In this context, the question of learning how to be human assumes a new urgency. It is a step beyond the Socratic injunction to know thyself in order to live well in accord with the good, the beautiful, and the just. The question becomes how, or whether, it is possible to coexist in a world in which non-human entities, cyborgs in possession of intelligent agency, determine the social and cultural norms available to humans. It is curious, and perhaps distressing, that the reality of Big Data, with its inextricable bond to such devices as intelligent robots, has not emerged as one of philosophy’s leading concerns.

As we have suggested, the accessibility of Big Data radically reframes the question of what it means to be human, and of the state of the human condition. This reframing challenges the traditional formulations of philosophy from antiquity and the Enlightenment. Big Data is not available to us either through rational, deductive logic or through sense perception, the two sources of all knowledge that Descartes argued were exhaustive. Moreover, given the dynamic, even volatile, state of Big Data, an epistemology yielding certainty is out of the question. The approach advocated in the techno-business world suggests a dangerous gnostic typology based upon privileged access to a body of hidden knowledge that can offer the enhanced insight necessary for a life of excellence. The mining of Big Data is offered as the new paradigm, obviating approaches rooted in common experience. On this view, Arendt’s notion of action within a pluralistic world of competing doxa, derived from experience in the public realm, is likewise inapplicable.

Where do we turn? The challenge presented by Big Data seems to be this: in a world where decisions are based on aggregations of information beyond the parameters of natural access, is it possible to sustain an idea of humanity that preserves our unique status as agents who can pursue the good, the true, and the beautiful? Creative attempts to redefine the human condition in works of art suggest, as several of Luke DuBois’ projects do, that rather than being active agents, we are caught unawares in the volatility of Big Data’s dynamism. This question should surely be high on the agenda of philosophy’s quest to learn how to be human.

1. Svetlana Sicular, “Gartner’s Big Data Definition Consists of Three Parts, Not to Be Confused with Three ‘V’s,” Forbes.
2. Gillian Tett, “The art of Big Data,” Financial Times, July 5, 2013.
3. Luke DuBois.
4. Tett, “The art of Big Data,” Financial Times.
5. Luke DuBois.
6. The term “gnostic paradigm” refers to ideas held by the Gnostics of late antiquity but is broader than the inverted theological cosmology they proclaimed. See Hans Jonas, Gnosis und spätantiker Geist.
7. I very seriously doubt that either has entertained anything like the gnostic typology. I mean only that their work hints at structural similarities.
8. Both remarks are found in Kant’s Kritik der reinen Vernunft.
9. Arendt, The Human Condition.

Harold Sjursen has been a teacher and administrator in higher education for over 40 years, serving on the faculty of a liberal arts college and a school of engineering. With an educational background in the history of philosophy, he has had a lifelong interest in science and technology. His current research and writing interests focus on the philosophy of technology, global philosophy, and technological ethics.

http://harold-sjursen.org/

