One of the challenges of studying technology is learning how to disentangle the tools from the political ideologies held by those who develop them. As Timothy Mitchell explains in his book Colonising Egypt, the practice of science and systems of ordering national standards are modern projects that enable governments to maintain discipline and surveillance. A cog in the colonial project, the science of documenting every political act reflects Mitchell’s observation that the “tendency of disciplinary mechanisms, as Michel Foucault has called these modern strategies of control, was not to expand and dissipate as before, but to infiltrate, and colonise.”
In this article, I will discuss a set of research methodologies that emerge at the intersection of cultural and technical analytics in the fields of digital humanities and new media studies. This analytic approach is born out of the contemporary socio-historical moment, in which we face new scales of information that demand machine computation, and political influences on our cultural practices that are systematic, transnational, and mobile. I will also summarize the limitations of this new media approach compared to other methods of media analysis. Given these challenges, how can we frame new cultural insights that we glean from the deployment of such methods? How can they act as a means of understanding contemporary media culture?
My first objective is to provide some historical context for the growing interest in the analytic properties of social media content. I argue that the stock of “primary data,” though not determinant, is important in the process of conducting nuanced cultural analyses. I consider the work of data visualizations, especially in relationship to the stock of primary data (actual tweets, status posts, etc).
My second objective is to argue for the importance of paying attention to the specific language of the stock of primary “social media” data. In my work, I focus specifically on the use of Arabic online. Of course, there is the cultural moment of what the United States calls “The Arab Spring.” But most analyses of this so-called “social media” revolution have not taken into consideration the meaning of actual Arabic language use. After harvesting and analyzing Twitter posts for more than three years (2008-2012), I became aware that the use of the Arabic language online was steadily rising. Hence, by presenting this work, I hope we can come closer to identifying and addressing the gaps in the textual analysis of digital information on the Middle East and North Africa (MENA).
Fig 1. These weighted averages indicate a high level of Arabic tweets on Egypt since 2010.
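One rough way to track how much of a tweet archive is written in Arabic is to measure the share of Arabic-script letters in each post. The sketch below is illustrative only and is not the procedure behind Fig 1: the function names, the 0.5 threshold, and the (month, text) input shape are all assumptions. It also detects script rather than language, so Persian and Urdu posts would be counted as Arabic script.

```python
from collections import defaultdict

# Unicode ranges that hold Arabic-script letters (assumed sufficient for a sketch).
ARABIC_RANGES = [
    (0x0600, 0x06FF),  # Arabic
    (0x0750, 0x077F),  # Arabic Supplement
    (0xFB50, 0xFDFF),  # Arabic Presentation Forms-A
    (0xFE70, 0xFEFC),  # Arabic Presentation Forms-B
]

def is_arabic_char(ch):
    """Return True if the character falls in an Arabic-script range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ARABIC_RANGES)

def arabic_script_ratio(text):
    """Fraction of alphabetic characters in text that are Arabic script."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(is_arabic_char(ch) for ch in letters) / len(letters)

def monthly_share(tweets, threshold=0.5):
    """Given (month, text) pairs, return month -> share of Arabic-script posts.

    A post counts as Arabic script when its ratio meets the threshold;
    the threshold value is a hypothetical choice, not an established one.
    """
    totals = defaultdict(int)
    arabic = defaultdict(int)
    for month, text in tweets:
        totals[month] += 1
        if arabic_script_ratio(text) >= threshold:
            arabic[month] += 1
    return {month: arabic[month] / totals[month] for month in totals}
```

Comparing the resulting shares month by month gives a first, coarse reading of language use before any closer analysis of what the posts actually say.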
Short Genealogy of Cultural Analytics
Within an already emergent field of digital humanities and new media studies, cultural analytics is a methodology employed by few and deliberated on by far fewer. There are virtually no peer-reviewed essays, nor any books on this approach to date. There are only a couple of significant publications on cultural analytics, including a blog post by Kurt Ralske and an interview by Kevin Franklin and Karen Rodriguez. Aside from these, what remains are just references to new media work by a handful of artists and, perhaps most famously, a proposal for funding by well-known new media theorist Lev Manovich, in which he outlines his vision in this trajectory.
It is no secret that cultural analytics, as defined and practiced by Manovich, grew out of a large National Endowment for the Humanities grant. In an interview he gave on the topic, Manovich said, “[my] idea of cultural analytics is related to the NEH Digital Humanities Initiative recently announced "Humanities High-Performance Computing" (HHPC) initiative, but there are some important differences.” In the interview Manovich went on to distinguish his methods from a previously NEH funded initiative in several ways: (1) he is not interested in “past cultures (the traditional domain of humanities), but in contemporary cultural areas;” (2) while others have focused on text, “he plans to focus on visual media;” and (3) “building on the exciting work in visualization done today both by scientists and by artists and designers, I want to use this work as an interface for computational analysis.” Like database-driven publishing platforms such as Vectors journal and Scalar, whose aim is to produce digital scholarship, the tool developed in this research project, ImagePlot, similarly aims to visualize collections of images and video of any size for the production of digital scholarship, specifically in the field of visual arts and culture.
While one of the intentions of Manovich’s project in cultural analytics is to create an interdisciplinary interface/tool, the focus of his work is firmly rooted in the humanities, where the research receives its funding. Yet a lack of balance between interpretive debate and the technological production of the research flattens the significance of his analyses. For example, one of his early projects, One Million Manga Pages, produced groundbreaking visualizations of millions of Japanese manga pages that were collected from an online scanlation site: OneManga.com. However, the analysis presented did not provide much information about the history of manga in Japan or the significance of this digital copying phenomenon in growing fan culture, nor did it attempt a semantic analysis of these comics.
And this is where I believe more contextual analysis combined with computer analytics can provide scholarship that produces studies in both context and genealogy, while remaining able to analyze the digital structure itself across its “reproducibility” and “spreadability.” In circumstances and conditions where technical and formal restraints make rhetorical rigor and evocative description challenging, I suggest the alteration of preexisting forms of presentation and/or generation of new forms.
Just a few days ago, Manovich published a nuanced reading of social media that builds on his earlier work. In his latest piece, “Data stream, database, timeline (part 1),” he described a shift from analyzing databases to analyzing data streams. “I want to suggest that in social media, as it developed until now (2004-2012), database no longer rules. Instead, social media brings forward a new form: a data stream. Instead of browsing or searching a collection of objects, a user experiences the continuous flow of events,” argues Manovich. And though I do not necessarily adhere to the same scope and definition of analyzing large scales of cultural data as Manovich does, I do agree that his pioneering efforts are critical as we move forward into progressively greater scales of media production.
I caution future researchers to remain mindful of checks and balances between contributions by culture theorists and technologists. In his article on the subject titled “What is Cultural Analytics?” Kurt Ralske offers a description that may be seen as an imposition of the scientific method onto the humanities, implying that quantitative analyses can provide more accurate, meaningful, and insightful commentary than qualitative analysis:
Meaningful measurements can be developed to help describe the qualities of a work, which may be invoked to support a theoretical position. For example: one can reasonably theorize that, ‘the films of director Michelangelo Antonioni are slower than those of Frederico Fellini.’ Cultural analytic techniques could verify this hypothesis conclusively, and even provide a specific metric to quantify the exact difference in speed between the oeuvres of the two directors.
This overt bias overlooks the cultural nuances of human expression and decontextualizes data, thereby endangering the accuracy of the analysis and strongly biasing the results. I argue that such methodological bias renders the quantitative approach as subjective as the qualitative. It may be useful to note that several quantitative methods of media analysis have failed in interesting ways that offer valuable insight through practice-based research. In fact, the gross inaccuracies themselves have led to new findings about using old methods of analysis on streams of social media content.
Identity and Sentiment Analysis
One example of an interesting failure in sentiment analysis occurred over a year ago in Cairo. In preparation for a conference at the American University in Cairo in June 2011, I prepared sentiment analyses (English and Arabic tweets separately) on likely Egyptian presidential candidates based on a month of Twitter posts that included the hashtags #Jan25 and #Egypt. Abdel Moneim Abou El-Fotouh, Hisham Bastawisi, and Amr Moussa were among the candidates who remained in the race through the first round, while candidates who were referenced in larger volumes of posts, such as Mohamed el-Baradei and Naguib Sawiris, dropped out earlier.
While the volume of tweets on each candidate is a precise representation of the data sampled, the accuracy of the sentiment values in these graphs remains questionable. The same technical process was used to determine sentiment for the English-language tweets within the sample data, and in those charts presidential candidate Mortada Mansour came out with a 100% positive sentiment. Anyone who had been following the Egyptian media scene knew such a statistic was wildly inaccurate, since Mansour was a notorious, controversial character who was perceived by the majority of Egyptians as a buffoon. Since most of the positive sentiment expressed in tweets reflected positive expressions about Mansour in jest or parody, the tool and process used to determine sentiment clearly failed to detect sarcasm and tonality.
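The failure described above is easy to reproduce with a naive word-list scorer. The sketch below is not the tool used for Figure 2; the tiny lexicon, the function names, and the sample posts about an unnamed candidate are all hypothetical and written only to illustrate the mechanism. Because the scorer counts positive and negative words without any sense of tone, posts that praise someone in jest are labeled positive.

```python
# Hypothetical mini-lexicon; real sentiment tools use far larger resources.
POSITIVE = {"great", "hero", "love", "best", "genius"}
NEGATIVE = {"corrupt", "liar", "worst", "hate"}

def score(text):
    """Label a post by counting lexicon hits, ignoring tone and context."""
    words = [w.strip(".,!?#\"'").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

def percent_positive(texts):
    """Share of non-neutral posts labeled positive, as a percentage."""
    rated = [label for label in map(score, texts) if label != "neutral"]
    if not rated:
        return 0.0
    return 100.0 * rated.count("positive") / len(rated)

# Invented, clearly sarcastic posts about an unnamed candidate.
SARCASTIC_POSTS = [
    "What a genius, truly the best candidate ever...",
    "Our hero! I love his press conferences, pure comedy",
]
```

Running `percent_positive(SARCASTIC_POSTS)` on these two mocking posts yields a fully positive result, which mirrors the kind of misleading figure discussed above: the count is exact, but the reading of sentiment is wrong.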
Since then, I have developed my research with R-Shief.org to find new methods to analyze sentiment. Using its immense lexicon system, R-Shief is embarking on a project to crowdsource sentiment, semantic, and dialect tags in Arabic to extend the localization of semantic tools in a free and open source model in 2013. By then, R-Shief plans to provide a semantic open source API for Arabic as a critical building block and tool for networking and analysis in Arabic online.
Fig 2. This sentiment analysis was conducted on English tweets posted in June 2011 with the hashtag #Jan25. The size of the bubbles represents the volume of tweets.
As I heard Edward Tufte aptly say in a keynote address on data visualizations: “you will always find what you are looking for.” In Figure 2, I went looking for public sentiment and found it. However, the results indicated that the method of research was blind to important semantic context. At that point I returned to exploring the landscape of social media, not through an aggregated set of procedures or tools that required a top-down approach, but through a more intuitive sense of “knowledge discovery” while building new tools and documenting what I find along the way.
One of the most timely and comprehensive research efforts to study the impact of social media, specifically on the Arab world, comes from the Dubai School of Government, which published the three issues of the first volume of its Arab Social Media Report in January, May, and November 2011. In their first report on “Facebook Usage: Factors and Analysis,” Racha Mourtada and Fadi Salem provide nineteen informative charts about the demography of Facebook users, and a breakdown of usage across the Middle East and North Africa by country. They sample data from April 22 to December 21, 2010 from the Facebook Data team. This top-down approach to data mining allows you to find the answers you are looking for: the usage of Facebook by male or female gender, age groups, and nation states. However, these ways of breaking down identity leave no room for hybridity and shifting identities.
By the time Mourtada and Salem write the third in the series, “The Role of Social Media in Arab Women’s Empowerment,” they start discussing netizens in more detail. However, the analytic approach to their data sets is unable to account for transnational netizens, or for those who do not already fit into the predetermined categories. Each of their charts provides a demographic breakdown by country.
In the second report in the series, “Civil Movements: The Impact of Facebook and Twitter” (Mourtada, Salem, May 2011), Twitter analytics are introduced and build upon the previous Facebook research. It seems more apparent in the second publication that their intention is to make sense of Twitter by recoding it into geo-located information. With a sample of about 10,000,000 tweets among 190,000 Twitter users, “estimating the size of a Twitter population was a simple two-step process: capture a number of samples (or «sweeps») of users from each country, and use a mark-recapture based technique to compute a population estimate.” This double step of recoding takes some work, all to make sure the data is expressed in terms of nation states and populations.
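The report does not say which mark-recapture estimator was used, so the following is only a sketch of the general technique under stated assumptions. It uses the classic Lincoln-Petersen estimate, N ≈ (n1 × n2) / m, where n1 and n2 are the numbers of distinct users seen in two sweeps and m is the number seen in both, along with Chapman's less biased variant. The function names and the user-ID inputs are hypothetical.

```python
def lincoln_petersen(first_sweep, second_sweep):
    """Estimate population size from two sweeps of user IDs.

    Users seen in the first sweep are treated as 'marked'; those seen
    again in the second sweep are the 'recaptured' ones.
    """
    marked = set(first_sweep)
    second = set(second_sweep)
    recaptured = marked & second
    if not recaptured:
        raise ValueError("no overlap between sweeps; estimate is undefined")
    return len(marked) * len(second) / len(recaptured)

def chapman(first_sweep, second_sweep):
    """Chapman's variant, which stays defined even with zero overlap."""
    marked = set(first_sweep)
    second = set(second_sweep)
    recaptured = marked & second
    return (len(marked) + 1) * (len(second) + 1) / (len(recaptured) + 1) - 1
```

For instance, two sweeps of 100 users each that share 50 users suggest a population of about 200. The estimate assumes a closed population in which every user is equally likely to be sampled, an assumption that sits uneasily with transnational and shifting online identities.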
Comparatively, a similar study conducted in 2011 footnotes: “Outside Country” refers to Twitter profiles that had locations outside both the country and the region, and “No location” refers to profiles that either had no location data or had been deleted or suspended since archiving began. The blue bar indicates the period in which journalists began reporting that protests had reached the level of “thousands” of participants.
Another publication on digital media is a 31-page report by the Center for International Media Assistance (CIMA) and the National Endowment for Democracy (NED) on “Digital Media in the Arab World: One Year After the Revolutions.” The author, Jeffrey Ghannam, conducts his research through a series of “35 interviews in person, by telephone, e-mail, and Skype: primary and secondary documents; commentaries; websites; blogs, and other sources.” At the end of his overly optimistic report, Ghannam concludes that “social media’s potential represents the brightest hope for the greater freedom of expression in the Arab region, enabling tens of millions of people, and ultimately many more, to actively pursue civic engagement, free and fair elections, political accountability, the eradication of corruption, as well as free, independent, and pluralistic media in a rapidly changing media environment.” The omission seems glaring to me: I cannot give much weight to an investigation of “digital” media that does not employ any analysis of the primary stock of digital information. It is like publishing on France without knowing French. His choice of interviewing 35 key individuals, in my opinion, does not speak to the scale of the research question he poses. It also reinforces the very top-down authority structures that these revolutions resist.
Cultural Research in the Petabyte Age
In 2008, WIRED magazine recounted the history of digital computation:
Sixty years ago, digital computers made information readable. Twenty years ago, the Internet made it reachable. Ten years ago, the first search engine crawlers made it a single database. Now Google and like-minded companies are sifting through the most measured age in history, treating this massive corpus as a laboratory of the human condition. Welcome to the Petabyte Age. The Petabyte Age is different because more is different. As we moved along that progression, we went from the folder analogy to the file cabinet analogy to the library analogy to—and, at petabytes we ran out of organizational analogies. At the petabyte scale, information is not a matter of three- and four-dimensional taxonomy and order but of dimensionally agnostic statistics.
The Petabyte Age calls for an entirely different approach to cultural research. This approach requires us to stop thinking about data as something that can be visualized in its totality. Instead, it requires us to understand the data mathematically first, and only later can we begin to ask questions about context: people/place/time of data production. Let me explain what I mean by this.
In order to produce a database, bits of information are algorithmically processed and fitted into a database structure that enables the data to be “read.” This initial computational processing does not do much to account for the context of the production of the originary information. Basically, it might tell us about the patterns of information but very little about the meaning of this information.
When researching social media, for example, parameters such as time stamping or geolocation offer pointers to the context of initial production, but in creating the database, the virtual bits are dramatically decontextualized. The database creation process simply cannot easily record the salient conditions of the production of the elements. Walter Benjamin might say that the “aura” of the data is unrecordable.
For this reason, I would argue that social media platforms such as Twitter, Facebook, and Flickr do not in fact capture social context. Although they are “authored” by those who are part of the literate world and disseminated to the world through the Internet, the data from these social media are strangely bloodless. And if this is the case, then the question remains: what do these community practices and technological affordances capture?
By the end of the first decade of the 21st century, we have clearly moved from the world of “new” media to a world of “more” media. When we reached 2011 and the Egyptian revolution, which many hailed as the revolution brought about by Facebook, the ubiquity of computers, digital media software, and computer networks had led to an exponential rise in the numbers of cultural producers worldwide. No longer simply a matter of the rise of new media production in new global contexts, these social media platforms served as the database architectures for the accumulation of data on a scale heretofore unknown. With the dramatic increase in the scale and scope of data-making, where the data now takes shape as “tweets,” “likes,” “status updates,” and “shared links,” it is extremely difficult, if not impossible, to understand the relationship between the production of data and original social contexts. In short, I argue that this shift in data production makes it difficult to truly understand global cultural developments and dynamics in any substantial detail using 20th century theoretical tools and methods.
Fig 3. Network analysis of 500,000 tweets on #Syria by 60,000 users over 23 days in November 2011. Color key: 59% Arabic (Green), 30% English (Blue), 2.25% Hindi (Gold), 1.6% French (Red), .7% Urdu (Purple) and .6% Finnish. Everything else (Farsi, Spanish, Italian, Portuguese) is less than .5%. Conducted by R-Shief, Inc.
Visualizing the “Now”
Philosophical underpinnings to the nature of a virtual world are neither new nor revelatory; nor does this argument purport that the “what” that is being expressed online in the digital world is necessarily representative of what happens on the ground. In places like Egypt, where literacy rates reach only sixty-six percent, analyses of Internet penetration bear less weight. However, elements of the virtual become actualized under unique, local, temporal conditions that cannot be predicted. They only happen in the “now.” In her book Enfoldment and Infinity: An Islamic Genealogy of New Media Art, Laura Marks approaches this logic through a visual arts lens, tracing new media art along a unique historiography of Islamic thought from the birth of the algorithm in ninth-century Iraq through fifteenth-century Islamic mysticism and neoplatonism, or “beginnings of virtual reality.” One of the critical points Marks builds upon is a notion of events in time as unique and foldable, and therefore, transformative. Taking it one step further, Kant’s eighteenth-century notion of the “sublime” event can similarly be transformative. For an event to be transformative, it relies on unpredictable conditions. In other words, the act of Bouazizi lighting himself on fire in Tunisia was as sublime as it was horrible. The data visualizations below seek to illustrate and improve our understanding of the sensibilities and cultural logic(s) that are being expressed by the people on Twitter, Facebook, and Flickr.
Fig 4. Network visualization of tweets on #Egypt and #Syria during July 2011. Conducted by R-Shief, Inc.
Rosetta Stones from Cyberspace (Fig. 3 and 4) and the mosaics (Fig. 5 and 6) are sets of images created by writing computer code. These images represent the immense and unwieldy social media activity during the month of July 2011. They draw upon one of the most rare and comprehensive archives of social media content from the Arab revolutions—R-Shief.org. This growing repository urgently seeks to critique historical information on the contemporary Arab world—information currently under siege, in real time and place.
I have been collecting and organizing content from all over the Internet in Arabic and English since 2008 (adding Persian, French, German, and Spanish in 2011, and soon Urdu); this project is about the archive. And then the revolutions began. As the revolutions and intifadas (Arabic for “uprising”) spread throughout the Arab World, R-Shief’s technology was immediately employed to capacity. The sheer speed and size of this substance is necessarily remarkable. My intent is to offer a new perspective on the microcosms within macrocosms of this world, somewhat unfinished and unknowable, and to bring these complex representations of movements of millions or billions of people with many leaders together in a way that allows one to experience through the senses what cannot be processed cognitively or rationally.
The image mosaics are a series of images created by writing computer code. They represent the immense and unwieldy social media activity from 2011 to 2012. Figure 5, “Riot Smoke,” is composed of profile thumbnails from the top 987 Twitter users contributing posts on the hashtags #abbasseya #abasiya #abassiya and Flickr photos with the same keywords “Abassiyya.” The original photograph was taken by Jonathan Rashad, who makes it available on his Flickr stream under Creative Commons licensing. The caption reads: “6230 Riot CS Smoke, produced by US-based firm ‘Combined Tactical Systems’. Used by police-backed ‘thugs’ on May 2 against protesters in Abbasiya near Egypt’s Ministry of Defense.”
Fig. 5. Created by VJ Um Amel, 2012.
In Figure 6, “Walk Like an Egyptian,” I gathered profile thumbnails from the top 1,199 members contributing to the Facebook page “We are all Khaled Said” from January 2011 to February 2012. (Facebook analyzed using R-Shief’s tools.) Applying procedural techniques to manipulate the color, chroma, luminosity, scale, opacity, and direction of this large scale of information, I composited the thumbnail images to imitate the popular meme of the man holding up a sign that reads, “Kefeya” (the Arabic word for “Enough”).
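As a rough illustration of how a mosaic of this kind can be assembled in code, the sketch below matches each cell of a target image to the thumbnail whose average color is closest. It is not the procedure used for Figures 5 and 6, which also manipulate chroma, luminosity, scale, opacity, and direction. All names are hypothetical, and colors are plain RGB tuples so that the example needs no image library.

```python
def mean_color(pixels):
    """Average a list of (r, g, b) tuples into a single color."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))

def nearest_thumbnail(cell_color, thumb_colors):
    """Pick the thumbnail whose average color is closest to the cell.

    Closeness is squared Euclidean distance in RGB space, a simple
    assumption; perceptual color spaces would give different matches.
    """
    def distance(name):
        return sum((a - b) ** 2 for a, b in zip(cell_color, thumb_colors[name]))
    return min(thumb_colors, key=distance)

def build_mosaic(target_cells, thumb_colors):
    """Map a grid of target cell colors to a grid of thumbnail names."""
    return [
        [nearest_thumbnail(color, thumb_colors) for color in row]
        for row in target_cells
    ]
```

In practice the target grid would come from averaging regions of the source photograph with `mean_color`, and the chosen thumbnails would then be pasted into place with an imaging library.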
These are expressively artistic interpretations of particular moments. The theoretical form of these visual expressions is intentional: one single image is never the totality of the moment. Instead, such iconic images come to stand for an infinite number of visual memories, some recorded, most not. The use of the mosaic mode of “assemblage” is intended to capture this notion of the infinite, reiterative algorithmic form of any single visual expression.
These mosaics demonstrate yet another layer of encoding and decoding of the data. In response to the fetishizing of technology, or data, or the Arab “Spring,” they represent a secret world of code in an abstract, algorithmic aesthetic, blown up and situated in and out of time. The mosaic images are not literal representations of this body of text; they are a stand-in, a metonym for it. Thus, the aesthetics of the work I am proposing also trace back to choices made while creating the archive. It is imperative to understand not only the text within the archive but also that the archive itself is a text.
Fig 6. Created by VJ Um Amel, 2012.
“What does all this mean?”
Though many people are positively intrigued by this digital arts-based research, they are often left with questions and repeatedly ask me—What does all this mean? Can you pinpoint people and predict events? By the summer of 2011, it had become clear to me that the fetishization of data had consumed many researchers to the detriment of the substance of what was being communicated, and the phenomena being investigated and described. What I am tempted to resist are overzealous prejudiced arguments that overstate the value of quantitative objectivity and accuracy to an extent that rhetorical indicators are flattened and there is no longer room for critical interventions.
In a Foucauldian sense, the project of archiving tweets, posts, and comments became about recording traces of the genealogy in the online media—username of who posted, profile picture of user, various user settings (including language, gender, age, etc), when posted, latitude/longitude from where posted (with permission), from which server posted, which device was used, the tags of post, the title of post, the subtitle of post, and eventually the post itself. My contributions to the cultural production stem from an amalgamation of R-Shief’s database material and visual making through the processes of design, animation, illustration, compositing, lighting, performance, and programming. In the end, what is produced from this technologically-based art practice is editing the multitude of fragments intersecting over time—remixing the data stream.
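The fields listed above can be pictured as a single record per archived post. The sketch below is only an illustration of that structure: the class name, field names, and types are assumptions and do not reflect R-Shief's actual schema. Most fields are optional because, as noted elsewhere in this article, traces such as geolocation are available for only a small fraction of posts.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

@dataclass
class ArchivedPost:
    """One archived post and the traces recorded around it (hypothetical schema)."""
    username: str
    text: str
    posted_at: datetime
    profile_image_url: Optional[str] = None
    language: Optional[str] = None    # user setting
    gender: Optional[str] = None      # user setting
    age: Optional[int] = None         # user setting
    latitude: Optional[float] = None  # only when the user permits it
    longitude: Optional[float] = None
    server: Optional[str] = None
    device: Optional[str] = None
    tags: List[str] = field(default_factory=list)
    title: Optional[str] = None
    subtitle: Optional[str] = None

    def has_geolocation(self) -> bool:
        """True when both coordinates were recorded."""
        return self.latitude is not None and self.longitude is not None

def geolocated_share(posts):
    """Fraction of posts that carry coordinates."""
    if not posts:
        return 0.0
    return sum(p.has_geolocation() for p in posts) / len(posts)
```

A record like this makes visible what the archive keeps and what it cannot: the metadata surrounds the post, but the conditions of its production remain outside the structure.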
Rather than attempting the impossible task of understanding the totality of what Arabic speakers are producing online—this archival process collects modest percentages of data and from within defines scopes of analysis. In other terms, then, discontinuity is a bifurcation, a switch from one virtual pattern to another. Critical research and artistic practices are all tied up together—they inform each other. I am suggesting an interdisciplinary approach to questioning and learning that incorporates an art research methodology. Research is the praxis of systematic critical reflection that focuses on compelling questions. And these questions can only be investigated when we unsettle the very tools we use to examine them.
 Timothy Mitchell, Colonising Egypt (Cambridge University Press, 1988), 35.
 Technology studied outside of the sciences must also face a fear among many academicians. As Franklin and Rodriguez introduce their argument, “‘Hypertext. Hypermedia. High Performance Computing.’ It’s enough to make a humanities scholar hyperventilate. A debate has raged in the last decade (at least) about whether or not the Digital Age will see the death of The Book, The Library and perhaps, The Humanities more broadly.” Franklin and Rodriguez.
 For example, in one month alone (April 2012), more than 80% of the tweets that used the English-language hashtags #Tahrir and #Jan25 were written in Arabic. More than 95% of tweets using related Arabic hashtags were written in Arabic.
Franklin, Kevin D. and Karen Rodriguez, “The Next Big Thing in Humanities, Arts and Social Science Computing: Cultural Analytics,” HPC Wire, July 29, 2008.
 Manovich, Lev. “Data stream, database, timeline (part 1).” October 27, 2012. http://lab.softwarestudies.com/2012/10/data-stream-database-timeline-new.html
 Ralske, Kurt. “What is Cultural Analytics?” July 2010.
 At the American University in Cairo’s Access to Knowledge for Development Center’s annual conference, I presented sentiment analysis I created using IBM tools through research under USC’s Annenberg Innovation Lab.
 In September 2011, Edward Tufte gave the keynote address at the Tech@State conference on Data Visualization.
 Mourtada, Racha and Fadi Salem. (May 2011). Civil Movements: The Impact of Facebook and Twitter. The Arab Social Media Report. Dubai School of Government. Vol. 1, No. 2.
 Ghannam, Jeffrey. (28 March 2012). Digital Media in the Arab World: One Year After the Revolutions. A Report to the Center for International Media Assistance.
 Anderson, Chris. “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete.” Wired Magazine. June 23, 2008.
 A timestamp is the time at which an event is recorded by a computer, not the time of the event itself. And geolocation is the identification of the real-world geographic location of an object, such as a radar, mobile phone or an Internet-connected computer terminal. However, when it comes to Twitter, less than 1.5% of all users allow geolocation to be active on their devices.
 According to UNICEF’s statistics by country, the total adult literacy rate (%), 2005-2010 in Egypt is 66%.
 According to Social Bakers, among Internet users in Egypt, the total number of FB users is reaching 10.7 million, which translates into a Facebook penetration rate of 13.26%.
 In The Archaeology of Knowledge, Michel Foucault attempts to define a method of historical analysis that is free from the formations of knowledge making. This debate on structuralism is aimed to question teleologies and cultural totalities…in the histories of ideas, science, philosophy, thought, and literature, the focus is on different types of ruptures and discontinuities, of "transformations that serve as new foundations."
Franklin, Kevin D. and Karen Rodriguez, “The Next Big Thing in Humanities, Arts and Social Science Computing: Cultural Analytics,” HPC Wire, July 29, 2008.
Ghannam, Jeffrey. (28 March 2012). Digital Media in the Arab World: One Year After the Revolutions. A Report to the Center for International Media Assistance.
Jameson, Fredric. Postmodernism, or, the Cultural Logic of Late Capitalism (Duke University Press, 1991).
Manovich, Lev. "Cultural Analytics: Analysis And Visualization Of Large Cultural Data Sets." A Proposal From Software Studies Initiative @ Calit2, September 30, 2007.
Manovich, Lev. “Data stream, database, timeline (part 1).”
Marks, Laura U. Enfoldment and Infinity: An Islamic Genealogy of New Media Art (2010). Cambridge, MA: The MIT Press.
Mitchell, Timothy. Colonising Egypt (Cambridge University Press, 1988).
Mourtada, Racha and Fadi Salem. (May 2011). Civil Movements: The Impact of Facebook and Twitter. The Arab Social Media Report. Dubai School of Government. Vol. 1, No. 2.
Mourtada, Racha and Fadi Salem. (January 2011). Facebook Usage: Factors and Analysis. The Arab Social Media Report. Dubai School of Government. Vol.1 No.1.
Mourtada, Racha and Fadi Salem. (November 2011). The Role of Social Media in Arab Women’s Empowerment. The Arab Social Media Report. Dubai School of Government. Vol. 1, No. 3.
Pang, B., & Lee, L. (2008). “Opinion Mining and Sentiment Analysis.” Foundations and Trends in Information Retrieval, 2(1-2), 1-135.
Ralske, Kurt. “What is Cultural Analytics?” http://retnull.com/index.php?/on-cultural-analytics, July 2010.
Tufte, Edward. Data Analysis for Politics and Policy (1974). Englewood Cliffs: Prentice-Hall.