The notion of computers invading our privacy is a hot topic in today’s news. The internet is littered with warnings about the NSA digging into our lives, and the IRS has acted on the data in a most alarming way. The USA PATRIOT Act (2001), in an attempt to fight terrorism on U.S. soil, has weakened the Fourth Amendment of the U.S. Constitution, which protects people from “unreasonable searches and seizures” and requires a warrant backed by “…[o]ath or affirmation…” that lists particular “…places to be searched, and…things to be seized” (Bill of Rights, 1791). The NSA’s methods, leaked by former NSA contractor Edward Snowden and published in the press, give a clear view of the type and scope of information our government is currently collecting (Washington Post, 2013). With so much information created by our web activity available to be found or “stolen,” the question comes to mind: How is our privacy compromised by data mining used during internet transactions?

One way to find out more about data mining is to follow the money. The Patriot Act appropriated twenty million dollars and established the National Infrastructure Simulation and Analysis Center (NISAC) to create virtual models of critical infrastructure (2001). Critical infrastructure includes roads, bridges, waterways, food supplies, communications, government offices and economic features such as the stock market. The data for the models is acquired “…from state and local governments and the private sector…” and includes any “…data necessary to create and maintain models of such systems ….” The compromise in privacy occurs as the data is “seized” by the government in a search for things that can hardly be described as “warranted” by the U.S. Bill of Rights.

The U.S. Government has a number of programs that mine data from internet transactions internationally and within the country. According to the NSA slide leaked to the press (Washington Post, 2013; Fig. 1), their names are FAIRVIEW, STORMBREW, BLARNEY, OAKSTAR and PRISM. The names matter only for enumeration; what the programs share is that they all collect information. The first four programs intercept transactions from “fiber cables and infrastructure as data flows past.” This method is commonly known as a wiretap. Under the Patriot Act (2001), this is legal during the “war on terrorism.” PRISM does something completely different. We can compare the two methods this way: to collect information on every person in the country by wiretap, the government would need storage comparable to every computer, server and storage device combined, plus additional equipment for calculations and processing. There are reports of a giant database being constructed in Utah, but so far the NSA denies it exists (Jenkins, 2012). The object of PRISM, by contrast, is to collect preprocessed data from the experts at Google, Yahoo, Amazon and others. PRISM eliminates the storage and processing problem by using data sets that need only a little more processing to identify a single person (Li & Sarkar, 2013).

Figure 1. Slide leaked by Edward Snowden showing collection points for various government data mining programs. Source:

The data sets PRISM can use to save time and energy are generated by programs such as Amazon’s patented method (Benson, Jacobi & Linden, 2006) for producing interesting shopping choices to post on our Facebook pages or send via email (Figure 2). Amazon recommends products based on a consumer’s previous purchases or on items that have been the subject of a search. The items are weighted by values such as consumer reviews or whether the item was purchased. Tables are developed from many consumers’ shopping histories, and these tables are used to create a list of personal recommendations. The government can combine this information with upstream information such as emails and phone messages to build a very clear picture of who we are and what we are doing.
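The table-building step described above can be illustrated with a minimal sketch. It is not the method in the Benson, Jacobi and Linden patent; it only assumes a simple co-occurrence table in which a purchase counts more than a search. The weights, function names and sample shoppers are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical weights: a purchase counts more than a search.
WEIGHTS = {"purchased": 1.0, "searched": 0.5}


def build_similarity_table(histories):
    """Build an item-to-item table from many shoppers' histories.

    histories maps a shopper id to {item: interaction}, where interaction
    is a key of WEIGHTS. Each pair of items found in the same history adds
    the product of the two weights to that pair's score.
    """
    table = defaultdict(lambda: defaultdict(float))
    for items in histories.values():
        for a, b in combinations(sorted(items), 2):
            score = WEIGHTS[items[a]] * WEIGHTS[items[b]]
            table[a][b] += score
            table[b][a] += score
    return table


def recommend(table, own_items, top_n=3):
    """Rank items the shopper has not seen by their summed pair scores."""
    scores = defaultdict(float)
    for item in own_items:
        for other, score in table.get(item, {}).items():
            if other not in own_items:
                scores[other] += score
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Made-up shopping histories for illustration only.
histories = {
    "shopper_1": {"tent": "purchased", "stove": "purchased"},
    "shopper_2": {"tent": "purchased", "stove": "searched", "lantern": "purchased"},
    "shopper_3": {"novel": "purchased", "lantern": "searched"},
}
table = build_similarity_table(histories)
suggestions = recommend(table, {"tent"})
```

Even this toy table shows why such data is attractive to a third party: the recommendations for one shopper are derived entirely from what other people bought and searched for, so the table itself is a compact summary of many private histories.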

Figure 2. From Amazon’s patent for generating personal shopping profiles. Source:

Data mining and processing on a grand scale are subjects we should be quite familiar with. The idea of computers infiltrating our lives was put forward by Dennis Feltham Jones in his book Colossus (1967). The story describes a giant supercomputer that “decides” to team up with its Russian counterpart and take over the world. Resistance is quickly answered with nuclear missile launches and threats of “…bodies lying unburied” (p. 233). The book was adapted to film, spawning a genre that survives today in which the computer villain can always adapt and make predictions about what people will do next. The scariest part is that in Colossus the computer is powered by a nuclear device that can last for a thousand years. Our current fears of losing our privacy are driven by this threat of dominance that cannot be “unplugged” by the snarky engineer in the corner.

There is an interesting parallel between the all-seeing nuclear-powered supercomputer and the internet. In a way we find ourselves “hard to unplug” from the vast network of technology that can monitor our every move (Anthony, 2012). While internet transactions can expose us to privacy intrusion, cell phones can be used to predict our movements twenty-four hours in advance by analyzing how many calls we make to a friend and correlating movement patterns (Anthony, 2012).

This application could also be used to suggest which clothes to wear, show which friends are in the area, or point out a restaurant and include a coupon. It cannot predict a crime, but it can predict where criminals will be and place the police within a twenty-meter area.

With the endless array of ingenious applications aimed at infiltrating our lives, it seems that expectations of privacy are futile. Firewalls can be hacked through, or around. Even top-secret levels of government aren’t safe (Mims, 2012). The anti-virus systems most of us use on our home PCs need internet access to function, and this leaves a trail of bread crumbs that any novice hacker can follow (Byres, 2013; Jenkins, 2012). Is there anything that can protect us from privacy intrusions during internet transactions?

In extremely high-risk systems, such as nuclear reactors, an air gap is used to insulate control systems from internet intrusion. In this method, all connections to the internet are eliminated, and the insulated systems could be considered private. However, in a recent article on the subject in Communications of the ACM, Eric Byres, chief technology officer at Tofino Security in British Columbia, states that the air gap is a “myth” (2013). Byres believes that air gaps may exist in a nuclear plant or in his home heating system but not anywhere else. The extent of internet connectivity in the industrial control systems that run manufacturing plants and our automobiles is illustrated by Sean McGurk, former director of the National Cybersecurity and Communications Integration Center (NCCIC) at the U.S. Department of Homeland Security: “In our experience in conducting hundreds of vulnerability assessments in the private sector, in no case have we ever found the operations network, the [supervisory control and data acquisition] SCADA system, or energy management system separated from the enterprise network” (qtd. in Byres, 2013, p. 31). This means that the administration, the manufacturing systems and robots, and the heating and air systems all have connections to the internet, without exception. In other words, the private sector is not private.

The SCADA system holds a prominent place in the history of internet privacy intrusion. SCADA software, which commonly runs on Windows, allows engineers to communicate with industrial manufacturing machines (Langner, 2011). It also provided a “window” for the Stuxnet virus to sneak through and sabotage the Iranian nuclear program. In this case, the virus jumped the air gap using an engineer’s laptop or thumb drive. The first cyber-warfare attack to cause real material damage simply walked in the front door (Langner, 2011).

The only way to keep a secret is to keep it to yourself. There are very few instances of real privacy, especially in remote communication such as a phone call, text, email or other internet transaction. The contents of all of these can be discovered. If a message is transmitted by air or over the wire, it can be intercepted and eventually decrypted (Jenkins, 2012).

In November 2012, a worldwide contest for code breaking occurred after a man in Surrey, England found the remains of a WWII carrier pigeon while renovating his chimney (Robeson, 2012). The message strapped to its leg was turned in to British government code breakers. Within a few weeks they deemed the code unbreakable and turned it loose to let the public take a crack at it. Less than a month later, a Canadian team had transcribed the message (Code Breakers, 2012). Security methods can slow down the process of intrusion and make it manageable but they cannot keep secrets forever.


Another way to look at how our privacy is affected by internet transactions is to examine our definition of privacy. In a recent study by Bridgette Wessels at the University of Sheffield (2012), students and professionals responded to a survey regarding internet privacy. Both groups mirrored the study premise that privacy is treated differently in social networking, banking, medical or shopping transactions. A key element in the study is the intersection of our social and online identities and the need to develop processes that keep these identities separate to ensure privacy (Figure 3). The theoretical privacy gap seems clear in the picture, but it is compromised because the identities we project in different environments share identifying attributes. For example, 87% of the people in the U.S. can be uniquely identified with three public attributes: gender, date of birth, and five-digit zip code (qtd. in Li & Sarkar, 2013). By adding a personal shopping history from Amazon, the “privacy gap” can be eliminated.
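The reason three ordinary attributes are so revealing can be seen in a small experiment. The sketch below uses a randomly generated population, not census data, so it does not reproduce the 87% figure; the population size, number of zip codes and birth-date range are arbitrary assumptions chosen only to show how quickly combined attributes single people out.

```python
import random
from collections import Counter
from datetime import date, timedelta


def unique_fraction(records, keys):
    """Fraction of records whose combined values for `keys` occur exactly once."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)


def synthetic_population(n, zip_count, seed=0):
    """Generate n made-up people spread evenly at random over zip_count zip codes."""
    rng = random.Random(seed)
    start = date(1930, 1, 1)
    span_days = 80 * 365  # birth dates drawn from roughly an 80-year window
    return [
        {
            "gender": rng.choice(["F", "M"]),
            "birth_date": start + timedelta(days=rng.randrange(span_days)),
            "zip": f"{rng.randrange(zip_count):05d}",
        }
        for _ in range(n)
    ]


people = synthetic_population(n=20000, zip_count=50)
by_zip = unique_fraction(people, ["zip"])
by_all = unique_fraction(people, ["gender", "birth_date", "zip"])
print(f"zip only: {by_zip:.1%}; gender + birth date + zip: {by_all:.1%}")
```

No single attribute identifies anyone in this population, since hundreds of people share each zip code, yet the combination of all three leaves most records with no match at all. Each additional attribute, such as a shopping history, narrows the crowd further.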


Figures 3a, 3b. Theoretical model showing privacy gap and Venn diagram showing identity disclosure from data mining. Source: Author

Xiao-Bai Li and Sumit Sarkar present a method for protecting privacy while keeping aggregated data available for predicting trends (2013). This type of method reconciles data mining with internet privacy. Li and Sarkar offer a mathematically rigorous technique that accounts for the sociological variances involved and attempts to match the intricacies of public courtesy with the hard-edged delineations of electronic data. Their testing shows that current methods have drawbacks that cause collateral security leaks while making data unusable for research (Li & Sarkar, 2013). Primarily, there is the disclosure risk that occurs when attributes are clustered inside a class or group (Figure 2). Additionally, the data masking currently used to hide data like credit card numbers skews the results of infrastructure or sociological studies that need large quantities of accurate numbers. The authors protect secrets by clustering non-confidential information and making sure the confidential information is well distributed. Scattering the private data makes it harder to access. Using microperturbation, the clustered non-confidential information can be used to project a virtual identity that is unrelated to any real person but can still be used for scholarly research. Perturbation refers to methods for seeing details from outer space or making a fuzzy photograph become focused (Lin, Chen & Shih, 2003). Using Li and Sarkar’s method we can create virtual population models that can be mined for unbiased information, while at the digital level there is no correlation that can compromise our privacy. In other words:

A complete and accurate real-time model of our population and infrastructure can be built and used to plan highways, raise crops, fund the military or catch terrorists without invading our privacy.
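The general idea of perturbing confidential values while keeping group-level statistics usable can be sketched in a few lines. This is not Li and Sarkar’s class-restricted clustering algorithm; it is a much simpler stand-in that assumes records are already grouped by one non-confidential attribute and adds random noise that is re-centred so each group’s average is preserved. The function name, field names and sample incomes are hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean


def perturb_within_groups(records, group_key, secret_key, scale=0.1, seed=0):
    """Return copies of records with noise added to `secret_key`.

    Noise is drawn per record, then re-centred inside each group so the
    group mean of the confidential value stays the same. A group with a
    single record is therefore left unchanged, which is one reason real
    methods require groups of a minimum size.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for index, record in enumerate(records):
        groups[record[group_key]].append(index)

    masked = [dict(r) for r in records]  # leave the originals untouched
    for indexes in groups.values():
        values = [records[i][secret_key] for i in indexes]
        # Noise spread is a fraction of the group's range (1.0 if the range is 0).
        spread = scale * ((max(values) - min(values)) or 1.0)
        noise = [rng.gauss(0.0, spread) for _ in indexes]
        offset = mean(noise)
        for i, n in zip(indexes, noise):
            masked[i][secret_key] = records[i][secret_key] + (n - offset)
    return masked


# Made-up records: zip code is the non-confidential grouping attribute.
records = [
    {"zip": "84101", "income": 40000.0},
    {"zip": "84101", "income": 52000.0},
    {"zip": "84101", "income": 61000.0},
    {"zip": "10001", "income": 75000.0},
    {"zip": "10001", "income": 90000.0},
]
masked = perturb_within_groups(records, "zip", "income")
```

A researcher working from `masked` still sees the correct average income for each zip code, but no individual value matches the real one.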

Invariably the question of whether we are adversely affected by privacy intrusions from data mining revolves around two issues: the intent of the miners and the quality of the mining. By creating robust programming in conjunction with strong societal studies, the traditional ethical approaches to engineering and government can prevail. We can offer the red shoes instead of green; catch the junior malfeasants in their first lock-picking class and offer algebra instead.


Anthony, S. (21 August 2012). Pre-crime creeps closer to reality, with predictive smartphone location tracking. Extreme Tech. Retrieved from:

Benson, E. A., Jacobi, J. A., & Linden, G. D. (26 September 2006). Personalized recommendations of items represented within a database. Google Patents. Retrieved from:

Bill of Rights. (15 December 1791). The First congress. Retrieved from:

Code Breakers Finally Crack The Mysterious Cypher On The WWII Carrier Pigeon. (16 December 2012). Business Insider. Retrieved from:

Jenkins, H. W., Jr. (24 July 2012). Jenkins: Can data mining stop the killing? Did the National Security Agency capture James Eagan Holmes’s transactions in cyber space? If not, why not? Wall Street Journal (Online). Retrieved from:

Jones, D. F. (1967). Colossus. New York: Putnam.

Kurose, J., & Ross, K. (2013). Computer networking: A top-down approach. Boston: Addison-Wesley.

Langner, R. (2011). Stuxnet: Dissecting a cyberwarfare weapon. IEEE Security and Privacy, 9(3), 49-51.

Lin, Y.-M., Chen, C.-K., & Shih, Y.-S. (25 August 2003). Position micro-perturbation device. Google Patents. Retrieved from:

Mims, C. (21 November 2012). French News Report: U.S. Government Hacked Into French Presidential Office. The Atlantic. Retrieved from:

Robeson, S. (23 November 2012). Government’s top code-breakers left stumped by wartime carrier pigeon’s secret note appeal for public’s help to decipher message. Mail Online. Retrieved from:

Uniting and Strengthening America by Providing Appropriate Tools Required to Intercept and Obstruct Terrorism Act of 2001. (26 October 2001). Pub. L. No. 107-56

Washington Post. (9 July 2013). NSA slides explain the PRISM data collection program. Washington Post (Online). Retrieved from:

Wessels, B. (December 2012). Identification and the practices of identity and privacy in everyday digital communication. New Media & Society, 14(8), p. 1251. Retrieved from:

Li, X.-B., & Sarkar, S. (2013). Class-restricted clustering and microperturbation for data privacy. Management Science, 59(4), 796-812. Retrieved from: