Monday, September 1, 2014

STUXNET: ANATOMY OF A CYBER WEAPON



This is the first of a focused, two-part discussion of the threats and challenges of cyber security, explored using the Stuxnet attack as a lens.  The second part will pick up with an allegorical analysis of the cyber threat posed by nation-state attacks, as well as ideas about how information systems can be built so that they are less tempting targets.

Stuxnet is widely described as the first cyber weapon.  In fact, Stuxnet was the culmination of an orchestrated campaign that employed an array of cyber weapons to achieve destructive effects against a specific industrial target.  This piece explores Stuxnet’s technology, its behavior and how it was used to execute a cyber-campaign against the Iranian uranium enrichment program.  This discussion will continue in a subsequent post describing an orthogonal view on the art and practice of security – one that proposes addressing security as a design-time concern with runtime impacts.

Stuxnet, discovered in June 2010, is a computer worm that was designed to attack industrial programmable logic controllers (PLCs).  PLCs automate electromechanical processes such as those used to control machinery on factory assembly lines, amusement park rides, or, in Stuxnet’s case, centrifuges for separating nuclear material.  Stuxnet’s impact was significant; forensic analyses conclude that it may have damaged or destroyed as many as 1,000 centrifuges at the Iranian nuclear enrichment facility located at Natanz.  Moreover, Stuxnet was not successfully contained; it has been “in the wild” and has appeared in several other countries, most notably Russia.

There are many aspects of the Stuxnet story, including who developed and deployed it and why.  While recent events seem to have definitively solved the attribution puzzle, Stuxnet’s operation and technology remain both clever and fascinating. 

A Stuxnet attack begins with a USB flash drive infected with the worm.  Why a flash drive?  Because the targeted networks are not usually connected to the internet.  These networks have an “air gap” physically separating them from the internet for security purposes.  That being said, USB drives don’t insert themselves into computers.  The essential transmission mechanism for the worm is, therefore, biological: a user.

I’m tempted to use the word “clueless” to describe such a user, but that wouldn’t be fair.  Most of us carbon-based, hominid, bipedal Terran life forms are inherently entropic – we’re hard-wired to seek the greatest return for the least amount of effort. In the case of a shiny new flash drive that’s just fallen into one’s lap, the first thing we’re inclined to do is to shove it into the nearest USB port to see what it contains.  And if that port just happens to be on your work computer, on an air-gapped network. . .well, you get the picture.

It’s now that Stuxnet goes to work, bypassing both the operating system’s (OS) inherent security measures and any anti-virus software that may be present.  Upon interrogation by the OS, it presents itself as a legitimate auto-run file.  Legitimacy, in the digital world, is conferred by means of a digital certificate.  A digital certificate (or identity certificate) is an electronic cryptographic document used to prove identity or legitimacy.  The certificate includes information about a public cryptographic key, information about its owner's identity, and the digital signature of an entity that has verified the certificate's contents are correct.  If the signature is valid, and the person or system examining the certificate trusts the signer, then it is assumed that the public cryptographic key or software signed with that key is safe for use.
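
To make the trust check concrete, here is a minimal sketch in Python of how software might verify that a certificate was signed by an issuer it already trusts, using the widely available cryptography package.  The file names, and the assumption of a single RSA-signing issuer, are illustrative, not a description of Windows’ actual validation logic.

```python
# Minimal sketch of certificate-based trust (illustrative file names and assumptions).
from cryptography import x509
from cryptography.hazmat.primitives.asymmetric import padding

# The certificate that ships with the (allegedly) signed software.
with open("vendor_cert.pem", "rb") as f:
    cert = x509.load_pem_x509_certificate(f.read())

# The certificate of an issuer we already trust (e.g., from the system trust store).
with open("trusted_issuer_cert.pem", "rb") as f:
    issuer_cert = x509.load_pem_x509_certificate(f.read())

# Verify that the trusted issuer really signed the vendor certificate.
# Raises cryptography.exceptions.InvalidSignature if the signature does not match.
issuer_cert.public_key().verify(
    cert.signature,
    cert.tbs_certificate_bytes,
    padding.PKCS1v15(),                  # assumes an RSA-signed certificate
    cert.signature_hash_algorithm,
)
print("Signed by a trusted issuer:", cert.subject.rfc4514_string())
```

The catch, as the next paragraph shows, is that a stolen but otherwise valid certificate sails through exactly this kind of check.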

Stuxnet proffers a stolen digital certificate to prove its trustworthiness.  Now vetted, the worm begins its own interrogation of the host system: Stuxnet confirms that the OS is a compatible version of Microsoft Windows and, if an anti-virus program is present, whether it is one that Stuxnet’s designers had previously compromised.  Upon receiving positive confirmation, Stuxnet downloads itself onto the target computer.

It drops two files into the computer’s memory.  One of the files requests a download of the main Stuxnet archive file, while the other sets about camouflaging Stuxnet’s presence using a number of techniques, including modifying file creation and modification times to blend in with the surrounding system files and altering the Windows registry to ensure that the required Stuxnet files run on startup.  Once the archived file is downloaded, the Stuxnet worm unwraps itself to its full, executable form.

Meanwhile, the original Stuxnet infection is still on the USB flash drive.  After successfully infecting three separate computers, it commits “security suicide.”  That is, like a secret agent taking cyanide to ensure that she can’t be tortured to reveal her secrets, Stuxnet deletes itself from the flash drive to frustrate the efforts of malware analysts.

Inside the target computer, Stuxnet has been busy.  It uses its rootkit to modify, and become part of, the OS.  Stuxnet is now indistinguishable from Windows; it’s become part of the computer’s DNA.  It’s now that Stuxnet becomes a detective, exploring the computer and looking for certain files.  Specifically, Stuxnet is looking for industrial control system (ICS) software created by Siemens called Simatic PCS 7 or Step 7 running on a Siemens Simatic Field PG notebook (a Windows-based system dedicated to ICS use).

The problem facing Stuxnet at this point is that a computer can contain millions, if not tens of millions, of files, and finding the right Step 7 file is a bit like looking for a needle in a haystack.  In order to systematize the search, Stuxnet needs to find a way to travel around the file system as it conducts its stealthy reconnaissance.  It does this by attaching itself to a very specific kind of process: one that is trusted at the highest levels by the OS and that looks at every single file on the computer.  Something like. . . 

. . .the scan process used by anti-virus software.  In the attack on the facility at Natanz, Stuxnet compromised and used the scan processes of leading anti-virus programs.  (It’s worth noting that all of the companies whose products were compromised have long since remedied the vulnerabilities that Stuxnet exploited.)  Along the way, Stuxnet compromises every comparable process it comes across, pervading the computer’s memory and exploiting every resource available to execute the search.

All the while, Stuxnet is constantly executing housekeeping functions.  When two Stuxnet worms meet, they compare version numbers, and the earlier version deletes itself from the system.   Stuxnet also continuously evaluates its system permission and access level.  If it finds that it does not have sufficient privileges, it uses a previously unknown system vulnerability (such a thing is called a “Zero-Day,” and will be discussed below) to grant itself the highest administrative privileges and rights.    If a local area network (LAN) connection is available, Stuxnet will communicate with Stuxnet worms on other computers and exchange updates – ensuring that the entire Stuxnet cohort running within the LAN is the most virulent and capable version.   If an Internet connection is found, Stuxnet reaches back to its command and control (C2) servers and uploads information about the infected computers, including their internet protocol (IP) addresses, OS types and whether or not Step 7 software has been found.

As noted earlier, Stuxnet relied on four Zero-Day vulnerabilities to conduct its attacks.  Zero-Days are of particular interest to hacker communities: since they’re unknown, they are, by definition, almost impossible to defend against.  Stuxnet’s four Zero-Days included:


  • The Microsoft Windows shortcut automatic file execution vulnerability, which allowed the worm to spread through removable flash drives;
  • A print spooler remote code execution vulnerability; and
  • Two different privilege escalation vulnerabilities.

Once Stuxnet finds Step 7 software, it patiently waits and listens until a connection to a PLC is made.  When Stuxnet detects the connection, it penetrates the PLC and begins to wreak all sorts of havoc.  It modifies the code controlling the frequency converters and takes control of the converter drives.  What’s of great interest is Stuxnet’s method of camouflaging its control.

Remember the scene in Mission Impossible, Ocean’s 11 and just about every other heist movie where the spies and/or thieves insert a video clip into the surveillance system?  They’re busy emptying the vault, but the hapless guard monitoring the video feed only sees undisturbed safe contents.  Stuxnet turned this little bit of fiction into reality.  Stuxnet intercepts the PLC’s reporting signals that indicate abnormal behavior and, in their place, sends signals indicating nominal, normal behavior to the monitoring software on the control computer.
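
As a toy illustration of that record-and-replay idea (and emphatically not Stuxnet’s actual code), the Python sketch below captures a short window of “normal” readings and then feeds the monitor nothing but that recording, no matter what the real process does.  The reading source and window length are invented for the example.

```python
# Toy record-and-replay illustration; the sensor source and monitor are simulated.
import itertools
import random

def live_readings():
    """Simulated live sensor values (rotor frequency in Hz)."""
    while True:
        yield 1064 + random.uniform(-2.0, 2.0)

def monitor_feed(readings, window=20):
    """Record a window of 'normal' values, then replay it to the monitor forever."""
    recorded = list(itertools.islice(readings, window))
    return itertools.cycle(recorded)   # the monitor never sees live values again

feed = monitor_feed(live_readings())
print([round(next(feed), 1) for _ in range(5)])
```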

Stuxnet is now in a position to effect a physical attack against the gas centrifuges.  To understand the attack, it helps to know that centrifuges work by spinning at very high speeds and that maintaining these speeds within tolerance is critical to their safe operation.  Typically, gas centrifuges used to enrich uranium operate at between 807 Hz and 1,210 Hz, with 1,064 Hz as a generally accepted standard.

Stuxnet used the infected PLCs to cause the centrifuge rotors to spin at 1,410 Hz for short periods of time over a 27-day period.  At the end of the period, Stuxnet would cause the rotor speed to drop to 2 Hz for fifty minutes at a time.  Then the cycle repeated.  The result was that, over time, the centrifuge rotors became unbalanced, the motors wore out and, in the worst cases, the centrifuges failed violently.

Stuxnet destroyed as much as twenty percent of the Iranian uranium enrichment capacity.  There are two really fascinating lessons that can be learned from the Stuxnet story.  The first is that cyber-attacks can and will have effects in the kinetic and/or physical realm.  Power grids, water purification facilities and other utilities are prime targets for such attacks.  The second is that, within the current design and implementation paradigms by which software is created and deployed, if a bad actor with the resources of a nation-state wants to ruin your cyber-day, your day is pretty much going to be ruined.

But that assumes that we maintain the current paradigm of software development and deployment.  In my next post I’ll discuss ways to break the current paradigm and the implications for agile, resilient systems that can go into harm’s way, sustain a cyber-hit and continue to perform their missions.

Wednesday, July 16, 2014

Transformation: A Future Not Slaved to the Past

In his May 30, 2014 contribution to the Washington Post’s Innovations blog, Dominic Basulto lays out a convincing argument as to how cyber-warfare represents a new form of unobserved but continuous warfare in which our partners are also our enemies.  The logic within Basulto’s piece is flawless, and his conclusion, that the “mounting cyber-war with China is nothing less than the future of war” and that “war is everywhere, and yet nowhere because it is completely digital, existing only in the ether,” is particularly powerful. 

Unfortunately, the argument, and its powerful conclusion, ultimately fails.  Not because of errors in the internal logic, but because of the implicit external premise: that both the architecture of the internet and the processes by which software is developed and deployed are, like the laws of physics, immutable.  From a security perspective, the piece portrays a world where security technology and those charged with its development, deployment and use are perpetually one step behind the attackers who can, will and do use vulnerabilities in both architecture and process to spy, steal and destroy. 

It’s a world that is, fortunately, more one of willful science fiction than of predetermined technological fate.  We live in an interesting age.  There are cyber threats everywhere, to be sure.  But our ability to craft a safe, stable and secure cyber environment is very much a matter of choice.  From a security perspective, the next page is unwritten and we get to decide what it says, no matter how disruptive.

As we begin to write, let’s start with some broadly-agreed givens: 

  • There’s nothing magical about cyber security;
  • There are no silver bullets; and
  • Solutions leading to a secure common, distributed computing environment demand investments of time and resources. 
Let’s also be both thoughtful and careful before we allow pen to touch paper.  What we don’t want to do is perpetuate outdated assumptions at the expense of innovative thought and execution.  For example, there’s a common assumption in the information technology (IT) industry in general, and the security industry (ITSec) in particular, that mirrors the flaw in Basulto’s fundamental premise: that new security solutions must be applied to computing and internet architectures comparable or identical to those that exist today.  The premise behind this idea, that “what is, is what must be,” is the driver behind the continued proliferation of insecure infrastructures and compromisable computing platforms.

There’s nothing quixotic – or new – about seeking disruptive change.  “Transformation” has been a buzzword in industry and government for at least a decade.  For example, the North Atlantic Treaty Organization (NATO) has had a command dedicated to just that since 2003.  The “Allied Command Transformation” is responsible for leading the military transformation of forces and capabilities, using new concepts and doctrines in order to improve NATO's military effectiveness.  Unfortunately, transformation efforts are often diverse and fragmented, and yield few tangible benefits.  Fortunately, within the rubric of cyber security, it’s possible to focus on a relatively small number of transformational efforts.

Let’s look at just four examples.  While not a panacea, implementation of these four would have a very significant, ameliorating impact on the state of global cyber vulnerability.

1. Security as part of the development process

Software security vulnerabilities are essentially flaws in the delivered product.  These flaws are, with rare exception, inadvertent.  Often they are undetectable to the end user.  That is, while the software may fulfill all of its functional requirements, there may be hidden flaws in non-functional requirements such as interoperability, performance or security.  It is these flaws, or vulnerabilities, that are exploited by hackers.

In large part, software vulnerabilities derive from traditional software development lifecycles (SDLC) which either fail to emphasize non-functional requirements, use a waterfall model where testing is pushed to the end of the cycle, don’t have a clear set of required best coding practices, are lacking in code reviews or some combination of the four.  These shortcomings are systemic in nature, and are not a factor of developer skill level.  Addressing them requires a paradigm shift.

The DevOps Platform-as-a-Service (PaaS) represents such a shift.  A cloud-based DevOps PaaS enables a project owner to centrally define the nature of a development environment, eliminating unexpected differences between development, test and operational environments.  Critically, the DevOps PaaS also enables the project owner to define continuous test/continuous integration patterns that push the onus of meeting non-functional requirements back to the developer. 

In a nutshell, both functional and non-functional requirements are instantiated as software tests.  When a developer attempts to check a new or modified module into the version control system, a number of processes are executed.  First, the module is vetted against the test regime.  Failures are noted and logged, and the module’s promotion along the SDLC stops at that point.  The developer is notified as to which tests failed, which parts of the software are flawed and the nature of the flaws.  Assuming the module tests successfully, it is automatically integrated into the project trunk and the version incremented.
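
As a rough sketch of that gate (the test runner and version bump below are hypothetical stand-ins, not any particular CI product), the flow might look like this in Python:

```python
# Sketch of a check-in gate: vet the module, report failures, or promote and bump version.
# run_test_suite is a hypothetical stand-in for the project's real test regime.
from dataclasses import dataclass

@dataclass
class TestResult:
    name: str
    passed: bool
    detail: str = ""

def run_test_suite(module_path):
    """Placeholder: run functional and non-functional tests against the module."""
    return [
        TestResult("unit_tests", True),
        TestResult("static_security_scan", False, "unvalidated input reaches parser()"),
        TestResult("performance_budget", True),
    ]

def gate_checkin(module_path, current_version):
    failures = [r for r in run_test_suite(module_path) if not r.passed]
    if failures:
        # Promotion stops here; the developer learns which tests failed and why.
        for f in failures:
            print(f"REJECTED {module_path}: {f.name} failed ({f.detail})")
        return None
    # All tests passed: integrate into the trunk and increment the version.
    major, minor, patch = map(int, current_version.split("."))
    new_version = f"{major}.{minor}.{patch + 1}"
    print(f"INTEGRATED {module_path} as version {new_version}")
    return new_version

gate_checkin("src/parser.py", "1.4.2")
```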

A procedural benefit of a DevOps approach is that requirements are continually reviewed, reevaluated, and refined.  While this is essential to managing and adapting to change, it has the additional benefits of fleshing out requirements that are initially not well understood and identifying previously obscured non-functional requirements.  In the end, requirements trump process; if you don’t have all your requirements specified, DevOps will only help so much.

The net result is that a significantly larger percentage of flaws are identified and remedied during development.  More importantly, flaw/vulnerability identification takes place across the functional – non-functional requirements spectrum.  Consequently, the number of vulnerabilities in delivered software products can be expected to drop.

2. Ubiquitous encryption that preserves confidentiality and enhances regulability

For consumers, and many enterprises, encryption is an added layer of security that requires an additional level of effort.  Human nature being what it is, the results of the calculus are generally that a lower level of effort is more valuable than an intangible security benefit.  Cyber-criminals (and intelligence agencies) bank on this.  What if this paradigm could be inverted such that encryption became the norm rather than the exception?

Encryption technologies offer the twin benefits of 1) preserving the confidentiality of communications and 2) providing a unique (and difficult to forge) means for a user to identify herself.  The confidentiality benefit is self-evident:  Encrypted communications can be seen and used only by those who hold the necessary key.  Abusing those communications requires significantly more work on an attacker’s part.

The identification benefit ensures that all users of (and on) a particular service or network are identifiable via the possession and use of a unique credential.  This isn’t new or draconian.  For example, (legal) users of public thoroughfares must acquire a unique credential issued by the state:  a driver’s license.  The issuance of such credentials is dependent on the user’s provision of strong proof of identity (such as, in the case of a driver’s license, a birth certificate, passport or social security card). The encryption-based equivalent to a driver’s license, a digital signature, could be a required element, used to positively authenticate users before access to any electronic resources is granted. 
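
A minimal Python sketch of those twin benefits, using the cryptography package (with keys generated on the fly purely for illustration), might look like this:

```python
# Minimal sketch of the two benefits described above: confidentiality and identification.
# Uses the Python "cryptography" package; keys are generated on the fly for illustration.
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

message = b"meet at the usual place at 0900"

# 1) Confidentiality: only holders of the shared key can read the message.
shared_key = Fernet.generate_key()
ciphertext = Fernet(shared_key).encrypt(message)
plaintext = Fernet(shared_key).decrypt(ciphertext)

# 2) Identification: only the holder of the private key can produce the signature,
#    but anyone with the public key can verify who sent the message.
signing_key = Ed25519PrivateKey.generate()
signature = signing_key.sign(message)
signing_key.public_key().verify(signature, message)  # raises InvalidSignature if forged

print("decrypted:", plaintext.decode())
print("signature verified")
```

In a real deployment the signing key would be bound to an issued credential – the digital analogue of the driver’s license described above.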

From a security perspective, a unique authentication credential provides the ability to tie actions taken by a particular entity to a particular person.  As a result, the ability to regulate illegal behavior increases while the ability to anonymously engage in such behavior is concomitantly curtailed.

3. Attribute-based authorization management at both the OS and application levels

Here’s a hypothetical.  Imagine that you own a hotel.  Now imagine that you’ve put an impressive and effective security fence around the hotel, with a single locking entry point, guarded by a particularly frightening Terminator-like entity with the ability to make unerring access control decisions based on the credentials proffered by putative guests.  Now imagine that the lock on the entry point is the only lock in the hotel.  Every other room on the property can be entered simply by turning the doorknob. 

The word “crazy” might be among the adjectives used to describe the scenario above.  Despite that characterization, this type of authentication-only security is routinely practiced on critical systems in both the public and private sectors.  Not only does it fail to mitigate the insider threat, but it is also antithetical to the basic information security principle of defense in depth.  Once inside the authentication perimeter, an attacker can go anywhere and do anything.

A solution that is rapidly gaining momentum at the application layer is the employment of attribute-based access control (ABAC) technologies based on the eXtensible Access Control Markup Language (XACML) standard.  In an ABAC implementation, every attempt by a user to access a resource is stopped and evaluated against a centrally stored (and controlling) access control policy relevant to both the requested resource and the nature – or attributes – a user is required to have in order to access the resource.  Access requests from users whose attributes match the policy requirements go through; those that do not are blocked.
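
A toy Python sketch of the ABAC pattern (a simplified stand-in for a XACML policy decision point, with invented attribute names and policies) shows the default-deny flavor of the approach:

```python
# Toy sketch of attribute-based access control (ABAC); a simplified stand-in for
# the XACML policy-decision pattern, with invented attribute names and policies.
def evaluate(policy, user_attrs, resource, action):
    """Permit only if every attribute the policy requires matches the user's attributes."""
    for rule in policy.get(resource, []):
        if action in rule["actions"] and all(
            user_attrs.get(k) == v for k, v in rule["required_attrs"].items()
        ):
            return True
    return False  # default deny

policy = {
    "centrifuge_plc_config": [
        {"actions": {"read", "write"},
         "required_attrs": {"role": "ics_engineer", "clearance": "secret", "on_site": True}},
        {"actions": {"read"},
         "required_attrs": {"role": "auditor", "clearance": "secret"}},
    ],
}

alice = {"role": "ics_engineer", "clearance": "secret", "on_site": True}
mallory = {"role": "contractor", "clearance": "none", "on_site": False}

print(evaluate(policy, alice, "centrifuge_plc_config", "write"))    # True
print(evaluate(policy, mallory, "centrifuge_plc_config", "read"))   # False
```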

A similar solution can be applied at the operating system level to allow or block read/write attempts across inter-process communications (IPC) based on policies matching the attributes of the initiating process and the target.  One example, known as Secure OS, is under development by Kaspersky Lab.  At either level, exploiting a system that implements ABAC is significantly more difficult for an attacker and helps to buy down the risk of operating in a hostile environment.

4. Routine continuous assessment and monitoring of networks and systems


It’s not uncommon for attackers, once a system has been compromised, to exfiltrate large amounts of sensitive data over an extended period.  Often, this activity presents as routine system and network activity.  As it’s considered to be “normal,” security canaries aren’t alerted and the attack proceeds unimpeded. 

Part of the problem is that the quantification of system activity is generally binary. That is, it’s either up or it’s down.  And, while this is important in terms of knowing what capabilities are available to an enterprise at any given time, it doesn’t provide actionable intelligence as to how the system is being used (or abused) at any given time.  Fortunately, it’s essentially a Big Data problem, and Big Data tools and solutions are well understood. 

The solution comprises two discrete components.  First, an ongoing data collection and analysis activity is used to establish a baseline for normal user behavior, network loading, throughput and other metrics.  Once the baseline is established, collection continues and the collected behavioral metrics are evaluated against the baseline on a continual basis.  Deviations from the norm exceeding a specified tolerance are reported, trigger automated defensive activity, or both.
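
Here is a minimal sketch of the baseline-and-deviation idea in Python, using a simple per-metric mean and standard deviation; the traffic figures and the three-sigma tolerance are illustrative assumptions rather than a recommended configuration.

```python
# Minimal sketch of baseline-then-monitor anomaly detection for a single metric
# (e.g., outbound traffic per day). Data and thresholds are illustrative assumptions.
import statistics

def build_baseline(history):
    """Establish 'normal' behavior from an observation period."""
    return statistics.mean(history), statistics.stdev(history)

def deviates(observation, baseline, tolerance=3.0):
    """Return True if the observation exceeds the tolerance around the baseline."""
    mean, stdev = baseline
    return abs(observation - mean) > tolerance * stdev

# Thirty days of 'normal' outbound traffic (GB/day), then live observations.
history = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0, 5.1, 4.9] * 3
baseline = build_baseline(history)

for observed in [5.2, 4.6, 19.4]:   # the last value suggests bulk exfiltration
    if deviates(observed, baseline):
        print(f"ALERT: {observed} GB/day deviates from baseline {baseline[0]:.1f} GB/day")
```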

Conclusion

To reiterate, these measures do not comprise a panacea.  Instead, they represent a change, a paradigm shift in the way computing and the internet are conceived, architected and deployed that offers the promise of a significant increase in security and stability.  More importantly, they represent a series of choices in how we implement and control our cyber environment.  The future, contrary to Basulto’s assumption, isn’t slaved to the past.

Sunday, September 29, 2013

Popular Analytics: Not Just for Google Anymore



In an article entitled “Google vs. Death,” Time magazine’s Harry McCracken and Lev Grossman explore Google’s corporate strategy of simultaneously investing in both mainstream services and risky long shots.  Referred to by Google’s CEO Larry Page as “moon shots,” the research and development efforts in question are considerably longer-term and more ambitious than those undertaken by other companies.  Among these investments is a start-up called Calico.  

Calico, which will be run by Arthur Levinson, the former CEO of Genentech, will focus on medical issues associated with the aging process.  Among Calico’s operating premises is the technologist’s holy trinity:  Specifically, there is no problem that can’t be solved through a) hefty infusions of capital, b) the use of innovative technology and c) the application of huge amounts of processing power.  This is especially true in medicine, where doctors, researchers and other healthcare providers routinely access and analyze large patient data sets as part of diagnosis and treatment plans.

In many ways, Google’s investment in Calico is reflective of medicine’s return to its roots as an information science.  Indeed, the distinction between physicians and surgeons goes back almost a millennium.  For much of that time the two were members of different – and rival – professions.  English King Henry VIII chartered the London Royal College of Physicians in 1518.  It wasn’t until 1540 that the Company of Barber/Surgeons was granted a charter.  

The classical contrasts were clear:  Physicians performed an analysis of information about the patient’s condition and history within the context of human physiology and pathology.  Based on the information analysis, a diagnosis was made and a course of treatment prescribed.  Surgeons, on the other hand, sliced, diced, hacked, sawed, set, sewed and otherwise mechanically addressed injuries and wounds.  The distinction, at the time, was akin to that between white collar and blue collar workers; physicians practiced physic (roughly comparable to modern internal medicine) while surgeons engaged in, well, manual labor. (That’s not just a turn of phrase.  The word “surgery” derives from the Greek: χειρουργική cheirourgikē (composed of χείρ, "hand", and έργον, "work"), via Latin: chirurgiae, meaning "hand work.")  Fortunately for all concerned, the distinctions between the two in terms of professional standing and credibility as well as the use of information analysis to diagnose and treat have largely evaporated.

The points that Google is making in starting Calico, however, seem to be that:

  • Surgery, while sometimes unavoidable and demanding the utmost talent and capability on the practitioner’s part, is essentially reactive and remedial in nature;
  • The need for such remediation can be dramatically reduced, or at least more specifically targeted, by taking proactive measures; and
  • The necessary proactive measures can be accurately determined by running powerful analytics against ever-growing medical data sets.
These aren’t especially groundbreaking concepts.  Medical professionals typically engage in diagnostic analyses prior to embarking upon a course of treatment.  However, many of these analyses are mental, relying on, and limited to, the individual practitioner’s experience and innate capabilities.  Even when the analyses are computer assisted, they are often based on relatively small data sets and inefficient processing capabilities.   

For example, a Board Certified Behavioral Analyst (BCBA) treating a child with Autism Spectrum Disorder (ASD) in New Jersey generally has access only to her own experiences, supplemented with information published by professional organizations, when designing an Applied Behavioral Analysis (ABA) treatment plan for the child.  In a fortunate circumstance, the BCBA may have access to records from other analysts in the same practice to which she could compare the child’s case and treatment.  However, this is still a limited information pool from which to draw, especially when compared to the data available on a state or nationwide basis.

As important as the information pool is the processing mechanism.  In the example above, the BCBA only has so many hours in a day to read case files, make sense of them, determine whether they apply to her case and, if so, how.  And all that is prior to making any determination as to what kinds of therapies or treatments the information in the files may indicate or suggest.  A critical aspect of information’s utility is its timeliness.  The best ASD therapy in the world is of little use if the time necessary to analyze existing data exceeds the time available for diagnosis prior to treatment.

These data problems are not unique to medicine.  They also acutely impact national security enterprises including defense and the intelligence community (IC) and local enterprises such as law enforcement.  As with the medical community, these entities are faced with critical problems, large data sets and a need for rapid, accurate analysis leading to accurate and effective solutions.   All three, defense, the IC and the medical community, also share a need for reliable, robust information security.  While it’s essential to provide the right information rapidly and efficiently, it’s absolutely crucial that the information be appropriately sanitized and that unauthorized parties are denied access.

Looking at these needs from an acquisitions perspective, a requirement emerges for a generic analytics capability that can be applied to domains ranging from medicine to intelligence to warfighting to academic research, business analysis and law enforcement.  Characteristics of such a capability might include:

  • The ability to define analytics parameters at the user or administrator level;
  • Data source and type agnosticism and the ability to add, remove or change data sources without significant impact to the overall capability;
  • Single Sign-On across the enterprise and/or across multiple domains;
  • Fine-grained access control and automated data sanitization based on user attributes (illustrated in the sketch after this list); and
  • Rapid analytics processing on commodity hardware.
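
As a toy illustration of the attribute-driven sanitization characteristic referenced in the list above, the Python sketch below releases different views of the same record depending on the requester’s role; the field names and roles are invented.

```python
# Toy sketch of attribute-driven data sanitization: the same record is filtered
# differently depending on the requesting user's attributes. Names are invented.
RECORD = {
    "patient_id": "A-1021",
    "diagnosis": "ASD",
    "therapy_plan": "ABA, 20 hrs/week",
    "ssn": "***-**-1234",
    "home_address": "12 Elm St.",
}

# Which fields each role may see; everything else is stripped before release.
VISIBLE_FIELDS = {
    "treating_clinician": {"patient_id", "diagnosis", "therapy_plan", "home_address"},
    "researcher": {"diagnosis", "therapy_plan"},          # de-identified view
    "billing": {"patient_id", "ssn"},
}

def sanitize(record, user_attrs):
    allowed = VISIBLE_FIELDS.get(user_attrs.get("role"), set())   # default: nothing
    return {k: v for k, v in record.items() if k in allowed}

print(sanitize(RECORD, {"role": "researcher"}))
print(sanitize(RECORD, {"role": "billing"}))
```
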
There are dozens (if not hundreds) more such technical requirements.  However, one of the most important requirements isn’t technical but logistical.  In order for such a capability to make a difference, in order for it to be truly useful, it must be readily proliferated.  It’s one thing to have an entire company built around a specialized analytics capability, as Google has done with Calico.  It’s quite another to provide a drop-in, generic analytics tool that data intensive organizations of varying size can rapidly deploy and use, regardless of their area of specialization or domain.  

Put another way, unless the BCBA’s contract information technology (IT) support can rapidly install and configure the analytics tool (regardless of whether it’s on-premises or in the Cloud), and unless the BCBA can start to use it with a minimum of set-up and training time, the need isn’t being met.  As importantly, the whole exercise also fails unless small and medium-sized organizations can afford to acquire and use the tool.  

There must, of course, be data upon which the tool can operate.  However, the ubiquity of affordable, secure and readily employed analytics tools, whether across the medical or the military communities, can be expected to create a groundswell of grass-roots demand for the secure but open availability of organizationally (and especially governmentally) maintained data – demand that cannot be resisted by industry and government policymakers or data owners.  For an example of such data, one needs look no further than the headlines.  The Affordable Care Act, regardless of whether one loves it or hates it, will create unprecedented stores of medical information that could be used in the search for remedies, cures and truly effective therapies.  Google, for one, is banking on this.

For a precedential example of such demand, one need only look at the effect of the rapid spread of mobile applications and APIs for accessing government data.  Widely proliferated mobile computing devices and processing capability created a popular expectation that government would open the data floodgates to make things faster, easier, more accurate and more convenient.  The result?  Consider, for example, the Internal Revenue Service’s (IRS) IRS2Go app.  In other words, “if you build it, they (or at least the data) will come.”

Perhaps the most surprising part of the “popular analytics” puzzle is that inexpensive, powerful analytics tools aren’t yet taking the IT landscape by storm.  The components necessary to create such tools are not only widely available and robust; most of them are also open source and can be had without licensing or acquisition costs.  A few examples:

  • Secure storage with cell-level access control (Open Source, Apache 2.0);
  • Fine-grained, attribute-based access control (Open Source, Apache 2.0);
  • Scalable, rapidly definable data analytics (Open Source, Apache 2.0);
  • Single Sign-On and authentication management (Open Source, Apache 2.0);
  • Data access and data loose coupling (Open Source, Apache 2.0); and
  • Management of APIs to internal and external data stores (Open Source, Apache 2.0).

The case for democratizing analytics is compelling.  There’s always a possibility that one person looking at a limited data set may engage in the analysis that will lead to a breakthrough.  However, the odds of such a breakthrough increase significantly when many people look at a very large data set using powerful tools.  In the case of the BCBA doing what she can for one family’s autistic child, aren’t the benefits of increasing the odds of discovering an effective therapy obvious?  Similarly, how much more effective could a law enforcement organization be in protecting a municipality if affordable, powerful and effective analysis of criminal activity, trends and behaviors were the rule rather than the exception?

Democratizing – effectively crowd-sourcing – analytics can have profoundly beneficial results.  The need is there, the tools are there.  Why should Google have all the fun?