In an article entitled “Google vs. Death,” Time magazine’s Harry McCracken and Lev Grossman explore Google’s corporate strategy of investing in both mainstream services and risky long shots. Referred to by Google’s CEO Larry Page as “moon shots,” these research and development efforts are longer term and more ambitious than those undertaken by most other companies. Among these investments is a start-up called Calico.
Calico, which will be run by Arthur Levinson, the former CEO
of Genentech, will focus on medical issues associated with the aging
process. Among Calico’s operating premises is the technologist’s holy trinity: there is no problem that can’t be solved through a) hefty infusions of capital, b) innovative technology and c) huge amounts of processing power. The premise is especially apt in medicine, where doctors,
researchers and other healthcare providers routinely access and analyze large
patient data sets as part of diagnosis and treatment plans.
In many ways, Google’s investment in Calico reflects medicine’s return to its roots as an information science. Indeed, the distinction between physicians and surgeons goes back almost a millennium. For much of that time the two were members of different – and rival – professions. England’s King Henry VIII chartered the Royal College of Physicians of London in 1518; it wasn’t until 1540 that the Company of Barber-Surgeons was granted a charter.
The classical contrasts were clear: Physicians performed an analysis of
information about the patient’s condition and history within the context of
human physiology and pathology. Based on
the information analysis, a diagnosis was made and a course of treatment
prescribed. Surgeons, on the other hand,
sliced, diced, hacked, sawed, set, sewed and otherwise mechanically addressed
injuries and wounds. The distinction, at the time, was akin to that between white-collar and blue-collar workers; physicians practiced physic (roughly comparable to modern internal medicine) while surgeons engaged in, well, manual labor. (That’s not just a turn of phrase: the word “surgery” derives from the Greek χειρουργική cheirourgikē, composed of χείρ, “hand,” and έργον, “work,” via the Latin chirurgiae, meaning “hand work.”)
Fortunately for all concerned, the distinctions between the two professions – in standing, in credibility and in the use of information analysis to diagnose and treat – have largely evaporated.
The points that Google is making in starting Calico, however, seem to be these:
- Surgery, while sometimes unavoidable and demanding the utmost talent and capability on the practitioner’s part, is essentially reactive and remedial in nature;
- The need for such remediation can be dramatically reduced, or at least more precisely targeted, by taking proactive measures; and
- The necessary proactive measures can be accurately determined by running powerful analytics against ever-growing medical data sets.
These aren’t especially groundbreaking concepts. Medical professionals typically engage in
diagnostic analyses prior to embarking upon a course of treatment. However, many of these analyses are mental,
relying on, and limited to, the individual practitioner’s experience and innate
capabilities. Even when the analyses are computer-assisted, they are often based on relatively small data sets and inefficient processing.
For example, a Board Certified Behavior Analyst (BCBA) treating a child with Autism Spectrum Disorder (ASD) in New Jersey generally has access only to her own experiences, supplemented with information published by professional organizations, when designing an Applied Behavior Analysis (ABA) treatment plan for the child. In a fortunate
circumstance, the BCBA may have access to records from other analysts in the
same practice to which she could compare the child’s case and treatment. However, this is still a limited information
pool from which to draw, especially when compared to the data available on a
state or nationwide basis.
As important as the information pool is the processing
mechanism. In the example above, the
BCBA only has so many hours in a day to read case files, make sense of them,
determine whether they apply to her case and, if so, how. And all of that comes before any determination as to what therapies or treatments the information in the files may suggest. A
critical aspect of information’s utility is its timeliness. The best ASD therapy in the world is of
little use if the time necessary to analyze existing data exceeds the time
available for diagnosis prior to treatment.
These data problems are not unique to medicine. They also acutely impact national security enterprises, including defense and the intelligence community (IC), as well as local enterprises such as law enforcement. As with the medical community, these entities face critical problems, large data sets and a need for rapid, accurate analysis leading to effective solutions. All three – defense, the IC and the medical community – also share a need for reliable, robust information security. While it’s essential to provide the right information rapidly and efficiently, it’s absolutely crucial that the information be appropriately sanitized and that unauthorized parties be denied access.
Looking at these needs from an acquisitions perspective, a
requirement emerges for a generic analytics capability that can be applied to
domains ranging from medicine to intelligence to warfighting to academic
research, business analysis and law enforcement. Characteristics of such a capability might
include:
- The ability to define analytics parameters at the user or administrator level;
- Data source and type agnosticism and the ability to add, remove or change data sources without significant impact to the overall capability (see the sketch following this list);
- Single Sign-On across the enterprise and/or across multiple domains;
- Fine-grained access control and automated data sanitization based on user attributes; and
- Rapid analytics processing on commodity hardware.
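Of these, data source agnosticism is the characteristic that most shapes the architecture, so a brief sketch may help. The Python below is a minimal illustration under assumed names: the adapter registry, the source functions and the record fields are invented for the example, not drawn from any particular product.

```python
from typing import Callable, Dict, Iterable

# Hypothetical sketch of data-source agnosticism: the analytics core
# consumes plain dict records from any registered source, so sources
# can be added, removed or swapped without touching the analytics code.
Record = dict
Source = Callable[[], Iterable[Record]]

SOURCES: Dict[str, Source] = {}

def register(name: str):
    """Decorator that plugs a new data source into the registry."""
    def wrap(fn: Source) -> Source:
        SOURCES[name] = fn
        return fn
    return wrap

@register("case_files")        # e.g., a practice's local records
def case_files() -> Iterable[Record]:
    yield {"patient": "A", "therapy": "ABA", "outcome": "improved"}
    yield {"patient": "B", "therapy": "ABA", "outcome": "no change"}

@register("state_registry")    # e.g., a statewide data set, same interface
def state_registry() -> Iterable[Record]:
    yield {"patient": "C", "therapy": "ABA", "outcome": "improved"}

def improvement_rate() -> float:
    """The analytics core: it never knows which sources are plugged in."""
    records = [r for source in SOURCES.values() for r in source()]
    improved = sum(1 for r in records if r["outcome"] == "improved")
    return improved / len(records)

print(f"{improvement_rate():.0%}")  # 67% with both sources registered
```

Removing `state_registry` or adding a nationwide source changes only the registry, never the analytics core, which is precisely the property the requirement demands.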
There are dozens (if not hundreds) more such technical
requirements. However, one of the most
important requirements isn’t technical but logistical. In order for such a capability to make a
difference, in order for it to be truly useful, it must be readily proliferated. It’s one thing to have an entire company
built around a specialized analytics capability, as Google has done with
Calico. It’s quite another to provide a
drop-in, generic analytics tool that data-intensive organizations of varying
size can rapidly deploy and use, regardless of their area of specialization or
domain.
Put another way, unless the BCBA’s contract information
technology (IT) support can rapidly install and configure the analytics tool
(regardless of whether it’s on-premises or in the Cloud), and unless the BCBA
can start to use it with a minimum of set-up and training time, the need isn’t
being met. Just as importantly, the whole exercise fails unless small and medium-sized organizations can afford to acquire and use the tool.
Of course, there must also be data upon which the tool can operate. Here, the ubiquity of affordable, secure and readily employed analytics tools, whether across the medical or the military communities, can be expected to create a groundswell of grass-roots demand for the secure but open availability of organizationally (and especially governmentally) maintained data, a demand that industry and government policymakers and data owners will be hard pressed to resist.
For an example of such data, one need look no further than the headlines. The Affordable Care Act, regardless of whether one loves it or hates it, will create unprecedented stores of medical information that could be used in the search for remedies, cures and truly effective therapies. Google, for one, is banking on this.
For a precedential example of such demand, one need only
look at the effect of the rapid spread of mobile applications and APIs for accessing
government data. Widely proliferated
mobile computing devices and processing capability created a popular expectation
that government would open the data floodgates to make things faster, easier,
more accurate and more convenient. The
result? Consider, for one, the Internal Revenue Service’s (IRS) downloadable IRS2Go app. In other words, “if you build it, they (or at least the data) will come.”
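As a small illustration of the programmatic access that expectation produced, here is a sketch of a query against a CKAN-style open-data catalog (the catalog behind Data.gov is CKAN-based). The search term, the row count and the handling of the response are assumptions made for the example.

```python
import json
import urllib.request

# Query a CKAN-style open-data catalog for data sets matching a term.
# The endpoint form is CKAN's standard action API; the query is illustrative.
URL = "https://catalog.data.gov/api/3/action/package_search?q=autism&rows=5"

with urllib.request.urlopen(URL) as response:
    payload = json.load(response)

# CKAN wraps its results as {"success": ..., "result": {"results": [...]}}.
for dataset in payload["result"]["results"]:
    print(dataset["title"])
```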
Perhaps the most surprising part of the “popular analytics”
puzzle is that inexpensive, powerful analytics tools aren’t yet taking the IT
landscape by storm. The components necessary to create such tools are not only widely available and robust; most are also open source and can be had without licensing or acquisition costs. A few examples:
| Capability | Product | License Type |
| --- | --- | --- |
| Secure storage with cell-level access control | | Open Source; Apache 2.0 |
| Fine-grained, attribute-based access control | | Open Source; Apache 2.0 |
| Scalable, rapidly definable data analytics | | Open Source; Apache 2.0 |
| Single Sign-On and authentication management | | Open Source; Apache 2.0 |
| Data access and data loose coupling | | Open Source; Apache 2.0 |
| Management of APIs to internal and external data stores | | Open Source; Apache 2.0 |
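To make the table’s first row concrete: cell-level access control means every stored cell carries its own visibility rule, evaluated against a reader’s authorizations at scan time. The sketch below mimics the visibility-expression idea popularized by Apache Accumulo, but the tiny parser and the data layout are simplifications invented for illustration, not any product’s actual API.

```python
# Simplified sketch of cell-level visibility: each cell stores a boolean
# expression over authorization tokens, and a scan returns only the cells
# whose expression the reader's authorizations satisfy. The grammar here
# handles only "|" (or) over "&" (and) terms, a deliberate simplification.

def visible(expression: str, authorizations: set) -> bool:
    """True if any '|' alternative has all of its '&' tokens authorized."""
    return any(
        all(token.strip() in authorizations for token in clause.split("&"))
        for clause in expression.split("|")
    )

# (row, column, visibility expression, value)
cells = [
    ("patient-17", "diagnosis", "clinician",            "ASD, moderate"),
    ("patient-17", "name",      "clinician & treating", "Jane Doe"),
    ("patient-17", "trend",     "clinician | analyst",  "improving"),
]

def scan(cells, authorizations):
    """Return only the cells this reader is entitled to see."""
    return [c for c in cells if visible(c[2], authorizations)]

for row, column, _, value in scan(cells, {"analyst"}):
    print(row, column, value)  # only the "trend" cell is returned
```

Because the rule travels with the data itself, sanitization cannot be bypassed at the application layer: a researcher holding only the analyst authorization sees aggregate trends, while the treating clinician’s view includes identifying details.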
The case for democratizing analytics is compelling. There’s always a possibility that one person
looking at a limited data set may engage in the analysis that will
lead to a breakthrough. However, the
odds of such a breakthrough increase significantly when many people look at a
very large data set using powerful tools.
In the case of the BCBA doing what she can for one family’s autistic
child, aren’t the benefits of increasing the odds of discovering an effective
therapy obvious? Similarly, how much
more effective could a law enforcement organization be in protecting a
municipality if affordable, powerful analysis of criminal activity, trends and behaviors were the rule rather than the exception?
Democratizing – effectively crowd-sourcing – analytics can
have profoundly beneficial results. The
need is there; the tools are there. Why
should Google have all the fun?