Sunday, September 29, 2013

Popular Analytics: Not Just for Google Anymore



In an article entitled “Google vs. Death,” Time magazine’s Harry McCracken and Lev Grossman explore Google’s corporate strategy of investing simultaneously in mainstream services and in risky long shots.  Referred to by Google’s CEO Larry Page as “moon shots,” the research and development efforts in question are longer term and more ambitious than those undertaken by other companies.  Among these investments is a start-up called Calico.  

Calico, which will be run by Arthur Levinson, the former CEO of Genentech, will focus on medical issues associated with the aging process.  Among Calico’s operating premises is the technologist’s holy trinity: there is no problem that can’t be solved through a) hefty infusions of capital, b) the use of innovative technology and c) the application of huge amounts of processing power.  This is especially true in medicine, where doctors, researchers and other healthcare providers routinely access and analyze large patient data sets as part of diagnosis and treatment planning.

In many ways, Google’s investment in Calico reflects medicine’s return to its roots as an information science.  Indeed, the distinction between physicians and surgeons goes back almost a millennium, and for much of that time the two were members of different – and rival – professions.  English King Henry VIII chartered the Royal College of Physicians of London in 1518; it wasn’t until 1540 that the Company of Barber-Surgeons was granted its charter.  

The classical contrasts were clear:  Physicians performed an analysis of information about the patient’s condition and history within the context of human physiology and pathology.  Based on the information analysis, a diagnosis was made and a course of treatment prescribed.  Surgeons, on the other hand, sliced, diced, hacked, sawed, set, sewed and otherwise mechanically addressed injuries and wounds.  The distinction, at the time, was akin to that between white collar and blue collar workers; physicians practiced physic (roughly comparable to modern internal medicine) while surgeons engaged in, well, manual labor. (That’s not just a turn of phrase.  The word “surgery” derives from the Greek: χειρουργική cheirourgikē (composed of χείρ, "hand", and έργον, "work"), via Latin: chirurgiae, meaning "hand work.")  Fortunately for all concerned, the distinctions between the two in terms of professional standing and credibility as well as the use of information analysis to diagnose and treat have largely evaporated.

The points that Google is making in starting Calico, however, seem to be that:

  • Surgery, while sometimes unavoidable and demanding the utmost talent and capability on the practitioner’s part, is essentially reactive and remedial in nature;
  • The need for such remediation can be dramatically reduced, or at least more specifically targeted, by taking proactive measures; and
  • The necessary proactive measures can be accurately determined by running powerful analytics against ever-growing medical data sets.
These aren’t especially groundbreaking concepts.  Medical professionals typically engage in diagnostic analyses prior to embarking upon a course of treatment.  However, many of these analyses are mental, relying on, and limited to, the individual practitioner’s experience and innate capabilities.  Even when the analyses are computer assisted, they are often based on relatively small data sets and inefficient processing capabilities.   

For example, a Board Certified Behavior Analyst (BCBA) treating a child with Autism Spectrum Disorder (ASD) in New Jersey generally has access only to her own experience, supplemented with information published by professional organizations, when designing an Applied Behavior Analysis (ABA) treatment plan for the child.  In a fortunate circumstance, the BCBA may have access to records from other analysts in the same practice against which she can compare the child’s case and treatment.  However, this is still a limited information pool from which to draw, especially when compared to the data available on a state or nationwide basis.

As important as the information pool is the processing mechanism.  In the example above, the BCBA only has so many hours in a day to read case files, make sense of them, determine whether they apply to her case and, if so, how.  And all that is prior to making any determination as to what kinds of therapies or treatments the information in the files may indicate or suggest.  A critical aspect of information’s utility is its timeliness.  The best ASD therapy in the world is of little use if the time necessary to analyze existing data exceeds the time available for diagnosis prior to treatment.

These data problems are not unique to medicine.  They also acutely impact national security enterprises, including defense and the intelligence community (IC), and local enterprises such as law enforcement.  As with the medical community, these entities are faced with critical problems, large data sets and a need for rapid, accurate analysis leading to effective solutions.   All three (defense, the IC and the medical community) also share a need for reliable, robust information security.  While it’s essential to provide the right information rapidly and efficiently, it’s absolutely crucial that the information be appropriately sanitized and that unauthorized parties be denied access.

Looking at these needs from an acquisitions perspective, a requirement emerges for a generic analytics capability that can be applied to domains ranging from medicine to intelligence to warfighting to academic research, business analysis and law enforcement.  Characteristics of such a capability might include:

  • The ability to define analytics parameters at the user or administrator level;
  • Data source and type agnosticism and the ability to add, remove or change data sources without significant impact to the overall capability;
  • Single Sign-On across the enterprise and/or across multiple domains;
  • Fine-grained access control and automated data sanitization based on user attributes (sketched after this list); and
  • Rapid analytics processing on commodity hardware.
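As a rough illustration of the access control and sanitization bullets, the following Python sketch filters and redacts records based on a user’s attributes before anything reaches the analytics layer.  The attribute names, record layout and redaction rules are invented for the example and are not drawn from any particular product.

    # Hypothetical sketch: attribute-based filtering and sanitization of records
    # before they are handed to an analytics engine.  Attribute names, the record
    # layout and the redaction rules are illustrative only.

    from dataclasses import dataclass, field

    @dataclass
    class User:
        name: str
        attributes: dict = field(default_factory=dict)  # e.g. {"role": "clinician", "region": "NJ"}

    # Each record declares who may see it; SENSITIVE_FIELDS are stripped for everyone.
    RECORDS = [
        {"id": 1, "region": "NJ", "required_role": "clinician",
         "data": {"age": 7, "therapy": "ABA", "patient_name": "redact-me"}},
        {"id": 2, "region": "NY", "required_role": "clinician",
         "data": {"age": 9, "therapy": "ABA", "patient_name": "redact-me"}},
    ]
    SENSITIVE_FIELDS = {"patient_name"}

    def visible(user: User, record: dict) -> bool:
        """Attribute-based access check: role and region must both match."""
        return (user.attributes.get("role") == record["required_role"]
                and user.attributes.get("region") == record["region"])

    def sanitize(record: dict) -> dict:
        """Strip sensitive fields before the record leaves the data layer."""
        return {k: v for k, v in record["data"].items() if k not in SENSITIVE_FIELDS}

    def query(user: User) -> list:
        """Return only the records this user may see, already sanitized."""
        return [sanitize(r) for r in RECORDS if visible(user, r)]

    if __name__ == "__main__":
        bcba = User("analyst", {"role": "clinician", "region": "NJ"})
        print(query(bcba))   # [{'age': 7, 'therapy': 'ABA'}]

The point of the sketch is that enforcement happens at the data layer, driven by user attributes, rather than being left to each individual analytic application.
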
There are dozens (if not hundreds) more such technical requirements.  However, one of the most important requirements isn’t technical but logistical.  In order for such a capability to make a difference, in order for it to be truly useful, it must be readily proliferated.  It’s one thing to have an entire company built around a specialized analytics capability, as Google has done with Calico.  It’s quite another to provide a drop-in, generic analytics tool that data intensive organizations of varying size can rapidly deploy and use, regardless of their area of specialization or domain.  

Put another way, unless the BCBA’s contract information technology (IT) support can rapidly install and configure the analytics tool (regardless of whether it’s on-premises or in the Cloud), and unless the BCBA can start to use it with a minimum of set-up and training time, the need isn’t being met.  As importantly, the whole exercise also fails unless small and medium-sized organizations can afford to acquire and use the tool.  

Of course, there must be data upon which the tool can operate.  However, the ubiquity of affordable, secure and readily employed analytics tools, whether across the medical or the military communities, can be expected to create a groundswell of grass-roots, popular demand for the secure but open availability of organizationally (and especially governmentally) maintained data, a demand that industry and government policymakers and data owners will be hard pressed to resist.  For an example of such data, one need look no further than the headlines.  The Affordable Care Act, regardless of whether one loves it or hates it, will create unprecedented stores of medical information that could be used in the search for remedies, cures and truly effective therapies.  Google, for one, is banking on this.

For a precedential example of such demand, one need only look at the effect of the rapid spread of mobile applications and APIs for accessing government data.  Widely proliferated mobile computing devices and processing capability created a popular expectation that government would open the data floodgates to make things faster, easier, more accurate and more convenient.  The result?  Well, take a look at the major mobile app stores, where one can download the Internal Revenue Service’s (IRS) IRS2Go app.  In other words, “if you build it, they (or at least the data) will come.”

Perhaps the most surprising part of the “popular analytics” puzzle is that inexpensive, powerful analytics tools aren’t yet taking the IT landscape by storm.  The components necessary to create such tools are not only widely available and robust; most of them are open source and can be had without licensing or acquisition costs.  A few examples (a small sketch of the first capability, cell-level access control, follows the list):

  • Secure storage with cell-level access control (open source; Apache 2.0 license)
  • Fine-grained, attribute-based access control (open source; Apache 2.0 license)
  • Scalable, rapidly definable data analytics (open source; Apache 2.0 license)
  • Single Sign-On and authentication management (open source; Apache 2.0 license)
  • Data access and data loose coupling (open source; Apache 2.0 license)
  • Management of APIs to internal and external data stores (open source; Apache 2.0 license)

The case for democratizing analytics is compelling.  There’s always a possibility that one person looking at a limited data set may perform the analysis that leads to a breakthrough.  However, the odds of such a breakthrough increase significantly when many people look at a very large data set using powerful tools.  In the case of the BCBA doing what she can for one family’s autistic child, aren’t the benefits of increasing the odds of discovering an effective therapy obvious?  Similarly, how much more effective could a law enforcement organization be in protecting a municipality if affordable, powerful and effective analysis of criminal activity, trends and behaviors were the rule rather than the exception?

Democratizing – effectively crowd-sourcing – analytics can have profoundly beneficial results.  The need is there, the tools are there.  Why should Google have all the fun?

Monday, July 15, 2013

Open Source: The Dominant Warfighting Doctrine of the 21st Century

Open source software offers the promise of a revolutionary transformation in defense, intelligence, law enforcement and government technology at a cost and pace that satisfy the competing requirements of shrinking resources and constantly accelerating global operations.   While this technological transformation is emphasized by engineers and developers within industry and the acquisition community, it is often perceived as tangential by those with an operational focus. 

This apparent dissonance between the acquisitions and operational communities is caused by the broad nature of the open source phenomenon:  Open source refers as much to a cultural perspective as it does to a technology model.  However, the gulf separating the two sides isn’t as wide as it may appear.  For the acquisitions community, open source technology enables the rapid insertion of new and improved capabilities.  For the operators, open source culture enables the achievement of a doctrine and force structure for self-synchronizing operations.  The cultural and technological synergy of open source is the catalyst for, as David Alberts and Richard Hayes put it, pushing “power to the edge.”

Open Source Culture

One of the best expositions of open source culture is Jason Hibbets’ The Foundation for an Open Source City.  In addition to being a compelling call to arms for open source advocates, the book offers a pragmatic exposition of the characteristics an organization needs in order to reap the benefits of open source culture.  Open source, as Hibbets points out, encompasses more than the selection of a software licensing scheme.  It is about empowering both creators and users to determine the most effective solutions to a problem and, at the same time, empowering them to enhance those solutions with value drawn from their own knowledge and experience.

The characteristics of an “open source city” are neatly summarized in five bullets:

•    Fostering a culture of citizen participation;
•    Having an effective open government policy;
•    Having an effective open data initiative;
•    Promoting open source groups and conferences; and
•    Being a hub for innovation and open source business.

It’s worth a brief look at each of these.

Citizen participation (and for our purposes, “citizen” can be read as “user”) is critical to the rapid development of innovative solutions.  Broad citizen participation allows issues and challenges facing a political entity to be placed before its constituency.  More importantly, it allows solution concepts to be drawn from the constituency, a much larger pool than government alone.  The contemporary buzzword for obtaining ideas by casting challenges before a mass of people (especially an online community) is “crowdsourcing.”  It’s really a much older concept.  In The Cathedral and the Bazaar, open source evangelist Eric Raymond called it “Linus’ Law,” in honor of Linus Torvalds, the originator of the open source Linux operating system.  Linus’ Law states that “given enough eyeballs, all bugs are shallow,” applying the principles of broad enfranchisement and participation to problem solving.

Open government policies are designed to improve transparency (and, concomitantly, trust), access to public information and coordination among government (and other public sector entities), non-profits, the private sector and the citizenry.  They do so by emphasizing distributed review, inclusiveness and broad, diverse participation.  Such policies are often tangibly expressed as publicly accessible web portals offering open data, web and mobile applications for visualizing and using the data and links to stakeholder organizations.

Open data initiatives are about making data both publicly available and useful.  Making raw data sets accessible doesn’t help the vast majority of users.  True open data initiatives provide mechanisms by which users can consume the available data.  These may include visualization tools, search engines and web applications that enable users to derive benefit from the data.
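As a small illustration of the difference between posting raw data and making it consumable, the snippet below pulls a hypothetical open-data endpoint and reduces it to something a citizen can read at a glance.  The URL and field names are placeholders; a real portal would publish its own schema.

    # Hypothetical example of consuming an open data endpoint and summarizing it.
    # The URL and the JSON field names are placeholders, not a real service.

    import json
    from collections import Counter
    from urllib.request import urlopen

    OPEN_DATA_URL = "https://data.example.gov/api/service-requests.json"  # placeholder

    def top_request_types(url: str, n: int = 5) -> list:
        """Summarize raw open data into the n most common service request types."""
        with urlopen(url) as resp:
            records = json.load(resp)
        return Counter(r.get("request_type", "unknown") for r in records).most_common(n)

    if __name__ == "__main__":
        for request_type, count in top_request_types(OPEN_DATA_URL):
            print(f"{request_type}: {count}")

It is this last step, turning accessible data into usable information, that separates a true open data initiative from a file dump.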

Open source culture is grass-roots in nature.  In order to thrive, it requires an environment in which stakeholder communities are encouraged and supported.  Support means a number of things: providing physical space in which to conduct activities and meetings, fostering an environment that encourages a continual influx of new ideas and concepts, using official channels to disseminate information about (and of relevance to) the open source communities both internally and externally and, of course, providing financial support.  The nature of this support is important; it should be enabling rather than prescriptive if the broad participation that empowers open source is to be realized.

The last characteristic, that of being a hub for innovation, is really the result of the other four.  When a political entity enables open source with the encouragement of broad participation, enactment of governance policies, open data and material support, many groups with a common vision, but a diversity of approaches to realization, are brought together.  The result is that only broad guidance is necessary to create the desired effects.

Power to the Edge

Power to the edge is a command and control philosophy initially articulated in the early 21st century by the US Department of Defense Command and Control Research Program (CCRP).  Command and control (C2) is, in US doctrinal terms, an organization’s ability to exercise authority and direction over subordinate components to achieve mission objectives.  C2 is achieved through the interplay of personnel, equipment, communications, facilities and procedures that enable a commander to plan, coordinate and control forces during operational activities.

Power to the Edge philosophy emphasizes a decentralized form of C2 in which the commander’s role shifts from providing prescriptive oversight (i.e., a goal, an operational solution and direction as to how to achieve that solution) to providing general guidance and enabling support.  The concept is not new; indeed, the German Army has emphasized what it called Auftragstaktik, or mission-type tactics, since the early 19th century.  Auftragstaktik as a doctrine features individual initiative, independent decision-making and the encouragement of subordinate leaders to reach tactical decisions on their own in furtherance of a general command intent.  The commander’s primary roles are to provide an overall vision of the objective end state and to support the force, giving virtual autonomy to subordinate commanders in achieving her intent. 

The key difference between Auftragstaktik as practiced by the German Army in two World Wars and Power to the Edge as envisioned by the CCRP is the addition of shared situational awareness, enabled and supported by robust networking,  in both superior-subordinate and peer-to-peer relationships.  Shared situational awareness between peer organizations engenders clear and consistent understanding of the commander’s intent, the timely exchange of relevant and high-quality information, a demand for competent participation at all levels and trust in and between the information being exchanged, subordinates, superiors, peers and technology.  The result is that robustly networked organizations are able to self-synchronize, dramatically reducing both the need for command intervention to cope with the vagaries of the battlespace and the time required for the force to make sense of and respond to new operational stimuli.

For Power to the Edge/self-synchronization to work, the subordinate organizations must be technically enabled and procedurally empowered to fully engage in the supply, consumption and enhancement of battlefield information.  At the same time, the commander must create an environment where all members of the community are aware of the overall mission goals, the intent is clearly communicated and openness and trust between the command and subordinate units are fostered.  Part of this environment is ensuring that, within relevant operational security constraints, all subordinates have on-demand access to the tactical and intelligence data held at the command.  Additionally, the commander needs to ensure that subordinate units have the necessary material and logistics support (i.e., equipment, training, etc.) to participate effectively in Network-Centric Warfare (NCW), and to nurture a shared mindset that encourages subordinates to exercise initiative and find creative solutions without fear of reprisal for imperfect decisions.

The last requirement is critical.  Prudent risk-taking is an element of initiative, creativity and innovation.  Power to the Edge demands that commanders encourage their subordinates to exercise tactical initiative, while recognizing that errors and reverses will occur. The sum of successes derived from the exercise of battlespace initiative, the theory states, will overcome the occasional setbacks. A "zero defect tolerance" command philosophy discourages initiative and stifles innovation.  In the open source and Agile communities, this is called “failing fast,” and is a necessary and encouraged facet of the development life cycle.

A Rose by Any Other Name

Even a cursory familiarity with C2 philosophy shows a striking correlation between Power to the Edge and the tenets of open source culture.  It’s possible, however, to go a step further and make a strong case that open source culture is synonymous with Power to the Edge, which is rapidly becoming the dominant battlefield doctrine of the 21st century.  The pairings below relate characteristics of an organization that has embraced open source culture with corresponding tenets of Power to the Edge doctrine:

Open Source Culture Characteristic → Power to the Edge Tenet

  • Welcoming participation by all stakeholders → Ensuring that subordinate units are technically enabled and procedurally empowered to fully engage in the supply, consumption and enhancement of battlefield information
  • Effective open governance policy → Ensuring that the overall mission goals and commander’s intent are clearly communicated and that the command ethos fosters openness and trust between the command and subordinate units
  • Effective open data initiative → Ensuring that, consistent with operational security constraints, all subordinates have access to the tactical and intelligence data held at the command
  • Promotion of open source groups → Ensuring material and logistics support so that subordinate units can participate effectively in Network-Centric Warfare (NCW)
  • Serving as a hub for innovation → Creating an environment where subordinates are encouraged to exercise tactical initiative, errors and reverses are accepted as the price of that initiative and “zero defect tolerance” command philosophies are rejected


Conclusion

Open source is more than a licensing scheme for software or a business model.  It is an organizational culture and a management philosophy that leads to efficient and effective project execution, often at a significant savings in time and resources.  With widespread acceptance of Power to the Edge doctrine, open source is no longer a countercultural rebellion against the inefficiencies of large organizations.  Instead, it has matured into the dominant management philosophy of one of the largest and most complex enterprises on the planet.  We can only hope to see industry and government follow suit.

Thursday, June 6, 2013

Embracing a Resourceful Information Security Culture


Following the news, it’s difficult to escape a sense that the defense community is mired in a Sisyphean game of information security (INFOSEC) catch-up.  It seems that as soon as a policy is embraced by government agencies and their industry partners, new threats emerge and existing dangers increase in magnitude.  

Reminders are ubiquitous: June 3 saw the opening of Bradley Manning’s court martial.  Manning, an Army private first class, is accused of having illegally downloaded and forwarded huge amounts of classified information to WikiLeaks.  A week or so earlier, the Washington Post published a report indicating that Chinese hackers had successfully penetrated at least a dozen high-profile American weapon systems, including those tasked with critical air defense, battlefield mobility and maritime dominance responsibilities.  The list could go on.

Fortunately, many of the community’s INFOSEC challenges are philosophical and fiscal rather than technical.  As such, they can be addressed by harnessing currently available resources, which, in an era that pits escalating requirements against sequestered budgets, is fortunate indeed.  The remainder of this article discusses four key improvement vectors that will enable the defense community (government and industry) to begin to address looming cyber and INFOSEC concerns, specifically: 
  • Acquisitions staffing reform;
  • Baked in INFOSEC;
  • Automated auditing and monitoring; and
  • The use of open source software.

Acquisitions Staffing Reform

Impeccably trained by organizations such as the Defense Acquisition University (DAU), the US Department of Defense (DoD) fields what is arguably the finest team of program managers and acquisition professionals in the world.  These people, who are ultimately coordinated by the Under Secretary of Defense for Acquisition, Technology and Logistics (AT&L), are extremely well versed in the art of buying goods and services.  Assisting and advising the program managers are military and civilian experts who are charged with ensuring that the goods and services received are of best value and greatest use to the end users.

Unfortunately, these advisers, while well versed in operational needs and utility, are generally not technical experts with regard to software and computing technology, especially as it applies to cybersecurity or its constituent disciplines such as infrastructure security, application security (AppSec) or malware detection and remediation.  As a result, program offices are placed in the position of relying for technical advice on the same engineers and developers who are paid to build the systems.  The contractors, in turn, face a potential conflict of interest between the roles of advisor and solutions provider.

Solving this problem requires a fundamental augmentation of the acquisition community’s technical expertise.  Near-term resources are available from existing DoD-affiliated expert organizations such as university affiliated research centers (UARC) and federally funded research and development centers (FFRDC).  It is likely that the acquisition community will need to expand this resource pool (either through direct government hires or through the use of contracted technical validation expertise) to fully represent the technical INFOSEC skill set.  

The silver lining to acquisitions staffing reform is the promise of overall lower system acquisitions costs as INFOSEC gaps are identified and remedied at the requirements level instead of after coding, an important benefit in the era of sequestration.

"Baked in" INFOSEC

The current model for incorporating and validating INFOSEC requirements and capabilities focuses on the tail end of the software development life cycle (SDLC).  While there are some proactive measures, such as the use of approved software tooling or adherence to Security Technical Implementation Guides (STIG) issued by the Defense Information Systems Agency (DISA), most INFOSEC activity takes place after a system has been developed.  

Typically, when a developer completes a new system (or a modification to an existing one), it is submitted for certification and accreditation (C&A) review.  The review includes security validation testing against the required information assurance controls (IAC).  At the conclusion of testing a list of vulnerabilities and mitigations is created, which are then negotiated into actions by the government program manager, the developer and the reviewing organization. 

The fixes are then applied to a completed, coded system, often requiring significant rework, time and expense.  More troublingly, many of the fixes take the form of a security applique layered on top of the existing system.  These security appliques often compromise the system’s mission utility by increasing operator workload.  Systems featuring the applique approach are also inherently less secure than those that have incorporated INFOSEC mechanisms into their architectures from the beginning. The National Institute of Standards and Technology (NIST) recognized the benefits of designed-in INFOSEC in Special Publication 800-27 Rev A, Engineering Principles for Information Technology Security (A Baseline for Achieving Security), which specifies that security should be an integral part of overall system design.

Applying – and validating – INFOSEC capabilities early in the SDLC, at the requirements and architectural levels, not only creates more secure systems, it also cuts down on the expense of the rework often necessary to meet C&A guidelines, resulting in more fiscally responsible systems acquisition.  The achievement of “baked in” INFOSEC rests on what we’ll call a “strategic security triad.”  The triad consists of:
  • Requirements stemming from authoritative laws, regulations, policies and guidance (LRPG);
  • A DoD-wide library of modular, standards-based, approved INFOSEC implementation patterns; and
  • DevOps principles of continuous integration and automated testing.
Fortunately for the community, all three legs of the triad are represented by resources that are currently and economically available:  

The DoD is replete with mature, forward leaning INFOSEC LRPG. Typical of these is the Defense Information Enterprise Architecture (IEA) published by the DoD CIO’s office.  The document mandates basic principles of secured access such as assured identity, threat assessment, policy-based access control and centralized identity, credential and access management (ICAM).

An implementation pattern library would include pre-approved INFOSEC tools as well as requirements for what the tools must accomplish for the system and how they are to be implemented.  Many INFOSEC tools are currently available from industry, including IBM’s Security Framework product line and CA Technologies’ IdentityMinder.   

Interesting from a government perspective is the emergence of supported open source products into the INFOSEC space.  These tools offer the promise of open standards, modular design and implementation and enterprise class performance – all with zero acquisition cost.  Typical of this line is the WSO2 Security and Identity Gateway Solution.  This solution integrates four WSO2 products, the WSO2 Enterprise Service Bus, the WSO2 Governance Registry, the WSO2 Identity Server and the WSO2 Business Activity Monitor (BAM) to provide a complete AppSec solution including authentication, authorization, auditing, monitoring, content-based filtering, input validation, throttling, caching and Single Sign-On.  
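To make the gateway pattern concrete, here is a minimal Python sketch of the kind of checks such a solution chains together for each request: authentication, input validation and throttling.  It illustrates the pattern only; it is not WSO2’s implementation, and the token store, whitelist and rate limits are invented for the example.

    # Minimal sketch of gateway-style policy enforcement: authenticate the caller,
    # validate the input, then throttle.  Token values, the input whitelist and
    # the rate limit are illustrative assumptions.

    import re
    import time
    from collections import defaultdict

    VALID_TOKENS = {"token-abc": "alice"}          # stand-in for an identity provider
    SAFE_QUERY = re.compile(r"^[\w\s\-]{1,100}$")  # whitelist-style input validation
    RATE_LIMIT = 5                                 # requests per user per rolling minute
    _request_log = defaultdict(list)

    def authenticate(token):
        """Map a bearer token to a user, or None if the token is unknown."""
        return VALID_TOKENS.get(token)

    def validate(query):
        """Reject queries containing characters outside the whitelist."""
        return bool(SAFE_QUERY.match(query))

    def throttle(user, now=None):
        """Allow at most RATE_LIMIT requests per user per rolling minute."""
        now = time.time() if now is None else now
        recent = [t for t in _request_log[user] if now - t < 60]
        if len(recent) >= RATE_LIMIT:
            _request_log[user] = recent
            return False
        recent.append(now)
        _request_log[user] = recent
        return True

    def handle_request(token, query):
        user = authenticate(token)
        if user is None:
            return "401 Unauthorized"
        if not validate(query):
            return "400 Bad Request"
        if not throttle(user):
            return "429 Too Many Requests"
        return f"200 OK: forwarding '{query}' for {user}"

    print(handle_request("token-abc", "therapy outcomes 2013"))  # 200 OK
    print(handle_request("bad-token", "anything"))               # 401 Unauthorized

In a production deployment these checks would, of course, be delegated to the gateway and identity products themselves rather than hand-coded.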

The last leg of the triad, continuous integration and automated testing, requires tooling that can implement strong programmatic governance.  By requiring all of a program’s developers to use a common, Cloud based development platform that includes automated testing, integration and build tools, both functional and non-functional INFOSEC requirements (such as ports, protocols and services settings) can be validated before a module is accepted for integration into the application’s trunk.  Modules not meeting the requirements are rejected, with a report indicating what the developer needs to address.  As a result, C&A testing is an ongoing, integral part of development, and end-state C&A activities are dramatically curtailed.  An example of such a DevOps platform is the WSO2 App Factory.
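A pre-integration gate for the non-functional INFOSEC requirements mentioned above might look something like the following: a build step that rejects any module whose deployment descriptor opens ports or declares services outside an approved baseline.  The descriptor format and the baseline values are assumptions made for the sake of the example, not a DoD or WSO2 artifact.

    # Hypothetical CI gate: compare a module's declared ports and services against
    # an approved baseline and fail the build (non-zero exit) on any violation.
    # The descriptor format and the baseline are invented for illustration.

    import json
    import sys

    APPROVED_PORTS = {443, 8243}                 # e.g. HTTPS plus a secured gateway port
    APPROVED_SERVICES = {"https", "sso-agent"}   # services permitted by the baseline

    def check_descriptor(path):
        """Return a list of baseline violations found in the deployment descriptor."""
        with open(path) as f:
            descriptor = json.load(f)
        violations = []
        for port in descriptor.get("listen_ports", []):
            if port not in APPROVED_PORTS:
                violations.append(f"port {port} is not in the approved baseline")
        for service in descriptor.get("services", []):
            if service not in APPROVED_SERVICES:
                violations.append(f"service '{service}' is not approved")
        return violations

    if __name__ == "__main__":
        path = sys.argv[1] if len(sys.argv) > 1 else "deploy.json"
        problems = check_descriptor(path)
        for problem in problems:
            print(f"REJECTED: {problem}")
        sys.exit(1 if problems else 0)

Run as the last step of the automated build, a non-zero exit keeps the offending module out of the trunk and produces the report the developer needs to address.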

Automated Auditing and Monitoring

Among NIST’s computer security principles is the need to implement audit mechanisms to detect unauthorized use.  Implied in this requirement is the need to notify the administration and response team as soon as a breach or other unauthorized activity is detected.  Given the size and scope of system activity and transaction logs, it is necessary to automate this process to achieve the necessary timeliness.  

Essentially, auditing and monitoring is a Big Data analytics problem.  Fortunately for the defense community, industry has been focusing on Big Data Analytics for a number of years.  There are a number of analytics platforms available for this task such as Google’s Dremel, Apache Drill and the WSO2 BAM.  Like the others, WSO2 BAM provides the capability to perform rapid analysis on large scale data sets.  To this, however, WSO2 BAM adds alerting and customizable dashboarding capabilities, which can be used to ensure near-real-time notification of suspect events.  When compared to currently acceptable auditing paradigms, some of which allow for a week or more between manual inspection of security logs, this represents a significant capability improvement.
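As a toy illustration of this kind of automated monitoring (an illustration only, not how BAM or any other product is implemented), the sketch below scans a stream of authentication events and raises an alert when a single account accumulates several failures within a short window.  The event format and thresholds are invented.

    # Toy near-real-time monitor: alert when one account fails authentication
    # several times inside a sliding window.  Event format and thresholds are invented.

    from collections import defaultdict, deque

    WINDOW_SECONDS = 300
    THRESHOLD = 3

    def monitor(events):
        """events: iterable of (timestamp_seconds, user, outcome) tuples."""
        failures = defaultdict(deque)
        alerts = []
        for ts, user, outcome in events:
            if outcome != "FAIL":
                continue
            window = failures[user]
            window.append(ts)
            while window and ts - window[0] > WINDOW_SECONDS:
                window.popleft()
            if len(window) >= THRESHOLD:
                alerts.append(f"ALERT: {len(window)} failed logins for '{user}' "
                              f"within {WINDOW_SECONDS} seconds")
        return alerts

    sample = [(0, "svc_account", "FAIL"), (60, "svc_account", "FAIL"),
              (90, "analyst7", "OK"), (120, "svc_account", "FAIL")]
    for alert in monitor(sample):
        print(alert)

The same pattern scales out by pushing the event stream through a distributed analytics platform and wiring the alerts to a dashboard, which is precisely the capability improvement described above.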

Open Source Software

The most obvious benefit associated with open source software is cost.  Because open source software is generally available without licensing fees, it is far more cost effective to incorporate advanced INFOSEC capabilities with open source components than with proprietary software.  However, while open source software contributes to lowering a system’s total cost of ownership (TCO), it is not without cost, as development and production support services generally require paid subscriptions. 

Open source software has an added security benefit that is particularly compelling.  Open design and source code enable broad-based, detailed code inspection and the rapid detection of both flaws and threats.  The idea that proprietary software is more secure because the source code is hidden simply doesn’t stand up to scrutiny.  NIST’s selection of the Rijndael block cipher as the Advanced Encryption Standard (AES) in 2000 followed a nearly three-year process in which a number of candidate algorithms were publicly discussed, debated and cryptanalyzed.  In another case, Borland’s InterBase database shipped for roughly seven years with a hard-coded backdoor (username “politically,” password “correct”) that went undetected while the code remained proprietary.  In 2000, InterBase was open-sourced as the Firebird project, and within five months the backdoor was discovered.

Conclusion

Strong system INFOSEC is in the best interest of the entire defense community.  While there have been both doctrinal and cultural fits and starts with respect to effective, community-wide policies, a fortunate confluence of technology, development philosophy and leadership exists that can allow the closure of critical security gaps.  As computer security expert John Pescatore noted at the June 4 Kaspersky Government Forum, we need to find the balance between perfect solutions that work, but are untenable and solutions that work and are tenable, but might not be perfect.   Acquisitions staffing reform, baked in INFOSEC, automated auditing and monitoring and the use of open source software are the first big steps toward tenable.