Want better AI for the DOD? Stop treating data like currency


Five years ago, the U.S. military was behind on artificial intelligence. The easiest way to explain the state of AI in the Department of Defense was by comparison to Silicon Valley, and it wasn’t a flattering look. When this became apparent, it took time for the defense community to articulate why AI mattered and why it deserved greater attention in terms of budgets and brainpower.

Today, tech-savvy leaders throughout the service branches and the combat support agencies have a clear concept of what’s at stake for defense AI. They understand that AI can make violent conflict more precise, more humane and more predictable; that it can reduce the likelihood of war by giving decision-makers better, timelier information and lowering the risk of misunderstanding between strategic rivals. Above all, defense leaders have embraced the reality that AI will give American and allied war fighters a strategic and tactical edge in the next war.

As an alumnus of the Algorithmic Warfare Cross-Functional Team, I like to think that Project Maven played an outsized role in this process of discovery. The team’s original mandate was to pathfind — to “kick over rocks” and identify all of the technical and institutional friction points that made defense AI slow and hard.

We found a lot. Network and hardware limitations were profound, and data scarcity shocked some commercial vendors who were used to plentiful, unclassified AI training sets. We learned exactly how sparse cleared engineering talent can be, and we wrestled with the quirks of information assurance. Even mundane office tools were hard — we pushed for government-accredited access to Slack, to no avail.

One insight stands out from lessons learned over four years of bureaucratic bushwhacking. It speaks to an unfortunate cultural tendency, something that senior leaders could abolish if they realized how pervasive and problematic it is: Many data brokers in the defense community still treat data like currency. This was the No. 1 blocker at Project Maven, hands down. The team lost months of time waiting for partner organizations to release data from archives or grant access to data streams. The timeline for some of the team’s requests, even when initiated by senior leaders, could be measured not in days or weeks, but by the passing of seasons.

Data sharing can get political. Popular wisdom says that data is the new oil. As the raw material for AI and machine learning, data has intrinsic value. This is why data labeling companies working for the government sometimes try to make data proprietary after labeling, just to squeeze a little more money and leverage out of the pipeline. And there are legitimate concerns about data proliferation — if AI training data falls into the wrong hands, it could be used to reverse engineer military algorithms for adversarial purposes.

But between government offices, data-as-currency kills progress. It creates bottlenecks in AI pipelines. Slow data means expensive, talented engineers struggling to find work until the data dump finally arrives and the ingest process can begin. It means uncertainty for project managers, who must wait until data ingest and discovery are complete before they can understand the raw materials they have to work with. It means a bored and underemployed labeling workforce. It means vendors sitting on their hands while waiting for the government to provide data access and define priorities of work.

It means that testing and evaluation, the linchpin of artificial intelligence development that tells leaders whether their models are worth deploying at all, is harder to design because of incomplete data discovery and labeled data scarcity. Ultimately, slow data means that deploying performant algorithms for the war fighter is delayed by months, even years. On this timeline, we can forget about agile development and user-driven design.

There is another way. It starts with properly interpreting the 2020 DOD Data Strategy, a major step forward in reshaping how the defense community thinks about data. Defining data as a strategic asset is a good thing because it elevates the importance of defense data in the collective consciousness. But loose interpretation of the word “strategic” may also give data gatekeepers a sense of anxiety — a misplaced emphasis on security, caution and risk aversion. In practice, defense data should instead be considered a common resource, circulating freely and rapidly within the safeguards of necessary security. When it comes to data, best practice is to inhale trust, exhale teamwork, and pay it forward.

Jaim Coddington led data operations for two years at Project Maven, a Department of Defense artificial intelligence program. He is a Marine Corps Reserve intelligence officer, and a founding member of Spear AI, a defense technology company.
