Airman explores whether novice coders can develop AI programs for military applications
In today's world, AI chatbots like ChatGPT and Claude can perform many functions, such as composing work emails and planning travel itineraries. These chatbots are systems built around large vision-language models (VLMs): AI models trained on a massive dataset that includes books, websites, code, and images. The models are then refined on large amounts of human-generated feedback so that they follow instructions and avoid harmful or unwanted output, and they use that "knowledge" to produce text or images based on input from a user. Although chatbots have clear limitations, they can be very helpful for a wide range of tasks, including in some areas that traditionally require specialized skills, like computer programming.
As part of a project for the Department of the Air Force–MIT AI Accelerator Phantom Program, U.S. Air Force cadet Joshua Lynch — with the help of his mentor, Laura Niss, a technical staff member in the Embedded and AI Systems Group at MIT Lincoln Laboratory — wanted to determine if, as a complete novice to coding, he could develop a fully functional program. He used a process called "vibe-coding," in which a user relies entirely on prompts to guide a generative AI chatbot to write and refine code. His motivation was to empower anyone familiar with the military problem space, regardless of their technical background, to advance their ideas for useful software applications, essentially bypassing the time and cost constraints of the traditional military software development pipeline. Lynch aimed to build his own application while Niss monitored his experience with the technology.
"The Phantom student wanted to see if he could create a useful application through self-identified vibe-coding, without any previous experience," Niss says. "Within this project, I wanted to understand how his perception of AI changed over time with use. We both wanted to understand better where and how AI could be used by nontechnical users in the military."
Lynch set out to see if, starting with no coding skills and using chatbots, he could create an application specific to his type of tactical team to help reduce collateral damage while enhancing survivability in the broader mission. This application would offer capabilities including AI-assisted target recognition; modular intelligence, surveillance, and reconnaissance; autonomous striking; and communication management on the battlefield. During the project, Lynch completed several professional development courses in AI and familiarized himself with both military and nonmilitary uses of the technology. As the basis for his code generation, he used the paid versions of three AI chatbots: Anthropic's Claude, OpenAI's ChatGPT, and Google's Gemini. Most of this work was done only through the chatbots' main chat function in a web browser, not through an AI system integrated within a development environment, as is standard now.
The final application was produced using Google AI Studio App, which can create applications that interface with the Gemini application programming interface and has AI integrated in the development environment. Over three months, Lynch worked with these models to build his application, called the Remote Operating Modular Augmentation Device (ROMAD-AI).
During this time, he learned several methods to improve the code output. For example, he often found that the AI chatbots lacked hierarchical focus and modified unrelated code sections. He discovered it was important to break problems into small parts, frame questions clearly, and steer conversations back on topic when they strayed too far from the objective. Learning to recognize the chatbots' limitations and effectively work around them took up most of the project timeline.
As Lynch gained more experience with the chatbots, limitations in the AI's capabilities and in development time caused him to rescope the project. He moved from an application that could assist on the battlefield to one that could perform basic document processing, such as analyzing tactical maps of battlefields and generating mission-planning documents, through an interface with a VLM-powered chatbot. While the resulting prototype did not include all the capabilities Lynch originally set out to build (and in its current iteration was not secure for the desired use case), it demonstrated the potential usefulness of such an application for service members.
"I was quite impressed with this final product, and it showed me how powerful these systems can be at prototyping designs from nonexperts," Niss says. "I'm now of the opinion that these can be powerful tools for nontechnical experts to convey problems and possible solutions to technical experts, and aid in communicating desired outcomes."
Niss observed the change in Lynch's perspective on AI language models over the course of the project. After starting with an ambitious goal, Lynch gained a better understanding of the capabilities of current technology and significantly scaled back his expectations by the end of the project period. Measures of his perceptions of the different AI systems over time and across system updates were particularly interesting to Lynch and Niss, with Claude showing more stability than ChatGPT across traits such as likeability, anthropomorphism, and perceived IQ. Lynch found AI to be a helpful tutor but noted its inaccuracies on topics he knew well.
The project showed that AI chatbots can empower nontechnical service members to produce viable software applications for their unique problems, though the chatbots work better as prototyping assistants than as full production tools, particularly when sensitive information or critical applications are involved. Improper vetting of code may lead to security risks: in one instance, Lynch didn't realize that the final application was sending the input documents to a Gemini AI model for analysis rather than parsing the documents locally on his computer. Though AI can generate significant amounts of functional code, code review remains a bottleneck in this space.
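To make the vetting concern concrete, the sketch below shows one crude check a reviewer might run on AI-generated code: listing imports of modules that can send data over a network. It is an illustration only and not part of the ROMAD-AI project. It assumes the generated code is Python, which the project description does not state, and the watch list `NETWORK_MODULES`, the function `find_network_imports`, and the sample `generated_code` are all hypothetical.

```python
import ast

# Hypothetical watch list: top-level modules that commonly open network connections.
NETWORK_MODULES = {"socket", "http", "urllib", "requests", "httpx", "aiohttp", "google"}


def find_network_imports(source: str) -> list[tuple[int, str]]:
    """Return (line number, module name) for each import of a watched module."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports such as "from . import x"
            names = [node.module or ""]
        else:
            continue
        for name in names:
            # Compare only the top-level package, e.g. "urllib" in "urllib.request"
            if name.split(".")[0] in NETWORK_MODULES:
                hits.append((node.lineno, name))
    return sorted(hits)


# Hypothetical AI-generated snippet that uploads a document instead of parsing it locally.
generated_code = '''
import json
import requests
from google import genai

def analyze(path):
    with open(path, "rb") as f:
        return requests.post("https://example.invalid/analyze", data=f.read())
'''

for lineno, module in find_network_imports(generated_code):
    print(f"line {lineno}: imports {module}")
```

A check like this only flags places for a human to look. It misses dynamic imports and anything hidden inside third-party dependencies, so it supplements rather than replaces review by someone who understands the code.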
"For me, this project reinforced the expanse between experts in different fields," Niss says. "No matter how good AI gets, I think we'll always need to collaborate to get to the best solutions for the most important problems."
Research was sponsored by the Department of the Air Force Artificial Intelligence Accelerator and was accomplished under Cooperative Agreement Number FA8750-19-2-1000. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Department of the Air Force or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.
Inquiries: contact Ariana Gaines.