Publication Abstract

Jones. Measuring the Utility of Human Language Technology for Intelligence Analysis, Int’l Conference on Intelligence Analysis

Abstract

As Human Language Technology (HLT) evolves, it moves from the technologist community, focused on technology-centric measures of performance such as error rates, to the analyst community, focused on how well a tool helps them get their jobs done. Successful transition and deployment of HLT systems require that the technologists learn how to measure the relevant effectiveness of systems. This, in turn, requires guidance and input from the analysts as to what is important in completing their jobs and how they currently measure their effectiveness and the utility of new tools. The aim of this workshop is to bring together the technologist and analyst communities to help both better understand capabilities, needs and ways to measure the utility of HLT systems for Intelligence Analysis (IA).

The workshop will be organized into two sessions. In the first session, we will have presentations from both communities to help set the stage. Presenters from the technologist community will describe recently completed utility studies of HLT systems for Machine Translation, Information Extraction, Information Retrieval and Summarization. Presenters from the analyst community will describe the key elements of an IA job, measures of effectiveness used today and case studies in how HLT is currently being used and measured. The second session will consist of a panel discussion with audience participation. Panelists will be asked to respond to key questions, such as (i) how are the technology applications useful for their type of IA? (ii) are the DARPA measures appropriate for these uses? (iii) how is pertinent IA performance measured now? (iv) how can ongoing research maximize its impact on IA over time?

This workshop will be classified at the TS/SI level.


Objective: Participants in the workshop will identify opportunities over the next 12 months to measure the benefits of HLT applications in IA environments and to share data test beds, study designs, etc.


Importance: Over the past several months, a number of utility studies have been conducted under Government sponsorship to estimate the utility of HLT for various real-world purposes. This conference provides an opportunity to relate these estimates from the research community to actual current work in intelligence analysis and to correlate results in new experiments in those domains.

Uniqueness:  To motivate people to prepare for the workshop and focus their creativity, our goal is to secure commitment for a modest level of funding, in the $20K range, to help provide external resources (data preparation, truth-marking, human subject compensation, etc.) that may be needed to conduct the experiments.  The government site providing the financial backing would exercise final oversight over the use of the funds.  We have received positive feedback from a sponsor on this idea.

Format: The workshop will be divided into a morning session and an afternoon session. The morning session will consist of presentations on HLT and on IA practices and needs. The afternoon session will include a brainstorming session in which small groups of 10-12 people are assembled to propose at least four high-value experiments to estimate HLT utility for IA. Participants at the workshop will vote to recommend the two best experiments to be conducted. Naturally, participants are free to conduct any experiments they wish when they get home, but this recommendation would be taken into account in allocating the modest resources we intend to pursue for the experiments.

Participants: The following people have accepted our invitation to make presentations during the morning session of the workshop: Joseph Olive, program manager at DARPA for EARS / TIDES / GALE; Jack Godfrey (CH/R64, NSA); and Amy Weinberg, Area Director for Technology at the Center for Advanced Study of Language (CASL) and Professor at University of Maryland, College Park. We have invited Barbara Wheatley to speak about HLT needs and goals. Beth Walton and Doug Jones will begin the presentations with an outline of motivation and goals for the workshop. Members of the IA community are being invited to make additional presentations. We expect to be able to accommodate 40-50 workshop participants in our framework. This is a hands-on workshop that requires participant attendance for the entire day.

Length: one full day.

Outline:

  • Morning Presentations:
    • Measuring human performance for IA.
    • Measuring technology performance / technology transfer for IA.
    • Core HLT Technology – the Big Picture
    • Overview of HLT Evaluation and Utility Studies
  • Afternoon Working Session
    • Panel discussion to identify constraints, provide guidance, wisdom, and suggestions for resources.
    • Small group brainstorming session to identify opportunities to conduct new evaluation experiments in the next 12 months.
    • Moderated wrap-up session: presentation of small-group ideas, vote, recommend the top 2 experiments.

Audio Visual Requirements: Standard computer projector and speakers.

Facilities: We need a SCIF for TS/SI work. We will need four TS/SI desktop or laptop computers to be made available for preparing PowerPoint presentations to the full group after the break-out session. We also request permission to arrange lunch in the SCIF so that a working lunch is possible.

Biographies:

  • Elizabeth M. Walton is the Deputy Chief of the Advanced Analysis Laboratory in the SIGINT Directorate at the National Security Agency.  She has been with NSA in a variety of analysis and leadership positions since 1981.
  • Douglas A. Jones, Ph.D., is on the technical staff at MIT Lincoln Laboratory, conducting original research on machine translation and speech recognition evaluation involving human subjects.  He was employed as a Senior Scientific Linguist at NSA, 1996-2000.

Relevant Papers

  • Jones, Douglas A., et al. 2005. "Measuring Human Readability of Machine Generated Text: Three Case Studies in Speech Recognition and Machine Translation". HLT Special Session, ICASSP 2005, Philadelphia.
  • Jones, Douglas A., et al. 2003. "Measuring the readability of automatic speech-to-text transcripts". EUROSPEECH-2003, 1585-1588.
  • Gibson, Edward, Douglas Jones, et al. 2004. "Two New Experimental Protocols for Measuring STT Readability". Report for DARPA/EARS Rich Transcription Workshop.
  • Granoien, Neil, Douglas Jones, et al. 2004. "Enabling English Speakers to Pass Level 3 on a Defense Language Proficiency Test for Arabic". DARPA Pre-GALE Utility Study.

Contact Information
Douglas Jones
244 Wood Street
Lexington, MA 02420
daj@ll.mit.edu
(781) 981-2592
Fax: (781) 981-0186

This work is sponsored by the Defense Advanced Research Projects Agency and the Defense Language Institute under Air Force Contract F19628-00-C-0002.  Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the United States Government.