Features: Estimation of information theoretic measures from associated data samples
Software needed: MATLAB, python, or C++

Interpreted > MATLAB
Compiled > MATLAB, python, C++

OS: Linux, Mac, Windows
Type: Extension/Add-on
Maturity: Stable
Developers: Michael B. Hurley, Edward K. Kao, Nicholas A. Stanisha, Philip Tran
Sponsor: Candid
Contact: hurley@ll.mit.edu


pyInfoMetrics3.zip Size: 217k


Description: This application estimates information theoretic measures from two-dimensional arrays of occurrence counts between pairs of values. It is useful for evaluating any decision system where truth data are available and where an M-to-N correspondence can be made between the pairs of values. It can also be used to estimate information theoretic measures for communications channels from input and output data.

The information theoretic metrics are estimated from finite data samples using modified versions of Wolpert and Wolf's equations to estimate first- and second-order statistics from which means and covariance matrices can be estimated. Using variation of information, or total conditional entropy, as an overall test statistic, it is possible to determine which of multiple tracking or classification algorithms are the top performers and to determine the statistical significance of the test results.
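
To make the quantities concrete, the following Python sketch computes naive plug-in (maximum likelihood) estimates of the entropies, mutual information, and variation of information from an occurrence count matrix. It is an illustration only, not the package's method: the package uses modified Wolpert and Wolf estimators, which also yield second-order statistics. The function name plugin_measures and the returned dictionary keys are hypothetical and are not part of the package.

```python
import math


def plugin_measures(counts):
    """Naive plug-in estimates, in bits, from a 2-D occurrence count matrix.

    counts[i][j] is the number of times value i of the first variable
    (for example, truth) was observed together with value j of the second
    (for example, the decision). Hypothetical helper, not the package API.
    """
    total = sum(sum(row) for row in counts)
    if total <= 0:
        raise ValueError("count matrix must contain at least one occurrence")

    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]

    def entropy(values):
        # Empty cells contribute nothing (0 * log 0 is taken as 0).
        return -sum((v / total) * math.log2(v / total) for v in values if v > 0)

    h_x = entropy(row_sums)                          # H(X)
    h_y = entropy(col_sums)                          # H(Y)
    h_xy = entropy([c for row in counts for c in row])  # H(X,Y)

    return {
        "H_x": h_x,
        "H_y": h_y,
        "H_xy": h_xy,
        "mutual_information": h_x + h_y - h_xy,
        # Variation of information = H(X|Y) + H(Y|X) = 2 H(X,Y) - H(X) - H(Y)
        "variation_of_information": 2.0 * h_xy - h_x - h_y,
    }


if __name__ == "__main__":
    example = [[45, 5], [5, 45]]  # made-up counts for illustration
    for name, value in plugin_measures(example).items():
        print(f"{name}: {value:.4f} bits")
```

A perfect one-to-one association such as [[50, 0], [0, 50]] gives a variation of information of 0 bits, while statistically independent variables such as [[25, 25], [25, 25]] give a mutual information of 0 bits. Plug-in estimates are biased for small samples, which is one reason to prefer estimators designed for finite data.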

The main MATLAB function that users call is computeMetricsFromAssocMat, which computes a number of different information theoretic parameters as well as precision and recall.

This latest distribution of the software package provides improved performance over previously released packages as well as support for python and C++. The package still contains a MATLAB function to execute computeMetricsFromAssocMat. The new package also includes software to generate a mex file for MATLAB, an embedded python module, and an example C++ based executable file. The conversion of the core calculations from MATLAB to C++ improves the execution speed by a factor of 150 to 750, depending on the size and contents of the occurrence matrices.

The package does not contain executable software. For MATLAB, users must build the mex file to take advantage of the faster execution speed. Software builds are also required for the python module and for C++ based executables that call computeMetricsFromAssocMat. The C++ packages Boost (http://www.boost.org/) and Eigen (http://eigen.tuxfamily.org/) are required to build the executables and are available from the internet. The user manual contains limited instructions on how to build the mex file, the python module, and a sample C++ executable. The instructions make no attempt to describe all possible configurations of operating systems, development environments, and compilers that users may have available to build software. To ease the python build effort, two alternate approaches are described: one uses CMake (https://cmake.org/) and the other uses the b2 utility in Boost. Familiarity with building C++, MATLAB, python, and Boost packages will ease the effort to build these modules.

MATLAB users may use plotEvalResults to plot their results, although it is anticipated that users will eventually develop their own plotting routines to show the facets of their analysis that are of the greatest interest. No sample plotting function is provided for python; users can display the results with the plotting libraries available in python.

An example of how to use the information theoretic estimation function in MATLAB is provided in EvalTestScript.m. This example script performs the following steps:

  • Create a small example data set which includes association matrices for two simulated classifiers.
  • Compute a collection of metrics given the association result.
  • Plot the results.

For python, the file test.py performs a similar function, without plotted results.
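
The contents of EvalTestScript.m and test.py are not reproduced here. As a rough stand-in with the same three-step shape, the sketch below builds made-up association matrices for two simulated classifiers, estimates the mean and standard deviation of the variation of information for each, and reports the comparison as text rather than a plot. It assumes a uniform Dirichlet prior and uses Monte Carlo sampling in place of the package's closed-form estimates; all function names, matrices, and parameters are hypothetical.

```python
import math
import random


def variation_of_information(probs):
    """VI = H(X|Y) + H(Y|X), in bits, for a joint probability matrix."""
    def entropy(values):
        return -sum(p * math.log2(p) for p in values if p > 0)

    h_xy = entropy([p for row in probs for p in row])
    h_x = entropy([sum(row) for row in probs])
    h_y = entropy([sum(col) for col in zip(*probs)])
    return 2.0 * h_xy - h_x - h_y


def sample_vi(counts, n_samples=2000, prior=1.0, seed=0):
    """Draw VI samples from a Dirichlet posterior over the joint cells.

    Assumption: a symmetric Dirichlet prior with concentration `prior`.
    This Monte Carlo approach only stands in for analytic estimators.
    """
    rng = random.Random(seed)
    n_cols = len(counts[0])
    alphas = [c + prior for row in counts for c in row]
    samples = []
    for _ in range(n_samples):
        # Normalized independent gamma draws form a Dirichlet sample.
        gammas = [rng.gammavariate(a, 1.0) for a in alphas]
        norm = sum(gammas)
        probs = [[g / norm for g in gammas[i:i + n_cols]]
                 for i in range(0, len(gammas), n_cols)]
        samples.append(variation_of_information(probs))
    return samples


def mean_and_std(samples):
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)
    return mean, math.sqrt(var)


# Step 1: made-up truth-by-decision counts for two simulated classifiers.
classifier_a = [[45, 5], [5, 45]]
classifier_b = [[30, 20], [20, 30]]

# Step 2: estimate the test statistic and its spread for each classifier.
vi_a = sample_vi(classifier_a, seed=1)
vi_b = sample_vi(classifier_b, seed=2)
mean_a, std_a = mean_and_std(vi_a)
mean_b, std_b = mean_and_std(vi_b)

# Step 3: summarize the comparison (lower VI indicates better agreement).
frac_a_better = sum(a < b for a, b in zip(vi_a, vi_b)) / len(vi_a)
print(f"classifier A: VI = {mean_a:.3f} +/- {std_a:.3f} bits")
print(f"classifier B: VI = {mean_b:.3f} +/- {std_b:.3f} bits")
print(f"fraction of paired samples where A has lower VI: {frac_a_better:.3f}")
```

The fraction of paired samples in which one classifier has the lower variation of information gives an informal sense of how clearly the two are separated; it is not a substitute for the significance calculation described in the references.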

For an overview of the metrics, see the references below.

Edward Kao, Matthew Daggett, and Michael Hurley, "An Information Theoretic Approach for Tracker Performance Evaluation," Proceedings of the IEEE 12th International Conference on Computer Vision, Kyoto, Japan, September 27-October 4, 2009.

Holt, Ryan S., et al. "Information theoretic approach for performance evaluation of multi-class assignment systems." SPIE Defense, Security, and Sensing. International Society for Optics and Photonics, 2010.

Michael B. Hurley and Edward K. Kao, Numerical Estimation of Information Theoretic Measures from Large Data Sets, TR-1169, MIT Lincoln Laboratory, (30 Jan 2013), http://www.dtic.mil/get-tr-doc/pdf?AD=ADA580524

Copyright © 2005–2016, Massachusetts Institute of Technology.  All rights reserved. This package has been approved for unlimited distribution by the MIT Technology Licensing Office.

pyInfoMetrics3 is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. pyInfoMetrics3 is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with pyInfoMetrics3. If not, see http://www.gnu.org/licenses/.

MATLAB® is a registered MathWorks trademark. Reference to commercial products, tradenames, trademarks, or manufacturer does not constitute or imply endorsement.
