% writeup/conclusion.tex

\section{Conclusion}
We explored applying machine learning techniques to the problem of clustering interactions with APIs.  Our results are promising: they suggest that it is possible to cluster library interactions automatically.

For classification, we developed the $k$ Markov chains mixture model, based on the Gaussian mixture model (GMM) and the variable-length hidden Markov model (HMM) developed in class.  We implemented 1) a Java profiler for extracting object-level Java method traces from any Java JAR executable and 2) a MATLAB routine for learning and classifying with a $k$ Markov chains mixture model.  We constructed the method traces used for learning from Java programs we found online.  In learning, we explored the problem of determining the number of underlying mixture components by running cross-validation for different values of $k$ and examining the Bayesian information criterion (BIC) score.  Both cross-validation and the BIC score confirmed our expectation that, after a certain point, increasing $k$ does not yield better results.
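% Illustrative sketch only: the notation below is assumed for exposition.

To make the model-selection step concrete, the quantities involved can be sketched as follows.  The notation is an assumption introduced here for illustration and is not necessarily the exact parameterization used in the MATLAB routine.  Let a trace $x = (x_1, \dots, x_T)$ be a sequence of method calls over $S$ distinct methods, and let component $c$ have mixture weight $\pi_c$, initial distribution $\theta_c$, and transition matrix $A_c$.  A mixture of $k$ Markov chains then assigns the trace the likelihood
\[
  p(x) = \sum_{c=1}^{k} \pi_c \, \theta_c(x_1) \prod_{t=2}^{T} A_c(x_{t-1}, x_t).
\]
Under one common convention, the BIC score for a model with $k$ components fit to $N$ training traces is
\[
  \mathrm{BIC}(k) = -2 \ln \hat{L}_k + d_k \ln N,
  \qquad
  d_k = (k - 1) + k (S - 1) + k S (S - 1) = k S^2 - 1,
\]
where $\hat{L}_k$ is the maximized likelihood of the training traces and $d_k$ counts the free parameters (mixture weights, initial distributions, and transition rows).  With this sign convention lower scores are better, and because $d_k$ grows linearly in $k$, the penalty term eventually outweighs any gain in likelihood from adding components.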
%

By examining cluster contents and the parameters of the Markov chains, we determined that one could automatically learn a set of common ways of interacting with libraries.  Individual programs did not appear to have idiosyncratic ways of interacting with the libraries we examined; this is consistent with our expectation that there is some fixed number of modes of interaction.  Our model assigned most traces to a single cluster, indicating that there may be a dominant mode of interaction.  For the \texttt{ArrayList} library, Markov chains appear to provide sufficiently strong assumptions to yield interesting results.  One deficiency we noticed is that a single Markov chain can comprise multiple modes of interaction, although post-processing could likely mitigate this problem.  We also note that we were fortunate that the Markov model structure could capture the \texttt{ArrayList} interactions in a sufficiently interesting way.  Had we chosen a library whose interactions were better expressed with, say, a probabilistic context-free grammar, the learning results might not have been as good.

The results of this work have several potential applications.  Given that an API has common use cases, we could generate a useful set of example use cases for the programmer.  We could also use this information in code synthesis: knowing about different usage modes, and perhaps about the common cases, could greatly reduce the search space.  This information is also useful in test generation: we might be able to predict interactions with the code we write and make sure our code is robust to those cases.

With more time, we would have liked to look closely at other libraries, construct traces from more programs, and try variations in classification, such as collapsing repeated states into a single state or a set of states and handling variable-length clusters more cleverly.