Maggio, Valerio
(2013)
Improving Software Maintenance using Unsupervised Machine Learning techniques.
[Doctoral thesis]
Item Type: |
Doctoral thesis
|
Language: |
English |
Title: |
Improving Software Maintenance using Unsupervised Machine Learning techniques |
Creators: |
Creators | Email |
---|
Maggio, Valerio | valerio.maggio@unina.it |
|
Date: |
2 April 2013 |
Number of Pages: |
200 |
Institution: |
Università degli Studi di Napoli Federico II |
Department: |
Matematica e applicazioni "Renato Caccioppoli" |
Doctoral school: |
Scienze matematiche e informatiche |
Doctoral programme: |
Scienze computazionali e informatiche |
Doctoral cycle: |
25 |
Doctoral programme coordinator: |
Name | Email |
---|
Moscariello, Gioconda | gioconda.moscariello@unina.it |
|
Tutor: |
Name | Email |
---|
Di Martino, Sergio | sergio.dimartino@unina.it |
Corazza, Anna | anna.corazza@unina.it |
|
Uncontrolled Keywords: |
Software Maintenance; Machine Learning; Software Remodularisation; Clone Detection; Code Normalisation; Kernel Methods; Unsupervised Learning; Expectation-Maximisation; Maximum Likelihood Estimation; Source code analysis |
MIUR scientific-disciplinary sectors: |
Area 01 - Mathematical and computer sciences > INF/01 - Computer science |
Date Deposited: |
04 Apr 2013 11:19 |
Last Modified: |
10 Dec 2014 14:10 |
URI: |
http://www.fedoa.unina.it/id/eprint/9079 |
DOI: |
10.6092/UNINA/FEDOA/9079 |

Abstract
Software maintenance is an essential step in the evolution of software systems
and represents one of the most expensive, time-consuming, and challenging phases
of the whole development process. In particular, the cost and effort required
by maintenance and evolution operations (e.g., corrective, adaptive) stem
mainly from the effort needed to comprehend the system and its source code.
As a consequence, many "reverse engineering" tools and
solutions have been proposed to support maintainers in their activities.
An important resource for maintainers is the architectural
information of the system. However, such information is usually not
documented, or the documentation is outdated. Therefore, the existing code
remains the most up-to-date source of information to exploit in order to
automatically retrieve and reconstruct the architecture of a system.
Many research efforts are being devoted to supporting this task,
in order to define solutions that are able to "re-modularise"
a given software application.
The main purpose of re-modularisation techniques is to automatically partition
the system into meaningful subsystems, in order to locate and group together
software components that are in some way related, e.g., they implement the same
functionalities.
A number of these approaches attempt to discover these groups
(or clusters) by exploiting the lexical information provided in the source
code, such as terms in comments and names of identifiers
(e.g., variables, methods, and classes).
Nevertheless, the source code lexicon has some specific peculiarities that
make it conceptually different from a typical textual resource: identifiers
are often created by concatenating multiple words (e.g., getAttribute,
MINHEIGHT), which may be additionally shortened (e.g., getAttr,
MINHGT) to avoid long names. As a consequence, tools and techniques
that analyse the source code lexicon must integrate algorithms to
"normalise" its vocabulary.
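
To illustrate why case-based splitting alone is not enough, the following
minimal Python sketch splits identifiers on underscores and camelCase
boundaries. It is only an illustration, not the normalisation approach
developed in the thesis; the helper name `split_identifier` is hypothetical.

```python
import re
from typing import List

# Token patterns, tried in order:
#   1. an acronym directly followed by a capitalised word (e.g. "XML" in "XMLFile")
#   2. a lowercase or capitalised word (e.g. "get", "Attribute")
#   3. any remaining run of capitals (e.g. "MINHEIGHT")
#   4. a run of digits
_TOKEN = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def split_identifier(name: str) -> List[str]:
    """Naively split an identifier into lowercase words (hypothetical helper)."""
    words: List[str] = []
    # First break on underscores and any other non-alphanumeric separators.
    for part in re.split(r"[_\W]+", name):
        words.extend(_TOKEN.findall(part))
    return [word.lower() for word in words]


if __name__ == "__main__":
    for identifier in ("getAttribute", "getAttr", "MINHEIGHT", "MINHGT"):
        print(identifier, "->", split_identifier(identifier))
```

A splitter like this recovers `get` and `attribute` from getAttribute, but it
leaves MINHEIGHT and MINHGT as single tokens and cannot expand `attr`, which is
where dedicated normalisation algorithms are needed.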
Another well-known and widely investigated issue in software maintenance is
"clone detection", which focuses on the identification of source code
duplications. Software clones might affect the reliability and the
maintainability of large software systems. For example, errors affecting a
fragment of code must be fixed in every one of its duplicates.
Clones are usually not documented, and their identification is often
complicated because programmers adapt copied code by applying multiple
modifications (e.g., adding new statements and renaming variables).
Therefore, automatic and reliable approaches are required in order to
tackle this problem.
In this thesis we propose new Machine Learning (ML) based approaches
that mine the relevant information directly from the source code
to cope with the three issues introduced above, namely software
re-modularisation, source code vocabulary normalisation, and clone detection.
In particular, the proposed contributions leverage the benefits of ML
algorithms, which have been tailored and customised in order to
make them suitable for the considered domain.
All the presented approaches have been extensively assessed with
empirical evaluations conducted on large software systems, and results
have been compared with other related techniques, whenever possible.
The achieved results outperform the state-of-the-art solutions for all
three considered problems, thus confirming the benefits derived from the
definition and the application of ML algorithms to maintenance tasks.