Mining Configuration Constraints

Overview

This page serves as an online appendix to our ICSE'14 paper "Mining Configuration Constraints: Static Analyses and Empirical Results" as well as to our extended TSE submission "Where do Configuration Constraints Stem From? An Extraction Approach and an Empirical Study"

Members

 

Project Description

This project aims to automatically extract configuration constraints from C code, and compare them to existing variability models to understand the different kinds of constraints enforced in such models. This involves both an engineering effort, and an empirical study. To automatically extract constraints, we use two infrastructures, an extended version of TypeChef to extract the information we need to compute the constraints, and a new infrastructure, FARCE, to compute the constraints.

To understand why configuration constraints are enforced in practice, we use the above infrastructure to extract constraints from four configurable systems: uClibc, eCos, Busybox, and Linux. We quantitatively analyze how many of the existing variability model constraints can be recovered using the constraints we extract, and qualitatively examine where the remaining constraints come from. Our qualitative study consists of:

  • A manual analysis of a random sample 144 non-recovered constraints across the four systems. The raw data of our qualitative study can be found in the excel sheet below.
  • Feedback of 27 developers through online questionnaires and phone interviews. The data collected is then analyzed using grounded theory.
  • Additional automated analyses that count the occurrence of certain types of constraints.

Datasets

 

Constraint Formulas

  uClibc BusyBox eCos Linux kernel
Specification 1        
Preprocessor X X X X 
Parser X X  X X
Type checking X X X X
Linker X X X  X
Combined Spec 1 X X X X
Specification 2        
Feature effect (code) X X  X X
Feature effect (build) X X  N/A X
Combined Spec2 X X X X
Full Code X X X X

File Presence Conditions

The derivation of all constraints relies on extracted file presence conditions (PCs) from the build system. We extracted the PCs from the KBuild-based build systems of BusyBox and the Linux kernel using KBuildMiner, and the PCs of uClibc files manually. In eCos, presence conditions of files are part of the variability model, which we extracted by extending our infrastructure CDLTools (i.e. we extended our extension of the eCos configurator). All file PCs are available: uClibc, BusyBox, eCos,Linux kernel.

Handling Complexity

For cases where we extract thousands of small, individual constraints (e.g., the type system in the Linux kernel), the SAT solver cannot handle combining them all into a single formula. To overcome this, we convert the individual constraints into CNF (i.e., dimacs files), and then create a text based composition mechanism to combine these individual constraints without using a SAT solver. The operation works on two files at a time, and checks for common variables. If there are no common variables, it just adds the new variables and the new clauses. If there common variables, it has to map these variables to each other, and update the corresponding clauses. This can be found in gsd.farce.features.DimacsComposer.

On the other hands, in cases where the idividual constraints themselves are complex (several hundred features with many disjunctions), the SAT solver cannot handle even converting it to CNF. We ignore such constraints and limit the expressions we deal with to those having less than 10 features. We can do the filtering using gsd.farce.features.ConstraintFilterer which simply marks expressions with more than X features.

Additionally, in some cases, combining some constraints to achieve the full formula caused conflicts. We had to debug these, and identify the maximum set of non-conflicting conflicts. The links above point to the final formulas we achieved, and individual constraints to include. Check each system's respective repositories for details.

Scripts

Apart from the individual scripts to run things in each of the repositories below, you can use the following commands to generate the latex values we use in the paper: 

 

./run.sh gsd.farce.comparisons.stats.LatexAccuracyStatsGenerator > accuracy_data.tex

./run.sh gsd.farce.comparisons.stats.LatexRecoveraStatsGenerator > recoverability_data.tex
Note that the location of the data files used is in the gsd.farce.utilities.SystemConfigs scala file.
 

Repositories

The full intermediate results of our analysis are in the respective repositories for uClibc, BusyBox, eCos, and the Linux kernel.

 

AttachmentSize
SampleVerification.xlsx895.19 KB