How challenging is the data set?

Farinaz Koushanfar
ECE Department
Rice University

            In sensor networks and other large-scale networking scenarios, the problems are often complex. Aside from the combinatorial hardness of the problems, the underlying variables and parameters are intrinsically uncertain. For any given problem, a multitude of statistical models and algorithms have been designed that work well on one data set but often cannot be readily applied to, or compared on, others. Even when a data set is publicly available, it is often unclear how challenging that data set is. There is a need for known sets of benchmark instances, data sets, and comparison metrics to be used for evaluations.
            In this talk, I describe our ongoing work on generating challenging benchmark instances for a popular complex sensor network problem: ad-hoc location discovery. The problem is to determine the spatial coordinates of the distributed nodes, given noisy and possibly inconsistent (outlier) measurements of the inter-node distances and a small number of nodes with known locations. Our goal is to generate benchmark instances that contain a spectrum of computationally challenging location discovery input data with controlled parameters such as size, uncertainty, topology, and combinatorial hardness. The benchmark generation approach uses a combination of real-world data and its distribution, experiment organization, resampling, instance complexity, feasibility, and the sensitivity of location discovery to the uncertain variables.
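
            To make the shape of a location discovery instance concrete, the following is a minimal synthetic sketch and not the benchmark generator described in the talk. It assumes uniformly placed nodes in a unit square, Gaussian measurement noise, and outliers drawn uniformly within the radio range; the function name make_instance and all of its parameters are hypothetical. The talk's approach draws on real-world data and resampling, which this sketch does not attempt.

```python
import math
import random


def make_instance(n_nodes=20, n_anchors=4, radio_range=0.4,
                  noise_sigma=0.02, outlier_prob=0.05, seed=0):
    """Generate one synthetic location discovery instance.

    Hypothetical illustration only: the placement, noise, and outlier
    models below are assumptions, not the models used in the talk.
    """
    rng = random.Random(seed)

    # True node positions, uniform in the unit square (topology assumption).
    positions = [(rng.random(), rng.random()) for _ in range(n_nodes)]

    # The first n_anchors nodes are treated as nodes with known locations.
    anchors = {i: positions[i] for i in range(n_anchors)}

    measurements = {}
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            true_dist = math.dist(positions[i], positions[j])
            if true_dist > radio_range:
                continue  # out-of-range pairs yield no measurement

            if rng.random() < outlier_prob:
                # Outlier: a value unrelated to the true distance.
                measured = rng.uniform(0.0, radio_range)
            else:
                # Gaussian noise, clipped so the distance stays non-negative.
                measured = max(0.0, true_dist + rng.gauss(0.0, noise_sigma))

            measurements[(i, j)] = measured

    return positions, anchors, measurements


if __name__ == "__main__":
    positions, anchors, measurements = make_instance(seed=1)
    print(len(positions), "nodes,", len(anchors), "anchors,",
          len(measurements), "distance measurements")
```

            Here n_nodes controls size, noise_sigma and outlier_prob control uncertainty, and radio_range controls how densely the measurement graph is connected. A location discovery algorithm would receive only anchors and measurements; positions is kept as ground truth for scoring.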