  1. Introduction
  2. Conversion Details
  3. Conversion Restrictions
  4. Running the Original Code
  5. Acceptance Testing
  6. Legal / Licensing

Introduction

This task is to port a set of functions from the "R" language (approximately 460 SLOC) to C or C++.

These functions are taken from the LGPL package "baselineWavelet", available at:

http://code.google.com/p/baselinewavelet

This R package provides a reference implementation of the numerical peak-finding algorithm described here:

Z.M. Zhang, S. Chen, Y.Z. Liang, et al., "An intelligent background-correction algorithm for highly fluorescent samples in Raman spectroscopy." Journal of Raman Spectroscopy 41 (6), 659 (2010).
http://onlinelibrary.wiley.com/doi/10.1002/jrs.2500/abstract

A copy of the published paper will be provided on request; it may clarify many aspects of the algorithm and its implementation that could otherwise seem obscure.

In essence, the algorithm to be ported reads in a file of X, Y values such as the following:

    x   y
    -- --
    1   0
    2   5
    3  30
    4  10
    5  20
    6   0
    7  10
    8   0

Graphed, the data might look like this:

   30|       *
   20|             *
   10|          *        *
    5|    *
    0| *              *     *
     +------------------------
       1  2  3  4  5  6  7  8

As you can see, "spikes" or "peaks" are visible at x = 3, 5, and 7.

Simply put, the baselineWavelet algorithm is used to analyze complex data sets and identify where significant peaks are found, based on a variety of input numerical parameters.
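
To make the notion of a "peak" in the toy data above concrete, here is a minimal C++ sketch of a naive local-maximum scan. It is NOT the baselineWavelet method, which the cited paper describes and which depends on the numerical input parameters; the function name naiveLocalMaxima is hypothetical and exists only for illustration.

```cpp
#include <cstddef>
#include <vector>

// Illustration only: returns the x values at which y is strictly greater
// than both of its neighbours. This is a stand-in for intuition and is
// not the algorithm to be ported.
std::vector<double> naiveLocalMaxima(const std::vector<double>& x,
                                     const std::vector<double>& y) {
    std::vector<double> peaks;
    if (x.size() != y.size() || y.size() < 3) {
        return peaks;  // nothing to compare against
    }
    for (std::size_t i = 1; i + 1 < y.size(); ++i) {
        if (y[i] > y[i - 1] && y[i] > y[i + 1]) {
            peaks.push_back(x[i]);
        }
    }
    return peaks;
}
```

Applied to the eight sample points listed earlier, this scan reports x = 3, 5, and 7. Real spectra are noisy, so a scan this simple would report many spurious peaks.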

Conversion Details

The R functions to be converted are:

You can find these functions, and no others, in a self-contained reduced R package named "baselineWavelet_reduced.tar.gz", provided in this distribution. Alternatively, if you would prefer to work from the original baselineWavelet-4.0.0 distribution, it can be obtained from its public site:

http://code.google.com/p/baselinewavelet

Note however that the full distribution contains a number of other R functions which we do not require, including:

Conversion Restrictions

  1. The R code must be ported to working C or C++ source code. We would prefer that the code build and execute directly from gcc/g++ under GNU/Linux, but we can also accept working versions for Windows Visual C++ or Mac Xcode.
  2. Utilization of other open-source libraries such as Boost, BLAS, or LAPACK is allowed and encouraged.
  3. It is NOT acceptable to call R or Rserve in any way. Nor should any other process, daemon, or service be called, including Java JVMs, MATLAB runtimes, Unix shells, etc. All algorithms should run natively within a single process running a single executable C/C++ application.
  4. The code should not contain any memory leaks. The generated library or executable should allow many thousands of repeated calls without showing significant growth in the stack or heap over time (confirmed with valgrind or similar).
  5. It is expected that the generated source code will follow reasonable readability and maintainability guidelines. No specific style-guide requirements are imposed, but consistent indentation and logical encapsulation are expected.
  6. No additional licenses or usage restrictions may be applied to the port, which, as a derivative work of an open-source LGPL project, will itself inherit the LGPL status.

Running the Original Code

The original R code can be executed and tested as follows (sample is provided under Ubuntu 10.04 Lucid/LTS):

  1. Install R-2.12 (http://cran.r-project.org/src/base/R-2/R-2.12.2.tar.gz)
  2. Install the provided "baselineWavelet_reduced" package (or, if you prefer, substitute the formal baselineWavelet-4.0.0 distribution):
    $ sudo R CMD INSTALL baselineWavelet_reduced.tar.gz
  3. Install our provided demonstration test-driver package:
    $ sudo R CMD INSTALL demoPeakFinder.tar.gz
  4. Launch R, and process some test input files:
    $ R
    R version 2.12.2 (2011-02-25)
    Copyright (C) 2011 The R Foundation for Statistical Computing
    ISBN 3-900051-07-0
    Platform: i686-pc-linux-gnu (32-bit)
    > require(demoPeakFinder)
    Loading required package: demoPeakFinder
    Loading required package: baselineWavelet
    Loading required package: Matrix
    Loading required package: lattice
    Attaching package: 'Matrix'
    The following object(s) are masked from 'package:base':
        det
    > testDriver("data/input-h.txt", 63, 3, 2, 8, 5)
    X-axis labels of declared peaks: 814.15 817.28 835.66 851.83 854.33 868.83 877.45 885.66 1021.61
    > testDriver("data/input-x.txt", 63, 3, 2, 8, 5)
    X-axis labels of declared peaks: 807 814.48 819.42 832.12 839.66 851.67 874.89 880.15 892 974.93
    > testDriver("data/input-a.txt", 63, 3, 2, 8, 5)
    X-axis labels of declared peaks: 806.5 815.63 857.6 876.7 881.5 891.12 906.02 911.84 930.13 1010.64 1030.73
    > q()
    Save workspace image? [y/n/c]: n

To view the test driver source code, which demonstrates how each R function within the baselineWavelet package is being called, simply expand the package tarball:

$ tar zxvf demoPeakFinder.tar.gz
$ find demoPeakFinder -type f
demoPeakFinder/DESCRIPTION
demoPeakFinder/R/generatePeakLabels.R
$ head demoPeakFinder/R/generatePeakLabels.R
require(baselineWavelet)
testDriver <- function(filename, cwtScale, ridgeThreshGap, ridgeSkip, snrThresh, ridgeLength) {
  # load file of (X, Y) pairs, expecting tab-delimited format
  data <- read.table(filename, header = FALSE)

Three input data files have been provided, each containing 2048 tab-delimited lines of (x, y) tuples. It may help to visualize the test inputs and sample outputs by graphing these in Excel (or R with gnuplot) and visually correlating the output X-axis labels with the distinct visual peaks evident in the graph.
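
As a starting point for the port, the file-loading step that the R driver performs with read.table(filename, header = FALSE) could be sketched in C++ roughly as follows. The Spectrum struct and the readSpectrum function are hypothetical names introduced here, and skipping blank lines is an assumption rather than documented behaviour of the R driver.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for the two columns of an input file.
struct Spectrum {
    std::vector<double> x;
    std::vector<double> y;
};

// Reads one "x<TAB>y" pair per line, with no header row.
// Any run of spaces or tabs is accepted as the delimiter.
// Blank lines are skipped (an assumption); a line that does not
// start with two numbers makes the function return false.
bool readSpectrum(std::istream& in, Spectrum& out) {
    out.x.clear();
    out.y.clear();
    std::string line;
    while (std::getline(in, line)) {
        if (line.find_first_not_of(" \t\r") == std::string::npos) {
            continue;  // blank line
        }
        std::istringstream fields(line);
        double xv = 0.0;
        double yv = 0.0;
        if (!(fields >> xv >> yv)) {
            return false;  // malformed line
        }
        out.x.push_back(xv);
        out.y.push_back(yv);
    }
    return true;
}
```

Taking a std::istream rather than a filename keeps the sketch easy to test; a caller would typically pass a std::ifstream opened on a file such as data/input-h.txt and then confirm that 2048 pairs were read.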

Acceptance Testing

We would expect the final ported C/C++ code to be testable as follows:

$ make

$ bin/peakFinder data/input-h.txt 63 3 2 8 5
X-axis labels of declared peaks:  814.15 817.28 835.66 851.83 854.33 868.83 877.45 885.66 1021.61 

$ bin/peakFinder data/input-x.txt 63 3 2 8 5
X-axis labels of declared peaks:  807 814.48 819.42 832.12 839.66 851.67 874.89 880.15 892 974.93 
...etc.
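
The output line shown above could be produced along the following lines. This is a sketch under stated assumptions: formatPeakLine is a hypothetical name, the 7-significant-digit precision is chosen to mimic R's default number formatting, and single-space separators with no trailing space are assumed because the exact whitespace is not specified here.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Formats the declared peak positions as one line of text.
// Assumptions: 7 significant digits, trailing zeros dropped,
// one space between values, no trailing newline.
std::string formatPeakLine(const std::vector<double>& peakX) {
    std::ostringstream out;
    out.precision(7);
    out << "X-axis labels of declared peaks:";
    for (double value : peakX) {
        out << ' ' << value;
    }
    return out.str();
}
```

With this formatting a value such as 807.0 is written as "807" and 806.5 as "806.5", matching the style of the sample session.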

We have additional test input sets that we will run against both the original and the ported code. These runs will use different values for each function argument (cwtScale, ridgeThreshGap, ridgeSkip, snrThresh, ridgeLength).
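
Because the arguments will vary between test runs, the executable should validate them rather than hard-code the sample values. A minimal sketch of the command-line handling follows. The PeakFinderArgs struct and the parseArgs and toDouble helpers are hypothetical names, and storing every numeric argument as a double is an assumption, since the argument types are not stated here.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical holder for the six command-line arguments, in the
// same order as the R testDriver() parameters.
struct PeakFinderArgs {
    std::string filename;
    double cwtScale = 0.0;
    double ridgeThreshGap = 0.0;
    double ridgeSkip = 0.0;
    double snrThresh = 0.0;
    double ridgeLength = 0.0;
};

// Converts a whole string to a double; rejects empty strings and
// strings with trailing non-numeric characters.
static bool toDouble(const std::string& text, double& value) {
    if (text.empty()) {
        return false;
    }
    char* end = nullptr;
    value = std::strtod(text.c_str(), &end);
    return end == text.c_str() + text.size();
}

// 'args' holds the arguments after the program name.
// Returns false unless exactly six well-formed arguments are given.
bool parseArgs(const std::vector<std::string>& args, PeakFinderArgs& out) {
    if (args.size() != 6) {
        return false;
    }
    out.filename = args[0];
    return toDouble(args[1], out.cwtScale) &&
           toDouble(args[2], out.ridgeThreshGap) &&
           toDouble(args[3], out.ridgeSkip) &&
           toDouble(args[4], out.snrThresh) &&
           toDouble(args[5], out.ridgeLength);
}
```

From main, the vector can be built as std::vector<std::string>(argv + 1, argv + argc), and a false return can be reported as a usage error with a non-zero exit status.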

A final test will include visual inspection of the code to confirm that the results are indeed being generated through a consistent and correct C/C++ port of the published mathematics.

Legal / Licensing

This port will constitute a derivative work of an LGPL open-source algorithm, and so will itself inherit LGPL status. No violations of the original authors' rights are intended or expected.