This task is to port a set of functions from the "R" language (approximately 460 SLOC) to C or C++.
These functions are taken from the LGPL package "baselineWavelet", available at:
http://code.google.com/p/baselinewavelet
This R package provides a reference implementation of the numerical peak-finding algorithm described here:
Z.M. Zhang, S. Chen, Y.Z. Liang, et al., "An intelligent background-correction algorithm for highly fluorescent samples in Raman spectroscopy." Journal of Raman Spectroscopy 41 (6), 659 (2010).
http://onlinelibrary.wiley.com/doi/10.1002/jrs.2500/abstract
A copy of the published paper will be provided on request, as it may clarify many aspects of the algorithm and its implementation which could otherwise seem obscure.
In essence, the algorithm to be ported reads in a file of X, Y values such as the following:
x   y
--  --
1   0
2   5
3   30
4   10
5   20
6   0
7   10
8   0
Graphed, the data might look like this:
30|
20|     /\
10|    /  \___/\
 0|___/         \___/\___
    1  2  3  4  5  6  7  8
As you can see, there are "spikes" or "peaks" visible at the x-axis labels (3, 5, and 7).
Simply put, the baselineWavelet algorithm is used to analyze complex data sets and identify where significant peaks are found, based on a variety of input numerical parameters.
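To make the notion of a "peak" concrete, here is a minimal C++ sketch that scans the sample data for strict local maxima. It is NOT the baselineWavelet algorithm, which relies on the numerical parameters described later in this document. The function name naiveLocalMaxima is hypothetical and used for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of baselineWavelet: returns the x labels of
// strict local maxima, i.e. points higher than both immediate neighbours.
// The first and last points are never reported because they lack a neighbour.
std::vector<double> naiveLocalMaxima(const std::vector<double>& x,
                                     const std::vector<double>& y)
{
    assert(x.size() == y.size());
    std::vector<double> peaks;
    for (std::size_t i = 1; i + 1 < y.size(); ++i) {
        if (y[i] > y[i - 1] && y[i] > y[i + 1]) {
            peaks.push_back(x[i]);
        }
    }
    return peaks;
}
```

Applied to the eight sample points above, this scan reports x = 3, 5, and 7. On real, noisy spectra such a naive test would flag far too many points, which is the problem the published algorithm addresses.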
The R functions to be converted are:
You can find these functions, and no others, in a self-contained reduced R package named "baselineWavelet_reduced.tar.gz", provided in this distribution. Alternatively, if you would prefer to work from the original baselineWavelet-4.0.0 distribution, it may be obtained from its public site:
http://code.google.com/p/baselinewavelet
Note however that the full distribution contains a number of other R functions which we do not require, including:
The original R code can be executed and tested as follows (sample is provided under Ubuntu 10.04 Lucid/LTS):
$ sudo R CMD INSTALL baselineWavelet_reduced.tar.gz
$ sudo R CMD INSTALL demoPeakFinder.tar.gz
To view the test driver source code, which demonstrates how each R function within the baselineWavelet package is being called, simply expand the package tarball:
$ tar zxvf demoPeakFinder.tar.gz
$ find demoPeakFinder -type f
demoPeakFinder/DESCRIPTION
demoPeakFinder/R/generatePeakLabels.R
$ head demoPeakFinder/R/generatePeakLabels.R
require(baselineWavelet)
testDriver <- function(filename, cwtScale, ridgeThreshGap, ridgeSkip, snrThresh, ridgeLength)
{
    # load file of (X, Y) pairs, expecting tab-delimited format
    data <- read.table(filename, header = FALSE)
Three input data files have been provided, each containing 2048 tab-delimited lines of (x, y) tuples. It may help to visualize the test inputs and sample outputs by graphing these in Excel (or R with gnuplot) and visually correlating the output X-axis labels with the distinct visual peaks evident in the graph.
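As a starting point for the port, the input step performed by read.table in the R driver might be sketched in C++ as follows. The Spectrum struct and the readSpectrum function are hypothetical names. The sketch assumes no header row and two numeric fields per line separated by tabs or spaces, and it treats any malformed line as an error.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for the (x, y) pairs; not taken from the R package.
struct Spectrum {
    std::vector<double> x;
    std::vector<double> y;
};

// Reads "x<TAB>y" lines (any whitespace separator is accepted) with no header.
// Blank lines are skipped. Returns false on a malformed line or empty input.
bool readSpectrum(std::istream& in, Spectrum& out)
{
    std::string line;
    while (std::getline(in, line)) {
        if (line.find_first_not_of(" \t\r") == std::string::npos) {
            continue;  // skip blank lines
        }
        std::istringstream fields(line);
        double xv = 0.0;
        double yv = 0.0;
        if (!(fields >> xv >> yv)) {
            return false;  // fewer than two numeric fields
        }
        out.x.push_back(xv);
        out.y.push_back(yv);
    }
    assert(out.x.size() == out.y.size());
    return !out.x.empty();
}
```

Taking a std::istream rather than a file name lets the same function be exercised with a std::ifstream for the supplied data files or a std::istringstream in unit tests.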
We would expect the final ported C/C++ code to be testable as follows:
$ make
$ bin/peakFinder data/input-h.txt 63 3 2 8 5
X-axis labels of declared peaks: 814.15 817.28 835.66 851.83 854.33 868.83 877.45 885.66 1021.61
$ bin/peakFinder data/input-x.txt 63 3 2 8 5
X-axis labels of declared peaks: 807 814.48 819.42 832.12 839.66 851.67 874.89 880.15 892 974.93
...etc.
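One possible way to handle the command line shown above is sketched below. It assumes that the five numeric arguments follow the parameter order of the R testDriver function (cwtScale, ridgeThreshGap, ridgeSkip, snrThresh, ridgeLength) and that all of them can be held as doubles, mirroring R's default numeric type. Both points are assumptions, and the names PeakFinderArgs, parseNumber, and parseArgs are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical argument bundle; field order assumed to match testDriver.
struct PeakFinderArgs {
    std::string filename;
    double cwtScale = 0.0;
    double ridgeThreshGap = 0.0;
    double ridgeSkip = 0.0;
    double snrThresh = 0.0;
    double ridgeLength = 0.0;
};

// Converts one argument to a double, rejecting empty or partly numeric text.
bool parseNumber(const char* text, double& value)
{
    char* end = nullptr;
    value = std::strtod(text, &end);
    return end != text && *end == '\0';
}

// Expects: program name, file name, then the five numeric parameters.
bool parseArgs(int argc, const char* const argv[], PeakFinderArgs& args)
{
    if (argc != 7) {
        return false;
    }
    assert(argv != nullptr);
    args.filename = argv[1];
    return parseNumber(argv[2], args.cwtScale)
        && parseNumber(argv[3], args.ridgeThreshGap)
        && parseNumber(argv[4], args.ridgeSkip)
        && parseNumber(argv[5], args.snrThresh)
        && parseNumber(argv[6], args.ridgeLength);
}
```

A main function would call parseArgs(argc, argv, args) and print a usage message when it returns false.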
We have additional test input sets which we will run against the original and ported code, which will include different values for each function argument (cwtScale, ridgeThreshGap, ridgeSkip, snrThresh, ridgeLength).
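When comparing the ported output against the R reference on such test sets, exact floating-point equality may be too strict. A tolerance-based comparison along the following lines is one option. The function name samePeaks and the choice of tolerance are assumptions for illustration and are not part of the acceptance criteria stated in this document.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical check: true when both lists contain the same number of peaks
// and each pair of corresponding x labels differs by at most `tolerance`.
bool samePeaks(const std::vector<double>& expected,
               const std::vector<double>& actual,
               double tolerance)
{
    assert(tolerance >= 0.0);
    if (expected.size() != actual.size()) {
        return false;
    }
    for (std::size_t i = 0; i < expected.size(); ++i) {
        if (std::fabs(expected[i] - actual[i]) > tolerance) {
            return false;
        }
    }
    return true;
}
```

Since the sample output is printed to two decimal places, a tolerance of about 0.01 would be a natural first choice, though the appropriate value depends on how closely the port is required to track the reference.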
A final test will include visual inspection of the code to confirm that the results are indeed being generated through a consistent and correct C/C++ port of the published mathematics.
This port will constitute a derivative work of an LGPL open-source algorithm, and so will itself inherit LGPL status. No violations of the original authors' rights are intended or expected.