Automatic feature selection from a large number of features for phone duration prediction


Gabriel Webster, Sabine Buchholz, Javier Latorre, Toshiba Research Europe Ltd.

This paper investigates automatic feature selection for phone duration prediction in text-to-speech (TTS) synthesis, selecting from a large set of 242 candidate features. Two methods for avoiding overfitting to the training data are evaluated. Experiments with an American English voice corpus show that automatic feature selection using n-fold cross validation combined with a simple per-feature improvement threshold achieves a duration prediction error of 22.5 ms RMSE, a relative error reduction of 7.8\% over a manually selected baseline feature set.
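
To make the selection procedure concrete, the following is a minimal sketch of greedy forward feature selection scored by n-fold cross-validated RMSE, with a per-feature improvement threshold as the stopping rule. It is an illustration under stated assumptions, not the authors' implementation: the lookup-table predictor, the relative form of the threshold, its value of 1%, the function names `cv_rmse` and `select_features`, and the synthetic data with the features `vowel`, `stressed`, and `filler` are all hypothetical.

```python
import math
import random
from collections import defaultdict


def cv_rmse(rows, targets, feats, n_folds=5):
    """Cross-validated RMSE of a lookup-table predictor restricted to `feats`."""
    squared_error = 0.0
    for fold in range(n_folds):
        sums, counts = defaultdict(float), defaultdict(int)
        total, n_train = 0.0, 0
        # Train: mean duration for each combination of the selected feature values.
        for i, (row, y) in enumerate(zip(rows, targets)):
            if i % n_folds == fold:
                continue
            key = tuple(row[f] for f in feats)
            sums[key] += y
            counts[key] += 1
            total += y
            n_train += 1
        global_mean = total / n_train
        # Test on the held-out fold; unseen combinations back off to the global mean.
        for i, (row, y) in enumerate(zip(rows, targets)):
            if i % n_folds != fold:
                continue
            key = tuple(row[f] for f in feats)
            pred = sums[key] / counts[key] if key in counts else global_mean
            squared_error += (pred - y) ** 2
    return math.sqrt(squared_error / len(rows))


def select_features(rows, targets, candidates, threshold=0.01, n_folds=5):
    """Greedily add the best feature until the relative gain drops below `threshold`."""
    selected = []
    best = cv_rmse(rows, targets, selected, n_folds)
    while True:
        scored = [(cv_rmse(rows, targets, selected + [f], n_folds), f)
                  for f in candidates if f not in selected]
        if not scored:
            break
        rmse, feat = min(scored)
        # Assumed stopping rule: require at least a 1% relative RMSE improvement.
        if best - rmse < threshold * best:
            break
        selected.append(feat)
        best = rmse
    return selected, best


# Synthetic data (hypothetical): duration in ms depends on two of three features.
rng = random.Random(0)
rows = [{"vowel": rng.randrange(2), "stressed": rng.randrange(2),
         "filler": rng.randrange(5)} for _ in range(400)]
targets = [60 + 40 * r["vowel"] + 20 * r["stressed"] + rng.gauss(0, 5)
           for r in rows]

selected, best = select_features(rows, targets, ["vowel", "stressed", "filler"])
print(selected, round(best, 2))
```

Scoring each candidate on held-out folds rather than on the training data is what guards against overfitting here, and the threshold stops the search from adding features whose apparent gain is within noise. With 242 candidates, each greedy step requires one cross-validation run per remaining feature, so the cost of the inner predictor matters in practice.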