Augmentation

We support time-domain data augmentation via the WavAugment library. WavAugment combines libsox with its own implementations to provide a range of augmentations. Because WavAugment depends on libsox, it is an optional dependency of Lhotse and can be installed with tools/install_wavaugment.sh. For convenience, on Mac OS X the script also compiles libsox from source; note, however, that the WavAugment authors warn that their library is untested on Mac.

Using Lhotse’s Python API, you can compose an arbitrary effect chain. The CLI, in contrast, offers a small number of predefined effect chains, such as pitch (pitch shifting), reverb (reverberation), and pitch_reverb_tdrop (pitch shift + reverberation + time dropout of a 50 ms chunk).
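To make the "time dropout of a 50 ms chunk" concrete, here is a minimal pure-Python sketch of the idea: one contiguous chunk of samples is zeroed at a random position. The function name time_dropout and its parameters are hypothetical and chosen for illustration only; WavAugment's own implementation may differ in details such as how the chunk length is chosen.

```python
import random


def time_dropout(samples, sampling_rate, chunk_seconds=0.05, rng=None):
    """Return a copy of `samples` with one random contiguous chunk zeroed.

    Illustrative sketch only; not the WavAugment or Lhotse implementation.
    """
    rng = rng or random.Random()
    # Number of samples covered by the chunk (50 ms at 16 kHz -> 800 samples).
    chunk = min(len(samples), int(round(chunk_seconds * sampling_rate)))
    out = list(samples)
    if chunk == 0:
        return out
    # Pick the start so that the whole chunk fits inside the signal.
    start = rng.randint(0, len(out) - chunk)
    out[start:start + chunk] = [0.0] * chunk
    return out


if __name__ == "__main__":
    one_second = [1.0] * 16000
    dropped = time_dropout(one_second, sampling_rate=16000, rng=random.Random(0))
    print(dropped.count(0.0), "samples were zeroed")
```

The signal length is unchanged; only the contents of the selected chunk are replaced with silence.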

Python usage

We define a WavAugmenter class that is a thin wrapper over WavAugment. It can be created with either a predefined or a user-supplied effect chain.

class lhotse.augmentation.WavAugmenter(effect_chain, sampling_rate)

A wrapper class for WavAugment’s effect chain. You should construct the augment.EffectChain beforehand and pass it on to this class. For more details on how to augment, see https://github.com/facebookresearch/WavAugment
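A user-supplied chain might be composed as in the sketch below. The effect names (pitch, rate, reverb, channels) follow the usage shown in the WavAugment README, and the numeric settings are arbitrary illustrative values. build_custom_augmenter is a hypothetical helper, not part of Lhotse; it returns None when the optional dependencies cannot be imported.

```python
import importlib.util


def build_custom_augmenter(sampling_rate=16000):
    """Wrap a hand-built effect chain in a WavAugmenter.

    Hypothetical helper for illustration. Returns None when WavAugment
    (the `augment` package) or a Lhotse version providing WavAugmenter
    is not installed.
    """
    if importlib.util.find_spec("augment") is None:
        return None
    try:
        import augment
        from lhotse.augmentation import WavAugmenter
    except (ImportError, OSError):
        return None

    chain = (
        augment.EffectChain()
        .pitch(200)            # shift pitch up by 200 cents (illustrative value)
        .rate(sampling_rate)   # resample back to the target rate after pitch
        .reverb(50, 50, 50)    # reverberance, HF damping, room scale (illustrative)
        .channels(1)           # mix back down to a single channel
    )
    return WavAugmenter(effect_chain=chain, sampling_rate=sampling_rate)


if __name__ == "__main__":
    augmenter = build_custom_augmenter()
    print("WavAugment available" if augmenter is not None else "WavAugment not installed")
```

Building the chain first and passing it to the constructor keeps the wrapper agnostic to which effects the chain contains.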

__init__(effect_chain, sampling_rate)

Create a WavAugmenter from a previously constructed augment.EffectChain and the expected sampling rate of the input audio.

static create_predefined(name, sampling_rate)

Create a WavAugmenter instance with one of the predefined augmentation setups available in Lhotse. Some examples are: “pitch”, “reverb”, “pitch_reverb_tdrop”.

Parameters
  • name (str) – the name of the augmentation setup.

  • sampling_rate (int) – expected sampling rate of the input audio.

Return type

WavAugmenter

apply(audio)

Apply the effect chain on the audio tensor.

Parameters

audio (Union[Tensor, ndarray]) – a (num_channels, num_samples) shaped tensor placed on the CPU.

Return type

ndarray
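Putting create_predefined and apply together, a typical call might look like the sketch below. It assumes NumPy is installed and uses a synthetic one-second sine wave as input; augment_with_predefined and demo are hypothetical helpers, and both return None when an optional dependency is missing.

```python
import math


def augment_with_predefined(audio, sampling_rate, name="pitch"):
    """Apply one of Lhotse's predefined effect chains to `audio`.

    `audio` is expected to be shaped (num_channels, num_samples).
    Returns None when WavAugmenter or WavAugment cannot be imported.
    """
    try:
        from lhotse.augmentation import WavAugmenter
        augmenter = WavAugmenter.create_predefined(name, sampling_rate=sampling_rate)
    except (ImportError, OSError):
        return None
    return augmenter.apply(audio)


def demo(sampling_rate=16000):
    """Augment a synthetic 440 Hz tone; returns None if dependencies are missing."""
    try:
        import numpy as np
    except ImportError:
        return None
    t = np.arange(sampling_rate, dtype=np.float32) / sampling_rate
    mono = np.sin(2 * math.pi * 440.0 * t).astype(np.float32)
    audio = mono[None, :]  # add the channel axis: (1, num_samples)
    return augment_with_predefined(audio, sampling_rate, name="reverb")


if __name__ == "__main__":
    result = demo()
    print("augmented shape:", None if result is None else result.shape)
```

Note the explicit channel axis: a one-dimensional array of samples has to be reshaped to (1, num_samples) before it is passed to apply.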

CLI usage

To extract features from augmented audio, pass the extra --augmentation argument (written below in its short form, -a) to lhotse feat extract:

lhotse feat extract -a pitch ...

You can create a dataset with both clean and augmented features by combining different variants of extracted features, e.g.:

lhotse feat extract audio.yml clean_feats/
lhotse feat extract -a pitch audio.yml pitch_feats/
lhotse feat extract -a reverb audio.yml reverb_feats/
lhotse yaml combine {clean,pitch,reverb}_feats/feature_manifest.yml.gz combined_feats.yml
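The {clean,pitch,reverb} pattern in the last command relies on shell brace expansion, which not every shell provides. As a sketch, the same sequence of commands can be assembled in Python with the paths written out explicitly. build_commands is a hypothetical helper; it only constructs the argument lists shown above and does not run them.

```python
import shlex


def build_commands(audio_manifest, augmentations, combined="combined_feats.yml"):
    """Build the `lhotse` command lines for clean plus augmented features.

    Hypothetical helper: it mirrors the commands shown above and returns
    them as argument lists without executing anything.
    """
    commands = [["lhotse", "feat", "extract", audio_manifest, "clean_feats/"]]
    manifests = ["clean_feats/feature_manifest.yml.gz"]
    for name in augmentations:
        out_dir = f"{name}_feats/"
        commands.append(["lhotse", "feat", "extract", "-a", name, audio_manifest, out_dir])
        manifests.append(f"{out_dir}feature_manifest.yml.gz")
    # Combine all feature manifests, with every path spelled out.
    commands.append(["lhotse", "yaml", "combine", *manifests, combined])
    return commands


if __name__ == "__main__":
    for argv in build_commands("audio.yml", ["pitch", "reverb"]):
        print(shlex.join(argv))
```

Each returned list can be passed to subprocess.run(argv, check=True) if the commands should actually be executed.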