Augmentation
We support time-domain data augmentation via the WavAugment and torchaudio libraries. Both leverage libsox to provide about 50 different audio effects, such as reverb, speed perturbation, and pitch shifting.
Since WavAugment depends on libsox, it is an optional dependency for Lhotse, which can be installed using tools/install_wavaugment.sh (for convenience, the script will also compile libsox from source; note that the WavAugment authors warn that their library is untested on Mac).
Torchaudio also depends on libsox, but seems to provide it when installed via Anaconda. The torchaudio-based augmentation is only available with PyTorch 1.7+ and torchaudio 0.7+.
Using Lhotse’s Python API, you can compose an arbitrary effect chain. For the CLI, on the other hand, we provide a small number of predefined effect chains, such as pitch (pitch shifting), reverb (reverberation), and pitch_reverb_tdrop (pitch shift + reverberation + time dropout of a 50ms chunk).
Python usage
Warning
When using WavAugment or torchaudio data augmentation together with a multiprocessing executor (i.e. ProcessPoolExecutor), it is necessary to start it using the “spawn” context. Otherwise, the process may hang (or terminate) on some systems because libsox internals do not handle forking well. Use: ProcessPoolExecutor(..., mp_context=multiprocessing.get_context("spawn")).
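The warning above can be sketched as a small helper that only uses the Python standard library. The names spawn_context and make_spawn_executor are hypothetical and are not part of Lhotse; the sketch only shows how the “spawn” context is passed to the executor.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor


def spawn_context():
    """Return a multiprocessing context that starts workers with "spawn"."""
    return multiprocessing.get_context("spawn")


def make_spawn_executor(num_jobs: int = 2) -> ProcessPoolExecutor:
    """Hypothetical helper: build an executor whose workers are spawned, not forked.

    Spawned workers start from a fresh interpreter, so they do not inherit
    libsox state from the parent process.
    """
    return ProcessPoolExecutor(max_workers=num_jobs, mp_context=spawn_context())
```

With “spawn”, worker processes re-import the main module, so code that submits jobs to such an executor is usually placed under an if __name__ == "__main__": guard.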
Lhotse’s FeatureExtractor and Cut offer convenience functions for feature extraction with data augmentation performed beforehand. These functions expose an optional parameter called augment_fn that has a signature like:
def augment_fn(audio: Union[np.ndarray, torch.Tensor], sampling_rate: int) -> np.ndarray: ...
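As an illustration of this signature, here is a minimal sketch of a user-defined augmentation function that applies a random gain. The function name and the gain range are assumptions made for the example and do not come from Lhotse; it also assumes the input can be converted with np.asarray (true for NumPy arrays and, as far as we know, for CPU torch tensors).

```python
import numpy as np


def random_gain_augment_fn(audio, sampling_rate: int) -> np.ndarray:
    """Hypothetical augment_fn: scale the waveform by a random gain.

    The gain is drawn uniformly from [-6, 6] dB (an arbitrary choice).
    sampling_rate is unused here, but kept to match the expected signature.
    """
    samples = np.asarray(audio, dtype=np.float32)
    gain_db = np.random.uniform(-6.0, 6.0)
    # Convert decibels to a linear amplitude factor.
    scaled = samples * np.float32(10.0 ** (gain_db / 20.0))
    # Keep the result within the usual [-1, 1] floating-point audio range.
    return np.clip(scaled, -1.0, 1.0)
```

Any callable with the same two arguments that returns a NumPy array should fit the augment_fn parameter in the same way.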
For torchaudio, we define a SoxEffectTransform class:
class lhotse.augmentation.SoxEffectTransform(effects)
Class-style wrapper for torchaudio SoX effect chains. It should be initialized with a config-like list of items that define the SoX effects to be applied. It supports sampling randomized values for effect parameters through the RandomValue wrapper.
Example:
>>> audio = np.random.rand(16000)
>>> augment_fn = SoxEffectTransform(effects=[
>>>     ['reverb', 50, 50, RandomValue(0, 100)],
>>>     ['speed', RandomValue(0.9, 1.1)],
>>>     ['rate', 16000],
>>> ])
>>> augmented = augment_fn(audio, 16000)
See the SoX manual or torchaudio.sox_effects.effect_names() for the list of possible effects. The parameters and the meaning of the values are explained in the SoX manual/help.
__init__(effects)
Initialize self. See help(type(self)) for accurate signature.
sample_effects()
Resolve a list of effects, replacing random distributions with samples from them. It converts every number to a string to match the expectations of torchaudio.
Return type: List[List[str]]
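The resolution step described for sample_effects() can be sketched in plain Python as follows. This is not Lhotse’s implementation: UniformValue and resolve_effects are hypothetical stand-ins, and the uniform distribution is an assumption made for the example.

```python
import random
from dataclasses import dataclass
from typing import Any, List


@dataclass
class UniformValue:
    """Hypothetical stand-in for a randomized effect parameter."""
    start: float
    end: float

    def sample(self) -> float:
        # Draw a value uniformly between start and end.
        return random.uniform(self.start, self.end)


def resolve_effects(effects: List[List[Any]]) -> List[List[str]]:
    """Replace random placeholders with samples and stringify every item."""
    resolved = []
    for effect in effects:
        row = []
        for item in effect:
            if isinstance(item, UniformValue):
                item = item.sample()
            # Effect names stay as they are; numbers become strings.
            row.append(str(item))
        resolved.append(row)
    return resolved
```

Calling resolve_effects([['speed', UniformValue(0.9, 1.1)], ['rate', 16000]]) yields a list of string lists in which the speed factor differs from call to call, while ['rate', '16000'] stays fixed.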
We define a WavAugmenter class that is a thin wrapper over WavAugment. It can be created with either a predefined or a user-supplied effect chain.
class lhotse.augmentation.WavAugmenter(effect_chain)
A wrapper class for WavAugment’s effect chain. You should construct the augment.EffectChain beforehand and pass it on to this class.
This class is only available when WavAugment is installed, as it is an optional dependency for Lhotse. It can be installed using the script in “<main-repo-directory>/tools/install_wavaugment.sh”.
For more details on how to augment, see https://github.com/facebookresearch/WavAugment
__init__(effect_chain)
Initialize self. See help(type(self)) for accurate signature.
static create_predefined(name, sampling_rate, **kwargs)
Create a WavAugmenter instance with one of the predefined augmentation setups available in Lhotse. Some examples are: “pitch”, “reverb”, “pitch_reverb_tdrop”.
Parameters:
- name (str) – the name of the augmentation setup.
- sampling_rate (int) – expected sampling rate of the input audio.
Return type: WavAugmenter
CLI usage
To extract features from augmented audio, you can pass an extra --augmentation argument to lhotse feat extract.
lhotse feat extract -a pitch ...
You can create a dataset with both clean and augmented features by combining different variants of extracted features, e.g.:
lhotse feat extract audio.yml clean_feats/
lhotse feat extract -a pitch audio.yml pitch_feats/
lhotse feat extract -a reverb audio.yml reverb_feats/
lhotse yaml combine {clean,pitch,reverb}_feats/feature_manifest.yml.gz combined_feats.yml
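If the same sequence needs to be generated for a varying list of augmentations, the commands above can be assembled programmatically. The sketch below only builds argument lists that mirror the commands shown above; extract_command and build_pipeline are hypothetical helpers, and the directory naming simply follows the example.

```python
from typing import List, Optional


def extract_command(
    audio_manifest: str, output_dir: str, augmentation: Optional[str] = None
) -> List[str]:
    """Build one `lhotse feat extract` invocation as an argument list."""
    cmd = ["lhotse", "feat", "extract"]
    if augmentation is not None:
        cmd += ["-a", augmentation]
    return cmd + [audio_manifest, output_dir]


def build_pipeline(audio_manifest: str, augmentations: List[str]) -> List[List[str]]:
    """Build extraction commands for clean and augmented variants, plus a combine step."""
    # The clean variant has no augmentation; the others use their own name as prefix.
    variants = [("clean", None)] + [(name, name) for name in augmentations]
    commands = [
        extract_command(audio_manifest, f"{prefix}_feats/", aug)
        for prefix, aug in variants
    ]
    manifests = [f"{prefix}_feats/feature_manifest.yml.gz" for prefix, _ in variants]
    commands.append(["lhotse", "yaml", "combine", *manifests, "combined_feats.yml"])
    return commands
```

Each returned list could then be passed to subprocess.run(cmd, check=True); unlike the shell version, the manifest paths are written out explicitly instead of relying on brace expansion.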