Kaldi Interoperability
Data import/export
We support importing Kaldi data directories that contain at least the wav.scp file, which is
required to create the RecordingSet.
Other files, such as segments, utt2spk, etc., are used to create the SupervisionSet.
We also support converting feats.scp to a FeatureSet, and reading features
directly from Kaldi’s scp/ark files via the kaldiio library (an optional Lhotse dependency).
We also support exporting a pair of RecordingSet and SupervisionSet
to a Kaldi data directory.
We currently do not support the following (but may start doing so in the future):
- Exporting Lhotse-extracted features to Kaldi’s feats.scp
- Exporting Lhotse’s multi-channel recording sets to Kaldi
Kaldi feature extractors
We support Kaldi-compatible log-mel filter energies (“fbank”) and MFCCs. We provide a PyTorch implementation that is GPU-compatible and supports batching and backpropagation. To learn more about feature extraction in Lhotse, see Feature extraction.
Python
Python methods related to Kaldi support:
- lhotse.kaldi.get_duration(path)
Read an audio file and return its duration. Both Kaldi-style pipe commands and paths to actual audio files are supported.
- Parameters
path (Union[Path, str]) – Path to an audio file or a Kaldi-style pipe.
- Return type
float
- Returns
Duration of the recording, in seconds.
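For context, a wav.scp entry maps a recording ID either to an audio file path or to a shell command ending in `|`, whose standard output is the audio data. The sketch below is not Lhotse's implementation; `classify_wav_scp_entry` is a hypothetical helper that only illustrates how the two forms can be told apart.

```python
from typing import Tuple


def classify_wav_scp_entry(line: str) -> Tuple[str, str, str]:
    """Split one wav.scp line into (recording_id, kind, value).

    Hypothetical helper for illustration; not part of Lhotse.
    ``kind`` is "pipe" when the entry is a command ending in "|",
    and "file" otherwise.
    """
    recording_id, value = line.strip().split(maxsplit=1)
    if value.endswith("|"):
        # Kaldi runs the command and reads the audio from its stdout.
        return recording_id, "pipe", value[:-1].strip()
    return recording_id, "file", value


entries = [
    "rec1 /data/audio/rec1.wav",
    "rec2 sph2pipe -f wav -p /data/audio/rec2.sph |",
]
for entry in entries:
    print(classify_wav_scp_entry(entry))
```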
- lhotse.kaldi.load_kaldi_data_dir(path, sampling_rate, frame_shift=None, map_string_to_underscores=None, num_jobs=1)
Load a Kaldi data directory and convert it to Lhotse RecordingSet and SupervisionSet manifests. For this to work, at least the wav.scp file must exist. A SupervisionSet is created only when a segments file exists. All the other files (text, utt2spk, etc.) are optional, and some of them might not be handled yet. In particular, feats.scp files are ignored.
- Parameters
map_string_to_underscores (Optional[str]) – optional string; when specified, all instances of this string in SupervisionSegment IDs are replaced with underscores. This helps with handling underscores in Kaldi (see export_to_kaldi()). The same replacement is applied to speaker IDs.
- Return type
Tuple[RecordingSet, Optional[SupervisionSet], Optional[FeatureSet]]
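A Kaldi segments file lists one segment per line as `<utterance-id> <recording-id> <start> <end>`, with times in seconds, whereas a supervision is more naturally described by a start and a duration. The following is a minimal sketch of that conversion, not Lhotse's actual code; `parse_segments_line` and the dictionary keys are hypothetical names chosen for illustration.

```python
from typing import Dict, Union


def parse_segments_line(line: str) -> Dict[str, Union[str, float]]:
    """Convert one line of a Kaldi ``segments`` file to a start/duration record.

    Hypothetical helper for illustration; not part of Lhotse.
    """
    utt_id, rec_id, start, end = line.split()
    start_sec, end_sec = float(start), float(end)
    return {
        "id": utt_id,
        "recording_id": rec_id,
        "start": start_sec,
        # Kaldi stores the end time; the duration is derived from it.
        "duration": end_sec - start_sec,
    }


print(parse_segments_line("spk1-utt1 rec1 1.5 4.0"))
```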
- lhotse.kaldi.export_to_kaldi(recordings, supervisions, output_dir, map_underscores_to=None)
Export a pair of RecordingSet and SupervisionSet to a Kaldi data directory. Recordings with multiple channels are supported, but each recording must still have a single AudioSource.
The RecordingSet and SupervisionSet must be compatible, i.e. it must be possible to create a CutSet out of them.
- Parameters
recordings (RecordingSet) – a RecordingSet manifest.
supervisions (SupervisionSet) – a SupervisionSet manifest.
output_dir (Union[Path, str]) – path where the Kaldi-style data directory will be created.
map_underscores_to (Optional[str]) – optional string with which all underscores are replaced. This helps avoid issues with Kaldi data dir sorting.
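The underscore replacement on export, and its inverse on import (map_string_to_underscores in load_kaldi_data_dir), amount to a plain string substitution on IDs. The sketch below illustrates the idea with two hypothetical helpers; it is not Lhotse's implementation, and the choice of "-" as the replacement string is only an example.

```python
from typing import Optional


def map_id_for_kaldi(item_id: str, map_underscores_to: Optional[str] = None) -> str:
    """Replace every underscore in an ID (hypothetical helper, not part of Lhotse)."""
    if map_underscores_to is None:
        return item_id
    return item_id.replace("_", map_underscores_to)


def map_id_from_kaldi(item_id: str, map_string_to_underscores: Optional[str] = None) -> str:
    """Undo the replacement on import (hypothetical helper, not part of Lhotse)."""
    if map_string_to_underscores is None:
        return item_id
    return item_id.replace(map_string_to_underscores, "_")


exported = map_id_for_kaldi("speaker_01_utt_003", map_underscores_to="-")
restored = map_id_from_kaldi(exported, map_string_to_underscores="-")
print(exported)  # speaker-01-utt-003
print(restored)  # speaker_01_utt_003
```

The round trip restores the original ID only when the replacement string does not already occur in it.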
- lhotse.kaldi.load_kaldi_text_mapping(path, must_exist=False)
Load Kaldi files such as utt2spk, spk2gender, text, etc. as a dict.
- Return type
Dict[str,Optional[str]]
- lhotse.kaldi.save_kaldi_text_mapping(data, path)
Save flat dicts to Kaldi files such as utt2spk, spk2gender, text, etc.
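Files such as utt2spk, spk2gender, and text share one layout: each line holds a key, whitespace, and a value (for text, the value is the whole transcript). The sketch below shows one way to read and write that layout with the standard library only. The function names are hypothetical and the code is not Lhotse's implementation; in particular, mapping a key without a value to None is an assumption suggested by the Dict[str, Optional[str]] return type of load_kaldi_text_mapping.

```python
import tempfile
from pathlib import Path
from typing import Dict, Optional


def read_text_mapping(path: Path) -> Dict[str, Optional[str]]:
    """Read "key value" lines into a dict (hypothetical helper, not part of Lhotse)."""
    mapping: Dict[str, Optional[str]] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.strip().split(maxsplit=1)
            if not parts:
                continue  # skip blank lines
            # Assumption: a key with no value maps to None.
            mapping[parts[0]] = parts[1] if len(parts) == 2 else None
    return mapping


def write_text_mapping(data: Dict[str, Optional[str]], path: Path) -> None:
    """Write a flat dict as "key value" lines, sorted by key."""
    with open(path, "w", encoding="utf-8") as f:
        for key in sorted(data):
            value = data[key]
            f.write(key if value is None else f"{key} {value}")
            f.write("\n")


with tempfile.TemporaryDirectory() as tmp:
    text_path = Path(tmp) / "text"
    write_text_mapping({"utt2": "good morning", "utt1": "hello world"}, text_path)
    print(text_path.read_text(encoding="utf-8"))
    print(read_text_mapping(text_path))
```

Sorting the keys before writing reflects Kaldi's expectation that data directory files are sorted.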
- lhotse.kaldi.make_wavscp_channel_string_map(source, sampling_rate)
- Return type
Dict[int,str]
CLI
To convert a Kaldi data directory called data/train, containing recordings sampled at 16 kHz,
into a directory of Lhotse manifests called train_manifests:
# Convert data/train to train_manifests/{recordings,supervisions}.json
lhotse kaldi import \
data/train \
16000 \
train_manifests
# Convert train_manifests/{recordings,supervisions}.json to data/train
lhotse kaldi export \
train_manifests/recordings.json \
train_manifests/supervisions.json \
data/train
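The same import and export can be sketched in Python with the functions documented in the Python section. The call signatures and the three-element return value follow the reference entries on this page; the wrapper name `kaldi_roundtrip` is hypothetical, and Lhotse is imported inside the function only so that the snippet can be loaded where Lhotse is not installed. Writing the manifests to disk is left out.

```python
from pathlib import Path
from typing import Union

Pathlike = Union[Path, str]


def kaldi_roundtrip(data_dir: Pathlike, sampling_rate: int, output_dir: Pathlike):
    """Import a Kaldi data directory and export it again (hypothetical wrapper)."""
    from lhotse.kaldi import export_to_kaldi, load_kaldi_data_dir

    # Per the reference entry, the supervisions and features may be None.
    recordings, supervisions, _features = load_kaldi_data_dir(
        data_dir, sampling_rate=sampling_rate
    )
    if supervisions is None:
        raise ValueError(f"No SupervisionSet was created for {data_dir}")
    export_to_kaldi(recordings, supervisions, output_dir=output_dir)
    return recordings, supervisions
```

For example, `kaldi_roundtrip("data/train", 16000, "data/train_exported")` would load data/train and write a new Kaldi directory next to it; the output path here is only an illustration.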