Kaldi Interoperability
Data import/export
We support importing Kaldi data directories that contain at least the wav.scp file, which is required to create the RecordingSet. Other files, such as segments, utt2spk, etc., are used to create the SupervisionSet. We also support converting feats.scp to a FeatureSet, and reading features directly from Kaldi’s scp/ark files via the kaldiio library (an optional dependency of Lhotse).
We also support exporting a pair of RecordingSet and SupervisionSet to a Kaldi data directory.
We currently do not support the following (but may start doing so in the future):
- Exporting Lhotse-extracted features to Kaldi’s feats.scp
- Exporting Lhotse’s multi-channel recording sets to Kaldi
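For orientation, the following sketch writes a tiny Kaldi-style data directory containing the files mentioned above and reads the segments file back, using only the Python standard library. It is not part of Lhotse: the function names, IDs, paths, and transcripts are made up for illustration, and the line formats assume the usual Kaldi conventions (wav.scp: recording ID, then a path or pipeline; segments: utterance ID, recording ID, start, end; utt2spk and text: utterance ID, then the value).

```python
import tempfile
from pathlib import Path
from typing import Dict, Tuple


def write_toy_kaldi_dir(data_dir: Path) -> None:
    """Write a minimal, made-up Kaldi data directory (illustration only)."""
    data_dir.mkdir(parents=True, exist_ok=True)
    files = {
        # <recording-id> <path or pipeline>; the path below is hypothetical.
        "wav.scp": ["rec1 /hypothetical/audio/rec1.wav"],
        # <utterance-id> <recording-id> <start-seconds> <end-seconds>
        "segments": ["rec1-utt1 rec1 0.00 2.50", "rec1-utt2 rec1 2.50 5.00"],
        # <utterance-id> <speaker-id>
        "utt2spk": ["rec1-utt1 spkA", "rec1-utt2 spkA"],
        # <utterance-id> <transcript>
        "text": ["rec1-utt1 hello world", "rec1-utt2 good morning"],
    }
    for name, lines in files.items():
        (data_dir / name).write_text("\n".join(lines) + "\n", encoding="utf-8")


def read_segments(data_dir: Path) -> Dict[str, Tuple[str, float, float]]:
    """Parse the segments file into {utterance_id: (recording_id, start, end)}."""
    segments = {}
    for line in (data_dir / "segments").read_text(encoding="utf-8").splitlines():
        utt_id, rec_id, start, end = line.split()
        segments[utt_id] = (rec_id, float(start), float(end))
    return segments


with tempfile.TemporaryDirectory() as tmp:
    toy_dir = Path(tmp) / "data" / "train"
    write_toy_kaldi_dir(toy_dir)
    print(sorted(p.name for p in toy_dir.iterdir()))
    print(read_segments(toy_dir))
```

In this picture, wav.scp alone describes the recordings, while segments, utt2spk, and text carry the per-utterance information that ends up in the supervisions.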
Kaldi feature extractors
We support Kaldi-compatible log-mel filter bank energies (“fbank”) and MFCCs. We provide a PyTorch implementation that is GPU-compatible and supports batching and backpropagation. To learn more about feature extraction in Lhotse, see Feature extraction.
Python
Python methods related to Kaldi support:
-
lhotse.kaldi.get_duration(path)
Read an audio file and return its duration. Both pipeline-style wav paths and paths to actual audio files are supported.
- Parameters
path (Union[Path, str]) – Path to an audio file supported by libsoundfile (pysoundfile).
- Return type
float
- Returns
The duration of the wav, as a float.
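The distinction between a plain path and a pipeline-style entry can be sketched as follows. This is not Lhotse's implementation: the helper name wav_duration is hypothetical, it relies only on the standard-library wave module (so it handles PCM WAV data only), and it assumes the Kaldi convention that an entry ending in "|" is a shell command whose standard output is the WAV data.

```python
import io
import subprocess
import wave
from pathlib import Path
from typing import Union


def wav_duration(path: Union[Path, str]) -> float:
    """Return the duration in seconds of PCM WAV data (hypothetical helper)."""
    entry = str(path).strip()
    if entry.endswith("|"):
        # Pipeline-style entry: run the command and read the WAV from stdout.
        # Only do this with commands you trust, since they run in a shell.
        result = subprocess.run(
            entry[:-1], shell=True, check=True, stdout=subprocess.PIPE
        )
        source = io.BytesIO(result.stdout)
    else:
        # Plain entry: a path to a WAV file on disk.
        source = entry
    with wave.open(source, "rb") as wav:
        # Duration = number of frames / frames per second.
        return wav.getnframes() / wav.getframerate()
```

A call such as wav_duration("audio.wav") covers the plain case, and wav_duration("cat audio.wav |") the pipeline case. Tools that stream WAV data to a pipe may write placeholder sizes in the header, in which case a header-based frame count like this one may be unreliable.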
-
lhotse.kaldi.load_kaldi_data_dir(path, sampling_rate, frame_shift=None)
Load a Kaldi data directory and convert it to Lhotse RecordingSet and SupervisionSet manifests. For this to work, at least the wav.scp file must exist. The SupervisionSet is created only when a segments file exists. All the other files (text, utt2spk, etc.) are optional, and some of them might not be handled yet. In particular, feats.scp files are ignored.
- Return type
Tuple[RecordingSet, Optional[SupervisionSet], Optional[FeatureSet]]
-
lhotse.kaldi.export_to_kaldi(recordings, supervisions, output_dir)
Export a pair of RecordingSet and SupervisionSet to a Kaldi data directory. Currently, it only supports single-channel recordings that have a single AudioSource.
The RecordingSet and SupervisionSet must be compatible, i.e. it must be possible to create a CutSet out of them.
- Parameters
recordings (RecordingSet) – a RecordingSet manifest.
supervisions (SupervisionSet) – a SupervisionSet manifest.
output_dir (Union[Path, str]) – path where the Kaldi-style data directory will be created.
-
lhotse.kaldi.load_kaldi_text_mapping(path, must_exist=False)
Load Kaldi files such as utt2spk, spk2gender, text, etc. as a dict.
- Return type
Dict[str, Optional[str]]
-
lhotse.kaldi.save_kaldi_text_mapping(data, path)
Save flat dicts to Kaldi files such as utt2spk, spk2gender, text, etc.
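To make the dict representation concrete, here is a minimal standard-library sketch of reading and writing such key-value files. It is an illustration, not Lhotse's code: read_mapping and write_mapping are hypothetical names, and the handling of a missing file (empty dict unless must_exist is set) and of keys without a value (mapped to None) are assumptions inferred from the signature and return type above.

```python
from pathlib import Path
from typing import Dict, Optional, Union


def read_mapping(
    path: Union[Path, str], must_exist: bool = False
) -> Dict[str, Optional[str]]:
    """Read lines of the form '<key> <value...>' into a dict (hypothetical helper)."""
    path = Path(path)
    if not path.is_file():
        if must_exist:
            raise FileNotFoundError(f"No such file: {path}")
        return {}  # Assumed behavior: a missing optional file yields no entries.
    mapping: Dict[str, Optional[str]] = {}
    with path.open(encoding="utf-8") as f:
        for line in f:
            # Split on the first run of whitespace only, so that multi-word
            # values (e.g. transcripts in 'text') stay in one piece.
            parts = line.strip().split(maxsplit=1)
            if not parts:
                continue  # Skip blank lines.
            mapping[parts[0]] = parts[1] if len(parts) == 2 else None
    return mapping


def write_mapping(data: Dict[str, Optional[str]], path: Union[Path, str]) -> None:
    """Write a flat dict as '<key> <value>' lines, sorted by key."""
    with Path(path).open("w", encoding="utf-8") as f:
        for key, value in sorted(data.items()):
            f.write(key if value is None else f"{key} {value}")
            f.write("\n")
```

Writing the entries sorted by key is a design choice made here because Kaldi tools generally expect sorted files.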
CLI
The following commands convert a Kaldi data directory called data/train, containing recordings with a 16 kHz sampling rate, to a directory of Lhotse manifests called train_manifests, and then export those manifests back to a Kaldi data directory:
# Convert data/train to train_manifests/{recordings,supervisions}.json
lhotse kaldi import \
data/train \
16000 \
train_manifests
# Convert train_manifests/{recordings,supervisions}.json to data/train
lhotse kaldi export \
train_manifests/recordings.json \
train_manifests/supervisions.json \
data/train
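The same round trip can be sketched with the Python functions documented in the Python section. This is a sketch under stated assumptions rather than a recommended recipe: the wrapper name kaldi_round_trip is hypothetical, it relies only on the signatures shown in this document, and it indexes the returned tuple instead of unpacking it so that it does not depend on the exact number of returned elements.

```python
from pathlib import Path
from typing import Union


def kaldi_round_trip(
    kaldi_dir: Union[Path, str],
    output_dir: Union[Path, str],
    sampling_rate: int = 16000,
) -> None:
    """Import a Kaldi data directory and export it again (hypothetical wrapper)."""
    # Imported lazily so this sketch can be loaded without Lhotse installed.
    from lhotse.kaldi import export_to_kaldi, load_kaldi_data_dir

    manifests = load_kaldi_data_dir(Path(kaldi_dir), sampling_rate=sampling_rate)
    recordings, supervisions = manifests[0], manifests[1]
    if supervisions is None:
        # Per the documentation above, supervisions are only created
        # when a segments file exists in the Kaldi data directory.
        raise ValueError(f"No supervisions could be created from {kaldi_dir}")
    export_to_kaldi(recordings, supervisions, output_dir=Path(output_dir))
```

A call such as kaldi_round_trip("data/train", "data/train_exported") would mirror the two CLI commands, with the manifests kept in memory instead of being written to train_manifests.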