Kaldi Interoperability
Data import/export
We support importing Kaldi data directories that contain at least the wav.scp file, which is required to create the RecordingSet. Other files, such as segments, utt2spk, etc., are used to create the SupervisionSet. We also support converting feats.scp to a FeatureSet, and reading features directly from Kaldi’s scp/ark files via the kaldiio library (an optional dependency of Lhotse).
We also support exporting a pair of RecordingSet and SupervisionSet to a Kaldi data directory.
We currently do not support the following (but may do so in the future):
- Exporting Lhotse-extracted features to Kaldi’s feats.scp
- Exporting Lhotse’s multi-channel recording sets to Kaldi
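For readers less familiar with the Kaldi files named above, the following sketch writes a tiny data directory using only the standard library. The helper name `write_minimal_kaldi_dir`, the IDs, and the audio paths are all made up for illustration; the code only shows the usual line layout of each file and does not call Lhotse.

```python
import tempfile
from pathlib import Path


def write_minimal_kaldi_dir(root: Path) -> Path:
    """Write a tiny, made-up Kaldi data directory under `root` (hypothetical helper)."""
    data_dir = root / "data" / "train"
    data_dir.mkdir(parents=True, exist_ok=True)
    files = {
        # <recording-id> <audio path, or a command ending with "|">
        "wav.scp": [
            "rec1 /audio/rec1.wav",
            "rec2 sox /audio/rec2.flac -t wav - |",
        ],
        # <utterance-id> <recording-id> <start-seconds> <end-seconds>
        "segments": [
            "spk1-rec1-0001 rec1 0.00 2.50",
            "spk2-rec2-0001 rec2 1.20 4.00",
        ],
        # <utterance-id> <speaker-id>
        "utt2spk": [
            "spk1-rec1-0001 spk1",
            "spk2-rec2-0001 spk2",
        ],
        # <utterance-id> <transcript>
        "text": [
            "spk1-rec1-0001 hello world",
            "spk2-rec2-0001 good morning",
        ],
    }
    for name, lines in files.items():
        (data_dir / name).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return data_dir


# Demo: create the directory in a temporary location and list what was written.
with tempfile.TemporaryDirectory() as tmp:
    kaldi_dir = write_minimal_kaldi_dir(Path(tmp))
    written_files = sorted(p.name for p in kaldi_dir.iterdir())
    first_wav_line = (kaldi_dir / "wav.scp").read_text(encoding="utf-8").splitlines()[0]
```

Of these files, only wav.scp is needed for a RecordingSet; the others feed the SupervisionSet.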
Kaldi feature extractors
We support Kaldi-compatible log-mel filter energies (“fbank”) and MFCCs. We provide a PyTorch implementation that is GPU-compatible and supports batching and backpropagation. To learn more about feature extraction in Lhotse, see Feature extraction.
Python
Python methods related to Kaldi support:
- lhotse.kaldi.get_duration(path)
Read an audio file and return its duration. Both Kaldi-style pipe paths and regular audio files are supported.
- Parameters
path (Union[Path, str]) – Path to an audio file or a Kaldi-style pipe.
- Return type
float
- Returns
Duration of the recording, in seconds.
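The two kinds of input can be told apart with a simple check, sketched below with the standard library only. This is not Lhotse's implementation: `is_kaldi_pipe` and `wav_duration_seconds` are hypothetical helpers, the trailing `|` test is the usual Kaldi convention for a piped command, and the duration helper handles only plain PCM WAV files.

```python
import tempfile
import wave
from pathlib import Path
from typing import Union


def is_kaldi_pipe(entry: str) -> bool:
    """Return True if a wav.scp entry is a command whose output is piped (ends with '|')."""
    return entry.strip().endswith("|")


def wav_duration_seconds(path: Union[Path, str]) -> float:
    """Duration of a PCM WAV file: number of frames divided by the sampling rate."""
    with wave.open(str(path), "rb") as f:
        return f.getnframes() / f.getframerate()


# Demo: write one second of 16 kHz mono silence and measure it.
with tempfile.TemporaryDirectory() as tmp:
    wav_path = Path(tmp) / "silence.wav"
    with wave.open(str(wav_path), "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)  # 16-bit samples
        out.setframerate(16000)
        out.writeframes(b"\x00\x00" * 16000)
    duration = wav_duration_seconds(wav_path)

pipe_entry = "sox /audio/rec2.flac -t wav - |"  # made-up example command
```

For a piped entry, the command would first have to be executed and its output decoded before a duration can be computed; that step is left out of this sketch.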
- lhotse.kaldi.load_kaldi_data_dir(path, sampling_rate, frame_shift=None, map_string_to_underscores=None, num_jobs=1)
Load a Kaldi data directory and convert it to a Lhotse RecordingSet and SupervisionSet manifests. For this to work, at least the wav.scp file must exist. SupervisionSet is created only when a segments file exists. All the other files (text, utt2spk, etc.) are optional, and some of them might not be handled yet. In particular, feats.scp files are ignored.
- Parameters
map_string_to_underscores (Optional[str]) – optional string; when specified, all instances of this string in SupervisionSegment IDs are replaced with underscores. This helps with handling underscores in Kaldi (see export_to_kaldi()). The same replacement is applied to speaker IDs.
- Return type
Tuple[RecordingSet, Optional[SupervisionSet], Optional[FeatureSet]]
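Since the SupervisionSet depends on the segments file, it may help to see what one line of that file carries. The sketch below is an illustration only, not the loader's code: `parse_segments` is a hypothetical helper and the dictionary keys are chosen for readability. It converts Kaldi's start and end times into the start and duration form used by Lhotse supervision segments.

```python
from typing import Dict, List, Union


def parse_segments(lines: List[str]) -> List[Dict[str, Union[str, float]]]:
    """Parse Kaldi `segments` lines: <utt-id> <recording-id> <start> <end> (hypothetical helper)."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        utt_id, recording_id, start, end = line.split()
        start_sec, end_sec = float(start), float(end)
        records.append(
            {
                "id": utt_id,
                "recording_id": recording_id,
                "start": start_sec,
                # Kaldi stores an end time; a duration is end minus start.
                "duration": round(end_sec - start_sec, 6),
            }
        )
    return records


segments = parse_segments(["utt1 rec1 0.5 2.0", "utt2 rec1 2.0 3.25"])
```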
- lhotse.kaldi.export_to_kaldi(recordings, supervisions, output_dir, map_underscores_to=None)
Export a pair of RecordingSet and SupervisionSet to a Kaldi data directory. Recordings with multiple channels are supported, provided that each recording has a single AudioSource.
The RecordingSet and SupervisionSet must be compatible, i.e. it must be possible to create a CutSet out of them.
- Parameters
recordings (RecordingSet) – a RecordingSet manifest.
supervisions (SupervisionSet) – a SupervisionSet manifest.
output_dir (Union[Path, str]) – path where the Kaldi-style data directory will be created.
map_underscores_to (Optional[str]) – optional string with which all underscores are replaced. This helps avoid issues with Kaldi data dir sorting.
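The two underscore options, map_underscores_to on export and map_string_to_underscores on import, are meant to mirror each other. A minimal sketch of the idea follows, assuming both amount to a plain string replacement on IDs; `to_kaldi_id` and `from_kaldi_id` are hypothetical helpers, not Lhotse functions.

```python
def to_kaldi_id(lhotse_id: str, map_underscores_to: str) -> str:
    """Replace every underscore, as described for export (hypothetical helper)."""
    return lhotse_id.replace("_", map_underscores_to)


def from_kaldi_id(kaldi_id: str, map_string_to_underscores: str) -> str:
    """Replace every instance of the given string with an underscore, as described for import."""
    return kaldi_id.replace(map_string_to_underscores, "_")


original = "spk_01_utt_0003"
exported = to_kaldi_id(original, "-")   # "spk-01-utt-0003"
restored = from_kaldi_id(exported, "-")  # same as `original`

# The round trip is only exact when the replacement string does not already
# occur in the ID: here the pre-existing "-" also becomes "_".
lossy = from_kaldi_id(to_kaldi_id("spk-01_utt", "-"), "-")  # "spk_01_utt"
```

Choosing a replacement string that never appears in the original IDs keeps the mapping reversible.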
- lhotse.kaldi.load_kaldi_text_mapping(path, must_exist=False)
Load Kaldi files such as utt2spk, spk2gender, text, etc. as a dict.
- Return type
Dict[str, Optional[str]]
- lhotse.kaldi.save_kaldi_text_mapping(data, path)
Save flat dicts to Kaldi files such as utt2spk, spk2gender, text, etc.
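To make the file format behind these two functions concrete, here is a standard-library sketch of reading and writing such mappings. It is not Lhotse's code: `read_mapping` and `write_mapping` are hypothetical names, and the behavior for a missing file (an empty dict, or an error when `must_exist` is true) is an assumption based on the parameter name. Each line is split on the first run of whitespace, so the value may itself contain spaces, as transcripts in text do.

```python
import tempfile
from pathlib import Path
from typing import Dict, Optional, Union


def write_mapping(data: Dict[str, Optional[str]], path: Union[Path, str]) -> None:
    """Write one '<key> <value>' line per entry, sorted by key (hypothetical helper)."""
    with Path(path).open("w", encoding="utf-8") as f:
        for key, value in sorted(data.items()):
            f.write(key if value is None else f"{key} {value}")
            f.write("\n")


def read_mapping(path: Union[Path, str], must_exist: bool = False) -> Dict[str, Optional[str]]:
    """Read '<key> <value>' lines into a dict; a key without a value maps to None."""
    path = Path(path)
    if not path.is_file():
        if must_exist:
            # Assumed behavior: fail loudly only when the caller requires the file.
            raise FileNotFoundError(f"No such file: {path}")
        return {}
    mapping: Dict[str, Optional[str]] = {}
    with path.open(encoding="utf-8") as f:
        for line in f:
            parts = line.strip().split(maxsplit=1)
            if not parts:
                continue  # skip blank lines
            mapping[parts[0]] = parts[1] if len(parts) > 1 else None
    return mapping


# Demo: round-trip a small `text`-style mapping through a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    text_path = Path(tmp) / "text"
    transcripts = {"utt2": None, "utt1": "hello world"}
    write_mapping(transcripts, text_path)
    round_trip = read_mapping(text_path, must_exist=True)
    missing = read_mapping(Path(tmp) / "spk2gender")
```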
- lhotse.kaldi.make_wavscp_channel_string_map(source, sampling_rate)
- Return type
Dict[int, str]
CLI
Converting a Kaldi data directory called data/train, with recordings sampled at 16 kHz, to a directory with Lhotse manifests called train_manifests, and exporting the manifests back to a Kaldi data directory:
# Convert data/train to train_manifests/{recordings,supervisions}.json
lhotse kaldi import \
data/train \
16000 \
train_manifests
# Convert train_manifests/{recordings,supervisions}.json to data/train
lhotse kaldi export \
train_manifests/recordings.json \
train_manifests/supervisions.json \
data/train