Kaldi Interoperability

We support importing Kaldi data directories that contain at least the wav.scp file, which is required to create the RecordingSet. Other files (segments, utt2spk, etc.) are used to create the SupervisionSet.
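For orientation, the following minimal sketch writes a toy Kaldi data directory using only the Python standard library. The file layouts follow the usual Kaldi conventions; the helper name write_toy_kaldi_dir, the IDs, and the audio path are made up for illustration, and the audio file itself is not created.

```python
import tempfile
from pathlib import Path


def write_toy_kaldi_dir(root: Path) -> Path:
    """Write a toy Kaldi data directory (hypothetical IDs and paths)."""
    data_dir = root / "data" / "train"
    data_dir.mkdir(parents=True, exist_ok=True)
    files = {
        # <recording-id> <audio path, or a command ending with '|'>
        "wav.scp": "rec1 /path/to/rec1.wav\n",
        # <utterance-id> <recording-id> <start-seconds> <end-seconds>
        "segments": "utt1 rec1 0.00 2.50\nutt2 rec1 2.50 5.00\n",
        # <utterance-id> <transcript>
        "text": "utt1 hello world\nutt2 good morning\n",
        # <utterance-id> <speaker-id>
        "utt2spk": "utt1 spk1\nutt2 spk1\n",
    }
    for name, content in files.items():
        (data_dir / name).write_text(content, encoding="utf-8")
    return data_dir


with tempfile.TemporaryDirectory() as tmp:
    data_dir = write_toy_kaldi_dir(Path(tmp))
    created = sorted(p.name for p in data_dir.iterdir())
    print(created)
```

In this toy directory, wav.scp alone would be enough for the RecordingSet; the other three files carry the information used for the SupervisionSet.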

We currently do not support the following (but may start doing so in the future):

  • Importing Kaldi’s extracted features (feats.scp is ignored)

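  • Exporting Lhotse manifests with multi-channel or multi-source recordings as Kaldi data directories (export_to_kaldi handles only single-channel recordings that have a single AudioSource).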


Python methods related to Kaldi support:

lhotse.kaldi.load_kaldi_data_dir(path, sampling_rate)

Load a Kaldi data directory and convert it to Lhotse RecordingSet and SupervisionSet manifests. At least the wav.scp file must exist. A SupervisionSet is created only when a segments file exists. All other files (text, utt2spk, etc.) are optional, and some of them might not be handled yet. In particular, feats.scp files are ignored.

Return type

Tuple[RecordingSet, Optional[SupervisionSet]]
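A usage sketch, assuming lhotse is installed and that data/train is an existing Kaldi data directory with 16 kHz audio. The wrapper name import_kaldi_dir and the path are illustrative; the wrapper only adds handling for the case where no SupervisionSet is returned.

```python
from pathlib import Path
from typing import Union


def import_kaldi_dir(data_dir: Union[str, Path], sampling_rate: int = 16000):
    """Hypothetical wrapper around lhotse.kaldi.load_kaldi_data_dir."""
    # Imported lazily so that defining this helper does not require lhotse.
    from lhotse.kaldi import load_kaldi_data_dir

    recordings, supervisions = load_kaldi_data_dir(Path(data_dir), sampling_rate)
    if supervisions is None:
        # The second element is Optional, so the caller has to handle None.
        print(f"{data_dir}: no SupervisionSet was created")
    return recordings, supervisions


# Example call (requires lhotse and a real Kaldi data directory):
# recordings, supervisions = import_kaldi_dir("data/train", sampling_rate=16000)
```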

lhotse.kaldi.export_to_kaldi(recordings, supervisions, output_dir)

Export a pair of RecordingSet and SupervisionSet to a Kaldi data directory. Currently, it only supports single-channel recordings that have a single AudioSource.

The RecordingSet and SupervisionSet must be compatible, i.e. it must be possible to create a CutSet out of them.

  • recordings (RecordingSet) – a RecordingSet manifest.

  • supervisions (SupervisionSet) – a SupervisionSet manifest.

  • output_dir (Union[Path, str]) – path where the Kaldi-style data directory will be created.
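Because the export only supports single-channel recordings that have a single AudioSource, it can help to check a manifest before exporting. The sketch below assumes that each recording exposes id and sources, and that each source exposes a channels list, as Lhotse's Recording and AudioSource do. The helper name unsupported_for_kaldi_export is hypothetical, and the demo uses stand-in objects rather than real manifests.

```python
from types import SimpleNamespace
from typing import Iterable, List


def unsupported_for_kaldi_export(recordings: Iterable) -> List[str]:
    """Return IDs of recordings that are not single-source and single-channel."""
    bad = []
    for rec in recordings:
        single_source = len(rec.sources) == 1
        single_channel = single_source and len(rec.sources[0].channels) == 1
        if not single_channel:
            bad.append(rec.id)
    return bad


# Stand-ins that mimic only the attributes used above (not real Lhotse objects).
mono = SimpleNamespace(id="rec-mono", sources=[SimpleNamespace(channels=[0])])
stereo = SimpleNamespace(id="rec-stereo", sources=[SimpleNamespace(channels=[0, 1])])
two_files = SimpleNamespace(
    id="rec-two-files",
    sources=[SimpleNamespace(channels=[0]), SimpleNamespace(channels=[1])],
)

rejected = unsupported_for_kaldi_export([mono, stereo, two_files])
print(rejected)
```

With real manifests, an empty result suggests that export_to_kaldi(recordings, supervisions, output_dir) can be attempted.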

lhotse.kaldi.load_kaldi_text_mapping(path, must_exist=False)

Load Kaldi files such as utt2spk, spk2gender, text, etc. as a dict.

Return type

Dict[str, Optional[str]]
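To illustrate the file format involved, here is a minimal stand-alone sketch of such a parser. It is not Lhotse's implementation: the name parse_kaldi_mapping is hypothetical, and the handling of must_exist (raise if the file is missing, otherwise return an empty dict) is an assumption. Each line is split at the first run of whitespace into a key and the rest of the line.

```python
import tempfile
from pathlib import Path
from typing import Dict, Optional


def parse_kaldi_mapping(path: Path, must_exist: bool = False) -> Dict[str, Optional[str]]:
    """Parse a '<key> <value...>' file into a dict (illustrative sketch)."""
    path = Path(path)
    if not path.is_file():
        if must_exist:
            raise FileNotFoundError(f"No such file: {path}")
        return {}  # assumption: a missing optional file yields an empty mapping
    mapping: Dict[str, Optional[str]] = {}
    with path.open(encoding="utf-8") as f:
        for line in f:
            parts = line.strip().split(maxsplit=1)
            if not parts:
                continue  # skip blank lines
            # A line with a key but no value maps the key to None.
            mapping[parts[0]] = parts[1] if len(parts) == 2 else None
    return mapping


with tempfile.TemporaryDirectory() as tmp:
    text_file = Path(tmp) / "text"
    text_file.write_text("utt1 hello world\nutt2\n", encoding="utf-8")
    parsed = parse_kaldi_mapping(text_file, must_exist=True)
    missing = parse_kaldi_mapping(Path(tmp) / "spk2gender")
    print(parsed, missing)
```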

lhotse.kaldi.save_kaldi_text_mapping(data, path)

Save flat dicts to Kaldi files such as utt2spk, spk2gender, text, etc.
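A minimal stand-alone sketch of writing such a file, for illustration only. It is not Lhotse's implementation: the name write_kaldi_mapping is hypothetical, and sorting by key is a choice made here because Kaldi's tools generally expect sorted files.

```python
import tempfile
from pathlib import Path
from typing import Dict


def write_kaldi_mapping(data: Dict[str, str], path: Path) -> None:
    """Write a flat dict as '<key> <value>' lines, sorted by key (sketch)."""
    with Path(path).open("w", encoding="utf-8") as f:
        for key in sorted(data):
            f.write(f"{key} {data[key]}\n")


with tempfile.TemporaryDirectory() as tmp:
    utt2spk_path = Path(tmp) / "utt2spk"
    write_kaldi_mapping({"utt2": "spk1", "utt1": "spk1"}, utt2spk_path)
    written = utt2spk_path.read_text(encoding="utf-8")
    print(written)
```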


To convert a Kaldi data directory called data/train, with recordings sampled at 16 kHz, into a directory of Lhotse manifests called train_manifests:

lhotse convert-kaldi data/train 16000 train_manifests
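When several data directories need converting, the command above can be scripted. The sketch below builds the same command line for each split and leaves execution to subprocess.run. The split names, the output directory naming, and the helpers convert_kaldi_command and run_all are illustrative; running the commands requires the lhotse CLI on the PATH.

```python
import subprocess
from typing import List


def convert_kaldi_command(data_dir: str, sampling_rate: int, manifest_dir: str) -> List[str]:
    """Build the argument list for 'lhotse convert-kaldi' (hypothetical helper)."""
    return ["lhotse", "convert-kaldi", data_dir, str(sampling_rate), manifest_dir]


# Hypothetical splits; adjust to the directories that actually exist.
commands = [
    convert_kaldi_command(f"data/{split}", 16000, f"{split}_manifests")
    for split in ("train", "dev")
]


def run_all(cmds: List[List[str]]) -> None:
    """Run each command in turn, stopping at the first failure."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)


# run_all(commands)  # uncomment once the lhotse CLI and the data directories are available
```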