API Reference¶
This page contains a comprehensive list of all classes and functions within lhotse.
Recording manifests¶
Data structures used for describing audio recordings in a dataset.
- lhotse.audio.set_audio_duration_mismatch_tolerance(delta)[source]¶
Override Lhotse’s global threshold for allowed audio duration mismatch between the manifest and the actual data.
Some scenarios when a mismatch can happen:
- the Recording manifest duration is rounded off too much (i.e., bad user input, but too inconvenient to go back and fix the manifests)
- data augmentation changes the number of samples slightly, in a way that is difficult to predict
When there is a mismatch, Lhotse will either trim the audio or replicate samples to cover the difference, so that the result matches the value found in the Recording manifest.
Note
We don’t recommend setting this lower than the default value, as it could break some data augmentation transforms.
Example:
>>> import lhotse
>>> lhotse.set_audio_duration_mismatch_tolerance(0.01)  # 10ms tolerance
- Parameters
delta (float) – New tolerance in seconds.
- Return type
None
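The trim-or-replicate behaviour described above can be illustrated with a minimal pure-Python sketch. It is not Lhotse's implementation: the helper name `fix_num_samples`, the choice to replicate the last sample, and the error raised when the tolerance is exceeded are all assumptions made for illustration.

```python
from typing import List


def fix_num_samples(
    samples: List[float], expected: int, sampling_rate: int, tolerance: float
) -> List[float]:
    """Hypothetical helper: match `samples` to the `expected` length from a manifest.

    A mismatch of up to `tolerance` seconds is repaired; anything larger is an error.
    """
    if not samples:
        raise ValueError("Cannot fix an empty signal.")
    diff = expected - len(samples)
    if diff == 0:
        return list(samples)
    if abs(diff) > tolerance * sampling_rate:
        raise ValueError(
            f"Mismatch of {abs(diff)} samples exceeds the tolerance of {tolerance}s."
        )
    if diff < 0:
        # Too many samples: trim the tail.
        return list(samples[:expected])
    # Too few samples: replicate the last sample (an assumption of this sketch).
    return list(samples) + [samples[-1]] * diff
```

With `sampling_rate=1000` and `tolerance=0.01`, up to 10 samples of mismatch would be repaired and an 11-sample mismatch would raise.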
- class lhotse.audio.AudioSource(type, channels, source)[source]¶
AudioSource represents audio data that can be retrieved from somewhere. Supported sources of audio are currently:
- ‘file’ (formats supported by soundfile, possibly multi-channel)
- ‘command’ [unix pipe] (must be WAVE, possibly multi-channel)
- ‘url’ (any URL type that is supported by “smart_open” library, e.g. http/https/s3/gcp/azure/etc.)
- ‘memory’ (any format, read from a binary string attached to ‘source’ member of AudioSource)
- type: str¶
- channels: List[int]¶
- source: Union[str, bytes]¶
- load_audio(offset=0.0, duration=None, force_opus_sampling_rate=None)[source]¶
Load the AudioSource (from files, commands, or URLs) with soundfile, accounting for many audio formats and multi-channel inputs. Returns numpy array with shapes: (n_samples,) for single-channel, (n_channels, n_samples) for multi-channel.
Note: The elements in the returned array are in the range [-1.0, 1.0] and are of dtype np.float32.
- Parameters
force_opus_sampling_rate (Optional[int]) – This parameter is only used when we detect an OPUS file. It will tell ffmpeg to resample OPUS to this sampling rate.
- Return type
ndarray
- __init__(type, channels, source)¶
- class lhotse.audio.Recording(id, sources, sampling_rate, num_samples, duration, channel_ids=None, transforms=None)[source]¶
The Recording manifest describes the recordings in a given corpus. It contains information about the recording, such as its path(s), duration, the number of samples, etc. It can represent multiple channels coming from one or more files.
This manifest does not specify any segmentation information or supervision such as the transcript or the speaker – we use SupervisionSegment for that.
Note that Recording can represent both a single utterance (e.g., in LibriSpeech) and a 1-hour session with multiple channels and speakers (e.g., in AMI). In the latter case, it is partitioned into data suitable for model training using Cut.
Hint
Lhotse reads audio recordings using pysoundfile and audioread, similarly to librosa, to support multiple audio formats. For OPUS files we require ffmpeg to be installed.
Hint
Since we support importing Kaldi data dirs, if wav.scp contains unix pipes, Recording will also handle them correctly.
Examples
A Recording can be simply created from a local audio file:
>>> from lhotse import RecordingSet, Recording, AudioSource
>>> recording = Recording.from_file('meeting.wav')
>>> recording
Recording(
    id='meeting',
    sources=[AudioSource(type='file', channels=[0], source='meeting.wav')],
    sampling_rate=16000,
    num_samples=57600000,
    duration=3600.0,
    transforms=None
)
This manifest can be easily converted to a Python dict and serialized to JSON/JSONL/YAML/etc:
>>> recording.to_dict()
{'id': 'meeting', 'sources': [{'type': 'file', 'channels': [0], 'source': 'meeting.wav'}], 'sampling_rate': 16000, 'num_samples': 57600000, 'duration': 3600.0}
Recordings can also be created programmatically, e.g. when they refer to URLs stored in S3 or somewhere else:
>>> s3_audio_files = ['s3://my-bucket/123-5678.flac', ...]
>>> recs = RecordingSet.from_recordings(
...     Recording(
...         id=url.split('/')[-1].replace('.flac', ''),
...         sources=[AudioSource(type='url', source=url, channels=[0])],
...         sampling_rate=16000,
...         num_samples=get_num_samples(url),
...         duration=get_duration(url)
...     )
...     for url in s3_audio_files
... )
It allows reading a subset of the audio samples as a numpy array:
>>> samples = recording.load_audio()
>>> assert samples.shape == (1, 16000)
>>> samples2 = recording.load_audio(offset=0.5)
>>> assert samples2.shape == (1, 8000)
- id: str¶
- sources: List[lhotse.audio.AudioSource]¶
- sampling_rate: int¶
- num_samples: int¶
- duration: float¶
- channel_ids: Optional[List[int]] = None¶
- transforms: Optional[List[Dict]] = None¶
- property num_channels¶
- static from_file(path, recording_id=None, relative_path_depth=None, force_opus_sampling_rate=None, force_read_audio=False)[source]¶
Read an audio file’s header and create the corresponding Recording. Suitable to use when each physical file represents a separate recording session.
Caution
If a recording session consists of multiple files (e.g. one per channel), it is advisable to create the Recording object manually, with each file represented as a separate AudioSource object.
- Parameters
path (Union[Path, str]) – Path to an audio file supported by libsoundfile (pysoundfile).
recording_id (Union[str, Callable[[Path], str], None]) – recording id; when not specified, we use the filename’s stem (“x.wav” -> “x”). It can be specified as a string or a function that takes the recording path and returns a string.
relative_path_depth (Optional[int]) – optional int specifying how many last parts of the file path should be retained in the AudioSource. By default writes the path as is.
force_opus_sampling_rate (Optional[int]) – when specified, this value will be used as the sampling rate instead of the one we read from the manifest. This is useful for OPUS files that always have 48kHz rate and need to be resampled to the real one – we will perform that operation “under-the-hood”. For non-OPUS files this input is undefined.
force_read_audio (bool) – Set it to True for audio files that do not have any metadata in their headers (e.g., “The People’s Speech” FLAC files).
- Return type
Recording
- Returns
a new Recording instance pointing to the audio file.
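The roles of recording_id and relative_path_depth can be sketched with the standard library alone. The helpers below (`resolve_recording_id`, `truncate_path`) are hypothetical names invented for this illustration and only mirror the behaviour described in the parameter list; they are not part of Lhotse.

```python
from pathlib import PurePosixPath
from typing import Callable, Optional, Union

RecordingId = Union[str, Callable[[PurePosixPath], str], None]


def resolve_recording_id(path: PurePosixPath, recording_id: RecordingId = None) -> str:
    """Hypothetical helper: pick the ID from a string, a callable, or the file stem."""
    if recording_id is None:
        return path.stem  # "x.wav" -> "x"
    if callable(recording_id):
        return recording_id(path)
    return recording_id


def truncate_path(path: PurePosixPath, relative_path_depth: Optional[int] = None) -> str:
    """Hypothetical helper: keep only the last `relative_path_depth` path parts."""
    if relative_path_depth is None or relative_path_depth <= 0:
        return str(path)  # write the path as is
    return "/".join(path.parts[-relative_path_depth:])


path = PurePosixPath("/data/corpus/session1/x.wav")
default_id = resolve_recording_id(path)
custom_id = resolve_recording_id(path, lambda p: f"{p.parent.name}-{p.stem}")
short_path = truncate_path(path, relative_path_depth=2)
```

Passing a callable is convenient when several files share the same stem in different directories, since the ID can then include a parent directory name.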
- static from_bytes(data, recording_id)[source]¶
Like Recording.from_file(), but creates a manifest for a byte string with raw encoded audio data. This data is first decoded to obtain info such as the sampling rate, number of channels, etc. Then, the binary data is attached to the manifest. Calling Recording.load_audio() does not perform any I/O and instead decodes the byte string contents in memory.
Note
Intended use of this method is for packing Recordings into archives where metadata and data should be available together (e.g., in WebDataset style tarballs).
Caution
Manifest created with this method cannot be stored as JSON because JSON doesn’t allow serializing binary data.
- Parameters
data (bytes) – bytes, byte string containing encoded audio contents.
recording_id (str) – recording id, unique string identifier.
- Return type
Recording
- Returns
a new Recording instance that owns the byte string data.
- move_to_memory(channels=None, offset=None, duration=None, format=None)[source]¶
Read audio data and return a copy of the manifest with binary data attached. Calling Recording.load_audio() on that copy will not trigger I/O.
If all arguments are left as defaults, we won’t decode the audio; instead, we attach the bytes read from disk or another source as-is. If channels, duration, or offset are specified, we’ll decode the audio and re-encode it into format before attaching. The default format is FLAC; other formats compatible with torchaudio.save are also accepted.
- Return type
Recording
- to_cut()[source]¶
Create a Cut out of this recording: a MonoCut or a MultiCut, depending on the number of channels.
- load_audio(channels=None, offset=0.0, duration=None)[source]¶
Read the audio samples from the underlying audio source (path, URL, unix pipe/command).
- Parameters
channels (Union[int, List[int], None]) – int or iterable of ints, a subset of channel IDs to read (reads all by default).
offset (float) – seconds, where to start reading the audio (at offset 0 by default). Note that it is only efficient for local filesystem files, i.e. URLs and commands will read all the samples first and discard the unneeded ones afterwards.
duration (Optional[float]) – seconds, indicates the total audio time to read (starting from offset).
- Return type
ndarray
- Returns
a numpy array of audio samples with shape (num_channels, num_samples).
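To make the relationship between offset, duration, and the returned number of samples concrete, here is a small sketch that converts seconds into a sample range. The function `sample_range` and its rounding rule are assumptions for illustration, not Lhotse's code.

```python
from typing import Optional, Tuple


def sample_range(
    offset: float, duration: Optional[float], sampling_rate: int, num_samples: int
) -> Tuple[int, int]:
    """Hypothetical helper: map (offset, duration) in seconds to [start, end) samples."""
    start = round(offset * sampling_rate)
    if duration is None:
        # No duration given: read until the end of the recording.
        end = num_samples
    else:
        end = min(num_samples, start + round(duration * sampling_rate))
    return start, end


# A 1-second recording at 16 kHz, read from 0.5 s to the end.
start, end = sample_range(offset=0.5, duration=None, sampling_rate=16000, num_samples=16000)
num_read = end - start
```

Under these assumptions, reading a 1-second, 16 kHz recording with `offset=0.5` covers 8000 samples per channel.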
- perturb_speed(factor, affix_id=True)[source]¶
Return a new Recording that will lazily perturb the speed while loading audio. The num_samples and duration fields are updated to reflect the shrinking/extending effect of speed.
- Parameters
factor (float) – The speed will be adjusted this many times (e.g. factor=1.1 means 1.1x faster).
affix_id (bool) – When true, we will modify the Recording.id field by affixing it with “_sp{factor}”.
- Return type
Recording
- Returns
a modified copy of the current Recording.
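The effect of speed perturbation on the manifest fields can be sketched as follows. This is a simplified illustration: the helper names are hypothetical, and plain `round()` is an assumption that may differ from the rounding rule Lhotse actually applies.

```python
from typing import Tuple


def perturb_speed_metadata(
    num_samples: int, sampling_rate: int, factor: float
) -> Tuple[int, float]:
    """Hypothetical helper: new (num_samples, duration) after a speed change.

    Playing audio `factor` times faster shrinks it by the same factor.
    """
    new_num_samples = round(num_samples / factor)
    new_duration = new_num_samples / sampling_rate
    return new_num_samples, new_duration


def affixed_id(recording_id: str, factor: float, affix_id: bool = True) -> str:
    """Hypothetical helper: append the "_sp{factor}" suffix described above."""
    return f"{recording_id}_sp{factor}" if affix_id else recording_id


faster = perturb_speed_metadata(num_samples=16000, sampling_rate=16000, factor=2.0)
slower = perturb_speed_metadata(num_samples=16000, sampling_rate=16000, factor=0.5)
new_id = affixed_id("meeting", 1.1)
```

Affixing the ID keeps perturbed copies distinguishable from the original when both are stored in the same RecordingSet.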
- perturb_tempo(factor, affix_id=True)[source]¶
Return a new Recording that will lazily perturb the tempo while loading audio.
Compared to speed perturbation, tempo preserves pitch. The num_samples and duration fields are updated to reflect the shrinking/extending effect of tempo.
- Parameters
factor (float) – The tempo will be adjusted this many times (e.g. factor=1.1 means 1.1x faster).
affix_id (bool) – When true, we will modify the Recording.id field by affixing it with “_tp{factor}”.
- Return type
Recording
- Returns
a modified copy of the current Recording.
- perturb_volume(factor, affix_id=True)[source]¶
Return a new Recording that will lazily perturb the volume while loading audio.
- Parameters
factor (float) – The volume scale to be applied (e.g. factor=1.1 means 1.1x louder).
affix_id (bool) – When true, we will modify the Recording.id field by affixing it with “_vp{factor}”.
- Return type
Recording
- Returns
a modified copy of the current Recording.
- reverb_rir(rir_recording=None, normalize_output=True, early_only=False, affix_id=True, rir_channels=None)[source]¶
Return a new Recording that will lazily apply reverberation based on provided impulse response while loading audio. If no impulse response is provided, we will generate an RIR using a fast random generator (https://arxiv.org/abs/2208.04101).
- Parameters
rir_recording (Optional[Recording]) – The impulse response to be used.
normalize_output (bool) – When true, output will be normalized to have the same energy as the input.
early_only (bool) – When true, only the early reflections (first 50 ms) will be used.
affix_id (bool) – When true, we will modify the Recording.id field by affixing it with “_rvb”.
rir_channels (Optional[List[int]]) – The channels of the impulse response to be used (in case of multi-channel impulse responses). By default, only the first channel is used. If no RIR is provided, we will generate one with as many channels as this argument specifies.
- Return type
Recording
- Returns
the perturbed Recording.
- resample(sampling_rate)[source]¶
Return a new Recording that will be lazily resampled while loading audio.
- Parameters
sampling_rate (int) – The new sampling rate.
- Return type
Recording
- Returns
A resampled Recording.
- __init__(id, sources, sampling_rate, num_samples, duration, channel_ids=None, transforms=None)¶
- class lhotse.audio.RecordingSet(recordings=None)[source]¶
RecordingSet represents a collection of recordings, indexed by recording IDs. It does not contain any annotation such as the transcript or the speaker identity – just the information needed to retrieve a recording such as its path, URL, number of channels, and some recording metadata (duration, number of samples).
It also supports (de)serialization to/from YAML/JSON/etc. and takes care of mapping between rich Python classes and YAML/JSON/etc. primitives during conversion.
When coming from Kaldi, think of it as wav.scp on steroids: RecordingSet also has the information from reco2dur and reco2num_samples, is able to represent multi-channel recordings and read a specified subset of channels, and supports reading audio files directly, via a unix pipe, or downloading them on-the-fly from a URL (HTTPS/S3/Azure/GCP/etc.).
Examples:
RecordingSet can be created from an iterable of Recording objects:
>>> from lhotse import Recording, RecordingSet
>>> audio_paths = ['123-5678.wav', ...]
>>> recs = RecordingSet.from_recordings(Recording.from_file(p) for p in audio_paths)
As well as from a directory, which will be scanned recursively for files with parallel processing:
>>> recs2 = RecordingSet.from_dir('/data/audio', pattern='*.flac', num_jobs=4)
It behaves similarly to a dict:
>>> '123-5678' in recs
True
>>> recording = recs['123-5678']
>>> for recording in recs:
...     pass
>>> len(recs)
127
It also provides some utilities for I/O:
>>> recs.to_file('recordings.jsonl')
>>> recs.to_file('recordings.json.gz')  # auto-compression
>>> recs2 = RecordingSet.from_file('recordings.jsonl')
Manipulation:
>>> longer_than_5s = recs.filter(lambda r: r.duration > 5)
>>> first_100 = recs.subset(first=100)
>>> split_into_4 = recs.split(num_splits=4)
>>> shuffled = recs.shuffle()
And lazy data augmentation/transformation, which requires adjusting some information in the manifest (e.g., num_samples or duration). Note that in the following examples, the audio is untouched – the operations are stored in the manifest, and executed upon reading the audio:
>>> recs_sp = recs.perturb_speed(factor=1.1)
>>> recs_vp = recs.perturb_volume(factor=2.)
>>> recs_rvb = recs.reverb_rir(rir_recs)
>>> recs_24k = recs.resample(24000)
- property data: Union[Dict[str, lhotse.audio.Recording], Iterable[lhotse.audio.Recording]]¶
Alias property for self.recordings
- property ids: Iterable[str]¶
- Return type
Iterable[str]
- static from_items(recordings)¶
Function to be implemented by every sub-class of this mixin. It’s expected to create a sub-class instance out of an iterable of items that are held by the sub-class (e.g., CutSet.from_items(iterable_of_cuts)).
- Return type
RecordingSet
- static from_dir(path, pattern, num_jobs=1, force_opus_sampling_rate=None, recording_id=None)[source]¶
Recursively scan a directory path for audio files that match the given pattern and create a RecordingSet manifest for them. Suitable to use when each physical file represents a separate recording session.
Caution
If a recording session consists of multiple files (e.g. one per channel), it is advisable to create each Recording object manually, with each file represented as a separate AudioSource object, and then a RecordingSet that contains all the recordings.
- Parameters
path (Union[Path, str]) – Path to a directory of audio files (possibly with sub-directories).
pattern (str) – A bash-like pattern specifying allowed filenames, e.g. *.wav or session1-*.flac.
num_jobs (int) – The number of parallel workers for reading audio files to get their metadata.
force_opus_sampling_rate (Optional[int]) – when specified, this value will be used as the sampling rate instead of the one we read from the manifest. This is useful for OPUS files that always have 48kHz rate and need to be resampled to the real one – we will perform that operation “under-the-hood”. For non-OPUS files this input does nothing.
recording_id (Optional[Callable[[Path], str]]) – A function which takes the audio file path and returns the recording ID. If not specified, the filename will be used as the recording ID.
- Returns
a new RecordingSet with one Recording for each matched audio file.
- split(num_splits, shuffle=False, drop_last=False)[source]¶
Split the RecordingSet into num_splits pieces of equal size.
- Parameters
num_splits (int) – Requested number of splits.
shuffle (bool) – Optionally shuffle the recordings order first.
drop_last (bool) – determines how to handle splitting when len(seq) is not divisible by num_splits. When False (default), the splits might have unequal lengths. When True, it may discard the last element in some splits to ensure they are equally long.
- Return type
List[RecordingSet]
- Returns
A list of RecordingSet pieces.
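The difference between the two drop_last modes can be shown on a plain list. The function below is a hypothetical sketch, not Lhotse's implementation: in particular, which items end up discarded when drop_last is set is an assumption (here, the trailing remainder).

```python
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def split_sequence(seq: Iterable[T], num_splits: int, drop_last: bool = False) -> List[List[T]]:
    """Hypothetical helper: split `seq` into `num_splits` contiguous pieces."""
    items = list(seq)
    base, remainder = divmod(len(items), num_splits)
    splits: List[List[T]] = []
    start = 0
    for i in range(num_splits):
        # Without drop_last, the first `remainder` splits get one extra item.
        size = base if drop_last else base + (1 if i < remainder else 0)
        splits.append(items[start:start + size])
        start += size
    return splits


uneven = split_sequence(range(10), num_splits=3)
even = split_sequence(range(10), num_splits=3, drop_last=True)
```

Equal-length splits are mostly useful when each split feeds a separate worker that must perform the same number of steps.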
- split_lazy(output_dir, chunk_size, prefix='')[source]¶
Splits a manifest (either lazily or eagerly opened) into chunks, each with chunk_size items (except for the last one, typically).
In order to be memory efficient, this implementation saves each chunk to disk in a .jsonl.gz format as the input manifest is sampled.
Note
For lowest memory usage, use load_manifest_lazy to open the input manifest for this method.
- Parameters
output_dir (Union[Path, str]) – directory where the split manifests are saved. Each manifest is saved at: {output_dir}/{prefix}.{split_idx}.jsonl.gz
chunk_size (int) – the number of items in each chunk.
prefix (str) – the prefix of each manifest.
- Return type
List[RecordingSet]
- Returns
a list of lazily opened chunk manifests.
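The chunk-by-chunk writing strategy can be sketched with the standard library. This is an illustration under stated assumptions rather than Lhotse's code: `split_to_chunks` is a hypothetical name, items are plain dicts, split indices start at 0, and the function returns file paths instead of lazily opened manifests.

```python
import gzip
import json
from itertools import islice
from pathlib import Path
from typing import Any, Dict, Iterable, List, Union


def split_to_chunks(
    items: Iterable[Dict[str, Any]],
    output_dir: Union[Path, str],
    chunk_size: int,
    prefix: str = "",
) -> List[Path]:
    """Hypothetical helper: stream `items` into {prefix}.{idx}.jsonl.gz files."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    iterator = iter(items)
    paths: List[Path] = []
    split_idx = 0  # zero-based numbering is an assumption of this sketch
    while True:
        # Only one chunk is held in memory at any time.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            break
        path = out / f"{prefix}.{split_idx}.jsonl.gz"
        with gzip.open(path, "wt", encoding="utf-8") as f:
            for item in chunk:
                f.write(json.dumps(item) + "\n")
        paths.append(path)
        split_idx += 1
    return paths
```

Because the input is consumed through an iterator, passing a generator or a lazily opened manifest keeps memory usage bounded by chunk_size.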
- subset(first=None, last=None)[source]¶
Return a new RecordingSet according to the selected subset criterion. Only a single argument to subset is supported at this time.
- Parameters
first (Optional[int]) – int, the number of first recordings to keep.
last (Optional[int]) – int, the number of last recordings to keep.
- Return type
RecordingSet
- Returns
a new RecordingSet with the subset results.
- load_audio(recording_id, channels=None, offset_seconds=0.0, duration_seconds=None)[source]¶
- Return type
ndarray
- perturb_speed(factor, affix_id=True)[source]¶
Return a new RecordingSet that will lazily perturb the speed while loading audio. The num_samples and duration fields are updated to reflect the shrinking/extending effect of speed.
- Parameters
factor (float) – The speed will be adjusted this many times (e.g. factor=1.1 means 1.1x faster).
affix_id (bool) – When true, we will modify the Recording.id field by affixing it with “_sp{factor}”.
- Return type
RecordingSet
- Returns
a RecordingSet containing the perturbed Recording objects.
- perturb_tempo(factor, affix_id=True)[source]¶
Return a new RecordingSet that will lazily perturb the tempo while loading audio. The num_samples and duration fields are updated to reflect the shrinking/extending effect of tempo.
- Parameters
factor (float) – The tempo will be adjusted this many times (e.g. factor=1.1 means 1.1x faster).
affix_id (bool) – When true, we will modify the Recording.id field by affixing it with “_tp{factor}”.
- Return type
RecordingSet
- Returns
a RecordingSet containing the perturbed Recording objects.
- perturb_volume(factor, affix_id=True)[source]¶
Return a new RecordingSet that will lazily perturb the volume while loading audio.
- Parameters
factor (float) – The volume scale to be applied (e.g. factor=1.1 means 1.1x louder).
affix_id (bool) – When true, we will modify the Recording.id field by affixing it with “_vp{factor}”.
- Return type
RecordingSet
- Returns
a RecordingSet containing the perturbed Recording objects.
- reverb_rir(rir_recordings=None, normalize_output=True, early_only=False, affix_id=True, rir_channels=[0])[source]¶
Return a new RecordingSet that will lazily apply reverberation based on provided impulse responses while loading audio. If no rir_recordings are provided, we will generate a set of impulse responses using a fast random generator (https://arxiv.org/abs/2208.04101).
- Parameters
rir_recordings (Optional[RecordingSet]) – The impulse responses to be used.
normalize_output (bool) – When true, output will be normalized to have the same energy as the input.
early_only (bool) – When true, only the early reflections (first 50 ms) will be used.
affix_id (bool) – When true, we will modify the Recording.id field by affixing it with “_rvb”.
rir_channels (List[int]) – The channels to be used for the RIRs (if multi-channel). Uses first channel by default. If no RIR is provided, we will generate one with as many channels as this argument specifies.
- Return type
RecordingSet
- Returns
a RecordingSet containing the perturbed Recording objects.
- resample(sampling_rate)[source]¶
Apply resampling to all recordings in the RecordingSet and return a new RecordingSet.
- Parameters
sampling_rate (int) – The new sampling rate.
- Return type
RecordingSet
- Returns
a new RecordingSet with lazily resampled Recording objects.
- filter(predicate)¶
Return a new manifest containing only the items that satisfy predicate. If the manifest is lazy, the filtering will also be applied lazily.
- Parameters
predicate (Callable[[~T], bool]) – a function that takes an item (here, a Recording) as an argument and returns bool.
- Returns
a filtered manifest.
- classmethod from_file(path)¶
- Return type
Any
- classmethod from_json(path)¶
- Return type
Any
- classmethod from_jsonl(path)¶
- Return type
Any
- classmethod from_jsonl_lazy(path)¶
Read a JSONL manifest in a lazy manner, which opens the file but does not read it immediately. It is only suitable for sequential reads and iteration.
Warning
Opening the manifest in this way might cause some methods that rely on random access to fail.
- Return type
Any
- classmethod from_yaml(path)¶
- Return type
Any
- property is_lazy: bool¶
Indicates whether this manifest was opened in lazy (read-on-the-fly) mode or not.
- Return type
bool
- map(transform_fn)¶
Apply transform_fn to each item in this manifest and return a new manifest. If the manifest is opened lazily, the transform is also applied lazily.
- Parameters
transform_fn (Callable[[~T], ~T]) – A callable (function) that accepts a single item instance and returns a new (or the same) instance of the same type. E.g. with CutSet, callable accepts Cut and returns also Cut.
- Returns
a new manifest with transformed items (e.g., a new CutSet with transformed cuts).
- classmethod mux(*manifests, stop_early=False, weights=None, seed=0)¶
Merges multiple manifest iterables into a new iterable by lazily multiplexing them during iteration time. If one of the iterables is exhausted before the others, we will keep iterating until all iterables are exhausted. This behavior can be changed with the stop_early parameter.
- Parameters
manifests – iterables to be multiplexed. They can be either lazy or eager, but the resulting manifest will always be lazy.
stop_early (bool) – should we stop the iteration as soon as we exhaust one of the manifests.
weights (Optional[List[Union[int, float]]]) – an optional weight for each iterable, affects the probability of it being sampled. The weights are uniform by default. If lengths are known, it makes sense to pass them here for uniform distribution of items in the expectation.
seed (int) – the random seed, ensures deterministic order across multiple iterations.
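The multiplexing behaviour described above can be sketched as a generator over plain iterables. This is a simplified illustration, not Lhotse's implementation; in particular, the exact moment at which stop_early ends the iteration is an assumption of this sketch.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def mux(
    *iterables: Iterable[T],
    stop_early: bool = False,
    weights: Optional[List[float]] = None,
    seed: int = 0,
) -> Iterator[T]:
    """Sketch: lazily interleave items by sampling a source for each step."""
    rng = random.Random(seed)  # a fixed seed makes the order reproducible
    iterators = [iter(it) for it in iterables]
    source_weights = list(weights) if weights is not None else [1.0] * len(iterators)
    active = list(range(len(iterators)))
    while active:
        idx = rng.choices(active, weights=[source_weights[i] for i in active], k=1)[0]
        try:
            yield next(iterators[idx])
        except StopIteration:
            if stop_early:
                return  # stop as soon as one source is found to be exhausted
            active.remove(idx)  # otherwise keep sampling from the remaining sources


a, b = [1, 2, 3], [10, 20]
mixed = list(mux(a, b, seed=0))
early = list(mux(a, b, stop_early=True, seed=0))
```

Passing the source lengths as weights makes longer sources proportionally more likely to be sampled, so that sources tend to run out at around the same time.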
- classmethod open_writer(path, overwrite=True)¶
Open a sequential writer that allows storing the manifests one by one, without the necessity of storing the whole manifest set in-memory. Supports writing to JSONL format (.jsonl), with optional gzip compression (.jsonl.gz).
Note
When path is None, we will return an InMemoryWriter instead, which has the same API but stores the manifests in memory. It is convenient when you want to make disk saving optional.
Example:
>>> from lhotse import RecordingSet
>>> recordings = [...]
>>> with RecordingSet.open_writer('recordings.jsonl.gz') as writer:
...     for recording in recordings:
...         writer.write(recording)
This writer can be useful for continuing to write files that were previously stopped – it will open the existing file and scan it for item IDs to skip writing them later. It can also be queried for existing IDs so that the user code may skip preparing the corresponding manifests.
Example:
>>> from pathlib import Path
>>> from lhotse import RecordingSet, Recording
>>> with RecordingSet.open_writer('recordings.jsonl.gz', overwrite=False) as writer:
...     for path in Path('.').rglob('*.wav'):
...         recording_id = path.stem
...         if writer.contains(recording_id):
...             # Item already written previously - skip processing.
...             continue
...         # Item doesn't exist yet - run extra work to prepare the manifest
...         # and store it.
...         recording = Recording.from_file(path, recording_id=recording_id)
...         writer.write(recording)
- Return type
Union[SequentialJsonlWriter, InMemoryWriter]
- repeat(times=None, preserve_id=False)¶
Return a new, lazily evaluated manifest that iterates over the original elements times number of times.
- Parameters
times (Optional[int]) – how many times to repeat (infinite by default).
preserve_id (bool) – when True, we won’t update the element ID with repeat number.
- Returns
a repeated manifest.
- shuffle(rng=None, buffer_size=10000)¶
Shuffles the elements and returns a shuffled variant of self. If the manifest is opened lazily, performs shuffling on-the-fly with a fixed buffer size.
- Parameters
rng (Optional[Random]) – an optional instance of random.Random for precise control of randomness.
- Returns
a shuffled copy of self, or a manifest that is shuffled lazily.
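On-the-fly shuffling with a fixed buffer can be sketched as below. This shows the general buffered-shuffle technique and is not taken from Lhotse's source; the function name `buffered_shuffle` is hypothetical.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def buffered_shuffle(
    items: Iterable[T], buffer_size: int = 10000, rng: Optional[random.Random] = None
) -> Iterator[T]:
    """Sketch: approximate shuffling of a stream using at most `buffer_size` items."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    if rng is None:
        rng = random.Random()
    buffer: List[T] = []
    for item in items:
        if len(buffer) < buffer_size:
            # Fill the buffer before emitting anything.
            buffer.append(item)
            continue
        # Emit a random buffered item and put the new one in its place.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # The input is exhausted: flush what is left in random order.
    rng.shuffle(buffer)
    yield from buffer


shuffled = list(buffered_shuffle(range(100), buffer_size=10, rng=random.Random(0)))
```

A small buffer keeps memory usage low but only mixes items that are close to each other in the input, so the result is less random than a full shuffle.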
- to_eager()¶
Evaluates all lazy operations on this manifest, if any, and returns a copy that keeps all items in memory. If the manifest was “eager” already, this is a no-op and won’t copy anything.
- to_file(path)¶
- Return type
None
- to_json(path)¶
- Return type
None
- to_jsonl(path)¶
- Return type
None
- to_yaml(path)¶
- Return type
None
- class lhotse.audio.AudioMixer(base_audio, sampling_rate, reference_energy=None)[source]¶
Utility class to mix multiple waveforms into a single one. It should be instantiated separately for each mixing session (i.e. each MixedCut will create a separate AudioMixer to mix its tracks). It is initialized with a numpy array of audio samples (typically float32 in [-1, 1] range) that represents the “reference” signal for the mix. Other signals can be mixed to it with different time offsets and SNRs using the add_to_mix method. The time offset is relative to the start of the reference signal (only positive values are supported). The SNR is relative to the energy of the signal used to initialize the AudioMixer.
Note
Both single-channel and multi-channel signals are supported as reference and added signals. The only requirement is that when mixing 2 multi-channel signals, they must have the same number of channels.
Note
When the AudioMixer contains multi-channel tracks, 2 types of mixed signals can be generated:
- mixed_audio mixes each channel independently, and returns a multi-channel signal. If there is a mono track, it is added to all the channels.
- mixed_mono_audio mixes all channels together, and returns a single-channel signal.
- __init__(base_audio, sampling_rate, reference_energy=None)[source]¶
AudioMixer’s constructor.
- Parameters
base_audio (ndarray) – A numpy array with the audio samples for the base signal (all the other signals will be mixed to it).
sampling_rate (int) – Sampling rate of the audio.
reference_energy (Optional[float]) – Optionally pass a reference energy value to compute SNRs against. This might be required when base_audio corresponds to zero-padding.
- property num_samples_total: int¶
- Return type
int
- property unmixed_audio: List[numpy.ndarray]¶
Return a list of numpy arrays with the shape (C, num_samples), where each track is zero padded and scaled adequately to the offsets and SNR used in add_to_mix call.
- Return type
List[ndarray]
- property mixed_audio: numpy.ndarray¶
Return a numpy ndarray with the shape (num_channels, num_samples) - a mix of the tracks supplied with add_to_mix calls.
- Return type
ndarray
- property mixed_mono_audio: numpy.ndarray¶
Return a numpy ndarray with the shape (1, num_samples) - a mix of the tracks supplied with add_to_mix calls.
- Return type
ndarray
- add_to_mix(audio, snr=None, offset=0.0)[source]¶
Add audio of a new track into the mix.
- Parameters
audio (ndarray) – An array of audio samples to be mixed in.
snr (Optional[float]) – Signal-to-noise ratio, assuming audio represents noise (positive SNR - lower audio energy, negative SNR - higher audio energy).
offset (float) – How many seconds to shift audio in time. For mixing, the signal will be padded before the start with low energy values.
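How an SNR value translates into a gain for the added track can be sketched with single-channel Python lists. This is an illustration under stated assumptions, not the AudioMixer implementation: energy is taken as the mean of squared samples, padding uses exact zeros rather than low energy values, and the function names are hypothetical.

```python
import math
from typing import List, Optional


def energy(samples: List[float]) -> float:
    """Mean of squared samples (the energy definition assumed in this sketch)."""
    return sum(x * x for x in samples) / len(samples)


def snr_gain(reference_energy: float, added_energy: float, snr_db: float) -> float:
    """Gain that places the added signal `snr_db` decibels below the reference."""
    target_energy = reference_energy / (10 ** (snr_db / 10))
    return math.sqrt(target_energy / added_energy)


def mix(
    reference: List[float],
    audio: List[float],
    sampling_rate: int,
    snr: Optional[float] = None,
    offset: float = 0.0,
) -> List[float]:
    """Sketch: add `audio` to `reference`, starting `offset` seconds in."""
    gain = 1.0 if snr is None else snr_gain(energy(reference), energy(audio), snr)
    start = round(offset * sampling_rate)
    length = max(len(reference), start + len(audio))
    # Extend the reference with zeros if the added track ends later.
    mixed = list(reference) + [0.0] * (length - len(reference))
    for i, x in enumerate(audio):
        mixed[start + i] += gain * x
    return mixed


reference = [0.5, -0.5, 0.5, -0.5]
noise = [0.1, 0.1]
gain = snr_gain(energy(reference), energy(noise), snr_db=10.0)
achieved_snr = 10 * math.log10(energy(reference) / energy([gain * x for x in noise]))
shifted = mix(reference, noise, sampling_rate=2, snr=None, offset=1.5)
```

A positive SNR scales the added track so that its energy is lower than the reference energy, which matches the sign convention in the parameter description.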
- lhotse.audio.read_audio(path_or_fd, offset=0.0, duration=None, force_opus_sampling_rate=None)[source]¶
- Return type
Tuple[ndarray, int]
- class lhotse.audio.AudioBackend[source]¶
Internal Lhotse abstraction. An AudioBackend defines three methods: one for reading audio, and two filters that help determine if it should be used.
handles_special_case means this backend should be exclusively used for a given type of input path/file.
is_applicable means this backend most likely can be used for a given type of input path/file, but it may also fail. Its purpose is more to filter out formats that definitely are not supported.
- class lhotse.audio.Sph2pipeSubprocessBackend[source]¶
- class lhotse.audio.FfmpegTorchaudioStreamerBackend[source]¶
- class lhotse.audio.TorchaudioDefaultBackend[source]¶
- read_audio(path_or_fd, offset=0.0, duration=None, force_opus_sampling_rate=None)[source]¶
- Return type
Tuple[ndarray, int]
- handles_special_case(path_or_fd)¶
- Return type
bool
- is_applicable(path_or_fd)¶
- Return type
bool
- class lhotse.audio.LibsndfileBackend[source]¶
A backend that uses PySoundFile.
Note
PySoundFile has issues on MacOS because of the way its CFFI bindings are implemented. For now, we disable it on this platform. See: https://github.com/bastibe/python-soundfile/issues/331
- class lhotse.audio.AudioreadBackend[source]¶
- read_audio(path_or_fd, offset=0.0, duration=None, force_opus_sampling_rate=None)[source]¶
- Return type
Tuple[ndarray, int]
- handles_special_case(path_or_fd)¶
- Return type
bool
- is_applicable(path_or_fd)¶
- Return type
bool
- class lhotse.audio.CompositeAudioBackend(backends)[source]¶
- read_audio(path_or_fd, offset=0.0, duration=None, force_opus_sampling_rate=None)[source]¶
- Return type
Tuple[ndarray, int]
- handles_special_case(path_or_fd)¶
- Return type
bool
- is_applicable(path_or_fd)¶
- Return type
bool
- lhotse.audio.get_default_audio_backend()[source]¶
Return a backend that can be used to read all audio formats supported by Lhotse.
It first looks for special cases that need very specific handling (such as: opus, sphere/shorten, in-memory buffers) and tries to match them against relevant audio backends.
Then, it tries to use several audio loading libraries (torchaudio, soundfile, audioread). In case the first fails, it tries the next one, and so on.
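The two-stage selection described above (special cases first, then a chain of fallbacks) can be sketched with toy classes. Everything below is hypothetical and stands in for the real backends: the class names, the string return values, and the file-extension checks are assumptions for illustration only.

```python
from typing import List


class LoadingError(Exception):
    """Raised in this sketch when no backend manages to read the input."""


class Backend:
    def handles_special_case(self, path: str) -> bool:
        return False

    def is_applicable(self, path: str) -> bool:
        return True

    def read_audio(self, path: str) -> str:
        raise NotImplementedError


class OpusBackend(Backend):
    """Toy backend that claims exclusive handling of .opus inputs."""

    def handles_special_case(self, path: str) -> bool:
        return path.endswith(".opus")

    def is_applicable(self, path: str) -> bool:
        return path.endswith(".opus")

    def read_audio(self, path: str) -> str:
        return f"opus:{path}"


class FlakyBackend(Backend):
    """Toy backend that looks applicable but always fails."""

    def read_audio(self, path: str) -> str:
        raise RuntimeError("decoder failed")


class FallbackBackend(Backend):
    """Toy backend that rejects one format up-front and reads everything else."""

    def is_applicable(self, path: str) -> bool:
        return not path.endswith(".xyz")

    def read_audio(self, path: str) -> str:
        return f"fallback:{path}"


class CompositeBackend(Backend):
    def __init__(self, backends: List[Backend]) -> None:
        self.backends = list(backends)

    def read_audio(self, path: str) -> str:
        # Stage 1: a backend that handles a special case is used exclusively.
        for backend in self.backends:
            if backend.handles_special_case(path):
                return backend.read_audio(path)
        # Stage 2: try applicable backends in order until one succeeds.
        errors = []
        for backend in self.backends:
            if not backend.is_applicable(path):
                continue
            try:
                return backend.read_audio(path)
            except Exception as e:
                errors.append(f"{type(backend).__name__}: {e}")
        raise LoadingError(f"No backend could read {path}: {errors}")


composite = CompositeBackend([OpusBackend(), FlakyBackend(), FallbackBackend()])
```

In this sketch `composite.read_audio("a.wav")` skips the OPUS backend, recovers from the failing backend, and is served by the fallback, while an input that no backend can read raises `LoadingError` with the collected error messages.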
- class lhotse.audio.LibsndfileCompatibleAudioInfo(channels, frames, samplerate, duration)[source]¶
- property channels¶
Alias for field number 0
- property frames¶
Alias for field number 1
- property samplerate¶
Alias for field number 2
- property duration¶
Alias for field number 3
- count(value, /)¶
Return number of occurrences of value.
- index(value, start=0, stop=9223372036854775807, /)¶
Return first index of value.
Raises ValueError if the value is not present.
- lhotse.audio.torchaudio_supports_ffmpeg()[source]¶
Returns True when torchaudio version is at least 0.12.0, which has support for FFMPEG streamer API.
- Return type
bool
- lhotse.audio.torchaudio_soundfile_supports_format()[source]¶
Returns True when torchaudio version is at least 0.9.0, which has support for format keyword arg in torchaudio.save().
- Return type
bool
- lhotse.audio.torchaudio_info(path_or_fileobj)[source]¶
Return an audio info data structure that’s a compatible subset of pysoundfile.info() that we need to create a Recording manifest.
- Return type
LibsndfileCompatibleAudioInfo
- lhotse.audio.torchaudio_load(path_or_fd, offset=0, duration=None)[source]¶
- Return type
Tuple[ndarray, int]
- lhotse.audio.torchaudio_ffmpeg_load(path_or_fileobj, offset=0, duration=None)[source]¶
- Return type
Tuple[ndarray, int]
- lhotse.audio.soundfile_load(path_or_fd, offset=0, duration=None)[source]¶
- Return type
Tuple[ndarray, int]
- lhotse.audio.audioread_info(path)[source]¶
Return an audio info data structure that’s a compatible subset of pysoundfile.info() that we need to create a Recording manifest.
- Return type
LibsndfileCompatibleAudioInfo
- lhotse.audio.audioread_load(path_or_file, offset=0.0, duration=None, dtype=<class 'numpy.float32'>)[source]¶
Load an audio buffer using audioread. This loads one block at a time, and then concatenates the results.
This function is based on librosa: https://github.com/librosa/librosa/blob/main/librosa/core/audio.py#L180
- lhotse.audio.assert_and_maybe_fix_num_samples(audio, offset, duration, recording)[source]¶
- Return type
ndarray
- lhotse.audio.read_opus(path, offset=0.0, duration=None, force_opus_sampling_rate=None)[source]¶
Reads OPUS files either using torchaudio or ffmpeg. Torchaudio is faster, but if it is unavailable for some reason, we fall back to a slower ffmpeg-based implementation.
- Return type
Tuple[ndarray, int]
- Returns
a tuple of audio samples and the sampling rate.
- lhotse.audio.read_opus_torchaudio(path, offset=0.0, duration=None, force_opus_sampling_rate=None)[source]¶
Reads OPUS files using torchaudio. This is just running torchaudio.load(), but we take care of extra resampling if needed.
- Return type
Tuple[ndarray, int]
- Returns
a tuple of audio samples and the sampling rate.
- lhotse.audio.read_opus_ffmpeg(path, offset=0.0, duration=None, force_opus_sampling_rate=None)[source]¶
Reads OPUS files using ffmpeg in a shell subprocess. Unlike audioread, correctly supports offsets and durations for reading short chunks. Optionally, we can force ffmpeg to resample to the true sampling rate (if we know it up-front).
- Return type
Tuple[ndarray, int]
- Returns
a tuple of audio samples and the sampling rate.
- lhotse.audio.read_sph(sph_path, offset=0.0, duration=None)[source]¶
Reads SPH files using sph2pipe in a shell subprocess. Unlike audioread, correctly supports offsets and durations for reading short chunks.
- Return type
Tuple[ndarray, int]
- Returns
a tuple of audio samples and the sampling rate.
- exception lhotse.audio.AudioLoadingError[source]¶
- __init__(*args, **kwargs)¶
- args¶
- with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception lhotse.audio.DurationMismatchError[source]¶
- __init__(*args, **kwargs)¶
- args¶
- with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- lhotse.audio.suppress_audio_loading_errors(enabled=True)[source]¶
Context manager that suppresses errors related to audio loading. Emits a warning to the console.
- lhotse.audio.null_result_on_audio_loading_error(func)[source]¶
This is a decorator that makes a function return None when reading audio with Lhotse failed.
Example:
>>> @null_result_on_audio_loading_error
... def func_loading_audio(rec):
...     audio = rec.load_audio()  # if this fails, will return None instead
...     return other_func(audio)
Another example:
>>> # crashes on loading audio
>>> audio = load_audio(cut)
>>> # does not crash on loading audio, return None instead
>>> maybe_audio: Optional = null_result_on_audio_loading_error(load_audio)(cut)
- Return type
Callable
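A decorator with this behaviour can be sketched in a few lines. The version below is an illustration under stated assumptions, not Lhotse's code: AudioLoadingError is redefined locally instead of being imported, and null_on_loading_error and load are hypothetical names.

```python
import functools
from typing import Callable, Optional


class AudioLoadingError(Exception):
    """Local stand-in for an audio loading failure (not imported from Lhotse)."""


def null_on_loading_error(func: Callable) -> Callable:
    """Wrap ``func`` so that an AudioLoadingError yields None instead of raising."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs) -> Optional[object]:
        try:
            return func(*args, **kwargs)
        except AudioLoadingError:
            return None

    return wrapper


@null_on_loading_error
def load(path: str) -> list:
    # Toy loader: pretend that files ending with ".broken" cannot be read.
    if path.endswith(".broken"):
        raise AudioLoadingError(path)
    return [0.0, 0.5]


ok = load("a.wav")
failed = load("a.broken")
```

Callers then need to check for None explicitly, which is the price of not crashing on a single unreadable file.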
Supervision manifests¶
Data structures used for describing supervisions in a dataset.
- class lhotse.supervision.AlignmentItem(symbol: str, start: float, duration: float, score: Optional[float] = None)[source]¶
This class contains an alignment item, for example a word, along with its start time (w.r.t. the start of recording) and duration. It can potentially be used to store other kinds of alignment items, such as subwords, pdfid’s etc.
- property symbol¶
Alias for field number 0
- property start¶
Alias for field number 1
- property duration¶
Alias for field number 2
- property score¶
Alias for field number 3
- property end: float¶
- Return type
float
- with_offset(offset)[source]¶
Return an identical AlignmentItem, but with the offset added to the start field.
- Return type
- perturb_speed(factor, sampling_rate)[source]¶
Return an AlignmentItem that has time boundaries matching the recording/cut perturbed with the same factor. See SupervisionSegment.perturb_speed() for details.
- Return type
- trim(end, start=0)[source]¶
See SupervisionSegment.trim().
- Return type
- transform(transform_fn)[source]¶
Perform specified transformation on the alignment content.
- Return type
- count(value, /)¶
Return number of occurrences of value.
- index(value, start=0, stop=9223372036854775807, /)¶
Return first index of value.
Raises ValueError if the value is not present.
- class lhotse.supervision.SupervisionSegment(id, recording_id, start, duration, channel=0, text=None, language=None, speaker=None, gender=None, custom=None, alignment=None)[source]¶
SupervisionSegment represents a time interval (segment) annotated with some supervision labels and/or metadata, such as the transcription, the speaker identity, the language, etc.
Each supervision has a unique id and always refers to a specific recording (via recording_id) and one or more channels (by default, 0). Note that multiple channels of the recording may share the same supervision, in which case the channel field will be a list of integers.
It’s also characterized by the start time (relative to the beginning of a Recording or a Cut) and a duration, both expressed in seconds.
The remaining fields are all optional, and their availability depends on specific corpora. Since it is difficult to predict all possible types of metadata, the custom field (a dict) can be used to insert types of supervisions that are not supported out of the box.
SupervisionSegment may contain multiple types of alignments. The alignment field is a dict, indexed by alignment’s type (e.g., word or phone), and contains a list of AlignmentItem objects – simple structures that contain a given symbol and its time interval. Alignments can be read from CTM files or created programmatically.
Examples
A simple segment with no supervision information:
>>> from lhotse import SupervisionSegment
>>> sup0 = SupervisionSegment(
...     id='rec00001-sup00000', recording_id='rec00001',
...     start=0.5, duration=5.0, channel=0
... )
Typical supervision containing transcript, speaker ID, gender, and language:
>>> sup1 = SupervisionSegment(
...     id='rec00001-sup00001', recording_id='rec00001',
...     start=5.5, duration=3.0, channel=0,
...     text='transcript of the second segment',
...     speaker='Norman Dyhrentfurth', language='English', gender='M'
... )
Two supervisions denoting overlapping speech on two separate channels in a microphone array/multiple headsets (pay attention to start, duration, and channel):
>>> sup2 = SupervisionSegment(
...     id='rec00001-sup00002', recording_id='rec00001',
...     start=15.0, duration=5.0, channel=0,
...     text="i have incredibly good news for you",
...     speaker='Norman Dyhrentfurth', language='English', gender='M'
... )
>>> sup3 = SupervisionSegment(
...     id='rec00001-sup00003', recording_id='rec00001',
...     start=18.0, duration=3.0, channel=1,
...     text="say what",
...     speaker='Hervey Arman', language='English', gender='M'
... )
A supervision with a phone alignment:
>>> from lhotse.supervision import AlignmentItem
>>> sup4 = SupervisionSegment(
...     id='rec00001-sup00004', recording_id='rec00001',
...     start=33.0, duration=1.0, channel=0,
...     text="ice",
...     speaker='Maryla Zechariah', language='English', gender='F',
...     alignment={
...         'phone': [
...             AlignmentItem(symbol='AY0', start=33.0, duration=0.6),
...             AlignmentItem(symbol='S', start=33.6, duration=0.4)
...         ]
...     }
... )
A supervision shared across multiple channels of a recording (e.g. a microphone array):
>>> sup5 = SupervisionSegment(
...     id='rec00001-sup00005', recording_id='rec00001',
...     start=33.0, duration=1.0, channel=[0, 1],
...     text="ice",
...     speaker='Maryla Zechariah',
... )
Converting SupervisionSegment to a dict:
>>> sup0.to_dict()
{'id': 'rec00001-sup00000', 'recording_id': 'rec00001', 'start': 0.5, 'duration': 5.0, 'channel': 0}
- id: str¶
- recording_id: str¶
- start: float¶
- duration: float¶
- channel: Union[int, List[int]] = 0¶
- text: Optional[str] = None¶
- language: Optional[str] = None¶
- speaker: Optional[str] = None¶
- gender: Optional[str] = None¶
- custom: Optional[Dict[str, Any]] = None¶
- alignment: Optional[Dict[str, List[lhotse.supervision.AlignmentItem]]] = None¶
- property end: float¶
- Return type
float
- with_offset(offset)[source]¶
Return an identical SupervisionSegment, but with the offset added to the start field.
- Return type
- perturb_speed(factor, sampling_rate, affix_id=True)[source]¶
Return a SupervisionSegment that has time boundaries matching the recording/cut perturbed with the same factor.
- Parameters
factor (float) – The speed will be adjusted this many times (e.g. factor=1.1 means 1.1x faster).
sampling_rate (int) – The sampling rate is necessary to accurately perturb the start and duration (going through the sample counts).
affix_id (bool) – When true, we will modify the id and recording_id fields by affixing them with “_sp{factor}”.
- Return type
- Returns
a modified copy of the current SupervisionSegment.
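To make "going through the sample counts" concrete, here is a minimal sketch of the arithmetic. The Segment dataclass and both helper functions are hypothetical, and rounding to the nearest integer is an assumption of this sketch; the exact rounding rule used by Lhotse is not stated on this page.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Segment:
    """Hypothetical, reduced stand-in for a supervision segment."""
    id: str
    start: float
    duration: float


def perturb_num_samples(num_samples: int, factor: float) -> int:
    # Assumed rounding rule for this sketch: nearest integer.
    return round(num_samples / factor)


def perturb_speed(seg: Segment, factor: float, sampling_rate: int) -> Segment:
    # Convert seconds to sample counts, scale them, then convert back to seconds.
    start_samples = perturb_num_samples(round(seg.start * sampling_rate), factor)
    duration_samples = perturb_num_samples(round(seg.duration * sampling_rate), factor)
    return replace(
        seg,
        id=f"{seg.id}_sp{factor}",
        start=start_samples / sampling_rate,
        duration=duration_samples / sampling_rate,
    )


seg = Segment(id="sup0", start=5.5, duration=3.0)
faster = perturb_speed(seg, factor=1.1, sampling_rate=16000)
```

Working in whole samples keeps the perturbed supervision boundaries aligned with the number of samples that the perturbed audio can actually contain.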
- perturb_tempo(factor, sampling_rate, affix_id=True)[source]¶
Return a SupervisionSegment that has time boundaries matching the recording/cut perturbed with the same factor.
- Parameters
factor (float) – The tempo will be adjusted this many times (e.g. factor=1.1 means 1.1x faster).
sampling_rate (int) – The sampling rate is necessary to accurately perturb the start and duration (going through the sample counts).
affix_id (bool) – When true, we will modify the id and recording_id fields by affixing them with “_tp{factor}”.
- Return type
- Returns
a modified copy of the current SupervisionSegment.
- perturb_volume(factor, affix_id=True)[source]¶
Return a SupervisionSegment with modified ids.
- Parameters
factor (float) – The volume will be adjusted this many times (e.g. factor=1.1 means 1.1x louder).
affix_id (bool) – When true, we will modify the id and recording_id fields by affixing them with “_vp{factor}”.
- Return type
- Returns
a modified copy of the current SupervisionSegment.
- reverb_rir(affix_id=True, channel=None)[source]¶
Return a SupervisionSegment with modified ids.
- Parameters
affix_id (bool) – When true, we will modify the id and recording_id fields by affixing them with “_rvb”.
- Return type
- Returns
a modified copy of the current SupervisionSegment.
- trim(end, start=0)[source]¶
Return an identical SupervisionSegment, but ensure that self.start is not negative (in which case it’s set to 0) and self.end does not exceed the end parameter. If a start is optionally provided, the supervision is trimmed from the left (note that start should be relative to the cut times).
This method is useful for ensuring that the supervision does not exceed a cut’s bounds, in which case pass cut.duration as the end argument, since supervision times are relative to the cut.
- Return type
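The clamping described for trim can be sketched as follows. This is an illustration only: Segment is a hypothetical reduced dataclass, and clamping the duration at zero for segments that fall entirely outside the bounds is an assumption of this sketch rather than documented behaviour.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in with times relative to the start of a cut."""
    start: float
    duration: float

    @property
    def end(self) -> float:
        return self.start + self.duration


def trim(seg: Segment, end: float, start: float = 0.0) -> Segment:
    new_start = max(seg.start, start)  # never begin before ``start``
    new_end = min(seg.end, end)        # never finish after ``end``
    # Assumption of this sketch: a fully out-of-bounds segment gets zero duration.
    return replace(seg, start=new_start, duration=max(0.0, new_end - new_start))


cut_duration = 10.0
overhanging = Segment(start=-1.5, duration=4.0)  # begins before the cut
trailing = Segment(start=8.0, duration=5.0)      # ends after the cut
trimmed_left = trim(overhanging, end=cut_duration)
trimmed_right = trim(trailing, end=cut_duration)
```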
- map(transform_fn)[source]¶
Return a copy of the current segment, transformed with transform_fn.
- Parameters
transform_fn (Callable[[SupervisionSegment], SupervisionSegment]) – a function that takes a segment as input, transforms it and returns a new segment.
- Return type
- Returns
a modified SupervisionSegment.
- transform_text(transform_fn)[source]¶
Return a copy of the current segment with transformed text field. Useful for text normalization, phonetic transcription, etc.
- Parameters
transform_fn (Callable[[str], str]) – a function that accepts a string and returns a string.
- Return type
- Returns
a SupervisionSegment with adjusted text.
- transform_alignment(transform_fn, type='word')[source]¶
Return a copy of the current segment with transformed alignment field. Useful for text normalization, phonetic transcription, etc.
- Parameters
type (Optional[str]) – alignment type to transform (key for alignment dict).
transform_fn (Callable[[str], str]) – a function that accepts a string and returns a string.
- Return type
- Returns
a SupervisionSegment with adjusted alignments.
- __init__(id, recording_id, start, duration, channel=0, text=None, language=None, speaker=None, gender=None, custom=None, alignment=None)¶
- class lhotse.supervision.SupervisionSet(segments)[source]¶
SupervisionSet represents a collection of segments containing some supervision information (see SupervisionSegment), that are indexed by segment IDs.
It acts as a Python dict, extended with an efficient find operation that indexes and caches the supervision segments in an interval tree. This makes it possible to quickly find supervision segments that correspond to a specific time interval.
When coming from Kaldi, think of SupervisionSet as a segments file on steroids, that may also contain text, utt2spk, utt2gender, utt2dur, etc.
Examples
Building a SupervisionSet:
>>> from lhotse import SupervisionSet, SupervisionSegment
>>> sups = SupervisionSet.from_segments([SupervisionSegment(...), ...])
Writing/reading a SupervisionSet:
>>> sups.to_file('supervisions.jsonl.gz')
>>> sups2 = SupervisionSet.from_file('supervisions.jsonl.gz')
Using SupervisionSet like a dict:
>>> 'rec00001-sup00000' in sups
True
>>> sups['rec00001-sup00000']
SupervisionSegment(id='rec00001-sup00000', recording_id='rec00001', start=0.5, ...)
>>> for segment in sups:
...     pass
Searching by recording_id and time interval:
>>> matched_segments = sups.find(recording_id='rec00001', start_after=17.0, end_before=25.0)
Manipulation:
>>> longer_than_5s = sups.filter(lambda s: s.duration > 5)
>>> first_100 = sups.subset(first=100)
>>> split_into_4 = sups.split(num_splits=4)
>>> shuffled = sups.shuffle()
- property data: Union[Dict[str, lhotse.supervision.SupervisionSegment], Iterable[lhotse.supervision.SupervisionSegment]]¶
Alias property for self.segments
- Return type
Union[Dict[str, SupervisionSegment], Iterable[SupervisionSegment]]
- property ids: Iterable[str]¶
- Return type
Iterable[str]
- static from_items(segments)¶
Function to be implemented by every sub-class of this mixin. It’s expected to create a sub-class instance out of an iterable of items that are held by the sub-class (e.g., CutSet.from_items(iterable_of_cuts)).
- Return type
- static from_rttm(path)[source]¶
Read an RTTM file located at path (or an iterator) and create a SupervisionSet manifest for them. Can be used to create supervisions from custom RTTM files (see, for example, lhotse.dataset.DiarizationDataset).
>>> from pathlib import Path
>>> from lhotse import SupervisionSet
>>> sup1 = SupervisionSet.from_rttm('/path/to/rttm_file')
>>> sup2 = SupervisionSet.from_rttm(Path('/path/to/rttm_dir').rglob('ref_*'))
The following description is taken from the [dscore](https://github.com/nryant/dscore#rttm) toolkit:
Rich Transcription Time Marked (RTTM) files are space-delimited text files containing one turn per line, each line containing ten fields:
- Type – segment type; should always be SPEAKER
- File ID – file name; basename of the recording minus extension (e.g., rec1_a)
- Channel ID – channel (1-indexed) that turn is on; should always be 1
- Turn Onset – onset of turn in seconds from beginning of recording
- Turn Duration – duration of turn in seconds
- Orthography Field – should always be <NA>
- Speaker Type – should always be <NA>
- Speaker Name – name of speaker of turn; should be unique within scope of each file
- Confidence Score – system confidence (probability) that information is correct; should always be <NA>
- Signal Lookahead Time – should always be <NA>
For instance:
SPEAKER CMU_20020319-1400_d01_NONE 1 130.430000 2.350 <NA> <NA> juliet <NA> <NA>
SPEAKER CMU_20020319-1400_d01_NONE 1 157.610000 3.060 <NA> <NA> tbc <NA> <NA>
SPEAKER CMU_20020319-1400_d01_NONE 1 130.490000 0.450 <NA> <NA> chek <NA> <NA>
- Parameters
path (Union[Path, str, Iterable[Union[Path, str]]]) – Path to RTTM file or an iterator of paths to RTTM files.
- Return type
- Returns
a new SupervisionSet instance containing segments from the RTTM file.
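The ten-field layout described above is simple enough to parse by hand. The sketch below is independent of Lhotse: RttmTurn and parse_rttm_line are hypothetical names, and only the fields that are not fixed to <NA> are kept.

```python
from typing import List, NamedTuple


class RttmTurn(NamedTuple):
    file_id: str
    channel: int     # 1-indexed, exactly as written in the RTTM file
    onset: float     # seconds from the beginning of the recording
    duration: float  # seconds
    speaker: str


def parse_rttm_line(line: str) -> RttmTurn:
    fields = line.split()
    if len(fields) != 10 or fields[0] != "SPEAKER":
        raise ValueError(f"Not a valid RTTM SPEAKER line: {line!r}")
    # Field order: type, file ID, channel, onset, duration, orthography,
    # speaker type, speaker name, confidence, signal lookahead time.
    _, file_id, channel, onset, duration, _, _, speaker, _, _ = fields
    return RttmTurn(file_id, int(channel), float(onset), float(duration), speaker)


rttm_text = """\
SPEAKER CMU_20020319-1400_d01_NONE 1 130.430000 2.350 <NA> <NA> juliet <NA> <NA>
SPEAKER CMU_20020319-1400_d01_NONE 1 157.610000 3.060 <NA> <NA> tbc <NA> <NA>
"""
turns: List[RttmTurn] = [
    parse_rttm_line(line) for line in rttm_text.splitlines() if line.strip()
]
```

Note that the RTTM channel is 1-indexed, so a consumer that uses 0-indexed channels has to convert it explicitly.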
- with_alignment_from_ctm(ctm_file, type='word', match_channel=False)[source]¶
Add alignments from CTM file to the supervision set.
- Parameters
ctm_file – Path to CTM file.
type (str) – Alignment type (optional, default = word).
match_channel (bool) – if True, also match channel between CTM and SupervisionSegment
- Return type
- Returns
A new SupervisionSet with AlignmentItem objects added to the segments.
- write_alignment_to_ctm(ctm_file, type='word')[source]¶
Write alignments to CTM file.
- Parameters
ctm_file (Union[Path, str]) – Path to output CTM file (will be created if it does not exist).
type (str) – Alignment type to write (default = word).
- Return type
None
- split(num_splits, shuffle=False, drop_last=False)[source]¶
Split the SupervisionSet into num_splits pieces of equal size.
- Parameters
num_splits (int) – Requested number of splits.
shuffle (bool) – Optionally shuffle the order of the supervisions first.
drop_last (bool) – determines how to handle splitting when len(seq) is not divisible by num_splits. When False (default), the splits might have unequal lengths. When True, it may discard the last element in some splits to ensure they are equally long.
- Return type
List[SupervisionSet]
- Returns
A list of SupervisionSet pieces.
- split_lazy(output_dir, chunk_size, prefix='')[source]¶
Splits a manifest (either lazily or eagerly opened) into chunks, each with chunk_size items (except for the last one, typically).
In order to be memory efficient, this implementation saves each chunk to disk in a .jsonl.gz format as the input manifest is sampled.
Note
For lowest memory usage, use load_manifest_lazy to open the input manifest for this method.
- Parameters
it – any iterable of Lhotse manifests.
output_dir (Union[Path, str]) – directory where the split manifests are saved. Each manifest is saved at: {output_dir}/{prefix}.{split_idx}.jsonl.gz
chunk_size (int) – the number of items in each chunk.
prefix (str) – the prefix of each manifest.
- Return type
List[SupervisionSet]
- Returns
a list of lazily opened chunk manifests.
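The chunk-by-chunk writing strategy can be illustrated with the standard library alone. In this sketch, split_to_chunks is a hypothetical helper and the items are plain dicts rather than Lhotse manifests; only the file naming pattern is taken from the description above.

```python
import gzip
import json
import tempfile
from itertools import islice
from pathlib import Path
from typing import Iterable, Iterator, List


def split_to_chunks(
    items: Iterable[dict], output_dir: Path, chunk_size: int, prefix: str = ""
) -> List[Path]:
    """Write ``items`` to consecutive .jsonl.gz files of ``chunk_size`` lines each."""
    output_dir.mkdir(parents=True, exist_ok=True)
    it: Iterator[dict] = iter(items)
    paths = []
    split_idx = 0
    while True:
        chunk = list(islice(it, chunk_size))  # only one chunk is held in memory
        if not chunk:
            break
        path = output_dir / f"{prefix}.{split_idx}.jsonl.gz"
        with gzip.open(path, "wt", encoding="utf-8") as f:
            for item in chunk:
                f.write(json.dumps(item) + "\n")
        paths.append(path)
        split_idx += 1
    return paths


items = ({"id": f"sup{i}"} for i in range(5))  # a lazy generator of toy items
with tempfile.TemporaryDirectory() as tmp_dir:
    paths = split_to_chunks(items, Path(tmp_dir), chunk_size=2, prefix="sups")
    names = [p.name for p in paths]
    with gzip.open(paths[-1], "rt", encoding="utf-8") as f:
        last_chunk = [json.loads(line) for line in f]
```

Because the input is consumed through an iterator, the same function works for a generator, a lazily opened file, or an in-memory list.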
- subset(first=None, last=None)[source]¶
Return a new SupervisionSet according to the selected subset criterion. Only a single argument to subset is supported at this time.
- Parameters
first (Optional[int]) – int, the number of first supervisions to keep.
last (Optional[int]) – int, the number of last supervisions to keep.
- Return type
- Returns
a new SupervisionSet with the subset results.
- transform_text(transform_fn)[source]¶
Return a copy of the current SupervisionSet with the segments having a transformed text field. Useful for text normalization, phonetic transcription, etc.
- Parameters
transform_fn (Callable[[str], str]) – a function that accepts a string and returns a string.
- Return type
- Returns
a SupervisionSet with adjusted text.
- transform_alignment(transform_fn, type='word')[source]¶
Return a copy of the current SupervisionSet with the segments having a transformed alignment field. Useful for text normalization, phonetic transcription, etc.
- Parameters
transform_fn (Callable[[str], str]) – a function that accepts a string and returns a string.
type (str) – alignment type to transform (key for alignment dict).
- Return type
- Returns
a SupervisionSet with adjusted alignments.
- find(recording_id, channel=None, start_after=0, end_before=None, adjust_offset=False, tolerance=0.001)[source]¶
Return an iterable of segments that match the provided recording_id.
- Parameters
recording_id (str) – Desired recording ID.
channel (Optional[int]) – When specified, return supervisions in that channel - otherwise, in all channels.
start_after (float) – When specified, return segments that start after the given value.
end_before (Optional[float]) – When specified, return segments that end before the given value.
adjust_offset (bool) – When true, return segments as if the recordings had started at start_after. This is useful for creating Cuts. From a user perspective, when dealing with a Cut, it is no longer helpful to know when the supervision starts in a recording - instead, it’s useful to know when the supervision starts relative to the start of the Cut. In the anticipated use-case, start_after and end_before would be the beginning and end of a cut; this option converts the times to be relative to the start of the cut.
tolerance (float) – Additional margin to account for floating point rounding errors when comparing segment boundaries.
- Return type
Iterable[SupervisionSegment]
- Returns
An iterator over supervision segments satisfying all criteria.
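The interplay of start_after, end_before, adjust_offset and tolerance is easier to see in a small sketch. It uses a plain linear scan over a hypothetical Segment dataclass instead of the interval tree mentioned in the class description, and the exact comparison rules are an assumption of this sketch.

```python
from dataclasses import dataclass, replace
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Segment:
    """Hypothetical, reduced stand-in for a supervision segment."""
    recording_id: str
    start: float
    duration: float

    @property
    def end(self) -> float:
        return self.start + self.duration


def find(
    segments: Iterable[Segment],
    recording_id: str,
    start_after: float = 0.0,
    end_before: Optional[float] = None,
    adjust_offset: bool = False,
    tolerance: float = 0.001,
) -> Iterator[Segment]:
    for seg in segments:
        if seg.recording_id != recording_id:
            continue
        # The tolerance widens the window slightly to absorb rounding errors.
        if seg.start < start_after - tolerance:
            continue
        if end_before is not None and seg.end > end_before + tolerance:
            continue
        # With adjust_offset, times become relative to ``start_after``.
        yield replace(seg, start=seg.start - start_after) if adjust_offset else seg


segments = [
    Segment("rec1", 0.5, 5.0),
    Segment("rec1", 17.0, 3.0),
    Segment("rec1", 18.0, 3.0),
    Segment("rec2", 18.0, 1.0),
]
matched = list(
    find(segments, "rec1", start_after=17.0, end_before=21.0, adjust_offset=True)
)
```

Here the two matching segments come back with start times expressed relative to 17.0, which is the form needed when the window corresponds to a cut.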
- filter(predicate)¶
Return a new manifest containing only the items that satisfy predicate. If the manifest is lazy, the filtering will also be applied lazily.
- Parameters
predicate (Callable[[~T], bool]) – a function that takes an item of this manifest as an argument and returns bool.
- Returns
a filtered manifest.
- classmethod from_file(path)¶
- Return type
Any
- classmethod from_json(path)¶
- Return type
Any
- classmethod from_jsonl(path)¶
- Return type
Any
- classmethod from_jsonl_lazy(path)¶
Read a JSONL manifest in a lazy manner, which opens the file but does not read it immediately. It is only suitable for sequential reads and iteration.
Warning
Opening the manifest in this way might cause some methods that rely on random access to fail.
- Return type
Any
- classmethod from_yaml(path)¶
- Return type
Any
- property is_lazy: bool¶
Indicates whether this manifest was opened in lazy (read-on-the-fly) mode or not.
- Return type
bool
- map(transform_fn)¶
Apply transform_fn to each item in this manifest and return a new manifest. If the manifest is opened lazy, the transform is also applied lazily.
- Parameters
transform_fn (Callable[[~T], ~T]) – A callable (function) that accepts a single item instance and returns a new (or the same) instance of the same type. E.g. with CutSet, callable accepts Cut and returns also Cut.
- Returns
a new CutSet with transformed cuts.
- classmethod mux(*manifests, stop_early=False, weights=None, seed=0)¶
Merges multiple manifest iterables into a new iterable by lazily multiplexing them during iteration time. If one of the iterables is exhausted before the others, we will keep iterating until all iterables are exhausted. This behavior can be changed with stop_early parameter.
- Parameters
manifests – iterables to be multiplexed. They can be either lazy or eager, but the resulting manifest will always be lazy.
stop_early (bool) – should we stop the iteration as soon as we exhaust one of the manifests.
weights (Optional[List[Union[int, float]]]) – an optional weight for each iterable, affects the probability of it being sampled. The weights are uniform by default. If lengths are known, it makes sense to pass them here for uniform distribution of items in the expectation.
seed (int) – the random seed, ensures deterministic order across multiple iterations.
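Weighted multiplexing of several iterables can be sketched with a seeded random generator. The mux function below is a hypothetical, simplified generator written for this illustration and operating on plain Python iterables; it is not the classmethod documented above.

```python
import random
from typing import Iterable, Iterator, Optional, Sequence


def mux(
    *iterables: Iterable,
    weights: Optional[Sequence[float]] = None,
    stop_early: bool = False,
    seed: int = 0,
) -> Iterator:
    rng = random.Random(seed)  # a fixed seed gives a reproducible order
    iterators = [iter(it) for it in iterables]
    w = list(weights) if weights is not None else [1.0] * len(iterators)
    active = list(range(len(iterators)))  # indices of non-exhausted sources
    while active:
        idx = rng.choices(active, weights=[w[i] for i in active], k=1)[0]
        try:
            yield next(iterators[idx])
        except StopIteration:
            if stop_early:
                return
            active.remove(idx)  # keep sampling from the remaining sources


merged = list(mux([1, 2, 3], ["a", "b"], weights=[3, 2], seed=0))
merged_again = list(mux([1, 2, 3], ["a", "b"], weights=[3, 2], seed=0))
short = list(mux([1, 2, 3], ["a", "b"], stop_early=True, seed=0))
```

Passing the lengths as weights, as suggested above, makes each remaining item roughly equally likely to be drawn next, so the sources tend to run out at about the same time.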
- classmethod open_writer(path, overwrite=True)¶
Open a sequential writer that allows storing the manifests one by one, without the necessity of storing the whole manifest set in-memory. Supports writing to JSONL format (.jsonl), with optional gzip compression (.jsonl.gz).
Note
when path is None, we will return an InMemoryWriter instead, which has the same API but stores the manifests in memory. It is convenient when you want to make disk saving optional.
Example:
>>> from lhotse import RecordingSet
>>> recordings = [...]
>>> with RecordingSet.open_writer('recordings.jsonl.gz') as writer:
...     for recording in recordings:
...         writer.write(recording)
This writer can be useful for continuing to write files that were previously stopped – it will open the existing file and scan it for item IDs to skip writing them later. It can also be queried for existing IDs so that the user code may skip preparing the corresponding manifests.
Example:
>>> from pathlib import Path
>>> from lhotse import RecordingSet, Recording
>>> with RecordingSet.open_writer('recordings.jsonl.gz', overwrite=False) as writer:
...     for path in Path('.').rglob('*.wav'):
...         recording_id = path.stem
...         if writer.contains(recording_id):
...             # Item already written previously - skip processing.
...             continue
...         # Item doesn't exist yet - run extra work to prepare the manifest
...         # and store it.
...         recording = Recording.from_file(path, recording_id=recording_id)
...         writer.write(recording)
- Return type
Union[SequentialJsonlWriter, InMemoryWriter]
- repeat(times=None, preserve_id=False)¶
Return a new, lazily evaluated manifest that iterates over the original elements times number of times.
- Parameters
times (Optional[int]) – how many times to repeat (infinite by default).
preserve_id (bool) – when True, we won’t update the element ID with repeat number.
- Returns
a repeated manifest.
- shuffle(rng=None, buffer_size=10000)¶
Shuffles the elements and returns a shuffled variant of self. If the manifest is opened lazily, performs shuffling on-the-fly with a fixed buffer size.
- Parameters
rng (Optional[Random]) – an optional instance of random.Random for precise control of randomness.
- Returns
a shuffled copy of self, or a manifest that is shuffled lazily.
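On-the-fly shuffling with a fixed buffer can be sketched as below. streaming_shuffle is a hypothetical helper that shows one common way to implement the idea; it is not necessarily the exact algorithm used by Lhotse.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def streaming_shuffle(
    items: Iterable[T],
    buffer_size: int = 10000,
    rng: Optional[random.Random] = None,
) -> Iterator[T]:
    """Approximately shuffle a stream while holding at most ``buffer_size`` items."""
    rng = rng or random.Random()
    buffer: List[T] = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)  # fill the buffer first
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]        # emit a randomly chosen buffered item
        buffer[idx] = item       # and store the new item in its place
    rng.shuffle(buffer)          # flush whatever is left at the end
    yield from buffer


shuffled = list(streaming_shuffle(range(100), buffer_size=10, rng=random.Random(0)))
shuffled_again = list(streaming_shuffle(range(100), buffer_size=10, rng=random.Random(0)))
```

A small buffer keeps memory usage low but only mixes items that are close to each other in the stream; a larger buffer approaches a full shuffle.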
- to_eager()¶
Evaluates all lazy operations on this manifest, if any, and returns a copy that keeps all items in memory. If the manifest was “eager” already, this is a no-op and won’t copy anything.
- to_file(path)¶
- Return type
None
- to_json(path)¶
- Return type
None
- to_jsonl(path)¶
- Return type
None
- to_yaml(path)¶
- Return type
None
Feature extraction and manifests¶
Data structures and tools used for feature extraction and description.
Features API - extractor and manifests¶
- class lhotse.features.base.FeatureExtractor(config=None)[source]¶
The base class for all feature extractors in Lhotse. It is initialized with a config object, specific to a particular feature extraction method. The config is expected to be a dataclass so that it can be easily serialized.
All derived feature extractors must implement at least the following:
- a name class attribute (what these features are called, e.g. ‘mfcc’)
- a config_type class attribute that points to the configuration dataclass type
- the extract method,
- the frame_shift property.
Feature extractors that support feature-domain mixing should additionally specify two static methods:
- compute_energy, and
- mix.
By itself, the FeatureExtractor offers the following high-level methods that are not intended for overriding:
- extract_from_samples_and_store
- extract_from_recording_and_store
These methods run a larger feature extraction pipeline that involves data augmentation and disk storage.
- name = None¶
- config_type = None¶
- abstract extract(samples, sampling_rate)[source]¶
Defines how to extract features using a numpy ndarray of audio samples and the sampling rate.
- Return type
ndarray
- Returns
a numpy ndarray representing the feature matrix.
- abstract property frame_shift: float¶
- Return type
float
- property device: Union[str, torch.device]¶
- Return type
Union[str, device]
- static mix(features_a, features_b, energy_scaling_factor_b)[source]¶
Perform feature-domain mix of two signals, a and b, and return the mixed signal.
- Parameters
features_a (ndarray) – Left-hand side (reference) signal.
features_b (ndarray) – Right-hand side (mixed-in) signal.
energy_scaling_factor_b (float) – A scaling factor for features_b energy. It is used to achieve a specific SNR. E.g. to mix with an SNR of 10dB when both features_a and features_b energies are 100, the features_b signal energy needs to be scaled by 0.1. Since different features (e.g. spectrogram, fbank, MFCC) require different combination of transformations (e.g. exp, log, sqrt, pow) to allow mixing of two signals, the exact place where to apply energy_scaling_factor_b to the signal is determined by the implementer.
- Return type
ndarray
- Returns
A mixed feature matrix.
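The 10dB example from the parameter description can be checked with a short calculation. The sketch assumes linear-domain energy features (for instance a power spectrogram), where energies simply add; all function names are hypothetical and plain lists are used in place of ndarrays.

```python
import math
from typing import List

Matrix = List[List[float]]


def total_energy(features: Matrix) -> float:
    # Assumption: features are linear-domain energies, so the total is a plain sum.
    return sum(sum(row) for row in features)


def scaling_factor_for_snr(energy_a: float, energy_b: float, snr_db: float) -> float:
    # Target energy of b so that energy_a / target equals 10 ** (snr_db / 10).
    target_energy_b = energy_a / (10.0 ** (snr_db / 10.0))
    return target_energy_b / energy_b


def mix_linear(features_a: Matrix, features_b: Matrix, factor_b: float) -> Matrix:
    return [
        [a + factor_b * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(features_a, features_b)
    ]


features_a = [[60.0, 40.0]]  # total energy 100
features_b = [[50.0, 50.0]]  # total energy 100
factor = scaling_factor_for_snr(
    total_energy(features_a), total_energy(features_b), snr_db=10.0
)
mixed = mix_linear(features_a, features_b, factor)
achieved_snr_db = 10.0 * math.log10(
    total_energy(features_a) / (factor * total_energy(features_b))
)
```

For log-domain features such as log-mel filter banks, the same factor would have to be applied after converting back to the linear domain, which is why the place of application is left to the implementer.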
- static compute_energy(features)[source]¶
Compute the total energy of a feature matrix. How the energy is computed depends on a particular type of features. It is expected that when implemented, compute_energy will never return zero.
- Parameters
features (ndarray) – A feature matrix.
- Return type
float
- Returns
A positive float value of the signal energy.
- extract_batch(samples, sampling_rate)[source]¶
Performs batch extraction. It is not guaranteed to be faster than FeatureExtractor.extract() – it depends on whether the implementation of a particular feature extractor supports accelerated batch computation.
Note
Unless overridden by child classes, it defaults to sequentially calling FeatureExtractor.extract() on the inputs.
Note
This method should support variable length inputs.
- Return type
Union[ndarray, Tensor, List[ndarray], List[Tensor]]
- extract_from_samples_and_store(samples, storage, sampling_rate, offset=0, channel=None, augment_fn=None)[source]¶
Extract the features from an array of audio samples in a full pipeline:
- optional audio augmentation;
- extract the features;
- save them to disk in a specified directory;
- return a Features object with a description of the extracted features.
Note, unlike in extract_from_recording_and_store, the returned Features object might not be suitable to store in a FeatureSet, as it does not reference any particular Recording. Instead, this method is useful when extracting features from cuts - especially MixedCut instances, which may be created from multiple recordings and channels.
- Parameters
samples (ndarray) – a numpy ndarray with the audio samples.
sampling_rate (int) – integer sampling rate of samples.
storage (FeaturesWriter) – a FeaturesWriter object that will handle storing the feature matrices.
offset (float) – an offset in seconds for where to start reading the recording - when used for Cut feature extraction, must be equal to Cut.start.
channel (Union[int, List[int], None]) – an optional channel number(s) to insert into Features manifest.
augment_fn (Optional[Callable[[ndarray, int], ndarray]]) – an optional WavAugmenter instance to modify the waveform before feature extraction.
- Return type
- Returns
a Features manifest item for the extracted feature matrix (it is not written to disk).
- extract_from_recording_and_store(recording, storage, offset=0, duration=None, channels=None, augment_fn=None)[source]¶
Extract the features from a Recording in a full pipeline:
- load audio from disk;
- optionally, perform audio augmentation;
- extract the features;
- save them to disk in a specified directory;
- return a Features object with a description of the extracted features and the source data used.
- Parameters
recording (Recording) – a Recording that specifies what’s the input audio.
storage (FeaturesWriter) – a FeaturesWriter object that will handle storing the feature matrices.
offset (float) – an optional offset in seconds for where to start reading the recording.
duration (Optional[float]) – an optional duration specifying how much audio to load from the recording.
channels (Union[int, List[int], None]) – an optional int or list of ints, specifying the channels; by default, all channels will be used.
augment_fn (Optional[Callable[[ndarray, int], ndarray]]) – an optional WavAugmenter instance to modify the waveform before feature extraction.
- Return type
- Returns
a Features manifest item for the extracted feature matrix.
- lhotse.features.base.get_extractor_type(name)[source]¶
Return the feature extractor type corresponding to the given name.
- Parameters
name (str) – specifies which feature extractor should be used.
- Return type
Type
- Returns
A feature extractor type.
- lhotse.features.base.create_default_feature_extractor(name)[source]¶
Create a feature extractor object with a default configuration.
- Parameters
name (str) – specifies which feature extractor should be used.
- Return type
Optional[FeatureExtractor]
- Returns
A new feature extractor instance.
- lhotse.features.base.register_extractor(cls)[source]¶
This decorator is used to register feature extractor classes in Lhotse so they can be easily created just by knowing their name.
An example of usage:
@register_extractor
class MyFeatureExtractor:
    ...
- Parameters
cls – A type (class) that is being registered.
- Returns
Registered type.
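The name-based lookup that such a decorator enables can be sketched with a plain dictionary. The registry and every function name below are hypothetical and only mirror the roles of register_extractor, get_extractor_type and create_default_feature_extractor; the sketch does not show how Lhotse stores registered classes internally.

```python
from typing import Dict, Type

# Hypothetical registry that maps an extractor name to its class.
EXTRACTORS: Dict[str, Type] = {}


def register(cls: Type) -> Type:
    """Class decorator: record ``cls`` under its ``name`` and return it unchanged."""
    EXTRACTORS[cls.name] = cls
    return cls


@register
class MyFeatureExtractor:
    name = "my-features"


def get_type(name: str) -> Type:
    return EXTRACTORS[name]


def create_default(name: str):
    # Instantiate the registered class with its default (empty) configuration.
    return get_type(name)()


extractor = create_default("my-features")
```

Returning the class unchanged from the decorator keeps the decorated class usable exactly as if it had not been registered.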
- class lhotse.features.base.TorchaudioFeatureExtractor(config=None)[source]¶
Common abstract base class for all torchaudio based feature extractors.
- extract(samples, sampling_rate)[source]¶
Defines how to extract features using a numpy ndarray of audio samples and the sampling rate.
- Return type
ndarray
- Returns
a numpy ndarray representing the feature matrix.
- property frame_shift: float¶
- Return type
float
- __init__(config=None)¶
- static compute_energy(features)¶
Compute the total energy of a feature matrix. How the energy is computed depends on a particular type of features. It is expected that when implemented, compute_energy will never return zero.
- Parameters
features (ndarray) – A feature matrix.
- Return type
float
- Returns
A positive float value of the signal energy.
- config_type = None¶
- property device: Union[str, torch.device]¶
- Return type
Union
[str
,device
]
- extract_batch(samples, sampling_rate)¶
Performs batch extraction. It is not guaranteed to be faster than FeatureExtractor.extract() – it depends on whether the implementation of a particular feature extractor supports accelerated batch computation.
Note
Unless overridden by child classes, it defaults to sequentially calling FeatureExtractor.extract() on the inputs.
Note
This method should support variable length inputs.
- Return type
Union
[ndarray
,Tensor
,List
[ndarray
],List
[Tensor
]]
- extract_from_recording_and_store(recording, storage, offset=0, duration=None, channels=None, augment_fn=None)¶
Extract the features from a Recording in a full pipeline:
load audio from disk;
optionally, perform audio augmentation;
extract the features;
save them to disk in a specified directory;
return a Features object with a description of the extracted features and the source data used.
- Parameters
recording (Recording) – a Recording that specifies what’s the input audio.
storage (FeaturesWriter) – a FeaturesWriter object that will handle storing the feature matrices.
offset (float) – an optional offset in seconds for where to start reading the recording.
duration (Optional[float]) – an optional duration specifying how much audio to load from the recording.
channels (Union[int, List[int], None]) – an optional int or list of ints, specifying the channels; by default, all channels will be used.
augment_fn (Optional[Callable[[ndarray, int], ndarray]]) – an optional WavAugmenter instance to modify the waveform before feature extraction.
- Return type
- Returns
a
Features
manifest item for the extracted feature matrix.
- extract_from_samples_and_store(samples, storage, sampling_rate, offset=0, channel=None, augment_fn=None)¶
Extract the features from an array of audio samples in a full pipeline:
optional audio augmentation;
extract the features;
save them to disk in a specified directory;
return a Features object with a description of the extracted features.
Note, unlike in extract_from_recording_and_store, the returned Features object might not be suitable to store in a FeatureSet, as it does not reference any particular Recording. Instead, this method is useful when extracting features from cuts - especially MixedCut instances, which may be created from multiple recordings and channels.
- Parameters
samples (ndarray) – a numpy ndarray with the audio samples.
sampling_rate (int) – integer sampling rate of samples.
storage (FeaturesWriter) – a FeaturesWriter object that will handle storing the feature matrices.
offset (float) – an offset in seconds for where to start reading the recording - when used for Cut feature extraction, must be equal to Cut.start.
channel (Union[int, List[int], None]) – an optional channel number(s) to insert into Features manifest.
augment_fn (Optional[Callable[[ndarray, int], ndarray]]) – an optional WavAugmenter instance to modify the waveform before feature extraction.
- Return type
- Returns
a
Features
manifest item for the extracted feature matrix (it is not written to disk).
- abstract feature_dim(sampling_rate)¶
- Return type
int
- classmethod from_dict(data)¶
- Return type
- classmethod from_yaml(path)¶
- Return type
- static mix(features_a, features_b, energy_scaling_factor_b)¶
Perform feature-domain mix of two signals, a and b, and return the mixed signal.
- Parameters
features_a (ndarray) – Left-hand side (reference) signal.
features_b (ndarray) – Right-hand side (mixed-in) signal.
energy_scaling_factor_b (float) – A scaling factor for features_b energy. It is used to achieve a specific SNR. E.g. to mix with an SNR of 10dB when both features_a and features_b energies are 100, the features_b signal energy needs to be scaled by 0.1. Since different features (e.g. spectrogram, fbank, MFCC) require different combination of transformations (e.g. exp, log, sqrt, pow) to allow mixing of two signals, the exact place where to apply energy_scaling_factor_b to the signal is determined by the implementer.
- Return type
ndarray
- Returns
A mixed feature matrix.
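The SNR example in the description of energy_scaling_factor_b (both energies equal to 100, target SNR of 10 dB, factor 0.1) follows from the definition SNR = 10 * log10(energy_a / energy_b). The sketch below only reproduces that arithmetic; the helper name `energy_scaling_factor` is hypothetical and is not part of Lhotse's API.

```python
def energy_scaling_factor(energy_a: float, energy_b: float, snr_db: float) -> float:
    """Return the factor by which the energy of ``b`` must be multiplied so that
    10 * log10(energy_a / (factor * energy_b)) equals ``snr_db``."""
    # Energy that signal b should have to reach the requested SNR against a.
    target_energy_b = energy_a / (10.0 ** (snr_db / 10.0))
    return target_energy_b / energy_b


# Both signals have energy 100 and the requested SNR is 10 dB.
factor = energy_scaling_factor(energy_a=100.0, energy_b=100.0, snr_db=10.0)
```

With these inputs the target energy of b is 10, so its energy has to be scaled by 10 / 100 = 0.1, matching the figure quoted in the parameter description.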
- name = None¶
- to_dict()¶
- Return type
Dict
[str
,Any
]
- to_yaml(path)¶
- class lhotse.features.base.Features(type, num_frames, num_features, frame_shift, sampling_rate, start, duration, storage_type, storage_path, storage_key, recording_id=None, channels=None)[source]¶
Represents features extracted for some particular time range in a given recording and channel. It contains metadata about how it’s stored: storage_type describes “how to read it”, for now it supports numpy arrays serialized with np.save, as well as arrays compressed with lilcom; storage_path is the path to the file on the local filesystem.
- type: str¶
- num_frames: int¶
- num_features: int¶
- frame_shift: float¶
- sampling_rate: int¶
- start: float¶
- duration: float¶
- storage_type: str¶
- storage_path: str¶
- storage_key: Union[str, bytes]¶
- recording_id: Optional[str] = None¶
- channels: Optional[Union[int, List[int]]] = None¶
- property end: float¶
- Return type
float
- copy_feats(writer)[source]¶
Read the referenced feature array and save it using writer. Returns a copy of the manifest with updated fields related to the feature storage.
- Return type
- __init__(type, num_frames, num_features, frame_shift, sampling_rate, start, duration, storage_type, storage_path, storage_key, recording_id=None, channels=None)¶
- class lhotse.features.base.FeatureSet(features=None)[source]¶
Represents a feature manifest, and allows reading features for given recordings within particular channels and time ranges. It also keeps information about the feature extractor parameters used to obtain this set. When a given recording/time-range/channel is unavailable, raises a KeyError.
- property data: Union[Dict[str, lhotse.features.base.Features], Iterable[lhotse.features.base.Features]]¶
Alias property for
self.features
- static from_items(features)¶
Function to be implemented by every sub-class of this mixin. It’s expected to create a sub-class instance out of an iterable of items that are held by the sub-class (e.g.,
CutSet.from_items(iterable_of_cuts)
).- Return type
- split(num_splits, shuffle=False, drop_last=False)[source]¶
Split the FeatureSet into num_splits pieces of equal size.
- Parameters
num_splits (int) – Requested number of splits.
shuffle (bool) – Optionally shuffle the order of the items first.
drop_last (bool) – determines how to handle splitting when len(seq) is not divisible by num_splits. When False (default), the splits might have unequal lengths. When True, it may discard the last element in some splits to ensure they are equally long.
- Return type
List[FeatureSet]
- Returns
A list of FeatureSet pieces.
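The effect of drop_last is easiest to see on a plain list. The following is a minimal sketch of such a split, assuming the surplus items are assigned to the first splits; Lhotse's own implementation may distribute the remainder differently, and `split_sequence` here is an illustrative stand-alone function.

```python
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def split_sequence(seq: Iterable[T], num_splits: int, drop_last: bool = False) -> List[List[T]]:
    """Split ``seq`` into ``num_splits`` contiguous pieces."""
    items = list(seq)
    base, remainder = divmod(len(items), num_splits)
    splits = []
    start = 0
    for i in range(num_splits):
        # Without drop_last, the first ``remainder`` splits get one extra item.
        size = base if drop_last or i >= remainder else base + 1
        splits.append(items[start:start + size])
        start += size
    return splits


unequal = split_sequence(range(10), 3)                 # sizes 4, 3, 3
equal = split_sequence(range(10), 3, drop_last=True)   # sizes 3, 3, 3; item 9 is discarded
```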
- split_lazy(output_dir, chunk_size, prefix='')[source]¶
Splits a manifest (either lazily or eagerly opened) into chunks, each with chunk_size items (except for the last one, typically).
In order to be memory efficient, this implementation saves each chunk to disk in a .jsonl.gz format as the input manifest is sampled.
Note
For lowest memory usage, use load_manifest_lazy to open the input manifest for this method.
- Parameters
it – any iterable of Lhotse manifests.
output_dir (Union[Path, str]) – directory where the split manifests are saved. Each manifest is saved at: {output_dir}/{prefix}.{split_idx}.jsonl.gz
chunk_size (int) – the number of items in each chunk.
prefix (str) – the prefix of each manifest.
- Return type
List[FeatureSet]
- Returns
a list of lazily opened chunk manifests.
- shuffle(*args, **kwargs)[source]¶
Shuffles the elements and returns a shuffled variant of self. If the manifest is opened lazily, performs shuffling on-the-fly with a fixed buffer size.
- Parameters
rng – an optional instance of random.Random for precise control of randomness.
- Returns
a shuffled copy of self, or a manifest that is shuffled lazily.
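On-the-fly shuffling with a fixed buffer size can be sketched as follows. This is an illustration of the general technique, assuming a simple "fill the buffer, then swap out a random slot" strategy; the function name `buffered_shuffle` and the buffer size are hypothetical, and Lhotse's lazy shuffler may differ in detail.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def buffered_shuffle(
    items: Iterable[T], buffer_size: int = 4, rng: Optional[random.Random] = None
) -> Iterator[T]:
    """Approximately shuffle a stream while holding at most ``buffer_size`` items."""
    rng = rng or random.Random()
    buffer: List[T] = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)
            continue
        # Emit a random buffered item and put the new item in its place.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # The input is exhausted: flush what is left in random order.
    rng.shuffle(buffer)
    yield from buffer


shuffled = list(buffered_shuffle(range(10), buffer_size=4, rng=random.Random(0)))
```

Passing a seeded random.Random makes the order reproducible, which is the kind of control the rng parameter is meant to give.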
- subset(first=None, last=None)[source]¶
Return a new FeatureSet according to the selected subset criterion. Only a single argument to subset is supported at this time.
- Parameters
first (Optional[int]) – int, the number of first features to keep.
last (Optional[int]) – int, the number of last features to keep.
- Return type
- Returns
a new
FeatureSet
with the subset results.
- find(recording_id, channel_id=0, start=0.0, duration=None, leeway=0.05)[source]¶
Find and return a Features object that best satisfies the search criteria. Raise a KeyError when no such object is available.
- Parameters
recording_id (str) – str, requested recording ID.
channel_id (Union[int, List[int]]) – int, requested channel.
start (float) – float, requested start time in seconds for the feature chunk.
duration (Optional[float]) – optional float, requested duration in seconds for the feature chunk. By default, return everything from the start.
leeway (float) – float, controls how strictly we have to match the requested start and duration criteria. It is necessary to keep a small positive value here (default 0.05s), as there might be differences between the duration of recording/supervision segment, and the duration of features. The latter one is constrained to be a multiple of frame_shift, while the former can be arbitrary.
- Return type
- Returns
a Features object satisfying the search criteria.
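To see why a small positive leeway is needed, consider a supervision lasting 1.003 s whose features cover exactly 1.0 s (100 frames with a 10 ms frame shift). The sketch below shows one plausible way to apply the tolerance; the `Chunk` class and the `covers` function are hypothetical and only approximate the matching logic, which is not spelled out in this reference.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    """Hypothetical stand-in for the time span of a stored feature chunk."""
    start: float
    duration: float

    @property
    def end(self) -> float:
        return self.start + self.duration


def covers(chunk: Chunk, start: float, duration: Optional[float], leeway: float) -> bool:
    """Check whether ``chunk`` spans the requested region, up to ``leeway`` seconds."""
    if start < chunk.start - leeway:
        return False
    if duration is None:
        # No duration requested: the start just has to fall inside the chunk.
        return start <= chunk.end + leeway
    return start + duration <= chunk.end + leeway


chunk = Chunk(start=0.0, duration=1.0)  # 100 frames with a 10 ms frame shift
with_leeway = covers(chunk, start=0.0, duration=1.003, leeway=0.05)
without_leeway = covers(chunk, start=0.0, duration=1.003, leeway=0.0)
```

With the default-sized leeway the request is satisfied, while a leeway of zero rejects it because the features are 3 ms shorter than the requested segment.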
- load(recording_id, channel_id=0, start=0.0, duration=None)[source]¶
Find a Features object that best satisfies the search criteria and load the features as a numpy ndarray. Raise a KeyError when no such object is available.
- Return type
ndarray
- copy_feats(writer)[source]¶
For each manifest in this FeatureSet, read the referenced feature array and save it using writer. Returns a copy of the manifest with updated fields related to the feature storage.
- Return type
- compute_global_stats(storage_path=None)[source]¶
Compute the global means and standard deviations for each feature bin in the manifest. It follows the implementation in scikit-learn: https://github.com/scikit-learn/scikit-learn/blob/0fb307bf39bbdacd6ed713c00724f8f871d60370/sklearn/utils/extmath.py#L715 which follows the paper: “Algorithms for computing the sample variance: analysis and recommendations”, by Chan, Golub, and LeVeque.
- Parameters
storage_path (Union[Path, str, None]) – an optional path to a file where the stats will be stored with pickle.
- Returns
a dict of {‘norm_means’: np.ndarray, ‘norm_stds’: np.ndarray} with the shape of the arrays equal to the number of feature bins in this manifest.
- Return type
Dict
[str
,ndarray
]
- filter(predicate)¶
Return a new manifest containing only the items that satisfy predicate. If the manifest is lazy, the filtering will also be applied lazily.
- Parameters
predicate (Callable[[~T], bool]) – a function that takes an item as an argument and returns bool.
- Returns
a filtered manifest.
- classmethod from_file(path)¶
- Return type
Any
- classmethod from_json(path)¶
- Return type
Any
- classmethod from_jsonl(path)¶
- Return type
Any
- classmethod from_jsonl_lazy(path)¶
Read a JSONL manifest in a lazy manner, which opens the file but does not read it immediately. It is only suitable for sequential reads and iteration.
Warning
Opening the manifest in this way might cause some methods that rely on random access to fail.
- Return type
Any
- classmethod from_yaml(path)¶
- Return type
Any
- property is_lazy: bool¶
Indicates whether this manifest was opened in lazy (read-on-the-fly) mode or not.
- Return type
bool
- map(transform_fn)¶
Apply transform_fn to each item in this manifest and return a new manifest. If the manifest is opened lazily, the transform is also applied lazily.
- Parameters
transform_fn (Callable[[~T], ~T]) – A callable (function) that accepts a single item instance and returns a new (or the same) instance of the same type. E.g. with CutSet, the callable accepts a Cut and also returns a Cut.
- Returns
a new CutSet with transformed cuts.
- classmethod mux(*manifests, stop_early=False, weights=None, seed=0)¶
Merges multiple manifest iterables into a new iterable by lazily multiplexing them during iteration time. If one of the iterables is exhausted before the others, we will keep iterating until all iterables are exhausted. This behavior can be changed with stop_early parameter.
- Parameters
manifests – iterables to be multiplexed. They can be either lazy or eager, but the resulting manifest will always be lazy.
stop_early (bool) – should we stop the iteration as soon as we exhaust one of the manifests.
weights (Optional[List[Union[int, float]]]) – an optional weight for each iterable, affects the probability of it being sampled. The weights are uniform by default. If lengths are known, it makes sense to pass them here for uniform distribution of items in the expectation.
seed (int) – the random seed, ensures deterministic order across multiple iterations.
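Lazy multiplexing can be sketched with plain Python iterators. This is an illustration of the idea only, assuming that a source is picked at random (proportionally to its weight) for every emitted item; the stand-alone `mux` function below is not Lhotse's implementation and ignores manifest-specific details.

```python
import random
from typing import Iterable, Iterator, Optional, Sequence, TypeVar

T = TypeVar("T")


def mux(
    *iterables: Iterable[T],
    weights: Optional[Sequence[float]] = None,
    stop_early: bool = False,
    seed: int = 0,
) -> Iterator[T]:
    """Interleave several iterables by sampling the next source at random."""
    rng = random.Random(seed)
    iterators = [iter(it) for it in iterables]
    source_weights = list(weights) if weights is not None else [1.0] * len(iterators)
    active = list(range(len(iterators)))
    while active:
        idx = rng.choices(active, weights=[source_weights[i] for i in active], k=1)[0]
        try:
            yield next(iterators[idx])
        except StopIteration:
            if stop_early:
                return  # stop as soon as any source runs out
            active.remove(idx)  # otherwise keep draining the remaining sources


merged = list(mux([1, 2, 3], [10, 20], seed=0))
early = list(mux([1, 2, 3], [10, 20], stop_early=True, seed=0))
```

Because the generator owns a seeded random.Random, iterating twice with the same seed yields the same order, and nothing is read from the sources until the result is iterated.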
- classmethod open_writer(path, overwrite=True)¶
Open a sequential writer that allows storing the manifests one by one, without the necessity of storing the whole manifest set in-memory. Supports writing to JSONL format (.jsonl), with optional gzip compression (.jsonl.gz).
Note
When path is None, we will return an InMemoryWriter instead, which has the same API but stores the manifests in memory. It is convenient when you want to make disk saving optional.
Example:
>>> from lhotse import RecordingSet
>>> recordings = [...]
>>> with RecordingSet.open_writer('recordings.jsonl.gz') as writer:
...     for recording in recordings:
...         writer.write(recording)
This writer can be useful for continuing to write files that were previously stopped – it will open the existing file and scan it for item IDs to skip writing them later. It can also be queried for existing IDs so that the user code may skip preparing the corresponding manifests.
Example:
>>> from pathlib import Path
>>> from lhotse import RecordingSet, Recording
>>> with RecordingSet.open_writer('recordings.jsonl.gz', overwrite=False) as writer:
...     for path in Path('.').rglob('*.wav'):
...         recording_id = path.stem
...         if writer.contains(recording_id):
...             # Item already written previously - skip processing.
...             continue
...         # Item doesn't exist yet - run extra work to prepare the manifest
...         # and store it.
...         recording = Recording.from_file(path, recording_id=recording_id)
...         writer.write(recording)
- Return type
Union
[SequentialJsonlWriter
,InMemoryWriter
]
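The resumption behaviour described for open_writer (scan the existing file for IDs, then skip them) can be illustrated with a small stdlib-only writer. This is a sketch under stated assumptions, not Lhotse's SequentialJsonlWriter: the class name `ResumableJsonlWriter`, the uncompressed JSONL format and the dict-based items are all hypothetical simplifications.

```python
import json
import os
import tempfile


class ResumableJsonlWriter:
    """Toy JSONL writer that can continue a previously interrupted job."""

    def __init__(self, path: str, overwrite: bool = True):
        self.path = path
        self.seen_ids = set()
        if not overwrite and os.path.exists(path):
            # Scan the existing file so that known IDs can be skipped later.
            with open(path) as f:
                for line in f:
                    self.seen_ids.add(json.loads(line)["id"])
        self.mode = "w" if overwrite else "a"

    def __enter__(self):
        self.file = open(self.path, self.mode)
        return self

    def __exit__(self, *exc):
        self.file.close()

    def contains(self, item_id: str) -> bool:
        return item_id in self.seen_ids

    def write(self, item: dict) -> bool:
        if item["id"] in self.seen_ids:
            return False  # already stored: nothing to do
        self.file.write(json.dumps(item) + "\n")
        self.seen_ids.add(item["id"])
        return True


path = os.path.join(tempfile.mkdtemp(), "items.jsonl")
with ResumableJsonlWriter(path) as writer:
    writer.write({"id": "a"})
    writer.write({"id": "b"})

# Resume: existing IDs are detected and not written again.
with ResumableJsonlWriter(path, overwrite=False) as writer:
    already_there = writer.contains("a")
    writer.write({"id": "a"})
    writer.write({"id": "c"})

with open(path) as f:
    stored_ids = [json.loads(line)["id"] for line in f]
```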
- repeat(times=None, preserve_id=False)¶
Return a new, lazily evaluated manifest that iterates over the original elements times number of times.
- Parameters
times (Optional[int]) – how many times to repeat (infinite by default).
preserve_id (bool) – when True, we won’t update the element ID with repeat number.
- Returns
a repeated manifest.
- to_eager()¶
Evaluates all lazy operations on this manifest, if any, and returns a copy that keeps all items in memory. If the manifest was “eager” already, this is a no-op and won’t copy anything.
- to_file(path)¶
- Return type
None
- to_json(path)¶
- Return type
None
- to_jsonl(path)¶
- Return type
None
- to_yaml(path)¶
- Return type
None
- class lhotse.features.base.FeatureSetBuilder(feature_extractor, storage, augment_fn=None)[source]¶
An extended constructor for the FeatureSet. Think of it as a class wrapper for a feature extraction script. It consumes an iterable of Recordings, extracts the features specified by the FeatureExtractor config, and stores them on disk.
Eventually, we plan to extend it with the capability to extract only the features in specified regions of recordings and to perform some time-domain data augmentation.
- lhotse.features.base.store_feature_array(feats, storage)[source]¶
Store feats array on disk, using lilcom compression by default.
- Parameters
feats (ndarray) – a numpy ndarray containing features.
storage (FeaturesWriter) – a FeaturesWriter object to use for array storage.
- Return type
str
- Returns
a path to the file containing the stored array.
- lhotse.features.base.compute_global_stats(feature_manifests, storage_path=None)[source]¶
Compute the global means and standard deviations for each feature bin in the manifest. It performs only a single pass over the data and iteratively updates the estimate of the means and variances.
We follow the implementation in scikit-learn: https://github.com/scikit-learn/scikit-learn/blob/0fb307bf39bbdacd6ed713c00724f8f871d60370/sklearn/utils/extmath.py#L715 which follows the paper: “Algorithms for computing the sample variance: analysis and recommendations”, by Chan, Golub, and LeVeque.
- Parameters
feature_manifests (Iterable[Features]) – an iterable of Features objects.
storage_path (Union[Path, str, None]) – an optional path to a file where the stats will be stored with pickle.
- Returns
a dict of {‘norm_means’: np.ndarray, ‘norm_stds’: np.ndarray} with the shape of the arrays equal to the number of feature bins in this manifest.
- Return type
Dict
[str
,ndarray
]
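The single-pass update referred to above merges the statistics of each new feature matrix into running totals using the pairwise formulas of Chan, Golub, and LeVeque. Below is a dependency-free sketch of that update, assuming population (not sample) variance and feature matrices given as lists of frames; the function name `global_stats` is hypothetical and the code is not Lhotse's implementation.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def global_stats(batches: Iterable[Sequence[Sequence[float]]]) -> Tuple[List[float], List[float]]:
    """Return per-bin means and standard deviations computed in one pass.

    Each batch is a feature matrix given as a list of frames (rows).
    """
    count = 0
    means: List[float] = []
    m2: List[float] = []  # running sum of squared deviations for each bin
    for batch in batches:
        n_b = len(batch)
        if n_b == 0:
            continue
        num_bins = len(batch[0])
        batch_means = [sum(frame[k] for frame in batch) / n_b for k in range(num_bins)]
        batch_m2 = [
            sum((frame[k] - batch_means[k]) ** 2 for frame in batch) for k in range(num_bins)
        ]
        if count == 0:
            count, means, m2 = n_b, batch_means, batch_m2
            continue
        total = count + n_b
        for k in range(num_bins):
            # Merge the running statistics with those of the new batch.
            delta = batch_means[k] - means[k]
            means[k] += delta * n_b / total
            m2[k] += batch_m2[k] + delta ** 2 * count * n_b / total
        count = total
    stds = [math.sqrt(v / count) for v in m2]
    return means, stds


# Two "feature matrices" with two bins each: 2 frames and 1 frame.
norm_means, norm_stds = global_stats([[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]]])
```

For the first bin the values seen are 1, 3 and 5, so the mean is 3 and the population variance is 8/3; no feature matrix has to be kept in memory once it has been merged.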
Lhotse’s feature extractors¶
- class lhotse.features.kaldi.extractors.Fbank(config=None)[source]¶
- name = 'kaldi-fbank'¶
- config_type¶
alias of
lhotse.features.kaldi.extractors.FbankConfig
- property frame_shift: float¶
- Return type
float
- extract(samples, sampling_rate)[source]¶
Defines how to extract features using a numpy ndarray of audio samples and the sampling rate.
- Return type
Union[ndarray, Tensor]
- Returns
a numpy ndarray representing the feature matrix.
- static mix(features_a, features_b, energy_scaling_factor_b)[source]¶
Perform feature-domain mix of two signals, a and b, and return the mixed signal.
- Parameters
features_a (ndarray) – Left-hand side (reference) signal.
features_b (ndarray) – Right-hand side (mixed-in) signal.
energy_scaling_factor_b (float) – A scaling factor for features_b energy. It is used to achieve a specific SNR. E.g. to mix with an SNR of 10dB when both features_a and features_b energies are 100, the features_b signal energy needs to be scaled by 0.1. Since different features (e.g. spectrogram, fbank, MFCC) require different combination of transformations (e.g. exp, log, sqrt, pow) to allow mixing of two signals, the exact place where to apply energy_scaling_factor_b to the signal is determined by the implementer.
- Return type
ndarray
- Returns
A mixed feature matrix.
- static compute_energy(features)[source]¶
Compute the total energy of a feature matrix. How the energy is computed depends on a particular type of features. It is expected that when implemented, compute_energy will never return zero.
- Parameters
features (ndarray) – A feature matrix.
- Return type
float
- Returns
A positive float value of the signal energy.
Kaldi feature extractors as network layers¶
This whole module is authored and contributed by Jesus Villalba, with minor changes by Piotr Żelasko to make it more consistent with Lhotse.
It contains a PyTorch implementation of feature extractors that is very close to Kaldi’s – notably, it differs in that the preemphasis and DC offset removal are applied in the time, rather than frequency domain. This should not significantly affect any results, as confirmed by Jesus.
This implementation works well with autograd and batching, and can be used as neural network layers.
Update January 2022: These modules now expose a new API function called “online_inference” that may be used to compute the features when the audio is streaming. The implementation is stateless, and passes the waveform remainders back to the user to feed them to the modules once new data becomes available. The implementation is compatible with JIT scripting via TorchScript.
- class lhotse.features.kaldi.layers.Wav2Win(sampling_rate=16000, frame_length=0.025, frame_shift=0.01, pad_length=None, remove_dc_offset=True, preemph_coeff=0.97, window_type='povey', dither=0.0, snip_edges=False, energy_floor=1e-10, raw_energy=True, return_log_energy=False)[source]¶
Apply standard Kaldi preprocessing (dithering, removing DC offset, pre-emphasis, etc.) on the input waveforms and partition them into overlapping frames (of audio samples). Note: no feature extraction happens in here, the output is still a time-domain signal.
Example:
>>> x = torch.randn(1, 16000, dtype=torch.float32)
>>> x.shape
torch.Size([1, 16000])
>>> t = Wav2Win()
>>> t(x).shape
torch.Size([1, 100, 400])
The input is a tensor of shape (batch_size, num_samples). The output is a tensor of shape (batch_size, num_frames, window_length). When return_log_energy==True, returns a tuple where the second element is a log-energy tensor of shape (batch_size, num_frames).
- __init__(sampling_rate=16000, frame_length=0.025, frame_shift=0.01, pad_length=None, remove_dc_offset=True, preemph_coeff=0.97, window_type='povey', dither=0.0, snip_edges=False, energy_floor=1e-10, raw_energy=True, return_log_energy=False)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
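The output shape in the Wav2Win example (1 x 100 x 400 for one second of 16 kHz audio) can be checked by hand. The sketch below assumes the layer follows Kaldi's frame-counting convention for snip_edges; the helper `num_frames` is hypothetical and only reproduces the arithmetic with the default constructor values.

```python
def num_frames(
    num_samples: int,
    sampling_rate: int = 16000,
    frame_length: float = 0.025,
    frame_shift: float = 0.01,
    snip_edges: bool = False,
) -> int:
    """Number of frames under Kaldi's convention (assumed to apply here)."""
    window = int(round(frame_length * sampling_rate))  # 400 samples by default
    shift = int(round(frame_shift * sampling_rate))    # 160 samples by default
    if snip_edges:
        # Only frames that fit entirely inside the signal are kept.
        if num_samples < window:
            return 0
        return 1 + (num_samples - window) // shift
    # Otherwise the count depends only on the frame shift.
    return (num_samples + shift // 2) // shift


frames_default = num_frames(16000)                   # snip_edges=False
frames_snipped = num_frames(16000, snip_edges=True)
```

With the defaults, one second of audio gives 100 frames of 400 samples each, which agrees with the shape printed in the example; under this convention, enabling snip_edges would give 98 frames instead.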
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- Return type
Tuple[Tensor, Optional[Tensor]]
- online_inference(x, context=None)[source]¶
The same as the forward() method, except it accepts an extra argument with the remainder waveform from the previous call of online_inference(), and returns a tuple of ((frames, log_energy), remainder).
- Return type
Tuple[Tuple[Tensor, Optional[Tensor]], Tensor]
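The stateless streaming pattern, in which the caller keeps the waveform remainder and passes it back with the next chunk, can be illustrated on plain lists. This is a simplified sketch: `frame_stream` is a hypothetical function that frames without any edge padding, windowing or energy computation, so it mirrors only the remainder-passing contract and not the layer's actual processing.

```python
from typing import List, Optional, Sequence, Tuple


def frame_stream(
    chunk: Sequence[float],
    context: Optional[List[float]] = None,
    frame_length: int = 4,
    frame_shift: int = 2,
) -> Tuple[List[List[float]], List[float]]:
    """Cut complete frames out of ``context + chunk`` and return the leftover samples."""
    samples = (context or []) + list(chunk)
    frames = []
    start = 0
    while start + frame_length <= len(samples):
        frames.append(samples[start:start + frame_length])
        start += frame_shift
    # Samples from ``start`` onwards are still needed by the next frame.
    return frames, samples[start:]


signal = list(range(10))

# Offline: process everything at once.
offline_frames, _ = frame_stream(signal)

# Online: process two chunks, feeding the remainder back in.
first_frames, remainder = frame_stream(signal[:5])
second_frames, remainder = frame_stream(signal[5:], context=remainder)
online_frames = first_frames + second_frames
```

Since the function keeps no internal state, the streamed frames match the offline ones as long as the caller forwards the remainder between calls.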
- T_destination¶
alias of TypeVar(‘T_destination’, bound=
Dict
[str
,Any
])
- add_module(name, module)¶
Adds a child module to the current module.
The module can be accessed as an attribute using the given name.
- Args:
- name (string): name of the child module. The child module can be
accessed from this module using the given name
module (Module): child module to be added to the module.
- Return type
None
- apply(fn)¶
Applies
fn
recursively to every submodule (as returned by.children()
) as well as self. Typical use includes initializing the parameters of a model (see also nn-init-doc).- Args:
fn (
Module
-> None): function to be applied to each submodule- Returns:
Module: self
Example:
>>> @torch.no_grad()
>>> def init_weights(m):
>>>     print(m)
>>>     if type(m) == nn.Linear:
>>>         m.weight.fill_(1.0)
>>>         print(m.weight)
>>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
>>> net.apply(init_weights)
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
- Return type
~T
- bfloat16()¶
Casts all floating point parameters and buffers to
bfloat16
datatype.Note
This method modifies the module in-place.
- Returns:
Module: self
- Return type
~T
- buffers(recurse=True)¶
Returns an iterator over module buffers.
- Args:
- recurse (bool): if True, then yields buffers of this module
and all submodules. Otherwise, yields only buffers that are direct members of this module.
- Yields:
torch.Tensor: module buffer
Example:
>>> for buf in model.buffers():
>>>     print(type(buf), buf.size())
<class 'torch.Tensor'> (20L,)
<class 'torch.Tensor'> (20L, 1L, 5L, 5L)
- Return type
Iterator
[Tensor
]
- children()¶
Returns an iterator over immediate children modules.
- Yields:
Module: a child module
- Return type
Iterator
[Module
]
- cpu()¶
Moves all model parameters and buffers to the CPU.
Note
This method modifies the module in-place.
- Returns:
Module: self
- Return type
~T
- cuda(device=None)¶
Moves all model parameters and buffers to the GPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on GPU while being optimized.
Note
This method modifies the module in-place.
- Args:
- device (int, optional): if specified, all parameters will be
copied to that device
- Returns:
Module: self
- Return type
~T
- double()¶
Casts all floating point parameters and buffers to
double
datatype.Note
This method modifies the module in-place.
- Returns:
Module: self
- Return type
~T
- dump_patches: bool = False¶
- eval()¶
Sets the module in evaluation mode.
This has an effect only on certain modules. See the documentation of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
This is equivalent to self.train(False).
See locally-disable-grad-doc for a comparison between .eval() and several similar mechanisms that may be confused with it.
- Returns:
Module: self
- Return type
~T
- extra_repr()¶
Set the extra representation of the module
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- Return type
str
- float()¶
Casts all floating point parameters and buffers to
float
datatype.Note
This method modifies the module in-place.
- Returns:
Module: self
- Return type
~T
- get_buffer(target)¶
Returns the buffer given by
target
if it exists, otherwise throws an error.See the docstring for
get_submodule
for a more detailed explanation of this method’s functionality as well as how to correctly specifytarget
.- Args:
- target: The fully-qualified string name of the buffer
to look for. (See
get_submodule
for how to specify a fully-qualified string.)
- Returns:
torch.Tensor: The buffer referenced by
target
- Raises:
- AttributeError: If the target string references an invalid
path or resolves to something that is not a buffer
- Return type
Tensor
- get_extra_state()¶
Returns any extra state to include in the module’s state_dict. Implement this and a corresponding set_extra_state() for your module if you need to store extra state. This function is called when building the module’s state_dict().
Note that extra state should be pickleable to ensure working serialization of the state_dict. We only provide backwards compatibility guarantees for serializing Tensors; other objects may break backwards compatibility if their serialized pickled form changes.
- Returns:
object: Any extra state to store in the module’s state_dict
- Return type
Any
- get_parameter(target)¶
Returns the parameter given by
target
if it exists, otherwise throws an error.See the docstring for
get_submodule
for a more detailed explanation of this method’s functionality as well as how to correctly specifytarget
.- Args:
- target: The fully-qualified string name of the Parameter
to look for. (See
get_submodule
for how to specify a fully-qualified string.)
- Returns:
torch.nn.Parameter: The Parameter referenced by
target
- Raises:
- AttributeError: If the target string references an invalid
path or resolves to something that is not an
nn.Parameter
- Return type
Parameter
- get_submodule(target)¶
Returns the submodule given by target if it exists, otherwise throws an error.
For example, let’s say you have an nn.Module A that looks like this:
A(
    (net_b): Module(
        (net_c): Module(
            (conv): Conv2d(16, 33, kernel_size=(3, 3), stride=(2, 2))
        )
        (linear): Linear(in_features=100, out_features=200, bias=True)
    )
)
(The diagram shows an nn.Module A. A has a nested submodule net_b, which itself has two submodules net_c and linear. net_c then has a submodule conv.)
To check whether or not we have the linear submodule, we would call get_submodule("net_b.linear"). To check whether we have the conv submodule, we would call get_submodule("net_b.net_c.conv").
The runtime of get_submodule is bounded by the degree of module nesting in target. A query against named_modules achieves the same result, but it is O(N) in the number of transitive modules. So, for a simple check to see if some submodule exists, get_submodule should always be used.
- Args:
- target: The fully-qualified string name of the submodule
to look for. (See above example for how to specify a fully-qualified string.)
- Returns:
torch.nn.Module: The submodule referenced by
target
- Raises:
- AttributeError: If the target string references an invalid
path or resolves to something that is not an
nn.Module
- Return type
Module
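The claim that the lookup cost is bounded by the nesting depth of target follows from resolving the path one component at a time. The following sketch shows that traversal on plain Python objects; `get_by_path` is a hypothetical helper and `SimpleNamespace` objects stand in for modules, so this is not PyTorch's implementation.

```python
from types import SimpleNamespace


def get_by_path(root, target: str):
    """Follow a dot-separated attribute path, one lookup per path component."""
    current = root
    if target == "":
        return current
    for name in target.split("."):
        if not hasattr(current, name):
            raise AttributeError(f"{type(current).__name__} has no attribute {name!r}")
        current = getattr(current, name)
    return current


# Same nesting as in the diagram above: A -> net_b -> (net_c -> conv, linear).
conv = SimpleNamespace(kind="conv")
linear = SimpleNamespace(kind="linear")
a = SimpleNamespace(net_b=SimpleNamespace(net_c=SimpleNamespace(conv=conv), linear=linear))

found_linear = get_by_path(a, "net_b.linear")
found_conv = get_by_path(a, "net_b.net_c.conv")
```

Resolving "net_b.net_c.conv" takes three attribute lookups regardless of how many other submodules exist, whereas scanning every named module would visit all of them.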
- half()¶
Casts all floating point parameters and buffers to
half
datatype.Note
This method modifies the module in-place.
- Returns:
Module: self
- Return type
~T
- ipu(device=None)¶
Moves all model parameters and buffers to the IPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on IPU while being optimized.
Note
This method modifies the module in-place.
- Arguments:
- device (int, optional): if specified, all parameters will be
copied to that device
- Returns:
Module: self
- Return type
~T
- load_state_dict(state_dict, strict=True)¶
Copies parameters and buffers from state_dict into this module and its descendants. If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.
- Args:
state_dict (dict): a dict containing parameters and persistent buffers.
strict (bool, optional): whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: True
- Returns:
NamedTuple with missing_keys and unexpected_keys fields:
missing_keys is a list of str containing the missing keys
unexpected_keys is a list of str containing the unexpected keys
- Note:
If a parameter or buffer is registered as None and its corresponding key exists in state_dict, load_state_dict() will raise a RuntimeError.
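The meaning of missing_keys and unexpected_keys can be illustrated by comparing two lists of key names. This sketch only models the bookkeeping: `KeyReport` and `compare_keys` are hypothetical names, no tensors are copied, and raising RuntimeError in strict mode is an assumption made for the illustration.

```python
from typing import Iterable, List, NamedTuple


class KeyReport(NamedTuple):
    missing_keys: List[str]
    unexpected_keys: List[str]


def compare_keys(expected: Iterable[str], provided: Iterable[str], strict: bool = True) -> KeyReport:
    """Compare the keys a module expects with the keys found in a state dict."""
    expected, provided = list(expected), list(provided)
    missing = [k for k in expected if k not in provided]
    unexpected = [k for k in provided if k not in expected]
    if strict and (missing or unexpected):
        # Assumed behaviour for this sketch: a strict mismatch is an error.
        raise RuntimeError(f"missing keys: {missing}, unexpected keys: {unexpected}")
    return KeyReport(missing, unexpected)


report = compare_keys(["weight", "bias"], ["weight", "scale"], strict=False)
```

Here "bias" is reported as missing because the module expects it but the state dict lacks it, and "scale" is reported as unexpected because no such key exists in the module.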
- modules()¶
Returns an iterator over all modules in the network.
- Yields:
Module: a module in the network
- Note:
Duplicate modules are returned only once. In the following example,
l
will be returned only once.
Example:
>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.modules()):
...     print(idx, '->', m)
0 -> Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
1 -> Linear(in_features=2, out_features=2, bias=True)
- Return type
Iterator
[Module
]
- named_buffers(prefix='', recurse=True)¶
Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.
- Args:
prefix (str): prefix to prepend to all buffer names.
recurse (bool): if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.
- Yields:
(string, torch.Tensor): Tuple containing the name and buffer
Example:
>>> for name, buf in self.named_buffers():
>>>     if name in ['running_var']:
>>>         print(buf.size())
- Return type
Iterator
[Tuple
[str
,Tensor
]]
- named_children()¶
Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.
- Yields:
(string, Module): Tuple containing a name and child module
Example:
>>> for name, module in model.named_children():
>>>     if name in ['conv4', 'conv5']:
>>>         print(module)
- Return type
Iterator
[Tuple
[str
,Module
]]
- named_modules(memo=None, prefix='', remove_duplicate=True)¶
Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.
- Args:
- memo: a memo to store the set of modules already added to the result
- prefix: a prefix that will be added to the name of the module
- remove_duplicate: whether to remove the duplicated module instances in the result or not
- Yields:
(string, Module): Tuple of name and module
- Note:
Duplicate modules are returned only once. In the following example, l will be returned only once.
Example:
>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.named_modules()):
...     print(idx, '->', m)
0 -> ('', Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
))
1 -> ('0', Linear(in_features=2, out_features=2, bias=True))
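To make the effect of `remove_duplicate` concrete, here is a small sketch that reuses the same shared-layer setup. The variable names are illustrative only.

```python
import torch
from torch import nn

l = nn.Linear(2, 2)
net = nn.Sequential(l, l)  # the same layer object registered twice

# Default: each distinct module object is yielded once.
deduped = [name for name, _ in net.named_modules()]

# remove_duplicate=False: every registered name is yielded,
# even when two names point at the same object.
everything = [name for name, _ in net.named_modules(remove_duplicate=False)]
print(deduped, everything)
```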
- named_parameters(prefix='', recurse=True)¶
Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
- Args:
- prefix (str): prefix to prepend to all parameter names.
- recurse (bool): if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.
- Yields:
(string, Parameter): Tuple containing the name and parameter
Example:
>>> for name, param in self.named_parameters():
>>>     if name in ['bias']:
>>>         print(param.size())
- Return type
Iterator[Tuple[str, Parameter]]
- parameters(recurse=True)¶
Returns an iterator over module parameters.
This is typically passed to an optimizer.
- Args:
- recurse (bool): if True, then yields parameters of this module
and all submodules. Otherwise, yields only parameters that are direct members of this module.
- Yields:
Parameter: module parameter
Example:
>>> for param in model.parameters():
>>>     print(type(param), param.size())
<class 'torch.Tensor'> (20L,)
<class 'torch.Tensor'> (20L, 1L, 5L, 5L)
- Return type
Iterator[Parameter]
- register_backward_hook(hook)¶
Registers a backward hook on the module.
This function is deprecated in favor of register_full_backward_hook() and the behavior of this function will change in future versions.
- Returns:
torch.utils.hooks.RemovableHandle: a handle that can be used to remove the added hook by calling handle.remove()
- Return type
RemovableHandle
- register_buffer(name, tensor, persistent=True)¶
Adds a buffer to the module.
This is typically used to register a buffer that should not be considered a model parameter. For example, BatchNorm’s running_mean is not a parameter, but is part of the module’s state. Buffers, by default, are persistent and will be saved alongside parameters. This behavior can be changed by setting persistent to False. The only difference between a persistent buffer and a non-persistent buffer is that the latter will not be a part of this module’s state_dict.
Buffers can be accessed as attributes using given names.
- Args:
- name (string): name of the buffer. The buffer can be accessed from this module using the given name
- tensor (Tensor or None): buffer to be registered. If None, then operations that run on buffers, such as cuda, are ignored. If None, the buffer is not included in the module’s state_dict.
- persistent (bool): whether the buffer is part of this module’s state_dict.
Example:
>>> self.register_buffer('running_mean', torch.zeros(num_features))
- Return type
None
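The difference between persistent and non-persistent buffers can be sketched as follows. The `RunningStats` class and its buffer names are hypothetical and exist only for this illustration.

```python
import torch
from torch import nn


class RunningStats(nn.Module):
    """Hypothetical module holding one persistent and one non-persistent buffer."""

    def __init__(self, num_features: int):
        super().__init__()
        # Saved in state_dict (persistent=True is the default).
        self.register_buffer("running_mean", torch.zeros(num_features))
        # Still moved by .to()/.cuda(), but left out of state_dict.
        self.register_buffer("scratch", torch.zeros(num_features), persistent=False)


m = RunningStats(4)
saved_keys = list(m.state_dict().keys())
buffer_names = [name for name, _ in m.named_buffers()]
num_params = len(list(m.parameters()))  # buffers are not parameters
print(saved_keys, buffer_names, num_params)
```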
- register_forward_hook(hook)¶
Registers a forward hook on the module.
The hook will be called every time after forward() has computed an output. It should have the following signature:
hook(module, input, output) -> None or modified output
The input contains only the positional arguments given to the module. Keyword arguments are passed only to forward, not to the hooks. The hook can modify the output. It can modify the input in place, but this has no effect on forward, since the hook is called after forward() has run.
- Returns:
torch.utils.hooks.RemovableHandle: a handle that can be used to remove the added hook by calling handle.remove()
- Return type
RemovableHandle
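A minimal sketch of a forward hook that records output shapes, and of removing it through the returned handle. The hook name `save_shape` and the list `captured` are assumptions made for the example.

```python
import torch
from torch import nn

captured = []


def save_shape(module, inputs, output):
    # Returning None leaves the output unchanged.
    captured.append(tuple(output.shape))


layer = nn.Linear(3, 5)
handle = layer.register_forward_hook(save_shape)

layer(torch.zeros(2, 3))   # hook fires
handle.remove()
layer(torch.zeros(4, 3))   # hook no longer fires
print(captured)
```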
- register_forward_pre_hook(hook)¶
Registers a forward pre-hook on the module.
The hook will be called every time before forward() is invoked. It should have the following signature:
hook(module, input) -> None or modified input
The input contains only the positional arguments given to the module. Keyword arguments are passed only to forward, not to the hooks. The hook can modify the input. The hook can return either a tuple or a single modified value. A single returned value is wrapped into a tuple (unless that value is already a tuple).
- Returns:
torch.utils.hooks.RemovableHandle: a handle that can be used to remove the added hook by calling handle.remove()
- Return type
RemovableHandle
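The following sketch shows a pre-hook that returns a single modified value, which is then wrapped into a tuple before `forward()` runs. `nn.Identity` is used only so the effect on the input is directly visible; the hook name is hypothetical.

```python
import torch
from torch import nn


def double_input(module, inputs):
    # `inputs` is always a tuple of the positional arguments.
    (x,) = inputs
    return x * 2  # a single value; it is wrapped into a tuple


layer = nn.Identity()
handle = layer.register_forward_pre_hook(double_input)
out = layer(torch.ones(2))
handle.remove()
print(out)
```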
- register_full_backward_hook(hook)¶
Registers a backward hook on the module.
The hook will be called every time the gradients with respect to module inputs are computed. The hook should have the following signature:
hook(module, grad_input, grad_output) -> tuple(Tensor) or None
The grad_input and grad_output are tuples that contain the gradients with respect to the inputs and outputs respectively. The hook should not modify its arguments, but it can optionally return a new gradient with respect to the input that will be used in place of grad_input in subsequent computations. grad_input will only correspond to the inputs given as positional arguments and all kwarg arguments are ignored. Entries in grad_input and grad_output will be None for all non-Tensor arguments.
For technical reasons, when this hook is applied to a Module, its forward function will receive a view of each Tensor passed to the Module. Similarly the caller will receive a view of each Tensor returned by the Module’s forward function.
Warning
Modifying inputs or outputs inplace is not allowed when using backward hooks and will raise an error.
- Returns:
torch.utils.hooks.RemovableHandle: a handle that can be used to remove the added hook by calling handle.remove()
- Return type
RemovableHandle
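As an illustration of the hook signature, the sketch below records the shapes of `grad_input` and `grad_output` for a small linear layer. The input is marked `requires_grad=True` so that a gradient with respect to the module input is actually computed. Names such as `record` and `seen` are assumptions.

```python
import torch
from torch import nn

seen = []


def record(module, grad_input, grad_output):
    # Both arguments are tuples; here each holds a single tensor.
    seen.append((tuple(grad_input[0].shape), tuple(grad_output[0].shape)))


layer = nn.Linear(3, 1)
handle = layer.register_full_backward_hook(record)

x = torch.ones(2, 3, requires_grad=True)
layer(x).sum().backward()
handle.remove()
print(seen)
```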
- register_load_state_dict_post_hook(hook)¶
Registers a post hook to be run after module’s load_state_dict is called.
It should have the following signature:
hook(module, incompatible_keys) -> None
The module argument is the current module that this hook is registered on, and the incompatible_keys argument is a NamedTuple consisting of attributes missing_keys and unexpected_keys. missing_keys is a list of str containing the missing keys and unexpected_keys is a list of str containing the unexpected keys.
The given incompatible_keys can be modified inplace if needed.
Note that the checks performed when calling load_state_dict() with strict=True are affected by modifications the hook makes to missing_keys or unexpected_keys, as expected. Additions to either set of keys will result in an error being thrown when strict=True, and clearing out both missing and unexpected keys will avoid an error.
- Returns:
torch.utils.hooks.RemovableHandle: a handle that can be used to remove the added hook by calling handle.remove()
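A sketch of the in-place modification described above: the hook removes a known-missing key so that a strict load of a partial state dict does not raise. The hook name and the choice of dropping `"bias"` are purely illustrative, and the method needs a PyTorch version that provides `register_load_state_dict_post_hook`.

```python
import torch
from torch import nn


def ignore_missing_bias(module, incompatible_keys):
    # Mutate the list in place; the hook must return None.
    if "bias" in incompatible_keys.missing_keys:
        incompatible_keys.missing_keys.remove("bias")


layer = nn.Linear(2, 2)
handle = layer.register_load_state_dict_post_hook(ignore_missing_bias)

partial = {"weight": torch.zeros(2, 2)}  # no "bias" entry
result = layer.load_state_dict(partial, strict=True)  # would raise without the hook
handle.remove()
print(result.missing_keys, result.unexpected_keys)
```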
- register_module(name, module)¶
Alias for add_module().
- Return type
None
- register_parameter(name, param)¶
Adds a parameter to the module.
The parameter can be accessed as an attribute using given name.
- Args:
- name (string): name of the parameter. The parameter can be accessed
from this module using the given name
- param (Parameter or None): parameter to be added to the module. If None, then operations that run on parameters, such as cuda, are ignored. If None, the parameter is not included in the module’s state_dict.
- Return type
None
- requires_grad_(requires_grad=True)¶
Change whether autograd should record operations on parameters in this module.
This method sets the parameters’ requires_grad attributes in-place.
This method is helpful for freezing part of the module for finetuning or training parts of a model individually (e.g., GAN training).
See locally-disable-grad-doc for a comparison between .requires_grad_() and several similar mechanisms that may be confused with it.
- Args:
- requires_grad (bool): whether autograd should record operations on parameters in this module. Default: True.
- Returns:
Module: self
- Return type
~T
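Freezing part of a model, as mentioned above, might look like the following sketch. The two-layer `model` is a made-up example.

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))

# Freeze the first layer only; the call returns the module itself.
model[0].requires_grad_(False)

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)
```

Only the parameters listed in `trainable` would receive gradients during a backward pass.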
- set_extra_state(state)¶
This function is called from load_state_dict() to handle any extra state found within the state_dict. Implement this function and a corresponding get_extra_state() for your module if you need to store extra state within its state_dict.
- Args:
state (dict): Extra state from the state_dict
- share_memory()¶
See torch.Tensor.share_memory_()
- Return type
~T
- state_dict(*args, destination=None, prefix='', keep_vars=False)¶
Returns a dictionary containing a whole state of the module.
Both parameters and persistent buffers (e.g. running averages) are included. Keys are corresponding parameter and buffer names. Parameters and buffers set to None are not included.
Warning
Currently state_dict() also accepts positional arguments for destination, prefix and keep_vars in order. However, this is being deprecated and keyword arguments will be enforced in future releases.
Warning
Please avoid the use of argument destination as it is not designed for end-users.
- Args:
- destination (dict, optional): If provided, the state of module will be updated into the dict and the same object is returned. Otherwise, an OrderedDict will be created and returned. Default: None.
- prefix (str, optional): a prefix added to parameter and buffer names to compose the keys in state_dict. Default: ''.
- keep_vars (bool, optional): by default the Tensors returned in the state dict are detached from autograd. If it’s set to True, detaching will not be performed. Default: False.
- Returns:
dict: a dictionary containing a whole state of the module
Example:
>>> module.state_dict().keys()
['bias', 'weight']
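The `prefix` and `keep_vars` arguments can be illustrated with a short sketch. The prefix string `"encoder."` is an arbitrary choice for the example.

```python
import torch
from torch import nn

layer = nn.Linear(2, 2)

# prefix is prepended to every key.
prefixed = layer.state_dict(prefix="encoder.")
keys = list(prefixed.keys())

# By default the returned tensors are detached from autograd...
detached = not prefixed["encoder.weight"].requires_grad
# ...while keep_vars=True returns the live Parameters.
kept = layer.state_dict(keep_vars=True)["weight"].requires_grad
print(keys, detached, kept)
```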
- to(*args, **kwargs)¶
Moves and/or casts the parameters and buffers.
This can be called as
- to(device=None, dtype=None, non_blocking=False)
- to(dtype, non_blocking=False)
- to(tensor, non_blocking=False)
- to(memory_format=torch.channels_last)
Its signature is similar to torch.Tensor.to(), but only accepts floating point or complex dtypes. In addition, this method will only cast the floating point or complex parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.
See below for examples.
Note
This method modifies the module in-place.
- Args:
- device (torch.device): the desired device of the parameters and buffers in this module
- dtype (torch.dtype): the desired floating point or complex dtype of the parameters and buffers in this module
- tensor (torch.Tensor): Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module
- memory_format (torch.memory_format): the desired memory format for 4D parameters and buffers in this module (keyword only argument)
- Returns:
Module: self
Examples:
>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)
>>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j,  0.2382+0.j],
        [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)
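The statement that only floating point or complex tensors are cast can be checked with a CPU-only sketch. The integer buffer `count` is a hypothetical addition made for the illustration.

```python
import torch
from torch import nn

layer = nn.Linear(2, 2)
# Hypothetical integer buffer, to show that integral tensors keep their dtype.
layer.register_buffer("count", torch.zeros(1, dtype=torch.long))

returned = layer.to(torch.float64)

weight_dtype = layer.weight.dtype
count_dtype = layer.count.dtype
same_object = returned is layer  # to() modifies in place and returns self
print(weight_dtype, count_dtype, same_object)
```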
- to_empty(*, device)¶
Moves the parameters and buffers to the specified device without copying storage.
- Args:
- device (torch.device): The desired device of the parameters and buffers in this module.
- Returns:
Module: self
- Return type
~T
- train(mode=True)¶
Sets the module in training mode.
This has an effect only on certain modules, e.g. Dropout, BatchNorm, etc. See the documentation of the particular modules for details of their behavior in training/evaluation mode.
- Args:
- mode (bool): whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns:
Module: self
- Return type
~T
- type(dst_type)¶
Casts all parameters and buffers to dst_type.
Note
This method modifies the module in-place.
- Args:
dst_type (type or string): the desired type
- Returns:
Module: self
- Return type
~T
- xpu(device=None)¶
Moves all model parameters and buffers to the XPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on XPU while being optimized.
Note
This method modifies the module in-place.
- Arguments:
- device (int, optional): if specified, all parameters will be
copied to that device
- Returns:
Module: self
- Return type
~T
- zero_grad(set_to_none=False)¶
Sets gradients of all model parameters to zero. See similar function under torch.optim.Optimizer for more context.
- Args:
- set_to_none (bool): instead of setting to zero, set the grads to None. See torch.optim.Optimizer.zero_grad() for details.
- Return type
None
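The two modes of `zero_grad` can be compared in a short sketch. `set_to_none` is passed explicitly in both calls so the snippet does not depend on the default value.

```python
import torch
from torch import nn

layer = nn.Linear(2, 1)
layer(torch.ones(1, 2)).sum().backward()  # populate .grad

layer.zero_grad(set_to_none=False)
zeroed = all(p.grad is not None and bool((p.grad == 0).all())
             for p in layer.parameters())

layer.zero_grad(set_to_none=True)
cleared = all(p.grad is None for p in layer.parameters())
print(zeroed, cleared)
```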
- training: bool¶
- class lhotse.features.kaldi.layers.Wav2FFT(sampling_rate=16000, frame_length=0.025, frame_shift=0.01, round_to_power_of_two=True, remove_dc_offset=True, preemph_coeff=0.97, window_type='povey', dither=0.0, snip_edges=False, energy_floor=1e-10, raw_energy=True, use_energy=True)[source]¶
Apply standard Kaldi preprocessing (dithering, removing DC offset, pre-emphasis, etc.) on the input waveforms and compute their Short-Time Fourier Transform (STFT). The output is a complex-valued tensor.
Example:
>>> x = torch.randn(1, 16000, dtype=torch.float32)
>>> x.shape
torch.Size([1, 16000])
>>> t = Wav2FFT()
>>> t(x).shape
torch.Size([1, 100, 257])
The input is a tensor of shape (batch_size, num_samples). The output is a tensor of shape (batch_size, num_frames, num_fft_bins) with dtype torch.complex64.
- __init__(sampling_rate=16000, frame_length=0.025, frame_shift=0.01, round_to_power_of_two=True, remove_dc_offset=True, preemph_coeff=0.97, window_type='povey', dither=0.0, snip_edges=False, energy_floor=1e-10, raw_energy=True, use_energy=True)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- property sampling_rate: int¶
- Return type
int
- property remove_dc_offset: bool¶
- Return type
bool
- property window_type: str¶
- Return type
str
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- Return type
Tensor
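The output shape in the Wav2FFT example, (1, 100, 257) for 16000 input samples, can be reproduced with plain arithmetic. The sketch below assumes the usual Kaldi conventions: with snip_edges=False the frame count is the sample count divided by the frame shift, rounded to nearest, and the window is zero-padded to the next power of two before the FFT. The helper names are hypothetical and this is not lhotse's actual implementation.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def expected_stft_shape(num_samples: int,
                        sampling_rate: int = 16000,
                        frame_length: float = 0.025,
                        frame_shift: float = 0.01,
                        round_to_power_of_two: bool = True):
    """Return (num_frames, num_fft_bins), assuming snip_edges=False."""
    window = round(sampling_rate * frame_length)  # 400 samples at the defaults
    shift = round(sampling_rate * frame_shift)    # 160 samples at the defaults
    # Assumed Kaldi rule for snip_edges=False: round(num_samples / shift).
    num_frames = (num_samples + shift // 2) // shift
    n_fft = next_power_of_two(window) if round_to_power_of_two else window
    # A real-input FFT of size n_fft has n_fft // 2 + 1 non-redundant bins.
    return num_frames, n_fft // 2 + 1


shape = expected_stft_shape(16000)
print(shape)
```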
- T_destination¶
alias of TypeVar(‘T_destination’, bound=Dict[str, Any])
- add_module(name, module)¶
Adds a child module to the current module.
The module can be accessed as an attribute using the given name.
- Args:
- name (string): name of the child module. The child module can be
accessed from this module using the given name
module (Module): child module to be added to the module.
- Return type
None
- apply(fn)¶
Applies fn recursively to every submodule (as returned by .children()) as well as self. Typical use includes initializing the parameters of a model (see also nn-init-doc).
- Args:
fn (Module -> None): function to be applied to each submodule
- Returns:
Module: self
Example:
>>> @torch.no_grad()
>>> def init_weights(m):
>>>     print(m)
>>>     if type(m) == nn.Linear:
>>>         m.weight.fill_(1.0)
>>>         print(m.weight)
>>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
>>> net.apply(init_weights)
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
- Return type
~T
- bfloat16()¶
Casts all floating point parameters and buffers to bfloat16 datatype.
Note
This method modifies the module in-place.
- Returns:
Module: self
- Return type
~T
- buffers(recurse=True)¶
Returns an iterator over module buffers.
- Args:
- recurse (bool): if True, then yields buffers of this module
and all submodules. Otherwise, yields only buffers that are direct members of this module.
- Yields:
torch.Tensor: module buffer
Example:
>>> for buf in model.buffers():
>>>     print(type(buf), buf.size())
<class 'torch.Tensor'> (20L,)
<class 'torch.Tensor'> (20L, 1L, 5L, 5L)
- Return type
Iterator[Tensor]
- children()¶
Returns an iterator over immediate children modules.
- Yields:
Module: a child module
- Return type
Iterator
[Module
]
- cpu()¶
Moves all model parameters and buffers to the CPU.
Note
This method modifies the module in-place.
- Returns:
Module: self
- Return type
~T
- cuda(device=None)¶
Moves all model parameters and buffers to the GPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on GPU while being optimized.
Note
This method modifies the module in-place.
- Args:
- device (int, optional): if specified, all parameters will be
copied to that device
- Returns:
Module: self
- Return type
~T
- double()¶
Casts all floating point parameters and buffers to double datatype.
Note
This method modifies the module in-place.
- Returns:
Module: self
- Return type
~T
- dump_patches: bool = False¶
- eval()¶
Sets the module in evaluation mode.
This has an effect only on certain modules, e.g. Dropout, BatchNorm, etc. See the documentation of the particular modules for details of their behavior in training/evaluation mode.
This is equivalent to self.train(False).
See locally-disable-grad-doc for a comparison between .eval() and several similar mechanisms that may be confused with it.
- Returns:
Module: self
- Return type
~T
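A brief sketch of how the mode flag affects a mode-dependent layer, using `nn.Dropout` as the example module named in the text:

```python
import torch
from torch import nn

drop = nn.Dropout(p=0.5)
x = torch.ones(8)

drop.eval()                        # same as drop.train(False)
unchanged = torch.equal(drop(x), x)  # dropout is a no-op in eval mode
flag_after_eval = drop.training

drop.train()                       # back to training mode
flag_after_train = drop.training
print(unchanged, flag_after_eval, flag_after_train)
```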
- extra_repr()¶
Set the extra representation of the module
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- Return type
str
- float()¶
Casts all floating point parameters and buffers to float datatype.
Note
This method modifies the module in-place.
- Returns:
Module: self
- Return type
~T
- get_buffer(target)¶
Returns the buffer given by target if it exists, otherwise throws an error.
See the docstring for get_submodule for a more detailed explanation of this method’s functionality as well as how to correctly specify target.
- Args:
- target: The fully-qualified string name of the buffer to look for. (See get_submodule for how to specify a fully-qualified string.)
- Returns:
torch.Tensor: The buffer referenced by target
- Raises:
- AttributeError: If the target string references an invalid path or resolves to something that is not a buffer
- Return type
Tensor
- get_extra_state()¶
Returns any extra state to include in the module’s state_dict. Implement this and a corresponding set_extra_state() for your module if you need to store extra state. This function is called when building the module’s state_dict().
Note that extra state should be pickleable to ensure working serialization of the state_dict. We only provide backwards compatibility guarantees for serializing Tensors; other objects may break backwards compatibility if their serialized pickled form changes.
- Returns:
object: Any extra state to store in the module’s state_dict
- Return type
Any
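A sketch of the get_extra_state() / set_extra_state() pair round-tripping a plain Python value through the state dict. The `Counter` module and its `steps` attribute are hypothetical.

```python
import torch
from torch import nn


class Counter(nn.Module):
    """Hypothetical module that keeps a step count outside of any tensor."""

    def __init__(self):
        super().__init__()
        self.steps = 0

    def get_extra_state(self):
        # Called while building state_dict(); must be pickleable.
        return {"steps": self.steps}

    def set_extra_state(self, state):
        # Called from load_state_dict() with whatever get_extra_state returned.
        self.steps = state["steps"]


a = Counter()
a.steps = 7
b = Counter()
b.load_state_dict(a.state_dict())
restored = b.steps
print(restored)
```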
- get_parameter(target)¶
Returns the parameter given by target if it exists, otherwise throws an error.
See the docstring for get_submodule for a more detailed explanation of this method’s functionality as well as how to correctly specify target.
- Args:
- target: The fully-qualified string name of the Parameter to look for. (See get_submodule for how to specify a fully-qualified string.)
- Returns:
torch.nn.Parameter: The Parameter referenced by target
- Raises:
- AttributeError: If the target string references an invalid path or resolves to something that is not an nn.Parameter
- Return type
Parameter
- get_submodule(target)¶
Returns the submodule given by target if it exists, otherwise throws an error.
For example, let’s say you have an nn.Module A that looks like this:
A(
    (net_b): Module(
        (net_c): Module(
            (conv): Conv2d(16, 33, kernel_size=(3, 3), stride=(2, 2))
        )
        (linear): Linear(in_features=100, out_features=200, bias=True)
    )
)
(The diagram shows an nn.Module A. A has a nested submodule net_b, which itself has two submodules net_c and linear. net_c then has a submodule conv.)
To check whether or not we have the linear submodule, we would call get_submodule("net_b.linear"). To check whether we have the conv submodule, we would call get_submodule("net_b.net_c.conv").
The runtime of get_submodule is bounded by the degree of module nesting in target. A query against named_modules achieves the same result, but it is O(N) in the number of transitive modules. So, for a simple check to see if some submodule exists, get_submodule should always be used.
- Args:
- target: The fully-qualified string name of the submodule to look for. (See above example for how to specify a fully-qualified string.)
- Returns:
torch.nn.Module: The submodule referenced by target
- Raises:
- AttributeError: If the target string references an invalid path or resolves to something that is not an nn.Module
- Return type
Module
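The nested layout from the diagram can be rebuilt and queried as in the sketch below. Bare `nn.Module()` containers stand in for the unnamed `Module` entries, and `"net_b.missing"` is an invented path used to show the error case.

```python
import torch
from torch import nn

# Rebuild the structure from the diagram.
a = nn.Module()
a.net_b = nn.Module()
a.net_b.net_c = nn.Module()
a.net_b.net_c.conv = nn.Conv2d(16, 33, kernel_size=3, stride=2)
a.net_b.linear = nn.Linear(100, 200)

conv = a.get_submodule("net_b.net_c.conv")
linear = a.get_submodule("net_b.linear")

try:
    a.get_submodule("net_b.missing")
    found_missing = True
except AttributeError:
    found_missing = False
print(type(conv).__name__, type(linear).__name__, found_missing)
```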
- half()¶
Casts all floating point parameters and buffers to half datatype.
Note
This method modifies the module in-place.
- Returns:
Module: self
- Return type
~T
- ipu(device=None)¶
Moves all model parameters and buffers to the IPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on IPU while being optimized.
Note
This method modifies the module in-place.
- Arguments:
- device (int, optional): if specified, all parameters will be
copied to that device
- Returns:
Module: self
- Return type
~T
- load_state_dict(state_dict, strict=True)¶
Copies parameters and buffers from state_dict into this module and its descendants. If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.
- Args:
- state_dict (dict): a dict containing parameters and persistent buffers.
- strict (bool, optional): whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: True
- Returns:
NamedTuple with missing_keys and unexpected_keys fields:
- missing_keys is a list of str containing the missing keys
- unexpected_keys is a list of str containing the unexpected keys
- Note:
If a parameter or buffer is registered as None and its corresponding key exists in state_dict, load_state_dict() will raise a RuntimeError.
- modules()¶
Returns an iterator over all modules in the network.
- Yields:
Module: a module in the network
- Note:
Duplicate modules are returned only once. In the following example, l will be returned only once.
Example:
>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.modules()):
...     print(idx, '->', m)
0 -> Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
1 -> Linear(in_features=2, out_features=2, bias=True)
- Return type
Iterator[Module]
- named_buffers(prefix='', recurse=True)¶
Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.
- Args:
- prefix (str): prefix to prepend to all buffer names.
- recurse (bool): if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.
- Yields:
(string, torch.Tensor): Tuple containing the name and buffer
Example:
>>> for name, buf in self.named_buffers():
>>>     if name in ['running_var']:
>>>         print(buf.size())
- Return type
Iterator[Tuple[str, Tensor]]
- named_children()¶
Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.
- Yields:
(string, Module): Tuple containing a name and child module
Example:
>>> for name, module in model.named_children():
>>>     if name in ['conv4', 'conv5']:
>>>         print(module)
- Return type
Iterator[Tuple[str, Module]]
- named_modules(memo=None, prefix='', remove_duplicate=True)¶
Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.
- Args:
- memo: a memo to store the set of modules already added to the result
- prefix: a prefix that will be added to the name of the module
- remove_duplicate: whether to remove the duplicated module instances in the result or not
- Yields:
(string, Module): Tuple of name and module
- Note:
Duplicate modules are returned only once. In the following example, l will be returned only once.
Example:
>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.named_modules()):
...     print(idx, '->', m)
0 -> ('', Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
))
1 -> ('0', Linear(in_features=2, out_features=2, bias=True))
- named_parameters(prefix='', recurse=True)¶
Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
- Args:
- prefix (str): prefix to prepend to all parameter names.
- recurse (bool): if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.
- Yields:
(string, Parameter): Tuple containing the name and parameter
Example:
>>> for name, param in self.named_parameters():
>>>     if name in ['bias']:
>>>         print(param.size())
- Return type
Iterator[Tuple[str, Parameter]]
- parameters(recurse=True)¶
Returns an iterator over module parameters.
This is typically passed to an optimizer.
- Args:
- recurse (bool): if True, then yields parameters of this module
and all submodules. Otherwise, yields only parameters that are direct members of this module.
- Yields:
Parameter: module parameter
Example:
>>> for param in model.parameters():
>>>     print(type(param), param.size())
<class 'torch.Tensor'> (20L,)
<class 'torch.Tensor'> (20L, 1L, 5L, 5L)
- Return type
Iterator[Parameter]
- register_backward_hook(hook)¶
Registers a backward hook on the module.
This function is deprecated in favor of register_full_backward_hook() and the behavior of this function will change in future versions.
- Returns:
torch.utils.hooks.RemovableHandle: a handle that can be used to remove the added hook by calling handle.remove()
- Return type
RemovableHandle
- register_buffer(name, tensor, persistent=True)¶
Adds a buffer to the module.
This is typically used to register a buffer that should not be considered a model parameter. For example, BatchNorm’s running_mean is not a parameter, but is part of the module’s state. Buffers, by default, are persistent and will be saved alongside parameters. This behavior can be changed by setting persistent to False. The only difference between a persistent buffer and a non-persistent buffer is that the latter will not be a part of this module’s state_dict.
Buffers can be accessed as attributes using given names.
- Args:
- name (string): name of the buffer. The buffer can be accessed from this module using the given name
- tensor (Tensor or None): buffer to be registered. If None, then operations that run on buffers, such as cuda, are ignored. If None, the buffer is not included in the module’s state_dict.
- persistent (bool): whether the buffer is part of this module’s state_dict.
Example:
>>> self.register_buffer('running_mean', torch.zeros(num_features))
- Return type
None
- register_forward_hook(hook)¶
Registers a forward hook on the module.
The hook will be called every time after forward() has computed an output. It should have the following signature:
hook(module, input, output) -> None or modified output
The input contains only the positional arguments given to the module. Keyword arguments are passed only to forward, not to the hooks. The hook can modify the output. It can modify the input in place, but this has no effect on forward, since the hook is called after forward() has run.
- Returns:
torch.utils.hooks.RemovableHandle: a handle that can be used to remove the added hook by calling handle.remove()
- Return type
RemovableHandle
- register_forward_pre_hook(hook)¶
Registers a forward pre-hook on the module.
The hook will be called every time before forward() is invoked. It should have the following signature:
hook(module, input) -> None or modified input
The input contains only the positional arguments given to the module. Keyword arguments are passed only to forward, not to the hooks. The hook can modify the input. The hook can return either a tuple or a single modified value. A single returned value is wrapped into a tuple (unless that value is already a tuple).
- Returns:
torch.utils.hooks.RemovableHandle: a handle that can be used to remove the added hook by calling handle.remove()
- Return type
RemovableHandle
- register_full_backward_hook(hook)¶
Registers a backward hook on the module.
The hook will be called every time the gradients with respect to module inputs are computed. The hook should have the following signature:
hook(module, grad_input, grad_output) -> tuple(Tensor) or None
The grad_input and grad_output are tuples that contain the gradients with respect to the inputs and outputs respectively. The hook should not modify its arguments, but it can optionally return a new gradient with respect to the input that will be used in place of grad_input in subsequent computations. grad_input will only correspond to the inputs given as positional arguments and all kwarg arguments are ignored. Entries in grad_input and grad_output will be None for all non-Tensor arguments.
For technical reasons, when this hook is applied to a Module, its forward function will receive a view of each Tensor passed to the Module. Similarly the caller will receive a view of each Tensor returned by the Module’s forward function.
Warning
Modifying inputs or outputs inplace is not allowed when using backward hooks and will raise an error.
- Returns:
torch.utils.hooks.RemovableHandle: a handle that can be used to remove the added hook by calling handle.remove()
- Return type
RemovableHandle
- register_load_state_dict_post_hook(hook)¶
Registers a post hook to be run after module’s
load_state_dict
is called.- It should have the following signature::
hook(module, incompatible_keys) -> None
The
module
argument is the current module that this hook is registered on, and theincompatible_keys
argument is aNamedTuple
consisting of attributesmissing_keys
andunexpected_keys
.missing_keys
is alist
ofstr
containing the missing keys andunexpected_keys
is alist
ofstr
containing the unexpected keys.The given incompatible_keys can be modified inplace if needed.
Note that the checks performed when calling
load_state_dict()
withstrict=True
are affected by modifications the hook makes tomissing_keys
orunexpected_keys
, as expected. Additions to either set of keys will result in an error being thrown whenstrict=True
, and clearing out both missing and unexpected keys will avoid an error.
- Returns:
torch.utils.hooks.RemovableHandle
:a handle that can be used to remove the added hook by calling
handle.remove()
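The interaction with strict=True described above can be sketched as follows. The hook name ignore_missing_bias is hypothetical; it removes one entry from missing_keys in place so that loading a partial state dict does not raise.

```python
import torch
from torch import nn

layer = nn.Linear(2, 2)


def ignore_missing_bias(module, incompatible_keys):
    # Mutate the list in place; the hook itself must return None.
    if "bias" in incompatible_keys.missing_keys:
        incompatible_keys.missing_keys.remove("bias")


handle = layer.register_load_state_dict_post_hook(ignore_missing_bias)

# The state dict deliberately lacks "bias".
partial_state = {"weight": torch.zeros(2, 2)}
result = layer.load_state_dict(partial_state, strict=True)  # no error: the hook cleared the key

handle.remove()
```

Without the hook, the same call would raise a RuntimeError listing "bias" as a missing key.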
- register_module(name, module)¶
Alias for
add_module()
.- Return type
None
- register_parameter(name, param)¶
Adds a parameter to the module.
The parameter can be accessed as an attribute using given name.
- Args:
- name (string): name of the parameter. The parameter can be accessed
from this module using the given name
- param (Parameter or None): parameter to be added to the module. If
None
, then operations that run on parameters, such ascuda
, are ignored. IfNone
, the parameter is not included in the module’sstate_dict
.
- Return type
None
- requires_grad_(requires_grad=True)¶
Changes whether autograd records operations on the parameters in this module.
This method sets the parameters’
requires_grad
attributes in-place.This method is helpful for freezing part of the module for finetuning or training parts of a model individually (e.g., GAN training).
See locally-disable-grad-doc for a comparison between .requires_grad_() and several similar mechanisms that may be confused with it.
- Args:
- requires_grad (bool): whether autograd should record operations on
parameters in this module. Default:
True
.
- Returns:
Module: self
- Return type
~T
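As an illustration of the freezing use case mentioned above, the sketch below freezes the first layer of a small, hypothetical nn.Sequential model and confirms that only the remaining layer receives gradients.

```python
import torch
from torch import nn

# Hypothetical two-layer model used only for illustration.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Freeze the first linear layer in place.
model[0].requires_grad_(False)

trainable = [name for name, p in model.named_parameters() if p.requires_grad]

# Gradients are only accumulated for the parameters that still require them.
model(torch.randn(3, 4)).sum().backward()
```

An optimizer would then typically be built from only the parameters whose requires_grad is True.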
- set_extra_state(state)¶
This function is called from
load_state_dict()
to handle any extra state found within the state_dict. Implement this function and a correspondingget_extra_state()
for your module if you need to store extra state within its state_dict.- Args:
state (dict): Extra state from the state_dict
- share_memory()¶
See torch.Tensor.share_memory_().
- Return type
~T
- state_dict(*args, destination=None, prefix='', keep_vars=False)¶
Returns a dictionary containing a whole state of the module.
Both parameters and persistent buffers (e.g. running averages) are included. Keys are corresponding parameter and buffer names. Parameters and buffers set to
None
are not included.Warning
Currently
state_dict()
also accepts positional arguments fordestination
,prefix
andkeep_vars
in order. However, this is being deprecated and keyword arguments will be enforced in future releases.Warning
Please avoid the use of argument
destination
as it is not designed for end-users.- Args:
- destination (dict, optional): If provided, the state of module will
be updated into the dict and the same object is returned. Otherwise, an
OrderedDict
will be created and returned. Default:None
.- prefix (str, optional): a prefix added to parameter and buffer
names to compose the keys in state_dict. Default:
''
.- keep_vars (bool, optional): by default the
Tensor
s returned in the state dict are detached from autograd. If it’s set to
True
, detaching will not be performed. Default:False
.
- Returns:
- dict:
a dictionary containing a whole state of the module
Example:
>>> module.state_dict().keys()
['bias', 'weight']
- to(*args, **kwargs)¶
Moves and/or casts the parameters and buffers.
This can be called as
- to(device=None, dtype=None, non_blocking=False)
- to(dtype, non_blocking=False)
- to(tensor, non_blocking=False)
- to(memory_format=torch.channels_last)
Its signature is similar to
torch.Tensor.to()
, but only accepts floating point or complexdtype
s. In addition, this method will only cast the floating point or complex parameters and buffers todtype
(if given). The integral parameters and buffers will be moved to device
, if that is given, but with dtypes unchanged. Whennon_blocking
is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.See below for examples.
Note
This method modifies the module in-place.
- Args:
- device (
torch.device
): the desired device of the parameters and buffers in this module
- dtype (
torch.dtype
): the desired floating point or complex dtype of the parameters and buffers in this module
- tensor (torch.Tensor): Tensor whose dtype and device are the desired
dtype and device for all parameters and buffers in this module
- memory_format (
torch.memory_format
): the desired memory format for 4D parameters and buffers in this module (keyword only argument)
- Returns:
Module: self
Examples:
>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)
>>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j,  0.2382+0.j],
        [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)
- to_empty(*, device)¶
Moves the parameters and buffers to the specified device without copying storage.
- Args:
- device (
torch.device
): The desired device of the parameters and buffers in this module.
- Returns:
Module: self
- Return type
~T
- train(mode=True)¶
Sets the module in training mode.
This only has an effect on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Args:
- mode (bool): whether to set training mode (
True
) or evaluation mode (
False
). Default:True
.
- Returns:
Module: self
- Return type
~T
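A short sketch of how the mode changes behavior, using nn.Dropout as one of the affected modules named above. In evaluation mode dropout is the identity; in training mode each element is either zeroed or scaled by 1 / (1 - p).

```python
import torch
from torch import nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.eval()          # same as drop.train(False)
eval_out = drop(x)   # dropout is disabled: the output equals the input

drop.train()         # same as drop.train(True)
train_out = drop(x)  # elements are either 0.0 or 1.0 / (1 - 0.5) = 2.0
```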
- type(dst_type)¶
Casts all parameters and buffers to
dst_type
.Note
This method modifies the module in-place.
- Args:
dst_type (type or string): the desired type
- Returns:
Module: self
- Return type
~T
- xpu(device=None)¶
Moves all model parameters and buffers to the XPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on XPU while being optimized.
Note
This method modifies the module in-place.
- Arguments:
- device (int, optional): if specified, all parameters will be
copied to that device
- Returns:
Module: self
- Return type
~T
- zero_grad(set_to_none=False)¶
Sets gradients of all model parameters to zero. See similar function under
torch.optim.Optimizer
for more context.- Args:
- set_to_none (bool): instead of setting to zero, set the grads to None.
See
torch.optim.Optimizer.zero_grad()
for details.
- Return type
None
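The two behaviors of set_to_none can be compared directly. The sketch below passes the argument explicitly in both calls, so it does not rely on the default value.

```python
import torch
from torch import nn

layer = nn.Linear(2, 1)

# First backward pass populates .grad on every parameter.
layer(torch.ones(1, 2)).sum().backward()
layer.zero_grad(set_to_none=False)   # gradients are kept as zero-filled tensors
zeroed = layer.weight.grad.clone()

# Second backward pass, then drop the gradient tensors entirely.
layer(torch.ones(1, 2)).sum().backward()
layer.zero_grad(set_to_none=True)    # gradients become None
```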
- training: bool¶
- class lhotse.features.kaldi.layers.Wav2Spec(sampling_rate=16000, frame_length=0.025, frame_shift=0.01, round_to_power_of_two=True, remove_dc_offset=True, preemph_coeff=0.97, window_type='povey', dither=0.0, snip_edges=False, energy_floor=1e-10, raw_energy=True, use_energy=True, use_fft_mag=False)[source]¶
Apply standard Kaldi preprocessing (dithering, removing DC offset, pre-emphasis, etc.) on the input waveforms and compute their Short-Time Fourier Transform (STFT). The STFT is transformed either to a magnitude spectrum (use_fft_mag=True) or a power spectrum (use_fft_mag=False).
Example:
>>> x = torch.randn(1, 16000, dtype=torch.float32)
>>> x.shape
torch.Size([1, 16000])
>>> t = Wav2Spec()
>>> t(x).shape
torch.Size([1, 100, 257])
The input is a tensor of shape (batch_size, num_samples). The output is a tensor of shape (batch_size, num_frames, num_fft_bins).
- __init__(sampling_rate=16000, frame_length=0.025, frame_shift=0.01, round_to_power_of_two=True, remove_dc_offset=True, preemph_coeff=0.97, window_type='povey', dither=0.0, snip_edges=False, energy_floor=1e-10, raw_energy=True, use_energy=True, use_fft_mag=False)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
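The output shape in the Wav2Spec example, (1, 100, 257), can be reproduced by hand from the default arguments. The sketch below assumes Kaldi's usual conventions: with snip_edges=False the frame count is (num_samples + frame_shift / 2) // frame_shift in samples, and with round_to_power_of_two=True the FFT size is the window size rounded up to the next power of two. These formulas are an assumption about the behavior, not taken from the lhotse source, and the helper next_power_of_two is hypothetical.

```python
def next_power_of_two(n: int) -> int:
    # Hypothetical helper: smallest power of two that is >= n.
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


sampling_rate = 16000
frame_length = 0.025   # seconds
frame_shift = 0.01     # seconds
num_samples = 16000    # one second of audio, as in the example

window_size = round(sampling_rate * frame_length)   # 400 samples
shift_size = round(sampling_rate * frame_shift)     # 160 samples

# Assumed Kaldi rule for snip_edges=False.
num_frames = (num_samples + shift_size // 2) // shift_size

# Assumed rule for round_to_power_of_two=True: 400 -> 512, giving 512 / 2 + 1 bins.
fft_size = next_power_of_two(window_size)
num_fft_bins = fft_size // 2 + 1

print(num_frames, num_fft_bins)
```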
- T_destination¶
alias of TypeVar(‘T_destination’, bound=
Dict
[str
,Any
])
- add_module(name, module)¶
Adds a child module to the current module.
The module can be accessed as an attribute using the given name.
- Args:
- name (string): name of the child module. The child module can be
accessed from this module using the given name
module (Module): child module to be added to the module.
- Return type
None
- apply(fn)¶
Applies
fn
recursively to every submodule (as returned by.children()
) as well as self. Typical use includes initializing the parameters of a model (see also nn-init-doc).- Args:
fn (
Module
-> None): function to be applied to each submodule- Returns:
Module: self
Example:
>>> @torch.no_grad()
>>> def init_weights(m):
>>>     print(m)
>>>     if type(m) == nn.Linear:
>>>         m.weight.fill_(1.0)
>>>         print(m.weight)
>>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
>>> net.apply(init_weights)
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1., 1.],
        [ 1., 1.]])
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1., 1.],
        [ 1., 1.]])
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
- Return type
~T
- bfloat16()¶
Casts all floating point parameters and buffers to
bfloat16
datatype.Note
This method modifies the module in-place.
- Returns:
Module: self
- Return type
~T
- buffers(recurse=True)¶
Returns an iterator over module buffers.
- Args:
- recurse (bool): if True, then yields buffers of this module
and all submodules. Otherwise, yields only buffers that are direct members of this module.
- Yields:
torch.Tensor: module buffer
Example:
>>> for buf in model.buffers():
>>>     print(type(buf), buf.size())
<class 'torch.Tensor'> torch.Size([20])
<class 'torch.Tensor'> torch.Size([20, 1, 5, 5])
- Return type
Iterator
[Tensor
]
- children()¶
Returns an iterator over immediate children modules.
- Yields:
Module: a child module
- Return type
Iterator
[Module
]
- cpu()¶
Moves all model parameters and buffers to the CPU.
Note
This method modifies the module in-place.
- Returns:
Module: self
- Return type
~T
- cuda(device=None)¶
Moves all model parameters and buffers to the GPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on GPU while being optimized.
Note
This method modifies the module in-place.
- Args:
- device (int, optional): if specified, all parameters will be
copied to that device
- Returns:
Module: self
- Return type
~T
- double()¶
Casts all floating point parameters and buffers to
double
datatype.Note
This method modifies the module in-place.
- Returns:
Module: self
- Return type
~T
- dump_patches: bool = False¶
- eval()¶
Sets the module in evaluation mode.
This only has an effect on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
This is equivalent to self.train(False).
See locally-disable-grad-doc for a comparison between .eval() and several similar mechanisms that may be confused with it.
- Returns:
Module: self
- Return type
~T
- extra_repr()¶
Sets the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- Return type
str
- float()¶
Casts all floating point parameters and buffers to
float
datatype.Note
This method modifies the module in-place.
- Returns:
Module: self
- Return type
~T
- forward(x)¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.- Return type
Tensor
- get_buffer(target)¶
Returns the buffer given by
target
if it exists, otherwise throws an error.See the docstring for
get_submodule
for a more detailed explanation of this method’s functionality as well as how to correctly specifytarget
.- Args:
- target: The fully-qualified string name of the buffer
to look for. (See
get_submodule
for how to specify a fully-qualified string.)
- Returns:
torch.Tensor: The buffer referenced by
target
- Raises:
- AttributeError: If the target string references an invalid
path or resolves to something that is not a buffer
- Return type
Tensor
- get_extra_state()¶
Returns any extra state to include in the module’s state_dict. Implement this and a corresponding
set_extra_state()
for your module if you need to store extra state. This function is called when building the module’s state_dict().
Note that extra state should be picklable to ensure working serialization of the state_dict. We only provide backwards compatibility guarantees for serializing Tensors; other objects may break backwards compatibility if their serialized pickled form changes.
- Returns:
object: Any extra state to store in the module’s state_dict
- Return type
Any
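A minimal sketch of the get_extra_state() / set_extra_state() pair. The Counter module and its steps attribute are hypothetical; they stand in for any picklable, non-tensor state that should travel with the state_dict.

```python
from torch import nn


class Counter(nn.Module):
    """Hypothetical module that keeps a plain Python counter next to its parameters."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(2, 2)
        self.steps = 0

    def get_extra_state(self):
        # Called while building state_dict(); the result must be picklable.
        return {"steps": self.steps}

    def set_extra_state(self, state):
        # Called from load_state_dict() with whatever get_extra_state() returned.
        self.steps = state["steps"]


src = Counter()
src.steps = 7

dst = Counter()
dst.load_state_dict(src.state_dict())  # restores both the parameters and the counter
```

In the PyTorch versions we are aware of, the extra state is stored under the key "_extra_state" in the state_dict.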
- get_parameter(target)¶
Returns the parameter given by
target
if it exists, otherwise throws an error.See the docstring for
get_submodule
for a more detailed explanation of this method’s functionality as well as how to correctly specifytarget
.- Args:
- target: The fully-qualified string name of the Parameter
to look for. (See
get_submodule
for how to specify a fully-qualified string.)
- Returns:
torch.nn.Parameter: The Parameter referenced by
target
- Raises:
- AttributeError: If the target string references an invalid
path or resolves to something that is not an
nn.Parameter
- Return type
Parameter
- get_submodule(target)¶
Returns the submodule given by
target
if it exists, otherwise throws an error.
For example, let’s say you have an nn.Module A that looks like this:
A(
    (net_b): Module(
        (net_c): Module(
            (conv): Conv2d(16, 33, kernel_size=(3, 3), stride=(2, 2))
        )
        (linear): Linear(in_features=100, out_features=200, bias=True)
    )
)
(The diagram shows an nn.Module A. A has a nested submodule net_b, which itself has two submodules net_c and linear. net_c then has a submodule conv.)
To check whether or not we have the linear submodule, we would call get_submodule("net_b.linear"). To check whether we have the conv submodule, we would call get_submodule("net_b.net_c.conv").
The runtime of get_submodule is bounded by the degree of module nesting in target. A query against named_modules achieves the same result, but it is O(N) in the number of transitive modules. So, for a simple check to see if some submodule exists, get_submodule should always be used.
- Args:
- target: The fully-qualified string name of the submodule
to look for. (See above example for how to specify a fully-qualified string.)
- Returns:
torch.nn.Module: The submodule referenced by
target
- Raises:
- AttributeError: If the target string references an invalid
path or resolves to something that is not an
nn.Module
- Return type
Module
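The nested module A described above can be built and queried as follows. This is a sketch that assembles the hierarchy from bare nn.Module containers; the variable names a and has_missing are chosen for illustration.

```python
from torch import nn

# Build A -> net_b -> {net_c -> conv, linear} from plain containers.
a = nn.Module()
a.net_b = nn.Module()
a.net_b.net_c = nn.Module()
a.net_b.net_c.conv = nn.Conv2d(16, 33, kernel_size=3, stride=2)
a.net_b.linear = nn.Linear(100, 200)

conv = a.get_submodule("net_b.net_c.conv")
linear = a.get_submodule("net_b.linear")

# An invalid path raises AttributeError, which makes existence checks straightforward.
try:
    a.get_submodule("net_b.missing")
    has_missing = True
except AttributeError:
    has_missing = False
```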
- half()¶
Casts all floating point parameters and buffers to
half
datatype.Note
This method modifies the module in-place.
- Returns:
Module: self
- Return type
~T
- ipu(device=None)¶
Moves all model parameters and buffers to the IPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on IPU while being optimized.
Note
This method modifies the module in-place.
- Arguments:
- device (int, optional): if specified, all parameters will be
copied to that device
- Returns:
Module: self
- Return type
~T
- load_state_dict(state_dict, strict=True)¶
Copies parameters and buffers from state_dict into this module and its descendants. If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.
- Args:
- state_dict (dict): a dict containing parameters and persistent buffers.
- strict (bool, optional): whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: True
- Returns:
NamedTuple with missing_keys and unexpected_keys fields:
- missing_keys is a list of str containing the missing keys
- unexpected_keys is a list of str containing the unexpected keys
- Note:
If a parameter or buffer is registered as None and its corresponding key exists in state_dict, load_state_dict() will raise a RuntimeError.
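A short sketch of the returned NamedTuple when strict=False. The state dict below is deliberately mismatched: it lacks "bias" and carries a hypothetical key named "extra".

```python
import torch
from torch import nn

layer = nn.Linear(2, 2)

# Deliberately mismatched state: no "bias", plus a key the module does not know.
state = {"weight": torch.ones(2, 2), "extra": torch.zeros(1)}

result = layer.load_state_dict(state, strict=False)
print(result.missing_keys)     # keys the module expected but did not find
print(result.unexpected_keys)  # keys in the state dict that the module does not have
```

With strict=True the same call would raise a RuntimeError instead of returning the two lists.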
- modules()¶
Returns an iterator over all modules in the network.
- Yields:
Module: a module in the network
- Note:
Duplicate modules are returned only once. In the following example,
l
will be returned only once.
Example:
>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.modules()):
...     print(idx, '->', m)
0 -> Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
1 -> Linear(in_features=2, out_features=2, bias=True)
- Return type
Iterator
[Module
]
- named_buffers(prefix='', recurse=True)¶
Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.
- Args:
- prefix (str): prefix to prepend to all buffer names.
- recurse (bool): if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.
- Yields:
(string, torch.Tensor): Tuple containing the name and buffer
Example:
>>> for name, buf in self.named_buffers():
>>>     if name in ['running_var']:
>>>         print(buf.size())
- Return type
Iterator
[Tuple
[str
,Tensor
]]
- named_children()¶
Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.
- Yields:
(string, Module): Tuple containing a name and child module
Example:
>>> for name, module in model.named_children():
>>>     if name in ['conv4', 'conv5']:
>>>         print(module)
- Return type
Iterator
[Tuple
[str
,Module
]]
- named_modules(memo=None, prefix='', remove_duplicate=True)¶
Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.
- Args:
- memo: a memo to store the set of modules already added to the result
- prefix: a prefix that will be added to the name of the module
- remove_duplicate: whether to remove the duplicated module instances in the result or not
- Yields:
(string, Module): Tuple of name and module
- Note:
Duplicate modules are returned only once. In the following example,
l
will be returned only once.
Example:
>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.named_modules()):
...     print(idx, '->', m)
0 -> ('', Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
))
1 -> ('0', Linear(in_features=2, out_features=2, bias=True))
- named_parameters(prefix='', recurse=True)¶
Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
- Args:
- prefix (str): prefix to prepend to all parameter names.
- recurse (bool): if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.
- Yields:
(string, Parameter): Tuple containing the name and parameter
Example:
>>> for name, param in self.named_parameters():
>>>     if name in ['bias']:
>>>         print(param.size())
- Return type
Iterator
[Tuple
[str
,Parameter
]]
- online_inference(x, context=None)¶
- Return type
Tuple
[Tensor
,Tensor
]
- parameters(recurse=True)¶
Returns an iterator over module parameters.
This is typically passed to an optimizer.
- Args:
- recurse (bool): if True, then yields parameters of this module
and all submodules. Otherwise, yields only parameters that are direct members of this module.
- Yields:
Parameter: module parameter
Example:
>>> for param in model.parameters():
>>>     print(type(param), param.size())
<class 'torch.nn.parameter.Parameter'> torch.Size([20])
<class 'torch.nn.parameter.Parameter'> torch.Size([20, 1, 5, 5])
- Return type
Iterator
[Parameter
]
- register_backward_hook(hook)¶
Registers a backward hook on the module.
This function is deprecated in favor of
register_full_backward_hook()
and the behavior of this function will change in future versions.- Returns:
torch.utils.hooks.RemovableHandle
:a handle that can be used to remove the added hook by calling
handle.remove()
- Return type
RemovableHandle
- register_buffer(name, tensor, persistent=True)¶
Adds a buffer to the module.
This is typically used to register a buffer that should not be considered a model parameter. For example, BatchNorm’s
running_mean
is not a parameter, but is part of the module’s state. Buffers, by default, are persistent and will be saved alongside parameters. This behavior can be changed by settingpersistent
toFalse
. The only difference between a persistent buffer and a non-persistent buffer is that the latter will not be a part of this module’sstate_dict
.Buffers can be accessed as attributes using given names.
- Args:
- name (string): name of the buffer. The buffer can be accessed
from this module using the given name
- tensor (Tensor or None): buffer to be registered. If
None
, then operations that run on buffers, such as
cuda
, are ignored. IfNone
, the buffer is not included in the module’sstate_dict
.
- persistent (bool): whether the buffer is part of this module’s state_dict.
Example:
>>> self.register_buffer('running_mean', torch.zeros(num_features))
- Return type
None
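The difference between persistent and non-persistent buffers can be sketched with a small, hypothetical Norm module. Both buffers are visible through named_buffers(), but only the persistent one appears in the state_dict.

```python
import torch
from torch import nn


class Norm(nn.Module):
    """Hypothetical module with one persistent and one non-persistent buffer."""

    def __init__(self, num_features: int):
        super().__init__()
        self.register_buffer("running_mean", torch.zeros(num_features))
        self.register_buffer("scratch", torch.zeros(num_features), persistent=False)


m = Norm(3)
state_keys = list(m.state_dict().keys())             # only the persistent buffer
buffer_names = [name for name, _ in m.named_buffers()]  # both buffers
```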
- register_forward_hook(hook)¶
Registers a forward hook on the module.
The hook will be called every time after forward() has computed an output. It should have the following signature:
hook(module, input, output) -> None or modified output
The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks, only to forward. The hook can modify the output. It can modify the input in place, but this has no effect on the forward pass because the hook is called after forward() has run.
- Returns:
torch.utils.hooks.RemovableHandle
:a handle that can be used to remove the added hook by calling
handle.remove()
- Return type
RemovableHandle
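A minimal sketch of a forward hook that replaces the module output. The hook name clamp_output is hypothetical; it returns a modified output, which the caller then receives in place of the original one.

```python
import torch
from torch import nn

layer = nn.Linear(2, 2)


def clamp_output(module, inputs, output):
    # Returning a value replaces the module's output; returning None would keep it.
    return output.clamp(min=0.0)


handle = layer.register_forward_hook(clamp_output)

x = torch.randn(5, 2)
hooked = layer(x)   # the hook runs because the module instance is called

handle.remove()
plain = layer(x)    # hook removed: raw output of forward()
```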
- register_forward_pre_hook(hook)¶
Registers a forward pre-hook on the module.
The hook will be called every time before forward() is invoked. It should have the following signature:
hook(module, input) -> None or modified input
The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks, only to forward. The hook can modify the input. The user can return either a tuple or a single modified value from the hook. A single returned value is wrapped into a tuple (unless that value is already a tuple).
- Returns:
torch.utils.hooks.RemovableHandle
:a handle that can be used to remove the added hook by calling
handle.remove()
- Return type
RemovableHandle
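A minimal sketch of a forward pre-hook that rewrites the input. The hook name double_input is hypothetical; it returns a single tensor, which is wrapped into a tuple before forward() is called.

```python
import torch
from torch import nn

layer = nn.Linear(2, 2)


def double_input(module, inputs):
    # `inputs` is the tuple of positional arguments passed to the module.
    (x,) = inputs
    return x * 2  # a single value is wrapped into a tuple automatically


handle = layer.register_forward_pre_hook(double_input)

x = torch.randn(3, 2)
hooked = layer(x)          # forward() actually sees 2 * x

handle.remove()
expected = layer(2 * x)    # same computation without the hook
```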
- register_full_backward_hook(hook)¶
Registers a backward hook on the module.
The hook will be called every time the gradients with respect to module inputs are computed. The hook should have the following signature:
hook(module, grad_input, grad_output) -> tuple(Tensor) or None
The grad_input and grad_output are tuples that contain the gradients with respect to the inputs and outputs respectively. The hook should not modify its arguments, but it can optionally return a new gradient with respect to the input that will be used in place of grad_input in subsequent computations. grad_input will only correspond to the inputs given as positional arguments; all keyword arguments are ignored. Entries in grad_input and grad_output will be None for all non-Tensor arguments.
For technical reasons, when this hook is applied to a Module, its forward function will receive a view of each Tensor passed to the Module. Similarly, the caller will receive a view of each Tensor returned by the Module’s forward function.
Warning
Modifying inputs or outputs inplace is not allowed when using backward hooks and will raise an error.
- Returns:
torch.utils.hooks.RemovableHandle
:a handle that can be used to remove the added hook by calling
handle.remove()
- Return type
RemovableHandle
- register_load_state_dict_post_hook(hook)¶
Registers a post hook to be run after module’s
load_state_dict
is called.- It should have the following signature::
hook(module, incompatible_keys) -> None
The
module
argument is the current module that this hook is registered on, and theincompatible_keys
argument is aNamedTuple
consisting of attributesmissing_keys
andunexpected_keys
.missing_keys
is alist
ofstr
containing the missing keys andunexpected_keys
is alist
ofstr
containing the unexpected keys.The given incompatible_keys can be modified inplace if needed.
Note that the checks performed when calling
load_state_dict()
withstrict=True
are affected by modifications the hook makes tomissing_keys
orunexpected_keys
, as expected. Additions to either set of keys will result in an error being thrown whenstrict=True
, and clearing out both missing and unexpected keys will avoid an error.
- Returns:
torch.utils.hooks.RemovableHandle
:a handle that can be used to remove the added hook by calling
handle.remove()
- register_module(name, module)¶
Alias for
add_module()
.- Return type
None
- register_parameter(name, param)¶
Adds a parameter to the module.
The parameter can be accessed as an attribute using given name.
- Args:
- name (string): name of the parameter. The parameter can be accessed
from this module using the given name
- param (Parameter or None): parameter to be added to the module. If
None
, then operations that run on parameters, such ascuda
, are ignored. IfNone
, the parameter is not included in the module’sstate_dict
.
- Return type
None
- property remove_dc_offset: bool¶
- Return type
bool
- requires_grad_(requires_grad=True)¶
Changes whether autograd records operations on the parameters in this module.
This method sets the parameters’
requires_grad
attributes in-place.This method is helpful for freezing part of the module for finetuning or training parts of a model individually (e.g., GAN training).
See locally-disable-grad-doc for a comparison between .requires_grad_() and several similar mechanisms that may be confused with it.
- Args:
- requires_grad (bool): whether autograd should record operations on
parameters in this module. Default:
True
.
- Returns:
Module: self
- Return type
~T
- property sampling_rate: int¶
- Return type
int
- set_extra_state(state)¶
This function is called from
load_state_dict()
to handle any extra state found within the state_dict. Implement this function and a correspondingget_extra_state()
for your module if you need to store extra state within its state_dict.- Args:
state (dict): Extra state from the state_dict
- share_memory()¶
See torch.Tensor.share_memory_().
- Return type
~T
- state_dict(*args, destination=None, prefix='', keep_vars=False)¶
Returns a dictionary containing a whole state of the module.
Both parameters and persistent buffers (e.g. running averages) are included. Keys are corresponding parameter and buffer names. Parameters and buffers set to
None
are not included.Warning
Currently
state_dict()
also accepts positional arguments fordestination
,prefix
andkeep_vars
in order. However, this is being deprecated and keyword arguments will be enforced in future releases.Warning
Please avoid the use of argument
destination
as it is not designed for end-users.- Args:
- destination (dict, optional): If provided, the state of module will
be updated into the dict and the same object is returned. Otherwise, an
OrderedDict
will be created and returned. Default:None
.- prefix (str, optional): a prefix added to parameter and buffer
names to compose the keys in state_dict. Default:
''
.- keep_vars (bool, optional): by default the
Tensor
s returned in the state dict are detached from autograd. If it’s set to
True
, detaching will not be performed. Default:False
.
- Returns:
- dict:
a dictionary containing a whole state of the module
Example:
>>> module.state_dict().keys()
['bias', 'weight']
- to(*args, **kwargs)¶
Moves and/or casts the parameters and buffers.
This can be called as
- to(device=None, dtype=None, non_blocking=False)
- to(dtype, non_blocking=False)
- to(tensor, non_blocking=False)
- to(memory_format=torch.channels_last)
Its signature is similar to
torch.Tensor.to()
, but only accepts floating point or complexdtype
s. In addition, this method will only cast the floating point or complex parameters and buffers todtype
(if given). The integral parameters and buffers will be moved to device
, if that is given, but with dtypes unchanged. Whennon_blocking
is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.See below for examples.
Note
This method modifies the module in-place.
- Args:
- device (
torch.device
): the desired device of the parameters and buffers in this module
- dtype (
torch.dtype
): the desired floating point or complex dtype of the parameters and buffers in this module
- tensor (torch.Tensor): Tensor whose dtype and device are the desired
dtype and device for all parameters and buffers in this module
- memory_format (
torch.memory_format
): the desired memory format for 4D parameters and buffers in this module (keyword only argument)
- Returns:
Module: self
Examples:
>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)
>>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j,  0.2382+0.j],
        [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)
- to_empty(*, device)¶
Moves the parameters and buffers to the specified device without copying storage.
- Args:
- device (
torch.device
): The desired device of the parameters and buffers in this module.
- Returns:
Module: self
- Return type
~T
- train(mode=True)¶
Sets the module in training mode.
This only has an effect on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Args:
- mode (bool): whether to set training mode (
True
) or evaluation mode (
False
). Default:True
.
- mode (bool): whether to set training mode (
- Returns:
Module: self
- Return type
~T
- type(dst_type)¶
Casts all parameters and buffers to dst_type.
Note
This method modifies the module in-place.
- Args:
dst_type (type or string): the desired type
- Returns:
Module: self
- Return type
~T
- property window_type: str¶
- Return type
str
- xpu(device=None)¶
Moves all model parameters and buffers to the XPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on XPU while being optimized.
Note
This method modifies the module in-place.
- Arguments:
- device (int, optional): if specified, all parameters will be
copied to that device
- Returns:
Module: self
- Return type
~T
- zero_grad(set_to_none=False)¶
Sets gradients of all model parameters to zero. See similar function under torch.optim.Optimizer for more context.
- Args:
- set_to_none (bool): instead of setting to zero, set the grads to None. See torch.optim.Optimizer.zero_grad() for details.
- Return type
None
- training: bool¶
- class lhotse.features.kaldi.layers.Wav2LogSpec(sampling_rate=16000, frame_length=0.025, frame_shift=0.01, round_to_power_of_two=True, remove_dc_offset=True, preemph_coeff=0.97, window_type='povey', dither=0.0, snip_edges=False, energy_floor=1e-10, raw_energy=True, use_energy=True, use_fft_mag=False)[source]¶
Apply standard Kaldi preprocessing (dithering, removing DC offset, pre-emphasis, etc.) on the input waveforms and compute their Short-Time Fourier Transform (STFT). The STFT is transformed either to a log-magnitude spectrum (use_fft_mag=True) or a log-power spectrum (use_fft_mag=False).
Example:
>>> x = torch.randn(1, 16000, dtype=torch.float32)
>>> x.shape
torch.Size([1, 16000])
>>> t = Wav2LogSpec()
>>> t(x).shape
torch.Size([1, 100, 257])
The input is a tensor of shape (batch_size, num_samples). The output is a tensor of shape (batch_size, num_frames, num_fft_bins).
- __init__(sampling_rate=16000, frame_length=0.025, frame_shift=0.01, round_to_power_of_two=True, remove_dc_offset=True, preemph_coeff=0.97, window_type='povey', dither=0.0, snip_edges=False, energy_floor=1e-10, raw_energy=True, use_energy=True, use_fft_mag=False)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
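The output shape in the Wav2LogSpec example, (1, 100, 257), can be checked by hand. The sketch below is not part of lhotse: it assumes Kaldi's framing convention (with snip_edges=False the frame count is the sample count divided by the frame shift, rounded to nearest) and assumes the FFT size is the window length rounded up to a power of two. The helper names are hypothetical.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (n must be positive)."""
    return 1 << (n - 1).bit_length()


def expected_spec_shape(
    num_samples: int,
    sampling_rate: int = 16000,
    frame_length: float = 0.025,
    frame_shift: float = 0.01,
    round_to_power_of_two: bool = True,
    snip_edges: bool = False,
):
    """Return (num_frames, num_fft_bins) under the assumed Kaldi convention."""
    window = int(round(frame_length * sampling_rate))  # 400 samples at 16 kHz
    shift = int(round(frame_shift * sampling_rate))    # 160 samples at 16 kHz
    if snip_edges:
        # Only frames that fit entirely inside the signal are kept.
        num_frames = 1 + (num_samples - window) // shift
    else:
        # Frames are centered; the count is num_samples / shift rounded to nearest.
        num_frames = (num_samples + shift // 2) // shift
    fft_size = next_power_of_two(window) if round_to_power_of_two else window
    # A real-input FFT of size N yields N // 2 + 1 non-redundant bins.
    return num_frames, fft_size // 2 + 1


shape_default = expected_spec_shape(16000)
shape_snipped = expected_spec_shape(16000, snip_edges=True)
print(shape_default, shape_snipped)
```

With the default arguments this reproduces the 100 frames and 257 bins shown in the example; the batch dimension is simply carried through.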
- T_destination¶
alias of TypeVar(‘T_destination’, bound=Dict[str, Any])
- add_module(name, module)¶
Adds a child module to the current module.
The module can be accessed as an attribute using the given name.
- Args:
- name (string): name of the child module. The child module can be
accessed from this module using the given name
module (Module): child module to be added to the module.
- Return type
None
- apply(fn)¶
Applies fn recursively to every submodule (as returned by .children()) as well as self. Typical use includes initializing the parameters of a model (see also nn-init-doc).
- Args:
fn (Module -> None): function to be applied to each submodule
- Returns:
Module: self
Example:
>>> @torch.no_grad()
>>> def init_weights(m):
>>>     print(m)
>>>     if type(m) == nn.Linear:
>>>         m.weight.fill_(1.0)
>>>         print(m.weight)
>>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
>>> net.apply(init_weights)
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
- Return type
~T
- bfloat16()¶
Casts all floating point parameters and buffers to bfloat16 datatype.
Note
This method modifies the module in-place.
- Returns:
Module: self
- Return type
~T
- buffers(recurse=True)¶
Returns an iterator over module buffers.
- Args:
- recurse (bool): if True, then yields buffers of this module
and all submodules. Otherwise, yields only buffers that are direct members of this module.
- Yields:
torch.Tensor: module buffer
Example:
>>> for buf in model.buffers():
>>>     print(type(buf), buf.size())
<class 'torch.Tensor'> (20L,)
<class 'torch.Tensor'> (20L, 1L, 5L, 5L)
- Return type
Iterator
[Tensor
]
- children()¶
Returns an iterator over immediate children modules.
- Yields:
Module: a child module
- Return type
Iterator
[Module
]
- cpu()¶
Moves all model parameters and buffers to the CPU.
Note
This method modifies the module in-place.
- Returns:
Module: self
- Return type
~T
- cuda(device=None)¶
Moves all model parameters and buffers to the GPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on GPU while being optimized.
Note
This method modifies the module in-place.
- Args:
- device (int, optional): if specified, all parameters will be
copied to that device
- Returns:
Module: self
- Return type
~T
- double()¶
Casts all floating point parameters and buffers to double datatype.
Note
This method modifies the module in-place.
- Returns:
Module: self
- Return type
~T
- dump_patches: bool = False¶
- eval()¶
Sets the module in evaluation mode.
This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
This is equivalent to self.train(False).
See locally-disable-grad-doc for a comparison between .eval() and several similar mechanisms that may be confused with it.
- Returns:
Module: self
- Return type
~T
- extra_repr()¶
Set the extra representation of the module
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- Return type
str
- float()¶
Casts all floating point parameters and buffers to float datatype.
Note
This method modifies the module in-place.
- Returns:
Module: self
- Return type
~T
- forward(x)¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- Return type
Tensor
- get_buffer(target)¶
Returns the buffer given by target if it exists, otherwise throws an error.
See the docstring for get_submodule for a more detailed explanation of this method’s functionality as well as how to correctly specify target.
- Args:
- target: The fully-qualified string name of the buffer to look for. (See get_submodule for how to specify a fully-qualified string.)
- Returns:
torch.Tensor: The buffer referenced by target
- Raises:
- AttributeError: If the target string references an invalid path or resolves to something that is not a buffer
- Return type
Tensor
- get_extra_state()¶
Returns any extra state to include in the module’s state_dict. Implement this and a corresponding set_extra_state() for your module if you need to store extra state. This function is called when building the module’s state_dict().
Note that extra state should be pickleable to ensure working serialization of the state_dict. We only provide backwards compatibility guarantees for serializing Tensors; other objects may break backwards compatibility if their serialized pickled form changes.
- Returns:
object: Any extra state to store in the module’s state_dict
- Return type
Any
- get_parameter(target)¶
Returns the parameter given by target if it exists, otherwise throws an error.
See the docstring for get_submodule for a more detailed explanation of this method’s functionality as well as how to correctly specify target.
- Args:
- target: The fully-qualified string name of the Parameter to look for. (See get_submodule for how to specify a fully-qualified string.)
- Returns:
torch.nn.Parameter: The Parameter referenced by target
- Raises:
- AttributeError: If the target string references an invalid path or resolves to something that is not an nn.Parameter
- Return type
Parameter
- get_submodule(target)¶
Returns the submodule given by target if it exists, otherwise throws an error.
For example, let’s say you have an nn.Module A that looks like this:
A(
    (net_b): Module(
        (net_c): Module(
            (conv): Conv2d(16, 33, kernel_size=(3, 3), stride=(2, 2))
        )
        (linear): Linear(in_features=100, out_features=200, bias=True)
    )
)
(The diagram shows an nn.Module A. A has a nested submodule net_b, which itself has two submodules net_c and linear. net_c then has a submodule conv.)
To check whether or not we have the linear submodule, we would call get_submodule("net_b.linear"). To check whether we have the conv submodule, we would call get_submodule("net_b.net_c.conv").
The runtime of get_submodule is bounded by the degree of module nesting in target. A query against named_modules achieves the same result, but it is O(N) in the number of transitive modules. So, for a simple check to see if some submodule exists, get_submodule should always be used.
- Args:
- target: The fully-qualified string name of the submodule to look for. (See above example for how to specify a fully-qualified string.)
- Returns:
torch.nn.Module: The submodule referenced by target
- Raises:
- AttributeError: If the target string references an invalid path or resolves to something that is not an nn.Module
- Return type
Module
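The claim that a get_submodule lookup costs only as much as the nesting depth of target can be illustrated with a plain-Python sketch. This is not PyTorch's implementation: resolve_dotted_path is a hypothetical helper, and SimpleNamespace objects stand in for modules. It walks one attribute per path component, so the work is proportional to the number of dots rather than to the total number of modules.

```python
from types import SimpleNamespace


def resolve_dotted_path(root, target: str):
    """Follow `target` (e.g. "net_b.net_c.conv") one attribute at a time."""
    if target == "":
        return root  # an empty path refers to the root itself
    current = root
    for name in target.split("."):
        if not hasattr(current, name):
            raise AttributeError(
                f"{type(current).__name__} has no attribute '{name}'"
            )
        current = getattr(current, name)
    return current


# Toy stand-in for the module tree A -> net_b -> (net_c -> conv, linear).
conv = SimpleNamespace(kind="conv")
linear = SimpleNamespace(kind="linear")
a = SimpleNamespace(
    net_b=SimpleNamespace(net_c=SimpleNamespace(conv=conv), linear=linear)
)

found_linear = resolve_dotted_path(a, "net_b.linear")
found_conv = resolve_dotted_path(a, "net_b.net_c.conv")
print(found_linear.kind, found_conv.kind)
```

A lookup of "net_b.net_c.conv" touches three objects regardless of how many other submodules the tree contains, whereas scanning every (name, module) pair would visit all of them.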
- half()¶
Casts all floating point parameters and buffers to half datatype.
Note
This method modifies the module in-place.
- Returns:
Module: self
- Return type
~T
- ipu(device=None)¶
Moves all model parameters and buffers to the IPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on IPU while being optimized.
Note
This method modifies the module in-place.
- Arguments:
- device (int, optional): if specified, all parameters will be
copied to that device
- Returns:
Module: self
- Return type
~T
- load_state_dict(state_dict, strict=True)¶
Copies parameters and buffers from state_dict into this module and its descendants. If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.
- Args:
- state_dict (dict): a dict containing parameters and persistent buffers.
- strict (bool, optional): whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: True
- Returns:
NamedTuple with missing_keys and unexpected_keys fields:
missing_keys is a list of str containing the missing keys
unexpected_keys is a list of str containing the unexpected keys
- Note:
If a parameter or buffer is registered as None and its corresponding key exists in state_dict, load_state_dict() will raise a RuntimeError.
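The meaning of missing_keys and unexpected_keys can be shown with a small standard-library sketch. It is only an illustration of the key comparison described above, not PyTorch's code: compare_keys and IncompatibleKeys are hypothetical names, and plain dicts stand in for real state dicts.

```python
from collections import namedtuple

# Hypothetical stand-in for the NamedTuple returned by load_state_dict().
IncompatibleKeys = namedtuple("IncompatibleKeys", ["missing_keys", "unexpected_keys"])


def compare_keys(expected_keys, state_dict, strict=True):
    """Compare the keys a module expects with the keys found in `state_dict`."""
    # Keys the module expects but the checkpoint does not provide.
    missing = [k for k in expected_keys if k not in state_dict]
    # Keys the checkpoint provides but the module does not know about.
    unexpected = [k for k in state_dict if k not in expected_keys]
    if strict and (missing or unexpected):
        raise RuntimeError(
            f"missing keys: {missing}; unexpected keys: {unexpected}"
        )
    return IncompatibleKeys(missing, unexpected)


expected = ["weight", "bias"]
checkpoint = {"weight": [1.0, 2.0], "running_mean": [0.0]}
report = compare_keys(expected, checkpoint, strict=False)
print(report)
```

With strict=False the mismatch is only reported; with strict=True the same mismatch raises, which is the behavior the strict flag controls.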
- modules()¶
Returns an iterator over all modules in the network.
- Yields:
Module: a module in the network
- Note:
Duplicate modules are returned only once. In the following example, l will be returned only once.
Example:
>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.modules()):
...     print(idx, '->', m)
0 -> Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
1 -> Linear(in_features=2, out_features=2, bias=True)
- Return type
Iterator
[Module
]
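Why a module that appears twice is yielded only once can be sketched with an identity-based memo. The code below is a toy traversal under stated assumptions, not PyTorch's implementation: Node and iter_modules are hypothetical, and the memo is a set of object ids.

```python
class Node:
    """Toy stand-in for a module that holds an ordered list of children."""

    def __init__(self, name, *children):
        self.name = name
        self.children = list(children)


def iter_modules(root, memo=None):
    """Depth-first traversal that yields each distinct object exactly once."""
    if memo is None:
        memo = set()
    if id(root) in memo:
        return  # already visited: the same instance was registered twice
    memo.add(id(root))
    yield root
    for child in root.children:
        yield from iter_modules(child, memo)


shared = Node("l")
net = Node("net", shared, shared)  # the same child is registered twice
visited = [node.name for node in iter_modules(net)]
print(visited)
```

Deduplication is by object identity, so two separately constructed children with equal contents would both be yielded.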
- named_buffers(prefix='', recurse=True)¶
Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.
- Args:
prefix (str): prefix to prepend to all buffer names.
recurse (bool): if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.
- Yields:
(string, torch.Tensor): Tuple containing the name and buffer
Example:
>>> for name, buf in self.named_buffers():
>>>     if name in ['running_var']:
>>>         print(buf.size())
- Return type
Iterator[Tuple[str, Tensor]]
- named_children()¶
Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.
- Yields:
(string, Module): Tuple containing a name and child module
Example:
>>> for name, module in model.named_children():
>>>     if name in ['conv4', 'conv5']:
>>>         print(module)
- Return type
Iterator[Tuple[str, Module]]
- named_modules(memo=None, prefix='', remove_duplicate=True)¶
Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.
- Args:
memo: a memo to store the set of modules already added to the result
prefix: a prefix that will be added to the name of the module
remove_duplicate: whether to remove the duplicated module instances in the result or not
- Yields:
(string, Module): Tuple of name and module
- Note:
Duplicate modules are returned only once. In the following example, l will be returned only once.
Example:
>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.named_modules()):
...     print(idx, '->', m)
0 -> ('', Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
))
1 -> ('0', Linear(in_features=2, out_features=2, bias=True))
- named_parameters(prefix='', recurse=True)¶
Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
- Args:
prefix (str): prefix to prepend to all parameter names.
recurse (bool): if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.
- Yields:
(string, Parameter): Tuple containing the name and parameter
Example:
>>> for name, param in self.named_parameters():
>>>     if name in ['bias']:
>>>         print(param.size())
- Return type
Iterator[Tuple[str, Parameter]]
- online_inference(x, context=None)¶
- Return type
Tuple
[Tensor
,Tensor
]
- parameters(recurse=True)¶
Returns an iterator over module parameters.
This is typically passed to an optimizer.
- Args:
- recurse (bool): if True, then yields parameters of this module
and all submodules. Otherwise, yields only parameters that are direct members of this module.
- Yields:
Parameter: module parameter
Example:
>>> for param in model.parameters():
>>>     print(type(param), param.size())
<class 'torch.Tensor'> (20L,)
<class 'torch.Tensor'> (20L, 1L, 5L, 5L)
- Return type
Iterator
[Parameter
]
- register_backward_hook(hook)¶
Registers a backward hook on the module.
This function is deprecated in favor of register_full_backward_hook() and the behavior of this function will change in future versions.
- Returns:
torch.utils.hooks.RemovableHandle: a handle that can be used to remove the added hook by calling handle.remove()
- Return type
RemovableHandle
- register_buffer(name, tensor, persistent=True)¶
Adds a buffer to the module.
This is typically used to register a buffer that should not be considered a model parameter. For example, BatchNorm’s running_mean is not a parameter, but is part of the module’s state. Buffers, by default, are persistent and will be saved alongside parameters. This behavior can be changed by setting persistent to False. The only difference between a persistent buffer and a non-persistent buffer is that the latter will not be a part of this module’s state_dict.
Buffers can be accessed as attributes using given names.
- Args:
- name (string): name of the buffer. The buffer can be accessed from this module using the given name
- tensor (Tensor or None): buffer to be registered. If None, then operations that run on buffers, such as cuda, are ignored. If None, the buffer is not included in the module’s state_dict.
- persistent (bool): whether the buffer is part of this module’s state_dict.
Example:
>>> self.register_buffer('running_mean', torch.zeros(num_features))
- Return type
None
- register_forward_hook(hook)¶
Registers a forward hook on the module.
The hook will be called every time after forward() has computed an output. It should have the following signature:
hook(module, input, output) -> None or modified output
The input contains only the positional arguments given to the module. Keyword arguments are not passed to the hooks, only to forward. The hook can modify the output. It can modify the input in place, but this has no effect on forward, since the hook is called after forward() has run.
- Returns:
torch.utils.hooks.RemovableHandle: a handle that can be used to remove the added hook by calling handle.remove()
- Return type
RemovableHandle
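The forward-hook contract (run after forward(), optionally replace the output, removable through a handle) can be mimicked in a few lines of plain Python. This is a toy model for illustration only: ToyModule and ToyHandle are hypothetical and do not reproduce PyTorch's internals.

```python
class ToyHandle:
    """Hypothetical handle; remove() unregisters the hook it was created for."""

    def __init__(self, hooks, key):
        self._hooks = hooks
        self._key = key

    def remove(self):
        self._hooks.pop(self._key, None)


class ToyModule:
    def __init__(self):
        self._forward_hooks = {}
        self._next_key = 0

    def forward(self, x):
        return x * 2

    def register_forward_hook(self, hook):
        key = self._next_key
        self._next_key += 1
        self._forward_hooks[key] = hook
        return ToyHandle(self._forward_hooks, key)

    def __call__(self, *args):
        output = self.forward(*args)
        # Hooks run after forward(); a non-None return value replaces the output.
        for hook in list(self._forward_hooks.values()):
            result = hook(self, args, output)
            if result is not None:
                output = result
        return output


module = ToyModule()
handle = module.register_forward_hook(lambda mod, inputs, out: out + 1)
with_hook = module(3)          # forward gives 6, the hook turns it into 7
direct_call = module.forward(3)  # calling forward() directly skips the hook
handle.remove()
after_removal = module(3)
print(with_hook, direct_call, after_removal)
```

The direct forward() call shows why the instance should be called instead of forward(): only the call path through the instance runs the registered hooks.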
- register_forward_pre_hook(hook)¶
Registers a forward pre-hook on the module.
The hook will be called every time before forward() is invoked. It should have the following signature:
hook(module, input) -> None or modified input
The input contains only the positional arguments given to the module. Keyword arguments are not passed to the hooks, only to forward. The hook can modify the input. The user can return either a tuple or a single modified value from the hook. A single returned value is wrapped into a tuple (unless that value is already a tuple).
- Returns:
torch.utils.hooks.RemovableHandle: a handle that can be used to remove the added hook by calling handle.remove()
- Return type
RemovableHandle
- register_full_backward_hook(hook)¶
Registers a backward hook on the module.
The hook will be called every time the gradients with respect to module inputs are computed. The hook should have the following signature:
hook(module, grad_input, grad_output) -> tuple(Tensor) or None
The grad_input and grad_output are tuples that contain the gradients with respect to the inputs and outputs respectively. The hook should not modify its arguments, but it can optionally return a new gradient with respect to the input that will be used in place of grad_input in subsequent computations. grad_input will only correspond to the inputs given as positional arguments and all kwarg arguments are ignored. Entries in grad_input and grad_output will be None for all non-Tensor arguments.
For technical reasons, when this hook is applied to a Module, its forward function will receive a view of each Tensor passed to the Module. Similarly the caller will receive a view of each Tensor returned by the Module’s forward function.
Warning
Modifying inputs or outputs inplace is not allowed when using backward hooks and will raise an error.
- Returns:
torch.utils.hooks.RemovableHandle: a handle that can be used to remove the added hook by calling handle.remove()
- Return type
RemovableHandle
- register_load_state_dict_post_hook(hook)¶
Registers a post hook to be run after module’s load_state_dict is called.
It should have the following signature:
hook(module, incompatible_keys) -> None
The module argument is the current module that this hook is registered on, and the incompatible_keys argument is a NamedTuple consisting of attributes missing_keys and unexpected_keys. missing_keys is a list of str containing the missing keys and unexpected_keys is a list of str containing the unexpected keys.
The given incompatible_keys can be modified inplace if needed.
Note that the checks performed when calling load_state_dict() with strict=True are affected by modifications the hook makes to missing_keys or unexpected_keys, as expected. Additions to either set of keys will result in an error being thrown when strict=True, and clearing out both missing and unexpected keys will avoid an error.
- Returns:
torch.utils.hooks.RemovableHandle: a handle that can be used to remove the added hook by calling handle.remove()
- register_module(name, module)¶
Alias for add_module().
- Return type
None
- register_parameter(name, param)¶
Adds a parameter to the module.
The parameter can be accessed as an attribute using given name.
- Args:
- name (string): name of the parameter. The parameter can be accessed
from this module using the given name
- param (Parameter or None): parameter to be added to the module. If None, then operations that run on parameters, such as cuda, are ignored. If None, the parameter is not included in the module’s state_dict.
- Return type
None
- property remove_dc_offset: bool¶
- Return type
bool
- requires_grad_(requires_grad=True)¶
Change if autograd should record operations on parameters in this module.
This method sets the parameters’ requires_grad attributes in-place.
This method is helpful for freezing part of the module for finetuning or training parts of a model individually (e.g., GAN training).
See locally-disable-grad-doc for a comparison between .requires_grad_() and several similar mechanisms that may be confused with it.
- Args:
- requires_grad (bool): whether autograd should record operations on parameters in this module. Default: True.
- Returns:
Module: self
- Return type
~T
- property sampling_rate: int¶
- Return type
int
- set_extra_state(state)¶
This function is called from load_state_dict() to handle any extra state found within the state_dict. Implement this function and a corresponding get_extra_state() for your module if you need to store extra state within its state_dict.
- Args:
state (dict): Extra state from the state_dict
- share_memory()¶
See torch.Tensor.share_memory_()
- Return type
~T
- state_dict(*args, destination=None, prefix='', keep_vars=False)¶
Returns a dictionary containing a whole state of the module.
Both parameters and persistent buffers (e.g. running averages) are included. Keys are corresponding parameter and buffer names. Parameters and buffers set to None are not included.
Warning
Currently state_dict() also accepts positional arguments for destination, prefix and keep_vars in order. However, this is being deprecated and keyword arguments will be enforced in future releases.
Warning
Please avoid the use of argument destination as it is not designed for end-users.
- Args:
- destination (dict, optional): If provided, the state of module will be updated into the dict and the same object is returned. Otherwise, an OrderedDict will be created and returned. Default: None.
- prefix (str, optional): a prefix added to parameter and buffer names to compose the keys in state_dict. Default: ''.
- keep_vars (bool, optional): by default the Tensors returned in the state dict are detached from autograd. If it’s set to True, detaching will not be performed. Default: False.
- Returns:
- dict: a dictionary containing a whole state of the module
Example:
>>> module.state_dict().keys()
['bias', 'weight']
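How the keys of a state dict are composed from prefix and the nested names can be sketched without PyTorch. In this toy model, which is an assumption made for illustration, a module is a nested dict whose leaves are values and whose sub-dicts are submodules; flatten_state is a hypothetical helper.

```python
def flatten_state(module: dict, prefix: str = "") -> dict:
    """Flatten a nested dict into dotted keys, skipping entries set to None."""
    flat = {}
    for name, value in module.items():
        if isinstance(value, dict):
            # A submodule: recurse with its name appended to the prefix.
            flat.update(flatten_state(value, prefix + name + "."))
        elif value is not None:
            # Entries set to None are left out, as described above.
            flat[prefix + name] = value
    return flat


toy_module = {"weight": 1.0, "bias": None, "encoder": {"weight": 2.0}}
flat_plain = flatten_state(toy_module)
flat_prefixed = flatten_state(toy_module, prefix="model.")
print(flat_plain)
print(flat_prefixed)
```

The prefix is simply prepended to every key, which is why the same module produces different key sets depending on where it sits in a larger model.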
- to(*args, **kwargs)¶
Moves and/or casts the parameters and buffers.
This can be called as
- to(device=None, dtype=None, non_blocking=False)
- to(dtype, non_blocking=False)
- to(tensor, non_blocking=False)
- to(memory_format=torch.channels_last)
Its signature is similar to torch.Tensor.to(), but only accepts floating point or complex dtypes. In addition, this method will only cast the floating point or complex parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.
See below for examples.
Note
This method modifies the module in-place.
- Args:
- device (torch.device): the desired device of the parameters and buffers in this module
- dtype (torch.dtype): the desired floating point or complex dtype of the parameters and buffers in this module
- tensor (torch.Tensor): Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module
- memory_format (torch.memory_format): the desired memory format for 4D parameters and buffers in this module (keyword only argument)
- Returns:
Module: self
Examples:
>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)
>>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j,  0.2382+0.j],
        [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)
- to_empty(*, device)¶
Moves the parameters and buffers to the specified device without copying storage.
- Args:
- device (torch.device): The desired device of the parameters and buffers in this module.
- Returns:
Module: self
- Return type
~T
- train(mode=True)¶
Sets the module in training mode.
This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Args:
- mode (bool): whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns:
Module: self
- Return type
~T
- type(dst_type)¶
Casts all parameters and buffers to dst_type.
Note
This method modifies the module in-place.
- Args:
dst_type (type or string): the desired type
- Returns:
Module: self
- Return type
~T
- property window_type: str¶
- Return type
str
- xpu(device=None)¶
Moves all model parameters and buffers to the XPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on XPU while being optimized.
Note
This method modifies the module in-place.
- Arguments:
- device (int, optional): if specified, all parameters will be
copied to that device
- Returns:
Module: self
- Return type
~T
- zero_grad(set_to_none=False)¶
Sets gradients of all model parameters to zero. See similar function under torch.optim.Optimizer for more context.
- Args:
- set_to_none (bool): instead of setting to zero, set the grads to None. See torch.optim.Optimizer.zero_grad() for details.
- Return type
None
- training: bool¶
- class lhotse.features.kaldi.layers.Wav2LogFilterBank(sampling_rate=16000, frame_length=0.025, frame_shift=0.01, round_to_power_of_two=True, remove_dc_offset=True, preemph_coeff=0.97, window_type='povey', dither=0.0, snip_edges=False, energy_floor=1e-10, raw_energy=True, use_energy=False, use_fft_mag=False, low_freq=20.0, high_freq=-400.0, num_filters=80, norm_filters=False, torchaudio_compatible_mel_scale=True)[source]¶
Apply standard Kaldi preprocessing (dithering, removing DC offset, pre-emphasis, etc.) on the input waveforms and compute their log-Mel filter bank energies (also known as “fbank”).
Example:
>>> x = torch.randn(1, 16000, dtype=torch.float32)
>>> x.shape
torch.Size([1, 16000])
>>> t = Wav2LogFilterBank()
>>> t(x).shape
torch.Size([1, 100, 80])
The input is a tensor of shape (batch_size, num_samples). The output is a tensor of shape (batch_size, num_frames, num_filters).
- __init__(sampling_rate=16000, frame_length=0.025, frame_shift=0.01, round_to_power_of_two=True, remove_dc_offset=True, preemph_coeff=0.97, window_type='povey', dither=0.0, snip_edges=False, energy_floor=1e-10, raw_energy=True, use_energy=False, use_fft_mag=False, low_freq=20.0, high_freq=-400.0, num_filters=80, norm_filters=False, torchaudio_compatible_mel_scale=True)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
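The negative default for high_freq in the signature above is easier to read with a worked number. In Kaldi, a non-positive high_freq is interpreted as an offset from the Nyquist frequency; the sketch below assumes Wav2LogFilterBank follows that convention, which this page does not state explicitly. resolve_high_freq is a hypothetical helper, not a lhotse function.

```python
def resolve_high_freq(sampling_rate: int, high_freq: float) -> float:
    """Upper edge of the Mel filter bank in Hz, under Kaldi's convention.

    Assumption: a value <= 0 is an offset relative to the Nyquist frequency.
    """
    nyquist = sampling_rate / 2.0
    return nyquist + high_freq if high_freq <= 0.0 else high_freq


default_edge = resolve_high_freq(16000, -400.0)   # 8000 - 400
full_band_edge = resolve_high_freq(16000, 0.0)    # exactly Nyquist
explicit_edge = resolve_high_freq(16000, 7000.0)  # positive values are used as-is
print(default_edge, full_band_edge, explicit_edge)
```

Under this reading, the defaults low_freq=20.0 and high_freq=-400.0 place the 80 filters between 20 Hz and 7600 Hz for 16 kHz audio.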
- T_destination¶
alias of TypeVar(‘T_destination’, bound=Dict[str, Any])
- add_module(name, module)¶
Adds a child module to the current module.
The module can be accessed as an attribute using the given name.
- Args:
- name (string): name of the child module. The child module can be
accessed from this module using the given name
module (Module): child module to be added to the module.
- Return type
None
- apply(fn)¶
Applies fn recursively to every submodule (as returned by .children()) as well as self. Typical use includes initializing the parameters of a model (see also nn-init-doc).
- Args:
fn (Module -> None): function to be applied to each submodule
- Returns:
Module: self
Example:
>>> @torch.no_grad()
>>> def init_weights(m):
>>>     print(m)
>>>     if type(m) == nn.Linear:
>>>         m.weight.fill_(1.0)
>>>         print(m.weight)
>>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
>>> net.apply(init_weights)
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
- Return type
~T
- bfloat16()¶
Casts all floating point parameters and buffers to bfloat16 datatype.
Note
This method modifies the module in-place.
- Returns:
Module: self
- Return type
~T
- buffers(recurse=True)¶
Returns an iterator over module buffers.
- Args:
- recurse (bool): if True, then yields buffers of this module
and all submodules. Otherwise, yields only buffers that are direct members of this module.
- Yields:
torch.Tensor: module buffer
Example:
>>> for buf in model.buffers():
>>>     print(type(buf), buf.size())
<class 'torch.Tensor'> (20L,)
<class 'torch.Tensor'> (20L, 1L, 5L, 5L)
- Return type
Iterator
[Tensor
]
- children()¶
Returns an iterator over immediate children modules.
- Yields:
Module: a child module
- Return type
Iterator
[Module
]
- cpu()¶
Moves all model parameters and buffers to the CPU.
Note
This method modifies the module in-place.
- Returns:
Module: self
- Return type
~T
- cuda(device=None)¶
Moves all model parameters and buffers to the GPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on GPU while being optimized.
Note
This method modifies the module in-place.
- Args:
- device (int, optional): if specified, all parameters will be
copied to that device
- Returns:
Module: self
- Return type
~T
- double()¶
Casts all floating point parameters and buffers to double datatype.
Note
This method modifies the module in-place.
- Returns:
Module: self
- Return type
~T
- dump_patches: bool = False¶
- eval()¶
Sets the module in evaluation mode.
This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
This is equivalent to self.train(False).
See locally-disable-grad-doc for a comparison between .eval() and several similar mechanisms that may be confused with it.
- Returns:
Module: self
- Return type
~T
- extra_repr()¶
Set the extra representation of the module
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- Return type
str
- float()¶
Casts all floating point parameters and buffers to float datatype.
Note
This method modifies the module in-place.
- Returns:
Module: self
- Return type
~T
- forward(x)¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- Return type
Tensor
- get_buffer(target)¶
Returns the buffer given by target if it exists, otherwise throws an error.
See the docstring for get_submodule for a more detailed explanation of this method’s functionality as well as how to correctly specify target.
- Args:
- target: The fully-qualified string name of the buffer to look for. (See get_submodule for how to specify a fully-qualified string.)
- Returns:
torch.Tensor: The buffer referenced by target
- Raises:
- AttributeError: If the target string references an invalid path or resolves to something that is not a buffer
- Return type
Tensor
- get_extra_state()¶
Returns any extra state to include in the module’s state_dict. Implement this and a corresponding set_extra_state() for your module if you need to store extra state. This function is called when building the module’s state_dict().
Note that extra state should be pickleable to ensure working serialization of the state_dict. We only provide backwards compatibility guarantees for serializing Tensors; other objects may break backwards compatibility if their serialized pickled form changes.
- Returns:
object: Any extra state to store in the module’s state_dict
- Return type
Any
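A minimal sketch of the get_extra_state() / set_extra_state() pair, assuming a PyTorch version that supports extra state in state_dict. The `WithVersion` class and its `version` attribute are hypothetical names used only for illustration.

```python
from torch import nn


class WithVersion(nn.Module):
    """Hypothetical module that stores a plain Python value as extra state."""

    def __init__(self, version=0):
        super().__init__()
        self.linear = nn.Linear(2, 2)
        self.version = version

    def get_extra_state(self):
        # Called while state_dict() is being built.
        return {"version": self.version}

    def set_extra_state(self, state):
        # Called from load_state_dict().
        self.version = state["version"]


src = WithVersion(version=3)
dst = WithVersion()
dst.load_state_dict(src.state_dict())
restored_version = dst.version
```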
- get_parameter(target)¶
Returns the parameter given by target if it exists, otherwise throws an error.
See the docstring for get_submodule for a more detailed explanation of this method’s functionality as well as how to correctly specify target.
- Args:
- target: The fully-qualified string name of the Parameter to look for. (See get_submodule for how to specify a fully-qualified string.)
- Returns:
torch.nn.Parameter: The Parameter referenced by target
- Raises:
- AttributeError: If the target string references an invalid path or resolves to something that is not an nn.Parameter
- Return type
Parameter
- get_submodule(target)¶
Returns the submodule given by target if it exists, otherwise throws an error.
For example, let’s say you have an nn.Module A that looks like this:
A(
    (net_b): Module(
        (net_c): Module(
            (conv): Conv2d(16, 33, kernel_size=(3, 3), stride=(2, 2))
        )
        (linear): Linear(in_features=100, out_features=200, bias=True)
    )
)
(The diagram shows an nn.Module A. A has a nested submodule net_b, which itself has two submodules net_c and linear. net_c then has a submodule conv.)
To check whether or not we have the linear submodule, we would call get_submodule("net_b.linear"). To check whether we have the conv submodule, we would call get_submodule("net_b.net_c.conv").
The runtime of get_submodule is bounded by the degree of module nesting in target. A query against named_modules achieves the same result, but it is O(N) in the number of transitive modules. So, for a simple check to see if some submodule exists, get_submodule should always be used.
- Args:
- target: The fully-qualified string name of the submodule to look for. (See above example for how to specify a fully-qualified string.)
- Returns:
torch.nn.Module: The submodule referenced by target
- Raises:
- AttributeError: If the target string references an invalid path or resolves to something that is not an nn.Module
- Return type
Module
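The nested lookup described for get_submodule can be sketched as follows. The classes `A`, `NetB` and `NetC` are hypothetical and only mirror the module layout used in the description; PyTorch is assumed to be installed.

```python
from torch import nn


class NetC(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(16, 33, kernel_size=3, stride=2)


class NetB(nn.Module):
    def __init__(self):
        super().__init__()
        self.net_c = NetC()
        self.linear = nn.Linear(100, 200)


class A(nn.Module):
    def __init__(self):
        super().__init__()
        self.net_b = NetB()


a = A()
linear = a.get_submodule("net_b.linear")
conv = a.get_submodule("net_b.net_c.conv")

# An invalid path raises AttributeError, which allows a simple existence check.
try:
    a.get_submodule("net_b.missing")
    has_missing = True
except AttributeError:
    has_missing = False
```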
- half()¶
Casts all floating point parameters and buffers to
half
datatype.Note
This method modifies the module in-place.
- Returns:
Module: self
- Return type
~T
- ipu(device=None)¶
Moves all model parameters and buffers to the IPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on IPU while being optimized.
Note
This method modifies the module in-place.
- Arguments:
- device (int, optional): if specified, all parameters will be
copied to that device
- Returns:
Module: self
- Return type
~T
- load_state_dict(state_dict, strict=True)¶
Copies parameters and buffers from state_dict into this module and its descendants. If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.
- Args:
- state_dict (dict): a dict containing parameters and persistent buffers.
- strict (bool, optional): whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: True
- Returns:
NamedTuple with missing_keys and unexpected_keys fields:
missing_keys is a list of str containing the missing keys
unexpected_keys is a list of str containing the unexpected keys
- Note:
If a parameter or buffer is registered as None and its corresponding key exists in state_dict, load_state_dict() will raise a RuntimeError.
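The effect of the strict flag on load_state_dict can be sketched as follows. This is an illustration under the assumption that PyTorch is installed; `src` and `dst` are hypothetical modules whose key names deliberately do not match.

```python
from torch import nn

src = nn.Linear(2, 2)                 # keys: "weight", "bias"
dst = nn.Sequential(nn.Linear(2, 2))  # keys: "0.weight", "0.bias"

# With strict=False the mismatch is reported instead of raising.
result = dst.load_state_dict(src.state_dict(), strict=False)
missing = sorted(result.missing_keys)
unexpected = sorted(result.unexpected_keys)

# With strict=True (the default) the same mismatch raises RuntimeError.
try:
    dst.load_state_dict(src.state_dict(), strict=True)
    strict_failed = False
except RuntimeError:
    strict_failed = True
```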
- modules()¶
Returns an iterator over all modules in the network.
- Yields:
Module: a module in the network
- Note:
Duplicate modules are returned only once. In the following example, l will be returned only once.
Example:
>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.modules()):
...     print(idx, '->', m)
0 -> Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
1 -> Linear(in_features=2, out_features=2, bias=True)
- Return type
Iterator[Module]
- named_buffers(prefix='', recurse=True)¶
Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.
- Args:
prefix (str): prefix to prepend to all buffer names.
recurse (bool): if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.
- Yields:
(string, torch.Tensor): Tuple containing the name and buffer
Example:
>>> for name, buf in self.named_buffers():
...     if name in ['running_var']:
...         print(buf.size())
- Return type
Iterator[Tuple[str, Tensor]]
- named_children()¶
Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.
- Yields:
(string, Module): Tuple containing a name and child module
Example:
>>> for name, module in model.named_children():
...     if name in ['conv4', 'conv5']:
...         print(module)
- Return type
Iterator[Tuple[str, Module]]
- named_modules(memo=None, prefix='', remove_duplicate=True)¶
Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.
- Args:
memo: a memo to store the set of modules already added to the result
prefix: a prefix that will be added to the name of the module
remove_duplicate: whether to remove the duplicated module instances in the result or not
- Yields:
(string, Module): Tuple of name and module
- Note:
Duplicate modules are returned only once. In the following example, l will be returned only once.
Example:
>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.named_modules()):
...     print(idx, '->', m)
0 -> ('', Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
))
1 -> ('0', Linear(in_features=2, out_features=2, bias=True))
- named_parameters(prefix='', recurse=True)¶
Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
- Args:
prefix (str): prefix to prepend to all parameter names.
recurse (bool): if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.
- Yields:
(string, Parameter): Tuple containing the name and parameter
Example:
>>> for name, param in self.named_parameters():
...     if name in ['bias']:
...         print(param.size())
- Return type
Iterator[Tuple[str, Parameter]]
- online_inference(x, context=None)¶
- Return type
Tuple[Tensor, Tensor]
- parameters(recurse=True)¶
Returns an iterator over module parameters.
This is typically passed to an optimizer.
- Args:
- recurse (bool): if True, then yields parameters of this module
and all submodules. Otherwise, yields only parameters that are direct members of this module.
- Yields:
Parameter: module parameter
Example:
>>> for param in model.parameters():
...     print(type(param), param.size())
<class 'torch.Tensor'> (20L,)
<class 'torch.Tensor'> (20L, 1L, 5L, 5L)
- Return type
Iterator[Parameter]
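Passing parameters() to an optimizer, as mentioned above, can be sketched like this. The `model`, the learning rate and the choice of SGD are illustrative assumptions, not part of Lhotse.

```python
import torch
from torch import nn

model = nn.Linear(3, 1)  # hypothetical model: 3 weights + 1 bias
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Total number of scalar values across all parameters.
n_scalars = sum(p.numel() for p in model.parameters())

# One optimization step on a dummy loss.
before = model.weight.detach().clone()
loss = model(torch.ones(4, 3)).sum()
loss.backward()
optimizer.step()
weight_changed = not torch.equal(before, model.weight.detach())
```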
- register_backward_hook(hook)¶
Registers a backward hook on the module.
This function is deprecated in favor of register_full_backward_hook() and the behavior of this function will change in future versions.
- Returns:
torch.utils.hooks.RemovableHandle: a handle that can be used to remove the added hook by calling handle.remove()
- Return type
RemovableHandle
- register_buffer(name, tensor, persistent=True)¶
Adds a buffer to the module.
This is typically used to register a buffer that should not be considered a model parameter. For example, BatchNorm’s running_mean is not a parameter, but is part of the module’s state. Buffers, by default, are persistent and will be saved alongside parameters. This behavior can be changed by setting persistent to False. The only difference between a persistent buffer and a non-persistent buffer is that the latter will not be a part of this module’s state_dict.
Buffers can be accessed as attributes using given names.
- Args:
- name (string): name of the buffer. The buffer can be accessed from this module using the given name
- tensor (Tensor or None): buffer to be registered. If None, then operations that run on buffers, such as cuda, are ignored. If None, the buffer is not included in the module’s state_dict.
- persistent (bool): whether the buffer is part of this module’s state_dict.
Example:
>>> self.register_buffer('running_mean', torch.zeros(num_features))
- Return type
None
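The difference between persistent and non-persistent buffers can be sketched as follows. The `RunningStats` class and the buffer name `scratch` are hypothetical and serve only as an illustration, assuming PyTorch is installed.

```python
import torch
from torch import nn


class RunningStats(nn.Module):
    """Hypothetical module with one persistent and one non-persistent buffer."""

    def __init__(self, num_features):
        super().__init__()
        self.register_buffer("running_mean", torch.zeros(num_features))
        self.register_buffer("scratch", torch.zeros(num_features), persistent=False)


m = RunningStats(3)
state_keys = list(m.state_dict().keys())
buffer_names = [name for name, _ in m.named_buffers()]
```

Both buffers follow the module through calls such as .to() or .cuda(), but only the persistent one is saved in the state_dict.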
- register_forward_hook(hook)¶
Registers a forward hook on the module.
The hook will be called every time after forward() has computed an output. It should have the following signature:
hook(module, input, output) -> None or modified output
The input contains only the positional arguments given to the module. Keyword arguments are not passed to the hooks, only to forward. The hook can modify the output. It can modify the input in place, but this has no effect on forward, since the hook is called after forward() has run.
- Returns:
torch.utils.hooks.RemovableHandle: a handle that can be used to remove the added hook by calling handle.remove()
- Return type
RemovableHandle
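A forward hook with the signature above can be sketched as follows. The `record_shapes` function and the `shapes` list are hypothetical names; PyTorch is assumed to be installed.

```python
import torch
from torch import nn

layer = nn.Linear(3, 2)
shapes = []


def record_shapes(module, inputs, output):
    # `inputs` is a tuple of the positional arguments passed to the module.
    shapes.append((tuple(inputs[0].shape), tuple(output.shape)))
    # Returning None leaves the output unchanged.


handle = layer.register_forward_hook(record_shapes)
layer(torch.zeros(5, 3))   # hook fires
handle.remove()
layer(torch.zeros(7, 3))   # hook no longer fires
```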
- register_forward_pre_hook(hook)¶
Registers a forward pre-hook on the module.
The hook will be called every time before forward() is invoked. It should have the following signature:
hook(module, input) -> None or modified input
The input contains only the positional arguments given to the module. Keyword arguments are not passed to the hooks, only to forward. The hook can modify the input. The user can return either a tuple or a single modified value from the hook; a single value is wrapped into a tuple (unless that value is already a tuple).
- Returns:
torch.utils.hooks.RemovableHandle: a handle that can be used to remove the added hook by calling handle.remove()
- Return type
RemovableHandle
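A forward pre-hook that modifies the input can be sketched as follows. The `double_input` function is a hypothetical example, and nn.Identity is used only so that the effect of the hook is directly visible in the output.

```python
import torch
from torch import nn

layer = nn.Identity()


def double_input(module, inputs):
    # `inputs` is a tuple of positional arguments; returning a single tensor
    # is allowed and gets wrapped into a tuple before forward() is called.
    return inputs[0] * 2


handle = layer.register_forward_pre_hook(double_input)
out_with_hook = layer(torch.ones(2))
handle.remove()
out_without_hook = layer(torch.ones(2))
```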
- register_full_backward_hook(hook)¶
Registers a backward hook on the module.
The hook will be called every time the gradients with respect to module inputs are computed. The hook should have the following signature:
hook(module, grad_input, grad_output) -> tuple(Tensor) or None
The grad_input and grad_output are tuples that contain the gradients with respect to the inputs and outputs respectively. The hook should not modify its arguments, but it can optionally return a new gradient with respect to the input that will be used in place of grad_input in subsequent computations. grad_input will only correspond to the inputs given as positional arguments and all kwarg arguments are ignored. Entries in grad_input and grad_output will be None for all non-Tensor arguments.
For technical reasons, when this hook is applied to a Module, its forward function will receive a view of each Tensor passed to the Module. Similarly the caller will receive a view of each Tensor returned by the Module’s forward function.
Warning
Modifying inputs or outputs inplace is not allowed when using backward hooks and will raise an error.
- Returns:
torch.utils.hooks.RemovableHandle: a handle that can be used to remove the added hook by calling handle.remove()
- Return type
RemovableHandle
- register_load_state_dict_post_hook(hook)¶
Registers a post hook to be run after module’s load_state_dict is called.
It should have the following signature:
hook(module, incompatible_keys) -> None
The module argument is the current module that this hook is registered on, and the incompatible_keys argument is a NamedTuple consisting of attributes missing_keys and unexpected_keys. missing_keys is a list of str containing the missing keys and unexpected_keys is a list of str containing the unexpected keys.
The given incompatible_keys can be modified inplace if needed.
Note that the checks performed when calling load_state_dict() with strict=True are affected by modifications the hook makes to missing_keys or unexpected_keys, as expected. Additions to either set of keys will result in an error being thrown when strict=True, and clearing out both missing and unexpected keys will avoid an error.
- Returns:
torch.utils.hooks.RemovableHandle: a handle that can be used to remove the added hook by calling handle.remove()
- register_module(name, module)¶
Alias for add_module().
- Return type
None
- register_parameter(name, param)¶
Adds a parameter to the module.
The parameter can be accessed as an attribute using given name.
- Args:
- name (string): name of the parameter. The parameter can be accessed from this module using the given name
- param (Parameter or None): parameter to be added to the module. If None, then operations that run on parameters, such as cuda, are ignored. If None, the parameter is not included in the module’s state_dict.
- Return type
None
- property remove_dc_offset: bool¶
- Return type
bool
- requires_grad_(requires_grad=True)¶
Changes whether autograd should record operations on parameters in this module.
This method sets the parameters’ requires_grad attributes in-place.
This method is helpful for freezing part of the module for finetuning or training parts of a model individually (e.g., GAN training).
See locally-disable-grad-doc for a comparison between .requires_grad_() and several similar mechanisms that may be confused with it.
- Args:
- requires_grad (bool): whether autograd should record operations on parameters in this module. Default: True.
- Returns:
Module: self
- Return type
~T
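Freezing part of a model for finetuning, as described for requires_grad_, can be sketched as follows. The three-layer `model` is a hypothetical example, assuming PyTorch is installed.

```python
from torch import nn

# Hypothetical model: freeze the first layer, keep the last one trainable.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
model[0].requires_grad_(False)

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
frozen = [name for name, p in model.named_parameters() if not p.requires_grad]
```

Only the names in `trainable` would receive gradients during a backward pass.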
- property sampling_rate: int¶
- Return type
int
- set_extra_state(state)¶
This function is called from load_state_dict() to handle any extra state found within the state_dict. Implement this function and a corresponding get_extra_state() for your module if you need to store extra state within its state_dict.
- Args:
state (dict): Extra state from the state_dict
- share_memory()¶
See torch.Tensor.share_memory_()
- Return type
~T
- state_dict(*args, destination=None, prefix='', keep_vars=False)¶
Returns a dictionary containing the whole state of the module.
Both parameters and persistent buffers (e.g. running averages) are included. Keys are corresponding parameter and buffer names. Parameters and buffers set to None are not included.
Warning
Currently state_dict() also accepts positional arguments for destination, prefix and keep_vars in order. However, this is being deprecated and keyword arguments will be enforced in future releases.
Warning
Please avoid the use of argument destination as it is not designed for end-users.
- Args:
- destination (dict, optional): If provided, the state of module will be updated into the dict and the same object is returned. Otherwise, an OrderedDict will be created and returned. Default: None.
- prefix (str, optional): a prefix added to parameter and buffer names to compose the keys in state_dict. Default: ''.
- keep_vars (bool, optional): by default the Tensors returned in the state dict are detached from autograd. If it’s set to True, detaching will not be performed. Default: False.
- Returns:
- dict: a dictionary containing the whole state of the module
Example:
>>> module.state_dict().keys()
['bias', 'weight']
- to(*args, **kwargs)¶
Moves and/or casts the parameters and buffers.
This can be called as
- to(device=None, dtype=None, non_blocking=False)
- to(dtype, non_blocking=False)
- to(tensor, non_blocking=False)
- to(memory_format=torch.channels_last)
Its signature is similar to torch.Tensor.to(), but only accepts floating point or complex dtypes. In addition, this method will only cast the floating point or complex parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.
See below for examples.
Note
This method modifies the module in-place.
- Args:
- device (torch.device): the desired device of the parameters and buffers in this module
- dtype (torch.dtype): the desired floating point or complex dtype of the parameters and buffers in this module
- tensor (torch.Tensor): Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module
- memory_format (torch.memory_format): the desired memory format for 4D parameters and buffers in this module (keyword only argument)
- Returns:
Module: self
Examples:
>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)
>>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j,  0.2382+0.j],
        [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)
- to_empty(*, device)¶
Moves the parameters and buffers to the specified device without copying storage.
- Args:
- device (torch.device): The desired device of the parameters and buffers in this module.
- Returns:
Module: self
- Return type
~T
- train(mode=True)¶
Sets the module in training mode.
This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Args:
- mode (bool): whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns:
Module: self
- Return type
~T
- type(dst_type)¶
Casts all parameters and buffers to dst_type.
Note
This method modifies the module in-place.
- Args:
dst_type (type or string): the desired type
- Returns:
Module: self
- Return type
~T
- property window_type: str¶
- Return type
str
- xpu(device=None)¶
Moves all model parameters and buffers to the XPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on XPU while being optimized.
Note
This method modifies the module in-place.
- Arguments:
- device (int, optional): if specified, all parameters will be
copied to that device
- Returns:
Module: self
- Return type
~T
- zero_grad(set_to_none=False)¶
Sets gradients of all model parameters to zero. See similar function under torch.optim.Optimizer for more context.
- Args:
- set_to_none (bool): instead of setting to zero, set the grads to None. See torch.optim.Optimizer.zero_grad() for details.
- Return type
None
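The two behaviors of zero_grad can be sketched as follows, passing set_to_none explicitly so the result does not depend on the default of the installed PyTorch version. The `layer` module is a hypothetical example.

```python
import torch
from torch import nn

layer = nn.Linear(2, 1)
layer(torch.ones(1, 2)).sum().backward()
had_grad = layer.weight.grad is not None

# Keep the gradient tensors but fill them with zeros.
layer.zero_grad(set_to_none=False)
zeroed = bool((layer.weight.grad == 0).all())

# Drop the gradient tensors entirely.
layer.zero_grad(set_to_none=True)
cleared = layer.weight.grad is None
```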
- training: bool¶
- class lhotse.features.kaldi.layers.Wav2MFCC(sampling_rate=16000, frame_length=0.025, frame_shift=0.01, round_to_power_of_two=True, remove_dc_offset=True, preemph_coeff=0.97, window_type='povey', dither=0.0, snip_edges=False, energy_floor=1e-10, raw_energy=True, use_energy=False, use_fft_mag=False, low_freq=20.0, high_freq=-400.0, num_filters=23, norm_filters=False, num_ceps=13, cepstral_lifter=22, torchaudio_compatible_mel_scale=True)[source]¶
Apply standard Kaldi preprocessing (dithering, removing DC offset, pre-emphasis, etc.) on the input waveforms and compute their Mel-Frequency Cepstral Coefficients (MFCC).
Example:
>>> x = torch.randn(1, 16000, dtype=torch.float32)
>>> x.shape
torch.Size([1, 16000])
>>> t = Wav2MFCC()
>>> t(x).shape
torch.Size([1, 100, 13])
The input is a tensor of shape (batch_size, num_samples). The output is a tensor of shape (batch_size, num_frames, num_ceps).
- __init__(sampling_rate=16000, frame_length=0.025, frame_shift=0.01, round_to_power_of_two=True, remove_dc_offset=True, preemph_coeff=0.97, window_type='povey', dither=0.0, snip_edges=False, energy_floor=1e-10, raw_energy=True, use_energy=False, use_fft_mag=False, low_freq=20.0, high_freq=-400.0, num_filters=23, norm_filters=False, num_ceps=13, cepstral_lifter=22, torchaudio_compatible_mel_scale=True)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- static make_lifter(N, Q)[source]¶
Makes the liftering function
- Args:
N: Number of cepstral coefficients.
Q: Liftering parameter
- Returns:
Liftering vector.
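For orientation, the sinusoidal lifter commonly used in Kaldi-style MFCC pipelines is 1 + (Q / 2) * sin(pi * n / Q) for n = 0, ..., N - 1. The sketch below implements that formula in plain Python. It is an assumption that make_lifter follows this convention; the function name `kaldi_style_lifter` is hypothetical and this is not the Lhotse implementation.

```python
import math


def kaldi_style_lifter(N, Q):
    """Hypothetical helper: sinusoidal liftering weights for N coefficients.

    Assumes the common convention 1 + (Q / 2) * sin(pi * n / Q);
    Q == 0 is treated here as "no liftering".
    """
    if Q == 0:
        return [1.0] * N
    return [1.0 + 0.5 * Q * math.sin(math.pi * n / Q) for n in range(N)]


# Defaults from the Wav2MFCC signature: num_ceps=13, cepstral_lifter=22.
lifter = kaldi_style_lifter(N=13, Q=22)
```

Under this convention the zeroth coefficient is left unscaled and the weights grow towards n = Q / 2.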
- T_destination¶
alias of TypeVar(‘T_destination’, bound=Dict[str, Any])
- add_module(name, module)¶
Adds a child module to the current module.
The module can be accessed as an attribute using the given name.
- Args:
- name (string): name of the child module. The child module can be accessed from this module using the given name
- module (Module): child module to be added to the module.
- Return type
None
- apply(fn)¶
Applies fn recursively to every submodule (as returned by .children()) as well as self. Typical use includes initializing the parameters of a model (see also nn-init-doc).
- Args:
fn (Module -> None): function to be applied to each submodule
- Returns:
Module: self
Example:
>>> @torch.no_grad()
... def init_weights(m):
...     print(m)
...     if type(m) == nn.Linear:
...         m.weight.fill_(1.0)
...         print(m.weight)
>>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
>>> net.apply(init_weights)
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
- Return type
~T
- bfloat16()¶
Casts all floating point parameters and buffers to bfloat16 datatype.
Note
This method modifies the module in-place.
- Returns:
Module: self
- Return type
~T
- buffers(recurse=True)¶
Returns an iterator over module buffers.
- Args:
- recurse (bool): if True, then yields buffers of this module
and all submodules. Otherwise, yields only buffers that are direct members of this module.
- Yields:
torch.Tensor: module buffer
Example:
>>> for buf in model.buffers():
...     print(type(buf), buf.size())
<class 'torch.Tensor'> (20L,)
<class 'torch.Tensor'> (20L, 1L, 5L, 5L)
- Return type
Iterator[Tensor]
- children()¶
Returns an iterator over immediate children modules.
- Yields:
Module: a child module
- Return type
Iterator[Module]
- cpu()¶
Moves all model parameters and buffers to the CPU.
Note
This method modifies the module in-place.
- Returns:
Module: self
- Return type
~T
- cuda(device=None)¶
Moves all model parameters and buffers to the GPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on GPU while being optimized.
Note
This method modifies the module in-place.
- Args:
- device (int, optional): if specified, all parameters will be
copied to that device
- Returns:
Module: self
- Return type
~T
- double()¶
Casts all floating point parameters and buffers to double datatype.
Note
This method modifies the module in-place.
- Returns:
Module: self
- Return type
~T
- dump_patches: bool = False¶
- eval()¶
Sets the module in evaluation mode.
This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
This is equivalent to self.train(False).
See locally-disable-grad-doc for a comparison between .eval() and several similar mechanisms that may be confused with it.
- Returns:
Module: self
- Return type
~T
- extra_repr()¶
Set the extra representation of the module
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- Return type
str
- float()¶
Casts all floating point parameters and buffers to float datatype.
Note
This method modifies the module in-place.
- Returns:
Module: self
- Return type
~T
- forward(x)¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- Return type
Tensor
- get_buffer(target)¶
Returns the buffer given by target if it exists, otherwise throws an error.
See the docstring for get_submodule for a more detailed explanation of this method’s functionality as well as how to correctly specify target.
- Args:
- target: The fully-qualified string name of the buffer to look for. (See get_submodule for how to specify a fully-qualified string.)
- Returns:
torch.Tensor: The buffer referenced by target
- Raises:
- AttributeError: If the target string references an invalid path or resolves to something that is not a buffer
- Return type
Tensor
- get_extra_state()¶
Returns any extra state to include in the module’s state_dict. Implement this and a corresponding set_extra_state() for your module if you need to store extra state. This function is called when building the module’s state_dict().
Note that extra state should be pickleable to ensure working serialization of the state_dict. We only provide backwards compatibility guarantees for serializing Tensors; other objects may break backwards compatibility if their serialized pickled form changes.
- Returns:
object: Any extra state to store in the module’s state_dict
- Return type
Any
- get_parameter(target)¶
Returns the parameter given by target if it exists, otherwise throws an error.
See the docstring for get_submodule for a more detailed explanation of this method’s functionality as well as how to correctly specify target.
- Args:
- target: The fully-qualified string name of the Parameter to look for. (See get_submodule for how to specify a fully-qualified string.)
- Returns:
torch.nn.Parameter: The Parameter referenced by target
- Raises:
- AttributeError: If the target string references an invalid path or resolves to something that is not an nn.Parameter
- Return type
Parameter
- get_submodule(target)¶
Returns the submodule given by target if it exists, otherwise throws an error.
For example, let’s say you have an nn.Module A that looks like this:
A(
    (net_b): Module(
        (net_c): Module(
            (conv): Conv2d(16, 33, kernel_size=(3, 3), stride=(2, 2))
        )
        (linear): Linear(in_features=100, out_features=200, bias=True)
    )
)
(The diagram shows an nn.Module A. A has a nested submodule net_b, which itself has two submodules net_c and linear. net_c then has a submodule conv.)
To check whether or not we have the linear submodule, we would call get_submodule("net_b.linear"). To check whether we have the conv submodule, we would call get_submodule("net_b.net_c.conv").
The runtime of get_submodule is bounded by the degree of module nesting in target. A query against named_modules achieves the same result, but it is O(N) in the number of transitive modules. So, for a simple check to see if some submodule exists, get_submodule should always be used.
- Args:
- target: The fully-qualified string name of the submodule to look for. (See above example for how to specify a fully-qualified string.)
- Returns:
torch.nn.Module: The submodule referenced by target
- Raises:
- AttributeError: If the target string references an invalid path or resolves to something that is not an nn.Module
- Return type
Module
- half()¶
Casts all floating point parameters and buffers to half datatype.
Note
This method modifies the module in-place.
- Returns:
Module: self
- Return type
~T
- ipu(device=None)¶
Moves all model parameters and buffers to the IPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on IPU while being optimized.
Note
This method modifies the module in-place.
- Arguments:
- device (int, optional): if specified, all parameters will be
copied to that device
- Returns:
Module: self
- Return type
~T
- load_state_dict(state_dict, strict=True)¶
Copies parameters and buffers from state_dict into this module and its descendants. If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.
- Args:
- state_dict (dict): a dict containing parameters and persistent buffers.
- strict (bool, optional): whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: True
- Returns:
NamedTuple with missing_keys and unexpected_keys fields:
missing_keys is a list of str containing the missing keys
unexpected_keys is a list of str containing the unexpected keys
- Note:
If a parameter or buffer is registered as None and its corresponding key exists in state_dict, load_state_dict() will raise a RuntimeError.
- modules()¶
Returns an iterator over all modules in the network.
- Yields:
Module: a module in the network
- Note:
Duplicate modules are returned only once. In the following example, l will be returned only once.
Example:
>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.modules()):
...     print(idx, '->', m)
0 -> Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
1 -> Linear(in_features=2, out_features=2, bias=True)
- Return type
Iterator[Module]
- named_buffers(prefix='', recurse=True)¶
Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.
- Args:
prefix (str): prefix to prepend to all buffer names.
recurse (bool): if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.
- Yields:
(string, torch.Tensor): Tuple containing the name and buffer
Example:
>>> for name, buf in self.named_buffers():
...     if name in ['running_var']:
...         print(buf.size())
- Return type
Iterator[Tuple[str, Tensor]]
- named_children()¶
Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.
- Yields:
(string, Module): Tuple containing a name and child module
Example:
>>> for name, module in model.named_children():
...     if name in ['conv4', 'conv5']:
...         print(module)
- Return type
Iterator[Tuple[str, Module]]
- named_modules(memo=None, prefix='', remove_duplicate=True)¶
Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.
- Args:
memo: a memo to store the set of modules already added to the result
prefix: a prefix that will be added to the name of the module
remove_duplicate: whether to remove the duplicated module instances in the result or not
- Yields:
(string, Module): Tuple of name and module
- Note:
Duplicate modules are returned only once. In the following example, l will be returned only once.
Example:
>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.named_modules()):
...     print(idx, '->', m)
0 -> ('', Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
))
1 -> ('0', Linear(in_features=2, out_features=2, bias=True))
- named_parameters(prefix='', recurse=True)¶
Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
- Args:
prefix (str): prefix to prepend to all parameter names.
recurse (bool): if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.
- Yields:
(string, Parameter): Tuple containing the name and parameter
Example:
>>> for name, param in self.named_parameters():
>>>     if name in ['bias']:
>>>         print(param.size())
- Return type
Iterator[Tuple[str, Parameter]]
- online_inference(x, context=None)¶
- Return type
Tuple[Tensor, Tensor]
- parameters(recurse=True)¶
Returns an iterator over module parameters.
This is typically passed to an optimizer.
- Args:
- recurse (bool): if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.
- Yields:
Parameter: module parameter
Example:
>>> for param in model.parameters():
>>>     print(type(param), param.size())
<class 'torch.Tensor'> (20L,)
<class 'torch.Tensor'> (20L, 1L, 5L, 5L)
- Return type
Iterator[Parameter]
- register_backward_hook(hook)¶
Registers a backward hook on the module.
This function is deprecated in favor of register_full_backward_hook() and the behavior of this function will change in future versions.
- Returns:
torch.utils.hooks.RemovableHandle: a handle that can be used to remove the added hook by calling handle.remove()
- Return type
RemovableHandle
- register_buffer(name, tensor, persistent=True)¶
Adds a buffer to the module.
This is typically used to register a buffer that should not be considered a model parameter. For example, BatchNorm’s running_mean is not a parameter, but is part of the module’s state. Buffers, by default, are persistent and will be saved alongside parameters. This behavior can be changed by setting persistent to False. The only difference between a persistent buffer and a non-persistent buffer is that the latter will not be a part of this module’s state_dict.
Buffers can be accessed as attributes using given names.
- Args:
- name (string): name of the buffer. The buffer can be accessed from this module using the given name
- tensor (Tensor or None): buffer to be registered. If None, then operations that run on buffers, such as cuda, are ignored. If None, the buffer is not included in the module’s state_dict.
- persistent (bool): whether the buffer is part of this module’s state_dict.
Example:
>>> self.register_buffer('running_mean', torch.zeros(num_features))
- Return type
None
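The difference between persistent and non-persistent buffers can be sketched as follows. The Stats module and its buffer names are hypothetical; the point is only that a non-persistent buffer stays visible through named_buffers() but is left out of state_dict().

```python
import torch
from torch import nn


class Stats(nn.Module):
    """Hypothetical module used only to illustrate register_buffer()."""

    def __init__(self):
        super().__init__()
        # Saved in state_dict (persistent=True is the default).
        self.register_buffer("running_mean", torch.zeros(3))
        # Kept as module state, but excluded from state_dict.
        self.register_buffer("scratch", torch.zeros(3), persistent=False)


stats = Stats()
state_keys = list(stats.state_dict().keys())
buffer_names = [name for name, _ in stats.named_buffers()]
print(state_keys, buffer_names)
```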
- register_forward_hook(hook)¶
Registers a forward hook on the module.
The hook will be called every time after forward() has computed an output. It should have the following signature:
hook(module, input, output) -> None or modified output
The input contains only the positional arguments given to the module. Keyword arguments are not passed to the hooks, only to forward. The hook can modify the output. It can modify the input inplace, but this has no effect on forward since the hook is called after forward() has run.
- Returns:
torch.utils.hooks.RemovableHandle: a handle that can be used to remove the added hook by calling handle.remove()
- Return type
RemovableHandle
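A minimal sketch of a forward hook that records the output shape and is then removed through its handle. The layer, the save_output function, and the captured dict are illustrative assumptions.

```python
import torch
from torch import nn

layer = nn.Linear(4, 2)
captured = {}


def save_output(module, inputs, output):
    # Returning None leaves the output unchanged.
    captured["shape"] = tuple(output.shape)


handle = layer.register_forward_hook(save_output)
layer(torch.randn(3, 4))
shape_seen = captured.pop("shape")

# After remove(), the hook no longer fires.
handle.remove()
layer(torch.randn(3, 4))
print(shape_seen, captured)
```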
- register_forward_pre_hook(hook)¶
Registers a forward pre-hook on the module.
The hook will be called every time before forward() is invoked. It should have the following signature:
hook(module, input) -> None or modified input
The input contains only the positional arguments given to the module. Keyword arguments are not passed to the hooks, only to forward. The hook can modify the input. The user can return either a tuple or a single modified value from the hook; a single value is wrapped into a tuple (unless that value is already a tuple).
- Returns:
torch.utils.hooks.RemovableHandle: a handle that can be used to remove the added hook by calling handle.remove()
- Return type
RemovableHandle
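The pre-hook contract (positional inputs arrive as a tuple, and a single return value is wrapped into a tuple) might look like this in practice. nn.Identity and the double_input function are chosen purely for illustration.

```python
import torch
from torch import nn

layer = nn.Identity()


def double_input(module, inputs):
    # `inputs` is the tuple of positional arguments passed to the module.
    (x,) = inputs
    # A single value is returned here; it is wrapped into a tuple before forward().
    return x * 2


handle = layer.register_forward_pre_hook(double_input)
out = layer(torch.ones(2))
handle.remove()
print(out)
```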
- register_full_backward_hook(hook)¶
Registers a backward hook on the module.
The hook will be called every time the gradients with respect to module inputs are computed. The hook should have the following signature:
hook(module, grad_input, grad_output) -> tuple(Tensor) or None
The grad_input and grad_output are tuples that contain the gradients with respect to the inputs and outputs respectively. The hook should not modify its arguments, but it can optionally return a new gradient with respect to the input that will be used in place of grad_input in subsequent computations. grad_input will only correspond to the inputs given as positional arguments and all kwarg arguments are ignored. Entries in grad_input and grad_output will be None for all non-Tensor arguments.
For technical reasons, when this hook is applied to a Module, its forward function will receive a view of each Tensor passed to the Module. Similarly the caller will receive a view of each Tensor returned by the Module’s forward function.
Warning
Modifying inputs or outputs inplace is not allowed when using backward hooks and will raise an error.
- Returns:
torch.utils.hooks.RemovableHandle: a handle that can be used to remove the added hook by calling handle.remove()
- Return type
RemovableHandle
- register_load_state_dict_post_hook(hook)¶
Registers a post hook to be run after module’s load_state_dict is called.
It should have the following signature:
hook(module, incompatible_keys) -> None
The module argument is the current module that this hook is registered on, and the incompatible_keys argument is a NamedTuple consisting of attributes missing_keys and unexpected_keys. missing_keys is a list of str containing the missing keys and unexpected_keys is a list of str containing the unexpected keys.
The given incompatible_keys can be modified inplace if needed.
Note that the checks performed when calling load_state_dict() with strict=True are affected by modifications the hook makes to missing_keys or unexpected_keys, as expected. Additions to either set of keys will result in an error being thrown when strict=True, and clearing out both missing and unexpected keys will avoid an error.
- Returns:
torch.utils.hooks.RemovableHandle: a handle that can be used to remove the added hook by calling handle.remove()
- register_module(name, module)¶
Alias for add_module().
- Return type
None
- register_parameter(name, param)¶
Adds a parameter to the module.
The parameter can be accessed as an attribute using given name.
- Args:
- name (string): name of the parameter. The parameter can be accessed from this module using the given name
- param (Parameter or None): parameter to be added to the module. If None, then operations that run on parameters, such as cuda, are ignored. If None, the parameter is not included in the module’s state_dict.
- Return type
None
- property remove_dc_offset: bool¶
- Return type
bool
- requires_grad_(requires_grad=True)¶
Change if autograd should record operations on parameters in this module.
This method sets the parameters’ requires_grad attributes in-place.
This method is helpful for freezing part of the module for finetuning or training parts of a model individually (e.g., GAN training).
See locally-disable-grad-doc for a comparison between .requires_grad_() and several similar mechanisms that may be confused with it.
- Args:
- requires_grad (bool): whether autograd should record operations on parameters in this module. Default: True.
- Returns:
Module: self
- Return type
~T
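Freezing part of a model for finetuning, as mentioned above, could be sketched like this. The two-layer model is an assumption for illustration; requires_grad_() and named_parameters() are the methods documented in this section.

```python
from torch import nn

# Hypothetical model: freeze the first layer, keep the second trainable.
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
model[0].requires_grad_(False)

# Only parameters that still require gradients would be handed to an optimizer.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)
```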
- property sampling_rate: int¶
- Return type
int
- set_extra_state(state)¶
This function is called from load_state_dict() to handle any extra state found within the state_dict. Implement this function and a corresponding get_extra_state() for your module if you need to store extra state within its state_dict.
- Args:
state (dict): Extra state from the state_dict
- share_memory()¶
See torch.Tensor.share_memory_()
- Return type
~T
- state_dict(*args, destination=None, prefix='', keep_vars=False)¶
Returns a dictionary containing a whole state of the module.
Both parameters and persistent buffers (e.g. running averages) are included. Keys are corresponding parameter and buffer names. Parameters and buffers set to None are not included.
Warning
Currently state_dict() also accepts positional arguments for destination, prefix and keep_vars in order. However, this is being deprecated and keyword arguments will be enforced in future releases.
Warning
Please avoid the use of argument destination as it is not designed for end-users.
- Args:
- destination (dict, optional): If provided, the state of module will be updated into the dict and the same object is returned. Otherwise, an OrderedDict will be created and returned. Default: None.
- prefix (str, optional): a prefix added to parameter and buffer names to compose the keys in state_dict. Default: ''.
- keep_vars (bool, optional): by default the Tensors returned in the state dict are detached from autograd. If it’s set to True, detaching will not be performed. Default: False.
- Returns:
- dict:
a dictionary containing a whole state of the module
Example:
>>> module.state_dict().keys()
['bias', 'weight']
- to(*args, **kwargs)¶
Moves and/or casts the parameters and buffers.
This can be called as
- to(device=None, dtype=None, non_blocking=False)
- to(dtype, non_blocking=False)
- to(tensor, non_blocking=False)
- to(memory_format=torch.channels_last)
Its signature is similar to torch.Tensor.to(), but only accepts floating point or complex dtypes. In addition, this method will only cast the floating point or complex parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.
See below for examples.
Note
This method modifies the module in-place.
- Args:
- device (torch.device): the desired device of the parameters and buffers in this module
- dtype (torch.dtype): the desired floating point or complex dtype of the parameters and buffers in this module
- tensor (torch.Tensor): Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module
- memory_format (torch.memory_format): the desired memory format for 4D parameters and buffers in this module (keyword only argument)
- Returns:
Module: self
Examples:
>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)
>>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j,  0.2382+0.j],
        [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)
- to_empty(*, device)¶
Moves the parameters and buffers to the specified device without copying storage.
- Args:
- device (torch.device): The desired device of the parameters and buffers in this module.
- Returns:
Module: self
- Return type
~T
- train(mode=True)¶
Sets the module in training mode.
This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Args:
- mode (bool): whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns:
Module: self
- Return type
~T
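Dropout is one of the modules whose behavior depends on the mode. The sketch below, using an arbitrary p=0.5 and an all-ones input chosen for illustration, shows that evaluation mode passes the input through unchanged, while training mode zeroes some elements and rescales the rest by 1 / (1 - p).

```python
import torch
from torch import nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

# Evaluation mode: dropout is a no-op.
drop.train(False)
eval_out = drop(x)

# Training mode: each element is either zeroed or scaled by 1 / (1 - p) = 2.
returned = drop.train(True)  # train() returns the module itself
train_out = drop(x)
train_values = set(train_out.unique().tolist())
print(torch.equal(eval_out, x), train_values)
```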
- type(dst_type)¶
Casts all parameters and buffers to dst_type.
Note
This method modifies the module in-place.
- Args:
dst_type (type or string): the desired type
- Returns:
Module: self
- Return type
~T
- property window_type: str¶
- Return type
str
- xpu(device=None)¶
Moves all model parameters and buffers to the XPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on XPU while being optimized.
Note
This method modifies the module in-place.
- Arguments:
- device (int, optional): if specified, all parameters will be
copied to that device
- Returns:
Module: self
- Return type
~T
- zero_grad(set_to_none=False)¶
Sets gradients of all model parameters to zero. See similar function under torch.optim.Optimizer for more context.
- Args:
- set_to_none (bool): instead of setting to zero, set the grads to None. See torch.optim.Optimizer.zero_grad() for details.
- Return type
None
- training: bool¶
- lhotse.features.kaldi.layers.create_mel_scale(num_filters, fft_length, sampling_rate, low_freq=0, high_freq=None, norm_filters=True)[source]¶
- Return type
Tensor
Torchaudio feature extractors¶
- class lhotse.features.fbank.TorchaudioFbankConfig(dither=0.0, window_type='povey', frame_length=0.025, frame_shift=0.01, remove_dc_offset=True, round_to_power_of_two=True, energy_floor=1e-10, min_duration=0.0, preemphasis_coefficient=0.97, raw_energy=True, low_freq=20.0, high_freq=-400.0, num_mel_bins=80, use_energy=False, vtln_low=100.0, vtln_high=-500.0, vtln_warp=1.0)[source]¶
- dither: float = 0.0¶
- window_type: str = 'povey'¶
- frame_length: float = 0.025¶
- frame_shift: float = 0.01¶
- remove_dc_offset: bool = True¶
- round_to_power_of_two: bool = True¶
- energy_floor: float = 1e-10¶
- min_duration: float = 0.0¶
- preemphasis_coefficient: float = 0.97¶
- raw_energy: bool = True¶
- low_freq: float = 20.0¶
- high_freq: float = -400.0¶
- num_mel_bins: int = 80¶
- use_energy: bool = False¶
- vtln_low: float = 100.0¶
- vtln_high: float = -500.0¶
- vtln_warp: float = 1.0¶
- __init__(dither=0.0, window_type='povey', frame_length=0.025, frame_shift=0.01, remove_dc_offset=True, round_to_power_of_two=True, energy_floor=1e-10, min_duration=0.0, preemphasis_coefficient=0.97, raw_energy=True, low_freq=20.0, high_freq=-400.0, num_mel_bins=80, use_energy=False, vtln_low=100.0, vtln_high=-500.0, vtln_warp=1.0)¶
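To make the default values more concrete, here is a rough sketch of what they imply at a 16 kHz sampling rate. The frame_params helper is hypothetical and not part of lhotse. It assumes the Kaldi convention that a non-positive high_freq is an offset from the Nyquist frequency, and it uses rounding for the sample counts, which may differ from the actual implementation for rates where the products are not whole numbers.

```python
def frame_params(sampling_rate, frame_length=0.025, frame_shift=0.01,
                 round_to_power_of_two=True, high_freq=-400.0):
    """Hypothetical helper: derive sample-level quantities from config values."""
    window_size = int(round(frame_length * sampling_rate))
    window_shift = int(round(frame_shift * sampling_rate))
    # Pad the analysis window up to the next power of two for the FFT.
    if round_to_power_of_two:
        fft_size = 1 << (window_size - 1).bit_length()
    else:
        fft_size = window_size
    # Assumed Kaldi convention: high_freq <= 0 is relative to Nyquist.
    nyquist = sampling_rate / 2
    effective_high = high_freq if high_freq > 0 else nyquist + high_freq
    return window_size, window_shift, fft_size, effective_high


params = frame_params(16000)
print(params)  # (400, 160, 512, 7600.0)
```

Under these assumptions, the defaults correspond to 25 ms windows taken every 10 ms, a 512-point FFT, and a mel filter bank spanning 20 Hz to 7600 Hz.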
- class lhotse.features.fbank.TorchaudioFbank(config=None)[source]¶
Log Mel energy filter bank feature extractor based on the torchaudio.compliance.kaldi.fbank function.
- name = 'fbank'¶
- config_type¶
- static mix(features_a, features_b, energy_scaling_factor_b)[source]¶
Perform feature-domain mix of two signals, a and b, and return the mixed signal.
- Parameters
features_a (ndarray) – Left-hand side (reference) signal.
features_b (ndarray) – Right-hand side (mixed-in) signal.
energy_scaling_factor_b (float) – A scaling factor for features_b energy. It is used to achieve a specific SNR. E.g. to mix with an SNR of 10dB when both features_a and features_b energies are 100, the features_b signal energy needs to be scaled by 0.1. Since different features (e.g. spectrogram, fbank, MFCC) require different combination of transformations (e.g. exp, log, sqrt, pow) to allow mixing of two signals, the exact place where to apply energy_scaling_factor_b to the signal is determined by the implementer.
- Return type
ndarray
- Returns
A mixed feature matrix.
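The SNR example in the parameter description can be reproduced with a few lines of arithmetic. The energy_scaling_factor helper below is hypothetical (it is not a lhotse function); it only shows how a target SNR in dB translates into the factor passed as energy_scaling_factor_b.

```python
def energy_scaling_factor(energy_a, energy_b, snr_db):
    """Hypothetical helper: factor by which to scale b's energy for a target SNR."""
    # SNR (dB) = 10 * log10(energy_a / target_energy_b)
    target_energy_b = energy_a / (10.0 ** (snr_db / 10.0))
    return target_energy_b / energy_b


factor = energy_scaling_factor(energy_a=100.0, energy_b=100.0, snr_db=10.0)
print(factor)  # 0.1
```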
- static compute_energy(features)[source]¶
Compute the total energy of a feature matrix. How the energy is computed depends on a particular type of features. It is expected that when implemented, compute_energy will never return zero.
- Parameters
features (ndarray) – A feature matrix.
- Return type
float
- Returns
A positive float value of the signal energy.
- __init__(config=None)¶
- property device: Union[str, torch.device]¶
- Return type
Union[str, device]
- extract(samples, sampling_rate)¶
Defines how to extract features using a numpy ndarray of audio samples and the sampling rate.
- Return type
ndarray
- Returns
a numpy ndarray representing the feature matrix.
- extract_batch(samples, sampling_rate)¶
Performs batch extraction. It is not guaranteed to be faster than FeatureExtractor.extract() – it depends on whether the implementation of a particular feature extractor supports accelerated batch computation.
Note
Unless overridden by child classes, it defaults to sequentially calling FeatureExtractor.extract() on the inputs.
Note
This method should support variable length inputs.
- Return type
Union[ndarray, Tensor, List[ndarray], List[Tensor]]
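The default sequential behavior described in the notes can be pictured with a toy extractor. ToyExtractor is a hypothetical stand-in, not lhotse's FeatureExtractor, and its "feature" (mean absolute amplitude per frame) is made up; the sketch only illustrates falling back to per-item extract() calls and returning a list when input lengths differ.

```python
import numpy as np


class ToyExtractor:
    """Hypothetical stand-in, not lhotse's FeatureExtractor."""

    frame_shift_samples = 160

    def extract(self, samples, sampling_rate):
        # One made-up feature per frame: mean absolute amplitude.
        hop = self.frame_shift_samples
        num_frames = len(samples) // hop
        frames = samples[: num_frames * hop].reshape(num_frames, hop)
        return np.abs(frames).mean(axis=1, keepdims=True)

    def extract_batch(self, samples, sampling_rate):
        # Sequential fallback: variable-length inputs yield a list of matrices.
        return [self.extract(s, sampling_rate) for s in samples]


batch = [np.ones(1600, dtype=np.float32), np.ones(800, dtype=np.float32)]
feats = ToyExtractor().extract_batch(batch, sampling_rate=16000)
shapes = [f.shape for f in feats]
print(shapes)
```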
- extract_from_recording_and_store(recording, storage, offset=0, duration=None, channels=None, augment_fn=None)¶
Extract the features from a Recording in a full pipeline:
load audio from disk;
optionally, perform audio augmentation;
extract the features;
save them to disk in a specified directory;
return a Features object with a description of the extracted features and the source data used.
- Parameters
recording (Recording) – a Recording that specifies the input audio.
storage (FeaturesWriter) – a FeaturesWriter object that will handle storing the feature matrices.
offset (float) – an optional offset in seconds for where to start reading the recording.
duration (Optional[float]) – an optional duration specifying how much audio to load from the recording.
channels (Union[int, List[int], None]) – an optional int or list of ints, specifying the channels; by default, all channels will be used.
augment_fn (Optional[Callable[[ndarray, int], ndarray]]) – an optional WavAugmenter instance to modify the waveform before feature extraction.
- Return type
Features
- Returns
a Features manifest item for the extracted feature matrix.
- extract_from_samples_and_store(samples, storage, sampling_rate, offset=0, channel=None, augment_fn=None)¶
Extract the features from an array of audio samples in a full pipeline:
optional audio augmentation;
extract the features;
save them to disk in a specified directory;
return a Features object with a description of the extracted features.
Note, unlike in extract_from_recording_and_store, the returned Features object might not be suitable to store in a FeatureSet, as it does not reference any particular Recording. Instead, this method is useful when extracting features from cuts - especially MixedCut instances, which may be created from multiple recordings and channels.
- Parameters
samples (ndarray) – a numpy ndarray with the audio samples.
sampling_rate (int) – integer sampling rate of samples.
storage (FeaturesWriter) – a FeaturesWriter object that will handle storing the feature matrices.
offset (float) – an offset in seconds for where to start reading the recording - when used for Cut feature extraction, must be equal to Cut.start.
channel (Union[int, List[int], None]) – an optional channel number(s) to insert into Features manifest.
augment_fn (Optional[Callable[[ndarray, int], ndarray]]) – an optional WavAugmenter instance to modify the waveform before feature extraction.
- Return type
Features
- Returns
a Features manifest item for the extracted feature matrix (it is not written to disk).
- property frame_shift: float¶
- Return type
float
- classmethod from_dict(data)¶
- Return type
- classmethod from_yaml(path)¶
- Return type
- to_dict()¶
- Return type
Dict[str, Any]
- to_yaml(path)¶
- class lhotse.features.mfcc.TorchaudioMfccConfig(dither=0.0, window_type='povey', frame_length=0.025, frame_shift=0.01, remove_dc_offset=True, round_to_power_of_two=True, energy_floor=1e-10, min_duration=0.0, preemphasis_coefficient=0.97, raw_energy=True, low_freq=20.0, high_freq=-400.0, num_mel_bins=23, use_energy=False, vtln_low=100.0, vtln_high=-500.0, vtln_warp=1.0, cepstral_lifter=22.0, num_ceps=13)[source]¶
- dither: float = 0.0¶
- window_type: str = 'povey'¶
- frame_length: float = 0.025¶
- frame_shift: float = 0.01¶
- remove_dc_offset: bool = True¶
- round_to_power_of_two: bool = True¶
- energy_floor: float = 1e-10¶
- min_duration: float = 0.0¶
- preemphasis_coefficient: float = 0.97¶
- raw_energy: bool = True¶
- low_freq: float = 20.0¶
- high_freq: float = -400.0¶
- num_mel_bins: int = 23¶
- use_energy: bool = False¶
- vtln_low: float = 100.0¶
- vtln_high: float = -500.0¶
- vtln_warp: float = 1.0¶
- cepstral_lifter: float = 22.0¶
- num_ceps: int = 13¶
- __init__(dither=0.0, window_type='povey', frame_length=0.025, frame_shift=0.01, remove_dc_offset=True, round_to_power_of_two=True, energy_floor=1e-10, min_duration=0.0, preemphasis_coefficient=0.97, raw_energy=True, low_freq=20.0, high_freq=-400.0, num_mel_bins=23, use_energy=False, vtln_low=100.0, vtln_high=-500.0, vtln_warp=1.0, cepstral_lifter=22.0, num_ceps=13)¶
- class lhotse.features.mfcc.TorchaudioMfcc(config=None)[source]¶
MFCC feature extractor based on the torchaudio.compliance.kaldi.mfcc function.
- name = 'mfcc'¶
- config_type¶
- __init__(config=None)¶
- static compute_energy(features)¶
Compute the total energy of a feature matrix. How the energy is computed depends on a particular type of features. It is expected that when implemented, compute_energy will never return zero.
- Parameters
features (ndarray) – A feature matrix.
- Return type
float
- Returns
A positive float value of the signal energy.
- property device: Union[str, torch.device]¶
- Return type
Union[str, device]
- extract(samples, sampling_rate)¶
Defines how to extract features using a numpy ndarray of audio samples and the sampling rate.
- Return type
ndarray
- Returns
a numpy ndarray representing the feature matrix.
- extract_batch(samples, sampling_rate)¶
Performs batch extraction. It is not guaranteed to be faster than FeatureExtractor.extract() – it depends on whether the implementation of a particular feature extractor supports accelerated batch computation.
Note
Unless overridden by child classes, it defaults to sequentially calling FeatureExtractor.extract() on the inputs.
Note
This method should support variable length inputs.
- Return type
Union[ndarray, Tensor, List[ndarray], List[Tensor]]
- extract_from_recording_and_store(recording, storage, offset=0, duration=None, channels=None, augment_fn=None)¶
Extract the features from a Recording in a full pipeline:
load audio from disk;
optionally, perform audio augmentation;
extract the features;
save them to disk in a specified directory;
return a Features object with a description of the extracted features and the source data used.
- Parameters
recording (Recording) – a Recording that specifies the input audio.
storage (FeaturesWriter) – a FeaturesWriter object that will handle storing the feature matrices.
offset (float) – an optional offset in seconds for where to start reading the recording.
duration (Optional[float]) – an optional duration specifying how much audio to load from the recording.
channels (Union[int, List[int], None]) – an optional int or list of ints, specifying the channels; by default, all channels will be used.
augment_fn (Optional[Callable[[ndarray, int], ndarray]]) – an optional WavAugmenter instance to modify the waveform before feature extraction.
- Return type
Features
- Returns
a Features manifest item for the extracted feature matrix.
- extract_from_samples_and_store(samples, storage, sampling_rate, offset=0, channel=None, augment_fn=None)¶
Extract the features from an array of audio samples in a full pipeline:
optional audio augmentation;
extract the features;
save them to disk in a specified directory;
return a Features object with a description of the extracted features.
Note, unlike in extract_from_recording_and_store, the returned Features object might not be suitable to store in a FeatureSet, as it does not reference any particular Recording. Instead, this method is useful when extracting features from cuts - especially MixedCut instances, which may be created from multiple recordings and channels.
- Parameters
samples (ndarray) – a numpy ndarray with the audio samples.
sampling_rate (int) – integer sampling rate of samples.
storage (FeaturesWriter) – a FeaturesWriter object that will handle storing the feature matrices.
offset (float) – an offset in seconds for where to start reading the recording - when used for Cut feature extraction, must be equal to Cut.start.
channel (Union[int, List[int], None]) – an optional channel number(s) to insert into Features manifest.
augment_fn (Optional[Callable[[ndarray, int], ndarray]]) – an optional WavAugmenter instance to modify the waveform before feature extraction.
- Return type
Features
- Returns
a Features manifest item for the extracted feature matrix (it is not written to disk).
- property frame_shift: float¶
- Return type
float
- classmethod from_dict(data)¶
- Return type
- classmethod from_yaml(path)¶
- Return type
- static mix(features_a, features_b, energy_scaling_factor_b)¶
Perform feature-domain mix of two signals, a and b, and return the mixed signal.
- Parameters
features_a (ndarray) – Left-hand side (reference) signal.
features_b (ndarray) – Right-hand side (mixed-in) signal.
energy_scaling_factor_b (float) – A scaling factor for features_b energy. It is used to achieve a specific SNR. E.g. to mix with an SNR of 10dB when both features_a and features_b energies are 100, the features_b signal energy needs to be scaled by 0.1. Since different features (e.g. spectrogram, fbank, MFCC) require different combination of transformations (e.g. exp, log, sqrt, pow) to allow mixing of two signals, the exact place where to apply energy_scaling_factor_b to the signal is determined by the implementer.
- Return type
ndarray
- Returns
A mixed feature matrix.
- to_dict()¶
- Return type
Dict[str, Any]
- to_yaml(path)¶
- class lhotse.features.spectrogram.SpectrogramConfig(dither=0.0, window_type='povey', frame_length=0.025, frame_shift=0.01, remove_dc_offset=True, round_to_power_of_two=True, energy_floor=1e-10, min_duration=0.0, preemphasis_coefficient=0.97, raw_energy=True)[source]¶
- dither: float = 0.0¶
- window_type: str = 'povey'¶
- frame_length: float = 0.025¶
- frame_shift: float = 0.01¶
- remove_dc_offset: bool = True¶
- round_to_power_of_two: bool = True¶
- energy_floor: float = 1e-10¶
- min_duration: float = 0.0¶
- preemphasis_coefficient: float = 0.97¶
- raw_energy: bool = True¶
- __init__(dither=0.0, window_type='povey', frame_length=0.025, frame_shift=0.01, remove_dc_offset=True, round_to_power_of_two=True, energy_floor=1e-10, min_duration=0.0, preemphasis_coefficient=0.97, raw_energy=True)¶
- class lhotse.features.spectrogram.Spectrogram(config=None)[source]¶
Log spectrogram feature extractor based on the torchaudio.compliance.kaldi.spectrogram function.
- name = 'spectrogram'¶
- config_type¶
- static mix(features_a, features_b, energy_scaling_factor_b)[source]¶
Perform feature-domain mix of two signals, a and b, and return the mixed signal.
- Parameters
features_a (ndarray) – Left-hand side (reference) signal.
features_b (ndarray) – Right-hand side (mixed-in) signal.
energy_scaling_factor_b (float) – A scaling factor for features_b energy. It is used to achieve a specific SNR. E.g. to mix with an SNR of 10dB when both features_a and features_b energies are 100, the features_b signal energy needs to be scaled by 0.1. Since different features (e.g. spectrogram, fbank, MFCC) require different combination of transformations (e.g. exp, log, sqrt, pow) to allow mixing of two signals, the exact place where to apply energy_scaling_factor_b to the signal is determined by the implementer.
- Return type
ndarray
- Returns
A mixed feature matrix.
- static compute_energy(features)[source]¶
Compute the total energy of a feature matrix. How the energy is computed depends on a particular type of features. It is expected that when implemented, compute_energy will never return zero.
- Parameters
features (ndarray) – A feature matrix.
- Return type
float
- Returns
A positive float value of the signal energy.
- __init__(config=None)¶
- property device: Union[str, torch.device]¶
- Return type
Union[str, device]
- extract(samples, sampling_rate)¶
Defines how to extract features using a numpy ndarray of audio samples and the sampling rate.
- Return type
ndarray
- Returns
a numpy ndarray representing the feature matrix.
- extract_batch(samples, sampling_rate)¶
Performs batch extraction. It is not guaranteed to be faster than FeatureExtractor.extract() – it depends on whether the implementation of a particular feature extractor supports accelerated batch computation.
Note
Unless overridden by child classes, it defaults to sequentially calling FeatureExtractor.extract() on the inputs.
Note
This method should support variable length inputs.
- Return type
Union[ndarray, Tensor, List[ndarray], List[Tensor]]
- extract_from_recording_and_store(recording, storage, offset=0, duration=None, channels=None, augment_fn=None)¶
Extract the features from a Recording in a full pipeline:
load audio from disk;
optionally, perform audio augmentation;
extract the features;
save them to disk in a specified directory;
return a Features object with a description of the extracted features and the source data used.
- Parameters
recording (Recording) – a Recording that specifies the input audio.
storage (FeaturesWriter) – a FeaturesWriter object that will handle storing the feature matrices.
offset (float) – an optional offset in seconds for where to start reading the recording.
duration (Optional[float]) – an optional duration specifying how much audio to load from the recording.
channels (Union[int, List[int], None]) – an optional int or list of ints, specifying the channels; by default, all channels will be used.
augment_fn (Optional[Callable[[ndarray, int], ndarray]]) – an optional WavAugmenter instance to modify the waveform before feature extraction.
- Return type
Features
- Returns
a Features manifest item for the extracted feature matrix.
- extract_from_samples_and_store(samples, storage, sampling_rate, offset=0, channel=None, augment_fn=None)¶
Extract the features from an array of audio samples in a full pipeline:
optional audio augmentation;
extract the features;
save them to disk in a specified directory;
return a Features object with a description of the extracted features.
Note, unlike in extract_from_recording_and_store, the returned Features object might not be suitable to store in a FeatureSet, as it does not reference any particular Recording. Instead, this method is useful when extracting features from cuts - especially MixedCut instances, which may be created from multiple recordings and channels.
- Parameters
samples (ndarray) – a numpy ndarray with the audio samples.
sampling_rate (int) – integer sampling rate of samples.
storage (FeaturesWriter) – a FeaturesWriter object that will handle storing the feature matrices.
offset (float) – an offset in seconds for where to start reading the recording - when used for Cut feature extraction, must be equal to Cut.start.
channel (Union[int, List[int], None]) – an optional channel number(s) to insert into Features manifest.
augment_fn (Optional[Callable[[ndarray, int], ndarray]]) – an optional WavAugmenter instance to modify the waveform before feature extraction.
- Return type
Features
- Returns
a Features manifest item for the extracted feature matrix (it is not written to disk).
- property frame_shift: float¶
- Return type
float
- classmethod from_dict(data)¶
- Return type
- classmethod from_yaml(path)¶
- Return type
- to_dict()¶
- Return type
Dict[str, Any]
- to_yaml(path)¶
Librosa filter-bank¶
- class lhotse.features.librosa_fbank.LibrosaFbankConfig(sampling_rate=22050, fft_size=1024, hop_size=256, win_length=None, window='hann', num_mel_bins=80, fmin=80, fmax=7600)[source]¶
Default librosa config with values consistent with various TTS projects.
This config is intended for use with popular TTS projects such as [ParallelWaveGAN](https://github.com/kan-bayashi/ParallelWaveGAN).
Warning: You may need to normalize your features.
- sampling_rate: int = 22050¶
- fft_size: int = 1024¶
- hop_size: int = 256¶
- win_length: int = None¶
- window: str = 'hann'¶
- num_mel_bins: int = 80¶
- fmin: int = 80¶
- fmax: int = 7600¶
- __init__(sampling_rate=22050, fft_size=1024, hop_size=256, win_length=None, window='hann', num_mel_bins=80, fmin=80, fmax=7600)¶
- lhotse.features.librosa_fbank.pad_or_truncate_features(feats, expected_num_frames, abs_tol=1, pad_value=-23.025850929940457)[source]¶
- lhotse.features.librosa_fbank.logmelfilterbank(audio, sampling_rate, fft_size=1024, hop_size=256, win_length=None, window='hann', num_mel_bins=80, fmin=80, fmax=7600, eps=1e-10)[source]¶
Compute log-Mel filterbank feature.
- Args:
audio (ndarray): Audio signal (T,).
sampling_rate (int): Sampling rate.
fft_size (int): FFT size.
hop_size (int): Hop size.
win_length (int): Window length. If set to None, it will be the same as fft_size.
window (str): Window function type.
num_mel_bins (int): Number of mel basis.
fmin (int): Minimum frequency in mel basis calculation.
fmax (int): Maximum frequency in mel basis calculation.
eps (float): Epsilon value to avoid inf in log calculation.
- Returns:
ndarray: Log Mel filterbank feature (#frames, num_mel_bins).
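To relate hop_size and sampling_rate to the time axis of the output, the arithmetic can be sketched as below. The helper names are hypothetical, and the frame count formula assumes a centered STFT (the signal is padded so that frames are centered on multiples of hop_size); treat it as an illustration, not as the function's exact behaviour.

```python
def frame_shift_seconds(hop_size: int, sampling_rate: int) -> float:
    """Time between the starts of consecutive frames, in seconds."""
    return hop_size / sampling_rate


def num_frames_centered(num_samples: int, hop_size: int) -> int:
    """Frame count under the assumption of a centered STFT."""
    return 1 + num_samples // hop_size


# Default values from the signature above: hop_size=256, sampling_rate=22050.
shift = frame_shift_seconds(hop_size=256, sampling_rate=22050)
# One second of audio at 22050 Hz.
frames = num_frames_centered(num_samples=22050, hop_size=256)
```

With the defaults, the frame shift is roughly 11.6 ms, so one second of audio yields on the order of 87 frames under the stated assumption.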
- class lhotse.features.librosa_fbank.LibrosaFbank(config=None)[source]¶
Librosa fbank feature extractor
Differs from Fbank extractor in that it uses librosa backend for stft and mel scale calculations. It can be easily configured to be compatible with existing speech-related projects that use librosa features.
- name = 'librosa-fbank'¶
- config_type¶
- property frame_shift: float¶
- Return type
float
- extract(samples, sampling_rate)[source]¶
Defines how to extract features using a numpy ndarray of audio samples and the sampling rate.
- Return type
ndarray
- Returns
a numpy ndarray representing the feature matrix.
- static mix(features_a, features_b, energy_scaling_factor_b)[source]¶
Perform feature-domain mix of two signals, a and b, and return the mixed signal.
- Parameters
features_a (ndarray) – Left-hand side (reference) signal.
features_b (ndarray) – Right-hand side (mixed-in) signal.
energy_scaling_factor_b (float) – A scaling factor for features_b energy. It is used to achieve a specific SNR. E.g. to mix with an SNR of 10dB when both features_a and features_b energies are 100, the features_b signal energy needs to be scaled by 0.1. Since different features (e.g. spectrogram, fbank, MFCC) require different combination of transformations (e.g. exp, log, sqrt, pow) to allow mixing of two signals, the exact place where to apply energy_scaling_factor_b to the signal is determined by the implementer.
- Return type
ndarray
- Returns
A mixed feature matrix.
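The SNR example in the parameter description can be checked with a short calculation. This is a sketch only; the helper name energy_scaling_factor is hypothetical and it assumes SNR is defined as 10 * log10(energy_a / scaled_energy_b).

```python
def energy_scaling_factor(energy_a: float, energy_b: float, snr_db: float) -> float:
    """Factor for the energy of signal b such that
    energy_a / (factor * energy_b) equals the requested SNR."""
    target_ratio = 10.0 ** (snr_db / 10.0)  # dB -> linear energy ratio
    return energy_a / (energy_b * target_ratio)


# Both energies are 100 and the target SNR is 10 dB, as in the text above.
factor = energy_scaling_factor(energy_a=100.0, energy_b=100.0, snr_db=10.0)
```

Here a 10 dB SNR corresponds to a linear energy ratio of 10, which gives the factor of 0.1 quoted in the description.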
- static compute_energy(features)[source]¶
Compute the total energy of a feature matrix. How the energy is computed depends on a particular type of features. It is expected that when implemented, compute_energy will never return zero.
- Parameters
features (ndarray) – A feature matrix.
- Return type
float
- Returns
A positive float value of the signal energy.
- __init__(config=None)¶
- property device: Union[str, torch.device]¶
- Return type
Union[str, device]
- extract_batch(samples, sampling_rate)¶
Performs batch extraction. It is not guaranteed to be faster than FeatureExtractor.extract() – it depends on whether the implementation of a particular feature extractor supports accelerated batch computation.
Note
Unless overridden by child classes, it defaults to sequentially calling FeatureExtractor.extract() on the inputs.
Note
This method should support variable length inputs.
- Return type
Union[ndarray, Tensor, List[ndarray], List[Tensor]]
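The sequential fallback described for extract_batch can be pictured as a plain loop over the inputs. The sketch below uses hypothetical stand-ins (extract_batch_sequentially, toy_extract) and Python lists instead of arrays, so it runs without Lhotse; it is not the library's implementation.

```python
from typing import Callable, List, Sequence


def extract_batch_sequentially(
    extract: Callable[[Sequence[float], int], List[float]],
    batch: Sequence[Sequence[float]],
    sampling_rate: int,
) -> List[List[float]]:
    """Call a single-input extractor on each item; inputs may differ in length."""
    return [extract(samples, sampling_rate) for samples in batch]


def toy_extract(samples: Sequence[float], sampling_rate: int) -> List[float]:
    """Hypothetical stand-in extractor: one 'feature' per pair of samples."""
    return [samples[i] + samples[i + 1] for i in range(0, len(samples) - 1, 2)]


# Two inputs of different lengths produce outputs of different lengths.
features = extract_batch_sequentially(
    toy_extract, [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0]], 16000
)
```

Because each input is processed independently, variable-length inputs need no padding in this fallback.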
- extract_from_recording_and_store(recording, storage, offset=0, duration=None, channels=None, augment_fn=None)¶
Extract the features from a Recording in a full pipeline:
- load audio from disk;
- optionally, perform audio augmentation;
- extract the features;
- save them to disk in a specified directory;
- return a Features object with a description of the extracted features and the source data used.
- Parameters
recording (Recording) – a Recording that specifies the input audio.
storage (FeaturesWriter) – a FeaturesWriter object that will handle storing the feature matrices.
offset (float) – an optional offset in seconds for where to start reading the recording.
duration (Optional[float]) – an optional duration specifying how much audio to load from the recording.
channels (Union[int, List[int], None]) – an optional int or list of ints, specifying the channels; by default, all channels will be used.
augment_fn (Optional[Callable[[ndarray, int], ndarray]]) – an optional WavAugmenter instance to modify the waveform before feature extraction.
- Return type
Features
- Returns
a Features manifest item for the extracted feature matrix.
- extract_from_samples_and_store(samples, storage, sampling_rate, offset=0, channel=None, augment_fn=None)¶
Extract the features from an array of audio samples in a full pipeline:
- optional audio augmentation;
- extract the features;
- save them to disk in a specified directory;
- return a Features object with a description of the extracted features.
Note, unlike in extract_from_recording_and_store, the returned Features object might not be suitable to store in a FeatureSet, as it does not reference any particular Recording. Instead, this method is useful when extracting features from cuts - especially MixedCut instances, which may be created from multiple recordings and channels.
- Parameters
samples (ndarray) – a numpy ndarray with the audio samples.
sampling_rate (int) – integer sampling rate of samples.
storage (FeaturesWriter) – a FeaturesWriter object that will handle storing the feature matrices.
offset (float) – an offset in seconds for where to start reading the recording - when used for Cut feature extraction, must be equal to Cut.start.
channel (Union[int, List[int], None]) – an optional channel number(s) to insert into Features manifest.
augment_fn (Optional[Callable[[ndarray, int], ndarray]]) – an optional WavAugmenter instance to modify the waveform before feature extraction.
- Return type
Features
- Returns
a Features manifest item for the extracted feature matrix (the manifest item itself is not written to disk).
- classmethod from_dict(data)¶
- Return type
- classmethod from_yaml(path)¶
- Return type
- to_dict()¶
- Return type
Dict[str, Any]
- to_yaml(path)¶
Feature storage¶
- class lhotse.features.io.FeaturesWriter[source]¶
FeaturesWriter defines the interface of how to store numpy arrays in a particular storage backend. This backend could either be:
- separate files on a local filesystem;
- a single file with multiple arrays;
- cloud storage;
- etc.
Each class inheriting from FeaturesWriter must define:
- the write() method, which defines the storing operation (accepts a key used to place the value array in the storage);
- the storage_path() property, which is either a common directory for the files, the name of the file storing multiple arrays, name of the cloud bucket, etc.
- the name() property that is unique to this particular storage mechanism - it is stored in the features manifests (metadata) and used to automatically deduce the backend when loading the features.
Each FeaturesWriter can also be used as a context manager, as some implementations might need to free a resource after the writing is finalized. By default nothing happens in the context manager functions, and this can be modified by the inheriting subclasses.
Example:
>>> with MyWriter('some/path') as storage:
...     extractor.extract_from_recording_and_store(recording, storage)
The features loading must be defined separately in a class inheriting from FeaturesReader.
- abstract property name: str¶
- Return type
str
- abstract property storage_path: str¶
- Return type
str
- store_array(key, value, frame_shift=None, temporal_dim=None, start=0)[source]¶
Store a numpy array in the underlying storage and return a manifest describing how to retrieve the data.
If the array contains a temporal dimension (e.g. it represents the frame-level features, alignment, posteriors, etc. of an utterance) then temporal_dim and frame_shift may be specified to enable downstream padding, truncating, and partial reads of the array.
- Parameters
key (str) – An ID that uniquely identifies the array.
value (ndarray) – The array to be stored.
frame_shift (Optional[float]) – Optional float, when the array has a temporal dimension it indicates how much time has passed between the starts of consecutive frames (expressed in seconds).
temporal_dim (Optional[int]) – Optional int, when the array has a temporal dimension, it indicates which dim to interpret as temporal.
start (float) – Float, when the array is temporal, it indicates what is the offset of the array w.r.t. the start of recording. Useful for reading subsets of an array when it represents something computed from long recordings. Ignored for non-temporal arrays.
- Return type
Union[Array, TemporalArray]
- Returns
A manifest of type Array or TemporalArray, depending on the input arguments.
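To illustrate why frame_shift and start make partial reads possible, the sketch below maps a time window in the recording to a range of frame indices in a stored temporal array. The function frame_range is hypothetical and assumes simple rounding to the nearest frame; Lhotse's own rounding rules may differ.

```python
from typing import Tuple


def frame_range(
    window_start: float,
    window_duration: float,
    array_start: float,
    frame_shift: float,
) -> Tuple[int, int]:
    """Map a time window (seconds, relative to the recording) to a
    half-open [left, right) range of frame indices in the stored array."""
    left = round((window_start - array_start) / frame_shift)
    right = round((window_start + window_duration - array_start) / frame_shift)
    return max(left, 0), max(right, 0)


# Array computed from 5.0 s into the recording, with a 10 ms frame shift.
left, right = frame_range(
    window_start=6.0, window_duration=1.5, array_start=5.0, frame_shift=0.01
)
```

A reader could then load only frames left through right - 1 along the temporal dimension instead of the whole array.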
- class lhotse.features.io.FeaturesReader[source]¶
FeaturesReader defines the interface of how to load numpy arrays from a particular storage backend. This backend could either be:
- separate files on a local filesystem;
- a single file with multiple arrays;
- cloud storage;
- etc.
Each class inheriting from FeaturesReader must define:
- the read() method, which defines the loading operation (accepts the key to locate the array in the storage and return it). The read method should support selecting only a subset of the feature matrix, with the bounds expressed as arguments left_offset_frames and right_offset_frames. It’s up to the Reader implementation to load only the required part or trim it to that range only after loading. It is assumed that the time dimension is always the first one.
- the name() property that is unique to this particular storage mechanism - it is stored in the features manifests (metadata) and used to automatically deduce the backend when loading the features.
The features writing must be defined separately in a class inheriting from FeaturesWriter.
- abstract property name: str¶
- Return type
str
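The writer and reader contracts described above can be sketched as a matching pair that keeps arrays in a Python dict. The classes DictWriter and DictReader are hypothetical and deliberately do not inherit from the Lhotse base classes, so the example runs on its own; they only mirror the members named in the text (write, read, name, storage_path, context-manager use).

```python
from typing import Dict, Optional

import numpy as np


class DictWriter:
    """Hypothetical writer that keeps arrays in a Python dict."""

    name = "dict_memory"  # identifier a manifest could record

    def __init__(self, storage: Dict[str, np.ndarray]):
        self.storage = storage

    @property
    def storage_path(self) -> str:
        return "<in-memory dict>"

    def write(self, key: str, value: np.ndarray) -> str:
        self.storage[key] = value
        return key  # the key later used to locate the array

    # Context-manager hooks; a dict has nothing to release.
    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return None


class DictReader:
    """Hypothetical reader matching DictWriter."""

    name = "dict_memory"

    def __init__(self, storage: Dict[str, np.ndarray]):
        self.storage = storage

    def read(
        self,
        key: str,
        left_offset_frames: int = 0,
        right_offset_frames: Optional[int] = None,
    ) -> np.ndarray:
        # Time is assumed to be the first dimension, so slice along axis 0.
        return self.storage[key][left_offset_frames:right_offset_frames]


shared: Dict[str, np.ndarray] = {}
with DictWriter(shared) as writer:
    key = writer.write("utt1", np.arange(20).reshape(10, 2))
part = DictReader(shared).read(key, left_offset_frames=2, right_offset_frames=5)
```

Sharing the same name between the two classes reflects the idea that the name stored in a manifest is what lets the matching reader be found later.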
- lhotse.features.io.register_reader(cls)[source]¶
Decorator used to add a new FeaturesReader to Lhotse’s registry.
Example:
@register_reader
class MyFeatureReader(FeaturesReader):
    ...
- lhotse.features.io.register_writer(cls)[source]¶
Decorator used to add a new FeaturesWriter to Lhotse’s registry.
Example:
@register_writer
class MyFeatureWriter(FeaturesWriter):
    ...
- lhotse.features.io.get_reader(name)[source]¶
Find a FeaturesReader sub-class that corresponds to the provided name and return its type.
Example:
reader_type = get_reader("lilcom_files")
reader = reader_type("/storage/features/")
- Return type
Type[FeaturesReader]
- lhotse.features.io.get_writer(name)[source]¶
Find a FeaturesWriter sub-class that corresponds to the provided name and return its type.
Example:
writer_type = get_writer("lilcom_files")
writer = writer_type("/storage/features/")
- Return type
Type[FeaturesWriter]
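The register and lookup functions above follow a common decorator-based registry pattern. A minimal self-contained sketch of that pattern is shown below; READERS, register, lookup, and ToyReader are hypothetical names, and this is not Lhotse's actual registry code.

```python
from typing import Dict, Type

READERS: Dict[str, Type] = {}  # hypothetical registry: name -> class


def register(cls: Type) -> Type:
    """Class decorator that records cls under its `name` attribute."""
    READERS[cls.name] = cls
    return cls


def lookup(name: str) -> Type:
    """Return the registered class for name, or raise a descriptive error."""
    try:
        return READERS[name]
    except KeyError:
        raise ValueError(f"No reader registered under name: {name!r}") from None


@register
class ToyReader:
    name = "toy_files"

    def __init__(self, storage_path: str):
        self.storage_path = storage_path


# Look the type up by name, then instantiate it with a storage path.
reader = lookup("toy_files")("/storage/features/")
```

Returning the class unchanged from the decorator keeps it usable as a normal class while still recording it in the registry.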
- class lhotse.features.io.LilcomFilesReader(storage_path, *args, **kwargs)[source]¶
Reads Lilcom-compressed files from a directory on the local filesystem.
storage_path corresponds to the directory path; storage_key for each utterance is the name of the file in that directory.
- name = 'lilcom_files'¶
- class lhotse.features.io.LilcomFilesWriter(storage_path, tick_power=-5, *args, **kwargs)[source]¶
Writes Lilcom-compressed files to a directory on the local filesystem.
storage_path corresponds to the directory path; storage_key for each utterance is the name of the file in that directory.
- name = 'lilcom_files'¶
- property storage_path: str¶
- Return type
str
- store_array(key, value, frame_shift=None, temporal_dim=None, start=0)¶
Store a numpy array in the underlying storage and return a manifest describing how to retrieve the data.
If the array contains a temporal dimension (e.g. it represents the frame-level features, alignment, posteriors, etc. of an utterance) then temporal_dim and frame_shift may be specified to enable downstream padding, truncating, and partial reads of the array.
- Parameters
key (str) – An ID that uniquely identifies the array.
value (ndarray) – The array to be stored.
frame_shift (Optional[float]) – Optional float, when the array has a temporal dimension it indicates how much time has passed between the starts of consecutive frames (expressed in seconds).
temporal_dim (Optional[int]) – Optional int, when the array has a temporal dimension, it indicates which dim to interpret as temporal.
start (float) – Float, when the array is temporal, it indicates what is the offset of the array w.r.t. the start of recording. Useful for reading subsets of an array when it represents something computed from long recordings. Ignored for non-temporal arrays.
- Return type
Union[Array, TemporalArray]
- Returns
A manifest of type Array or TemporalArray, depending on the input arguments.
- class lhotse.features.io.NumpyFilesReader(storage_path, *args, **kwargs)[source]¶
Reads non-compressed numpy arrays from files in a directory on the local filesystem.
storage_path corresponds to the directory path; storage_key for each utterance is the name of the file in that directory.
- name = 'numpy_files'¶
- class lhotse.features.io.NumpyFilesWriter(storage_path, *args, **kwargs)[source]¶
Writes non-compressed numpy arrays to files in a directory on the local filesystem.
storage_path corresponds to the directory path; storage_key for each utterance is the name of the file in that directory.
- name = 'numpy_files'¶
- property storage_path: str¶
- Return type
str
- store_array(key, value, frame_shift=None, temporal_dim=None, start=0)¶
Store a numpy array in the underlying storage and return a manifest describing how to retrieve the data.
If the array contains a temporal dimension (e.g. it represents the frame-level features, alignment, posteriors, etc. of an utterance) then temporal_dim and frame_shift may be specified to enable downstream padding, truncating, and partial reads of the array.
- Parameters
key (str) – An ID that uniquely identifies the array.
value (ndarray) – The array to be stored.
frame_shift (Optional[float]) – Optional float, when the array has a temporal dimension it indicates how much time has passed between the starts of consecutive frames (expressed in seconds).
temporal_dim (Optional[int]) – Optional int, when the array has a temporal dimension, it indicates which dim to interpret as temporal.
start (float) – Float, when the array is temporal, it indicates what is the offset of the array w.r.t. the start of recording. Useful for reading subsets of an array when it represents something computed from long recordings. Ignored for non-temporal arrays.
- Return type
Union[Array, TemporalArray]
- Returns
A manifest of type Array or TemporalArray, depending on the input arguments.
- lhotse.features.io.lookup_cache_or_open(storage_path)[source]¶
Helper internal function used in HDF5 readers. It opens the HDF files and keeps their handles open in a global program cache to avoid excessive amount of syscalls when the Reader class is instantiated and destroyed in a loop repeatedly (frequent use-case).
The file handles can be freed at any time by calling close_cached_file_handles().
- lhotse.features.io.lookup_chunk_size(h5_file_handle)[source]¶
Helper internal function to retrieve the chunk size from an HDF5 file. Helps avoid unnecessary repeated disk reads.
- Return type
int
- lhotse.features.io.close_cached_file_handles()[source]¶
Closes the cached file handles in lookup_cache_or_open (see its docs for more details).
- Return type
None
- class lhotse.features.io.NumpyHdf5Reader(storage_path, *args, **kwargs)[source]¶
Reads non-compressed numpy arrays from an HDF5 file with a “flat” layout. Each array is stored as a separate HDF Dataset because their shapes (numbers of frames) may vary.
storage_path corresponds to the HDF5 file path; storage_key for each utterance is the key corresponding to the array (i.e. HDF5 “Group” name).
- name = 'numpy_hdf5'¶
- class lhotse.features.io.NumpyHdf5Writer(storage_path, mode='w', *args, **kwargs)[source]¶
Writes non-compressed numpy arrays to an HDF5 file with a “flat” layout. Each array is stored as a separate HDF Dataset because their shapes (numbers of frames) may vary.
storage_path corresponds to the HDF5 file path; storage_key for each utterance is the key corresponding to the array (i.e. HDF5 “Group” name).
Internally, this class opens the file lazily so that this object can be passed between processes without issues. This simplifies the parallel feature extraction code.
- name = 'numpy_hdf5'¶
- __init__(storage_path, mode='w', *args, **kwargs)[source]¶
- Parameters
storage_path (Union[Path, str]) – Path under which we’ll create the HDF5 file. We will add a .h5 suffix if it is not already in storage_path.
mode (str) – Modes supported by h5py:
w: Create file, truncate if exists (default)
w- or x: Create file, fail if exists
a: Read/write if exists, create otherwise
- property storage_path: str¶
- Return type
str
- store_array(key, value, frame_shift=None, temporal_dim=None, start=0)¶
Store a numpy array in the underlying storage and return a manifest describing how to retrieve the data.
If the array contains a temporal dimension (e.g. it represents the frame-level features, alignment, posteriors, etc. of an utterance) then temporal_dim and frame_shift may be specified to enable downstream padding, truncating, and partial reads of the array.
- Parameters
key (str) – An ID that uniquely identifies the array.
value (ndarray) – The array to be stored.
frame_shift (Optional[float]) – Optional float, when the array has a temporal dimension it indicates how much time has passed between the starts of consecutive frames (expressed in seconds).
temporal_dim (Optional[int]) – Optional int, when the array has a temporal dimension, it indicates which dim to interpret as temporal.
start (float) – Float, when the array is temporal, it indicates what is the offset of the array w.r.t. the start of recording. Useful for reading subsets of an array when it represents something computed from long recordings. Ignored for non-temporal arrays.
- Return type
Union[Array, TemporalArray]
- Returns
A manifest of type Array or TemporalArray, depending on the input arguments.
- class lhotse.features.io.LilcomHdf5Reader(storage_path, *args, **kwargs)[source]¶
Reads lilcom-compressed numpy arrays from an HDF5 file with a “flat” layout. Each array is stored as a separate HDF Dataset because their shapes (numbers of frames) may vary.
storage_path corresponds to the HDF5 file path; storage_key for each utterance is the key corresponding to the array (i.e. HDF5 “Group” name).
- name = 'lilcom_hdf5'¶
- class lhotse.features.io.LilcomHdf5Writer(storage_path, tick_power=-5, mode='w', *args, **kwargs)[source]¶
Writes lilcom-compressed numpy arrays to an HDF5 file with a “flat” layout. Each array is stored as a separate HDF Dataset because their shapes (numbers of frames) may vary.
storage_path corresponds to the HDF5 file path; storage_key for each utterance is the key corresponding to the array (i.e. HDF5 “Group” name).
- name = 'lilcom_hdf5'¶
- __init__(storage_path, tick_power=-5, mode='w', *args, **kwargs)[source]¶
- Parameters
storage_path (Union[Path, str]) – Path under which we’ll create the HDF5 file. We will add a .h5 suffix if it is not already in storage_path.
tick_power (int) – Determines the lilcom compression accuracy; the input will be compressed to integer multiples of 2^tick_power.
mode (str) – Modes supported by h5py:
w: Create file, truncate if exists (default)
w- or x: Create file, fail if exists
a: Read/write if exists, create otherwise
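To get a feel for what "integer multiples of 2^tick_power" means for precision, the sketch below snaps a value to that grid. It assumes rounding to the nearest multiple and uses a hypothetical helper, quantize; it illustrates the precision grid only and is not lilcom's compression algorithm.

```python
def quantize(value: float, tick_power: int = -5) -> float:
    """Round value to the nearest integer multiple of 2**tick_power."""
    tick = 2.0 ** tick_power
    return round(value / tick) * tick


tick = 2.0 ** -5           # grid spacing for the default tick_power, 0.03125
approx = quantize(0.70)    # nearest multiple of 0.03125
max_error = tick / 2       # worst-case rounding error under this assumption
```

A more negative tick_power gives a finer grid and hence a smaller worst-case error, presumably at the cost of a larger compressed size.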
- property storage_path: str¶
- Return type
str
- store_array(key, value, frame_shift=None, temporal_dim=None, start=0)¶
Store a numpy array in the underlying storage and return a manifest describing how to retrieve the data.
If the array contains a temporal dimension (e.g. it represents the frame-level features, alignment, posteriors, etc. of an utterance) then temporal_dim and frame_shift may be specified to enable downstream padding, truncating, and partial reads of the array.
- Parameters
key (str) – An ID that uniquely identifies the array.
value (ndarray) – The array to be stored.
frame_shift (Optional[float]) – Optional float, when the array has a temporal dimension it indicates how much time has passed between the starts of consecutive frames (expressed in seconds).
temporal_dim (Optional[int]) – Optional int, when the array has a temporal dimension, it indicates which dim to interpret as temporal.
start (float) – Float, when the array is temporal, it indicates what is the offset of the array w.r.t. the start of recording. Useful for reading subsets of an array when it represents something computed from long recordings. Ignored for non-temporal arrays.
- Return type
Union[Array, TemporalArray]
- Returns
A manifest of type Array or TemporalArray, depending on the input arguments.
- class lhotse.features.io.ChunkedLilcomHdf5Reader(storage_path, *args, **kwargs)[source]¶
Reads lilcom-compressed numpy arrays from an HDF5 file with chunked lilcom storage. Each feature matrix is stored in an array of chunks - binary data compressed with lilcom. Upon reading, we check how many chunks need to be retrieved to avoid excessive I/O.
storage_path corresponds to the HDF5 file path; storage_key for each utterance is the key corresponding to the array (i.e. HDF5 “Group” name).
- name = 'chunked_lilcom_hdf5'¶
- class lhotse.features.io.ChunkedLilcomHdf5Writer(storage_path, tick_power=-5, chunk_size=100, mode='w', *args, **kwargs)[source]¶
Writes lilcom-compressed numpy arrays to an HDF5 file with chunked lilcom storage. Each feature matrix is stored in an array of chunks - binary data compressed with lilcom. Upon reading, we check how many chunks need to be retrieved to avoid excessive I/O.
storage_path corresponds to the HDF5 file path; storage_key for each utterance is the key corresponding to the array (i.e. HDF5 “Group” name).
- name = 'chunked_lilcom_hdf5'¶
- __init__(storage_path, tick_power=-5, chunk_size=100, mode='w', *args, **kwargs)[source]¶
- Parameters
storage_path (Union[Path, str]) – Path under which we’ll create the HDF5 file. We will add a .h5 suffix if it is not already in storage_path.
tick_power (int) – Determines the lilcom compression accuracy; the input will be compressed to integer multiples of 2^tick_power.
chunk_size (int) – How many frames to store per chunk. Too low a number will require many reads for long feature matrices; too high a number will require reading more redundant data.
mode (str) – Modes supported by h5py:
w: Create file, truncate if exists (default)
w- or x: Create file, fail if exists
a: Read/write if exists, create otherwise
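The chunk_size trade-off can be made concrete by computing which chunks overlap a requested frame range. The helper chunks_to_read is a hypothetical sketch, assuming chunks hold consecutive, equally sized runs of frames; it is not the reader's actual code.

```python
def chunks_to_read(left_frame: int, right_frame: int, chunk_size: int = 100) -> range:
    """Indices of the chunks overlapping the half-open range [left_frame, right_frame)."""
    if right_frame <= left_frame:
        return range(0)
    first = left_frame // chunk_size
    last = (right_frame - 1) // chunk_size
    return range(first, last + 1)


# Request frames 250..419 with the default chunk size of 100.
needed = list(chunks_to_read(250, 420))
# Frames that are decompressed but fall outside the requested range.
redundant = len(needed) * 100 - (420 - 250)
```

A smaller chunk_size would lower the redundant frame count but increase the number of separate reads for the same range.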
- property storage_path: str¶
- Return type
str
- store_array(key, value, frame_shift=None, temporal_dim=None, start=0)¶
Store a numpy array in the underlying storage and return a manifest describing how to retrieve the data.
If the array contains a temporal dimension (e.g. it represents the frame-level features, alignment, posteriors, etc. of an utterance) then temporal_dim and frame_shift may be specified to enable downstream padding, truncating, and partial reads of the array.
- Parameters
key (str) – An ID that uniquely identifies the array.
value (ndarray) – The array to be stored.
frame_shift (Optional[float]) – Optional float, when the array has a temporal dimension it indicates how much time has passed between the starts of consecutive frames (expressed in seconds).
temporal_dim (Optional[int]) – Optional int, when the array has a temporal dimension, it indicates which dim to interpret as temporal.
start (float) – Float, when the array is temporal, it indicates what is the offset of the array w.r.t. the start of recording. Useful for reading subsets of an array when it represents something computed from long recordings. Ignored for non-temporal arrays.
- Return type
Union[Array, TemporalArray]
- Returns
A manifest of type Array or TemporalArray, depending on the input arguments.
- class lhotse.features.io.LilcomChunkyReader(storage_path, *args, **kwargs)[source]¶
Reads lilcom-compressed numpy arrays from a binary file with chunked lilcom storage. Each feature matrix is stored in an array of chunks - binary data compressed with lilcom. Upon reading, we check how many chunks need to be retrieved to avoid excessive I/O.
storage_path corresponds to the binary file path.
storage_key for each utterance is a comma separated list of offsets in the file. The first number is the offset for the whole array, and the following numbers are relative offsets for each chunk. These offsets are relative to the previous chunk start.
- name = 'lilcom_chunky'¶
- CHUNK_SIZE = 500¶
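One possible reading of the storage_key format described above is that a running sum over the comma-separated numbers yields absolute positions in the file. The sketch below follows that reading; the function chunk_start_offsets and the sample key are hypothetical, and the exact meaning of each number should be checked against the implementation.

```python
from itertools import accumulate
from typing import List


def chunk_start_offsets(storage_key: str) -> List[int]:
    """Turn 'array_offset,rel_1,rel_2,...' into absolute byte offsets.

    Assumption: each number after the first is a distance from the
    previous position, so a running sum yields absolute positions.
    """
    numbers = [int(part) for part in storage_key.split(",")]
    return list(accumulate(numbers))


# Hypothetical key: the array starts at byte 1000, followed by two relative offsets.
offsets = chunk_start_offsets("1000,250,300")
```

Storing relative rather than absolute offsets keeps the numbers in the key small, which may be the motivation for this layout.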
- class lhotse.features.io.LilcomChunkyWriter(storage_path, tick_power=-5, mode='wb', *args, **kwargs)[source]¶
Writes lilcom-compressed numpy arrays to a binary file with chunked lilcom storage. Each feature matrix is stored in an array of chunks - binary data compressed with lilcom. Upon reading, we check how many chunks need to be retrieved to avoid excessive I/O.
storage_path corresponds to the binary file path.
storage_key for each utterance is a comma separated list of offsets in the file. The first number is the offset for the whole array, and the following numbers are relative offsets for each chunk. These offsets are relative to the previous chunk start.
- name = 'lilcom_chunky'¶
- CHUNK_SIZE = 500¶
- __init__(storage_path, tick_power=-5, mode='wb', *args, **kwargs)[source]¶
- Parameters
storage_path (Union[Path, str]) – Path under which we’ll create the binary file.
tick_power (int) – Determines the lilcom compression accuracy; the input will be compressed to integer multiples of 2^tick_power.
chunk_size – How many frames to store per chunk. Too low a number will require many reads for long feature matrices; too high a number will require reading more redundant data.
mode (str) – Modes, one of: “w” (write) or “a” (append); can be “wb” and “ab”, “b” is implicit
- property storage_path: str¶
- Return type
str
- store_array(key, value, frame_shift=None, temporal_dim=None, start=0)¶
Store a numpy array in the underlying storage and return a manifest describing how to retrieve the data.
If the array contains a temporal dimension (e.g. it represents the frame-level features, alignment, posteriors, etc. of an utterance) then temporal_dim and frame_shift may be specified to enable downstream padding, truncating, and partial reads of the array.
- Parameters
key (str) – An ID that uniquely identifies the array.
value (ndarray) – The array to be stored.
frame_shift (Optional[float]) – Optional float, when the array has a temporal dimension it indicates how much time has passed between the starts of consecutive frames (expressed in seconds).
temporal_dim (Optional[int]) – Optional int, when the array has a temporal dimension, it indicates which dim to interpret as temporal.
start (float) – Float, when the array is temporal, it indicates what is the offset of the array w.r.t. the start of recording. Useful for reading subsets of an array when it represents something computed from long recordings. Ignored for non-temporal arrays.
- Return type
Union[Array, TemporalArray]
- Returns
A manifest of type Array or TemporalArray, depending on the input arguments.
- class lhotse.features.io.LilcomURLReader(storage_path, *args, **kwargs)[source]¶
Downloads Lilcom-compressed files from a URL (S3, GCP, Azure, HTTP, etc.).
storage_path corresponds to the root URL (e.g. “s3://my-data-bucket”).
storage_key will be concatenated to storage_path to form a full URL (e.g. “my-feature-file.llc”).
Caution
Requires smart_open to be installed (pip install smart_open).
- name = 'lilcom_url'¶
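The way storage_path and storage_key combine into a full URL can be sketched as a simple join. The helper full_url is hypothetical, and the slash handling is an assumption made for the example rather than a description of the class's internals.

```python
def full_url(storage_path: str, storage_key: str) -> str:
    """Join a root URL and a key with exactly one slash between them."""
    return f"{storage_path.rstrip('/')}/{storage_key.lstrip('/')}"


# Values taken from the examples in the text above.
url = full_url("s3://my-data-bucket", "my-feature-file.llc")
```

Normalizing the slashes on both sides avoids accidental double slashes when the root URL already ends with one.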
- class lhotse.features.io.LilcomURLWriter(storage_path, tick_power=-5, *args, **kwargs)[source]¶
Writes Lilcom-compressed files to a URL (S3, GCP, Azure, HTTP, etc.).
storage_path corresponds to the root URL (e.g. “s3://my-data-bucket”).
storage_key will be concatenated to storage_path to form a full URL (e.g. “my-feature-file.llc”).
Caution
Requires smart_open to be installed (pip install smart_open).
- name = 'lilcom_url'¶
- property storage_path: str¶
- Return type
str
- store_array(key, value, frame_shift=None, temporal_dim=None, start=0)¶
Store a numpy array in the underlying storage and return a manifest describing how to retrieve the data.
If the array contains a temporal dimension (e.g. it represents the frame-level features, alignment, posteriors, etc. of an utterance) then temporal_dim and frame_shift may be specified to enable downstream padding, truncating, and partial reads of the array.
- Parameters
key (str) – An ID that uniquely identifies the array.
value (ndarray) – The array to be stored.
frame_shift (Optional[float]) – Optional float, when the array has a temporal dimension it indicates how much time has passed between the starts of consecutive frames (expressed in seconds).
temporal_dim (Optional[int]) – Optional int, when the array has a temporal dimension, it indicates which dim to interpret as temporal.
start (float) – Float, when the array is temporal, it indicates what is the offset of the array w.r.t. the start of recording. Useful for reading subsets of an array when it represents something computed from long recordings. Ignored for non-temporal arrays.
- Return type
Union[Array, TemporalArray]
- Returns
A manifest of type Array or TemporalArray, depending on the input arguments.
- class lhotse.features.io.KaldiReader(storage_path, *args, **kwargs)[source]¶
Reads Kaldi’s “feats.scp” file using kaldi_native_io.
storage_path corresponds to the path to feats.scp.
storage_key corresponds to the utterance-id in Kaldi.
Caution
Requires kaldi_native_io to be installed (pip install kaldi_native_io).
- name = 'kaldiio'¶
- class lhotse.features.io.KaldiWriter(storage_path, compression_method=1, *args, **kwargs)[source]¶
Write data to Kaldi’s “feats.scp” and “feats.ark” files using kaldi_native_io.
storage_path
corresponds to a directory where we’ll create “feats.scp” and “feats.ark” files.storage_key
corresponds to the utterance-id in Kaldi.The following
compression_method
values are supported by kaldi_native_io:kAutomaticMethod = 1 kSpeechFeature = 2 kTwoByteAuto = 3 kTwoByteSignedInteger = 4 kOneByteAuto = 5 kOneByteUnsignedInteger = 6 kOneByteZeroOne = 7
Note
Setting compression_method works only with 2D arrays.
Example:
>>> data = np.random.randn(131, 80) >>> with KaldiWriter('featdir') as w: ... w.write('utt1', data) >>> reader = KaldiReader('featdir/feats.scp') >>> read_data = reader.read('utt1') >>> np.testing.assert_equal(data, read_data)
Caution
Requires kaldi_native_io to be installed (pip install kaldi_native_io).
- name = 'kaldiio'¶
- property storage_path: str¶
- Return type
str
- store_array(key, value, frame_shift=None, temporal_dim=None, start=0)¶
Store a numpy array in the underlying storage and return a manifest describing how to retrieve the data.
If the array contains a temporal dimension (e.g. it represents the frame-level features, alignment, posteriors, etc. of an utterance) then temporal_dim and frame_shift may be specified to enable downstream padding, truncating, and partial reads of the array.
- Parameters
key (str) – An ID that uniquely identifies the array.
value (ndarray) – The array to be stored.
frame_shift (Optional[float]) – Optional float, when the array has a temporal dimension it indicates how much time has passed between the starts of consecutive frames (expressed in seconds).
temporal_dim (Optional[int]) – Optional int, when the array has a temporal dimension, it indicates which dim to interpret as temporal.
start (float) – Float, when the array is temporal, it indicates what is the offset of the array w.r.t. the start of recording. Useful for reading subsets of an array when it represents something computed from long recordings. Ignored for non-temporal arrays.
- Return type
Union[Array, TemporalArray]
- Returns
A manifest of type Array or TemporalArray, depending on the input arguments.
- class lhotse.features.io.MemoryLilcomWriter(*args, **kwargs)[source]¶
- name = 'memory_lilcom'¶
- property storage_path: None¶
- Return type
None
- store_array(key, value, frame_shift=None, temporal_dim=None, start=0)¶
Store a numpy array in the underlying storage and return a manifest describing how to retrieve the data.
If the array contains a temporal dimension (e.g. it represents the frame-level features, alignment, posteriors, etc. of an utterance) then temporal_dim and frame_shift may be specified to enable downstream padding, truncating, and partial reads of the array.
- Parameters
key (str) – An ID that uniquely identifies the array.
value (ndarray) – The array to be stored.
frame_shift (Optional[float]) – Optional float, when the array has a temporal dimension it indicates how much time has passed between the starts of consecutive frames (expressed in seconds).
temporal_dim (Optional[int]) – Optional int, when the array has a temporal dimension, it indicates which dim to interpret as temporal.
start (float) – Float, when the array is temporal, it indicates what is the offset of the array w.r.t. the start of recording. Useful for reading subsets of an array when it represents something computed from long recordings. Ignored for non-temporal arrays.
- Return type
Union[Array, TemporalArray]
- Returns
A manifest of type Array or TemporalArray, depending on the input arguments.
- class lhotse.features.io.MemoryRawWriter(*args, **kwargs)[source]¶
- name = 'memory_raw'¶
- property storage_path: None¶
- Return type
None
- store_array(key, value, frame_shift=None, temporal_dim=None, start=0)¶
Store a numpy array in the underlying storage and return a manifest describing how to retrieve the data.
If the array contains a temporal dimension (e.g. it represents the frame-level features, alignment, posteriors, etc. of an utterance) then temporal_dim and frame_shift may be specified to enable downstream padding, truncating, and partial reads of the array.
- Parameters
key (str) – An ID that uniquely identifies the array.
value (ndarray) – The array to be stored.
frame_shift (Optional[float]) – Optional float, when the array has a temporal dimension it indicates how much time has passed between the starts of consecutive frames (expressed in seconds).
temporal_dim (Optional[int]) – Optional int, when the array has a temporal dimension, it indicates which dim to interpret as temporal.
start (float) – Float, when the array is temporal, it indicates what is the offset of the array w.r.t. the start of recording. Useful for reading subsets of an array when it represents something computed from long recordings. Ignored for non-temporal arrays.
- Return type
Union[Array, TemporalArray]
- Returns
A manifest of type Array or TemporalArray, depending on the input arguments.
Feature-domain mixing¶
- class lhotse.features.mixer.FeatureMixer(feature_extractor, base_feats, frame_shift, padding_value=-1000.0, reference_energy=None)[source]¶
Utility class to mix multiple feature matrices into a single one. It should be instantiated separately for each mixing session (i.e. each MixedCut will create a separate FeatureMixer to mix its tracks). It is initialized with a numpy array of features (typically float32) that represents the “reference” signal for the mix. Other signals can be mixed to it with different time offsets and SNRs using the add_to_mix method. The time offset is relative to the start of the reference signal (only positive values are supported). The SNR is relative to the energy of the signal used to initialize the FeatureMixer.
It relies on the FeatureExtractor to have defined mix and compute_energy methods, so that the FeatureMixer knows how to scale and add two feature matrices together.
- __init__(feature_extractor, base_feats, frame_shift, padding_value=-1000.0, reference_energy=None)[source]¶
FeatureMixer’s constructor.
- Parameters
feature_extractor (FeatureExtractor) – The FeatureExtractor instance that specifies how to mix the features.
base_feats (ndarray) – The features used to initialize the FeatureMixer are a point of reference in terms of energy and offset for all features mixed into them.
frame_shift (float) – Required to correctly compute offset and padding during the mix.
padding_value (float) – The value used to pad the shorter features during the mix. This value is adequate only for log space features. For non-log space features, e.g. energies, use either 0 or a small positive value like 1e-5.
reference_energy (Optional[float]) – Optionally pass a reference energy value to compute SNRs against. This might be required when base_feats correspond to padding energies.
- property num_features¶
- property unmixed_feats: numpy.ndarray¶
Return a numpy ndarray with the shape (num_tracks, num_frames, num_features), where each track’s feature matrix is padded and scaled adequately to the offsets and SNR used in add_to_mix call.
- Return type
ndarray
- property mixed_feats: numpy.ndarray¶
Return a numpy ndarray with the shape (num_frames, num_features) - a mono mixed feature matrix of the tracks supplied with add_to_mix calls.
- Return type
ndarray
- add_to_mix(feats, sampling_rate, snr=None, offset=0.0)[source]¶
Add feature matrix of a new track into the mix.
- Parameters
feats (ndarray) – A 2D feature matrix to be mixed in.
sampling_rate (int) – The sampling rate of feats.
snr (Optional[float]) – Signal-to-noise ratio, assuming feats represents noise (positive SNR - lower feats energy, negative SNR - higher feats energy).
offset (float) – How many seconds to shift feats in time. For mixing, the signal will be padded before the start with low energy values.
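As a rough illustration of the snr and offset parameters, the sketch below computes an energy-domain gain from an SNR in decibels and left-pads a feature matrix for a time offset. It uses plain Python lists and hypothetical helper names (snr_gain, pad_for_offset); it follows the usual definition of SNR in dB and is not taken from the FeatureMixer source.

```python
import math


def snr_gain(reference_energy, added_energy, snr_db):
    """Energy-domain gain that puts the added track snr_db below the reference.

    Assumption: SNR(dB) = 10 * log10(reference_energy / scaled_added_energy).
    """
    target_energy = reference_energy * 10.0 ** (-snr_db / 10.0)
    return target_energy / added_energy


def pad_for_offset(feats, offset, frame_shift, padding_value=-1000.0):
    """Prepend low-energy frames so that feats starts `offset` seconds later."""
    num_pad = round(offset / frame_shift)
    num_features = len(feats[0])
    padding = [[padding_value] * num_features for _ in range(num_pad)]
    return padding + feats


# A positive SNR lowers the energy of the added track (gain below 1).
print(snr_gain(reference_energy=1.0, added_energy=1.0, snr_db=10.0))
# A 30 ms offset with a 10 ms frame shift prepends three padding frames.
print(len(pad_for_offset([[1.0, 2.0]], offset=0.03, frame_shift=0.01)))
```

The default padding value of -1000.0 suits log-space features only, as noted for padding_value above; for linear energies a value such as 0 or 1e-5 would be passed instead.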
Augmentation¶
Cuts¶
Data structures and tools used to create training/testing examples.
The following is the hierarchy of imports in this module (to avoid circular imports):
[Diagram: import hierarchy of the lhotse.cut submodules, with the nodes __init__.py, mono.MonoCut, multi.MultiCut, data.DataCut, mixed.MixedCut, padding.PaddingCut, base.Cut, and set.CutSet.]
- lhotse.cut.create_cut_set_eager(recordings=None, supervisions=None, features=None, output_path=None, random_ids=False)[source]¶
Create a CutSet from any combination of supervision, feature and recording manifests. At least one of recordings or features is required.
The created cuts will be of type DataCut (MonoCut for single-channel and MultiCut for multi-channel). The DataCut boundaries correspond to those found in the features, when available, otherwise to those found in the recordings.
When supervisions are provided, we’ll be searching them for matching recording IDs and attaching to created cuts, assuming they are fully within the cut’s time span.
- Parameters
recordings (Optional[RecordingSet]) – an optional RecordingSet manifest.
supervisions (Optional[SupervisionSet]) – an optional SupervisionSet manifest.
features (Optional[FeatureSet]) – an optional FeatureSet manifest.
output_path (Union[Path, str, None]) – an optional path where the CutSet is stored.
random_ids (bool) – boolean, should the cut IDs be randomized. By default, use the recording ID with a loop index and a channel idx, i.e. “{recording_id}-{idx}-{channel}”.
- Return type
CutSet
- Returns
a new CutSet instance.
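The two rules described above (the default cut ID pattern and attaching only supervisions that lie fully within the cut) can be sketched in plain Python as follows. The helper names and the tuple representation of supervisions are hypothetical, and the real function may apply additional tolerances that are not shown here.

```python
def default_cut_id(recording_id, idx, channel):
    """Build an ID following the "{recording_id}-{idx}-{channel}" pattern."""
    return f"{recording_id}-{idx}-{channel}"


def supervisions_within(cut_start, cut_duration, supervisions):
    """Keep only (start, duration) pairs that lie fully inside the cut's time span."""
    cut_end = cut_start + cut_duration
    return [
        (start, duration)
        for start, duration in supervisions
        if start >= cut_start and start + duration <= cut_end
    ]


print(default_cut_id("rec1", 0, 0))
# Only the first supervision fits entirely inside a cut spanning 0 s to 10 s.
print(supervisions_within(0.0, 10.0, [(1.0, 2.0), (9.0, 2.0), (-0.5, 1.0)]))
```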
- lhotse.cut.create_cut_set_lazy(output_path, recordings=None, supervisions=None, features=None, random_ids=False)[source]¶
Create a CutSet from any combination of supervision, feature and recording manifests. At least one of recordings or features is required.
This method is the “lazy” variant, which allows creating a CutSet with minimal memory usage. It has some extra requirements:
- The user must provide an output_path, where we will write the cuts as we create them. We’ll return a lazily-opened CutSet from that file.
- recordings and features (if both provided) have to be of equal length and sorted by the recording_id attribute of their elements.
- supervisions (if provided) have to be sorted by recording_id; note that there may be multiple supervisions with the same recording_id, which is allowed.
In addition, to prepare cuts in a fully memory-efficient way, make sure that:
- All input manifests are stored in JSONL format and opened lazily with the <manifest_class>.from_jsonl_lazy(path) method.
For more details, see create_cut_set_eager().
- Parameters
output_path (Union[Path, str]) – path to which we will write the cuts.
recordings (Optional[RecordingSet]) – an optional RecordingSet manifest.
supervisions (Optional[SupervisionSet]) – an optional SupervisionSet manifest.
features (Optional[FeatureSet]) – an optional FeatureSet manifest.
random_ids (bool) – boolean, should the cut IDs be randomized. By default, use the recording ID with a loop index and a channel idx, i.e. “{recording_id}-{idx}-{channel}”.
- Return type
CutSet
- Returns
a new CutSet instance.
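The sorting requirements make a single streaming pass possible: when both inputs are ordered by recording_id, supervisions can be grouped and matched without loading everything into memory. Below is a minimal sketch of that idea using itertools.groupby; the function stream_join and the tuple-based records are hypothetical stand-ins, not Lhotse's code.

```python
from itertools import groupby


def stream_join(recording_ids, supervisions):
    """Yield (recording_id, matching_supervisions) in one pass.

    Assumes recording_ids and supervisions are sorted consistently by
    recording ID; each supervision is a (recording_id, payload) tuple.
    """
    groups = groupby(supervisions, key=lambda sup: sup[0])
    pending = next(groups, None)
    for rid in recording_ids:
        matched = []
        if pending is not None and pending[0] == rid:
            # Materialize the group before advancing the groupby iterator.
            matched = list(pending[1])
            pending = next(groups, None)
        yield rid, matched


recs = ["rec1", "rec2", "rec3"]
sups = [("rec1", "a"), ("rec1", "b"), ("rec3", "c")]
for rid, matched in stream_join(recs, sups):
    print(rid, matched)
```

Several supervisions sharing one recording_id end up in the same group, which is why duplicates of recording_id are allowed among supervisions.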
- lhotse.cut.compute_supervisions_frame_mask(cut, frame_shift=None, use_alignment_if_exists=None)[source]¶
Compute a mask that indicates which frames in a cut are covered by supervisions.
- Parameters
cut (Cut) – a cut object.
frame_shift (Optional[float]) – optional frame shift in seconds; required when the cut does not have pre-computed features, otherwise ignored.
use_alignment_if_exists (Optional[str]) – optional str (key from alignment dict); use the specified alignment type for generating the mask.
- Returns
a 1D numpy array with value 1 for frames covered by at least one supervision, and 0 for frames not covered by any supervision.
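To make the mask semantics concrete, here is a small sketch that builds such a mask from (start, duration) pairs using plain Python lists. The function name, the rounding scheme, and the list-based output are assumptions made for illustration; the library function returns a numpy array and may compute the frame count differently.

```python
def supervisions_frame_mask(cut_duration, frame_shift, supervisions):
    """Return a list with 1 for frames covered by any supervision, else 0.

    supervisions is a list of (start, duration) pairs in seconds, relative
    to the start of the cut.
    """
    num_frames = round(cut_duration / frame_shift)
    mask = [0] * num_frames
    for start, duration in supervisions:
        # Clip each supervision to the frames that exist in the cut.
        first = max(0, round(start / frame_shift))
        last = min(num_frames, round((start + duration) / frame_shift))
        for i in range(first, last):
            mask[i] = 1
    return mask


# A 1 s cut with a 100 ms frame shift and two supervisions,
# the second of which extends past the end of the cut.
print(supervisions_frame_mask(1.0, 0.1, [(0.2, 0.3), (0.8, 0.5)]))
```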
Recipes¶
Convenience methods used to prepare recording and supervision manifests for standard corpora.
Kaldi conversion¶
Convenience methods used to interact with Kaldi data directories.
- lhotse.kaldi.get_duration(path)[source]¶
Read an audio file and return its duration. Both Kaldi-style pipe commands and paths to actual audio files are supported.
- Parameters
path (Union[Path, str]) – Path to an audio file or a Kaldi-style pipe.
- Return type
float
- Returns
float duration of the recording, in seconds.
- lhotse.kaldi.load_kaldi_data_dir(path, sampling_rate, frame_shift=None, map_string_to_underscores=None, use_reco2dur=True, num_jobs=1)[source]¶
Load a Kaldi data directory and convert it to Lhotse RecordingSet and SupervisionSet manifests. For this to work, at least the wav.scp file must exist. SupervisionSet is created only when a segments file exists. reco2dur is used by default when it exists (to enforce reading the duration from the audio files themselves, set use_reco2dur=False). All the other files (text, utt2spk, etc.) are optional, and some of them might not be handled yet. In particular, feats.scp files are ignored.
- Parameters
map_string_to_underscores (Optional[str]) – optional string; when specified, we will replace all instances of this string in SupervisionSegment IDs with underscores. This is to help with handling underscores in Kaldi (see export_to_kaldi()). This is also done for speaker IDs.
- Return type
Tuple[RecordingSet, Optional[SupervisionSet], Optional[FeatureSet]]
- lhotse.kaldi.export_to_kaldi(recordings, supervisions, output_dir, map_underscores_to=None, prefix_spk_id=False)[source]¶
Export a pair of RecordingSet and SupervisionSet to a Kaldi data directory. Multi-channel recordings are supported, but each recording must still have a single AudioSource.
The RecordingSet and SupervisionSet must be compatible, i.e. it must be possible to create a CutSet out of them.
- Parameters
recordings (RecordingSet) – a RecordingSet manifest.
supervisions (SupervisionSet) – a SupervisionSet manifest.
output_dir (Union[Path, str]) – path where the Kaldi-style data directory will be created.
map_underscores_to (Optional[str]) – optional string with which we will replace all underscores. This helps avoid issues with Kaldi data dir sorting.
prefix_spk_id (Optional[bool]) – add speaker_id as a prefix of utterance_id (this is to ensure correct sorting inside files, which is required by Kaldi).
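The effect of map_underscores_to and prefix_spk_id on utterance IDs can be sketched as below. The helper kaldi_utterance_id is hypothetical, and joining the speaker and utterance IDs with a hyphen is an assumption made for this example rather than a documented detail.

```python
def kaldi_utterance_id(utt_id, speaker, map_underscores_to=None, prefix_spk_id=False):
    """Return an utterance ID adjusted for Kaldi-style sorting (illustrative only)."""
    if map_underscores_to is not None:
        # Replace underscores in both IDs so that they sort consistently.
        utt_id = utt_id.replace("_", map_underscores_to)
        speaker = speaker.replace("_", map_underscores_to)
    if prefix_spk_id:
        # Assumed separator: a hyphen between speaker ID and utterance ID.
        utt_id = f"{speaker}-{utt_id}"
    return utt_id


ids = [
    kaldi_utterance_id("utt_2", "spk_b", map_underscores_to="-", prefix_spk_id=True),
    kaldi_utterance_id("utt_1", "spk_a", map_underscores_to="-", prefix_spk_id=True),
]
# With the speaker prefix, sorting by utterance ID also groups utterances by speaker.
print(sorted(ids))
```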
- lhotse.kaldi.load_kaldi_text_mapping(path, must_exist=False, float_vals=False)[source]¶
Load Kaldi files such as utt2spk, spk2gender, text, etc. as a dict.
- Return type
Dict[str, Optional[str]]
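Kaldi mapping files such as utt2spk or text contain one entry per line, with a key followed by whitespace and a value. The sketch below parses such lines into a dict; it works on an in-memory list of lines, and the function parse_kaldi_mapping is a hypothetical illustration rather than the library function. Mapping a key-only line to None is an assumption.

```python
def parse_kaldi_mapping(lines, float_vals=False):
    """Parse "key value" lines into a dict; a value may contain spaces."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split only on the first run of whitespace, so transcripts stay intact.
        parts = line.split(maxsplit=1)
        key = parts[0]
        value = parts[1] if len(parts) > 1 else None
        if float_vals and value is not None:
            value = float(value)
        mapping[key] = value
    return mapping


print(parse_kaldi_mapping(["utt1 spk1", "utt2 spk2\n"]))
print(parse_kaldi_mapping(["utt1 hello world"]))
print(parse_kaldi_mapping(["rec1 12.5"], float_vals=True))
```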
Others¶
Helper methods used throughout the codebase.
- lhotse.manipulation.combine(*manifests)[source]¶
Combine multiple manifests of the same type into one.
- Examples:
>>> # Pass several arguments
>>> combine(recording_set1, recording_set2, recording_set3)
>>> # Or pass a single list/tuple of manifests
>>> combine([supervision_set1, supervision_set2])
- Return type
~Manifest
- lhotse.manipulation.split_parallelize_combine(num_jobs, manifest, fn, *args, **kwargs)[source]¶
Convenience wrapper that parallelizes the execution of functions that transform manifests. It splits the manifests into num_jobs pieces, applies the function to each split, and then combines the splits.
This function is used internally in Lhotse to implement some parallel ops.
Example:
>>> from lhotse import CutSet, split_parallelize_combine
>>> cuts = CutSet(...)
>>> window_cuts = split_parallelize_combine(
...     16,
...     cuts,
...     CutSet.cut_into_windows,
...     duration=30.0
... )
- Parameters
num_jobs (int) – The number of parallel jobs.
manifest (~Manifest) – The manifest to be processed.
fn (Callable) – Function or method that transforms the manifest; the first parameter has to be manifest (for methods, they have to be methods on that manifest’s type, e.g. CutSet.cut_into_windows).
args – positional arguments to fn.
kwargs – keyword arguments to fn.
- Return type
~Manifest
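The split, apply, and combine pattern can be sketched on plain lists as follows. This is only an illustration of the idea: the helper names are hypothetical, and a thread pool is used here for simplicity, which is an assumption and not necessarily how Lhotse runs its jobs.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def split_even(items, num_splits):
    """Split a list into num_splits contiguous parts of near-equal size."""
    size, extra = divmod(len(items), num_splits)
    parts, start = [], 0
    for i in range(num_splits):
        end = start + size + (1 if i < extra else 0)
        parts.append(items[start:end])
        start = end
    return parts


def split_apply_combine(num_jobs, items, fn, *args, **kwargs):
    """Apply fn to each split concurrently, then concatenate the results in order."""
    parts = split_even(items, num_jobs)
    with ThreadPoolExecutor(max_workers=num_jobs) as executor:
        futures = [executor.submit(fn, part, *args, **kwargs) for part in parts]
        results = [future.result() for future in futures]
    return list(chain.from_iterable(results))


def scale(part, factor):
    # Stand-in for a manifest transformation; the split is the first argument.
    return [x * factor for x in part]


print(split_apply_combine(3, list(range(7)), scale, factor=2))
```

As in the documented function, the split being processed is passed as the first argument to fn, followed by the extra positional and keyword arguments.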