File I/O
Utilities for working with files in Apache Beam.
- class resiliparse.beam.fileio.MatchFiles(file_pattern: str, empty_match_treatment: EmptyMatchTreatment = 'ALLOW_IF_WILDCARD', shuffle: bool = True)
Bases: PTransform
Match a file pattern using apache_beam.io.filesystems.FileSystems.match(). Unlike Beam's own apache_beam.io.fileio.MatchFiles, this file matcher enforces a fusion break by reshuffling the matched file names, so downstream steps are not fused to the single step that performs the match. This circumvents a limitation of certain Beam runners, such as the FlinkRunner, which do not automatically distribute splits.
- Parameters:
file_pattern – file glob
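empty_match_treatment – how to handle a pattern that matches no files (an apache_beam.io.fileio.EmptyMatchTreatment value; the default, ALLOW_IF_WILDCARD, permits an empty match only if the pattern contains a wildcard)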
shuffle – shuffle matches to break fusion (setting this to False effectively falls back to the original Beam implementation)