Table of Contents:
This Formatter package was created for parse
and format
any string values
that match a format pattern string with Python regular expression.
This package be the co-pylot project for stating to my Python Software Developer
way.
🎯 First objective of this project is include necessary formatter objects for
any data components package which mean we can parse
any complicate names on
data source and ingest the right names to in-house or data target.
pip install -U fmtutil
Dependency supported:
Python Version | Installation |
---|---|
== 3.8 |
pip install "fmtutil>=0.4,<0.5.0" |
>=3.9,<3.13 |
pip install -U fmtutil |
For example, we want to get filename with the format like, filename_20220101.csv
,
on the file system storage, and we want to incremental ingest the latest file with
date 2022-03-25 date. So we will implement Datetime
object and parse
that filename to it,
assert (
Datetime.parse('filename_20220101.csv', 'filename_%Y%m%d.csv').value
== datetime.datetime(2022, 1, 1, 0)
)
The above example is 🥱 NOT SURPRISE!!! for us because Python
already provide the build-in datetime
to parse by datetime.strptime
and
format by {dt}.strftime
. This package will be the special thing when we group
more than one format-able objects together as Naming
, Version
, and Datetime
.
For complex filename format like:
{filename:%s}_{datetime:%Y_%m_%d}.{version:%m.%n.%c}.csv
From above filename format string, the datetime
package does not enough for
this scenario right? but you can handle by your hard-code object or create the
better package than this project.
Note
Any formatter object was implemented the self.valid
method for help us validate
format string value like the above the example scenario,
this_date = Datetime.parse('20220101', '%Y%m%d')
assert this_date.valid('any_files_20220101.csv', 'any_files_%Y%m%d.csv')
The main purpose is Formatter Objects for parse
and format
with string
value, such as Datetime
, Version
, and Serial
formatter objects. These objects
were used for parse any filename with put the format string value.
The formatter able to enhancement any format value from sting value, like in
Datetime
, for %B
value that was designed for month shortname (Jan
,
Feb
, etc.) that does not support in build-in datetime
package.
Important
The main usage of this formatter object is parse
and format
method.
from fmtutil import Datetime
datetime = Datetime.parse(value='Datetime_20220101_000101', fmt='Datetime_%Y%m%d_%H%M%S')
datetime.format('New_datetime_%Y%b-%-d_%H:%M:%S')
>>> 'New_datetime_2022Jan-1_00:01:01'
from fmtutil import Version
version = Version.parse(value='Version_2_0_1', fmt='Version_%m_%n_%c')
version.format('New_version_%m%n%c')
>>> 'New_version_201'
from fmtutil import Serial
serial = Serial.parse(value='Serial_62130', fmt='Serial_%n')
serial.format('Convert to binary: %b')
>>> 'Convert to binary: 1111001010110010'
from fmtutil import Naming
naming = Naming.parse(value='de is data engineer', fmt='%a is %n')
naming.format('Camel case is %c')
>>> 'Camel case is dataEngineer'
from fmtutil import Storage
storage = Storage.parse(value='This file have 250MB size', fmt='This file have %M size')
storage.format('The byte size is: %b')
>>> 'The byte size is: 2097152000'
from fmtutil import Constant, make_const
from fmtutil.exceptions import FormatterError
const = make_const({'%n': 'normal', '%s': 'special'})
try:
parse_const: Constant = const.parse(value='Constant_normal', fmt='Constant_%n')
parse_const.format('The value of %%s is %s')
except FormatterError:
pass
>>> 'The value of %s is special'
All formatter object can convert itself to constant formatter object for frozen
parsing value to constant by .to_const()
.
Note
This package already implement the environment constant object,
fmtutil.EnvConst
.
Read more about the Formatter objects API
The FormatterGroup object, FormatterGroup
, which is the grouping of needed
mapping formatter objects and its alias formatter object ref name together. You
can define a name of formatter that you want, such as name
for Naming
, or
timestamp
for Datetime
.
Parse:
from fmtutil import make_group, Naming, Datetime, FormatterGroupType
group_obj: FormatterGroupType = make_group({'name': Naming, 'datetime': Datetime})
group_obj.parse('data_engineer_in_20220101_de', fmt='{name:%s}_in_{timestamp:%Y%m%d}_{name:%a}')
>>> {
>>> 'name': Naming.parse('data engineer', '%n'),
>>> 'timestamp': Datetime.parse('2022-01-01 00:00:00.000000', '%Y-%m-%d %H:%M:%S.%f')
>>> }
Format:
from fmtutil import FormatterGroup
from datetime import datetime
group_01: FormatterGroup = group_obj({'name': 'data engineer', 'datetime': datetime(2022, 1, 1)})
group_01.format('{name:%c}_{timestamp:%Y_%m_%d}')
>>> dataEngineer_2022_01_01
If you have multi-format filenames on the data source directory, and you want to dynamic getting max datetime on these filenames to your app, you can use a formatter group.
from fmtutil import (
make_group, Naming, Datetime, FormatterGroup, FormatterGroupType, FormatterArgumentError,
)
name: Naming = Naming.parse('Google Map', fmt='%t')
fmt_group: FormatterGroupType = make_group({
"naming": name.to_const(),
"timestamp": Datetime,
})
rs: list[FormatterGroup] = []
for file in (
'googleMap_20230101.json',
'googleMap_20230103.json',
'googleMap_20230103_bk.json',
'googleMap_with_usage_20230105.json',
'googleDrive_with_usage_20230105.json',
):
try:
rs.append(
fmt_group.parse(file, fmt=r'{naming:c}_{timestamp:%Y%m%d}\.json')
)
except FormatterArgumentError:
continue
repr(max(rs).groups['timestamp'])
>>> <Datetime.parse('2023-01-03 00:00:00.000000', '%Y-%m-%d %H:%M:%S.%f')>
Tip
The above Example will convert the name
, Naming instance, to Constant
instance before passing to the Formatter Group because it does not want
to dynamic parsing this format when find any matching filenames at destination
path.
This project was licensed under the terms of the MIT license.