Jafet Rubi
Jan 28 2025
Dynamically assigning complex functions at runtime.
Consider this: you’ve been tasked at work with finding a way to call functions dynamically, based on criteria stored in a metadata table. Let’s say this table contains an object name, a “rule” that maps to a specific function, and any parameters for that function.
How would you go about solving this problem? Here’s how I did it.
Here’s an example of that metadata table:
object_name | rule | parameters |
---|---|---|
dim_customer | create_composite_key | {'key1': 'foo', 'key2': 'bar'} |
dim_customer | clean_email_address | {'column1': 'baz', 'audit': True} |
dim_products | create_composite_key | {'key1': 'qux', 'key2': 'corge'} |
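If the parameters column reaches Python as text rather than as a dictionary (an assumption; how the table is stored is not specified here), it has to be parsed before it can be unpacked into a function call. Here is a minimal sketch using ast.literal_eval from the standard library. The helper name parse_params is hypothetical:

```python
import ast

def parse_params(raw):
    """Turn a parameters cell into a dict of keyword arguments.

    Assumes the cell holds a Python-literal dict written with straight quotes.
    """
    if isinstance(raw, dict):
        return raw  # already parsed, nothing to do
    params = ast.literal_eval(raw)  # evaluates literals only, never arbitrary code
    if not isinstance(params, dict):
        raise ValueError(f"expected a dict of parameters, got {type(params).__name__}")
    return params

print(parse_params("{'column1': 'baz', 'audit': True}"))
```

If the column were stored as JSON instead, json.loads would be the closer fit, since JSON spells the boolean as true rather than True.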
Now, here are examples of the functions that need to be called for each entry in the table above:
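from pyspark.sql.functions import col, concat, lit, trim

def gen_comp_key(df, key1, key2):
    # Join the two key columns with a hyphen to form a composite key.
    df = df.withColumn('composite_key', concat(col(key1), lit("-"), col(key2)))
    return df

def clean_email(df, column1, audit):
    # Add a trimmed copy of column1 only when audit is truthy.
    if audit:
        df = df.withColumn('email_address_clean', trim(col(column1)))
    return df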
Now that we have these functions defined, we need a dictionary that maps each value of the metadata “rule” column to its function. Notice that we store the function references (the bare names, without parentheses or parameters) and not the results of calling them.
dispatcher = {
    'create_composite_key': gen_comp_key,
    'clean_email_address': clean_email
}
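To see why the parentheses matter, here is a small sketch in plain Python. The functions shout and whisper and the dictionary demo_dispatcher are hypothetical stand-ins, not part of the pipeline above:

```python
def shout(text):
    return text.upper()

def whisper(text):
    return text.lower()

# Values are the function objects themselves; nothing runs when the dict is built.
demo_dispatcher = {'shout': shout, 'whisper': whisper}

func = demo_dispatcher['shout']   # look up the function by its rule name
print(callable(func))             # True: the stored value can still be called
print(func('Hello'))              # HELLO: the call happens only here

# Writing shout('Hello') inside the dict stores the string result instead,
# and that string cannot be called later with different arguments.
wrong = {'shout': shout('Hello')}
print(callable(wrong['shout']))   # False
```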
The keys are the values of the “rule” column in the metadata table, and the values are the functions themselves. Now, let’s call the functions dynamically based on the contents of the table above:
rows = [
    ('dim_customer', 'create_composite_key', {'key1': 'foo', 'key2': 'bar'}),
    ('dim_customer', 'clean_email_address', {'column1': 'baz', 'audit': True}),
    ('dim_products', 'create_composite_key', {'key1': 'qux', 'key2': 'corge'})
]

# Each row is already an (object, rule, parameters) tuple, so unpack it directly.
for obj, rule, params in rows:
    # Look up the function by rule name and keep the DataFrame it returns.
    df = dispatcher[rule](df=df, **params)
In the above, the dispatcher is our dictionary containing our functions. We index into it with the rule name from each table row and unpack that row’s parameters as keyword arguments, so each function is called for its data object without its name ever appearing in the calling code!
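The whole pattern can be tried without a Spark session. The following is a sketch under stated assumptions, not the production code: it swaps the DataFrame for a plain list of dictionaries, reimplements the two rules on that stand-in, keeps one dataset per object name (the tables dictionary and its sample values are made up for illustration), and raises a clear error when the metadata names a rule that was never registered. The helper apply_rules is hypothetical.

```python
def gen_comp_key(records, key1, key2):
    # Plain-Python stand-in for the Spark version: join two fields with a hyphen.
    return [{**r, 'composite_key': f"{r[key1]}-{r[key2]}"} for r in records]

def clean_email(records, column1, audit):
    # Only add the cleaned field when auditing is requested.
    if audit:
        return [{**r, 'email_address_clean': r[column1].strip()} for r in records]
    return records

dispatcher = {
    'create_composite_key': gen_comp_key,
    'clean_email_address': clean_email,
}

# Hypothetical sample data, one list of records per object name.
tables = {
    'dim_customer': [{'foo': 'a', 'bar': 'b', 'baz': '  x@example.com '}],
    'dim_products': [{'qux': 'p', 'corge': 'q'}],
}

rows = [
    ('dim_customer', 'create_composite_key', {'key1': 'foo', 'key2': 'bar'}),
    ('dim_customer', 'clean_email_address', {'column1': 'baz', 'audit': True}),
    ('dim_products', 'create_composite_key', {'key1': 'qux', 'key2': 'corge'}),
]

def apply_rules(tables, rows, dispatcher):
    result = dict(tables)  # shallow copy so the input mapping is left alone
    for obj, rule, params in rows:
        if rule not in dispatcher:
            raise KeyError(f"no function registered for rule {rule!r}")
        # Each rule returns a new dataset, so keep the result for the next rule.
        result[obj] = dispatcher[rule](result[obj], **params)
    return result

processed = apply_rules(tables, rows, dispatcher)
print(processed['dim_customer'][0]['composite_key'])        # a-b
print(processed['dim_customer'][0]['email_address_clean'])  # x@example.com
```

Keying the data by object name means each rule runs against the dataset its metadata row refers to, and checking the rule name up front turns a typo in the metadata into a readable error message.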