jo3util/README.md

# README

A bunch of functionality in here, but the main one is `store.load_or_create`.
It might get its own package someday, when it's less before-pre-alpha-sneakpeak-demo as it's now.

## load_or_create

### the status quo

A common pattern is:
```python
def load_obj(path):
    ""

def save_obj(obj, path):
    ""

def create_obj(*args, **kwargs):
    ""

def infer_path(obj_id):
    return "./obj/" + str(obj_id)

obj_id = 0
path = infer_path(obj_id)
if path.exists():
    obj = load_obj(load)
else:
    obj = create_obj(obj_id, some_kwarg=0)
    save_obj(obj, path)
```

And in some cases, you want to create and save many variations of the object.
It might be better to hash it's characteristics and use that as part of the path.

```python
import sha256
import json

def infer_path(obj_id, **some_other_kwargs):
    hash = str(sha256(json.dumps(some_other_kwargs)).hexdigest())
    return "./obj/" + hash + ".pkl"
```

### the problem

The above is fine and dandy, but when someone wants to use your obj,
they'd need to keep track of 4 separate functions.

You can dress it up as such:
```python
def get_obj(obj_id, *args, **kwargs):
    path = infer_path(obj_id)
    if path.exists():
        obj = load_obj(load)
    else:
        obj = create_obj(obj_id, some_kwarg=0)
        save_obj(obj, path)
    return obj
```
But that takes a lot of freedom away from your user, who might have their
own ideas on where the object should be stored.

### the solution

```python
from jo3util.store import load_or_create
get_obj = load_or_create(
    load=load_obj,
    save=save_obj,
    path_fn=infer_path,
)(create_obj)

obj = get_obj(obj_id, some_kwarg=0)

path_of_obj_0 = get_obj.path(obj_id, some_kwarg=0)
path_of_obj_1 = get_obj.path_of_obj(obj)
assert path_of_obj_0 ==  path_of_obj_1
```

You can now elegantly pack the four functions together.
But you still have the flexibility to alter the path function on the fly:

```python
get_obj.path_fn = lambda hash: f"./{hash}.pkl"
```

Now, storing different objects of which one is dependent on the other, becomes intuitive and elegant:

```python
get_human = load_or_create(
    path_fn=lambda name: "./" + name + "/body.txt"
)(lambda name: name)
get_finger_print = load_or_create(
    path_fn=lambda finger: get_human.dir_from_obj(human) / f"{finger}.print"
)(lambda human, finger: f"{human}'s finger the {finger}")

assert not get_human.path("john").exists()
human = get_human("john")
assert get_human.path("john").exists()

finger_print = get_finger_print(human, "thumb")
assert get_finger_print.path(human, "thumb") == "./john/thumb.print"
```

The Finger print is now always stored in the same directory as where the human's `body.txt` is stored.
You don't need to keep track of the location of `body.txt`.


### under the hood

The main trick is to match the parameter names of the `create` function (in our case `create_obj`)
with those of the three other subfunctions (in our case `load_obj`, `save_obj` and `infer_path`).

The three subfunctions's allowed parameters are mostly a non-strict superset of the create function's
parameters.

When you call `get_obj`, something like this happens:

```python
def call_fn_with_filtered_arguments(fn, *args, **kwargs):
    """ call fn with only the subset of args and kwargs that fn expects.
    """
    path_parameters = get_parameters_that_fn_expects(infer_path)
    # in reality we first infer the args name, for positional arguments.
    args = [arg for arg in args if arg in path_parameters]
    kwargs = {key: arg for key, arg in kwargs.items() if key in path_parameters}
    return infer_path(*args, **kwargs)

def get_obj_pseudo_code(*args, **kwargs):
    hash = some_hash_fn(*args, **kwargs)
    path = call_fn_with_filtered_arguments(infer_path, *args, hash=hash, **kwargs)
    if path.exists():
        return call_fn_with_filtered_arguments(
            load_obj,
            *args,
            path=path,
            file=open(path, "rb"),
            **kwargs
        )

    obj = create_obj(*args, **kwargs)
    call_fn_with_filtered_arguments(
        save_obj,
        qbj,
        *args,
        path=path,
        file=open(path, "wb"),
        **kwargs
    )
    return obj
```

So, the load, save and path functions you provide do not have to have the same signature as the create
function does, but you can call them _as if_ they are the create function.

### philosophy

The main idea is that some object's storage location should be inferrable from the arguments 
during its creation call.

In reality, we tend to separately keep track of some object's path, its arguments and itself.
This tends to go bad when we need to load, save or create the object in some other context.
It becomes easy to forget where some object ought to be stored. 
Or it can happen that or different places where the same object is handled, have different opinions on storage location.

It can lead to duplicates; forgetting where the object was stored; or losing a folder of data
because the folder is too unwieldy to salvage.

By packaging a function with it's load and save countparts and a default storage location, we don't
need to worry about storage location anymore and can focus on creating and using our objects.

If we ever do change our minds on the ideal storage location, then there is an obvious central place
where we can change it, and that change then easily immediately applies to _all_ the places where
some object's path needs to be determined.
ginit 2023-07-01 13:30:03 +02:00			`# README`

readme.md 2024-08-06 22:45:46 +02:00			A bunch of functionality in here, but the main one is `store.load_or_create`.
			`It might get its own package someday, when it's less before-pre-alpha-sneakpeak-demo as it's now.`

			`## load_or_create`

			`### the status quo`

			`A common pattern is:`
			```python
			`def load_obj(path):`
			`""`

			`def save_obj(obj, path):`
			`""`

			`def create_obj(args, *kwargs):`
			`""`

			`def infer_path(obj_id):`
			`return "./obj/" + str(obj_id)`

			`obj_id = 0`
			`path = infer_path(obj_id)`
			`if path.exists():`
			`obj = load_obj(load)`
			`else:`
			`obj = create_obj(obj_id, some_kwarg=0)`
			`save_obj(obj, path)`
			```

			`And in some cases, you want to create and save many variations of the object.`
			`It might be better to hash it's characteristics and use that as part of the path.`

			```python
			`import sha256`
			`import json`

			`def infer_path(obj_id, **some_other_kwargs):`
			`hash = str(sha256(json.dumps(some_other_kwargs)).hexdigest())`
			`return "./obj/" + hash + ".pkl"`
			```

			`### the problem`

			`The above is fine and dandy, but when someone wants to use your obj,`
			`they'd need to keep track of 4 separate functions.`

			`You can dress it up as such:`
			```python
			`def get_obj(obj_id, args, *kwargs):`
			`path = infer_path(obj_id)`
			`if path.exists():`
			`obj = load_obj(load)`
			`else:`
			`obj = create_obj(obj_id, some_kwarg=0)`
			`save_obj(obj, path)`
			`return obj`
			```
			`But that takes a lot of freedom away from your user, who might have their`
			`own ideas on where the object should be stored.`

			`### the solution`

			```python
			`from jo3util.store import load_or_create`
			`get_obj = load_or_create(`
			`load=load_obj,`
			`save=save_obj,`
			`path_fn=infer_path,`
			`)(create_obj)`

			`obj = get_obj(obj_id, some_kwarg=0)`

			`path_of_obj_0 = get_obj.path(obj_id, some_kwarg=0)`
			`path_of_obj_1 = get_obj.path_of_obj(obj)`
			`assert path_of_obj_0 == path_of_obj_1`
			```

			`You can now elegantly pack the four functions together.`
			`But you still have the flexibility to alter the path function on the fly:`

			```python
			`get_obj.path_fn = lambda hash: f"./{hash}.pkl"`
			```

			`Now, storing different objects of which one is dependent on the other, becomes intuitive and elegant:`

			```python
			`get_human = load_or_create(`
			`path_fn=lambda name: "./" + name + "/body.txt"`
			`)(lambda name: name)`
			`get_finger_print = load_or_create(`
			`path_fn=lambda finger: get_human.dir_from_obj(human) / f"{finger}.print"`
			`)(lambda human, finger: f"{human}'s finger the {finger}")`

			`assert not get_human.path("john").exists()`
			`human = get_human("john")`
			`assert get_human.path("john").exists()`

			`finger_print = get_finger_print(human, "thumb")`
			`assert get_finger_print.path(human, "thumb") == "./john/thumb.print"`
			```

			The Finger print is now always stored in the same directory as where the human's `body.txt` is stored.
			You don't need to keep track of the location of `body.txt`.


			`### under the hood`

			The main trick is to match the parameter names of the `create` function (in our case `create_obj`)
			with those of the three other subfunctions (in our case `load_obj`, `save_obj` and `infer_path`).

			`The three subfunctions's allowed parameters are mostly a non-strict superset of the create function's`
			`parameters.`

			When you call `get_obj`, something like this happens:

			```python
			`def call_fn_with_filtered_arguments(fn, args, *kwargs):`
			`""" call fn with only the subset of args and kwargs that fn expects.`
			`"""`
			`path_parameters = get_parameters_that_fn_expects(infer_path)`
			`# in reality we first infer the args name, for positional arguments.`
			`args = [arg for arg in args if arg in path_parameters]`
			`kwargs = {key: arg for key, arg in kwargs.items() if key in path_parameters}`
			`return infer_path(args, *kwargs)`

			`def get_obj_pseudo_code(args, *kwargs):`
			`hash = some_hash_fn(args, *kwargs)`
			`path = call_fn_with_filtered_arguments(infer_path, args, hash=hash, *kwargs)`
			`if path.exists():`
			`return call_fn_with_filtered_arguments(`
			`load_obj,`
			`*args,`
			`path=path,`
			`file=open(path, "rb"),`
			`**kwargs`
			`)`

			`obj = create_obj(args, *kwargs)`
			`call_fn_with_filtered_arguments(`
			`save_obj,`
			`qbj,`
			`*args,`
			`path=path,`
			`file=open(path, "wb"),`
			`**kwargs`
			`)`
			`return obj`
			```

			`So, the load, save and path functions you provide do not have to have the same signature as the create`
			`function does, but you can call them _as if_ they are the create function.`

			`### philosophy`

			`The main idea is that some object's storage location should be inferrable from the arguments`
			`during its creation call.`

			`In reality, we tend to separately keep track of some object's path, its arguments and itself.`
			`This tends to go bad when we need to load, save or create the object in some other context.`
			`It becomes easy to forget where some object ought to be stored.`
			`Or it can happen that or different places where the same object is handled, have different opinions on storage location.`

			`It can lead to duplicates; forgetting where the object was stored; or losing a folder of data`
			`because the folder is too unwieldy to salvage.`

			`By packaging a function with it's load and save countparts and a default storage location, we don't`
			`need to worry about storage location anymore and can focus on creating and using our objects.`

			`If we ever do change our minds on the ideal storage location, then there is an obvious central place`
			`where we can change it, and that change then easily immediately applies to _all_ the places where`
			`some object's path needs to be determined.`