readme.md
parent
b8c72498ba
commit
7048bb058a
181
README.md
|
@ -1,12 +1,175 @@
|
||||||
# README
|
# README
|
||||||
|
|
||||||
## files
|
A bunch of functionality in here, but the main one is `store.load_or_create`.
|
||||||
|
It might get its own package someday, once it is less of a before-pre-alpha-sneak-peek demo than it is now.
|
||||||
|
|
||||||
______________________________________________________________________
|
## load_or_create
|
||||||
|.gitignore |a list of unshared files|
|
|
||||||
|makefile|dev tools for installing, publisizing etc.|
|
### the status quo
|
||||||
|pyproject.toml |project metadata|
|
|
||||||
|requirements.txt |python dependencies|
|
A common pattern is:
|
||||||
|setup.py|necessary for `pip install -e .`|
|
|src/main.py|first file that gets called|
--------------------------------------------
```python
from pathlib import Path

def load_obj(path):
    """Load and return the object stored at `path`."""

def save_obj(obj, path):
    """Write `obj` to `path`."""

def create_obj(*args, **kwargs):
    """Build a new object from the given arguments."""

def infer_path(obj_id):
    # Return a Path so that `path.exists()` below works.
    return Path("./obj") / str(obj_id)

obj_id = 0
path = infer_path(obj_id)
if path.exists():
    obj = load_obj(path)
else:
    obj = create_obj(obj_id, some_kwarg=0)
    save_obj(obj, path)
```

And in some cases, you want to create and save many variations of the object.
It might be better to hash its characteristics and use that as part of the path.

```python
import json
from hashlib import sha256
from pathlib import Path

def infer_path(obj_id, **some_other_kwargs):
    # sha256 needs bytes, so encode the JSON text before hashing.
    digest = sha256(json.dumps(some_other_kwargs).encode()).hexdigest()
    return Path("./obj") / (digest + ".pkl")
```
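One detail that may matter when hashing keyword arguments: the JSON text depends on the order in which the keywords were passed, so the same characteristics can end up with two different hashes. The sketch below assumes that order should not matter and passes `sort_keys=True` to `json.dumps`. The helper name `hash_kwargs` is hypothetical and not part of jo3util.

```python
import json
from hashlib import sha256


def hash_kwargs(**kwargs):
    # sort_keys=True makes the JSON text independent of keyword order.
    text = json.dumps(kwargs, sort_keys=True)
    return sha256(text.encode("utf-8")).hexdigest()


# Same characteristics, different keyword order: same hash.
same_a = hash_kwargs(width=3, depth=2)
same_b = hash_kwargs(depth=2, width=3)
# Different characteristics: different hash.
other = hash_kwargs(width=4, depth=2)
```

Values that `json.dumps` cannot serialize would still need their own handling; this sketch does not cover that.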

### the problem

The above is fine and dandy, but anyone who wants to use your object
then has to keep track of four separate functions.

You can dress it up as such:
```python
def get_obj(obj_id, *args, **kwargs):
    path = infer_path(obj_id)
    if path.exists():
        obj = load_obj(path)
    else:
        # Forward the caller's arguments instead of hard-coding them.
        obj = create_obj(obj_id, *args, **kwargs)
        save_obj(obj, path)
    return obj
```

But that takes a lot of freedom away from your user, who might have their
own ideas on where the object should be stored.

### the solution

```python
from jo3util.store import load_or_create

get_obj = load_or_create(
    load=load_obj,
    save=save_obj,
    path_fn=infer_path,
)(create_obj)

obj = get_obj(obj_id, some_kwarg=0)

path_of_obj_0 = get_obj.path(obj_id, some_kwarg=0)
path_of_obj_1 = get_obj.path_of_obj(obj)
assert path_of_obj_0 == path_of_obj_1
```

You can now elegantly pack the four functions together.
But you still have the flexibility to alter the path function on the fly:

```python
get_obj.path_fn = lambda hash: f"./{hash}.pkl"
```

Now, storing different objects, one of which depends on the other, becomes intuitive and elegant:

```python
get_human = load_or_create(
    path_fn=lambda name: "./" + name + "/body.txt"
)(lambda name: name)
get_finger_print = load_or_create(
    # `human` is named here so it is matched against the create function's parameters.
    path_fn=lambda human, finger: get_human.dir_from_obj(human) / f"{finger}.print"
)(lambda human, finger: f"{human}'s finger the {finger}")

assert not get_human.path("john").exists()
human = get_human("john")
assert get_human.path("john").exists()

finger_print = get_finger_print(human, "thumb")
# Assumes `.path` returns a pathlib.Path, as the `.exists()` calls above suggest.
assert get_finger_print.path(human, "thumb") == get_human.path("john").parent / "thumb.print"
```

The finger print is now always stored in the same directory as the human's `body.txt`.
You don't need to keep track of the location of `body.txt`.

### under the hood

The main trick is to match the parameter names of the `create` function (in our case `create_obj`)
with those of the three other subfunctions (in our case `load_obj`, `save_obj` and `infer_path`).

The parameters that the three subfunctions may accept are mostly a non-strict superset of the create function's
parameters.
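A minimal sketch of that name matching, using the standard library's `inspect.signature`. The helper name `call_with_accepted_kwargs` is hypothetical, the sketch handles keyword arguments only, and it is not necessarily how jo3util implements it.

```python
import inspect


def call_with_accepted_kwargs(fn, **kwargs):
    """Call fn with only the keyword arguments its signature declares."""
    params = inspect.signature(fn).parameters
    # A **kwargs parameter means fn accepts any keyword, so nothing is filtered.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return fn(**kwargs)
    accepted = {key: value for key, value in kwargs.items() if key in params}
    return fn(**accepted)


def infer_path(obj_id):
    return f"./obj/{obj_id}"


# The create call's arguments are a superset of what infer_path declares;
# `some_kwarg` is silently dropped before infer_path is called.
path = call_with_accepted_kwargs(infer_path, obj_id=0, some_kwarg=0)
```

Filtering by name is what lets a path function declare only the arguments it cares about.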

When you call `get_obj`, something like this happens:

```python
def call_fn_with_filtered_arguments(fn, *args, **kwargs):
    """Call fn with only the subset of args and kwargs that fn expects."""
    parameters = get_parameters_that_fn_expects(fn)
    # In reality, we first infer the names of the positional arguments.
    args = [arg for arg in args if arg in parameters]
    kwargs = {key: arg for key, arg in kwargs.items() if key in parameters}
    return fn(*args, **kwargs)

def get_obj_pseudo_code(*args, **kwargs):
    hash = some_hash_fn(*args, **kwargs)
    path = call_fn_with_filtered_arguments(infer_path, *args, hash=hash, **kwargs)
    if path.exists():
        return call_fn_with_filtered_arguments(
            load_obj,
            *args,
            path=path,
            file=open(path, "rb"),
            **kwargs
        )

    obj = create_obj(*args, **kwargs)
    call_fn_with_filtered_arguments(
        save_obj,
        obj,
        *args,
        path=path,
        file=open(path, "wb"),
        **kwargs
    )
    return obj
```

So, the load, save and path functions you provide do not need to have the same signature as the create
function, but you can call them _as if_ they were the create function.

### philosophy

The main idea is that an object's storage location should be inferable from the arguments
of its creation call.

In practice, we tend to keep track of an object's path, its arguments and the object itself separately.
This tends to go wrong when we need to load, save or create the object in some other context.
It becomes easy to forget where an object ought to be stored.
Or the different places where the same object is handled end up with different opinions on its storage location.

This can lead to duplicates, to forgetting where the object was stored, or to losing a folder of data
because the folder is too unwieldy to salvage.

By packaging a function with its load and save counterparts and a default storage location, we don't
need to worry about storage location anymore and can focus on creating and using our objects.

If we ever do change our minds on the ideal storage location, then there is an obvious central place
where we can change it, and that change immediately applies to _all_ the places where
an object's path needs to be determined.

@ -1,7 +1,7 @@

 [project]
 name = "jo3util"
-version = "0.0.17"
+version = "0.0.18"
 description = ""
 dependencies = []
 dynamic = ["readme"]

@ -96,7 +96,7 @@ class inner(load_or_create):

 self.save_wrapper(obj, *args, **{"path": path} | kwargs)
 if "file" in self.save_arg_names: self.hash_obj({"path": path} | kwargs)
-if self.save_json: path.with_suffix(".json").write_bytes(self.to_json(**kwargs))
+if self.save_json: path.with_suffix(".kwargs.json").write_bytes(self.to_json(**kwargs))

 return obj