Finding Python Models

Loading Document Objects to Beanie Dynamically

I have a feeling this post is going to be quite dense. The idea to write it came from an episode of Talk Python where Michael Kennedy interviewed Roman Right, the creator of Beanie, my Python object-document mapper (ODM) of choice for MongoDB.

Lightning fast background here: Beanie is a library that provides a way to connect with and interact with MongoDB databases asynchronously. It does so primarily through a Document class that is used as the parent class to your database models. Beanie then allows you to perform basic CRUD operations (among other advanced features) by using your database models directly.

Phew!

Anyway, on the podcast mentioned above, the conversation turned to how the init_beanie function (which initializes a connection to the MongoDB database) requires a list of Document models to be passed in as a parameter. (This tells Beanie which models belong to the active database connection, since one could potentially have different models for different databases.)

In any case, it's very likely that you will have multiple database models within a particular application that you would want to initialize.

Especially during development, it would be nice if you could ask Beanie to include all relevant models contained within a given directory.

And yes, Michael asked Roman that very question.

While that feature hasn't been implemented (yet?), I thought it would be a good idea to write about my solution.

In Plain English

As I've previously mentioned, I started this project as a Flask application, and I was facing a similar issue. I wanted to be able to load my database models dynamically.

I came across this absolutely fantastic resource by Bob Waycott explaining precisely how to do that very thing. In Flask.

While I initially started building this site in Flask and SQLAlchemy, I eventually moved away from those libraries altogether, but the idea of a dynamic loader stayed with me.

To accomplish a similar task with Beanie, my basic solution needed to accomplish the following:

  • Walk a given directory in search of all modules (my .py files)
  • Look inside each of those modules in search of objects
  • Determine if objects found were Beanie Document models
  • Create an iterable of the models found
  • Return a list with dot separated paths to each of those models

Where to Start

I don't want to rehash instructions on how to use Beanie. The documentation already does an awesome job at that.

Relevant to this topic, I'm focusing on how one goes about connecting to a database. It involves creating the client (using the Motor library) and initializing the connection with Beanie's built-in init_beanie function.

That part of the code looks like this:

await init_beanie(database=client.db_name, document_models=[Sample])

Note that the Beanie documentation also states:

init_beanie supports not only list of classes for the document_models parameter, but also strings with the dot separated paths

As such, my end goal is to provide a list for document_models that only includes my relevant Beanie models. Something like:

document_models=[
    "app.models.article.ArticleModel", 
    "app.models.user.UserModel",
    "app.models.tags.TagModel"
]

So, to get that list of dot separated paths, I'll write a load_beanie_models function to find relevant models in a given directory.

Simple, right?

To Be or Not to Beanie

My solution that follows is not necessarily chronological in terms of the logic laid out above, but hopefully it makes sense as I start explaining it.

One of the primary tasks I will have to do is determine whether a given object is a Beanie Document to begin with.

To do so, I created a simple function that receives an item and determines if it is a Beanie Document.

It's pretty straightforward:

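from inspect import isclass
from typing import Any

from beanie import Document

def is_beanie_model(item: Any) -> bool:
    """Determines if item is a Beanie Document object."""
    # isclass comes first: issubclass raises TypeError for non-class items.
    return isclass(item) and issubclass(item, Document)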

This uses the inspect module from the standard library to access the isclass function. It checks to make sure item is a class, and in addition, a subclass of the Beanie Document model. The return value is a boolean True or False.
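To see why both checks matter, here is a minimal sketch that swaps Beanie's Document for a stand-in class. FakeDocument and the other class names are hypothetical, used only so the snippet runs without Beanie installed.

```python
from inspect import isclass
from typing import Any


class FakeDocument:
    """Stand-in for beanie.Document (assumption, for illustration only)."""


class ArticleDB(FakeDocument):
    """Plays the role of a database model."""


class ArticleCreate:
    """Plays the role of a plain schema that is not a database model."""


def is_fake_document(item: Any) -> bool:
    # `and` short-circuits, so issubclass only runs when item is a class.
    return isclass(item) and issubclass(item, FakeDocument)


results = {
    "ArticleDB": is_fake_document(ArticleDB),
    "ArticleCreate": is_fake_document(ArticleCreate),
    "instance": is_fake_document(ArticleDB()),
    "string": is_fake_document("ArticleDB"),
    "base itself": is_fake_document(FakeDocument),
}
```

One thing worth noticing: a class counts as a subclass of itself, so the base class matches too. If that ever became a problem, adding an `item is not Document` check to the comparator would be one way to exclude it.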

As an aside: All Beanie database objects are built as a subclass of Document, which itself is a subclass of pydantic's BaseModel class.

Let's Go Get Them

Before I jump into the more complex parts, I decided to create a simple function with no arguments. It calls my dynamic_loader function (which I'll talk about below), passing it a string value of the directory I want to traverse and the is_beanie_model comparison function mentioned above.

This allows me to potentially use the loader for things other than Beanie models. (Bob Waycott—who I mentioned above—uses it as a dynamic way to register views in Flask.)

def get_beanie_models() -> list[type]:
    """Dynamic Beanie model finder."""
    return dynamic_loader("models", is_beanie_model)

This function will provide me with the end result—a list of my Beanie database models. You also see a peek of my dynamic_loader function, which I have not gotten to quite yet.

Stay tuned.

But First, Path to Enlightenment (or Models)

You'll note that the dynamic_loader function takes a string value of "models".

As mentioned above, this is the name of the directory that contains all my database models. (I could theoretically start searching at my root directory, in case my models were peppered around in my app, though I wouldn't recommend that. App structure is another topic altogether.)

For reference, this is a simplified version of my app structure. All of my functions are defined in the src/lib/util.py module.

src/
┣ api/
┣ core/
┃ ┣ config.py
┃ ┣ security.py
┃ ┗ __init__.py
┣ crud/
┃ ┣ article.py
┃ ┣ user.py
┃ ┗ __init__.py
┣ db/
┃ ┣ db.py
┃ ┗ __init__.py
┣ lib/
┃ ┣ util.py  # my functions live here
┃ ┗ __init__.py
┣ models/
┃ ┣ mixins/
┃ ┣ article.py
┃ ┣ base.py
┃ ┣ tags.py
┃ ┣ user.py
┃ ┗ __init__.py
┣ main.py
┗ __init__.py

I decided to build all my app models in a "models" subdirectory. As a result, I want to do a recursive search in that directory for all existing modules (any .py files).

Here, I use the excellent pathlib module from the standard library to help with all of this. Again, from my util.py module:

from typing import Iterator
from pathlib import Path

APPLICATION_DIR = Path(__file__).parent.parent  # App's root directory

def get_modules(module) -> Iterator[str]:
    """Returns all .py modules in given file_dir as
    a generator of dot separated string values.
    """
    file_dir = Path(APPLICATION_DIR / module)
    idx_app_root = len(APPLICATION_DIR.parts) - 1  
    modules = [f for f in list(file_dir.rglob("*.py"))
            if not f.stem == "__init__"]
    for filepath in modules:
        yield (".".join(filepath.parts[idx_app_root:])[0:-3])

And as these things go, here's the breakdown.

Depending on where you are in your app directory when making the call, you can get a Path object by using Path(__file__) and some combination of .parent notation to specify your application root.

Once we have the APPLICATION_DIR (your root directory), you can extend the path with / and the name of your model directory (I call the variable module). This becomes your file_dir, which will be searched recursively.

Ultimately, once we find the paths to all our .py files, I will use the .parts attribute of each Path to split it into its individual components. That may sound more complicated than it is.

But first, I want to know the "index" of where my dot separated string paths need to start.

For example, let's say my APPLICATION_DIR is located at c:\project\app\src. After splitting that into parts, I will have a tuple of ("c:\\", "project", "app", "src") and a len value of four.

When using import statements, I want my dot separated paths to start with src. So later, if my Beanie Document is located in c:\project\app\src\models\article.py, I would want my dot separated path to say something like src.models.article.BeanieDocument.

So backing up a bit, if I split my APPLICATION_DIR into "parts", I want to know the index of my src directory. Because src is the last part, that will always be the equivalent of len(APPLICATION_DIR.parts) - 1.

For those of you counting at home, that means the idx_app_root value is 4 - 1... or 3... aka the index of src.
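The same arithmetic as a quick sketch. I use PureWindowsPath here only so the Windows-style example behaves the same on any operating system; the path is the made-up one from above.

```python
from pathlib import PureWindowsPath

app_dir = PureWindowsPath(r"c:\project\app\src")

parts = app_dir.parts            # drive and root come back as one part
idx_app_root = len(parts) - 1    # index of the last part, i.e. "src"
root_name = parts[idx_app_root]
```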

Next, there's that scary looking list comprehension.

modules = [f for f in list(file_dir.rglob("*.py"))
        if not f.stem == "__init__"]

I'll unpack it and write it long form to see if it helps (maybe I should do this in my app too? Is this more pythonic?):

all_modules = list(file_dir.rglob("*.py"))
modules = []

for file in all_modules:
    if not file.stem == "__init__":
        modules.append(file)

The pathlib module allows us to rglob the given directory, which means it walks the directory and all of its subdirectories and yields every file that matches the given pattern. Next, we iterate over all_modules to ensure we are not pulling in any .py files named __init__ (if you've read this far, you are probably aware as to why we would want to exclude those files).
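If you want to watch rglob do its thing without touching a real project, here is a small sketch that builds a throwaway directory tree first. The file and directory names are made up for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    models_dir = Path(tmp) / "models"
    (models_dir / "mixins").mkdir(parents=True)
    for name in ("__init__.py", "article.py", "user.py",
                 "mixins/timestamps.py", "notes.txt"):
        (models_dir / name).touch()

    # rglob searches subdirectories too; the condition drops __init__.py.
    found = sorted(
        f.relative_to(models_dir).as_posix()
        for f in models_dir.rglob("*.py")
        if f.stem != "__init__"
    )
```

Since rglob already returns something iterable, the list() call is not strictly needed when all you do is loop over the result once.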

Now that we have the paths to all the modules (less __init__.py) from the given directory, we can create a generator to return the dot separated paths to these modules.

for filepath in modules:
    yield (".".join(filepath.parts[idx_app_root:])[0:-3])

Remember that idx_app_root? Now I can use it on each of the paths to get the relevant, dot separated path.

So if you follow, you can see that c:\project\app\src\models\article.py becomes ("c:\\", "project", "app", "src", "models", "article.py") after splitting it into parts. Note that the file name stays together with its extension as the last part.

Since we know my index is 3, I can join all the values from that index forward using the slicing notation of [3:], or rather, [idx_app_root:].

By using the .join method on those path parts, with "." as the separator, I then get "src.models.article.py" as my string value. Lastly, I slice away the last three characters with [0:-3] to remove the ".py".

My resulting path looks like src.models.article.
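For comparison, here is a hedged alternative to slicing off the last three characters: with_suffix("") drops the extension before the parts are joined, so nothing depends on the extension being exactly three characters long. This is a sketch, not what my app uses, and PureWindowsPath is only there so the example path works on any operating system.

```python
from pathlib import PureWindowsPath

filepath = PureWindowsPath(r"c:\project\app\src\models\article.py")
idx_app_root = 3  # index of "src" in the parts tuple

# The approach used above: join the parts, then slice off ".py".
sliced = ".".join(filepath.parts[idx_app_root:])[0:-3]

# Alternative: remove the suffix first, then join.
stripped = ".".join(filepath.with_suffix("").parts[idx_app_root:])
```

Both variables end up holding the same dot separated module path.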

Wow, that seems like way too much explanation for something relatively simple. To be honest, I could've really used something like this when figuring this out, so here it is for posterity.

Almost There

Lastly, I need to check each one of those modules to see if there are any Beanie models in there to begin with. But a quick aside.

I want you to take a quick look at my article.py module, which holds my database model for my blog articles:

from beanie import Document
from pydantic import BaseModel

__all__ = (
    "ArticleBase",
    "Article",
    "ArticleCreate",
    "ArticleDB",
)

class ArticleBase(BaseModel):
    ...  # hidden for sake of brevity

class Article(ArticleBase):
    ...  # ditto

class ArticleCreate(ArticleBase):
    ...  # same

class ArticleDB(ArticleBase, Document):
    ...  # samesies

Notice two things.

First of all, I explicitly list the objects that I have defined in my module and assign those to the __all__ variable. I want to make sure that I control what is seen and not seen throughout my app's code.

Secondly, notice how most of my models are actually not Beanie models at all. The only one that matters to Beanie is the last one, ArticleDB. It is the only one that subclasses the Document class.

Talking about my database model design is beyond the scope here, but I wanted to point out that even though the __all__ variable is optional here, I do make use of it in my dynamic loader function.
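As a small illustration of what __all__ gives the loader, here is a sketch that builds a module in memory and reads its __all__. The module and class names (demo_models, Public, Hidden) are hypothetical.

```python
from types import ModuleType

# Build a throwaway module in memory instead of importing a real one.
demo = ModuleType("demo_models")
exec(
    "__all__ = ('Public',)\n"
    "class Public: ...\n"
    "class Hidden: ...\n",
    demo.__dict__,
)

# __all__ holds names (strings); getattr turns each name into the object.
exported = [getattr(demo, name) for name in demo.__all__]
exported_names = [obj.__name__ for obj in exported]
```

Hidden still exists on the module, but because it is not listed in __all__, a loader that only reads __all__ never looks at it.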

DYNAMITE!!

We made it! Almost...

The last piece of the puzzle is taking all the dot separated paths we found with the get_modules function and extracting only the objects that are Beanie Document models.

May I present:

from importlib import import_module

def dynamic_loader(module, compare) -> list:
    """Iterates over all .py files in `module` directory,
    finding all classes that match `compare` function.
    """
    items = []
    for mod in get_modules(module):
        # Use a separate name so the `module` argument is not shadowed.
        imported = import_module(mod)
        if hasattr(imported, "__all__"):
            objs = [getattr(imported, obj)
                    for obj in imported.__all__]
            items += [o for o in objs
                    if compare(o) and o not in items]
    return items

This might look a little complicated, but it really isn't. The real hero here is the import_module function from the standard importlib library. It allows us to import a module by its dot separated name from within our function (much like the import statements at the top of your .py files).

So with the get_modules function we defined, we get a generator object containing all modules within a given directory.

Then, for each module (mod), we import it into our function and inspect it to see if it has the __all__ variable I mentioned above.

If so, for each name listed in __all__, we use the built-in getattr function to fetch the object with that name from the module. (Hint: we need the object itself, not just its name, to tell if it is a Beanie Document!)

Once we have a list of those objects, I use a list comprehension to filter out everything that is not a Beanie model by using the comparator function that we started with (is_beanie_model).

The resulting items list contains only the relevant objects.
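To try the whole flow without Beanie or a real project, here is a self-contained sketch. It writes a tiny throwaway package to a temp directory, then discovers and imports it. Everything named demo_pkg, Base, and find_classes is hypothetical; Base stands in for Document, and the loader is a simplified variation on mine rather than a copy of it.

```python
import sys
import tempfile
from importlib import import_module
from inspect import isclass
from pathlib import Path
from typing import Callable

BASE_SRC = "__all__ = ('Base',)\nclass Base: ...\n"
ARTICLE_SRC = (
    "from demo_pkg.base import Base\n"
    "__all__ = ('ArticleSchema', 'ArticleDB')\n"
    "class ArticleSchema: ...\n"
    "class ArticleDB(Base): ...\n"
)


def find_classes(root: Path, package: str,
                 compare: Callable[[object], bool]) -> list:
    """Import every module under root/package and collect matching objects."""
    items = []
    for path in sorted((root / package).rglob("*.py")):
        if path.stem == "__init__":
            continue
        # e.g. <root>/demo_pkg/article.py -> "demo_pkg.article"
        dotted = ".".join(path.relative_to(root).with_suffix("").parts)
        imported = import_module(dotted)
        for name in getattr(imported, "__all__", ()):
            obj = getattr(imported, name)
            if compare(obj) and obj not in items:
                items.append(obj)
    return items


with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "demo_pkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "base.py").write_text(BASE_SRC)
    (pkg / "article.py").write_text(ARTICLE_SRC)

    sys.path.insert(0, tmp)  # make demo_pkg importable
    try:
        base = import_module("demo_pkg.base").Base
        found = find_classes(
            Path(tmp), "demo_pkg",
            lambda o: isclass(o) and issubclass(o, base) and o is not base,
        )
        found_names = [cls.__name__ for cls in found]
    finally:
        sys.path.remove(tmp)
```

Only ArticleDB survives the filter: ArticleSchema does not inherit from Base, and Base itself is excluded by the `o is not base` check in the comparator.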

Python By ZZZ

I know this was a long one.

We are really done here, but I want to point out an oversight I made when finding this solution. I was so focused on getting the "strings with dot separated paths"—I missed the fact that I was already technically done.

Observe...

def get_beanie_models() -> list[type]:
    """Dynamic Beanie model finder."""
    return dynamic_loader("models", is_beanie_model)


def db_models() -> list[str]:
    """Create a list of Beanie models to include in db initialization."""
    objs = get_beanie_models()

    obj_list = [f"{o.__module__}.{o.__name__}" for o in objs]

    return obj_list  # this works
    # return objs  # this also works

To create the dot separated paths, I took the Beanie Document objects and created a new list, using the __module__ attribute to get the module path and the __name__ attribute to get the Document object's name.

This returns a nice, clean dot separated string path. In my case, it would look something like src.models.article.ArticleDB.
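Here is a sketch of that round trip using a standard-library class as a stand-in for a model, so it runs anywhere. The dotted_path and resolve helpers are hypothetical names; resolve shows roughly what any consumer of such a string has to do, and I have not checked how Beanie does it internally.

```python
from fractions import Fraction
from importlib import import_module


def dotted_path(cls: type) -> str:
    """Build 'package.module.ClassName' from a class object."""
    return f"{cls.__module__}.{cls.__name__}"


def resolve(path: str) -> type:
    """Turn a dot separated path back into the class it names."""
    module_name, _, class_name = path.rpartition(".")
    return getattr(import_module(module_name), class_name)


path = dotted_path(Fraction)
```

For a class nested inside another class, __name__ alone would drop the outer class name. That is not a concern for models defined at module level, which is all I have here.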

However, I glossed over the primary part that allows me to pass the actual Beanie objects directly to the document_models parameter.

Again, from the Beanie docs, this time with my emphasis on the "list of classes" part:

init_beanie supports not only list of classes for the document_models parameter, but also strings with the dot separated paths

By using objs directly, I am sending the actual object to the init_beanie function. As a result, instead of a path, I would be passing something like <class 'src.models.article.ArticleDB'>.

Anyway, both solutions work. If you're trying this at home, go for the second one. (Ignore the db_models function.)

Live and learn I guess.

I actually didn't know about the second solution until I wrote this article. I was so fixated on getting the dot separated paths that I missed the more obvious (and simpler) solution. The lesson here is: Write a blog article, learn something.

So now, all I have to do to load my Document models to Beanie is:

# Initialize Beanie with dynamically loaded Document models

await init_beanie(
    database=client.db_name,
    document_models=db_models()  # or just get_beanie_models()
)

And there you have it.

That's how I dynamically load my database models to Beanie. When I create new database objects, as long as they subclass Beanie's Document model and are included in the module's __all__ variable, I don't have to remember to add them to init_beanie.

Now, I'm going to get some sleep. It's late.

Sigh, wanted to mention one last thing. Note that you can use the dynamic_loader function to find just about any kind of Python object in any given directory. All it would take is a comparator function and the name of any directory. So you could technically create something similar to is_beanie_model, maybe like is_route or is_view and end up with a similar end result.
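As a rough sketch of that idea, a small factory can build the comparator for any base class. The subclass_of helper and the View classes below are hypothetical and not part of my app.

```python
from inspect import isclass
from typing import Any, Callable


def subclass_of(base: type) -> Callable[[Any], bool]:
    """Return a comparator that matches strict subclasses of base."""
    def compare(item: Any) -> bool:
        return isclass(item) and issubclass(item, base) and item is not base
    return compare


class View: ...            # hypothetical base class for views
class HomeView(View): ...  # hypothetical concrete view


is_view = subclass_of(View)
matches = [is_view(HomeView), is_view(View), is_view(HomeView()), is_view(len)]
```

With a comparator like this, a call such as dynamic_loader("views", is_view) would collect the view classes from a views directory, assuming such a directory exists and its modules define __all__.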