SQLAlchemy中使用同一模型映射多个表

February 04, 2015 (最后修改: November 09, 2019)

为了查询效率，将单一表根据某种条件分成多张表，每个表的结构完全相同，只有表名不同。已经在SQLAlchemy中定义一个类Record表示表结构，在不同条件下需要修改该类对应的表名。

其实，应该使用更合适的“分表”技术，不需要如此麻烦。但现在我还没研究过“分表”，只好选择这种不优雅的方式。

下面介绍我试过的两种方法，可以实现修改表名。

动态修改表名

这种方式适合select查询，其它未测试过。

定义Record类时，定义默认表的名称

class Record(Model):
    __tablename__ = "record_nwp_cma20n03"

使用Record类时，提前修改表名

Record.__table__.name = "record_"+repo

之后的查询都会在新指定的表中查找。

另一个示例

def main():
    session = Session()
    Record.__table__.name = 'record_{repo_name}'.format(repo_name='nwp_cma20n03')
    query = session.query(Record)
    print query
    Record.__table__.name = 'record_{repo_name}'.format(repo_name='nwp_qu_cma20n03')
    query = session.query(Record)
    print query

输出

SELECT ...
FROM record_nwp_cma20n03
SELECT ...
FROM record_nwp_qu_cma20n03

可以看到，两次查询的表名确实变了。

局限

但该方法并不总能成功，当我用如下的语句插入数据时，表名仅在第一次设置时有效，之后设置均没有效果。

for message in consumer:
    table_name = 'record_{repo_name}'.format(repo_name=message['repo'])
    Record.__table__.name = table_name
    a_record = Record()
    # ...
    session.add(a_record)

具体原因我不清楚，可能每个类只能对应一个映射关系，使用session.query时可以动态更新tablename，但使用session.add时只能使用第一次绑定的映射关系。

如果需要映射多个表，最好为每个表指定一个映射类。

SQLAlchemy的文档中有这种情况的详细描述，参见《Multiple Mappers for One Class》

同一个模型映射多个表

上面的文档中提到一种映射多个相同表的方法，参见《Entity Name》。

使用类继承，定义一个表结构的基类。

class RecordBase(object):
    record_id = Column(Integer(), primary_key=True)
    repo_id = Column(Integer())
    version_id = Column(Integer())
    # ...

Record类继承RecordBase类，接口与原来相同。

class Record(RecordBase, Model):
    __tablename__ = "record_nwp_cma20n03"
    def __init__(self):
        pass
    def __repr__(self):
        return "&lt;record(id={record_id}, string='{record_string}'".format(
            record_id=self.record_id,
            record_string=self.record_string.strip()
        )
    def columns(self):
        return [c.name for c in self.__table__.columns]
    def to_dict(self):
        return dict([(c, getattr(self, c)) for c in self.columns()])

当需要修改为另一个表名时，只需要重新建立一个类即可

class SomeRecord(RecordBase, Model):
    __tablename__ = 'record_{repo_name}'.format(repo_name=content['repo'])

这样确实可以修改表名，但注意上文失败的例子中，我是在一个循环中插入数据，每个数据对应的表名都可能不同，所以for循环内部每次都需要根据表名定义类。当重复定义某表的对应类时，SQLAlchemy会抛出异常。

sqlalchemy.exc.InvalidRequestError: Table 'record_nwp_qu_cma20n03' is already defined for this MetaData instance. Specify 'extend_existing=True' to redefine options and columns on an existing Table object.

似乎可以使用extend_existing参数避免异常，但我没试过。

重复定义完全没有必要，换一种思路解决问题。Python是动态语言，何不利用一下动态语言的强大特性。

动态定义表类

循环内部，根据每条消息的属性，确定该消息需要写入到哪个表中，选择表名对应的类，生成对象，将对象插入到数据库中。

表名对应的类需要动态生成，将表名和类的对应关系保存在一个map中，之后再遇到该表时，不用重新定义类，直接使用之前已创建的类即可。

# record table class mapper
#   key: table name
#   value: table class
record_table_mapper = {}

Python中的type函数可以动态创建类，创建时需要设置__tablename__属性

table_name = 'record_{repo_name}'.format(repo_name=content['repo'])
if table_name not in record_table_mapper:
    record_table_mapper[table_name] = type(
        'Record_{repo_name}'.format(repo_name=content['repo']),
        (RecordBase, Model),
        {'__tablename__': 'record_{repo_name}'.format(repo_name=content['repo'])})
record_table = record_table_mapper[table_name]

之后就可以创建实例

a_record = record_table()
# ...
session.add(a_record)
session.commit()