AI-Interview Database Primary Key Migration in Practice: UUID → BIGSERIAL

1. Project Background and Problems

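AI-Interview is an AI mock-interviewer system built on LangGraph and LangChain. Early in the project, every table used a UUID v4 primary key. That choice was convenient during single-machine development: data could be imported in parallel, and ID collisions across nodes were not a concern. As data volume grew and the system moved to production, the drawbacks of UUID primary keys became apparent.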

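Performance bottlenecks of UUID primary keys

UUID v4 primary keys caused three core problems:

Storage overhead. A UUID takes 16 bytes, whereas a BIGSERIAL (BIGINT) key takes 8. With millions of rows, the key data in the primary key index, and in every foreign key column that references it, is twice as large, which hurts buffer-cache efficiency and disk I/O.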
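
As a quick check on the raw key sizes, the sketch below compares the byte length of a UUID with that of a signed 64-bit integer using only Python's standard library. The helper `key_bytes_saved` and the row count are illustrative assumptions. Real index sizes also include per-tuple and page overhead, so the on-disk ratio is smaller than 2x.

```python
import struct
import uuid

UUID_KEY_BYTES = len(uuid.uuid4().bytes)      # raw size of a UUID value
BIGINT_KEY_BYTES = len(struct.pack(">q", 1))  # raw size of a signed 64-bit integer


def key_bytes_saved(rows: int, key_columns: int = 1) -> int:
    """Raw key bytes saved by storing BIGINT instead of UUID (overhead ignored)."""
    return rows * key_columns * (UUID_KEY_BYTES - BIGINT_KEY_BYTES)


print(UUID_KEY_BYTES, BIGINT_KEY_BYTES)  # 16 8
# Hypothetical table: 5 million rows, one primary key plus one foreign key column.
print(key_bytes_saved(5_000_000, key_columns=2))  # 80000000
```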

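Index degradation. Because UUID v4 values are random, each new key lands at an arbitrary position in the B-tree index. This causes frequent page splits and sparsely filled pages, and it defeats sequential-write optimizations. BIGSERIAL values increase monotonically, so new rows are always appended at the right-hand end of the index, where writes are very efficient.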
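
The locality difference can be illustrated with a toy model. In the sketch below, a sorted Python list stands in for the key order of a B-tree, and the function counts how many inserts land at the tail. It does not model pages or splits, and the random 128-bit integers are only a stand-in for UUID v4 values.

```python
import bisect
import random


def tail_insert_fraction(keys):
    """Fraction of inserts that land at the end of an already-sorted key list."""
    ordered = []
    tail_hits = 0
    for key in keys:
        pos = bisect.bisect_right(ordered, key)
        if pos == len(ordered):
            tail_hits += 1
        ordered.insert(pos, key)
    return tail_hits / len(ordered)


n = 2000
rng = random.Random(42)  # seeded so the run is repeatable
sequential_keys = list(range(1, n + 1))                # BIGSERIAL-like keys
random_keys = [rng.getrandbits(128) for _ in range(n)]  # UUID-like keys

print(tail_insert_fraction(sequential_keys))  # 1.0: every insert is an append
print(tail_insert_fraction(random_keys))      # a small fraction: most inserts land mid-index
```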

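Poor fit with enum-typed fields. Business fields such as interview status and feedback type need enum types, but in our schema a UUID could not serve as the key associated with enum values, which left the type system out of step with business requirements.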

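2. Solution and Technology Choices

To address these drawbacks, we designed a hybrid scheme: BIGSERIAL primary keys plus a UUID external identifier.

Design principles

Single-UUID strategy: only the users table keeps a UUID, as the external identifier exposed through the API. All other tables are managed through BIGSERIAL auto-increment primary keys, and all foreign keys use BIGINT.

Gateway-layer translation: UUID ↔ BIGSERIAL translation is handled in the API gateway layer, so application code does not need to know that UUIDs exist. This decouples the data layer from the presentation layer.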
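
A minimal sketch of what the gateway-layer translation might look like follows. The class `UserIdResolver` and its in-memory dictionaries are hypothetical. In the real system the lookup would presumably be a query against the unique `users.uuid` column rather than a dict.

```python
import uuid
from typing import Dict, Optional


class UserIdResolver:
    """Hypothetical gateway helper: maps external UUIDs to internal BIGINT ids."""

    def __init__(self) -> None:
        self._uuid_to_id: Dict[uuid.UUID, int] = {}
        self._id_to_uuid: Dict[int, uuid.UUID] = {}

    def register(self, internal_id: int, external_uuid: uuid.UUID) -> None:
        self._uuid_to_id[external_uuid] = internal_id
        self._id_to_uuid[internal_id] = external_uuid

    def to_internal(self, raw: str) -> Optional[int]:
        """Parse an incoming UUID string; return None if it is malformed or unknown."""
        try:
            parsed = uuid.UUID(raw)
        except ValueError:
            return None
        return self._uuid_to_id.get(parsed)

    def to_external(self, internal_id: int) -> Optional[str]:
        """Return the UUID string to expose through the API, if the id is known."""
        found = self._id_to_uuid.get(internal_id)
        return str(found) if found is not None else None


resolver = UserIdResolver()
external = uuid.uuid4()
resolver.register(1, external)
print(resolver.to_internal(str(external)))  # 1
print(resolver.to_internal("not-a-uuid"))   # None
```

With this shape, everything behind the gateway can work with plain integers, and only the request and response boundaries touch UUID strings.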

Architecture comparison

| Table | Before migration | After migration |
| --- | --- | --- |
| users | id: UUID (PK) | id: BIGSERIAL (PK), uuid: UUID (for the API) |
| resumes | id: UUID (PK), user_id: UUID (FK) | id: BIGSERIAL (PK), user_id: BIGINT (FK) |
| projects | id: UUID (PK), resume_id: UUID (FK) | id: BIGSERIAL (PK), resume_id: BIGINT (FK) |
| knowledge_base | id: UUID (PK), project_id: UUID (FK) | id: BIGSERIAL (PK), project_id: BIGINT (FK) |
| interview_sessions | id: UUID (PK), user_id/resume_id: UUID (FK) | id: BIGSERIAL (PK), user_id/resume_id: BIGINT (FK) |
| qa_history | id: UUID (PK), session_id: UUID (FK) | id: BIGSERIAL (PK), session_id: BIGINT (FK) |
| interview_feedback | id: UUID (PK), session_id: UUID (FK) | id: BIGSERIAL (PK), session_id: BIGINT (FK) |

3. Database Schema Design

ER diagram

┌───────────────────┐       ┌───────────────────┐
│       users       │       │      resumes      │
│ ───────────────── │◄──────│ ───────────────── │
│ id: BIGSERIAL(PK) │       │ id: BIGSERIAL(PK) │
│ uuid: UUID (API)  │       │ user_id: BIGINT   │
└───────────────────┘       └─────────┬─────────┘
                                      │ resume_id
                                      ▼
┌─────────────────────────────────┐
│            projects             │
│ ─────────────────────────────── │
│ id: BIGSERIAL(PK)               │
│ resume_id: BIGINT(FK)           │
└────────┬────────────────────────┘
         │ project_id
         ▼
┌─────────────────────────────────┐
│         knowledge_base          │
│ ─────────────────────────────── │
│ id: BIGSERIAL(PK)               │
│ project_id: BIGINT(FK)          │
└─────────────────────────────────┘

┌─────────────────────────────────┐
│       interview_sessions        │
│ ─────────────────────────────── │
│ id: BIGSERIAL(PK)               │
│ user_id: BIGINT(FK)             │
│ resume_id: BIGINT(FK)           │
└────────┬────────────────────────┘
         │ session_id
         ▼
┌──────────────────┐     ┌────────────────────┐
│    qa_history    │     │ interview_feedback │
│ ──────────────── │     │ ────────────────── │
│ id: BIGSERIAL    │     │ id: BIGSERIAL      │
│ session_id       │     │ session_id         │
└──────────────────┘     └────────────────────┘

4. Key Code Implementation

SQLAlchemy model changes

# Aliased so that the `uuid` column attribute below cannot shadow the module.
import uuid as uuid_lib

from sqlalchemy import UUID, BigInteger, ForeignKey, Sequence, String, text
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class User(Base):
    __tablename__ = "users"

    # Internal key: BIGINT backed by the users_id_seq sequence.
    id: Mapped[int] = mapped_column(
        BigInteger,
        Sequence("users_id_seq"),
        primary_key=True,
        server_default=text("nextval('users_id_seq'::regclass)"),
    )
    # External identifier exposed through the API.
    uuid: Mapped[uuid_lib.UUID] = mapped_column(
        UUID(as_uuid=True),
        nullable=False,
        unique=True,
        default=uuid_lib.uuid4,
    )
    name: Mapped[str] = mapped_column(String(100), nullable=False)
    email: Mapped[str] = mapped_column(String(255), unique=True)


class Resume(Base):
    __tablename__ = "resumes"

    id: Mapped[int] = mapped_column(
        BigInteger,
        Sequence("resumes_id_seq"),
        primary_key=True,
    )
    user_id: Mapped[int] = mapped_column(
        BigInteger,
        ForeignKey("users.id"),
        nullable=False,
    )
    # The uuid field has been removed: a UUID is no longer needed as the primary key.

DAO layer query methods

import uuid
from typing import List, Optional

from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession


class ResumeDAO:
    def __init__(self, session: AsyncSession):
        self.session = session

    # UUID lookup: used when the API layer receives an external UUID parameter.
    # Note: this needs a `uuid` column on Resume, which the model above does not define.
    async def find_by_uuid(self, resume_uuid: uuid.UUID) -> Optional[Resume]:
        result = await self.session.execute(
            select(Resume).where(Resume.uuid == resume_uuid)
        )
        return result.scalar_one_or_none()

    # BIGINT lookup: used inside the application layer with the auto-increment ID.
    async def find_by_id(self, resume_id: int) -> Optional[Resume]:
        result = await self.session.execute(
            select(Resume).where(Resume.id == resume_id)
        )
        return result.scalar_one_or_none()

    # Newest-first listing for one user, capped at `limit` rows; results carry BIGINT IDs.
    async def find_by_user_id(self, user_id: int, limit: int = 50) -> List[Resume]:
        result = await self.session.execute(
            select(Resume)
            .where(Resume.user_id == user_id)
            .order_by(Resume.id.desc())
            .limit(limit)
        )
        return list(result.scalars().all())

Agent layer usage example

import uuid
from typing import Optional


class ResumeAgent:
    async def get_resume(self, resume_id: str) -> Optional[Resume]:
        # The API layer passes a string; try it as a UUID first.
        if not resume_id:
            return None
        try:
            resume_uuid = uuid.UUID(resume_id)
        except ValueError:
            resume_uuid = None
        if resume_uuid is not None:
            return await self.dao.find_by_uuid(resume_uuid)

        # Not a UUID: fall back to the internal BIGINT ID.
        try:
            return await self.dao.find_by_id(int(resume_id))
        except ValueError:
            return None


# orchestrator.py
class InterviewOrchestrator:
    async def end_interview_session(self, session_uuid: str) -> bool:
        # Look up the session by its UUID.
        session = await self.session_dao.find_by_uuid(uuid.UUID(session_uuid))
        if not session:
            return False

        # The application layer updates by BIGINT ID.
        await self.session_dao.end_session(session.id)

        # The feedback record is also linked by BIGINT ID.
        await self.feedback_dao.create_feedback(
            session_id=session.id,
            feedback_data=self.generate_feedback()
        )
        return True

5. Migration Script Walkthrough

The migration runs in stages so that each stage can be verified and rolled back.
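
One way to picture the staged strategy is a small runner that wraps each stage in its own transaction and commits only if a verification query passes. The sketch below uses an in-memory SQLite database as a stand-in for PostgreSQL (both support transactional DDL). The `run_stages` helper, the stage names, and the simplified table are assumptions for illustration, not the project's actual tooling.

```python
import sqlite3
from typing import Callable, List, Tuple

# A stage is (name, SQL statement, verification callback).
Stage = Tuple[str, str, Callable[[sqlite3.Connection], bool]]


def run_stages(conn: sqlite3.Connection, stages: List[Stage]) -> List[str]:
    """Apply stages in order; roll back and stop at the first one that fails."""
    applied = []
    for name, sql, verify in stages:
        try:
            conn.execute("BEGIN")
            conn.execute(sql)
            if not verify(conn):
                raise RuntimeError(f"verification failed: {name}")
            conn.execute("COMMIT")
            applied.append(name)
        except Exception:
            conn.execute("ROLLBACK")
            break
    return applied


def no_null_ids(conn: sqlite3.Connection) -> bool:
    query = "SELECT COUNT(*) FROM users WHERE id_new IS NULL"
    return conn.execute(query).fetchone()[0] == 0


def has_id_new(conn: sqlite3.Connection) -> bool:
    return any(row[1] == "id_new" for row in conn.execute("PRAGMA table_info(users)"))


# isolation_level=None lets the runner issue BEGIN/COMMIT/ROLLBACK itself.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (uuid TEXT NOT NULL)")
conn.execute("INSERT INTO users VALUES ('a'), ('b')")

stages = [
    ("add id_new", "ALTER TABLE users ADD COLUMN id_new INTEGER", has_id_new),
    ("backfill id_new", "UPDATE users SET id_new = rowid", no_null_ids),
    # Deliberately broken stage, to show the rollback path.
    ("bad stage", "UPDATE users SET id_new = NULL", no_null_ids),
]
applied = run_stages(conn, stages)
print(applied)  # ['add id_new', 'backfill id_new']
```

The failed stage leaves no trace: its transaction is rolled back, so the table is still in the state the last verified stage produced.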

Stage 1: create the sequences

-- Create one sequence per table
CREATE SEQUENCE IF NOT EXISTS users_id_seq;
CREATE SEQUENCE IF NOT EXISTS resumes_id_seq;
CREATE SEQUENCE IF NOT EXISTS projects_id_seq;
CREATE SEQUENCE IF NOT EXISTS knowledge_base_id_seq;
CREATE SEQUENCE IF NOT EXISTS interview_sessions_id_seq;
CREATE SEQUENCE IF NOT EXISTS qa_history_id_seq;
CREATE SEQUENCE IF NOT EXISTS interview_feedback_id_seq;

Stage 2: migrate the users table

-- Add the new BIGINT key. The nextval() default also fills in every existing row.
ALTER TABLE users ADD COLUMN id_new BIGINT NOT NULL DEFAULT nextval('users_id_seq');

-- Drop the old UUID primary key (assumed to have the default name users_pkey).
-- CASCADE also drops the UUID foreign keys that reference it; stages 3-8 recreate them as BIGINT.
ALTER TABLE users DROP CONSTRAINT users_pkey CASCADE;

-- Keep the existing UUID values as the external identifier and promote the new column to id
ALTER TABLE users RENAME COLUMN id TO uuid;
ALTER TABLE users RENAME COLUMN id_new TO id;
ALTER TABLE users ADD PRIMARY KEY (id);
ALTER TABLE users ADD CONSTRAINT users_uuid_key UNIQUE (uuid);

-- Set sequence ownership
ALTER SEQUENCE users_id_seq OWNED BY users.id;

-- Rebuild indexes
REINDEX TABLE users;

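Stages 3-8: migrate the remaining tables

The remaining tables follow the same pattern as users: add the new columns → backfill → swap out the old columns → rename → attach the sequence. The key difference is that tables other than users do not keep a UUID column in the end, and their foreign key columns become BIGINT. A table's old UUID key can only be dropped once every child table has remapped its foreign key, because the child backfill joins on it.

-- resumes table example
ALTER TABLE resumes ADD COLUMN id_new BIGINT NOT NULL DEFAULT nextval('resumes_id_seq');
ALTER TABLE resumes ADD COLUMN user_id_new BIGINT;

-- Map each old UUID foreign key to the owner's new BIGINT id
UPDATE resumes r SET user_id_new = u.id
FROM users u
WHERE u.uuid = r.user_id;

-- Drop the old foreign key column and enable the new one
ALTER TABLE resumes DROP COLUMN user_id;
ALTER TABLE resumes RENAME COLUMN user_id_new TO user_id;
ALTER TABLE resumes ALTER COLUMN user_id SET NOT NULL;
ALTER TABLE resumes ADD CONSTRAINT fk_resumes_user
    FOREIGN KEY (user_id) REFERENCES users(id);

-- Swap the primary key (default constraint name resumes_pkey assumed).
-- legacy_uuid is a temporary name for the old key, kept until projects and
-- interview_sessions have remapped resume_id.
ALTER TABLE resumes DROP CONSTRAINT resumes_pkey CASCADE;
ALTER TABLE resumes RENAME COLUMN id TO legacy_uuid;
ALTER TABLE resumes RENAME COLUMN id_new TO id;
ALTER TABLE resumes ADD PRIMARY KEY (id);
-- Later, once the child tables are migrated: ALTER TABLE resumes DROP COLUMN legacy_uuid;

ALTER SEQUENCE resumes_id_seq OWNED BY resumes.id;
REINDEX TABLE resumes;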
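
The foreign key remapping step can be tried in isolation. The sketch below reproduces the backfill on an in-memory SQLite database with a simplified schema (UUIDs stored as text, hypothetical sample rows) and checks that no row is left unmapped. SQLite stands in for PostgreSQL here only to keep the example self-contained.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, uuid TEXT NOT NULL UNIQUE);
    CREATE TABLE resumes (title TEXT, user_id TEXT NOT NULL, user_id_new INTEGER);
    """
)
alice, bob = str(uuid.uuid4()), str(uuid.uuid4())
conn.executemany("INSERT INTO users (id, uuid) VALUES (?, ?)", [(1, alice), (2, bob)])
conn.executemany(
    "INSERT INTO resumes (title, user_id) VALUES (?, ?)",
    [("backend", alice), ("data", bob), ("frontend", alice)],
)

# Backfill: translate each UUID foreign key into the owner's integer id.
conn.execute(
    "UPDATE resumes SET user_id_new = "
    "(SELECT u.id FROM users u WHERE u.uuid = resumes.user_id)"
)

# Any NULL here would mean a resume pointing at a user that does not exist.
unmapped = conn.execute(
    "SELECT COUNT(*) FROM resumes WHERE user_id_new IS NULL"
).fetchone()[0]
mapping = conn.execute(
    "SELECT title, user_id_new FROM resumes ORDER BY title"
).fetchall()
print(unmapped)  # 0
print(mapping)   # [('backend', 1), ('data', 2), ('frontend', 1)]
```

Checking for unmapped rows before dropping the old column matters, because after the drop the original UUID reference can no longer be recovered from the table itself.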

Stage 9: final verification

-- Verify UUID uniqueness
SELECT
    COUNT(*) AS total,
    COUNT(DISTINCT uuid) AS unique_uuids,
    COUNT(*) - COUNT(DISTINCT uuid) AS duplicates
FROM users;

-- Verify foreign key integrity
SELECT COUNT(*) AS orphaned_resumes
FROM resumes r
LEFT JOIN users u ON r.user_id = u.id
WHERE u.id IS NULL;

-- Verify sequence state: usage_ratio is the fraction of the sequence range already consumed
SELECT
    sequencename,
    last_value,
    last_value::numeric / max_value AS usage_ratio
FROM pg_sequences
WHERE schemaname = 'public';

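6. Verification and Testing

Post-migration checklist

| Check | SQL | Expected result |
| --- | --- | --- |
| Primary key auto-increment | `SELECT id FROM users ORDER BY id DESC LIMIT 5` | Returns incrementing IDs |
| UUID uniqueness | `SELECT COUNT(DISTINCT uuid) FROM users` | Equals the total row count |
| Foreign key integrity | `SELECT COUNT(*) FROM resumes WHERE user_id NOT IN (SELECT id FROM users)` | Returns 0 |
| Sequence state | `SELECT last_value FROM users_id_seq` | Greater than 0 |
| Index size | `SELECT pg_size_pretty(pg_relation_size('users_pkey'))` | Index is noticeably smaller |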
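
A checklist like this is easy to automate. The sketch below runs a few of the checks as (name, query, predicate) triples against an in-memory SQLite database. The `run_checks` helper, the simplified schema, and the deliberately orphaned row are assumptions for illustration; the PostgreSQL-specific sequence and index-size checks are left out because SQLite has no equivalent.

```python
import sqlite3
from typing import Callable, Dict, List, Tuple

# A check is (name, single-value SQL query, predicate on that value).
Check = Tuple[str, str, Callable[[int], bool]]

CHECKS: List[Check] = [
    (
        "uuid uniqueness",
        "SELECT COUNT(*) - COUNT(DISTINCT uuid) FROM users",
        lambda duplicates: duplicates == 0,
    ),
    (
        "foreign key integrity",
        "SELECT COUNT(*) FROM resumes r "
        "LEFT JOIN users u ON r.user_id = u.id WHERE u.id IS NULL",
        lambda orphans: orphans == 0,
    ),
    (
        "ids assigned",
        "SELECT COALESCE(MAX(id), 0) FROM users",
        lambda max_id: max_id > 0,
    ),
]


def run_checks(conn: sqlite3.Connection, checks: List[Check]) -> Dict[str, bool]:
    """Run every check and report pass/fail by name."""
    return {name: ok(conn.execute(sql).fetchone()[0]) for name, sql, ok in checks}


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, uuid TEXT NOT NULL);
    CREATE TABLE resumes (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL);
    INSERT INTO users (id, uuid) VALUES (1, 'u-1'), (2, 'u-2');
    INSERT INTO resumes (id, user_id) VALUES (1, 1), (2, 99);  -- 99 is an orphan
    """
)
results = run_checks(conn, CHECKS)
print(results)
```

Here the orphaned resume makes the foreign key check fail while the other two pass, which is the kind of signal that should block the next migration stage.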

Application-layer tests

import pytest


class TestMigration:
    @pytest.mark.asyncio
    async def test_resume_crud_with_bigserial(self, dao: ResumeDAO):
        # Create
        resume = await dao.create(user_id=1, resume_data={"title": "Engineer"})
        assert isinstance(resume.id, int)
        assert resume.id > 0

        # Look up by ID
        found = await dao.find_by_id(resume.id)
        assert found is not None
        assert found.id == resume.id

        # Look up by UUID (requires adding a uuid field to resume first)
        # found_by_uuid = await dao.find_by_uuid(resume.uuid)
        # assert found_by_uuid.id == resume.id

    @pytest.mark.asyncio
    async def test_foreign_key_integrity(
        self, user_dao, resume_dao, session_dao: SessionDAO
    ):
        # user_dao and resume_dao are assumed to be fixtures, like session_dao.
        # Create the related records
        user = await user_dao.create(name="Test User")
        resume = await resume_dao.create(user_id=user.id)

        # Create the session
        session = await session_dao.create(
            user_id=user.id,
            resume_id=resume.id
        )

        # Verify the foreign keys point at the right rows
        assert session.user_id == user.id
        assert session.resume_id == resume.id

Performance benchmark

-- Run before and after the migration and compare the plans
EXPLAIN ANALYZE
SELECT * FROM interview_sessions
WHERE user_id = 12345
ORDER BY created_at DESC
LIMIT 20;

-- Expected: an index scan, with a noticeably lower execution time

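7. Summary

Switching from UUID primary keys to BIGSERIAL primary keys brought these core benefits:

Storage: the primary key index shrinks by about 50%. At tens of millions of rows, that takes an index from several GB down to hundreds of MB.

Write performance: sequential inserts take full advantage of appending at the right-hand end of the B-tree, and TPS under highly concurrent writes improves by roughly 2-3x.

Type system: foreign keys uniformly use BIGINT and combine cleanly with enum types, which suits the state-machine design of the business logic.

API compatibility: only the users table keeps a UUID as the external identifier, and the gateway layer handles the translation, so existing API endpoints need no changes.

The most important practice in this migration was staged execution with thorough verification. Each stage can be verified on its own and rolled back quickly if something goes wrong, which avoids the uncontrolled risk of one large, all-at-once change.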

8. Moving the Knowledge Base RAG to a Standalone API

Data architecture and interview flow

Storage layer

| Store | Purpose | Key technology |
| --- | --- | --- |
| PostgreSQL | Persistent structured data | SQLAlchemy + pgvector |
| Redis | Session state (short- to mid-term memory) | Async Redis, 24h TTL |
| LangGraph State | Runtime state (short-term memory) | InterviewState dataclass |
| VectorStore | Vector similarity search | In-memory, with pgvector support |

Core entity relationships

  User (1) ───< Resume (1) ────< Project (1) ───< KnowledgeBase (N)
                          │
                          └───< InterviewSession (N) ───< QAHistory (N)
                                                        │
                                                        └──── InterviewFeedback (1)

PostgreSQL tables:

  - users - user accounts
  - resumes - parsed resume results (JSONB)
  - projects - project experience
  - knowledge_base - RAG knowledge base entries (with skill_point, responsibility_id)
  - interview_sessions - interview session records
  - qa_history - Q&A history (with deviation_score)
  - interview_feedback - final feedback

Interview data flow

  ┌─────────────────────────────────────────────────────────────────────┐
  │  1. start_interview()                                               │
  │  ├── Resume/RJD parsing → resume_context                            │
  │  ├── Vector retrieval → match skill_points/responsibilities         │
  │  └── Initialize InterviewState (LangGraph)                          │
  └─────────────────────────────────────────────────────────────────────┘
                                    ↓
  ┌─────────────────────────────────────────────────────────────────────┐
  │  2. LangGraph Orchestrator (cyclic flow)                            │
  │                                                                     │
  │   ┌──────────┐    ┌──────────────┐    ┌───────────────────┐         │
  │   │ question │───>│ evaluate     │───>│ review            │         │
  │   │ agent    │    │ agent        │    │ agent             │         │
  │   └──────────┘    └──────────────┘    └───────────────────┘         │
  │         ↑                                                │          │
  │         └──────────────── decision ──────────────────────┘          │
  │                                                                     │
  │  Phase flow: init → warmup → initial → followup → final_feedback    │
  └─────────────────────────────────────────────────────────────────────┘
                                    ↓
  ┌─────────────────────────────────────────────────────────────────────┐
  │  3. submit_answer()                                                 │
  │  ├── question_id + content → Answer                                 │
  │  ├── evaluate_agent → evaluation_results[question_id]               │
  │  │     └── {deviation_score, is_correct, key_points}                │
  │  ├── review_agent → confirm the evaluation is reasonable            │
  │  ├── feedback_agent → Feedback (RECORDED mode)                      │
  │  └── Update Redis Session Memory                                    │
  └─────────────────────────────────────────────────────────────────────┘
                                    ↓
  ┌─────────────────────────────────────────────────────────────────────┐
  │  4. end_interview()                                                 │
  │  ├── Generate final feedback (_generate_final_feedback)             │
  │  │     ├── aggregate_series_score()                                 │
  │  │     ├── extract_strengths/weaknesses()                           │
  │  │     └── generate_suggestions()                                   │
  │  ├── Write to PostgreSQL (InterviewSession + QAHistory + Feedback)  │
  │  └── Clear Redis Session Memory                                     │
  └─────────────────────────────────────────────────────────────────────┘
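
The phase sequence in the diagram can be read as a small state machine. The sketch below encodes it with an `Enum`. The rule that `followup` repeats while more follow-up questions are pending is an assumption about how the `decision` loop behaves, and `next_phase` is a hypothetical helper rather than the project's actual orchestrator code.

```python
from enum import Enum


class Phase(str, Enum):
    INIT = "init"
    WARMUP = "warmup"
    INITIAL = "initial"
    FOLLOWUP = "followup"
    FINAL_FEEDBACK = "final_feedback"


PHASE_ORDER = list(Phase)  # Enum members iterate in definition order


def next_phase(current: Phase, more_followups: bool = False) -> Phase:
    """Advance one step; stay in FOLLOWUP while follow-ups remain (assumed rule)."""
    if current is Phase.FINAL_FEEDBACK:
        return current  # terminal phase
    if current is Phase.FOLLOWUP and more_followups:
        return current  # loop back for another follow-up question
    return PHASE_ORDER[PHASE_ORDER.index(current) + 1]


phase = Phase.INIT
visited = [phase.value]
while phase is not Phase.FINAL_FEEDBACK:
    phase = next_phase(phase)
    visited.append(phase.value)
print(visited)  # ['init', 'warmup', 'initial', 'followup', 'final_feedback']
```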

Data responsibilities of each agent

| Agent | Role | Key outputs |
| --- | --- | --- |
| resume_agent | Parses the resume, extracts responsibilities/modules | resume_context, identified_modules |
| knowledge_agent | RAG retrieval, matches skill_points | current_knowledge, enterprise_docs |
| question_agent | Generates interview questions | current_question, current_question_id |
| evaluate_agent | Evaluates answer quality | evaluation_results[question_id] |
| review_agent | Reviews the evaluation result | review_retry_count, last_review_feedback |
| feedback_agent | Generates real-time feedback | feedbacks[question_id] |

State management layers

  ┌──────────────────────────────────────────────────────────┐
  │  LangGraph InterviewState (runtime)                      │
  │  - answers, feedbacks, evaluation_results                │
  │  - series_history, followup_chain                        │
  │  - enterprise_docs, current_module/skill_point           │
  └──────────────────────────────────────────────────────────┘
          ↓  (persisted after every API call)
  ┌──────────────────────────────────────────────────────────┐
  │  Redis (short- to mid-term memory)                       │
  │  - interview:{session_id}:state                          │
  │  - interview:{session_id}:series:{n}:q1 (pre-generated)  │
  │  - user:{user_id}:current_interview                      │
  └──────────────────────────────────────────────────────────┘
          ↓  (on end_interview)
  ┌──────────────────────────────────────────────────────────┐
  │  PostgreSQL (persistent)                                 │
  │  - InterviewSession, QAHistory, InterviewFeedback        │
  │  - knowledge_base (RAG entries)                          │
  └──────────────────────────────────────────────────────────┘
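
The Redis key patterns above lend themselves to small builder functions, so the key layout lives in one place. The sketch below shows one way to write them. The function names are hypothetical, and the TTL constant simply restates the 24h value from the storage layer table.

```python
SESSION_TTL_SECONDS = 24 * 60 * 60  # 24h TTL for session state


def state_key(session_id: int) -> str:
    """Key holding the serialized InterviewState for one session."""
    return f"interview:{session_id}:state"


def series_first_question_key(session_id: int, series: int) -> str:
    """Key holding the pre-generated first question of series `series`."""
    return f"interview:{session_id}:series:{series}:q1"


def current_interview_key(user_id: int) -> str:
    """Key pointing at the user's currently active interview."""
    return f"user:{user_id}:current_interview"


print(state_key(42))                     # interview:42:state
print(series_first_question_key(42, 3))  # interview:42:series:3:q1
print(current_interview_key(7))          # user:7:current_interview
```

Because the session and user identifiers are now BIGINT values, the keys stay short and need no UUID formatting.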

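Vector knowledge base (RAG)

KnowledgeBase table fields:

- skill_point - skill point name (e.g. "microservice design")
- responsibility_id / responsibility_text - responsibility index
- content - knowledge content
- embedding_id - pgvector reference
- question_id / session_id - used for question deduplication

VectorStore uses:

- Resume content is vectorized → match related skill points
- Question content is vectorized → retrieve similar past questions
- Reference answers are vectorized → compute deviation_score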
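
The text does not say how deviation_score is derived from the vectors. One plausible reading, sketched below as an assumption, is one minus the cosine similarity between the candidate's answer embedding and the reference answer embedding, clamped to the range 0 to 1. The tiny two-dimensional vectors are placeholders for real embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def deviation_score(answer_vec: Sequence[float], reference_vec: Sequence[float]) -> float:
    """Assumed definition: 1 - cosine similarity, clamped to [0, 1]."""
    return min(1.0, max(0.0, 1.0 - cosine_similarity(answer_vec, reference_vec)))


print(deviation_score([1.0, 0.0], [1.0, 0.0]))  # 0.0: same direction as the reference
print(deviation_score([1.0, 0.0], [0.0, 1.0]))  # 1.0: orthogonal to the reference
```

Under this definition a lower score means the answer is semantically closer to the reference answer.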