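Why use Pydantic?¶
Today, Pydantic is downloaded many times a month and is used by some of the largest and most well-known organisations in the world.
It's hard to know why so many people have adopted Pydantic since its inception six years ago, but here are a few guesses.
Type hints powering schema validation¶
The schema that Pydantic validates against is generally defined by Python type hints.
Type hints are great for this since, if you're writing modern Python, you already know how to use them. Using type hints also means that Pydantic integrates well with static typing tools like mypy and pyright and with IDEs like PyCharm and VS Code.
Example - just type hints
(This example requires Python 3.9+)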
from typing import Annotated, Dict, List, Literal, Tuple

from annotated_types import Gt

from pydantic import BaseModel


class Fruit(BaseModel):
    name: str  # (1)!
    color: Literal['red', 'green']  # (2)!
    weight: Annotated[float, Gt(0)]  # (3)!
    bazam: Dict[str, List[Tuple[int, bool, float]]]  # (4)!


print(
    Fruit(
        name='Apple',
        color='red',
        weight=4.2,
        bazam={'foobar': [(1, True, 0.1)]},
    )
)
#> name='Apple' color='red' weight=4.2 bazam={'foobar': [(1, True, 0.1)]}
1. The name field is simply annotated with str - any string is allowed.
2. The Literal type is used to enforce that color is either 'red' or 'green'.
3. Even when we want to apply constraints not encapsulated in Python types, we can use Annotated and annotated-types to enforce constraints without breaking type hints.
4. I'm not claiming "bazam" is really an attribute of fruit, but rather to show that arbitrarily complex types can easily be validated.
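To illustrate what happens when input does not match the type hints, here is a minimal sketch. It uses a trimmed-down copy of the Fruit model (the bazam field is left out for brevity), and the invalid values are made up for illustration.

```python
from typing import Annotated, Literal

from annotated_types import Gt
from pydantic import BaseModel, ValidationError


class Fruit(BaseModel):
    name: str
    color: Literal['red', 'green']
    weight: Annotated[float, Gt(0)]


try:
    # 'blue' is not an allowed literal and -1 violates the Gt(0) constraint
    Fruit(name='Apple', color='blue', weight=-1)
except ValidationError as exc:
    errors = exc.errors()

# Each error records the location of the offending field and an error type
failed_fields = sorted(err['loc'][0] for err in errors)
print(failed_fields)
#> ['color', 'weight']
```

Both problems are reported in a single ValidationError, so a caller can surface every issue at once rather than one at a time.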
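Learn more
See the documentation on supported types.
Performance¶
Pydantic's core validation logic is implemented in a separate package, pydantic-core, where validation for most types is implemented in Rust.
As a result, Pydantic is among the fastest data validation libraries for Python.
Performance example - Pydantic vs. dedicated code
In general, dedicated code should be much faster than a general-purpose validator, but in this example Pydantic is roughly 3.5x as fast as dedicated code when parsing JSON and validating URLs.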
import json
import timeit
from urllib.parse import urlparse

import requests

from pydantic import HttpUrl, TypeAdapter

reps = 7
number = 100
r = requests.get('https://api.github.com/emojis')
r.raise_for_status()
emojis_json = r.content


def emojis_pure_python(raw_data):
    data = json.loads(raw_data)
    output = {}
    for key, value in data.items():
        assert isinstance(key, str)
        url = urlparse(value)
        assert url.scheme in ('https', 'http')
        output[key] = url
    return output


emojis_pure_python_times = timeit.repeat(
    'emojis_pure_python(emojis_json)',
    globals={
        'emojis_pure_python': emojis_pure_python,
        'emojis_json': emojis_json,
    },
    repeat=reps,
    number=number,
)
print(f'pure python: {min(emojis_pure_python_times) / number * 1000:0.2f}ms')
#> pure python: 5.32ms

type_adapter = TypeAdapter(dict[str, HttpUrl])
emojis_pydantic_times = timeit.repeat(
    'type_adapter.validate_json(emojis_json)',
    globals={
        'type_adapter': type_adapter,
        'HttpUrl': HttpUrl,
        'emojis_json': emojis_json,
    },
    repeat=reps,
    number=number,
)
print(f'pydantic: {min(emojis_pydantic_times) / number * 1000:0.2f}ms')
#> pydantic: 1.54ms

print(
    f'Pydantic {min(emojis_pure_python_times) / min(emojis_pydantic_times):0.2f}x faster'
)
#> Pydantic 3.45x faster
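Unlike other performance-centric libraries written in compiled languages, Pydantic also has excellent support for customising validation via functional validators.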
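As a rough sketch of what a functional validator can look like, the snippet below attaches a plain Python function to a type with AfterValidator. The must_be_even function, the EvenInt alias and the Batch model are hypothetical names chosen for illustration.

```python
from typing import Annotated

from pydantic import AfterValidator, BaseModel, ValidationError


def must_be_even(value: int) -> int:
    # Runs after Pydantic's own int validation, so value is already an int
    if value % 2 != 0:
        raise ValueError(f'{value} is not even')
    return value


EvenInt = Annotated[int, AfterValidator(must_be_even)]


class Batch(BaseModel):
    size: EvenInt


print(Batch(size='4').size)
#> 4

try:
    Batch(size=3)
except ValidationError as exc:
    error_type = exc.errors()[0]['type']
    print(error_type)
    #> value_error
```

The built-in validation still runs first, and the custom function only adds the extra rule on top of it.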
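Learn more
Samuel Colvin's talk at PyCon 2023 explains how pydantic-core works and how it integrates with Pydantic.
Serialisation¶
Pydantic provides functionality to serialise models in three ways:
- To a Python dict made up of the associated Python objects
- To a Python dict made up only of "jsonable" types
- To a JSON string
In all three modes, the output can be customised by excluding specific fields, excluding unset fields, excluding default values, and excluding None values.
Example - serialisation 3 ways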
from datetime import datetime

from pydantic import BaseModel


class Meeting(BaseModel):
    when: datetime
    where: bytes
    why: str = 'No idea'


m = Meeting(when='2020-01-01T12:00', where='home')
print(m.model_dump(exclude_unset=True))
#> {'when': datetime.datetime(2020, 1, 1, 12, 0), 'where': b'home'}
print(m.model_dump(exclude={'where'}, mode='json'))
#> {'when': '2020-01-01T12:00:00', 'why': 'No idea'}
print(m.model_dump_json(exclude_defaults=True))
#> {"when":"2020-01-01T12:00:00","where":"home"}
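The example above does not exercise the option for excluding None values, so here is a small sketch of it. The Profile model and its fields are hypothetical and exist only to show the effect.

```python
from typing import Optional

from pydantic import BaseModel


class Profile(BaseModel):
    name: str
    nickname: Optional[str] = None
    visits: int = 0


p = Profile(name='Ada')

# Drop fields whose value is None
print(p.model_dump(exclude_none=True))
#> {'name': 'Ada', 'visits': 0}

# With exclude_defaults, every field still at its default is dropped
print(p.model_dump_json(exclude_defaults=True))
#> {"name":"Ada"}
```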
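Learn more
See the documentation on serialisation.
JSON Schema¶
JSON Schema can be generated for any Pydantic schema, allowing self-documenting APIs and integration with a wide variety of tools that support JSON Schema.
Example - JSON Schema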
from datetime import datetime

from pydantic import BaseModel


class Address(BaseModel):
    street: str
    city: str
    zipcode: str


class Meeting(BaseModel):
    when: datetime
    where: Address
    why: str = 'No idea'


print(Meeting.model_json_schema())
"""
{
    '$defs': {
        'Address': {
            'properties': {
                'street': {'title': 'Street', 'type': 'string'},
                'city': {'title': 'City', 'type': 'string'},
                'zipcode': {'title': 'Zipcode', 'type': 'string'},
            },
            'required': ['street', 'city', 'zipcode'],
            'title': 'Address',
            'type': 'object',
        }
    },
    'properties': {
        'when': {'format': 'date-time', 'title': 'When', 'type': 'string'},
        'where': {'$ref': '#/$defs/Address'},
        'why': {'default': 'No idea', 'title': 'Why', 'type': 'string'},
    },
    'required': ['when', 'where'],
    'title': 'Meeting',
    'type': 'object',
}
"""
Pydantic generates JSON Schema version 2020-12, the latest version of the standard, which is compatible with OpenAPI 3.1.
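Constraints and descriptions declared on fields also surface in the generated schema. The following is a minimal sketch, assuming a hypothetical Parcel model; the field names are made up for illustration.

```python
from typing import Annotated

from pydantic import BaseModel, Field


class Parcel(BaseModel):
    label: str = Field(description='Text printed on the parcel')
    weight_kg: Annotated[float, Field(gt=0)]


schema = Parcel.model_json_schema()

# The description is copied into the property definition
print(schema['properties']['label']['description'])
#> Text printed on the parcel

# gt=0 is expressed with the JSON Schema keyword exclusiveMinimum
print('exclusiveMinimum' in schema['properties']['weight_kg'])
#> True

print(schema['required'])
#> ['label', 'weight_kg']
```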
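Learn more
See the documentation on JSON Schema.
Strict mode and data coercion¶
By default, Pydantic is tolerant of common incorrect types and coerces data to the right type; for example, a numeric string passed to an int field will be parsed as an int.
Pydantic also has a strict=True mode, also known as "strict mode", in which types are not coerced and a validation error is raised unless the input data exactly matches the schema or type hint.
But strict mode would be pretty useless when validating JSON data, since JSON has no types matching many common Python types such as datetime, UUID or bytes.
To solve this, Pydantic can parse and validate JSON in one step. This allows sensible data conversions, such as RFC 3339 (a profile of ISO 8601) strings to datetime objects. Since the JSON parsing is implemented in Rust, it is also very performant.
Example - strict mode that's actually useful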
from datetime import datetime

from pydantic import BaseModel, ValidationError


class Meeting(BaseModel):
    when: datetime
    where: bytes


m = Meeting.model_validate({'when': '2020-01-01T12:00', 'where': 'home'})
print(m)
#> when=datetime.datetime(2020, 1, 1, 12, 0) where=b'home'
try:
    m = Meeting.model_validate(
        {'when': '2020-01-01T12:00', 'where': 'home'}, strict=True
    )
except ValidationError as e:
    print(e)
    """
    2 validation errors for Meeting
    when
      Input should be a valid datetime [type=datetime_type, input_value='2020-01-01T12:00', input_type=str]
    where
      Input should be a valid bytes [type=bytes_type, input_value='home', input_type=str]
    """

m_json = Meeting.model_validate_json(
    '{"when": "2020-01-01T12:00", "where": "home"}'
)
print(m_json)
#> when=datetime.datetime(2020, 1, 1, 12, 0) where=b'home'
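To make the contrast concrete, the sketch below passes strict=True to model_validate_json. It assumes a hypothetical Booking model: the datetime string is still accepted because JSON has no datetime type, while a string standing in for a number is rejected.

```python
from datetime import datetime

from pydantic import BaseModel, ValidationError


class Booking(BaseModel):
    when: datetime
    seats: int


# Lax mode (the default): the numeric string is coerced to an int
lax = Booking.model_validate({'when': '2020-01-01T12:00', 'seats': '3'})
print(lax.seats)
#> 3

# Strict JSON validation: the datetime string is still accepted ...
strict_ok = Booking.model_validate_json(
    '{"when": "2020-01-01T12:00", "seats": 3}', strict=True
)
print(strict_ok.when)
#> 2020-01-01 12:00:00

# ... but a string where JSON could have carried a number is rejected
try:
    Booking.model_validate_json(
        '{"when": "2020-01-01T12:00", "seats": "3"}', strict=True
    )
except ValidationError as exc:
    strict_error_type = exc.errors()[0]['type']
    print(strict_error_type)
    #> int_type
```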
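Learn more
See the documentation on strict mode.
Dataclasses, TypedDicts, and more¶
Pydantic provides four ways to create schemas and perform validation and serialisation:
- BaseModel - Pydantic's own super class, with many common utilities available via instance methods.
- pydantic.dataclasses.dataclass - a wrapper around standard dataclasses which performs validation when a dataclass is initialised.
- TypeAdapter - a general way to adapt any type for validation and serialisation. This allows types like TypedDict and NamedTuple to be validated, as well as simple scalar values like int or timedelta; all supported types can be used with TypeAdapter.
- validate_call - a decorator that performs validation when a function is called.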
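Two of the approaches listed above, pydantic.dataclasses.dataclass and validate_call, can be sketched briefly as follows. The Point dataclass and the repeat function are hypothetical examples, not part of Pydantic.

```python
from pydantic import ValidationError, validate_call
from pydantic.dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int


# Validation (including lax coercion) runs when the dataclass is initialised
point = Point(x='1', y=2)
print(point.x, point.y)
#> 1 2


@validate_call
def repeat(text: str, times: int) -> str:
    return text * times


# Arguments are validated against the annotations before the call
print(repeat('ab', '3'))
#> ababab

try:
    repeat('ab', 'many')
except ValidationError as exc:
    call_error_type = exc.errors()[0]['type']
    print(call_error_type)
    #> int_parsing
```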
Example - schema based on TypedDict
from datetime import datetime

from typing_extensions import NotRequired, TypedDict

from pydantic import TypeAdapter


class Meeting(TypedDict):
    when: datetime
    where: bytes
    why: NotRequired[str]


meeting_adapter = TypeAdapter(Meeting)
m = meeting_adapter.validate_python(  # (1)!
    {'when': '2020-01-01T12:00', 'where': 'home'}
)
print(m)
#> {'when': datetime.datetime(2020, 1, 1, 12, 0), 'where': b'home'}
meeting_adapter.dump_python(m, exclude={'where'})  # (2)!

print(meeting_adapter.json_schema())  # (3)!
"""
{
    'properties': {
        'when': {'format': 'date-time', 'title': 'When', 'type': 'string'},
        'where': {'format': 'binary', 'title': 'Where', 'type': 'string'},
        'why': {'title': 'Why', 'type': 'string'},
    },
    'required': ['when', 'where'],
    'title': 'Meeting',
    'type': 'object',
}
"""
1. A TypeAdapter for a TypedDict performs validation; it can also validate JSON data directly with validate_json.
2. dump_python serialises a TypedDict to a Python object; it can also serialise to JSON with dump_json.
3. TypeAdapter can also generate JSON Schema.
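The validate_json and dump_json methods mentioned in the notes, and the use of TypeAdapter with simple scalar types, might look like this in practice. This is a minimal sketch, and the adapter variable names are arbitrary.

```python
from datetime import timedelta
from typing import List

from pydantic import TypeAdapter

# A TypeAdapter can wrap plain types, not only models or TypedDicts
int_list_adapter = TypeAdapter(List[int])
numbers = int_list_adapter.validate_json('[1, 2, 3]')
print(numbers)
#> [1, 2, 3]

# dump_json returns bytes
print(int_list_adapter.dump_json(numbers))
#> b'[1,2,3]'

delta_adapter = TypeAdapter(timedelta)
delta = delta_adapter.validate_python(90)  # numbers are read as seconds
print(delta)
#> 0:01:30
```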
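Customisation¶
Functional validators and serialisers, as well as a powerful protocol for custom types, mean that the way Pydantic operates can be customised on a per-field or per-type basis.
Customisation example - wrap validators
"Wrap validators" are new in Pydantic V2 and are one of the most powerful ways to customise Pydantic validation.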
from datetime import datetime, timezone

from pydantic import BaseModel, field_validator


class Meeting(BaseModel):
    when: datetime

    @field_validator('when', mode='wrap')
    @classmethod
    def when_now(cls, input_value, handler):
        if input_value == 'now':
            return datetime.now()
        when = handler(input_value)
        # in this specific application we know tz naive datetimes are in UTC
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        return when


print(Meeting(when='2020-01-01T12:00+01:00'))
#> when=datetime.datetime(2020, 1, 1, 12, 0, tzinfo=TzInfo(+01:00))
print(Meeting(when='now'))
#> when=datetime.datetime(2032, 1, 2, 3, 4, 5, 6)
print(Meeting(when='2020-01-01T12:00'))
#> when=datetime.datetime(2020, 1, 1, 12, 0, tzinfo=datetime.timezone.utc)
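Serialisation can be customised per field in a similar spirit. The sketch below uses the field_serializer decorator to emit a Unix timestamp instead of the default ISO 8601 string; the Event model is a hypothetical example.

```python
from datetime import datetime, timezone

from pydantic import BaseModel, field_serializer


class Event(BaseModel):
    when: datetime

    @field_serializer('when')
    def serialize_when(self, value: datetime) -> int:
        # Emit a Unix timestamp instead of the default ISO 8601 string
        return int(value.timestamp())


event = Event(when=datetime(2020, 1, 1, tzinfo=timezone.utc))
print(event.model_dump())
#> {'when': 1577836800}
print(event.model_dump_json())
#> {"when":1577836800}
```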
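Learn more
See the documentation on validators, custom serialisers, and custom types.
Ecosystem¶
At the time of writing, there are 214,100 repositories on GitHub and 8,119 packages on PyPI that depend on Pydantic.
Some notable libraries that depend on Pydantic: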
- huggingface/transformers 107,475 stars
- tiangolo/fastapi 60,355 stars
- hwchase17/langchain 54,514 stars
- apache/airflow 30,955 stars
- microsoft/DeepSpeed 26,908 stars
- ray-project/ray 26,600 stars
- lm-sys/FastChat 24,924 stars
- Lightning-AI/lightning 24,034 stars
- OpenBB-finance/OpenBBTerminal 22,785 stars
- gradio-app/gradio 19,726 stars
- pola-rs/polars 18,587 stars
- mindsdb/mindsdb 17,242 stars
- RasaHQ/rasa 16,695 stars
- mlflow/mlflow 14,780 stars
- heartexlabs/label-studio 13,634 stars
- spotDL/spotify-downloader 12,124 stars
- Sanster/lama-cleaner 12,075 stars
- airbytehq/airbyte 11,174 stars
- openai/evals 11,110 stars
- matrix-org/synapse 11,071 stars
- ydataai/ydata-profiling 10,884 stars
- pyodide/pyodide 10,245 stars
- tiangolo/sqlmodel 10,160 stars
- lucidrains/DALLE2-pytorch 9,916 stars
- pynecone-io/reflex 9,679 stars
- PaddlePaddle/PaddleNLP 9,663 stars
- aws/serverless-application-model 9,061 stars
- modin-project/modin 8,808 stars
- great-expectations/great_expectations 8,613 stars
- dagster-io/dagster 7,908 stars
- NVlabs/SPADE 7,407 stars
- brycedrennan/imaginAIry 7,217 stars
- chroma-core/chroma 7,127 stars
- lucidrains/imagen-pytorch 7,089 stars
- sqlfluff/sqlfluff 6,278 stars
- deeppavlov/DeepPavlov 6,278 stars
- autogluon/autogluon 5,966 stars
- bridgecrewio/checkov 5,747 stars
- bentoml/BentoML 5,275 stars
- replicate/cog 5,089 stars
- vitalik/django-ninja 4,623 stars
- apache/iceberg 4,479 stars
- jina-ai/discoart 3,820 stars
- embedchain/embedchain 3,493 stars
- skypilot-org/skypilot 3,052 stars
- PrefectHQ/marvin 2,985 stars
- microsoft/FLAML 2,569 stars
- docarray/docarray 2,353 stars
- aws-powertools/powertools-lambda-python 2,198 stars
- NVIDIA/NeMo-Guardrails 1,830 stars
- roman-right/beanie 1,299 stars
- art049/odmantic 807 stars
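More libraries using Pydantic can be found at Kludex/awesome-pydantic.
Organisations using Pydantic¶
Some notable companies and organisations using Pydantic, together with comments on why or how we know they are using it.
The organisations below are included because they match one or more of the following criteria:
- Using pydantic as a dependency in a public repository
- Referring traffic to the pydantic documentation site from an organisation-internal domain; specific referrers are not included since they are generally not in the public domain
- Direct communication between the Pydantic team and engineers employed by the organisation about usage of Pydantic within the organisation
We've included some extra detail where appropriate and already in the public domain.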
Adobe¶
adobe/dy-sql uses Pydantic.
Amazon and AWS¶
- powertools-lambda-python
- awslabs/gluonts
- AWS sponsored Samuel Colvin $5,000 to work on Pydantic in 2022
Anthropic¶
anthropics/anthropic-sdk-python uses Pydantic.
Apple¶
(Based on the criteria described above)
ASML¶
(Based on the criteria described above)
AstraZeneca¶
Multiple repos in the AstraZeneca GitHub org depend on Pydantic.
Cisco Systems¶
- Pydantic is listed in their report of Open Source Used In RADKit.
- cisco/webex-assistant-sdk
Comcast¶
(Based on the criteria described above)
Datadog¶
- Extensive use of Pydantic in DataDog/integrations-core and other repos
- Communication with engineers from Datadog about how they use Pydantic.
Facebook¶
Multiple repos in the facebookresearch GitHub org depend on Pydantic.
GitHub¶
GitHub sponsored Pydantic $750 in 2022
Google¶
Extensive use of Pydantic in google/turbinia and other repos.
HSBC¶
(Based on the criteria described above)
IBM¶
Multiple repos in the IBM GitHub org depend on Pydantic.
Intel¶
(Based on the criteria described above)
Intuit¶
(Based on the criteria described above)
Intergovernmental Panel on Climate Change¶
Tweet explaining how the IPCC use Pydantic.
JPMorgan¶
(Based on the criteria described above)
Jupyter¶
- The developers of the Jupyter notebook are using Pydantic for subprojects
- Through the FastAPI-based Jupyter server Jupyverse
- FPS's configuration management.
Microsoft¶
- DeepSpeed deep learning optimisation library uses Pydantic extensively
- Multiple repos in the microsoft GitHub org depend on Pydantic, in particular their
- Pydantic is also used in the Azure GitHub org
- Comments on GitHub show Microsoft engineers using Pydantic as part of Windows and Office
Molecular Science Software Institute¶
Multiple repos in the MolSSI GitHub org depend on Pydantic.
NASA¶
Multiple repos in the NASA GitHub org depend on Pydantic.
NASA are also using Pydantic via FastAPI in their JWST project to process images from the James Webb Space Telescope, see this tweet.
Netflix¶
Multiple repos in the Netflix GitHub org depend on Pydantic.
NSA¶
The nsacyber/WALKOFF repo depends on Pydantic.
NVIDIA¶
Multiple repos in the NVIDIA GitHub org depend on Pydantic.
Their "Omniverse Services" depend on Pydantic according to their documentation.
OpenAI¶
OpenAI use Pydantic for their ChatCompletions API, as per this discussion on GitHub.
Anecdotally, OpenAI use Pydantic extensively for their internal services.
Oracle¶
(Based on the criteria described above)
Palantir¶
(Based on the criteria described above)
Qualcomm¶
(Based on the criteria described above)
Red Hat¶
(Based on the criteria described above)
Revolut¶
Anecdotally, all internal services at Revolut are built with FastAPI and therefore Pydantic.
Robusta¶
The robusta-dev/robusta repo depends on Pydantic.
Salesforce¶
Salesforce sponsored Samuel Colvin $10,000 to work on Pydantic in 2022.
Starbucks¶
(Based on the criteria described above)
Texas Instruments¶
(Based on the criteria described above)
Twilio¶
(Based on the criteria described above)
Twitter¶
Twitter's the-algorithm repo, where they open sourced their recommendation engine, uses Pydantic.
UK Home Office¶
(Based on the criteria described above)