gh-131798: Narrow the return type of _FORMAT_SIMPLE and _FORMAT_WITH_SPEC to str#146639
gh-131798: Narrow the return type of _FORMAT_SIMPLE and _FORMAT_WITH_SPEC to str#146639NekoAsakura wants to merge 10 commits intopython:mainfrom
_FORMAT_SIMPLE and _FORMAT_WITH_SPEC to str#146639Conversation
…MAT_WITH_SPEC` to `str`
| res = sym_new_type(ctx, &PyUnicode_Type); | ||
| } | ||
|
|
||
| op(_FORMAT_WITH_SPEC, (value, fmt_spec -- res)) { |
There was a problem hiding this comment.
This can return a subclass, e.g.
class my(str):
def __format__(self, spec):
return self
m=my('aap')
type(f'{m}')
Does the sym_new_type(ctx, &PyUnicode_Type); not guarantee we have an exact type?
There was a problem hiding this comment.
Good catch! I oversimplified this. Narrowing type for format isn't as straightforward as I thought.
That said, I'm curious why FORMAT_SIMPLE needs to preserve str subclasses. Had a look through the git history it's been this way since 2015, and seems not a deliberate type-preservation contract.
Current behaviour is already inconsistent:
class my(str):
def __format__(self, spec):
return self
m = my('aap')
type('{}'.format(m)) # <class 'my'>
type('x{}'.format(m)) # <class 'str'>
type(f'{m}') # <class 'my'>
type(f'x{m}') # <class 'str'>
Multi-piece concatenation already strips it (both _PyUnicodeWriter in str.format and BUILD_STRING in f-strings create exact str).
If FORMAT_SIMPLE normalised to exact str, this inconsistency goes away and type narrowing becomes unconditionally correct. Would that be a reasonable change, or is there a reason to preserve str subclass identity through formatting that I'm missing?
There was a problem hiding this comment.
Huh that's kinda scary. The semantics look a little strange, so maybe we should leave this alone for now?
There was a problem hiding this comment.
Fair enough. I just noticed the inconsistency and thought it was worth raising.
For the narrowing itself: we could do conditional narrowing for input types where we can prove __format__ returns exact str (e.g. int, float). But honestly it feels not pythonic to hardcode a list of "safe" types for a uop that's less then 0.1%. I'd prefer just closing this PR unless you think it's worth keeping with that approach?
There was a problem hiding this comment.
Actually we already have something like that called sym_is_safe_type, you can use that
There was a problem hiding this comment.
I couldn't find sym_is_safe_type anywhere. The closest thing I can see is sym_is_safe_const, but that checks for known constant values rather than types. Have I overlooked something?
There was a problem hiding this comment.
Oh yeah sorry that's the one! Just factor out the type checks in _Py_uop_sym_is_safe_const into another function and use that in this case, I think that's fine!
There was a problem hiding this comment.
Done. How does it look now?
…MAT_WITH_SPEC` to str for built-in types
| return (typ == &PyLong_Type) || | ||
| (typ == &PyUnicode_Type) || | ||
| (typ == &PyFloat_Type) || | ||
| (typ == &_PyNone_Type) || | ||
| (typ == &PyBool_Type) || | ||
| (typ == &PyFrozenDict_Type); |
There was a problem hiding this comment.
Oops, I meant to factor this out into a common function, sorry!
There was a problem hiding this comment.
You mean factoring safe types into a static helper? No problem.
…MAT_WITH_SPEC` to str for built-in types
|
Going to close and reopen this PR to re-trigger CI. Sorry! |
# Conflicts: # Python/optimizer_symbols.c
markshannon
left a comment
There was a problem hiding this comment.
Looks good overall. The is_safe_builtin_type function needs fixing up though.
| } | ||
|
|
||
| static bool | ||
| is_safe_builtin_type(PyTypeObject *typ) |
There was a problem hiding this comment.
I don't like this name. "is safe" for what?
For how we are using this, I don't think frozendict and frozenset should be included as they can contain "unsafe" objects. You should be able to add bytes to the list.
Maybe is_atomic_builtin_type?
Also add a comment explaining that "atomic" means that its values can be evaluated without side effects as they aren't containers. Also that common functions on them are pure and return the expected type, like str(ob) returns a str, int(ob) returns an obj, etc.
There was a problem hiding this comment.
Looks like frozendict and frozenset aren't safe for const folding either. I've pushed a test that captures the behaviour. I am not sure if there is a better fix than just dropping them. Do you want that addressed here, or track in a separate issue?
Documentation build overview
40 files changed ·
|
Uh oh!
There was an error while loading. Please reload this page.