If the only thing you want is to use the {identifier} syntax, without having to repeat `identifier` more than once, the simplest alternative is probably:
(disclaimer: I wouldn't actually use locals() like this in real code)
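The snippet itself isn't shown above, so here is a minimal sketch of the kind of one-liner meant, assuming the placeholder names a variable in the current scope (`describe` and `identifier` are illustrative names only):

```python
def describe():
    identifier = "spam"
    # Each {placeholder} is looked up by name in the current local scope.
    return "value: {identifier}".format(**locals())

result = describe()
```

The name is written once in the string and never repeated in the call, which is the whole appeal; the disclaimer about `locals()` still applies.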
Also, if this thing is truly mangling bytecode, it's not portable between different python versions
But I'm not quite sure about that: I only skimmed the codebase, and that work seems to be done by interpy_untokenize, which boils down to some string mangling
Also, having expressions (or worse, statements... like it would be in ruby since there's no difference there) evaluated when evaluating a string is quite bad (this is not Haskell, and thus we cannot have guarantees that side effects won't happen)
However, if you remove the `print(x)` line, neither Python 2 nor Python 3 hoists the enclosing `x` into `locals()` (since the compiler can't see a single usage of that variable inside `inner`), resulting in a `KeyError`:
Traceback (most recent call last):
  File "blah.py", line 7, in <module>
    outer()
  File "blah.py", line 5, in outer
    inner()
  File "blah.py", line 4, in inner
    print("x = {x}".format(**locals()))
KeyError: 'x'
Python 3 has the `nonlocal` keyword that you can use to indicate that you're using a variable from an enclosing scope so that it's properly introduced into `locals()` but Python 2 doesn't have this facility.
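A small Python 3 sketch of the behaviour described here (the function names are made up for illustration): an enclosing variable only shows up in `locals()` when the inner function mentions it, either by using it or by declaring it `nonlocal`.

```python
def outer():
    x = 1

    def unused():
        # x is never mentioned here, so it is not captured from outer().
        return "x" in locals()

    def used():
        x  # a bare reference is enough to capture x from outer()
        return "x" in locals()

    def declared():
        nonlocal x  # Python 3 only: declares x without reading it
        return "x" in locals()

    return unused(), used(), declared()

results = outer()
```

Here `results` should come out as `(False, True, True)`, which is why `"{x}".format(**locals())` fails in the first case only.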
That doesn't seem funky to me, given that `x` in your example is in fact not local to `inner`. It may seem surprising, but it's consistent with how Python handles locals in general.
I called it funky because `x` is never local to `inner`, but gets an entry inside `locals()` if `inner` uses it implicitly or explicitly with `nonlocal`.
Non-funky behaviour would be to have an `enclosing()` that walks up the outer functions and returns the union of their `locals()`. Alternatively, a `vars()` which expands to all variables in scope (respecting LEGB) would be best given the context of variable interpolation.
Also, if this thing is truly mangling bytecode, it's not portable between
different python versions [...] seems to be done by interpy_untokenize,
which boils down to some string mangling
It uses the Python file encoding property ("# coding: foobar") to rewrite the source code, not the bytecode, and they refer to pyxl as an inspiration.
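To make the mechanism a bit more concrete, here is a rough sketch of a source-rewriting codec. This is not interpy's actual code: the codec name `demo_interp` and the regex-based rewrite are assumptions made up for illustration (the comment above says the real thing works through interpy_untokenize). Only the `codecs` and `re` calls are real standard-library APIs.

```python
import codecs
import re

# Hypothetical, deliberately naive rule: a double-quoted literal containing
# "#{name}" is rewritten into a .format(**locals()) call.
LITERAL = re.compile(r'"([^"\\\n]*#\{\w+\}[^"\\\n]*)"')

def rewrite(source):
    def fix(match):
        body = re.sub(r"#\{(\w+)\}", r"{\1}", match.group(1))
        return '"%s".format(**locals())' % body
    return LITERAL.sub(fix, source)

def decode(data, errors="strict"):
    # Decode the bytes as UTF-8 first, then rewrite the resulting text.
    text, consumed = codecs.utf_8_decode(data, errors, True)
    return rewrite(text), consumed

def search(name):
    # Called by the codec registry with the requested encoding name.
    if name == "demo_interp":
        return codecs.CodecInfo(name=name, encode=codecs.utf_8_encode, decode=decode)
    return None

codecs.register(search)

source = b'name = "world"\nprint("hello #{name}")\n'
rewritten = source.decode("demo_interp")
```

The point is only that the rewrite happens on source text at decode time, before compilation, so no bytecode is touched. Making a `# coding:` line pick up such a codec also requires the codec to be registered before the module is read, which this sketch does not cover.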
This is certainly interesting, but I think it kind of breaks with the "only one obvious way to do things," which Ruby tends not to follow (and with regards to string formatting, Python doesn't either :P ). Does this buy you anything beyond trying to ramrod Ruby syntax into Python? If this truly compiles down to byte code that's doing string concatenation in the Python VM, the built-in string formatting library tends to be a non-trivial amount faster, beyond trying to write Ruby in a language that isn't Ruby.
Well... String formatting already sorta breaks the "only one obvious way to do something" rule. I dunno about you, but we have string formatting all over the place in our code. I would wager that every single module in our codebase has something that looks like this:
cache.add_message("Hey, your {zoop} is {boop}".format(zoop=zoop, boop=boop))
That call to "format" just kind of sucks -- it repeats what's already pretty obvious by looking at the string. It's especially crappy when you have strings that need lots of variables. We've taken to doing this recently, which I'm usually okay with:
cache.add_message("Hey, your {} is {}".format(zoop, boop))
The downside of that is you have to make sure that the order of the arguments matches exactly with the order of the empty brackets. It's kinda error prone... But generally not that big of a deal.
We could also do this...
cache.add_message("Hey, your %s is %s" % (zoop, boop))
Or this...
cache.add_message("{1} alert! Hey, your {0} is {1}".format(zoop, boop))
So yeah, I would argue that string formatting in Python is ALREADY in a kinda nasty place. There's ALREADY a bunch of ways to do it, and it all just kinda sucks.
IMO, Ruby-style string formatting is probably the nicest I've seen. If it were in Python, it absolutely would be THE way to do string formatting, I bet.
Actually, I kind of like the existing Python ways better than the Ruby one in many respects, since they let you separate format strings from the parameters passed to them, and reuse format strings more easily (which is especially useful if you want to move string literals out of code and into resources, because then you can include format strings in that as well.)
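A minimal sketch of that separation, assuming a made-up `MESSAGES` table standing in for whatever resource file you'd actually use:

```python
# Hypothetical resource table: format strings live apart from the code.
MESSAGES = {
    "status": "Hey, your {zoop} is {boop}",
}

def render(key, **values):
    # The same stored template can be filled in with different values.
    return MESSAGES[key].format(**values)

first = render("status", zoop="cache", boop="full")
second = render("status", zoop="disk", boop="empty")
```

With interpolation baked into the literal, the string and the variables it reads can't be pulled apart like this.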
You can do this in Ruby too, with sprintf. It's almost as if there are multiple, complementary ways of accomplishing similar tasks, depending on your needs.
Yeah, I was comparing the Python and Ruby features already under discussion in the thread, not what is available in each language, but that's useful to point out given that people might not be aware of it.
Maybe you can elucidate what about Python's syntax is broken or not DRY?
EDIT: To me, seeing this in Python code anywhere would seem to violate the principle of least astonishment, which I think is somewhat more important than being DRY, if that's even a problem here.
If you want format strings with sensible placeholders, you end up with something like
"Hello {person}, it's a {weather} day".format(person=person, weather=weather)
The list of variables (person, weather) shows up three times.
Somewhat off-topic, a similar problem shows up when you have a bunch of related functions, all taking a particular kwarg (or kwargs), and calling each other. Like
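The original example isn't shown, so here is a hypothetical illustration of the pattern (all function and argument names are invented):

```python
# Hypothetical helpers that all accept the same keyword arguments.
def fetch(url, timeout=10, verbose=False):
    return (url, timeout, verbose)

def fetch_all(urls, timeout=10, verbose=False):
    # Every forwarded name is spelled twice at the call site.
    return [fetch(url, timeout=timeout, verbose=verbose) for url in urls]

def crawl(urls, timeout=10, verbose=False):
    return fetch_all(urls, timeout=timeout, verbose=verbose)

pages = crawl(["a", "b"], timeout=5)
```

The `timeout=timeout, verbose=verbose` repetition is the same shape of redundancy as `.format(person=person, weather=weather)`.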
A neat thing that perl6 has is syntax for "keyword argument whose value is found in the variable of the same name". So the equivalent of that line could be written
String interpolation is one of the things that I've never understood about python. I'm not a massive fan of Ruby's particular syntax for it[1], but not having any syntax at all feels like such a massive oversight. And then I realize that it's probably a deliberate omission and that just seems really really weird to me.
So I'm happy to see this, even if I'm probably too conservative to use it in my day job. And I didn't know about the coding: thing, and it looks like this method could also be used on my other python-wtf, which makes me even happier.
(My other python-wtf is that there really ought to be nicer syntax for a['b']. For a while I thought that a::b would be nice, but then I remembered that that could be a slice, so it can't be parsed reliably. a$b is probably my next choice. Or even require that kind of slice to be written with a space or something, like "a: :b".)
[1] Requiring braces even for a simple variable name seems like a poor decision. There's a little-known language called Haxe which IIRC gets it right: you can embed variables with just "hello $foo", or expressions with "your score is ${kills-deaths}". I get that Ruby allows unusual characters in variable names, and it's not obvious whether "is this yours, #name?" means #{name?} or #{name}?. But I'd rather have that potential for confusion than force the braces even when there's no ambiguity.
Are you sure? https://php.net/manual/en/language.types.string.php#language... suggests that it can only interpolate a few specific types of expression, but can't do e.g. arithmetic or simple function calls. (It seems you can interpolate a variable named by a function call, but you can't interpolate a function call itself.)
Though I do actually prefer having the explicit formatting call there so I know when the interpolation is being performed. Side effects and all that. In a perfect world, this is the syntax I'd prefer:
def foo(x):
    print("Hello, {x}".format())
ie. a format() without args defaults to "all variables accessible in the current scope".
I wouldn't actually want it to support arbitrary python the way ruby does, I find the .format() syntax flexible enough.
(Also, my current implementation is for locals only. It wouldn't be hard to extend to globals, but would suffer the "nonlocals won't be captured" problem described in other comments here no matter what)
(EDIT: Also, it relies on sys._getframe, which is CPython specific)
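The implementation referred to isn't shown here; a minimal sketch of the same idea, assuming a free-standing helper (the name `fmt` is made up) instead of a patched `format()`, and locals only:

```python
import sys

def fmt(template):
    # sys._getframe(1) is the caller's frame; this is a CPython detail.
    caller_locals = dict(sys._getframe(1).f_locals)
    return template.format(**caller_locals)

def foo(x):
    return fmt("Hello, {x}")

greeting = foo("world")
```

Because it reads the caller's `f_locals`, it inherits the limitation mentioned above: enclosing-scope variables that the caller never references won't be there.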
This special syntax should be discouraged for any strings that will be translated in the future. Different languages have different syntax, so the order of the placeholders can change. With this kind of syntax, you will be in big trouble very soon!
For anything translated, you'd use the (admittedly gross, fixed in Python3) "Welcome %(who)s to %(place)s" % { 'who': who, 'place': place }. The Python3 version is "Welcome {who} to {place}".format(place=place, who=who), or if you want to be un-idiomatic and unsafe, "Welcome {who} to {place}".format(**locals())
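A small sketch of why named placeholders matter for translation, using a made-up catalogue (the "reordered" entry just stands in for a language whose word order differs):

```python
# Hypothetical catalogue; the second entry mimics a different word order.
TEMPLATES = {
    "en": "Welcome {who} to {place}",
    "reordered": "{place} welcomes {who}",
}

def welcome(lang, who, place):
    # Named fields let each template put the values in its own order.
    return TEMPLATES[lang].format(who=who, place=place)

english = welcome("en", "Ada", "Paris")
other = welcome("reordered", "Ada", "Paris")
```

With bare `{}` or `%s` placeholders the call site fixes the order, so the translator couldn't swap the two values.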
It's easier to see the point with more variables being interpolated where it becomes cumbersome to keep track of the order and you end up recounting everything multiple times when something's out of place.
Can anyone explain the creators' reasoning beyond the existing string interpolation in Python? I never thought it was cumbersome, on the contrary I liked the verbosity of "My {name} is, I am {years} old".format(name=name, years=years) and I think there are good reasons (beyond dragonwriter's reusability argument).
So, like, just to point out, your example would be, with name Adam and age 10:
"My Adam is, I am 10 old."
Correcting your example:
"My name is {name}, I am {years} years old".format(name=name, years=years)
So to throw that out, that one line includes the word "name" four freaking times, and years four freaking times. You say you like the verbosity of it. Why? Would you like this format yet more?
"My name is {name=name}, I am {years=years} years old".format(name=name, years=years)
If not why not? It's yet more verbose.
I think that the reason that other people like the non-verbose format of:
"My name is {name}, I am {years} years old" # assuming the presence of local variables "name" and "years"
Is that, well, it's pretty obvious what's going on here, and repeating name and years a bunch more times does not, it seems, make it any more clear what's going on.
A reasonable argument might be that:
"My name is {name}, I am {years} years old".format()
Is more clear about what's going on. But repeating the variable names is not particularly elucidating.
Well, I did not want to defend the way .format() works now. That's why I asked for the reasoning. Your reasonable argument sounds reasonable to me, I would certainly use it. But then my initial question is even more important: Why is this not the one Python way? What did the initial designers think?
I guess finding discussion of the % formatting would be harder. I suspect that the discussion would have been about the tradeoff between the implicit variable insertion and simpler positional examples:
"My name is %s, I am %s years old" % (name, years)
(where there is definitely at least a tendency to avoid implicit behavior in the design of python; of course positional formatting like that is implicit, but it is quite a bit less implicit than automatically pulling variables out of the current scope)
edit: the % formatting was probably informed by sprintf.
Some of the examples in that SO post are outdated. However, list joining is faster than string concatenation, but not by much. Assembling a 110 MB string:
from timeit import Timer
try:
    from StringIO import StringIO
except ImportError:
    from io import StringIO

nr = 1200000
data = "The Quick Brown Fox Jumps Over The Lazy Dog: Woven silk pyjamas exchanged for blue quartz.\n"

# construct a list first, then join
def dolist():
    s = []
    a = s.append
    i = 0
    while i < nr:
        a(data)
        i += 1
    s = "".join(s)
    print("%s chars (joined list)" % len(s))

# string concatenation fest
def dostr():
    s = ""
    i = 0
    while i < nr:
        s += data
        i += 1
    print("%s chars (string concatenation)" % len(s))

# use a string as a file
def dostringio():
    buf = StringIO()
    w = buf.write
    i = 0
    while i < nr:
        w(data)
        i += 1
    s = buf.getvalue()
    print("%s chars (cStringIO)" % len(s))

if 1:
    tlist = Timer("dolist()", "from __main__ import dolist")
    print("the joined list took %.2f seconds" % tlist.timeit(2))
    tstr = Timer("dostr()", "from __main__ import dostr")
    print("the concatenation fest took %.2f seconds" % tstr.timeit(2))
    tlist = Timer("dostringio()", "from __main__ import dostringio")
    print("the cStringIO approach took %.2f seconds" % tlist.timeit(2))
else:
    @profile
    def callall():
        # For use with a profiler (eg, kernprof.py/lineprof)
        for i in xrange(2):
            dolist()
            dostr()
            dostringio()
    callall()
Result:
(user@air) /Users/user/Prj/python $ python3 stringplakbenchmark.py
109200000 chars (joined list)
109200000 chars (joined list)
the joined list took 1.12 seconds
109200000 chars (string concatenation)
109200000 chars (string concatenation)
the concatenation fest took 1.76 seconds
109200000 chars (cStringIO)
109200000 chars (cStringIO)
the cStringIO approach took 1.45 seconds
(user@air) /Users/user/Prj/python $ python2.7 stringplakbenchmark.py
109200000 chars (joined list)
109200000 chars (joined list)
the joined list took 0.99 seconds
109200000 chars (string concatenation)
109200000 chars (string concatenation)
the concatenation fest took 1.33 seconds
109200000 chars (cStringIO)
109200000 chars (cStringIO)
the cStringIO approach took 5.21 seconds
(user@air) /Users/user/Prj/python $ python2.6 stringplakbenchmark.py
109200000 chars (joined list)
109200000 chars (joined list)
the joined list took 0.95 seconds
109200000 chars (string concatenation)
109200000 chars (string concatenation)
the concatenation fest took 1.39 seconds
109200000 chars (cStringIO)
109200000 chars (cStringIO)
the cStringIO approach took 5.54 seconds
Something about this seems patently unpythonic. Also, the way the code is included reminds me of monkey-patching, which is a Ruby behavior I like to leave at the door when I'm coding in Python.
Interesting and clever approach (even though, as I see, pyxl came first at that).
Just a doubt: how do I specify source file encoding if the coding string is now hijacked for interpolation purposes? Is there a default, fixed encoding (which I hope is not iso-8859-1, python's own default)?
Yup that's the point of my submission, too bad HN edited my title so it looks like a boring trick. It allows you to transform your code without significant speed loss.
Nice hack, btw