Ruby-like string interpolation in Python

berdario · on April 8, 2015

If the only thing you want is to use the {identifier} syntax, without having to repeat `identifier` more than once, the simplest alternative is probably:

    >>> package = "foo"
    >>> whatever = "bar"
    >>> "Enjoy {package}".format(**locals())
    'Enjoy foo'

(disclaimer: I wouldn't actually use locals() like this in real code)

Also, if this thing is truly mangling bytecode, it's not portable between different python versions

But I'm not quite sure about that: I only skimmed the codebase, and that work seems to be done by interpy_untokenize, which boils down to some string mangling

Also, having expressions (or worse, statements... like it would be in ruby since there's no difference there) evaluated when evaluating a string is quite bad (this is not Haskell, and thus we cannot have guarantees that side effects won't happen)

Nice hack, btw

rraval · on April 8, 2015

Note that `locals()` actually does funky things around enclosing scopes.

Consider the following Python2/3 code:

    def outer():
        x = 1
        def inner():
            print(x)
            print("x = {x}".format(**locals()))
        inner()

    outer()

This actually prints the right thing when run:

    1
    x = 1

However, if you remove the `print(x)` line, both Python 2 and 3 don't hoist the enclosing `x` into `locals()` (since it can't see a single usage of that variable), resulting in a `KeyError`:

    Traceback (most recent call last):
      File "blah.py", line 7, in <module>
        outer()
      File "blah.py", line 5, in outer
        inner()
      File "blah.py", line 4, in inner
        print("x = {x}".format(**locals()))
    KeyError: 'x'

Python 3 has the `nonlocal` keyword that you can use to indicate that you're using a variable from an enclosing scope so that it's properly introduced into `locals()` but Python 2 doesn't have this facility.
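A minimal Python 3 sketch of the behaviour described above. The function names are made up for illustration: one `inner` never names `x`, the other declares it `nonlocal`.

```python
def outer_without_reference():
    x = 1

    def inner():
        # x is only mentioned inside a string literal, so the compiler does
        # not capture it and locals() has no entry for it.
        return "x = {x}".format(**locals())

    return inner()


def outer_with_nonlocal():
    x = 1

    def inner():
        nonlocal x  # Python 3 only: marks x as coming from the enclosing scope
        return "x = {x}".format(**locals())

    return inner()


try:
    outer_without_reference()
    raised_key_error = False
except KeyError:
    raised_key_error = True

with_nonlocal = outer_with_nonlocal()
```

Here the first call raises `KeyError`, while the `nonlocal` variant formats the string as expected.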

wylee · on April 8, 2015

That doesn't seem funky to me, given that `x` in your example is in fact not local to `inner`. It may seem surprising, but it's consistent with how Python handles locals in general.

rraval · on April 8, 2015

I called it funky because `x` is never local to `inner`, but gets an entry inside `locals()` if `inner` uses it implicitly or explicitly with `nonlocal`.

Non-funky behaviour would be to have an `enclosing()` that walks up the outer functions and returns the union of their `locals()`. Alternatively, a `vars()` which expands to all variables in scope (respecting LEGB) would be best given the context of variable interpolation.
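For what it's worth, a rough approximation of that idea can be sketched with `collections.ChainMap`. `scope_vars` is a hypothetical helper name, not an existing builtin. It chains the caller's locals, globals and builtins, relies on the CPython-specific `sys._getframe`, and still misses enclosing-function variables the caller never references.

```python
import builtins
import sys
from collections import ChainMap


def scope_vars(depth=1):
    """Hypothetical helper: names visible from the calling frame.

    Lookup order is locals, then globals, then builtins. Enclosing-scope
    variables only appear if the caller actually references them.
    """
    frame = sys._getframe(depth)  # CPython-specific
    return ChainMap(dict(frame.f_locals), frame.f_globals, vars(builtins))


GREETING = "Hello"  # module-level name, found via the globals layer


def demo():
    who = "world"  # local name, found via the locals layer
    return "{GREETING}, {who}!".format_map(scope_vars())


message = demo()
```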

csl · on April 9, 2015

    Also, if this thing is truly mangling bytecode, it's not portable between
    different python versions [...] seems to be done by interpy_untokenize,
    which boils down to some string mangling

It uses the Python file encoding property ("# coding: foobar") to rewrite the source code, not the bytecode, and they refer to pyxl as an inspiration.

For a good explanation, see https://github.com/dropbox/pyxl
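To make the mechanism a bit more concrete, here is a toy sketch of a codec whose decode step rewrites source text. This is not interpy's or pyxl's actual code: the codec name `demointerp` and the regex rewrite rule are assumptions made up for illustration. It only shows the `codecs.register` mechanics. Having the interpreter apply it through a `# coding:` line needs more plumbing, and the codec must be registered before the file is imported.

```python
import codecs
import re

# Toy rule (assumption): a double-quoted literal containing "#{name}"
# becomes "...{name}...".format(**locals()).
_LITERAL = re.compile(r'"([^"\n]*#\{[^"\n]*)"')


def rewrite(source):
    def fix(match):
        body = match.group(1).replace("#{", "{")
        return '"' + body + '".format(**locals())'

    return _LITERAL.sub(fix, source)


def search(name):
    # Search functions receive the normalized (lowercased) codec name.
    if name != "demointerp":
        return None
    utf8 = codecs.lookup("utf-8")

    def decode(data, errors="strict"):
        # Decode as UTF-8 first, then rewrite the resulting source text.
        text, consumed = utf8.decode(bytes(data), errors)
        return rewrite(text), consumed

    return codecs.CodecInfo(encode=utf8.encode, decode=decode, name="demointerp")


codecs.register(search)

raw = b'package = "foo"\nresult = "Enjoy #{package}"\n'
rewritten = codecs.decode(raw, "demointerp")

namespace = {}
exec(rewritten, namespace)  # runs the rewritten source, not the original
```

The point is that the rewrite happens on source text before compilation, so nothing here touches bytecode.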

maxerickson · on April 8, 2015

str.format_map takes a mapping directly.
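A small sketch of what that looks like (Python 3.2+). The `KeepMissing` class is just an illustrative name for a dict subclass that leaves unknown placeholders alone, which works because `format_map` uses the mapping directly instead of copying it into a dict.

```python
def greet():
    package = "foo"
    # No ** unpacking needed: the mapping is passed as-is.
    return "Enjoy {package}".format_map(locals())


class KeepMissing(dict):
    """Illustrative mapping that leaves unknown placeholders untouched."""

    def __missing__(self, key):
        return "{" + key + "}"


greeting = greet()
partial = "Enjoy {package} and {other}".format_map(KeepMissing(package="foo"))
```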

berdario · on April 8, 2015

Thanks! I'd edit my post if I could

rpcope1 · on April 8, 2015

This is certainly interesting, but I think it kind of breaks with the "only one obvious way to do things," which Ruby tends not to follow (and with regards to string formatting, Python doesn't either :P ). Does this buy you anything beyond trying to ramrod Ruby syntax into Python? If this truly compiles down to byte code that's doing string concatenation in the Python VM, the built-in string formatting library tends to be a non-trivial amount faster, beyond trying to write Ruby in a language that isn't Ruby.

methodover · on April 8, 2015

Well... String formatting already sorta breaks the "only one obvious way to do something" rule. I dunno about you, but we have string formatting all over the place in our code. I would wager that every single module in our codebase has something that looks like this:

  cache.add_message("Hey, your {zoop} is {boop}".format(zoop=zoop, boop=boop))

That call to "format" just kind of sucks -- it repeats what's already pretty obvious by looking at the string. It's especially crappy when you have strings that need lots of variables. We've taken to doing this recently, which I'm usually okay with:

  cache.add_message("Hey, your {} is {}".format(zoop, boop))

The downside of that is you have to make sure that the order of the arguments matches exactly with the order of the empty brackets. It's kinda error prone... But generally not that big of a deal.

We could also do this...

  cache.add_message("Hey, your %s is %s" % (zoop, boop))

Or this...

  cache.add_message("{1} alert! Hey, your {0} is {1}".format(zoop, boop))

So yeah, I would argue that string formatting in Python is ALREADY in a kinda nasty place. There's ALREADY a bunch of ways to do it, and it all just kinda sucks.

IMO, Ruby-style string formatting is probably the nicest I've seen. If it were in Python, it absolutely would be THE way to do string formatting, I bet.

  cache.add_message("Hey, your {zoop} is {boop}")

So much nicer.

dragonwriter · on April 8, 2015

Actually, I kind of like the existing Python ways better in many respects from the Ruby one, since it lets you separate format strings from the parameters passed to them, and reuse format strings more easily (which is especially useful if you want to move string literals out of code and into resources, because then you can include format strings in that as well.)

stouset · on April 8, 2015

You can do this in Ruby too, with sprintf. It's almost as if there are multiple, complementary ways of accomplishing similar tasks, depending on your needs.

moe · on April 8, 2015

You don't even need sprintf, Ruby supports Python-style interpolation, too:

  bar='batz'  

  "foo #{bar} #{0.5+0.5}"
   => "foo batz 1.0"

  "foo %s %.2f" % [bar, 1.0]
   => "foo batz 1.00"

stouset · on April 13, 2015

% is an alias for sprintf. :)

dragonwriter · on April 8, 2015

Yeah, I was comparing the Python and Ruby features already under discussion in the thread, not what is available in each language, but that's useful to point out given that people might not be aware of it.

methodover · on April 8, 2015

Oh hey, I didn't even think of that. That's a really good point.

renox · on April 8, 2015

> Does this buy you anything beyond trying to ramrod Ruby syntax into Python?

I'm not the poster but IMHO it fixes Python's syntax which is a case of DRY violation.

rpcope1 · on April 8, 2015

Maybe you can elucidate what about Python's syntax is broken or not DRY?

EDIT: To me, seeing this in Python code anywhere would seem to violate the principle of least astonishment, which I think is somewhat more important than being DRY, if that's even a problem here.

philh · on April 8, 2015

If you want format strings with sensible placeholders, you end up with something like

    "Hello {person}, it's a {weather} day".format(person=person, weather=weather)

The list of variables (person, weather) shows up three times.

Somewhat off-topic, a similar problem shows up when you have a bunch of related functions, all taking a particular kwarg (or kwargs), and calling each other. Like

    def foo(arg, conf1=None, conf2=None):
        bar(arg+2, conf1=conf1, conf2=conf2) # eww :(

    def bar(arg, conf1=None, conf2=None):
        # etc.

A neat thing that perl6 has is syntax for "keyword argument whose value is found in the variable of the same name". So the equivalent of that line could be written

    bar($arg+2, :conf1($conf1), :conf2($conf2))

but it could also be

    bar($arg+2, :$conf1, :$conf2)

philh · on April 8, 2015

String interpolation is one of the things that I've never understood about python. I'm not a massive fan of Ruby's particular syntax for it[1], but not having any syntax at all feels like such a massive oversight. And then I realize that it's probably a deliberate omission and that just seems really really weird to me.

So I'm happy to see this, even if I'm probably too conservative to use it in my day job. And I didn't know about the coding: thing, and it looks like this method could also be used on my other python-wtf, which makes me even happier.

(My other python-wtf is that there really ought to be nicer syntax for a['b']. For a while I thought that a::b would be nice, but then I remembered that that could be a slice, so it can't be parsed reliably. a$b is probably my next choice. Or even require that kind of slice to be written with a space or something, like "a: :b".)

[1] Requiring braces even for a simple variable name seems like a poor decision. There's a little-known language called Haxe which IIRC gets it right: you can embed variables with just "hello $foo", or expressions with "your score is ${kills-deaths}". I get that Ruby allows unusual characters in variable names, and it's not obvious whether "is this yours, #name?" means #{name?} or #{name}?. But I'd rather have that potential for confusion than force the braces even when there's no ambiguity.

aetherson · on April 8, 2015

Note that Ruby allows non-braced interpolation with instance variables. So:

  irb(main):006:0> @foo = 'bar'
  => "bar"
  irb(main):007:0> "This is the value of foo: #@foo"
  => "This is the value of foo: bar"

toupeira · on April 8, 2015

There's also a very well-known language called PHP that does the same ;-)

philh · on April 9, 2015

Are you sure? https://php.net/manual/en/language.types.string.php#language... suggests that it can only interpolate a few specific types of expression, but can't do e.g. arithmetic or simple function calls. (It seems you can interpolate a variable named by a function call, but you can't interpolate a function call itself.)

ForHackernews · on April 8, 2015

You can already almost do this in Python, with the native string format operation:

    >>> name = "Foo Bar"
    >>> age = 25
    >>> "Hi, my name is {name} and I'm {age} years old.".format(**locals())
    "Hi, my name is Foo Bar and I'm 25 years old."

Arguably, this is an abuse of `locals()`, but it gets you very nearly the same kind of use-variables-in-strings-with-curly-braces functionality.

Edit: HN markdown doesn't seem to let you escape the star italics operator, which mangled the original snippet. To be clear, you have to double-star splat locals().

maxerickson · on April 8, 2015

Repeating my similar comment, str.format_map takes a dictionary directly (available since 3.2).

ekimekim · on April 8, 2015

I wrote a version of this some time ago:

https://github.com/ekimekim/pylibs/blob/master/libs/interpol...

It's not quite as natural as your one:

    def foo(x):
        print interpolate("Hello, {x}")

Though I do actually prefer having the explicit formatting call there so I know when the interpolation is being performed. Side effects and all that. In a perfect world, this is the syntax I'd prefer:

    def foo(x):
        print "Hello, {x}".format()

ie. a format() without args defaults to "all variables accessible in the current scope". I wouldn't actually want it to support arbitrary python the way ruby does, I find the .format() syntax flexible enough.

(Also, my current implementation is for locals only. It wouldn't be hard to extend to globals, but would suffer the "nonlocals won't be captured" problem described in other comments here no matter what)

(EDIT: Also, it relies on sys._getframe, which is CPython specific)
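A minimal sketch of the general idea, not the linked implementation: an `interpolate` helper that looks only at the caller's locals through the CPython-specific `sys._getframe`. The `foo` function mirrors the example above but returns the string instead of printing it.

```python
import sys


def interpolate(template):
    """Format `template` using the local variables of the calling frame."""
    caller = sys._getframe(1)  # CPython-specific; frame of whoever called us
    # Copy into a plain dict so formatting works on a stable snapshot.
    return template.format_map(dict(caller.f_locals))


def foo(x):
    return interpolate("Hello, {x}")


greeting = foo("world")
```

Globals and unreferenced enclosing-scope variables are deliberately left out here, matching the locals-only limitation mentioned above.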

BetaMechazawa · on April 8, 2015

I guess I'm missing the point. Why not use "Welcome %s to %s" % (who, place)

smcl · on April 8, 2015

From the README.md:

    I really enjoyed Ruby String interpolation, and "".format(...) or "" % (...) seems very verbose to me. I'm lazy by nature ;)

PythonicAlpha · on April 8, 2015

This special syntax should be discouraged for any strings that will be translated in the future. Different languages have different syntax, so the word order can change; with this kind of syntax, you will be in big trouble very soon!

nostrademons · on April 8, 2015

For anything translated, you'd use the (admittedly gross, fixed in Python3) "Welcome %(who)s to %(place)s" % { 'who': who, 'place': place }. The Python3 version is "Welcome {who} to {place}".format(place=place, who=who), or if you want to be un-idiomatic and unsafe, "Welcome {who} to {place}".format(**locals())
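As a small illustration of why named placeholders help with translation: the template table below is invented for the example, and the second entry is not a real translation. Each template can order its placeholders however its language needs while the call site stays the same.

```python
# Hypothetical message catalogue; the "alt" entry stands in for a language
# whose word order puts the place first.
TEMPLATES = {
    "en": "Welcome {who} to {place}",
    "alt": "{place} welcomes {who}",
}


def welcome(lang, who, place):
    # Keyword arguments are matched by name, so placeholder order is free.
    return TEMPLATES[lang].format(who=who, place=place)


english = welcome("en", "Ada", "Paris")
reordered = welcome("alt", "Ada", "Paris")
```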

hashmymustache · on April 8, 2015

It's easier to see the point with more variables being interpolated where it becomes cumbersome to keep track of the order and you end up recounting everything multiple times when something's out of place.

mcbetz · on April 8, 2015

Can anyone explain the creators' reasoning behind the existing string interpolation in Python? I never thought it was cumbersome, on the contrary I liked the verbosity of "My {name} is, I am {years} old".format(name=name, years=years) and I think there are good reasons (beyond dragonwriter's reusability argument).

aetherson · on April 8, 2015

So, like, just to point out, your example would be, with name Adam and age 10:

  "My Adam is, I am 10 old."

Correcting your example:

  "My name is {name}, I am {years} years old".format(name=name, years=years)

So to throw that out, that one line includes the word "name" four freaking times, and years four freaking times. You say you like the verbosity of it. Why? Would you like this format yet more?

  "My name is {name=name}, I am {years=years} years old".format(name=name, years=years)

If not why not? It's yet more verbose.

I think that the reason that other people like the non-verbose format of:

  "My name is {name}, I am {years} years old" # assuming the presence of local variables "name" and "years"

Is that, well, it's pretty obvious what's going on here, and repeating name and years a bunch more times does not, it seems, make it any more clear what's going on.

A reasonable argument might be that:

  "My name is {name}, I am {years} years old".format()

Is more clear about what's going on. But repeating the variable names is not particularly elucidating.

mcbetz · on April 8, 2015

Well, I did not want to defend the way .format() works now. That's why I asked for the reasoning. Your reasonable argument sounds reasonable to me, I would certainly use it. But then my initial question is even more important: Why is this not the one Python way. What did the initial designers think?

maxerickson · on April 8, 2015

The new style .format is talked about here:

https://www.python.org/dev/peps/pep-3101/

I guess finding discussion of the % formatting would be harder. I suspect that the discussion would have been about the tradeoff between the implicit variable insertion and simpler positional examples:

  "My name is %s, I am %s years old" % (name, years)

(where there is definitely at least a tendency to avoid implicit behavior in the design of python; of course positional formatting like that is implicit, but it is quite a bit less implicit than automatically pulling variables out of the current scope)

edit: the % formatting was probably informed by sprintf.

mcbetz · on April 8, 2015

Thanks for the explanation and link!

mkesper · on April 8, 2015

How is this optimized if it compiles to a + "" + b?

mesozoic · on April 8, 2015

It's not

andybak · on April 8, 2015

Are we sure?: http://stackoverflow.com/questions/3055477/how-slow-is-pytho...

Luyt · on April 8, 2015

Some of the examples in that SO post are outdated. However, list joining is faster than string concatenation, but not by much. Assembling a 110 MB string:

    from timeit import Timer
    try:
        from StringIO import StringIO
    except ImportError:
        from io import StringIO

    nr = 1200000
    data = "The Quick Brown Fox Jumps Over The Lazy Dog: Woven silk pyjamas exchanged for blue quartz.\n"

    # construct a list first, then join
    def dolist():
        s = []
        a = s.append
        i = 0
        while i < nr:
            a(data)
            i += 1
        s = "".join(s)
        print("%s chars (joined list)" % len(s))

    # string concatenation fest
    def dostr():
        s = ""
        i = 0
        while i < nr:
            s += data
            i += 1
        print("%s chars (string concatenation)" % len(s))

    # use a string as a file
    def dostringio():
        buf = StringIO()
        w = buf.write
        i = 0
        while i < nr:
            w(data)
            i += 1
        s = buf.getvalue()
        print("%s chars (cStringIO)" % len(s))

    if 1:
        tlist = Timer("dolist()", "from __main__ import dolist")
        print("the joined list took %.2f seconds" % tlist.timeit(2))

        tstr = Timer("dostr()", "from __main__ import dostr")
        print("the concatenation fest took %.2f seconds" % tstr.timeit(2))

        tlist = Timer("dostringio()", "from __main__ import dostringio")
        print("the cStringIO approach took %.2f seconds" % tlist.timeit(2))
    else:
        @profile
        def callall():
            # For use with a profiler (eg, kernprof.py/lineprof)
            for i in xrange(2):
                dolist()
                dostr()
                dostringio()
        callall()

    Result:

    (user@air) /Users/user/Prj/python $ python3 stringplakbenchmark.py
    109200000 chars (joined list)
    109200000 chars (joined list)
    the joined list took 1.12 seconds
    109200000 chars (string concatenation)
    109200000 chars (string concatenation)
    the concatenation fest took 1.76 seconds
    109200000 chars (cStringIO)
    109200000 chars (cStringIO)
    the cStringIO approach took 1.45 seconds

    (user@air) /Users/user/Prj/python $ python2.7 stringplakbenchmark.py
    109200000 chars (joined list)
    109200000 chars (joined list)
    the joined list took 0.99 seconds
    109200000 chars (string concatenation)
    109200000 chars (string concatenation)
    the concatenation fest took 1.33 seconds
    109200000 chars (cStringIO)
    109200000 chars (cStringIO)
    the cStringIO approach took 5.21 seconds

    (user@air) /Users/user/Prj/python $ python2.6 stringplakbenchmark.py
    109200000 chars (joined list)
    109200000 chars (joined list)
    the joined list took 0.95 seconds
    109200000 chars (string concatenation)
    109200000 chars (string concatenation)
    the concatenation fest took 1.39 seconds
    109200000 chars (cStringIO)
    109200000 chars (cStringIO)
    the cStringIO approach took 5.54 seconds

languagehacker · on April 8, 2015

Something about this seems patently unpythonic. Also, the way the code is included reminds me of monkey-patching, which is a Ruby behavior I like to leave at the door when I'm coding in Python.

est · on April 8, 2015

Another gem is pyxl from Dropbox

https://github.com/dropbox/pyxl

Reminds me of JSX/E4X/XML Islands, etc.

alanfranz · on April 8, 2015

Interesting and clever approach (even though, as I see, pyxl came first at that).

Just a doubt: how do I specify source file encoding if the coding string is now hijacked for interpolation purposes? Is there a default, fixed encoding (which I hope is not iso-8859-1, python's own default)?

est · on April 8, 2015

> Is there a default, fixed encoding

For interpy, I think it assumes utf-8 by default.

https://github.com/syrusakbary/interpy/blob/master/interpy/c...

b6fan · on April 8, 2015

The `codecs.register` approach is interesting. I hope Ruby has something similar. Unfortunately I could not find such thing in Ruby.

est · on April 8, 2015

Yup that's the point of my submission, too bad HN edited my title so it looks like a boring trick. It allows you to transform your code without significant speed loss.

Think of possibilities like

# coding: JIT

or

# coding: inline-C

etc.

maxerickson · on April 8, 2015

The typical way to do that is to reuse the interpreter infrastructure. This hack could be done with the ast:

https://docs.python.org/3.5/library/ast.html#ast.NodeTransfo...
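A rough sketch of what the AST route might look like, assuming Python 3.8+ (where string literals are `ast.Constant` nodes). The class name and the "any string containing `{`" rule are made up for illustration. It wraps matching literals in `.format(**locals())`.

```python
import ast


class InterpolateStrings(ast.NodeTransformer):
    """Illustrative transformer: "...{x}..." -> "...{x}...".format(**locals())"""

    def visit_Constant(self, node):
        if isinstance(node.value, str) and "{" in node.value:
            call_locals = ast.Call(
                func=ast.Name(id="locals", ctx=ast.Load()), args=[], keywords=[]
            )
            wrapped = ast.Call(
                func=ast.Attribute(value=node, attr="format", ctx=ast.Load()),
                args=[],
                # A keyword with arg=None represents ** unpacking.
                keywords=[ast.keyword(arg=None, value=call_locals)],
            )
            return ast.copy_location(wrapped, node)
        return node


source = 'package = "foo"\nresult = "Enjoy {package}"\n'
tree = InterpolateStrings().visit(ast.parse(source))
ast.fix_missing_locations(tree)  # fill in line numbers for the new nodes

namespace = {}
exec(compile(tree, "<demo>", "exec"), namespace)
```

Unlike the codec trick, the input still has to be valid Python syntax before the transformer ever sees it.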

est · on April 9, 2015

AST is cool, but you have to comply with Python's syntax.

With the coding approach, you can invent any wild syntax.

maxerickson · on April 9, 2015

I guess my first approach to that would not involve overloading the import statement.

BerislavLopac · on April 8, 2015

There are also Python template strings: https://docs.python.org/2.7/library/string.html#template-str...
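For reference, a short sketch of `string.Template` in use (the sample names and values are arbitrary). It supports both `$name` and `${name}` placeholders, and `safe_substitute` leaves unknown placeholders in place instead of raising.

```python
from string import Template

template = Template("Welcome $who to ${place}!")

# substitute() raises KeyError if a placeholder has no value.
message = template.substitute(who="Ada", place="Paris")

# safe_substitute() leaves missing placeholders untouched.
partial = template.safe_substitute(who="Ada")
```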

volent · on April 8, 2015

Looks fun but it doesn't seem to support unicode in python2.