I would not call it a performance comparison at all. When Python calls functions wr...

kortex · on July 24, 2022

Can we move past the whole "it is not really Python to use libraries written in C" argument, especially when talking about pure stdlib Python?

Python is basically a DSL for C extensions. That is the whole point. It would be like criticizing any compiled language for essentially being a DSL for machine code, and not "Real Instructions".

Python's ability to interop with pre-built, optimized libraries through a lightweight interface is arguably its greatest selling point. Everyone and their dog knows purely interpreted CPython is slow. It doesn't need to be pointed out every single time Python performance is brought up, unless we are literally discussing optimizing the CPython VM.

FpUser · on July 24, 2022

I am not criticizing Python. It does what it was made to do well enough. It just makes no sense to call that particular example a "language performance comparison". It is anything but.

kortex · on July 24, 2022

But you are still wrong. As mentioned, dicts are incredibly efficient data structures in Python (because they underpin everything), and the Counter class is pure Python. That's 100% pure Python. Saying dicts "don't count" because they are implemented in C would disqualify CPython entirely, since virtually everything under the hood is a pointer to a PyObject struct. It just so happens that "counting abstract objects" is the class of problem* CPython solves basically all the time in normal VM execution anyway, so this task is easy and fast.

* Looking up an attribute by its string key is the meat and potatoes of Python's execution model.
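To make the footnote concrete, here is a small sketch showing that ordinary instance attributes, module globals and class namespaces are all string-keyed mappings. It is a simplification: real attribute lookup goes through `type(obj).__getattribute__`, descriptors and the MRO, and CPython 3.11+ may store instance values inline until `__dict__` is actually requested. The `Point` class is purely illustrative.

```python
class Point:
    """Illustrative class with two plain instance attributes."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)

# Plain instance attributes are exposed through a dict keyed by attribute-name strings.
print(p.__dict__)  # {'x': 1, 'y': 2}

# For a plain instance attribute, p.x resolves to the same value as a string-keyed lookup.
print(p.x == p.__dict__["x"] == getattr(p, "x"))  # True

# Module globals are a plain dict; the class namespace is exposed as a read-only view.
print(type(globals()).__name__)    # dict
print(type(vars(Point)).__name__)  # mappingproxy
```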

Bogdanp · on July 25, 2022

I have no dog in this fight, but I do want to point out that it’s not exactly true that Counter is implemented in pure Python:

* it’s a subclass of dict

* its update method (used by the code in the post) dispatches to a C implementation on its fast path
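Both points can be checked from the interpreter. This is a sketch that relies on a CPython implementation detail: `collections._count_elements` is a private helper that CPython replaces with a C version from the `_collections` extension module when that module is importable. Other runtimes may keep the pure-Python definition, so the last line is expected to print `True` only on CPython. The sample sentence is made up for illustration.

```python
import collections
import inspect
import sys

words = "the cat and the hat and the bat".split()

c = collections.Counter()
# A non-Mapping iterable makes Counter.update delegate to collections._count_elements.
c.update(words)

print(isinstance(c, dict))  # True: Counter subclasses dict
print(c["the"], c["and"])   # 3 2

# Private helper; on CPython it is normally the C function from _collections.
helper = collections._count_elements
print(sys.implementation.name, inspect.isbuiltin(helper))
```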

kortex · on July 25, 2022

Yes, that's exactly what I said. dict.update is in C because it's a core feature of the Python VM. It's pure CPython. What do you think "pure Python" is? There's no Python hardware ISA (afaik). All CPython does is manipulate data structures in C via Python VM opcodes. It just so happens that the opcodes dispatched in the course of solving this problem are quite efficient.

If you say "it does not count as Real Python if you dispatch to C", then you literally cannot execute any CPython VM opcodes, because they all dispatch to C under the hood.
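The opcode argument can be seen with the standard `dis` module. The sketch below disassembles a hand-written counting loop (the `tally` function is hypothetical, not the code from the post). Each printed instruction is handled by C code in CPython's interpreter loop. The exact opcode names differ between CPython versions, so no particular listing is claimed here.

```python
import dis


def tally(words):
    """Hypothetical hand-rolled word counter, used only to inspect its bytecode."""
    counts = {}
    for word in words:
        # The method call, the addition and the subscript store each compile to VM opcodes.
        counts[word] = counts.get(word, 0) + 1
    return counts


# Human-readable disassembly; output is version-dependent.
dis.dis(tally)

# The same instructions as data: a list of opcode names for the function body.
opnames = [ins.opname for ins in dis.get_instructions(tally)]
print(opnames)

print(tally(["a", "b", "a"]))  # {'a': 2, 'b': 1}
```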

Bogdanp · on July 26, 2022

“Pure Python” commonly means implemented using only the Python language. Something written in pure Python ought to be portable across Python implementations. I was merely pointing out that this line

https://github.com/python/cpython/blob/4395ff1e6a18fb26c7a66...

isn’t exactly pure Python, because, under a different runtime (e.g. PyPy), the code would take a different path (the “pure Python” implementation of _count_elements[1] instead of the C implementation[2][3]). Yes, it's hard to draw exact lines when it comes to Python, especially as the language is so tied to its implementation. However, I think in this case it's relatively clear that the code that specific line is calling is an optimization in CPython, specifically intended to get around some of the VM overhead. Said optimization comes into play in the OP.

[1]: https://github.com/python/cpython/blob/4395ff1e6a18fb26c7a66...

[2]: https://github.com/python/cpython/blob/4395ff1e6a18fb26c7a66...

[3]: https://github.com/python/cpython/blob/4395ff1e6a18fb26c7a66...
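For comparison, the pure-Python fallback referred to in [1] is only a few lines. The function below mirrors the fallback body as it appears in recent versions of `Lib/collections/__init__.py`. It is renamed to the hypothetical `count_elements_py` so it does not shadow the real helper. The sketch checks that it produces the same tallies as `Counter`, which on CPython goes through the C helper instead.

```python
from collections import Counter


def count_elements_py(mapping, iterable):
    """Tally elements into mapping.

    Hypothetical name; the body mirrors CPython's pure-Python fallback for
    collections._count_elements, used when the C helper is unavailable.
    """
    mapping_get = mapping.get
    for elem in iterable:
        mapping[elem] = mapping_get(elem, 0) + 1


text = "to be or not to be"

fallback = {}
count_elements_py(fallback, text.split())

accelerated = Counter(text.split())  # C helper on CPython

print(fallback)                       # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
print(fallback == dict(accelerated))  # True
```

The results are identical; the only difference is whether the per-element loop runs as Python bytecode or inside a single C call.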

igouy · on July 24, 2022

> It doesn't need to be pointed out …

Apparently there will be someone who feels the need to point it out; and someone who feels the need to point out that it doesn't need to be pointed out; and …

__marvin_the__ · on July 24, 2022

Python's `collections.Counter` is written in Python and is a subclass of the builtin `dict` type. I don't think it's comparable to something like using `pandas` to solve the problem.

https://github.com/python/cpython/blob/main/Lib/collections/...

Arnavion · on July 25, 2022

So we should disqualify the C and C++ impls because some libc functions are implemented using ASM, right?

dvhh · on July 26, 2022

Are you implying that Node.js is pure ECMAScript?

Bolkan · on July 25, 2022

If it looks like python and quacks like python...