Hacker Newsnew | past | comments | ask | show | jobs | submit | chrisaycock's commentslogin

I hadn't realized that Kalshi is a YC company:

https://news.ycombinator.com/item?id=33696486


> I think it would be the best to start interpreting the query and start compilation in another thread

This technique is known as a "tiered JIT". It's how production virtual machines operate for high-level languages like JavaScript.

There can be many tiers, like an interpreter, baseline compiler, optimizing compiler, etc. The runtime switches into the faster tier once it becomes ready.

More info for the interested:

https://ieeexplore.ieee.org/document/10444855


It’s also common for JITs to sprout a tier and shed a tier over time, as the last and first tiers shift in cost/benefit. If the first tier works better you delay the other tiers. If the last tier gets faster (in run time or code optimization) you engage it sooner, or strip the middle tier entirely and hand half that budget to the last tier.

I write JITs so I know, but I always try to write in a way that even non-JIT people can understand :)

I first encountered q/kdb+ at a quant job in 2007. I learned so much from the array semantics about how to concisely represent time-series logic that I can't imagine ever using a scalar language for research.

Fun fact: the aj (asof join) function was my inspiration for pandas.merge_asof. I added the extra parameters (direction, tolerance, allow_exact_matches) because of the limitations I kept hitting in kdb.

https://code.kx.com/q/ref/aj/

https://pandas.pydata.org/docs/reference/api/pandas.merge_as...


Similarly, this is how it was introduced in ClickHouse in 2019: https://github.com/ClickHouse/ClickHouse/pull/4774

The aj function at its heart is a bin (https://code.kx.com/q/ref/bin/) search between the two tables, on the requested columns, to find the indices of the right table to zip onto the left table.

  aj[`sym`time;t;q]
becomes

  t,'(`sym`time _q)(`sym`time#q)bin`sym`time#t
The rest of the aj function internals are there to handle edge cases, handling missing columns and options for filling nulls.

A lot of the joins can be distilled to the core operators/functions in a similar manner. For example the plus-join is

  x+0i^y(cols key y)#x

Indeed, my very first attempt used numpy.searchsorted:

https://numpy.org/doc/2.2/reference/generated/numpy.searchso...

I couldn't figure-out how Arthur's bin matched on symbol though, so I switched to a linear scan on the right table to record the last-seen index for each "by" element. While it worked, my hash table was messy because I relied on Python to handle a whole tuple as a key, which had some issues during initial testing.

The asof join I wrote for Empirical properly categorizes the keys before they are matched. That approach worked far better.

https://www.empirical-soft.com/tutorial.html#dataframes


dang I had no idea you wrote the asof join for pandas. thank you for that

The article points out that tools like TLA+ can prove that a system is correct, but can't demonstrate that a system is performant. The author asks for ways to assess latency et al., which is currently handled by simulation. While this has worked for one-off cases, OP requests more generalized tooling.

It's like the quote attributed to Don Knuth: "Beware of bugs in the above code; I have only proved it correct, not tried it."


From my point of view, they cannot even prove that, because in most cases there is no validation if the TLA+ model actually maps to the e.g. C code that was written.

I only believe in formal methods where we always have a machine validated way from model to implementation.


Well Coq has program extraction built in.


Yeah and that's why it's way better than the likes of TLA+.


See Dafny


I know it, :)


preach


There are methods of determining Worst Case Execution Time/WCET. I’ve been involved in real time embedded systems development, where that was a thing.

But one tool (like TLA+) can’t realistically support all formalisms for all types of analyses ¯\_(ツ)_/¯


I had been thinking about this idea for a long time, but I doubt I'll be able to get around to it. After speaking with a friend this evening, I decided to just jot it down for anyone interested.

Basically, use copy-and-patch compilation in a vector language to fuse loops and avoid temporaries. It can be employed for a baseline compiler that will use less memory than an interpreter and will have a much lower startup cost than an optimizing compiler.


Statically typed dataframes are exactly why I created the Empirical programming language:

https://www.empirical-soft.com

It can infer the column names and types from a CSV file at compile time.

Here's an example that misspells the "ask" column as if it were plural:

  let quotes = load("quotes.csv")
  sort quotes by (asks - bid) / bid
The error is caught before the script is run:

  Error: symbol asks was not found
I had to use a lot of computer-science techniques to get this working, like type providers and compile-time function evaluation. I'm really proud of the novelty of it and even won Y Combinator's Startup School grant for it.

Unfortunately, it didn't go anywhere as a project. Turns out that static typing isn't enough of a selling point for people to drop Python. I haven't touched Empirical in four years, but my code and my notes are still publicly available on the website.


Wow this is amazing!! Thanks for sharing!

I love how you really expanded on the idea of executing code at compile time. You should be proud.

You probably already know this but for people like me to switch "all" it would take would be:

1. A plotting library like ggplot2 or plotnine

2. A machine learning library, like scikit

3. A dashboard framework like streamlit or shiny

4. Support for Empirical in my cloud workspace environment, which is Jupyter based, and where I have to execute all the code, because that's where the data is and has to stay due to security

Just like how Polars is written in Rust and has Python bindings, I wonder if there's a market for 1 and 2 written in Rust and then having bindings to Python, Empirical, R, Julia etc. I feel like 4 is just a matter of time if Empirical becomes popular, but I think 3 would have to be implemented specifically for Empirical.

I think the idea of statically typed dataframes is really useful and you were ahead of your time. Maybe one day the time will be right.


Does this require that the file is available locally or does it do network io at compile time?


The inferencing logic needs to sample the file, so (1) the file path must be determined at compile time and (2) the file must be available to be read at compile time. If neither condition is true---like the filename is a runtime parameter, for example---then the user must supply the type in advance.

There is no magic here. No language can guess the type of anything without seeing what the thing is.


Yeah, i think that's what limits the utility of such systems. Polars does typechecking at query planning time. So before you really do computation. I don't expect that much can improve over this model due to the aforementioned limitations.

I think needing network access or file access at compile time is a semi-hard blocker for statically typed dataframes.


> Jobs are being eliminated within the IT function which are routine and mundane, such as reporting, clerical administration.

I've had a similar thought recently, that there is no demand for rote programmers. No employer is going to hand you a completed spec and tell you to code it up.

Software engineers and data scientists today must be innovative, understand the business they operate in, communicate with users, and work cross functionally. You’ve got to create something original and see it through without having to be told what to do at every step.


Personally, I'd classify jobs such as reporting and clerical administration more as 'admin' then IT. You do some of that as a SWE (tickets, SoW, Design docs) but that's not generally the focus of the work.


This op-ed was written by an undergrad and complains that Northeastern's switch to Python (from Racket) for its introductory classes will prevent students from learning fundamentals of computer science.

But that complaint can be made about any language! "This dynamically typed language won't allow students to understand type safety." "This high-level language won't allow students to learn pointers and systems programming." Etc.

I believe that an intro course should get students coding since the first major hurdle is learning how to construct any kind of program at all. The switch to a more "employable" language isn't going to make education worse.


How about you focus on the argument instead of making ad hominem attacks?

>Racket was chosen because it has “teaching languages” that can gradually introduce features as students are taught the relevant design principles.

So no, that complaint can't be made about any language.


Tell me you haven't read the article (or used racket) without telling me.

> I believe that an intro course should get students coding since the first major hurdle is learning how to construct any kind of program at all. The switch to a more "employable" language isn't going to make education worse.

None of this is the issue at hand. The switch to python is because industry uses it. The article correctly makes the point that racket was intentionally designed to get students coding as easily and quickly as possible. It has multiple steps of teaching languages for exactly that purpose, introducing concepts in ways that let students grapple with them one at a time in an interactive environment.

Meanwhile in python complex topics like duck typing, object oriented methods, exceptions, the distinction between iterables and lists, how to use a command line/terminal or how to configure an IDE, and so on must be covered before people can start writing code for the exercises. Racket is streamlined for beginners.


> Meanwhile in python complex topics like duck typing, object oriented methods, exceptions, the distinction between iterables and lists, how to use a command line/terminal or how to configure an IDE, and so on must be covered before people can start writing code for the exercises.

No, they dont have to be at all. You might as well suggest you need to learn the JVM before writing a line of Java.


Python supports imperative, OO and functional programming paradigms. And to start you can use any text editor, an IDE is not required. In fact you can start working in the REPL right away, in which case you need a terminal and the command “Python”.


All of that except for working in the terminal, could probably be considered higher level


To quote the above person: "tell me you haven't read the article without telling me".

You thought that supporting multiple "programming paradigms" is a nice thing, but it's the opposite for teaching beginning student. Experienced programmers want expressivity/customization/choices to do whatever they want. That's not what newbies need when they get stuck on an assignment.


Does anyone else question whether an undergrad is best able to evaluate the strengths and weaknesses of the curriculum?


In this case, you can find the same criticisms in published articles and books. I expect this student heard them straight from the source (author of the articles or books). That does not lessen their impact or correctness in my opinion.

Also, see SolarNet's comment. https://news.ycombinator.com/item?id=42677918


Yeah.. Undergads leave school and are barely trusted to write CRUD apps.


I have found that grit, curiosity, and relationship building are far more important than memorizing facts and applying formulae.

Society needs to value those attributes if we're going to solve the really hard problems.


It is somewhat of a catch-22, because any student who has those values already internalized can extract a ton of value even from an archaic educational system. Particularly if it is difficult and can create a sense of shared purpose to excel among the students.

Maybe it is more down to what we consider virtuous as a society, and the rest follows. Again, catch-22, education can have a significant role in that, but it cannot carry the whole burden either.

My whole family is in teaching and pedagogical research. I also have quite a few younger cousins. It is obvious that the situation has changed a ton in the last 20 years (at least in EU public education) and keeps changing rapidly, perhaps too rapidly.

There is much more emphasis on group assignments, forcing students to learn how work together (or how to deal with people who don’t want to do shit, with minimal leverage on them). Also many more open-ended assignments and exams, that force you to take some initiative, actually understand and engage with the subject, and then make you get good at expressing yourself both in writing and orally. And grit is always part of it if the bar is kept high, which is a complex issue on its own.

It’s just much harder to teach well like this, and it has been pushed on teachers haphazardly. Good teachers are better, bad teachers are worse. Same with students to an extent, there’s a bigger burden on them to actively engage, and it is easier to get away with not doing so because evaluation is more fuzzy, and group work can be avoided. At some point you do need to learn some facts, this model relies more on imparting the skills and values for students to then be able to learn anything they need, but they do have to then go and do it, otherwise you end up worse-off in aggregate.

Teaching facts and making sure students memorized them is clearly worse, but it is much easier to do effectively and systematically at scale. Education is at the end of the day a mass-production factory of adults.

It’s a very hard question and lots of people are dedicating their lives to figuring it out.


I tend to agree. How do we measure those things though?


You outline the OKRs and put KPIs against them.


This was posted a couple days ago:

https://news.ycombinator.com/item?id=41161831


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: