> Working with SQL in X (any language) usually has a poor developer experience, which is why ORMs and query builders are popular.
At least when it comes to Postgres, I don't understand why more developers don't create their own user-defined functions with PL/pgSQL. It's a very robust and powerful procedural language. In my opinion, ORMs like SQLAlchemy add a completely unnecessary layer of abstraction. ORMs might be convenient for quick/simple queries, but when you're creating or troubleshooting a moderately complex query, it's far better to do it in native SQL than in a chained mess of ORM methods.
At my company, we employ UDFs for every SQL query we make, and every UDF returns a declared type. The end result is far easier to debug and maintain than the ad hoc ORM query alternative. It also has the side benefit of separating application code from database code, which lets our DBAs code-review the SQL that developers write far more effectively.
> You just have to choose wisely your tools for sure, but most of the code you write needs to be rewritten anyway every X years.
Not if you stick with plain SQL and/or Pl/pgSQL :-)
I did this on a recent project, and it worked really well.
I had each function definition in its own .sql file, with a preceding "drop function" call, and a Makefile clause to run them all. Which meant managing versions was easy (coupled with migration .sql files). I also got to find out if any of my SQL was broken right up front, and testing the SQL was simple - call the function and check the return.
I also defined views for return types, so mapping the return values to the structs in the Go code was easy (yes it's boilerplate, but it really is not as painful as the author makes out). Query functions always returned the relevant view type (or a set of them).
The author's approach seems to go to great lengths to avoid a relatively small amount of boilerplate.
Could you please give two specific examples of how this would work? Are the functions only for UPDATE/INSERT, or also for reading data? I'm using views to simplify queries, but still via an ORM.
Functions are for both reading and modifying data, because the actual SQL for the query might be complex (and therefore better managed as a function than as a string in Go code).
Example: User CRUD
create view vw_user as select id, name, email from users;
create function fn_get_user(p_user_id uuid) returns vw_user as $$ select * from vw_user where id = p_user_id; $$ language sql;
create function fn_change_user_name(p_user_id uuid, p_name text) returns vw_user as $$ update users set name = p_name where id = p_user_id; select * from vw_user where id = p_user_id; $$ language sql;
create function fn_create_user(p_name text, p_email text) returns vw_user as $$ insert into users (id, name, email) values (gen_random_uuid(), p_name, p_email) returning id, name, email; $$ language sql;
The advantage here is that there is only one return type, so you only need one ScanUser function which returns a hydrated Go User struct. If you need to change the struct, then you change the vw_user view, and the ScanUser function, and you're done.
Each function maps 1:1 to a Go function that calls it, though it's also possible/easy to have more complex Go functions that call more than one db function. Or indeed, meta-functions that call other functions before returning a value to the Go code.
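As a sketch of what that Go side can look like (the names `User`, `ScanUser`, and `GetUser` are illustrative, and a fake row stands in for the database so the example runs standalone):

```go
package main

import "fmt"

// User mirrors the vw_user view (id, name, email).
type User struct{ ID, Name, Email string }

// rowScanner is satisfied by both *sql.Row and *sql.Rows from database/sql,
// so a single helper covers one-row and set-returning functions.
type rowScanner interface {
	Scan(dest ...any) error
}

// ScanUser is the one hydration point for User: if the vw_user view changes,
// this function and the view are the only places that need updating.
func ScanUser(r rowScanner) (User, error) {
	var u User
	if err := r.Scan(&u.ID, &u.Name, &u.Email); err != nil {
		return User{}, fmt.Errorf("scan user: %w", err)
	}
	return u, nil
}

// GetUser is the 1:1 Go counterpart of the fn_get_user UDF. queryRow stands
// in for (*sql.DB).QueryRow so the sketch runs without a live database.
func GetUser(queryRow func(q string, args ...any) rowScanner, id string) (User, error) {
	return ScanUser(queryRow("select * from fn_get_user($1)", id))
}

// fakeRow simulates one row coming back from the database.
type fakeRow struct{ vals []string }

func (f fakeRow) Scan(dest ...any) error {
	for i, d := range dest {
		*(d.(*string)) = f.vals[i]
	}
	return nil
}

func main() {
	fake := func(q string, args ...any) rowScanner {
		return fakeRow{vals: []string{"42", "Ada", "ada@example.com"}}
	}
	u, _ := GetUser(fake, "42")
	fmt.Println(u.Name, u.Email)
}
```

In real code you would wrap `db.QueryRow` in a closure of the same shape; the point is that every query function funnels through the single `ScanUser`.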
The problem with ORMS is always that eventually the mapping between struct and database breaks, and you end up having to do some funky stuff to make it work (and from then on in it gets increasingly complex and difficult). The structures in the database are optimised for storing the data. The structures in the Go code need to be optimised for the business processes (and the structures in the UI are optimised for that, so they will be different again). An ORM ignores all of this and assumes that everything maps 1:1. Maintaining those relationships manually (rather than in an ORM) does involve some boilerplate, but it allows you to keep the structures separate.
For example: User UI data:
create view vw_ui_user as select users.id as user_id, users.name, users.email, max(sessions.when_created) as last_logged_in from users left join sessions on users.id = sessions.user_id group by 1, 2, 3;
create function get_ui_user(p_email text) returns json as $$ select to_json(u) from vw_ui_user u where u.email = p_email; $$ language sql;
The json data generated from this function can be returned direct to the UI without the Go code needing to do anything to it. If the UI needs different data, the view can be changed without affecting anything else.
Caveat: I didn't bother checking this for syntax or typos. I have probably made several errors in both.
CREATE OR REPLACE requires that the new function has the same signature as the old one [0]. I understand why, but I sometimes needed to change the signature, so it was easier to do a drop and then a create as standard. That did mean managing dependencies between files myself (I prefixed the .sql files with 00_, 01_, 02_, etc. to indicate the order of dependencies). It sounds like a lot of work, but in practice it was easy -- it broke very quickly and very loudly if I got anything wrong at all ;)
1. How do you handle versioning? Like if you want to try a development branch on a non-branch/shared db. Creating different versions of stored procedures creates a recursive problem: A calls B, so now A’ has to call B’.
2. Sometimes we still need to programmatically decide whether to include a table in the join, or get creative with a filter. PL/pgSQL is less flexible in this regard. You get the benefit of syntax checking only when the query is verbatim and not dynamically constructed.
> 1. How do you handle versioning? Like if you want to try a development branch on a non-branch/shared db. Creating different versions of stored procedures creates a recursive problem: A calls B, so now A’ has to call B’.
Writing UDFs and using Pl/pgSQL has no impact on how you do versioning. At my company we follow standard Gitflow and use golang-migrate for schema migrations (or Phinx for our PHP code bases).
If you're working at a company where developers are all forced to use the same shared database, then you're going to have a lot of development challenges that are unrelated to UDFs and Pl/pgSQL. Multiple devs sharing the same database always requires some team coordination to ensure that each member isn't stepping on another's toes -- whether that be prefixing your UDFs with your initials during development or agreeing not to work on the same UDFs at the same time.
> 2. Sometimes we still need to programmatically decide whether to include a table in the join, or get creative with a filter. PL/pgSQL is less flexible in this regard. You get the benefit of syntax checking only when the query is verbatim and not dynamically constructed.
That's just untrue. Pl/pgSQL fully supports conditional logic, dynamic query string construction, multi-query transactions, storing intermediate result sets in a variable or temp table, etc. The use case you described is actually a great example of when you would decide to use Pl/pgSQL. The language is extremely robust.
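As a sketch of that flexibility (the table, column, and function names here are made up, not from the example above), a PL/pgSQL function can assemble the join and the filter at runtime with `format` and `EXECUTE`:

```sql
-- Hypothetical report: optionally join sessions, optionally filter by name.
create or replace function fn_user_report(p_with_sessions boolean, p_name text default null)
returns setof json as $$
declare
  v_sql text := 'select to_json(u) from users u';
begin
  if p_with_sessions then
    v_sql := v_sql || ' join sessions s on s.user_id = u.id';
  end if;
  if p_name is not null then
    -- format's %L quotes the literal safely, avoiding SQL injection.
    v_sql := v_sql || format(' where u.name = %L', p_name);
  end if;
  return query execute v_sql;
end;
$$ language plpgsql;
```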
I get that pgsql can construct dynamic queries, but I was assuming you were talking about the benefit of install time / compile time verification of query syntax. This is true for most regular stored procedures except when query is dynamic. Obviously the exact query isn’t known until runtime. I agree it is not a major downside.
Yep, close to 100% of my data manipulation is done in pl/pgsql. It’s awesome. At least 50% fewer LOC, and 10-100x faster than the equivalent code written in Java or Go, because it eliminates all the round trips.
Oh yeah, I completely forgot to mention that. The performance gains are immense when you use Pl/pgSQL to eliminate round-trips to the database. That's easily one of the most important reasons to use Pl/pgSQL.
The vast majority of data-heavy web apps today must have the database running on the same server or within the same datacenter -- they can't tolerate any kind of latency between the application and the database server because they failed to reduce round-trips. Ever tried deploying MediaWiki on an application server with >20ms latency to the database server? It just doesn't work -- each page takes several seconds to load.
If you minimize round-trips to the database, it gives you more flexibility on how you can deploy/host your database server. That's flexibility you want when you're designing failover/disaster recovery schemes.
Yep, I realised I needed to give plpgsql a shot when I started thinking, from first principles, about all the effort I was wasting. Not just machine cycles - the buffer copying, context switches, network switches, latency - but also, as I was working in Java at the time, there was the immense weight of the ridiculous JPA ORM sitting on top of it all, making it worse. When I took a step back and realised what we had done, the minimalist in me went into cardiac arrest.
With plpgsql you define your schema once, in SQL alone; you don't write a million duplicate "entity objects" in your language of choice, there is no friction or "impedance mismatch", no need to catch network errors for each DB call -- you just write SQL and return values like any other Go function. Because my functions are generally self-contained, I rarely even need to bother with transaction management, which eliminates even more round trips.
It's true that I needed to write some supporting code to manage schema upgrades (one day I hope to open source it), and I'm really intrigued to see whether I can use sqlc to create Go stubs for my PG functions. But my SQL code sits next to my Go code, it's syntax- and correctness-checked by the GoLand IDE, and life with an SQL database is super enjoyable!
I'm looking forward to integrating plpgsql_check into my build chain.
I’d love to hear how you handle code coverage, dynamic queries (custom filters in reports, etc.), live schema migrations (we aren’t brave and don’t do that at all - and don’t use functions either).
I’m coming from a sqlalchemy background if that helps to set the context.
All my test code runs against sqlite. I'll only need a postgres instance for one or two geospatial queries that use postgis, and for now that's manual, so this speeds up my builds. SQL functions would tie you in to a specific database engine.