A way to edit/improve tags for existing images would be useful. For example, the duck images could use a "Sex" tag.
And probably some kind of uploader account would be a good idea as well. So if somebody contacts you about an image they uploaded, you can verify if they were the original uploader. And you could give a user more rights related to editing metadata for images they uploaded.
Maybe the species + common name could be normalized instead of being free text fields. Especially if you find an existing database of species names.
Really appreciate this detailed feedback. These are the types of features that would make the platform more robust long-term.
Re: tagging/metadata - You're right that additional fields like sex, age, life stage would be valuable. Right now keeping it simple to reduce upload friction, but could definitely add optional metadata fields as the collection grows.
Re: user accounts - Good point about verification and editing rights. Currently everything is anonymous/CC0, but user accounts would enable better provenance tracking and let uploaders manage their own submissions. Thinking about how to add this without adding too much complexity to the upload flow.
Re: normalized taxonomy - Already using GBIF's taxonomy API for species names (auto-suggest on upload), so names are normalized to their database. Common names are free text for now since they vary regionally. Could potentially pull those from GBIF as well.
This is pretty helpful. Gives me a clear roadmap for v2 features. Trying to balance simplicity (to encourage contributions) with the kind of rich metadata that makes the collection more useful. Open to thoughts on how to strike that balance. Thanks!
I'm confused by that thing. After scrolling for a couple of minutes, the animation seems to play on on its own for longer than I cared to wait. Isn't the animation supposed to be coupled to the scrolling?
And it doesn't really feel like scrolling either, since apart from the tiny depth bar on the left, no content scrolls by.
The punchlines are there mostly to make the empty scroll a bit more entertaining while you go down. But you're right that nothing really scrolls by in the traditional sense.
The idea is more about measuring how far your thumb travels than about browsing actual content.
I think this was actually on the table and rejected. Which is why the "control the narrative" argument is fishy, or at least missing something. And now having just defended Oracle I have to go take a shower.
How does that work? I assume these APIs use SSL, which should prevent such MitM attacks.
Are those Apps using the system SSL library which bypasses certificate validation for those domains? Or does the OS add a Root CA to the certificate store which signs fake certificates for those domains?
I forget the shape of the API but the pebble requests resources over Bluetooth and the mobile app actually makes the requests so it should be able to rewrite anything before/after a request easily.
I wonder what the thought process of the Go designers was when coming up with that approach. Function scope is rarely what a user needs, has major pitfalls, and is more complex to implement in the compiler (need to append to an unbounded list).
> I wonder what the thought process of the Go designers was when coming up with that approach.
Sometimes we need block scoped cleanup, other times we need the function one.
You can turn the function scoped defer into a block scoped defer in a function literal.
AFAICT, you cannot turn a block scoped defer into the function one.
So I think the choice was obvious - go with the more general(izable) variant. Picking the alternative, which can do only half of the job, would be IMO a mistake.
I hate even more that you can call defer in a loop, and it will appear to work, as long as the loop has relatively few iterations, and is just silently massively wasteful.
I know. Or in some cases, you can put the loop body in a dedicated function. There are workarounds. It's just bad that the wrong way a) is the most obvious way, and b) is silently wrong in such a way that it appears to work during testing, often becoming a problem only when confronted with real-world data, and often surfacing only as being a hard-to-debug performance or resource usage issue.
In a tight loop you'd want your cleanup to happen after the fact. And in, say, an IO loop, you're going to want concurrency anyway, which necessarily introduces new function scope.
> In a tight loop you'd want your cleanup to happen after the fact.
Why? Doing 10 000 iterations where each iteration allocates and operates a resource, then later going through and freeing those 10 000 resources, is not better than doing 10 000 iterations where each iteration allocates a resource, operates on it, and frees it. You just waste more resources.
> And in, say, an IO loop, you're going to want concurrency anyway
This is not necessarily true; not everything is so performance sensitive that you want to add the significant complexity of doing it async. Often, a simple loop where each iteration opens a file, reads stuff from it, and closes it, is more than good enough.
Say you have a folder with a bunch of data files you need to work on. Maybe the work you do per file is significant and easily parallelizable; you would probably want to iterate through the files one by one and process each file with all your cores. There are even situations where the output of working on one file becomes part of the input for work on the next file.
Anyway, I will concede that all of this is sort of an edge case which doesn't come up that often. But why should the obvious way be the wrong way? Block-scoped defer is the most obvious solution since variable lifetimes are naturally block-scoped; what's the argument for why it ought to be different?
It doesn't just have to be files, FWIW. I once worked in a Go project which used SDL through CGO for drawing. "Widgets" were basically functions which would allocate an SDL surface, draw to it using Cairo, and return it to Go code. That SDL surface would be wrapped in a Go wrapper with a Destroy method which would call SDL_DestroySurface.
And to draw a surface to the screen, you need to create an SDL texture from it. If that's all you want to do, you can then destroy the SDL surface.
So you could imagine code like this:
strings := []string{"Lorem", "ipsum", "dolor", "sit", "amet"}
stringTextures := []SDLTexture{}
for _, s := range strings {
surface := RenderTextToSurface(s)
defer surface.Destroy()
stringTextures = append(stringTextures, surface.CreateTexture())
}
Oops, you're now using way more memory than you need!
Why would you allocate/destroy memory in each iteration when you can reuse it to much greater effect? Aside from bad API design, but a language isn't there to paper over bad design decisions. A good language makes bad design decisions painful.
The surfaces are all of different size, so the code would have to be more complex, resizing some underlying buffer on demand. You'd have to split up the text rendering into an API to measure the text and an API to render the text, so that you could resize the buffer. So you'd introduce quite a lot of extra complexity.
And what would be the benefit? You save up to one malloc and free per string you want to render, but text rendering is so demanding it completely drowns out the cost of one allocation.
Why does the buffer need to be resized? Your malloc version allocates a fixed amount of memory on each iteration. You can allocate the same amount of memory ahead of time.
If you were dynamically changing the malloc allocation size on each iteration then you have a case for a growable buffer to do the same, but in that case you would already have all the complexity of which you speak as required to support a dynamically-sized malloc.
> The example allocates an SDL_Surface large enough to fit the text string each iteration.
Impossible without knowing how much to allocate, which you indicate would require adding a bunch of complexity. However, I am willing to chalk that up to being a typo. Given that we are now calculating how much to allocate on each iteration, where is the meaningful complexity? I see almost no difference between:
>> The example allocates an SDL_Surface large enough to fit the text string each iteration.
> Impossible without knowing how much to allocate
But we do know how much to allocate? The implementation of this example's RenderTextToSurface function would use SDL functions to measure the text, then allocate an SDL_Surface large enough, then draw to that surface.
> I see almost no difference between: (code example) and (code example)
What? Those two code examples aren't even in the same language as the code I showed.
The difference would be between the example I gave earlier:
stringTextures := []SDLTexture{}
for _, str := range strings {
surface := RenderTextToSurface(str)
defer surface.Destroy()
stringTextures = append(stringTextures, surface.CreateTexture())
}
> Remember, I'm talking about the API to a Go wrapper around SDL.
We were talking about using malloc/free vs. a resizable buffer. Happy to progress the discussion towards a Go API, however. That, obviously, is going to look something more like this:
renderer := SDLRenderer()
defer renderer.Destroy()
for _, str := range strings {
surface := renderer.RenderTextToSurface(str)
textures = append(textures, renderer.CreateTextureFromSurface(surface))
}
I have no idea why you think it would look like that monstrosity you came up with.
> No. We were talking about using malloc/free vs. a resizable buffer.
No. This is a conversation about Go. My example[1], that you responded to, was an example taken from a real-world project I've worked on which uses Go wrappers around SDL functions to render text. Nowhere did I mention malloc or free, you brought those up.
The code you gave this time is literally my first example (again, [1]), which allocates a new surface every time, except that you forgot to destroy the surface. Good job.
I invite you to read the code again. You missed a few things. Notably it uses a shared memory buffer, as discussed, and does free it upon defer being executed. It is essentially equivalent to the second C snippet above, while your original example is essentially equivalent to the first C snippet.
Wait, so your wrapper around SDL_Renderer now also inexplicably contains a scratch buffer? I guess that explains why you put RenderTextToSurface on your SDL_Renderer wrapper, but ... that's some really weird API design. Why does the SDL_Renderer wrapper know how to use SDL_TTF or PangoCairo to draw text to a surface? Why does SDL_Renderer then own the resulting surface?
To anyone used to SDL, your proposed API is extremely surprising.
It would've made your point clearer if you'd explained this coupling between SDL_Renderer and text rendering in your original post.
But yes, I concede that if there was any reason to do so, putting a scratch surface into your SDL_Renderer that you can auto-resize and render text to would be a solution that makes for slightly nicer API design. Your SDL_Renderer now needs to be passed around as a parameter to stuff which only ought to need to concern itself with CPU rendering, and you now need to deal with mutexes if you have multiple goroutines rendering text, but those would've been alright trade-offs -- again, if there was a reason to do so. But there's not; the allocation is fast and the text rendering is slow.
You're right to call out that the SDLRenderer name was a poor choice. SDL is an implementation detail that should be completely hidden from the user of the API. That it may or may not use SDL under the hood is irrelevant to the user of the API. If the user wanted to use SDL, they would do so directly. The whole point of this kind of abstraction, of course, is to decouple of the dependence on something like SDL. Point taken.
Aside from my failure in dealing with the hardest problem in computer science, how would you improve the intent of the API? It is clearly improved over the original version, but we would do well to iterate towards something even better.
Some hypothetical example numbers: if software-rendering text takes 0.1 milliseconds, and I have a handful of text strings to render, I may not care that rendering the strings takes a millisecond or two.
But that 0.1 millisecond to render a string is an eternity compared to the time it takes to allocate some memory, which might be on the order of single digit microseconds. Saving a microsecond from a process which takes 0.1 milliseconds isn't noticeable.
It might be preferable to create a font atlas and just allocate printable ASCII characters as a spritesheet (a single SDL_Texture* reference and an array of rects.) Rather than allocating a texture for each string, you just iterate the string and blit the characters, no new allocations necessary.
If you need something more complex, with kerning and the like, the current version of SDL_TTF can create font atlases for various backends.
Completely depends on context. If you're rendering dynamically changing text, you should do as you say. If you have some completely static text, there's really nothing wrong with doing the text rendering once using PangoCairo and then re-using that texture. Doing it with PangoCairo also lets you do other fancy things like drop shadows easier.
Opening a file is fairly fast (at least if you're on Linux; Windows not so much). Synchronous code is simpler than concurrent code. If processing files sequentially is fast enough, for what reason would you want to open them concurrently?
For concurrent processing you'd probably do something like splitting the file names into several batches and process those batches sequentially in each goroutine, so it's very much possible that you'd have an exact same loop for the concurrent scenario.
P.S. If you have enough files you don't want to try to open them all at once — Go will start creating more and more threads to handle the "blocked" syscalls (open(2) in this case), and you can run out of 10,000 threads too
You'd probably have to be doing something pretty unusual to not use a worker queue. Your "P.S." point being a perfect case in point as to why.
If you have a legitimate reason for doing something unusual, it is fine to have to use the tools unusually. It serves as a useful reminder that you are purposefully doing something unusual rather than simply making a bad design choice. A good language makes bad design decisions painful.
You have now transformed the easy problem of "iterate through some files" into the much more complex problem of either finding a work queue library or writing your own work queue library; and you're baking in the assumption that the only reasonable way to use that work queue is to make each work item exactly one file.
What you propose is not a bad solution, but don't come here and pretend it's the only reasonable solution for almost all situations. It's not. Sometimes, you want each work item to be a list of files, if processing one file is fast enough for synchronisation overhead to be significant. Often, you don't have to care so much about the wall clock time your loop takes and it's fast enough to just do sequentially. Sometimes, you're implementing a non-important background task where you intentionally want to only bother one core. None of these are super unusual situations.
It is telling that you keep insisting that any solution that's not a one-file-per-work-item work queue is super strange and should be punished by the language's design, when you haven't even responded to my core argument that: sometimes sequential is fast enough.
Your comment was in reply to nasretdinov, but its fundamental logic ignores what I've been telling you this whole time. You're pretending that the only solution to iterating through files is a work queue and that any solution that does a synchronous open/close for each iteration is fundamentally bad. I have told you why it isn't: you don't always need the performance.
for _, filename := range files {
queue <- func() {
f, _ := os.Open(filename)
defer f.Close()
}
}
or more realistically,
var group errgroup.Group
group.SetLimit(10)
for _, filename := range files {
group.Go(func() error {
f, err := os.Open(filename)
if err != nil {
return fmt.Errorf("failed to open file %s: %w", filename, err)
}
defer f.Close()
// ...
return nil
})
}
if err := group.Wait(); err != nil {
return fmt.Errorf("failed to process files: %w", err)
}
Perhaps you can elaborate?
I did read your code, but it is not clear where the worker queue is. It looks like it ranges over (presumably) a channel of filenames, which is not meaningfully different than ranging over a slice of filenames. That is the original, non-concurrent solution, more or less.
// Spawn workers
for _ := range 10 {
go func() {
for path := range workQueue {
fp, err := os.Open(path)
if err != nil { ... }
defer fp.Close()
// do work
}
}()
}
// Iterate files and give work to workers
for _, path := range paths {
workQueue <- path
}
Maybe, but why would one introduce coupling between the worker queue and the work being done? That is a poor design.
Now we know why it was painful. What is interesting here is that the pain wasn't noticed as a signal that the design was off. I wonder why?
We should dive into that topic. I suspect at the heart of it lies why there is so much general dislike for Go as a language, with it being far less forgiving to poor choices than a lot of other popular languages.
I think your issue is that you're an architecture astronaut. This is not a compliment. It's okay for things to just do the thing they're meant to do and not be super duper generic and extensible.
It is perfectly okay inside of a package. Once you introduce exports, like as seen in another thread, then there is good reason to think more carefully about how users are going to use it. Pulling the rug out from underneath them later when you discover your original API was ill-conceived is not good citizenry.
But one does still have to be mindful if they want to write software productively. Using a "super duper generic and extensible" solution means that things like error propagation is already solved for you. Your code, on the other hand, is going to quickly become a mess once you start adding all that extra machinery. It didn't go unnoticed that you conveniently left that out.
Maybe that no longer matters with LLMs, when you don't even have to look the code and producing it is effectively free, but LLMs these days also understand how defer works so then this whole thing becomes moot.
What's the issue with logging? I would have expected stdout/stderr to get automatically transferred to the providers managed logging solution (e.g. cloudwatch).
Though I never really understood the appeal of FaaS over something like Google-Cloud-Run.
As a developer who spent a couple months developing a microservice using aws lambda functions:
it SUCKS. There's no interactive debugging. Deploy for a minute or 5 depending on the changes, then trigger the lambda, wait another 5 minutes for all the logs to show up. Then proceed with printf/stack trace debugging.
For reasons that I forgot, locally running the lambda code on my dev box was not applicable. Locally deploying the cloud environment neither.
I wasn't around for the era but I imagine it's like working on an ancient mainframe with long compile times and a very slow printer.
Cloud Run is one service where GCP really shines. It is very flexible: handles services and long running background jobs, less annoying run time limitations like you find in Lambda.
And probably some kind of uploader account would be a good idea as well. So if somebody contacts you about an image they uploaded, you can verify if they were the original uploader. And you could give a user more rights related to editing metadata for images they uploaded.
Maybe the species + common name could be normalized instead of being free text fields. Especially if you find an existing database of species names.
reply