> With superior efficiency in terms of latency and size, MobileDiffusion has the potential to be a very friendly option for mobile deployments given its capability to enable a rapid image generation experience while typing text prompts. And we will ensure any application of this technology will be in-line with Google’s responsible AI practices.
So I'm interpreting this that it won't ever get released.
I'm interpreting it as they will be adding a layer of safety restrictions. Understandable given the furore over the recent AI-generated Taylor Swift images.
Everyone needs to do this and probably is already doing this. Search for "ChatGPT lobotomized" and you'll see plenty of complaints about the safety filters added by OpenAI.
I'm much more comfortable with the idea of AI watermarking images it creates instead of refusing to create images because of "safety", which in practice more often means not wanting to offend anyone. Imagine if word processors like Google Docs refused to write things you wanted to write because of mature themes. The important thing, in my opinion, is to make it a lot more difficult to pass off AI generated content as being authentic and to make provenance traceable if you were to do something like create revenge porn with AI, but not to make AI refuse to create explicit material at all.
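For the curious, the crudest possible version of the watermarking idea looks something like the sketch below. This is a naive least-significant-bit scheme for illustration only; the schemes vendors actually deploy (e.g. Google's SynthID) are far more robust to cropping, compression, and re-encoding. The function names and the pixel representation here are purely illustrative.

```python
# Naive LSB watermark sketch: embed a provenance tag into the blue
# channel of an image represented as a flat list of (r, g, b) tuples.
# Illustrative only - real watermarks survive re-encoding; this one
# would not.

def embed(pixels, tag_bits):
    """Overwrite the LSB of the blue channel with tag bits."""
    out = []
    for (r, g, b), bit in zip(pixels, tag_bits):
        out.append((r, g, (b & ~1) | bit))
    out.extend(pixels[len(tag_bits):])  # remaining pixels unchanged
    return out

def extract(pixels, n):
    """Read back the first n embedded bits."""
    return [b & 1 for (_, _, b) in pixels[:n]]

img = [(10, 20, 30), (40, 50, 61), (70, 80, 90), (1, 2, 3)]
tag = [1, 0, 1]
assert extract(embed(img, tag), len(tag)) == tag
```

The point is that the tag is invisible to a viewer (each blue value changes by at most 1) but mechanically recoverable, which is the property a provenance system needs.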
It being authentic or not isn't actually important in a lot of cases though. Consider someone like Mia Janin, who recently took her own life after being harassed using deepfakes. Everyone understood that the images weren't "authentic", but their power to cause distress was very real.
Ease of use and accessibility. Think of how we control access to guns even though a baseball bat could also be used to kill or maim someone.
I agree that we should legislate against the aggressors, that's why I'm pointing out the limitations of technical solutions like watermarks. We need extensions to things like revenge pornography laws, if we're talking about legislation, and I don't see any harm in outlawing services that automate the creation of deepfakes.
Of course the only "solution" is that we would universally get behind teaching young boys that they are not entitled to women's bodies or their sexuality, but so many grown men apparently disagree that I can't see it happening quickly enough.
I'm one of the grown men who disagree. I don't think treating half the population as pre-criminals, when in reality it's an extremely tiny minority who act in this way, is a particularly good solution. If we were to apply this kind of "solution" to all undesirable behaviours exhibited by deviant minorities of both men and women I doubt there'd be any time for any actual K-12 formal education.
I suggest we clearly specify the class of problem we're trying to solve and come up with a principled solution that would make sense when applied consistently and universally to all problems in that class. I prefer this over coming up with knee-jerk moral panic patches (e.g. "censor generative models so they can't generate this particular thing I find distasteful") or with overly abstract and tangential problem-solutions (e.g. "just teach men not to be big jerks").
I think the central issue here is: what restrictions, if any, should be placed around creating and distributing a likeness of another person? Are we just looking to prohibit pornographic likenesses, or do you think the restrictions should be broader? What's the threshold rule you would apply? Should these rules be medium-specific, or should we also prohibit people from, say, painting a likeness of another person without their consent?
I guess in a US context you'd also have to consider whether it's constitutional to restrict freedom of expression, even the distasteful ones, in this manner.
Edit: Just saw your edit suggesting that I think "men are entitled to women's bodies" (whatever that means). I think I'll end my participation here, not interested in having a bad faith discussion.
I'm not in the US but I understand that there are already laws which limit your expression, such as in cases of CSAM or revenge pornography, which might be the closest analogue.
Personally, the limits I'd draw are similar to that, as I'm most interested in fighting sexual harassment. Legislation against revenge pornography already faces and tackles the questions of what constitutes pornography and when it becomes illegal to disseminate pornographic images of others, so it's not an intractable problem.
Indeed, we also have precedents for limiting the use of tools for certain purposes. Using deepfake technology to generate images akin to CSAM would already be illegal in the UK, but other broader and everyday examples exist like speed limits for cars.
Edit to respond to yours: I said above that we should teach boys they're not entitled to women's sexuality, but that many men disagree. You said you were one of them. I meant that the disagreement was with the entitlement itself, but I now see you may have taken it as disagreement with teaching about that entitlement. It was a misunderstanding, and I was responding in good faith. I didn't suggest anything about you; I asked whether my interpretation of your response was correct.
Fair enough on your edit, I accept it was a misinterpretation and appreciate the clarification.
The precedents you raise are worth considering. They're related but not completely analogous to deepfake porn of real people in my view. CSAM is criminalised due to the direct harm its production inflicts on minors and the deep injury to society that follows. Deepfake CSAM, I presume, has more of an 'obscenity' rationale as there is no actual direct harm inflicted on minors in that case. I suppose you could have a similar obscenity rationale for criminalising deepfake porn but you would then have to accept that pornography in general should be outlawed. An obscenity rationale would also be more supportive of criminal sanctions, as acts of obscenity injure society in addition to individual subjects.
I think revenge pornography is the best analogy here. I assume the policy rationale / theory of harm for criminalising 'revenge porn' (i.e. distributing true intimate private images of another person) is one of two things: (1) violation of the subject's privacy or (2) infliction of psychological harm on the subject. If the policy rationale is (1) then I don't think there's a sound analogy to deepfake porn - deepfakes are fictional and so do not violate the privacy of the subject.
If the rationale is (2), psychological harm, then I could see a similar policy rationale for legislating against deepfake porn. But if psychological harm is your policy rationale then wouldn't it make more sense to directly criminalise the infliction of psychological harm on others regardless of the method used? If we were regulating on a principled and universal basis we should pass a law that criminalises any act, publication or utterance that inflicts psychological harm on another person, rather than using the law to solve single instances of this class of offences. Although I'd strongly disagree with such a law due to the chilling effect it would have on all forms of speech, expression and public commentary I think there's at least a principled argument to be had.
But if you legislate on this principle then you have to grapple with the far reaching implications of such a law - if someone writes some smutty (but fictional) erotic story about me that I find psychologically distressing should they then be thrown in jail? What if they say hurtful things to me that I find psychologically harmful? What if they insult a religion or political candidate, party or ideology that I strongly identify with? We all inflict psychological harm on others from time to time - what should the minimum harm threshold be?
Personally, I don't think the criminal law is the answer in either the deepfake or revenge porn cases if the rationale is 'psychological harm'. Although I'm not sure where I stand on the following, I think a civil tort for infliction of psychological harm would be the sanest option if we feel the need to regulate against infliction of psychological harm. It would be analogous to defamation and libel torts, but instead of having to prove economic harm the plaintiff would have to prove some minimum threshold level of psychological harm to become entitled to compensation from the defendant in proportion to the actual provable injury sustained.
My thoughts aside, what is your general theory of harm / principled policy rationale here and, on that basis, what do you think the state's response should be to regulate?
Good information. Then the solution would be to improve harassment legislation rather than limiting the availability of tools. Just as assault is illegal but we don’t require all hammers to be made out of foam.
The number of people who committed suicide after being harassed with memes or emoji must be higher than those harassed with deep fakes. Too bad nobody is interested enough in banning emoji to do a study.
Why stop there? I think we'd all agree that "mean words", either written or spoken, have immense power to "cause distress" and have driven many a person to suicide. We should ban those.
We do. Incitement to violence or "true threats" for example already fall outside of 1st amendment protections. I personally see deepfakes created or disseminated for harassment purposes as an act of violence.
I'm neither referring to "true threats" nor making any kind of argument about the US constitution, so I'm unsure why you're bringing these up. I thought it would have been pretty clear that in the context of driving people to suicide, I was suggesting banning "insulting words". Hope that clarifies things for you.
It seemed like you were making a sarcastic comment about the policing of harmful communications. I mostly hear such arguments from US citizens so wanted to point out that even the US has precedent for limiting such expressions to deter violence.
Can you link to an article where emojis have been pivotal in a harassment case leading to suicide?
They've been around for 30 years, deepfakes as they are today have been around for less than a year. I'm not sure absolute numbers are the best thing to look at either way.
The emoji and meme industry has been hiding these stats for years. Facebook, X, TikTok, Apple, etc have the data you want but they won't give it to you.
We've already seen this progression - they debuted Magic Eraser as a cloud feature, then with the Pixel 8 they got it running locally on the device. But they also introduced Magic Editor with the Pixel 8, running on the cloud, and the Pixel 9 or 10 will probably run it on-device.
1. they made a lot of careful tweaks to the UNet architecture - it seems like they ran many different ablations here ("In total, our endeavor consumes approximately 512 TPUs spanning 30 days").
2. the model distillation is based on previous UFOGen work from the same team https://arxiv.org/abs/2311.09257 (hence the UFO graphic in the diffusion-gan diagram)
3. they train their own 8-channel latent encoder / decoder ("VAE") from scratch (similar to Meta's Emu paper) instead of using the SD VAEs like many other papers do
4. they use an internal dataset of 150m image/text pairs (roughly the size of laion-highres)
5. they also reran SD training from scratch on this dataset to get their baseline performance
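For some intuition on point 3, here's a back-of-envelope sketch of why the latent channel count matters, assuming the standard 8x spatial downsampling used by latent-diffusion VAEs (the 4-channel figure for the SD VAE and the 8-channel one for MobileDiffusion/Emu come from the respective papers; the helper function is mine):

```python
import math

# Latent-shape arithmetic for a latent-diffusion VAE with the usual
# 8x spatial downsampling. SD's VAE encodes to 4 latent channels;
# MobileDiffusion (like Meta's Emu) trains an 8-channel one.

def latent_shape(height, width, channels, downsample=8):
    """Shape (C, H/downsample, W/downsample) a VAE encoder would produce."""
    return (channels, height // downsample, width // downsample)

sd_latent = latent_shape(512, 512, channels=4)  # (4, 64, 64)
md_latent = latent_shape(512, 512, channels=8)  # (8, 64, 64)

# Doubling the channels doubles latent capacity at the same spatial
# resolution, letting the decoder recover finer detail per location.
print(math.prod(sd_latent), math.prod(md_latent))  # 16384 32768
```

The trade-off is that the diffusion UNet then has to model an 8-channel input, but for a mobile model that's cheap compared to the win of keeping the spatial grid small.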
Kind of funny that they show the iPhone 15 Pro and the Samsung S24 in the comparison chart, but not their own phone, the Google Pixel 8. (I know it would perform worse than both phones.)
Pixel marketing touts the NPU chip, which sounds ideal for accelerating a model like this. They might have reasons for leaving it out. Perhaps it's planned to ship with a new model - announcing a new phone in a model paper would be weird. I remember them advertising some Pixel-only photo editing features in the past.
It's great in theory until you have to pierce through several organizational boundaries to actually leverage it. Too sclerotic for it to happen regularly at scale.
Not entirely true either. If it thinks it has a network connection but it's flaky, it won't fall back to offline translation; it will report a network error and give you a retry button, with no option to translate offline.
Additionally, in airplane mode it's very reluctant to translate; in my case I have to go to saved translations, as otherwise it won't even let me type what I need to translate.
I just tried airplane mode on my pixel 7 pro and it seemed to be able to translate from the camera without problems
It doesn't seem to do it "live" in the preview without network access, though.
And the translation app seemed to get into a bad state: it said I needed to download the language pack even though the language list showed it was already installed, and it couldn't download the pack until I cleared the app's data. Though I hadn't opened it since transferring it from my old phone, so if there's some phone-specific state involved, that might have got messed up.
I don't see it as a disadvantage, since Google markets services on both devices you mentioned. Hardly anyone will abandon their iPhone in favor of a Pixel just for a Google service.
So I think what Google did marketing-wise is fine.
Google has fallen so far. Both Inception and MobileNet were released openly and changed the entire AI world.
Nowadays we just get blog posts about results that were supposedly achieved, an accompanying paper that can’t be reproduced (because of Google’s magical “private datasets”), and some screencaps of a cool application of the tech that is virtually guaranteed to never make it to product.
It did work, I used it myself. A quick search shows others who had my experience. This was late 2019 for me. Here's the first link and the Google post on rolling out in summer 2019:
"Now, you can use it on all Pixel phones in 43 U.S. states.
All it takes is a few seconds to tell your Assistant where you'd like to go. Just ask the Assistant on your phone, “Book a table for four people at [restaurant name] tomorrow night.” The Assistant will then call the restaurant to see if it can accommodate your request. Once your reservation is successfully made, you’ll receive a notification on your phone, an email update and a calendar invite so you don’t forget."
Are people at Google Research not embarrassed that none of this stuff ever makes it to real life?
Google AI internally needs a huge culture change: stop acting like academics making things for academics and start working like developers making products for customers.
I'd say in 10 years we'll be looking back and seeing the wasted potential, but actually you can already look back around 10 years and see the wasted potential of all the things Google demoed or papered and never shipped.
People search and remember things visually, even if they're not consciously aware of it. So on the Cybershow [0] we decided to jump in and use AI images as a quick way to visually tag episodes with something meaningful and fun.
We did that despite some moral ambivalence/uneasiness around AI "art".
For example: give me a "young and exciting Dana Meadows in front of a board of systems theory".
I'm not awful at photoshopping things, and sometimes that's the only way to get a specific image one has in mind. But AI saves time and lets us concentrate on writing and researching instead.
TBH if an artist/illustrator came along and said "Let me do the episode icons even though you can't pay me yet" I'd feel inclined to ask the AI to step aside.
Google may very well be the first to create AGI, but it will be wrapped in so many "safety" layers that it will be effectively lobotomised. Let's just hope a Google AGI never gets to watch A Clockwork Orange.
AI researchers can make any claim; the risk of getting busted is close to none.
Didn't work?
Well, the dataset was different.
Didn't work?
Well, the code was different.
Can I try your work?
Well, it's proprietary / I don't have access / we shut down the cluster.
But the result is a guaranteed increase in salary and job opportunities.
Since these companies are publicly listed, they are by design incentivized to make grandiose claims to appear more attractive to investors, and they can blame individual employees if the claims are ever exposed.
My favorite was the claim that Bard (PaLM version) is sentient, but that one was too big this time.
Imagine a large pharmaceutical company claiming it can cure very important diseases while the results can be neither independently verified nor audited. That's fine in the short term, but not when you keep making the claim for years.
The point is probably the implication that it'll be pushed to Android as a native feature (in the Photos app or similar), making it sensible for investors reading this to put money into Google rather than, e.g., Stability or OpenAI. The people writing the article are likely shareholders.