This was a challenge for blekko when we built our search engine. A major component of ranking a URL is the cumulative rank of the domain it is hosted on. You don't want URL ranking to leak between independent sites sharing the same top level domain. Blog hosting sites are a prime example.
Another factor was budgeting crawl resources. The crawler has a lot of pages to crawl and if you let it, it'd just do a deep dive on amazon.com and never come back. So dividing crawl budget between domains and subdomains is important. As noted in the article, it can be gamed so you have to guard against that algorithmically and occasionally make special cases.
Prior to the Public Suffix List (https://publicsuffix.org/) project I was using the Mozilla project's list of top level domains. I had code that would download the latest TLD list from Mozilla (and later from publicsuffix.org) and generate a trie that the blekko crawler could traverse to obtain the TLD and the subdomain. Blekko kept all sorts of data on every domain and subdomain, including (but not limited to) domain rank, host IPs, country, language, average porn score per page, etc.
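The trie approach described above can be sketched roughly like this. This is an illustration, not blekko's actual code: the rule set is a tiny hard-coded stand-in for the full publicsuffix.org list, and the wildcard handling follows the PSL's "*" convention (one label), with labels stored in reverse order.

```python
# Hypothetical mini rule set standing in for the full public suffix list.
RULES = ["com", "uk", "co.uk", "blogspot.com", "us", "fl.us", "*.fl.us"]

def build_trie(rules):
    """Store each rule label-by-label in reverse order; '$' marks a rule end."""
    root = {}
    for rule in rules:
        node = root
        for label in reversed(rule.split(".")):
            node = node.setdefault(label, {})
        node["$"] = True
    return root

def public_suffix(host, trie):
    """Return (public_suffix, registrable_domain) for a hostname."""
    labels = host.lower().split(".")
    node, match_len = trie, 0
    for i, label in enumerate(reversed(labels), start=1):
        nxt = node.get(label) or node.get("*")  # '*' matches any one label
        if nxt is None:
            break
        node = nxt
        if "$" in node:
            match_len = i
    match_len = max(match_len, 1)  # unknown TLDs: treat the last label as the suffix
    suffix = ".".join(labels[-match_len:])
    reg = ".".join(labels[-(match_len + 1):]) if len(labels) > match_len else None
    return suffix, reg
```

With this, `public_suffix("foo.blogspot.com", trie)` yields `("blogspot.com", "foo.blogspot.com")`, which is exactly why independent blogs on the same host don't pool their domain rank.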
Interesting aside: the TLD .US (https://en.wikipedia.org/wiki/.us) has some "weird" rules that must have seemed like a good idea when they were proposed, but complicate parsing. For instance, in the hostname town.windermere.fl.us the "town" is significant to differentiate it from a potentially different entity hosted on co.windermere.fl.us (the county of Windermere, Florida, if such a thing exists). Thankfully, few cities/counties/villages/etc use these convoluted .us domains. Many opt for a more traditional TLD (seattle.gov for instance).
What would make a lot more sense is a DNS mechanism for domains to indicate their subdomains are trusted/untrusted with parent domain cookies, possibly with a white/blacklist mechanism.
It would be a lot more manageable than some centralized list that tries to capture every user controllable content domain on the internet.
DNS requests don't need to be serial and block the page load; it's easy to issue a burst of them at once. Virtually every user agent already does this, so I don't see the downside.
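The "burst of lookups" idea can be sketched with the standard library alone. Real user agents use their own async resolvers; this is just a thread-pool illustration of the latency point, resolving several names concurrently instead of one after another.

```python
import socket
from concurrent.futures import ThreadPoolExecutor

def resolve_all(hostnames):
    """Resolve many hostnames in one burst; returns {host: addresses or error}."""
    def resolve(host):
        try:
            infos = socket.getaddrinfo(host, None)
            return sorted({info[4][0] for info in infos})
        except socket.gaierror as err:
            return err
    with ThreadPoolExecutor(max_workers=max(1, len(hostnames))) as pool:
        return dict(zip(hostnames, pool.map(resolve, hostnames)))
```

The total wall-clock time is roughly the slowest single lookup, not the sum, which is what makes extra per-domain queries tolerable — right up until one of them stalls, as the replies below note.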
Not sure why you feel there is arbitrarily "too many", do you have a concrete objection?
Extra queries are a gamble that you don't encounter latency on any of the important ones, right? For example, on mobile data when you're hitting packet loss.
I'm not so fussed about extra queries when they do not delay the user. But if we send out a RDBD query that a resolver doesn't respond to in a speedy manner, what does the user-agent do if it needs to decide how to scope a cookie? You can't exactly soft-fail something like that ...
This has been a common suggestion since before the Public Suffix List existed, as you can see from the linked issues in the text (and the references to the IETF DBOUND WG).
Like most things, on first glance, it seems like it does make sense. Except it has a lot of issues, which you can see have been discussed for 15 years without resolution, even though yes, it would scale better.
Ouch. I don't think many people consider the implications of CNAMEing blog.example.com to blogspot.com, store.example.com to shopify.com, chat.example.com to discord.com, etc.
Is the attack something like a malicious site operator CNAMEing subdomain.evil.com to yourbank.com, which can then be accessed from evil.com? The same-origin policy would still block that, right?
Not sure if that's what GP is referring to, but one thing it would do is increase your attack surface...
If I set up bank.mycompany.com and discourse.mycompany.com as CNAMEs, bank.mycompany.com sets a cookie at mycompany.com, and discourse.mycompany.com is hacked, the latter could be used to read the cookies from the first. That wouldn't have happened without the CNAMEs, or if mycompany.com were on the PSL.
Lots of "ifs" and "buts" though before something like that would happen in practice...
That has no relation to the use of CNAMEs—this problem exists even if CNAME records aren’t used.
If “z” is a public suffix, x.y.z can set and read cookies from x.y.z and y.z, but not z. Whether this is possible has nothing to do with whether x.y.z is a CNAME.
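That cookie-scoping rule can be stated as a small check: a host may set a cookie for itself or for any parent domain it domain-matches, but never for a bare public suffix. The suffix set here is a hard-coded stand-in (with "z" playing the public suffix from the example above), not a real PSL lookup.

```python
# Hypothetical suffix set for illustration; real code would consult the PSL.
PUBLIC_SUFFIXES = {"com", "z"}

def can_set_cookie(host, cookie_domain):
    """True if `host` may set a cookie scoped to `cookie_domain`."""
    host = host.lower().rstrip(".")
    dom = cookie_domain.lower().strip(".")
    if dom in PUBLIC_SUFFIXES:
        return False  # never allow a cookie scoped to an entire public suffix
    # Domain-match as in RFC 6265: exact match, or host is a subdomain of dom.
    return host == dom or host.endswith("." + dom)
```

Note that nothing in the check looks at DNS: whether x.y.z happens to be a CNAME is invisible to it, which is the point being made here.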
The only implications of which I’m aware are in relation to web servers that serve the same content regardless of the Host header. It is then possible for someone to log into and interact normally with the website, but the cookies will be associated with the domain controlled by the attacker. The attacker can then change where the domain points later and retrieve the cookies.
This sort of attack is rather obscure, though. It’s a lot more effective to create a phishing site with just a log in form—or, if you want to get fancy, even proxy everything directly from the real website. You can then steal passwords immediately, rather than stealing session cookies after-the-fact. This works even against servers that pay attention to the Host header.
Edit: After re-reading the relevant portion of the article, I believe the author is referring to something similar to this invalid scenario:
1. Alice operates a web store at alice.example.
2. To make purchases, alice.example makes HTTP requests to api.alice.example from within the browser.
3. api.alice.example doesn’t pay attention to Host headers.
4. Mallory operates a malicious website at mallory.example.
5. Mallory cannot make requests to api.alice.example from mallory.example—that would break the same-origin policy. She works around this by configuring api.mallory.example as a CNAME to api.alice.example.
6. Bob has an account at alice.example, and is currently logged in.
7. Mallory tricks Bob into visiting mallory.example.
8. The content loaded by mallory.example makes a request to api.mallory.example indicating that Bob is making a purchase.
This doesn’t work—Bob is safe, and Mallory doesn’t score any loot. When the browser makes the request to api.mallory.example, it will pass cookies from api.mallory.example and mallory.example, not api.alice.example and alice.example. The browser pretends it has no idea that api.mallory.example and api.alice.example point to the same server.
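Why step 5 fails can be shown with a toy cookie jar: the browser selects cookies by the request URL's hostname alone and never consults what that name resolves to. The names below come from the scenario above; the jar contents are made up for illustration.

```python
# Bob's cookie jar after logging in at alice.example (hypothetical values).
JAR = {
    "alice.example": {"session": "bob-logged-in"},
    "api.alice.example": {"api_token": "secret"},
}

def cookies_for(host, jar):
    """Collect cookies whose domain matches `host`, ignoring DNS entirely."""
    sent = {}
    for domain, cookies in jar.items():
        if host == domain or host.endswith("." + domain):
            sent.update(cookies)
    return sent
```

A request to api.mallory.example carries no alice.example cookies at all, even though the CNAME makes it hit the same server, so Mallory's forged purchase request arrives unauthenticated.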
In order for this to work, Bob would need to log in again, but this time at mallory.example—and this brings us back to my original point; it’s easier and more effective to set up a standard phishing page without any CNAME shenanigans.
While I agree with most of the author’s points, this one baffles me. I don’t see any realistic security implication of what the author has described in this scenario. There are theoretical implications, but in every case it seems there are far easier and more effective attacks. Furthermore, it’s nearly impossible to solve the problem described here; the proposed solutions the author mentions for replacing the Public Suffix List don’t resolve the CNAME threat I’ve described.
I have a button to clear the current domain's cookies. I use the PSL to determine the eTLD so that I don't blow away unrelated cookies, but I'm also not limited to whatever sub-domain I'm currently on.
Not sure how to do this without the PSL after reading the post.
One of the uses of the PSL is to separate mutually distrusting users to whom you provide a domain (e.g. appspot.com). With the PSL, everything is set up to prevent accidental cookie leakage... a perfect case of 'secure by default' for new customers, without requiring every developer to properly implement origin policies.
As far as I can see, the PSL is still the only way to provide that security in a failsafe way. So let's perhaps redefine the PSL for just security and privacy and slowly move away from attempts to use it for quota enforcement, but let's not replace it with 'hope' just yet...
Disclaimer: had a PR approved for the PSL a couple of weeks ago, to be able to provide users I don't necessarily trust with development subdomains to play with but provision them with wildcard letsencrypt certificates (which is easiest if you just manage the DNS for them...)
That's kinda the point. Right now the PSL is the best solution for a lot of things, but it fails at all of them.
The PSL is compiled into most web libraries (as a security measure), so those domains are only truly separate if the browser is new enough to have the updated version of the PSL. In other cases they all look like the same domain and you get no security and no privacy. In case you think this unlikely, how many people still use cell phones that are no longer being updated? How many "smart" TVs or other web-enabled devices aren't getting updates?
Of course, if you have a way of ensuring that browsers whose PSL is too old can't access your site or any of its subdomains then you might have some guarantee of security.
The point is that we need something that works. There are a few ideas out there. First party sets would do a lot for security and privacy, but there are issues with using them for reputation and attribution.
My Perl TLD parsing code is here: https://github.com/randomstring/Net-Domain-PublicSuffix