I work with ad data a lot in my job, and there are a lot of misconceptions about this data that journalists love to propagate:
The location data in these networks is very inaccurate. Your OS and browser actually do a pretty good job of locking down your location data unless you give explicit permission. It's in the ad network's interests to lie about the quality of their data - so a lot of the "location" data is going to be a vaguely accurate guess based on your IP address.
But also, location data is really important to ads right now because, contrary to common perception, per-user tracking is very, very hard. Each SDK might be tattling on you, but unless you give them a key to match you across apps, each signal from each app is unique. Which is why you are often served advertisements based on what other people on your network are searching for - it's much easier to just blast everyone at that IP address than it is to find that specific user or device again in the data stream.
Bidstream data in particular is very fraught. You're only getting the active data at the point the ad is served, and it's not easy to aggregate in any way. You'll be counting the same person separately dozens or hundreds of times with different identifiers for each. The data you get from something like Mobilewalla is not useful for tracking individuals so much as it's useful for finding patterns.
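To make the double-counting point concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the records, the identifier strings, and the field layout are assumptions, not a real bidstream schema. In particular, the "person" column is ground truth that a real bidstream would never contain.

```python
from collections import defaultdict

# Hypothetical bid requests: (person, app, identifier_seen_by_bidder, ip).
# "person" is ground truth added only for this illustration.
bid_requests = [
    ("alice", "weather", "id-w-1", "203.0.113.7"),
    ("alice", "game",    "id-g-9", "203.0.113.7"),
    ("alice", "news",    "id-n-4", "203.0.113.7"),
    ("bob",   "weather", "id-w-2", "203.0.113.7"),
    ("bob",   "game",    "id-g-5", "203.0.113.7"),
    ("carol", "news",    "id-n-8", "198.51.100.20"),
]

# What actually exists vs. what a naive count of identifiers reports.
true_people = {person for person, _, _, _ in bid_requests}
seen_identifiers = {ident for _, _, ident, _ in bid_requests}

# Grouping by IP is the coarse fallback: it lumps a household together
# instead of separating the individuals in it.
ids_per_ip = defaultdict(set)
for _, _, ident, ip in bid_requests:
    ids_per_ip[ip].add(ident)

print("real people:", len(true_people))
print("distinct identifiers:", len(seen_identifiers))
for ip, idents in sorted(ids_per_ip.items()):
    print(ip, "->", len(idents), "identifiers")
```

Three people show up as six "users" if you count identifiers, and the only thing tying five of those identifiers together is the shared IP address, which points at a household rather than a person.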
I think it's pretty telling from the few examples shared about how agencies actually use the data:
>"CBP uses the information to “look for cellphone activity in unusual places,” including unpopulated portions of the US-Mexico border."
>According to the Wall Street Journal, the IRS tried to use Venntel’s data to track individual suspects, but gave up when it couldn’t locate its targets in the company’s dataset.
>In March 2021, SOCOM told Vice that the purpose of the contract was to “evaluate” the feasibility of using A6 services in an “overseas operating environment,” and that the government was no longer executing the contract
Something is going to have to be figured out about this data - realistically the only way is a sunset on customized advertisements. However, I would personally not be worried (yet) that the government is going to be able to identify an individual and track them down using these public sources as they currently are.
I worked in ad-tech for a year before I left the tech industry as a whole. I've also done a fair bit of investigative journalism.
Let me share a thing:
Factual, a company that specializes in hyperlocal geofencing, uses geofences much smaller than their industry's own self-regulatory rules allow. I learned this after a coworker quit because our company was allowing ad targeting to people using these smaller geofences. The whole company had an all-hands about it where the CEO told everyone that we were not going to stop using Factual nor the smaller-than-allowed geofences because we, ourselves, were not the ones producing those geofences. We were just a man in the middle helping to build a system to track people at high resolution.
Please try to reconcile with what your industry has and continues to destroy.
>Please try to reconcile with what your industry has and continues to destroy.
I don't see anything contradictory between your comment and the OP. Having an amoral CEO who condones breaking geotargeting self-regulation doesn't contradict OP's claim that it's hard to tie geotargeting data in bidstreams back to a particular person.
Only one person/company has to solve any given hard problem before they can sell it to interested parties. Who might lose it in a data leak, or package it up and re-sell it, etc, etc.
For example, it was very hard for me to identify myself in an anonymized public dataset of vehicle trips, but I did. It was also hard to FOIA for the documents showing them writing SQL to spot my trip.. but I did.
There definitely is and I've definitely pitched it to places. The Intercept had interest but told me that they wanted me to build the story out more to be less focused on Chicago. I understand where they were coming from (and the others who said the same thing) but it wasn't possible for me to continue doing freelance work, so no stories ended up being published about it at all.
First thing would be that a small geofence (i.e., a narrow search on available data) is entirely orthogonal to having high-precision, high-quality location data available.
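A rough Python sketch of why fence size and fix quality are separate things. All coordinates, radii, and field names here are hypothetical, and the "trustworthy" check is a crude heuristic chosen for illustration, not something any vendor is known to use.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    r = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def fence_match(fix, fence):
    """Return (inside, trustworthy) for a location fix against a circular fence.

    inside: the reported point falls within the fence radius.
    trustworthy: the fix's own error radius is no larger than the fence
    (an assumed rule of thumb for this sketch).
    """
    d = haversine_m(fix["lat"], fix["lon"], fence["lat"], fence["lon"])
    inside = d <= fence["radius_m"]
    trustworthy = fix["accuracy_m"] <= fence["radius_m"]
    return inside, trustworthy

# A hypothetical 50 m fence and two fixes reporting the same point.
fence = {"lat": 41.8800, "lon": -87.6300, "radius_m": 50}
gps_fix = {"lat": 41.8801, "lon": -87.6301, "accuracy_m": 10}    # permissioned GPS
ip_fix = {"lat": 41.8801, "lon": -87.6301, "accuracy_m": 5000}   # IP-based guess

print(fence_match(gps_fix, fence))
print(fence_match(ip_fix, fence))
```

Both fixes "match" the tiny fence, but only the first one means anything. Drawing a small circle costs nothing; having data good enough to fill it is the hard part.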
I won't claim with certainty that this is the case, but it seems likely that Factual was overselling their capabilities. That, or they relied specifically on having users grant high precision location data access and had nothing otherwise.
Apps that already need location data are probably the most likely sources of collecting such data - food apps, dating apps, chat apps you have sent your location in, ...
There are whole companies that de-anon ad data as a service. Which gives lots of data brokers the ability to skip the last mile and feel good about themselves. It’s a joke.
I remember when the first article was posted. Their method requires two parallel corpora, e.g. people who write on both LinkedIn (under their real name) and Reddit.
Also, people who post under their real name are likely to write with their real voice:
> Any deanonymization setup with ground truth introduces distributional biases. In our cross-platform datasets, the profiles are likely easier to deanonymize than an average profile: the very fact that ground truth exists implies that the user may not have cared about anonymity in the first place. Similarly, two split-profiles of a single user are inherently alike, whereas two pseudonymous accounts of the same person (e.g., an official and a pseudonymous alt account) might expose more heterogeneous micro-data.
But dude... just think of all the optimal personalized mattress sales they can do with that data. I mean, people that use the bathroom at 3:57pm for seven minutes are 0.00138% more likely to buy a new mattress within the next six months. They need that data. Think of all the unsold mattresses.
At this point, your device is not giving anyone your location without explicit permission. So it really just comes down to your IP Address, which services do need.
I think your statement is inaccurate to the point of being intentionally misleading:
Many devices, when running, and in some cases even if turned off but connected to their battery, will ping cell towers (maybe even BLE/Wifi) and get triangulated by the network infrastructure (such as cell towers) without actively broadcasting the GPS location.
That's why I don't quite understand why the gubernment needs to have finer grained data (esp around the US/Mexican border). Precision location info would only be needed if you need to track people in densely populated areas.
That location information is not available to apps or ad networks without user consent. The government can access it from the carrier with a warrant, but that's not what we're discussing here.
Carriers have also sold customer location data, no search warrant required. Though we can rest assured that the FCC has slapped the carriers' wrists with the utmost seriousness.
IP doesn't handle roaming very well. If you got routed onto the internet directly from your local cell tower, then your connections would drop whenever you switched to a different tower, which is somewhat suboptimal. Cell networks handle it at a lower level and route your traffic through a central location which serves as the origin of your IP traffic. Geolocate your IP while on cell data and you'll probably see something pretty far away from where you are. My phone's IP address at the moment is about 400 miles away from the actual phone.
IP Address is all you need to get fairly accurate (town or neighborhood) location for most of North America.
But it is necessary to send it somewhere, otherwise the internet wouldn't work.
Unfortunately it seems to have become accepted for our devices to communicate constantly and often with services we never explicitly started communication with (like Ad networks used in Apps).
Permission systems on devices should care about Network connections just as much as Location. Ideally when installing an app you'd get the list of domains it requests to communicate with, and you could toggle them. Bonus points if the app store made it a requirement to identify which Domains are third parties and the category like an Ad service.
If you use Google Location Services, which is stock install on basically all Android devices, it absolutely is uploading "anonymized" GPS data all the time.
I think the issue here is one of informed consent. You might say, "OK, this makes sense" when agreeing to location data for a weather app. In the context of whether it's going to hail soon, location is reasonable. What you only see in those GDPR-type banners is that the data is being re-sold off to 1001 "partners", none of whom are important for my hail-to-head concerns. Never mind all the cases where it's re-sold on to all the governments and personal-level creeps through aggregators.
Well, in the case of a company trying to market to you, it literally _is_ their business. It makes them money.
The problem is that we have markets where we:
- Incentivize organizations to pursue profits at the expense of everything else, which includes social good and civic rights
- Rarely hold bad actors accountable (and almost never in a timely manner)
Which means, given enough time, we're always going to trend to whatever makes the most money. Targeted advertising makes money, and will continue to do so unless or until we collectively decide to make it a greater risk to profits than it is today.
i'm not confident they know where i am at all. i routinely get ads on social media for places (super random US states, cities, etc.) nowhere near where i live (SF Bay Area).
The government wants to know that. They don't need to know.
CBP doesn't care where I am. Unless they make a mistake and think I'm an illegal immigrant. Or they decide to teach a lesson to someone who's critical of them.
Exactly, people are going to be logged into these apps with trackable identifiers. You can see it with tracker control on Android: download a new app and watch existing apps report things to new trackers, which seems to be happening at the SDK level.
> Each SDK might be tattling on you, but unless you give them a key to match you across apps, each signal from each app is unique
Aren't there many examples of these? For example IMEI, IMSI, phone number, etc?
Even without "unique" signals, isn't it fairly trivial to identify a user with a handful of "not very unique" signals? User-agent, a few recent IP addresses, browser capabilities, list of installed apps, device operating system properties, etc?
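To make the question concrete, here is a toy Python sketch of how combining a few "not very unique" signals shrinks the crowd a device can hide in. The device list and attribute names are invented for illustration, and the numbers say nothing about real-world rates.

```python
from collections import Counter

# Hypothetical devices; every value here is made up.
devices = [
    {"os": "Android 13", "model": "Pixel 7",    "tz": "UTC-8", "lang": "en-US"},
    {"os": "Android 13", "model": "Pixel 7",    "tz": "UTC-5", "lang": "en-US"},
    {"os": "Android 13", "model": "Galaxy S23", "tz": "UTC-8", "lang": "en-US"},
    {"os": "iOS 17",     "model": "iPhone 14",  "tz": "UTC-8", "lang": "en-US"},
    {"os": "iOS 17",     "model": "iPhone 14",  "tz": "UTC-5", "lang": "es-US"},
    {"os": "iOS 17",     "model": "iPhone 14",  "tz": "UTC-8", "lang": "en-US"},
]

def unique_fraction(devices, keys):
    """Fraction of devices whose combination of `keys` is shared with no one else."""
    fingerprints = [tuple(d[k] for k in keys) for d in devices]
    counts = Counter(fingerprints)
    unique = sum(1 for fp in fingerprints if counts[fp] == 1)
    return unique / len(devices)

for keys in (["os"], ["os", "model"], ["os", "model", "tz"], ["os", "model", "tz", "lang"]):
    print(keys, round(unique_fraction(devices, keys), 2))
```

In this toy set no device is unique by OS alone, but most are once three signals are combined, and two identical devices stay indistinguishable no matter how many of these signals you add. Whether that works on a real bidstream depends on whether those signals are stable and actually present in each request.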