What Do You Do
So What Do You Do?
Obviously, people ask one another this question all the time, and usually my answer is “Oh, I’m a software engineer”, and that’s the end of it. But of course it’s not the end of it at all. If it’s asked in the presence of, or by, a software engineer, then the conversation will likely continue with “Oh, what sort of thing?” and comparisons of tech stacks ensue. The words “Open Banking” or “Oh, digital identity stuff” get bandied about. Et cetera.
Sometimes, though, people actually want a real answer. And to be perfectly frank, it’s a tough one to give, because what we in the Open Data world actually do is pretty hard to explain. It’s not like we make a product that you can point to and say “I made that”. It’s such a complex field, with so many moving parts, that it lacks a really useful elevator pitch. All the marketing material is opaque and jargon-filled, and of course leaves one no closer to knowing what it’s all about than before one read it. This isn’t to say it’s too complex for people who aren’t involved to understand, and I’m not claiming any special superpowers just because I do understand it. It’s mostly that it isn’t a product so much as a set of solutions to problems people might not even realise exist.
It’s such a problem, in fact, that recently the question “What do we do?” was asked by someone in the industry, of their colleagues in the industry, and nobody had a decent answer. It was a bit embarrassing, to be honest.
So I’m going to write a series of posts that walk us through the various problems we solve, in the hopes that it might become clearer. In the hopes, also, that non-technical people might stand a chance of understanding it, I’m going to ignore where it all started - Open Banking - and talk about data-sharing in a more familiar context.
And I’ve already screwed up by dropping the term “data-sharing” in there as if everyone knows what that is.
So Data Sharing, eh? Why?
You’ve probably all done this already. You’ve probably all used social media. Let’s start there. Let’s start with Instagram. You’ll see why I chose that in a minute.
You’ve got your Instagram account, and you post photos of your life on it. You post them for your friends to see, and maybe for your family to see. But Auntie Nellie isn’t on Instagram, and you want her to see the photos too. So you have to upload them all to Facebook as well, so Auntie Nellie can see them. Which is annoying.
Luckily, Facebook allows you to link your Instagram account to your Facebook account, so that when you post a photo on Instagram, it also gets posted on Facebook.
This is data sharing. You have data - your photos - that you want to share with other people - your friends and family. You use a service - Instagram - to share that data. But you also want to share it with other people who aren’t on that service, so you use another service - Facebook - to share it with them. Forget for now that these two things are owned by Meta. It’s irrelevant to the point I’m trying to make.
Remember that. This idea is fundamental to everything that’s to come.
The Way Things Used To Be
Back in the day, to share stuff like this on the Internet, the only real option was to give your literal login details for one site to another, and just trust that they’d do nothing malicious with them.
Obviously this is a stupid idea. I won’t even bother explaining why.
Luckily there was a solution. It started with something called an API.
APIs In A Nutshell
If you already know what an API is, you can skip this bit. API stands for Application Programming Interface. It’s a way for one piece of software to programmatically use another. Sometimes these bits of software are just different parts of the same program, but often they are different programs running on different machines. That idea, of different programs on different machines talking to each other, has been around for a long time, really. We used to call it “remote procedure calls”, or RPCs. Then came all sorts of other permutations, and eventually HTTP came to dominate everything, and we built “web services” or “RESTful services” or “REST APIs” and all sorts of other terms for the same thing. We’ve kind of settled on API for now, though, so we’ll stick with that.
How does this relate to photo sharing? Well, Instagram’s engineers would build an API that allowed other software to interact with Instagram. Facebook’s engineers would write something that consumed that API. Now Facebook doesn’t need your Instagram credentials. It has its own. That might be a username and password, typically sent using a scheme called HTTP Basic authentication, or “basic auth”. Or it might be an API key or an API token (functionally the same thing). It doesn’t matter. Facebook would have a credential that allowed it to interact with Instagram’s API.
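To make the credential idea a little more concrete, here is a minimal Python sketch, using only the standard library, of how a consuming application might attach its own credential to a request. The endpoint URL is made up, and the `X-API-Key` header name is an assumption, since every API picks its own convention. The requests are built but never sent.

```python
import base64
import urllib.request

# Hypothetical endpoint standing in for a photo-sharing API; it does not exist.
PHOTOS_URL = "https://api.photos.example/v1/photos"


def with_basic_auth(username: str, password: str) -> urllib.request.Request:
    """Build (without sending) a request that uses HTTP Basic auth.

    Basic auth is just "username:password", base64-encoded and placed in
    the Authorization header.
    """
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return urllib.request.Request(
        PHOTOS_URL, headers={"Authorization": f"Basic {token}"}
    )


def with_api_key(api_key: str) -> urllib.request.Request:
    """Build (without sending) a request carrying an API key.

    The header name is an assumption: each API defines its own.
    """
    return urllib.request.Request(PHOTOS_URL, headers={"X-API-Key": api_key})
```

Either way, the credential identifies Facebook the application, and nothing else. Note also that base64 is an encoding, not encryption: anyone who sees a basic auth header can recover the password from it.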
A New Problem Arises
Have you spotted the problem yet?
Obviously Facebook no longer has free rein over your Instagram account. It just has free rein over everyone’s Instagram photos. (At this point I should mention that this isn’t what actually happened; I’m walking you through why API security is important.)
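The problem can be shown with a deliberately naive toy model in Python. Again, this is not how Instagram’s API actually worked; every name and piece of data here is hypothetical. The key tells the API which application is calling, but says nothing about which user agreed to it.

```python
# A deliberately naive toy: a photo API guarded only by app-level keys.
# Every name and piece of data here is hypothetical.
PHOTOS = {
    "alice": ["beach.jpg", "birthday.jpg"],
    "bob": ["holiday.jpg"],
}

# The API knows which application each key was issued to, and nothing else.
API_KEYS = {"key-issued-to-facebook": "facebook"}


def list_photos(api_key: str, username: str) -> list[str]:
    if api_key not in API_KEYS:
        raise PermissionError("unknown API key")
    # The flaw: a valid key unlocks any user's photos, because the key
    # identifies the calling application, not a user who agreed to share.
    return PHOTOS[username]
```

Bob never linked anything to Facebook, yet `list_photos("key-issued-to-facebook", "bob")` hands over his photos all the same.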
So what we’re looking for here is a way for
- one piece of software to interact with another piece of software
- on behalf of a specific user
- without being able to do anything other than what the user wants it to do
- or to touch anything belonging to any other user
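One way to picture those requirements is as a record of what a single user agreed to, checked on every request. The Python sketch below is illustrative only: the `Grant` type, the token store and the `photos:read` scope name are all hypothetical, and a real system would also need things like token expiry, revocation and securely generated tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """A record of what one user agreed to (all names here are hypothetical)."""
    client: str              # which piece of software may act
    user: str                # the single user it may act for
    scopes: frozenset[str]   # the only actions that user allowed


# Hypothetical token store: an opaque token handed to the client -> its grant.
GRANTS = {
    "tok-alice-fb": Grant("facebook", "alice", frozenset({"photos:read"})),
}


def is_allowed(token: str, client: str, user: str, action: str) -> bool:
    grant = GRANTS.get(token)
    # Only the application the grant was issued to may use it.
    if grant is None or grant.client != client:
        return False
    # Only on behalf of the user who granted it, never anyone else.
    if grant.user != user:
        return False
    # Only the actions that user agreed to.
    return action in grant.scopes
```

The important difference from a plain API key is that the thing being checked is tied to a user and a list of permitted actions, not just to the calling application.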
If you already know how OAuth2 works, you can skip the next bit.
OAuth2 In A Nutshell
As the name implies, OAuth2 is the second version of the OAuth protocol. We don’t talk about the first one. It was invented essentially for Twitter, was a bit of a pig, and we’ll leave it at that. If you hear someone mention OAuth without the 2, they’re almost certainly talking about OAuth2.
What it is, is a protocol for delegating authorisation. It’s a mechanism for you to say to Instagram “I hereby allow Facebook to fetch photos on my behalf, but only photos, and only the ones I choose to share with Facebook”.
How it works, without getting technical, is this. As already discussed, Instagram have their API. They also have an authorisation server, a bit of software that lets interested parties like Facebook register “applications” on it. Facebook creates an application on Instagram’s authorisation server, and can then ask users to authorise that application to do things on their behalf. The application on its own can’t do anything: Facebook has to ask the user to authorise it, and the user chooses what to authorise it to do. You’ll have seen this in action countless times even if you’ve never linked Instagram to Facebook. “Sign in with Google”? That’s OAuth2. Well, it’s OAuth2 and a bit more, but let’s not get ahead of ourselves. So Facebook’s engineers have to build something that sends you to Instagram’s authorisation server to authorise the Facebook application. Then they have to build the thing that actually fetches the photos from Instagram and posts them on Facebook.
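For the curious, here is a rough Python sketch of the client side of OAuth2’s authorization code grant, one of several flows defined in the specification (RFC 6749). The parameter names (`response_type`, `client_id`, `redirect_uri`, `scope`, `state`, `grant_type`, `code`) come from the specification; the endpoint URLs, client ID and scope names are hypothetical, and nothing is actually sent anywhere.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical values; a real provider publishes its own endpoints, and the
# client ID is issued when the application is registered.
AUTHORIZE_URL = "https://auth.photos.example/oauth2/authorize"
CLIENT_ID = "facebook-photo-sync"
REDIRECT_URI = "https://facebook.example/callback/photos"


def build_authorization_url(scopes: list[str]) -> tuple[str, str]:
    """Step 1: send the user's browser here so that they approve the app.

    Returns the URL and a random `state` value the app must remember.
    """
    state = secrets.token_urlsafe(16)
    query = urlencode({
        "response_type": "code",       # ask for an authorization code
        "client_id": CLIENT_ID,        # which registered application
        "redirect_uri": REDIRECT_URI,  # where to send the user afterwards
        "scope": " ".join(scopes),     # what the app is asking to do
        "state": state,                # lets the app detect forged callbacks
    })
    return f"{AUTHORIZE_URL}?{query}", state


def code_from_callback(callback_url: str, expected_state: str) -> str:
    """Step 2: the user comes back to REDIRECT_URI with ?code=...&state=...

    Reject the callback unless `state` matches the value from step 1.
    """
    params = parse_qs(urlparse(callback_url).query)
    if params.get("state") != [expected_state]:
        raise ValueError("state mismatch: possible forged callback")
    if "code" not in params:
        raise ValueError(params.get("error", ["no code returned"])[0])
    return params["code"][0]


def token_request_body(code: str) -> str:
    """Step 3: form body the app POSTs to the token endpoint to swap the
    code for an access token. A registered app usually authenticates
    itself separately, for example with basic auth."""
    return urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": REDIRECT_URI,
    })
```

The thing to notice is that the user approves the request on Instagram’s own pages, and Facebook only ever receives a short-lived code, which it exchanges for a token covering what the user approved.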
Scaling Up
So you can post your photos on Facebook without giving Facebook your Instagram credentials. That’s great. But what if you want to post them on Twitter too? Twitter also has to go to Instagram, set up an application on Instagram’s authorisation server, and ask you to authorise it. Then it has to build the plumbing that does all the fetching and posting.
Now let’s add Flickr into the mix. Facebook have to build the plumbing to fetch photos from Flickr, create an application on Flickr’s authorisation server, and ask you to authorise it. Twitter have to do the same thing.
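Simply counting the pieces of work in the example so far makes the shape of things visible. This tiny Python snippet just enumerates them, using the services from the example.

```python
from itertools import product

# The services from the example: who consumes photos, and who provides them.
consumers = ["Facebook", "Twitter"]
providers = ["Instagram", "Flickr"]

# Every (consumer, provider) pair is a separate integration: an application
# registered on the provider's authorisation server, an authorisation step
# for the user, and fetch-and-post plumbing.
integrations = list(product(consumers, providers))

for consumer, provider in integrations:
    print(f"{consumer} <- {provider}")
```

Four integrations for two consumers and two providers. Each new provider adds one more for every consumer, and each new consumer adds one more for every provider.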
You’re probably thinking “Oh, that’s not such a big deal”. Or you might be thinking “Oh yes, that is a problem”.
But it’s not a problem. It’s several problems. Which I’ll start delving into in the next post.