From the course: OpenAI ChatGPT: Creating Custom GPTs
Exploring privacy and copyright
- Okay, when I first saw this knowledge feature, I had an immediate question, and there's a good chance you have the same question, so let's address it: what exactly happens to copyright and privacy when you upload documents into a GPT, especially if that GPT is shared with other people? Does OpenAI protect your privacy, and is OpenAI looking at the files? This is a really complicated question because it relates to copyright and AI, something that is currently being battled out in courts around the world, and it's not settled law. And I am not a lawyer, so I can't provide any legal advice. What I'll do instead is point at the features that exist inside the interface and at what OpenAI has said publicly, and then you can use that information to make your own decisions about how to work with this system.

Now, to understand what is happening with the knowledge inside GPTs, we first have to look at what happens to your regular interactions in ChatGPT. If you open ChatGPT, click on your name at the bottom, then go to Settings & Beta, you get a series of settings, including this one here: Data Controls. This is where you choose whether you want to share your chat history and training data with OpenAI. This feature is turned on by default, and if you turn it off, you lose your history, so you won't see all the chats you've previously done in the sidebar here. If you have it turned on, though, OpenAI can look at your chats: what you input and what the system outputs.

And there's a good reason for that. First of all, ChatGPT was originally a research project from OpenAI to see how people would interact with an LLM like this, so it was always meant to collect data and examine these exchanges. Secondly, because we are now working with non-deterministic systems, and these large language models are unpredictable in their behavior, it's very important for the people building them to observe how people interact with them and to notice when people try to do the same thing many times without getting the results they want. So we're in a bit of a dilemma here as users: if we don't share our data, the systems can't improve, but if we do share our data, OpenAI can see what we're doing. And that requires a high level of trust from each individual user. Like I said, this setting is turned on by default, and most users are unaware of it and just use the system. That means if you create a GPT and share it with other people, and they have this setting turned on, they are sharing the data from their interactions with your GPT with OpenAI.

Now, when you upload a document to the knowledge section in a GPT, a new section appears at the very bottom called Additional Settings. Under here, we have a pre-checked box that says "Use conversation data in your GPT to improve our models." This message basically says: we can look at the conversations coming out of this GPT, including the conversations that surface information from your documents. So if you're uploading a document and you don't want OpenAI to see what's in it, you need to uncheck that box. And that also applies if this is a private GPT that only you can see. Now, this box only appears if you upload a document, because if you just have a regular GPT without a document, all OpenAI sees is the instructions and your actual chat, so there's no risk of leaking personal or private information from a document, for example.
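As a practical aside, and to be clear this is my own illustration, not a feature of ChatGPT: if you do want to upload a document but you're worried about obvious personal data in it, you could run a quick local scan first. Here's a minimal Python sketch; the patterns are deliberately simplistic examples, not a complete privacy check.

```python
# Illustrative only: scan a text file for obvious personal data
# before uploading it to a GPT's knowledge section.
# These regex patterns are simplistic examples, not a complete check.
import re
import sys

PATTERNS = {
    "email": r"[\w.+-]+@[\w-]+\.[\w.-]+",
    "phone": r"\b\+?\d[\d\s().-]{7,}\d\b",
    "ssn-like": r"\b\d{3}-\d{2}-\d{4}\b",
}

def scan(path):
    # Read the document and report every match for each pattern.
    text = open(path, encoding="utf-8", errors="ignore").read()
    for label, pattern in PATTERNS.items():
        for match in re.finditer(pattern, text):
            print(f"{label}: {match.group()}")

if __name__ == "__main__":
    scan(sys.argv[1])  # e.g., python scan.py my_document.txt
```

Again, that's just a sketch to make the point: once a document is in the knowledge section with that box checked, you should assume its contents can surface in conversations.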
But there's another layer to this: what happens if you upload a document that you have the rights to read on your own, but not the rights to share? That's where things get really interesting. According to OpenAI in conversation, if you create a GPT and share it publicly, they will run it through some form of copyright check. So my assumption is that if you upload a book, for example, there's a high likelihood the system will discover, hey, this is a book under copyright, and then take the GPT down, or at least remove it from the public marketplace. And once they open the GPT Store, they will definitely have to run those kinds of checks on everything. But in conversation, they also say they don't run those checks on private GPTs, because they are private and people can do whatever they want with their own data. And they won't do it on ChatGPT Enterprise accounts, because that's inside the enterprise bubble.

And there's even more. At OpenAI Dev Day, they made a lot of announcements, and one of them can be found quite far down in the blog post. Let's see, where is it? It is here: Copyright Shield, where it says, "OpenAI is committed to protecting our customers with built-in copyright safeguards in our systems. Today, we're going one step further by introducing Copyright Shield. We will now step in and defend our customers, and pay the costs incurred, if you face legal claims around copyright infringement." And then it says this applies to generally available features of ChatGPT Enterprise and our developer platform, which is understood to be the API.

That means, to my reading as a non-legal expert and not a lawyer, this does not apply to regular ChatGPT accounts. I might be wrong, but it very specifically states ChatGPT Enterprise and the developer platform, meaning the API; it does not state ChatGPT. So from that, it sounds like the copyright protection does not extend to regular users. Then again, regular users probably wouldn't need that level of copyright protection unless they are deliberately uploading content that is not their property and then trying to share it out, in which case the system should catch it. Like I said, this is complicated and untested even in the courts. So your mileage may vary, as they say.
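One quick footnote to make that last distinction concrete. When the announcement says "our developer platform," it means programmatic access to the models rather than the ChatGPT app. Here's a minimal sketch using the official openai Python library; the model name and prompt are just my illustrative examples, not something from the announcement.

```python
# Minimal sketch of a "developer platform" (API) call, the surface the
# Copyright Shield announcement covers alongside ChatGPT Enterprise.
# Assumes the official openai Python library is installed and an
# OPENAI_API_KEY is set in the environment; model name is an example.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain fair use in one sentence."}],
)
print(response.choices[0].message.content)
```

If you're calling the models this way, you're on the developer platform; if you're typing into the ChatGPT app, you're a regular ChatGPT user, and that's exactly the line the announcement draws.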