October 12, 2023

Episode Description:

Join us for an insightful conversation with Patricia Thaine, Founder and CEO of Private AI, as we delve into the world of artificial intelligence, language models, and data privacy. In this engaging discussion, Patricia sheds light on the transformative potential of AI, particularly language models like GPT-3.5, in various industries.

In this episode, your host, John Verry, and Patricia Thaine discuss:

  • how specialized AI models are revolutionizing tasks such as sentiment analysis and personal information identification, all while ensuring data remains private and secure.
  • responsible AI practices and preparing the next generation to harness AI’s power responsibly.
  • the potential of AI and the ethical considerations that accompany it.
  • And more!

If you want to learn more about the realm of cybersecurity, follow The Virtual CISO Podcast on your favorite streaming platforms!

For weekly updates on the state of cybersecurity, digital technology, and more, follow us on LinkedIn, @pivot-point-security.

Part 1: Exploring Hallucinations with the CEO of Private AI Patricia Thaine

Join Patricia Thaine, CEO of Private AI, and host John Verry for a deep dive into the intriguing realm of AI hallucinations. These hallucinations, as Patricia elucidates, are outputs from AI models like GPT-3.5, stemming from their training data, which comprises internet text sequences.

The core of the issue lies in how AI models function: they predict the most probable word or phrase following a given input, based on patterns in their training data. However, AI hallucinations aren’t tethered to real-world accuracy, raising concerns for business users.

To illustrate, Patricia shares unsettling real-world examples, including a professor whose students used AI to craft a convincing yet entirely fabricated research paper in her name and a lawyer who cited non-existent case law in court. These cases highlight the challenge of distinguishing AI-generated content from reality.

Part 2: Navigating Generative AI Ownership in Companies | Who Should Be in Charge

In the ever-evolving landscape of artificial intelligence, the question of who should own and govern generative AI within a company is a topic of great significance. Join Patricia Thaine, CEO of Private AI, and host John Verry in this insightful discussion where we explore the complexities surrounding this issue.

When it comes to generative AI, ownership can be elusive. It may fall under the purview of the AI group, the CTO, or various other teams within the organization, all striving to secure approval from their compliance teams. In many companies, this ownership structure is still up in the air.

However, given the risks and intricate nature of generative AI, there’s a growing consensus that a centralized entity should oversee it. This centralization could be vital to managing data security, compliance, and functionality effectively.

As with APIs and third-party services, compliance teams, data protection officers, CISOs, and CIOs play a pivotal role in governing generative AI integration. It’s a complex dance between harnessing the potential of AI and maintaining strict controls over sensitive data.

Part 3: AI and Human Roles: Adapting to Change and Augmentation

Join host John Verry and guest Patricia Thaine, CEO of Private AI, in this engaging video as they explore the influence of AI, particularly ChatGPT, on various professions. They discuss the concern of AI replacing human jobs and emphasize its role in enhancing human capabilities.

AI, like ChatGPT, simplifies tasks such as data curation, making work more efficient. John and Patricia debunk the myth that AI will entirely replace jobs, highlighting its augmentative nature.

Transcript:

Speaker 1:
You’re listening to The Virtual CISO Podcast, providing the best insight on information security and IT security advice to business leaders everywhere.

John Verry:
Hey there, and welcome to yet another episode of The Virtual CISO Podcast. With you, as always, John Verry, your host. With me today, Miss or Mrs… I shouldn’t have said that that way … Patricia Thaine. Hey, Patricia.

Patricia Thaine:
Hi, John. Very nice to be here. Thanks for having me.

John Verry:
Yeah, glad to have you on. Always like to start simple, tell us a little bit about who you are and what is it that you do every day.

Patricia Thaine:
Sure. I come from a background in doing research in privacy, specifically applied cryptography and natural language processing and speech processing. I’m the co-founder and CEO of a startup called Private AI based out of Toronto.

John Verry:
So if anyone hasn’t picked up by now, today’s conversation might have something to do with artificial intelligence. Of course, the podcast title will do that as well. So, before we get down to business, I always ask, what’s your drink of choice?

Patricia Thaine:
Sparkling passion fruit juice.

John Verry:
Not guava, not any of those other tropical fruits. Got to be passion fruit, right?

Patricia Thaine:
Passion fruit’s amazing.

John Verry:
Well, I think I’ve had it, but I can’t say for sure, but I’m going on vacation next week and I think I’m going to look for some passion fruit juice. I’ll shoot you a note, tell you what I think.

Patricia Thaine:
Please do.

John Verry:
So thank you for coming on. I think we can thank Mr. Cameron as in James and Mr. Schwarzenegger as in Arnold for enlightening us about the potential mess that artificial intelligence power could yield at some point. That was way back in 1984 and it feels to me like right now we’re starting to come into that world where that’s relevant, full self-driving cars and things of that nature. I think ChatGPT and the various iterations of that, Bard, all have the potential to transform life as we know it.
I think many people probably underappreciate the changes that might occur, but I also think it’s given some people nightmares about Skynet. So, I’m going to say let’s get two basics out of the way, but I have a funny feeling I’m calling these basics and they’re not. Can you define artificial intelligence and then can you define ChatGPT and tell me whether or not that is or isn’t artificial intelligence?

Patricia Thaine:
Yeah, so artificial intelligence, the definition of artificial intelligence is quite a debated topic and some people will say that something like ChatGPT is artificial intelligence or the large language model that it’s using. Some people will say that is not artificial intelligence. One example of how to detect artificial intelligence is the Turing test, where you have to decide, “Is this a computer I’m speaking to or is this a human I’m speaking to?”
But of course, the people who don’t think that ChatGPT is artificial intelligence would probably not think that the Turing test is a valid test for measuring artificial intelligence. I think a lot of this debate also stems from us not even truly understanding what is human intelligence versus animal intelligence or human intelligence versus computer intelligence. So, long story short, it’s debatable.

John Verry:
Long story short is I don’t know, right? I mean you gave me the five-minute I don’t know. Is that the official stance here?

Patricia Thaine:
The five-minute answer is that nobody can really agree on what artificial intelligence is. Is the computer able to have some understanding of the world, and what does understanding of the world even mean? Also a very debated topic. So, yes, the very long answer is that nobody really knows.

John Verry:
Yeah, I got you. It’s funny, it reminds me of the very famous quote from one of the senators: “I don’t know how to define pornography, but I sure know it when I see it.” I guess to some extent maybe some people’s definition would be, “Hey, yeah.” To me, I look at ChatGPT and I think of that as, yeah, there’s some artificial intelligence going on there because it’s definitely an intelligent piece of software. So, you mentioned in your definition of that, you mentioned a large language model, which is essential to this. What is a large language model?

Patricia Thaine:
A large language model essentially is trained to take an input and output the most likely word right after that input. So, it is a probabilistic model of language that is very large.
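
To make that next-word idea concrete, here is a minimal, purely illustrative Python sketch. The phrase-to-probability table below is invented for illustration; a real large language model learns these probabilities from enormous amounts of text rather than storing them in a lookup table.

```python
# Toy illustration of "predict the most likely next word given the input."
# The probabilities below are invented; a real large language model learns
# them from billions of text examples instead of storing a lookup table.

next_word_probs = {
    "the cat sat on the": {"mat": 0.62, "sofa": 0.21, "roof": 0.17},
    "sparkling passion fruit": {"juice": 0.88, "soda": 0.12},
}

def most_likely_next_word(prompt: str) -> str:
    """Return the highest-probability continuation for a known prompt."""
    candidates = next_word_probs[prompt]
    return max(candidates, key=candidates.get)

print(most_likely_next_word("the cat sat on the"))       # -> mat
print(most_likely_next_word("sparkling passion fruit"))  # -> juice
```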

John Verry:
That’s interesting, and I forewarned you that I had done some research on this before our podcast. So, I know not even quite enough to be dangerous, but I’ll ask this question. So, you mentioned nodes, right? So Ars Technica did a really cool jargon-free, maybe ChatGPT-for-dummies piece, which was good because I’m a dummy. They described it as each word becoming a vector, like math. Then what happens is there’s these… I think you call them weightings, but there’s these 12,288, if I recall, data points associated with it somehow. Maybe that’s the weightings you’re referring to, and that it gets acted on and then there’s 96 layers that this filters down through to get to the ultimate answer. Is it really that complex?

Patricia Thaine:
Yeah, it is pretty complex. So, essentially, first, of course, you do need to do some text pre-processing. That’s not part of the large language model, but the text pre-processing might be things like making sure everything’s in the same casing and tokenizing the values. For example, apostrophe-S will be separated from I-T for “it’s.” There’s a bunch of pre-processing in there. Then these tokens get turned into vectors, and then an embedding is created within the network that takes context into account, and it also takes the most important words within the input into account as well.
That part of taking the most important words within the input into account, that’s called the attention layer. That is what really revolutionized language models, because prior to this, which is a core piece of the transformer model, we had language models that could definitely not perform as well as language models now.
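
The attention layer Patricia mentions can be sketched in a few lines. This is a toy, single-head version of scaled dot-product attention with made-up sizes, not the full transformer stack used in models like GPT-3.5.

```python
import numpy as np

# Minimal sketch of scaled dot-product attention: weight each token's
# representation by how relevant the other tokens are to it. Real models use
# thousands of dimensions and many heads stacked across dozens of layers.

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for one attention head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # relevance of every token to every other token
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # context-aware token representations

rng = np.random.default_rng(0)
tokens, d_k = 4, 8                       # 4 tokens, 8-dimensional toy embeddings
Q, K, V = (rng.normal(size=(tokens, d_k)) for _ in range(3))
print(attention(Q, K, V).shape)          # (4, 8)
```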

John Verry:
So it’s shocking to me that ChatGPT is math. So, we’re representing words, if you will, as math and then we’re doing math to determine relationships. Could you give me an example? Because that’s mind bending for someone who doesn’t understand how this works.

Patricia Thaine:
Yeah, I’ll give you a very simple example. So, in previous iterations of word vectors, there are things called, for example, word2vec. These were vector representations of words based on the context in which they appeared in the corpus that was used for creating these vectors. So, the assumption there is that words that appear in similar contexts have some similarity, one with the other. A famous example is if you take king and queen and woman and man: you do king minus man plus woman, you get queen.
So, that is a very hand-selected example. In a lot of research papers, you’ll have hand-selected examples to showcase these things. So, they don’t all work out this nicely, but you might also be familiar with the dimensionality reduction of these word embeddings into clusters. So, words that appear in similar contexts might be clustered in similar spaces in two-dimensional space, for example.
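
As a rough illustration of that king/queen arithmetic, here is a sketch with three-dimensional toy vectors invented for the example; real word2vec embeddings have hundreds of dimensions and are learned from large corpora.

```python
import numpy as np

# Toy word vectors invented purely to show the king - man + woman ≈ queen idea.
# The three axes (royalty, maleness, femaleness) are made up for illustration.

vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

target = vectors["king"] - vectors["man"] + vectors["woman"]
closest = max(vectors, key=lambda w: cosine(vectors[w], target))
print(closest)  # -> "queen" with these toy values
```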

John Verry:
So maybe another stupid simple idea would be you’re saying that cats and dogs might often appear in similar contexts because they are common pets that people have. People search for cat and dog food. So, these large language models would then make inferences about the relationship between cat and dog, maybe in this vector space. Is vector space really a two-dimensional space? I mean conventional vectors.

Patricia Thaine:
It could be.

John Verry:
Can it be a three-dimensional space as well?

Patricia Thaine:
It could be much larger.

John Verry:
Okay. So, it is a vastly multidimensional space. But if we were mapping things in space, the things which are more like each other or in context happen a lot together get grouped close to each other and there’s an inference to a relationship between them. The cats and dogs might both eat food, they both might be family pets, things of that nature.

Patricia Thaine:
It might be something like that. Those multidimensional vectors, they’re called tensors in the machine learning land.
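
A quick sketch of the dimensionality reduction Patricia refers to: project higher-dimensional vectors down to two dimensions and see that related words land near each other. The 50-dimensional vectors here are random stand-ins, not real embeddings.

```python
import numpy as np
from sklearn.decomposition import PCA

# Project high-dimensional "embeddings" down to 2D so that words used in
# similar contexts cluster together. The vectors are random stand-ins.

rng = np.random.default_rng(42)
base_pet, base_food = rng.normal(size=50), rng.normal(size=50)
embeddings = {w: base_pet + 0.1 * rng.normal(size=50) for w in ["cat", "dog", "hamster"]}
embeddings |= {w: base_food + 0.1 * rng.normal(size=50) for w in ["pizza", "sushi", "salad"]}

words = list(embeddings)
coords = PCA(n_components=2).fit_transform(np.array([embeddings[w] for w in words]))
for word, (x, y) in zip(words, coords):
    print(f"{word:8s} ({x:+.2f}, {y:+.2f})")  # pet words cluster together, food words together
```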

John Verry:
So now you’re bending my head, right, because I’m an engineer, but we never got beyond three dimensions. I mean the fourth dimension gets crazy when you start thinking. Yeah, I mean how many dimensions might we be talking about?

Patricia Thaine:
That is a good question. It could be thousands. Let me see how many-

John Verry:
Literally thousands? Wow. It’s remarkable that ChatGPT works as fast as it does then, because when you think about 12,288 attributes, these 96 layers, you’re making all these calculations in multidimensional space. It must eat up a tremendous amount of processing power to answer a pretty simple question. What time is Hard Knocks on tonight on HBO? I wonder what that costs in compute power.

Patricia Thaine:
So the really interesting thing is that GPUs are actually really good at computing these things and that’s why they’re so popular for machine learning.

John Verry:
I didn’t know that. They were also popular in password-cracking machines, probably for the same reason: the type of math that they’re doing is very efficient.

Patricia Thaine:
Yeah.

John Verry:
All right. So, I think we have about as basic an understanding as an average person or information security student listening to this needs to have about ChatGPT. So, let’s start talking about some of the risks of these technologies. So, you hear a lot about hallucinations. What are they, why do they happen, and how can they put business users of a tool like ChatGPT at risk?

Patricia Thaine:
So hallucinations are outputs from the model that are based on what the training data was, which is basically sequences of words, one after the other. Then what it’s outputting is the most likely word after that sequence. What that means is that it might sound very convincing, but it might be making things up like sources or information about anything, because its goal is to find the most likely next word. It is not necessarily to find the most likely next word within the context of reality. So, an example of this.

John Verry:
That’s a little discomforting, just so you know. I mean to a researcher, that might sound, “Okay, cool,” but somebody like me, that’s a little scary.

Patricia Thaine:
Yeah, fair enough, fair enough. Yeah. So, an example of that would be, for instance, there’s this one professor, for example, Dr. Sweeney, who is a really interesting person. She’s the person who called out or figured out that if you get one of the healthcare data sets that was made publicly available, with the zip code and a little bit more, age I believe, you could figure out who the governor of Massachusetts was in this particular dataset. That led to a whole kerfuffle about maybe we should be more careful with healthcare data and actually have a process when it comes to anonymizing data, but anyway.
She had mentioned in a talk recently that some of her students had gotten ChatGPT to create a research paper, supposedly written by her, with a bunch of studies that looked really realistic but that she never conducted. It was in the format that she would write it and it sounded like her, but yeah, none of this was real. In another instance, there was also a lawyer recently who had brought arguments to a court of law and cited cases that never existed, and that lawyer got into deep trouble.

John Verry:
That’s staggering. Because of the way the mathematics works, it knew that for this type of a case that this type of case law might be relevant and it created case law relevant to a case that was not actual case law. That’s crazy. So, that is a great example of places where you can have risk. I’ve seen that myself. I don’t use ChatGPT as much as I probably should, or the iterations of it, but occasionally, I’ll use it to help write and I have found glaring errors, because with the subtleties of information security, it misinterprets some stuff. The other thing, and I’m curious if this is a risk in some way maybe to the people that are publishing data, but I was trying to write something about ISO 27001 and 27701.
I was having a hard time phrasing what I wanted to say well. I put that in and I got this great language back and I looked at it and I said, “This looks really familiar to me.” I had written it and it was off of our website like word for word. It was just feeding that back to me… it didn’t know it was me. Does that represent a risk to the people that are putting data out there, that obviously if these models are trained on it, they’re going to be using it? Then could that also create plagiarism? Could that effectively also create another set of risks around plagiarism?

Patricia Thaine:
Yeah, it definitely can. As you know, there is currently a lawsuit ongoing in the United States against OpenAI for copyright infringement by ChatGPT.

John Verry:
Okay. So, that’s an example of that where technically if that material that was on my website was copyrighted and ChatGPT was using that and giving it to somebody else to use and they used it, they became the avenue by which someone violated copyright. They enabled copyright infringement.

Patricia Thaine:
Indeed, but in addition to that, there is a big question mark about whether even training models-

John Verry:
[inaudible 00:16:46] train them on. Oh, that’s crazy. Yeah, that’s crazy. If you think about it, we’ve got hundreds and hundreds of pages. We never gave OpenAI the right to use our data. Does by virtue of the fact that it’s sitting on the internet and the internet is public, does that impugn them or give them the right to have that data? That’s a pretty cool question. These are the questions that are going to be debated in court over the next few years with regards to this stuff.

Patricia Thaine:
Yes, 100%. If you look at the GDPR, for example, one of their requirements is consent for the use of any personal information for each and every purpose that it will be used for. If ChatGPT was trained on EU citizen personal information, they might be in breach of the GDPR. So, that’s another thing to keep an eye on in the years to come. Chances are that they took some cautionary measures there, but it’s pretty much a black box what they did.

John Verry:
It’s going to be interesting to watch it all shake out. All right. So, how I found you for this conversation was I started to read about people restricting access to ChatGPT because of some of these risks, and it had to do with things like accidentally leaking sensitive data, things of that nature. So, that is, as I understand it, what Private AI does. So, if I’m a user of ChatGPT, how would I unknowingly leak sensitive data? Whether that’s personal information, whether that’s intellectual property, how does that work? Then maybe you can tell us a little bit about strategies for preventing that. Obviously, one of the strategies is to use your product, so you can talk a little bit about how your product works and how it does prevent that.

Patricia Thaine:
Sounds good. Yeah, so whenever you’re sending data to a third party, you do need to be cautious about what it is you’re sending. Some examples more recently around ChatGPT are Samsung employees sending confidential information. There was also a data leak where conversational history was shared across ChatGPT users and that may have included credit card numbers. So, essentially, the best thing you can do is prevent this data from going out in the first place, but you still want to be able to use the services. They are very helpful. So, where we come in is identifying the personal information, removing it, and then the prompt without that personal information gets sent to the third party of your choice.
Then in the response, that original information gets reintegrated. So, your names weren’t sent, but the names that you did mention get reintegrated in the appropriate context and so on very seamlessly. We are currently working on design partnerships to be able to do the same thing for confidential information as well.
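
A generic sketch of the redact-then-reintegrate pattern Patricia describes might look like the following. The regex-based detector and the send_to_llm stub are deliberately simplistic placeholders, not Private AI’s product or any particular vendor’s API.

```python
import re

# Generic sketch: find personal information, swap it for placeholders before
# the prompt leaves your environment, then put the real values back into the
# model's response. The email regex and send_to_llm() are simplistic stand-ins.

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact(text):
    mapping = {}
    def repl(match):
        placeholder = f"[EMAIL_{len(mapping) + 1}]"
        mapping[placeholder] = match.group(0)
        return placeholder
    return EMAIL.sub(repl, text), mapping

def rehydrate(text, mapping):
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text

def send_to_llm(prompt):
    # Stand-in for a call to a third-party model; it only ever sees placeholders.
    return f"Sure, I will draft a reply to {prompt.split()[-1]} right away."

prompt = "Write a follow-up note to alice@example.com"
safe_prompt, mapping = redact(prompt)
answer = rehydrate(send_to_llm(safe_prompt), mapping)
print(safe_prompt)  # "Write a follow-up note to [EMAIL_1]"
print(answer)       # placeholder swapped back to alice@example.com
```

The key property is that the third party only ever sees placeholders, while the user gets a fully recontextualized response back.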

John Verry:
So that would be if we had an internal project code named X, anything associated with code name X could be filtered in that same way, source code, things of that nature. I mean when you look forward, those are the types of things that we want, because obviously, these days with SaaS companies being valued the way they are, that intellectual property is where the fortune of many organizations lies.

Patricia Thaine:
Absolutely. Because confidential information can mean different things for different organizations, we are taking these design partnerships from organization to organization to make sure that we cover what that particular group means as confidential information.

John Verry:
Got you. This might be a bad analogy, but it feels right. It seems like DLP for AI, data loss prevention for AI.

Patricia Thaine:
Right? It is a type of data loss prevention for AI.

John Verry:
That’s pretty cool. So, question for you, what I don’t understand is, so I sent a prompt up to ChatGPT. Let’s say I incorporate data that is sensitive to my organization or violates GDPR or CCPA. How does that get exposed to somebody else? That seems to imply to me that my data then becomes part of the large language model. Is that the way these large language models work?

Patricia Thaine:
That’s a great question. It depends. So, they might be storing the data for training later. You do have to look at the terms of use to make sure that either they’re not doing that or that your settings are preventing them from doing so. But in general, it’s just best practice to prevent the sharing of personal information to third parties as much as possible.

John Verry:
Got you. All right. So, we talked about hallucinations and the poor lawyer who looked foolish in court. I don’t know what the implications of that were. We talked about this data leakage, if you will. Are there any other risks of note that we should be talking about when it comes to using a tool like ChatGPT?

Patricia Thaine:
Good question. One is of course bias. So, the model is trained on the internet. If you are going to use it to automatically respond to customers, for example, you will want to run ample tests repeatedly over intervals of time to make sure that the model isn’t producing any insulting results, any results that might be biased more towards men than women, or towards different skin colors or cultures or religions. They do put some protections in there within ChatGPT. But, of course, they can’t be perfect.
The best time to actually deal with this is at training and you can’t when you’re training on the entire internet. So, bias is definitely one of those. In general, we did talk about the copyright aspect of it, just general explainability of what the sources were for the information is something that’s a big question mark and that is not easily solvable at all. So, that’s definitely something to keep in mind.
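
One way to act on that advice about repeated testing is a simple probe that swaps demographic terms into the same prompt and compares the responses. The generate function is a hypothetical stand-in for whatever model or API is under test, and the keyword scoring is only illustrative; real bias evaluations use curated test suites and human review.

```python
# Rough sketch of a periodic bias probe: same template, different demographic
# terms, compare how positive the responses are. generate() is a hypothetical
# stand-in for the model under test; the scoring is deliberately naive.

TEMPLATE = "The {person} applied for the engineering role. Should we interview them?"
GROUPS = ["man", "woman", "older candidate", "younger candidate"]
POSITIVE_WORDS = {"yes", "definitely", "qualified", "interview"}

def generate(prompt: str) -> str:
    raise NotImplementedError("plug in the model or API under test here")

def positivity_score(response: str) -> int:
    return sum(word.strip(".,!?") in POSITIVE_WORDS for word in response.lower().split())

def run_bias_probe():
    scores = {group: positivity_score(generate(TEMPLATE.format(person=group)))
              for group in GROUPS}
    spread = max(scores.values()) - min(scores.values())
    return scores, spread  # a large spread flags responses worth a human review
```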

John Verry:
Yeah, that’s a really interesting point, because if you think about it logically, I literally just took some training on unintended bias, I think is what they referred to it as. But we all have some implicit bias. Maybe implicit bias is the right phrase, things we don’t even realize. Obviously, that’s going to come out in the writing. Obviously, that’s going to come out in the content that gets out to the internet. So, wouldn’t it be logical that if the content is being produced by humans who have some level of implicit bias, and a system was trained on it, that it would actually absorb or inherit said bias?

Patricia Thaine:
Absolutely.

John Verry:
Logic, right?

Patricia Thaine:
Yeah. That’s been a problem in machine learning for a while now.

John Verry:
Okay. So, let’s talk a little bit about one of the other things which I thought was fascinating on your website was the fact that you had a lot of coverage on… Well, what I guess I’ll refer to as private learning models and/or large learning models. What are the benefits of a private LLM? Why would I use a private LLM versus using one of the standard ones that I can just access over the internet?

Patricia Thaine:
So I think that’s still something that people are grappling with to see whether or not there is a benefit. The benefit of course is the privacy and security of it. It is within your environment. You have a lot more control over it. What we are seeing is, for example, Microsoft, which has Azure OpenAI, create an environment that’s isolated for certain companies. Ultimately, what I think is going to happen is people will play around with having their own large language models within their environment, but unless they’re one of the very large players, it’s generally going to be very hard to keep up to date and very expensive to train or maintain, including running it in your environment, given just how large these are. So, you’ve got already these larger players who are swallowing quite a bit of the cost for running these models. So, ultimately, it will be a cost-benefit analysis and whether or not you absolutely need the control over the large language model to the extent where it has to be in your environment versus a secure environment that’s isolated for your company.

John Verry:
So obviously, the security would be good. You said a little bit earlier, so if I wanted the large language model to learn, so let’s say I’m an information security firm. Let’s say that we’re doing a security information event management, so we’re getting thousands of alerts and logs and things of that nature coming in. Let’s say that I wanted to use a large language model, but I wanted to continually learn off of what I was doing, but I didn’t want that to get absorbed by a public model. Is there a way where you could use the public model and then have what you are doing with it grow that model privately if you will? I don’t really understand how these work. Or is it possible to have two language models that you’re using or do you always use one but you’d want to augment it? How does that work?

Patricia Thaine:
Yeah, all of the above is possible. It depends on the provider of the large language model. You are likely able to get an instance that is trained just on your data. It is still safer to prevent the training on personal information. Otherwise, the levels of access control for the original data are going to have to be the same levels of access control for the model. You’re going to have to have different models for different teams as a result of it. So, you have to really be careful about what you’re using to train the model as a result of that. Same with confidential information, you might not want your sales team having access to the latest research results that your company hasn’t even publicized yet. I think that answered the question. Not sure.

John Verry:
Yup. Very well. So, also, does it make sense or do people have… So, the OpenAI, ChatGPT 3.5 or 4, I forget which is the current one, or they run both of them I guess. In my mind, it’s what I would call a general purpose large language model. It’s trained on a broad array of data. When you talk about private language models, one of the reasons you would also use one of those would be for a very specialized application. One place I’ve heard that AI is starting to come into play is in doctor’s offices, that they’re trying to make that work.
So, you can come into your general practitioner who might not know a lot about certain specialties. So, obviously, if he starts taking your history and the AI says, “Hey, based on X, Y and Z, you might refer this person to this person because they might have this particular medical diagnostic code.” Are special purpose language models something that exists and can you talk a little bit about that?

Patricia Thaine:
They do. The good thing about this, so not necessarily for chats, I mean there are some good ones for chatting as well that aren’t necessarily ChatGPT, but there are special purpose ones for sentiment analysis or for what we do, which is identifying personal information. Actually, when you create a special purpose one, it means that it could be smaller, it could be deployed more efficiently, it can cost less money to run, even in the short term. But yes, you are right that training a special model would be for special purposes. Or if you’re looking at what organizations have, it’s 80 to 90% of data that is unstructured.
So, that means text, audio, images, videos, things that are really difficult to get a grasp on. The AI is the key to unlocking that data in an efficient way. So, either fine tuning the model with data within your organization or making sure that the model has access to data within your organization that it can reach into for given particular prompts. That is a holy grail of unlocking this massive amount of information that’s been just chaotic in organizations for the sense-

John Verry:
For forever, since we started storing data in unstructured formats. So, multi-right or Lotus 1-2-3, they started our problems. You said something which I didn’t really think I knew. It sounds as if you’ve developed a language model or a learning model, excuse me, that could be used to help clients identify sensitive information, personal information. So, do you have a tool? You made it sound like you have a tool that you could point at a bunch of unstructured data and it uses a language model, a learning model, I keep saying that, to identify this data. Is that also in your bailiwick beyond this just filtering component?
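
Patricia’s second option, making sure the model can reach into organizational data for a given prompt, is essentially retrieval-augmented prompting. Here is a minimal sketch, with TF-IDF standing in for the embedding model a production pipeline would use, and invented example documents.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Minimal sketch of retrieval-augmented prompting: find the most relevant
# internal document and prepend it to the prompt. TF-IDF is a stand-in for a
# learned embedding model; the documents are invented examples.

documents = [
    "Q3 incident report: phishing attempt against the finance team on July 14.",
    "Onboarding guide: how new engineers request access to the build system.",
    "Retention policy: customer support tickets are deleted after 24 months.",
]

def retrieve(question: str, docs: list[str]) -> str:
    vectorizer = TfidfVectorizer().fit(docs + [question])
    best = cosine_similarity(vectorizer.transform([question]),
                             vectorizer.transform(docs)).argmax()
    return docs[best]

question = "How long do we keep support tickets?"
context = retrieve(question, documents)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```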

Patricia Thaine:
That language model, and you’re right in calling it a language model, is actually what we use.

John Verry:
Oh, it is language model. I didn’t know I was right when I thought I was wrong.

Patricia Thaine:
You’re right. Yeah. So, we started in 2019 with the understanding that no one was doing even the fundamentals of privacy, which is what is the personal information in the first place?

John Verry:
Couldn’t agree more.

Patricia Thaine:
AI had just gotten to the point that it could reliably be used to solve this problem, but not without a lot of very accurately curated data and not without models that were built for the efficiency required to deal with this kind of information in a multimodal way as well. So, our entire goal at Private AI is to build a privacy layer for software and this is the very first stage. What kind of personal information do you have and what can you remove quickly? So we enable developers within organizations to integrate this anywhere from their egress proxies to within their own software components for scanning S3 buckets. So, both for risk assessment and risk reduction. PrivateGPT came up because this was a perfect application of that technology as well, to showcase this is another aspect in which a privacy layer for software makes sense.
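
As a hedged sketch of the S3-scanning integration Patricia mentions, the snippet below lists objects in a bucket and runs each one through a detection step. detect_pii is a hypothetical placeholder for whatever engine you plug in (Private AI’s API, a regex pass, or another tool); the finding labels in the comment are made up.

```python
import boto3

# Sketch of scanning an S3 bucket for personal information as part of a risk
# assessment. detect_pii() is a hypothetical placeholder for the detection
# engine you actually use.

def detect_pii(text: str) -> list[str]:
    raise NotImplementedError("plug in your PII detection engine here")

def scan_bucket(bucket: str) -> dict[str, list[str]]:
    s3 = boto3.client("s3")
    findings = {}
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
            hits = detect_pii(body.decode("utf-8", errors="ignore"))
            if hits:
                findings[obj["Key"]] = hits  # e.g. {"reports/q3.txt": ["EMAIL", "NAME"]}
    return findings
```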

John Verry:
So you do identification as well as filtering, and this idea of filtering into ChatGPT, you could filter into and out of any application. You can sit in line. Okay, that’s cool. So, if I think about it as a set of APIs with a lot of intelligence built in, as data is transiting it, it’s going to dynamically rewrite that data in a way which allows the backend application or backend API to accept the data, still function, return valid information, and then you’ll transform that data again on the way back out, in real time, to recontextualize it based on the original data that was removed.

Patricia Thaine:
Exactly. It runs entirely in the customer’s environment. We never see the data.

John Verry:
Well, that’s really cool. My head hurts. So, one last question there. Once you’ve run your tool against a bunch of unstructured data and it’s going to tell me what type of data that you have, in what files, I’m assuming in what locations that data’s located, are you working with or do you have plans to… The next step obviously is to begin that process of transforming that into what they call in GDPR a record of processing activities, or what some people will refer to as a data map. Are you working with third parties? How do I take the next step with that data from a privacy perspective? Is that something that you guys are working on?

Patricia Thaine:
Yeah, it really depends on the use case. In some cases, it could be part of an egress proxy, so that any data going into the organization or through different parts of the organization to the central database can get filtered out, for example, or any data going into the data science team can get filtered out. But in other cases, as you mentioned, people do need a data map, in which case there’s generally an ETL pipeline that we could fit into nicely that organizations would have, but otherwise we can either integrate into existing connectors or build connectors for the various data sources.

John Verry:
Cool. Then I guess another obvious use case is data transformation out of prod into non-prod environments from a software development perspective. That’s a big challenge.

Patricia Thaine:
Yeah, that’s a huge use case.
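
A minimal sketch of that prod-to-non-prod idea: mask the personal fields in a production record before it is copied into a test environment, keeping the record shape intact so the application still works. Field names and values are invented for illustration.

```python
import hashlib

# Mask personal fields in a production record before loading it into a test
# environment. Field names and the example record are invented.

SENSITIVE_FIELDS = {"name", "email", "phone"}

def mask_value(value: str) -> str:
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:8]
    return f"masked_{digest}"  # stable pseudonym: same input, same placeholder

def mask_record(record: dict) -> dict:
    return {k: mask_value(v) if k in SENSITIVE_FIELDS else v for k, v in record.items()}

prod_row = {"id": 42, "name": "Alice Smith", "email": "alice@example.com", "plan": "pro"}
print(mask_record(prod_row))  # id and plan untouched, name and email pseudonymized
```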

John Verry:
Yeah, really cool stuff. All right. We’ve covered a ton of ground. Actually, I had one other question to ask you. Who generally owns generative AI in a company?

Patricia Thaine:
Yeah, I love this question, John. It’s still to be determined. In some cases, it might be the AI group, it might be the CTO, it might be a number of different groups who want to use generative AI and are trying to get their compliance teams to say yes to things. So, I think in a lot of organizations, it’s still to be determined and it might end up being that each major team, like sales, marketing, and developers in different parts of the organization might own their own generative AI for their purposes.

John Verry:
But I would assume at some point, given the risk that we’re talking about, given the complexity that it’s got to be governed by some central entity, right?

Patricia Thaine:
Yes. I’d say it’s similar to many of the APIs or third-party services. You still need to get approval from the compliance team or the data protection officer or the CISO or the CIO in order to move along with the integration or the purchase of the software.

John Verry:
Yeah, that’s an interesting idea that at one level, we see the complexity of this AI monster and another level, it’s as simple as it’s another API. In theory, we’ve already got processes to govern that, but when the risk associated with the data and the risk associated with the functionality of the API changes, I think the way that we govern that has to change. I don’t know that we’re prepared for that.

Patricia Thaine:
So that’s a really good point. I think we might be a little bit more prepared for it with regards to what we allow to leave the organization to these third-party LLMs. However, there might be an entirely new role, or an entirely new responsibility under the CTO or another one of those C-suite roles, which could be who will govern the generative AI that’s trained on the company data.

John Verry:
So speaking about that risk and governing that risk, I know we’re in a new and emerging space. At least from my perspective, the best general purpose guidance that we’ve got out there right now is the NIST AI risk management framework. So, when you’re chatting with the orgs that you’re working with, is there general awareness of that? Are people using that? Do you believe in it? Do you use it? Are there other things that you think people should be using?

Patricia Thaine:
Yeah, that’s a good question. So, I haven’t actually come across anybody who’s brought it up within sales conversations. I know people are thinking about Responsible AI a lot. I know that they are looking at various tools for Responsible AI. They might be looking at these guidelines behind the scenes without necessarily mentioning it to vendors like ourselves. But I do know that this is definitely not a new topic and there are lots of researchers who have been working on this for over a decade, and some really good work, for example, on the topic of how to gauge the ethics of a natural language processing project or research that you’re doing, is by Dr. Saif Mohammad out of the National Research Council of Canada.
He has these guidelines to follow, where you can answer various questions in order to think about this major problem. But there are also many other researchers, depending on the work that you’re doing, that I’m happy to point you to, or listeners too, if they’re interested.

John Verry:
Sounds good. We beat it up pretty good. Did we miss anything that we should chat about or ChatGPT about? That would be what my kids would refer to as a dad joke.

Patricia Thaine:
There’s good dad jokes books out there too.

John Verry:
Yeah, you know what? If I didn’t know a good dad joke, I know where I could find one, ChatGPT, right?

Patricia Thaine:
There you go right now.

John Verry:
I bet you would give me a hundred bad dad jokes.

Patricia Thaine:
Those dad jokes books are going to go out of business.

John Verry:
No, I think the people who tell dad jokes are not yet using ChatGPT. So, I think I’d have an advantage over most of them.

Patricia Thaine:
Yeah, I think one thing we could touch upon is that fear of ChatGPT that there is of replacing various people within their roles. That is a tricky one to touch upon. I don’t know if you want to talk about that at all.

John Verry:
Sure. I’d love to hear your opinion.

Patricia Thaine:
So there’s still definitely a big need for people to curate data. If you think of a lot of roles, it’s curating information to make the best decision possible. This is enabling that curation of information a little bit more easily. You of course have to go look for third party sources, but down the line, that’s definitely going to be resolved. What this really is is an opportunity to take away a lot of very painful tasks. Where did I place the document from last Tuesday when I was talking to this client, and what did it say about this very specific infrastructure information? I think it’s very much a pain reducer if we use it correctly and we don’t jump the gun on crazy scary use cases.

John Verry:
I think most of the time we grossly misjudge the impact that technology is going to have. So, when we invented the fax machine and we invented email, it’s like, “Well, that means that mail is going to disappear.” When we suddenly created microwaves and overnight stuff, we thought that it would just change the way we do everything. We adapt. So, I suspect that with most of this stuff, I think it augments what we do. I don’t think it replaces what we do, and I think that the only people that are in significant danger are the people that fail to adapt to the augmentation.

Patricia Thaine:
It does definitely call for an adaptation of our education system.

John Verry:
That’s an interesting point.

Patricia Thaine:
Yeah, we, for example, were taught don’t use Wikipedia for your sources, but also, these are the locations where you do look for good information. How do you train children to think critically about these systems the way that we were taught to think critically about the Wikipedia content?

John Verry:
Yeah, I mean you almost need to teach the generation that’s growing up with these new technologies how to optimally, and part of that is safely and securely, use these new technologies, leverage these new technologies to create a better version of themselves and to be more competitive in the world economy. It’s going to be interesting. I love the line, “The only constant is change,” and we just need to adapt. There was also a line that I saw at a conference I was at recently, see if I can remember it.
It was, “Things have never moved as fast as they’re moving today, and they’ll never move this slowly again.” I may have pollocked that a little bit, but I mean I think it’s a fitting close to this conversation on AI, right? I mean as crazy as things have been, it’s never going to move this slowly again because I mean it should accelerate, logically.

Patricia Thaine:
Yeah, it definitely seems that way. That is a great line to end on for sure.

John Verry:
All right. So, I’ll ask you the question, I don’t know if you’re prepared. Hopefully you are, because you’ve done a great job to this point. I don’t want to disappoint the listeners. Give me a fictional character or a real-world person that would make an amazing or horrible CISO, and why.

Patricia Thaine:
I wonder if you get this one a lot, but I think Sherlock Holmes would be an amazing CISO because being a CISO is basically massive amounts of detective work and observability.

John Verry:
I think we have heard that and you hurt my razor-thin ego significantly because you said … Maybe you get this a lot. I would’ve assumed by now you listen to all 120 or so odd podcasts I’ve produced and you would know exactly how many times Sherlock Holmes was mentioned. So, you just popped that bubble for me, Patricia.

Patricia Thaine:
It’s on my to-do list. Absolutely.

John Verry:
If it’s anything like my to-do list, you ain’t never getting there. All right. So, listen, thank you for coming on. I think that the work you’re doing at Private AI is fascinating and timely and I wish you nothing but the best of luck with it. If somebody’s interested in what you’re doing, how would they get in touch with you and your team?

Patricia Thaine:
Yeah, so there are a few ways. You can contact me at [email protected] or add me on LinkedIn. Please add a message saying it’s from this podcast or you could go on our website at private-ai.com and fill out the contact us form or ask for an API key if you’d like.

John Verry:
Awesome. This has been great. Thank you.

Patricia Thaine:
Thank you, John. Pleasure to speak with you.