Practical AI – Episode #293

The path towards trustworthy AI

with Elham Tabassi from NIST


Elham Tabassi, the Chief AI Advisor at the U.S. National Institute of Standards & Technology (NIST), joins Chris for an enlightening discussion about the path towards trustworthy AI. Together they explore NIST’s ‘AI Risk Management Framework’ (AI RMF) within the context of the White House’s ‘Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence’.


Sponsors

Timescale – Real-time analytics on Postgres, seriously fast. Over 3 million Timescale databases power IoT, sensors, AI, dev tools, crypto, and finance apps — all on Postgres. Postgres, for everything.

Retool – The low-code platform for developers to build internal tools — some of the best teams out there trust Retool: Brex, Coinbase, Plaid, DoorDash, LegalGenius, Amazon, Allbirds, Peloton, and many more. Try it free at retool.com/changelog

DeleteMe – DeleteMe makes it quick, easy and safe to remove your personal data online.

Notes & Links


Chapters

1 00:00 Welcome to Practical AI 00:33
2 00:35 Sponsor: Timescale 02:17
3 03:05 What is NIST? 03:49
4 06:54 Evolving trust in technology 04:58
5 11:51 AI for everyone? 05:32
6 17:37 Sponsor: Retool 02:54
7 20:47 White House Executive Order 06:08
8 26:55 Risk and trust 04:51
9 31:55 Sponsor: DeleteMe 02:35
10 34:45 Getting started 08:17
11 43:02 Tooling for AI teams 03:18
12 46:20 Where things are going 04:06
13 50:26 Thanks for joining us! 00:32
14 50:58 Outro 00:45

Transcript



Play the audio to listen along while you enjoy the transcript. 🎧

Welcome to another episode of the Practical AI podcast. I am Chris Benson, I am a principal AI research engineer at Lockheed Martin, and unfortunately, my co-host Daniel is not with us today, but it is my pleasure to introduce Elham Tabassi, who is the chief AI advisor at NIST, which is the National Institute of Standards and Technology. Welcome to the show, Elham.

Thanks for having me.

You guys are doing so much in this area in terms of AI, and kind of setting the stage… And I was wondering, for those of us in the audience who may not be familiar with NIST, if you could kind of start out with just telling us a little bit about NIST, what you do both in AI and maybe outside to give a little context, and give us a little intro into what NIST is doing in AI, and your role in that.

Yeah, happy to. NIST, or the National Institute of Standards and Technology, is a non-regulatory agency under the Department of Commerce. NIST was established in 1901, and our mission has not changed since then. NIST’s mission is to advance U.S. innovation and industrial competitiveness. At NIST, we have a very broad portfolio of research, from building the most accurate atomic clocks, to modeling the behavior of wildfires. But most importantly, we have a long tradition of cultivating trust in technology. We do that by advancing measurement science and standards - measurement science and standards that make technology more reliable, secure, private, fair… In other words, more trustworthy. And that’s exactly what we’re doing in the space of AI.

As I mentioned, NIST was established in 1901 to fix the standards of weights and measures. Our predecessors created advanced standards to measure basic things such as length, mass, temperature, time, light, electricity… All of those were essential for technological innovation and competitiveness at the turn of the 20th century. We are following the same course, working with and engaging the whole community in figuring out proper standards and measurement science for the advanced technology of our time, which is artificial intelligence.

And the way we do it is exactly, or maybe an improved version of what we have been doing in the past century or so. NIST’s day-to-day work is focused on helping industry develop valid, scientifically-rigorous methods. And one thing that I want to emphasize is that we do this through multi-stakeholder, open, transparent collaborations. While we have a lot of really good experts and expertise at NIST, we also know that we don’t have all of the answers, and it’s really important and vital for us to foster a consensus and buy-in across our stakeholder community.

So what we do is that we listen and we engage, we get the input, we distill it down, we develop a path for measurement to build up or bolster the scientific underpinning, and then we develop tools, guidelines, frameworks, metrics, standards etc. to support industry and technology. And we have done that for the development of the AI risk management framework, we have done that for quantum computing, for cybersecurity, and we are continuing to do that to improve methods and measures for risk management and trustworthiness of AI systems.

That’s very – that’s a great introduction. I’m curious - you talked about a couple of things around collaboration… You seem to be right at the center, sort of an interface between government interests in these technologies and the issues around them, and industry. And I know you work with a number of different organizations - as NIST does - in these different things that you’ve talked about. And you specifically called out trust. I was wondering if you could talk a little bit about how those different collaborations work, how trust in technology can evolve, and how NIST goes about that process - you know, in AI and in other adjacent technologies, how does it go about that process that it’s been doing for so long?

Yeah, thank you for that question. As I said, it’s sort of, I think, the magic sauce for us to do stakeholder engagements, to work with the community, ask for their input, leverage the knowledge of the community, and build on the really good work that the community has done… And by working with all of the experts, we strengthen the scientific underpinning, building the right technical building blocks that are needed for the development of scientifically valid guidelines and standards.

[00:08:08.08] In terms of the engagements, particularly in the space of AI, we all know that AI is multidisciplinary, and understanding the concept of trust - what makes AI systems trustworthy and what constitutes trust - was one of the main questions as we were developing the AI risk management framework.

And in the engagements that we were doing with the community early on, we recognized that as much as we need the input from the community that develops the technology - the community with expertise in math, statistics, computer science - we also need input from the community that studies the impact of the technology. That’s economists, sociologists, psychologists, cognitive scientists. And we need to bring all of them together, because AI systems are more than just data, compute, and algorithms. They are complex interactions of data, compute, and algorithms with the human and with the environment - with the human that operates them, and with the human that can be impacted by the system. So that engagement with a very broad set of actors in the community, to bring in different expertise and backgrounds, became really important.

To answer your question about trust and what constitutes trust - as I said, that was one of the important and central questions in the development of the AI-RMF. The AI-RMF, or the AI risk management framework, very briefly - it was directed by a congressional mandate, and it’s a voluntary framework for managing the risk of AI in a flexible, structured, and measurable way. It was, as we do with anything else, developed in close collaboration with the AI community, engaging diverse groups with different backgrounds, expertise, and perspectives, to really focus and hone in on the concept of trust and trustworthiness.

So on the side of what makes AI systems trustworthy - when we started the process, there had been very good, high-level, value-based documents that talk about AI systems being non-discriminatory, ethical, and there have been a lot of other papers and publications. Basically, there were many different views about what makes an AI technology, an AI system, trustworthy, and these views were not all aligned and on the same page. So it’s not a property that can be defined with perfect rigor, but based on the collaborations, engagements, and consultations that we did with the community, we understood that there are well-established key characteristics of trustworthy systems.

With the help of and in consultation with the community, the AI-RMF describes trustworthy AI systems as those that are valid and reliable, accountable and transparent, safe, secure and resilient, explainable and interpretable, privacy-enhanced, and fair with harmful bias managed. It takes it a step further: for each of these characteristics it provides a definition, bringing the community to a shared understanding of the expectations for each characteristic, and it also talks about how these characteristics interrelate, and the trade-offs involved in decisions about how safe is safe enough, how private is private enough, or how to enhance interpretability or transparency while at the same time, for example, preserving privacy, or ensuring the security and resilience of the AI system.
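For readers who want to make those seven characteristics concrete in their own tooling, here is a minimal sketch in Python - purely illustrative, not a NIST artifact - of recording per-use-case targets for each characteristic and spotting where a measured system falls short. The class name, the 0-to-1 scoring scale, and the example numbers are all assumptions.

```python
from dataclasses import dataclass, field

# The seven trustworthiness characteristics named in the AI-RMF.
CHARACTERISTICS = [
    "valid_and_reliable",
    "safe",
    "secure_and_resilient",
    "accountable_and_transparent",
    "explainable_and_interpretable",
    "privacy_enhanced",
    "fair_with_harmful_bias_managed",
]

@dataclass
class TrustworthinessProfile:
    """Per-use-case targets for each characteristic (illustrative 0.0-1.0 scale)."""
    use_case: str
    targets: dict = field(default_factory=dict)

    def gaps(self, measured: dict) -> dict:
        """Return (measured, target) pairs where the system falls short of its target."""
        return {
            c: (measured.get(c, 0.0), self.targets[c])
            for c in CHARACTERISTICS
            if measured.get(c, 0.0) < self.targets.get(c, 0.0)
        }

# Example: a medical-imaging triage tool weights validity and privacy heavily.
profile = TrustworthinessProfile(
    use_case="brain-MRI tumor triage",
    targets={
        "valid_and_reliable": 0.95,
        "privacy_enhanced": 0.90,
        "explainable_and_interpretable": 0.80,
    },
)
print(profile.gaps({"valid_and_reliable": 0.97, "privacy_enhanced": 0.70}))
```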

[00:11:54.11] I’m curious - and as we go we’ll certainly dive into those topics - but one of the things around trust is that for folks like us who are in this industry, living and working in and around AI and developing it every day, these are the kind of work topics we’re going through, and the guidance that NIST provides is invaluable, especially being part of that development process, as you described… But before we surge into all that - there are so many people out there who are not in this line of work as we are, who see AI in the news every day and are curious, trying to understand what these technologies are that we’re working on… Many people in the audience for this podcast are what we would describe as AI-curious; we have practitioners, but we also have AI-curious people who are trying to understand how it fits into their lives. And I was wondering if you’d take a moment and talk about the context of trust and AI for those who are not in this industry in a direct way like that. Does NIST try to frame it for the larger population, or is it more for practitioners? How do you see that for the larger world?

Yeah, thanks for that question. So let me try to answer that with an example. We are seeing enormous advancements in AI technology. Just in the past year we saw a lot of releases of powerful models. We are also seeing that these AI systems are being incorporated into a lot of the functions of society, and into the way we do our work. I want to explain the concept of trust with an example of the use of AI systems in the health domain.

When we go for medical imaging - I’m coming from computer vision; that’s where my training was, and that’s the field I feel comfortable in. So say we do medical imaging, some sort of imaging of the brain, and the question is “Is there a tumor there or not?” An algorithm can be employed to help the physicians make that decision.

So first, for that system, for that algorithm - talking about the AI-RMF trustworthiness characteristics - we want it to be valid and reliable. We want to make sure that it has a certain level of accuracy, so that the false positive and false negative rates are low. You don’t want to scare a patient by saying “Yes, there was a tumor” when there was none, or, vice versa, have a tumor go unrecognized because of errors in the system. So we want the system to function as intended. We want it to be valid, and the results to be reliable.
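As a small illustration of that validity point, the sketch below computes the false positive and false negative rates for a binary tumor detector; the toy predictions and labels are made up purely for illustration.

```python
def error_rates(predictions, labels):
    """False positive and false negative rates for a binary detector.

    predictions/labels: sequences of booleans, True = "tumor present".
    """
    fp = sum(p and not y for p, y in zip(predictions, labels))    # raised a false alarm
    fn = sum((not p) and y for p, y in zip(predictions, labels))  # missed a real tumor
    negatives = sum(not y for y in labels)
    positives = sum(labels)
    return fp / negatives, fn / positives

# Illustrative toy data: one false alarm out of four healthy scans, no missed tumors.
preds = [True, False, False, True, True, False]
truth = [True, False, False, False, True, False]
fpr, fnr = error_rates(preds, truth)
print(f"false positive rate = {fpr:.2f}, false negative rate = {fnr:.2f}")
```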

On top of that, we also want the system to be secure and resilient, because if it’s not, and the system gets hacked, there’s a lot of personal information that can get into the hands of non-friendly users. Along with that, we want the system to be privacy-enhanced. We have heard and read that large language models in particular have a tendency to memorize their training data. Even before large language models, there were papers showing that with a certain level of expertise, the training data can be inferred from AI systems. So if the system has been trained on real patient data, we don’t want any hole that can give access to that private information.

Explainability and interpretability - if it comes back and says that yes, there is a tumor, we expect it to give some reasoning, some sort of explanation of why it decided that there is a tumor. And there’s a lot of nuance there, too, because that explanation, if it’s given to a physician versus a technician versus the patient, is going to involve a different level of technicality and a different level of information being shared. And of course, we want it to be fair. We don’t want AI systems that are more accurate for certain demographics than others. This usually happens if the training data is uneven.

With all of this, in the end, we want to build confidence that this technology works, and that the results, predictions, and recommendations the system provides lead to better decision-making - in this case, analyzing a scan of the brain to see whether there is a tumor or not. All of this serves the end goal: AI technology has a lot of promise. These are very powerful tools. They can transform the way we work for the better… But we have to make sure that at the same time the technology uplifts all of us, and that we get the maximum benefits while minimizing the negative consequences.

Break: [00:17:24.08]

So I know in the early part of 2023 NIST issued the AI risk management framework that we’ve been talking about… But a few months later - or almost exactly a year ago as we’re talking, in late October - the White House issued its executive order on the safe, secure, and trustworthy development and use of artificial intelligence. So I was wanting to understand how the issuing of the executive order might have altered, accelerated, or changed any of the work that NIST was already doing. You guys were already very much involved in artificial intelligence, through the framework and other activities. Could you describe the impact of the executive order on the work you were doing?

Absolutely. In answering your question, if I can just go back from the release of the AI-RMF in January 2023 to the release of the executive order at the end of October - October 30th, 2023… So the AI-RMF was released in January 2023. In March of that year we released the AI Resource Center. This is a one-stop shop of knowledge, data, and tools for AI risk management. It houses the AI-RMF and its playbook in an interactive, searchable, filterable manner. And by the way, the AI Resource Center is definitely a work in progress, and we want to keep adding capabilities to it - things such as a standards hub and a repository for metrics. We want it to be a real one-stop shop for all of the information, but also a place for engagement across different experts.

In June of 2023 - just to give a little bit of context: ChatGPT was released in November 2022, a month or so before the release of the AI-RMF, and GPT-4 was released in February, or beginning of March, a month or so after the release of the AI-RMF. So in response to all of these new developments and advancements in the technology, we put together a generative AI public working group, where more than 2,000 volunteers helped us study and understand the risks of generative AI.

And then in October, as you said, we received our latest assignment, the Executive Order on Safe, Secure, and Trustworthy AI. This executive order really builds on the foundational work that we have been doing, from the AI-RMF [unintelligible 00:23:28.05] resource center, to the generative AI public working group, and it supercharged our effort to cultivate trust in AI, mostly by giving us some tight timelines for things to deliver.

[00:23:42.17] The EO specifically directed NIST to develop evaluations, red-teaming, safety, and cybersecurity guidelines, to facilitate the development of consensus-based standards, and to provide testing environments for the evaluation of AI systems. All of these guidelines and infrastructure, true to the nature of NIST, will be voluntary resources for use by the AI community to support trustworthy development and responsible use of AI. We approached delivering on the EO the same way that we do all of our work - going to the community. We put a request for information out to receive input; based on the input we received, we put draft documents out for public comment. Based on the comments we received, we developed the final documents, and we were very pleased that all of them were released by the July 26th deadline the EO had given us.

A quick overview of the things that we put out… One of them was a profile of the AI-RMF for generative AI. At NIST we like to refer to everything with a number, so that document is NIST AI 600-1.

It’s a cross-sectoral profile, a companion resource to the AI risk management framework. Based on the input that we had, the discussions in the generative AI public working group, and the responses to the RFI, I think one main contribution of that document, if I want to summarize it, is its description of the risks that are novel to or exacerbated by generative AI technologies. These risks span CBRN information or capabilities - access to or synthesis of materially nefarious information that can lead to design capabilities for CBRN - confabulation; dangerous, violent, or hateful content; data privacy risks… Let me remember the rest. Environmental impact, bias, human-AI configuration, information integrity, information security, intellectual property, degrading or abusive content, and the concept of the value chain and component integration…

With generative AI we are moving away from the binary deployer/developer set of actors and dynamics, and now we have upstream third-party components, including data, that are part of this value chain. So one of the things we’re doing in continuing that work is working with the community to get a better understanding of the technology stack - the AI stack, if you will - and to understand the roles of the different AI actors involved, so we can do better risk management.

As you’re talking about that, could you describe a little bit - and this is just a question in my mind; when we’re talking about AI as risks, as a set of risks, and we talk about that effort to create trust in technology, how do you tie those together? In the NIST process you’ve identified these risks, and you’ve just enumerated those… And with the purpose of ultimately helping people get to a point of trust, and being able to implement the technologies productively, how do you approach getting to trust through mitigation of risk? I’m not sure if the question makes sense or not…

It certainly makes sense. I’ll try to answer it the way I understood it. So AI systems are not inherently bad or risky, and it’s often the context that determines whether a negative impact will occur, and also what the risks are. An example I usually use: if I use face recognition to unlock my phone, versus face recognition at the airport - where our faces are now a boarding pass to get on the plane - versus face recognition in the context of law enforcement, it’s the same technology, but in different contexts there are different risks and different levels of assurance that we want to have that the system works in a trustworthy manner.

[00:28:30.07] So what we have been trying to do as part of our work in approaching trust and trustworthy AI - the first step was to unpack the concept; to get into the characteristics that make a system trustworthy. That helps answer the question of what to measure. If I want to know whether it’s trustworthy or not, what are the measurements I need to make? So I listed the seven characteristics - valid and reliable, safe, secure, et cetera. That gives a more systematic, structured approach to what the dimensions are, what the characteristics are that together can make a system trustworthy.

And by the way, the AI-RMF talks about this - no one of them by itself makes a system trustworthy. You can have a system that is very secure, but not valid or accurate - that’s not going to be trustworthy. And a system that’s a hundred percent accurate, but not secure, is also not trustworthy. So that gives, again, a more structured approach to what to measure.

Then the next step is how to measure - the methods and metrics for measurement. Those types of measurement give information about the limits and capabilities of the systems, the types of risks that can occur, and the magnitude of the impact if those risks occur. And then, based on this information, we can come up with mitigations and management of the risks.

The AI-RMF’s recommendations are really categorized into the four functions of govern, map, measure, and manage. Govern gives recommendations on the procedures and processes, roles and responsibilities that we want to have in our organizations to do effective risk management. What is the accountability line? What are the roles and responsibilities involved?

The map function provides recommendations on understanding the context of use. Going back to the face recognition examples - understanding the environment in which the AI system is operating, understanding the community that can be impacted by it, identifying the risks in this particular context, understanding the laws, regulations, and policies that are in effect in this context of use…

The measure function provides recommendations on how to measure. For all of the risks identified in the map function, it provides quantitative or qualitative recommendations on how to measure them, and how to take into account the trade-offs between all of those trustworthiness characteristics… And all of this information is used in the manage function, where the recommendations can range from safeguards and mitigations that can be put in place to mitigate risk, to - sometimes we cannot just mitigate risks, and the risks should either be accepted or transferred, or the system is so risky that it should not be developed or deployed. So that is the process in the AI RMF.
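To make the govern, map, measure, manage flow a little more tangible for practitioners, here is a hypothetical sketch of a risk register organized around those ideas. The field names, the illustrative impact scale, and the face-recognition entry are assumptions, not anything prescribed by the AI-RMF.

```python
from dataclasses import dataclass
from enum import Enum

class Treatment(Enum):
    MITIGATE = "mitigate"   # put safeguards in place
    ACCEPT = "accept"       # risk tolerated as-is
    TRANSFER = "transfer"   # e.g. contractual or insurance arrangements
    AVOID = "avoid"         # do not develop or deploy

@dataclass
class Risk:
    # map: the context in which the risk was identified
    context: str
    description: str
    # measure: qualitative or quantitative evidence about the risk
    measurement: str
    magnitude: float        # expected impact on an illustrative 0-1 scale
    # manage: chosen treatment and mitigations
    treatment: Treatment
    mitigations: list

# govern: roles, responsibilities, and an accountability line would sit alongside
# this register in the organization's risk-management process.
risk_register = [
    Risk(
        context="face recognition used as an airport boarding pass",
        description="higher error rates for under-represented demographics",
        measurement="disaggregated false non-match rate by demographic group",
        magnitude=0.7,
        treatment=Treatment.MITIGATE,
        mitigations=["rebalance training data", "human fallback at the gate"],
    ),
]
```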

Break: [00:31:47.15]

So that was very useful for me in terms of trying to frame and understand what you’re relaying here in terms of govern, map, measure, manage… And you talked about something a moment ago that was really interesting, in the sense that you have these characteristics that you’re trying to measure toward trustworthiness, but it’s not just one, and it’s not just a black or white issue. You have a collection of them, and they vary across different types of use cases, it sounds like. So you kind of have characteristic profiles, in a sense. How do you think about it if you’re out there as a consumer of the guidance you’re providing from NIST - maybe in a small company that’s doing some work in AI - and you’re trying to implement that guidance, evaluating your own profile of characteristics through that govern, map, measure, manage process? How does one frame that? If you’re just getting into this and trying to implement the guidance, could you talk a little bit about how an organization that maybe has not done this before might go about implementing it for whatever their use case is? How do they get started in the process? What’s your recommendation there?

The first thing I will say is that you don’t need to implement all of the recommendations in the AI-RMF to have complete risk management. So our recommendation is to start by reading the AI-RMF. It’s not a very long document - I think it’s about 30 to 35 pages. So get a holistic understanding of it, and then check out the playbook in the AI Resource Center. The AI-RMF is, at a high level, four functions; each function is divided into categories, and then subcategories - a granular approach. We give recommendations on what to do for govern, for example, and then for each of those we get into slightly more granular recommendations.

For each of the subcategories - there are about 70 subcategories in the AI-RMF, I think - the playbook provides suggested actions, informative documents that you can go read to get more information, and also suggestions about transparency and documentation for implementation of that subcategory.

So we often suggest getting a better understanding of the AI-RMF, spending some time in the playbook to get a better sense of the types of things that can be done, and then, based on the use case - based on exactly what you want to do - starting with a simple, small number of recommendations in the AI-RMF and implementing those. The govern or map functions are useful starting points.

[00:38:09.08] Govern provides recommendations about the setup that you need for successful risk management, so it can give an organization ideas about the resources that are needed and the teams that need to do this, so they can align it with their own resources and the teams that they have. And the map function, as we discussed, gives recommendations for getting a better understanding of the context, and answering what needs to be measured.

I will also add that for the functions - govern, map, measure, manage - there is no prescribed order. It depends on the use case, it depends on what needs to be done. The starting point can be the recommendations of any of the functions; we usually recommend starting with govern and map. And then start with as few of the subcategories or recommendations as the resources and expertise of the entity allow for implementation. Of course, prioritize in terms of your own risk management.

And then the last thing I’ll add is to be mindful that risk management is not a one-time practice that you do once and then say “Okay, I’m done with my risk management.” With AI systems there’s data drift, model drift; these newer models can change based on interactions with users and with the environment… So we suggest continual monitoring and risk management. I think one of the recommendations in map or govern is to come up with a cadence for repeating the risk assessments.
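One way a team might act on that continual-monitoring advice is a scheduled drift check. The sketch below compares a reference sample of a model input feature against the latest window using a rough population-stability-index-style calculation; the threshold, bin count, and data are illustrative assumptions, not NIST guidance.

```python
import math

def population_stability_index(expected, actual, bins=10):
    """Rough PSI between two samples of a numeric feature (illustrative only)."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]

    def frac(sample, i):
        count = sum(edges[i] <= x < edges[i + 1] for x in sample)
        return max(count / len(sample), 1e-6)  # avoid log(0)

    return sum(
        (frac(actual, i) - frac(expected, i)) * math.log(frac(actual, i) / frac(expected, i))
        for i in range(bins)
    )

# Re-run on a fixed cadence (e.g. weekly); re-assess risks if drift exceeds a threshold.
reference = [0.1 * i for i in range(100)]        # training-time feature values
latest = [0.1 * i + 2.0 for i in range(100)]     # this week's values, shifted
if population_stability_index(reference, latest) > 0.2:  # common rule-of-thumb cutoff
    print("Drift detected - trigger the scheduled risk re-assessment")
```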

So those would be my recommendations. Another thing that I would say is that - I mentioned the AIRC, I mentioned the playbook… In the AI-RMF we also talk about profiles. I keep emphasizing the context of use, and the importance of context in AI system development, deployment, and risk management… At the same time, the AI-RMF by design tries to be sector-agnostic and technology-agnostic. We try to come up with the foundations - the common set of practices to be aware of that are suggested for risk management. But we also have a section on AI profiles, and recommendations on building verticals. These profiles are instantiations of the AI-RMF for a particular use case, domain of use, or technology domain, so that each of the subcategories can be slanted toward or aligned with that use case. So there can be a profile of the AI-RMF for the example that I used, medical image recognition. Or you can imagine a profile of the AI-RMF for the financial sector. That’s something we have been asked to work on with the community.

That was a very long intro to say that there are a couple of profiles posted on the AI Resource Center. One is a profile the Department of Labor did for inclusive hiring. Another is one the Department of State did for human rights in AI… So that can give some sort of window into, or idea about, where organizations can start.

In addition to the profiles, we have also posted a few use cases, and we will post more - examples of how different organizations are using the AI-RMF that can hopefully serve as more practical illustrations of how to use it.

[00:42:03.19] That’s a fantastic set of suggestions right there… And I’d actually like to ask a follow-up to that. And as a prelude to my follow-up, if I’m understanding, kind of go to the AI-RMF, read that core document. It’s not very long. It’s very consumable. Go to the playbook, look at the subcategories… I believe you said there were about 70 of them. It has suggested actions and references to other docs in that… And then start to bite off simple, small chunks in terms of how you’re going to approach the functions that you mentioned, starting with govern and map, and then kind of how to put together resources and teams… And then cycling back, with a cadence of repeated assessments that are also specific to the vertical that you’re in. And as you’re doing that, it’s feeling really practical from my standpoint. We are practical AI, so that appeals to us.

I’d like to ask, are there now, or do you expect tooling? If you look outside of AI, at the software industry at large as a predecessor to that, as standards and workflows and best practices arose in software development at large, lots of tooling arose around how to do agile methodology… And you name it; there are many different approaches to software development. Are you expecting tooling, or do you have any thinking around what kind of tooling might help AI development teams as they’re building these teams and their resources, so that they can be productive over time? How are you seeing that evolve going forward? Do you think that there’ll be a cottage industry kind of forming around this the way we’ve seen in software and other areas, where there’s a lot of tool support?

Yes, we have already started seeing some of that. There are entities that are putting out tools for implementation of the AI-RMF - dashboards and all of this. They have developed those tools and have them on their websites. And if I can just go back - thank you for your excellent summary of my very long-winded answers…

No, it’s very good. I’m learning a lot here.

And I would ask your listeners to start with the AI Resource Center. The URL is airc.nist.gov. The AI-RMF is there, and the playbook is there in an interactive, filterable way. So if they are developers, they can first filter, from the 70 recommendations, anything that is applicable only to developers, so they’re not overwhelmed by all of it. Or if they only care about deployment, and the issue of bias in deployment, they can say “Filter by AI actor, for deployers, and by characteristic, for bias”, and that saves them some time. So that is where they can get the information from our website, with some hints - we have it in a more filterable form.
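The filtering Elham describes maps naturally onto a simple query over structured playbook entries. The sketch below is hypothetical - the real, filterable playbook lives at airc.nist.gov - and the actor and characteristic tags attached to each entry are made up for illustration.

```python
# Hypothetical playbook entries; the real playbook at airc.nist.gov has ~70 subcategories.
playbook = [
    {"subcategory": "MAP 1.1", "actors": ["developer", "deployer"],
     "characteristics": ["valid_and_reliable"]},
    {"subcategory": "MEASURE 2.11", "actors": ["deployer"],
     "characteristics": ["fair_with_harmful_bias_managed"]},
    {"subcategory": "GOVERN 1.2", "actors": ["developer"],
     "characteristics": ["accountable_and_transparent"]},
]

def filter_playbook(entries, actor=None, characteristic=None):
    """Keep only entries relevant to a given AI actor and/or trustworthiness characteristic."""
    return [
        e for e in entries
        if (actor is None or actor in e["actors"])
        and (characteristic is None or characteristic in e["characteristics"])
    ]

# A deployer worried about bias narrows the subcategories down to the relevant few.
for entry in filter_playbook(playbook, actor="deployer",
                             characteristic="fair_with_harmful_bias_managed"):
    print(entry["subcategory"])
```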

And yes, entities have already started putting more tooling in place. And with the 600-1 - the cross-sectoral profile of the AI-RMF for generative AI - and the work that we’re doing with the community, we are focusing on what we call “operationalization”: what are the tools that are needed for operationalizing and implementing the AI-RMF. And going back to community engagement, and the role that input from the community plays in all of these things - some of the tools can be developed by us, but the majority of the tools are being developed by the community, and shared by the community, and we hope to see more of that.

[00:46:05.18] I hope so, too. It’s fascinating, and I love the framework that you’ve given us here, that can be applied in so many different verticals, and so many different ways, and yet is flexible in its guidance that way.

As we wind up here - we have seen so much advancement in the development of AI, both as a technology and in the industries around it, and you are sitting there in the nerve center where this guidance and these standards come together, bridging both government and industry… As you look forward - when you’re not in a particular meeting, and you’re winding down and thinking creatively about where things are going - what are some of your own thoughts about the future of this, both for NIST’s role and for the industry and the technology at large? Because it’s going at such a rate; it’s so fast, and it’s fascinating, and it’s changing the face of business, changing how we are as humans, in terms of the tools that are available to us… I’d really love your insights into where you think all of this is going in the days and years ahead.

For me the end goal - what I’m hoping to see a lot of - is using this powerful technology as a scientific discovery tool, changing the way we do science and discovery. I think that is where we are going to see a lot of real advancement: precision medicine, individualized education, climate change… Anything that’s going to make life a lot better for all of us. I have to say, my heart was warmed by seeing Nobel prizes for things such as AlphaFold. I have been saying for a long time that that work needs a lot more recognition - and really, I was glad for all of the recognition that AI got through those prizes.

But I’m also very aware of the important things that NIST can do and that the community needs to do. I think we all agree that there is a lot we don’t know about how these models work, and we ought to do something about it. We need a better understanding of how these models work, their capabilities and limits… That gets me to the important topic of evaluations and testing. We talked about it at the beginning of this podcast - it’s important to unpack the concept of trust into the things that need to be measured, but at the end of the day we need reliable measurements to provide assurance that the systems are trustworthy.

At NIST, as a measurement science agency, we are big fans of the quote from Lord Kelvin that if you cannot measure it, you cannot improve it. So if we want to improve the trustworthiness and reliability of these systems, we need a good handle on how to test them and how to evaluate them - for reliability, for validity, for the trustworthiness characteristics - and our knowledge of how to test AI systems is very limited. We need better evaluations. As we can see, benchmarks are too easy; they get saturated very quickly. We need a better understanding of how these systems work. That gets us to the assurance that can build trust in the technology, and give users - everybody - confidence that the system works.

And the third item I’d put in - once we have built that knowledge base, once we have good scientific foundations, once through the research and the work with the community we have built the technical building blocks - is: let’s develop clear, understandable, technically robust standards that can help with global interoperability of AI evaluations, AI assurance, and AI governance.

Fantastic. Well, Elham Tabassi, thank you so much for coming on the Practical AI podcast. It was very, very instructive in terms of how to frame this; certainly information that I’m going to be using going forward, and I really appreciate you taking time to talk with us today.

I appreciate the opportunity to be here and talk, and I really enjoyed the conversation. Thanks.


Our transcripts are open source on GitHub. Improvements are welcome. 💚
