
Keynote: The state of AI with Arvind KC, Chief People and Systems Officer at Roblox

Key Highlights

Senior leaders at high-performing AI companies are three times more committed to AI adoption
McKinsey data shows that companies leading on AI have one defining characteristic: senior leadership commitment that far exceeds the norm. The implication for functional leaders is direct—keeping curiosity alive and building personal learning routines isn't optional, it's the job.
Jump to 9:31
An AI tool that's right 80% of the time earns 0% of your trust
Arvind shares a quote from an engineer at Roblox about a support bot that was confidently wrong half the time—making it useless all of the time. The insight cuts to the core of why deploying AI without the right guardrails doesn't just fail to add value, it actively erodes it.
Jump to 23:05
Work is not going away, but workflows as we know them are
Arvind draws a sharp line between automating tasks and rethinking what work actually looks like. The shift from static, task-based workflows to goal-oriented agentic systems is where the real transformation lives—and most companies haven't crossed that line yet.
Jump to 34:18

AI adoption is not a switch you flip. It is a change in how work gets done, how decisions get made, and how organizations learn.

In this interactive talk, Arvind KC, former Chief People and Systems Officer at Roblox, shares a practical framework for navigating the shift from traditional automation to agentic workflows. The core idea is simple: work is not going away, but workflows as we know them are.
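To make that distinction concrete before the talk does: here is a minimal, illustrative sketch (ours, not Arvind's; every function is a stand-in stub) of the difference between a static, task-based workflow and a goal-oriented agentic loop, where the model is handed a goal and a toolbox and plans its own path.

```python
# Illustrative sketch only; every function here is a stub standing in for a
# real system (an actual agent would call an LLM to pick the next tool).

def search_kb(query):                  # stub: knowledge-base lookup
    return f"KB result for {query!r}"

def draft_reply(ticket, context):      # stub: draft an answer from context
    return f"Reply to {ticket!r} using {context!r}"

# Static workflow: a fixed sequence of tasks, decided up front by a person.
def static_workflow(ticket):
    context = search_kb(ticket)            # task 1
    return draft_reply(ticket, context)    # task 2

# Agentic workflow: a goal plus a toolbox; the model decides the path.
TOOLS = {"search_kb": search_kb, "draft_reply": draft_reply}

def choose_next_action(goal, ticket, history):
    # Stand-in for an LLM call that sees the goal, tools, and history,
    # then returns the next tool invocation (or "done").
    if not history:
        return "search_kb", {"query": ticket}
    if len(history) == 1:
        return "draft_reply", {"ticket": ticket, "context": history[-1][1]}
    return "done", {}

def agentic_workflow(ticket, goal="resolve the ticket"):
    history = []
    while True:
        action, args = choose_next_action(goal, ticket, history)
        if action == "done":
            return history[-1][1]
        history.append((action, TOOLS[action](**args)))

print(agentic_workflow("How do I reset my password?"))
```

The point of the loop is that the sequence of steps is chosen at run time from the goal, not hard-coded by the workflow's author.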

What you’ll learn

  • How to lead AI adoption as an organizational change, not just a technology rollout
  • How to build learning routines at the leadership level
  • How to foster a culture of experimentation in G&A functions
  • How to identify use cases where you can test quickly and iterate fast
  • How to design agentic workflows that guard against hallucinations and missing context
  • How to build evaluation systems that measure both technical quality and real business outcomes
  • How to structure learning loops so systems improve with usage and teams retain critical thinking
  • Where AI creates lasting value beyond surface-level features

About the speaker

Arvind KC is the Chief People Officer at OpenAI. He was previously the Chief People and Systems Officer at Roblox, where he led a first-of-its-kind fusion of the people and systems functions into a single team. That combination gave him an unusually hands-on perspective on how AI can reshape the way organizations are built and run.

Before Roblox, Arvind held senior roles at Google, Palantir, and Facebook, giving him a vantage point across multiple waves of technology transformation. He brings that cross-industry experience to bear on one of the most pressing questions for today's leaders: not whether AI will change how work gets done, but how to lead that change well.

Takeaway #1: Lead the change—and build your own learning routine first

AI adoption doesn't happen from the middle of an organization. It starts at the top. Arvind points to McKinsey data showing that senior leaders at high-performing AI companies are three times more committed to the technology than their peers elsewhere. That commitment shows up in a specific way: staying curious even when the technology frustrates you.

Arvind describes oscillating between fascination and frustration with AI—sometimes within the same ten-minute session. His advice isn't to power through the frustration. It's to separate your emotional state from your commitment to learning. You can be pessimistic about a specific tool or outcome while still keeping your foot on the pedal of curiosity. For G&A functions in particular, this means catalyzing learning that won't happen on its own. At Roblox, that looked like:

  • Investing in hands-on AI training for non-technical teams
  • Encouraging employees to experiment with tools like NotebookLM
  • Finding low-stakes ways to build confidence with new capabilities

Takeaway #2: Pick the right use cases before you build anything

Not every business problem is suited for AI right now. Arvind lays out a practical filter for identifying where to start—and where to wait. Three questions worth asking before committing to a use case:

  • Is there a tolerance for probabilistic outcomes? LLMs are inherently probabilistic, which makes them poorly suited for deterministic workflows like financial filings or inventory management.
  • Is this an innovation opportunity rather than an efficiency play? AI tends to generate higher ROI when it opens new possibilities rather than just trimming existing costs.
  • Does the team have quality training data, leadership buy-in, and the appetite for iteration?

Beyond those filters, Arvind emphasizes one practical consideration above all others: how tolerant the team is of failure. If the first sign of failure will cause a team to give up, that's the wrong place to start. The right place is where people are genuinely excited to experiment, fail, and try again.
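As one way to internalize the filter, here is a hypothetical sketch (the field names and gates are our illustration, not from the talk) that encodes the three questions plus failure tolerance as a go/no-go check:

```python
# Hypothetical triage helper encoding the filters above; field names are ours.
from dataclasses import dataclass

@dataclass
class UseCase:
    name: str
    tolerates_probabilistic_output: bool  # LLMs are inherently probabilistic
    is_innovation_play: bool              # innovation tends to out-earn pure efficiency
    has_training_data: bool
    has_leadership_buy_in: bool
    failure_tolerance: str                # "low" | "medium" | "high"

def worth_starting(uc: UseCase) -> bool:
    # Hard gates: if the first failure would end the experiment, start elsewhere.
    return (uc.failure_tolerance == "high"
            and uc.tolerates_probabilistic_output
            and uc.has_training_data
            and uc.has_leadership_buy_in)

def priority(uc: UseCase) -> int:
    # Innovation plays rank higher than efficiency plays; they don't gate entry.
    return 2 if uc.is_innovation_play else 1

sec_filing = UseCase("automated SEC filing", False, False, True, True, "low")
support_qa = UseCase("internal support Q&A", True, True, True, True, "high")
print(worth_starting(sec_filing), worth_starting(support_qa))  # False True
```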

Takeaway #3: The real work happens in the underbelly of workflows

The biggest misconception Arvind challenges is that AI transformation is about turning on a feature. To make this concrete, he walks through a live example: building a customer support system at Roblox. The naive version—pointing an AI tool at a knowledge base and letting it answer questions directly—produced a bot that was confidently wrong often enough to be useless all of the time. As one Roblox engineer put it:

"I find the support bot usually confidently gives me answers and is just wrong 50% of the time, which means its answer is useless to me 100% of the time."

— A Roblox engineer, as quoted by Arvind KC

The better version introduced structured prompts, dual-scoring by both an LLM judge and a human domain expert, routing logic based on confidence scores, and feedback loops for continuous improvement. The lesson isn't that AI can't do customer support. It's that making AI work in production requires rethinking the entire workflow: who touches the output, how quality is evaluated, and how the system improves over time. The sparkle is the beginning of the journey, not the destination.
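Here is a minimal sketch of that routing shape, assuming hypothetical stand-ins for the model calls and scorers (this is our illustration of the pattern, not Roblox's actual system):

```python
# Illustrative sketch of the confidence-routing pattern described above.
# All helpers are hypothetical stubs; the production workflow is messier.

def ask_llm(prompt):                    # stub: model call with a structured prompt
    return f"Draft answer for: {prompt.splitlines()[-1]}"

def llm_judge_score(question, answer):  # stub: second LLM grades the draft
    return "high"                       # "high" | "medium" | "low"

def human_expert_score(question, answer):  # stub: domain expert grades the draft
    return "high"

def enrich(context):                    # stub: pull in additional knowledge sources
    return context + "\n(additional source)"

def escalate(question, answer):         # stub: low confidence feeds the learning loop
    return f"Escalated {question!r} with draft {answer!r} to domain expert"

def build_prompt(question, context):
    # Structured prompt: attach curated context instead of forwarding raw questions.
    return f"Context:\n{context}\n\nAnswer, citing sources: {question}"

def handle(question, context, attempts=0):
    answer = ask_llm(build_prompt(question, context))
    scores = {llm_judge_score(question, answer), human_expert_score(question, answer)}
    if scores == {"high"}:
        return answer                   # log to Jira, reply in Slack
    if "low" not in scores and attempts < 2:
        # Medium confidence: adjust the prompt/context and try again.
        return handle(question, enrich(context), attempts + 1)
    return escalate(question, answer)   # feedback to domain + agentic flow experts

print(handle("How do I appeal a moderation decision?", "KB article on appeals"))
```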

Takeaway #4: Build evaluation systems and protect against cognitive offloading

When you deploy a human workforce at scale, you build performance management systems to know who's doing their job well. The same rigor applies to an agentic workforce. Arvind asks every AI vendor he meets for a detailed eval scorecard before taking a meeting, and breaks that scorecard into three layers (sketched in code after the list):

  • Technical performance: precision, accuracy, hallucination rate
  • Business outcomes: did the AI actually accomplish what it was supposed to accomplish?
  • Ongoing evaluation: models change, data changes, and workflows evolve—evaluation isn't a launch checklist item, it's an operating system
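A rough sketch of what such a scorecard could look like as a record (our illustration; the fields are assumptions, not a standard schema):

```python
# Hypothetical eval scorecard spanning the three layers above; fields are ours.
from dataclasses import dataclass
from datetime import date

@dataclass
class EvalScorecard:
    # Layer 1: technical performance
    precision: float
    hallucination_rate: float
    # Layer 2: business outcome; did it accomplish the actual job?
    task_success_rate: float    # e.g. share of tickets resolved without escalation
    user_rating: float          # e.g. "did this summary represent your conversation?"
    # Layer 3: ongoing evaluation, not a launch checklist item
    model_version: str
    last_evaluated: date

def needs_reeval(card: EvalScorecard, current_model: str, max_age_days: int = 30) -> bool:
    # Models, data, and workflows all drift; re-run evals when anything changes.
    stale = (date.today() - card.last_evaluated).days > max_age_days
    return stale or card.model_version != current_model

card = EvalScorecard(0.92, 0.03, 0.71, 4.2, "model-2024-06", date(2024, 6, 1))
print(needs_reeval(card, "model-2024-09"))  # True: the model version changed
```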

There's also a subtler risk: cognitive offloading. A Microsoft research paper Arvind references found that people who over-trust AI stop applying critical thinking, even when the AI is wrong. His prescription is to design workflows where human domain experts are actively coaching the system, rating outputs, and staying engaged. The goal isn't to get humans out of the loop. It's to make the human-AI loop one where both sides keep getting better—because as Arvind puts it, the future isn't about how we do things. It's about how we teach things.

Transcript

Arvind KC: [00:00:00] I, I like to try different tools. I end up using my machine all the time, which always creates last minute challenges. Figure this out. All right. Hey folks, it's good to be here with you all. Uh, I'm gonna make this a fun, interactive session, so we will make it like a little bit more participatory, and I hope you folks have, uh, you know, questions and comments.

And some of the stuff that we're going to do is going to be a joint exercise. Is that cool? All right. So, uh, the way we are gonna start by making it interactive is the following: point your phone at this, and we're going to do a Menti session in a second.

Ready?

You guys gotta move. Ah, that was a trick to see if you folks are paying attention. All right, so now we are going to start with a quick survey. It's gonna take us a few minutes, but let's get some [00:01:00] participation going. What's your favorite AI tool? Granola, interesting. ChatGPT, Claude, Perplexity.

Uh, ChatGPT's gotten bigger. NotebookLM, that's interesting.

All right. We got 28 folks. We're gonna get close to 28 before we answer. So it definitely looks like ChatGPT is ahead. And a "Team Hana AI analyst," I'm sure. Well done, whoever did that.

Alright, we got 21 of 28. We got seven more to go. Come on folks.

In the interest of time: going once, twice. I'll go to the next question. AI will completely transform how work gets done. And there are actually three questions: in the next year, in the next three years, and in the next 10 years. What do you think? Do you strongly disagree or strongly agree? So there are three questions.

How strongly do [00:02:00] you agree or disagree with this? Interesting. Fantastic.

Nine, 18. We'll wait for our magic number of 21. 21, okay. So as you can see, most of you have a very similar intuition: it's not quite there yet, but we believe in this tech, right? If you look at the score on "strongly agree in the next 10 years," everybody thinks it's gonna completely transform how we work.

But the first response, for the next one year, was a one, right? So it was more of people saying, I'm skeptical in the short run, but optimistic in the long run. The next one is: does AI create jobs or eliminate jobs? What do you think?

Wow, nobody. Okay. Eliminate jobs. Thank you. One person for, for having a different opinion. Interesting.[00:03:00] 

Another person. Well, I think the consensus is it's going to be a huge workforce shift that happens. It's not just going to create, nor just eliminate; it's gonna both create as well as eliminate, and that's the shift we are experiencing. And I think that's about right. And then, um, what about internal functions? Think about the G&A functions: finance, recruiting, and HR. Do they become less strategic, more strategic, or do they get automated away?

I love the responses to this question. There's a confidence and conviction here, which I agree with.

So almost everybody thinks that, you know, internal functions are gonna get more strategic, uh, with the change that happens. And I think, uh, this is the last question. Uh, you know, what are your predictions for the workforce of [00:04:00] 2030? And this is a freeform text. Anything goes.

Conversational. Okay.

We can do more than agile. Okay.

AI agents as employees, that's, I think many of us can see that. Less junior positions, more strategic level work. That's interesting. I actually have the counter view there. Uh, but we can talk about that separately. And lots of uncertainty for new grads. I think that's right. But I would extend it to say lots of uncertainty.

There's no qualifier beyond that. It's lots of uncertainty for all of us. I sometimes joke and say, uh, you know, LLMs are really good at predicting stuff, and the more unpredictable you are as an executive, [00:05:00] the harder it is to eliminate your job. Right. But that's just in jest. Um, alright, so last one.

Um, how prepared are you for this change? Very well prepared, bring it on. No idea what to do. And, Facebook-relationship-status style: it's complicated.

I guess it's complicated, and that's why we are all here, trying to, like, you know, make sense of what's going on. Um, I love the confidence of "very well prepared, bring it on." Is that team Mohana?

All right. So I wanted to start off with this so that, you know, we all can think about, uh, you know, what, what we're gonna do. I agree with many of the views you've said, and my intent in the next, um, uh, half an hour or so is to share my views on where things are and how can you prepare for this change. Uh, but we'll do a little bit of another interactive thing in between, uh, and we'll have some space for questions.

So, with that, I [00:06:00] wanna start with a little bit of framing. So what has happened, in simple terms? As Shar said, I was fortunate to spend time at Facebook, Palantir, uh, Google. And now the funnest job I have is at Roblox, where we have fused people and systems into one team.

Uh, and that has given me the fortune of looking at some transformations that have happened. But if you look at even five years back, most of the data that ML could operate on was structured data. That was data that was there in tables, and you could only do models that were based on tables.

What has happened with gen AI is you have unleashed unstructured data, and the majority of the data that is there in an organization is unstructured data: text, voice, video. All of that has now been unleashed, so you can now do a lot more. The second thing that has happened was, in the past, if you were to do any form of machine learning, you needed to be an ML engineer.

You'd build some models, either with, uh, TensorFlow or scikit-learn. But now with gen AI, [00:07:00] uh, with simple prompts, you can do a lot of fancy stuff. You can, for instance, give a piece of text and say, tell me the sentiment of it. This used to be, uh, a classification problem. So a lot of NLP problems just became a lot easier, where everyone can now do, uh, machine learning.

And the third thing is, in the past, the output of AI was very simple. Like, if you remember the early image classification days, all it did was say: is it a cat or a dog? Yes or no. But now you can do much richer output. It can generate images, it can generate videos. And now the current, uh, hot area in the world is world generation, where you actually have physics involved as well.

So it can do a lot more generation than before. So that's kind of what's happened in a nutshell: you've taken data that was hidden in enterprises, unleashed it, and then let anybody, uh, use advanced intelligence to interrogate this data. Okay. Now, having said that, there is also this thing about agentic tools.

So what's that? What's the big deal about agentic tools? And I think many of us would've wondered, is this a new buzzword, uh, like [00:08:00] crypto, or is it something else? Uh, and I think there's something real about it. Like, if you think about the static workflows of the past, uh, you had to have a fixed workflow that you define, and you give it a series of tasks.

The key difference in an agentic workflow is you talk about goals. You don't talk about tasks, right? You don't say, here is the sequence of tasks I want you to do. You know that the LLM can do multiple things. You give it a set of tools and say, here are the goals I want you to achieve, and I want you to craft a path to achieving them.

So what you have done is, workflow automation in companies has been upgraded, because you can now operate at a higher intelligence level of giving people goals as opposed to tasks. Right. Or giving the machine goals as opposed to tasks. So that's the big deal about agentic workflows. Uh, it also lets you have very flexible workflows.

You don't need to plan for the infinite permutations that are there. You can give an LLM a set of tools and say, given a user intent, you figure out what's the workflow that needs to be constructed. Right? So those are the two big things that have happened, and that's why people are like, oh, I suddenly have

A much more capable [00:09:00] workflow automation that's there, and how do I unleash it in the workforce? So that's, that's, that's the big deal that's happened. The rest of the section I want to like devote to, uh, you know, how do we, how do we be successful with this transformation that's happening? But before that, I see a lot of people feeling a little angsty about, I don't know if I'm, you know, am I too late?

Are the companies ahead of me? Where are things? So I wanted to start by assuring you: it's still very early. Don't freak out. Right. You know, there is a lot to do. And let me explain this chart to you. This is from a McKinsey report that just came out, where they surveyed about 2,000 people.

And what this shows is, along the rows are the different business functions in an organization, and then they are shaded by: have they started an AI transformation journey, or have they fully scaled it? So if you see blue and darker shades of blue, that's the percentage of, uh, fully scaled workflows within a business function, right?

So in IT, if you see, for example, [00:10:00] the dark blue represents fully scaled. And then the way to think about this is, what's the amount of gray in this chart? And there's lots of gray, which means most people are very early in their journey. So if you feel you're early in the journey, or if you feel you just started:

Chill. This is a marathon, not a sprint. We good with that? All right. Thank you for that interaction. That's fun. Okay, now let's go to, so how do we go about changing this, right? And you all are leaders of different functions in your organization. The first thing that you need to do is lead the change, right?

And uh, this, again, is from the same report. What this shows you are two charts. The left is everyone except the high performers, and the right is the high performers. And it asks them a very simple question, which is: how committed is your senior leadership? And what they see is that for companies that are high performers when it comes to AI adoption, their senior leaders are three times more committed, right?

So you, as leaders of your organization, have to [00:11:00] figure out a way to get learning routines on top of this. I mean, as an example, you all coming here is a way of, like, you know, fueling the curiosity. So the first thing you gotta do is lead the change. Now, one thing that I've noticed: all of you have been on this journey for a few years now, and I go between, like, fascination and frustration, right?

Fascination at things it does, and frustration at simple things that it cannot do. And sometimes the journey from fascination to frustration happens in 10 minutes. Right. So here's my view on this: I believe AI is a long-term trend. All of you believe it's a long-term trend.

When you look at the survey that we did for a long term trend with a lot of hype around it, it's perfectly okay to vacillate between optimism and pessimism. Okay? But don't take the foot off the pedal on being curious. You can be very pessimistic, but this is a long term trend. So you can be like, oh my God, this is so hard.

This is difficult to do, but you can't reduce your curiosity. And with this, you [00:12:00] have to foster a culture of learning, right? That's the second thing that you can do as leaders. Um, your organization's people are all looking and saying: I have my daily job, this wave is hitting me, I don't know what to do.

And you got to help them make sense of it. So, to share some real examples: uh, we chose Gemini, and Google, as our stack. It can be any stack; I'm not, uh, pitching any particular product. Um, but the thing that we did is, uh, for engineering functions, of course, there is a self-learning loop that is there, but we had to catalyze the self-learning loop for G&A functions.

And it's as simple as: uh, get some advanced Gemini training. Like, you know, how do you use notebooks? Somebody had said that they like notebooks. Who was the NotebookLM fan? Okay. Thank you. So it's a pretty fascinating product. Uh, and then we also had people, um, instead of writing walls of text, vibe code a small React app that is visually much more appealing, so people [00:13:00] can, like, understand what you're trying to communicate. Uh, so this notion of catalyzing, uh, an organization's learning journey is something that you'd have to take on. So you foster a culture of learning.

And a couple of other examples that worked well for us at Roblox: um, one of our, uh, HRBPs took all the expectation matrices, which mostly are very hard to understand, put them into NotebookLM, and created a podcast out of it. And then managers are much more able to reason, oh, this is what you actually mean as the difference between different levels.

So it's a much better way for, like, you know, someone to use the tool, but also to easily add value to the enterprise in an area which was very non-shiny. What you see on the left is we built a conversational engine, uh, where, you know, you can have conversations with non-player characters.

So if a manager wanted to go and say, I wanna understand compensation, I really don't understand compensation, and these are like, I've got some very basic questions, they can do it in like a very psychologically safe environment, which is also fun. Similarly, if you're like, I wanna give feedback, I dunno how to [00:14:00] give feedback, you can go and play with this NPC, which does a conversational engine with you.

So these were ways in which we said: this is coming, and you can embed it into your workflows, right? So first is, you lead the change. Second is, you foster this culture of, like, continuous learning and experimenting that's there. The third thing is, like, most people are like, where do I start?

Right? And this is a table that is roughly to give you some idea of potential areas of success. And again, it's opinionated. This is my strong conviction, but lightly held, and I'll change it with data. But I wanted to share some of my thoughts on that with you. The first is: is there a tolerance for probabilistic outcomes?

Because LLMs intrinsically are probabilistic. Right. So I don't see, uh, any CFO saying anytime soon, take all my transactions, run them through an LLM, and then do an SEC filing automatically for me. That's not gonna happen, right? That's a very deterministic workflow. So if there's a tolerance for probabilistic outcomes, then that's well suited for AI.[00:15:00]

The second, which is an interesting insight: many have looked at AI as, uh, some form of an efficiency optimizer, and hence they've looked at the EBIT side of it. But it is much more applicable on the innovation side than on the EBIT side. Right. So, uh, there was a recent paper on this saying that the innovation areas, where you have a co-scientist,

uh, or, like, you know, a researcher who's helping you get much better, are the areas where it's currently applicable. So I think focusing on that will give you higher ROI than if you try to change, like, the efficiency. And we'll actually do a fun exercise with a workflow where we can go through it.

And of course, the third is obvious: uh, it's better suited for knowledge work. It's not there for physical work; we still have to do that ourselves. Uh, but knowledge work is an area for AI. Uh, I believe the other piece, which you all see, is there is going to be an enormous amount of change management, and you gotta pick an area where the openness to change management is extremely high.

If you feel that that area is kind of medium or low on openness to change [00:16:00] management, don't waste your time right now. It can be something that you do in the future. Uh, we talked about leadership buy-in, training data. A great example of training data is coding. The reason, uh, it works in software engineering so well is that the training data for software engineering is exceptional,

and hence it's able to generate, like, code. So you want to think about how much training data you have, uh, for the area that you're doing. And you've got to be very iterative in this. So the tolerance for iterations has to be really high. So if you're going to attempt it in a business function where at the first sign of failure people are going to give up, you might as well not start.

Right. So that's the negative way to say it. The positive way to say it is: go to people who are, like, really excited to do multiple iterations. All good so far? Yeah. Keep going. Yeah. Okay. All right. Um, now the second version of this is, where are people finding success? So the way to read this chart, again: on the top, on the columns, are different industry verticals.

On the rows are different business functions, and the bubbles are sized by the percentage of, uh, [00:17:00] business functions that feel they have reached some scale of workflow with AI. So when you see, um, let's take a column, say technology and software engineering, and you see the bubble of 24: it says that 24% of organizations feel that they have reached scale on that business function.

Another way to read this chart is, you probably have a better chance of finding success where people have already found success. So if you're going to go and do supply chain, uh, or inventory management (and my engine room days were in supply chain management), those are, again, very deterministic workflows. You're not going to, uh, you know, have a probabilistic foundation get you success in that. So this gives you, again, a map of what areas you could go and attempt this in, right? So: lead the change, foster a culture, understand what are the areas of impact.

And then this is, uh, I'll not go too deep into it, but somebody in your organization has to own: what is the technical architecture that we're going to use in this agentic world? [00:18:00] And some of the pieces of a technical architecture are, like, you know, what's the common user interface? It's not just about a sparkle that you turn on in different products.

What's the common user interface? Is there a reasoning agent which figures out what's the task that needs to be done? What's the workflow engine that one can use? Uh, you need to have memory, because you need to persist, like, you know, the knowledge of the organization. Uh, and then there is a system of connections to multiple existing systems.

So you need a way to connect to your systems of record, be it Workday, be it, uh, Salesforce, be it SAP. So that's the control plane. And then all of this has to have an underlying infrastructure. Yeah. Now, at Roblox we are about 80% engineers, and, you know, you have an engineer running the people function, so we tend to be, like, much more technical, and we've built a lot of this around open source.

Uh, but you don't need to do that. Just don't feel intimidated by this. You can choose any of the hyperscalers; they all have an architecture that helps, uh, you know, bootstrap you and really accelerate you on this. [00:19:00] But my point of this is, uh, there has to be an underlying technical architecture that compounds.

Because every project that you do cannot be a project where you start from zero. Again, it has to have the learnings of the past. You good? Okay. Uh, the next chart was just a visual of how we envisioned it, where we said, like, hey, what if we had a place where people can go and look at the agents everyone has created, so they can say, oh, that agent, uh, say the one Shar created, is, like, really cool and it's got lots of success.

Let me pick up that agent and use it for myself. Uh, is there a place where you can go and look at workflows and, like, run those workflows? Is there an MCP store? So this was just a different visual version of the technical architecture that I talked about. But the point is, you gotta have somebody in your organization saying: I'm gonna own the technical architecture for the transformation into this new world.

Okay, I wanna take a slight detour now. One of the most important things that LLMs are harnessing is the knowledge that you have in the [00:20:00] organization. Does that make sense? Okay. It is the knowledge bases you've created: your emails, your Slack, your Atlassian, your Jira, your GitHub code repositories.

All of this is the foundation on which an LLM is going to be used and deployed in a workflow. But we have not paid any attention to the knowledge graph that's been built in organizations. Almost no organization has got a knowledge graph which brings the data of these different systems into a place where it can be harnessed as context for an LLM.

So one of the things that somebody in your organization has to think about is: how are we carefully curating our knowledge graph? We all know really well that one of the key things for an LLM now is the context that's given to it to answer a question. Yes. And that context comes from a knowledge graph.

So as you think about a technical architecture, also think about how we are building this knowledge graph that is going to compound for us in the future. So that's the technical piece of it. I wanted [00:21:00] to share one thought, which is, um, a strong assertion: we all feel that with AI, work is going away. But I don't believe work is going away; workflows as they exist are going away.

Okay? Now here's where we are going to do a fun exercise together. With the LLMs, what do we have? You have an unlimited, infinite (or, as my 7-year-old calls it, "infinity hundred") supply of interns, okay? Who dazzle you with their brilliance.

All of us have this right now, but they're unpredictable. You can't just say, I can't use them; you have an infinite supply. And you can't just deploy them in the wild, because they're unpredictable. That's kind of what you have right now. Okay? Now, having said this, let's design a customer support system.

You ready? Yeah. Okay. [00:22:00] I'll tell you the way to not do it to start with, and then let's together decide the way to do it. The simple way to not do it is right here: a user goes to Slack, you turn on a fancy AI tool in Slack, the fancy AI tool reads off a knowledge base and answers the user's question. How many of you have done that?

Nobody. Okay. Yeah. Oh, there is, like, some company who's pitching you this and saying, hey, go to Slack, turn this on, this will work, and they'll answer all your questions and you'll be fine. What's the problem with this?

Context? Wrong answers? I mean, context is missing, which leads to wrong answers. Some wrong answers, yes. So, hallucination. Hallucination, again, leads to some wrong answers. So again: brilliant interns. Whatever the size of your knowledge base, they can read it, but they're unpredictable.

And both these problems are true. And when we turned this on, I [00:23:00] wanted to share one of the quotes we got. We have a bunch of engineers, and, you know, engineers are not shy about sharing their opinions. So we got this, uh, fun quote, which I thought was good. It's like: I find support bot (I'm going to, uh, hide the name of the actual bot we used)

usually confidently gives me answers and is just wrong 50% of the time, which means its answer is useless to me a hundred percent of the time. And how many of you resonate with this? Yes. Right. And that's, again, a very nice way to say that even if something is gonna be right 80% of the time, your trust is 0%.

Right, because you don't know which 20% of it is wrong. Okay? So that's what we did, but we still haven't designed the right system yet. So let's rethink this. What would you do to protect against this? Wrong answers, hallucinations. Now we know that this simple model that we tried, which was this one, is not gonna work.

We can buy it easily, and tell someone who's annoying us about AI transformation that we have done AI transformation, but that's the only box it's gonna check. It's not [00:24:00] gonna add enterprise value. So how will we change this? Have some humans kind of verify stuff? Humans verify, yes. Other ideas?

Guest 1: train it and provide more authenticated review data?

Arvind KC: Yes, train it and provide more, better data. Ask it for sources. Ask it to cite sources? Yes. Would you expose it to the user directly? Would you send your intern to all your customers? No. Okay. Would you hide the intern?

You might hide the intern behind an expert, to say: don't go tell them, you're brilliant, come and talk to me, and let me see whether you can go and talk to them. It's as simple as that. Right? You're gonna say, I love your brilliance. I want you to take all questions.

I want you to answer these questions really well, but please don't go and tell others that this is the answer yet. I want you to come back to me, for me to check that this is right before you do it. Right. So all of these are right answers, and the flavor of [00:25:00] the version that we ended up with was something like this.

And let me just, like, walk through this, because it's a little useful. The user sends a request to Slack, and we said, okay, we're not gonna jump in with excitement to answer you right now. We're gonna send it to Jira and store it, and then we're gonna craft a prompt and send the right context. I think you were sharing this: don't just ask it to respond.

Craft a good prompt based on it. And that goes to the fancy AI tool, which generates an answer. The moment it generates an answer, we have two forks happening. One is the LLM as a judge: a second LLM comes and says, does this answer roughly sound right? Just score it as, like, high, medium, and low.

And the second is a human, who's a domain expert, who also scores the answer and says, is this a good answer? Yes, no, or, uh, high, medium, low confidence. And then you generate a final score. And then if it's a high score, then you say, come back and write it to Jira, go to Slack, and talk to the user.

If it's a medium score, let's adjust the prompt and try again. Maybe there's something wrong we are doing in the context; let's see if we can give better context. [00:26:00] If it's a low score, give the feedback back to a domain expert, who will work with an agentic flow expert to improve the system. Yeah. Now again, this is a simplified workflow.

The actual workflow will be a little bit messier, but this is a form of how you would actually deploy it in the wild, so that you are leveraging it for the infinite capacity, but protecting against the propensity to make mistakes. What's an agentic flow expert? Oh, I think of this as someone who can choose your, uh, technical system.

Maybe it's n8n, which is a simpler way to do it. Maybe it's LangGraph. But someone who can then say, oh, this is not good, because it's not reading data from that source; let me add that also as an input. So they're able to reconfigure this workflow that is there. So think of it as a technical person who's adjusting the workflow to how you need to work.

So you have a domain expert who's saying, hey, intern, your answer is not really good. And the domain expert is talking to someone to say, I think this [00:27:00] answer's not really good because this intern is not talking, uh, to this particular, uh, knowledge base. And maybe we should change the knowledge sources.

Maybe we should change the prompt. And that person has got the ability to do it. Does that help? Okay. So what does this mean, right? It really means that realizing the promise of AI is not going to be about turning on a sparkle, right? Every, uh, company which just says, hey, I now have a sparkle, I'm now an AI company?

That's not the way you're going to be successful. Actually, it's going to require, like, rethinking workflows, configuring systems, and training people to play a different role. In other words, the change is going to happen in the underbelly of workflows, and not in the simple sparkle that's there.

So if someone is saying, like, hey, here's a sparkle, turn it on, you'll just be much more efficient: that's not the way it's gonna work for a long period of time. Eventually we may get to that world, but for a long period of time we gotta think about [00:28:00] how do I think about this capability that is there, which I cannot ignore, and how do I reimagine and rethink the workflows that are there. You all with me?

Yeah. Okay. So now I wanted to, uh, briefly sidetrack into the importance of evals. So, when you deploy a human workforce at scale, one of the things that you'll do immediately is think about: what is the system I have for evaluating performance?

Who's doing their job well, who's not doing their job well? Are they meeting their expectations? You gotta do it. The same thing, when done for an agentic workforce, is evals. Okay, so you have to know. I was speaking to Tohar a while back, and we were discussing this: when many software vendors talk to me and say, I got this AI thing,

I usually tell them: share your eval scorecard with me. I will take the meeting with you if you have a very detailed eval scorecard that tells you [00:29:00] how well your model is doing. And let's break this down a little bit. There are three pieces to it. The first is just a technical evaluation, which is: what's the precision, what's the accuracy?

What's the hallucination rate, what's the perplexity? It really is a question of, have they fine-tuned the model for that use case? The second is the business outcomes, right? It's like, is it achieving the business outcome you want? And I'll give you a simple example. We had turned on, uh, just a transcription service,

uh, to summarize transcripts from interviews. Seems really simple. Seems like an easy AI use case, even. But what is the most important thing over there? The transcript summary has to be accurate. And if you do something as simple as, like, "did this summary represent your conversation?" and have people score it, we were surprised by the low score that it got.

Right? So what we had to do is, we had to change the prompt. We had to re-engineer the prompt. We had to give it a little bit more context: this was the type of interview, and so on. And then, you're not going to, in an interview, tell someone, like, that's a stupid answer. So you gotta, like, you know, figure out ways to incorporate the user feedback into it.

So it just becomes, like, [00:30:00] a much more involved exercise. And that's why there's this importance of evals. That's what I mean by program evaluation: did it meet the business outcomes? But it's not a one-time, you know, do it, shut it, and forget it. There's an ongoing evaluation that needs to be part of your process, because models change, data changes, workflows change, right?

So you need to have an evaluation system in this new agentic world. So, uh, this is my summary of all my time leading IT at Facebook, Palantir, Google, and Roblox: there's only a single characteristic of an exceptional system. And an exceptional system is one where systems and humans learn and improve with usage.

That's it, right? If you're using a system where the system does not improve with usage, or the human does not improve with usage, you're getting a fine system, but you're not gonna get an exceptional system. All right. And if you think about what's happening with LLM inference: uh, when an LLM is actually answering something at inference, it's not learning anything.

It's learned a bunch during training. It may have a little bit of [00:31:00] RLHF happening, but during inference it's not learning anything. The system is not improving with usage. And similarly, humans often are doing this cognitive offloading where they take something that comes from an LLM, don't spend time thinking about it and then use it elsewhere.

And humans also are not getting, uh, you know, better with usage. The point of this is: for the long run, you got to be really careful of the learning loops that exist in the system. You got to make sure that these learning loops, for both the system and the humans, are nurtured. So let's go back to our workflow that was there, right?

We had this workflow, and if you kind of zoom in, the importance of this step is the following: the agentic flow expert (and to answer your question a little bit more) is the person who's constantly fine-tuning the system and making the system better with usage, saying, you're not answering this correctly.

How can I help it? And then the domain expert is acting as a coach. And we all know that when we teach, we learn much better, right? So that's how you are reinforcing the learning loops, as opposed to [00:32:00] getting humans out of the system. And I mentioned this briefly: there was a very good paper that came from Microsoft this year, which talked about cognitive offloading.

And the idea of cognitive offloading is, uh, people who are very confident in the skills of AI don't apply critical thinking. And people who are very confident in their own skills, even if they use AI, they apply critical thinking. In other words, if you rely on the tool too much, then you're going to offload the cognitive work to the tool, and you're not going to actually get better at the task, right?

So you have to think about, as you build this, for yourself, for the company, for the culture you're creating: what are you going to do to ensure that the cognitive offloading is in some way contained? And the workflow was one example of that. But the domain expert is not just blindly saying yes.

They have to figure out a way to rate the answers that come from the system, so that they continue to apply their critical thinking. [00:33:00] So, um, I think that when you think about the future, our roles are going to change, and the way they're going to change is that we all become coaches of some sort.

And really, the future is not going to be how we do things. It's going to be how we teach things, right? You just imagine a world where, um, you know, maybe five years out, maybe 10 years, you have infinite agents available at your disposal, and success is going to be: how do you deploy this?

Right? So really, in the future, success will be our ability to design these processes. It'll be our ability to wield these agents and evaluate these agents. But I think most importantly, it'll also be our ability to retain our critical thinking when we do this.

Okay, I said a whole bunch. Let's try to capture it, right? What are the different things we talked about? Lead the change; it starts with you. Second is, you've got to foster a culture of learning. [00:34:00] Third is, pick the right areas of impact: those were the sections where we said most people are having success.

Over here, we talked about characteristics where you can be successful. Fourth is, either you develop the technical architecture or you have someone in the organization responsible for building the technical architecture. Fifth is, remember, it's the underbelly of workflows where the magic happens. So work is not going away; workflows are going away.

So you get really good at redesigning workflows. Then, reinforce learning loops: don't fall into the trap of cognitive offloading, either for you or for the organization. And then, just be prepared to change everyone's job. I think all of our jobs are going to change. Uh, I see this as, I mean, I've been through multiple waves of technology, right from Y2K

to, you know, the web, to mobile, to (I'm gonna skip crypto, I don't think that's a real trend) cloud, and to AI. I think this is probably, like, the biggest wave that's hitting us. And, uh, you know, you can frame this as a [00:35:00] threat, you can frame this as an opportunity. I really think it's an opportunity, because we are entering an era where we are going to have all of these come and help us accelerate productivity. But there is not going to be an easy path to turning it on and getting magic.

There is going to be a hard but fun path, and the hard and fun path is the work that's ahead of all of us. If you embrace this hard and fun path, then, you know, we are all going to be in the vanguard when it comes to leading this change.

I think that's it. Thank you.

Frequently asked questions

Do I need an engineering background to lead AI adoption?
You don't need an engineering background to get started. Arvind recommends identifying a single use case where your team has a high tolerance for iteration, then partnering with someone who can own the technical architecture while you focus on workflow design and change management.

What needs to be in place before we adopt AI?
AI won't fix broken alignment; it will expose it faster. Before layering in agentic workflows, Arvind recommends getting cross-functional buy-in and clear ownership of the process you're trying to improve. Otherwise you're automating a problem rather than solving it.

What's the most common way AI deployments fail?
The most common failure mode is deploying AI on top of an existing workflow without redesigning the workflow itself. Arvind's framework puts workflow redesign at the center: the technology is only as good as the structure around it, including how outputs are evaluated and how the system improves over time.

How should I evaluate AI vendors?
Ask for their eval scorecard. Arvind uses this as his first filter when evaluating any AI vendor. A credible one should be able to show you their model's precision, hallucination rate, and evidence that it drives real business outcomes, not just technically accurate outputs.

Where should G&A functions start with AI?
Start where there's a high tolerance for probabilistic outcomes, strong leadership buy-in, and enough existing data to train against. G&A functions tend to see early wins in knowledge work like summarization, internal Q&A, and manager enablement: areas where iteration is low-risk and feedback loops are fast.