AI still doesn’t work very well in business, reckoning soon • The Register


Interview Enterprise organizations are still struggling to figure out how AI fits into their business, and that may be for the best, because it will take time to understand any problems caused by AI-generated code and content.

“No one knows right now what the right reference architectures or use cases are for their institution,” said Dorian Smiley, co-founder and CTO of AI advisory service Codestrap, in an interview with The Register. “A lot of people are pretending that they know. But there’s no playbook to pull from.”

Smiley and his co-founder, CEO Connor Deeks, did time at global consultancy PwC and have set up their own shop to help shepherd organizations toward an AI strategy.

They argue that companies chasing AI have gotten ahead of themselves.

“From the large language model perspective, people aren’t really addressing the fallibility of the underlying text,” said Deeks.

Deeks argues that if you built an AI system from first principles, it would look drastically different from what’s offered today. All the talk about the disappearance of software engineering and office work, he said, “we don’t subscribe to any of that.”

He also contends that companies don’t want to believe that either. “For the most part, they don’t want to believe that everyone’s going to be fired and there’s not going to be anyone underneath them, particularly in the technology or information organizations inside these institutions,” he said.

Missing metrics

The first step for organizations considering AI is experimenting and iterating in a feedback loop, Smiley argues. And the reason for that, he said, is that AI still doesn’t work very well.

“Even within the coding, it’s not working well,” said Smiley. “I’ll give you an example. Code can look right and pass the unit tests and still be wrong. The way you measure that is typically in benchmark tests. So a lot of these companies haven’t engaged in a proper feedback loop to see what the impact of AI coding is on the outcomes they care about. Lines of code, number of [pull requests], these are liabilities. These are not measures of engineering excellence.”
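Smiley's point, that code can pass its unit tests and still be wrong, is easiest to see with performance: two implementations can be functionally identical under test while differing enormously under a benchmark. A toy sketch (all names and data here are illustrative, not from the interview):

```python
import timeit

def lookup_slow(items, key):
    # Scans the whole list on every call: O(n).
    for k, v in items:
        if k == key:
            return v
    return None

def lookup_fast(table, key):
    # Hash-table lookup: O(1) on average.
    return table.get(key)

pairs = [(i, i * i) for i in range(10_000)]
table = dict(pairs)

# Both versions "pass the unit test" — same answer for the same input.
assert lookup_slow(pairs, 9_999) == lookup_fast(table, 9_999) == 9_999 ** 2

# Only a benchmark reveals the gap.
slow = timeit.timeit(lambda: lookup_slow(pairs, 9_999), number=100)
fast = timeit.timeit(lambda: lookup_fast(table, 9_999), number=100)
print(f"slowdown: roughly {slow / fast:.0f}x")
```

A correctness-only test suite would accept either implementation; a feedback loop that includes benchmarks would not.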

Measures of engineering excellence, said Smiley, include metrics like deployment frequency, lead time to production, change failure rate, mean time to restore, and incident severity. And we need a new set of metrics, he insists, to measure how AI affects engineering performance.

“We don’t know what those are yet,” he said.
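The established metrics Smiley names can all be derived from a plain deployment log. A minimal sketch with hypothetical data, computing two of them:

```python
from datetime import date

# Hypothetical deployment log: (date, caused_failure) pairs.
deployments = [
    (date(2025, 6, 2), False),
    (date(2025, 6, 9), True),
    (date(2025, 6, 16), False),
    (date(2025, 6, 23), False),
]

# Change failure rate: share of deployments that caused a failure.
change_failure_rate = sum(failed for _, failed in deployments) / len(deployments)

# Deployment frequency: deployments per week over the logged span.
weeks = (deployments[-1][0] - deployments[0][0]).days / 7 or 1
deployment_frequency = len(deployments) / weeks

print(f"change failure rate: {change_failure_rate:.0%}")
print(f"deployment frequency: {deployment_frequency:.2f}/week")
```

The open question Smiley raises is what the AI-specific analogues of these numbers should be.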

It’s 3.7x more lines of code that performs 2,000 times worse

One metric that might be helpful, he said, is measuring tokens burned to get to an approved pull request – a formally accepted change in software. That’s the kind of thing that needs to be assessed to determine whether AI helps an organization’s engineering practice.
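The tokens-per-approved-PR metric Smiley floats could be computed like this; the records and figures below are invented for illustration:

```python
# Hypothetical per-PR records: tokens consumed by AI tooling, and whether
# the pull request was ultimately approved and merged.
prs = [
    {"id": 101, "tokens": 180_000, "approved": True},
    {"id": 102, "tokens": 420_000, "approved": False},  # abandoned
    {"id": 103, "tokens": 95_000, "approved": True},
]

# Charge ALL tokens (including those burned on abandoned work) against
# the PRs that actually shipped.
total_tokens = sum(p["tokens"] for p in prs)
approved = [p for p in prs if p["approved"]]
tokens_per_approved_pr = total_tokens / len(approved)

print(f"tokens burned per approved PR: {tokens_per_approved_pr:,.0f}")
```

Counting abandoned work in the numerator is the point: a tool that generates many discarded drafts looks productive by raw output but expensive by this measure.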

To underscore the consequences of not having that kind of data, Smiley pointed to a recent attempt to rewrite SQLite in Rust using AI.

“It passed all the unit tests, the shape of the code looks right,” he said. “It’s 3.7x more lines of code that performs 2,000 times worse than the actual SQLite. Two thousand times worse for a database is a non-viable product. It’s a dumpster fire. Throw it away. All that money you spent on it is worthless.”

All the optimism about using AI for coding, Smiley argues, comes from measuring the wrong things.

“Coding works if you measure lines of code and pull requests,” he said. “Coding does not work if you measure quality and team performance. There’s no evidence to suggest that that’s moving in a positive direction.”

No free lunches

Deeks pointed to the recent outages at Amazon and AWS – incidents Amazon has insisted have nothing to do with AI – as indicators of what’s to come.

“The other way to look at this is like there’s no free lunch here,” said Smiley. “We know what the limitations of the model are. It’s hard to teach them new facts. It’s hard to reliably retrieve facts. The forward pass through the neural nets is non-deterministic, especially when you have reasoning models that engage an internal monologue to increase the efficiency of next token prediction, meaning you’re going to get a different answer every time, right? That monologue is going to be different.

“And they have no inductive reasoning capabilities. A model cannot check its own work. It doesn’t know if the answer it gave you is right. Those are foundational problems no one has solved in LLM technology. And you want to tell me that’s not going to manifest in code quality problems? Of course it’s going to manifest.”

New metrics are essential, Smiley argues, because we already have millions of lines of AI-generated code that humans will never review.

In the context of business applications, Deeks pointed to the refund that consultancy Deloitte had to give the Australian government over a report that contained AI-generated errors.

“We know that big consulting is now adopting this at scale to write their PowerPoint decks,” Deeks said. “That’s going to manifest into huge lawsuits and lost money because the quality isn’t actually being tracked. Everyone has believed this fairy tale story that it’s just perfect already.”

Smiley expects that the application of AI to office work will encounter similar problems to those that AI has when applied to coding. But spotting AI errors will be more difficult due to lack of benchmark tests for hallucinated business advice.

“The other challenge here is that the incentives are misaligned,” said Smiley. At big four firms like PwC, he said, the partner wants more revenue and higher margin.

“You give them AI – what are they going to do?” he asked. “More work, less human work. So you get more revenue, higher margin. That does not lend itself well to saying all the humans on the team will use AI but review every output of AI. Those incentives don’t align. The incentive for the director is to stop talking to the associates, because the associates don’t know anything. [The director is going to] use AI to do the work of the associates. For the associate, the incentive is to get the work done faster and go to the beach. All these incentives are not aligned in a way that makes AI complementary to the business and deliver outcomes.”

Companies will ask for discounts when they know a service company is using AI

Smiley predicts “problems related to code quality that surface in eight to nine months for people who are heavy users of AI.”

Deeks foresees a growing number of lawsuits because that’s what happens when bad advice causes problems.

“People are going to continue to start to feel the pressure of ‘I have to adopt this stuff, I have to make AI decisions.’ They’re going to put this stuff into production, whether it’s in a business workflow or in an engineering group. And that accelerated collapse is then going to cost a lot of people their jobs.”

Another likely outcome, said Smiley, is pricing pressure – companies will ask for discounts when they know a service company is using AI tools.

Deeks said extreme pricing pressure is starting to surface. “Even KPMG pressured another accounting firm to lower their price because they’ve been saying they use AI,” he said. “Customers are now saying things like, ‘Oh you’re producing your PowerPoint decks with AI. Well I want to pay you less.'”

Another looming problem is that large insurers have become wary of underwriting policies that cover companies against AI risk.

“Insurance underwriters are seriously trying now to remove coverage in policies where AI is applied and there’s no clear chain of responsibility,” said Smiley. “So now let’s imagine you’re the big four and you do get sued and you are having pricing pressure applied, the market’s outpacing your ability to adapt, and now your underwriters are telling you, ‘oh by the way we’re not going to cover you.'”

Deeks said, “One of our friends is an SVP of one of the largest insurers in the country, and he told us point blank that this is a very real problem and he does not know why people are not talking about it more.”

Insurers, he said, are already lobbying state-level insurance regulators to win a carve-out in business insurance liability policies so they are not obligated to cover AI-related workflows. “That kills the whole system,” Deeks said.

Smiley added: “The question here is if it’s all so great, why are the insurance underwriters going to great lengths to prohibit coverage for these things? They’re generally pretty good at risk profiling.”

Deeks said that rather than citing these issues as a sign of impending collapse, he hopes people in the industry will find the motivation to talk seriously about the problems that need to be overcome.

“Can we actually have a conversation about it?” he asks. “Is anyone going to talk about the opposite of AGI [artificial general intelligence] and how it’s going to take over everything in a utopian future?”

We need to be clearer, Deeks contends, about what AI means for finance, for underwriting, and for actual business and the practical operation of business systems. ®
