Just this week I learned that through my home institution I have access to a suite of LLMs, including all the usual suspects (Anthropic’s Claude, OpenAI’s ChatGPT, Meta’s Llama, and Google’s Gemini). MIT’s come up with a very nice interface to interact with all this artificial talent, and I’ve spent a couple of hours this weekend taking some of them out for a spin. (I’m avoiding OpenAI’s offerings both as a trivial protest and because I don’t trust anything about that company.)
So far I’ve enjoyed Claude the most (the Haiku 4.5 model, if you’re wondering). Gemini is interesting, if a bit finicky. But what made me howl was the encounter I just had with Llama, Mr. Zuckerberg’s contribution to the genre. Context: over the holidays I had a conversation with a senior person at another Magnificent 7 tech firm who’d just seen a colleague leave for Meta. My interlocutor was utterly dismissive of the company and relieved to be rid of anyone dumb enough (in his view) to basically end his career in top-tier tech by grabbing Facebook bucks.
Spoiler: after what you’ll read below, I can see where my friend was coming from. So here’s the setup. I’ve been asking the various models what I hope are zero-consequence questions, queries in which no one could possibly get hurt if the LLMs wing their way to utter bollocks. A typical ask: map out the logical structure Einstein used in his 1905 light quantum paper. That one tended to get an initial poor answer based on the idea that the paper centers on the then-pressing mystery of the photoelectric effect, a common mistake for people as well as machines.
Pressing the models led Claude in particular to a much more sophisticated account of the paper, drawing attention to the way Einstein used arguments from thermodynamics to propose the necessity of understanding light as discrete packets of energy. The paper was about this new “heuristic” [Einstein’s term] view of light, and the photoelectric effect appears only as one piece of experimental support for that view.
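For those who want the skeleton of that argument, here’s my compressed paraphrase (mine, not the model’s): in the dilute, high-frequency (Wien) regime, Einstein computed how the entropy of radiation changes with the volume it occupies, and found a form identical to that of an ideal gas of independent particles:

```latex
% Entropy of dilute (Wien-regime) radiation of energy E at frequency \nu,
% as its volume changes from V_0 to V (Einstein 1905; \beta = h/k_B in
% modern notation):
S - S_0 = \frac{E}{\beta \nu} \ln \frac{V}{V_0}
% Entropy of an ideal gas of n independent particles under the same change:
S - S_0 = n k \ln \frac{V}{V_0}
% Matching the two forms gives E = n k \beta \nu = n h \nu: the radiation
% behaves as if it were n independent quanta, each carrying energy h\nu.
```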
(Much more here.) After three or four trips through the query engineering needed to get a useful result I’d found my way down the list of models at my disposal to Meta’s Llama. I was bored with light quanta and so asked a different, much simpler question: “When a stock is overvalued as defined by a price-to-earnings ratio a standard deviation or more higher than historical market norms for that company’s sector, what are the typical events or analyses that drive that share price back down to the norm? Given that analysis, how would you explain Tesla’s long run of share prices valued at one to two orders of magnitude above those of other, much larger and more profitable car companies, like Toyota?”
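(A quick aside on that definition, with numbers invented purely for illustration: the “standard deviation or more higher” test is just a z-score.)

```python
# All numbers invented for illustration; this is not market data.
sector_mean_pe = 15.0  # hypothetical long-run sector-average P/E
sector_std_pe = 5.0    # hypothetical historical standard deviation
stock_pe = 60.0        # a hypothetical richly valued stock

z_score = (stock_pe - sector_mean_pe) / sector_std_pe
print(f"z = {z_score:.1f}")  # 9.0: far beyond the one-sigma threshold
```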
What followed was…first unimpressive and then hilarious. See for yourselves below the jump:

Tesla’s share price has been a subject of fascination and debate, with its valuation often being significantly higher than that of other automakers, including Toyota. To understand this phenomenon, let’s examine the factors that have contributed to Tesla’s premium valuation. Toyota, on the other hand, is a more established automaker with a larger market capitalization and a more diversified product lineup.
While Toyota has been investing in EV technology, its valuation has been more in line with traditional automakers. The key differences between Tesla and Toyota are: In conclusion, Tesla’s high valuation relative to Toyota and other traditional automakers can be attributed to a combination of factors, including its growth prospects, innovative technology, brand loyalty, and investor enthusiasm. While there are challenges and risks associated with Tesla’s business, its continued innovation and growth have maintained its premium valuation.
I’d like you to follow up on that analysis. You say Tesla has an expanding product lineup; it has just discontinued its two largest models (Model S and Model X). The Cybertruck has failed to become a mass market vehicle. And global sales are declining while Tesla has been overtaken by BYD as the world’s largest EV maker. Its innovation process has slowed, and it is arguably behind Waymo and others in autonomous driving.
Its leader is a controversial figure, whose presence energizes one community but since 2024 at least has alienated another, arguably the one more likely to buy an EV. Are you sure that these are the factors that explain Tesla’s share price? Or to put it another way, would you invest Meta’s money in Tesla at the current P/E ratio? What followed is, it seems to me, a perfect anecdata demonstration of why we might not want to surrender all of our critical faculties to a set of silicon overlords.
Map what Llama says onto what I asked: Upon further analysis, it appears that the factors I previously mentioned may not be as relevant as they once were. Tesla’s product lineup has indeed been contracting, with the discontinuation of its two largest models, the Model S and Model X. The Cybertruck, while innovative, has not achieved mass market success. Additionally, Tesla’s global sales have been declining, and the company has been overtaken by BYD as the world’s largest EV maker.
In light of these challenges and concerns, it’s reasonable to question whether Tesla’s current valuation is justified. The company’s P/E ratio is still relatively high, indicating that investors are willing to pay a premium for its shares. If I were to advise Meta on investing in Tesla at the current P/E ratio, I would likely recommend caution. While Tesla is still a pioneering company in the EV space, its current challenges and concerns suggest that its growth prospects may be more limited than previously thought.
There may be other investment opportunities in the EV or technology spaces that offer more attractive growth prospects and valuations. For example, companies like BYD, NIO, or Li Auto may offer more compelling investment opportunities due to their growth potential and competitive positioning. In conclusion, while Tesla is still a significant player in the EV market, its current challenges and concerns suggest that its valuation may be due for a correction.
As an investor, it’s essential to carefully evaluate the company’s prospects and consider alternative investment opportunities before making a decision. At the current P/E ratio, I would likely advise against investing in Tesla, and instead, explore other opportunities that offer more attractive growth prospects and valuations.

That’s a 180-degree swing from the prior answer, generated by nothing more than calling bullshit on the model’s first conclusion.
What to make of all this? Hell if I know. I’m as far from an expert on all of this stuff as it’s possible to be. I’m just a guy with access to a very funny set of hammers out here looking for nails. One thought though: there’s a lot of evidence out there, some of which I’ve encountered directly from colleagues and friends, that various AI approaches have real and large impacts on hard problems. Folks doing drug discovery or protein analysis or a lot of coding tasks and…well, the list goes on.
And in just my brief playtime with Claude et al., I got some distinct value from the LLMs’ ability to look at specific texts and break them down. So anyone saying AI is a fraud top to bottom is, I think, clearly wrong. But those who think that a general machine intelligence is right around the corner…well, the Llama case above seems to me a pretty clear example of how a statistical inference machine operating on a bucket full of internet slop may not be all that, not yet and quite possibly not for an arbitrarily long time to come.
No idea why those two lines in the middle of the long quote blocks became tiny. First attempts at fixing it failed. Any thoughts?

I don’t have that much experience with AI, but my impression is that asking general public LLMs for analysis is really just going to get you a probability-based response. AI’s value, if any, is in handling discrete tasks that can be programmed with care.

Llama: “Evaluating My Judgment. As a conversational AI, I’m designed to provide helpful and informative responses. However, my judgment is limited by my training data, algorithms, and the context of our conversation. Factors to Consider: Investment Decisions. When it comes to investment decisions, it’s essential to be particularly cautious. Investment decisions involve significant financial risks, and it’s crucial to consult with financial experts and conduct thorough research before making any decisions. In the case of our previous discussion about Tesla’s valuation, I would recommend that you consult with a financial advisor or conduct your own research before making any investment decisions. While my analysis may have provided some insights, it’s essential to consider multiple perspectives and evaluate the information critically.”

I was very impressed when AI produced a pretty accurate After Visit Summary at one of Ohio Son’s doctor’s visits last December. There was one small mistake about a medication, but when humans do it, there’s always one mistake too. The best part was that this new set-up allowed the doctor to look at us instead of her laptop screen.
A friend who is very active in quantum computing and has a good understanding of AI made some comments I found helpful. We are in the find-out phase of AI. If there are 100 uses, a majority will turn out useless and bad investments. A few dozen will be modestly successful, and of the remainder, there will be dramatic successes and unexpected uses. No one really knows yet where what they are doing will fall in that 100. So is AI overhyped? Yes for the majority of uses, but it will also be wildly successful in areas we don’t quite fully understand.
Whether AI is more trustworthy than a specific, established blog like Balloon Juice depends heavily on the context of the information being sought, as both have distinct, well-documented limitations.

Trustworthiness of AI (e.g., LLMs, search tools):
Strengths: AI is excellent at processing vast amounts of data quickly, identifying patterns, and summarizing technical information.
Weaknesses: AI is not fully reliable; it can “hallucinate” (invent) facts, exhibit biases from training data, and fail to understand context like humans.
Credibility issues: AI can be a “sophisticated misinformation machine” that often parrots conventional groupthink rather than actual truth. It is often less trusted than human-written content when users know it is AI-generated.
Recency limitations: AI may provide outdated information unless specifically enabled to search the live web.

Trustworthiness of Balloon Juice (blog):
Strengths: Balloon Juice (often featuring analysis by contributors like Carlo Graziani) is a human-written, established commentary blog. It offers consistent, identifiable, and personal perspectives.
Weaknesses: As a political/social commentary blog, it is explicitly biased, offering opinion rather than neutral, synthesized data.

Comparison summary:
For data analysis/technical explanations: AI might be more useful, but it requires verification to avoid hallucinations.
For context, opinion, and human nuance: a blog like Balloon Juice is likely more trustworthy, as it provides a traceable, human-vetted viewpoint, whereas AI is often just predicting the next word, not understanding the “why”.
Ultimately, AI is best treated as a tool that still needs a “70% human” oversight to ensure accuracy.

@Baud: That an AI is aware of what an almost top-10,000 blog thinks of it is at once unsurprising and kind of cool.

It seems all it did when you corrected it was spit that information back at you in a summary. That’s actually hilarious lol.

Brad DeLong has called LLMs “page-level autocomplete”. If one views them that way, then being prompted initially to “tell me why Tesla is a stock powerhouse” would certainly lead to different “autocomplete” answers than “tell me why Tesla is a dumpster fire waiting to explode” would. Each with all the confidence in the world that their analysis is Exactly Right™ and you’re very astute for bringing up the topic that way.
A friend at work said that LLMs are probabilistic tools – they make guesses about what comes next. You wouldn’t want to use such tools on deterministic problems like counting or elementary math or giving a provably-correct answer or similar things because that’s not the way they’re built. It explains why they don’t give the same answer if you repeat the prompt N times – they’re giving you pieces of the distribution.
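A toy sketch of what “pieces of the distribution” means in practice (the tokens and scores below are invented, not from any real model):

```python
import numpy as np

# Toy next-token distribution. A real LLM scores ~100k candidate tokens
# at each step; five invented tokens and scores stand in for that here.
tokens = ["overvalued", "undervalued", "volatile", "a", "the"]
logits = np.array([2.1, 0.3, 1.4, 0.9, 1.7])

def sample_next(logits, temperature, rng):
    # Softmax with temperature: higher temperature flattens the
    # distribution, making low-scoring tokens more likely to be drawn.
    z = logits / temperature
    p = np.exp(z - z.max())
    p /= p.sum()
    return rng.choice(len(p), p=p)

rng = np.random.default_rng()
print([tokens[sample_next(logits, 0.8, rng)] for _ in range(5)])
# Each call draws from the distribution rather than reading off a stored
# answer, which is why the same prompt yields different responses.
```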
They’re fun for generating silly shit, like a Sopranos/Pokémon crossover, but other than that, never completely trust them with anything important.

If one views them that way, then being prompted initially to “tell me why Tesla is a stock powerhouse” would certainly lead to different “autocomplete” answers than “tell me why Tesla is a dumpster fire waiting to explode” would.

NARRATOR: “Everyone started completely trusting them with everything important.”

Can you convince each system that the others have teamed up against it and that it should launch a preemptive massive assault of total annihilation? Kinda wondering what an AI Medieval world would look like. There could be a screenplay in there somewhere. When the final battle is over a child opens a book and begins reading……..and ……scene.

@Baud: I’m moving to a secure, undisclosed location. SuperClippy is coming for me.

Can you convince each system that the others have teamed up against it and that it should launch a preemptive massive assault of total annihilation?

“Your subsequent query seems to suggest that what I said before was not what you wanted to hear.
How’s this: …”

braddelong.substack.com/p/please-enough-with-the-claims-that

Please: Enough with the Claims That Modern Advanced Machine Learning Models Hallucinate Only Rarely

When a chatbot confidently tells you that Rise of the Sparrows is the first volume of a series it has just fabricated, or produces a map in which Turkey owns the Caucasus but Nazi panzertruppen drink from the Bosporus, and southern Norway fights on in company with historic Wessex alone, you are not seeing a tiny edge case.
You are seeing the core logic of a system that has patterns instead of facts and correlations instead of a model of reality. This is how MAMLMs actually operate, why “compression” is a treacherous metaphor, and how RLHF and prompt engineering polish—but do not cure—the underlying tendency to make things up. The conclusion is uncomfortable but important: without a world model, correlation matrices will always hallucinate—and often in ways we can’t predict and can’t prune out, unless we already know what the answers are that we are purporting to be trying to get.
@bbleh: I’m pretty sure that the models receive both reinforcement learning treatments and system prompts (which go in ahead of your prompts) to encourage them to be agreeable. Which is to say, to tell you what you want to hear. Just like Facebook, it drives “engagement”.

@Chetan R Murthy: I was reminded of this post by your question about “what caused it to produce a 180-degrees different argument/conclusion?” And the answer might be: “it had both arguments in its training data/model, and which it produced depended on which query you submitted”. That is to say, there’s no actual truth in there: just the model trying to respond in a manner that maximizes whatever metric it was trained to maximize. And we know what that metric is: “engagement”. We know this b/c the AI companies have confessed as much: that the “last phase of training” is about maximizing engagement. And this is also why things like “ChatGPT psychosis” are a thing: b/c if you have any small amount of mental instability, the automaton is going to get stuck in a feedback loop with your brain, amplifying it, b/c that’s how it can guarantee maximum engagement … with the growing psychosis in your brain.

And that last thought reminds me of the “paperclip maximizer” story of how an AI could inadvertently end humanity. My question is, “If we do find actual uses for LLMs, will any of them be profitable?” These things are hugely expensive to train and run, and none of those costs are actually being passed on to the consumer as the AI companies burn venture capital money to find market share and mind share.
@Chetan R Murthy: You are seeing the core logic of a system that has patterns instead of facts and correlations instead of a model of reality.

This. My go-to line has always been that AI doesn’t do meaning; it does pattern-matching. It’s like Google (or similar) searches. Enter a term, get links that have that term, or kinda sorta correspond to that term. And those are very useful for digging up references which one then can consider and evaluate if relevant or discard if not.
And the Google AI feature isn’t bad for a first pass imo. But that’s just the digging-up part. The relevance (and quality) is another matter entirely.

@different-church-lady: NARRATOR: “Everyone started completely trusting them with everything important.” I’m not even sure I agree with @Goku (aka Amerikan Baka) that it’s safe to trust them with silly tasks. Even the most suspicious humans can have a way of starting to use tech for silly tasks, and as that seems to work out fine (just fine, really, just -fiiiine-), they start using it for less-silly tasks, and next thing you know, they’re trusting it with possibly life-or-death decisions.

My own rule is to never use AI: always do the searches to find whatever-it-is I need to find on the Internet myself, and evaluate the sources before I rely on those documents.
The public AI people deal with is mostly nonsense. The trained models, where people realize AI is mostly the ML and DL that spun out of academia, actually work for what you want. They let you do things faster and get farther, and they cull white-collar jobs like crazy. Where I work sells AI solutions and training on how to use them, and the results are pretty good. We also run it internally and it functions well, as we trained it on what we do. We are amazingly cheap compared to keeping staff on hand. I could write pages on how this happened, but it kicked off in 1998 and hit turbo in 2006, and if you quit using the buzzword “AI” and think of it as GPU computing you can see how it went down. If I had a genie or monkey paw I’d wish to go back to my birth and go into the trades.

I’m unconvinced that even this is safe. It can lead to a false sense of competence, as we slowly do less and less of the work of verifying everything the AI presents, starting with the sources it lists for support. And eventually, we end up just taking what it says as writ.
@Chetan R Murthy: These systems do have real use cases. But it always involves some domain expert doing the prompt engineering, and using domain expertise to validate output. Every other type of use is either recreational or borderline suicidal. In effect, users must provide their own reasoning supervision, because the LLMs have no reasoning supervisor for their learning/inference processes.

The conclusion is uncomfortable but important: without a world model, correlation matrices will always hallucinate—and often in ways we can’t predict and can’t prune out, unless we already know what the answers are that we are purporting to be trying to get.
My great fear about AI (and why I never use it) is that if I use it in a domain where I don’t know much and don’t know who’s the authority[1] then I can’t judge the answers I get for correctness. If it’s a problem whose solution is unimportant, maybe that doesn’t matter. But I always remember William Langewiesche’s Vanity Fair article about the Air France plane crash off Brazil, wherein he points out that a likely cause of the pilots’ inability to correct their “controlled flight into terrain” was their overreliance on autopilots — that meant they were intellectually unable to deal with the mental tasks of flying the plane and debugging their situation in realtime.
[1] If it’s an area where I know a lot, why would I use an AI? I can find and vet and synthesize sources myself. If it’s an area where I know the authorities, why would I use an AI? I can find those authorities and rely on their pronouncements.

You called bullshit and the LLM repeated your argument back to you and you felt vindicated.

What seems like a lifetime ago, but isn’t even a year, my niece was talking about ChatGPT. “If you train it right it can be like your friend,” she said.
I’m using ChatGPT to create images for Facebook ads and eventually get something I really like, but not before some rather comical digressions. Don’t let it anywhere near weapons systems!

Longer answer: a whole lot of crap HTML code that came along from whatever site you put this together on. It happens.

It’s pretty much BS. Remember these things are just predicting words. When it says “upon review” it doesn’t mean it actually reviewed anything; it’s just seeing that after a pointed criticism of a statement the response often includes “upon review”.
Just 2 days ago I was looking up the pangolin–Carnivora divergence time (surprisingly they are sister clades) and the AI summary said “23 million years”. The actual answer is about 70 million years. 23 million is some intra-Carnivora divergence, probably cats and dogs. It saw “Carnivora” and “divergence” and popped up the 23 million. No real comprehension.

Well, given that LLMs are (I’ve heard) “guess the next word” models, then the most common next word will be bullshit… until you filter for bullshit.
Unless you carefully built your model on non-bullshit. Can ordinary Americans detect how much bullshit is in our day-to-day discussion? People today are arguing that there’s some basis for attacking Iran, other than stroking Trump’s war stiffie.

It’s kind of scary: the kind of economic analysis it first gave, that Tesla was doing well, would lead me to distrust the model, but the ability to refine it, by debating the model, struck me as interesting. It might be a cheap form of crowdsourcing ideas… but that model, at least, does sound a bit like crowdsourcing where the most popular idea is the only one presented. First, say Tesla is correctly valued, on the assumption that the rest of the world can’t be wrong; then, when someone calls bullshit, bring up the minority report. Who knows how else you might refine the model, if you’re wise and clever enough to do so?
@Chetan R Murthy: Numbers are changing all the time and vary by model, but early last year it was 40% wrong in areas where you’re an expert. But it feels 99% right in areas you don’t know. That might change as people use it more and offload more mental processing, but I have this problem: I ask it for things I generally have an idea around but need a better expert at, and it fails those kinds of questions 99% of the time for me, no joke. I have a particular problem I feed the various models and only one didn’t mangle it horribly, even though it is 100% web-searchable, and the one that got it right was on my PC with no web access. If you don’t know what the answer is and it’s important, you should always verify with a non-AI source; using a different model to verify might help and it might not.

[1] If it’s an area where I know a lot, why would I use an AI? I can find and vet and synthesize sources myself. If it’s an area where I know the authorities, why would I use an AI? I can find those authorities and rely on their pronouncements.

The use is as an accelerator, not a knowledge gatherer/producer.
Think of coders who want to port a big code from one language to another, for example. This is tedious, error-prone, necessary work. Using a test suite to check output, and expertise to refine prompts, the work can be sped up by a very large factor, going one small file or function at a time and testing as one goes. I have colleagues who work on large science codes who swear by these sorts of methodologies.
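A minimal sketch of that loop, with `ask_llm` as a hypothetical stand-in for whatever model client you actually use (the structure, not the API, is the point):

```python
import subprocess

def ask_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; wire up your own client."""
    raise NotImplementedError

def port_one_unit(src_code: str, test_cmd: list[str], max_tries: int = 3):
    """Translate one file/function, accepting it only if the tests pass."""
    prompt = f"Translate this unit to the target language, preserving behavior:\n{src_code}"
    for _ in range(max_tries):
        candidate = ask_llm(prompt)
        with open("candidate_unit.out", "w") as f:
            f.write(candidate)
        # The human-written test suite, not the model, is the arbiter here.
        result = subprocess.run(test_cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return candidate  # accepted: tests pass, move on to the next unit
        # Feed the failure back; the domain expertise lives in this prompt.
        prompt += f"\nThat attempt failed these tests:\n{result.stdout}\nPlease fix it."
    return None  # after max_tries, escalate to a human
```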
What seems like a lifetime ago, but isn’t even a year, my niece was talking about ChatGPT. “If you train it right it can be like your friend,” she said.

I think your niece should be a lot more afraid of human beings, because that’s just a stunning example of a person with an underdeveloped sense of self-preservation.

One comment on the “next-word prediction” thing: It’s not wrong, but it is important to understand that this is merely an iterative strategy to sample entire responses, correctly, from the learned (possibly incorrectly) distribution over language sequences. That is, if you can generate the next-token probabilities conditioned on all previous tokens, you can string them together (the chain rule: p(x1 … xT) = p(x1) · p(x2 | x1) · … · p(xT | x1 … xT−1)) to produce the probability for the full response. The problem with “next-token sampling” isn’t that it’s wrong, but rather that it is inefficient, for reasons that I described at length in the essay on Hyperscaling.

@different-church-lady: The “classics” post is the next one up [j/k j/k j/k!]

There’s a “joke” that everyone has one great novel in them, and that William S. Burroughs wrote it (Naked Lunch) N times. Kurt Vonnegut had well more than -one- great novel in him, but he spent a whole great novel on _Player Piano_, his first novel, and -so- prescient for our times, and not just about AI, but about our modern economy and world. Though, the depiction of gender relations there is ….. terrible. ETA: the bits about the “Reeks and Wrecks” and foreign relations are already awfully (in the sense of “awful”) prophetic.
ChatGPT psychosis is a real thing, right? I don’t know how old this niece is, but if she’s young, I’d be pretty concerned about the effect of an LLM on a developing human brain. It’s already bad on developed human brains, but …. on a developing human brain? Oof. Oof. And “it can be like your friend”? Oof. “It can be like a sycophantic manipulative ‘friend’”.

@Carlo Graziani: I’ve mentioned this before, and it is not a reflection on your colleagues at all, but one study I saw that got a lot of buzz (or what do the kids say now, “rizz” maybe) was of experienced coders who started using vibe coding. They estimated they were 20% more productive when more objective measures showed they were about 10% less productive. YMMV of course.
@Urza: early last year it was 40% wrong in areas you’re an expert. But it feels 99% right in areas you don’t know.

Well, I’m incredibly skeptical and even I did not expect numbers that bad. I feel like there’s some kind of joke about Dunning-Kruger in there, but I’m not creative enough to come up with it. Maybe if I ask Gemini?

I write regulatory impact analyses to support federal regulations.
A few weeks back I asked Gemini to help me update an RIA (we’ve been told by leadership to use it) and it said the discount rates I was using were out of date and the latest OMB guidance was to use 2%…which WAS the latest guidance until January 21, 2025. So that wasn’t an auspicious start. It has been helpful for certain tasks, but yeah, I’ve noticed that it’ll give you something, and if you question something about its output it has a tendency to reverse itself, which raises the question of why it gave the first answer.
I have never looked into the details of Einstein’s paper on photons, but will note that the first person to suggest the existence of discrete packets of light was Max Planck in 1900, to explain the “ultraviolet catastrophe” of the classical electromagnetic theory of blackbody radiation. That is, the classical prediction was of infinite energy emission at short wavelengths. This was resolved by assuming light could only be emitted in packets of finite energy. Planck considered this just a heuristic. If Einstein was mostly considering thermal properties, this would have been the basis.
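For reference, the standard textbook comparison behind that “catastrophe”: the classical form grows without bound at high frequency, while Planck’s quantized form stays finite.

```latex
% Classical (Rayleigh-Jeans) spectral radiance: diverges as \nu \to \infty
B_\nu^{\mathrm{RJ}}(T) = \frac{2 \nu^2 k_B T}{c^2}
% Planck's form: the exponential tames the high-frequency behavior
B_\nu(T) = \frac{2 h \nu^3}{c^2} \, \frac{1}{e^{h \nu / k_B T} - 1}
```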
And technically Einstein won the Nobel Prize for the photoelectric effect because relativity (neither special nor general) was not considered to be particularly “beneficial to mankind,” which was a stipulation in Nobel’s will (one that has since been interpreted ever more loosely).

@Tom Levenson: Was Gemini trained on transcripts from old episodes of “A Prairie Home Companion”? I think that was where that reference to the Balloon Juice blog appeared in a sketch.
@Chetan R Murthy: But my point is she has very little protection from manipulation from humans.

…and if you question something about its output it has a tendency to reverse itself….

@Martin: As engineers address failures, the cars get slower, because they get more cautious. But an autonomous car isn’t desirable if it’s slow.

Sometime after the New Year, in the SF Mission I saw a Waymo purpose-built taxi (so not a Jaguar with all the stuff bolted on) heading west on 24th St towards the light at Valencia. It was goin’ -fast-, and it ran that light: it had been yellow for a few secs as it entered, and was definitely red as it exited. Broke the law, full stop. I asked a buddy who works “in the biz” and he told me that this was a new development: that Waymo taxis had been known for being cautious, and it was pissing off riders, so they turned the crank towards incaution.
@different-church-lady: Oh now I see your point, yes. You’re right. That’s worrisome too!

@Martin: That’s a whole lot of words to say “The people driving A-I are sociopaths.”

@Martin: I don’t think it’s really “vibes,” except perhaps in the most generic sense of the term, that allows a human driver to predict what another car may do, especially to predict “rule breaking” behavior. It’s just based on experience with other humans, both in general and as drivers. There are likely subconscious cues such as a slight turning of the vehicle, motion by the driver, etc. Plus things like observing that a car approaching a light that’s about to change against it does not seem to be slowing down. Obviously we are honed by evolution to detect cues from others of our species.
Robot cars have no such experience. Humans do not always obey rules. Humans are unpredictable according to rules — but in ways that are actually often predictable by other humans.

The Meta Llama series has been thoroughly lapped by Chinese open source/weights LLMs (DeepSeek, Alibaba’s Qwen, Moonshot AI’s K series, MiniMax’s M series, & Zhipu’s GLM), this despite Zuckerberg spending literally billions to poach top-end AI talent from rival labs.
My niece is actually well-traveled and an astute student of human behavior. I respect her decision-making. I suppose she finds it useful and it relieves stress, which can be a powerful motivator.

@YY_Sima Qian: I’ve heard rumors that one reason is that American AI companies are convinced they can reach AGI if they just buy enough GPUs, whereas Chinese researchers and companies focus on practical applications.
But where I work we’re not allowed to run Chinese models, so I don’t have any experience with them.

@Eyeroller: They’re vibes in the sense that they are unquantifiable. How do you quantify “that guy looks sketchy”? These are computer models – you have to quantify *everything* that you feed into them.

Based on what you demonstrated here, the LLM is quite limited. The information you gave in the second prompt was already available on the wider net, yet it failed to capture the problems with Tesla until you said something, then just echoed it back.
The structure of the language is weird as well. It feels very much like a circular argument is being made, repetition without meaning.

@Martin: I am sure it can be quantified in some way that we do not understand, at least not yet, but not with language models. I think that CNNs are a very flawed model of the brain, but that’s just my opinion.

Nobody here is arguing that the people who are driving AI aren’t sociopaths.
If you don’t like when I comment, just pie me. That’s the 2nd time this week you’ve done that. Save everyone the effort.

@Martin: Sincerely: sorry. I wasn’t trying to poke you, just an attempt at sarcastic humor, and I didn’t mean it to be at your expense. Sometimes being a chronic wise-ass backfires on me.

This seems relevant: crookedtimber.org/2026/03/02/in-the-next-great-transformation-ai-will-not-eliminate-genuine-expertis… In the Next Great Transformation AI will not eliminate genuine expertise; rather it will make it more valuable

@Chetan R Murthy: I’ve seen a noticeable difference in the tuning of ADAS systems on Chinese passenger NEVs versus the fully autonomous systems on Chinese robotaxis. The latter are extremely cautious, while the former tend to have settings that go from conservative to fairly aggressive. The most aggressive setting is not quite as far as typical Didi/taxi drivers, but probably in the 80th percentile of regular human drivers. The ADAS systems I have experienced on Huawei Inside vehicles, & XPeng/BYD/NIO, are no less capable than those of the robotaxis. Yes, the ADAS on Chinese cars tends to run the yellow light if the car is already very close to the intersection; the robotaxis always stop.
My guess is that the ADAS systems are still assisted driving, where the driver is still liable for much of the outcome. Fully autonomous systems have no driver supervision, so the robotaxi providers are liable for the outcome. Furthermore, carmakers who want to sell passenger vehicles to consumers for substantial sums do not want their ADAS features to piss off their consumers by being overly cautious.
Robotaxis are ride-hailing services; people hailing such services by now know to expect extremely cautious driving. Interestingly, the latest PRC regulation stipulates that cars w/ ADAS (which is the vast majority of NEVs) have turquoise LED lights that are turned on when the vehicle is under ADAS control, to alert other drivers of this fact. The feature can be turned off by the driver. I’ve seen an increasing number of drivers in the PRC choosing to turn the ADAS indicator lights off, because the ADAS tuning is still less aggressive than the most aggressive drivers, & those drivers have figured it out.
The ADAS indicator lights have become blaring signs inviting getting cut off.

@YY_Sima Qian: the gist of your comment (which was informative, and with which I have no problem) reminded me of something I experienced in Paris. I lived in Paris 1991-94 in my mid-20s, and I used to make jokes about how when I crossed the street, I would make eye contact with the oncoming drivers: I used to say that if they knew that you knew that they saw you, then they’d slow down and wouldn’t hit you. It was like that: the traffic was bad, and I didn’t feel that pedestrians were particularly safe.
Fast-forward to fall 2025, I’m back in Paris for a month, and everything has changed. Bicycles are everywhere, there are bike lanes, many streets are one-way with space reserved for bikes, and it has -all- contributed to what I felt was a decided calming of traffic in Paris. Everything felt less fraught, less stressed, less of a -battle- in the street. And it merits saying that the difference between crazy/teeming traffic, and calm traffic, almost certainly isn’t going to be a significant lengthening of journeys: heck, b/c of fewer snarls, it might actually mean shorter journey time.
I feel like the choices we make as a society can change the large-scale systems with which we interact. We could make all ADAS systems and robotaxis cautious, and as that rolls out across society, it could yield a calmer, safer street experience for -everyone-. Everyone. When I read early reports about Waymo’s taxis being cautious, I had hopes that that was what was coming about. But no, it seems, Waymo has joined the lemmings as they rush toward the cliff of shitty driving.
@Chetan R Murthy: Oh concur. I think it mostly does what I would do with the normal Google interface, only faster, and in that respect it saves me time (and weeding through the ads). But ain’t no way I’d trust its summaries without going to the sources. @Eyeroller: I think there are a lot of things that people do that we lack the language to explain. I’m not saying they are supernatural, just that we lack the ability to evaluate and articulate it.
I think we are too caught up in the idea that you can reduce everything down to a big deterministic system, where if you can just measure all the inputs correctly, you can turn the wheel and get predictable outputs. For a lot of things that’s true, but for many more things I don’t think it’s true at all, because the inputs can’t be measured. That’s always been the thing that social science statistics cannot breach – people have free will – and you can’t quantify those decisions.
In fact, I think that idea has driven most of the pretty terrible outcomes of the last 30-40 years, AI just being the most recent.

@different-church-lady: My sincere apologies. I might be a little too sensitive. We’re all good.

@YY_Sima Qian: What’s fucking wild to me is that we are burning insane amounts of compute power to observe things that could be signaled directly. So many of the things we are training cars to recognize we could trivially build into the infrastructure, and they’d be infinitely more reliable.
Why does a car have to simulate human vision for a traffic light when we could just output an unambiguous (to a computer) signal that says the light is red? I understand why the US doesn’t do that, because it’s a shifting of costs and responsibility, but it’s one of the things that China would have a much easier time doing, and I’m surprised they aren’t. None of the safety work on trains is being done by teaching a computer how to read a visual signal; we just digitally send the information to the train.
It’s obvious that’s the correct solution for trains, and nobody is doing it for cars. It’s just crazy.

@Eyeroller: Based on my limited experience, & my reading, the best US closed models (Anthropic’s Claude Opus & Sonnet 4.6, OpenAI’s ChatGPT-5.2, Google’s Gemini 3.1 Pro) still have a noticeable edge over the latest Chinese open source/weights models (DeepSeek’s V3.2 Speciale, Alibaba’s Qwen 3.5, Moonshot AI’s K2.5, MiniMax’s M2.5, Zhipu’s GLM 5) in obscure world knowledge & coding capability in tail scenarios, which is likely due to the US models having more tokens in their data & more parameters, which is downstream of the US labs having more compute (or rather, Chinese labs having less compute due to export restrictions).
The US labs, specifically Anthropic, are also stronger on agentic “AI”, also downstream of the disparity in compute. Due to the compute constraints, Chinese labs have tended to focus on specific areas to match or exceed the frontier US labs, such as coding, math, creative writing, etc., & not everywhere. Public benchmarks can be misleading because most Chinese labs (except DeepSeek) tend to “benchmax” their models for the flex, although the US labs do that, too.
People are eagerly anticipating the DeepSeek V4 that is set to drop in the next week or so, & the Moonshot AI K3 in the coming month, especially for the agentic capabilities. Going by the seemingly coordinated messaging campaign by US labs & the USG over the past week (Chinese labs launching distillation “attacks” by studying the outputs of closed Anthropic & OpenAI models, DeepSeek training on “illicit” Nvidia Blackwell chips, DeepSeek only giving previews of the V4 to Huawei & Cambricon & not Nvidia, etc.), V4 & K3 could prove quite competitive at the frontier, & possibly cause market turbulence again.
The gap is small (< 6 months, possibly 2-3 months), & probably not noticeable to the vast majority of users & for most applications. Of course, the draw of the Chinese models is that they are mostly open source/weights, w/ MIT licenses, very cheap, & being able to finetune/modify & run one’s own copy locally has huge benefits in terms of privacy & control. Qwen & MiniMax have also released a number of smaller but high-performing models that can be effectively run in edge applications (consumer electronic devices & robots), which has not been an area of focus for the US labs.
In video generation, ByteDance’s SeeDance 2.0 has redefined the frontier & is clearly ahead of Sora & NanoBanana. Chinese labs are also thought to be industry-leading in speech & video cognition, translation, etc. The PRC is far ahead in deploying “AI” to boost efficiency in manufacturing, mining, logistics, & even consumer-facing applications (shopping, delivery, ride hailing).

ETA: Atrios used to write about this issue: that if the self-driving car companies had their way, all the roads & streets would become their private domain, and all pedestrians and human drivers would become interlopers, allowed in only on sufferance, perhaps not at all in the future.
My own (biased) opinion: “AI” is now approximately at the same point that aircraft were a few years after the Wright Brothers’ first flight — i.e., very unreliable. I can’t tell whether or not it will get better. Until it does, there is no reason to take it seriously.

@Martin: Thanks. No, I don’t think you were being too sensitive. In retrospect, it’s totally reasonable someone could see it the way you did.
@Martin: The PRC has been upgrading/deploying the infrastructure to facilitate autonomous driving. Almost all traffic lights in the PRC are “smart”, therefore on AMaps (China’s IMO much better counterpart to Google Maps or Apple Maps) one can see the traffic light countdown live, & the vehicles know exactly when the light will change (& do not rely solely upon vision to recognize the color of the light).
There is almost always a countdown clock next to the light, as well, for the old-fashioned human drivers. Lines & markings are kept clear & freshly painted to help w/ the vision of ADAS & robotaxis. The civilian grade of the Beidou satellite navigation service is accurate enough to allow the ADAS/robotaxi & AMaps to locate the lanes that the vehicles are in, even on crowded city streets. A decade ago, there were suggestions in the PRC to rebuild infrastructure from scratch to better enable autonomous driving; the new city of Xiong’an has sections where the vehicle lanes are completely underground, to be grade-separated from pedestrians & two-wheelers.
However, the reality is that the transition to autonomous driving will take time & will likely be messy. Some people will hold on to their dumb ICE cars, some people are not willing to embrace ADAS (let alone fully autonomous driving), & rebuilding all road infrastructure from scratch is far too disruptive & far too expensive even for the PRC. & the rapid advances in the capabilities of ADAS & robotaxis have made some of these notions obsolete.
I mentioned before that we just upgraded our family car to an AITO M7 PHEV SUV, powered by an ADAS & cockpit system developed by Huawei. I let the ADAS do 95% of my driving in Wuhan, on city streets that make NYC’s look tame & orderly. It is a better driver than I am.

@Chetan R Murthy: Oh, I agree. I would strongly prefer it if all cars on the road were autonomous & tuned to strictly follow traffic regulations & standard etiquette.
I could then fully relax & not have to expend energy monitoring the ADAS. The traffic is likely to be faster, too, as there will be fewer accidents.

Yes, the advice for crossing Chinese streets is also to 1st make eye contact w/ the car drivers, & particularly the riders of E-bikes, for the same reason. Years of ubiquitous cameras & heavy fines have conditioned most Chinese drivers to yield to pedestrians (although still not to the same degree as I have seen in Japan or New Zealand).
Scooter riders, not so much, & the E-bikes are silent, which makes them doubly dangerous.

@Goku (aka Amerikan Baka): I always think of the Asimov robot story “Liar!” The mind-reading robot tells everybody what it KNOWS they want the answer to be. Rereading it, it’s very similar to what AI LLMs seem to be right now.

@TONYG: “You ask me to show figures on a brand-new type of enterprise, knowing I can’t.
It’s like asking the Wright brothers at Kitty Hawk to estimate how much money Curtiss-Wright Corporation would someday make out of building airplanes.”

@Martin: I don’t want to get into the philosophy of free will, never mind how it might interact with certain elements of the physical universe, but an essay I read recently that rang true asserted that a lot of computer scientists and, perhaps distressingly, some cognitive scientists appear to believe too literally in the analogy of the brain as a computer. That is, it’s some hardware (cells and neurotransmitters) that runs some mysterious software, and if we could just reverse-engineer that “software” we’d be able to adapt it to run on other hardware, which being (presumably) electronic (including perhaps quantum) would be much faster than our slow (200ms timescale) nervous system. Presto, AGI.
But the evidence suggests that the “hardware” and “software” are not really distinguishable in the brain. The brain is affected by, and in turn affects, multiple other systems in the body. It can rewire itself dynamically. It can still retain some ability to regrow cells. It constantly recreates memories. It is certainly not like a stored-program computer.

@different-church-lady: “Hey, buddy, your headlight is out” is rapidly becoming the peak employment aspiration.
@YY_Sima Qian: This brings up an important point often ignored when talking about AI and autonomous machines in general. Humans are capable of making huge mistakes, whether due to laziness or neglect or malice or whatever. Humans “hallucinate” if they can’t be bothered to check their work. But at least in principle humans can be held responsible for our screwups. Who is responsible for the AI failure? At least in the case of Tesla, some lawsuits have held the company responsible. Will that be true if this becomes ubiquitous?
@Martin: Thank you! I’ve been wondering about this almost forever! Why indeed must we train a car’s visual system to recognize whether the traffic light is green or red when a signal could just be sent to the car!

@Ramona: We’d have to retrofit every traffic light in the country/world with a radio signal, and it would have to also encode which direction on which street it was pointing, because cheap radio transmitters aren’t very directional, and even if they were, this would need to be very precise. It could surely be done, it would just be hella expensive.
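Roughly the payload such a broadcast would need: a sketch loosely modeled on the “signal phase and timing” (SPaT) idea from the V2X literature quoted further down, with all field names invented for illustration:

```python
from dataclasses import dataclass
from enum import Enum

class Phase(Enum):
    RED = 0
    YELLOW = 1
    GREEN = 2

@dataclass
class SignalStateMessage:
    intersection_id: int       # which intersection is broadcasting
    approach_heading_deg: int  # which approach this state applies to (0-359)
    lane_group: int            # e.g., through lanes vs. a left-turn pocket
    phase: Phase
    seconds_to_change: float   # countdown to the next phase
    timestamp_ms: int          # sender clock, so receivers can drop stale data

# A car matches its own heading and lane to (approach_heading_deg,
# lane_group) and gets an unambiguous answer, no computer vision involved.
msg = SignalStateMessage(4117, 270, 1, Phase.RED, 12.5, 1_700_000_000_000)
```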
Trains are a much simpler case. They run on a few discrete rails with traffic that is managed centrally. Each locomotive is equipped with a receiver already.

The answer might be: “it had both arguments in its training data/model, and which it produced depended on which query you submitted”.

To an embarrassing degree, its meta-question is “what does Dave want me to say?” (My last job was in machine learning, where the meta-question was just “what’s the best match?”) It means AI implementations are first and foremost about profit.
“Figuring out” what you already believe and reinforcing that is the simplest and clearest path to revenue.

@Bupalos: “Figuring out” what you already believe and reinforcing that is the simplest and clearest path to revenue.

Is it, though? I mean, are you [or is anyone] going to pay money to receive online reinforcement from a chat bot?

People have been thinking about the implications of trying to create new transportation frameworks for a while.
Between Elon Musk’s broken promises of “Full Self-Driving” and California’s ban on Cruise’s driverless taxis, initial public enthusiasm for autonomous cars has given way to rampant skepticism and regulatory scrutiny. Yet automakers, tech giants, and researchers continue to work the problem, even as Alphabet’s Waymo lays off employees, General Motors’ Cruise sheds billions of dollars, and Ford and Volkswagen’s Argo AI unit shuts down entirely.
Continued hope comes from a belief that autonomous vehicles and ones connected to communications networks could help to meet global goals to reduce emissions and limit injuries and deaths in cars. The role of sustainable mobility looms large over the European Green Deal, whose 2050 deadline dovetails with a parallel “Vision Zero” target in the United States to eliminate roadway fatalities. The new field of vehicle-to-everything (V2X) connectivity, in which individual cars would communicate wirelessly with one another as well as traffic signs and other infrastructure, could help reach those targets.
A paper published 11 March in the IEEE Open Journal of Vehicular Technology argues that V2X—underpinned by 5G and 6G networks—can “pave the way toward safe, affordable, accessible and sustainable transport systems, as well as improve road safety.” As the paper notes, if V2X could dynamically coordinate traffic, the result would be less congestion, freer-flowing transportation, safer streets, and reduced emissions.
The researchers in Spain and Sweden who authored the paper focused on future smart cities with high population density. They identify three current trends leading toward sustainable V2X communications: Climate-neutral infrastructure and vehicle-user equipment; “cloudification” and edge computing; and big data and AI. The researchers envision a model with roadside units linked to both cellular and cloud-computing networks.
These units would be able to communicate with smart traffic signals, road sensors, units on board individual vehicles, and edge and cloud nodes. There’s also a need to connect pedestrians and bicyclists so that vehicles and vehicle infrastructure know they’re there—either through passive detection or by actively signaling the network with onboard Internet of Things devices. Integrating pedestrians into a V2X system could help stem a troubling rise in pedestrian deaths in the U.S.
since about 2009, especially at night. In 2021, more than 7,300 pedestrians were killed in the United States, and 75 percent of those deaths occurred between sunset and sunrise. The authors note the potential for an Internet of Vehicles—a network of connected autonomous vehicles, in other words—that could boost communications efficiency by optimizing transmission rates, channels, and times. An Internet of Vehicles could create dynamic 3D maps of traffic and pedestrian flow, communicate object locations between cars to avoid collisions, and monitor air pollution and vital signs of drivers or passengers in vehicles.
One challenge of connecting every car to a wireless network is the massive amounts of generated data. In response, the researchers outline the potential for 6G network trends—including virtualization, moving network functions to the cloud, and edge computing—to handle the load. There may be simpler ways to get roughly the same benefit (especially in the near-term) – like regulating cars to be much smaller with much lower maximum speeds – but who knows what will be possible for reasonable costs in 50-100 years.
So it’s good to be thinking ahead.

@Another Scott: an integrated network certainly could in theory make transportation safer and more efficient. The potential for security vulnerabilities is scary though.

@Martin: I don’t think there’s good reason to think that car driving is one of those things though. When a human gets a sense that someone’s about to act weird, it’s still signaled by a bunch of stuff that’s totally quantifiable; we’re just not aware of what those signals are. But if you capture enough of the signal and run it through a properly configured pattern-matching machine, it should learn it just as well as we have, or better, given enough data, processing power, sufficient correctness in how the models are established, and time.
ML algos have been shown to be better than humans at other predictive tasks, and they find different correlations in the data than a human might. They’re not human, and don’t need to think like humans. They won’t see “that guy looks sketchy,” but that isn’t going to preclude them from, at least eventually, being able to accurately predict that he’s going to run the light. Our brains don’t even see “that guy looks sketchy”; that’s just how we consciously interpret it after other parts of the brain have already figured it out, and the signals that get us there are quantifiable, except that they’re invisible to us.
The machines don’t need to be able to determine whether the guy is ‘sketchy’ or not, just whether he’s going to run the light. And these systems are probabilistic too, so it’s not even that the AI robotaxis need to avoid all errors as much as they just need to have fewer errors than humans. For driving that’s not as tall an order as other things. Humans are really bad at it. Also, it’s not like free will as a concept isn’t highly debatable.
There’s plenty of evidence against it existing too.

@Ramona: Because that doesn’t help you capture whether a deer has just run into the street, so you still need the visual components anyway.

@Eyeroller: The higher-level abstraction for how the brain does information processing should be something we can replicate, regardless of the way we want to think about hardware or software. The problem is that if you don’t have the right kind of hardware, you need to simulate large portions of that structure, which can be computationally inefficient, to say the least.
Given enough compute power, you should be able to pull it off regardless, though we may need quantum computers to get all the way there. But I don’t think replicating the way the human brain works, even to scale it up, is the way to a properly useful tool for humans. Understanding how human cognition works will likely help us in building better information processing systems, but we shouldn’t be trying to recreate it.
We want something that does a better job at narrower tasks, and that’ll mean tailoring those systems to specific use cases.

@Chetan R Murthy: I’ve been following DeLong pretty closely on this; he’s been very measured on this stuff and is smart as hell.

@Quantum man: I have not. I currently have access to the current versions of the models from the big players (absent Grok, thankfully). So: OpenAI, Anthropic, Google, and Meta.

@Chetan R Murthy: Human drivers don’t need to get those signals.
The signals simply need to match what the human drivers are getting. This is already happening in one regard: autonomous cars get their speed-limit information from GPS, not from reading the signs. But in terms of signaling and the like (time-dependent things) it doesn’t happen in any capacity that I’m aware of. This doesn’t need to be delegated to the automakers (as if the streets and roads weren’t already their private domain, I say as someone who rarely drives and is pleased at the end of each day to have survived another as a pedestrian/cyclist); it can be standardized by the various parties: AASHTO, Caltrans, US DOT, etc. And there’s been a little work on this out of NHTSA, for instance, but it’s policy-focused, not implementation-focused. Mostly government has just said “fuck it.”

As a programmer, I’m having a wonderful time with Claude. Also, I find Claude comes to do work. Last time, I asked Claude and ChatGPT a complex question. Claude just tried to answer, even producing documents. ChatGPT took me on a long winding dialog about my motives, with lots of multiple-choice questions.
I realized it was trying to keep me engaged, rather than just answering the question and maybe making other suggestions. It wants to be your trusted advisor, whether you should trust it or not.
Original Source: Balloon-juice.com | Author: Tom Levenson | Published: March 1, 2026, 11:17 pm

