
Thread-y or not, here’s Python! 28 Mar 2025, 5:00 am

There’s more than one way to work with threads, or without them, in Python. In this edition of the Python Report: Get the skinny on Python threads and subprocesses, use Python’s native async library to break up non-CPU-bound tasks, and get started using parallel processing for heavier jobs in your Python programs. Also, check out the built-in async features in Python 3.13 if you haven’t already.

Top picks for Python readers on InfoWorld

Python threading and subprocesses explained
How does Python manage units of work meant to run side-by-side? Get started with threads and subprocesses—two different ways of getting things done in your Python programs.

How to use asyncio: Python’s built-in async library
Async is the best way to run many small jobs that yield to each other as needed—say, web scraping at scale or other network-bound operations. (See the short asyncio sketch below this list.)

The best Python libraries for parallel processing
Parallel processing libraries are used for big jobs that need to be broken across multiple CPUs or even multiple machines. Here are some of the best options out there.

Get started with the free-threaded build of Python 3.13
Do you want to really master threads in Python? Experiment with Python 3.13’s free-threaded alternate build and discover the difference for yourself.
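
To make the asyncio pick above concrete, here is a minimal sketch of the pattern it describes, many small network-bound jobs yielding to one another (the URLs and the fetch logic are placeholders):

import asyncio

async def fetch(url: str) -> str:
    # Placeholder for real I/O (e.g., an HTTP request); sleep stands in for network latency
    await asyncio.sleep(0.1)
    return f"fetched {url}"

async def main() -> None:
    urls = [f"https://example.com/page/{n}" for n in range(10)]
    # Run the small jobs concurrently; each one yields to the others while it waits
    results = await asyncio.gather(*(fetch(u) for u in urls))
    print(len(results), "pages fetched")

asyncio.run(main())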

More good reads and Python updates elsewhere

aiopandas: Async support for common Pandas operations
Check out the new monkey patch for adding async support and parallel execution to map, apply, and other common operations in Pandas. Minimal code changes required.

A very early peek at Astral’s Red Knot static type checker for Python
The team behind the uv environment management tool for Python is now brewing up a wicked-fast static type checker, too. Here’s an early peek, and some primitive but promising hints of what’s to come.

Specializing Python with e-graphs
Curious about the workings of projects that compile Python to assembly? Here’s a deep—and we mean deep—dive into analyzing and rewriting Python expressions as low-level machine language.

A look back at very early C compilers
How early? So early you’d need a PDP-11 emulator to get them running. (Also available on GitHub.)


Are we creating too many AI models? 28 Mar 2025, 5:00 am

A few days ago, I stared at yet another calendar invitation from a vendor eager to showcase their “groundbreaking” large language model (LLM). The irony wasn’t lost on me—just weeks ago, this same company had proudly touted its environmental initiatives and impressive environmental, social, and governance scores. Now it was launching another resource-hungry AI model into an already saturated market.

As I joined the call, the familiar enthusiasm bubbled through my screen: “revolutionary capabilities,” “state-of-the-art performance,” “competitive advantage.” But all I could think about was a massive data center somewhere, humming with thousands of GPUs, consuming megawatts of power to train what was essentially another variation of existing technology.

I couldn’t help but wonder: How do they reconcile their sustainability promises with the carbon footprint of their AI ambitions? It felt like watching someone plant trees while simultaneously burning down a forest.

The world is seeing an explosion of LLMs, with hundreds now in existence. They range from proprietary giants, such as GPT-4 and PaLM, to open source alternatives, such as Llama or Falcon. Open source accessibility and corporate investments have fueled this boom, creating a crowded ecosystem where every organization wants its own version of AI magic. Few seem to realize that this growth comes at a staggering cost.

Access to these AI powerhouses has become remarkably democratized. Although some premium models such as GPT-4 restrict access, many powerful alternatives are free or at minimal cost. The open source movement has further accelerated this trend. Llama, Mistral, and numerous other models are freely available for anyone to download, modify, and deploy.

Environmental and economic impact

As I look at graphics that show the number of LLMs, I can’t help but consider the impact at a time when resources are increasingly scarce. Training alone can cost up to $5 million for flagship models, and the ongoing operational expenses reach millions per month.

Many people and organizations don’t yet realize the staggering environmental impact of AI. Training a single LLM requires enormous computational resources—and energy equivalent to powering several thousand homes for a year. The carbon footprint of training just one major model can equal the annual emissions of 40 cars, or approximately 200 tons of carbon dioxide, when traditional power grids supply the electricity. Inference, which involves generating outputs, is less resource intensive but grows quickly with use, resulting in annual costs of millions of dollars and significant energy consumption measured in gigawatt-hours.

The numbers become even more concerning when we look at the scale of current operations. Modern LLMs are trained with hundreds of billions of parameters: GPT-3 uses 175 billion, BLOOM operates with 176 billion, and Google’s PaLM pushes this to 540 billion. Each model requires hundreds of thousands of GPU hours for training, consuming massive amounts of electricity and requiring specialized hardware infrastructure.

Computational demands directly translate into environmental impact due to energy consumption and the hardware’s carbon footprint. The location of training facilities significantly affects this impact—models trained in regions that rely on fossil fuels can produce up to 50 times more emissions than those powered by renewable energy sources.

Too much duplication

Some level of competition and parallel development is healthy for innovation, but the current situation appears increasingly wasteful. Multiple organizations are building similar capabilities, with each contributing a massive carbon footprint. This redundancy becomes particularly questionable when many models perform similarly on standard benchmarks and real-world tasks.

The differences in capabilities between LLMs are often subtle; most excel at similar tasks such as language generation, summarization, and coding. Although some models, like GPT-4 or Claude, may slightly outperform others in benchmarks, the gap is typically incremental rather than revolutionary.

Most LLMs are trained on overlapping data sets, including publicly available internet content (Wikipedia, Common Crawl, books, forums, news, etc.). This shared foundation leads to similarities in knowledge and capabilities as models absorb the same factual data, linguistic patterns, and biases. Variations arise from fine-tuning proprietary data sets or slight architectural adjustments, but the core general knowledge remains highly redundant across models.

Consequently, their outputs often reflect the same information frameworks, resulting in minimal differentiation, especially for commonly accessed knowledge. This redundancy raises the question: Do we need so many similarly trained LLMs? Moreover, the improvements from one LLM version to the next are marginal at best—most of the readily available data has already been used for training, and new data isn’t generated organically fast enough to drive significant gains.

Slow down, please

A more coordinated approach to LLM development could significantly reduce the environmental impact while maintaining innovation. Instead of each organization building from scratch, we could achieve similar capabilities with far less environmental and economic cost by sharing resources and building on existing open source models.

Several potential solutions exist:

  • Create standardized model architectures that organizations can use as a foundation.
  • Establish shared training infrastructure powered by renewable energy.
  • Develop more efficient training methods that require fewer computational resources.
  • Implement carbon impact assessments before developing new models.

I use LLMs every day. They are invaluable for research, including research for this specific article. My point is that there are too many of them, and too many do mostly the same thing. At what point do we figure out a better way?


Adobe announces AI agents for customer interaction 27 Mar 2025, 6:04 pm

Adobe has announced Adobe Experience Platform Agent Orchestrator, an intelligent reasoning engine that allows AI agents to perform complex decision-making and problem-solving tasks to accelerate customer experience orchestration. The Agent Orchestrator and a suite of Experience Platform Agents are “coming soon and currently in development,” Adobe said.

Announced March 18, Agent Orchestrator is rooted in semantic understanding of enterprise data, content, and customer journeys, Adobe said. This enables agentic AI solutions that are purpose-built for businesses to deliver targeted experiences with built-in data governance and regulatory compliance. Working through Adobe Experience Cloud, Adobe Experience Platform is used by companies to connect real-time data across an organization and surface insights for customer experiences. With Agent Orchestrator, businesses can build and manage AI agents from Adobe and third-party ecosystems. In conjunction with Agent Orchestrator, Adobe announced 10 purpose-built AI agents:

  • Account Qualification Agent, for evaluating opportunities to build sales pipeline and engage members of a buying group.
  • Audience Agent, for analyzing cross-channel engagement data and creating high-value audience segments.
  • Content Production Agent, for helping marketers scale by generating and assembling content.
  • Data Insight Agent, for simplifying the process of deriving insights from signals across an organization in order to visualize, forecast, and remediate customer experiences.
  • Data Engineering Agent, for supporting high-volume data management tasks such as data integration, cleansing, and security.
  • Experimentation Agent, for hypothesizing and simulating new ideas and conducting impact analysis.
  • Journey Agent, for orchestrating cross-channel experiences.
  • Production Advisor Agent, for supporting brand engagement and funnel advancement through product discovery and consideration experiences tailored to individual preferences and past purchases.
  • Site Optimization Agent, for driving performant brand websites by detecting and fixing issues to improve customer engagement.
  • Workflow Optimization Agent, for supporting cross-team collaboration by monitoring the health of ongoing projects, streamlining approvals, and accelerating workflows.

Adobe also introduced Brand Concierge, a brand-centric agent built on top of Agent Orchestrator. Users will be able to configure and manage AI agents that guide consumers from exploration to confident purchasing decisions, using immersive and conversational experiences, Adobe said.


What next for WASI on Azure Kubernetes Service? 27 Mar 2025, 5:00 am

Microsoft announced at the end of January 2025 that it would be closing its experimental support for WASI (WebAssembly System Interface) node pools in its managed Azure Kubernetes Service. This shouldn’t have been a surprise if you have been following the evolution of WASI on Kubernetes. The closure does require anyone using server-side WASI code on AKS to do some work as part of migrating to an alternate runtime.

It’s important to note that the two options Microsoft is suggesting don’t mean migrating away from WASI. WebAssembly and Kubernetes are two technologies that work well together. As a result, several different open source projects fill the gap, allowing you to add a new layer to your AKS platform and ensuring that you can continue running with minimal disruption.

If you’re using WASI node pools in AKS, the last day you can create a new one is May 5. You can continue using existing WASI workloads, but it’s time to look at alternatives for new developments and upgrades. You shouldn’t wait until Microsoft’s own WASI AKS service stops working; you can start planning your transition now, with official support for two alternative approaches.

From Krustlets to what?

The big issue for AKS WASI node pools was their dependency on the experimental Krustlet project, which used Rust to build WebAssembly-ready kubelets. Unfortunately, even though Krustlet was a Cloud Native Computing Foundation Sandbox project, it’s no longer maintained, as team members have moved on to other projects. With no maintainers, the project would be left behind as both Kubernetes and WebAssembly continued to evolve.

With a key dependency no longer viable, Microsoft clearly had no choice but to change its approach to WebAssembly in AKS. Luckily, although AKS offers a managed way to work with Kubernetes, it still supports the wider Kubernetes ecosystem via standard APIs, which allows Microsoft to offer alternate approaches to running WASI on its platform.

Run WebAssembly functions on AKS with SpinKube

One option is to use another WASI-on-Kubernetes project’s runtime. SpinKube has been developing a shim for the standard Kubernetes container runtime, containerd, which lets you use runwasi to host WASI workloads without needing to change the underlying Kubernetes infrastructure. Sponsored by WebAssembly specialist Fermyon, Spin is part of a long heritage of Kubernetes tools from a team that includes the authors of Helm and Brigade.

SpinKube is a serverless computing platform that uses WASI workloads and manages them with Kubernetes. Its containerd-shim-spin tool adds runwasi support, so your nodes can host WASI code and treat it as standard Kubernetes resources. Nodes host a WASI runtime and are labeled to ensure that your workloads are scheduled appropriately, allowing you to run both WASI and standard containers at the same time, along with tools like KEDA (Kubernetes Event-driven Autoscaling) for event-driven operations.

Other Spin tools handle deploying and managing the life cycle of shims, ensuring that you’re always running an up-to-date version and that the containerd shim is deployed as part of an application deployment. This allows you to automate working with WASI workloads, and although this requires more management than the original WASI node pools implementation, it’s a useful step away from having to do everything through the command line and kubectl.

Microsoft recommends SpinKube as a replacement for its tool and provides instructions on how to use it in AKS. You can’t use it with a Windows-based Kubernetes instance, so make sure you have a Linux-based AKS cluster to work with. Usefully, you don’t need to start from scratch, as you can deploy SpinKube to existing AKS clusters. This approach ensures that you can migrate to SpinKube-based WASI node pools and keep running with Microsoft’s own tools until you’ve finished updating your infrastructure.

Deploying SpinKube on AKS

Although it’s technically available through the Azure Marketplace, most of the instructions for working with SpinKube and AKS are based on installing it from the platform’s own repositories, using Kubernetes tools. This may be a more complex approach, but it’s more in line with how most Kubernetes operators are installed, and you’re more likely to get community support.

You will need the Azure CLI to deploy SpinKube after you have created an AKS cluster. This is where you run the Kubernetes kubectl tools, using your AKS credentials. Your cluster will need to be running cert-manager, which can be deployed using Helm. Once that’s in place, follow up by installing SpinKube’s runtime-class-manager and its associated operator. This can be found in its own repository under its original name, KWasm.

You can now deploy the containerd shim to your cluster via kubectl, using the annotate node command. This instructs the runtime-class-manager to deploy the shim and label the nodes that are ready to use. You can then add the SpinKube custom resource definitions and runtime classes to the cluster, using kubectl to copy the spin components from GitHub and apply them to your cluster. Once these are in place, use Helm to deploy the spin-operator before adding a SpinAppExecutor.
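
Pulled together, the sequence looks roughly like the sketch below. The chart locations, namespace names, node annotation key, and file names are assumptions based on the SpinKube and KWasm documentation, so check the current docs before running any of it:

# Point kubectl at your AKS cluster
az aks get-credentials --resource-group my-rg --name my-aks-cluster

# Install cert-manager, a prerequisite for the spin-operator
helm repo add jetstack https://charts.jetstack.io
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager --create-namespace --set installCRDs=true

# Annotate a node so the runtime-class-manager (KWasm) deploys the containerd shim to it
kubectl annotate node my-node-name kwasm.sh/kwasm-node=true

# Install the spin-operator after applying the SpinKube CRDs and runtime class from GitHub
helm install spin-operator oci://ghcr.io/spinkube/charts/spin-operator \
  --namespace spin-operator --create-namespace

# Register a SpinAppExecutor so deployed apps know which executor to use
kubectl apply -f spinapp-executor.yaml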

Getting up and running is a relatively complex set of steps; however, you do have the choice of wrapping the entire deployment process in an Azure CLI script. This will allow you to automate the process and repeat it across application instances and Azure regions.

Once the SpinKube nodes are in place, you can bring your WASI applications across to the new environment. Spin is designed to load WASI code from an OCI-compliant registry, so you will need to set one up in your Azure infrastructure. You also have the option of using a CI/CD integrated registry like the one included as part of the GitHub Packages service. If you take this route, you should use a GitHub Enterprise account to keep your registry private.

With this approach, you can compile code to WASI as part of a GitHub Action, using the same Action to save it in the registry. Your AKS application will always have access to the latest build of your WASI modules. As with any Kubernetes application, you will need to define a YAML description for your code and use the containerd shim as the executor.
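
For reference, a SpinApp description for the spin-operator looks something like the sketch below; the API group, version, and field names are assumptions drawn from the SpinKube documentation and may differ in the release you deploy:

apiVersion: core.spinkube.dev/v1alpha1   # assumed API group/version; verify against the CRDs you installed
kind: SpinApp
metadata:
  name: hello-wasi
spec:
  image: myregistry.azurecr.io/hello-wasi:latest   # OCI registry holding the compiled Wasm artifact
  executor: containerd-shim-spin                   # the containerd shim acts as the executor
  replicas: 2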

Use WASI microservices on AKS with wasmCloud

As an alternative to SpinKube, you can use another CNCF project, wasmCloud. Here you use a Helm chart to install the various wasmCloud components in one go. This requires managing AKS with the Azure CLI and kubectl, as there’s no integration with the Azure Portal. And because wasmCloud takes quite a different architectural approach, you need to start from scratch, rearchitecting your cluster and application for use with wasmCloud.

Start by creating a Kubernetes namespace for wasmCloud, before using Helm to install the wasmCloud platform components. Once the pods have restarted, use kubectl to start a wasmCloud instance and then deploy the components that make up your application. WasmCloud has its own command-line management tool, and you need to forward traffic to the management pod to use it.

Again, you must use YAML to describe your application; however, now you’re using wasmCloud’s own orchestration tools, so you will use its descriptions of your application components. Once complete, you can use the command-line tool to deploy and run the application. Backed by Cosmonic, wasmCloud is designed to support a component model for building and running applications, with the intent of delivering a standard way of describing and calling WASI components.

A philosophical difference

With two alternatives for Microsoft’s WASI node pools, it’s clear there’s still a future for WebAssembly on AKS. But why two quite different ways of working with and running WASI?

The underlying philosophies of wasmCloud and SpinKube are very different: wasmCloud is designed to host full-scale WASI-based applications, assembling and orchestrating microservice components, while SpinKube is very much about quickly launching WASI-based functions, scaling from zero in very little time as an alternative to services like AWS Lambda or Azure Functions. Having support for both in AKS makes sense so you can choose the right WASI platform for your needs.

We’re still exploring what WebAssembly can do for us, so it’s good we’re not being locked into only one way to work with it in AKS. Having many different options is very much the Kubernetes way of building and managing distributed applications. Like our PCs and servers, Kubernetes is a flexible and powerful platform ready for your applications, whether they’re serverless functions or large-scale enterprise services, written in whatever language, and hosted in whatever containerd-compliant environment you want.


How RamaLama helps make AI model testing safer 27 Mar 2025, 5:00 am

Consider this example: An amazing new software tool emerges suddenly, turning technology industry expectations on their heads by delivering unprecedented performance at a fraction of the existing cost. The only catch? Its backstory is a bit shrouded in mystery and it comes from a region that is, for better or worse, in the media spotlight.

If you’re reading between the lines, you of course know that I’m talking about DeepSeek, a large language model (LLM) that uses an innovative training technique to perform as well as (if not better than) similar models for a purported fraction of the typical training cost. But there are well-founded concerns around the model, both geopolitical (the startup is China-based) and technological (Was its training data legitimate? How accurate is that cost figure?).

Some might say that the various concerns around DeepSeek, many of which start on the privacy side of the coin, are overblown. Others, including organizations, states, and even countries, have banned downloads of DeepSeek’s models.

Me? I just wanted to test the model’s crazy performance claims and understand how it works—even if it had bias, even if it was kind of weird, even if it was indoctrinating me into its subversive philosophy (that’s a joke, people). I was willing to take the risk to see how DeepSeek’s advances might be used today and influence AI moving forward. With that said, I certainly didn’t want to download DeepSeek to my phone or to any other network-connected device. I didn’t want to sign up to their service, give them my credentials, or leak my prompts to a web service.

So, I decided to run the model locally using RamaLama.

Spinning up DeepSeek with RamaLama

RamaLama is an open source project that facilitates local management and serving of AI models through the use of container technology. The RamaLama project is all about reducing friction in AI workflows. By using OCI containers as the foundation for deploying LLMs, RamaLama aims to mitigate or even eliminate issues related to dependency management, environment setup, and operational inconsistencies.

Upon launch, RamaLama inspects your system for GPU support. If no GPUs are detected it falls back to CPUs. RamaLama then uses a container engine such as Podman or Docker to download an image that includes all of the software necessary to run an AI model for your system’s setup. Once the container image is in place, RamaLama pulls the specified AI model from a model registry. At this point, it launches a container, mounts the AI model as a data volume, and starts either a chatbot or a REST API endpoint (depending on what you want).
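
In practice, each of those modes is a single command; the serve form below follows the project’s documented CLI, so treat the exact syntax as an assumption:

# Interactive chat with a model, pulled and containerized on first use
ramalama run ollama://deepseek-r1:7b

# Or expose the same model as a local REST API endpoint instead
ramalama serve ollama://deepseek-r1:7b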

A single command!

That part still makes me super-excited. So excited, in fact, that I recently sent an email to some of my colleagues encouraging them to try it for themselves as a way to (more safely and easily) test DeepSeek.

Here, for context, is what I said:

I want to show you how easy it is to test deepseek-r1. It’s a single command. I know nothing about DeepSeek, how to set it up. I don’t want to. But I want to get my hands on it so that I can understand it better. RamaLama can help!

Just type:


ramalama run ollama://deepseek-r1:7b

When the model is finished downloading, type the same thing you typed with granite or merlin and you can compare how they perform by looking at their results. It’s interesting how DeepSeek tells itself what to include in the story before it writes the story. It’s also interesting how it confidently says things that are wrong 🙂

What DeepSeek thinks

I included in my email the results of a query I entered into DeepSeek. I asked it to write a story about a certain open source-forward software company.

DeepSeek returned an interesting narrative, not all of which was accurate, but what was really cool was the way that DeepSeek “thought” about its own “thinking” — in an eerily human and transparent way. Before generating the story, DeepSeek—which, like OpenAI o1, is a reasoning model—spent a few moments muddling through how it would put the story together. And it showed its thinking, 760-plus words’ worth. For example, it reasoned that the story should have a beginning, a middle, and an end. It should be technical, but not too technical. It should talk about products and how they are being used by businesses. It should have a positive conclusion, and so on. 

This process was like a writer and editor talking through a story. Or healthcare professionals collaborating on a patient’s care plan. Or development and security teams discussing how to work together to protect an application. I can see DeepSeek being used as a tool in these and other collaborations, but I certainly don’t want it to replace them.

Indeed, based on my trial run of DeepSeek with RamaLama, I determined that I would feel comfortable using the LLM for tasks such as generating config files or in situations where inputs and outputs are pretty well packaged up—like, “Hey, analyze this cluster and tell me if you know whether Kubernetes is healthy.” However, the glaring hallucinations in DeepSeek’s narrative about the aforementioned open source company led me to determine that DeepSeek should not be considered the supreme authority for any kind of open-ended questions whose answers have impactful ramifications.

And, honestly, I would say that today about any public LLM.

The value of RamaLama

I think that’s where the value proposition of RamaLama comes in. You can do this kind of testing and iterating on AI models without compromising your own data. When you’re done running the model locally, it can just be deleted. This is something that Ollama also does, but RamaLama’s ability to containerize models provides portability across runtimes and the ability to leverage existing infrastructure (including container registries and CI/CD workflows). RamaLama also optimizes software for specific GPU configurations and generates a Podman Quadlet file that makes it easier for developers eventually to go from experimentation to production.

These kinds of capabilities will be increasingly important as more companies invest more time, money, and trust in AI.

Indeed, DeepSeek has a plethora of potential issues but it has challenged conventional wisdom and therefore has the potential to move AI thinking and applications forward. Curiosity mixed with a healthy dose of caution should drive our work with new technology, so it will be important to continue to use and develop safe spaces such as RamaLama.

Generative AI Insights provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss the challenges and opportunities of generative artificial intelligence. The selection is wide-ranging, from technology deep dives to case studies to expert opinion, but also subjective, based on our judgment of which topics and treatments will best serve InfoWorld’s technically sophisticated audience. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Contact doug_dineley@foundryco.com.


Microsoft lauds Hyperlight Wasm for WebAssembly workloads 26 Mar 2025, 5:40 pm

Microsoft has unveiled Hyperlight Wasm, a virtual machine “micro-guest” that can run WebAssembly component workloads written in a multitude of languages including C and Python.

Introduced March 26, Hyperlight Wasm serves as a Rust library crate. Wasm modules and components can be run in a VM-backed sandbox. The purpose of Hyperlight Wasm is to enable applications to safely run untrusted or third-party Wasm code within a VM with very low latency and overhead. It is built on Hyperlight, introduced last year as an open source Rust library for executing small, embedded functions with hypervisor-based protection. Workloads in the Hyperlight Wasm guest can be written in compiled languages such as C, Go, and Rust, as well as in interpreted languages including Python, JavaScript, and C#, though for the latter a language runtime must be included as part of the image.

Hyperlight Wasm remains experimental and is not considered production-ready by its developers, according to the project’s GitHub page, which also contains instructions for building with the technology. Hyperlight Wasm takes advantage of WASI (WebAssembly System Interface) and the WebAssembly Component Model. It allows developers to implement a small set of high-level, performant abstractions in almost any execution environment, while itself providing fast, hardware-protected, and widely compatible execution.

Building Hyperlight with a WebAssembly runtime enables any programming language to execute in a protected Hyperlight micro-VM without any prior knowledge of Hyperlight. Program authors just compile for the wasm32-wasip2 target, meaning programs can use runtimes such as Wasmtime or Jco, Microsoft said. Programs can also run on servers such as Nginx Unit, Spin, wasmCloud, or, now, Hyperlight Wasm. In an ideal scenario, developers wouldn’t need to think about what runtime their code will run on as they are developing it. Also, by combining Hyperlight with WebAssembly, Microsoft said it achieves more security and performance than traditional VMs by doing less work overall. Wasmtime provides strong isolation boundaries for Wasm workloads via a software-defined sandbox, Microsoft said.
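
For a Rust program, for example, targeting that world is just a matter of compiling for wasm32-wasip2 (a minimal sketch of the build steps):

# One-time: add the WASI Preview 2 target to the Rust toolchain
rustup target add wasm32-wasip2

# Build the component; the resulting .wasm can then run on hosts such as Wasmtime, Jco, Spin, wasmCloud, or Hyperlight Wasm
cargo build --release --target wasm32-wasip2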

Plans call for enabling Hyperlight Wasm to work on Arm64 processors. Thus far, planning has centered on using WASI on Hyperlight for portability between operating systems and VMs. But Wasm applications are portable between different instruction sets. Also, Hyperlight Wasm will soon be extended with default bindings for some WASI interfaces.


Critical RCE flaws put Kubernetes clusters at risk of takeover 26 Mar 2025, 10:53 am

The Kubernetes project has released patches for five vulnerabilities in a widely used component called the Ingress NGINX Controller, which routes external traffic to Kubernetes services. If exploited, the flaws could allow attackers to completely take over entire clusters.

“Based on our analysis, about 43% of cloud environments are vulnerable to these vulnerabilities, with our research uncovering over 6,500 clusters, including Fortune 500 companies, that publicly expose vulnerable Kubernetes ingress controllers’ admission controllers to the public internet — putting them at immediate critical risk,” wrote researchers from cloud security firm Wiz who found and reported the flaws.

Continue reading on CSOonline.com


Databricks’ TAO method to allow LLM training with unlabeled data 26 Mar 2025, 8:21 am

Data lakehouse provider Databricks has unveiled a new large language model (LLM) training method, TAO, that will allow enterprises to train models without labeling data.

Typically, when LLMs are being adapted to new enterprise tasks, they are trained by using prompts or by fine-tuning the model with datasets for the specific task.

However, both these techniques have caveats. While prompting is seen as an error-prone process with limited quality gains, fine-tuning requires large amounts of human-labeled data, which most enterprises either don’t have or would find extremely time-consuming to produce.

TAO, or Test-time Adaptive Optimization, provides an alternative to fine-tuning a model, according to Databricks, by leveraging test-time compute and reinforcement learning (RL) to teach a model to do a task better based on past input examples alone. That means it scales with an adjustable tuning compute budget, not human labeling effort.

Test-time compute, which has gained popularity through its use by OpenAI and DeepSeek in their o1 and R1 models, refers to the compute resources an LLM uses during the inference phase, when it is asked to complete a task, rather than during training.

This compute, which goes toward how the model actually reasons through a task or query, can be used to make adjustments that improve output quality, according to a community post on Hugging Face.

However, Databricks’ Mosaic Research team has pointed out that enterprises don’t need to be alarmed about rising inference costs if they adopt TAO.

“Although TAO uses test-time compute, it uses it as part of the process to train a model; that model then executes the task directly with low inference costs (i.e., not requiring additional compute at inference time),” the team wrote in a blog post.

Mixed initial response to TAO

Databricks co-founder and CEO Ali Ghodsi’s LinkedIn post about TAO has attracted a mixed initial response.
While some users, such as Iman Makaremi, co-founding head of AI at Canadian startup Catio, and Naveed Ahamed, senior enterprise architect at Allianz Technology, were excited to implement and experiment with TAO, other users posed questions about its efficiency.

Tom Puskarich, a former senior account manager at Databricks, questioned the use of TAO when training a model for new tasks.

“If you are upgrading a current enterprise capability with a trove of past queries, but for enterprises looking to create net new capabilities, wouldn’t a training set of labeled data be important to improve quality?” Puskarich wrote.

“I love the idea of using inputs to improve but most production deployments don’t want a ton of bad experiences at the front end while the system has to learn,” the senior account manager added.

Another user, Patrick Stroh, head of data science and AI at ZAP Solutions, pointed out that enterprise costs may increase.

“Very interesting, but also cognizant of the (likely increase) costs due to an adaptation phase. (This would likely be incremental to the standard costs (although still less than fine-tuning)). (I simply can’t understand how it would the SAME as the original LLM as noted given that adaptation compute. But I suppose they can price it that way.),” Stroh wrote.

How does TAO work?

TAO comprises four stages: response generation, response scoring, reinforcement learning, and continuous improvement.

In the response generation stage, enterprises begin by collecting example input prompts or queries for a task, which can be gathered automatically from any AI application via Databricks’ AI Gateway.

Each prompt is then used to generate a diverse set of candidate responses, which are systematically evaluated for quality in the response scoring stage, the company explained. Scoring methodologies include a variety of strategies, such as reward modeling, preference-based scoring, or task-specific verification using LLM judges or custom rules.

In the reinforcement learning stage, the model is updated or tuned so that it produces outputs more closely aligned with high-scoring responses identified in the previous step.

“Through this adaptive learning process, the model refines its predictions to enhance quality,” the company explained.

Finally, in the continuous improvement phase, enterprise users create data, essentially different LLM inputs, by interacting with the model, and that data can be used to further optimize model performance.

TAO can increase the efficiency of inexpensive models

Databricks said it used TAO to not only achieve better model quality than fine-tuning but also upgrade the functionality of inexpensive open-source models, such as Llama, to meet the quality of more expensive proprietary models like GPT-4o and o3-mini.

“Using no labels, TAO improves the performance of Llama 3.3 70B by 2.4% on a broad enterprise benchmark,” the team wrote.

TAO is now available in preview to Databricks customers who want to tune Llama, the company said. The company is planning to add TAO to other products in the future.


What you need to know about Go, Rust, and Zig 26 Mar 2025, 5:00 am

Every language has a life cycle. Sometimes it starts with a relatively narrow use case and escapes its container; sometimes it’s intended as a general-purpose language but finds a powerful niche instead.

Over the last decade-plus, three new languages have emerged as attention getters in the software development space. In this article, we’ll look at what’s so special about each of these languages, and where they may be headed.

Go

With its relatively minimal syntax, simple paradigms, and convenient deployment tooling, the Go language, created by Google, has made it easier to write fast, compact programs that don’t require developers to think heavily about memory safety.

In the decade or so since its introduction, Go has found a few niches where it flourishes. Network or web services, particularly those with asynchronous behaviors, are easy to write in Go. It’s become a powerful alternative to Python in that respect. Go can scale to handle far more traffic than Python does, and with less effort. Applications delivered as standalone binaries, like command-line tools, are another good fit for Go. Compiled Go programs can run without external dependencies and can be built for every major platform.

Go’s biggest obstacles and developer complaints often stem from one of its chief selling points: the deliberate simplicity of the language. Go’s maintainers try to keep its syntax and feature set as unadventurous and unchanging as they can, with a goal of remaining forward-compatible.

But Go’s choices can also feel like a calculated snub of the powerful features that programming languages have gained over the past few decades. Generics were added to Go only very recently, and error handling is closer to C’s way of doing things than anything else. It’s a welcome attitude in a world of moving too fast and breaking too many things. It also enforces constraints on development, which may be difficult to outgrow for projects dependent on Go.
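
That error-handling style amounts to explicitly checking a returned error value at each call site, as in this minimal sketch:

package main

import (
	"fmt"
	"os"
)

func main() {
	f, err := os.Open("config.json")
	if err != nil {
		// No exceptions: the error is an ordinary value you inspect and handle
		fmt.Fprintln(os.Stderr, "open failed:", err)
		os.Exit(1)
	}
	defer f.Close()
	fmt.Println("opened", f.Name())
}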

Rust

When a program needs both memory safety and speed, Rust is the language that regularly bubbles to the top of the list. Rust’s whole m.o. is delivering fast, machine-native code that rules out whole classes of memory-safety errors, because those errors simply never make it into production.

The explosion of enthusiasm around Rust and its powers has left it with a wide realm of use cases. Most are server-side, cloud-computing, distributed-system, or network-centric apps—things once typically the domain of only Java or C++. It’s also found a strong presence in the WebAssembly world, as it can compile natively to WASM and thus be re-used in many other contexts.

The most newsworthy application of Rust, if not the most widely used, is in replacing C/C++ code in existing “brownfield” projects. The Linux kernel maintainers are working out (albeit with some difficulty) plans for including Rust code strategically in the kernel. This isn’t to edge out the use of C altogether, but rather to employ Rust where it’ll afford the biggest payoffs with the least additional maintenance burden (e.g., device drivers). The goal is to enhance memory safety without forcing Linux kernel C developers to retool in Rust if they don’t want to.

Some Linux kernel developers resist the move, citing common complaints about the language such as its steep learning curve and ahead-of-time complexity. Rust’s memory safety requires programmers to think ahead about how to satisfy the compiler’s demands, and adapting to that mindset is a common rite of passage for Rust newcomers. Plus, Rust projects often require dozens or hundreds of external dependencies that slow down compile times, an echo of similar sprawl in the world of JavaScript.

There’s no question about the demand for memory safety along with speed. The ideal would be a more streamlined version of Rust, or a new language that offered the same benefits without Rust’s conceptual overhead. For now, though, there’s no question Rust has galvanized a generation of developers who want what it offers.

Zig

Andrew Kelley’s one-man programming language project, launched in 2015, is positioned as both a complement and a competitor to C. Zig aims at much the same space as C: the world of low-level, non-garbage-collected, portable languages. Zig also compiles to the same kinds of targets as C, including WebAssembly.

Unlike C, though, Zig has native features to make it easier to write memory-safe low-level code. And unlike Rust—the other major language in this space—Zig does not require programmers to work so hard for the sake of correctness. Memory management is done manually, but the language provides more syntactical tooling than C does for handling memory. One example is Zig’s defer statement, used to clean up resources at the end of a scope. Many common runtime issues like integer overflow are trapped by default and can be granularly overridden (albeit only in a given scope), but the default is toward safety.
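
As a small illustration, here is a sketch written against recent Zig releases (details may shift as the language evolves):

const std = @import("std");

pub fn main() void {
    // defer runs when the enclosing scope exits, making cleanup hard to forget
    defer std.debug.print("deferred cleanup runs last\n", .{});

    var x: u8 = 255;
    x +%= 1; // wrapping add: explicitly opts out of the default overflow trap for this operation
    std.debug.print("x = {}\n", .{x});
}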

One way Zig aims to replace C is by integrating elegantly with it—by sitting side-by-side and even using C’s own libraries. This gives those developing C applications a transition path to Zig that doesn’t require scrapping and redoing everything. The Zig compiler can even function as a C compiler, and can build Zig libraries with C ABIs to allow C to use Zig code.
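
For example, the same toolchain can build existing C sources directly (file names here are placeholders):

# Use Zig's bundled Clang-based C compiler on an ordinary C file
zig cc -o hello hello.c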

Zig’s biggest obstacles are typical for any new language. The language itself is in flux (its current version is 0.15), with potential breaking changes at any point along the way. The legacy world it aims to displace—the C “cinematic universe”—is also immensely entrenched, comprising not just the language but the development culture of C.

Another barrier common to new languages—tooling in common editors and IDEs—seems to be dissolving quickly, however. A Zig add-on in Visual Studio Code provides the compiler itself, not just a language server, as an easily integrated component to the editor.


Intro to Alpine.js: A JavaScript framework for minimalists 26 Mar 2025, 5:00 am

I recently backpacked through Big Sur, and after a few days, the inevitable happened: I looked at everything I carried and demanded it justify its presence in my backpack. Making tech choices during the software development process is similar. Every asset in the system adds complexity, so everything better be pulling its weight.

Alpine has carved out a place for itself as the minimalist choice among reactive frameworks. It offers an impressive range of powers within a tight footprint. It’s surprising how much you can do with such a small feature set.

Alpine’s minimalist API

As described in the Alpine docs, Alpine is a collection of 15 attributes, six properties, and two methods. That’s a very small API. It delivers reactivity in a simple package, then offers a few niceties on top like eventing and a store.

Consider the following simple web page:




  

Besides including the Alpine package via CDN, the only Alpine-related things here are the two directives: x-data and x-text.

If you put this into an HTML page on your system and view it in the browser, you’ll see the message “Text literal” as the output. This is not terribly impressive, but it demonstrates two interesting facts about Alpine.

First, for reactivity to engage, you must enclose the markup in an x-data directive. If you remove this directive, the x-text will not take effect. So, the x-data directive creates an Alpine component. In this case, the directive is empty, but in real usage you almost always have data in there; after all, you’re writing components whose purpose is to be reactive to that data.

Second, you can put any valid JavaScript into the x-text. This is true of all Alpine directives. The x-text directive gives you a link between the HTML (the view) and the JavaScript (the behavior).

Using the x-data and x-text directives

The x-data contents are provided to all the contained elements. To understand what I mean, look at the following code:
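
For example (the document text is truncated here for brevity):

<div x-data="{ message: 'When in the Course of human events... that all men are created equal...' }">
  <span x-text="message"></span>
</div>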


Now the page will output the beginning of the Declaration of Independence. You can see that x-data has defined a plain old JavaScript object with a single field, “message,” containing the preamble, and that the x-text refers to this object field.

Reactivity in Alpine

Now we’ll use reactivity to fix up an error in the document:


As should now be evident, the x-text directive refers to the noun variable exposed by the x-data directive. The new piece here is the button, which has an x-on:click directive. The handler for this click event replaces the old default noun (“men”) with a gender-neutral one, “people.” Reactivity then handles updating the reference in the x-text.

The UI will automatically reflect the change to the data.

Functions in data

The data properties in Alpine are full-featured JavaScript objects. Knowing that, here’s another way to handle the above requirement:
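
Roughly like so, with the click handler delegating to a method defined on the data object:

<div x-data="{
       noun: 'men',
       fixIt() { this.noun = 'people'; }
     }">
  <span x-text="`We hold these truths to be self-evident, that all ${noun} are created equal...`"></span>
  <button x-on:click="fixIt()">Fix it</button>
</div>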


In this example, you can see that the data object now hosts a fixIt method that is called by the click handler. We can craft whatever object structure is best suited to the behavior we want to see in the HTML.

Fetching remote data

Now let’s switch gears and think about a requirement where you want to load a JSON-formatted list of the American presidents from an external API. The first thing we’ll do is load it when the page loads. For that, we’ll use the x-init directive:
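
A sketch of that setup; the endpoint URL is a placeholder for whatever presidents API you use:

<div x-data="{ presidents: [] }"
     x-init="(async () => {
       const response = await fetch('https://example.com/api/presidents');
       presidents = await response.json();
     })()">
  <span x-text="presidents"></span>
</div>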


Let’s unpack this code. The x-data directive should be clear; it simply has a presidents field with an empty array. The x-text in the span element outputs the contents of this field.

The x-init code is a bit more involved. First off, notice that it is wrapped in a self-executing function; this is because Alpine expects a function (not a function definition). If you were to use the non-async callback form of fetch, you wouldn’t need to wrap the function like this (because you wouldn’t require the async-scoped function in that case).

Once the list of presidents is obtained from the endpoint, we stick it into the presidents variable, which Alpine has exposed to us as part of the x-data object.

To reiterate: Alpine is making the data from x-data available to the other directive functions (like x-init) within the same context.

Iterating with Alpine

At this point, the app is pulling the data from the remote endpoint and saving it into the state; however, it is outputting something like [Object],[Object]..... That is not what we want. To fix it, we need to first get a look at iterating over the data:
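
The markup being described looks something like this; pres.name stands in for whatever field the API actually returns:

<ul>
  <template x-for="pres in presidents">
    <li>
      <span x-text="pres.name"></span>
    </li>
  </template>
</ul>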


Man, that is really clean, self-explanatory code and template!

The code contains a normal unordered list and then an HTML template element, which contains an x-for directive. This directive operates just like it does in other reactive frameworks. It allows specifying a collection, presidents, and an identifier, which will be provided to the enclosed markup representing each instance of that collection (in this case, pres).

The rest of the markup makes use of the pres variable to output data from the objects via x-text. (This use of an iterator is one of the most prevalent patterns in all of software, by the way.)

The app now looks something like the screenshot below, showing a list of United States presidents.

A list of United States presidents generated with Alpine.js.

Show/Hide and onClick

Now let’s say we want to add the ability to toggle the data for a president by clicking on the president’s name. We modify the markup to look like this:
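
A sketch of the change (pres.name and pres.description are assumed field names):

<ul>
  <template x-for="pres in presidents">
    <li>
      <!-- Clicking the name toggles the show flag for this president -->
      <h3 x-on:click="pres.show = !pres.show" x-text="pres.name"></h3>
      <!-- The details are visible only while pres.show is truthy -->
      <div x-show="pres.show">
        <span x-text="pres.description"></span>
      </div>
    </li>
  </template>
</ul>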



We use the x-show directive on a div containing the presidential details. The truthiness of the x-show value determines whether the content is visible. In our case, that is determined by the pres.show field. (Note that in a real application, you might not want to use the actual business data to host the show/hide variable, to keep data and behavior more isolated.)

To change the value of pres.show we add an x-on:click handler to the header. This handler simply swaps the true/false value of pres.show: pres.show = ! pres.show.

Add transition animation

Alpine includes built-in transitions that you can apply to the show/hide feature. Here’s how to add the default animation:
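
The change, sketched from the description that follows, is a single added attribute:

<!-- From: -->
<div x-show="pres.show">
  <span x-text="pres.description"></span>
</div>

<!-- Until: -->
<div x-show="pres.show" x-transition>
  <span x-text="pres.description"></span>
</div>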



All that changed was the element bearing the x-show directive, which now also has an x-transition directive. By default, Alpine applies sensible transitions. In this case, a slide and fade effect is used. You can customize the transition extensively, including by applying your own CSS classes to various stages of the animation. See the Alpine transition docs for more info.

Binding to inputs

Now we’ll add a simple filter capability. This will require adding an input that you bind to your data, then filtering the returned dataset based on that value. You can see the changes here:
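
A sketch of those changes; the x-init fetch shown earlier is omitted, and pres.name is again an assumed field name:

<div x-data="{
       presidents: [],
       filter: '',
       getPresidents() {
         return this.presidents.filter(pres => pres.name.includes(this.filter));
       }
     }">
  <!-- x-model keeps the filter field and the input value in sync, in both directions -->
  <input type="text" x-model="filter">
  <ul>
    <template x-for="pres in getPresidents()">
      <li x-text="pres.name"></li>
    </template>
  </ul>
</div>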


...

Notice the x-data object now has a “filter” field on it. This is two-way bound to the input element via the x-model directive which points to “filter“.

We’ve changed the template x-for directive to reference a new getPresidents() method, which is implemented on the x-data object. This method uses standard JavaScript syntax to filter the presidents based on whether they include the text in the filter field.

See my GitHub repository to view all the code for examples in this article.

Conclusion

Like its namesake, Alpine is a backpack with the basic gear to get you through the mountains. It is minimal, but sufficient. It does include some higher-level features, such as a central store and an eventing system, as well as a plugin architecture and ecosystem.

In all, Alpine is ergonomic to use and will be familiar if you’ve worked with other reactive frameworks. For these reasons, it’s quick and easy to learn. The simplicity of declaring a component and its data in an x-data directive is simply genius. Alpine will be a tempting option the next time I go code venturing.

See my JavaScript framework comparison for more about Alpine and other front-end frameworks.


Vibe coding is groovy 26 Mar 2025, 5:00 am

Vibe coding is most definitely having a moment.

Don’t feel bad if you haven’t heard of it—the Wikipedia page for it just went up on March 15 of this year. Vibe coding is a new way of working with AI, where you guide the code through natural language and intuition, rather than spelling out every detail or actually writing the code. You express the “vibe” of the app and the AI does the grunt work.

Last month I wrote about how generative AI will change how we develop and write code. This past weekend I had a perspective-altering experience with vibe coding that I’d like to tell you about.

For the past few years, I’ve had a couple of ideas for websites knocking around in my head. Nothing too ambitious—just fun little ideas that would give folks a laugh or a moment of interest. I’ve always wanted to build a website that takes off—maybe even earns a little revenue. If things turned out well, I might even make a living at it. Who knows?

But of course, finding the time to build one of these sites is a challenge. It seems like not many of us have the leisure time to work on a side project that may or may not be revenue-generating. As a result, none of my obviously brilliant ideas were turned into an actual thing. Until this weekend.

Hello, Claude Code

Someone—I wish I could remember who, so I could properly thank them—pointed me in the direction of Claude Code. Built by Anthropic, Claude Code is a command-line coding agent driven by a large language model (LLM) tuned specifically to helping developers build things in code. You install it in the directory where your project resides, and it runs on the command line. Claude Code can read your entire application structure, understand it, answer questions about it, and most importantly, make changes to it that you ask for in plain language.

It works very well. Astonishingly well. Terrifyingly well.

I started with a completely blank Astro application. I created a file called claude.md and gave Claude thorough instructions on the general rules I wanted it to follow, including strict TypeScript typing and always using the “Astro way” of doing things according to the documentation, which I referenced via its URL.

Once I had done that, I gave Claude a paragraph describing my application, how I wanted the app to work, a general description of how I wanted it to look, and how I wanted to do authentication. In less than an hour, I had a basic site up and running. No, really. About an hour. It took about three more hours to fine-tune it, add features, and tweak the user interface.

Claude is aware of all the code in the directory where its command-line tool is run. It can see the entire context of your application and can make changes on the fly. You can opt to approve each change yourself or just let it do its thing.

Much of the time spent was me looking at the code it had written and fixing a few things. Occasionally, it would run off on a strange tangent and I had to reel it back in, but ultimately, it completed in hours what may have taken me weeks to figure out. I’m not an Astro genius by any stretch of the imagination, but I’m smart enough to know good code when I see it and I understand Astro’s basic way of doing things. 

The amazing thing was that Claude added all kinds of little touches that I didn’t ask for, including avatar support in the login screen, the inclusion of the name of the user, and the date the user posted something. I simply typed “I’d like to be able to edit the entries that I have made,” and Claude just did it, providing a lovely UI for editing and links in all the right places. I asked Claude to “add tastefully located Google Adsense ads” and it did so, including a placeholder for testing and a switch to show real ads in production. It just knew what the “right” thing to do was and it did it. 

Way to code, Claude

The downside? Claude Code wasn’t cheap—it cost me about $50 in processing fees. You can burn through cash pretty quickly. But when I consider the countless hours of work it saved me, it was well worth it. As I went along, I became more careful about what I asked it to do.

This experience has changed the way I will code going forward. I will leverage my coding experience to get Claude Code to do all the heavy lifting. It was almost as though I was pair-programming with a very capable junior developer who was eager to write all the code while I told her what to do. 

It is important to note that Claude will not enable a newbie to suddenly write code. Critical to this whole process was my knowing what to ask for at the start—frameworks, coding guidelines, etc.—as well as knowing when Claude was going off the rails and doing things that were well off the path of best practices. Keeping things on track was a large part of the work. 

But this is just the beginning of vibe coding, and it probably won’t be too long before anyone can build what they want, and then our creativity will be the limit to what can be done. But I tell you, it was a big revelation to me to have something I’ve fretted about doing for months up and running in a single afternoon. Happily, that appears to be the norm going forward, and it is only going to get better.


Open-source Styrolite project aims to simplify container runtime security 26 Mar 2025, 5:00 am

Today Edera launched a new open-source project called Styrolite to bring tighter controls to the interactions between containers and Linux kernel namespaces, at a layer below where Open Container Initiative (OCI) runtimes like containerd operate.

While software supply chain security incidents like Log4j and XZ Utils have dominated the container security headlines in recent years, the container runtime remains an irresistible target. Exploits that target low-level kernel subsystems, such as Dirty Cow and Dirty Pipe, allow attackers to escape containers and escalate privileges. 

Created by Ariadne Conill, co-founder and distinguished engineer at Edera, Styrolite is a programmable sandboxing tool that gives platform engineering teams the ability to “quarantine” the interactions between containers and Linux namespaces. The name comes from a sci-fi quarantine substance in Star Trek: The Next Generation.

Historically, the container runtime has provided very poor isolation guarantees, Conill says. “I think we’ve gotten to a point where people just don’t understand how these components come together, and think that namespaces provide true isolation,” she said. “They can’t, because they exist as a subset of the shared kernel state.” 

Slippery Linux namespaces

Linux namespaces allow containers to contend for underlying resources in multi-tenant environments. But while the container-to-Kubernetes handshake requires the flexibility to place workloads side-by-side on various Linux hosts across clusters, Linux namespaces were never intended to serve as security boundaries, which is why container runtime attacks and container escapes are so prevalent.

“Essentially Styrolite is similar to a container runtime interface (CRI) but focused on the containers’ actual interactions with the kernel,” Conill says. “Styrolite focuses on securing the fundamentals of how images get mounted into namespaces in areas like timekeeping, mounts, and process collections in the process ID namespace.”

By managing the life cycle for those core namespace interactions, Styrolite gives engineers much more granular control over the resource interactions of containers, through configuration of their container images.

Written in Rust and designed as a microservice, Conill says Styrolite helps “bridge the gap between the modern cloud-native computing paradigm and traditional security techniques like virtualization-based security.”

“We’ve basically made Styrolite behave in a similar way to how OCI components work,” said Conill. “In essence, we’ve turned the container sandbox management into a proper microservice in the same way that Kubernetes uses the CRI to connect to containerd or other CRI implementations.”

Sandboxing container runtimes

There have been other attempts at sandboxing container runtimes. Bubblewrap is the best known: a low-level container sandboxing tool commonly used for Fedora and RPM builds. 

“These tools are either too high-level (like the Kubernetes CRI), or they are designed to be used via shell scripting,” said Conill. “While CLIs allow for rapid iteration, we wanted to build a rich programmatic interface for spawning and managing containers.” 

For developers and security professionals used to Bubblewrap, Conill says they will immediately notice how differently Styrolite handles security configurations. Bubblewrap is a very opinionated tool with a complex command line interface that makes it easy for someone moving too fast to inadvertently escalate privileges to hosts, she says.

“Navigating these runtime configurations without proper guardrails is how you can accidentally grant containers full root directory access on a host, when you were merely trying to pass through file sharing,” Conill said.

Conill sees a broad security awakening underway in container security, and she believes tools like Styrolite are foundational to better security configurability by default.


Oracle releases ML-optimized GraalVM for JDK 24 25 Mar 2025, 7:47 pm

Oracle has released GraalVM for JDK 24, an alternative Java Development Kit tuned to just-released JDK 24 that uses ML (machine learning)-based profile inference to boost peak performance by about 7.9% on average on microservices benchmarks, the company said.

GraalVM for JDK 24 was released on March 18 and can be downloaded from graalvm.org.

With this latest update, a new generation of ML-enabled inference, called GraalNN, is being introduced. GraalNN provides context-sensitive static profiling with neural networks. Oracle said it has seen a roughly 7.9% peak performance improvement on average on a wide range of microservices benchmarks including Micronaut, Spring, and Quarkus. Native Image in Oracle GraalVM has used a pre-trained ML model to predict execution probabilities of control flow graph branches, enabling powerful optimizations and better peak performance of native images, according to Oracle. GraalVM compiles a Java application to a native binary, which starts up 100x faster, provides peak performance with no warmup, and uses less memory and CPU than an application running on a Java Virtual Machine (JVM), the company said.

Also in this release is SkipFlow, an extension of the points-to analysis that tracks primitive values and evaluates branching conditions as the analysis runs. SkipFlow enables production of smaller binaries without increasing build time. Image builds tend to be slightly faster with SkipFlow enabled because there are fewer methods to analyze and compile, Oracle said.

GraalVM for JDK 24 also takes a first step toward Java agent support at runtime. Until now, agents have been supported by Native Image but with constraints, such as the agent having to run and transform all classes at build time. Oracle has continued work to optimize more vector API operations on GraalVM, with more operations now efficiently compiled to SIMD (single instruction, multiple data) code, where supported by the target hardware.

Other improvements in GraalVM for JDK 24 include:

  • Experimental support for jcmd on Linux and macOS. jcmd is used to send diagnostic command requests that are useful for controlling Java Flight Recordings, troubleshooting, and diagnosing applications.
  • Additional security features in Native Image including support for dependency trees.
  • The "customTargetConstructorClass" field has been removed from serialization JSON metadata. All possible constructors now are registered by default when registering a type for serialization.
  • Support has been added for Java module system-based service loading.


Warning for developers, web admins: update Next.js to prevent exploit 25 Mar 2025, 3:52 pm

Developers and web admins using the Next.js framework for building or managing interactive web applications should install a security update to plug a critical vulnerability.

The vulnerability, CVE-2025-29927, allows an authorization bypass if the “middleware” function is enabled for linking to a service. This vulnerability is critical if the middleware that Next.js is connecting to performs security functions such as authorization, access control, or checking if session cookies are valid.

“This vulnerability would allow you to bypass that check,” noted Johannes Ullrich, dean of research at the SANS Institute.

“If you are affected, it basically allows a very trivial authentication bypass,” he said. If Next.js is used on an e-commerce site, for example, all a threat actor would have to do is log in as a regular customer and they could explore the company’s use of the framework, then tamper with security controls.

“You can access things like admin features that are supposed to be authorized just by adding a simple header [to bypass security],” he said.

According to researchers Rachid A and Yasser Allam, who discovered the hole, “the impact is considerable, with all versions affected and no preconditions for exploitability.”

All versions of Next.js starting with version 11.1.4 are vulnerable. Developers and admins should immediately make sure that their installation of Next.js 15.x uses version 15.2.3. Those who want to stay on version 14.x should upgrade to 14.2.25.

Not affected are on-prem applications that don’t invoke the “middleware” command (next start with output: standalone), or applications hosted on Vercel (which develops Next.js) or Netlify.

Vercel recommends that, if patching to a safe version is not feasible, admins should prevent external user requests which contain the x-middleware-subrequest header from reaching the Next.js application.

While Next.js is an open source tool, Ullrich said that commercial tools have had similar vulnerabilities in headers that could be spoofed by an attacker.

“It’s really a vulnerability in the way modern web applications are built, particularly if they target cloud deployments,” he said. “They are often built with different components that hand requests back and forth to find the answer to a user’s request. Things like this are often used to short-cut or simplify authorization. But if it’s not done correctly you end up with these bypass vulnerabilities.”

“There are likely more vulnerabilities like this lingering in other [development] frameworks,” he warned.


Fauna to shut down FaunaDB service in May 25 Mar 2025, 6:47 am

Fauna, the provider of the NoSQL database FaunaDB, has said that it will shut down the service by the end of May due to the unavailability of capital required to support the database service and market it.

“…after careful consideration, we have made the hard decision to sunset the Fauna service over the next several months,” the company wrote in a blog post, adding that the sunset time will be May 30 at noon Pacific time.  

The company said all Fauna enterprise customers need to move their applications and data out of Fauna by that date, adding that after the specified date, all Fauna accounts and their associated data will be permanently deleted.

The move to shut down operations is expected to affect over 195 databases and over 3,000 development teams across several enterprises. These businesses have a little over two months to switch databases, as per Fauna’s sunset timeline.

However, the company said that it will provide support during migration, and for that, it has built new tooling and provided guidance on migration approaches in a migration guide.

Migration support only for premium plans

Fauna said it will provide enterprises with additional help for migration or transition via its Support Portal. However, this additional support is available only to enterprises on Fauna’s Startup, Pro, and Enterprise plans.

Customers on the free and pay-as-you-go plans will have to rely on the company’s community forum for help.

The move to cease the database service has attracted criticism from users of the database on X.

One user by the handle “@ImSh4yy” pointed out that users or enterprises should “think twice” before vendor-locking themselves to an unprofitable startup for a critical part of their infrastructure.

Another user by the handle “@TweetsOfSumit”, who is the founder of an application named Parqet, wrote that his application was initially built on FaunaDB but he had to switch to MongoDB in a few months as it was cumbersome to develop using FaunaDB and the database had “crazy bugs when storing decimals.”

The company has also said that it will not accept any new customers.

At the same time, Fauna has committed to releasing an open-source version of Fauna’s core database technology alongside its existing open-source drivers and command line interface (CLI) tooling soon but did not provide any timeline.

The company said that the open-source version will enable FaunaDB’s user community to continue to access its core features, such as the document-relational data model and its database language (FQL). “We hope this will serve as both a valuable reference for database practitioners and will provide ongoing value to the wider developer community,” the company wrote.


Google acquires Wiz: A win for multicloud security 25 Mar 2025, 5:00 am

Google’s recent acquisition of Wiz positions the tech giant as a leader ready to tackle today’s multicloud challenges, potentially outperforming competitors such as Microsoft Azure and AWS. The collaboration aims to simplify complex security architectures and underscores Google’s commitment to addressing the gaps many enterprises face when combining different cloud ecosystems.

Let’s explore why this acquisition is a positive development, what it means for multicloud deployments, and how enterprises should evaluate potential opportunities.

The gaps in multicloud security

One of the most significant pain points of a multicloud strategy is the lack of adequate integrated security solutions. When companies operate across multiple clouds, they face the daunting challenge of piecing together a patchwork of security tools, policies, and compliance frameworks that expose critical gaps in defense.

This is where the Google-Wiz partnership is a game-changer. Wiz’s cloud security platform brings a much-needed holistic view to protecting multicloud and hybrid environments. Unlike traditional tools focused on response and mitigation, Wiz specializes in prevention; it rapidly scans environments and maps interdependencies between cloud resources, services, and applications. It identifies potential attack paths before they can be exploited. Importantly, Wiz provides integration across all major cloud providers, allowing enterprises to bridge the disconnected silos that currently hinder multicloud security efforts.

Google correctly identified a significant gap in the current cloud security landscape, and this acquisition signifies an innovative and forward-looking move—a rare advancement in the cloud industry today.

While AWS and Microsoft Azure lead the market in cloud adoption, their multicloud strategies remain relatively siloed. AWS heavily promotes an all-in approach to lock customers within their ecosystem, actively discouraging multicloud migrations by offering limited cross-platform solutions. Microsoft has made incremental progress in interoperability but still develops its tools primarily around its Azure stack.

Google’s investment in multicloud reflects its unique strategy. It acknowledges the reality that businesses seldom depend on a single cloud provider. By committing to multicloud security through Wiz’s capabilities, Google has established itself as an enabler rather than a hindrance. Businesses now have access to a vendor that embraces and excels in diverse cloud environments.

Additionally, Google’s deep AI and cloud expertise enhances the value proposition of Wiz. By utilizing AI-powered threat detection and response along with Wiz’s code-to-cloud security functionalities, enterprises can automate and reinforce their defenses against emerging threats. Most intriguing is the potential to tackle the concerns regarding threats to and from AI models—an area where Google is setting benchmarks that others have yet to achieve. I’ll discuss that further in other posts.

A unified security platform?

The true power of the Google-Wiz alliance lies in its ability to create a unified security platform for multicloud environments. Companies have historically had to manage the fragmentation of various policies, disconnected insights, and manual workarounds to secure each cloud system. This inefficiency increased operational risk and further burdened security teams as they grappled with growing workloads.

Google Cloud’s integration of Wiz empowers organizations to centralize their security operations and gain visibility across diverse systems. Enterprises have access to precise threat intelligence that offers insights “from the perspective of adversaries,” highlighting vulnerabilities in defenses before hackers can exploit them. Additionally, Wiz’s distinctive emphasis on development-stage enhancements guarantees that security issues can be tackled even before applications are launched, minimizing vulnerabilities caused by inadequate code practices.

Essentially, Google and Wiz provide more than just a set of tools—they present a comprehensive framework that integrates development, security, and operations. This integration allows for less time spent managing breaches and a greater focus on innovation, something enterprises urgently need to stay competitive.

Worth a look

As businesses evaluate this Google-Wiz news, they should consider its implications for their multicloud strategies. For years, the challenge of maintaining a consistent security posture across multiple clouds has derailed many promising deployments. I’ve witnessed this firsthand. With Wiz’s technology now integrated into Google Cloud’s arsenal, companies finally have a viable, comprehensive path forward. At least, that’s the promise.

Enterprises should view the Google-Wiz venture as an opportunity to rethink their cloud security strategies. Google’s multicloud-focused approach allows them to simplify their operations, lower costs, and reduce the risk of breaches. This acquisition cements Google as an enterprise-ready partner, no longer just a competitor fighting for cloud market share. For CIOs, CTOs, and other stakeholders tasked with safeguarding their digital environments, this announcement should signal a shift in how they prioritize their technology partners for the future.

Ultimately, Google’s acquisition of Wiz is a step toward solving the pressing multicloud security challenges that have held enterprises back for years. Forward-thinking businesses should capitalize on the potential of this new era in cybersecurity. This is far from just a win for Google; it’s a win for the entire multicloud ecosystem.

But, if Google and Wiz somehow screw this up, I’ll say it here. Stay tuned.


Cosmonic uses WebAssembly to manage apps 25 Mar 2025, 5:00 am

Leveraging the WebAssembly binary instruction format, Cosmonic has launched wasmCloud-based Cosmonic Control, an enterprise control plane for managing distributed applications. Introduced March 24, this control plane can be used across any cloud, Kubernetes, any edge, or for on-premises or self-hosted deployments.

Enterprise platform engineering teams using Cosmonic Control can create polyglot golden templates and components, enabling developers to write applications and deploy them anywhere, the company said. Cosmonic Control supports wasmCloud, a WebAssembly-based (Wasm) orchestration platform for building Wasm apps, and addresses challenges to scaling and distributing applications on containerized platforms. WasmCloud is co-maintained by Cosmonic. Cosmonic Control is intended for platform teams who want a standard set of controls, templates, and processes in order to work securely without sacrificing velocity.

Scaling and distributing applications on containerized platforms has a high cost, according to Cosmonic, with more than 80% of container spend wasted because of long cold-start times and other issues, and 50% of development time wasted on patching dependencies and maintaining boilerplate code. Cosmonic Control enables platform engineering teams to maintain and update applications at scale while letting developers build new features. A unified control plane is provided for managing WebAssembly workloads from a single interface.

Interested parties can book a demo of Cosmonic Control. The company cites the following key benefits:

  • Cosmonic Control scales applications to zero with zero cold starts and executes in a tiny footprint, which lowers cloud costs.
  • Platform engineers can fit more workloads on existing hardware.
  • Developers can build Wasm components, which are small units of code that interoperate with other components written in any other language.
  • Secure Wasm sandboxes ensure that each component operates safely.

Cosmonic Control also launches with a BYOC (bring your own cloud) service. Customers can deploy WasmCloud infrastructure inside their own cloud. Platform engineering teams then have access to an enterprise-grade suite of observability tools, guardrails, and control to ensure their platform can scale.


GenAI tools for R: New tools to make R programming easier 25 Mar 2025, 5:00 am

My previous article focused on some of the best tools for incorporating LLMs into your R scripts and workflows. We’ll expand on that theme here, offering up a collection of generative AI tools you can use to get help with your R programming or to run LLMs locally with R.

Coding help for R developers

Getting help with writing code is one of the most popular uses for large language models. Some developers prefer to use these tools within their integrated development environments (IDEs); others are content to copy and paste into external tools. R programmers have options for both.

gander

The gander package is a bit like GitHub Copilot Light for R scripts. It’s an IDE add-in for RStudio and Positron that can be aware of code in scripts around it as well as variables in your working environment. If you’ve selected code when you invoke gander, you’ll be asked whether you want the model’s result to replace that code or be added either before or after.

gander’s interface asks if you want to replace selected code or put its suggestions either before or after your selection.

You can choose any model supported by ellmer to use in gander. As of early 2025, package author and Posit developer Simon Couch recommended Anthropic’s Claude Sonnet for its R prowess. You can set that as your default with options(.gander_chat = ellmer::chat_claude()).

As always when using a commercial LLM provider, you need to make an API key available in your working environment. As with many current R packages for working with LLMs, you can also use a local model powered by ollama. Note that you can also use ellmer as a chatbot to ask questions about R and run LLMs locally. (See my previous article for more about ellmer and ollama.)
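
Put together, a minimal setup might look like the following sketch (ANTHROPIC_API_KEY is the environment variable ellmer’s Claude integration expects; swap in the provider and key you actually use):

# Make an Anthropic API key available to the working environment
# (ellmer's chat_claude() reads the ANTHROPIC_API_KEY environment variable)
Sys.setenv(ANTHROPIC_API_KEY = "your-key-here")  # placeholder value

# Use Claude as gander's default model for this session
options(.gander_chat = ellmer::chat_claude())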

Installing gander

gander is available for R scripts in both RStudio and Positron. You can download it from CRAN or install the development version with pak::pak("simonpcouch/gander").

That command also installs an add-in within RStudio, where you can choose a keyboard shortcut to invoke it.

For Positron, there are instructions on the package website’s homepage for opening the Positron command palette and adding code to the keybindings.json file.

To see what data gander sent to the model—the background information from your script and working session, as well as the question you typed—you can run gander_peek() after running the add-in.

Some gander settings can be changed with R’s options() function. The tool’s default style instructions are:


Use tidyverse style and, when relevant, tidyverse packages. For example, when asked to plot something, use ggplot2, or when asked to transform data, using dplyr and/or tidyr unless explicitly instructed otherwise.

You can change that default with code such as options(.gander_style = "Use base R when possible."). See more customization possibilities by running ?gander_options in your R console.

You can learn more about gander on the gander package website and Posit blog.

chatgpt

An “interface to ChatGPT from R,” chatgpt features around 10 RStudio add-ins for things such as opening an interactive chat session, commenting selected code, creating unit tests, documenting code, and optimizing selected code. If you don’t select any code for the programming-specific tasks, it will evaluate the entire active file.

This could be a good choice for people who don’t use large language models often or don’t want to write their own prompts for regular programming tasks. And if you don’t want to call on different add-ins for each task, you can use its functions in the terminal, such as comment_code() and complete_code().

You can customize model settings with R environment variables, such as OPENAI_MODEL (gpt-4o-mini is the default), OPENAI_TEMPERATURE (which defaults to 1—my choice would be 0), and OPENAI_MAX_TOKENS (which defaults to 256).
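
For example, you could set those variables for the current session with Sys.setenv(); this is just a sketch, with a placeholder API key and the defaults described above (except temperature):

# Configure the chatgpt package for the current R session
Sys.setenv(
  OPENAI_API_KEY = "your-key-here",  # placeholder; an OpenAI key is required
  OPENAI_MODEL = "gpt-4o-mini",      # the package default
  OPENAI_TEMPERATURE = 0,            # the default is 1; 0 is more deterministic
  OPENAI_MAX_TOKENS = 256            # the package default
)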

Note that the chatgpt package only supports OpenAI models.

gptstudio

gptstudio, on CRAN, is another RStudio add-in that offers access to LLMs. It features defined options for spelling and grammar, chat, and code commenting. I found the interface a bit more disruptive than some other options, but opinions will likely vary on that.

The gptstudio package supports HuggingFace, Ollama, Anthropic, Perplexity, Google, Azure, and Cohere along with OpenAI.

pkgprompt

pkgprompt can turn an R package’s documentation—all of it or only specific topics—into a single character string using the pkg_prompt() function. This makes it easy to send that documentation to an LLM as part of a prompt. For example, the command


library(pkgprompt)
pkg_docs <- pkg_prompt("dplyr", topic = c("across", "coalesce"))

returns the documentation for the dplyr package’s across and coalesce functions as a single character string. You can then add the string to any LLM prompt, either within R or by copying and pasting to an external tool. This is another R package by Simon Couch. Install with: pak::pak("simonpcouch/pkgprompt").
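
Staying within R, you could then paste that string into a prompt, for example via an ellmer chat (a hedged sketch; the chat object and the question are purely illustrative):

# Illustrative only: send the collected docs to a model via ellmer
chat <- ellmer::chat_claude()
chat$chat(paste(
  "Using the dplyr documentation below, write an example that combines across() and coalesce():",
  pkg_docs
))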

Help outside your IDE

If you’re willing to go outside your coding environment to query an LLM, there are some R-specific options in addition to a general-purpose chatbot like Claude that knows some R.

Shiny Assistant

If you build Shiny web apps, Posit’s Shiny Assistant is a great resource. This free web-based tool uses an LLM to answer questions about building Shiny apps for both R Shiny and Shiny for Python. Note that the Shiny team says they may log and look at your queries to improve the tool, so don’t use the web version for sensitive work. You can also download the Shiny Assistant code from GitHub, tweak it, and run it yourself.

R and R Studio Tutor + Code Nerd

R and R Studio Tutor is a custom GPT that adds specific R information to basic ChatGPT. It was developed by Jose A Fernandez Calvo and is designed to answer questions specifically about R and RStudio.

Code Nerd by Christian Czymara is another custom GPT that answers R questions.

R Tutor + Chatilize

One of the earliest entries in the GenAI for R space, R Tutor still exists online. Upload a data set, ask a question, and watch as it generates R code and your results, including graphics.

R Tutor lets you ask questions about a data set and generate R code in response.

The code for RTutor is available on GitHub, so you can install your own local version. However, licensing only allows using the app for nonprofit or non-commercial use, or for commercial testing. RTutor was a personal project of Dr. Steven Ge, a professor of bioinformatics at South Dakota State University, but is now developed and maintained by a company he founded, Orditus LLC.

Chatilize, a newer option that is similar to R Tutor, can generate Python as well as R.

Interacting with LLMs installed locally

LLMs in the cloud may still be more capable than models you can download and run locally, but smaller open-weight models are getting better all the time. And they may already be good enough for some specific tasks—the only way to know is to try. Local models also have the huge advantage of privacy, so you never have to send your data to someone else’s server for analysis. Plus, you don’t have to worry about a model you like being deprecated. They’re also free beyond whatever it costs to run your desktop PC. The general-purpose ellmer package lets you run local LLMs, too, but there are also R packages specifically designed for local generative AI.

rollama and ollamar

Both the rollama and ollamar packages let you use R to run local models via the popular Ollama project, but their syntax is different. If you want an Ollama-specific package, I’d suggest trying both to see which you prefer.

In addition to one (or both) of these R packages, you’ll need the Ollama application itself. Download and install Ollama as a conventional software package—that is, not an R library—for Windows, Mac, or Linux. If Ollama isn’t already running, you can run ollama serve from a command prompt or terminal (not the R console) to start the Ollama server. (Setting up Ollama to run in the background when your system starts up is worth doing if you use it frequently.)

After loading the rollama R package with library(rollama), you can test with the ping_ollama() function to discover if R sees an Ollama server running.

If you don’t already have local LLMs installed with Ollama, you can download one in R with pull_model("the_model_name") or in a terminal with ollama pull the_model_name. You can check what models are available for Ollama on the Ollama website.

To download the 3B parameter llama 3.2 model, for instance, you could run pull_model("llama3.2:3b") in R.

To set a model as your default for the session, use the syntax options(rollama_model = "the_model_name"). An example is: options(rollama_model = "llama3.2:3b").
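
Put together, a typical rollama setup looks something like this:

library(rollama)

ping_ollama()                           # confirm R can reach a running Ollama server
pull_model("llama3.2:3b")               # download the model if it isn't installed yet
options(rollama_model = "llama3.2:3b")  # make it the default for this session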

For a single question, use the query() function:


query("How do you rotate text on the x-axis of a ggplot2 graph?")

If you want a chat where previous questions and answers remain in memory, use chat(). Both functions’ optional arguments include screen (whether answers should be printed to the screen) and model_params (named list of parameters such as temperature). query() also includes a format argument whose returned value can be a response object, text, list, data.frame, httr2_response, or httr2_request.

Queries and chats can also include uploaded images with the images argument.
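
For example, a short chat session with a lower temperature might look like this (the follow-up question is illustrative):

# chat() keeps earlier questions and answers in memory, unlike query()
chat("How do you rotate text on the x-axis of a ggplot2 graph?",
     model_params = list(temperature = 0))
chat("Now show how to center the plot title as well.")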

ollamar

The ollamar package starts up similarly, with a test_connection() function to check that R can connect to a running Ollama server, and pull("the_model_name") to download a model, such as pull("gemma3:4b") or pull("gemma3:12b").

The generate() function generates one completion from an LLM and returns an httr2_response, which can then be processed by the resp_process() function.


library(ollamar)

# Illustrative prompt; generate() returns an httr2_response object
resp <- generate("gemma3:4b", "Tell me about the R language")
resp_process(resp, "text")  # extract the model's reply as plain text

Or, you can request a text response directly with syntax such as resp <- generate("gemma3:4b", "Tell me about the R language", output = "text"). There is also an option to stream the text with stream = TRUE:


resp <- generate("gemma3:4b", "Tell me about the R language", output = "text", stream = TRUE)

ollamar has other functionality, including generating text embeddings, defining and calling tools, and requesting formatted JSON output. See details on GitHub.

rollama was created by Johannes B. Gruber; ollamar by Hause Lin.

Roll your own

If all you want is a basic chatbot interface for Ollama, one easy option is combining ellmer, shiny, and the shinychat package to make a simple Shiny app. Once those are installed, assuming you also have Ollama installed and running, you can run a basic script like this one:


library(shiny)
library(shinychat)

ui <- bslib::page_fillable(
  chat_ui("chat")
)

server <- function(input, output, session) {
  # The model is hardcoded here; swap in one you have pulled with Ollama
  chat <- ellmer::chat_ollama(model = "llama3.2:3b")
  observeEvent(input$chat_user_input, {
    stream <- chat$stream_async(input$chat_user_input)
    chat_append("chat", stream)
  })
}

shinyApp(ui, server)

That should open an extremely basic chat interface with a hardcoded model. If you don’t pick a model, the app won’t run; you’ll get an error message instructing you to specify a model, along with a list of those you’ve already installed locally.

I’ve built a slightly more robust version of this, including dropdown model selection and a button to download the chat. You can see that code here.

Conclusion

There are a growing number of options for using large language models with R, whether you want to add functionality to your scripts and apps, get help with your code, or run LLMs locally with ollama. It’s worth trying a couple of options for your use case to find one that best fits both your needs and preferences.


Prompt engineering courses and certifications tech companies want 24 Mar 2025, 5:00 am

Prompt engineering is the process of structuring or creating an instruction to produce the best possible output from a generative AI model. Industries such as healthcare, banking, financial services, insurance, and retail all have use cases for prompt engineering, says the report. Prompt engineering serves various applications, including content generation, problem-solving, and language translation, among others, and helps genAI models respond to a range of queries, it says.

Factors driving the market growth in prompt engineering include technological advancements in genAI and related fields, along with the growing digitalization and automation in various industries. The report says the growing adoption of AI—especially natural language processing (NLP)—is boosting the demand for prompt engineers.

Prompt engineering certification and hiring

As software developers and others integrate prompt engineering into their AI-enabled workflows, professional courses and certifications are bridging the knowledge gap, and some hiring managers are taking notice.

A multitude of prompt engineering courses and certifications offer candidates the opportunity to learn new skills in artificial intelligence, genAI, and other areas. Such courses or certifications can make a job applicant more attractive to organizations looking to hire people with these types of skills.

“Prompt engineering certifications can have a huge impact on the hiring process,” says Jason Wingate, founder and CEO of Emerald Ocean, a holding company. As prompt engineering is fairly new, human resources can’t rely on years of experience or the completion of a four-year degree in prompt engineering, Wingate says, so certifications become the next best thing.

“We are at the early stage of how prompt engineering certifications get included in the recruiting process,” says Neil Costa, founder and CEO at HireClix, a global recruitment marketing agency. “I think it’s important that formal education programs and certifications are spawning and rapidly evolving, so there can be a measure of validation in this new area of skill development.”

John Yensen, president of managed IT services provider Revotech Networks, says prompt engineering certifications can have a large impact on the hiring process, but their value largely depends on context.

“For example, in industries where AI drives automation, customer support, or even content generation, certified professionals might stand out more,” Yensen says. “With that said, hands-on experience demonstrating real-world problem-solving skills with AI stands out even more. Of course, certifications can help verify a candidate’s foundational knowledge. But hiring managers will still look for practical expertise.”

Certification vs. hands-on experience

Others downplayed certifications for prompt engineering. “I don’t think certifications are the key factor in hiring for AI-related roles, especially for a field like prompt engineering,” says Ximena Hartsock, founder of BuildWithin, a company that helps organizations build apprenticeship, upskilling, training, and mentoring programs.

“What matters most is hands-on experience, playing with the available tools, experimenting, and learning by doing,” Hartsock says. “AI models are improving rapidly, and fine-tuning is becoming less critical than it once was. The real skill is understanding how to work with the AI tools effectively.”

Prompt engineering is more about language than code, Hartsock says. “My suggestion for anyone trying to break into the industry is to learn through the many free resources available,” she says.

Prompt engineering certifications are still finding their place in hiring, says Damien Filiatrault, founder and CEO of Scalable Path, a software staffing agency with a network of over 39,000 developers. “While they can signal AI literacy, hiring managers—especially in technical fields—still prioritize hands-on experience over a certificate.”

For some roles, such as AI-integrated marketing or customer support, a certification might be helpful in demonstrating structured knowledge, Filiatrault says. “However, in software engineering, data science, or machine learning, companies still expect candidates to show practical problem-solving skills rather than rely on formulaic prompt design,” he says.

“We’ve found that businesses benefit more from internal AI upskilling than relying on external certifications,” Filiatrault says. “Many AI-first companies are training employees on domain-specific AI applications rather than hiring based on standalone prompt engineering credentials.”

How certification can accelerate career development

Prompt engineering certificates can provide benefits for both individuals and organizations.

“It goes without saying that certifications can help accelerate career development by showcasing specialized skills in AI model optimization and prompt refinement,” Yensen says. “Certified employees can help improve efficiency in AI-driven workflows, which helps reduce costs and increase productivity,” and this is a benefit for all organizations.

“For example, companies that integrate AI into customer service can benefit from employees trained to craft precise, high-performing prompts, improving response accuracy and user experience,” Yensen says.

One of the biggest benefits of prompt engineering certifications for professionals is the fact that the field is so new. “It’s almost ‘all there is’ in terms of having a proven track record,” Wingate says. “For an individual that means potential career advancement, enhanced skills, and overall recognition.”

For employers, a prompt engineering certification can provide a reliable and efficient way to validate a candidate’s skills in a very new and fresh field, Wingate says.

Prompt engineering certifications and courses

The certifications that are most demanded today tend to come from recognized technology leaders and educational platforms, Yensen says. “OpenAI, DeepLearning.AI, and Microsoft’s AI certifications have gained traction due to their alignment with industry tools and best practices,” he says. “I anticipate that more specialist certifications will continue to emerge for different applications as AI adoption increases.”

Here are some of the most recognized prompt engineering certifications and courses.

AI+ Prompt Engineer Level 1

The AI+ Prompt Engineer Level 1 Certification Program introduces students to the fundamental principles of AI and prompt engineering. Covering the history, concepts, and applications of AI, machine learning, deep learning, neural networks, and NLP, the program also delves into best practices for designing effective prompts that harness the capabilities of AI models to their fullest potential.

AI Foundations: Prompt Engineering with ChatGPT

Offered by Arizona State University via Coursera, this prompt engineering course offers students an opportunity to delve into ChatGPT and large language models (LLMs). Students learn to evaluate prompts and create impactful ones, maximizing the potential of ChatGPT. Designed by Andrew Maynard, an expert in transformative technologies, the course covers prompt templates, creative prompt structures, and designing prompts for various tasks and applications.

AWS’s Essentials of Prompt Engineering

In this course from Amazon Web Services (AWS), participants are introduced to the fundamentals of crafting effective prompts. They gain an understanding of how to refine and optimize prompts for a range of use cases, and also explore techniques such as zero-shot, few-shot, and chain-of-thought prompting. Finally, students learn to identify potential risks associated with prompt engineering.

Blockchain Council’s Certified Prompt Engineer

The Certified Prompt Engineer certification program from Blockchain Council offers an overview of AI and prompt engineering and a deep understanding of prompt engineering fundamentals, including the principles and techniques of effective prompt engineering. Obtaining the Certified Prompt Engineer certification validates an individual’s knowledge and skills in prompt engineering, according to the council.

ChatGPT Masterclass: The Guide to AI & Prompt Engineering

This course offered by Udemy covers topics including how to apply ChatGPT to prompt engineering, task automation, code, digital marketing, optimizing workflows, creating content, and building websites.

ChatGPT Prompt Engineering for Developers

Offered by Deeplearning.ai in partnership with OpenAI, this course teaches students how to use a large language model to quickly build new and powerful applications. Using the OpenAI API, students can quickly build capabilities that innovate and create value in ways that were previously cost-prohibitive, highly technical, or simply impossible. This short course describes how LLMs work, provides best practices for prompt engineering, and shows how LLM APIs can be used in applications for a variety of tasks.

Generative AI: Prompt Engineering Basics

This course from IBM explains the concept and relevance of prompt engineering in generative AI models; applies best practices for creating prompts and exploring examples of impactful prompts; practices common prompt engineering techniques and approaches for writing effective prompts; and explores commonly used tools for prompt engineering.

Google Prompting Essentials

Participants in this course practice the steps to write effective prompts, including applying prompting techniques to help with everyday work tasks, using prompting to speed up data analysis and build presentations, and designing prompts that create AI agents to role-play conversations and provide expert feedback.

MIT’s Applied Generative AI for Digital Transformation

The Applied Generative AI for Digital Transformation course, offered by MIT Professional Education, delves into how generative AI generates original and innovative content, propelling an organization’s digital transformation efforts, according to MIT. By combining technical knowledge with management perspectives, ethical concerns, and human elements, the eight-week program provides a comprehensive understanding of AI-driven digital transformation strategies.

Vanderbilt’s Prompt Engineering Specialization

This course from Vanderbilt University is designed to enable students to master prompt engineering patterns, techniques, and approaches to effectively leverage generative AI. Areas covered include ChatGPT, genAI, advanced data analysis, problem formulation for genAI, chain of thought prompting, prompt patterns, and LLMs.


Learning AI governance lessons from SaaS and Web2 24 Mar 2025, 5:00 am

The experimental phase of generative AI is over. Enterprises now face mounting pressure — from boardrooms to the front lines — to move AI into production to streamline operations, enhance customer experiences, and drive innovation. Yet, as AI deployments grow, so do its reputational, legal, and financial risks.

The path forward is clear. After all, good governance is good business. Gartner expects enterprises that invest in AI governance and security tools to achieve 35% more revenue growth than those that don’t. But many leaders are unsure where to start. AI governance is a complex, evolving field, and navigating it requires a thoughtful approach. Fortunately, lessons from the governance journeys of SaaS and Web2 offer a proven roadmap.

AI governance challenges

AI governance isn’t just a technical hurdle — it’s a multifaceted challenge. Gaining visibility into how AI systems interact with data remains difficult, because AI systems often operate as black boxes, defying traditional auditing methods. Solutions that have worked in the past, such as observability and periodic reviews of development practices, don’t mitigate the risks of unpredictable behavior, nor do they prove acceptable use of data, when applied to large language models (LLMs).

Complicating matters further is AI’s rapid evolution. Autonomous systems are advancing quickly, with the emergence of agents capable of communicating with each other, executing complex tasks, and interacting directly with stakeholders. While these autonomous systems introduce exciting new use cases, they also create substantial challenges. For example, an AI agent automating customer refunds might interact with financial systems, log reason codes for trends analysis, monitor transactions for anomalies, and ensure compliance with company and regulatory policies — all while navigating potential risks like fraud or misuse.

The regulatory landscape also remains in flux, particularly in the U.S. Recent developments have added complexity, including the Trump administration’s repeal of Biden’s AI Executive Order. This will likely lead to an increase in state-by-state legislation over the coming years, making it difficult for organizations operating across state lines to predict the specific near-term and long-term guidelines they need to meet. Developments like the Bipartisan House Task Force’s report and recommendations on AI governance have highlighted the lack of clarity in regulatory guidelines. This uncertainty leaves organizations struggling to prepare for a patchwork of state-specific laws while managing global compliance demands like the EU AI Act or ISO 42001.

In addition, business leaders face numerous governance frameworks and approaches, each optimized to address different challenges. This abundance of approaches forces business leaders into a continuous cycle of evaluation, adoption, and adjustment. Many organizations resort to reactive, resource-intensive processes, creating inefficiencies and stalling AI progress.

It’s time to break the cycle. AI governance must evolve from reactive to proactive to drive responsible innovation.

From reactive to proactive governance

This ad hoc approach to AI governance mirrors the initial paths of SaaS and Web2. Early SaaS and Web2 companies often relied on reactive strategies to address governance issues as they emerged, adopting a “wait and see” approach. SaaS companies focused on basics like release sign-offs, access controls, and encryption, while Web2 platforms struggled with user privacy, content moderation, and data misuse.

This reactive approach was costly and inefficient. SaaS applications scaled with manual processes for user access management and threat detection that strained resources. Similarly, Web2 platforms faced backlash over privacy violations and inconsistent enforcement of policies, which eroded trust and hampered innovation.

The turning point for both industries came with the adoption of continuous, automated governance. SaaS providers implemented continuous integration and continuous delivery (CI/CD) pipelines to automate the testing of software and deployed tools for real-time monitoring, reducing operational burdens. Web2 platforms implemented machine learning to flag inappropriate content and detect fraud at scale. The results were clear: improved security, faster innovation, and lower costs. 

AI is now at a similar crossroads. Manual, reactive governance strategies are proving inadequate as autonomous systems multiply and data sets grow. Decision-makers frustrated with these inefficiencies can look at the shift toward automation in SaaS and Web2 as a blueprint for transforming AI governance within their organizations. 

Continuous and automated AI governance

A continuous, automated approach is the key to effective AI governance. By embedding tools that enable these features into their operations, companies can proactively address reputational, financial, and legal risks while adapting to evolving compliance demands.

For example, continuous, automated AI governance systems can track data to ensure compliance with the EU AI Act, ISO 42001, or state-specific legislation such as the Colorado AI Act. These systems can also reduce the need for manual oversight, allowing technical teams to focus on innovation rather than troubleshooting. 

As organizations increasingly integrate AI into their operations, the stakes for effective governance grow higher. The companies that adopt governance strategies focused on continuous and automated monitoring will gain a competitive edge, reducing risks while accelerating deployment. Those that don’t risk repeating the costly mistakes of SaaS and Web2 — falling behind on compliance, losing customer trust, and stalling innovation.

The message is clear: A continuous, automated approach to governance isn’t just a best practice — it’s a business imperative.

Greg Whalen is CTO of Prove AI.

Generative AI Insights provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss the challenges and opportunities of generative artificial intelligence. The selection is wide-ranging, from technology deep dives to case studies to expert opinion, but also subjective, based on our judgment of which topics and treatments will best serve InfoWorld’s technically sophisticated audience. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Contact doug_dineley@foundryco.com.
