Ornith-1.0: self-improving open-source models for agentic coding – Tech

Ornith-1.0 delivers self-improving open-source models for agentic coding tasks

Ornith-1.0 is an open-source suite of self-improving models designed for agentic coding, released under the deepreinforce-ai project. The suite ships as three model checkpoints — a dense 9B model and two Mixture-of-Experts models at 35B and 397B parameters — all sharing the same OpenAI-compatible interface.

Ornith-1.0 delivers self-improving open-source models for agentic coding tasks

Ornith-1.0 is an open-source suite of self-improving models designed for agentic coding, available via the deepreinforce-ai project on GitHub. The release comprises three model checkpoints: a dense 9B model and two Mixture-of-Experts (MoE) variants at 35B and 397B parameters respectively.

All three checkpoints expose the same OpenAI-compatible interface and support a 256K context window (262,144 tokens). The dense 9B model is designed to fit on a single 80GB GPU, while the larger MoE checkpoints are sharded across multi-GPU nodes using tensor parallelism.

Ornith-1.0 is a reasoning model, meaning the assistant turn opens with a <think> … </think> block before producing a final answer. The chain-of-thought is returned in a separate reasoning_content field, and the model's tool-call blocks are surfaced as OpenAI-style tool_calls.

How It Works

Each model in the Ornith-1.0 suite is evaluated against size-appropriate baselines using the same harnesses and decoding setup across all three sizes. Benchmarks used include Terminal-Bench 2.1, SWE-bench Verified, SWE-bench Pro, SWE-bench Multilingual, SWE Atlas, NL2Repo, and ClawEval — an agentic code benchmark built over real-user task distributions.

Evaluation configurations vary by benchmark. Terminal-Bench 2.1 runs use a 4-hour timeout with 32 CPU cores and 48GB RAM, averaged over 5 runs. SWE-bench evaluations use the OpenHands harness with a 256K context window, while NL2Repo uses a 400K context window with a 48K output limit and anti-hacking filters.

The recommended sampling parameters for serving are temperature=0.6, top_p=0.95, and top_k=20, though temperature=1.0 is used to reproduce the reported benchmark results. Each checkpoint is published in multiple precision and format variants.

Getting Started

Ornith-1.0 requires recent runtimes to serve, including transformers >= 5.8.1. The project supports serving via vLLM or SGLang, both of which can be configured to stand up an OpenAI-compatible server. The dense 9B checkpoint is described as the easiest option for local testing.

Once a server is running, users can interact with it using any OpenAI-compatible client, including Python and Node.js SDKs or curl pointed at the standard /v1/chat/completions endpoint. Streaming tokens and tool-calling are both supported out of the box.

Story based on discussion on Hacker News.

Ornith-1.0: self-improving open-source models for agentic coding

How It Works

Getting Started

More Tech Stories

Bunny DNS removes all query fees and offers free DNS hosting

Open Culture Lists 1,700 Free University Courses Across Many Subjects

Beyond All Reason delivers free real-time strategy with full physic...