In Focus/Systematic Alpha: Technology Stack for the Modern Hedge Fund

Hedge Funds — Article 8 of 12

Short-Selling and Locate Management — AI for Hard-to-Borrow Stocks

Locate management has shifted from a back-office checkbox to an alpha-generating discipline. Funds applying machine learning to borrow cost prediction, recall risk, and multi-prime locate aggregation are cutting financing drag by 15-40 basis points annually while avoiding buy-in events that can erase a quarter's P&L overnight.


Short-selling is where hedge fund operational risk and alpha intersect most violently. A trader can identify a perfect short (overvalued, deteriorating fundamentals, weak technicals) only to watch the thesis die because the borrow cost spiked from 50 basis points to 80% annualized, or because the prime broker recalled the loan three days before the catalyst. The 2021 GameStop episode, where borrow rates touched 226% and forced covers cascaded into a loss of roughly 53% for Melvin Capital in January alone, was the extreme version of a daily reality for any fund running meaningful short exposure.

The securities lending market intermediates roughly $2.8 trillion in on-loan balances globally as of Q1 2026, generating about $10.5 billion in annual revenue for beneficial owners according to S&P Global Securities Finance data. For hedge funds on the borrower side, financing costs and locate availability are now first-class inputs into portfolio construction — not afterthoughts handled by a stock loan trader with a Bloomberg chat window. AI and structured data pipelines are reshaping how short books are managed from pre-trade through settlement.

$2.8T: Global on-loan securities balance, Q1 2026 (S&P Global Securities Finance)

The Regulatory Floor: Reg SHO, Rule 204, and the New 10c-1a Regime

U.S. short-selling sits on top of Regulation SHO, which requires a broker-dealer to have reasonable grounds to believe that the security can be borrowed and delivered by settlement (Rule 203(b)(1)) before accepting a short sale order. Rule 204 then enforces close-out: a fail to deliver from a short sale must be closed out by the start of regular trading hours on the settlement day after the settlement date (T+2 under T+1 settlement), or the broker must pre-borrow on subsequent short sales in that security until the fail is cleared. With the move to T+1 settlement in May 2024, the time window for resolving locate errors compressed by roughly 40%, meaning a stale locate file is now a near-immediate buy-in risk.

Layered on top is SEC Rule 10c-1a, adopted in October 2023, which requires securities lending transactions to be reported to FINRA by the end of the day on which the loan is effected (the 15-minute window in the original proposal was dropped from the final rule). Phased implementation lands in 2026, with public dissemination of loan terms beginning shortly after. For the first time, the buy-side will have access to timely, transaction-level data on what is being lent at what rate, ending decades of opacity where rate discovery depended on relationships with three or four agent lenders. Funds that build infrastructure to ingest and act on the 10c-1a feed will have a structural edge over those relying solely on vendor batch reports from Markit or DataLend.

⚠️ Rule 10c-1a operational impact

Borrowing counterparties (hedge funds via their prime brokers) are not direct reporters, but the data they consume becomes more granular and timely. Funds should expect borrow rates to converge faster across primes — and should renegotiate prime brokerage agreements that contain wide indicative-to-actual spreads, which historically ran 20-50 bps on general collateral and far wider on specials.

Why Locate Management Is an AI Problem

A mid-sized equity long/short fund running a $4 billion short book typically maintains locate relationships with three to six prime brokers, each providing daily availability files containing 4,000-9,000 names with indicative rates, quantities, and recall sensitivity scores. Multiply that by intraday updates, basket trading needs, ETF creation-redemption flows, and the locate file becomes a high-dimensional dataset that no human stock loan desk can optimize against in real time.

Three distinct prediction problems benefit from machine learning. First, forward borrow cost: given a position size, holding period, and current market microstructure, what is the expected average borrow rate over the trade's life? Second, recall probability: what is the likelihood that the agent lender recalls the loan within N days, forcing a buy-in or a more expensive re-borrow? Third, squeeze risk: when do utilization, days-to-cover, and options market signals suggest that a non-linear spike in borrow cost is approaching?

Traditional vs. AI-Augmented Locate Workflow

Function	Traditional Approach	AI-Augmented Approach
Borrow rate forecast	Trader judgment + indicative rate from PB	Gradient boosted model on utilization, lendable supply, options skew, 13F concentration
Locate aggregation	Manual review of 3-6 PB files in Excel	Automated reconciliation, optimization across PBs by rate × haircut × recall risk
Recall prediction	Reactive — handle when PB calls	Survival model estimating recall hazard rate by name and lender type
Squeeze detection	Anecdotal — Reddit, news, trader chatter	NLP on social + options flow + utilization regime change classifier
Cost attribution	Monthly P&L line item	Per-trade financing TCA, integrated with execution TCA

Building the Borrow Cost Forecasting Model

The most production-tested approach uses a gradient boosted tree (LightGBM or XGBoost) trained on 3-5 years of internal borrow tickets joined with market features. Inputs that consistently drive feature importance include: lendable supply from EquiLend Data & Analytics or S&P Global, utilization rate, on-loan balance trajectory over the prior 20 days, short interest ratio, 13F concentration in passive holders (which behave differently from active managers on recall), options put-call open interest skew, implied volatility term structure, and corporate action calendar proximity.

A well-built model produces a forward 5-day expected borrow rate with mean absolute error of 8-15 bps on general collateral and 200-400 bps on specials, which is wider in absolute terms but far tighter as a percentage of the rate. When integrated into the pre-trade workflow, the portfolio manager sees expected total financing cost as a line in the order ticket alongside expected execution slippage, the same way real-time Greeks surface in multi-asset risk dashboards. Names where expected financing cost exceeds expected alpha by a defined threshold are flagged for resizing or rejection.
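The pre-trade check described above can be sketched as follows. This is a minimal illustration, not a production rule: the `ShortTicket` fields, the 360-day count, and the `max_cost_to_alpha` ratio of 0.5 are all assumptions introduced here, and the forecast rate is taken as given rather than produced by a trained model.

```python
from dataclasses import dataclass


@dataclass
class ShortTicket:
    symbol: str
    notional_usd: float        # size of the proposed short
    holding_days: int          # expected holding period
    fcst_borrow_rate: float    # forecast borrow rate, annualized decimal (0.12 = 12%)
    expected_alpha_bps: float  # expected alpha over the holding period, in bps


def expected_financing_bps(t: ShortTicket, day_count: int = 360) -> float:
    """Forecast borrow cost over the holding period, in bps of notional."""
    return t.fcst_borrow_rate * t.holding_days / day_count * 1e4


def pre_trade_flag(t: ShortTicket, max_cost_to_alpha: float = 0.5) -> str:
    """Return 'ok', 'resize', or 'reject' based on financing cost relative to alpha."""
    cost = expected_financing_bps(t)
    # Financing that eats all of the expected alpha makes the trade uneconomic.
    if t.expected_alpha_bps <= 0 or cost >= t.expected_alpha_bps:
        return "reject"
    # Hypothetical threshold: flag for resizing when cost is a large share of alpha.
    if cost / t.expected_alpha_bps > max_cost_to_alpha:
        return "resize"
    return "ok"


if __name__ == "__main__":
    for ticket in (
        ShortTicket("GC1", 10_000_000, 30, 0.004, 150.0),
        ShortTicket("SPCL", 10_000_000, 30, 0.12, 150.0),
        ShortTicket("SQZ", 10_000_000, 30, 0.80, 300.0),
    ):
        print(ticket.symbol, round(expected_financing_bps(ticket), 1), pre_trade_flag(ticket))
```

Expressing both cost and alpha in bps over the same holding period keeps the comparison unit-consistent; a real workflow would likely also carry the forecast's error band into the decision.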

💡 Did You Know?

GameStop's borrow rate peaked at 226% annualized on January 27, 2021, while utilization hit 100% across major agent lender programs. Funds with automated squeeze-risk monitoring had algorithmic alerts firing five to seven trading days earlier, based on a regime change in options-implied borrow combined with retail sentiment scores.

Multi-Prime Locate Aggregation and Optimization

A fund with locate relationships at Goldman Sachs, Morgan Stanley, JP Morgan, and BNP Paribas receives separate availability files in different formats — some via SFTP, some via API, some still by email attachment. Building a normalized locate book is unglamorous data engineering: parsing variant CUSIP/SEDOL/ticker conventions, reconciling rate quoting bases (some quote bps over GC, some quote all-in), and handling lot-size minimums. Vendors like Hazeltree, Pirum's SBLREX, Sharegain, and EquiLend's 1Source platform have built this layer commercially, with implementation typically running 4-8 months and licensing of $400K-$1.2M annually depending on fund size.
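As a rough sketch of the rate-basis reconciliation mentioned above, the snippet below converts quotes to a single all-in basis before comparing them. The `LocateQuote` record, the `basis` labels, and the prime names `PB-A` and `PB-B` are hypothetical; real availability files differ by prime and would need their own parsers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocateQuote:
    prime: str
    symbol: str
    quantity: int
    rate_bps: float
    basis: str  # "all_in" or "over_gc" (hypothetical labels)


def to_all_in_bps(q: LocateQuote, gc_fee_bps: float) -> float:
    """Convert a quote to an all-in borrow fee in bps."""
    if q.basis == "all_in":
        return q.rate_bps
    if q.basis == "over_gc":
        return q.rate_bps + gc_fee_bps
    raise ValueError(f"unknown rate basis: {q.basis!r}")


def cheapest_by_symbol(
    quotes: list[LocateQuote], gc_fee_bps: float
) -> dict[str, tuple[str, float]]:
    """Map each symbol to (prime, all-in bps) of its cheapest normalized quote."""
    best: dict[str, tuple[str, float]] = {}
    for q in quotes:
        rate = to_all_in_bps(q, gc_fee_bps)
        if q.symbol not in best or rate < best[q.symbol][1]:
            best[q.symbol] = (q.prime, rate)
    return best


if __name__ == "__main__":
    quotes = [
        LocateQuote("PB-A", "XYZ", 50_000, 300.0, "all_in"),
        LocateQuote("PB-B", "XYZ", 80_000, 270.0, "over_gc"),
    ]
    # With a 40 bps GC fee, PB-B's 270 over GC is 310 all-in, so PB-A is cheaper.
    print(cheapest_by_symbol(quotes, gc_fee_bps=40.0))
```

The example is chosen so that the raw numbers point the wrong way: 270 looks cheaper than 300 until both are put on the same basis.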

Once normalized, locate selection becomes a constrained optimization problem: minimize blended financing cost subject to prime broker balance constraints, margin agreement haircuts, GMSLA recall provisions, and concentration limits per lender. Funds running this optimization in production report 12-28 basis points of annual financing savings on the short book — for a $4 billion short book that is $4.8M-$11.2M in direct P&L. The same infrastructure also surfaces opportunities to internalize borrows against the fund's own long inventory where permitted by the prime brokerage agreement.
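A heavily simplified version of that allocation problem, for a single name with only availability and a per-prime concentration cap, can be solved greedily by filling from the cheapest quote upward. The function names, the 50% cap, and the sample quotes are illustrative assumptions; haircuts, recall provisions, and cross-name balance constraints would call for a proper linear or mixed-integer solver.

```python
def allocate_locate(
    shares_needed: int,
    quotes: list[tuple[str, int, float]],
    max_share_per_prime: float = 0.5,
) -> dict[str, int]:
    """Fill a single-name locate from (prime, available_shares, all_in_bps) quotes.

    Cheapest quotes are filled first, capped per prime by a concentration limit.
    """
    cap = int(shares_needed * max_share_per_prime)
    remaining = shares_needed
    fills: dict[str, int] = {}
    for prime, available, _rate in sorted(quotes, key=lambda q: q[2]):
        take = min(available, cap, remaining)
        if take > 0:
            fills[prime] = take
            remaining -= take
        if remaining == 0:
            break
    if remaining > 0:
        raise ValueError(f"locate short by {remaining} shares under current limits")
    return fills


def blended_rate_bps(fills: dict[str, int], quotes: list[tuple[str, int, float]]) -> float:
    """Share-weighted average borrow rate of an allocation, in bps."""
    rates = {prime: rate for prime, _avail, rate in quotes}
    total = sum(fills.values())
    return sum(shares * rates[prime] for prime, shares in fills.items()) / total


if __name__ == "__main__":
    quotes = [("PB-A", 80_000, 350.0), ("PB-B", 40_000, 300.0), ("PB-C", 100_000, 420.0)]
    fills = allocate_locate(100_000, quotes)
    print(fills, blended_rate_bps(fills, quotes))
```

For this one-name case with linear costs, greedy filling is optimal; the value of a solver shows up once constraints couple multiple names and primes.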

🎯 Where the savings actually come from

Approximately 60% of measured savings come from rate arbitrage across primes on the same name. About 25% comes from avoiding unnecessary specials by substituting economically similar names (e.g., choosing one of three correlated regional banks based on borrow cost). The remaining 15% comes from reduced buy-in events because the system flags fragile locates pre-trade rather than at settlement.

Recall Risk and the Survival Model

Recalls happen because the lender's underlying beneficial owner sold the security, because a proxy vote requires the shares to be recalled, or because the agent lender rebalances its program. Pre-2023, most funds treated recalls as random shocks. With sufficient historical recall data (typically 18-24 months of internal stock loan tickets at scale), a Cox proportional hazards model or a discrete-time survival neural network can produce a per-name, per-lender recall hazard rate.

Features that drive recall hazard include lender type (passive index funds rarely recall; active mutual funds recall around quarter-end window dressing), proxy season proximity, dividend record dates (for tax-sensitive lenders), corporate action calendars, and on-loan utilization at the program level. A fund running a survival model on its borrow book typically identifies the top decile of recall-risk loans, which historically account for 55-70% of actual recall events. Those positions can be pre-emptively reallocated to more stable lenders or sized down before forced covers occur at unfavorable prices.
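Two small calculations sit downstream of any such survival model: turning per-day hazards into a recall probability over a horizon, and checking how many realized recalls fall in the top decile of scored loans. The sketch below assumes the hazards and scores already exist (it does not fit a Cox model), and the function names are introduced here for illustration.

```python
def recall_prob_within(daily_hazards: list[float]) -> float:
    """P(recall within the horizon) from discrete-time daily hazard rates.

    Survival is the product of (1 - h_t); recall probability is its complement.
    """
    survival = 1.0
    for h in daily_hazards:
        survival *= 1.0 - h
    return 1.0 - survival


def top_decile_capture(scores: list[float], recalled: list[int]) -> float:
    """Share of actual recalls that fall in the top 10% of loans ranked by score."""
    n_top = max(1, len(scores) // 10)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total = sum(recalled)
    if total == 0:
        return 0.0
    return sum(recalled[i] for i in ranked[:n_top]) / total


if __name__ == "__main__":
    # A constant 1% daily hazard over 10 days gives roughly a 9.6% recall probability.
    print(round(recall_prob_within([0.01] * 10), 4))
    scores = [i / 20 for i in range(20)]
    recalled = [1] + [0] * 17 + [1, 1]
    print(top_decile_capture(scores, recalled))
```

The capture ratio is the quantity the 55-70% figure above refers to, so it is a natural backtest metric to track as the model is recalibrated.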

Total Short Financing Cost (per name, per trade)

TFC = Σ(Borrow_Rate_t × Notional_t × Δt/360) + Recall_Buy-in_Cost × P(recall) + Locate_Fee

Pre-trade, the AI system estimates each component. Post-trade, actuals feed back into the model for continuous calibration. Funds typically see model MAE compress 20-30% after the first 12 months of in-production learning.
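The formula above translates directly into code. This sketch assumes daily steps (Δt = 1 day), a 360-day count as in the formula, and inputs that have already been forecast; the function and parameter names are illustrative.

```python
def total_financing_cost(
    daily_rates: list[float],      # annualized borrow rate per day, decimal
    daily_notionals: list[float],  # short notional per day, USD
    p_recall: float,               # probability of a recall over the trade's life
    buy_in_cost: float,            # expected cost of a recall-driven buy-in, USD
    locate_fee: float,             # flat locate fee, USD
    day_count: int = 360,
) -> float:
    """TFC = sum(rate_t * notional_t * dt / 360) + buy_in_cost * P(recall) + locate_fee."""
    if len(daily_rates) != len(daily_notionals):
        raise ValueError("rates and notionals must cover the same days")
    carry = sum(r * n / day_count for r, n in zip(daily_rates, daily_notionals))
    return carry + buy_in_cost * p_recall + locate_fee


if __name__ == "__main__":
    # 30 days at 12% on $10M is about $100,000 of carry, plus $5,000 expected
    # buy-in cost (10% x $50,000) and a $500 locate fee.
    tfc = total_financing_cost([0.12] * 30, [10_000_000.0] * 30, 0.10, 50_000.0, 500.0)
    print(round(tfc, 2))
```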

Squeeze Detection and the Crowded Short Problem

Crowded shorts are identifiable in advance. Utilization above 90%, days-to-cover above 8, on-loan-to-float above 15%, rising borrow cost over five sessions, and elevated retail option call volume are the classic signals. The 2021 meme-stock episode added a new dimension: social media coordination, which can be partially captured by NLP models running on Reddit, Stocktwits, and X/Twitter data, similar to the architecture covered in alternative data pipelines.
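A rule-based screen over those classic signals might look like the sketch below. The utilization, days-to-cover, and on-loan-to-float thresholds come from the text; reading "rising borrow cost over five sessions" as five consecutive daily increases, and using a call-volume ratio of 2.0 against a trailing average, are assumptions made here for illustration. The `ShortSnapshot` record is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ShortSnapshot:
    utilization: float             # on-loan / lendable, 0..1
    days_to_cover: float
    on_loan_to_float: float        # 0..1
    borrow_rates_bps: list[float]  # daily borrow rates, oldest first
    call_volume_ratio: float       # retail call volume vs trailing average (assumed input)


def rising_over(rates: list[float], sessions: int = 5) -> bool:
    """True if the rate rose on each of the last `sessions` days."""
    window = rates[-(sessions + 1):]
    return len(window) == sessions + 1 and all(b > a for a, b in zip(window, window[1:]))


def crowding_signals(s: ShortSnapshot, call_ratio_threshold: float = 2.0) -> list[str]:
    """Return the names of the crowded-short signals that are currently triggered."""
    checks = {
        "utilization": s.utilization > 0.90,
        "days_to_cover": s.days_to_cover > 8,
        "on_loan_to_float": s.on_loan_to_float > 0.15,
        "rising_borrow_cost": rising_over(s.borrow_rates_bps),
        "retail_call_volume": s.call_volume_ratio > call_ratio_threshold,
    }
    return [name for name, hit in checks.items() if hit]


if __name__ == "__main__":
    snap = ShortSnapshot(0.95, 9.0, 0.20, [100, 110, 125, 140, 160, 200], 3.0)
    print(crowding_signals(snap))
```

A count of triggered signals is a crude score, but it gives a transparent baseline against which a learned ensemble can be compared.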

[Chart: Predictive Power of Squeeze Indicators (AUC, 2022-2025 backtest)]

An ensemble model combining these signals achieves AUC around 0.83 for identifying names that will experience a 3-sigma borrow rate move within the next 10 trading days. The actionability matters more than the prediction: alerts route to risk officers who can mandate position reductions, hedge with single-stock options, or substitute economically similar exposures. S3 Partners and ORTEX publish related crowded-short scores; sophisticated funds build proprietary versions because the public scores are themselves consumed by the crowd and lose information value.
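For readers less familiar with the metric, AUC is the probability that a randomly chosen name that did squeeze received a higher score than a randomly chosen name that did not. A minimal pairwise implementation is sketched below; the scores and labels are made-up toy values, not backtest data, and a production evaluation would typically use a library routine instead.

```python
def auc(scores: list[float], labels: list[int]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy example: label 1 means the name later saw a 3-sigma borrow rate move.
    scores = [0.9, 0.8, 0.3, 0.2]
    labels = [1, 0, 1, 0]
    print(auc(scores, labels))
```

An AUC of 0.5 is no better than chance and 1.0 is perfect ranking, which puts the 0.83 quoted above in context.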

“We stopped thinking of borrow cost as a financing line and started treating it as a real-time market signal. When utilization regime-shifts, that's frequently a leading indicator of the trade being wrong, not just expensive.”

— Head of Short Alpha, $12B equity long/short fund

Architecture and Vendor Landscape

A modern locate management stack has five layers. Data ingestion handles prime broker availability files, S&P Global Securities Finance feeds, EquiLend DataLend, and (from 2026) the 10c-1a public tape. A normalization layer reconciles identifiers and rate conventions. A modeling layer hosts the borrow forecast, recall hazard, and squeeze risk models. An optimization layer solves the locate allocation problem against prime broker constraints. A workflow layer pushes results into the OMS and the pre-trade compliance check, similar to patterns described in execution and SOR architecture.

Key Vendors in the Locate and Securities Finance Stack

EquiLend / 1Source

Securities finance trading and post-trade platform; new blockchain-based 1Source platform launched 2024 for golden-source loan data.

Hazeltree

Treasury and securities financing platform widely deployed at hedge funds for locate aggregation and counterparty exposure management.

Pirum SBLREX

Post-trade reconciliation and lifecycle automation; growing capability in pre-trade availability.

S&P Global Securities Finance

Formerly IHS Markit; benchmark borrow rate and lendable supply data covering 99% of global lendable equity inventory.

Sharegain

API-first lending infrastructure; relevant for funds running their own internal lending program.

S3 Partners / ORTEX

Short interest analytics, crowded short scoring; commonly consumed as features in proprietary squeeze models.

Implementation Path and Realistic Metrics

A fund running $2-10B in short notional should plan for an 18-month implementation sequenced in three waves. Wave one (months 1-6) covers data aggregation, normalization, and basic locate optimization — this alone typically delivers 8-15 bps of financing savings. Wave two (months 6-12) introduces the borrow cost forecasting and recall hazard models, adding another 7-15 bps and reducing buy-ins by 40-60%. Wave three (months 12-18) integrates squeeze detection and pre-trade financing TCA into the order management system, closing the loop with portfolio construction.
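The basis-point figures convert to dollars with simple arithmetic. The sketch below adds the first two waves' ranges and applies them to a $4B short book, the notional used elsewhere in this article; the helper name is illustrative.

```python
def bps_to_usd(notional_usd: float, bps: float) -> float:
    """Annual dollar value of a basis-point saving on a given notional."""
    return notional_usd * bps / 1e4


if __name__ == "__main__":
    short_book = 4_000_000_000.0
    wave_one = (8.0, 15.0)   # bps, data aggregation and basic optimization
    wave_two = (7.0, 15.0)   # bps, forecasting and recall hazard models
    low = wave_one[0] + wave_two[0]
    high = wave_one[1] + wave_two[1]
    # 15-30 bps combined, i.e. $6M-$12M a year on a $4B short book.
    print(low, high, bps_to_usd(short_book, low), bps_to_usd(short_book, high))
```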

Phased Build for AI-Driven Locate Management

Phase 1: Data foundation (months 1-6)

Normalize PB availability files, integrate S&P Global and DataLend, build internal loan ticket warehouse. Run baseline locate optimization. Expected payback: 8-15 bps.

Phase 2: Predictive models (months 6-12)

Train and deploy borrow cost forecast and recall hazard models. Backtest against 2-3 years of internal data. Integrate forecasts into pre-trade workflow. Additional 7-15 bps savings.

Phase 3: Squeeze and TCA (months 12-18)

Add squeeze risk ensemble, social/NLP layer, and financing TCA. Embed into OMS pre-trade compliance. Reduce buy-ins by 40-60%, eliminate the 'surprise specials' problem.

Phase 4: 10c-1a integration (2026 onward)

Ingest FINRA public loan tape as it phases in. Recalibrate models with industry-wide rate data. Renegotiate PB spreads using empirical benchmarks.

“On a $4B short book, 25-30 bps of financing improvement is $10-12M of pure P&L. That funds the data science team, the vendor stack, and the infrastructure, three times over.”
— Quant operations lead, multi-strategy fund

Common Failure Modes

Three failure patterns recur in implementations. The first is treating locate data as a procurement problem rather than a quant data problem: funds buy DataLend and S&P Global feeds but never join them to their own loan tickets, so the model never learns from the fund's actual experience. The second is over-fitting on the post-2020 period, which contained the meme-stock dislocations; models trained only on that data overestimate squeeze probability in normal regimes. The third is failing to integrate the borrow model with portfolio construction, leaving the alpha researcher and the financing desk in separate systems with separate views of the same trade.

Diligence Questions Before Investing in the Build

Do we have at least 18 months of internal stock loan tickets, with rates, lender identity (where available), and recall outcomes, in a queryable warehouse?

Are our prime broker agreements written to allow us to act on the rate intelligence? For example, can we lift locates from PB-A and execute the short at PB-B without giving up the rate?

Does our OMS expose pre-trade hooks where financing cost can block or resize an order, or will we need to retrofit?

Have we mapped which beneficial owner types lend our specials, and do our agent lenders provide enough metadata to model recall risk?

Is our risk team prepared to act on squeeze alerts, including the governance for forced position reductions when borrow rates spike?

Where This Goes Next

Two structural shifts will reshape this space over the next 24 months. First, Rule 10c-1a public dissemination will commoditize rate discovery, forcing prime brokers to compete on execution quality and balance sheet rather than information asymmetry. Funds that built proprietary models before the data became public will retain an edge by combining it with their own loan history. Second, tokenized securities lending — being piloted by EquiLend's 1Source on a permissioned blockchain — will enable intraday settlement of loans and dynamic rate adjustment, which removes some of the friction that currently makes specials sticky. Funds with API-first stacks will adapt within weeks; those still running locate workflows on email and Excel will not.

Short-selling alpha has always come from being right about the company. Increasingly, capturing that alpha requires being right about the financing too. The funds that institutionalize this — moving locate management from the stock loan desk's spreadsheet into the same modeling discipline applied to alpha generation and execution — will keep more of every basis point they earn on the short side. Those that do not will continue to bleed 20-50 bps annually to information asymmetry, recalls, and the occasional catastrophic buy-in.

Frequently Asked Questions

How much can a hedge fund realistically save by deploying AI for locate management?

Funds running $2-10B in short notional typically report 20-40 bps of annual financing cost reduction once the full stack is in production, plus a 40-60% reduction in buy-in events. On a $4B short book that's $8-16M of direct P&L, with payback periods of 12-18 months including vendor licensing and build cost.

Does Rule 10c-1a eliminate the need for proprietary borrow cost modeling?

No. Public dissemination of loan transaction data will compress information asymmetry, but the data lags execution and lacks the borrower's own context. Funds combining the public 10c-1a tape with their internal loan history and proprietary features (options flow, holder composition, sentiment) will retain meaningful predictive edge over those consuming the public tape alone.

What's the difference between general collateral (GC) and specials in this context?

GC names have abundant supply and borrow at a low fee (typically 25-50 bps), so the short seller's rebate sits close to the fed funds rate. Specials are scarce names where utilization is high and fees can run from 100 bps to thousands of basis points annualized. AI models help mostly on specials and on the transition zone where a GC name is becoming a special, because that is where rate volatility and recall risk concentrate.
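As a rough illustration of those bands, a fee-based bucket could be assigned as below. The cutoffs simply reuse the typical figures quoted in this answer (up to 50 bps for GC, 100 bps and above for specials) and are not a market standard; desks generally also look at utilization and supply before calling a name special.

```python
def borrow_bucket(fee_bps: float, gc_max_bps: float = 50.0, special_min_bps: float = 100.0) -> str:
    """Classify a name by annualized borrow fee using illustrative cutoffs."""
    if fee_bps <= gc_max_bps:
        return "GC"
    if fee_bps < special_min_bps:
        return "transition"
    return "special"


if __name__ == "__main__":
    for fee in (30.0, 75.0, 1200.0):
        print(fee, borrow_bucket(fee))
```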

Can a fund build this in-house or should it buy a vendor solution?

The data aggregation and locate optimization layers are commoditized — Hazeltree, Pirum, and similar vendors solve these well. The predictive modeling layer (borrow forecast, recall hazard, squeeze detection) is where proprietary work pays off, because it benefits from the fund's own historical loan tickets and trade decisions. Hybrid builds — buy the platform, build the models — are the most common pattern at funds above $5B AUM.

How does T+1 settlement affect locate management?

The window to cure failures-to-deliver under Reg SHO Rule 204 compressed roughly 40% with the May 2024 T+1 move, making stale or fragile locates far more dangerous. Funds report needing to validate locates 2-4 hours earlier in the workflow and to maintain backup locates on a higher percentage of HTB names. This intensified the case for automated locate aggregation rather than manual workflows.