Description

Large language models are changing the malware playbook: instead of shipping fixed attack logic inside a binary, some threats now call out to models at runtime and synthesize malicious code on demand, which seriously weakens signature-based detection. In our analysis we uncovered previously unseen samples of this pattern, including a campaign we named MalTerminal that is likely its earliest known instance, alongside other adversarial uses of LLMs such as automated people-search agents, red-team benchmarking tools, and utilities that prompt models to inject vulnerabilities.

Technically, these samples embed live model access (API keys) and carefully structured prompts so that the remote model becomes the payload generator. That model-driven behavior defeats static signatures and can also evade dynamic sandboxes when execution paths hinge on environment variables or the model's live responses.

We located these samples by writing YARA rules that match commercial LLM API keys, extracting the embedded prompts, retrohunting a year of VirusTotal submissions, and clustering samples that reused the same key sets.

One detected artifact, MalTerminal.exe together with its associated Python loaders, queried the GPT-4 chat-completions endpoint to produce either ransomware routines or reverse-shell code, depending on operator instructions. The endpoint string in the sample is now deprecated, suggesting the malware predates November 2023 and may be the earliest known LLM-enabled malware; a small proof-of-concept loader that prompts GPT-4 for ransomware functions was also recovered (not reproduced here).
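To make the hunting approach concrete, the following is a minimal sketch in Python rather than the YARA rules we actually used (which are not reproduced here): it scans a sample for strings that look like embedded commercial LLM API keys and for long prompt-like strings. The key prefixes (e.g. OpenAI-style keys beginning with "sk-") and the prompt-hint phrases are illustrative assumptions, not the patterns from our rules.

```python
# Sketch only: scan files for embedded LLM API key candidates and prompt-like
# strings. Key prefixes and prompt hints below are illustrative assumptions.
import re
import sys
from pathlib import Path

# Hypothetical key patterns; real hunting rules would be tuned per provider.
KEY_PATTERNS = {
    "openai": re.compile(rb"sk-[A-Za-z0-9]{20,}"),
    "anthropic": re.compile(rb"sk-ant-[A-Za-z0-9_\-]{20,}"),
}

# Crude heuristic for embedded prompts: long printable runs that contain
# instruction-style phrasing.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{80,}")
PROMPT_HINTS = (b"you are", b"respond only", b"write code", b"ignore previous")

def scan_file(path: Path) -> dict:
    data = path.read_bytes()
    keys = {name: [m.group().decode() for m in pat.finditer(data)]
            for name, pat in KEY_PATTERNS.items()}
    prompts = [run.group().decode(errors="replace")
               for run in PRINTABLE_RUN.finditer(data)
               if any(hint in run.group().lower() for hint in PROMPT_HINTS)]
    return {"sample": path.name,
            "keys": {k: v for k, v in keys.items() if v},
            "prompts": prompts}

if __name__ == "__main__":
    for arg in sys.argv[1:]:
        report = scan_file(Path(arg))
        if report["keys"] or report["prompts"]:
            print(report)
```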
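The key-reuse clustering step can likewise be sketched as grouping sample hashes that share any extracted key. The union-find grouping below is a simplified illustration under an assumed input format (a mapping of sample hash to the set of keys recovered from it), not our actual pipeline.

```python
# Sketch: cluster samples that embed overlapping API key sets.
from collections import defaultdict

def cluster_by_keys(sample_keys: dict[str, set[str]]) -> list[set[str]]:
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        while parent.setdefault(x, x) != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Link every sample to each key it embeds; samples sharing a key merge.
    for sample, keys in sample_keys.items():
        for key in keys:
            union(sample, "key:" + key)

    clusters: dict[str, set[str]] = defaultdict(set)
    for sample in sample_keys:
        clusters[find(sample)].add(sample)
    return list(clusters.values())

if __name__ == "__main__":
    # Made-up hashes and redacted keys for illustration.
    demo = {
        "sample_a": {"sk-AAAA...", "sk-BBBB..."},
        "sample_b": {"sk-BBBB..."},
        "sample_c": {"sk-CCCC..."},
    }
    print(cluster_by_keys(demo))  # sample_a and sample_b cluster together
```

In practice, clusters built this way give a quick view of which samples likely belong to the same operator or toolkit, since reuse of the same live key set across binaries is a strong link.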