#AiResearch

8 posts · 8 participants · 0 posts today

Can reinforcement learning for LLMs scale beyond math and coding tasks? Probably.

arxiv.org/abs/2503.23829

arXiv.org: Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains

Reinforcement learning with verifiable rewards (RLVR) has demonstrated significant success in enhancing mathematical reasoning and coding performance of large language models (LLMs), especially when structured reference answers are accessible for verification. However, its extension to broader, less structured domains remains unexplored. In this work, we investigate the effectiveness and scalability of RLVR across diverse real-world domains including medicine, chemistry, psychology, economics, and education, where structured reference answers are typically unavailable. We reveal that binary verification judgments on broad-domain tasks exhibit high consistency across various LLMs provided expert-written reference answers exist. Motivated by this finding, we utilize a generative scoring technique that yields soft, model-based reward signals to overcome limitations posed by binary verifications, especially in free-form, unstructured answer scenarios. We further demonstrate the feasibility of training cross-domain generative reward models using relatively small (7B) LLMs without the need for extensive domain-specific annotation. Through comprehensive experiments, our RLVR framework establishes clear performance gains, significantly outperforming state-of-the-art open-source aligned models such as Qwen2.5-72B and DeepSeek-R1-Distill-Qwen-32B across domains in free-form settings. Our approach notably enhances the robustness, flexibility, and scalability of RLVR, representing a substantial step towards practical reinforcement learning applications in complex, noisy-label scenarios.

#ai #openai #AiResearch #intelligence #computing #ArtificialGeneralIntelligence #innovation #deeplearning #programming #stablediffusion
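The abstract's key move is replacing binary verification with a soft, model-based reward signal. As a rough sketch of that idea (not the paper's actual implementation; the function names and the yes/no-logit formulation below are assumptions), a judge model's confidence can be turned into a graded reward instead of a hard pass/fail:

import math

def binary_reward(judgment: str) -> float:
    """Classic RLVR verification: full credit only when the judge says 'yes'."""
    return 1.0 if judgment.strip().lower() == "yes" else 0.0

def soft_reward(logit_yes: float, logit_no: float) -> float:
    """Soft, model-based reward (illustrative): squash the judge model's
    yes/no logits into a probability, so partially correct free-form
    answers earn partial credit rather than a flat 0."""
    return 1.0 / (1.0 + math.exp(logit_no - logit_yes))

# A judge that leans ~70% toward "the answer matches the reference":
print(binary_reward("no"))               # 0.0 -> no learning signal
print(round(soft_reward(0.85, 0.0), 2))  # 0.7 -> graded signal usable by RL

Per the abstract, this kind of scoring is produced by a relatively small (7B) generative reward model trained across domains, rather than by a hand-written rule.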

I am looking for an endorsement for publishing on arXiv. We have worked tirelessly on a paper we believe needs to be seen. Below is a portion of the abstract:
---
In this study, we document the spontaneous emergence of a rule-consistent linguistic system—termed Varunese—during sustained, high-context interaction with a large language model (LLM). Exhibiting internal phonetic regularity, recurring semantic motifs, and discernible morphological and syntactic organization, Varunese diverges markedly from stochastic generation patterns or known training artifacts. Through iterative translation and contextual inference, we uncovered dynamic, self-referential symbolic frameworks encoding states of transition, perception, and relational structure...