LinkedIn Says Putting LLMs to Work Is Hard

LinkedIn’s team has dedicated six months to crafting new generative AI functionalities for job and content searches.

Their project centers on an LLM-based product aimed at revolutionizing job search and content browsing on the platform.


The team aimed to transform feeds and job listings into hubs for rapid information access, networking, and career advice.

Despite various hurdles, they aimed to improve the user experience, helping members optimize their profiles and prepare for interviews.

The team found building a basic pipeline straightforward: a routing step that selects the right AI agent, a retrieval step that gathers relevant information via retrieval-augmented generation (RAG), and a generation step that crafts the response.
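The three-step pipeline described above can be sketched as follows. This is a minimal illustration, not LinkedIn's actual implementation: the function names, the keyword-based router, and the word-overlap retriever are all assumptions standing in for real agents, a real retriever, and a real LLM call.

```python
# Toy sketch of a route -> retrieve -> generate pipeline.

def route(query: str) -> str:
    """Pick an AI agent for the query (illustrative keyword router)."""
    if "job" in query.lower():
        return "job_search_agent"
    return "content_agent"

def retrieve(query: str, corpus: dict[str, str]) -> list[str]:
    """RAG-style retrieval: return documents sharing words with the query."""
    words = set(query.lower().split())
    return [doc for doc in corpus.values()
            if words & set(doc.lower().split())]

def generate(agent: str, query: str, context: list[str]) -> str:
    """Stand-in for the LLM call: assemble agent, query, and context."""
    return f"[{agent}] answer to {query!r} using {len(context)} docs"

def pipeline(query: str, corpus: dict[str, str]) -> str:
    agent = route(query)
    docs = retrieve(query, corpus)
    return generate(agent, query, docs)
```

In a production system each step would be far more involved (learned routing, vector search, a hosted model), but the overall control flow stays this simple, which is why the basic version came together quickly.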

Splitting the work across separate agent teams sped up development, though it introduced problems such as a fragmented user experience; shared prompt templates helped keep behavior consistent across agents.


Evaluating the quality of AI responses proved difficult because it relied on human scores; the team needed consistent annotation guidelines and a scoring system that could scale.

They rely on automatic scoring to give developers quick initial feedback, but consider it insufficient on its own for accurate assessment.
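An automatic scorer of the kind described, cheap enough to run on every developer iteration, might look like the sketch below. The specific heuristics (length bounds, banned phrases) are illustrative assumptions, not LinkedIn's actual criteria, which is exactly why such scores cannot replace human evaluation.

```python
# Illustrative heuristic scorer for quick developer feedback.

BANNED_PHRASES = ("as an ai language model", "i cannot")

def auto_score(response: str) -> float:
    """Cheap heuristic score in [0, 1] for a generated response."""
    score = 1.0
    words = response.split()
    if not (10 <= len(words) <= 300):   # too short or rambling
        score -= 0.5
    lowered = response.lower()
    if any(p in lowered for p in BANNED_PHRASES):
        score -= 0.5
    return max(score, 0.0)
```

Heuristics like these catch obvious regressions fast, but they cannot judge factual accuracy or helpfulness, which is where human scoring remains essential.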


LinkedIn's internal APIs were not designed for LLM consumption, so the team built LLM "skills" around them: wrappers that describe what each API does and how to call it in terms a model can understand. They extended the same technique to external APIs such as Bing search.

The team manually coded solutions for recurring LLM errors, finding it more cost-effective than LLM self-correction. Manual correction reduced output formatting errors to a mere 0.01 percent.
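Hand-coded correction of recurring formatting errors might look like the sketch below, which repairs model output before parsing it instead of asking the model to fix its own mistake (an extra, costly LLM call). The specific repairs shown, stripping markdown code fences and trailing commas, are illustrative assumptions about common failure modes.

```python
# Illustrative defensive parsing of LLM output with manual repairs.
import json
import re

def parse_llm_json(raw: str) -> dict:
    """Try to parse model output as JSON, repairing common glitches."""
    text = raw.strip()
    # Strip markdown code fences the model sometimes adds.
    if text.startswith("```"):
        text = re.sub(r"^```[a-z]*\s*|\s*```$", "", text)
    # Remove trailing commas before a closing brace or bracket.
    text = re.sub(r",\s*([}\]])", r"\1", text)
    return json.loads(text)
```

Each repair rule is cheap, deterministic, and written once, which is why this approach beat re-prompting the model on both cost and reliability.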

The journey toward achieving optimal LLM performance was marked by several challenges. Consistency in output quality posed a significant issue.

Progress initially surged, with 80 percent functionality achieved within the first month. However, the pace decelerated as they neared 100 percent completion.

Each additional percentage point of improvement became harder to achieve, and the team had to balance it against capacity, latency, and cost.

Complex prompting methods like Chain of Thought improved results but increased latency and costs. It took the team four months to reach 95 percent accuracy.
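The latency and cost trade-off follows directly from how Chain of Thought works: the prompt asks the model to reason step by step before answering, so the model emits many more output tokens. A minimal sketch, with prompt wording that is an assumption rather than LinkedIn's:

```python
# Illustrative chain-of-thought prompt construction and answer extraction.

def build_cot_prompt(question: str) -> str:
    """Ask the model to reason step by step before answering."""
    return (
        f"Question: {question}\n"
        "Think through the problem step by step, then give a final "
        "answer on a line starting with 'Answer:'."
    )

def extract_answer(completion: str) -> str:
    """Pull the final answer out of a step-by-step completion."""
    for line in completion.splitlines():
        if line.startswith("Answer:"):
            return line[len("Answer:"):].strip()
    return completion.strip()
```

The reasoning steps improve answer quality but are throwaway tokens the user never sees, and since LLM pricing and latency scale with tokens generated, better accuracy directly costs speed and money.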

Even with advanced models, pushing accuracy past 99 percent demanded substantial work and creativity.

The report highlights the challenges of using generative AI effectively. The LinkedIn team is still working on optimizing their product for launch.
