
Serverless at the Edge: Orchestrating AI Workloads Across the Cloud-Edge Continuum

Serverless computing simplifies deployment by abstracting infrastructure—but extending it to the edge introduces challenges in latency, scheduling, and resource heterogeneity. As LLM inference moves to edge devices, orchestrating serverless workloads across the cloud-edge continuum becomes a pressing systems challenge.

By Sean K.S. Shin

This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

Serverless computing promised a simple bargain: developers write functions, cloud providers handle everything else—scaling, provisioning, load balancing, fault tolerance. For cloud-native applications, this bargain has been remarkably successful. AWS Lambda handles enormous invocation volumes at global scale. The developer experience is genuinely simplified.

But the bargain breaks down at the edge. Edge computing—processing data near its source rather than in distant data centers—is driven by latency requirements (autonomous vehicles need millisecond response times), bandwidth constraints (streaming raw video to the cloud is prohibitively expensive), and privacy concerns (sensitive data should not leave the local network). Extending serverless to this environment means rethinking every assumption the cloud model makes: about resource availability, network reliability, scheduling centralization, and workload homogeneity.

The 2025 research in this space, spanning sensor network intelligence (Loconte et al.), LLM inference at the edge (Farahani & Prodan), decentralized scheduling (Chen et al.), and latency optimization (Lu et al.), collectively maps the architecture for serverless computing's next frontier.

The Cloud-Edge Continuum

The traditional dichotomy—cloud vs. edge—is giving way to a continuum where workloads flow dynamically between centralized cloud infrastructure and distributed edge nodes based on latency requirements, resource availability, and data locality constraints.

Loconte et al. provide a concrete architecture for this continuum in the context of sensor networks. Their system deploys serverless microservices across three tiers:

Edge tier: Lightweight inference models running on gateway devices near sensors, handling real-time anomaly detection and data filtering
Fog tier: Intermediate processing nodes that aggregate data from multiple edge gateways, running more complex models that benefit from broader context
Cloud tier: Full-scale training and model updates, complex analytics that tolerate higher latency

The key architectural insight is that different stages of the same AI pipeline may execute at different tiers. Data preprocessing runs at the edge (low latency, minimal data movement). Feature engineering runs in the fog tier (moderate latency, cross-sensor aggregation). Model training runs in the cloud (higher latency is acceptable, and far more compute is available). The serverless abstraction manages this distribution transparently.
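
The stage-to-tier mapping described above can be sketched as a simple placement rule. This is an illustrative sketch only, not the system built by Loconte et al.: the `Tier` and `Stage` types, the `place` function, and every latency and compute figure are hypothetical placeholders chosen to make the example concrete.

```python
from dataclasses import dataclass

# Hypothetical tier descriptions; the numbers are illustrative placeholders,
# not measurements from any of the cited papers.
@dataclass(frozen=True)
class Tier:
    name: str
    round_trip_ms: float  # assumed network round trip from the sensor
    compute_units: int    # assumed relative compute capacity

@dataclass(frozen=True)
class Stage:
    name: str
    max_latency_ms: float  # latency the stage can tolerate
    compute_needed: int    # relative compute the stage requires

# Ordered from nearest to the data source to farthest.
TIERS = [
    Tier("edge", round_trip_ms=5, compute_units=1),
    Tier("fog", round_trip_ms=30, compute_units=10),
    Tier("cloud", round_trip_ms=150, compute_units=1000),
]

def place(stage: Stage, tiers=TIERS) -> str:
    """Return the nearest tier that meets the stage's latency and compute needs."""
    for tier in tiers:
        fits_latency = tier.round_trip_ms <= stage.max_latency_ms
        fits_compute = tier.compute_units >= stage.compute_needed
        if fits_latency and fits_compute:
            return tier.name
    raise ValueError(f"no tier can host stage {stage.name!r}")

PIPELINE = [
    Stage("preprocessing", max_latency_ms=10, compute_needed=1),
    Stage("feature_engineering", max_latency_ms=100, compute_needed=5),
    Stage("model_training", max_latency_ms=60_000, compute_needed=500),
]

placement = {stage.name: place(stage) for stage in PIPELINE}
print(placement)
```

With these assumed numbers, preprocessing lands on the edge, feature engineering on the fog tier, and training in the cloud. Preferring the nearest feasible tier is one possible policy; a real orchestrator would also weigh load, cost, and data movement.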

LLM Inference Across the Continuum

Farahani & Prodan address perhaps the most challenging workload for edge serverless: large language model inference. LLMs are inherently resource-intensive—far more demanding than the small, stateless functions that serverless was designed for. Running even a quantized 7B-parameter model on an edge device requires careful memory management, and the auto-scaling mechanisms designed for cloud serverless (spin up more containers) may not apply when edge hardware is fixed and limited.

Their solution involves workload-aware orchestration that considers:

Model size relative to available edge memory
Expected latency requirements for the specific query
Current load across edge nodes
Whether the query can be decomposed into sub-queries that execute at different tiers

For queries that exceed edge capability, the orchestrator transparently routes to fog or cloud tiers—maintaining the serverless abstraction (the developer does not specify where the function runs) while respecting physical constraints.
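
A routing decision based on the first three factors above (memory fit, latency requirement, current load) might look like the sketch below. It is an assumption-laden illustration rather than the orchestrator Farahani & Prodan describe: the `Node` and `Query` types, the `route` function, the `max_load` threshold, and all numbers are hypothetical, and query decomposition is left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    tier: str              # "edge", "fog", or "cloud"
    free_memory_gb: float
    load: float            # 0.0 (idle) to 1.0 (saturated)
    est_latency_ms: float  # assumed end-to-end latency estimate for this node

@dataclass
class Query:
    model_size_gb: float      # memory the model needs to serve this query
    latency_budget_ms: float  # latency the caller can tolerate

TIER_ORDER = {"edge": 0, "fog": 1, "cloud": 2}

def route(query: Query, nodes: List[Node], max_load: float = 0.8) -> Optional[Node]:
    """Pick the nearest-tier node that fits the model, the latency budget, and the load cap."""
    feasible = [
        n for n in nodes
        if n.free_memory_gb >= query.model_size_gb
        and n.est_latency_ms <= query.latency_budget_ms
        and n.load < max_load
    ]
    if not feasible:
        return None  # caller decides whether to queue, reject, or relax the budget
    # Prefer the tier closest to the data; break ties by lower load.
    return min(feasible, key=lambda n: (TIER_ORDER[n.tier], n.load))

nodes = [
    Node("edge", free_memory_gb=4, load=0.3, est_latency_ms=40),
    Node("fog", free_memory_gb=16, load=0.5, est_latency_ms=120),
    Node("cloud", free_memory_gb=80, load=0.2, est_latency_ms=400),
]

small = route(Query(model_size_gb=3.5, latency_budget_ms=200), nodes)
large = route(Query(model_size_gb=7.0, latency_budget_ms=200), nodes)
print(small.tier, large.tier)
```

The caller never names a tier; it only states what the query needs. That is the sense in which the serverless abstraction is preserved while physical limits are still respected.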

Decentralized Scheduling

Traditional serverless platforms use centralized schedulers—a single control plane that decides where each function invocation executes. This works in the cloud, where the scheduler and the compute resources are connected by a reliable, low-latency network. At the edge, centralized scheduling introduces unacceptable latency (the scheduler may be far from the edge nodes) and a single point of failure.

Chen et al.'s Ekko addresses this with fully decentralized scheduling. Each edge node makes independent scheduling decisions based on local load information and gossip-protocol-shared state from neighboring nodes. The result is a system that scales to large numbers of edge nodes without a centralized bottleneck, degrades gracefully when individual nodes fail, and achieves scheduling latency measured in milliseconds rather than the hundreds of milliseconds that centralized approaches require.
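
The general pattern of local decisions informed by gossiped state can be illustrated with a toy model. The sketch below is not Ekko's algorithm. It assumes a version-stamped load table exchanged pairwise between nodes, and the `EdgeNode` class, its methods, and the `gossip` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass
class EdgeNode:
    node_id: str
    load: float = 0.0
    version: int = 0
    # Local view of peers: node_id -> (load, version)
    view: Dict[str, Tuple[float, int]] = field(default_factory=dict)

    def update_load(self, load: float) -> None:
        """Record a new local load reading and bump the version stamp."""
        self.load = load
        self.version += 1

    def digest(self) -> Dict[str, Tuple[float, int]]:
        """State shared in a gossip round: everything known, plus this node itself."""
        return {**self.view, self.node_id: (self.load, self.version)}

    def merge(self, remote: Dict[str, Tuple[float, int]]) -> None:
        """Keep the freshest entry per peer, judged by version number."""
        for peer, (load, version) in remote.items():
            if peer == self.node_id:
                continue  # a node is always the authority on its own load
            known = self.view.get(peer)
            if known is None or version > known[1]:
                self.view[peer] = (load, version)

    def schedule(self) -> str:
        """Choose the least-loaded node in the local view; no central scheduler is consulted."""
        candidates = {self.node_id: self.load}
        candidates.update({peer: load for peer, (load, _) in self.view.items()})
        return min(candidates, key=lambda n: (candidates[n], n))

def gossip(a: EdgeNode, b: EdgeNode) -> None:
    """One pairwise exchange: both sides swap digests and merge."""
    digest_a, digest_b = a.digest(), b.digest()
    a.merge(digest_b)
    b.merge(digest_a)
```

In this toy model a node can learn about a peer it has never contacted directly: if `b` gossips with `c` and then with `a`, node `a` ends up with an entry for `c`. The decision in `schedule` uses only local memory, which is why scheduling latency does not depend on the distance to a control plane. The trade-off is that the view may be stale.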

Lu et al.'s SoCL complements Ekko by focusing on the latency optimization problem for edge serverless microservices. As the number of concurrent user requests grows, the scheduling solution space expands exponentially. SoCL formulates this as a combinatorial optimization problem and applies approximation algorithms that provide near-optimal latency with tractable computation.
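
To make the growth of the solution space concrete: assigning n requests to m nodes admits m^n possible assignments, so exhaustive search quickly becomes impractical. The sketch below contrasts brute force with a simple greedy heuristic on a toy cost model. It is not SoCL's formulation or algorithm. The cost model (a base latency per node plus a penalty for each request already queued there), the `NODES` values, and the function names are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical nodes: (base latency in ms, extra ms per request already queued).
NODES = [(10.0, 5.0), (20.0, 2.0), (15.0, 10.0)]

def total_latency(assignment, nodes=NODES):
    """Sum of per-request latencies for a sequence of node indices."""
    queued = [0] * len(nodes)
    total = 0.0
    for node in assignment:
        base, penalty = nodes[node]
        total += base + penalty * queued[node]
        queued[node] += 1
    return total

def brute_force(num_requests, nodes=NODES):
    """Evaluate all len(nodes) ** num_requests assignments and return the best total."""
    return min(
        total_latency(assignment, nodes)
        for assignment in product(range(len(nodes)), repeat=num_requests)
    )

def greedy(num_requests, nodes=NODES):
    """Send each request to the node with the lowest marginal latency right now."""
    queued = [0] * len(nodes)
    assignment = []
    for _ in range(num_requests):
        best = min(range(len(nodes)), key=lambda j: nodes[j][0] + nodes[j][1] * queued[j])
        assignment.append(best)
        queued[best] += 1
    return assignment

num_requests = 4
print("assignments to search:", len(NODES) ** num_requests)
print("brute force total:", brute_force(num_requests))
print("greedy total:", total_latency(greedy(num_requests)))
```

The greedy pass does work proportional to n times m, while brute force examines every one of the m^n assignments. On this small instance with identical requests the two totals agree. With heterogeneous requests or dependencies between microservices, a heuristic of this kind would generally give up some optimality in exchange for tractable running time.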

Claims and Evidence

Claim	Evidence	Verdict
Serverless abstractions can extend to edge computing	Multiple systems demonstrate edge serverless (Loconte, Ekko, SoCL)	✅ Supported
Decentralized scheduling outperforms centralized at edge scale	Ekko demonstrates lower scheduling latency and better fault tolerance	✅ Supported
LLM inference is feasible in edge serverless	Farahani & Prodan describe architecture; deployment validation limited	⚠️ Architecturally feasible
Edge serverless achieves comparable developer experience to cloud	Developer-facing APIs are similar; operational complexity remains higher	⚠️ Partially achieved
Current edge hardware supports production serverless workloads	Resource constraints remain significant for complex AI workloads	⚠️ Hardware-dependent

Open Questions

Cold start problem at the edge: Serverless functions incur latency when a new container must be initialized (cold start). At the edge, where resources are constrained and pre-warming is expensive, cold starts may dominate total latency. How do we minimize cold starts without wasting scarce edge resources?
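
A back-of-the-envelope calculation shows how cold starts can dominate even when they are infrequent. The figures in this sketch (20 ms of execution, an 800 ms cold start, 10% of invocations cold) are hypothetical and not taken from the cited papers, and the function names are introduced here for illustration.

```python
def expected_latency_ms(exec_ms: float, cold_start_ms: float, cold_fraction: float) -> float:
    """Mean latency when a fraction of invocations pays the cold-start penalty."""
    if not 0.0 <= cold_fraction <= 1.0:
        raise ValueError("cold_fraction must be between 0 and 1")
    return exec_ms + cold_fraction * cold_start_ms

def cold_start_share(exec_ms: float, cold_start_ms: float, cold_fraction: float) -> float:
    """Fraction of mean latency attributable to cold starts."""
    mean = expected_latency_ms(exec_ms, cold_start_ms, cold_fraction)
    return (cold_fraction * cold_start_ms) / mean

# Hypothetical edge function: fast to run, slow to initialize.
mean = expected_latency_ms(exec_ms=20.0, cold_start_ms=800.0, cold_fraction=0.1)
share = cold_start_share(exec_ms=20.0, cold_start_ms=800.0, cold_fraction=0.1)
print(f"mean latency: {mean:.0f} ms, cold-start share: {share:.0%}")
```

Under these assumed numbers, one cold invocation in ten accounts for about 80% of mean latency. Pre-warming shrinks `cold_fraction`, but it does so by holding memory that a constrained edge node may not be able to spare.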

State management: Serverless functions are stateless by design. But many edge AI workloads require state—model parameters, session context, accumulated sensor history. How do we manage state in a distributed, unreliable edge environment?

Energy efficiency: Edge devices often run on limited power (batteries, solar). The energy cost of serverless orchestration overhead—scheduling, container management, network communication—must be justified by the benefits. What is the energy budget for edge serverless?

Security at the edge: Edge devices operate in physically insecure environments (factories, vehicles, public spaces). Serverless functions executing on potentially compromised edge hardware face threats that cloud data centers do not. How do we ensure confidentiality and integrity of serverless computations at the edge?

Multi-tenancy: Can edge devices serve multiple tenants (applications, users) with appropriate isolation? Cloud serverless achieves this through containerization and virtualization, but edge hardware may not have the resources for full isolation.

What This Means for Your Research

For systems researchers, the cloud-edge continuum is rich with open problems that combine distributed systems, optimization, and AI. The intersection of serverless abstraction with resource-constrained edge environments creates design tensions that do not exist in either pure cloud or pure edge research.

For AI deployment engineers, the practical message is that edge inference is not just a compression problem (making models smaller)—it is also an orchestration problem (deciding where to run which computation). The serverless paradigm offers a path to managing this complexity, but the tools are still maturing.

For the wider computing community, the trajectory toward edge serverless reflects a larger architectural evolution: computing is moving from centralized models (mainframe → cloud) to distributed ones (edge → continuum). Understanding the systems challenges of this transition is relevant far beyond the specific context of serverless computing.

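Disclaimer: This post is an overview of research trends provided for informational purposes. Verify specific findings, statistics, and claims against the original papers before citing them in scholarly work.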

References (4)

[1] Loconte, D., Ieva, S., Gramegna, F. et al. (2025). Serverless Microservice Architecture for Cloud-Edge Intelligence in Sensor Networks. IEEE JSEN.

[2] Farahani, R. & Prodan, R. (2025). Serverless Orchestration on Edge-Cloud Continuum: From Small Functions to Large Language Models. IEEE ICDCSW.

[3] Chen, X., Paidiparthy, M., Da Silva, D. (2025). Ekko: Fully Decentralized Scheduling for Serverless Edge Computing. IEEE IPDPS.

[4] Lu, S., Xiang, B., Wu, J. et al. (2025). SoCL: Scalable and Latency-Optimized Microservices in Serverless Edge Computing. IEEE CLUSTER.
