Hello, I'm
Dawei Liu
About
Master’s student in Computer and Information Science at the University of Pennsylvania (2026), with a B.E. in Software Engineering from Northeastern University. I work across large-scale ads systems, AI infrastructure, and research on agents and multimodal learning.
Recently, I worked at TikTok (Shop Ads), where I joined as an SDE Intern and received a full-time Software Engineer return offer within 12 weeks. During those eight months, I worked on closed-loop ads delivery, image selection with exploration–exploitation strategies, multimodal LLM-based image understanding, GenAI image integration, and recommendation infrastructure optimization.
Previously, I contributed to observability frameworks as an SDE Intern at Amazon. I also optimized AI computing platform infra and AIGC marketing workflows as a Backend SDE Intern at JD.com.
I’m passionate about building scalable, efficient, and intelligent systems at the intersection of AI infrastructure and engineering excellence, solving complex problems that push the boundaries of system performance and reliability.
📚 Publications

Graph of Skills: Dependency-Aware Structural Retrieval for Massive Agent Skills
Skill usage has become a core component of modern agent systems and can substantially improve agents' ability to complete complex tasks. In real-world settings, where agents must monitor and interact with numerous personal applications, web browsers, and other environment interfaces, skill libraries can scale to thousands of reusable skills. Scaling to larger skill sets introduces a key challenge: loading the full skill set saturates the context window, driving up token costs, hallucinations, and latency. In this paper, we present Graph of Skills (GoS), an inference-time structural retrieval layer for large skill libraries. GoS constructs an executable skill graph offline from skill packages, then at inference time retrieves a bounded, dependency-aware skill bundle through hybrid semantic-lexical seeding, reverse-weighted Personalized PageRank, and context-budgeted hydration. On SkillsBench and ALFWorld, GoS improves average reward by 43.6% over the vanilla full skill-loading baseline while reducing input tokens by 37.8%, and generalizes across three model families: Claude Sonnet, GPT-5.2 Codex, and MiniMax. Additional ablation studies across skill libraries ranging from 200 to 2,000 skills further demonstrate that GoS consistently outperforms both vanilla skill loading and simple vector retrieval in balancing reward, token efficiency, and runtime.
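The retrieval step named in the abstract can be illustrated with a minimal Personalized PageRank sketch over a skill dependency graph. The graph shape, edge direction, seed weights, and skill names below are illustrative assumptions, not the paper's implementation.

```python
# Sketch of Personalized PageRank over a skill dependency graph.
# Edges are reversed (skill -> its dependencies) so that importance
# flows from seeded skills to the skills they require.

def personalized_pagerank(graph, seeds, alpha=0.15, iters=50):
    """graph: {node: [dependency nodes]}; seeds: {node: seed weight}.
    alpha is the restart probability back to the seed distribution."""
    total = sum(seeds.values())
    restart = {n: seeds.get(n, 0.0) / total for n in graph}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: alpha * restart[n] for n in graph}
        for node, deps in graph.items():
            if not deps:
                nxt[node] += (1 - alpha) * rank[node]  # keep mass at sinks
                continue
            share = (1 - alpha) * rank[node] / len(deps)
            for dep in deps:
                nxt[dep] += share
        rank = nxt
    return rank

# Hypothetical library: "scrape" depends on "http" and "parse".
graph = {"search": ["http"], "scrape": ["http", "parse"],
         "http": [], "parse": []}
seeds = {"scrape": 1.0}  # semantic-lexical seeding would set these
scores = personalized_pagerank(graph, seeds)
bundle = sorted(scores, key=scores.get, reverse=True)[:3]
```

A budgeted hydration step would then load only the skills in `bundle`, keeping the prompt within a fixed token budget.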

Multimodal Video Generation Models with Audio: Present and Future
Video generation models have advanced rapidly and are now widely used across entertainment, advertising, filmmaking, and robotics applications such as world modeling and simulation. However, visual content alone is often insufficient for realistic and engaging media experiences—audio is also a key component of immersion and semantic coherence. As AI-generated videos become increasingly prevalent in everyday content, demand has grown for systems that can generate synchronized sound alongside visuals. This trend has driven rising interest in multimodal video generation, which jointly models video and audio to produce more complete, coherent, and appealing outputs. Since late 2025, a wave of multimodal video generation models has emerged, with releases including Veo 3.1, Sora 2, Kling 2.6, Wan 2.6, OVI, and LTX 2. As multimodal generation technology advances, its impact expands across both daily consumer and industrial domains—revolutionizing daily entertainment while enabling more sophisticated world simulation for training embodied AI systems. In this paper, we provide a comprehensive overview of the multimodal video generation model literature covering the major topics: evolution and common architectures of multimodal video generation models; common post-training methods and evaluation; applications and active research areas of video generation; limitations and challenges of multimodal video generation.

A Cookbook of 3D Vision: Data, Learning Paradigms, and Application
3D vision has rapidly evolved, driven by increasingly diverse data representations, learning paradigms, and modeling strategies. Yet the field remains fragmented across representations and benchmarks, making it difficult to develop unified perspectives on efficiency, fidelity, and scalability. This work provides a data-centric taxonomy of 3D vision that connects geometric representations, datasets, learning frameworks, and applications within a single conceptual map. We begin by surveying the principal structural representations of 3D data—point clouds, meshes, voxels, and 3D Gaussians—along with their acquisition pipelines. We then examine how dataset design, benchmark construction, and supervision regimes shape recent advances, spanning 2D-supervised 3D learning, implicit neural representations, and 4D world modeling. Through this integrative lens, we clarify the relationships among representations, learning paradigms, and downstream tasks in reconstruction, generation, and video modeling, offering a consolidated view of emerging trends toward balancing efficiency and fidelity and toward multimodal geometric grounding.

TIMEDB: tumor immune micro-environment cell composition database with automatic analysis and interactive visualization
Deciphering the cell-type composition in the tumor immune microenvironment (TIME) can significantly increase the efficacy of cancer treatment and improve the prognosis of cancer. Such a task has benefited from microarrays and RNA sequencing technologies, resulting in extensive expression profiles with clinical phenotypes across multiple cancers. Current tools infer cell-type composition from bulk expression profiles, enabling investigation of inter- and intra-heterogeneity of TIME across cancer types. TIMEDB is an online database for human TIME cell-type composition estimated from bulk expression profiles, storing curated expression and composition profiles with clinical information for 39,706 samples from 546 datasets across 43 cancer types, equipped with online tools for automatic analysis and interactive visualization.
🎓 Education
University of Pennsylvania
Aug 2024 – May 2026 · M.S.E. · Computer and Information Science
Northeastern University
Sep 2020 – Jul 2024 · B.E. · Software Engineering
💼 Professional Experience
- Software Engineer Intern, TikTok (2026 FTE Return Offer)
Shop Ads Team | Seattle, WA
May 2025 – Dec 2025
I worked on the intelligence layer behind the effectiveness of TikTok Shop Ads' PSA Carousel. By designing the Image Selection system with posterior feature modeling, exploration–exploitation ranking, and multimodal LLM-based quality scoring, I helped the platform consistently surface high-performing creatives, resulting in a 20%+ ad revenue uplift. I also integrated GenAI enhancement and generation pipelines, enabling automated creative production for the top 90% of cost-driving products and significantly expanding high-quality supply for advertisers.
Beyond image selection work, I contributed to the core delivery foundation that supports global ad serving. I built the Modular Preview Flow, a flexible injection and verification framework across the entire delivery funnel (Ad → Creative → SPU → Image). This unified filtering log and stage-level previewing capability accelerated debugging, increased format rollout confidence, and enabled smoother expansion into new regions and surfaces. I also delivered Flink-based real-time features and end-to-end creative sync workflows (TBase → Forward Index), strengthening the reliability of ads delivery pipelines.
To ensure the Shop Ads Carousel could withstand growing global traffic (300k+ QPS), I focused on engineering excellence: redesigning the product handler's caching architecture, introducing async batch fetching with Folly Futures, and shifting product-value computations offline. These improvements reduced p99 latency by 43.6% and eliminated 80% of failure spikes during high-traffic surges, directly enhancing the stability and resilience of the ads serving stack.
- Built the Shop Ads Image Selection system, modeling image posterior features, applying exploration–exploitation ranking, and integrating multimodal LLM-based image quality evaluation to serve high-performing images, driving a 20%+ revenue uplift.
- Integrated GenAI image enhancement and generation capabilities, producing assets for all products via the delivery stream and via weekly scheduled jobs covering the top 90% of cost-driving products; persisted assets in TBase and synced them to the Forward Index via Flink.
- Built Modular Preview Flow, a framework enabling stage-level entity injection (Ad, Creative, SPU, Image) across the delivery funnel with unified filtering log, improving debugging and verification efficiency for new ad formats and region rollouts.
- Optimized product handler stability and latency by redesigning the local cache, applying async batch fetching with Folly Futures, and migrating online product-value calls offline, reducing p99 latency by 43.6% and failure spikes by 80% in high-traffic scenarios.
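The exploration–exploitation ranking in the bullets above can be sketched as Thompson sampling over per-image click-through posteriors. The Beta-posterior model, field names, and numbers here are illustrative assumptions, not the production system.

```python
import random

def select_image(candidates, rng=random):
    """candidates: {image_id: (clicks, impressions)}.
    Thompson sampling: draw from each image's Beta posterior over CTR
    and serve the argmax, so unproven images still receive traffic."""
    best_id, best_draw = None, -1.0
    for image_id, (clicks, impressions) in candidates.items():
        # Beta(1 + clicks, 1 + non-clicks): uniform prior plus observed stats.
        draw = rng.betavariate(1 + clicks, 1 + impressions - clicks)
        if draw > best_draw:
            best_id, best_draw = image_id, draw
    return best_id

# Hypothetical stats: a proven image, a noisy one, and a cold-start one.
stats = {"img_a": (120, 1000), "img_b": (3, 10), "img_c": (0, 0)}
chosen = select_image(stats)
```

In a production setting the posterior would typically blend observed stats with model-predicted priors (e.g. the multimodal LLM quality score) rather than starting from a uniform prior.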
- Software Development Engineer Intern, Amazon (2026 FTE Return Offer)
Global Mile Team | Beijing, China
Jun 2024 - Sep 2024
At Amazon, I developed a custom Java Agent to extend OpenTelemetry’s tracing, enabling end-to-end observability across microservices and Lambda environments. I built full-stack tools for trace visualization and implemented a Loosely Linked module that surfaced hidden cross-service relationships, improved on-call tracing clarity, and enabled reliable instrumentation in heterogeneous runtime environments.
- Developed a Java Agent that extends OpenTelemetry. Leveraged ByteBuddy to enhance methods annotated with @WithSpan, @Input, and @Output, enabling automatic tracing and payload collection.
- Implemented a Loosely Linked Tracing module that reconstructs cross-service call chains via business IDs and timestamps, enabling trace aggregation even when intermediate services lack instrumentation (e.g., MQ hops or partial service onboarding).
- Extended the Java Agent to support both AWS Fargate and Lambda, using reflection-based runtime detection to adapt data delivery via Kinesis (high-throughput) or SQS (event-driven tasks), ensuring reliability and efficiency across environments.
- Built a full-stack telemetry console with React frontend and Java backend, supporting flexible querying (filters, aggregation, fuzzy search, pagination) and multi-view trace visualization (tree, table, timeline, and span payloads).
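The loosely linked tracing idea above can be sketched as grouping spans by a shared business identifier and ordering them by timestamp when no propagated trace context exists. The span shape and field names below are assumptions for illustration.

```python
from collections import defaultdict

def link_spans(spans):
    """spans: list of dicts with 'business_id', 'service', 'ts_ms'.
    Returns {business_id: [spans in approximate causal order]}.
    Works even when intermediate hops (e.g. an MQ) carry no trace
    context, as long as they log the business ID."""
    chains = defaultdict(list)
    for span in spans:
        chains[span["business_id"]].append(span)
    for chain in chains.values():
        chain.sort(key=lambda s: s["ts_ms"])  # timestamp as causal proxy
    return dict(chains)

spans = [
    {"business_id": "order-42", "service": "billing", "ts_ms": 1200},
    {"business_id": "order-42", "service": "gateway", "ts_ms": 1000},
    {"business_id": "order-42", "service": "mq-consumer", "ts_ms": 1500},
]
chains = link_spans(spans)
```

Timestamp ordering is only an approximation of causality under clock skew, which is why this linkage is "loose" rather than a substitute for propagated trace context.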
- Software Development Engineer Intern, JD.com
Algorithm Tools Team | Beijing, China
Jul 2023 - Oct 2023
At JD.com, I worked on platform engineering for internal AI tooling. I redesigned a resource management service using Kubernetes' Informer + observer pattern, reducing start-up time by 20x. I introduced GitOps + Argo Workflows for cloud-native CI/CD, built Helm charts for privatized deployments, and improved code modularity for activity page generation using AIGC pipelines. My work enabled faster and more maintainable delivery of algorithmic components.
- Redesigned resource management service with ConfigMap-based automation and Kubernetes Informers; introduced async resource recalculation, observer pattern, and sharded row locking, cutting service cold-start time by 20×.
- Refactored campaign page generation service with the Strategy pattern, improving module reusability and maintainability.
- Designed AIGC-integrated generation pipelines for automated creation of campaign page sections, reducing manual workload.
- Enhanced CI/CD pipelines with cloud-native GitOps workflows built on Argo Workflows and Argo CD, improving automation.
- Developed Helm charts for multi-tenant deployments, enabling client-specific delivery in hybrid-cloud environments.
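The informer-plus-observer shape described above can be illustrated with a minimal sketch: a watcher publishes resource-change events to registered observers, so state is updated incrementally instead of via a full scan at cold start. The class and field names are hypothetical.

```python
# Minimal observer-pattern sketch: subscribers react to change events
# pushed by a watcher, mirroring the Kubernetes informer model.

class ResourceWatcher:
    def __init__(self):
        self._observers = []

    def subscribe(self, observer):
        self._observers.append(observer)

    def publish(self, event):
        # Fan each change event out to every registered observer.
        for observer in self._observers:
            observer(event)

class QuotaRecalculator:
    """Keeps per-resource usage up to date from deltas alone."""
    def __init__(self):
        self.usage = {}

    def __call__(self, event):
        self.usage[event["resource"]] = event["value"]

watcher = ResourceWatcher()
recalc = QuotaRecalculator()
watcher.subscribe(recalc)
watcher.publish({"resource": "gpu", "value": 8})
```

In the real service the watcher role is played by a Kubernetes Informer's event handlers, with sharded locking around the shared state so concurrent events do not contend on one lock.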
🛠️ Tech Stack
General
Scripting
Data & Markup
Frameworks
RPC / IDL
Storage
Messaging
Observability
Web
DataViz & UI
iOS
Frameworks
RecSys
LLM
Experimentation
Engine & Shader
Toolkits
Platforms
Observability & Ops
AWS
Dev
Docs & Diagrams
💬 Let’s Connect
I'm interested in ads delivery, recommendation systems, AI infra, and engineering at scale. I'm always happy to connect; reach out via the links above or by email.
