Hello, I'm
Dawei Liu
About
Master’s student in Computer and Information Science at the University of Pennsylvania (2026), with a B.E. in Software Engineering from Northeastern University. I work across large-scale ads systems, AI infrastructure, and research on agents and multimodal learning.
Recently, I worked at TikTok (Shop Ads), where I joined as an SDE Intern and received a full-time Software Engineer return offer within 12 weeks. During those eight months, I worked on closed-loop ads delivery, image selection with exploration–exploitation strategies, multimodal LLM-based image understanding, GenAI image integration, and recommendation infrastructure optimization.
Previously, I contributed to observability frameworks as an SDE Intern at Amazon. I also optimized AI computing platform infra and AIGC marketing workflows as a Backend SDE Intern at JD.com.
I’m passionate about building scalable, efficient, and intelligent systems at the intersection of AI infrastructure and engineering excellence, solving complex problems that push the boundaries of system performance and reliability.
📚 Publications

Graph of Skills: Dependency-Aware Structural Retrieval for Massive Agent Skills
Skill usage has become a core component of modern agent systems and can substantially improve agents' ability to complete complex tasks. In real-world settings, where agents must monitor and interact with numerous personal applications, web browsers, and other environment interfaces, skill libraries can scale to thousands of reusable skills. Scaling to larger skill sets introduces a key challenge: loading the full skill set saturates the context window, driving up token costs, hallucinations, and latency. In this paper, we present Graph of Skills (GoS), an inference-time structural retrieval layer for large skill libraries. GoS constructs an executable skill graph offline from skill packages, then at inference time retrieves a bounded, dependency-aware skill bundle through hybrid semantic-lexical seeding, reverse-weighted Personalized PageRank, and context-budgeted hydration. On SkillsBench and ALFWorld, GoS improves average reward by 43.6% over the vanilla full skill-loading baseline while reducing input tokens by 37.8%, and generalizes across three model families: Claude Sonnet, GPT-5.2 Codex, and MiniMax. Additional ablation studies across skill libraries ranging from 200 to 2,000 skills further demonstrate that GoS consistently outperforms both vanilla skill loading and simple vector retrieval in balancing reward, token efficiency, and runtime.
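The retrieval step named in the abstract can be illustrated with a minimal Personalized PageRank sketch over a skill dependency graph. The graph shape, edge direction, seed weights, and skill names below are illustrative assumptions, not the paper's implementation.

```python
# Sketch of Personalized PageRank over a skill dependency graph.
# Edges are reversed (skill -> its dependencies) so that importance
# flows from seeded skills to the skills they require.

def personalized_pagerank(graph, seeds, alpha=0.15, iters=50):
    """graph: {node: [dependency nodes]}; seeds: {node: seed weight}.
    alpha is the restart probability back to the seed distribution."""
    total = sum(seeds.values())
    restart = {n: seeds.get(n, 0.0) / total for n in graph}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: alpha * restart[n] for n in graph}
        for node, deps in graph.items():
            if not deps:
                nxt[node] += (1 - alpha) * rank[node]  # keep mass at sinks
                continue
            share = (1 - alpha) * rank[node] / len(deps)
            for dep in deps:
                nxt[dep] += share
        rank = nxt
    return rank

# Hypothetical library: "scrape" depends on "http" and "parse".
graph = {"search": ["http"], "scrape": ["http", "parse"],
         "http": [], "parse": []}
seeds = {"scrape": 1.0}  # semantic-lexical seeding would set these
scores = personalized_pagerank(graph, seeds)
bundle = sorted(scores, key=scores.get, reverse=True)[:3]
```

A budgeted hydration step would then load only the skills in `bundle`, keeping the prompt within a fixed token budget.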

Multimodal Video Generation Models with Audio: Present and Future
Video generation models have advanced rapidly and are now widely used across entertainment, advertising, filmmaking, and robotics applications such as world modeling and simulation. However, visual content alone is often insufficient for realistic and engaging media experiences—audio is also a key component of immersion and semantic coherence. As AI-generated videos become increasingly prevalent in everyday content, demand has grown for systems that can generate synchronized sound alongside visuals. This trend has driven rising interest in multimodal video generation, which jointly models video and audio to produce more complete, coherent, and appealing outputs. Since late 2025, a wave of multimodal video generation models has emerged, with releases including Veo 3.1, Sora 2, Kling 2.6, Wan 2.6, OVI, and LTX 2. As multimodal generation technology advances, its impact expands across both daily consumer and industrial domains—revolutionizing daily entertainment while enabling more sophisticated world simulation for training embodied AI systems. In this paper, we provide a comprehensive overview of the multimodal video generation model literature covering the major topics: evolution and common architectures of multimodal video generation models; common post-training methods and evaluation; applications and active research areas of video generation; limitations and challenges of multimodal video generation.

A Cookbook of 3D Vision: Data, Learning Paradigms, and Application
3D vision has rapidly evolved, driven by increasingly diverse data representations, learning paradigms, and modeling strategies. Yet the field remains fragmented across representations and benchmarks, making it difficult to develop unified perspectives on efficiency, fidelity, and scalability. This work provides a data-centric taxonomy of 3D vision that connects geometric representations, datasets, learning frameworks, and applications within a single conceptual map. We begin by surveying the principal structural representations of 3D data—point clouds, meshes, voxels, and 3D Gaussians—along with their acquisition pipelines. We then examine how dataset design, benchmark construction, and supervision regimes shape recent advances, spanning 2D-supervised 3D learning, implicit neural representations, and 4D world modeling. Through this integrative lens, we clarify the relationships among representations, learning paradigms, and downstream tasks in reconstruction, generation, and video modeling, offering a consolidated view of emerging trends toward balancing efficiency and fidelity and toward multimodal geometric grounding.

TIMEDB: tumor immune micro-environment cell composition database with automatic analysis and interactive visualization
Deciphering the cell-type composition in the tumor immune microenvironment (TIME) can significantly increase the efficacy of cancer treatment and improve the prognosis of cancer. Such a task has benefited from microarrays and RNA sequencing technologies, resulting in extensive expression profiles with clinical phenotypes across multiple cancers. Current tools infer cell-type composition from bulk expression profiles, enabling investigation of inter- and intra-heterogeneity of TIME across cancer types. TIMEDB is an online database for human TIME cell-type composition estimated from bulk expression profiles, storing curated expression and composition profiles with clinical information for 39,706 samples from 546 datasets across 43 cancer types, equipped with online tools for automatic analysis and interactive visualization.
🎓 Education
University of Pennsylvania
Aug 2024 – May 2026 · M.S.E. · Computer and Information Science
Northeastern University
Sep 2020 – Jul 2024 · B.E. · Software Engineering
💼 Professional Experience
- Software Engineer Intern, TikTok (2026 FTE Return Offer)
Shop Ads Team | Seattle, WA
May 2025 – Dec 2025
I worked on the intelligence layer behind the effectiveness of TikTok Shop Ads' PSA Carousel. By designing the Image Selection system with posterior feature modeling, exploration–exploitation ranking, and multimodal LLM-based quality scoring, I helped the platform consistently surface high-performing creatives, resulting in a 20%+ ad revenue uplift. I also integrated GenAI enhancement and generation pipelines, enabling automated creative production for the top 90% of cost-driving products and significantly expanding high-quality supply for advertisers.
Beyond image selection work, I contributed to the core delivery foundation that supports global ad serving. I built the Modular Preview Flow, a flexible injection and verification framework across the entire delivery funnel (Ad → Creative → SPU → Image). This unified filtering log and stage-level previewing capability accelerated debugging, increased format rollout confidence, and enabled smoother expansion into new regions and surfaces. I also delivered Flink-based real-time features and end-to-end creative sync workflows (TBase → Forward Index), strengthening the reliability of ads delivery pipelines.
To ensure the Shop Ads Carousel could withstand growing global traffic (300k+ QPS), I focused on engineering excellence: redesigning the product handler's caching architecture, introducing async batch fetching with Folly Futures, and shifting product-value computations offline. These improvements reduced p99 latency by 43.6% and eliminated 80% of failure spikes during high-traffic surges, directly enhancing the stability and resilience of the ads serving stack.
- Built the Shop Ads Image Selection system, modeling image posterior features, applying exploration–exploitation ranking, and integrating multimodal LLM-based image quality evaluation to serve high-performing images, driving a 20%+ revenue uplift.
- Integrated GenAI image enhancement and generation capabilities, producing assets for all products via the delivery stream and via weekly scheduled jobs covering the top 90% of cost-driving products; persisted assets in TBase and synced them to the Forward Index via Flink.
- Built Modular Preview Flow, a framework enabling stage-level entity injection (Ad, Creative, SPU, Image) across the delivery funnel with unified filtering log, improving debugging and verification efficiency for new ad formats and region rollouts.
- Optimized product handler stability and latency by redesigning the local cache, applying async batch fetching with Folly Futures, and migrating online product-value calls offline, reducing p99 latency by 43.6% and failure spikes by 80% in high-traffic scenarios.
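The exploration–exploitation ranking in the bullets above can be sketched as Thompson sampling over per-image click-through posteriors. The Beta-posterior model, field names, and numbers here are illustrative assumptions, not the production system.

```python
import random

def select_image(candidates, rng=random):
    """candidates: {image_id: (clicks, impressions)}.
    Thompson sampling: draw from each image's Beta posterior over CTR
    and serve the argmax, so unproven images still receive traffic."""
    best_id, best_draw = None, -1.0
    for image_id, (clicks, impressions) in candidates.items():
        # Beta(1 + clicks, 1 + non-clicks): uniform prior plus observed stats.
        draw = rng.betavariate(1 + clicks, 1 + impressions - clicks)
        if draw > best_draw:
            best_id, best_draw = image_id, draw
    return best_id

# Hypothetical stats: a proven image, a noisy one, and a cold-start one.
stats = {"img_a": (120, 1000), "img_b": (3, 10), "img_c": (0, 0)}
chosen = select_image(stats)
```

In a production setting the posterior would typically blend observed stats with model-predicted priors (e.g. the multimodal LLM quality score) rather than starting from a uniform prior.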
- Software Development Engineer Intern, Amazon (2026 FTE Return Offer)
Global Mile Team | Beijing, China
Jun 2024 - Sep 2024
At Amazon, I developed a custom Java Agent to extend OpenTelemetry’s tracing, enabling end-to-end observability across microservices and Lambda environments. I built full-stack tools for trace visualization and implemented a Loosely Linked module that surfaced hidden cross-service relationships, improved on-call tracing clarity, and enabled reliable instrumentation in heterogeneous runtime environments.
- Developed a Java Agent that extends OpenTelemetry. Leveraged ByteBuddy to enhance methods annotated with @WithSpan, @Input, and @Output, enabling automatic tracing and payload collection.
- Implemented a Loosely Linked Tracing module that reconstructs cross-service call chains via business IDs and timestamps, enabling trace aggregation even when intermediate services lack instrumentation (e.g., MQ hops or partial service onboarding).
- Extended the Java Agent to support both AWS Fargate and Lambda, using reflection-based runtime detection to adapt data delivery via Kinesis (high-throughput) or SQS (event-driven tasks), ensuring reliability and efficiency across environments.
- Built a full-stack telemetry console with React frontend and Java backend, supporting flexible querying (filters, aggregation, fuzzy search, pagination) and multi-view trace visualization (tree, table, timeline, and span payloads).
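The loosely linked tracing idea above can be sketched as grouping spans by a shared business identifier and ordering them by timestamp when no propagated trace context exists. The span shape and field names below are assumptions for illustration.

```python
from collections import defaultdict

def link_spans(spans):
    """spans: list of dicts with 'business_id', 'service', 'ts_ms'.
    Returns {business_id: [spans in approximate causal order]}.
    Works even when intermediate hops (e.g. an MQ) carry no trace
    context, as long as they log the business ID."""
    chains = defaultdict(list)
    for span in spans:
        chains[span["business_id"]].append(span)
    for chain in chains.values():
        chain.sort(key=lambda s: s["ts_ms"])  # timestamp as causal proxy
    return dict(chains)

spans = [
    {"business_id": "order-42", "service": "billing", "ts_ms": 1200},
    {"business_id": "order-42", "service": "gateway", "ts_ms": 1000},
    {"business_id": "order-42", "service": "mq-consumer", "ts_ms": 1500},
]
chains = link_spans(spans)
```

Timestamp ordering is only an approximation of causality under clock skew, which is why this linkage is "loose" rather than a substitute for propagated trace context.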
- Software Development Engineer Intern, JD.com
Algorithm Tools Team | Beijing, China
Jul 2023 - Oct 2023
At JD.com, I worked on platform engineering for internal AI tooling. I redesigned a resource management service using Kubernetes' Informer + observer pattern, reducing start-up time by 20x. I introduced GitOps + Argo Workflows for cloud-native CI/CD, built Helm charts for privatized deployments, and improved code modularity for activity page generation using AIGC pipelines. My work enabled faster and more maintainable delivery of algorithmic components.
- Redesigned resource management service with ConfigMap-based automation and Kubernetes Informers; introduced async resource recalculation, observer pattern, and sharded row locking, cutting service cold-start time by 20×.
- Refactored campaign page generation service with the Strategy pattern, improving module reusability and maintainability.
- Designed AIGC-integrated generation pipelines for automated creation of campaign page sections, reducing manual workload.
- Enhanced CI/CD pipelines with cloud-native GitOps workflows built on Argo Workflows and Argo CD, improving automation.
- Developed Helm charts for multi-tenant deployments, enabling client-specific delivery in hybrid-cloud environments.
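The informer-plus-observer shape described above can be illustrated with a minimal sketch: a watcher publishes resource-change events to registered observers, so state is updated incrementally instead of via a full scan at cold start. The class and field names are hypothetical.

```python
# Minimal observer-pattern sketch: subscribers react to change events
# pushed by a watcher, mirroring the Kubernetes informer model.

class ResourceWatcher:
    def __init__(self):
        self._observers = []

    def subscribe(self, observer):
        self._observers.append(observer)

    def publish(self, event):
        # Fan each change event out to every registered observer.
        for observer in self._observers:
            observer(event)

class QuotaRecalculator:
    """Keeps per-resource usage up to date from deltas alone."""
    def __init__(self):
        self.usage = {}

    def __call__(self, event):
        self.usage[event["resource"]] = event["value"]

watcher = ResourceWatcher()
recalc = QuotaRecalculator()
watcher.subscribe(recalc)
watcher.publish({"resource": "gpu", "value": 8})
```

In the real service the watcher role is played by a Kubernetes Informer's event handlers, with sharded locking around the shared state so concurrent events do not contend on one lock.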
🛠️ Tech Stack
General
Scripting
Data & Markup
Frameworks
RPC / IDL
Storage
Messaging
Observability
Web
DataViz & UI
iOS
Frameworks
RecSys
LLM
Experimentation
Engine & Shader
Toolkits
Platforms
Observability & Ops
AWS
Dev
Docs & Diagrams
💬 Let’s Connect
I'm interested in ads delivery, recommendation systems, AI infra, and engineering at scale. I'm always happy to connect; reach out via the links above or by email.
