His research focuses primarily on data management in computer and distributed systems. With his students and collaborators, Zhang has published a number of papers on
algorithms and their system implementations, which have been adopted in mainstream operating systems and database systems, as well as in commercial processors, including
Sun Microsystems,
MySQL,
the BSD operating system,
Fusion Drive by
Apple, Geometric Performance Primitives (GPP) by
NVIDIA, and others.
• In 2000, together with Zhao Zhang and Zhichun Zhu, he identified a structural issue in data transfers between
CPU cache and
DRAM memory in existing computer architectures. Specifically, a conflict miss in the CPU cache would inevitably lead to a row buffer miss in DRAM, resulting in significant memory access delays. To address this problem, they proposed a permutation-based page interleaving method, which they published and presented at the
International Symposium on Microarchitecture (MICRO). This method influenced the
interleaved memory design and was quickly adopted in commercial computer products, first by
Sun Microsystems, and later by
AMD,
Intel, and
NVIDIA. Twenty years later, in 2020, the three authors were honored with the ACM Microarchitecture Test of Time Award for their high-impact work.
• In 2002, Song Jiang and Zhang published and presented their
LIRS cache replacement algorithm at the ACM
SIGMETRICS Conference. The LIRS algorithm addressed fundamental issues in the
LRU replacement algorithm. The LIRS algorithm, LIRS-like algorithms, and its approximation
Clock-Pro have been widely adopted in many data management production systems, including
MySQL Database,
H2 Database,
the key-value databases Cassandra,
RocksDB,
Memcached, in-memory data systems of GridGain (now
Ignite),
Infinispan,
Cloudera Impala,
Red Hat data grid,
Spark in data repository systems of
Apache Jackrabbit, and
Red Hat virtualization system. The LIRS algorithm has also influenced the replacement algorithm implementations in
operating systems, including
Berkeley Software Distribution (BSD) and
Linux. The LIRS approximation Clock-Pro has been made part of a
Rust library, as an open system utility. Recently, the LIRS concept has been used for cache block replacement in
Intel CPU cache. The method is called Re-Reference Interval Prediction (RRIP).
• In 2008, with Jiang Lin, Qingda Lu, Xiaoning Ding, Zhao Zhang, and P. Sadayappan, he published a paper on using the operating system to allocate pages in the
Last-Level-Cache (LLC) of
multicore processors to avoid cache conflicts among different running processes. The published methods, along with the open-source code in
Linux, have been adopted by
Intel.
• In 2011, with Rubao Lee and Yin Huai at
Ohio State, Namit Jain and Zheng Shao at
Facebook, and Yongqiang He and Zhi-Wei Xu at the
Chinese Academy of Sciences, he published and presented a paper on
RCFile at the
IEEE International Conference on Data Engineering (ICDE), defining an effective data storage format for databases and for
big data processing on large-scale
distributed systems. RCFile and its optimized version
Apache ORC have been widely adopted in many data systems, including
Apache Hive, Meta's
Data Lake,
Cloudera’s
Impala, and Amazon Athena and
S3. RCFile and ORC have also been adopted in commercial data systems from vendors including
IBM,
Microsoft,
Oracle,
SAS,
Teradata, and others.
• In 2011, Rubao Lee, Tian Luo, Yin Huai, Fusheng Wang, Yongqiang He, and Zhang published and presented their paper "YSmart: Yet another SQL-to-MapReduce translator" at the
International Conference on Distributed Computing Systems (ICDCS). YSmart automatically converts SQL queries into
MapReduce programs for execution. It has been adopted by
Apache Hive to help SQL users automatically generate their MapReduce programs.
• In 2011, with Feng Chen and David Koufaty, he published a paper titled "Hystor: making the best use of solid-state drives in high performance storage systems" at the ACM International Conference on Supercomputing (ICS). Hystor is a design and implementation in
Linux of a hybrid storage system combining
hard disk drives (HDDs) and
solid-state drives (SSDs), which influenced
Apple's hybrid storage product
Fusion Drive.
• In 2012, the PixelBox algorithm and its GPU implementation, from a paper on accelerating pathology image data processing that he wrote with a group of researchers at both Ohio State and Emory University Medical School, were included in NVIDIA Developer
Geometric Performance Primitives.
• In 2013, with a group of researchers at both
Ohio State and
Emory University Medical School, he published a paper titled "Hadoop-GIS: a high-performance spatial data warehousing system over MapReduce" at the International Conference on Very Large Data Bases. The Hadoop-GIS open-source software was released in 2011. This work initiated the development of a new
spatial data analytical ecosystem characterized by its large-scale capacity in both computing and data storage, high scalability, and compatibility with low-cost commodity processors in clusters and with open-source software. After more than a decade of research and development, this ecosystem has matured and is now serving many applications across various fields. The authors of the Hadoop-GIS paper received the 2024 VLDB Endowment Test of Time Award.

A major theme of his work involves designing algorithms and systems for practical applications running in production systems and contributing to the development of computer systems.

== Awards and honors ==