WWW2026
Multi-source Multi-level Multi-token Ethereum Dataset and Benchmark Platform
Haoyuan Li, Mengxiao Zhang, Maoyuan Li, Jianzheng Li, Zijian Zhang, Junyi Yang, Shuangyan Deng, Jiamou Liu
Cited by 1
Abstract
The rapid growth of the cryptocurrency ecosystem has intensified the need for multimodal data that captures the interplay among user behavior, market dynamics, and public sentiment. Existing public datasets remain fragmented and cannot support such integrated analysis. We present 3MEthTaskforce, a large-scale multimodal Ethereum dataset that integrates 303 million transactions across 3,880 tokens, token histories, global market indicators, and LLM-annotated Reddit sentiment from 2014 to 2024, all aligned on a daily timeline. The dataset enables tasks such as user behavior prediction and market trend modeling. This paper reports benchmark results for user behavior prediction, while the project website provides additional evaluations. The dataset is publicly available via Figshare, where it has been downloaded more than four hundred times, and the codebase is released at https://github.com/Haoyuan-Li-UoA/3MEthTaskforce.