The Community Labeling and Sharing of Security and Networking Test datasets (CLASSNET) project will provide new, labeled, rich and diverse datasets to the research community to support network and security research. The project will develop a framework for collaborative, community-driven enrichment and labeling of data, enabling use of these datasets for machine learning (ML) in networking and security. Furthermore, the CLASSNET project will make data available to researchers through multiple access methods that preserve the privacy of the data while enabling flexible computation on it. The project will also generate diverse continuous (constantly and automatically updated) and curated (selected by humans) datasets for research use.
The CLASSNET project will innovate along three dimensions: data labeling, data distribution and data sources. In data labeling, the CLASSNET collaborative framework will give researchers a low-friction way to share annotations. The framework will incentivize labeling with feedback mechanisms and user credits, and will support bulk, automatic, algorithmic labeling. In data distribution, CLASSNET will support multiple modes of data access, ranging from downloading anonymized data to processing data in the cloud, on provider machines or via the code-to-data approach. Finally, CLASSNET data sources will provide new, diverse, continuous, and curated datasets that are useful for network and security research, including traffic packets and flows, network telescope data, Domain Name System (DNS) data and Internet topology data.
The immediate impact of this project will be new types of labeled, curated and continuous datasets that enable new security, networking, and ML research and education across a large community. The broader impact of these datasets will be to foster research and education that make the Internet safer, more stable, and more secure, and that increase the community's knowledge about the Internet. Given the Internet's importance for telework, telemedicine, remote learning, e-commerce and e-government, these improvements will have a broad societal impact.
In addition, CLASSNET datasets will support data-driven exercises for graduate and undergraduate education, as well as new PhD research. The CLASSNET project's innovations in multiple pathways to data access, combined with its automated and incentivized enrichment framework, will advance the state of the art in responsible data sharing in related disciplines of information technology.