Researchers from Carnegie Mellon University and the University of California, Berkeley have developed a new method to improve how computers organize and analyze large data sets. The advance could make it easier to extract information from knowledge graphs, with implications for the analysis of social networks and customer behavior.
The new method, described in a study led by Benjamin Moseley, an associate professor of Carnegie Bosch Operations Research at Carnegie Mellon University’s Tepper School of Business, can more effectively group similar items together while keeping dissimilar items separate.
The paper will be presented at the International Conference on Automata, Languages and Programming (ICALP) in July 2024.
“Our new algorithms can significantly enhance how large data sets are analyzed, whether that’s to accurately detect user communities to improve social media platforms or to better understand gene interactions to advance medical research,” Moseley said.
He noted that a key trend in business analytics is the ability to work with knowledge graphs that show information such as customer behavior or business processes. The paper focuses on clustering, a common method for extracting information from these graphs.
Properly organizing vast amounts of data is difficult due to inconsistencies and the sheer volume of information. Moseley and his team focused on creating an algorithm that could quickly and accurately group data points. They used a graph, a mathematical structure that consists of nodes, which represent data points, and edges, which are connections between the nodes. The algorithm works by evaluating these connections and determining the best way to group similar nodes together.
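To make the idea of grouping nodes by their connections concrete, here is a minimal Python sketch. It is not the researchers' algorithm: it uses the classic randomized "Pivot" heuristic for correlation clustering as a stand-in, and the names `pivot_cluster`, `count_disagreements`, and `similar` are hypothetical, chosen only for illustration. It assumes every pair of nodes is labeled either similar or dissimilar.

```python
import random


def pivot_cluster(nodes, similar, seed=0):
    """Group nodes with the classic randomized Pivot heuristic.

    This is a stand-in illustration, not the method from the paper.
    nodes:   iterable of hashable labels (the data points)
    similar: set of frozenset({u, v}) pairs marked "similar";
             every other pair is treated as "dissimilar"
    """
    rng = random.Random(seed)
    remaining = list(nodes)
    rng.shuffle(remaining)  # random pivot order
    clusters = []
    while remaining:
        pivot = remaining[0]
        # The pivot takes every still-unassigned node it is similar to.
        cluster = [pivot] + [
            v for v in remaining[1:] if frozenset((pivot, v)) in similar
        ]
        clusters.append(cluster)
        members = set(cluster)
        remaining = [v for v in remaining if v not in members]
    return clusters


def count_disagreements(clusters, similar):
    """Count pairs the clustering gets wrong: similar pairs that were
    split apart, plus dissimilar pairs that were grouped together."""
    label = {v: i for i, c in enumerate(clusters) for v in c}
    nodes = list(label)
    mistakes = 0
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            together = label[u] == label[v]
            if together != (frozenset((u, v)) in similar):
                mistakes += 1
    return mistakes


# Two tight groups with no similar pairs between them.
similar = {frozenset(p) for p in [("a", "b"), ("a", "c"), ("b", "c"), ("d", "e")]}
clusters = pivot_cluster(["a", "b", "c", "d", "e"], similar)
print(clusters, count_disagreements(clusters, similar))
```

In this toy input the two groups are perfectly separated, so any pivot order recovers them with zero mistakes. Real data is noisier, and the number of mistakes is what a clustering algorithm tries to keep small.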
The results show that their algorithm is faster and more accurate than previous methods and can handle large data sets more efficiently, making it practical for real-world applications.
“Our new method is faster than any previous method at minimizing mistakes in grouping the data,” said Sami Davies, a research associate in theoretical computer science at the University of California, Berkeley. “Our method is more flexible in the sense that it can group data in ways that are suitable for many different purposes at the same time.”
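One way to read "many different purposes," going by the paper's title, is through ℓp norms of a per-node mistake count. The Python sketch below is an illustration under that assumption, not code from the paper; `disagreement_vector` and `lp_norm` are hypothetical names. For each node it counts how many of its pairs the clustering gets wrong, then summarizes those counts: the ℓ1 norm reflects total mistakes (each wrong pair is counted at both endpoints), while the ℓ∞ norm reflects the worst-off single node.

```python
def disagreement_vector(clusters, similar):
    """For each node, count the pairs involving it that the clustering
    gets wrong (similar but split, or dissimilar but grouped)."""
    label = {v: i for i, c in enumerate(clusters) for v in c}
    counts = {v: 0 for v in label}
    nodes = list(label)
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if (label[u] == label[v]) != (frozenset((u, v)) in similar):
                counts[u] += 1
                counts[v] += 1
    return counts


def lp_norm(values, p):
    """The l_p norm of non-negative counts; p = float('inf') gives the max."""
    values = list(values)
    if p == float("inf"):
        return max(values, default=0)
    return sum(x ** p for x in values) ** (1.0 / p)


# A "star": hub h is similar to four leaves, and the leaves are
# mutually dissimilar. No clustering of this input is mistake-free.
leaves = ["x1", "x2", "x3", "x4"]
similar = {frozenset(("h", x)) for x in leaves}

all_apart = [["h"]] + [[x] for x in leaves]
all_together = [["h"] + leaves]

for name, clusters in [("apart", all_apart), ("together", all_together)]:
    counts = disagreement_vector(clusters, similar).values()
    print(name, lp_norm(counts, 1), lp_norm(counts, float("inf")))
```

In this example, keeping every node separate gives the smaller ℓ1 value (8 versus 12), while grouping everything gives the smaller ℓ∞ value (3 versus 4). Different objectives can therefore favor different groupings, which is why a single clustering that does well under all of them at once is a meaningful goal.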
The researchers plan to continue refining the method and exploring its applications in different fields. This ongoing research may lead to even more accurate and insightful data analysis.
Heather Newman, a doctoral candidate in algorithms, combinatorics and optimization at the Tepper School, is also a co-author.
Further information: Sami Davies et al. “Simultaneous approximation of all ℓp norms in correlation clustering.” arXiv (2023). DOI: 10.48550/arxiv.2308.01534
Journal information: arXiv
Provided by: Tepper School of Business, Carnegie Mellon University
Citation: Study showcases new method to improve grouping in data analysis (July 22, 2024) Retrieved July 22, 2024 from https://techxplore.com/news/2024-07-showcases-method-grouping-analysis.html