Identifying forensic interesting files in digital forensic corpora by applying topic modelling
DP Joseph, J Norman - Advances in Distributed Computing and Machine …, 2021 - Springer
Advances in Distributed Computing and Machine Learning: Proceedings of ICADCML …, 2021 • Springer
Abstract
Cyber forensics is an emerging field concerned with identifying the culprits behind a cyber-attack. To carry out an investigation, the investigator must identify the device, back up its data, and analyse it. As cybercrime increases, so do the number of seized devices and the volume of data they hold, and this massive amount of data delays investigations significantly. To this day, many forensic investigators rely on regular expressions and keyword search to find evidence, which is the traditional approach. In traditional analysis, a query returns only the exact matches for that query and disregards all other results. The main disadvantage is that some sensitive files may not be returned by a query; in addition, all the data must be indexed before querying, which takes considerable manual effort and time. To overcome this, this research proposes a two-tier forensic framework that introduces topic modelling to identify latent topics and words. Existing approaches used latent semantic indexing (LSI), which suffers from the synonymy problem. To address this, this research introduces latent semantic analysis (LSA) to the digital forensics field and applies it to the authors' corpora, which contain 29.8 million files. The research yielded satisfactory results in terms of time and in identifying both uninteresting and interesting files. The paper also gives a fair comparison of forensic search techniques on digital corpora and reports that the proposed methodology outperforms the others.
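The contrast the abstract draws between exact keyword search and latent-topic search can be illustrated with a minimal sketch. This is not the authors' two-tier framework, whose details the abstract does not give; it assumes a tiny invented corpus (`DOCS`), hypothetical helper names (`keyword_search`, `lsa_similarities`), raw term counts, and the textbook truncated-SVD formulation of LSA computed with NumPy.

```python
import re
import numpy as np

# Hypothetical toy corpus standing in for text extracted from seized files
# (invented for illustration; not data from the paper).
DOCS = [
    "wire transfer bank account",  # 0
    "bank account payment",        # 1
    "payment bank invoice",        # 2
    "beach holiday photo",         # 3
    "holiday photo family",        # 4
]

def keyword_search(docs, term):
    """Traditional approach: indices of documents containing the exact term."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b")
    return [i for i, text in enumerate(docs) if pattern.search(text)]

def lsa_similarities(docs, query, k=2):
    """Cosine similarity between the query and each document in a
    k-dimensional latent space obtained by truncated SVD (textbook LSA)."""
    vocab = sorted({w for text in docs for w in text.split()})
    index = {w: i for i, w in enumerate(vocab)}

    # Term-document count matrix: rows are terms, columns are documents.
    A = np.zeros((len(vocab), len(docs)))
    for j, text in enumerate(docs):
        for w in text.split():
            A[index[w], j] += 1.0

    # Keep the k left singular vectors with the largest singular values.
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]

    # Project documents and the query into the same latent space.
    doc_vecs = (U_k.T @ A).T
    q = np.zeros(len(vocab))
    for w in query.split():
        if w in index:
            q[index[w]] += 1.0
    q_vec = U_k.T @ q

    # Guard against zero-length vectors before dividing.
    denom = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec)
    return doc_vecs @ q_vec / np.where(denom == 0, 1.0, denom)

keyword_hits = keyword_search(DOCS, "transfer")
similarities = lsa_similarities(DOCS, "transfer", k=2)
print("exact matches:", keyword_hits)
print("latent similarity:", np.round(similarities, 2))
```

On this toy corpus the exact search returns only document 0, whereas the latent-space scores also place documents 1 and 2 close to the query because they share the banking vocabulary, and leave the holiday documents near zero. A corpus of millions of files would need sparse matrices and a weighting scheme such as TF-IDF, which this sketch omits.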