From heterogeneous healthcare data to disease-specific biomarker networks: A hierarchical Bayesian network approach

Ann-Kristin Becker; Marcus Dörr; Stephan B. Felix; Fabian Frost; Hans J. Grabe; Markus M. Lerch; Matthias Nauck; Uwe Völker; Henry Völzke; Lars Kaderali

doi:10.1371/journal.pcbi.1008735

Peer Review History

Original Submission

August 19, 2020
12 Nov 2020

Decision Letter - Florian Markowetz, Editor, Alison Marsden, Editor

Dear Prof. Dr. Kaderali,

Thank you very much for submitting your manuscript "From heterogeneous healthcare data to disease-specific biomarker networks: a hierarchical Bayesian network approach" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Alison Marsden
Associate Editor
PLOS Computational Biology

Florian Markowetz
Deputy Editor
PLOS Computational Biology

*********************

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

[LINK]

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1:

TITLE

From heterogeneous healthcare data to disease-specific biomarker networks: a hierarchical Bayesian network approach

AUTHORS

A-K. Becker, M. Dorr, S.B. Felix, F. Frost, H.J. Grabe, M.M. Lerch, M. Nauck, U. Volker, H. Volzke, L. Kaderali

DESCRIPTION

The paper proposes an automatic algorithm that hierarchically refines the structure of a Bayesian network to detect groups of homogeneous features and to learn their conditional relation. After an initial presentation of the algorithm, the authors compare the performance of their approach on a synthetic dataset generated from a parametric family of networks, and assess the ability of the proposed approach to recover the original Bayesian network topology and parameters. Their approach is then compared with other strategies to handle groups in Bayesian networks and with aggregation strategies using medoids and first principal components. The proposed approach is tested on a toy model which is used to determine factors distinguishing wines produced from two different types of soils. This is followed by an analysis of a large collection of electronic health records, focusing on two conditions whose early diagnosis is critical for positive long-term outcomes, i.e., non-alcoholic fatty liver disease and systemic hypertension. Refined group Bayesian networks perform better than commonly adopted clinical indices, logistic regression, and Bayesian networks with different group-handling strategies.

The paper is interesting and well written. Publication is recommended; I have only a few minor comments for the authors to address.

THINGS THAT ARE NOT CLEAR OR SHOULD BE EXPLAINED FURTHER

- p. 5, section "Evaluating simulated data". While the difference between approaches to aggregate groups (i.e., medoids and first principal components) is clear, the exact meaning of the other methods such as "network-based" and "using group data" in Figure 3 is not clear. Is "network-based" an approach where group information is disregarded and "group Bayesian network" an approach where the group separation is initially enforced a priori and left unaltered? The authors should further explain the nature of the approaches they are comparing against, adding citations from the literature where appropriate.

- In my opinion, the policy of the journal where the result section precedes the method section is suboptimal for this paper. Many questions that arise in terms of precisely defining how the numerical tests are performed on both synthetic and real datasets find answers in the method section. I therefore think the paper would benefit from switching the result and method sections.

- I was wondering if the authors could add some more detail on how exactly the predictions from the Bayesian network model were obtained in Table 1. Were all other variables assigned as evidence and the most likely value of the variable "steatosis" inferred using a max-product algorithm? Or were just the variables belonging to the Markov blanket of the variable "steatosis" used, independent of observations on other variables being missing?

- It is not clear why a table like Table 1 is not provided for the application on systemic hypertension.

- The authors should report the processing times required for the two real datasets analyzed in the paper, i.e., 2311-407 and 4403-328 participant-features for the NAFLD and hypertension datasets, respectively. Specifically, they should report the time required to perform the initial hierarchical clustering, structure learning with group refinement, parameter learning and prediction.

SYNTAX, ETC.

TITLE, ABSTRACT, REFERENCES

- Title and abstract appear to be appropriate.

- Please review reference 15.

RECOMMENDATION

Publication of the paper is recommended.

Reviewer #2:

This study presents an approach to analyze medical data using Bayesian networks with hierarchical clustering. The authors provide example applications of this model first to a “toy” model and then to hypertension and non-alcoholic fatty liver disease (NAFLD) using a database of clinical data. I have the following comments/concerns/questions about this manuscript:

The AUROC for the unrefined detailed Bayesian network was significantly lower for hypertension than for NAFLD. Why are these thought to be so different?

For the SHIP Trend data used, it would be useful if the authors provided a little more detail on the input variables and indicated the total number of variables. Were these the same for both the hypertension and NAFLD analyses?

For the SHIP Trend data used, how common were missing variables in the data set used for the analyses? The manuscript indicates that for both clinical analyses, variables with greater than 20% missing data were excluded. Do the authors have any estimate of the effect that missing variables had on their results?

The results for the NAFLD analyses were compared with NAFLD clinical risk prediction models. Why wasn’t the same done with the hypertension analyses? A number of such models have been reported.

Minor:

The clinical standard is to diagnose hypertension when the blood pressure is elevated on repeated measurements rather than a single measurement. Was that the case for the subset of patients diagnosed as being hypertensive based on blood pressure readings?

AUROC and AUPRC are only defined in a figure caption, not in the text.

In Figure 5, BIA is not defined.
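Editorial illustration of the two metrics named in Reviewer #2's last minor comments. This is a minimal sketch and is not taken from the manuscript or the authors' code: the function names `auroc` and `auprc` are hypothetical, AUROC is computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, and AUPRC is approximated by average precision, which is one common estimator among several.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied scores count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auprc(labels, scores):
    """Average precision, used here as an estimate of the area under
    the precision-recall curve (assumption: no special tie handling).

    Cases are ranked by descending score; precision is averaged over
    the ranks at which a positive case appears.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive case")
    return precision_sum / hits


if __name__ == "__main__":
    # Made-up labels and scores, purely for illustration.
    y_true = [1, 0, 1, 0]
    y_score = [0.9, 0.8, 0.7, 0.1]
    print(auroc(y_true, y_score), auprc(y_true, y_score))
```

AUROC is insensitive to class prevalence, whereas AUPRC depends on it, which is one reason both are often reported together when the condition of interest is rare.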
**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: No: The dataset for the NAFLD and hypertension studies is not provided with the paper or supplemental material

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review?

For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see http://journals.plos.org/ploscompbiol/s/submission-guidelines#loc-materials-and-methods

https://doi.org/10.1371/journal.pcbi.1008735.r001
Revision 1
21 Dec 2020

Author Response

Attachment

Submitted filename: response.pdf

https://doi.org/10.1371/journal.pcbi.1008735.r002
22 Jan 2021

Decision Letter - Florian Markowetz, Editor, Alison Marsden, Editor

Dear Prof. Dr. Kaderali,

We are pleased to inform you that your manuscript 'From heterogeneous healthcare data to disease-specific biomarker networks: a hierarchical Bayesian network approach' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.

Best regards,

Alison Marsden
Associate Editor
PLOS Computational Biology

Florian Markowetz
Deputy Editor
PLOS Computational Biology

*********************************************************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors have addressed all my comments. Publication of this contribution in its current form is therefore recommended.

Reviewer #2: I believe that this revised version is improved and adequately addresses the points raised by the reviewers.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review?

For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

https://doi.org/10.1371/journal.pcbi.1008735.r003
Formally Accepted
7 Feb 2021

Acceptance Letter - Florian Markowetz, Editor, Alison Marsden, Editor

PCOMPBIOL-D-20-01495R1

From heterogeneous healthcare data to disease-specific biomarker networks: a hierarchical Bayesian network approach

Dear Dr Kaderali,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Alice Ellingham

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom
ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

https://doi.org/10.1371/journal.pcbi.1008735.r004
