Dr. Daniel Berrar
Lecturer in Statistics & Data Science
Machine Learning Research Group
School of Mathematics & Statistics
Faculty of Science, Technology, Engineering & Mathematics
The Open University
Alan Turing Building, Room 306, Milton Keynes MK7 6AA, UK
Email: daniel.berrar[at]open.ac.uk

Department of Information and Communications Engineering
School of Engineering
Institute of Science Tokyo (formerly Tokyo Institute of Technology)
2-12-1-S3-70 Ookayama, Meguro, Tokyo 152-8552, Japan
Email: daniel.berrar[at]ict.e.titech.ac.jp
Berrar D. (2024) Estimating the replication probability of significant classification benchmark experiments. Journal of Machine Learning Research 24-0158:1-42. [paper]
Berrar D., Lopes P., and Dubitzky W. (2024) A data- and knowledge-driven framework for developing machine learning models to predict soccer match outcomes. Machine Learning 113:8165-8204. [paper] [2023 Soccer Prediction Challenge]
Berrar D. (2024) Cross-validation. 2nd Edition of Encyclopedia of Bioinformatics and Computational Biology, Volume 1, Elsevier. [preprint]
Berrar D. (2024) Performance measures for binary classification. 2nd Edition of Encyclopedia of Bioinformatics and Computational Biology, Volume 1, Elsevier. [preprint]
Berrar D. (2024) Introduction to the non-parametric bootstrap. 2nd Edition of Encyclopedia of Bioinformatics and Computational Biology, Volume 1, Elsevier. [preprint]
Berrar D. (2024) Bayes' theorem and naive Bayes classifier. 2nd Edition of Encyclopedia of Bioinformatics and Computational Biology, Volume 1, Elsevier. [preprint]
Cross J., Berrar D., Watson I., Smith R. (2023) The UK-Japan Engineering Education League (UKJEEL) Workshop: Rationale, Goals, and Lessons Learned. Proc. 71st Annual Conference of Japanese Society for Engineering Education, Hiroshima, Japan, 6-8 Sep. 2023, pp. 1-4.
Berrar D. (2022) Using p-values for the comparison of classifiers over multiple data sets: pitfalls and alternatives. Data Mining and Knowledge Discovery 36:1102-1139. [link]
Quinn G.A., Abdelhameed A., Banat I.M., Berrar D., Doerr S., Dudley E., Francis L.W., Gazze S.A., Hallin I., Matthews P., Swain M.T., Whalley R., and van Keulen G. (2022) Complementary protein extraction methods increase the identification of the Park Grass Experiment metaproteome. Applied Soil Ecology.
Berrar D. and Dubitzky W. (2021) Deep Learning in Bioinformatics and Biomedicine. Editorial, Special issue in Briefings in Bioinformatics.
Geyer K.K., Munshi S.E., Vickers M., Squance M., Wilkinson T.J., Berrar D., Chaparro C., Swain M.T., Hoffmann K.F. (2018) The anti-fecundity effect of 5-azacytidine (5-AzaC) on Schistosoma mansoni is linked to dis-regulated transcription, translation and stem cell activities. International Journal for Parasitology: Drugs and Drug Resistance 8(2):213-222.
Berrar D. (2018) Introduction to the non-parametric bootstrap. Encyclopedia of Bioinformatics and Computational Biology, Volume 1, Elsevier, pp. 766-773.
Berrar D. (2018) Cross-validation. Encyclopedia of Bioinformatics and Computational Biology, Volume 1, Elsevier, pp. 542-545.
Berrar D. (2018) Performance measures for binary classification. Encyclopedia of Bioinformatics and Computational Biology, Volume 1, Elsevier, pp. 546-560.
Berrar D. (2018) Bayes' theorem and naive Bayes classifier. Encyclopedia of Bioinformatics and Computational Biology, Volume 1, Elsevier, pp. 403-412.
Berrar D., Lopes P., Davis J., and Dubitzky W. (2024) Machine learning in soccer. Special issue in Machine Learning.
Berrar D., Lopes P., Davis J., and Dubitzky W. (2018) Machine learning for soccer. Special issue in Machine Learning.
Summary: To what extent can machine learning predict the outcome of soccer matches? To answer this question, we developed the Open International Soccer Database and organized the 2017 Soccer Prediction Challenge. This special issue contains selected papers from the top-ranking participants in this data mining competition, as well as papers reporting innovative machine learning methods for soccer data analysis.
Berrar D. and Schuster A. (2011) Omnipresent intelligent computing - new developments and societal impact. Special issue in the Journal of Advanced Computational Intelligence and Intelligent Informatics, Fuji Technology Press, 15(7):785-812.
Summary: When the computer revolution began in the second half of the 20th century, few could have foreseen the pervasiveness that intelligent devices would have only half a century later. Today, consumers deal with numerous computing devices providing increasingly sophisticated services. Arguably, no other invention has so profoundly impacted on daily home and work lives as the computer. The downside, however, holds the worrying realization that many artifacts of modern technology now touch on the human sphere to the point of risking an individual's privacy, security, and well-being. This special issue informs the research community about exciting new developments in intelligent computing, with an outlook on their societal impacts.
Berrar D., Sato N., and Schuster A. (2010) Artificial intelligence in neuroscience and systems biology: lessons learnt, open problems, and the road ahead. Special issue in Advances in Artificial Intelligence, Hindawi, 120 pages. [pdf]
Summary: This special issue informs the research community about an exciting and stimulating relationship between artificial intelligence, neuroscience, and systems biology. The special issue provides access to many state-of-the-art theoretical and applied problems in these hugely exciting fields that are so relevant for modern science. This special issue is also intended as a platform to bridge cultural and technological gaps between these disciplines.
Ranganathan S., Nakai K., and Schonbach C. (eds.) (2018) Encyclopedia of Bioinformatics and Computational Biology: ABC of Bioinformatics. 2500 pages, Elsevier, ISBN: 9780128114148.
Chapters by Berrar D.: Introduction to the non-parametric bootstrap; Cross-validation; Performance measures for binary classification; Bayes' theorem and naive Bayes classifier.
Dubitzky W., Wolkenhauer O., Yokota H., and Cho K.-H. (eds.) (2013) Encyclopedia of Systems Biology, 2100 pages, ISBN: 978-1-4419-9864-4. Berrar D. (section editor) "Artificial Intelligence and Machine Learning"
The Encyclopedia of Systems Biology is conceived as a comprehensive reference work covering all aspects of systems biology, in particular the investigation of living matter involving a tight coupling of biological experimentation, mathematical modeling and computational analysis and simulation. The main goal of the Encyclopedia is to provide a complete reference of established knowledge in systems biology, a "one-stop shop" for someone seeking information on key concepts of systems biology. As a result, the Encyclopedia comprises a broad range of topics relevant in the context of systems biology. The audience targeted by the Encyclopedia includes researchers, developers, teachers, students and practitioners who are interested or working in the field of systems biology. Keeping in mind the varying needs of the potential readership, we have structured and presented the content in a way that is accessible to readers from a wide range of backgrounds. In contrast to encyclopedic online resources, which often rely on the general public to author their content, a key consideration in the development of the Encyclopedia of Systems Biology was to have subject matter experts define the concepts and subjects of systems biology.
Zheng H., Dubitzky W., Hu X., Hao J.K., Berrar D., Cho K.H., Wang Y., Gilbert D. (2014) Proceedings of the 2014 IEEE International Conference on Bioinformatics and Biomedicine, 2-5 November 2014, Belfast, UK.
Dubitzky W., Granzow M., and Berrar D. (eds) (2007) Fundamentals of Data Mining in Genomics and Proteomics, Springer, 282 pages, ISBN: 978-0-387-47508-0. Front matter [pdf] Back matter [pdf]
More than ever before, research and development in genomics and proteomics depends on the analysis and interpretation of large amounts of data generated by high-throughput techniques. With the advance of computational systems biology, this situation will become even more manifest as scientists will generate truly large-scale data sets by simulating biological systems and conducting synthetic experiments. To optimally exploit such data, life scientists need to understand the fundamental concepts and properties of the fast-growing arsenal of analytical techniques and methods from statistics and data mining. Typically, the relevant literature and products present these techniques in a form which is either very simplistic or highly mathematical, favoring formal rigor over conceptual clarity and practical relevance. Fundamentals of Data Mining in Genomics and Proteomics addresses these shortcomings by adopting an approach which focuses on fundamental concepts and practical applications. The book presents key analytical techniques used to analyze genomic and proteomic data by detailing their underlying principles, merits and limitations. An important goal of this text is to provide a highly intuitive and conceptual (as opposed to intricate mathematical) account of the discussed methodologies. This treatment will enable readers with interest in analysis of genomic and proteomic data to quickly learn and appreciate the essential properties of relevant data mining methodologies without recourse to advanced mathematics. To complement the conceptual discussions, the book draws upon the lessons learned from applying the presented techniques to concrete analysis problems in genomics and proteomics. The caveats and pitfalls of the discussed methods are highlighted by addressing questions such as: What can go wrong? Under which circumstances can a particular method be applied and when should it not be used? What alternative methods exist? Extensive references to related material and resources are provided to assist readers in identifying and exploring additional information. The structure of this text mirrors the typical stages involved in deploying a data mining solution, spanning from data pre-processing to knowledge discovery to result post-processing. It is hoped that this will equip researchers and practitioners with a useful and practical framework to tackle their own data mining problems in genomics and proteomics. In contrast to some texts on machine learning and biological data analysis, a deliberate effort has been made to incorporate important statistical notions. By doing so, the book follows demands for a more statistical data mining approach to analyzing high-throughput data. Finally, by highlighting limitations and open issues, Fundamentals of Data Mining in Genomics and Proteomics is intended to instigate critical thinking and avenues for new research in the field.
Bremer E., Hakenberg J., Han E.H., Berrar D., and Dubitzky W. (eds.) (2006) Proc. Knowledge Discovery in Life Science Literature (KDLL 2006), Springer LNCS Series: Lecture Notes in Bioinformatics, Vol. 3886, 147 pages, ISBN: 3-540-32809-2 (conference proceedings).
This book constitutes the refereed proceedings of the International Workshop on Knowledge Discovery in Life Science Literature, KDLL 2006, held in Singapore in conjunction with the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2006). The 12 revised full papers presented together with two invited talks were carefully reviewed and selected for inclusion in the book. The papers cover all topics of knowledge discovery in life science data such as text mining, identification and retrieval of documents, passage retrieval, co-reference resolution, extraction of life science entities or relationships from large collections, automated characterization of biological, biomedical and biotechnological entities and processes, extraction and characterization of more complex patterns and interaction networks, automated generation of text summaries, automated construction, expansion and curation of ontologies for different domains, and construction of controlled vocabularies.
Berrar D., Dubitzky W., and Granzow M. (eds.) (2002) A Practical Approach to Microarray Data Analysis, Springer, Dordrecht/Heidelberg/London, 384 pages, ISBN: 1402072600. Preface, TOC, Index [pdf]
The book addresses the requirement of scientists and researchers to gain a basic understanding of microarray analysis methodologies and tools. It is intended for students, teachers, researchers, and research managers who want to understand the state of the art of the presented methodologies and the areas in which gaps in our knowledge demand further research and development. The book is designed to be used by the practicing professional tasked with the design and analysis of microarray experiments or as a text for a senior undergraduate- or graduate-level course in analytical genetics, biology, bioinformatics, computational biology, statistics and data mining, or applied computer science.
"Overall, A Practical Approach to Microarray Analysis represents an invaluable resource for statisticians, bioinformaticians and mathematically talented biologists. For such readers, this book is perhaps the definitive guide to microarray analysis at present." [Review by Dr. Matt Wayland, in: Briefings in Functional Genomics and Proteomics 2(1):80-84, 2003]
BMC Bioinformatics
Journal of Intelligent Systems
Journal of Advanced Computational Intelligence and Intelligent Informatics
Co-organizer of 9th UK-Japan Engineering Education-League Workshop, 5 September 2022, online.
Co-organizer of the IJCAI Workshop Learning Data Representation for Clustering, in conjunction with the 29th Intl. Joint Conference on Artificial Intelligence (IJCAI2020) and the 17th Pacific Rim International Conference on Artificial Intelligence (PRICAI2020), January 2021, Kyoto, Japan (virtual workshop).
Co-organizer of 8th UK-Japan Engineering Education-League Workshop, Niigata University, Niigata, Japan, 26-27 February 2021 (virtual workshop).
Co-organizer of 7th UK-Japan Engineering Education-League Workshop, Queen Mary University, London, 5-7 September 2019.
Co-organizer of 6th UK-Japan Engineering Education-League Workshop, Kyushu University, Fukuoka, Japan, 3-5 September 2018.
Program co-chair of IEEE International Conference on Bioinformatics and Biomedicine (BIBM14), Belfast, UK, 2-5 November 2014.
Program chair of the ESF International Workshop on Mining of High-Throughput Data in Functional Genomics, University of Ulster, Coleraine, Northern Ireland, May 8-9, 2007.
Co-chair of the International Workshop on Knowledge Discovery in Life Science Literature (KDLL06), held in conjunction with the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD06), Singapore, April 9-12, 2006.
Co-chair of the first meeting of the Working Group on Knowledge Discovery and Management, COST Action 282, Coleraine, Northern Ireland, May 24-26, 2002.