Data Science & Biotechnology

Theory: 2 hours/week | ECTS Units: 3

Tutoring in English is offered to Erasmus students.

Content – Aim of the course

Data science is a relatively recent term that grew out of earlier ones such as Knowledge Discovery in Databases and Data Mining; it deals with searching, understanding and exploiting big data. Biotechnology can be defined as “any technological application that uses biological systems, live organisms and their products in order to develop products or processes for specific uses”. The technological elements it draws on (mathematics, statistics, information technology, data availability, etc.) keep evolving, and data science is now one of them. Moreover, since companies in the biotechnology industry rely on data and information technology, a scientist in this field needs competencies in data science.

Because biotechnologists are researchers who apply statistics to biology, they are in effect data scientists as well. Both biotechnologists and data scientists are trained in research design (experimental, pre-experimental and quasi-experimental) and are therefore familiar with the triplet of mathematics, statistics (biostatistics) and programming. They collect vast amounts of data from the dynamic systems of the molecular world and analyze them in detail in order to identify the relevant factors, a task that may require considerable computational power. Biotechnologists therefore learn to use software tools such as R and Python and to collect and analyze data from databases; according to recent job-market studies (Glassdoor), these skills help them enter competitive areas of the labor market.

This course has a dual objective: first, to provide the theoretical background and technical skills of data science; and second, to help students understand how to use data (i.e., biological data) to build predictive models in biotechnology.

Analytical Description of the Course

Introduction to data science
Predictive modeling
Supervised Segmentation
Discriminant Functions
Model performance analytics
Decision Analytic Thinking
Visualizing model performance
Prediction via evidence combination
Representing and Mining Text
Similarity and nearest neighbors
Unsupervised Data Mining and Clustering
Other processes and techniques in data science
Presentation and assessment of assignments

Evaluation

Students are evaluated through:

A personal assignment (A)
Examinations (E)

The final grade (TB) is calculated with the following formula:

TB = 0.7*E + 0.3*A
where A takes a value from 1 to 10.
To pass, students must achieve:

E > 5 and
TB > 5
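
The grading rule above can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the examination grade E is assumed to use the same 1-10 scale stated for A, and both pass conditions are treated as strict inequalities, exactly as written.

```python
# Illustrative helper names; they are not part of the course material.
EXAM_WEIGHT = 0.7        # weight of the examination grade E
ASSIGNMENT_WEIGHT = 0.3  # weight of the assignment grade A
PASS_MARK = 5            # E and TB must both be strictly greater than this


def final_grade(exam: float, assignment: float) -> float:
    """Return the weighted final grade TB = 0.7*E + 0.3*A."""
    return EXAM_WEIGHT * exam + ASSIGNMENT_WEIGHT * assignment


def passes(exam: float, assignment: float) -> bool:
    """Apply both stated conditions: E > 5 and TB > 5."""
    return exam > PASS_MARK and final_grade(exam, assignment) > PASS_MARK


if __name__ == "__main__":
    # Assumed sample grades, chosen only to exercise both conditions.
    for exam, assignment in [(6, 4), (5, 10), (5.5, 2)]:
        tb = round(final_grade(exam, assignment), 2)
        print(f"E={exam}, A={assignment}, TB={tb}, pass={passes(exam, assignment)}")
```

For instance, E = 6 and A = 4 give TB = 5.4 and satisfy both conditions, whereas E = 5 and A = 10 give TB = 6.5 but still fail, because the examination grade is not above 5.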

Reading Suggestions

Lantz, B. (2015). Machine Learning with R. Second Edition. Packt Publishing.
Verikios, V.S., Kaglis, V. and Stavropoulos, I.K. (2015). Data Science via R language (in Greek). SEAV: Kallipos publishing.

Indicative bibliography:

Simeonidis, P. and Gounaris, A. (2015). Databases, Data warehouses, and data mining with SQL Server: Laboratory Guide. SEAV: Kallipos publishing.
Provost, F. and Fawcett, T. (2013). Data Science for Business. O’Reilly Media, Inc.: Sebastopol, CA.

Indicative Journals:

Big Data Research
Data in Brief
Computational Statistics & Data Analysis
Statistical Analysis and Data Mining
ACM Computing Surveys

E-class

https://eclass.uth.gr/courses/BIO_U_176/

Lecturer

Leonidas Anthopoulos (Course Coordinator)

Professor, Department of Business Administration, University of Thessaly