Data Science & Biotechnology

Theory: 2 hours/week | ECTS Units: 3

Content – Aim of the course

Data science is a relatively recent term that grew out of earlier ones such as Knowledge Discovery in Databases and Data Mining. It deals with searching, understanding and utilizing big data. Biotechnology can be defined as “any technological application that uses biological systems, live organisms and their products in order to develop products or processes for specific uses”. The technological elements it draws on (mathematics, statistics, information technology, data availability etc.) keep evolving, and data science is now one of them. Since companies in the biotechnology industry rely on data and on information technologies, a scientist in this domain needs competencies in data science.

Because biotechnologists are researchers who apply statistics in biology, they are data scientists too. Both biotechnologists and data scientists are experienced in research design (experimental, pre-experimental and quasi-experimental) and are therefore familiar with the triplet of mathematics, statistics (biostatistics) and programming. They collect vast amounts of data from the dynamic systems of the molecular world and analyze them in detail in order to identify the relevant factors, a task that may require considerable computational power. Accordingly, biotechnologists learn to use software tools such as R and Python and to collect and analyze data from databases. According to recent studies by recruitment platforms (Glassdoor), these skills will help them enter competitive areas of the labor market.

This course has a twofold objective: first, to offer the theoretical background and technical skills of data science, and second, to help students understand how they can use data (in particular, biological data) to produce predictive models in biotechnology.

Analytical Description of the Course

  • Introduction to data science
  • Predictive modeling
  • Supervised Segmentation
  • Discriminant Functions
  • Model performance analytics
  • Decision Analytic Thinking
  • Visualizing model performance
  • Prediction via evidence combination
  • Representing and Mining Text
  • Similarity and nearest neighbors
  • Unsupervised Data Mining and Clustering
  • Other processes and techniques in data science
  • Presentation and assessment of assignments

Evaluation

Students are evaluated through:

  1. An individual assignment (A)
  2. Examinations (E)

The final grade (TB) is calculated with the following formula:

TB = 0.7*E + 0.3*A
where A takes a value from 1 to 10.
To pass the course, students must achieve:

  1. E > 5 and
  2. TB > 5
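
As a rough illustration of the grading rule above, the following Python sketch computes TB and checks both passing conditions. The function names are hypothetical, and the example grades assume that E is marked on the same 1-10 scale as A, which the syllabus does not state explicitly.

```python
def final_grade(exam: float, assignment: float) -> float:
    """Return TB = 0.7*E + 0.3*A, the weighted final grade."""
    return 0.7 * exam + 0.3 * assignment


def passes(exam: float, assignment: float) -> bool:
    """A student passes only if both E > 5 and TB > 5 hold."""
    return exam > 5 and final_grade(exam, assignment) > 5


if __name__ == "__main__":
    # Hypothetical grades, for illustration only (assumed 1-10 scale for E).
    for exam, assignment in [(6.0, 8.0), (4.5, 10.0), (5.5, 3.0)]:
        tb = final_grade(exam, assignment)
        print(f"E={exam}, A={assignment}: TB={tb:.2f}, pass={passes(exam, assignment)}")
```

With these made-up inputs, the second student fails despite a perfect assignment, because the condition E > 5 is checked separately from TB.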

Reading Suggestions

  • Lantz, B. (2015). Machine Learning with R. Second Edition. Packt Publishing.
  • Verikios, V.S., Kaglis, V. and Stavropoulos, I.K. (2015). Data Science via R language (in Greek). SEAV: Kallipos publishing.

Indicative bibliography:

  • Simeonidis, P. and Gounaris, A. (2015). Databases, Data warehouses, and data mining with SQL Server: Laboratory Guide. SEAV: Kallipos publishing.
  • Provost, F. and Fawcett, T. (2013). Data Science for Business. O’Reilly Media, Inc: Sebastopol, CA.

Indicative Journals:

  • Big Data Research
  • Data in Brief
  • Computational Statistics & Data Analysis
  • Statistical Analysis and Data Mining
  • ACM Computing Surveys

Lecturer

Leonidas Anthopoulos (Course Coordinator)

Professor, Department of Business Administration, University of Thessaly