Bioinformatics, or computational biology, is the study of large amounts of biological information, such as genomes. It focuses on molecules like DNA, and it is usually done with the help of computers.
Foundation
As species of living things change over time because of evolution, the DNA contained in their cells changes too. If we extract the DNA information from living things today and compare them to each other, we can see which living things are most closely related: the most similar can be thought to be the most closely related in time. Biologists can then construct family trees, or phylogenies. By combining these trees, a grand tree connecting all living things can be made. This is called the "Tree of Life". Bioinformatics combines mathematical, statistical and computational methods to analyze biological, biochemical and biophysical data.
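The idea of "most similar means most closely related" can be shown with a small Python sketch. The sequences and species names below are invented for illustration, and the sketch assumes the sequences are already lined up and have the same length. Real phylogenetic methods are much more careful than this.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Return the share of positions where two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def closest_pair(seqs: dict) -> tuple:
    """Return the two names whose sequences are most alike."""
    return max(
        combinations(sorted(seqs), 2),
        key=lambda pair: percent_identity(seqs[pair[0]], seqs[pair[1]]),
    )


# Invented toy sequences (assumption: not real genes).
sequences = {
    "species_a": "ATGCCGTA",
    "species_b": "ATGCCGTT",
    "species_c": "TTGACGAA",
}

print(closest_pair(sequences))  # ('species_a', 'species_b')
```

Here species_a and species_b match at 7 of 8 positions, so the sketch treats them as the closest relatives.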
The process
All of the information needed by a cell is provided in its DNA. When a cell wants to build a protein, it finds the appropriate piece of DNA, makes a copy of it (called RNA), and uses the instructions in the copy to make the protein.
Proteins can perform many functions like transportation, structural support, movement and metabolism. Proteins are made from amino acids. There are twenty different amino acids that are used to build millions of different protein molecules.
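The steps from DNA to RNA to protein can be sketched in Python. This is a simplified picture, not how a cell really works: it assumes the DNA string is the coding strand, so the RNA copy is made by swapping T for U, and it uses only a small part of the standard genetic code. The function names and the example sequence are made up for illustration.

```python
# A small part of the standard genetic code (RNA codon -> amino acid).
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly",
    "AAA": "Lys", "UGG": "Trp",
    "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
}


def transcribe(dna: str) -> str:
    """Make the RNA copy of a coding-strand DNA sequence (T becomes U)."""
    return dna.upper().replace("T", "U")


def translate(rna: str) -> list:
    """Read the RNA three letters at a time and build a list of amino acids."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        codon = rna[i:i + 3]
        amino_acid = CODON_TABLE[codon]  # KeyError if the codon is not in this small table
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein


dna = "ATGTTTGGCAAATGGTAA"  # invented example sequence
rna = transcribe(dna)
print(rna)             # AUGUUUGGCAAAUGGUAA
print(translate(rna))  # ['Met', 'Phe', 'Gly', 'Lys', 'Trp']
```

A full table would have 64 codons covering all twenty amino acids.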
These molecules can be studied using computers to analyze the DNA, RNA, and amino acid sequences from which they are created. Because there are so many different molecules, the best way we have of understanding how the entire system works is to use bioinformatics.
Chemists have developed ways to understand the shape and behavior of small molecules using mathematical analysis, and they may use computers to study these molecules. The DNA contained in just one cell of an organism is far too large to be read by any person. Comparing the DNA of two or more organisms, whether they are a brother and sister or members of completely different species, means comparing large amounts of information to find the differences. Computers are better suited to such comparisons, and computer programmers have worked with biologists to create very large databases to store the DNA information that has been collected.
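A computer can list the exact places where two DNA sequences differ. The Python sketch below assumes the two sequences are already lined up and have the same length. The function name and the sequences are invented for illustration, and real genomes are millions or billions of letters long.

```python
def find_differences(a: str, b: str) -> list:
    """Return (position, letter_in_a, letter_in_b) for every mismatch."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Invented toy sequences standing in for two close relatives.
sibling_1 = "GATTACAGATTACA"
sibling_2 = "GATTACAGATCACA"

print(find_differences(sibling_1, sibling_2))  # [(10, 'T', 'C')]
```

Positions are counted from 0, so the single difference is at the eleventh letter.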