Galaxy Communities, how many are there?

Galaxy is a scientific workflow, data integration, and data and analysis persistence and publishing platform that aims to make computational biology accessible to research scientists who do not have computer programming experience.

Its community spans the globe, which is amazing if you think about it: someone might be 12 hours ahead of or behind you! Developers, scientists, engineers and more are involved in this project, but let me tell you about the different communities within Galaxy.

ASSEMBLY

DNA sequence data has become an indispensable tool for molecular biology and evolutionary biology. Studies in these fields now require a genome sequence to work from, which we call a ‘reference sequence’. We need to build a reference for each species, and we do this by genome assembly. De novo genome assembly is the process of reconstructing the original DNA sequence from the fragment reads alone.
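To get a feel for what "reconstructing from fragment reads" means, here is a minimal toy sketch in Python. It only merges two reads by their longest suffix/prefix overlap; real assemblers are far more sophisticated, and the function names, the reads and the `min_length` parameter are all made up for illustration.

```python
def overlap(a, b, min_length=3):
    """Return the length of the longest suffix of `a` that equals a prefix of `b`.

    Overlaps shorter than `min_length` are ignored (returns 0).
    """
    for length in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def merge(a, b, min_length=3):
    """Merge read `b` onto the end of read `a`, or return None if they do not overlap."""
    n = overlap(a, b, min_length)
    return a + b[n:] if n else None


# Two hypothetical reads that share the bases "CGT".
read_1 = "ATGCGT"
read_2 = "CGTTAC"
contig = merge(read_1, read_2)
print(contig)
```

Repeating this kind of merge over many reads is the basic intuition behind overlap-based assembly, although this sketch ignores sequencing errors, repeats and reverse complements.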

CLIMATE

Climate is defined as the average state of day-to-day weather conditions over a period of 30 years. It is measured by assessing the patterns of variation in terms of temperature, humidity, atmospheric pressure, wind, precipitation, atmospheric particle count and other meteorological variables in a given region over long periods of time.

In Galaxy you’ve got reanalysis and observation data for information about the past, and climate models for both the past and the future. For more info you can check this slide tutorial.
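The 30-year averaging idea can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function name `climate_normal` is hypothetical, and the yearly temperatures are made-up numbers, not real observations.

```python
from statistics import mean


def climate_normal(yearly_means, period=30):
    """Average the most recent `period` yearly values (30 years by default)."""
    if len(yearly_means) < period:
        raise ValueError(f"need at least {period} yearly values")
    return mean(yearly_means[-period:])


# Made-up yearly mean temperatures in degrees Celsius for 40 years.
temps = [14.0 + 0.02 * i for i in range(40)]
normal = climate_normal(temps)
print(round(normal, 2))
```

The same pattern works for any of the other variables mentioned above, such as precipitation or humidity, as long as you have one value per year.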

COMPUTATIONAL CHEMISTRY

Thanks to Galaxy you can start to model, simulate and analyse biomolecular systems! Look here, for example, for a tutorial on how to recreate an analysis of molecular dynamics simulations, where you’ll use tools that investigate conformational changes by analysing a typical short protein simulation, such as one for CBH1, or here to set up a molecular system. Pretty cool, right?

ECOLOGY

Want to learn to analyse ecological data through Galaxy? Say no more: you will have all you need to compute and analyse biodiversity metrics with the PAMPA toolsuite, model a theoretical ecological niche, predict species distribution in a future climate scenario, and more!
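If you are wondering what a biodiversity metric looks like, here is a small Python sketch of the Shannon diversity index, one commonly used metric. Whether and how the PAMPA toolsuite computes this particular index is an assumption on my part, and the species observations below are invented for illustration.

```python
import math
from collections import Counter


def shannon_index(observations):
    """Shannon diversity H = -sum(p * ln(p)) over the proportion p of each species."""
    counts = Counter(observations)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())


# Invented survey: one entry per observed individual.
survey = ["cod", "cod", "herring", "herring", "plaice", "plaice"]
print(shannon_index(survey))
```

A community with a single species scores zero, and the value grows as species become more numerous and more evenly distributed.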

EPIGENETICS

DNA methylation is an epigenetic mechanism used by higher eukaryotes and involved in, e.g., gene expression, X-chromosome inactivation, imprinting, and gene silencing of germline-specific genes and repetitive elements. It’s recommended that you check out the first two sequence analysis tutorials before diving right into this community!

GENOME ANNOTATION

Genome annotation is a multi-level process that includes prediction of protein-coding genes, as well as other functional genome units such as structural RNAs, tRNAs, small RNAs, pseudogenes, control regions, direct and inverted repeats, insertion sequences, transposons and other mobile elements. Here you can run pairwise genome comparisons and massive chromosome comparisons, and visualize those comparisons as well!

IMAGING

Image analysis is the extraction of meaningful information from images by means of digital image processing techniques. Imaging is an important component in a wide range of scientific fields of study, such as astronomy, medicine, physics, biology, geography, chemistry, robotics, and industrial manufacturing. With Galaxy you are ready to perform basic image analysis tasks such as format conversion, image enhancement, segmentation, and feature extraction.

METABOLOMICS

According to Wikipedia, metabolomics is the scientific study of chemical processes involving metabolites, the small molecule substrates, intermediates and products of cell metabolism. Specifically, metabolomics is the “systematic study of the unique chemical fingerprints that specific cellular processes leave behind”, the study of their small-molecule metabolite profiles.

The GTN provides material to analyse mass spectrometry data in Galaxy: metabolomics (LCMS, FIAMS, GCMS, NMR) and imaging.

METAGENOMICS

Metagenomics is a discipline that enables the genomic study of uncultured microorganisms. “Like genomics itself,” states the U.S. National Center for Biotechnology Information, “metagenomics is both a set of research techniques, comprising many related approaches and methods, and a research field. In Greek, meta means “transcendent.” In its approaches and methods, metagenomics circumvents the unculturability and genomic diversity of most microbes, the biggest roadblocks to advances in clinical and environmental microbiology.”

Learn how to analyze and extract information from metagenomics data within Galaxy, or why not do the same with metatranscriptomics data as well?

PROTEOMICS

Proteomics is the large-scale study of proteomes. You may be asking, “What are proteomes?” Well, Marc Wilkins states that a proteome is the entire complement of proteins that is or can be expressed by a cell, tissue, or organism at a given time. It is not constant, as it differs from cell to cell and changes over time.

Get comfy and work through the many tutorials that cover protein identification and/or label-free and label-based quantification from data-dependent acquisition (DDA) and data-independent acquisition (DIA), or, if you’re feeling adventurous, follow other tutorial sets that combine proteomics with other -omics technologies such as transcriptomics.

SEQUENCE ANALYSIS

Sequence analysis is a term that comprehensively represents computational analysis of a DNA, RNA or peptide sequence, to extract knowledge about its properties, biological function, structure and evolution.

Sequencing produces a collection of sequences without genomic context. We do not know which part of the genome the sequences correspond to, so mapping the reads of an experiment to a reference genome is a key step in modern genomic data analysis. Moreover, it is necessary to understand, identify and exclude error types that may impact the interpretation of downstream analysis. Sequence quality control is therefore an essential first step in your analysis. Catching errors early saves time later on.
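As a rough idea of what a quality check involves, here is a minimal Python sketch that decodes a FASTQ quality string and flags low-quality reads. It assumes the common Phred+33 encoding; the function names and the threshold of 20 are hypothetical choices for illustration, not what any particular Galaxy tool does.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (assumes Phred+33 encoding)."""
    return [ord(char) - offset for char in quality_string]


def mean_quality(quality_string, offset=33):
    """Mean Phred score of a read; raises ValueError for an empty quality string."""
    scores = phred_scores(quality_string, offset)
    if not scores:
        raise ValueError("empty quality string")
    return sum(scores) / len(scores)


def passes_qc(quality_string, min_mean=20):
    """True if the read's mean quality is at least `min_mean` (an illustrative threshold)."""
    return mean_quality(quality_string) >= min_mean


# Made-up quality strings: 'I' encodes Phred 40, '!' encodes Phred 0.
print(passes_qc("IIIIIIII"))
print(passes_qc("!!!!!!!!"))
```

In practice you would also look at per-position quality, adapter content and so on, but the decoding step above is the starting point for all of them.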

STATISTICS AND MACHINE LEARNING

Machine learning uses techniques from statistics, mathematics and computer science to make computer programs learn from data. It is one of the most popular fields of computer science and finds applications in multiple streams of data analysis such as classification, regression, clustering, dimensionality reduction, density estimation and many more.
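To make "learning from data" a bit more concrete, here is a tiny classification sketch in plain Python using a nearest-centroid rule. It is one of the simplest possible classifiers and is not taken from the Galaxy tutorials; the function names and the toy data points are invented for illustration.

```python
import math
from collections import defaultdict


def fit_centroids(points, labels):
    """Learn one centroid (the mean point) per class label."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    return {
        label: tuple(sum(coords) / len(members) for coords in zip(*members))
        for label, members in groups.items()
    }


def predict(centroids, point):
    """Assign `point` to the class whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], point))


# Invented two-dimensional training data with two classes.
train_points = [(0, 0), (0, 1), (5, 5), (6, 5)]
train_labels = ["a", "a", "b", "b"]
model = fit_centroids(train_points, train_labels)
print(predict(model, (1, 1)))
print(predict(model, (5, 4)))
```

The "learning" here is just computing the class means, but the fit-then-predict shape is the same one you will meet with more powerful models.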

Machine learning can be used to create predictive models by learning features from datasets, but before going any further, check this tutorial that covers the basics of the topic so that you’ll be ready to go deeper and deeper into this world.

TRANSCRIPTOMICS

The transcriptome is the set of all RNA transcripts, including coding and non-coding, in an individual or a population of cells. The term can also sometimes be used to refer to all RNAs, or just mRNA, depending on the particular experiment. The term transcriptome is a portmanteau of the words transcript and genome; it is associated with the process of transcript production during the biological process of transcription. Transcriptomics, then, refers to the techniques used to study an organism’s transcriptome.

Follow complete end-to-end analyses that take you from raw sequencing reads to pathway analysis, or try visualizing RNA-Seq results with various tools, all within the Galaxy Project.

VARIANT ANALYSIS

Genetic differences (variants) between healthy and diseased tissue, between individuals of a population, or between strains of an organism can provide mechanistic insight into disease processes and the natural function of affected genes.

The available tutorials show how to detect evidence for genetic variants in next-generation sequencing data, a process termed variant calling. Of equal importance, they also demonstrate how you can interpret, for a range of different organisms, the resulting sets of variants by predicting their molecular effects on genes and proteins, by annotating previously observed variants with published knowledge, and by trying to link phenotypes of the sequenced samples to their variant genotypes.

VISUALISATION

Data visualization is the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data.

Check these handy-dandy tutorials on how to visualise BLAST data or features with a tool called JBrowse, or try this other one that uses Circos to visualize data in a circular layout, perfect for exploring relationships between objects or positions.

Look here to see all the tutorials available in the GTN, and many more!
Thanks for reading 🙂

Pia's outreachy blog
