Archive for the 'Activity' Category

Modern Massive Data Sets Reflections

Posted in Activity 6 days, 10 hours ago

The workshop was a blast! I had an incredible time getting up to speed on the latest and greatest in data analysis research. It was quite humbling to rub shoulders with some of the top folks pushing the frontiers of science. There were also many opportunities to network, where I could get a peek at the motivations behind some of the projects presented.

Each day had a theme:

  • Data Analysis and Data Applications
  • Networked Data and Algorithmic Tools
  • Statistical, Geometric, and Topological Methods
  • Machine Learning and Dimensionality Reduction

The topics covered a lot of ground. I mostly pushed my own agenda: streaming algorithms. There’s so much to write about them that I won’t even attempt it here.

Besides streaming algorithms, the presentations on mathematical topics were really interesting. Some of it I’ve previously seen from my day-to-day work, some of it was new. Of particular interest to me were the following:

  • Graph Sparsification : Never seen anything like this before.
  • Massive Terrain Data : Real smart use of offline data structures.
  • Symmetries in point cloud data : I’m intimately familiar with this style of mathematics from my previous work.
  • Pathway Analysis in Protein Folding : Puts bread on the table.
  • Intersection SVMs : Didn’t know this was a well-known concept in machine learning called the kernel trick. It goes by Reproducing Kernel Hilbert Space in my neck of the woods, and it is also a precursor to the above-mentioned image matching algorithm.
  • Manifold regularization : Fréchet means anyone?
  • Sufficient Dimension Reduction
  • Semi-definite programming : Some mathematical insights into a couple of engineering problems (where <1e-4 is good enough) that are making my life difficult.
  • Spectral Algorithms
  • Matrix/Tensor Factorization
  • Future of Parallel Linear Algebra
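To make the kernel trick remark in the list above concrete, here is a minimal sketch. It assumes "Intersection SVMs" refers to the histogram intersection kernel, K(x, y) = sum of min(x_i, y_i), which the post itself does not spell out. The function names are hypothetical and the unary encoding is only an illustration for small non-negative integer histograms: it shows that the kernel value equals an ordinary dot product in a larger feature space, which is what lets an SVM use the kernel without ever building that space.

```python
def intersection_kernel(x, y):
    """Histogram intersection kernel: sum of element-wise minima."""
    if len(x) != len(y):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(x, y))


def unary_features(x, max_count):
    """Explicit feature map for integer histograms (illustrative only).

    Each bin count a is encoded as a ones followed by (max_count - a) zeros,
    so the dot product of two encoded bins equals min(a, b).
    """
    feats = []
    for a in x:
        if not 0 <= a <= max_count:
            raise ValueError("bin counts must lie in [0, max_count]")
        feats.extend([1] * a + [0] * (max_count - a))
    return feats


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(p * q for p, q in zip(u, v))


if __name__ == "__main__":
    x = [3, 0, 2]
    y = [1, 4, 2]
    k_implicit = intersection_kernel(x, y)
    k_explicit = dot(unary_features(x, 4), unary_features(y, 4))
    # Both routes compute the same inner product.
    print(k_implicit, k_explicit)
```

The explicit map grows with the largest bin count, while the kernel only ever touches the original bins. That gap is the practical point of the trick.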

Thanks to my friend Krishna who let me sleep on the floor in his house, thereby saving me from the grips of boredom.

Gromacs Workshop

Posted in Activity, Physics 2 months ago

I know some of my readers are deeply interested in high performance computing and computational physics: this is a post for them. The conference I had mentioned in my previous post was the GROMACS Workshop on Advanced Simulation Methods.

Gromacs is a high performance simulation engine primarily for solving Newtonian dynamics (it also does normal mode analysis, structure minimization and mixed molecular mechanics-quantum mechanics simulations). It was an industry leader in terms of raw single-processor performance for many years, until Desmond from D.E. Shaw Research took over with their super-scalable algorithms (I’ve written about this before). With Gromacs 4.0, they’ve fixed the scalability problems, and with a variety of other algorithmic fixes they are the top dog once again. Disclaimer: these are all claims by relevant parties and I have not verified them myself, though I’d love to do so unencumbered. Though the Gromacs 4.0 paper is published, I’ll only be writing about it when the actual product is released.

The focus of the workshop was on algorithms, though there were some applications too. I’m sure an applications person would have felt out of place, but I felt I had something to contribute in almost every topic that was discussed. I’m archiving the list of topics here for posterity:

  • The new domain decomposition parallelization in Gromacs 4.0, with some tips & tricks to get the most out of your hardware
  • Different methods to perform free energy calculations. Slow-growth, perturbations, Bennett Acceptance Ratio. Which protocol is most efficient, and what new things will be in Gromacs 4.0?
  • QM/MM. How do you mix Quantum Mechanics with Gromacs?
  • Virtual sites for hydrogen motion removal and long time-steps
  • Membrane protein simulations
  • Replica exchange, and extracting kinetic data from it
  • Local pressure extensions to Gromacs
  • Gromacs source code walk-through

The take-home message: strong coupling between the various pieces of an algorithm is the antithesis of parallel scalability. The CPU industry seems to have hit a brick wall in terms of improving raw computational speed: the future is in multi-core. Therefore, remove the coupling with better algorithms and you are on your way to highly scalable and, by definition, superbly fast algorithms.
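A rough way to put numbers on the scalability point is Amdahl's law. It is brought in here purely as an illustration and is not attributed to the workshop. The sketch assumes the coupled part of an algorithm behaves like a fixed serial fraction of the work, which is a simplification, and the function name is hypothetical.

```python
def amdahl_speedup(serial_fraction, n_cores):
    """Ideal speedup on n_cores when serial_fraction of the work cannot be split.

    Amdahl's law: S = 1 / (s + (1 - s) / n).
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)


if __name__ == "__main__":
    # Compare a fully decoupled algorithm with one that is 10% coupled.
    for cores in (1, 8, 64, 512):
        print(cores, amdahl_speedup(0.0, cores), amdahl_speedup(0.1, cores))
```

Under this model, an algorithm with a 10% coupled portion can never run more than 10 times faster, however many cores are added. That is why removing the coupling matters more than adding hardware.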

The timestep used in an integrator while solving a set of equations inherently determines the speed of the algorithm. Big timesteps make the algorithm unstable, as your trajectory will not be able to follow the phase space manifold accurately (as a side note, Euler-type integrators also become unstable as you make the timestep smaller, but this is the least of your worries with a non-symplectic integrator). The Nyquist theorem determines the sampling rate, so a big timestep requires removing the fast (high-frequency) degrees of freedom, such as the vibrations of bonds involving hydrogen, by putting constraints on them. The usual constraint algorithms are coupled, which makes them scale poorly in parallel. The Gromacs developers have solved this with a new constraints algorithm called P-LINCS.
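The timestep argument can be tried on a toy problem. The sketch below is not Gromacs code. It assumes a single harmonic oscillator with angular frequency omega stands in for a fast bond vibration, and it compares explicit Euler (non-symplectic) with velocity Verlet (symplectic). All function names are hypothetical. For this oscillator, velocity Verlet is stable only while omega * dt < 2, so the fastest motion left in the system caps the usable timestep.

```python
def euler_step(x, v, dt, omega):
    """One explicit (forward) Euler step for x'' = -omega^2 * x."""
    a = -omega ** 2 * x
    return x + dt * v, v + dt * a


def verlet_step(x, v, dt, omega):
    """One velocity Verlet step (symplectic) for the same oscillator."""
    a = -omega ** 2 * x
    v_half = v + 0.5 * dt * a          # half kick
    x_new = x + dt * v_half            # drift
    a_new = -omega ** 2 * x_new
    return x_new, v_half + 0.5 * dt * a_new  # second half kick


def energy(x, v, omega):
    """Total energy per unit mass of the oscillator."""
    return 0.5 * v ** 2 + 0.5 * omega ** 2 * x ** 2


def final_energy(step, dt, omega=1.0, t_end=100.0):
    """Integrate from x=1, v=0 up to roughly t_end and return the final energy."""
    x, v = 1.0, 0.0
    for _ in range(int(round(t_end / dt))):
        x, v = step(x, v, dt, omega)
    return energy(x, v, omega)


if __name__ == "__main__":
    # The exact energy is 0.5 for omega = 1 and these initial conditions.
    print("Euler,  dt=0.1:", final_energy(euler_step, 0.1))
    print("Verlet, dt=0.1:", final_energy(verlet_step, 0.1))
    print("Verlet, dt=2.1:", final_energy(verlet_step, 2.1))
```

With dt = 0.1 the Euler energy grows steadily while the Verlet energy stays close to 0.5. Pushing Verlet to dt = 2.1, just past the limit, makes it blow up as well. Constraining the fast vibration removes the largest omega from the system, and that is what allows a larger dt.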

Impact Awards

Posted in Activity 2 months ago

I was at the BC Technology Impact Awards ceremony last week representing my company Zymeworks. Zymeworks was nominated for the most promising pre-commercial technology company, but unfortunately we didn’t win. The award in this category went to Lignol Energy Corp., a clean tech company.

The organizers had tiled one wall of the Banquet Hall with a 100-foot screen. They had calibrated multiple projectors to blend the edges. Pretty impressive. You can get this technology from a couple of companies: here’s one.

Abebooks.com, an online market for new and used books, also won an award. I’ve been using this website for the past couple of years to get used textbooks. Highly recommended.

John MacDonald, the founder of MDA (the ‘M’ in MDA) and of Day4Energy Inc., got the Person of the Year award. It’s truly an honor to be in the same room with the accomplished!

Local Optima

Posted in Activity 6 months, 3 weeks ago

It’s been about four months since I last wrote anything here.

At times your life can reach a local optimum: you are settled and you begin to get comfortable. If you were to critically analyze the situation, you’d realize that you are at a 7 on a scale of 10 (arbitrarily normalized). Going any higher would require significant changes and would probably cause a lot of pain. So why do it?

My answer doesn’t matter.

I came across this incredible speech, “The Man in the Arena,” delivered by Theodore Roosevelt in 1910, which never has and never will become irrelevant:

It is not the critic who counts; not the man who points out how the strong man stumbles, or where the doer of deeds could have done them better. The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood; who strives valiantly; who errs, who comes short again and again, because there is no effort without error and shortcoming; but who does actually strive to do the deeds; who knows great enthusiasms, the great devotions; who spends himself in a worthy cause; who at the best knows in the end the triumph of high achievement, and who at the worst, if he fails, at least fails while daring greatly, so that his place shall never be with those cold and timid souls who neither know victory nor defeat.

That is all.

Mathematical Image Analysis

Posted in Activity, Design 11 months, 1 week ago

I’m taking just one course this semester: a special topics course on Computational Anatomy and Medical Image Analysis. The course is highly focused on the research interests of my supervisor, so instead of sitting beside each other in the lab, we go to another classroom and sit beside each other.

Computational Anatomy is “the use of mathematical analysis to learn how tissues grow, assume new shapes and morph into mature structures.” The first few classes are a review of the prerequisites: linear systems theory and stochastic systems. Having not taken either of these courses, I’m seeing plenty of new material. A lot of it is intuitive, some of it not obvious at all. Overall a Good Thing.

I don’t have a copy of the outline yet, but some of the topics we are going to cover are vector space theory, variational calculus, differential and Riemannian geometry, tensor analysis and applications in Computational Anatomy. An example of an application is what I’ve just finished writing: linear statistical analysis to classify the hippocampus in dementia of the Alzheimer’s type.

I’ll be posting about things that I find interesting here. If you’re in the class and want to follow along, a quick way to do so is by bookmarking the tag: ensc462.