2010-09-03 Fri

I finally got some time to do some housecleaning. One nagging piece of low-hanging fruit was to stop running jconsole on one screen just to see the state of all my Cassandra boxes, so I wrote a Ganglia script to produce the graph above. It shows all the Cassandra servers and their total row-read-stage counts as a gauge. In other words, I graph the delta between runs of the Ganglia script, which gives me reads over time.
Here is how I have it set up:
All data exposed by JMX to produce tpstats and cfstats is graphed via Ganglia. The naming pattern for each graph is as follows:
cass_{stat_class}_{key}
stat_class - tpc, tpp, tpa mean completed, pending, active respectively
key - the stage name, for instance message deserialization.
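The naming pattern and the delta-between-runs idea can be sketched roughly as follows. This is an illustrative Python sketch, not the author's Perl script: the function names, the lowercase/underscore normalization of the key, and the handling of counters that go backwards are all assumptions.

```python
# Short codes for the three thread-pool counters described above.
STAT_CLASSES = {"completed": "tpc", "pending": "tpp", "active": "tpa"}


def metric_name(stat_class, key):
    """Build a cass_{stat_class}_{key} graph name.

    Assumption: keys are lowercased and spaces/hyphens become underscores.
    """
    normalized = "_".join(key.lower().replace("-", " ").split())
    return "cass_{}_{}".format(stat_class, normalized)


def deltas(previous, current):
    """Return the change in each counter since the previous script run.

    Counters with no previous sample are skipped. A counter that went
    backwards (for example after a node restart) is reported as 0; that
    choice is an assumption, not something the original script documents.
    """
    changes = {}
    for name, value in current.items():
        if name in previous:
            changes[name] = max(value - previous[name], 0)
    return changes
```

For example, `metric_name(STAT_CLASSES["completed"], "Message Deserialization")` yields a name of the form `cass_tpc_message_deserialization`, and feeding two consecutive samples of a completed-task counter to `deltas` gives the number of tasks completed between the runs.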
For column family stats I graph the keyspace stats as well as the specific column family stats exposed by cfstats. For instance below:

If you're interested in the script, I'll send it to you or put it up on code.google.com. It's written in object-oriented Perl and takes the same packaging approach as the Maatkit toolkit for MySQL by Xaprb and crew (all the "classes" live in the same file as the application).
GmetricDelegate is the parent package
GmetricCassandra extends GmetricDelegate, overrides getData, and defines which stats are absolute and which are gauges.
Following the same pattern, I also have
GmetricInnoDB
GmetricMySQL
and so on.
Then on each server I run:
/usr/bin/perl -w /home/scripts/ganglia_gmetric.pl --module=GmetricCassandra
This then reports the stats to Ganglia through gmetric.
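The parent/child package layout described above might look roughly like this. It is a hedged Python sketch rather than the author's Perl: the method names `get_data` and `commands`, the sample numbers, and the `uint32` type are assumptions. Only the `gmetric` options `--name`, `--value`, and `--type` are used, and the command lines are built but not executed.

```python
class GmetricDelegate:
    """Parent class: turns collected stats into gmetric command lines."""

    # Stats reported as deltas between runs ("gauges" in this post's
    # terminology); everything else is reported as an absolute value.
    gauges = frozenset()

    def get_data(self):
        raise NotImplementedError("subclasses supply their own stats")

    def commands(self, previous):
        """Build one gmetric argv list per stat, given the previous sample."""
        cmds = []
        for name, value in sorted(self.get_data().items()):
            if name in self.gauges:
                if name not in previous:
                    continue  # no earlier sample yet, so no delta to report
                value = max(value - previous[name], 0)
            cmds.append(
                ["gmetric", "--name", name, "--value", str(value), "--type", "uint32"]
            )
        return cmds


class GmetricCassandra(GmetricDelegate):
    gauges = frozenset({"cass_tpc_row_read_stage"})

    def get_data(self):
        # Placeholder numbers; the real script reads these over JMX.
        return {"cass_tpc_row_read_stage": 1500, "cass_tpp_row_read_stage": 2}
```

A module for another system would only need its own `get_data` and `gauges`, which is the appeal of the delegate layout: the reporting logic stays in one place.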

As a result of identifying these areas, we’ve written a few in-depth articles. Each article is meant as a “Deep Dive” into a specific topic, and is paired with open-source, sample reference code.
In no particular order, the articles are as follows:
Visualizing Google Analytics Data with Google Chart Tools
This article describes how you can use JavaScript to pull data from the Export API to dynamically create and embed chart images in a web page. To do this, it shows you how to use the Data Export API and Google Chart Tools to create visualizations of your Google Analytics Data.
Outputting Data from the Data Export API to CSV Format
If you use Google Analytics, chances are that your data eventually makes its way into a spreadsheet. This article shows you how to automate all the manual work by printing data from the Data Export API in CSV, the most ubiquitous file format for table data.
Filling in Missing Values In Date Requests
If you want to request data displayed over a time series, you will find that there might be missing dates in your series requests. When requesting multiple dimensions, the Data Export API only returns entries for dates that have collected data. This can lead to missing dates in a time series, but this article describes how to fill in these missing dates.
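As a rough illustration of the gap-filling idea in the last article, the sketch below pads a sparse date series with a default value. It is not the article's reference code: the function name, the `(date, value)` row shape, and the default of 0 are assumptions made for the example.

```python
from datetime import date, timedelta


def fill_missing_dates(rows, start, end, default=0):
    """Return one (date, value) pair for every day from start to end inclusive.

    rows holds (date, value) pairs only for the dates that collected data;
    dates with no entry get the default value.
    """
    by_date = dict(rows)
    filled = []
    day = start
    while day <= end:
        filled.append((day, by_date.get(day, default)))
        day += timedelta(days=1)
    return filled
```

Calling it with entries for September 1 and 3 and a range of September 1 to 3 returns three rows, with the default value filled in for September 2.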
We think this article format makes for a perfect jumping off point. Download the code, follow along in the article, and when you’re done absorbing the material, treat the code as a starting point and hack away to see what you can come up with!
And if you’ve got some more ideas for areas you’d like us to expound upon, let us know!

For the past two weeks, we’ve been sharing Jeremy Allison’s video interviews from his trip to GUADEC. Today we have a third video where he talks to Lennart Poettering, creator of PulseAudio. Jeremy and Lennart talk about PulseAudio features, how Lennart got started improving audio on the Linux desktop, and how to be successful in free software. Enjoy!
Thanks to Fabian Scherschel of Sixgun Productions for operating the camera.

By: Fenng, posted @ dbanotes.net
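Over the past couple of days I saw the news that Dell has dropped out of the bidding war for 3PAR and HP has announced victory. The price was steep, of course: the total acquisition price is roughly 2 billion US dollars.
I came into contact with 3PAR as soon as the company entered the Chinese market. It has many customers in the US securities and financial industries, and my previous employer worked in exactly that area, so I was keen to understand and bring in some of 3PAR's successful experience. I studied the features of several of its high-end products (refer: 3PAR 存储架构解析, "An Analysis of 3PAR's Storage Architecture"), and eventually took the plunge as an early adopter, building an Oracle 11g RAC cluster on 3PAR. So I probably count as one of the few people in China who have really used 3PAR. Since I will likely never touch this so-called high-end storage again, it seems worth writing something down as a memento.
It must be said that 3PAR's product really does have unique strengths. First, performance: its purpose-built hardware architecture makes full use of the characteristics of mechanical hard drives and thereby guarantees I/O response time. That is a hard metric, and it fits the needs of financial customers very closely. Its Thin Provisioning technology is also genuinely usable in production environments, unlike that of certain storage vendors who merely repackage a few features, hype the concept, and fool customers. What is sobering is that 3PAR has lately seemed born at the wrong time, or headed for the end of the road. It went public just as the financial crisis arrived, and its core business was hit very hard; that is the commercial side. On the technical side, 3PAR's technology was almost unrivaled in the storage world during the era of mechanical hard drives, but in the SSD era its kung fu may be rendered useless. Although it claims to support SSDs, the advantage it held in the mechanical-drive era will no longer exist. Users who want high IOPS and lower I/O response times can be served perfectly well by PC servers plus SSDs.
HP's intent in acquiring 3PAR is fairly obvious: to make up for its weakness in high-end products. In recent years IBM has been pushing XIV, which it acquired, and HP has its EVA series of storage. Both are close to 3PAR in some design ideas, but both are only midrange storage and do not count as high-end products. As far as I know, EVA's market performance seems mediocre. After the 3PAR acquisition, I expect the EVA product line will eventually die out. The industry knows that HP has never had a high-end storage product of its own and has always OEMed the high-end products of Hitachi Data Systems (HDS). It later partnered with Oracle on Exadata, but Oracle stopped working with HP after acquiring Sun. If HP wants to accomplish something in storage over the next few years, acquisition is the most convenient route, and also the one that requires the least thought from senior management.
Although HP has reasons enough to acquire 3PAR, I don't think this acquisition will necessarily bring HP much value. 3PAR might have been better off with Dell, which could have used it to lead a push into the mid-to-high-end storage market.
It is said that 3PAR's CEO stands to gain more than any of the three co-founders; such is the power of business maneuvering. 3PAR has been around for 10 years now. When we see creative companies succeed, we should also remember how hard starting a business is: success is for the few, failure for the many. (3PAR's name works like this: 3 stands for three people; P is for Jeffrey Price; A is for Ashok Singhal; R is for Robert Rogers, who left in 2001. So the company's name should be written in all capital letters.)
Judging from the direction of technology and the needs of the industry, these mid-to-high-end SAN storage systems are nearing their twilight. Nor do they leave any further room for imagination in cloud storage; any that exists is wishful thinking. That, of course, is another topic.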
--EOF--
Article URL: http://www.dbanotes.net/review/3par_acquired_by_HP.html
DBA Notes philosophy: get the greatest benefit from the simplest technology...

2010-09-02 Thu

How long can it take MySQL with InnoDB tables to shut down? It can be quite a while.
With the default configuration (innodb_fast_shutdown=1), the main job InnoDB has to do to complete shutdown is flushing dirty buffers. The number of dirty buffers in the buffer pool depends on innodb_max_dirty_pages_pct as well as on the workload and innodb_log_file_size, and can be anywhere from 10 to 90% in real-life workloads. The Innodb_buffer_pool_pages_dirty status variable will show you the actual figure. The flush speed also depends on a number of factors. The first is your storage configuration: you may be looking at anything from fewer than 200 writes/sec for a single entry-level hard drive to tens of thousands of writes/sec for a high-end SSD card. Flushing can be done using multiple threads (in XtraDB and the InnoDB Plugin at least), so it scales well with multiple hard drives. The second important variable is your workload, especially how dirty pages line up on the hard drive. If a lot of sequential pages are dirty, InnoDB can flush them with larger IOs of up to 1MB, which can be a lot faster than flushing data page by page.
So if we have a system with a single hard drive doing 200 IO/sec, a 48G buffer pool which is 90% dirty, and completely random page writes, we are looking at about 2.7 million 16KB page writes, which takes 13500 seconds, or about 5 min per 1GB of buffer pool size.
This is the worst-case scenario, though it is quite common in practice to see shutdown times of about 1 min per GB of buffer pool per hard drive.
Baron has written a nice post on how to decrease InnoDB shutdown time, which you may want to read on this topic.
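The worst-case arithmetic above can be reproduced with a few lines of Python. This is a back-of-the-envelope sketch, not a measurement tool: the function name is hypothetical, and it assumes the default 16KB InnoDB page size, one write per dirty page, and round decimal units (1GB = 1,000,000KB), which is what makes the 13500-second figure come out exactly.

```python
def shutdown_seconds(buffer_pool_gb, dirty_pct, writes_per_sec, page_kb=16):
    """Estimate worst-case flush time when every dirty page is a random write.

    Assumes decimal units (1 GB = 1,000,000 KB) and one IO per dirty page.
    """
    dirty_kb = buffer_pool_gb * 1_000_000 * dirty_pct / 100
    dirty_pages = dirty_kb / page_kb
    return dirty_pages / writes_per_sec


# The example from the text: 48G buffer pool, 90% dirty, 200 IO/sec.
seconds = shutdown_seconds(48, 90, 200)
minutes_per_gb = seconds / 48 / 60
```

With these inputs `seconds` is 13500 and `minutes_per_gb` is a little under 5, matching the figures in the post. Sequential dirty pages or more spindles would lower the estimate, as described above.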
Entry posted by peter | No comment

Ian Skerrett of the Eclipse Foundation wrote on his blog,
...Over 150 people attended the day long event that included 12 sessions related to Eclipse and Google technology. The presentations are now available online. There was lots of great information presented, like upcoming improvements to the Android SDK (based on Eclipse), Git support in Eclipse, a review of the Instantiations tools that Google just purchased and an introduction to the new Tools for Mobile Web project.
Most important, all of us at Google would like to thank Ian Skerrett and everyone at the Eclipse Foundation for assembling three of these great events. We were happy to welcome the Eclipse community to our campus, and we are happy to continue to support Eclipse. Don’t forget that we’re always looking to make this conference better, so give us your ideas! Tell us what you would like to see at future events in the comments, or if you were able to attend, tell us what you thought about this year’s program.
By Robert Konigsberg, Software Build Tools Team
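
I updated backtrace-mingw today. The previous version could not correctly display symbol information inside DLLs. Now it can. It is only a patch, so the code is rather messy.
If you are not familiar with backtrace-mingw, you can read about it here.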

2010-09-01 Wed

By: Fenng, posted @ dbanotes.net
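You have to see it to believe it: the flood of praise for the film Inception (《盗梦空间》) is well deserved, and I strongly recommend it. Even if someone has already told you the plot, watching it yourself is still a different experience. That is the charm of a classic, I suppose.
The whole film has something of Zhuangzi's butterfly dream ("庄生晓梦") about it. I joked with Laura: are we in a dream right now? It was three in the morning, after all. The plot is intricate and loops back on itself, like a Borges story, yet the film is not deliberately mystifying. Although the opening carries a lot of information, once you are into the story the pacing is very well controlled. As for the premise, fellow computer people may as well think of a dream as a virtual machine, where each virtual machine one level further down just runs slower. You may be moved by the love story in the film, but isn't the bond between father and son moving too?
One scene clearly borrows from the work of the Dutch artist M.C. Escher, an "impossible" figure: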

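I saw the premiere in the IMAX hall at Hangzhou's MixC (万象城) in the early hours of September 1. I hesitated when buying the ticket: a midnight showing with work the next day, was staying up worth it? I went home, slept for a while, and then headed back out. The theater was nearly full; Hangzhou's film fans certainly have stamina. The film runs close to three hours, and I was not at all sleepy when it ended.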
--EOF--
Article URL: http://www.dbanotes.net/mylife/inception.html

This is part of the series highlighting some notable publications by Googlers.
At Google, we operate large datacenters containing clusters of servers, networking switches, and more. While this gear costs a lot of money, an increasingly important cost -- both in terms of dollars and environmental impact -- is the electricity that drives the computing clusters and the cooling infrastructure. Since our clusters often do not run at full utilization, Google recently put forth a call to industry and researchers to develop energy proportional computer systems. With such systems, the power consumed by our clusters would be directly proportional to utilization. Servers consume the most electricity, and therefore researchers have responded to Google’s call by focusing their attention towards servers. As the servers become increasingly energy proportional, however, the “always on” network fabric that connects servers together will consume an increasing fraction of datacenter power unless it too becomes energy proportional.
In a paper recently published at the International Symposium on Computer Architecture (ISCA), we push further towards the goal of energy-proportional computing by focusing on the energy usage of high-bandwidth, highly-scalable cluster networking fabrics. This research considers a broad set of architectural and technological solutions to optimize energy usage without sacrificing performance. First, we show how the Flattened Butterfly network topology uses less power since it uses fewer switching chips and fewer links than a comparable-performance network built using the more conventional Fat Tree topology. Second, our approach takes advantage of the observation that when network demand is low, we can reduce the speed at which links transmit data. We show via simulation that by tuning the speeds of the links very rapidly, we can reduce power consumption with little impact on performance. Finally, our research is a further call to action for the academic and industry research communities to make energy efficiency, and energy proportionality in particular, a first-class citizen in networking research. Put together, our proposed techniques can reduce energy cost for typical Google workloads seen in our production datacenters by millions of dollars!





