2010-03-05 Fri
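After reading Kamus's comment on SQLULDR2, I was quite struck by it. What people care about is whether the features they want are convenient to use, not how many features there are. SQLULDR2's many command-line options really are a bit dizzying, even for me.
To make it easier for most people to use, I have simplified SQLULDR2's command-line help to the following: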
SQL*UnLoader: Fast Oracle Text Unloader (GZIP), Release 3.0.1
(@) Copyright Lou Fangxin (AnySQL.net) 2004 - 2010, all rights reserved.
Usage: SQLULDR2 keyword=value [,keyword=value,...]
Valid Keywords:
user = username/password@tnsname
sql = SQL file name
query = select statement
field = separator string between fields
record = separator string between records
read = set DB_FILE_MULTIBLOCK_READ_COUNT at session level
sort = set SORT_AREA_SIZE at session level (UNIT:MB)
hash = set HASH_AREA_SIZE at session level (UNIT:MB)
array = array fetch size
rows = print progress for every given rows (default, 1000000)
file = output file name(default: uldrdata.txt)
log = log file name, prefix with + to append mode
text = output type (MYSQL, CSV, MYSQLINS, ORACLEINS, FORM, SEARCH).
parfile = read command option from parameter file
for field and record, you can use '0x' to specify hex character code,
\r=0x0d \n=0x0a |=0x7c ,=0x2c, \t=0x09, :=0x3a, #=0x23, "=0x22 '=0x27
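The help text says field and record separators may be written as `0x` hex character codes. The Python sketch below shows one way such a separator spec could be expanded. It is not SQLULDR2's code; `decode_separator` is a hypothetical helper, and it assumes each `0x` is followed by exactly two hex digits while every other character is taken literally.

```python
import re

# Assumption: "0x" followed by exactly two hex digits encodes one character.
_HEX_CODE = re.compile(r"0x([0-9A-Fa-f]{2})")


def decode_separator(spec: str) -> str:
    """Hypothetical helper: replace every 0xHH in spec with chr(0xHH)."""
    return _HEX_CODE.sub(lambda m: chr(int(m.group(1), 16)), spec)


print(repr(decode_separator("0x7c")))      # '|'
print(repr(decode_separator("0x0d0x0a")))  # '\r\n'
```

Writing separators as hex codes this way avoids shell-quoting trouble with characters such as `|`, `#`, or quotes on the command line.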
Experts can still list all of the previous command-line options with:
sqluldr2 help=yes
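By introducing a single TEXT option that sets the relevant options for each export format, SQLULDR2 becomes easier to use and also gives a very direct picture of what it can do: it can export data for MySQL, export standard CSV files that Excel can open, generate INSERT statements for MySQL and Oracle, display records column by column, or produce data sources for certain special-purpose search programs.
Thanks again to Kamus for the good suggestion. This year is Alipay's "Year of User Experience", and we should reflect on things from the user's point of view.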
If you are interested in learning more about Google’s activities in computer science education, make sure to attend some of the talks we have scheduled or drop by the Google booth!
We continue to be impressed by the new solutions developers are bringing to market by leveraging the Google Analytics Platform. If you have developed a useful new tool or integration on top of Google Analytics, drop us an email at analytics-api@google.com. If it's innovative and useful we'll highlight it to our readers on this blog.
The open source projects he created as part of his work were two-fold: Linux Trace Toolkit Next Generation (LTTng), a LGPLv2.1/GPLv2 tracer for the Linux kernel; and Userspace RCU library (liburcu), a highly-scalable user-space synchronization library, distributed under the LGPLv2.1 license.
Mathieu was kind enough to send us this summary of his research:
Computer systems, at both the hardware and software levels, are becoming increasingly complex, and tracing is key to managing some or all of this growing complexity. Linux is used in a large range of applications, from small embedded devices to high-end servers; its kernel keeps growing, libraries are being added, and existing software needs major redesign to benefit from multi-core architectures. As a result, the software industry and individual developers face problems whose resolution requires an understanding of the interaction between applications and all components of the operating system.
In my thesis, I propose the LTTng (Linux Trace Toolkit Next Generation) tracer as an answer to the tracing needs of industry and the open source community. Low intrusiveness is key to the tracer's usefulness, because we need to be able to reproduce problems occurring under normal conditions. Some users even leave tracers active at all times in production, which makes tracer overhead critical. Our approach involves designing synchronization primitives that meet these low-impact requirements. The linearly scalable and wait-free RCU (Read-Copy Update) synchronization mechanism used by the LTTng tracer fulfills these requirements with respect to data reads. A custom buffer synchronization scheme is proposed to extract tracing data while preserving linear scalability and wait-free characteristics.
By measuring LTTng's impact, I demonstrate that it is possible to create a tracer that satisfies all of the following characteristics: low latency, deterministic real-time impact (wait-free), small impact on operating system throughput, and linear scalability with the number of cores. Experiments on various architectures show that this tracer is portable.
I propose a general model of superscalar multi-core systems with weakly ordered memory accesses, used to formally verify RCU correctness and wait-free guarantees by model checking. The LTTng buffering scheme is also formally verified for safety and progress. Formal verification demonstrates that these algorithms allow reentrancy from multiple execution contexts, ranging from standard threads to non-maskable interrupt handlers, which allows wide instrumentation coverage of the operating system.
Many thanks to Mathieu for sending us this report. You can download the full dissertation for more details.
- Spam or spammy user-generated content
- Forum posts containing spam, or large volumes of spam comments
- Suspected hacking
2010-03-04 Thu
All of these changes in software are very exciting, but who is it all for? Why is anonymity online so important? Companies like Google have privacy and opt-out policies, but not everyone has this stance. Corporations, nations, criminal organizations and individuals want your information. Companies collect information on your web browsing habits and sell it or are sloppy when it comes to protecting it from identity thieves. Others can threaten lives, from repressive nations tracking down outspoken journalists, to abusive spouses or stalkers who want to find out where their victims are hiding; from enemy military forces trying to find a communications link, to criminals who know when law enforcement is watching online.
Political upheaval sparks protests and renewed efforts to control the flow of information online. Interest in censorship circumvention also rises. In 2009, use of Tor increased, as users tried to get around national firewalls during the elections in Iran, and after the introduction of national Internet filters in other countries.

In times of relative political stability, governments routinely filter out international news outlets and information on reproductive health, religion, human rights and other topics deemed unfit. Women blogging about things considered mundane elsewhere, like being forbidden to drive or shop alone, are harassed by authorities. Technology has made it easier to crack down on dissent, but the right technology can also influence policy for the better. In Mauritania, the use of censorship circumvention software after 2005 became widespread enough to prompt the government to stop filtering, since it was becoming a waste of time.
Even people living in countries where free speech is protected by law need anonymity for political activities. People blogging about political views that differ from the prevailing attitudes in a small community may lose a job or face boycotts if they run a business. In a company town, writing about the misdeeds of the company that employs your neighbors may be dangerous. Telling people about corruption could lead to harassment from guilty officials.
When someone finds the courage to leave an abusive relationship, the support of victims' advocates is vital. The Internet can help a survivor find counseling, shelter, and encouragement from people who have gone through the same process. Sadly, stalkers are also using technology to find their victims. Abusers monitor web browsers to see if a victim is planning to leave. Information about a shelter's location can be found in email headers, forcing abuse survivors to relocate. According to the U.S. Bureau of Justice Statistics, over one in four people who are stalked experience some sort of cyberstalking. Though some software in a stalker's toolkit is installed on a home computer, IP addresses can reveal which internet cafe or library someone uses to get online. Even if you don't have a stalker, hiding your IP address can be a good idea. Kids and adults alike are advised not to tell strangers where they live, but an IP address can reveal it for them.
Sting operations fail if criminals can tell that the police are connecting to message boards and chat from a government network. The information disappears. Insurgents may be looking for soldiers connecting to their defense department's computers back home. Anonymous tip lines are not so anonymous if someone telling authorities about crime is the only person in the neighborhood connecting to a government website. Without anonymity, going after organized crime can be dangerous to officers and their families.
Some companies do not reveal how much they know about their customers, or who sees the information. Some Internet Service Providers feel entitled to sell data collected from their subscribers to marketers. Though they claim that the information is not tied to any particular users, it is easy to find someone based on their search history. Information about visits to banking websites, searches for details on pre-existing health conditions, or other sensitive online activity could be damaging in the wrong hands, whether made available through carelessness or commercial interest.
Privacy online can protect people offline whether they are organizing protests, covering the news, blowing the whistle on threats to public health, or just blogging about daily life. In the "real world" assaults on privacy like peeking in windows, opening mail, or breaking and entering are obvious crimes. In the online world, however, assaults on privacy are subtle and unyielding. These threats to your health, your wealth and your well-being have no "opt-out" button. They have no "scrub my data" option. Your online activities, e-mails, bank transactions and everything else can be used to trace where you are and who you are. Using software like Tor gives ordinary citizens more choice about the information they reveal online.
For more information about online privacy and circumventing internet censorship, visit the Tor Project's website.
To find the three highest-paid employees in each department of the employee table (SCOTT.EMP) on an Oracle database, you can simply use an analytic (window) function, as shown below.
SELECT * FROM (
SELECT DEPTNO, EMPNO, ENAME, SAL,
RANK() OVER (PARTITION BY DEPTNO ORDER BY SAL DESC) RNK
FROM EMP ) WHERE RNK <= 3
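But if the employee table lives in MySQL or another database such as SQLite, achieving the same result is considerably more complicated; at least, I still don't know how to do it. With features DataReport already had, plus the conditional filter feature I just added, the requirement is easy to meet: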
webchart.query_1=select deptno, empno, ename, sal from emp
webchart.express_1=rank|x|rnk::sal|deptno
webchart.filter_1=3.5-x|rank
webchart.sort_1=deptno,rank
webchart.group_1=1
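If the value computed by the filter formula is less than 0, the record is removed. In this example, the formula 3.5-x becomes negative whenever the rank column is greater than 3, so only the top three in each department are kept, which meets our business requirement. The page outputs the following table: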
deptno  empno  ename   sal     rank
10      7839   KING    5000.0  1
        7782   CLARK   2450.0  2
        7934   MILLER  1300.0  3
20      7788   SCOTT   3000.0  1
        7902   FORD    3000.0  2
        7566   JONES   2975.0  3
30      7698   BLAKE   2850.0  1
        7499   ALLEN   1600.0  2
        7844   TURNER  1500.0  3
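Doing this processing on the application server not only keeps the SQL portable across databases but, when the page is accessed very frequently, also takes load off the database.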
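The rank-then-filter step done on the application server can be sketched in a few lines of Python. This is not DataReport's code: `rank_within_groups` and `apply_filter` are hypothetical names, and the sketch assumes (as the output table suggests, where SCOTT and FORD both earn 3000 but are ranked 1 and 2) that tied salaries receive consecutive ranks in query order. The rows are the classic SCOTT.EMP sample data, laid out as (deptno, empno, ename, sal) like the query.

```python
from itertools import groupby

# Classic SCOTT.EMP rows as returned by: select deptno, empno, ename, sal from emp
EMP = [
    (20, 7369, "SMITH", 800.0), (30, 7499, "ALLEN", 1600.0),
    (30, 7521, "WARD", 1250.0), (20, 7566, "JONES", 2975.0),
    (30, 7654, "MARTIN", 1250.0), (30, 7698, "BLAKE", 2850.0),
    (10, 7782, "CLARK", 2450.0), (20, 7788, "SCOTT", 3000.0),
    (10, 7839, "KING", 5000.0), (30, 7844, "TURNER", 1500.0),
    (20, 7876, "ADAMS", 1100.0), (30, 7900, "JAMES", 950.0),
    (20, 7902, "FORD", 3000.0), (10, 7934, "MILLER", 1300.0),
]


def rank_within_groups(rows, group_key, value_key):
    """Append a 1-based rank per group, ordered by value descending.

    Python's sort is stable, so ties keep their original query order and
    get consecutive ranks (an assumption about DataReport's behavior).
    """
    ordered = sorted(rows, key=lambda r: (group_key(r), -value_key(r)))
    ranked = []
    for _, group in groupby(ordered, key=group_key):
        for rank, row in enumerate(group, start=1):
            ranked.append(row + (rank,))
    return ranked


def apply_filter(rows, formula):
    """Drop every row for which the formula evaluates to a negative number."""
    return [r for r in rows if formula(r) >= 0]


ranked = rank_within_groups(EMP, group_key=lambda r: r[0], value_key=lambda r: r[3])
top3 = apply_filter(ranked, lambda r: 3.5 - r[4])  # same idea as filter 3.5-x|rank

for deptno, empno, ename, sal, rank in top3:
    print(deptno, empno, ename, sal, rank)
```

Only plain SELECT support is required from the database, which is why the same approach works unchanged on MySQL or SQLite.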
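Preface: This is the sixth article I have written on building an "Nginx + PHP (FastCGI)" web server. As one of the earliest detailed Chinese-language resources on installing, configuring, and using Nginx + PHP, this series has helped promote the adoption of Nginx in China. This article may receive minor updates, so remember the original link "http://blog.s135.com/nginx_php_v6/" to get the latest version. This sixth edition mainly covers the new graceful-restart method in Nginx 0.8.x, upgrades PHP to 5.2.13, and fixes the PEAR issue. It also upgrades MySQL from the 5.1.x series to the 5.5.x series, whose configuration file has changed significantly.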
Links: 1st edition (September 2007), 2nd edition (December 2007), 3rd edition (June 2008), 4th edition (August 2008), 5th edition (May 2009)

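Nginx ("engine x") is a high-performance HTTP and reverse proxy server, as well as an IMAP/POP3/SMTP proxy server. It was developed by Igor Sysoev for Rambler.ru, the second most visited site in Russia, where it has been running for more than three years. Igor released the source code under a BSD-like license.
Because Nginx outperforms Apache in performance and stability, more and more websites in China use it as their web server, including channels of portals such as Sina Blog, Sina Podcast, NetEase News, QQ.com (腾讯网), and Sohu Blog; video-sharing sites such as 6.cn (六间房) and 56.com; well-known forums such as the official Discuz! forum and Newsmth (水木社区); online gaming sites such as Shanda Online (盛大在线) and Kingsoft Xoyo (金山逍遥网); and emerging Web 2.0 sites such as Douban, Renren, YUPOO, Kingsoft iCIBA (金山爱词霸), and Xunlei Online (迅雷在线).
Official Chinese Nginx wiki: http://wiki.nginx.org/NginxChs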
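Under high concurrency, Nginx is a good alternative to Apache. Nginx can also be used as a layer-7 load balancer. According to my tests, Nginx 0.8.34 + PHP 5.2.13 (FastCGI) can sustain more than 30,000 concurrent connections, about ten times what Apache can handle in the same environment.
In my experience, a server with 4 GB of RAM running Apache (prefork mode) can generally handle only about 3,000 concurrent connections, because they consume more than 3 GB of memory and 1 GB must be reserved for the system. I once had two Apache servers with MaxClients set to 4000 in the configuration file; when concurrent Apache connections reached 3,800, memory and swap filled up and the servers crashed.
By contrast, this Nginx 0.8.34 + PHP 5.2.13 (FastCGI) server, under 30,000 concurrent connections, runs 10 Nginx processes using 150 MB of memory (15 MB × 10 = 150 MB) and 64 php-cgi processes using 1280 MB (20 MB × 64 = 1280 MB). Including the system's own memory use, total consumption is under 2 GB. If the server has less memory, you can run just 25 php-cgi processes, in which case php-cgi uses only 500 MB in total.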
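The memory budget above is simple multiplication; the sketch below only makes it reusable when planning process counts. The per-process sizes (about 15 MB per Nginx process and 20 MB per php-cgi process) are the figures observed on this server, not fixed constants, and `fastcgi_memory_mb` is a hypothetical helper name. Operating-system memory is not included.

```python
def fastcgi_memory_mb(nginx_procs, php_cgi_procs, nginx_mb=15, php_cgi_mb=20):
    """Estimate memory (MB) used by Nginx and php-cgi processes.

    Defaults are the per-process sizes observed in this article; real
    values depend on modules, PHP extensions, and workload.
    Returns (nginx_total, php_cgi_total, combined).
    """
    nginx_total = nginx_procs * nginx_mb
    php_total = php_cgi_procs * php_cgi_mb
    return nginx_total, php_total, nginx_total + php_total


print(fastcgi_memory_mb(10, 64))  # (150, 1280, 1430)
print(fastcgi_memory_mb(10, 25))  # (150, 500, 650)
```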
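Under 30,000 concurrent connections, PHP programs on the Nginx 0.8.34 + PHP 5.2.13 (FastCGI) server still respond very quickly. The figure below shows the Nginx status page, with 28,457 active connections (the status page configuration is included in the Nginx configuration file given later in this article):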

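My two production Nginx + PHP5 (FastCGI) servers run several moderately complex, purely dynamic PHP applications. A single server handles more than 700 requests per second for these dynamic PHP programs, equivalent to roughly 60 million hits per day (700 × 60 × 60 × 24 = 60,480,000), while the system load stays low: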

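At 2:30 PM on September 3, 2009, Kingsoft's game "JX Online III" (剑侠情缘网络版叁) went into unscheduled maintenance for one hour (http://kefu.xoyo.com/gonggao/jx3/2009-09-03/750438.shtml). Large numbers of players flocked to the official site, and on the Nginx cluster serving dynamic applications such as the forums, comments, and customer service, active Nginx connections reached 28,000 per server. This is the highest concurrency I have seen in an Nginx production environment.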

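Below, 100 concurrent connections were used to load-test two production servers behind the same load-balancer VIP and serving the same content, one running Nginx and the other Apache. Nginx handled more than twice as many requests per second as Apache, with far lower system load and CPU usage:
You can raise the connection count to 10,000 to 30,000 and load-test phpinfo.php on both Nginx and Apache. During such a test, phpinfo.php on Nginx still opens normally in a browser, whereas phpinfo.php on Apache fails to display. On a server with 4 GB of RAM, however much it is tuned, Apache can hardly serve pages normally under "webbench -c 30000 -t 60 http://xxx.xxx.xxx.xxx/phpinfo.php", whereas a tuned Nginx can.
webbench download: http://blog.s135.com/post/288/
Note: webbench itself consumes CPU and memory during a load test, so for accurate results, install it on a different server.
Test results: ##### Nginx + PHP #####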
Webbench - Simple Web Benchmark 1.5
Copyright (c) Radim Kolar 1997-2004, GPL Open Source Software.
Benchmarking: GET http://192.168.1.21/phpinfo.php
100 clients, running 30 sec.
Speed=102450 pages/min, 16490596 bytes/sec.
Requests: 51225 susceed, 0 failed.
top - 14:06:13 up 27 days, 2:25, 2 users, load average: 14.57, 9.89, 6.51
Tasks: 287 total, 4 running, 283 sleeping, 0 stopped, 0 zombie
Cpu(s): 49.9% us, 6.7% sy, 0.0% ni, 41.4% id, 1.1% wa, 0.1% hi, 0.8% si
Mem: 6230016k total, 2959468k used, 3270548k free, 635992k buffers
Swap: 2031608k total, 3696k used, 2027912k free, 1231444k cached
Test results: ##### Apache + PHP #####
Webbench - Simple Web Benchmark 1.5
Copyright (c) Radim Kolar 1997-2004, GPL Open Source Software.
Benchmarking: GET http://192.168.1.27/phpinfo.php
100 clients, running 30 sec.
Speed=42184 pages/min, 31512914 bytes/sec.
Requests: 21092 susceed, 0 failed.
top - 14:06:20 up 27 days, 2:13, 2 users, load average: 62.15, 26.36, 13.42
Tasks: 318 total, 7 running, 310 sleeping, 0 stopped, 1 zombie
Cpu(s): 80.4% us, 10.6% sy, 0.0% ni, 7.9% id, 0.1% wa, 0.1% hi, 0.9% si
Mem: 6230016k total, 3075948k used, 3154068k free, 379896k buffers
Swap: 2031608k total, 12592k used, 2019016k free, 1117868k cached
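To make "more than twice" concrete, the two webbench runs can be converted to requests per second. This sketch only re-computes the figures printed above (30-second runs, 100 clients); `RUNS` and `requests_per_second` are names chosen for illustration.

```python
# Figures copied from the two webbench runs above.
RUNS = {
    "Nginx + PHP": {"pages_per_min": 102450, "requests": 51225, "seconds": 30},
    "Apache + PHP": {"pages_per_min": 42184, "requests": 21092, "seconds": 30},
}


def requests_per_second(run):
    """Successful requests per second, from the request count and run length."""
    return run["requests"] / run["seconds"]


for name, run in RUNS.items():
    rps = requests_per_second(run)
    # webbench's Speed line is in pages per minute, so it should equal rps * 60.
    assert abs(rps * 60 - run["pages_per_min"]) < 1e-6
    print(f"{name}: {rps:.1f} req/s")

ratio = requests_per_second(RUNS["Nginx + PHP"]) / requests_per_second(RUNS["Apache + PHP"])
print(f"Nginx/Apache throughput ratio: {ratio:.2f}")
```

This gives about 1707.5 req/s for Nginx against about 703.1 req/s for Apache, a ratio of roughly 2.43.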
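Why does Nginx perform so much better than Apache? Because Nginx uses the newer epoll (Linux 2.6 kernel) and kqueue (FreeBSD) network I/O models, whereas Apache uses the traditional select model. Squid and Memcached, which handle high concurrency well on Linux, also use the epoll network I/O model.
When reading and writing on a large number of connections, the select model used by Apache is very inefficient. The following analogy illustrates the difference between Apache's select model and Nginx's epoll model:
Suppose you are at university, living in a dormitory building with many rooms, and a friend comes to visit you. The select-style dorm supervisor takes your friend from room to room until she finds you. The epoll-style dorm supervisor has already written down every student's room number, so when your friend arrives she simply tells them which room you are in, without leading them around the whole building. If 10,000 visitors arrived, each looking for a classmate in this building, it is obvious which supervisor would be more efficient. Likewise, in a high-concurrency server, polling I/O is one of the most time-consuming operations, so it is equally clear whether select or epoll performs better.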
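The two interfaces in the analogy can be seen side by side with Python's standard `select` module. This is a minimal, Linux-only sketch (`select.epoll` does not exist on other platforms); it is not how Nginx or Apache is implemented, and it does not benchmark anything. It only shows the structural difference: `select()` is handed every socket on every call and checks them all, while epoll registers sockets once and reports only those that are ready. The 100 socket pairs stand in for dorm rooms, and `demo` is a name chosen for illustration.

```python
import select
import socket


def select_ready(readers):
    # select(): the full list goes to the kernel on every call and is
    # scanned in full, like the supervisor knocking on every door.
    readable, _, _ = select.select(readers, [], [], 0.0)
    return readable


def epoll_ready(ep, by_fd):
    # epoll: sockets were registered once; poll() returns only ready
    # descriptors, like looking up a room number in a register.
    return [by_fd[fd] for fd, mask in ep.poll(0.0) if mask & select.EPOLLIN]


def demo(rooms=100, visitor_room=42):
    pairs = [socket.socketpair() for _ in range(rooms)]
    readers = [r for r, _ in pairs]
    index = {r.fileno(): i for i, r in enumerate(readers)}

    ep = select.epoll()
    by_fd = {}
    for r in readers:
        ep.register(r.fileno(), select.EPOLLIN)  # one-time registration
        by_fd[r.fileno()] = r

    pairs[visitor_room][1].send(b"hello")  # only one "room" gets a visitor

    try:
        found_by_select = sorted(index[s.fileno()] for s in select_ready(readers))
        found_by_epoll = sorted(index[s.fileno()] for s in epoll_ready(ep, by_fd))
    finally:
        ep.close()
        for r, w in pairs:
            r.close()
            w.close()
    return found_by_select, found_by_epoll


if __name__ == "__main__":
    print(demo())
```

Both calls find the same single room; the difference is in how much work each call does as the number of watched sockets grows. Note also that `select()` is limited to descriptors below FD_SETSIZE (typically 1024), which by itself rules it out for tens of thousands of connections.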
Installation steps:
(System requirements: Linux 2.6+ kernel. The Linux distribution used in this article is CentOS 5.3; installation also succeeded on Red Hat AS4.)
............
2010-03-03 Wed
A few weeks ago we announced the launch of a new orkut application in Google Labs called People Hopper that lets you take your profile image and "morph" it into a friend's photo, using publicly available images from other orkut users along the way. No computer graphics tricks are used; every image along the transition comes from real orkut users.

The application hops across millions of public user images in orkut so that one image is smoothly transformed into another. First, faces are automatically detected in public profile images and normalized in contrast and size. Then, for each image, we find other public profile images that are similar to it. Finally, when you pick two faces, we just hop between similar public images, step-by-step, until the connection is made.
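The "hop between similar images" step amounts to a shortest-path search over a nearest-neighbor graph. The Python sketch below is a toy illustration, not Google's implementation: each "image" is a small feature vector, each point is joined to its k nearest neighbors by plain Euclidean distance (brute force, whereas the post mentions spill trees for search at scale), and Dijkstra's algorithm finds the sequence of hops. `knn_graph` and `shortest_hops` are hypothetical names.

```python
import heapq
import math


def knn_graph(points, k):
    """Undirected k-nearest-neighbor graph: node -> {neighbor: distance}."""
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        # Brute-force neighbor search; fine for a toy, not for millions of images.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def shortest_hops(graph, src, dst):
    """Dijkstra: list of node ids from src to dst, or None if unreachable."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None  # no chain of similar images connects the two faces
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


# Toy 1-D "faces": with k=1 the graph is a chain 0-1-2-3-4.
faces = [(0.0,), (1.0,), (2.5,), (4.5,), (7.0,)]
print(shortest_hops(knn_graph(faces, k=1), 0, 4))  # [0, 1, 2, 3, 4]
```

The `None` case mirrors the caveat in the post: if no visually similar images exist for a face, there may be no path at all.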
People Hopper was the outcome of the following research question: Is it possible to learn a low-dimensional space (i.e., a manifold) in which all human face images live? It is well known in the machine learning community that recovering the true underlying manifold requires a large number of samples from it. In 2008, we published a paper at CVPR in which we learned a face manifold using tens of millions of images, which is still the largest-scale manifold learning study to date.
To be able to do manifold learning at such a large scale, we had to address two key issues: First, how to do nearest neighbor search in very large databases? We used spill-trees to speed up the search to construct the neighborhood graph. Second, how to do spectral decomposition of matrices which are hundreds of terabytes in size? We investigated sampling-based matrix decomposition methods to handle such matrices.
One way to visualize the quality of the manifold is to find shortest paths between pairs of faces in the manifold, and observe the smoothness of the transitions between them. This is exactly what People Hopper does. Curious? Try People Hopper on orkut now!
The quality of the face manifold depends on three main factors: the number of faces in the manifold, the appearances of those faces, and the similarity measure used for image matching. Since we cannot control the number or appearance of the faces in orkut profiles, it may happen that for a particular image there exists no visually similar image in the database. We plan to update our graph over public profile images frequently, so the quality of paths will change as users join orkut or update their profile images. Finding better contrast normalization and similarity measures is a topic of continuing research. Currently we don't use any face-specific features during this process, just simple image distances.
We are eager to hear your feedback on how we can make this application more fun and useful. Also, if for any reason you would prefer your profile image not to appear in any People Hopper path, you can choose to opt out by visiting our People Hopper homepage.
2010-03-02 Tue





