最新文章专题视频专题问答1问答10问答100问答1000问答2000关键字专题1关键字专题50关键字专题500关键字专题1500TAG最新视频文章推荐1 推荐3 推荐5 推荐7 推荐9 推荐11 推荐13 推荐15 推荐17 推荐19 推荐21 推荐23 推荐25 推荐27 推荐29 推荐31 推荐33 推荐35 推荐37视频文章20视频文章30视频文章40视频文章50视频文章60 视频文章70视频文章80视频文章90视频文章100视频文章120视频文章140 视频2关键字专题关键字专题tag2tag3文章专题文章专题2文章索引1文章索引2文章索引3文章索引4文章索引5123456789101112131415文章专题3
问答文章1 问答文章501 问答文章1001 问答文章1501 问答文章2001 问答文章2501 问答文章3001 问答文章3501 问答文章4001 问答文章4501 问答文章5001 问答文章5501 问答文章6001 问答文章6501 问答文章7001 问答文章7501 问答文章8001 问答文章8501 问答文章9001 问答文章9501
当前位置: 首页 - 科技 - 知识百科 - 正文

FiltersinHBase(orintrarowscanningpartII)

来源:懂视网 责编:小采 时间:2020-11-09 13:32:19
文档

FiltersinHBase(orintrarowscanningpartII)

FiltersinHBase(orintrarowscanningpartII):Filters in HBase are a somewhat obscure and under-documented feature. (Even us committers are often not aware of their usefulness - see HBASE-5229, and HBASE-4256... Or maybe it's just me...). Intras row scanning can be done using ColumnRa
推荐度:
导读FiltersinHBase(orintrarowscanningpartII):Filters in HBase are a somewhat obscure and under-documented feature. (Even us committers are often not aware of their usefulness - see HBASE-5229, and HBASE-4256... Or maybe it's just me...). Intras row scanning can be done using ColumnRa

Filters in HBase are a somewhat obscure and under-documented feature. (Even us committers are often not aware of their usefulness - see HBASE-5229, and HBASE-4256... Or maybe it's just me...). Intras row scanning can be done using ColumnRa

Filters in HBase are a somewhat obscure and under-documented feature. (Even us committers are often not aware of their usefulness - see HBASE-5229, and HBASE-4256... Or maybe it's just me...).

Intras row scanning can be done using ColumnRangeFilter. Other filters such as ColumnPrefixFilter or MultipleColumnPrefixFilter might also be handy for this. All three filters have in common that they can provide scanners (see scanning in hbase) with what I will call "seek hints". These hints allow a scanner to seek to the next column, the next row, or an arbitrary next cell determined by the filter. This is far more efficient than having a dumb filter that is passed each cell and determines whether the cell is included in the result or not.

Many other filters also provide these "seek hints". The exception here are filters that filter on column values, as there is no inherent ordering between column values; these filters need to look at the value for each column.

For example check out this code in MultipleColumnPrefixFilter (ASF 2.0 license):
TreeSet lesserOrEqualPrefixes =
(TreeSet) sortedPrefixes.headSet(qualifier, true);
if (lesserOrEqualPrefixes.size() != 0) {
byte [] largestPrefixSmallerThanQualifier = lesserOrEqualPrefixes.last();
if (Bytes.startsWith(qualifier, largestPrefixSmallerThanQualifier)) {
return ReturnCode.INCLUDE;
}
if (lesserOrEqualPrefixes.size() == sortedPrefixes.size()) {
return ReturnCode.NEXT_ROW;
} else {
hint = sortedPrefixes.higher(largestPrefixSmallerThanQualifier);
return ReturnCode.SEEK_NEXT_USING_HINT;
}
} else {
hint = sortedPrefixes.first();
return ReturnCode.SEEK_NEXT_USING_HINT;
}
(the is used later to skip ahead to that column prefix)

See how this code snippet allows the filter to

  1. seek to the next row if all prefixes are know to be less or equal the current qualifier (and the largest didn't match the passed column qualifier). Note that a single seek to the next row can potentially skip millions of columns with a single seek operation.
  2. seek to the next larger prefix if there are more prefixes, but the current does not match the qualifier.
  3. seek to the first prefix (the smallest) if none the prefixes are less or equal to the current qualifier.
If you didn't feel like looking at the code, you can take away from this that these filters can be safely and efficiently used in very wide rows. If the filter instead would indicate only INCLUDE or SKIP and be forced to visit/examine every version of every column of every row, it would be inefficient to use for wide rows with hundreds of thousands or millions of columns.

I'm in the process of adding more information for these Filter to the HBase Book Reference Guide.

声明:本网页内容旨在传播知识,若有侵权等问题请及时与本网联系,我们将在第一时间删除处理。TEL:177 7030 7066 E-MAIL:11247931@qq.com

文档

FiltersinHBase(orintrarowscanningpartII)

FiltersinHBase(orintrarowscanningpartII):Filters in HBase are a somewhat obscure and under-documented feature. (Even us committers are often not aware of their usefulness - see HBASE-5229, and HBASE-4256... Or maybe it's just me...). Intras row scanning can be done using ColumnRa
推荐度:
标签: 扫描 sin or
  • 热门焦点

最新推荐

猜你喜欢

热门推荐

专题
Top