Numériser avec un filtre à l'aide du shell HBase

Question

Quelqu'un sait-il comment analyser des enregistrements en fonction d'un filtre d'analyse, par exemple:

column:something = "somevalue"

Quelque chose comme ceci , mais à partir de shell HBase?

havanki4j · Accepted Answer

Essaye ça. C'est un peu moche, mais ça marche pour moi.

import org.Apache.hadoop.hbase.filter.CompareFilter import org.Apache.hadoop.hbase.filter.SingleColumnValueFilter import org.Apache.hadoop.hbase.filter.SubstringComparator import org.Apache.hadoop.hbase.util.Bytes scan 't1', { COLUMNS => 'family:qualifier', FILTER => SingleColumnValueFilter.new (Bytes.toBytes('family'), Bytes.toBytes('qualifier'), CompareFilter::CompareOp.valueOf('EQUAL'), SubstringComparator.new('somevalue')) }

HBase Shell inclura tout ce que vous avez dans ~/.irbrc, pour que vous puissiez y insérer quelque chose comme ça (je ne suis pas un expert Ruby, les améliorations sont les bienvenues):

# imports like above def scan_substr(table,family,qualifier,substr,*cols) scan table, { COLUMNS => cols, FILTER => SingleColumnValueFilter.new (Bytes.toBytes(family), Bytes.toBytes(qualifier), CompareFilter::CompareOp.valueOf('EQUAL'), SubstringComparator.new(substr)) } end

et alors vous pouvez simplement dire dans le shell:

scan_substr 't1', 'family', 'qualifier', 'somevalue', 'family:qualifier'

dape · Answer

scan 'test', {COLUMNS => ['F'],FILTER => \ "(SingleColumnValueFilter('F','u',=,'regexstring:http:.*pdf',true,true)) AND \ (SingleColumnValueFilter('F','s',=,'binary:2',true,true))"}

Plus d'informations peuvent être trouvées ici . Notez que plusieurs exemples se trouvent dans le fichier Filter Language.docx attaché.

Tony · Answer

Utilisez le paramètre FILTER de scan, comme indiqué dans l'aide à l'utilisation:

hbase(main):002:0> scan ERROR: wrong number of arguments (0 for 1) Here is some help for this command: Scan a table; pass table name and optionally a dictionary of scanner specifications. Scanner specifications may include one or more of: TIMERANGE, FILTER, LIMIT, STARTROW, STOPROW, TIMESTAMP, MAXLENGTH, or COLUMNS. If no columns are specified, all columns will be scanned. To scan all members of a column family, leave the qualifier empty as in 'col_family:'. Some examples: hbase> scan '.META.' hbase> scan '.META.', {COLUMNS => 'info:regioninfo'} hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'} hbase> scan 't1', {FILTER => org.Apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)} hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]} For experts, there is an additional option -- CACHE_BLOCKS -- which switches block caching for the scanner on (true) or off (false). By default it is enabled. Examples: hbase> scan 't1', {COLUMNS => ['c1', 'c2'], CACHE_BLOCKS => false}

KannarKK · Answer

Scan scan = new Scan(); FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ALL); //in case you have multiple SingleColumnValueFilters, you would want the row to pass MUST_PASS_ALL conditions or MUST_PASS_ONE condition. SingleColumnValueFilter filter_by_name = new SingleColumnValueFilter( Bytes.toBytes("SOME COLUMN FAMILY" ), Bytes.toBytes("SOME COLUMN NAME"), CompareOp.EQUAL, Bytes.toBytes("SOME VALUE")); filter_by_name.setFilterIfMissing(true); //if you don't want the rows that have the column missing. Remember that adding the column filter doesn't mean that the rows that don't have the column will not be put into the result set. They will be, if you don't include this statement. list.addFilter(filter_by_name); scan.setFilter(list);

Chetan Somani · Answer

Un des filtres est Valuefilter, qui peut être utilisé pour filtrer toutes les valeurs de colonne.

hbase(main):067:0> scan 'dummytable', {FILTER => "ValueFilter(=,'binary:2016-01-26')"}

binary est l'un des comparateurs utilisés dans le filtre. Vous pouvez utiliser différents comparateurs dans le filtre en fonction de ce que vous voulez faire.

Vous pouvez consulter l’URL suivante: http: // www.hadooptpoint.com/filters-in-hbase-Shell/. .__ Il fournit de bons exemples d'utilisation de différents filtres dans HBase Shell.

Prateek Mishra · Answer

Ajoutez setFilterIfMissing (true) à la fin de la requête

hbase(main):009:0> import org.Apache.hadoop.hbase.util.Bytes; import org.Apache.hadoop.hbase.filter.SingleColumnValueFilter; import org.Apache.hadoop.hbase.filter.BinaryComparator; import org.Apache.hadoop.hbase.filter.CompareFilter; import org.Apache.hadoop.hbase.filter. Filter; scan 'test:test8', { FILTER => SingleColumnValueFilter.new(Bytes.toBytes('account'), Bytes.toBytes('ACCOUNT_NUMBER'), CompareFilter::CompareOp.valueOf('EQUAL'), BinaryComparator.new(Bytes.toBytes('0003000587'))).setFilterIfMissing(true)}