Hashable records #641

Merged
merged 68 commits into from
Oct 7, 2019

Conversation

jeffreylovitz
Contributor

No description provided.

@jeffreylovitz jeffreylovitz force-pushed the hashable-records branch 4 times, most recently from d5ea6db to 8ce1524 Compare September 23, 2019 21:41
@jeffreylovitz jeffreylovitz force-pushed the hashable-records branch 3 times, most recently from b8e2c08 to aecc55e Compare October 3, 2019 18:55
@@ -29,20 +29,20 @@ TEST_F(PagerankTest, Pagerank) {
GrB_init(GrB_NONBLOCKING);

GrB_Matrix A;
double tol = 1e-4 ;
int iters, itermax = 100 ;
double tol = 1e-4 ;
Contributor

Something is wrong with my indentation?

Contributor Author

Haha, nothing personal! make format is the culprit. In this case, it replaced some spaces with tabs.

Comment on lines +185 to 187
AST *master_ast = AST_Build(parse_result);
AST *ast = AST_NewSegment(master_ast, 0, cypher_ast_query_nclauses(master_ast->root));
QueryGraph *qg = BuildQueryGraph(gc, ast);
Contributor

Why can't we use master_ast as the input to BuildQueryGraph?

Contributor Author

The referenced entities map is populated in AST_NewSegment, and is required to properly build algebraic expressions!

segment->record_map = RecordMap_New();
_BuildReturnExpressions(segment, AST_GetClause(ast, CYPHER_AST_RETURN), ast);
AlgebraicExpression **ae = AlgebraicExpression_FromQueryGraph(qg, segment->record_map, exp_count);
// _BuildReturnExpressions(AST_GetClause(ast, CYPHER_AST_RETURN), ast);
Contributor

Remove comment


static cypher_astnode_t *_create_anon_identifier(const cypher_astnode_t *node, int anon_count) {
char *alias;
int alias_len = asprintf(&alias, "anon_%d", anon_count);
Contributor

How about switching to a stack-based alias?

Contributor Author

Oh, good idea!
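
A minimal sketch of the stack-based alternative suggested above, assuming the alias only needs to live for the duration of the call, or that whoever consumes it makes its own copy. The helper name format_anon_alias and the 32-byte buffer size are hypothetical and not taken from the PR.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats "anon_<n>" into a caller-provided buffer.
 * Returns the alias length, or -1 if the buffer is too small. */
static int format_anon_alias(char *buf, size_t buf_len, int anon_count) {
	int len = snprintf(buf, buf_len, "anon_%d", anon_count);
	if(len < 0 || (size_t)len >= buf_len) return -1; // error or truncation
	return len;
}

/* Usage: the buffer lives on the stack, so there is nothing to free. */
static int anon_alias_demo(void) {
	char alias[32]; // "anon_" plus any int and the terminator fit comfortably
	int len = format_anon_alias(alias, sizeof(alias), 7);
	assert(len == 6 && strcmp(alias, "anon_7") == 0);
	return len;
}
```

Compared with asprintf, this avoids a heap allocation and the matching free, at the cost of having to pick a buffer size up front.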

static void _BuildQueryGraphAddNode(const GraphContext *gc,
const AST *ast,
const cypher_astnode_t *ast_entity,
static void _BuildQueryGraphAddNode(const GraphContext *gc, const cypher_astnode_t *ast_entity,
Contributor

QueryGraph *qg should be the first argument.
Consider removing const GraphContext *gc to shorten the function signature and getting it from LTS instead.
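
To illustrate the suggestion, here is a rough sketch that assumes "LTS" means thread-local storage. The types are stand-ins, and tls_set_graph_ctx, tls_get_graph_ctx and build_query_graph_add_node are hypothetical names, not RedisGraph's API. It uses the C11 _Thread_local storage class; a pthread key would serve the same purpose.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types; the real GraphContext and QueryGraph are far richer. */
typedef struct { const char *graph_name; } GraphContext;
typedef struct { int node_count; } QueryGraph;

/* Hypothetical per-thread slot holding the active graph context. */
static _Thread_local GraphContext *tls_graph_ctx = NULL;

static void tls_set_graph_ctx(GraphContext *gc) { tls_graph_ctx = gc; }
static GraphContext *tls_get_graph_ctx(void) { return tls_graph_ctx; }

/* QueryGraph comes first, and GraphContext is no longer a parameter. */
static void build_query_graph_add_node(QueryGraph *qg, const char *alias) {
	GraphContext *gc = tls_get_graph_ctx();
	assert(gc != NULL && alias != NULL); // the calling thread must have set the context
	qg->node_count++; // placeholder for the real node construction
}
```

The trade-off is a shorter signature in exchange for a hidden dependency: the function now only works on a thread whose context has been set beforehand.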

@@ -25,7 +31,7 @@ static void _setupTraversedRelations(CondVarLenTraverse *op, QGEdge *e) {
}
}

int CondVarLenTraverseToString(const OpBase *ctx, char *buff, uint buff_len) {
static int ToString(const OpBase *ctx, char *buff, uint buff_len) {
Contributor

Shouldn't this keep the op-specific name CondVarLenTraverseToString?

}

OpResult ApplyInit(OpBase *opBase) {
static OpResult ApplyInit(OpBase *opBase) {
Contributor

Remove

}

OpResult AllNodeScanInit(OpBase *opBase) {
static OpResult AllNodeScanInit(OpBase *opBase) {
Contributor

Remove


/* Mark alias as being projected by operation.
* Returns the ID associated with alias. */
int OpBase_Projects(OpBase *op, const char *alias);
Contributor

Can you please elaborate on why we need this function?

Contributor Author

Oh, good catch - this function was deleted, I forgot to update the header file.

Contributor

👍

/* Returns true if op is aware of alias.
* an operation is aware of all aliases it modifies and all aliases
* modified by prior operation within its segment. */
bool OpBase_Aware(OpBase *op, const char *alias, int *idx);
Contributor

I believe this comment is only valid while the execution plan is being constructed;
once the plan is fully constructed, I believe the statement is false.

Contributor Author

This is only called within NewOp routines currently! Theoretically it could be called within Init routines as well (as OpBase_Modifies is for purposes), but I think that would be valid as well - the mappings are scoped even post-construction.

array_free(merge_clause_indices);

return res;
}

static AST *_NewMockASTSegment(const cypher_astnode_t *root, uint start_offset, uint end_offset) {
Contributor

Where is this function being used?

Contributor Author

In _ValidateScopes, once for each query scope. We don't want to use AST_NewSegment here because that function builds the entire reference map.

* (from variadic) to a triemap. */
void AR_EXP_CollectEntityIDs(AR_ExpNode *root, rax *record_ids);
void AR_EXP_CollectEntities(AR_ExpNode *root, rax *record_ids);
Contributor

Please consider renaming record_ids

/* Constructs string representation of arithmetic expression tree. */
void AR_EXP_ToString(const AR_ExpNode *root, char **str);

void AR_EXP_BuildResolvedName(AR_ExpNode *root);
Contributor

Add a short comment describing the function.

void AR_EXP_BuildResolvedName(AR_ExpNode *root);

/* Construct an arithmetic expression tree from a CYPHER_AST_EXPRESSION node. */
AR_ExpNode *AR_EXP_FromExpression(const cypher_astnode_t *expr);
Contributor

Shouldn't AR_EXP_FromExpression be part of ast_build_ar_exp.c?

Contributor Author

Oh, this should have been deleted - good catch!

@@ -268,88 +268,125 @@ static inline void _AR_EXP_FreeResultsArray(SIValue *results, int count) {
}
}

static AR_EXP_Result _AR_EXP_EvaluateFunctionCall(AR_ExpNode *node, const Record r,
Contributor

I feel like this function could be a bit more "structured":

  1. switch on node->op.type
  2. use a goto that jumps to a single exit point performing both _AR_EXP_FreeResultsArray and return res

Contributor Author

Added a switch, but I find it a bit odd aesthetically. Will you verify that this is the logic you meant?
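
For readers unfamiliar with the pattern under discussion, the sketch below shows a switch on the operation type combined with a single goto-reached exit point that frees the intermediate results before returning. All types and names here (OpType, EvalStatus, evaluate, free_results) are hypothetical stand-ins and are not the actual _AR_EXP_EvaluateFunctionCall code. Note that the commit list for this PR ends with "Revert case statement in AR_Exp function evaluation", so the switch did not survive in that form.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { OP_SUM, OP_MAX, OP_UNKNOWN } OpType; // hypothetical op kinds
typedef enum { EVAL_OK, EVAL_ERR } EvalStatus;

static int free_calls = 0; // counts cleanups so the behaviour can be checked

static void free_results(int *results) {
	free(results);
	free_calls++;
}

/* Evaluates `count` child values under `type`; every path past the
 * allocation leaves through the `cleanup` label. */
static EvalStatus evaluate(OpType type, const int *children, int count, int *out) {
	EvalStatus res = EVAL_OK;
	int *results = malloc(sizeof(int) * (count > 0 ? count : 1));
	if(results == NULL) return EVAL_ERR;
	for(int i = 0; i < count; i++) results[i] = children[i];

	switch(type) {
	case OP_SUM:
		*out = 0;
		for(int i = 0; i < count; i++) *out += results[i];
		break;
	case OP_MAX:
		if(count == 0) { res = EVAL_ERR; goto cleanup; } // early exit, still cleaned up
		*out = results[0];
		for(int i = 1; i < count; i++) if(results[i] > *out) *out = results[i];
		break;
	default:
		res = EVAL_ERR;
		break;
	}

cleanup:
	free_results(results);
	return res;
}
```

The point of the single label is that an early error inside any case cannot skip the cleanup, which is easy to get wrong when each branch frees and returns on its own.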

AST *ast_segment = AST_NewSegment(ast, start_offset, end_offset);

// Construct a new ExecutionPlanSegment.
ExecutionPlan *segment = _NewExecutionPlan(ctx, gc, ast_segment, result_set);
Contributor

I would prefer to keep the _NewExecutionPlan signature to the bare minimum; whichever arguments we can get from LTS, let's get them from there.

Comment on lines 698 to 710
for(int i = 0; i < segment_count; i++) {
uint end_offset = segment_indices[i];
// Slice the AST to only include the clauses in the current segment.
AST *ast_segment = AST_NewSegment(ast, start_offset, end_offset);

// Construct a new ExecutionPlanSegment.
ExecutionPlan *segment = _NewExecutionPlan(ctx, gc, ast_segment, result_set);

AST_Free(ast_segment); // Free the AST segment.

segments[i] = segment;
start_offset = end_offset;
}
Contributor

Looking good!

Comment on lines 26 to 30
static inline OpBase *_ExecutionPlan_LocateLeaf(OpBase *root, OpBase *prev_segment_head) {
if(root == prev_segment_head) return root->parent; // Don't recurse into the previous segment.
if(root->childCount == 0) return root;
return _ExecutionPlan_LocateLeaf(root->children[0]);
return _ExecutionPlan_LocateLeaf(root->children[0], prev_segment_head);
}
Contributor

Given how _ExecutionPlan_LocateLeaf is used, I don't see how we can get from root to prev_segment_head,
as root is not yet connected to prev_segment_head. As I see it, we would always return via if(root->childCount == 0).
Also, removing prev_segment_head would make this function much clearer.

Contributor Author

👍 Good call, the prev_segment_head logic can be safely deleted! This had been a way to avoid accidentally recursing into a previous segment when we have multiple WITH clauses, but the projection-building logic has been improved such that this is no longer a concern.

Contributor

👍
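
With the prev_segment_head parameter dropped as agreed in this thread, the leaf lookup reduces to following the first child until an op with no children is reached. The sketch below uses a minimal stand-in OpBase holding only the two fields the walk needs, and the name locate_leaf is illustrative; it is not the merged code.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for OpBase with only the fields the walk needs. */
typedef struct OpBase {
	struct OpBase **children;
	int childCount;
} OpBase;

/* Follows the first child at each level until an op with no children is found. */
static OpBase *locate_leaf(OpBase *root) {
	assert(root != NULL);
	if(root->childCount == 0) return root;
	return locate_leaf(root->children[0]);
}
```

On a three-op chain root -> mid -> leaf, locate_leaf(&root) returns the address of leaf, and calling it on the leaf returns the leaf itself.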

// Collect all unique aliases
const char **aliases = AST_CollectElementNames(ast);
static void _PopulateProjectAll(ExecutionPlan *previous_segment, OpBase *op) {
char **aliases = _CollectAliases(previous_segment->record_map);
Contributor

MATCH (a)-[b]->(c) WHERE a.v = 1 RETURN *
should return a, b, c,
although previous_segment->record_map would only contain a, as it is the only identifier that is referenced.

Contributor Author

This query works as expected right now. The reason is that while building the AST reference map, we check to see if we're in STAR projection, and if so mark all user aliases as referenced (and must, as otherwise we won't populate AlgebraicExpressions properly).

Essentially, we handle STAR projections at the AST level and again here at the ExecutionPlan level, which is an annoying and weak approach, but will hopefully be replaced quite soon!


// Free current AST segment if it has been constructed here.
if(ast_segment != ast) AST_Free(ast_segment);

Contributor

Remove extra line

@swilly22 swilly22 merged commit 488035a into master Oct 7, 2019
@swilly22 swilly22 deleted the hashable-records branch October 7, 2019 19:10
pnxguide pushed a commit to CMU-SPEED/RedisGraph that referenced this pull request Mar 22, 2023
* add mapping to record

* opbase updates record mapping with modifiers

* Remove ops consume,init,reset and free functions

* WIP

* WIP

* WIP

* Post-rebase fixes

* More fixes

* WIP

* Rename AR_Exp array in OpSort

* WIP

* Separate mapping for Record projection, updated projecting ops

* Set up modified aliases for Project and Aggregate

* delete ast_mapping.c

* changed unit tests

* introduced reference mapping. wip. unit tests do no link

* removed warnings from unit tests

* started ast segment unit tests

* OpFilter introductions, COUNT optimization

* Bugfixes, RETURN-only query support, functional FilterTree

* Fix record lookups in OpUpdate

* Fixes to OpDelete

* Fixes to OpUnwind

* tested explicit filters and set clauses

* tested merge clause for ref entities

* tested reference mapping

* Post-rebase fixes

* Updates to AST referred entity logic

* Build ResultSet header

* WIP Create edges with appropriate src and dest

* Fix OpType enum

* Join segments on projections ops, iterative MATCH and call processing

* Don't return ORDER entities

* Track referred DELETE entities

* Revert to op-specific names for functions like Consume

* Various fixes

* name anonymous entities declared within WHERE clause, e.g. WHERE ()-[]->()

* remove filtered path validation

* full ast scan to find anonymous entities

* Handle RETURN * projections

* Fixes to MERGE and WITH

* Fixes

* Unit test fixes

* Fix referenced logic in AlgebraicExpression decomposition

* Support WITH * constructions

* Fix AR_Exp sharing of record-hosted values

* fixed test referenced entities to ignore inline label or relation type in count of referenced entities

* Use RedisGraph fork of libcypher-parser

* Various PR cleanups

* Fix memory leak in enriched AST

* One-time Record alias resolution for ops

* Fix compiler warnings, Record cleanup

* Fix memory leaks, general cleanup

* AST reference map includes all WITH/RETURN * projections

* Further leak fixes

* PR fixes

* PR fixes 2

* Post-rebase fixes

* Fix TODOs

* PR Updates

* Improve logic for adding sort expressions to projection ops

* PR updates

* Move AST reference map logic to separate file

* Improve ArithmeticExpression logic

* Re-introduce RecordCap logic when LIMIT is specified

* Revert case statement in AR_Exp function evaluation
3 participants